Machine Learning by example.

Big Data, Machine Learning and Deep Learning workshop


TODO find a title: … Hands-on Big Data and deep learning …

Proposed schedule

see for more details

Two parts: Neural Networks for Machine Learning, and Text Mining as part of Data Mining.

Part 1: Neural Networks for Machine Learning

It is meant to be a hands-on course without sacrificing the conceptual background. As such, the first few code examples are developed from basic principles (using NumPy or CuPy) without any ML framework. After that, we use TensorFlow 2 for almost all examples.
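To give a flavour of the "from basic principles" approach, here is a minimal sketch: a tiny two-layer network trained on XOR with plain NumPy, gradients written out by hand. The layer sizes, learning rate, and epoch count are illustrative choices, not the workshop's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: the classic toy problem a linear model cannot solve
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# one hidden layer of 8 units (arbitrary illustrative size)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr, losses = 1.0, []
for _ in range(2000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((out - y) ** 2)))
    # backward pass: gradients of the MSE through the sigmoids, by hand
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The point of writing the backward pass explicitly is that, once the chain rule is understood here, `tf.GradientTape` in TensorFlow 2 is no longer a black box.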

see for more details

TODO: complete this intro

Part 2: Data Mining, Text Mining and Big Data concepts, with applications in Spark and Python

What can data tell us?

  • What is happening? : Descriptive
  • Why is it happening? : Diagnostic
  • What is likely to happen? : Predictive
  • What do I need to do? : Prescriptive

After a general presentation of the techniques and theory of text mining, this part presents simple examples and a more realistic use case:

Learning how to examine, with Spark and Python (PySpark), the content of a fairly large text corpus using Latent Semantic Analysis (LSA). The primary goal is to explore the data by determining which "concepts" (or semantic classes) best explain it. We will also extract representative documents and run queries that find documents in the database mentioning certain terms, or similar to a query document.

LSA aims to better represent a corpus of documents by exploring the relationships between the words they contain. The goal is to "distil" from the corpus a set of relevant "concepts". Each concept captures a direction of variation in the data, which often corresponds to a subject addressed in the corpus. Broadly speaking, each concept is described by three characteristics: its relevance to each document in the corpus, its affinity with the terms of the corpus, and its usefulness in describing the variance of the data. By selecting only the most important concepts, LSA can describe the data with a rough representation that uses fewer concepts, eliminates "noise", and merges similar topics.
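Mathematically, LSA reduces to a truncated singular value decomposition (SVD) of the term-document matrix. As a hedged toy illustration, plain NumPy rather than the Spark pipeline used in the lab, and with an invented four-document corpus:

```python
import numpy as np

# tiny invented corpus: two documents about Spark, two about deep learning
docs = [
    "spark big data processing",
    "big data spark cluster",
    "neural network deep learning",
    "deep learning neural training",
]
vocab = sorted({w for d in docs for w in d.split()})

# term-document count matrix A: rows = terms, columns = documents
A = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

# truncated SVD: keep only the k strongest "concepts"
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_concepts = Vt[:k, :].T  # each row scores a document against the k concepts

def cos(a, b):
    """Cosine similarity between two concept-space vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# documents on the same topic end up close in concept space
sim_same = cos(doc_concepts[0], doc_concepts[1])  # the two Spark documents
sim_diff = cos(doc_concepts[0], doc_concepts[2])  # Spark vs deep learning
print(f"same topic: {sim_same:.2f}, different topic: {sim_diff:.2f}")
```

The same idea scales to the lab's setting: Spark's MLlib provides distributed SVD on `RowMatrix`, and queries (term lookups, document similarity) become projections into the reduced concept space.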

see for more details


TODO …

  • Start the forked project
  • Validate the idea
  • Validate the pedagogical methodology
  • Validate the planning (sessions)
  • Complete the intro
  • Develop asynchronous lectures (video and assessment for each item of Session 1: fundamentals of Big Data, ML, and text mining as part of data mining)
  • Develop the labs (how-to Spark + Python, and the examples)

Proposed methodology (pedagogy)

Some sessions could be asynchronous and freely accessible to registered students (or even public, under a Creative Commons licence). These sessions will be built from small parts: short videos (10–15 min), an assessment for each part, and tools to communicate with the instructor (forum or chat). Each asynchronous session ends with a synchronous online class for debriefing and complements, based on the assessment results and learner questions.

Platforms: Google Classroom, Google Drive, GitHub, Windows 10 or Linux, Spark, Python, PySpark and Jupyter Notebook.