In-depth tutorials with practical sessions will take place on Days 4 and 5.
40 participants per session; registration takes place after application acceptance.
Some sessions will run over 2 days and others will be given twice.
A laptop is required.
Below is the list of confirmed sessions as of today.
APPROXIMATE BAYESIAN INFERENCE: OLD AND NEW
Approximate Bayesian inference (ABI) offers many promising solutions to advance modern machine-learning methods such as deep learning and reinforcement learning. This tutorial will give an overview of old and new methods for ABI. We will start with motivating applications of ABI methods in modern machine learning, and discuss the computational challenges associated with them. We will review traditional methods (such as the Laplace approximation, Markov chain Monte Carlo, Expectation Propagation, and Variational Inference) as well as modern methods that are motivated by deep learning (e.g., stochastic-gradient variational inference and variational auto-encoders). Overall, the tutorial will aim to motivate and empower the audience to pursue research in ABI, as well as to apply the ABI methods to real-world problems.
Basic knowledge of Machine Learning and Programming in Python.
CAUSALITY AND MACHINE LEARNING
1-day-long session repeated twice
– Kun ZHANG, Assistant Professor — Carnegie Mellon University
– Mingming GONG, PostDoc, University of Pittsburgh & Carnegie Mellon University
Does smoking cause cancer? Can we find the causal direction between two variables by analyzing their observed values? In daily life and in science, people often attempt to answer such causal questions in order to understand and manipulate systems properly. On the other hand, we are also often faced with the problem of how to make proper use of causal knowledge for machine learning. For instance, how can we make optimal predictions in non-stationary environments? In the past decades, interesting advances were made in machine learning, statistics, and philosophy for tackling long-standing causality problems, such as how to discover causal knowledge from purely observational data and how to infer the effect of interventions using such data. Furthermore, it has recently been shown that causal information can facilitate understanding and solving various machine learning problems, including transfer learning and semi-supervised learning. This tutorial reviews essential concepts in causality studies and focuses on how to learn causal relations from observational data and on why and how the causal perspective helps in machine learning and other tasks.
CRASH COURSE IN DEEP LEARNING AND PYTORCH
1-day-long session repeated twice
June 28th and June 29th
– Soumith CHINTALA, Artificial Intelligence Research Engineer at Facebook
You will learn the basics of deep learning with examples in PyTorch.
After this workshop, you will have a basic understanding of convolutional networks, standard gradient-based optimization methods, PyTorch tensors, autograd, and deep-learning-specific modules.
Knowledge of Python programming
Basics of linear algebra and statistics
Environment: Python, Jupyter
Packages: numpy, pytorch, torchvision, matplotlib.
PyTorch and torchvision wheels are available on http://pytorch.org
HYPOTHESIS TESTING USING KERNEL EMBEDDINGS
INTRODUCTION TO REINFORCEMENT LEARNING
Single session on 2 days
– Olivier PIETQUIN, Researcher at Google Brain, Lille, France
In this session we will address the fundamentals of Reinforcement Learning. Reinforcement learning is the machine learning answer to sequential decision making and control. It has attracted increasing interest since its recent successes in learning to play Atari games from raw pixels and in helping to master the game of Go. We will first describe the underlying model of Markov Decision Processes and the fundamental principles of Dynamic Programming. From there, we will derive algorithms able to learn a control policy through interactions with their environment in the case of discrete state and action spaces. On the second day, we will present methods for scaling up reinforcement learning algorithms so as to address continuous state and action spaces and real-world problems. This will lead us all the way to deep Reinforcement Learning and its applications to video games, robotics and Go.
Have the packages NumPy, matplotlib and OpenAI Gym installed
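To give a flavour of the Dynamic Programming principles mentioned in the abstract, here is a minimal sketch of value iteration on a tiny Markov Decision Process. The 4-state chain, its rewards, the discount factor and all function names are assumptions made up for this illustration; they are not taken from the session material, and the sketch uses only the Python standard library.

```python
# Toy deterministic chain MDP (hypothetical): states 0..3, state 3 is terminal.
# Actions: 0 = move left, 1 = move right. Entering state 3 yields reward 1.
N_STATES = 4
ACTIONS = (0, 1)
GAMMA = 0.9  # discount factor (assumed value)


def step(state, action):
    """Return (next_state, reward) for the toy chain."""
    if state == N_STATES - 1:  # terminal state: absorbing, no reward
        return state, 0.0
    nxt = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def backup(values, state, action):
    """One-step Bellman backup: immediate reward plus discounted next value."""
    nxt, reward = step(state, action)
    return reward + GAMMA * values[nxt]


def value_iteration(tol=1e-10):
    """Apply the Bellman optimality backup until the values stop changing."""
    values = [0.0] * N_STATES
    while True:
        new_values = [
            max(backup(values, s, a) for a in ACTIONS) for s in range(N_STATES)
        ]
        delta = max(abs(n - v) for n, v in zip(new_values, values))
        values = new_values
        if delta < tol:
            return values


def greedy_policy(values):
    """In each state, pick the action with the highest one-step backup."""
    policy = []
    for s in range(N_STATES):
        scores = [backup(values, s, a) for a in ACTIONS]
        policy.append(scores.index(max(scores)))
    return policy


values = value_iteration()
policy = greedy_policy(values)
print(values, policy)
```

In this toy problem the greedy policy moves right in every non-terminal state, and the values decay by a factor of `GAMMA` for each extra step needed to reach the goal.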
MACHINE LEARNING FOR GENETIC DATA AND BIOMEDICAL IMAGES
1-day long session on Machine learning for genetic data: June 28th
1-day long session on Machine learning for biomedical images: June 29th
– Chloé-Agathe AZENCOTT, Researcher at the Centre for Computational Biology (CBIO) of Mines ParisTech, Institut Curie and INSERM
– Jean-Philippe VERT, Director, Centre for Computational Biology (CBIO) at MINES ParisTech, Institut Curie and INSERM, and Research Professor, Department of Mathematics and Applications, ENS Paris
– Thomas WALTER, Researcher at the Centre for Computational Biology (CBIO) of Mines ParisTech, Institut Curie and INSERM
MISSING DATA IMPUTATION
1-day long session
– Julie JOSSE, Professor, Ecole polytechnique, Palaiseau, France
The ability to easily collect and gather a large amount of data from different sources can be seen as an opportunity to better understand many processes. It has already led to breakthroughs in several application areas. However, due to the wide heterogeneity of measurements and objectives, these large databases often exhibit an extraordinarily high number of missing values. Hence, in addition to scientific questions, such data also present some important methodological and technical challenges for data analysts. In this tutorial, we give an overview of the missing values literature as well as the recent improvements that caught the attention of the community due to their ability to handle large matrices with a large number of missing entries. We will illustrate the methods on medical, environmental and survey data.
Knowledge of PCA and linear regression.
Environment: R and RStudio
Packages: missMDA, missForest, Amelia, mice, naniar, VIM, norm
OPTIMAL TRANSPORT AND MACHINE LEARNING
1-day long session repeated twice
June 28th and June 29th
– Marco CUTURI, Professor ENSAE, Palaiseau, France
– Nicolas COURTY, Assistant Professor, Irisa, Rennes, France
– Rémi FLAMARY, Assistant Professor, University of Nice, France
Optimal transport (OT) provides a powerful and flexible way to compare probability measures of all shapes: absolutely continuous, degenerate, or discrete. This of course includes point clouds, histograms of features, and more generally datasets, parametric densities or generative models. Originally proposed by Monge in the eighteenth century, this theory later led to the Nobel Prize shared by Koopmans and Kantorovich as well as Villani’s Fields Medal in 2010.
After having attracted the interest of mathematicians for several years, OT has recently reached the machine learning community, because it can now tackle (both in theory and numerically) challenging learning scenarios, including for instance dimensionality reduction and structured prediction problems that involve histograms or point clouds, and estimation of parametric densities or generative models in highly degenerate / high-dimensional problems.
In this course we will give a brief introduction to all the important elements needed to grasp this new tool, with an emphasis on algorithmics (LP and regularized formulations) as well as applications (barycenters, distance between texts, topic models, generative models).
– Python/NumPy/matplotlib with Jupyter notebook or Spyder
– POT, the Python Optimal Transport toolbox (easy install through Anaconda or pip)
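As an illustration of the regularized formulation mentioned in the course description, the sketch below implements the classical Sinkhorn iterations for entropy-regularized OT between two small histograms. It is written in plain Python and deliberately does not use the POT toolbox. The histograms, the squared-distance cost, the regularization strength `reg` and the iteration count are assumptions chosen only for this example.

```python
import math


def sinkhorn(a, b, cost, reg=1.0, n_iter=500):
    """Entropy-regularized OT between histograms a and b (Sinkhorn iterations).

    Returns the transport plan as a list of rows.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / reg)
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Rescale rows, then columns, to match the two marginals in turn
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Two histograms supported on the points 0, 1, 2 of the real line (made-up data)
points = [0.0, 1.0, 2.0]
a = [0.5, 0.3, 0.2]
b = [0.2, 0.3, 0.5]
cost = [[(x - y) ** 2 for y in points] for x in points]

plan = sinkhorn(a, b, cost, reg=1.0)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(3)) for j in range(3)]
ot_cost = sum(plan[i][j] * cost[i][j] for i in range(3) for j in range(3))
print(row_sums, col_sums, ot_cost)
```

The row and column sums of the returned plan should match `a` and `b` up to numerical tolerance. Smaller values of `reg` bring the plan closer to the unregularized LP solution, at the price of slower convergence and possible numerical underflow in the kernel.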
MODERN RECOMMENDATION TECHNIQUES IN THE REAL WORLD
1-day long session: June 29th
– Olivier KOCH, Staff Machine Learning Lead, Criteo
– Flavian VASILE, Research Lead, Learning Representations team, Criteo
In this course we will cover a variety of machine learning-based methods for recommendation, ranging from classical approaches to modern Deep Learning-based techniques. The focus of the course is on real-world recommendation and the two big resulting questions:
- How to scale Recommender Systems both in space and time (how to make them work for large sets of items and user profiles and how to enable them to take into account real-time user activity signals) and
- How to mitigate the inherent discrepancy between offline technical metrics and the actual online performance.
The course will be divided in five parts:
- In part 1 we will offer a quick introduction to the field of Recommendation, offer examples of state-of-the-art Recommender Systems in the wild and review the main ML approaches powering them.
- In part 2 we will review classical ML approaches for Recommendation, starting with Collaborative Filtering and continuing with Matrix Factorization, Content-Based Recommendation and Hybrid Solutions.
- In part 3 we will go over modern Deep Learning-based models and introduce RNN-based user modelling for Recommendation.
- In part 4 we will go over one of the most promising upcoming changes in Recommendation, Causal Recommendation, which merges ideas from Machine Learning with ideas from Causal Inference, and review cutting-edge emerging techniques.
- In part 5 we will wrap up with conclusions and have Q&A sessions with the participants in the course.
All theoretical parts (2-4) will have a practical session at the end, where the participants will be able to get hands-on experience with the methods introduced in the theoretical section.
- jupyter notebooks
- python (version TBD)
- tensorflow, pytorch, matplotlib, pandas, numpy
- we will provide a web page with install instructions and sanity checks
M1-level knowledge in linear algebra and stats
SUBMODULARITY IN DATA SCIENCE
– Andreas KRAUSE, Professor, ETH Zürich, Academic Co-Director of the Swiss Data Science Center
TOPOLOGICAL DATA ANALYSIS
Schedule: two 1-day-long courses; the 2 sessions can be attended independently.
– Vitaliy KURLIN (June 28), Associate Professor, University of Liverpool; Data Scientist, Materials Innovation Factory
– Pawel DLOTKO (June 29), Assistant Professor, Swansea University
– Krasen SAMARDZHIEV (June 28-29), PhD student – University of Liverpool
– Vincent ROUVREAU (June 29), Research Software Engineer – Inria
June 28: Topological algorithms for data skeletonisation in Python.
Hour 1. A lecture with slides: point clouds as metric spaces; a review of clustering; a minimum spanning tree; MST-based clusterings, e.g. the single-edge clustering.
Hour 2. Demos and practical exercises on a Minimum Spanning Tree.
Hour 3. A lecture with slides: representing a point cloud by an abstract graph, the graph reconstruction problem; the Mapper algorithm.
Hour 4. Demos and practical exercises on the Mapper algorithm.
Hour 5. A lecture with slides: a Delaunay triangulation of a point cloud in the plane, 1-dimensional persistence for the filtration of alpha-complexes, a Homologically Persistent Skeleton.
Hour 6. Demos and practical exercises on a Homologically Persistent Skeleton.
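To make the Minimum Spanning Tree part of the June 28 program concrete, here is a minimal sketch of one MST-based clustering: build the MST of a point cloud with Kruskal's algorithm, then cut its longest edges. Cutting the longest MST edges gives the same clusters as single-linkage clustering; whether this matches the "single-edge clustering" covered in the lecture is an assumption. The point cloud and function names are made up for the example, and `math.dist` requires Python 3.8 or later.

```python
import math


def find(parent, x):
    """Root of x in a union-find forest, with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def minimum_spanning_tree(points):
    """Kruskal's algorithm on the complete Euclidean graph of a point cloud.

    Returns the MST as a list of (length, i, j) edges sorted by length.
    """
    n = len(points)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))
    tree = []
    for length, i, j in edges:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:  # the edge joins two components: keep it
            parent[ri] = rj
            tree.append((length, i, j))
    return tree


def mst_clusters(points, n_clusters):
    """Cut the (n_clusters - 1) longest MST edges and return cluster labels."""
    tree = minimum_spanning_tree(points)  # already sorted by length
    kept = tree[: len(points) - n_clusters]
    parent = list(range(len(points)))
    for _, i, j in kept:
        parent[find(parent, i)] = find(parent, j)
    roots = [find(parent, i) for i in range(len(points))]
    relabel = {}
    # Number the clusters 0, 1, ... in order of first appearance
    return [relabel.setdefault(r, len(relabel)) for r in roots]


# Two well-separated groups of three points each (made-up data)
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
tree = minimum_spanning_tree(points)
labels = mst_clusters(points, n_clusters=2)
print(tree, labels)
```

The quadratic number of candidate edges is fine for a small demo. For planar point clouds, the MST is contained in the Delaunay triangulation, so the triangulation's edges can serve as candidates for larger inputs.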
June 29:
Hour 1. Follow-up and generalization: simplicial complexes. Recovering shapes from point clouds. Rips, Čech and Alpha complexes and a comparison of these constructions. Practical session on generating some basic point clouds (such as a circle, a sphere, a torus) and reconstructing these sets with simplicial complexes using Gudhi (C++ or Python version).
Hour 2. Euler characteristic, homology groups, Betti numbers: definitions and basic examples. Computation of the Betti numbers of the sets obtained in Hour 1.
Hour 3. Filtered complexes and persistent homology. Extending the previous constructions to build a parameter-free descriptor of the data.
Hour 4. Examples of cubical complexes obtained from image data, numerical simulations and point cloud data. Experiments with homology and persistent homology of cubical complexes (theory with practical experiments).
Hour 5. Persistence representations. How to use persistent homology as an input for machine learning. Experiments with Gudhi.
Hour 6. If time permits, experiments with detection of (semi-)periodicity using persistent homology.
Attendees are kindly asked to bring their own laptops to participate in these sessions.