## PhD Track

## Data Science and Artificial Intelligence

### Overview

Nowadays, the major players in the economic world are becoming increasingly aware of the potential of their data. They are constantly looking for ways of exploiting their data and extracting as much useful information as possible from them. The role of data scientists is to help companies in this task, by means of acquiring, storing, organizing and processing this mass of information in order to extract value. An expert in data science and artificial intelligence should have cross-disciplinary skills, ranging from a solid background in mathematics and statistics to mastering the IT tools and infrastructure necessary for data management and processing. In addition, data scientists must have the curiosity and thirst to understand the application domain in which they work. The objective of this program is to prepare students to become leaders in data science research and to help them to develop the research skills needed to pursue careers in academia or in industry.

**Language of instruction:** English and French

**ECTS:** 120 for the first 2 years (MSc)

**Orientation:** Research

**Duration:** MSc (2 years) and PhD (3 years)

**Course location:** IP Paris

### Educational Objectives

Advances in data acquisition, computational speed and data analysis methods have marked the beginning of a major transformation that profoundly affects all sectors, from e-commerce to scientific research, finance and health. Exploiting immense amounts of data requires sophisticated mathematical techniques to extract relevant information and incorporate it into an artificial intelligence system. All these methods form the basis of data science and artificial intelligence. The transition from data to knowledge and algorithms brings many challenges that require an interdisciplinary approach.

Data science and artificial intelligence rely heavily on the statistical processing of information: mathematical statistics, numerical methods, statistical learning and machine learning. From exploratory data analysis to the most sophisticated techniques of inference (hierarchical graphical models) and classification or regression (deep learning, support vector machines, etc.), a wide range of mathematical and computational statistics and learning methods is used. To be applied on a large scale, these methods require mastery of data distribution mechanisms and very large-scale computation. Applied mathematics (functional analysis, numerical analysis, convex and non-convex optimization) also has an essential role to play in building artificial intelligence systems.

From an application point of view, data science and artificial intelligence have a strong impact on many sectors. There is currently a large worldwide shortage of data scientists and experts in artificial intelligence, so graduates of data science and artificial intelligence programs are in high demand on the job market. In the present track, we train students to be doctoral candidates. The program helps students develop research skills through a research seminar, research mentoring by professors of IP Paris, a summer school preparing students to carry out cutting-edge research in data science and artificial intelligence, and a first research project before the final internship.

### Program Structure

**First year courses**

Candidates will choose one host institution from the following options:

- Ecole Polytechnique
- ENSTA Paris
- Telecom Paris and Telecom SudParis
- ENSAE Paris

Each student enrolled in this program will have a tutor, who is a faculty member of the host institution. Most first-year courses will be chosen from the curriculum of the host institution, with the possibility of complementing them with courses from the other institutions. The program of each student has to be discussed with and validated by the tutor. A selection of the proposed courses, grouped by theme, is presented below:

**a) Courses on statistics**

Ecole Polytechnique, Telecom Paris and ENSAE Paris offer introductory and more advanced courses in statistics. In addition, ENSAE offers an individual project on applied statistics. There are also a number of courses on more specialized topics, such as:

- Bayesian filtering in hidden Markov models (Telecom SudParis)
- Parametric statistics and extreme value theory (Telecom Paris)
- Sequential Monte Carlo methods (Telecom SudParis)
- Monte Carlo Methods (Ecole Polytechnique)
- Introduction to hypothesis testing and sampling theory (Telecom Paris)
- Martingales and asymptotic statistics (Telecom Paris)
- Introduction to time series (Telecom Paris and ENSAE Paris)
- Linear models in statistics (Telecom Paris)
- Causal Inference (Ecole Polytechnique)
- Statistical modeling seminar (ENSAE Paris)

**b) Courses on machine learning and learning theory**

Ecole Polytechnique, Telecom Paris and ENSAE Paris offer introductory courses on machine learning and learning theory that prepare students for the more advanced courses of the second year of the program. In addition, Ecole Polytechnique offers an individual project on statistical learning. There are also a number of more specialized courses at Ecole Polytechnique, Telecom Paris and Telecom SudParis, including:

- Data Analysis and Unsupervised learning (Ecole Polytechnique)
- Optimization for machine learning (Telecom Paris)
- Machine learning for text mining (Telecom Paris)
- Introduction to deep learning (Ecole Polytechnique and Telecom Paris)

**c) Courses on probability theory and related topics**

Students admitted to this program are expected to have a solid background in probability theory. In some exceptional cases, students might be invited to attend the probability theory course at ENSAE. A number of advanced courses on related topics are offered:

- Martingales and Stochastic Algorithms (ENSTA Paris)
- Introduction to Stochastic Calculus (ENSTA Paris and Telecom Paris)
- Probabilistic numerical methods (ENSTA Paris)
- Introduction to stochastic processes (Telecom Paris and ENSAE Paris)
- Hilbert spaces, mathematical statistics and Probability (Telecom Paris)
- Simulation and Monte Carlo methods (ENSAE)
- Stochastic models for Finance (Ecole Polytechnique)

**d) Courses on optimization, operations research and control theory**

- Control theory (ENSTA Paris)
- Introduction to operations research (ENSTA Paris, Ecole Polytechnique)
- Games, graphs and operational research (ENSTA Paris)
- Advanced differentiable optimization (ENSTA Paris)
- Game theory (ENSAE Paris)

**e) Other courses**

There are a number of other courses that the students can choose. They range from mathematical courses like Spectral theory of self-adjoint operators to very applied courses such as Python for data scientists. The list is as follows:

- Communication Networks and Social Networks (Ecole Polytechnique)
- Spectral theory of self-adjoint operators (ENSTA Paris)
- Introduction to databases (ENSTA Paris)
- High performance scientific computing (ENSTA Paris)
- Numerical analysis (Telecom Paris)
- Signal Processing (Ecole Polytechnique)
- Introduction to R and Python (ENSAE Paris)
- LaTeX (ENSAE Paris)
- Econometrics (ENSAE Paris)
- Survey theory (ENSAE Paris)
- C++ (ENSAE Paris)
- Python for data scientists (ENSAE Paris)

**Second year courses**

Each student enrolled in the program will be assigned an advisor, who is an IP Paris faculty member. Students have to successfully complete 60 ECTS by choosing courses among those offered in the *M2 datascience program*[1]. The individual program of each student has to be discussed with and validated by the advisor. The second year is split into four periods. Courses take place during the first three periods. The fourth and last period, of at least 14 weeks, corresponds to the final internship.

**1st period: September to Mid-November**

- Optimization for Data Science
- Bayesian Learning for partially observed dynamical systems
- Introduction to Bayesian learning
- Statistical Learning Theory
- Convex Analysis and Optimization Theory
- Machine Learning
- Visualization and Visual Analytics for Data Science
- Introduction to Graphical Models
- Deep Learning I
- Big Data Frameworks
- Data Camp

**2nd period: Mid-November to end of January**

- Optimization for Data Science
- Bootstrap and resampling methods in machine learning
- High dimensional matrix estimation
- Partially observed Markov chains in signal and image
- Convex Analysis and Optimization Theory
- Reinforcement learning
- Graphical models for large scale content access
- Theoretical guidelines for high-dimensional data analysis
- Generalisation properties of algorithms in ML
- Research project I

**3rd period: February to Early April**

- Deep learning I
- Missing Data and causality
- Audio and music information retrieval
- Tail events analysis: Robustness, outliers and models for extreme values
- Structured Data: learning and prediction
- Systems for Big Data Analytics
- Multi-object estimation and filtering
- Stochastic approximation and reinforcement learning
- Kernel Techniques with Information Theoretical Applications
- Research project II

**4th period: Internship**

Final internship in academia, with a minimum duration of 14 weeks, **starting in April**.

[1] Detailed information on the *M2 datascience* program is available here

### Involved Laboratories

- LTCI: Information Processing and Communications Laboratory (Télécom Paris)
- SAMOVAR (Télécom SudParis)
- CMAP: Applied Mathematics Center (Ecole Polytechnique)
- CREST: Center for Research in Economics and Statistics (ENSAE Paris)
- UMA: Applied Mathematics UER (ENSTA Paris)

### Admissions

Application guidelines for a PhD Track at IP Paris

**Academic prerequisites**

A bachelor's degree in mathematics or statistics, with outstanding results. Evidence of research potential is essential, as the main goal of such a PhD program is to train leaders in data science research. We expect competition to join the program to be high; therefore, even strong results in a very good bachelor program may not guarantee admission to the program.

Students who have completed the first year of an equivalent program may exceptionally be directly admitted to the second year (4-year PhD program).

**Language prerequisites**

A certificate of competence in English (TOEFL, IELTS, TOEIC or Cambridge ESOL) at level B2. Students who studied at English-speaking colleges are exempt.

**Application timeline**

The deadline for PhD Track applications is February 28, 2020, **extended to March 22, 2020**.

Applicants deemed eligible on the basis of the documentation provided will be contacted for an interview from March 15 onwards.

You will receive an answer 2 months after the application deadline of the session.

### Tuition Fees

**Annual tuition fees for years 1 and 2 (MSc):** For PhD track students, IP Paris has decided to subsidize the tuition fees for years 1 and 2, bringing them down from 6250 euros (non-EU students) / 4250 euros (EU students) to the level of the official tuition fees of the Ministry of Higher Education, Research and Innovation for EU students (243 euros in 2019-2020).

**Annual tuition fees for years 3 to 5 (PhD):** Official tuition fees of the Ministry of Higher Education, Research and Innovation (EU and non-EU students pay the EU fees; 380 euros in 2019-2020).

*An annual scholarship may be granted to the best applicants for the first 2 years (MSc). Details will be communicated by February 15, 2020.*

*Successful applicants will obtain a three-year PhD scholarship for years 3 to 5.*