## Mathematical Foundations of Machine Learning [MA4801]

### Sommersemester 2018

### Prof. Dr. Michael M. Wolf

Lecturer: |
Prof. Dr. Michael M. Wolf | |

Assistant: |
Javier Cuesta | |

Lectures: |
Wednesday, 12:15-13:45, LMU Physik Hörsaal (Garching Forschungszentrum) | Anmeldung |

Exercises: |
Tuesday, 12:15 - 13:45, LMU Physik Hörsaal (Garching Forschungszentrum) Friday, 14:15-15:45, room MI 02.08.020 |
Anmeldung |

### Exam

- The exam will be on Monday, July 30, 11:00 - 12:00 in the Physics department (rooms 5101.EG.501 and 5101.EG.503).
- Notes are allowed only on a single handwritten piece of paper, size A5 (!), double-sided.
- Other literature, as well as electronic utilities, are not allowed.

### Content

The course will provide an introduction to the mathematical foundations of supervised learning theory, neural networks, support vector machines and kernel methods.Date | Content | To read | Black board notes |
---|---|---|---|

April 11 | Intro, Statistical learning theory framework, error decomposition | Intro, 1.1 and 1.2 in lecture notes | |

April 18 | PAC bounds, growth function, VC-dimension and VC-dichotomy | 1.3, 1.5 and 1.6 in lecture notes | lec2 |

April 25 | VC-dimension of vector spaces, covering and packing numbers | 1.6 and 1.9 in lecture notes | lec3 |

May 2 | Covering number PAC bound, pseudo and fat-shattering dimension | 1.9 and 1.10 in lecture notes | lec4 |

May 9 | Pseudo and fat-shattering dimension, uniform stability, on-average stability, regularization | 1.10 and 1.11 in lecture notes | |

May 16 | Stochastic learning algorithms, differential privacy, KL-divergence | parts of 1.12 in lecture notes | lec6 |

May 23 | Mutual information bound, ensemble methods, Bagging, Ada-Boost | parts of 1.12 and 1.14 in lecture notes | |

May 30 | Gradient boosting, biological and artificial neural nets, Perceptrons, Boolean function representation | 2.1, 2.2 and parts of 1.14 and 2.3 in lecture notes | |

June 6 | Storage capacity of neural networks | 2.3 in lecture notes | |

June 13 | Accuracy bounds for non-linear approximations with continuous parameter dependence, Exponential benefits of depth | ..., 2.10 in lecture notes | |

June 20 | Relations between VC-dimension, computational complexity, approximation capability and depth of neural networks | 1.6 in lecture notes, ... | |

June 27 | (Stochastic) gradient descent, backpropagation, algorithmic differentiation | 2.6, 2.7 in lecture notes | |

July 4 | Convergence of (stochastic) gradient descent | 2.8 in lecture notes | |

July 11 | Saddle points, local minima, overview on kernelized SVMs | 2.9, 3 in lecture notes |

### Prerequisites

Basic knowledge in linear algebra, analysis and probability theory is required as well as some elementary Hilbert space theory.Date | List of major updates for lecture notes |
---|---|

June 1 | Sections on Bagging and Gradient Boosting added |

June 5 | Paragraphs added on general NN models and on storage capacity |

June 24 | Added lower bound on VCdim for parameterized function classes of bounded computational complexity |

July 3 | Added section on algorithmic differentiation |

File | Date | Content | Comments | Solution |
---|---|---|---|---|

Exercise class 1 | Week 17 - 24 April | Basic probability review, Hoeffding's inequality | Basic Prob. Review | Solution 1 |

Exercise class 2 | Week 24 April - 1 May | PAC learning, VC dimension and VC-dichotomy | Wording in H2.2 changed | Solution 2 |

Exercise class 3 | Week 8 - 15 May | Covering numbers, pseudo dimension, and fat-shattering-dimension | Typo in H3.3b corrected | Solution 3 |

Exercise class 4 | Week 15 - 22 May | Algorithmic Stability and Uniform covering number | H4.3 a bit simplified and more hints added | Solution 4 |

Exercise class 5 | Week 22 - 29 May | Stability and generalized bounds | Solution 5 | |

Exercise class 6 | Week 29 May - 5 June | Relative entropy bounds | H6.2c Simplified | Solution 6 |

Exercise class 7 | Week 5 June - 12 June | Emsemble methods, Ada-Boost and Neural netwoks | Typo in H7.2 and Hints added | Solution 7 |

Exercise class 8 | Week 12 June - 19 June | VC dimension of Neural networks, Approximation by Neural networks | Typos in H8.1 and H8.3a corrected | Solution 8 |

Exercise class 9 | Week 19 June - 26 June | Geometry and complexity of neural networks | Solution 9 | |

Exercise class 10 | Week 26 June - 3 July | Pseudo-dimension, memorization capacity and approximation of Neural networks | Solution 10 | |

Exercise class 11 | Week 3 July - 6 July | Algorithmic differentiation, LP and Learning | Solution 11 | |

Exercise class 12 | Week 6 July - 13 July | Open discussion/Preparation for the Exam |

### Literature

There are many good books on the topic. Recent examples with a focus on mathematical aspects are:- Foundations of Machine Learning, M. Mohri, A. Rostamizadeh, A. Talwalkar, MIT Press, 2012
- Understanding Machine Learning: From Theory to Algorithms, Shai Shalev-Shwartz, Shai Ben-David, Cambridge University Press, 2014

- Neural Network Learning: Theoretical Foundations, M. Anthony, P.L. Bartlett, Cambridge University Press, 1999
- Statistical Learning Theory, V.N. Vapnik, John Wiley & Sons, 1998