DATA8003 Theoretical Foundation of Deep Learning, 2023


Deep learning has achieved great success in many real-world applications. However, the reason why deep learning is so powerful remains elusive. The goal of this course is to introduce theoretical tools and methods developed to understand and explain the success of deep learning. In particular, this course will cover multiple aspects of machine learning, including landscape analysis, optimization, generalization, and algorithm design. We will start with the basic setup of machine learning problems, including loss functions, training algorithms, and generalization performance evaluation. Then we will introduce conventional optimization theory and statistical learning theory, and discuss their limitations in studying over-parameterized deep neural network models. We will also introduce neural tangent kernel (NTK) theory, a modern theoretical framework that can handle over-parameterization and nonconvexity in deep learning. Finally, we will discuss representation learning and benign overfitting of over-parameterized learning models, and their connections to optimization and generalization in deep learning. The instructor will give lectures on the selected topics. Students will need to complete the homework (including programming and mathematical derivations) and a course project.
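As a taste of the basic setup covered in the first lectures, the sketch below (illustrative only, not course material) trains a linear regression model by gradient descent on a squared-error loss and measures the generalization gap (test loss minus training loss). All names and constants here are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from a ground-truth linear model y = <w*, x> + noise
n_train, n_test, d = 100, 100, 5
w_star = rng.normal(size=d)
X_train = rng.normal(size=(n_train, d))
y_train = X_train @ w_star + 0.1 * rng.normal(size=n_train)
X_test = rng.normal(size=(n_test, d))
y_test = X_test @ w_star + 0.1 * rng.normal(size=n_test)

def mse(w, X, y):
    """Squared-error loss: the empirical risk for this model."""
    return np.mean((X @ w - y) ** 2)

# Training algorithm: plain gradient descent on the training loss
w = np.zeros(d)
lr = 0.1
for _ in range(500):
    grad = 2 * X_train.T @ (X_train @ w - y_train) / n_train
    w -= lr * grad

# Generalization evaluation: compare training and test risk
train_loss = mse(w, X_train, y_train)
test_loss = mse(w, X_test, y_test)
gen_gap = test_loss - train_loss
print(f"train loss {train_loss:.4f}, test loss {test_loss:.4f}, gap {gen_gap:.4f}")
```

The three ingredients here (loss function, training algorithm, held-out evaluation) are exactly the objects whose behavior the course's theory is meant to explain for deep networks.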

Office Hour

  • Tuesday, 4 pm - 5 pm, CB204F (starting from next week)
  • Please also feel free to contact me via email.
  • Students who are not yet in Moodle, please sign in here


Grading

  • Attendance 10%
  • Paper Presentation 20%
  • Homework 20%
  • Final Project 50%

Course Content

Paper Presentation:

  • During the first 2 weeks (due by Sept. 17), each student needs to find 2 papers related to the foundations of deep learning and send them to the instructor (a Google Sheet will be provided).
  • Before the 3rd week, the instructor will review these papers and decide whether they are suitable for presentation.
  • Starting from the 4th week, one or two students will present a paper during each class. Each presentation will be around 20 minutes, and students need to prepare slides.

Link for candidate papers.

Link for signing up for the presentation.


  • Each team should have at most 2 students (3 students are allowed in exceptional cases). The proposal is due in mid-semester.

Link for signing in the final project presentation.

The project can be:

  • Understanding of phenomena in deep learning: either empirical or theoretical.
  • Analysis of deep learning algorithms: convergence, generalization, or implicit bias.
  • A solution to new deep learning problems: new training paradigm, new algorithms, new evaluations, new objectives.

The project can also be (teaming up is not allowed):

  • A summary of a series of interesting papers:
      • You are required to summarize at least 3 papers working on the same topic.
      • These papers need to be related to the understanding or theory of deep learning, rather than applications of deep learning to a particular domain.
      • These papers can include the paper you presented.
      • A clear logical thread and a comparison between these papers need to be presented.