We will use the following textbooks for this course:

[HTF] The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Trevor Hastie, Robert Tibshirani, and Jerome Friedman. Springer, 2001. Q325.75.F75 2001 c. 1. Available at http://statweb.stanford.edu/~tibs/ElemStatLearn/.

[BIS] Pattern Recognition and Machine Learning. Christopher M. Bishop. Springer, 2009. Q327.B52 2009 c. 1.

Other useful references are:

[MRT] Foundations of Machine Learning. Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. MIT Press, 2012. Q325.5 .M64 2012 c. 1.

[SSBD] Understanding Machine Learning: From Theory to Algorithms. Shai Shalev-Shwartz and Shai Ben-David. Cambridge University Press, 2014. Q325.5 .S475 2014 c. 1. Available at http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/understanding-machine-learning-theory-algorithms.pdf.

[MUR] Machine Learning: A Probabilistic Perspective. Kevin P. Murphy. MIT Press, 2012. Q325.5 .M87 2012 c. 1.

[TM] Machine Learning. Tom M. Mitchell. McGraw-Hill, 1997. Q325.5.M58 1997 c. 1.

Schedule

Introduction

T 1/29. Introduction to Machine Learning
Topics: What learning is; why and when we need machine learning; the different types of machine learning; recent successes and notable applications
Readings: MUR Chapter 1, BIS Chapter 1, SSBD Chapter 1

Th 1/31. Course overview, formal introduction
Topics: Supervised learning: task, performance, evaluation; classification, regression, loss functions, risk
Readings: MRT Chapter 1, HTF Chapter 1

F 2/1. Recitation: Probability and Statistics
Topics: Events, random variables, probabilities, pdf, pmf, cdf, mean, mode, median, variance, multivariate distributions, marginals, conditionals, Bayes' theorem, independence
Readings: Review slides; Aaditya Ramdas' tutorials; math review (BIS Appendix B; MRT Appendices A and C); BIS Chapter 2
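
As a quick illustration of Bayes' theorem from this recitation, here is a minimal Python sketch; the disease-testing numbers are made up purely for illustration:

    # Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B), with made-up
    # disease-testing numbers (prior, sensitivity, false-positive rate).
    p_disease = 0.01               # prior P(disease)
    p_pos_given_disease = 0.95     # sensitivity P(+|disease)
    p_pos_given_healthy = 0.05     # false-positive rate P(+|healthy)

    # Total probability: P(+) = P(+|dis.)P(dis.) + P(+|healthy)P(healthy)
    p_pos = (p_pos_given_disease * p_disease
             + p_pos_given_healthy * (1 - p_disease))

    # Posterior P(disease|+) by Bayes' theorem
    p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
    print(p_disease_given_pos)  # ~0.16: a positive test is far from conclusive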

T 2/5. Foundations
Topics: Bayes optimal rule, Bayes risk, empirical risk minimization (ERM), generalization error; supervised learning: classification and regression, rote learning, lazy learning, model fitting
Readings: SSBD Chapter 2, HTF Sections 2.1-2.3, MUR Sections 1.1-1.2
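
A minimal numpy sketch of empirical risk under the 0-1 loss, the quantity ERM minimizes; the tiny dataset and hand-picked threshold rule are illustrative assumptions, not course material:

    import numpy as np

    # Empirical risk of a classifier h under 0-1 loss:
    # R_hat(h) = (1/n) * sum_i 1[h(x_i) != y_i]
    X = np.array([[0.0], [1.0], [2.0], [3.0]])   # toy inputs
    y = np.array([0, 0, 1, 1])                   # toy labels

    def h(x):
        # a simple threshold rule chosen by hand for illustration
        return (x[:, 0] > 1.5).astype(int)

    empirical_risk = np.mean(h(X) != y)
    print(empirical_risk)  # 0.0 here; ERM picks the rule minimizing this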

Th 2/7. Recitation: Linear Algebra
Topics: Vector spaces, norms, metric spaces, inner product spaces, the Cauchy-Schwarz inequality, orthonormal bases
Readings: Video lecture; review notes; review slides

F 2/8. Recitation: MLE, MAP, Intro to Python
Topics: Parametric distributions, parameter estimation via MLE and MAP; introduction to Python, Jupyter notebooks, numpy
Readings: BIS Chapter 2; review notes; Python tutorial; Learn Python
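
A minimal numpy sketch of maximum likelihood estimation for a univariate Gaussian, in the spirit of this recitation; the simulated data and its parameters are illustrative assumptions:

    import numpy as np

    # MLE for a univariate Gaussian: mu_hat is the sample mean and
    # sigma2_hat is the (biased, 1/n) sample variance.
    rng = np.random.default_rng(0)
    x = rng.normal(loc=2.0, scale=1.5, size=1000)  # simulated data

    mu_hat = x.mean()                        # closed-form MLE of the mean
    sigma2_hat = ((x - mu_hat) ** 2).mean()  # closed-form MLE of the variance
    print(mu_hat, sigma2_hat)                # near 2.0 and 1.5**2 = 2.25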

T 2/12. Linear Regression
Topics: Linear functions, loss functions, empirical risk minimization, the least-squares solution, generalization, error decomposition
Readings: HTF Sections 2.3 and 3.2, BIS Section 3.1
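
A minimal numpy sketch of the least-squares solution covered above; the data, dimensions, and noise level are illustrative:

    import numpy as np

    # Least squares: w_hat = argmin_w ||Xw - y||^2, solved here with
    # np.linalg.lstsq (numerically preferable to (X^T X)^{-1} X^T y).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    w_true = np.array([1.0, -2.0, 0.5])
    y = X @ w_true + 0.1 * rng.normal(size=100)   # noisy linear data

    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(w_hat)  # close to w_true; training error is the empirical risk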

Th 2/14. Error analysis, statistical view
Topics: Bayes optimal predictor, the statistical view, Gaussian model, maximum likelihood estimation (MLE), polynomial regression, general additive regression, overfitting
Readings: HTF Sections 2.4, 2.6, 2.9; BIS Sections 1.1, 1.2, 3.1, 3.2; MUR Sections 7.1-7.3; SSBD Section 9.2
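
To make the polynomial-regression and overfitting topics concrete, a small sketch that fits polynomials of increasing degree to noisy data; the setup is an illustrative assumption:

    import numpy as np

    # Polynomial regression = linear regression on features [1, x, x^2, ...].
    # Higher degree always lowers training error but can overfit the noise.
    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 20)
    y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=x.size)

    for degree in (1, 3, 9):
        coeffs = np.polyfit(x, y, degree)      # least-squares polynomial fit
        train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
        print(degree, train_mse)               # training MSE shrinks with degree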

F 2/15. Regularization, gradient descent
Topics: Model complexity and overfitting, penalizing model complexity, description length, shrinkage methods, ridge regression, the Lasso, gradient descent
Readings: HTF Section 3.3; BIS Sections 1.1, 1.3, 3.1.4; MUR Section 7.5
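
A minimal sketch of ridge regression's closed-form solution; the regularization strength and data are illustrative choices:

    import numpy as np

    # Ridge regression: w_hat = (X^T X + lambda I)^{-1} X^T y.
    # The penalty lambda ||w||^2 shrinks coefficients and combats overfitting.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 5))
    y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + 0.1 * rng.normal(size=50)

    lam = 1.0  # regularization strength; lam = 0 recovers least squares
    d = X.shape[1]
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    print(w_ridge)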

T 2/19. Classification
Topics: Introduction, classification as regression, linear classifiers, risk, conditional risk, logistic regression, MLE, surrogate losses, generalized additive models
Readings: HTF Sections 4.1, 4.4; BIS Sections 1.5, 4.3.2; MUR Sections 8.1-8.3

Th 2/21. Logistic Regression
Topics: Log-odds ratio, the logistic function, gradient descent, Newton-Raphson
Readings: BIS Section 4.3.4; SSBD Sections 9.3, 14.1; MUR Section 8.5
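
A minimal numpy sketch of logistic regression trained by gradient descent, matching the topics above; the step size, iteration count, and toy data are illustrative assumptions:

    import numpy as np

    # Logistic regression: p(y=1|x) = sigmoid(w.x). Gradient descent on the
    # average negative log-likelihood, whose gradient is X^T(sigmoid(Xw)-y)/n.
    rng = np.random.default_rng(0)
    n = 200
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)   # linearly separable toy labels

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    w = np.zeros(2)
    lr = 0.5
    for _ in range(500):
        grad = X.T @ (sigmoid(X @ w) - y) / n   # gradient of the average loss
        w -= lr * grad
    print(w)  # points roughly along (1, 1), the true separating direction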

F 2/22. Recitation: Convex Optimization
Topics: Convex sets, convex functions, standard form, Lagrange multipliers, equivalence of the constrained and unconstrained forms of ridge regression and Lasso regression
Readings: MRT Appendix B; BIS Appendices D and E

T 2/26. Stochastic gradient descent
Topics: Overfitting in logistic regression, MAP estimation, regularization, softmax, stochastic gradient descent (SGD)
Readings: BIS Section 7.1; SSBD Sections 15.1-15.1.1; MUR Section 14.5
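
A minimal sketch of stochastic gradient descent on the same logistic loss, one randomly chosen example per update; the decaying learning-rate schedule is an illustrative choice:

    import numpy as np

    # SGD: update w with the gradient of a single example's loss, an
    # unbiased estimate of the full gradient at much lower cost per step.
    rng = np.random.default_rng(0)
    n = 500
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] - X[:, 1] > 0).astype(float)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    w = np.zeros(2)
    for t in range(1, 5000):
        i = rng.integers(n)                      # pick one example at random
        g = (sigmoid(X[i] @ w) - y[i]) * X[i]    # single-example gradient
        w -= (1.0 / np.sqrt(t)) * g              # decaying step size
    print(w)  # roughly along (1, -1)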

Th 2/28. Support Vector Machines
Topics: Optimal separating hyperplane, large-margin classifiers, margin and regularization, Lagrange multipliers, KKT conditions, max-margin optimization, quadratic programming, support vectors
Readings: HTF Section 4.5; MRT Sections 4.1-4.2

F 3/1. TBD

T 3/5. Support Vector Machines II
Topics: The non-separable case, SVMs with slack variables, the SVM loss, solving the SVM in the primal, SVM regression
Readings: BIS Section 7.1; SSBD Sections 15.1-15.1.1; MUR Section 14.5
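
A minimal sketch of solving the soft-margin SVM in the primal by subgradient descent on the regularized hinge loss; the hyperparameters and data are illustrative, and the Pegasos-style step size is one standard choice, not necessarily the course's reference method:

    import numpy as np

    # Primal soft-margin SVM (no bias, for brevity):
    #   (lam/2)||w||^2 + (1/n) sum_i max(0, 1 - y_i w.x_i),  y_i in {-1,+1}.
    # The hinge term contributes subgradient -y_i x_i where the margin fails.
    rng = np.random.default_rng(0)
    n = 200
    X = rng.normal(size=(n, 2))
    y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1.0, -1.0)

    lam = 0.1
    w = np.zeros(2)
    for t in range(1, 2000):
        margins = y * (X @ w)
        violators = margins < 1                  # examples inside the margin
        subgrad = lam * w - (y[violators, None] * X[violators]).sum(axis=0) / n
        w -= (1.0 / (lam * t)) * subgrad         # Pegasos-style step size
    print(w)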

Th 3/7. Kernel methods
Topics: Solving the SVM in the primal, subgradients, subgradient descent, nonlinear features, feature spaces, the kernel trick, the representer theorem, kernel SVM in the primal, Mercer kernels, the radial basis function (RBF) kernel, kernel SVM, SVM regression
Readings: BIS Sections 6.1, 6.2; MRT Sections 5.1, 5.2, 5.3.1-5.3.2; SSBD Sections 15.2, 15.4-15.5; MUR Sections 14.1-14.2
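
A minimal sketch of the kernel trick with the RBF kernel: the Gram matrix gives inner products in an implicit feature space without ever constructing the features; the bandwidth and data are illustrative:

    import numpy as np

    # RBF kernel: k(x, z) = exp(-||x - z||^2 / (2 sigma^2)).
    # By the representer theorem, a kernel method only needs the n x n
    # Gram matrix K: predictions are weighted sums of k(x, x_i) terms.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 3))
    sigma = 1.0

    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq_dists / (2 * sigma ** 2))
    print(K.shape, np.allclose(K, K.T))  # (5, 5) True: symmetric Gram matrix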

F 3/8. Recitation: Classification
Topics: Classification as regression; linear regression, logistic regression, SVMs, empirical risk minimization, losses
Readings: Notes

T 3/12. MIDTERM REVIEW

Th 3/14. MIDTERM

F 3/15. Discuss MIDTERM solutions

Spring Break (3/18 - 3/22)