In Defence of Uniform Convergence: explaining generalization for interpolating predictors
4th December 2020, 3:00 pm – 4:00 pm
In deep learning, networks are often trained with stochastic gradient descent until they reach zero training error. Although such networks interpolate the training data, they often perform well in practice. In this talk, I will present some of my recent work on interpolating predictors. I will discuss some of the roadblocks that arise when using uniform convergence bounds to explain the performance of complex models. I will then describe a new role that uniform bounds may play in studying interpolating predictors by (i) defining uniform convergence when the model complexity grows with the sample size; and (ii) studying the generalization error of an interpolating predictor in terms of a “derandomized” surrogate hypothesis, where a predictor is partially derandomized or rerandomized, e.g., fit to the training data but with modified label noise. As an application of “derandomized” surrogate analysis, I will present our results on over-parameterized linear regression. This is joint work with Jeffrey Negrea and Daniel M. Roy.
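For readers unfamiliar with the setting, the following is a minimal sketch of an interpolating predictor in over-parameterized linear regression. It is an illustration only, not the analysis presented in the talk: the minimum-norm least-squares solution, the Gaussian design, the noise level, and the names `min_norm_interpolator`, `n`, `d`, and `w_true` are all assumptions chosen for the example.

```python
import numpy as np


def min_norm_interpolator(X, y):
    """Return the minimum-l2-norm w with X @ w = y.

    Assumes X has full row rank (more features than samples), so an
    exact interpolating solution exists. This is an illustrative choice
    of interpolating predictor, not necessarily the one from the talk.
    """
    return np.linalg.pinv(X) @ y


# Hypothetical synthetic problem: n samples, d >> n features.
rng = np.random.default_rng(0)
n, d = 20, 200
w_true = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)  # noisy labels

w_hat = min_norm_interpolator(X, y)

# Training error is zero up to floating-point precision: the predictor
# fits the label noise exactly.
train_mse = float(np.mean((X @ w_hat - y) ** 2))

# Error against the noiseless target on fresh inputs.
X_test = rng.normal(size=(1000, d))
test_mse = float(np.mean((X_test @ w_hat - X_test @ w_true) ** 2))

print(f"train MSE: {train_mse:.2e}")
print(f"test MSE:  {test_mse:.3f}")
```

The point of the sketch is only that the fitted predictor reproduces the noisy training labels exactly, which is the regime in which classical uniform convergence arguments run into the roadblocks mentioned above.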