Towards Building a Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks
Statistics Seminar
12th November 2021, 4:00 pm – 5:00 pm
Virtual Seminar, Zoom link: TBA
In this talk, I will focus on the 'tail behavior' in SGD in deep learning. I will first empirically illustrate that heavy tails arise in the gradient noise (i.e., the difference between the stochastic gradient and the true gradient). Accordingly I will propose to model the gradient noise as a heavy-tailed α-stable random vector, and accordingly propose to analyze SGD as a discretization of a stochastic differential equation (SDE) driven by a stable process. As opposed to classical SDEs that are driven by a Brownian motion, SDEs driven by stable processes can incur ‘jumps’, which force the SDE (and its discretization) transition from 'narrow minima' to 'wider minima', as proven by existing metastability theory and the extensions that we proved recently. These results open up a different perspective and shed more light on the view that SGD 'prefers' wide minima. In the second part of the talk, I will focus on the generalization properties of such heavy-tailed SDEs and show that the generalization error can be controlled by the Hausdorff dimension of the trajectories of the SDE, which is closely linked to the tail behavior of the driving process. Our results imply that heavier-tailed processes should achieve better generalization; hence, the tail-index of the process can be used as a notion of "capacity metric”. Finally, if time permits, I will talk about the 'originating cause' of such heavy-tailed behavior and present theoretical results which show that heavy-tails can even emerge in very sterile settings such as linear regression with iid Gaussian data.
The talk will be based on the following papers:
U. Şimşekli, L. Sagun, M. Gürbüzbalaban, "A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks", ICML 2019
U. Şimşekli, O. Sener, G. Deligiannidis, M. A. Erdogdu, "Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks", NeurIPS 2020
M. Gurbuzbalaban, U. Simsekli, L. Zhu, "The Heavy-Tail Phenomenon in SGD", ICML 2021
Comments are closed.