Adaptive and nonlinear damping for model training
Statistics Seminar
17th January 2025, 1:00 pm – 2:00 pm
Fry Building, 2.04
We discuss novel damping procedures for training large-scale Bayesian data models
such as deep neural networks. Drawing inspiration from the concept of a thermostat
(widely used for temperature regulation in molecular dynamics), we introduce
kinetic energy controls on the individual parameter velocities of the model. This
approach can be likened to component-wise Nosé-Hoover-style thermostatting
taken in the zero-temperature limit, and it can be directly related to the introduction
of cubic damping, a vibration-suppression mechanism used in structural engineering
applications. While a large momentum parameter helps to overcome barriers
and progress training in low-curvature regions, it should be reduced in areas of
steep gradient to avoid instability; our adaptive scheme allows this adjustment to
be performed automatically, on a per-parameter basis. By using these schemes,
we obtain enhanced efficiency, including significant speedups and test accuracy
improvements in representative classification tasks (Fashion MNIST, CIFAR10/100,
SST2).
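To illustrate the cubic-damping connection mentioned above, here is a minimal sketch of momentum-style updates with an added per-parameter cubic friction term. This is an illustrative toy on a quadratic loss, not the speaker's actual scheme; the function name, hyperparameters, and loss are assumptions made for the example.

```python
import numpy as np

def cubic_damped_momentum_step(theta, v, grad, lr=0.01, gamma=0.1, c=1.0):
    """One semi-implicit Euler update of momentum dynamics with an extra
    cubic damping force ~ -c * v**3 applied per parameter (illustrative).
    Large velocities are damped sharply; small ones are barely touched,
    mimicking the adaptive suppression described in the abstract."""
    v = v - lr * (grad(theta) + gamma * v + c * v**3)
    theta = theta + lr * v
    return theta, v

# Toy quadratic loss L(theta) = 0.5 * ||theta||^2, so grad(theta) = theta.
grad = lambda th: th
theta = np.array([5.0, -3.0])
v = np.zeros(2)
for _ in range(10000):
    theta, v = cubic_damped_momentum_step(theta, v, grad)
# Both the parameters and the velocities shrink toward zero: the cubic
# term suppresses the large early velocities, the linear term the tail.
```

The key design point is that the damping acts component-wise: each coordinate's velocity is penalized by its own cube, so fast-moving parameters are slowed without throttling slow ones.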