Benefit of deep learning: Statistical efficiency and optimization guarantee with non-convex noisy gradient descent
12th February 2021, 10:00 am – 11:00 am
In this talk, I discuss how deep learning can statistically outperform
shallow methods, such as kernel methods, while retaining optimization guarantees.
First, I will briefly summarize our recent work on the excess risk
bounds of deep learning in the Besov space and its variants, in which
sparsity and the non-convex geometry of the target function space play
an essential role. In the latter half, I present a deep learning
optimization framework based on a noisy gradient descent in an
infinite dimensional Hilbert space (gradient Langevin dynamics), and
show generalization error and excess risk bounds for the solution
obtained by the optimization procedure. The proposed framework can
deal with finite- and infinite-width networks simultaneously, unlike
existing ones such as the neural tangent kernel and mean-field analysis.
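As a rough, finite-dimensional illustration of the noisy gradient descent underlying this framework (the talk itself works in an infinite-dimensional Hilbert space), a single gradient Langevin dynamics step adds scaled Gaussian noise to the usual gradient update; the step size `eta` and inverse temperature `beta` below are illustrative choices, not values from the talk:

```python
import numpy as np

def gld_step(theta, grad_loss, eta=1e-2, beta=1e3, rng=None):
    """One gradient Langevin dynamics update:
    theta <- theta - eta * grad_loss(theta) + sqrt(2*eta/beta) * N(0, I)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(theta.shape)
    return theta - eta * grad_loss(theta) + np.sqrt(2.0 * eta / beta) * noise

# Toy usage: run the noisy dynamics on the quadratic loss
# L(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
theta = np.ones(3)
for _ in range(2000):
    theta = gld_step(theta, lambda t: t)
```

For large `beta` the noise term vanishes and the update reduces to plain gradient descent; the injected noise is what lets the dynamics explore a non-convex landscape rather than stopping at the first stationary point.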
Moreover, I will show that deep learning can avoid the curse of
dimensionality in a teacher-student setting, and eventually achieves
a better excess risk than kernel methods.