One paper accepted to AISTATS

Our paper "Linear convergence of gradient descent for overparametrized finite width two-layer linear networks with general initialization"(Xu et al., 2023) is accepted to AISTATS2023!

Reference

Linear Convergence of Gradient Descent For Finite Width Over-parametrized Linear Networks With General Initialization

Ziqing Xu, Hancheng Min, Salma Tarmoun, Enrique Mallada, and Rene Vidal

In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, Apr 2023

Abs PDF

Recent theoretical analyses of the convergence of gradient descent (GD) to a global minimum for over-parametrized neural networks make strong assumptions on the step size (infinitesimal), the hidden-layer width (infinite), or the initialization (spectral, balanced). In this work, we relax these assumptions and derive a linear convergence rate for two-layer linear networks trained using GD on the squared loss in the case of finite step size, finite width and general initialization. Despite the generality of our analysis, our rate estimates are significantly tighter than those of prior work. Moreover, we provide a time-varying step size rule that monotonically improves the convergence rate as the loss function decreases to zero. Numerical experiments validate our findings.