On the Convergence of Gradient Flow on Multi-layer Linear Models
Hancheng Min, René Vidal, and Enrique Mallada
In the 40th International Conference on Machine Learning (ICML), Jul 2023
In this paper, we analyze the convergence of gradient flow on a multi-layer linear model with a loss function of the form f(W_1W_2⋯W_L). We show that when f satisfies the gradient dominance property, proper weight initialization leads to exponential convergence of the gradient flow to a global minimum of the loss. Moreover, the convergence rate depends on two trajectory-specific quantities that are controlled by the weight initialization: the imbalance matrices, which measure the difference between the weights of adjacent layers, and the least singular value of the weight product W = W_1W_2⋯W_L. Our analysis exploits the fact that the gradient of the overparameterized loss can be written as the composition of the non-overparameterized gradient with a time-varying (weight-dependent) linear operator whose smallest eigenvalue controls the convergence rate. The key challenge we address is to derive a uniform lower bound on this time-varying eigenvalue that leads to improved rates for several multi-layer network models studied in the literature.
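The toy sketch below (not code from the paper) illustrates these dynamics: it runs a forward-Euler discretization of the gradient flow on a three-layer linear model with the quadratic loss f(W) = 0.5·‖W − A‖_F², which satisfies gradient dominance. The balanced initialization makes every imbalance matrix D_l = W_l^T W_l − W_{l+1} W_{l+1}^T vanish at time zero (one common convention for the product order W_1W_2⋯W_L); the dimensions, the target A, and the step size are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy sketch (not the paper's code): forward-Euler
# discretization of gradient flow on f(W_1 W_2 W_3) with
# f(W) = 0.5 * ||W - A||_F^2, a loss satisfying gradient dominance.
rng = np.random.default_rng(0)
L, n, eta, steps = 3, 4, 5e-3, 4000
A = rng.standard_normal((n, n))          # target defining the loss (assumed)

def chain(mats):
    """Left-to-right matrix product; identity for an empty list."""
    P = np.eye(n)
    for M in mats:
        P = P @ M
    return P

# Balanced initialization: factor a random W0 = U diag(s) V^T and give
# each layer a diag(s)^(1/L) core, so that every imbalance matrix
# D_l = W_l^T W_l - W_{l+1} W_{l+1}^T is zero at initialization.
U, s, Vt = np.linalg.svd(rng.standard_normal((n, n)))
S_root = np.diag(s ** (1.0 / L))
Ws = [U @ S_root] + [S_root.copy() for _ in range(L - 2)] + [S_root @ Vt]

for t in range(steps + 1):
    W = chain(Ws)
    G = W - A                             # gradient of f at the product W
    if t % 500 == 0:
        loss = 0.5 * np.linalg.norm(G) ** 2
        imb = np.linalg.norm(Ws[0].T @ Ws[0] - Ws[1] @ Ws[1].T)
        print(f"step {t:5d}  loss {loss:.3e}  imbalance {imb:.2e}")
    # Chain rule: grad wrt W_l is (W_1...W_{l-1})^T G (W_{l+1}...W_L)^T.
    grads = [chain(Ws[:l]).T @ G @ chain(Ws[l + 1:]).T for l in range(L)]
    for l in range(L):
        Ws[l] -= eta * grads[l]           # explicit Euler step on the flow
```

For a small enough step size, the printed loss decays roughly geometrically and the imbalance norm stays near zero, consistent with the fact that the imbalance matrices are conserved exactly by the continuous-time flow.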