On the Convergence of Gradient Flow on Multi-layer Linear Models
Hancheng Min, René Vidal, and Enrique Mallada
In the 40th International Conference on Machine Learning (ICML), Jul 2023
In this paper, we analyze the convergence of gradient flow on a multi-layer linear model with a loss function of the form f(W_1W_2⋯W_L). We show that when f satisfies the gradient dominance property, proper weight initialization leads to exponential convergence of the gradient flow to a global minimum of the loss. Moreover, the convergence rate depends on two trajectory-specific quantities that are controlled by the weight initialization: the imbalance matrices, which measure the difference between the weights of adjacent layers, and the least singular value of the weight product W = W_1W_2⋯W_L. Our analysis exploits the fact that the gradient of the overparameterized loss can be written as the composition of the non-overparameterized gradient with a time-varying (weight-dependent) linear operator whose smallest eigenvalue controls the convergence rate. The key challenge we address is to derive a uniform lower bound on this time-varying eigenvalue that leads to improved rates for several multi-layer network models studied in the literature.
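The toy sketch below (not code from the paper) illustrates these dynamics: it runs a forward-Euler discretization of the gradient flow on a three-layer linear model with the quadratic loss f(W) = 0.5·‖W − A‖_F², which satisfies gradient dominance. The balanced initialization makes every imbalance matrix D_l = W_l^T W_l − W_{l+1} W_{l+1}^T vanish at time zero (one common convention for the product order W_1W_2⋯W_L); the dimensions, the target A, and the step size are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy sketch (not the paper's code): forward-Euler
# discretization of gradient flow on f(W_1 W_2 W_3) with
# f(W) = 0.5 * ||W - A||_F^2, a loss satisfying gradient dominance.
rng = np.random.default_rng(0)
L, n, eta, steps = 3, 4, 5e-3, 4000
A = rng.standard_normal((n, n))          # target defining the loss (assumed)

def chain(mats):
    """Left-to-right matrix product; identity for an empty list."""
    P = np.eye(n)
    for M in mats:
        P = P @ M
    return P

# Balanced initialization: factor a random W0 = U diag(s) V^T and give
# each layer a diag(s)^(1/L) core, so that every imbalance matrix
# D_l = W_l^T W_l - W_{l+1} W_{l+1}^T is zero at initialization.
U, s, Vt = np.linalg.svd(rng.standard_normal((n, n)))
S_root = np.diag(s ** (1.0 / L))
Ws = [U @ S_root] + [S_root.copy() for _ in range(L - 2)] + [S_root @ Vt]

for t in range(steps + 1):
    W = chain(Ws)
    G = W - A                             # gradient of f at the product W
    if t % 500 == 0:
        loss = 0.5 * np.linalg.norm(G) ** 2
        imb = np.linalg.norm(Ws[0].T @ Ws[0] - Ws[1] @ Ws[1].T)
        print(f"step {t:5d}  loss {loss:.3e}  imbalance {imb:.2e}")
    # Chain rule: grad wrt W_l is (W_1...W_{l-1})^T G (W_{l+1}...W_L)^T.
    grads = [chain(Ws[:l]).T @ G @ chain(Ws[l + 1:]).T for l in range(L)]
    for l in range(L):
        Ws[l] -= eta * grads[l]           # explicit Euler step on the flow
```

For a small enough step size, the printed loss decays roughly geometrically and the imbalance norm stays near zero, consistent with the fact that the imbalance matrices are conserved exactly by the continuous-time flow.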