View article

[PDF] from neurips.cc

Beyond ntk with vanilla gradient descent: A mean-field analysis of neural networks with polynomial width, samples, and time

Authors

Arvind Mahankali, Haochen Zhang, Kefan Dong, Margalit Glasgow, Tengyu Ma

Publication date

2024/2/13

Journal

Advances in Neural Information Processing Systems

Volume

Description

Despite recent theoretical progress on the non-convex optimization of two-layer neural networks, it is still an open question whether gradient descent on neural networks without unnatural modifications can achieve better sample complexity than kernel methods. This paper provides a clean mean-field analysis of projected gradient flow on polynomial-width two-layer neural networks. Different from prior works, our analysis does not require unnatural modifications of the optimization algorithm. We prove that with sample size where is the dimension of the inputs, the network trained with projected gradient flow converges in polynomial time to a non-trivial error that is not achievable by kernel methods using samples, hence demonstrating a clear separation between unmodified gradient descent and NTK. As a corollary, we show that projected gradient descent with a positive learning rate and a polynomial number of iterations converges to low error with the same sample complexity.

Total citations

Cited by 9

202320244 5

Scholar articles

Beyond ntk with vanilla gradient descent: A mean-field analysis of neural networks with polynomial width, samples, and time

A Mahankali, H Zhang, K Dong, M Glasgow, T Ma - Advances in Neural Information Processing Systems, 2024