Authors
Sanjeev Arora, Simon S Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang
Publication date
2019/4/26
Journal
Advances in Neural Information Processing Systems (NeurIPS 2019); also available as arXiv preprint arXiv:1904.11955
Description
How well does a classic deep net architecture like AlexNet or VGG19 classify on a standard dataset such as CIFAR-10 when its “width”—namely, number of channels in convolutional layers, and number of nodes in fully-connected internal layers—is allowed to increase to infinity? Such questions have come to the forefront in the quest to theoretically understand deep learning and its mysteries about optimization and generalization. They also connect deep learning to notions such as Gaussian processes and kernels. A recent paper [Jacot et al., 2018] introduced the Neural Tangent Kernel (NTK) which captures the behavior of fully-connected deep nets in the infinite width limit trained by gradient descent; this object was implicit in some other recent papers. An attraction of such ideas is that a pure kernel-based method is used to capture the power of a fully-trained deep net of infinite width.
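To make the NTK idea in the description above concrete, here is a minimal sketch of the finite-width "empirical" Neural Tangent Kernel for a small fully-connected ReLU net, written in JAX. This is an illustration only, not the paper's construction: the layer sizes and the helper names (init_params, forward, empirical_ntk) are assumptions made for this example.

```python
# Minimal sketch (illustrative, not the authors' exact method) of the empirical
# Neural Tangent Kernel K(x, x') = <df(x)/dtheta, df(x')/dtheta> for a small
# fully-connected ReLU net. In the infinite-width limit this kernel becomes
# deterministic and stays fixed during gradient-descent training (Jacot et al., 2018).
import jax
import jax.numpy as jnp

def init_params(key, sizes=(10, 512, 512, 1)):
    # Layer widths are arbitrary choices for this sketch.
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        # NTK parameterization: N(0, 1) weights, scaled by 1/sqrt(d_in) in the forward pass.
        params.append((jax.random.normal(sub, (d_in, d_out)), jnp.zeros(d_out)))
    return params

def forward(params, x):
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W / jnp.sqrt(W.shape[0]) + b
        if i < len(params) - 1:
            h = jax.nn.relu(h)
    return h.squeeze(-1)  # one scalar output per example

def empirical_ntk(params, x1, x2):
    # Jacobian of the outputs with respect to all parameters, flattened per example.
    def flat_jac(x):
        jac = jax.jacobian(lambda p: forward(p, x))(params)
        leaves = [j.reshape(x.shape[0], -1) for j in jax.tree_util.tree_leaves(jac)]
        return jnp.concatenate(leaves, axis=1)
    return flat_jac(x1) @ flat_jac(x2).T  # (n1, n2) kernel matrix

key = jax.random.PRNGKey(0)
params = init_params(key)
x = jax.random.normal(key, (4, 10))
print(empirical_ntk(params, x, x).shape)  # (4, 4)
```

As width grows, this random, finite-width kernel concentrates around its infinite-width limit, which is what allows a pure kernel-based method to stand in for a fully-trained, infinitely wide net.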
Total citations
2019: 46
2020: 158
2021: 187
2022: 221
2023: 208
2024: 122
Scholar articles
S Arora, SS Du, W Hu, Z Li, RR Salakhutdinov, R Wang - Advances in neural information processing systems, 2019