Authors
Jeffrey Pennington, Samuel Schoenholz, Surya Ganguli
Publication date
2018/3/31
Conference
International Conference on Artificial Intelligence and Statistics
Pages
1924-1932
Publisher
PMLR
Description
Recent work has shown that tight concentration of the entire spectrum of singular values of a deep network’s input-output Jacobian around one at initialization can speed up learning by orders of magnitude. Therefore, to guide important design choices, it is essential to build a full theoretical understanding of the spectra of Jacobians at initialization. To this end, we leverage powerful tools from free probability theory to provide a detailed analytic understanding of how a deep network’s Jacobian spectrum depends on various hyperparameters, including the nonlinearity, the weight and bias distributions, and the depth. For a variety of nonlinearities, our work reveals the emergence of new universal limiting spectral distributions that remain concentrated around one even as the depth goes to infinity.
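The spectral contrast the abstract describes can be illustrated numerically in the simplest setting. For a deep linear network, the input-output Jacobian is exactly the product of the weight matrices, so its singular value spectrum at initialization is that of a random matrix product: orthogonal weights keep every singular value at one regardless of depth, while i.i.d. Gaussian weights let the spectrum spread over orders of magnitude. The sketch below is illustrative only, assuming the linear case; the paper's free probability analysis is what extends this to nonlinear networks, and the function name is hypothetical.

```python
import numpy as np

def jacobian_singular_values(depth, width, orthogonal, seed=0):
    """Singular values of a product of `depth` random width x width matrices.

    For a deep *linear* network this product IS the input-output Jacobian
    at initialization. (Illustrative sketch only; the paper treats the
    nonlinear case via free probability theory.)
    """
    rng = np.random.default_rng(seed)
    J = np.eye(width)
    for _ in range(depth):
        if orthogonal:
            # Haar-random orthogonal weights via QR decomposition
            q, r = np.linalg.qr(rng.standard_normal((width, width)))
            W = q * np.sign(np.diag(r))
        else:
            # i.i.d. Gaussian weights with variance 1/width
            W = rng.standard_normal((width, width)) / np.sqrt(width)
        J = W @ J
    # Singular values of the accumulated Jacobian
    return np.linalg.svd(J, compute_uv=False)

# Orthogonal initialization: all singular values stay at one at any depth.
s_orth = jacobian_singular_values(depth=20, width=64, orthogonal=True)
# Gaussian initialization: the spectrum spreads over orders of magnitude.
s_gauss = jacobian_singular_values(depth=20, width=64, orthogonal=False)
```

At depth 20 and width 64, `s_orth` is numerically indistinguishable from all ones, while the ratio of largest to smallest entry of `s_gauss` is astronomically large, which is the depth-induced spectral spreading that critical (orthogonal) initialization avoids.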