Authors
Rong Ge, Jason D Lee, Tengyu Ma
Publication date
2017/11/1
Journal
ICLR 2018; arXiv preprint arXiv:1711.00501
Description
We consider the problem of learning a one-hidden-layer neural network: we assume the input x ∈ ℝ^d is drawn from a Gaussian distribution and the label is y = a^⊤ σ(Bx) + ξ, where a is a nonnegative vector in ℝ^m with m ≤ d, B ∈ ℝ^{m×d} is a full-rank weight matrix, and ξ is a noise vector. We first give an analytic formula for the population risk of the standard squared loss and demonstrate that it implicitly attempts to decompose a sequence of low-rank tensors simultaneously. Inspired by the formula, we design a non-convex objective function G(·) whose landscape is guaranteed to have the following properties: 1. All local minima of G are also global minima. 2. All global minima of G correspond to the ground-truth parameters. 3. The value and gradient of G can be estimated using samples. With these properties, stochastic gradient descent on G provably converges to the global minimum and learns the ground-truth parameters. We also prove a finite sample complexity result and validate the results by simulations.
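To make the model concrete, the following NumPy sketch generates data from y = a^⊤ σ(Bx) + ξ and runs vanilla minibatch SGD on the empirical squared loss. Everything here is an illustrative assumption: σ is taken to be ReLU (the abstract leaves σ generic), and the dimensions, noise level, and learning rate are arbitrary. Note that the paper's guarantees concern the specially designed objective G, not this raw squared loss, which is shown only to illustrate the data-generating model.

```python
import numpy as np

# Sketch of the model from the abstract: y = a^T sigma(B x) + xi.
# ASSUMPTIONS: sigma = ReLU, d = 20, m = 5, and plain squared-loss SGD;
# this is NOT the paper's objective G, just the model setup.

rng = np.random.default_rng(0)
d, m = 20, 5                 # input dimension d, hidden width m (m <= d)
n_steps, lr = 20000, 1e-2

# Ground-truth parameters: nonnegative vector a, full-rank weight matrix B.
a_true = rng.uniform(0.5, 1.5, size=m)
B_true = rng.standard_normal((m, d))

def sample_batch(batch=32, noise=0.01):
    """Draw x ~ N(0, I_d) and y = a^T ReLU(B x) + xi."""
    x = rng.standard_normal((batch, d))
    y = np.maximum(x @ B_true.T, 0.0) @ a_true
    return x, y + noise * rng.standard_normal(batch)

# Random initialization of the learned parameters.
a_hat = rng.uniform(0.5, 1.5, size=m)
B_hat = rng.standard_normal((m, d))

for step in range(n_steps):
    x, y = sample_batch()
    h = x @ B_hat.T                  # pre-activations, shape (batch, m)
    r = np.maximum(h, 0.0)           # ReLU features
    err = r @ a_hat - y              # residuals, shape (batch,)
    # Gradients of the (halved) mean squared loss w.r.t. a_hat and B_hat.
    grad_a = err @ r / len(y)
    grad_B = ((err[:, None] * (h > 0)) * a_hat).T @ x / len(y)
    a_hat -= lr * grad_a
    B_hat -= lr * grad_B

x, y = sample_batch(batch=4096, noise=0.0)
print("test MSE:", np.mean((np.maximum(x @ B_hat.T, 0.0) @ a_hat - y) ** 2))
```

Per the abstract, SGD on this raw squared loss carries no landscape guarantee; the paper's contribution is replacing it with an objective G for which every local minimum is global and recovers the ground-truth parameters.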
Total citations
[Citations-per-year chart, 2016–2024]