Authors
Simon S Du, Chi Jin, Jason D Lee, Michael I Jordan, Aarti Singh, Barnabas Poczos
Publication date
2017
Journal
Advances in neural information processing systems
Volume
30
Description
Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape. On the other hand, gradient descent with perturbations [Ge et al., 2015, Jin et al., 2017] is not slowed down by saddle points—it can find an approximate local minimizer in polynomial time. This result implies that GD is inherently slower than perturbed GD, and justifies the importance of adding perturbations for efficient non-convex optimization. While our focus is theoretical, we also present experiments that illustrate our theoretical findings.
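The following is a minimal toy sketch of the perturbation idea the description refers to. It is not the paper's construction. It assumes a hypothetical two-dimensional test function f(x, y) = x^2/2 + y^4/4 - y^2/2, which has a strict saddle at (0, 0) and minimizers at (0, ±1). Plain GD started on the line y = 0 (the saddle's stable manifold) converges to the saddle. A perturbed variant adds a small random kick drawn uniformly from a disk whenever the gradient is small and no kick was added in the last `t_thresh` steps, and that variant escapes. The rule loosely follows the pattern of perturbed GD in Jin et al., 2017. The parameter names and values (`g_thresh`, `radius`, `t_thresh`, `eta`) are illustrative assumptions. The paper's actual result concerns randomly initialized GD on functions with a sequence of saddles, which this toy does not reproduce.

```python
import math
import random


def f(x, y):
    """Hypothetical toy objective: strict saddle at (0, 0), minima at (0, +/-1)."""
    return 0.5 * x ** 2 + 0.25 * y ** 4 - 0.5 * y ** 2


def grad_f(x, y):
    """Gradient of the toy objective f."""
    return x, y ** 3 - y


def gradient_descent(x, y, eta=0.1, steps=500):
    """Plain GD: (x, y) <- (x, y) - eta * grad f(x, y)."""
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - eta * gx, y - eta * gy
    return x, y


def perturbed_gradient_descent(x, y, eta=0.1, steps=500, g_thresh=1e-3,
                               radius=1e-2, t_thresh=20, seed=0):
    """GD plus a random kick whenever the gradient norm is at most g_thresh
    and no kick was added in the last t_thresh iterations (illustrative values)."""
    rng = random.Random(seed)
    last_perturb = -t_thresh - 1
    for t in range(steps):
        gx, gy = grad_f(x, y)
        if math.hypot(gx, gy) <= g_thresh and t - last_perturb > t_thresh:
            # Sample a point uniformly from the disk of the given radius.
            angle = rng.uniform(0.0, 2.0 * math.pi)
            r = radius * math.sqrt(rng.random())
            x, y = x + r * math.cos(angle), y + r * math.sin(angle)
            last_perturb = t
            gx, gy = grad_f(x, y)
        x, y = x - eta * gx, y - eta * gy
    return x, y


if __name__ == "__main__":
    x0, y0 = 1.0, 0.0  # start on the saddle's stable manifold (y = 0)
    print("GD :", f(*gradient_descent(x0, y0)))
    print("PGD:", f(*perturbed_gradient_descent(x0, y0)))
```

Run as a script, plain GD should report f close to 0 because it ends at the saddle. The perturbed run should report a value close to the minimum f = -0.25. Starting plain GD slightly off the line y = 0 instead lets it escape, but the closer the start is to that line, the more iterations it spends near the saddle.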
Total citations
[Chart: citations per year, 2017 to 2024]