Authors
Gonçalo Mordido, Pranshu Malviya, Aristide Baratin, Sarath Chandar
Publication date
2023/7/31
Description
The Lookahead optimizer improves the training stability of deep neural networks by having a set of fast weights that "look ahead" to guide the descent direction. Here, we combine this idea with sharpness-aware minimization (SAM) to stabilize its multi-step variant and improve the loss-sharpness trade-off. We propose Lookbehind, which computes multiple gradient ascent steps ("looking behind") at each iteration and combines the resulting gradients to bias the descent step toward flatter minima. We apply Lookbehind on top of two popular sharpness-aware training methods -- SAM and adaptive SAM (ASAM) -- and show that our approach leads to a myriad of benefits across a variety of tasks and training regimes. In particular, we show improved generalization performance, greater robustness against noisy weights, and higher tolerance to catastrophic forgetting in lifelong learning settings.
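The abstract describes the mechanism only at a high level, so the sketch below is a rough PyTorch illustration of that idea rather than the authors' implementation: it takes several SAM-style ascent steps from the current weights, accumulates the gradient observed at each perturbed point, then restores the weights and descends along the averaged gradient. The step count k, ascent radius rho, and plain gradient averaging are illustrative assumptions, and the Lookahead-style slow-weight update and the ASAM variant mentioned in the abstract are omitted.

```python
# Hedged sketch of a "look behind" update: k ascent steps, averaged descent.
# Not the paper's reference code; k, rho, and plain averaging are assumptions.
import torch

@torch.no_grad()
def lookbehind_style_step(model, loss_fn, batch, base_opt, k=3, rho=0.05):
    params = [p for p in model.parameters() if p.requires_grad]
    originals = [p.detach().clone() for p in params]      # weights before the ascent
    accumulated = [torch.zeros_like(p) for p in params]   # running sum of gradients

    inputs, targets = batch
    for _ in range(k):
        # Gradient at the current (possibly perturbed) point.
        with torch.enable_grad():
            loss = loss_fn(model(inputs), targets)
            grads = torch.autograd.grad(loss, params)

        for acc, g in zip(accumulated, grads):
            acc.add_(g)

        # SAM-style ascent step of radius rho along the normalized gradient.
        grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
        scale = rho / grad_norm
        for p, g in zip(params, grads):
            p.add_(g * scale)

    # Restore the original weights and descend along the averaged gradient.
    for p, orig, acc in zip(params, originals, accumulated):
        p.copy_(orig)
        p.grad = acc / k
    base_opt.step()
    base_opt.zero_grad(set_to_none=True)
```

A base optimizer such as torch.optim.SGD(model.parameters(), lr=0.1) would be passed as base_opt; the actual method additionally combines this with Lookahead-style slow weights and is applied on top of SAM and ASAM, which this sketch does not cover.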
Scholar articles
G Mordido, P Malviya, A Baratin, S Chandar - arXiv preprint arXiv:2307.16704, 2023