Minhak Song
Title
Cited by
Year
Linear attention is (maybe) all you need (to understand transformer optimization)
K Ahn*, X Cheng*, M Song*, C Yun, A Jadbabaie, S Sra
ICLR 2024 (arXiv:2310.01082), 2023
Cited by: 22 | Year: 2023
Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory
M Song, C Yun
NeurIPS 2023 (arXiv:2307.04204), 2023
Cited by: 5 | Year: 2023
Does SGD really happen in tiny subspaces?
M Song, K Ahn, C Yun
arXiv preprint arXiv:2405.16002, 2024
Year: 2024