Tengyu Ma
Title · Cited by · Year
Linguistic Calibration of Long-Form Generations
N Band, X Li, T Ma, T Hashimoto
Forty-first International Conference on Machine Learning, 2024
Year: 2024
Linguistic Calibration of Language Models
N Band, X Li, T Ma, T Hashimoto
arXiv preprint arXiv:2404.00474, 2024
Cited by: 1 · Year: 2024
Chain of thought empowers transformers to solve inherently serial problems
Z Li, H Liu, D Zhou, T Ma
arXiv preprint arXiv:2402.12875, 2024
Cited by: 7 · Year: 2024
What is the Inductive Bias of Flatness Regularization? A Study of Deep Matrix Factorization Models
K Gatmiry, Z Li, T Ma, S Reddi, S Jegelka, CY Chuang
Advances in Neural Information Processing Systems 36, 2024
Year: 2024
Sharpness minimization algorithms do not only minimize sharpness to achieve better generalization
K Wen, Z Li, T Ma
Advances in Neural Information Processing Systems 36, 2024
Cited by: 16 · Year: 2024
Beyond NTK with vanilla gradient descent: A mean-field analysis of neural networks with polynomial width, samples, and time
A Mahankali, H Zhang, K Dong, M Glasgow, T Ma
Advances in Neural Information Processing Systems 36, 2024
Cited by: 8 · Year: 2024
DoReMi: Optimizing data mixtures speeds up language model pretraining
SM Xie, H Pham, X Dong, N Du, H Liu, Y Lu, PS Liang, QV Le, T Ma, ...
Advances in Neural Information Processing Systems 36, 2024
Cited by: 65 · Year: 2024
Data selection for language models via importance resampling
SM Xie, S Santurkar, T Ma, PS Liang
Advances in Neural Information Processing Systems 36, 34201-34227, 2023
Cited by: 73 · Year: 2023
Provable guarantees for self-supervised deep learning with spectral contrastive loss
JZ HaoChen, C Wei, AD Gaidon, T Ma
US Patent App. 17/714,848, 2023
Year: 2023
Toward L_∞ Recovery of Nonlinear Functions: A Polynomial Sample Complexity Bound for Gaussian Random Fields
K Dong, T Ma
The Thirty Sixth Annual Conference on Learning Theory, 2877-2918, 2023
Cited by: 2 · Year: 2023
One step of gradient descent is provably the optimal in-context learner with one layer of linear self-attention
A Mahankali, TB Hashimoto, T Ma
arXiv preprint arXiv:2307.03576, 2023
Cited by: 47 · Year: 2023
Same pre-training loss, better downstream: Implicit bias matters for language models
H Liu, SM Xie, Z Li, T Ma
International Conference on Machine Learning, 22188-22214, 2023
Cited by: 28 · Year: 2023
The inductive bias of flatness regularization for deep matrix factorization
K Gatmiry, Z Li, CY Chuang, S Reddi, T Ma, S Jegelka
arXiv preprint arXiv:2306.13239, 2023
Cited by: 6 · Year: 2023
Large language models as tool makers
T Cai, X Wang, T Ma, X Chen, D Zhou
arXiv preprint arXiv:2305.17126, 2023
Cited by: 98 · Year: 2023
Sophia: A scalable stochastic second-order optimizer for language model pre-training
H Liu, Z Li, D Hall, P Liang, T Ma
arXiv preprint arXiv:2305.14342, 2023
Cited by: 72 · Year: 2023
Symbol tuning improves in-context learning in language models
J Wei, L Hou, A Lampinen, X Chen, D Huang, Y Tay, X Chen, Y Lu, ...
arXiv preprint arXiv:2305.08298, 2023
Cited by: 42 · Year: 2023
Larger language models do in-context learning differently
J Wei, J Wei, Y Tay, D Tran, A Webson, Y Lu, X Chen, H Liu, D Huang, ...
arXiv preprint arXiv:2303.03846, 2023
Cited by: 189 · Year: 2023
How Does Sharpness-Aware Minimization Minimize Sharpness?
K Wen, T Ma, Z Li
The Eleventh International Conference on Learning Representations, 2023
Cited by: 34 · Year: 2023
On the opportunities and risks of foundation models
R Bommasani, DA Hudson, E Adeli, R Altman, S Arora, S von Arx, ...
arXiv preprint arXiv:2108.07258, 2023
Cited by: 67 · Year: 2023
Larger language models do in-context learning differently
J Wei, J Wei, Y Tay, D Tran, A Webson, Y Lu, X Chen, H Liu, D Huang, ...
URL https://arxiv.org/abs/2303.03846, 2023
Cited by: 7 · Year: 2023