Scaling Exponents Across Parameterizations and Optimizers K Everett, L Xiao, M Wortsman, AA Alemi, R Novak, PJ Liu, I Gur, ... arXiv preprint arXiv:2407.05872, 2024 | | 2024 |
4+3 Phases of Compute-Optimal Neural Scaling Laws E Paquette, C Paquette, L Xiao, J Pennington arXiv preprint arXiv:2405.15074, 2024 | 2 | 2024 |
Beyond human data: Scaling self-training for problem-solving with language models A Singh, JD Co-Reyes, R Agarwal, A Anand, P Patil, PJ Liu, J Harrison, ... arXiv preprint arXiv:2312.06585, 2023 | 36 | 2023 |
Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?" CD Freeman, L Culp, A Parisi, ML Bileschi, GF Elsayed, A Rizkowsky, ... arXiv preprint arXiv:2311.07587, 2023 | | 2023 |
Precise Learning Curves and Higher-Order Scaling Limits for Dot Product Kernel Regression L Xiao, H Hu, T Misiakiewicz, YM Lu, J Pennington Journal of Statistical Mechanics: Theory and Experiment 2023 (11), 114005, 2023 | 1 | 2023 |
Small-scale proxies for large-scale transformer training instabilities M Wortsman, PJ Liu, L Xiao, K Everett, A Alemi, B Adlam, JD Co-Reyes, ... arXiv preprint arXiv:2309.14322, 2023 | 24 | 2023 |
Fast neural kernel embeddings for general activations I Han, A Zandieh, J Lee, R Novak, L Xiao, A Karbasi Advances in neural information processing systems 35, 35657-35671, 2022 | 14 | 2022 |
Precise learning curves and higher-order scalings for dot-product kernel regression L Xiao, H Hu, T Misiakiewicz, Y Lu, J Pennington Advances in Neural Information Processing Systems 35, 4558-4570, 2022 | 36 | 2022 |
Synergy and symmetry in deep learning: Interactions between the data, model, and inference algorithm L Xiao, J Pennington arXiv preprint arXiv:2207.04612, 2022 | 10 | 2022 |
Eigenspace restructuring: a principle of space and frequency in neural networks L Xiao Conference on Learning Theory, 4888-4944, 2022 | 18 | 2022 |
Dataset distillation with infinitely wide convolutional networks T Nguyen, R Novak, L Xiao, J Lee Advances in Neural Information Processing Systems 34, 5186-5198, 2021 | 209 | 2021 |
Oscillatory Loomis–Whitney and Projections of Sublevel Sets M Gilula, K O'Neill, L Xiao Journal d'Analyse Mathématique, 1-27, 2021 | 2 | 2021 |
Disentangling trainability and generalization in deep neural networks L Xiao, J Pennington, S Schoenholz International Conference on Machine Learning, 10462-10472, 2020 | 112* | 2020 |
Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit B Adlam, J Lee, L Xiao, J Pennington, J Snoek ICLR, 2020 | 19 | 2020 |
Provable benefit of orthogonal initialization in optimizing deep linear networks W Hu, L Xiao, J Pennington arXiv preprint arXiv:2001.05992, 2020 | 131 | 2020 |
Finite versus infinite neural networks: an empirical study J Lee, S Schoenholz, J Pennington, B Adlam, L Xiao, R Novak, ... Advances in Neural Information Processing Systems 33, 15156-15172, 2020 | 204 | 2020 |
The surprising simplicity of the early-time learning dynamics of neural networks W Hu, L Xiao, B Adlam, J Pennington Advances in Neural Information Processing Systems 33, 17116-17128, 2020 | 73 | 2020 |
Neural tangents: Fast and easy infinite neural networks in python R Novak, L Xiao, J Hron, J Lee, AA Alemi, J Sohl-Dickstein, ... arXiv preprint arXiv:1912.02803, 2019 | 251 | 2019 |
Wide neural networks of any depth evolve as linear models under gradient descent J Lee, L Xiao, S Schoenholz, Y Bahri, R Novak, J Sohl-Dickstein, ... Advances in neural information processing systems 32, 2019 | 1064 | 2019 |