Scaling Exponents Across Parameterizations and Optimizers K Everett, L Xiao, M Wortsman, AA Alemi, R Novak, PJ Liu, I Gur, ... arXiv preprint arXiv:2407.05872, 2024 | | 2024 |
4+3 Phases of Compute-Optimal Neural Scaling Laws E Paquette, C Paquette, L Xiao, J Pennington arXiv preprint arXiv:2405.15074, 2024 | 2 | 2024 |
High dimensional analysis reveals conservative sharpening and a stochastic edge of stability A Agarwala, J Pennington arXiv preprint arXiv:2404.19261, 2024 | | 2024 |
Training LLMs over Neurally Compressed Text B Lester, J Lee, A Alemi, J Pennington, A Roberts, J Sohl-Dickstein, ... arXiv preprint arXiv:2404.03626, 2024 | 1 | 2024 |
Beyond human data: Scaling self-training for problem-solving with language models A Singh, JD Co-Reyes, R Agarwal, A Anand, P Patil, PJ Liu, J Harrison, ... arXiv preprint arXiv:2312.06585, 2023 | 36 | 2023 |
Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?" CD Freeman, L Culp, A Parisi, ML Bileschi, GF Elsayed, A Rizkowsky, ... arXiv preprint arXiv:2311.07587, 2023 | | 2023 |
Precise Learning Curves and Higher-Order Scaling Limits for Dot Product Kernel Regression L Xiao, H Hu, T Misiakiewicz, YM Lu, J Pennington Journal of Statistical Mechanics: Theory and Experiment 2023 (11), 114005, 2023 | 1 | 2023 |
Small-scale proxies for large-scale transformer training instabilities M Wortsman, PJ Liu, L Xiao, K Everett, A Alemi, B Adlam, JD Co-Reyes, ... arXiv preprint arXiv:2309.14322, 2023 | 24 | 2023 |
Spherical random features for polynomial kernels J Pennington, S Kumar US Patent 11,636,384, 2023 | | 2023 |
Implicit regularization or implicit conditioning? exact risk trajectories of sgd in high dimensions C Paquette, E Paquette, B Adlam, J Pennington Advances in Neural Information Processing Systems 35, 35984-35999, 2022 | 13 | 2022 |
Precise learning curves and higher-order scalings for dot-product kernel regression L Xiao, H Hu, T Misiakiewicz, Y Lu, J Pennington Advances in Neural Information Processing Systems 35, 4558-4570, 2022 | 36 | 2022 |
Second-order regression models exhibit progressive sharpening to the edge of stability A Agarwala, F Pedregosa, J Pennington arXiv preprint arXiv:2210.04860, 2022 | 22 | 2022 |
Synergy and symmetry in deep learning: Interactions between the data, model, and inference algorithm L Xiao, J Pennington arXiv preprint arXiv:2207.04612, 2022 | 10 | 2022 |
Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling J Hron, R Novak, J Pennington, J Sohl-Dickstein International conference on machine learning, 8926-8945, 2022 | 6 | 2022 |
Homogenization of SGD in high-dimensions: Exact dynamics and generalization properties C Paquette, E Paquette, B Adlam, J Pennington arXiv preprint arXiv:2205.07069, 2022 | 23 | 2022 |
A random matrix perspective on mixtures of nonlinearities in high dimensions B Adlam, JA Levinson, J Pennington International Conference on Artificial Intelligence and Statistics, 3434-3457, 2022 | 15 | 2022 |
A second order regression model shows edge of stability behavior A Agarwala, J Pennington, F Pedregosa | 1 | 2022 |
Overparameterization improves robustness to covariate shift in high dimensions N Tripuraneni, B Adlam, J Pennington Advances in Neural Information Processing Systems 34, 13883-13897, 2021 | 48 | 2021 |
Covariate shift in high-dimensional random feature regression N Tripuraneni, B Adlam, J Pennington arXiv preprint arXiv:2111.08234, 2021 | 27 | 2021 |
Anisotropic random feature regression in high dimensions G Mel, J Pennington International Conference on Learning Representations, 2021 | 8 | 2021 |