Branch-Train-MiX: Mixing expert LLMs into a Mixture-of-Experts LLM. S Sukhbaatar, O Golovneva, V Sharma, H Xu, XV Lin, B Rozière, J Kahn, ... arXiv preprint arXiv:2403.07816, 2024.
Self-rewarding language models. W Yuan, RY Pang, K Cho, S Sukhbaatar, J Xu, J Weston. arXiv preprint arXiv:2401.10020, 2024. Cited by 120.
Self-alignment with instruction backtranslation. X Li, P Yu, C Zhou, T Schick, L Zettlemoyer, O Levy, J Weston, M Lewis. arXiv preprint arXiv:2308.06259, 2023. Cited by 107.
Towards a unified view of sparse feed-forward network in pretraining large language model. ZL Liu, T Dettmers, XV Lin, V Stoyanov, X Li. arXiv preprint arXiv:2305.13999, 2023. Cited by 2.
Large language model programs. I Schlag, S Sukhbaatar, A Celikyilmaz, W Yih, J Weston, J Schmidhuber, ... arXiv preprint arXiv:2305.05364, 2023. Cited by 14.
ToKen: Task decomposition and knowledge infusion for few-shot hate speech detection. B AlKhamissi, F Ladhak, S Iyer, V Stoyanov, Z Kozareva, X Li, P Fung, ... arXiv preprint arXiv:2205.12495, 2022. Cited by 18.
Lifting the curse of multilinguality by pre-training modular transformers. J Pfeiffer, N Goyal, XV Lin, X Li, J Cross, S Riedel, M Artetxe. arXiv preprint arXiv:2205.06266, 2022. Cited by 90.
OPT: Open pre-trained transformer language models. S Zhang, S Roller, N Goyal, M Artetxe, M Chen, S Chen, C Dewan, ... arXiv preprint arXiv:2205.01068, 2022. Cited by 1880.
Efficient language modeling with sparse all-MLP. P Yu, M Artetxe, M Ott, S Shleifer, H Gong, V Stoyanov, X Li. arXiv preprint arXiv:2203.06850, 2022. Cited by 11.
Efficient large scale language modeling with mixtures of experts. M Artetxe, S Bhosale, N Goyal, T Mihaylov, M Ott, S Shleifer, XV Lin, J Du, ... arXiv preprint arXiv:2112.10684, 2021. Cited by 80.
Few-shot learning with multilingual language models. XV Lin, T Mihaylov, M Artetxe, T Wang, S Chen, D Simig, M Ott, N Goyal, ... arXiv preprint arXiv:2112.10668, 2021. Cited by 321*.
Robust optimization for multilingual translation with imbalanced data. X Li, H Gong. Advances in Neural Information Processing Systems 34, 25086-25099, 2021. Cited by 19.
Pay better attention to attention: Head selection in multilingual and multi-domain sequence modeling. H Gong, Y Tang, J Pino, X Li. Advances in Neural Information Processing Systems 34, 2668-2681, 2021. Cited by 9.
Do language models have beliefs? Methods for detecting, updating, and visualizing model beliefs. P Hase, M Diab, A Celikyilmaz, X Li, Z Kozareva, V Stoyanov, M Bansal, ... arXiv preprint arXiv:2111.13654, 2021. Cited by 66.
Distributionally robust multilingual machine translation. C Zhou, D Levy, X Li, M Ghazvininejad, G Neubig. arXiv preprint arXiv:2109.04020, 2021. Cited by 23.
Multilingual translation from denoising pre-training. Y Tang, C Tran, X Li, PJ Chen, N Goyal, V Chaudhary, J Gu, A Fan. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 …, 2021. Cited by 125.
FST: The FAIR speech translation system for the IWSLT21 multilingual shared task. Y Tang, H Gong, X Li, C Wang, J Pino, H Schwenk, N Goyal. arXiv preprint arXiv:2107.06959, 2021. Cited by 6.
Improving speech translation by understanding and learning from the auxiliary text translation task. Y Tang, J Pino, X Li, C Wang, D Genzel. arXiv preprint arXiv:2107.05782, 2021. Cited by 65.
Gender bias amplification during speed-quality optimization in neural machine translation. A Renduchintala, D Diaz, K Heafield, X Li, M Diab. arXiv preprint arXiv:2106.00169, 2021. Cited by 42.