Authors
Sang Michael Xie, Aditi Raghunathan, Percy Liang, Tengyu Ma
Publication date
2022
Conference
International Conference on Learning Representations (ICLR)
Description
Large language models (LMs) such as GPT-3 have the surprising ability to do in-context learning, where the model learns to do a downstream task simply by conditioning on a prompt consisting of input-output examples. The LM learns from these examples without being explicitly pretrained to learn. Thus, it is unclear what enables in-context learning. In this paper, we study how in-context learning can emerge when pretraining documents have long-range coherence. Here, the LM must infer a latent document-level concept to generate coherent next tokens during pretraining. At test time, in-context learning occurs when the LM also infers a shared latent concept between examples in a prompt. We prove when this occurs despite a distribution mismatch between prompts and pretraining data in a setting where the pretraining distribution is a mixture of HMMs. In contrast to messy large-scale datasets used to train LMs capable of in-context learning, we generate a small-scale synthetic dataset (GINC) where Transformers and LSTMs both exhibit in-context learning. Beyond the theory, experiments on GINC exhibit large-scale real-world phenomena including improved in-context performance with model scaling (despite the same pretraining loss), sensitivity to example order, and instances where zero-shot is better than few-shot in-context learning.
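The core mechanism described above can be summarized as implicit Bayesian inference over a latent concept. The LaTeX sketch below paraphrases that view; the symbols θ (latent concept), (x_i, y_i) (prompt examples), and x_test are notation chosen here for illustration rather than taken verbatim from the paper.

% Implicit Bayesian inference view of in-context learning (illustrative notation).
% The pretrained LM marginalizes over the latent concept theta when predicting:
\[
  p\bigl(y \mid x_1, y_1, \dots, x_k, y_k, x_{\mathrm{test}}\bigr)
  = \int_{\Theta} p\bigl(y \mid x_{\mathrm{test}}, \theta\bigr)\,
    p\bigl(\theta \mid x_1, y_1, \dots, x_k, y_k, x_{\mathrm{test}}\bigr)\, d\theta .
\]
% In-context learning emerges when the posterior over theta concentrates on the concept
% theta* shared by the prompt examples as the number of examples k grows, so the
% prediction approaches the Bayes-optimal p(y | x_test, theta*) despite the mismatch
% between the prompt distribution and the pretraining distribution.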
Total citations
Cited by: yearly citation chart, 2021–2024
Scholar articles
SM Xie, A Raghunathan, P Liang, T Ma - arXiv preprint arXiv:2111.02080, 2021