Authors
Sang Michael Xie, Aditi Raghunathan, Percy Liang, Tengyu Ma
Publication date
2022
Conference
International Conference on Learning Representations (ICLR)
Description
Large language models (LMs) such as GPT-3 have the surprising ability to do in-context learning, where the model learns to do a downstream task simply by conditioning on a prompt consisting of input-output examples. The LM learns from these examples without being explicitly pretrained to learn. Thus, it is unclear what enables in-context learning. In this paper, we study how in-context learning can emerge when pretraining documents have long-range coherence. Here, the LM must infer a latent document-level concept to generate coherent next tokens during pretraining. At test time, in-context learning occurs when the LM also infers a shared latent concept between examples in a prompt. We prove when this occurs despite a distribution mismatch between prompts and pretraining data in a setting where the pretraining distribution is a mixture of HMMs. In contrast to messy large-scale datasets used to train LMs capable of in-context learning, we generate a small-scale synthetic dataset (GINC) where Transformers and LSTMs both exhibit in-context learning. Beyond the theory, experiments on GINC exhibit large-scale real-world phenomena including improved in-context performance with model scaling (despite the same pretraining loss), sensitivity to example order, and instances where zero-shot is better than few-shot in-context learning.
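The core mechanism described above can be summarized as implicit Bayesian inference over a latent concept. The LaTeX sketch below paraphrases that view; the symbols θ (latent concept), (x_i, y_i) (prompt examples), and x_test are notation chosen here for illustration rather than taken verbatim from the paper.

% Implicit Bayesian inference view of in-context learning (illustrative notation).
% The pretrained LM marginalizes over the latent concept theta when predicting:
\[
  p\bigl(y \mid x_1, y_1, \dots, x_k, y_k, x_{\mathrm{test}}\bigr)
  = \int_{\Theta} p\bigl(y \mid x_{\mathrm{test}}, \theta\bigr)\,
    p\bigl(\theta \mid x_1, y_1, \dots, x_k, y_k, x_{\mathrm{test}}\bigr)\, d\theta .
\]
% In-context learning emerges when the posterior over theta concentrates on the concept
% theta* shared by the prompt examples as the number of examples k grows, so the
% prediction approaches the Bayes-optimal p(y | x_test, theta*) despite the mismatch
% between the prompt distribution and the pretraining distribution.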
Total citations
Cited by: yearly citation chart, 2021–2024
Scholar articles
SM Xie, A Raghunathan, P Liang, T Ma - arXiv preprint arXiv:2111.02080, 2021