Tomek Korbak

Cited by

	All	Since 2019
Citations	1313	1309
h-index	16	16
i10-index	20	19

Citations per year: 2019: 4; 2020: 11; 2021: 23; 2022: 35; 2023: 371; 2024: 862

Public access

4 articles available, 0 articles not available (based on funding mandates)

Co-authors

Ethan Perez - Anthropic; New York University - Verified email at anthropic.com
Marc Dymetman - Independent Researcher (Prev. Principal Scientist, NAVER Labs Europe) - Verified email at naverlabs.com
Germán Kruszewski - Senior Scientist @ Naver Labs Europe; MSCA Postdoctoral Researcher @ UPF - Verified email at naverlabs.com
Samuel R. Bowman - NYU and Anthropic - Verified email at nyu.edu
Hady Elsahar - Research Scientist at Meta AI - Verified email at meta.com
Kyunghyun Cho - New York University, Genentech - Verified email at nyu.edu
Joanna Rączaszek-Leonardi - Professor, University of Warsaw - Verified email at psych.uw.edu.pl
Owain Evans - Research Associate, University of Oxford - Verified email at philosophy.ox.ac.uk
Jason Phang - New York University - Verified email at nyu.edu
Anil Seth - Sussex University - Verified email at sussex.ac.uk
David Scott Krueger - University Assistant Professor, University of Cambridge - Verified email at cam.ac.uk

Tomek Korbak

Other names: Tomasz Korbak

UK AI Safety Institute

Verified email at dsit.gov.uk

language models, AI safety, reinforcement learning, Bayesian inference, LLM agents


Title	Authors	Venue	Cited by	Year
Open problems and fundamental limitations of reinforcement learning from human feedback	S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...	arXiv preprint arXiv:2307.15217, 2023	282	2023
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"	L Berglund, M Tong, M Kaufmann, M Balesni, AC Stickland, T Korbak, ...	arXiv preprint arXiv:2309.12288, 2023	152*	2023
Pretraining language models with human preferences	T Korbak, K Shi, A Chen, RV Bhalerao, C Buckley, J Phang, SR Bowman, ...	International Conference on Machine Learning, 17506-17533, 2023	146	2023
Inverse scaling: When bigger isn't better	IR McKenzie, A Lyzhov, M Pieler, A Parrish, A Mueller, A Prabhu, ...	arXiv preprint arXiv:2306.09479, 2023	94*	2023
Towards understanding sycophancy in language models	M Sharma, M Tong, T Korbak, D Duvenaud, A Askell, SR Bowman, ...	arXiv preprint arXiv:2310.13548, 2023	91	2023
Training language models with language feedback at scale	J Scheurer, JA Campos, T Korbak, JS Chan, A Chen, K Cho, E Perez	arXiv preprint arXiv:2303.16755, 2023	77	2023
Improving code generation by training with natural language feedback	A Chen, J Scheurer, T Korbak, JA Campos, JS Chan, SR Bowman, K Cho, ...	arXiv preprint arXiv:2303.16749, 2023	50	2023
Aligning language models with preferences through f-divergence minimization	D Go, T Korbak, G Kruszewski, J Rozen, N Ryu, M Dymetman	arXiv preprint arXiv:2302.08215, 2023	46	2023
Foundational challenges in assuring alignment and safety of large language models	U Anwar, A Saparov, J Rando, D Paleka, M Turpin, P Hase, ES Lubana, ...	arXiv preprint arXiv:2404.09932, 2024	45	2024
RL with KL penalties is better viewed as Bayesian inference	T Korbak, E Perez, CL Buckley	arXiv preprint arXiv:2205.11275, 2022	40	2022
On reinforcement learning and distribution matching for fine-tuning language models with no catastrophic forgetting	T Korbak, H Elsahar, G Kruszewski, M Dymetman	Advances in Neural Information Processing Systems 35, 16203-16220, 2022	38	2022
Taken out of context: On measuring situational awareness in LLMs	L Berglund, AC Stickland, M Balesni, M Kaufmann, M Tong, T Korbak, ...	arXiv preprint arXiv:2309.00667, 2023	33*	2023
Many-shot jailbreaking	C Anil, E Durmus, M Sharma, J Benton, S Kundu, J Batson, N Rimsky, ...	Anthropic, April, 2024	31*	2024
Computational enactivism under the free energy principle	T Korbak	Synthese 198 (3), 2743-2763, 2021	31	2021
Controlling conditional language models without catastrophic forgetting	T Korbak, H Elsahar, G Kruszewski, M Dymetman	International Conference on Machine Learning, 11499-11528, 2022	30	2022
Interaction history as a source of compositionality in emergent communication	T Korbak, J Zubek, Ł Kuciński, P Miłoś, J Rączaszek-Leonardi	Interaction Studies 22 (2), 212-243, 2021	19*	2021
Catalytic role of noise and necessity of inductive biases in the emergence of compositional communication	Ł Kuciński, T Korbak, P Kołodziej, P Miłoś	Advances in Neural Information Processing Systems 34, 23075-23088, 2021	15	2021
Is model collapse inevitable? breaking the curse of recursion by accumulating real and synthetic data	M Gerstgrasser, R Schaeffer, A Dey, R Rafailov, H Sleight, J Hughes, ...	arXiv preprint arXiv:2404.01413, 2024	13	2024
Measuring non-trivial compositionality in emergent communication	T Korbak, J Zubek, J Rączaszek-Leonardi	arXiv preprint arXiv:2010.15058, 2020	10	2020
Scaffolded minds and the evolution of content in signaling pathways	T Korbak	Studies in Logic, Grammar and Rhetoric 41 (1), 89-103, 2015	10	2015
