Alex Tamkin
Research Scientist, Anthropic
Verified email at cs.stanford.edu
Title
Cited by
Year
On the opportunities and risks of foundation models
R Bommasani, DA Hudson, E Adeli, R Altman, S Arora, S von Arx, ...
arXiv preprint arXiv:2108.07258, 2021
Cited by 3732, 2021
Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models
A Tamkin, M Brundage, J Clark, D Ganguli
arXiv preprint arXiv:2102.02503, https://arxiv.org/abs/2102.02503, 2021
Cited by 289, 2021
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
T Bricken, A Templeton, J Batson, B Chen, A Jermyn, T Conerly, ...
https://transformer-circuits.pub/2023/monosemantic-features/index.html, 2023
Cited by 142, 2023
Towards measuring the representation of subjective global opinions in language models
E Durmus, K Nguyen, TI Liao, N Schiefer, A Askell, A Bakhtin, C Chen, ...
arXiv preprint arXiv:2306.16388, 2023
Cited by 110, 2023
Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy
R Keramati, C Dann, A Tamkin, E Brunskill
Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), 2020
Cited by 88, 2020
Studying large language model generalization with influence functions
R Grosse, J Bae, C Anil, N Elhage, A Tamkin, A Tajdini, B Steiner, D Li, ...
arXiv preprint arXiv:2308.03296, 2023
Cited by 76, 2023
Viewmaker Networks: Learning Views for Unsupervised Representation Learning
A Tamkin, M Wu, N Goodman
ICLR 2021, 2020
Cited by 72, 2020
Scaling monosemanticity: Extracting interpretable features from Claude 3 Sonnet
A Templeton, T Conerly, J Marcus, J Lindsey, T Bricken, B Chen, ...
Transformer Circuits Thread, 2024
Cited by 65, 2024
Drone.io: A Gestural and Visual Interface for Human-Drone Interaction
JR Cauchard, A Tamkin, CY Wang, L Vink, M Park, T Fang, JA Landay
2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI …, 2019
Cited by 59, 2019
Investigating transferability in pretrained language models
A Tamkin, T Singh, D Giovanardi, N Goodman
Findings of EMNLP 2020, 2020
Cited by 46, 2020
Language Through a Prism: A Spectral Approach for Multiscale Language Representations
A Tamkin, D Jurafsky, N Goodman
NeurIPS 2020, 2020
Cited by 39, 2020
DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning
A Tamkin, V Liu, R Lu, D Fein, C Schultz, N Goodman
NeurIPS 2021, 2021
Cited by 38, 2021
Distributionally-Aware Exploration for CVaR Bandits
A Tamkin, R Keramati, C Dann, E Brunskill
NeurIPS 2019 Workshop on Safety and Robustness in Decision Making, 2019
Cited by 38, 2019
Active Learning Helps Pretrained Models Learn the Intended Task
A Tamkin, D Nguyen, S Deshpande, J Mu, N Goodman
NeurIPS 2022, 2022
Cited by 36, 2022
C5T5: Controllable generation of organic molecules with transformers
D Rothchild, A Tamkin, J Yu, U Misra, J Gonzalez
arXiv preprint arXiv:2108.10307, 2021
Cited by 31, 2021
Evaluating and mitigating discrimination in language model decisions
A Tamkin, A Askell, L Lovitt, E Durmus, N Joseph, S Kravec, K Nguyen, ...
arXiv preprint arXiv:2312.03689, 2023
Cited by 30, 2023
Recursive Routing Networks: Learning to Compose Modules for Language Understanding
I Cases, C Rosenbaum, M Riemer, A Geiger, T Klinger, A Tamkin, O Li, ...
NAACL 2019, 2019
Cited by 29, 2019
Many-shot jailbreaking
C Anil, E Durmus, M Sharma, J Benton, S Kundu, J Batson, N Rimsky, ...
Anthropic, April, 2024
Cited by 27, 2024
Eliciting human preferences with language models
BZ Li, A Tamkin, N Goodman, J Andreas
arXiv preprint arXiv:2310.11589, 2023
Cited by 25, 2023
Task Ambiguity in Humans and Language Models
A Tamkin, K Handa, A Shrestha, N Goodman
ICLR 2023, 2023
Cited by 23, 2023