Peter Baldwin

Cited by

	All	Since 2019
Citations	392	281
h-index	12	8
i10-index	14	8

Citations per year
Year	Citations
2007	1
2008	2
2009	8
2010	6
2011	6
2012	8
2013	7
2014	14
2015	9
2016	12
2017	12
2018	21
2019	8
2020	31
2021	47
2022	36
2023	70
2024	88

Principal Measurement Scientist, National Board of Medical Examiners

Verified email at nbme.org

psychometrics, estimation, NLP, LLM


Title	Authors	Venue	Cited by	Year
Predicting the difficulty of multiple choice questions in a high-stakes medical exam	V Yaneva, P Baldwin, J Mee	Proceedings of the fourteenth workshop on innovative use of NLP for building …, 2019	57	2019
Predicting the difficulty and response time of multiple choice questions using transfer learning	K Xue, V Yaneva, C Runyon, P Baldwin	Proceedings of the fifteenth workshop on innovative use of NLP for building …, 2020	39	2020
Predicting item survival for multiple choice questions in a high-stakes medical exam	V Yaneva, P Baldwin, J Mee	Proceedings of the Twelfth Language Resources and Evaluation Conference …, 2020	28	2020
Using natural language processing to predict item response times and improve test construction	P Baldwin, V Yaneva, J Mee, BE Clauser, LA Ha	Journal of Educational Measurement 58 (1), 4-30, 2021	24	2021
Using item response time data in test development and validation: Research with beginning computer users	AL Zenisky, P Baldwin	Center for Educational Assessment Report No 593, 2006	24	2006
Massachusetts adult proficiency tests technical manual, version 2	SG Sireci, P Baldwin, A Martone, AL Zenisky, L Kaira, W Lam, CL Shea, ...	Center for Educational Assessment Research Report No 677, 2008	19	2008
Hip psychometrics	P Baldwin, J Bernstein, H Wainer	Statistics in Medicine 28 (17), 2277-2292, 2009	18	2009
Comparison of automated scoring methods for a computerized performance assessment of clinical judgment	P Harik, P Baldwin, B Clauser	Applied Psychological Measurement 37 (8), 587-597, 2013	17	2013
A comparison of IRT equating methods on recovering item parameters and growth in mixed-format tests	SG Baldwin, P Baldwin, ML Nering	Annual meeting of the American Educational Research Association, Chicago, IL, 2007	17	2007
A comparison of experimental and observational approaches to assessing the effects of time constraints in a medical licensing examination	P Harik, BE Clauser, I Grabovsky, P Baldwin, MJ Margolis, D Bucak, ...	Journal of Educational Measurement 55 (2), 308-327, 2018	16	2018
Weighting components of a composite score using naïve expert judgments about their relative importance	P Baldwin	Applied Psychological Measurement 39 (7), 539-550, 2015	14	2015
An experimental study of the internal consistency of judgments made in bookmark standard setting	BE Clauser, P Baldwin, MJ Margolis, J Mee, M Winward	Journal of Educational Measurement 54 (4), 481-497, 2017	13	2017
Examining ChatGPT Performance on USMLE Sample Items and Implications for Assessment	V Yaneva, P Baldwin, DP Jurich, K Swygert, BE Clauser	Academic Medicine, 10.1097, 2023	10	2023
A strategy for developing a common metric in item response theory when parameter posterior distributions are known	P Baldwin	Journal of Educational Measurement 48 (1), 1-11, 2011	10	2011
The effect of rating unfamiliar items on Angoff passing scores	JC Clauser, RK Hambleton, P Baldwin	Educational and Psychological Measurement 77 (6), 901-916, 2017	9	2017
Findings from the First Shared Task on Automated Prediction of Difficulty and Response Time for Multiple-Choice Questions	V Yaneva, K North, P Baldwin, S Rezayi, Y Zhou, SR Choudhury, P Harik, ...	Proceedings of the 19th Workshop on Innovative Use of NLP for Building …, 2024	8	2024
The choice of response probability in bookmark standard setting: an experimental study	P Baldwin, MJ Margolis, BE Clauser, J Mee, M Winward	Educational Measurement: Issues and Practice 39 (1), 37-44, 2020	8	2020
Assessing the impact of modifications to the documentation component’s scoring rubric and rater training on USMLE integrated clinical encounter scores	SG Baldwin, P Harik, LA Keller, BE Clauser, P Baldwin, TA Rebbecchi	Academic Medicine 84 (10), S97-S100, 2009	7	2009
Massachusetts adult proficiency tests technical manual	SG Sireci, P Baldwin, A Martone, A Zenisky, RK Hambleton, KT Han	Center for Educational Assessment, University of Massachusetts Amherst, 2006	6	2006
A modified IRT model intended to improve parameter estimates under small sample conditions	P Baldwin	Annual meeting of the National Council on Measurement in Education, San …, 2006	6	2006
