A method of automated nonparametric content analysis for social science

Authors

Daniel Hopkins, Gary King

Publication date

2010

Journal

American Journal of Political Science

Volume

54

Issue

1

Pages

229–247

Publisher

http://gking.harvard.edu/files/words.pdf

Description

The increasing availability of digitized text presents enormous opportunities for social scientists. Yet hand coding many blogs, speeches, government records, newspapers, or other sources of unstructured text is infeasible. Although computer scientists have methods for automated content analysis, most are optimized to classify individual documents, whereas social scientists instead want generalizations about the population of documents, such as the proportion in a given category. Unfortunately, even a method with a high percent of individual documents correctly classified can be hugely biased when estimating category proportions. By directly optimizing for this social science goal, we develop a method that gives approximately unbiased estimates of category proportions even when the optimal classifier performs poorly. We illustrate with diverse data sets, including the daily expressed opinions of thousands of …

Total citations

Cited by 1295

Citations per year: 2007: 7; 2008: 17; 2009: 18; 2010: 27; 2011: 38; 2012: 55; 2013: 80; 2014: 101; 2015: 90; 2016: 124; 2017: 97; 2018: 98; 2019: 123; 2020: 97; 2021: 107; 2022: 87; 2023: 81; 2024: 27

Scholar articles

A method of automated nonparametric content analysis for social science

DJ Hopkins, G King - American Journal of Political Science, 2010

Cited by 1159

Extracting systematic social science meaning from text

D Hopkins, G King - Manuscript available at http://gking.harvard.edu/files …, 2007