Authors
Maarten Sap, Gregory Park, Johannes Eichstaedt, Margaret Kern, David Stillwell, Michal Kosinski, Lyle Ungar, Hansen Andrew Schwartz
Publication date
2014
Journal
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Pages
1146-1151
Description
Demographic lexica have potential for widespread use in social science, economic, and business applications. We derive predictive lexica (words and weights) for age and gender using regression and classification models from word usage in Facebook, blog, and Twitter data with associated demographic labels. The lexica, made publicly available, achieved state-of-the-art accuracy in language-based age and gender prediction over Facebook and Twitter, and were evaluated for generalization across social media genres as well as in limited-message situations.
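As an illustration of what a "words and weights" lexicon can look like in use, the sketch below scores a tokenized message as an intercept plus the sum of each word's weight times its relative frequency in the message. This is a minimal sketch under stated assumptions: the scoring form, the function name `lexicon_score`, the toy weights, and the intercept are all hypothetical and are not taken from the published lexica.

```python
from collections import Counter


def lexicon_score(tokens, weights, intercept=0.0):
    """Score a message as intercept + sum(weight * relative frequency).

    Assumed scoring form for illustration; words absent from the
    lexicon contribute nothing to the score.
    """
    if not tokens:
        # With no words to count, only the intercept remains.
        return intercept
    counts = Counter(tokens)
    total = len(tokens)
    return intercept + sum(
        weights[word] * count / total
        for word, count in counts.items()
        if word in weights
    )


# Hypothetical toy values, NOT the published lexicon weights.
toy_age_weights = {"homework": -40.0, "mortgage": 60.0}
toy_intercept = 25.0

tokens = "i finished my homework".split()
score = lexicon_score(tokens, toy_age_weights, toy_intercept)
print(score)
```

With these toy values, "homework" makes up one of four tokens, so it shifts the score down from the intercept by a quarter of its weight. A gender lexicon could be applied the same way, with the sign of the score read as the predicted class, again as an assumption of this sketch.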
Total citations
[Chart: citations per year, 2014 to 2024]
Scholar articles
M Sap, G Park, J Eichstaedt, M Kern, D Stillwell… - Proceedings of the 2014 conference on empirical …, 2014