Geolocation prediction in social media data by finding location indicative words

Authors

Bo Han, Paul Cook, Timothy Baldwin

Publication date

2012/12

Conference

Proceedings of COLING 2012

Pages

1045-1062

Description

Geolocation prediction is vital to geospatial applications like localised search and local event detection. Predominately, social media geolocation models are based on full text data, including common words with no geospatial dimension (e.g. today) and noisy strings (tmrw), potentially hampering prediction and leading to slower/more memory-intensive models. In this paper, we focus on finding location indicative words (LIWs) via feature selection, and establishing whether the reduced feature set boosts geolocation accuracy. Our results show that an information gain ratio-based approach surpasses other methods at LIW selection, outperforming state-of-the-art geolocation prediction methods by 10.6% in accuracy and reducing the mean and median of prediction error distance by 45km and 209km, respectively, on a public dataset. We further formulate notions of prediction confidence, and demonstrate that performance is even higher in cases where our model is more confident, striking a trade-off between accuracy and coverage. Finally, the identified LIWs reveal regional language differences, which could be potentially useful for lexicographers.
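
As a rough illustration of the feature-selection idea named in the description, the sketch below scores a word by the standard information gain ratio: the information gain of the word's presence over the location classes, divided by the entropy of the presence/absence split itself. This is not the authors' implementation. The function names (`entropy`, `info_gain_ratio`), the toy documents, and the example words are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def info_gain_ratio(docs, word):
    """Information gain ratio of `word` with respect to location labels.

    `docs` is a list of (set_of_words, location_label) pairs; this input
    shape is a hypothetical choice for the sketch.
    """
    with_w = Counter(loc for words, loc in docs if word in words)
    without_w = Counter(loc for words, loc in docs if word not in words)
    n = len(docs)
    n_w = sum(with_w.values())
    n_not = n - n_w
    if n_w == 0 or n_not == 0:
        # The word occurs in every document or in none, so it splits nothing.
        return 0.0
    h_class = entropy(Counter(loc for _, loc in docs).values())
    h_cond = (n_w / n) * entropy(with_w.values()) \
        + (n_not / n) * entropy(without_w.values())
    info_gain = h_class - h_cond
    intrinsic_value = entropy([n_w, n_not])
    return info_gain / intrinsic_value


# Toy data (invented): two documents per location.
docs = [
    ({"yinz", "today"}, "pittsburgh"),
    ({"yinz", "game"}, "pittsburgh"),
    ({"today", "hella"}, "san_francisco"),
    ({"hella", "game"}, "san_francisco"),
]
scores = {w: info_gain_ratio(docs, w) for w in ("yinz", "today", "game")}
```

On this toy data a word that appears in only one location scores 1.0, while a word spread evenly across both locations scores 0.0. Ranking words by such a score and keeping the top fraction is one plausible way to obtain a reduced feature set of the kind the description mentions.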

Total citations

Cited by 266

Citations per year: 2013: 8, 2014: 8, 2015: 17, 2016: 19, 2017: 38, 2018: 32, 2019: 27, 2020: 35, 2021: 23, 2022: 23, 2023: 25, 2024: 7
