Authors
Leon Derczynski, Diana Maynard, Giuseppe Rizzo, Marieke Van Erp, Genevieve Gorrell, Raphaël Troncy, Johann Petrak, Kalina Bontcheva
Publication date
2015/3/1
Journal
Information Processing & Management
Volume
51
Issue
2
Pages
32-49
Publisher
Pergamon
Description
Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new Twitter entity disambiguation dataset, and conduct an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of …
Total citations
Citations per year (bar chart): 2015: 33, 2016: 45, 2017: 77, 2018: 70, 2019: 64, 2020: 63, 2021: 46, 2022: 32, 2023: 37, 2024: 11
Scholar articles
L Derczynski, D Maynard, G Rizzo, M Van Erp… - Information Processing & Management, 2015