Authors
Harsha Gurulingappa, Roman Klinger, Martin Hofmann-Apitius, Juliane Fluck
Publication date
2010/5
Journal
2nd Workshop on Building and evaluating resources for biomedical text mining (7th edition of the Language Resources and Evaluation Conference)
Pages
15-22
Description
Mentions of human health perturbations, such as diseases and adverse effects, denote a special entity class in the biomedical literature. They help in understanding the underlying risk factors and in developing a preventive rationale. Recognizing these named entities in texts through dictionary-based approaches relies on the availability of appropriate terminological resources. Although a few resources are publicly available, not all are suitable for text mining needs. Therefore, this work provides an overview of well-known resources covering human diseases and adverse effects, such as MeSH, MedDRA, ICD-10, SNOMED CT, and UMLS. Individual dictionaries are generated from these resources, and their performance in recognizing the named entities is evaluated on a manually annotated corpus. In addition, the steps for curating the dictionaries and for rule-based acronym disambiguation, together with their impact on dictionary performance, are discussed. The results show that MedDRA and UMLS achieve the best recall. Besides this, MedDRA provides the additional benefit of higher precision. Combining the search results of all the dictionaries achieves a considerably high recall. The corpus is available at http://www.scai.fraunhofer.de/disease-ae-corpus.html
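The dictionary-based recognition and evaluation summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes greedy longest-match lookup on lower-cased whitespace tokens, exact span matching as the evaluation criterion, and two tiny invented dictionaries (`DICT_A`, `DICT_B`) standing in for real resources. All function and variable names are hypothetical. The sketch shows how taking the union of two dictionaries' hits can raise recall while precision may drop.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (first token index, end token index), end exclusive


def tokenize(text: str) -> List[str]:
    # Lower-cased whitespace tokenization with surrounding punctuation stripped;
    # a real pipeline would use a biomedical tokenizer.
    return [t.strip(".,;:()").lower() for t in text.split()]


def dictionary_match(tokens: List[str], terms: Set[str], max_len: int = 5) -> Set[Span]:
    """Greedy longest-match lookup of (multi-word) dictionary terms."""
    entries = {tuple(term.lower().split()) for term in terms}
    spans: Set[Span] = set()
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, so "acute renal failure" would win
        # over "renal failure" if a dictionary contained both.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in entries:
                spans.add((i, i + n))
                i += n
                break
        else:
            i += 1  # no term starts here; move to the next token
    return spans


def precision_recall(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float]:
    # Only an exact span match counts as a true positive (an assumed, strict criterion).
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


# Toy data, invented for illustration only (not taken from MedDRA, UMLS, or the corpus).
TEXT = "Patients with type 2 diabetes developed acute renal failure and mild headache after treatment."
GOLD = {(2, 5), (6, 9), (11, 12)}  # type 2 diabetes, acute renal failure, headache
DICT_A = {"type 2 diabetes", "renal failure"}
DICT_B = {"acute renal failure", "headache"}

tokens = tokenize(TEXT)
hits_a = dictionary_match(tokens, DICT_A)
hits_b = dictionary_match(tokens, DICT_B)
combined = hits_a | hits_b  # union of the search results of both dictionaries

for name, hits in [("A", hits_a), ("B", hits_b), ("A+B", combined)]:
    p, r = precision_recall(hits, GOLD)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Running it prints precision/recall of 0.50/0.33 for A, 1.00/0.67 for B, and 0.75/1.00 for the union. The union keeps both the partial hit `(7, 9)` from A and the full hit `(6, 9)` from B; under exact matching the partial hit counts as a false positive, which is why combining dictionaries here trades some precision for full recall.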
Total citations
Citations per year, 2011-2024 (chart)