Authors
Ben Wellner, Matt Huyck, Scott Mardis, John Aberdeen, Alex Morgan, Leonid Peshkin, Alex Yeh, Janet Hitzeman, Lynette Hirschman
Publication date
2007/9/1
Journal
Journal of the American Medical Informatics Association
Volume
14
Issue
5
Pages
564-573
Publisher
BMJ Group
Description
Objective: This paper describes a successful approach to de-identification that was developed to participate in a recent AMIA-sponsored challenge evaluation.
Method: Our approach focused on rapid adaptation of two existing named entity recognition toolkits, Carafe and LingPipe.
Results: The “out of the box” Carafe system achieved a very good score (phrase F-measure of 0.9664) with only four hours of work to adapt it to the de-identification task. With further tuning, we were able to reduce the token-level error term by over 36% through task-specific feature engineering and the introduction of a lexicon, achieving a phrase F-measure of 0.9736.
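The phrase F-measure figures above can be illustrated with a minimal sketch. The abstract does not define the metric, so this assumes the conventional balanced F-measure over exact-match phrases, where a predicted phrase counts as correct only if its span and type both match a gold phrase. The function names, the `(start, end, type)` tuple layout, and the sample labels are hypothetical and are not taken from Carafe or LingPipe.

```python
def phrase_f_measure(gold, predicted):
    """Balanced F-measure over exact-match phrases (assumed definition)."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_error_reduction(f_before, f_after):
    """Fraction of the (1 - F) error removed; treating 1 - F as 'error' is an assumption."""
    return ((1 - f_before) - (1 - f_after)) / (1 - f_before)


# Hypothetical phrases as (start_token, end_token, type).
gold = {(0, 2, "PATIENT"), (5, 6, "DATE"), (9, 11, "HOSPITAL")}
pred = {(0, 2, "PATIENT"), (5, 6, "DATE"), (9, 10, "HOSPITAL")}  # last span is off by one

score = phrase_f_measure(gold, pred)

# Phrase-level figures reported in the abstract.
phrase_reduction = relative_error_reduction(0.9664, 0.9736)
```

Under this definition the boundary error in the last phrase costs both precision and recall. Note that `phrase_reduction` is computed from the phrase-level F-measures only; the 36% reduction reported above refers to a token-level error term, which cannot be reproduced from the figures in this abstract.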
Conclusions: We were able to achieve good performance on the de-identification task by the rapid retargeting of existing toolkits. For the Carafe system, we developed a method for tuning the balance of …
Total citations
[Citations-per-year chart, 2007 to 2024; per-year counts not recoverable from the extracted text]