Automatic detection of protected health information from clinical narratives

Authors

Hui Yang, Jonathan M Garibaldi

Publication date

2015/12/1

Journal

Journal of biomedical informatics

Volume

Pages

S30-S38

Publisher

Academic Press

Description

This paper presents a natural language processing (NLP) system designed for the 2014 i2b2 de-identification challenge. The challenge task is to identify and classify seven main Protected Health Information (PHI) categories and 25 associated sub-categories. We propose a hybrid model that combines machine learning techniques with keyword-based and rule-based approaches to deal with the complexity inherent in PHI categories. Our approach exploits a rich set of linguistic features, both syntactic and word surface-oriented, which are further enriched by task-specific features and regular expression template patterns to characterize the semantics of the various PHI categories. Our system achieved an overall micro-averaged F-measure of 93.6% on the challenge test data, making it the winning system in this de-identification challenge.
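
For readers unfamiliar with the reported metric, a micro-averaged F-measure pools true positives, false positives, and false negatives across all categories before computing precision and recall, so frequent categories weigh more than rare ones. The following is a minimal sketch under stated assumptions: it uses exact span matching, the function name `micro_f1` and the `(doc_id, start, end)` span representation are hypothetical, and the toy data is invented for illustration. It is not the challenge's official evaluation script.

```python
def micro_f1(gold, pred):
    """Micro-averaged F-measure over categories.

    gold, pred: dict mapping a category name to a set of
    (doc_id, start, end) spans. Exact span matching is an assumption
    of this sketch.
    """
    tp = fp = fn = 0
    for category in set(gold) | set(pred):
        g = gold.get(category, set())
        p = pred.get(category, set())
        tp += len(g & p)   # spans found in both gold and prediction
        fp += len(p - g)   # predicted spans that are not in gold
        fn += len(g - p)   # gold spans that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
gold = {
    "DATE": {("d1", 0, 10), ("d1", 20, 30)},
    "NAME": {("d1", 40, 45)},
}
pred = {
    "DATE": {("d1", 0, 10)},
    "NAME": {("d1", 40, 45), ("d1", 50, 55)},
}

score = micro_f1(gold, pred)
print(f"micro-averaged F-measure: {score:.3f}")
```

In the toy data the pooled counts are 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3.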

Total citations

Cited by 128

Citations per year: 2015: 3, 2016: 8, 2017: 15, 2018: 12, 2019: 14, 2020: 20, 2021: 19, 2022: 8, 2023: 19, 2024: 10
