TY - GEN
T1 - Automatic Annotation of Training Data for Deep Learning Based De-identification of Narrative Clinical Text
AU - Laursen, Martin Sundahl
AU - Pedersen, Jannik Skyttegaard
AU - Vinholt, Pernille
AU - Savarimuthu, Thiusius R.
PY - 2023
Y1 - 2023
N2 - Electronic health records contain information about patients’ medical history which is important for research but the text must be de-identified before use. This study utilized dictionaries constructed from publicly available lists of identifiers to automatically annotate a training dataset for a named entity recognition model to de-identify names, streets, and locations in Danish narrative clinical text. Ambiguous identifiers were not annotated if they occurred more than expected for an identifier. The model had recall 93.43%, precision 86.10%, and F1 89.62%. We found that the model generalized from the training data to achieve better performance than simply using the dictionaries to directly annotate text.
AB - Electronic health records contain information about patients’ medical history which is important for research but the text must be de-identified before use. This study utilized dictionaries constructed from publicly available lists of identifiers to automatically annotate a training dataset for a named entity recognition model to de-identify names, streets, and locations in Danish narrative clinical text. Ambiguous identifiers were not annotated if they occurred more than expected for an identifier. The model had recall 93.43%, precision 86.10%, and F1 89.62%. We found that the model generalized from the training data to achieve better performance than simply using the dictionaries to directly annotate text.
UR - https://ceur-ws.org/Vol-3416/
M3 - Article in proceedings
VL - 3416
SP - 30
EP - 44
BT - WNLPe-Health 2022
A2 - Hasanuzzaman, Mohammed
A2 - Prakash Singh, Jyoti
A2 - Dias, Gaël
A2 - Soguero-Ruiz, Cristina
A2 - Solvoll, Terje
A2 - Dinh Ngo, Phuong
PB - CEUR Workshop Proceedings
T2 - The First Workshop on Context-aware NLP in eHealth
Y2 - 15 December 2022 through 18 December 2022
ER -