Automatic Annotation of Training Data for Deep Learning Based De-identification of Narrative Clinical Text

Publication: Chapter in book/report/conference proceeding, Conference contribution in proceedings, Research, Peer review


Abstract

Electronic health records contain information about patients' medical history that is important for research, but the text must be de-identified before use. This study used dictionaries constructed from publicly available lists of identifiers to automatically annotate a training dataset for a named entity recognition model that de-identifies names, streets, and locations in Danish narrative clinical text. Ambiguous identifiers were left unannotated if they occurred in the text more often than would be expected for an identifier. The model achieved a recall of 93.43%, a precision of 86.10%, and an F1 score of 89.62%. We found that the model generalized from the training data and performed better than simply using the dictionaries to annotate the text directly.
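The abstract describes the automatic annotation step only at a high level. The Python sketch below illustrates the general idea under loudly stated assumptions, not the authors' implementation. The dictionaries are tiny hypothetical stand-ins for the public identifier lists, matching is single-token and case-insensitive, and "occurring more than expected for an identifier" is modelled as a hypothetical fixed count threshold (`max_expected_count`), which is not necessarily the criterion used in the paper. The helper names `find_ambiguous`, `annotate`, `prf` and `f1_from` are likewise illustrative. The last helper checks that the reported F1 is the harmonic mean of the reported precision and recall.

```python
from collections import Counter

# Hypothetical toy dictionaries standing in for the public identifier lists.
# "bo" is both a Danish first name (Bo) and the verb "to live".
DICTIONARIES = {
    "NAME": {"bo", "jensen", "mette"},
    "STREET": {"vestergade", "nørregade"},
    "LOCATION": {"odense", "aarhus"},
}


def find_ambiguous(corpus_tokens, dictionaries, max_expected_count):
    """Return dictionary entries seen more than `max_expected_count` times
    in the corpus (an assumed stand-in for "more than expected")."""
    counts = Counter(tok.lower() for tok in corpus_tokens)
    entries = set().union(*dictionaries.values())
    return {e for e in entries if counts[e] > max_expected_count}


def annotate(tokens, dictionaries, ambiguous):
    """Assign a BIO label to each token by case-insensitive dictionary lookup.

    Only single-token matches are handled, so every hit is a B- tag.
    Ambiguous entries are left as "O", i.e. not annotated."""
    labels = []
    for tok in tokens:
        key = tok.lower()
        label = "O"
        if key not in ambiguous:
            for category, entries in dictionaries.items():
                if key in entries:
                    label = "B-" + category
                    break
        labels.append(label)
    return labels


def prf(tp, fp, fn):
    """Precision, recall and F1 from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_from(precision, recall)


def f1_from(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    corpus = "Bo Jensen skal bo i Odense . Han vil bo på Vestergade og bo alene".split()
    ambiguous = find_ambiguous(corpus, DICTIONARIES, max_expected_count=2)
    print(ambiguous)  # {'bo'}
    print(annotate(["Bo", "Jensen", "bor", "i", "Odense"], DICTIONARIES, ambiguous))
    # ['O', 'B-NAME', 'O', 'O', 'B-LOCATION']
    print(round(f1_from(0.8610, 0.9343), 4))  # 0.8962
```

With the reported precision of 0.8610 and recall of 0.9343, `f1_from` gives about 0.8962, consistent with the stated F1 of 89.62%. A fixed count threshold is the simplest possible ambiguity filter; an expectation derived per entry (for example from how common a name is) would be closer to the wording of the abstract, but that too is an assumption.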
Original language: English
Title of host publication: WNLPe-Health 2022: Proceedings of The First Workshop on Context-aware NLP in eHealth (WNLPe-Health 2022)
Editors: Mohammed Hasanuzzaman, Jyoti Prakash Singh, Gaël Dias, Cristina Soguero-Ruiz, Terje Solvoll, Phuong Dinh Ngo
Volume: 3416
Publisher: CEUR Workshop Proceedings
Publication date: 2023
Pages: 30-44
Status: Published - 2023
Event: The First Workshop on Context-aware NLP in eHealth (WNLPe-Health 2022) - Delhi, India
Duration: 15 Dec 2022 to 18 Dec 2022

Conference

Conference: The First Workshop on Context-aware NLP in eHealth
Country/Territory: India
City: Delhi
Period: 15/12/2022 to 18/12/2022

Related project

  • IPJ - The Intelligent Patient Journal

    Savarimuthu, T. R. (Project participant), Pedersen, J. S. (Project participant), Laursen, M. S. (Project participant) & Vinholt, P. J. (Project participant)

    01/06/2020 to 01/07/2023

    Projects: Research project
