Unlocking the Potential of Electronic Health Records With Danish Clinical Language Models for Text Mining

Jannik Skyttegaard Pedersen

Research output: Thesis, Ph.D. thesis


This PhD dissertation focuses on developing language technology for extracting clinical information from Danish electronic health records (EHRs). EHRs contain important health-related information that can guide the treatment of patients. However, a large part of this information is stored in the unstructured narrative text of the EHR, making it difficult and time-consuming to extract the relevant details, especially in acute situations. Consequently, important information may be lost, which can increase the risk of misdiagnosis and adverse treatment outcomes.

The recent paradigm shift in the field of natural language processing (NLP), driven by self-supervised neural networks and the transformer architecture, has produced automatic text-processing tools with unprecedented performance. These tools could be used to extract and structure the information in the narrative text of EHRs automatically. However, language technology research has mostly focused on high-resource languages like English, while the development of Danish language technology has received less attention, especially for specialized domains such as the clinical domain.

This dissertation explores the potential of language technology to automatically extract information from the narrative text of Danish EHRs. Moreover, it emphasizes the importance of developing language resources tailored to the Danish clinical domain, as they can be used to enhance clinical research possibilities and improve patient treatment.

The dissertation covers the development of two Danish pre-trained language models, which show improved performance compared to existing Danish language models. Moreover, it explores the impact of dataset curation on potential biases in clinical language models. The dissertation also investigates how language models can be used to extract bleeding events from Danish EHRs and evaluates the performance of medical doctors in identifying relevant information when using the bleeding algorithm as an assistive tool. Finally, the dissertation presents a pre-trained language model that can be used to extract clinical information such as diseases, symptoms, and treatments from the narrative text of Danish EHRs.
Translated title of the contribution: Frigørelse af potentialet for Elektroniske Patientjournaler med Danske Kliniske Sprogmodeller til Tekstmining
Original language: English
Awarding Institution
  • University of Southern Denmark
Supervisors
  • Savarimuthu, Thiusius R., Principal supervisor
  • Vinholt, Pernille Just, Co-supervisor
Publication status: Published - 2 Nov 2023

Note re. dissertation

Print copy of the full thesis is restricted to reference use in the library.