Using electronic patient records to discover disease correlations and stratify patient cohorts

Francisco S Roque, Peter Bjødstrup Jensen, Henriette Schmock, Marlene Dalgaard, Massimo Andreatta, Thomas Hansen, Karen Søeby, Søren Bredkjær, Anders Juul, Thomas Werge, Lars J Jensen, Søren Brunak*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review


Electronic patient records remain a largely unexplored but potentially rich data source for discovering correlations between diseases. We describe a general approach for gathering phenotypic descriptions of patients from medical records in a systematic and non-cohort-dependent manner. By extracting phenotype information from the free text in such records, we demonstrate that we can extend the information contained in the structured record data and use it to produce fine-grained patient stratification and disease co-occurrence statistics. The approach uses a dictionary based on the International Classification of Diseases ontology and is therefore, in principle, language independent. As a use case, we show how records from a Danish psychiatric hospital lead to the identification of disease correlations, which can subsequently be mapped to systems biology frameworks.
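To make the idea of dictionary-based extraction concrete, here is a minimal Python sketch. It is not the authors' pipeline: the four-entry term dictionary, the record layout, and the function names `extract_codes`, `patient_profile`, and `cooccurrence` are all hypothetical, and the sketch ignores negation, misspellings, and any statistical testing of the resulting counts. It only illustrates how codes found in free text could extend the structured codes and feed a simple co-occurrence count.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy dictionary (surface term -> ICD-10 category code).
# A real dictionary would be derived from the full ICD terminology.
ICD_TERMS = {
    "schizophrenia": "F20",
    "depression": "F32",
    "obesity": "E66",
    "type 2 diabetes": "E11",
}


def extract_codes(note):
    """Return the set of ICD codes whose terms occur in a free-text note."""
    text = note.lower()
    return {
        code
        for term, code in ICD_TERMS.items()
        if re.search(r"\b" + re.escape(term) + r"\b", text)
    }


def patient_profile(structured_codes, notes):
    """Extend a patient's structured codes with codes mined from the notes."""
    profile = set(structured_codes)
    for note in notes:
        profile |= extract_codes(note)
    return profile


def cooccurrence(profiles):
    """Count, per unordered code pair, how many patients carry both codes."""
    pairs = Counter()
    for codes in profiles.values():
        pairs.update(combinations(sorted(codes), 2))
    return pairs


# Hypothetical records: (structured codes, free-text notes) per patient.
records = {
    "p1": ({"F20"}, ["Patient reports obesity and low mood."]),
    "p2": ({"F20"}, ["History of obesity; type 2 diabetes diagnosed 2005."]),
    "p3": ({"F32"}, ["No somatic complaints."]),
}
profiles = {pid: patient_profile(c, n) for pid, (c, n) in records.items()}
pair_counts = cooccurrence(profiles)
```

In this toy data the structured codes alone show no co-occurring diagnoses, whereas the text-derived codes reveal that two of the three patients carry both E66 and F20, so `pair_counts[("E66", "F20")]` is 2.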

Original language: English
Article number: e1002141
Journal: PLOS Computational Biology
Issue number: 8
Publication status: Published - Aug 2011
Externally published: Yes


  • Cluster Analysis
  • Cohort Studies
  • Comorbidity
  • Computational Biology
  • Data Collection
  • Data Mining
  • Electronic Health Records
  • Humans
  • International Classification of Diseases
  • Reproducibility of Results

