Linking glycemic dysregulation in diabetes to symptoms, comorbidities, and genetics through EHR data mining

Isa Kristina Kirk, Christian Simon, Karina Banasik, Peter Christoffer Holm, Amalie Dahl Haue, Peter Bjødstrup Jensen, Lars Juhl Jensen, Cristina Leal Rodríguez, Mette Krogh Pedersen, Robert Eriksson, Henrik Ullits Andersen, Thomas Almdal, Jette Bork-Jensen, Niels Grarup, Knut Borch-Johnsen, Oluf Pedersen, Flemming Pociot, Torben Hansen, Regine Bergholdt, Peter Rossing*, Søren Brunak

*Corresponding author for this work

Publication: Contribution to journal; Journal article; Research; peer-reviewed

Abstract

Diabetes is a diverse and complex disease, with considerable variation in phenotypic manifestation and severity. This variation hampers the study of etiological differences and reduces the statistical power of analyses of associations to genetics, treatment outcomes, and complications. We address these issues through deep, fine-grained phenotypic stratification of a diabetes cohort. Text mining the electronic health records of 14,017 patients, we matched two controlled vocabularies (ICD-10 and a custom vocabulary developed at the clinical center Steno Diabetes Center Copenhagen) to clinical narratives spanning a 19-year period. The two matched vocabularies comprise over 20,000 medical terms describing symptoms, other diagnoses, and lifestyle factors. The cohort is genetically homogeneous (Caucasian diabetes patients from Denmark), so the resulting stratification is not driven by ethnic differences, but rather by inherently dissimilar progression patterns and lifestyle-related risk factors. Using unsupervised Markov clustering, we defined 71 clusters of at least 50 individuals within the diabetes spectrum. The clusters display both distinct and shared longitudinal glycemic dysregulation patterns, temporal co-occurrences of comorbidities, and associations to single nucleotide polymorphisms in or near genes relevant for diabetes comorbidities.
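The abstract names unsupervised Markov clustering but gives no parameters or input format. As a rough illustration only, and assuming the term refers to the standard MCL scheme of alternating expansion and inflation, the idea can be sketched on a toy patient-similarity matrix. The function names, the inflation value, the iteration count, and the six-patient matrix are all hypothetical and are not taken from the article.

```python
def normalize_columns(m):
    """Scale every column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def expand(m):
    """Expansion: square the matrix, i.e. take two-step random walks."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation: raise entries to a power, then renormalize the columns."""
    return normalize_columns([[x ** power for x in row] for row in m])


def markov_cluster(similarity, inflation=2.0, iterations=50):
    """Return clusters (sorted lists of node indices) for a similarity matrix.

    inflation and iterations are illustrative defaults, not values from the paper.
    """
    n = len(similarity)
    # Add self-loops so that no column is empty before normalizing.
    m = normalize_columns(
        [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Rows that keep probability mass act as attractors; the columns they
    # cover form one cluster. Identical member sets are merged.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical similarity between six patients: two tight groups joined
# by one weak link between patient 2 and patient 3.
toy_similarity = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_cluster(toy_similarity))
```

A higher inflation value generally yields more, smaller clusters. In a setting like the one described, the similarity matrix would presumably be derived from the vocabulary terms matched in each patient's records, but that step is not specified here.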

Original language: English
Article number: e44941
Journal: eLife
Volume: 8
Number of pages: 19
ISSN: 2050-084X
DOI: https://doi.org/10.7554/eLife.44941
Status: Published - 10 Dec 2019

Fingerprint

Data Mining
Vocabulary
Medical problems
Comorbidity
Controlled Vocabulary
Electronic Health Records
International Classification of Diseases
Denmark
Single Nucleotide Polymorphism
Cluster Analysis
Thesauri
Polymorphism
Genetics
Nucleotides
Genes
Health

Cite this

Kirk, Isa Kristina ; Simon, Christian ; Banasik, Karina ; Holm, Peter Christoffer ; Haue, Amalie Dahl ; Jensen, Peter Bjødstrup ; Juhl Jensen, Lars ; Rodríguez, Cristina Leal ; Pedersen, Mette Krogh ; Eriksson, Robert ; Andersen, Henrik Ullits ; Almdal, Thomas ; Bork-Jensen, Jette ; Grarup, Niels ; Borch-Johnsen, Knut ; Pedersen, Oluf ; Pociot, Flemming ; Hansen, Torben ; Bergholdt, Regine ; Rossing, Peter ; Brunak, Søren. / Linking glycemic dysregulation in diabetes to symptoms, comorbidities, and genetics through EHR data mining. In: eLife. 2019 ; Vol. 8.
@article{a3ef77ef45f6452aa3cc3d2893902db0,
title = "Linking glycemic dysregulation in diabetes to symptoms, comorbidities, and genetics through EHR data mining",
abstract = "Diabetes is a diverse and complex disease, with considerable variation in phenotypic manifestation and severity. This variation hampers the study of etiological differences and reduces the statistical power of analyses of associations to genetics, treatment outcomes, and complications. We address these issues through deep, fine-grained phenotypic stratification of a diabetes cohort. Text mining the electronic health records of 14,017 patients, we matched two controlled vocabularies (ICD-10 and a custom vocabulary developed at the clinical center Steno Diabetes Center Copenhagen) to clinical narratives spanning a 19 year period. The two matched vocabularies comprise over 20,000 medical terms describing symptoms, other diagnoses, and lifestyle factors. The cohort is genetically homogeneous (Caucasian diabetes patients from Denmark) so the resulting stratification is not driven by ethnic differences, but rather by inherently dissimilar progression patterns and lifestyle related risk factors. Using unsupervised Markov clustering, we defined 71 clusters of at least 50 individuals within the diabetes spectrum. The clusters display both distinct and shared longitudinal glycemic dysregulation patterns, temporal co-occurrences of comorbidities, and associations to single nucleotide polymorphisms in or near genes relevant for diabetes comorbidities.",
keywords = "comorbidities, computational biology, diabetes, diabetes subtypes, EHR, epidemiology, genotyping, global health, human, systems biology, text mining",
author = "Kirk, {Isa Kristina} and Christian Simon and Karina Banasik and Holm, {Peter Christoffer} and Haue, {Amalie Dahl} and Jensen, {Peter Bj{\o}dstrup} and {Juhl Jensen}, Lars and Rodr{\'i}guez, {Cristina Leal} and Pedersen, {Mette Krogh} and Robert Eriksson and Andersen, {Henrik Ullits} and Thomas Almdal and Jette Bork-Jensen and Niels Grarup and Knut Borch-Johnsen and Oluf Pedersen and Flemming Pociot and Torben Hansen and Regine Bergholdt and Peter Rossing and S{\o}ren Brunak",
year = "2019",
month = "12",
day = "10",
doi = "10.7554/eLife.44941",
language = "English",
volume = "8",
journal = "eLife",
issn = "2050-084X",
publisher = "eLife Sciences Publications Ltd.",

}

Kirk, IK, Simon, C, Banasik, K, Holm, PC, Haue, AD, Jensen, PB, Juhl Jensen, L, Rodríguez, CL, Pedersen, MK, Eriksson, R, Andersen, HU, Almdal, T, Bork-Jensen, J, Grarup, N, Borch-Johnsen, K, Pedersen, O, Pociot, F, Hansen, T, Bergholdt, R, Rossing, P & Brunak, S 2019, 'Linking glycemic dysregulation in diabetes to symptoms, comorbidities, and genetics through EHR data mining', eLife, vol. 8, e44941. https://doi.org/10.7554/eLife.44941

Linking glycemic dysregulation in diabetes to symptoms, comorbidities, and genetics through EHR data mining. / Kirk, Isa Kristina; Simon, Christian; Banasik, Karina; Holm, Peter Christoffer; Haue, Amalie Dahl; Jensen, Peter Bjødstrup; Juhl Jensen, Lars; Rodríguez, Cristina Leal; Pedersen, Mette Krogh; Eriksson, Robert; Andersen, Henrik Ullits; Almdal, Thomas; Bork-Jensen, Jette; Grarup, Niels; Borch-Johnsen, Knut; Pedersen, Oluf; Pociot, Flemming; Hansen, Torben; Bergholdt, Regine; Rossing, Peter; Brunak, Søren.

In: eLife, Vol. 8, e44941, 10.12.2019.

TY - JOUR

T1 - Linking glycemic dysregulation in diabetes to symptoms, comorbidities, and genetics through EHR data mining

AU - Kirk, Isa Kristina

AU - Simon, Christian

AU - Banasik, Karina

AU - Holm, Peter Christoffer

AU - Haue, Amalie Dahl

AU - Jensen, Peter Bjødstrup

AU - Juhl Jensen, Lars

AU - Rodríguez, Cristina Leal

AU - Pedersen, Mette Krogh

AU - Eriksson, Robert

AU - Andersen, Henrik Ullits

AU - Almdal, Thomas

AU - Bork-Jensen, Jette

AU - Grarup, Niels

AU - Borch-Johnsen, Knut

AU - Pedersen, Oluf

AU - Pociot, Flemming

AU - Hansen, Torben

AU - Bergholdt, Regine

AU - Rossing, Peter

AU - Brunak, Søren

PY - 2019/12/10

Y1 - 2019/12/10

AB - Diabetes is a diverse and complex disease, with considerable variation in phenotypic manifestation and severity. This variation hampers the study of etiological differences and reduces the statistical power of analyses of associations to genetics, treatment outcomes, and complications. We address these issues through deep, fine-grained phenotypic stratification of a diabetes cohort. Text mining the electronic health records of 14,017 patients, we matched two controlled vocabularies (ICD-10 and a custom vocabulary developed at the clinical center Steno Diabetes Center Copenhagen) to clinical narratives spanning a 19 year period. The two matched vocabularies comprise over 20,000 medical terms describing symptoms, other diagnoses, and lifestyle factors. The cohort is genetically homogeneous (Caucasian diabetes patients from Denmark) so the resulting stratification is not driven by ethnic differences, but rather by inherently dissimilar progression patterns and lifestyle related risk factors. Using unsupervised Markov clustering, we defined 71 clusters of at least 50 individuals within the diabetes spectrum. The clusters display both distinct and shared longitudinal glycemic dysregulation patterns, temporal co-occurrences of comorbidities, and associations to single nucleotide polymorphisms in or near genes relevant for diabetes comorbidities.

KW - comorbidities

KW - computational biology

KW - diabetes

KW - diabetes subtypes

KW - EHR

KW - epidemiology

KW - genotyping

KW - global health

KW - human

KW - systems biology

KW - text mining

U2 - 10.7554/eLife.44941

DO - 10.7554/eLife.44941

M3 - Journal article

C2 - 31818369

AN - SCOPUS:85076272755

VL - 8

JO - eLife

JF - eLife

SN - 2050-084X

M1 - e44941

ER -