The clinical potential of artificial intelligence in early detection of lung cancer

Margrethe Høstgaard Bang Henriksen

Research output: Thesis - Ph.D. thesis

Abstract

Lung cancer (LC) is currently the leading cause of cancer-related deaths, which makes early detection critical, since it is a prerequisite for curative treatment. While screening for LC is gradually being introduced through pilot studies in various countries, discussions persist regarding the optimal selection criteria. Numerous studies have highlighted the superiority of individual prediction models over the widely used categorical criteria based solely on age and smoking intensity.

The overall aim of this thesis was to explore and refine LC detection models based on artificial intelligence (AI) utilizing data obtained from clinical health records and registries. The issue was addressed from several angles, resulting in the incorporation of five articles in this thesis.

The first four studies revolved around data derived from a high-risk cohort of patients evaluated in the LC fast-track clinics in the Region of Southern Denmark. Extensive clinical and laboratory data were collected from this cohort of nearly 40,000 individuals, 25% of whom were LC patients. Associations between data variables and LC status were examined in Article I, laying the groundwork for subsequent prediction models. The initial findings led to the use of smoking and laboratory data in the development of prediction models employing both a machine learning (ML) approach (Article II) and a Bayesian Network (BN) approach (Article III). The ML model performed similarly to the BN model, with a mean area under the receiver operating characteristic (ROC) curve (AUC) of 0.77 compared to 0.76, and both achieved a sensitivity of 21% at a fixed specificity of 95%. The ML model identified smoking status, lactate dehydrogenase, age, and plasma calcium levels as the most important factors for detection of LC. The BN model's performance remained robust when missing data were introduced (up to 30%), a notable advantage when working with clinical data.
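
To make the two reported metrics concrete, the following is a minimal sketch of how an AUC and a sensitivity at a fixed specificity can be computed from predicted risk scores. It is not the code used in the thesis: the function names `auc_rank` and `sensitivity_at_specificity` and the toy scores are hypothetical, chosen only for illustration, and the numbers have no relation to the study data.

```python
def auc_rank(scores, labels):
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(scores, labels, target_specificity=0.95):
    """Highest sensitivity over all thresholds whose specificity is at
    least the target. A case is called positive when score > threshold."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    # Every observed score is a candidate threshold; the largest one
    # always yields specificity 1.0, so at least one candidate is valid.
    for t in sorted(set(scores)):
        specificity = sum(n <= t for n in neg) / len(neg)
        if specificity >= target_specificity:
            sensitivity = sum(p > t for p in pos) / len(pos)
            best = max(best, sensitivity)
    return best


# Hypothetical toy data: 1 = LC, 0 = no LC; scores are predicted risks.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.1, 0.05]

toy_auc = auc_rank(toy_scores, toy_labels)
toy_sens = sensitivity_at_specificity(toy_scores, toy_labels, 0.95)
```

Fixing the specificity first, as in the abstract, pins the false-positive rate at a level acceptable for screening and then asks how many true cases the model still catches at that operating point.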

Additional data types such as symptoms at diagnosis, comorbidities, and medication were integrated into an expanded BN model, investigating whether a more comprehensive dataset could enhance model performance (Article IV). The best-performing model achieved an AUC of 0.79 and was developed using comorbidity, laboratory results, and smoking data on a relatively large dataset with 21% missing variables. Additionally, a model developed on a small but complete dataset proved to be stable when applied to larger datasets with up to 39% missing data, indicating its applicability in individuals with incomplete data. While laboratory results and smoking status were the strongest predictors of LC, comorbidity (including data on medication and data from general practice) and symptoms at diagnosis appeared to be the least informative. 

While the first four studies focused solely on high-risk patients, we aimed to extrapolate these findings to a potentially lower-risk population eligible for LC screening. Therefore, we assessed the risk of LC and the overlap with LC fast-track clinics among chronic obstructive pulmonary disease (COPD) outpatients (Article V). Within this cohort, we observed a 5% risk of LC, surpassing the risk in the general population more than tenfold. Importantly, LC patients with COPD were diagnosed at an earlier stage than LC patients without COPD. Additionally, 18% of COPD outpatients were referred to LC diagnostics at some point. While this high referral rate may be due to increased medical attention, it suggests potential benefits of a regular and systematic screening approach for these patients.

The insights and methodology outlined in this thesis serve as foundational elements for our ongoing research, which aims to integrate risk models into a practical clinical screening context. A highly effective model capable of early-stage LC prediction will enhance screening efficacy and promote early detection, ultimately leading to improved survival rates.

Translated title of the contribution: Tidlig opsporing af lungekræft ved hjælp af kunstig intelligens (Early detection of lung cancer using artificial intelligence)
Original language: English
Awarding Institution
  • University of Southern Denmark
Supervisors/Advisors
  • Hansen, Torben Frøstrup, Principal supervisor
  • Hilberg, Ole, Co-supervisor
  • Lohman Brasen, Claus, Co-supervisor
  • Jensen, Lars Henrik, Co-supervisor
Date of defence: 20 Feb 2025
Publication status: Published - 27 Jan 2025

Note re. dissertation

Print copy of the full thesis is restricted to reference use in the library.

Keywords

  • Lung cancer
  • early detection
  • prediction models
  • machine learning
  • bayesian networks
  • artificial intelligence
  • screening
  • screening models

Activities

  • Ph.d. cup participant

    Høstgaard Bang Henriksen, M. (Guest lecturer)

    2024

    Activity: Talks and presentations - Talks and presentations in private or public companies

  • DCCC Wrap up seminar

    Høstgaard Bang Henriksen, M. (Guest lecturer)

    2023

    Activity: Talks and presentations - Conference presentations

  • Flash talk

    Høstgaard Bang Henriksen, M. (Guest lecturer)

    2023 → …

    Activity: Talks and presentations - Talks and presentations in private or public companies