PURPOSE: To develop a predictive model based on Danish administrative registers to facilitate automated identification of individuals at risk of any type of cancer.
METHODS: A nationwide register-based cohort study covering all individuals in Denmark aged +20 years. The outcome was all-type cancer during 2017 excluding nonmelanoma skin cancer. Diagnoses, medication, and contact with general practitioners in the exposure period (2007-2016) were considered for the predictive model. We applied backward selection to all variables by logistic regression to develop a risk model for cancer. We applied the models to the validation cohort, calculated the receiver operating characteristic curves, and estimated the corresponding areas under the curve (AUC).
RESULTS: The study population consisted of 4.2 million persons; 32,447 (0.76%) were diagnosed with cancer in 2017. We identified 39 predictive risk factors in women and 42 in men, with age above 30 as the strongest predictor for cancer. Testing the model for cancer risk showed modest accuracy, with an AUC of 0.82 (95% CI 0.81-0.82) for men and 0.75 (95% CI 0.74-0.75) for women.
CONCLUSION: We have developed and tested a model for identifying the individual risk of cancer through the use of administrative data. The models need to be further investigated before being applied to clinical practice.