PURPOSE: Prognostic models for diffuse large B-cell lymphoma (DLBCL), such as the International Prognostic Index (IPI), are widely used in clinical practice. The models are typically developed with simplicity in mind and thus do not exploit the full potential of detailed clinical data. This study investigated whether nationwide lymphoma registries containing clinical data, combined with machine learning techniques, could be used to build modern prognostic tools.
PATIENTS AND METHODS: This study was based on nationwide lymphoma registries from Denmark and Sweden, which include large amounts of clinicopathologic data. Using the Danish DLBCL cohort, a stacking approach was used to build a new prognostic model that leverages the strengths of different survival models. To compare the performance of the stacking approach with established prognostic models, cross-validation was used to estimate the concordance index (C-index), time-varying area under the curve, and integrated Brier score. Finally, the generalizability was tested by applying the new model to the Swedish cohort.
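To make the discrimination metric concrete, the following is a minimal sketch of Harrell's C-index for right-censored survival data. It is an illustration only, not the study's implementation: the function name `concordance_index`, its arguments, and the toy data are hypothetical, and pairs with tied survival times are skipped for simplicity.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times       -- observed follow-up times
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification (assumption): ignore pairs tied in time
        # Let a be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk was assigned to the earlier event
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the second one censored.
toy_times = [5, 8, 12, 20]
toy_events = [1, 0, 1, 1]
c_perfect = concordance_index(toy_times, toy_events, [0.9, 0.4, 0.6, 0.1])
c_mixed = concordance_index(toy_times, toy_events, [0.9, 0.4, 0.1, 0.6])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk; the index measures discrimination only, which is why a calibration-sensitive measure such as the integrated Brier score is reported alongside it.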
RESULTS: In total, 2,759 and 2,414 patients were included from the Danish and Swedish cohorts, respectively. In the Danish cohort, the stacking approach led to the lowest integrated Brier score, indicating that the survival curves obtained from the stacking model fitted the observed survival the best. The C-index and time-varying area under the curve indicated that the stacked model (C-index: Denmark [DK], 0.756; Sweden [SE], 0.744) had better discriminative ability than the other prognostic models considered (IPI: DK, 0.662; SE, 0.661; and National Comprehensive Cancer Network-IPI: DK, 0.681; SE, 0.681). Furthermore, these results were reproducible in the independent Swedish cohort.
CONCLUSION: A new prognostic model based on machine learning techniques was developed and was shown to significantly outperform established prognostic indices for DLBCL. The model is available at https://lymphomapredictor.org.