TY - JOUR
T1 - Differentiation of COVID-19 pneumonia from other lung diseases using CT radiomic features and machine learning: A large multicentric cohort study
AU - Shiri, Isaac
AU - Salimi, Yazdan
AU - Saberi, Abdollah
AU - Pakbin, Masoumeh
AU - Hajianfar, Ghasem
AU - Avval, Atlas Haddadi
AU - Sanaat, Amirhossein
AU - Akhavanallaf, Azadeh
AU - Mostafaei, Shayan
AU - Mansouri, Zahra
AU - Askari, Dariush
AU - Ghasemian, Mohammadreza
AU - Sharifipour, Ehsan
AU - Sandoughdaran, Saleh
AU - Sohrabi, Ahmad
AU - Sadati, Elham
AU - Livani, Somayeh
AU - Iranpour, Pooya
AU - Kolahi, Shahriar
AU - Khosravi, Bardia
AU - Khateri, Maziar
AU - Bijari, Salar
AU - Atashzar, Mohammad Reza
AU - Shayesteh, Sajad P.
AU - Babaei, Mohammad Reza
AU - Jenabi, Elnaz
AU - Hasanian, Mohammad
AU - Shahhamzeh, Alireza
AU - Ghomi, Seyed Yaser Foroghi
AU - Mozafari, Abolfazl
AU - Shirzad-Aski, Hesamaddin
AU - Movaseghi, Fatemeh
AU - Bozorgmehr, Rama
AU - Goharpey, Neda
AU - Abdollahi, Hamid
AU - Geramifar, Parham
AU - Radmard, Amir Reza
AU - Arabi, Hossein
AU - Rezaei-Kalantari, Kiara
AU - Oveisi, Mehrdad
AU - Rahmim, Arman
AU - Zaidi, Habib
PY - 2024/3
Y1 - 2024/3
AB - This study aimed to derive and validate an effective machine learning and radiomics-based model to differentiate COVID-19 pneumonia from other lung diseases using a large multicentric dataset. In this retrospective study, we collected 19 private and five public datasets of chest CT images, totaling 26 307 images (15 148 COVID-19; 9657 other lung diseases, including non-COVID-19 pneumonia, lung cancer, and pulmonary embolism; 1502 normal cases). We tested 96 machine learning-based models by cross-combining four feature selectors (FSs) and eight dimensionality reduction techniques with eight classifiers. We trained and evaluated our models using three strategies: #1, the whole dataset (15 148 COVID-19 and 11 159 others); #2, a subset excluding healthy individuals and COVID-19 patients without RT-PCR results (12 419 COVID-19 and 8278 others); and #3, only non-COVID-19 pneumonia patients and a random sample of COVID-19 patients (3000 COVID-19 and 2582 others) to provide balanced classes. The best models were selected using the one-standard-deviation rule in 10-fold cross-validation and evaluated on the held-out test sets for reporting. In strategy #1, the Relief FS combined with the random forest (RF) classifier achieved the highest performance (accuracy = 0.96, AUC = 0.99, sensitivity = 0.98, specificity = 0.94, PPV = 0.96, NPV = 0.96). In strategy #2, the Recursive Feature Elimination (RFE) FS combined with the RF classifier achieved the highest performance (accuracy = 0.97, AUC = 0.99, sensitivity = 0.98, specificity = 0.95, PPV = 0.96, NPV = 0.98). Finally, in strategy #3, the ANOVA FS combined with the RF classifier achieved the highest performance (accuracy = 0.94, AUC = 0.98, sensitivity = 0.96, specificity = 0.93, PPV = 0.93, NPV = 0.96). Lung radiomic features combined with machine learning algorithms can enable effective diagnosis of COVID-19 pneumonia on CT images without the use of additional tests.
KW - computed tomography
KW - COVID-19
KW - differential diagnosis
KW - machine learning
KW - radiomics
U2 - 10.1002/ima.23028
DO - 10.1002/ima.23028
M3 - Journal article
AN - SCOPUS:85184209029
SN - 0899-9457
VL - 34
JO - International Journal of Imaging Systems and Technology
JF - International Journal of Imaging Systems and Technology
IS - 2
M1 - e23028
ER -