TY - JOUR
T1 - Assessment of image quality on the diagnostic performance of clinicians and deep learning models
T2 - Cross-sectional comparative reader study
AU - Oloruntoba, A. I.
AU - Asghari-Jafarabadi, M.
AU - Sashindranath, M.
AU - Ingvar
AU - Adler, N. R.
AU - Vico-Alonso, C.
AU - Niklasson, L.
AU - Caixinha, A. L.
AU - Hiscutt, E.
AU - Holmes, Z.
AU - Assersen, K. B.
AU - Adamson, S.
AU - Jegathees, T.
AU - Bertelsen, T.
AU - Velasco-Tamariz, V.
AU - Helkkula, T.
AU - Kristiansen, S.
AU - Toholka, R.
AU - Goh, M. S.
AU - Chamberlain, A.
AU - McCormack, C.
AU - Vestergaard, T.
AU - Mehta, D.
AU - Nguyen, T. D.
AU - Ge, Z.
AU - Soyer, H. P.
AU - Mar, V.
PY - 2024/12/10
Y1 - 2024/12/10
N2 - Background: Skin cancer is a prevalent and clinically significant condition, and early, accurate diagnosis is crucial for improved patient outcomes. Dermoscopy and artificial intelligence (AI) hold promise for enhancing diagnostic accuracy. However, the impact of image quality, particularly high dynamic range (HDR) conversion in smartphone images, on diagnostic performance remains poorly understood. Objective: This study aimed to investigate the effect of varying image quality, including HDR-enhanced dermoscopic images, on the diagnostic capabilities of clinicians and a convolutional neural network (CNN) model. Methods: Eighteen dermatology clinicians assessed 303 images of 101 skin lesions, categorized into three image quality groups: low quality (LQ), high quality (HQ) and enhanced quality (EQ), the last produced using HDR-style conversion. Clinicians participated in a two-part reader study that required a diagnosis, a management decision and a confidence level for each image assessed. Results: In the binary classification of lesions, clinicians performed best with HQ images, achieving a sensitivity of 77.3% (CI 69.1–85.5), a specificity of 63.1% (CI 53.7–72.5) and an accuracy of 70.2% (CI 61.3–79.1). For the multiclass classification, overall performance was also best with HQ images, which yielded the greatest specificity (91.9%; CI 83.2–95.0) and accuracy (51.5%; CI 48.4–54.7). Clinicians outperformed the CNN model (by median correct diagnoses) in the binary classification of LQ and EQ images, but performance was comparable on HQ images. However, in the multiclass classification, the CNN model significantly outperformed the clinicians on HQ images (p < 0.01). Conclusion: This study highlights the importance of image quality for the diagnostic performance of clinicians and deep learning models. This has significant implications for telehealth reporting and triage.
AB - Background: Skin cancer is a prevalent and clinically significant condition, and early, accurate diagnosis is crucial for improved patient outcomes. Dermoscopy and artificial intelligence (AI) hold promise for enhancing diagnostic accuracy. However, the impact of image quality, particularly high dynamic range (HDR) conversion in smartphone images, on diagnostic performance remains poorly understood. Objective: This study aimed to investigate the effect of varying image quality, including HDR-enhanced dermoscopic images, on the diagnostic capabilities of clinicians and a convolutional neural network (CNN) model. Methods: Eighteen dermatology clinicians assessed 303 images of 101 skin lesions, categorized into three image quality groups: low quality (LQ), high quality (HQ) and enhanced quality (EQ), the last produced using HDR-style conversion. Clinicians participated in a two-part reader study that required a diagnosis, a management decision and a confidence level for each image assessed. Results: In the binary classification of lesions, clinicians performed best with HQ images, achieving a sensitivity of 77.3% (CI 69.1–85.5), a specificity of 63.1% (CI 53.7–72.5) and an accuracy of 70.2% (CI 61.3–79.1). For the multiclass classification, overall performance was also best with HQ images, which yielded the greatest specificity (91.9%; CI 83.2–95.0) and accuracy (51.5%; CI 48.4–54.7). Clinicians outperformed the CNN model (by median correct diagnoses) in the binary classification of LQ and EQ images, but performance was comparable on HQ images. However, in the multiclass classification, the CNN model significantly outperformed the clinicians on HQ images (p < 0.01). Conclusion: This study highlights the importance of image quality for the diagnostic performance of clinicians and deep learning models. This has significant implications for telehealth reporting and triage.
U2 - 10.1111/jdv.20462
DO - 10.1111/jdv.20462
M3 - Journal article
C2 - 39655640
AN - SCOPUS:85211441263
SN - 0926-9959
JO - Journal of the European Academy of Dermatology and Venereology
JF - Journal of the European Academy of Dermatology and Venereology
ER -