Deep Learning–Based Algorithms in Screening of Diabetic Retinopathy

A Systematic Review of Diagnostic Performance

Katrine B. Nielsen, Mie L. Lautrup, Jakob K.H. Andersen, Thiusius R. Savarimuthu, Jakob Grauslund*

*Corresponding author for this work

Publication: Contribution to journal, Review, Research, peer-review

Abstract

Topic: Diagnostic performance of deep learning–based algorithms in screening patients with diabetes for diabetic retinopathy (DR). The algorithms were compared with the current gold standard of classification by human specialists.

Clinical Relevance: Because DR is a common cause of visual impairment, screening is indicated to avoid irreversible vision loss. Automated DR classification using deep learning may be a suitable new screening tool that could improve diagnostic performance and reduce manpower.

Methods: For this systematic review, we aimed to identify studies that used deep learning to classify full-scale DR in retinal fundus images of patients with diabetes. The studies had to provide a DR grading scale, a human grader as the reference standard, and a deep learning performance score. A systematic search of MEDLINE and Embase on April 5, 2018, yielded 304 publications. To identify potentially missed publications, the reference lists of the final included studies were screened manually, yielding no additional publications. The Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool was used to assess risk of bias and applicability.

Results: Using objective selection criteria, we included 11 diagnostic accuracy studies that validated the performance of their deep learning method on a new group of patients or on retrospective datasets. Eight studies reported sensitivities of 80.28% to 100.0% and specificities of 84.0% to 99.0%. Two studies reported accuracies of 78.7% and 81.0%, and one study reported an area under the receiver operating characteristic curve of 0.955. In addition to diagnostic performance, one study reported on patient satisfaction: 78% of patients preferred automated grading by a deep learning model over manual grading by humans.

Conclusions: Advantages of implementing deep learning–based algorithms in DR screening include reduced manpower, lower screening costs, and fewer issues related to intragrader and intergrader variability. However, limitations that may hinder such an implementation revolve particularly around ethical concerns regarding a lack of trust in the diagnostic accuracy of computers. Considering both the strengths and the limitations, as well as the high performance of deep learning–based algorithms, automated DR classification using deep learning could be feasible in a real-world screening scenario.
Original language: English
Journal: Ophthalmology Retina
Volume: 3
Issue number: 4
Pages (from-to): 294-304
Number of pages: 11
ISSN: 2468-6530
DOI: 10.1016/j.oret.2018.10.014
Status: Published, April 2019

Cite this

@article{f7991c1cbc784762ae12f82abf1027d9,
title = "Deep Learning–Based Algorithms in Screening of Diabetic Retinopathy: A Systematic Review of Diagnostic Performance",
abstract = "Topic: Diagnostic performance of deep learning–based algorithms in screening patients with diabetes for diabetic retinopathy (DR). The algorithms were compared with the current gold standard of classification by human specialists. Clinical Relevance: Because DR is a common cause of visual impairment, screening is indicated to avoid irreversible vision loss. Automated DR classification using deep learning may be a suitable new screening tool that could improve diagnostic performance and reduce manpower. Methods: For this systematic review, we aimed to identify studies that used deep learning to classify full-scale DR in retinal fundus images of patients with diabetes. The studies had to provide a DR grading scale, a human grader as the reference standard, and a deep learning performance score. A systematic search of MEDLINE and Embase on April 5, 2018, yielded 304 publications. To identify potentially missed publications, the reference lists of the final included studies were screened manually, yielding no additional publications. The Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool was used to assess risk of bias and applicability. Results: Using objective selection criteria, we included 11 diagnostic accuracy studies that validated the performance of their deep learning method on a new group of patients or on retrospective datasets. Eight studies reported sensitivities of 80.28{\%} to 100.0{\%} and specificities of 84.0{\%} to 99.0{\%}. Two studies reported accuracies of 78.7{\%} and 81.0{\%}, and one study reported an area under the receiver operating characteristic curve of 0.955. In addition to diagnostic performance, one study reported on patient satisfaction: 78{\%} of patients preferred automated grading by a deep learning model over manual grading by humans.
Conclusions: Advantages of implementing deep learning–based algorithms in DR screening include reduced manpower, lower screening costs, and fewer issues related to intragrader and intergrader variability. However, limitations that may hinder such an implementation revolve particularly around ethical concerns regarding a lack of trust in the diagnostic accuracy of computers. Considering both the strengths and the limitations, as well as the high performance of deep learning–based algorithms, automated DR classification using deep learning could be feasible in a real-world screening scenario.",
author = "Nielsen, {Katrine B.} and Lautrup, {Mie L.} and Andersen, {Jakob K.H.} and Savarimuthu, {Thiusius R.} and Grauslund, Jakob",
year = "2019",
month = "4",
doi = "10.1016/j.oret.2018.10.014",
language = "English",
volume = "3",
pages = "294--304",
journal = "Ophthalmology Retina",
issn = "2468-6530",
number = "4",

}

Deep Learning–Based Algorithms in Screening of Diabetic Retinopathy: A Systematic Review of Diagnostic Performance. / Nielsen, Katrine B.; Lautrup, Mie L.; Andersen, Jakob K.H.; Savarimuthu, Thiusius R.; Grauslund, Jakob.

In: Ophthalmology Retina, Vol. 3, No. 4, 04.2019, pp. 294-304.


TY  - JOUR
T1  - Deep Learning–Based Algorithms in Screening of Diabetic Retinopathy: A Systematic Review of Diagnostic Performance
AU  - Nielsen, Katrine B.
AU  - Lautrup, Mie L.
AU  - Andersen, Jakob K.H.
AU  - Savarimuthu, Thiusius R.
AU  - Grauslund, Jakob
PY  - 2019/4
Y1  - 2019/4
AB  - Topic: Diagnostic performance of deep learning–based algorithms in screening patients with diabetes for diabetic retinopathy (DR). The algorithms were compared with the current gold standard of classification by human specialists. Clinical Relevance: Because DR is a common cause of visual impairment, screening is indicated to avoid irreversible vision loss. Automated DR classification using deep learning may be a suitable new screening tool that could improve diagnostic performance and reduce manpower. Methods: For this systematic review, we aimed to identify studies that used deep learning to classify full-scale DR in retinal fundus images of patients with diabetes. The studies had to provide a DR grading scale, a human grader as the reference standard, and a deep learning performance score. A systematic search of MEDLINE and Embase on April 5, 2018, yielded 304 publications. To identify potentially missed publications, the reference lists of the final included studies were screened manually, yielding no additional publications. The Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool was used to assess risk of bias and applicability. Results: Using objective selection criteria, we included 11 diagnostic accuracy studies that validated the performance of their deep learning method on a new group of patients or on retrospective datasets. Eight studies reported sensitivities of 80.28% to 100.0% and specificities of 84.0% to 99.0%. Two studies reported accuracies of 78.7% and 81.0%, and one study reported an area under the receiver operating characteristic curve of 0.955. In addition to diagnostic performance, one study reported on patient satisfaction: 78% of patients preferred automated grading by a deep learning model over manual grading by humans. Conclusions: Advantages of implementing deep learning–based algorithms in DR screening include reduced manpower, lower screening costs, and fewer issues related to intragrader and intergrader variability. However, limitations that may hinder such an implementation revolve particularly around ethical concerns regarding a lack of trust in the diagnostic accuracy of computers. Considering both the strengths and the limitations, as well as the high performance of deep learning–based algorithms, automated DR classification using deep learning could be feasible in a real-world screening scenario.
U2  - 10.1016/j.oret.2018.10.014
DO  - 10.1016/j.oret.2018.10.014
M3  - Review
VL  - 3
SP  - 294
EP  - 304
JO  - Ophthalmology Retina
JF  - Ophthalmology Retina
SN  - 2468-6530
IS  - 4
ER  - 