Deep Learning–Based Algorithms in Screening of Diabetic Retinopathy: A Systematic Review of Diagnostic Performance

Katrine B. Nielsen, Mie L. Lautrup, Jakob K.H. Andersen, Thiusius R. Savarimuthu, Jakob Grauslund*

*Corresponding author for this work

Research output: Contribution to journal › Review › Research › peer-review

Abstract

Topic: Diagnostic performance of deep learning–based algorithms in screening patients with diabetes for diabetic retinopathy (DR). The algorithms were compared with the current gold standard of classification by human specialists.

Clinical Relevance: Because DR is a common cause of visual impairment, screening is indicated to avoid irreversible vision loss. Automated DR classification using deep learning may be a suitable new screening tool that could improve diagnostic performance and reduce manpower.

Methods: For this systematic review, we aimed to identify studies that incorporated the use of deep learning in classifying full-scale DR in retinal fundus images of patients with diabetes. The studies had to provide a DR grading scale, a human grader as a reference standard, and a deep learning performance score. A systematic search on April 5, 2018, through MEDLINE and Embase yielded 304 publications. To identify potentially missed publications, the reference lists of the final included studies were manually screened, yielding no additional publications. The Quality Assessment of Diagnostic Accuracy Studies 2 tool was used for risk of bias and applicability assessment.

Results: By using objective selection, we included 11 diagnostic accuracy studies that validated the performance of their deep learning method using a new group of patients or retrospective datasets. Eight studies reported sensitivity and specificity of 80.28% to 100.0% and 84.0% to 99.0%, respectively. Two studies reported accuracies of 78.7% and 81.0%. One study provided an area under the receiver operating characteristic curve of 0.955. In addition to diagnostic performance, one study also reported on patient satisfaction, showing that 78% of patients preferred an automated deep learning model over manual human grading.

Conclusions: Advantages of implementing deep learning–based algorithms in DR screening include reduction in manpower, cost of screening, and issues relating to intragrader and intergrader variability. However, limitations that may hinder such an implementation particularly revolve around ethical concerns regarding lack of trust in the diagnostic accuracy of computers. Considering both strengths and limitations, as well as the high performance of deep learning–based algorithms, automated DR classification using deep learning could be feasible in a real-world screening scenario.

Original language: English
Journal: Ophthalmology Retina
Volume: 3
Issue number: 4
Pages (from-to): 294-304
Number of pages: 11
ISSN: 2468-6530
DOI: 10.1016/j.oret.2018.10.014
Publication status: Published - Apr 2019

Fingerprint

Diabetic Retinopathy
Publications
MEDLINE

Keywords

  • Algorithms
  • Deep Learning
  • Diabetic Retinopathy/diagnosis
  • Diagnostic Techniques, Ophthalmological
  • Humans
  • Machine Learning
  • Mass Screening/methods
  • Neural Networks, Computer
  • ROC Curve

Cite this

@article{f7991c1cbc784762ae12f82abf1027d9,
title = "Deep Learning–Based Algorithms in Screening of Diabetic Retinopathy: A Systematic Review of Diagnostic Performance",
abstract = "Topic Diagnostic performance of deep learning–based algorithms in screening patients with diabetes for diabetic retinopathy (DR). The algorithms were compared with the current gold standard of classification by human specialists. Clinical Relevance Because DR is a common cause of visual impairment, screening is indicated to avoid irreversible vision loss. Automated DR classification using deep learning may be a suitable new screening tool that could improve diagnostic performance and reduce manpower. Methods For this systematic review, we aimed to identify studies that incorporated the use of deep learning in classifying full-scale DR in retinal fundus images of patients with diabetes. The studies had to provide a DR grading scale, a human grader as a reference standard, and a deep learning performance score. A systematic search on April 5, 2018, through MEDLINE and Embase yielded 304 publications. To identify potentially missed publications, the reference lists of the final included studies were manually screened, yielding no additional publications. The Quality Assessment of Diagnostic Accuracy Studies 2 tool was used for risk of bias and applicability assessment. Results By using objective selection, we included 11 diagnostic accuracy studies that validated the performance of their deep learning method using a new group of patients or retrospective datasets. Eight studies reported sensitivity and specificity of 80.28{\%} to 100.0{\%} and 84.0{\%} to 99.0{\%}, respectively. Two studies report accuracies of 78.7{\%} and 81.0{\%}. One study provides an area under the receiver operating curve of 0.955. In addition to diagnostic performance, one study also reported on patient satisfaction, showing that 78{\%} of patients preferred an automated deep learning model over manual human grading. 
Conclusions Advantages of implementing deep learning–based algorithms in DR screening include reduction in manpower, cost of screening, and issues relating to intragrader and intergrader variability. However, limitations that may hinder such an implementation particularly revolve around ethical concerns regarding lack of trust in the diagnostic accuracy of computers. Considering both strengths and limitations, as well as the high performance of deep learning–based algorithms, automated DR classification using deep learning could be feasible in a real-world screening scenario.",
keywords = "Algorithms, Deep Learning, Diabetic Retinopathy/diagnosis, Diagnostic Techniques, Ophthalmological, Humans, Machine Learning, Mass Screening/methods, Neural Networks, Computer, ROC Curve",
author = "Nielsen, {Katrine B.} and Lautrup, {Mie L.} and Andersen, {Jakob K.H.} and Savarimuthu, {Thiusius R.} and Grauslund, Jakob",
year = "2019",
month = "4",
doi = "10.1016/j.oret.2018.10.014",
language = "English",
volume = "3",
pages = "294--304",
journal = "Ophthalmology Retina",
issn = "2468-6530",
number = "4",
}

Deep Learning–Based Algorithms in Screening of Diabetic Retinopathy: A Systematic Review of Diagnostic Performance. / Nielsen, Katrine B.; Lautrup, Mie L.; Andersen, Jakob K.H.; Savarimuthu, Thiusius R.; Grauslund, Jakob.

In: Ophthalmology Retina, Vol. 3, No. 4, 04.2019, p. 294-304.

TY - JOUR
T1 - Deep Learning–Based Algorithms in Screening of Diabetic Retinopathy
T2 - A Systematic Review of Diagnostic Performance
AU - Nielsen, Katrine B.
AU - Lautrup, Mie L.
AU - Andersen, Jakob K.H.
AU - Savarimuthu, Thiusius R.
AU - Grauslund, Jakob
PY - 2019/4
Y1 - 2019/4
AB - Topic Diagnostic performance of deep learning–based algorithms in screening patients with diabetes for diabetic retinopathy (DR). The algorithms were compared with the current gold standard of classification by human specialists. Clinical Relevance Because DR is a common cause of visual impairment, screening is indicated to avoid irreversible vision loss. Automated DR classification using deep learning may be a suitable new screening tool that could improve diagnostic performance and reduce manpower. Methods For this systematic review, we aimed to identify studies that incorporated the use of deep learning in classifying full-scale DR in retinal fundus images of patients with diabetes. The studies had to provide a DR grading scale, a human grader as a reference standard, and a deep learning performance score. A systematic search on April 5, 2018, through MEDLINE and Embase yielded 304 publications. To identify potentially missed publications, the reference lists of the final included studies were manually screened, yielding no additional publications. The Quality Assessment of Diagnostic Accuracy Studies 2 tool was used for risk of bias and applicability assessment. Results By using objective selection, we included 11 diagnostic accuracy studies that validated the performance of their deep learning method using a new group of patients or retrospective datasets. Eight studies reported sensitivity and specificity of 80.28% to 100.0% and 84.0% to 99.0%, respectively. Two studies report accuracies of 78.7% and 81.0%. One study provides an area under the receiver operating curve of 0.955. In addition to diagnostic performance, one study also reported on patient satisfaction, showing that 78% of patients preferred an automated deep learning model over manual human grading. 
Conclusions Advantages of implementing deep learning–based algorithms in DR screening include reduction in manpower, cost of screening, and issues relating to intragrader and intergrader variability. However, limitations that may hinder such an implementation particularly revolve around ethical concerns regarding lack of trust in the diagnostic accuracy of computers. Considering both strengths and limitations, as well as the high performance of deep learning–based algorithms, automated DR classification using deep learning could be feasible in a real-world screening scenario.

KW - Algorithms
KW - Deep Learning
KW - Diabetic Retinopathy/diagnosis
KW - Diagnostic Techniques, Ophthalmological
KW - Humans
KW - Machine Learning
KW - Mass Screening/methods
KW - Neural Networks, Computer
KW - ROC Curve
U2 - 10.1016/j.oret.2018.10.014
DO - 10.1016/j.oret.2018.10.014
M3 - Review
C2 - 31014679
VL - 3
SP - 294
EP - 304
JO - Ophthalmology Retina
JF - Ophthalmology Retina
SN - 2468-6530
IS - 4
ER -