Topic
Diagnostic performance of deep learning–based algorithms in screening patients with diabetes for diabetic retinopathy (DR). The algorithms were compared with the current gold standard of classification by human specialists.

Clinical Relevance
Because DR is a common cause of visual impairment, screening is indicated to avoid irreversible vision loss. Automated DR classification using deep learning may be a suitable new screening tool that could improve diagnostic performance and reduce the manpower required.

Methods
For this systematic review, we aimed to identify studies that used deep learning to classify full-scale DR in retinal fundus images of patients with diabetes. The studies had to provide a DR grading scale, a human grader as a reference standard, and a deep learning performance score. A systematic search of MEDLINE and Embase on April 5, 2018, yielded 304 publications. To identify potentially missed publications, the reference lists of the final included studies were manually screened, yielding no additional publications. The Quality Assessment of Diagnostic Accuracy Studies 2 tool was used to assess risk of bias and applicability.

Results
By using objective selection, we included 11 diagnostic accuracy studies that validated the performance of their deep learning method using a new group of patients or retrospective datasets. Eight studies reported sensitivity and specificity of 80.28% to 100.0% and 84.0% to 99.0%, respectively. Two studies reported accuracies of 78.7% and 81.0%. One study reported an area under the receiver operating characteristic curve of 0.955. In addition to diagnostic performance, one study also reported on patient satisfaction, showing that 78% of patients preferred an automated deep learning model over manual human grading.

Conclusions
Advantages of implementing deep learning–based algorithms in DR screening include reductions in manpower, screening costs, and issues related to intragrader and intergrader variability. However, the limitations that may hinder such an implementation chiefly involve ethical concerns about a lack of trust in the diagnostic accuracy of computers. Considering both strengths and limitations, as well as the high performance of deep learning–based algorithms, automated DR classification using deep learning could be feasible in a real-world screening scenario.
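
For reference, the sensitivity, specificity, and accuracy figures summarized in the Results are conventionally derived from a 2 × 2 comparison of the algorithm's output against the human reference grader. The sketch below shows that arithmetic only. The function name `screening_metrics`, the binary disease/no-disease framing, and the counts in the usage example are illustrative assumptions, not data or methods taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningMetrics:
    sensitivity: float  # proportion of reference-positive cases flagged by the algorithm
    specificity: float  # proportion of reference-negative cases cleared by the algorithm
    accuracy: float     # proportion of all cases where algorithm and reference agree


def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> ScreeningMetrics:
    """Compute metrics from a 2 x 2 table (hypothetical helper, not from the review).

    tp: algorithm positive, reference positive
    fp: algorithm positive, reference negative
    tn: algorithm negative, reference negative
    fn: algorithm negative, reference positive
    """
    if min(tp, fp, tn, fn) < 0:
        raise ValueError("counts must be non-negative")
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference-positive and one reference-negative case")
    total = tp + fp + tn + fn
    return ScreeningMetrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        accuracy=(tp + tn) / total,
    )


if __name__ == "__main__":
    # Made-up counts for illustration only; they do not come from any study.
    example = screening_metrics(tp=90, fp=30, tn=170, fn=10)
    print(f"sensitivity={example.sensitivity:.3f}")
    print(f"specificity={example.specificity:.3f}")
    print(f"accuracy={example.accuracy:.3f}")
```

Accuracy depends on disease prevalence in the test set, whereas sensitivity and specificity do not, which is one reason the two groups of figures above are not directly comparable.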