Colon capsule endoscopy (CCE) faces substantial challenges, one of which is achieving adequate colon cleansing. Furthermore, interobserver agreement on bowel-cleansing quality varies. To address this issue, we developed an artificial intelligence algorithm (AIA) to evaluate bowel-cleansing quality. The aim of this study was to estimate the interobserver agreement on bowel-cleansing quality between a group of experienced CCE readers and the AIA, and to examine whether percentiles of the overall bowel-cleansing quality are a suitable way of reporting the results generated by the AIA.
Experienced CCE readers and the AIA scored bowel-cleansing quality in 842 CCE investigations on both a 2-point and a 4-point grading scale, for the entire colon and by segment. For the algorithm, a score was given based on the mean, the median, the lower and upper quartiles, and the 2nd and 98th percentiles. The level of agreement was evaluated using Cohen’s κ.
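The two calculations named above can be illustrated with a minimal sketch. It assumes, for illustration only, that the AIA yields a list of numeric scores per investigation that is then summarised; the function names `summarize_scores` and `cohen_kappa` and the example grades are hypothetical and are not taken from the study. The κ shown is the standard unweighted form, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance.

```python
from collections import Counter
from statistics import mean, quantiles


def summarize_scores(scores):
    """Summarise one investigation's scores (assumed input: a list of numbers)."""
    # quantiles(..., n=100) returns the 99 cut points for percentiles 1..99,
    # so percentile k sits at index k - 1.
    pct = quantiles(scores, n=100, method="inclusive")
    return {
        "mean": mean(scores),
        "p2": pct[1],
        "lower_quartile": pct[24],
        "median": pct[49],
        "upper_quartile": pct[74],
        "p98": pct[97],
    }


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters grading the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items given the same grade by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal grade frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[g] * count_b[g] for g in count_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is 1
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical 2-point grades (1 = adequate, 0 = inadequate), not study data.
    reader_grades = [1, 1, 0, 1, 0, 0, 1, 1]
    aia_grades = [1, 0, 0, 1, 1, 0, 1, 0]
    print(cohen_kappa(reader_grades, aia_grades))
```

Each summary statistic would still have to be mapped onto the 2- or 4-point scale by a threshold before κ can be computed against the readers' grades; that mapping is not specified here and is left out of the sketch.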
The interobserver agreement between the CCE readers and the AIA on bowel-cleansing quality was minimal to none for the overall bowel evaluation, by segment, and on both the 2- and 4-point grading scales, regardless of the threshold for the AIA score.
We found minimal agreement between CCE readers and the AIA on the evaluation of bowel-cleansing quality in CCE. Neither the mean nor the percentiles of the AIA grading seemed suitable for reporting AI-generated bowel-cleansing evaluations.