Inter‐rater reliability of “clean cup” scores by coffee experts

Davide Giacalone*, Ida Steen, Jesper Alstrup, Morten Münchow

Cupping scores from experts are used extensively in the coffee industry for a variety of applications, from quality control to judging coffee competitions. In this paper, we examined the inter-rater reliability (IRR) of “clean cup” ratings by coffee experts (“cuppers”) in two studies. In both studies, IRR was found to be low, denoting a lack of concept alignment among experts. Remarkably, however, within-assessor reproducibility was high, suggesting that expert cuppers each have their own individual understanding of “clean cup.”

Practical applications: The results presented suggest that “clean cup” scores are fundamentally subjective. Since cupping scores are routinely used to drive business decisions (particularly in the context of quality control), it would be advisable to anchor such attributes in a precise definition (in the case of “clean cup,” of what constitutes a defect from a sensory point of view) developed on the basis of properly conducted sensory studies.

