BACKGROUND: The reliability of musculoskeletal diagnostic ultrasound imaging (MSK-DUSI) for the evaluation of neck musculature has been sparsely documented in the research literature. Previous studies have included a limited number of subjects, and only a few have tested both inter- and intra-rater reliability using appropriate methodology.
METHODS: Four examiners conducted an inter- and intra-rater reliability and agreement study. Fifty females aged 20 to 70 years, with and without neck pain (NP), were recruited from October 2014 to April 2015. The muscles evaluated were the longus colli (Lcol), the rectus capitis posterior major (Rcpm), the deep cervical extensors (Dce) and the semispinalis capitis (Sscap). For the first part of the intra-rater reliability study, each examiner captured ultrasound images of their allocated muscle and measured the thickness of that muscle twice, on separate occasions. For the second part, a second image of the same muscle was taken on the same subject and measured by the same examiner. To test inter-rater reliability, the four examiners then met to measure each other's images. Their results were compared pair-wise using Intraclass Correlation Coefficients (ICC) and Bland-Altman plots. Linear regression analysis was performed to evaluate possible bias.
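The pair-wise comparison described above can be illustrated with a minimal sketch. It assumes the conventional 95% Bland-Altman limits of agreement (mean difference ± 1.96 standard deviations) and the two-way random-effects, absolute-agreement, single-measure form ICC(2,1); the abstract does not state which ICC form or limits were used, so both are assumptions, and the function names are hypothetical.

```python
from statistics import mean, stdev


def bland_altman_limits(a, b):
    """Return (bias, lower, upper) for paired measurements a and b.

    Assumes the conventional 95% limits: bias +/- 1.96 * SD of the
    paired differences (an assumption, not stated in the abstract).
    """
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `ratings` is a list of rows, one row per subject, one column per
    rater (or per repeated measurement). The choice of ICC(2,1) is an
    assumption made for illustration.
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    grand = mean([v for row in ratings for v in row])
    row_means = [mean(row) for row in ratings]
    col_means = [mean([row[j] for row in ratings]) for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)             # between-subjects mean square
    msc = ss_cols / (k - 1)             # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))  # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

In this sketch, two examiners' thickness measurements of the same images would be passed to `bland_altman_limits` as two equal-length lists, and to `icc_2_1` as one `[examiner_1, examiner_2]` row per subject.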
RESULTS: Inter-rater reliability was good for the Lcol and Sscap muscles and moderate to poor for the deeper Rcpm and Dce muscles. Intra-rater reliability was good for all the muscles, with the exception of the Dce, for which it was moderate in the second part of the study. The Bland-Altman plots showed good agreement, few outliers, and no bias. However, the agreement intervals indicated a measurement error within the variance of the method that may not be acceptable for these small muscles if the aim is to evaluate change in thickness.
CONCLUSIONS: This study found that MSK-DUSI had variable reliability in assessing the thickness of the Lcol, Rcpm, Dce, and Sscap muscles. No bias was demonstrated, but agreement intervals were wide.