Interrater agreement and reliability of outcome measurement instruments and staging systems used in hidradenitis suppurativa

L Thorlacius, A Garg, P T Riis, S. M. Nielsen, V Bettoli, J R Ingram, V Del Marmol, L Matusiak, J C Pascual, J Revuz, K Sartorius, T Tzellos, H H van der Zee, C C Zouboulis, D M Saunte, A B Gottlieb, R Christensen, G B E Jemec

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

BACKGROUND: Monitoring disease activity over time is a prerequisite for clinical practice and research. Valid and reliable outcome measurement instruments (OMIs) and staging systems provide researchers and clinicians with benchmark tools to assess the primary and secondary outcomes of interventional trials and to guide treatment selection properly.

OBJECTIVES: To investigate inter-rater reliability and agreement in instruments currently used in hidradenitis suppurativa (HS), with dermatologists experienced in HS as the rater population of interest.

METHODS: In a prospective completely balanced design, 24 patients with HS underwent a physical examination by 12 raters (288 assessments) using nine instruments. The results were analysed using generalized linear mixed models.
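To make the completely balanced design concrete, the following is a minimal sketch and not the authors' analysis, which used generalized linear mixed models. As a simpler stand-in it computes an ANOVA-based intraclass correlation (the two-way random-effects, absolute-agreement, single-rater form often written ICC(2,1)) on simulated scores for 24 patients each assessed by 12 raters. The function names `icc_2_1` and `simulate`, the simulated variances and the random seed are illustrative assumptions; none of the numbers come from the study.

```python
import random


def icc_2_1(scores):
    """ANOVA-based ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores[i][j] is the score rater j gave patient i. The design must be
    completely balanced: every rater scores every patient exactly once.
    """
    n = len(scores)        # number of patients
    k = len(scores[0])     # number of raters
    if any(len(row) != k for row in scores):
        raise ValueError("design is not balanced")

    grand = sum(map(sum, scores)) / (n * k)
    patient_means = [sum(row) / k for row in scores]
    rater_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    # Sums of squares for patients, raters, and the residual.
    ss_patients = k * sum((m - grand) ** 2 for m in patient_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_patients - ss_raters

    # Corresponding mean squares.
    msr = ss_patients / (n - 1)
    msc = ss_raters / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def simulate(n_patients=24, n_raters=12, seed=0):
    """Simulated scores only (assumption): patient severity + rater bias + noise."""
    rng = random.Random(seed)
    severity = [rng.gauss(10, 4) for _ in range(n_patients)]
    bias = [rng.gauss(0, 1) for _ in range(n_raters)]
    return [[s + b + rng.gauss(0, 3) for b in bias] for s in severity]


scores = simulate()
n_assessments = sum(len(row) for row in scores)  # 24 patients x 12 raters = 288
icc = icc_2_1(scores)
print(f"{n_assessments} assessments, ICC(2,1) = {icc:.2f}")
```

In this sketch, larger rater bias or residual noise relative to the spread in patient severity lowers the coefficient, which is the general sense in which disagreement between raters reduces inter-rater reliability.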

RESULTS: For the staging systems, the study found good inter-rater reliability for Hurley staging in the axillae and gluteal region, moderate inter-rater reliability for Hurley staging in the groin and for Physician's Global Assessment, and fair inter-rater reliability for refined Hurley staging and the International HS Severity Scoring System. For all the tested OMIs, the observed intervals for limits of agreement were very wide relative to the ranges of the scales.

CONCLUSIONS: The very wide intervals for limits of agreement imply that substantial changes are needed in clinical research in order to rule out measurement error. The results illustrate how difficult it is, even for experienced HS experts, to agree on the type and number of lesions when evaluating disease severity. The apparent caveats call for global efforts, such as the HIdradenitis SuppuraTiva cORe outcomes set International Collaboration (HISTORIC), to reach consensus on how best to measure physical signs of HS reliably in randomized trials.

What's already known about this topic? Without valid and reliable instruments to measure outcomes, researchers and clinicians lack the necessary benchmarks to assess primary and secondary end points of interventional trials properly. Hidradenitis suppurativa (HS) is a chronic inflammatory skin disease. Several outcome measurement instruments exist for HS, but their validation is generally incomplete or of relatively low methodological quality.

What does this study add? Using a prospective completely balanced design, this study examined inter-rater reliability with HS-experienced dermatologists as the rater population of interest. The study did not find very good reliability for any included instrument or lesion count. This study illustrates the difficulty in finding agreement on the type and number of HS lesions, even among experts. The results question whether physical signs are best measured by a traditional physician lesion count instrument.

What are the clinical implications of this work? For staging, Hurley staging and the physician global visual analogue scale proved to be acceptable instruments in terms of inter-rater reliability. For the instruments designed to measure changes in health status, our study illustrates how difficult it is, even for experts, to measure the physical signs of HS using simple rater counting. Consequently, other methods of assessing physical signs, such as ultrasound evaluation, require consideration.

Original language: English
Journal: British Journal of Dermatology
Volume: 181
Issue number: 3
Pages (from-to): 483-491
ISSN: 0007-0963
DOIs: https://doi.org/10.1111/bjd.17716
Publication status: Published - Sep 2019

Fingerprint

Hidradenitis Suppurativa
Benchmarking
Physicians
Research Personnel
Outcome Assessment (Health Care)
Groin
Visual Analog Scale
Research
Skin Diseases
Population
Physical Examination

Cite this

Thorlacius, L ; Garg, A ; Riis, P T ; Nielsen, S. M. ; Bettoli, V ; Ingram, J R ; Del Marmol, V ; Matusiak, L ; Pascual, J C ; Revuz, J ; Sartorius, K ; Tzellos, T ; van der Zee, H H ; Zouboulis, C C ; Saunte, D M ; Gottlieb, A B ; Christensen, R ; Jemec, G B E. / Interrater agreement and reliability of outcome measurement instruments and staging systems used in hidradenitis suppurativa. In: British Journal of Dermatology. 2019 ; Vol. 181, No. 3. pp. 483-491.
@article{2fa1ae1c3b3547ea813e730b01bd5802,
title = "Interrater agreement and reliability of outcome measurement instruments and staging systems used in hidradenitis suppurativa",
author = "L Thorlacius and A Garg and Riis, {P T} and Nielsen, {S. M.} and V Bettoli and Ingram, {J R} and {Del Marmol}, V and L Matusiak and Pascual, {J C} and J Revuz and K Sartorius and T Tzellos and {van der Zee}, {H H} and Zouboulis, {C C} and Saunte, {D M} and Gottlieb, {A B} and R Christensen and Jemec, {G B E}",
note = "This article is protected by copyright. All rights reserved.",
year = "2019",
month = "9",
doi = "10.1111/bjd.17716",
language = "English",
volume = "181",
pages = "483--491",
journal = "British Journal of Dermatology",
issn = "0007-0963",
publisher = "Wiley-Blackwell",
number = "3",

}

Thorlacius, L, Garg, A, Riis, PT, Nielsen, SM, Bettoli, V, Ingram, JR, Del Marmol, V, Matusiak, L, Pascual, JC, Revuz, J, Sartorius, K, Tzellos, T, van der Zee, HH, Zouboulis, CC, Saunte, DM, Gottlieb, AB, Christensen, R & Jemec, GBE 2019, 'Interrater agreement and reliability of outcome measurement instruments and staging systems used in hidradenitis suppurativa', British Journal of Dermatology, vol. 181, no. 3, pp. 483-491. https://doi.org/10.1111/bjd.17716

Interrater agreement and reliability of outcome measurement instruments and staging systems used in hidradenitis suppurativa. / Thorlacius, L; Garg, A; Riis, P T; Nielsen, S. M.; Bettoli, V; Ingram, J R; Del Marmol, V; Matusiak, L; Pascual, J C; Revuz, J; Sartorius, K; Tzellos, T; van der Zee, H H; Zouboulis, C C; Saunte, D M; Gottlieb, A B; Christensen, R; Jemec, G B E.

In: British Journal of Dermatology, Vol. 181, No. 3, 09.2019, p. 483-491.


TY - JOUR

T1 - Interrater agreement and reliability of outcome measurement instruments and staging systems used in hidradenitis suppurativa

AU - Thorlacius, L

AU - Garg, A

AU - Riis, P T

AU - Nielsen, S. M.

AU - Bettoli, V

AU - Ingram, J R

AU - Del Marmol, V

AU - Matusiak, L

AU - Pascual, J C

AU - Revuz, J

AU - Sartorius, K

AU - Tzellos, T

AU - van der Zee, H H

AU - Zouboulis, C C

AU - Saunte, D M

AU - Gottlieb, A B

AU - Christensen, R

AU - Jemec, G B E

N1 - This article is protected by copyright. All rights reserved.

PY - 2019/9

Y1 - 2019/9

U2 - 10.1111/bjd.17716

DO - 10.1111/bjd.17716

M3 - Journal article

VL - 181

SP - 483

EP - 491

JO - British Journal of Dermatology

JF - British Journal of Dermatology

SN - 0007-0963

IS - 3

ER -