Screening for technical flaws in multiple-choice items

A generalizability study

Lotte Dyhrberg O'Neill*, Sara Mathilde Radl Mortensen, Cita Nørgaard, Anne Lindebo Holm, Ulla Glenert Friis

*Corresponding author for this work

Publication: Contribution to journal › Journal article › Research › Peer-reviewed


Abstract

Construction errors in multiple-choice items are quite prevalent and constitute threats to test validity of multiple-choice tests. Currently, very little research on the usefulness of systematic item screening by local review committees before test administration seems to exist. The aim of this study was therefore to examine validity and feasibility aspects of review committee screening for item flaws. We examined the reliability of item reviewers’ independent judgments of the presence/absence of item flaws with a generalizability study design and found only moderate reliability using five reviewers. Statistical analyses of actual exam scores could be a more efficient way of identifying flaws and improving average item discrimination of tests in local contexts. The question of validity of human judgments of item flaws is important - not just for sufficiently sound quality assurance procedures of tests in local test contexts - but also for the global research on item flaws.
Original language: English
Journal: Dansk Universitetspaedagogisk Tidsskrift
Volume: 14
Issue number: 26
Pages (from-to): 51-65
ISSN: 1901-5089
Status: Published - 1 Apr 2019

Fingerprint

quality assurance
discrimination
threat

Keywords

  • Multiple Choice Tests
  • Validity
  • Generalizability
  • Item flaws
  • Quality Assurance

Cite this

@article{c2baf9fa80634cc1a6eb0a3156a9dd28,
title = "Screening for technical flaws in multiple-choice items: A generalizability study",
abstract = "Construction errors in multiple-choice items are quite prevalent and constitute threats to test validity of multiple-choice tests. Currently, very little research on the usefulness of systematic item screening by local review committees before test administration seems to exist. The aim of this study was therefore to examine validity and feasibility aspects of review committee screening for item flaws. We examined the reliability of item reviewers’ independent judgments of the presence/absence of item flaws with a generalizability study design and found only moderate reliability using five reviewers. Statistical analyses of actual exam scores could be a more efficient way of identifying flaws and improving average item discrimination of tests in local contexts. The question of validity of human judgments of item flaws is important - not just for sufficiently sound quality assurance procedures of tests in local test contexts - but also for the global research on item flaws.",
keywords = "Multiple Choice Tests, Validity, Generalizability, Item flaws, Quality Assurance, Higher Education, quality appraisal, screening",
author = "{Dyhrberg O'Neill}, Lotte and {Radl Mortensen}, {Sara Mathilde} and N{\o}rgaard, Cita and Holm, {Anne Lindebo} and Friis, {Ulla Glenert}",
year = "2019",
month = "4",
day = "1",
language = "English",
volume = "14",
pages = "51--65",
journal = "Dansk Universitetspaedagogisk Tidsskrift",
issn = "1901-5089",
publisher = "Statsbiblioteket",
number = "26",
}

Screening for technical flaws in multiple-choice items: A generalizability study. / Dyhrberg O'Neill, Lotte; Radl Mortensen, Sara Mathilde; Nørgaard, Cita; Holm, Anne Lindebo; Friis, Ulla Glenert.

In: Dansk Universitetspaedagogisk Tidsskrift, Vol. 14, No. 26, 01.04.2019, pp. 51-65.


TY - JOUR

T1 - Screening for technical flaws in multiple-choice items

T2 - A generalizability study

AU - Dyhrberg O'Neill, Lotte

AU - Radl Mortensen, Sara Mathilde

AU - Nørgaard, Cita

AU - Holm, Anne Lindebo

AU - Friis, Ulla Glenert

PY - 2019/4/1

Y1 - 2019/4/1

AB - Construction errors in multiple-choice items are quite prevalent and constitute threats to test validity of multiple-choice tests. Currently, very little research on the usefulness of systematic item screening by local review committees before test administration seems to exist. The aim of this study was therefore to examine validity and feasibility aspects of review committee screening for item flaws. We examined the reliability of item reviewers’ independent judgments of the presence/absence of item flaws with a generalizability study design and found only moderate reliability using five reviewers. Statistical analyses of actual exam scores could be a more efficient way of identifying flaws and improving average item discrimination of tests in local contexts. The question of validity of human judgments of item flaws is important - not just for sufficiently sound quality assurance procedures of tests in local test contexts - but also for the global research on item flaws.

KW - Multiple Choice Tests

KW - Validity

KW - Generalizability

KW - Item flaws

KW - Quality Assurance

KW - Higher Education

KW - quality appraisal

KW - screening

M3 - Journal article

VL - 14

SP - 51

EP - 65

JO - Dansk Universitetspaedagogisk Tidsskrift

JF - Dansk Universitetspaedagogisk Tidsskrift

SN - 1901-5089

IS - 26

ER -