Providing Validation Evidence for a Clinical-Science Module: Improving Testing Reliability with Quizzes
Michael Peeters
University of Toledo
M Kenneth Cor
University of Alberta Faculty of Pharmacy & Pharmaceutical Sciences
Erik Maki
Drake University College of Pharmacy
DOI: https://doi.org/10.24926/iip.v12i1.2235
Abstract
Description of the Problem: High-stakes decision-making should rest on sound validation evidence, and reliability is a vital part of that evidence. Within didactic courses, a short exam may not be very reliable on its own, so supplementing it with quizzes might help. But how much? This study’s objective was to determine how much reliability (for the overall module-grades) could be gained by adding quiz data to traditional exam data in a clinical-science module.
The Innovation: In didactic coursework, quizzes are a common instructional strategy. However, contexts and instructors vary in whether quizzes are used formatively, summatively, or both. Second-year PharmD students took a clinical-science course that included a 5-week module focused on cardiovascular therapeutics. Using Generalizability Theory (G-Theory), seven quizzes and the exam they led up to were combined into one module-level reliability estimate, based on a model in which students were crossed with items nested within eight fixed testing occasions (analysis run in mGENOVA). Furthermore, G-Theory decision-studies were planned to illustrate how module-grade reliability changed as the number of quiz-items and the relative weighting of quizzes were altered.
Critical Analysis: One hundred students took seven quizzes and one exam. Individually, the exam had 32 multiple-choice questions (MCQ) (KR-20 reliability=0.67), while the quizzes had a total of 50 MCQ (5-9 MCQ each), with most individual quiz KR-20s less than or equal to 0.54. After combining the quizzes and exam using G-Theory, the estimated reliability of module-grades was 0.73, an improvement over the exam alone. Doubling the quiz-weight from the syllabus’ 18% quizzes and 82% exam increased the composite-reliability of module-grades to 0.77. A reliability of 0.80 was achieved when quizzes and exam were weighted equally.
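For readers less familiar with the KR-20 coefficient reported for the individual exam and quizzes, the following is a minimal sketch of the standard Kuder-Richardson 20 formula for dichotomously scored items. It is not the authors' analysis code (the module-level estimates came from G-Theory in mGENOVA); the function name `kr20` and the toy response matrix are hypothetical, and the sketch assumes the population-variance convention for total scores.

```python
from typing import Sequence


def kr20(responses: Sequence[Sequence[int]]) -> float:
    """Kuder-Richardson 20 for a students-by-items matrix of 0/1 scores.

    Assumption: population variance (divide by N) is used for the total
    scores, matching the p * (1 - p) item variances.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    if n_students < 2 or n_items < 2:
        raise ValueError("need at least two students and two items")

    # Total score per student and its variance across students.
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_students
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
    if var_total == 0:
        raise ValueError("total scores have zero variance")

    # Sum of item variances p * q, where p is the proportion correct.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_students
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


# Hypothetical toy data (4 students, 3 items), NOT data from the study.
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(toy))
```

With few items, the sum of item variances makes up a large share of the total-score variance, which is one reason short quizzes of 5-9 items tend to show low KR-20 values when considered individually.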
Next Steps: As expected, more items led to higher reliability. However, using quizzes predominantly formatively had little impact on reliability, while using quizzes more summatively (i.e., increasing their relative-weight in the module-grade) improved reliability further. Thus, depending on use, quizzes can add to a course’s rigor.
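The general point that more items tend to yield higher reliability can be illustrated with the classical Spearman-Brown prophecy formula. This is a simplified sketch under classical test theory, assuming added items are parallel to the existing ones; it is not the G-Theory decision-study model used in this study, and the function name `spearman_brown` is hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by `length_factor`.

    Assumption: the added items are parallel to the existing ones
    (classical test theory), which real quiz items may not satisfy.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (
        1 + (length_factor - 1) * reliability
    )


# Illustration only: starting from a reliability of 0.67, show the
# predicted value for hypothetical tests 1x, 1.5x, and 2x as long.
for factor in (1.0, 1.5, 2.0):
    print(factor, round(spearman_brown(0.67, factor), 2))
```

The predicted values are hypothetical projections, not results from this module; actual gains depend on how well the added quiz items align with the exam and on how the quizzes are weighted in the module-grade.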