Open Conference Systems, ITC 2016 Conference

PAPER: Exploring Incomplete Rating Designs with Mokken Scale Analysis
Stefanie A. Wind, Yogendra J. Patil

Building: Pinnacle
Room: 3F-Port of Hong Kong
Date: 2016-07-02 11:00 AM – 12:30 PM
Last modified: 2016-05-21

Abstract


Introduction

Recently, Mokken scale analysis (MSA; Mokken, 1971) has been demonstrated as a method for evaluating rating quality in educational assessments (Authors, 2015). However, this approach requires complete rating designs, in which each rater scores each student. Practical constraints limit the use of complete designs in operational assessment systems.

Objectives

This study explores the impact of missing data imputation on MSA-based rating quality indicators, focusing on three questions:

  1. What is the effect of missing data imputation on MSA-based rating quality indicators?
  2. How does the effect of missing data imputation on MSA-based rating quality indicators vary across rating designs?
  3. Do MSA-based rating quality indicators still provide useful diagnostic information when data are imputed?

Method

Two datasets from large-scale writing assessments are explored: Dataset 1 includes ratings of 50 essays by 62 raters; Dataset 2 includes ratings of 2,121 essays by 10 raters. The original datasets were modified to create simulated datasets with varying degrees of missingness that reflect operational testing programs.

Missing data were imputed using methods that have previously been explored in the context of MSA, including random imputation (RI), two-way imputation, and response-function imputation (van Ginkel, van der Ark, & Sijtsma, 2007). MSA rating quality indices are compared across the original and simulated datasets.
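As an illustration of the logic behind two-way imputation, the sketch below fills a missing cell of an essays-by-raters score matrix with the person (essay) mean plus the rater mean minus the overall mean, rounded to the nearest observed score category, in the spirit of van Ginkel et al. (2007). The function name and NumPy interface are assumptions for illustration, not the implementation used in the study.

```python
import numpy as np

def two_way_impute(ratings):
    """Two-way imputation for an essays-by-raters score matrix.

    Missing cells (np.nan) are replaced with
    person mean + rater mean - overall mean, rounded to the
    nearest observed score category. Illustrative sketch only.
    """
    x = np.asarray(ratings, dtype=float)
    person_means = np.nanmean(x, axis=1, keepdims=True)  # mean score per essay
    rater_means = np.nanmean(x, axis=0, keepdims=True)   # mean score per rater
    overall_mean = np.nanmean(x)
    estimate = person_means + rater_means - overall_mean  # broadcasts to full matrix
    categories = np.unique(x[~np.isnan(x)])  # observed score categories
    # Snap each estimate to the nearest legal score category
    snapped = categories[np.abs(estimate[..., None] - categories).argmin(axis=-1)]
    return np.where(np.isnan(x), snapped, x)
```

In this sketch, observed scores are left untouched; only the missing cells receive imputed values, which keeps the imputed matrix on the original rating scale.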

Results

Increasing levels of imputed data result in greater discrepancies in rating quality indices between the original and imputed data, with the largest discrepancies observed for the RI method. Discrepancies in rater scalability increase linearly with missingness, whereas discrepancies related to monotonicity are minimal. Differences in invariant ordering vary across methods and designs.

Conclusions

The current study provides insight into the consequences of various imputation methods across rating designs commonly used in operational assessments; this insight is essential for the widespread use of MSA as a method for evaluating rating quality.

References

Authors (2015).

Mokken, R. J. (1971). A Theory and Procedure of Scale Analysis. The Hague: Mouton/ Berlin: De Gruyter.

van Ginkel, J. R., van der Ark, L. A., & Sijtsma, K. (2007). Multiple Imputation of Item Scores in Test and Questionnaire Data, and Influence on Psychometric Results. Multivariate Behavioral Research, 42(2), 387–414. http://doi.org/10.1080/00273170701360803

