Open Conference Systems, ITC 2016 Conference

PAPER: Measurement Invariance in a Low-Stakes Testing Environment: A Demonstration of Best Practices
Barbara E. Rowan, Joshua T. Goodman, J. Christine Harmes

Building: Pinnacle
Room: 3F-Port of San Francisco
Date: 2016-07-03 11:00 AM – 12:30 PM
Last modified: 2016-05-18

Abstract


The purpose of this study was to examine whether scores from the paper-based (PPT) and computer-based (CBT) versions of a non-cognitive scale, the Achievement Goal Questionnaire (AGQ) scale of the Attitude Toward Learning measure, were equivalent and could be used interchangeably. Previous research on test score comparability used simple methodology that provided insufficient evidence of score equivalence. This study demonstrated a set of methodological best practices, providing a more complex and accurate analysis of the degree of measurement invariance that exists across groups.

True measurement invariance would establish that scores from the CBT and PPT versions of the AGQ were equivalent and interchangeable. Good model fit was found for the parallel, tau-equivalent, and congeneric models. The adjusted chi-square difference test for the WLSM estimation method was used to test whether model fit improved significantly as the model constraints were relaxed. A significant adjusted chi-square difference was found between the scalar invariant model (parallel measures) and the metric invariant model (tau-equivalent measures), indicating that the CBT and PPT raw scores were not equivalent and cannot be used interchangeably. The adjusted chi-square difference between the metric (tau-equivalent measures) and configural (congeneric measures) models, however, was not significant, suggesting that the CBT and PPT versions of the AGQ are essentially tau-equivalent measures. Therefore, the same construct was measured to the same degree, but scores are not equivalent without rescaling.
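
As an illustration of the kind of nested-model comparison described above, the following is a minimal sketch of a scaled (Satorra-Bentler style) chi-square difference test, a form commonly used with scaled estimators such as WLSM. The abstract does not give the exact formula or any fit statistics, so the function names and all numeric inputs here are hypothetical and serve only to show the arithmetic.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form sum for the regularized upper gamma function.
        terms = sum(h ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-h) * terms
    # Odd df: start from erfc and add the half-integer correction terms.
    n = (df - 1) // 2
    terms = sum(h ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, n + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * terms


def scaled_chi2_difference(t0, c0, d0, t1, c1, d1):
    """Scaled chi-square difference test for two nested models.

    t0, c0, d0: scaled chi-square, scaling correction factor, and df of the
                more constrained (nested) model.
    t1, c1, d1: the same quantities for the less constrained model.
    Returns (adjusted difference, df difference, p-value).
    """
    df_diff = d0 - d1
    if df_diff <= 0:
        raise ValueError("the nested model must have more degrees of freedom")
    cd = (d0 * c0 - d1 * c1) / df_diff  # difference-test scaling correction
    if cd <= 0:
        raise ValueError("non-positive scaling correction; test is undefined")
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, df_diff, chi2_sf(trd, df_diff)


# Hypothetical fit statistics, NOT values from the study.
trd, df_diff, p = scaled_chi2_difference(
    t0=100.0, c0=1.2, d0=10,  # e.g. a more constrained invariance model
    t1=80.0, c1=1.1, d1=8,    # e.g. a less constrained invariance model
)
print(f"adjusted difference = {trd:.2f}, df = {df_diff}, p = {p:.5f}")
```

A small p-value would indicate that relaxing the constraints significantly improves fit, which is the pattern the study reports for the scalar versus metric comparison but not for the metric versus configural comparison.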

This study makes an important contribution to the field by detailing a sound methodology for testing measurement invariance. Metric invariance is sufficient for measures used in low-stakes testing situations like this one, where the purpose is program assessment and the data are used in the aggregate. For this reason, rescaling the AGQ scores should be sufficient for treating the results as equivalent.

