Open Conference Systems, ITC 2016 Conference

PAPER: Hierarchical Quality Control of Test Scores Using Examinee Background Information
Dvir Kleper, Avi Allalouf, Elliot Turvall, Carmel Oren, Marina Fronton

Building: Pinnacle
Room: 3F-Port of Hong Kong
Date: 2016-07-04 11:00 AM – 12:30 PM
Last modified: 2016-06-08

Abstract


Introduction

Scoring a test is a difficult, multistage procedure that relies on complex statistical assumptions and is often subject to errors that can have detrimental consequences for all those involved. Hence, the standard practice is to conduct rigorous quality control before reporting scores (Allalouf, 2007; Kolen & Brennan, 2014).

Researchers have found a link between demographic background variables (gender, age, parental education level, etc.) and test scores (Liu et al., 2012); this finding holds for scores on the Psychometric Entrance Test (PET) as well (Sa'ar & Oren, 2014). This link should make it possible to predict test scores for an individual examinee or for a group of examinees.

Objective

To investigate how applicable bi-level models are for predicting PET scores, and whether it is feasible to rely on the optimal model for carrying out quality control on the scoring process.

Design/Methodology

A two-level hierarchical linear analysis was conducted to investigate the link between background data and the general score across test versions, where:

  • Level 1: the individual examinee
  • Level 2: the test version

As is standard in multilevel studies on quality control (Wei, 2013), we checked three models:

  1. Analysis of variance with random effects
  2. Regression with means as outcome
  3. Regression with random coefficients
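
The abstract does not give the model equations. As a hedged sketch, the three models can be written in conventional textbook notation for two-level linear models. All symbols below are assumptions introduced for illustration, and a single predictor stands in for the several background variables the study presumably used.

```latex
% i indexes examinees (Level 1); j indexes test versions (Level 2).
% Y_{ij}: general score of examinee i on version j.
% X_{ij}: an examinee-level background variable (assumed, single predictor).
% W_j:    a version-level predictor, e.g. the version mean of X (assumed).

% 1. Analysis of variance with random effects (no predictors)
\begin{align}
Y_{ij}     &= \beta_{0j} + r_{ij},    & r_{ij} &\sim N(0, \sigma^2) \\
\beta_{0j} &= \gamma_{00} + u_{0j},   & u_{0j} &\sim N(0, \tau_{00})
\end{align}

% 2. Regression with means as outcome (predictor at Level 2 only)
\begin{align}
Y_{ij}     &= \beta_{0j} + r_{ij} \\
\beta_{0j} &= \gamma_{00} + \gamma_{01} W_j + u_{0j}
\end{align}

% 3. Regression with random coefficients (predictor at Level 1,
%    intercept and slope vary across versions)
\begin{align}
Y_{ij}     &= \beta_{0j} + \beta_{1j} X_{ij} + r_{ij} \\
\beta_{0j} &= \gamma_{00} + u_{0j} \\
\beta_{1j} &= \gamma_{10} + u_{1j}
\end{align}
```

In this notation the first model splits the score variance into a between-version part, \tau_{00}, and a within-version part, \sigma^2, which gives the baseline against which the other two models are usually compared.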

Results/Conclusions

The second model explained 69% of the variance in the mean scores of test versions, so at the level of the test version the mean could be predicted very well. The third model explained 20% of the variance in the general score of an individual examinee, so at the level of the examinee the ability to predict an individual score was relatively weak.
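
The abstract does not say how these percentages were computed. One common convention in multilevel analysis is the proportional reduction in a variance component relative to the model without predictors. The Python sketch below illustrates that convention only; the function name and all numeric values are hypothetical, chosen so the ratios match the reported percentages, and are not the study's estimates.

```python
def variance_explained(null_component, conditional_component):
    """Proportional reduction in a variance component.

    null_component: variance estimated by the model without predictors.
    conditional_component: the same variance after predictors are added.
    """
    if null_component <= 0:
        raise ValueError("null_component must be positive")
    return (null_component - conditional_component) / null_component


# Hypothetical variance components (NOT taken from the study).
tau00_null = 100.0         # between-version variance, no predictors
tau00_conditional = 31.0   # between-version variance, version-level predictors added
sigma2_null = 400.0        # within-version variance, no predictors
sigma2_conditional = 320.0 # within-version variance, examinee-level predictors added

level2_r2 = variance_explained(tau00_null, tau00_conditional)
level1_r2 = variance_explained(sigma2_null, sigma2_conditional)

print(f"Version level:  {level2_r2:.2f}")  # 0.69
print(f"Examinee level: {level1_r2:.2f}")  # 0.20
```

Under this convention the two figures describe different variance components, which is why a model can predict version means well while still predicting individual scores poorly.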

The model with good predictive ability was tested on another database and was found to be valid.
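
The abstract does not describe how a validated version-level model would be applied in routine quality control. A minimal sketch of one possibility, assuming predicted version means and a residual standard deviation are available from the fitted model: flag any test version whose observed mean departs from its predicted mean by more than a chosen multiple of that standard deviation. The function name, the threshold, and the example values are all hypothetical.

```python
def flag_versions(observed_means, predicted_means, residual_sd, threshold=2.0):
    """Return the test versions whose observed mean is unexpectedly far
    from the model-predicted mean.

    observed_means, predicted_means: dicts mapping version id -> mean score.
    residual_sd: standard deviation of version-level prediction errors,
                 assumed to come from the data used to fit the model.
    threshold: number of residual SDs treated as suspicious (assumed value).
    """
    if residual_sd <= 0:
        raise ValueError("residual_sd must be positive")
    flagged = []
    for version, observed in observed_means.items():
        deviation = abs(observed - predicted_means[version])
        if deviation > threshold * residual_sd:
            flagged.append(version)
    return flagged


# Hypothetical example values (NOT taken from the study).
observed = {"A": 101.0, "B": 99.5, "C": 108.0}
predicted = {"A": 100.0, "B": 100.0, "C": 100.0}

print(flag_versions(observed, predicted, residual_sd=2.0))  # ['C']
```

A flagged version would not by itself indicate a scoring error; it would only mark the version for closer inspection before scores are reported.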

