Open Conference Systems, ITC 2016 Conference

PAPER: Conquer the IRT Hurdle for Adaptive Test Designs
Huijuan Meng, Fanmin Guo, Kyung (Chris) T Han

Building: Pinnacle
Room: 3F-Port of Hong Kong
Date: 2016-07-03 03:30 PM – 05:00 PM
Last modified: 2016-05-22

Abstract

Adaptive tests conventionally require item pools of calibrated items. However, when a testing program is newly developed, or when new items are added to an existing item bank/pool, those items cannot be administered adaptively because they have not yet been calibrated. In that case, an alternative linear-on-the-fly testing (LOFT) design may be viable: subject-matter experts' (SME) judgments about item difficulty can be used to enforce some control over form difficulty. For multi-stage testing (MST) or computerized adaptive testing (CAT), though, SME item difficulty ratings per se may not be sufficient to conquer the hurdle of lacking an IRT-calibrated item pool, because both designs rely on item parameters.

One possible solution is to use two sets of item parameters: one for test assembly and the other for test scoring. When a test is assembled, imputed IRT parameters are used for the new items, based on the known parameters of existing operational items that fall under the same content specification and carry the same SME difficulty rating. With these imputed values, either an MST or a CAT design can be adopted by the testing program to produce higher score precision for examinees across a wider range of the score scale. The tests are then scored with item parameters calibrated and scaled after administration.
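A minimal sketch of one way such imputation could be carried out, assuming each new item simply inherits the average 3PL parameters of the operational items that match it on content area and SME difficulty rating (the 3PL model, the function name, and the field names here are illustrative assumptions, not details from the paper):

    from collections import defaultdict
    from statistics import mean

    def impute_parameters(operational_items, new_items):
        """Assign imputed IRT parameters (a, b, c) to uncalibrated items."""
        # Group calibrated items by (content area, SME difficulty rating).
        groups = defaultdict(list)
        for item in operational_items:
            groups[(item['content'], item['sme_rating'])].append(item)

        for item in new_items:
            donors = groups.get((item['content'], item['sme_rating']), [])
            if not donors:
                continue  # no matching operational items; leave uncalibrated
            # Impute each parameter as the mean over the matched donor items.
            for p in ('a', 'b', 'c'):
                item[p] = mean(d[p] for d in donors)
        return new_items

Other pooling rules (e.g., the median, or matching on the difficulty rating alone) would fit the same description equally well; the abstract does not specify which is used.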

Therefore, the purposes of this study are (1) to evaluate the feasibility of this approach under different conditions, and (2) to identify an optimal calibration strategy for new items in the pool.

Four factors examined in this study are:

  1. Five test designs: LOFT, MST, LOFT+CAT, Random+CAT, and CAT
  2. Two item pool sizes: 600 and 720 items
  3. Three proportions of equating items: 20%, 50%, and 80%
  4. Two calibration strategies: calibrating with all vs. half of the data.
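
If the four factors are fully crossed, the simulation spans 5 × 2 × 3 × 2 = 60 conditions. A trivial enumeration sketch (the full crossing itself is an assumption; the abstract does not state the exact design):

    from itertools import product

    designs = ['LOFT', 'MST', 'LOFT+CAT', 'Random+CAT', 'CAT']
    pool_sizes = [600, 720]
    equating_proportions = [0.20, 0.50, 0.80]
    calibration_strategies = ['all', 'half']

    # Cross all factor levels: 5 * 2 * 3 * 2 = 60 conditions.
    conditions = list(product(designs, pool_sizes,
                              equating_proportions, calibration_strategies))
    assert len(conditions) == 60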

The performance of the various test designs is evaluated in terms of parameter recovery, reliability, item exposure rates, and the conditional standard error of measurement (CSEM).
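
Of these, the CSEM has a closed form worth recalling: at ability level θ it is the reciprocal square root of the test information at θ. A minimal sketch under the 3PL model (the choice of the 3PL and the example parameter values are assumptions for illustration):

    import math

    def p_3pl(theta, a, b, c):
        """Probability of a correct response under the 3PL model."""
        return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

    def item_information(theta, a, b, c):
        """Fisher information of a single 3PL item at ability theta."""
        p = p_3pl(theta, a, b, c)
        return a**2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c))**2

    def csem(theta, items):
        """Conditional SEM = 1 / sqrt(test information at theta)."""
        info = sum(item_information(theta, *item) for item in items)
        return 1.0 / math.sqrt(info)

    # Example: a three-item form, each item given as (a, b, c).
    form = [(1.2, -0.5, 0.20), (0.9, 0.0, 0.25), (1.5, 0.8, 0.20)]
    print(csem(0.0, form))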

