Open Conference Systems, ITC 2016 Conference

PAPER: Predicting Item Difficulty: Methodological Challenges and Way Forward
Yasmine El Masri, Steve Ferrara, Peter W. Foltz, Jo-Anne Baird

Building: Pinnacle
Room: 3F-Port of Hong Kong
Date: 2016-07-04 11:00 AM – 12:30 PM
Last modified: 2016-05-21

Abstract

Understanding what makes a question challenging for students is of prime importance in education. To maximise learning, teachers should provide students with tasks that match their abilities. Item writers need to identify what affects the level of question difficulty so that they can manipulate test demands and best reflect the construct assessed. Item writers should also eliminate construct-irrelevant sources of demand that bias against particular gender, language, or cultural groups in national and international assessments. Pre-testing of items, however, is expensive and not a viable option in many circumstances. Equally, we need better theories about what makes items difficult in order to obtain the most information from assessment data. Despite identifying a large number of explanatory variables, predicting item difficulty remains a challenge in educational assessment.

This presentation highlights methodological challenges faced when determining item difficulty. It uses empirical data from a study predicting the difficulty of 216 Key Stage 2 science items administered in England between 2010 and 2012. The analysis included potential predictors previously identified in the literature: topic, concept, question type, nature of stimulus, depth of knowledge, and linguistic variables. Coding frameworks employed in similar studies were adapted, and linguistic demands were gauged using a computational linguistic facility. Results of stepwise regression analyses were consistent with previous studies, with up to 23% of the variance explained.
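To make the stepwise procedure concrete, the sketch below shows a forward stepwise OLS regression in Python. The predictor names, synthetic data, and selection threshold are illustrative assumptions only; they are not the study's coded variables or the authors' actual analysis code.

```python
# A minimal sketch of forward stepwise regression for item difficulty.
# Predictor names and data are hypothetical, for illustration only.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_items = 216  # number of items in the study

# Hypothetical coded predictors (the study coded topic, concept, question
# type, nature of stimulus, depth of knowledge, and linguistic variables).
X = pd.DataFrame({
    "depth_of_knowledge": rng.integers(1, 4, n_items),
    "word_count": rng.normal(40.0, 10.0, n_items),
    "sentence_length": rng.normal(12.0, 3.0, n_items),
    "question_type": rng.integers(0, 2, n_items),
})
# Synthetic item-difficulty outcome, for illustration only.
y = (0.5 * X["depth_of_knowledge"]
     + 0.02 * X["word_count"]
     + rng.normal(0.0, 1.0, n_items))

def forward_stepwise(X, y, threshold_in=0.05):
    """Add, one at a time, the candidate predictor with the smallest
    p-value below threshold_in; stop when no candidate qualifies."""
    selected = []
    remaining = list(X.columns)
    while remaining:
        pvals = {}
        for cand in remaining:
            model = sm.OLS(y, sm.add_constant(X[selected + [cand]])).fit()
            pvals[cand] = model.pvalues[cand]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= threshold_in:
            break
        selected.append(best)
        remaining.remove(best)
    final = sm.OLS(y, sm.add_constant(X[selected])).fit()
    return final, selected

model, selected = forward_stepwise(X, y)
print("Selected predictors:", selected)
print(f"Variance explained (R^2): {model.rsquared:.2f}")
```

Forward selection is one of several stepwise variants; backward elimination or criterion-based selection (e.g. adjusted R-squared or AIC) would follow the same loop structure with a different inclusion rule.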

While a substantial part of the unexplained variance could be attributed to the unpredictable interaction of variables, we argue that progress in item difficulty prediction requires improvement in the methods employed, most of which rely heavily on subjective expert judgement. Future research needs to focus on improving coding frameworks as well as on developing systematic training protocols for raters, such as anchor-based rating methods. These technical advances would pave the way toward improved test design and reduced assessment development costs.

Keywords: item difficulty, item demands, expert judgement, science tests, coding framework

