ITC 2016 Conference

PAPER: Improving the Reliability of Essay Evaluations Using a Tool from the Literature on the Wisdom-of-Crowds
Avi Allalouf, Meir Barneron, Ilan Yaniv

Building: Pinnacle
Room: 3F-Port of Hong Kong
Date: 2016-07-03 11:00 AM – 12:30 PM
Last modified: 2016-06-08

Abstract


College candidates taking the Psychometric Entrance Test (PET, the Israeli equivalent of the SAT) are required to write a short essay. This task assesses the candidate's academic writing skills and is a central component of the test. The essays are traditionally evaluated and graded by raters who are well trained for this task.

Given the importance of obtaining reliable and accurate evaluations, the common practice is to average the evaluations of two independent raters. This practice is known to improve reliability in performance assessment tests.
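(As a point of reference from standard psychometric theory, not a claim made in this abstract: by the Spearman-Brown formula, averaging k parallel ratings whose single-rating reliability is rho yields a reliability of k*rho / (1 + (k-1)*rho); for example, with two raters and a single-rating reliability of 0.70, the averaged grade reaches roughly 0.82.)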

The National Institute for Testing and Evaluation, which administers the PET, accepts essays written in a dozen foreign languages. The rationale is to make entry into institutions of higher education accessible to candidates from various backgrounds, based on the assumption that examinees express their writing skills best in their mother tongue. Yet in some languages (e.g., Amharic) it is hard to find well-trained raters. This raises the question of whether there is a method that can improve the accuracy of grades based on a single rater.

Recent research in the field of judgment and decision making suggests that judgment accuracy can be improved by eliciting multiple judgments from the same individual (at different times), rather than by eliciting single judgments from multiple individuals. This "wisdom-of-crowds" effect within the mind of a single individual implies that essay evaluations made by the same rater on two different occasions, averaged together, should be more accurate than a grade based on a single evaluation.
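The following Python sketch is purely illustrative of the underlying logic (the rating scale, error variance, and error correlation are assumed numbers, not the study's data or procedure): averaging two judgments from the same rater reduces error whenever the rater's errors on the two occasions are not perfectly correlated.

# Illustrative simulation with assumed parameters; not the authors' data.
import numpy as np

rng = np.random.default_rng(0)
n_essays = 10_000
true_scores = rng.normal(loc=8.0, scale=2.0, size=n_essays)  # hypothetical essay scale

rho = 0.5    # assumed correlation between the rater's errors on the two occasions
sigma = 1.5  # assumed standard deviation of a single rating's error
cov = sigma ** 2 * np.array([[1.0, rho], [rho, 1.0]])
errors = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n_essays)
rating_1 = true_scores + errors[:, 0]
rating_2 = true_scores + errors[:, 1]

rmse_single = np.sqrt(np.mean((rating_1 - true_scores) ** 2))
rmse_within_avg = np.sqrt(np.mean(((rating_1 + rating_2) / 2 - true_scores) ** 2))
print(f"RMSE of a single rating:          {rmse_single:.3f}")
print(f"RMSE of the within-rater average: {rmse_within_avg:.3f}")  # smaller whenever rho < 1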

Our study used professional raters and real essays. We found robust evidence for the benefits of this method. The project is unique in that it incorporates ideas from the judgment and decision-making literature into the field of assessment and evaluation, suggesting a noteworthy application that should be considered in efforts to improve the reliability of evaluations.

