Open Conference Systems, ITC 2016 Conference

SYMPOSIUM: Test-Taking Motivation and the Validity of Inferences from Test Scores: A Global Concern
Sara Finney, Hanna Eklöf, Eva Knekta, Aaron Myers, Catherine Mathers, Christiane Penk, Joseph Rios, Liyang Mao, Lydia Liu, Hongwen Guo

Building: Pinnacle
Room: Cordova-SalonF
Date: 2016-07-03 11:00 AM – 12:30 PM
Last modified: 2016-06-01

Abstract


Session Chair: Sara Finney

Introduction

The validity of inferences made from test scores depends on examinees putting forth the effort necessary to demonstrate proficiency. Whereas test-taking motivation (TTM) is likely to be high when examinees complete tests with personal consequences, TTM is likely to be variable when tests have no personal consequences. As noted in the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999), “When individual scores are not reported to test takers, it is important to determine whether the examinees took the test experience seriously” (p. 167). Several testing programs do not report scores to examinees; however, the stakes of the test are high for educators and policymakers. Using data from several countries and testing contexts, the objective of this symposium was to model the impact of TTM on the validity of test scores.

Contributions

The first paper highlights the decrease in TTM over time for Swedish examinees completing the PISA. Fortunately, multilevel analyses indicated the decrease in TTM was associated with only a small decline in PISA scores. The second paper shows that manipulating test instructions did not enhance TTM or performance for U.S. examinees. Moreover, the fully mediated effect of test importance on performance via effort was not moderated by instructions. The third paper uses data from German examinees to estimate the effect of expectancy (i.e., self-efficacy) on TTM and performance. The fourth paper utilizes both simulated and applied data from U.S. examinees to show that careless responding can have a biasing effect on aggregate test scores and that filtering out examinees with low TTM is questionable for operational testing.

Conclusions

The International Test Commission (2000) calls for test users to “consider other qualities which may have artificially lowered or raised results when interpreting scores” (p. 15, Guideline 2.7.7). These studies highlight the importance of not only considering, but actually measuring and modeling, TTM.

*****

PAPER 1: The Swedish PISA decline: Can changes in reported test-taking effort explain changes in PISA performance?
Hanna Eklöf & Eva Knekta, Umeå University, Sweden

Introduction and Objectives

PISA has had a major impact on discussions about educational quality around the world, including in Sweden. Sweden has shown the largest decline in PISA results of all participating countries, prompting debate about whether Swedish students take the low-stakes PISA test seriously and are motivated to do their best, and whether the decline in PISA performance could be attributed to a lower level of effort now than previously. The present study explored this issue by investigating whether Swedish students report less effort on PISA than students in other countries and whether effort can explain variation in test performance over time.

Design/Methodology

The study used PISA questionnaire data as well as performance data. Of primary interest was the PISA effort thermometer, a self-report measure of students’ test-taking effort. A number of other student background variables were also included in the analyses. Data were analyzed primarily through regression analysis and multilevel regression modeling.

Results and Conclusions

Between 2003 and 2012, there was a small decrease in reported effort among Swedish students, and the decline was larger for girls than for boys. The effect of reported effort on performance remained significant when other background variables were included in the model, but the effect was rather stable over time. Compared to students in other countries, Swedish students reported a comparatively low level of effort. Multilevel analyses showed that most of the variation in effort can be attributed to the student level, but also that there are differences between schools (and countries). In conclusion, however, the findings show that only a few score points of the Swedish results decline can be attributed to the decline in reported effort.


PAPER 2: A Moderated Mediation Model of Test Importance, Examinee Effort, and Test Performance Across Test Instruction Conditions
Aaron J. Myers, Sara J. Finney, & Catherine E. Mathers, James Madison University

Introduction

Research on the effects of test instructions has shown that increasing the personal relevance of a test results in higher test performance, perceived test importance, and examinee effort. We argue these differences in performance and motivation may be due to unrealistic test instruction conditions; thus, we investigated more realistic instruction conditions. Moreover, perceived test importance has been shown to predict test-taking effort, which in turn predicts test performance. This mediated relationship may be moderated by the personal relevance of the test, which we investigated by manipulating test instructions.

Objective

Does the average level of test importance, examinee effort, and test performance differ across instruction conditions? Is the indirect effect of perceived test importance on test performance via examinee effort moderated by instruction condition?

Design/Methodology

Two samples of U.S. college students (first-year N = 1215; upperclass N = 1109) participated in an operational testing program for institutional accountability purposes. Students were randomly assigned to one of three realistic instruction conditions. In Condition 1, students were informed their scores would be aggregated and used to impact decisions at the institution. In Condition 2, students were told they would receive their personal scores. In Condition 3, students were told their scores would be released to faculty.

Results

For both samples, there was no significant difference between instruction conditions with respect to average test performance, effort, and test importance. Moreover, moderated mediation analysis indicated there was a significant and practical indirect effect of importance on performance via effort and this indirect effect was not moderated by instruction condition.
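The indirect (mediated) effect reported here is the product of the importance → effort path and the effort → performance path. As a rough illustration on simulated data (not the authors’ data or their exact model), the unmoderated indirect effect can be estimated with two ordinary least-squares regressions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Hypothetical standardized scores: importance -> effort -> performance
importance = rng.normal(0, 1, n)
effort = 0.5 * importance + rng.normal(0, 1, n)   # a-path set to 0.5
performance = 0.4 * effort + rng.normal(0, 1, n)  # b-path set to 0.4

def ols(X, y):
    """Least-squares slopes of y on X (intercept added, then dropped)."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

a = ols(importance, effort)[0]  # importance -> effort
# effort -> performance, controlling for importance
b = ols(np.column_stack([effort, importance]), performance)[0]
indirect = a * b  # should land near 0.5 * 0.4 = 0.2
```

Moderation by instruction condition would then be checked by estimating this indirect effect within each condition (or with product terms) and testing whether it differs across conditions; the paper found that it did not.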

Conclusions

The stability of the fully mediated effect of test importance on performance via effort across instruction conditions and student populations underscores the value of attending to perceived test importance. Attempts to increase effort and, in turn, performance should center on increasing perceived test importance, which is not increased via test instructions.


PAPER 3: Test-taking motivation in low-stakes assessments: An empirical investigation of the theoretical expectancy x value interaction
Christiane Penk, German Institute for International Educational Research

Introduction/Objectives

Test-taking motivation (TTM) research conducted internationally often applies modern expectancy-value theory. Embedded within this framework, TTM refers to students’ invested effort, their expectancy for success, and their perceived value of the test. Modern expectancy-value models are based on Atkinson’s theory of achievement motivation, which originally assumed an interaction effect between expectancy and value. However, the interaction term has largely disappeared from subsequent research. One recent study by Trautwein and colleagues found the “lost” interaction effect of expectancy and value on performance, but they did not consider effort. It is not known whether those results for achievement motivation are applicable to the motivational processes seen in low-stakes testing contexts, where, besides the two main components of the expectancy-value model, effort is highly relevant. This third component introduces theoretically possible combinations beyond the usual expectancy × value interaction: a) an expectancy × value interaction that affects performance, b) an expectancy × effort interaction that affects performance, and c) a value × effort interaction that affects performance. Thus, the objective of this study was to investigate these theoretical interaction terms and thereby contribute to the existing theory.

Design/Methodology

Analyses are based on data from a German National Assessment Study in 2015, which assessed competencies in German (first language) of 40,000 ninth graders. The students answered questions about their expectancy, value, and effort.

Results and Conclusions

Using a latent moderated structural equations approach, preliminary analyses indicated one significant interaction effect, c) value × effort, on performance (b = -0.094, SE = 0.032, p < .010) after controlling for students’ socio-demographic background. The interaction effect explained only a small amount (0.2%) of the total variance; 6% of the total variance was explained by expectancy, value, effort, and the value × effort interaction together. Implications for the expectancy-value-effort model of TTM are discussed.
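The paper estimates the interaction with latent moderated structural equations; as a simplified observed-variable analogue (simulated data, not the study’s, with an interaction coefficient chosen to match the sign of the reported estimate), a value × effort product term in an ordinary regression illustrates what is being tested:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Hypothetical standardized TTM components
expectancy = rng.normal(0, 1, n)
value = rng.normal(0, 1, n)
effort = rng.normal(0, 1, n)

# Simulate performance with a small negative value x effort interaction,
# mirroring the sign of the coefficient reported in the abstract
performance = (0.15 * expectancy + 0.10 * value + 0.12 * effort
               - 0.09 * value * effort + rng.normal(0, 1, n))

# Design matrix: intercept, three main effects, value x effort product
X = np.column_stack([np.ones(n), expectancy, value, effort, value * effort])
coefs, *_ = np.linalg.lstsq(X, performance, rcond=None)
interaction_b = coefs[4]  # estimate of the value x effort coefficient
```

A negative coefficient here means that the payoff of high effort for performance is smaller among students who value the test highly (and vice versa), which is the pattern the latent analysis probes.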


PAPER 4: The Impact of Careless Responding on Aggregated Scores: To Filter Examinees or Not?
Joseph Rios, Liyang Mao, Lydia Liu, & Hongwen Guo, Educational Testing Service, U.S.A.

Introduction/Objectives

When examinee motivation is questionable, researchers are confronted with determining whether careless responding is an issue and, if so, deciding on the best approach to deal with such responses. As there has been insufficient research on these topics, the objectives of this study were to: a) evaluate the degree of underestimation of the “true” mean when careless responses are present, b) compare the effectiveness of two (examinee- and response-level) filtering procedures in purifying biased aggregated scores, and c) evaluate the assumption that careless responding is unrelated to ability, which underlies examinee-level filtering.

Design/Methodology

Objectives A and B were evaluated through simulation analyses in which the following independent variables were manipulated: a) the percentage of careless responses in the total sample (1% to 25%), b) test difficulty (easy [difficulty = -1], moderate [difficulty = 0], difficult [difficulty = 1]), and c) whether ability was related to careless responding (unrelated [i.e., unmotivated examinees possessed randomly sampled abilities] or related [i.e., unmotivated examinees possessed ability values below 0]). Objective C was examined by comparing prior performance on a high-stakes college admission test between examinees (N = 1,322) with high and low percentages of rapid responding on a low-stakes assessment.
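A minimal sketch of this kind of simulation, under assumptions the abstract does not state (a Rasch response model, a single common item difficulty, and four-option chance-level careless responding), shows how injected careless responses pull down the aggregated mean:

```python
import numpy as np

def mean_bias(pct_careless, difficulty, ability_related,
              n_examinees=5000, n_items=40, seed=0):
    """Underestimation of the 'true' mean (in SD units) when a fraction
    of examinees responds carelessly (at chance) on a Rasch-model test."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(0, 1, n_examinees)  # examinee abilities
    n_careless = int(pct_careless * n_examinees)
    if ability_related:
        # unmotivated examinees drawn from the low end of ability
        careless_idx = np.argsort(theta)[:n_careless]
    else:
        careless_idx = rng.choice(n_examinees, n_careless, replace=False)

    # Rasch probability of a correct response to each item
    p = 1 / (1 + np.exp(-(theta[:, None] - difficulty)))
    responses = (rng.random((n_examinees, n_items)) < p).astype(float)
    true_mean = responses.mean()  # everyone responds according to ability

    # Careless examinees answer at chance (4-option items assumed here)
    observed = responses.copy()
    observed[careless_idx] = rng.random((n_careless, n_items)) < 0.25
    observed_mean = observed.mean()

    sd = responses.mean(axis=1).std()  # SD of examinee proportion-correct
    return (true_mean - observed_mean) / sd

# Easy test: more careless responding -> larger underestimation
bias_small = mean_bias(0.01, difficulty=-1.0, ability_related=False)
bias_large = mean_bias(0.25, difficulty=-1.0, ability_related=False)
```

The size of the bias depends on how far chance performance falls below motivated performance, which is consistent with the paper’s finding that easier tests show meaningful bias at lower careless-response rates than difficult tests.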

Results/Conclusions

Results demonstrated that: a) the “true” mean was underestimated by around 0.20 SDs when the total amount of careless responses exceeded 6.25%, 12.5%, and 25% for easy, moderately difficult, and difficult tests, respectively; b) examinee-level filtering artificially inflated the “true” mean by as much as 0.42 SDs when ability was related to careless responding; and c) in the applied data, the assumption that motivation and ability are unassociated was untenable. Results suggest that: a) only under certain conditions does careless responding have a large biasing effect on aggregated scores, and b) the validity of employing examinee-level filtering for operational use is questionable.

