SYMPOSIUM: Issues in Small-Scale High-Stakes Assessments: Limitations, Challenges and Opportunities
Raman K. Grover, Dallie Sandilands

Building: Pinnacle
Room: Cordova-SalonB
Date: 2016-07-02 03:30 PM – 05:00 PM
Last modified: 2016-06-01

Abstract


Introduction

The development, administration, and analysis of high-stakes assessments are directed by guidelines established in the Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014). However, it can be difficult for small-scale high-stakes (“SSHS”) testing programs to meet such guidelines effectively. Statistical analyses recommended to establish reliability, validity, and fairness require larger sample sizes than those typically available to smaller testing programs. Other challenges include limited funding, inadequate numbers of subject matter experts (SMEs) for test development, insufficient examinees for pilot testing and item calibration, and, in many cases, geographical distances separating examinees, which complicate developing and administering exams.
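To make the sample-size point concrete, a minimal sketch (ours, not part of any symposium paper) is given below: across repeated samples from the same population, a routine reliability statistic such as Cronbach’s alpha is far less stable at SSHS-sized samples than at large-program samples. The form length, sample sizes, and Rasch generating model are assumptions chosen purely for illustration.

import numpy as np

rng = np.random.default_rng(42)

def cronbach_alpha(scores):
    """Cronbach's alpha for an (examinees x items) matrix of 0/1 scores."""
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var / total_var)

def alpha_sd(n_examinees, n_items=40, n_reps=500):
    """How much alpha fluctuates across repeated samples of a given size."""
    b = rng.normal(0, 1, n_items)                        # one fixed form
    alphas = []
    for _ in range(n_reps):
        theta = rng.normal(0, 1, n_examinees)            # examinee abilities
        p = 1 / (1 + np.exp(-(theta[:, None] - b)))      # Rasch response probabilities
        scores = (rng.random((n_examinees, n_items)) < p).astype(int)
        alphas.append(cronbach_alpha(scores))
    return np.std(alphas)

for n in (30, 100, 1000):   # SSHS-sized vs. large-program samples
    print(f"N = {n:4d}: SD of alpha across samples = {alpha_sd(n):.3f}")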

Contributions

Very little research synthesizes the challenges faced by SSHS programs, and little guidance is available to them. The goal of this symposium is therefore to be the first to summarize these challenges and to bring together experts in SSHS assessment to share their experiences in producing high-quality assessments despite them.

The first presenters introduce challenges faced by SSHS programs within a Canadian context. Two groups of researchers then present studies within an American context: one presenter introduces an alternative approach to, and lessons learned from, conducting standard setting studies when few SMEs are available; the other group presents study findings and recommendations about equating methods for SSHS assessments. In the fourth presentation, researchers share insights on the technological challenges faced throughout the stages of developing and administering SSHS assessments in a global, online environment. Finally, the fifth presentation explores the feasibility of multi-stage adaptive testing in a small-scale high-stakes exam.

Conclusions

There are countless SSHS programs globally. Each faces unique challenges in meeting the high standards recommended for high-stakes testing. Because they are by nature small and solitary, they often face these challenges alone. This symposium will provide a first opportunity to collaborate and learn from each other’s experiences.

*****

Identifying Challenges in Small-Scale High-Stakes Assessments in Canada within the Framework of “The Standards”

Raman K. Grover, Psychometric Consultant
Dallie Sandilands, EMP Educational Measurement Professionals

Introduction

It is challenging for small-scale high-stakes (SSHS) assessment programs to meet the guidelines specified in the Standards (AERA, APA & NCME, 2014) because of issues stemming from limited budgets, insufficient numbers of subject matter experts, and inadequate sample sizes for psychometric analyses. These challenges are often amplified in Canada by the need to generate comparable English- and French-language forms. Little research exists to guide psychometricians dealing with these issues.

Objectives

The objective of this presentation is to provide a framework for organizing the issues faced by SSHS assessments in Canada in order to stimulate discussion about possible approaches to dealing with these issues.

Design/Methodology

We organize the issues faced by SSHS exam programs within dual frameworks: Ferrara’s (2007) framework for guiding validity research agendas, and the Standards. Ferrara’s five-step framework focuses on 1) defining the test construct, 2) designing the test blueprint, 3) creating the test forms, 4) administering the test and collecting data, and 5) interpreting the scores and making valid inferences. Each step is aligned with the framework provided by the Standards, as sketched below.
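A hypothetical sketch of that alignment as a simple lookup structure. The chapter titles are those of the 2014 Standards; the particular step-to-chapter mapping shown is an illustrative assumption, not the alignment presented in this session.

# Ferrara's (2007) five steps mapped to chapters of the Standards (AERA,
# APA & NCME, 2014). The mapping below is an illustrative assumption.
FRAMEWORK = {
    "1. Define the test construct":         ["Validity (ch. 1)"],
    "2. Design the test blueprint":         ["Test Design and Development (ch. 4)"],
    "3. Create the test forms":             ["Test Design and Development (ch. 4)",
                                             "Fairness in Testing (ch. 3)"],
    "4. Administer the test, collect data": ["Test Administration, Scoring,"
                                             " Reporting, and Interpretation (ch. 6)"],
    "5. Interpret scores, make inferences": ["Scores, Scales, Norms, Score Linking,"
                                             " and Cut Scores (ch. 5)",
                                             "Validity (ch. 1)"],
}

for step, chapters in FRAMEWORK.items():
    print(f"{step}  ->  {'; '.join(chapters)}")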

Results

This dual framework organizes the issues coherently. In general, the issues faced by SSHS programs in Canada at each step stem from 1) a lack of financial resources; 2) the lack of a sufficient number of subject matter experts to define the test construct, develop the blueprint, write an appropriate number of items in both English and French, and conduct standard setting; and 3) inadequate examinee sample sizes for psychometric analyses.

Conclusion

Although our research applied this dual framework to organize SSHS issues in a Canadian context, it can readily be applied to the issues faced in other countries as well.

*****

Setting Standards when Subject Matter Experts are Scarce and Stakes are High

Tia Sukin, Pacific Metrics

This presentation focuses on an alternative approach to conducting standard setting studies for high stakes assessments with a small number of test takers and scarce subject matter experts (SMEs). This process has been trialed within three similar assessment contexts. The process, lessons learned from implementing the process, and a resulting briefing booklet—which provides emerging validity evidence for the test development process and standard setting outcomes—will be discussed.

The objective of this presentation is to summarize the standard setting approach trialed, along with findings that support the methodology and its implications for validity. This standard setting methodology differs from more traditional approaches in three ways: 1) fewer panelists are recruited; 2) an item review and editing process is incorporated; and 3) no item performance data or impact feedback is provided. Each difference will be addressed in relation to its impact on the validity of the standard setting process.

In brief, in this presentation I argue that it is more important to recruit qualified participants than to have a large number of panelists who do not meet the qualification criteria, despite the negative impact on reliability; that the test development process can be streamlined, and SMEs’ time maximized, by having panelists engage in both item review and standard setting during the same meeting; and that this maximization of SME time can help mitigate the lack of item performance and impact data. A sketch of how a small panel’s cut score and its uncertainty might be computed follows.
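This minimal sketch assumes an Angoff-style rating task (the abstract does not name the method): each panelist judges the probability that a minimally competent candidate answers each item correctly, and the panel mean and its standard error give the cut score and its precision. All ratings are hypothetical.

import numpy as np

# Hypothetical ratings: rows are panelists, columns are items; each entry is
# the judged probability that a minimally competent candidate answers correctly.
ratings = np.array([
    [0.70, 0.55, 0.80, 0.60, 0.65],   # panelist 1
    [0.65, 0.50, 0.85, 0.55, 0.70],   # panelist 2
    [0.75, 0.60, 0.75, 0.65, 0.60],   # panelist 3
])

panelist_cuts = ratings.sum(axis=1)   # each panelist's implied raw cut score
cut = panelist_cuts.mean()
se = panelist_cuts.std(ddof=1) / np.sqrt(len(panelist_cuts))

print(f"Recommended cut: {cut:.2f} of {ratings.shape[1]} items "
      f"(SE = {se:.2f} with {len(panelist_cuts)} panelists)")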

Because of this streamlined approach to test development and standard setting, it is critical that validity studies continue well into the operational delivery of the assessments. These studies will inform the refinement of these processes for future high-stakes, small-scale assessments.

*****

To Equate or Not to Equate: That is the Question

Drew Dallas, NCCPA
Joshua Goodman, NCCPA

Introduction/Objective

All examination programs aim to maximize score fairness, security, and examinee-friendliness (e.g., quick reporting, low cost, frequent administrations). In large-scale testing programs, smart test design and psychometric methods ensure each of these factors is addressed. Small-scale programs, which often cannot safely use large-scale psychometric methods or justify the expense of complicated test designs, must choose which factors to address and which to discount. They must also choose between frequent reuse of forms (efficient but less secure) and ongoing form construction (more secure but less efficient). In the latter approach, small-scale programs typically do not equate new test forms, because equating requires costly and slow post-administration processes before scores can be reported. If equating could be safely employed, small-sample testing programs could strike a compromise among examinee-friendliness (fast and flexible reporting), score integrity (consistent performance expectations), and exam security (new forms to limit exposure). In this presentation we evaluate several equating methodologies in the context of small-scale exams.

Design/Methodology

This study has two parts. In part one, we simulated responses to small-sample tests under a variety of conditions, then used several methods (under a common-item design) to equate forms. The table below summarizes the study conditions.

In part two, we apply the same methods to real data from small-scale testing programs and compare the two sets of results. A sketch of one such method appears below.
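A minimal sketch of one of the small-sample methods examined, the symmetric circle-arc method (Livingston & Kim, 2009), embedded in a deliberately simplified random-groups-style simulation (the study itself uses a common-item design). The form length, group sizes, and score distributions are assumptions for illustration, not the study’s actual conditions.

import numpy as np

rng = np.random.default_rng(7)

def circle_through(p1, p2, p3):
    """Center and radius of the circle through three non-collinear points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    A = np.array([[x2 - x1, y2 - y1],
                  [x3 - x1, y3 - y1]])
    rhs = 0.5 * np.array([x2**2 - x1**2 + y2**2 - y1**2,
                          x3**2 - x1**2 + y3**2 - y1**2])
    cx, cy = np.linalg.solve(A, rhs)
    return cx, cy, np.hypot(x1 - cx, y1 - cy)

def circle_arc_equate(x, lo, hi, mu_new, mu_old):
    """Map new-form raw scores x to the old-form scale via the arc through
    the score-range endpoints and the two observed means. Using (lo, lo) as
    the low point is a simplification; the published method uses the lowest
    meaningful (e.g., chance-level) score."""
    cx, cy, r = circle_through((lo, lo), (mu_new, mu_old), (hi, hi))
    sign = 1.0 if mu_old > cy else -1.0       # branch containing the three points
    return cy + sign * np.sqrt(r**2 - (x - cx)**2)

# Two small groups on two 50-item forms; the new form is slightly harder,
# so the arc bows above the identity line.
n = 30                                         # an SSHS-sized group per form
old = rng.binomial(50, rng.beta(6.0, 4.0, n))  # old-form raw scores
new = rng.binomial(50, rng.beta(5.5, 4.5, n))  # new-form raw scores

x = np.arange(51)
equated = circle_arc_equate(x, 0, 50, new.mean(), old.mean())
print(f"new-form 30 ~ old-form {equated[30]:.2f}")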

Results/Conclusions

This study is currently underway, with results forthcoming. Preliminary findings suggest that specialized small-sample methods (e.g., circle-arc, GLM) become increasingly beneficial as sample size decreases and lose their advantage as sample size increases. The study will provide practical guidelines on which approach is best given sample size, form, and population characteristics.

*****

Proficiency in Less Commonly Taught Languages: Practical Challenges for High-Stakes Testing

Ken Petersen, Camelot Marshall, and Werner Wothke
American Councils for International Education

Introduction/Objective:

Assessing student proficiency in a foreign language poses both theoretical and methodological challenges. Claims about students’ ability to comprehend authentic input or to compose appropriate, accurate, and comprehensible output require rigorous and integrated approaches to content development, test administration, item analysis and scoring.

Proficiency assessment of less commonly taught languages (LCTLs) presents further challenges for testing professionals. Being “less common,” these languages are typically under-represented, both in terms of language professionals and of students. LCTL professionals are typically scattered among academic institutions; students are few in number and frequently spread across the globe. Exams are administered in remote locations with unpredictable hardware and network conditions and limited on-site technical support. In addition, the small samples of LCTL examinees do not easily provide sufficient data for item calibration, so initial test forms often rely on subject matter expert judgments.

Methodology:

Over the past eleven years, American Councils for International Education has been testing language proficiency in LCTLs; current languages include Arabic, Chinese, Hindi, Korean, Russian, Persian, Portuguese, Swahili, Turkish, and Urdu. To support the many small-group test administrations conducted over this period, American Councils has developed an online testing system with three key subsystems: the item development component allows version-controlled, workflow-managed collaborative development and review cycles among content developers; the test administration component provides a secure yet flexible platform for administering, proctoring, and monitoring exams; and the psychometric analysis component supports post-hoc item analysis for quality assurance and standard setting. A sketch of the kind of analysis this last component supports follows.
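As a small illustration of what such post-hoc item analysis involves (our sketch, not American Councils’ actual code): classical item difficulty (the p-value) and corrected point-biserial discrimination, computed here from a hypothetical response matrix.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 25 examinees x 10 items generated from a Rasch model,
# roughly the scale of a small LCTL administration.
theta = rng.normal(0, 1, 25)                     # examinee abilities
b = rng.normal(0, 1, 10)                         # item difficulties
p = 1 / (1 + np.exp(-(theta[:, None] - b)))      # response probabilities
scores = (rng.random((25, 10)) < p).astype(int)

for j in range(scores.shape[1]):
    item = scores[:, j]
    rest = scores.sum(axis=1) - item             # total score excluding this item
    difficulty = item.mean()                     # classical p-value
    r_pb = np.corrcoef(item, rest)[0, 1]         # corrected point-biserial
    print(f"item {j + 1:2d}: p = {difficulty:.2f}, r_pb = {r_pb:+.2f}")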

Results/Conclusions:

This presentation highlights American Councils’ criterion-referenced test development, standard setting, and online administration. Particular attention is paid to the technological challenges of these assessment processes and to the broader implications of the results for college credit and placement, admission to overseas language and culture immersion programs, language scholarship programs, and proficiency certification for employment with the U.S. federal government.

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Ferrara, S. (2007). Our field needs a framework to guide development of validity research agendas and identification of validity research questions and threats to validity. Measurement: Interdisciplinary Research and Perspectives, 5(2-3), 156–164. http://doi.org/10.1080/15366360701487500


