ITC 2016 Conference

SYMPOSIUM: Navigating the Intersection of Professional Standards and Legal Interpretation: A Practitioner’s Challenge
Hillary Michaels, Susan Davis-Becker, Michaela Geddes, Chad W. Buckendahl

Building: Pinnacle
Room: Cordova-SalonB
Date: 2016-07-03 11:00 AM – 12:30 PM
Last modified: 2016-06-02

Abstract


With the release of the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014; Test Standards), the measurement community has an updated set of expectations to guide practice. In providing this guidance, the Test Standards notes particular cautions regarding interpretation and use. One of those cautions relates to its use in legal proceedings, where existing regulatory or other legal requirements may take precedence. For practitioners who work with testing programs that may be subject to legal challenge, understanding that professional standards and legal expectations do not always agree is a non-trivial task. More important, practitioners must decide how to respond when such challenges occur.

This session focuses on different phases of test development and validation in the context of a recent decision in a long-standing legal challenge that affects credentialing and employment testing. This challenge has led to multiple decisions that raise questions about how measurement practitioners should interpret and use the Test Standards. Specifically, Gulino v. Board of Education of the City School District of the City of New York (2015a, 2015b) suggests that reliance on the Test Standards as the primary guidance for development and validation of a credentialing examination for educators was in error. Although the version of the Test Standards (1999) in place at the time of development makes a clear distinction between validation expectations for credentialing examinations and employment tests, the court decisions at various phases of this case's history have consistently interpreted the use as being for personnel selection purposes (see Gulino, 2012, 2014, 2015a, 2015b). This interpretation has the potential to establish a precedent with far-reaching implications for practitioners. Presenters in this session will focus on the following phases: program and test design, practice analysis, standard setting, and fairness.

Considerations for Program and Test Design
Hillary Michaels, HumRRO

As in most industries, successful programs are predicated on a good plan. Cases like Gulino raise questions about how a psychometric practitioner should engage in program and test design and, more important, how validity evidence to support the use of test scores for credentialing or employment purposes should be collected and evaluated. Practitioners in the education and credentialing sectors rely primarily on the guidance provided by the Test Standards for directing the collection of validity evidence, evaluating the appropriateness and use of test scores, and supporting the inferences drawn from those scores. This presentation will examine factors that practitioners can consider when weighing legal precedent during a program design or redesign in the context of professional expectations and best practice.

For credentialing programs that seek to identify whether candidates have the minimally defined characteristics associated with the meaning of the credential, the key issues from the case serve as a framework for analyzing considerations for practitioners. In addition to defining the purpose of the credential and its intended uses, there are a number of practical considerations that programs will evaluate at the design or redesign phase. Factors such as the intended candidate population, eligibility criteria, successful candidate performance, types of measurement methods and item types, security threats, administration strategy and locations, credential maintenance, documentation requirements, and legal precedent can all be considered part of the design phase. Each of these factors can directly or indirectly influence the validity of the scores and decisions. Therefore, understanding how these elements can be considered and evaluated as part of the validity argument will be an important activity for programs seeking to mitigate potential legal challenges. Anticipating that additional challenges to programs may emerge from the decisions observed in Gulino, practitioners will be able to adopt or adapt the recommendations from this presentation for their own programs.

Considerations for Practice Analysis
Susan Davis-Becker, ACS Ventures, LLC

The centrality of the practice analysis in the Gulino deliberations and decisions raises a number of questions about how a psychometric practitioner should appropriately conduct these studies. This presentation will explore the challenges facing an examination that is used as part of a larger credentialing program as it responds to the rulings. A credentialing examination similar to the one at the center of Gulino (2015b) with respect to purpose, intended use, and design is conducting its most recent practice analysis with these lessons in mind. This paper discusses the redesign of the program's practice analysis, highlighting features that were noted in the decisions and how elements of the Test Standards and best practice differ from some of the legal commentary.

This study was conducted in two phases. In the first phase, the current program was evaluated for available validity evidence of the job-relatedness of the knowledge, skills, and abilities of the target candidate, identifying strengths and areas for improvement. This review began with the historical foundations and rationale for the testing requirements, continued with a practice analysis centered on a content validation study of the program, and then evaluated the test framework against a national database of related occupational expectations. The second phase was a program redesign activity in which the findings from the first phase were used to evaluate future directions for the program, what changes needed to be made to the current design (e.g., requirements, content, intended use of test scores), and how these changes would be implemented over time. The presentation will also provide guidance to practitioners on how to design and execute this type of process in the context of the court's ruling, along with information about how these validation efforts can be appropriately documented.

Considerations for Standard Setting
Michaela Geddes, Yardstick

When stakes are attached to the use of scores from a test, every aspect of the test is subject to inquiry. One aspect that is especially vulnerable is the method by which performance standards (i.e., passing scores) are established. Examinees who receive an unfavourable decision may bring a lawsuit against the test developer or the test user. As a result, the process for determining the performance standard(s), as well as the evidence that supports the passing score, may be challenged. Legal defensibility requires a standard setting method that produces valid passing scores aligned with their intended interpretation and use. The methodology for setting the passing score for the examinations highlighted in Gulino was discussed in expert witness reports and testimony. Although the topic did not receive extensive discussion in the court's ruling, it represents one of the key elements of a defensible program.

This presentation will discuss common methodologies used to define the performance of the target candidate and recommend passing scores on both multiple-choice examinations and performance-based assessments (e.g., OSCEs, clinical skills examinations), and will share case studies that bear on legal issues. The presentation will also cover the types of evidence that programs and agencies can collect and evaluate to support valid, legally defensible performance standards. Approaches for maintaining the meaning of the passing score across forms and over time will also be discussed. The speaker will address the implications for practice in the context of Gulino, where the passing score was raised by plaintiffs as one of the challenges to the technical characteristics of the examination program. Evaluation of the evidence collected during standard setting, along with appropriate approaches for documenting that evidence, will also be included.
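
To illustrate how one such methodology produces a recommended passing score, the sketch below works through a modified Angoff calculation using entirely hypothetical panelist ratings; it is offered as a generic example of the technique, not as the method used for the examinations at issue in Gulino.

    # Minimal sketch of a modified Angoff cut-score calculation (hypothetical data).
    # Each row holds one panelist's judged probability that a minimally qualified
    # candidate would answer each item correctly.
    ratings = [
        [0.60, 0.75, 0.55, 0.80, 0.70],  # panelist 1
        [0.65, 0.70, 0.50, 0.85, 0.75],  # panelist 2
        [0.55, 0.80, 0.60, 0.75, 0.65],  # panelist 3
    ]

    n_items = len(ratings[0])
    # Average the ratings for each item across panelists, then sum across items
    # to obtain the recommended raw cut score.
    item_means = [sum(p[i] for p in ratings) / len(ratings) for i in range(n_items)]
    cut_score = sum(item_means)
    print(f"Recommended raw cut score: {cut_score:.2f} out of {n_items} points")

In operational practice, panelists typically complete multiple rounds of ratings with feedback, and the recommended cut score is evaluated alongside other evidence before adoption.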

Considerations for Fairness
Chad W. Buckendahl, ACS Ventures, LLC

Because of the technical nature of test development and validation as it related to key questions in Gulino, expert witnesses were called upon throughout the proceedings to offer opinions about the validity evidence. This presentation will focus on the key questions of fairness that led to the initial challenge, along with the technical characteristics of fairness that were evaluated and deliberated during the proceedings. In this case, plaintiffs asserted that the examination produced adverse impact for members of protected classes. Specifically, African-American and Hispanic candidates passed the examination at rates below a predefined threshold (80% of the majority group's pass rate). Because the intended interpretation and use of the test scores were for credentialing purposes, the author applied the Test Standards as the primary resource for evaluating the evidence. Beyond the evidence from differences in pass rates, evidence from judgmental studies (e.g., bias review) and empirical studies (e.g., differential item functioning, DIF) was presented in support of the test being fair across the candidate population.
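
The 80% threshold referenced above corresponds to the four-fifths rule associated with the Uniform Guidelines (1978): the pass rate of each group is divided by the pass rate of the group with the highest rate, and a ratio below 0.80 is commonly treated as initial evidence of adverse impact. The sketch below works through the arithmetic with hypothetical pass counts; it does not reproduce any figures from the case.

    # Minimal sketch of the four-fifths (80%) rule using hypothetical pass counts.
    groups = {
        "majority": (850, 1000),  # (passed, tested) - illustrative numbers only
        "group_a": (300, 500),
        "group_b": (280, 400),
    }

    # Pass rate for each group, with the highest observed rate as the comparison point.
    rates = {name: passed / tested for name, (passed, tested) in groups.items()}
    highest = max(rates.values())

    for name, rate in rates.items():
        ratio = rate / highest
        flag = "potential adverse impact" if ratio < 0.80 else "no flag"
        print(f"{name}: pass rate {rate:.2f}, impact ratio {ratio:.2f} ({flag})")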

A primary lesson from this topic is that even though the Test Standards is intended to serve as a primary guide for the profession, the court did not defer to it. Further, the court-appointed expert prioritized alternative expectations such as the SIOP Principles (2003) and the Uniform Guidelines (1978), which diluted the perceived value of the Test Standards. This is professionally worrisome because test developers and users responsible for credentialing programs would reasonably have used the Test Standards for guidance, evaluated the differences in use between credentialing and employment, and designed their programs accordingly. Although the Test Standards is not intended to be prescriptive, when its guidance does not align with legal expectations, practitioners are exposed to greater risk because they may be unaware of the challenges that can emerge when another paradigm serves as the relevant standard of evaluation in a legal proceeding.

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (2014). Standards for educational and psychological testing. Washington, D.C.: American Educational Research Association.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1999). Standards for educational and psychological testing. Washington, D.C.: American Educational Research Association.

Gulino v. Board of Education of the City School District of the City of New York (2015a, August 7). U.S. District Court, S.D. New York.

Gulino v. Board of Education of the City School District of the City of New York (2015b, June 5). U.S. District Court, S.D. New York.

Gulino v. Board of Education of the City School District of the City of New York. 555 F. App’x 37 (2d Cir. 2014).

Gulino v. Board of Education of the City School District of the City of New York. 907 F. Supp. 2d 492, 498 (S.D.N.Y. 2012).

Society for Industrial and Organizational Psychology (2003). Principles for the validation and use of personnel selection procedures (4th ed.).

Uniform guidelines on employee selection procedures. (1978). 43 FR 34295.

