Open Conference Systems, ITC 2016 Conference

SYMPOSIUM: Current Advances in Research on the Multidimensional Forced-Choice Format
Eunike Wetzel, Anna Brown, Daniel Morillo, Vicente Ponsoda, Iwin Leenen, Francisco Abad, Pedro Hontangas, Yin Lin, Rachelle Sass, Ulf-Dietrich Reips, Dave Bartram

Building: Pinnacle
Room: 2F-Port of Vancouver
Date: 2016-07-02 11:00 AM – 12:30 PM
Last modified: 2016-06-03

Abstract


Introduction

The multidimensional forced-choice (MFC) format has been proposed as an alternative to the popular rating scale format, which suffers from drawbacks such as a high susceptibility to response biases. In the MFC format, two or more items measuring different traits are presented to the respondent simultaneously. The respondent's task is either to rank the items according to how well they describe him or her, or to choose the item that is most descriptive and the item that is least descriptive. The goal of this symposium is to provide an overview of current research on the MFC format, with a particular focus on parameter estimation and test construction issues.
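For modeling purposes, a full ranking of an MFC block is commonly recoded into binary outcomes for every pair of items in the block (as in Thurstonian IRT approaches). A minimal sketch of that recoding, with illustrative function and item names:

```python
from itertools import combinations

def rank_to_pairwise(ranks):
    """Recode a block ranking into binary pairwise outcomes.

    `ranks` maps item labels to rank positions (1 = most like me).
    Returns {(i, k): 1 if item i was ranked above item k, else 0}
    for every unordered pair of items in the block.
    """
    items = sorted(ranks)
    return {(i, k): int(ranks[i] < ranks[k]) for i, k in combinations(items, 2)}

# A triplet ranked A > C > B yields three pairwise outcomes.
outcomes = rank_to_pairwise({"A": 1, "B": 3, "C": 2})
```

A block of n items thus yields n(n-1)/2 binary comparisons: three for a triplet, six for a tetrad, and so on.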

Contributions

In the first talk, Author 1 will present a study on the development of a multidimensional forced-choice Big Five instrument with triplets that are matched regarding their social desirability. The test construction process and analyses conducted for item selection will be described. Second, Author 2 will present a simulation study comparing two estimation procedures for pairwise comparisons, an MCMC procedure and the Thurstonian item response procedure, regarding the accuracy of the estimators and their standard errors. Third, Author 3 will present a study investigating whether using only a subset of the scales in an MFC instrument assessing job competencies affects the resulting trait scores and their validities. The final presentation by Author 4 will report the results of a study comparing test-taking motivation between the rating scale format and different versions of the MFC format (pairs, triplets, tetrads, pentads). Lastly, Discussant will discuss the contributions in this symposium.

Conclusions

Recent developments in modeling forced-choice responses make the MFC format a viable alternative to rating scales. The research presented in this symposium shows advances in the areas of parameter estimation and test construction with the MFC format.

*****

First presentation

Development of a multidimensional forced-choice Big Five instrument
Eunike Wetzel, Rachelle Sass, & Ulf-Dietrich Reips, University of Konstanz

Introduction

The multidimensional forced-choice (MFC) format makes demands on test construction that go beyond those made in the construction of rating scale questionnaires.  Trait recovery is most accurate when item blocks contain either only positively keyed items or a mixture of positively and negatively keyed items and when blocks contain items measuring traits with low or negative correlations (Brown & Maydeu-Olivares, 2011).

Objectives

The goal of this project was to develop a Big Five instrument that was optimized for the multidimensional forced-choice format. A second goal was to match items within triplets regarding their social desirability.

Design

Social desirability ratings were obtained for the initial item pool of 213 items. Based on the ratings and additional requirements, a first pilot instrument with 71 triplets was constructed. Data for pilot study 1 (N = 988) were collected via Amazon Mechanical Turk, and item analyses were conducted with the Thurstonian item response model. Items were selected based on factor loadings and item information. Two versions of the revised pilot instrument with 33 triplets were administered to 634 and 652 participants, respectively, via Prolific Academic. Item analyses were repeated to obtain the final instrument with 20 triplets.
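The selection rule described above can be sketched as a simple two-stage filter; the threshold, the keys, and the example items below are hypothetical, not taken from the study:

```python
def select_items(items, min_loading=0.4, n_keep=20):
    """Toy item-selection rule: drop items with weak factor loadings,
    then keep the most informative of the remainder.

    `items` is a list of dicts with (hypothetical) keys
    'name', 'loading', and 'information'.
    """
    retained = [it for it in items if abs(it["loading"]) >= min_loading]
    retained.sort(key=lambda it: it["information"], reverse=True)
    return [it["name"] for it in retained[:n_keep]]

pool = [
    {"name": "E1", "loading": 0.72, "information": 1.8},
    {"name": "A4", "loading": 0.31, "information": 2.1},  # weak loading: dropped
    {"name": "C2", "loading": -0.55, "information": 1.2},
]
selected = select_items(pool, n_keep=2)
```

In practice the decision is less mechanical, since (as the Results note) an item's information depends on the other items in its block.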

Results

In the context of assessing the Big Five, some of the guidelines were easier to implement than others. Item properties such as item information depended on which other items an item was grouped with. Matching items on social desirability led to an unbalanced distribution of traits across triplets.

Conclusions

Test construction for MFC questionnaires is highly demanding. Since the MFC format eliminates the occurrence of a number of response biases such as acquiescence, the benefits may still outweigh the costs, although more research on the susceptibility of the MFC format to other response biases is necessary.

Second presentation

Comparing CFA and Bayesian estimations of forced-choice questionnaires with paired dominance items
Daniel Morillo, Vicente Ponsoda, Iwin Leenen, Francisco J. Abad, Universidad Autonoma de Madrid; & Pedro M. Hontangas, University of Kent

Introduction

The Multi-Unidimensional Pairwise Preference, 2-Parameter Logistic model (MUPP-2PL) has been proposed for paired forced-choice items with a dominance measurement model. Along with it, an MCMC algorithm has been introduced that allows for joint Bayesian estimation of the model. Given the quasi-equivalence of the MUPP-2PL and the Thurstonian IRT (TIRT) model applied to pairwise preferences, both the MCMC algorithm and confirmatory factor analysis (CFA) estimation can be applied to the same pairwise preference data.
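The intuition behind a dominance model for pairs can be sketched as a logistic function of the difference between the two items' 2PL response processes. The parameterization below is illustrative only and is not claimed to match the MUPP-2PL paper's exact formulation:

```python
import math

def p_prefer(theta_i, a_i, b_i, theta_k, a_k, b_k):
    """Illustrative pairwise-preference probability under a dominance
    difference model: the respondent tends to endorse the item whose
    2PL process value is higher. Parameter names are assumptions:
    theta = latent trait, a = discrimination, b = location."""
    t_i = a_i * (theta_i - b_i)  # item i's dominance process
    t_k = a_k * (theta_k - b_k)  # item k's dominance process
    return 1.0 / (1.0 + math.exp(-(t_i - t_k)))

# With equal discriminations, preference tracks the trait difference.
p = p_prefer(theta_i=1.0, a_i=1.5, b_i=0.0, theta_k=-1.0, a_k=1.5, b_k=0.0)
```

Note the symmetry: swapping the two items' roles yields the complementary probability, which is what makes the pairwise coding coherent.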

Objectives and design

The goal is to compare the two estimation procedures under several simulation conditions to gain insight into the advantages and drawbacks of each method. The following factors were manipulated: questionnaire length (18 and 36 items), proportion of item pairs with opposite polarity (2/3 of pairs with a direct and an inverse item, 1/3 of pairs with a direct and an inverse item, and all pairs with direct items only), interdimensional correlations (0, .25, and .5), and number of respondents (500 and 1,000). Several goodness-of-recovery indices were computed to test the accuracy of the estimators and their standard errors: mean error, mean RMSE, and correlation (or mean reliability in the case of the latent traits) for each parameter type's estimates, as well as the percentage of estimates falling within the 95% confidence/credible interval.
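The recovery indices listed above are standard summaries and can be sketched in a few lines. This is a generic illustration of the index definitions, not the study's code; the well-behaved simulated estimator at the bottom is an assumption for demonstration:

```python
import math
import random

def recovery_indices(true, est, se, z=1.96):
    """Goodness-of-recovery summaries for one parameter type:
    mean signed error, RMSE, true-estimate correlation, and the share
    of cases where the nominal 95% interval (est +/- z*se) covers the
    true value."""
    n = len(true)
    errs = [e - t for t, e in zip(true, est)]
    mean_t = sum(true) / n
    mean_e = sum(est) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(true, est))
    var_t = sum((t - mean_t) ** 2 for t in true)
    var_e = sum((e - mean_e) ** 2 for e in est)
    return {
        "mean_error": sum(errs) / n,
        "rmse": math.sqrt(sum(d * d for d in errs) / n),
        "correlation": cov / math.sqrt(var_t * var_e),
        "coverage": sum(abs(d) <= z * s for d, s in zip(errs, se)) / n,
    }

# An unbiased estimator whose noise matches its reported standard error
# should show mean error near 0, RMSE near the SE, and ~95% coverage.
random.seed(0)
true = [random.gauss(0, 1) for _ in range(1000)]
est = [t + random.gauss(0, 0.2) for t in true]
indices = recovery_indices(true, est, se=[0.2] * 1000)
```

Biased or overconfident estimators show up directly in these summaries: a nonzero mean error signals bias, and coverage well below 95% signals underestimated standard errors.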

Results

The two algorithms produced similar results. MCMC was significantly better at recovering the latent space structure (i.e., less biased correlations) and produced more reliable latent trait estimates with more accurate estimation errors. However, its computation time was much longer.

Conclusions

We suggest a two-step strategy: in a first, exploratory step, the faster TIRT procedure is used to find a theoretically plausible model with an appropriate fit to the data. In a second step, the MCMC algorithm is used to obtain a more accurate estimation. This process is illustrated with an empirical application.

Third presentation

Does the Removal of Scales from a Forced-Choice Competency Assessment Alter Its Properties?
Yin Lin & Anna Brown, University of Kent

Introduction

Forced choice (FC) is becoming more popular thanks to its superior resistance to response biases and distortions (e.g., Christiansen, Burns, & Montgomery, 2005) and normative scoring through Item Response Theory (Brown & Maydeu-Olivares, 2013). A typical example of this format is the multidimensional forced choice (MFC) triplet, where three statements measuring three different scales are presented simultaneously and ranked.

Objectives

The MFC format comes with practical complexity. For example, in recruitment projects, validation studies with incumbents are sometimes conducted using a long assessment covering all potentially important competencies for a job role, in order to identify the key drivers of good performance to be targeted in the actual candidate assessments. However, removing certain scales from the pre-designed MFC assessment used in the validation study unavoidably changes its design, requiring a new MFC assessment to be constructed. The question is whether this redesign affects the resulting scores and their validities.

Design

An alternative-form study (N = 508) was conducted comparing two MFC competency assessments, one assessing 18 scales and the other a subset of 12. A well-validated personality assessment was also included to explore construct validity.

Results

Convergent correlations between the overlapping scales were strong (range 0.61–0.81, mean 0.71). Moreover, a structural equation model constraining all scale intercorrelations to be equal (Bentler, 1995; Byrne, 2006) yielded excellent model fit (CFI = 0.999, RMSEA = 0.010). Finally, construct validities against the personality instrument were similar, with the vast majority (98.4%) of correlations within 0.10 of each other.
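The "98.4% within 0.10" summary is simply the share of matched validity correlations whose absolute difference stays below a tolerance. A sketch with hypothetical correlation values (not the study's data):

```python
def profile_agreement(corrs_full, corrs_reduced, tol=0.10):
    """Share of matched construct-validity correlations that agree
    within `tol` between the full and reduced assessments."""
    pairs = list(zip(corrs_full, corrs_reduced))
    return sum(abs(a - b) <= tol for a, b in pairs) / len(pairs)

# Hypothetical validity correlations for three scale-criterion pairs:
# the middle pair differs by 0.12 and falls outside the tolerance.
agreement = profile_agreement([0.42, 0.30, -0.15], [0.45, 0.18, -0.12])
```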

Conclusions

Empirical evidence from this study thus supported measurement invariance across different FC assessments constructed from the same item bank, even when the sets of scales being assessed differ.

Fourth presentation

Taking the test-taker's perspective: A comparison of test-taking motivation between forced-choice and rating scale instruments
Rachelle Sass, Ulf-Dietrich Reips, & Eunike Wetzel, University of Konstanz

Introduction

Several steps underlie the process of responding to questionnaire items: reading and interpreting the item's content, retrieving relevant information, integrating the information to form a judgment, and reporting the judgment. The forced-choice format requires an additional step: weighing the items in the block against each other to determine their rank. Responding to forced-choice items should therefore pose higher cognitive demands than responding to rating scale items. Previous findings indicate that high task difficulty and cognitive load may lower test-taking motivation and lead participants to execute the response process less diligently. Test-taking motivation might also be lower in the forced-choice format because participants are forced to choose between items.

Objectives

The present study investigated whether there are differences in test-taking motivation between the two response formats. It was hypothesized that the more items included in one block, the stronger the decline in test-taking motivation.

Design

A sample of 2,000 participants completed the test online and was randomly assigned to one of five test versions: four forced-choice versions consisting of pairs, triplets, tetrads, and pentads, and a rating scale version. The items in these test versions came from an instrument assessing the Big Five. Test-taking motivation was assessed with several items from the motivation and lack-of-concentration scales of the Test Attitude Survey (Arvey et al., 2006), adapted to the context of personality assessment.

Results

Preliminary analyses indicate that test-taking motivation was lower in the forced-choice test versions compared with the rating scale version. The effect was particularly strong when items were presented as tetrads or pentads.

Conclusions

This research indicates that test-takers find it more motivating to respond to rating scale items compared with forced-choice items. Further research could investigate methods to construct forced-choice questionnaires that increase test-taking motivation.

