With advances in computerized testing, the collection of item response times (RTs) has become routine in many large-scale tests. As a result, besides the traditional item response accuracy (RA) data, an additional source of information is available to test developers and data analysts. Recorded RTs may help improve test design, the detection of aberrant response behavior, and item selection in computerized adaptive tests. For example, when respondents are not motivated in a low-stakes test, they may respond to items in a speeded manner; such behavior may not be easily identified from RA alone. Among the RT modeling approaches proposed so far, the hierarchical modeling framework (van der Linden, 2007) is one of the most flexible tools for explaining the relationship between response speed and accuracy. The framework is general enough to integrate available measurement models for RA and RT. Currently, however, almost all RT research employs only unidimensional item response theory (IRT) models as the measurement model for RA. Unidimensional IRT models provide only a single overall ability score, which may not meet the need for multidimensional analysis and assessment results.
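The two-level structure of the hierarchical framework can be sketched as follows (person i, item j; notation loosely follows van der Linden, 2007, with the RA measurement model left generic and the symbols here chosen for illustration only):

```latex
% Level 1: a measurement model for RA (any IRT model for X_{ij}),
% paired with a lognormal model for RT:
\log T_{ij} \sim N\!\left(\beta_j - \tau_i,\; \alpha_j^{-2}\right)
% Level 2: multivariate normal person and item distributions whose
% covariance structure carries the speed--accuracy relationship:
(\theta_i, \tau_i)^{\top} \sim \mathrm{MVN}(\mu_{P}, \Sigma_{P}), \qquad
(b_j, \beta_j, \alpha_j)^{\top} \sim \mathrm{MVN}(\mu_{I}, \Sigma_{I})
```

Here β and α are the item time-intensity and time-discrimination parameters, τ is person speed, and the second-level covariance matrices link speed to ability and time intensity to difficulty.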
To provide multidimensional analysis results enriched with the collateral information in RTs, this study proposed a joint responses and times multidimensional Rasch model (JRT-MRM) for fitting RT and RA data simultaneously. In the JRT-MRM, the multidimensional Rasch model (Adams, Wilson, & Wang, 1997) serves as the measurement model for RA, and the lognormal RT model (van der Linden, 2006) as the RT model. Model parameters were estimated with the Bayesian MCMC method via JAGS (Version 4.2.0; Plummer, 2015). The PISA 2012 and 2015 computer-based mathematics data were analyzed; for simplicity, only the PISA 2012 results are reported here. This dataset contains 1,582 participants’ dichotomous RA and log-RT data on 10 items. According to the PISA 2012 mathematics assessment framework (OECD, 2013) and the log-file databases for the released computerized mathematics items, four mathematical content knowledge dimensions were assessed: (θ1) change and relationships, (θ2) quantity, (θ3) space and shape, and (θ4) uncertainty and data. The test has a between-item multidimensional structure (Adams et al., 1997). To evaluate the advantages of incorporating RT information (or, conversely, the consequences of ignoring it), both the JRT-MRM and the MRM were fitted to the data.
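Under these choices, the joint likelihood for each respondent factors into a Rasch part for RA and a lognormal part for RT. The following is a minimal sketch of that joint log-likelihood, not the authors' estimation code (which used JAGS); the function and parameter names are illustrative, and the between-item structure is encoded by mapping each item to exactly one ability dimension:

```python
import numpy as np

def jrt_mrm_loglik(x, t, theta, tau, b, beta, alpha, dim):
    """Joint log-likelihood for one respondent under a sketch of the
    JRT-MRM: a between-item multidimensional Rasch model for RA plus
    van der Linden's (2006) lognormal model for RT.

    x     : 0/1 responses to J items
    t     : response times (seconds) for the J items
    theta : K-vector of latent abilities
    tau   : scalar latent speed
    b     : J-vector of item intercept/easiness parameters
    beta  : J-vector of item time-intensity parameters
    alpha : J-vector of item time-discrimination parameters
    dim   : J-vector mapping each item to its dimension (0..K-1)
    All names here are illustrative, not the authors' notation.
    """
    # Rasch part: P(x_j = 1) = logistic(theta_{dim(j)} + b_j)
    eta = theta[dim] + b
    p = 1.0 / (1.0 + np.exp(-eta))
    ll_ra = np.sum(x * np.log(p) + (1.0 - x) * np.log(1.0 - p))
    # Lognormal RT part: log t_j ~ N(beta_j - tau, 1 / alpha_j^2)
    mu = beta - tau
    ll_rt = np.sum(np.log(alpha) - 0.5 * np.log(2.0 * np.pi)
                   - np.log(t)
                   - 0.5 * (alpha * (np.log(t) - mu)) ** 2)
    return ll_ra + ll_rt

# Toy call: two items loading on one dimension.
x = np.array([1, 1])
t = np.array([30.0, 60.0])
ll = jrt_mrm_loglik(x, t, theta=np.array([0.5]), tau=0.0,
                    b=np.zeros(2), beta=np.array([3.5, 4.0]),
                    alpha=np.ones(2), dim=np.array([0, 0]))
```

In the study itself these parameters were sampled by MCMC; a likelihood sketch like this is useful mainly for checking the model's factorization or for posterior predictive checks.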
For the item parameters, the correlation between the estimated item intercept/easiness parameters of the two models was 0.9997. In the JRT-MRM, the estimated item time-intensity parameters ranged from 3.740 to 4.779. More importantly, the standard errors (the standard deviations of the posterior distributions) of the estimated item intercept/easiness parameters of the JRT-MRM were generally smaller than those of the MRM, which means that considering RT in the analysis leads to more precise estimation of the item parameters. In the JRT-MRM, the estimated correlation between the item intercept/easiness parameters and the time-intensity parameters was –0.422, consistent with previous findings that more difficult items need more time to be solved (e.g., Fox & Marianti, 2016; van der Linden, 2006, 2007). For the person parameters, the correlations between the corresponding latent ability estimates of the two models were 0.989, 0.997, 0.985, and 0.953 for the four dimensions, respectively. In the JRT-MRM, the estimated person speed parameters ranged from –0.913 to 2.910. The estimated correlations between person speed and θ1, θ2, θ3, and θ4 were –0.351, –0.245, –0.365, and –0.487, respectively, indicating moderate negative correlations between the multidimensional abilities and person speed. Although this result runs counter to the common-sense expectation that more able respondents tend to work faster, some studies have also reported negative correlations between ability and speed parameters (e.g., Klein Entink, Fox, et al., 2009; van der Linden & Fox, 2015). As a low-stakes test, PISA has limited consequences for individual respondents (Huff & Goodman, 2007). Thus, a reasonable explanation is that low-ability respondents lacked motivation in taking the test (Wise & Kong, 2005), which led to shorter RTs and more incorrect responses than among high-ability respondents.
Overall, the proposed JRT-MRM performed well in the real data analysis and makes the joint analysis of RT and RA data practical. The results indicate that incorporating RT in the multidimensional Rasch model yields more accurate estimates of the model parameters and gives data analysts an opportunity to use RT information in further decisions and interventions.
Key words
item response theory /
multidimensional item response theory /
item response times /
computer-based assessment /
joint modeling /
Rasch model /
PISA