Psychological Science ›› 2019, Vol. 42 ›› Issue (1): 170-178.


Joint Modeling for Response Times and Response Accuracy in Computer-based Multidimensional Assessments

Peida ZHAN   

  • Received: 2017-11-05; Revised: 2018-09-28; Online: 2019-01-20; Published: 2019-01-20
  • Contact: Peida ZHAN

  1. Zhejiang Normal University

Abstract: With the advance of computerized testing, the collection of item response times (RTs) has become routine in many large-scale tests. As a result, besides the traditional item response accuracy (RA) data, an additional source of information is available to test developers and data analysts. Recorded RTs may help to improve test design, detect aberrant response behavior, and guide item selection in computerized adaptive tests. For example, when respondents are not motivated in a low-stakes test, they may respond to items in a speeded manner; such responding behavior may not be easily identified on the basis of RA alone. Among the RT modeling approaches proposed so far, the hierarchical modeling framework (van der Linden, 2007) is one of the most flexible tools for explaining the relationship between response speed and accuracy, and it is general enough to integrate available measurement models for RA and RT. Currently, however, almost all RT research employs only unidimensional item response theory (IRT) models as the measurement model for RA. Unidimensional IRT models provide a single overall ability score, which may not meet the need for multidimensional analysis and assessment results. To provide multidimensional analysis results enriched with the collateral information in RTs, this study proposed a joint responses and times multidimensional Rasch model (JRT-MRM) for fitting RT and RA data simultaneously. In the JRT-MRM, the multidimensional Rasch model (Adams, Wilson, & Wang, 1997) serves as the measurement model and the lognormal RT model (van der Linden, 2006) as the RT model. Model parameters were estimated with the Bayesian MCMC method via JAGS (Version 4.2.0) (Plummer, 2015). The PISA 2012 and 2015 computer-based mathematics data were analyzed; for simplicity, only the PISA 2012 data are discussed here. This dataset contains 1582 participants' dichotomous RA and log RT data on 10 items.
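The two component models combined in the JRT-MRM can be written down compactly. The following is a minimal sketch, not the author's code, with illustrative parameter names: the between-item multidimensional Rasch likelihood for RA (each item loads on a single dimension, so only that dimension's ability enters) and van der Linden's (2006) lognormal likelihood for RT, joined under the usual conditional-independence assumption.

```python
import math

def rasch_prob(theta_d, b_i):
    """P(correct) under the between-item multidimensional Rasch model:
    the item loads on a single dimension d, so only theta_d enters.
    b_i is the item difficulty (negative intercept/easiness)."""
    return 1.0 / (1.0 + math.exp(-(theta_d - b_i)))

def lognormal_rt_logpdf(t, beta_i, tau_j, alpha_i=1.0):
    """Log-density of RT t under the lognormal RT model (van der Linden, 2006):
    log t ~ Normal(beta_i - tau_j, 1 / alpha_i**2), with time intensity beta_i,
    person speed tau_j, and time discrimination alpha_i."""
    z = alpha_i * (math.log(t) - (beta_i - tau_j))
    return math.log(alpha_i) - math.log(t) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z

def joint_loglik(x, t, theta_d, b_i, beta_i, tau_j, alpha_i=1.0):
    """Joint log-likelihood of one (RA, RT) pair for person j and item i,
    assuming RA and RT are conditionally independent given the parameters."""
    p = rasch_prob(theta_d, b_i)
    ll_x = math.log(p) if x == 1 else math.log(1.0 - p)
    return ll_x + lognormal_rt_logpdf(t, beta_i, tau_j, alpha_i)
```

In the hierarchical framework the person parameters (the four abilities and speed) and the item parameters are then tied together by multivariate population distributions, whose correlation matrices yield the speed-accuracy relationships reported below.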
According to the 2012 PISA mathematics assessment framework (OECD, 2013) and the log-file databases for the released computerized mathematics items, four mathematical content knowledge dimensions were assessed: (θ1) change and relationships, (θ2) quantity, (θ3) space and shape, and (θ4) uncertainty and data. The test has a between-item multidimensional structure (Adams et al., 1997). To evaluate the advantages of introducing RT information (or, equivalently, the consequences of ignoring it), both the JRT-MRM and the MRM were fitted to the data. For item parameters, the correlation between the estimated item intercept/easiness parameters of the two models was 0.9997. In the JRT-MRM, the estimated item time-intensity parameters ranged from 3.740 to 4.779. More importantly, the standard errors (the standard deviations of the posterior distributions) of the estimated item intercept/easiness parameters under the JRT-MRM were generally smaller than those under the MRM, which means that considering RT in the analysis leads to more precise estimation of the item parameters. In the JRT-MRM, the estimated correlation between the item intercept/easiness parameters and the time-intensity parameters was –0.422, consistent with previous findings that more difficult items need more time to be solved (e.g., Fox & Marianti, 2016; van der Linden, 2006, 2007). For person parameters, the correlations between the corresponding latent ability estimates of the two models were 0.989, 0.997, 0.985, and 0.953, respectively. In the JRT-MRM, the estimated person speed parameters ranged from –0.913 to 2.910. The estimated correlation between θ1 and person speed was –0.351, between θ2 and person speed –0.245, between θ3 and person speed –0.365, and between θ4 and person speed –0.487, indicating moderate negative correlations between the multidimensional abilities and the person speed parameter.
Although this result is at odds with the common-sense expectation that more able respondents tend to work faster, several studies have also reported negative correlations between the ability and speed parameters (e.g., Klein Entink, Fox, et al., 2009; van der Linden & Fox, 2015). As a low-stakes test, PISA has limited consequences for individual respondents (Huff & Goodman, 2007). A reasonable explanation, therefore, is that low-ability respondents lacked motivation in taking the test (Wise & Kong, 2005), which led to shorter RTs and a greater number of incorrect responses than for high-ability respondents. Overall, the proposed JRT-MRM works well in real data analysis and makes the RT data usable alongside the RA data. The results indicate that incorporating RT into the multidimensional Rasch model yields more accurate estimation of the model parameters and gives data analysts an opportunity to use RT information to make further decisions and interventions.
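The correlations reported above (e.g., –0.422 between easiness and time intensity, and –0.351 to –0.487 between the abilities and speed) are ordinary Pearson correlations computed over the parameter estimates. A minimal sketch with made-up numbers (not the PISA estimates):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of estimates."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical easiness and time-intensity estimates for 5 items:
# harder (lower-easiness) items tend to have higher time intensity,
# so the correlation comes out negative, as in the PISA results.
easiness = [1.2, 0.8, 0.1, -0.4, -1.0]
time_intensity = [3.8, 3.9, 4.3, 4.2, 4.7]
print(round(pearson_r(easiness, time_intensity), 3))
```

The same function applied to the ability and speed estimates of the 1582 respondents would give the person-side correlations.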

Key words: item response theory, multidimensional item response theory, item response times, computer-based assessment, joint modeling, Rasch model, PISA

Abstract (Chinese): With advances in psychological and educational measurement research and in technology, computerized (large-scale) testing has attracted increasing attention. To explore how response time data can be used to support the assessment of multidimensional latent abilities in computerized multidimensional tests, and to provide methodological support for quality monitoring of compulsory education in China, this study took the PISA 2012 and 2015 computer-based mathematics test data as an example and proposed a joint responses and times multidimensional Rasch model that uses response time and response accuracy data simultaneously. The analysis of the PISA data with the new model shows that introducing response time data not only helps improve the precision of the model parameter estimates, but also helps data analysts use respondents' response time information to make further decisions and interventions (e.g., diagnosing aberrant response behavior or prerequisite knowledge).
