With advances in psychological and educational testing, researchers have increasingly focused not only on measuring the abilities or traits of test takers but also on assessing their mastery of specific knowledge structures. As a result, cognitive diagnostic assessment has become a major focus within psychological and educational measurement. In practice, however, both general and cognitive diagnostic tests frequently elicit abnormal response patterns, such as missing responses and random guessing, which can be attributed to individual characteristics, item properties, or both. These abnormal responses can bias parameter estimation, thereby threatening the reliability and validity of the tests, so handling them appropriately is crucial for accurate data analysis. While much of the existing research on abnormal responses has been conducted within the Item Response Theory (IRT) framework, work in the cognitive diagnosis domain remains scarce and is still in its early stages. Inspired by the IRTree framework, this study develops a novel cognitive diagnostic model that simultaneously accounts for missing responses and random guessing, aiming to represent abnormal response patterns within cognitive diagnostic assessments more faithfully and to inform future research.
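To make the tree logic concrete, one plausible three-node structure is sketched below. The node order, the latent propensities $\theta_i^{(m)}$ and $\theta_i^{(g)}$, the pseudo-item parameters $b_j^{(m)}$ and $b_j^{(g)}$, and the guessing constant are illustrative assumptions rather than the paper's specification; only the terminal node uses the standard LCDM item response function. Item $j$ for examinee $i$ is routed through an omission node, a guessing node, and a terminal accuracy node:

$$P(M_{ij}=1) = \mathrm{logit}^{-1}\!\big(\theta_i^{(m)} - b_j^{(m)}\big) \quad \text{(node 1: omit vs. respond)}$$

$$P(G_{ij}=1 \mid M_{ij}=0) = \mathrm{logit}^{-1}\!\big(\theta_i^{(g)} - b_j^{(g)}\big) \quad \text{(node 2: guess vs. solve)}$$

$$P(X_{ij}=1 \mid \boldsymbol{\alpha}_i, M_{ij}=0, G_{ij}=0) = \mathrm{logit}^{-1}\!\big(\lambda_{j,0} + \boldsymbol{\lambda}_j^{\top}\mathbf{h}(\boldsymbol{\alpha}_i, \mathbf{q}_j)\big) \quad \text{(node 3: LCDM accuracy)}$$

where $\boldsymbol{\alpha}_i$ is the binary attribute profile, $\mathbf{q}_j$ the item's Q-matrix row, and $\mathbf{h}(\cdot)$ the LCDM main-effect and interaction terms. Under random guessing ($G_{ij}=1$), correctness could plausibly be fixed at $1/A_j$ for an item with $A_j$ options, again an assumption for illustration.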
The paper begins with a comprehensive review of relevant concepts, theories, and prior research. It then details the structure of the new model, the prior specifications for its parameters, and the Markov chain Monte Carlo (MCMC) estimation method. A fully crossed 3 × 2 × 2 × 4 four-factor experimental design is employed, varying the proportion of missing responses (2.5%, 5%, 10%), the proportion of random guessing (2.5%, 5%), the sample size (1,000, 1,500), and the handling method (IRTree-LCDM, LCDM-FCS, LCDM-CIM, LCDM-ZR). This simulation study evaluates the parameter estimation accuracy and robustness of the new model and compares its attribute classification accuracy with that of traditional cognitive diagnostic models paired with different missing-data treatments (i.e., fully conditional specification, corrected item-mean imputation, and zero replacement). Finally, the new model is applied to real data from the 8th-grade mathematics test of TIMSS 2019: its fit is compared with that of traditional cognitive diagnostic models, and typical test takers are analyzed to illustrate the model's advantages and practical value.
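For readers who want to reproduce the shape of the simulation, the following Python sketch enumerates the 3 × 2 × 2 × 4 crossed design and implements the two deterministic imputation baselines. The function names are hypothetical, and the corrected item-mean routine is one common formulation (the item mean rescaled by the examinee's relative observed performance); the paper's exact variant may differ.

```python
import itertools
import numpy as np

# Fully crossed simulation design: 3 x 2 x 2 x 4 = 48 conditions.
MISS_RATES = [0.025, 0.05, 0.10]    # proportion of missing responses
GUESS_RATES = [0.025, 0.05]         # proportion of random guessing
SAMPLE_SIZES = [1000, 1500]         # number of examinees
METHODS = ["IRTree-LCDM", "LCDM-FCS", "LCDM-CIM", "LCDM-ZR"]

design = list(itertools.product(MISS_RATES, GUESS_RATES, SAMPLE_SIZES, METHODS))
assert len(design) == 48

def zero_replace(X):
    """Zero replacement (ZR): score every missing response as incorrect."""
    return np.nan_to_num(X, nan=0.0)

def corrected_item_mean(X):
    """Corrected item-mean imputation (one common variant, assumed here):
    impute the item mean, rescaled by the examinee's observed performance
    relative to the mean of the item means for the items actually answered."""
    X = np.asarray(X, dtype=float)
    item_mean = np.nanmean(X, axis=0)       # per-item proportion correct
    out = X.copy()
    for i in range(X.shape[0]):
        answered = ~np.isnan(X[i])
        if not answered.any():              # examinee omitted everything:
            out[i] = item_mean              # fall back to raw item means
            continue
        correction = X[i, answered].mean() / item_mean[answered].mean()
        out[i, ~answered] = np.clip(item_mean[~answered] * correction, 0.0, 1.0)
    return out
```

FCS, by contrast, is iterative multiple imputation and is not reproduced here; the IRTree-LCDM condition requires no imputation at all, since omission is modeled directly as a tree node.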
Results show that: (1) Compared with the traditional LCDM using FCS, CIM, and ZR to handle missing data, the newly developed IRTree-LCDM exhibits superior parameter estimation and diagnostic precision: the average Attribute Classification Correct Rate (ACCR) exceeds 0.946, and the average Pattern Classification Correct Rate (PCCR) reaches 0.783. (2) The proportion of abnormal response patterns affects classification accuracy at both the attribute and pattern levels: the higher the proportion of abnormal responses, the lower the accuracy. Even so, the new model retains a clear advantage over the traditional LCDM (with FCS, CIM, or ZR imputation) in handling missing responses and random guessing. (3) Compared with the traditional LCDM (using ZR imputation), IRTree-LCDM also performs better on the real test data, providing more reasonable estimates of test takers' attribute mastery patterns.
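Both accuracy criteria are simple agreement rates between true and estimated attribute profiles. A minimal sketch follows; the function and variable names are assumed for illustration.

```python
import numpy as np

def accr_pccr(alpha_true, alpha_hat):
    """Attribute and Pattern Classification Correct Rates.

    alpha_true, alpha_hat: (N examinees x K attributes) arrays of 0/1
    attribute profiles (true vs. estimated).
    """
    match = np.asarray(alpha_true) == np.asarray(alpha_hat)
    accr = match.mean(axis=0).mean()   # attribute-wise agreement, averaged over K
    pccr = match.all(axis=1).mean()    # exact agreement on the whole K-pattern
    return accr, pccr

# Example: accr, pccr = accr_pccr(true_profiles, estimated_profiles)
```

Because PCCR requires all K attributes to be classified correctly at once, it is necessarily the stricter criterion, which is why the reported PCCR (0.783) sits well below the reported ACCR (0.946).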
In conclusion, the IRTree-LCDM demonstrates clear practical value for handling abnormal responses in cognitive diagnostic assessment.
Key words
cognitive diagnosis /
item response tree model /
item response theory /
missing responses /
random guessing