Evaluation of Item-Level Fit in Cognitive Diagnosis Model
Gao Xuliang, Wang Fang, Xia Linpo, Hou Minmin
2024, 47(2): 474-484.
DOI: 10.16719/j.cnki.1671-6981.20240226
The goal of cognitive diagnosis models (CDMs) is to classify examinees into latent classes with different attribute patterns, providing diagnostic information about whether a student has mastered a set of skills or attributes. Compared with unidimensional item response theory (IRT) models, CDMs provide a more fine-grained assessment of students' strengths and weaknesses. Although CDMs were originally developed in the field of educational assessment, they have since been used to evaluate other constructs, such as psychological disorders and context-based abilities. As with any model-based assessment, a key step in applying a CDM is checking model-data fit, that is, the consistency between model predictions and observed data. Only when the model fits the data can the estimated model parameters be interpreted reliably. Item-fit assessment evaluates the fit of each item to the model, which helps identify aberrant items; deleting or revising these items can improve the overall model-data fit of the test. Several item-fit statistics commonly used in IRT have been extended to CDMs, but no study has systematically compared their performance in the CDM context. In this study, we compared the performance of χ², G², S-χ², z(r), z(l), and Stone's Q1 in CDMs. A simulation study investigated the Type I error rate and power of these item-fit statistics. The manipulated factors included sample size (N = 500, 1000), generating model (DINA, DINO, and ACDM), fitting model (DINA, DINO, and ACDM), test length (30 and 60 items), test quality (high and low), and significance level (.01 and .05). Each test measured five attributes. For high-quality and low-quality tests, the guessing and slipping parameters of the three generating models were drawn from the uniform distributions U(.05, .15) and U(.15, .25), respectively.
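The DINA data-generating setup described above can be sketched as follows. This is a minimal illustration, not the authors' code: the Q-matrix construction (each item measuring one to three attributes) and the attribute distribution (each attribute mastered independently with probability .5) are assumptions not specified in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

N, J, K = 500, 30, 5          # examinees, items, attributes (one simulated condition)

# Hypothetical Q-matrix: each item requires 1-3 of the K attributes
Q = np.zeros((J, K), dtype=int)
for j in range(J):
    ks = rng.choice(K, size=rng.integers(1, 4), replace=False)
    Q[j, ks] = 1

# High-quality test: guessing and slipping parameters drawn from U(.05, .15)
g = rng.uniform(0.05, 0.15, J)
s = rng.uniform(0.05, 0.15, J)

# Attribute patterns: each attribute mastered independently with probability .5
alpha = rng.integers(0, 2, size=(N, K))

# DINA ideal response eta: 1 iff the examinee masters all required attributes
eta = (alpha @ Q.T == Q.sum(axis=1)).astype(int)   # N x J

# Correct-response probabilities and simulated dichotomous responses
P = np.where(eta == 1, 1 - s, g)                    # N x J
X = (rng.random((N, J)) < P).astype(int)
```

Under the DINA model, an examinee who possesses every attribute required by an item answers correctly with probability 1 − s (slipping), while anyone missing a required attribute answers correctly only with the guessing probability g.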
The simulation results showed that, in terms of Type I error, z(r) and z(l) performed best under all conditions. In terms of power, when the generating model was the ACDM, z(r) and z(l) had the highest average power under all conditions. When the generating model was DINA or DINO, χ² and G² had higher power in low-quality tests, whereas z(r) had the highest power in high-quality tests. In short, combining Type I error and power, when the data fit the ACDM, z(r) and z(l) performed best; when the data fit the DINA or DINO model, χ² and G² performed best in low-quality tests, whereas z(r) performed best in high-quality tests. This study only examined conditions with five attributes, and real tests may measure more attributes; future research should therefore investigate the influence of the number of attributes. Lastly, person-fit assessment is also an important step in cognitive diagnostic testing, as it can help identify aberrant responses from individual students. More research on person fit in cognitive diagnosis models is needed.
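To make the idea of an item-level fit statistic concrete, the sketch below computes a simplified chi-square-type statistic for a single DINA item: examinees are grouped by their ideal response η, and the observed proportion correct in each group is compared with the model-implied probability (g for η = 0, 1 − s for η = 1). This two-group form is a toy illustration only, not any of the specific statistics compared in the study; in practice the grouping is based on estimated attribute profiles, and reference distributions may be obtained by resampling, as in Stone's approach.

```python
import numpy as np

def item_chi2(X_j, eta_j, g_j, s_j):
    """Toy chi-square-type item-fit statistic for one DINA item.

    Groups examinees by ideal response eta_j (0/1) and compares the
    observed proportion correct in each group with the model-implied
    probability (g_j for eta = 0, 1 - s_j for eta = 1).
    """
    stat = 0.0
    for e in (0, 1):
        mask = eta_j == e
        n_c = mask.sum()
        if n_c == 0:
            continue
        obs = X_j[mask].mean()
        exp = g_j if e == 0 else 1.0 - s_j
        stat += n_c * (obs - exp) ** 2 / (exp * (1.0 - exp))
    return stat

# Hypothetical data: 200 examinees answering one item with g = s = .1
# (the high-quality parameter range used in the simulation)
rng = np.random.default_rng(1)
eta_j = rng.integers(0, 2, 200)
X_j = (rng.random(200) < np.where(eta_j == 1, 0.9, 0.1)).astype(int)
stat = item_chi2(X_j, eta_j, g_j=0.1, s_j=0.1)
```

A large value of the statistic indicates that the observed group-wise proportions deviate from the model's predictions, flagging the item for possible revision or removal.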