2019, Vol. 42, Issue (5): 1251-1259.


The Robustness of the Item-level Model Comparison Statistics in Cognitive Diagnostic Models

  

  • Received: 2018-09-09 Revised: 2019-02-01 Online: 2019-09-20 Published: 2019-09-20
  • Contact: Yan-Lou LIU

Robustness of Item-Level Model Comparison Statistics in Cognitive Diagnostic Models

刘彦楼, 张倩萌, 郑宗军, 尹昊

  1. Qufu Normal University
  2. Taishan University
  • Corresponding author: Yan-Lou LIU (刘彦楼)

Abstract: Over the past decade, cognitive diagnostic models (CDMs) have received considerable attention as psychometric models. A variety of specific and general CDMs, which make different assumptions about how an examinee’s latent attribute mastery pattern influences test performance, have been developed in the literature. For example, the deterministic inputs, noisy “and” gate (DINA) model, the deterministic inputs, noisy “or” gate (DINO) model, and the additive cognitive diagnostic model (A-CDM) are specific CDMs. The log-linear cognitive diagnosis model, the generalized DINA (G-DINA) model, and the general diagnostic model are examples of general CDMs. Although specific CDMs can be shown to be special cases of the general models, selecting the most appropriate CDM at the item level is of great importance to researchers and practitioners, because a correctly specified CDM can provide more accurate estimates of attribute mastery patterns than a general CDM. When the Q-matrix is correctly specified and the saturated model provides the best model-data fit, many methods, such as the Wald and likelihood ratio (LR) tests, are available for selecting the most appropriate CDM against the saturated CDM at the item level. However, a CDM is a simplification of reality; in most if not all circumstances, CDMs seldom represent real-world phenomena perfectly. It is therefore reasonable to explore the robustness of item-level model comparison statistics under model-data misfit. The primary purpose of this simulation study was to investigate the impact of model-data misfit on the empirical performance, in terms of Type I error and power, of four statistics for item-level model selection: the Wald statistic based on the observed information matrix (W_Obs), the Wald statistic based on the sandwich-type matrix (W_Sw), the LR statistic, and a newly proposed Wald statistic based on the empirical cross-product information matrix (W_XPD).
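For orientation, the three Wald variants named above can be read as the same quadratic form with a different covariance estimator plugged in. The block below is a sketch in generic notation, not the paper's own formulas. All symbols are assumptions introduced here: \hat{\beta}_j is the vector of saturated-model parameters for item j, R_j is a restriction matrix encoding the reduced model, \hat{V}_j is the item-j block of the estimated asymptotic covariance matrix, and \ell_i is the log-likelihood contribution of examinee i.

```latex
% Wald statistic for item j (generic form; notation assumed for illustration)
W_j = (R_j \hat{\beta}_j)^{\top}
      \left( R_j \hat{V}_j R_j^{\top} \right)^{-1}
      (R_j \hat{\beta}_j)

% Two information matrices built from the N examinees
I_{\mathrm{Obs}} = -\sum_{i=1}^{N}
      \frac{\partial^{2} \ell_i}{\partial \beta \, \partial \beta^{\top}},
\qquad
I_{\mathrm{XPD}} = \sum_{i=1}^{N}
      \frac{\partial \ell_i}{\partial \beta}
      \frac{\partial \ell_i}{\partial \beta^{\top}}

% Covariance estimators behind W_Obs, W_XPD, and W_Sw
\hat{V}_{\mathrm{Obs}} = I_{\mathrm{Obs}}^{-1},
\qquad
\hat{V}_{\mathrm{XPD}} = I_{\mathrm{XPD}}^{-1},
\qquad
\hat{V}_{\mathrm{Sw}} = I_{\mathrm{Obs}}^{-1} \, I_{\mathrm{XPD}} \, I_{\mathrm{Obs}}^{-1}
```

Under a correctly specified model, the two information matrices estimate the same quantity, so the three covariance estimators should agree asymptotically. Under misspecification they can diverge, which is the usual motivation for the sandwich form.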
Four factors were manipulated in the simulation: 4 sample sizes × 3 data-generating models × 2 Q-matrix specification types × 4 item-level model comparison statistics. The levels were (a) sample size: 1,000, 2,000, 3,000, and 5,000; (b) data-generating model: DINA, A-CDM, and G-DINA; (c) Q-matrix specification type: correctly specified and misspecified; (d) item-level model comparison statistic: W_XPD, W_Obs, W_Sw, and LR. The attribute tetrachoric correlation was fixed at ρ = .7, and the test length and the number of latent attributes were fixed at J = 30 and K = 5. A summary of the Q-matrix misspecification is provided in Table 1. The intercept parameter λ_{j,0} was fixed at .2, and the main and/or interaction effect parameters were all set equal to .7/n_j, where n_j is the number of main and/or interaction effect parameters for item j. Binary response data were generated using the G-DINA framework, with 300 replications in each condition. The model parameters were estimated using the CDM package, and the asymptotic covariance matrix of the item parameters was estimated using the dcminfo package; the R code is available upon request. The simulation results showed that: (a) When the Q-matrix was correctly specified, W_Sw and the LR test had better Type I error control for the DINA model, whereas for the A-CDM the Type I error rates of W_Sw and W_Obs were better than those of LR and W_XPD. (b) Although Q-matrix misspecification had a negative impact on the Type I error rates of W_Sw, W_Sw was the most robust of the four statistics across all simulation conditions. (c) Interestingly, when the Q-matrix was misspecified, W_XPD controlled the Type I error rate slightly better than W_Sw under most of the simulation conditions.
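The data-generating step described above can be sketched as follows. This is a minimal illustration, not the authors' R code, and it rests on several assumptions: an identity-link G-DINA parameterization, attribute patterns obtained by dichotomizing correlated standard normal variables at 0, and a reading of the text in which DINA keeps only the highest-order interaction and A-CDM keeps only main effects. The function names and the small `Q_demo` matrix are hypothetical, since the study's Q-matrix is not given in the abstract.

```python
import itertools
import numpy as np

def item_success_prob(alpha, q, model, intercept=0.2, total_effect=0.7):
    """P(X = 1 | alpha) for one item under an identity-link G-DINA setup.

    Assumes the item requires at least one attribute (q has a nonzero entry).
    """
    required = [k for k, needed in enumerate(q) if needed]
    # Every non-empty subset of required attributes is one main/interaction effect.
    subsets = [s for r in range(1, len(required) + 1)
               for s in itertools.combinations(required, r)]
    if model == "DINA":        # only the highest-order interaction is nonzero
        active = [tuple(required)]
    elif model == "A-CDM":     # main effects only
        active = [s for s in subsets if len(s) == 1]
    elif model == "G-DINA":    # all main and interaction effects
        active = subsets
    else:
        raise ValueError(f"unknown model: {model}")
    effect = total_effect / len(active)  # .7 / n_j, equal for every effect
    # An effect contributes only if all attributes in its subset are mastered.
    mastered = sum(all(alpha[k] == 1 for k in s) for s in active)
    return intercept + effect * mastered

def simulate_responses(n, Q, model, rho=0.7, seed=0):
    """Draw attribute patterns and binary responses for n examinees."""
    rng = np.random.default_rng(seed)
    K = Q.shape[1]
    corr = np.full((K, K), rho)
    np.fill_diagonal(corr, 1.0)
    # Assumption: thresholding correlated normals at 0 gives the attributes.
    z = rng.multivariate_normal(np.zeros(K), corr, size=n)
    alpha = (z > 0).astype(int)
    probs = np.array([[item_success_prob(a, q, model) for q in Q] for a in alpha])
    responses = (rng.random(probs.shape) < probs).astype(int)
    return alpha, responses

# Hypothetical 3-item Q-matrix with K = 5 attributes (not the study's Q-matrix).
Q_demo = np.array([[1, 0, 0, 0, 0],
                   [1, 1, 0, 0, 0],
                   [0, 1, 1, 1, 0]])
alpha, X = simulate_responses(1000, Q_demo, "G-DINA")
```

With these settings, an examinee who has mastered none of an item's required attributes answers correctly with probability .2, and one who has mastered all of them with probability .9, under all three generating models.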
We conclude that when the saturated G-DINA model fits the response data well, W_Sw can be used to select the most appropriate CDM at the item level; if the model does not fit the data well enough, however, W_XPD might be a better choice.

Key words: cognitive diagnostic model, information matrix, sandwich-type matrix, Wald statistics, model comparison

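Summary: A simulation study compared the performance of four statistics for item-level model comparison under model-data misfit: the previously proposed Wald statistics based on the observed information matrix and on the sandwich-type matrix (denoted W_Obs and W_Sw, respectively), the likelihood ratio (LR) statistic, and a newly proposed Wald statistic based on the empirical cross-product information matrix (W_XPD). The results showed that (1) the Type I error control of W_Sw was highly robust, and (2) W_XPD outperformed W_Sw under most conditions in which the Q-matrix was misspecified. Conclusion: when the model fits the data well, W_Sw can be used for item-level model comparison; when the model does not fit the data, W_XPD may be a better choice.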

Keywords: cognitive diagnostic model, information matrix, sandwich-type matrix, Wald statistic, model comparison