Abstract
Cognitive diagnostic assessment (CDA) has recently received much attention in educational and psychological measurement. Within CDA, the cognitive diagnostic test is the key component: it examines whether individuals have mastered the attributes that the test is intended to measure, based on their item responses. Many factors can affect the quality of a cognitive diagnostic test, and differential item functioning (DIF) is one of the most important. Researchers have developed many methods to detect DIF items, such as Mantel–Haenszel (MH), the simultaneous item bias test, logistic regression (LR), and the Wald statistic. These methods were mainly designed to compare two groups, the reference group and the focal group. In practice, however, there are often more than two groups, such as different classrooms within a school or different schools within a district. A few studies have extended DIF detection methods to multiple groups; for example, Li and Wang (2015) used the Wald statistic to detect DIF items across three groups. However, in that calculation of the Wald statistic the information matrix is itemwise-based, which yields inflated Type I error rates. Both the MH and LR methods can also be used to detect DIF items for multiple groups by using the total score as the matching variable, but their results are worse than those of the itemwise-based Wald statistic in most conditions; therefore, the Wald statistic is the focus of this study. We attempt to extend the improved Wald statistics to more than two groups in order to control the Type I error rate and to improve power.
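To make the multiple-group comparison concrete, the following is a minimal sketch of a Wald test of the hypothesis that an item's parameters are equal across all groups. It is an illustration under stated assumptions, not the implementation used in the study: the function name `multigroup_wald` is hypothetical, the estimates are assumed to be stacked group by group, and the covariance matrix `cov` is assumed to have been obtained already from whichever information matrix (itemwise, cross-product, or observed) is being evaluated.

```python
import numpy as np

def multigroup_wald(beta, cov, n_groups, n_params):
    """Wald statistic for H0: item parameters are equal across all groups.

    beta : estimates stacked by group (group 1 params, group 2 params, ...)
    cov  : covariance matrix of beta (assumed supplied by the caller,
           e.g. the inverse of the chosen information matrix)
    Returns (W, df); under H0, W is referred to a chi-square with df degrees
    of freedom.
    """
    beta = np.asarray(beta, dtype=float)
    cov = np.asarray(cov, dtype=float)
    df = (n_groups - 1) * n_params
    # Contrast matrix R: each row compares group k with group k+1
    # on a single parameter.
    R = np.zeros((df, n_groups * n_params))
    for k in range(n_groups - 1):
        for p in range(n_params):
            row = k * n_params + p
            R[row, k * n_params + p] = 1.0
            R[row, (k + 1) * n_params + p] = -1.0
    diff = R @ beta
    # W = (R beta)' (R cov R')^{-1} (R beta)
    W = float(diff @ np.linalg.solve(R @ cov @ R.T, diff))
    return W, df

# Hypothetical example: three groups, two parameters per item
# (e.g. guessing and slip in a DINA-type model); values are made up.
beta = [0.20, 0.10, 0.20, 0.10, 0.30, 0.10]
cov = 0.001 * np.eye(6)
W, df = multigroup_wald(beta, cov, n_groups=3, n_params=2)
```

With three groups and two parameters per item, the test has 4 degrees of freedom, so an item would be flagged at the .05 level when W exceeds the corresponding chi-square critical value of about 9.49. The three Wald statistics compared in this study differ only in how `cov` is obtained.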
A simulation study is conducted to investigate the performance of two improved Wald statistics for DIF detection with more than two groups in CDA. Six factors are manipulated: DIF type, DIF size, sample size, test length, proportion of DIF items, and DIF detection method. Five factors are fixed: the number of groups, the number of attributes, the correlation among attributes, the model used to generate response patterns, and the distribution of item parameters. The Type I error rate and statistical power are used to evaluate the performance of the three DIF detection methods, with the nominal Type I error rate set at .05. To reduce sampling error, 50 replications are used for each condition. The results show that (1) in all conditions, the itemwise-based (IW-based) Wald statistic produces larger, inflated Type I error rates than the two improved Wald statistics, namely the cross-product information-based (XPD-based) Wald statistic and the observed information-based (Obs-based) Wald statistic; (2) when the DINA model is used to estimate the item parameters, the Type I error rates of the two improved Wald statistics are close to the nominal level in most conditions; and (3) the IW-based Wald statistic yields the highest power in all conditions, while the Obs-based and XPD-based Wald statistics produce similar power in most conditions. The differences among the three Wald statistics diminish when the sample size and DIF size are relatively large.
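As an illustration of the two evaluation criteria, the sketch below computes an empirical Type I error rate (the proportion of non-DIF items flagged) and empirical power (the proportion of DIF items flagged) from flagging decisions pooled over replications. The function name `error_and_power` and the data layout are assumptions made for this example; the flags shown are made up and are not results from the study.

```python
def error_and_power(flags, dif_items):
    """Empirical Type I error rate and power over replications.

    flags     : list of replications; flags[r][j] is True when item j
                was flagged as DIF in replication r
    dif_items : indices of the items that were simulated with DIF
    """
    dif_items = set(dif_items)
    false_pos = non_dif = true_pos = dif = 0
    for rep in flags:
        for j, flagged in enumerate(rep):
            if j in dif_items:
                dif += 1
                true_pos += bool(flagged)   # correct detection
            else:
                non_dif += 1
                false_pos += bool(flagged)  # false alarm
    type1 = false_pos / non_dif if non_dif else float("nan")
    power = true_pos / dif if dif else float("nan")
    return type1, power

# Hypothetical flags for 2 replications of a 4-item test; item 3 has DIF.
flags = [
    [True, False, False, True],
    [False, False, True, True],
]
type1, power = error_and_power(flags, dif_items={3})
```

In a design like the one described above, `flags` would hold 50 replications per condition, and the resulting Type I error rate would be compared with the nominal level of .05.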
Key words
Cognitive diagnostic test / differential item functioning / multiple groups / improved Wald statistics
Cite this article
Using Information Matrix-based Method to Detect Differential Item Functioning with Multiple Groups in Cognitive Diagnostic Test[J]. Journal of Psychological Science. 2022, 45(3): 710-717