Psychological Science ›› 2019, Vol. 42 ›› Issue (1): 187-193.


Comparison of Bifactor CAT Item Selection Criteria for Polytomous Items

  

  • Received:2017-11-30 Revised:2018-08-28 Online:2019-01-20 Published:2019-01-20
  • Contact: Xiu-Zhen MAO

Comparison of Item Selection Strategies for Polytomously Scored Items in Bifactor Model MCAT

毛秀珍1,刘欢2,唐倩1   

  1. Sichuan Normal University
    2.
  • Corresponding author: Xiu-Zhen MAO

Abstract: The bifactor model assumes that a test measures one general factor and multiple group factors. Numerous analyses of the structure of psychological trait measures, school education surveys, medical surveys, and diagnostic tests have shown that the bifactor model represents the construct structure of these tests, surveys, and scales well, and that it fits the data better than competing models (e.g., unidimensional, higher-order, and correlated-factor models). When the abilities are assumed to be orthogonal, the bifactor dimension reduction method reduces the multidimensional integration to multiple two-dimensional integrations, which greatly simplifies parameter estimation (Gibbons & Hedeker, 1992; Gibbons et al., 2007). Bifactor CAT has proved to be a practical approach that can substantially reduce respondent burden while increasing testing efficiency (Gibbons et al., 2007). However, the number of dimensions in multidimensional CAT often becomes an obstacle to applying many well-known item selection methods, especially for polytomous items. Specifically, this study focuses on the formula of the information matrix for polytomous items and on how to simplify the computation of item selection methods using the dimension reduction method. First, the Fisher information for the bifactor graded response model was derived. Then, the dimension reduction method was applied to the computation of item selection methods, including the posterior weighted Fisher D-optimality method (PDO), the posterior weighted Kullback-Leibler information method (PKL), the continuous entropy method (CEM), and the mutual information method (MI). Finally, these methods were compared on simulated data under three bifactor pattern designs, using the original D-optimality method (DO) as the baseline.
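As a rough illustration of the Fisher information step described above, the following Python sketch computes the item information matrix for a graded response item that loads on the general factor and one group factor, then picks the next item by a plain D-optimality rule. It is not the authors' MATLAB code: the slope-intercept parameterization, the function names, the example item parameters, and the identity matrix used as a starting information matrix are all assumptions made for the example.

```python
import numpy as np

def grm_category_probs(theta, a, d):
    """Category probabilities for one graded response item.

    theta: abilities, shape (D,), general factor first.
    a: discriminations, shape (D,), zero on group factors the item
       does not load on (bifactor structure).
    d: K-1 intercepts in strictly decreasing order
       (assumed slope-intercept parameterization).
    """
    z = np.dot(a, theta) + np.asarray(d, dtype=float)
    p_star = 1.0 / (1.0 + np.exp(-z))             # P(response >= k), k = 1..K-1
    cum = np.concatenate(([1.0], p_star, [0.0]))  # boundary values 1 and 0
    return cum[:-1] - cum[1:], cum

def grm_item_information(theta, a, d):
    """Fisher information matrix of one item at theta."""
    probs, cum = grm_category_probs(theta, a, d)
    w = cum * (1.0 - cum)                         # P*(1 - P*), zero at both boundaries
    scalar = np.sum((w[:-1] - w[1:]) ** 2 / probs)
    return scalar * np.outer(a, a)

def select_item_d_optimal(theta_hat, items, info_so_far, administered):
    """Return the index of the unused item that maximizes the determinant
    of the accumulated information matrix (plain D-optimality)."""
    best_item, best_det = None, -np.inf
    for j, (a, d) in enumerate(items):
        if j in administered:
            continue
        det = np.linalg.det(info_so_far + grm_item_information(theta_hat, a, d))
        if det > best_det:
            best_item, best_det = j, det
    return best_item

# Hypothetical two-item pool: one general factor and two group factors.
theta_hat = np.zeros(3)
items = [
    (np.array([1.0, 0.5, 0.0]), [1.0, 0.0, -1.0]),
    (np.array([2.0, 0.0, 1.5]), [1.0, 0.0, -1.0]),
]
next_item = select_item_d_optimal(theta_hat, items, np.eye(3), administered=set())
```

With a single intercept (two response categories) the scalar weight collapses to P(1 - P), so the matrix reduces to the familiar 2-parameter logistic information, in line with the generalization noted in the abstract.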
We conducted a Monte Carlo simulation, writing the CAT code in MATLAB (R2010a), and evaluated the item selection methods in terms of the correlation between true and estimated abilities, root mean squared error, absolute deviation, and Euclidean distance. The results showed that: (1) the information of the bifactor graded response model can be obtained easily, and it is a generalization of the information of the 2-parameter logistic model; (2) for each item selection method, the correlation was highest, and the root mean squared error and absolute deviation were lowest, under the high bifactor pattern; (3) under the same simulation conditions, the mutual information method produced the highest average correlation between true and estimated abilities and the lowest root mean squared error, absolute deviation, and Euclidean distance among all the item selection methods, while the posterior weighted Kullback-Leibler method performed the worst on these indices; (4) the PDO, CEM, and DO methods produced very similar results under fixed test conditions; (5) the Euclidean distances of the methods over the course of the test showed that their differences became significant once the test length exceeded 20 items. In conclusion, the derivation showed that the dimension reduction method can easily be used to simplify the computation of item selection methods including PDO, PKL, CEM, and MI: it reduces the multidimensional integration contained in each method to multiple two-dimensional integrations. The simulation results further showed that when the difference between the discrimination parameters of the group factors and those of the general factor is smaller, estimates of the group factors become more accurate and estimates of the general factor become less accurate. Under the same test conditions, the CEM method performed the best in test precision, the PKL method performed the worst, and the other three methods performed similarly.
Problems such as controlling item exposure rates, meeting content constraints, and selecting items for mixed-format tests remain to be explored further.
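The four evaluation criteria named in the abstract could be computed along the following lines. The abstract does not give their formulas, so the definitions below (per-dimension Pearson correlation, per-dimension root mean squared error and mean absolute deviation, and Euclidean distance averaged over examinees) are assumptions, as are the function name `recovery_metrics` and the toy data.

```python
import numpy as np

def recovery_metrics(theta_true, theta_hat):
    """Ability recovery summaries for arrays of shape (examinees, dimensions)."""
    theta_true = np.asarray(theta_true, dtype=float)
    theta_hat = np.asarray(theta_hat, dtype=float)
    err = theta_hat - theta_true
    n_dims = theta_true.shape[1]
    # Pearson correlation between true and estimated ability, per dimension.
    corr = np.array([
        np.corrcoef(theta_true[:, k], theta_hat[:, k])[0, 1]
        for k in range(n_dims)
    ])
    return {
        "correlation": corr,
        "rmse": np.sqrt(np.mean(err ** 2, axis=0)),          # per dimension
        "abs_deviation": np.mean(np.abs(err), axis=0),       # per dimension
        "euclidean": np.mean(np.linalg.norm(err, axis=1)),   # averaged over examinees
    }

# Toy data: three examinees, two dimensions, estimates shifted by a constant.
theta_true = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 1.0]])
theta_hat = theta_true + np.array([0.3, 0.4])
metrics = recovery_metrics(theta_true, theta_hat)
```

Reporting the correlation, error, and deviation per dimension keeps the general factor and the group factors separate, which matters here because the abstract reports that their precision moves in opposite directions as the bifactor pattern changes.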

Key words: multidimensional item response theory, bifactor graded response model, computerized adaptive testing, item selection, measurement precision

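Abstract (Chinese version): The bifactor model assumes that a test measures one general factor and multiple group factors, which matches the factor structure of many educational and psychological tests. The "dimension reduction" method, which reduces the multidimensional integration in parameter estimation to multiple iterated two-dimensional integrations, is an important feature of the bifactor model. For computerized adaptive tests with polytomously scored items, this paper first derives the Fisher information under the bifactor graded response model, then derives the application of the "dimension reduction" method to item selection methods, and finally compares the D-optimality method, the posterior weighted Fisher information D-optimality method (PDO), the posterior weighted Kullback-Leibler method (PKL), the continuous entropy method (CEM), and the mutual information method (MI) in item pools with low, medium, and high bifactor patterns, in terms of the correlation, root mean squared error, absolute deviation, and Euclidean distance of the ability estimates. The simulation study showed that: (1) the stronger the bifactor pattern, that is, the smaller the difference between the discriminations of the general factor and the group factors on the items, the lower the estimation precision of the general factor, the higher the estimation precision of the group factors, and the higher the estimation precision of overall ability; (2) under the same experimental conditions, the continuous entropy method had the highest measurement precision, the PKL method had the lowest ability estimation precision, and the measurement precision of the other methods did not differ significantly.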

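Key words (Chinese version): multidimensional item response theory, bifactor graded response model, computerized adaptive testing, item selection method, measurement precision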