Psychological Science ›› 2016, Vol. 39 ›› Issue (6): 1500-1507.


Missing data handling methods based on the 2PLM

Wenyi Wang

  • Received: 2015-12-20 Revised: 2016-04-06 Online: 2016-11-20 Published: 2016-11-20

Missing data handling methods under the 2PLM and their comparison

Wenyi Wang, Lihong Song, Fen Luo, Shuliang Ding

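  1. Jiangxi Normal University
  • Corresponding author: Lihong Song
  • Funding:

    National Natural Science Foundation of China; Ministry of Education Humanities and Social Sciences Research Youth Fund Project; Jiangxi Province Social Science Research "Twelfth Five-Year" (2012) Planning Project; Jiangxi Province Education Science 2013 General Project; Jiangxi Provincial Department of Education Science and Technology Research Project; Jiangxi Normal University Youth Growth Fund and Jiangxi Normal University Doctoral Start-up Fund; National Education Science "Twelfth Five-Year" Plan 2015 Ministry of Education Key Project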

Abstract:

Missing data are regularly encountered in education research. For example, many large-scale assessments are low-stakes surveys, which typically suffer from a substantial amount of missing data. The low-stakes nature of these surveys, variations in average performance across countries, and other factors such as testing traditions, test design, time limits, and intentional omission have been discussed as contributing to the amount of omitted responses observed in these assessments. Researchers have shown that missing data may create problems in the estimation of item parameters and examinee ability parameters in the item response theory (IRT) context. A number of missing data handling methods have been developed in the IRT framework. These methods include response function imputation as well as treating missing items as not presented (NP), incorrect (IN), or fractionally correct (FR), which can be carried out directly in the item parameter estimation software BILOG-MG. A number of algorithms have also been developed in the context of data imputation.
The current study describes several approaches to dealing with missing data in the two-parameter logistic model (2PLM). Although BILOG-MG can handle missing data, it is commercial software, so we derive our own EM algorithm in which missing responses are ignored, that is, treated as missing completely at random (MCAR). MCAR, which can be thought of as missingness with no systematic cause, is only one specific type of missing data mechanism. Zhang, Xin, Zeng, and Sun proposed an EM algorithm (denoted ZS) for handling missing data under MCAR, but it carries a heavy computational burden when the percentage of missing data is high. When data are missing at random (MAR), the probability that a value is missing depends on the individual's observed item responses but not on the missing value itself, and the estimation of item parameters and abilities may be affected. However, to the best of our knowledge, no previous work has addressed missing data under the MAR assumption in the 2PLM. We propose an EM algorithm under MAR, denoted EE. Following a general introduction to multiple imputation methods based on item response models, we propose two new multiple imputation methods (EF and ER) that take into account the uncertainty of both ability parameter estimates and missing item responses, and compare them against two original methods (PF and PR) proposed by Huisman and Molenaar, which are based only on item response probabilities. Simulation studies with a sample size of 1000 were conducted to assess the accuracy of these methods. Various percentages of missing data were simulated: 5%, 15%, 30%, 40%, and 50%. Missing data were simulated under three different missing data mechanisms: MCAR, MAR, and missing not at random (MNAR). Missing data were handled using NP, ZS, IN, PR, PF, ER, EF, and EE. Simulation results suggested that the new multiple imputation methods and the NP method worked well under various conditions, and that the EM algorithm under MAR performed similarly to NP, because the 2PLM has the advantage of invariance of model parameters.
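
To make the setup concrete, the following is a minimal Python sketch of the 2PLM response function, of one possible way to generate MCAR, MAR, and MNAR missingness, and of a log-likelihood that simply skips missing responses, in the spirit of treating missing items as not presented. It is an illustration under stated assumptions, not the authors' implementation: the function names (`p_2pl`, `simulate_responses`, `apply_missing`, `loglik_observed`), the particular MAR and MNAR rules, and the use of `rate` as a nominal (approximate) missing proportion are all hypothetical choices made here.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PLM probability of a correct response.
    theta: (N,) abilities; a, b: (J,) discriminations and difficulties -> (N, J)."""
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))

def simulate_responses(theta, a, b, rng):
    """Draw a complete 0/1 response matrix from the 2PLM."""
    return (rng.random((theta.size, a.size)) < p_2pl(theta, a, b)).astype(float)

def apply_missing(x, rate, mechanism, rng):
    """Return a copy of x with np.nan marking missing responses.
    The MAR and MNAR rules below are illustrative assumptions only."""
    n, j = x.shape
    if mechanism == "MCAR":
        # Missingness is unrelated to any response.
        prob = np.full((n, j), rate)
    elif mechanism == "MAR":
        # First half of the items is always observed; missingness on the
        # second half depends only on the observed score on the first half.
        h = j // 2
        w = 1.0 - x[:, :h].mean(axis=1) + 1e-9   # lower observed score -> more omissions
        w = w / w.mean()
        prob = np.zeros((n, j))
        prob[:, h:] = np.clip(2.0 * rate * w, 0.0, 1.0)[:, None]
    elif mechanism == "MNAR":
        # Missingness depends on the (possibly unobserved) response itself:
        # incorrect answers are more likely to be omitted.
        w = np.where(x == 0, 1.5, 0.5)
        prob = np.clip(rate * w / w.mean(), 0.0, 1.0)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    out = x.copy()
    out[rng.random((n, j)) < prob] = np.nan
    return out

def loglik_observed(x, theta, a, b):
    """2PLM log-likelihood summed over observed responses only (missing ignored)."""
    p = p_2pl(theta, a, b)
    obs = ~np.isnan(x)
    x0 = np.where(obs, x, 0.0)
    ll = x0 * np.log(p) + (1.0 - x0) * np.log1p(-p)
    return float(np.sum(ll[obs]))
```

Ignoring missing responses in this way amounts to dropping their terms from the likelihood, so no extra computation is spent on the missing cells. An EM algorithm built on this idea would apply the same masking inside its E-step and M-step.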

Chinese abstract:

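Item response theory (IRT) is one of the modern educational and psychological measurement theories used for objective measurement, and it is widely applied to the analysis of large-scale assessments, in which missing data are very common. Under the two-parameter logistic model (2PLM) in IRT, the only existing EM algorithm handles missing responses and missing abilities under the missing completely at random mechanism. This study derives an EM algorithm under the 2PLM that ignores missing responses, and proposes an EM algorithm that handles missing responses and missing abilities under the missing at random mechanism, together with multiple imputation methods that take into account the uncertainty of ability estimates and item responses. The results show that, across the various missing data mechanisms, missing proportions, and test designs, the EM algorithm that ignores missing responses and the multiple imputation methods performed well.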
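
The idea of multiple imputation that accounts for uncertainty in both ability estimates and item responses could be sketched in Python as follows. This is a generic illustration, not the EF or ER procedures of the paper: it assumes the item parameters `a` and `b` are known (in practice they would be estimated), places a standard normal prior on ability, and approximates each examinee's posterior on a fixed grid. The names `impute_once` and `multiple_impute` are hypothetical.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PLM probability of a correct response; theta: (K,), a, b: (J,) -> (K, J)."""
    return 1.0 / (1.0 + np.exp(-a * (np.asarray(theta)[..., None] - b)))

def impute_once(x, a, b, rng, grid=None):
    """Fill np.nan cells of x with one random draw per cell.
    Step 1 draws an ability for each examinee from a grid posterior
    (ability uncertainty); step 2 draws each missing response from a
    Bernoulli distribution at that ability (response uncertainty)."""
    if grid is None:
        grid = np.linspace(-4.0, 4.0, 81)        # assumed ability grid
    obs = ~np.isnan(x)
    x0 = np.where(obs, x, 0.0)
    pg = p_2pl(grid, a, b)                       # (G, J)
    # Log-likelihood of the observed responses at every grid point: (N, G)
    ll = (x0 * obs) @ np.log(pg).T + ((1.0 - x0) * obs) @ np.log1p(-pg).T
    logpost = ll - 0.5 * grid**2                 # N(0, 1) prior, up to a constant
    post = np.exp(logpost - logpost.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    # Inverse-CDF sampling of one grid point per examinee
    cdf = np.cumsum(post, axis=1)
    u = rng.random((x.shape[0], 1))
    idx = np.minimum((u > cdf).sum(axis=1), grid.size - 1)
    theta_draw = grid[idx]
    # Draw responses at the sampled abilities; keep observed cells unchanged
    draws = (rng.random(x.shape) < p_2pl(theta_draw, a, b)).astype(float)
    return np.where(obs, x, draws), theta_draw

def multiple_impute(x, a, b, m, rng):
    """Return m completed copies of x, each from an independent imputation."""
    return [impute_once(x, a, b, rng)[0] for _ in range(m)]
```

Each completed data set would then be analyzed separately and the resulting estimates combined, which is what distinguishes multiple imputation from filling each missing cell with a single best guess.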