Missing data handling methods based on the 2PLM

Wenyi Wang

Journal of Psychological Science ›› 2016, Vol. 39 ›› Issue (6) : 1500-1507.

Abstract

Researchers in education regularly encounter missing data. For example, many large-scale assessments are low-stakes surveys, which typically suffer from a substantial amount of missing data. The low-stakes nature of these surveys, variations in average performance across countries, and other factors such as testing traditions, test design, time limits, and intentional omission have all been discussed as contributing to the number of omitted responses observed in these assessments. Researchers have shown that missing data may create problems in the estimation of item parameters and subject ability parameters in the item response theory (IRT) context. A number of missing data handling methods have been developed within the IRT framework. These methods include not only response function imputation but also treating missing items as not presented (NP), incorrect (IN), or fractionally correct (FR), which can be carried out directly with the item parameter estimation software BILOG-MG. A number of algorithms have also been developed in the context of data imputation. The current study describes several approaches to dealing with missing data in the two-parameter logistic model (2PLM). Although BILOG-MG can handle missing data, it is commercial software. We therefore need to develop our own EM algorithm in which missing responses are ignored, that is, treated as missing completely at random (MCAR). MCAR, which can be thought of as missingness with no systematic cause, is only one specific type of missing data. Note that Zhang, Xin, Zeng, and Sun proposed an EM algorithm (denoted ZS) for dealing with missing data under MCAR, but it carries a heavy computational burden when the percentage of missing data is high.
When data are missing at random (MAR), the probability that a value is missing depends on the individual's item responses but not on the missing value itself, and the estimation of item parameters and abilities may be affected. However, to the best of our knowledge, no work has addressed missing data under the MAR assumption in the 2PLM. We propose an EM algorithm under MAR, denoted EE. Following a general introduction to multiple imputation methods based on item response models, we propose two new multiple imputation methods (EF and ER) that account for the uncertainty in both ability parameter estimates and missing item responses, and compare them with two original methods (PF and PR) proposed by Huisman and Molenaar, which are based only on item response probabilities. Simulation studies with a sample size of 1000 were conducted to evaluate the accuracy of these methods. Various percentages of missing data were simulated: 5%, 15%, 30%, 40%, and 50%. Missing data were simulated according to three underlying missing data mechanisms: MCAR, MAR, and missing not at random. Missing data were handled using NP, ZS, IN, PR, PF, ER, EF, and EE. The simulation results suggest that the new multiple imputation methods and the NP method work well under various conditions; the EM algorithm under MAR performs similarly to NP, because the 2PLM has the advantage of parameter invariance.
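
To make the NP and IN treatments named in the abstract concrete, the following is a minimal sketch rather than the article's method. It assumes item parameters are already known and replaces the EM estimation studied in the article with a simple grid search over ability. All function names (`p_correct`, `simulate_responses`, `apply_mcar`, `log_likelihood`, `ml_ability`) and the parameter ranges are hypothetical choices for illustration and come from neither the article nor BILOG-MG.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PLM response function: P(X=1 | theta) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def simulate_responses(theta, a, b, rng):
    """Draw a persons-by-items 0/1 response matrix under the 2PLM."""
    p = p_correct(theta[:, None], a[None, :], b[None, :])
    return (rng.random(p.shape) < p).astype(float)

def apply_mcar(responses, rate, rng):
    """Set each entry to NaN independently with probability `rate` (MCAR)."""
    out = responses.copy()
    out[rng.random(out.shape) < rate] = np.nan
    return out

def log_likelihood(theta, x, a, b, treatment="NP"):
    """Log-likelihood of one response vector x at ability theta.

    NP: missing items are skipped (treated as not presented).
    IN: missing items are scored as incorrect (0).
    """
    p = p_correct(theta, a, b)
    if treatment == "IN":
        x = np.where(np.isnan(x), 0.0, x)
    observed = ~np.isnan(x)
    xo, po = x[observed], p[observed]
    return float(np.sum(xo * np.log(po) + (1.0 - xo) * np.log(1.0 - po)))

def ml_ability(x, a, b, treatment="NP", grid=np.linspace(-4, 4, 161)):
    """Grid-search ability estimate; item parameters are assumed known."""
    ll = [log_likelihood(t, x, a, b, treatment) for t in grid]
    return float(grid[int(np.argmax(ll))])

# Illustrative run: 1000 simulated examinees, 30 items, 30% MCAR missingness.
rng = np.random.default_rng(0)
a = rng.uniform(0.8, 2.0, size=30)   # assumed discrimination range
b = rng.normal(size=30)              # assumed difficulty distribution
theta = rng.normal(size=1000)
incomplete = apply_mcar(simulate_responses(theta, a, b, rng), 0.30, rng)
theta_np = ml_ability(incomplete[0], a, b, "NP")
theta_in = ml_ability(incomplete[0], a, b, "IN")
```

In this sketch the IN estimate can never exceed the NP estimate for the same examinee, because scoring a missing response as incorrect only adds likelihood terms that decrease with ability. This is one way to see why IN can bias ability estimates downward when omissions are unrelated to ability.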

Cite this article

Wenyi Wang. Missing data handling methods based on the 2PLM[J]. Journal of Psychological Science. 2016, 39(6): 1500-1507