The weighted-score logistic model (WSLM) was proposed by Jian, Dai, & Dai (2016). To reflect the item emphases of polytomously scored items, the WSLM adds weighted-score parameters to the dichotomous logistic model. Just as the dichotomous logistic model has at least five forms, the WSLM also has five forms: the one-parameter WSLM, the two-parameter WSLM, the three-parameter WSLM with a c parameter, the three-parameter WSLM with a γ parameter, and the four-parameter WSLM.
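The five forms mirror the familiar dichotomous logistic family, in which the lower asymptote c (guessing) and the upper asymptote γ (carelessness) are switched on or off. The sketch below shows only that dichotomous family as nested special cases of one four-parameter function. It does not reproduce the weighted-score parameters of Jian, Dai, & Dai (2016), whose exact form is not given here. The scaling constant D = 1.7, the choice a = 1 for the one-parameter form, and the helper names `p1`, `p2`, `p3c`, `p3g`, `p4` are illustrative assumptions.

```python
import math


def p4(theta, a=1.0, b=0.0, c=0.0, gamma=1.0, D=1.7):
    """Dichotomous four-parameter logistic probability of a correct response.

    c is the lower asymptote (guessing), gamma the upper asymptote
    (carelessness). D = 1.7 is an assumed scaling constant.
    """
    return c + (gamma - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Nested special cases (dichotomous forms only, no weighted-score parameters).
def p1(theta, b):
    """One-parameter form: common discrimination fixed at 1 (assumed)."""
    return p4(theta, a=1.0, b=b)


def p2(theta, a, b):
    """Two-parameter form: c = 0, gamma = 1."""
    return p4(theta, a=a, b=b)


def p3c(theta, a, b, c):
    """Three-parameter form with a lower asymptote c."""
    return p4(theta, a=a, b=b, c=c)


def p3g(theta, a, b, gamma):
    """Three-parameter form with an upper asymptote gamma."""
    return p4(theta, a=a, b=b, gamma=gamma)


# Hypothetical item: a = 1.2, b = 0, c = 0.2, gamma = 0.95.
for theta in (-3.0, 0.0, 3.0):
    print(theta, p2(theta, 1.2, 0.0), p3c(theta, 1.2, 0.0, 0.2),
          p3g(theta, 1.2, 0.0, 0.95), p4(theta, 1.2, 0.0, 0.2, 0.95))
```

At θ = b every form returns the midpoint (c + γ)/2 of its two asymptotes, which is a quick check on any implementation.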
Response disturbances such as random guessing, carelessness, and transcription errors occur in educational tests. Past research has shown that, in both paper-and-pencil tests and computerized adaptive testing, aberrant responses such as careless errors and lucky guesses can cause significant bias in ability estimates. Mislevy & Bock (1982) proposed the Biweight estimator and compared it with the maximum likelihood estimator; their results showed that the Biweight estimator typically reduced bias and thereby mitigated the effect of measurement disturbances. The three-parameter and four-parameter logistic IRT models, Huber robust estimation, and other methods have since been proposed to address response disturbances such as random guessing and carelessness.
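The Biweight and Huber estimators down-weight items that lie far from the examinee's current ability, so that a single lucky guess or careless slip has less pull on the estimate. The sketch below assumes the commonly cited forms under the 2PL: item i gets a weight computed from a_i(θ - b_i), and θ solves Σ w_i a_i (x_i - P_i(θ)) = 0. The tuning constants C = 4 and H = 1, the scaling constant D = 1.7, the reweighted Newton solver, and the five-item data are illustrative assumptions, not the settings used in this paper.

```python
import math

D = 1.7  # assumed logistic scaling constant


def p2(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def biweight(u):
    """Tukey biweight: weight 1 at u = 0, falling to 0 for |u| >= 1."""
    return (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def huber(r, h):
    """Huber weight: 1 for |r| <= h, h / |r| beyond that."""
    return 1.0 if abs(r) <= h else h / abs(r)


def robust_theta(x, a, b, weight, theta=0.0, iters=100, tol=1e-10):
    """Solve sum_i w_i * a_i * (x_i - P_i(theta)) = 0 by reweighted Newton steps.

    weight(i, theta) returns item i's weight; weights are recomputed at the
    start of every step and held fixed within it.
    """
    n = len(x)
    for _ in range(iters):
        w = [weight(i, theta) for i in range(n)]
        p = [p2(theta, a[i], b[i]) for i in range(n)]
        s = sum(w[i] * D * a[i] * (x[i] - p[i]) for i in range(n))
        info = sum(w[i] * (D * a[i]) ** 2 * p[i] * (1.0 - p[i]) for i in range(n))
        if info <= 0.0:  # every item has been down-weighted to zero
            break
        step = s / info
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical five-item example (a, b, x are illustrative).
a = [1.0, 1.2, 0.8, 1.0, 1.5]
b = [-1.5, -0.5, 0.0, 0.5, 1.5]
x = [1, 1, 1, 0, 0]
C, H = 4.0, 1.0  # biweight and Huber tuning constants (assumed)

theta_ml = robust_theta(x, a, b, lambda i, t: 1.0)  # unit weights: plain 2PL MLE
theta_bw = robust_theta(x, a, b, lambda i, t: biweight(a[i] * (t - b[i]) / C))
theta_hu = robust_theta(x, a, b, lambda i, t: huber(a[i] * (t - b[i]), H))
print(theta_ml, theta_bw, theta_hu)
```

With all weights equal to 1 the estimating equation reduces to the ordinary 2PL likelihood equation, which is why the same solver also produces the MLE used for comparison.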
First, this paper compares four models for robustifying ability estimates, using an example test: the two-parameter WSLM, the three-parameter WSLM with a c parameter, the three-parameter WSLM with a γ parameter, and the four-parameter WSLM. Second, three simulation studies covering three test cases are presented, comparing four approaches: 2PM-MLE, Biweight estimation, Huber estimation, and 4PM-Robust estimation. The hypothetical test instrument contains 34 items, with difficulty thresholds b ~ N(0, 1) and log(a) ~ N(0, 1); a 35th item has a difficulty threshold ranging from -4.0 to 4.0. The ability of a middle-ability examinee is estimated from the responses to the 34 items of the basic test under the two-parameter logistic model, and this estimate serves as the reference value for the other three models.
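The design of the basic test can be sketched as follows. This is a hypothetical reconstruction: the random seed, the middle-ability response pattern (correct on the 17 easiest of the 34 items), the discrimination a = 1 of the 35th item, the scaling constant D = 1.7, and the 1.0 step for the 35th item's difficulty are all assumptions, not values taken from the study. The reference ability is the 2PL maximum likelihood estimate, found by bisection because the 2PL score function is strictly decreasing in θ.

```python
import math
import random

D = 1.7  # assumed logistic scaling constant


def p2(theta, a, b):
    """Numerically stable 2PL probability of a correct response."""
    z = D * a * (theta - b)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(theta, x, a, b):
    """Derivative of the 2PL log-likelihood with respect to theta."""
    return sum(D * ai * (xi - p2(theta, ai, bi)) for xi, ai, bi in zip(x, a, b))


def mle_2pl(x, a, b, tol=1e-10):
    """2PL MLE by bisection; needs a mixed pattern (at least one 1 and one 0)."""
    lo, hi = -1.0, 1.0
    while score(lo, x, a, b) < 0:  # widen until the root is bracketed
        lo *= 2.0
    while score(hi, x, a, b) > 0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, x, a, b) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = random.Random(2016)  # arbitrary seed (assumption)
b = [rng.gauss(0.0, 1.0) for _ in range(34)]            # b ~ N(0, 1)
a = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(34)]  # log(a) ~ N(0, 1)

# Hypothetical middle-ability pattern: correct on the 17 easiest items.
x = [0] * 34
for i in sorted(range(34), key=lambda i: b[i])[:17]:
    x[i] = 1

theta_ref = mle_2pl(x, a, b)  # reference value from the 34 basic items


def with_item(x35, b35, a35=1.0):
    """Re-estimate under the 2PL after appending a hypothetical 35th item."""
    return mle_2pl(x + [x35], a + [a35], b + [b35])


for b35 in range(-4, 5):
    guess = with_item(1, float(b35)) - theta_ref  # 35th item answered correctly
    miss = with_item(0, float(b35)) - theta_ref   # 35th item answered incorrectly
    print(f"b35={b35:+d}  correct: {guess:+.3f}  incorrect: {miss:+.3f}")
```

Appending a correct response adds a positive term to the score function everywhere, so under the 2PL the estimate can only move up; an incorrect response can only move it down. The printed shifts show how large those movements are for each difficulty of the 35th item.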
Under the two-parameter WSLM, examinees' ability is overestimated when they guess correctly on difficult items, and underestimated when the sleeping phenomenon (careless errors) occurs on easy items. First, the three-parameter WSLM with a c parameter corrects the overestimation; however, underestimation remains when examinees miss easy items. Second, the three-parameter WSLM with a γ parameter corrects the underestimation when examinees miss easy items, but overestimation remains when examinees answer difficult items correctly. Third, the four-parameter WSLM, which contains both the c and γ parameters, corrects both the underestimation when examinees miss easy items and the overestimation when low-ability examinees answer difficult items correctly by luck. Therefore, the four-parameter WSLM yields robust ability estimates when response disturbances such as random guessing and careless errors are present in a test.
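The pattern of corrections can be seen from a single item's contribution to the derivative of the log-likelihood, that is, how hard one response pulls on the ability estimate. The sketch uses the dichotomous logistic models, not the WSLM's weighted-score parameters, and its values (θ = 0, a = 1, b = ±4, c = 0.2, γ = 0.9, D = 1.7) are hypothetical.

```python
import math

D = 1.7  # assumed logistic scaling constant


def p4(theta, a, b, c=0.0, gamma=1.0):
    """Dichotomous four-parameter logistic probability of a correct response."""
    return c + (gamma - c) / (1.0 + math.exp(-D * a * (theta - b)))


def dp4(theta, a, b, c=0.0, gamma=1.0):
    """Derivative of p4 with respect to theta."""
    L = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    return (gamma - c) * D * a * L * (1.0 - L)


def influence(x, theta, a, b, c=0.0, gamma=1.0):
    """One response's contribution to d(log-likelihood)/d(theta)."""
    p = p4(theta, a, b, c, gamma)
    dp = dp4(theta, a, b, c, gamma)
    return dp / p if x == 1 else -dp / (1.0 - p)


theta = 0.0
# Lucky guess: correct answer on a very hard item (b = 4).
g2 = influence(1, theta, 1.0, 4.0)                      # 2PL
g3c = influence(1, theta, 1.0, 4.0, c=0.2)              # with c
g3g = influence(1, theta, 1.0, 4.0, gamma=0.9)          # with gamma only
g4 = influence(1, theta, 1.0, 4.0, c=0.2, gamma=0.9)    # with c and gamma
# Careless miss: wrong answer on a very easy item (b = -4).
m2 = influence(0, theta, 1.0, -4.0)
m3c = influence(0, theta, 1.0, -4.0, c=0.2)
m3g = influence(0, theta, 1.0, -4.0, gamma=0.9)
m4 = influence(0, theta, 1.0, -4.0, c=0.2, gamma=0.9)
print(g2, g3c, g3g, g4)
print(m2, m3c, m3g, m4)
```

Under the 2PL both aberrant responses pull with magnitude close to D·a. Adding c shrinks the pull of the lucky guess but leaves the careless miss unchanged, because 1 - P = (1 - c)(1 - L) and the factor (1 - c) cancels. Adding γ alone does the reverse, since P = γL cancels in the same way. Only with both asymptotes are both pulls damped, which matches the pattern described above for the three- and four-parameter forms.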
Key words
weighted-score logistic model /
guessing phenomena /
random error phenomena /
ability overestimated /
ability underestimated