Comparison and Application of IRT Model in English Reading Comprehension Test

MA Ji; LIU Hong-Yun

Journal of Psychological Science, 2018, Vol. 41, Issue (6): 1374-1381.

Abstract

Testlets are common in reading comprehension tests. Compared with traditional tests made up of independent single items, tests built from testlets not only reduce testing time and cost but also allow tasks that more closely resemble real-world situations, which can improve test validity. However, if the reading material within a testlet affects examinees with different knowledge backgrounds differently, a testlet effect occurs. Previous research has shown that when a testlet effect exists, applying a traditional IRT model yields biased item parameter estimates. To address this problem, researchers developed Testlet Response Theory (TRT) by adding a testlet parameter to standard IRT models. This article reviews models for handling testlet effects and then analyzes data from a high school English reading comprehension test consisting of one cloze test and five reading comprehension passages. All items are multiple-choice: either four-option items or items that require choosing 5 answers from 7 options. The sample comprised 934 examinees. Two kinds of measurement models were compared: the two-parameter logistic item response (2PL-IRT) model and five two-parameter logistic testlet response (2PL-TRT) models, which differ in the number of testlets they include. The most complex model (5T_TRT) treats all five reading comprehension passages as testlets; the number of testlets was then reduced step by step according to the magnitude of the testlet effect, so that the simplest 2PL-TRT model (1T_TRT) contains only the one testlet with the largest testlet effect. First, to confirm that every reading comprehension passage violates the assumption of local independence (i.e., shows local item dependence, LID), the Q3 values of each testlet were calculated in R. As expected, the absolute Q3 values of all five passages exceeded 0.20, indicating that all five violate local independence.
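The Q3 screening step can be illustrated with a small simulation. This is a minimal sketch, not the study's R code: it assumes the widely used 2PL testlet parameterization P(y_ij = 1) = logistic(a_j(θ_i − b_j − γ_i,d(j))) with γ ~ N(0, σ²_d) for testlet d (an assumption; the abstract does not give the formula), and it computes Yen's Q3 as item-by-item correlations of residuals from the plain 2PL, using the generating θ in place of estimated abilities for simplicity. All item counts, parameter values and testlet variances are hypothetical, and the function names are illustrative.

```python
import numpy as np

def p_correct(theta, a, b, gamma):
    # Assumed 2PL testlet model: logistic(a * (theta - b - gamma)); gamma = 0 gives the plain 2PL.
    return 1.0 / (1.0 + np.exp(-a * (theta - b - gamma)))

def simulate(n_persons, testlet_of_item, a, b, testlet_var, rng):
    # testlet_of_item[j] = testlet index of item j; testlet_var[d] = variance of gamma in testlet d.
    n_testlets = testlet_of_item.max() + 1
    theta = rng.standard_normal(n_persons)
    gamma = rng.standard_normal((n_persons, n_testlets)) * np.sqrt(testlet_var)
    p = p_correct(theta[:, None], a, b, gamma[:, testlet_of_item])
    y = (rng.random(p.shape) < p).astype(float)
    return y, theta

def q3_matrix(y, theta, a, b):
    # Yen's Q3: item-by-item correlations of residuals from the plain 2PL (gamma ignored).
    resid = y - p_correct(theta[:, None], a, b, 0.0)
    return np.corrcoef(resid, rowvar=False)

def mean_q3_within(q3, testlet_of_item):
    # Average Q3 over distinct item pairs inside each testlet.
    means = []
    for d in range(testlet_of_item.max() + 1):
        idx = np.flatnonzero(testlet_of_item == d)
        block = q3[np.ix_(idx, idx)]
        means.append(block[np.triu_indices(len(idx), k=1)].mean())
    return np.array(means)

rng = np.random.default_rng(0)
testlet_of_item = np.repeat(np.arange(3), 5)     # 3 hypothetical testlets x 5 items
a = np.full(15, 1.2)                             # hypothetical discriminations
b = np.tile(np.linspace(-1.0, 1.0, 5), 3)        # hypothetical difficulties
testlet_var = np.array([0.0, 0.5, 1.5])          # hypothetical testlet-effect variances
y, theta = simulate(2000, testlet_of_item, a, b, testlet_var, rng)
m = mean_q3_within(q3_matrix(y, theta, a, b), testlet_of_item)
print(m)  # mean within-testlet Q3 for each testlet
```

In this setup the testlet with zero variance should show within-testlet Q3 near zero, while the other two show positive values that grow with σ²; a screening rule such as the |Q3| > 0.20 cutoff mentioned above would be applied to values of this kind.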
Next, each examinee's ability and each item's discrimination (a) and difficulty (b) parameters were estimated with SCORIGHT 3.0. For ability estimates, there were no obvious differences among the five 2PL-TRT models; however, for examinees with abilities between -1.50 and 0.50, the 2PL-IRT model produced biased ability estimates. For item parameter estimates, if the testlet effect of a reading comprehension passage reaches 0.50, its items should not be treated as locally independent and should instead be analyzed as a testlet with the TRT model. If such items are mistakenly treated as locally independent, the item parameter estimates will be seriously biased, and the bias grows as the testlet effect becomes larger. These results suggest that, in practice, balancing the accuracy of parameter estimates against model simplicity requires considering two things: the type of parameter of interest and the magnitude of the testlet effect. In addition, the authors emphasize the importance of choosing appropriate reading materials: to avoid testlet effects, test developers should consider the article topic and item types before constructing the test.
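One way to build intuition for why ignoring a sizeable testlet effect distorts item parameters is to average the testlet-model item curve over γ at a fixed θ. The sketch below assumes the same parameterization, logistic(a(θ − b − γ)) with γ ~ N(0, σ²), uses NumPy's Gauss-Hermite quadrature (`hermegauss`) for the average, and reports 4 × the averaged curve's slope at θ = b, which equals a for a plain 2PL curve. The item values and variances are hypothetical and the function names are illustrative; this is not the study's SCORIGHT analysis, and it does not predict the direction of bias in a full joint estimation.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def averaged_icc(theta, a, b, testlet_var, n_nodes=41):
    # E over gamma ~ N(0, testlet_var) of logistic(a * (theta - b - gamma)), via Gauss-Hermite quadrature.
    x, w = hermegauss(n_nodes)          # nodes/weights for the weight function exp(-x**2 / 2)
    w = w / np.sqrt(2.0 * np.pi)        # rescale so the weights sum to 1 (standard-normal expectation)
    gamma = np.sqrt(testlet_var) * x
    return np.sum(w / (1.0 + np.exp(-a * (theta - b - gamma))))

def effective_a(a, b, testlet_var, h=1e-4):
    # 4 * slope of the averaged curve at theta = b (central difference); equals a when testlet_var = 0.
    slope = (averaged_icc(b + h, a, b, testlet_var) - averaged_icc(b - h, a, b, testlet_var)) / (2.0 * h)
    return 4.0 * slope

a, b = 1.2, 0.0                          # hypothetical item
variances = [0.0, 0.5, 1.0, 2.0]         # hypothetical testlet-effect variances
eff = [effective_a(a, b, v) for v in variances]
for v, e in zip(variances, eff):
    print(f"testlet variance {v:.1f}: effective discrimination {e:.3f}")
```

In this sketch the averaged curve flattens as σ² grows, so a single 2PL curve with the original a no longer describes it, and the mismatch widens with the testlet-effect variance.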