Journal of Psychological Science (心理科学) ›› 2018, Vol. 41 ›› Issue (6): 1374-1381.

• Development and Education •

Selection and Application of Testlet Models in a High School English Reading Test

Ma Jie, Liu Hongyun

  1. Beijing Normal University
  • Received: 2018-03-15 Revised: 2018-09-02 Online: 2018-11-20 Published: 2018-11-20
  • Corresponding author: Liu Hongyun

Comparison and Application of IRT Model in English Reading Comprehension Test



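Abstract: Using empirical data from a high school English reading test, this study compared the two-parameter logistic model (2PL-IRT) with two-parameter logistic testlet models (2PL-TRT) that include different numbers of testlets, in order to examine how the number of testlets affects parameter estimation and model fit. The results showed that: (1) for examinees with abilities between -1.50 and 0.50, the 2PL-IRT model produced relatively large bias in the ability estimates; (2) treating testlets with a testlet effect greater than 0.50 as locally independent items led to underestimation of the discrimination parameters of some items and overestimation of the difficulty parameters of most items; (3) the larger the testlet effect, the larger the bias in the item parameter estimates when the testlet was treated as locally independent items.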

Keywords: item response theory, testlet response theory, testlet model, model selection
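
The two model families compared in the abstract can be written out as follows. The abstract does not state the parameterization, so this is a sketch assuming the standard Bradlow-Wainer-Wang form of the testlet model; the symbols $\gamma_{i d(j)}$ (person-specific effect of the testlet $d(j)$ that contains item $j$) and $\sigma^2_{\gamma_{d(j)}}$ (its variance) are notation introduced here, not taken from the source. Under this form, the "testlet effect" is usually reported as the variance $\sigma^2_{\gamma_{d(j)}}$.

```latex
% 2PL-IRT: examinee i, item j, discrimination a_j, difficulty b_j
P(y_{ij} = 1 \mid \theta_i)
  = \frac{1}{1 + \exp\left[-a_j(\theta_i - b_j)\right]}

% 2PL-TRT (assumed form): item j belongs to testlet d(j)
P(y_{ij} = 1 \mid \theta_i, \gamma_{i d(j)})
  = \frac{1}{1 + \exp\left[-a_j(\theta_i - b_j - \gamma_{i d(j)})\right]},
\qquad
\gamma_{i d(j)} \sim N\left(0, \sigma^2_{\gamma_{d(j)}}\right)
```

Setting $\sigma^2_{\gamma_{d(j)}} = 0$ for a testlet makes its $\gamma$ term vanish, so that testlet's items fall back to the 2PL-IRT form. Read this way, dropping a testlet from the model amounts to treating its items as locally independent.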

Abstract: Testlets are common in reading comprehension tests. Compared with traditional tests composed of stand-alone items, a test built from testlets can reduce testing time and cost and can present tasks that more closely resemble real-world situations, which improves the validity of the test. However, if the reading material within a testlet affects examinees with different background knowledge differently, a testlet effect occurs. Previous research has shown that, when a testlet effect exists, item parameter estimates are biased if a traditional IRT model is applied. To solve this problem, researchers developed Testlet Response Theory (TRT) by adding a testlet parameter to standard IRT models. This article summarizes the models that deal with testlet effects and then analyzes data from a high school English reading comprehension test consisting of one cloze test and five reading comprehension tests. All items were multiple-choice, either four-option items or 5-answers-out-of-7-options items. The sample size was 934. Two kinds of measurement models were compared: the two-parameter logistic item response (2PL-IRT) model and five two-parameter logistic testlet response (2PL-TRT) models, each with a different number of testlets. The most complex model (5T_TRT) contained all five reading comprehension testlets. The number of testlets was then reduced according to the magnitude of the testlet effect, so that the simplest 2PL-TRT model (1T_TRT) contained only the one testlet with the largest testlet effect. First, to confirm that all of the reading comprehension tests violated the local independence assumption, that is, showed local item dependence (LID), the Q3 values of each testlet were calculated in R. As expected, the absolute Q3 values of all five reading comprehension tests exceeded 0.20, indicating that all five violated the local independence assumption.
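
The Q3 check mentioned above can be illustrated with a minimal sketch. The study computed Q3 in R; the Python below is not the authors' code. It assumes Yen's usual definition of Q3 (the correlation between two items' residuals after fitting a 2PL model), and all function names are hypothetical. For simplicity, the simulation helper plugs in the true abilities and item parameters where a real analysis would use estimates.

```python
import math
import random


def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def q3_matrix(responses, thetas, a, b):
    """Q3 for every item pair: correlation of residuals y_ij - P_j(theta_i)."""
    n_items = len(a)
    residuals = [
        [row[j] - p_2pl(t, a[j], b[j]) for row, t in zip(responses, thetas)]
        for j in range(n_items)
    ]
    return [
        [pearson(residuals[j], residuals[k]) for k in range(n_items)]
        for j in range(n_items)
    ]


def flagged_pairs(q3, cutoff=0.20):
    """Item pairs whose |Q3| exceeds the cutoff (0.20 in the abstract)."""
    n = len(q3)
    return [(j, k) for j in range(n) for k in range(j + 1, n)
            if abs(q3[j][k]) > cutoff]


def simulate(n_persons, a, b, testlet_of, gamma_sd, seed=0):
    """Simulate 0/1 responses; items sharing a testlet id share a
    person-specific effect gamma ~ N(0, gamma_sd^2). Hypothetical helper."""
    rng = random.Random(seed)
    testlet_ids = sorted({d for d in testlet_of if d is not None})
    thetas, responses = [], []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)
        gamma = {d: rng.gauss(0.0, gamma_sd) for d in testlet_ids}
        row = []
        for j in range(len(a)):
            shift = gamma.get(testlet_of[j], 0.0)  # 0 for stand-alone items
            p = p_2pl(theta - shift, a[j], b[j])
            row.append(1 if rng.random() < p else 0)
        thetas.append(theta)
        responses.append(row)
    return thetas, responses
```

A possible use: call `simulate(2000, [1.5] * 4, [0.0] * 4, [0, 0, None, None], gamma_sd=1.0)`, pass the result to `q3_matrix`, and compare the Q3 of the two items that share a testlet with the Q3 of the two stand-alone items. The shared effect is left in the residuals of the first pair, which is the dependence Q3 is meant to detect.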
Next, the ability of every examinee and the discrimination (a) and difficulty (b) parameters of every item were estimated with SCORIGHT 3.0. For ability, there were no obvious differences among the five 2PL-TRT models, but for examinees with abilities between -1.50 and 0.50 the 2PL-IRT model produced biased ability estimates. For item parameters, if the testlet effect of a reading comprehension test reaches 0.50, its items should not be treated as locally independent and should instead be analyzed as a testlet with the TRT model. Otherwise, if these items are mistakenly regarded as locally independent, the item parameter estimates will be seriously biased, and the bias increases as the testlet effect becomes larger. These results suggest that, in practice, striking a good balance between the accuracy of parameter estimates and the simplicity of the model requires considering two things: the type of parameter and the magnitude of the testlet effect. In addition, the authors emphasize the importance of well-chosen reading materials: to avoid testlet effects, the subject of each passage and the item types should be considered before test construction.

Key words: Item Response Theory, Testlet Response Theory, Testlet Response Model, Model Selection