Psychological Science, 2015, Vol. 38, Issue (6): 1504-1512.
Yue Liu, Hongyun Liu
刘玥, 刘红云 (Yue Liu, Hongyun Liu)
Abstract:
The NEAT (non-equivalent groups with anchor test) design is widely used in large-scale educational assessments. In practice, however, test-security concerns in some regions make it difficult to reuse items, and internal anchor sets cannot be used in high-stakes tests. In addition, the criteria for selecting an anchor set are hard to meet in some situations. The purpose of this study is to find a practical alternative for score equating in large-scale assessments when a NEAT design is not available, and to evaluate the new method under various conditions. The new approach is called the Assembling Anchor Sets (AAS) approach. First, items from the different tests are ranked by difficulty as predicted by experienced experts. Then, items of nearly equal predicted difficulty are chosen as common items, while keeping the anchor set representative of the tests. Finally, equating methods for the NEAT design are applied. The study uses a mixed design that crosses simulation conditions with score equating methods. Two equating procedures are compared: one builds anchor sets with the AAS approach and applies equipercentile equating under the NEAT design; the other treats the two simulated samples as random groups and applies equipercentile equating under the random groups design. The research comprises two simulations. Simulation 1 explores how errors in the experts' judgments affect the AAS approach. It has five simulation factors: (1) number of examinees (2000 and 5000); (2) anchor length (5 items, 1/8 of the total test; 10 items, 1/4 of the total test); (3) proportion of mistakenly predicted items (four levels with 5 common items: 0%, 20%, 60%, 100%; five levels with 10 common items: 0%, 30%, 50%, 70%, 100%); (4) error in predicted item difficulty (+1 and +5); (5) difficulty level of the total tests (equivalent and non-equivalent). Simulation 2 examines how differences in difficulty among the nearly-equal-difficulty items affect the AAS approach.
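The AAS steps described above (rank items by predicted difficulty, pair items of nearly equal difficulty, keep the anchor representative) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: `assemble_anchor` is a hypothetical helper, expert predictions are assumed to be available as numeric difficulty values for every item on both forms, and "representativeness" is approximated by splitting the ranked old-form items into equal blocks and keeping the closest old/new pair from each block.

```python
from typing import List, Optional, Sequence, Tuple


def assemble_anchor(
    old_difficulty: Sequence[float],
    new_difficulty: Sequence[float],
    n_anchor: int,
) -> List[Tuple[int, int]]:
    """Return n_anchor (old_item, new_item) index pairs to score as common items.

    Hypothetical sketch of the AAS idea: old-form items are ranked by
    expert-predicted difficulty and split into n_anchor consecutive blocks so
    the anchor spans the difficulty range; within each block the old/new pair
    with the smallest predicted-difficulty gap is kept, and each new-form
    item is used at most once.
    """
    n_old = len(old_difficulty)
    if not 0 < n_anchor <= min(n_old, len(new_difficulty)):
        raise ValueError("n_anchor must be between 1 and the shorter form length")

    # Step 1: rank old-form items from easiest to hardest predicted difficulty.
    ranked = sorted(range(n_old), key=lambda i: old_difficulty[i])
    used_new = set()
    pairs = []
    for k in range(n_anchor):
        # Integer block bounds; every block is non-empty because n_anchor <= n_old.
        block = ranked[k * n_old // n_anchor:(k + 1) * n_old // n_anchor]
        best: Optional[Tuple[float, int, int]] = None  # (gap, old_item, new_item)
        # Step 2: find the nearly-equal-difficulty pair within this block.
        for i in block:
            for j, d in enumerate(new_difficulty):
                if j in used_new:
                    continue
                gap = abs(old_difficulty[i] - d)
                if best is None or gap < best[0]:
                    best = (gap, i, j)
        assert best is not None  # an unused new-form item always remains
        used_new.add(best[2])
        pairs.append((best[1], best[2]))
    return pairs
```

For example, `assemble_anchor([-1.0, 0.0, 1.0, 2.0], [2.1, -0.9, 0.5, 1.2], 2)` keeps one pair from the easier half and one from the harder half of the old form. The paired items would then be scored as a common set and passed to a NEAT equipercentile method; a mistaken expert prediction in this sketch simply means a pair whose true difficulties differ more than their predicted ones.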
Simulation 2 also has five factors, four of which are the same as factors (1), (2), (3), and (5) in Simulation 1; the fifth is the difference in difficulty between paired common items (+0.15 and +0.25). In both simulations, the test length is fixed at 40 items and 30 replications are generated. Data are generated under the Rasch model in R, and equipercentile equating is carried out with the R package equate. The responses that examinees who took the original test would give on the new test are also simulated, so each examinee can be treated as having taken both the original and the new test. Equipercentile equating under the random groups design is then applied to equate scores on the new test to the original test, and the result serves as the true (criterion) equated scores. Finally, the two equating methods are evaluated with four criteria: bias, mean absolute error, root mean square error, and the correlation between equated scores and true scores. The results show that: (1) as the proportion of mistakenly predicted items, the error in predicted item difficulty, and the difficulty difference between common items increase, the equating error of the AAS approach increases; (2) with equivalent groups, when the proportion of mistakenly predicted items is large, the error in predicted item difficulty is serious, and the difficulty differences between common items are pronounced, the AAS approach performs worse than the random groups method, whereas with non-equivalent groups the AAS approach always performs better than the random groups method; (3) with non-equivalent groups, the equating error of the AAS approach decreases as anchor length increases. In conclusion, the AAS approach is highly recommended when groups are non-equivalent. Further research should focus on practical ways to improve the accuracy of experts' predictions of item difficulty.
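The four evaluation criteria can be written out as follows. This sketch assumes that the equated scores and the criterion (true) equated scores are aligned one-to-one, for example per raw-score point or per examinee, and it simply averages over them; how the study aggregates across its 30 replications is not stated here, so `equating_criteria` is a hypothetical helper rather than the study's code.

```python
import math
from typing import Dict, Sequence


def equating_criteria(equated: Sequence[float], true: Sequence[float]) -> Dict[str, float]:
    """Bias, mean absolute error, RMSE, and Pearson correlation of equated vs. true scores."""
    if len(equated) != len(true) or len(equated) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    n = len(equated)
    diffs = [e - t for e, t in zip(equated, true)]
    bias = sum(diffs) / n                          # signed average error
    mae = sum(abs(d) for d in diffs) / n           # mean absolute error
    rmse = math.sqrt(sum(d * d for d in diffs) / n)  # root mean square error
    # Pearson correlation between equated and true scores.
    mean_e, mean_t = sum(equated) / n, sum(true) / n
    cov = sum((e - mean_e) * (t - mean_t) for e, t in zip(equated, true))
    var_e = sum((e - mean_e) ** 2 for e in equated)
    var_t = sum((t - mean_t) ** 2 for t in true)
    if var_e == 0 or var_t == 0:
        raise ValueError("correlation is undefined when either sequence is constant")
    corr = cov / math.sqrt(var_e * var_t)
    return {"bias": bias, "mae": mae, "rmse": rmse, "corr": corr}
```

Bias can be near zero even when individual errors are large in both directions, which is why MAE and RMSE are reported alongside it; RMSE weights large errors more heavily than MAE.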
Chinese abstract (translated):
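This study explores how to equate test scores when no common items are available, using the Assembling Anchor Sets (AAS) approach (构造铆测验法). Study 1 and Study 2 examine, respectively, how errors in ranking item difficulty and differences in difficulty between anchor items affect the AAS approach. The results show that: (1) with equivalent groups, as the proportion of erroneous anchor items, the degree of difficulty-ranking error, and the difficulty differences between anchor items increase, the equating error of the AAS approach gradually increases, while the equating error of the random groups method remains relatively stable; with non-equivalent groups, the equating error of the AAS approach is always smaller than that of the random groups method; (2) for the AAS approach with non-equivalent groups, the shorter the anchor test, the larger the equating error.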
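For comparison, the random groups equipercentile method that the AAS approach is benchmarked against can be sketched in Python. The study itself used R and the equate package; the code below is an independent, unsmoothed re-implementation of the textbook equipercentile definition (each new-form score is mapped to the old-form score with the same percentile rank), working with proportions rather than percentages, plus a Rasch response generator. The function names, the N(0, 1) ability distribution, and the +0.2 difficulty shift in the usage section are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


def rasch_prob(theta: float, b: float) -> float:
    """Rasch model P(correct) = 1 / (1 + exp(-(theta - b))), computed stably."""
    z = theta - b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate_scores(thetas: Sequence[float], b: Sequence[float], rng: random.Random) -> List[int]:
    """Number-correct score of each examinee on a form with item difficulties b."""
    return [sum(rng.random() < rasch_prob(t, d) for d in b) for t in thetas]


def relative_freq(scores: Sequence[int], max_score: int) -> List[float]:
    """Proportion of examinees at each raw score 0..max_score."""
    counts = [0] * (max_score + 1)
    for s in scores:
        counts[s] += 1
    return [c / len(scores) for c in counts]


def percentile_ranks(scores: Sequence[int], max_score: int) -> List[float]:
    """Percentile rank (as a proportion) of each integer score: F(x - 1) + f(x) / 2."""
    f = relative_freq(scores, max_score)
    ranks, below = [], 0.0
    for x in range(max_score + 1):
        ranks.append(below + f[x] / 2.0)
        below += f[x]
    return ranks


def inverse_percentile_rank(p: float, scores: Sequence[int], max_score: int) -> float:
    """Continuized score whose percentile rank (a proportion in [0, 1]) equals p."""
    f = relative_freq(scores, max_score)
    below = 0.0
    for y in range(max_score + 1):
        # y is the smallest score whose cumulative proportion exceeds p.
        if below + f[y] > p:
            return (p - below) / f[y] + y - 0.5
        below += f[y]
    return max_score + 0.5  # p at (or numerically above) the top of the distribution


def equipercentile_random_groups(
    new_scores: Sequence[int], old_scores: Sequence[int], max_score: int
) -> List[float]:
    """Old-form equivalent of every new-form raw score 0..max_score (no smoothing)."""
    return [inverse_percentile_rank(p, old_scores, max_score)
            for p in percentile_ranks(new_scores, max_score)]


if __name__ == "__main__":
    rng = random.Random(2015)
    n_items, n_examinees = 40, 2000  # test length and one sample size from the abstract
    b_old = [rng.gauss(0.0, 1.0) for _ in range(n_items)]
    b_new = [d + 0.2 for d in b_old]  # assumed: new form slightly harder
    # Random groups: both samples drawn from the same N(0, 1) ability distribution.
    old = simulate_scores([rng.gauss(0.0, 1.0) for _ in range(n_examinees)], b_old, rng)
    new = simulate_scores([rng.gauss(0.0, 1.0) for _ in range(n_examinees)], b_new, rng)
    equated = equipercentile_random_groups(new, old, n_items)
    print([round(e, 2) for e in equated[15:26]])
```

Because the new form in this example is harder, equated scores for mid-range raw scores should come out somewhat above the raw scores. The method only works if the two samples really are equivalent, which is why, as reported above, it loses to the AAS approach when groups differ in ability.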
Yue Liu, Hongyun Liu. A Search for Alternatives to Test Equating with No Common Items[J]. Psychological Science, 2015, 38(6): 1504-1512.
刘玥, 刘红云. 无铆题情况下测验分数等值方法探索——构造铆测验法[J]. 心理科学, 2015, 38(6): 1504-1512.
URL: https://jps.ecnu.edu.cn/EN/Y2015/V38/I6/1504