Psychological Science ›› 2016, Vol. 39 ›› Issue (1): 214-223.
康春花,孙小坚,曾平飞
Abstract: Open-ended or constructed-response (CR) items are widely used in educational and psychological tests. Scoring CR items introduces an additional facet, the rater facet. Raters differ from one another, however, so ensuring consistency between raters is a critical problem. Wang and Liu (2007) formulated a generalized multilevel facets model based on the Generalized Partial Credit Model (GPCM) to deal with this problem. However, Tutz (1990) found that in Rasch-family models the transition from category j to j+1 is not scored strictly stepwise, which means that Rasch-family models represent simultaneous rather than successive processing (Andrich, 1995). Consequently, when the Partial Credit Model (PCM) is applied to a simultaneous-processing task, it may yield lower item information than the Graded Response Model (GRM) (Cook, Dodd, & Fitzpatrick, 1999). A new model is therefore needed. The purpose of this paper is to formulate an IRT-based model that can handle items whose step difficulties increase monotonically and that can detect a variety of rater effects precisely and effectively. The model, named the Graded Response Multilevel Facets model, combines the IRT model, multilevel analysis, and the many-facets model; it consists of three facets (person, item, and rater) and a slope parameter. The model has two levels. The first level is an item response model. The second level comprises two regression models: one relates to variables that may affect person ability, and the other describes the rater effects included in the level-1 model. To examine parameter recovery under different conditions, two simulations were conducted.
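As a rough illustration of the level-1 item response model described above, the sketch below computes category probabilities for a graded-response form with an added rater term. The abstract does not state the model equation, so the parameterization used here (a logistic cumulative form with slope `a`, ordered thresholds, and an additive `rater_severity`) and all names are assumptions for illustration, not the authors' formulation.

```python
import math

def grm_category_probs(theta, a, thresholds, rater_severity=0.0):
    """Category probabilities under an assumed graded-response form with a rater term.

    Assumed cumulative form (hypothetical, not taken from the paper):
        P(X >= k) = logistic(a * (theta - b_k - rater_severity))
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in increasing order")
    # Cumulative probabilities P(X >= k), bracketed by P(X >= 0) = 1 and P(X >= K+1) = 0
    cumulative = [1.0]
    for b in thresholds:
        z = a * (theta - b - rater_severity)
        cumulative.append(1.0 / (1.0 + math.exp(-z)))
    cumulative.append(0.0)
    # P(X = k) = P(X >= k) - P(X >= k + 1)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]

# Same examinee and item, scored by a lenient and a severe rater (made-up values)
lenient = grm_category_probs(theta=0.0, a=1.2, thresholds=[-1.0, 0.0, 1.0], rater_severity=-0.5)
severe = grm_category_probs(theta=0.0, a=1.2, thresholds=[-1.0, 0.0, 1.0], rater_severity=0.5)
```

Under this assumed form, a more severe rater shifts probability mass toward the lower score categories for the same examinee, which is the kind of rater effect the model is meant to detect.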
Simulation 1 used a rater fixed-effects model: only ability was modeled as a random effect, while items and raters were fixed. This means that each rater keeps the same attitude when rating test-takers (e.g., gives high scores to everyone) and that the item thresholds stay fixed. In Simulation 2, both examinee ability and raters were modeled as random effects, so raters can apply different standards to different test-takers. R was used to generate examinees' responses to 4 items, and the model parameters were estimated with the SAS NLMIXED procedure using marginal maximum likelihood estimation. To reduce sampling error, 50 replications were run in Simulation 1 and 30 in Simulation 2. Three indices, Bias, RMSE, and the absolute value of relative bias (ARB), were used to evaluate parameter recovery. The results show that (1) the estimates differ only slightly from the true values in both conditions, so the procedure recovers the parameters fairly well; (2) in both simulations the model detects rater effects precisely; and (3) the rater random-effects model is more suitable and more stable than the fixed-effects model. In summary, the model shows great promise for open-ended rating.
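The three recovery indices named above can be sketched as follows. The abstract names the indices but does not give their formulas, so the definitions below are common conventions assumed for illustration; the function name and the sample estimates are hypothetical and are not results from the paper.

```python
import math

def recovery_indices(estimates, true_value):
    """Bias, RMSE, and absolute relative bias (ARB) across replications.

    Assumed definitions (not given in the abstract):
        Bias = mean(est - true)
        RMSE = sqrt(mean((est - true)^2))
        ARB  = |Bias / true|
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    errors = [est - true_value for est in estimates]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    # ARB is undefined when the true value is zero
    arb = abs(bias / true_value) if true_value != 0 else float("nan")
    return {"bias": bias, "rmse": rmse, "arb": arb}

# Made-up estimates of one parameter over five replications (illustrative only)
result = recovery_indices([0.9, 1.1, 1.0, 1.2, 0.8], true_value=1.0)
```

Bias close to zero with a nonzero RMSE, as in this made-up example, would indicate estimates that are centered on the true value but vary from replication to replication.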
Key words: subjective scoring, rater effect, graded response multilevel facets model
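Chinese abstract: Examination reforms and large-scale assessment practice in China and abroad increasingly emphasize subjectively scored items, so research on rater reliability has again become a topic of wide concern. Building on the generalized multilevel facets model of Wang and Liu (2007), this study proposes and examines a graded response multilevel facets model. The results show that, under both the rater fixed-effects and rater random-effects conditions, the means and standard deviations of the bias values are small. This indicates that, under the current experimental conditions, the parameter estimates show good recovery and robustness and that the model can detect rater effects. Factors influencing rater effects could therefore be added in later work, developing this into a complete model that detects both rater effects and their influencing factors.
Chinese key words: subjective scoring, rater effect, graded response multilevel facets model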
康春花, 孙小坚, 曾平飞. A multilevel many-facet rater model based on the graded response model [J]. Psychological Science, 2016, 39(1): 214-223.
URL: https://jps.ecnu.edu.cn/EN/
https://jps.ecnu.edu.cn/EN/Y2016/V39/I1/214