Abstract
Open-ended items, or constructed-response (CR) items, are widely used in educational and psychological testing. Scoring CR items requires an additional facet, the rater facet. Because raters inevitably differ from one another, ensuring consistency across raters is a critical problem. Wang and Liu (2007) formulated a generalized multilevel facets model based on the Generalized Partial Credit Model (GPCM) to address this problem. However, Tutz (1990) found that in Rasch-family models the transition from category j to j+1 is not scored strictly stepwise; that is, Rasch-family models reflect simultaneous rather than successive processing (Andrich, 1995). Consequently, applying the PCM to a simultaneous-processing task may yield lower item information than the Graded Response Model (GRM) (Cook, Dodd, & Fitzpatrick, 1999). A new model is therefore needed to handle this situation.
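For reference, the two model families at issue can be written in their standard forms (a textbook presentation, not quoted from the paper; a_i is the slope of item i, b_{iv} its step or threshold parameters, and \theta_n the ability of person n). The GPCM is an adjacent-category ("divide-by-total") model,

    P(Y_{ni} = k \mid \theta_n) = \frac{\exp \sum_{v=1}^{k} a_i (\theta_n - b_{iv})}{\sum_{c=0}^{m_i} \exp \sum_{v=1}^{c} a_i (\theta_n - b_{iv})},

whereas the GRM is a cumulative ("difference") model,

    P(Y_{ni} \ge k \mid \theta_n) = \frac{\exp[a_i(\theta_n - b_{ik})]}{1 + \exp[a_i(\theta_n - b_{ik})]}, \qquad P(Y_{ni} = k) = P(Y_{ni} \ge k) - P(Y_{ni} \ge k+1).

In the GRM the thresholds of an item must satisfy b_{i1} < b_{i2} < \dots < b_{im_i} for all category probabilities to be nonnegative, which is why it suits items whose step difficulties increase monotonically.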
The purpose of this paper is to formulate an IRT-based model that can handle items whose step difficulties increase monotonically and, at the same time, detect a variety of rater effects precisely and efficiently. The proposed model, the Grade Response Multilevel Facets model, combines an IRT model, multilevel analysis, and the many-facets approach; it comprises three facets (person, item, and rater) and a slope parameter. The model has two levels: the first level is an item response model, and the second level consists of two regression models. One regression model relates person ability to covariates that may affect it; the other describes the rater effects included in the level-1 model.
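A minimal sketch of one plausible specification consistent with this description (the symbols \theta_n for ability, \rho_j for the effect of rater j, and the covariates X_n and W_j are illustrative assumptions, not the paper's own notation). Level 1 is a GRM with a rater facet,

    P(Y_{nij} \ge k \mid \theta_n, \rho_j) = \frac{1}{1 + \exp[-a_i(\theta_n - b_{ik} - \rho_j)]},

and level 2 consists of the two regressions,

    \theta_n = \beta_0 + \beta_1 X_n + \varepsilon_n, \qquad \rho_j = \gamma_0 + \gamma_1 W_j + u_j,

with \varepsilon_n \sim N(0, \sigma^2_\theta) and, in the rater random-effects case, u_j \sim N(0, \sigma^2_\rho).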
To examine parameter recovery when the model is used in different situations, two simulations were conducted. Simulation 1 used a rater fixed-effects model: only ability was modeled as random, while both items and raters were fixed, meaning each rater holds the same attitude toward every test-taker (e.g., gives high scores to everyone) and the item thresholds remain constant. In Simulation 2, both examinee ability and rater effects were modeled as random, so raters may apply different standards to different test-takers. R was used to generate examinee responses to 4 items, and the model parameters were estimated with the SAS NLMIXED procedure based on marginal maximum likelihood estimation. To reduce sampling error, 50 replications were run in Simulation 1 and 30 in Simulation 2. Three indices (bias, RMSE, and the absolute value of relative bias, ARB) were used to evaluate parameter recovery. Results show that: (1) the estimates differ little from the true values under both conditions, so the procedure recovers the parameters fairly well; (2) in both simulations the model detects rater effects precisely; (3) the rater random-effects model is more suitable and more stable than the fixed-effects model. In summary, this model holds great promise for the scoring of open-ended items.
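As an illustration of the data-generating and evaluation steps only, here is a minimal R sketch under the level-1 form assumed above; all parameter values, and the ARB definition used, are illustrative rather than taken from the paper.

    ## Simulate GRM responses with a rater severity shift (assumed model),
    ## then define the three recovery indices. Values are illustrative.
    set.seed(123)

    N     <- 500                        # examinees
    a     <- c(1.2, 0.8, 1.5, 1.0)      # slopes for the 4 items
    b     <- rbind(c(-1.0,  0.0, 1.0),  # ordered thresholds per item
                   c(-1.5, -0.5, 0.5),
                   c(-0.5,  0.5, 1.5),
                   c(-1.0,  0.5, 2.0))
    rho   <- c(-0.3, 0.0, 0.3)          # rater severities (fixed-effects case)
    theta <- rnorm(N)                   # person abilities

    # Draw one response: P(Y >= k) is logistic in a * (theta - b_k - rho)
    sim_one <- function(theta, a, b_i, rho) {
      p_ge <- plogis(a * (theta - b_i - rho))   # P(Y >= 1), ..., P(Y >= K)
      p    <- c(1, p_ge) - c(p_ge, 0)           # P(Y = 0), ..., P(Y = K)
      sample(0:length(b_i), 1, prob = p)
    }

    I <- nrow(b); J <- length(rho)
    Y <- array(NA_integer_, dim = c(N, I, J))   # person x item x rater
    for (n in 1:N) for (i in 1:I) for (j in 1:J)
      Y[n, i, j] <- sim_one(theta[n], a[i], b[i, ], rho[j])

    # Recovery indices over replications, for a vector of estimates `est`
    # of one true parameter `true` (one common definition of ARB):
    bias <- function(est, true) mean(est - true)
    rmse <- function(est, true) sqrt(mean((est - true)^2))
    arb  <- function(est, true) abs(bias(est, true) / true)  # undefined at true = 0

In the paper the estimation itself is carried out with SAS NLMIXED rather than in R; the sketch stops at data generation and the summary indices.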
Key words: subjective scoring / rater effect / grade response multilevel facets model
Formulation and Expectation of GRM-Based Multilevel Facets Rater Model [J]. Journal of Psychological Science, 2016, 39(1): 214-223.