2019, Vol. 42, Issue (3): 731-738.


Estimation of Variance Components of Sparse Data Due to Different Rating Plans Based on Generalizability Theory

Guang Ming LI 1, 2

  1.
  2. South China Normal University
  • Received: 2017-11-22  Revised: 2018-10-22  Online: 2019-05-20  Published: 2019-05-20
  • Contact: Guang Ming LI

Estimation of Variance Components of Missing Data under Different Rating Plans in Generalizability Theory

Guang Ming LI, Huan JIANG

  1. South China Normal University
  • Corresponding author: Guang Ming LI

Abstract: In performance-based assessments, including but not limited to writing and speaking tests, limited time and resources mean that each response is inevitably rated by only a fraction of the raters. Analyzing such data within the framework of Generalizability Theory poses a dilemma, because the data do not fit any standard design; they should therefore be treated as sparse data. Test administrators usually specify how raters are paired to rate each response, and this specification is called the rating plan. The rating plan determines how the rater facet is associated with the other facets and with the object of measurement, and hence it determines the structure of the sparse data. Research in China, however, has not yet introduced this concept and has described such structures merely as unbalanced designs. The Mixture of Connected Rating Plan, the Disconnected Cross Rating Plan and the Fixed Rater Rating Plan are the three most commonly used rating plans, and the sparse data they produce differ in structure. The Mixture of Connected Rating Plan is easy to implement, but its data structure is the most intricate. The Disconnected Cross Rating Plan and the Fixed Rater Rating Plan both require dividing the raters into groups first, and their data structures are much simpler. This study described these rating plans in detail, together with three estimation methods: analogous-ANOVA, the rating method and the subdividing method. A simulation study then investigated the accuracy of the variance component estimates produced by the three methods under the three rating plans, with different numbers of raters and different levels of rater-related effects. The number of raters had three levels (8, 14 and 28 raters), and the rater-related effect had two levels (low and high). The number of examinees (3000) and the number of items (2) were held fixed.
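All three estimation methods are built around the variance components of a person × item × rater (p × i × r) design. As a reference point, the minimal sketch below shows the classical ANOVA (expected-mean-square) estimators for the balanced, fully crossed p × i × r random-effects design, i.e., the complete-data case in which every rater scores every response. Analogous-ANOVA adapts this idea to unbalanced data, which the sketch does not attempt. It is an illustration in Python/NumPy under stated assumptions, not the authors' code; the function names `anova_pxixr` and `simulate_pxixr` and all generating variances are hypothetical.

```python
import numpy as np


def anova_pxixr(x):
    """ANOVA (expected-mean-square) variance component estimates for a balanced,
    fully crossed p x i x r random-effects design with one score per cell.

    x has shape (n_persons, n_items, n_raters); every dimension must be >= 2.
    'pir' is the residual (pir interaction confounded with error).
    Negative estimates are returned as-is (not truncated to zero).
    """
    x = np.asarray(x, dtype=float)
    n_p, n_i, n_r = x.shape
    m = x.mean()
    mp, mi, mr = x.mean(axis=(1, 2)), x.mean(axis=(0, 2)), x.mean(axis=(0, 1))
    mpi, mpr, mir = x.mean(axis=2), x.mean(axis=1), x.mean(axis=0)
    # Mean squares = sum of squares / degrees of freedom for each effect.
    ms_p = n_i * n_r * np.sum((mp - m) ** 2) / (n_p - 1)
    ms_i = n_p * n_r * np.sum((mi - m) ** 2) / (n_i - 1)
    ms_r = n_p * n_i * np.sum((mr - m) ** 2) / (n_r - 1)
    ms_pi = n_r * np.sum((mpi - mp[:, None] - mi + m) ** 2) / ((n_p - 1) * (n_i - 1))
    ms_pr = n_i * np.sum((mpr - mp[:, None] - mr + m) ** 2) / ((n_p - 1) * (n_r - 1))
    ms_ir = n_p * np.sum((mir - mi[:, None] - mr + m) ** 2) / ((n_i - 1) * (n_r - 1))
    resid = (x - mpi[:, :, None] - mpr[:, None, :] - mir[None, :, :]
             + mp[:, None, None] + mi[None, :, None] + mr[None, None, :] - m)
    ms_pir = np.sum(resid ** 2) / ((n_p - 1) * (n_i - 1) * (n_r - 1))
    # Solve the EMS equations, e.g. EMS_pi = s2_pir + n_r * s2_pi.
    return {
        "p": (ms_p - ms_pi - ms_pr + ms_pir) / (n_i * n_r),
        "i": (ms_i - ms_pi - ms_ir + ms_pir) / (n_p * n_r),
        "r": (ms_r - ms_pr - ms_ir + ms_pir) / (n_p * n_i),
        "pi": (ms_pi - ms_pir) / n_r,
        "pr": (ms_pr - ms_pir) / n_i,
        "ir": (ms_ir - ms_pir) / n_p,
        "pir": ms_pir,
    }


def simulate_pxixr(n_p, n_i, n_r, var, rng):
    """Hypothetical generator: X_pir is a sum of independent normal effects with
    the given variances (grand mean fixed at 0 for simplicity)."""
    def eff(key, shape):
        return rng.normal(0.0, np.sqrt(var[key]), size=shape)
    return (eff("p", (n_p, 1, 1)) + eff("i", (1, n_i, 1)) + eff("r", (1, 1, n_r))
            + eff("pi", (n_p, n_i, 1)) + eff("pr", (n_p, 1, n_r))
            + eff("ir", (1, n_i, n_r)) + eff("pir", (n_p, n_i, n_r)))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical generating values, not the study's simulation settings.
    true = {"p": 1.0, "i": 0.1, "r": 0.3, "pi": 0.3, "pr": 0.2, "ir": 0.05, "pir": 0.5}
    # Complete-data benchmark: 3000 examinees, 2 items, 8 raters scoring everything.
    est = anova_pxixr(simulate_pxixr(3000, 2, 8, true, rng))
    for k, v in true.items():
        print(f"{k:>4}: true {v:.2f}  estimate {est[k]:.3f}")
```

With only two items, the item-related components (`i`, `ir`) stay noisy even with complete data, which is worth keeping in mind when reading accuracy results for a two-item design.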
Results indicated the following. (1) Analogous-ANOVA is conceptually easy to understand, but its estimation accuracy was unsatisfactory under all conditions and all rating plans; it should not be used in empirical studies. (2) The rating method ignores the complex combinations of raters and treats the ratings of every response as the same random effect. Its computation is exactly the same regardless of which rating plan produced the data. Its accuracy was barely affected by the number of raters, and when the rater-related effect was low, its estimates were relatively accurate compared with those of analogous-ANOVA. (3) The subdividing method aims to make the best use of the available data, and its procedure is rather complex. Of the three methods, however, it produced the most accurate variance component estimates, and neither the number of raters nor the rater-related effect had much impact on it. Only under the Fixed Rater Rating Plan do researchers need to pay attention to the ratio of the number of raters to the number of examinees: when this ratio is lower than 0.0047, the subdividing method is still safe to use (for reference, 14 raters and 3000 examinees give 14/3000 ≈ 0.0047).
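The difference between the rating method and the subdividing method can be illustrated on a toy rater-group plan. The sketch below assumes a simplified, hypothetical reading of the Disconnected Cross Rating Plan: the raters are split into fixed pairs, the examinees are split evenly across the pairs, and both raters in a pair score both items of every examinee assigned to them, so the sparse data fall apart into balanced p × i × r blocks. The subdividing method estimates the components within each block and then pools them (weighting by the number of examinees per block is an assumption here, one reasonable choice); the rating method drops rater identity and treats "first rating" and "second rating" as the levels of one random facet. The function names and generating values are hypothetical, and this is not the study's R implementation.

```python
import numpy as np


def anova_pxixr(x):
    """EMS (ANOVA) estimates for a balanced p x i x r random design, one score per
    cell; x has shape (n_persons, n_items, n_raters), each dimension >= 2."""
    x = np.asarray(x, dtype=float)
    n_p, n_i, n_r = x.shape
    m = x.mean()
    mp, mi, mr = x.mean(axis=(1, 2)), x.mean(axis=(0, 2)), x.mean(axis=(0, 1))
    mpi, mpr, mir = x.mean(axis=2), x.mean(axis=1), x.mean(axis=0)
    ms_p = n_i * n_r * np.sum((mp - m) ** 2) / (n_p - 1)
    ms_i = n_p * n_r * np.sum((mi - m) ** 2) / (n_i - 1)
    ms_r = n_p * n_i * np.sum((mr - m) ** 2) / (n_r - 1)
    ms_pi = n_r * np.sum((mpi - mp[:, None] - mi + m) ** 2) / ((n_p - 1) * (n_i - 1))
    ms_pr = n_i * np.sum((mpr - mp[:, None] - mr + m) ** 2) / ((n_p - 1) * (n_r - 1))
    ms_ir = n_p * np.sum((mir - mi[:, None] - mr + m) ** 2) / ((n_i - 1) * (n_r - 1))
    resid = (x - mpi[:, :, None] - mpr[:, None, :] - mir[None, :, :]
             + mp[:, None, None] + mi[None, :, None] + mr[None, None, :] - m)
    ms_pir = np.sum(resid ** 2) / ((n_p - 1) * (n_i - 1) * (n_r - 1))
    return {
        "p": (ms_p - ms_pi - ms_pr + ms_pir) / (n_i * n_r),
        "i": (ms_i - ms_pi - ms_ir + ms_pir) / (n_p * n_r),
        "r": (ms_r - ms_pr - ms_ir + ms_pir) / (n_p * n_i),
        "pi": (ms_pi - ms_pir) / n_r, "pr": (ms_pr - ms_pir) / n_i,
        "ir": (ms_ir - ms_pir) / n_p, "pir": ms_pir,
    }


def simulate_disconnected_cross(n_p, n_i, n_raters, var, rng, group_size=2):
    """Hypothetical simplified plan: raters split into fixed groups, examinees split
    evenly across groups, each group fully crosses its examinees and items.
    Returns one balanced (persons, items, raters) block per rater group."""
    sd = {k: np.sqrt(v) for k, v in var.items()}
    full = (rng.normal(0, sd["p"], (n_p, 1, 1)) + rng.normal(0, sd["i"], (1, n_i, 1))
            + rng.normal(0, sd["r"], (1, 1, n_raters))
            + rng.normal(0, sd["pi"], (n_p, n_i, 1))
            + rng.normal(0, sd["pr"], (n_p, 1, n_raters))
            + rng.normal(0, sd["ir"], (1, n_i, n_raters))
            + rng.normal(0, sd["pir"], (n_p, n_i, n_raters)))
    rater_groups = np.arange(n_raters).reshape(-1, group_size)
    person_groups = np.array_split(np.arange(n_p), len(rater_groups))
    # Keep only the cells the plan observes: examinees x items x their own raters.
    return [full[np.ix_(ps, np.arange(n_i), rs)]
            for ps, rs in zip(person_groups, rater_groups)]


def subdividing_method(blocks):
    """Estimate within each complete block, then pool (weighted by block size)."""
    ests = [anova_pxixr(b) for b in blocks]
    w = np.array([b.shape[0] for b in blocks], dtype=float)
    return {k: float(np.average([e[k] for e in ests], weights=w)) for k in ests[0]}


def rating_method(blocks):
    """Ignore rater identity: axis 2 becomes '1st rating, 2nd rating', so the
    'r' component is a rating-slot effect and rater variance is absorbed elsewhere."""
    return anova_pxixr(np.concatenate(blocks, axis=0))


if __name__ == "__main__":
    rng = np.random.default_rng(2019)
    # Hypothetical generating values, not the study's settings.
    true = {"p": 1.0, "i": 0.1, "r": 0.3, "pi": 0.3, "pr": 0.2, "ir": 0.05, "pir": 0.5}
    blocks = simulate_disconnected_cross(3000, 2, 8, true, rng)
    sub, rat = subdividing_method(blocks), rating_method(blocks)
    for k in true:
        print(f"{k:>4}: true {true[k]:.2f}  subdividing {sub[k]:.3f}  rating {rat[k]:.3f}")
```

Because each block in this simplified plan is itself a complete crossed design, the subdividing method can reuse the balanced estimator directly; the other two rating plans produce messier structures that would need a different partitioning.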

Key words: Generalizability Theory, rating plan, sparse data, estimation of variance components

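Abstract: Tests that include a rater facet usually do not fit any design in Generalizability Theory, so from the Generalizability Theory perspective their data should be regarded as missing data, and the structure of the missingness is determined by the test's rating plan. Data under three rating plans were simulated in R, and the estimation performance of the traditional method, the rating method and the subdividing method was compared under each rating plan. The results showed that: (1) the traditional method had poor estimation accuracy; (2) when inter-rater agreement is high, the rating method is suitable for estimation; (3) the subdividing method gave the most accurate estimates, and only under the Fixed Rater Rating Plan is attention needed to the ratio of the number of raters to the number of examinees; the estimates are fairly accurate when this ratio is less than or equal to 0.0047.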

Key words: Generalizability Theory, rating plan, missing data, estimation of variance components