Effects of Several Factors on IRT Observed Score Kernel Equating

Journal of Psychological Science ›› 2022, Vol. 45 ›› Issue (4) : 988-997.

Abstract

Owing to its pre-smoothing and continuization of score distributions, kernel equating has been tested and shown to be equivalent to or better than other equating methods, especially traditional ones, in terms of equating accuracy and stability. IRT observed score kernel equating integrates kernel equating with IRT observed score equating. Few studies have systematically evaluated its performance. Therefore, this study investigated how bandwidth selection method, sample size, test length, equating design, and data simulation method influence it. To ensure ecological validity, data from a large-scale assessment were used as the sampling pool. Both the IRT data simulation method and the pseudo tests and pseudo groups simulation method were used, to avoid simulation preference, under the random Equivalent Groups design (EG) and the Non-Equivalent groups with Anchor Test design (NEAT). Specifically, the bandwidth selection methods were the Penalty method, Silverman’s rule of thumb method, and the Double smoothing method. The sample sizes were 1000, 2000, and 5000, and tests containing 30 items and 45 items were considered. Finally, local and universal criteria were computed: the local criteria were Percent Relative Error (PRE) and Standard Error of Equating (SEE), and the universal criteria were Averaged Percent Relative Error (APRE) and Averaged Standard Error of Equating (ASEE). In EG, regarding the local criteria, PRE increased with the order of the central moment, which also meant that the difference between the distributions before and after equating grew. Nonetheless, given that PRE is the raw difference multiplied by 100, the bandwidth selection methods performed alike. PRE was significantly reduced by increasing sample size and by lengthening tests, especially the latter.
As with PRE, the bandwidth selection methods did not differ in their effect on SEE. A larger sample size yielded less random error, whereas test length had the opposite effect. Furthermore, the SEE curves were “high at left but low at right” for the pseudo tests and pseudo groups method, and “low at left but high at right” for the IRT simulation method. As for the universal criteria, APRE was similarly small across the bandwidth selection methods. The effects of sample size and test length were the same as those observed for the local criteria. There was no significant difference in ASEE between the two data simulation methods. In NEAT, regarding the local criteria, PRE increased with the order of the central moment. The results of the Penalty method and Silverman’s rule of thumb method coincided and were superior to those of the Double smoothing method, and this trend was more evident when the test was shorter. As in EG, PRE was significantly reduced by lengthening tests, but not by increasing sample size. Notably, PRE for the Double smoothing method was most influenced by sample size when the test included 30 items and the IRT simulation method was used, which indicated some interaction among these factors. For SEE, the bandwidth selection methods performed alike, showing discrepancies only at extreme scores. Increasing sample size and lengthening the test both reduced random error. Meanwhile, the distribution of SEE was more stable for the pseudo tests and pseudo groups method than for the IRT method. As for the universal criteria, the trends for APRE and ASEE were the same as those for the local criteria. To summarize, the bandwidth selection methods performed similarly in EG, but the Penalty method and Silverman’s rule of thumb method prevailed in NEAT. Bandwidth selection, sample size, and test length jointly affected IRT observed score equating.
A preference among data simulation methods was observed, which suggests that researchers should use multiple simulation methods and designs before drawing final conclusions when comparing equating methods. Further studies should focus more on the systematic evaluation of equating.
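
As an illustration of two quantities named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the commonly used definition of PRE as 100 times the relative difference between the p-th moment of the equated scores and that of the target scores, and one published kernel-equating adaptation of Silverman's rule of thumb, h = 9σ / sqrt(100 n^(2/5) − 81). Whether the study used exactly these forms is an assumption. The function names, the toy score distributions, and the standard deviation of 5.0 are hypothetical.

```python
import math
import numpy as np


def moment(scores, probs, order, central=False):
    """Moment of the given order for a discrete score distribution."""
    scores = np.asarray(scores, dtype=float)
    probs = np.asarray(probs, dtype=float)
    if central:
        scores = scores - np.sum(scores * probs)  # center on the mean
    return float(np.sum(scores ** order * probs))


def percent_relative_error(equated_x, x_probs, y_scores, y_probs, order, central=False):
    """PRE: 100 * (moment of equated X - moment of Y) / moment of Y.

    With central=True use order >= 2, because the first central
    moment is zero and the ratio would be undefined.
    """
    m_eq = moment(equated_x, x_probs, order, central)
    m_y = moment(y_scores, y_probs, order, central)
    return 100.0 * (m_eq - m_y) / m_y


def silverman_bandwidth(sigma, n):
    """Assumed kernel-equating form of Silverman's rule of thumb."""
    return 9.0 * sigma / math.sqrt(100.0 * n ** 0.4 - 81.0)


# Hypothetical toy data: three score points, equated scores shifted by 0.1.
y_scores = [0.0, 1.0, 2.0]
probs = [0.25, 0.5, 0.25]
equated_x = [0.1, 1.1, 2.1]

pre_1 = percent_relative_error(equated_x, probs, y_scores, probs, order=1)
pre_2 = percent_relative_error(equated_x, probs, y_scores, probs, order=2)

# Bandwidths at the three sample sizes studied, for a hypothetical SD of 5.0.
bandwidths = {n: silverman_bandwidth(5.0, n) for n in (1000, 2000, 5000)}
```

In this toy case PRE is about 10 for the first moment and about 14 for the second, which mirrors the reported pattern of PRE growing with moment order, and the rule-of-thumb bandwidth shrinks as the sample size goes from 1000 to 5000.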

Key words

IRT observed score kernel equating / bandwidth selection methods / equating design / data simulation methods

Cite this article

Effects of Several Factors on IRT Observed Score Kernel Equating[J]. Journal of Psychological Science. 2022, 45(4): 988-997