[1] 蔡艳, 丁树良, 涂冬波. (2009). 铆题比例对等值精度的影响. 心理学探新, 29, 86-89. [2] 戴海崎, 刘启辉. (2002). 锚题题型与等值估计方法对等值的影响. 心理学报, 34, 367-370. [3] 黎光明, 梁正妍. (2019). 锚题比例与年级离散度对垂直等值的影响. 江西师范大学学报(自然科学版), 43, 52-58. [4] 曾平飞, 李雨秦, 刘文惠, 焦丽亚, 康春花. (2017). 大规模测评中IRT等值的影响因素研究. 中国考试, 9, 22-29, 52. [5] Angoff, W. H. (1984). Scales, norms, and equivalent scores.Educational Testing Service. [6] Baker, F. B., & Al-Karni, A. (1991). A comparison of two procedures for computing IRT equating coefficients. Journal of Educational Measurement, 28, 147-162. [7] Battauz, M. (2013). IRT test equating in complex linkage plans. Psychometrika, 78, 464-480. [8] Battauz, M. (2015). Factors affecting the variability of IRT equating coefficients. Statistica Neerlandica, 69, 85-101. [9] Budescu, D. (1985). Efficiency of linear equating as a function of the length of the anchor test. Journal of Educational Measurement, 22, 13-20. [10] Chang H. H., Qian J. H., & Ying Z. L. (2001). A-stratified multistage computerized adaptive testing with B blocking. Applied Psychological Measurement, 25, 333-341. [11] Cook, L. L., & Petersen, N. S. (1987). Problems related to the use of conventional and item response theory equating methods in less than optimal circumstances. Applied Psychological Measurement, 11, 225-244. [12] Fitzpatrick, J., & Skorupski, W. P. (2016). Equating with miditests using IRT. Journal of Educational Measurement, 53, 172-189. [13] Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144-149. [14] Hanson, B. A., & Béguin, A. A. (2002). Obtaining a common scale for item response theory item parameters using separate versus concurrent estimation in the common-item equating design. Applied Psychological Measurement, 26, 3-24. [15] Harris, D. J., & Crouse, J. D. (1993). A study of criteria used in equating. Applied Measurement in Education, 6, 195-240. [16] Kim, S. H., & Cohen, A. S. (1992). Effects of linking methods on detection of DIF. Journal of Educational Measurement, 29, 51-66. [17] Kim, S. H., & Cohen, A. S. (1995). A minimum χ2 method for equating tests under the graded response model. Applied Psychological Measurement, 19, 167-176. [18] Kim, S. H., & Cohen, A. S. (1998). A comparison of linking and concurrent calibration under item response theory. Applied Psychological Measurement, 22, 131-143. [19] Kim, S., & Kolen, M. J. (2004). STUIRT . University of Iowa. [20] Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices. Springer. [21] Livingston, S. A. (2004). Equating test scores (without IRT). Educational Testing Service. [22] Lord, F. M. (1980). Applications of item response theory to practical testing problems.Lawrence Erlbaum Associates. [23] Lord, F. M., & Wingersky, M. S. (1984). Comparison of IRT true-score and equipercentile observed-score "equatings". Applied Psychological Measurement, 8, 453-461. [24] Loyd, B. H., & Hoover, H. D. (1980). Vertical equating using the Rasch Model. Journal of Educational Measurement, 17, 179-193. [25] Marco, G. L. (1977). Item characteristic curve solutions to three intractable testing problems. Journal of Educational Measurement, 14, 139-160. [26] Ogasawara, H. (2000). Asymptotic standard errors of IRT equating coefficients using moments. Economic Review (Otaru University of Commerce), 51, 1-23. [27] Ogasawara, H. (2001). Least squares estimation of item response theory linking coefficients. Applied Psychological Measurement, 25, 373-383. [28] Petersen N. S., Cook L. L., & Stocking M. L. (1983). IRT versus conventional equating methods: A comparative study of scale stability. Journal of Educational Statistics, 8, 137-156. [29] Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201-210. [30] Tatsuoka, K. K. (1991). Item construction and psychometric models appropriate for constructed-responses. Educational Testing Service. [31] von Davier M., Khorramdel L., He Q. W., Shin H. J., & Chen H. W. (2019). Developments in psychometric population models for technology-based large-scale assessments: An overview of challenges and opportunities. Journal of Educational and Behavioral Statistics, 44, 671-705. [32] Way, W. D., & Tang, K. L. (1991). A comparison of four logistic model equating methods. Paper presented at the annual meeting of the American Educational Research Association, Chicago. [33] Wingersky, M. S., & Lord, F. M. (1984). An investigation of methods for reducing sampling error in certain IRT procedures. Applied Psychological Measurement, 8, 347-364. [34] Wu M. L., Adams R. J., & Wilson M. R. (1997). ConQuest: Multi-Aspect Test Software. Australian Council for Educational Research, Camberwell, Victoria. |