[1] 曹亦薇. (2003). 项目功能差异在跨文化人格问卷分析中的应用. 心理学报, 35(1), 120-126. [2] 关丹丹, 乔辉, 陈康, 韩奕帆. (2019). 全国高考英语试题的城乡项目功能差异分析. 心理学探新, 39(1), 64-69. [3] 郭聪颖, 边玉芳. (2013). 题组项目功能差异(DIF)检验方法的应用探索. 心理学探新, 33(5), 423-429. [4] 林岳卿, 方积乾. (2011). 多维IRT与单维IRT在多维量表中应用的差异. 中国卫生统计, 28(3), 226-228. [5] 刘文, 边玉芳, 陈玲丽, 马文超. (2010). 马洛-克罗恩社会赞许性量表在跨文化研究中的项目功能差异检验. 心理科学, 33(6), 1473-1476. [6] 骆方, 张厚粲. (2006). 检验项目功能差异的两类方法——CFA和IRT的比较. 心理学探新, 26(1), 74-78. [7] 漆书青, 戴海崎, 丁树良. (2002). 现代教育与心理测量学原理. 高等教育出版社.. [8] 魏丹, 张丹慧, 刘红云. (2020). 基于多维题组反应模型的项目功能差异检验探究. 心理科学, 43(1), 206-214. [9] 余跃, 杜文久, 周娟, 秦菊香. (2016). LP方法及其与三种常用DIF检测方法的比较. 心理科学, 39(3), 720-726. [10] 张龙, 涂冬波. (2015). 多级计分题项目功能差异常用检测方法及比较. 江西师范大学学报(自然科学版), 39(5), 441-448. [11] 郑蝉金, 郭聪颖, 边玉芳. (2011). 变通的题组项目功能差异检验方法在篇章阅读测验中的应用. 心理学报, 43(7), 830-835. [12] American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing: National council on measurement in education. Author. [13] Barnett, V., & Lewis, T. (1994). Outliers in statistical data. Wiley. [14] Bechger, T. M., & Maris, G. (2015). A statistical test for differential item pair functioning. Psychometrika, 80(2), 317-340. [15] Bond T. G.,& Fox, C. M. (2013). Applying the rasch model: Fundamental measurement in the human sciences Psychology Press Fundamental measurement in the human sciences. Psychology Press. [16] Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31(2), 144-152. [17] Cai, L. (2017). flexMIRT® Version 3.51: Flexible multilevel multidimensional item analysis and test scoring . Vector Psychometric Group. [18] Candell, G. L., & Drasgow, F. (1988). An iterative procedure for linking metrics and assessing item bias in item response theory. Applied Psychological Measurement, 12(3), 253-260. [19] Cao M. Y., Tay L., & Liu Y. W. (2017). A Monte Carlo study of an iterative Wald test procedure for DIF analysis. Educational and Psychological Measurement, 77(1), 104-118. [20] Clauser B., Mazor K., & Hambleton R. K. (1993). The effects of purification of matching criterion on the identification of DIF using the Mantel-Haenszel procedure. Applied Measurement in Education, 6(4), 269-279. [21] Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differentially functioning test items. Educational Measurement: Issues and Practice, 17(1), 31-44. [22] DeMars, C. E. (2011). An analytic comparison of effect sizes for differential item functioning. Applied Measurement in Education, 24(3), 189-209. [23] Fidalgo A. M., Mellenbergh G. J., & Muñiz J. (2000). Effects of amount of DIF, test length, and purification type on robustness and power of Mantel-Haenszel procedures. Methods of Psychological Research, 5(3), 43-53. [24] Finch, H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT likelihood ratio. Applied Psychological Measurement, 29(4), 278-295. [25] Fischer, G. H., & Molenaar, I. W. (1995). Rasch models: Foundations, recent developments, and applications. Springer. [26] French, B. F., & Maller, S. J. (2007). Iterative purification and effect size use with logistic regression for differential item functioning detection. Educational and Psychological Measurement, 67(3), 373-393. [27] Frick H., Strobl C., & Zeileis A. (2015). Rasch mixture models for DIF detection: A comparison of old and new score specifications. Educational and Psychological Measurement, 75(2), 208-234. [28] Halpern, D. F. (2000). Sex differences in cognitive abilities. Lawrence Erlbaum Associates Publishers. [29] Hansen M., Cai L., Monroe S., & Li Z. (2016). Limited-information goodness-of-fit testing of diagnostic classification item response models. British Journal of Mathematical and Statistical Psychology, 69(3), 225-252. [30] Holland, P. W., & Thayer, D. T. (1986). Differential item performance and the Mantel-Haenszel statistic. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA. [31] Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Lawrence Erlbaum Associates. [32] Hyde, J. S., & Linn, M. C. (1988). Gender differences in verbal ability: A meta-analysis. Psychological Bulletin, 104(1), 53-69. [33] Jöreskog, K. G., & Goldberger, A. S. (1975). Estimation of a model with multiple indicators and multiple causes of a single latent variable. Journal of the American Statistical Association, 70(351), 631-639. [34] Kopf J., Zeileis A., & Strobl C. (2015a). A framework for anchor methods and an iterative forward approach for DIF detection. Applied Psychological Measurement, 39(2), 83-103. [35] Kopf J., Zeileis A., & Strobl C. (2015b). Anchor selection strategies for DIF analysis: Review, assessment, and new approaches. Educational and Psychological Measurement, 75(1), 22-56. [36] Lord, F. M. (1980). Applications of item response theory to practical testing problems IRT. Lawrence Erlbaum Associates. [37] Magis D., Béland S., Tuerlinckx F., & De Boeck P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42(3), 847-862. [38] Magis, D., & De Boeck, P. (2012). A robust outlier approach to prevent type I error inflation in differential item functioning. Educational and Psychological Measurement, 72(2), 291-311. [39] Magis, D., & Facon, B. (2013). Item purification does not always improve DIF detection: A counterexample with Angoff's delta plot. Educational and Psychological Measurement, 73(2), 293-311. [40] May, H. (2006). A multilevel Bayesian item response theory method for scaling socioeconomic status in international studies of education. Journal of Educational and Behavioral Statistics, 31(1), 63-79. [41] Muthén, B. (1985). A method for studying the homogeneity of test items with respect to other relevant variables. Journal of Educational Statistics, 10(2), 121-132. [42] Navas-Ara, M. J., & Gómez-Benito, J. (2002). Effects of ability scale purification on the identification of dif. European Journal of Psychological Assessment, 18(1), 9-15. [43] OECD. (2014). PISA 2012 Technical Report. OECD Publishing. [44] Roussos L. A., Schnipke D. L., & Pashley P. J. (1999). A generalized formula for the Mantel-Haenszel differential item functioning parameter. Journal of Educational and Behavioral Statistics, 24(3), 293-322. [45] Shaywitz B. A., Shaywltz S. E., Pugh K. R., Constable R. T., Skudlarski P., Fulbright R. K., & Gore J. C. (1995). Sex differences in the functional organization of the brain for language. Nature, 373(6515), 607-609. [46] Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58(2), 159-194. [47] Shih, C. L., & Wang, W. C. (2009). Differential item functioning detection using the multiple indicators, multiple causes method with a pure short anchor. Applied Psychological Measurement, 33(3), 184-199. [48] Sinharay S., Dorans N. J., Grant, M. C, Blew, E. O., & Knorr C. M. (2006). Using past data to enhance small-sample DIF estimation: A Bayesian approach. ETS Research Report, 2006(1), i-38. [49] Soares T. M., Gonçalves F. B., & Gamerman D. (2009). An integrated Bayesian model for DIF analysis. Journal of Educational and Behavioral Statistics, 34(3), 348-377. [50] Steiger, J. H., & Lind, J. C. (1980). Statistically based tests for the number of common factors. Paper presented at the Annual Meeting of the Psychometric Society, Iowa City, IA. [51] Tay L., Meade A. W., & Cao M. Y. (2015). An overview and practical guide to IRT measurement equivalence analysis. Organizational Research Methods, 18(1), 3-46. [52] Thissen D., Steinberg L., & Gerrard M. (1986). Beyond group-mean differences: The concept of item bias. Psychological Bulletin, 99(1), 118-128. [53] Thissen D., Steinberg L., & Wainer H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67-113). Lawrence Erlbaum Associates. [54] Tutz, G., & Schauberger, G. (2015). A penalty approach to differential item functioning in Rasch models. Psychometrika, 80(1), 21-43. [55] Wang W. C., Shih C. L., & Yang C. C. (2009). The MIMIC method with scale purification for detecting differential item functioning. Educational and Psychological Measurement, 69(5), 713-731. [56] Wang, W. C., & Su, Y. H. (2004). Effects of average signed area between two item characteristic curves and test purification procedures on the DIF detection via the Mantel-Haenszel method. Applied Measurement in Education, 17(2), 113-144. [57] Woods C. M., Cai L., & Wang M. (2013). The Langer-improved Wald test for DIF testing with multiple groups: Evaluation and comparison to two-group IRT. Educational and Psychological Measurement, 73(3), 532-547. [58] Xu J., Paek I., & Xia Y. (2017). Investigating the behaviors of M2 and RMSEA2 in fitting a unidimensional model to multidimensional data. Applied Psychological Measurement, 41(8), 632-644. [59] Yen, W. M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8(2), 125-145. [60] Yuan K. H., Liu H. Y., & Han Y. T. (2021). Differential item functioning analysis without a priori information on anchor items: QQ plots and graphical test. Psychometrika, 86(2), 345-377. [61] Zieky, M. (1993). Practical questions in the use of DIF statistics in test development. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 337-347). Routledge. [62] Zwick, R., & Thayer, D. T. (2002). Application of an empirical Bayes enhancement of Mantel-Haenszel differential item functioning analysis to a computerized adaptive test. Applied Psychological Measurement, 26(1), 57-76. [63] Zwick R., Thayer D. T., & Lewis C. (2000). Using loss functions for DIF detection: An empirical Bayes approach. Journal of Educational and Behavioral Statistics, 25(2), 225-247. |