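Metacognitive ability refers to an individual’s awareness of, reflection on, and regulation of their own thinking processes, and it helps learners monitor their own learning autonomously. Heuristic questioning is regarded as an effective means of improving metacognition, but it is difficult to implement in traditional classrooms, and even when combined with large language models (LLMs) it still suffers from templated questions, a lack of logical progression, and inadequate evaluation mechanisms. To address this, an error-based heuristic question bank was constructed, and several strategies were designed to guide the LLM’s questioning, moving it from a “question encyclopedia” toward a “questioning tutor.” A dual scoring mechanism combining human and LLM ratings was also introduced to evaluate questioning effectiveness. The results show that (1) the error-based question bank effectively improves learners’ metacognitive ability; (2) among the questioning strategies, the semi-open strategy (questions grounded in the error type, with everything else left to the model) outperforms both the fully open-ended and the strictly constrained strategies; and (3) multidimensional evaluation demonstrates that the proposed intelligent heuristic questioning mechanism effectively promotes learners’ metacognitive development.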
Abstract
Metacognitive ability refers to an individual’s awareness of, reflection on, and regulation of their own cognitive processes. This ability facilitates learners’ autonomous monitoring of their learning. Heuristic questioning, as a key approach to activating and cultivating metacognition, encourages students to engage in active thinking, identify cognitive blind spots, and adjust learning strategies, thereby fostering a positive learning cycle. However, in traditional classrooms, it is challenging for teachers to provide personalized questioning support to every student. With the recent rapid advancement of large language models (LLMs), new opportunities have emerged for personalized questioning. Nevertheless, existing LLMs predominantly function as “answering machines” rather than “questioning mentors.” While they excel at answering questions, they often struggle to generate deep and thought-provoking questions, which limits the potential of intelligent heuristic questioning to promote metacognitive development.
This study proposes a heuristic questioning mechanism based on error-type analysis, aiming to shift LLMs from encyclopedias to experienced questioning tutors. A cross-disciplinary question bank was developed that categorizes common errors and pairs each with corresponding heuristic questions. Retrieval-Augmented Generation (RAG) was used to enable flexible dialogue guided by preset prompts and grounded in this error-based knowledge base. Three questioning strategies were designed: fully open-ended, template-constrained, and semi-open (combining error guidance with generative flexibility). Each was compared with a baseline model that had no access to the question bank.
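The retrieval-then-prompt idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `BankEntry` fields, the three sample entries, the word-overlap retrieval (a stand-in for whatever retriever the RAG pipeline actually uses), and the wording that distinguishes the strategies are all hypothetical and are not taken from the study's implementation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class BankEntry:
    error_type: str          # hypothetical label for a category of mistake
    description: str         # what the mistake typically looks like
    heuristic_question: str  # guiding question paired with that mistake


# Hypothetical entries; the study's actual question bank is not reproduced here.
QUESTION_BANK = [
    BankEntry("sign_error",
              "dropped or flipped a negative sign when moving terms",
              "What happens to the sign of a term when it crosses the equals sign?"),
    BankEntry("unit_mismatch",
              "mixed different units in one calculation",
              "Which unit does each quantity use, and do they match?"),
    BankEntry("misread_question",
              "answered a different quantity than the one asked",
              "Can you restate in your own words what the problem asks for?"),
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(note: str, bank: list[BankEntry]) -> BankEntry:
    """Toy retrieval: pick the entry whose description has the highest
    Jaccard word overlap with the note (a stand-in for embedding search)."""
    query = tokenize(note)

    def score(entry: BankEntry) -> float:
        words = tokenize(entry.description)
        union = query | words
        return len(query & words) / len(union) if union else 0.0

    return max(bank, key=score)


def build_prompt(strategy: str, note: str, bank: list[BankEntry]) -> str:
    """Assemble a tutor prompt; the per-strategy wording is an assumption."""
    base = ("You are a tutor. Ask one guiding question and do not reveal "
            f"the answer.\nStudent work: {note}\n")
    if strategy == "baseline":      # no access to the question bank
        return base
    entry = retrieve(note, bank)
    reference = (f"Likely error type: {entry.error_type}\n"
                 f"Reference question: {entry.heuristic_question}\n")
    if strategy == "open":          # bank is optional background only
        return base + reference + "Use or ignore this reference as you see fit."
    if strategy == "template":      # question must follow the bank verbatim
        return base + reference + "Ask the reference question exactly as written."
    if strategy == "semi_open":     # error type is fixed, wording is free
        return base + reference + ("Target this error type, but choose your "
                                   "own wording and follow-up.")
    raise ValueError(f"unknown strategy: {strategy}")
```

Calling `build_prompt("semi_open", note, QUESTION_BANK)` returns a string that could be passed to any chat model; the sketch deliberately stops before the model call, since the model and API used in the study are not stated here.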
To evaluate the effectiveness of these strategies, a dual evaluation framework combining human judgment and automated scoring by LLMs was established. For the subjective evaluation, volunteers used questionnaires to rate teacher-student dialogues generated by the different strategies across multiple dimensions. The automated evaluation used a dialogue-adapted scoring rubric constructed from established metacognitive assessment frameworks, with the LLM performing quantitative analysis of students’ cognitive regulation indicators. By comparing the distributions and trends of human and model scores, the study analyzed the guidance efficacy and task adaptability of each strategy.
The results indicate that (1) the error-based question bank significantly enhances students’ thinking and metacognitive development; (2) among the tested strategies, the semi-open approach achieves the best overall performance by balancing content specificity, generative flexibility, and learner adaptability; and (3) multidimensional evaluation confirms the effectiveness of the proposed intelligent heuristic questioning mechanism in fostering metacognitive growth.
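One simple way to compare human and model scores by strategy is sketched below. It is only an illustration under stated assumptions: the `(strategy, score)` input format, the function names, and the choice of per-strategy means plus a ranking check are hypothetical, and the sketch does not reproduce the statistical analysis actually used in the study.

```python
from statistics import mean


def mean_by_strategy(ratings):
    """Turn a list of (strategy, score) pairs into {strategy: mean score}."""
    grouped = {}
    for strategy, score in ratings:
        grouped.setdefault(strategy, []).append(score)
    return {strategy: mean(scores) for strategy, scores in grouped.items()}


def ranking(means):
    """Order strategies from highest to lowest mean score."""
    return sorted(means, key=means.get, reverse=True)


def compare(human_ratings, llm_ratings):
    """Summarize how closely the two rating sources agree.

    Returns per-source means, whether both sources rank the strategies
    in the same order, and the LLM-minus-human gap for each strategy.
    """
    human = mean_by_strategy(human_ratings)
    llm = mean_by_strategy(llm_ratings)
    shared = [s for s in human if s in llm]
    return {
        "human_means": human,
        "llm_means": llm,
        "same_ranking": ranking(human) == ranking(llm),
        "gaps": {s: llm[s] - human[s] for s in shared},
    }
```

The function takes raw ratings as input rather than embedding any numbers, so it can be run on questionnaire exports and LLM rubric scores alike; a consistent positive or negative gap across strategies would suggest a systematic offset between the two rating sources rather than disagreement about which strategy works best.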
Keywords
metacognitive ability /
heuristic questioning /
large language models /
error analysis /
questioning strategies /
joint human-AI evaluation
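Funding
*This research was supported by the National Natural Science Foundation of China (62377013), the Shanghai Science and Technology Program (20dz2260300), and the Fundamental Research Funds for the Central Universities.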