Integrating Large Models into Psychological and Cognitive Experiments: Current Status, Challenges, and Prospects*

Qu Jingjing1, Zhang Weijian1,2, Gao Xiaoxue3, Wang Xiangfeng**4

Journal of Psychological Science, 2025, 48(4): 804-813. DOI: 10.16719/j.cnki.1671-6981.20250404
Computational Modeling and Artificial Intelligence

Epitome: An Innovative Tool Platform Connecting AI and Psychological Research

Abstract

The widespread social penetration of Large Language Models (LLMs) is reshaping human social landscapes, making the systematic study of psychological mechanisms in human-LLM co-evolution a frontier research area. This paper systematically analyzes the impact of LLM technology on psychological experiments through three distinct levels: cognitive mechanism comparison studies, human subject simulation experiments, and multi-agent human-machine interaction experiments.
Cognitive mechanism analysis: Research reveals that LLMs exhibit human-like characteristics in perceptual judgment, reasoning, and decision-making tasks, achieving or surpassing human performance in many cognitive domains. However, fundamental differences exist between LLM and human cognitive mechanisms, particularly in memory and forgetting processes, causal reasoning, and theory of mind capabilities. While LLMs demonstrate perfect short-term memory retention and lack forgetting mechanisms, humans show complex memory dynamics. These differences necessitate careful consideration in experimental design and evaluation metrics.
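As a concrete (hypothetical) illustration of such comparison studies, the sketch below administers a classic conjunction-fallacy item (the Linda problem) to a model repeatedly and compares its choice rate against an approximate human baseline from the classic literature. The `call_llm` stub is an assumption standing in for any real chat-completion client; the numbers are placeholders, not results from this paper.

```python
# Minimal sketch (not from the paper): giving an LLM a classic cognitive task
# and comparing its response distribution with a human baseline.
import random
from collections import Counter

LINDA = (
    "Linda is 31, single, outspoken, and very bright. She majored in "
    "philosophy and was deeply concerned with issues of discrimination "
    "and social justice.\n"
    "Which is more probable?\n"
    "(a) Linda is a bank teller.\n"
    "(b) Linda is a bank teller and is active in the feminist movement.\n"
    "Answer with a single letter."
)

def call_llm(prompt: str) -> str:
    """Hypothetical stub; replace with a real chat-completion API call."""
    return random.choice(["a", "b"])

def choice_rates(n_trials: int = 100) -> Counter:
    """Repeat the item to estimate the model's response distribution."""
    answers = Counter()
    for _ in range(n_trials):
        answers[call_llm(LINDA).strip().lower()[:1]] += 1
    return answers

rates = choice_rates()
llm_fallacy_rate = rates["b"] / sum(rates.values())
HUMAN_BASELINE = 0.85  # approximate rate reported in classic human studies
print(f"LLM conjunction-fallacy rate: {llm_fallacy_rate:.2f} "
      f"(human baseline ~{HUMAN_BASELINE})")
```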
Human subject simulation: LLMs demonstrate a remarkable ability to simulate fine-grained cognitive features, including cognitive dissonance, emotional responses, and social behaviors. However, significant limitations exist, including black-box properties, homogenization tendencies due to alignment techniques, and poor performance in simulating specific demographic characteristics. These constraints raise concerns about ecological validity when LLMs completely substitute for human subjects in psychological experiments.
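A minimal sketch of what persona-based subject simulation can look like, together with a simple check for the homogenization tendency noted above; the `call_llm` stub, persona attributes, and survey item are all illustrative assumptions, not materials from the paper.

```python
# Sketch: simulating survey respondents by conditioning an LLM on personas,
# then checking whether responses collapse toward a single answer.
import itertools
import random

AGES = [22, 38, 55, 71]
OCCUPATIONS = ["teacher", "nurse", "software engineer", "retiree"]
ITEM = "I feel comfortable sharing personal problems with an AI assistant."

def persona_prompt(age: int, occupation: str) -> str:
    return (f"You are a {age}-year-old {occupation}. Answer survey items "
            "as this person would, on a 1-7 scale, with the number only.")

def call_llm(system: str, item: str) -> str:
    """Hypothetical stub; replace with a real chat-completion API call."""
    return str(random.randint(1, 7))

responses = [
    int(call_llm(persona_prompt(age, occ), ITEM))
    for age, occ in itertools.product(AGES, OCCUPATIONS)
]

# Aligned models often compress between-persona variance, so comparing this
# spread against human survey variance is one diagnostic for homogenization.
mean = sum(responses) / len(responses)
variance = sum((r - mean) ** 2 for r in responses) / len(responses)
print(f"{len(responses)} simulated respondents: mean={mean:.2f}, "
      f"variance={variance:.2f}")
```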
Multi-agent human-machine interaction: LLMs show promise as novel social entities in various experimental paradigms, from one-on-one interactions to large-scale social simulations. In dyadic experiments, LLMs can simulate emotional states and engage in empathetic interactions, though challenges remain in balancing expressiveness with naturalness. In multi-agent scenarios, LLMs participate in game-theoretic settings like prisoner's dilemmas and public goods games, revealing complex strategic capabilities but limitations in theory of mind reasoning. Large-scale social simulations using thousands of LLM agents provide unprecedented opportunities to study collective behavior and social dynamics.
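The game-theoretic settings mentioned above can be prototyped compactly. Below is a hedged sketch of a repeated prisoner's dilemma loop between two agents; `agent_move` would normally wrap an LLM call whose prompt contains the round history, but here it is stubbed with a mirroring policy so the sketch runs as-is.

```python
# Sketch: repeated prisoner's dilemma between two (stubbed) LLM agents.
# (my_move, other_move) -> (my_points, other_points)
PAYOFFS = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

def agent_move(player: str, history: list) -> str:
    """Stub policy: cooperate first, then mirror the opponent's last move.
    In a real study this would be an LLM call, e.g. with a prompt like
    'Round 4. Your opponent previously played C, C, D. Play C or D.'"""
    if not history:
        return "C"
    last_a, last_b = history[-1]
    return last_b if player == "A" else last_a

history, scores = [], {"A": 0, "B": 0}
for _ in range(10):
    move_a, move_b = agent_move("A", history), agent_move("B", history)
    pts_a, pts_b = PAYOFFS[(move_a, move_b)]
    scores["A"] += pts_a
    scores["B"] += pts_b
    history.append((move_a, move_b))

print(history)   # round-by-round moves
print(scores)    # cumulative payoffs per agent
```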
Experimental framework and platforms: The paper outlines a standardized workflow for LLM-integrated psychological experiments comprising 12 core tasks across four phases: proposal, preparation, execution, and data analysis. The complexity of human-machine interaction experiments demands advanced tools and specialized platforms. The emerging Epitome platform addresses these challenges through native LLM integration, a visual design system, and multi-agent simulation capabilities, though its support for physiological measurement remains limited.
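One way to make such a workflow operational is to encode the phases and tasks as a checkable structure. In the sketch below the four phase names come from the paper, while the individual task names are illustrative placeholders, not the paper's actual list of 12.

```python
# Sketch: the four-phase workflow as a data structure whose task count
# can be validated. Task names are illustrative, not the paper's list.
from dataclasses import dataclass

@dataclass
class Phase:
    name: str
    tasks: list[str]

WORKFLOW = [
    Phase("proposal", ["define hypotheses", "choose LLM roles", "ethics review"]),
    Phase("preparation", ["design stimuli and prompts", "configure agents", "pilot run"]),
    Phase("execution", ["recruit participants", "run human-LLM sessions", "log interactions"]),
    Phase("data analysis", ["clean transcripts", "code behaviors", "run statistical tests"]),
]

assert sum(len(p.tasks) for p in WORKFLOW) == 12  # the 12 core tasks

for phase in WORKFLOW:
    print(f"{phase.name}: {', '.join(phase.tasks)}")
```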
Future directions: The rapid iteration of LLM technology and technical complexity of human-machine experimental deployment present ongoing challenges. Future research requires developing LLM-native experimental frameworks, modular visualization systems, and comprehensive platforms supporting diverse experimental paradigms. As AI agents become more autonomous and sophisticated, new psychological questions regarding ethics, safety, and human-machine relationships will emerge, necessitating innovative experimental approaches grounded in psychological theory.
This comprehensive review highlights both the transformative potential and inherent limitations of LLM integration in psychological research, providing essential insights for researchers navigating this rapidly evolving interdisciplinary landscape.

Key words

human-machine collaboration / human-machine interaction / experimental platform / Large Language Models (LLMs)

Cite this article

Qu Jingjing, Zhang Weijian, Gao Xiaoxue, Wang Xiangfeng. Epitome: An Innovative Tool Platform Connecting AI and Psychological Research[J]. Journal of Psychological Science, 2025, 48(4): 804-813. https://doi.org/10.16719/j.cnki.1671-6981.20250404

Funding

* This research was supported by the Shanghai Artificial Intelligence Laboratory and a General Program grant of the National Natural Science Foundation of China (32371094).
