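Effectiveness of Large Language Models in Simulating Regional Psychological Structures: An Empirical Examination of Personality and Subjective Well-being*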

Ke Luoma, Li Zengyi, Liao Jiangqun, Tong Song, Peng Kaiping

Journal of Psychological Science, 2025, Vol. 48, Issue 4: 907-919.

DOI: 10.16719/j.cnki.1671-6981.20250412
Computational Modeling and Artificial Intelligence


Summary

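This study evaluates the ability of a large language model (DeepSeek) to simulate group-level psychological characteristics when conditioned only on demographic features. Using a sample matched to the demographic characteristics of the 2018 China Family Panel Studies (N = 2,943), we constructed AI-generated "virtual participants" and compared the regional distributions of their Big Five personality traits and happiness with those of the real population. The simulated data were broadly consistent with the real data in the regional distribution trends of happiness and the Big Five traits, diverging only in the details: the model tended to capture and amplify overtly expressed traits (such as neuroticism), and its happiness estimates were biased by regional stereotypes. These results point to the potential of large language models such as DeepSeek for simulating regional psychological structures, but applications should attend to their limitations in cultural sensitivity and fine-grained trait simulation in order to avoid over-interpretation. The study provides empirical support for evaluating the validity of large language models in modeling the psychological characteristics of populations.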

Abstract

This study investigated the capacity of a large language model (LLM), specifically DeepSeek, to simulate regional psychological characteristics based solely on demographic information. In particular, it examined whether DeepSeek can preserve culturally distinct psychological patterns without reducing them to oversimplified, flattened profiles, with a focus on personality traits and subjective well-being across different regions of China. Using a sample matched to demographic features from the 2018 China Family Panel Studies (CFPS 2018) (N = 2,943), the study generated "virtual participants" with DeepSeek. The simulated dataset was compared with real human responses from CFPS 2018 to analyze regional differences in Big Five personality traits (openness, conscientiousness, extraversion, agreeableness, neuroticism) and subjective well-being.
Methodologically, the empirical human dataset comprised adult participants from CFPS 2018, covering seven culturally and socioeconomically distinct Chinese regions (North China, Northeast, East China, Central China, South China, Southwest, and Northwest). Each region had an equal number of males and females aged from 18 to 65. Personality was measured using a simplified 15-item Chinese Big Five inventory, while subjective happiness was assessed using a single-item self-rating scale. Correspondingly, a matched virtual dataset of equivalent size and demographic distribution was generated using DeepSeek-V3-0324, with constructed prompts designed to mirror the demographics and cultural context of the actual participants. The virtual participants responded to identical psychological assessments, ensuring comparability.
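The abstract does not reproduce the prompts themselves. As a rough illustration only, the sketch below shows one way a demographic profile could be turned into a questionnaire prompt. The `Persona` fields, the prompt wording, the 1-to-5 response scale, and the sample item are all assumptions made for this example and are not the study's materials; no model call is made.

```python
from dataclasses import dataclass

# The seven regions named in the abstract.
REGIONS = (
    "North China", "Northeast", "East China", "Central China",
    "South China", "Southwest", "Northwest",
)


@dataclass(frozen=True)
class Persona:
    """Demographic profile of one virtual participant (hypothetical fields)."""
    region: str
    gender: str
    age: int


def build_prompt(persona: Persona, item: str) -> str:
    """Compose an illustrative prompt; the wording is assumed, not the study's."""
    if persona.region not in REGIONS:
        raise ValueError(f"unknown region: {persona.region}")
    if not 18 <= persona.age <= 65:
        raise ValueError("age must fall within the sampled range of 18 to 65")
    return (
        f"You are a {persona.age}-year-old {persona.gender} living in "
        f"{persona.region}. Rate the statement below from 1 (strongly disagree) "
        f"to 5 (strongly agree). Reply with a single number.\n"
        f"Statement: {item}"
    )


# Hypothetical participant and questionnaire item.
prompt = build_prompt(Persona("Southwest", "female", 34), "I am talkative.")
print(prompt)
```

Validating the region and age up front keeps every generated persona inside the sampling frame described above, so the virtual sample cannot drift away from the demographics it is meant to match.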
Independent-samples t-tests indicated overall similarity between the human and AI-generated data, along with significant differences in certain aspects. Specifically, the virtual dataset closely mirrored the human data in the overall distributions of personality and happiness but differed significantly on several traits. Simulated participants scored significantly lower in extraversion and openness (with medium to large effect sizes) and higher in agreeableness and neuroticism than human participants. Happiness levels in the simulated dataset were consistently lower, suggesting limitations in DeepSeek’s capacity to replicate subjective emotional experiences accurately.
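For readers who want to see what such a comparison involves, here is a minimal sketch of a pooled-variance independent-samples t statistic and Cohen's d using only the Python standard library. The scores are made-up toy values, not data from the study, and the abstract does not say whether the authors used the pooled or the Welch variant, so the pooled form is an assumption.

```python
import math
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)


def cohens_d(a, b):
    """Standardized mean difference (a minus b) using the pooled SD."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


def student_t(a, b):
    """Pooled-variance t statistic with len(a) + len(b) - 2 degrees of freedom."""
    n1, n2 = len(a), len(b)
    return (mean(a) - mean(b)) / (pooled_sd(a, b) * math.sqrt(1 / n1 + 1 / n2))


# Toy extraversion scores (hypothetical values for illustration only).
human = [4, 3, 5, 4, 4, 3]
simulated = [3, 2, 3, 3, 2, 3]

d = cohens_d(human, simulated)
t = student_t(human, simulated)
print(f"d = {d:.3f}, t = {t:.3f}")
```

Reporting d alongside t matters with samples of this size: with thousands of observations per group, even trivial mean differences reach statistical significance, so the effect size is what indicates whether a gap is substantively meaningful.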
Further analyses of variance (ANOVAs) revealed that both datasets reflected significant regional differences in personality traits and happiness. For example, in human responses, the Southwest region demonstrated significantly higher extraversion, while the Northeast region exhibited higher subjective happiness. However, DeepSeek’s simulated data diverged from these patterns, notably underestimating happiness in the Northeast and overestimating certain personality dimensions in economically prosperous East China.
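The regional comparison rests on a one-way ANOVA, which asks whether between-region variance is large relative to within-region variance. A minimal standard-library sketch follows; the three groups are toy numbers rather than study data, and the function name is chosen for this example.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups.

    Assumes at least two groups and non-zero within-group variance.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy happiness ratings for three hypothetical regions.
regions = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(regions)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

Running the same function separately on the human and the simulated samples shows whether each dataset exhibits regional differences; it does not by itself show that the two datasets rank the regions in the same order.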
Additionally, regression analyses explored the relationship between personality traits and subjective happiness within both datasets. In the human data, conscientiousness, extraversion, and openness were significant positive predictors of happiness, and neuroticism was a significant negative predictor. The virtual data showed a different structure: openness and agreeableness positively predicted happiness, neuroticism was a significantly stronger negative predictor than in the human data, extraversion negatively predicted happiness, and conscientiousness had no significant predictive effect. Principal component analysis (PCA) further highlighted structural differences between the human and simulated datasets, particularly an overreliance on more linguistically salient and externally expressed traits in the AI-generated responses.
These findings advance the understanding of LLM applications in psychological research. Primarily, they demonstrate DeepSeek’s general effectiveness in simulating broad psychological distributions, while also highlighting its limitations in capturing region-specific psychological structures shaped by the interplay of economic conditions, cultural norms, and psychological dispositions. These limitations likely stem from the model’s training data, which insufficiently represents these layered contextual factors.
The practical implications of this research are substantial. The use of DeepSeek as a tool for generating "virtual participants" could significantly reduce costs and logistical burdens associated with large-scale psychological research, enabling preliminary testing and refinement of research designs prior to field deployment. However, caution is recommended due to observed biases, including exaggerated cultural stereotypes and inadequate modeling of subjective emotional states. Future model iterations and methodological advancements should address these issues by incorporating richer, more culturally grounded training data and more precise affective modeling techniques.
Despite these limitations, the research provides important methodological insights and theoretical contributions by introducing an innovative approach using LLM-generated virtual participants for psychological inquiry. It underscores the potential of DeepSeek and similar models for cost-effective large-scale research while highlighting crucial areas that require further refinement.
In conclusion, this study validates the feasibility of employing large language models such as DeepSeek for simulating regional psychological structures, but also emphasizes the necessity for continued development to address culturally grounded and psychologically meaningful variations effectively. As training data and algorithms advance, these models may help reshape methodologies within personality and cross-cultural psychological research.

Key words

large language model / DeepSeek / Big Five personality / subjective well-being / regional psychological structure / virtual participants

Cite this article

Ke Luoma, Li Zengyi, Liao Jiangqun, Tong Song, Peng Kaiping. Effectiveness of Large Language Models in Simulating Regional Psychological Structures: An Empirical Examination of Personality and Subjective Well-being[J]. Journal of Psychological Science. 2025, 48(4): 907-919 https://doi.org/10.16719/j.cnki.1671-6981.20250412

Funding

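* This research was supported by the National Key Research and Development Program of China (2016YFA0602500) and by self-selected projects of the Institute for Global Industry, Tsinghua University (2021-11-09-LXHT005-01, 2024-06-18-LXHT002).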
