This study investigated the capacity of a large language model (LLM), specifically DeepSeek, to simulate regional psychological characteristics from demographic information alone. In particular, it examined whether DeepSeek can preserve culturally distinct psychological patterns rather than reducing them to oversimplified, flattened profiles, focusing on personality traits and subjective well-being across regions of China. Using a sample matched on demographic features to the 2018 China Family Panel Studies (CFPS 2018; N = 2,943), the study generated "virtual participants" with DeepSeek. The simulated dataset was then compared with real human responses from the CFPS to analyze regional differences in the Big Five personality traits (openness, conscientiousness, extraversion, agreeableness, neuroticism) and subjective well-being.
Methodologically, the empirical human dataset comprised adult participants from CFPS 2018, covering seven culturally and socioeconomically distinct Chinese regions (North China, Northeast, East China, Central China, South China, Southwest, and Northwest). Each region had an equal number of males and females aged from 18 to 65. Personality was measured using a simplified 15-item Chinese Big Five inventory, while subjective happiness was assessed using a single-item self-rating scale. Correspondingly, a matched virtual dataset of equivalent size and demographic distribution was generated using DeepSeek-V3-0324, with constructed prompts designed to mirror the demographics and cultural context of the actual participants. The virtual participants responded to identical psychological assessments, ensuring comparability.
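To make the idea of a demographically constructed prompt concrete, the following is a minimal sketch of how one virtual participant could be described to the model. The `Persona` fields, the prompt wording, and the 1 to 5 response scale are illustrative assumptions; the abstract does not report the prompts actually used in the study.

```python
from dataclasses import dataclass

# The seven regions named in the abstract.
REGIONS = (
    "North China", "Northeast", "East China", "Central China",
    "South China", "Southwest", "Northwest",
)


@dataclass(frozen=True)
class Persona:
    """Hypothetical record for one virtual participant (field names are assumptions)."""
    region: str
    gender: str
    age: int


def build_prompt(persona: Persona, item: str) -> str:
    """Return a role-play prompt asking for a rating of one questionnaire item."""
    if persona.region not in REGIONS:
        raise ValueError(f"unknown region: {persona.region}")
    if not 18 <= persona.age <= 65:
        raise ValueError("age must be between 18 and 65")
    # Assumed wording and assumed 1-5 agreement scale, for illustration only.
    return (
        f"You are a {persona.age}-year-old {persona.gender} living in "
        f"{persona.region}, China. Answer as this person would.\n"
        f"Statement: {item}\n"
        "Reply with a single number from 1 (strongly disagree) to 5 (strongly agree)."
    )


if __name__ == "__main__":
    print(build_prompt(Persona("Southwest", "female", 30), "I am talkative."))
```

One prompt per item per persona keeps each response independent; sending the whole questionnaire in a single prompt is an equally plausible design that this sketch does not cover.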
Independent-samples t-tests indicated overall similarity between the human and AI-generated data, along with significant differences on specific measures. The virtual dataset broadly mirrored the human data in its personality and happiness distributions, but simulated participants scored significantly lower in extraversion and openness (with medium to large effect sizes) and higher in agreeableness and neuroticism. Happiness levels in the simulated dataset were also consistently lower, suggesting limitations in DeepSeek’s capacity to replicate subjective emotional experiences accurately.
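For readers who want to see what such a comparison computes, here is a minimal sketch of an independent-samples t statistic and Cohen's d using only the Python standard library. It assumes the pooled-variance (Student) form; the abstract does not state whether the authors used this or the Welch variant, and the scores in the usage example are made up.

```python
import math
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    return math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))


def student_t(a, b):
    """Return (t, df) for a pooled-variance independent-samples t-test."""
    na, nb = len(a), len(b)
    standard_error = pooled_sd(a, b) * math.sqrt(1 / na + 1 / nb)
    return (mean(a) - mean(b)) / standard_error, na + nb - 2


def cohens_d(a, b):
    """Standardized mean difference; negative when sample a scores lower."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


if __name__ == "__main__":
    simulated = [1, 2, 3, 4, 5]   # made-up scores, not study data
    human = [2, 3, 4, 5, 6]
    print(student_t(simulated, human), cohens_d(simulated, human))
```

By the commonly used conventions, an absolute d of about 0.2 is read as small, 0.5 as medium, and 0.8 as large.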
Further analyses of variance (ANOVAs) revealed that both datasets reflected significant regional differences in personality traits and happiness. For example, in the human responses, the Southwest region showed significantly higher extraversion, while the Northeast region showed higher subjective happiness. However, DeepSeek’s simulated data diverged from these patterns, notably underestimating happiness in the Northeast and overestimating certain personality dimensions in economically prosperous East China.
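The regional comparison can be illustrated with a minimal sketch of a one-way ANOVA F statistic, with each region treated as one group. This is a generic textbook computation, not the authors' analysis script, and the three small groups in the usage example are made up.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of per-group score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Made-up scores for three hypothetical regions.
    print(one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

Running the same function separately on the human and simulated datasets shows whether each one separates the regions at all; which regions score higher still has to be read from the group means.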
Additionally, regression analyses examined the relationship between personality traits and subjective happiness within each dataset. In the human data, conscientiousness, extraversion, and openness were significant positive predictors of happiness, and neuroticism was a significant negative predictor. The virtual data showed a different structure: openness and agreeableness positively predicted happiness, neuroticism negatively predicted happiness significantly more strongly than in the human data, extraversion negatively predicted happiness, and conscientiousness had no significant predictive effect. Principal component analysis (PCA) further highlighted structural differences between the human and simulated datasets, in particular an overreliance on more linguistically salient and externally expressed traits in the AI-generated responses.
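As an illustration of how predictor signs can be compared across datasets, the sketch below fits an ordinary least squares regression through the normal equations in plain Python. The model specification (covariates, standardization) used by the authors is not given in the abstract, so the function is generic and the data in the usage example are synthetic.

```python
def ols(X, y):
    """Return least-squares coefficients [intercept, b1, ..., bk].

    X is a list of predictor rows (for example, five trait scores per person);
    y is the list of outcomes (for example, happiness ratings).
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        if abs(A[p][c]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


if __name__ == "__main__":
    # Synthetic data generated from y = 1 + 2*x1 - 3*x2.
    X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
    y = [1, 3, -2, 0, 2]
    print(ols(X, y))
```

Fitting the same model to the human and simulated datasets and comparing the sign and size of each coefficient is one way to expose the kind of structural divergence reported here, such as a predictor that is positive in one dataset and negative in the other.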
These findings contribute to the understanding of LLM applications in psychological research. They demonstrate DeepSeek’s general effectiveness in simulating broad psychological distributions while also highlighting its limitations in capturing region-specific psychological structures shaped by the interplay of economic conditions, cultural norms, and psychological dispositions. These limitations likely stem from the model’s training data, which insufficiently represents these layered contextual factors.
The practical implications of this research are substantial. The use of DeepSeek as a tool for generating "virtual participants" could significantly reduce costs and logistical burdens associated with large-scale psychological research, enabling preliminary testing and refinement of research designs prior to field deployment. However, caution is recommended due to observed biases, including exaggerated cultural stereotypes and inadequate modeling of subjective emotional states. Future model iterations and methodological advancements should address these issues by incorporating richer, more culturally grounded training data and more precise affective modeling techniques.
Despite these limitations, the research provides important methodological insights and theoretical contributions by introducing an innovative approach using LLM-generated virtual participants for psychological inquiry. It underscores the potential of DeepSeek and similar models for cost-effective large-scale research while highlighting crucial areas that require further refinement.
In conclusion, this study validates the feasibility of employing large language models such as DeepSeek for simulating regional psychological structures, but also emphasizes the necessity for continued development to address culturally grounded and psychologically meaningful variations effectively. As training data and algorithms advance, these models may help reshape methodologies within personality and cross-cultural psychological research.
Key words: large language model / DeepSeek / Big Five personality / subjective well-being / regional psychological structure / virtual participants