Empowering the Construction and Automated Measurement of Psychological Trait Dimensions with Artificial Intelligence: A Case Study of National Stereotypes

Yilin Wang, Nan Zhao, Tingshao Zhu

Journal of Psychological Science ›› 2025, Vol. 48 ›› Issue (4) : 997-1008. DOI: 10.16719/j.cnki.1671-6981.20250419
Computational modeling and artificial intelligence


Abstract

National stereotypes play a significant role in shaping intergroup attitudes, behaviors, and international relations. Accurately measuring these stereotypes is essential to understand social cognition at the individual and societal levels. However, traditional methods of assessing such stereotypes typically rely on predefined dimensions and structured questionnaires, which often limit the scope of concept identification and introduce measurement biases. To overcome these limitations, this study introduces an artificial intelligence-empowered paradigm for psychological assessment that applies large language models (LLMs) to integrate dimensional construction and automated measurement without the need for conventional scale development. This automated evaluation approach is referred to as the LLM-rating model, which enables direct, scalable, and objective evaluation of psychological indicators from open-ended textual data.
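The paper does not publish the prompts behind its LLM-rating model, so as a purely illustrative sketch, a single rating step might compose a dimension-specific instruction and parse a numeric score from the model's reply. The prompt wording, the 1–7 scale, and the function names here are assumptions, not the study's materials:

```python
import re

def build_rating_prompt(text: str, dimension: str, scale: int = 7) -> str:
    """Compose a rating instruction for one stereotype dimension.

    The wording and the 1..scale range are illustrative assumptions,
    not the prompts used in the study.
    """
    return (
        f"Read the following free description of a country and rate how "
        f"strongly it expresses the dimension '{dimension}' on a scale "
        f"from 1 (not at all) to {scale} (extremely). "
        f"Reply with a single number.\n\nDescription:\n{text}"
    )

def parse_score(reply: str, scale: int = 7):
    """Extract the first integer in a reply; reject out-of-range values."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    score = int(match.group())
    return score if 1 <= score <= scale else None

prompt = build_rating_prompt(
    "A country with a long history and rich cuisine.", "cultural richness"
)
print(parse_score("My rating is 6."))  # -> 6
```

Keeping the reply format constrained to a single number is what makes the pipeline "direct and scalable": the parse step stays trivial and the same prompt template can be sent to GPT-4o, DeepSeek-R1, or any other model unchanged.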
In Study 1, we utilized LLMs to extract national stereotype content from free-description responses provided by participants of different nationalities. Specifically, we recruited 191 Chinese participants (107 female; mean age = 31.28 years) and 176 American participants (85 female; mean age = 47.08 years) to describe their impressions of different foreign nations. The free-description responses were processed using text mining methods, including network analysis and topic modeling, and further analyzed with LLMs to identify the cross-cultural core dimensions of national stereotypes. This approach revealed five dimensions: cultural richness, development and progress, dominance and threat, social equality, and authoritarianism and dictatorship. These dimensions extend beyond conventional stereotype content models and offer a more comprehensive understanding of national images. By incorporating LLMs into both the extraction and categorization processes, our study reduces the human subjectivity of manual coding and provides a data-driven approach to identifying stereotype structure.
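The abstract names network analysis and topic modeling but not the specific tooling. A minimal stdlib sketch of the first step, counting word pairs that co-occur within the same free-description response to form the descriptor network that community detection or topic models would then operate on, could look as follows (the whitespace tokenization, the example responses, and the `min_count` threshold are all assumptions):

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(responses, min_count=2):
    """Count how often each pair of distinct words co-occurs in a response.

    Pairs at or above min_count would become weighted edges of the
    descriptor network fed into community detection or topic modeling.
    """
    counts = Counter()
    for text in responses:
        # Deduplicate and sort tokens so each pair has one canonical order.
        tokens = sorted(set(text.lower().split()))
        counts.update(combinations(tokens, 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Invented toy responses, not data from the study.
responses = [
    "rich culture long history",
    "long history rich culture strong economy",
    "strong economy advanced technology",
]
edges = cooccurrence_edges(responses)
```

On these toy responses, pairs such as ("culture", "history") survive the threshold because two participants used both words, while pairs seen only once are dropped; clusters of densely connected pairs are what the dimension-construction step would group into candidate dimensions.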
In Study 2, we re-invited 59 of the valid American participants from Study 1 (29 female; mean age = 47.29 years) to validate the automated measurement model. Using multiple advanced LLMs, including GPT-4o, DeepSeek-R1, Llama 3.3, and Qwen-max, we developed LLM-rating models to assess the national stereotype dimensions. Each model generated stereotype ratings independently; these ratings were then evaluated for human-model consistency, by comparison with human expert evaluations, and for temporal stability of the rating results across different time points. The results demonstrated high consistency between the LLM-generated ratings and the human expert evaluations across all dimensions. The models also showed strong temporal stability, producing similar ratings for free-description texts about the same country written by the same participant at different time points. These findings suggest that LLMs could be used for large-scale, automated psychological measurement, saving human and material resources while expanding the methodological possibilities for social cognition research.
The highlight of this study lies in its establishment of a new computational framework for constructing and measuring psychological dimensions, empowered by artificial intelligence. Traditional assessment approaches typically require constructing psychological scales based on theoretical assumptions, involving substantial effort in defining concepts, generating items, and conducting validation studies. In contrast, our LLM-rating paradigm bypasses scale development by directly leveraging the natural language processing capabilities of LLMs. These models extract meaningful psychological concepts directly from free-text responses, construct core dimensions from the extracted concepts, and then score the texts automatically. This approach not only enhances efficiency but also ensures adaptability, as it allows national stereotype assessment to evolve dynamically with societal change, grounded in a large corpus rather than constrained by static survey items.
In conclusion, this study introduces a computational paradigm for psychological assessment by integrating artificial intelligence and social psychological research. By leveraging LLMs throughout the entire process from dimension construction to automated measurement, our study underscores the potential of LLMs for social science research, which provides more scalable and objective approaches to measuring stereotypes and other psychological indicators. This work offers a new perspective on social cognition research and provides practical implications for interpersonal communication at the individual level and collaboration at the national level.

Key words

artificial intelligence / large language model / national stereotype / social cognition / psychological assessment

Cite this article

Yilin Wang, Nan Zhao, Tingshao Zhu. Empowering the Construction and Automated Measurement of Psychological Trait Dimensions with Artificial Intelligence: A Case Study of National Stereotypes[J]. Journal of Psychological Science, 2025, 48(4): 997-1008. https://doi.org/10.16719/j.cnki.1671-6981.20250419
