Journal of Psychological Science ›› 2025, Vol. 48 ›› Issue (4): 773-781. DOI: 10.16719/j.cnki.1671-6981.20250401
Computational Modeling and Artificial Intelligence


The Historical Origins of Large Language Models and Psychology*

  • Huang Linjieqiong1, Zhang Wen1,2, Chen Zhen1,2, Li Chenxi1,2, Li Xingshan1,2

Abstract

In recent years, large language models have demonstrated language understanding and generation abilities that rival those of humans, marking a major breakthrough in artificial intelligence. Throughout their development, large language models have had deep historical ties to psychology. First, artificial neural networks, the architectural foundation of these models, were used early on by psychologists to simulate human cognitive processes. Second, interdisciplinary exchange between computer science and psychology on the representation of lexical semantics drove the development of word embedding techniques. Third, large language models and humans show similar characteristics in real-time language processing. In addition, the overlap of researchers and the academic lineage across fields have been an important force propelling artificial intelligence research. Psychology's sustained work on language processing and cognitive modeling, together with its deep collaboration with computer science, has therefore made important contributions to breakthroughs in language intelligence. This offers an important lesson for researchers: interdisciplinary collaboration and sustained investment in basic research are the keys to driving innovation and achieving breakthroughs.

Extended Abstract

In recent years, large language models (LLMs) have made significant advancements. Through deep learning, LLMs have learned from vast amounts of human language data and demonstrated human-like language understanding and generation abilities. Through techniques such as supervised fine-tuning and reinforcement learning, LLMs can handle a variety of human tasks and generate text according to human intentions, marking a major breakthrough in the field of artificial intelligence (AI). This paper reviews the development of LLMs, demonstrates their historical roots in psychology, and highlights the critical role of interdisciplinary collaboration, offering insights for future research at the intersection of AI and psychology.
First, psychologists have played a foundational role in the development of artificial neural networks, the backbone of LLMs. Early researchers such as the neuropsychologist Donald Hebb and the psychologist Frank Rosenblatt focused on learning mechanisms within neural systems, thereby laying the groundwork for machine learning. Long before the deep learning era, psychologists extensively used artificial neural networks to model human cognition. Researchers such as James L. McClelland and David E. Rumelhart continuously refined network architectures to simulate language processing, fostering deep integration between psychology and artificial neural networks. These contributions provided essential theoretical and methodological foundations for the development of LLMs.
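
To make concrete the kind of learning mechanism Hebb and Rosenblatt were concerned with, the sketch below implements a Rosenblatt-style perceptron update rule on a toy logical-OR task; the task, learning rate, and variable names are illustrative assumptions, not details drawn from the original works.

```python
import numpy as np

# A Rosenblatt-style perceptron on a toy logical-OR task (illustrative only):
# weights change only when the thresholded prediction disagrees with the target,
# an error-driven update in the spirit of early work on neural learning rules.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # input patterns
y = np.array([0, 1, 1, 1])                      # OR targets

w = np.zeros(2)   # connection weights
b = 0.0           # bias (threshold term)
lr = 0.1          # learning rate (assumed value)

for epoch in range(10):
    for xi, ti in zip(X, y):
        pred = int(w @ xi + b > 0)   # threshold activation
        error = ti - pred            # +1, 0, or -1
        w += lr * error * xi         # adjust weights toward the target
        b += lr * error

print(w, b)  # a weight vector and bias that separate the OR classes
```

Hebbian learning, by contrast, strengthens connections between units that are active together without an explicit error signal; both ideas are ancestors of the gradient-based training used in modern networks.
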
Second, the technique of word embeddings is central to enabling LLMs to understand language, and its development has benefited from interdisciplinary collaboration among psychology, linguistics, and computer science. Word embedding techniques transform abstract language into a form that computers can represent and process. Early psychological and linguistic research introduced the concept of distributed representations of lexical semantics and developed initial quantitative methods. Psychologists later used large-scale corpora to construct high-dimensional semantic vectors, advancing semantic representation techniques. Computer scientists, building on this foundation, implemented these ideas via neural network-based embedding techniques capable of capturing contextual meaning. This evolution of lexical semantic representation methods paved the way for word embedding techniques that process massive text corpora rapidly and efficiently, contributing to major breakthroughs in language-related AI.
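
As a minimal illustration of the distributional idea behind these semantic vectors (words occurring in similar contexts acquire similar representations), the sketch below builds co-occurrence count vectors from a toy three-sentence corpus and compares words by cosine similarity, loosely in the spirit of HAL- or LSA-style semantic spaces; the corpus, window size, and lack of dimensionality reduction are simplifying assumptions.

```python
import numpy as np

# Toy corpus and context-window size are illustrative assumptions.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]
window = 2

# Build a vocabulary and a word-by-word co-occurrence count matrix.
tokens = [sentence.split() for sentence in corpus]
vocab = sorted({word for sentence in tokens for word in sentence})
index = {word: i for i, word in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))

for sentence in tokens:
    for i, word in enumerate(sentence):
        lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[index[word], index[sentence[j]]] += 1

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

# Words used in similar contexts end up with similar vectors.
print(cosine(counts[index["cat"]], counts[index["dog"]]))  # high
print(cosine(counts[index["cat"]], counts[index["on"]]))   # lower
```

Latent semantic analysis additionally applies dimensionality reduction to such count matrices, while word2vec-style models learn dense vectors directly by prediction; the counting version above is only the simplest member of that family.
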
Third, the algorithms of LLMs and the cognitive mechanisms of human language processing share several key characteristics, mainly in terms of incremental processing, predictive ability, and dynamic attention allocation. Although the real-time processing, active prediction, and selective attention mechanisms shaped by human biological evolution differ in their specifics from the autoregressive generation, masked prediction, and self-attention mechanisms used by LLMs, they exhibit a high degree of functional convergence. This convergence highlights the crucial role of language itself in the development of AI. The deep analogy between the two suggests that understanding the fundamental principles of language may be a vital pathway to achieving general intelligence. Therefore, psychological research into language processing mechanisms could provide essential theoretical foundations and practical guidance for the future development of AI.
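
To ground the functional parallel with dynamic attention allocation and left-to-right processing, the following sketch implements single-head scaled dot-product self-attention with a causal mask, the core operation behind Transformer-based LLMs (Vaswani et al., 2017); the random toy embeddings, sequence length, and single head are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sequence of 5 token embeddings of dimension 8 (illustrative sizes).
seq_len, d = 5, 8
X = rng.normal(size=(seq_len, d))

# Single attention head: queries, keys, and values are linear maps of X.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

# Scaled dot-product scores; the causal mask hides future tokens,
# mirroring left-to-right (autoregressive) generation.
scores = Q @ K.T / np.sqrt(d)
future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
scores[future] = -np.inf

# Softmax turns scores into attention weights over the visible context.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

output = weights @ V      # context-weighted mixture of value vectors
print(weights.round(2))   # each row sums to 1 over current and earlier tokens
```

The causal mask is what makes the computation incremental: each position can draw only on itself and earlier positions when forming its representation, a rough functional counterpart to humans processing language word by word.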

Key words

large language models / psychology / interdisciplinary cooperation / language cognition / artificial neural networks

Cite This Article

Huang Linjieqiong, Zhang Wen, Chen Zhen, Li Chenxi, Li Xingshan. The Historical Origins of Large Language Models and Psychology[J]. Journal of Psychological Science. 2025, 48(4): 773-781 https://doi.org/10.16719/j.cnki.1671-6981.20250401

References

[1] Zhang, B., Zhu, J., & Su, H. (2020). Toward the third generation of artificial intelligence. Scientia Sinica Informationis, 50(9), 1281-1302.
[2] Aher G. V., Arriaga R. I., & Kalai A. T. (2023). Using large language models to simulate multiple humans and replicate human subject studies. Proceedings of the 40th International Conference on Machine Learning.
[3] Altmann, G. T. M., & Kamide, Y. (1999). Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition, 73(3), 247-264.
[4] Birch, S., & Rayner, K. (1997). Linguistic focus affects eye movements during reading. Memory and Cognition, 25, 653-660.
[5] Brown T., Mann B., Ryder N., Subbiah M., Kaplan J. D., Dhariwal P., Neelakantan A., Shyam P., Sastry G., & Askell A. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901.
[6] Coltheart M., Rastle K., Perry C., Langdon R., & Ziegler J. (2001). DRC: A dual route cascaded model of visual word recognition and reading aloud. Psychological Review, 108(1), 204-256.
[7] Devlin J., Chang M. W., Lee K., & Toutanova K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
[8] Dongare A. D., Kharde R. R., & Kachare A. D. (2012). Introduction to artificial neural network. International Journal of Engineering and Innovative Technology, 2(1), 189-194.
[9] Firth, J. R. (1957). A synopsis of linguistic theory 1930-1955. In J. R. Firth (Ed.), Studies in linguistic analysis (pp. 1-32). Oxford.
[10] Frazier, L., & Rayner, K. (1982). Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology, 14(2), 178-210.
[11] Goldstein A., Zada Z., Buchnik E., Schain M., Price A., Aubrey B., Nastase S. A., & Hasson U. (2022). Shared computational principles for language processing in humans and deep language models. Nature Neuroscience, 25(3), 369-380.
[12] Gunel B., Du J., Conneau A., & Stoyanov V. (2020). Supervised contrastive learning for pre-trained language model fine-tuning. ArXiv.
[13] Harris, Z. S. (1954). Distributional structure. Word, 10(2-3), 146-162.
[14] Hebb, D. O. (1949). The organization of behavior. Wiley & Sons.
[15] Hyönä J., Lorch Jr R. F., & Kaakinen J. K. (2002). Individual differences in reading to summarize expository text: Evidence from eye fixation patterns. Journal of Educational Psychology, 94(1), 44-55.
[16] Hinton G. E., Osindero S., & Teh Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527-1554.
[17] Joseph, H. S. S. L., & Liversedge, S. P. (2013). Children's and adults' on-line processing of syntactically ambiguous sentences during reading. PLoS ONE, 8(1), e54141.
[18] Kintsch, W. (1988). The role of knowledge in discourse comprehension: A construction-integration model. Psychological Review, 95(2), 163-182.
[19] Krizhevsky A., Sutskever I., & Hinton G. E. (2012). ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, & K. Q. Weinberger (Eds.), Advances in neural information processing systems. Curran Associates, Inc.
[20] Kutas, M., & Hillyard, S. A. (1980). Reading senseless sentences: Brain potentials reflect semantic incongruity. Science, 207(4427), 203-205.
[21] Kintsch, W., & van Dijk, T. A. (1978). Toward a model of text comprehension and production. Psychological Review, 85(5), 363-394.
[22] Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211-240.
[23] Lund, K., & Burgess, C. (1996). Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, and Computers, 28(2), 203-208.
[24] Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. W. H. Freeman.
[25] McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18(1), 1-86.
[26] McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: I. An account of basic findings. Psychological Review, 88(5), 375-407.
[27] McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5, 115-133.
[28] Mikolov T., Chen K., Corrado G., & Dean J. (2013). Efficient estimation of word representations in vector space. ArXiv.
[29] Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39-41.
[30] Osgood, C. E. (1952). The nature and measurement of meaning. Psychological Bulletin, 49(3), 197-237.
[31] Osterhout, L., & Holcomb, P. J. (1992). Event-related brain potentials elicited by syntactic anomaly. Journal of Memory and Language, 31(6), 785-806.
[32] Ouerchefani R., Ouerchefani N., Rejeb M. R. B., & Le Gall D. (2024). Pragmatic language comprehension: Role of theory of mind, executive functions, and the prefrontal cortex. Neuropsychologia, 194, 108756.
[33] Ouyang L., Wu J., Jiang X., Almeida D., Wainwright C., Mishkin P., Zhang C., & Lowe R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730-27744.
[34] Radford A., Narasimhan K., Salimans T., & Sutskever I. (2018). Improving language understanding by generative pre-training. OpenAI.
[35] Radford A., Wu J., Child R., Luan D., Amodei D., & Sutskever I. (2019). Language models are unsupervised multitask learners. OpenAI.
[36] Rayner K., Pollatsek A., Ashby J., & Clifton C., Jr. (2012). Psychology of reading. Psychology Press.
[37] Rosenblatt, F. (1957). The perceptron: A perceiving and recognizing automaton (Project PARA). Cornell Aeronautical Laboratory.
[38] Rumelhart D. E., Hinton G. E., & Williams R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536.
[39] Rumelhart D. E., McClelland J. L., & PDP Research Group. (1986). Parallel distributed processing, volume 1: Explorations in the microstructure of cognition: Foundations. MIT Press.
[40] Seidenberg, M. S., & McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96(4), 523-568.
[41] Smith S., Patwary M., Norick B., LeGresley P., Rajbhandari S., Casper J., Liu Z., & Catanzaro B. (2022). Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model. ArXiv.
[42] Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A. N., Kaiser L., & Polosukhin I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5998-6008.
[43] Zhang C., Peng B., Sun X., Niu Q., Liu J., Chen K., Li M., Feng P., Bi Z., Liu M., Zhang Y., Fei C., Yin C. H., Yan L. K., & Wang T. (2024). From word vectors to multimodal embeddings: Techniques, applications, and future directions for large language models. ArXiv.

Funding

* This research was supported by the National Natural Science Foundation of China (32371156).
