The Historical Origins of Large Language Models and Psychology

Huang Linjieqiong, Zhang Wen, Chen Zhen, Li Chenxi, Li Xingshan

Journal of Psychological Science ›› 2025, Vol. 48 ›› Issue (4) : 773-781.

Computational modeling and artificial intelligence

Abstract

In recent years, large language models (LLMs) have made significant advancements. Through deep learning, LLMs have learned from vast amounts of human language data and demonstrated human-like language understanding and generation abilities. Through techniques such as supervised fine-tuning and reinforcement learning, LLMs can handle a variety of human tasks and generate text according to human intentions, marking a major breakthrough in the field of artificial intelligence (AI). This paper reviews the development of LLMs, demonstrates their historical roots in psychology, and highlights the critical role of interdisciplinary collaboration, offering insights for future research at the intersection of AI and psychology.
First, psychologists have played a foundational role in the development of artificial neural networks, the backbone of LLMs. Early researchers such as the neuropsychologist Donald Hebb and the psychologist Frank Rosenblatt focused on learning mechanisms within neural systems, thereby laying the groundwork for machine learning. Long before the deep learning era, psychologists used artificial neural networks extensively to model human cognition. Researchers such as James L. McClelland and David E. Rumelhart continuously refined network architectures to simulate language processing, fostering a deep integration between psychology and artificial neural networks. These contributions provided essential theoretical and methodological foundations for the development of LLMs.
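To make the notion of a "learning mechanism" concrete, the minimal sketch below implements a Rosenblatt-style perceptron (cf. Rosenblatt, 1957) trained with the classic error-correction rule. It is written for this summary rather than taken from the article; the toy data set, learning rate, and epoch count are illustrative assumptions.

    import numpy as np

    def train_perceptron(X, y, lr=0.1, epochs=20):
        """Learn weights w and bias b so that the prediction 1[w.x + b > 0] matches labels y in {0, 1}."""
        w = np.zeros(X.shape[1])
        b = 0.0
        for _ in range(epochs):
            for xi, target in zip(X, y):
                pred = 1 if xi @ w + b > 0 else 0
                # Error-correction update: adjust weights only when the prediction is wrong.
                w += lr * (target - pred) * xi
                b += lr * (target - pred)
        return w, b

    # Toy linearly separable problem (logical AND).
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 0, 0, 1])
    w, b = train_perceptron(X, y)
    print([1 if xi @ w + b > 0 else 0 for xi in X])  # -> [0, 0, 0, 1]

Hebbian learning differs in detail (it strengthens connections between co-active units rather than correcting errors), but both illustrate the shared idea of learning as incremental adjustment of connection weights.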
Second, the technique of word embeddings is central to enabling LLMs to understand language, and its development has benefited from interdisciplinary collaboration among psychology, linguistics, and computer science. Word embedding techniques enable abstract language to be transformed into a form that computers can understand and process. Early psychological and linguistic research introduced the concept of distributed representations of lexical semantics and developed initial quantitative methods. Psychologists later used large-scale corpora to construct high-dimensional semantic vectors, advancing semantic representation techniques. Computer scientists, building on this foundation, implemented these ideas via neural network-based embedding techniques capable of capturing contextual meaning. The evolution of lexical semantic representation methods has facilitated the development of word embedding techniques, enabling the rapid and efficient processing of massive text corpora and contributing to major breakthroughs in language-related AI.
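As a concrete illustration of distributed lexical representations, the sketch below builds count-based word vectors from a toy corpus and compares them by cosine similarity, in the spirit of HAL (Lund & Burgess, 1996) and latent semantic analysis (Landauer & Dumais, 1997). The corpus, window size, and dimensionality are invented for the example and are not material from the article.

    import numpy as np

    corpus = [
        "the cat sat on the mat",
        "the dog sat on the rug",
        "the cat chased the dog",
    ]
    window = 2  # co-occurrence counted within +/- 2 words

    # Word-by-word co-occurrence counts.
    vocab = sorted({w for sent in corpus for w in sent.split()})
    idx = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for sent in corpus:
        words = sent.split()
        for i, w in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    counts[idx[w], idx[words[j]]] += 1

    # Reduce dimensionality with SVD (as in latent semantic analysis).
    U, S, _ = np.linalg.svd(counts, full_matrices=False)
    vectors = U[:, :3] * S[:3]

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Words occurring in similar contexts ("cat", "dog") typically end up
    # closer than pairs with dissimilar contexts ("cat", "on").
    print(cosine(vectors[idx["cat"]], vectors[idx["dog"]]))
    print(cosine(vectors[idx["cat"]], vectors[idx["on"]]))

Neural embedding methods such as word2vec (Mikolov et al., 2013) replace the explicit counting step with learned parameters, but the underlying distributional idea is the same.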
Third, the algorithms of LLMs and cognitive mechanisms of human language processing share several key characteristics, mainly in terms of incremental processing, predictive ability, and dynamic attention allocation. Although the real-time processing, active prediction, and selective attention mechanisms shaped by human biological evolution differ in specifics from the autoregressive generation, masked prediction, and self-attention mechanisms used by LLMs, they exhibit a high degree of functional convergence. This convergence highlights the crucial role of language itself in the development of AI. The deep analogy between the two suggests that understanding the fundamental principles of language may be a vital pathway to achieving general intelligence. Therefore, psychological research into language processing mechanisms could provide essential theoretical foundations and practical guidance for the future development of AI.
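As a concrete reference point for the self-attention and autoregressive mechanisms mentioned above, the following sketch implements single-head scaled dot-product attention with a causal mask (cf. Vaswani et al., 2017). The shapes, random inputs, and function name are illustrative assumptions, not code from the article.

    import numpy as np

    def causal_self_attention(X, Wq, Wk, Wv):
        """X: (seq_len, d_model). Returns contextualized representations of X."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        d_k = K.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                     # pairwise relevance of tokens
        mask = np.triu(np.ones_like(scores), k=1).astype(bool)
        scores[mask] = -np.inf                              # each token attends only to the past
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)      # softmax over attended positions
        return weights @ V

    rng = np.random.default_rng(0)
    seq_len, d_model, d_head = 5, 8, 4
    X = rng.normal(size=(seq_len, d_model))                 # stand-in for token embeddings
    Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
    print(causal_self_attention(X, Wq, Wk, Wv).shape)       # (5, 4)

Functionally, the causal mask mirrors the left-to-right, incremental character of human reading, while the softmax weights are loosely analogous to the selective allocation of attention over context.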

Key words

large language models / psychology / interdisciplinary cooperation / language cognition / artificial neural networks

Cite this article

Huang Linjieqiong, Zhang Wen, Chen Zhen, Li Chenxi, Li Xingshan. The Historical Origins of Large Language Models and Psychology[J]. Journal of Psychological Science, 2025, 48(4): 773-781. https://doi.org/10.16719/j.cnki.1671-6981.20250401

References

[1] Zhang B., Zhu J., & Su H. (2020). Toward the third generation of artificial intelligence (in Chinese). Scientia Sinica Informationis, 50(9), 1281-1302.
[2] Aher G. V., Arriaga R. I., & Kalai A. T. (2023). Using large language models to simulate multiple humans and replicate human subject studies. In Proceedings of the 40th International Conference on Machine Learning.
[3] Altmann, G. T. M., & Kamide, Y. (1999). Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition, 73(3), 247-264.
[4] Birch, S., & Rayner, K. (1997). Linguistic focus affects eye movements during reading. Memory and Cognition, 25, 653-660.
[5] Brown T., Mann B., Ryder N., Subbiah M., Kaplan J. D., Dhariwal P., Neelakantan A., Shyam P., Sastry G., & Askell A. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901.
[6] Coltheart M., Rastle K., Perry C., Langdon R., & Ziegler J. (2001). DRC: A dual route cascaded model of visual word recognition and reading aloud. Psychological Review, 108(1), 204-256.
[7] Devlin J., Chang M. W., Lee K., & Toutanova K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
[8] Dongare A. D., Kharde R. R., & Kachare A. D. (2012). Introduction to artificial neural network. International Journal of Engineering and Innovative Technology, 2(1), 189-194.
[9] Firth, J. R. (1957). A synopsis of linguistic theory 1930-1955. In J. R. Firth (Ed.), Studies in linguistic analysis (pp. 1-32). Oxford.
[10] Frazier, L., & Rayner, K. (1982). Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology, 14(2), 178-210.
[11] Goldstein A., Zada Z., Buchnik E., Schain M., Price A., Aubrey B., Nastase S. A., & Hasson U. (2022). Shared computational principles for language processing in humans and deep language models. Nature Neuroscience, 25(3), 369-380.
[12] Gunel B., Du J., Conneau A., & Stoyanov V. (2020). Supervised contrastive learning for pre-trained language model fine-tuning. ArXiv.
[13] Harris, Z. S. (1954). Distributional structure. Word, 10(2-3), 146-162.
[14] Hebb, D. O. (1949). The organization of behavior. Wiley & Sons.
[15] Hyönä J., Lorch Jr R. F., & Kaakinen J. K. (2002). Individual differences in reading to summarize expository text: Evidence from eye fixation patterns. Journal of Educational Psychology, 94(1), 44-55.
[16] Hinton G. E., Osindero S., & Teh Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527-1554.
[17] Joseph, H. S. S. L., & Liversedge, S. P. (2013). Children's and adults' on-line processing of syntactically ambiguous sentences during reading. PLoS ONE, 8(1), e54141.
[18] Kintsch, W. (1988). The role of knowledge in discourse comprehension: A construction-integration model. Psychological Review, 95(2), 163-182.
[19] Krizhevsky A., Sutskever I., & Hinton G. E. (2012). ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, & K. Q. Weinberger (Eds.), Advances in neural information processing systems. Curran Associates, Inc.
[20] Kutas, M., & Hillyard, S. A. (1980). Reading senseless sentences: Brain potentials reflect semantic incongruity. Science, 207(4427), 203-205.
[21] Kintsch, W., & van Dijk, T. A. (1978). Toward a model of text comprehension and production. Psychological Review, 85(5), 363-394.
[22] Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211-240.
[23] Lund, K., & Burgess, C. (1996). Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, and Computers, 28(2), 203-208.
[24] Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. W. H. Freeman.
[25] McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18(1), 1-86.
[26] McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: I. An account of basic findings. Psychological Review, 88(5), 375-407.
[27] McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5, 115-133.
[28] Mikolov T., Chen K., Corrado G., & Dean J. (2013). Efficient estimation of word representations in vector space. ArXiv.
[29] Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39-41.
[30] Osgood, C. E. (1952). The nature and measurement of meaning. Psychological Bulletin, 49(3), 197-237.
[31] Osterhout, L., & Holcomb, P. J. (1992). Event-related brain potentials elicited by syntactic anomaly. Journal of Memory and Language, 31(6), 785-806.
[32] Ouerchefani R., Ouerchefani N., Rejeb M. R. B., & Le Gall D. (2024). Pragmatic language comprehension: Role of theory of mind, executive functions, and the prefrontal cortex. Neuropsychologia, 194, 108756.
[33] Ouyang L., Wu J., Jiang X., Almeida D., Wainwright C., Mishkin P., Zhang C., & Lowe R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730-27744.
[34] Radford A., Narasimhan K., Salimans T., & Sutskever I. (2018). Improving language understanding by generative pre-training. OpenAI.
[35] Radford A., Wu J., Child R., Luan D., Amodei D., & Sutskever I. (2019). Language models are unsupervised multitask learners. OpenAI.
[36] Rayner K., Pollatsek A., Ashby J., & Clifton Jr., C. (2012). Psychology of reading. Psychology Press.
[37] Rosenblatt, F. (1957). The perceptron: A perceiving and recognizing automaton (Project PARA). Cornell Aeronautical Laboratory.
[38] Rumelhart D. E., Hinton G. E., & Williams R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536.
[39] Rumelhart D. E., McClelland J. L., & the PDP Research Group. (1986). Parallel distributed processing, Volume 1: Explorations in the microstructure of cognition: Foundations. MIT Press.
[40] Seidenberg, M. S., & McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96(4), 523-568.
[41] Smith S., Patwary M., Norick B., LeGresley P., Rajbhandari S., Casper J., Liu Z., & Catanzaro B. (2022). Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model. ArXiv.
[42] Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A. N., Kaiser L., & Polosukhin I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5998-6008.
[43] Zhang C., Peng B., Sun X., Niu Q., Liu J., Chen K., Li M., Feng P., Bi Z., Liu M., Zhang Y., Fei C., Yin C. H., Yan L. K., & Wang T. (2024). From word vectors to multimodal embeddings: Techniques, applications, and future directions for large language models. ArXiv.