Journal of Psychological Science ›› 2025, Vol. 48 ›› Issue (6): 1346-1358. DOI: 10.16719/j.cnki.1671-6981.20250605

Computational Modeling and Artificial Intelligence

AI-Driven Information Reconstruction: Bridging Visual Perception and Memory

Zhang Guoxin1,2, Wang Wenqiang1,2, Li Feng**3, Wang Benchi**1,2

Abstract

Visual reconstruction aims to decode the neural activity evoked by external visual stimuli and translate it into visual content; memory reconstruction, in turn, seeks to reproduce the internal neural activity present during memory retrieval and generate the corresponding memory content. Traditional reconstruction research focused on multivariate pattern analysis of brain signals. In recent years, with advances in deep learning, a range of generative models (e.g., variational autoencoders, generative adversarial networks, and diffusion models) has been widely applied to visual and memory reconstruction. These methods nonetheless still face substantial challenges in reconstruction quality and generalization. This paper systematically reviews the technical development of visual reconstruction, analyzes the current state and limitations of memory reconstruction research, and proposes several strategies for improving memory reconstruction, with the aim of advancing our understanding of the neural mechanisms underlying visual processing and memory storage in the brain.
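To make the contrast between the traditional and generative approaches concrete, here is a minimal sketch of the multivariate-pattern-analysis style of decoding that early reconstruction work relied on: a cross-validated linear classifier trained on voxel response patterns to predict the stimulus category of each trial. All dimensions, variable names, and the simulated data below are illustrative assumptions, not any published analysis.

```python
# Minimal MVPA-style decoding sketch (synthetic data, illustrative only):
# predict each trial's stimulus category from a multivoxel fMRI response
# pattern using a cross-validated linear classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_voxels = 200, 500                  # assumed toy dimensions
labels = rng.integers(0, 2, n_trials)          # two stimulus categories
category_pattern = rng.normal(0, 1, n_voxels)  # category-specific signal
# Each trial = weak category-dependent pattern + measurement noise.
patterns = 0.3 * np.outer(labels, category_pattern) \
    + rng.normal(0, 1, (n_trials, n_voxels))

clf = LogisticRegression(max_iter=1000)
acc = cross_val_score(clf, patterns, labels, cv=5).mean()
print(f"Cross-validated decoding accuracy: {acc:.2f}")  # above 0.5 chance
```

Classification of this kind identifies stimulus content but does not itself generate an image; that is the gap the generative models discussed in this review are meant to close.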

Extended Abstract

This systematic review traces the evolution of neural decoding methodologies, from early visual feature extraction to the frontier of reconstructing memory and mental states. Advances in computational neuroscience and artificial intelligence have driven this progress, with sophisticated generative models redefining how neural signals are interpreted and translated.
Early pioneering studies in visual perception laid the groundwork for neural decoding by demonstrating the reconstruction of basic visual elements—such as oriented edges and simple geometric shapes—from functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) data. These foundational efforts, while groundbreaking, were limited by simple stimulus designs and reliance on handcrafted feature representations. The advent of deep learning marked a transformative shift, enabling more flexible and nuanced interpretations of neural activity. Initial applications of variational autoencoders and generative adversarial networks established frameworks for mapping neural signals to latent representations, though challenges in reconstruction accuracy and model stability persisted.
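As a deliberately simplified illustration of that latent-mapping framework, the sketch below defines a small variational autoencoder in PyTorch that compresses a flattened neural-signal vector into a low-dimensional latent code and reconstructs it. The architecture, dimensions, and names are assumptions chosen for exposition, not a reproduction of any model cited in this review.

```python
# Hypothetical VAE for mapping a flattened neural-signal vector (e.g., fMRI
# voxels) to a latent representation and back; a sketch of the latent-mapping
# idea, not a published architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SignalVAE(nn.Module):
    def __init__(self, n_voxels: int = 4000, latent_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_voxels, 512), nn.ReLU())
        self.to_mu = nn.Linear(512, latent_dim)
        self.to_logvar = nn.Linear(512, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(), nn.Linear(512, n_voxels))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to a standard-normal prior.
    rec = F.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
```

In a GAN-based variant, the decoder would instead be trained adversarially against a discriminator, which is the source of the model-stability problems noted above.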
Recent advances in diffusion models have ushered in a new era of precision, significantly enhancing the translation of neural signals into detailed visual outputs. Modern frameworks now leverage biologically inspired, hierarchical architectures that separately process low-level visual features and high-level semantic content, closely mirroring the organization of the human visual system. These developments have enabled the reconstruction of complex naturalistic scenes, human faces, and dynamic visual experiences, bringing neural signal interpretation closer to practical applications.
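A schematic of the first stage of such a two-stream pipeline, under stated assumptions: a ridge regression maps voxel patterns into a pretrained embedding space (a stand-in for, e.g., a CLIP image embedding), and the predicted embedding would then condition a frozen diffusion model's denoising process. Everything below is synthetic and illustrative; `emb_dim`, the data, and the final conditioning step are assumptions, not a specific published system.

```python
# Illustrative first stage of a diffusion-based reconstruction pipeline:
# learn a linear (ridge) mapping from synthetic voxel patterns into a
# pretrained embedding space, then evaluate held-out predictions.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n_trials, n_voxels, emb_dim = 300, 1000, 768        # assumed dimensions
true_map = rng.normal(0, 0.1, (n_voxels, emb_dim))  # unknown brain-to-embedding map
voxels = rng.normal(0, 1, (n_trials, n_voxels))
embeddings = voxels @ true_map + rng.normal(0, 0.5, (n_trials, emb_dim))

reg = Ridge(alpha=10.0).fit(voxels[:250], embeddings[:250])
pred = reg.predict(voxels[250:])

# Cosine similarity between predicted and target embeddings (held-out trials).
cos = np.sum(pred * embeddings[250:], axis=1) / (
    np.linalg.norm(pred, axis=1) * np.linalg.norm(embeddings[250:], axis=1))
print(f"Mean held-out cosine similarity: {cos.mean():.2f}")
# In a full pipeline, `pred` would replace the image/text embedding that
# guides a frozen diffusion model's denoising steps, while a separate
# low-level stream supplies coarse structure (layout, color), mirroring
# the hierarchical organization described above.
```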
Venturing further, the field now confronts the more ambitious challenge of reconstructing memory and mental imagery—a task far more complex than visual perception decoding. Unlike perception studies that rely on controlled external stimuli, memory decoding must grapple with internally generated, highly subjective, and context-dependent neural signals. Preliminary research reveals intriguing parallels between neural patterns associated with recalled and perceived experiences, yet significant hurdles remain. These include the noisy and variable nature of memory-related signals, individual differences in neural encoding, and the dynamic, context-sensitive character of memory retrieval.
Technical challenges in memory reconstruction remain formidable. Current approaches struggle to generalize across diverse memory types and individual neural encoding variations, limiting their applicability. Non-invasive neuroimaging techniques suffer from insufficient temporal resolution and signal clarity, obscuring the fine-grained dynamics of memory processes. Furthermore, existing neural signal encoders—typically optimized for simpler, stimulus-driven tasks like visual perception—often fail to capture the complex, distributed brain activity underlying memory. New theoretical frameworks are needed to model the reconstructive, dynamic, and context-dependent nature of human memory, which fundamentally differs from the linear stimulus-response dynamics characteristic of visual perception.
Addressing these challenges demands several critical research priorities. First, developing advanced neuroimaging technologies with improved spatiotemporal resolution is essential to capture intricate, dynamic patterns of memory encoding. Second, designing more sophisticated neural signal encoders will better model the distributed and context-sensitive nature of memory processes. Third, creating large-scale, standardized neuroimaging datasets is vital for training robust, generalizable models that account for individual variability and diverse memory types. These efforts collectively aim to bridge current limitations and achieve accurate, generalizable memory reconstruction.
Successful reconstruction of memory and mental states could fundamentally transform our understanding of human cognition, enable novel treatments for memory-related disorders, and revolutionize human-computer interfaces. However, these advancements also raise profound philosophical questions regarding the nature of memory, personal identity, and consciousness. As neural decoding approaches this pivotal juncture, researchers must carefully balance its transformative potential against technical, ethical, and societal challenges. The journey from visual reconstruction to comprehensive mental state decoding represents a groundbreaking endeavor at the intersection of neuroscience and artificial intelligence, with the potential to reshape our understanding of the human mind.

Keywords

multivariate pattern analysis / artificial intelligence / visual reconstruction / memory reconstruction / generative model

Cite this article

Zhang Guoxin, Wang Wenqiang, Li Feng, Wang Benchi. AI-Driven Information Reconstruction: Bridging Visual Perception and Memory[J]. Journal of Psychological Science, 2025, 48(6): 1346-1358. https://doi.org/10.16719/j.cnki.1671-6981.20250605


Funding

*This research was supported by a Major Project of the National Social Science Fund of China (20&ZD296).
