The Performance of Deep Convolutional Neural Networks in Face Recognition and the Comparison with the Human Visual System

Cheng Yuhui, Shen Tianyu, Lu Zitong, Yuan Xiangyong, Jiang Yi

Journal of Psychological Science ›› 2025, Vol. 48 ›› Issue (4): 814-825. DOI: 10.16719/j.cnki.1671-6981.20250405
Computational modeling and artificial intelligence


Abstract

Face recognition is a fundamental cognitive function that plays a crucial role in human social interaction, as the human brain exhibits a remarkable sensitivity to facial stimuli. For decades, psychologists, cognitive neuroscientists, and computer vision researchers have been dedicated to uncovering the behavioral and neural mechanisms underlying face processing. Existing studies have demonstrated that humans process faces differently from other objects, supporting the existence of highly specialized mechanisms for face perception. In particular, the fusiform face area (FFA) in the human brain has been identified as a specialized region for face recognition, and numerous face-selective neurons have been observed in the temporal lobe of macaques. In recent years, Deep Convolutional Neural Networks (DCNNs), a class of artificial neural networks that achieve impressive performance in visual recognition tasks, have provided new computational perspectives for exploring the neural mechanisms underlying face recognition. These models typically apply a series of convolutional and pooling operations to extract increasingly abstract features, which are then passed through one or more fully connected layers to perform classification. Consequently, there has been growing interest in applying DCNNs to face recognition.
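The convolution-pooling-fully-connected pipeline described above can be sketched in a few lines of plain Python. This is an illustrative toy only, with a hand-set edge-detecting kernel and arbitrary dense weights; it is not the architecture of any model reviewed here, and real DCNNs learn their kernels from data and stack many such layers.

```python
def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation, as in most DCNN libraries),
    followed by a ReLU nonlinearity."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            s = sum(image[r + i][c + j] * kernel[i][j]
                    for i in range(kh) for j in range(kw))
            row.append(max(0.0, s))  # ReLU: keep only positive responses
        out.append(row)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling, shrinking each spatial dimension."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

def fully_connected(features, weights, bias):
    """One dense output unit: a weighted sum of the flattened feature map."""
    return sum(f * w for f, w in zip(features, weights)) + bias

# A toy 5x5 "image" containing a bright vertical edge.
image = [[0, 0, 1, 1, 1]] * 5
edge_kernel = [[-1, 1], [-1, 1]]           # responds to vertical edges

fmap = conv2d(image, edge_kernel)          # 4x4 feature map
pooled = max_pool(fmap)                    # 2x2 after pooling
flat = [v for row in pooled for v in row]  # flatten for the dense layer
score = fully_connected(flat, [0.5] * len(flat), 0.0)
```

Stacking such stages is what yields the increasingly abstract features the abstract refers to: early layers respond to edges, later layers to conjunctions of features, and the final dense layers to identity-level information.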
First, this review examines the performance of DCNNs in identifying key facial attributes. Although most DCNNs are trained only for face identity tasks, they can still infer social information such as gender and expression. In addition, this review discusses the similarities and differences between DCNNs and humans in well-known face processing phenomena, such as the inversion, own-race, and familiarity effects. Evidence suggests that DCNNs can produce face-specific cognitive effects similar to those observed in humans. To better understand the computational validity of DCNNs, this review compares their internal representations with the neural mechanisms involved in human face recognition. On the one hand, this paper analyzes the hierarchical processing architecture that emerges in trained DCNNs and evaluates its correspondence with the hierarchical structure of the human visual system, spanning from early visual areas (e.g., V1-V4) to higher-level face-selective regions such as the FFA. On the other hand, this review further discusses evidence for brain-like functional specialization within DCNNs, examining whether units selective to different facial attributes can be mapped onto the functionally specialized cortical areas observed in neuroimaging and electrophysiological studies.
Lastly, this paper highlights several limitations of current models and outlines promising directions for future research. First, although DCNNs excel at face recognition, they remain far less robust than humans when faced with challenges such as viewpoint shifts, image distortions, adversarial perturbations, and limited training data. Second, although DCNNs exhibit behavioral effects like those observed in humans, there are multiple possible explanations for the underlying mechanisms responsible for these phenomena. The DCNN models examined in different studies often vary in architecture, task objectives, and training datasets, which may limit the comparability of their results. Third, the extent to which current models capture essential features of the biological visual system remains unclear. Specifically, many DCNNs operate as feedforward architectures and lack critical elements such as recurrent processing, top-down feedback, and dynamic attentional modulation, all of which are fundamental characteristics of the human visual system. Fourth, current neural network models primarily focus on the perceptual stage of face recognition. Future research should aim to incorporate semantic-level processing to more fully capture the complexity of human face perception. Fifth, Generative Adversarial Networks (GANs) have recently attracted significant attention as powerful tools for generating diverse facial stimuli, enabling more controlled and flexible investigations of face perception. Integrating GANs with DCNNs has also enhanced our understanding of the mechanisms underlying facial representation, making it a promising direction for future research.

Key words

face recognition / convolutional neural network / fusiform face area / hierarchical structure / functional specialization

Cite this article

Cheng Yuhui, Shen Tianyu, Lu Zitong, Yuan Xiangyong, Jiang Yi. The Performance of Deep Convolutional Neural Networks in Face Recognition and the Comparison with the Human Visual System[J]. Journal of Psychological Science, 2025, 48(4): 814-825. https://doi.org/10.16719/j.cnki.1671-6981.20250405

References

[1] Baek S., Song M., Jang J., Kim G., & Paik S. B. (2021). Face detection in untrained deep neural networks. Nature Communications, 12(1), 7328.
[2] Behrmann, M., & Avidan, G. (2022). Face perception: Computational insights from phylogeny. Trends in Cognitive Sciences, 26(4), 350-363.
[3] Belhumeur P. N., Hespanha J. P., & Kriegman D. J. (1997). Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), 711-720.
[4] Blauch N. M., Behrmann M., & Plaut D. C. (2021). Computational insights into human perceptual expertise for familiar and unfamiliar face recognition. Cognition, 208, 104341.
[5] Blauch N. M., Behrmann M., & Plaut D. C. (2022). A connectivity-constrained computational account of topographic organization in primate high-level visual cortex. Proceedings of the National Academy of Sciences, 119(3), e2112566119.
[6] Bothwell R. K., Brigham J. C., & Malpass R. S. (1989). Cross-racial identification. Personality and Social Psychology Bulletin, 15(1), 19-25.
[7] Cadieu C. F., Hong H., Yamins D. L. K., Pinto N., Ardila D., Solomon E. A., DiCarlo J. J. (2014). Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLOS Computational Biology, 10(12), e1003963.
[8] Calder A. J. (2011). Oxford handbook of face perception. Oxford University Press.
[9] Colón Y. I., Castillo C. D., & O’Toole A. J. (2021). Facial expression is retained in deep networks trained for face identification. Journal of Vision, 21(4), 4.
[10] Dahl C. D., Logothetis N. K., Bülthoff H. H., & Wallraven C. (2010). The thatcher illusion in humans and monkeys. Proceedings of the Royal Society B: Biological Sciences, 277(1696), 2973-2981.
[11] Desimone, R. (1991). Face-selective cells in the temporal cortex of monkeys. Journal of Cognitive Neuroscience, 3(1), 1-8.
[12] Dhar P., Bansal A., Castillo C. D., Gleason J., Phillips P. J., & Chellappa R. (2020). How are attributes expressed in face DCNNs? Paper presented at the 2020 15th IEEE international conference on automatic face and gesture recognition.
[13] Dobs K., Isik L., Pantazis D., & Kanwisher N. (2019). How face perception unfolds over time. Nature Communications, 10(1), 1258.
[14] Dobs K., Martinez J., Kell A. J. E., & Kanwisher N. (2022). Brain-like functional specialization emerges spontaneously in deep neural networks. Science Advances, 8(11), eabl8913.
[15] Dobs K., Yuan J., Martinez J., & Kanwisher N. (2023). Behavioral signatures of face perception emerge in deep neural networks optimized for face recognition. Proceedings of the National Academy of Sciences, 120(32), e2220642120.
[16] Dong Y., Ruan S., Su H., Kang C., Wei X., & Zhu J. (2022). Viewfool: Evaluating the robustness of visual recognition to adversarial viewpoints. Advances in Neural Information Processing Systems, 35, 36789-36803.
[17] Eickenberg M., Gramfort A., Varoquaux G., & Thirion B. (2017). Seeing it all: Convolutional network layers map the function of the human visual system. NeuroImage, 152, 184-194.
[18] Geirhos R., Temme C. R. M., Rauber J., Schütt H. H., Bethge M., & Wichmann F. A. (2018). Generalisation in humans and deep neural networks. ArXiv.
[19] Grossman S., Gaziv G., Yeagle E. M., Harel M., Mégevand P., Groppe D. M., Khucis S., Herrero J. L., & Mehta A. D. (2019). Convergent evolution of face spaces across human face-selective neuronal groups and deep convolutional networks. Nature Communications, 10(1), 4934.
[20] Gu J., Yang X., De Mello S., & Kautz J. (2017). Dynamic facial analysis: From bayesian filtering to recurrent neural network. Proceedings of the IEEE conference on computer vision and pattern recognition.
[21] Güçlü, U., & Van Gerven, M. A. J. (2015). Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. Journal of Neuroscience, 35(27), 10005-10014.
[22] Gupta, P., & Dobs, K. (2025). Human-like face pareidolia emerges in deep neural networks optimized for face and object recognition. PLOS Computational Biology, 21(1), e1012751.
[23] Hadjikhani N., Kveraga K., Naik P., & Ahlfors S. P. (2009). Early (M170) activation of face-specific cortex by face-like objects. Neuroreport, 20(4), 403-407.
[24] Haxby J. V., Hoffman E. A., & Gobbini M. I. (2000). The distributed human neural system for face perception. Trends in Cognitive Sciences, 4(6), 223-233.
[25] He K., Zhang X., Ren S., & Sun J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition.
[26] Hill M. Q., Parde C. J., Castillo C. D., Colon Y. I., Ranjan R., Chen J. C., Blanz, V. & O’Toole, A. J. (2019). Deep convolutional neural networks in the face of caricature. Nature Machine Intelligence, 1(11), 522-529.
[27] Jacob G., Pramod R. T., Katti H., & Arun S. P. (2021). Qualitative similarities and differences in visual object representations between brains and deep networks. Nature Communications, 12(1), 1872.
[28] Jiahui G., Feilong M., Visconti di Oleggio Castello, M., Nastase S. A., Haxby J. V., & Gobbini M. I. (2023). Modeling naturalistic face processing in humans with deep convolutional neural networks. Proceedings of the National Academy of Sciences, 120(43), e2304085120.
[29] Kadosh, K. C., & Johnson, M. H. (2007). Developing a cortex specialized for face perception. Trends in Cognitive Sciences, 11(9), 367-369.
[30] Kanwisher, N. (2000). Domain specificity in face perception. Nature Neuroscience, 3(8), 759-763.
[31] Kanwisher N., Gupta P., & Dobs K. (2023). CNNs reveal the computational implausibility of the expertise hypothesis. iScience, 26(2), 105976.
[32] Kanwisher N., McDermott J., & Chun M. M. (1997). The fusiform face area: A module in human extrastriate cortex specialized for face perception. Journal of Neuroscience, 17(11), 4302-4311.
[33] Khaligh-Razavi, S.-M., & Kriegeskorte, N. (2014). Deep supervised, but not unsupervised, models may explain IT cortical representation. PLOS Computational Biology, 10(11), e1003915.
[34] Kietzmann T. C., Spoerer C. J., Sörensen L. K. A., Cichy R. M., Hauk O., & Kriegeskorte N. (2019). Recurrence is required to capture the representational dynamics of the human visual system. Proceedings of the National Academy of Sciences, 116(43), 21854-21863.
[35] Konkle, T., & Alvarez, G. A. (2022). A self-supervised domain-general learning framework for human ventral stream representation. Nature Communications, 13(1), 491.
[36] Kramer R. S. S., Young A. W., & Burton A. M. (2018). Understanding face familiarity. Cognition, 172, 46-58.
[37] Kravitz D. J., Saleem K. S., Baker C. I., Ungerleider L. G., & Mishkin M. (2013). The ventral visual pathway: An expanded neural framework for the processing of object quality. Trends in Cognitive Sciences, 17(1), 26-49.
[38] Krizhevsky A., Sutskever I., & Hinton G. E. (2012). Imagenet classification with deep convolutional neural networks. Curran Associates Inc.
[39] Lago F., Pasquini C., Böhme R., Dumont H., Goffaux V., & Boato G. (2021). More real than real: A study on human visual perception of synthetic faces [applications corner]. IEEE Signal Processing Magazine, 39(1), 109-116.
[40] Lake B. M., Salakhutdinov R., & Tenenbaum J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266), 1332-1338.
[41] LeCun Y., Bengio Y., & Hinton G. (2015). Deep learning. Nature, 521(7553), 436-444.
[42] Li Y., Zheng W., Cui Z., & Zhang T. (2018). Face recognition based on recurrent regression neural network. Neurocomputing, 297, 50-58.
[43] Lu, Z., & Wang, Y. (2025). Category-selective neurons in deep networks: Comparing purely visual and visual-language models. ArXiv.
[44] Luo A., Henderson M., Wehbe L., & Tarr M. (2023). Brain diffusion for visual exploration: Cortical discovery using large scale generative models. Advances in Neural Information Processing Systems, 36, 75740-75781.
[45] Luo A. F., Henderson M. M., Tarr M. J., & Wehbe L. (2023). Brainscuba: Fine-grained natural language captions of visual cortex selectivity. ArXiv.
[46] Madry A., Makelov A., Schmidt L., Tsipras D., & Vladu A. (2017). Towards deep learning models resistant to adversarial attacks. ArXiv.
[47] Margalit E., Lee H., Finzi D., DiCarlo J. J., Grill-Spector K., & Yamins, D. L. K. (2024). A unifying framework for functional organization in early and higher ventral visual cortex. Neuron, 112(14), 2435-2451.
[48] McKone E., Kanwisher N., & Duchaine B. C. (2007). Can generic expertise explain special processing for faces? Trends in Cognitive Sciences, 11(1), 8-15.
[49] Nestor A., Plaut D. C., & Behrmann M. (2016). Feature-based face representations and image reconstruction from behavioral and neural data. Proceedings of the National Academy of Sciences, 113(2), 416-421.
[50] O'Toole, A. J., & Castillo, C. D. (2021). Face recognition by humans and machines: Three fundamental advances from deep learning. Annual Review of Vision Science, 7(1), 543-570.
[51] O'Toole A. J., Roark D. A., & Abdi H. (2002). Recognizing moving faces: A psychological and neural synthesis. Trends in Cognitive Sciences, 6(6), 261-266.
[52] O’Toole A. J., Abdi H., Deffenbacher K. A., & Valentin D. (1993). Low-dimensional representation of faces in higher dimensions of the face space. Journal of the Optical Society of America A, 10(3), 405-411.
[53] O’Toole A. J., Castillo C. D., Parde C. J., Hill M. Q., & Chellappa R. (2018). Face space representations in deep convolutional neural networks. Trends in Cognitive Sciences, 22(9), 794-809.
[54] O’Toole A. J., Deffenbacher K. A., Valentin D., McKee K., Huff D., & Abdi H. (1998). The perception of face gender: The role of stimulus structure in recognition and classification. Memory and Cognition, 26, 146-160.
[55] Phillips, P. J., & O'Toole, A. J. (2014). Comparison of human and computer performance across face recognition experiments. Image and Vision Computing, 32(1), 74-85.
[56] Phillips P. J., Yates A. N., Hu Y., Hahn C. A., Noyes E., Jackson K., & Sankaranarayanan S. (2018). Face recognition accuracy of forensic examiners, superrecognizers, and face recognition algorithms. Proceedings of the National Academy of Sciences, 115(24), 6171-6176.
[57] Prince J. S., Alvarez G. A., & Konkle T. (2024). Contrastive learning explains the emergence and function of visual category-selective regions. Science Advances, 10(39), eadl1776.
[58] Raman, R., & Hosoya, H. (2020). Convolutional neural networks explain tuning properties of anterior, but not middle, face-processing areas in macaque inferotemporal cortex. Communications Biology, 3(1), 221.
[59] Ratan Murty N. A., Bashivan P., Abate A., DiCarlo J. J., & Kanwisher N. (2021). Computational models of category-selective brain regions enable high-throughput tests of selectivity. Nature Communications, 12(1), 5540.
[60] Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11), 1019-1025.
[61] Rolls, E. T., & Milward, T. (2000). A model of invariant object recognition in the visual system: Learning rules, activation functions, lateral inhibition, and information-based performance measures. Neural Computation, 12(11), 2547-2572.
[62] Rossion, B. (2014). Understanding face perception by means of human electrophysiology. Trends in Cognitive Sciences, 18(6), 310-318.
[63] Shoham A., Grosbard I. D., Patashnik O., Cohen-Or D., & Yovel G. (2024). Using deep neural networks to disentangle visual and semantic information in human perception and memory. Nature Human Behaviour, 8(4), 702-717.
[64] Shoura M., Walther D. B., & Nestor A. (2025). Unraveling other-race face perception with GAN-based image reconstruction. Behavior Research Methods, 57(4), 1-14.
[65] Simonyan, K., & Zisserman, A. (2014a). Two-stream convolutional networks for action recognition in videos. ArXiv.
[66] Simonyan, K., & Zisserman, A. (2014b). Very deep convolutional networks for large-scale image recognition. ArXiv.
[67] Song Y., Qu Y., Xu S., & Liu J. (2021). Implementation-independent representation for deep convolutional neural networks and humans in processing faces. Frontiers in Computational Neuroscience, 14, 601314.
[68] Taigman Y., Yang M., Ranzato M. A., & Wolf L. (2014). Deepface: Closing the gap to human-level performance in face verification. Proceedings of the IEEE conference on computer vision and pattern recognition.
[69] Tanaka, J. W., & Sengco, J. A. (1997). Features and their configuration in face recognition. Memory and Cognition, 25(5), 583-592.
[70] Taubert J., Wardle S. G., Flessert M., Leopold D. A., & Ungerleider L. G. (2017). Face pareidolia in the rhesus monkey. Current Biology, 27(16), 2505-2509.
[71] Thompson, P. (1980). Margaret Thatcher: A new illusion. Perception, 9(4), 483-484.
[72] Tian F., Xie H., Song Y., Hu S., & Liu J. (2022). The face inversion effect in deep convolutional neural networks. Frontiers in Computational Neuroscience, 16, 854218.
[73] Tian J., Xie H., Hu S., & Liu J. (2021). Multidimensional face representation in a deep convolutional neural network reveals the mechanism underlying AI racism. Frontiers in Computational Neuroscience, 15, 620281.
[74] Tsao D. Y., Moeller S., & Freiwald W. A. (2008). Comparing face patch systems in macaques and humans. Proceedings of the National Academy of Sciences, 105(49), 19514-19519.
[75] Valentine T., Lewis M. B., & Hills P. J. (2016). Face-space: A unifying concept in face recognition research. Quarterly Journal of Experimental Psychology, 69(10), 1996-2019.
[76] Vinken K., Prince J. S., Konkle T., & Livingstone M. S. (2023). The neural code for “face cells” is not face-specific. Science Advances, 9(35), eadg1736.
[77] Wang J., Cao R., Brandmeir N. J., Li X., & Wang S. (2022). Face identity coding in the deep neural network and primate brain. Communications Biology, 5(1), 611.
[78] Wang, M., & Deng, W. (2020). Mitigating bias in face recognition using skewness-aware reinforcement learning. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.
[79] Wardle S. G., Taubert J., Teichmann L., & Baker C. I. (2020). Rapid and dynamic processing of face pareidolia in the human brain. Nature Communications, 11(1), 4518.
[80] Wichmann, F. A., & Geirhos, R. (2023). Are deep neural networks adequate behavioral models of human visual perception? Annual Review of Vision Science, 9(1), 501-524.
[81] Yamins D. L. K., Hong H., Cadieu C. F., Solomon E. A., Seibert D., & DiCarlo J. J. (2014). Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23), 8619-8624.
[82] Yin, R. K. (1969). Looking at upside-down faces. Journal of Experimental Psychology, 81(1), 141.
[83] Yovel G., Grosbard I., & Abudarham N. (2023). Deep learning models challenge the prevailing assumption that face-like effects for objects of expertise support domain-general mechanisms. Proceedings of the Royal Society B, 290(1998), 20230093.
[84] Zhang C., Bengio S., Hardt M., Recht B., & Vinyals O. (2016). Understanding deep learning requires rethinking generalization. ArXiv.
[85] Zhou L., Yang A., Meng M., & Zhou K. (2022). Emerged human-like facial expression representation in a deep convolutional neural network. Science Advances, 8(12), eabj4383.
[86] Zhuang C., Yan S., Nayebi A., Schrimpf M., Frank M. C., DiCarlo J. J., & Yamins, D. L. K. (2021). Unsupervised neural network models of the ventral visual stream. Proceedings of the National Academy of Sciences, 118(3), e2014196118.