The Role of Speaker’s Identity Information in Spoken Word Processing

Yin Shuqi, Aierken Mukaidaisi, Shen Taiyu, Li Li, Yu Keke, Wang Ruiming

Journal of Psychological Science ›› 2026, Vol. 49 ›› Issue (3) : 565-575.

DOI: 10.16719/j.cnki.1671-6981.20260306
General Psychology, Experimental Psychology & Ergonomics

Abstract

Taking the speaker’s identity information into account yields a more socially and ecologically valid account of the cognitive processing of spoken words. However, whether and how the speaker’s identity information affects spoken word processing remains controversial. The abstractionist view (including the early and developmental abstractionist views) and the episodic view disagree on this issue, and previous studies have used different experimental tasks that provide evidence for different views. Based on our analysis of these studies, we propose that each existing view may be suited to explaining a different process in spoken word processing. It is therefore necessary to examine the role of the speaker’s identity information in spoken word processing at different processing depths. Against this background, the present study examined whether and how the speaker’s identity information affects lexical access and conceptual comprehension in spoken word processing. Addressing these issues can help us better understand spoken word processing.

The present study comprised two behavioral experiments that used the classic long-term repetition priming paradigm to minimize possible interference from explicit experimental tasks. Experiment 1 used a lexical decision task to examine whether and how the speaker’s identity information affects lexical access in spoken word processing. Eighty-eight participants were recruited and randomly assigned to two groups (speaker identity consistent vs. inconsistent). The experiment contained a learning phase and a test phase. In the consistent group, participants heard stimuli spoken by a male speaker in both phases; in the inconsistent group, participants heard stimuli spoken by a male speaker in the learning phase and by a female speaker in the test phase. The experimental materials consisted of 36 real words (e.g., “/yi1fu2/”, meaning “clothes”) and 36 pseudowords (i.e., pronounceable but meaningless nonwords, e.g., “/ju4hong2/”). Participants judged whether each auditory item was a real word or a pseudoword. Experiment 2 used a category decision task to examine whether and how the speaker’s identity information affects conceptual comprehension in spoken word processing. The participants and design were the same as in Experiment 1, with 36 biological words (e.g., “/xiao3cao3/”, meaning “grass”) and 36 non-biological words (e.g., “/qian1bi3/”, meaning “pencil”) as experimental materials. Participants judged whether each auditory word was biological or non-biological.

In Experiment 1, performance on learned words was better than on unlearned words, indicating a stable repetition effect. More importantly, in the overall analysis (including real words and pseudowords), accuracy for learned words was significantly higher in the consistent condition than in the inconsistent condition, whereas for unlearned words there was no significant difference between the two conditions. Further analysis revealed that pseudowords showed the same pattern as the overall analysis, but for real words there were no significant differences in either accuracy or reaction time between the consistent and inconsistent conditions, for learned or unlearned words. In Experiment 2, response times for learned words were significantly shorter than for unlearned words, again indicating a repetition effect. However, in contrast to Experiment 1, accuracy was significantly higher in the consistent condition than in the inconsistent condition for unlearned words, while there was no such difference for learned words.

The speaker’s identity information influences spoken word processing differently depending on the process involved. Specifically, the facilitation from speaker identity consistency for learned words in the lexical decision task suggests that the representation of the speaker’s identity is integrated with linguistic information and that the two affect lexical access jointly, supporting the episodic view. In contrast, the facilitation from speaker identity consistency for unlearned words in the category decision task suggests that the speaker’s identity and linguistic information are represented separately and affect conceptual comprehension independently, supporting the developmental abstractionist view. Integrating the developmental abstractionist and episodic views helps us better understand spoken word processing.

Key words

spoken word processing / identity information / linguistic information / lexical access / conceptual comprehension

Cite this article

Yin Shuqi, Aierken Mukaidaisi, Shen Taiyu, et al. The Role of Speaker’s Identity Information in Spoken Word Processing[J]. Journal of Psychological Science. 2026, 49(3): 565-575. https://doi.org/10.16719/j.cnki.1671-6981.20260306
