The Role of Speaker’s Identity Information in Spoken Word Processing*

Yin Shuqi; Aierken Mukaidaisi; Shen Taiyu; Li Li; Yu Keke; Wang Ruiming

doi:10.16719/j.cnki.1671-6981.20260306

Journal of Psychological Science, 2026, Vol. 49, Issue 3: 565-575. DOI: 10.16719/j.cnki.1671-6981.20260306

General Psychology, Experimental Psychology & Ergonomics

The Role of Speaker’s Identity Information in Spoken Word Processing

Yin Shuqi¹, Aierken Mukaidaisi¹, Shen Taiyu¹, Li Li², Yu Keke¹,**, Wang Ruiming¹,**

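Abstract (translated from the Chinese)

Using a long-term repetition priming paradigm, this study manipulated the consistency of speaker identity between the learning and test materials to examine how speaker identity information affects relatively shallow lexical access (Experiment 1) and relatively deep conceptual comprehension (Experiment 2). In the lexical decision task of Experiment 1, accuracy for learned words was significantly higher in the speaker-consistent condition than in the inconsistent condition, indicating that speaker identity information and linguistic information affect lexical access in an integrated manner. In the category decision task of Experiment 2, accuracy for unlearned words was significantly higher in the speaker-consistent condition than in the inconsistent condition, indicating that speaker identity information affects conceptual comprehension independently. Based on these results and previous theories, the study tentatively proposes a new view of spoken word processing that incorporates the processing of identity information, which helps explain the cognitive processing of spoken words in a more social and ecologically valid way.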

Abstract

Considering the speaker’s identity information provides a more social and ecologically valid account of the cognitive processing of spoken words. However, whether and how the speaker’s identity information affects spoken word processing remains controversial. The abstractionist view (including the early and developmental abstractionist views) and the episodic view take different positions on this issue. Moreover, previous studies have used different experimental tasks, which have provided different evidence for these views. Based on our analysis of these studies, we propose that the existing views may each explain different processes within spoken word processing. It is therefore necessary to examine the role of the speaker’s identity information in spoken word processing at different processing depths. Against this background, the present study examined whether and how the speaker’s identity information affects lexical access and conceptual comprehension in spoken word processing. Addressing these questions can help us better understand spoken word processing.

The present study comprised two behavioral experiments and adopted the classic long-term repetition priming paradigm to minimize possible interference from explicit experimental tasks. Experiment 1 used a lexical decision task to examine whether and how the speaker’s identity information affects lexical access in spoken word processing. Eighty-eight participants were recruited for the experiments and randomly assigned to one of two groups (speaker identity consistent vs. inconsistent). The experiment comprised a learning phase and a test phase. In the consistent group, participants heard stimuli spoken by a male speaker in both phases; in the inconsistent group, participants heard stimuli spoken by a male speaker in the learning phase and by a female speaker in the test phase. The materials consisted of 36 real words (e.g., “/yi1fu2/”, meaning “clothes”) and 36 pseudowords (i.e., pronounceable but meaningless nonwords, e.g., “/ju4hong2/”). Participants judged whether each auditory item was a real word or a pseudoword. Experiment 2 used a category decision task to examine whether and how the speaker’s identity information affects conceptual comprehension in spoken word processing. The participants and design were the same as in Experiment 1, with 36 biological words (e.g., “/xiao3cao3/”, meaning “grass”) and 36 non-biological words (e.g., “/qian1bi3/”, meaning “pencil”) as materials. Participants judged whether each auditory word was biological or non-biological.
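The between-group design described above can be illustrated with a minimal Python sketch. It is not the authors' experiment script. The equal split into groups, the number of items presented in the learning phase (`n_learned_per_category`), the item labels, and the function names `assign_groups` and `build_phases` are all assumptions introduced for illustration; the abstract states only that participants were randomly divided into two groups and that the test phase contained learned and unlearned words.

```python
import random


def assign_groups(participant_ids, seed=0):
    """Randomly split participants into two groups (equal split is an assumption)."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"consistent": ids[:half], "inconsistent": ids[half:]}


def build_phases(items, group, n_learned_per_category, seed=0):
    """Build learning and test trial lists for one participant.

    items: list of (stimulus, category) pairs, e.g. ("word_00", "real").
    n_learned_per_category: hypothetical parameter; the abstract does not
    say how many items were presented in the learning phase.
    """
    rng = random.Random(seed)

    # Pick the to-be-learned items separately within each category.
    by_category = {}
    for stimulus, category in items:
        by_category.setdefault(category, []).append(stimulus)
    learned = set()
    for stimuli in by_category.values():
        learned.update(rng.sample(stimuli, n_learned_per_category))

    # Learning phase: always the male voice, in both groups.
    learning = [
        {"item": s, "category": c, "voice": "male"}
        for s, c in items
        if s in learned
    ]
    rng.shuffle(learning)

    # Test phase: male voice if consistent, female voice if inconsistent.
    test_voice = "male" if group == "consistent" else "female"
    test = [
        {"item": s, "category": c, "voice": test_voice, "learned": s in learned}
        for s, c in items
    ]
    rng.shuffle(test)
    return learning, test
```

With 36 items per category and, say, 18 learned items per category (an illustrative value only), each participant would receive 36 learning trials and 72 test trials, half of them previously heard.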

In Experiment 1, performance on learned words was better than on unlearned words, indicating a stable repetition effect. More importantly, in the overall analysis (real words and pseudowords combined), accuracy for learned words was significantly higher in the consistent condition than in the inconsistent condition, whereas for unlearned words there was no significant difference between the two conditions. Further analysis showed that pseudowords followed the same pattern as the overall analysis, but for real words there were no significant differences in either accuracy or reaction time between the consistent and inconsistent conditions, for either learned or unlearned words. In Experiment 2, response times for learned words were significantly shorter than those for unlearned words, indicating a repetition effect for learned words. However, in contrast to Experiment 1, accuracy was significantly higher in the consistent condition than in the inconsistent condition for unlearned words, with no such difference for learned words.

The speaker’s identity information influences spoken word processing differently depending on the process involved. Specifically, the facilitation from speaker identity consistency for learned words in the lexical decision task suggests that the representation of the speaker’s identity is integrated with linguistic information and affects lexical access in an integrated manner, supporting the episodic view. In contrast, the facilitation from speaker identity consistency for unlearned words in the category decision task suggests that the speaker’s identity and linguistic information are represented separately and affect conceptual comprehension independently, supporting the developmental abstractionist view. Integrating the developmental abstractionist and episodic views helps us better understand spoken word processing.

Cite this article

Yin Shuqi, Aierken Mukaidaisi, Shen Taiyu, et al. The Role of Speaker’s Identity Information in Spoken Word Processing[J]. Journal of Psychological Science. 2026, 49(3): 565-575. https://doi.org/10.16719/j.cnki.1671-6981.20260306
