“信”以传信,“疑”以传疑?基于人声线索的可信度编码与解码 *

胡砚冰, 蒋晓鸣**

心理科学 (Journal of Psychological Science), 2023, Vol. 46, Issue 5: 1057-1066. DOI: 10.16719/j.cnki.1671-6981.20230504

基础、实验与工效 (Basic, Experimental and Ergonomic Studies)


Encoding and Decoding Credibility in Human Vocal Cues

Hu Yanbing, Jiang Xiaoming**

摘要 (Abstract)

In spoken communication, individuals express and judge credibility through the voice, and this forms the basis of interpersonal trust. Vocal credibility processing refers to listeners judging how believable a speaker's message is on the basis of the speaker's voice quality and vocal characteristics, a process that shapes whether a trusting relationship is established between listener and speaker. Cognitive models of voice processing propose that paralinguistic cues, semantic cues, and social-group cues in the voice influence how speakers encode credibility and how listeners decode it. Speakers intentionally encode prosodic paralinguistic cues to produce different levels of credibility, and listeners can distinguish these levels as early as 200 ms. When the credibility levels encoded in semantic and paralinguistic cues are inconsistent, listeners need more time to resolve the conflict between cues and infer credibility. A speaker's ethnic identity is unintentionally encoded in accent cues, which modulate two distinct neural pathways through which listeners integrate social-group cues when decoding credibility: a temporo-parietal network that infers credibility from vocal cues, and a fronto-parietal network that extracts the social-group rules encoded in the voice. Future research should examine the encoding and decoding of vocal credibility from the perspectives of the emergence and development of voice processing, the coordination of multimodal information in speech, and functional impairments in special populations.

Abstract

Recognizing vocal cues to credibility plays a significant role in social interactions. Vocal credibility refers to the degree to which listeners judge the truthfulness of a message from the speaker's voice; it shapes the listener's social impressions of the speaker and subsequent behavior toward them. Supported by behavioral and neurophysiological evidence, the cognitive processing model of vocal expression proposes that listeners can decode a speaker's credibility from several types of vocal information, such as speech prosody, lexical-semantic information, and accent.
In terms of speech prosody, credible voices are associated with higher fundamental frequency (F0) and amplitude, whereas untrustworthy voices are characterized by slower speech rates and more frequent pauses. Evidence from event-related potentials (ERPs) has shown that listeners can differentiate credible from untrustworthy prosody as early as 200 ms and continue to compute these vocal cues in a dynamic fashion. Furthermore, fMRI studies have revealed that increased trustworthiness of vocal expressions is associated with increased activity in the left superior frontal gyrus (SFG) and the left inferior frontal gyrus (IFG), whereas increased untrustworthiness is associated with activation in the right superior temporal gyrus (STG). Moreover, functional connectivity studies have shown that strengthened connections between the left postcentral gyrus and the supplementary motor area (SMA) are associated with listeners' decoding of vocal cues to speaker untrustworthiness.
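To make these prosodic parameters concrete, the minimal Python sketch below shows one way such cues could be quantified from a single recording. It assumes the third-party librosa library, a placeholder file name ("utterance.wav"), and illustrative thresholds; it is not the measurement pipeline used in the studies reviewed here, and speech rate would additionally require a syllable or word count that is not computed.

import numpy as np
import librosa

# Load the recording at its native sampling rate ("utterance.wav" is a placeholder).
y, sr = librosa.load("utterance.wav", sr=None)

# Fundamental frequency: mean F0 over voiced frames, estimated with the pYIN algorithm.
f0, voiced_flag, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)
mean_f0 = float(np.nanmean(f0[voiced_flag]))

# Amplitude: mean frame-wise RMS energy as a rough loudness proxy.
mean_rms = float(np.mean(librosa.feature.rms(y=y)))

# Pausing: count silent gaps of at least 200 ms between non-silent intervals.
intervals = librosa.effects.split(y, top_db=30)   # (start, end) sample indices of non-silence
gaps_sec = (intervals[1:, 0] - intervals[:-1, 1]) / sr
n_pauses = int(np.sum(gaps_sec >= 0.2))

print(f"mean F0: {mean_f0:.1f} Hz | mean RMS: {mean_rms:.4f} | pauses >= 200 ms: {n_pauses}")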
Vocal cues and semantic information interact to form an impression of the speaker's credibility. ERP evidence shows that vocal expressions in which the vocal cue is incongruent with the semantic content elicit a larger late positive potential (LPP), and source localization indicates that the middle frontal gyrus (MFG) and the STG contribute to this effect. Prosody-semantic congruency thus modulates the neurocognitive mechanisms underlying the decoding of speaker credibility.
Accent cues that mark in-group and out-group voices are also relevant for decoding credibility. Listeners perceive in-group (vs. out-group) voices as more credible. ERP results show that for out-group accents, doubtful (vs. confident) voices elicit a smaller P200 response, whereas for in-group accents, doubtful (vs. confident) voices elicit larger early responses. Moreover, the basal ganglia, the left cuneus, and the right fusiform gyrus are activated when listeners judge the credibility of out-group (vs. in-group) voices. More importantly, superior parietal and middle temporal regions are activated when listeners judge in-group (vs. out-group) credibility. These results suggest two pathways for decoding vocal credibility. Listeners show greater sensitivity to in-group voices at the perceptual level and tend to follow a 'direct path' when making social inferences from the voice. For out-group speakers, the social-category information activated by the accent delays listeners' social reasoning about out-group features, so listeners follow a longer, more laborious 'indirect path', computing the speaker's credibility through an in-depth analysis of the vocal expression. The processing of vocal credibility therefore involves not only the analysis of lower-level acoustic features but also higher-level social categorization.
Future research should further explore three topics related to the mechanisms of encoding and decoding speaker credibility: (1) the developmental mechanisms underlying the decoding of credibility from the human voice; (2) the role of multimodal nonverbal cues in encoding and decoding speaker credibility in real-life social interactions; and (3) the neurocognitive deficits in decoding vocal credibility in clinical populations.

关键词 (Keywords)

人声表情加工模型 / 言语交流 / 人声线索 / 副语言线索 / 人声可信度

Key words

vocal expression processing model / speech communication / vocal cue / paralinguistic cue / vocal trustworthiness

引用本文 (Cite this article)

胡砚冰, 蒋晓鸣. “信”以传信,“疑”以传疑?基于人声线索的可信度编码与解码[J]. 心理科学, 2023, 46(5): 1057-1066. https://doi.org/10.16719/j.cnki.1671-6981.20230504
Hu Yanbing, Jiang Xiaoming. Encoding and Decoding Credibility in Human Vocal Cues[J]. Journal of Psychological Science, 2023, 46(5): 1057-1066. https://doi.org/10.16719/j.cnki.1671-6981.20230504

基金 (Funding)

*This work was supported by the General Program of the National Natural Science Foundation of China (31971037), the Natural Science Foundation of Shanghai under the Shanghai "Science and Technology Innovation Action Plan" (22ZR1460200), the "Shuguang Program" of the Shanghai Education Development Foundation and Shanghai Municipal Education Commission (20SG31), and the Supervisor Academic Guidance Program of Shanghai International Studies University (2022113001).
