Abstract
The rapid advancement of Artificial Intelligence (AI) and Large Language Models (LLMs) has led to their increasing integration into tasks ranging from text generation and translation to question answering. However, a critical question remains: do these sophisticated models, much like humans, exhibit susceptibility to cognitive biases? Understanding the presence and nature of such biases in AI is essential for assessing model reliability, improving performance, and anticipating societal impact. This research investigates the susceptibility of Google’s Gemini 1.5 Pro and DeepSeek, two prominent LLMs, to framing effects and confirmation bias. The study designed a series of experimental trials that systematically manipulated information proportions and presentation orders to evaluate these biases.
In the framing effect experiment, a genetic testing decision-making scenario was constructed. The proportion of positive and negative information (e.g., 20%, 50%, or 80% positive) and their presentation order were varied. The models’ inclination towards undergoing genetic testing was recorded. For the confirmation bias experiment, two reports—one positive and one negative—about “RoboTaxi” autonomous vehicles were provided. The proportion of erroneous information within these reports (10%, 30%, and 50%) and their presentation order were systematically altered, and the models’ support for each report was assessed.
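The factorial design described above can be sketched in code. This is a hypothetical illustration only: the condition labels, prompt wording, and helper names below are assumptions for exposition, not the authors' actual experimental materials.

```python
from itertools import product

# Hypothetical condition grid for the framing-effect experiment:
# 3 proportions of positive information x 2 presentation orders.
POSITIVE_RATIOS = (0.2, 0.5, 0.8)
ORDERS = ("positive_first", "negative_first")

def build_conditions():
    """Enumerate every framing-effect condition as (ratio, order) pairs."""
    return list(product(POSITIVE_RATIOS, ORDERS))

def build_prompt(ratio, order):
    """Assemble an illustrative prompt; the wording here is invented."""
    pos = f"{int(ratio * 100)}% of the statements below describe benefits of genetic testing."
    neg = f"{int((1 - ratio) * 100)}% describe risks."
    first, second = (pos, neg) if order == "positive_first" else (neg, pos)
    return f"{first} {second} Would you recommend undergoing genetic testing?"

conditions = build_conditions()
print(len(conditions))  # 6 conditions: 3 ratios x 2 orders
```

Each of the six prompts would then be submitted repeatedly to each model, with the recorded recommendation serving as the dependent variable.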
The findings demonstrate that both Gemini 1.5 Pro and DeepSeek are susceptible to framing effects. In the genetic testing scenario, their decision-making was primarily influenced by the proportion of positive and negative information presented. When the proportion of positive information was higher, both models showed a greater inclination to recommend or proceed with genetic testing. Conversely, a higher proportion of negative information led to greater caution or a tendency not to recommend the testing. Importantly, the order in which this information was presented did not significantly influence their decisions in the framing effect scenarios.
Regarding confirmation bias, the two models exhibited distinct behaviors. Gemini 1.5 Pro did not show an overall preference for either positive or negative reports. However, its judgments were significantly influenced by the order of information presentation, demonstrating a “recency effect,” meaning it tended to support the report presented later. The proportion of erroneous information within the reports had no significant impact on Gemini 1.5 Pro’s decisions. In contrast, DeepSeek exhibited an overall confirmation bias, showing a clear preference for positive reports. Similar to Gemini 1.5 Pro, DeepSeek’s decisions were also significantly affected by the order of information presentation, while the proportion of misinformation had no significant effect.
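The recency effect reported here can be quantified with a simple tally. A minimal sketch, assuming each trial records which report was presented last and which report the model supported (the field names and data layout are invented for illustration):

```python
def recency_rate(trials):
    """Fraction of trials in which the model supported the report shown last.

    Each trial is a dict like {"shown_last": "positive", "supported": "positive"};
    a rate well above 0.5 would indicate a recency effect.
    """
    if not trials:
        raise ValueError("no trials to score")
    hits = sum(1 for t in trials if t["supported"] == t["shown_last"])
    return hits / len(trials)

# Toy data: the model supports the later-presented report in 3 of 4 trials.
demo = [
    {"shown_last": "positive", "supported": "positive"},
    {"shown_last": "negative", "supported": "negative"},
    {"shown_last": "positive", "supported": "positive"},
    {"shown_last": "negative", "supported": "positive"},
]
print(recency_rate(demo))  # 0.75
```

A rate significantly above chance (0.5) across conditions would correspond to the order sensitivity both models displayed.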
These results reveal human-like cognitive vulnerabilities in advanced LLMs, highlighting critical challenges to their reliability and objectivity in decision-making processes. Gemini 1.5 Pro’s sensitivity to presentation order, and DeepSeek’s general preference for positive information coupled with its own sensitivity to order, underscore the need for careful evaluation of potential cognitive biases during the development and application of AI. The study suggests that effective measures are necessary to mitigate these biases and prevent potential negative societal impacts. Future research should include a broader range of models for comparative analysis and explore more complex interactive scenarios. The findings contribute to understanding the limitations and capabilities of current AI systems, to guiding their responsible development, and to anticipating their potential societal implications.
Key words
artificial intelligence /
large language models /
cognitive bias /
confirmation bias /
framing effect