Journal of Psychological Science (心理科学), 2025, Vol. 48, Issue 4: 847-860. DOI: 10.16719/j.cnki.1671-6981.20250408
Computational Modeling and Artificial Intelligence

Human Learning Strategies in a Volatile Feedback Environment* (波动反馈环境下人脑学习的计算策略)

Zhang Ruyuan (张洳源)1,2, Gao Yuyan (高雨燕)3, Fang Zeming (方泽鸣)2, Zhou Qiang (周强)**3

摘要 (Chinese Abstract)

Previous research has shown that learning in a volatile feedback environment typically involves both associative learning and volatility learning, and that these learning processes are usually described with either reinforcement learning or Bayesian inference. However, comparative studies across multiple learning models are still lacking: it remains unresolved which strategies individuals actually adopt during associative and volatility learning, and it is also unclear how probability differences between associations affect these strategies. Combining a volatile reversal learning task with computational modeling of learning strategies, this paper investigates the strategies individuals use in a volatile feedback environment and examines how the association probability difference influences them. The results show that, in the volatile reversal learning task, individuals tend to adopt a Bayesian learning strategy combined with heuristics. Importantly, this strategy remains consistent across different levels of association probability difference. By characterizing human learning strategies in volatile feedback environments, this work advances our understanding of the dynamics and flexibility of human learning and decision making.

Abstract

A volatile feedback environment is defined as one in which the association between actions and outcomes is uncertain and constantly changing. To adapt to such environments, people generally rely on two types of learning: associative learning and volatility learning. Most research explores these strategies using the modeling approaches of either reinforcement learning (RL) or dynamic Bayesian inference (DBI). However, much of the existing research has focused on individual learning processes under the assumption that one of the two modeling approaches is correct. Without directly comparing these two approaches, it is difficult to determine which one people actually use when learning.
This study aims to investigate which learning strategy, as reflected by the modeling approaches (i.e., RL or DBI), best accounts for learning behavior in a volatile feedback environment, and to assess whether these strategies vary with differences in associative probabilities. To simulate volatile feedback environments, we employed a volatile reversal learning task programmed using jsPsych, which was completed by 36 healthy participants. In this task, the probabilistic contingencies between stimuli and response options remained constant for one period (i.e., the stable phase) and fluctuated rapidly during another (i.e., the volatile phase). Participants were informed that the association probability could change over time, but not when such changes would occur. To accurately track and adapt to changes in the environment, individuals must engage in both association learning (i.e., forming associations between cue stimuli and responses) and volatility learning (i.e., detecting how quickly the associations change). This task design enables a more comprehensive evaluation of how individuals learn in dynamic and uncertain environments than classic learning paradigms, and it provides a more ecologically valid measure of learning strategies in volatile feedback environments. We also manipulated the association probability difference: each participant completed two experimental conditions (high versus low association probability difference) in a counterbalanced within-subjects design. This manipulation allowed us to explore how variations in association probability affect individual learning strategies in volatile feedback environments.
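To make the task structure concrete, the minimal sketch below (in Python, the language in which the study's models were implemented) generates a reward schedule with a stable phase followed by a volatile phase in which the better option reverses repeatedly. The function name, phase lengths, and probabilities are illustrative assumptions, not the exact parameters of the task.

    import numpy as np

    def make_schedule(p_high=0.8, n_stable=60, block_len=20, n_reversals=4, seed=0):
        # Illustrative volatile reversal schedule: option A is better throughout the
        # stable phase; in the volatile phase the better option reverses every block.
        rng = np.random.default_rng(seed)
        p_a = [p_high] * n_stable                      # P(choosing A is rewarded), stable phase
        a_is_better = True
        for _ in range(n_reversals):
            a_is_better = not a_is_better              # reversal at each block boundary
            p_a += [p_high if a_is_better else 1.0 - p_high] * block_len
        p_a = np.array(p_a)
        reward_a = (rng.random(p_a.size) < p_a).astype(int)
        return p_a, reward_a                           # per-trial probabilities and sampled outcomes

    p_a, reward_a = make_schedule(p_high=0.8)          # e.g., a higher vs. lower p_high for the two conditions

Lowering p_high shrinks the gap between the two association probabilities, which is what distinguishes the low probability-difference (more difficult) condition from the high probability-difference condition.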
This study quantitatively analyzes and compares individual learning behavior using several computational models within the frameworks of reinforcement learning (RL) and dynamic Bayesian inference (DBI). Specifically, RL focuses on optimizing behavioral policies based on feedback, emphasizing how individuals adjust future actions by computing prediction errors through interaction with the environment. In contrast, DBI places greater emphasis on probabilistic inference for modeling uncertainty, thereby enabling individuals to adapt flexibly to novel or ambiguous situations. The Bayesian approach relies on updating prior beliefs into posterior beliefs to better cope with a volatile feedback environment. The computational models were implemented in Python and fitted to participants' task performance.
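To illustrate the contrast between the two frameworks, the sketch below shows, for a single binary outcome, a Rescorla-Wagner-style RL update driven by a prediction error alongside a simplified Bayesian (beta-Bernoulli) belief update. These are generic textbook forms of the two model families, not the specific models fitted in this study.

    def rl_update(value, reward, alpha=0.3):
        # RL: shift the value estimate toward the outcome by a fixed
        # fraction (the learning rate) of the prediction error.
        prediction_error = reward - value
        return value + alpha * prediction_error

    def bayes_update(a, b, reward):
        # Bayesian inference (static case): update a Beta(a, b) belief about
        # the reward probability after observing one binary outcome.
        return a + reward, b + 1 - reward

    value, (a, b) = 0.5, (1.0, 1.0)
    for reward in (1, 1, 0, 1):
        value = rl_update(value, reward)
        a, b = bayes_update(a, b, reward)
    print(round(value, 3), round(a / (a + b), 3))   # two estimates of the reward probability

Full dynamic Bayesian inference models for volatile environments additionally track how quickly this probability itself changes over time, which the static beta-Bernoulli update above deliberately omits.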
Firstly, behavioral accuracy comparisons confirmed that manipulating the difference in associative probabilities effectively distinguished participants’ performance, indicating that participants’ choices were not random. More importantly, through computational modeling, we found that, among all the models, the Hidden Markov Model (HMM) best fitted individual learning behaviors. This suggests that individuals primarily employ Bayesian learning strategies that incorporate heuristics within the task. Furthermore, we found that individuals’ learning strategies remained consistent across different levels of associative probability differences. However, as the differences in associative probabilities decreased (i.e., the task became more difficult), individuals tended to estimate a higher environmental volatility, which led to a higher learning rate (i.e., they adjusted their choices more frequently).
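A minimal sketch of the kind of belief update an HMM performs in this setting is shown below: a two-state filter that tracks which option is currently the better one, where the assumed switch probability plays the role of the estimated environmental volatility. The HMM fitted in the study may be parameterized differently; all values here are illustrative.

    import numpy as np

    def hmm_filter(outcomes, p_reward=0.8, p_switch=0.1):
        # Hidden state: which option (A or B) is currently the better one.
        # outcomes[t] = 1 if choosing A was (or would have been) rewarded on trial t.
        belief = np.array([0.5, 0.5])                      # P(A better), P(B better)
        transition = np.array([[1 - p_switch, p_switch],
                               [p_switch, 1 - p_switch]])
        posteriors = []
        for r in outcomes:
            belief = transition.T @ belief                 # prediction: the state may have switched
            likelihood = np.array([p_reward if r else 1 - p_reward,
                                   1 - p_reward if r else p_reward])
            belief = likelihood * belief                   # update: weight by the observation
            belief /= belief.sum()
            posteriors.append(belief[0])
        return np.array(posteriors)

    print(hmm_filter([1, 1, 0, 0, 0]).round(2))            # belief that "A is better" falls after losses

A larger p_switch corresponds to a higher estimated volatility and makes the belief, and hence choices, track recent outcomes more closely, which parallels the finding that smaller association probability differences led to higher volatility estimates and higher effective learning rates.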
These findings indicate that humans combine Bayesian inference with several heuristics to learn associations in a volatile reversal learning task. Tasks with smaller differences in associative probabilities, which are more difficult, induce higher estimates of environmental volatility. This study highlights the flexibility of human learning and decision-making and motivates future computational models in this line of research.

关键词 (Keywords in Chinese)

associative learning / volatility learning / learning strategy / Bayesian inference / reinforcement learning / computational modeling

Key words

association / volatility / learning strategy / dynamic Bayesian inference / reinforcement learning / computational modeling

Cite this article

Zhang Ruyuan, Gao Yuyan, Fang Zeming, Zhou Qiang. Human Learning Strategies in a Volatile Feedback Environment[J]. Journal of Psychological Science. 2025, 48(4): 847-860 https://doi.org/10.16719/j.cnki.1671-6981.20250408


Funding

*This research was supported by a National Natural Science Foundation of China special program grant (32441102), a Shanghai Municipal Education Commission grant under the program "AI Promotes Research Paradigm Reform and Empowers Disciplinary Advancement" (2024AIZD014), and a National Social Science Fund of China grant (20BSH047).
