The decision-making process in basketball is inherently dynamic, involving the continuous integration of context and experience. While this process aligns with the core tenets of reinforcement learning (RL), classical RL models have been limited in their ability to capture the dynamic characteristics of multi-alternative, continuous decision-making. To address these limitations, the present study employed a modified version of the intrinsically enhanced model to investigate the effects of context and sports experience on the decision-making of basketball players. The modified model was also compared with several classic models, including a one-trial-back logistic regression and model-free and model-based RL models, to evaluate how well it explains and predicts decision behavior in the two-stage decision task.
This study aimed to examine how internal reward signals derived from process goals and external reward outcomes are integrated during decision-making in basketball. Specifically, we aimed to assess whether incorporating internal reward signals that represent process-related achievements (e.g., tactical execution) could improve the predictive power of a modified intrinsically enhanced model. Furthermore, we explored how contextual factors modulate decision strategies and reaction times, and whether these effects differ between experienced basketball players and novices.
A 2 (stimulus type: abstract symbols vs. basketball tactical diagrams) × 2 (group: novices vs. basketball players) mixed experimental design was employed. A total of 56 participants were recruited, including 29 basketball players with competitive experience and 27 novices with no basketball experience. Participants performed a two-stage decision task developed in MATLAB R2021b using Psychtoolbox (v3.0.19). At the outset of each trial, a fixation point was displayed for one second. In Stage 1 (S1), participants were presented with two options, representing either tactical initiation phases (in the basketball condition) or abstract alternatives (in the abstract condition). Their choice probabilistically determined the subsequent Stage 2 (S2) state, with common transitions occurring with 70% probability and rare transitions with 30% probability. During S2, participants made a binary decision and received immediate feedback (reward or no reward), with each option's reward probability fixed at 50%, thereby controlling for variability in external outcomes.
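To make the task structure concrete, the following minimal sketch simulates a single trial under the transition and reward probabilities stated above (70% common vs. 30% rare transitions, 50% reward for every S2 option). It illustrates the task logic only; the function and state names are our own placeholders, not the authors' Psychtoolbox implementation.

```python
import random

P_COMMON = 0.70  # probability that an S1 choice leads to its "common" S2 state
P_REWARD = 0.50  # fixed reward probability for every S2 option

# Hypothetical mapping from the two S1 options to their common / rare S2 states.
COMMON_S2 = {0: "state_A", 1: "state_B"}
RARE_S2 = {0: "state_B", 1: "state_A"}

def run_trial(s1_choice, s2_choice):
    """Simulate one two-stage trial: probabilistic transition, then 50/50 feedback.
    The outcome does not depend on s2_choice because both S2 options are fixed at 50%."""
    is_common = random.random() < P_COMMON
    s2_state = COMMON_S2[s1_choice] if is_common else RARE_S2[s1_choice]
    reward = 1 if random.random() < P_REWARD else 0
    return s2_state, is_common, reward

# Example: choose option 0 at S1, then option 1 in the resulting S2 state.
print(run_trial(s1_choice=0, s2_choice=1))
```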
Behavioral responses and reaction times were recorded for both stages. Hierarchical Bayesian models were used to analyze the data, with parameters estimated in Stan via PyStan (v3.10) within a Python 3.12 environment. Model performance was evaluated using the widely applicable information criterion (WAIC), specifically the expected log pointwise predictive density (ELPD_WAIC), which accounts for model fit and complexity simultaneously. Key parameters of interest included the learning rate (α), which captures the effect of immediate reward feedback; the inverse temperature (β), which reflects the balance between exploitation and exploration; and the internal reward weight (θ), which indicates the degree to which process goals influence decision-making.
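The exact likelihood is specified in the hierarchical Stan model and is not reproduced here; the sketch below shows one plausible reading of how θ and α could enter a single-trial value update, with the external outcome and a process-goal (internal) signal blended by the weight θ and the chosen option's value updated at learning rate α. Choices would then follow a softmax governed by β, illustrated after the parameter results below. The combination rule, function names, and numerical values are our assumptions, not the authors' specification.

```python
import numpy as np

def update_q(q, choice, r_external, r_internal, alpha, theta):
    """One plausible single-trial update for the intrinsically enhanced model
    (our reading, not the authors' exact specification): the effective reward
    blends the external outcome with a process-goal signal weighted by theta,
    and the chosen option's value moves toward it at learning rate alpha."""
    r_combined = (1.0 - theta) * r_external + theta * r_internal
    q = q.copy()
    q[choice] += alpha * (r_combined - q[choice])
    return q

# Toy usage with purely illustrative values (theta close to the ~0.33 reported for players).
q_values = update_q(np.zeros(2), choice=0, r_external=1.0, r_internal=0.5,
                    alpha=0.3, theta=0.33)
```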
Model comparison revealed that the modified intrinsically enhanced model outperformed all alternative models across all experimental conditions, as evidenced by consistently higher ELPD_WAIC values. This finding supports the modified model's ability to integrate internal reward signals and external outcomes effectively when explaining decision behavior.
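For reference, ELPD_WAIC can be computed from a matrix of pointwise posterior log-likelihoods following its standard definition (log pointwise predictive density minus a variance-based complexity penalty). The generic numpy sketch below implements that formula; it is not the authors' analysis code, and the array layout is an assumption.

```python
import numpy as np
from scipy.special import logsumexp

def elpd_waic(log_lik):
    """log_lik: array of shape (n_posterior_draws, n_observations) holding
    pointwise log-likelihoods. Returns ELPD_WAIC = lppd - p_waic (higher is better)."""
    n_draws = log_lik.shape[0]
    # lppd: log of the posterior-mean likelihood, summed over observations.
    lppd = np.sum(logsumexp(log_lik, axis=0) - np.log(n_draws))
    # p_waic: posterior variance of the log-likelihood, summed over observations.
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return lppd - p_waic
```

Models with higher ELPD_WAIC are preferred, since the criterion rewards predictive fit while penalizing effective complexity.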
Analysis of the model parameters showed that the α parameter, which reflects the immediate influence of reward feedback, remained stable across conditions, implying that the underlying reward-processing mechanism is robust and relatively unaffected by differences in experience or stimulus type. Although both basketball players and novices predominantly made outcome-driven decisions, basketball players exhibited significantly higher sensitivity to process goals. Specifically, the internal reward weight (θ) for basketball players was approximately 0.33, indicating a relatively greater, though still sub-dominant, influence of process-related internal rewards compared with external outcomes. Furthermore, the β parameter was significantly lower in the basketball tactical diagram condition than in the abstract condition, suggesting that participants were more inclined to adopt an exploration strategy when exposed to contextually rich stimuli. Interestingly, the increased complexity inherent in the basketball tactical diagrams also prolonged reaction times. This pattern indicates that the additional cognitive load imposed by complex information requires individuals to invest more cognitive resources in processing and integrating it, which tends to promote exploration and lengthen reaction times.
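To illustrate the reported effect of β, the short snippet below (our illustration, not the fitted model) evaluates a standard softmax choice rule at two inverse temperatures: for the same value difference, a lower β pushes choice probabilities toward chance, i.e., toward exploration, whereas a higher β concentrates choice on the higher-valued option.

```python
import numpy as np

def choice_probs(q_values, beta):
    """Standard softmax choice rule: higher beta -> sharper preference for the best option."""
    z = beta * (np.asarray(q_values, dtype=float) - np.max(q_values))
    p = np.exp(z)
    return p / p.sum()

q = [0.6, 0.4]                    # same value difference in both cases
print(choice_probs(q, beta=5.0))  # ~[0.73, 0.27]: mostly exploitation
print(choice_probs(q, beta=1.0))  # ~[0.55, 0.45]: near chance, i.e., more exploration
```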
In summary, our study demonstrates that the modified intrinsically enhanced model provides a superior framework for capturing the dynamic decision-making processes of basketball players. While both experienced basketball players and novices primarily exhibit outcome-driven decision-making, basketball players display a higher sensitivity to process goals, reflecting the influence of extensive training and experience. Moreover, the increased information complexity associated with basketball tactical stimuli significantly prolongs reaction times and facilitates exploration strategies due to heightened cognitive load. These findings underscore the necessity of integrating both internal and external reward mechanisms to comprehensively model decision behavior in complex, real-world settings.
Key words: reinforcement learning / intrinsically enhanced model / basketball / two-stage task / decision making