Process or Outcome? Context-Specific Decision Modeling of Basketball Players Based on Intrinsic Rewards
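Basketball players' in-game decisions are typically a continuous process grounded in the specific context and in their own experience, which is consistent with the basic view of reinforcement learning theory. However, traditional reinforcement learning models fall short in capturing the dynamic features of multi-option, continuous decision-making. This study therefore recruited 56 participants (29 of them basketball players), used a two-stage task, and applied a modified intrinsically enhanced model to examine how context and sports experience affect the decision process, comparing the model with other classic models. The intrinsically enhanced model showed the best fit under all experimental conditions. Although the decisions of both basketball players and novices were primarily outcome-oriented, basketball players were significantly more sensitive to process goals. In addition, in the basketball tactical diagram context, the high information complexity made individuals more inclined to adopt exploration strategies, which lengthened decision times.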
Decision-making in basketball is inherently dynamic and requires the continuous integration of context and experience. Although this process aligns with the core tenets of reinforcement learning (RL), classical RL models are limited in their ability to capture the dynamics of multi-alternative, continuous decision-making. To address this limitation, the present study employed a modified version of the intrinsically enhanced model to investigate the effects of context and sports experience on the decision-making of basketball players. The modified model was also compared with several classic models, namely one-trial-back logistic regression, a model-free model, and a model-based model, to evaluate how well it explains and predicts decision behavior in the two-stage decision task.
This study aimed to examine how internal reward signals derived from process goals and external reward outcomes are integrated during decision-making in basketball. Specifically, we assessed whether incorporating internal reward signals, which represent process-related achievements (e.g., tactical execution), could improve the predictive power of a modified intrinsically enhanced model. We also explored how contextual factors modulate decision strategies and reaction times, and whether these effects differ between experienced basketball players and novices.
A 2 (stimulus type: abstract symbols vs. basketball tactical diagrams) × 2 (group: novices vs. basketball players) mixed experimental design was employed. A total of 56 participants were recruited, including 29 basketball players with competitive experience and 27 novices with no basketball experience. Participants performed a two-stage decision task developed in MATLAB R2021b using Psychtoolbox (v3.0.19). At the outset of each trial, a fixation point was displayed for one second. In Stage 1 (S1), participants were presented with two options, representing either tactical initiation phases (in the basketball condition) or abstract alternatives (in the abstract condition). Their choice probabilistically determined the subsequent Stage 2 (S2) state, with common transitions occurring with a 70% probability and rare transitions with a 30% probability. During S2, participants made a binary decision and received immediate feedback (reward or no reward) with each option's reward probability fixed at 50%, thereby controlling for variability in external outcomes.
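The trial structure described above can be sketched as a small simulation. This is only an illustrative sketch, not the authors' MATLAB/Psychtoolbox implementation: the 70%/30% transition probabilities and the fixed 50% reward probability come from the text, while the mapping of each S1 option to a "common" S2 state, the random choice policy, and all function names are assumptions made for illustration.

```python
import random

COMMON_PROB = 0.7  # probability of a common transition (from the text)
REWARD_PROB = 0.5  # fixed reward probability of every S2 option (from the text)


def run_trial(s1_choice, s2_policy, rng):
    """Simulate one trial of the two-stage task; s1_choice is 0 or 1."""
    common = rng.random() < COMMON_PROB
    # Assumption: S1 option k commonly leads to S2 state k, rarely to the other.
    s2_state = s1_choice if common else 1 - s1_choice
    s2_choice = s2_policy(s2_state)
    # The reward does not depend on the S2 choice because every option pays at 50%.
    reward = 1 if rng.random() < REWARD_PROB else 0
    return {
        "s1_choice": s1_choice,
        "common": common,
        "s2_state": s2_state,
        "s2_choice": s2_choice,
        "reward": reward,
    }


def simulate(n_trials, seed=0):
    """Run n_trials with a purely random (hypothetical) choice policy."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        s1_choice = rng.randrange(2)
        trials.append(run_trial(s1_choice, lambda state: rng.randrange(2), rng))
    return trials
```

Because the external reward is held at 50% for every option, any systematic preference that emerges in such a task cannot be attributed to differences in payoff, which is what allows the influence of process goals to be isolated.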
Behavioral responses and reaction times were recorded for both stages. Hierarchical Bayesian models were fitted to the data, with parameters estimated in Stan via PyStan (v3.10) within a Python 3.12 environment. Model performance was evaluated using the widely applicable information criterion (WAIC), specifically the expected log pointwise predictive density (ELPD_WAIC), which simultaneously accounts for model fit and complexity. Key parameters of interest included the learning rate (α), which captures the effect of immediate reward feedback; the inverse temperature (β), which reflects the balance between exploitation and exploration; and the internal reward weight (θ), which indicates the degree to which process goals influence decision-making.
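To make the model-comparison criterion concrete, the following sketch computes ELPD_WAIC from a matrix of pointwise log-likelihoods using the standard definition: the log pointwise predictive density minus the across-draw variance of the log-likelihood, summed over observations. It is a plain-Python illustration under assumed inputs, not the authors' analysis code; the function names and the layout `log_lik[draw][observation]` are hypothetical.

```python
import math
from statistics import variance


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def elpd_waic(log_lik):
    """Return (elpd_waic, p_waic).

    log_lik[s][i] is the log-likelihood of observation i under posterior
    draw s; at least two draws are required.
    """
    n_obs = len(log_lik[0])
    elpd = 0.0
    p_waic = 0.0
    for i in range(n_obs):
        column = [draw[i] for draw in log_lik]
        lppd_i = log_mean_exp(column)  # fit term for observation i
        p_i = variance(column)         # complexity penalty (sample variance)
        elpd += lppd_i - p_i
        p_waic += p_i
    return elpd, p_waic
```

A higher ELPD_WAIC indicates better expected out-of-sample prediction, so a model is preferred when the gain in fit outweighs the variance-based penalty for its added flexibility.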
Model comparison revealed that the modified intrinsically enhanced model outperformed all alternative models across all experimental conditions, as evidenced by consistently higher ELPD_WAIC values. This finding supports the model's ability to integrate internal reward signals and external outcomes in explaining decision behavior.
Analysis of the model parameters showed that the α parameter, which reflects the immediate influence of reward feedback, remained stable across conditions. This implies that the underlying reward-processing mechanism is robust and relatively unaffected by differences in experience or stimulus type. Although both basketball players and novices predominantly made outcome-driven decisions, basketball players exhibited significantly higher sensitivity to process goals. Specifically, the internal reward weight (θ) for basketball players was approximately 0.33, indicating a relatively greater, but still sub-dominant, influence of process-related internal rewards compared with external outcomes. Furthermore, the β parameter was significantly lower in the basketball tactical diagram condition than in the abstract condition, suggesting that participants were more inclined to explore when exposed to contextually rich stimuli. The greater complexity of the basketball tactical diagrams also prolonged reaction times. Together, these findings indicate that the additional cognitive load imposed by complex information requires individuals to invest more cognitive resources in processing and integrating information, which tends to promote exploration and lengthen reaction times.
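The roles of the three parameters can be illustrated with a minimal sketch. The abstract does not give the model equations, so the forms below are assumptions: the internal and external rewards are combined as a convex mixture weighted by θ, values are updated with a standard delta rule using α, and choices follow a softmax with inverse temperature β. All function names are hypothetical.

```python
import math


def combined_reward(external, internal, theta):
    """Assumed convex mixture: theta weights the internal (process) reward."""
    return (1.0 - theta) * external + theta * internal


def update_q(q, reward, alpha):
    """Delta-rule update: move the value toward the reward at rate alpha."""
    return q + alpha * (reward - q)


def softmax(q_values, beta):
    """Choice probabilities; a lower beta gives flatter, more exploratory choices."""
    m = max(q_values)
    weights = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]
```

Under this assumed form, θ ≈ 0.33 means a process goal achieved without an external reward is worth 0.33, whereas an external reward without the process goal is worth 0.67, which matches the description of process goals as influential but sub-dominant. Likewise, for values of 1 and 0, β = 1 gives a choice probability of about 0.73 for the better option while β = 5 gives about 0.99, so a lower β corresponds to more exploration.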
In summary, our study demonstrates that the modified intrinsically enhanced model provides a superior framework for capturing the dynamic decision-making processes of basketball players. While both experienced basketball players and novices primarily exhibit outcome-driven decision-making, basketball players display a higher sensitivity to process goals, reflecting the influence of extensive training and experience. Moreover, the increased information complexity associated with basketball tactical stimuli significantly prolongs reaction times and facilitates exploration strategies due to heightened cognitive load. These findings underscore the necessity of integrating both internal and external reward mechanisms to comprehensively model decision behavior in complex, real-world settings.
reinforcement learning / intrinsically enhanced model / basketball / two-stage task / decision making