99, −0.3, 0.4, 0.7, 0.95, and 0.999. The correlation that subjects could observe through sampling will, however, also vary on a continuous scale between these steps because of stochasticity in the outcomes. A change from the current to a new correlation was determined probabilistically in every trial with a transition probability of p = 0.3, under the constraint that a change could only occur after the new correlation had become theoretically detectable by an ideal observer tracking the correlation coefficient in a sliding window over the past five trials. In detail, after the normatively estimated correlation based on the last five
trials (similar to the sliding window model below) approached the new generative correlation (with a deviation <0.2), the correlation was allowed to change on all further trials. This prevented overly rapid changes in the generative correlation before subjects could possibly have detected the new correlation coefficient from outcome observations. On average (across subjects and sessions), the correlation changed every ten trials. To discourage subjects from persevering on a more favorable spot of the response scale that would give a reasonable result over
a wider range of correlations, and to force them instead to track the correlation explicitly, we further implemented an adaptive rule: if a subject's response was suboptimal (farther than 0.2 from the optimum) and the subject had not changed the response within the past five trials, the correlation would jump to the farthest extreme (either −0.99 or +0.999). This increased the penalty on the subject's payout at the current weights and encouraged the subject to find a better weight allocation. In practice, this rule was rarely invoked during the fMRI experiment (never for 10 subjects, once or twice for five subjects, and three times for one subject). We modeled trial-by-trial values
of the correlation strength by using principles of reinforcement learning (Sutton and Barto, 1998). In every trial, reinforcement learning generates a prediction error as the deviation of the experienced outcome R from the predicted outcome. These prediction errors, multiplied by the learning rate, are then used to update predictions in future trials:

resource value: $V_{i,t+1} = V_{i,t} + \alpha_V \, \delta_{i,t}$, (1)

and

value prediction error: $\delta_{i,t} = R - V_{i,t}$. (2)

The squared prediction error is also a measure of the outcome fluctuation and thereby a quantifier of risk. A sequence of consistently large prediction errors indicates that the outcomes fluctuate greatly, whereas a sequence of small prediction errors indicates that the prediction is precise, with little deviation. We used this to model the risk h for both resources:

resource variance: $h_{i,t+1} = h_{i,t} + \alpha_R \, \varepsilon_{i,t}$, (3)

and

variance prediction error: $\varepsilon_{i,t} = \delta_{i,t}^{2} - h_{i,t}$. (4)
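As an illustration of how Equations 1 to 4 interact within a trial, the following minimal Python sketch tracks value and risk for a single resource. It is not the authors' implementation: the function name `track_value_and_risk`, the default learning rates, and the initial values `v0` and `h0` are assumptions made purely for this example. Both prediction errors are computed from the estimates held at trial t before either estimate is updated, as the equations specify.

```python
from typing import List, Sequence, Tuple


def track_value_and_risk(
    outcomes: Sequence[float],
    alpha_v: float = 0.1,  # value learning rate (assumed default, not from the paper)
    alpha_r: float = 0.1,  # risk learning rate (assumed default, not from the paper)
    v0: float = 0.0,       # initial value estimate (assumption)
    h0: float = 0.0,       # initial risk estimate (assumption)
) -> Tuple[List[float], List[float]]:
    """Return the updated value and risk estimates after each trial."""
    v, h = v0, h0
    values: List[float] = []
    risks: List[float] = []
    for r in outcomes:
        delta = r - v               # Eq. 2: value prediction error
        epsilon = delta ** 2 - h    # Eq. 4: variance prediction error
        v = v + alpha_v * delta     # Eq. 1: value update
        h = h + alpha_r * epsilon   # Eq. 3: risk (variance) update
        values.append(v)
        risks.append(h)
    return values, risks


if __name__ == "__main__":
    vals, rsk = track_value_and_risk([1.0, 1.0], alpha_v=0.5, alpha_r=0.5)
    print(vals, rsk)
```

With two identical outcomes of 1.0 and both learning rates set to 0.5, the value estimate moves toward the outcome while the risk estimate first rises and then begins to decay, because the squared prediction error shrinks as the prediction improves.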