model.OpALS module
Author: Dominic Hunt
Reference: Based on the paper Opponent actor learning (OpAL): Modeling interactive effects of striatal dopamine on reinforcement learning and choice incentive. Collins, A. G. E., & Frank, M. J. (2014). Psychological Review, 121(3), 337–66. doi:10.1037/a0037015
-
class model.OpALS.OpALS(alpha=0.3, beta=4, rho=0, saturateVal=10, invBeta=None, alphaCrit=None, betaGo=None, betaNogo=None, alphaGo=None, alphaNogo=None, alphaGoDiff=None, alphaNogoDiff=None, alphaGoNogoDiff=None, expect=None, expectGo=None, **kwargs)
Bases: model.modelTemplate.Model
The Opponent actor learning model, modified to have saturation values.
The saturation values are the same for the actor and critic learners.
-
Name
The name of the class used when recording what has been used.
Type: string
-
currAction
The current action chosen by the model. Used to pass the participant action to the model when fitting.
Type: int
Parameters: - alpha (float, optional) – Learning rate parameter, used as either the
- alphaGoNogoDiff (float, optional) – The difference between alphaGo and alphaNogo. Default is None. If not None, it will overwrite alphaNogo: \(\alpha_N = \alpha_G - \alpha_\delta\)
- alphaCrit (float, optional) – The critic learning rate. Default is alpha
- alphaGo (float, optional) – Learning rate parameter for Go, the positive part of the actor learning. Default is alpha
- alphaNogo (float, optional) – Learning rate parameter for Nogo, the negative part of the actor learning. Default is alpha
- alphaGoDiff (float, optional) – The difference between alphaCrit and alphaGo. The default is None. If not None and alphaNogoDiff is also not None, it will overwrite the alphaGo parameter: \(\alpha_G = \alpha_C + \alpha_{\delta G}\)
- alphaNogoDiff (float, optional) – The difference between alphaCrit and alphaNogo. The default is None. If not None and alphaGoDiff is also not None, it will overwrite the alphaNogo parameter: \(\alpha_N = \alpha_C + \alpha_{\delta N}\)
- beta (float, optional) – Sensitivity parameter for probabilities. Also known as an exploration-exploitation parameter. Defined as \(\beta\) in the paper
- invBeta (float, optional) – Inverse of sensitivity parameter for the probabilities. Defined as \(\frac{1}{\beta+1}\). Default 0.2
- rho (float, optional) – The asymmetry between the actor weights. \(\rho = \beta_G - \beta = \beta_N + \beta\)
- number_actions (integer, optional) – The maximum number of valid actions the model can expect to receive. Default 2.
- number_cues (integer, optional) – The initial maximum number of stimuli the model can expect to receive. Default 1.
- number_critics (integer, optional) – The number of different reaction learning sets. Default number_actions*number_cues
- action_codes (dict with string or int as keys and int values, optional) – A dictionary used to convert between the action references used by the task or dataset and references used in the models to describe the order in which the action information is stored.
- prior (array of floats in [0, 1], optional) – The prior probability of the states being the correct one. Default ones((number_actions, number_cues)) / number_critics
- expect (array of floats, optional) – The initialisation of the expected reward. Default ones((number_actions, number_cues)) / number_critics
- expectGo (array of floats, optional) – The initialisation of the expected Go and Nogo values. Default ones((number_actions, number_cues)) / number_critics
- saturateVal (float, optional) – The saturation value for the model. Default is 10
- stimFunc (function, optional) – The function that transforms the stimulus into a form the model can understand and a string to identify it later. Default is blankStim
- rewFunc (function, optional) – The function that transforms the reward into a form the model can understand. Default is blankRew
- decFunc (function, optional) – The function that takes the internal values of the model and turns them into a decision. Default is model.decision.discrete.weightProb
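The defaults and overrides described for the learning rates and for invBeta can be sketched as below. This is a minimal illustration built only from the parameter descriptions above, not the class's actual constructor: `resolve_parameters` is a hypothetical helper, and both the order in which the overrides are applied and the assumption that a supplied invBeta replaces beta are guesses.

```python
def resolve_parameters(alpha=0.3, beta=4, invBeta=None, alphaCrit=None,
                       alphaGo=None, alphaNogo=None, alphaGoDiff=None,
                       alphaNogoDiff=None, alphaGoNogoDiff=None):
    """Hypothetical helper: apply the documented defaults and overrides."""
    # invBeta is documented as 1 / (beta + 1), so beta = 1 / invBeta - 1.
    # Assumption: a supplied invBeta replaces beta.
    if invBeta is not None:
        beta = 1.0 / invBeta - 1.0

    # Each learning rate falls back to alpha when it is not given.
    alphaCrit = alpha if alphaCrit is None else alphaCrit
    alphaGo = alpha if alphaGo is None else alphaGo
    alphaNogo = alpha if alphaNogo is None else alphaNogo

    # alphaGoNogoDiff overwrites alphaNogo: alpha_N = alpha_G - alpha_delta.
    if alphaGoNogoDiff is not None:
        alphaNogo = alphaGo - alphaGoNogoDiff

    # The critic-relative differences only apply when both are set.
    # Assumption: they are applied last and so take precedence.
    if alphaGoDiff is not None and alphaNogoDiff is not None:
        alphaGo = alphaCrit + alphaGoDiff
        alphaNogo = alphaCrit + alphaNogoDiff

    return {"beta": beta, "alphaCrit": alphaCrit,
            "alphaGo": alphaGo, "alphaNogo": alphaNogo}


defaults = resolve_parameters()
from_inverse = resolve_parameters(invBeta=0.2)
asymmetric = resolve_parameters(alphaGoDiff=0.1, alphaNogoDiff=-0.1)
```

With the documented default invBeta of 0.2, the relation gives a beta of 4, which matches the default beta in the signature.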
Notes
Critic: The expectation for the chosen action is updated with
\[ \begin{align}\begin{aligned}\delta_{d,t} = r_t-E_{d,t}\\E_{d,t+1} = E_{d,t} + \alpha_E \delta_{d,t} (1-\frac{E_{d,t}}{S})\end{aligned}\end{align} \]
Actor: The Go and Nogo weights for the chosen action are updated with
\[ \begin{align}\begin{aligned}G_{d,t+1} = G_{d,t} + \alpha_G G_{d,t} \delta_{d,t} (1-\frac{G_{d,t}}{S})\\N_{d,t+1} = N_{d,t} - \alpha_N N_{d,t} \delta_{d,t} (1-\frac{N_{d,t}}{S})\end{aligned}\end{align} \]
Probabilities: The probabilities for all actions are calculated using
\[ \begin{align}\begin{aligned}A_{d,t} = (1+\rho) G_{d,t}-(1-\rho) N_{d,t}\\P_{d,t} = \frac{ e^{\beta A_{d,t} }}{\sum_{d \in D}e^{\beta A_{d,t}}}\end{aligned}\end{align} \]
Here \(S\) is the saturation value saturateVal.
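The update and probability equations in these notes can be sketched in plain Python as follows. This is an illustration of the formulas only, not the module's implementation: the function names `opals_step` and `action_probabilities` are hypothetical, the values are held in flat lists indexed by action (a single cue is assumed), and \(S\) is taken to be saturateVal.

```python
import math


def opals_step(E, G, N, action, reward,
               alphaCrit=0.3, alphaGo=0.3, alphaNogo=0.3, saturateVal=10.0):
    """Hypothetical helper: one saturating update for the chosen action."""
    E, G, N = list(E), list(G), list(N)  # leave the caller's lists untouched
    # Prediction error: delta = r - E
    delta = reward - E[action]
    # Expectation update, damped as E approaches the saturation value
    E[action] += alphaCrit * delta * (1 - E[action] / saturateVal)
    # Go and Nogo weight updates, scaled by their own current values
    G[action] += alphaGo * G[action] * delta * (1 - G[action] / saturateVal)
    N[action] -= alphaNogo * N[action] * delta * (1 - N[action] / saturateVal)
    return E, G, N, delta


def action_probabilities(G, N, beta=4.0, rho=0.0):
    """Hypothetical helper: softmax over A = (1 + rho) G - (1 - rho) N."""
    A = [(1 + rho) * g - (1 - rho) * n for g, n in zip(G, N)]
    # Subtracting the maximum leaves the softmax unchanged but avoids overflow
    top = max(A)
    weights = [math.exp(beta * (a - top)) for a in A]
    total = sum(weights)
    return [w / total for w in weights]


# Two actions, one cue, everything initialised to 1 / number_critics = 0.5
start = [0.5, 0.5]
E1, G1, N1, delta1 = opals_step(start, start, start, action=0, reward=1.0)
probs = action_probabilities(G1, N1)
```

After a reward larger than expected, the Go weight of the chosen action rises and its Nogo weight falls, so that action becomes more probable on the next trial.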
actorStimulusProbs()
Calculates in the model-appropriate way the probability of each action.
Returns: probabilities – The probabilities associated with the action choices
Return type: 1D ndArray of floats
-
calcProbabilities(actionValues)
Calculate the probabilities associated with the actions
Parameters: actionValues (1D ndArray of floats) –
Returns: probArray – The probabilities associated with the actionValues
Return type: 1D ndArray of floats
-
delta(reward, expectation, action, stimuli)
Calculates the comparison between the reward and the expectation
Parameters: Returns: Return type: delta
-
returnTaskState()
Returns all the relevant data for this model
Returns: results – The dictionary contains a series of keys including Name, Probabilities, Actions and Events.
Return type: dict
-
rewardExpectation(observation)
Calculate the estimated reward based on the action and stimuli
This contains parts that are task dependent
Parameters: observation ({int | float | tuple}) – The set of stimuli
Returns: - actionExpectations (array of floats) – The expected rewards for each action
- stimuli (list of floats) – The processed observations
- activeStimuli (list of [0, 1] mapping to [False, True]) – A list of the stimuli that were or were not present
-
storeState()
Stores the state of all the important variables so that they can be accessed later
-
updateModel(delta, action, stimuli, stimuliFilter)
Parameters: - delta (float) – The difference between the reward and the expected reward
- action (int) – The action chosen by the model in this trialstep
- stimuli (list of float) – The weights of the different stimuli in this trialstep
- stimuliFilter (list of bool) – A list describing if a stimulus cue is present in this trialstep
-