Learning a sequence of motor responses to attain reward: a speed-accuracy trade-off

Cos, Ignasi; Rueda-Orozco, Pavel; Robbe, David; Girard, Benoît

doi:10.1186/1471-2202-14-S1-P143

Volume 14 Supplement 1

Abstracts from the Twenty Second Annual Computational Neuroscience Meeting: CNS*2013

Poster presentation
Open access
Published: 08 July 2013

BMC Neuroscience volume 14, Article number: P143 (2013)

The study of decision-making between goal-directed actions in rodents has often been based on experimental tasks in which animals were trained to perform specific sequences of actions, such as lever presses or nose pokes [4], to attain reward. This supported the hypothesis that reinforcement learning is the mechanism underlying the acquisition of those behavioural sequences, putatively implemented by the basal ganglia circuitry [1, 3].

However, experimental evidence suggests that when motor responses become more complex and subject to timing constraints, behaviour reflects not only reward-related costs but also a compromise between the motor factors relevant to the task and the timing requirements for attaining the goal [6]. To investigate this further, we took advantage of a new behavioral protocol in which rats running on a treadmill must estimate a fixed temporal interval to obtain a reward [5]. Interestingly, rats became proficient in this task by developing highly stereotyped running trajectories. These precise running kinematics were established progressively, through a trial-and-error process lasting 2 to 3 months. At that point, if the treadmill length was shortened, the animals persisted in reproducing the previously learned kinematics, even though doing so meant they no longer received reward. This is consistent with the idea that these stereotyped running kinematics constitute a motor habit [8].

To provide a theoretical account of these results, we developed a model-free reinforcement learning model [7]. We excluded model-based algorithms because the rats were unable to exploit previously learned behavior to accelerate their learning when the task changed. The distinctive feature of this model is that it counts reward delivery as a positive reward and the effort generated at each time step as a negative reward. The problem is thus a speed-accuracy trade-off: the goal of the model is to generate the motor sequence that optimizes the ratio of discounted reward to effort. The main result shows that, as long as local time and speed are included in the characterization of the kinematic state, the model reproduces the stereotyped motor sequences observed in the rats. This suggests that these two pieces of information are required to learn time-constrained motor sequences, and predicts that if a brain structure learns these habitual sequences as the model does (we suggest the sensorimotor circuits of the basal ganglia [2]), its activity should correlate with these same variables throughout the entire sequence.
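The abstract does not report the model's state discretization, action set, or reward values, so the following is only a minimal sketch under stated assumptions. It uses a hypothetical toy version of the treadmill task: a discretized belt, a reward zone at the front, and a reward only if the agent arrives no earlier than a target step. The learner is tabular Q-learning, one standard model-free method described in [7]; the abstract does not say which model-free algorithm the authors used. Reward enters positively and per-step effort (assumed here to be proportional to squared speed) enters as a negative reward, so the agent maximizes discounted reward minus effort, which is one way to read the trade-off described above rather than a literal ratio. All names and constants (`N_POS`, `T_TARGET`, `EFFORT_COST`, `step`, `train`, `greedy_rollout`) are illustrative inventions.

```python
import numpy as np

# Hypothetical toy task: every constant below is an illustrative assumption,
# not a parameter reported in the abstract.
N_POS = 6                # discretized belt cells 0..5
GOAL = N_POS - 1         # cell 5 plays the role of the reward zone
T_TARGET = 4             # earliest rewarded arrival step (stand-in for the fixed interval)
T_MAX = 8                # the trial times out after this many steps
MAX_SPEED = 2            # speeds 0, 1, 2 (cells per step)
ACTIONS = (-1, 0, 1)     # decelerate, keep speed, accelerate
REWARD = 10.0            # positive reward for an on-time arrival
EFFORT_COST = 0.1        # assumed effort penalty per unit of squared speed per step


def step(pos, t, speed, a_idx):
    """Deterministic transition: returns (pos, t, speed, reward, done)."""
    speed = max(0, min(MAX_SPEED, speed + ACTIONS[a_idx]))
    pos = min(pos + speed, GOAL)
    t += 1
    reward = -EFFORT_COST * speed ** 2   # effort enters as a negative reward
    if pos == GOAL:
        if t >= T_TARGET:
            reward += REWARD             # only on-time arrivals are rewarded
        return pos, t, speed, reward, True   # any arrival ends the trial
    return pos, t, speed, reward, t >= T_MAX


def train(episodes=20000, alpha=0.5, gamma=0.95, seed=0):
    """Tabular Q-learning over the state (position, local time, speed)."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_POS, T_MAX + 1, MAX_SPEED + 1, len(ACTIONS)))
    for ep in range(episodes):
        # epsilon-greedy exploration, decaying linearly from 1.0 to a floor of 0.05
        eps = max(0.05, 1.0 - ep / (0.8 * episodes))
        pos, t, speed, done = 0, 0, 0, False
        while not done:
            if rng.random() < eps:
                a = int(rng.integers(len(ACTIONS)))
            else:
                a = int(np.argmax(q[pos, t, speed]))
            npos, nt, nspeed, r, done = step(pos, t, speed, a)
            # off-policy TD target: bootstrap from the best next action
            target = r if done else r + gamma * np.max(q[npos, nt, nspeed])
            q[pos, t, speed, a] += alpha * (target - q[pos, t, speed, a])
            pos, t, speed = npos, nt, nspeed
    return q


def greedy_rollout(q):
    """Follow the learned greedy policy once from the start state."""
    pos, t, speed, done = 0, 0, 0, False
    speeds, total = [], 0.0
    while not done:
        a = int(np.argmax(q[pos, t, speed]))
        pos, t, speed, r, done = step(pos, t, speed, a)
        speeds.append(speed)
        total += r
    return speeds, pos, t, total
```

Calling `q = train()` and then `greedy_rollout(q)` returns the learned speed profile, the final position, the arrival step `t`, and the undiscounted total reward. Indexing the Q-table by local time `t` as well as by position and speed is what lets this toy agent distinguish a premature arrival from an on-time one; with `t` dropped from the state, the same position and speed would have to map to a single action regardless of elapsed time, which loosely mirrors the point above that time and speed must be part of the kinematic state.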

References

1. Houk JC, Adams JL, Barto AG: A model of how the basal ganglia generate and use neural signals that predict reinforcement. Models of Information Processing in the Basal Ganglia. Edited by: Houk JC, Davis JL, Beiser DG. 1995, Cambridge (MA): The MIT Press, 249-270.
2. Khamassi M, Humphries MD: Integrating cortico-limbic-basal ganglia architectures for learning model-based and model-free navigation strategies. Front Behav Neurosci. 2012, 6.
3. Khamassi M, Lachèze L, Girard B, Berthoz A, Guillot A: Actor-Critic models of reinforcement learning in the basal ganglia: from natural to artificial rats. Adapt Behav. 2005, 13 (2): 131-148. 10.1177/105971230501300205.
4. Roesch MR, Calu DJ, Schoenbaum G: Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nature Neuroscience. 10 (12): 1615-1624.
5. Rueda-Orozco P, Robbe D: Striatal ensembles continuously represent animals' kinematics and limb movement dynamics during execution of a locomotor habit. Submitted.
6. Shadmehr R, Smith MA, Krakauer JW: Error correction, sensory prediction, and adaptation in motor control. Ann Rev Neurosci. 2010, 33: 89-108. 10.1146/annurev-neuro-060909-153135.
7. Sutton RS, Barto AG: Reinforcement Learning: An Introduction. 1998, Cambridge, MA: MIT Press.
8. Yin HH, Knowlton BJ: The role of the basal ganglia in habit formation. Nature Reviews Neuroscience. 2006, 7 (6): 464-476. 10.1038/nrn1919.

Author information

Authors and Affiliations

ISIR, Université Pierre et Marie Curie, Paris, 75005, France
Ignasi Cos & Benoît Girard
UMR-7222, CNRS, Paris, 75005, France
Ignasi Cos & Benoît Girard
Institut de Neurobiologie de la Méditerranée (INMED), INSERM, Marseille, 13273, France
Pavel Rueda-Orozco & David Robbe

Corresponding author

Correspondence to Ignasi Cos.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Cos, I., Rueda-Orozco, P., Robbe, D. et al. Learning a sequence of motor responses to attain reward: a speed-accuracy trade-off. BMC Neurosci 14 (Suppl 1), P143 (2013). https://doi.org/10.1186/1471-2202-14-S1-P143
