A network model that can learn reward timing using reinforced expression of synaptic plasticity

Gavornik, Jeffrey P; Loewenstein, Yonatan; Shouval, Harel Z

doi:10.1186/1471-2202-8-S2-P103

Volume 8 Supplement 2

Sixteenth Annual Computational Neuroscience Meeting: CNS*2007

Poster presentation
Open access
Published: 06 July 2007

Jeffrey P Gavornik^1,2, Yonatan Loewenstein^3 & Harel Z Shouval^1

BMC Neuroscience volume 8, Article number: P103 (2007)

Recent experimental results indicate that cells within the primary visual cortex can learn to predict the time of rewards associated with visual cues [1]. In those experiments, different visual cues were paired with rewards at specific temporal offsets. Before training, neurons in visual cortex were active only for the duration of the visual cue. After sufficient training, neurons developed persistent activity for a period correlated with the timing of the reward.

Recurrent connections in a neural network can be constructed to give the network a desired time constant that differs from the time constants of its constituent neurons. However, it is not known how such a network can learn the appropriate recurrent weights. A plasticity model able to accomplish this must be sensitive to the timing of reward events that, at least initially, occur seconds after activity in the network has returned to its basal level. To learn the appropriate dynamics, the network therefore needs to solve a temporal credit assignment problem. In our model, plasticity is an ongoing process that changes the recurrent synaptic weights as a function of their activity; in the absence of a reward signal, these changes rapidly decay. External reward signals allow permanent expression of the preceding plasticity events, reinforcing only those that predict the reward. As a result, the network dynamics are altered and the network develops time constants correlated with the timing of the different rewards. As in other reinforcement learning models, the reward signal is inhibited by network activity, which produces a stable activity pattern.
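
The mechanism described above can be illustrated with a minimal Python sketch. It is not the authors' model: it assumes a single linear rate unit with recurrent weight w (so that its decay time constant is tau / (1 - w)), a decaying plasticity trace, a reward signal that shrinks as activity at reward time approaches a threshold, and a soft bound that keeps w below 1. All names and parameter values (TAU, TAU_E, THETA, ETA, run_trial, train) are hypothetical choices made for illustration.

```python
import math

TAU = 0.02    # intrinsic time constant of the rate unit, seconds (assumed)
DT = 0.001    # Euler integration step, seconds
STIM = 0.1    # duration of the visual cue, seconds (assumed)
TAU_E = 0.5   # decay time of the transient plasticity trace, seconds (assumed)
THETA = 0.5   # activity level that fully inhibits the reward signal (assumed)
ETA = 1.0     # learning rate (assumed)


def effective_tau(w):
    """Decay time constant of a linear unit with recurrent weight w < 1."""
    return TAU / (1.0 - w)


def run_trial(w, t_reward):
    """Simulate one cue presentation; return rate and trace at reward time."""
    r = 0.0  # firing rate, clipped to [0, 1]
    e = 0.0  # activity-dependent plasticity trace; decays without reward
    stim_steps = round(STIM / DT)
    for k in range(round(t_reward / DT)):
        drive = 1.0 if k < stim_steps else 0.0
        r += DT / TAU * (-r + w * r + drive)
        r = min(max(r, 0.0), 1.0)
        e += DT * (-e / TAU_E + r)
    return r, e


def train(t_reward, n_trials=500):
    """Repeat cue-reward pairings and return the learned recurrent weight."""
    w = 0.0
    for _ in range(n_trials):
        r, e = run_trial(w, t_reward)
        # Reward is inhibited by the activity still present at reward time.
        reward = max(0.0, 1.0 - r / THETA)
        # Reward converts the transient trace into a lasting weight change;
        # the (1 - w) factor is an assumed soft bound keeping w below 1.
        w += ETA * reward * e * (1.0 - w)
    return w


if __name__ == "__main__":
    for t_reward in (0.5, 1.0):
        w = train(t_reward)
        print(f"reward at {t_reward:.1f} s: w = {w:.4f}, "
              f"effective tau = {effective_tau(w):.2f} s")
```

Under these assumptions, learning stops once activity at the time of reward reaches THETA, so the learned time constant should settle near (t_reward - STIM) / ln(1 / THETA) and therefore grow with the reward delay.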

We have implemented these ideas both in abstract passive integrator networks and in more realistic integrate-and-fire networks, and obtained results that are qualitatively similar to the experimental results. Further, we examine the implications of different possible biophysical mechanisms and propose experiments to test which specific mechanisms are involved.

Support: NSF CRCNS grant number 0515285.

References

1. Shuler MG, Bear MF: Reward timing in the primary visual cortex. Science. 2006, 311: 1606-1609. 10.1126/science.1123513.

Author information

Authors and Affiliations

Department of Neurobiology and Anatomy, the University of Texas Medical School in Houston, TX, USA
Jeffrey P Gavornik & Harel Z Shouval
Department of Electrical and Computer Engineering, the University of Texas, Austin, TX, USA
Jeffrey P Gavornik
Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
Yonatan Loewenstein

Corresponding author

Correspondence to Jeffrey P Gavornik.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Gavornik, J.P., Loewenstein, Y. & Shouval, H.Z. A network model that can learn reward timing using reinforced expression of synaptic plasticity. BMC Neurosci 8 (Suppl 2), P103 (2007). https://doi.org/10.1186/1471-2202-8-S2-P103

Published: 06 July 2007
DOI: https://doi.org/10.1186/1471-2202-8-S2-P103
