Abstract

Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain [1-3]. According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning [4-6]. We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.
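The core mechanism referenced in the abstract, a population of value predictors that together encode a reward distribution rather than its mean, can be illustrated with a small toy simulation. The Python sketch below is not the authors' code; the channel count, the asymmetry parameters `taus`, the base learning rate `alpha`, and the bimodal `sample_reward` distribution are all assumptions chosen for illustration. Each predictor applies a different learning rate to positive versus negative prediction errors, so each converges to a different expectile of the reward distribution, and the population as a whole describes the distribution's shape.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy population of value predictors. Each channel i has an asymmetry tau_i:
# positive prediction errors are scaled by tau_i, negative ones by (1 - tau_i),
# so channel i converges toward the tau_i-th expectile of the reward distribution.
n_channels = 9
taus = np.linspace(0.1, 0.9, n_channels)   # per-channel asymmetry (assumed values)
alpha = 0.02                               # base learning rate (assumed value)
values = np.zeros(n_channels)              # per-channel value predictions

def sample_reward():
    # Bimodal reward distribution: a single scalar mean would hide its structure.
    return rng.normal(-1.0, 0.3) if rng.random() < 0.5 else rng.normal(2.0, 0.3)

for _ in range(50_000):
    r = sample_reward()
    delta = r - values                      # per-channel prediction errors
    scale = np.where(delta > 0, taus, 1.0 - taus)
    values += alpha * scale * delta         # asymmetrically scaled update

print("channel expectiles:", np.round(values, 2))
print("mean-like prediction (tau = 0.5 channel):", round(values[n_channels // 2], 2))
```

Running the sketch, channels with small tau settle near the low reward mode and channels with large tau near the high mode, while the tau = 0.5 channel recovers the classical mean prediction, which is the sense in which the distributional code subsumes the scalar one.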

Details

Title
A distributional code for value in dopamine-based reinforcement learning
Author
Dabney, Will [1]; Kurth-Nelson, Zeb [1]; Uchida, Naoshige [2]; Starkweather, Clara Kwon [2]; Hassabis, Demis [1]; Munos, Rémi; Botvinick, Matthew

[1] DeepMind, London, UK
[2] Center for Brain Science, Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
Pages
671-672, 675A-675N
Section
Article
Publication year
2020
Publication date
Jan 30, 2020
Publisher
Nature Publishing Group
ISSN
0028-0836
e-ISSN
1476-4687
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2353080006
Copyright
Copyright Nature Publishing Group Jan 30, 2020