Scientific article
Open access

A local temporal difference code for distributional reinforcement learning

Publication date: 2020

The successor representation (SR) allows for efficient and flexible value computation by representing states with their expected temporal evolution. Here, we present two extensions of the SR theory: to belief states and to distributions over value. When states are partially observed, an optimal agent should rely on belief states indicating the probability of being in different states. We first present an analytical expression for the SR of belief states, SR(b), for fixed policies and for decision-making problems with only one decision. We show that this expression also provides a good approximation to SR(b) in problems with multiple decisions, such as noisy 2D navigation tasks. We then propose a neural network that approaches the optimal SR(b) in tasks with multiple decisions. Next, we extend the SR to distributions over value. In the process, we propose a new local code for distributional reinforcement learning which allows agents to recover the value distribution of a state given its SR, as well as the expected temporal evolution of the value distribution. Finally, we combine these advances into an SR model that jointly accounts for uncertainty over states and value, implemented in a biologically plausible neural network.
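As a concrete illustration of the first sentence of the abstract, the following minimal sketch computes the standard SR of a small Markov chain under a fixed policy and reads state values off it. The three-state chain, the reward vector, the discount factor, and all function names are hypothetical values chosen for illustration, not taken from the article. The final lines average SR rows under a belief vector; this relies on the assumption that, for a fixed policy over hidden states, expected future occupancy is linear in the belief, and it is not claimed to be the analytical expression derived in the article.

```python
# Minimal sketch of the successor representation (SR) for a fixed policy.
# The 3-state chain, rewards, and discount are made-up illustration values.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def successor_representation(P, gamma, n_iter=500):
    """Iterate M <- I + gamma * P M, which converges to sum_t gamma^t P^t."""
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(n_iter):
        PM = matmul(P, M)
        M = [[identity[i][j] + gamma * PM[i][j] for j in range(n)]
             for i in range(n)]
    return M

# Hypothetical transitions under a fixed policy: 0 -> 1 -> 2, state 2 absorbing.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
reward = [0.0, 0.0, 1.0]  # reward collected in each state
gamma = 0.5

M = successor_representation(P, gamma)

# The value of each state is its SR row dotted with the reward vector.
value = [sum(M[s][j] * reward[j] for j in range(3)) for s in range(3)]

# Assumption (not from the article): with a fixed policy, a belief-weighted
# average of SR rows serves as a stand-in for the SR of a belief state.
belief = [0.5, 0.5, 0.0]
sr_b = [sum(belief[s] * M[s][j] for s in range(3)) for j in range(3)]
value_b = sum(sr_b[j] * reward[j] for j in range(3))
```

Because the SR separates the transition structure from the reward vector, changing `reward` only requires recomputing the dot products, not the SR itself.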

Citation (ISO format)
TANO RETAMALES, Pablo Ernesto, DAYAN, Peter, POUGET, Alexandre. A local temporal difference code for distributional reinforcement learning. In: Advances in Neural Information Processing Systems, 2020, vol. 33, p. 1–12.
Main files (1)
Article (Published version)
  • PID : unige:149080
ISSN of the journal: 1049-5258

Technical information

Creation: 11/01/2020 7:49:00 PM
First validation: 11/01/2020 7:49:00 PM
Update time: 03/16/2023 12:02:25 AM
Status update: 03/16/2023 12:02:23 AM
Last indexation: 01/17/2024 12:25:37 PM
All rights reserved by Archive ouverte UNIGE and the University of Geneva