Welcome to the UPF Digital Repository

Uncertainty-based decision-making in reinforcement learning and the distributed adaptive control cognitive architecture


dc.contributor.author Talavante Díaz, Pablo
dc.date.accessioned 2021-11-12T11:08:24Z
dc.date.available 2021-11-12T11:08:24Z
dc.date.issued 2021-07
dc.identifier.uri http://hdl.handle.net/10230/48970
dc.description Treball fi de màster de: Master in Cognitive Systems and Interactive Media
dc.description Directors: Adrián Fernández Amil, Ismael Tito Freire
dc.description.abstract This thesis explores the role of uncertainty estimation during training in Reinforcement Learning as a potential way of increasing sample efficiency, acting as a regulator between two subsystems that shape a policy: memory and stimulus-response. Memory-based subsystems are related to Episodic Reinforcement Learning, where exact snapshots or sequences of tuples generated during training are stored and later retrieved to perform the action that maximizes reward based solely on these past experiences. This way of learning is more closely related to how the hippocampus operates in the brain. In contrast, stimulus-response subsystems can be expressed as models that map states to actions in a model-free fashion. In humans and other animals, the dorsal striatum is responsible for this stimulus-response mapping. However, this mapping does not take into account the inherent uncertainty or variability of stimuli (i.e., perceptual uncertainty) in stochastic, partially observable environments, so the optimal policy may sometimes rely more on the sequential nature of (model-based) memory. Several studies have shown that uncertainty plays a significant role in decision-making; we therefore studied how it can arbitrate between the two systems. Concretely, we used an agent based on the Distributed Adaptive Control (DAC-ML) cognitive architecture, comprising the two subsystems and an arbitration module that regulated their respective use based on the entropies of their policies. The agent was trained on a foraging task and showed dynamics aligned with human behaviour: the memory-based system dominates at first, and over the course of training the stimulus-response system slowly takes over. This research could lead to more flexible and efficient Reinforcement Learning algorithms that combine different ways of learning and operating depending on the available knowledge about the environment.
dc.format.mimetype application/pdf
dc.language.iso eng
dc.rights This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Spain License
dc.rights.uri https://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.title Uncertainty-based decision-making in reinforcement learning and the distributed adaptive control cognitive architecture
dc.type info:eu-repo/semantics/masterThesis
dc.subject.keyword Reinforcement Learning
dc.subject.keyword Episodic Control
dc.subject.keyword Policy Entropy
dc.subject.keyword Sample inefficiency problem
dc.subject.keyword Distributed Adaptive Control
dc.rights.accessRights info:eu-repo/semantics/openAccess
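The abstract describes an arbitration module that weighs a memory-based policy against a stimulus-response policy according to their entropies. The sketch below illustrates one way such entropy-based arbitration could work. It is not the DAC-ML arbitration module from the thesis: the softmax-over-negative-entropy rule, the `temperature` parameter, and the function names `policy_entropy`, `arbitration_weights` and `select_action` are hypothetical choices made for illustration only.

```python
import numpy as np


def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    p = np.asarray(probs, dtype=float)
    p = p / p.sum()          # normalise defensively
    nz = p[p > 0]            # treat 0 * log(0) as 0
    return float(-(nz * np.log(nz)).sum())


def arbitration_weights(memory_probs, sr_probs, temperature=1.0):
    """Hypothetical rule: softmax over negative policy entropies, so the
    less uncertain subsystem receives the larger weight."""
    entropies = np.array([policy_entropy(memory_probs),
                          policy_entropy(sr_probs)])
    logits = -entropies / temperature
    logits -= logits.max()   # numerical stability before exponentiating
    w = np.exp(logits)
    return w / w.sum()       # [w_memory, w_stimulus_response]


def select_action(memory_probs, sr_probs, temperature=1.0):
    """Blend both policies with the entropy-based weights and act greedily."""
    w = arbitration_weights(memory_probs, sr_probs, temperature)
    mem = np.asarray(memory_probs, dtype=float)
    sr = np.asarray(sr_probs, dtype=float)
    mixed = w[0] * mem / mem.sum() + w[1] * sr / sr.sum()
    return int(np.argmax(mixed)), w


if __name__ == "__main__":
    memory = [0.9, 0.05, 0.05]    # confident episodic recall
    sr = [1 / 3, 1 / 3, 1 / 3]    # untrained stimulus-response policy
    action, w = select_action(memory, sr)
    print(action, w)
```

With these inputs the memory policy has the lower entropy, so it receives the larger weight (roughly 0.67 versus 0.33) and the greedy action is 0. Under this assumed rule, as the stimulus-response policy sharpens during training its entropy falls and the weight shifts towards it, which is qualitatively the kind of hand-over the abstract reports.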

