Welcome to the UPF Digital Repository

Reinforcement learning with options in semi-Markov decision processes


dc.contributor.author Goswami, Sayan
dc.date.accessioned 2021-12-15T12:43:41Z
dc.date.available 2021-12-15T12:43:41Z
dc.date.issued 2021-09
dc.identifier.uri http://hdl.handle.net/10230/49225
dc.description Tutors: Anders Jonsson and M. Sadegh Talebi
dc.description Master's thesis of the Master in Intelligent Interactive Systems
dc.description.abstract The options framework incorporates temporally extended actions (termed options) into the reinforcement learning paradigm. A wide variety of prior works experimentally illustrate the significance of options for the performance of a learning algorithm in complex domains. However, the work by Fruit et al. on the semi-Markov decision process (SMDP) version of the UCRL2 algorithm introduced a formal understanding of the circumstances that make options conducive to the performance of a learning algorithm. In this work we present our implementation of the algorithm proposed by Fruit et al. and perform experiments on a navigation task in a grid world domain. Using empirical Bernstein peeling as the confidence bound, we achieve a sub-linear trend in accumulated regret as well as a linear trend in accumulated reward in the grid world domain.
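The abstract's confidence bounds build on the empirical Bernstein inequality, which tightens a Hoeffding-style interval by using the observed sample variance. As a hedged illustration (this is a generic Maurer–Pontil-style bound, not the thesis's exact peeling construction; the function name and sample data are illustrative), the width of such a confidence interval can be sketched as:

```python
import math

def empirical_bernstein_width(samples, delta, b=1.0):
    """Empirical Bernstein confidence width (Maurer-Pontil style) for the
    mean of i.i.d. samples bounded in [0, b], at confidence level 1 - delta.
    The variance term shrinks like 1/sqrt(n); the range term like 1/n."""
    n = len(samples)
    mean = sum(samples) / n
    # Unbiased sample variance.
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return (math.sqrt(2 * var * math.log(2 / delta) / n)
            + 7 * b * math.log(2 / delta) / (3 * (n - 1)))

# Illustrative rewards observed for one option, bounded in [0, 1].
rewards = [0.2, 0.4, 0.1, 0.3, 0.5, 0.2, 0.4, 0.3]
width = empirical_bernstein_width(rewards, delta=0.05)
```

Because the bound scales with the empirical variance rather than only the range, low-variance options get tighter confidence intervals, which is one reason variance-aware bounds help in the UCRL-style optimistic planning the abstract refers to.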
dc.format.mimetype application/pdf
dc.language.iso eng
dc.rights Attribution-ShareAlike 4.0 International
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0
dc.title Reinforcement learning with options in semi-Markov decision processes
dc.type info:eu-repo/semantics/masterThesis
dc.subject.keyword Reinforcement learning
dc.subject.keyword Hierarchical reasoning
dc.subject.keyword Options framework
dc.subject.keyword Machine learning
dc.rights.accessRights info:eu-repo/semantics/openAccess

