Reinforcement learning with options in semi-Markov decision processes
Partial item record
- dc.contributor.author Goswami, Sayan
- dc.date.accessioned 2021-12-15T12:43:41Z
- dc.date.available 2021-12-15T12:43:41Z
- dc.date.issued 2021-09
- dc.description Tutors: Anders Jonsson and M. Sadegh Talebi
- dc.description Master's thesis of the Master in Intelligent Interactive Systems
- dc.description.abstract The options framework incorporates temporally extended actions (termed options) into the reinforcement learning paradigm. A wide variety of prior works experimentally illustrate the significance of options for the performance of a learning algorithm in complex domains. However, the work by Fruit et al. on the semi-Markov decision process (SMDP) version of the UCRL2 algorithm introduced a formal understanding of the circumstances that make options conducive to the performance of a learning algorithm. In this work we present our implementation of the algorithm proposed by Fruit et al. We perform experiments on a navigation task characterized by a grid world domain. We achieve a sub-linear trend in accumulated regret as well as a linear trend in accumulated reward in the grid world domain using empirical Bernstein peeling as the confidence bound.
- dc.format.mimetype application/pdf
- dc.identifier.uri http://hdl.handle.net/10230/49225
- dc.language.iso eng
- dc.rights Attribution-ShareAlike 4.0 International
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0
- dc.subject.keyword Reinforcement learning
- dc.subject.keyword Hierarchical reasoning
- dc.subject.keyword Options framework
- dc.subject.keyword Machine learning
- dc.title Reinforcement learning with options in semi-Markov decision processes
- dc.type info:eu-repo/semantics/masterThesis