Reinforcement learning with options in semi Markov decision processes

Goswami, Sayan

dc.contributor.author Goswami, Sayan
dc.date.accessioned 2021-12-15T12:43:41Z
dc.date.available 2021-12-15T12:43:41Z
dc.date.issued 2021-09
dc.description Supervisors: Anders Jonsson and M. Sadegh Talebi
dc.description Master's thesis for: Master in Intelligent Interactive Systems
dc.description.abstract The options framework incorporates temporally extended actions (termed options) into the reinforcement learning paradigm. A wide variety of prior work experimentally illustrates the significance of options for the performance of a learning algorithm in complex domains. However, the work by Fruit et al. on the semi-Markov decision process (SMDP) version of the UCRL2 algorithm introduced a formal understanding of the circumstances that make options conducive to the performance of a learning algorithm. In this work we present our implementation of the algorithm proposed by Fruit et al. We run experiments on a navigation task in a grid world domain. We achieve a sub-linear trend in accumulated regret as well as a linear trend in accumulated reward in the grid world domain using empirical Bernstein peeling as the confidence bound.
dc.format.mimetype application/pdf
dc.identifier.uri http://hdl.handle.net/10230/49225
dc.language.iso eng
dc.rights Attribution-ShareAlike 4.0 International
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0
dc.subject.keyword Reinforcement learning
dc.subject.keyword Hierarchical reasoning
dc.subject.keyword Options framework
dc.subject.keyword Machine learning
dc.title Reinforcement learning with options in semi Markov decision processes
dc.type info:eu-repo/semantics/masterThesis

Collections

Master's research theses