dc.contributor.author | Goswami, Sayan
dc.date.accessioned | 2021-12-15T12:43:41Z
dc.date.available | 2021-12-15T12:43:41Z
dc.date.issued | 2021-09
dc.identifier.uri | http://hdl.handle.net/10230/49225
dc.description | Tutors: Anders Jonsson and M. Sadegh Talebi
dc.description | Master's thesis of the Master in Intelligent Interactive Systems
dc.description.abstract | The options framework incorporates temporally extended actions (termed options) into the reinforcement learning paradigm. A wide variety of prior work experimentally illustrates the impact of options on the performance of a learning algorithm in complex domains. However, the work by Fruit et al. on the semi-Markov decision process (SMDP) version of the UCRL2 algorithm introduced a formal understanding of the circumstances that make options conducive to the performance of a learning algorithm. In this work we present our implementation of the algorithm proposed by Fruit et al. and evaluate it experimentally on a navigation task in a grid-world domain. Using empirical Bernstein peeling as the confidence bound, we achieve a sub-linear trend in accumulated regret as well as a linear trend in accumulated reward in the grid-world domain.
dc.format.mimetype | application/pdf
dc.language.iso | eng
dc.rights | Attribution-ShareAlike 4.0 International
dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0
dc.title | Reinforcement learning with options in semi-Markov decision processes
dc.type | info:eu-repo/semantics/masterThesis
dc.subject.keyword | Reinforcement learning
dc.subject.keyword | Hierarchical reasoning
dc.subject.keyword | Options framework
dc.subject.keyword | Machine learning
dc.rights.accessRights | info:eu-repo/semantics/openAccess