Reinforcement learning with options in semi Markov decision processes

dc.contributor.author: Goswami, Sayan
dc.date.accessioned: 2021-12-15T12:43:41Z
dc.date.available: 2021-12-15T12:43:41Z
dc.date.issued: 2021-09
dc.description: Tutors: Anders Jonsson and M. Sadegh Talebi
dc.description: Master's thesis for: Master in Intelligent Interactive Systems
dc.description.abstract: The options framework incorporates temporally extended actions (termed options) into the reinforcement learning paradigm. A wide variety of prior work experimentally illustrates the significance of options for the performance of a learning algorithm in complex domains. However, the work by Fruit et al. on the semi-Markov decision process (SMDP) version of the UCRL2 algorithm introduced a formal understanding of the circumstances that make options conducive to the performance of a learning algorithm. In this work we present our implementation of the algorithm proposed by Fruit et al. We run experiments on a navigation task in a grid world domain. We achieve a sub-linear trend in accumulated regret as well as a linear trend in accumulated reward in the grid world domain, using empirical Bernstein peeling as the confidence bound.
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10230/49225
dc.language.iso: eng
dc.rights: Attribution-ShareAlike 4.0 International
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.rights.uri: https://creativecommons.org/licenses/by-sa/4.0
dc.subject.keyword: Reinforcement learning
dc.subject.keyword: Hierarchical reasoning
dc.subject.keyword: Options framework
dc.subject.keyword: Machine learning
dc.title: Reinforcement learning with options in semi Markov decision processes
dc.type: info:eu-repo/semantics/masterThesis

Files

Original bundle

Name: TFM_Sayan.pdf
Size: 2.86 MB
Format: Adobe Portable Document Format
