Reinforcement learning with options in semi-Markov decision processes


  • dc.contributor.author Goswami, Sayan
  • dc.date.accessioned 2021-12-15T12:43:41Z
  • dc.date.available 2021-12-15T12:43:41Z
  • dc.date.issued 2021-09
  • dc.description Tutors: Anders Jonsson and M. Sadegh Talebi
  • dc.description Master's thesis of the Master in Intelligent Interactive Systems
  • dc.description.abstract The options framework incorporates temporally extended actions (termed options) into the reinforcement learning paradigm. A wide variety of prior works experimentally illustrate the significance of options for the performance of a learning algorithm in complex domains. However, the work by Fruit et al. on the semi-Markov decision process (SMDP) version of the UCRL2 algorithm introduced a formal understanding of the circumstances that make options conducive to the performance of a learning algorithm. In this work we present our implementation of the algorithm proposed by Fruit et al. and experiment on a navigation task in a grid world domain. Using empirical Bernstein peeling as the confidence bound, we achieve a sub-linear trend in accumulated regret as well as a linear trend in accumulated reward in the grid world domain.
  • dc.format.mimetype application/pdf
  • dc.identifier.uri http://hdl.handle.net/10230/49225
  • dc.language.iso eng
  • dc.rights Attribution-ShareAlike 4.0 International
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.rights.uri https://creativecommons.org/licenses/by-sa/4.0
  • dc.subject.keyword Reinforcement learning
  • dc.subject.keyword Hierarchical reasoning
  • dc.subject.keyword Options framework
  • dc.subject.keyword Machine learning
  • dc.title Reinforcement learning with options in semi-Markov decision processes
  • dc.type info:eu-repo/semantics/masterThesis
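
For context on the technique named in the abstract: an empirical Bernstein bound is a variance-sensitive confidence interval (Audibert, Munos and Szepesvári, 2009), and "peeling" tightens it by applying the inequality over geometrically growing sample counts, which is what yields the uniform-in-time guarantees behind the sub-linear regret trend. The sketch below shows only the basic bound, not the thesis's actual implementation; the function name and example values are illustrative assumptions.

import math

def empirical_bernstein_radius(samples, value_range, delta):
    # Illustrative helper (not taken from the thesis). With probability
    # at least 1 - delta, the true mean of i.i.d. samples in
    # [0, value_range] lies within this radius of the empirical mean.
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n  # biased empirical variance
    log_term = math.log(3.0 / delta)
    return math.sqrt(2.0 * variance * log_term / n) + 3.0 * value_range * log_term / n

# Example: an optimistic upper bound on an option's mean reward, the kind
# of quantity an optimism-based algorithm such as UCRL2 maintains.
rewards = [0.0, 0.5, 1.0, 0.5, 0.5]
print(sum(rewards) / len(rewards) + empirical_bernstein_radius(rewards, 1.0, 0.05))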