Reinforcement learning with options in semi Markov decision processes
Abstract
The options framework incorporates temporally extended actions (termed options) into the reinforcement learning paradigm. A wide variety of prior works experimentally illustrate the significance of options for the performance of a learning algorithm in complex domains. However, the work by Fruit et al. on the semi-Markov Decision Process (SMDP) version of the UCRL2 algorithm introduced a formal understanding of the circumstances that make options conducive to the performance of a learning algorithm. In this work we present our implementation of the algorithm proposed by Fruit et al. We perform experiments on a navigation task set in a grid world domain. We achieve a sub-linear trend in accumulated regret as well as a linear trend in accumulated reward in the grid world domain using empirical Bernstein peeling as the confidence bound.
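For context on the confidence bound mentioned above, a generic two-sided empirical Bernstein inequality (in the form often attributed to Audibert et al.; constants vary across statements in the literature, and the peeled bound used in the thesis may take a refined form) states that for $n$ i.i.d. samples $X_1,\dots,X_n$ taking values in $[0,b]$, with empirical mean $\hat\mu_n$ and empirical variance $\hat\sigma_n^2$, with probability at least $1-\delta$,
\[
|\hat\mu_n - \mu| \le \sqrt{\frac{2\hat\sigma_n^2 \ln(3/\delta)}{n}} + \frac{3b\ln(3/\delta)}{n}.
\]
The variance-dependent first term is what makes such bounds tighter than Hoeffding-style bounds for low-variance quantities, such as the holding times and cumulative rewards of options in the SMDP setting.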
Description
Tutors: Anders Jonsson and M. Sadegh Talebi
Master's thesis: Master in Intelligent Interactive Systems