Reinforcement learning with options in semi Markov decision processes
| dc.contributor.author | Goswami, Sayan | |
| dc.date.accessioned | 2021-12-15T12:43:41Z | |
| dc.date.available | 2021-12-15T12:43:41Z | |
| dc.date.issued | 2021-09 | |
| dc.description | Tutors: Anders Jonsson and M. Sadegh Talebi | ca |
| dc.description | Master's thesis for: Master in Intelligent Interactive Systems | |
| dc.description.abstract | The options framework incorporates temporally extended actions (termed options) into the reinforcement learning paradigm. A wide variety of prior work experimentally illustrates the significance of options for the performance of a learning algorithm in complex domains. However, the work by Fruit et al. on the semi-Markov Decision Process (SMDP) version of the UCRL2 algorithm introduced a formal understanding of the circumstances that make options conducive to the performance of a learning algorithm. In this work, we present our implementation of the algorithm proposed by Fruit et al. We experiment on a navigation task in a grid world domain. We achieve a sub-linear trend in accumulated regret as well as a linear trend in accumulated reward in the grid world domain, using empirical Bernstein peeling as the confidence bound. | ca |
| dc.format.mimetype | application/pdf | * |
| dc.identifier.uri | http://hdl.handle.net/10230/49225 | |
| dc.language.iso | eng | ca |
| dc.rights | Attribution-ShareAlike 4.0 International | ca |
| dc.rights.accessRights | info:eu-repo/semantics/openAccess | ca |
| dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0 | ca |
| dc.subject.keyword | Reinforcement learning | |
| dc.subject.keyword | Hierarchical reasoning | |
| dc.subject.keyword | Options framework | |
| dc.subject.keyword | Machine learning | |
| dc.title | Reinforcement learning with options in semi Markov decision processes | ca |
| dc.type | info:eu-repo/semantics/masterThesis | ca |

