Action selection for MDPs: anytime AO* vs. UCT

dc.contributor.author: Bonet, Blai
dc.contributor.author: Geffner, Héctor
dc.date.accessioned: 2018-12-04T15:29:04Z
dc.date.available: 2018-12-04T15:29:04Z
dc.date.issued: 2012
dc.description: Paper presented at the 26th AAAI Conference on Artificial Intelligence, held in Toronto, Canada, July 22-26, 2012
dc.description.abstract: In the presence of non-admissible heuristics, A* and other best-first algorithms can be converted into anytime optimal algorithms over OR graphs by simply continuing the search after the first solution is found. The same trick, however, does not work for best-first algorithms over AND/OR graphs, which must be able to expand leaf nodes of the explicit graph that are not necessarily part of the best partial solution. Anytime optimal variants of AO* must thus address an exploration-exploitation tradeoff: they cannot just "exploit", they must keep exploring as well. In this work, we develop one such variant of AO* and apply it to finite-horizon MDPs. This Anytime AO* algorithm eventually delivers an optimal policy while using non-admissible random heuristics that can be sampled, as when the heuristic is the cost of a base policy that can be sampled with rollouts. We then test Anytime AO* for action selection over large infinite-horizon MDPs that cannot be solved with existing off-line heuristic search and dynamic programming algorithms, and compare it with UCT. [en]
dc.description.sponsorship: H. Geffner is partially supported by grants TIN2009-10232, MICINN, Spain, and EC-7PM SpaceBook. [en]
dc.format.mimetype: application/pdf
dc.identifier.citation: Bonet B, Geffner H. Action selection for MDPs: anytime AO* vs. UCT. In: Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence; 2012 Jul 22-26. Toronto, Canada. [Menlo Park, California]: AAAI; 2012. p. 1749-55.
dc.identifier.uri: http://hdl.handle.net/10230/35973
dc.language.iso: eng
dc.publisher: Association for the Advancement of Artificial Intelligence (AAAI)
dc.relation.ispartof: Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence; 2012 Jul 22-26. Toronto, Canada. [Menlo Park, California]: AAAI; 2012. p. 1749-55.
dc.relation.projectID: info:eu-repo/grantAgreement/EC/FP7/270019
dc.relation.projectID: info:eu-repo/grantAgreement/ES/3PN/TIN2009-10232
dc.rights: © 2012, Association for the Advancement of Artificial Intelligence (www.aaai.org)
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.title: Action selection for MDPs: anytime AO* vs. UCT
dc.type: info:eu-repo/semantics/conferenceObject
dc.type.version: info:eu-repo/semantics/acceptedVersion

Files

Original bundle

Name: geffner_AAAI2012_acti.pdf
Size: 250.78 KB
Format: Adobe Portable Document Format