Policy search for path integral control

Gómez, Vicenç; Kappen, Hilbert J.; Peters, Jan-Michael; Neumann, Gerhard

dc.contributor.author Gómez, Vicenç
dc.contributor.author Kappen, Hilbert J.
dc.contributor.author Peters, Jan-Michael
dc.contributor.author Neumann, Gerhard
dc.date.accessioned 2017-07-04T08:57:32Z
dc.date.available 2017-07-04T08:57:32Z
dc.date.issued 2014
dc.description Paper presented at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2014), held 15-19 September 2014 in Nancy, France.
dc.description.abstract Path integral (PI) control defines a general class of control problems for which the optimal control computation is equivalent to an inference problem that can be solved by evaluation of a path integral over state trajectories. However, this potential is mostly unused in real-world problems because of two main limitations: first, current approaches can typically only be applied to learn open-loop controllers and second, current sampling procedures are inefficient and not scalable to high dimensional systems. We introduce the efficient Path Integral Relative-Entropy Policy Search (PI-REPS) algorithm for learning feedback policies with PI control. Our algorithm is inspired by information theoretic policy updates that are often used in policy search. We use these updates to approximate the state trajectory distribution that is known to be optimal from the PI control theory. Our approach allows for a principled treatment of different sampling distributions and can be used to estimate many types of parametric or non-parametric feedback controllers. We show that PI-REPS significantly outperforms current methods and is able to solve tasks that are out of reach for current methods.
dc.description.sponsorship This work was supported by the European Community Seventh Framework Programme (FP7/2007-2013) under grant agreement 270327 (CompLACS).
dc.format.mimetype application/pdf
dc.identifier.citation Gómez V, Kappen HJ, Peters J, Neumann G. Policy search for path integral control. In: Calders T, Esposito F, Hüllermeier E, Meo R, editors. Machine learning and knowledge discovery in databases. European conference, ECML PKDD 2014; 2014 Sep 15-19; Nancy, France. [place unknown]: Springer; 2014. p. 482-97. DOI: 10.1007/978-3-662-44848-9_31
dc.identifier.doi http://dx.doi.org/10.1007/978-3-662-44848-9_31
dc.identifier.uri http://hdl.handle.net/10230/32501
dc.language.iso eng
dc.publisher Springer
dc.relation.ispartof Calders T, Esposito F, Hüllermeier E, Meo R, editors. Machine learning and knowledge discovery in databases. European conference, ECML PKDD 2014; 2014 Sep 15-19; Nancy, France. [place unknown]: Springer; 2014. p. 482-97.
dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/270327
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.subject.keyword Path integrals
dc.subject.keyword Stochastic optimal control
dc.subject.keyword Policy search
dc.title Policy search for path integral control
dc.type info:eu-repo/semantics/conferenceObject
dc.type.version info:eu-repo/semantics/acceptedVersion

Collections

Conferences (Department of Information and Communication Technologies)
OpenAIRE Documents (Open Access Infrastructure for Research in Europe)