dc.contributor.author | Cabrero Daniel, Beatriz
dc.date.accessioned | 2017-10-27T10:32:43Z
dc.date.available | 2017-10-27T10:32:43Z
dc.date.issued | 2017-07
dc.identifier.uri | http://hdl.handle.net/10230/33109
dc.description | Supervisor: Dr. Vicenç Gómez Cerdà; Co-Supervisor: Dr. Mario Ceresa
dc.description | Master's thesis for the Master in Intelligent Interactive Systems
dc.description.abstract | We consider the problem of computing optimal control policies in large-scale multi-agent systems, for which the standard approach via the Bellman equation is intractable. Our formulation is based on the Kullback-Leibler control framework, also known as Linearly-Solvable Markov Decision Problems. In this setting, adaptive importance sampling methods have been derived that, when combined with function approximation, can be effective for high-dimensional systems. Our approach iteratively learns an importance sampler from which the optimal control can be extracted; it requires simulating and reweighting the agents' trajectories in the world multiple times. We illustrate our approach through a modified version of the popular stag-hunt game, a scenario in which multiple optimal policies exist depending on the “temperature” parameter of the environment. The system is built on top of Pandora, a multi-agent-based modeling framework and parallelization toolbox, which frees us from dealing with memory management when running multiple simulations. By using function approximation and assuming a particular factorization of the system dynamics, we are able to scale up our method to problems with M = 12 agents moving on two-dimensional grids of size N = 21×21, improving on existing methods that perform approximate inference on a temporal probabilistic graphical model.
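As a rough illustration of the iterative scheme described in the abstract (simulate trajectories under the current importance sampler, reweight them by the exponentiated trajectory cost, refit the sampler), a minimal Python sketch of one iteration might look as follows; the callables simulate, trajectory_cost and refit_sampler are hypothetical placeholders, not the thesis implementation.

# Minimal sketch of one adaptive importance-sampling (cross-entropy)
# iteration for KL control. All helper functions are assumed placeholders.
import numpy as np

def cross_entropy_iteration(theta, simulate, trajectory_cost, refit_sampler,
                            n_samples=100, temperature=1.0):
    """Simulate trajectories under the sampler q_theta, reweight them,
    and return updated sampler parameters.

    simulate(theta) -> (trajectory, log_p, log_q): one sampled trajectory
        with its log-probability under the passive dynamics p and under
        the current importance sampler q_theta.
    trajectory_cost(trajectory) -> float: accumulated state cost.
    refit_sampler(trajectories, weights) -> theta: weighted maximum-
        likelihood (cross-entropy) fit of the sampler parameters.
    """
    trajectories, log_w = [], []
    for _ in range(n_samples):
        traj, log_p, log_q = simulate(theta)
        # Importance weight of the optimally controlled trajectory
        # distribution p(tau) * exp(-C(tau)/temperature) relative to q_theta.
        log_w.append(log_p - log_q - trajectory_cost(traj) / temperature)
        trajectories.append(traj)
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())   # subtract max for numerical stability
    w /= w.sum()
    return refit_sampler(trajectories, w)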
dc.format.mimetype | application/pdf
dc.language.iso | eng
dc.rights | Atribución-NoComercial-SinDerivadas 3.0 España
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject.other | Multi-agent systems
dc.subject.other | Markov processes
dc.title | Cross-Entropy method for Kullback-Leibler control in multi-agent systems
dc.type | info:eu-repo/semantics/masterThesis
dc.subject.keyword | Agent-based system
dc.subject.keyword | Function approximation
dc.subject.keyword | Kullback-Leibler divergence
dc.subject.keyword | Optimal control
dc.subject.keyword | Parallel programming
dc.rights.accessRights | info:eu-repo/semantics/openAccess