Empirical analysis of exploration strategies in QMIX
In real-world scenarios, solving the great majority of problems requires different agents to cooperate while relying only on local observations. Fortunately, Multi-Agent Reinforcement Learning has made significant advances on this front in recent years. Many approaches to this kind of problem are based on Centralized Training with Decentralized Execution, which allows the agents to be trained in a simulated environment where they have access to global information, and later to solve the problem relying only on local observations. Popular methods include Value-Decomposition Networks (VDN) and QMIX. They approach the problem by computing the joint action-value function Qtot as a combination of the individual action-value functions Qa, each of which conditions only on an individual action-observation history. This work focuses on QMIX, which has gained considerable popularity in the last year because it can represent a richer class of joint action-value functions than VDN by combining the individual Q-values non-linearly. However, although QMIX has seen many improvements, little progress has been made on how different exploration techniques could boost learning in this context. Through an experimental evaluation, this work shows that some exploration methods outperform the ε-greedy approach used in the original implementation of QMIX in several cases.
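To make the contrast between the two decompositions concrete, the following is a minimal NumPy sketch and not the implementation evaluated in this work. It assumes fixed random weights in place of QMIX's state-conditioned hypernetworks and uses ReLU instead of ELU for brevity; the function names `vdn_mix` and `monotonic_mix` are hypothetical.

```python
import numpy as np

def vdn_mix(agent_qs):
    """VDN-style mixing: Q_tot is the plain sum of the per-agent Q-values."""
    return float(np.sum(agent_qs))

def monotonic_mix(agent_qs, w1, b1, w2, b2):
    """QMIX-style mixing sketch: a two-layer network with non-negative weights.

    Taking the absolute value of the weights keeps Q_tot non-decreasing in
    every per-agent Q-value. In QMIX the weights are produced by hypernetworks
    conditioned on the global state; here they are fixed arrays (assumption).
    """
    # ReLU is used for brevity (assumption); the QMIX paper uses ELU.
    hidden = np.maximum(agent_qs @ np.abs(w1) + b1, 0.0)
    return float(hidden @ np.abs(w2) + b2)

# Illustrative sizes and weights, chosen arbitrarily.
rng = np.random.default_rng(0)
n_agents, n_hidden = 3, 4
w1 = rng.normal(size=(n_agents, n_hidden))
b1 = rng.normal(size=n_hidden)
w2 = rng.normal(size=n_hidden)
b2 = 0.5

agent_qs = np.array([1.0, -0.5, 2.0])
q_vdn = vdn_mix(agent_qs)
q_mix = monotonic_mix(agent_qs, w1, b1, w2, b2)

# Raising one agent's Q-value can never lower the mixed value.
bumped = agent_qs.copy()
bumped[0] += 1.0
q_mix_bumped = monotonic_mix(bumped, w1, b1, w2, b2)
```

Because both mixers are monotone in each agent's Q-value, every agent can act greedily on its own Qa at execution time and the joint action still maximizes Qtot. Exploration strategies then only change how actions are sampled during training.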
Master's thesis for: Master in Intelligent Interactive Systems
Supervisor: Vicenç Gómez