Learning non-linear payoff transformations in multi-agent systems

Description

  • Abstract

    Deep Reinforcement Learning methods have been successfully applied to cooperative multi-agent systems in recent years. However, this success has been mostly empirical, and a theoretical understanding and solid description of the learning process of these algorithms is still lacking. It is also debatable whether the limitations of these algorithms can be overcome through tuning and optimization or are instead inherent to the models' own definition. In this work, we propose a theoretical formulation that reproduces one of the claimed limitations of Value Decomposition Networks (VDN), compared to its improved successor QMIX, regarding their representational capacity. Both algorithms follow the centralized-training, decentralized-execution paradigm. For this purpose, we scale down the dimensions of the system to bypass the need for deep learning structures, working with a toy-model two-step game and a series of randomly generated one-shot games that exhibit non-linear payoff growth. Despite their simplicity, these settings capture multi-agent challenges such as the scalability problem and the non-uniqueness of learning goals. Based on our analytical description, we also formulate a possible alternative solution to this limitation through simple non-linear transformations of the payoff, which suggests a direction for future work on larger-scale systems.
  • Description

    Master's thesis for the Master in Intelligent Interactive Systems
    Tutor: Vicenç Gómez
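The representational limit discussed in the abstract can be illustrated with a minimal sketch (not the thesis code; the payoff matrices below are hypothetical). VDN assumes the joint action value decomposes additively, Q(a1, a2) = q1(a1) + q2(a2), so a least-squares additive fit to a one-shot game's payoff matrix exactly recovers additive payoffs but leaves a non-zero residual when the payoff has a non-linear interaction between the agents' actions:

```python
import numpy as np

def best_additive_fit(R):
    """Max absolute residual of the least-squares fit
    R[a1, a2] ~= q1[a1] + q2[a2] (a VDN-style additive decomposition)."""
    n1, n2 = R.shape
    # Design matrix: one indicator column per q1 entry and per q2 entry.
    A = np.zeros((n1 * n2, n1 + n2))
    for i in range(n1):
        for j in range(n2):
            A[i * n2 + j, i] = 1.0       # selects q1[i]
            A[i * n2 + j, n1 + j] = 1.0  # selects q2[j]
    coef, *_ = np.linalg.lstsq(A, R.ravel(), rcond=None)
    return np.max(np.abs(R.ravel() - A @ coef))

# An additive payoff is represented exactly (residual ~ 0) ...
R_linear = np.add.outer(np.array([0.0, 1.0]), np.array([0.0, 2.0]))
# ... while a coordination payoff with a non-linear interaction term
# cannot be, mirroring VDN's representational limitation.
R_coord = np.array([[8.0, -12.0],
                    [-12.0, 0.0]])

print(best_additive_fit(R_linear))  # close to 0
print(best_additive_fit(R_coord))   # strictly positive
```

QMIX relaxes this constraint by mixing the per-agent values through a monotonic (but non-linear) function, which is why it can represent payoffs of this second kind; the thesis's proposed alternative instead transforms the payoff itself before the additive decomposition.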