Deep Reinforcement Learning methods have been applied successfully to cooperative multi-agent systems in recent years. However, this success has been mostly empirical, and a solid theoretical understanding of the learning process of these algorithms is still lacking. It also remains open whether the limitations of these algorithms can be overcome through tuning and optimization or are instead inherent to the models' very definition. In this work, we propose a theoretical formulation that reproduces one of the claimed limitations of Value Decomposition Networks (VDN), relative to its refinement QMIX, regarding their representational capacity. Both algorithms follow the centralized-training, decentralized-execution paradigm. For this purpose, we scale down the dimensions of the system to bypass the need for deep learning architectures, working with a toy two-step game and a series of randomly generated one-shot games that exhibit non-linear payoff growth. Despite their simplicity, these settings capture multi-agent challenges such as scalability and non-unique learning goals. Based on our analytical description, we also formulate a possible alternative solution to this limitation through simple non-linear transformations of the payoff, which suggests a direction for future work on larger-scale systems.