Policy gradient methods on a multi-agent game with Kullback-Leibler costs
Description
Abstract
In nature we find all kinds of multi-agent systems sustained by cooperative behaviours. In this work, we study multi-agent systems by means of the Stag-Hunt game, which presents a conflict between mutual benefit and personal risk. In particular, we consider the probabilistic inference approach to reinforcement learning on a grid-based variant of this game. We analyze the behavior of two policy gradient algorithms in the presence of function approximation: the standard REINFORCE algorithm and the Cross-Entropy (CE) method, which differ in the functional form of the loss. Even though REINFORCE and CE share the same global optimal solution, we find that REINFORCE behaves too greedily compared with CE. In agreement with previous results based on probabilistic graphical models, we obtain two qualitatively different optimal solutions (risk-dominant and payoff-dominant) as a function of a temperature parameter, and the transition between them is better observed using the CE method. We also analyze the effect of including a path-cost in addition to the end-cost. It is known that adding a path-cost makes the problem harder when using an explicit probabilistic graphical model, since it increases the model's tree-width. Nevertheless, we observe the opposite effect for policy gradient methods, for which the path-cost enhances the performance of the resulting controls in all circumstances. The explanation is that the samples used by policy gradients are generally more informed when a path-cost is present. Finally, we also consider a distributed version of the algorithm, with partial observability and feature sharing between the agents. In this setting, we show the feasibility of generalizing to larger grids using training data from smaller grids.
Description
Master's thesis for: Master in Intelligent Interactive Systems
Supervisors: Vicenç Gómez and Martí Sanchez Fibla