Reinforcement learning (RL) agents often face challenges in real-world scenarios where the task is not known in advance. This thesis tackles the problem of task uncertainty by developing agents that can identify and adapt to the current objective in real time, using only reward signals as feedback. Instead of a monolithic meta-policy, we propose a modular framework based on a committee of pre-trained
"expert" policies, each specialized for a single known task. We develop and analyze two distinct online adaptation mechanisms: a "Dual Lambda" algorithm, derived from a game-theoretic max-min formulation using Lagrangian duality, which finds a robust policy mixture and offers formal guarantees; a pragmatic Predictive Control (MPC-style) algorithm that selects the best expert at each step through short-horizon simulations in the true environment. The performance of these algorithms is rigorously evaluated in a custom 2D navigation environment through a three-phase protocol of increasing complexity, culminat-
ing in a zero-shot generalization test with novel, unseen obstacle geometries. The results demonstrate that both approaches significantly outperform baseline methods, successfully adapting to the active task. The analysis reveals a trade-off: the Dual Lambda method provides inherent conservatism and theoretical robustness, while the predictive approach offers greater practical flexibility and emergent behaviors,
such as autonomously assigning specialized roles to experts in complex scenarios. A crucial finding is that the performance of both algorithms is fundamentally bounded by the expressive capacity of the initial expert policy set, highlighting that while the adaptation mechanism is critical, its success is contingent on the diversity of the underlying skills. This work provides a comprehensive analysis of two modular
solutions for task-uncertain RL and establishes a foundation for developing more flexible and robust autonomous systems.
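To make the MPC-style mechanism described above concrete, the following is a minimal, self-contained Python sketch under assumptions of our own: a toy 1D environment and two constant-direction experts stand in for the thesis's 2D navigation setup and pre-trained policies, and all names (ToyLineEnv, ConstantExpert, select_expert_mpc) are hypothetical rather than taken from the thesis code.

import copy

class ToyLineEnv:
    """Toy 1D stand-in for the navigation task: the hidden active task
    rewards moving toward an unknown goal position."""
    def __init__(self, goal):
        self.goal = goal
        self.pos = 0.0

    def step(self, action):
        # action is -1.0 or +1.0; the reward signal is the only task feedback
        self.pos += action
        reward = -abs(self.goal - self.pos)
        return self.pos, reward

class ConstantExpert:
    """Stand-in for a pre-trained expert policy specialized for one direction."""
    def __init__(self, action):
        self.action = action

    def act(self, obs):
        return self.action

def select_expert_mpc(env, experts, horizon=5):
    """MPC-style selection: roll each expert forward for a short horizon in a
    copy of the environment and return the index of the highest-return one."""
    best_idx, best_return = 0, float("-inf")
    for idx, expert in enumerate(experts):
        sim = copy.deepcopy(env)          # short-horizon simulation of the true dynamics
        obs, total = sim.pos, 0.0
        for _ in range(horizon):
            obs, reward = sim.step(expert.act(obs))
            total += reward
        if total > best_return:
            best_idx, best_return = idx, total
    return best_idx

if __name__ == "__main__":
    env = ToyLineEnv(goal=+5.0)           # the active task is unknown to the selector
    experts = [ConstantExpert(-1.0), ConstantExpert(+1.0)]
    obs = env.pos
    for _ in range(10):
        k = select_expert_mpc(env, experts)   # re-select the best expert at each step
        obs, _ = env.step(experts[k].act(obs))
    print(f"final position: {obs:+.1f} (goal {env.goal:+.1f})")

The sketch reflects the abstract's description of the selector (short-horizon rollouts of each expert in the true environment, scored only by accumulated reward), not the thesis's actual implementation details.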
Svetlichnyi, Nikodim Aleksandrovich (2025)