Robustness in reinforcement learning under task-uncertainty

  • dc.contributor.author Svetlichnyi, Nikodim Aleksandrovich
  • dc.date.accessioned 2025-10-20T13:49:40Z
  • dc.date.available 2025-10-20T13:49:40Z
  • dc.date.issued 2025
  • dc.description Master's thesis of the Master in Intelligent Interactive Systems
  • dc.description Supervisor: Dr. Miguel Calvo-Fullana
  • dc.description.abstract Reinforcement learning (RL) agents often face challenges in real-world scenarios where the task is not known in advance. This thesis tackles the problem of task uncertainty by developing agents that can identify and adapt to the current objective in real time, using only reward signals as feedback. Instead of a monolithic meta-policy, we propose a modular framework based on a committee of pre-trained "expert" policies, each specialized for a single known task. We develop and analyze two distinct online adaptation mechanisms: (i) a "Dual Lambda" algorithm, derived from a game-theoretic max-min formulation via Lagrangian duality, which finds a robust policy mixture and offers formal guarantees; and (ii) a pragmatic Model Predictive Control (MPC)-style algorithm that selects the best expert at each step through short-horizon simulations in the true environment. The performance of these algorithms is rigorously evaluated in a custom 2D navigation environment through a three-phase protocol of increasing complexity, culminating in a zero-shot generalization test with novel, unseen obstacle geometries. The results demonstrate that both approaches significantly outperform baseline methods, successfully adapting to the active task. The analysis reveals a trade-off: the Dual Lambda method provides inherent conservatism and theoretical robustness, while the predictive approach offers greater practical flexibility and emergent behaviors, such as autonomously assigning specialized roles to experts in complex scenarios. A crucial finding is that the performance of both algorithms is fundamentally bounded by the expressive capacity of the initial expert policy set: while the adaptation mechanism is critical, its success is contingent on the diversity of the underlying skills. This work provides a comprehensive analysis of two modular solutions for task-uncertain RL and establishes a foundation for developing more flexible and robust autonomous systems. (Illustrative sketches of both adaptation mechanisms follow the record below.)
  • dc.identifier.uri http://hdl.handle.net/10230/71579
  • dc.language.iso eng
  • dc.rights CC Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.rights.uri https://creativecommons.org/licenses/by-nc-sa/4.0/
  • dc.subject.other Machine learning
  • dc.title Robustness in reinforcement learning under task-uncertainty
  • dc.type info:eu-repo/semantics/masterThesis
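The abstract describes the Dual Lambda algorithm only at a high level (a game-theoretic max-min formulation solved via Lagrangian duality, yielding a robust mixture over expert policies). The thesis's actual update rule is not given in this record, so the following is a minimal generic sketch of that kind of max-min dual ascent in Python: the dual variables form a distribution over tasks, the primal mixture is a best response, and averaged iterates approximate the robust mixture. The function and variable names, the exponentiated-gradient dual update, and the assumption that per-task expert values are known (rather than estimated online from reward feedback, as the thesis does) are all illustrative.

import numpy as np

def dual_lambda_mixture(reward_matrix, eta=0.1, iters=2000):
    """Approximate max_w min_t w^T R[:, t] by dual ascent (generic sketch).

    reward_matrix R has shape (n_experts, n_tasks); R[k, t] is the value
    of expert k on task t, assumed known here for illustration.
    """
    n_experts, n_tasks = reward_matrix.shape
    lam = np.ones(n_tasks) / n_tasks      # dual variables: a distribution over tasks
    w_avg = np.zeros(n_experts)
    for _ in range(iters):
        # Primal best response: put all mass on the expert with the
        # highest lambda-weighted value.
        w = np.zeros(n_experts)
        w[int(np.argmax(reward_matrix @ lam))] = 1.0
        w_avg += w
        # Dual update (exponentiated gradient): shift mass onto the tasks
        # that the current mixture serves worst.
        lam *= np.exp(-eta * (w @ reward_matrix))
        lam /= lam.sum()
    return w_avg / iters                  # averaged iterates approximate the robust mixture

# Example: experts 0 and 1 each solve one task; expert 2 is mediocre on both.
R = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.6, 0.6]])
print(dual_lambda_mixture(R))  # ~ all mass on expert 2 (worst-case value 0.6 > 0.5)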
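The predictive (MPC-style) mechanism is easier to pin down from the abstract: at each step, roll each expert forward for a short horizon in simulation and execute the expert whose rollout earns the most reward. Below is a minimal sketch under assumed interfaces: a deep-copyable environment with a Gym-like `step(action) -> (obs, reward, done, info)` method and experts given as callables `obs -> action`. All names and signatures are illustrative, not the thesis's actual API.

import copy

def mpc_select_expert(env, obs, experts, horizon=10):
    """Return the index of the expert whose short-horizon rollout scores best."""
    best_idx, best_return = 0, float("-inf")
    for idx, policy in enumerate(experts):
        sim = copy.deepcopy(env)            # branch a rollout from the current state
        o, total = obs, 0.0
        for _ in range(horizon):
            o, reward, done, _ = sim.step(policy(o))
            total += reward                 # undiscounted short-horizon return
            if done:
                break
        if total > best_return:
            best_idx, best_return = idx, total
    return best_idx

In receding-horizon fashion, the agent would call this at every environment step, execute a single action from the winning expert, observe the outcome, and replan, which is what lets the selector track the active task online.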