Robustness in reinforcement learning under task-uncertainty

  • dc.contributor.author Svetlichnyi, Nikodim Aleksandrovich
  • dc.date.accessioned 2025-10-20T13:49:40Z
  • dc.date.available 2025-10-20T13:49:40Z
  • dc.date.issued 2025
  • dc.description Master's thesis of the Master in Intelligent Interactive Systems
  • dc.description Supervisor: Dr. Miguel Calvo-Fullana
  • dc.description.abstract Reinforcement learning (RL) agents often face challenges in real-world scenarios where the task is not known in advance. This thesis tackles the problem of task uncertainty by developing agents that can identify and adapt to the current objective in real time, using only reward signals as feedback. Instead of a monolithic meta-policy, we propose a modular framework based on a committee of pre-trained "expert" policies, each specialized for a single known task. We develop and analyze two distinct online adaptation mechanisms: (i) a "Dual Lambda" algorithm, derived from a game-theoretic max-min formulation via Lagrangian duality, which finds a robust policy mixture and offers formal guarantees; and (ii) a pragmatic Model Predictive Control (MPC)-style algorithm that selects the best expert at each step through short-horizon simulations in the true environment. The performance of these algorithms is rigorously evaluated in a custom 2D navigation environment through a three-phase protocol of increasing complexity, culminating in a zero-shot generalization test with novel, unseen obstacle geometries. The results demonstrate that both approaches significantly outperform baseline methods, successfully adapting to the active task. The analysis reveals a trade-off: the Dual Lambda method provides inherent conservatism and theoretical robustness, while the predictive approach offers greater practical flexibility and emergent behaviors, such as autonomously assigning specialized roles to experts in complex scenarios. A crucial finding is that the performance of both algorithms is fundamentally bounded by the expressive capacity of the initial expert policy set: while the adaptation mechanism is critical, its success is contingent on the diversity of the underlying skills. This work provides a comprehensive analysis of two modular solutions for task-uncertain RL and establishes a foundation for developing more flexible and robust autonomous systems. (Illustrative sketches of both adaptation mechanisms follow the record below.)
  • dc.identifier.uri http://hdl.handle.net/10230/71579
  • dc.language.iso eng
  • dc.rights CC Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.rights.uri https://creativecommons.org/licenses/by-nc-sa/4.0/
  • dc.subject.other Machine learning
  • dc.title Robustness in reinforcement learning under task-uncertainty
  • dc.type info:eu-repo/semantics/masterThesis
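The abstract describes the Dual Lambda algorithm only at a high level (a game-theoretic max-min formulation solved via Lagrangian duality, yielding a robust mixture over expert policies). The thesis's actual update rule is not given in this record, so the following is a minimal generic sketch of that kind of max-min dual ascent in Python: the dual variables form a distribution over tasks, the primal mixture is a best response, and averaged iterates approximate the robust mixture. The function and variable names, the exponentiated-gradient dual update, and the assumption that per-task expert values are known (rather than estimated online from reward feedback, as the thesis does) are all illustrative.

import numpy as np

def dual_lambda_mixture(reward_matrix, eta=0.1, iters=2000):
    """Approximate max_w min_t w^T R[:, t] by dual ascent (generic sketch).

    reward_matrix R has shape (n_experts, n_tasks); R[k, t] is the value
    of expert k on task t, assumed known here for illustration.
    """
    n_experts, n_tasks = reward_matrix.shape
    lam = np.ones(n_tasks) / n_tasks      # dual variables: a distribution over tasks
    w_avg = np.zeros(n_experts)
    for _ in range(iters):
        # Primal best response: put all mass on the expert with the
        # highest lambda-weighted value.
        w = np.zeros(n_experts)
        w[int(np.argmax(reward_matrix @ lam))] = 1.0
        w_avg += w
        # Dual update (exponentiated gradient): shift mass onto the tasks
        # that the current mixture serves worst.
        lam *= np.exp(-eta * (w @ reward_matrix))
        lam /= lam.sum()
    return w_avg / iters                  # averaged iterates approximate the robust mixture

# Example: experts 0 and 1 each solve one task; expert 2 is mediocre on both.
R = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.6, 0.6]])
print(dual_lambda_mixture(R))  # ~ all mass on expert 2 (worst-case value 0.6 > 0.5)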
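The predictive (MPC-style) mechanism is easier to pin down from the abstract: at each step, roll each expert forward for a short horizon in simulation and execute the expert whose rollout earns the most reward. Below is a minimal sketch under assumed interfaces: a deep-copyable environment with a Gym-like `step(action) -> (obs, reward, done, info)` method and experts given as callables `obs -> action`. All names and signatures are illustrative, not the thesis's actual API.

import copy

def mpc_select_expert(env, obs, experts, horizon=10):
    """Return the index of the expert whose short-horizon rollout scores best."""
    best_idx, best_return = 0, float("-inf")
    for idx, policy in enumerate(experts):
        sim = copy.deepcopy(env)            # branch a rollout from the current state
        o, total = obs, 0.0
        for _ in range(horizon):
            o, reward, done, _ = sim.step(policy(o))
            total += reward                 # undiscounted short-horizon return
            if done:
                break
        if total > best_return:
            best_idx, best_return = idx, total
    return best_idx

In receding-horizon fashion, the agent would call this at every environment step, execute a single action from the winning expert, observe the outcome, and replan, which is what lets the selector track the active task online.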