Repositori Digital de la UPF


Bayesian bandits for algorithm selection: latent-state modeling and spatial reward structures

This thesis extends the classical Multi-Armed Bandit (MAB) framework to dynamic and spatial environments. In dynamic settings, Bayesian latent-state models with Thompson Sampling and UCB are evaluated for their ability to adapt to non-stationary rewards, with comparisons to simpler autoregressive (AR) models. For spatially structured problems, Gaussian Process (GP) and Lipschitz bandits are used to exploit correlations between arms. Algorithms such as GP-UCB and Zoom-In demonstrate improved learning efficiency. Empirical results highlight the benefits of modeling temporal and spatial structure, while also emphasizing the computational trade-offs compared to classical, more tractable bandit algorithms.
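As background to the abstract above, a minimal sketch of Thompson Sampling on a stationary Beta-Bernoulli bandit is shown below. This is illustrative only: the thesis itself studies latent-state (non-stationary) and spatially structured extensions, and all names and parameters here are hypothetical.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling on a stationary bandit.

    Illustrative sketch, not the thesis's method: each arm keeps a
    Beta posterior over its success probability; at each round we
    sample a mean from every posterior and play the argmax.
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # Beta prior: successes + 1
    beta = [1] * k   # Beta prior: failures + 1
    total_reward = 0
    for _ in range(horizon):
        # Posterior sampling step: draw one estimate per arm.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward from the chosen arm.
        r = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += r
        beta[arm] += 1 - r
        total_reward += r
    return total_reward, alpha, beta

total, a, b = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
```

Over time the posterior of the best arm concentrates and exploration of weaker arms decays; the latent-state and Gaussian Process variants discussed in the abstract replace the independent Beta posteriors with models that share information across time or across arms.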

(2025-06-04) Ernst, Marvin Michel; Gelabert Cortés, Oriol; Vadenja, Melisa