Recent Submissions

Assessing the Impact of Barcelona’s Superblocks on Housing Rental and Sale Prices (2025). Aribó Herrera, Marta Xiulan.
The Barcelona Superblock Programme is designed to improve environmental quality and public health by reallocating space from private vehicles to pedestrians, through street pedestrianisation and the creation of green areas. This thesis studies how these interventions affect housing prices, focusing on the Poblenou and Sant Antoni Superblocks. We use a Difference-in-Differences design with Two-Way Fixed Effects to estimate changes in both rental and sale prices after the implementation of Superblocks (SBs). The design combines matched control groups with multiple model variants, and robustness is evaluated through placebo tests. We combine two data sources: Idealista, an online private real-estate listing platform, and Gencat, the official database of actual transactions. This allows us to assess the differences between the two sources and further contextualise the impact of the SBs. Results from Idealista show robust increases in both rental and sale prices in Poblenou, while in Sant Antoni we do not find robust evidence of an impact on rental prices. Gencat-based estimates are generally more sensitive to model choice and sample limitations. Understanding the impact of urban interventions on housing prices matters for policy: if Superblocks raise local prices, they may lead to unintended consequences such as population displacement or reduced affordability, while at the same time price increases can reflect improved neighbourhood quality. This study not only quantifies those effects but also provides a framework for evaluating urban policy impacts on housing markets.
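For readers unfamiliar with the estimator, a two-way fixed effects Difference-in-Differences regression of the kind described here can be sketched on simulated data. The panel sizes, effect size, and dummy-variable OLS setup below are illustrative, not the thesis's actual specification.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods, effect = 40, 10, 0.15          # hypothetical sizes; true DiD effect
treated = np.arange(n_units) < n_units // 2        # first half "inside a Superblock"
post = np.arange(n_periods) >= n_periods // 2      # periods after implementation

# Simulated log prices: unit FE + period FE + DiD effect + noise
unit_fe = rng.normal(0, 0.5, n_units)
time_fe = np.linspace(0, 0.3, n_periods)
y = (unit_fe[:, None] + time_fe[None, :]
     + effect * np.outer(treated, post) + rng.normal(0, 0.05, (n_units, n_periods)))

# Two-way fixed effects OLS: dummies for units and periods plus the interaction
U = np.kron(np.eye(n_units), np.ones((n_periods, 1)))   # unit dummies
T = np.kron(np.ones((n_units, 1)), np.eye(n_periods))   # period dummies
D = np.outer(treated, post).reshape(-1, 1)              # treated x post interaction
X = np.hstack([U, T[:, 1:], D])                         # drop one period dummy (identification)
beta, *_ = np.linalg.lstsq(X, y.reshape(-1), rcond=None)
did_estimate = beta[-1]
print(f"estimated DiD effect: {did_estimate:.3f} (true {effect})")
```

The coefficient on the treated-times-post interaction recovers the simulated treatment effect; placebo tests would repeat this with a fake implementation date.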
Exploring the integration of large language models for automatic emotion labeling in speech (2025). Yun Chien, Yi.
In this work, we present a comprehensive comparison of methodologies for speech emotion recognition (SER), with a focus on evaluating the effectiveness of large language models (LLMs) in this domain. Our study is structured into three parts. First, we extract audio embeddings using models such as WavLM, HuBERT, and Dasheng, and use classical machine learning classifiers, a Support Vector Machine (SVM) and a Multilayer Perceptron (MLP), for emotion prediction. These approaches serve as baselines for comparison. Second, we investigate the capacity of LLMs such as GPT-4o, Qwen2-Audio, and Amazon Nova Sonic to analyze audio features, including speaker attributes such as gender, thereby extending their application beyond traditional natural language processing. Third, we explore a more integrated approach that feeds raw audio directly into an audio-capable LLM, such as Qwen2-Audio-7B-Instruct, for end-to-end emotion classification, without the need for traditional signal-processing-based feature extraction. We evaluate and compare the performance of these methodologies on metrics such as accuracy, precision, recall, and F1-score. A key aspect of this study is its primary focus on the results obtained from LLM-based models. Our results reveal several key insights: (1) data distribution significantly affects classifier performance; (2) different audio embeddings yield different results even with the same classifier and dataset; and (3) despite their capabilities, current LLMs still underperform classical classifiers such as SVM and MLP in emotion prediction tasks.
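The baseline stage (classifier on top of fixed audio embeddings) can be illustrated in miniature. To keep the sketch dependency-free, synthetic clustered vectors stand in for real WavLM/HuBERT embeddings, and a least-squares linear classifier stands in for the SVM/MLP; the dimensions and class names are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for utterance-level audio embeddings, one cluster per emotion
n_per, dim = 60, 32
emotions = ["angry", "happy", "sad"]
means = rng.normal(scale=2.0, size=(len(emotions), dim))
X = np.vstack([rng.normal(means[c], 1.0, (n_per, dim)) for c in range(len(emotions))])
y = np.repeat(np.arange(len(emotions)), n_per)

# Shuffle, split, fit a one-hot least-squares classifier, score held-out accuracy
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]
split = len(y) * 2 // 3
Y = np.eye(len(emotions))[y[:split]]                 # one-hot targets
W, *_ = np.linalg.lstsq(X[:split], Y, rcond=None)    # linear classifier (SVM/MLP stand-in)
pred = (X[split:] @ W).argmax(1)
acc = (pred == y[split:]).mean()
print(f"held-out accuracy: {acc:.2f}")
```

With real embeddings the clusters overlap far more, which is why the choice of embedding model matters so much in the reported results.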
Knowledge graph inference from text (2025). Fuentes del Pino, Raul.
Knowledge graphs (KGs) play a central role in representing structured information about real-world entities and their relationships, supporting tasks such as reasoning, search, and question answering. Recent advances in natural language processing have opened the door to building KGs directly from text, making it possible to automate the extraction of entities and their semantic relations at scale. This work presents a complete pipeline for constructing a knowledge graph from natural language and using it to improve the accuracy and reliability of answers generated by large language models (LLMs). The proposed system consists of four main stages. First, entities are identified using a domain-adapted Named Entity Recognition model trained with BIO tagging on a corpus of automatically generated sentences. Then, a relation extraction component generates subject–predicate–object triples using a generative Transformer-based model fine-tuned with domain-specific data. These structured triples are encoded into dense vectors using semantic embeddings and indexed with a high-performance HNSW (Hierarchical Navigable Small World) search structure, which allows efficient retrieval of relevant facts at inference time. To evaluate the system, experiments were conducted across twelve semantic domains using the WikiDialogue dataset, focusing on entity detection accuracy and triple extraction precision. Results show that training on multiple domains significantly improves generalization. Moreover, the use of strict entity boundaries and span-based matching helps reduce false positives. In the final stage, the indexed triples are used as input to an LLM (LLaMA 3.3-70B Instruct) in a Retrieval-Augmented Generation setup. Given a user query, the system retrieves semantically similar triples and includes them in the model prompt.
Three prompting strategies are explored: no context (the model answers based solely on its internal knowledge), strict (answers only if the fact is explicitly stated in the retrieved context), and permissive (uses background knowledge when the context is insufficient). Examples show that strict prompts reduce hallucinations, while permissive prompts provide fluent answers but may introduce unsupported information. Overall, the results confirm that a modular KG-based pipeline can enhance LLM output by grounding it in structured knowledge. The system is scalable, adaptable to multiple domains, and highlights the trade-offs between precision, recall, and fluency in generation tasks. This opens the door to applying similar approaches in settings where precision and clarity are essential, such as healthcare, finance, or enterprise data systems.
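The retrieval stage of such a pipeline can be sketched minimally: bag-of-words vectors stand in for the learned semantic embeddings, a brute-force cosine scan stands in for the HNSW index, and the triples and "strict" prompt wording below are invented for illustration.

```python
import numpy as np

triples = [
    ("Barcelona", "capital_of", "Catalonia"),
    ("Ebro", "flows_into", "Mediterranean Sea"),
    ("Miró", "born_in", "Barcelona"),
]
docs = [" ".join(t).replace("_", " ") for t in triples]
vocab = sorted({w for d in docs for w in d.lower().split()})

def embed(text):
    """Bag-of-words unit vector: a toy stand-in for a semantic embedding."""
    toks = text.lower().replace("?", "").split()
    v = np.array([toks.count(w) for w in vocab], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

index = np.stack([embed(d) for d in docs])   # brute-force stand-in for HNSW

def retrieve(query, k=2):
    sims = index @ embed(query)              # dot of unit vectors = cosine similarity
    return [triples[i] for i in np.argsort(-sims)[:k]]

query = "Which river flows into the Mediterranean Sea?"
facts = retrieve(query)
prompt = ("Answer only if the fact is stated in the context.\n"
          "Context: " + "; ".join(map(str, facts)) + "\nQ: " + query)
print(prompt)
```

Swapping the flat scan for an HNSW index changes only the `retrieve` internals; the prompt-assembly step, including the strict/permissive wording, is unchanged.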
Invisible signals: detecting potential selection bias in AI-based resume screening (2025). Buyreu Real, Pau.
Automated resume screening tools are now widely used in hiring processes, offering the promise of efficiency and fairness by reducing human bias. Yet recent studies have shown that these systems can still behave unfairly by picking up on subtle linguistic cues that reveal sensitive personal information. This thesis explores whether transformer-based models can infer protected attributes (gender, perceived origin, religion, or sexual orientation) from resume text, even when this information is not stated directly. To investigate this, the study analyzes a real-world dataset of over 900 resumes. Each document was cleaned and its words categorized into semantic groups, such as occupation-related, location-related, and skill-related words, and proper nouns. The main method is a series of lexical ablation experiments: for each demographic attribute, twelve experiments were run by including or excluding different word categories. These were combined with a lexical shift analysis using Shifterator to identify which specific words most influenced the model’s predictions. The results show that models can reliably infer gender and perceived origin. Occupation-related terms were mainly predictive of gender, while geographic references were almost direct cues for identifying perceived origin. However, attempts to predict religion and sexual orientation failed, likely due to limited language cues or imbalanced data. Interestingly, even individual words such as gendered job titles (e.g., “waitress”) or place names were enough to act as unintended signals. These findings raise important concerns about fairness in algorithmic hiring. The fact that AI models can detect protected attributes even in anonymized resumes suggests that bias may persist through indirect linguistic patterns.
This highlights the need for stronger audits, more transparent systems, and proactive strategies to reduce bias, such as masking certain word types or using debiasing techniques during training. It also calls for caution when relying on AI-driven tools in hiring. Overall, this thesis adds to the field of algorithmic fairness by presenting a practical framework to identify and understand hidden bias in resume screening. It shows that removing obvious identifiers is not enough; fairness also depends on understanding how language itself can reveal sensitive information.
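The ablation logic behind such experiments can be sketched in miniature: score a classifier with and without a given word category and compare accuracies. The vocabularies, toy "resumes", labels, and cue-word classifier below are invented and far simpler than the transformer models the thesis actually uses.

```python
# Toy lexical-ablation sketch: how much does each word category contribute
# to predicting a (hypothetical) protected attribute? All data is invented.
CATEGORIES = {
    "occupation": {"waitress", "waiter", "nurse", "mechanic"},
    "location":   {"madrid", "rabat", "lima"},
}
DATA = [  # (resume tokens, gender label)
    ({"waitress", "madrid", "teamwork"}, "F"),
    ({"nurse", "lima", "excel"}, "F"),
    ({"waiter", "madrid", "driving"}, "M"),
    ({"mechanic", "rabat", "welding"}, "M"),
]
CUES = {"waitress": "F", "nurse": "F", "waiter": "M", "mechanic": "M"}

def accuracy(excluded):
    """Majority-cue classifier accuracy when `excluded` categories are masked."""
    masked = set().union(*(CATEGORIES[c] for c in excluded)) if excluded else set()
    correct = 0
    for tokens, label in DATA:
        visible = tokens - masked
        votes = [CUES[t] for t in visible if t in CUES]
        pred = max(set(votes), key=votes.count) if votes else "M"  # fallback guess
        correct += pred == label
    return correct / len(DATA)

print("full text:", accuracy([]))                      # cues available
print("no occupation words:", accuracy(["occupation"]))  # cues masked
```

The accuracy drop after masking a category is the signal the thesis measures: here removing occupation words collapses the toy classifier to chance, mirroring the finding that gendered job titles carry the gender signal.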
Differentially private fine-tuning of self-supervised learning models for human activity recognition on wearables (2025). Ozan Güner, Oktay.
The widespread use of wearables in health applications has advanced personalized Human Activity Recognition (HAR), but it also introduces privacy challenges under regulations such as the European Health Data Space (EHDS). This thesis explores fine-tuning strategies for self-supervised learning models on wearable sensor data to balance user privacy and model utility within the stringent EU regulatory landscape. We employ Differentially Private Stochastic Gradient Descent (DP-SGD) to fine-tune the pre-trained HarNet10 model on the PAMAP2 dataset, evaluating two distinct strategies: classifier head fine-tuning and full model fine-tuning. Our two-stage experimental design first investigates the privacy-utility trade-off between the two strategies, revealing that classifier head fine-tuning consistently outperforms the full model approach by maintaining higher accuracy and F1-scores. This strategy better preserves the rich, pre-trained representations in the feature extractor, mitigating the impact of DP-SGD’s noise. Second, an empirical privacy evaluation using a membership inference attack confirms these findings. The differentially private classifier head model demonstrates robust protection, reducing the attack’s success to near random guessing (AUC of 0.532) compared to the vulnerable non-private baseline (AUC of 0.690), thus aligning theoretical guarantees with practical resilience. These results confirm that classifier head fine-tuning with DP-SGD offers a better privacy-utility balance for HAR tasks than its full model variant. Our study contributes a validated framework for developing trustworthy AI in wearables, demonstrating that effective privacy can be achieved by fine-tuning a small fraction (4.83%) of a foundational model’s parameters.
This research provides a practical reference for building secure, privacy preserving solutions that align with EU regulations.
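The core DP-SGD update (clip each per-example gradient, then add calibrated Gaussian noise to the sum) can be sketched for a linear classifier head. The model, loss, clipping norm, and noise multiplier below are illustrative, and the privacy accounting that turns the noise multiplier into an (epsilon, delta) budget is omitted.

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, noise_mult=1.0, rng=None):
    """One DP-SGD step for a linear head with squared loss: per-example
    gradients are clipped to norm `clip`, summed, and Gaussian noise with
    std noise_mult*clip is added before averaging over the batch."""
    rng = rng or np.random.default_rng(0)
    grads = 2 * (X @ w - y)[:, None] * X                  # per-example gradients (n, d)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))  # clip
    noisy = grads.sum(0) + rng.normal(0, noise_mult * clip, w.shape)  # noise the sum
    return w - lr * noisy / len(X)

rng = np.random.default_rng(1)
X = rng.normal(size=(32, 4))
w_true = np.array([1.0, -1.0, 0.5, 0.0])
y = X @ w_true
w = np.zeros(4)
for _ in range(200):
    w = dp_sgd_step(w, X, y, rng=rng)
print("final loss:", np.mean((X @ w - y) ** 2))
```

Applying this update only to the head's few parameters, while freezing the feature extractor, is what limits the noise's damage in the classifier-head strategy the thesis favours.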
The Role of graph structure in the generalization performance of AnyCSP (2025). Essamadi, Oumaima.
Neural approaches to constraint satisfaction problems (CSPs) have demonstrated competitive performance on various combinatorial optimization tasks, yet their generalization capabilities across different structural patterns remain underexplored. Understanding how neural CSP solvers transfer learned strategies between graph topologies with distinct connectivity patterns is important for developing robust neural combinatorial optimization approaches. We present a systematic study of structural generalization in neural CSP solvers, focusing on the AnyCSP model’s ability to adapt across diverse graph structures. We conduct two complementary experiments using the k-coloring problem as a controlled testbed for evaluating structural transfer. The first experiment examines cross-connectivity generalization by training models exclusively on one graph type and testing their performance on the remaining types. This design allows us to isolate the impact of structural training bias on generalization across topologically distinct problem classes. The second experiment performs a connectivity ablation by training on pairwise combinations of graph types and evaluating their effectiveness on real-world structured benchmark instances. Our cross-connectivity results reveal that models trained on geometric and scale-free structures achieve success rates of 91.4% and 92.2% when tested on random graphs. However, the reverse directions yield drastically different outcomes, with only 14.8–60% success rates, indicating that models trained on structurally simpler topologies cannot adapt to more complex patterns. The connectivity ablation study shows that training combinations that include geometric graphs consistently outperform other pairings on structured benchmark instances.
Our work provides theoretical insights into the mechanisms underlying neural CSP generalization and practical suggestions for developing more robust neural approaches to combinatorial optimization problems with complex graph structures.
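To give a concrete feel for the testbed, the sketch below generates Erdős–Rényi random graphs and checks whether a first-fit greedy heuristic can k-color them. The sizes and densities are arbitrary, and the greedy heuristic is only a classical baseline, not the neural AnyCSP solver studied here.

```python
import random

def random_graph(n, p, rng):
    """Erdos-Renyi G(n, p) as an adjacency-set dict."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def greedy_k_color(adj, k):
    """First-fit greedy k-coloring, highest degree first; None on failure."""
    color = {}
    for v in sorted(adj, key=lambda v: -len(adj[v])):
        used = {color[u] for u in adj[v] if u in color}
        free = [c for c in range(k) if c not in used]
        if not free:
            return None          # greedy could not k-color this instance
        color[v] = free[0]
    return color

rng = random.Random(0)
solved = sum(greedy_k_color(random_graph(30, 0.1, rng), 4) is not None
             for _ in range(50))
print(f"greedy 4-coloring success rate on sparse random graphs: {solved}/50")
```

Success rates of a solver across graph families generated this way (random, geometric, scale-free) are exactly the quantity the cross-connectivity experiments compare.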
Robustness in reinforcement learning under task-uncertainty (2025). Nikodim Aleksandrovich, Svetlichnyi.
Reinforcement learning (RL) agents often face challenges in real-world scenarios where the task is not known in advance. This thesis tackles the problem of task uncertainty by developing agents that can identify and adapt to the current objective in real time, using only reward signals as feedback. Instead of a monolithic meta-policy, we propose a modular framework based on a committee of pre-trained "expert" policies, each specialized for a single known task. We develop and analyze two distinct online adaptation mechanisms: a "Dual Lambda" algorithm, derived from a game-theoretic max-min formulation using Lagrangian duality, which finds a robust policy mixture and offers formal guarantees; and a pragmatic Model Predictive Control (MPC-style) algorithm that selects the best expert at each step through short-horizon simulations in the true environment. The performance of these algorithms is rigorously evaluated in a custom 2D navigation environment through a three-phase protocol of increasing complexity, culminating in a zero-shot generalization test with novel, unseen obstacle geometries. The results demonstrate that both approaches significantly outperform baseline methods, successfully adapting to the active task. The analysis reveals a trade-off: the Dual Lambda method provides inherent conservatism and theoretical robustness, while the predictive approach offers greater practical flexibility and emergent behaviors, such as autonomously assigning specialized roles to experts in complex scenarios. A crucial finding is that the performance of both algorithms is fundamentally bounded by the expressive capacity of the initial expert policy set, highlighting that while the adaptation mechanism is critical, its success is contingent on the diversity of the underlying skills.
This work provides a comprehensive analysis of two modular solutions for task-uncertain RL and establishes a foundation for developing more flexible and robust autonomous systems.
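The MPC-style mechanism, picking whichever expert's short-horizon rollout earns the most reward under the hidden task, can be sketched generically. The one-dimensional environment, experts, and horizon below are invented stand-ins for the thesis's 2D navigation setup.

```python
import numpy as np

def select_expert(experts, env_step, state, horizon=5):
    """Pick the expert whose short-horizon simulated rollout earns the
    highest return under the (unknown-to-the-agent) active task."""
    best, best_ret = None, -np.inf
    for i, policy in enumerate(experts):
        s, ret = state, 0.0
        for _ in range(horizon):
            s, r = env_step(s, policy(s))   # simulate in a copy of the environment
            ret += r
        if ret > best_ret:
            best, best_ret = i, ret
    return best

# Toy 1-D navigation: the hidden task is "reach +5"; reward = -distance to goal
goal = 5.0
def env_step(s, a):
    s2 = s + a
    return s2, -abs(s2 - goal)

experts = [lambda s: -1.0, lambda s: +1.0]   # "go left" / "go right" specialists
chosen = select_expert(experts, env_step, state=0.0)
print(f"selected expert: {chosen}")          # the right-moving expert wins
```

Re-running the selection at every step is what lets the agent track a task that changes mid-episode, at the cost of the simulation budget per decision.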
Image captioning for diagram geometry specification (2025). Giorgi, Matteo.
AI hiring tools promise objectivity, but may deliver bias at scale. While algorithmic resume screening outperforms manual review by processing thousands of applications daily, these systems can perpetuate gender discrimination through patterns learned from the data. This research challenges current data protection practices, arguing that anonymization is insufficient against AI inference capabilities, and proposes an alternative approach: adversarial debiasing. Our three-phase experimental pipeline investigated whether language models can detect gender from anonymized resumes, the extent of gender bias in ICT classification, and the effectiveness of adversarial debiasing. Using the FINDHR dataset of donated resumes with protected attributes and the LiveCareer dataset for pre-training, we analyzed gender predictability from anonymized data, implemented transfer learning, and then applied adversarial training with a gradient reversal architecture to enforce gender uninformativeness in ICT features. This research makes two primary contributions to fair AI in algorithmic hiring. First, we provide empirical evidence that standard anonymization fails against AI inference, with DistilBERT achieving 63.86% gender detection accuracy on anonymized resumes through systematic linguistic patterns in professional self-presentation. Second, we demonstrate the effectiveness of adversarial debiasing, achieving near-perfect gender uninformativeness by decreasing predictability from 63.10% to 51.19% (effectively random) while preserving 97.6% of ICT classification performance. Our lambda parameter optimization enabled transparent control over fairness-utility trade-offs. These results have strong regulatory and industry implications: current anonymization standards appear inadequate against AI pattern detection capabilities, demanding updated privacy protection frameworks.
Our adversarial debiasing approach offers promise by embedding fairness constraints within feature learning processes, addressing proxy discrimination at its source while maintaining operational utility.
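The gradient reversal trick at the heart of such architectures can be shown with manual gradients on a tiny linear model: the adversary head descends its own loss, while the feature extractor receives that gradient negated (scaled by lambda) and therefore ascends it, making the features less informative about the protected attribute. The data, sizes, and single-layer heads below are invented; real systems apply the same flip inside a deep network's backward pass.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
s = (X[:, 0] > 0).astype(float)          # protected attribute leaking via feature 0
W = rng.normal(scale=0.1, size=(3, 2))   # feature extractor (to be debiased)
a = rng.normal(scale=0.1, size=2)        # adversary head: predicts s from features
lam, lr = 1.0, 0.1

def adv_loss(W, a):
    """Binary cross-entropy of the adversary's prediction of s."""
    p = sigmoid(X @ W @ a)
    return -np.mean(s * np.log(p + 1e-9) + (1 - s) * np.log(1 - p + 1e-9))

z = X @ W
p = sigmoid(z @ a)
d_logit = (p - s) / len(X)               # dL/dlogit for cross-entropy
grad_a = z.T @ d_logit                   # adversary's own gradient: it descends
grad_z = np.outer(d_logit, a)            # gradient flowing back to the extractor...
grad_W = X.T @ (-lam * grad_z)           # ...negated by the gradient reversal layer,
                                         # so the extractor ascends the adversary loss
print("adversary loss before updates:", adv_loss(W, a))
```

Tuning lambda trades how aggressively the extractor hides the protected signal against how much task-relevant information it keeps, which is the fairness-utility dial the study reports.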
Uncalibrated photometric stereo under general lighting with physics characteristics (2025). Adrián Matos, Jesús Miguel.
This thesis introduces a novel variational approach to uncalibrated photometric stereo for robust recovery of 3D surface shape, reflectance properties, and lighting from multiple images captured under varying and unknown illumination. Building on recent advances in physically-aware piecewise regularization, the proposed method incorporates a specialized depth regularizer acting separately on interior regions and object boundaries, substantially improving the stability and accuracy of depth and albedo estimation compared to conventional methods. Unlike classical photometric stereo techniques that require strict calibration or prior knowledge of lighting conditions, the presented framework formulates the simultaneous recovery of normals, diffuse and specular albedo, and lighting parameters as an unsupervised joint optimization problem. Optimization is carried out via a lagged block coordinate descent scheme, alternately updating surface depth, normal maps, lighting coefficients, and albedo, with robust M-estimators and adaptive Huber-TV regularization applied to both diffuse and specular reflectance components. The effectiveness of this approach is validated through extensive experiments on both synthetic datasets (with known ground truth) and real-world multi-illumination image collections. In the synthetic setting, the method accurately reconstructs detailed surfaces and reflectance from controlled geometric and photometric configurations, demonstrating strong quantitative improvements over classical and learning-based baselines, especially near object boundaries and in challenging regions with specular highlights.
Crucially, when applied to real image datasets, the model retains its robustness, offering high-fidelity 3D reconstruction even in the presence of noise, missing data, or imperfect segmentation, thus evidencing its practical potential beyond laboratory conditions. Overall, this work advances the state of the art in uncalibrated photometric stereo by demonstrating that a variational, physically-motivated framework with piecewise regularization can faithfully recover geometry and reflectance from both synthetic and real uncalibrated multi-illumination images. The results lay the foundation for future efforts toward even more adaptive regularization and application to uncontrolled, in-the-wild photometric datasets, promising safer and more flexible geometry reconstruction for complex materials and scenes.
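The robust penalty underlying Huber-TV regularization can be written in a few lines; the threshold value below is arbitrary, chosen only to illustrate the quadratic-to-linear switch that tames outliers such as specular highlights.

```python
def huber(r, delta=0.1):
    """Huber penalty: quadratic for |r| <= delta, linear beyond, so large
    residuals (e.g. specularities, shadows) are penalized less harshly
    than under a pure least-squares loss."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)

print(huber(0.05), huber(1.0))   # small residual: quadratic; large: linear
```

In Huber-TV regularization, this penalty is applied to gradients of the albedo or depth fields, preserving sharp edges while smoothing flat regions.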
Deep learning-based photovoltaic energy forecasting with ground-based sky imagery and atmospheric data (2025). Montoya Espinagosa, Inés.
With the rise of renewable energies, and solar energy in particular, as alternatives to traditional sources, there is increasing interest in photovoltaic forecasting techniques that address the variability of photovoltaic energy production. This work develops a hybrid methodology for short- and long-term forecasting based on two studies with the same purpose. A multimodal approach is proposed that combines sky images and photovoltaic energy history with meteorological data. The main objectives are to improve the accuracy of ramp-event prediction, increase the robustness of forecasts in cloudy conditions, and extend capabilities beyond nowcasting, in order to support more efficient operation of the power grid and better management of solar variability. Deep convolutional neural network architectures are used for the nowcasting and forecasting models, incorporating individual and multiple meteorological variables as well as solar position. The results demonstrate that the inclusion of meteorological data, particularly surface long-wave (thermal) radiation downwards (strd) and the combination of wind and solar position, significantly improves predictions in both nowcasting and forecasting tasks, especially on cloudy days. This study highlights the importance of integrating diverse data sources to improve the reliability and interpretability of solar energy prediction models.
Joint multi-view RGB optimization for clothed 3D avatar reconstruction (2025). Ece Ugur, Fulden.
The creation of high-fidelity 3D human avatars from images is a central challenge in computer vision, with wide-ranging applications in virtual reality, gaming, and telepresence. However, state-of-the-art single-view reconstruction methods are inherently limited by self-occlusion and viewpoint ambiguity, often resulting in geometrically inaccurate or incomplete models, especially for subjects with complex clothing. This thesis introduces ExECON, a novel pipeline for avatar reconstruction. Our method extends ECON, a state-of-the-art framework that uses a single front-view RGB image, by leveraging sparse multi-view RGB inputs for improved robustness and geometric accuracy. The cornerstone of ExECON is a proposed multi-view algorithm, Joint Multi-view Body Optimization (JMBO), which optimizes a single canonical SMPL-X body model across all available viewpoints simultaneously. This multi-view-consistent body prior then serves as a more accurate foundation for a subsequent detailed surface reconstruction stage, which leverages real front and back views to improve both body pose and clothing geometry. The efficacy of JMBO has been demonstrated through experimental validation. The multi-stage approach is critical for achieving global consistency, which significantly improves key pose and geometric quality metrics. An end-to-end evaluation of the final reconstructed avatars reveals substantial quantitative improvements: the overall geometric error is reduced by nearly 65% compared to the single-view baseline. Qualitative results show that the method successfully reconstructs challenging loose clothing geometries, such as hoodies, which are a common failure case for single-view systems.
Our work demonstrates that establishing a multi-view geometrically consistent body prior and using it to guide the surface reconstruction can resolve the critical ambiguities of single-view methods, producing more accurate and complete 3D human avatars.
Algorithmic determination of recidivism outcomes for improved risk prediction (2025). Singh, Ashwin.
Under the European Union’s Artificial Intelligence (AI) Act, high-risk AI systems such as recidivism risk-assessment tools must perform accurately with minimal residual risk. However, such tools often rely on outdated training data due to the complex and time-consuming nature of identifying post-release outcomes. This not only prevents timely evaluations but also undermines regulatory compliance. A notable example is RisCanvi, a recidivism risk-assessment instrument that has been in use across prisons in Catalunya since 2009. In this thesis, we address these limitations for RisCanvi by translating legal rules into a certifiably accurate algorithm (validated with domain experts) for determining recidivism. Leveraging this algorithm, we construct a high-quality dataset linking post-release outcomes with RisCanvi evaluations for 17.8K inmates released in Catalunya between 2010 and 2019. This dataset significantly improves the prediction of both general and violent recidivism relative to prior versions of RisCanvi, while enabling a comprehensive evaluation of model configurations. Drawing on this analysis, we offer empirically grounded recommendations on training data size, class re-weighting strategies, and model selection for optimizing predictive performance. Through an extensive ablation study, we identify features unrelated to criminal behavior and/or outside the inmates’ control that are redundant for predicting recidivism risk. Importantly, we demonstrate that imposing monotonicity shape constraints on RisCanvi’s features preserves predictive performance while ensuring that rehabilitative progress lowers predicted risk. Finally, we show that this constrained model satisfies relaxed algorithmic fairness constraints and produces subgroup-invariant risk scores, making it a suitable tool for decision support in high-stakes settings.
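A monotonicity shape constraint of the kind described can be emulated in a few lines: after each gradient step on a logistic risk model, the weight of a hypothetical "rehabilitative progress" feature is projected to be non-positive, so increasing progress can never raise the predicted risk. The data, features, and sign-projection trick below are illustrative only, not RisCanvi's actual model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))       # column 0 = hypothetical rehabilitative progress
y = (rng.random(200) < sigmoid(-1.0 * X[:, 0] + 0.8 * X[:, 1])).astype(float)

w = np.zeros(3)
for _ in range(300):
    p = sigmoid(X @ w)
    w -= 0.1 * X.T @ (p - y) / len(X)   # plain logistic-regression gradient step
    w[0] = min(w[0], 0.0)               # shape constraint: progress never raises risk

def risk(x):
    """Predicted recidivism risk under the sign-constrained model."""
    return sigmoid(x @ w)

print("constrained weights:", w)
```

For single-index monotonicity a sign projection suffices; richer models (e.g. gradient-boosted trees) express the same idea through built-in monotone constraints.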
Fine-tuning open-source deep learning models for crowd tracking in immersive interactive installations (2024). Svetoslavov Hristov, Stefan.
This paper explores the development of a real-time people-tracking system for immersive interactive environments using open-source deep learning models. The goal was to create an AI-based solution capable of tracking people in the complex environments where immersive interactive systems are installed, under varying lighting conditions. The research focuses on leveraging pre-trained models and fine-tuning them to meet specific application needs, rather than building a tracking system from scratch. The study employed a detection-and-tracking framework using the RTMDet-tiny detection model, fine-tuned on specific datasets and integrated with the DeepSORT tracking algorithm. The refined model’s performance was evaluated for accuracy and robustness across different sequences, using metrics such as mean Average Precision (mAP), Higher Order Tracking Accuracy (HOTA), Multiple Object Tracking Accuracy (MOTA), and Identification F1 Score (IDF1). The refined detection model achieved an overall mAP of 0.741, with significant variation depending on object size and intersection-over-union (IoU) thresholds. The system performed well in detecting medium-sized objects but struggled with small and large objects due to the lack of diverse annotated training data. The discussion highlights the challenges and limitations encountered, such as modular integration issues with OpenMMLab repositories and the high manual cost of data annotation. Future work should focus on enriching the training dataset with more varied images, implementing posture and body-part detection, and exploring alternative tracking algorithms such as ByteTrack for potentially better performance.
Additionally, filtering detections at the edges of the image to reduce fluctuations and improving visual descriptors for distinguishing individuals in complex environments are suggested as important steps to enhance the system's accuracy and reliability. This study contributes to the field of Computer Vision by demonstrating the practical application of deep learning models in real-time people tracking for immersive interactive systems. The insights gained can inform future developments in AI-based tracking solutions, ensuring more engaging and personalized user experiences in various interactive settings.
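Since detection accuracy here varies with the IoU threshold, it may help to recall how intersection-over-union is computed for axis-aligned boxes; this is a generic sketch, not code from the study.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # overlap width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # overlap height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))   # partially overlapping boxes
```

mAP averages precision over detections whose IoU with a ground-truth box exceeds a threshold (often swept from 0.5 to 0.95), which is why small objects, where a few pixels of misalignment crater the IoU, score worst.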
The Impact of training order on memorization and forgetting in large language models (2024). Leybzon, Danny D.
This paper investigates the phenomenon of memorization in large language models (LLMs), focusing on its dynamics throughout the training process and its implications for data privacy and copyright compliance. While LLMs have demonstrated impressive performance across various natural language processing tasks, their tendency to reproduce portions of their training data verbatim raises concerns about the potential leakage of sensitive information and copyright infringement. Our research reveals several key findings about the memorization process in LLMs. First, we observe that models tend to memorize a higher proportion of their training data during the early stages of training. This memorization rate exhibits logarithmic growth before stabilizing into a linear pattern. Notably, the logarithmic growth is attributed to an increase in the number of examples forgotten by the model at each step, while the number of newly memorized examples remains relatively constant. We demonstrate that the dynamic nature of memorization results in few examples being retained throughout the entire training process; however, forgotten examples are often re-memorized during subsequent training steps. Importantly, examples memorized early in training have a higher likelihood of remaining memorized throughout the entire process. Based on these findings, we tentatively recommend that model developers avoid including their most sensitive data at either the very beginning or the very end of the training process, to mitigate potential risks associated with memorization. Our study also reveals that different types of text are memorized at varying rates, although the overall memorization dynamics remain consistent across text categories.
We find that contact information is disproportionately memorized, and many examples that persist in memory throughout training, especially those containing contact information, follow a "templated" structure. These insights contribute to a deeper understanding of memorization in LLMs and provide a foundation for developing strategies to minimize the memorization of sensitive information and mitigate the risk of training data extraction attacks. Furthermore, our findings have implications for addressing copyright concerns in the development and deployment of large language models. This research advances the field by offering a more nuanced view of memorization dynamics in LLMs and provides practical recommendations for model developers to enhance data privacy and copyright compliance in their training processes.
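The notion of memorization used in this line of work (verbatim continuation of a training example given its prefix) can be operationalized with a small check. The prefix length, toy generator, and strings below are invented; in practice `generate` would be a real model's greedy decoding.

```python
def memorized(example, generate, prefix_len=8):
    """An example counts as memorized if, prompted with its first
    `prefix_len` tokens, the model reproduces the remainder verbatim."""
    toks = example.split()
    prefix, target = toks[:prefix_len], toks[prefix_len:]
    return generate(prefix, len(target)) == target

# Hypothetical stand-in for an LLM that regurgitates one training string
TRAINING = "contact us at 555 0100 or visit our office on Main Street today"
def toy_generate(prefix, n):
    toks = TRAINING.split()
    if toks[:len(prefix)] == prefix:
        return toks[len(prefix):len(prefix) + n]   # regurgitated continuation
    return ["the"] * n                             # generic non-memorized output

examples = [TRAINING, "the quick brown fox jumps over the lazy dog again and again"]
rate = sum(memorized(e, toy_generate) for e in examples) / len(examples)
print(f"memorized fraction: {rate:.2f}")
```

Tracking this fraction checkpoint by checkpoint, and per example, is what reveals the forgetting and re-memorization dynamics the paper reports.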
Enhancing collaborative learning platforms: leveraging artificial intelligence for improved social presence (2024). Szafranek, Karolina Martyna.
Computer-Supported Collaborative Learning (CSCL) supports the learning process by incorporating technologies that emphasize social interaction. A critical component of successful CSCL implementation is social presence, which reflects the personal connection between learners in a digital environment. This paper investigates the role of CSCL tools in fostering social presence and explores the potential of Artificial Intelligence (AI) to enhance this aspect, focusing on PyramidApp as a case study in university settings. Through observations and interviews, this study identifies how the features of PyramidApp promote social presence and suggests several improvements: upgrading the user interface with engaging visuals; enhancing chat functionality with features like emojis, direct replies, and AI feedback; redesigning the answer-improvement process with real-time cursor displays; and refining the social awareness feature for more insightful feedback. Building on these suggestions, the paper develops a proof of concept for an Intelligent Assistant designed to scaffold learner interactions and enhance social presence. Using Large Language Models (LLMs), the proposed approach is evaluated through a simulation that integrates LLM interactions within the PyramidApp interface. Customisable prompts are crafted to generate LLM outputs representing Intelligent Assistant interventions, which are found to successfully embody and promote social presence through both tone and content, demonstrating the potential of this approach in CSCL applications. The Intelligent Assistant is found to effectively guide interactions, foster a collaborative and friendly atmosphere, and support knowledge creation, receiving high usability scores.
The findings also highlight areas for further exploration, such as experimenting with different personas for the Intelligent Assistant and optimizing the frequency and timing of interactions to maximize its effectiveness in enhancing social presence in collaborative educational contexts.
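The customisable-prompt idea can be sketched as a small template builder. The persona, wording, and message format below are illustrative assumptions, not the thesis's actual prompts or the PyramidApp API.

```python
# Hypothetical sketch: a customisable intervention prompt for an Intelligent
# Assistant meant to promote social presence in a PyramidApp-style group
# discussion. Persona and template text are illustrative only.
INTERVENTION_TEMPLATE = (
    "You are a friendly learning assistant in a group activity.\n"
    "Persona: {persona}\n"
    "Recent messages:\n{history}\n"
    "Write one short, warm reply that acknowledges a participant by name, "
    "encourages quieter members to contribute, and keeps the group on task."
)

def build_prompt(persona: str, messages: list[tuple[str, str]]) -> str:
    """Render the template from a persona and (author, text) message pairs."""
    history = "\n".join(f"{author}: {text}" for author, text in messages)
    return INTERVENTION_TEMPLATE.format(persona=persona, history=history)

prompt = build_prompt(
    "encouraging peer tutor",
    [("Ana", "I think option B is better."), ("Leo", "Not sure yet.")],
)
```

Swapping the persona string is one way to run the "different personas" experiments the findings suggest.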
Item type: Item , Imitation learning and policy representation for constrained reinforcement learning(2024) Caicoya Ros, Ana. This thesis explores the integration of imitation learning and policy representation within the domain of constrained reinforcement learning (CRL) to enhance decision-making in environments with stringent limitations. Reinforcement Learning (RL) is a machine learning paradigm focused on training agents to make decisions by maximizing cumulative rewards. However, real-world scenarios often require additional constraints, such as safety and regulatory requirements, necessitating the use of CRL to ensure these constraints are respected while optimizing performance. The research addresses the challenges of solving CRL problems using Generative Adversarial Imitation Learning (GAIL) within the framework of Constrained Markov Decision Processes (CMDPs). CMDPs provide a mathematical structure that incorporates constraints into the RL process. The methodology involves two key phases: the first phase uses the state-augmented CRL algorithm to obtain policies that satisfy the constraints in an augmented space, incorporating dual variables. The second phase refines these policies using GAIL to map them back to the original state space, leveraging imitation learning techniques to ensure robust performance. Numerical results from simulations demonstrate the effectiveness of this approach in achieving constraint satisfaction while maintaining high performance. The findings indicate that the integration of CRL with imitation learning can lead to significant improvements in policy robustness and compliance with constraints. This research contributes to the broader field of machine learning by providing new insights and methods for developing constrained intelligent systems.
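The role of the dual variables in a CMDP can be sketched with a minimal primal-dual update: a Lagrange multiplier penalises constraint violation, so the agent effectively optimises r - lam * c. This is a toy sketch of the general Lagrangian idea, not the thesis's state-augmented algorithm; the rewards, costs, and budget are made-up numbers.

```python
# Minimal primal-dual sketch of the CMDP idea: the dual variable lam grows
# while the average episode cost exceeds the budget, shaping the reward the
# agent sees into r - lam * c. All numbers here are illustrative.
def dual_ascent(rewards, costs, budget, lam=0.0, lr=0.1, steps=50):
    """Update the Lagrange multiplier from observed (fixed, toy) episode costs."""
    for _ in range(steps):
        avg_cost = sum(costs) / len(costs)
        # Gradient ascent on the dual: increase lam while constraint is violated.
        lam = max(0.0, lam + lr * (avg_cost - budget))
    shaped = [r - lam * c for r, c in zip(rewards, costs)]
    return lam, shaped

lam, shaped = dual_ascent(rewards=[1.0, 0.5], costs=[0.8, 0.2], budget=0.3)
```

In the state-augmented setting described above, lam would additionally be appended to the state so the policy can condition on it.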
Item type: Item , Refining 3D asset creation using machine learning and depth estimation(2024) Jiménez Ayguadè, Oriol. This master’s thesis explores recent advancements in machine learning, particularly those enabling the generation of 3D graphics assets from textual descriptions or input images. Central to this research is the evaluation and enhancement of the DreamGaussian pipeline, a method leveraging Gaussian Splatting and diffusion models to create 3D scenes from minimal input data. The research identifies the limitations of the DreamGaussian approach and proposes refinements, including the introduction of an additional depth prior to improve model convergence. The thesis provides a comprehensive overview of state-of-the-art techniques in 3D generation, discussing key methods such as Gaussian Splatting, Score Distillation Sampling (SDS), and diffusion models. It addresses the specific issues encountered when adapting 2D image and text generation techniques for 3D applications. A significant focus is placed on the "depth loss" approach, detailing the process of inferring and refining depth maps to enhance 3D asset quality. Experimental results validate the proposed improvements, highlighting the impact of the depth prior on the generation process. Additionally, a "Text-to-Image-to-3D" pipeline is introduced, showcasing an alternative method for generating 3D assets from textual input by first inferring a reference view image. The thesis concludes with a discussion of the achievements and limitations of the current work, offering insights into potential future research directions.
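A "depth loss" of the kind described above can be sketched as follows. This is a hedged illustration of one common formulation, not necessarily the thesis's exact loss: monocular depth priors are only defined up to scale and shift, so the prior is first aligned to the rendered depth by least squares before the squared-error term. Depth maps are flattened lists for brevity.

```python
# Illustrative depth loss: align a monocular depth prior to the depth map
# rendered from the 3D representation (least-squares scale + shift), then
# take the mean squared error. A perfectly scaled prior gives ~zero loss.
def depth_loss(rendered, prior):
    n = len(rendered)
    mean_r = sum(rendered) / n
    mean_p = sum(prior) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(prior, rendered))
    var = sum((p - mean_p) ** 2 for p in prior)
    scale = cov / var if var > 0 else 1.0
    shift = mean_r - scale * mean_p
    aligned = [scale * p + shift for p in prior]
    return sum((a - r) ** 2 for a, r in zip(aligned, rendered)) / n

# A prior that is a scaled/shifted copy of the render is a perfect match.
loss = depth_loss([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Used as an extra term next to the SDS objective, a loss of this shape is what lets the depth prior guide convergence without fixing an absolute depth scale.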
Item type: Item , Exploring large language models for task planning in an open world(2024) Basbous, Adriana. This thesis builds on the Common Sense-Based Open World Planning (COWP) framework, which integrates a classical task planner with an LLM module to enable robot autonomy in open world household environments. The framework combines the robustness of traditional planning approaches with the unfolding capabilities of LLMs in order to relax closed-world assumptions and handle failures that arise when a robot encounters new information about its environment. The objectives of this work include formulating a general household domain in PDDL for future learning and skill transfer, implementing a more efficient approach to augment a robot’s world knowledge, extending the framework to handle unknown objects, and evaluating the framework’s robustness. Experimental results demonstrate an 18% improvement in handling open world situations compared to the original framework. The adapted framework also shows significant reductions in computation time (4.9 times less than the benchmark) and in the number of API calls (over 13 times fewer) required per task and situation simulated. Key findings highlight a potential for improved efficiency and accuracy in robot task planning in open worlds using LLMs but underscore the need for higher consistency, better informed models, and more robust collaboration schemes to achieve practical, real-world applications. Future work could focus on refining knowledge representation systems used in the framework, enhancing the LLM’s search capabilities, optimizing the prompting strategy, and incorporating mid- and low-level control in the framework to address the granularity of the problem-solving challenge.
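The planner-plus-LLM loop can be sketched in a few lines. The planner and LLM below are stand-in stubs, not COWP's actual components, and the caching of LLM answers is one plausible mechanism (an assumption) for the kind of API-call reduction reported above.

```python
# Illustrative COWP-style loop: try the classical planner under closed-world
# facts; on failure, query an LLM once for a common-sense fact that augments
# the world knowledge, then replan. Planner and LLM are hypothetical stubs.
llm_cache: dict[str, str] = {}

def stub_planner(goal: str, facts: set[str]):
    """Stand-in planner: succeeds only once a key fact is known."""
    return ["pick(cup)", "place(cup, rack)"] if "cup_is_graspable" in facts else None

def stub_llm(query: str) -> str:
    return "cup_is_graspable"  # stand-in common-sense answer

def plan_open_world(goal: str, facts: set[str]):
    plan = stub_planner(goal, facts)
    if plan is None:
        # One cached LLM call per unseen situation instead of repeated queries.
        if goal not in llm_cache:
            llm_cache[goal] = stub_llm(f"What property enables: {goal}?")
        facts = facts | {llm_cache[goal]}
        plan = stub_planner(goal, facts)
    return plan

plan = plan_open_world("put cup on rack", set())
```

Repeating the same goal would then hit the cache rather than the LLM, which is where the per-task API savings would come from under this assumption.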
Item type: Item , A deep learning approach for the segmentation and classification of Plus disease in retinal fundus images of preterm infants(2024) Matamala Avrova, Elisabet-Vera. Retinopathy of Prematurity (ROP) is a severe retinal vascular disorder affecting premature infants, characterized by abnormal vessel proliferation that can lead to vision impairment or blindness if untreated. Timely and accurate diagnosis is crucial. However, current diagnostic methods often rely on subjective manual assessment, which may introduce errors. This project aims to overcome these challenges by applying artificial intelligence (AI) to enhance ROP diagnosis. The objectives within this study include developing and evaluating deep learning models for image segmentation and classification of ROP, with a particular focus on Plus disease phases. For segmentation tasks, the nnU-Net architecture was employed to create five models trained on three publicly available datasets (RETA, HVDROPDB, FIVES). These models achieved robust segmentation metrics on the training dataset (Dice similarity coefficient of 0.9525 and clDice of 0.9363). In terms of classification, models were trained on a dataset sourced from the Ophthalmic Telemedicine Network in Catalonia (RTOC), focusing on distinguishing between three labels: Plus, No-Plus, and Pre-Plus. The models achieved an accuracy of 81.04%, surpassing previous studies and highlighting their ability to identify unique features associated with Pre-Plus disease. Additionally, a mockup interface was developed to visualize and interact with the diagnostic outputs of the AI models. This interface facilitates clinical integration and ongoing evaluation and refinement of the diagnostic approach for ROP.
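The segmentation metrics quoted above can be made concrete with the plain Dice similarity coefficient on binary masks (clDice additionally compares vessel skeletons and is omitted here). Masks are flattened 0/1 lists for brevity; this is a generic illustration, not the thesis's evaluation code.

```python
# Dice similarity coefficient for binary segmentation masks:
# 2 * |pred ∩ target| / (|pred| + |target|), with eps for empty masks.
def dice(pred, target, eps=1e-8):
    inter = sum(p * t for p, t in zip(pred, target))
    return (2 * inter + eps) / (sum(pred) + sum(target) + eps)

# Toy 4-pixel masks: 2 overlapping foreground pixels out of 3 + 2 total.
score = dice([1, 1, 0, 1], [1, 0, 0, 1])
```

A score of 1.0 means the predicted vessel mask matches the reference exactly; the 0.9525 reported above indicates near-complete overlap on the training data.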
Item type: Item , Text embeddings for modeling the evolution of online discussions(2024) Wang, Qiongge. The rise of online discussion platforms has transformed the way people communicate, exchange ideas, and engage with information. From social media platforms like Reddit, Twitter, and Facebook to specialized forums and community websites, online discussions play a crucial role in shaping public opinion, disseminating information, and fostering community interaction. Understanding the dynamics of online discussion threads is essential for studying user behavior, analyzing information flow, and enhancing platform design. This study employs advanced sentence embedding methods, such as Sentence-BERT (SBERT), A Lite BERT (ALBERT), SimCSE, and Universal Sentence Encoder (USE), to enhance textual data analysis. These methods improve semantic analysis by capturing sentence meaning and providing scalable, high-quality embeddings. Our approach uses statistical modeling techniques; in particular, we model how the structure of a conversation evolves over time using the multinomial logit model formulation of a popular growing tree generative model introduced in [1]. The original model assumes that the growth of a discussion depends on the interaction between three comment features: popularity, novelty, and the Root bias of the first post that initiates the discussion. The main contribution of this work is incorporating a new feature that accounts for the (textual) content of the user’s comments. For that, we use sentence embeddings and calculate the cosine similarity between parent-child comment pairs. The relevance of these four features is estimated using a maximum likelihood approach to study the differences between the newly defined multinomial logit model and the original one [1], aiming to obtain robust, semantically informed models.
The multinomial logit model is well-suited for this task, offering a flexible framework to express different aspects of online discussions and uncover underlying patterns and relationships. It is a simple and efficient model that can be easily trained on datasets of various sizes. However, it may be limited in capturing complex relationships between words and their meanings.
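The extended attachment model can be sketched as a softmax over per-comment utilities. The feature names, weights, and toy embeddings below are illustrative assumptions; the thesis estimates the weights by maximum likelihood, and the exact functional forms of popularity and novelty follow [1] rather than this sketch.

```python
import math

# Hedged sketch of the extended multinomial logit attachment model: each
# existing comment j gets a utility combining popularity, novelty (recency),
# a root bias, and the new content feature (cosine similarity between the
# candidate parent's embedding and the new comment's embedding); the new
# comment attaches to j with softmax probability.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def attachment_probs(comments, new_emb, w):
    utils = [
        w["pop"] * math.log(c["degree"] + 1)
        + w["nov"] * c["recency"]
        + w["root"] * c["is_root"]
        + w["sim"] * cosine(c["emb"], new_emb)
        for c in comments
    ]
    m = max(utils)                               # stabilise the softmax
    exps = [math.exp(u - m) for u in utils]
    z = sum(exps)
    return [e / z for e in exps]

comments = [
    {"degree": 3, "recency": 0.2, "is_root": 1, "emb": [1.0, 0.0]},
    {"degree": 1, "recency": 0.9, "is_root": 0, "emb": [0.0, 1.0]},
]
probs = attachment_probs(
    comments, [0.0, 1.0], {"pop": 1.0, "nov": 1.0, "root": 0.5, "sim": 2.0}
)
```

With a large similarity weight, the semantically closer comment attracts the new reply even though the root is more popular, which is exactly the effect the content feature is meant to capture.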
