From perception to action: implementing in-context imitation learning on a Franka robot for pick-and-place tasks

Carpes Martínez, Antonio Alberto

Permanent link

http://hdl.handle.net/10230/71797

Description

Abstract
This thesis presents a practical implementation of Instant Policy, a state-of-the-art one-shot In-Context Imitation Learning (ICIL) model that learns new tasks rapidly from a small number of demonstrations processed at inference time. Using a Franka Emika Panda arm and an Intel RealSense D435 camera, the research evaluates how modifications to the demonstration context affect the model's ability to understand and generalize manipulation behaviors. The core study systematically modifies demonstration buffers to analyze the model's contextual reasoning capabilities across different pick-and-place scenarios. In addition, we deploy a modular pipeline that transforms RGB-D input into structured point clouds through YOLOv11-based segmentation, enabling object identification, demonstration extraction, and model deployment at test time. To address gripper annotation challenges, we introduce an automated dataset creation methodology that combines LangSAM for text-prompt-based segmentation with XMem++ for video mask propagation. The control architecture uses Instant Policy as a Denoising Diffusion Implicit Model (DDIM), generating action sequences through graph-based reasoning over point clouds and demonstration context. Experimental results demonstrate successful adaptation of pick-and-place behaviors to different demonstration contexts, with generalization across variations in object pose and background. Performance analysis reveals a critical dependence on segmentation quality, highlighting the need for robust perception in real-world deployment. This work validates the viability of ICIL for robotic pick-and-place tasks and contributes insights into context understanding, automated dataset creation, and the empirical performance of ICIL in unstructured manipulation scenarios.
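As an illustration of the perception step summarized above (RGB-D input turned into a segmented point cloud), the following is a minimal sketch, not the thesis implementation. It assumes a pinhole camera model, a depth image already converted to metres, and a binary object mask produced by some segmentation model; the function name `masked_depth_to_points` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names chosen for this example.

```python
import numpy as np


def masked_depth_to_points(depth_m, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels into camera-frame 3D points.

    Hypothetical helper for illustration only.
    depth_m: (H, W) depth image in metres; 0 or NaN marks missing depth.
    mask:    (H, W) binary segmentation mask of the object of interest.
    fx, fy, cx, cy: assumed pinhole intrinsics in pixels.
    Returns an (N, 3) array of [x, y, z] points.
    """
    depth_m = np.asarray(depth_m, dtype=float)
    # Keep only pixels inside the mask that carry a usable depth reading.
    valid = np.asarray(mask, dtype=bool) & np.isfinite(depth_m) & (depth_m > 0)
    v, u = np.nonzero(valid)  # v = row index, u = column index
    z = depth_m[v, u]
    # Standard pinhole back-projection.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)


# Tiny synthetic example: a 4x4 depth image at 2 m with one missing pixel.
depth = np.full((4, 4), 2.0)
depth[0, 0] = 0.0                 # missing depth, must be discarded
mask = np.zeros((4, 4), dtype=np.uint8)
mask[0, 0] = 1                    # masked but invalid depth
mask[1, 3] = 1                    # masked and valid
points = masked_depth_to_points(depth, mask, fx=2.0, fy=2.0, cx=1.0, cy=1.0)
```

Under these assumptions, pixels with missing depth are dropped even when the mask covers them, which is one simple way that segmentation or depth errors shrink or distort the point cloud that a downstream policy receives.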
Description
Master's thesis for: Erasmus Mundus Joint Master in Artificial Intelligence (EMAI)
Supervisor: Alessandro De Luca
Co-Supervisor: Magí Dalmau Moreno
Collections
Erasmus Mundus joint Master in Artificial Intelligence (EMAI). Master thesis projects
