From perception to action: implementing in-context imitation learning on a Franka robot for pick-and-place tasks


Description

  • Abstract

    This thesis presents a practical implementation of Instant Policy, an In-Context Imitation Learning (ICIL) model that learns new tasks rapidly after processing only a small number of demonstrations at inference time. Using a Franka Emika Panda arm and an Intel RealSense D435 camera integrated with Instant Policy, a state-of-the-art one-shot learning model, the research evaluates how modifications to the demonstration context affect the model's ability to understand and generalize manipulation behaviors. The core of the research systematically modifies demonstration buffers to analyze the model's contextual reasoning capabilities across different pick-and-place scenarios. In addition, we deploy a modular pipeline that transforms RGB-D input into structured point clouds through YOLOv11-based segmentation, enabling object identification, demonstration extraction, and model deployment at test time. To address the difficulty of annotating the gripper, we introduce an automated dataset creation methodology that combines LangSAM for text-prompt-based segmentation with XMem++ for video mask propagation. The control architecture employs Instant Policy as a Denoising Diffusion Implicit Model (DDIM), generating action sequences through graph-based reasoning over point clouds and the demonstration context. Experimental results show that pick-and-place behaviors adapt successfully to different demonstration contexts and generalize across variations in object pose and background. Performance analysis reveals a critical dependence on segmentation quality, highlighting the need for robust perception in real-world deployment. This work validates the viability of ICIL for robotic pick-and-place tasks, contributing insights into context understanding, automated dataset creation, and the empirical performance of ICIL in unstructured manipulation scenarios.
  • Description

    Master's thesis for the Erasmus Mundus Joint Master in Artificial Intelligence (EMAI)
    Supervisor: Alessandro De Luca; Co-supervisor: Magí Dalmau Moreno
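The abstract describes a pipeline that turns RGB-D input into point clouds through segmentation. The geometric core of such a step is commonly a pinhole back-projection of the depth pixels that fall inside an object mask. The sketch below illustrates only that idea and is not the thesis implementation. It assumes the mask is any binary segmentation of the depth frame (for instance, a rasterized YOLOv11 instance mask aligned to depth), that raw depth is stored in millimetres, and that the function name and intrinsic values are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def masked_depth_to_points(
    depth: List[List[float]],
    mask: List[List[bool]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    depth_scale: float = 0.001,
) -> List[Point]:
    """Back-project masked depth pixels into 3D camera-frame points (hypothetical helper).

    depth: raw depth image indexed as depth[v][u], in sensor units.
    mask:  binary mask of the same shape (True = pixel belongs to the object).
    fx, fy, cx, cy: pinhole intrinsics of the depth stream (assumed known from calibration).
    depth_scale: metres per raw depth unit (assumed 1 mm per unit here).
    """
    points: List[Point] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (d_raw, keep) in enumerate(zip(depth_row, mask_row)):
            # Skip background pixels and invalid (zero) depth readings.
            if not keep or d_raw <= 0:
                continue
            z = d_raw * depth_scale
            # Pinhole model: invert u = fx * x / z + cx and v = fy * y / z + cy.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # Toy 2x2 frame with made-up intrinsics, for illustration only.
    depth = [[1000, 0], [2000, 500]]
    mask = [[True, True], [False, True]]
    print(masked_depth_to_points(depth, mask, fx=100.0, fy=100.0, cx=0.0, cy=0.0))
```

In the toy frame, one masked pixel has zero depth and one valid pixel lies outside the mask, so only two points survive. For full-resolution frames, a vectorized array version would be faster; the loop form is kept here for readability.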