UPF Digital Repository

Institutional content (236)

Primary research data (268)

Teaching (659)

Research: articles, conferences, books (20024)

Research: theses (3500)

Research: working papers, preprints, reports, etc. (2929)

Scientific journals (6714)

Student work (4829)

University life (422)

Guides

Open-access research data

Research output portal

Recent submissions


Self-supervised and in-context learning techniques for automated optical inspection

Automated Optical Inspection (AOI) is a family of techniques used to find defects and anomalies in electronic devices from high-quality photographs of different regions of an integrated component and its packaging. Current methods use computer vision models and image preprocessing pipelines specific to each chip design and manufacturer. As a result, the current deep learning approach to AOI requires a long retraining process whenever new devices are introduced or significant covariate shifts occur in the input image distribution. In this work, we adapt and evaluate different pre-training techniques (DINO, iBOT, and MAE) for small vision transformers (ViT and FasterViT) to streamline the design of AOI semantic segmentation models and shorten the training time needed to adapt them to new input conditions. We pre-train the models on a custom, relatively small dataset of only 7,000 unlabeled images, showing that the pre-training strategies perform well in small-data regimes. Furthermore, we introduce a set of retrieval-based scene understanding techniques that solve the semantic segmentation of wire-bonded devices with virtually no training on labeled data. Our results show that, when training time is constrained, our custom pre-trained encoders and retrieval strategies outperform comparable convolutional architectures pre-trained with full supervision on semantic segmentation, in both speed and quality. Moreover, we show that the proposed image retrieval strategies generalize to existing ViT models pre-trained on other datasets, and that they can produce high-quality segmentation masks for images of a single device using relatively few labeled training images. Finally, we show that the retrieval strategies outperform fine-tuned convolutional encoder-decoder models on unseen, out-of-distribution images.

(2025) Figueira, Joaquín
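The retrieval-based segmentation idea in the abstract can be illustrated with a generic nearest-neighbour label-transfer sketch: each patch of a new image takes the label of its most similar patches in a memory bank built from a few labeled images, so no decoder has to be trained. This is an assumption-laden illustration, not the thesis's method. It assumes patch embeddings were already extracted with a pre-trained ViT encoder, and the function name `knn_patch_segmentation`, the neighbour count `k`, and the softmax `temperature` are hypothetical choices.

```python
import numpy as np


def knn_patch_segmentation(query_feats, memory_feats, memory_labels,
                           num_classes, k=5, temperature=0.07):
    """Hypothetical sketch: label each query patch by a similarity-weighted
    vote over its k nearest labeled memory patches.

    query_feats:   (Q, D) patch embeddings of the image to segment
    memory_feats:  (M, D) patch embeddings from labeled training images
    memory_labels: (M,)   integer class label of each memory patch
    Returns a (Q,) array with the predicted class of each query patch.
    """
    # L2-normalise so that the dot product equals cosine similarity
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    m = memory_feats / np.linalg.norm(memory_feats, axis=1, keepdims=True)
    sims = q @ m.T  # (Q, M) cosine similarities

    # Indices and similarities of the k most similar memory patches per query
    topk = np.argsort(-sims, axis=1)[:, :k]             # (Q, k)
    topk_sims = np.take_along_axis(sims, topk, axis=1)  # (Q, k)

    # Temperature softmax over the k neighbours turns similarities into weights
    w = np.exp((topk_sims - topk_sims.max(axis=1, keepdims=True)) / temperature)
    w /= w.sum(axis=1, keepdims=True)

    # Accumulate each neighbour's weight into the score of its class
    votes = np.zeros((q.shape[0], num_classes))
    rows = np.arange(q.shape[0])[:, None]  # (Q, 1), broadcasts against (Q, k)
    np.add.at(votes, (rows, memory_labels[topk]), w)
    return votes.argmax(axis=1)


# Toy usage: two well-separated classes in a 2-D feature space
memory = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
queries = np.array([[1.0, 0.05], [0.05, 1.0]])
print(knn_patch_segmentation(queries, memory, labels, num_classes=2, k=2))
```

In a real pipeline the per-patch predictions would be reshaped to the ViT patch grid and upsampled to image resolution; adapting to a new device then only means adding its labeled patches to the memory bank rather than retraining.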
