Comparison of vision transformers and convolution neural networks


  • dc.contributor.author Belka, Caroline
  • dc.contributor.author Chen, Joshua
  • dc.contributor.author Wallstein, Jonas
  • dc.date.accessioned 2025-05-06T14:11:09Z
  • dc.date.available 2025-05-06T14:11:09Z
  • dc.date.issued 2024-06-09
  • dc.description Master's thesis for: Master's Degree in Data Science. Methodology Program. Academic year 2023-2024
  • dc.description Supervisor: Gabor Lugosi
  • dc.description.abstract This thesis explores the differences between Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) to understand how these architectures perceive and learn from images. We first provide an in-depth explanation and comparative literature review of CNNs and ViTs. We then investigate how both models adapt to classifying satellite images and rotated scene images, evaluated in terms of rotational invariance and learned representations using Centered Kernel Alignment (CKA). ViTs demonstrated better performance and stability, which we attribute to their ability to integrate global information through self-attention mechanisms, while CNNs showed more variation due to their hierarchical feature learning and local receptive fields.
  • dc.description.abstract Esta tesis compara Convolutional Neural Networks (CNNs) y Vision Transformers (ViTs) en la percepción y aprendizaje de imágenes. Se ofrece una revisión bibliográfica comparativa y se investiga su adaptación a la clasificación de imágenes de satélite y escenas rotadas, evaluadas en términos de invarianza rotacional y representaciones aprendidas con Centered Kernel Alignment (CKA). Los ViT mostraron mejor rendimiento y estabilidad, atribuida a su capacidad para integrar información global mediante autoatención. Las CNN mostraron más variación debido a su aprendizaje jerárquico de características y campos receptivos locales.
  • dc.identifier.uri http://hdl.handle.net/10230/70311
  • dc.language.iso eng
  • dc.rights This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.rights.uri https://creativecommons.org/licenses/by-nc-nd/4.0
  • dc.subject.keyword Computer vision
  • dc.subject.keyword Deep learning
  • dc.subject.keyword Transformers
  • dc.subject.keyword Visión por computador
  • dc.subject.keyword Aprendizaje profundo
  • dc.subject.keyword Transformadores
  • dc.subject.other Master's thesis – Academic year 2023-2024
  • dc.title Comparison of vision transformers and convolution neural networks
  • dc.type info:eu-repo/semantics/masterThesis