Leveraging pre-trained autoencoders for interpretable prototype learning of music audio

Alonso Jiménez, Pablo; Pepino, Leonardo; Batlle-Roca, Roser; Zinemanas, Pablo; Bogdanov, Dmitry; Serra, Xavier; Rocamora, Martín

dc.contributor.author Alonso Jiménez, Pablo
dc.contributor.author Pepino, Leonardo
dc.contributor.author Batlle-Roca, Roser
dc.contributor.author Zinemanas, Pablo
dc.contributor.author Bogdanov, Dmitry
dc.contributor.author Serra, Xavier
dc.contributor.author Rocamora, Martín
dc.date.accessioned 2024-02-22T07:10:31Z
dc.date.available 2024-02-22T07:10:31Z
dc.date.issued 2024
dc.description This work has been accepted at the ICASSP Workshop on Explainable AI for Speech and Audio (XAI-SA) in Seoul, Korea, April 15, 2024.
dc.description.abstract We present PECMAE, an interpretable model for music audio classification based on prototype learning. Our model builds on a previous method, APNet, which jointly learns an autoencoder and a prototypical network. Instead, we propose to decouple the two training processes. This enables us to leverage existing self-supervised autoencoders pre-trained on much larger data (EnCodecMAE), providing representations with better generalization. APNet reconstructs prototypes to waveforms for interpretability, but relies on the nearest training data samples to do so. In contrast, we explore using a diffusion decoder that allows reconstruction without such dependency. We evaluate our method on datasets for music instrument classification (Medley-Solos-DB) and genre recognition (GTZAN and a larger in-house dataset), the latter being a more challenging task not addressed with prototypical networks before. We find that the prototype-based models preserve most of the performance achieved with the autoencoder embeddings, while the sonification of prototypes helps in understanding the behavior of the classifier.
dc.description.sponsorship This work has been supported by the Musical AI project - PID2019-111403GB-I00/AEI/10.13039/501100011033, funded by the Spanish Ministerio de Ciencia e Innovación and the Agencia Estatal de Investigación.
dc.format.mimetype application/pdf
dc.identifier.citation Alonso-Jiménez P, Pepino L, Batlle-Roca R, Zinemanas P, Bogdanov D, Serra X, Rocamora M. Leveraging pre-trained autoencoders for interpretable prototype learning of music audio. Paper presented at: ICASSP Workshop on Explainable AI for Speech and Audio (XAI-SA); 2024 Apr 15; Seoul, Korea.
dc.identifier.uri http://hdl.handle.net/10230/59220
dc.language.iso eng
dc.publisher Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.projectID info:eu-repo/grantAgreement/ES/2PE/111403GB-I00
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.subject.keyword Prototypical learning
dc.subject.keyword self-supervised learning
dc.subject.keyword music audio classification
dc.subject.keyword interpretable AI
dc.title Leveraging pre-trained autoencoders for interpretable prototype learning of music audio
dc.type info:eu-repo/semantics/conferenceObject
dc.type.version info:eu-repo/semantics/acceptedVersion

Collections

Reports (Department of Information and Communication Technologies)