COALA: co-aligned autoencoders for learning semantically enriched audio representations

dc.contributor.author: Favory, Xavier
dc.contributor.author: Drossos, Konstantinos
dc.contributor.author: Virtanen, Tuomas
dc.contributor.author: Serra, Xavier
dc.date.accessioned: 2025-05-28T06:05:36Z
dc.date.available: 2025-05-28T06:05:36Z
dc.date.issued: 2020
dc.description.abstract: Audio representation learning based on deep neural networks (DNNs) emerged as an alternative approach to hand-crafted features. To achieve high performance, DNNs often need a large amount of annotated data, which can be difficult and costly to obtain. In this paper, we propose a method for learning audio representations by aligning the learned latent representations of audio and associated tags. Aligning is done by maximizing the agreement of the latent representations of audio and tags, using a contrastive loss. The result is an audio embedding model which reflects acoustic and semantic characteristics of sounds. We evaluate the quality of our embedding model, measuring its performance as a feature extractor on three different tasks (namely, sound event recognition, music genre classification, and musical instrument classification), and investigate what type of characteristics the model captures. Our results are promising, sometimes on par with the state of the art in the considered tasks, and the embeddings produced with our method are well correlated with some acoustic descriptors.
dc.format.mimetype: application/pdf
dc.identifier.citation: Favory X, Drossos K, Virtanen T, Serra X. COALA: co-aligned autoencoders for learning semantically enriched audio representations. In: Daumé H, Singh A, editors. Proceedings Self-supervision in Audio and Speech Workshop at the 37th International Conference on Machine Learning (ICML), PMLR; 2020 Jul 13-18; Vienna: Austria. San Diego: ICML; 2020. [8 p.]
dc.identifier.uri: http://hdl.handle.net/10230/70539
dc.language.iso: eng
dc.publisher: International Conference on Machine Learning (ICML)
dc.rights: Copyright 2020 by the author(s).
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.subject.keyword: COALA
dc.subject.keyword: Co-aligned autoencoders
dc.subject.keyword: Learning
dc.subject.keyword: Audio representations
dc.title: COALA: co-aligned autoencoders for learning semantically enriched audio representations
dc.type: info:eu-repo/semantics/conferenceObject
dc.type.version: info:eu-repo/semantics/publishedVersion

Files

Original bundle

Name: Favory_ICML_coal.pdf
Size: 425.44 KB
Format: Adobe Portable Document Format