Multimodal metric learning for tag-based music retrieval
- dc.contributor.author Won, Minz
- dc.contributor.author Oramas, Sergio
- dc.contributor.author Nieto Caballero, Oriol
- dc.contributor.author Gouyon, Fabien
- dc.contributor.author Serra, Xavier
- dc.date.accessioned 2023-03-07T08:09:14Z
- dc.date.available 2023-03-07T08:09:14Z
- dc.date.issued 2021
- dc.description Paper presented at the 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021), held virtually June 6-11, 2021.
- dc.description.abstract Tag-based music retrieval is crucial to browse large-scale music libraries efficiently. Hence, automatic music tagging has been actively explored, mostly as a classification task, which has an inherent limitation: a fixed vocabulary. On the other hand, metric learning enables flexible vocabularies by using pretrained word embeddings as side information. Also, metric learning has proven its suitability for cross-modal retrieval tasks in other domains (e.g., text-to-image) by jointly learning a multimodal embedding space. In this paper, we investigate three ideas to successfully introduce multimodal metric learning for tag-based music retrieval: elaborate triplet sampling, acoustic and cultural music information, and domain-specific word embeddings. Our experimental results show that the proposed ideas enhance the retrieval system quantitatively and qualitatively. Furthermore, we release the MSD500: a subset of the Million Song Dataset (MSD) containing 500 cleaned tags, 7 manually annotated tag categories, and user taste profiles.
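The abstract centers on triplet-based multimodal metric learning, where a track embedding is pulled toward a matching tag embedding and pushed away from a non-matching one. The paper's actual sampling strategy, architectures, and loss details are not reproduced in this record; the following is only a minimal illustrative sketch of a standard hinge-based triplet loss, with toy 4-dimensional embeddings chosen purely for demonstration:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.4):
    """Hinge-based triplet loss: the positive (matching tag embedding)
    should be closer to the anchor (track embedding) than the negative
    (non-matching tag embedding) by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)   # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative)   # anchor-negative distance
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings (hypothetical values, not from the paper):
track = np.array([1.0, 0.0, 0.0, 0.0])      # audio embedding of a track
tag_match = np.array([0.9, 0.1, 0.0, 0.0])  # embedding of a relevant tag
tag_other = np.array([0.0, 0.0, 1.0, 0.0])  # embedding of an unrelated tag

# Satisfied triplet: the matching tag is already much closer, so loss is 0.
loss_ok = triplet_loss(track, tag_match, tag_other)

# Violated triplet (roles swapped): the loss is positive, producing a
# gradient signal that would reshape the joint embedding space.
loss_bad = triplet_loss(track, tag_other, tag_match)
```

In training, such a loss is minimized over many sampled triplets; the abstract's "elaborate triplet sampling" refers to how those triplets are chosen, which this sketch does not attempt to reproduce.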
- dc.description.sponsorship This work was funded by the predoctoral grant MDM-2015-0502-17-2 from the Spanish Ministry of Economy and Competitiveness linked to the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502).
- dc.format.mimetype application/pdf
- dc.identifier.citation Won M, Oramas S, Nieto O, Gouyon F, Serra X. Multimodal metric learning for tag-based music retrieval. In: 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021): proceedings; 2021 Jun 6-11; Toronto, Canada. [Piscataway]: IEEE, 2021. p. 591-5. DOI: 10.1109/ICASSP39728.2021.9413514
- dc.identifier.doi http://dx.doi.org/10.1109/ICASSP39728.2021.9413514
- dc.identifier.issn 1520-6149
- dc.identifier.uri http://hdl.handle.net/10230/56075
- dc.language.iso eng
- dc.publisher Institute of Electrical and Electronics Engineers (IEEE)
- dc.relation.ispartof 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021): proceedings; 2021 Jun 6-11; Toronto, Canada. [Piscataway]: IEEE, 2021. p. 591-5.
- dc.relation.projectID info:eu-repo/grantAgreement/ES/1PE/MDM-2015-0502-17-2
- dc.rights © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. http://dx.doi.org/10.1109/ICASSP39728.2021.9413514
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.subject.keyword Metric learning
- dc.subject.keyword Music retrieval
- dc.subject.keyword Multimodality
- dc.subject.keyword Auto-tagging
- dc.title Multimodal metric learning for tag-based music retrieval
- dc.type info:eu-repo/semantics/conferenceObject
- dc.type.version info:eu-repo/semantics/acceptedVersion