Multimodal metric learning for tag-based music retrieval
- dc.contributor.author Won, Minz
- dc.contributor.author Oramas, Sergio
- dc.contributor.author Nieto Caballero, Oriol
- dc.contributor.author Gouyon, Fabien
- dc.contributor.author Serra, Xavier
- dc.date.accessioned 2023-03-07T08:09:14Z
- dc.date.available 2023-03-07T08:09:14Z
- dc.date.issued 2021
- dc.description Paper presented at the 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021), held virtually June 6-11, 2021.
- dc.description.abstract Tag-based music retrieval is crucial to browse large-scale music libraries efficiently. Hence, automatic music tagging has been actively explored, mostly as a classification task, which has an inherent limitation: a fixed vocabulary. On the other hand, metric learning enables flexible vocabularies by using pretrained word embeddings as side information. Also, metric learning has proven its suitability for cross-modal retrieval tasks in other domains (e.g., text-to-image) by jointly learning a multimodal embedding space. In this paper, we investigate three ideas to successfully introduce multimodal metric learning for tag-based music retrieval: elaborate triplet sampling, acoustic and cultural music information, and domain-specific word embeddings. Our experimental results show that the proposed ideas enhance the retrieval system quantitatively and qualitatively. Furthermore, we release the MSD500: a subset of the Million Song Dataset (MSD) containing 500 cleaned tags, 7 manually annotated tag categories, and user taste profiles.
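The abstract centers on triplet-based multimodal metric learning, where a track embedding is pulled toward a matching tag embedding and pushed away from a non-matching one. The paper's actual sampling strategy, architectures, and loss details are not reproduced in this record; the following is only a minimal illustrative sketch of a standard hinge-based triplet loss, with toy 4-dimensional embeddings chosen purely for demonstration:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.4):
    """Hinge-based triplet loss: the positive (matching tag embedding)
    should be closer to the anchor (track embedding) than the negative
    (non-matching tag embedding) by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)   # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative)   # anchor-negative distance
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings (hypothetical values, not from the paper):
track = np.array([1.0, 0.0, 0.0, 0.0])      # audio embedding of a track
tag_match = np.array([0.9, 0.1, 0.0, 0.0])  # embedding of a relevant tag
tag_other = np.array([0.0, 0.0, 1.0, 0.0])  # embedding of an unrelated tag

# Satisfied triplet: the matching tag is already much closer, so loss is 0.
loss_ok = triplet_loss(track, tag_match, tag_other)

# Violated triplet (roles swapped): the loss is positive, producing a
# gradient signal that would reshape the joint embedding space.
loss_bad = triplet_loss(track, tag_other, tag_match)
```

In training, such a loss is minimized over many sampled triplets; the abstract's "elaborate triplet sampling" refers to how those triplets are chosen, which this sketch does not attempt to reproduce.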
- dc.description.sponsorship This work was funded by the predoctoral grant MDM-2015-0502-17-2 from the Spanish Ministry of Economy and Competitiveness linked to the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502).
- dc.format.mimetype application/pdf
- dc.identifier.citation Won M, Oramas S, Nieto O, Gouyon F, Serra X. Multimodal metric learning for tag-based music retrieval. In: 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021): proceedings; 2021 Jun 6-11; Toronto, Canada. [Piscataway]: IEEE, 2021. p. 591-5. DOI: 10.1109/ICASSP39728.2021.9413514
- dc.identifier.doi http://dx.doi.org/10.1109/ICASSP39728.2021.9413514
- dc.identifier.issn 1520-6149
- dc.identifier.uri http://hdl.handle.net/10230/56075
- dc.language.iso eng
- dc.publisher Institute of Electrical and Electronics Engineers (IEEE)
- dc.relation.ispartof 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021): proceedings; 2021 Jun 6-11; Toronto, Canada. [Piscataway]: IEEE, 2021. p. 591-5.
- dc.relation.projectID info:eu-repo/grantAgreement/ES/1PE/MDM-2015-0502-17-2
- dc.rights © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. http://dx.doi.org/10.1109/ICASSP39728.2021.9413514
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.subject.keyword Metric learning
- dc.subject.keyword Music retrieval
- dc.subject.keyword Multimodality
- dc.subject.keyword Auto-tagging
- dc.title Multimodal metric learning for tag-based music retrieval
- dc.type info:eu-repo/semantics/conferenceObject
- dc.type.version info:eu-repo/semantics/acceptedVersion