Authors: Won, Minz; Oramas, Sergio; Nieto Caballero, Oriol; Gouyon, Fabien; Serra, Xavier
Dates: 2023-03-07; 2023-03-07; 2021
Citation: Won M, Oramas S, Nieto O, Gouyon F, Serra X. Multimodal metric learning for tag-based music retrieval. In: 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021): proceedings; 2021 Jun 6-11; Toronto, Canada. [Piscataway]: IEEE, 2021. p. 591-5. DOI: 10.1109/ICASSP39728.2021.9413514
ISSN: 1520-6149
URI: http://hdl.handle.net/10230/56075
Description: Paper presented at the 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021), held virtually from 6 to 11 June 2021.
Abstract: Tag-based music retrieval is crucial to browse large-scale music libraries efficiently. Hence, automatic music tagging has been actively explored, mostly as a classification task, which has an inherent limitation: a fixed vocabulary. On the other hand, metric learning enables flexible vocabularies by using pretrained word embeddings as side information. Also, metric learning has proven its suitability for cross-modal retrieval tasks in other domains (e.g., text-to-image) by jointly learning a multimodal embedding space. In this paper, we investigate three ideas to successfully introduce multimodal metric learning for tag-based music retrieval: elaborate triplet sampling, acoustic and cultural music information, and domain-specific word embeddings. Our experimental results show that the proposed ideas enhance the retrieval system quantitatively and qualitatively. Furthermore, we release the MSD500: a subset of the Million Song Dataset (MSD) containing 500 cleaned tags, 7 manually annotated tag categories, and user taste profiles.
Format: application/pdf
Language: eng
Rights: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. http://dx.doi.org/10.1109/ICASSP39728.2021.9413514
Title: Multimodal metric learning for tag-based music retrieval
Type: info:eu-repo/semantics/conferenceObject
Keywords: Metric learning; Music retrieval; Multimodality; Auto-tagging
Access rights: info:eu-repo/semantics/openAccess