Multimodal metric learning for tag-based music retrieval
Citation
- Won M, Oramas S, Nieto O, Gouyon F, Serra X. Multimodal metric learning for tag-based music retrieval. In: 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021): proceedings; 2021 Jun 6-11; Toronto, Canada. [Piscataway]: IEEE, 2021. p. 591-5. DOI: 10.1109/ICASSP39728.2021.9413514
Permanent link
Description
Abstract
Tag-based music retrieval is crucial to browse large-scale music libraries efficiently. Hence, automatic music tagging has been actively explored, mostly as a classification task, which has an inherent limitation: a fixed vocabulary. On the other hand, metric learning enables flexible vocabularies by using pretrained word embeddings as side information. Also, metric learning has proven its suitability for cross-modal retrieval tasks in other domains (e.g., text-to-image) by jointly learning a multimodal embedding space. In this paper, we investigate three ideas to successfully introduce multimodal metric learning for tag-based music retrieval: elaborate triplet sampling, acoustic and cultural music information, and domain-specific word embeddings. Our experimental results show that the proposed ideas enhance the retrieval system quantitatively and qualitatively. Furthermore, we release the MSD500: a subset of the Million Song Dataset (MSD) containing 500 cleaned tags, 7 manually annotated tag categories, and user taste profiles.
Description
Paper presented at the 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021), held virtually from June 6 to 11, 2021.
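For context, the sketch below illustrates the general recipe the abstract refers to: two encoders (an audio tower and a tag tower built on pretrained word embeddings) are trained with a triplet loss so that tags and their matching songs land close together in a shared embedding space. This is a minimal PyTorch illustration of multimodal metric learning in general, not the paper's implementation; the network shapes, the margin value, the cosine-distance formulation, and the random-negative sampling are all assumptions for the sake of a self-contained example.

```python
# Illustrative sketch of triplet-based multimodal metric learning.
# All dimensions, the margin, and the toy data are assumptions, not
# the paper's actual architecture or sampling strategy.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioTower(nn.Module):
    """Maps a precomputed audio feature vector into the shared space."""
    def __init__(self, in_dim=128, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim)
        )

    def forward(self, x):
        # L2-normalize so dot products below are cosine similarities.
        return F.normalize(self.net(x), dim=-1)

class TagTower(nn.Module):
    """Projects a pretrained word embedding (the tag) into the shared space."""
    def __init__(self, in_dim=300, emb_dim=64):
        super().__init__()
        self.proj = nn.Linear(in_dim, emb_dim)

    def forward(self, w):
        return F.normalize(self.proj(w), dim=-1)

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Hinge on cosine distance: pull the matching (tag, song) pair
    # together, push the mismatched pair apart by at least `margin`.
    d_pos = 1 - (anchor * positive).sum(-1)
    d_neg = 1 - (anchor * negative).sum(-1)
    return F.relu(d_pos - d_neg + margin).mean()

# Toy batch: 8 tag anchors, their matching songs, and random negatives.
audio_tower, tag_tower = AudioTower(), TagTower()
tags = tag_tower(torch.randn(8, 300))   # e.g. pretrained word vectors
pos = audio_tower(torch.randn(8, 128))  # songs annotated with those tags
neg = audio_tower(torch.randn(8, 128))  # songs without them
loss = triplet_loss(tags, pos, neg)
loss.backward()
```

Because both towers emit vectors in the same normalized space, retrieval at query time reduces to a nearest-neighbor search: embed the query tag once and rank songs by cosine similarity, with no fixed output vocabulary.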