Multimodal deep learning for music genre classification

Oramas, Sergio; Barbieri, Francesco; Nieto Caballero, Oriol; Serra, Xavier

dc.contributor.author Oramas, Sergio
dc.contributor.author Barbieri, Francesco
dc.contributor.author Nieto Caballero, Oriol
dc.contributor.author Serra, Xavier
dc.date.accessioned 2018-10-24T09:59:30Z
dc.date.available 2018-10-24T09:59:30Z
dc.date.issued 2018
dc.description.abstract Music genre labels are useful to organize songs, albums, and artists into broader groups that share similar musical characteristics. In this work, an approach to learn and combine multimodal data representations for music genre classification is proposed. Intermediate representations of deep neural networks are learned from audio tracks, text reviews, and cover art images, and further combined for classification. Experiments on single and multi-label genre classification are then carried out, evaluating the effect of the different learned representations and their combinations. Results on both experiments show how the aggregation of learned representations from different modalities improves the accuracy of the classification, suggesting that different modalities embed complementary information. In addition, the learning of a multimodal feature space increases the performance of pure audio representations, which may be especially relevant when the other modalities are available for training, but not at prediction time. Moreover, a proposed approach for dimensionality reduction of target labels yields major improvements in multi-label classification not only in terms of accuracy, but also in terms of the diversity of the predicted genres, which implies a more fine-grained categorization. Finally, a qualitative analysis of the results sheds some light on the behavior of the different modalities on the classification task.
dc.description.sponsorship This work was partially funded by the Spanish Ministry of Economy and Competitiveness under the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502).
dc.format.mimetype application/pdf
dc.identifier.citation Oramas S, Barbieri F, Nieto O, Serra X. Multimodal deep learning for music genre classification. Transactions of the International Society for Music Information Retrieval. 2018;1(1):4-21. DOI: 10.5334/tismir.10
dc.identifier.doi http://dx.doi.org/10.5334/tismir.10
dc.identifier.issn 2514-3298
dc.identifier.uri http://hdl.handle.net/10230/35647
dc.language.iso eng
dc.publisher Ubiquity Press
dc.relation.ispartof Transactions of the International Society for Music Information Retrieval. 2018;1(1):4-21.
dc.rights © 2018 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.subject.keyword Information retrieval
dc.subject.keyword Deep learning
dc.subject.keyword Music
dc.subject.keyword Multimodal
dc.subject.keyword Multi-label classification
dc.title Multimodal deep learning for music genre classification
dc.type info:eu-repo/semantics/article
dc.type.version info:eu-repo/semantics/publishedVersion

Collections

Articles (Departament de Tecnologies de la Informació i les Comunicacions)