How low can you go? Reducing frequency and time resolution in current CNN architectures for music auto-tagging


  • dc.contributor.author Ferraro, Andrés
  • dc.contributor.author Bogdanov, Dmitry
  • dc.contributor.author Serra, Xavier
  • dc.contributor.author Jeon, Jay Ho
  • dc.contributor.author Yoon, Jason
  • dc.date.accessioned 2023-03-07T08:09:11Z
  • dc.date.available 2023-03-07T08:09:11Z
  • dc.date.issued 2020
  • dc.description Paper presented at the 28th European Signal Processing Conference (EUSIPCO 2020), held from 18 to 21 January 2020 in Amsterdam, Netherlands.
  • dc.description.abstract Automatic tagging of music is an important research topic in Music Information Retrieval, and audio analysis algorithms proposed for this task have achieved improvements with advances in deep learning. In particular, many state-of-the-art systems use Convolutional Neural Networks and operate on mel-spectrogram representations of the audio. In this paper, we compare commonly used mel-spectrogram representations and evaluate the model performance that can be achieved by reducing the input size, in terms of both fewer frequency bands and larger frame rates. We use the MagnaTagaTune dataset for comprehensive performance comparisons and then compare selected configurations on the larger Million Song Dataset. The results of this study can serve researchers and practitioners in their trade-off decisions between model accuracy, data storage size, and training and inference times. (An illustrative mel-spectrogram resolution sketch follows this record.)
  • dc.format.mimetype application/pdf
  • dc.identifier.citation Ferraro A, Bogdanov D, Serra X, Ho Jeon J, Yoon J. How low can you go? Reducing frequency and time resolution in current CNN architectures for music auto-tagging. In: 28th European Signal Processing Conference (EUSIPCO 2020): proceedings; 2020 Jan 18-21; Amsterdam, Netherlands. [Piscataway]: IEEE; 2020. p. 131-5. DOI: 10.23919/Eusipco47968.2020.9287769
  • dc.identifier.doi http://dx.doi.org/10.23919/Eusipco47968.2020.9287769
  • dc.identifier.issn 2219-5491
  • dc.identifier.uri http://hdl.handle.net/10230/56074
  • dc.language.iso eng
  • dc.publisher Institute of Electrical and Electronics Engineers (IEEE)
  • dc.relation.ispartof 28th European Signal Processing Conference (EUSIPCO 2020): proceedings; 2020 Jan 18-21; Amsterdam, Netherlands. [Piscataway]: IEEE; 2020. p. 131-5.
  • dc.rights © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. http://dx.doi.org/10.23919/Eusipco47968.2020.9287769
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.subject.keyword music auto-tagging
  • dc.subject.keyword audio classification
  • dc.subject.keyword convolutional neural networks
  • dc.title How low can you go? Reducing frequency and time resolution in current CNN architectures for music auto-tagging
  • dc.type info:eu-repo/semantics/conferenceObject
  • dc.type.version info:eu-repo/semantics/acceptedVersion
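
As noted in the abstract, the study varies the frequency and time resolution of mel-spectrogram inputs to CNN auto-tagging models. The following is a minimal, purely illustrative sketch (in Python, assuming the librosa library) of how such inputs could be generated; the sample rate, number of mel bands, FFT size, and hop length shown are assumptions for illustration, not the exact configurations evaluated in the paper.

import librosa
import numpy as np

def mel_input(path, sr=16000, n_mels=96, n_fft=512, hop_length=256):
    """Load audio and return a log-scaled mel-spectrogram of shape (n_mels, n_frames)."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    return librosa.power_to_db(mel, ref=np.max)

# Reducing the input size: fewer mel bands lower the frequency resolution, and a
# larger hop length lowers the frame rate (time resolution), shrinking both the
# CNN input and the storage footprint of precomputed spectrograms.
# Example call (hypothetical file name):
# low_res = mel_input("track.mp3", n_mels=48, hop_length=1024)

This sketch only illustrates the resolution trade-off discussed in the abstract; the actual architectures, configurations, and evaluation setup are described in the paper itself.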