Accurate and scalable version identification using musically-motivated embeddings


  • dc.contributor.author Yesiler, Furkan
  • dc.contributor.author Serrà Julià, Joan
  • dc.contributor.author Gómez Gutiérrez, Emilia, 1975-
  • dc.date.accessioned 2021-02-12T07:23:21Z
  • dc.date.issued 2020
  • dc.description.abstract The version identification (VI) task deals with the automatic detection of recordings that correspond to the same underlying musical piece. Despite many efforts, VI is still an open problem, with much room for improvement, especially with regard to combining accuracy and scalability. In this paper, we present MOVE, a musically-motivated method for accurate and scalable version identification. MOVE achieves state-of-the-art performance on two publicly-available benchmark sets by learning scalable embeddings in a Euclidean distance space, using a triplet loss and a hard triplet mining strategy. It improves over previous work by employing an alternative input representation, and introducing a novel technique for temporal content summarization, a standardized latent space, and a data augmentation strategy specifically designed for VI. In addition to the main results, we perform an ablation study to highlight the importance of our design choices, and study the relation between embedding dimensionality and model performance.
  • dc.description.sponsorship Paper presented at: ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, held online from 4 to 8 May 2020.
  • dc.description.sponsorship This work is supported by the MIP-Frontiers project, the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 765068, and by TROMPA, the Horizon 2020 project 770376-2.
  • dc.format.mimetype application/pdf
  • dc.identifier.citation Yesiler F, Serrà J, Gómez E. Accurate and scalable version identification using musically-motivated embeddings. In: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP); 2020 May 4-8; Barcelona, Spain. New Jersey: The Institute of Electrical and Electronics Engineers; 2020. p. 21-5. DOI: 10.1109/ICASSP40776.2020.9053793
  • dc.identifier.doi http://dx.doi.org/10.1109/ICASSP40776.2020.9053793
  • dc.identifier.issn 2379-190X
  • dc.identifier.uri http://hdl.handle.net/10230/46459
  • dc.language.iso eng
  • dc.publisher Institute of Electrical and Electronics Engineers (IEEE)
  • dc.relation.ispartof 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP); 2020 May 4-8; Barcelona, Spain. New Jersey: The Institute of Electrical and Electronics Engineers; 2020. p. 21-5
  • dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/770376
  • dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/765068
  • dc.rights © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. http://dx.doi.org/10.1109/ICASSP40776.2020.9053793
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.subject.keyword Cover song identification
  • dc.subject.keyword Deep learning
  • dc.subject.keyword Music embedding
  • dc.subject.keyword Network encoder
  • dc.title Accurate and scalable version identification using musically-motivated embeddings
  • dc.type info:eu-repo/semantics/conferenceObject
  • dc.type.version info:eu-repo/semantics/acceptedVersion
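
The abstract mentions learning embeddings in a Euclidean space with a triplet loss and hard triplet mining. As a rough illustration of that general idea only, the sketch below computes a "batch-hard" triplet loss in plain Python: for each anchor it takes the farthest same-version embedding and the closest different-version embedding in the batch. The function names, the margin value, and the batch-hard variant itself are assumptions made for this example; this record does not specify the mining strategy or hyperparameters that MOVE actually uses.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(embeddings, labels, margin=1.0):
    """Mean triplet loss over a batch using batch-hard mining.

    Hypothetical helper for illustration: `labels` identify which
    underlying piece each embedding belongs to, and `margin` is an
    assumed hyperparameter.
    """
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        # Distances to other versions of the same piece (positives).
        pos = [
            euclidean(anchor, emb)
            for j, (emb, lab) in enumerate(zip(embeddings, labels))
            if lab == label and j != i
        ]
        # Distances to recordings of different pieces (negatives).
        neg = [
            euclidean(anchor, emb)
            for emb, lab in zip(embeddings, labels)
            if lab != label
        ]
        if not pos or not neg:
            continue  # no valid triplet for this anchor
        # Hardest positive is the farthest; hardest negative is the closest.
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    batch = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
    pieces = ["a", "a", "b", "b"]
    print(batch_hard_triplet_loss(batch, pieces, margin=3.0))
```

Once embeddings are trained this way, retrieving versions of a query reduces to a nearest-neighbour search under Euclidean distance, which is where the scalability claimed in the abstract would come from.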