Correspondence between audio and visual deep models for musical instrument detection in video recordings

  • dc.contributor.author Slizovskaia, Olga
  • dc.contributor.author Gómez Gutiérrez, Emilia, 1975-
  • dc.contributor.author Haro Ortega, Gloria
  • dc.date.accessioned 2019-05-13T10:56:30Z
  • dc.date.available 2019-05-13T10:56:30Z
  • dc.date.issued 2017
  • dc.description Paper presented at: 18th International Society for Music Information Retrieval Conference (ISMIR17), held 23-27 October 2017 in Suzhou, China.
  • dc.description.abstract This work investigates cross-modal connections between audio and video sources in the task of musical instrument recognition. We also address the understanding of the representations learned by convolutional neural networks (CNNs) and study feature correspondence between the audio and visual components of a multimodal CNN architecture. For each instrument category, we select the most activated neurons and investigate cross-correlations between neurons from the audio and video CNNs that activate for the same instrument category. We analyse two training schemes for multimodal applications and perform a comparative analysis and visualisation of model predictions.
  • dc.description.sponsorship This work is partly supported by the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X GPU and WiMIR society for covering the registration expenses.
  • dc.format.mimetype application/pdf
  • dc.identifier.citation Slizovskaia O, Gómez E, Haro G. Correspondence between audio and visual deep models for musical instrument detection in video recordings. Paper presented at: 18th International Society for Music Information Retrieval Conference (ISMIR17); 2017 Oct 23-27; Suzhou, China.
  • dc.identifier.uri http://hdl.handle.net/10230/37216
  • dc.language.iso eng
  • dc.publisher International Society for Music Information Retrieval (ISMIR)
  • dc.rights © Olga Slizovskaia, Emilia Gomez, Gloria Haro. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Olga Slizovskaia, Emilia Gomez, Gloria Haro. “Correspondence between audio and visual deep models for musical instrument detection in video recordings”, 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017.
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.rights.uri https://creativecommons.org/licenses/by/4.0/
  • dc.title Correspondence between audio and visual deep models for musical instrument detection in video recordings
  • dc.type info:eu-repo/semantics/conferenceObject
  • dc.type.version info:eu-repo/semantics/publishedVersion
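
The abstract describes selecting the most activated neurons per instrument category and examining cross-correlations between audio and video neurons. The following is only a minimal sketch of that idea, not the authors' implementation. It assumes activations are already available as NumPy arrays of shape (clips, neurons), that "most activated" means highest mean activation over clips of a category, and that the correlation is Pearson correlation; the function names `top_k_neurons` and `cross_correlation` and the parameter `k` are hypothetical.

```python
import numpy as np


def top_k_neurons(acts, labels, category, k):
    """Return indices of the k neurons with the highest mean activation
    over the clips labelled `category` (assumed selection criterion)."""
    mean_act = acts[labels == category].mean(axis=0)
    return np.argsort(mean_act)[::-1][:k]


def cross_correlation(audio_acts, video_acts):
    """Pearson correlation between every audio neuron (rows of the result)
    and every video neuron (columns), computed across clips."""
    a = audio_acts - audio_acts.mean(axis=0)
    v = video_acts - video_acts.mean(axis=0)
    # Normalise each centred column to unit length; the small constant
    # avoids division by zero for neurons with constant activation.
    a = a / (np.linalg.norm(a, axis=0) + 1e-12)
    v = v / (np.linalg.norm(v, axis=0) + 1e-12)
    return a.T @ v


def category_correspondence(audio_acts, video_acts, labels, category, k=5):
    """Correlate the top-k audio neurons with the top-k video neurons
    for one instrument category, using only clips of that category."""
    audio_idx = top_k_neurons(audio_acts, labels, category, k)
    video_idx = top_k_neurons(video_acts, labels, category, k)
    mask = labels == category
    return cross_correlation(audio_acts[mask][:, audio_idx],
                             video_acts[mask][:, video_idx])
```

Calling `category_correspondence(audio_acts, video_acts, labels, category=0, k=5)` on such arrays would yield a 5 × 5 matrix whose entry (i, j) is the correlation between the i-th selected audio neuron and the j-th selected video neuron.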