Visual music transcription of clarinet video recordings trained with audio-based labelled data
- dc.contributor.author Gómez Gutiérrez, Emilia, 1975-
- dc.contributor.author Arias Martínez, Pablo
- dc.contributor.author Zinemanas, Pablo
- dc.contributor.author Haro Ortega, Gloria
- dc.date.accessioned 2018-02-26T11:18:05Z
- dc.date.available 2018-02-26T11:18:05Z
- dc.date.issued 2017
- dc.description Paper presented at the 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), held 22-29 October 2017 in Venice, Italy.
- dc.description.abstract Automatic transcription is a well-known task in the music information retrieval (MIR) domain and consists of computing a symbolic music representation (e.g. MIDI) from an audio recording. In this work, we address the automatic transcription of video recordings when the audio modality is missing or of insufficient quality, and thus analyze the visual information. We focus on the clarinet, which is played by opening and closing a set of holes and keys. We propose a method for automatic visual note estimation that detects the player's fingertips and measures their displacement with respect to the holes and keys of the clarinet. To this end, we track the clarinet and determine its position in every frame. The relative positions of the fingertips are used as features for a machine learning algorithm trained for note pitch classification. For that purpose, a dataset is built semi-automatically by estimating pitch information from the audio signals in an existing collection of 4.5 hours of video recordings of six different songs performed by nine different players. Our results confirm that visual automatic transcription is more difficult than audio transcription, mainly due to motion blur and occlusions that cannot be resolved with a single view.
- dc.description.sponsorship This work is partly supported by the Spanish Ministry of Economy and Competitiveness under the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502); the Spanish research projects CASAS (TIN2015-70816-R, MINECO/FEDER, UE) and MOTECAVI (TIN2015-70410-C2-1-R, MINECO/FEDER, UE); BPIFrance and Région Île de France in the framework of the FUI 18 Plein Phare project; the Office of Naval Research under grant N00014-17-1-2552; the ANR-DGA project ANR-12-ASTR-0035; and ANR-14-CE27-001 (MIRIAM).
- dc.format.mimetype application/pdf
- dc.identifier.citation Gómez E, Arias P, Zinemanas P, Haro G. Visual music transcription of clarinet video recordings trained with audio-based labelled data. In: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW); 2017 Oct 22-29; Venice, Italy. Piscataway (NJ): IEEE; 2017. p. 463-70. DOI: 10.1109/ICCVW.2017.62
- dc.identifier.doi http://dx.doi.org/10.1109/ICCVW.2017.62
- dc.identifier.uri http://hdl.handle.net/10230/34002
- dc.language.iso eng
- dc.publisher Institute of Electrical and Electronics Engineers (IEEE)
- dc.relation.ispartof 2017 IEEE International Conference on Computer Vision Workshops (ICCVW); 2017 Oct 22-29; Venice, Italy. Piscataway (NJ): IEEE; 2017. p. 463-70.
- dc.relation.projectID info:eu-repo/grantAgreement/ES/1PE/TIN2015-70816-R
- dc.relation.projectID info:eu-repo/grantAgreement/ES/1PE/TIN2015-70410-C2-1-R
- dc.rights © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The final published article can be found at http://ieeexplore.ieee.org/document/8265272/
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.subject.keyword Visualization
- dc.subject.keyword Skin
- dc.subject.keyword Kalman filters
- dc.subject.keyword Feature extraction
- dc.subject.keyword Instruments
- dc.subject.keyword Video recording
- dc.title Visual music transcription of clarinet video recordings trained with audio-based labelled data
- dc.type info:eu-repo/semantics/conferenceObject
- dc.type.version info:eu-repo/semantics/acceptedVersion