Automatic estimation of singing voice musical dynamics

Citation

  • Narang J, Tamer NC, Vega VDL, Serra X. Automatic estimation of singing voice musical dynamics. Paper presented at: 25th International Society for Music Information Retrieval Conference (ISMIR2024); 2024 November 10-14; San Francisco, USA.

Permanent link

Description

  • Abstract

    Musical dynamics form a core part of expressive singing voice performances. However, automatic analysis of musical dynamics for singing voice has received limited attention, partly due to the scarcity of suitable datasets and the lack of a clear evaluation framework. To address this challenge, we propose a methodology for dataset curation. Employing the proposed methodology, we compile a dataset of 509 singing voice performances annotated with musical dynamics and aligned with 163 score files, leveraging state-of-the-art source separation and alignment techniques. The scores are sourced from the OpenScore Lieder corpus of romantic-era compositions, widely known for its wealth of expressive annotations. Using the curated dataset, we train a multi-head-attention-based CNN model with varying window sizes to evaluate the effectiveness of estimating musical dynamics. We explore two perceptually motivated input representations for model training: the log-Mel spectrum and Bark-scale-based features. For testing, we manually curate another dataset of 25 performances annotated with musical dynamics, in collaboration with a professional vocalist. Our experiments show that Bark-scale-based features outperform log-Mel features for the task of singing voice dynamics prediction. The dataset and code are shared publicly for further research on the topic.
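    As an illustration of the two input representations compared in the abstract, the following is a minimal NumPy sketch of log-Mel and Bark-scale filterbank features. The Bark mapping uses the Traunmüller approximation; the paper's exact feature parameters are not reproduced here, so the window size, hop, band counts, and frequency range below are illustrative assumptions, not the authors' configuration.

    ```python
    import numpy as np

    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    def hz_to_bark(f):
        # Traunmüller (1990) approximation of the Bark scale
        return 26.81 * f / (1960.0 + f) - 0.53

    def bark_to_hz(z):
        return 1960.0 * (z + 0.53) / (26.28 - z)

    def triangular_fb(n_bands, n_fft, sr, fwd, inv, fmin=50.0, fmax=8000.0):
        """Triangular filterbank with band edges equally spaced on a perceptual scale."""
        edges = inv(np.linspace(fwd(fmin), fwd(fmax), n_bands + 2))
        bins = np.floor((n_fft + 1) * edges / sr).astype(int)
        fb = np.zeros((n_bands, n_fft // 2 + 1))
        for b in range(n_bands):
            lo, c, hi = bins[b], bins[b + 1], bins[b + 2]
            for k in range(lo, c):            # rising slope
                fb[b, k] = (k - lo) / max(c - lo, 1)
            for k in range(c, hi):            # falling slope
                fb[b, k] = (hi - k) / max(hi - c, 1)
        return fb

    # Illustrative parameters (not the paper's): 1 s of a 440 Hz test tone
    sr, n_fft, hop = 22050, 1024, 256
    t = np.arange(sr) / sr
    y = 0.5 * np.sin(2 * np.pi * 440.0 * t)

    # Power spectrogram via a plain Hann-windowed STFT
    win = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    S = np.abs(np.stack(
        [np.fft.rfft(win * y[i * hop:i * hop + n_fft]) for i in range(n_frames)],
        axis=1)) ** 2

    # Two perceptually motivated representations, in dB
    mel_fb = triangular_fb(64, n_fft, sr, hz_to_mel, mel_to_hz)
    bark_fb = triangular_fb(24, n_fft, sr, hz_to_bark, bark_to_hz)
    log_mel = 10.0 * np.log10(mel_fb @ S + 1e-10)   # (64 bands, frames)
    log_bark = 10.0 * np.log10(bark_fb @ S + 1e-10)  # (24 bands, frames)
    ```

    Frame-level energies from either representation could then be fed to a dynamics classifier; the Bark scale allocates fewer, wider bands that track critical bands of hearing, which is one plausible reading of why it helps for loudness-related tasks.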
  • Description

    This work was accepted at the 25th International Society for Music Information Retrieval Conference (ISMIR 2024), held in San Francisco, USA, November 10-14, 2024.