Saraga Audiovisual: a large multimodal open data collection for the analysis of Carnatic music
Citation
- Shankar A, Plaja-Roglans G, Nuttall T, Rocamora M, Serra X. Saraga Audiovisual: a large multimodal open data collection for the analysis of Carnatic music. Paper presented at: 25th International Society for Music Information Retrieval Conference (ISMIR2024); 2024 November 10-14; San Francisco, USA.
Abstract
Carnatic music is a style of South Indian art music whose analysis using computational methods is an active area of research in Music Information Research (MIR). A core, open dataset for such analysis is the Saraga dataset, which includes multi-stem audio, expert annotations, and accompanying metadata. However, several limitations of the Saraga collections have been noted, and additional relevant aspects of the tradition still need to be covered to facilitate musicologically important research lines. In this work, we present Saraga Audiovisual, a dataset that includes new and more diverse renditions of Carnatic vocal performances, totalling 42 concerts and more than 60 hours of music. A major contribution of this dataset is the inclusion of video recordings for all concerts, allowing for a wide range of multimodal analyses. We also provide high-quality human pose estimation data of the musicians extracted from the video footage, and perform benchmarking experiments for the different modalities to validate the utility of the novel collection. Saraga Audiovisual, along with access tools and results of our experiments, is made available for research purposes.
Description
This work has been accepted at the 25th International Society for Music Information Retrieval Conference (ISMIR 2024), held in San Francisco, USA, on November 10-14, 2024.