Research corpora are representative collections of data and
are essential for developing data-driven approaches in Music
Information Research (MIR). We address the problem of
building research corpora for MIR in the Indian art music traditions
of Hindustani and Carnatic music, considering several
relevant criteria for building such corpora. We also
discuss a methodology to assess the corpora based on these
criteria and present an evaluation of the corpora in terms of their
coverage and completeness. In addition to the corpora, we
briefly describe the test datasets that we have built for use
in many research tasks. Specifically, we describe the tonic
dataset, the Carnatic rhythm dataset, the Carnatic varṇaṁ
dataset, and the Mridangam stroke dataset. The criteria and
the evaluation methodology discussed in this article can be
used to systematically build a representative and comprehensive
research corpus. The corpora and the datasets are
accessible to the research community through a central online
repository.