Research corpora are fundamental for the computational study of
music. Defining the design criteria with which to create them is a
research task in itself. These corpora need to be well suited to the
specific research problems to be addressed. Since these research
problems are also shaped by the musical, cultural and other specific
aspects of the music traditions under study, the research corpora
should take these specificities into account. In this paper we address
the problem of creating corpora for computational research
on Arab-Andalusian music, considering several relevant criteria for
creating such corpora. We focus on the problems raised during
the annotation process of the corpora, specifically the language issues
surrounding this art music tradition. Following these criteria,
we created a research corpus consisting of audio recordings with
their corresponding metadata, lyrics and music scores. So far we
have gathered 338 recordings from 3 different Arab-Andalusian
music schools of Morocco, covering most of the musical modes,
rhythms and forms of this art music tradition. The Arab-Andalusian
corpus is accessible to the research community from a central online
repository. Moreover, the audio recordings of this corpus are
freely available through the Internet Archive repository. The
Arab-Andalusian corpus can be used to generate test datasets that
serve as ground truth for evaluating several computational research tasks.