This paper focuses on Cover Song Identification (CSI),
an important research challenge in content-based Music
Information Retrieval (MIR). Although the task itself is
interesting and challenging in both academic and industrial settings, several limitations hinder the
advancement of current approaches. We specifically address two of them in the present study. First, the number of publicly available datasets for this task is limited,
and there is no publicly available benchmark set that is
widely used among researchers for comparative algorithm
evaluation. Second, most algorithms are neither publicly shared nor reproducible, which limits the comparison of
approaches. To overcome these limitations, we propose
Da-TACOS, a DaTAset for COver Song Identification and
Understanding, and two frameworks for feature extraction
and benchmarking to facilitate reproducibility. Da-TACOS
contains 25K songs represented by unique editorial metadata plus 9 low- and mid-level features pre-computed with
open source libraries, and is divided into two subsets. The
Cover Analysis subset contains audio features (e.g. key,
tempo) that can serve to study how musical characteristics vary for cover songs. The Benchmark subset contains
the set of features that have been frequently used in CSI research, e.g. chroma, MFCC, and beat onsets. Moreover, we
provide initial benchmarking results for a selection
of state-of-the-art CSI algorithms on our dataset and,
for reproducibility, share a GitHub repository containing the feature extraction and benchmarking frameworks.