Generating continuous f0 annotations for tasks such as
melody extraction and multiple f0 estimation typically involves
running a monophonic pitch tracker on each track
of a multitrack recording and manually correcting any estimation
errors. This process is labor-intensive and time-consuming,
and consequently existing annotated datasets
are very limited in size. In this paper we propose a framework
for automatically generating continuous f0 annotations
without requiring manual refinement: the estimate
of a pitch tracker is used to drive an analysis/synthesis
pipeline which produces a synthesized version of the track.
Any estimation errors are now reflected in the synthesized
audio, meaning the tracker’s output represents an accurate
annotation. Analysis is performed using a wide-band
harmonic sinusoidal modeling algorithm which estimates
the frequency, amplitude and phase of every harmonic,
meaning the synthesized track closely resembles the original
in terms of timbre and dynamics. Finally, the synthesized
track is automatically mixed back into the multitrack.
The framework can be used to annotate multitrack datasets
for training learning-based algorithms. Furthermore, we
show that algorithms evaluated on the automatically generated
and annotated mixes produce results that are statistically
indistinguishable from those they produce on the original,
manually annotated, mixes. We release a software library
implementing the proposed framework, along with new
datasets for melody, bass and multiple f0 estimation.
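
To make the core idea concrete, the sketch below shows a minimal version of the annotate-by-resynthesis loop in Python: a monophonic pitch tracker's f0 estimate drives a simple harmonic resynthesis, so whatever the tracker returns is, by construction, an exact annotation of the synthesized audio. This is an illustrative assumption-laden sketch, not the released library's implementation: the choice of pyin as the tracker, a fixed harmonic count, and a 1/h amplitude roll-off are placeholders, whereas the proposed framework uses a wide-band harmonic sinusoidal model that also recovers per-harmonic amplitudes and phases.

    # Illustrative sketch only; not the API of the released library.
    import numpy as np
    import librosa

    def resynthesize_from_f0(y, sr, n_harmonics=10, hop_length=256):
        # 1. Analysis: estimate a continuous f0 curve with a monophonic tracker.
        f0, voiced, _ = librosa.pyin(
            y,
            fmin=librosa.note_to_hz("C2"),
            fmax=librosa.note_to_hz("C6"),
            sr=sr,
            hop_length=hop_length,
        )
        f0 = np.where(voiced, f0, 0.0)  # treat unvoiced frames as silence

        # Interpolate the frame-rate f0 curve up to the audio sample rate.
        frame_times = librosa.frames_to_time(
            np.arange(len(f0)), sr=sr, hop_length=hop_length
        )
        t = np.arange(len(y)) / sr
        f0_samples = np.interp(t, frame_times, f0)

        # 2. Synthesis: sum harmonics whose instantaneous frequency follows
        #    the (possibly erroneous) f0 estimate. Any tracking errors end up
        #    in the audio itself, so (frame_times, f0) remains an accurate
        #    annotation of the synthesized signal by construction.
        phase = 2 * np.pi * np.cumsum(f0_samples) / sr
        y_synth = np.zeros_like(y, dtype=float)
        for h in range(1, n_harmonics + 1):
            y_synth += np.cos(h * phase) / h  # crude 1/h amplitude roll-off
        y_synth *= (f0_samples > 0)           # silence in unvoiced regions

        return y_synth, frame_times, f0

In use, one would load a single stem (e.g. y, sr = librosa.load("stem.wav", sr=None)), resynthesize it with the function above, and remix the synthesized stem with the other tracks; the returned (frame_times, f0) pair then serves as the ground-truth annotation for the remixed audio.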