Openly available datasets are a key factor in the advancement
of data-driven research approaches, including many
of the ones used in sound and music computing. In the
last few years, quite a number of new audio datasets have
been made available, but many of them still have major
shortcomings that limit their research impact.
Among the common shortcomings are the lack of transparency
in their creation and the difficulty of making them
completely open and sharable. They often do not include
clear mechanisms to amend errors, and they are frequently
not large enough for current machine learning needs. This
paper introduces Freesound Datasets, an online platform
for the collaborative creation of open audio datasets based
on principles of transparency, openness, dynamic character,
and sustainability. As a proof-of-concept, we present
an early snapshot of a large-scale audio dataset built using
this platform. It consists of audio samples from Freesound
organised in a hierarchy based on the AudioSet Ontology.
We believe that building and maintaining datasets following
the outlined principles and using open tools and collaborative
approaches like the ones presented here will have a
significant impact on our research community.