FSD50K: an open dataset of human-labeled sound events
- dc.contributor.author Fonseca, Eduardo
- dc.contributor.author Favory, Xavier
- dc.contributor.author Pons, Jordi
- dc.contributor.author Font, Frederic
- dc.contributor.author Serra, Xavier
- dc.date.accessioned 2023-03-07T08:06:55Z
- dc.date.available 2023-03-07T08:06:55Z
- dc.date.issued 2022
- dc.description.abstract Most existing datasets for sound event recognition (SER) are relatively small and/or domain-specific, with the exception of AudioSet, based on over 2 M tracks from YouTube videos and encompassing over 500 sound classes. However, AudioSet is not an open dataset as its official release consists of pre-computed audio features. Downloading the original audio tracks can be problematic due to YouTube videos gradually disappearing and usage rights issues. To provide an alternative benchmark dataset and thus foster SER research, we introduce FSD50K, an open dataset containing over 51 k audio clips totalling over 100 h of audio manually labeled using 200 classes drawn from the AudioSet Ontology. The audio clips are licensed under Creative Commons licenses, making the dataset freely distributable (including waveforms). We provide a detailed description of the FSD50K creation process, tailored to the particularities of Freesound data, including challenges encountered and solutions adopted. We include a comprehensive dataset characterization along with discussion of limitations and key factors to allow its audio-informed usage. Finally, we conduct sound event classification experiments to provide baseline systems as well as insight on the main factors to consider when splitting Freesound audio data for SER. Our goal is to develop a dataset to be widely adopted by the community as a new open benchmark for SER research.
- dc.description.sponsorship This work was supported in part by European Union’s Horizon 2020 research and innovation programme under Grant Agreement 688382 AudioCommons, in part by two Google Faculty Research Awards 2017 and 2018, and in part by the Maria de Maeztu Units of Excellence Programme under Grant MDM-2015-0502.
- dc.format.mimetype application/pdf
- dc.identifier.citation Fonseca E, Favory X, Pons J, Font F, Serra X. FSD50K: an open dataset of human-labeled sound events. IEEE/ACM Trans Audio Speech Lang Process. 2022;30:829-52. DOI: 10.1109/TASLP.2021.3133208
- dc.identifier.doi http://dx.doi.org/10.1109/TASLP.2021.3133208
- dc.identifier.issn 2329-9290
- dc.identifier.uri http://hdl.handle.net/10230/56072
- dc.language.iso eng
- dc.publisher Institute of Electrical and Electronics Engineers (IEEE)
- dc.relation.ispartof IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2022;30:829-52.
- dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/688382
- dc.relation.projectID info:eu-repo/grantAgreement/ES/1PE/MDM-2015-0502
- dc.rights © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. http://dx.doi.org/10.1109/TASLP.2021.3133208
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.subject.keyword audio dataset
- dc.subject.keyword sound event
- dc.subject.keyword recognition
- dc.subject.keyword classification
- dc.subject.keyword tagging
- dc.subject.keyword data collection
- dc.subject.keyword environmental sound
- dc.title FSD50K: an open dataset of human-labeled sound events
- dc.type info:eu-repo/semantics/article
- dc.type.version info:eu-repo/semantics/acceptedVersion