
A Simple fusion of deep and shallow learning for acoustic scene classification


dc.contributor.author Fonseca, Eduardo
dc.contributor.author Gong, Rong
dc.contributor.author Serra, Xavier
dc.date.accessioned 2019-03-06T09:33:37Z
dc.date.available 2019-03-06T09:33:37Z
dc.date.issued 2018
dc.identifier.citation Fonseca E, Gong R, Serra X. A Simple fusion of deep and shallow learning for acoustic scene classification. In: Georgaki A, Andreopoulou A, editors. Proceedings of the 15th Sound and Music Computing Conference (SMC2018). Sonic crossing; 2018 Jul 4-7; Limassol, Cyprus. Limassol: Cyprus University of Technology; 2018. p. 265-72. DOI: 10.5281/zenodo.1422583
dc.identifier.isbn 978-9963-697-30-4
dc.identifier.issn 2518-3672
dc.identifier.uri http://hdl.handle.net/10230/36757
dc.description Paper presented at the 15th Sound and Music Computing Conference (SMC2018), Sonic crossing, held in Limassol, Cyprus, 4-7 July 2018.
dc.description.abstract In the past, Acoustic Scene Classification systems have been based on hand-crafting audio features that are input to a classifier. Nowadays, the common trend is to adopt data-driven techniques, e.g., deep learning, where audio representations are learned from data. In this paper, we propose a system that consists of a simple fusion of two methods of the aforementioned types: a deep learning approach where log-scaled mel-spectrograms are input to a convolutional neural network, and a feature engineering approach, where a collection of hand-crafted features is input to a gradient boosting machine. We first show that both methods provide complementary information to some extent. Then, we use a simple late fusion strategy to combine both methods. We report classification accuracy of each method individually and of the combined system on the TUT Acoustic Scenes 2017 dataset. The proposed fused system outperforms each of the individual methods and attains a classification accuracy of 72.8% on the evaluation set, improving the baseline system by 11.8%.
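The abstract describes a late fusion of two classifiers: a convolutional neural network operating on log-scaled mel-spectrograms and a gradient boosting machine operating on hand-crafted features. Below is a minimal sketch of such a late fusion, not the authors' implementation: it assumes two already-trained models whose per-clip class probabilities are available, and the function name late_fusion, the equal weighting, and the use of 15 classes (the number of scenes in TUT Acoustic Scenes 2017) are illustrative assumptions made here.

import numpy as np

def late_fusion(cnn_probs: np.ndarray,
                gbm_probs: np.ndarray,
                weight: float = 0.5) -> np.ndarray:
    """Combine per-class probabilities from two classifiers by weighted averaging.

    cnn_probs, gbm_probs: arrays of shape (n_clips, n_classes) holding the
    class-probability outputs of the CNN (log-mel spectrogram input) and the
    gradient boosting machine (hand-crafted feature input).
    weight: contribution of the CNN; (1 - weight) goes to the GBM.
    Returns the predicted class index for each clip.
    """
    fused = weight * cnn_probs + (1.0 - weight) * gbm_probs
    return np.argmax(fused, axis=1)

# Example with dummy probabilities for 3 clips and 15 acoustic scene classes.
rng = np.random.default_rng(0)
cnn_probs = rng.dirichlet(np.ones(15), size=3)
gbm_probs = rng.dirichlet(np.ones(15), size=3)
print(late_fusion(cnn_probs, gbm_probs, weight=0.5))

In practice the fusion weight could be tuned on a validation split; an equal weight simply averages the two models' probabilities before taking the argmax.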
dc.description.sponsorship This work is partially supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No 688382 "AudioCommons", and the European Research Council under the European Union's Seventh Framework Program, as part of the CompMusic project (ERC grant agreement 267583), and a Google Faculty Research Award 2017. We are grateful for the GPUs donated by NVidia.
dc.format.mimetype application/pdf
dc.language.iso eng
dc.publisher Cyprus University of Technology
dc.relation.ispartof Georgaki A, Andreopoulou A, editors. Proceedings of the 15th Sound and Music Computing Conference (SMC2018). Sonic crossing; 2018 Jul 4-7; Limassol, Cyprus. Limassol: Cyprus University of Technology; 2018. p. 265-72.
dc.rights © 2018 Eduardo Fonseca, Rong Gong, and Xavier Serra. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
dc.rights.uri https://creativecommons.org/licenses/by/3.0/
dc.title A Simple fusion of deep and shallow learning for acoustic scene classification
dc.type info:eu-repo/semantics/conferenceObject
dc.identifier.doi http://dx.doi.org/10.5281/zenodo.1422583
dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/688382
dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/267583
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.type.version info:eu-repo/semantics/publishedVersion
