Welcome to the UPF Digital Repository

An Interpretable deep learning model for automatic sound classification

dc.contributor.author Zinemanas, Pablo
dc.contributor.author Rocamora, Martín
dc.contributor.author Miron, Marius
dc.contributor.author Font Corbera, Frederic
dc.contributor.author Serra, Xavier
dc.date.accessioned 2021-05-06T10:19:56Z
dc.date.available 2021-05-06T10:19:56Z
dc.date.issued 2021
dc.identifier.citation Zinemanas P, Rocamora M, Miron M, Font F, Serra X. An Interpretable deep learning model for automatic sound classification. Electronics. 2021;10(7):850. DOI: 10.3390/electronics10070850
dc.identifier.issn 2079-9292
dc.identifier.uri http://hdl.handle.net/10230/47341
dc.description.abstract Deep learning models have improved cutting-edge technologies in many research areas, but their black-box structure makes it difficult to understand their inner workings and the rationale behind their predictions. This may lead to unintended effects, such as being susceptible to adversarial attacks or the reinforcement of biases. There is still a lack of research in the audio domain, despite the increasing interest in developing deep learning models that provide explanations of their decisions. To reduce this gap, we propose a novel interpretable deep learning model for automatic sound classification, which explains its predictions based on the similarity of the input to a set of learned prototypes in a latent space. We leverage domain knowledge by designing a frequency-dependent similarity measure and by considering different time-frequency resolutions in the feature space. The proposed model achieves results that are comparable to those of state-of-the-art methods in three different sound classification tasks involving speech, music, and environmental audio. In addition, we present two automatic methods to prune the proposed model that exploit its interpretability. Our system is open source and is accompanied by a web application for the manual editing of the model, which allows for a human-in-the-loop debugging approach.
dc.format.mimetype application/pdf
dc.language.iso eng
dc.publisher MDPI
dc.relation.ispartof Electronics. 2021;10(7):850
dc.relation.isreferencedby https://github.com/pzinemanas/APNet
dc.rights © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
dc.rights.uri http://creativecommons.org/licenses/by/4.0/
dc.title An Interpretable deep learning model for automatic sound classification
dc.type info:eu-repo/semantics/article
dc.identifier.doi http://dx.doi.org/10.3390/electronics10070850
dc.subject.keyword Interpretability
dc.subject.keyword Explainability
dc.subject.keyword Deep learning
dc.subject.keyword Sound classification
dc.subject.keyword Prototypes
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.type.version info:eu-repo/semantics/publishedVersion

