Toward interpretable polyphonic sound event detection with attention maps based on local prototypes

  • dc.contributor.author Zinemanas, Pablo
  • dc.contributor.author Rocamora, Martín
  • dc.contributor.author Fonseca, Eduardo
  • dc.contributor.author Font, Frederic
  • dc.contributor.author Serra, Xavier
  • dc.date.accessioned 2021-12-14T08:44:50Z
  • dc.date.available 2021-12-14T08:44:50Z
  • dc.date.issued 2021
  • dc.description Paper presented at: DCASE 2021, held virtually from 15 to 19 November 2021.
  • dc.description.abstract Understanding the reasons behind the predictions of deep neural networks is a pressing concern, as it can be critical in several application scenarios. In this work, we present a novel interpretable model for polyphonic sound event detection. It tackles one of the limitations of our previous work, namely the difficulty of properly handling a multi-label setting. The proposed architecture incorporates a prototype layer and an attention mechanism. The network learns a set of local prototypes in the latent space, each representing a patch of the input representation. In addition, it learns attention maps for positioning the local prototypes and reconstructing the latent space. The predictions are then based solely on the attention maps, so the explanations provided are the attention maps and the corresponding local prototypes. Moreover, the prototypes can be reconstructed to the audio domain for inspection. The results obtained in urban sound event detection are comparable to those of two opaque baselines, with fewer parameters, while offering interpretability.
  • dc.format.mimetype application/pdf
  • dc.identifier.citation Zinemanas P, Rocamora M, Fonseca E, Font F, Serra X. Toward interpretable polyphonic sound event detection with attention maps based on local prototypes. In: Font F, Mesaros A, Ellis DPW, Fonseca E, Fuentes M, Elizalde B, editors. Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2021); 2021 Nov 15-19; Online. Barcelona: Music Technology Group, Universitat Pompeu Fabra; 2021. p.50-4. DOI: 10.5281/zenodo.5770113
  • dc.identifier.doi http://dx.doi.org/10.5281/zenodo.5770113
  • dc.identifier.isbn 978-84-09-36072-7
  • dc.identifier.uri http://hdl.handle.net/10230/49196
  • dc.language.iso eng
  • dc.publisher Universitat Pompeu Fabra. Music Technology Group
  • dc.relation.ispartof Font F, Mesaros A, Ellis DPW, Fonseca E, Fuentes M, Elizalde B, editors. Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2021); 2021 Nov 15-19; Online. Barcelona: Music Technology Group, Universitat Pompeu Fabra; 2021.
  • dc.rights This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit: http://creativecommons.org/licenses/by/4.0/
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.rights.uri http://creativecommons.org/licenses/by/4.0/
  • dc.subject.keyword Interpretability
  • dc.subject.keyword Sound event detection
  • dc.subject.keyword Prototypes
  • dc.title Toward interpretable polyphonic sound event detection with attention maps based on local prototypes
  • dc.type info:eu-repo/semantics/conferenceObject
  • dc.type.version info:eu-repo/semantics/publishedVersion