Environmental sound recognition using short-time feature aggregation

Roma Trepat, Gerard; Herrera Boyer, Perfecto, 1964-; Nogueira, Waldo

Roma G, Herrera P, Nogueira W. Environmental sound recognition using short-time feature aggregation. J Intell Inf Syst. 2018;51:457-75. DOI: 10.1007/s10844-017-0481-4
ISSN: 0925-9902
Handle: http://hdl.handle.net/10230/45243

Recognition of environmental sound is usually based on one of two main architectures, depending on whether the model is trained with frame-level features or with aggregated descriptions of acoustic scenes or events. The former architecture is appropriate for applications where the target categories are known in advance, while the latter affords a less supervised approach. In this paper, we propose a framework for environmental sound recognition based on blind segmentation and feature aggregation. We describe a new set of descriptors, based on Recurrence Quantification Analysis (RQA), which can be extracted from the similarity matrix of a time series of audio descriptors. We analyze their usefulness, in addition to standard feature aggregation, for the recognition of acoustic scenes and events. Our results show the potential of non-linear time series analysis techniques for dealing with environmental sounds.

© Springer. The final publication is available at Springer via http://dx.doi.org/10.1007/s10844-017-0481-4

Keywords: Audio databases; Event detection; Environmental sound recognition; Audio features; Recurrence quantification analysis; Pattern recognition
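The abstract names two ways of summarizing a segment: standard feature aggregation and RQA descriptors computed from a similarity matrix of frame-level descriptors. The following is a minimal sketch of both ideas, not the authors' implementation. It assumes frame-level descriptor vectors (for example MFCCs) are already computed for one segment, binarizes the distance matrix with a fixed Euclidean threshold `eps`, and uses three common RQA measures from the general literature: recurrence rate (RR), determinism (DET), and average diagonal line length (L). The function names `aggregate`, `recurrence_matrix`, `diagonal_runs` and `rqa`, the threshold scheme, and the choice of measures are all hypothetical; the paper's actual descriptor set and parameters may differ, and the blind segmentation step is not shown.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one frame-level descriptor vector (assumed precomputed)


def aggregate(frames: List[Frame]) -> List[float]:
    """Standard aggregation: per-dimension mean followed by per-dimension std."""
    n, dims = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
            for d in range(dims)]
    return means + stds


def recurrence_matrix(frames: List[Frame], eps: float) -> List[List[int]]:
    """Binary recurrence matrix: R[i][j] = 1 if frames i, j are within eps.

    The fixed Euclidean threshold is an assumption made for this sketch.
    """
    n = len(frames)
    return [[1 if math.dist(frames[i], frames[j]) <= eps else 0
             for j in range(n)] for i in range(n)]


def diagonal_runs(R: List[List[int]]) -> List[int]:
    """Lengths of runs of consecutive 1s on every diagonal except the main one."""
    n = len(R)
    runs: List[int] = []
    for k in range(-(n - 1), n):
        if k == 0:
            continue  # skip the line of identity, which is trivially recurrent
        run = 0
        # Valid rows for offset k satisfy 0 <= i < n and 0 <= i + k < n.
        for i in range(max(0, -k), min(n, n - k)):
            if R[i][i + k]:
                run += 1
            else:
                if run:
                    runs.append(run)
                run = 0
        if run:
            runs.append(run)
    return runs


def rqa(R: List[List[int]], lmin: int = 2) -> Dict[str, float]:
    """Recurrence rate, determinism and mean diagonal line length (hypothetical set)."""
    n = len(R)
    runs = diagonal_runs(R)
    recurrent = sum(runs)  # every off-diagonal recurrent point lies in exactly one run
    long_runs = [r for r in runs if r >= lmin]
    rr = recurrent / (n * n - n) if n > 1 else 0.0
    det = sum(long_runs) / recurrent if recurrent else 0.0
    avg_l = sum(long_runs) / len(long_runs) if long_runs else 0.0
    return {"RR": rr, "DET": det, "L": avg_l}


if __name__ == "__main__":
    # Toy one-dimensional "descriptor" series alternating between two states.
    frames = [[0.0], [1.0], [0.0], [1.0], [0.0], [1.0]]
    print(aggregate(frames))                        # [0.5, 0.5]
    print(rqa(recurrence_matrix(frames, eps=0.5)))  # {'RR': 0.4, 'DET': 1.0, 'L': 3.0}
```

In a setup like the one the abstract describes, the aggregated statistics and the RQA values for each segment could be concatenated into a single segment-level feature vector for a classifier. The periodic toy series yields long diagonal lines and therefore a high DET, which is the kind of temporal structure that plain mean/std statistics cannot capture.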