Zipf’s law in short-time timbral codings of speech, music, and environmental sound signals

Haro Berois, Martín; Serrà Julià, Joan; Herrera Boyer, Perfecto, 1964-; Corral, Álvaro

dc.contributor.author Haro Berois, Martín
dc.contributor.author Serrà Julià, Joan
dc.contributor.author Herrera Boyer, Perfecto, 1964-
dc.contributor.author Corral, Álvaro
dc.date.accessioned 2019-07-17T15:30:33Z
dc.date.available 2019-07-17T15:30:33Z
dc.date.issued 2012
dc.description.abstract Timbre is a key perceptual feature that allows discrimination between different sounds. Timbral sensations are highly dependent on the temporal evolution of the power spectrum of an audio signal. In order to quantitatively characterize such sensations, the shape of the power spectrum has to be encoded in a way that preserves certain physical and perceptual properties. Therefore, it is common practice to encode short-time power spectra using psychoacoustical frequency scales. In this paper, we study and characterize the statistical properties of such encodings, here called timbral code-words. In particular, we report on rank-frequency distributions of timbral code-words extracted from 740 hours of audio coming from disparate sources such as speech, music, and environmental sounds. Analogously to text corpora, we find a heavy-tailed Zipfian distribution with exponent close to one. Importantly, this distribution is found independently of different encoding decisions and regardless of the audio source. Further analysis on the intrinsic characteristics of most and least frequent code-words reveals that the most frequent code-words tend to have a more homogeneous structure. We also find that speech and music databases have specific, distinctive code-words while, in the case of the environmental sounds, these database-specific code-words are not present. Finally, we find that a Yule-Simon process with memory provides a reasonable quantitative approximation for our data, suggesting the existence of a common simple generative mechanism for all considered sound sources.
dc.description.sponsorship Funding was received from Classical Planet: TSI-070100-2009-407 (MITYC), www.mityc.es; DRIMS: TIN2009-14247-C02-01 (MICINN), www.micinn.es; FIS2009-09508, www.micinn.es; and 2009SGR-164, www.gencat.cat. JS acknowledges funding from Consejo Superior de Investigaciones Científicas (JAEDOC069/2010), www.csic.es; and Generalitat de Catalunya (2009-SGR-1434), www.gencat.cat.
dc.format.mimetype application/pdf
dc.identifier.citation Haro M, Serrà J, Herrera P, Corral Á. Zipf’s law in short-time timbral codings of speech, music, and environmental sound signals. PLoS ONE. 2012;7(3):e33993. DOI: 10.1371/journal.pone.0033993
dc.identifier.doi http://dx.doi.org/10.1371/journal.pone.0033993
dc.identifier.issn 1932-6203
dc.identifier.uri http://hdl.handle.net/10230/42026
dc.language.iso eng
dc.publisher Public Library of Science (PLoS)
dc.relation.ispartof PLoS ONE. 2012;7(3):e33993
dc.relation.projectID info:eu-repo/grantAgreement/ES/3PN/TIN2009-14247-C02-01
dc.relation.projectID info:eu-repo/grantAgreement/ES/3PN/FIS2009-09508
dc.rights © 2012 Haro et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.title Zipf’s law in short-time timbral codings of speech, music, and environmental sound signals
dc.type info:eu-repo/semantics/article
dc.type.version info:eu-repo/semantics/publishedVersion
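
The rank-frequency analysis described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the integer labels stand in for timbral code-words, the toy corpus is synthetic, the function names are hypothetical, and the exponent is estimated with a plain least-squares fit in log-log space, which is only a rough estimator and is assumed here for brevity.

```python
import math
from collections import Counter


def rank_frequency(codewords):
    """Return code-word counts sorted from most to least frequent."""
    counts = Counter(codewords)
    return sorted(counts.values(), reverse=True)


def zipf_exponent(frequencies):
    """Estimate the Zipf exponent as minus the least-squares slope
    of log(frequency) against log(rank). Needs at least two ranks."""
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Synthetic stand-in corpus (an assumption, not real audio data):
# hypothetical code-word k occurs about 1000 / k times, for k = 1..50.
toy_codewords = []
for k in range(1, 51):
    toy_codewords.extend([k] * round(1000 / k))

frequencies = rank_frequency(toy_codewords)
exponent = zipf_exponent(frequencies)
print(f"distinct code-words: {len(frequencies)}, fitted exponent: {exponent:.3f}")
```

Because the toy counts are built to decay as 1/rank, the fitted exponent comes out close to one, which is the regime the abstract reports for real audio. With real code-words, any hashable encoding of a quantized short-time spectrum could replace the integer labels.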

Collections

Articles (Departament de Tecnologies de la Informació i les Comunicacions)