Authors: Fuentes, Magdalena; Plaja-Roglans, Genís; Cortès Sebastià, Guillem; Khandelwal, Tanmay; Miron, Marius; Serra, Xavier; Bello, Juan Pablo; Salomon, Justin
Repository date: 2025-05-27
Date issued: 2024
Citation: Fuentes M, Plaja-Roglans G, Cortès-Sebastià G, Khandelwal T, Miron M, Bello JP, et al. Soundata: reproducible use of audio datasets. J Open Source Softw. 2024;9(98):6634. DOI: 10.21105/joss.06634
ISSN: 2475-9066
Handle: http://hdl.handle.net/10230/70528
Abstract: Soundata is an open-source Python library for working with audio datasets in a programmatic and standardized way. It removes the need for writing custom loaders and improves reproducibility by providing tools to validate data against a canonical version. It speeds up research pipelines by allowing users to quickly download a dataset, validate that the dataset is complete and correct, and load it into memory in a standardized and reproducible way. It is designed to work with bioacoustics, environmental, urban, and spatial sound datasets; to be easy to use and easy to contribute to; and to increase reproducibility and standardize the usage of sound datasets in a flexible way.
Format: application/pdf
Language: eng
Rights: Authors of papers retain copyright and release the work under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
Title: Soundata: reproducible use of audio datasets
Type: info:eu-repo/semantics/article
DOI: http://dx.doi.org/10.21105/joss.06634
Keywords: Audio; Environmental-sound; Bioacoustics; Dataset; Urban-sound
Access rights: info:eu-repo/semantics/openAccess
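The abstract's "validate data against a canonical version" step can be illustrated with a minimal, stdlib-only sketch of checksum-based validation. This is not Soundata's code or API: it assumes, for illustration only, that the canonical version is an index mapping each file's relative path to an expected MD5 checksum, and the helpers `md5_of_file` and `validate_against_index` are hypothetical names. The check reports files that are missing on disk and files whose contents differ from the canonical copy.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_against_index(
    data_home: Path, index: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Compare local files with a (hypothetical) canonical index of path -> expected MD5.

    Returns (missing, mismatched): paths absent under data_home, and paths whose
    checksum differs from the index. Both lists are in sorted path order.
    """
    missing: list[str] = []
    mismatched: list[str] = []
    for rel_path, expected_md5 in sorted(index.items()):
        path = data_home / rel_path
        if not path.exists():
            missing.append(rel_path)
        elif md5_of_file(path) != expected_md5:
            mismatched.append(rel_path)
    return missing, mismatched


if __name__ == "__main__":
    import tempfile

    # Build a tiny fake dataset: one correct file, one corrupted file, one never written.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "audio").mkdir()
        (root / "audio" / "clip1.wav").write_bytes(b"fake audio 1")
        (root / "audio" / "clip2.wav").write_bytes(b"corrupted")
        index = {
            "audio/clip1.wav": hashlib.md5(b"fake audio 1").hexdigest(),
            "audio/clip2.wav": hashlib.md5(b"fake audio 2").hexdigest(),
            "audio/clip3.wav": hashlib.md5(b"fake audio 3").hexdigest(),
        }
        missing, mismatched = validate_against_index(root, index)
        print("missing:", missing)  # missing: ['audio/clip3.wav']
        print("mismatched:", mismatched)  # mismatched: ['audio/clip2.wav']
```

Comparing checksums rather than only checking that files exist is what makes it possible to detect both incomplete and corrupted downloads, which matches the two properties the abstract names ("complete and correct").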