Urban sound & sight: dataset and benchmark for audio-visual urban scene understanding

  • dc.contributor.author Fuentes, Magdalena
  • dc.contributor.author Steers, Bea
  • dc.contributor.author Zinemanas, Pablo
  • dc.contributor.author Rocamora, Martín
  • dc.contributor.author Bondi, Luca
  • dc.contributor.author Wilkins, Julia
  • dc.contributor.author Shi, Qianyi
  • dc.contributor.author Hou, Yao
  • dc.contributor.author Das, Samarjit
  • dc.contributor.author Serra, Xavier
  • dc.contributor.author Bello, Juan Pablo
  • dc.date.accessioned 2022-06-20T05:54:39Z
  • dc.date.available 2022-06-20T05:54:39Z
  • dc.date.issued 2022
  • dc.description Paper presented at: 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), held from 22 to 27 May 2022 in Singapore.
  • dc.description.abstract Automatic audio-visual urban traffic understanding is a growing area of research with many potential applications of value to industry, academia, and the public sector. Yet the lack of well-curated resources for training and evaluating models in this area hinders their development. To address this, we present a curated audio-visual dataset, Urban Sound & Sight (Urbansas), developed for investigating the detection and localization of sounding vehicles in the wild. Urbansas consists of 12 hours of unlabeled data along with 3 hours of manually annotated data, including vehicle bounding boxes with classes and unique IDs, and strong audio labels featuring vehicle types and indicating off-screen sounds. We discuss the challenges presented by the dataset and how to use its annotations for the localization of vehicles in the wild through audio models.
  • dc.format.mimetype application/pdf
  • dc.identifier.citation Fuentes M, Steers B, Zinemanas P, Rocamora M, Bondi L, Wilkins J, Shi Q, Hou Y, Das S, Serra X, Bello JP. Urban sound & sight: dataset and benchmark for audio-visual urban scene understanding. In: 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP); 2022 May 22-27; Singapore. [New Jersey]: The Institute of Electrical and Electronics Engineers; 2022. p. 141-5. DOI: 10.1109/ICASSP43922.2022.9747644
  • dc.identifier.doi http://doi.org/10.1109/ICASSP43922.2022.9747644
  • dc.identifier.uri http://hdl.handle.net/10230/53526
  • dc.language.iso eng
  • dc.publisher Institute of Electrical and Electronics Engineers (IEEE)
  • dc.relation.ispartof 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP); 2022 May 22-27; Singapore. [New Jersey]: The Institute of Electrical and Electronics Engineers; 2022. p. 141-5.
  • dc.rights © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. http://dx.doi.org/10.1109/ICASSP43922.2022.9747644
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.subject.keyword audio-visual
  • dc.subject.keyword urban research
  • dc.subject.keyword traffic
  • dc.subject.keyword dataset
  • dc.title Urban sound & sight: dataset and benchmark for audio-visual urban scene understanding
  • dc.type info:eu-repo/semantics/article
  • dc.type.version info:eu-repo/semantics/acceptedVersion