Enhancing neural audio fingerprint robustness to audio degradation for music identification

Araz, Recep Oguz; Cortès Sebastià, Guillem; Molina, Emilio; Serra, Joan; Serra, Xavier; Mitsufuji, Yuhki; Bogdanov, Dmitry

dc.contributor.author Araz, Recep Oguz
dc.contributor.author Cortès Sebastià, Guillem
dc.contributor.author Molina, Emilio
dc.contributor.author Serra, Joan
dc.contributor.author Serra, Xavier
dc.contributor.author Mitsufuji, Yuhki
dc.contributor.author Bogdanov, Dmitry
dc.date.accessioned 2025-09-05T06:26:40Z
dc.date.available 2025-09-05T06:26:40Z
dc.date.issued 2025
dc.description Paper presented at the 26th International Society for Music Information Retrieval Conference (ISMIR 2025), held in Daejeon (Korea) from 21 to 25 September 2025
dc.description.abstract Audio fingerprinting (AFP) allows the identification of unknown audio content by extracting compact representations, termed audio fingerprints, that are designed to remain robust against common audio degradations. Neural AFP methods often employ metric learning, where representation quality is influenced by the nature of the supervision and the utilized loss function. However, recent work unrealistically simulates real-life audio degradation during training, resulting in sub-optimal supervision. Additionally, although several modern metric learning approaches have been proposed, current neural AFP methods continue to rely on the NT-Xent loss without exploring recent advances or classical alternatives. In this work, we propose a series of best practices to enhance self-supervision by leveraging musical signal properties and realistic room acoustics. We then present the first systematic evaluation of various metric learning approaches in the context of AFP, demonstrating that a self-supervised adaptation of the triplet loss yields superior performance. Our results also reveal that training with multiple positive samples per anchor has critically different effects across loss functions. Our approach is built upon these insights and achieves state-of-the-art performance on both a large, synthetically degraded dataset and a real-world dataset recorded using microphones in diverse music venues.
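The abstract contrasts the NT-Xent loss used by current neural AFP methods with a self-supervised adaptation of the triplet loss. The sketch below illustrates the two generic objectives on a batch of (anchor, positive) fingerprint embeddings, where each positive is assumed to be a degraded view of the same audio segment. It is not the authors' implementation: the abstract does not specify their triplet adaptation, so the hardest-in-batch negative mining, the cosine distance, the temperature `tau = 0.05`, the `margin = 0.2`, and all function names are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm (a zero vector is returned as zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Dot product; equals cosine similarity when a and b are unit-norm."""
    return sum(x * y for x, y in zip(a, b))


def nt_xent(anchors: Sequence[Vector], positives: Sequence[Vector],
            tau: float = 0.05) -> float:
    """Generic NT-Xent over N (anchor, positive) pairs, i.e. 2N views.

    For each view, its partner view is the positive and the other 2N - 2
    views in the batch act as negatives. Returns the mean over all 2N views.
    """
    z = [l2_normalize(v) for v in list(anchors) + list(positives)]
    n = len(anchors)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the other view of the same segment
        logits = [cosine(z[i], z[k]) / tau for k in range(2 * n) if k != i]
        m = max(logits)  # log-sum-exp trick for numerical stability
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        total += lse - cosine(z[i], z[j]) / tau
    return total / (2 * n)


def triplet_hardest(anchors: Sequence[Vector], positives: Sequence[Vector],
                    margin: float = 0.2) -> float:
    """Hypothetical self-supervised triplet loss with in-batch negatives.

    The positive for anchor i is its own degraded view; the negative is the
    closest positive belonging to a different segment (hardest negative).
    Distance is cosine distance 1 - similarity. Requires at least 2 pairs.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    n = len(a)
    total = 0.0
    for i in range(n):
        d_pos = 1.0 - cosine(a[i], p[i])
        d_neg = min(1.0 - cosine(a[i], p[k]) for k in range(n) if k != i)
        total += max(0.0, d_pos - d_neg + margin)  # hinge on the margin
    return total / n
```

With perfectly matched pairs such as anchors `[[1, 0], [0, 1]]` and identical positives, both losses are close to zero; swapping the positives makes every anchor closer to a negative than to its own view, and both losses grow sharply. Training with multiple positives per anchor, which the abstract reports has critically different effects across loss functions, would require extending both functions and is not shown here.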
dc.description.sponsorship This work was supported by the pre-doctoral program AGAUR-FI ajuts (2024 FI-3 00065) Joan Oró, funded by the Secretaria d’Universitats i Recerca of the Departament de Recerca i Universitats of the Generalitat de Catalunya; and by the Cátedras ENIA program “IA y Música: Cátedra en Inteligencia Artificial y Música” (TSI-100929-2023-1), funded by the Secretaría de Estado de Digitalización e Inteligencia Artificial and the European Union Next Generation EU. This work was also part of the project TROBA: Technologies for the recognition of musical works in the era of dynamic generation of audio content (ACE014/20/000051), within the call Nuclis d’R+D 2024, with the support of ACCIÓ (Agency for Business Competitiveness, Government of Catalonia).
dc.format.mimetype application/pdf
dc.identifier.citation Araz RO, Cortès-Sebastià G, Molina E, Serra J, Serra X, Mitsufuji Y, Bogdanov D. Enhancing neural audio fingerprint robustness to audio degradation for music identification. Paper presented at: 26th International Society for Music Information Retrieval Conference (ISMIR 2025); 2025 Sep 21-25; Daejeon, Korea. 8p.
dc.identifier.uri http://hdl.handle.net/10230/71121
dc.language.iso eng
dc.publisher International Society for Music Information Retrieval (ISMIR)
dc.rights Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: R. O. Araz, G. Cortès-Sebastià, E. Molina, J. Serrà, X. Serra, Y. Mitsufuji, and D. Bogdanov, “Enhancing Neural Audio Fingerprint Robustness to Audio Degradation for Music Identification”, in Proc. of the 26th Int. Society for Music Information Retrieval Conf., Daejeon, South Korea, 2025.
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.rights.uri http://creativecommons.org/licenses/by/4.0/
dc.subject.keyword Neural audio fingerprint
dc.subject.keyword Music identification
dc.subject.keyword Audio robustness
dc.subject.keyword Audio degradation
dc.title Enhancing neural audio fingerprint robustness to audio degradation for music identification
dc.type info:eu-repo/semantics/conferenceObject
dc.type.version info:eu-repo/semantics/publishedVersion

Collections

Conferences (Department of Information and Communication Technologies)