Tensorflow audio models in Essentia

Citation

Alonso-Jiménez P, Bogdanov D, Pons J, Serra X. Tensorflow audio models in Essentia. In: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP); 2020 May 4-8; Barcelona, Spain. New Jersey: The Institute of Electrical and Electronics Engineers; 2020. p. 266-70. DOI: 10.1109/ICASSP40776.2020.9054688

Description

Abstract
Essentia is a reference open-source C++/Python library for audio and music analysis. In this work, we present a set of algorithms that employ TensorFlow in Essentia, allow predictions with pre-trained deep learning models, and are designed to offer flexibility of use, easy extensibility, and real-time inference. To show the potential of this new interface with TensorFlow, we provide a number of pre-trained state-of-the-art music tagging and classification CNN models. We run an extensive evaluation of the developed models. In particular, we assess their generalization capabilities in a cross-collection evaluation using both external tag datasets and manual annotations tailored to the taxonomies of our models.
Description
Paper presented at: ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, held online from 4 to 8 May 2020.
DOI
http://dx.doi.org/10.1109/ICASSP40776.2020.9054688
Collections
Conferences (Department of Information and Communication Technologies)