Informes (Departament de Tecnologies de la Informació i les Comunicacions)
http://hdl.handle.net/10230/20995

Leveraging pre-trained autoencoders for interpretable prototype learning of music audio
http://hdl.handle.net/10230/59220
Alonso Jiménez, Pablo; Pepino, Leonardo; Batlle-Roca, Roser; Zinemanas, Pablo; Bogdanov, Dmitry; Serra, Xavier; Rocamora, Martín
We present PECMAE, an interpretable model for music audio classification based on prototype learning. Our model builds on a previous method, APNet, which jointly learns an autoencoder and a prototypical network. Instead, we propose to decouple the two training processes. This enables us to leverage existing self-supervised autoencoders pre-trained on much larger data (EnCodecMAE), providing representations with better generalization. APNet supports the reconstruction of prototypes to waveforms for interpretability, but it relies on the nearest training data samples to do so. In contrast, we explore using a diffusion decoder that allows reconstruction without such dependency. We evaluate our method on datasets for music instrument classification (Medley-Solos-DB) and genre recognition (GTZAN and a larger in-house dataset), the latter being a more challenging task not previously addressed with prototypical networks. We find that the prototype-based models preserve most of the performance achieved with the autoencoder embeddings, while the sonification of prototypes aids understanding of the classifier's behavior.
This work has been accepted at the ICASSP Workshop on Explainable AI for Speech and Audio (XAI-SA) in Seoul, Korea. April 15, 2024.
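As a concrete illustration of the decoupled setup described in the abstract, the following minimal PyTorch sketch classifies pre-computed encoder embeddings by their distance to learnable class prototypes. The single-prototype-per-class design and all names are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch: prototype-based classification on top of a frozen,
# pre-trained encoder (names and design choices are illustrative assumptions).
import torch
import torch.nn as nn

class PrototypeClassifier(nn.Module):
    """One learnable prototype per class in the encoder's embedding space."""

    def __init__(self, emb_dim: int, n_classes: int):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_classes, emb_dim))

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        # emb: (batch, emb_dim) embeddings from the frozen encoder.
        # Negative Euclidean distance acts as the class logit: the closer an
        # embedding is to a prototype, the higher that class's score.
        return -torch.cdist(emb, self.prototypes)

# Training updates only the prototypes; the encoder stays frozen, so large
# pre-trained models can be reused. After training, each prototype vector can
# be passed to a decoder and sonified, which is what makes the classifier's
# decisions inspectable.
```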
Computers in education: how can we support the teachers?
http://hdl.handle.net/10230/58657
Hernández Leo, Davinia
Paper presented at the 31st International Conference on Computers in Education (ICCE), held December 4-8, 2023, in Matsue, Japan.
Completing audio drum loops with symbolic drum suggestions
http://hdl.handle.net/10230/58318
Haki, Behzad; Pelinski, Teresa; Nieto, Marina; Jordà Puig, Sergi
Sampled drums can be used as an affordable way of creating human-like drum tracks or, perhaps more interestingly, as a means of experimentation with rhythm and groove. Similarly, AI-based drum generation tools can focus on creating human-like drum patterns or, alternatively, on providing producers and musicians with means of experimentation with rhythm. In this work, we aimed to explore the latter approach. To this end, we present a suite of Transformer-based models aimed at completing audio drum loops with stylistically consistent symbolic drum events. Our proposed models rely on a reduced spectral representation of the drum loop, striking a balance between a raw audio recording and an exact symbolic transcription. Using a number of objective evaluations, we explore the validity of our approach and identify several challenges that need to be further studied in future iterations of this work. Lastly, we provide a real-time VST plugin that allows musicians and producers to use the models in real-time production settings.
This work has been presented at NIME'23 in Mexico City, Mexico. 31 May - 2 June, 2023.
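For intuition, here is a hedged Python sketch (using librosa) of what a "reduced spectral representation" could look like: a handful of mel bands max-pooled onto a fixed step grid, coarser than raw audio but looser than an exact transcription. The band count, grid size, and pooling scheme are assumptions for illustration, not the paper's exact front end.

```python
# Illustrative reduction of an audio drum loop to a coarse (steps x bands)
# representation a Transformer could consume; all parameters are assumptions.
import librosa
import numpy as np

def reduced_spectral_rep(path: str, n_bands: int = 8, steps: int = 32) -> np.ndarray:
    y, sr = librosa.load(path, sr=44100, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_bands)
    mel_db = librosa.power_to_db(mel, ref=np.max)           # (n_bands, frames)
    # Max-pool the frames onto a fixed grid of `steps` time steps, preserving
    # onset peaks while discarding exact timing and instrument identity.
    edges = np.linspace(0, mel_db.shape[1], steps + 1, dtype=int)
    return np.stack([
        mel_db[:, a:max(b, a + 1)].max(axis=1)
        for a, b in zip(edges[:-1], edges[1:])
    ])                                                       # (steps, n_bands)
```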
Carnatic singing voice separation using cold diffusion on training data with bleeding
http://hdl.handle.net/10230/58188
Plaja-Roglans, Genís; Miron, Marius; Shankar, Adithi; Serra, Xavier
Supervised music source separation systems using deep learning are trained by minimizing a loss function between pairs of predicted separations and ground-truth isolated sources. However, open datasets comprising isolated sources are few, small, and restricted to a few music styles. At the same time, multi-track datasets with source bleeding are usually larger and easier to compile. In this work, we address the task of singing voice separation when the ground-truth signals have bleeding and only the target vocals and the corresponding mixture are available. We train a cold diffusion model in the frequency domain to iteratively transform a mixture into the corresponding vocals with bleeding. Next, we build the final separation masks by clustering spectrogram bins according to their evolution along the transformation steps. We test our approach on a Carnatic music scenario, for which only datasets with bleeding exist, while current research on this repertoire commonly uses source separation models trained solely on Western commercial music. Our evaluation on a Carnatic test set shows that our system outperforms Spleeter on interference removal and is competitive in terms of signal distortion. Code is open sourced.
This work has been accepted at the 24th International Society for Music Information Retrieval Conference (ISMIR 2023) in Milan, Italy. October 5-9, 2023.
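The mask-building step lends itself to a short sketch: treat each time-frequency bin's magnitude trajectory across the transformation steps as a feature vector and cluster the trajectories. The two-cluster choice and the "keep the cluster with the most final-step energy" rule below are illustrative assumptions, not the paper's exact criterion.

```python
# Hedged sketch: build a binary separation mask by clustering spectrogram-bin
# trajectories along the diffusion steps (assumptions: 2 clusters, final-step
# energy as the selection rule).
import numpy as np
from sklearn.cluster import KMeans

def separation_mask(step_mags: np.ndarray, n_clusters: int = 2) -> np.ndarray:
    """step_mags: (n_steps, freq, time) magnitudes, one per transformation step.
    Returns a (freq, time) binary mask for the target (vocal) source."""
    n_steps, f, t = step_mags.shape
    traj = step_mags.reshape(n_steps, -1).T                 # (f*t, n_steps)
    # Normalize each trajectory so clustering follows its shape, not its level.
    traj = traj / (traj.max(axis=1, keepdims=True) + 1e-8)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(traj)
    # Keep the cluster whose bins retain the most energy at the final step,
    # assuming target bins survive the mixture-to-vocals transformation.
    final = step_mags[-1].reshape(-1)
    keep = max(range(n_clusters), key=lambda k: float(final[labels == k].mean()))
    return (labels == keep).reshape(f, t).astype(np.float32)
```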