Browsing by Author "Pons Puig, Jordi"

  • Rethage, Dario; Pons Puig, Jordi; Serra, Xavier (Institute of Electrical and Electronics Engineers (IEEE), 2018)
    Most speech processing techniques use magnitude spectrograms as front-end and are therefore by default discarding part of the signal: the phase. In order to overcome this limitation, we propose an end-to-end learning method ...
  • Gong, Rong; Pons Puig, Jordi; Serra, Xavier (International Society for Music Information Retrieval (ISMIR), 2017)
    We approach the singing phrase audio to score matching problem by using phonetic and duration information – with a focus on studying the jingju a cappella singing case. We argue that, due to the existence of a basic melodic ...
  • Pons Puig, Jordi (Universitat Pompeu Fabra, 2019-11-15)
    Automatic music and audio tagging can help increase the retrieval and re-use possibilities of many audio databases that remain poorly labeled. In this dissertation, we tackle the task of music and audio tagging from the ...
  • Pons Puig, Jordi; Serra, Xavier (Institute of Electrical and Electronics Engineers (IEEE), 2017)
    Many researchers use convolutional neural networks with small rectangular filters for music (spectrograms) classification. First, we discuss why there is no reason to use this filter setup by default and second, we point ...
  • Pons Puig, Jordi; Nieto Caballero, Oriol; Prockup, Matthew; Schmidt, Erik M.; Ehmann, Andreas F.; Serra, Xavier (International Society for Music Information Retrieval (ISMIR), 2018)
    The lack of data tends to limit the outcomes of deep learning research, particularly when dealing with end-to-end learning stacks processing raw data such as waveforms. In this study, 1.2M tracks annotated with musical ...
  • Pons Puig, Jordi; Nieto Caballero, Oriol; Prockup, Matthew; Schmidt, Erik M.; Ehmann, Andreas F.; Serra, Xavier (2017)
    The lack of data tends to limit the outcomes of deep learning research – especially when dealing with end-to-end learning stacks processing raw data such as waveforms. In this study we make use of musical labels annotated ...
  • Pons Puig, Jordi; Lidy, Thomas; Serra, Xavier (Institute of Electrical and Electronics Engineers (IEEE), 2016)
    A common criticism of deep learning relates to the difficulty in understanding the underlying relationships that the neural networks are learning, thus behaving like a black-box. In this article we explore various ...
  • Fonseca, Eduardo; Pons Puig, Jordi; Favory, Xavier; Font Corbera, Frederic; Bogdanov, Dmitry; Ferraro, Andrés; Oramas, Sergio; Porter, Alastair; Serra, Xavier (International Society for Music Information Retrieval (ISMIR), 2017)
    Openly available datasets are a key factor in the advancement of data-driven research approaches, including many of the ones used in sound and music computing. In the last few years, quite a number of new audio datasets ...
  • Fonseca, Eduardo; Plakal, Manoj; Font Corbera, Frederic; Ellis, Daniel P. W.; Favory, Xavier; Pons Puig, Jordi; Serra, Xavier (Tampere University of Technology, 2018)
    This paper describes Task 2 of the DCASE 2018 Challenge, titled “General-purpose audio tagging of Freesound content with AudioSet labels”. This task was hosted on the Kaggle platform as “Freesound General-Purpose Audio ...
  • Pons Puig, Jordi; Serra, Xavier (Institute of Electrical and Electronics Engineers (IEEE), 2018)
    The computer vision literature shows that randomly weighted neural networks perform reasonably as feature extractors. Following this idea, we study how non-trained (randomly weighted) convolutional neural networks perform ...
  • Pons Puig, Jordi; Janer Mestres, Jordi; Rode, Thilo; Nogueira, Waldo (Acoustical Society of America, 2016)
    Music perception remains rather poor for many Cochlear Implant (CI) users due to the users' deficient pitch perception. However, comprehensible vocals and simple music structures are well perceived by many CI users. In ...
  • Pons Puig, Jordi; Gong, Rong; Serra, Xavier (International Society for Music Information Retrieval (ISMIR), 2017)
    This paper introduces a new score-informed method for the segmentation of jingju a cappella singing phrase into syllables. The proposed method estimates the most likely sequence of syllable boundaries given the estimated ...
  • Alonso-Jiménez, Pablo; Bogdanov, Dmitry; Pons Puig, Jordi; Serra, Xavier (Institute of Electrical and Electronics Engineers (IEEE), 2020)
    Essentia is a reference open-source C++/Python library for audio and music analysis. In this work, we present a set of algorithms that employ TensorFlow in Essentia, allow predictions with pre-trained deep learning models, ...
  • Pons Puig, Jordi; Slizovskaia, Olga; Gómez Gutiérrez, Emilia, 1975-; Serra, Xavier (European Association for Signal Processing (EURASIP), 2017)
    The focus of this work is to study how to efficiently tailor Convolutional Neural Networks (CNNs) towards learning timbre representations from log-mel magnitude spectrograms. We first review the trends when designing ...
  • Pons Puig, Jordi; Serrà Julià, Joan; Serra, Xavier (Institute of Electrical and Electronics Engineers (IEEE), 2019)
    We investigate supervised learning strategies that improve the training of neural network audio classifiers on small annotated collections. In particular, we study whether (i) a naive regularization of the solution space, ...