A Simple fusion of deep and shallow learning for acoustic scene classification

Citation

  • Fonseca E, Gong R, Serra X. A Simple fusion of deep and shallow learning for acoustic scene classification. In: Georgaki A, Andreopoulou A, editors. Proceedings of the 15th Sound and Music Computing Conference (SMC2018). Sonic crossing; 2018 Jul 4-7; Limassol, Cyprus. Limassol: Cyprus University of Technology; 2018. p. 265-72. DOI: 10.5281/zenodo.1422583

Description

  • Abstract

    In the past, Acoustic Scene Classification systems have been based on hand-crafting audio features that are input to a classifier. Nowadays, the common trend is to adopt data-driven techniques, e.g., deep learning, where audio representations are learned from data. In this paper, we propose a system that consists of a simple fusion of two methods of the aforementioned types: a deep learning approach where log-scaled mel-spectrograms are input to a convolutional neural network, and a feature engineering approach, where a collection of hand-crafted features is input to a gradient boosting machine. We first show that both methods provide complementary information to some extent. Then, we use a simple late fusion strategy to combine both methods. We report classification accuracy of each method individually and the combined system on the TUT Acoustic Scenes 2017 dataset. The proposed fused system outperforms each of the individual methods and attains a classification accuracy of 72.8% on the evaluation set, improving the baseline system by 11.8%.
  • Description

    Paper presented at: 15th Sound and Music Computing Conference (SMC2018). Sonic crossing, held in Limassol, Cyprus, 4-7 July 2018.
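
The abstract describes a late fusion of two classifiers but this record does not give the fusion rule. As a rough illustration only, the sketch below assumes that each model outputs per-class probabilities for every audio clip and that fusion is a weighted average of those probabilities followed by an argmax. The function name `late_fusion`, the `weight` parameter, and the toy scores are hypothetical and are not taken from the paper.

```python
import numpy as np

def late_fusion(p_cnn, p_gbm, weight=0.5):
    """Fuse two (n_clips, n_classes) probability arrays by weighted averaging.

    Assumption: this averaging rule is illustrative; the paper's exact
    fusion strategy is not stated in this record.
    """
    p_cnn = np.asarray(p_cnn, dtype=float)
    p_gbm = np.asarray(p_gbm, dtype=float)
    if p_cnn.shape != p_gbm.shape:
        raise ValueError("both models must score the same clips and classes")
    fused = weight * p_cnn + (1.0 - weight) * p_gbm
    # Predicted class per clip is the one with the highest fused probability.
    return fused.argmax(axis=1), fused

# Toy scores for 2 clips and 3 scene classes (made-up numbers).
p_cnn = [[0.6, 0.3, 0.1], [0.4, 0.5, 0.1]]  # stand-in for CNN outputs
p_gbm = [[0.5, 0.4, 0.1], [0.7, 0.2, 0.1]]  # stand-in for GBM outputs
labels, fused = late_fusion(p_cnn, p_gbm)
print(labels)
```

In the toy data the two models disagree on the second clip, and the averaged scores follow the more confident model, which is one way complementary predictions can be combined.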