Won, Minz; Chun, Sanghyuk; Nieto Caballero, Oriol
2020-04-20
Won M, Chun S, Nieto O, Serra X. Data-driven harmonic filters for audio representation learning. In: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing Proceedings; 2020 May 4-8; Barcelona, Spain. [New York]: IEEE; 2020. p. 536-40.

We introduce a trainable front-end module for audio representation learning that exploits the inherent harmonic structure of audio signals. The proposed architecture, composed of a set of filters, compels the subsequent network to capture harmonic relations while preserving spectro-temporal locality. Because harmonic structure is known to play a key role in human auditory perception, these harmonic filters can be expected to yield more efficient audio representations. Experimental results show that a simple convolutional neural network back-end with the proposed front-end outperforms state-of-the-art baseline methods in automatic music tagging, keyword spotting, and sound event tagging tasks.
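To make the idea of a harmonic filter front-end more concrete, the following is a minimal sketch and not the authors' implementation. It assumes triangular band-pass filters centered at integer multiples of a set of fundamental frequencies, with a bandwidth proportional to the center frequency. The function names (`harmonic_filterbank`, `harmonic_tensor`), the parameter `q`, and all numeric values are hypothetical choices made for illustration; in a trainable front-end, a quantity such as `q` would be learned rather than fixed.

```python
import numpy as np

def harmonic_filterbank(freqs, f0s, n_harmonics, q=2.0):
    """Triangular band-pass filters centered at n * f0 for n = 1..n_harmonics.

    Returns an array of shape (n_harmonics, len(f0s), len(freqs)).
    Assumption: bandwidth = center / q (q is fixed here, not learned).
    """
    freqs = np.asarray(freqs, dtype=float)
    f0s = np.asarray(f0s, dtype=float)
    n = np.arange(1, n_harmonics + 1, dtype=float)
    centers = n[:, None] * f0s[None, :]        # (H, F) center frequencies
    bandwidths = centers / q                   # (H, F) full filter widths
    # Distance of every FFT bin frequency from every filter center: (H, F, B)
    dist = np.abs(freqs[None, None, :] - centers[:, :, None])
    # Triangle: 1 at the center, falling linearly to 0 at center +/- bandwidth / 2
    return np.clip(1.0 - 2.0 * dist / bandwidths[:, :, None], 0.0, None)

def harmonic_tensor(spectrogram, fb):
    """Apply the filterbank to a (n_bins, n_frames) magnitude spectrogram.

    Returns a tensor of shape (n_harmonics, n_f0s, n_frames), so harmonics
    of the same fundamental are aligned along the first axis.
    """
    return np.einsum("hfb,bt->hft", fb, spectrogram)

# Illustrative usage with made-up settings and random input
sr, n_fft = 16000, 512
freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)   # FFT bin frequencies in Hz
f0s = np.array([110.0, 220.0, 440.0])              # hypothetical fundamentals
fb = harmonic_filterbank(freqs, f0s, n_harmonics=4)
spec = np.abs(np.random.default_rng(0).standard_normal((freqs.size, 10)))
out = harmonic_tensor(spec, fb)
print(fb.shape, out.shape)
```

Stacking the harmonics along their own axis is one way a subsequent convolutional network could see harmonically related bands as neighboring channels while the frequency and time axes keep their locality.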