Music Information Retrieval is largely based on descriptors computed from audio signals, and in many practical applications they are to be computed on music corpora containing audio files encoded in a variety of lossy formats. Such encodings distort the original signal and therefore may affect the computation of descriptors. This raises the question of the robustness of these descriptors across various audio encodings. We examine this assumption for the case of MFCCs and chroma features. In particular, ...
Music Information Retrieval is largely based on descriptors computed from audio signals, and in many practical applications they are to be computed on music corpora containing audio files encoded in a variety of lossy formats. Such encodings distort the original signal and therefore may affect the computation of descriptors. This raises the question of the robustness of these descriptors across various audio encodings. We examine this assumption for the case of MFCCs and chroma features. In particular, we analyze their robustness to sampling rate, codec, bitrate, frame size and music genre. Using two different audio analysis tools over a diverse collection of music tracks, we compute several statistics to quantify the robustness of the resulting descriptors, and then estimate the practical effects for a sample task like genre classification.
+