Semi-supervised music tagging transformer

Won, Minz; Choi, Keunwoo; Serra, Xavier

dc.contributor.author Won, Minz
dc.contributor.author Choi, Keunwoo
dc.contributor.author Serra, Xavier
dc.date.accessioned 2025-05-28T06:06:12Z
dc.date.available 2025-05-28T06:06:12Z
dc.date.issued 2021
dc.description.abstract We present the Music Tagging Transformer, a model trained with a semi-supervised approach. The proposed model captures local acoustic characteristics in shallow convolutional layers, then temporally summarizes the sequence of extracted features using stacked self-attention layers. Through a careful model assessment, we first show that, under a supervised scheme, the proposed architecture outperforms the previous state-of-the-art music tagging models based on convolutional neural networks. The Music Tagging Transformer is further improved by noisy student training, a semi-supervised approach that leverages both labeled and unlabeled data combined with data augmentation. To the best of our knowledge, this is the first attempt to utilize the entire audio of the Million Song Dataset.
dc.format.mimetype application/pdf
dc.identifier.citation Won M, Choi K, Serra X. Semi-supervised music tagging transformer. In: Lee JH, Lerch A, Duan Z, Nam J, Rao P, Kranenburg PV, Srinivasamurthy A, editors. Proceedings 22nd International Society for Music Information Retrieval Conference (ISMIR 2021). [Canada]: International Society for Music Information Retrieval; 2021. p. 769-76.
dc.identifier.uri http://hdl.handle.net/10230/70540
dc.language.iso eng
dc.publisher International Society for Music Information Retrieval (ISMIR)
dc.rights © Minz Won, Keunwoo Choi, and Xavier Serra. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Minz Won, Keunwoo Choi, and Xavier Serra, “Semi-Supervised Music Tagging Transformer”, in Proc. of the 22nd Int. Society for Music Information Retrieval Conf., Online, 2021.
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.rights.uri http://creativecommons.org/licenses/by/4.0
dc.subject.keyword Sound
dc.subject.keyword Audio and speech processing
dc.title Semi-supervised music tagging transformer
dc.type info:eu-repo/semantics/conferenceObject
dc.type.version info:eu-repo/semantics/publishedVersion
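
The abstract describes noisy student training: a teacher is fitted on labeled data, it pseudo-labels the unlabeled data, and a student is then trained on both sets with data augmentation. The sketch below illustrates only that loop and is not the authors' implementation. A single-tag logistic model stands in for the Music Tagging Transformer, additive Gaussian noise stands in for audio augmentation, and all function names, hyperparameters, and data points are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, features):
    """Tag probability from a toy logistic model (stand-in for the tagger)."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def augment(features, rng, noise=0.1):
    """Toy augmentation (assumption): add small Gaussian noise to each feature."""
    return [x + rng.gauss(0.0, noise) for x in features]


def train(examples, dim, rng, epochs=200, lr=0.5, noisy=False):
    """Fit the logistic model by SGD on (features, target) pairs.

    Targets may be hard labels (0.0 or 1.0) or soft pseudo-labels in [0, 1].
    When noisy=True, every input is augmented before the update.
    """
    weights = [0.0] * dim
    for _ in range(epochs):
        for features, target in examples:
            x = augment(features, rng) if noisy else features
            error = predict(weights, x) - target  # gradient of cross-entropy w.r.t. the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return weights


def noisy_student_round(labeled, unlabeled, dim, rng):
    """One round: supervised teacher, soft pseudo-labels, noised student."""
    teacher = train(labeled, dim, rng, noisy=False)
    pseudo = [(x, predict(teacher, x)) for x in unlabeled]
    student = train(labeled + pseudo, dim, rng, noisy=True)
    return teacher, student


# Hypothetical data: the first feature is a constant bias term.
rng = random.Random(0)
labeled = [([1.0, 1.0], 1.0), ([1.0, -1.0], 0.0)]
unlabeled = [[1.0, 0.8], [1.0, -0.9], [1.0, 1.2], [1.0, -1.1]]
teacher, student = noisy_student_round(labeled, unlabeled, dim=2, rng=rng)
```

In this sketch the student sees augmented inputs while the teacher labels clean ones, which is the asymmetry that noisy student training relies on. The round can be repeated by using the student as the next teacher.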

Collections

Conferences (Department of Information and Communication Technologies)