Music recommendation systems are commonly used for
personalized recommendations. However, there are cases
where, due to privacy concerns or design decisions, neither
user information nor collaborative-filtering data is available. In those cases, content-based similarity spaces can be used to retrieve the tracks most similar to a reference track and recommend them. In this paper, we
compare the latent spaces extracted from state-of-the-art
autotagging models in terms of the similarity between lists
of retrieved nearest neighbors. We additionally study item
factors from collaborative-filtering data as a reference. We
provide insights into how much the choice of architecture, training dataset, or model layer (output vs. penultimate), as well as a projection of the latent space onto 2D,
changes the list of retrieved nearest neighbors. We release
the dataset of 9 content-based and 3 collaborative-filtering
latent representations of 29 275 tracks from Jamendo that
we use for the evaluation. Moreover, we perform an online
user experiment to compare the perceived track-to-track
similarity of the selected latent spaces. The results show that content-based spaces perform better
in our scenario, particularly embeddings from penultimate
layers of autotagging architectures.
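
The comparison of retrieved nearest-neighbor lists described above can be illustrated with a minimal sketch. It assumes cosine similarity for retrieval and a simple top-k overlap ratio as the agreement measure; neither choice is stated in this abstract, and the function names, track ids, and toy vectors are hypothetical. Pure Python is used for readability only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_neighbors(space, query, k):
    """Return the ids of the k tracks most similar to `query`.

    `space` maps a track id to its latent vector (assumed layout).
    """
    scored = [
        (cosine(space[query], vec), track_id)
        for track_id, vec in space.items()
        if track_id != query  # a track is not its own neighbor
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [track_id for _, track_id in scored[:k]]


def neighbor_overlap(space_a, space_b, query, k):
    """Fraction of the top-k neighbors of `query` shared by both spaces."""
    a = set(top_k_neighbors(space_a, query, k))
    b = set(top_k_neighbors(space_b, query, k))
    return len(a & b) / k


# Toy latent spaces over the same three hypothetical tracks.
space_a = {"t1": [1.0, 0.0], "t2": [0.9, 0.1], "t3": [0.0, 1.0]}
space_b = {"t1": [1.0, 0.0], "t2": [0.0, 1.0], "t3": [0.9, 0.1]}
print(neighbor_overlap(space_a, space_b, "t1", k=1))
```

Averaging this overlap over all reference tracks would give one number per pair of latent spaces, which is one plausible way to quantify how much a change of architecture, dataset, or layer alters the retrieved lists.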