Text embeddings for modeling the evolution of online discussions

Description

  • Abstract

    The rise of online discussion platforms has transformed the way people communicate, exchange ideas, and engage with information. From social media platforms like Reddit, Twitter, and Facebook to specialized forums and community websites, online discussions play a crucial role in shaping public opinion, disseminating information, and fostering community interaction. Understanding the dynamics of online discussion threads is essential for studying user behavior, analyzing information flow, and improving platform design. This study employs sentence embedding methods, such as Sentence-BERT (SBERT), A Lite BERT (ALBERT), SimCSE, and Universal Sentence Encoder (USE), to represent the text of comments. These methods capture sentence-level meaning and provide scalable, high-quality embeddings. We model how the structure of a conversation evolves over time using the multinomial logit formulation of a popular growing-tree generative model introduced in [1]. The original model assumes that the growth of a discussion depends on the interaction between three comment features: popularity, novelty, and the root bias of the first post, which initiates the discussion. The main contribution of this work is a new feature that accounts for the textual content of users' comments: we compute sentence embeddings and calculate the cosine similarity between parent-child comment pairs. The relevance of these four features is estimated by maximum likelihood, which lets us study the differences between the newly defined multinomial logit model and the original one [1], with the aim of obtaining robust models that account for semantic relatedness. The multinomial logit model is well suited to this task: it offers a flexible framework for expressing different aspects of online discussions and uncovering underlying patterns and relationships, and it is simple, efficient, and easy to train on datasets of various sizes. However, it may be limited in capturing complex relationships between words and their meanings.
  • Description

    Master's thesis for: Master in Intelligent Interactive Systems
    Supervisor: Vicenç Gómez
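The abstract describes two computational steps: the cosine similarity between parent-child comment pairs, and a multinomial logit model whose feature weights are estimated by maximum likelihood. The sketch below illustrates both in plain Python. It is not the thesis's implementation: the three-dimensional vectors stand in for real sentence embeddings (SBERT, USE, and so on), and the feature layout `[popularity, novelty, is_root, similarity]`, the feature values, and the weights are hypothetical choices made only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def choice_probabilities(features, weights):
    """Multinomial logit: softmax over linear utilities, one row per candidate parent."""
    utilities = [sum(w * x for w, x in zip(weights, row)) for row in features]
    shift = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - shift) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(features, weights, chosen):
    """Log-probability that the new comment attaches to the observed parent."""
    return math.log(choice_probabilities(features, weights)[chosen])


# Toy embeddings (assumption: stand-ins for real sentence embeddings).
new_comment = [0.9, 0.1, 0.3]
candidate_parents = [
    # (popularity, novelty, is_root, embedding); values are made up
    (3.0, 0.2, 1.0, [0.8, 0.2, 0.1]),  # root post
    (1.0, 0.5, 0.0, [0.1, 0.9, 0.2]),  # earlier reply
    (0.0, 1.0, 0.0, [0.9, 0.0, 0.4]),  # most recent reply
]

# One feature row per candidate parent, with the similarity as the fourth feature.
features = [
    [pop, nov, root, cosine_similarity(new_comment, emb)]
    for pop, nov, root, emb in candidate_parents
]

weights = [0.4, 0.8, 0.5, 1.5]  # hypothetical; in practice fitted by maximum likelihood
probs = choice_probabilities(features, weights)

for index, p in enumerate(probs):
    print(f"candidate {index}: P(parent) = {p:.3f}")
print("log-likelihood if candidate 2 was chosen:", log_likelihood(features, weights, 2))
```

Fitting the model would amount to summing `log_likelihood` over every observed reply in a dataset and maximizing that sum with respect to `weights`, for example with a general-purpose numerical optimizer. Comparing the fitted likelihood with and without the similarity column is one way to assess whether the content feature adds explanatory power.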