Biases on social media data: (keynote extended abstract)

Citation

  • Baeza-Yates R. Biases on social media data: (keynote extended abstract). In: Seghrouchni AEF, Sukthankar G, Liu TY, van Steen M. WWW '20: Companion Proceedings of the Web Conference; 2020 Apr 20-24; Taipei, Taiwan. New York: Association for Computing Machinery; 2020. p. 782-83. DOI: 10.1145/3366424.3383564

Permanent link

Description

  • Abstract

    Social media data is often used to take the pulse of opinion in online communities, by predicting sentiment or stances (e.g., political), to mention just two typical use cases. However, those analyses assume that the data samples really represent the underlying demographics of the overall community, both in number and in characteristics, which in most cases is not true. As a result, extrapolating these results to larger populations usually does not work. This happens because social media data is inherently biased, mainly due to two facts: (1) not all people are equally active on social media platforms, and most of them are largely passive; and (2) there are demographic biases in gender and age, among other attributes. Hence, the questions of how representative the data is and whether it is possible to make it representative are of crucial importance. We also discuss related issues such as using public samples of mostly private platforms as well as typical errors in the analysis of such data.
  • Description

    Paper presented at WWW'20: International World Wide Web Conference, held 20-24 April 2020 in Taipei, Taiwan.