Efficient document filtering using vector space topic expansion and pattern-mining
Citation
- Proskurnia J, Mavlyutov R, Castillo C, Aberer K, Cudre-Mauroux P. Efficient document filtering using vector space topic expansion and pattern-mining: the case of event detection in microposts. In: CIKM '17 Proceedings of the 2017 ACM on Conference on Information and Knowledge Management; 2017 Nov 6-10; Singapore, Singapore. New York: ACM; 2017. p. 457-66. DOI: 10.1145/3132847.3133016
Description
Abstract
Automatically extracting information from social media is challenging given that social content is often noisy, ambiguous, and inconsistent. However, as many stories break on social channels first before being picked up by mainstream media, developing methods to better handle social content is of utmost importance. In this paper, we propose a robust and effective approach to automatically identify microposts related to a specific topic defined by a small sample of reference documents. Our framework extracts clusters of semantically similar microposts that overlap with the reference documents, by extracting combinations of key features that define those clusters through frequent pattern mining. This allows us to construct compact and interpretable representations of the topic, dramatically decreasing the computational burden compared to classical clustering and k-NN-based machine learning techniques and producing highly competitive results even with small training sets (fewer than 1,000 training objects). Our method is efficient and scales gracefully with large sets of incoming microposts. We experimentally validate our approach on a large corpus of over 60M microposts, showing that it significantly outperforms state-of-the-art techniques.
Paper presented at: CIKM '17, the 2017 ACM on Conference on Information and Knowledge Management, held November 6-10, 2017, in Singapore, Singapore.
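Illustrative sketch
The abstract describes representing a topic by combinations of key features mined from a small set of reference documents, then using those combinations to filter incoming microposts. The Python sketch below illustrates only that general idea and is not the authors' implementation: it omits the vector space topic expansion and clustering steps entirely, and the names `tokenize`, `mine_patterns`, and `matches`, the whitespace tokenizer, and the `min_support`, `min_size`, and `max_size` parameters are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase, split on whitespace, and drop very short tokens (assumed preprocessing)."""
    return frozenset(w for w in text.lower().split() if len(w) > 2)


def mine_patterns(reference_docs, min_support=2, min_size=2, max_size=3):
    """Return term combinations that occur in at least `min_support` reference documents.

    This is a brute-force frequent-itemset count, adequate only for short
    texts and small reference sets.
    """
    counts = Counter()
    for doc in reference_docs:
        terms = sorted(tokenize(doc))
        for size in range(min_size, max_size + 1):
            # Each combination is counted at most once per document.
            counts.update(combinations(terms, size))
    return {frozenset(p) for p, c in counts.items() if c >= min_support}


def matches(post, patterns):
    """A post is kept if it contains every term of at least one mined pattern."""
    terms = tokenize(post)
    return any(pattern <= terms for pattern in patterns)


# Hypothetical reference documents defining the topic.
reference = [
    "earthquake hits city center",
    "strong earthquake hits coast",
    "earthquake damage reported city",
]
patterns = mine_patterns(reference)

incoming = ["Earthquake hits downtown now", "city traffic is slow today"]
related = [post for post in incoming if matches(post, patterns)]
```

Because each incoming post is checked against a small set of term combinations rather than compared with every reference document, the per-post cost stays low, which is in the spirit of the efficiency argument made in the abstract.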