In this paper, we present a framework for topic
detection in news articles. The framework receives as input the
results retrieved from a query-based search and clusters them by
topic. To this end, it combines the recently introduced “DBSCAN-Martingale”
method, which automatically estimates the number of topics, with the
well-established Latent Dirichlet Allocation topic modelling
approach, which assigns news articles to topics of interest.
Furthermore, the proposed query-based topic detection
framework works on high-level textual features (such as concepts
and named entities) that are extracted from news articles. We treat
topic detection as a text clustering task in which the number of
clusters is not known in advance. The approach compares favorably to
several text clustering approaches on a public dataset of retrieved
results for four representative queries.
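
To illustrate how the number of topics can be estimated before any topic model is fitted, the following Python sketch runs DBSCAN at several randomly drawn density levels and takes a majority vote over the resulting cluster counts. It is a simplified sketch in the spirit of DBSCAN-Martingale, not the authors' implementation: the function names, the parameters (`eps_max`, `min_pts`, `levels`, `realizations`), the Euclidean distance, and the toy 2-D points are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point (-1 marks noise)."""
    n = len(points)

    def neighbors(i):
        # A point counts as its own neighbor (distance 0).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nb)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbj = neighbors(j)
            if len(nbj) >= min_pts:  # only core points expand the cluster
                queue.extend(nbj)
    return labels


def estimate_num_clusters(points, eps_max, min_pts,
                          levels=5, realizations=20, seed=0):
    """Majority vote over cluster counts from multi-density DBSCAN runs."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(realizations):
        # Hypothetical choice: density levels drawn uniformly, then sorted.
        eps_values = sorted(rng.uniform(0, eps_max) for _ in range(levels))
        assigned = [False] * len(points)
        k = 0
        for eps in eps_values:
            labels = dbscan(points, eps, min_pts)
            for c in sorted(set(labels) - {-1}):
                members = [i for i, lab in enumerate(labels) if lab == c]
                # Keep a cluster only if none of its points was clustered
                # at an earlier (smaller) density level.
                if not any(assigned[i] for i in members):
                    for i in members:
                        assigned[i] = True
                    k += 1
        votes[k] += 1
    return votes.most_common(1)[0][0]


# Toy data (assumed): three well-separated groups of six points each.
offsets = [(0.0, 0.0), (0.3, 0.0), (0.6, 0.0), (0.0, 0.3), (0.3, 0.3), (0.6, 0.3)]
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]
k = estimate_num_clusters(points, eps_max=2.0, min_pts=3)
```

In the framework described above, an estimate of this kind would be passed as the number of topics to Latent Dirichlet Allocation, which then assigns each article to a topic.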