Topic detection using the DBSCAN-Martingale and the time operator

Citation

  • Gialampoukidis I, Vrochidis S, Kompatsiaris I, Antoniou I. Topic detection using the DBSCAN-Martingale and the time operator. In: Skiadas CH, editor. Proceedings of the 17th Applied Stochastic Models and Data Analysis International Conference with the 6th Demographics Workshop ASMDA2017; 2017 June 6-9; London, UK. [London]: ISAST; 2017. p. 387-95.

Description

  • Abstract

    Topic detection is usually considered a decision process implemented in some relevant context, for example clustering. In this case, clusters correspond to topics that should be identified. Density-based clustering, for example, uses only a density level ε and a lower bound for the number of points in a cluster. As the density level is hard to estimate, a stochastic process, called the DBSCAN-Martingale, is constructed to combine several outputs of DBSCAN for various values of ε selected at random from the uniform distribution on a predefined closed interval [0, ε_max]. We have observed that most of the clusters are extracted in the interval [0, ε_max/2], and moreover that in the interval [ε_max/2, ε_max] the DBSCAN-Martingale stochastic process is less innovative, i.e. it extracts only a few clusters or none. Therefore, non-symmetric skewed distributions are needed to generate density levels so that all clusters are extracted quickly. In this work we show that skewed distributions may be used instead of the uniform, so as to extract all clusters as quickly as possible. Experiments on real datasets show that the average innovation time of the DBSCAN-Martingale stochastic process is reduced when skewed distributions are employed, so less time is needed to extract all clusters.
  • Description

    Paper presented at: The 17th Conference of the Applied Stochastic Models and Data Analysis (ASMDA), held 6-9 June 2017 in London, United Kingdom.
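
The sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the upper bound (called `eps_max` here), and the Beta(2, 5) distribution are assumptions chosen only to show how a right-skewed distribution places more of the randomly drawn density levels in the lower half of the interval than the uniform distribution does. The clustering step itself (running DBSCAN at each level) is left out.

```python
import random


def sample_density_levels(eps_max, n, skewed=False, seed=0):
    """Draw n density levels in [0, eps_max], returned in increasing order."""
    rng = random.Random(seed)
    if skewed:
        # Beta(2, 5) is an illustrative right-skewed choice (an assumption,
        # not taken from the paper); it is rescaled to [0, eps_max].
        draws = [eps_max * rng.betavariate(2.0, 5.0) for _ in range(n)]
    else:
        draws = [rng.uniform(0.0, eps_max) for _ in range(n)]
    return sorted(draws)


def fraction_in_lower_half(levels, eps_max):
    """Share of the sampled levels that fall in [0, eps_max / 2]."""
    return sum(1 for e in levels if e <= eps_max / 2) / len(levels)


uniform_levels = sample_density_levels(1.0, 10_000)
skewed_levels = sample_density_levels(1.0, 10_000, skewed=True)

print("uniform:", fraction_in_lower_half(uniform_levels, 1.0))
print("skewed: ", fraction_in_lower_half(skewed_levels, 1.0))
```

Under these assumptions, roughly half of the uniform draws land in the lower half of the interval, while the skewed draws concentrate there, which is the region where the abstract reports most clusters are extracted.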