In settings wherein discussion topics are not statically assigned, such as in microblogs, a need exists for identifying and separating topics of a given event. We approach the problem by using a novel type of similarity, calculated between the major terms used in posts. The occurrences of such terms are periodically sampled from the posts stream. The generated temporal series are processed by using marker-based stigmergy, i.e., a biologically-inspired mechanism performing scalar and temporal information aggregation. More precisely, each sample of the series generates a functional structure, called mark, associated with some concentration. The concentrations disperse in a scalar space and evaporate over time. Multiple deposits, when samples are close in terms of instants of time and values, aggregate in a trail and then persist longer than an isolated mark. To measure similarity between time series, the Jaccard’s similarity coefficient between trails is calculated. Discussion topics are generated by such similarity measure in a clustering process using Self-Organizing Maps, and are represented via a colored term cloud. Structural parameters are correctly tuned via an adaptation mechanism based on Differential Evolution. Experiments are completed for a real-world scenario, and the resulting similarity is compared with Dynamic Time Warping (DTW) similarity.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited