Open Access Article

Estimating Topic Modeling Performance with Sharma–Mittal Entropy

St. Petersburg School of Physics, Mathematics, and Computer Science, National Research University Higher School of Economics, Kantemirovskaya Ulitsa, 3A, St. Petersburg 194100, Russia
* Author to whom correspondence should be addressed.
Entropy 2019, 21(7), 660; https://doi.org/10.3390/e21070660
Received: 26 April 2019 / Revised: 27 June 2019 / Accepted: 3 July 2019 / Published: 5 July 2019
(This article belongs to the Special Issue Information-Theoretical Methods in Data Mining)
PDF [2239 KB, uploaded 5 July 2019]

Abstract

Topic modeling is a popular approach to clustering text documents. However, current tools suffer from a number of unsolved problems, such as instability and the lack of criteria for selecting the values of model parameters. In this work, we propose a method that partially solves the problem of optimizing model parameters while simultaneously accounting for semantic stability. Our method is inspired by concepts from statistical physics and is based on Sharma–Mittal entropy. We test our approach on two models, probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA) with Gibbs sampling, and on two datasets in different languages. We compare our approach against a number of standard metrics, each of which can account for only one of the parameters of interest. We demonstrate that Sharma–Mittal entropy is a convenient tool for selecting both the number of topics and the values of hyper-parameters while simultaneously controlling for semantic stability, which none of the existing metrics can do. Furthermore, we show that concepts from statistical physics can contribute to theory construction for machine learning, a rapidly developing field that currently lacks a consistent theoretical foundation.
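The abstract refers to the two-parameter Sharma–Mittal entropy without stating it. For orientation, the sketch below computes it for a discrete distribution such as a topic's word distribution; the function name, the toy distribution, and the numerical tolerances are illustrative assumptions, not the authors' implementation. The family reduces to Rényi entropy as r → 1, to Tsallis entropy at r = q, and to Shannon entropy as both tend to 1.

    import numpy as np

    def sharma_mittal_entropy(p, q, r, eps=1e-12):
        # Two-parameter Sharma-Mittal entropy of a discrete distribution p:
        #   H_{q,r}(p) = [ (sum_i p_i^q)^((1-r)/(1-q)) - 1 ] / (1 - r)
        # Limits: r -> 1 gives Renyi entropy, r = q gives Tsallis entropy,
        # and q, r -> 1 gives Shannon entropy.
        p = np.asarray(p, dtype=float)
        p = p[p > eps]  # drop zero-probability entries
        if abs(q - 1.0) < eps:
            shannon = -np.sum(p * np.log(p))
            if abs(r - 1.0) < eps:
                return shannon
            return (np.exp((1.0 - r) * shannon) - 1.0) / (1.0 - r)
        s = np.sum(p ** q)
        if abs(r - 1.0) < eps:  # Renyi limit
            return np.log(s) / (1.0 - q)
        return (s ** ((1.0 - r) / (1.0 - q)) - 1.0) / (1.0 - r)

    # Sanity check: for a uniform distribution over n outcomes, the Shannon
    # limit should equal log(n).
    uniform = np.full(8, 1.0 / 8.0)
    print(sharma_mittal_entropy(uniform, q=1.0, r=1.0))  # ~ log(8) = 2.079
    print(sharma_mittal_entropy(uniform, q=2.0, r=0.5))  # Sharma-Mittal value

In the paper's setting, p would be a distribution produced by pLSA or LDA (for example, over the words of a topic), and the behavior of the entropy across different numbers of topics and hyper-parameter values is what gets inspected; that usage is inferred from the abstract, not taken from the paper's code.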
Keywords: Sharma–Mittal entropy; topic modeling; optimal number of topics; stability

This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Cite This Article

MDPI and ACS Style

Koltcov, S.; Ignatenko, V.; Koltsova, O. Estimating Topic Modeling Performance with Sharma–Mittal Entropy. Entropy 2019, 21, 660.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
