Open Access Proceedings

Fast Tuning of Topic Models: An Application of Rényi Entropy and Renormalization Theory

Internet Studies Lab, National Research University Higher School of Economics, 55/2 Sedova St., 192148 St. Petersburg, Russia
* Author to whom correspondence should be addressed.
Presented at the 5th International Electronic Conference on Entropy and Its Applications, 18–30 November 2019; Available online: https://ecea-5.sciforum.net/.
These authors contributed equally to this work.
Proceedings 2020, 46(1), 5; https://doi.org/10.3390/ecea-5-06674
Published: 17 November 2019
In practice, the critical step in building machine learning models of big data (BD) is parameter tuning with a grid search, a procedure that is costly in terms of time and computing resources. Due to their size, BD are comparable to mesoscopic physical systems; hence, methods of statistical physics could be applied to BD. The paper shows that topic models demonstrate self-similar behavior as the number of clusters varies. Such behavior allows the use of a renormalization technique. Combining a renormalization procedure with the Rényi entropy approach allows a fast search for the optimal number of clusters. In this paper, the renormalization procedure is developed for the Latent Dirichlet Allocation (LDA) model with a variational Expectation-Maximization algorithm. The experiments were conducted on two document collections in two languages, each with a known number of clusters. The paper presents results for three versions of the renormalization procedure: (1) renormalization with random merging of clusters, (2) renormalization based on minimal values of Kullback–Leibler divergence, and (3) renormalization that merges the clusters with minimal values of Rényi entropy. The paper shows that the renormalization procedure finds the optimal number of topics 26 times faster than grid search without significant loss of quality.
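The abstract does not state the formulas, so the following is only a minimal sketch of the three renormalization variants, written under explicit assumptions. It assumes a Rényi-entropy formulation in which the deformation parameter is q = 1/T (T is the number of topics), only word probabilities above 1/W are counted (W is the vocabulary size), and the entropy is ln(Z_q)/(q − 1) with Z_q = ρ·P̃^q. The function names `renyi_entropy`, `kl_divergence`, `merge`, `choose_pair` and `renormalize` are hypothetical, as are the averaging rule for merging two topics, the epsilon smoothing in the KL divergence, and the per-topic entropy score used by the third variant. No LDA training is shown; the word-topic matrix is assumed to come from an already-fitted model.

```python
import math
import random
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Topic = List[float]  # p(w|t) over a vocabulary of W words; entries sum to 1


def renyi_entropy(phi: List[Topic], q: Optional[float] = None) -> float:
    """Assumed formulation: q = 1/T; keep phi_wt > 1/W; P~ = sum(kept) / T;
    rho = N / (W * T) with N = len(kept); Z_q = rho * P~**q; S = ln(Z_q) / (q - 1)."""
    T, W = len(phi), len(phi[0])
    if q is None:
        q = 1.0 / T
    if q == 1.0:
        raise ValueError("q = 1 (a single topic) leaves the entropy undefined")
    kept = [p for topic in phi for p in topic if p > 1.0 / W]
    if not kept:
        return math.inf  # limit of ln(Z_q) / (q - 1) as Z_q -> 0 with q < 1
    p_tilde = sum(kept) / T
    rho = len(kept) / (W * T)
    return math.log(rho * p_tilde ** q) / (q - 1.0)


def kl_divergence(p: Topic, r: Topic, eps: float = 1e-12) -> float:
    """KL(p || r); the eps smoothing against log(0) is an assumption."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, r) if a > 0)


def merge(phi: List[Topic], i: int, j: int) -> List[Topic]:
    """Replace topics i and j by their average, which again sums to 1 (assumed rule)."""
    merged = [(a + b) / 2.0 for a, b in zip(phi[i], phi[j])]
    return [t for k, t in enumerate(phi) if k not in (i, j)] + [merged]


def choose_pair(phi: List[Topic], strategy: str, rng: random.Random) -> Tuple[int, int]:
    """Pick the two topics to merge under one of the three variants."""
    T = len(phi)
    if strategy == "random":  # variant (1): random merging
        i, j = rng.sample(range(T), 2)
        return i, j
    if strategy == "kl":  # variant (2): pair with minimal KL divergence
        return min(combinations(range(T), 2),
                   key=lambda ij: kl_divergence(phi[ij[0]], phi[ij[1]]))
    if strategy == "renyi":  # variant (3): two topics with the lowest per-topic entropy
        scores = [renyi_entropy([t], q=1.0 / T) for t in phi]  # hypothetical score
        i, j = sorted(range(T), key=scores.__getitem__)[:2]
        return i, j
    raise ValueError(f"unknown strategy: {strategy}")


def renormalize(phi: List[Topic], strategy: str = "renyi",
                seed: int = 0) -> Tuple[Dict[int, float], int]:
    """Merge topics one pair at a time down to T = 2, recording the entropy
    at every T; the candidate optimum is the T with minimal entropy."""
    rng = random.Random(seed)
    curve = {len(phi): renyi_entropy(phi)}
    while len(phi) > 2:
        i, j = choose_pair(phi, strategy, rng)
        phi = merge(phi, i, j)
        curve[len(phi)] = renyi_entropy(phi)
    best_t = min(curve, key=curve.get)
    return curve, best_t


if __name__ == "__main__":
    gen = random.Random(42)
    W, T = 50, 10
    phi = []
    for _ in range(T):
        weights = [gen.random() ** 4 for _ in range(W)]  # toy, skewed word weights
        total = sum(weights)
        phi.append([x / total for x in weights])
    for strategy in ("random", "kl", "renyi"):
        curve, best_t = renormalize(phi, strategy)
        print(strategy, best_t, round(curve[best_t], 4))
```

The toy matrix above is random, so the reported optimum only demonstrates the mechanics. In this sketch, the saving over a grid search comes from fitting a single model with a large T and merging its topics step by step, instead of refitting a model for every candidate T.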
Keywords: renormalization theory; optimal number of topics; Rényi entropy
MDPI and ACS Style

Koltcov, S.; Ignatenko, V.; Pashakhin, S. Fast Tuning of Topic Models: An Application of Rényi Entropy and Renormalization Theory. Proceedings 2020, 46, 5.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
