Open Access Article
Entropy 2016, 18(4), 105; doi:10.3390/e18040105

Generalized Analysis of a Distribution Separation Method

1 Tianjin Key Laboratory of Cognitive Computing and Application, School of Computer Science and Technology, Tianjin University, Tianjin 300072, China
2 School of Computer Software, Tianjin University, Tianjin 300072, China
3 Computing and Communications Department, The Open University, Milton Keynes MK7 6AA, UK
4 Ubiquitous Awareness and Intelligent Solutions Lab, Lanzhou University, Lanzhou 730000, China
* Authors to whom correspondence should be addressed.
Academic Editor: Raúl Alcaraz Martínez
Received: 25 November 2015 / Revised: 11 March 2016 / Accepted: 15 March 2016 / Published: 13 April 2016
(This article belongs to the Section Information Theory)

Abstract

Separating two probability distributions from a mixture model formed as a combination of the two is essential to a wide range of applications. For example, in information retrieval (IR), there often exists a mixture distribution consisting of a relevance distribution that we need to estimate and an irrelevance distribution that we hope to remove. Recently, a distribution separation method (DSM) was proposed to approximate the relevance distribution by separating a seed irrelevance distribution from the mixture distribution. It was successfully applied to an IR task, namely pseudo-relevance feedback (PRF), where the query expansion model is often a mixture term distribution. Although initially developed in the context of IR, DSM is in fact a general mathematical formulation for probability distribution separation. It is therefore important to further generalize its basic analysis and to explore its connections to other related methods. In this article, we first extend DSM's theoretical analysis, which was originally based on the Pearson correlation coefficient, to entropy-related measures, including the KL-divergence (Kullback–Leibler divergence), the symmetrized KL-divergence and the JS-divergence (Jensen–Shannon divergence). Second, we investigate the distribution separation idea in a well-known method, namely the mixture model feedback (MMF) approach. We prove that MMF also complies with the linear combination assumption, so that DSM's linear separation algorithm can largely simplify the EM algorithm in MMF. These theoretical analyses, together with further empirical evaluation results, demonstrate the advantages of our DSM approach.
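To illustrate the linear combination assumption underlying DSM, the following is a minimal sketch (not the paper's actual algorithm, which also estimates the mixing weight): if the mixture M is a known linear combination M = λ·I + (1 − λ)·R of a seed irrelevance distribution I and an unknown relevance distribution R, then R can be recovered by linear separation. The function name `separate` and the toy distributions are illustrative assumptions.

```python
import numpy as np

def separate(mixture, irrelevance, lam):
    """Recover the relevance distribution R from a mixture
    M = lam * I + (1 - lam) * R, under the linear combination
    assumption, for a known mixing weight lam in [0, 1)."""
    rel = (mixture - lam * irrelevance) / (1.0 - lam)
    # Clip tiny negative entries caused by numerical noise,
    # then renormalize so the result is a valid distribution.
    rel = np.clip(rel, 0.0, None)
    return rel / rel.sum()

# Toy example: build a mixture from two known term distributions.
R = np.array([0.5, 0.3, 0.1, 0.1])   # "true" relevance distribution
I = np.array([0.1, 0.2, 0.3, 0.4])   # seed irrelevance distribution
lam = 0.4
M = lam * I + (1 - lam) * R

recovered = separate(M, I, lam)      # recovers R exactly here
```

In practice λ is unknown and must itself be estimated (e.g., by the correlation- or divergence-based criteria analyzed in the article); the sketch only shows why, once λ is fixed, the separation step reduces to simple arithmetic rather than an iterative EM procedure.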
Keywords: information retrieval; distribution separation; KL-divergence; mixture model
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Share & Cite This Article

MDPI and ACS Style

Zhang, P.; Yu, Q.; Hou, Y.; Song, D.; Li, J.; Hu, B. Generalized Analysis of a Distribution Separation Method. Entropy 2016, 18, 105.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Entropy EISSN 1099-4300. Published by MDPI AG, Basel, Switzerland.