*Entropy* **2017**, *19*(6), 269; https://doi.org/10.3390/e19060269

Article

A Novel Distance Metric: Generalized Relative Entropy

^{1} College of Computer Science, Inner Mongolia University, Hohhot 010010, China

^{2} Inner Mongolia Key Laboratory of Data Mining and Knowledge Engineering, Hohhot 010010, China

^{*} Author to whom correspondence should be addressed.

Received: 26 May 2017 / Accepted: 7 June 2017 / Published: 13 June 2017

## Abstract

Information entropy and its extensions are important generalizations of entropy and are applied in many research domains. In this paper, a novel generalized relative entropy is constructed to avoid some defects of traditional relative entropy. We first discuss these defects and then present the structure of the generalized relative entropy. Some of its properties are then stated and proved; in particular, it is shown to have a finite range and to be a distance metric. Finally, we predict nucleosome positioning in fly and yeast using generalized relative entropy and relative entropy, respectively. The experimental results show that generalized relative entropy yields better predictions than relative entropy.

Keywords: relative entropy; generalized relative entropy; upper bound; distance metric; adjusted distance

## 1. Background

The concept of entropy was proposed by R. Clausius as a parameter reflecting the degree of chaos of an object. Information, however, long remained an abstract concept whose amount was hard to quantify. It was not until Shannon proposed information entropy that a standard measure for the amount of information became available. Related concepts based on information entropy were proposed subsequently, such as cross entropy, relative entropy and mutual information, which offer effective methods for solving complex problems in information processing. Therefore, the study of novel metrics based on information entropy is significant in information science.

Information entropy was first proposed by Shannon. Assume an information source I is composed of n different signals I_{i}. The information entropy of I, H(I), is given in Equation (1), where $p_i=\frac{\text{number of occurrences of } I_i}{\text{total number of signals in } I}$ denotes the frequency of I_{i}, E() denotes mathematical expectation, and $k>1$ denotes the base of the logarithm. When $k=2$, the unit of H(I) is the bit.
$$\mathrm{H}(\mathrm{I})=\mathrm{E}\left(-\log_k p_i\right)=-\sum_{i=1}^{n}p_i\cdot\log_k p_i \tag{1}$$
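Equation (1) can be sketched in a few lines of Python. This is a minimal illustration only: the function name `shannon_entropy` and the toy signal string are assumptions introduced here, not part of the paper.

```python
import math
from collections import Counter

def shannon_entropy(signals, base=2):
    """Entropy H(I) of a sequence of signals, following Equation (1)."""
    counts = Counter(signals)
    total = sum(counts.values())
    h = 0.0
    for c in counts.values():
        p = c / total               # p_i: relative frequency of signal I_i
        h -= p * math.log(p, base)  # accumulate -p_i * log_k(p_i)
    return h

# Toy source (hypothetical): frequencies are 1/2, 1/4 and 1/4.
print(shannon_entropy("AACG"))  # about 1.5 bits
```

A source that always emits the same signal has entropy 0, while four equally likely signals give 2 bits, which matches the reading of entropy as a degree of chaos.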

Information entropy measures the degree of chaos of an information source: the larger the information entropy, the more chaotic the source, and vice versa. Cross entropy was later proposed based on information entropy. Its definition is given in Equation (2), where P denotes the “real” distribution of the information source, Q denotes the “unreal” distribution, p_{i} denotes the frequency of the i-th component under P and q_{i} denotes the frequency of the i-th component under Q.
$$\mathrm{H}(\mathrm{P},\mathrm{Q})=\sum_{i=1}^{n}p_i\cdot\log_k\frac{1}{q_i} \tag{2}$$

Cross entropy also reflects how similar the component distributions of two information sources are: $\mathrm{H}(\mathrm{P},\mathrm{Q})=\mathrm{H}(\mathrm{P})$ if and only if the two distributions are identical. A closely related metric is relative entropy, also known as Kullback–Leibler divergence. Its definition is given in Equations (3) and (4), where Equation (3) applies to discrete random variables and Equation (4) to continuous random variables.

$$\mathrm{D}(\mathrm{P}\,\|\,\mathrm{Q})=\sum_{i=1}^{n}p_i\cdot\log_k\frac{p_i}{q_i} \tag{3}$$

$$\mathrm{D}(\mathrm{P}\,\|\,\mathrm{Q})=\int p_x\cdot\log_k\frac{p_x}{q_x}\,dx \tag{4}$$
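The discrete definitions in Equations (2) and (3) can be sketched as follows. The function names and the two toy distributions are assumptions made for illustration, and the sketch assumes $q_i>0$ wherever $p_i>0$.

```python
import math

def entropy(p, base=2):
    """H(P) = -sum_i p_i * log(p_i)."""
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)

def cross_entropy(p, q, base=2):
    """H(P, Q) = sum_i p_i * log(1 / q_i), following Equation (2)."""
    return sum(pi * math.log(1.0 / qi, base) for pi, qi in zip(p, q) if pi > 0)

def relative_entropy(p, q, base=2):
    """D(P || Q) = sum_i p_i * log(p_i / q_i), following Equation (3)."""
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical distributions over two components.
p = [0.5, 0.5]
q = [0.9, 0.1]

# Cross entropy decomposes as H(P, Q) = H(P) + D(P || Q).
print(cross_entropy(p, q), entropy(p) + relative_entropy(p, q))
# Swapping the arguments changes the value: the divergence is not symmetric.
print(relative_entropy(p, q), relative_entropy(q, p))
```

The second printed pair differs, which is the asymmetry that prevents relative entropy from being a distance metric.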

Relative entropy reflects the difference between two information sources with different distributions: the larger the relative entropy, the larger the difference between the sources, and vice versa. Subsequently, mutual information, another entropy-based metric, was proposed. It involves two random variables X and Y and is defined as the relative entropy between the joint distribution p(x,y) and the product of the marginals p(x) and p(y): $\mathrm{I}(X;Y)=\iint_{x\in X,\,y\in Y}p(x,y)\cdot\log_k\frac{p(x,y)}{p(x)\cdot p(y)}\,dx\,dy$. Mutual information is non-negative, and it equals zero if and only if X and Y are independent.
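For discrete variables the integral becomes a double sum, which the following sketch evaluates from a joint probability table. The function name and the two toy tables are assumptions chosen for illustration.

```python
import math

def mutual_information(joint, base=2):
    """I(X;Y) for a joint probability table joint[x][y] (discrete case)."""
    px = [sum(row) for row in joint]        # marginal p(x)
    py = [sum(col) for col in zip(*joint)]  # marginal p(y)
    mi = 0.0
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            if pxy > 0:  # terms with p(x, y) = 0 contribute nothing
                mi += pxy * math.log(pxy / (px[x] * py[y]), base)
    return mi

# Hypothetical joint tables for two binary variables.
independent = [[0.25, 0.25], [0.25, 0.25]]  # p(x, y) = p(x) * p(y)
dependent = [[0.5, 0.0], [0.0, 0.5]]        # Y is a copy of X

print(mutual_information(independent))  # about 0
print(mutual_information(dependent))    # about 1 bit
```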

Recently, there have been many extensions and applications of metrics based on information entropy. However, these entropy-based measures, and relative entropy in particular, have some defects. The two most important are: (1) relative entropy is not a distance metric; (2) it does not have an upper bound.

The rest of this paper is organized as follows: Section 2 reviews related work on information entropy-based methods; Section 3 presents a novel generalized relative entropy and proves some of its properties; Section 4 predicts nucleosome positioning based on generalized relative entropy and relative entropy, respectively; Section 5 concludes the paper.

## 2. Related Work

For years, many scholars have studied applications of various entropy measures. Early on, Białynicki-Birula et al. derived a new uncertainty relation in quantum mechanics based on information entropy [1]. Uhlmann studied relative entropy and proved related concavity properties within an interpolation theory [2]. Shore et al. derived the principles of maximum entropy and minimum cross entropy axiomatically [3]. Fraser et al. used mutual information to select independent coordinates for strange attractors [4]. Pincus analyzed system complexity using approximate entropy [5]. Afterwards, Hyvärinen proposed approximations of differential entropy for independent component analysis and projection pursuit [6].

In 2000, Petersen et al. studied minimax optimal control of stochastic uncertain systems under relative entropy constraints [7]. Kwak et al. selected input features based on the mutual information between the input variables and the class variable [8]. Later, Pluim et al. surveyed mutual-information-based registration of medical images [9]. Arif et al. used approximate entropy to analyze gait stability in young and elderly people, with the aim of improving walking stability for the elderly [10]. Phillips et al. modeled the geographic distributions of species with a maximum entropy model [11]. Krishnaveni et al. used mutual information to remove ocular artifacts from human electroencephalograms [12]. Afterwards, Wolf et al. studied area laws in quantum systems using mutual information and correlations [13]. Baldwin reviewed the use of maximum entropy modeling in wildlife research, including habitat selection [14]. Verdu related mismatched estimation to relative entropy [15].

In 2011, Batina et al. presented a comprehensive study of mutual information analysis [16]. Audenaert studied the asymmetry of relative entropy [17]. Gong et al. combined the scale-invariant feature transform (SIFT) with mutual information in a coarse-to-fine scheme for automatic image registration [18]. Giagkiozis et al. proposed a method that uses cross entropy to solve many-objective optimization problems [19]. Tang and Mao studied information entropy-based metrics for measuring emergences in artificial societies [20].

In recent years, many scholars have studied entropy-based methods for recognition and classification. De Sá et al. studied entropy-based discretization methods for ranking data [21]. Li et al. proposed a molecular docking method based on an information entropy genetic algorithm [22]. Ma et al. related the isobaric yield ratio difference to Shannon information entropy [23]. König et al. studied the operational meaning of min- and max-entropy [24]. In addition, Müller and Pastena studied a generalization of majorization that characterizes Shannon entropy [25]. Zhang et al. proposed a feature selection algorithm for mixed data using a fuzzy rough set-based information entropy [26]. Guariglia studied entropy in connection with fractal antennas [27]. Ebrahimzadeh proposed the concept of logical entropy and applied it to quantum dynamical systems [28]. Lopez-Garcia et al. proposed a method for short-term traffic congestion forecasting that combines genetic algorithms with cross entropy [29]. Sutter et al. studied the monotonicity of relative entropy [30]. Opper provided an estimator for the relative entropy rate of path measures for stochastic differential equations [31]. Tang et al. studied an EEMD-based multi-scale fuzzy entropy approach for complexity analysis in clean energy markets [32].

## 3. Generalized Relative Entropy

#### 3.1. Structure of Generalized Relative Entropy

Nowadays, relative entropy is becoming one of the most important dissimilarity measures between two multidimensional vectors. Let $X(x_1,\dots,x_s)$ and $Y(y_1,\dots,y_s)$ be two multidimensional vectors, each constituted by s components with different counts. The count of the i-th component is x_{i} in vector X and y_{i} in vector Y. The relative entropy RE(X,Y), which denotes the relation from X to Y, is defined in Equation (5), where $p_x(i)\stackrel{\mathrm{def}}{=}\frac{x_i}{\sum_{j=1}^{s}x_j}$ is the probability of the i-th component in X, and $p_y(i)$ is defined analogously for Y. We define $p_x(i)\cdot\log\frac{p_x(i)}{p_y(i)}=0$ when $p_x(i)=0$ in order to avoid the indeterminate form $0\cdot\log 0$. In real applications, a very small positive number $\epsilon>0$ is added to the denominator inside log() to avoid the form $\log\infty$.
$$\mathrm{RE}(X,Y)=\sum_{i=1}^{s}p_x(i)\cdot\log\frac{p_x(i)}{p_y(i)} \tag{5}$$
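A minimal sketch of Equation (5) on count vectors is given below. The function name, the value of `EPS` and the toy vectors are assumptions, and natural logarithms are assumed because the paper does not state a base here.

```python
import math

EPS = 1e-10  # hypothetical choice for the small constant epsilon

def relative_entropy_counts(x, y, eps=EPS):
    """RE(X, Y) of Equation (5) for two count vectors of equal length."""
    sx, sy = sum(x), sum(y)
    re = 0.0
    for xi, yi in zip(x, y):
        px, py = xi / sx, yi / sy
        if px == 0:
            continue  # convention: the term is 0 when p_x(i) = 0
        re += px * math.log(px / (py + eps))  # eps guards against p_y(i) = 0
    return re

# Toy count vectors (hypothetical); X has a zero count in the third component.
X = [3, 1, 0]
Y = [1, 1, 2]
print(relative_entropy_counts(X, Y))  # modest value, about 0.75 * ln 3
print(relative_entropy_counts(Y, X))  # large value, driven by the choice of eps
```

The second value grows without limit as `eps` shrinks, which illustrates both the asymmetry and the missing upper bound of relative entropy.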

It is known that RE(X,Y) is not a distance metric because usually $\mathrm{RE}(X,Y)\ne\mathrm{RE}(Y,X)$ when $X\ne Y$. Moreover, relative entropy does not have a finite upper bound, which means it cannot easily be used to measure the difference between high-dimensional vectors in real applications. So in this paper, based on the definition of relative entropy, we present a generalized relative entropy d(X,Y) in Equation (6), where s denotes the number of components, $k>1$ denotes the control parameter of the function d(), and $r=0$ when $X=Y$, $r=1$ when $X\ne Y$. We believe the generalized relative entropy has better properties than relative entropy. Moreover, it is a distance metric.

$$\mathrm{d}(X,Y)=\sum_{i=1}^{s}\left(p_x(i)\cdot\log\frac{k\cdot p_x(i)}{(k-1)\,p_x(i)+p_y(i)}+p_y(i)\cdot\log\frac{k\cdot p_y(i)}{p_x(i)+(k-1)\,p_y(i)}\right)+r\cdot\log\left(1+\frac{1}{k-1}\right)^{2} \tag{6}$$
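Equation (6) can be sketched as below. The function name and the toy vectors are assumptions. The sketch also assumes natural logarithms, treats terms with zero probability as 0, and decides $r$ by comparing the count vectors themselves.

```python
import math

def generalized_relative_entropy(x, y, k=2.0):
    """d(X, Y) of Equation (6) for two count vectors of equal length."""
    if k <= 1:
        raise ValueError("the control parameter k must be greater than 1")
    sx, sy = sum(x), sum(y)
    total = 0.0
    for xi, yi in zip(x, y):
        a, b = xi / sx, yi / sy  # p_x(i) and p_y(i)
        if a > 0:                # zero-probability terms are taken as 0
            total += a * math.log(k * a / ((k - 1) * a + b))
        if b > 0:
            total += b * math.log(k * b / (a + (k - 1) * b))
    r = 0 if list(x) == list(y) else 1  # assumption: X = Y means equal vectors
    return total + r * math.log((1 + 1 / (k - 1)) ** 2)

# Toy count vectors (hypothetical).
X = [3, 1, 0]
Y = [1, 1, 2]
print(generalized_relative_entropy(X, Y))  # same value as with X and Y swapped
print(generalized_relative_entropy(X, X))  # zero for identical vectors
```

Because each denominator mixes in $(k-1)$ times the numerator's own probability, no logarithm can diverge even when a component of the other vector is zero, so no $\epsilon$ is needed.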

#### 3.2. Properties of Generalized Relative Entropy

Theorem 1 will be presented to prove that the generalized relative entropy d() is a distance metric. However, Lemmas 1 and 2 and Inferences 1 and 2 are presented first.

**Lemma 1.** $\sum_{i=1}^{s}\left(p_x(i)\cdot\log\frac{k\cdot p_x(i)}{(k-1)\,p_x(i)+p_y(i)}\right)$ is always nonnegative if $p_x(i)\ge 0$, $p_y(i)\ge 0$, $\sum_{i=1}^{s}p_x(i)=\sum_{i=1}^{s}p_y(i)=1$ and $k\ge 1$.

**Proof.** Because $p_x(i)\cdot\log\frac{k\cdot p_x(i)}{(k-1)\,p_x(i)+p_y(i)}=-p_x(i)\cdot\log\frac{(k-1)\,p_x(i)+p_y(i)}{k\cdot p_x(i)}$ and $\frac{(k-1)\,p_x(i)+p_y(i)}{k\cdot p_x(i)}>0$ whenever $p_x(i)>0$, the concavity of the logarithm (Jensen's inequality) gives Equation (7).

$$\begin{aligned}\sum_{i=1}^{s}\left(p_x(i)\cdot\log\frac{(k-1)\cdot p_x(i)+p_y(i)}{k\cdot p_x(i)}\right)&\le\log\sum_{i=1}^{s}\frac{(k-1)\cdot p_x(i)+p_y(i)}{k\cdot p_x(i)}\cdot p_x(i)\\&=\log\frac{(k-1)\cdot\sum_{i=1}^{s}p_x(i)+\sum_{i=1}^{s}p_y(i)}{k}=\log 1=0\end{aligned} \tag{7}$$

So $\sum_{i=1}^{s}p_x(i)\cdot\log\frac{k\cdot p_x(i)}{(k-1)\,p_x(i)+p_y(i)}=-\sum_{i=1}^{s}p_x(i)\cdot\log\frac{(k-1)\,p_x(i)+p_y(i)}{k\cdot p_x(i)}\ge 0$.

Lemma 1 is proved. □

Then, by considering when equality holds in Equation (7), we have Inference 1.

**Inference 1.** $\sum_{i=1}^{s}\left(p_x(i)\cdot\log\frac{k\cdot p_x(i)}{(k-1)\,p_x(i)+p_y(i)}\right)$ is zero if and only if $p_x(i)=p_y(i)$ for all i, where $p_x(i)\ge 0$, $p_y(i)\ge 0$, $\sum_{i=1}^{s}p_x(i)=\sum_{i=1}^{s}p_y(i)=1$ and $k\ge 1$.

Then, we have Lemma 2 and Inference 2 based on Lemma 1 and Inference 1 to prove upper bound of d(X,Y).

**Lemma 2.** $\sum_{i=1}^{s}\left(p_x(i)\cdot\log\frac{k\cdot p_x(i)}{(k-1)\,p_x(i)+p_y(i)}\right)\le\log\frac{k}{k-1}$ if $p_x(i)\ge 0$, $p_y(i)\ge 0$, $k>1$ and $\sum_{i=1}^{s}p_x(i)=1$.

**Proof.** We have Equation (8) to prove Lemma 2 based on Equation (9), which holds for every i with $p_x(i)>0$ because $p_y(i)\ge 0$.

$$\sum_{i=1}^{s}\left(p_x(i)\cdot\log\frac{k\cdot p_x(i)}{(k-1)\,p_x(i)+p_y(i)}\right)\le\sum_{i=1}^{s}\left(p_x(i)\cdot\log\frac{k}{k-1}\right)=\log\frac{k}{k-1} \tag{8}$$

$$\log\frac{k\cdot p_x(i)}{(k-1)\,p_x(i)+p_y(i)}\le\log\frac{k\cdot p_x(i)}{(k-1)\,p_x(i)}=\log\frac{k}{k-1} \tag{9}$$

Lemma 2 is proved. □

**Inference 2.** The upper bound of $d(X,Y)$ is $4\cdot\log\frac{k}{k-1}$, where $\sum_{i=1}^{s}p_x(i)=\sum_{i=1}^{s}p_y(i)=1$ and $k>1$.

**Proof.** Applying Lemma 2 to each of the two sums and using $r\le 1$, we have Equation (10).

$$\begin{aligned}\mathrm{d}(X,Y)&=\sum_{i=1}^{s}\left(p_x(i)\cdot\log\frac{k\cdot p_x(i)}{(k-1)\,p_x(i)+p_y(i)}+p_y(i)\cdot\log\frac{k\cdot p_y(i)}{p_x(i)+(k-1)\,p_y(i)}\right)+r\cdot\log\left(1+\frac{1}{k-1}\right)^{2}\\&\le\log\frac{k}{k-1}+\log\frac{k}{k-1}+2r\cdot\log\frac{k}{k-1}\le 4\cdot\log\frac{k}{k-1}\end{aligned} \tag{10}$$

Inference 2 is proved. □

After that, Theorem 1 is presented to prove d(X,Y) is a distance metric between two elements X and Y with same diversity s and length n.

**Theorem 1.** Function d() is a distance metric on the element set E{} in the space S(E{}, d()), where all elements in E have the same diversity s.

**Proof.** Let X and Y be two elements in E, let $p_x(i)$ and $p_y(i)$ denote the frequency of the i-th component in X and Y, respectively, let $k>1$ be the control parameter, let s be the number of components in X and Y, and let $r=0$ when $X=Y$ and $r=1$ when $X\ne Y$. From Equation (6), we have $\mathrm{d}(X,Y)=\sum_{i=1}^{s}\left(p_x(i)\cdot\log\frac{k\cdot p_x(i)}{(k-1)\,p_x(i)+p_y(i)}+p_y(i)\cdot\log\frac{k\cdot p_y(i)}{p_x(i)+(k-1)\,p_y(i)}\right)+r\cdot\log\left(1+\frac{1}{k-1}\right)^{2}$. Then, we use the following properties to prove Theorem 1.

**Property 1.** $d(X,Y)\ge 0$ for every X and Y, and $d(X,Y)=0$ if and only if $X=Y$.

First, we know $\sum_{i=1}^{s}\left(p_x(i)\cdot\log\frac{k\cdot p_x(i)}{(k-1)\,p_x(i)+p_y(i)}\right)$ is nonnegative from Lemma 1. Exchanging the roles of X and Y, it implies $\sum_{i=1}^{s}\left(p_y(i)\cdot\log\frac{k\cdot p_y(i)}{(k-1)\,p_y(i)+p_x(i)}\right)\ge 0$. Then, we know $r\cdot\log\left(1+\frac{1}{k-1}\right)^{2}\ge 0$. So $\mathrm{d}(X,Y)=\sum_{i=1}^{s}\left(p_x(i)\cdot\log\frac{k\cdot p_x(i)}{(k-1)\,p_x(i)+p_y(i)}\right)+\sum_{i=1}^{s}\left(p_y(i)\cdot\log\frac{k\cdot p_y(i)}{p_x(i)+(k-1)\,p_y(i)}\right)+r\cdot\log\left(1+\frac{1}{k-1}\right)^{2}\ge 0$. When $X=Y$, both sums are zero by Inference 1 and $r=0$, so $d(X,Y)=0$; when $X\ne Y$, $r=1$ and therefore $d(X,Y)\ge 2\cdot\log\frac{k}{k-1}>0$. Property 1 is proved.

**Property 2.** $d(X,Y)=d(Y,X)$ for every X and Y.

Exchanging X and Y in the definition of d() only swaps the two terms inside the sum and leaves r unchanged, so the expression for $\mathrm{d}(X,Y)$ is symmetric in X and Y for every pair of vectors, which means Property 2 is proved.

**Property 3.** $d(X,Y)+d(Y,Z)\ge d(X,Z)$ for every X, Y and Z.

First, if at least two elements in $\{X,Y,Z\}$ are equal, then $\mathrm{d}(X,Y)+\mathrm{d}(Y,Z)\ge\mathrm{d}(X,Z)$ holds because, among the three values of d(), one is zero and the other two are equal and nonnegative.

So, Equation (11) describes $\mathrm{d}(X,Y)+\mathrm{d}(Y,Z)-\mathrm{d}(X,Z)$ when X, Y and Z are pairwise distinct.

$$\begin{aligned}&\mathrm{d}(X,Y)+\mathrm{d}(Y,Z)-\mathrm{d}(X,Z)\\&\quad=\sum_{i=1}^{s}\Big(p_x(i)\cdot\log\frac{k\cdot p_x(i)}{(k-1)\,p_x(i)+p_y(i)}+p_y(i)\cdot\log\frac{k\cdot p_y(i)}{p_x(i)+(k-1)\,p_y(i)}\\&\qquad+p_y(i)\cdot\log\frac{k\cdot p_y(i)}{(k-1)\,p_y(i)+p_z(i)}+p_z(i)\cdot\log\frac{k\cdot p_z(i)}{p_y(i)+(k-1)\,p_z(i)}\\&\qquad-p_x(i)\cdot\log\frac{k\cdot p_x(i)}{(k-1)\,p_x(i)+p_z(i)}-p_z(i)\cdot\log\frac{k\cdot p_z(i)}{p_x(i)+(k-1)\,p_z(i)}\Big)+\log\left(1+\frac{1}{k-1}\right)^{2}\end{aligned} \tag{11}$$

Then, we have Equation (12) to prove Property 3 based on Lemmas 1 and 2: the first inequality drops the four nonnegative sums that belong to d(X,Y) and d(Y,Z) (Lemma 1), and the second bounds each remaining sum by $\log\frac{k}{k-1}$ (Lemma 2).

$$\begin{aligned}&\mathrm{d}(X,Y)+\mathrm{d}(Y,Z)-\mathrm{d}(X,Z)\\&\quad\ge\log\left(1+\frac{1}{k-1}\right)^{2}-\sum_{i=1}^{s}p_x(i)\cdot\log\frac{k\cdot p_x(i)}{(k-1)\,p_x(i)+p_z(i)}-\sum_{i=1}^{s}p_z(i)\cdot\log\frac{k\cdot p_z(i)}{p_x(i)+(k-1)\,p_z(i)}\\&\quad\ge 2\cdot\log\frac{k}{k-1}-2\cdot\log\frac{k}{k-1}=0\end{aligned} \tag{12}$$

To summarize Properties 1–3, Theorem 1 is proved.□
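The metric properties can also be spot-checked numerically. The sketch below is not a proof; it evaluates the triangle inequality on randomly generated count vectors. All names, the random generator and the trial settings are assumptions introduced here, and natural logarithms are assumed.

```python
import math
import random

def d(x, y, k=2.0):
    """Generalized relative entropy d(X, Y) for two count vectors."""
    sx, sy = sum(x), sum(y)
    total = 0.0
    for xi, yi in zip(x, y):
        a, b = xi / sx, yi / sy
        if a > 0:
            total += a * math.log(k * a / ((k - 1) * a + b))
        if b > 0:
            total += b * math.log(k * b / (a + (k - 1) * b))
    r = 0 if list(x) == list(y) else 1
    return total + r * 2 * math.log(k / (k - 1))  # log(1 + 1/(k-1))^2

def random_counts(rng, s):
    """Random count vector with s components and a nonzero total."""
    counts = [rng.randint(0, 20) for _ in range(s)]
    counts[rng.randrange(s)] += 1  # guarantee that the total is not zero
    return counts

def smallest_triangle_slack(trials=1000, s=16, k=2.0, seed=0):
    """Smallest observed value of d(X,Y) + d(Y,Z) - d(X,Z)."""
    rng = random.Random(seed)
    slack = math.inf
    for _ in range(trials):
        x, y, z = (random_counts(rng, s) for _ in range(3))
        slack = min(slack, d(x, y, k) + d(y, z, k) - d(x, z, k))
    return slack

for k in (1.1, 2.0, 5.0):
    print(k, smallest_triangle_slack(k=k))  # expected to be nonnegative
```

A negative slack beyond floating-point noise would contradict Property 3, so this kind of check is mainly useful for catching implementation mistakes.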

Then, we use Theorem 2 to provide the range of d(X,Y) for all elements X and Y which were combined with s components.

**Theorem 2.** The range of $d(X,Y)$ is $\{0\}\cup\left[2\cdot\log\frac{k}{k-1},\,4\cdot\log\frac{k}{k-1}\right]$.

**Proof.**

Case (1) $X=Y$

When $X=Y$, we have $\mathrm{d}\left(X,Y\right)=\mathrm{d}\left(X,X\right)=0$ by Inference 1.

Case (2) $X\ne Y$

When $X\ne Y$, we have $r=1$, so $\mathrm{d}(X,Y)\le 4\cdot\log\frac{k}{k-1}$ by Inference 2, and $\mathrm{d}(X,Y)\ge 2\cdot\log\frac{k}{k-1}$ by Lemma 1, because both sums in the definition of d() are nonnegative.

To summarize Cases 1 and 2, Theorem 2 is proved.□
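As a worked illustration of Theorem 2 (a sketch with a hypothetical pair of vectors, not taken from the paper's data), take $s=2$, $p_x=(1,0)$ and $p_y=(0,1)$, so that the two distributions have disjoint supports and $r=1$. With the convention that terms with zero probability vanish, Equation (6) gives

$$\mathrm{d}(X,Y)=1\cdot\log\frac{k\cdot 1}{(k-1)\cdot 1+0}+1\cdot\log\frac{k\cdot 1}{0+(k-1)\cdot 1}+\log\left(1+\frac{1}{k-1}\right)^{2}=4\cdot\log\frac{k}{k-1}$$

so the upper end of the range is attained. At the other extreme, when $p_y$ approaches $p_x$ while $X\ne Y$, both sums tend to zero and $\mathrm{d}(X,Y)$ approaches $2\cdot\log\frac{k}{k-1}$. For $k=2$, assuming natural logarithms, the interval is approximately $[1.386,\,2.773]$.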

In this way, the generalized relative entropy is provided and some properties are proved.

## 4. Experiment

#### 4.1. Model Predicting Nucleosome Positioning

In this paper, we undertake experiments to test whether generalized relative entropy has better properties than relative entropy. We consider two species, fly and yeast, and use their datasets to predict nucleosome positioning. The fly datasets are downloaded from the Supplementary data in [33] and include 2900 core DNA sequences and 2850 linker DNA sequences of 147 bp; the yeast datasets are downloaded from Supporting Information S1 in [34] and include 1880 core DNA sequences and 1740 linker DNA sequences of 150 bp.

The experiments proceed as follows. First, we define k-nucleotide sequence combinations: each is a combination of k of the four nucleotides (A, G, C or T). Thus, di-nucleotide sequences have 16 combinations, such as AA or AT. Next, we describe the statistical procedure, taking the fly datasets as an example. Firstly, we count the frequencies of di-nucleotide sequences in the core DNA sequences and in the linker DNA sequences, respectively, which constructs two real distributions $p1_x(i)$ and $p2_x(i)$, where i denotes the i-th di-nucleotide sequence. Secondly, we count the frequencies of all di-nucleotide sequences in each individual DNA sequence to construct the unreal distribution $p_y(i)$. Thirdly, we compute the relative entropy between each DNA sequence and the core DNA sequences and the linker DNA sequences, respectively, which constructs a two-dimensional feature vector $\mathrm{R}(RE_1,RE_2)$. Then, we feed these vectors into a back propagation neural network (BP neural network) to train a classification model for predicting nucleosome positioning, and use 10-fold cross-validation to examine the quality of the model. In the same way, we compute the generalized relative entropy between each DNA sequence and the core DNA sequences and the linker DNA sequences, respectively, which constructs a two-dimensional feature vector $\mathrm{R}(d_1,d_2)$, with the control parameter k ranging from 1.1 to 5. These vectors are likewise fed into a BP neural network and evaluated with 10-fold cross-validation. Finally, we predict nucleosome positioning for the yeast datasets using the same methodology as for the fly datasets.
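The feature construction described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: di-nucleotides are counted with overlapping windows, the sequences are short made-up strings, all function names are hypothetical, and the neural network step is omitted.

```python
import math
from itertools import product

# The 16 di-nucleotide combinations AA, AC, ..., TT in a fixed order.
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]

def dinucleotide_counts(seq):
    """Counts of the 16 di-nucleotides in one sequence (overlapping windows)."""
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip windows with other symbols such as N
            counts[pair] += 1
    return [counts[name] for name in DINUCLEOTIDES]

def pooled_counts(seqs):
    """Di-nucleotide counts pooled over a set of reference sequences."""
    totals = [0] * len(DINUCLEOTIDES)
    for seq in seqs:
        for i, c in enumerate(dinucleotide_counts(seq)):
            totals[i] += c
    return totals

def gre(x, y, k=2.0):
    """Generalized relative entropy d(X, Y) for two count vectors."""
    sx, sy = sum(x), sum(y)
    total = 0.0
    for xi, yi in zip(x, y):
        a, b = xi / sx, yi / sy
        if a > 0:
            total += a * math.log(k * a / ((k - 1) * a + b))
        if b > 0:
            total += b * math.log(k * b / (a + (k - 1) * b))
    r = 0 if list(x) == list(y) else 1
    return total + r * 2 * math.log(k / (k - 1))

def features(seq, core_ref, linker_ref, k=2.0):
    """Two-dimensional feature vector (d_1, d_2) for one DNA sequence."""
    y = dinucleotide_counts(seq)
    return gre(core_ref, y, k), gre(linker_ref, y, k)

# Made-up toy sequences, far shorter than the 147 bp and 150 bp data.
core_ref = pooled_counts(["AATTAATTAA", "ATATTAATAT"])
linker_ref = pooled_counts(["GCGCGGCCGC", "CCGGCGCGCG"])
print(features("AATATTAAAT", core_ref, linker_ref))  # first value is smaller
```

In this toy setting the query shares no di-nucleotide with the linker reference, so its second feature sits at the upper bound of d(), while the first is smaller.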

#### 4.2. Evaluation of the Quality of Prediction

In this paper, four quantities TP, FP, FN and TN are defined. TP counts the cases in which both the prediction and the fact are core DNA sequences. FP counts the cases in which linker DNA sequences are incorrectly predicted as core DNA sequences. FN counts the cases in which core DNA sequences are incorrectly predicted as linker DNA sequences. TN counts the cases in which both the prediction and the fact are linker DNA sequences. We use the following standard measures to examine the quality of a model's predictions [35], where Sn represents sensitivity, Sp represents specificity, Acc represents accuracy, and Mcc represents the Matthews correlation coefficient.

$$\mathrm{Sn}=\frac{TP}{TP+FN}$$

$$\mathrm{Sp}=\frac{TN}{TN+FP}$$

$$\mathrm{Acc}=\frac{TP+TN}{TP+FN+TN+FP}$$

$$\mathrm{Mcc}=\frac{TP\times TN-FP\times FN}{\sqrt{\left(TP+FN\right)\left(TP+FP\right)\left(TN+FN\right)\left(TN+FP\right)}}$$
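The four measures can be computed directly from the confusion counts, as in the sketch below. The function name and the example counts are hypothetical, and returning 0 for Mcc when the denominator vanishes is a convention assumed here.

```python
import math

def prediction_quality(tp, fp, fn, tn):
    """Sn, Sp, Acc and Mcc computed from the four confusion counts."""
    sn = tp / (tp + fn)                    # sensitivity
    sp = tn / (tn + fp)                    # specificity
    acc = (tp + tn) / (tp + fn + tn + fp)  # accuracy
    denom = math.sqrt((tp + fn) * (tp + fp) * (tn + fn) * (tn + fp))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # assumed convention
    return {"Sn": sn, "Sp": sp, "Acc": acc, "Mcc": mcc}

# Hypothetical counts: 100 core and 100 linker sequences.
print(prediction_quality(tp=80, fp=10, fn=20, tn=90))
```

Unlike Acc, Mcc uses all four counts in both numerator and denominator, so it stays informative when the two classes differ in size.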

#### 4.3. Results and Analysis

We use 10-fold cross-validation to examine the quality of the model for the fly and yeast datasets. From the tables and figures (Table 1 and Table 2, Figure 1, Figure 2, Figure 3 and Figure 4), we conclude that the results obtained by generalized relative entropy are better than those obtained by relative entropy. In particular, the values obtained by generalized relative entropy are higher than those obtained by relative entropy when k equals 2, 3.1 and 4.1 (Table 1). Meanwhile, the values of Acc for the yeast datasets are higher than those for the fly datasets (Figure 1, Table 2), which suggests that nucleosome positioning is more easily predicted in yeast than in fly.

## 5. Conclusions

In this paper, we presented a novel distance metric based on relative entropy, called generalized relative entropy. It overcomes the disadvantages of relative entropy because it has an upper bound and satisfies the triangle inequality. Its metric properties and upper bound were proved in this paper, and its range was derived. To validate the advantages of generalized relative entropy, we predicted nucleosome positioning in fly and yeast based on generalized relative entropy, with k ranging from 1.1 to 5, and on relative entropy, respectively. The experimental results show that generalized relative entropy performs better than relative entropy in predicting nucleosome positioning. Finally, since the parameter k controls the generalized relative entropy, we believe that this metric can be used in a variety of real applications by adjusting k.

## Acknowledgments

The authors wish to thank the anonymous editors and reviewers for their helpful comments on this paper. This work is supported by the Grants Programs of the National Natural Science Foundation of China (No. 61502254), the Program for New Century Excellent Talents in University (NCET-12-1016), and the Program for Young Talents of Science and Technology in the Universities of Inner Mongolia Autonomous Region (NJYT-12-B04). We also want to thank Prof. Guo for his help with this paper.

## Author Contributions

Shuai Liu conceived the method and finished some proofs; Mengye Lu, Gaocheng Liu and Zheng Pan performed and analyzed the experiments; Gaocheng Liu cleaned the data; Shuai Liu and Mengye Lu wrote the paper. All authors have read and approved the final version of the manuscript.

## Conflicts of Interest

The authors declare no conflict of interest.

## References

1. Białynicki-Birula, I.; Mycielski, J. Uncertainty relations for information entropy in wave mechanics. Commun. Math. Phys. **1975**, 44, 129–132.
2. Uhlmann, A. Relative entropy and the Wigner-Yanase-Dyson-Lieb concavity in an interpolation theory. Commun. Math. Phys. **1977**, 54, 21–32.
3. Shore, J.; Johnson, R. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Trans. Inf. Theory **1980**, 26, 26–37.
4. Fraser, A.M.; Swinney, H.L. Independent coordinates for strange attractors from mutual information. Phys. Rev. A **1986**, 33, 1134–1140.
5. Pincus, S.M. Approximate entropy as a measure of system complexity. Proc. Natl. Acad. Sci. USA **1991**, 88, 2297–2301.
6. Hyvärinen, A. New Approximations of Differential Entropy for Independent Component Analysis and Projection Pursuit. 1998. Available online: https://papers.nips.cc/paper/1408-new-approximations-of-differential-entropy-for-independent-component-analysis-and-projection-pursuit.pdf (accessed on 12 June 2017).
7. Petersen, I.R.; James, M.R.; Dupuis, P. Minimax optimal control of stochastic uncertain systems with relative entropy constraints. IEEE Trans. Autom. Control **2000**, 45, 398–412.
8. Kwak, N.; Choi, C.-H. Input feature selection by mutual information based on Parzen window. IEEE Trans. Pattern Anal. Mach. Intell. **2002**, 24, 1667–1671.
9. Pluim, J.P.W.; Maintz, J.B.A.; Viergever, M.A. Mutual-information-based registration of medical images: A survey. IEEE Trans. Med. Imaging **2003**, 22, 986–1004.
10. Arif, M.; Ohtaki, Y.; Nagatomi, R.; Inooka, H. Estimation of the Effect of Cadence on Gait Stability in Young and Elderly People using Approximate Entropy Technique. Meas. Sci. Rev. **2004**, 4, 29–40.
11. Phillips, S.J.; Anderson, R.P.; Schapire, R.E. Maximum entropy modeling of species geographic distributions. Ecol. Model. **2006**, 190, 231–259.
12. Krishnaveni, V.; Jayaraman, S.; Ramadoss, K. Application of Mutual Information based Least dependent Component Analysis (MILCA) for Removal of Ocular Artifacts from Electroencephalogram. Int. J. Biomed. Sci. **2006**, 1, 63–74.
13. Wolf, M.M.; Verstraete, F.; Hastings, M.B.; Cirac, J.I. Area laws in quantum systems: Mutual information and correlations. Phys. Rev. Lett. **2008**, 100, 070502.
14. Baldwin, R.A. Use of Maximum Entropy Modeling in Wildlife Research. Entropy **2009**, 11, 854–866.
15. Verdu, S. Mismatched Estimation and Relative Entropy. IEEE Trans. Inf. Theory **2010**, 56, 3712–3720.
16. Batina, L.; Gierlichs, B.; Prouff, E.; Rivain, M.; Standaert, F.X.; Veyrat-Charvillon, N. Mutual Information Analysis: A Comprehensive Study. J. Cryptol. **2011**, 24, 269–291.
17. Audenaert, K.M.R. On the asymmetry of the relative entropy. J. Math. Phys. **2013**, 54, 073506.
18. Gong, M.; Zhao, S.; Jiao, L.; Tian, D.; Wang, S. A Novel Coarse-to-Fine Scheme for Automatic Image Registration Based on SIFT and Mutual Information. IEEE Trans. Geosci. Remote Sens. **2014**, 52, 4328–4338.
19. Giagkiozis, I.; Purshouse, R.C.; Fleming, P.J. Generalized decomposition and cross entropy methods for many-objective optimization. Inf. Sci. **2014**, 282, 363–387.
20. Tang, M.; Mao, X. Information Entropy-Based Metrics for Measuring Emergences in Artificial Societies. Entropy **2014**, 16, 4583–4602.
21. De Sá, C.R.; Soares, C.; Knobbe, A. Entropy-based discretization methods for ranking data. Inf. Sci. **2015**, 329, 921–936.
22. Li, Z.; Gu, J.; Zhuang, H.; Kang, L.; Zhao, X.; Guo, Q. Adaptive molecular docking method based on information entropy genetic algorithm. Appl. Soft Comput. **2015**, 26, 299–302.
23. Ma, C.W.; Wei, H.L.; Wang, S.S.; Ma, Y.G.; Wada, R.; Zhang, Y.L. Isobaric yield ratio difference and Shannon information entropy. Phys. Lett. B **2015**, 742, 19–22.
24. König, R.; Renner, R.; Schaffner, C. The operational meaning of min- and max-entropy. IEEE Trans. Inf. Theory **2015**, 55, 4337–4347.
25. Müller, M.P.; Pastena, M. A Generalization of Majorization that Characterizes Shannon Entropy. IEEE Trans. Inf. Theory **2016**, 62, 1711–1720.
26. Zhang, X.; Mei, C.; Chen, D.; Li, J. Feature selection in mixed data: A method using a novel fuzzy rough set-based information entropy. Pattern Recognit. **2016**, 56, 1–15.
27. Guariglia, E. Entropy and Fractal Antennas. Entropy **2016**, 18, 84.
28. Ebrahimzadeh, A. Logical entropy of quantum dynamical systems. Open Phys. **2016**, 14, 1–5.
29. Lopez-Garcia, P.; Onieva, E.; Osaba, E.; Masegosa, A.D.; Perallos, A. A Hybrid Method for Short-Term Traffic Congestion Forecasting Using Genetic Algorithms and Cross Entropy. IEEE Trans. Intell. Transp. Syst. **2016**, 17, 557–569.
30. Sutter, D.; Tomamichel, M.; Harrow, A.W. Strengthened Monotonicity of Relative Entropy via Pinched Petz Recovery Map. IEEE Trans. Inf. Theory **2016**, 62, 2907–2913.
31. Opper, M. An estimator for the relative entropy rate of path measures for stochastic differential equations. J. Comput. Phys. **2017**, 330, 127–133.
32. Tang, L.; Lv, H.; Yu, L. An EEMD-based multi-scale fuzzy entropy approach for complexity analysis in clean energy markets. Appl. Soft Comput. **2017**, 56, 124–133.
33. Guo, S.-H.; Deng, E.-Z.; Xu, L.-Q.; Ding, H.; Lin, H.; Chen, W.; Chou, K.-C. iNuc-PseKNC: A sequence-based predictor for predicting nucleosome positioning in genomes with pseudo k-tuple nucleotide composition. Bioinformatics **2014**, 30, 1522–1529.
34. Chen, W.; Feng, P.; Ding, H.; Lin, H.; Chou, K.-C. Using deformation energy to analyze nucleosome positioning in genomes. Genomics **2016**, 107, 69–75.
35. Awazu, A. Prediction of nucleosome positioning by the incorporation of frequencies and distributions of three different nucleotide segment lengths into a general pseudo k-tuple nucleotide composition. Bioinformatics **2017**, 33, 42–48.

**Table 1.** Prediction quality for the fly datasets.

Method | k | Acc | Sn | Sp | Mcc
---|---|---|---|---|---
Relative entropy | | 0.7289 | 0.6837 | 0.7744 | 0.4603
Generalized relative entropy | 2 | 0.7426 | 0.7105 | 0.7763 | 0.4885
Generalized relative entropy | 3.1 | 0.7477 | 0.7215 | 0.7751 | 0.4970
Generalized relative entropy | 4.1 | 0.7485 | 0.7225 | 0.7762 | 0.4994

**Table 2.** Prediction quality for the yeast datasets.

Method | Acc | Sn | Sp | Mcc
---|---|---|---|---
Relative entropy | 0.9843 | 0.9875 | 0.9809 | 0.9684
Generalized relative entropy (k = 2) | 0.9901 | 0.9937 | 0.9860 | 0.9801

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).