Article

Information Distances versus Entropy Metric

Bo Hu, Lvqing Bi and Songsong Dai
1 School of Mechanical and Electrical Engineering, Guizhou Normal University, Guiyang 550025, China
2 School of Electronics and Communication Engineering, Yulin Normal University, Yulin 537000, China
3 School of Information Science and Engineering, Xiamen University, Xiamen 361005, China
* Author to whom correspondence should be addressed.
Entropy 2017, 19(6), 260; https://doi.org/10.3390/e19060260
Submission received: 18 April 2017 / Revised: 2 June 2017 / Accepted: 2 June 2017 / Published: 7 June 2017
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

Information distance has become an important tool in a wide variety of applications. Various types of information distance have been developed over the years. These information distance measures differ from the entropy metric: the former are based on Kolmogorov complexity and the latter on Shannon entropy. However, for any computable probability distribution, the expected value of Kolmogorov complexity equals the Shannon entropy up to a constant. We study whether an analogous relationship holds between entropy and information distance. We also study the relationship between entropy and the normalized versions of information distances.

1. Introduction

Information distance [1] is a universal distance measure between individual objects based on Kolmogorov complexity: it is the length of a shortest program that transforms one object into the other. Since the theory of information distance was proposed, various information distance measures have been introduced. The normalized versions of information distances [2] were introduced for measuring similarity between sequences. The min distance and its normalized version, which do not satisfy the triangle inequality, were presented in [3,4]. The time-bounded version of information distance [5] has been used for studying the computability properties of the normalized information distances. A safe approximation of the normalized information distance was discussed in [6]. Since the normalized information distance is uncomputable, two practical distance measures, the normalized compression distance and the Google similarity distance, have been presented [7,8,9,10,11]. These distance measures have been successfully applied to bioinformatics [10], music clustering [7,8,9], linguistics [2,12], plagiarism detection [13], question answering [3,4,14] and many more areas.
As mentioned in [1], information distance should be contrasted with the entropy metric: the former is based on Kolmogorov complexity and the latter on Shannon entropy. Various relations between Shannon entropy and Kolmogorov complexity are known [15,16,17]. It is well known that for any computable probability distribution the expected value of Kolmogorov complexity equals the Shannon entropy [18,19]. Linear inequalities that are valid for Shannon entropy are also valid for Kolmogorov complexity, and vice versa [20]. Moreover, many notions have both an entropy-based and a Kolmogorov-complexity-based version, and similar relationships between the two versions have been established. Relations between time-bounded entropy measures and time-bounded Kolmogorov complexity were given in [21]. Relations between Shannon mutual information and algorithmic (Kolmogorov) mutual information were given in [18]. Relations between entropy-based and Kolmogorov-complexity-based notions of cryptographic security have been studied in [22,23,24,25]. One-way functions have been studied in terms of both time-bounded entropy and time-bounded Kolmogorov complexity [26]. However, the relationship between information distance and entropy has not been studied. In this paper, we study the analogous relationship between information distance and the entropy metric. We also analyze whether a similar relationship holds between the normalized information distances and the entropy metric.
The rest of this paper is organized as follows: In Section 2, some basic notions are reviewed. In Section 3, we study the relationship between information distance and the entropy metric. In Section 4, we study the relationship between normalized information distance and the entropy metric. Finally, conclusions are stated in Section 5.

2. Preliminaries

In this paper, let $|x|$ denote the length of the string $x$ and let $\log(\cdot)$ denote $\log_2(\cdot)$.

2.1. Kolmogorov Complexity

Kolmogorov complexity was introduced independently by Solomonoff [27] and Kolmogorov [28] and later by Chaitin [29]. Some basic notions of Kolmogorov complexity are given below; for more details, see [16,17]. We use the prefix-free definition of Kolmogorov complexity. A string $x$ is a proper prefix of a string $y$ if $y = xz$ for some $z \neq \varepsilon$, where $\varepsilon$ is the empty string. A set of strings $A$ is prefix-free if there are no two strings $x$ and $y$ in $A$ such that $x$ is a proper prefix of $y$. For convenience, we use prefix-free Turing machines, i.e., Turing machines with a prefix-free domain.
Let $F$ be a fixed prefix-free optimal universal Turing machine. The conditional Kolmogorov complexity $K(y|x)$ of $y$ given $x$ is defined by
$$K(y|x) = \min\{|p| : F(p,x) = y\},$$
where $F(p,x)$ is the output of the program $p$ with auxiliary input $x$ when it is run on the machine $F$.
The (unconditional) Kolmogorov complexity $K(y)$ of $y$ is defined as $K(y|\varepsilon)$.
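Kolmogorov complexity is uncomputable, so applications such as those cited in the Introduction typically approximate it by the length of a compressed encoding. The following Python sketch shows one such heuristic; the helper names (approx_K, approx_K_cond), the choice of zlib, and the use of C(xy) − C(x) as a stand-in for K(y|x) are our own illustrative assumptions, not exact identities.

```python
import zlib

def approx_K(x: bytes) -> int:
    # Upper-bound proxy for K(x): length in bits of a zlib-compressed encoding.
    # This is only a heuristic stand-in; true Kolmogorov complexity is uncomputable.
    return 8 * len(zlib.compress(x, 9))

def approx_K_cond(y: bytes, x: bytes) -> int:
    # Crude proxy for K(y | x): how much compressing y "costs" once x is available,
    # estimated as C(xy) - C(x), following common practice in the
    # compression-distance literature.
    return max(approx_K(x + y) - approx_K(x), 0)

if __name__ == "__main__":
    x = b"abracadabra" * 50
    y = b"abracadabra" * 50 + b"!"
    print(approx_K(x), approx_K(y), approx_K_cond(y, x))
```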

2.2. Shannon Entropy

Shannon entropy [30] is a measure of the average uncertainty in a random variable. Some basic notions of entropy are given here; for more details, see [16,18]. For simplicity, all random variables mentioned in this paper take values in sets of finite strings.
Let $X, Y$ be two random variables with a computable joint probability distribution $f(x,y)$; the marginal distributions of $X$ and $Y$ are defined by $f_1(x) = \sum_y f(x,y)$ and $f_2(y) = \sum_x f(x,y)$, respectively.
The joint Shannon entropy of $X$ and $Y$ is defined as
$$H(X,Y) = -\sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} f(x,y) \log f(x,y).$$
The Shannon entropy of $X$ is defined as
$$H(X) = -\sum_{x \in \mathcal{X}} f_1(x) \log f_1(x) = -\sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} f(x,y) \log f_1(x).$$
The conditional Shannon entropy of $Y$ given $X$ is defined as
$$H(Y|X) = \sum_{x \in \mathcal{X}} f_1(x) H(Y|X=x) = -\sum_{x \in \mathcal{X}} f_1(x) \sum_{y \in \mathcal{Y}} f(y|x) \log f(y|x) = -\sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} f(x,y) \log f(y|x).$$
The mutual information between the variables $X$ and $Y$ is defined as
$$I(X;Y) = H(X) - H(X|Y).$$
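These quantities can be computed directly from a joint probability table. The sketch below (with an illustrative two-by-two joint distribution chosen only for this example) computes $H(X)$, $H(Y)$, $H(X,Y)$, $H(X|Y)$, $H(Y|X)$ and $I(X;Y)$ using the standard identities $H(X|Y) = H(X,Y) - H(Y)$ and $I(X;Y) = H(X) + H(Y) - H(X,Y)$.

```python
import math

def entropies(joint):
    """Compute H(X), H(Y), H(X,Y), H(X|Y), H(Y|X) and I(X;Y) (in bits)
    from a joint distribution given as {(x, y): probability}."""
    fx, fy = {}, {}
    for (x, y), p in joint.items():
        fx[x] = fx.get(x, 0.0) + p          # marginal f1(x)
        fy[y] = fy.get(y, 0.0) + p          # marginal f2(y)
    H = lambda probs: -sum(p * math.log2(p) for p in probs if p > 0)
    HX, HY, HXY = H(fx.values()), H(fy.values()), H(joint.values())
    return {"H(X)": HX, "H(Y)": HY, "H(X,Y)": HXY,
            "H(X|Y)": HXY - HY, "H(Y|X)": HXY - HX,
            "I(X;Y)": HX + HY - HXY}

if __name__ == "__main__":
    # Illustrative joint distribution over {0,1} x {0,1}.
    joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
    for name, value in entropies(joint).items():
        print(f"{name} = {value:.4f}")
```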
Kolmogorov complexity and Shannon entropy are fundamentally different measures. However, for any computable probability distribution $f$, the Shannon entropy equals the expected value of the Kolmogorov complexity up to $K(f) + O(1)$ [17,18,19]. Conditional Kolmogorov complexity and conditional Shannon entropy are also related.
The following two lemmas are Theorem 8.1.1 from [17] and Theorem 5 from [22], respectively.
Lemma 1.
Let $X$ be a random variable over $\mathcal{X}$. For any computable probability distribution $f(x)$ over $\mathcal{X}$,
$$0 \le \sum_{x} f(x) K(x) - H(X) \le K(f) + O(1).$$
Lemma 2.
Let $X, Y$ be two random variables over $\mathcal{X}, \mathcal{Y}$, respectively. For any computable probability distribution $f(x,y)$ over $\mathcal{X} \times \mathcal{Y}$,
$$0 \le \sum_{x,y} f(x,y) K(x|y) - H(X|Y) \le K(f) + O(1).$$
The following two Lemmas will be used in the next section.
Lemma 3.
There are four positive integers $a, b, c, d$ such that
$$\tfrac{1}{2}\max(a,b) + \tfrac{1}{2}\max(c,d) > \max\!\left(\tfrac{a+c}{2}, \tfrac{b+d}{2}\right).$$
Proof. 
Let $a > b > 0$ and $d > c > 0$; then $\tfrac{1}{2}\max(a,b) + \tfrac{1}{2}\max(c,d) = \tfrac{a+d}{2} > \max\!\left(\tfrac{a+c}{2}, \tfrac{b+d}{2}\right)$. ☐
Lemma 4.
There are four positive integers $a, b, c, d$ such that
$$\tfrac{1}{2}\min(a,b) + \tfrac{1}{2}\min(c,d) < \min\!\left(\tfrac{a+c}{2}, \tfrac{b+d}{2}\right).$$
Proof. 
Let $a > b > 0$ and $d > c > 0$; then $\tfrac{1}{2}\min(a,b) + \tfrac{1}{2}\min(c,d) = \tfrac{b+c}{2} < \min\!\left(\tfrac{a+c}{2}, \tfrac{b+d}{2}\right)$. ☐
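A concrete numeric instance makes both lemmas easy to check; the values $a = 4$, $b = 1$, $c = 1$, $d = 4$ below are illustrative, chosen only so that $a > b > 0$ and $d > c > 0$.

```python
# Numerical check of Lemmas 3 and 4 with illustrative values a > b > 0 and d > c > 0.
a, b, c, d = 4, 1, 1, 4

lhs_max = 0.5 * max(a, b) + 0.5 * max(c, d)   # = (a + d) / 2 = 4.0
rhs_max = max((a + c) / 2, (b + d) / 2)       # = 2.5
assert lhs_max > rhs_max                      # Lemma 3

lhs_min = 0.5 * min(a, b) + 0.5 * min(c, d)   # = (b + c) / 2 = 1.0
rhs_min = min((a + c) / 2, (b + d) / 2)       # = 2.5
assert lhs_min < rhs_min                      # Lemma 4

print(lhs_max, rhs_max, lhs_min, rhs_min)
```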

3. Information Distance Versus Entropy

A metric on a set $\mathcal{X}$ is a function $d: \mathcal{X} \times \mathcal{X} \to \mathbb{R}^{+}$ having the following properties: for every $x, y, z \in \mathcal{X}$,
(i)
$d(x,y) = 0$ if and only if $x = y$;
(ii)
$d(x,y) = d(y,x)$;
(iii)
$d(x,y) + d(y,z) \ge d(x,z)$.
Here, the entropy metric means a metric on the set of all random variables over a given set. $d(X,Y) = H(X|Y) + H(Y|X)$ is a metric [16], and it is easy to verify that $d(X,Y) = \max\{H(X|Y), H(Y|X)\}$ is also a metric.
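The metric properties of both entropy metrics can be spot-checked numerically. The sketch below is our own illustration (the helper names are not from the paper): it draws random joint distributions over three variables and verifies the triangle inequality for both $d(X,Y) = H(X|Y) + H(Y|X)$ and $d(X,Y) = \max\{H(X|Y), H(Y|X)\}$.

```python
import itertools
import math
import random

def H_cond(joint, i, j):
    """H(V_i | V_j) in bits from a joint distribution over tuples (v0, v1, v2)."""
    pij, pj = {}, {}
    for v, p in joint.items():
        pij[(v[i], v[j])] = pij.get((v[i], v[j]), 0.0) + p
        pj[v[j]] = pj.get(v[j], 0.0) + p
    return -sum(p * math.log2(p / pj[k[1]]) for k, p in pij.items() if p > 0)

def random_joint(size=2, seed=None):
    # Random joint distribution over triples from {0, ..., size-1}^3.
    rng = random.Random(seed)
    w = {v: rng.random() for v in itertools.product(range(size), repeat=3)}
    total = sum(w.values())
    return {v: p / total for v, p in w.items()}

# Spot-check the triangle inequality for both entropy metrics on random distributions.
for trial in range(1000):
    joint = random_joint(seed=trial)
    d_sum = lambda i, j: H_cond(joint, i, j) + H_cond(joint, j, i)
    d_max = lambda i, j: max(H_cond(joint, i, j), H_cond(joint, j, i))
    eps = 1e-9  # tolerance for floating-point error
    assert d_sum(0, 1) + d_sum(1, 2) + eps >= d_sum(0, 2)
    assert d_max(0, 1) + d_max(1, 2) + eps >= d_max(0, 2)
print("triangle inequality held on all sampled distributions")
```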
The information distance $E_{\max}(x,y)$ [1], the length of a shortest program that computes $y$ from $x$ and vice versa, is defined as
$$E_{\max}(x,y) = \min\{|p| : F(p,x) = y,\ F(p,y) = x\}.$$
In [1] it is shown that, up to an additive logarithmic term, $E_{\max}(x,y) = \max(K(x|y), K(y|x))$. For this reason, $E_{\max}(x,y)$ is also called the max distance between $x$ and $y$.
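Since $E_{\max}$ is defined through Kolmogorov complexity, it cannot be computed exactly; in practice it is often approximated with a real compressor, in the spirit of the compression distances of [7,9]. The sketch below is such a heuristic proxy, assuming $K(x|y) \approx C(yx) - C(y)$ for a compressor $C$; the function names and example strings are our own.

```python
import os
import zlib

def C(x: bytes) -> int:
    # Compressed length in bits, a heuristic upper bound on K(x).
    return 8 * len(zlib.compress(x, 9))

def approx_E_max(x: bytes, y: bytes) -> int:
    # Heuristic proxy for E_max(x, y) = max(K(x|y), K(y|x)), with K(x|y)
    # approximated by C(yx) - C(y), following compression-distance practice.
    return max(C(y + x) - C(y), C(x + y) - C(x))

if __name__ == "__main__":
    x = b"the quick brown fox jumps over the lazy dog " * 20
    y = x.replace(b"fox", b"cat")   # a small edit of x
    z = os.urandom(len(x))          # an unrelated random string
    print(approx_E_max(x, y), approx_E_max(x, z))  # the first should be much smaller
```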
We show the following relationship between max distance and the entropy metric.
Theorem 1.
Let $X, Y$ be two random variables with a computable joint probability distribution $f(x,y)$, then
$$\max(H(X|Y), H(Y|X)) \le \sum_{x,y} f(x,y) E_{\max}(x,y) \le H(X|Y) + H(Y|X) + 2K(f) + O(1).$$
Proof. 
First, from Lemma 2, we have
$$\sum_{x,y} f(x,y) E_{\max}(x,y) \ge \sum_{x,y} f(x,y) K(x|y) \ge H(X|Y),$$
$$\sum_{x,y} f(x,y) E_{\max}(x,y) \ge \sum_{x,y} f(x,y) K(y|x) \ge H(Y|X).$$
Thus
$$\sum_{x,y} f(x,y) E_{\max}(x,y) \ge \max(H(X|Y), H(Y|X)).$$
Moreover, from Lemma 2, we get
$$\sum_{x,y} f(x,y) E_{\max}(x,y) \le \sum_{x,y} f(x,y)\big(K(x|y) + K(y|x)\big) = \sum_{x,y} f(x,y) K(x|y) + \sum_{x,y} f(x,y) K(y|x) \le H(X|Y) + H(Y|X) + 2K(f) + O(1).$$
 ☐
Remark 1.
From the above theorem, the inequality $\sum_{x,y} f(x,y) E_{\max}(x,y) \ge \max(H(X|Y), H(Y|X))$ holds.
Unfortunately, the reverse inequality $\sum_{x,y} f(x,y) E_{\max}(x,y) \le \max(H(X|Y), H(Y|X))$ does not hold.
For instance, let the joint probability distribution $f(x,y)$ of $X$ and $Y$ be $f(x_1,y_1) = f(x_2,y_2) = 0.5$, and let $a = K(x_1|y_1)$, $b = K(y_1|x_1)$, $c = K(x_2|y_2)$ and $d = K(y_2|x_2)$ with $a \neq b$ and $c \neq d$. Assume, without loss of generality, that $a > b$ and $d > c$; then, from Lemma 3, we have $\sum_{x,y} f(x,y) E_{\max}(x,y) > \max\big(\sum_{x,y} f(x,y) K(x|y), \sum_{x,y} f(x,y) K(y|x)\big)$.
This means that in some cases $\sum_{x,y} f(x,y) E_{\max}(x,y) > \max\big(\sum_{x,y} f(x,y) K(x|y), \sum_{x,y} f(x,y) K(y|x)\big) \ge \max(H(X|Y), H(Y|X))$.
From the above results, we conclude that the approximate equality $\sum_{x,y} f(x,y) E_{\max}(x,y) \approx \max(H(X|Y), H(Y|X))$ does not hold.
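To make the remark concrete, here is the arithmetic for one hypothetical assignment of the conditional complexities; the numbers are invented solely for illustration.

```python
# Illustrative arithmetic for Remark 1: pretend the conditional complexities take
# the values a = K(x1|y1) = 4, b = K(y1|x1) = 1, c = K(x2|y2) = 1, d = K(y2|x2) = 4
# (hypothetical numbers, chosen only so that a > b and d > c).
a, b, c, d = 4, 1, 1, 4

expected_E_max = 0.5 * max(a, b) + 0.5 * max(c, d)   # = 4.0
expected_K_x_given_y = 0.5 * a + 0.5 * c             # = 2.5
expected_K_y_given_x = 0.5 * b + 0.5 * d             # = 2.5

# The expected max distance exceeds the larger of the two expected conditional
# complexities, illustrating the strict inequality used in the remark.
print(expected_E_max, max(expected_K_x_given_y, expected_K_y_given_x))
```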
Because the mutual information between $X$ and $Y$ satisfies $I(X;Y) = H(X) - H(X|Y)$, and hence $H(X|Y) = H(X) - I(X;Y)$ and $H(Y|X) = H(Y) - I(X;Y)$, we have the following result.
Corollary 1.
Let $X, Y$ be two random variables with a computable joint probability distribution $f(x,y)$, then
$$\max(H(X), H(Y)) - I(X;Y) \le \sum_{x,y} f(x,y) E_{\max}(x,y) \le H(X) + H(Y) - 2I(X;Y) + 2K(f) + O(1).$$
The min distance $E_{\min}(x,y)$ [3,4] is defined as
$$E_{\min}(x,y) = \min\{|p| : F(p,x,z) = y,\ F(p,y,r) = x,\ |p| + |z| + |r| \le E_{\max}(x,y)\}.$$
In [3,4] it is shown that $E_{\min}(x,y) = \min(K(x|y), K(y|x))$ holds when a term $O(\log(|x|+|y|))$ is omitted. We then have the following relationship between the min distance and the entropy metric.
Theorem 2.
Let $X, Y$ be two random variables with a computable joint probability distribution $f(x,y)$, then
$$\sum_{x,y} f(x,y) E_{\min}(x,y) \le \min(H(X|Y), H(Y|X)) + K(f) + O(1).$$
Proof. 
From Lemma 2, we have
$$\sum_{x,y} f(x,y) E_{\min}(x,y) \le \sum_{x,y} f(x,y) K(x|y) \le H(X|Y) + K(f) + O(1),$$
$$\sum_{x,y} f(x,y) E_{\min}(x,y) \le \sum_{x,y} f(x,y) K(y|x) \le H(Y|X) + K(f) + O(1).$$
Thus
$$\sum_{x,y} f(x,y) E_{\min}(x,y) \le \min(H(X|Y), H(Y|X)) + K(f) + O(1).$$
 ☐
Remark 2.
From the above theorem, the inequality $\sum_{x,y} f(x,y) E_{\min}(x,y) \le \min(H(X|Y), H(Y|X)) + K(f) + O(1)$ holds.
Unfortunately, from Lemma 4, we know that the reverse inequality $\sum_{x,y} f(x,y) E_{\min}(x,y) \ge \min(H(X|Y), H(Y|X))$ does not hold.
Thus, the approximate equality $\sum_{x,y} f(x,y) E_{\min}(x,y) \approx \min(H(X|Y), H(Y|X))$ does not hold.
Corollary 2.
Let $X, Y$ be two random variables with a computable joint probability distribution $f(x,y)$, then
$$\sum_{x,y} f(x,y) E_{\min}(x,y) \le \min(H(X), H(Y)) - I(X;Y) + K(f) + O(1).$$
The sum distance $E_{\mathrm{sum}}(x,y)$ [1] is defined as
$$E_{\mathrm{sum}}(x,y) = K(x|y) + K(y|x).$$
We then have the following relationship between the sum distance and the entropy metric.
Theorem 3.
Let $X, Y$ be two random variables with a computable joint probability distribution $f(x,y)$, then
$$H(X|Y) + H(Y|X) \le \sum_{x,y} f(x,y) E_{\mathrm{sum}}(x,y) \le H(X|Y) + H(Y|X) + 2K(f) + O(1).$$
Proof. 
First, we have
$$\sum_{x,y} f(x,y) E_{\mathrm{sum}}(x,y) = \sum_{x,y} f(x,y)\big(K(x|y) + K(y|x)\big) = \sum_{x,y} f(x,y) K(x|y) + \sum_{x,y} f(x,y) K(y|x).$$
Then, from Lemma 2, we get
$$0 \le \sum_{x,y} f(x,y) E_{\mathrm{sum}}(x,y) - \big(H(X|Y) + H(Y|X)\big) \le 2K(f) + O(1).$$
 ☐
Corollary 3.
Let $X, Y$ be two random variables with a computable joint probability distribution $f(x,y)$, then
$$H(X) + H(Y) - 2I(X;Y) \le \sum_{x,y} f(x,y) E_{\mathrm{sum}}(x,y) \le H(X) + H(Y) - 2I(X;Y) + 2K(f) + O(1).$$
From the above results we know that, when $f$ is given, up to an additive constant,
$$\sum_{x,y} f(x,y) E_{\mathrm{sum}}(x,y) = H(X|Y) + H(Y|X) = H(X) + H(Y) - 2I(X;Y).$$
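The entropy identity used in the last equality, $H(X|Y) + H(Y|X) = H(X) + H(Y) - 2I(X;Y)$, can be verified numerically; the joint distribution in the sketch below is an arbitrary illustrative example.

```python
import math

def H(probs):
    # Shannon entropy (in bits) of a list of probabilities.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative joint distribution f(x, y).
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
fx, fy = {}, {}
for (x, y), p in joint.items():
    fx[x] = fx.get(x, 0.0) + p
    fy[y] = fy.get(y, 0.0) + p

HX, HY, HXY = H(fx.values()), H(fy.values()), H(joint.values())
I = HX + HY - HXY                 # mutual information
lhs = (HXY - HY) + (HXY - HX)     # H(X|Y) + H(Y|X)
rhs = HX + HY - 2 * I             # H(X) + H(Y) - 2 I(X;Y)
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)
```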

4. Normalized Information Distance Versus Entropy

In this section, we establish relationships between entropy and the normalized versions of information distances.
The normalized version $e_{\max}(x,y)$ [2] of $E_{\max}(x,y)$ is defined as
$$e_{\max}(x,y) = \frac{\max(K(x|y), K(y|x))}{\max(K(x), K(y))}.$$
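As noted in the Introduction, the normalized information distance is uncomputable and is replaced in practice by the normalized compression distance [7,9]. The sketch below computes the standard NCD formula with zlib as the compressor; it is only a heuristic stand-in for $e_{\max}$, and the example strings are our own.

```python
import zlib

def C(x: bytes) -> int:
    # Compressed length, a heuristic upper bound on K(x).
    return len(zlib.compress(x, 9))

def ncd(x: bytes, y: bytes) -> float:
    # Normalized compression distance, the practical stand-in for
    # e_max(x, y) = max(K(x|y), K(y|x)) / max(K(x), K(y)),
    # with K replaced by a real compressor and K(x|y) approximated by C(xy) - C(y).
    cx, cy, cxy = C(x), C(y), C(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

if __name__ == "__main__":
    x = b"information distance " * 100
    y = b"information distance " * 99 + b"entropy metric "
    print(round(ncd(x, y), 3), round(ncd(x, b"a completely different text " * 80), 3))
```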
Theorem 4.
Let $X, Y$ be two random variables with a computable joint probability distribution $f(x,y)$, then
$$\sum_{x,y} f(x,y)\, e_{\max}(x,y) \le \frac{H(X|Y) + H(Y|X) + 2K(f) + O(1)}{\max(H(X), H(Y))}.$$
Proof. 
First, we have
$$e_{\max}(x,y) = \frac{\max(K(x|y), K(y|x))}{\max(K(x), K(y))} \le \frac{K(x|y) + K(y|x)}{K(x)}.$$
Then
$$K(x)\, e_{\max}(x,y) \le K(x|y) + K(y|x).$$
Therefore,
$$\sum_{x,y} f(x,y)\big[K(x)\, e_{\max}(x,y)\big] \le \sum_{x,y} f(x,y)\big(K(x|y) + K(y|x)\big) = \sum_{x,y} f(x,y) K(x|y) + \sum_{x,y} f(x,y) K(y|x).$$
From Lemmas 1 and 2, we have
$$H(X)\Big[\sum_{x,y} f(x,y)\, e_{\max}(x,y)\Big] \le \Big[\sum_{x,y} f(x,y) K(x)\Big]\Big[\sum_{x,y} f(x,y)\, e_{\max}(x,y)\Big] \quad \text{(by Lemma 1)}$$
$$\le \sum_{x,y} f(x,y)\big[K(x)\, e_{\max}(x,y)\big] \le \sum_{x,y} f(x,y) K(x|y) + \sum_{x,y} f(x,y) K(y|x) \le H(X|Y) + H(Y|X) + 2K(f) + O(1) \quad \text{(by Lemma 2)}.$$
Then
$$\sum_{x,y} f(x,y)\, e_{\max}(x,y) \le \frac{H(X|Y) + H(Y|X) + 2K(f) + O(1)}{H(X)}.$$
Similarly, we have
$$\sum_{x,y} f(x,y)\, e_{\max}(x,y) \le \frac{H(X|Y) + H(Y|X) + 2K(f) + O(1)}{H(Y)}.$$
Thus
$$\sum_{x,y} f(x,y)\, e_{\max}(x,y) \le \frac{H(X|Y) + H(Y|X) + 2K(f) + O(1)}{\max(H(X), H(Y))}.$$
 ☐
Corollary 4.
Let $X, Y$ be two random variables with a computable joint probability distribution $f(x,y)$, then
$$\sum_{x,y} f(x,y)\, e_{\max}(x,y) \le \frac{H(X) + H(Y) - 2I(X;Y) + 2K(f) + O(1)}{\max(H(X), H(Y))}.$$
The normalized version $e_{\min}(x,y)$ [3,4] of $E_{\min}(x,y)$ is defined as
$$e_{\min}(x,y) = \frac{\min(K(x|y), K(y|x))}{\min(K(x), K(y))}.$$
Because $e_{\min}(x,y) \le e_{\max}(x,y)$ for all $x, y$ [3,4], the following corollary follows directly from the above theorem.
Corollary 5.
Let $X, Y$ be two random variables with a computable joint probability distribution $f(x,y)$, then
$$\sum_{x,y} f(x,y)\, e_{\min}(x,y) \le \frac{H(X|Y) + H(Y|X) + 2K(f) + O(1)}{\max(H(X), H(Y))}.$$
The normalized version $e_{\mathrm{sum}}(x,y)$ [2] of $E_{\mathrm{sum}}(x,y)$ is defined as
$$e_{\mathrm{sum}}(x,y) = \frac{K(x|y) + K(y|x)}{K(x,y)}.$$
Theorem 5.
Let $X, Y$ be two random variables with a computable joint probability distribution $f(x,y)$, then
$$\sum_{x,y} f(x,y)\, e_{\mathrm{sum}}(x,y) \le \frac{H(X|Y) + H(Y|X) + 2K(f) + O(1)}{H(X,Y)}.$$
Proof. 
Since $e_{\mathrm{sum}}(x,y) = \frac{K(x|y) + K(y|x)}{K(x,y)}$, we have
$$K(x,y)\, e_{\mathrm{sum}}(x,y) = K(x|y) + K(y|x).$$
Therefore,
$$\sum_{x,y} f(x,y)\big[K(x,y)\, e_{\mathrm{sum}}(x,y)\big] = \sum_{x,y} f(x,y) K(x|y) + \sum_{x,y} f(x,y) K(y|x).$$
From Lemmas 1 and 2, we have
$$H(X,Y)\Big[\sum_{x,y} f(x,y)\, e_{\mathrm{sum}}(x,y)\Big] \le \Big[\sum_{x,y} f(x,y) K(x,y)\Big]\Big[\sum_{x,y} f(x,y)\, e_{\mathrm{sum}}(x,y)\Big] \le \sum_{x,y} f(x,y)\big[K(x,y)\, e_{\mathrm{sum}}(x,y)\big] = \sum_{x,y} f(x,y) K(x|y) + \sum_{x,y} f(x,y) K(y|x) \le H(X|Y) + H(Y|X) + 2K(f) + O(1).$$
Thus
$$\sum_{x,y} f(x,y)\, e_{\mathrm{sum}}(x,y) \le \frac{H(X|Y) + H(Y|X) + 2K(f) + O(1)}{H(X,Y)}.$$
 ☐
Corollary 6.
Let $X, Y$ be two random variables with a computable joint probability distribution $f(x,y)$, then
$$\sum_{x,y} f(x,y)\, e_{\mathrm{sum}}(x,y) \le \frac{H(X,Y) - I(X;Y) + 2K(f) + O(1)}{H(X,Y)}.$$

5. Conclusions

As we know, the Shannon entropy of a computable distribution equals the expected Kolmogorov complexity, up to a constant term that depends only on the distribution [17]. We studied whether a similar relationship holds for information distance. Theorem 3 gave the analogous result for the sum distance. We also gave bounds on the expected values of the other (normalized) information distances.

Acknowledgments

The authors would like to thank the referees for their valuable comments and recommendations. Project supported by the Guangxi University Science and Technology Research Project (Grant No. 2013YB193) and the Science and Technology Foundation of Guizhou Province, China (LKS [2013] 35).

Author Contributions

Conceptualization, formal analysis, investigation and writing the original draft is done by Songsong Dai; Validation, review and editing is done by Bo Hu and Lvqing Bi; Project administration and Resources are provided by Bo Hu and Lvqing Bi. All authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bennett, C.H.; Gacs, P.; Li, M.; Vitányi, P.M.B.; Zurek, W. Information distance. IEEE Trans. Inf. Theory 1998, 44, 1407–1423. [Google Scholar] [CrossRef]
  2. Li, M.; Chen, X.; Li, X.; Ma, B.; Vitányi, P.M.B. The similarity metric. IEEE Trans. Inf. Theory 2004, 50, 3250–3264. [Google Scholar] [CrossRef]
  3. Li, M. Information distance and its applications. Int. J. Found. Comput. Sci. 2011, 18, 1–9. [Google Scholar]
  4. Zhang, X.; Hao, Y.; Zhu, X.Y.; Li, M. New information distance measure and its application in question answering system. J. Comput. Sci. Technol. 2008, 23, 557–572. [Google Scholar] [CrossRef]
  5. Terwijn, S.A.; Torenvliet, L.; Vitányi, P.M.B. Nonapproximability of the normalized information distance. J. Comput. Syst. Sci. 2011, 77, 738–742. [Google Scholar] [CrossRef]
  6. Bloem, P.; Mota, F.; Rooij, S.D.; Antunes, L.; Adriaans, P. A safe approximation for Kolmogorov complexity. In Proceedings of the International Conference on Algorithmic Learning Theory, Bled, Slovenia, 8–10 October 2014; Auer, P., Clark, A., Zeugmann, T., Zilles, S., Eds.; Springer: Cham, Switzerland, 2014; Volume 8776, pp. 336–350. [Google Scholar]
  7. Cilibrasi, R.; Vitányi, P.M.B.; Wolf, R. Algorithmic clustering of music based on string compression. Comput. Music J. 2004, 28, 49–67. [Google Scholar] [CrossRef]
  8. Cuturi, M.; Vert, J.P. The context-tree kernel for strings. Neural Netw. 2005, 18, 1111–1123. [Google Scholar] [CrossRef] [PubMed]
  9. Cilibrasi, R.; Vitányi, P.M.B. Clustering by compression. IEEE Trans. Inf. Theory 2005, 51, 1523–1545. [Google Scholar] [CrossRef]
  10. Li, M.; Badger, J.; Chen, X.; Kwong, S.; Kearney, P.; Zhang, H. An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics 2001, 17, 149–154. [Google Scholar] [CrossRef] [PubMed]
  11. Cilibrasi, R.L.; Vitányi, P.M.B. The Google similarity distance. IEEE Trans. Knowl. Data Eng. 2007, 19, 370–383. [Google Scholar] [CrossRef]
  12. Benedetto, D.; Caglioti, E.; Loreto, V. Language trees and zipping. Phys. Rev. Lett. 2002, 88, 048702. [Google Scholar] [CrossRef] [PubMed]
  13. Chen, X.; Francia, B.; Li, M.; McKinnon, B.; Seker, A. Shared information and program plagiarism detection. IEEE Trans. Inf. Theory 2004, 50, 1545–1551. [Google Scholar] [CrossRef]
  14. Bu, F.; Zhu, X.Y.; Li, M. A new multiword expression metric and its applications. J. Comput. Sci. Technol. 2011, 26, 3–13. [Google Scholar] [CrossRef]
  15. Leung-Yan-Cheong, S.K.; Cover, T.M. Some equivalences between Shannon entropy and Kolmogorov complexity. IEEE Trans. Inf. Theory 1978, 24, 331–339. [Google Scholar] [CrossRef]
  16. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: Hoboken, NJ, USA, 2006. [Google Scholar]
  17. Li, M.; Vitányi, P.M.B. An Introduction to Kolmogorov Complexity and Its Applications, 3rd ed.; Springer: New York, NY, USA, 2008. [Google Scholar]
  18. Grünwald, P.; Vitányi, P. Shannon information and Kolmogorov complexity. arXiv 2008. [Google Scholar]
  19. Teixeira, A.; Matos, A.; Souto, A.; Antunes, L. Entropy measures vs. Kolmogorov complexity. Entropy 2011, 13, 595–611. [Google Scholar] [CrossRef]
  20. Hammer, D.; Romashchenko, A.; Shen, A.; Vereshchagin, N. Inequalities for Shannon entropies and Kolmogorov complexities. J. Comput. Syst. Sci. 2000, 60, 442–464. [Google Scholar] [CrossRef]
  21. Pinto, A. Comparing notions of computational entropy. Theory Comput. Syst. 2009, 45, 944–962. [Google Scholar] [CrossRef]
  22. Antunes, L.; Laplante, S.; Pinto, A.; Salvador, L. Cryptographic Security of Individual Instances. In Information Theoretic Security; Springer: Berlin/Heidelberg, Germany, 2009; pp. 195–210. [Google Scholar]
  23. Kaced, T. Almost-perfect secret sharing. In Proceedings of the 2011 IEEE International Symposium on Information Theory Proceedings (ISIT), Saint Petersburg, Russia, 31 July–5 August 2011; pp. 1603–1607. [Google Scholar]
  24. Dai, S.; Guo, D. Comparing security notions of secret sharing schemes. Entropy 2015, 17, 1135–1145. [Google Scholar] [CrossRef]
  25. Bi, L.; Dai, S.; Hu, B. Normalized unconditional ϵ-security of private-key encryption. Entropy 2017, 19, 100. [Google Scholar] [CrossRef]
  26. Antunes, L.; Matos, A.; Pinto, A.; Souto, A.; Teixeira, A. One-way functions using algorithmic and classical information theories. Theory Comput. Syst. 2013, 52, 162–178. [Google Scholar] [CrossRef]
  27. Solomonoff, R. A formal theory of inductive inference—Part I. Inf. Control 1964, 7, 1–22. [Google Scholar] [CrossRef]
  28. Kolmogorov, A. Three approaches to the quantitative definition of information. Probl. Inf. Transm. 1965, 1, 1–7. [Google Scholar] [CrossRef]
  29. Chaitin, G. On the length of programs for computing finite binary sequences: Statistical considerations. J. ACM 1969, 16, 145–159. [Google Scholar] [CrossRef]
  30. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 623–656. [Google Scholar] [CrossRef]
