Abstract
We generalize the point information gain (PIG) and its derived quantities, i.e., the point information gain entropy (PIE) and the point information gain entropy density (PIED), to the case of the Rényi entropy and simulate the behavior of PIG for typical distributions. We also use these methods for the analysis of multidimensional datasets. We demonstrate the main properties of the PIE/PIED spectra on real data, using several images as examples, and discuss further possible applications in other fields of data processing.
1. Introduction
Measurement of the relative information between two probability distributions is one of the most important goals of information theory. Among many such concepts, two are widely used. By far the most widespread is the relative Shannon entropy, also known as the Kullback–Leibler divergence. In this work, we use an alternative approach based on a simple entropy difference instead. By generalizing both concepts from Shannon's to Rényi's approach, we obtain a whole class of information variables that make it possible to focus on different parts of probability distributions and to interpret this focusing as an investigation of different parts of multifractal systems.
Despite the mathematical precision of the Shannon/Rényi divergence, we use another concept, the (Rényi) entropy difference, to introduce a value which locally determines the information contribution of a given element in a discrete set. Even though there is no substantial restriction on using a standard divergence to calculate the information difference upon elimination of one element from a set, for practical reasons we used the simple entropy difference between the sets with and without the given element. The resulting value has been called the point information gain [1,2]. The goal of this article is to examine and demonstrate some properties of this variable and to derive further quantities, namely the point information gain entropy and the point information gain entropy density. We also introduce the relation of all these variables to global and local information in multidimensional data analysis.
2. Mathematical Description and Properties of Point Information Gain
2.1. Point Information Gain and Its Relation to Other Information Entropies
An important problem in information theory is to estimate the amount of information gained or lost when a probability distribution P is refined or approximated by another distribution Q. The most popular measure used in the theory is the Kullback–Leibler (KL) divergence, defined as

D_KL(P‖Q) = Σ_j p_j ln(p_j/q_j) = H(P, Q) − H(P),    (1)

where H(P, Q) = −Σ_j p_j ln q_j is the so-called cross-entropy [3] and H(P) = −Σ_j p_j ln p_j is the Shannon entropy of distribution P. If P is similar to Q, this measure can be approximated by the entropy difference

ΔH = H(Q) − H(P).    (2)
Indeed, this measure does not obey as many measure-theoretic axioms as the KL divergence. For instance, we can obtain ΔH = 0 even for P ≠ Q. Nevertheless, for similar distributions, this value can still be a suitable quantity revealing some important information aspects of a system. Particularly interesting is the situation when the distributions are approximative histograms of some underlying distribution P for n and (n + 1) measurements, respectively. In this case, the entropy difference

ΔH_{n+1} = H(P_{n+1}) − H(P_n)    (3)

can be interpreted as the information gained by the (n + 1)-th measurement. Naturally, ΔH_{n+1} → 0 for n → ∞. When dealing with real complex systems, it is sometimes advantageous to introduce new information variables and entropies that capture the complexity of the system better, e.g., Hellinger's distance, Jeffreys' distance, or the J-divergence. There are also some specific information measures that have special interpretations and are widely used in various applications [4,5]. Two of the most important quantities are the Tsallis–Havrda–Charvát (THC) entropy [6], which is the entropy of non-extensive systems, and the Rényi entropy, the entropy of multifractal systems [7,8]. The latter is tightly connected to the theory of multifractal systems and generalized dimensions [9]. It is defined as

I_α(P) = 1/(1 − α) ln( Σ_j p_j^α ),    (4)

where α is the Rényi coefficient and p_j is the probability of occurrence of a phenomenon j in the discrete distribution. The limit α → 1 recovers the Shannon entropy. Similar to the Shannon entropy, the Rényi entropy also has an operational meaning: it can be interpreted as the average information cost when the cost of an elementary piece of information is an exponential function of its length [10]. Thus, changing the parameter α changes the cost of the information and therefore accentuates some parts of the probability distribution while suppressing the others. By taking into account the whole class of Rényi entropies, we thus obtain a new generalized class of information quantities.
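As a small illustration of this α-dependence (not taken from the paper's released software), the following MATLAB snippet evaluates the Rényi entropy of Equation (4) for a skewed five-outcome distribution at several values of α; lower α weights the rare outcomes more heavily, higher α the frequent ones:

```matlab
% Illustration (not from the paper's software): Renyi entropy of a skewed
% five-outcome distribution for several alpha (natural logarithm).
p = [0.7 0.2 0.05 0.03 0.02];
for alpha = [0.5 2 4]
    R = log(sum(p.^alpha)) / (1 - alpha);   % Equation (4)
    fprintf('alpha = %.1f:  I_alpha = %.4f\n', alpha, R);
end
```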
The point information gain Γ_α^(i) of the i-th point was developed as a practical tool for assessing the information contribution of an element to a given discrete distribution [11]. Similar to the Shannon entropy difference, it is defined as a difference of two Rényi entropies—with and without the examined element of a discrete phenomenon. Let us consider a discrete distribution of k distinct possible outcomes (e.g., different colors of pixels). Let us have a discrete distribution

P = {p_1, …, p_k},  p_j = n_j/n,    (5)

where n denotes the total number of elements in the discrete distribution and n_i the number of elements of the i-th phenomenon, so that p_i = n_i/n. Let us denote p_j^(i) = n_j/(n − 1) for j ≠ i and p_i^(i) = (n_i − 1)/(n − 1). Then, the distribution with one element of the i-th phenomenon omitted can be written as

P^(i) = {p_1^(i), …, p_k^(i)}.    (6)

Hence, we may write the point information gain as

Γ_α^(i) = I_α(P^(i)) − I_α(P) = 1/(1 − α) [ln Σ_{j=1}^{k} (p_j^(i))^α − ln Σ_{j=1}^{k} p_j^α],    (7)
where k is the total number of the phenomena in the discrete distribution. In the rest of the text, we use the natural logarithm to simplify the calculations. However, all computations have been performed using the binary logarithm which, for the Rényi entropy and its derivatives, yields values in bits. In contrast to the commonly used Rényi divergence [12,13,14,15,16,17,18], we use Γ_α^(i) for its relative simplicity and practical interpretation. Unlike the KL divergence, the Rényi divergence cannot be interpreted as a difference of the cross-entropy and the entropy of the underlying distribution, and its computation becomes intractable. As discussed above, for similar distributions, the entropy difference still preserves its informational value.
After the substitution for the probabilities, one gets

Γ_α^(i) = α/(1 − α) ln[n/(n − 1)] + 1/(1 − α) ln[(Σ_{j=1}^{k} n_j^α − n_i^α + (n_i − 1)^α) / Σ_{j=1}^{k} n_j^α],    (8)

where the first term depends only on n. Even for large n and k, the whole entropy difference remains finite (contrary to the unconditional entropy, which has to be renormalized in the continuous case; for details, see Reference [7]). Therefore, we examine only the second term. When the argument of the logarithm is close to 1, i.e., when

n_i^α − (n_i − 1)^α ≪ Σ_{j=1}^{k} n_j^α

for a given α, one can approximate the logarithm by the Taylor expansion of the first order. After denoting
the second term of Γ_α^(i) can be approximated as
where we used the big-O asymptotic notation. Let us note that the last term in Equation (8) is nothing else than the THC entropy [6,19]. Naturally, for very similar distributions, these two quantities are practically the same. This is due to the fact that, for large n, the omission of one point has no large impact on the whole distribution. Consequently, the actual value of the parameter α, which leads to a rescaling of the probabilities, is more important than the particular form of the entropy.
We shall continue by utilizing the Rényi entropy due to its relation to the generalized dimension of multifractal systems [20,21]. Let us concentrate again on the second term. We can rewrite it as

where we use the small-o asymptotic notation. Specifically, for α = 2, we obtain

which explains why the dependency on n_i is approximately linear. In general, the point information gain is a monotone function of n_i (respectively p_i) for all possible discrete distributions. Thus, it may be used as a measure of the information gained between two discrete distributions that differ in the occurrence of one particular feature.
Let us now discuss an interpretation of the point information gain. We can rewrite Equation (8) as

We are interested in the situation when Γ_α^(i) = 0. After straightforward manipulations, we can eliminate the logarithm and the power, so that

If n and n_i are sufficiently large, we can approximate both sides with the first-order rule (1 + x)^γ ≈ 1 + γx for x close to zero, which gives

Thus, we end up with
This shows that Γ_α^(i) ≈ 0 holds for events with average frequency; Γ_α^(i) < 0 corresponds to rare events, while Γ_α^(i) > 0 corresponds to frequent events. Thus, besides quantifying the contribution of each event to the examined distribution, we also obtain a discrimination between the points according to how they contribute to the total information of the given distribution under the statistical emphasis represented by a particular α. This opens the question of the existence of an "optimal" distribution for the given α.
Two possible variants of such optimality then arise: the first one can be defined as a distribution for which exactly half of the values produce Γ_α^(i) > 0 and the other half yield Γ_α^(i) < 0. The second one requires the Γ_α^(i) values to be spaced equally. The existence of such a distribution could be understood as another generalization of the concept of the entropy power [22,23], and we leave this question to future research.
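To make this discrimination concrete, the following toy MATLAB check (an arbitrary three-bin example of ours, not taken from the paper) evaluates Γ_α^(i) for one frequent and one rare phenomenon directly from the definition of Equation (7):

```matlab
% Toy check of the sign convention (arbitrary three-bin histogram of ours).
H = [70 29 1];  alpha = 0.99;  n = sum(H);
Swith = sum((H / n).^alpha);                    % sum of p_j^alpha with the element
for i = [1 3]                                   % a frequent bin, then a rare bin
    Hi = H;  Hi(i) = Hi(i) - 1;                 % omit one element of phenomenon i
    q = Hi / (n - 1);
    Swithout = sum(q(q > 0).^alpha);            % sum of (p_j^(i))^alpha without it
    Gamma = (log(Swithout) - log(Swith)) / (1 - alpha);   % Equation (7)
    fprintf('bin %d (count %2d): Gamma = %+.6f\n', i, H(i), Gamma);
end
% Expected: Gamma > 0 for the frequent bin, Gamma < 0 for the rare bin.
```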
With respect to the previous discussion and the practical utilization of this notion, we emphasize that, for real systems with large n, the Γ_α^(i) values are very small numbers relative to the numerical precision of common computers. Their further averaging and numerical representation lead to significant errors such as underflow and overflow (e.g., Figure 1c). At lower values of α, the Γ_α^(i) values are broadly separated for rare points, while, at higher values of α, the resolution is higher for more frequent data points. Therefore, it is more advisable to compute the whole Γ_α^(i) spectrum vs. α rather than a single value at a chosen α.
Figure 1.
Γ_α^(i)-transformations of the discretized Lévy (a), Cauchy (b), and Gauss (c) distributions at α = 0.99. The deviation from the monotone dependency in the Gauss distribution is due to the digital rounding.
2.2. Point Information Gain for Typical Distributions
In Figure 1, we demonstrate Γ_α^(i)-transformations of three thoroughly studied distributions—the Lévy, Cauchy, and Gauss distributions (specified in Section 4.1). In particular, Figure 1c shows the averaging of digital levels, which results in the multiple appearance of values that should be unique. This phenomenon is reduced with an increasing number of points in the distribution; nevertheless, it does not disappear in any real case. Thus, the monotone dependencies of Γ_α^(i) on n_i (respectively p_i) are valid only in the approximation of an infinite resolution in the levels of values.
Figure 2 shows the changes of the distribution of the Γ_α^(i) values with the increasing α-parameter. For each parameter α, the elements are enveloped by monotone increasing curves. For instance, as derived in Equation (14), the near linearity of the dependency of the number of elements on the Γ_α^(i) values at α = 2 is seen in Figure 2d. The differences between the distributions are expressed by the distributions of the Γ_α^(i) values along the horizontal axes.
Figure 2.
Γ_α^(i)-transformations of the discretized Lévy distribution at six increasing values of α (panels (a)–(f)).
2.3. Point Information Gain Entropy and Point Information Gain Entropy Density
In the previous sections, we showed that Γ_α^(i) is different for any n_i and that the dependency of these two variables is a monotone increasing function for all possible discrete distributions. Here, we propose new variables—a point information gain entropy (H_α) and a point information gain entropy density (Ξ_α)—defined by the formulas

H_α = Σ_{j=1}^{k} n_j Γ_α^(j)    (19)

and

Ξ_α = Σ_{j=1}^{k} Γ_α^(j),    (20)

where the first sum runs over all n elements of the distribution (each element contributes the gain of its phenomenon) and the second sum contains one contribution per phenomenon j. They can be understood as a multiple of the average point information gain and—under linear averaging—as a multiple of the average gain of the phenomena j, respectively.
The information content is generally measured by the entropy. The famous Shannon source coding theorem [24] refers to a specific process of transmission of a discretized signal and the introduction of noise. The Rényi entropy is one of a class of one-parametric entropies and offers numerous additional features over the Shannon entropy [7,12,25], such as the determination of a generalized dimension of a strange attractor [20,21]. The universality of the generalized dimension for the characterization of any distribution, whose regularity may be only coincidental, is still under dispute. However, the values H_α and Ξ_α characterize a given distribution for any α. Differences between distributions are expressed in the counts along the Γ_α^(i) axes. Therefore, independently of the mechanisms of generation of the distributions, the H_α/Ξ_α values can serve for the comparison of these distributions. This holds for both parametric and non-parametric distributions.
The next question is whether the Ξ_α has some expected properties. In this respect, we mention the facts observed upon examination of Equation (12), which enable us to rewrite it as

where the product in the argument of the logarithm in the second term is a product of functions bounded above by 1 and is thus again a function bounded above by 1. From the previous analysis done for the Γ_α^(i), we may conclude that the point information gain entropy density (Ξ_α) inherits the properties of the Rényi entropy, e.g., its zooming properties.
Similar to Equation (21), the point information gain entropy (H_α) can be rewritten as

Again, the argument of the logarithm in the second term is bounded above by 1. The H_α also has properties inherited from the Rényi entropy, although their mutual relation is more complicated.
3. Estimation of Point Information Gain in Multidimensional Datasets
3.1. Point Information Gain in the Context of Whole Image
The point information gain introduced in Equation (7) was originally applied to image enhancement [1,2]. A typical digital image is an x × y × n matrix of values, where x and y are the dimensions of the image and n corresponds to the number of color channels (e.g., n is 1 and 3 for a monochrome and an RGB image, respectively). In most cases, the intensity values are in the range from 0 to 255 (an 8-bit image) or from 0 to 4095 (a 12-bit image) for each color channel. For any size and bit depth of an image, we can compute the global information Γ_α^(i) (Algorithm 1) provided by the occupied intensity bin i and evaluate it as a change of the entropy of the probability intensity histogram after removing a point from this bin.
For each parameter α, the calculation of Γ_α^(i) helps to find the intensities with identical occurrences and to determine their distribution in (a structural part of) the image. Thus, in general, the recalculations to Γ_α^(i) can be considered as Look-Up Tables—intensities with the highest probabilities of occurrence in an image correspond to the highest (positive) Γ_α^(i) values and to the brightest intensities in a Γ_α^(i)-transformed image, and vice versa. Sometimes, mainly in the case of local information, some Γ_α^(i) levels are merged into one intensity level of the transformed image due to the transformation of the original values into an 8-bit resolution.
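As an illustration of this Look-Up-Table view, a globally Γ_α-transformed image can be produced roughly as follows (a sketch under our assumptions: an 8-bit greyscale image I, the imhist/mat2gray functions of the Image Processing Toolbox, and the hypothetical pig_global helper sketched after Algorithm 1 below):

```matlab
% Sketch of the Look-Up-Table step for a global Gamma-transformation of an
% 8-bit greyscale image I (imhist, mat2gray: Image Processing Toolbox;
% pig_global: hypothetical helper sketched after Algorithm 1).
H = imhist(I);                          % 256-bin intensity histogram
Gamma = pig_global(H, 0.99);            % one Gamma value per intensity level
Gimg = Gamma(double(I) + 1);            % look up the Gamma value of every pixel
Gvis = uint8(255 * mat2gray(Gimg));     % full rescaling into 8-bit resolution
```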
| Algorithm 1: Point information gain vector (Γ_α), point information gain entropy (H_α), and point information gain entropy density (Ξ_α) calculations for global (Whole image) information and typical histograms. |
| Input: n-bin frequency histogram H; α, where α ≥ 0 ∧ α ≠ 1 |
| Output: Γ_α; H_α; Ξ_α |
| 1 P ← H/sum(H); % express the frequency histogram as a probability histogram |
| 2 Γ_α ← zeros(size(H)); % create a zero matrix of the size of the histogram |
| 3 for i ← 1 to n do |
| 4–9 (the loop body, which computes the Rényi entropy difference Γ_α(i) for each occupied bin i, appears only as an image in the source; see the MATLAB sketch below) |
| 10 end |
| 11 H_α ← sum(Γ_α .* H); % calculate H_α as a sum of the element-by-element multiplication of Γ_α and H (Equation (19)) |
| 12 Ξ_α ← sum(unique(Γ_α)); % calculate Ξ_α as a sum of all unique values in Γ_α (Equation (20)) |
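For readers without access to the released software, the following MATLAB function is a minimal sketch of Algorithm 1 as we read it; the name pig_global, the vectorization, and the handling of empty bins are our illustrative choices, not the authors' implementation:

```matlab
function [Gamma, PIE, PIED] = pig_global(H, alpha)
% Minimal sketch of Algorithm 1 (our reading, not the released code):
% global point information gain for an n-bin frequency histogram H and a
% Renyi coefficient alpha (alpha >= 0, alpha ~= 1); natural logarithm.
    H = H(:)';                           % bin counts as a row vector
    n = sum(H);                          % total number of elements
    p = H / n;                           % probability histogram
    Swith = sum(p(p > 0).^alpha);        % sum_j p_j^alpha over occupied bins
    Gamma = zeros(size(H));              % one Gamma value per bin
    for i = find(H > 0)
        Hi = H;  Hi(i) = Hi(i) - 1;      % omit one element of phenomenon i
        q = Hi / (n - 1);                % renormalized probabilities
        Swithout = sum(q(q > 0).^alpha);
        % Renyi entropy without the element minus Renyi entropy with it
        Gamma(i) = (log(Swithout) - log(Swith)) / (1 - alpha);
    end
    PIE  = sum(H .* Gamma);              % Equation (19): each element adds the gain of its bin
    PIED = sum(unique(Gamma(H > 0)));    % Equation (20): sum of the unique Gamma values
end
```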
Everything is best visualized in Figure 3 and Figure 4, which show the Γ_α^(i)-transformations of the texmos2.s512 image. The intention was probably to create an image with a uniform distribution of intensities. Provided a uniform intensity distribution, the output of the global Γ_α^(i)-calculation would be only one value, i.e., Figure 3b would be unicolor. However, the eight original intensities (Figure 3a) resulted in five Γ_α^(i) values (i.e., local parts) (Figure 4b,d). The detailed image analysis showed that the number of occurrences is identical only for intensities 32–224 and 96–128–192, i.e., there are five unique values of the frequencies of intensity occurrences (Figure 4a). In contrast, in the 4.1.07 image, the global Γ_α^(i)-recalculation emphasizes the unevenness of the background and the shadows around the group of jelly beans (Figure 5b). In conformity with the statement in the next-to-last paragraph of Section 2.1, this principle also enables the highlighting of rare points in images with a rich spectrum of intensities, mainly at low α-values. The calculations using higher values of α do not highlight rare points so intensively, and the resulting image is smoother.
Figure 3.
Γ_α^(i)-transformations of the texmos2.s512 image [26]. Original image (a) and information images calculated from the whole image (b), from a cross around each pixel (c), and from squares with sides of 5, 15, and 29 px, respectively, centered on the examined pixel (d–f).
Figure 4.
Histograms of Γ_α^(i)-transformations of the texmos2.s512 image [26]. Original image (a), original Γ_α^(i) values calculated from the whole image (b), original Γ_α^(i) values calculated from a cross whose shanks intersect in the examined pixel (c), Γ_α^(i)-transformed images calculated from the whole image (d), and Γ_α^(i)-transformed images calculated from a cross around each pixel (e). Colors in the original and globally (whole image) transformed histograms correspond to the intensity levels with identical frequencies of occurrence in the original image.
Figure 5.
Γ_α^(i)-transformations of the 4.1.07 image [26]. Original image (a) and information images calculated from the whole image (b), from a cross around each pixel (c), and from circles with diameters of 5, 17, and 30 px, respectively, centered on the examined pixel (d–f).
3.2. Local Point Information Gain
Since multidimensional datasets, such as images, consist of special structures given by the pixel lattice, it can also be beneficial to calculate not only the global information gain but also a local information gain in some defined surroundings (Algorithm 2). The local information is again defined via removing an element from the bin i, where this element lies in the center of the surroundings from which the intensity histogram is created. The choice of the local surroundings around the pixels is specific for each image. However, we do not have any systematic method for comparing the suitability of different surroundings around the pixels. The suitability of the chosen surroundings obviously depends on the process by which the observed pattern or other distribution was generated. To our knowledge, the choice of the appropriate surroundings on the basis of known image generation has been studied only for cellular automata [27,28,29]. This makes the study of the local information very interesting because it outlines another method for the recognition of the processes of self-organization/pattern formation [30]. In this article, we confine ourselves to the usage of the local information for a better understanding of both the limitations of the method of the Γ_α^(i)-calculation and the local information itself. The cross, square, and circular surroundings around each pixel are demonstrated on three different standard images—texmos2.s512 (monochrome, computer-generated, unifractal), 4.1.07 (RGB, photograph, unifractal) [26], and wd950112 (monochrome version, computer-generated, multifractal) [31].
The cross of intensity values whose shanks meet in the examined point of the original image [1] was chosen as the first local surroundings. In contrast to the global recalculation, such a transformation of the texmos2.s512 image produces a substantially richer Γ_α-image. One can see that the relatively simple global information consists of more complex local information (Figure 4a,c,e).
However, the cross-local type of the image transformation is the least suitable approach for the analysis of the photograph of the jelly beans (Figure 5c). In this case, a circular local element is recommended instead. As seen in Figure 5d–f, increasing the diameter up to the size of the jelly beans gradually reduces the background. A further increase enables the grouping of the jelly beans into higher-order assemblies. A similar grouping is observable for the smallest squares in the texmos2.s512 image transformed using the 29 px square surroundings (Figure 3f). In contrast, smaller square surroundings (Figure 3d) highlight only the border intensities.
| Algorithm 2: Point information gain matrix (Γ_α), point information gain entropy (H_α), and point information gain entropy density (Ξ_α) calculations for local kinds of information. Parameters a and b are the semiaxes of the elliptical surroundings and the half-widths of the rectangular surroundings, respectively; a = 0 and b = 0 for the cross surroundings. |
| Input: 2D discrete data (image matrix); α, where α ≥ 0 ∧ α ≠ 1; parameters of the surroundings |
| Output: Γ_α; H_α; Ξ_α |
| 1 Γ_α ← zeros; % create a zero matrix of the size of the data matrix |
| 2 M ← containers.Map; % declare an empty hash-map (a key–value array) |
| 3 for each examined pixel do |
| 4–15 (the loop body, which builds the intensity histogram of the surroundings of each pixel and computes its Γ_α value, appears only as an image in the source; see the MATLAB sketch below) |
| 16 end |
| 17 H_α ← sum(Γ_α(:)); % calculate H_α as a sum of all elements in the matrix (Equation (19)) |
| 18 Ξ_α ← sum of the values stored in the hash-map M; % calculate Ξ_α as a sum over all unique microstates (Equation (20)) |
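Analogously, the following MATLAB function is a minimal sketch of Algorithm 2 for a square surroundings of half-width w; the function name, the interior-only processing, and the hash-map key (surroundings histogram plus centre bin) are our illustrative assumptions, and the released software additionally supports cross, elliptical, and rotated surroundings:

```matlab
function [G, PIE, PIED] = pig_local_square(I, alpha, w)
% Minimal sketch of Algorithm 2 (our assumptions, not the released code):
% local point information gain over a (2w+1)x(2w+1) square surroundings
% centred on each pixel of a greyscale image I; only the interior of the
% image, where the surroundings fits completely, is processed.
    I = double(I);
    [rows, cols] = size(I);
    G = zeros(rows - 2*w, cols - 2*w);                  % local PIG matrix
    micro = containers.Map('KeyType', 'char', 'ValueType', 'double');
    for x = w+1 : rows-w
        for y = w+1 : cols-w
            patch = I(x-w : x+w, y-w : y+w);            % surroundings incl. the centre pixel
            [bins, ~, idx] = unique(patch(:));          % intensity levels present
            H = accumarray(idx, 1);                     % their frequencies
            n = sum(H);
            Swith = sum((H / n).^alpha);
            c = find(bins == I(x, y));                  % bin of the centre pixel
            Hi = H;  Hi(c) = Hi(c) - 1;                 % omit the centre pixel
            q = Hi / (n - 1);
            Swithout = sum(q(q > 0).^alpha);
            g = (log(Swithout) - log(Swith)) / (1 - alpha);
            G(x - w, y - w) = g;
            % one plausible reading of the "unique microstate" rule: key the
            % hash-map on the surroundings histogram and the centre bin
            micro(sprintf('%g ', [bins; H; c])) = g;
        end
    end
    PIE  = sum(G(:));                                   % Equation (19)
    PIED = sum(cell2mat(values(micro)));                % Equation (20), over unique microstates
end
```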
3.3. Point Information Gain Entropy and Point Information Gain Entropy Density
From the point of view of thermodynamics, the H_α and Ξ_α can be considered as additive, homological state variables whose knowledge can be helpful in the analysis of multidimensional (image) data as well [32]. Despite the relative similarity of their formulas (Section 2.3), the H_α can be defined as a sum of all information contributions to the data distribution, either the global or a partial one, i.e., of all Γ_α^(i), whereas the Ξ_α is a sum over all information microstates of the distribution. Even in the case of the local information, any two (collision) histograms with the same proportional representation of the frequencies of elements, which were obtained from distributions around two pixels at different positions and differ only in the positions of the frequencies in the histogram, are considered to be unique microstates and produce unique Γ_α values (see Algorithm 2). Thus, in agreement with the predictions arising from Equations (19) and (20), the Ξ_α-calculation does not suppress the contributions of elements with low probabilities of occurrence (rare points) and is more robust and stable against changes in the local surroundings. This phenomenon manifests itself in the lower differences between the Ξ_α(α) dependencies for the four square surroundings in comparison with the H_α(α) dependencies in Figure 6. Nevertheless, it is worth noting that, during the calculation with local geometrical surroundings, the surroundings at most touch the edges of the image, and only the interior part of the image is processed. This technical limitation negatively influences the H_α and Ξ_α values for the square surroundings in Figure 6 and also leads to smaller sizes of the Γ_α-transformed images (e.g., Figure 3d–f and Figure 5d–f).
Figure 6.
Spectra H_α(α) and Ξ_α(α) for global information and different local surroundings of a unifractal (texmos2.s512 [26], column (a)) and a multifractal (wd950112 [31], column (b)) image at α = {0.1, 0.2, ..., 0.9, 0.99, 1.1, 1.2, ..., 4.0}.
Plotting the H_α and Ξ_α vs. α in Figure 6 is not arbitrary. As mentioned for the Γ_α^(i) calculations (Section 2.1), multidimensional discrete (image) data are suitably characterized not by one discrete value, either H_α or Ξ_α, at a particular α, but rather by their α-dependent spectra. The reason is not only to avoid digital rounding, but also possibly to characterize the type and the origin of the geometrical structures in the image (cf. Section 3.1). Another application has been found in the statistical evaluation (clustering) of time-lapse multidimensional datasets [32,33]. This calculation method was originally developed for the study of multifractal self-organizing biological images [34]; however, it enables the description of any type of image. Since parts of an image are forms of complex structures, the best way to interpret the image is to use a combination of its global and local kinds of information. We demonstrate this fact on the example of a unifractal (almost non-fractal) Euclidian image and a computer-generated multifractal image (Figure 6). Whereas the Euclidian image gives monotone H_α(α) and Ξ_α(α) spectra (for the global and cross-local kinds of information, even linear dependencies over the particular discrete interval of α values), the recalculation of the multifractal image shows extremes at values of α close to 1. Analogous dependencies were also plotted for the image sets of the course of the self-organizing Belousov–Zhabotinsky reaction [32].
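For the global kind of information, such a spectrum can be generated, for example, by sweeping α over an image histogram (a sketch reusing the hypothetical pig_global helper from Section 3.1; imhist requires the Image Processing Toolbox):

```matlab
% Sketch of a global PIE/PIED spectrum of a greyscale image (pig_global:
% hypothetical helper from Section 3.1; imhist: Image Processing Toolbox).
I = imread('texmos2.s512.png');
H = imhist(I);                                   % 256-bin intensity histogram
alphas = [0.1:0.1:0.9, 0.99, 1.1:0.1:4.0];       % grid used in Section 4.1
PIE = zeros(size(alphas));  PIED = zeros(size(alphas));
for k = 1:numel(alphas)
    [~, PIE(k), PIED(k)] = pig_global(H, alphas(k));
end
plot(alphas, PIE, '-o');   xlabel('\alpha');  ylabel('H_\alpha');
figure;
plot(alphas, PIED, '-o');  xlabel('\alpha');  ylabel('\Xi_\alpha');
```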
4. Materials and Methods
4.1. Processing of Images and Typical Histograms
The values of Γ_α^(i), H_α, and Ξ_α for all typical histograms and images were computed using Equations (7), (19), and (20). The algorithms are described in Section 4.2. The software and scripts, as well as the results of all calculations, are available via ftp (Appendix).
For the Cauchy, Lévy, and Gauss distributions, histograms of the dependencies of the number of elements on the Γ_α^(i) values were calculated for α = {0.1, 0.3, 0.5, 0.7, 0.99, 1.3, 1.5, 1.7, 2.0, 2.5, 3.0, 3.5, 4.0} using a Matlab® script (Mathworks, Natick, MA, USA). The following probability density functions were studied:
- (a) Lévy distribution: f(x) = √(c/(2π)) · e^(−c/(2x)) / x^(3/2), x > 0;
- (b) Cauchy distribution: f(x) = c / [π(x² + c²)];
- (c) Gauss distribution: f(x) = 1/(σ√(2π)) · e^(−(x−c)²/(2σ²)).
In Figure 1 and Figure 2, the Cauchy and Lévy distributions with c = 7 and the Gauss distribution with parameters c = 4 and σ = 10 are depicted.
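For instance, the Γ_α-histogram of a discretized Gauss distribution can be reproduced roughly as follows (a sketch only—the released pig_histograms.m script may differ; pig_global is the hypothetical helper from Section 3.1):

```matlab
% Sketch (not the released pig_histograms.m): Gamma-transformation of a
% discretized Gauss sample with c = 4 and sigma = 10 at alpha = 0.99.
x = round(4 + 10 * randn(1e5, 1));      % discretized Gauss-distributed values
[bins, ~, idx] = unique(x);             % occupied integer levels
H = accumarray(idx, 1);                 % frequency histogram
Gamma = pig_global(H, 0.99);            % hypothetical helper from Section 3.1
histogram(Gamma(idx));                  % occurrences of Gamma values (cf. Figure 1c)
```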
Multidimensional image analysis based on the calculation of Γ_α^(i), H_α, and Ξ_α was tested on 5 standard 8-bpc images (Table 1). Before the computations, the original images wd950112.gif and 6ASCP011.gif obtained from [31] were transformed into monochrome *.png formats in the Matlab® software. All images were processed using the Image Info Extractor Professional software (Institute of Complex Systems, University of South Bohemia, Nové Hrady, Czech Republic) for α = {0.1, 0.2, ..., 0.9, 0.99, 1.1, 1.2, ..., 4.0}. The global information was extracted using the Whole image calculation (the italics refer to parameters which are set in the Image Info Extractor Professional software). The vertical–horizontal cross, square (a side of 5, 11, 15, and 29 px, respectively), and circle (a radius of 2, 5, and 8 px, respectively) surroundings for the local information were set as special cases of the corresponding surroundings calculations at a rotation angle of 0. In the Image Info Extractor Professional software, the side of the square and the radius of the circle surroundings were input as values of 2, 5, and 14 px and as a and b of 2, 5, and 8 px, respectively.
Table 1.
Specifications of images.
4.2. Calculation Algorithms
The algorithms implemented in the Image Info Extractor Professional are described in Algorithms 1 and 2. In the case of RGB images, the algorithms were applied to each color channel separately. The Γ_α values were visualized by a full rescaling into the 8-bit resolution. Let us note that, for α = 1, the equations in line 9 of both algorithms switch to the calculation of the Shannon entropy.
5. Conclusions
In this article, we propose novel information quantities—a point information gain (Γ_α^(i)), a point information gain entropy (H_α), and a point information gain entropy density (Ξ_α). We found a monotone dependency of the number of elements of a given property in the set on Γ_α^(i). The variables H_α and Ξ_α can be used as quantities for the definition of the information context in multidimensional datasets. The examination of the local information in the distribution shows a potential for an in-depth insight into the formation of the observed structures and patterns. This option can be practically utilized in the acquisition of differently resolved variables in the dataset. The method makes it possible to distinguish cases where the number of occurrences of a certain event is the same, but the distributions in time, space, or along any other variable differ. In principle, the variables H_α and Ξ_α are unique for each distribution but suffer from problems with the digital precision of the computation. Therefore, we propose their α-dependent spectra as proper characteristics of any discrete distribution, e.g., for the clustering of multidimensional datasets.
Acknowledgments
This work was supported by the Ministry of Education, Youth and Sports of the Czech Republic—projects CENAKVA (No. CZ.1.05/2.1.00/01.0024), CENAKVA II (No. LO1205 under the NPU I program), and the CENAKVA Centre Development (No. CZ.1.05/2.1.00/19.0380). Jan Korbel acknowledges the support from the Czech Science Foundation, Grant No. GA14-07983S.
Author Contributions
Renata Rychtáriková was the main author of the text and tested the algorithms; Jan Korbel was responsible for the theoretical part of the article; Petr Macháček and Petr Císař were the developers of the Image Info Extractor Professional software; Jan Urban was the first who derived the point information gain from Shannon entropy; Dalibor Štys was the group leader who derived the point information gain for the Rényi entropy and prepared the first version of the manuscript. All authors have read and approved the final manuscript.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix
All processed data are available at [36] (for more details, see Section 4):
- Folder “Figures” contains subfolders with the results of Γ_α, H_α, and Ξ_α calculations for “RGB” (4.1.07.tiff, wash-ir.tiff) and “gray” (texmos2.s512.png, wd950112.png, 6ASCP011.png) standard images calculated for 40 values of α. The results are separated into subfolders according to the type of extracted information.
- Folder “H_Xi” stores the PIE_PIED.xlsx and PIE_PIED2.xlsx files with the dependencies of H_α and Ξ_α on α as exported from the PIE.mat files (in folder “Figures”). Titles of the graphs, which are in agreement with the computed variables and the extracted kinds of information, are written in the sheets.
- Folder “Histograms” stores the histograms of the occurrences of the Γ_α^(i) values for the Cauchy (two types), Lévy (three types), and Gauss (four types) distributions. The parameters of the original distributions are saved in the equation.txt files. All histograms were recalculated using 13 values of α.
- Folder “Software” contains 32- and 64-bit versions of the Image Info Extractor Professional v. b9 software (ImageExtractor_b9_xxbit.zip; supported by OS Win7) and a pig_histograms.m Matlab® script for the recalculation of the typical probability density functions. A script pie_ec.m serves for the extraction of H_α and Ξ_α from the folders (outputs from the Image Info Extractor Professional) over α. In the software and scripts, the variables Γ_α, H_α, and Ξ_α are called PIG, PIE, and PIED, respectively. Manuals for the software and scripts are also attached.
References
- Štys, D.; Urban, J.; Vaněk, J.; Císař, P. Analysis of biological time-lapse microscopic experiment from the point of view of the information theory. Micron 2011, 42, 360–365.
- Urban, J.; Vaněk, J.; Štys, D. Preprocessing of microscopy images via Shannon’s entropy. In Proceedings of the Pattern Recognition and Information Processing, Minsk, Belarus, 19–21 May 2009; pp. 183–187.
- Boer, P.T.D.; Kroese, D.P.; Mannor, S.; Rubinstein, R. A tutorial on the cross-entropy method. Ann. Oper. Res. 2005, 134, 19–67.
- Baez, J.C.; Fritz, T.; Leinster, T. A characterization of entropy in terms of information loss. Entropy 2011, 13, 1945–1957.
- Marcolli, M.; Tedeschi, N. Entropy algebras and Birkhoff factorization. J. Geom. Phys. 2015, 97, 243–265.
- Tsallis, C. Possible generalization of Boltzmann–Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
- Jizba, P.; Arimitsu, T. The world according to Rényi: Thermodynamics of multifractal systems. Ann. Phys. 2004, 312, 17–59.
- Jizba, P.; Korbel, J. Multifractal diffusion entropy analysis: Optimal bin width of probability histograms. Physica A 2014, 413, 438–458.
- Hentschel, H.G.E.; Procaccia, I. The infinite number of generalized dimensions of fractals and strange attractors. Physica D 1983, 8, 435–444.
- Campbell, L.L. A coding theorem and Rényi’s entropy. Inf. Control 1965, 8, 423–429.
- Štys, D.; Jizba, P.; Papáček, S.; Náhlik, T.; Císař, P. On measurement of internal variables of complex self-organized systems and their relation to multifractal spectra. In Proceedings of the 6th IFIP TC 6 International Workshop (WSOS 2012), Delft, The Netherlands, 15–16 March 2012; pp. 36–47.
- Rényi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 20 June–30 July 1960; pp. 547–561.
- Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
- Csiszár, I. I-divergence geometry of probability distributions and minimization problems. Ann. Prob. 1975, 3, 146–158.
- Harremoes, P. Interpretations of Rényi entropies and divergences. Physica A 2006, 365, 57–62.
- Van Erven, T.; Harremoes, P. Rényi divergence and Kullback–Leibler divergence. IEEE Trans. Inf. Theory 2014, 60, 3797–3820.
- Van Erven, T.; Harremoës, P. Rényi divergence and majorization. In Proceedings of the 2010 IEEE International Symposium on Information Theory Proceedings (ISIT), Austin, TX, USA, 13–18 June 2010.
- Jizba, P.; Kleinert, H.; Shefaat, M. Rényi’s information transfer between financial time series. Physica A 2012, 391, 2971–2989.
- Havrda, J.; Charvát, F. Quantification method of classification processes. Concept of structural α-entropy. Kybernetika 1967, 3, 30–35.
- Grassberger, P.; Procaccia, I. Measuring the strangeness of strange attractors. Physica D 1983, 9, 189–208.
- Grassberger, P.; Procaccia, I. Characterization of strange attractors. Phys. Rev. Lett. 1983, 50, 346.
- Costa, M. A new entropy power inequality. IEEE Trans. Inf. Theory 1985, 31, 751–760.
- Jizba, P.; Dunningham, J.A.; Joo, J. Role of information theoretic uncertainty relations in quantum theory. Ann. Phys. 2015, 355, 87–114.
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423, 623–656.
- Jizba, P.; Arimitsu, T. On observability of Rényi’s entropy. Phys. Rev. E 2004, 69, 026128.
- The USC-SIPI Image Database. Available online: http://sipi.usc.edu/database/database.php?volume=textures&image=61#top (accessed on 17 October 2016).
- Shalizi, C.R.; Crutchfield, J.P. Computational mechanics: Pattern and prediction, structure and simplicity. J. Stat. Phys. 2001, 104, 817–879.
- Shalizi, C.R.; Shalizi, K.L. Quantifying self-organization in cyclic cellular automata. In Noise in Complex Systems and Stochastic Dynamics; Society of Photo Optical: Bellingham, WA, USA, 2003.
- Shalizi, C.R.; Shalizi, K.L.; Haslinger, R. Quantifying self-organization with optimal predictors. Phys. Rev. Lett. 2004, 93, 118701.
- Crutchfield, J.P. Between order and chaos. Nat. Phys. 2012, 8, 17–24.
- Explore Fractals Beautiful, Colorful Fractals, and More! Available online: https://www.pinterest.com/pin/254031235202385248/ (accessed on 17 October 2016).
- Zhyrova, A.; Štys, D.; Císař, P. Macroscopic description of complex self-organizing system: Belousov–Zhabotinsky reaction. In ISCS 2013: Interdisciplinary Symposium on Complex Systems; Sanayei, A., Zelinka, N., Rössler, O.E., Eds.; Springer: Berlin/Heidelberg, Germany, 2014; pp. 109–115.
- Rychtarikova, R. Clustering of multi-image sets using Rényi information entropy. In Bioinformatics and Biomedical Engineering; Ortuño, F., Rojas, I., Eds.; Springer: Cham, Switzerland, 2016; pp. 517–526.
- Štys, D.; Vaněk, J.; Náhlík, T.; Urban, J.; Císař, P. The cell monolayer trajectory from the system state point of view. Mol. BioSyst. 2011, 7, 2824–2833.
- Available online: http://cims.nyu.edu/~kiryl/Photos/Fractals1/ascp011et.html (accessed on 17 October 2016).
- Point Information Gain Supplementary Data. Available online: ftp://160.217.215.251/pig (accessed on 17 October 2016).
© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

