Abstract
We generalize the point information gain (PIG) and its derived quantities, i.e., the point information gain entropy (PIE) and the point information gain entropy density (PIED), to the case of the Rényi entropy and simulate the behavior of PIG for typical distributions. We also use these methods for the analysis of multidimensional datasets. We demonstrate the main properties of the PIE/PIED spectra on real data, using several images as examples, and discuss further possible applications in other fields of data processing.
1. Introduction
Measurement of the relative information between two probability distributions is one of the most important goals of information theory. Among many such concepts, two are widely used. By far the most widespread is the relative Shannon entropy, also known as the Kullback–Leibler divergence. In this work, we use an alternative approach based on a simple entropy difference instead. By generalizing both concepts from Shannon's to Rényi's approach, we obtain a whole class of information variables that make it possible to focus on different parts of probability distributions and to interpret this focusing as an investigation of different parts of multifractal systems.
Despite the mathematical precision of the Shannon/Rényi divergence, we use another concept, the (Rényi) entropy difference, to introduce a value which locally determines the information contribution of a given element in a discrete set. Even though there is no substantial restriction on using a standard divergence to calculate the information difference upon elimination of one element from a set, for practical reasons we used the simple entropy difference between the sets with and without the given element. The resulting value has been called the point information gain [1,2]. The goal of this article is to examine and demonstrate some properties of this variable and to derive further quantities, namely the point information gain entropy and the point information gain entropy density. We also introduce the relation of all these variables to global and local information in multidimensional data analysis.
2. Mathematical Description and Properties of Point Information Gain
2.1. Point Information Gain and Its Relation to Other Information Entropies
An important problem in information theory is to estimate the amount of information gained or lost when a probability distribution P is refined or approximated by another distribution Q. The most popular measure used in the theory is the Kullback–Leibler (KL) divergence, defined as

D_KL(P‖Q) = Σ_j p_j ln(p_j/q_j) = H(P, Q) − H(P),    (1)

where H(P, Q) = −Σ_j p_j ln q_j is the so-called cross-entropy [3] and H(P) = −Σ_j p_j ln p_j is the Shannon entropy of distribution P. If P is similar to Q, this measure can be approximated by the entropy difference

ΔH = H(Q) − H(P).    (2)
Indeed, this measure does not obey as many measure-theoretic axioms as the KL divergence. For instance, we can obtain ΔH = 0 even for P ≠ Q. Nevertheless, for similar distributions, this value can still be a suitable quantity revealing some important information aspects of a system. Particularly interesting is the situation when the distributions are approximative histograms of some underlying distribution P for n and (n + 1) measurements, respectively. In this case, the entropy difference

ΔH_{n+1} = H(P_{n+1}) − H(P_n)    (3)

can be interpreted as the information gained by the (n + 1)-th measurement. Naturally, ΔH_{n+1} → 0 for n → ∞. When dealing with real complex systems, it is sometimes advantageous to introduce new information variables and entropies that capture the complexity of the system better, e.g., Hellinger's distance, Jeffreys' distance, or the J-divergence. There are also some specific information measures that have special interpretations and are widely used in various applications [4,5]. Two of the most important quantities are the Tsallis–Havrda–Charvát (THC) entropy [6], which is the entropy of non-extensive systems, and the Rényi entropy, the entropy of multifractal systems [7,8]. The latter is tightly connected to the theory of multifractal systems and generalized dimensions [9]. It is defined as

I_α(P) = 1/(1 − α) ln( Σ_j p_j^α ),    (4)

where α is the Rényi coefficient and p_j is the probability of occurrence of a phenomenon j in the discrete distribution. The limit α → 1 recovers the Shannon entropy. Similar to the Shannon entropy, the Rényi entropy also has an operational meaning: it can be interpreted as the average information cost when the cost of an elementary piece of information is an exponential function of its length [10]. Thus, changing the parameter α changes the cost of the information and therefore accentuates some parts of the probability distribution while suppressing the others. By taking into account the whole class of Rényi entropies, we thus obtain a new generalized class of information quantities.
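As a small illustration of this α-dependence (not taken from the paper's released software), the following MATLAB snippet evaluates the Rényi entropy of Equation (4) for a skewed five-outcome distribution at several values of α; lower α weights the rare outcomes more heavily, higher α the frequent ones:

```matlab
% Illustration (not from the paper's software): Renyi entropy of a skewed
% five-outcome distribution for several alpha (natural logarithm).
p = [0.7 0.2 0.05 0.03 0.02];
for alpha = [0.5 2 4]
    R = log(sum(p.^alpha)) / (1 - alpha);   % Equation (4)
    fprintf('alpha = %.1f:  I_alpha = %.4f\n', alpha, R);
end
```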
The point information gain Γ_α^(i) of the i-th point was developed as a practical tool for assessing the information contribution of an element to a given discrete distribution [11]. Similar to the Shannon entropy difference, it is defined as a difference of two Rényi entropies—with and without the examined element of a discrete phenomenon. Let us consider a discrete distribution of k distinct possible outcomes (e.g., different colors of pixels). Let us have a discrete distribution

P = {p_1, …, p_k},  p_j = n_j/n,    (5)

where n denotes the total number of elements in the discrete distribution and n_i the number of elements of the i-th phenomenon, so that p_i = n_i/n. Let us denote p_j^(i) = n_j/(n − 1) for j ≠ i and p_i^(i) = (n_i − 1)/(n − 1). Then, the distribution with one element of the i-th phenomenon omitted can be written as

P^(i) = {p_1^(i), …, p_k^(i)}.    (6)

Hence, we may write the point information gain as

Γ_α^(i) = I_α(P^(i)) − I_α(P) = 1/(1 − α) [ln Σ_{j=1}^{k} (p_j^(i))^α − ln Σ_{j=1}^{k} p_j^α],    (7)
where k is the total number of the phenomena in the discrete distribution. In the rest of the text, we use the natural logarithm to simplify the calculations. However, all computations have been performed using the binary logarithm which, for the Rényi entropy and its derivatives, yields values in bits. In contrast to the commonly used Rényi divergence [12,13,14,15,16,17,18], we use Γ_α^(i) for its relative simplicity and practical interpretation. Unlike the KL divergence, the Rényi divergence cannot be interpreted as a difference of the cross-entropy and the entropy of the underlying distribution, and its computation becomes intractable. As discussed above, for similar distributions, the entropy difference still preserves its informational value.
After the substitution for the probabilities, one gets

Γ_α^(i) = α/(1 − α) ln[n/(n − 1)] + 1/(1 − α) ln[(Σ_{j=1}^{k} n_j^α − n_i^α + (n_i − 1)^α) / Σ_{j=1}^{k} n_j^α],    (8)

where the first term depends only on n. Even for large n and k, the whole entropy difference remains finite (contrary to the unconditional entropy, which has to be renormalized in the continuous case; for details, see Reference [7]). Therefore, we examine only the second term. When the argument of the logarithm is close to 1, i.e., when

n_i^α − (n_i − 1)^α ≪ Σ_{j=1}^{k} n_j^α

for a given α, one can approximate the logarithm by the Taylor expansion of the first order. After denoting
the second term of Γ_α^(i) can be approximated as
where we used the big-O asymptotic notation. Let us note that the last term in Equation (8) is nothing else than the THC entropy [6,19]. Naturally, for very similar distributions, these two quantities are practically the same. This is due to the fact that, for large n, the omission of one point has no large impact on the whole distribution. Consequently, the actual value of the parameter α, which leads to a rescaling of the probabilities, is more important than the particular form of the entropy.
We shall continue by utilizing the Rényi entropy due to its relation to the generalized dimension of multifractal systems [20,21]. Let us concentrate again on the second term. We can rewrite it as

where we use the small-o asymptotic notation. Specifically, for α = 2, we obtain

which explains why the dependency on n_i is approximately linear. In general, the point information gain is a monotone function of n_i (respectively p_i) for all possible discrete distributions. Thus, it may be used as a measure of the information gained between two discrete distributions that differ in the occurrence of one particular feature.
Let us now discuss an interpretation of the point information gain. We can rewrite Equation (8) as

We are interested in the situation when Γ_α^(i) = 0. After straightforward manipulations, we can eliminate the logarithm and the power, so that

If n and n_i are sufficiently large, we can approximate both sides with the first-order rule (1 + x)^γ ≈ 1 + γx for x close to zero, which gives

Thus, we end up with
This shows that Γ_α^(i) ≈ 0 holds for events with average frequency; Γ_α^(i) < 0 corresponds to rare events, while Γ_α^(i) > 0 corresponds to frequent events. Thus, besides quantifying the contribution of each event to the examined distribution, we also obtain a discrimination between the points according to how they contribute to the total information of the given distribution under the statistical emphasis represented by a particular α. This opens the question of the existence of an "optimal" distribution for the given α.
Two possible variants of such optimality then arise: the first one can be defined as a distribution for which exactly half of the values produce Γ_α^(i) > 0 and the other half yield Γ_α^(i) < 0. The second one requires the Γ_α^(i) values to be spaced equally. The existence of such a distribution could be understood as another generalization of the concept of the entropy power [22,23], and we leave this question to future research.
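To make this discrimination concrete, the following toy MATLAB check (an arbitrary three-bin example of ours, not taken from the paper) evaluates Γ_α^(i) for one frequent and one rare phenomenon directly from the definition of Equation (7):

```matlab
% Toy check of the sign convention (arbitrary three-bin histogram of ours).
H = [70 29 1];  alpha = 0.99;  n = sum(H);
Swith = sum((H / n).^alpha);                    % sum of p_j^alpha with the element
for i = [1 3]                                   % a frequent bin, then a rare bin
    Hi = H;  Hi(i) = Hi(i) - 1;                 % omit one element of phenomenon i
    q = Hi / (n - 1);
    Swithout = sum(q(q > 0).^alpha);            % sum of (p_j^(i))^alpha without it
    Gamma = (log(Swithout) - log(Swith)) / (1 - alpha);   % Equation (7)
    fprintf('bin %d (count %2d): Gamma = %+.6f\n', i, H(i), Gamma);
end
% Expected: Gamma > 0 for the frequent bin, Gamma < 0 for the rare bin.
```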
With respect to the previous discussion and the practical utilization of this notion, we emphasize that, for real systems with large n, the Γ_α^(i) values are very small numbers relative to the numerical precision of common computers. Their further averaging and numerical representation lead to significant errors such as underflow and overflow (e.g., Figure 1c). At lower values of α, the Γ_α^(i) values are broadly separated for rare points, while, at higher values of α, the resolution is higher for more frequent data points. Therefore, it is more advisable to compute the whole Γ_α^(i) spectrum vs. α rather than a single value at a chosen α.
Figure 1.
Γ_α^(i)-transformations of the discretized Lévy (a), Cauchy (b), and Gauss (c) distributions at α = 0.99. The deviation from the monotone dependency in the Gauss distribution is due to the digital rounding.
2.2. Point Information Gain for Typical Distributions
In Figure 1, we demonstrate Γ_α^(i)-transformations of three thoroughly studied distributions—the Lévy, Cauchy, and Gauss distributions (specified in Section 4.1). In particular, Figure 1c shows the averaging of digital levels, which results in the multiple appearance of values that should be unique. This phenomenon is reduced with an increasing number of points in the distribution; nevertheless, it does not disappear in any real case. Thus, the monotone dependencies of Γ_α^(i) on n_i (respectively p_i) are valid only in the approximation of an infinite resolution in the levels of values.
Figure 2 shows the changes of the distribution of the Γ_α^(i) values with the increasing α-parameter. For each parameter α, the elements are enveloped by monotone increasing curves. For instance, as derived in Equation (14), the near linearity of the dependency of the number of elements on the Γ_α^(i) values at α = 2 is seen in Figure 2d. The differences between the distributions are expressed by the distributions of the Γ_α^(i) values along the horizontal axes.
Figure 2.
Γ_α^(i)-transformations of the discretized Lévy distribution at six increasing values of α (panels (a)–(f)).
2.3. Point Information Gain Entropy and Point Information Gain Entropy Density
In the previous sections, we showed that Γ_α^(i) is different for any n_i and that the dependency of these two variables is a monotone increasing function for all possible discrete distributions. Here, we propose new variables—a point information gain entropy (H_α) and a point information gain entropy density (Ξ_α)—defined by the formulas

H_α = Σ_{j=1}^{k} n_j Γ_α^(j)    (19)

and

Ξ_α = Σ_{j=1}^{k} Γ_α^(j),    (20)

where the first sum runs over all n elements of the distribution (each element contributes the gain of its phenomenon) and the second sum contains one contribution per phenomenon j. They can be understood as a multiple of the average point information gain and—under linear averaging—as a multiple of the average gain of the phenomena j, respectively.
The information content is generally measured by the entropy. The famous Shannon source coding theorem [24] refers to a specific process of transmission of a discretized signal and the introduction of noise. The Rényi entropy is one of a class of one-parametric entropies and offers numerous additional features over the Shannon entropy [7,12,25], such as the determination of a generalized dimension of a strange attractor [20,21]. The universality of the generalized dimension for the characterization of any distribution, whose regularity may be only coincidental, is still under dispute. However, the values H_α and Ξ_α characterize a given distribution for any α. Differences between distributions are expressed in the counts along the Γ_α^(i) axes. Therefore, independently of the mechanisms of generation of the distributions, the H_α/Ξ_α values can serve for the comparison of these distributions. This holds for both parametric and non-parametric distributions.
The next question is whether the Ξ_α has some expected properties. In this respect, we mention the facts observed upon examination of Equation (12), which enable us to rewrite it as

where the product in the argument of the logarithm in the second term is a product of functions bounded above by 1 and is thus again a function bounded above by 1. From the previous analysis done for the Γ_α^(i), we may conclude that the point information gain entropy density (Ξ_α) inherits the properties of the Rényi entropy, e.g., its zooming properties.
Similar to Equation (21), the point information gain entropy (H_α) can be rewritten as

Again, the argument of the logarithm in the second term is bounded above by 1. The H_α also has properties inherited from the Rényi entropy, although their mutual relation is more complicated.
3. Estimation of Point Information Gain in Multidimensional Datasets
3.1. Point Information Gain in the Context of Whole Image
The point information gain introduced in Equation (7) was originally applied to image enhancement [1,2]. A typical digital image is an x × y × n matrix of values, where x and y are the dimensions of the image and n corresponds to the number of color channels (e.g., n is 1 and 3 for a monochrome and an RGB image, respectively). In most cases, the intensity values are in the range from 0 to 255 (an 8-bit image) or from 0 to 4095 (a 12-bit image) for each color channel. For any size and bit depth of an image, we can compute the global information Γ_α^(i) (Algorithm 1) provided by the occupied intensity bin i and evaluate it as a change of the entropy of the probability intensity histogram after removing a point from this bin.
For each parameter α, the calculation of Γ_α^(i) helps to find the intensities with identical occurrences and to determine their distribution in (a structural part of) the image. Thus, in general, the recalculations to Γ_α^(i) can be considered as Look-Up Tables—intensities with the highest probabilities of occurrence in an image correspond to the highest (positive) Γ_α^(i) values and to the brightest intensities in a Γ_α^(i)-transformed image, and vice versa. Sometimes, mainly in the case of local information, some Γ_α^(i) levels are merged into one intensity level of the transformed image due to the transformation of the original values into an 8-bit resolution.
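As an illustration of this Look-Up-Table view, a globally Γ_α-transformed image can be produced roughly as follows (a sketch under our assumptions: an 8-bit greyscale image I, the imhist/mat2gray functions of the Image Processing Toolbox, and the hypothetical pig_global helper sketched after Algorithm 1 below):

```matlab
% Sketch of the Look-Up-Table step for a global Gamma-transformation of an
% 8-bit greyscale image I (imhist, mat2gray: Image Processing Toolbox;
% pig_global: hypothetical helper sketched after Algorithm 1).
H = imhist(I);                          % 256-bin intensity histogram
Gamma = pig_global(H, 0.99);            % one Gamma value per intensity level
Gimg = Gamma(double(I) + 1);            % look up the Gamma value of every pixel
Gvis = uint8(255 * mat2gray(Gimg));     % full rescaling into 8-bit resolution
```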
| Algorithm 1: Point information gain vector (Γ_α), point information gain entropy (H_α), and point information gain entropy density (Ξ_α) calculations for global (Whole image) information and typical histograms. |
| Input: n-bin frequency histogram H; α, where α ≥ 0 ∧ α ≠ 1 |
| Output: Γ_α; H_α; Ξ_α |
| 1 P ← H/sum(H); % express the frequency histogram as a probability histogram |
| 2 Γ_α ← zeros(size(H)); % create a zero matrix of the size of the histogram |
| 3 for i ← 1 to n do |
| 4–9 (the loop body, which computes the Rényi entropy difference Γ_α(i) for each occupied bin i, appears only as an image in the source; see the MATLAB sketch below) |
| 10 end |
| 11 H_α ← sum(Γ_α .* H); % calculate H_α as a sum of the element-by-element multiplication of Γ_α and H (Equation (19)) |
| 12 Ξ_α ← sum(unique(Γ_α)); % calculate Ξ_α as a sum of all unique values in Γ_α (Equation (20)) |
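For readers without access to the released software, the following MATLAB function is a minimal sketch of Algorithm 1 as we read it; the name pig_global, the vectorization, and the handling of empty bins are our illustrative choices, not the authors' implementation:

```matlab
function [Gamma, PIE, PIED] = pig_global(H, alpha)
% Minimal sketch of Algorithm 1 (our reading, not the released code):
% global point information gain for an n-bin frequency histogram H and a
% Renyi coefficient alpha (alpha >= 0, alpha ~= 1); natural logarithm.
    H = H(:)';                           % bin counts as a row vector
    n = sum(H);                          % total number of elements
    p = H / n;                           % probability histogram
    Swith = sum(p(p > 0).^alpha);        % sum_j p_j^alpha over occupied bins
    Gamma = zeros(size(H));              % one Gamma value per bin
    for i = find(H > 0)
        Hi = H;  Hi(i) = Hi(i) - 1;      % omit one element of phenomenon i
        q = Hi / (n - 1);                % renormalized probabilities
        Swithout = sum(q(q > 0).^alpha);
        % Renyi entropy without the element minus Renyi entropy with it
        Gamma(i) = (log(Swithout) - log(Swith)) / (1 - alpha);
    end
    PIE  = sum(H .* Gamma);              % Equation (19): each element adds the gain of its bin
    PIED = sum(unique(Gamma(H > 0)));    % Equation (20): sum of the unique Gamma values
end
```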
Everything is best visualized in Figure 3 and Figure 4, which show the Γ_α^(i)-transformations of the texmos2.s512 image. The intention was probably to create an image with a uniform distribution of intensities. Provided a uniform intensity distribution, the output of the global Γ_α^(i)-calculation would be only one value, i.e., Figure 3b would be unicolor. However, the eight original intensities (Figure 3a) resulted in five Γ_α^(i) values (i.e., local parts) (Figure 4b,d). The detailed image analysis showed that the number of occurrences is identical only for intensities 32–224 and 96–128–192, i.e., there are five unique values of the frequencies of intensity occurrences (Figure 4a). In contrast, in the 4.1.07 image, the global Γ_α^(i)-recalculation emphasizes the unevenness of the background and the shadows around the group of jelly beans (Figure 5b). In conformity with the statement in the next-to-last paragraph of Section 2.1, this principle also enables the highlighting of rare points in images with a rich spectrum of intensities, mainly at low α-values. The calculations using higher values of α do not highlight rare points so intensively, and the resulting image is smoother.
Figure 3.
Γ_α^(i)-transformations of the texmos2.s512 image [26]. Original image (a) and information images calculated from the whole image (b), from a cross around each pixel (c), and from squares with sides of 5, 15, and 29 px, respectively, centered on the examined pixel (d–f).
Figure 4.
Histograms of Γ_α^(i)-transformations of the texmos2.s512 image [26]. Original image (a), original Γ_α^(i) values calculated from the whole image (b), original Γ_α^(i) values calculated from a cross whose shanks intersect in the examined pixel (c), Γ_α^(i)-transformed images calculated from the whole image (d), and Γ_α^(i)-transformed images calculated from a cross around each pixel (e). Colors in the original and globally (whole image) transformed histograms correspond to the intensity levels with identical frequencies of occurrence in the original image.
Figure 5.
Γ_α^(i)-transformations of the 4.1.07 image [26]. Original image (a) and information images calculated from the whole image (b), from a cross around each pixel (c), and from circles with diameters of 5, 17, and 30 px, respectively, centered on the examined pixel (d–f).
3.2. Local Point Information Gain
Since multidimensional datasets, such as images, consist of special structures given by the pixel lattice, it can also be beneficial to calculate not only the global information gain but also a local information gain in some defined surroundings (Algorithm 2). The local information is again defined via removing an element from the bin i, where this element lies in the center of the surroundings from which the intensity histogram is created. The choice of the local surroundings around the pixels is specific for each image. However, we do not have any systematic method for comparing the suitability of different surroundings around the pixels. The suitability of the chosen surroundings obviously depends on the process by which the observed pattern or other distribution was generated. To our knowledge, the choice of the appropriate surroundings on the basis of known image generation has been studied only for cellular automata [27,28,29]. This makes the study of the local information very interesting because it outlines another method for the recognition of the processes of self-organization/pattern formation [30]. In this article, we confine ourselves to the usage of the local information for a better understanding of both the limitations of the method of the Γ_α^(i)-calculation and the local information itself. The cross, square, and circular surroundings around each pixel are demonstrated on three different standard images—texmos2.s512 (monochrome, computer-generated, unifractal), 4.1.07 (RGB, photograph, unifractal) [26], and wd950112 (monochrome version, computer-generated, multifractal) [31].
The cross of intensity values whose shanks meet in the examined point of the original image [1] was chosen as the first local surroundings. In contrast to the global recalculation, such a transformation of the texmos2.s512 image produces a substantially richer Γ_α-image. One can see that the relatively simple global information consists of more complex local information (Figure 4a,c,e).
However, the cross-local type of the image transformation is the least suitable approach for the analysis of the photograph of the jelly beans (Figure 5c). In this case, a circular local element is recommended instead. As seen in Figure 5d–f, increasing the diameter up to the size of the jelly beans gradually reduces the background. A further increase enables the grouping of the jelly beans into higher-order assemblies. A similar grouping is observable for the smallest squares in the texmos2.s512 image transformed using the 29 px square surroundings (Figure 3f). In contrast, smaller square surroundings (Figure 3d) highlight only the border intensities.
| Algorithm 2: Point information gain matrix (Γ_α), point information gain entropy (H_α), and point information gain entropy density (Ξ_α) calculations for local kinds of information. Parameters a and b are the semiaxes of the elliptical surroundings and the half-widths of the rectangular surroundings, respectively; a = 0 and b = 0 for the cross surroundings. |
| Input: 2D discrete data (image matrix); α, where α ≥ 0 ∧ α ≠ 1; parameters of the surroundings |
| Output: Γ_α; H_α; Ξ_α |
| 1 Γ_α ← zeros; % create a zero matrix of the size of the data matrix |
| 2 M ← containers.Map; % declare an empty hash-map (a key–value array) |
| 3 for each examined pixel do |
| 4–15 (the loop body, which builds the intensity histogram of the surroundings of each pixel and computes its Γ_α value, appears only as an image in the source; see the MATLAB sketch below) |
| 16 end |
| 17 H_α ← sum(Γ_α(:)); % calculate H_α as a sum of all elements in the matrix (Equation (19)) |
| 18 Ξ_α ← sum of the values stored in the hash-map M; % calculate Ξ_α as a sum over all unique microstates (Equation (20)) |
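Analogously, the following MATLAB function is a minimal sketch of Algorithm 2 for a square surroundings of half-width w; the function name, the interior-only processing, and the hash-map key (surroundings histogram plus centre bin) are our illustrative assumptions, and the released software additionally supports cross, elliptical, and rotated surroundings:

```matlab
function [G, PIE, PIED] = pig_local_square(I, alpha, w)
% Minimal sketch of Algorithm 2 (our assumptions, not the released code):
% local point information gain over a (2w+1)x(2w+1) square surroundings
% centred on each pixel of a greyscale image I; only the interior of the
% image, where the surroundings fits completely, is processed.
    I = double(I);
    [rows, cols] = size(I);
    G = zeros(rows - 2*w, cols - 2*w);                  % local PIG matrix
    micro = containers.Map('KeyType', 'char', 'ValueType', 'double');
    for x = w+1 : rows-w
        for y = w+1 : cols-w
            patch = I(x-w : x+w, y-w : y+w);            % surroundings incl. the centre pixel
            [bins, ~, idx] = unique(patch(:));          % intensity levels present
            H = accumarray(idx, 1);                     % their frequencies
            n = sum(H);
            Swith = sum((H / n).^alpha);
            c = find(bins == I(x, y));                  % bin of the centre pixel
            Hi = H;  Hi(c) = Hi(c) - 1;                 % omit the centre pixel
            q = Hi / (n - 1);
            Swithout = sum(q(q > 0).^alpha);
            g = (log(Swithout) - log(Swith)) / (1 - alpha);
            G(x - w, y - w) = g;
            % one plausible reading of the "unique microstate" rule: key the
            % hash-map on the surroundings histogram and the centre bin
            micro(sprintf('%g ', [bins; H; c])) = g;
        end
    end
    PIE  = sum(G(:));                                   % Equation (19)
    PIED = sum(cell2mat(values(micro)));                % Equation (20), over unique microstates
end
```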
3.3. Point Information Gain Entropy and Point Information Gain Entropy Density
From the point of view of thermodynamics, the H_α and Ξ_α can be considered as additive, homological state variables whose knowledge can be helpful in the analysis of multidimensional (image) data as well [32]. Despite the relative similarity of their formulas (Section 2.3), the H_α can be defined as a sum of all information contributions to the data distribution, either the global or a partial one, i.e., of all Γ_α^(i), whereas the Ξ_α is a sum over all information microstates of the distribution. Even in the case of the local information, any two (collision) histograms with the same proportional representation of the frequencies of elements, which were obtained from distributions around two pixels at different positions and differ only in the positions of the frequencies in the histogram, are considered to be unique microstates and produce unique Γ_α values (see Algorithm 2). Thus, in agreement with the predictions arising from Equations (19) and (20), the Ξ_α-calculation does not suppress the contributions of elements with low probabilities of occurrence (rare points) and is more robust and stable against changes in the local surroundings. This phenomenon manifests itself in the lower differences between the Ξ_α(α) dependencies for the four square surroundings in comparison with the H_α(α) dependencies in Figure 6. Nevertheless, it is worth noting that, during the calculation with local geometrical surroundings, the surroundings at most touch the edges of the image, and only the interior part of the image is processed. This technical limitation negatively influences the H_α and Ξ_α values for the square surroundings in Figure 6 and also leads to smaller sizes of the Γ_α-transformed images (e.g., Figure 3d–f and Figure 5d–f).
Figure 6.
Spectra H_α(α) and Ξ_α(α) for global information and different local surroundings of a unifractal (texmos2.s512 [26], column (a)) and a multifractal (wd950112 [31], column (b)) image at α = {0.1, 0.2, ..., 0.9, 0.99, 1.1, 1.2, ..., 4.0}.
Plotting the H_α and Ξ_α vs. α in Figure 6 is not arbitrary. As mentioned for the Γ_α^(i) calculations (Section 2.1), multidimensional discrete (image) data are suitably characterized not by one discrete value, either H_α or Ξ_α, at a particular α, but rather by their α-dependent spectra. The reason is not only to avoid digital rounding, but also possibly to characterize the type and the origin of the geometrical structures in the image (cf. Section 3.1). Another application has been found in the statistical evaluation (clustering) of time-lapse multidimensional datasets [32,33]. This calculation method was originally developed for the study of multifractal self-organizing biological images [34]; however, it enables the description of any type of image. Since parts of an image are forms of complex structures, the best way to interpret the image is to use a combination of its global and local kinds of information. We demonstrate this fact on the example of a unifractal (almost non-fractal) Euclidian image and a computer-generated multifractal image (Figure 6). Whereas the Euclidian image gives monotone H_α(α) and Ξ_α(α) spectra (for the global and cross-local kinds of information, even linear dependencies over the particular discrete interval of α values), the recalculation of the multifractal image shows extremes at values of α close to 1. Analogous dependencies were also plotted for the image sets of the course of the self-organizing Belousov–Zhabotinsky reaction [32].
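For the global kind of information, such a spectrum can be generated, for example, by sweeping α over an image histogram (a sketch reusing the hypothetical pig_global helper from Section 3.1; imhist requires the Image Processing Toolbox):

```matlab
% Sketch of a global PIE/PIED spectrum of a greyscale image (pig_global:
% hypothetical helper from Section 3.1; imhist: Image Processing Toolbox).
I = imread('texmos2.s512.png');
H = imhist(I);                                   % 256-bin intensity histogram
alphas = [0.1:0.1:0.9, 0.99, 1.1:0.1:4.0];       % grid used in Section 4.1
PIE = zeros(size(alphas));  PIED = zeros(size(alphas));
for k = 1:numel(alphas)
    [~, PIE(k), PIED(k)] = pig_global(H, alphas(k));
end
plot(alphas, PIE, '-o');   xlabel('\alpha');  ylabel('H_\alpha');
figure;
plot(alphas, PIED, '-o');  xlabel('\alpha');  ylabel('\Xi_\alpha');
```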
4. Materials and Methods
4.1. Processing of Images and Typical Histograms
The values of Γ_α^(i), H_α, and Ξ_α for all typical histograms and images were computed using Equations (7), (19), and (20). The algorithms are described in Section 4.2. The software and scripts, as well as the results of all calculations, are available via ftp (Appendix).
For the Cauchy, Lévy, and Gauss distributions, histograms of the dependencies of the number of elements on the Γ_α^(i) values were calculated for α = {0.1, 0.3, 0.5, 0.7, 0.99, 1.3, 1.5, 1.7, 2.0, 2.5, 3.0, 3.5, 4.0} using a Matlab® script (Mathworks, Natick, MA, USA). The following probability density functions were studied:
- (a) Lévy distribution: f(x) = √(c/(2π)) · e^(−c/(2x)) / x^(3/2), x > 0;
- (b) Cauchy distribution: f(x) = c / [π(x² + c²)];
- (c) Gauss distribution: f(x) = 1/(σ√(2π)) · e^(−(x−c)²/(2σ²)).
In Figure 1 and Figure 2, the Cauchy and Lévy distributions with c = 7 and the Gauss distribution with parameters c = 4 and σ = 10 are depicted.
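For instance, the Γ_α-histogram of a discretized Gauss distribution can be reproduced roughly as follows (a sketch only—the released pig_histograms.m script may differ; pig_global is the hypothetical helper from Section 3.1):

```matlab
% Sketch (not the released pig_histograms.m): Gamma-transformation of a
% discretized Gauss sample with c = 4 and sigma = 10 at alpha = 0.99.
x = round(4 + 10 * randn(1e5, 1));      % discretized Gauss-distributed values
[bins, ~, idx] = unique(x);             % occupied integer levels
H = accumarray(idx, 1);                 % frequency histogram
Gamma = pig_global(H, 0.99);            % hypothetical helper from Section 3.1
histogram(Gamma(idx));                  % occurrences of Gamma values (cf. Figure 1c)
```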
Multidimensional image analysis based on the calculation of Γ_α^(i), H_α, and Ξ_α was tested on 5 standard 8-bpc images (Table 1). Before the computations, the original images wd950112.gif and 6ASCP011.gif obtained from [31] were transformed into monochrome *.png formats in the Matlab® software. All images were processed using the Image Info Extractor Professional software (Institute of Complex Systems, University of South Bohemia, Nové Hrady, Czech Republic) for α = {0.1, 0.2, ..., 0.9, 0.99, 1.1, 1.2, ..., 4.0}. The global information was extracted using the Whole image calculation (the italics refer to parameters which are set in the Image Info Extractor Professional software). The vertical–horizontal cross, square (a side of 5, 11, 15, and 29 px, respectively), and circle (a radius of 2, 5, and 8 px, respectively) surroundings for the local information were set as special cases of the corresponding surroundings calculations at a rotation angle of 0. In the Image Info Extractor Professional software, the side of the square and the radius of the circle surroundings were input as values of 2, 5, and 14 px and as a and b of 2, 5, and 8 px, respectively.
Table 1.
Specifications of images.
4.2. Calculation Algorithms
The algorithms implemented in the Image Info Extractor Professional are described in Algorithms 1 and 2. In the case of RGB images, the algorithms were applied to each color channel separately. The Γ_α values were visualized by a full rescaling into the 8-bit resolution. Let us note that, for α = 1, the equations in line 9 of both algorithms switch to the calculation of the Shannon entropy.
5. Conclusions
In this article, we propose novel information quantities—a point information gain (Γ_α^(i)), a point information gain entropy (H_α), and a point information gain entropy density (Ξ_α). We found a monotone dependency of the number of elements of a given property in the set on Γ_α^(i). The variables H_α and Ξ_α can be used as quantities for the definition of the information context in multidimensional datasets. The examination of the local information in the distribution shows a potential for an in-depth insight into the formation of the observed structures and patterns. This option can be practically utilized in the acquisition of differently resolved variables in the dataset. The method makes it possible to distinguish cases where the number of occurrences of a certain event is the same, but the distributions in time, space, or along any other variable differ. In principle, the variables H_α and Ξ_α are unique for each distribution but suffer from problems with the digital precision of the computation. Therefore, we propose their α-dependent spectra as proper characteristics of any discrete distribution, e.g., for the clustering of multidimensional datasets.
Acknowledgments
This work was supported by the Ministry of Education, Youth and Sports of the Czech Republic—projects CENAKVA (No. CZ.1.05/2.1.00/01.0024), CENAKVA II (No. LO1205 under the NPU I program), and the CENAKVA Centre Development (No. CZ.1.05/2.1.00/19.0380). Jan Korbel acknowledges the support from the Czech Science Foundation, Grant No. GA14-07983S.
Author Contributions
Renata Rychtáriková was the main author of the text and tested the algorithms; Jan Korbel was responsible for the theoretical part of the article; Petr Macháček and Petr Císař were the developers of the Image Info Extractor Professional software; Jan Urban was the first who derived the point information gain from Shannon entropy; Dalibor Štys was the group leader who derived the point information gain for the Rényi entropy and prepared the first version of the manuscript. All authors have read and approved the final manuscript.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix
All processed data are available at [36] (for more details, see Section 4):
- Folder “Figures” contains subfolders with the results of Γ_α, H_α, and Ξ_α calculations for “RGB” (4.1.07.tiff, wash-ir.tiff) and “gray” (texmos2.s512.png, wd950112.png, 6ASCP011.png) standard images calculated for 40 values of α. The results are separated into subfolders according to the type of extracted information.
- Folder “H_Xi” stores the PIE_PIED.xlsx and PIE_PIED2.xlsx files with the dependencies of H_α and Ξ_α on α as exported from the PIE.mat files (in folder “Figures”). Titles of the graphs, which are in agreement with the computed variables and the extracted kinds of information, are written in the sheets.
- Folder “Histograms” stores the histograms of the occurrences of the Γ_α^(i) values for the Cauchy (two types), Lévy (three types), and Gauss (four types) distributions. The parameters of the original distributions are saved in the equation.txt files. All histograms were recalculated using 13 values of α.
- Folder “Software” contains 32- and 64-bit versions of the Image Info Extractor Professional v. b9 software (ImageExtractor_b9_xxbit.zip; supported by OS Win7) and a pig_histograms.m Matlab® script for the recalculation of the typical probability density functions. A script pie_ec.m serves for the extraction of H_α and Ξ_α from the folders (outputs from the Image Info Extractor Professional) over α. In the software and scripts, the variables Γ_α, H_α, and Ξ_α are called PIG, PIE, and PIED, respectively. Manuals for the software and scripts are also attached.
References
- Štys, D.; Urban, J.; Vaněk, J.; Císař, P. Analysis of biological time-lapse microscopic experiment from the point of view of the information theory. Micron 2011, 42, 360–365.
- Urban, J.; Vaněk, J.; Štys, D. Preprocessing of microscopy images via Shannon’s entropy. In Proceedings of the Pattern Recognition and Information Processing, Minsk, Belarus, 19–21 May 2009; pp. 183–187.
- Boer, P.T.D.; Kroese, D.P.; Mannor, S.; Rubinstein, R. A tutorial on the cross-entropy method. Ann. Oper. Res. 2005, 134, 19–67.
- Baez, J.C.; Fritz, T.; Leinster, T. A characterization of entropy in terms of information loss. Entropy 2011, 13, 1945–1957.
- Marcolli, M.; Tedeschi, N. Entropy algebras and Birkhoff factorization. J. Geom. Phys. 2015, 97, 243–265.
- Tsallis, C. Possible generalization of Boltzmann–Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
- Jizba, P.; Arimitsu, T. The world according to Rényi: Thermodynamics of multifractal systems. Ann. Phys. 2004, 312, 17–59.
- Jizba, P.; Korbel, J. Multifractal diffusion entropy analysis: Optimal bin width of probability histograms. Physica A 2014, 413, 438–458.
- Hentschel, H.G.E.; Procaccia, I. The infinite number of generalized dimensions of fractals and strange attractors. Physica D 1983, 8, 435–444.
- Campbell, L.L. A coding theorem and Rényi’s entropy. Inf. Control 1965, 8, 423–429.
- Štys, D.; Jizba, P.; Papáček, S.; Náhlik, T.; Císař, P. On measurement of internal variables of complex self-organized systems and their relation to multifractal spectra. In Proceedings of the 6th IFIP TC 6 International Workshop (WSOS 2012), Delft, The Netherlands, 15–16 March 2012; pp. 36–47.
- Rényi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 20 June–30 July 1960; pp. 547–561.
- Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
- Csiszár, I. I-divergence geometry of probability distributions and minimization problems. Ann. Prob. 1975, 3, 146–158.
- Harremoes, P. Interpretations of Rényi entropies and divergences. Physica A 2006, 365, 57–62.
- Van Erven, T.; Harremoes, P. Rényi divergence and Kullback–Leibler divergence. IEEE Trans. Inf. Theory 2014, 60, 3797–3820.
- Van Erven, T.; Harremoës, P. Rényi divergence and majorization. In Proceedings of the 2010 IEEE International Symposium on Information Theory Proceedings (ISIT), Austin, TX, USA, 13–18 June 2010.
- Jizba, P.; Kleinert, H.; Shefaat, M. Rényi’s information transfer between financial time series. Physica A 2012, 391, 2971–2989.
- Havrda, J.; Charvát, F. Quantification method of classification processes. Concept of structural α-entropy. Kybernetika 1967, 3, 30–35.
- Grassberger, P.; Procaccia, I. Measuring the strangeness of strange attractors. Physica D 1983, 9, 189–208.
- Grassberger, P.; Procaccia, I. Characterization of strange attractors. Phys. Rev. Lett. 1983, 50, 346.
- Costa, M. A new entropy power inequality. IEEE Trans. Inf. Theory 1985, 31, 751–760.
- Jizba, P.; Dunningham, J.A.; Joo, J. Role of information theoretic uncertainty relations in quantum theory. Ann. Phys. 2015, 355, 87–114.
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423, 623–656.
- Jizba, P.; Arimitsu, T. On observability of Rényi’s entropy. Phys. Rev. E 2004, 69, 026128.
- The USC-SIPI Image Database. Available online: http://sipi.usc.edu/database/database.php?volume=textures&image=61#top (accessed on 17 October 2016).
- Shalizi, C.R.; Crutchfield, J.P. Computational mechanics: Pattern and prediction, structure and simplicity. J. Stat. Phys. 2001, 104, 817–879.
- Shalizi, C.R.; Shalizi, K.L. Quantifying self-organization in cyclic cellular automata. In Noise in Complex Systems and Stochastic Dynamics; Society of Photo Optical: Bellingham, WA, USA, 2003.
- Shalizi, C.R.; Shalizi, K.L.; Haslinger, R. Quantifying self-organization with optimal predictors. Phys. Rev. Lett. 2004, 93, 118701.
- Crutchfield, J.P. Between order and chaos. Nat. Phys. 2012, 8, 17–24.
- Explore Fractals Beautiful, Colorful Fractals, and More! Available online: https://www.pinterest.com/pin/254031235202385248/ (accessed on 17 October 2016).
- Zhyrova, A.; Štys, D.; Císař, P. Macroscopic description of complex self-organizing system: Belousov–Zhabotinsky reaction. In ISCS 2013: Interdisciplinary Symposium on Complex Systems; Sanayei, A., Zelinka, N., Rössler, O.E., Eds.; Springer: Berlin/Heidelberg, Germany, 2014; pp. 109–115.
- Rychtarikova, R. Clustering of multi-image sets using Rényi information entropy. In Bioinformatics and Biomedical Engineering; Ortuño, F., Rojas, I., Eds.; Springer: Cham, Switzerland, 2016; pp. 517–526.
- Štys, D.; Vaněk, J.; Náhlík, T.; Urban, J.; Císař, P. The cell monolayer trajectory from the system state point of view. Mol. BioSyst. 2011, 7, 2824–2833.
- Available online: http://cims.nyu.edu/~kiryl/Photos/Fractals1/ascp011et.html (accessed on 17 October 2016).
- Point Information Gain Supplementary Data. Available online: ftp://160.217.215.251/pig (accessed on 17 October 2016).
© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

