Point Divergence Gain and Multidimensional Data Sequences Analysis

We introduce novel information-entropic variables, namely a Point Divergence Gain ($\Omega_\alpha^{(l\to m)}$), a Point Divergence Gain Entropy ($I_\alpha$), and a Point Divergence Gain Entropy Density ($P_\alpha$), which are derived from the Rényi entropy and describe spatio-temporal changes between two consecutive discrete multidimensional distributions. The behaviour of $\Omega_\alpha^{(l\to m)}$ is simulated for typical distributions and, together with $I_\alpha$ and $P_\alpha$, applied to the analysis and characterization of series of multidimensional datasets of computer-based and real images.


Introduction
Extracting information from raw data obtained from, e.g., a set of experiments is a challenging task. Quantifying the information gained from a single point of a time series, a pixel in an image, or a single measurement is important for understanding which points carry the most information about the underlying system. This task is especially delicate in time-series and image processing, because the information is stored not only in the elements themselves but also in the interactions between successive points of a time series. Similarly, when extracting information from an image, not all pixels have the same information content. This type of information is sometimes called local information, because it depends not only on the frequency of the phenomenon but also on the position of the element in the structure. The most important task is to identify the sources of information and to quantify them. Naturally, it is possible to use standard data-processing techniques based on quantities from information theory, e.g., the Kullback-Leibler divergence. However, their mathematical rigour typically comes at the cost of increased computational complexity. To this end, a simple quantity called the Point Information Gain and its related macroscopic variables, the Point Information Gain Entropy and the Point Information Gain Entropy Density, were introduced in [1]. In [2], the mathematical properties of the Point Information Gain were extensively discussed and applications to real-image data processing were pointed out. From the mathematical point of view, the Point Information Gain represents the change of information after removing one element of a particular phenomenon from a distribution. The method is based on the Rényi entropy, which has already been used extensively in multifractal analysis and data processing (see, e.g., Refs. [2-5] and references therein).
In this article, we introduce a variable analogous to the Point Information Gain. This new variable locally determines the information change after an exchange of a given element in a discrete set. We use a simple concept of entropy difference between the original set and the set with the exchanged element. The resulting value is called the Point Divergence Gain $\Omega_\alpha^{(l\to m)}$ [6,7]. The main idea is to describe the importance of changes in a series of images (typically representing a video record of an experiment) and to extract the most important information from it. Similar to the Point Information Gain Entropy and the Point Information Gain Entropy Density, macroscopic variables called the Point Divergence Gain Entropy $I_\alpha$ and the Point Divergence Gain Entropy Density $P_\alpha$ are defined to characterize subsequent changes in a multidimensional discrete distribution by a single number. The goal of this article is to examine and demonstrate some properties of these variables and to use them for the examination of time-spatial changes of information in sets of discrete multidimensional data, namely series of images in image processing and analysis, after the exchange of a pixel of a particular intensity for the pixel at the same position in the consecutive image. The main reason for choosing the Point Divergence Gain as the relevant quantity for the analysis of spatio-temporal changes is that it represents the information gain of each pixel change. One can also consider model-based approaches based on the theory of random fields, which can be more predictive in some cases. On the other hand, the model-free approach based on entropy typically provides more relevant information for real data, for which it is often difficult to find an appropriate model. For an overview of model-based approaches in random field theory, see, e.g., Refs. [8-10].
The paper is organized as follows: in Section 2, we define the main quantity of the paper, i.e., the Point Divergence Gain, together with the related quantities, and discuss their theoretical properties. In Section 3, we show applications of the Point Divergence Gain to image processing for both computer-based and real sequences of images. We show that the Point Divergence Gain can be used as a measure of difference for clustering methods and that it detects the most prominent behaviour of a system. In Section 4, we explain the presented methods and the finer technical details necessary for the analysis, including algorithms. Section 5 is dedicated to conclusions. All image data, scripts for histogram processing, and the Image Info Extractor Professional software for image processing are available via sftp://160.217.215.193:13332/pdg (user: anonymous; password: anonymous).

Point Divergence Gain
Recently, a quantity called the Point Information Gain (PIG, $\Gamma_\alpha^{(i)}$) [6,7] and its generalization based on the Rényi entropy [2] have been introduced. We show how to apply the concept of PIG to sequences of multidimensional data frames.
Let us assume a set of variables with k possible outcomes (e.g., possible colours of each pixel).
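The $\Gamma_\alpha^{(i)}$ is a simple variable based on an entropy difference, which enables us to quantify the information gain of each phenomenon. It is defined as the difference between the entropy of a distribution in which one occurrence of the $i$-th phenomenon is omitted and the entropy of the original discrete distribution

$$P = \{p_j\}_{j=1}^{k}, \qquad p_j = \frac{n_j}{n}, \qquad n = \sum_{j=1}^{k} n_j, \tag{1}$$

which typically describes a frequency histogram of the possible outcomes ($n_j$ is the number of occurrences of the $j$-th outcome). The distribution in which one occurrence of the $i$-th phenomenon is omitted reads

$$P^{(i)} = \{p_j^{(i)}\}_{j=1}^{k}, \qquad p_j^{(i)} = \begin{cases} \dfrac{n_j}{n-1}, & j \neq i, \\[4pt] \dfrac{n_i - 1}{n-1}, & j = i. \end{cases} \tag{2}$$

Thus, the Point Information Gain is defined as

$$\Gamma_\alpha^{(i)} = H_\alpha\big(P^{(i)}\big) - H_\alpha(P), \tag{3}$$

where $H_\alpha$ is the Rényi entropy

$$H_\alpha(P) = \frac{1}{1-\alpha} \ln \sum_{j=1}^{k} p_j^{\alpha}. \tag{4}$$

(Although all computer implementations use $\log_2$, the following derivations are written with the natural logarithm, $\ln$.) The Rényi entropy represents a one-parameter class of information quantities that is tightly related to multifractal dynamics and enables us to focus on certain parts of the distribution [11]. Unlike the commonly used Rényi relative entropy [3,4,11-17], the Point Information Gain $\Gamma_\alpha^{(i)}$ is a simple, computationally tractable quantity. Its mathematical properties have been discussed extensively in [2].

On the same basis, we can define a Point Divergence Gain (PDG, $\Omega_\alpha^{(l\to m)}$), where the discrete distribution $P^{(i)}$ is replaced by a distribution obtained from the original distribution $P$ by removing one occurrence of the examined $l$-th phenomenon ($n_l \in \mathbb{N}^+$) and adding one occurrence of the $m$-th phenomenon ($n_m \in \mathbb{N}_0$), $m \neq l$:

$$P^{(l\to m)} = \{p_j^{(l\to m)}\}_{j=1}^{k}, \qquad p_j^{(l\to m)} = \begin{cases} \dfrac{n_j}{n}, & j \notin \{l, m\}, \\[4pt] \dfrac{n_l - 1}{n}, & j = l, \\[4pt] \dfrac{n_m + 1}{n}, & j = m. \end{cases} \tag{5}$$

The main idea behind the definition is to quantify the information change in the subsequent image if only one point is changed. Analogous to the Point Information Gain $\Gamma_\alpha^{(i)}$, the Point Divergence Gain can be defined as

$$\Omega_\alpha^{(l\to m)} = H_\alpha\big(P^{(l\to m)}\big) - H_\alpha(P). \tag{6}$$

Let us first show its connection to the Point Information Gain $\Gamma_\alpha^{(i)}$. Since $P^{(l)} = P^{(l\to m, m)}$ (removing one occurrence of the $m$-th phenomenon from $P^{(l\to m)}$ gives the same distribution as removing one occurrence of the $l$-th phenomenon from $P$), the Point Divergence Gain can be expressed as

$$\Omega_\alpha^{(l\to m)} = \Gamma_\alpha^{(l)}(P) - \Gamma_\alpha^{(m)}\big(P^{(l\to m)}\big). \tag{7}$$

Let us now investigate the mathematical properties of the PDG.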
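To make the definitions concrete, the following Python sketch computes the Rényi entropy of a histogram, the PIG (entropy after omitting one occurrence minus the original entropy), and the PDG (entropy after exchanging one occurrence of $l$ for one of $m$ minus the original entropy) directly as entropy differences. It also checks the decomposition $\Omega_\alpha^{(l\to m)} = \Gamma_\alpha^{(l)}(P) - \Gamma_\alpha^{(m)}(P^{(l\to m)})$. This is an illustration under our own assumptions (histograms as plain lists of counts, natural logarithm, hypothetical helper names such as `exchange_one`), not the authors' implementation.

```python
import math


def renyi_entropy(counts, alpha):
    """Rényi entropy H_alpha(P) of a histogram (natural log).

    Probabilities are n_j / n with n = sum(counts); empty bins are skipped.
    For alpha == 1 the Shannon limit is returned instead."""
    n = sum(counts)
    probs = [c / n for c in counts if c > 0]
    if alpha == 1:
        return -sum(p * math.log(p) for p in probs)
    return math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def remove_one(counts, i):
    """Histogram of P^(i): one occurrence of phenomenon i omitted."""
    out = list(counts)
    out[i] -= 1
    return out


def exchange_one(counts, l, m):
    """Histogram of P^(l->m): one occurrence of l replaced by one occurrence of m."""
    out = list(counts)
    out[l] -= 1
    out[m] += 1
    return out


def pig(counts, i, alpha):
    """Point Information Gain: H_alpha(P^(i)) - H_alpha(P)."""
    return renyi_entropy(remove_one(counts, i), alpha) - renyi_entropy(counts, alpha)


def pdg(counts, l, m, alpha):
    """Point Divergence Gain: H_alpha(P^(l->m)) - H_alpha(P)."""
    return renyi_entropy(exchange_one(counts, l, m), alpha) - renyi_entropy(counts, alpha)


h = [5, 3, 1, 0]                     # toy histogram with n = 9 elements
print(pdg(h, 2, 0, 2.0))             # rare -> frequent: negative
print(pdg(h, 0, 2, 2.0))             # frequent -> rare: positive
# Decomposition: Omega^(l->m) = Gamma^(l)(P) - Gamma^(m)(P^(l->m))
print(pig(h, 2, 2.0) - pig(exchange_one(h, 2, 0), 0, 2.0))
```

The last two printed values coincide, and the signs agree with the discussion of rare and frequent points in the section on the Point Divergence Gain Entropy.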
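The $\Omega_\alpha^{(l\to m)}$ can be rewritten as

$$\Omega_\alpha^{(l\to m)} = \frac{1}{1-\alpha} \ln \frac{\sum_{j=1}^{k} \big(p_j^{(l\to m)}\big)^{\alpha}}{\sum_{j=1}^{k} p_j^{\alpha}}. \tag{8}$$

By plugging the relative frequencies from Equations (1) and (5) into Equation (8), we obtain

$$\Omega_\alpha^{(l\to m)} = \frac{1}{1-\alpha} \ln \frac{\sum_{j=1}^{k} n_j^{\alpha} - n_l^{\alpha} - n_m^{\alpha} + (n_l - 1)^{\alpha} + (n_m + 1)^{\alpha}}{\sum_{j=1}^{k} n_j^{\alpha}}. \tag{9}$$

As seen in Equation (9), the variable $\Omega_\alpha^{(l\to m)}$ does not depend (contrary to $\Gamma_\alpha^{(i)}$) on $n$, but only on the number of elements $n_j$ of each phenomenon $j$. In Equation (9), let us denote the sum $\sum_{j=1}^{k} n_j^{\alpha}$, which is constant for the original distribution (histogram) of elements and the parameter $\alpha$, as $C_\alpha$. This gives the final form

$$\Omega_\alpha^{(l\to m)} = \frac{1}{1-\alpha} \ln\left[1 + \frac{(n_l - 1)^{\alpha} - n_l^{\alpha} + (n_m + 1)^{\alpha} - n_m^{\alpha}}{C_\alpha}\right]. \tag{10}$$

Equation (10) demonstrates that, for a particular distribution, $\Omega_\alpha^{(l\to m)}$ is a function only of the parameter $\alpha$ and of the frequencies of occurrence $n_l$ and $n_m$ in the original distribution of the two phenomena between which the element is exchanged. Equation (10) further shows that if the exchange occurs between phenomena $l$ and $m$ of the same (or similar) frequencies of occurrence (i.e., $n_l \approx n_m$), the value of $\Omega_\alpha^{(l\to m)}$ is close to zero, because the increments $(n_l - 1)^{\alpha} - n_l^{\alpha}$ and $(n_m + 1)^{\alpha} - n_m^{\alpha}$ nearly cancel. In the 3D plots of Figure 1, we demonstrate $\Omega_\alpha^{(l\to m)}$ for typical distributions.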
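A minimal Python sketch of the closed form in Equation (10) follows. It assumes histograms given as lists of counts, the natural logarithm, and the convention that empty bins contribute nothing to $C_\alpha$ (so that $\alpha = 0$ counts only occupied bins). The function names are hypothetical. `pdg_direct` recomputes the same value as a difference of two Rényi entropies, so the two routes can be compared.

```python
import math


def _pw(x, alpha):
    """n^alpha for a bin count n, with empty bins contributing 0 (also for alpha = 0)."""
    return float(x) ** alpha if x > 0 else 0.0


def pdg_closed_form(counts, l, m, alpha):
    """Omega_alpha^(l->m) from Eq. (10); only n_l, n_m and C_alpha are needed (alpha != 1)."""
    if l == m:
        return 0.0                                    # nothing is exchanged
    c_alpha = sum(_pw(c, alpha) for c in counts)      # C_alpha = sum_j n_j^alpha
    n_l, n_m = counts[l], counts[m]
    delta = _pw(n_l - 1, alpha) - _pw(n_l, alpha) + _pw(n_m + 1, alpha) - _pw(n_m, alpha)
    return math.log(1.0 + delta / c_alpha) / (1.0 - alpha)


def pdg_direct(counts, l, m, alpha):
    """Omega_alpha^(l->m) as H_alpha(P^(l->m)) - H_alpha(P), for comparison."""
    def renyi(cs):
        n = sum(cs)
        return math.log(sum((c / n) ** alpha for c in cs if c > 0)) / (1.0 - alpha)
    after = list(counts)
    after[l] -= 1
    after[m] += 1
    return renyi(after) - renyi(counts)


h = [7, 4, 2, 1, 0]
for alpha in (0.0, 0.5, 2.0, 3.0):
    print(alpha, pdg_closed_form(h, 0, 3, alpha), pdg_direct(h, 0, 3, alpha))
```

Once $C_\alpha$ is known, Equation (10) needs only two bin counts per exchange. This is why the PDG can be cheap to evaluate for every pixel of an image.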

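Algorithm 1: Calculation of a point divergence gain matrix ($\Omega_\alpha$) for typical histograms.
Input: n-bin histogram h; α, where α ≥ 0 ∧ α ≠ 1
Output: $\Omega_\alpha$

```matlab
n = numel(h);                 % number of histogram bins
Omega_alpha = zeros(n, n);    % create a zero square matrix Omega_alpha of size n x n
C_alpha = sum(h.^alpha);      % calculate the constant C_alpha for the given distribution and alpha
```

Now we will consider the specific case $\alpha = 2$ (collision entropy), for which Equation (10) simplifies to

$$\Omega_2^{(l\to m)} = -\ln\left[1 + \frac{2\,(n_m - n_l + 1)}{C_2}\right]. \tag{11}$$

For a specific difference $\Delta n^{(l\to m)} = n_m - n_l = D$, Equation (11) can be approximated by the first-order Taylor series

$$\Omega_2^{(l\to m)} \approx -\frac{2\,(D + 1)}{C_2}. \tag{12}$$

Equations (11) and (12) show that, for each unique $\Delta n^{(l\to m)}$, $\Omega_2^{(l\to m)}$ depends only on the difference between the occupancies of the bins $l$ and $m$ between which the element is exchanged, and this dependence is almost linear. In other words, this explains why, for all distributions in Figure 2, …

For $\alpha \to 1$, the Rényi entropy becomes the ordinary Shannon entropy [18], and we obtain (cf. Equation (4))

$$H_1(P) = -\sum_{j=1}^{k} \frac{n_j}{n} \ln \frac{n_j}{n} \tag{13}$$

and

$$H_1\big(P^{(l\to m)}\big) = -\sum_{j \notin \{l, m\}} \frac{n_j}{n} \ln \frac{n_j}{n} - \frac{n_l - 1}{n} \ln \frac{n_l - 1}{n} - \frac{n_m + 1}{n} \ln \frac{n_m + 1}{n}. \tag{14}$$

The difference of these entropies (cf. Equation (9)) gradually gives

$$\Omega_1^{(l\to m)} = \frac{1}{n}\Big[n_l \ln n_l - (n_l - 1) \ln (n_l - 1) + n_m \ln n_m - (n_m + 1) \ln (n_m + 1)\Big]. \tag{15}$$

One can see that relation (15) is defined for $n_l \in \mathbb{N} \setminus \{0, 1\}$ and $n_m \in \mathbb{N}^+$ and is approximately equal to 0 for $n_l, n_m \gg 0$ (the Cauchy and Rayleigh distributions for $\alpha = 1$ in Figure 3). For $n_l \in \mathbb{N}^+$ and $n_m \in \mathbb{N}_0$, Equation (10) further implies: …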
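The following Python sketch fills the whole matrix of Algorithm 1 from Equation (10), using $\log_2$ as in the computer implementations. Only the first steps of Algorithm 1 are given above, so the loop structure here is our own assumption. The sketch also checks two special cases discussed in the text. For $\alpha = 2$, Equation (10) reduces to $-\log_2[1 + 2(n_m - n_l + 1)/C_2]$. For $\alpha \to 1$, it approaches $(1/n)[n_l \log n_l - (n_l-1)\log(n_l-1) + n_m \log n_m - (n_m+1)\log(n_m+1)]$, which is extended here with the convention $0 \log 0 = 0$.

```python
import math


def _pw(x, alpha):
    """n^alpha with the convention that empty bins contribute nothing."""
    return float(x) ** alpha if x > 0 else 0.0


def pdg_matrix(h, alpha):
    """Matrix Omega[l][m] = Omega_alpha^(l->m) for an n-bin histogram h (alpha >= 0, alpha != 1).

    Rows of empty bins (h[l] == 0) and the diagonal stay zero; log2 is used."""
    n = len(h)
    omega = [[0.0] * n for _ in range(n)]             # zero square matrix n x n
    c_alpha = sum(_pw(x, alpha) for x in h)            # constant C_alpha
    for l in range(n):
        if h[l] == 0:
            continue                                   # nothing to remove from bin l
        for m in range(n):
            if m != l:                                 # exchange l -> l changes nothing
                delta = (_pw(h[l] - 1, alpha) - _pw(h[l], alpha)
                         + _pw(h[m] + 1, alpha) - _pw(h[m], alpha))
                omega[l][m] = math.log2(1.0 + delta / c_alpha) / (1.0 - alpha)
    return omega


def pdg_shannon(h, l, m):
    """Shannon limit of the PDG in log2, with the convention 0 log 0 = 0."""
    def xlog(x):
        return x * math.log2(x) if x > 0 else 0.0
    return (xlog(h[l]) - xlog(h[l] - 1) + xlog(h[m]) - xlog(h[m] + 1)) / sum(h)


h = [7, 4, 2, 1, 0]
om2 = pdg_matrix(h, 2.0)
c2 = sum(x * x for x in h)
# alpha = 2: Omega_2 = -log2(1 + 2 (n_m - n_l + 1) / C_2)
print(om2[0][3], -math.log2(1 + 2 * (h[3] - h[0] + 1) / c2))
# alpha close to 1 approaches the Shannon expression
print(pdg_matrix(h, 1 - 1e-6)[0][3], pdg_shannon(h, 0, 3))
```

Evaluating at $\alpha = 1 - 10^{-6}$ instead of exactly 1 avoids the $1/(1-\alpha)$ singularity of Equation (10) while staying numerically close to the Shannon limit.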

Point Divergence Gain Entropy and Point Divergence Gain Entropy Density
In this section, we introduce two new variables that help us investigate changes between two (typically consecutive) points of a time series. A typical example is video processing, where each element of a time or spatial series is represented by a frame. Let us have two data frames $I_a = \{a_1, \dots, a_n\}$ and $I_b = \{b_1, \dots, b_n\}$. (For simplicity, we use only one index, which corresponds to a one-dimensional frame. In the case of images, we typically have two-dimensional frames, and the elements are described by two indices, e.g., the x and y positions.) At each position $i \in \{1, \dots, n\}$, it is possible to replace the value $a_i$ by the value of the following frame, i.e., $b_i$. The resulting $\Omega_\alpha^{(a_i\to b_i)}$ then quantifies how much information is gained or lost when, at the $i$-th position, we replace the value $a_i$ with the value $b_i$. A Point Divergence Gain Entropy (PDGE, $I_\alpha$) is defined as the sum of the absolute values of the PDGs of all pixels, i.e.,

$$I_\alpha(I_a; I_b) = \sum_{i=1}^{n} \left|\Omega_\alpha^{(a_i\to b_i)}\right| = \sum_{l,m} n_{lm} \left|\Omega_\alpha^{(l\to m)}\right|,$$

where $n_{lm}$ denotes the number of substitutions $l \to m$ present when we transform $I_a \to I_b$.
The absolute value ensures that the contributions of the transformation of a rare point into a frequent point (negative $\Omega_\alpha$) and of a frequent point into a rare point (positive $\Omega_\alpha$) do not cancel each other, so that both contribute to the resulting PDGE. Typically, the appearance or disappearance of a rare point (and its replacement by a frequent value, typically the background colour) carries important information about the experiment. The PDGE can be understood as an absolute information change.
Moreover, it is possible to introduce another macroscopic quantity, the Point Divergence Gain Entropy Density (PDGED, $P_\alpha$), where we do not sum over all pixels but only over all realized transitions $l \to m$. Thus, the PDGED can be defined as

$$P_\alpha(I_a; I_b) = \sum_{l,m} \delta_{lm} \left|\Omega_\alpha^{(l\to m)}\right|,$$

where $\delta_{lm} = 1$ if $n_{lm} > 0$ and $\delta_{lm} = 0$ otherwise. If the aim is to assess the influence of elements of high occurrence on the time-spatial changes in the image series, it is recommended to use the PDGE, where each element is weighted by its number of occurrences. If the aim is to suppress the influence of these extreme values, it is better to compute the PDGED.
Let us consider a time series $V$, where each time step contains one frame, so that $V = \{I_1, I_2, \dots\}$. The series $V$ can be, e.g., a sequence of images (a video) obtained from an experiment. For each time step, it is possible to calculate $I_\alpha(t) = I_\alpha(I_t; I_{t+s})$ and $P_\alpha(t) = P_\alpha(I_t; I_{t+s})$, where $s$ is the time lag. Typically, we assume $s = 1$, i.e., consecutive frames with a constant time step.
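The PDGE, the PDGED, and their time-lagged series can be sketched in Python as follows. This sketch makes three assumptions, none of which the text spells out. First, each $\Omega_\alpha^{(a_i\to b_i)}$ is evaluated on the histogram of the first frame $I_a$. Second, frames are flat sequences of equal length. Third, $\log_2$ is used. All function names are hypothetical.

```python
import math
from collections import Counter


def _pw(x, alpha):
    """n^alpha with the convention that empty bins contribute nothing."""
    return float(x) ** alpha if x > 0 else 0.0


def pdg_from_hist(hist, l, m, alpha):
    """Omega_alpha^(l->m) from Eq. (10) for a Counter histogram (log2, alpha != 1)."""
    if l == m:
        return 0.0
    c_alpha = sum(_pw(v, alpha) for v in hist.values())
    n_l, n_m = hist[l], hist[m]                       # a missing key counts as 0
    delta = _pw(n_l - 1, alpha) - _pw(n_l, alpha) + _pw(n_m + 1, alpha) - _pw(n_m, alpha)
    return math.log2(1.0 + delta / c_alpha) / (1.0 - alpha)


def pdge(frame_a, frame_b, alpha):
    """I_alpha(I_a; I_b): sum of |Omega| over all pixels, i.e. weighted by n_lm."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must contain the same number of elements")
    hist = Counter(frame_a)                            # assumption: histogram of I_a
    transitions = Counter(zip(frame_a, frame_b))       # n_lm for each realized l -> m
    return sum(n_lm * abs(pdg_from_hist(hist, l, m, alpha))
               for (l, m), n_lm in transitions.items())


def pdged(frame_a, frame_b, alpha):
    """P_alpha(I_a; I_b): every realized transition l -> m counted only once."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must contain the same number of elements")
    hist = Counter(frame_a)
    return sum(abs(pdg_from_hist(hist, l, m, alpha)) for l, m in set(zip(frame_a, frame_b)))


def pdg_series(frames, alpha, s=1):
    """I_alpha(t) and P_alpha(t) for t = 0 .. len(frames) - s - 1 with time lag s."""
    pairs = [(frames[t], frames[t + s]) for t in range(len(frames) - s)]
    return [pdge(a, b, alpha) for a, b in pairs], [pdged(a, b, alpha) for a, b in pairs]


# 2-D images would be flattened first, e.g. [v for row in image for v in row]
a = [0, 0, 1, 1, 2, 2, 2, 2]
b = [1, 1, 0, 0, 2, 2, 2, 2]          # 0 -> 1 twice, 1 -> 0 twice, 2 -> 2 four times
print(pdge(a, b, 2.0), pdged(a, b, 2.0))
```

In this toy pair, every realized transition occurs exactly twice (the unchanged pixels contribute zero), so $I_\alpha$ is exactly twice $P_\alpha$. This illustrates the weighting by $n_{lm}$ that distinguishes the two quantities.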

Application of Point Divergence Gain and Its Entropies in Image Processing
The generalized Point Divergence Gain $\Omega_\alpha^{(l\to m)}$ in Equation (10) was originally used for the characterization of dynamic changes in image series, namely in z-stacks of raw RGB data of unmodified live cells obtained by scanning along the z-axis using video-enhanced digital bright-field transmission microscopy [6,7]. In these two references, this new mathematical approach uses the 8- and 12-bit intensity histograms of two consecutive images for a pixel-by-pixel, intensity-weighted (parameterized) subtraction of these images in order to suppress camera-based noise and to enhance image contrast. (In the case of calibrated digital camera-based images, where the value of each point of the image reflects the number of incident photons, or in the case of computer-based images, a simple subtraction can be sufficient for the evaluation of time-spatial changes in the image series.)
For this paper, we chose other (grayscale) digital image series (Table 1) in order to demonstrate further applications of the PDG approach in image processing and analysis. Moreover, we newly introduce applications of the additive macroscopic variables, the Point Divergence Gain Entropy $I_\alpha$ and the Point Divergence Gain Entropy Density $P_\alpha$.

Notes to Table 1:
a … [20-22] at 200 achievable states with an internal excitation of 10, and with phase-transition, internal-excitation, and external-neighbourhood kinds of noise of 0, 0.25, and 0.15, respectively.
b The microscopic series of a 6-µm standard microring (FocalCheck™, cat. No. F36909, Life Technologies™ (Eugene, OR, USA)) were acquired using the CellObserver microscope (Zeiss, Oberkochen, Germany) at the EMBL (Heidelberg, Germany). For both light processes, the green region of the visible spectrum was selected using an emission and a transmission optical filter, respectively. In the case of diffraction, the point spread function was separated and the background intensities were removed using Algorithm 1 in [7].
c The 12-bit depth was reduced using a Least Information Lost algorithm [23], which, by shifting the intensity bins, filled up all empty bins in the histogram obtained from the whole data series and rescaled these intensities between their minimal and maximal values.

Image Origin and Specification
Owing to the relation of $\Omega_\alpha^{(l\to m)}$ to the Rényi entropy, the macroscopic variables $I_\alpha$ and $P_\alpha$ can reveal the fractal character of images when plotted as the spectra $I_\alpha = f_I(\alpha)$ and $P_\alpha = f_P(\alpha)$. In the case of image multifractality, the dependence $I_\alpha = f_I(\alpha)$ or $P_\alpha = f_P(\alpha)$ shows a peak. In the case of unifractality, these dependences are monotonic. This is demonstrated in Figures 4 and 5. The simulated Belousov-Zhabotinsky reaction (Figure 4) is clearly of multifractal origin. This statement is further supported by the courses of the dependences $I_\alpha = f_I(\alpha)$ and $P_\alpha = f_P(\alpha)$, which show peaks with maxima at $\alpha \in (1, 2)$. In contrast, the pair of images in Figure 5 (moving toy cars) is a mixture of objects of different fractal origin. In this case, whereas the course of $f_I(\alpha)$ is monotonic and thus shows unifractal characteristics, the dependence $f_P(\alpha)$ has a maximum at $\alpha = 0.6$ and thus reveals some multifractal features in the image. This is because $P_\alpha$, in which each information contribution is counted only once, is more sensitive to phenomena that occur less frequently in the image. A monotonic course of $P_\alpha$ would be achieved only if a sequence of time-evolved Euclidean objects were transformed into the values $\Omega_\alpha^{(l\to m)}$.

Figure 4. The $I_\alpha$, $P_\alpha$, and $\Omega_\alpha$ for a pair of multifractal grayscale images. I. The $I_\alpha$ and $P_\alpha$ spectra; II. 8-bit visualization of the $\Omega_\alpha$ values for α = {0.99; 2.0}.
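One simple, hypothetical way to operationalize the peak-versus-monotonic criterion for the $I_\alpha$ and $P_\alpha$ spectra is to inspect the signs of successive differences of a spectrum sampled on an increasing α grid. The tolerance `tol`, the three labels, and the function names are our own assumptions. Measured spectra can be noisy and would need a less naive test.

```python
def spectrum_shape(values, tol=1e-12):
    """Classify an alpha-spectrum (e.g. I_alpha or P_alpha on an increasing alpha grid).

    'monotonic': the values never change direction (read as unifractal-like);
    'peak': they rise to an interior maximum and then fall (read as multifractal-like);
    'other': any other course. Steps with |difference| <= tol are treated as flat."""
    steps = [b - a for a, b in zip(values, values[1:])]
    signs = [1 if d > 0 else -1 for d in steps if abs(d) > tol]
    if len(set(signs)) <= 1:
        return "monotonic"
    if signs == sorted(signs, reverse=True):   # all rises come before all falls
        return "peak"
    return "other"


def peak_alpha(alphas, values):
    """Alpha at which the spectrum attains its maximum."""
    return alphas[max(range(len(values)), key=values.__getitem__)]


alphas = [0.5, 1.1, 1.5, 2.0, 3.0]
spectrum = [0.2, 0.5, 0.9, 0.6, 0.3]
print(spectrum_shape(spectrum), peak_alpha(alphas, spectrum))
```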
As mentioned in Section 2.2, the variables $I_\alpha$ and $P_\alpha$ measure the absolute information change between a pair of images and thus characterize the similarity between these images. Therefore, these variables can find practical use in auto-focusing in both light and electron digital microscopy. The in-focus object can be defined as the image with the global extreme of $I_\alpha$ or $P_\alpha$. In addition, this image fulfils the Nijboer-Zernike definition [24]: it is the smallest and darkest image in light or electron diffraction, or the smallest and brightest image in light fluorescence (Section 3.3).
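A minimal auto-focusing sketch follows. It assumes a z-stack of flattened frames and takes the maximum of $I_\alpha$ between consecutive slices at a low α. Here α = 0.5 is an arbitrary choice, loosely following the maxima at low α reported for the microring data in the clustering section. The returned index refers to the first slice of the pair. `focus_index` is a hypothetical helper, and a practical routine might instead look for a global minimum or use $P_\alpha$.

```python
import math
from collections import Counter


def _pw(x, alpha):
    """n^alpha with the convention that empty bins contribute nothing."""
    return float(x) ** alpha if x > 0 else 0.0


def pdge(frame_a, frame_b, alpha):
    """I_alpha between two flattened frames; Omega evaluated on the histogram of frame_a."""
    hist = Counter(frame_a)
    c_alpha = sum(_pw(v, alpha) for v in hist.values())
    total = 0.0
    for (l, m), n_lm in Counter(zip(frame_a, frame_b)).items():
        if l == m:
            continue                                   # unchanged pixels: Omega = 0
        n_l, n_m = hist[l], hist[m]
        delta = _pw(n_l - 1, alpha) - _pw(n_l, alpha) + _pw(n_m + 1, alpha) - _pw(n_m, alpha)
        total += n_lm * abs(math.log2(1.0 + delta / c_alpha) / (1.0 - alpha))
    return total


def focus_index(stack, alpha=0.5):
    """Index t of the consecutive pair (stack[t], stack[t+1]) with the largest I_alpha."""
    values = [pdge(stack[t], stack[t + 1], alpha) for t in range(len(stack) - 1)]
    return max(range(len(values)), key=values.__getitem__)


flat = [0] * 8
stack = [flat, flat, [0] * 7 + [1], list(range(8)), list(range(8))]
print(focus_index(stack))
```

In this toy stack, the largest information change occurs between the third and the fourth slice, where most pixels move into previously empty bins.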

Image Filtering and Segmentation
Segmentation is a type of filtering of specific features in an image. The parameter α and the related value of $\Omega_\alpha^{(l\to m)}$ … = 0 is lower. In digital light transmission microscopy, this mathematical method enabled us to find time-stable intracellular objects inside live mammalian cells from pixels of consecutive images that fulfilled the equality $\Omega_\alpha^{(l\to m)} = 0$ for α = 4.00 [6] or α = 5.00 [7]. In these cases, the high value of α ensured the merging of rare points in the image, suppressed the camera noise reflected in the images, and thus modelled the shape of the organelles. The rest of the image was excluded from observation. In a subsequent paper [25], this method was extended to widefield fluorescence data.
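The selection of pixels with $\Omega_\alpha^{(l\to m)} = 0$ can be sketched as a Boolean mask over two consecutive flattened frames. Unchanged pixels always qualify. According to Equation (10), $\Omega_\alpha^{(l\to m)}$ also vanishes whenever $n_m = n_l - 1$, because the two bins then simply swap their counts. The second usage line illustrates this case. The default α = 4 follows [6]; the tolerance, the function name, and the use of the first frame's histogram are assumptions.

```python
import math
from collections import Counter


def _pw(x, alpha):
    """n^alpha with the convention that empty bins contribute nothing."""
    return float(x) ** alpha if x > 0 else 0.0


def stable_mask(frame_a, frame_b, alpha=4.0, tol=1e-12):
    """True where |Omega_alpha^(a_i->b_i)| <= tol, computed on the histogram of frame_a."""
    hist = Counter(frame_a)
    c_alpha = sum(_pw(v, alpha) for v in hist.values())
    mask = []
    for l, m in zip(frame_a, frame_b):
        if l == m:
            mask.append(True)                          # unchanged pixel: Omega = 0
            continue
        n_l, n_m = hist[l], hist[m]
        delta = _pw(n_l - 1, alpha) - _pw(n_l, alpha) + _pw(n_m + 1, alpha) - _pw(n_m, alpha)
        omega = math.log2(1.0 + delta / c_alpha) / (1.0 - alpha)
        mask.append(abs(omega) <= tol)
    return mask


a = [0, 0, 0, 1, 1, 2]
print(stable_mask(a, [0, 0, 2, 1, 1, 2]))   # 0 -> 2 changes the information content
print(stable_mask(a, [0, 0, 1, 1, 1, 2]))   # 0 -> 1 with n_m = n_l - 1 keeps Omega at 0
```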
As in the case of the Point Information Gain [2], the segmentation of objects of a certain shape can be further improved by using the surroundings of this shape, from which the intensity histogram is created for each pixel in the image.

Clustering of Image Sets
Finally, we used the Point Divergence Gain to detect the most relevant information contained in a sequence of images capturing, e.g., an experiment. To this end, we used $I_\alpha$ or $P_\alpha$ as measures of the information change between consecutive images and applied clustering methods to them. The values of $I_\alpha$ and $P_\alpha$ are small numbers (Section 2.1). Owing to the rounding of these small numbers in computation, and for a better characterization of image multifractality, we use the α-dependent spectra of these variables in clustering rather than a single number at one α.
The dependence of the cluster label on the order of the image in the series is smoothest for the joint vectors $[I_\alpha, P_\alpha]$. The similarity of these vectors (and thus of the images) is described in the space of principal components (e.g., [26]) and classified by standard clustering algorithms such as the k-means++ algorithm [27]. In comparison with the entropies and entropy densities related to $\Gamma_\alpha^{(i)}$, clustering using $I_\alpha$ and $P_\alpha$ is more sensitive to changes in patterns (intensities) and does not require an additional specification of images by local entropies computed from a specific type of surroundings around each pixel.
The described clustering method was examined on z-stacks obtained using light microscopy. The z-stacks were classified into 2-6 clusters (groups), with the pattern of each image described by 26 numbers, i.e., by the vectors $[I_\alpha, P_\alpha]$ at 13 values of α (Figures 6a and 7a). These clusters were evaluated on the basis of the sizes of intensity changes between images. The five classification graphs of the gradually splitting clusters (Figures 6a and 7a, middle) further demonstrate the mutual similarity among the micrographs in each data series. The typical (middle) image of each cluster is shown in Figures 6b and 7b.
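The clustering pipeline can be sketched with Python and NumPy. It consists of z-score standardization of the feature columns, projection onto principal components, and k-means with k-means++ seeding and the squared Euclidean distance. Each row stands for one image and holds the 26 numbers $[I_\alpha, P_\alpha]$ at 13 values of α. The synthetic features, the number of retained components, and all function names are assumptions. A library implementation of k-means++ could replace the hand-written one.

```python
import numpy as np


def zscore_columns(x):
    """Standardize each feature column (one I_alpha or P_alpha value) across images."""
    x = np.asarray(x, dtype=float)
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0                                   # constant columns stay at 0
    return (x - x.mean(axis=0)) / sd


def pca_scores(x, n_components):
    """Project centred rows onto the leading principal components (via SVD)."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:n_components].T


def kmeans_pp(x, k, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding and squared Euclidean distance."""
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d2 = ((x[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        p = d2 / d2.sum() if d2.sum() > 0 else np.full(len(x), 1.0 / len(x))
        centers.append(x[rng.choice(len(x), p=p)])      # far points are more likely
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)


# rows = images, columns = [I_alpha at 13 alphas, P_alpha at 13 alphas] -> shape (T, 26)
features = np.vstack([np.zeros((3, 26)), np.ones((3, 26))]) + np.arange(6)[:, None] * 0.01
labels = kmeans_pp(pca_scores(zscore_columns(features), 2), k=2)
print(labels)
```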
First, we deal with a z-stack of 1057 images of a microring obtained using a widefield fluorescence microscope. The results of clustering illustrate the canonical, repetitive properties of the so-called point spread function as the image of the observed object moves into and out of focus. In this case, the image group containing the real focus, with the maximal $I_\alpha$ and $P_\alpha$ at low α (Section 3.1), is successfully determined by clustering into two clusters (Figure 6a). However, we will focus on describing the results for five clusters. The central Cluster 5 (94 images) can be called the object's focal region, with image levels where parts of the object have their own focus. The in-focus cluster is asymmetrically surrounded by Cluster 4 (131 and 53 images below and above Cluster 5, respectively), which was set on the basis of the occurrence of lower peaks of $I_\alpha$ and $P_\alpha$ at low α. Cluster 3 (190 and 150 images below and above the focus, respectively) is typical of constant $I_\alpha$ and $P_\alpha$ for all α. Cluster 2 contains images 176-214 and the last 126 images. These images are characteristic of constant $I_\alpha$ and decreasing/increasing $P_\alpha$ at α ≥ 2. Cluster 1 (the first 175 images) is predominantly characterized by increasing $I_\alpha$ and decreasing $P_\alpha$ at high α.
Before the calculation of $I_\alpha$ and $P_\alpha$, the undesirable background intensities were removed from the images obtained using optical transmission microscopy. The rest of each image was rescaled to 8 bits (Section 4.2). The results of clustering of these images (Figure 7a) are similar to those of the fluorescence data (Figure 6a). The light transmission point spread function is also symmetrical around its focus, but the pixels at the same x, y-positions below and above the focus have opposite (dark vs. bright) intensities. Furthermore, the transitional regions between the clusters are longer than for the fluorescence data. The central, in-focus part of the z-stack (images 427-561 in Cluster 4), with the highest peaks of $I_\alpha$ and $P_\alpha$, is unambiguously separated using four clusters. The focus itself lies at the 505th image. This central part of the z-stack is surrounded by eight groups of images, which were, owing to their similarity, objectively classified into three clusters. Cluster 1 was formed by images 1-78, 376-426, and 562-661. These images show peaks of middle values of $I_\alpha$ and $P_\alpha$. Images 79-153, 292-375, and 662-703 were classified into Cluster 2 (dominated by the local minimum of $I_\alpha$ at α < 1). Cluster 3 is related to the images with the lowest values of $I_\alpha$ together with the lowest values and local peaks of $P_\alpha$ for α < 1 and for α < 1, respectively. This cluster contains images 154-291 and the last 537 images of the series. Note that, in the clustering process, $I_\alpha$ and $P_\alpha$ can recognize outliers such as incorrectly saved images or images with illumination artifacts.
In Figure 1, the Cauchy and Lévy distributions with c = 7 and the Gaussian distribution with parameters c = 10 and σ = 10 are depicted.

Image Processing and Analysis
Image analysis based on the calculation of $\Omega_\alpha^{(l\to m)}$, $I_\alpha$, and $P_\alpha$ is demonstrated on five standard grayscale multi-image series (Table 1). All images were processed using the Whole Image mode of the Image Info Extractor Professional software (Institute of Complex Systems, FFPW, USB, Nové Hrady, Czech Republic). A pair of images 5000-5001 of a simulated Belousov-Zhabotinsky (BZ) reaction and a pair of images motion01.512-motion02.512 were recalculated for 40 values of α = {0.1, 0.2, ..., 0.9, 0.99, 1.1, 1.2, ..., 4.0}. The remaining series were processed for 13 values of α = {0.1, 0.3, 0.5, 0.7, 0.99, 1.3, 1.5, 1.7, 2.0, 2.5, 3.0, 3.5, 4.0}. The transformation at 13 values of α was followed by clustering of the matrices $[P_\alpha, I_\alpha]$ vs. image number using the k-means method (squared Euclidean distance metric). Owing to a high data variance in the BZ simulation, the clustering was preceded by z-score standardization of the matrices over α. The resulting cluster indices were relabelled to be consecutive (i.e., the first image of the series and the first image of the following group are assigned to groups 1 and 2, respectively, etc.).
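The two α grids and the relabelling of cluster indices into consecutive group numbers can be written down directly. The sketch below uses hypothetical helper names and is not part of the Image Info Extractor Professional software. Note that both grids use 0.99 in place of α = 1, where Equation (10) is singular.

```python
ALPHAS_13 = [0.1, 0.3, 0.5, 0.7, 0.99, 1.3, 1.5, 1.7, 2.0, 2.5, 3.0, 3.5, 4.0]
# 40 values: 0.1..0.9, then 0.99 instead of 1.0, then 1.1..4.0
ALPHAS_40 = ([round(0.1 * i, 1) for i in range(1, 10)] + [0.99]
             + [round(1.0 + 0.1 * i, 1) for i in range(1, 31)])


def relabel_consecutive(labels):
    """Renumber cluster labels 1, 2, ... in the order of their first appearance in the series."""
    mapping = {}
    for lab in labels:
        mapping.setdefault(lab, len(mapping) + 1)
    return [mapping[lab] for lab in labels]


print(len(ALPHAS_13), len(ALPHAS_40))
print(relabel_consecutive([2, 2, 0, 0, 2, 1]))
```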

Conclusions
In this paper, we derived novel variables from the Rényi entropy: the Point Divergence Gain $\Omega_\alpha^{(l\to m)}$, the Point Divergence Gain Entropy $I_\alpha$, and the Point Divergence Gain Entropy Density $P_\alpha$. We have discussed their theoretical properties and made a brief comparison with the related quantity called the Point Information Gain $\Gamma_\alpha^{(i)}$ [2]. Moreover, we have shown that $\Omega_\alpha^{(l\to m)}$ and the related quantities can find applications in multidimensional data analysis, particularly in video processing. Owing to the element-by-element computation, we can characterize time-spatial (4-D) changes much more sensitively than with, e.g., the previously derived $\Gamma_\alpha^{(i)}$.
The $\Omega_\alpha^{(l\to m)}$ can be considered a microstate of the information changes in space-time. At the same time, $\Omega_\alpha^{(l\to m)}$, $I_\alpha$, and $P_\alpha$ share a property with $\Gamma_\alpha^{(i)}$ and its related macroscopic variables: owing to their derivation from the Rényi entropy, they are good descriptors of multifractality. Therefore, they can be used to characterize patterns in datasets and to classify (sub)data into groups with similar properties. This has been successfully applied in the clustering of multi-image sets, image filtration, and image segmentation, namely in microscopic digital imaging.