Fractional Order Generalized Information

This paper formulates a novel expression for entropy inspired by the properties of Fractional Calculus. The characteristics of the generalized fractional entropy are tested both on standard probability distributions and on real-world data series. The results reveal that tuning the fractional order allows a high sensitivity to the signal evolution, which is useful in describing the dynamics of complex systems. The concepts are also extended to relative distances and tested with several sets of data, confirming the goodness of the generalization.

The generalized concepts motivate further developments, and new research avenues emerge. Bearing these ideas in mind, the present study combines both concepts and is organized as follows. Section 2 introduces entropy and fractional calculus in order to formulate the new generalized fractional entropy. Section 3 applies the new index to several types of data, namely two mathematically induced series, the digits of the number π [22] and the Weierstrass function; two financial time series, the Dow Jones Industrial Average and the Europe Brent Spot Price [23,24]; and one genomic series, the Human chromosome Y [25]. The results are analysed and distinct entropy formulations, for several fractional orders, are compared. Section 4 extends the proposed index to the concept of distance. The Kullback-Leibler and Jensen-Shannon divergence measures are revisited and rewritten in the light of the fractional perspective. The performance of the index is tested with two sets of data, namely 13 irrational numbers and the complete set of 24 Human chromosomes, adopting the fractional order that reveals the higher sensitivity. Finally, Section 5 outlines the main conclusions.

Fractional Generalization of Entropy
Information theory was developed by Claude Shannon in 1948 [26,27] and has been applied in many scientific areas. The fundamental cornerstone is the information content of some event having probability of occurrence $p_i$:

$$I(p_i) = -\ln p_i \tag{1}$$

The expected value, called Shannon entropy [28,29], becomes:

$$S = E(-\ln p) = \sum_i (-\ln p_i)\, p_i \tag{2}$$

where $E(\cdot)$ denotes the expected value operator. Expression (2) obeys the four Khinchin axioms [30,31], and several generalizations of entropy have been proposed that obey only a subset of them.
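The information content and its expected value can be sketched in a few lines. This is a minimal illustration rather than code from the paper: the function names `information` and `shannon_entropy` and the two test distributions are assumptions, and natural logarithms are used as in the text.

```python
import math

def information(p: float) -> float:
    """Information content I(p) = -ln p of an event with probability p."""
    return -math.log(p)

def shannon_entropy(probs) -> float:
    """Shannon entropy: expected value of I(p_i); zero-probability events are skipped."""
    return sum(p * information(p) for p in probs if p > 0.0)

# Two illustrative distributions over ten outcomes (hypothetical data).
uniform = [0.1] * 10
skewed = [0.9] + [0.1 / 9] * 9

print(shannon_entropy(uniform))  # ten equiprobable outcomes: ln(10)
print(shannon_entropy(skewed))   # a more predictable source has lower entropy
```

The uniform case attains the largest value among distributions over ten outcomes, which is why it is a convenient reference when comparing generalized indices.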
Recently, Ubriaco brought together information theory and Fractional Calculus (FC) and proposed [32] the expression:

$$S_q = \sum_i (-\ln p_i)^q\, p_i \tag{3}$$

where 0 ≤ q ≤ 1 denotes the "order", so that q = 1 yields Expression (2). This formulation obeys the same properties as the Shannon entropy except additivity, and is the expected value of the information content given by:

$$I_q(p_i) = (-\ln p_i)^q \tag{4}$$

It is well known in FC that a power function is adopted for obtaining intermediate values, that is, for "fractionating" classical integer operators. In brief, the Laplace transform of the fractional derivative of order α ∈ R of a signal $x(t)$ with zero initial conditions is given by:

$$\mathcal{L}\{D^\alpha x(t)\} = s^\alpha \mathcal{L}\{x(t)\} \tag{5}$$

where $t$ represents time, and $\mathcal{L}\{\cdot\}$ and $s$ denote the Laplace operator and variable, respectively. This property motivated the approximation of fractional derivatives by expanding the factor $s^\alpha$ with both the Fourier and the Z transforms [33,34]. However, this use of a power function is tied to transforms, and we can design a distinct fractional approach for information and entropy. In fact, we can think of Shannon information $I(p_i) = -\ln p_i$ as lying between the cases $D^{-1} I(p_i) = p_i (1 - \ln p_i)$ and $D^{1} I(p_i) = -\frac{1}{p_i}$, which, in the perspective of FC, leads to the proposal of information and entropy of order α ∈ R given by [35]:

$$I_\alpha(p_i) = D^\alpha I(p_i) = -\frac{p_i^{-\alpha}}{\Gamma(1-\alpha)} \left[ \ln p_i + \psi(1) - \psi(1-\alpha) \right] \tag{6}$$

$$S_\alpha = \sum_i \left\{ -\frac{p_i^{-\alpha}}{\Gamma(1-\alpha)} \left[ \ln p_i + \psi(1) - \psi(1-\alpha) \right] \right\} p_i \tag{7}$$

where $\Gamma(\cdot)$ and $\psi(\cdot)$ represent the gamma and digamma functions. Expression (7) fails to obey some of the Khinchin axioms, with the exception of the case α = 0, which leads to the classical Shannon entropy. This behaviour is in line with what occurs in FC, where fractional derivatives fail to obey some of the properties of integer-order operators. In other words, in both cases, by generalizing operators we lose some classical properties.
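The two fractional information measures can be compared numerically with the sketch below, which uses only the Python standard library. Several points are assumptions rather than details from the paper: the names `digamma`, `info_q`, `info_alpha` and `entropy` are hypothetical, the digamma function is approximated by a recurrence plus an asymptotic series because the standard library does not provide it, and the order-α information is taken as the order-α derivative of −ln p with prefactor 1/Γ(1 − α), a form chosen because it reproduces the two cases α = −1 and α = 1 quoted in the text. The sketch is restricted to α < 1, where ψ(1 − α) is finite.

```python
import math

def digamma(x: float) -> float:
    """Approximate psi(x) for x > 0: shift upward by recurrence, then asymptotic series."""
    if x <= 0.0:
        raise ValueError("this sketch only handles x > 0")
    acc = 0.0
    while x < 6.0:               # psi(x) = psi(x + 1) - 1/x
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return acc + math.log(x) - 0.5 / x - series

def info_q(p: float, q: float) -> float:
    """Power-law information of order q: (-ln p)^q."""
    return (-math.log(p)) ** q

def info_alpha(p: float, alpha: float) -> float:
    """Order-alpha information, assumed form with prefactor 1/Gamma(1 - alpha); needs alpha < 1."""
    bracket = math.log(p) + digamma(1.0) - digamma(1.0 - alpha)
    return -(p ** -alpha) / math.gamma(1.0 - alpha) * bracket

def entropy(probs, info, order: float) -> float:
    """Expected value of the chosen information measure."""
    return sum(p * info(p, order) for p in probs if p > 0.0)

uniform = [0.1] * 10  # hypothetical test distribution
for order in (0.0, 0.25, 0.5, 0.75):
    print(order, entropy(uniform, info_q, order), entropy(uniform, info_alpha, order))
```

Passing the information measure as an argument keeps the expected-value step identical for both indices, so any difference in the printed values comes from the information function alone.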
Figure 1 shows the locus of $I_q$ versus (q, p), 0 ≤ q ≤ 1, and $I_\alpha$ versus (α, p), −1 ≤ α ≤ 1. We observe that $I_q$ has a smaller amplitude excursion than $I_\alpha$. Moreover, we verify that $I_\alpha$ takes not only positive, but also negative values for α > 0. Expression (6) therefore admits negative information, which, for a given value of α > 0, can be interpreted as derived from "misleading events". While exploring the concept of "deception" is not the objective of the present paper, we should note that related ideas were addressed, in abstract terms, in the scope of negative probabilities [36-41] and, in practical terms, in the scope of robotics [42]. In short, we can say that the parameter α allows us to tune the level of confidence of the information, varying from positive (trustworthy) to negative (deceptive) information. In order to illustrate the behaviour of the new index and to compare the two approaches, Figure 2 depicts the entropies $S_q$ and $S_\alpha$ for the uniform, Poisson (α = 2), geometric (p = 0.3), binomial (p = 0.3), and Benford probability distributions. We verify that $S_q$ has a much smaller variation with q than $S_\alpha$ has with α. There is a large similarity between the shape of the curves for 0 < q ≤ 1 and −0.5 < α ≤ 0. This is natural, since $S_q$ tends to the traditional entropy when q → 1, while $S_\alpha$ tends to the traditional entropy when α → 0.
Furthermore, we verify that $S_\alpha$ has maxima for 0.07 < α < 0.23 and reaches null values for 0.62 < α < 0.68. Therefore, in a practical application we can adopt values of α in the first range if the information is reliable, or values of α in the second range if the data contain misleading information. Usually it is of interest to investigate the evolution for a binary distribution, which in our case leads to the expressions:

$$S_q^{bin} = p\,(-\ln p)^q + (1-p)\left[-\ln(1-p)\right]^q \tag{8}$$

$$S_\alpha^{bin} = -\frac{p^{1-\alpha}}{\Gamma(1-\alpha)} \left[ \ln p + \psi(1) - \psi(1-\alpha) \right] - \frac{(1-p)^{1-\alpha}}{\Gamma(1-\alpha)} \left[ \ln(1-p) + \psi(1) - \psi(1-\alpha) \right] \tag{9}$$

Figure 3 shows the locus of $S_q^{bin}$ and $S_\alpha^{bin}$ versus (q, p), 0 ≤ q ≤ 1, and (α, p), −1 ≤ α ≤ 1, respectively. In both cases we have a symmetrical variation relative to p = 0.5, but $S_q^{bin}$ is less sensitive than $S_\alpha^{bin}$ to the variation of the order. In the case of $S_\alpha^{bin}$ we observe that the chart passes from convex to concave in the region of α = 0.5.
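The null values of $S_\alpha$ mentioned above can be located in closed form in the simplest case. The following worked equation is a sketch under two assumptions: the distribution is uniform over $N$ outcomes, $p_i = 1/N$ (the value of $N$ is illustrative), and the order-α information has the form $-\frac{p^{-\alpha}}{\Gamma(1-\alpha)}[\ln p + \psi(1) - \psi(1-\alpha)]$. The location of the zero does not depend on the prefactor, only on the bracketed term.

```latex
\begin{aligned}
S_\alpha^{\mathrm{unif}}
  &= \sum_{i=1}^{N} \left\{ -\frac{N^{\alpha}}{\Gamma(1-\alpha)}
     \left[ -\ln N + \psi(1) - \psi(1-\alpha) \right] \right\} \frac{1}{N} \\
  &= \frac{N^{\alpha}}{\Gamma(1-\alpha)}
     \left[ \ln N - \psi(1) + \psi(1-\alpha) \right], \\[4pt]
S_\alpha^{\mathrm{unif}} = 0
  &\iff \psi(1-\alpha) = \psi(1) - \ln N .
\end{aligned}
```

Since $\psi(1-\alpha)$ decreases monotonically from $\psi(1)$ towards $-\infty$ as α goes from 0 to 1, the condition has exactly one root in 0 < α < 1 for every $N > 1$. For $N = 10$ the right-hand side is approximately −2.880 and the root is α ≈ 0.64, which lies inside the range 0.62 < α < 0.68 quoted in the text.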

Application of the Generalized Entropy
This section applies $S_q$ and $S_\alpha$ to the mathematical constant π, the Weierstrass function, the Dow Jones Industrial Average (closing values) and the Europe Brent Spot Price (USD per barrel) financial time series, and one genomic series, the Human chromosome Y. The mathematical constant π is expanded in base 10, and each digit is considered separately in the series. For the Weierstrass function, $f(\xi) = \sum_{n=0}^{\infty} a^n \cos(b^n \pi \xi)$, the parameters a = 0.5 and b = 3 and the range −2 ≤ ξ ≤ 2 are adopted. The two financial series correspond to daily values during the period from 18 May 1987 to 14 March 2014. In the four cases we adopt a total of L = 7000 data values. For the calculation of the histograms of relative frequency a non-overlapping sliding time window of W = 100 points is adopted, producing a total of k = 1, ..., 70 samples. In the case of the genomic series we have four bases, denoted {A,C,T,G}, that are sampled in groups of 3, producing histograms with $4^3$ bins. A small percentage of triplets involving the symbol N (considered as "not useful" in genomics) are not analysed. Therefore, a sequence of size $L = 872 \cdot 10^4$ is adopted, and two distinct non-overlapping sliding windows, $W_1$ and $W_2$, are considered.
Figures 4 and 5 represent $S_q$ versus (q, p), and $S_\alpha$ versus (α, p), for the π series and the Weierstrass function, respectively. We observe that $S_q$ has a low sensitivity to the dynamics of the series, exhibiting significant variations only for q close to one, that is, when it reduces to the Shannon entropy. On the other hand, $S_\alpha$ clearly detects dynamical variations, being particularly sensitive in the region 0 < α < 0.6.
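The windowing procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the helper names `window_histograms` and `triplets` are hypothetical, a seeded pseudo-random digit sequence stands in for the digits of π, the base triplets are read without overlap (the text only says "groups of 3"), and the Shannon entropy is used as the index so that the block stays short. Any other index can be substituted in the last step.

```python
import math
import random
from collections import Counter

def shannon_entropy(probs) -> float:
    """S = sum_i p_i * (-ln p_i), skipping empty bins."""
    return sum(-p * math.log(p) for p in probs if p > 0.0)

def window_histograms(symbols, window: int):
    """Relative-frequency histograms over consecutive non-overlapping windows."""
    histograms = []
    for start in range(0, len(symbols) - window + 1, window):
        counts = Counter(symbols[start:start + window])
        histograms.append({s: c / window for s, c in counts.items()})
    return histograms

def triplets(bases: str):
    """Consecutive base triplets; triplets containing the symbol N are dropped."""
    groups = (bases[i:i + 3] for i in range(0, len(bases) - 2, 3))
    return [g for g in groups if "N" not in g]

# Stand-in for a digit series with L = 7000 values (hypothetical data).
rng = random.Random(0)
digits = [rng.randrange(10) for _ in range(7000)]

# W = 100 points per window gives 70 histograms, one entropy value per window.
entropies = [shannon_entropy(h.values()) for h in window_histograms(digits, 100)]
print(len(entropies), min(entropies), max(entropies))
print(triplets("ACGTNACGT"))
```

For a genomic sequence, the list returned by `triplets` can be passed to `window_histograms` in place of the digit list, with the window length then counted in triplets.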
Figures 6 and 7 depict the plots of $S_q$ and $S_\alpha$ for the Dow Jones Industrial Average and the Europe Brent Spot Price, respectively. We verify a behaviour similar to the one pointed out previously.
Figure 4. Entropy variation for the π series: $S_q$ versus (q, p), 0 ≤ q ≤ 1 (left) and $S_\alpha$ versus (α, p), −1 ≤ α ≤ 1 (right).
Figure 5. Entropy variation for the Weierstrass function: $S_q$ versus (q, p), 0 ≤ q ≤ 1 (left) and $S_\alpha$ versus (α, p), −1 ≤ α ≤ 1 (right).
Figure 6. Entropy variation for the Dow Jones Industrial Average time series: $S_q$ versus (q, p), 0 ≤ q ≤ 1 (left) and $S_\alpha$ versus (α, p), −1 ≤ α ≤ 1 (right).
Finally, Figures 8 and 9 show $S_q$ and $S_\alpha$ for the Human chromosome Y, with the only difference being the size and number of sliding windows. As previously, the higher sensitivities occur for q = 1 with $S_q$ and for α = 0.5 with $S_\alpha$. The sliding window $W_1$ is more appropriate for highlighting dynamical evolutions than window $W_2$, which is considerably larger and leads to an "averaging" of the information content of the chromosome series.

Fractional Jensen-Shannon Divergence
In this section we explore further the concept of generalized fractional information and entropy. We start by recalling the Kullback-Leibler divergence of Q from P, defined as [43-47]:

$$D_{KL}(P \parallel Q) = \sum_i p_i \ln \frac{p_i}{q_i} \tag{10}$$

The Jensen-Shannon divergence $JSD(P \parallel Q)$ is defined as:

$$JSD(P \parallel Q) = \frac{1}{2} \left[ D_{KL}(P \parallel M) + D_{KL}(Q \parallel M) \right] \tag{11}$$

where $M = \frac{P+Q}{2}$. Alternatively, we can calculate $JSD(P \parallel Q)$ as:

$$JSD(P \parallel Q) = \frac{1}{2} \left[ \sum_i p_i \ln p_i + \sum_i q_i \ln q_i \right] - \sum_i m_i \ln m_i \tag{12}$$

Having in mind Expressions (4), (6) and (12), the fractional JSD can be written as:

$$JSD_q(P \parallel Q) = \sum_i m_i I_q(m_i) - \frac{1}{2} \left[ \sum_i p_i I_q(p_i) + \sum_i q_i I_q(q_i) \right] \tag{13}$$

$$JSD_\alpha(P \parallel Q) = \sum_i m_i I_\alpha(m_i) - \frac{1}{2} \left[ \sum_i p_i I_\alpha(p_i) + \sum_i q_i I_\alpha(q_i) \right] \tag{14}$$

In order to illustrate the fractional-order distance we consider two examples, namely the set A of n = 13 irrational numbers and the set B of n = 24 Human chromosomes. Set A consists of the number π (3.141...), six further constants labelled Nep, Eul, Cat, Hil, Khi and Gol (the last with value 1.618...), ln 2, ln 3, ln 5, √2, √3 and √5, labelled in the sequel as {Pi, Nep, Eul, Cat, Hil, Khi, Gol, Ln2, Ln3, Ln5, St2, St3, St5}. Set B consists of the whole set of Human chromosomes, labelled in the sequel as {Hu1, ..., Hu22, HuX, HuY}. The irrational numbers are expanded up to 7000 digits and, for each one, groups of two digits feed the $10^2$ bins of histograms of relative frequency of occurrence. On the other hand, the chromosome bases are read in triplets, feeding the $4^3$ bins of histograms of relative frequency of occurrence. In both cases, a symmetrical n × n comparison matrix D of element-to-element relative distances is constructed, adopting the indices $JSD_q$ and $JSD_\alpha$. To simplify comparisons, all distances were converted to the interval between zero (minimum distance) and one (maximum distance). The results are visualized by means of Phylip [48,49] (plots using options "neighbor" and "drawtree"), a package of programs for inferring phylogenies. These algorithms produce a tree based on matrix D, trying to accommodate the distances into the two-dimensional space.
Figures 10 and 11 show the trees for sets A and B based on distances (13) and (14). We verify that not only are the charts qualitatively of the same type, but also that the generalization leads to results compatible with those produced by distinct methods [50-52], which confirms the goodness of the proposed concept.
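The construction of the comparison matrix can be sketched as below. The block is an illustration under stated assumptions: all function names are hypothetical, the three histograms are made-up data, and only the power-law information $(-\ln x)^q$ is used so that the standard library suffices. The source does not state how the distances were rescaled to the interval from zero to one, so dividing by the largest entry is an assumption. The fractional divergence is written in the entropy-difference form, that is, the expected information of the mixture minus the mean expected information of the two distributions.

```python
import math

def kl(p, q) -> float:
    """Kullback-Leibler divergence of q from p (terms with p_i = 0 contribute nothing)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def jsd(p, q) -> float:
    """Classical Jensen-Shannon divergence through the mixture M = (P + Q) / 2."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def info_q(x: float, order: float) -> float:
    """Power-law information of order q."""
    return (-math.log(x)) ** order

def expected_info(dist, info, order: float) -> float:
    """Expected value of the chosen information measure over one distribution."""
    return sum(x * info(x, order) for x in dist if x > 0.0)

def jsd_fractional(p, q, info, order: float) -> float:
    """Entropy-difference form of the divergence with a generalized information measure."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return (expected_info(m, info, order)
            - 0.5 * expected_info(p, info, order)
            - 0.5 * expected_info(q, info, order))

def distance_matrix(dists, info, order: float):
    """Symmetric matrix of pairwise divergences, rescaled by its largest entry (assumption)."""
    n = len(dists)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = jsd_fractional(dists[i], dists[j], info, order)
    biggest = max(max(row) for row in d)
    return [[v / biggest for v in row] for row in d] if biggest > 0.0 else d

# Three hypothetical histograms over four bins.
hists = [[0.25, 0.25, 0.25, 0.25], [0.4, 0.3, 0.2, 0.1], [0.7, 0.1, 0.1, 0.1]]
for row in distance_matrix(hists, info_q, 0.5):
    print([round(v, 3) for v in row])
```

With order 1 the fractional form coincides with the classical divergence, which gives a quick consistency check before feeding the matrix to a tree-building program.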
Figure 10. Tree (Phylip with algorithm "neighbor" and visualization by "drawtree") of the set A of 13 irrational numbers, compared by means of the indices $I_q$, q = 1 (left) and $I_\alpha$, α = 0.5 (right).

Conclusions
This paper presented a generalization of the concept of entropy inspired by the properties of Fractional Calculus. The novel index follows the recent trend of expanding the scope of application of both mathematical tools, by relaxing some properties and allowing their application in new scientific areas. The generalized fractional entropy was first tested on several typical probability distributions. In a second phase the index was applied to several types of data, namely of mathematical, financial and biological nature. It was verified that the proposed entropy leads to a higher sensitivity to the signal evolution, which is useful in describing the dynamics of complex systems. Furthermore, the proposed generalization embeds the concept of positive and negative information, that is, of data that are either reliable or misleading, allowing the extension of entropy to deceptive cases. The new formulation was then extended to the measurement of relative distances and tested with two distinct sets of data. The results reveal the goodness of the generalized fractional information concept.

Figure 3. Variation of entropy: $S_q^{bin}$ versus (q, p), 0 ≤ q ≤ 1 (left) and $S_\alpha^{bin}$ versus (α, p), −1 ≤ α ≤ 1 (right).


Figure 11. Tree (Phylip with algorithm "neighbor" and visualization by "drawtree") of the set B of 24 Human chromosomes, compared by means of the indices $I_q$, q = 1 (left) and $I_\alpha$, α = 0.5 (right).