Informational Entropy of B-ary Trees after a Vertex Cut

Together with stars and paths, b-ary trees are among the most studied acyclic graph structures. Like any other structure, a b-ary tree can be seen as containing information. The aim of the present research was to assess, through informational entropy, how the structural information of a b-ary tree changes after removal of an arbitrary vertex.


Introduction
B-ary trees are, together with stars and paths, among the most studied acyclic graph structures. A survey of the available literature showed that almost 20% of papers dealing with graphs also deal with trees. Statistics calculated on trees are used in machine learning [1] and in solving nondeterministic polynomial-time hard (NP-hard) problems [2]. The concept of entropy in conjunction with regular trees was used as a measure of similarity between two grammars [3]. Moreover, information flow in b-ary trees was the subject of some theoretical results, including a converse in [4]. General formulas characterizing the random walk in a b-ary tree were obtained in [5]. A general formula giving the number of substructures by their sizes, for systematic removal of a vertex in a b-ary tree, was also obtained [6].

OPEN ACCESS
Entropy 2008, 10, 577
Like any other structure, a tree can be seen as a system that contains information. The aim of the present research was to assess, through informational entropy, the structural information changes in b-ary trees after removal of an arbitrary vertex.

Material
Let T(b,Y) be a regular b-ary tree with Y levels (Figure 1). The removal of an arbitrary vertex from a b-ary tree generates one sub-graph (when the removed vertex is a leaf), b sub-graphs (when it is the root), or b+1 sub-graphs (when it is an inside node); for a binary tree these are one, two, and three sub-graphs, respectively. It is possible to evaluate the number of sub-graphs by removing every vertex from the tree, one at a time. There is a statistical motivation for this procedure: if we obtain the entire population of sub-structures by their sizes, then the probability of appearance of a sub-structure of a given size can be obtained by dividing its number of occurrences by the total number of sub-structures.
It could be argued that systematic removal of every vertex is not a usual procedure for a given practical problem. However, in a real application the issue is often the removal of one vertex when it is unknown which vertex will be removed. One example is a tree network topology (for instance, a Regional Internet Registry providing Internet resource allocations). In this case, assessing the impact of removing a node is an important issue in designing the network.
The assumptions applied in the present research were as follows: all vertices and edges of the b-ary tree are present (or available) at the initial moment (before the first vertex is removed), the tree is balanced, and the vertices and edges carry no precedence or weight relative to one another. We then move from the complete characterization of a set (the set of all sub-graphs obtained by removing an arbitrary vertex of a tree) to a question of probability (the probability of appearance of a sub-graph of a given size). In terms of the entire structure (the tree), the probability of appearance of a sub-graph of a given size may not be relevant information in itself, but a global parameter that characterizes the tree under the given circumstances (removal of one vertex) could be of interest and was investigated.
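The systematic removal procedure described above can be sketched by brute force. The following is a minimal illustration, assuming levels are numbered 0 to Y with the root at level 0 (so that Y=1 gives a star, as in Table 1); the function names are hypothetical and are not taken from [6].

```python
from collections import Counter


def build_bary_tree(b, Y):
    """Adjacency lists of a complete b-ary tree with levels 0..Y (root is vertex 0)."""
    adj = {0: []}
    frontier = [0]
    next_id = 1
    for _level in range(Y):
        new_frontier = []
        for parent in frontier:
            for _child in range(b):
                adj[parent].append(next_id)
                adj[next_id] = [parent]
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier
    return adj


def fragment_sizes_after_cut(adj, removed):
    """Sizes of the connected components left after deleting vertex `removed`."""
    seen = {removed}
    sizes = []
    for start in adj[removed]:
        if start in seen:
            continue
        # Depth-first search over one component that avoids the removed vertex.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(size)
    return sizes


def substructure_counts(b, Y):
    """Count fragments by size over the removal of every vertex, one at a time."""
    adj = build_bary_tree(b, Y)
    counts = Counter()
    for v in adj:
        counts.update(fragment_sizes_after_cut(adj, v))
    return counts
```

For a binary tree with Y=2, `substructure_counts(2, 2)` counts four fragments of size 1, two of size 3, two of size 4, and four of size 6; dividing each count by the total (12) gives the appearance probabilities.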

Method
Entropy is the parameter that characterizes the amount of disorder. Rényi [15] gave a generalization of the entropy through a family of functionals for quantifying the diversity, uncertainty or randomness of a system.
Let X be a discrete random variable that takes the values {x_1, …, x_n} and p(·) its probability mass function (which gives the probability that X is exactly equal to a given value). The Rényi entropy of order α, abbreviated as H_α(X) and defined for any non-negative order α ≥ 0 (α ≠ 1), is given by:

$$H_\alpha(X) = \frac{1}{1-\alpha}\,\log \sum_{i=1}^{n} p(x_i)^{\alpha} \qquad (1)$$

Two particular cases of Eq (1) are taken into consideration: α → 1 (Shannon's entropy [8]) and α = 2 (the negative logarithm of Simpson's diversity index D [10], or of Onicescu's energy [9]):

$$H_1(X) = -\sum_{i=1}^{n} p(x_i)\,\log p(x_i), \qquad H_2(X) = -\log \sum_{i=1}^{n} p(x_i)^{2} = -\log D \qquad (2)$$
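As an illustration of the definition, the sketch below evaluates the Rényi entropy of a probability vector, treating α = 1 as the Shannon limit. The function name `renyi_entropy` and the default base-2 logarithm are assumptions made for this example; the text does not fix the base of the logarithm.

```python
import math


def renyi_entropy(probs, alpha, base=2.0):
    """Rényi entropy of order alpha >= 0 for a discrete distribution.

    Zero-probability outcomes are ignored; alpha == 1 is handled as the
    Shannon limit. The logarithm base (default 2) is an assumption.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    p = [x for x in probs if x > 0]
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    if math.isclose(alpha, 1.0):
        return -sum(x * math.log(x, base) for x in p)
    return math.log(sum(x ** alpha for x in p), base) / (1.0 - alpha)
```

For a uniform distribution all orders coincide: `renyi_entropy([0.25] * 4, 1)` and `renyi_entropy([0.25] * 4, 2)` both give 2 bits. For a non-uniform distribution the order-2 value is the smaller one.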

Results
Let us recall the structure from Figure 1. The formula of Eq (3) (abbreviated as NSSP) describes the number of substructures (coefficient of the X variable) by sizes (power of the X variable) for systematic removal of a vertex from a b-ary tree [6]. Eq (3) gives the full description of the counts and sizes of substructures for any Y, b > 0. Table 1 presents the substructures and sizes for some particular cases of trees. Note that the coefficient of X^0 counts the total number of vertices in the original structure.
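The typeset form of Eq (3) does not appear in this text. As a hedged reconstruction (an assumption, checked only against the entries of Table 1 and the fragment sizes quoted later in this section, and not copied from [6]), the polynomial for b > 1 can be written as:

```latex
\mathrm{NSSP}(b,Y;X) \;=\; \frac{b^{Y+1}-1}{b-1}\,X^{0}
\;+\; \sum_{k=1}^{Y} b^{k}\left( X^{\frac{b^{Y+1-k}-1}{b-1}} + X^{\frac{b^{Y+1}-b^{Y+1-k}}{b-1}} \right),
\qquad b > 1 .
```

Here k indexes the level of the removed vertex (for the second exponent) or the level of the root of a detached subtree (for the first exponent). For b = 2 and Y = 2 this gives 7X^0 + 2(X^3 + X^4) + 4(X + X^6), in agreement with the last row of Table 1.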

Table 1. Substructures obtained by removal of an arbitrary vertex: regular b-ary trees
b  | Y  | Formula                                                    | Comments
1  | 1  | 2X^0 + 2X^1                                                | A graph with two vertices and one edge
1  | >1 | (Y+1)X^0 + 2X(X^Y - 1)/(X - 1)                             | For b=1 the tree degenerates into a path
>1 | 1  | (b+1)X^0 + b(X + X^b)                                      | For Y=1 the tree degenerates into a star
b  | 2  | (b^2+b+1)X^0 + b(X^(b+1) + X^(b^2)) + b^2(X + X^(b^2+b))   | Powers of X may be repeated in the series
Y = number of levels; b = ramification degree of the b-ary tree

Using Eq (3), we look for the roots of Eq (4), that is, for the cases in which two terms of Eq (3) share the same power of X. Excluding b=1 (b, Y, k_1, and k_2 are non-zero natural numbers, and 1 is also a natural number), the equality of Eq (4) can be satisfied only when the solution of Eq (5) is k_1 = Y+1, which in turn implies b=1. The conclusion that can be drawn is that for b>1 all terms of Eq (3) are distinct.
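The distinctness of the terms for b > 1 can be spot-checked numerically. The sketch below is only a finite check, not a proof. It assumes the fragment sizes (b^(Y+1-k) - 1)/(b - 1) and (b^(Y+1) - b^(Y+1-k))/(b - 1) quoted in this section, and the function names are hypothetical.

```python
def fragment_sizes(b, Y):
    """All 2*Y fragment sizes (powers of X) of T(b, Y) for b >= 2."""
    if b < 2 or Y < 1:
        raise ValueError("this sketch covers b >= 2 and Y >= 1 only")
    total = (b ** (Y + 1) - 1) // (b - 1)  # number of vertices in the tree
    sizes = []
    for k in range(1, Y + 1):
        below = (b ** (Y + 1 - k) - 1) // (b - 1)  # subtree rooted at level k
        sizes.append(below)
        # What remains when a level-k vertex and its whole subtree are cut off.
        sizes.append(total - below)
    return sizes


def all_sizes_distinct(b, Y):
    """True when no power of X is repeated among the fragment terms."""
    sizes = fragment_sizes(b, Y)
    return len(set(sizes)) == len(sizes)
```

A short argument agrees with the check: an equality between a size of the first kind and one of the second kind would require b^(k_1) + b^(k_2) = b^(Y+1) + 1 with positive exponents, and for b > 1 the left side is divisible by b while the right side is not.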
Since the problem of the occurrences of sub-graph sizes (n_o) was solved in the general case for b>1 (there are no repeats among the terms of Eq (3)), we are able to perform the calculations for a given Y and b (note that the problem can be extended to any b and Y less than or equal to a given value). In addition, the probability of observing a structure of a given size in a random cut is given by Eq (6). Substituting Eq (6) into Eq (1) and Eq (2) leads to Eq (7) and Eq (8), where H_1(b,Y) is the Rényi entropy of order 1 and H_2(b,Y) is the Rényi entropy of order 2 of a b-ary tree T(b,Y) after removal of a vertex, when b>1. The factor of 2 in these formulas comes from the fact that fragments of size (b^(Y+1-k) - 1)/(b - 1) and (b^(Y+1) - b^(Y+1-k))/(b - 1) have equal probabilities (see Eq (6)).
For b=1 (path) an entry from Table 1 can help us to obtain the values for H
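The calculation described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the typeset Eq (6) to Eq (8): each of the two fragment sizes with index k is taken to occur b^k times (k = 1, …, Y) when b > 1; each size 1, …, Y is taken to occur twice when b = 1 (Table 1); the X^0 term is excluded from the probabilities; and base-2 logarithms are used. The function names are hypothetical.

```python
import math


def fragment_probabilities(b, Y):
    """Probabilities of the fragment sizes after one random vertex cut of T(b, Y)."""
    if b < 1 or Y < 1:
        raise ValueError("b and Y must be positive integers")
    if b == 1:
        counts = [2] * Y  # path: each size 1..Y occurs twice
    else:
        # Two distinct sizes per index k, each occurring b**k times.
        counts = [b ** k for k in range(1, Y + 1)] * 2
    total = sum(counts)
    return [c / total for c in counts]


def entropy_measures(b, Y):
    """Return (H1, H2, D): Rényi entropies of order 1 and 2, and Simpson's index."""
    p = fragment_probabilities(b, Y)
    d = sum(x * x for x in p)
    h1 = -sum(x * math.log2(x) for x in p)
    h2 = -math.log2(d)
    return h1, h2, d
```

For T(2,2) this gives D = 5/18, H_1 = 1/3 + log2(3) ≈ 1.918 and H_2 = log2(18/5) ≈ 1.848. For b = 1 the distribution is uniform over Y sizes, so H_1 and H_2 both equal log2(Y).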

Discussion
In terms of statistics, the random event under investigation was the removal of a vertex from a b-ary tree, and the observation was the size of a connected structure (the size of a connected structure being the random variable). This statistical experiment yielded important informational statistics: entropy and diversity index. Equations (7)-(10) allow one to compute the values and, based on the results, to plot the dependence of H_x(b,Y) (where x = 1, 2; H_1(b,Y) = Rényi entropy of order 1; H_2(b,Y) = Rényi entropy of order 2) and D(b,Y) on the number of levels (Y) and the ramification degree (b) of the tree. These dependencies, for Y and b ranging from 1 to 20, are presented in Figure 2. […] decrease with increasing b after an initial increase (see b>2 and Y>8, Figure 4).
A regression analysis for a given number of levels (Y) was applied in order to identify the relationship between Simpson's diversity index (D(b,Y)) and Rényi entropy of order 1 (H_1(b,Y)), and between Rényi entropy of order 1 (H_1(b,Y)) and of order 2 (H_2(b,Y)). The results, expressed as equations and associated squared correlation coefficients, are presented in Table 2. The relative variation (min=0; max=1) of the regression equation coefficients from Table 2 for H_1 = f(D) and H_2 = f(H_1) is presented graphically in Figure 5. The analysis of the results presented in Table 2 revealed the following:
• Very good relationships in terms of determination coefficient are obtained between Simpson's diversity index and Rényi entropy of order 1, and between Rényi entropy of order 1 and of order 2, respectively.
• The relationships are second-order polynomials for both investigated situations (D(b,Y) with H_1(b,Y), and H_1(b,Y) with H_2(b,Y)).
• H_2(b,Y) = f(H_1(b,Y)) performed slightly better in terms of determination coefficients than H_1(b,Y) = f(D(b,Y)) (maximum R^2 = 0.999989 for Y=4 compared with R^2 = 0.999948 for Y=3). Moreover, the distribution of the determination coefficient is narrower for H_2(b,Y) = f(H_1(b,Y)) (a difference between maximum and minimum of 0.000129) than for H_1(b,Y) = f(D(b,Y)) (a difference between maximum and minimum of 0.000631).
The calculated values of informational entropy proved to be negatively correlated with Simpson's diversity index for the arbitrary vertex removal from a b-ary tree. The variation of the correlations between these measures of information is presented graphically in Figure 6.
The linear relationship between Rényi's entropy of order 1 (H_1) and of order 2 (H_2) of b-ary trees is perfect (correlation coefficient equal to 1) for b=1, because the two parameters take equal values. The analysis of the obtained correlation coefficients revealed the following:
• Degree of ramification (b):
  o r(H_1,H_2): the minimum as well as the maximum values of the correlation coefficient (expressed as absolute values) increase with increasing ramification degree. Moreover, a certain pattern of variation could be observed.
  o r(H_1,D): the absolute values of the correlation coefficients increase with increasing ramification degree; the range of this variation (0.036699) is wider than the narrow variation of r(H_1,H_2) (0.005394). The distribution of r(H_1,D) follows two patterns according to the value of b (see Figure 7).
• Number of levels (Y):
  o The minimum and maximum values (expressed as absolute values) increase with the number of levels for r(H_1,H_2) as well as for r(H_1,D).
  o The maximum value of r(H_1,H_2) is obtained for T(20,20). The highest correlation coefficient of r(H_1,D) was also obtained for T(≥16,20).
  o The absolute values of r(H_1,H_2) follow a similar pattern for Y≤9, with a slow decrease until b=6 followed by an increase. For Y>9, the absolute values of the correlation coefficients increase with increasing b.
  o The variation of r(H_1,D) is presented in Figure 8.
Overall, the correlation coefficients proved to be good, being always higher than 0.9730 (r(H_1,H_2)) or smaller than -0.8724 (r(H_1,D)).
The linear relationships between H_1 and H_2, as well as between H_1 and D, became weaker with increasing degree of ramification (b) and number of levels (Y). The highest correlations, expressed as absolute values, were r(H_1,D) = -0.9724 for T(2,2) and r(H_1,H_2) = 0.9996 for T(2,2).
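The correlation coefficients quoted for T(2,2) can be reproduced with a short sketch. It assumes that r(b,Y) is the Pearson correlation taken over all pairs (bb,YY) with 1≤bb≤b and 1≤YY≤Y, as in the note to Figure 6, and that the fragment counts are b^k per size for b>1 and 2 per size for b=1. The function names are hypothetical.

```python
import math


def entropy_measures(b, Y):
    """(H1, H2, D) of the fragment-size distribution of T(b, Y); base-2 logs assumed."""
    counts = [2] * Y if b == 1 else [b ** k for k in range(1, Y + 1)] * 2
    total = sum(counts)
    p = [c / total for c in counts]
    d = sum(x * x for x in p)
    h1 = -sum(x * math.log2(x) for x in p)
    return h1, -math.log2(d), d


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlations(b, Y):
    """Return (r(H1,H2), r(H1,D)) over the grid 1<=bb<=b, 1<=YY<=Y."""
    rows = [entropy_measures(bb, yy)
            for bb in range(1, b + 1) for yy in range(1, Y + 1)]
    h1 = [r[0] for r in rows]
    h2 = [r[1] for r in rows]
    d = [r[2] for r in rows]
    return pearson(h1, h2), pearson(h1, d)
```

Under these assumptions `correlations(2, 2)` returns approximately 0.9996 and -0.9724, matching the values reported for T(2,2). The grid must contain at least two points with different entropy values, so `correlations(1, 1)` is undefined.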
The informational entropy computed on trees has found applications in pattern recognition [16], clarification of the self-similarity properties of river networks [17], characterization of kinetic equations [18], evaluation of comminution processes [19], and so on. Moreover, informational entropy analysis provides information about the total loss of information of the investigated structure (the maximum entropy of a system corresponds to the total loss of information), about asymmetry or symmetry, and about simplicity [20].
The formulas given by Eq (7) and Eq (8) may be used for larger b and Y as approximation formulas when the b-ary tree is not complete (having a few gaps), by using an averaged (approximated) value for its number of levels, since a few gaps in a larger tree affect only slightly the entire information contained in the tree. Figures 2 and 6 sustain these assumptions; the entropy of the b-ary tree varies very slowly for larger values of b and Y. As can be observed, only balanced b-ary trees were considered in the present research, and this could be a limitation of the study with regard to the possibility of generalization. The behaviour obtained in this research on balanced b-ary trees will be assessed in further research on unbalanced trees (note that the unbalanced trees are particular cases).

Conclusions
The values of Rényi entropy of order 1 are equal to the values of Rényi entropy of order 2 when the ramification degree of the b-ary tree is equal to 1. The values of Rényi entropy of order 1 and 2 decrease with increasing ramification degree of the b-ary tree. The Rényi entropies of order 1 and 2 increase with increasing number of levels for a given value of the ramification degree of a b-ary tree. The relationships between Rényi entropy of order 1 and 2, and between Rényi entropy of order 1 and Simpson's diversity index, proved to be second-order polynomials rather than linear.

Figure 1. A b-ary tree with Y levels, T(b,Y)

Figure 2. Rényi entropy of order 1, H_1 = H_1(b,Y); Rényi entropy of order 2, H_2 = H_2(b,Y); and Simpson's diversity index, D = D(b,Y), of a b-ary tree with Y levels after a vertex removal

Figure 3. Variation of Simpson's diversity index (D) and Rényi's entropy of order 1 (H_1) for a given value of ramification degree

Figure 4. Variation of Rényi's entropy of order 1 (H_1) and 2 (H_2) for a given value of Y

Figure 6. Correlation between Rényi's entropy of order 1 (H_1) and 2 (H_2), and between Rényi's entropy of order 1 (H_1) and Simpson's diversity index (D) of a b-ary tree after a vertex removal

Figure 8. Variation of r(H_1,D) with the degree of ramification (b)
[…] sustain these assumptions; the entropy of the b-ary tree varies very slowly for larger values of b and Y.

Table 2. […]
Substructures by removal of an arbitrary vertex for some regular b-ary trees
Figure 6. Correlation between Rényi's entropy of order 1 (H_1) and 2 (H_2), and between Rényi's entropy of order 1 (H_1) and Simpson's diversity index (D) of a b-ary tree after a vertex removal. r(H_1,H_2) = r(b,Y) = r([H_1(bb,YY)]_{1≤bb≤b; 1≤YY≤Y}, [H_2(bb,YY)]_{1≤bb≤b; 1≤YY≤Y}), where [⋅] denotes a matrix having b columns and Y rows.
Figure 7. Distribution of r(H_1,D) with number of levels (Y)
• Number of levels (Y):
  o The minimum and maximum values (expressed as absolute values) increase with the number of levels for r(H_1,H_2) as well as for r(H_1,D).
  o The maximum value of r(H_1,H_2) is obtained for T