Information Theory of Networks

This paper surveys information-theoretic network measures for analyzing the structure of networks. To support the interdisciplinary application of these quantities, we also discuss some of their properties, such as their structural interpretation and uniqueness.


Introduction
Information theory has proven useful for solving interdisciplinary problems. For example, problems in biology, chemistry, computer science, ecology, electrical engineering, and neuroscience have been tackled by using information-theoretic methods such as entropy and mutual information, see [1][2][3][4][5]. In particular, advanced information measures such as the Jensen-Shannon divergence have also been used for performing biological sequence analysis [6].
In terms of investigating networks, information-theoretic techniques have been applied in an interdisciplinary manner [7][8][9][10][11]. In this paper, we put the emphasis on reviewing information-theoretic measures for exploring network structure and shed light on some of their strong and weak points. Note, however, that the problem of exploring the dynamics of networks by using information theory has also been tackled, see [12].
As the measures have been explored interdisciplinarily, it is particularly important to understand their strong and weak points. Otherwise, the results of applications involving the measures cannot be interpreted properly. Besides surveying the most important measures, our main contribution is to highlight some of their strong and weak points. In this paper, this relates to better understanding their structural interpretation and to gaining insights into their uniqueness. The uniqueness, often called the discrimination power or degeneracy of an information-theoretic graph measure (and indeed of any graph measure), refers to how well the measure can discriminate non-isomorphic graphs by its values, see [38][39][40]. An important problem is to evaluate the degree of degeneracy of a measure by quantities such as the sensitivity measure due to Konstantinova [40]. Note that the discrimination power of a measure clearly depends on the graph class in question, see [38,41].

Measures Based on Equivalence Criteria and Graph Invariants
To find such measures, seminal work was done by Bonchev [8,42], Mowshowitz [15][16][17][18], Rashevsky [19] and Trucco [14]. Chronologically, Rashevsky [19], MacArthur [27] and Trucco [14] were the first to apply Shannon's information measure to derive an entropy of a graph characterizing its topology. Then, Mowshowitz [15][16][17][18] called this quantity the structural information content of a graph and developed a theory to study the properties of such graph entropies under certain graph operations such as product, join etc. So far, numerous related quantities have been defined by applying the general approach of deriving partitions based on a graph invariant, which is due to Mowshowitz [15]. As a result of these developments, Bertz [43], Basak et al. [44,45] and Bonchev [8,42,46] contributed various related measures which are all based on the idea of deriving partitions by using a graph invariant, e.g., vertices, edges, degrees, and distances.
Let G = (V, E) be a graph, X a graph invariant, and τ an equivalence criterion. Then, the elements of the graph invariant under consideration can be partitioned into equivalence classes X_1, X_2, ..., X_k. From this procedure, one also obtains a probability value for each partition [8,15], given by

p_i := |X_i| / |X|, 1 <= i <= k.

By applying Shannon's information measure [5], we obtain the graph entropies [8]

I_t(G) := |X| log(|X|) - Σ_{i=1}^{k} |X_i| log(|X_i|),
I_m(G) := - Σ_{i=1}^{k} p_i log(p_i),

where k equals the number of different partitions. I_t is called the total information content and I_m the mean information content of G, respectively.
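The two partition-based entropies above can be sketched in a few lines. The following is a minimal illustration (all function names are ours), using vertex degrees as the graph invariant, so each equivalence class collects the vertices sharing a degree value:

```python
# Sketch of the partition-based graph entropies I_t and I_m: partition the
# elements of a graph invariant into equivalence classes and apply
# Shannon's formula to the class sizes. Names are ours, not from [8,15].
import math
from collections import Counter

def partition_entropies(invariant_values):
    """Given the invariant value of each element, return (I_t, I_m).

    I_t = |X| log|X| - sum_i |X_i| log|X_i|    (total information content)
    I_m = -sum_i p_i log p_i, p_i = |X_i|/|X|  (mean information content)
    """
    n = len(invariant_values)
    sizes = Counter(invariant_values).values()  # the |X_i|
    i_total = n * math.log2(n) - sum(s * math.log2(s) for s in sizes)
    i_mean = -sum((s / n) * math.log2(s / n) for s in sizes)
    return i_total, i_mean

# Path graph P4 has degree sequence 1, 2, 2, 1, i.e., two classes of size 2,
# so I_m = 1 bit and I_t = |X| * I_m = 4 bits.
it, im = partition_entropies([1, 2, 2, 1])
```

Note that I_t = |X| · I_m always holds, which the example confirms.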
In the following, we survey graph entropy measures that exemplify this principle. Besides well-known quantities, we also mention more recently developed indices.
1. Topological information content due to Rashevsky [19]:

I_a(G) := - Σ_{i=1}^{k} (|N_i| / |V|) log(|N_i| / |V|),

where |N_i| denotes the number of topologically equivalent vertices in the i-th vertex orbit of G, and k is the number of different orbits. This measure is based on symmetry in a graph, as it relies on its automorphism group and vertex orbits. It can easily be shown that I_a vanishes for vertex-transitive graphs and attains maximum entropy for asymmetric graphs. However, it has been shown [41] that these symmetry-based measures possess little discrimination power. The reason for this is that many non-isomorphic graphs have the same orbit structure and, hence, cannot be distinguished by this index. Historically, the term topological information content was proposed by Rashevsky [19]. Then, Trucco [14] redefined the measure in terms of graph orbits. Finally, Mowshowitz [15] extensively studied mathematical properties of this information measure for graphs (e.g., the behavior of I_a under graph operations) and generalized it by considering infinite graphs [18].
2. Symmetry index for graphs due to Mowshowitz et al. [47]: In [47], extremal values of this index and formulas for special graph classes such as wheels, stars and path graphs have been studied. As conjectured, the discrimination power of S turned out to be higher than that of I_a, as a discriminating term log(|Aut(G)|) has been added, see Equation (4).
In particular, we obtained this result by calculating S on a set of 2265 chemical graphs whose order ranges from four to nineteen. A detailed explanation of the dataset can be found in [48].
3. Chromatic information content due to Mowshowitz [15,16]: Graph-theoretic properties of I_c and its behavior on several graph classes have been explored by Mowshowitz [15,16]. To our knowledge, neither the structural interpretation of this measure nor its uniqueness has yet been explored extensively.
4. Magnitude-based information indices due to Bonchev et al. [49]: here, k_i is the number of occurrences of a distance possessing value i in the distance matrix of G. The motivation to introduce these measures was to find quantities which detect branching well, see [49]. In this context, the branching of a graph correlates with the number of terminal vertices. By using this model, Bonchev et al. [49] showed numerically and by means of inequalities that these indices detect branching meaningfully. Also, it turned out that the magnitude-based information indices possess high discrimination power for trees. But recent studies [50] have shown that the uniqueness of the magnitude-based information indices deteriorates tremendously when they are applied to large sets of graphs containing cycles. More precisely, Dehmer et al. [50] evaluated the uniqueness of several graph entropy measures and other topological indices by using almost 12 million non-isomorphic, connected and unweighted graphs possessing ten vertices.
5. Vertex degree equality-based information index found by Bonchev [8]: where |N_i^{k_v}| is the number of vertices with degree equal to i, and k := max_{v∈V} k_v. Note that this quantity is easy to determine, as the time complexity of calculating the degrees is clearly polynomial. But it is intuitive that a simple comparison of the degree distributions of graphs is not meaningful for discriminating their structure. In [50], it has been shown that this measure possesses little discrimination power when applied to several sets of graphs.
6. Overall information indices found by Bonchev [46,51]: The index calculates the overall value OX of a certain graph invariant X by summing up its values in all subgraphs and partitioning them into terms of increasing orders (increasing number of subgraph edges k). In the simplest case, we have OX = SC, i.e., it is equal to the subgraph count [51]. Several more overall indices and their information functionals have been calculated, such as overall connectivity (the sum of the total adjacency of all subgraphs), the overall Wiener index (the sum of the total distances of all subgraphs), the overall Zagreb indices, and the overall Hosoya index [51]. They all share (with some inessential variations) the property of increasing in value with increasing graph complexity. The properties of most of these information functionals will not be studied here in detail.
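Rashevsky's I_a (item 1 above) can be sketched by brute force for very small graphs: enumerate all automorphisms, derive the vertex orbits, and apply the mean-information formula. The function names below are ours; checking all n! permutations is only feasible for tiny n.

```python
# Brute-force sketch of the topological information content I_a:
# an automorphism is a vertex permutation that maps the edge set onto itself,
# and the orbit of v is the set of images of v under all automorphisms.
import math
from itertools import permutations

def topological_information_content(n, edges):
    edge_set = {frozenset(e) for e in edges}
    autos = [p for p in permutations(range(n))
             if {frozenset((p[u], p[v])) for u, v in edge_set} == edge_set]
    # Distinct orbits partition the vertex set; collect their sizes.
    orbit_sizes = [len(o) for o in {frozenset(p[v] for p in autos)
                                    for v in range(n)}]
    return -sum(s / n * math.log2(s / n) for s in orbit_sizes)

# The cycle C4 is vertex-transitive, so I_a vanishes; the path P4 has the
# two orbits {0,3} and {1,2}, giving I_a = 1 bit.
i_cycle = topological_information_content(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
i_path = topological_information_content(4, [(0, 1), (1, 2), (2, 3)])
```

The two test graphs illustrate the extremes mentioned in the text: vanishing entropy for a vertex-transitive graph and growing entropy as symmetry decreases.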
Clearly, we have only surveyed a subset of the existing graph entropy measures. Further measures based on the same criterion can be found in [51][52][53]. Also, we would like to mention that information measures for graphs based on other entropy measures have been studied [54]. For instance, Passerini and Severini [54] explored the von Neumann entropy of networks in the context of network physics. Altogether, the variety of existing network measures bears great potential for analyzing complex networks quantitatively. In the future, however, the usefulness and ability of these measures must be investigated more extensively to gain further theoretical insights into their properties.
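As an executable illustration of item 4 above, the following sketch computes, under our reading of [49], the total magnitude-based information index on distances, I_D^W = W log2 W − Σ_i k_i · i · log2 i, where W is the Wiener index (the sum of all pairwise distances) and k_i counts distances of value i in the upper triangle of the distance matrix. All names and the BFS helper are ours.

```python
# Magnitude-based information index on distances (sketch): collect the
# distance-value counts k_i by BFS from every vertex, then apply the
# W log W - sum k_i * i * log i formula.
import math
from collections import Counter, deque

def distance_counts(n, edges):
    """Counts k_i of each distance value i >= 1 (unweighted, connected G)."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    counts = Counter()
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        counts.update(d for v, d in dist.items() if v > s)  # upper triangle
    return counts

def magnitude_index(n, edges):
    k = distance_counts(n, edges)
    wiener = sum(i * c for i, c in k.items())
    return wiener * math.log2(wiener) - sum(c * i * math.log2(i)
                                            for i, c in k.items())

# Path P3 (0-1-2): distances 1, 1, 2, so W = 4 and I_D^W = 8 - 2 = 6 bits.
idw = magnitude_index(3, [(0, 1), (1, 2)])
```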

Körner Entropy
The definition of the Körner entropy is rooted in information theory, and the quantity was introduced to solve a particular coding problem, see [30,55]. Simonyi [55] discussed several definitions of this quantity which have been proven to be equivalent. One definition thereof is based on the chromatic number of induced subgraphs of co-normal powers of G: for V′ ⊆ V(G), the induced subgraph on V′ is denoted by G(V′), χ(G) is the chromatic number [56] of G, and G^t is the t-th co-normal power [30] of G; P^t(x) denotes the probability of the string x, see [55]. Examples and the interpretation of this graph entropy measure can be found in [30,55]. Because its calculation relies on the stable set problem, its computational complexity may be prohibitive. To our knowledge, the Körner entropy has not been used as a graph complexity measure in the sense of the quantities described in the previous section. That is, it does not express the structural information content of a graph (as the previously mentioned graph entropies do), as it has been used in a different context, see [30,55]. Also, its computational complexity makes it impossible to apply this quantity on a large scale and to investigate properties such as correlation and uniqueness.

Entropy Measures Using Information Functionals
Information-theoretic complexity measures for graphs can also be inferred by assigning a probability value to each vertex of the graph in question [9,21]. Such probability values have been defined by using information functionals [9,21,48]. In order to define these information functionals, some key questions must be answered:
• What kind of structural features (e.g., vertices, edges, degrees, distances etc.) should be used to derive meaningful information functionals?
• In this context, what does "meaningful" mean?
• In case the functional is parametric, how can the parameters be optimized?
• What kind of structural information does the functional as well as the resulting entropy detect?
To discuss the first item, see [9,21,48] and note that metrical properties have been used to derive such information functionals. In order to assess whether a functional, as well as the resulting entropy measure, captures structural information meaningfully, an optimality criterion is needed. For example, suppose there exists a data set where the class labels of its entities (graphs) are known. By employing supervised machine learning techniques, the classification error can then be optimized. Note that the last item relates to investigating the structural interpretation of the graph entropy measure. Indeed, this question could be raised for any topological index.
In order to reproduce some of these measures, we start with a graph G = (V, E) and let f be an information functional, i.e., a positive function that maps vertices to the positive reals. Note that f captures structural information of G. If we define the vertex probabilities as [9,21]

p(v_i) := f(v_i) / Σ_{j=1}^{|V|} f(v_j),

we obtain families of information-theoretic graph complexity measures [9,48], where λ > 0 is a scaling constant. Typical information functionals [9,21,48] are based on the cardinalities of the distance spheres around each vertex. The parameters c_k > 0, which weight structural characteristics or differences of G in each sphere, have to be chosen such that at least c_i ≠ c_j holds; otherwise the probabilities become 1/|V|, leading to the maximum entropy log(|V|). For instance, the setting c_1 > c_2 > ... > c_ρ(G) has often been used, see [9,21,48]. Other schemes for the coefficients can also be chosen, but they need to be interpreted in terms of the structural interpretation of the resulting entropy measure. As the measures are parametric (when using a parametric information functional), they can be interpreted as generalizations of the aforementioned partition-based measures.
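The construction above can be sketched with a deliberately simplified, linear sphere-based functional f(v) = Σ_k c_k · |S_k(v)|, where |S_k(v)| is the number of vertices at distance exactly k from v (the functionals in [9,21,48] use a more elaborate, exponential form); all names below are ours.

```python
# Sketch of an information-functional graph entropy: assign each vertex a
# positive value f(v) from its distance-sphere cardinalities, normalize to
# probabilities, and apply Shannon's entropy.
import math
from collections import deque

def sphere_cardinalities(adj, v):
    """|S_k(v)| for k = 1, 2, ..., computed by BFS from v."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    spheres = {}
    for u, d in dist.items():
        if d > 0:
            spheres[d] = spheres.get(d, 0) + 1
    return spheres

def functional_entropy(adj, coeffs):
    f = {v: sum(coeffs[k] * s for k, s in sphere_cardinalities(adj, v).items())
         for v in adj}
    total = sum(f.values())
    return -sum(f[v] / total * math.log2(f[v] / total) for v in adj)

# Star K_{1,3}: center 0, leaves 1..3; diameter 2, so coefficients c_1, c_2.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
h_equal = functional_entropy(star, {1: 1, 2: 1})   # c_1 = c_2
h_linear = functional_entropy(star, {1: 2, 2: 1})  # c_1 > c_2
```

With equal coefficients every vertex gets f(v) = c · (|V| − 1), so the probabilities are uniform and the entropy is the maximal log(|V|) = 2 bits, exactly as noted in the text; distinct coefficients break this tie and yield a smaller, structure-sensitive value.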
By applying Equation (15), concrete information measures to characterize the structural complexity of chemical structures have been derived in [48]. For example, if we choose the coefficients linearly decreasing or exponentially decreasing, the resulting measures are called I^λ_{f_V,lin} and I^λ_{f_V,exp}, respectively. Importantly, it turned out that I^λ_{f_V,lin} and I^λ_{f_V,exp} possess high discrimination power when applied to real and synthetic chemical graphs, see [48].
To obtain more advanced information functionals, the concept outlined above has been extended in [57]. The main idea for deriving these information functionals is based on the assumption that, starting from an arbitrary vertex v_i ∈ V, information spreads out via shortest paths in the graph, which can be determined by using Dijkstra's algorithm [58]. More sophisticated information functionals, as well as complexity measures, have then been defined [57] by using local property measures, e.g., vertex centrality measures [59]. In particular, some of them turned out to be highly unique when applied to almost 12 million non-isomorphic, connected and unweighted graphs possessing ten vertices [50]. Interestingly, the just mentioned information-theoretic complexity measures showed a constantly high uniqueness that does not depend much on the cardinality of the underlying graph set. This property is desirable, as we found that the uniqueness of most existing measures deteriorates dramatically as the cardinality of the underlying graph set increases.

Information-Theoretic Measures for Trees
In this section, we sketch a few entropic measures which have been developed to characterize trees structurally. For example, Emmert-Streib et al. [60] developed an approach to determine the structural information content of rooted trees by partitioning the vertices of such a tree. That means the number of vertices can be counted on each tree level, which leads to a probability distribution and, thus, to an entropy characterizing the topology of a rooted tree. Dehmer [57] used this idea to calculate the entropy of arbitrary undirected graphs by applying a decomposition approach. Mehler [31] also employed entropic measures as balance and imbalance measures of tree-like graphs in the context of social network analysis. Other aspects of tree entropy have been tackled by Lions [61].
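The level-partition idea just described is easily made concrete: count the vertices on each level (distance from the root), turn the counts into a probability distribution, and apply Shannon's entropy. The sketch below uses our own function names.

```python
# Level-partition entropy of a rooted tree: the probability of level d is
# the fraction of vertices at depth d.
import math
from collections import Counter, deque

def level_entropy(children, root):
    """children maps each vertex to the list of its children."""
    levels = Counter()
    queue = deque([(root, 0)])
    while queue:
        v, d = queue.popleft()
        levels[d] += 1
        for c in children.get(v, []):
            queue.append((c, d + 1))
    n = sum(levels.values())
    return -sum(c / n * math.log2(c / n) for c in levels.values())

# A root with 2 children and 4 grandchildren has level counts 1, 2, 4 out
# of 7 vertices, giving H = log2(7) - 10/7 bits.
tree = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
h = level_entropy(tree, 0)
```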

Other Information-Theoretic Network Measures
Apart from the information-theoretic measures mostly used in mathematical and structural chemistry, several other entropic network measures for measuring disorder relations in complex networks have been explored in the context of network physics, see [62]. If P(k) denotes the probability that a vertex v possesses degree k, the distribution of the so-called remaining degree was defined by [62]

q(k) := (k + 1) P(k + 1) / ⟨k⟩. (20)

By applying Shannon's information measure to q, the following graph entropy measure has been obtained [62]. It can be interpreted as a measure for determining the heterogeneity of a complex network [62]. In order to develop information indices for weighted directed networks, Wilhelm et al. [63] defined a measure called Medium Articulation, MA(G), that attains its maximum for networks with a medium number of edges. It is defined via the redundancy R and the mutual information I of the normalized flux distribution [63], where the normalized flux from v_i to v_j is obtained by dividing t_{v_i v_j}, the flux (edge weight) between v_i and v_j, by the total flux. It can easily be shown that R vanishes for a directed ring but attains its maximum for the complete graph [63]. The behavior of I is just the converse. This implies that MA vanishes for these extremal graphs and attains its maximum in between [63]. We remark that a critical discussion of MA and modified measures has recently been contributed by Ulanowicz et al. [64].
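The remaining-degree entropy of Equation (20) can be sketched directly from a degree sequence; the function name is ours.

```python
# Remaining-degree entropy: form q(k) = (k + 1) P(k + 1) / <k> from the
# degree distribution P and apply Shannon's entropy to q.
import math
from collections import Counter

def remaining_degree_entropy(degrees):
    n = len(degrees)
    mean_k = sum(degrees) / n                       # <k>
    p = {k: c / n for k, c in Counter(degrees).items()}
    q = {k - 1: k * p[k] / mean_k for k in p if k > 0}
    return -sum(qk * math.log2(qk) for qk in q.values() if qk > 0)

# A regular graph is perfectly homogeneous, so the entropy is 0; the star
# K_{1,3} splits q evenly between remaining degrees 0 and 2, giving 1 bit.
h_cycle = remaining_degree_entropy([2, 2, 2, 2])  # cycle C4
h_star = remaining_degree_entropy([3, 1, 1, 1])   # star K_{1,3}
```

The two examples match the heterogeneity interpretation in the text: zero for a homogeneous (regular) graph, positive for a heterogeneous one.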
To finalize this section, we also reproduce the so-called offdiagonal complexity (OdC) [65], which is based on determining the entropy of the offdiagonal elements of the vertex-vertex link correlation matrix [65,66]. Let G = (V, E) be a graph and let (c_ij) be the vertex-vertex link correlation matrix, see [65]. Here, c_ij denotes the number of all neighbors possessing degree j > i over all vertices with degree i [66], and k := max_{v∈V} k_v stands for the maximum degree of G. OdC is then obtained by applying Shannon's entropy to the suitably normalized offdiagonal entries of (c_ij), see [66]. As the measure depends on correlations between the degrees of pairs of vertices [65], it is not surprising that its discrimination power is low, see [41].

Structural Interpretation of Graph Measures
We have already discussed, in the preceding sections, the problem of exploring the structural interpretation of topological graph measures. In general, this relates to exploring what kind of structural complexity a particular measure detects. The following list shows a few such types of structural complexity which have already been explored:
• Branching in trees [49,67,68]. Examples of branching measures are the Wiener index [69], the magnitude-based measures also known as Bonchev-Trinajstić indices [49], and others outlined by Janežić et al. [68].
• Linear tree complexity, depending on size and symmetry [68]. Examples of such measures are the MI and MB indices, the TC and TC1 indices etc., see [68].
• Balance and imbalance of tree-like graphs [31]. For examples, see [31].
• Cyclicity in graphs [23,38,68,70,71]. Note that, in the context of mathematical chemistry, this graph property was introduced and studied by Bonchev et al. [38]. Examples of such cyclicity measures are the BT and BI indices, and the F index, see [70].
In view of the vast number of topological measures developed so far, determining their structural interpretation is a daunting problem. Evidently, it is important to contribute to this problem, as measures could then be classified by this property. This might be useful when designing new measures or when seeking topological indices for solving a particular problem.

Summary and Conclusion
In this paper, we surveyed information-theoretic measures for analyzing networks quantitatively. We also discussed some of their properties, namely their structural interpretation and uniqueness. Because a vast number of measures have been developed, the former problem has been somewhat overlooked when analyzing topological network measures. The uniqueness of information-theoretic and non-information-theoretic measures is also a crucial property; it is relevant for applications such as problems in combinatorial chemistry [73]. In fact, many papers exist that tackle this problem [40,[74][75][76], but not on a large scale. Interestingly, a recent statistical analysis [50] has shown that the uniqueness of many topological indices strongly depends on the cardinality of the graph set in question. Also, it is clear that the uniqueness property depends on the particular graph class. This implies that results may not generalize when a measure gives feasible results only for a special class, e.g., trees, isomers etc.