A Note on Distance-based Graph Entropies



Introduction
Studies of the information content of complex networks and graphs were initiated in the late 1950s, based on the seminal work of Shannon [1]. Numerous measures for analyzing complex networks quantitatively have been contributed [2]. A variety of problems in, e.g., discrete mathematics, computer science, information theory, statistics, chemistry and biology deal with investigating entropies for relational structures. For example, graph entropy measures have been used extensively to characterize the structure of graph-based systems in mathematical chemistry, biology [3] and computer science-related areas; see [4]. The concept of graph entropy [5,6], introduced by Rashevsky [7] and Trucco [8], has been used to measure the structural complexity of graphs [3,9,10]. The entropy of a graph is an information-theoretic quantity that was introduced by Mowshowitz [11]. Here, the complexity of a graph [12] is based on the well-known Shannon entropy [1,5,11,13]. Importantly, Mowshowitz interpreted his graph entropy measure as the structural information content of a graph and demonstrated that this quantity satisfies important properties when using product graphs; see, e.g., [11,14–16]. Note that Körner's graph entropy [17] was introduced from an information theory point of view and has not been used to characterize graphs quantitatively. An extensive overview of graph entropy measures can be found in [6]. A statistical analysis of topological graph measures was performed by Emmert-Streib and Dehmer [18].
Dehmer [5] presents some novel information functionals of $V$ that capture, in some sense, the structural information of the underlying graph $G$. Several graph invariants, such as the number of vertices, edges, distances, the vertex degree sequences, extended degree sequences (i.e., the second neighbor, third neighbor, etc.), degree powers and connections, have been used for developing entropy-based measures [5,6,19,20]. In this paper, we study graph entropies related to a new information functional, which is the number of vertices with distance $k$ to a given vertex. Distance is one of the most important graph invariants. For a given vertex $v$ in a graph, the number of vertices with distance one to $v$ is exactly the degree of $v$; the number of pairs of vertices with distance three, which is also related to the clustering coefficient of networks [21], is called the Wiener polarity index, introduced by Wiener in 1947 [22].
In view of the vast number of existing graph entropy measures [3,5], there has been very little work on finding their extremal values [23]. A reason for this might be the fact that Shannon's entropy represents a multivariate function and all probability values are nonzero when considering graph entropies. As Dehmer and Kraus [23] found, determining minimal values of graph entropies is intricate because there is a lack of analytical methods to tackle this particular problem. Other related work is due to Shi [24], who proved a lower bound of quantum decision tree complexity by using Shannon's entropy. Dragomir and Goh [25] obtained several general upper bounds for Shannon's entropy by using Jensen's inequality. Finally, Dehmer and Kraus [23] proved some extremal results for graph entropies that are based on information functionals.
The main contribution of the paper is to study novel properties of graph entropies that are based on an information functional using the number of vertices with distance $k$ to a given vertex. The paper is organized as follows. In Section 2, some concepts and notations in graph theory are introduced. In Section 3, we introduce the concept of graph entropies and some types of distance-based graph entropies. In Section 4, we state some properties of graph entropy. The paper finishes with a summary and conclusion in Section 5.

Preliminaries
A graph $G$ is an ordered pair of sets $V(G)$ and $E(G)$ such that the elements $uv \in E(G)$ are a sub-collection of the unordered pairs of elements of $V(G)$. For convenience, we sometimes denote a graph by $G = (V, E)$. The elements of $V(G)$ are called vertices and the elements of $E(G)$ are called edges. If $e = uv$ is an edge, then we say vertices $u$ and $v$ are adjacent, and $u$, $v$ are the two endpoints (or ends) of $e$. A loop is an edge whose two endpoints are the same. Two edges are called parallel if both edges have the same endpoints. A simple graph is a graph containing no loops and no parallel edges. If $G$ is a graph with $n$ vertices and $m$ edges, then we say the order of $G$ is $n$ and the size of $G$ is $m$. A graph of order $n$ is addressed as an $n$-vertex graph, and a graph of order $n$ and size $m$ is addressed as an $(n, m)$-graph. In this paper, we only consider simple graphs.
A graph is connected if, for every partition of its vertex set into two nonempty sets $X$ and $Y$, there is an edge with one end in $X$ and one end in $Y$. Otherwise, the graph is disconnected. In other words, a graph is disconnected if its vertex set can be partitioned into two nonempty subsets $X$ and $Y$ so that no edge has one end in $X$ and one end in $Y$.
A path graph is a simple graph whose vertices can be arranged in a linear sequence in such a way that two vertices are adjacent if they are consecutive in the sequence, and are nonadjacent otherwise. Likewise, a cycle graph on three or more vertices is a simple graph whose vertices can be arranged in a cyclic sequence in such a way that two vertices are adjacent if they are consecutive in the sequence, and are nonadjacent otherwise. Denote by $P_n$ and $C_n$ the path graph and the cycle graph with $n$ vertices, respectively.
A connected graph without any cycle is a tree. In particular, the path $P_n$ is a tree of order $n$ with exactly two pendent vertices. The star of order $n$, denoted by $S_n$, is the tree with $n - 1$ pendent vertices. A tree is called a double star $S_{p,q}$ if it is obtained from $S_{p+1}$ and $S_q$ by identifying a leaf of $S_{p+1}$ with the center of $S_q$. So, for the double star $S_{p,q}$ with $n$ vertices, we have $p + q = n$. We call a double star $S_{p,q}$ balanced if $p = \lfloor n/2 \rfloor$ and $q = \lceil n/2 \rceil$. A comet is a tree composed of a star and a pendent path. For any numbers $n$ and $2 \le t \le n - 1$, we denote by $CS(n, t)$ the comet of order $n$ with $t$ pendent vertices, i.e., a tree formed by a path $P_{n-t}$ of which one end vertex coincides with a pendent vertex of a star $S_{t+1}$ of order $t + 1$.
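The tree families defined above can be made concrete as edge lists. The following Python sketch is illustrative only: the function names (`star`, `path`, `double_star`, `comet`, `leaf_count`) and the vertex labelling (vertex 0 as a center) are assumptions of this example, not notation from the paper.

```python
def star(n):
    """Star S_n: vertex 0 is the center, vertices 1..n-1 are pendent."""
    return [(0, i) for i in range(1, n)]

def path(n):
    """Path P_n on vertices 0, 1, ..., n-1."""
    return [(i, i + 1) for i in range(n - 1)]

def double_star(p, q):
    """Double star S_{p,q} on p + q vertices.

    Vertices 0 and 1 are the two adjacent centers; vertex 0 carries
    p - 1 leaves and vertex 1 carries q - 1 leaves, so their degrees
    are p and q respectively.
    """
    edges = [(0, 1)]
    edges += [(0, i) for i in range(2, p + 1)]        # leaves of center 0
    edges += [(1, i) for i in range(p + 1, p + q)]    # leaves of center 1
    return edges

def comet(n, t):
    """Comet CS(n, t): center 0 with leaves 1..t-1 and a pendent path t..n-1."""
    edges = [(0, i) for i in range(1, t)]             # t - 1 leaves at the center
    edges.append((0, t))                              # first path vertex joins the center
    edges += [(i, i + 1) for i in range(t, n - 1)]    # remainder of the path
    return edges

def leaf_count(n, edges):
    """Number of pendent (degree-one) vertices."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sum(1 for d in deg if d == 1)

# Example: the comet of order 10 with 4 pendent vertices.
print(leaf_count(10, comet(10, 4)))
```

With this labelling, `double_star(n // 2, n - n // 2)` gives the balanced double star on `n` vertices.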
The length of a path is the number of its edges. For two vertices $u$ and $v$, the distance between $u$ and $v$ in a graph $G$, denoted by $d_G(u, v)$, is the length of the shortest path connecting $u$ and $v$. A path $P$ connecting $u$ and $v$ in $G$ is called a geodesic path if the distance between $u$ and $v$ in $G$ is exactly the length of $P$. The diameter of a graph $G$, denoted by $D(G)$, is the greatest distance between two vertices of $G$. The set [...]

All vertices adjacent to vertex $u$ are called neighbors of $u$. The neighborhood of $u$ is the set of the neighbors of $u$. The number of edges incident with vertex $u$ is the degree of $u$, denoted by $d(u)$. Vertices of degrees 0 and 1 are said to be isolated and pendent vertices, respectively. A pendent vertex is also referred to as a leaf of the underlying graph. A vertex of degree $i$ is also addressed as an $i$-degree vertex. The minimum and maximum degrees of $G$ are denoted by $\delta(G)$ and $\Delta(G)$, respectively. If $G$ has [...]

For terminology and notations not defined here, we refer the readers to [26].

Distance-Based Graph Entropies
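Now we reproduce the definition of Shannon's entropy [1].
Definition 1. Let $p = (p_1, p_2, \ldots, p_n)$ be a probability vector, namely, $0 \le p_i \le 1$ and $\sum_{i=1}^{n} p_i = 1$. Shannon's entropy of $p$ is defined as
$$I(p) = -\sum_{i=1}^{n} p_i \log p_i.$$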
We are now ready to define the entropy of a graph due to Dehmer [5] by using information functionals.
Definition 2. Let $G = (V, E)$ be a connected graph. For a vertex $v_i \in V$, we define
$$p(v_i) := \frac{f(v_i)}{\sum_{j=1}^{|V|} f(v_j)},$$
where $f$ represents an arbitrary information functional.
Observe that $\sum_{i=1}^{|V|} p(v_i) = 1$. Hence, we can interpret the quantities $p(v_i)$ as vertex probabilities. Now we immediately obtain one definition of graph entropy of a graph $G$.

Definition 3. Let $G = (V, E)$ be a connected graph and $f$ be an arbitrary information functional. The entropy of $G$ is defined as
$$I_f(G) = -\sum_{i=1}^{|V|} \frac{f(v_i)}{\sum_{j=1}^{|V|} f(v_j)} \log \frac{f(v_i)}{\sum_{j=1}^{|V|} f(v_j)}. \qquad (2)$$

Distance is one of the most important graph invariants. We first restate some definitions of the information functionals based on distances. In [5], the following information functional was introduced: [...] where $c_j$ with $j = 1, 2, \ldots, D(G)$ and $\alpha$ are arbitrary real positive parameters. The information functional proposed in [20] is calculated for a vertex $v_i$ as the entropy of its shortest distances from all other vertices in the graph: [...] where $D(v_i) = \sum_{u \in V} d(v_i, u)$. The aggregation function over all distances of vertices in the graph is proposed as follows: [...] The information functional based on the shortest distances is introduced in [29]: [...] There are also some functionals based on the betweenness centralities [29,30].
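The normalisation of an information functional into vertex probabilities and the resulting entropy can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the names `vertex_probabilities` and `graph_entropy` are hypothetical, the logarithm is taken to base 2, and terms with probability zero are skipped (the usual convention $0 \log 0 = 0$).

```python
import math

def vertex_probabilities(f_values):
    """Normalise functional values f(v_i) into probabilities p(v_i)."""
    total = sum(f_values)
    return [f / total for f in f_values]

def graph_entropy(f_values):
    """Entropy of a graph from the values of an information functional.

    Assumptions of this sketch: base-2 logarithm, and zero-probability
    vertices contribute nothing to the sum.
    """
    probs = vertex_probabilities(f_values)
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical functional values: the vertex degrees of the path P_4.
print(graph_entropy([1, 2, 2, 1]))
```

Any nonnegative functional with at least one positive value can be plugged in; a constant functional yields the maximum value, the logarithm of the number of vertices.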
In this paper, we consider a new information functional, which is the number of vertices with distance $k$ to a given vertex. For a given vertex $v$ in a graph, the number of vertices with distance one to $v$ is exactly the degree of $v$. On the other hand, the number of pairs of vertices with distance three, which is also related to the clustering coefficient of networks, is called the Wiener polarity index, introduced for molecular networks by Wiener in 1947 [22]. For more recent results on the Wiener index and the Wiener polarity index, we refer to [31–41].
Let $G = (V, E)$ be a connected graph with $n$ vertices and $v_i \in V(G)$. Denote by $n_k(v_i)$ the number of vertices with distance $k$ to $v_i$, i.e.,
$$n_k(v_i) = |\{u \in V(G) : d_G(u, v_i) = k\}|,$$
where $k$ is an integer such that $1 \le k \le D(G)$.

Definition 4. Let $G = (V, E)$ be a connected graph. For a vertex $v_i \in V$ and $1 \le k \le D(G)$, we define the information functional as:
$$f(v_i) := n_k(v_i).$$
Therefore, by applying Definition 4 and Equality (2), we obtain the special graph entropy
$$I_k(G) := -\sum_{i=1}^{n} \frac{n_k(v_i)}{\sum_{j=1}^{n} n_k(v_j)} \log \frac{n_k(v_i)}{\sum_{j=1}^{n} n_k(v_j)}. \qquad (3)$$
In this paper, we will discuss the extremal properties of the above graph entropy.
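As an illustration of the distance-$k$ functional, the counts $n_k(v)$ can be obtained from one breadth-first search per vertex, which costs $O(n(n + m))$ time overall for an unweighted graph. The sketch below is not taken from the paper: the names `distance_counts` and `entropy_k`, the adjacency-dictionary input and the base-2 logarithm are all assumptions of this example.

```python
import math
from collections import deque

def distance_counts(adj, source):
    """Return a dict {k: n_k(source)} using breadth-first search.

    adj maps each vertex to an iterable of its neighbours.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    counts = {}
    for d in dist.values():
        if d > 0:
            counts[d] = counts.get(d, 0) + 1
    return counts

def entropy_k(adj, k):
    """Entropy based on n_k, for 1 <= k <= diameter (base-2 log assumed)."""
    n_k = [distance_counts(adj, v).get(k, 0) for v in adj]
    total = sum(n_k)
    # Vertices with n_k = 0 are skipped (convention 0 * log 0 = 0).
    return -sum((x / total) * math.log2(x / total) for x in n_k if x > 0)

# Example: the path P_4 with vertices 0-1-2-3.
p4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(distance_counts(p4, 0), entropy_k(p4, 2))
```

In $P_4$ every vertex has exactly one vertex at distance two, so the distribution is uniform over four vertices.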

Results and Discussion
Observe that for $k = 1$, $n_1(v_i) = d(v_i)$ is the degree of $v_i$ and
$$I_1(G) = -\sum_{i=1}^{n} \frac{d_i}{\sum_{j=1}^{n} d_j} \log \frac{d_i}{\sum_{j=1}^{n} d_j},$$
which has been studied in [19] for some classes of graphs. If we denote the number of edges by $m$, then we have
$$I_1(G) = \log(2m) - \frac{1}{2m} \sum_{i=1}^{n} d_i \log d_i,$$
since $\sum_{i=1}^{n} d_i = 2m$. Denote by $p_k(G)$ the number of geodesic paths with length $k$ in graph $G$. Then we have $\sum_{i=1}^{n} n_k(v_i) = 2p_k$, since each path of length $k$ is counted twice in $\sum_{i=1}^{n} n_k(v_i)$. Therefore, Equation (3) can be represented as
$$I_k(G) = \log(2p_k) - \frac{1}{2p_k} \sum_{i=1}^{n} n_k(v_i) \log n_k(v_i). \qquad (5)$$
The number of paths with length $k$ in a given graph has been widely studied by Erdős and Bollobás; we refer the readers to [42–47]. Since there are good algorithms for finding shortest paths in a graph, such as Dijkstra's algorithm [26], we can obtain the following result.
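The passage from Equation (3) to the form involving $p_k$ uses only the identity $\sum_{i} n_k(v_i) = 2p_k$. Written out as a short derivation (a sketch, with terms for which $n_k(v_i) = 0$ taken to contribute zero):

```latex
\begin{align*}
I_k(G)
  &= -\sum_{i=1}^{n} \frac{n_k(v_i)}{2p_k}\,\log\frac{n_k(v_i)}{2p_k} \\
  &= -\frac{1}{2p_k}\sum_{i=1}^{n} n_k(v_i)\log n_k(v_i)
     + \frac{\log(2p_k)}{2p_k}\sum_{i=1}^{n} n_k(v_i) \\
  &= \log(2p_k) - \frac{1}{2p_k}\sum_{i=1}^{n} n_k(v_i)\log n_k(v_i).
\end{align*}
```

Setting $k = 1$, so that $n_1(v_i) = d_i$ and $2p_1 = 2m$, gives the degree-based expression as a special case.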
Proposition 5. Let $G$ be a graph with $n$ vertices. For a given integer $k$, the value of $I_k(G)$ can be computed in polynomial time.
Let $T$ be a tree with $n$ vertices and $V(T) = \{v_1, v_2, \ldots, v_n\}$. In the following, we consider the properties of $I_k(T)$ for $k = 2$.
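First, we study the values of $p_2(T)$ and $n_2(v_i)$. Observe that
$$p_2(T) = \sum_{i=1}^{n} \binom{d_i}{2}, \qquad n_2(v_i) = \sum_{u} \big(d(u) - 1\big),$$
where the second sum runs over the neighbors $u$ of $v_i$. Hence $2p_2(T) = \sum_{i=1}^{n} d_i^2 - 2(n - 1)$. Then from Equation (5), we have
$$I_2(T) = \log\Big(\sum_{i=1}^{n} d_i^2 - 2(n - 1)\Big) - \frac{\sum_{i=1}^{n} n_2(v_i) \log n_2(v_i)}{\sum_{i=1}^{n} d_i^2 - 2(n - 1)}.$$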
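For trees, the count of vertices at distance two can be read off from neighbor degrees, because every vertex at distance two from $v$ is reached through exactly one neighbor of $v$. The following sketch compares that shortcut against a direct breadth-first count and checks the identity $\sum_i n_2(v_i) = \sum_i d_i^2 - 2(n-1)$ on one small tree. The example tree and all function names are hypothetical choices made for illustration.

```python
from collections import deque

def build_adj(n, edges):
    """Adjacency lists for a graph on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def n2_by_bfs(adj, v):
    """Count vertices at distance exactly 2 from v by breadth-first search."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == 2:          # nothing beyond distance 2 is needed
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sum(1 for d in dist.values() if d == 2)

def n2_by_degrees(adj, v):
    """Tree-only shortcut: sum of (degree - 1) over the neighbours of v."""
    return sum(len(adj[u]) - 1 for u in adj[v])

# Hypothetical example tree on 8 vertices (a caterpillar with spine 1-3-5).
n = 8
edges = [(0, 1), (1, 2), (1, 3), (3, 4), (3, 5), (5, 6), (5, 7)]
adj = build_adj(n, edges)

bfs_counts = [n2_by_bfs(adj, v) for v in range(n)]
degree_counts = [n2_by_degrees(adj, v) for v in range(n)]
two_p2 = sum(len(nb) ** 2 for nb in adj) - 2 * (n - 1)
print(bfs_counts, degree_counts, two_p2)
```

The degree shortcut is valid only when the graph has no cycles of length three or four through $v$, which is why it is restricted to trees here.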
If $T \cong S_n$ is a star graph, then we have
$$I_2(S_n) = \log(n - 1).$$
If $T \cong P_n$ is a path graph with $n \ge 4$, then we have
$$I_2(P_n) = \log(n - 2) + \frac{2}{n - 2} \log 2.$$

Let $T$ be a tree with $n$ vertices. By calculating the values $I_2(T)$ for $n = 7, 8, 9, 10$, we can obtain the trees with extremal values of entropy. The trees with maximum and minimum values of $I_2(T)$ are shown in Figures 1 and 2, respectively. As seen from Figure 1, for $n = 7, 8, 9, 10$, the maximum value of $I_2(T)$ is attained when $T$ is the balanced double star $S_{\lfloor n/2 \rfloor, \lceil n/2 \rceil}$. By some elementary calculations, we have [...] It is easy to obtain the following result.

Theorem 6. Let $S_n$, $P_n$, $S_{\lfloor n/2 \rfloor, \lceil n/2 \rceil}$ be the star graph, the path graph and the balanced double star graph with $n$ vertices, respectively. Then we have [...]

Proof. First, we have [...] for all $n \ge 3$. For $n = 2k$, [...] The case of $n = 2k + 1$ is similar.

From Figure 2, for $n = 7, 8, 9, 10$, the minimum value of $I_2(T)$ is attained when $T$ is a comet. For a comet $CS(n, t)$ with $n - t \ge 3$, by some elementary calculations, we have [...] Denote by $t_0$ the root of $g(t) = 0$. Then $CS(n, t_0)$ is the tree with the minimum value of entropy among all comets.
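The values of $I_2$ for the star, the path and the balanced double star can be checked numerically for $n = 7, \ldots, 10$. The sketch below is an illustration, not the authors' computation: the function names and vertex labelling are assumptions, and the logarithm is taken to base 2. Under those assumptions a hand calculation gives $\log_2(n-1)$ for the star, $\log_2(n-2) + 2/(n-2)$ for the path, and $\log_2 n$ for the balanced double star when $n$ is even, where all $n$ vertices turn out to be equiprobable.

```python
import math

def entropy_2(n, edges):
    """I_2 of a tree on vertices 0..n-1 given as an edge list (base-2 log)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    deg = [len(nb) for nb in adj]
    # In a tree, n_2(v) is the sum of (degree - 1) over the neighbours of v.
    n2 = [sum(deg[u] - 1 for u in adj[v]) for v in range(n)]
    total = sum(n2)
    return math.log2(total) - sum(x * math.log2(x) for x in n2 if x > 0) / total

def star(n):
    return [(0, i) for i in range(1, n)]

def path(n):
    return [(i, i + 1) for i in range(n - 1)]

def balanced_double_star(n):
    """Centers 0 and 1 with floor(n/2) - 1 and ceil(n/2) - 1 leaves."""
    p, q = n // 2, n - n // 2
    edges = [(0, 1)]
    edges += [(0, i) for i in range(2, p + 1)]
    edges += [(1, i) for i in range(p + 1, p + q)]
    return edges

for n in range(7, 11):
    values = [entropy_2(n, make(n)) for make in (star, path, balanced_double_star)]
    print(n, [round(x, 4) for x in values])
```

For these four orders the computed values increase from the star to the path to the balanced double star.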
In fact, for a tree $T$ with $n$ vertices, we conjecture the following result. However, it has been neither proved nor disproved despite many attempts.
Conjecture 7. For a tree $T$ with $n$ vertices, the balanced double star and the comet $CS(n, t_0)$ attain the maximum and the minimum values of $I_2(T)$, respectively.
Observe that the extremal graph for $n = 10$ is not unique. From this observation, we can obtain the following result.

Theorem 8. Let $CS(n, t)$ be a comet with $n - t \ge 4$. Denote by $T$ a tree obtained from $CS(n, t)$ by deleting the leaf that is not adjacent to the vertex of maximum degree and attaching a new vertex to one leaf that is adjacent to the vertex of maximum degree. Then we have $I_2(T) = I_2(CS(n, t))$.
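Proof. Let $CS(n, t)$ be a comet with $n - t \ge 4$. Let $w$ be the vertex with maximum degree $t$. Denote by $u$ the leaf of $CS(n, t)$ that is not adjacent to $w$, and by $v$ a leaf that is adjacent to $w$. Let $T = CS(n, t) - u + uv$. Note that the degree sequence of $T$ is the same as that of $CS(n, t)$. Then we only need to check the part $\sum_{i=1}^{n} n_2(v_i) \log n_2(v_i)$. For a given graph $G$, we define a sequence $s(n_2, G) = (n_2(v_1), n_2(v_2), \ldots, n_2(v_n))$. A direct check shows that $s(n_2, T)$ and $s(n_2, CS(n, t))$ coincide up to the order of their entries, which gives $I_2(T) = I_2(CS(n, t))$.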
Actually, the above proof provides a method to verify whether two graphs have the same value of entropy. If $G$ and $H$ are two graphs with the same vertex set and the same degree sequence, then we can use the sequence $s(n_2, G)$ to check whether $G$ and $H$ have the same value of entropy.
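The comparison of the sequences $s(n_2, \cdot)$ can be carried out mechanically. The sketch below builds a comet and the transformed tree of Theorem 8 and compares their sorted degree sequences, sorted $n_2$ sequences and entropies. The labelling (center 0, leaves $1, \ldots, t-1$, pendent path $t, \ldots, n-1$), the function names and the base-2 logarithm are assumptions of this illustration.

```python
import math

def comet_edges(n, t):
    """CS(n, t): center 0, leaves 1..t-1, pendent path t..n-1 attached to 0."""
    edges = [(0, i) for i in range(1, t)]
    edges.append((0, t))
    edges += [(i, i + 1) for i in range(t, n - 1)]
    return edges

def transformed_edges(n, t):
    """Remove the far path leaf n-1 and re-attach it to leaf 1 of the center."""
    edges = [e for e in comet_edges(n, t) if n - 1 not in e]
    edges.append((1, n - 1))
    return edges

def sequences(n, edges):
    """Return the sorted degree sequence and the sorted n_2 sequence of a tree."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    deg = [len(nb) for nb in adj]
    n2 = [sum(deg[u] - 1 for u in adj[v]) for v in range(n)]
    return sorted(deg), sorted(n2)

def entropy_2(n, edges):
    """I_2 of a tree computed from its n_2 sequence (base-2 log assumed)."""
    _, n2 = sequences(n, edges)
    total = sum(n2)
    return math.log2(total) - sum(x * math.log2(x) for x in n2 if x > 0) / total

n, t = 10, 4   # satisfies n - t >= 4
print(sequences(n, comet_edges(n, t)))
print(sequences(n, transformed_edges(n, t)))
```

Sorting the sequences removes the dependence on vertex labels, so equal sorted sequences imply equal entropy values.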

Conclusions
Many distance-based entropies have been proposed and studied. In this paper, based on Shannon's entropy, we study graph entropies related to a new information functional, which is the number of vertices with distance $k$ to a given vertex. Some properties of this entropy of graphs are characterized. One direction for future work is to explore the discrimination power of this entropy. As with other entropies [48], determining the extremal values of $I_k(G)$ and characterizing the extremal graphs is a challenging problem. The problem appears to be difficult even for trees. One possible approach is to establish graph transformations that increase or decrease the value of the entropy.

Figure 1. The trees with maximum value of $I_2(T)$ among all trees with $n$ vertices for $7 \le n \le 10$.

Figure 2. The trees with minimum value of $I_2(T)$ among all trees with $n$ vertices for $7 \le n \le 10$.