Group Degree Centrality and Centralization in Networks

Abstract: The importance of individuals and groups in networks is modeled by various centrality measures. Additionally, Freeman's centralization is a way to normalize any given centrality or group centrality measure, which enables us to compare individuals or groups from different networks. In this paper, we focus on degree-based measures of group centrality and centralization. We address the following related questions: For a fixed k, which k-subset S of members of G represents the most central group? Among all possible values of k, which is the one for which the corresponding set S is most central? How can we efficiently compute both k and S? To answer these questions, we relate them to the well-studied areas of domination and set covers. Using this, we first observe that determining S from the first question is NP-hard. Then, we describe a greedy approximation algorithm which computes centrality values over all group sizes k from 1 to n in linear time, and achieves a group degree centrality value of at least (1 − 1/e)(w* − k), compared to the optimal value of w*. To achieve a fast running time, we design a special data structure based on the related directed graph, which we believe is of independent interest.


Introduction
In most networks, some vertices are more central than the others. To model this intuitive feeling, centrality indices were introduced. The first mathematical concept of centrality of graphs was introduced almost 150 years ago by Jordan [1]. There are many ways to provide a measure of the relative "importance" of a node in a network, where the different motivations lead to different centrality measures that were developed in several areas of science.
Node centrality. A social network is typically represented as a graph, where individuals are represented as vertices, and the relationships between pairs of individuals as edges. In the paper, we will freely interchange the terms vertex/node and graph/network, without any meaningful difference.
Various vertex-based measures of centrality have been proposed to determine the relative importance of a vertex within a graph. Arguably, the most common branch of centrality indices is based on the distance between the nodes of the network. Some of the standard centrality indices from this branch are degree, betweenness, closeness, and eccentricity. Among other measures of node centrality, a few of the better known in network analysis are: eigenvector centrality, Google PageRank, Katz centrality, Alpha centrality, and others. For detailed definitions and discussions on various centrality indices, we refer the reader to [2][3][4].
Another concept of vertex centrality, the personalization, was introduced in 2003 (see [5]), and is a measure that shows how central an individual is according to a given subset R (group of important people) in a given social network. In 2005, the subgraph centrality [6] was introduced, which characterizes the participation of each node in all subgraphs in a network and is calculated from the spectra of the adjacency matrix of the network. Recently, Bell [7] introduced the concept called the subgroup centrality, where centrality (of one vertex) is calculated only on a restricted set of vertices. The most basic centrality measure is the degree of a vertex, which we study in this paper.
In 1999, Everett and Borgatti [8] introduced the concept of group centrality which enables researchers to answer questions such as, "How central is the engineering department in the informal influence network of this company?" or, "Among middle managers in a given organization, which are more central, men or women?" With these measures, we can also solve the inverse problem: given a network of ties among organization members, how can we form a team that is maximally central? In [8], the authors introduced group centrality for measures of degree, closeness, and betweenness centrality, which we use in this paper. In 2006, another important group centrality measure motivated by the key players problem was introduced (see [9]). In 2011, Miyano et al. [10] discussed the problem of finding the best group for the so-called k-vertex maximum domination problem (or k-maxVD, in short), which in fact corresponds to maximizing the group degree centrality (introduced in 1999 [8]), where the score is further increased by a constant k.
Freeman's centralization. In his study, Freeman [11] realized that despite all of the vertex-centrality indices defined up to that point, there was a need for a normalization which could measure the relative importance of a given vertex in a network and would be based on any chosen centrality index. Hence, he defined a centralization measure based on normalized variance in vertex centrality of any chosen centrality measure, with the aim of allowing a comparison of distinct networks on the basis of their highest vertex-centralization scores. One may also consider his approach as another type of vertex centrality, which measures the extent to which some vertex in a network stands out from the others in terms of a given centrality index. Every centrality measure can have its own centralization measure. In order to calculate centralization for some vertex measure $M_G : V(G) \to \mathbb{R}$ in a given graph G, we first define
$$M_0(G, v) = \sum_{u \in V(G)} \bigl( M_G(v) - M_G(u) \bigr).$$
When there is no risk of confusion regarding the network G, we write $M_0(v)$ instead of $M_0(G, v)$. In the same paper, Freeman remarked that the centralizations of degree centrality, betweenness centrality, and closeness centrality achieve their maximum if and only if G is a star. The statement was later proved in detail by Everett, Sinclair, and Dankelmann [12]. In order to compare centralization values of different graphs with possibly different sizes, Freeman used a normalized formula in the definition of centralization, where the normalizing divisor is based on the theoretically largest centrality variance in any graph from a given class $\mathcal{G}$ of graphs [11]:
$$M_1(G, v) = \frac{M_0(G, v)}{\max_{H \in \mathcal{G},\; u \in V(H)} M_0(H, u)}.$$
Following Freeman's approach, the group centralization notion was introduced in [13], which brings us to the focus of this paper.

Group Degree Centrality and Centralization
Let $\mathcal{G}_n$ be the family of all graphs on n vertices, let $G \in \mathcal{G}_n$, and let $S \subseteq V(G)$. According to [8], the group degree centrality is defined as
$$GD_G(S) = \Bigl| \bigcup_{v \in S} N(v) \setminus S \Bigr|,$$
where N(v) stands for the set of vertices adjacent to v in G. Also observe that adding an isolated vertex to a group contributes nothing towards its group degree centrality. Such a vertex will never be considered in the optimal group S, unless S already dominates the non-isolated vertices of G. For easier notation, we will thus only consider graphs without isolated vertices. Given a graph G and an integer k, let $S^*_{G,k}$ be one of the sets from $\binom{V(G)}{k}$ that achieves the maximum value of group degree centrality $GD_G(S)$. Observe that $S^*_{G,k}$ also attains the maximum value of group degree centralization. Whenever the graph G or the value k is known from the context, we omit the subscript from the notations of centrality and centralization, and similarly simplify the optimal group label to $S^*_k$. As defined in [11,13], $GD_1(S, G)$ stands for the group degree centralization. Define k := |S|, and observe
$$GD_1(S, G) = \frac{\sum_{S' \in \binom{V(G)}{k}} \bigl( GD(S, G) - GD(S', G) \bigr)}{\max_{H \in \mathcal{G}_n} \max_{T \in \binom{V(H)}{k}} \sum_{T' \in \binom{V(H)}{k}} \bigl( GD(T, H) - GD(T', H) \bigr)}, \tag{1}$$
where the numerator equals $\binom{n}{k}\, GD(S, G) - \sum_{S' \in \binom{V(G)}{k}} GD(S', G)$. According to Freeman [11], the denominator is needed to normalize centralization, and in turn achieve a better relative comparison. Indeed, notice that such normalized values never exceed 1. Clearly, as we already mentioned, $GD_1(S, G)$ is maximized whenever GD(S, G) is maximized, and by Krnc et al. [13] the maximum value of the denominator corresponds to the star graph $S_n$, where a maximizing set $S^*_k$ is any k-set containing the center of the star. Figure 1 presents the group degree centralization of a particular graph. Given a graph G, define the maximizing group size $k^*(G)$ to be the positive integer k whose optimal group $S^*_k$ is the most central over all group sizes. Where G is known from the context, we write $k^*$, and we also write $S^* := S^*_{k^*(G)}$. Notice that $S^*$ achieves the maximum value for group degree centralization as well.
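To make the definition concrete, the following minimal sketch computes the group degree centrality of a vertex set and finds an optimal k-set by exhaustive search. The adjacency-dictionary representation, the function names, and the example path are our own illustrative assumptions; the exhaustive search is exponential and is meant only for very small graphs.

```python
from itertools import combinations


def group_degree_centrality(adj, group):
    """Count vertices outside `group` that have at least one neighbour in it."""
    group = set(group)
    dominated = set()
    for v in group:
        dominated |= adj[v]
    return len(dominated - group)


def best_group_bruteforce(adj, k):
    """Try every k-subset; exponential in k, so only for very small graphs."""
    return max(combinations(sorted(adj), k),
               key=lambda s: group_degree_centrality(adj, s))


# Hypothetical example: the path a-b-c-d-e.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
        "d": {"c", "e"}, "e": {"d"}}
best = best_group_bruteforce(path, 2)
print(best, group_degree_centrality(path, best))
```

On this path, the pair {b, d} is adjacent to all three remaining vertices, so no 2-set can score higher than 3.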
In what follows, we optimize the procedure of calculating the group degree centrality for a given graph and an input integer k. We start by calculating the denominator of the group degree centralization.
Lemma 1. Let G be a star on n vertices. Then,
$$\max_{S \in \binom{V(G)}{k}} \sum_{S' \in \binom{V(G)}{k}} \bigl( GD(S, G) - GD(S', G) \bigr) = (n - k - 1) \binom{n-1}{k}.$$
Proof. Denote the center of the star by c, partition the sets of $\binom{V(G)}{k}$ into parts $P_1$ and $P_2$ depending on whether or not they contain the vertex c as a member, and observe that the group degree centrality of members of these parts equals n − k and 1, respectively. As $|P_1| = \binom{n-1}{k-1}$ and $|P_2| = \binom{n-1}{k}$, the maximum is attained by any $S \in P_1$: the sets in $P_1$ contribute zero to the sum, while each set in $P_2$ contributes n − k − 1.
In the following proposition, we use a classical graph-theoretical approach of double counting to show that the sum $\sum_{S \in \binom{V(G)}{k}} GD(S, G)$ from (1) can be computed efficiently.
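As a sanity check of the star computation, here is a small worked instance with n = 6 and k = 2. It assumes that the quantity being maximized is the sum of differences $\sum_{S'} (GD(S, G) - GD(S', G))$ over all k-sets $S'$, which is our reading of the normalizing divisor.

```latex
\begin{align*}
|P_1| &= \binom{5}{1} = 5,  \qquad GD(S, G) = n - k = 4 \text{ for each } S \in P_1,\\
|P_2| &= \binom{5}{2} = 10, \qquad GD(S, G) = 1 \text{ for each } S \in P_2,\\
\sum_{S'} GD(S', G) &= 5 \cdot 4 + 10 \cdot 1 = 30,\\
\sum_{S'} \bigl( GD(S, G) - GD(S', G) \bigr) &= \binom{6}{2} \cdot 4 - 30 = 30
  = (n - k - 1)\binom{n-1}{k} \quad \text{for } S \in P_1.
\end{align*}
```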

Proposition 1.
Let G be a graph on n vertices, and let k ≤ n be a positive integer. Then the sum $\sum_{S \in \binom{V(G)}{k}} GD(S, G)$ can be computed in O(n) time.
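Proof. For each vertex v ∈ V(G), define its contribution $g_k(v)$ to be the number of k-sets that dominate v without containing it, that is,
$$g_k(v) = \binom{n-1}{k} - \binom{n-1-\deg(v)}{k},$$
since a k-set avoiding v fails to dominate v exactly when it also avoids all deg(v) neighbours of v. By the double counting argument, it follows that
$$\sum_{S \in \binom{V(G)}{k}} GD(S, G) = \sum_{v \in V(G)} g_k(v).$$
Thus, we conclude
$$\sum_{S \in \binom{V(G)}{k}} GD(S, G) = n \binom{n-1}{k} - \sum_{v \in V(G)} \binom{n-1-\deg(v)}{k},$$
which can be computed in O(n) time, traversing all vertices once.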
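The double counting argument can be checked numerically. The sketch below compares a degree-based closed form, in which each vertex v is counted by the k-sets that avoid v but contain at least one of its neighbours, against a direct enumeration of all k-sets. The function names and the adjacency-dictionary input are illustrative assumptions, and the O(n) bound additionally presumes that each binomial coefficient is available in constant time.

```python
from itertools import combinations
from math import comb


def sum_group_degree_closed_form(adj, k):
    """Sum of group degree centrality over all k-sets, one term per vertex."""
    n = len(adj)
    # k-sets avoiding v, minus those that also avoid every neighbour of v.
    return sum(comb(n - 1, k) - comb(n - 1 - len(adj[v]), k) for v in adj)


def sum_group_degree_bruteforce(adj, k):
    """Same sum by enumerating every k-set; exponential, for checking only."""
    total = 0
    for group in combinations(adj, k):
        members = set(group)
        total += len(set().union(*(adj[v] for v in members)) - members)
    return total


# Hypothetical example: the path a-b-c-d-e with k = 2.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
        "d": {"c", "e"}, "e": {"d"}}
print(sum_group_degree_closed_form(path, 2),
      sum_group_degree_bruteforce(path, 2))
```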
We join the results from Lemma 1 and Proposition 1 to further develop (1), and claim the following.
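Theorem 1. For a given graph G on n vertices and a group of its vertices S of size k, the group degree centralization can be evaluated as
$$GD_1(S, G) = \frac{\binom{n}{k}\, GD(S, G) - n \binom{n-1}{k} + \sum_{v \in V(G)} \binom{n-1-\deg(v)}{k}}{(n - k - 1) \binom{n-1}{k}}, \tag{2}$$
which can be computed in O(n) time.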
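A minimal sketch of how such a centralization value might be evaluated is given below. It assumes that the numerator is the sum of differences between GD(S, G) and the group degree centrality of every k-set, and that the normalizing divisor is the star maximum (n − k − 1)·C(n − 1, k); both are our reading of the normalization rather than a verbatim transcription. The function name and the guard for group sizes with a vanishing divisor are likewise our own additions.

```python
from math import comb


def group_degree_centralization(adj, group):
    """Freeman-style normalized group degree centrality of `group`."""
    members = set(group)
    n, k = len(adj), len(members)
    if not 1 <= k < n - 1:
        raise ValueError("normalizing divisor is zero unless 1 <= k < n - 1")
    # Group degree centrality: outside vertices with a neighbour in the group.
    gd = len(set().union(*(adj[v] for v in members)) - members)
    # Sum of group degree centrality over all k-sets, via vertex degrees.
    total = sum(comb(n - 1, k) - comb(n - 1 - len(adj[v]), k) for v in adj)
    return (comb(n, k) * gd - total) / ((n - k - 1) * comb(n - 1, k))


# Hypothetical example: the path a-b-c-d-e and the group {a, d}.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
        "d": {"c", "e"}, "e": {"d"}}
print(group_degree_centralization(path, {"a", "d"}))  # 0.75
```

Under these assumptions, a group containing the center of a star attains the value 1, in line with the remark that normalized values never exceed 1.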
It is easy to see that $S^*_k$ can hence be found in $O(n^{k+1})$ time, by traversing all k-sets and computing the group degree centrality at each iteration. However, as k grows, this may not be a feasible approach. As shown below, for an input value of k, determining $S^*_k$ is NP-hard. First, recall that a set S of cardinality k is said to be k-dominating whenever $\bigcup_{v \in S} N(v) \cup S = V(G)$.

Proposition 2.
The problem of determining a set $S^*_k$ for a given input graph G and an integer k is NP-hard.
Proof. We prove the claim by reducing the well-known NP-complete problem of determining the existence of a k-dominating set to our problem of finding $S^*_k$. Let us assume that there exists a polynomial algorithm for finding a k-set $S^*_k \subseteq V(G)$ such that
$$GD_G(S^*_k) = \max_{S \in \binom{V(G)}{k}} GD_G(S).$$
Now, observe that the existence of a k-dominating set is equivalent to the property $GD_G(S^*_k) = n - k$.
As group degree centrality of a given fixed set S * k can be computed in polynomial time O(nk), it is clear that the set S * k provides an answer regarding the existence of a k-dominating set.
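The reduction can be illustrated with a short sketch in which an exhaustive search stands in for the hypothetical polynomial-time algorithm: a k-dominating set exists exactly when the best k-set reaches group degree centrality n − k. All function names are illustrative assumptions, and the stand-in oracle is of course exponential.

```python
from itertools import combinations


def group_degree(adj, group):
    """Number of vertices outside the group adjacent to some group member."""
    members = set(group)
    return len(set().union(*(adj[v] for v in members)) - members)


def max_group_degree(adj, k):
    """Stand-in oracle: brute force over all k-subsets (exponential time)."""
    return max(group_degree(adj, s) for s in combinations(adj, k))


def has_k_dominating_set(adj, k):
    """A k-set dominates G exactly when its group degree centrality is n - k."""
    return max_group_degree(adj, k) == len(adj) - k


# Hypothetical example: the path a-b-c-d-e needs two vertices to dominate it.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
        "d": {"c", "e"}, "e": {"d"}}
print(has_k_dominating_set(path, 1), has_k_dominating_set(path, 2))
```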
In the last section, we present an efficient algorithm for calculating group degree centrality scores for all group sizes.

Greedy Computation of Degree Centrality
By Proposition 2, it is NP-hard to determine the group with the biggest degree centrality. In this section, we present a greedy approach (Algorithm 1) to identify k-sets which are close to achieving the biggest group degree centrality for a given network, for 1 ≤ k ≤ n, and describe how one can obtain the corresponding group degree centralization values. Both procedures are described in Algorithms 2 and 3, respectively. We also discuss the corresponding approximability and time complexity.
The greedy algorithm presented here behaves in a natural way:
Algorithm 1 An outline of the greedy algorithm for group degree centrality.
Input: Graph G.
Output: An ordering S of V(G) whose prefixes correspond to approximations of sets maximizing group degree centrality. A list L of corresponding group degree centrality values.
In what follows, we describe the important elements of the algorithm. In the next subsection, we present Algorithm 1 in more detail, as Algorithm 2. The majority of Algorithm 2 deals with step two of the above pseudocode, namely maintaining all vertices from V(G) \ S in ordered buckets, labeled by their contribution towards the group degree centrality of the set S. For an efficient implementation of those transitions, we use a particular directed graph as a dynamically changing data structure, which enables us to achieve a linear running time. In the following two subsections, we consider the centralization and complexity, as well as the approximability analysis.
Algorithm 2 Greedy approach to finding a group with the maximum degree centralization.

Input: Graph G.
Output: An ordering S of V(G) whose prefixes correspond to approximations of sets maximizing group degree centrality. A list L of corresponding group degree centrality values.
5:     Append a vertex v from the maximal bucket to S.
6:     Append the corresponding group degree centrality value to L.
7:     for all u ∈ N⁺(v) do
8:         for all w ∈ N⁻(u) do
9:             Remove (w, u) from the digraph G′. ▷ update buckets
10:        end for
11:    end for
12:    Remove v from the digraph G′. ▷ update buckets
13: end for
14: Return S and L.

Algorithm 3 Computing Freeman group degree centralization values.
Input: Graph G.
Output: An ordering S of V(G) whose prefixes correspond to approximations of sets maximizing group degree centrality. A list L 1 of corresponding group degree centralization values.
1: Let S and L be the lists obtained by Algorithm 2 on input G.
5:     Append the corresponding group degree centralization value to L_1. ▷ see Theorem 1
6: end for
7: Return S and L_1.

The Directed Graph Structure for Implementation
In this subsection, we consider Algorithm 1, which we implement by using a directed graph as a data structure. The corresponding implementation is described as Algorithm 2. The algorithm returns an ordering S = (v 1 , . . . , v n ) of V(G), whose prefixes correspond to approximations of sets maximizing group degree centrality, together with a list L of corresponding group degree centrality values. Define S k = {v 1 , . . . , v k }. We start with an empty list S, and in each iteration, we append a vertex to S which maximizes the contribution towards the current group degree value. The added group degree contributions are then accordingly appended to the list L, and this process is repeated as k increases towards the graph order n.
We describe the state after each iteration k by maintaining a directed graph G′ on the vertex set V(G) \ S, where (u, v) is an arc of G′ whenever
• uv ∈ E(G), and
• v is not currently dominated by any vertex from S.
At the initialization (i.e., when S is empty), the digraph G′ contains all of the edges of G in both directions (line 2). We denote our digraph by G′ in all iterations of the main loop. For a given vertex $v \notin S_k$, let its contribution towards the group degree centrality be defined as $GD_G(S_k \cup \{v\}) - GD_G(S_k)$. Then, the following holds:
Proposition 3. Let $v \in V(G) \setminus S_k$, at the beginning of the (k + 1)-th iteration. Then, its contribution equals
$$GD_G(S_k \cup \{v\}) - GD_G(S_k) = \begin{cases} \deg^+_{G'}(v) & \text{if } \deg^-_{G'}(v) > 0, \\ \deg^+_{G'}(v) - 1 & \text{if } \deg^-_{G'}(v) = 0. \end{cases}$$
Proof. First, note that each out-neighbour of v increments the group degree centrality score by one. Since we assume v was initially not an isolated vertex, it is dominated by $S_k$ exactly when $\deg^-_{G'}(v) = 0$. In the case when $\deg^-_{G'}(v) > 0$, that is, when vertex v is not dominated by any vertex from $S_k$, it is clear that vertex v did not contribute towards the value of $GD(S_k)$, so we conclude that the contribution simply equals $\deg^+_{G'}(v)$. To conclude the proof, it remains to consider the case when $\deg^-_{G'}(v) = 0$. Note that in this case, vertex v was already contributing towards the value of $GD(S_k)$. Since this is no longer the case after adding v to $S_k$, we have to subtract one from the overall group degree score.
It is important to point out that we keep the vertices with the same contribution within the same bucket, and maintain this bucket structure across all iterations, so it is easy to identify a vertex v maximizing its contribution. The main loop hence consists of greedily selecting a vertex v which maximizes the group degree centrality contribution, appending it to S (line 5), then appending the corresponding group degree centrality value to L (line 6), and finally updating the directed graph G′ (lines 7-10). This maintenance of G′ is done by removing v from G′, and by removing all arcs towards the newly dominated vertices, that is, towards vertices in N⁺(v).
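The bucket-based greedy procedure described above might be sketched as follows. This is an illustrative implementation under our own assumptions, not the authors' code: the digraph is stored as dictionaries of out- and in-neighbour sets, a vertex is treated as dominated once its in-neighbour set is empty (which presumes a graph without isolated vertices), and ties within the maximal bucket are broken arbitrarily. All identifiers are hypothetical.

```python
def greedy_group_degree(adj):
    """Greedy ordering S and the group degree centrality L[k-1] of each prefix.

    Assumes an undirected simple graph without isolated vertices, given as
    a dict mapping each vertex to the set of its neighbours.
    """
    n = len(adj)
    # Arc (u, v) is present while v is not yet dominated by S.
    out_nb = {v: set(adj[v]) for v in adj}
    in_nb = {v: set(adj[v]) for v in adj}

    def contribution(v):
        # Out-degree, minus one if v is already dominated (in-degree zero).
        return len(out_nb[v]) - (0 if in_nb[v] else 1)

    # Contributions lie in [-1, n-1]; bucket index is contribution + 1.
    buckets = [set() for _ in range(n + 1)]
    current = {}
    for v in adj:
        current[v] = contribution(v)
        buckets[current[v] + 1].add(v)

    def rebucket(v):
        c = contribution(v)
        if c != current[v]:
            buckets[current[v] + 1].discard(v)
            buckets[c + 1].add(v)
            current[v] = c

    S, L, score = [], [], 0
    top = n  # contributions never increase, so this pointer only moves down
    for _ in range(n):
        while not buckets[top]:
            top -= 1
        v = buckets[top].pop()
        score += current.pop(v)
        S.append(v)
        L.append(score)
        for u in list(out_nb[v]):      # u becomes dominated
            for w in in_nb[u]:         # drop every arc (w, u)
                out_nb[w].discard(u)
                if w != v:
                    rebucket(w)
            in_nb[u] = set()
            rebucket(u)
        for w in in_nb[v]:             # drop arcs into v, then v itself
            out_nb[w].discard(v)
            rebucket(w)
        del out_nb[v], in_nb[v]
    return S, L


# Hypothetical test graph: a star with 9 leaves, each edge subdivided once.
adj = {"c": set()}
for i in range(9):
    m, leaf = ("m", i), ("l", i)
    adj["c"].add(m)
    adj[m] = {"c", leaf}
    adj[leaf] = {m}
S, L = greedy_group_degree(adj)
print(L[8])  # greedy value for k = 9; the nine middle vertices achieve 10
```

Because contributions can only decrease as arcs are removed, the pointer to the maximal non-empty bucket never needs to move upward, which is what keeps the total work proportional to the number of vertices and arcs in this sketch.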

Obtaining Freeman Group Degree Centralization
In order to obtain corresponding values of Freeman's group degree centralization, we need to normalize group degree centrality scores from Algorithm 2, which is described in Algorithm 3.
To determine the sum of all group degree centralities over all k-sets, we apply Proposition 1, for arbitrary value of k. Together with Lemma 1, this suffices to finalize (2) from Theorem 1.

Complexity and Approximability Analysis
A careful reader may notice a similarity between our greedy approach and the classical greedy approximation algorithm for the MAX COVERAGE or MAX k-VERTEX DOMINATION problems (see [10,14]). While this greedy approach gives an approximation ratio of 1 − 1/e for both mentioned problems, no such constant ratio is attainable for the k-group degree centrality. We now describe the approximability and time complexity of Algorithms 2 and 3, summarised in Theorem 2.
Theorem 2. Given a graph G on n vertices and m edges, the greedy algorithm for k-group degree centrality over all set sizes 1 ≤ k ≤ n altogether runs in linear time O(n + m) and achieves the k-group degree centrality value of at least (1 − 1/e)(w * − k), where w * is the maximizing k-group degree centrality of G.
Proof. We first deal with the time complexity for Algorithm 2. It is easy to see that the execution of lines outside of the main for-loop takes O(n) time. For the main for-loop, first note that performing decrementation of a bucket for any vertex takes a constant time. It is also important to observe that any (directed) edge may be identified in a constant time (i.e., lies at up to second neighbourhood of v), and is removed once only. The removal of arbitrary edge (u, w) causes the decrementation of buckets at u, and sometimes also at w. The total number of steps performed by both for-loops in lines 7 and 8 is hence within Θ(m), and so the overall time complexity of Algorithm 2 is Θ(m).
Regarding Algorithm 3, we focus on the complexity of the main for-loop. In particular, line 4 can be computed in O(n) time, by traversing all vertices and looking at their degrees (see Proposition 1). After this, line 5 is simply a calculation of (2) from Theorem 1, which is performed in O(1). Thus, the overall time complexity of computing the group degree centralization for all values of k is O(n²). Note, however, that if one is interested in an approximation for a single set size k, this can be done in O(m) by executing Algorithm 3 for the specified value of k only.
Let $O_k = \{w_1, w_2, \ldots, w_k\} \subseteq V(G)$ be a set maximizing k-group degree centrality, let $NO_k = \bigcup_{j=1}^{k} N[w_j] \setminus O_k$, and let $w^* = |NO_k| - k$ be the corresponding value of $GD(O_k)$. For 1 ≤ i ≤ k, assume that Algorithm 2 selects the vertex $a_i$ in the i-th step, and define
$$y_i = GD(\{a_1, \ldots, a_i\}), \qquad x_i = y_i - y_{i-1}, \qquad z_i = w^* - y_i,$$
where additionally, we set $y_0 = 0$ and $z_0 = w^*$. Notice that $x_i$ denotes the contribution of $a_i$, and $z_i$ tells us how far we are from the optimal value at the i-th step.
Figure 2. A subdivision of a graph $S_9$ gives a graph on 19 vertices where Algorithm 2 cannot provide an optimal set which maximizes 9-degree centrality.
Additionally, while in this paper we only study the group degree centralization, one should ask similar algorithmic questions for the group centralization of some other centrality indices. One could also consider modifying Freeman's centralization and consider a different type of normalization for group centrality measures which could preferably be more efficient to calculate.