Next Article in Journal
A New Approach in Analytical Dynamics of Mechanical Systems
Next Article in Special Issue
General Solutions for Descriptor Systems of Coupled Generalized Sylvester Matrix Fractional Differential Equations via Canonical Forms
Previous Article in Journal
Application of the Structure Function in the Evaluation of the Human Factor in Healthcare
Previous Article in Special Issue
Isomorphism Theorems in the Primary Categories of Krasner Hypermodules
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

A Fast and Exact Greedy Algorithm for the Core–Periphery Problem

Department of Mathematics, Computer Science and Physics, University of Udine, 33100 Udine, Italy
*
Author to whom correspondence should be addressed.
Symmetry 2020, 12(1), 94; https://doi.org/10.3390/sym12010094
Submission received: 6 December 2019 / Revised: 27 December 2019 / Accepted: 30 December 2019 / Published: 3 January 2020
(This article belongs to the Special Issue Symmetry in Numerical Linear and Multilinear Algebra)

Abstract

:
The core–periphery structure is one of the key concepts in the structural analysis of complex networks. It consists of a partitioning of the node set of a given graph or network into two groups, called core and periphery, where the core nodes induce a well-connected subgraph and share connections with peripheral nodes, while the peripheral nodes are loosely connected to the core nodes and other peripheral nodes. We propose a polynomial-time algorithm to detect core–periphery structures in networks having a symmetric adjacency matrix. The core set is defined as the solution of a combinatorial optimization problem, which has a pleasant symmetry with respect to graph complementation. We provide a complete description of the optimal solutions to that problem and an exact and efficient algorithm to compute them. The proposed approach is extended to networks with loops and oriented edges. Numerical simulations are carried out on both synthetic and real-world networks to demonstrate the effectiveness and practicability of the proposed algorithm.

1. Introduction

The structural analysis of graphs and networks found in the real world aims at extracting high-level features or summarizing relevant properties of large data sets. That analysis can be carried out at different levels, depending on specific goals and viewpoints. A microscopic, node-level analysis intends to discover which network elements are most relevant for e.g., network cohesion and functioning, information or epidemic diffusion, or flow control. At the other extreme, one may want to concisely describe a quantitative property of a given network by one or a few numbers, for example, the overall number of nodes or edges and the average node degree, which quantify how large is a network and how strongly is connected, respectively.
A third approach to network analysis, which lies in between these two extremes, is the so-called network clustering or block-modeling [1,2,3]. Roughly speaking, it consists of detecting the presence of various types of intermediate structures, typically consisting of groups of nodes that exhibit very specific patterns of relationships in their interior and with the rest of the network. For instance, a group of nodes that is rich in internal connections but is loosely connected with the exterior can be identified as a community. Community detection problems have received considerable attention in the network science literature because communities emerge naturally in many social, biological, or collaboration networks [3,4]. Besides communities, other network structures can be recognized and are attracting considerable interest nowadays. For example, in the analysis of biological networks including predator and prey species or social networks with ties representing hostilities, an important task is the identification of anti-communities, namely, groups of nodes being poorly connected internally but having many links between different groups. Communities and anti-communities may coexist, for instance in communication and social networks, giving rise to nested block structures that are difficult to disentangle [1,5,6].
Another block structure recurring in the analysis of complex networks is the core–periphery structure, whose investigation has been popularized by the seminal works by Borgatti and Everett [7]. In the last two decades the interest toward core–periphery structures had a remarkable growth in the network science literature [6,8,9,10]. In its simplest form, a core–periphery structure can be described as a partitioning of a given graph or network into two groups of nodes, called core and periphery. The core nodes are tightly interconnected and share connections with peripheral nodes, while peripheral nodes are loosely connected to core nodes and to other peripheral nodes. The presence of core–periphery structures has been detected in a large variety of complex systems, such as social networks, trade networks, and financial networks, see e.g., [9,11,12].
These network structures are well described by means of suitable block partitionings of an adjacency matrix representing idealized situations [1,2,3]. Various basic mesoscale network structures that can be described by a block partitioning of the adjacency matrix are illustrated in Figure 1. Darker shadings represent matrix blocks having more nonzero entries, and denote the presence of stronger connections between the corresponding node sets. These idealized block structures can be identified by a variety of statistical and matrix analysis tools, give a reasonable approximation of structured real-world networks and can be easily reproduced synthetically.
In this paper, we address the case where a core–periphery structure is present in a given graph and must be discovered. We consider a core–periphery bipartitioning method considered by Brusco in [13]. This method is based on a block model where off-diagonal blocks are neglected, and identifies the core nodes as belonging to a subset that minimizes a certain objective function. Our main result is an algorithm that exactly solves that combinatorial optimization problem with a computational cost that grows at most quadratically in the number of nodes. This significantly improves the computational time required to solve the problem with respect to the branch-and-bound procedure proposed in [13]. The combination of computational efficiency and theoretical justification makes our algorithm attractive for the analysis of large networks.
In the next section, we introduce the combinatorial optimization problem we adopt to define the core–periphery partitioning and derive an exact algorithm to solve it. Since that problem may have multiple optimal solutions, we also provide a complete description of the optimal solution set and show a pleasant symmetry in our approach with respect to graph complementation. At first, we consider the core–periphery partitioning problem in loopless graphs whose adjacency matrix is symmetric. In Section 3 we extend our optimization approach to graphs with loops and to directed graphs. Section 4 is devoted to applications of our methodology to two real-world networks and two graph families that are relevant to complex network analysis, that is, equitable graphs and power-law networks.

1.1. Related Work

Core–periphery models are basically divided into discrete and continuous models. Discrete models induce a binary classification of the nodes as belonging either to the core, which has a high edge density, or to the periphery, whose edge density is low. The sought partitioning is determined by maximizing a correlation index or other matrix nearness measures between the adjacency matrix and an idealized block structure, see e.g., [12,13,14,15]. In continuous models each node obtains a value describing its “coreness”; unlike in discrete models, that value is a real, positive number. Continuous models may be based on a matrix similarity criterion, analogously to discrete models [6,10], or on a node-level centrality index quantifying the extent to which a vertex controls the information that flows through the network. To the latter class belong various bipartitioning algorithms based on closeness or betweenness indices [9,14] and other measurements based on random walks [11,16]. Both continuous and discrete models have been pioneered by Borgatti and Everett [7]. They proposed two prototypical block structures for core–periphery partitionings, namely, those depicted in Figure 1c,d. The basic assumption is that every two core nodes are directly connected while peripheral nodes have no direct links. Connections between core nodes and peripheral nodes may be requested (see Figure 1c) or not (see Figure 1d). Recent reviews on modeling and analyzing core–periphery structures, together with the related concepts of degree assortativity, rich-clubs, and nested network structures, can be found in [6,8,14,16].

1.2. Notation

For the purposes of this work, we borrow some basic notions from graph theory [4,17]. A graph or network G consists of a finite set V of nodes or vertices and a set E of edges. Each edge e E is an unordered pair of nodes. If e = { i , j } then we say that nodes i and j are adjacent, and that e is incident to both i and j. For notational simplicity, we identify V with { 1 , 2 , , n } and we denote by i j the edge { i , j } . Note that i j = j i . A complete graph is a graph in which all nodes are adjacent to each other. In some cases, it is useful to admit the presence of loops, that is, edges having the form i i . Unless otherwise specified, we mainly consider simple graphs, i.e., graphs without loops and parallel edges, that is, edges incident to the same node pair.
A weighted graph is a graph endowed with an edge weight, that is a function w : E R assigning a real (usually positive) number to every edge. We denote w i j the weight of the edge i j . Clearly, w i j = w j i .
The adjacency matrix of a simple graph G with n nodes is the n × n symmetric matrix A = ( a i j ) such that a i j = 1 if i j is an edge in G and a i j = 0 otherwise. If the graph is weighted then a i j = w i j if i j is an edge in G and a i j = 0 otherwise. For i = 1 , , n define
d i = j = 1 n a i j .
In an unweighted graph d i is the degree of node i, that is, the number of edges incident to it. If the graph is weighted, then the number d i is usually called strength or generalized degree of i. It represents the total weight of edges incident to i, and may not be an integer. In what follows, we simply call d i the degree of node i, both in weighted and unweighted graphs.
The sets S 1 , , S form a partition of V if k = 1 S k = V and S i S j = for i j . The cardinality of a subset S V is denoted by | S | , and the complementary set V S by S ¯ . The pair { S , S ¯ } is a bipartition of V. The subgraph induced by S is the graph G = ( S , E ) where i j E iff i j E and { i , j } S . Equivalently, if A is the adjacency matrix of G then the adjacency matrix of G is the submatrix of A extracted from rows and columns with indices in S. Finally, the volume of S V is
vol ( S ) = i S d i = i S j = 1 n a i j .

2. Identifying a Core–Periphery Bipartition

Let G = ( V , E ) be a weighted graph and let A be its adjacency matrix. We assume that every edge i j E is endowed with a positive weight 0 < w i j 1 , so that the entries of A belongs to the interval [ 0 , 1 ] . We may think of a i j as a relative measure, or fraction, of the strength of the interaction between nodes i and j with respect to a maximal capacity: If a i j = 1 then the bond between the two nodes is perfect, or as strong as possible, while if a i j = 0 then the two nodes do not interact; all intermediate values between these two extremes can be allowed. Obviously, this assumption includes trivially the case where the graph to be considered is unweighted, in which case w i j = 1 for every i j E .
In this section, we address the problem of identifying a core–periphery bipartition of G that we suppose to exist in the graph. Note that we are not considering the problem of deciding whether or not that bipartition actually exists, that is, measuring the statistical significance of that structure, which is a different problem considered in, e.g., [9,11]. For the moment, we restrict ourselves to the case where G has no loops or they must be ignored if present, as is common in most network applications. Consider the following function defined over subsets of V:
z ( S ) = i , j S i j ( 1 a i j ) + i , j S i j a i j , S V .
The first term counts the overall weight that is missing in the subgraph induced by S with respect to the ideal situation where all node pairs are perfectly connected, while the second term is the overall weight of edges that are present in the subgraph induced by the complementary set S ¯ . Thus, if z ( S ) is close to zero then S can be recognized as a core in the graph since its nodes are tightly connected to each other; at the same time, the nodes in S ¯ induce a subgraph with few or no edges, hence S ¯ represents a peripheral zone of the network. Therefore, the localization of a core–periphery structure in G can be carried out by looking for a set S V attaining a minimal value of z ( S ) . In the sequel we will analyze the following combinatorial optimization problem, also considered in [13]:
min S V z ( S ) .

2.1. A Greedy Algorithm for Problem (2)

For any vertex set S V define the following auxiliary quantities:
  • the overall weight of the edges in the subgraph induced by S is e ( S ) = i , j S a i j
  • the overall weight of the edges having exactly one node in S is ( S ) = i S j S a i j .
It is immediate to see that vol ( S ) = e ( S ) + ( S ) . Moreover, owing to the symmetry of A, we have ( S ) = ( S ¯ ) . Hence the function z ( S ) can be rewritten as follows:
z ( S ) = | S | ( | S | 1 ) e ( S ) + e ( S ¯ ) = | S | ( | S | 1 ) vol ( S ) + ( S ) + vol ( S ¯ ) ( S ¯ ) = | S | ( | S | 1 ) vol ( S ) + vol ( S ¯ ) = | S | ( | S | 1 ) + vol ( V ) 2 vol ( S ) .
The last passage comes from the identity vol ( S ) + vol ( S ¯ ) = vol ( V ) .
Observe that Problem (2) can be restated equivalently as
min k = 1 , , n min S V | S | = k z ( S ) .
With this restatement we can reduce the solution of Problem (2) to that of at most n subproblems endowed with a cardinality constraint. Finding a set S V that minimizes z ( S ) having a prescribed cardinality is an easy task. In fact, owing to the previous arguments, if | S | = k then z ( S ) = k ( k 1 ) + v 2 vol ( S ) , where v = vol ( V ) = i , j a i j is the overall weight of edges in G. However, since v does not depend on k, its presence is unessential in the solution of Problem (2), and an optimal solution of the k-th subproblem consists of a set S with cardinality k and maximum volume. Such set can be easily computed by the greedy strategy described here below.
Without loss of generality, we can assume that the nodes of G are numbered in non-increasing order of their degrees:
d 1 d 2 d n .
Then, for each given k = | S | , a set which minimizes z ( S ) is S = { 1 , , k } . Consequently, the optimal value of z ( S ) under the constraint | S | = k is given by
z k : = min S V | S | = k z ( S ) = k ( k 1 ) + v 2 i = 1 k d i .
Finally, denoting by z the optimal value of Problem (2), it holds z = min k z k . Letting σ k = z k v , we have the equivalent characterization z = v + min k σ k . In conclusion, Problem (2) can be solved by the following three-step procedure:
  • For i = 1 , , n compute the degrees d i = j a i j and reorder the nodes in non-increasing order by degree, as in Formula (3).
  • For k = 1 , , n let σ k = k ( k 1 ) 2 i = 1 k d i .
  • Let k be an integer such that σ k = min k σ k . Then the optimal value of Problem (2) is z = z k and the set S = { 1 , , k } is an optimal solution to Problem (2).
The computation of the optimal index k can be simplified by noting that the numbers
Δ k : = σ k σ k 1 = 2 ( k 1 ) 2 d k
form a strictly increasing sequence:
Δ k + 1 Δ k = 2 + 2 ( d k d k + 1 ) 2 .
This implies that the sequence σ 1 , , σ n is convex, and its minimum is attained at the largest integer k such that Δ k < 0 . Hence, the step 2 in the preceding procedure needs to be performed only as long as Δ k < 0 , that is, d k > k 1 . In summary, the previous procedure can be better described by the Algorithm 1 here below, which provides an optimal solution to Problem (2). This fact, and the possible existence of other optimal solutions, is formalized in the subsequent results.
Algorithm 1: Computing an optimal solution of Problem (2)
Input: symmetric adjacency matrix A = ( a i j ) of size n
Output: integer k , core set S
Symmetry 12 00094 i001
Compute a permutation π of { 1 , , n } such that d π ( 1 ) d π ( 2 ) d π ( n )
k 1
Symmetry 12 00094 i002
k k
Define the core set S = { π ( 1 ) , , π ( k ) }
Theorem 1.
Under the node ordering Formula (3), let k be the integer defined as
k = max { k { 1 , , n } : d k > k 1 } .
Then, Problem (2) has an optimal solution having cardinality k , that is,
min S V z ( S ) = z k .
One such optimal solution is given by S = { 1 , , k } . That solution is the unique optimal solution with cardinality k if and only if d k > d k + 1 .
Proof. 
The fact that S solves Problem (2) has been shown by the previous arguments. In fact, the inequality d k > k 1 appearing in Equation (5) is equivalent to the condition Δ k = σ k σ k 1 < 0 . Note that that condition cannot be replaced by d k k since d k may not be an integer if the graph G is weighted. Uniqueness holds if and only if there is exactly one way to choose k nodes whose degree is not smaller than d k , that is, d k > d k + 1 .  □
Remark 1.
The integer k in Equation (5) can be defined equivalently as the smallest integer k such that Δ k + 1 = σ k + 1 σ k 0 . Since σ k + 1 σ k = 2 ( k d k + 1 ) , we have the following alternative characterization:
k = min { k { 1 , , n 1 } : d k + 1 k } .
Corollary 1.
Every optimal solution to Problem (2) has cardinality either k or k + 1 . Moreover, an optimal solution of cardinality k + 1 exists if and only if d k + 1 = k .
Proof. 
Since the optimal value of Problem (2) is z k = v + σ k , there exists an optimal solution S with | S | = > k if and only if σ k = σ . In that case, if = k + 1 then we obtain
0 = σ k σ k + 1 = 2 d k + 1 2 k .
So the identity d k + 1 = k is equivalent to the existence of a solution S with | S | = k + 1 . To prove that no solution with cardinality larger than k + 1 can exist it is sufficient to observe that from Equation (4) for any k we get
2 Δ k + 1 Δ k = σ k + 1 2 σ k + σ k 1 .
Hence the identity σ k = σ k + 1 = σ k + 2 is impossible.  □
In conclusion, assuming that the degree sequence d 1 , , d n is ordered as in Formula (3), the optimal solution set of Problem (2) falls into one of the following alternatives:
  • If d k > d k + 1 and d k + 1 < k then S = { 1 , , k } is the unique optimal solution.
  • If d k > d k + 1 and there exists an integer h 1 such that k = d k + 1 = = d k + h then both S = { 1 , , k } and S { k + i } are optimal solutions, for every i = 1 , , h .
  • If there exists an integer h 1 such that d k = = d k + h then the problem is solved by any set made by either k or k + 1 vertices whose degree is not smaller than d k and containing all the vertices whose degree is greater than d k .
Example 1.
Consider the following simple graphs:
Symmetry 12 00094 i003
In both graphs the nodes are already ordered according to Formula (3), and we have k = 2 . In G 1 , Problem (2) has three optimal solutions, namely { 1 , 2 } , { 1 , 3 } and { 1 , 2 , 3 } , since the uniqueness condition d 2 > d 3 in Theorem 1 is broken and d 3 = 2 . In G 2 we have d 3 = d 4 = 2 , hence Corollary 1 indicates the existence of solutions having cardinality 2 and 3. Moreover, the smallest solution is unique, since d 2 > d 3 . In fact, the solutions are { 1 , 2 } , { 1 , 2 , 3 } and { 1 , 2 , 4 } .
Remark 2.
The computational cost (in terms of elementary arithmetic-logical operations) of Algorithm 1 can be analyzed as follows.
1. 
If the matrix A is given in a sparse format or, equivalently, the graph G is represented by an adjacency list, the degrees d 1 , , d n can be computed in O ( m ) operations where m is the number of edges in G. In any case, that cost is not larger than O ( n 2 ) .
2. 
Standard sorting algorithms can reorder the numbers d i in non-increasing order in O ( n log n ) operations, in the worst case.
3. 
The computation of the integer k in Equation (5) requires no more than O ( n ) operations.
Thus, Algorithm 1 requires O ( m + n log n ) operations, that is O ( n 2 ) operations in the worst case. We point out that the complexity of the branch-and-bound algorithm in [13] to solve Problem (2) is unknown, but the timings reported in that paper suggest an exponential behavior in n.

2.2. A Symmetry Property

The formalization of the core–periphery problem proposed in Problem (2) has a pleasant symmetry with respect to graph complementation. Given a loopless graph G = ( V , E ) with adjacency matrix A = ( a i j ) let G = ( V , E ) be its complementary graph, that is, the graph whose adjacency matrix is A = ( a i j ) where
a i j = 1 a i j i j 0 i = j .
If G is unweighted then any two distinct vertices are adjacent in G if and only if they are not adjacent in G. It is not hard to see that if S is a core for G then S ¯ is a core for G . Indeed, let z G ( S ) and z G ( S ) be the function z in Equation (1) computed in G and G , respectively. Then, from Equation (7) we have
z G ( S ) = i , j S i j ( 1 a i j ) + i , j S i j a i j = i , j S i j a i j + i , j S i j ( 1 a i j ) = z G ( S ¯ ) .
Thus, S is a solution of Problem (2) in G if and only if S ¯ solves Problem (2) in G .

3. Allowing Loops and Directed Edges

In this section we extend the core–periphery partitioning algorithm discussed previously to two different settings. Firstly, we consider the case where loops are present and must be taken into consideration; afterwards, we address directed graphs, that is, graphs where node adjacency relationships are unsymmetric.

3.1. Allowing Loops

Certain networks include loops, that is, edges of the form i i , which must be taken in due consideration in the structural analysis of the network. In that case, some diagonal entries of the adjacency matrix are nonzero. In fact, it is customary to set a i i = 1 in the adjacency matrix of an unweighted graph when the loop i i is present in the graph. In order to tackle appropriately the presence of loops, it is reasonable to modify the Definition (1) by including the contribution of the diagonal entries of A in the summations. Hereafter, we sketch briefly the consequent changes in the solution of the corresponding optimization Problem (2) with respect to the previous paragraphs.
  • Following the same arguments considered in the loopless case, the expression of z ( S ) becomes
    z ( S ) = | S | 2 + v 2 vol ( S ) , v = i , j a i j .
  • The optimal value of z ( S ) under the constraint | S | = k is given by z k = k 2 + v 2 ( d 1 + + d k ) , and the index k characterizing the (least) cardinality of a core is
    k = max { k { 1 , , n } : d k > k 1 / 2 } = min { k { 1 , , n 1 } : d k + 1 k + 1 / 2 } .
Anyway, the symmetry property described in the previous paragraph continues to hold. Indeed, when loops are allowed, the adjacency matrix of the complementary graph is A = ( a i j ) with a i j = 1 a i j .

3.2. Directed Graphs

A directed graph, or digraph, differs from a graph in that the incidence relation is unsymmetric. For definiteness, a directed graph is a pair G = ( V , E ) where V is a set of vertices or nodes and E V × V . Hence, any element e E is an ordered pair of nodes, e = ( i , j ) , called directed edge or arc. In this paragraph we consider directed unweighted graphs, so the adjacency matrix of G is the matrix A = ( a i j ) such that a i j = 1 if ( i , j ) E and a i j = 0 otherwise. In a directed graph, loops are usually admitted.
The identification of a core–periphery partitioning in a directed graph can be carried out by extending to the present setting the principle adopted in the undirected case. That is, we argue that core nodes should form a well-connected subset, inducing a subgraph that is as most complete as possible, while the number of arcs running between peripheral nodes should be kept as small as possible. In our approach, the number of arcs between core nodes and peripheral nodes is unimportant. This principle can be formalized as follows. Let A be the adjacency matrix of a directed graph G. We define the core of G as a set S V which minimizes the objective function
z ( S ) = i , j S ( 1 a i j ) + i , j S a i j .
Hereafter, we describe how the optimization of this function can be carried out with the help of the algorithm devised in the previous section.
Suppose we are given the adjacency matrix A of a directed unweighted graph G = ( V , E ) , and let B = 1 2 ( A + A T ) . We can think of B = ( b i j ) as the adjacency matrix of an undirected graph G = ( V , E ) having the same vertex set of the original graph and where each edge i j E is endowed by a weight w i j whose value is either 1 or 1 2 . In fact, when i j we have b i j = 1 iff both i j E and j i E , while b i j = 1 2 iff exactly one of i j E and j i E is true. Obviously, b i j = 0 means that both edges i j and j i are absent in G (and in G , too). Also loops in G and G are the same, since a i i = b i i . Moreover, for any subset S V we have the identity i , j S a i j = i , j S b i j . Finally, note that if every arc in G has its own reciprocal, that is i j E j i E , then A is symmetric and A = B .
Hence, for any S V , the value of z ( S ) computed in the directed graph G via Equation (8) is the same as the value of z ( S ) computed in the symmetrised and weighted graph G . Thus the only modification needed in the algorithm in the preceding section to deal with directed graphs is to define the numbers d i as
d i = j = 1 n b i j = 1 2 j = 1 n ( a i j + a j i ) .
The computation of the solution S can be carried out as outlined in the previous paragraph.

4. Examples

In this section, we discuss the application of our bipartitioning algorithm to two real-world networks that are popular benchmarks for graph blockmodel problems, and to two graph families often encountered as idealized models of complex networks, namely, the equitable graphs and the graphs having a power-law degree profile.

4.1. Two Real-World Networks

In this paragraph, we report the results of the application of Algorithm 1 to two real-world networks that are available in the Pajek network repository [18].
The USAir97 dataset is a graph built from the USA national aerial flights in 1997. Each node represents an airport, and two nodes are connected by an undirected edge if the corresponding airports are served by a direct flight in either direction. The graph has 332 nodes and 2126 edges, so the average degree is about 12.8 . Algorithm 1 identifies as core a well connected set of k = 35 nodes. The left panel in Figure 2 shows the adjacency matrix once the nodes have been reordered according to their degree, as in Equation (3). Colored dots are placed in correspondence with nonzero matrix entries. The matrix entries corresponding to the core nodes are shown in yellow. That submatrix is very dense, due to the fact that the subgraph induced by core nodes has 505 edges and the average degree in that subgraph is about 28.9 . A graphical layout of USAir97 is shown in the central panel in Figure 2, where core nodes are colored in yellow. The subgraph induced by the core nodes is displayed in the right panel. Incidentally, we note that in this example Problem (2) has also an optimal solution with 36 nodes, since d 35 = 36 , d 36 = 35 and d 37 = 34 .
The Zachary dataset represents a small social network made by members of a karate club [19]. The graph has 34 nodes and 78 edges representing ties among club members. This network is one of the most widely used testbeds for community detection algorithms. Here below, we compare the core identified by Algorithm 1 in this network with those identified by other algorithms in the recent literature [15,16]. For this network, Problem (2) has two optimal solutions with six nodes and one optimal solution with seven nodes. We consider the latter as core in the following discussion. The left panel in Figure 3 shows the pattern of the adjacency matrix after nodes are rearranged in non-increasing order by degree. Yellow dots mark the position of the non-zero entries in the core block. The central panel represents a graphical layout of the network. Each node is marked by its original node number to allow comparisons with related works, as shown in Table 1. Core nodes are marked in yellow. The rightmost panel represents the subgraph induced by the core nodes. Finally, Table 1 collects the results of Algorithm 1 together with those by three core–periphery partitioning algorithms presented in [15] (called BE-KL, 2-step, and divisive) and the rich-core algorithm in [16]. For each algorithm, we list the core nodes identified by it by their original node numbers. It is worth noting that, according to the analysis presented in [15], the Zachary network actually contains two distinct communities comprising a large part of the network, each community having an internal core–periphery structure.

4.2. Equitable Graphs

Let G = ( V , E ) be a graph with adjacency matrix A. A partition { V 1 , , V } of V is equitable if there exists an × matrix P = ( p i j ) , called the partition matrix, such that for all h , k = 1 , , ,
i V h , j V a i j = p h k .
Simply put, every node in the h-th block is connected with exactly p h k nodes in the k-th block. In that case, the graph G is called equitable with respect to the given partition. The subsets V 1 , , V are the blocks of the graph. For the Equation (9) to have solution, the entries of the partition matrix must fulfill the identities
| V i | p i j = | V j | p j i , i , j = 1 , , .
Equitable graphs have been used in many problems in combinatorics, algebraic graph theory, and community detection problems [20,21]. They represent the deterministic counterpart of the stochastic block models, which are random graphs where the probability that two nodes are adjacent depends on their block indices [1,2,3].
The idealized core–periphery structures introduced in [7] and shown in Figure 1c,d are defined by equitable graphs with two blocks.
Let G be an equitable graph with respect to the partition { V 1 , , V } and let P be the partition matrix. Note that the degree of a node depends only on its block index. Indeed, let β ( i ) denote the block index of node i V . Then, d i = j = 1 p β ( i ) , j . Thus if β ( i ) = β ( j ) then d i = d j . The goal of this paragraph is to analyze to what extent the core–periphery algorithm proposed here recognizes the nodes with largest degree as core nodes in an equitable graph. For i = 1 , , let d ( i ) = j = 1 p i j be the degree of each node in V i . We assume d ( 1 ) > d ( i ) for i > 1 .
Hence, the candidate core nodes are those in the first block. The following result describes the relationship between V 1 and the core set identified by the proposed algorithm.
Corollary 2.
In the hypotheses stated above, let n 1 = | V 1 | .
1. 
If d ( 1 ) n 1 then every optimal solution of Problem (2) is a subset of V 1 ,
2. 
if d ( 1 ) n 1 then every optimal solution of Problem (2) contains V 1 as subset.
Proof. 
Assume, without loss in generality, that the nodes are ordered as in Formula (3). Let k be the index defined in Equation (5) and let S = { 1 , , k } .
  • If d ( 1 ) n 1 then
    d d ( 1 ) = d ( 1 ) > d ( 1 ) 1 , d d ( 1 ) + 1 d d ( 1 ) = d ( 1 ) .
    Owing to Equations (5) and (6) we have k = d ( 1 ) n 1 and consequently S V 1 . To prove that every other optimal solution is included in V 1 it remains to show that if k = n 1 then no optimal solution can have k + 1 nodes. By Corollary 1 an optimal solution with cardinality k + 1 exists if and only if d k + 1 = k , but if k = n 1 then d k + 1 < d ( 1 ) = k and we are done.
  • If n 1 d ( 1 ) then d n 1 = d ( 1 ) > n 1 1 . Hence k n 1 and V 1 S .  □
Figure 4 shows the adjacency matrices and the graphical layouts of two equitable graphs with n = 30 , = 3 and | V 1 | = | V 2 | = | V 3 | = 10 . Letting
P ( a ) = a 1 1 1 0 1 1 1 0 ,
the associated partition matrices are P ( 4 ) (leftmost panels) and P ( 9 ) (rightmost panels), respectively. As in the preceding paragraph, the core nodes identified by Algorithm 1 are marked in yellow. When a = 4 the core is a proper subset of the first block, due to its scarce internal density. In fact, in this case we have d ( 1 ) = 6 and d ( 2 ) = d ( 3 ) = 2 . In the other case d ( 1 ) = 11 , d ( 2 ) = d ( 3 ) = 2 , the subgraph induced by the first block is complete and the core coincides exactly with the first block.
Remark 3.
The two idealized core–periphery structures introduced by Borgatti and Everett (in the loopless case), which are depicted in Figure 1c,d, correspond to equitable graphs with a 2-block partition { V 1 , V 2 } and the two partition matrices
P 1 = n 1 1 0 0 0 , P 1 = n 1 1 n 1 n 2 0 ,
respectively, where n i = | V i | for i = 1 , 2 . The following are immediate consequences of Corollary 2.
  • In the first case we have d n 1 1 = n 1 1 > n 1 2 while d n 1 = n 1 1 < n 1 . Hence k = n 1 1 and the set S identified by our algorithm is V 1 except one node. However, since d n 1 = k , from Corollary 1 we obtain that also V 1 is a solution of Problem (2).
  • In the second case we have d n 1 = n 1 > n 1 1 while d n 1 + 1 = n 1 n 1 . Also in this case k = n 1 and V 1 is a core. However, we also have d n 1 + 1 = n 1 . Hence by Corollary 1 any set having the form V 1 { i } with i V 2 is also a solution of Problem (2).

4.3. Power-Law Networks

A graph or network is considered to have a power-law degree distribution with exponent γ if the number n k of nodes having degree k is approximately given by α k γ , where α is a coefficient that depends on the size of the graph ([4] Chap. 4). Many interesting real-world networks have a power-law degree distribution with an exponent γ ( 2 , 3 ) . In fact, power-law degree distributions arise naturally in networks evolving in time by the addition of new nodes and edges according to the so-called preferential attachment rules, see e.g., ([4] Chap. 6) or ([17] Chap. 3). For that reason, a power-law network is usually intended as belonging to a sequence of networks with increasing sizes having a power-law degree distribution whose exponent γ is independent on the network size.
Power-law networks are often deemed as having a core–periphery structure [9,17]. In this paragraph we examine the outcome of our bipartitioning algorithm when applied to power-law networks. The next result describes the asymptotic behavior of the core identified by Algorithm 1 when the network size n diverges. In what follows, the notation f ( n ) g ( n ) means that f ( n ) / g ( n ) 1 as n .
Corollary 3.
Let G be a power-law network with exponent γ > 2 and no isolated nodes. Assuming that the average degree in G does not depend on n, the number of nodes in the core set is O ( n 1 / γ ) .
Proof. 
Let n k α k γ be the degree profile of G, where α depends on n. For k 1 let N ( k ) be the number of nodes with degree greater than or equal to k. With these notations, the integer k in Equation (5) can be characterized by the identity
k = | { i : d i k } | = N ( k ) .
For large n, the number N ( k ) can be approximated by
N ( k ) = i = k n i α k x γ d x = α γ 1 k 1 γ .
Let d min and d max be the smallest and largest degree in G, respectively. The average degree d avg in G is
d avg = 1 n k = d min d max N ( k ) α n ( γ 1 ) d min d max k 1 γ = C α n ( γ 1 ) ( γ 2 ) ,
where C = d min 2 γ d max 2 γ . By hypotheses, C is bounded above and below by positive constants, hence we have α = O ( n ) , since d avg does not depend on n. By Equation (11), we can estimate k as the solution of the equation k = α k 1 γ / ( γ 1 ) , that is,
k = α 1 γ 1 / γ ,
and the claim follows.  □
To validate experimentally the claim of the previous corollary, we performed a series of simulations with power-law networks generated by means of the algorithm described in [22]. That algorithm generates random graphs in the Chung–Lu model [17], which is a flexible, very popular random graph model with nice theoretical properties. We generated random power-law networks with size n = 1000 · 2 i 1 for i = 1 , , 10 and average degree equal to 3, in expectation. In Figure 5, we display the core size versus the network size for networks with exponent γ = 2.1 (left), γ = 2.5 (center) and γ = 2.9 (right). Each cross represents the result for one network. The solid lines plot functions κ n 1 / γ where κ is chosen to fit the results obtained with the largest size n. As is clearly visible, the core sizes approach very closely the theoretical behavior estimated in Corollary 3 over a very large range of n.

5. Conclusions

In a variety of complex real-world systems that are modeled as networks, one often observes the presence of a group of densely interconnected nodes, while the remaining nodes form a sparse peripheral area with few internal links. This particular pattern is called core–periphery, and is one of the key mesoscopic structures that are relevant to the analysis of complex networks.
In this work we proposed a fast and theoretically well-founded algorithm for the localization of core–periphery structures in both oriented and non-oriented complex networks, assuming the presence of that structure. Basically, the set of core nodes is identified by a certain combinatorial optimization problem that has been introduced earlier by Brusco in [13]. We provided a complete description of the set of optimal solutions to that optimization problem. Our algorithm has a very low computational cost, both in terms of memory space and operation count, which makes it suitable for the analysis of large networks. The applicability of the algorithm has been demonstrated experimentally on both real-world and synthetic networks.

Author Contributions

Conceptualization, writing, original draft preparation, F.R.; conceptualization, writing, review and editing, software: D.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been carried out in the framework of the departmental research project “ICON: Innovative Combinatorial Optimization in Networks”, Department of Mathematics, Computer Science and Physics (PRID 2017–2018), University of Udine, Italy. Moreover, the first author has been partly supported by INdAM-GNCS.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Decelle, A.; Krzakala, F.; Moore, C.; Zdeborová, L. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E 2011, 84, 066106. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  2. Fasino, D.; Tudisco, F. The expected adjacency and modularity matrices in the degree corrected stochastic block model. Spec. Matrices 2018, 6, 110–121. [Google Scholar] [CrossRef]
  3. Karrer, B.; Newman, M.E.J. Stochastic blockmodels and community structure in networks. Phys. Rev. E 2011, 83, 016107. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  4. Barabási, A.L. Network Science; Cambridge University Press: Cambridge, UK, 2016. [Google Scholar]
  5. Fasino, D.; Tudisco, F. A modularity based spectral method for simultaneous community and anti-community detection. Linear Algebra Appl. 2018, 542, 605–623. [Google Scholar] [CrossRef] [Green Version]
  6. Rombach, P.; Porter, M.A.; Fowler, J.H.; Mucha, P.J. Core-periphery structure in networks (revisited). SIAM Rev. 2017, 59, 619–646. [Google Scholar] [CrossRef]
  7. Borgatti, S.P.; Everett, M.G. Models of core/periphery structures. Soc. Net. 1999, 21, 375–395. [Google Scholar] [CrossRef]
  8. Csermely, P.; London, A.; Wu, L.Y.; Uzzi, B. Structure and dynamics of core/periphery networks. J. Complex Netw. 2013, 1, 93–123. [Google Scholar] [CrossRef] [Green Version]
  9. Holme, P. Core-periphery organization of complex networks. Phys. Rev. E 2005, 72, 046111. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  10. Tudisco, F.; Higham, D.J. A nonlinear spectral method for core-periphery detection in networks. SIAM J. Math. Data Sci. 2019, 2, 269–292. [Google Scholar] [CrossRef]
  11. Della Rossa, F.; Dercole, F.; Piccardi, C. Profling core-periphery network structure by random walkers. Sci. Rep. 2013, 3, 1467. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  12. In’t Veld, D.; Van Lelyveld, I. Finding the core: Network structure in interbank markets. J. Bank. Financ. 2014, 49, 27–40. [Google Scholar] [CrossRef]
  13. Brusco, M. An exact algorithm for a core/periphery bipartitioning problem. Soc. Netw. 2011, 33, 12–19. [Google Scholar] [CrossRef]
  14. Cucuringu, M.; Rombach, P.; Lee, S.H.; Porter, M.A. Detection of core-periphery structure in networks using spectral methods and geodesic paths. Eur. J. Appl. Math. 2016, 27, 846–887. [Google Scholar] [CrossRef] [Green Version]
  15. Kojaku, S.; Masuda, N. Finding multiple core-periphery pairs in networks. Phys. Rev. E 2017, 96, 052313. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  16. Ma, A.; Mondragón, R.J. Rich-cores in networks. PLoS ONE 2015, 10, e0119678. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  17. Chung, F.; Lu, L. Complex Graphs and Networks; Number 107 in CBMS; American Mathematical Society: Providence, RI, USA, 2004. [Google Scholar]
  18. Batagelj, V.; Mrvar, A. Pajek-program for large network analysis. Connections 1998, 21, 47–57. Available online: http://mrvar.fdv.uni-lj.si/pajek/ (accessed on 6 December 2019).
  19. Zachary, W.W. An information flow model for conflict and fission in small groups. J. Anthropol. Res. 1977, 33, 452–473. [Google Scholar] [CrossRef] [Green Version]
  20. Godsil, C.D. Compact graphs and equitable partitions. Linear Algebra Appl. 1997, 255, 259–266. [Google Scholar] [CrossRef] [Green Version]
  21. Newman, M.E.J.; Martin, T. Equitable random graphs. Phys. Rev. E 2014, 90, 052824. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  22. Fasino, D.; Tonetto, A.; Tudisco, F. Generating large scale-free networks with the Chung–Lu random graph model. ArXiv 2019, arXiv:1910.11341. [Google Scholar]
Figure 1. Examples of 2-block models for various mesoscale structures in undirected graphs. Shaded areas represent densities of non-zero entries in idealized adjacency matrices. (a) A block model with two communities. (b) A block model with two anti-communities. (c,d) The two prototypical core–periphery block models introduced in [7].
Figure 1. Examples of 2-block models for various mesoscale structures in undirected graphs. Shaded areas represent densities of non-zero entries in idealized adjacency matrices. (a) A block model with two communities. (b) A block model with two anti-communities. (c,d) The two prototypical core–periphery block models introduced in [7].
Symmetry 12 00094 g001
Figure 2. Core–periphery bipartition of the network USAir97. (a) The adjacency matrix, reordered according to the node numbering in Formula (3); entries in the core block are shown in yellow. (b) A graphical layout of the network. Core nodes are shown in yellow. (c) The subgraph induced by core nodes.
Figure 2. Core–periphery bipartition of the network USAir97. (a) The adjacency matrix, reordered according to the node numbering in Formula (3); entries in the core block are shown in yellow. (b) A graphical layout of the network. Core nodes are shown in yellow. (c) The subgraph induced by core nodes.
Symmetry 12 00094 g002
Figure 3. Core–periphery bipartition of the network Zachary. (a) The adjacency matrix, reordered according to the node numbering in Formula (3); entries in the core block are shown in yellow. (b) A graphical layout of the network. Core nodes are shown in yellow. Node numbering as in the original graph. (c) The subgraph induced by core nodes.
Figure 3. Core–periphery bipartition of the network Zachary. (a) The adjacency matrix, reordered according to the node numbering in Formula (3); entries in the core block are shown in yellow. (b) A graphical layout of the network. Core nodes are shown in yellow. Node numbering as in the original graph. (c) The subgraph induced by core nodes.
Symmetry 12 00094 g003
Figure 4. Core–periphery bipartition of two equitable graphs with n = 30 and three blocks. The partition matrices are P ( 4 ) (a,b) and P ( 9 ) (c,d), respectively, where P ( a ) is given in Equation (10). Yellow dots correspond to core nodes, according to Algorithm 1.
Figure 4. Core–periphery bipartition of two equitable graphs with n = 30 and three blocks. The partition matrices are P ( 4 ) (a,b) and P ( 9 ) (c,d), respectively, where P ( a ) is given in Equation (10). Yellow dots correspond to core nodes, according to Algorithm 1.
Symmetry 12 00094 g004
Figure 5. Core sizes in power-law networks generated by the Chung–Lu random graph model. Horizontal axes: size n of a power-law network with average degree 3. Vertical axes: number of nodes in the corresponding core. Each cross represents a random network. Solid lines show the O ( n 1 / γ ) behavior. (a) γ = 2.1 ; (b) γ = 2.5 ; (c) γ = 2.9 .
Figure 5. Core sizes in power-law networks generated by the Chung–Lu random graph model. Horizontal axes: size n of a power-law network with average degree 3. Vertical axes: number of nodes in the corresponding core. Each cross represents a random network. Solid lines show the O ( n 1 / γ ) behavior. (a) γ = 2.1 ; (b) γ = 2.5 ; (c) γ = 2.9 .
Symmetry 12 00094 g005
Table 1. Core nodes identified in the Zachary network by different algorithms.
Table 1. Core nodes identified in the Zachary network by different algorithms.
Algorithm 1BE-KL [15]2-step [15]Divisive [15]Rich-core [16]
1 , 2 , 3 , 4 , 32 , 33 , 34 1 , 2 , 3 , 33 , 34 1 , 2 , 4 , 33 , 34 1 , 2 , 3 , 4 , 24 , 33 , 34 1 , 2 , 3 , 4 , 9 , 14 , 24 , 32 , 33 , 34

Share and Cite

MDPI and ACS Style

Fasino, D.; Rinaldi, F. A Fast and Exact Greedy Algorithm for the Core–Periphery Problem. Symmetry 2020, 12, 94. https://doi.org/10.3390/sym12010094

AMA Style

Fasino D, Rinaldi F. A Fast and Exact Greedy Algorithm for the Core–Periphery Problem. Symmetry. 2020; 12(1):94. https://doi.org/10.3390/sym12010094

Chicago/Turabian Style

Fasino, Dario, and Franca Rinaldi. 2020. "A Fast and Exact Greedy Algorithm for the Core–Periphery Problem" Symmetry 12, no. 1: 94. https://doi.org/10.3390/sym12010094

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop