Article

A General Computational Approach for Counting Labeled Graphs

1 Division of Infectious Diseases and Global Public Health, University of California San Diego, La Jolla, CA 92093, USA
2 Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
* Author to whom correspondence should be addressed.
Algorithms 2023, 16(1), 16; https://doi.org/10.3390/a16010016
Submission received: 23 September 2022 / Revised: 15 December 2022 / Accepted: 16 December 2022 / Published: 27 December 2022
(This article belongs to the Section Combinatorial Optimization, Graph, and Network Algorithms)

Abstract

This paper presents a general recursive formula to estimate the number of labeled graphs as well as details to evaluate the formula for the following graph properties: number of edges (graph density), degree sequence, degree distribution, classification mixing, and degree mixing, i.e., the formula estimates the number of labeled graphs that have given values for graph properties. The proposed approach can be extended to additional graph properties (e.g., number of triangles) as well as properties of bipartite graphs. For special settings in which formulas exist from previous research, simulation studies demonstrate the validity of the proposed approach. In addition, we demonstrate how our approach can be used to quantify the level of variability in values of a graph property in the subset of graphs that hold a specified value of a different graph property (or properties) constant.

1. Introduction

Graph enumeration is a well-established area of combinatorics for counting graphs with particular features. Examples of such enumeration include determining how many graphs exist with a given number of vertices or edges, or a given degree sequence. Approaches for counting graphs fall into two categories based on whether their vertices are labeled or unlabeled. In the former case, vertices of a graph are labeled in a way that makes them distinguishable from one another. In the latter, all permutations of vertices are considered to form the same graph [1]. In social network analysis—our area of interest—vertices are most often distinguishable from each other; hence, we focus on labeled graph enumeration.
Recently, Iñiguez et al. discussed bridging the gap between graph theory and social network analysis [2]. As they note, connecting these two disciplines may clarify the role of randomness in modeling dynamical systems, such as those that govern the spread of infectious diseases within a population. Simulation studies that investigate such spread and the impact of interventions on it often make use of agent-based models (ABMs) [3,4]. An important component of ABMs is the formulation of interactions among the agents in the model; these interactions can be represented as a graph. We refer to the collection of such interactions as a contact network (in keeping with the infectious disease transmission literature). ABMs often generate the contact network from a stochastic process. Graph enumeration can aid in interpreting results of simulation studies of processes that operate on graphs (e.g., spread of infection) by permitting assessment of the contribution to variation in the results (e.g., total number infected in unit time) that arises from variation in the generated contact networks. Higher levels of the latter might be expected to lead to higher levels of the former. We provide an application of our methods to demonstrate how graph enumeration can help in quantifying variation in graphs.
Current solutions to graph enumeration problems are individually tailored to particular properties (such as degree sequence) [5]. These solutions are either closed-form mathematical expressions or asymptotic formulas. Equations to calculate the number of labeled graphs with various characteristics have been reported; these include rooted graphs, connected graphs, and directed graphs [1]. Considerable research has been devoted to estimating the number of labeled graphs with a given degree sequence—a property important in social network analysis [6,7,8,9]. However, there has been little research focusing on other important properties in social network analysis, such as degree mixing and number of triangles.
Below, we propose a general approach for counting labeled graphs that applies to several graph properties, including degree sequence. Our approach deviates from the standard one of developing a closed-form or asymptotic formula; instead, we propose an algorithmic approach to the graph enumeration problem. The next section provides terminology used in the paper. Section 3 presents a general recursive formula to estimate the number of labeled graphs as well as details to evaluate the formula for the following graph properties: number of edges (graph density), degree sequence, degree distribution, classification mixing, and degree mixing. For settings in which formulas exist from previous research, Section 4 presents simulation studies demonstrating the degree of similarity between our proposed methods and those that are currently available. In Section 5, we apply the proposed approach to estimate the number of labeled graphs associated with different values of degree distribution and degree mixing that arise from the Barabási–Albert model to investigate the variation across graphs generated with this model [10]. The paper concludes with a discussion and directions for further research.

2. Terminology

We represent a graph, $G = (V, E)$, as an adjacency matrix with dimensions equal to the size of set $V$. Therefore, $G$ has dimensions $|V| \times |V|$, where $|V|$ denotes the size of set $V$. Let $n$ and $m$ represent the number of vertices in $G$, i.e., $n = |V|$, and the number of edges, i.e., $m = |E|$, respectively. Let $\{v_1, \ldots, v_n\}$ denote the vertices in set $V$, which are labeled (arbitrarily) so that they are distinguishable from one another. Let $(v_i, v_j)$ denote an edge between $v_i$ and $v_j$. Let $G[i, j] = 1$ indicate that there is an edge between $v_i$ and $v_j$, where $v_i, v_j \in \{v_1, \ldots, v_n\}$, while $G[i, j] = 0$ indicates that there is no edge. Denote the neighbors of $v_i$ as $\eta(v_i)$, i.e., $\eta(v_i) = \{v_j : G[i, j] = 1\}$. Let $\mathcal{G}_n$ be the set of all simple labeled graphs with $n$ vertices.
Let $\phi_1$ denote an algebraic map from a graph $G$ to its number of edges, i.e., $\phi_1(G) = m$. The degree of vertex $v_i$, denoted as $d_i(G)$, is the number of edges the vertex has with other vertices in $V$; therefore, $d_i(G) = \sum_j G[i, j]$. Let $d(G) = (d_1(G), \ldots, d_n(G))$ represent the vector of degrees for vertices in set $V$, commonly referred to as a degree sequence. The degree distribution, denoted as $D(G)$, is a vector representing the counts of these degrees over all vertices in set $V$; the $k$th entry represents the number of vertices with degree $k$, i.e., $D_k(G) = \sum_{i=1}^{n} I\{d_i(G) = k\}$. Let $\phi_2$ and $\phi_{2a}$ denote the mappings from a graph to its degree distribution, i.e., $\phi_2(G) = D(G)$, and to its degree sequence, i.e., $\phi_{2a}(G) = d(G)$, respectively.
Let $m_i(G)$ represent a discrete classification for vertex $v_i$ in graph $G$; we denote the number of distinct classifications as $q$. Let $m(G) = (m_1(G), \ldots, m_n(G))$ be a vector containing the classifications of all vertices. The classification distribution, denoted as $M(G)$, is a vector representing the counts of these classifications over all vertices; the $k$th entry represents the number of vertices with classification $k$, i.e., $M_k(G) = \sum_{i=1}^{n} I\{m_i(G) = k\}$. Let $MM(G)$ be a $q \times q$ symmetric matrix representing the mixing by classification of graph $G$; we refer to $MM(G)$ as a classification mixing matrix. The entry $MM_{k,l}(G)$ is the total number of edges between a vertex with classification $k$ and a vertex with classification $l$. Let $\phi_3$ denote the mapping from a graph to its classification mixing matrix. Let $DMM(G)$ be a particular mixing matrix in which the classification is the vertex degree. Therefore, the entry $DMM_{k,l}(G)$ is the total number of edges between vertices of degree $k$ and $l$. Let $\phi_4$ denote the mapping from a graph to its degree mixing matrix.
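As a minimal illustrative sketch (not part of the original analysis), the degree-based summaries defined above can be computed from an edge list as follows. The function names, the 0-based vertex indexing, and the sparse representation of $D(G)$ and $DMM(G)$ as dictionaries are assumptions made for this example; the entry for degrees $(k, l)$ is stored once with $k \le l$ because the matrix is symmetric.

```python
from collections import Counter


def degrees(n, edges):
    """Return the degree sequence d(G) for vertices 0..n-1."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_distribution(n, edges):
    """Return D(G) as a mapping: degree k -> number of vertices with degree k."""
    return Counter(degrees(n, edges))


def degree_mixing_matrix(n, edges):
    """Return DMM(G) as a mapping: (k, l) with k <= l -> number of edges
    joining a vertex of degree k to a vertex of degree l."""
    deg = degrees(n, edges)
    mixing = Counter()
    for u, v in edges:
        k, l = sorted((deg[u], deg[v]))
        mixing[(k, l)] += 1
    return mixing


# Path graph 0 - 1 - 2: two end vertices of degree 1, one middle vertex of degree 2.
path = [(0, 1), (1, 2)]
print(degrees(3, path))
print(dict(degree_distribution(3, path)))
print(dict(degree_mixing_matrix(3, path)))
```

For the path graph, both edges join a degree-1 vertex to the degree-2 vertex, so the only non-zero entry of the degree mixing matrix is the one for degrees $(1, 2)$.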
Denote the inverse images associated with a map $\phi_i$ as $c_{\phi_i}(x) = \{G : \phi_i(G) = x, G \in \mathcal{G}_n\}$. These inverse images of singleton sets have been referred to as fibers in the algebraic statistics literature [11]. The graph enumeration problem is calculating the size of a fiber, denoted as $|c_{\phi_i}(x)|$, which represents the number of graphs where the graph property associated with $\phi_i$ equals $x$; this quantity has been referred to as a volume factor [12]. We refer to $x$ as a graphical value associated with $\phi_i$ if $|c_{\phi_i}(x)| \geq 1$.
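To make the definition of a fiber concrete, the following sketch counts fiber sizes by brute-force enumeration of all simple labeled graphs on a very small vertex set. It is an illustration only and is feasible only for tiny $n$, since there are $2^{\binom{n}{2}}$ graphs to enumerate; the helper names `fiber_size`, `num_edges`, and `degree_sequence` are hypothetical and not taken from the paper.

```python
from itertools import combinations


def num_edges(n, edges):
    """Map a graph to its number of edges (the role played by phi_1)."""
    return len(edges)


def degree_sequence(n, edges):
    """Map a graph to its degree sequence (the role played by phi_2a)."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return tuple(deg)


def fiber_size(n, phi, x):
    """Count simple labeled graphs G on n vertices with phi(G) == x."""
    dyads = list(combinations(range(n), 2))
    count = 0
    for m in range(len(dyads) + 1):
        for edges in combinations(dyads, m):
            if phi(n, edges) == x:
                count += 1
    return count


# Graphs on 4 labeled vertices with exactly 2 edges: choose 2 of the 6 dyads.
print(fiber_size(4, num_edges, 2))                   # 15
# 2-regular graphs on 4 labeled vertices: the three labeled 4-cycles.
print(fiber_size(4, degree_sequence, (2, 2, 2, 2)))  # 3
```

Such exhaustive counts can serve as a check on any enumeration method for small graphs before it is applied at sizes where enumeration is impossible.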

3. Methods

This section first presents a general recursive formula to estimate the number of labeled graphs; details for specific graph properties, e.g., degree distribution, follow. Equation (1) provides a recursive formula to estimate the number of graphs, $|c_{\phi_i}(x_k)|$, with specific value(s), $x_k$, for particular graph properties associated with $\phi_i$:
$$|c_{\phi_i}(x_k)| = r_{\phi_i}(x_k, x_{k-1}) \, |c_{\phi_i}(x_{k-1})|, \tag{1}$$
where $r_{\phi_i}(x_k, x_{k-1})$ is the ratio between the sizes of fibers $c_{\phi_i}(x_k)$ and $c_{\phi_i}(x_{k-1})$, i.e.,
$$r_{\phi_i}(x_k, x_{k-1}) = \frac{|c_{\phi_i}(x_k)|}{|c_{\phi_i}(x_{k-1})|}. \tag{2}$$
Goyal et al. [13] provide equations to calculate $r_{\phi}(x_i, x_{i-1})$ for a range of graph properties, including number of edges, classification mixing, degree distribution, degree mixing, and number of triangles (controlling for degree mixing), when $x_i$ and $x_{i-1}$ are specified such that there exist graphs $G_i$ and $G_{i-1}$ where:
s1. $G_i$ and $G_{i-1}$ differ by the presence or absence of a single edge;
s2. $\phi_i(G_i) = x_i$; and
s3. $\phi_i(G_{i-1}) = x_{i-1}$.
To make use of the recursive formula and previous work by Goyal et al. [13], it is necessary to specify a sequence of values $\{x_0, \ldots, x_k\}$ such that there exist graphs $\{G_0, \ldots, G_k\}$ where each consecutive pair satisfies s1–s3. In addition, we need to be able to calculate $|c_{\phi_i}(x_0)|$. Although there is no constraint on $x_0$, it is often useful to set $x_0$ equal to the value of the graph properties associated with the empty graph; hence, typically, $|c_{\phi}(x_0)| = 1$. Throughout this paper, we follow this approach. In the sections below, we provide details for calculating $|c_{\phi_i}(x)|$ when $\phi_i$ and $x$ are associated with a given number of edges, degree distribution, degree sequence, classification mixing matrix, and degree mixing matrix.
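For reference, applying Equation (1) repeatedly along the sequence $x_0, \ldots, x_k$ unrolls the recursion into a product of ratios. The display below is a direct consequence of Equations (1) and (2) rather than an additional result; the index $j$ is introduced here only for the product and sum.

```latex
\begin{aligned}
|c_{\phi_i}(x_k)| &= |c_{\phi_i}(x_0)| \prod_{j=1}^{k} r_{\phi_i}(x_j, x_{j-1}), \\
\log |c_{\phi_i}(x_k)| &= \log |c_{\phi_i}(x_0)| + \sum_{j=1}^{k} \log r_{\phi_i}(x_j, x_{j-1}).
\end{aligned}
```

Working on the log scale keeps the computation numerically stable, because the counts themselves quickly exceed the range of floating-point numbers. When $x_0$ corresponds to the empty graph, the first term on the right-hand side of the second line is zero.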

3.1. Graph Enumeration Problem: Calculate the Number of Labeled Graphs of Size n with m Edges

In this section, we calculate $|c_{\phi_1}(m)|$. To address this graph enumeration problem, we specify the following sequence of numbers of edges for the recursive procedure: $x_0, \ldots, x_k = m$, where $x_i = i$. Theorem 1 proves that there exists a collection of graphs that is consistent with this specification of numbers of edges, i.e., there exists a collection of graphs $G_0, \ldots, G_k$ where each consecutive pair satisfies s1–s3 for $\phi_1$.
Theorem 1.
For a sequence of numbers of edges $x_0, \ldots, x_k$, where $x_i = i$ and $k \leq \binom{n}{2}$, there exists a collection of graphs $G_0, \ldots, G_k$ where each consecutive pair satisfies s1–s3 for $\phi_1$ and $G_i \in \mathcal{G}_n$ for all $i \in \{0, \ldots, k\}$.
Proof.
Let $E = \{e_1, \ldots, e_k\}$ be a set of distinct edges among vertices $\{1, \ldots, n\}$; this is possible because $k \leq \binom{n}{2}$. Let $G_i$ denote the graph formed with the first $i$ edges from $E$, i.e., $G_i$ contains edges $\{e_1, \ldots, e_i\}$. Based on the definition of $G_i$, $\phi_1(G_i) = x_i = i$, $\phi_1(G_{i-1}) = x_{i-1} = i - 1$, and $G_i$ and $G_{i-1}$ differ by a single edge.    □
Since $x_0, \ldots, x_k$ satisfies s1–s3, we can use results from Goyal et al. [13] to calculate $r_{\phi_1}(x_i, x_{i-1})$ as shown below:
$$r_{\phi_1}(x_i, x_{i-1}) = \frac{\binom{n}{2} - x_{i-1}}{x_i}. \tag{3}$$
Using Equation (3) along with the specification of $x_0, \ldots, x_k = m$ as $x_i = i$ and $|c_{\phi_1}(x_0)| = 1$, it is possible to calculate $|c_{\phi_1}(m)|$. Section 4.1.1 provides a numerical example for calculating $|c_{\phi_1}(m)|$ when $m = 10$, while Section 4.1.2 provides a comparison between the recursive formula and a previously established formula.
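The recursion for the number of edges can be sketched in a few lines of Python. This is a minimal illustration under the assumptions stated in this section (the sequence $x_i = i$ and $|c_{\phi_1}(x_0)| = 1$); the function name is hypothetical, and the comparison against `math.comb` is included only as a check.

```python
from math import comb, log


def log_count_graphs_with_m_edges(n, m):
    """Log number of simple labeled graphs on n vertices with m edges,
    accumulated from the ratio (C(n, 2) - x_{i-1}) / x_i with x_i = i."""
    dyads = comb(n, 2)
    if not 0 <= m <= dyads:
        raise ValueError("m must lie between 0 and C(n, 2)")
    log_count = 0.0  # log|c(x_0)| = log(1): only the empty graph has 0 edges
    for i in range(1, m + 1):
        log_count += log((dyads - (i - 1)) / i)
    return log_count


recursive = log_count_graphs_with_m_edges(1000, 10)
closed_form = log(comb(comb(1000, 2), 10))
print(round(recursive, 2), round(closed_form, 2))
```

Accumulating log ratios rather than the ratios themselves avoids overflow for large $n$ and $m$; the loop performs $m$ subtractions and divisions.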

3.2. Graph Enumeration Problem: Calculate the Number of Labeled Graphs of Size n with Degree Distribution D

To calculate $|c_{\phi_2}(D)|$, the number of labeled graphs with degree distribution $D$, using the recursive formula, we need to specify a sequence of degree distributions, $x_0, \ldots, x_k = D$. We specify such a sequence by leveraging the Havel–Hakimi algorithm [14,15]. Let $d$ be any degree sequence that is consistent with degree distribution $D$. The Havel–Hakimi algorithm permits identification of a set of edges, denoted as $E$, that can be used to construct a graph with degree sequence $d$. Algorithm 1 provides a procedure to identify $E$.
Algorithm 1: Degree distribution [algorithm listing provided as an image in the source]
Let $G_i$ denote the graph formed with the first $i$ edges from $E$, i.e., $G_i$ contains edges $\{e_1, \ldots, e_i\}$. Let $x_i$ denote the degree distribution associated with graph $G_i$, i.e., $x_i = D(G_i)$. Theorem 2 states that $x_0, \ldots, x_k = D$ satisfies s1–s3.
Theorem 2.
Let $E$ denote the collection of edges output by Algorithm 1 with a graphical degree distribution $D$ as input. Let $G_i$ denote the graph formed with the first $i$ edges from $E$, i.e., $G_i$ contains edges $\{e_1, \ldots, e_i\}$. Let $x_i = \phi_2(G_i)$. Each consecutive pair in the collection of graphs $G_0, \ldots, G_k$ satisfies s1–s3 for $\phi_2$.
Proof.
The edges in $E$ are distinct. Therefore, by design, the conditions are satisfied.    □
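As an illustration of how an ordered edge list can be obtained from a degree sequence, the sketch below implements a textbook Havel–Hakimi construction. It is an assumption-laden example that may differ in detail from Algorithm 1: the function name, the 0-based vertex indexing, and the tie-breaking rule (lowest index first) are choices made here for illustration.

```python
def havel_hakimi_edges(degree_sequence):
    """Return an ordered list of distinct edges realizing the degree sequence,
    or raise ValueError if the sequence is not graphical."""
    residual = list(degree_sequence)
    edges = []
    while True:
        # Order vertices by remaining degree (largest first), ties by index.
        order = sorted(range(len(residual)), key=lambda v: (-residual[v], v))
        v = order[0] if order else None
        if v is None or residual[v] == 0:
            return edges  # all degrees satisfied
        needed = residual[v]
        targets = order[1:needed + 1]
        # The vertex must connect to `needed` other vertices with remaining degree.
        if len(targets) < needed or any(residual[u] == 0 for u in targets):
            raise ValueError("degree sequence is not graphical")
        for u in targets:
            residual[u] -= 1
            edges.append((min(u, v), max(u, v)))
        residual[v] = 0


# A 2-regular degree sequence on 6 vertices yields 6 distinct edges.
edge_list = havel_hakimi_edges([2] * 6)
print(edge_list)
```

Because a vertex is never selected or targeted again once its remaining degree reaches zero, no edge is produced twice, which is the property that the proof of Theorem 2 relies on.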
Let $(v_l, v_j)$ be the single edge that differs between $G_i$ and $G_{i-1}$. Based on results from Goyal et al. [13]:
$$r_{\phi_2}(x_i, x_{i-1}) = \frac{\beta_1(G_{i-1}) - \alpha_1(G_{i-1})}{\alpha_1(G_i)}, \tag{4}$$
where
$$\alpha_1(G) = E\left(DMM_{d_l(G), d_j(G)} \mid D(G)\right); \tag{5}$$
$$\beta_1(G) = \begin{cases} D_{d_l(G)}(G) \, D_{d_j(G)}(G) & \text{if } d_l(G) \neq d_j(G) \\ \binom{D_{d_l(G)}(G)}{2} & \text{else,} \end{cases} \tag{6}$$
and based on Newman [16],
$$E(DMM_{x,y} \mid D) \approx \frac{x D_x \times y D_y}{0.5 \left(\sum_z z D_z\right)} \times \left(\frac{1}{2}\right)^{I\{x = y\}}. \tag{7}$$
Using Equation (4) along with the specification of $x_0, \ldots, x_k = D$ as defined above and $|c_{\phi_2}(x_0)| = 1$ (as only the empty graph has degree distribution $x_0$), it is possible to calculate $|c_{\phi_2}(D)|$.

3.3. Graph Enumeration Problem: Calculate the Number of Labeled Graphs of Size n with Degree Sequence d

The number of graphs with degree sequence $d$, $|c_{\phi_{2a}}(d)|$, can be computed by dividing the number of labeled graphs with the degree distribution consistent with $d$, denoted as $D(d)$, by the number of ways of assigning vertices to degrees. Specifically,
$$|c_{\phi_{2a}}(d)| = \frac{|c_{\phi_2}(D(d))|}{\prod_{j=0}^{n} \binom{n - \sum_{k=0}^{j-1} D_k(d)}{D_j(d)}}. \tag{8}$$
Section 4.2.1 provides a numerical example for calculating $|c_{\phi_{2a}}(d)|$, while Section 4.2.2 provides a comparison between the presented recursive formula and a formula by Liebenau et al. [9].
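The denominator of Equation (8) counts the ways of assigning the $n$ labeled vertices to degrees so that the degree distribution is respected. A minimal sketch of this calculation on the log scale follows; the function name and the list representation of the degree distribution (entry $j$ holds the number of vertices with degree $j$) are assumptions made for illustration.

```python
from math import comb, exp, log


def log_degree_assignments(degree_distribution):
    """Log number of ways to assign labeled vertices to degrees.

    degree_distribution[j] is the number of vertices with degree j, so the
    entries sum to n. The result is the log of the product of binomial
    coefficients C(n - sum_{k<j} D_k, D_j) over j.
    """
    remaining = sum(degree_distribution)  # vertices not yet assigned a degree
    log_ways = 0.0
    for count in degree_distribution:
        log_ways += log(comb(remaining, count))
        remaining -= count
    return log_ways


# One vertex of degree 0, two of degree 1, one of degree 2: 4!/(1! 2! 1!) = 12.
print(round(exp(log_degree_assignments([1, 2, 1]))))
# A regular graph has a single degree value, so there is only one assignment.
print(log_degree_assignments([0, 0, 5]))
```

Subtracting this quantity from the log number of graphs with degree distribution $D(d)$ gives the log number of graphs with degree sequence $d$.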

3.4. Graph Enumeration Problem: Calculate the Number of Labeled Graphs of Size n with Classification Mixing Matrix MM

To calculate the number of labeled graphs with mixing matrix $MM$, we assume that the classification of all vertices, $M$, is known and that $MM$ is graphical. We specify $x_0, \ldots, x_k = MM$ as the following for $l \leq m$ ($x_i$ is symmetric):
$$x_i^{l,m} = \begin{cases} 0 & \text{if } i \leq \sum_{a=1}^{l-1} \sum_{b=a}^{q} x_k^{a,b} + \sum_{b=l}^{m-1} x_k^{l,b} \\ x_k^{l,m} & \text{else if } i \geq \sum_{a=1}^{l-1} \sum_{b=a}^{q} x_k^{a,b} + \sum_{b=l}^{m} x_k^{l,b} \\ i - \left( \sum_{a=1}^{l-1} \sum_{b=a}^{q} x_k^{a,b} + \sum_{b=l}^{m-1} x_k^{l,b} \right) & \text{else,} \end{cases} \tag{9}$$
where $q$ is the number of distinct classifications. Theorem 3 proves that there are graphs consistent with $\{x_0, \ldots, x_k\}$ that satisfy s1–s3.
Theorem 3.
For a sequence of mixing matrices $\{x_0, \ldots, x_k\}$ defined by Equation (9), there exists a collection of graphs $G_0, \ldots, G_k$ where each consecutive pair satisfies s1–s3 for $\phi_3$.
Proof.
To show this, let $E = \{e_1, \ldots, e_k\}$ be a set of distinct edges where the first $x_k^{1,1}$ are between vertices with classification 1, the next $x_k^{1,2}$ are between vertices with classifications 1 and 2, and so on. Let $G_i$ denote the graph formed with the first $i$ edges from $E$, i.e., $G_i$ contains edges $\{e_1, \ldots, e_i\}$. Based on the definition of $G_i$, $\phi_3(G_i) = x_i$, $\phi_3(G_{i-1}) = x_{i-1}$, and $G_i$ and $G_{i-1}$ differ by a single edge.    □
Let $(v_l, v_j)$ be the single edge that differs between $G_i$ and $G_{i-1}$. Given Theorem 3, $r_{\phi_3}(x_i, x_{i-1})$ is the following:
If $m_l \neq m_j$,
$$r_{\phi_3}(x_i, x_{i-1}) = \frac{M_{m_l} \times M_{m_j} - x_{i-1}^{l,j}}{x_i^{l,j}}, \tag{10}$$
else,
$$r_{\phi_3}(x_i, x_{i-1}) = \frac{\binom{M_{m_l}}{2} - x_{i-1}^{l,j}}{x_i^{l,j}}. \tag{11}$$
Using Equations (10) and (11) along with the specification of $x_0, \ldots, x_k = MM$ as defined above and $|c_{\phi_3}(x_0)| = 1$, it is possible to calculate $|c_{\phi_3}(MM)|$.
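The following sketch accumulates the ratios in Equations (10) and (11) block by block, in the edge order used in the proof of Theorem 3. It is an illustration under stated assumptions: the function name is hypothetical, classifications are indexed from 0, and `mixing[a][b]` for `a <= b` holds the target number of edges between classifications `a` and `b`.

```python
from math import comb, exp, log


def log_count_classification_mixing(class_sizes, mixing):
    """Log number of labeled graphs with the given classification mixing matrix.

    class_sizes[a] is the number of vertices with classification a.
    Only the upper triangle (a <= b) of the symmetric matrix `mixing` is read.
    """
    q = len(class_sizes)
    log_count = 0.0  # log|c(x_0)| = log(1) for the empty graph
    for a in range(q):
        for b in range(a, q):
            # Number of dyads available between (or within) the classifications.
            if a == b:
                dyads = comb(class_sizes[a], 2)
            else:
                dyads = class_sizes[a] * class_sizes[b]
            target = mixing[a][b]
            if target > dyads:
                raise ValueError("mixing matrix is not graphical")
            # Add edges of this type one at a time: (dyads - x_{i-1}) / x_i.
            for edges_after in range(1, target + 1):
                log_count += log((dyads - (edges_after - 1)) / edges_after)
    return log_count


sizes = [3, 2]
target_mixing = [[1, 2], [2, 0]]
print(round(exp(log_count_classification_mixing(sizes, target_mixing))))
```

In this small case the result can be checked by hand: edges within and between classifications are chosen independently, giving $\binom{3}{1} \times \binom{6}{2} \times \binom{1}{0} = 45$ graphs.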

3.5. Graph Enumeration Problem: Calculate the Number of Graphs of Size n with Degree Mixing Matrix DMM

To calculate the number of labeled graphs with degree mixing $DMM$, we follow an approach similar to that for degree distribution. Specifically, we use a constructive proof for assessing whether a degree mixing matrix is graphical to specify a set of edges, $E$, that can be used to construct a graph with degree mixing $DMM$ [13]; Algorithm 2 provides a procedure to construct $E$.
Algorithm 2: Degree Mixing [algorithm listing provided as an image in the source]
Theorem 4.
Let $E$ denote the collection of edges output by Algorithm 2 with a graphical degree mixing matrix $DMM$ as input. Let $G_i$ denote the graph formed with the first $i$ edges from $E$, i.e., $G_i$ contains edges $\{e_1, \ldots, e_i\}$. Let $x_i = \phi_4(G_i)$. Each consecutive pair in the collection of graphs $G_0, \ldots, G_k$ satisfies s1–s3 for $\phi_4$.
Proof.
The edges in $E$ are distinct. Therefore, by design, the conditions are satisfied.    □
Based on the definition of $x_0$ in Theorem 4, $|c_{\phi_4}(x_0)| = 1$. Let $(v_l, v_j)$ be the single edge that differs between $G_i$ and $G_{i-1}$. Based on results from Goyal et al. [13]:
$$r_{\phi_4}(x_i, x_{i-1}) = \frac{[\gamma_1(G_{i-1}) - \alpha_2(G_{i-1})] \times \beta^{0}_{(l,j)}(G_{i-1})}{DMM_{d_l(G_i), d_j(G_i)}(G_i) \times \beta^{1}_{(l,j)}(G_i)}, \tag{12}$$
where
$$\alpha_2(G) = DMM_{d_l(G), d_j(G)}(G); \tag{13}$$
$$\gamma_1(G) = \begin{cases} D_{d_l(G)}(G) \, D_{d_j(G)}(G) & \text{if } d_l(G) \neq d_j(G) \\ \binom{D_{d_l(G)}(G)}{2} & \text{else;} \end{cases} \tag{14}$$
and based on concepts from Newman [16], if $d_l(G) \neq d_j(G)$,
$$\beta^{s}_{(l,j)}(G) \approx \frac{\prod_z \binom{DMM'_{d_l(G), z}(G) - I\{d_j(G) = z\} \cdot s}{n_l^z - I\{d_j(G) = z\} \cdot s}}{\binom{\sum_z \left( DMM'_{d_l(G), z}(G) - I\{d_j(G) = z\} \cdot s \right)}{d_l(G) - s}} \times \frac{\prod_z \binom{DMM'_{z, d_j(G)}(G) - I\{d_l(G) = z\} \cdot s}{n_j^z - I\{d_l(G) = z\} \cdot s}}{\binom{\sum_z \left( DMM'_{z, d_j(G)}(G) - I\{d_l(G) = z\} \cdot s \right)}{d_j(G) - s}}, \tag{15}$$
else,
$$\beta^{s}_{(l,j)}(G) \approx \frac{\prod_z \binom{DMM'_{d_l(G), z}(G) - I\{d_j(G) = z\} \cdot s}{n_l^z + n_j^z - 2 I\{d_j(G) = z\} \cdot s}}{\binom{\sum_z \left( DMM'_{d_l(G), z}(G) - I\{d_j(G) = z\} \cdot s \right)}{d_l(G) + d_j(G) - 2s}}, \tag{16}$$
where $DMM'_{a,b}(G) = DMM_{a,b}(G)$ if $a \neq b$ and $DMM'_{a,b}(G) = 2 \, DMM_{a,b}(G)$ if $a = b$, and $n_l^z$ and $n_j^z$ denote the numbers of neighbors of $v_l$ and $v_j$, respectively, that have degree $z$.
Using Equation (12) along with the specification of $x_0, \ldots, x_k = DMM$ as defined above and $|c_{\phi_4}(x_0)| = 1$, it is possible to calculate $|c_{\phi_4}(DMM)|$.

3.6. Additional Graph Properties and Bipartite Graphs

The recursive formula and associated framework we propose can be used to calculate the number of labeled graphs for many additional graph properties. In particular, Goyal et al. [13] provide equations for $r_{\phi}(x_k, x_{k-1})$ for number of triangles (controlling for degree mixing) as well as for jointly specifying classification mixing matrix and degree distribution. In addition, the work of Goyal et al. [17] enables extending the calculation of $r_{\phi}(x_k, x_{k-1})$ to the setting of bipartite graphs.

4. Results

In this section, we present numerical examples, including validation results for estimating the number of labeled graphs, for two graph properties: number of edges (Section 4.1) and degree sequence (Section 4.2). To the authors' knowledge, our paper is the first to provide formulas for estimating the number of labeled graphs consistent with values for several graph properties (e.g., degree mixing matrix) described in Section 3; hence, for these properties, we are not able to compare our approach to any existing approach. In the application section (Section 5), we present results for the number of labeled graphs associated with particular degree distributions and degree mixing matrices using our presented approach.

4.1. Number of Edges

Although there is a closed-form expression for calculating the number of labeled graphs of size $n$ with $m$ edges, we use Equation (1) for calculation of this graph property to illustrate our recursive approach for graph enumeration.

4.1.1. Example

To illustrate the use of the recursive formula, we provide a numerical example where we estimate the number of graphs of size $n = 1000$ with exactly $m = 10$ edges; that is, we calculate $|c_{\phi_1}(m = 10)|$. As discussed in Section 3.1, we set $x_0 = 0, x_1 = 1, \ldots, x_k = 10$. Therefore, based on Equation (1):
$$|c_{\phi_1}(x_k = 10)| = r_{\phi_1}(x_k = 10, x_{k-1} = 9) \cdot |c_{\phi_1}(x_{k-1} = 9)|. \tag{17}$$
Based on Equation (3),
$$r_{\phi_1}(x_k = 10, x_{k-1} = 9) = \frac{\binom{1000}{2} - 9}{10} = 49949.1. \tag{18}$$
Therefore,
$$|c_{\phi_1}(x_k = 10)| = 49949.1 \cdot |c_{\phi_1}(x_{k-1} = 9)|. \tag{19}$$
The next step in the procedure is the calculation of $|c_{\phi_1}(x_{k-1} = 9)|$ using the following:
$$|c_{\phi_1}(x_{k-1} = 9)| = r_{\phi_1}(x_{k-1} = 9, x_{k-2} = 8) \cdot |c_{\phi_1}(x_{k-2} = 8)|. \tag{20}$$
The procedure ends at $x_0 = 0$. Therefore,
$$\log(|c_{\phi_1}(x_k = 10)|) = \sum_{i=1}^{10} \log(r_{\phi_1}(x_i, x_{i-1})) + \log(|c_{\phi_1}(x_0 = 0)|). \tag{21}$$
Table 1 provides the log values for $r(x_i, x_{i-1})$ for $i = 1$ to $10$. Based on these values and noting that $|c_{\phi_1}(x_0 = 0)| = 1$, we calculate $\log(|c_{\phi_1}(x_k = 10)|) = 116.11$.

4.1.2. Comparison

To illustrate the validity of the recursive formula in this setting, we compare the estimates of the number of graphs of size $n$ with $m$ edges, $|c_{\phi_1}(m = x)|$, based on the proposed recursive formula to those based on the following existing formula [1]:
$$|c_{\phi_1}(x)| = \binom{\binom{n}{2}}{x}. \tag{22}$$
Our comparison considers values of $x$ in $\{1, \ldots, 10\}$ for graphs of size $n = 1000$.
For each value of $x \in \{1, \ldots, 10\}$, Table 1 provides the log values for $r(x, x - 1)$ and $|c_{\phi_1}(x)|$ based on Equations (1) and (22). The estimates obtained from the proposed recursive formula and the known formula are identical, an expected finding given that a closed-form equation for $r_{\phi_1}(x, x - 1)$ exists.
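The agreement between the two formulas can also be seen algebraically. Writing $N = \binom{n}{2}$ (a shorthand introduced here only for this derivation), the product of the ratios in Equation (3) with $x_i = i$ telescopes to the binomial coefficient in Equation (22):

```latex
\prod_{i=1}^{x} \frac{N - (i - 1)}{i}
  = \frac{N (N - 1) \cdots (N - x + 1)}{x!}
  = \binom{N}{x},
\qquad N = \binom{n}{2}.
```

Since $|c_{\phi_1}(x_0)| = 1$, the recursive formula therefore reproduces the closed-form count exactly in this setting, up to floating-point rounding.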
Regarding complexity, computing the recursive formula for $|c_{\phi_1}(x)|$ requires computing $\binom{n}{2}$ as well as $x$ subtractions and divisions. Therefore, the complexity of the recursive formula is $O(x)$, where subtractions and divisions are $O(1)$.

4.2. Degree Sequence

This section illustrates the use of Equation (1) to estimate the number of graphs with a fixed degree sequence.

4.2.1. Example

To illustrate the use of the recursive formula, we provide a numerical example wherein we estimate the number of graphs with $n = 1000$ and a fixed degree sequence. In particular, we estimate the number of 2-regular graphs of size $n = 1000$; a $\delta$-regular graph is one in which each vertex has degree exactly $\delta$. Therefore, the degree sequence for a $\delta$-regular graph is $d = \{\delta, \ldots, \delta\}$.
As the number of distinct degree sequences for a degree distribution associated with a $\delta$-regular graph is one, Equation (8) simplifies to:
$$|c_{\phi_{2a}}(d)| = |c_{\phi_2}(D(d))|. \tag{23}$$
Using Algorithm 1, we generate a sequence of edges $E$, where $E$ specifies a 2-regular graph of size $n$. The edge list $E$ contains 1000 edges. A partial list of the edges in $E$, each denoted as a pair consisting of Vertex 1 and Vertex 2 in $\{1, \ldots, 1000\}$, is shown in Table 2. Using the edge list $E$, we generate a series of graphs, $G_0, \ldots, G_{1000}$, where $G_i$ is the graph that contains the first $i$ edges in $E$. Next, we generate a series of degree distributions $\{x_0, \ldots, x_{1000}\}$, where $x_i = \phi_2(G_i)$. Table 3 shows a partial list of the degree distribution sequence, i.e., $x_i$. For each consecutive pair of entries in the degree distribution sequence, we can use Equations (4)–(7) to estimate $r_{\phi_2}(x_i, x_{i-1})$. As $x_0$ is only consistent with the empty graph, $|c_{\phi_2}(x_0)| = 1$. We estimate the log number of 2-regular graphs as 5907.899; as we see in the next section, an existing formula by Liebenau et al. [9] estimates this as 5907.352.
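The step from an ordered edge list to the sequence of degree distributions $x_0, \ldots, x_k$ can be sketched as follows. This is an illustrative example only: the function name, the 0-based vertex indexing, and the dictionary representation of each degree distribution (degree mapped to number of vertices, with zero counts omitted) are assumptions, and the triangle used as input is a toy case rather than the edge list from Table 2.

```python
from collections import Counter


def degree_distribution_sequence(n, edges):
    """Return [x_0, ..., x_k], where x_i is the degree distribution of the
    graph formed by the first i edges (as a dict: degree -> vertex count)."""
    degree = [0] * n
    distribution = Counter({0: n})  # empty graph: all n vertices have degree 0
    sequence = [dict(distribution)]
    for u, v in edges:
        for w in (u, v):
            # Move vertex w from its current degree class to the next one.
            distribution[degree[w]] -= 1
            if distribution[degree[w]] == 0:
                del distribution[degree[w]]
            degree[w] += 1
            distribution[degree[w]] += 1
        sequence.append(dict(distribution))
    return sequence


# Triangle on 3 vertices, added one edge at a time.
for i, x_i in enumerate(degree_distribution_sequence(3, [(0, 1), (1, 2), (0, 2)])):
    print(i, x_i)
```

Each consecutive pair in the returned list differs by the addition of a single edge, so it is the kind of input for which the ratio $r_{\phi_2}(x_i, x_{i-1})$ is defined.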

4.2.2. Comparison

As in the previous section, we illustrate the validity of the recursive formula in this setting by comparing the estimates of $|c_{\phi_{2a}}(d)|$ based on the proposed recursive formula to those that result from an existing formula. In particular, we compare results of the proposed recursive approach to those resulting from the available formula for estimating the number of $\delta$-regular graphs of size $n = 1000$.
Liebenau et al. [9] proved the validity of a general asymptotic formula, conjectured in 1990, for the number of graphs with a given degree sequence. They also provide a formula that converges to the number of $\delta$-regular graphs as $n \to \infty$. This formula allows comparison of $|c_{\phi_{2a}}(d)|$, where $d$ is the degree sequence $d = \{\delta, \ldots, \delta\}$, obtained from this asymptotic formula to those from the proposed recursive formula.
Figure 1 shows log estimates for the number of $\delta$-regular graphs for $\delta$ from 1 to 10 for the two approaches. The red bars depict estimates based on the recursive formula introduced in this paper; the blue bars are estimates based on Liebenau et al. [9]. Each plot in Figure 1 shows log estimates for graphs of size 1000, 5000, and 10,000. The log estimates differ by less than 0.01%. To calculate the number of graphs with a given degree sequence based on the recursive formula, we first make use of the Havel–Hakimi algorithm, which has a complexity of $O(n^2)$ [18]. Next, the ratio presented in Equation (4) must be computed $(\sum_{i=1}^{n} d_i)/2$ times, where each calculation of the ratio is $O(1)$. Finally, we calculate the number of degree sequences that are associated with a degree distribution, which has complexity $O(n)$. Therefore, the complexity of the recursive formula for calculating the number of graphs of size $n$ with a degree sequence $d$ is $O(n^2) + O((\sum_{i=1}^{n} d_i)/2)$.

5. Application

In this section, we estimate the variation in degree mixing matrices consistent with degree distributions formed by the Barabási–Albert (BA) model [10]. The BA model can be initiated with a small seed graph that grows by the addition of new vertices one at a time. Each new vertex forms new edges with existing vertices based on preferential attachment rules. Vertices and edges, once introduced, are never deleted. The BA model fixes the number of (undirected) edges connected to each new vertex. The BA model provides a mechanism to generate graphs with a fat-tailed degree distribution, specifically a power-law degree distribution, wherein the probability, $P(k)$, that a vertex in the graph has degree $k$ decays as a power law, $P(k) \sim k^{-\gamma}$.
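For readers unfamiliar with preferential attachment, the growth process described above can be sketched as follows. This is a minimal, assumption-laden illustration and not the implementation used to produce the results in this section: the function name, the choice of a complete seed graph on `edges_per_vertex + 1` vertices, and the use of a repeated-endpoint list to sample vertices proportionally to degree are all choices made here for the example.

```python
import random


def barabasi_albert_edges(n, edges_per_vertex, seed=None):
    """Grow a graph on vertices 0..n-1 by preferential attachment and
    return its edge list."""
    m = edges_per_vertex
    if m < 1 or n < m + 1:
        raise ValueError("need edges_per_vertex >= 1 and n >= edges_per_vertex + 1")
    rng = random.Random(seed)
    # Assumed seed graph: complete graph on the first m + 1 vertices.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Each vertex appears in `endpoints` once per incident edge, so a uniform
    # draw from this list selects a vertex with probability proportional to degree.
    endpoints = [w for edge in edges for w in edge]
    for new_vertex in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((target, new_vertex))
            endpoints.extend((target, new_vertex))
    return edges


ba_edges = barabasi_albert_edges(n=50, edges_per_vertex=2, seed=0)
print(len(ba_edges))
```

With this seed graph, the total number of edges is $\binom{m+1}{2} + m(n - m - 1)$ regardless of the random draws, where $m$ here denotes the number of edges attached to each new vertex.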
To calculate the variation in degree mixing matrices consistent with degree distributions formed by the BA model, we first estimate the number of graphs consistent with a degree distribution associated with the BA model. Second, we estimate the number of graphs consistent with a degree mixing matrix associated with a degree distribution from the BA model. Third, we estimate the number of distinct degree mixing matrices associated with a degree distribution generated from the BA model, which provides a metric for the variation of graphs generated by the BA model.
For the first step, we generate 100 graphs using the BA model ($n = 5000$), denoted as $\{G_1^{BA}, \ldots, G_{100}^{BA}\}$. Figure 2 shows density plots for the log estimates for $|c_{\phi_2}(\phi_2(G_i^{BA}))|$ in the first panel. The average number of labeled graphs associated with a degree distribution generated from the BA model was estimated as 1.26e16988 (exponential of the mean of the first panel in Figure 2). Second, for each graph, we estimate $|c_{\phi_4}(\phi_4(G_i^{BA}))|$. Figure 2 (second panel) shows density plots for the log estimates for $|c_{\phi_4}(\phi_4(G_i^{BA}))|$. Figure 2 (third panel) shows a density plot for the log estimates for the number of distinct degree mixing matrices associated with a degree distribution generated from the BA model. The exponential of the mean gives an estimate of 4.16e634 distinct degree mixing matrices associated with a degree distribution generated from the BA model.

6. Discussion

This paper presents a general recursive formula to estimate the number of labeled graphs with specific values for graph properties of interest. We consider those with particular relevance for social network analysis: number of edges (graph density), degree sequence, degree distribution, classification mixing, and degree mixing. The proposed method can easily be extended to additional graph properties, including number of triangles (controlling for degree mixing), as well as to bipartite graphs; the formulas for Equation (2) are currently available. The proposed recursive formula differs from other available approaches for graph enumeration both in its overall approach and in the breadth of graph properties that can be considered; it may be profitable to investigate the theoretical connections between the proposed method and other approaches. Furthermore, graph enumeration has the potential to play an important role in statistical network analysis, because formulating the likelihood of observing a real-world graph with particular properties is necessary for making principled inferences.
One current area of research addresses the question of how to use results obtained from a study in one population setting to predict what the results of a similar study would have been in a different setting. Causal methodologists refer to such research as the study of transportability. This notion is related to the idea of generalizability of results to populations different from the one under study, but true generalizability requires that the two populations be similar in all factors that impact study results in important ways. For example, if characteristics such as the age or sex of recipients of interventions impacted their efficacy, then generalizability would require that the two populations be similar in these characteristics. Transportability analyses attempt to adjust for differences between populations when predicting quantities such as intervention effects in new populations. In the settings we consider, adjustment would be required not only for individual characteristics, but also potentially for graph features that impact intervention effects. For example, if degree assortativity (a summary measure of the degree mixing matrix) impacts the spread of disease or the effectiveness of interventions, then this factor would need to be taken into account when predicting spread or effectiveness in a population different from the one that was studied. The methods we describe would aid in the investigation of transportability in such settings by facilitating the development of ABM-based simulation studies in which graph properties can be chosen to reflect knowledge about those properties (including their uncertainty) in the setting of interest. Of course, in many settings detailed information about potentially important properties may be unavailable. This issue can be addressed by using the methods described above to assess the extent to which the unknown properties might impact intervention effectiveness (or other quantities of interest) in the new population [19].
Ideally, it would be safe to exclude these properties from the graph model; if not, an investigator could use the proposed methods to consider plausible ranges of these properties. Doing so would appropriately increase the uncertainty of the prediction of intervention effectiveness in the new population.

Author Contributions

Conceptualization, R.G. and V.D.G.; methodology, R.G. and V.D.G.; software, R.G.; validation, R.G.; writing—original draft preparation, R.G.; writing—review and editing, V.D.G.; visualization, R.G.; funding acquisition, R.G. and V.D.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Institutes of Health grants R37 AI-51164 and R01 AI-147441.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the paper; or in the decision to publish the results.

References

  1. Harary, F.; Palmer, E.M. Graphical Enumeration; Elsevier: Amsterdam, The Netherlands, 2014.
  2. Iñiguez, G.; Battiston, F.; Karsai, M. Bridging the gap between graphs and networks. Commun. Phys. 2020, 3, 1–5.
  3. Goyal, R.; Hotchkiss, J.; Schooley, R.T.; De Gruttola, V.; Martin, N.K. Evaluation of SARS-CoV-2 transmission mitigation strategies on a university campus using an agent-based network model. Clin. Infect. Dis. 2021, 73, 1735–1741.
  4. Hambridge, H.L.; Kahn, R.; Onnela, J.P. Examining SARS-CoV-2 interventions in residential colleges using an empirical network. Int. J. Infect. Dis. 2021, 113, 325–330.
  5. Lucatero, C.R. Combinatorial Enumeration of Graphs. In Probability, Combinatorics and Control; IntechOpen: London, UK, 2019.
  6. Read, R. The enumeration of locally restricted graphs (I). J. Lond. Math. Soc. 1959, 1, 417–436.
  7. Bender, E.A.; Canfield, E.R. The asymptotic number of labeled graphs with given degree sequences. J. Combin. Theory Ser. A 1978, 24, 296–307.
  8. Bollobás, B. A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. Eur. J. Combin. 1980, 1, 311–316.
  9. Liebenau, A.; Wormald, N. Asymptotic enumeration of graphs by degree sequence, and the degree sequence of a random graph. arXiv 2017, arXiv:1702.08373.
  10. Barabási, A.L.; Albert, R. Emergence of Scaling in Random Networks. Science 1999, 286, 509–512.
  11. Petrovic, S. A survey of discrete methods in (algebraic) statistics for networks. Algebr. Geom. Methods Discret. Math. 2017, 685, 260–281.
  12. Shalizi, C.R.; Rinaldo, A. Consistency under sampling of exponential random graph models. Ann. Stat. 2013, 41, 508–535.
  13. Goyal, R.; Blitzstein, J.; De Gruttola, V. Sampling networks from their posterior predictive distribution. Netw. Sci. 2014, 2, 107–131.
  14. Hakimi, S.L. On realizability of a set of integers as degrees of the vertices of a linear graph. J. SIAM 1962, 10, 496–506.
  15. Havel, V. A remark on the existence of finite graphs. Časopis Pest. Mat. 1955, 80, 477–480.
  16. Newman, M. Assortative mixing in networks. Phys. Rev. Lett. 2002, 89, 208701.
  17. Goyal, R.; De Gruttola, V. Inference on network statistics by restricting to the network space: Applications to sexual history data. Stat. Med. 2018, 37, 218–235.
  18. Sensarma, D.; Sen Sarma, S. Role of graphic integer sequence in the determination of graph integrity. Mathematics 2019, 7, 261.
  19. DeGruttola, V.; Goyal, R.; Martin, N.K.; Wang, R. Network methods and design of randomized trials: Application to investigation of COVID-19 vaccination boosters. Clin. Trials 2022, 19, 363–374.
Figure 1. Comparison: Log estimates of the number of $\delta$-regular graphs for degrees from 1 to 10. The red bars depict estimates based on the recursive formula introduced in this paper; the blue bars are estimates based on Liebenau et al. [9]. Each plot shows log estimates for graphs of size 1000, 5000, and 10,000.
Figure 2. Barabási–Albert (BA) model: Density plots for the log estimates for $\lvert c_{\phi_2}(\phi_2(G_i^{BA})) \rvert$ (first panel) [degree distribution], $\lvert c_{\phi_4}(\phi_j(G_i^{BA})) \rvert$ (second panel) [degree mixing], and number of distinct degree mixing matrices associated with a degree distribution generated from the BA model.
Table 1. Comparison of methods to calculate number of edges.

| $x$ | $\log(r(x, x-1))$ | $\log(\lvert c_{\phi_1}(x) \rvert)$ [Equation (1)] | $\log(\lvert c_{\phi_1}(x) \rvert)$ [Equation (22)] |
|---|---|---|---|
| 0 | | 0 | 0 |
| 1 | 13.12 | 13.12 | 13.12 |
| 2 | 12.42 | 25.55 | 25.55 |
| 3 | 12.02 | 37.57 | 37.57 |
| 4 | 11.74 | 49.31 | 49.31 |
| 5 | 11.51 | 60.82 | 60.82 |
| 6 | 11.33 | 72.15 | 72.15 |
| 7 | 11.18 | 83.32 | 83.32 |
| 8 | 11.04 | 94.37 | 94.37 |
| 9 | 10.92 | 105.29 | 105.29 |
| 10 | 10.82 | 116.11 | 116.11 |
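The agreement between the two count columns of Table 1 can be illustrated with a short sketch. It rests on assumptions that are not stated in this section: a graph on n = 1000 vertices (inferred from the tabulated values), and a ratio r(x, x−1) taken to be the ratio of consecutive binomial coefficients, (N − x + 1)/x with N = n(n − 1)/2 possible edges. The function names are hypothetical, and the closed form used for comparison is simply the binomial coefficient C(N, x), which may differ in form from Equation (22).

```python
import math


def log_counts_by_recursion(n_vertices, max_edges):
    """Accumulate log counts of labeled graphs with x edges from consecutive ratios."""
    n_pairs = n_vertices * (n_vertices - 1) // 2  # number of possible edges (assumed N)
    log_ratio = [None]   # no ratio is defined for x = 0
    log_count = [0.0]    # exactly one labeled graph has zero edges, and log(1) = 0
    for x in range(1, max_edges + 1):
        # Assumed ratio: C(N, x) / C(N, x - 1) = (N - x + 1) / x
        r = math.log(n_pairs - x + 1) - math.log(x)
        log_ratio.append(r)
        log_count.append(log_count[-1] + r)
    return log_ratio, log_count


def log_count_closed_form(n_vertices, x):
    """Log of the binomial coefficient C(N, x), computed with log-gamma functions."""
    n_pairs = n_vertices * (n_vertices - 1) // 2
    return (math.lgamma(n_pairs + 1)
            - math.lgamma(x + 1)
            - math.lgamma(n_pairs - x + 1))


if __name__ == "__main__":
    ratios, counts = log_counts_by_recursion(1000, 10)
    for x in range(11):
        ratio_text = "" if ratios[x] is None else f"{ratios[x]:.2f}"
        closed = log_count_closed_form(1000, x)
        print(f"{x:>2} {ratio_text:>6} {counts[x]:>7.2f} {closed:>7.2f}")
```

Working on the log scale keeps the running total well within floating-point range even though the counts themselves grow very quickly with x.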
Table 2. A partial list of the edges in E, denoted as a pair consisting of Vertex 1 and Vertex 2.

| Vertex 1 | Vertex 2 |
|---|---|
| $v_1$ | $v_2$ |
| $v_1$ | $v_3$ |
| $v_4$ | $v_5$ |
| $v_4$ | $v_6$ |
| $v_{14}$ | $v_{15}$ |
| $v_{11}$ | $v_{12}$ |
| $v_8$ | $v_9$ |
| $v_5$ | $v_6$ |
| $v_2$ | $v_3$ |
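Table 3. A partial list of the degree distribution sequences.

| | Degree 0 | Degree 1 | Degree 2 | Degree 3 |
|---|---|---|---|---|
| $x_0$ | 1000 | 0 | 0 | 0 |
| $x_1$ | 998 | 2 | 0 | 0 |
| $x_2$ | 997 | 2 | 1 | 0 |
| $x_3$ | 995 | 4 | 1 | 0 |
| $x_4$ | 994 | 4 | 2 | 0 |
| ⋮ | | | | |
| $x_{996}$ | 0 | 8 | 992 | 0 |
| $x_{997}$ | 0 | 6 | 994 | 0 |
| $x_{998}$ | 0 | 4 | 996 | 0 |
| $x_{999}$ | 0 | 2 | 998 | 0 |
| $x_{1000}$ | 0 | 0 | 1000 | 0 |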
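The rows of Table 3 can be checked for internal consistency with a small sketch. It assumes, which this section does not state, that the subscript of each $x_k$ is the number of edges in a graph on 1000 vertices, so that the vertex counts in a row must sum to 1000 and the degrees must sum to 2k. The function name and the `rows` dictionary are hypothetical; these are necessary conditions only and do not by themselves show that a row is realizable as a graph.

```python
def is_consistent(n_vertices, n_edges, counts):
    """Check two necessary conditions for a degree distribution.

    counts[d] is the number of vertices with degree d.
    """
    total_vertices = sum(counts)
    total_degree = sum(d * c for d, c in enumerate(counts))
    # Every vertex is counted once, and each edge contributes 2 to the degree sum.
    return total_vertices == n_vertices and total_degree == 2 * n_edges


# Rows transcribed from Table 3, keyed by the subscript k of x_k (assumed edge count).
rows = {
    0: (1000, 0, 0, 0),
    1: (998, 2, 0, 0),
    2: (997, 2, 1, 0),
    3: (995, 4, 1, 0),
    4: (994, 4, 2, 0),
    996: (0, 8, 992, 0),
    997: (0, 6, 994, 0),
    998: (0, 4, 996, 0),
    999: (0, 2, 998, 0),
    1000: (0, 0, 1000, 0),
}

if __name__ == "__main__":
    for k, counts in rows.items():
        print(k, is_consistent(1000, k, counts))
```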
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Goyal, R.; De Gruttola, V. A General Computational Approach for Counting Labeled Graphs. Algorithms 2023, 16, 16. https://doi.org/10.3390/a16010016