Enumerating Tree-Like Graphs and Polymer Topologies with a Given Cycle Rank

Cycle rank is an important notion that is widely used to classify, understand, and discover new chemical compounds. We propose a method to enumerate all non-isomorphic tree-like graphs of a given cycle rank with self-loops and no multiple edges. To achieve this, we develop an algorithm to enumerate all non-isomorphic rooted graphs with the required constraints. The idea of our method is to define a canonical representation of rooted graphs and to enumerate all non-isomorphic graphs by generating these canonical representations. An important feature of our method is that, for an integer n ≥ 1, it generates all required graphs with n vertices in O(n) time per graph and O(n) space in total, without generating invalid intermediate structures. We performed experiments to enumerate graphs with a given cycle rank, and the results indicate that our method is efficient. As an application of our method, we can generate tree-like polymer topologies of a given cycle rank with self-loops and no multiple edges.

The problem of enumerating chemical compounds with given constraints is often modeled as a graph enumeration problem. Several chemical compound generation methods have been proposed [3][4][5][6][7][8][9], where some methods [3,4] focus on general chemical compounds, while other methods [5][6][7][8] deal with restricted chemical graphs. These methods are mainly based on the branching algorithm paradigm; the required chemical compounds appear at the leaves of a computation tree. However, these algorithms generate many invalid intermediate structures that appear at the non-leaf nodes of the computation tree [9]. Because of this, these methods are inefficient at generating chemical compounds with more than 20 non-hydrogen atoms. Thus, it is natural to explore and develop methods that can enumerate chemical compounds without generating invalid intermediate structures.
Jin et al. [9] proposed one such chemical compound generation method based on the junction tree and the variational autoencoder.
For a chemical compound C, the polymer topology is the connected multi-graph, in which all vertices have degree at least three, obtained by iteratively removing all vertices of degree at most two from C [19]. For example, the polymer topologies of the chemical compounds remdesivir $\mathrm{C_{27}H_{35}N_6O_8P}$ (Figure 1a) and dexamethasone $\mathrm{C_{22}H_{29}FO_5}$ (Figure 1b) are illustrated in Figure 1c,d, respectively. Observe that two different chemical compounds can have the same polymer topology, and the categorization of polymer topologies can play an important role in understanding and studying the synthetic pathways of macro-chemical compounds [20]. Polymer topologies P are often classified with respect to their cycle rank, which is the number of edges that must be removed to obtain a spanning tree of P. Haruna et al. [19] developed an enumeration method to generate all polymer topologies with a given rank by using frontier-based search and zero-suppressed decision diagrams. As a result, they enumerated all polymer topologies with cycle rank at most 6. For a multi-graph G, we define the skeleton to be the simple graph obtained by removing all self-loops and multiple edges from G. Notice that the class of graphs with a tree skeleton, ∆ ≥ 0 self-loops, and no multiple edges contains all tree-like polymer topologies with cycle rank ∆, and therefore, it is an interesting problem to enumerate all such graphs. Figure 2 illustrates examples of chemical compounds that have tree-like polymer topologies with self-loops and no multiple edges. Recently, Azam et al. [21] proposed a method to count all trees with given numbers of vertices and self-loops by using dynamic programming. As a result, they gave an upper bound and a lower bound on the number of mutually non-isomorphic tree-like polymer topologies with a given rank.
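The cycle rank described above can be illustrated with a minimal Python sketch. It assumes the graph is connected and is given as an edge list in which a self-loop appears as a pair (v, v); the function name `cycle_rank` and this input format are illustrative choices, not part of the paper. Since a spanning tree of a connected graph on |V| vertices has |V| − 1 edges, the cycle rank is |E| − |V| + 1.

```python
def cycle_rank(num_vertices, edges):
    """Cycle rank of a connected multi-graph: |E| - |V| + 1.

    Assumption: the graph is connected (this is not checked).
    Self-loops are pairs (v, v); multiple edges are repeated pairs.
    """
    return len(edges) - num_vertices + 1


# Hypothetical example: a path on 3 vertices (a tree skeleton)
# with 2 self-loops attached to the middle vertex.
edges = [(0, 1), (1, 2), (1, 1), (1, 1)]
rank = cycle_rank(3, edges)
```

For a graph with a tree skeleton on n vertices and ∆ self-loops, the edge count is (n − 1) + ∆, so the formula gives exactly ∆, in line with the statement that such graphs have cycle rank ∆.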
This article aims to develop an efficient method to enumerate all mutually non-isomorphic graphs with a tree skeleton, n vertices, ∆ self-loops, and no multiple edges without generating invalid intermediate structures. The idea of our method is to define a canonical representation of rooted graphs with the said structures and then enumerate these graphs by generating their canonical representations. As a consequence of our method, we can get all polymer topologies with a tree skeleton, a given cycle rank, self-loops, and no multiple edges. We organize the paper as follows: In Section 2, we discuss some preliminaries. In Section 3, we first prove the mathematical properties based on which we develop our enumeration method. We discuss experimental results and an application of our enumeration method to generate all polymer topologies with a tree skeleton and a given cycle rank in Section 4. We conclude and discuss some future directions in Section 5.

Preliminaries
For a graph G, let V(G) denote the vertex set and E(G) denote the edge set. Let n(G) denote |V(G)| and self(G) denote the number of self-loops in G. We define size s(G) of graph G to be the sequence (n(G), self(G)). Let s(v) denote the number of self-loops on v ∈ V(G).
For a multi-graph G, we define the skeleton γ(G) of G to be the simple graph obtained by removing all self-loops and multiple edges from the graph G. For a rooted graph G with root $r_G$, we define the rooted skeleton γ(G) of G to be the rooted simple graph with root $r_G$ obtained by removing all self-loops from G.
Let n ≥ 1 and ∆ ≥ 0 be two integers. We denote by H(n, ∆) a maximal set of mutually non-isomorphic rooted graphs with a tree skeleton, n vertices, and ∆ self-loops. Let H be a rooted graph in H(n, ∆). For a vertex v ∈ V(H), let $H_v$ denote the subgraph of H rooted at v induced by v and its descendants in the rooted skeleton γ(H). For a neighbor $v \in N_H(r_H)$ of the root $r_H$ of H, we call the subgraph $H_v$ a root-subgraph of H.
Let n ≥ 1 and ∆ ≥ 0 be two integers and H be a rooted graph in H(n, ∆). An ordered graph (H, π) of H is defined to be the rooted graph H with a left-to-right ordering π on the children of each vertex of the rooted skeleton γ(H). Let K = (H, π) be an ordered graph of H. For a vertex v ∈ V(K), we define the ordered subgraph $K_v$ of K to be the subgraph of K rooted at v induced by v and its descendants in the rooted skeleton γ(K), preserving the ordering π on the children of each vertex in $\gamma(K_v)$. For a vertex $v \in N_K(r_K)$, we call the ordered subgraph $K_v$ an ordered root-subgraph of K.
For an ordered tree, we discuss two vertex orderings: depth first search (DFS) ordering [22] and sibling-depth first search (SDFS) ordering. In DFS ordering, we index the vertices of a given ordered tree starting from the root and visiting them from left to right. Masui et al. [23] introduced the SDFS ordering for simple ordered trees. For an ordered tree T = (L, π) with n vertices and a left-to-right ordering π, the SDFS ordering is defined to be a vertex ordering obtained by indexing the vertices from the set {1, 2, . . . , n} such that: (i) the root has index one; (ii) all siblings are indexed consecutively according to the left-to-right ordering π; and (iii) all descendants of a vertex v are indexed consecutively with indices larger than that of v and smaller than the indices of the descendants of any vertex u, which is not a descendant of v with index larger than v.
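The difference between the two orderings can be sketched in Python. This is an illustrative reading of conditions (i)-(iii), not code from the paper: the ordered tree is assumed to be a dictionary mapping each vertex to the left-to-right list of its children, and the names `dfs_order` and `sdfs_order` are hypothetical.

```python
def dfs_order(children, root):
    """Preorder DFS, visiting children from left to right."""
    order = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        # Push children reversed so the leftmost child is popped first.
        stack.extend(reversed(children.get(v, [])))
    return order


def sdfs_order(children, root):
    """SDFS: index all children of v consecutively, then handle the
    descendants of each child, child by child from left to right."""
    order = [root]

    def visit(v):
        kids = children.get(v, [])
        order.extend(kids)   # siblings receive consecutive indices
        for c in kids:       # then the descendants of each sibling
            visit(c)

    visit(root)
    return order


# Hypothetical ordered tree: r has children a, b; a has c, d; c has f; b has e.
children = {"r": ["a", "b"], "a": ["c", "d"], "c": ["f"], "b": ["e"]}
dfs = dfs_order(children, "r")
sdfs = sdfs_order(children, "r")
```

In the SDFS result the siblings a, b are adjacent, and the descendants c, d, f of a occupy a consecutive block that precedes the descendant e of b.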
Examples of an ordered tree and its vertex indexing in DFS and SDFS ordering are illustrated in Figure 3.
Let $A = (a_1, a_2, \ldots, a_n)$ and $B = (b_1, b_2, \ldots, b_m)$ be two sequences over integers. We say that the sequence A is lexicographically smaller than the sequence B, written $A \prec B$, if there exists an integer $\ell$, $1 \le \ell \le \min\{n, m\}$, such that for each integer i, $1 \le i \le \ell$, it holds that $a_i = b_i$ and: (i) either $\ell = n$ with $n < m$ or (ii) $\ell < \min\{n, m\}$ with $a_{\ell+1} < b_{\ell+1}$. In such a case, we say that the sequence B is lexicographically greater than the sequence A, written $B \succ A$. We define the concatenation $A \oplus B$ of the sequences A and B to be the sequence $(a_1, \ldots, a_n, b_1, \ldots, b_m)$.

Enumeration of Graphs with a Tree Skeleton and a Given Number of Vertices and Self-Loops
For two integers n ≥ 1 and ∆ ≥ 0, the aim of this section is to present a method to generate all rooted graphs in H(n, ∆). The idea of our enumeration method is to generate a rooted graph H ∈ H(n, ∆) by generating a canonical ordered graph of H. To achieve this, we define a canonical ordered graph of a rooted graph H ∈ H(n, ∆) and represent it as a sequence by using its ordered subgraphs. Finally, we generate the canonical ordered graph of a rooted graph from this sequence representation.
We next present a canonical representation of rooted graphs in H(n, ∆) based on a generalization of the canonical representation of simple rooted trees with n vertices introduced by Masui et al. [23].
Recall that H(n, 0) denotes a maximal set of mutually non-isomorphic simple rooted trees with n vertices. Further, note that for ∆ ≥ 1, it is necessary for a canonical representation of a rooted graph H ∈ H(n, ∆) to contain the information of vertices and self-loops in H.
Let H be a rooted graph in H(n, ∆) and $r_H$ denote its root. Further, let K = (H, π) be an ordered graph of H with a left-to-right ordering π. For an integer i ∈ [1, n] and the i-th vertex $v_i$ of K following the SDFS ordering on the rooted skeleton γ(K), let $K(i)$ denote the ordered subgraph $K_{v_i}$ of K for convenience.
We introduce a sequence representation of K by using the information of the number of vertices and self-loops in the ordered subgraphs of K. For the vertices $\{v_1, v_2, \ldots, v_n\}$ of K indexed by SDFS ordering on the rooted skeleton γ(K), we define the sequence representation SR(K) of K to be the sequence of the sizes of the ordered subgraphs $K(i)$, i ∈ [2, n], of K:
$$SR(K) = \big(s(K(2)), s(K(3)), \ldots, s(K(n))\big).$$
Examples of a rooted graph H ∈ H(11, 3), an ordered graph K = (H, π) of H with a left-to-right ordering π and vertices indexed in SDFS ordering, and the sequence representation SR(K) of K are illustrated in Figure 4a-c.
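A short Python sketch may help make the sequence representation concrete. It assumes an ordered graph stored as a `children` dictionary (left-to-right child lists of the rooted skeleton) and a `loops` dictionary (self-loops per vertex); these structures and the function names are illustrative assumptions, not the paper's data structures.

```python
def subtree_sizes(children, loops, root):
    """Map each vertex v to s(K_v) = (vertices, self-loops) of the
    ordered subgraph rooted at v."""
    size = {}

    def walk(v):
        n, s = 1, loops.get(v, 0)
        for c in children.get(v, []):
            cn, cs = walk(c)
            n, s = n + cn, s + cs
        size[v] = (n, s)
        return size[v]

    walk(root)
    return size


def sequence_representation(children, loops, root):
    """SR(K): sizes of K(2), ..., K(n), with vertices in SDFS order."""
    size = subtree_sizes(children, loops, root)
    seq = []

    def visit(v):
        kids = children.get(v, [])
        seq.extend(size[c] for c in kids)  # sizes of all siblings first
        for c in kids:                     # then each child's descendants
            visit(c)

    visit(root)
    return seq


# Hypothetical ordered graph with 4 vertices and 3 self-loops.
children = {"r": ["a", "b"], "a": ["c"]}
loops = {"r": 1, "c": 2}
sr = sequence_representation(children, loops, "r")
```

Here the root-subgraphs have sizes (2, 2) and (1, 0), followed by the single entry (1, 2) for the leaf c, so the self-loop on the root itself is only implied by the total ∆ = 3.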
The next lemma states that the sequence representation of an ordered graph K is a concatenation of: (i) a sequence of the size of the root-subgraphs of K in the left-to-right ordering and (ii) the sequence representation of all root-subgraphs of K following the left-to-right ordering.
Theorem 1.
(i) Sequence M is the sequence representation SR(K) of some ordered graph K with n vertices and ∆ self-loops if and only if M is (n, ∆)-admissible.
(ii) Whether M is admissible or not can be tested in O(n) time.
(iii) When M is (n, ∆)-admissible, the ordered graph K with SR(K) = M can be constructed in O(n) time.
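The test and construction in (ii) and (iii) can be sketched in Python under an explicit assumption: the definition of admissibility is not reproduced here, so the sketch assumes the recursive reading suggested by the proof, namely that the empty sequence is (1, ∆)-admissible and that, for n ≥ 2, a sequence M of length n − 1 is (n, ∆)-admissible when there is a d with a_1 + ... + a_d = n − 1 and b_1 + ... + b_d ≤ ∆ such that each block M_i is (a_i, b_i)-admissible. The function name and the output format are hypothetical.

```python
def build_ordered_graph(M, n, delta):
    """Decode M into (children, loops) with vertices 0..n-1 in SDFS order.

    Returns None when M is not (n, delta)-admissible under the ASSUMED
    recursive definition described in the lead-in.
    """
    if n < 1 or delta < 0 or len(M) != n - 1:
        return None
    children, loops = {}, {}

    def build(v, start, n_v, d_v):
        # Decode M[start : start + n_v - 1] as the subgraph rooted at v.
        total_a = total_b = 0
        sizes = []
        j = start
        while total_a < n_v - 1:          # scan the root-subgraph sizes
            a, b = M[j]
            if a < 1 or b < 0:
                return False
            total_a, total_b = total_a + a, total_b + b
            sizes.append((a, b))
            j += 1
        if total_a != n_v - 1 or total_b > d_v:
            return False
        # Entry M[k] describes vertex k + 1, so the children are consecutive.
        children[v] = list(range(start + 1, start + 1 + len(sizes)))
        loops[v] = d_v - total_b          # remaining loops sit on v itself
        pos = j                           # first entry of the first block
        for child, (a, b) in zip(children[v], sizes):
            if not build(child, pos, a, b):
                return False
            pos += a - 1                  # each block has a - 1 entries
        return True

    return (children, loops) if build(0, 0, n, delta) else None


decoded = build_ordered_graph([(2, 2), (1, 0), (1, 2)], 4, 3)
rejected = build_ordered_graph([(2, 0), (2, 0), (1, 0)], 4, 0)
```

Each entry of M is read a constant number of times, which matches the linear bounds claimed in (ii) and (iii); the recursion depth grows with the height of the tree, so an iterative version would be preferable for large n.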

Proof. For sequence M with an integer
For an ordered graph K and integer $i \in [1, \deg_K(r_K)]$, let $K_i$ denote the i-th root-subgraph of K in the left-to-right ordering.
(i) The if-part: Suppose that M = SR(K) for some ordered graph K with n vertices and ∆ self-loops. If n = 1, then M is (1, ∆)-admissible by the definition of admissibility. Let us assume that n ≥ 2. Then, for Thus, by recursively using Lemma 1 for
The only-if part: We prove the converse of (i) by induction on n.
For n = 1, M is (1, ∆)-admissible by the definition of admissibility. Note that M is an empty sequence in this case. Let K be an ordered graph with n = 1 vertices and ∆ self-loops. Then, SR(K) is an empty sequence, and hence, M = SR(K).
Suppose that the converse of (i) holds for all positive integers up to $\ell$. We show that the converse holds for the integer $\ell + 1$. Let M be $(\ell + 1, \Delta)$-admissible. Then, by the definition of admissibility, there exists an integer $d \in [1, \ell]$ such that $\ell = \sum_{1 \le i \le d} a_i$ and $\Delta \ge \sum_{1 \le i \le d} b_i$. This implies that, for each integer $i \in [1, d]$, the sequence $M_i$ is $(a_i, b_i)$-admissible by the admissibility of M. This and the inductive hypothesis imply that for each integer $i \in [1, d]$, there exists an ordered graph H with $a_i$ vertices and $b_i$ self-loops such that $M_i = SR(H)$. Let K denote the ordered graph with $\ell + 1$ vertices, ∆ self-loops, $\deg_K(r_K) = d$, and $\Delta - \sum_{1 \le i \le d} b_i$ self-loops on the root $r_K$, whose i-th root-subgraph $K_i$ is the ordered graph H such that $M_i = SR(H)$. Then, it immediately follows that M = SR(K) holds, since $((a_1, b_1), \ldots, (a_d, b_d)) = (s(K_1), \ldots, s(K_d))$, showing that the converse holds for the integer $\ell + 1$.
Hence, by mathematical induction, the converse of (i) holds for any integer n ≥ 1.
(ii) We prove this result by induction on n.
For n = 1, the sequence M is an empty sequence and is (1, ∆)-admissible by the definition of admissibility. Therefore, it takes constant O(1) time to test admissibility in this case.
Suppose that for all $n \le \ell$, $\ell \ge 1$, the admissibility of a sequence M can be tested in O(n) time. We show that the statement (ii) holds for $n = \ell + 1$. To show that M is $(\ell + 1, \Delta)$-admissible, we need to find an integer $d \in [1, \ell]$ such that $\ell = \sum_{1 \le i \le d} a_i$ and $\Delta \ge \sum_{1 \le i \le d} b_i$. This and the inductive hypothesis imply that for each integer $i \in [1, d]$, the admissibility of the sequence $M_i$ can be tested in $O(a_i)$ time. Thus, the total time to test the admissibility of M is $O(\ell + 1)$, since $d \le \ell$ and $\sum_{1 \le i \le d} a_i = \ell$. Hence, by mathematical induction, the admissibility of a sequence M of length n − 1, n ≥ 1, can be tested in O(n) time.
(iii) We prove the claim in (iii) by induction on n.
For n = 1, M is (1, ∆)-admissible and is the sequence representation of the ordered graph K with only one vertex and ∆ self-loops. This implies that K can be constructed in O(1) time.
Suppose that the statement (iii) holds for all $n \le \ell$, $\ell \ge 1$. We show that the statement (iii) holds for $n = \ell + 1$. Let M be an $(\ell + 1, \Delta)$-admissible sequence. Then, there exists an integer $d \in [1, \ell]$ such that $\ell = \sum_{1 \le i \le d} a_i$ and $\Delta \ge \sum_{1 \le i \le d} b_i$. By (i), there exists an ordered graph K with n vertices and ∆ self-loops such that SR(K) = M. Thus, it holds that $\deg_K(r_K) = d$. Further, by Lemma 1, for each integer $i \in [1, d]$, the i-th root-subgraph $K_i$ of K has $a_i$ vertices and $b_i$ self-loops. Observe that such an integer d exists uniquely due to the admissibility of M. This implies that $\deg_K(r_K)$ can be obtained in O(d) time.
By SR(K) = M and Lemma 1, for each integer $i \in [1, d]$, it holds that $SR(K_i) = M_i$. Thus, by the inductive hypothesis, for each integer $i \in [1, d]$, the subgraph $K_i$ can be constructed from $M_i$ in $O(a_i)$ time. Hence, by mathematical induction, for integers n ≥ 1 and ∆ ≥ 0 and an (n, ∆)-admissible sequence M, the ordered graph K with SR(K) = M can be constructed from M in O(n) time.
Let $M = ((a_1, b_1), (a_2, b_2), \ldots, (a_{n-1}, b_{n-1}))$ be an (n, ∆)-admissible sequence. Then, there exists an integer d ∈ [1, n − 1] such that $n - 1 = \sum_{1 \le i \le d} a_i$. Furthermore, by Theorem 1(i), there exists an ordered graph K with n vertices and ∆ self-loops such that $\deg_K(r_K) = d$ and SR(K) = M. Note that such an integer d and ordered graph K are unique. We call the integer d the root-degree of M.
By Theorem 1(i), it follows that an ordered graph K with n ≥ 1 vertices and ∆ ≥ 0 self-loops is completely determined by SR(K). Thus, we define a canonical representation of a rooted graph H ∈ H(n, ∆) as follows. For a rooted graph H ∈ H(n, ∆), we define the canonical representation to be the (n, ∆)-admissible sequence M that is lexicographically maximum among all (n, ∆)-admissible sequences that are the sequence representation of an ordered graph of H.
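The canonical representation can be computed for small examples with the following Python sketch. It rests on an assumption that is not spelled out in the surviving text: sorting the root-subgraphs of every vertex in non-increasing order of the pair (size, sequence) yields the lexicographically maximum sequence. The sketch checks this against a brute-force search over all child orderings; all function names and the input format are hypothetical, and pairs are compared lexicographically via Python tuple comparison.

```python
from itertools import permutations, product


def _encode(children, loops, v, sort_children):
    """Return (s(K_v), SR(K_v)) for the subgraph rooted at v."""
    parts = [_encode(children, loops, c, sort_children)
             for c in children.get(v, [])]
    if sort_children:
        parts.sort(reverse=True)  # largest (size, sequence) first
    n = 1 + sum(size[0] for size, _ in parts)
    s = loops.get(v, 0) + sum(size[1] for size, _ in parts)
    # Sizes of the root-subgraphs, then their sequences, left to right.
    seq = tuple(size for size, _ in parts)
    seq += tuple(e for _, sub in parts for e in sub)
    return (n, s), seq


def canonical_representation(children, loops, root):
    """Assumed greedy rule: sort children recursively, then encode."""
    return _encode(children, loops, root, sort_children=True)[1]


def brute_force_canonical(children, loops, root):
    """Maximum SR over every left-to-right ordering (small inputs only)."""
    verts = [v for v in children if children[v]]
    candidates = []
    for choice in product(*(permutations(children[v]) for v in verts)):
        ordered = {v: list(p) for v, p in zip(verts, choice)}
        candidates.append(_encode(ordered, loops, root, False)[1])
    return max(candidates)


# Hypothetical rooted graph with 6 vertices and 2 self-loops.
children = {"r": ["a", "b", "c"], "a": ["d"], "b": ["e"]}
loops = {"a": 1, "e": 1}
canon = canonical_representation(children, loops, "r")
brute = brute_force_canonical(children, loops, "r")
```

In this example the subgraphs rooted at a and b both have size (2, 1), so the tie is broken by their sequences, and b (whose leaf carries the self-loop) is placed first.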
In Figure 4d, we show the canonical representation M of the rooted graph H ∈ H(11, 3) illustrated in Figure 4a. Further, we show the ordered graph L such that SR(L) = M.
To generate all rooted graphs in H(n, ∆), it is enough to generate the canonical representation of each rooted graph H ∈ H(n, ∆) by Theorem 1(i). For two integers n ≥ 1 and ∆ ≥ 0, let M(n, ∆) denote the set of all (n, ∆)-admissible sequences that are canonical representations of graphs in H(n, ∆). Note that the empty sequence is the only sequence in M(1, ∆). In the next lemma, we give a characterization of sequences in M(n, ∆).

Lemma 2.
Let n ≥ 2 and ∆ ≥ 0 be two integers. Let $M = ((a_1, b_1), (a_2, b_2), \ldots, (a_{n-1}, b_{n-1}))$ be a sequence of integer pairs with an integer d ∈ [1, n − 1] such that $n - 1 = \sum_{1 \le i \le d} a_i$. For an integer i ∈ [1, d], let $M(i)$ denote the subsequence of M consisting of $a_i - 1$ consecutive entries starting from $d + \sum_{1 \le h \le i-1} a_h - (i - 1) + 1$. Then, M ∈ M(n, ∆) if and only if the following hold:
The only-if part: Let M satisfy (i), (ii), and (iii). We show that M ∈ M(n, ∆). To prove this, we show that M is a canonical representation of some graph in H(n, ∆).

By (i) and (ii), we have
By Theorem 1(i), there exists a unique ordered graph K = (H, π) such that SR(K) = M for some H ∈ H(n, ∆). This implies that $\deg_K(r_K) = d$, and for each integer i ∈ [1, d], the i-th root-subgraph of K has $a_i$ vertices and $b_i$ self-loops. This implies that any ordered graph L that is rooted isomorphic to K has x vertices and y self-loops such that $(x, y) = (a_i, b_i)$ for some i ∈ [1, d].
We next give the structure of the sequences that are lexicographically minimum and maximum among all sequences in M(n, ∆).
Proof. (i) It is easy to observe that the sequence N is (n, ∆)-admissible. Furthermore, for two integers i ≥ 2 and j ≥ 0, the ranges of the first and the second entries in any sequence in M(i, j) are [1, i − 1] and [0, j], respectively. This implies that the sequence N is lexicographically minimum among all the sequences in M(n, ∆). Moreover, the sequence ((1, ∆)) is admissible and lexicographically maximum among all the sequences in M(2, ∆). From this, it follows that the sequence ((2, ∆), (1, ∆)) is admissible and lexicographically maximum among all the sequences in M(3, ∆). Thus, by using this inductive argument, we can conclude that the sequence M is admissible and lexicographically maximum among all the sequences in M(n, ∆).
(ii) We know that a sequence S in M(n, ∆) is of length n − 1, and therefore, by using a for-loop of size n − 1 and (i), we can test if S is lexicographically minimum or maximum among all sequences in M(n, ∆) in O(n) time.
When the conditions at Line 6 hold, then k and h can be computed in O(n) time, since k ≤ d and h ≤ d + 1. This implies that ((x_1, y_1)
Note that we can generate all rooted graphs in H(n, ∆) by generating their canonical representations, repeatedly applying Algorithm 1 starting from the lexicographically maximum sequence L(n, ∆) in M(n, ∆), in O(n) time per graph and O(n) space in total.
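For cross-checking small cases, the set M(n, ∆) can be generated by a simple memoized recursion. The Python sketch below is explicitly not Algorithm 1: it does not achieve O(n) time per graph or O(n) total space, and it assumes that a canonical sequence lists the root-subgraphs in non-increasing order of the pair (size, canonical sequence). The name `canon_seqs` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def canon_seqs(n, delta):
    """All canonical sequences for rooted graphs with a tree skeleton,
    n vertices, and delta self-loops (brute force, small inputs only)."""
    if n == 1:
        return ((),)  # the empty sequence; all loops sit on the root
    results = []

    def extend(rem_n, rem_d, bound, chosen):
        # chosen: (size, sequence) keys of root-subgraphs, non-increasing.
        if rem_n == 0:
            # Any loops left over (rem_d) are placed on the root itself.
            sizes = tuple(size for size, _ in chosen)
            body = tuple(e for _, seq in chosen for e in seq)
            results.append(sizes + body)
            return
        for a in range(rem_n, 0, -1):
            for b in range(rem_d, -1, -1):
                for seq in canon_seqs(a, b):
                    key = ((a, b), seq)
                    if bound is None or key <= bound:
                        extend(rem_n - a, rem_d - b, key, chosen + [key])

    extend(n - 1, delta, None, [])
    return tuple(results)


all_4_2 = canon_seqs(4, 2)
largest = max(all_4_2)
smallest = min(all_4_2)
```

For ∆ = 0 the counts reduce to the numbers of rooted trees (1, 1, 2, 4, 9 for n = 1, ..., 5), which gives a quick sanity check. For n = 4 and ∆ = 2, the largest sequence found is ((3, 2), (2, 2), (1, 2)), which follows the pattern of the maximum sequences ((1, ∆)) and ((2, ∆), (1, ∆)) discussed in the proof above.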
Theorem 3. Let n ≥ 2 and ∆ ≥ 0 be two integers. Then, all mutually non-isomorphic graphs with n vertices, ∆ self-loops, and a tree skeleton can be generated in O(n) time per graph and O(n) space in total.

Proof.
A tree can be viewed as a rooted tree by considering its centroid as the root [24]. We know that when n is odd, there are only trees with unicentroids; however, when n is even, there are trees with unicentroids and trees with bicentroids.
Hence, we can generate all non-isomorphic graphs with n vertices, ∆ self-loops, and a tree skeleton with a unicentroid or bicentroid in O(n) time per graph and O(n) space in total, which completes the proof.
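The centroid-based rooting used in the proof can be illustrated with a short Python sketch. It assumes the tree skeleton is given as an adjacency dictionary without self-loops and uses the standard characterization of a centroid as a vertex whose removal leaves components with at most n/2 vertices; the function name is hypothetical.

```python
def centroids(adj):
    """Return the one or two centroids of a tree given as {v: [neighbors]}."""
    n = len(adj)
    root = next(iter(adj))
    parent = {root: None}
    order = [root]
    for v in order:                # BFS; the list grows while we iterate
        for u in adj[v]:
            if u not in parent:
                parent[u] = v
                order.append(u)
    size = {v: 1 for v in adj}
    for v in reversed(order):      # accumulate subtree sizes bottom-up
        if parent[v] is not None:
            size[parent[v]] += size[v]
    result = []
    for v in adj:
        heaviest = n - size[v]     # component containing v's parent
        for u in adj[v]:
            if parent.get(u) == v:  # u is a child of v
                heaviest = max(heaviest, size[u])
        if 2 * heaviest <= n:
            result.append(v)
    return result


# Hypothetical examples: paths on 4 and 5 vertices.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
bi = centroids(path4)
uni = centroids(path5)
```

The path on four vertices has a bicentroid (its two middle vertices), while the path on five vertices has a unicentroid, matching the parity remark in the proof.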

Conclusions
We proposed an efficient method to enumerate all mutually non-isomorphic graphs with a tree skeleton and given numbers of vertices and self-loops. The idea of this method is to generate rooted graphs with n vertices and ∆ self-loops by generating their canonical representations. We defined the canonical representation of a rooted graph H with n vertices and ∆ self-loops based on the ordered graphs of H. The proposed method generates all graphs with a tree skeleton, n vertices, and ∆ self-loops in O(n) time per graph and O(n) space in total. As an application, we can generate all polymer topologies with a tree skeleton, self-loops, no multiple edges, and a given cycle rank.
An interesting future research direction is to design a method that can directly count and enumerate all mutually non-isomorphic polymer topologies with a given cycle rank.