Symmetry in Sphere-based Assembly Configuration Spaces

Many remarkably robust, rapid and spontaneous self-assembly phenomena in nature can be modeled geometrically starting from a collection of rigid bunches of spheres. This paper highlights the role of symmetry in sphere-based assembly processes. Since spheres within bunches could be identical and bunches could be identical as well, the underlying symmetry groups could be of large order that grows with the number of participating spheres and bunches. Thus, understanding symmetries and the associated isomorphism classes of microstates that correspond to various types of macrostates can significantly reduce the complexity of computing entropy and free energy, as well as paths and kinetics, in high-dimensional configuration spaces. In addition, a precise understanding of symmetries is crucial for giving provable guarantees of algorithmic accuracy and efficiency in such computations. In particular, this may aid in predicting crucial assembly-driving interactions. This is a primarily expository paper that develops a novel, original framework for dealing with symmetries in configuration spaces of assembling spheres with the following goals. (1) We give new, formal definitions of various concepts relevant to sphere-based assembly that occur in previous work, and in turn, formal definitions of their relevant symmetry groups leading to the main theorem concerning their symmetries. These previously developed concepts include, for example, (a) assembly configuration spaces, (b) stratification of assembly configuration space into regions defined by active constraint graphs, (c) paths through the configurational regions, and (d) coarse assembly pathways. (2) We demonstrate the new symmetry concepts to compute sizes and numbers of orbits in two example settings appearing in previous work. (3) We give formal statements of a variety of open problems and challenges using the new conceptual definitions.


Motivation
Supramolecular assembly is prevalent in nature, health-care and engineering, but poorly understood. The assembly starts with identical copies of structures drawn from a small number of types. Modeling these starting structures as rigid-bunches-of-spheres is well-suited to assembly processes driven by so-called short-range or hard sphere interaction potentials.
More formally, an input to a computational model of an assembly process is an assembly system consisting of the following.
• A collection of k rigid molecular components belonging to a few types; a rigid component is specified as the set of positions of the centers of its constituent atoms, in a local coordinate system. In many cases, an atom could be the representation for the average position of a collection of atoms in an amino acid residue. Note that an assembly configuration is given by the positions and orientations of the entire set of k rigid molecular components in an assembly system, relative to one fixed component. Since each rigid molecular component has six degrees of freedom, a configuration is a point in 6(k − 1)-dimensional Euclidean space.
• The pairwise component of the potential energy function of the assembly system, specified as a sum of potential energy terms between pairs of constituent atoms i and j in two different rigid components of the assembly system. The weak interaction between the rigid molecular components is captured by this potential energy function. The pairwise potential energy terms are, in turn, specified using pairwise potential energy functions similar to so-called Lennard-Jones potentials and Morse potentials [22]. The potential energy is a function of the distance d_{i,j} between i and j.
• A non-pairwise component of the potential energy function in the form of global potential energy terms that capture the tethers between the rigid components within a monomer, as well as other global potential energy terms that implicitly represent the solvent (water or lipid bilayer membrane) effect [47,48,36]. These are independent of particular pairs of atoms.
It is important to note that all the above potential energy terms are functions of the assembly configuration.
The formal conceptual framework we develop here is inspired by the following types of prediction questions.
• Input: the 3D descriptions of the rigid molecular components and their interactions (Section 2 describes how they are formally specified). Output: prediction of the final assembly structures and their likelihood.
• Input: as in the previous item, plus a 3D configuration of final assembled structure. Output: prediction of those interactions that are crucial for the assembly process to terminate in the given input assembly configuration.
• Input: as in the previous item. Output: prediction of minimal alterations of the building blocks or interactions that would significantly increase likelihood of the assembly process terminating in the given input assembly configuration.
• Input: as in the previous item, additionally more than one choice of final assembly configuration. Output: prediction of key events such as specific intermediate subassembly configuration choices during assembly that determine which one of the final assembly configuration is more likely to result.
Experimentally, in vitro or in vivo, these types of predictions about supramolecular assembly processes are difficult because of the remarkable rapidity, spontaneity and robustness of assembly processes. The prediction tasks highlight combinatorial explosion and thus the insufficiency of experimentation (trying various possibilities) and guesswork, even with the help of known data on similar assemblies and biological knowledge about evolutionarily conserved structures. In addition, many of the current experimental methods are labor- and resource-intensive, making blind alleys expensive in time and effort.
On the other hand, computer simulations guided by theoretical first principles and standard paradigms such as Monte Carlo (MC) or Molecular Dynamics (MD) are limited for the reasons detailed in the next subsections.

Assembly Configurational Volume
Stability and binding affinity of subassemblies depend on free energy, whose landscape in the case of assembly is heavily influenced by configurational entropy (volume measure of microstates corresponding to a macrostate; see [39]); this depends on accurate computation of configurational volumes by sampling, attempted by a long and distinguished series of methods [39,3,28,29,27,43,26,63,44]. Assembly configuration spaces are high-dimensional, and the number of required samples is typically exponential in the dimension. Sampling on a high-dimensional ambient space grid typically means computing a large proportion of samples that lie outside any region of interest, which is effectively of lower dimension, and these samples must be discarded. Not only are the relevant regions in the case of short-ranged potentials of effectively lower dimension, they are also geometrically and topologically complex. Hence grid-based sampling in Cartesian space, as well as nonergodic methods like MC or MD, must generate impractically dense samples to accurately reflect the volume/measure ratios of these important, relatively low-volume regions having complex geometry and topology. These methods do not exploit the abundance of symmetries of the landscape. They are used both for assembly processes, whose feasible regions are defined by one-sided pairwise distance equalities and inequalities between atom-centers, and folding processes, where the feasible regions are defined by pairwise distance equalities. The difference of complexity between the two is a litmus test for the limitations that are addressed by the Cayley configuration space approach taken by EASAL described in Section 1.5.
Conventional methods to compute the energy landscape of small clusters are based on searching for local minima [74,21,22]. Point group symmetrisation schemes [54,77,53] and local rigidification schemes [45,62] have been exploited in global optimisation algorithms to gain computational efficiency.
Because of the complexity of the problem of dealing with the short range of interaction of hard spheres leading to narrow regions of lower potential energy, separated by vast flat parts, conventional local-minima based methods for energy landscape computation [74] are limited. These methods have the additional disadvantage of small perturbations to energy values requiring complete recomputation and also they do not deal well with the very flat landscape that is the signature of short-range potentials.
An alternative approach for short-range potentials is to consider the "sticky sphere limit" based on taking the limit as the range of interaction goes to zero [6,70,51]. In this limit, the energy landscape reduces to a collection of manifolds of different dimensions, glued together at their boundaries (formally, a Thom-Whitney stratification of real semi-algebraic sets), as described in theoretical models proposed independently and separately by Holmes-Cerfon et al. [33] in 2013 and by the first author's research group [55,56] in 2011.
The background provided in the remainder of this section recalls previously developed concepts for describing assembly configuration spaces. This motivates the conceptual framework for symmetry in assembly under short-ranged potentials given in Section 2.

Kinetics, Topology and Geometric complexity
Kinetics and transition rates between subassemblies also require an explicit understanding of the geometry, topology and multiple paths in the assembly configuration space. For cluster assemblies from spheres, there are a number of methods [4,76,8,17,42,35,34] to compute the entire configuration space of small molecules such as cyclo-octane [49,37,58]. Some methods from robotics and computational geometry [63], such as the probabilistic roadmap [2], effectively give bounds to approximate free energy without relying on MC or MD sampling. Starting from MC and MD samples, recent heuristic methods infer topology [23,72,46,59], and use topology to guide dimensionality reduction [79]. Yet, most prevailing methods are unable to extract the topology in a sufficiently efficient and accurate manner as to be able to feasibly compute volume or path integrals (required for entropy or kinetics computations) even for small assemblies. Moreover even those prevailing methods that exploit symmetry in the configuration space to compute free energy and kinetics do not employ a formal and precise group-theoretic framework.

Recursive decomposition, Assembly trees, Combinatorial entropy
For larger, microscale assemblies, a direct study of the free energy and configurational entropy is emphatically computationally intractable. At these coarser scales, the primitives are stable subassemblies and transition rates (obtained from the computational tasks of the previous two subsections). Still, the combinatorial entropy of multiple pathways makes it difficult to isolate crucial combinations of assembly-driving interface interactions. This issue has been addressed by the first author's previous work on recursive decompositions [30,31,32] of larger assemblies into smaller subassemblies. This work introduces structures called assembly trees and the notion of combinatorial entropy, applied to model viral capsid assembly in [67].
While trees of various types have been used to model various processes related to assembly [18,75], to the best of our knowledge, the assembly trees from [67] have a formal structure that is distinct from other tree representations of assembly pathways. In particular, non-root nodes of the assembly tree contain subassemblies, rather than configurations of the entire assembly system; any pair of nodes that are incomparable (neither is an ancestor of the other in the tree) are disjoint subassemblies, i.e., they do not contain any common rigid components; moreover, only rigid subassembly configurations are represented. In addition, the authors have taken the first steps towards precisely formalizing the effect of symmetries on a highly simplified version of assembly trees; specifically their orbits under the action of a fixed group of symmetries, called assembly pathways [13]. These concepts will be discussed in detail in Sections 2 and 3.

Symmetry in Chemistry
Since spheres within rigid bunches of an assembly system could be identical and bunches could be identical as well, the underlying symmetry groups could be of large order, that grows with the number of participating spheres and bunches. Therefore, all of the tasks in the previous three subsections can be significantly simplified by taking advantage of natural symmetries of the configuration space that arise due to identical assembling units, their symmetries, and symmetries of the final assembled structure. However, none of the prevailing methods discussed above computationally incorporates these symmetries. Group theory has been used to study the symmetry of molecules and molecular orbits [16,20,14,41] for a long time. The well-known Pólya enumeration theorem [57], which provides a method to find the number of orbits of a group action, is motivated by the problem of enumerating permutational isomers of a given molecular skeleton. Group theory is widely used in crystallography to describe crystallographic symmetry and classify crystal structures [1,25]. Other applications include using the molecule symmetry group in studying molecular spectroscopy [15] and using generating functions in understanding nuclear spin statistics of nonrigid molecules [5]. However, most of these works only involve symmetry of individual structures. The literature is sparse in the context of symmetry in assembly systems or in configuration spaces.

EASAL: Efficient Atlasing and Search of Assembly Landscapes
A recent method of the first author, EASAL (efficient atlasing and search of assembly landscapes) [55,56], formally addresses the issues highlighted in the first two subsections above: computation of configurational entropy and kinetics, via geometrization, stratification and convexification using Cayley parameterization of assembly configuration spaces. Geometrization and Stratification were also used later in [33] independently (as mentioned at the end of Section 1.1): the geometrization is achieved in [33] via a somewhat different process consistent with smooth potential energy functions, while the stratification is the standard Thom-Whitney stratification of semialgebraic sets as laid out in [55,56].
On the other hand, Cayley convexification based on [68] is a unique feature of EASAL not present in [33], that makes it tractable to sample and compute entropy integrals over higher dimensional constant-potential-energy regions of the assembly configuration space. In addition Cayley convexification helps formalize and precisely explain the intuitively clear observation that assembly configuration spaces are significantly simpler geometrically and topologically than folding configuration spaces. The difference in complexity is especially stark when there are cycles of pairwise constraints between atom centers.
We describe the Geometrization and Stratification aspects of EASAL's approach below. Stratification is explained in further detail in Section 2 and Cayley parameters for configuration spaces and convexification based on [68] are explained in Section 4.

Geometrization
The assembly configuration space is represented as a semi-algebraic set satisfying geometric constraints specified as distance inequalities between atom-centers. The short-range or hard sphere potential interaction is typically discretized to take different constant values on three intervals for the distance value d_{i,j}: (0, r_{i,j}), (r_{i,j}, r_{i,j} + δ_{i,j}), and (r_{i,j} + δ_{i,j}, ∞). Typically, r_{i,j}, the so-called van der Waals or steric radius, specifies "forbidden" regions around atoms i and j, and r_{i,j} + δ_{i,j} is a distance beyond which the attractive (electrostatic or other weak) forces between the two atoms are no longer strong (typically these forces decay as the reciprocal of some power of the distance d_{i,j} between atom centers). Intuitively, the interval (0, r_{i,j}) is where the repulsive force strongly dominates, (r_{i,j}, r_{i,j} + δ_{i,j}) is where the attractive and repulsive forces are balanced, and (r_{i,j} + δ_{i,j}, ∞) is where neither force is strong. Over these three intervals respectively, the potential assumes a very high value, a very low value, and a medium value m_{i,j}. All of these bounds for the intervals for d_{i,j}, as well as the values for the potential on these intervals, are specified as part of the input to the assembly model. These constants are specified for each pair of atoms i and j, i.e., the subscripts are necessary. The interval with the low value is called the well. The hard-sphere potentials are defined solely by the van der Waals forbidden distance constraint, i.e., δ_{i,j} = 0.
The information in the potential energy landscape can thus be geometrized, i.e., represented using assembly constraints, in the form of distance intervals. These constraints define feasible configurations. The set of feasible configurations is called the assembly configuration space. The active constraint regions of the configuration space are regions where at least one of the short-ranged inter-atom distances lies in the potential energy well, i.e., the interval (r_{i,j}, r_{i,j} + δ_{i,j}).
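The three-interval discretization and the notion of an active constraint can be illustrated with a minimal Python sketch. It is only an illustration under simplifying assumptions that are not in the text: a single steric radius `r` and well width `delta` shared by all sphere pairs (the model above allows them to differ per pair), bunches given as plain lists of 3D points, and interval endpoints assigned to the well. The function names `classify` and `active_pairs` are hypothetical.

```python
from itertools import combinations
from math import dist


def classify(d, r, delta):
    """Name the interval of the discretized potential that contains distance d.

    Assumption: endpoints are assigned to the well; the text uses open intervals.
    """
    if d < r:
        return "forbidden"  # steric overlap, very high potential
    if d <= r + delta:
        return "well"       # very low potential: the constraint is active
    return "far"            # neither force is strong, medium potential


def active_pairs(bunches, r, delta):
    """List pairs of spheres from different bunches whose distance lies in the well.

    Each sphere is identified as (bunch index, index of the sphere in its bunch).
    """
    active = []
    for (a, bunch_a), (b, bunch_b) in combinations(enumerate(bunches), 2):
        for i, p in enumerate(bunch_a):
            for j, q in enumerate(bunch_b):
                if classify(dist(p, q), r, delta) == "well":
                    active.append(((a, i), (b, j)))
    return active


# Three singleton bunches on a line; only the first two are within the well.
bunches = [[(0.0, 0.0, 0.0)], [(1.05, 0.0, 0.0)], [(3.0, 0.0, 0.0)]]
pairs = active_pairs(bunches, r=1.0, delta=0.1)
```

Here `pairs` contains the single pair formed by the spheres of the first and second bunch, so this configuration would lie in an active constraint region with one active constraint.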

Stratification
The above geometrization of an assembly configuration space makes it natural to stratify an assembly configuration space into an atlas of active constraint regions. More details are provided in Section 2; see also Figure 7. The active constraint regions of the configuration space are regions where at least one of the inter-atom distances lies in the potential energy well. The active constraint regions are stratified by dimension into a topological Thom-Whitney complex, with the boundary region being one dimension smaller. The active constraint regions can be modeled as so-called convexifiable Cayley configuration spaces [68], a combinatorially definable concept, by first labeling each region by its unique active constraint graph (see Section 2). A demo movie of EASAL is available at: http://www.cise.ufl.edu/research/SurfLab/EASALvideo.mpg. Standard algorithms can be employed for a fast computation of paths from one configuration to another in the atlas. However, the computation of entropy integrals over these paths poses several challenges.

Organization and Contribution
This is a primarily expository paper that develops a novel, original framework for dealing with symmetries in configuration spaces of assembling spheres under short ranged potentials. It is motivated by a longer-term goal to exploit natural symmetries using assembly trees and other concepts described in the previous sections, that have appeared in various avatars in the community, including our work on EASAL. Such an understanding of symmetries is essential for significantly reducing the complexity of the computation of configurational and combinatorial entropy as well as kinetics, since spheres within rigid bunches of an assembly system could be identical and bunches could be identical as well, giving underlying symmetry groups of large order, that grows with the number of participating spheres and bunches.
To this end, we develop a formal conceptual framework for assembly under short-ranged potentials, as assembly of rigid bunches of spheres. As different definitions of assembly macrostates are appropriate in different contexts, for example, depending on whether different copies of identical atoms or molecules are considered interchangeable or not, we carefully define and differentiate between congruence and isomorphism of configurations. We then show how symmetries of assembly configuration spaces arise due to: multiple copies of identical building blocks (in particular when these building blocks are rigid bunches of spheres), internal symmetries of building blocks, and the symmetries of the final assembled structure.
The organization of this paper is as follows. In Section 2, we define the new conceptual framework for symmetry in assembly under short-ranged potentials (or assembly of rigid bunches of spheres) leading to the main Theorem 4. An application of some of these results on symmetry can be found in [56]. In Section 3, we illustrate one aspect of our approach [13] for computing combinatorial entropy using generating functions for counting the number and size of simplified assembly pathways (orbits of a symmetry group action on assembly trees). Note that while this simple example has a fixed group size, the method demonstrated applies also when the underlying symmetry group grows with the size of the system. In Section 4, open questions and directions are given.

Framework for Symmetry in Assembly
In this section, we define natural groups of symmetries acting on various previously defined objects related to symmetry that are described in the Introduction and later in this section. The four new groups we define are the weak automorphism group, the strict congruence group, the strict order preserving isomorphism group and the strict permuted congruence group of an assembly configuration. We consider the action of these groups on various objects defined in previous literature on assembly and sketched in the introduction [55,56,66], such as assembly configuration space, active constraint regions, active constraint graphs, assembly paths and trees. These resulting symmetry classes will be used to formalize the main new Theorem 4 and two applications in Example 1 and Section 3, as well as open problems in the last section of this paper.
Let X be a set under the action of a group G, and x be any element of X. The orbit of x is orb_G(x) = {g(x) : g ∈ G}, and the stabilizer of x is stab_G(x) = {g ∈ G : g(x) = x}. The following theorem from standard group theory can be used to determine the number of orbits and the size of orbits for various objects defined in this section. An explicit application of this theorem is shown in the next section.
Theorem 1. Let X be a set under the action of a group G. For all x ∈ X, the equalities
|orb_G(x)| = |G| / |stab_G(x)| (Orbit-Stabilizer theorem) and
|X/G| = (1/|G|) Σ_{g ∈ G} |X^g| (Burnside's lemma)
hold, where X/G denotes the set of orbits and X^g = {x ∈ X : g(x) = x} is the set of elements fixed by g.
Different definitions of macrostates are appropriate in different contexts, for example, depending on whether different copies of identical atoms or molecules are considered interchangeable or not. For this reason we carefully define and differentiate between congruence and isomorphism of configurations.
In order to give a physically meaningful formalization of an assembly system under short-ranged potentials, we define the notion of a bunch, i.e., a rigid configuration of spheres of varying colors and radii.
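The orbit-counting facts referred to in Theorem 1 can be checked numerically on a toy group action. The sketch below is an illustration only, and the setting is an assumption chosen for brevity rather than an example from this paper: the symmetric group S_4 permutes the vertices of all labelled graphs with 4 vertices and 3 edges. It assumes that the theorem intended is the orbit-stabilizer relation together with fixed-point averaging (Burnside's lemma). The helper names `act`, `orbit` and `stabilizer` are hypothetical.

```python
from itertools import combinations, permutations


def act(perm, edge_set):
    """Apply a vertex permutation (tuple mapping i to perm[i]) to a set of edges."""
    return frozenset(frozenset(perm[v] for v in e) for e in edge_set)


n = 4
G = list(permutations(range(n)))  # the symmetric group S_4, 24 elements
all_edges = [frozenset(e) for e in combinations(range(n), 2)]
# X: all graphs on 4 labelled vertices with exactly 3 edges
X = [frozenset(es) for es in combinations(all_edges, 3)]


def orbit(x):
    return {act(g, x) for g in G}


def stabilizer(x):
    return [g for g in G if act(g, x) == x]


# Orbit-stabilizer relation: |orbit(x)| * |stab(x)| = |G| for every x in X
assert all(len(orbit(x)) * len(stabilizer(x)) == len(G) for x in X)

# Number of orbits, counted directly and by averaging fixed points over G
orbits = {frozenset(orbit(x)) for x in X}
fixed_total = sum(1 for g in G for x in X if act(g, x) == x)
num_orbits = len(orbits)
assert num_orbits * len(G) == fixed_total
```

In this toy case the 20 labelled graphs fall into 3 orbits (triangle plus isolated vertex, star, and path), of sizes 4, 4 and 12. The brute-force enumeration is only feasible for tiny groups; it is meant to make the two equalities concrete, not to suggest a method for large assembly systems.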

A Bunch and its symmetries
Let SE(3) denote the group of orientation preserving isometries of R^3.
A bunch is a tuple (P; C, r, δ) where P = (p_1, p_2, . . . , p_n) is an ordered set of points in R^3, and C, r, δ are functions defining colored spheres centered at the points in P. Specifically, C : P → 𝒞 where 𝒞 is a finite set of "colors", and r, δ : P → R^+ such that the spheres are nonintersecting, i.e. ‖p_i − p_j‖_2 ≥ r(p_i) + r(p_j) for any i ≠ j. The map δ is interpreted as the width of the annulus specified by the potential energy well and is used in the definition of an active constraint graph of an assembly configuration later in this section. For a bunch B, P(B) is used to denote the point set of B; similarly we have C(B), r(B) and δ(B).
Two bunches B = (P; C, r, δ) and B' = (P'; C', r', δ') are isomorphic if there is an element φ of SE(3) and a permutation π ∈ S_n, such that φ(p_i) = p'_{π(i)} for all i, where n = |P|, and φ preserves the color, radius and annulus of points. In this case with a slight abuse of notation, we write B' ∈ φ(B), where φ(B) denotes the set of bunches that are isomorphic to B under φ and some permutation in S_n. See Figure 1 for an example.
Two bunches B = (P; C, r, δ) and B' = (P'; C', r', δ') are strictly isomorphic if there is a permutation π ∈ S_n such that B and B' are isomorphic under π and the identity element in SE(3). The weak automorphism group of B, denoted Waut(B), is the group of all permutations π ∈ S_n that take B to a strictly isomorphic B'. Two bunches B = (P; C, r, δ) and B' = (P'; C', r', δ') are order preserving isomorphic or congruent if there is a φ ∈ SE(3) such that B and B' are isomorphic under φ and the identity permutation. In this case with a slight abuse of notation, we write B' = φ(B).
We have the following observation that describes strict isomorphism using the notion of congruence.
Observation 2. Two congruent bunches B and B' are strictly isomorphic if and only if P̄ = P̄', where P̄ and P̄' denote the unordered point sets of B and B' respectively, and for all p ∈ P, C'(p) = C(p), r'(p) = r(p), δ'(p) = δ(p).
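As a concrete reading of these definitions, the following Python sketch stores a bunch as parallel tuples, checks the nonintersection condition, and enumerates the permutations of sphere indices that send every sphere to one with the same color, radius and annulus width. Treating that set of permutations as the weak automorphism group is an interpretation adopted here for illustration, not a statement taken from the text; the class `Bunch` and the function `label_preserving_permutations` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import combinations, permutations
from math import dist


@dataclass(frozen=True)
class Bunch:
    points: tuple  # ordered sphere centers p_1, ..., p_n in R^3
    colors: tuple  # C(p_i)
    radii: tuple   # r(p_i)
    deltas: tuple  # delta(p_i), width of the potential well annulus

    def is_valid(self):
        """Check that the spheres do not intersect: |p_i - p_j| >= r_i + r_j."""
        idx = range(len(self.points))
        return all(
            dist(self.points[i], self.points[j]) >= self.radii[i] + self.radii[j]
            for i, j in combinations(idx, 2)
        )

    def label(self, i):
        return (self.colors[i], self.radii[i], self.deltas[i])


def label_preserving_permutations(b):
    """Permutations pi of the indices with label(i) == label(pi(i)) for all i.

    Assumption: this is used as a stand-in for the weak automorphism group.
    """
    n = len(b.points)
    return [
        pi for pi in permutations(range(n))
        if all(b.label(i) == b.label(pi[i]) for i in range(n))
    ]


b = Bunch(
    points=((0, 0, 0), (2, 0, 0), (0, 2, 0), (0, 0, 2)),
    colors=("red", "blue", "red", "blue"),
    radii=(1, 1, 1, 1),
    deltas=(0.1, 0.1, 0.1, 0.1),
)
perms = label_preserving_permutations(b)
```

For this bunch, `perms` has four elements: the identity, the swap of the two red spheres, the swap of the two blue spheres, and their product. Enumerating all of S_n is exponential in n, so a practical implementation would build the group as a product of symmetric groups on the classes of identical spheres.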

An assembly configuration space and its symmetries
An assembly configuration is an ordered set B = (B_1, . . . , B_k) of bunches. Two assembly configurations B = (B_1, . . . , B_k) and B' = (B'_1, . . . , B'_k) are configurations of the same assembly system (see Section 1) if B_i is congruent to B'_{σ(i)} for some permutation σ ∈ S_k, for all i. Notice that the congruence between bunches could be different for each i. The set of all assembly configurations of an assembly system is called an assembly configuration space. The assembly configuration space containing the assembly configuration B is denoted A(B), or simply A when the context is clear.
In the following discussion, we always restrict our universe to assembly configurations in the same assembly configuration space.
Note that all assembly configurations in the same assembly configuration space A have the same weak automorphism group. Thus we define the weak automorphism group of an assembly configuration space A, denoted Waut(A), to be the weak automorphism group of any assembly configuration B in A.
Two assembly configurations B and B' are congruent if there is an isomorphism φ ∈ SE(3) that preserves both the order of the bunches and the order of points within each bunch, i.e. for all i, B_i is congruent to B'_i under φ. Two assembly configurations B and B' are strictly congruent if they are both congruent and strictly isomorphic. In general, we think of two strictly congruent assembly configurations as the same. The strict congruence group of an assembly configuration B is the stabilizer of the set of strictly congruent assembly configurations of B under Waut(A). It is the stabilizer subgroup stab_{Waut(A)}(B) of the assembly configuration B under Waut(A).
Two assembly configurations B and B' are order preserving isomorphic if there is an isomorphism φ ∈ SE(3) that preserves the order of the bunches, i.e. for all i, B'_i is congruent to φ(B_i). Two assembly configurations B and B' are strictly order preserving isomorphic if they are both order preserving isomorphic and strictly isomorphic. The strict order preserving isomorphism group of an assembly configuration B is the stabilizer of the set of strictly order preserving isomorphic configurations of B under Waut(A).
Two assembly configurations B and B' are permuted-congruent if there is an isomorphism that preserves the order of points within each bunch, i.e. there is an element φ of SE(3) and a permutation σ ∈ S_k, such that for all i, B'_{σ(i)} is congruent to B_i under φ. Two assembly configurations B and B' are strictly permuted-congruent if they are both permuted-congruent and strictly isomorphic. The strict permuted congruence group of an assembly configuration B is the stabilizer of the set of permuted-congruent configurations of B under Waut(A).

Figure 2: The assembly configuration B_1 consists of 3 isomorphic bunches. B_2 is obtained from B_1 with a strict congruence, B_3 is obtained from B_1 with a strict permuted congruence, and B_4 is obtained from B_1 with a strict isomorphism that is neither a strict congruence, nor a strict permuted congruence, nor a strict order preserving isomorphism.

Figure 3 shows another example of four assembly configurations each containing two bunches. The strict congruence group stab_{Waut(A)}(B_1) of the assembly configuration B_1 is of size 2 and contains those tuples (σ, π_1, π_2), where π_1 ∈ {id, (2 4)}, σ = id, π_2 = id. The weak automorphism group Waut(A) of the assembly system is of size 4 and contains those tuples (σ, π_1, π_2), where π_1 ∈ {id, (2 4), (3 1), (2 4)(3 1)}, σ = id, π_2 = id. All four strictly isomorphic assembly configurations are obtained by applying Waut(A) to the assembly configuration B_1.

Figure 3: Four assembly configurations obtained by applying Waut(A) on the assembly configuration B_1. B_2 is obtained from B_1 with a congruence, while B_3 is obtained from B_1 with a strict order preserving isomorphism.

We have the following observations for alternative characterizations of strict congruence, strict order preserving isomorphism and strict permuted congruence of assembly configurations.
Two active constraint graphs G and G' are isomorphic if there is an element ψ ∈ Waut(A) such that ψ(G) = G'. The automorphism group of an active constraint graph G is the group of elements ψ ∈ Waut(A) such that ψ(G) = G, i.e. it is the stabilizer subgroup stab_{Waut(A)}(G).
For example, Figure 4 shows all the non-isomorphic active constraint graphs with 12 edges of an assembly system consisting of 6 bunches, where all bunches are identical singleton spheres.

Figure 4: All non-isomorphic active constraint graphs with 12 edges of an assembly system of 6 bunches that are identical singleton spheres. The label on top is automatically generated by EASAL and specifies the orbit number of the shown active constraint graph.

Note: It is clear that stab_{Waut(A)}(B) ⊆ stab_{Waut(A)}(G(B)). Moreover, there are assembly configurations B such that stab_{Waut(A)}(B) ⊊ stab_{Waut(A)}(G(B)), i.e. the strict congruence group of B does not have all the automorphisms of the corresponding active constraint graph. Refer to the assembly configuration B and its active constraint graph G in Figure 5, where each bunch is a singleton sphere. The permutation σ = (1 2 3) ∈ Waut(A) is contained in stab_{Waut(A)}(G). However, it is not contained in the strict congruence group stab_{Waut(A)}(B) of the assembly configuration.
The full graph G * of an active constraint graph G is obtained by adding edges to G to make the set of vertices in each bunch into a clique.
An active constraint region R_G of the assembly configuration space A contains all assembly configurations B with the active constraint graph G(B) = G. The action of elements of Waut(A) on an active constraint region, and the stabilizer of an active constraint region in Waut(A), are well-defined by the action of Waut(A) on assembly configurations.

Figure 5: An assembly configuration whose automorphism group is strictly contained in that of the corresponding active constraint graph. Here the bunches are singleton spheres and bunches of the same color have the same C, r and δ.
The following theorem gives containment and equality relations between stabilizer subgroups of an active constraint graph, an active constraint region and individual configurations in the active constraint region.
Theorem 4. For an active constraint graph G = G(B) of an assembly configuration space In addition, there exist active constraint graphs G of assembly configuration spaces A where the above containment is strict, i.e.
for all B such that We give an example to show the existence of G where stab Waut A B stab Waut A G for any assembly configuration B of G. Refer to the assembly configuration in Figure 6, where each bunch is a singleton sphere. The permutation σ = (1 2 3) is contained in the automorphism group stab Waut A G of the active constraint graph G. However, it is not contained in the strict congruence group of any corresponding assembly configuration, as the position of the sphere 6 is asymmetric with respect to 1, 2, 3 in any assembly configuration of G. Thus stab Waut A B stab Waut A G for any assembly configuration B of G..  Figure 6: Any assembly configuration corresponding to the active constraint graph G has its strict congruence group strictly contained in stab Waut A G. Here the bunches are singleton spheres and bunches of the same color have the same C, r and δ.
(2) stab Waut A G = stab Waut A R G : from the definition of permutations in the weak automorphism group of the assembly configuration space, it follows that stab Waut A G ⊆ stab Waut A R G . To show stab Waut A R G ⊆ stab Waut A G, consider any element ψ ∈ stab Waut A R G . For any assembly configuration B ∈ R G , if a pair of spheres (x, y) are "touching" (i.e. they yield an edge in the corresponding active constraint graph), it must be the case that (ψ(x), ψ(y)) are also "touching" in ψ(B), since G(B) = G(ψ(B)) = G. Similarly, ψ must map "non-touching" pairs to "non-touching" pairs. Therefore ψ ∈ stab Waut A G.
Remark 1. We expect the strict order preserving isomorphism group and the strict permuted congruence group of an assembly configuration B to lie between the strict congruence group stab Waut A B and the automorphism group stab Waut A G of its active constraint graph. However, the containment relationship between these two groups is not clear.
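The edge-preservation argument in part (2) can be checked by brute force on small systems. The following Python sketch assumes identical singleton spheres, so that every permutation of the spheres is a candidate element of Waut A. The function names and the three-sphere contact graph are illustrative assumptions and do not come from the paper.

```python
from itertools import permutations


def graph_stabilizer(n, edges):
    """Return all permutations of range(n) that map the edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    stabilizer = []
    for perm in permutations(range(n)):
        image = {frozenset((perm[x], perm[y])) for x, y in edges}
        if image == edge_set:
            stabilizer.append(perm)
    return stabilizer


def preserves_non_edges(n, edges, perm):
    """Check that perm sends every non-touching pair to a non-touching pair."""
    edge_set = {frozenset(e) for e in edges}
    pairs = [frozenset((x, y)) for x in range(n) for y in range(x + 1, n)]
    non_edges = [p for p in pairs if p not in edge_set]
    return all(frozenset(perm[v] for v in p) not in edge_set for p in non_edges)


# Hypothetical example: three identical spheres with contacts 0-1 and 1-2.
path_edges = [(0, 1), (1, 2)]
stab = graph_stabilizer(3, path_edges)
print(stab)
print(all(preserves_non_edges(3, path_edges, perm) for perm in stab))
```

Only the identity and the swap of the two end spheres survive. This is the automorphism group of the contact graph, and each surviving permutation also maps non-touching pairs to non-touching pairs.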

Symmetries in stratification, assembly path and pathway
A stratification S(A) of the assembly configuration space A is a partition of the space into strata X i of A that form a filtration ∅ ⊂ X 0 ⊂ X 1 ⊂ . . . ⊂ X m = A, m = 6(n − 1). Each X i is a union of active constraint regions R G , where the corresponding active constraint graph G has m − i independent edges, i.e. m − i inequality constraints are active. Each active constraint graph G is itself part of at least one, and possibly many, hence l-indexed, nested chains of the form Note that here for all l, j, R G l m−j ⊆ X j is closed and j dimensional. See Figure 7 for an example of assembly configuration space stratification.
Given two active constraint graphs G i and G j , R Gi (resp. G i ) is a parent of R Gj (resp. G j ), and R Gj is a child of R Gi , if G i ⊊ G j and there does not exist an active constraint graph G m such that G i ⊊ G m ⊊ G j . The parent-child relation provides a Hasse diagram of active constraint regions in the stratification of A.
Figure 4 of 6 bunches, with each bunch being a singleton sphere and all bunches identical. So Waut A is the complete symmetric group of permutations of 6 elements, S 6 . Each node shown is an orbit representative of an active constraint region corresponding to an active constraint graph. The grey part consists of those active constraint graphs (orbit representatives) whose corresponding constraint regions are empty. The example active constraint graph representatives on the right have arrows pointing to their regions in the stratification. The labels in the circles are unimportant: they are automatically generated and specify an orbit of an active constraint graph (example shown on right).
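The parent-child relation is the covering relation of the inclusion order on active constraint graphs. A minimal Python sketch follows, assuming each graph is represented simply as a frozenset of edges; this representation, the function name hasse_covers and the three toy graphs are assumptions made for illustration only.

```python
def hasse_covers(graphs):
    """Return index pairs (i, j) such that graphs[i] is a parent of graphs[j].

    graphs[i] is a parent of graphs[j] when it is a proper subgraph and no
    other graph in the family lies strictly between the two.
    """
    covers = []
    for i, gi in enumerate(graphs):
        for j, gj in enumerate(graphs):
            if gi < gj:  # proper subset of edge sets
                has_intermediate = any(gi < gm < gj for gm in graphs)
                if not has_intermediate:
                    covers.append((i, j))
    return covers


# Toy family: no contacts, one contact, two contacts.
graphs = [
    frozenset(),
    frozenset({(0, 1)}),
    frozenset({(0, 1), (1, 2)}),
]
covers = hasse_covers(graphs)
print(covers)
```

The empty graph is a parent of the one-contact graph but not of the two-contact graph, because the one-contact graph lies strictly between them.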
An assembly path from G 1 to G m in the stratification is a sequence G 1 ⊊ G 2 ⊊ . . . ⊊ G m in which each G i is a parent of G i+1 . A coarse assembly path from G 1 to G m in the stratification is a sequence G 1 ⊊ G 2 ⊊ G 3 ⊊ . . . ⊊ G m where G * i+1 has exactly one new rigid component S not in G * i , with S containing a set of two or more rigid components S 1 . . . S m of G i . In addition, for all proper subsets Q ⊊ {S 1 . . . S m } with |Q| ≥ 2, the subgraphs of G * i+1 induced by Q are not rigid. (The rigid components of a graph are the maximal rigid subgraphs. Two rigid components cannot intersect on more than two vertices. We refer the reader to combinatorial rigidity concepts in [24].) For example, in Figure 7, the sequence of active constraint graphs on the right forms an assembly path.
An assembly forest corresponding to a coarse assembly path from G 1 to G m is the unique forest where the leaves are the maximal rigid components of G * 1 . The internal nodes are the new rigid components S occurring in some G * i+1 in the path. The children of S are the set of rigid components S 1 . . . S m contained in S that occur in G * i . The roots of the forest are the rigid components of G * m . An assembly tree is an assembly forest with only one root. See Figure 9 in Section 3 for examples of assembly trees [66,13,12].
A full (coarse) assembly path is an (coarse) assembly path from G 1 to G m , where G 1 is the empty active constraint graph, and G * m is a rigid active constraint graph. A (coarse) assembly path from primitives has the first property of the full assembly path, i.e. G 1 is the empty active constraint graph, but not the last property, i.e. G m can be any active constraint graph. The full assembly tree and assembly tree from primitives are also defined in this way.
A path between full active constraint graphs G and H where G ⊈ H and H ⊈ G is a sequence G = G i , G i+1 , G i+2 , . . . , G i+m = H, where any pair G i+k and G i+k+1 are on some assembly path, and
The fundamental domain of the stratification S(A) is the minimal sub-stratification Ŝ(A) such that ⋃ π∈Waut A π(Ŝ(A)) = S(A), where π acts on Ŝ(A) via its action on the active constraint regions (resp. active constraint graphs) of Ŝ(A). In other words, the active constraint regions (resp. active constraint graphs) in Ŝ(A) are orbit representatives of active constraint regions (resp. active constraint graphs) under Waut A .
An assembly pathway is an orbit of an assembly tree under Waut A . The definition extends to full, and coarse assembly trees.

Example illustrating above symmetries
Some of the symmetry concepts defined here were used in [56] to compute path and higher dimensional region intervals in sphere-based assembly configuration spaces more efficiently, reproducing and extending the results in [33]. We give a brief description here in the form of an example.
Example 1. Figure 7 shows the Hasse diagram of the fundamental region of a stratification of an assembly system of 6 bunches that are identical singleton spheres, considered first in [33]. Figure 8 shows an (orbit representative of an) active constraint graph of the system together with its parents and children in the Hasse diagram. In addition, orbit representatives of paths help in improving efficiency of path integrals. In Figure 7, any path that goes down from the top of the diagram to the bottom is the orbit representative of an assembly path. In Figure 8, the sequence e10g6, e11g3, e12g2 is the orbit representative of an assembly path but not a coarse assembly path, as none of e11g3's rigid components contains two or more rigid components of e10g6. On the other hand, the sequence e10g6, e12g2 is the orbit representative of a coarse assembly path.

Enumerating Simple Assembly Pathways
In this section, we consider the action of the strict congruence group of a single final configuration on its assembly trees, and use generating functions to count the number and sizes of simplified assembly pathways [13]. Note that our approach could potentially be applied for all other groups defined in Section 2, the largest of which is the weak automorphism group of the final configuration, which would be the same as the weak automorphism group of the assembly configuration space.
A simple assembly is modeled by a rooted tree: the leaves are abstract representations of individual bunches, and the root represents the final assembled configuration. The internal vertices represent intermediate stages of assembly, simplified to be subsets instead of subgraphs of the root. This simplification results in a loss of information about the assembly configuration space and active constraint graphs of the intermediate stages of assembly. To compensate, the group is taken to be the automorphism group G of the graph of the assembled structure at the root instead of the weak automorphism group Waut A of the assembly configuration space.
The definitions of assembly tree and pathway are simplified as follows. Given a finite group G acting on a finite set X, we will define a simplified assembly pathway for the pair (G, X). First, a simplified assembly tree is a rooted tree for which each internal vertex has at least two children and whose leaves are bijectively labeled with elements of a set X. There is an induced labeling on all the vertices of a simplified assembly tree by labeling a vertex v by the set of labels on the leaves that are descendants of v. We identify each vertex of a simplified assembly tree with its label. Two simplified assembly trees are considered identical if there is a root preserving, adjacency preserving, and label preserving bijection between their vertex sets. The 26 simplified assembly trees with four leaves, labeled in the set X = {1, 2, 3, 4}, are shown in Figure 9.
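Because each vertex is identified with its set of leaf labels, a simplified assembly tree can be stored as the set of its vertex labels. The following Python sketch enumerates all such trees by recursively splitting the label set into at least two blocks. The function names and the frozenset representation are choices made for this illustration, not part of the paper.

```python
from itertools import product


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition


def assembly_trees(labels):
    """Return all simplified assembly trees on `labels`.

    Each tree is a frozenset of vertex labels; each vertex label is the
    frozenset of leaf labels below that vertex.
    """
    labels = sorted(labels)
    root = frozenset(labels)
    if len(labels) == 1:
        return [frozenset([root])]
    trees = []
    for partition in set_partitions(labels):
        if len(partition) < 2:  # internal vertices need at least two children
            continue
        subtree_choices = [assembly_trees(block) for block in partition]
        for choice in product(*subtree_choices):
            vertices = {root}
            for subtree in choice:
                vertices |= subtree
            trees.append(frozenset(vertices))
    return trees


trees_on_four = assembly_trees([1, 2, 3, 4])
print(len(trees_on_four))  # 26
```

The count of 26 matches the number of trees stated for X = {1, 2, 3, 4}.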
For a simplified assembly tree τ , the action of G on X induces a natural action of G on the power set of X and thereby on the set of vertices of τ . Let T X denote the set of all simplified assembly trees for X. If g ∈ G, then define the tree g(τ ) as the unique simplified assembly tree whose set of vertex labels (including the labels of internal vertices) is {g(v) : v ∈ τ }. Thus we have an induced action of G on T X . Each orbit of this action of G on T X consists of a set of simplified assembly trees called a simplified assembly pathway for (G, X). For this example there are exactly 11 simplified assembly pathways, which are indicated in Figure 9 by boxes around the orbits. There are four simplified assembly pathways of size one, i.e., with one simplified assembly tree in the orbit, three simplified assembly pathways of size two, and four simplified assembly pathways of size four.
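The orbit count above can be reproduced by brute force. The sketch below assumes that Z 2 ⊕ Z 2 acts on {1, 2, 3, 4} through the identity and the three double transpositions, which is the regular action up to relabeling. The tree representation (a frozenset of vertex labels) and all function names are illustrative choices.

```python
from collections import Counter
from itertools import product


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition


def assembly_trees(labels):
    """All simplified assembly trees on `labels`, as frozensets of vertex labels."""
    labels = sorted(labels)
    root = frozenset(labels)
    if len(labels) == 1:
        return [frozenset([root])]
    trees = []
    for partition in set_partitions(labels):
        if len(partition) < 2:
            continue
        for choice in product(*[assembly_trees(block) for block in partition]):
            vertices = {root}
            for subtree in choice:
                vertices |= subtree
            trees.append(frozenset(vertices))
    return trees


def act(perm, tree):
    """Apply a permutation (dict: label -> label) to every vertex label of a tree."""
    return frozenset(frozenset(perm[x] for x in vertex) for vertex in tree)


# Assumed regular action of the Klein 4-group on {1, 2, 3, 4}.
klein_four = [
    {1: 1, 2: 2, 3: 3, 4: 4},
    {1: 2, 2: 1, 3: 4, 4: 3},
    {1: 3, 2: 4, 3: 1, 4: 2},
    {1: 4, 2: 3, 3: 2, 4: 1},
]

trees = assembly_trees([1, 2, 3, 4])
# Each pathway is the orbit of a tree; identical orbits collapse in the set.
pathways = {frozenset(act(g, tree) for g in klein_four) for tree in trees}
size_counts = Counter(len(orbit) for orbit in pathways)
print(len(pathways), sorted(size_counts.items()))
```

Under this assumed action the sketch finds 11 pathways: four of size one, three of size two and four of size four.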
For any subgroup H of G, let t(H) := t X (H) denote the number of trees in T X that are fixed by every element of H. Furthermore, let t̄(H) := t̄ X (H) denote the number of trees in T X that are fixed by every element of H and by no other element of G.
Theorem 6. Let G be a group acting on a set X. If H is a subgroup of G, then t̄(H) = Σ H≤K≤G µ(H, K) t(K), where µ is the Möbius function for the lattice of subgroups of G.
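Möbius inversion over a subgroup lattice can be sketched in a few lines of Python. The sketch assumes the standard inversion formula, in which the number of trees whose stabilizer is exactly H is the sum of µ(H, K) t(K) over all subgroups K containing H. It uses the Klein 4-group acting on four labels as test data. The fixed-tree counts 26 and 4 appear in the text, while the value 6 for each subgroup of order 2 is derived here (2 trees fixed by that subgroup only, plus 4 fixed by the whole group). The element names e, a, b, c are placeholders.

```python
from functools import lru_cache

# Subgroup lattice of the Klein 4-group; each subgroup is a set of element names.
subgroups = {
    "K0": frozenset("e"),
    "K1": frozenset("ea"),
    "K2": frozenset("eb"),
    "K3": frozenset("ec"),
    "G": frozenset("eabc"),
}

# Number of trees fixed by every element of each subgroup (see lead-in).
fixed = {"K0": 26, "K1": 6, "K2": 6, "K3": 6, "G": 4}


@lru_cache(maxsize=None)
def mobius(lower, upper):
    """Moebius function of the lattice, computed by the standard recursion."""
    if lower == upper:
        return 1
    if not subgroups[lower] < subgroups[upper]:
        return 0
    return -sum(
        mobius(lower, middle)
        for middle in subgroups
        if subgroups[lower] <= subgroups[middle] < subgroups[upper]
    )


def exactly_fixed(name):
    """Trees whose stabilizer is exactly the subgroup `name`."""
    return sum(
        mobius(name, upper) * fixed[upper]
        for upper in subgroups
        if subgroups[name] <= subgroups[upper]
    )


exact = {name: exactly_fixed(name) for name in subgroups}

# An orbit whose stabilizer has order |H| contains |G| / |H| trees.
group_order = len(subgroups["G"])
pathways_by_size = {}
for name, count in exact.items():
    size = group_order // len(subgroups[name])
    pathways_by_size[size] = pathways_by_size.get(size, 0) + count // size

print(exact)
print(pathways_by_size)
```

The inversion gives 16, 2, 2, 2 and 4 trees with stabilizer exactly K0, K1, K2, K3 and G, which groups into 4, 3 and 4 pathways of sizes 1, 2 and 4.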
Example 3 (Klein 4-group acting on T 4 , continued). Theorem 5, applied to our previous example of Z 2 ⊕ Z 2 acting simply on {1, 2, 3, 4}, states that the size of a simplified assembly pathway must be 1, 2 or 4, since it must be a divisor of 4 = |Z 2 ⊕ Z 2 |. To find the number of pathways of each size, note that G has three subgroups of order 2, denoted K 1 , K 2 , K 3 , and that t̄(K 0 ) = 16, t̄(K i ) = 2 for i = 1, 2, 3, and t̄(G) = 4, where K 0 denotes the trivial subgroup of order 1. The simplified assembly trees in T X that are fixed by all elements of G are shown in Figure 9, A, B, C, D. For i = 1, 2, 3, those simplified assembly trees in T X that are fixed by all elements of K i and by no other elements of G are shown in Figure 9, E, F, G, respectively. The remaining 16 simplified assembly trees in Figure 9 are fixed by no elements of G except the identity. Therefore, according to Theorem 5, the numbers of pathways of size 1, 2 and 4 are, respectively, t̄(G) = 4, (1/2)(t̄(K 1 ) + t̄(K 2 ) + t̄(K 3 )) = (1/2)(2 + 2 + 2) = 3, and (1/4) t̄(K 0 ) = (1/4)(16) = 4.
The problem of enumerating simplified assembly pathways is reduced, using Theorems 5 and 6, to calculating the number t(G) of simplified assembly trees fixed by a given group G. This is done using permutation group theory and generating functions. It will be assumed, as is the case in many of the biological applications, that G acts freely on X, i.e., if g(x) = x for some x ∈ X, then g must be the identity. In this case |X| := |X n | = n|G|, where n is the number of G-orbits in its action on X. Denote by t n (G) the number of trees in T n := T Xn that are fixed by G. We define the exponential generating function f G (x) := Σ n≥1 t n (G) x^n / n! for the sequence {t n (G)}. If G is the trivial group of order one, then let us denote this generating function simply by f (x). This is the generating function for the total number of rooted, labeled trees with n leaves in which every non-leaf vertex has at least two children. For H ≤ G, let

Theorem 7.
The generating function f G (x) satisfies the following functional equations: and for |G| > 1,
Although proofs are omitted in this survey, the rather involved proof of Theorem 7 relies on, in addition to generating function techniques, a characterization of block systems arising from a group acting on a set and a recursive procedure for constructing all trees in T X that are fixed by G. (See [13, Theorems 9 and 14].)

Remark 2.
Finding the generating function f G (x) depends on first finding the generating functions f H (x) for proper subgroups H of G. In that sense, the procedure for finding f G (x) is recursive, proceeding up the lattice of subgroups of G, starting from the trivial subgroup.
It is also worth mentioning that subgroups that are conjugate in G have the same generating function.
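For the trivial group, the coefficients t n of f (x) count all rooted trees with n labeled leaves in which every internal vertex has at least two children. The Python sketch below computes them with an elementary recurrence on the block containing a distinguished leaf. This recurrence is a standard counting argument supplied here for illustration and is not the functional equation of Theorem 7. The function name count_trees is an assumption.

```python
from math import comb


def count_trees(max_leaves):
    """Return [t_1, ..., t_max_leaves] for the trivial group.

    a[n] counts trees with n labeled leaves. b[n] counts ways to partition
    n labeled leaves into blocks and put a tree on every block. A tree with
    n >= 2 leaves corresponds to such a partition with at least two blocks.
    """
    a = [0] * (max_leaves + 1)
    b = [0] * (max_leaves + 1)
    b[0] = 1
    for n in range(1, max_leaves + 1):
        if n == 1:
            a[1] = 1
        else:
            # Condition on the size j of the block containing leaf n; the
            # block cannot be the whole set, so j ranges over 1..n-1.
            a[n] = sum(comb(n - 1, j - 1) * a[j] * b[n - j] for j in range(1, n))
        # Same conditioning, but now the single-block partition is allowed.
        b[n] = sum(comb(n - 1, j - 1) * a[j] * b[n - j] for j in range(1, n + 1))
    return a[1:]


tree_counts = count_trees(5)
print(tree_counts)  # [1, 1, 4, 26, 236]
```

The fourth value, 26, agrees with the number of simplified assembly trees on four leaves shown in Figure 9.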
Example 4 (Klein 4-group acting on T 4 , continued). Consider G = Z 2 ⊕ Z 2 acting on X n . Recall that |X n | = 4n, the integer n being the number of G-orbits. Recall that the subgroups of G are K 0 , K 1 , K 2 , K 3 , G, where K 0 is the trivial group and The functional equations in the statement of Theorem 7 are for i = 1, 2, 3, and Using these equations and MAPLE software, the coefficients of the respective generating functions provide the following first few values for the number of fixed simplified assembly trees. For the first entry t 1 (G) = 4 for the group G, the four fixed trees are shown in Figure 9 A, B, C, D. For trees with eight leaves there are t 2 (G) = 104 simplified assembly trees fixed by G = Z 2 ⊕ Z 2 , and so on.
Example 5 (The icosahedral group acting on a viral capsid). A symmetry of a polyhedron is a transformation in SE(3) that keeps the polyhedron, as a whole, fixed, and a direct symmetry is similarly defined. The icosahedral group is the group of direct symmetries of the icosahedron. It is a group of order 60 denoted G 60 . A viral capsid assembly configuration is modeled by a polyhedron P with icosahedral symmetry. Its set X of facets represents the protein monomers. The icosahedral group acts on P and hence on the set X. It follows from the so-called quasi-equivalence theory of the capsid structure that G 60 acts freely on X. We have |X| := |X n | = 60n, where n is the number of orbits in the action of the icosahedral group on X. Not every n is possible for a viral capsid; n must be a T-number, that is, a number of the form h^2 + hk + k^2 , where h and k are nonnegative integers.
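The admissible values of n are easy to list. The short Python sketch below generates all T-numbers up to a bound. It assumes that h and k are not both zero, so that T = 0 is excluded. The function name t_numbers is illustrative.

```python
def t_numbers(limit):
    """Return the sorted T-numbers h^2 + h*k + k^2 that are <= limit."""
    values = set()
    h = 0
    while h * h <= limit:
        k = 0
        while h * h + h * k + k * k <= limit:
            if (h, k) != (0, 0):  # assumption: exclude the degenerate T = 0
                values.add(h * h + h * k + k * k)
            k += 1
        h += 1
    return sorted(values)


small_t_numbers = t_numbers(30)
print(small_t_numbers)
# Corresponding facet counts |X| = 60 n for the first few T-numbers.
print([60 * n for n in small_t_numbers[:4]])
```

For example, n = 2, 5 and 6 are not T-numbers, so no capsid in this model has 120, 300 or 360 facets.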
Note. An icosahedral viral capsid assembly configuration has a corresponding icosahedral active constraint graph, and the group G 60 , viewed as a subgroup of the symmetric group S 60 , is the automorphism group of this active constraint graph. As mentioned in the beginning of this section, we are interested in the orbits of simplified assembly trees under the action of this automorphism group. However, we continue to use the more intuitive view of G 60 as a geometric group.
Before the number of simplified assembly trees can be enumerated, basic information about the icosahedral group is needed. The group G 60 consists of:
• the identity,
• 15 rotations of order 2 about axes that pass through the midpoints of pairs of diametrically opposite edges of P ,
• 20 rotations of order 3 about axes that pass through the centers of diametrically opposite triangular faces, and
• 24 rotations of order 5 about axes that pass through diametrically opposite vertices.
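The element counts above can be confirmed computationally. The sketch below uses the standard fact that the icosahedral rotation group is isomorphic to the alternating group A 5 and models G 60 as the even permutations of five symbols. This permutation model and the helper names are assumptions made for the illustration, not the geometric description used in the paper.

```python
from collections import Counter
from itertools import permutations


def is_even(perm):
    """A permutation is even when it has an even number of inversions."""
    n = len(perm)
    inversions = sum(
        1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
    )
    return inversions % 2 == 0


def compose(p, q):
    """Return the permutation i -> p[q[i]]."""
    return tuple(p[i] for i in q)


def order(perm):
    """Smallest positive n such that perm composed n times is the identity."""
    identity = tuple(range(len(perm)))
    power, n = perm, 1
    while power != identity:
        power = compose(perm, power)
        n += 1
    return n


rotations = [p for p in permutations(range(5)) if is_even(p)]
order_counts = Counter(order(p) for p in rotations)
print(len(rotations), sorted(order_counts.items()))
```

The 60 elements split into 1, 15, 20 and 24 elements of orders 1, 2, 3 and 5.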
There are 59 subgroups of G 60 that play a crucial role in the theory. Besides the two trivial subgroups, they are the following:
• 15 subgroups of order 2, each generated by one of the rotations of order 2,
• 10 subgroups of order 3, each generated by one of the rotations of order 3,
• 5 subgroups of order 4, each generated by rotations of order 2 about perpendicular axes,
• 6 subgroups of order 5, each generated by one of the rotations of order 5,
• 10 subgroups of order 6, each generated by a rotation of order 3 about an axis L and a rotation of order 2 that reverses L,
• 6 subgroups of order 10, each generated by a rotation of order 5 about an axis L and a rotation of order 2 that reverses L,
• 5 subgroups of order 12, each the symmetry group of a regular tetrahedron inscribed in P .
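The subgroup census can also be checked by brute force. The sketch below again models G 60 as the even permutations of five symbols (the alternating group A 5 ) and relies on the fact that every subgroup of A 5 can be generated by at most two elements, so closing all pairs of elements under multiplication finds every subgroup. The permutation model and the helper names are assumptions for this illustration.

```python
from collections import Counter
from itertools import permutations, product


def is_even(perm):
    """A permutation is even when it has an even number of inversions."""
    n = len(perm)
    return sum(
        1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
    ) % 2 == 0


def compose(p, q):
    """Return the permutation i -> p[q[i]]."""
    return tuple(p[i] for i in q)


def closure(generators, identity):
    """Subgroup generated by `generators`; in a finite group, products suffice."""
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return frozenset(elements)


identity = tuple(range(5))
group = [p for p in permutations(range(5)) if is_even(p)]

# Close every pair of elements; identical subgroups collapse in the set.
all_subgroups = {closure((a, b), identity) for a, b in product(group, repeat=2)}
subgroup_orders = Counter(len(h) for h in all_subgroups)
print(len(all_subgroups), sorted(subgroup_orders.items()))
```

The run reports 59 subgroups, with 15, 10, 5, 6, 10, 6 and 5 subgroups of orders 2, 3, 4, 5, 6, 10 and 12, plus the trivial subgroup and the whole group.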
From the above geometric description of the subgroups, it follows that all subgroups of a given order are conjugate in the group G 60 . Representatives of the conjugacy classes of the subgroups of the icosahedral group are denoted by G 0 , G 2 , G 3 , G 4 , G 5 , G 6 , G 10 , G 12 , G 60 , where the subscript is the order of the group. The set of subgroups of G 60 forms a lattice, ordered by inclusion. A partial Hasse diagram for this lattice L is shown in Figure 10. The number on the edge joining G i (below) and G j (above) indicates the number of distinct subgroups of order i contained in each subgroup of order j. The number in parentheses on the edge joining G i (below) and G j (above) indicates the number of distinct subgroups of order j containing each subgroup of order i. The Möbius function of L is shown in Table 1. The entry in the table corresponding to the row labeled G i and column G j is µ(G i , G j ).
Consider the case |X| = 60, i.e., the T = 1 capsid. Using Theorem 7 and MAPLE software, the generating functions f Gi (x) were computed, and hence their coefficients t 60/i (G i ), which count simplified assembly trees that are fixed by any copy of G i , were also computed. Note that, since |X| = 60, the number of orbits of G i in its action on X is 60/i. Substituting these values into Theorem 6 and using the Möbius function in Table 1 yields the numerical values for t̄ 60/i (G i ), the number of simplified assembly trees over X with |X| = 60 that are fixed by G i but by no other elements of G 60 . In other words, these are the numbers of trees whose stabilizer in G 60 is exactly G i .
Substituting these numbers into Theorem 5, we arrive at the number of simplified assembly pathways of each possible size:
204 simplified assembly pathways of size 1
∼ 168 × 10^8 simplified assembly pathways of size 5
∼ 223 × 10^9 simplified assembly pathways of size 6
∼ 613 × 10^17 simplified assembly pathways of size 10
∼ 102 × 10^17 simplified assembly pathways of size 12
∼ 334 × 10^28 simplified assembly pathways of size 15
∼ 504 × 10^31 simplified assembly pathways of size 20
∼ 835 × 10^51 simplified assembly pathways of size 30
∼ 320 × 10^99 simplified assembly pathways of size 60

Open Questions

Enumeration problems in (non-simplified) assembly framework
We are interested in the following enumeration problems related to the action of Waut A for the framework in Section 2:
1. How to compute the size of orbits/stabilizers and the number of orbits under Waut A for assembly configurations, active constraint graphs, active constraint regions, (coarse) assembly paths and assembly trees/forests?
2. How to compute the number of coarse assembly paths that correspond to a particular assembly tree/forest?
3. Given two active constraint graphs G and H, where G and H are incomparable, i.e. G ⊈ H and H ⊈ G, how to compute the number of paths between them?
4. Given two active constraint graphs G 1 and G m , where G 1 ⊊ G m , how to compute the number of (coarse) assembly paths from G 1 to G m ?
5. What are the orbits of the (coarse) assembly paths in (4) under the action of stab Waut A (G m )?
6. What are the orbits of the (coarse) assembly paths in (4) under the action of the group

Symmetries within an active constraint region via Cayley configurations
So far, we have discussed the orbit of an active constraint region and active constraint graph, and pointed out that it is sufficient to deal with a single orbit representative provided we are able to compute the multiplying factors associated with the size of the orbit, stabilizer, number of orbits etc. In fact, a single active constraint region could be decomposed into the union of nontrivial subregions that form the orbit of a fundamental region, leading to enormous efficiencies in sampling and in the computation of volumes that are currently hopelessly intractable in high dimensional configuration spaces, as discussed in the Introduction.
In fact since the fundamental region itself could have subregions with varying orders of stabilizers, we could decompose into more than one orbit representative, with different stabilizers. In any case, sampling or computing the volume of an active constraint region is simplified by sampling these fundamental subregions and computing the size of their orbits.
One way to obtain such a decomposition of an active constraint region R G is via the locally complete Cayley (assembly) configurations δ F corresponding to the active constraint graph G. Convex Cayley configuration spaces highlight the key difference between assembly and other constraint systems, e.g., folding. This difference is captured in the combinatorial structure of active constraint graphs. A Cayley parameter for an active constraint region R G is a non-edge of its active constraint graph G. For specific sets of non-edges F , the set of vectors λ F of attainable lengths of F (in 3D realizations of a linkage (G, δ) with underlying graph G and edge lengths δ) is always convex for any given lengths δ (that is, for all the 3D realizations of the bar-joint constraint system or linkage (G, δ)). This set is called the (3-dimensional) Cayley configuration space of the linkage (G, δ) on the Cayley parameters F , denoted Φ F (G, δ), and can be viewed as a "projection" of the space of pairwise distance vectors of realizations of (G, δ) on the Cayley parameters F . Such graphs G are said to have convexifiable Cayley configuration spaces with parameters F (short: convexifiable).
Convexity permits the use of convex programming techniques for improving efficiency of sampling, search, volume computations etc. for the configuration space.
The concept is best explained using key theorems of the first author in [68,69] discussed in Section 4.
We assume knowledge of common graph operations such as k-sums and resulting partial k-trees, a minor-closed class (partial 2-trees are series-parallel graphs with a forbidden minor K 4 ).
Theorem 8. [68] A graph H has a convexifiable Cayley configuration space with parameters F if and only if for each f ∈ F all the minimal 2-sum components of H ∪ F that contain both endpoints of f are partial 2-trees. The Cayley configuration space Φ F (H, δ) of a bar-joint system or linkage (H, δ) is a convex polytope. When H ∪ F is a 2-tree, the bounding hyperplanes of this polytope are triangle inequalities relating the lengths of edges of the triangles in H ∪ F .
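The smallest instance of the 2-tree case is a four-bar linkage with one diagonal as the Cayley parameter. The Python sketch below assumes planar realizations of a 4-cycle a-b-c-d with the non-edge ac as the single Cayley parameter, so that H ∪ F consists of the triangles abc and acd glued along ac. The function name cayley_interval and the sample lengths are hypothetical.

```python
def cayley_interval(ab, bc, cd, da):
    """Attainable lengths of the diagonal ac of a planar 4-cycle linkage.

    The diagonal must satisfy the triangle inequalities of both triangles
    abc and acd. The result is an interval (lower, upper), or None when the
    two sets of inequalities are incompatible.
    """
    lower = max(abs(ab - bc), abs(cd - da))
    upper = min(ab + bc, cd + da)
    if lower > upper:
        return None
    return (lower, upper)


# Hypothetical edge lengths: the diagonal can take any length in [1, 4].
print(cayley_interval(3, 4, 2, 2))
# Incompatible lengths: triangle abc needs ac <= 2, triangle acd needs ac >= 4.
print(cayley_interval(1, 1, 5, 1))
```

With a single Cayley parameter the convex polytope is just an interval, and its two endpoints come directly from triangle inequalities, as the theorem describes for 2-trees.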
Note: A major advantage of the convex Cayley method is that sampling the configuration space can be effected by standard methods of convex programming. Another advantage is that the method is completely unaffected when δ are intervals rather than exact values [68]. A different characterization of inherent Cayley convexity for a graph G on a set F of non-edges, as in the above section, has also been proven for higher dimensions d [68], [19], showing equivalence to a minor-closed property of d-flattenability introduced in [7], and also for other, non-Euclidean distances (norms) in [69]. Here a graph H is d-flattenable if any realization of H in a normed space can be flattened into d-dimensional normed space (in the same norm) maintaining the same edge distances.

Fundamental regions of Active constraint regions
After G has been completed with the convexifying Cayley parameters F , the locally rigid graph G ∪ F typically loses symmetries present in G, i.e., the automorphism group is smaller. However, F can be replaced by any set of edges π(F ) for π ∈ stab Waut A (G). Each locally complete Cayley configuration in the active constraint region G is of the form δ F (lengths of edges in F , where G ∪ F is rigid). Each cartesian (assembly) configuration within an active constraint region with graph G corresponds bijectively to a globally complete Cayley configuration (δ F , δ H ) where G ∪ F is rigid and G ∪ F ∪ H is globally rigid (or even G ∪ F ∪ H is a complete graph).
Thus when sampling the Cayley configuration space on F , one can find the boundaries of the fundamental regions of the corresponding cartesian assembly configurations as follows. For a Cayley configuration δ F , all its generically finitely many real/cartesian configurations can be obtained as various corresponding values of δ H , which include the values of δ π(F ) . The boundary of a fundamental region occurs during sampling when we encounter a cartesian (assembly) configuration c where the lengths of π(F ) correspond to already sampled lengths of F .
Note that there could be a different decomposition into fundamental regions, corresponding to each cartesian configuration (type) corresponding to the Cayley configuration. For example, for a configuration c′ different from the configuration c above, the lengths of π(F ) may not correspond to already sampled lengths of F . Or, there could be another element σ ∈ stab Waut A (G), with σ ≠ π, where the lengths of σ(F ) in c′ could correspond to already sampled lengths of F . In this manner, one can, in principle, algorithmically bound fundamental regions R i G of the active constraint region R G , by inspecting the assembly configurations corresponding to the Cayley configuration space on F , such that the active constraint region R G is the union of the orbits of the regions R i G (under the action of stab Waut A (G)). Efficiently finding these fundamental regions as well as the number and sizes of their orbits is an open question, whose answer would enormously reduce the complexity of configurational entropy computations for assembly.

g-unfixable unlabeled trees
Call a tree g-unfixable if there is no leaf-labeling so that the resulting labeled tree is fixed by the permutation g, and let us say that a tree is G-unfixable if it is g-unfixable for every nontrivial element of the group G. A study of unlabeled trees that are g-unfixable may lead to relevant related results. These properties are interesting for at least two reasons. First, they clarify the minimum quantifiable information in a labeled tree that is necessary to decide if it is fixed by a group element g: if the underlying unlabeled tree is g-unfixable, then the information in the labeling is unnecessary to make this decision. This may lead to efficient algorithms that use properties of the automorphism group of the tree to help in deciding whether a given labeled tree is fixed by the given group.

Depth of an assembly pathway
A result of [12] tells us that the orbit size of an assembly pathway is at least the depth of the pathway. The number of assembly pathways and the orbit sizes of assembly trees that constitute a pathway must be taken into consideration in defining any probability space over pathways. If the dynamics of transitioning between states along a pathway and thereby the density of states influencing the configurational integral computation [78] and other such factors nullify the vast differences in symmetry-induced numeracy factors between pathways, then that argument is yet to be made. The local rules theories using simple geometric rules, ODEs and other first principles physics based simulations of assembly of viral capsids [65,9,10,11,64,61,80,50,60,38,40] have been used to obtain the assembly kinetics including rates and concentrations of intermediates, and implicitly provide a probability distribution over pathways. A cautionary note in [52] uses an ODE based model of reaction kinetics to question simplistic models of assembly pathways. However, the model does not contradict the simple and transparent thesis that when symmetric structures form from identical units, the simple numeracy of orbit sizes of assembly trees must be taken into consideration in any theory predicting likely assembly pathways. This paper shows the rich intricacy of possible symmetries at play. We in fact conjecture that this symmetry factor increases with the depth of the pathway. Proving this conjecture would strengthen the motivation for studying the symmetry factor.

Other questions
Theorems 5 and 6 as well as our successful computation effort in the special case of |X| = 60 and T = 1 can serve as a motivation to revisit the following questions, first raised in [12].
1. Given two symmetry invariant properties, how to compute the ratio of the number of pathways that satisfy both of these properties to the number of symmetry classes that satisfy only one of these properties?
2. What can we say about larger (icosahedrally) symmetric polyhedral graphs (larger T numbers of viral capsids, for example), fullerenes and fulleroids and polyhedra with different symmetry groups? In such cases, the computations of Section 3 can also be phrased as algorithmic questions, where asymptotic complexity of the algorithm is expressed in terms of the number of facets of the polyhedron (or the T number).
3. To fully extend the techniques in Section 3 to the framework of Section 2, each subassembly must be a rigid subgraph of the graph at the root. Some assembly trees fail to satisfy the rigidity condition and can never occur (probability 0). Such assembly trees are geometrically invalid. In addition, a valid assembly tree can be assigned a non-zero probability according to how difficult it is to find a solution to the constraints on each subassembly. Computing this probability -called the geometric stability factor -is necessary to make the required predictions.
Dropping the rigidity requirement, but maintaining the subgraph (connectivity) requirement, in [73], two of the authors study the number of assembly trees of graphs on labeled vertices. In that model, each graph has a trivial automorphism group, but the enumeration of assembly trees still leads to the use of a recent and very powerful technique from the theory of D-finite power series in several variables.
Incorporating a nontrivial automorphism group of the graph could help understand the role of capsid symmetry in the RNA assembly model of [71], which purports that RNA viruses assemble by attaching to the internal (symmetry breaking) genome strand since that would avoid having to deal with the prohibitive number of possible assembly pathways. It should be noted that in our precise and formal theory of assembly trees and their orbits (our pathways), assembly has an underlying partial order of stable intermediates that are influenced by connectivity and rigidity; they are subgraphs of the underlying polyhedral graph given by active constraints. The informal definition of pathway in [71] is a linear order (in our language, an assembly tree that is a path) given by a Hamiltonian circuit in the viral polyhedral (dual) graph. We are not aware of a clarification of why the interactions of a given monomer in the sequence with multiple other monomers besides the previous one in the sequence would be insignificant. If they are not insignificant, the assembly tree would indeed be a partial order as in our case, and the tree would have a minimum fan-in required for rigidity. This would reduce the number of assembly trees significantly and reduce the number of their symmetry classes or orbits further, whereby this number alone is not a significant reason to adopt an alternate model of assembly (such as RNA strand attachment) that cuts down the possible pathways.
As future work, we also aim to apply the symmetry framework developed in this paper to explain more experimental and theoretical results from previous literature.

Conclusions
In this paper, we developed a novel framework for symmetry in assembly under short range potentials and considered the symmetry groups of various objects studied in previous literature on assembly, including assembly configuration spaces, active constraint graphs, active constraint regions, assembly trees and pathways. The new Theorem 4 formalizes the containment relations between stabilizer subgroups of an active constraint graph and the corresponding assembly configurations. We then demonstrated the new symmetry concepts to compute the sizes and numbers of orbits in two example settings appearing in previous work. The methods can improve efficiency for large systems with multiple identical bunches and spheres that have large order symmetry groups. The new symmetry framework helps formalize a number of questions for future work.

Acknowledgment
We thank Rahul Prabhu for his feedback and assistance in paper preparation.