Invariant Graph Partition Comparison Measures

Symmetric graphs have non-trivial automorphism groups. This article starts with the proof that all partition comparison measures we have found in the literature fail on symmetric graphs, because they are not invariant with regard to the graph automorphisms. By the construction of a pseudometric space of equivalence classes of permutations and with Hausdorff’s and von Neumann’s methods of constructing invariant measures on the space of equivalence classes, we design three different families of invariant measures, and we present two types of invariance proofs. Last, but not least, we provide algorithms for computing invariant partition comparison measures as pseudometrics on the partition space. When combining an invariant partition comparison measure with its classical counterpart, the decomposition of the measure into a structural difference and a difference contributed by the group automorphism is derived.


Introduction
Partition comparison measures are routinely used in a variety of tasks in cluster analysis: finding the proper number of clusters, assessing the stability and robustness of solutions of cluster algorithms, comparing different solutions of randomized cluster algorithms or comparing optimal solutions of different cluster algorithms in benchmarks [1], or in competitions like the 10th DIMACS graph-clustering challenge [2]. Their development has been an active area of research in statistics, data analysis and machine learning for more than a century. One of the oldest and still very well-known measures is that of Jaccard [3]; more recent approaches are due to Horta and Campello [4] and Romano et al. [5]. For an overview of many of these measures, see Appendix B. Besides the need to compare clustering partitions, there is an ongoing discussion of what actually are the best clusters [6,7]. Another problem often addressed is how to measure cluster validity [8,9].
However, the comparison of graph partitions leads to new challenges because of the need to handle graph automorphisms properly. The following small example shows that standard partition comparison measures have unexpected results when applied to graph partitions: in Figure 1, we show two different ways of partitioning the cycle graph $C_4$ (Figure 1a,d). Partitioning means grouping the nodes into non-overlapping clusters. The nodes are arbitrarily labeled with 1 to 4 (Figure 1b,e), and then, there are four possibilities of relabeling the nodes so that the edges stay the same. One possibility is relabeling 1 by 2, 2 by 3, 3 by 4 and 4 by 1, and the images resulting from this relabeling are shown in Figure 1c,f. The relabeling corresponds to a counterclockwise rotation of the graph by 90°, and formal details are given in Section 2. The effects of this relabeling on the partitions $\mathcal{P}_1$ and $\mathcal{Q}_1$ are different:
1. Partition $\mathcal{P}_1 = \{\{1, 2\}, \{3, 4\}\}$ is mapped to the structurally equivalent partition $\mathcal{P}_2 = \{\{1, 4\}, \{2, 3\}\}$.
2. Partition $\mathcal{Q}_1 = \{\{1, 3\}, \{2, 4\}\}$ is mapped to the identical partition $\mathcal{Q}_2$.

Figure 1. Equally-colored nodes represent graph clusters, and the choice of colors is arbitrary. Adding, again arbitrary, but fixed, node labels impacts the node partitions and results in the failure to recognize the structural difference when comparing these partitions with partition comparison measures (see Table 1). The different images (b,c) ($\mathcal{P}_1 = \{\{1, 2\}, \{3, 4\}\}$, $\mathcal{P}_2 = \{\{1, 4\}, \{2, 3\}\}$) and (e,f) ($\mathcal{Q}_1 = \mathcal{Q}_2 = \{\{1, 3\}, \{2, 4\}\}$) emerge from the graph's symmetry.
Table 1 illustrates the failure of partition comparison measures (here, the Rand Index (RI)) to recognize structural differences:
1. Because $\mathcal{P}_1$ and $\mathcal{P}_2$ are structurally equivalent, the RI should be one (as for Cases 1, 2 and 3) instead of 1/3.
2. Comparisons of structurally different partitions (Cases 4 and 5) and comparisons of structurally equivalent partitions (Case 6) should not result in the same value.

Table 1. The Rand index is $RI = \frac{N_{11} + N_{00}}{N_{11} + N_{10} + N_{01} + N_{00}}$. $N_{11}$ is the number of node pairs that are together in a cluster in both partitions; $N_{10}$ and $N_{01}$ are the numbers of node pairs that are together in a cluster in one partition, but not in the other; and $N_{00}$ is the number of node pairs that are in different clusters in both partitions. See Appendix B for the formal definitions. Partitions $\mathcal{P}_1$ and $\mathcal{P}_2$ are equivalent (yet not equal, denoted "∼"), and partitions $\mathcal{Q}_1$ and $\mathcal{Q}_2$ are identical (thus, also equivalent, denoted "="). However, the comparison of the structurally different partitions (denoted "≠") $\mathcal{P}_i$ and $\mathcal{Q}_j$ yields the same result as the comparison between the equivalent partitions $\mathcal{P}_1$ and $\mathcal{P}_2$. This makes the recognition of structural differences impossible.

Table 1 column headers: Case; Compared Partitions; Relation; $N_{11}$; $N_{10}$; $N_{01}$; $N_{00}$; RI.

One may argue that graphs in real applications contain symmetries only rarely. However, recent investigations of graph symmetries in real graph datasets show that a non-negligible proportion of these graphs contain symmetries. MacArthur et al. state that "a certain degree of symmetry is also ubiquitous in complex systems" [10] (p. 3525). Their study includes a small number of biological, technological and social networks. In addition, Darga et al. [11] studied automorphism groups in very large sparse graphs (circuits, road networks and the Internet router network) with up to five million nodes and eight million links, with execution times below 10 s. Katebi et al. [12] reported symmetries in 268 of 432 benchmark graphs. A recent large-scale study conducted by the authors of this article for approximately 1700 real-world graphs revealed that about three quarters of these graphs contain symmetries [13].
The rather frequent occurrence of symmetries in graphs and the obvious deficiencies of classic partition comparison measures demonstrated above have motivated our analysis of the effects of graph automorphisms on partition comparison measures.
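The deficiency demonstrated above for the cycle graph $C_4$ can be reproduced with a few lines of code. The following is a minimal sketch, not the authors' implementation (the article's computations use the R package partitionComparison); the function names `pair_counts` and `rand_index` are illustrative choices, and partitions are assumed to be given as lists of Python sets over the same node set.

```python
from fractions import Fraction
from itertools import combinations


def pair_counts(p, q):
    """Count node pairs (N11, N10, N01, N00) for two partitions of the same node set."""
    cluster_p = {v: i for i, c in enumerate(p) for v in c}
    cluster_q = {v: i for i, c in enumerate(q) for v in c}
    n11 = n10 = n01 = n00 = 0
    for u, v in combinations(sorted(cluster_p), 2):
        same_p = cluster_p[u] == cluster_p[v]
        same_q = cluster_q[u] == cluster_q[v]
        if same_p and same_q:
            n11 += 1
        elif same_p:
            n10 += 1
        elif same_q:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def rand_index(p, q):
    """Rand index RI = (N11 + N00) / (N11 + N10 + N01 + N00) as an exact fraction."""
    n11, n10, n01, n00 = pair_counts(p, q)
    return Fraction(n11 + n00, n11 + n10 + n01 + n00)


# The labeled partitions of C4 from the introductory example.
P1 = [{1, 2}, {3, 4}]
P2 = [{1, 4}, {2, 3}]
Q1 = [{1, 3}, {2, 4}]
Q2 = [{1, 3}, {2, 4}]

comparisons = {"P1,P2": (P1, P2), "P1,Q1": (P1, Q1), "Q1,Q2": (Q1, Q2)}
for name, (a, b) in comparisons.items():
    print(name, pair_counts(a, b), rand_index(a, b))
```

The equivalent pair (P1, P2) and the structurally different pair (P1, Q1) both yield RI = 1/3, so the Rand index alone cannot tell the two situations apart.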
Our contribution has the following structure: Permutation groups and graph automorphisms are introduced in Section 2. The full automorphism group of the butterfly graph serves as a motivating example for the formal definition of stable partitions, that is, partitions that are stable with regard to the actions of the automorphism group of a graph. In Section 3, we first provide a definition that captures the property that a measure is invariant with regard to the transformations in an automorphism group. Based on this definition, we then give a simple proof by counterexample for each partition comparison measure in Appendix B that these measures, which are based on the comparison of two partitions, are not invariant to the effects of automorphisms on partitions. The non-existence of partition comparison measures for which the identity and the invariance axioms hold simultaneously is proven subsequently. In Section 4, we construct three families of invariant partition comparison measures by a two-step process: First, we define a pseudometric space by defining equivalence classes of partitions as the orbit of a partition under the automorphism group Aut(G). Second, the definitions of the invariant counterpart of a partition comparison measure are given: we define them as the computation of the maximum, the minimum and the average over the direct product of the two equivalence classes. The section also contains a proof of the equivalence of several variants of the computation of the invariant measures, which, by exploiting the group properties of Aut(G), differ in the complexity of the computation. In Section 5, we introduce the decomposition of the measures into a structurally stable and unstable part, as well as upper bounds for instability. In Section 6, we present an application of the decomposition of measures for analyzing partitions of the Karate graph. The article ends with a short discussion, conclusion and outlook in Section 7.

Graphs, Permutation Groups and Graph Automorphisms
We consider connected, undirected, unweighted and loop-free graphs. Let G = (V, E) denote a graph where V is a finite set of nodes and E is a set of edges. An edge is represented as $\{u, v\} \in \{\{x, y\} \mid (x, y) \in V \times V \wedge x \neq y\}$. Nodes adjacent to $u \in V$ (there exists an edge between u and those nodes) are called neighbors. A partition $\mathcal{P}$ of a graph G is a set of subsets $C_i$, $i = 1, \ldots, k$ of V with the usual properties: (i) $\bigcup_{i} C_i = V$, (ii) $C_i \cap C_j = \emptyset$ for $i \neq j$ and (iii) $C_i \neq \emptyset$. Each subset is called a cluster, and it is identified by its labeled nodes.
As a partition quality criterion, we use the well-known modularity measure Q of Newman and Girvan [14] (see Appendix A). It is a popular optimization criterion for unsupervised graph clustering algorithms, which try to partition the nodes of the graph in a way that the connectivity within the clusters is maximized and the number of edges connecting the clusters is minimized. For a fast and efficient randomized state-of-the-art algorithm, see Ovelgönne and Geyer-Schulz [15].
Partitions are compared by comparison measures, which are functions of the form m : P(V) × P(V) → R where P(V) denotes the set of all possible partitions of the set V. A survey of many of these measures is given in Appendix B.
A permutation on V is a bijection $g : V \to V$. We denote permutations by the symbols f, g and h. Each permutation can be written in cycle form: for a permutation with a single cycle of length r, we write $c = (v_1\, v_2 \ldots v_r)$. c maps $v_i$ to $v_{i+1}$ ($i = 1, \ldots, r - 1$) and $v_r$ to $v_1$, and it leaves all other nodes fixed. Permutations with more than one cycle are written as a product of disjoint cycles (i.e., no two cycles have a common element). $(v_k)$ means that the element $v_k$ remains fixed, and for brevity, these elements are omitted.
Permutations are applied from the right: The image of u under the permutation g is $ug$. The composition of g and h is $h \circ g$, with $\circ$ being the permutation composition symbol. For brevity, $h \circ g$ is written as $gh$, so that $u(gh) = (ug)h$ holds. Computer scientists call this a postfix notation; in prefix notation, we have $h(g(u))$. Often, we also find $u^g$, which we will use in the following. For k compositions $g \circ g \circ g \circ \ldots$, we write $g^k$ and $g^0 = id$.
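The cycle form and the right-action convention can be made concrete in code. The sketch below is illustrative only; the helper names `perm_from_cycles`, `compose` and `power` are assumptions and do not come from the article. A permutation is stored as a dictionary that maps each node to its image.

```python
def perm_from_cycles(cycles, points):
    """Build a permutation (dict node -> image) from disjoint cycles; other nodes stay fixed."""
    g = {v: v for v in points}
    for cycle in cycles:
        for i, v in enumerate(cycle):
            g[v] = cycle[(i + 1) % len(cycle)]
    return g


def compose(g, h):
    """Product gh in right-action notation: apply g first, then h, so u^(gh) = (u^g)^h."""
    return {u: h[g[u]] for u in g}


def power(g, k):
    """k-fold composition g^k, with g^0 = id."""
    result = {u: u for u in g}
    for _ in range(k):
        result = compose(result, g)
    return result


V = [1, 2, 3, 4]
g = perm_from_cycles([(1, 2, 3, 4)], V)  # 1 -> 2, 2 -> 3, 3 -> 4, 4 -> 1
h = perm_from_cycles([(1, 3)], V)        # swaps 1 and 3, fixes 2 and 4

print(compose(g, h))  # g first, then h
print(compose(h, g))  # h first, then g
print(power(g, 4))    # a cycle of length 4 has order 4
```

The two products differ, which is why the order convention has to be fixed before orbits and images of partitions are computed.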
A set of permutation functions forms a permutation group H, if the usual group axioms hold [16]:
1. Closure: $\forall g, h \in H : g \circ h \in H$.
2. Unit element: The identity function $id \in H$ acts as the neutral element: $\forall g \in H : id \circ g = g \circ id = g$.
3. Inverse element: For any g in H, the inverse permutation function $g^{-1} \in H$ is the inverse of g: $\forall g \in H : g \circ g^{-1} = g^{-1} \circ g = id$.
4. Associativity: $\forall f, g, h \in H : f \circ (g \circ h) = (f \circ g) \circ h$.

The set of all permutations of V is denoted by Sym(V). Sym(V) is a group, and it is called the symmetric group (see [17]). $Sym(V) \sim Sym(V')$ iff $|V| = |V'|$ with ∼ denoting isomorphism. A generator of a finite permutation group H is a subset of the permutations of H from which all permutations in H can be generated by application of the group axioms [18].
An action of H on V (H acts on V) is called the group action of a set [19] (p. 5): it is a map $V \times H \to V$, $(u, g) \mapsto u^g$. Groups acting on a set V also act on combinatorial structures defined on V [20] (p. 149), for example the power set $2^V$, the set of all partitions P(V) or the set of graphs G(V). We denote combinatorial structures as capital calligraphic letters; in the following, only partitions ($\mathcal{P}$) are of interest because they are the results of graph cluster algorithms. The action of a permutation g on a combinatorial structure is performed by pointwise application of g. For instance, for $\mathcal{P}$, the image of g is

$$\mathcal{P}^g = \{\{u^g \mid u \in C\} \mid C \in \mathcal{P}\}.$$

Let H be a permutation group. When H acts on V, a node u is mapped by the elements of H onto other nodes. The set of these images is called the orbit of u under H:

$$u^H = \{u^g \mid g \in H\}.$$

The group of permutations $H_u$ that fixes u is called the stabilizer of u under H:

$$H_u = \{g \in H \mid u^g = u\}.$$

The orbit stabilizer theorem is given without proof [16]. It links the order of a permutation group with the cardinality of an orbit and the order of the stabilizer:

Theorem 1. $|H| = |u^H| \cdot |H_u|$.

The action of H on V induces an equivalence relation on the set: for $u_1, u_2 \in V$, $u_1 \sim u_2$ iff there exists $g \in H$ with $u_1^g = u_2$. All elements of an orbit are equivalent, and the orbits of a group partition the set V. An orbit of length one (in terms of set cardinality) is called trivial. Analogously, for a partition $\mathcal{P}$, the definition is:

Definition 1. The image of the action of H on a partition $\mathcal{P}$ (or the orbit of $\mathcal{P}$ under H) is the set of all equivalent partitions of partition $\mathcal{P}$ under H:

$$\mathcal{P}^H = \{\mathcal{P}^g \mid g \in H\}.$$

A graph automorphism f is a permutation that preserves edges, i.e., $\{u^f, v^f\} \in E \Leftrightarrow \{u, v\} \in E$ for all $u, v \in V$. The automorphism group of a graph contains all permutations of vertices that map edges to edges and non-edges to non-edges. The automorphism group of G is defined as:

$$Aut(G) = \{f \in Sym(V) \mid E^f = E\}, \quad \text{where } E^f = \{\{u^f, v^f\} \mid \{u, v\} \in E\}.$$

Example 1. Let $G_{bf}$ be the butterfly graph (Figure 2, e.g., Erdős et al. [21], Burr et al. [22]) whose full automorphism group is given in Table 2 (first column). The permutation (2 5) is not an automorphism, because it does not preserve the edges from 1 to 2 and from 5 to 4.
The butterfly graph has the two orbits {1, 2, 4, 5} and {3}.

Table 2. The full automorphism group $Aut(G_{bf}) = \{id, g_1, \ldots, g_7\}$ of the butterfly graph in Figure 2 and its effect on three partitions. Bold partitions are distinct. A possible generator is $\{g_1, g_4\}$.
Example 2. Only $\mathcal{P}_1$ in Table 2 is stable because its orbit is trivial. The two modularity optimal partitions (e.g., $\mathcal{P}_2^{id}$ and $\mathcal{P}_2^{g_4}$ in Table 2) belong to the same nontrivial orbit and are therefore not stable.

For the evaluation of graph clustering solutions, the effects of graph automorphisms on graph partitions are of considerable importance:
1. Automorphisms may lead to multiple equivalent optimal solutions as the butterfly graph shows ($\mathcal{P}_2^{id}$ and $\mathcal{P}_2^{g_4}$ in Table 2).
2. Partition comparison measures are not invariant with regard to automorphisms, as we show in Section 3.
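For a graph as small as the butterfly graph, the automorphism group and the node orbits of Example 1 can be checked by brute force. The sketch below makes one assumption that is not spelled out in the text: the edge list is taken to be two triangles {1, 2, 3} and {3, 4, 5} sharing node 3, which is consistent with the caption of Figure 2 and with the edges named in Example 1. The function names are illustrative, and the exhaustive search over all |V|! permutations is only meant for tiny graphs.

```python
from itertools import permutations

V = (1, 2, 3, 4, 5)
# Assumed edge list of the butterfly graph: triangles {1,2,3} and {3,4,5} joined at node 3.
E = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]}


def is_automorphism(g):
    """True if the permutation g (dict node -> image) maps the edge set onto itself."""
    return {frozenset(g[u] for u in e) for e in E} == E


def automorphism_group():
    """Enumerate Aut(G) by testing every permutation of V."""
    candidates = (dict(zip(V, image)) for image in permutations(V))
    return [g for g in candidates if is_automorphism(g)]


def node_orbits(group):
    """Orbits of the nodes: for each node, the set of its images under the group."""
    return {frozenset(g[u] for g in group) for u in V}


AUT = automorphism_group()
swap_2_5 = {1: 1, 2: 5, 3: 3, 4: 4, 5: 2}  # the permutation (2 5) from Example 1

print(len(AUT))
print(sorted(sorted(orbit) for orbit in node_orbits(AUT)))
print(is_automorphism(swap_2_5))
```

Under the assumed edge list, the search finds eight automorphisms and the two node orbits stated above, and it rejects the permutation (2 5).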

Graph Partition Comparison Measures Are Not Invariant
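When comparing graph partitions, a natural requirement is that the partition comparison measure is invariant under automorphism.

Definition 3. A partition comparison measure $m : P(V) \times P(V) \to \mathbb{R}$ is invariant under automorphism, if

$$m(\mathcal{P}, \mathcal{Q}) = m(\tilde{\mathcal{P}}, \tilde{\mathcal{Q}}) \qquad (1)$$

for all $\tilde{\mathcal{P}} \in \mathcal{P}^{Aut(G)}$ and $\tilde{\mathcal{Q}} \in \mathcal{Q}^{Aut(G)}$.

Observe that if $\mathcal{Q} \in \mathcal{P}^{Aut(G)}$, then such a measure m cannot distinguish between $\mathcal{P}$ and $\mathcal{Q}$, since $m(\mathcal{P}, \mathcal{Q}) = m(\mathcal{P}, \mathcal{P})$ by definition.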
Unfortunately, as we show in the rest of this section, such a partition comparison measure does not exist. In the following, we present two proofs of this fact, which differ both in their level of generality and sophistication.

Variant 1: Construction of a Counterexample
Theorem 2. The measures for comparing partitions defined in Appendix B do not fulfill Definition 3 in general.
Proof. We choose the cycle graph $C_{36}$ and compute all modularity maximal partitions with Q = 2/3. Each of these six partitions has six clusters, and each of these clusters consists of a chain of six nodes (see Figure 3).
Clearly, since all partitions are equivalent, an invariant partition comparison measure should identify them as equivalent: $m(\mathcal{P}_0, \mathcal{P}_0) = m(\mathcal{P}_0, \mathcal{P}_0^{g^k})$ for $k = 1, \ldots, 5$. Computing $m(\mathcal{P}_0, \mathcal{P}_0^{g^k})$ for $k = 0, \ldots, 5$ produces Table 3. Because the values in each row differ (in contrast to the requirements defined by Equation (1)), each row of Table 3 contains the counterexample for the measure used.
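The counterexample can be replayed for a single measure. The sketch below uses the Rand index only (Table 3 covers many more measures, computed by the authors with the R package partitionComparison) and assumes that the initial partition consists of the six chains {1, ..., 6}, {7, ..., 12}, and so on, with g = (1 2 ... 36) mapping each node to its successor. The names `rotate` and `rand_index` are illustrative.

```python
from fractions import Fraction
from itertools import combinations

N = 36
# Assumed initial partition P0: six chains of six consecutive nodes on the cycle.
P0 = [set(range(6 * c + 1, 6 * c + 7)) for c in range(6)]


def rotate(partition, k):
    """Apply g^k with g = (1 2 ... 36): node v is mapped to v + k, wrapping around at 36."""
    return [{(v - 1 + k) % N + 1 for v in cluster} for cluster in partition]


def rand_index(p, q):
    """Rand index: fraction of node pairs on which both partitions agree."""
    cluster_p = {v: i for i, c in enumerate(p) for v in c}
    cluster_q = {v: i for i, c in enumerate(q) for v in c}
    pairs = list(combinations(range(1, N + 1), 2))
    agree = sum(
        (cluster_p[u] == cluster_p[v]) == (cluster_q[u] == cluster_q[v])
        for u, v in pairs
    )
    return Fraction(agree, len(pairs))


# Compare P0 with each of the six equivalent partitions P0^(g^k), k = 0, ..., 5.
values = [rand_index(P0, rotate(P0, k)) for k in range(6)]
for k, ri in enumerate(values):
    print(k, ri)
```

An invariant measure would return the same value for all six comparisons; the Rand index returns one for k = 0 only and smaller, varying values for the rotated partitions.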

Variant 2: Inconsistency of the Identity and the Invariance Axiom
Theorem 3. Let G = (V, E) be a graph with |V| > 2 and nontrivial Aut(G). For partition comparison measures $m : P(V) \times P(V) \to \mathbb{R}$, it is impossible to fulfill jointly the identity axiom, $m(\mathcal{P}, \mathcal{Q}) = c$ if and only if $\mathcal{P} = \mathcal{Q}$ for all $\mathcal{P}, \mathcal{Q} \in P(V)$ (e.g., c = 0 for a distance measure, c = 1 for a similarity measure, etc.), and the axiom of invariance (from Definition 3), $m(\mathcal{P}, \mathcal{Q}) = c$ for all $\mathcal{Q} \in \mathcal{P}^{Aut(G)}$.

Proof.
1. Since Aut(G) is nontrivial, a nontrivial orbit with at least two different partitions, namely $\mathcal{P}$ and $\mathcal{Q}$, exists because $|\mathcal{P}^{Aut(G)}| > 1$. It follows from the invariance axiom that $m(\mathcal{P}, \mathcal{Q}) = c$.
2. The identity axiom implies that it follows from $m(\mathcal{P}, \mathcal{Q}) = c$ that $\mathcal{P} = \mathcal{Q}$.
3. This contradicts the assumption that $\mathcal{P}$ and $\mathcal{Q}$ are different.
Table 3 (caption, continued). The six optimal partitions consist of six clusters (see Figure 3). The number of pairs in the same cluster in both partitions is denoted by $N_{11}$, in different clusters by $N_{00}$ and in the same cluster in one partition, but not in the other, by $N_{01}$ or $N_{10}$. For the definitions of all partition comparison measures, see Appendix B. To compute this table, the R package partitionComparison has been used [23].

Figure 3 (caption, continued). As a consequence of a single application of g = (1 2 . . . 36), in each cluster, one node drops out and is added to another cluster: For instance, Node 1 drops out of the "original" cluster C = {1, 2, 3, 4, 5, 6}, and Node 7 is added, resulting in $C^g$ = {2, 3, 4, 5, 6, 7}. All dropped nodes are shown in light gray.

The Construction of Invariant Measures for Finite Permutation Groups
The purpose of this section is to construct invariant counterparts for most of the partition comparison measures in Appendix B. We proceed in two steps:
1. We construct a pseudometric space from the images of the actions of Aut(G) on partitions in P(V) (Definition 1).
2. We extend the metrics for partition comparison by constructing invariant metrics on the pseudometric space of partitions.

The Construction of the Pseudometric Space of Equivalence Classes of Graph Partitions
We use a variant of Doob's concept of a pseudometric space [24] (p. 5). A metric for a space S (with $s, t, u \in S$) is a function $d : S \times S \to \mathbb{R}^+$ for which the following holds:
1. Symmetry: $d(s, t) = d(t, s)$.
2. Identity: $d(s, t) = 0$ if and only if $s = t$.
3. Triangle inequality: $d(s, u) \leq d(s, t) + d(t, u)$.

A pseudometric space $(S, d^*)$ relaxes the identity condition to $d^*(s, s) = 0$. The distance between two elements $s_1, s_2$ of an equivalence class [s] is defined as $d^*(s_1, s_2) = 0$ by Definition 3.
For graphs, S is the finite set of partitions P(V) and $S^*$ is the partition of P(V) into orbits of Aut(G): $S^* = \{\mathcal{P}^{Aut(G)} \mid \mathcal{P} \in P(V)\}$. A partition $\mathcal{P}$ in S corresponds to its orbit $\mathcal{P}^{Aut(G)}$ in $S^*$. The relations between the spaces used in the following are:
1. (S, d) is a metric space with S = P(V) and with the function $d : P(V) \times P(V) \to \mathbb{R}$.
2. $(S^*, d^*)$ is the space of equivalence classes with the function $d^*$. We construct three variants of $d^*$ in Section 4.2.
3. $(S, d^*)$ is the pseudometric space with S = P(V) and with the metric $d^*$. The partitions in S are mapped to arguments of $d^*$ by the transformation $ec : P(V) \to P(V)^{Aut(G)}$, which is defined as $ec(\mathcal{P}) := \mathcal{P}^{Aut(G)}$.
Table 4 illustrates $S^*$ (the space of equivalence classes) of the pseudometric space $(S, d^*)$ of the butterfly graph (shown in Figure 2). $S^*$ is the partition of P({1, 2, 3, 4, 5}) into 17 equivalence classes. Only the four classes $E_1$, $E_8$, $E_{12}$ and $E_{17}$ are stable because they are trivial orbits. The three partitions from Table 2 are contained in the following equivalence classes: $\mathcal{P}_1 \in E_8$, $\mathcal{P}_2 \in E_{14}$, and $\mathcal{P}_3 \in E_{13}$.
Table 4. The equivalence classes of the pseudometric space $(S, d^*)$ of the butterfly graph (see Figure 2). Classes are grouped by their partition type, which is the corresponding integer partition. k is the number of partitions per type; l is the number of clusters the partitions of a type consist of; $dia_{1-RI}$ is the diameter (see Equation (2)) of the equivalence class computed for the distance $d_{RI}$, which is obtained from the Rand Index (RI) by 1 − RI.
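The number of equivalence classes and the number of stable classes can be verified by enumerating all partitions of the node set and grouping them into orbits. This is a brute-force sketch for illustration, not the authors' algorithm; it assumes the butterfly edge list of two triangles sharing node 3, and all function names are hypothetical.

```python
from itertools import permutations

V = (1, 2, 3, 4, 5)
# Assumed edge list of the butterfly graph: triangles {1,2,3} and {3,4,5} joined at node 3.
E = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]}
AUT = [
    g
    for g in (dict(zip(V, image)) for image in permutations(V))
    if {frozenset(g[u] for u in e) for e in E} == E
]


def set_partitions(items):
    """Yield every partition of a list as a list of clusters (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # Put `first` into each existing cluster in turn ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or into a new cluster of its own.
        yield [[first]] + smaller


def image(partition, g):
    """Image of a partition under g: apply g to every node of every cluster."""
    return frozenset(frozenset(g[u] for u in cluster) for cluster in partition)


def orbit(partition):
    """Equivalence class of a partition: its orbit under Aut(G)."""
    return frozenset(image(partition, g) for g in AUT)


all_partitions = list(set_partitions(list(V)))
classes = {orbit(p) for p in all_partitions}
stable_classes = [c for c in classes if len(c) == 1]

print(len(all_partitions), len(classes), len(stable_classes))
```

The 52 partitions of a five-element set fall into 17 orbits, four of which are trivial, in agreement with the counts stated for Table 4.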

The Construction of Left-Invariant and Additive Measures on the Pseudometric Space of Equivalence Classes of Graph Partitions
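In the following, we consider only partition comparison measures, which are distance functions of a metric space. Note that a normalized similarity measure s can be transformed into a distance by the transformation d = 1 − s.
In a pseudometric space $(S, d^*)$, we measure the distance $d^*(\mathcal{P}, \mathcal{Q})$ between equivalence classes (which are sets) of partitions instead of the distance $d(\mathcal{P}, \mathcal{Q})$ between partitions. The partitions $\mathcal{P}$ and $\mathcal{Q}$ are formal arguments of $d^*$, which are expanded to equivalence classes by $\mathcal{P}^{Aut(G)}$ and $\mathcal{Q}^{Aut(G)}$. The standard construction of a distance measure between sets has been developed for the point set topology and is due to Felix Hausdorff [25] (p. 166) and Kazimierz Kuratowski [26] (p. 209). For finite sets, it requires the computation of the distances for all pairs of the direct product of the two sets. Since for finite permutation groups, we deal with distances between two finite sets of partitions, we use the following definitions for the lower and upper measures, respectively. Both definitions have the form of an optimization problem:

$$d^*_L(\mathcal{P}, \mathcal{Q}) = \min_{\tilde{\mathcal{P}} \in \mathcal{P}^{Aut(G)},\ \tilde{\mathcal{Q}} \in \mathcal{Q}^{Aut(G)}} d(\tilde{\mathcal{P}}, \tilde{\mathcal{Q}})$$

and:

$$d^*_U(\mathcal{P}, \mathcal{Q}) = \max_{\tilde{\mathcal{P}} \in \mathcal{P}^{Aut(G)},\ \tilde{\mathcal{Q}} \in \mathcal{Q}^{Aut(G)}} d(\tilde{\mathcal{P}}, \tilde{\mathcal{Q}}).$$

The diameter of a finite equivalence class of partitions is defined by

$$dia(\mathcal{P}) = \max_{\tilde{\mathcal{P}}, \hat{\mathcal{P}} \in \mathcal{P}^{Aut(G)}} d(\tilde{\mathcal{P}}, \hat{\mathcal{P}}). \qquad (2)$$

The third option of defining a distance between two finite equivalence classes of partitions, taking the average distance, is due to John von Neumann [27]:

$$d^*_{av}(\mathcal{P}, \mathcal{Q}) = \frac{1}{|\mathcal{P}^{Aut(G)}| \, |\mathcal{Q}^{Aut(G)}|} \sum_{\tilde{\mathcal{P}} \in \mathcal{P}^{Aut(G)}} \sum_{\tilde{\mathcal{Q}} \in \mathcal{Q}^{Aut(G)}} d(\tilde{\mathcal{P}}, \tilde{\mathcal{Q}}).$$

Note that the definitions for $d^*_L$, $d^*_U$ and $d^*_{av}$ require the computation of the minimal, maximal and average distance of all pairs of the direct product $\mathcal{P}^{Aut(G)} \times \mathcal{Q}^{Aut(G)}$. The computational complexity of this is quadratic in the size of the larger equivalence class.
Posed as a measurement problem, we can instead fix one partition in one of the orbits and measure the minimal, maximal and average distance between all pairs of either the direct product $\{\mathcal{P}\} \times \mathcal{Q}^{Aut(G)}$ or $\{\mathcal{Q}\} \times \mathcal{P}^{Aut(G)}$. The complexity of this is linear in the size of the smaller equivalence class.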
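Theorems 4 and 5 and their proofs are based on these observations. They are the basis for the development of algorithms for the computation of invariant partition comparison measures of a computational complexity of at most linear order and often of constant order.

Theorem 4. For all $\mathcal{P}^{Aut(G)} \neq \mathcal{Q}^{Aut(G)}$, the following equations hold:

$$d^*_L(\mathcal{P}, \mathcal{Q}) = \min_{\tilde{\mathcal{P}} \in \mathcal{P}^{Aut(G)},\ \tilde{\mathcal{Q}} \in \mathcal{Q}^{Aut(G)}} d(\tilde{\mathcal{P}}, \tilde{\mathcal{Q}}) = \min_{g \in Aut(G)} d(\mathcal{P}, \mathcal{Q}^g) = \min_{h \in Aut(G)} d(\mathcal{P}^h, \mathcal{Q}),$$

$$d^*_U(\mathcal{P}, \mathcal{Q}) = \max_{\tilde{\mathcal{P}} \in \mathcal{P}^{Aut(G)},\ \tilde{\mathcal{Q}} \in \mathcal{Q}^{Aut(G)}} d(\tilde{\mathcal{P}}, \tilde{\mathcal{Q}}) = \max_{g \in Aut(G)} d(\mathcal{P}, \mathcal{Q}^g) = \max_{h \in Aut(G)} d(\mathcal{P}^h, \mathcal{Q}).$$

Proof. Let $\tilde{\mathcal{P}} \in \mathcal{P}^{Aut(G)}$ and $\tilde{\mathcal{Q}} \in \mathcal{Q}^{Aut(G)}$, that is $\tilde{\mathcal{P}} = \mathcal{P}^h$ and $\tilde{\mathcal{Q}} = \mathcal{Q}^g$ for some $g, h \in Aut(G)$. Then, since the orbits of both partitions are generated by Aut(G), the following identities between distances hold:

$$d(\tilde{\mathcal{P}}, \tilde{\mathcal{Q}}) = d(\mathcal{P}^h, \mathcal{Q}^g),$$

as well as:

$$d(\mathcal{P}^h, \mathcal{Q}^g) = d(\mathcal{P}, \mathcal{Q}^{g h^{-1}})$$

and:

$$d(\mathcal{P}^h, \mathcal{Q}^g) = d(\mathcal{P}^{h g^{-1}}, \mathcal{Q}).$$

1. For $d^*_L$, we have:

$$d^*_L(\mathcal{P}, \mathcal{Q}) = \min_{h, g \in Aut(G)} d(\mathcal{P}^h, \mathcal{Q}^g)$$

by switching the reference systems. In the next sequence of equations, we establish that taking the minimum over all reference systems is equivalent to finding the minimum for one arbitrarily fixed reference system:

$$\min_{h, g \in Aut(G)} d(\mathcal{P}^h, \mathcal{Q}^g) = \min_{h, g \in Aut(G)} d(\mathcal{P}, \mathcal{Q}^{g h^{-1}}) = \min_{f \in Aut(G)} d(\mathcal{P}, \mathcal{Q}^f),$$

because $g h^{-1}$ ranges over all of Aut(G). The same argument with $h g^{-1}$ yields $\min_{f \in Aut(G)} d(\mathcal{P}^f, \mathcal{Q})$.

2. For the proof of $d^*_U$ for $\mathcal{P}^{Aut(G)} \neq \mathcal{Q}^{Aut(G)}$, we substitute max for min in the proof of $d^*_L$.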
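Theorem 5. For all $\mathcal{P}^{Aut(G)} \neq \mathcal{Q}^{Aut(G)}$, the following equations hold:

$$d^*_{av}(\mathcal{P}, \mathcal{Q}) = \frac{1}{|\mathcal{P}^{Aut(G)}| \, |\mathcal{Q}^{Aut(G)}|} \sum_{\tilde{\mathcal{P}} \in \mathcal{P}^{Aut(G)}} \sum_{\tilde{\mathcal{Q}} \in \mathcal{Q}^{Aut(G)}} d(\tilde{\mathcal{P}}, \tilde{\mathcal{Q}}) \qquad (3)$$

$$= \frac{1}{|Aut(G)|^2} \sum_{h \in Aut(G)} \sum_{g \in Aut(G)} d(\mathcal{P}^h, \mathcal{Q}^g) \qquad (4)$$

$$= \frac{1}{|\mathcal{Q}^{Aut(G)}|} \sum_{\tilde{\mathcal{Q}} \in \mathcal{Q}^{Aut(G)}} d(\mathcal{P}, \tilde{\mathcal{Q}}) \qquad (5)$$

$$= \frac{1}{|Aut(G)|} \sum_{g \in Aut(G)} d(\mathcal{P}, \mathcal{Q}^g) \qquad (6)$$

$$= \frac{1}{|\mathcal{P}^{Aut(G)}|} \sum_{\tilde{\mathcal{P}} \in \mathcal{P}^{Aut(G)}} d(\tilde{\mathcal{P}}, \mathcal{Q}) \qquad (7)$$

$$= \frac{1}{|Aut(G)|} \sum_{h \in Aut(G)} d(\mathcal{P}^h, \mathcal{Q}). \qquad (8)$$

Proof. For the proof of the equality of the identities of $d^*_{av}$, we use the property of an average of n observations $x_{i,j}$ with k identical groups of size m with $i \in 1, \ldots, k$, $j \in 1, \ldots, m$:

$$\frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{m} x_{i,j} = \frac{k}{k m} \sum_{j=1}^{m} x_{1,j} = \frac{1}{m} \sum_{j=1}^{m} x_{1,j}. \qquad (9)$$

The computation of an average over the group equals the result of the computation of an average over the orbit, because the orbit stabilizer Theorem 1 implies that each element of the orbit is generated $|Aut(G)_{\mathcal{P}}|$ times, and this means that we average $|Aut(G)_{\mathcal{P}}|$ groups of identical values and that Equation (9) applies. This establishes the equality of Expressions (3) and (4), as well as of Expressions (5) and (6) and of Expressions (7) and (8), respectively.
The two decompositions of the direct product $Aut(G) \times Aut(G)$ establish the equality of Expressions (4) and (6), as well as of Expressions (4) and (8).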
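Theorems 4 and 5 can be checked numerically on a small example. The sketch below is an illustration under stated assumptions rather than the authors' algorithm: it uses the butterfly graph with an assumed edge list (two triangles sharing node 3), the Rand distance 1 − RI as d, and two arbitrarily chosen example partitions P and Q from different orbits. It computes the minimum, maximum and average over the full orbit product, over one orbit with a fixed reference partition, and over the group.

```python
from fractions import Fraction
from itertools import combinations, permutations

V = (1, 2, 3, 4, 5)
# Assumed edge list of the butterfly graph: triangles {1,2,3} and {3,4,5} joined at node 3.
E = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]}
AUT = [
    g
    for g in (dict(zip(V, image)) for image in permutations(V))
    if {frozenset(g[u] for u in e) for e in E} == E
]


def part(*clusters):
    """Hashable partition: a frozenset of frozensets."""
    return frozenset(frozenset(c) for c in clusters)


def image(p, g):
    """Image of partition p under the permutation g."""
    return frozenset(frozenset(g[u] for u in c) for c in p)


def orbit(p):
    """Equivalence class of p under Aut(G)."""
    return {image(p, g) for g in AUT}


def together(p, u, v):
    return any(u in c and v in c for c in p)


def d_rand(p, q):
    """Rand distance 1 - RI: fraction of node pairs on which p and q disagree."""
    pairs = list(combinations(V, 2))
    disagreements = sum(together(p, u, v) != together(q, u, v) for u, v in pairs)
    return Fraction(disagreements, len(pairs))


def summarize(distances):
    """Return (minimum, maximum, average), i.e. lower, upper and average measure."""
    distances = list(distances)
    return min(distances), max(distances), sum(distances) / len(distances)


def over_orbit_product(p, q):   # quadratic in the orbit sizes
    return summarize(d_rand(a, b) for a in orbit(p) for b in orbit(q))


def over_fixed_reference(p, q):  # linear: p is fixed, q runs through its orbit
    return summarize(d_rand(p, b) for b in orbit(q))


def over_group(p, q):            # p is fixed, q is moved by every automorphism
    return summarize(d_rand(p, image(q, g)) for g in AUT)


P = part({1, 2, 3}, {4, 5})      # hypothetical example partitions
Q = part({1, 3}, {2}, {4, 5})

print(over_orbit_product(P, Q))
print(over_fixed_reference(P, Q))
print(over_group(P, Q))
```

All three variants return the same triple, so in practice only the cheapest one needs to be evaluated. Exact fractions are used so that the averages can be compared without rounding issues.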
Note that these proofs also show that $d^*_L(\mathcal{P}, \mathcal{Q})$, $d^*_U(\mathcal{P}, \mathcal{Q})$ and $d^*_{av}(\mathcal{P}, \mathcal{Q})$ are invariant. Next, we collect the properties of the three measures and show that they are invariant measures on the pseudometric space.
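Theorem 6. The lower pseudometric space $(S, d^*_L)$ has the following properties:
1. Identity: $d^*_L(\mathcal{P}, \mathcal{Q}) = 0$, if $\mathcal{P}^{Aut(G)} = \mathcal{Q}^{Aut(G)}$.
2. Invariance: $d^*_L(\mathcal{P}, \mathcal{Q}) = d^*_L(\tilde{\mathcal{P}}, \tilde{\mathcal{Q}})$ for all $\tilde{\mathcal{P}} \in \mathcal{P}^{Aut(G)}$ and $\tilde{\mathcal{Q}} \in \mathcal{Q}^{Aut(G)}$.
3. Symmetry: $d^*_L(\mathcal{P}, \mathcal{Q}) = d^*_L(\mathcal{Q}, \mathcal{P})$.
4. Triangle inequality: $d^*_L(\mathcal{P}, \mathcal{R}) \leq d^*_L(\mathcal{P}, \mathcal{Q}) + d^*_L(\mathcal{Q}, \mathcal{R})$.

These properties also hold for the upper pseudometric space $(S, d^*_U)$ and the average pseudometric space $(S, d^*_{av})$.

Proof.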
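1. Identity holds because of the definition of the distance $d^*$ between two elements in an equivalence class of the pseudometric space $(S, d^*)$.
2. Invariance of $d^*_L(\mathcal{P}, \mathcal{Q})$, $d^*_U(\mathcal{P}, \mathcal{Q})$ and $d^*_{av}(\mathcal{P}, \mathcal{Q})$ is proven by Theorems 4 and 5.
3. Symmetry holds, because d is symmetric, and min, max and the average do not depend on the order of their respective arguments.
4. To prove the triangle inequality, we make use of Theorems 4 and 5 and of the fact that d is a metric for which the triangle inequality holds:
(a) For $d^*_L$, it follows:

$$d^*_L(\mathcal{P}, \mathcal{Q}) + d^*_L(\mathcal{Q}, \mathcal{R}) = \min_{h \in Aut(G)} d(\mathcal{P}^h, \mathcal{Q}) + \min_{g \in Aut(G)} d(\mathcal{Q}, \mathcal{R}^g) = \min_{h, g \in Aut(G)} \left( d(\mathcal{P}^h, \mathcal{Q}) + d(\mathcal{Q}, \mathcal{R}^g) \right) \geq \min_{h, g \in Aut(G)} d(\mathcal{P}^h, \mathcal{R}^g) = d^*_L(\mathcal{P}, \mathcal{R}).$$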

Decomposition of Partition Comparison Measures
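In this section, we assess the structural (dis)similarity between two partitions and the effect of the group actions by combining a partition comparison measure and its invariant counterpart defined in Section 4. The distances $d(\mathcal{P}, \mathcal{Q})$, $d^*_L(\mathcal{P}, \mathcal{Q})$, $d^*_U(\mathcal{P}, \mathcal{Q})$ and $d^*_{av}(\mathcal{P}, \mathcal{Q})$ allow the decomposition of a partition comparison measure (transformed into a distance) into a structural component $d_{struc}$ and the effect $d_{Aut(G)}$ of the automorphism group Aut(G):

$$d(\mathcal{P}, \mathcal{Q}) = \underbrace{d^*_L(\mathcal{P}, \mathcal{Q})}_{d_{struc}} + \underbrace{\left( d(\mathcal{P}, \mathcal{Q}) - d^*_L(\mathcal{P}, \mathcal{Q}) \right)}_{d_{Aut(G)}} = \underbrace{d^*_U(\mathcal{P}, \mathcal{Q})}_{d_{struc}} - \underbrace{\left( d^*_U(\mathcal{P}, \mathcal{Q}) - d(\mathcal{P}, \mathcal{Q}) \right)}_{d_{Aut(G)}} = \underbrace{d^*_{av}(\mathcal{P}, \mathcal{Q})}_{d_{struc}} + \underbrace{\left( d(\mathcal{P}, \mathcal{Q}) - d^*_{av}(\mathcal{P}, \mathcal{Q}) \right)}_{d_{Aut(G)}}.$$

$dia(\mathcal{P})$ measures the effect of the automorphism group Aut(G) on the equivalence class $\mathcal{P}^{Aut(G)}$ (see the last column of Table 4). $e^{Aut(G)}_{max}$ is an upper bound of the automorphism effect on the distance of two partitions $\mathcal{P}$ and $\mathcal{Q}$:

$$e^{Aut(G)}_{max} = \min(dia(\mathcal{P}), dia(\mathcal{Q})).$$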
This follows from Theorem 4. Note that $e^{Aut(G)}_{max}$ may be larger than $d^*_U - d^*_L$, as Case 1 in Table 5 shows. In Table 5, we show a few examples of measure decomposition for partitions of the butterfly graph for the Rand distance $d_{RI}$:
1. In Case 1, we compare two partitions from nontrivial equivalence classes: the difference of 0.4 between $d^*_U$ and $d^*_L$ indicates that the potential maximal automorphism effect is larger than the lower measure. In addition, it is also smaller (by 0.2) than the automorphism effect in each of the equivalence classes. That $d_{Aut(G)}$ is zero for the lower measure implies that the pair $(\mathcal{P}, \mathcal{Q})$ is a pair with the minimal distance between the equivalence classes. The fact that $d^*_{av} = 0.5$ is the mid-point between the lower and upper measures indicates a symmetric distribution of the distances between the equivalence classes.
2. That $d_{Aut(G)}$ is zero for the upper measure in Case 2 means that we have found a pair with the maximal distance between the equivalence classes.
3. In Case 3, we have also found a pair with maximal distance between the equivalence classes. However, the maximal potential automorphism effect is smaller than for Cases 1 and 2. In addition, the distribution of distances between the equivalence classes is asymmetric.
4. Case 4 shows the comparison of a partition from a trivial with a partition from a non-trivial equivalence class. Note that in this case, all three invariant measures, as well as $d_{RI}$, coincide and that no automorphism effect exists.
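The decomposition discussed for Table 5 can be sketched in code as follows. This is an illustration under stated assumptions, not the authors' implementation: the butterfly edge list (two triangles sharing node 3) and the two example partitions are assumptions, the Rand distance 1 − RI plays the role of d, and the sign convention takes the automorphism effect as d minus the invariant measure for the lower and average measures and as the invariant measure minus d for the upper measure.

```python
from fractions import Fraction
from itertools import combinations, permutations

V = (1, 2, 3, 4, 5)
# Assumed edge list of the butterfly graph: triangles {1,2,3} and {3,4,5} joined at node 3.
E = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]}
AUT = [
    g
    for g in (dict(zip(V, image)) for image in permutations(V))
    if {frozenset(g[u] for u in e) for e in E} == E
]


def part(*clusters):
    return frozenset(frozenset(c) for c in clusters)


def orbit(p):
    """Equivalence class of p: all images of p under Aut(G)."""
    return {frozenset(frozenset(g[u] for u in c) for c in p) for g in AUT}


def d_rand(p, q):
    """Rand distance 1 - RI: fraction of node pairs on which p and q disagree."""
    def together(x, u, v):
        return any(u in c and v in c for c in x)
    pairs = list(combinations(V, 2))
    return Fraction(sum(together(p, u, v) != together(q, u, v) for u, v in pairs), len(pairs))


def diameter(p):
    """dia(P): largest distance between two members of the equivalence class of p."""
    members = list(orbit(p))
    return max(d_rand(a, b) for a in members for b in members)


def decompose(p, q):
    """Classic distance, invariant measures, automorphism effects and the bound e_max."""
    distances = [d_rand(p, b) for b in orbit(q)]  # fixed reference p (Theorems 4 and 5)
    d = d_rand(p, q)
    lower, upper = min(distances), max(distances)
    average = sum(distances) / len(distances)
    return {
        "d": d,
        "lower": lower, "effect_lower": d - lower,
        "upper": upper, "effect_upper": upper - d,
        "average": average, "effect_average": d - average,
        "e_max": min(diameter(p), diameter(q)),
    }


P = part({1, 2, 3}, {4, 5})       # hypothetical example partitions
Q = part({1, 4}, {2, 3}, {5})
result = decompose(P, Q)
for key, value in result.items():
    print(key, value)
```

A zero lower effect means that the given pair already realizes the minimal distance between the two classes, and a zero upper effect means that it realizes the maximal one. The spread between the upper and lower measure never exceeds `e_max`.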
A different approach to measure the potential instability in clustering a graph G is the computation of the Kolmogorov-Sinai entropy of the finite permutation group Aut(G) acting on the graph [28].
Note that the Kolmogorov-Sinai entropy of a finite permutation group is a measure of the uncertainty of the automorphism group. It cannot be used as a measure to compare two graph partitions.

Invariant Measures for the Karate Graph
In this section, we illustrate the use of invariant measures for the three partitions $\mathcal{P}_O$, $\mathcal{P}_1$ and $\mathcal{P}_2$ of the Karate graph K [29], which is shown in Figure 4. Aut(K) is of order 480, and it consists of the three subgroups … (5 11), (6 7)}. In addition to the modularity optimal partition $\mathcal{P}_O$ (with its clusters separated by longer and dashed lines in Figure 4), we use the partitions $\mathcal{P}_1$ and $\mathcal{P}_2$:

$\mathcal{P}_1$ = {{5, 6, 7, 19, 21}, {1, 2, 3, 4, 8, 12, 13, 14, 18, 20, 22}, {9, 10, 11, 15, 16, 17, 23, 27, 30, 31, 33, 34}, {24, 25, 26, 28, 29, 32}}

$\mathcal{P}_2$ = {{5, 6, 7, 8, 12, 19, 21}, {1, 2, 3, 4, 13, 14, 18, 20, 22}, {9, 10, 11, 15, 16, 17, 23, 27, 30, 31, 33, 34}, {24, 25, 26, 28, 29, 32}}

Both partitions are affected by the orbits {15, 16, 19, 21, 23} and {5, 11}, each overlapping two clusters. The dissimilarity to $\mathcal{P}_O$ is larger for $\mathcal{P}_2$, which is reflected in Tables 6 and 7. For the optimal partition $\mathcal{P}_O$ of type (5, 6, 11, 12), the upper bound of the size of the equivalence class is 480 [30] (p. 112). The actual size of the equivalence class of $\mathcal{P}_O$ is one, which means the optimal solution is not affected by Aut(K). Partition $\mathcal{P}_1$, which is of the same type as $\mathcal{P}_O$, also has an upper bound of 480 for its equivalence class. The actual size of the equivalence classes of both $\mathcal{P}_1$ and $\mathcal{P}_2$ is 20. Note that the actual size of the equivalence classes that drive the complexity of computing invariant measures is in our example far below the upper bound. Table 6 shows the diameters of the equivalence classes of the partitions. Table 7 illustrates the decomposition into structural effects and automorphism effects for the three partitions of the Karate graph. We see that for the comparison of a stable partition ($\mathcal{P}_O$) with one of the unstable partitions, the classic partition comparison measures are sufficient. However, when comparing the two unstable partitions $\mathcal{P}_1$ and $\mathcal{P}_2$, the structural effect (0.0499) is dominated by the maximal automorphism effect (0.1176). Furthermore, we note that the distribution of values over the orbit of the automorphism group is asymmetric (by looking at $d^*_L$, $d^*_U$ and $d^*_{av}$).

The analysis of the effects of the automorphism group of the Karate network showed that the automorphism group does not affect the stability of the optimal partition. However, the first results show that the situation is different for other networks like the Internet AS graph with 40,164 nodes and 85,123 edges (see Rossi et al. [31]; the data of the graph tech-internet-as are from Rossi and Ahmed [32]): for this graph, several locally optimal solutions with a modularity value above 0.694 exist, all of which are unstable. Further analysis of the structural properties of the solution landscape of this graph is work in progress.

Discussion, Conclusions and Outlook
In this contribution, we study the effects of graph automorphisms on partition comparison measures. Our main results are:
1. A formal definition of partition stability, namely $\mathcal{P}$ is stable iff $|\mathcal{P}^{Aut(G)}| = 1$.
2. A proof of the non-invariance of all partition comparison measures if the automorphism group is nontrivial (|Aut(G)| > 1).
3. The construction of a pseudometric space of equivalence classes of graph partitions for three classes of invariant measures concerning finite permutation groups of graph automorphisms.
4. The proof that the measures are invariant and that for these measures (after the transformation to a distance), the axioms of a metric space hold.
5. The space of partitions is equipped with a metric (the original partition comparison measure) and a pseudometric (the invariant partition comparison measure).
6. The decomposition of the value of a partition comparison measure into a structural part and a remainder that measures the effect of group actions.
Our definitions of invariant measures have the advantage that any existing partition comparison measure (as long as it is a distance or can be transformed into one) can still be used for the task. Moreover, the decomposition of measures restores the primary purpose of the existing comparison measures, which is to quantify structural difference. However, the construction of these measures leads directly to the classic graph isomorphism problem, whose complexity, despite considerable efforts and hopes to the contrary [33], is still an open theoretical problem [34,35]. From a pragmatic point of view, quite efficient and practically usable algorithms exist today to tackle the graph isomorphism problem [34]. In addition, for very large and sparse graphs, algorithms for finding generators of the automorphism group exist [11]. Therefore, this dependence on a problem that is computationally hard in general is not an actual disadvantage and allows one to implement the presented measure decomposition. The efficient implementation of algorithms for the decomposition of graph partition comparison measures is left for further research.
Another constraint is that we have investigated the effects of automorphisms on partition comparison measures in the setting of graph clustering only.The reason for this restriction is that the automorphism group of the graph is already defined by the graph itself and, therefore, is completely contained in the graph data.For arbitrary datasets, the information about the automorphism group is usually not contained in the data, but must be inferred from background theories.However, provided we know the automorphism group, our results on the decomposition of the measures generalize to arbitrary cluster problems.
All in all, this means that this article provides two major assets: first, it provides a theoretic framework that is independent of the preferred measure and the data.Second, we provide insights into a source of possible partition instability that has not yet been discussed in the literature.The downsides (symmetry group must be known and graph clustering only) are in our opinion not too severe, as we discussed above.Therefore, we think that our study indicates that a better understanding of the principle of symmetry is important for future research in data analysis.

Figure 2. The butterfly graph (five nodes, with two node pairs connected by the bridging node 3).
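Proof of Theorem 6, item 4 (continued). (b) For the proof of the triangle inequality for $d^*_U$, we substitute max for min and $d^*_U$ for $d^*_L$ in the proof of the triangle inequality for $d^*_L$. (c) For $d^*_{av}$, it follows:

$$d^*_{av}(\mathcal{P}, \mathcal{Q}) + d^*_{av}(\mathcal{Q}, \mathcal{R}) = \frac{1}{|Aut(G)|^2} \sum_{h, g \in Aut(G)} \left( d(\mathcal{P}^h, \mathcal{Q}) + d(\mathcal{Q}, \mathcal{R}^g) \right) \geq \frac{1}{|Aut(G)|^2} \sum_{h, g \in Aut(G)} d(\mathcal{P}^h, \mathcal{R}^g) = d^*_{av}(\mathcal{P}, \mathcal{R}).$$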

Figure 4. Zachary's Karate graph K with the vertices of the orbits of the three subgroups of Aut(K) in bold and the clusters of $\mathcal{P}_O$ separated by dashed edges.

Table 1. The Rand index is $RI = \frac{N_{11} + N_{00}}{N_{11} + N_{10} + N_{01} + N_{00}}$.

Table 3. Comparing the modularity maximizing partitions of the cycle graph $C_{36}$ with modularity Q = 2/3. Table section header: Pair counting measures ($f(N_{11}, N_{00}, N_{01}, N_{10})$; see Tables A1 and A2).

Figure 3. The cycle graph $C_{36}$ (the "outer" cycle) and an initial partition of six clusters (connected nodes of the same color, separated by dashed lines). A single application of g = (1 2 . . . 36) "rotates" the graph by one node (the "inner" cycle $C_{36}^g$).

Table 5. Measure decomposition for partitions of the butterfly graph for the Rand distance $d_{RI} = 1 - RI$.

Table 6. Diameter (computed using $d_{RI}$), orbit size and stability of partitions $\mathcal{P}_O$, $\mathcal{P}_1$ and $\mathcal{P}_2$.

Table 7. Invariant measures and automorphism effects for the Karate graph. The R package partitionComparison has been used for the computations [23].