# How Symmetric Are Real-World Graphs? A Large-Scale Study


## Abstract


`networkrepository.com` is carried out and a normalized version of the “network redundancy” measure is presented. It quantifies graph symmetry in terms of the number of orbits of the symmetry group from zero (no symmetries) to one (completely symmetric), and improves the recognition of asymmetric graphs. Over 70% of the analyzed graphs contain symmetries (i.e., graph automorphisms), independent of size and modularity. Therefore, we conclude that real-world graphs are likely to contain symmetries. This contribution is the first larger-scale study of symmetry in graphs and it shows the necessity of handling symmetry in data analysis: The existence of symmetries in graphs is the cause of two problems in graph clustering we are aware of, namely, the existence of multiple equivalent solutions with the same value of the clustering criterion and, secondly, the inability of all standard partition-comparison measures of cluster analysis to identify automorphic partitions as equivalent.

## 1. Introduction

`networkrepository.com` (http://www.networkrepository.com, last accessed 28 August 2017) [21].

## 2. Definition of Graph Symmetry

- Identity: $\mathbf{1}\in Aut\left(G\right):\mathbf{1}\pi =\pi \mathbf{1}=\pi \phantom{\rule{1.em}{0ex}}\forall \pi \in Aut\left(G\right)$.
- Inverses: $\pi \in Aut\left(G\right)\iff {\pi}^{-1}\in Aut\left(G\right)$ and $\pi {\pi}^{-1}={\pi}^{-1}\pi =\mathbf{1}$.
- Closure: $\forall \pi ,\tau \in Aut\left(G\right):\pi \tau \in Aut\left(G\right)$.
- Associativity: $\forall \pi ,\tau ,\rho \in Aut\left(G\right):\left(\pi \tau \right)\rho =\pi \left(\tau \rho \right)$.
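The four group properties above can be checked by brute force on a tiny graph. The following is a minimal sketch, not part of the study's tooling: the 4-cycle, the helper names `automorphisms` and `compose`, and the convention that $\pi\tau$ means "apply $\tau$ first, then $\pi$" are all assumptions made for illustration. Associativity holds automatically because permutations are composed as functions.

```python
from itertools import permutations

def automorphisms(nodes, edges):
    """Brute-force all automorphisms of a small undirected simple graph."""
    edge_set = {frozenset(e) for e in edges}
    result = []
    for image in permutations(nodes):
        pi = dict(zip(nodes, image))
        # pi is an automorphism iff it maps the edge set onto itself.
        if {frozenset((pi[u], pi[v])) for u, v in edges} == edge_set:
            result.append(pi)
    return result

def compose(pi, tau):
    """Return the permutation that applies tau first and then pi (assumed convention)."""
    return {v: pi[tau[v]] for v in tau}

# Illustrative graph: the cycle 1-2-3-4-1.
nodes = (1, 2, 3, 4)
edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
aut = automorphisms(nodes, edges)

def as_tuple(p):
    """Hashable representation of a permutation given as a dict."""
    return tuple(p[v] for v in nodes)

group = {as_tuple(p) for p in aut}
has_identity = as_tuple({v: v for v in nodes}) in group
closed = all(as_tuple(compose(p, q)) in group for p in aut for q in aut)
has_inverses = all(as_tuple({p[v]: v for v in nodes}) in group for p in aut)
print(len(aut), has_identity, closed, has_inverses)
```

Enumerating all $n!$ permutations is only feasible for very small graphs, which is why dedicated tools are needed for real datasets.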

## 3. Description of the Graph Symmetry Analysis Procedure

- Retrieval of graph metadata from `networkrepository.com` (includes the download link to the actual dataset).
- Selection of the datasets to download (by file size).
- Retrieval of the actual datasets (compressed zip-archives).
- Loading, selection and transformation of the graph data to perform the symmetry computations.
- Calculation of statistics and their analysis (discussed in Section 4).

#### 3.1. The Data Repository: `networkrepository.com`

`networkrepository.com` [21] is a platform that advertises itself as “[t]he first interactive data and network repository with real-time analytics” (http://www.networkrepository.com, last accessed 28 August 2017). It lists about 3900 datasets and provides several (sometimes approximate) characteristics for each dataset. The datasets are grouped into classes of origin (for example, “Biological Networks”, “Cheminformatics”, “Social Networks”, etc.), with the smallest class containing fewer than 10 networks and the largest (“Miscellaneous Networks”) more than 2500. A notable advantage of this repository is that it also contains many datasets that can be found in other sources (for example, those of GEPHI [22] (https://github.com/gephi/gephi/wiki/Datasets, last accessed 29 August 2017), SNAP [23], or the DIMACS challenges [24] (10th DIMACS challenge, http://dimacs.rutgers.edu/Challenges/, last accessed 29 August 2017)). This allowed us to retrieve many datasets at once without the need to combine several sources and potentially different data formats. A flaw of this data source is the presence of datasets that are not actual real-world graphs, as the discussion of the results will show.

#### 3.2. Data Download Selection

#### 3.3. Data Formats

#### 3.4. Data Analysis Selection

`saucy` [10], which we used to compute generating sets for the automorphism groups, cannot handle weighted graphs.

`networkrepository.com` also contains datasets of dynamic networks that change over time. These should be excluded, as it is not clear which state of the network is the one to analyze (start, end, average, every state from start to end, …?). Some other datasets consist of disconnected components; we excluded these from further processing as well.
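How disconnected datasets were detected is not stated here. As a minimal sketch of one way to do it, a breadth-first search from an arbitrary node reaches every node exactly when the graph is connected. The function name `is_connected` and the 0-based node ids are assumptions made for this example.

```python
from collections import deque

def is_connected(n, edges):
    """Return True if the undirected graph on nodes 0..n-1 is connected."""
    if n == 0:
        return True
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    # Connected iff the search from node 0 visited every node.
    return len(seen) == n

path = [(0, 1), (1, 2)]            # one component
two_edges = [(0, 1), (2, 3)]       # two components
print(is_connected(3, path), is_connected(4, two_edges))
```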

#### 3.5. Data Analysis Procedure

`saucy` [10]. Calling `saucy` with a simple graph as input returns the size of the group, the number of permutations that generate the group, and some other information (the sum of the permutations’ supports, and the depth and number of nodes of the search tree), which we do not use further. In practice, we called `saucy` from Python and additionally computed the orbit partition ${O}_{G}$ (see the details in Appendix A).
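The orbit partition can be derived from a generating set alone: two nodes lie in the same orbit exactly when they are linked by a chain of generator images, so a union-find pass over the generators is enough. The sketch below is an illustration under assumptions, not the `pysaucy` interface: generators are plain Python lists with `perm[v]` the image of node `v` (0-based), and the normalized redundancy is assumed to have the form $1-(|{O}_{G}|-1)/(n-1)$, which matches the endpoints stated in the abstract (0 for asymmetric, 1 for transitive graphs) but is not reproduced from Equation (2) in this text.

```python
def orbit_partition(n, generators):
    """Orbits of the group generated by `generators` acting on nodes 0..n-1."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for perm in generators:
        for v, image in enumerate(perm):
            rv, ri = find(v), find(image)
            if rv != ri:
                parent[ri] = rv  # v and its image share an orbit

    orbits = {}
    for v in range(n):
        orbits.setdefault(find(v), []).append(v)
    return sorted(orbits.values())

def normalized_redundancy(n, num_orbits):
    """Assumed form 1 - (|O_G| - 1) / (n - 1); requires n > 1."""
    return 1.0 - (num_orbits - 1) / (n - 1)

# Path 0-1-2: the only non-trivial symmetry swaps the two end nodes.
path_orbits = orbit_partition(3, [[2, 1, 0]])
# 4-cycle 0-1-2-3: one rotation and one reflection generate the whole group.
cycle_orbits = orbit_partition(4, [[1, 2, 3, 0], [0, 3, 2, 1]])
print(path_orbits, normalized_redundancy(3, len(path_orbits)))
print(cycle_orbits, normalized_redundancy(4, len(cycle_orbits)))
```

Working on generators rather than on the full group matters because the group itself can be astronomically large (see the $|Aut(G)|$ column in Table 1), while a generating set stays small.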

## 4. Results

`saucy`. In the end, we analyzed a total of 902 different graphs (duplicates already removed). As discussed previously, the size of the automorphism group is not a good measure for the degree of symmetry. Nonetheless, it can be seen that 75% of the analyzed graphs have a very small group size (at most 24, see Table 1). In total, 272 of the 902 graphs are asymmetric, which means that 630 graphs, or about 70%, contain symmetries.

`dimacs` and `bhoslib` classes. The `dimacs` class contains data from the second DIMACS challenge, which was about “Maximum Clique, Graph Coloring, and Satisfiability” (http://dimacs.rutgers.edu/Challenges/, last accessed 29 August 2017). These graphs were generated and are thus not actual real-world networks. The `bhoslib` class is discussed below. A third group of graphs is part of the `misc` class and has the names `G1`, `G2` and so on. These are also generated, as described by Helmberg and Rendl [32], most of them following a random model. These findings explain the large number of highly non-modular and asymmetric graphs in Figure 3. It is also important to note that very non-modular graphs are in general unlikely to be real-world networks anyway [16] (Section 3.7).

`networkrepository.com` data, we did not carry out an in-depth comparison of analysis results between different classes. Nonetheless, Figure 5 presents a glance at the distributions of normalized redundancy per class. In total, 975 graphs (duplicates among different classes are now included) are spread over 17 classes. Note that `networkrepository.com` has 21 different classes in total; due to the data selection described earlier, some classes vanish from our analysis (“Brain Networks”, “Ecology Networks”, “Massive Network Data”, “Dynamic Networks”). In the following, we give some further information on the largest classes of our analysis.

- `bhoslib` (“**B**enchmarks with **H**idden **O**ptimum **S**olutions for Graph Problems”, http://www.nlsde.buaa.edu.cn/~kexu/benchmarks/graph-benchmarks.htm, last accessed 4 September 2017): All these graphs are asymmetric. They all are generated from the “Model RB” [33], a derived random graph model. As argued earlier, random graphs tend to be asymmetric.
- `chem`: This largest class contains cheminformatics data which has overall a large degree of symmetry. The origin of all those networks is described by Borgwardt et al. [34]: They extracted proteins from an online enzyme information system and transformed them into undirected graphs. The nodes are labeled and we also took this labeling into account (see our discussion on graph simplification in Section 3) by additionally providing the partition vector of labels to the `saucy` call. `saucy` supports initially labeled nodes as starting point for the search procedure.
- `dimacs`/`dimacs10`: These datasets come from different DIMACS challenges (http://dimacs.rutgers.edu/Challenges/, last accessed 4 September 2017). The former, especially, contains a large number of highly symmetric graphs, but also many asymmetric graphs as described above.
- `misc`: This class contains datasets which seem not to fit any of the other classes. Therefore it is hard to make any valid assertions. Certainly one can see some interesting patterns (many graphs exist with ${r}_{G}^{\prime}\approx 0.45$ and there are five subclasses with ${r}_{G}^{\prime}\in [0.7,0.9]$), which gives room for further inspections.
- `socfb`: Although these networks seem to be mostly asymmetric, only four of the 29 actually are. All these graphs are quite large (more than 15,000 nodes on average), as they represent parts of the Facebook social network.

## 5. Conclusions

- It is the first large-scale study of symmetry of graphs (with about 900 graphs in the main part and with almost 800 simplified graphs in the appendix). Almost three quarters of the graphs are symmetric.
- The study shows that, except for the artificial random graphs (the class `bhoslib`), all other classes of real-world networks contain a high number of graphs with symmetries.
- The analysis procedure is described in the appendix and its implementation is made available as Supplementary Materials.
- Last, but not least, for the purpose of summarizing the symmetry analysis of a large set of graphs, an improved measure of symmetry of a graph, namely a normalized version of the network redundancy measure, is derived. It has the intuitive property that completely asymmetric graphs have a measure of 0 and completely symmetric graphs a measure of 1. In contrast, the original definition by MacArthur et al. [19] yields different values for different asymmetric graphs, which makes them incomparable by the measure.

- Multiple equivalent solutions of graph clustering algorithms with unstable clusters.
- Diagnostic problems with all classic partition comparison measures. They fail to identify automorphic partitions as equivalent.

## 6. Outlook

## Supplementary Materials

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## Appendix A. Analysis Procedure Details

#### Appendix A.1. Metadata Retrieval and Download Selection

Because `networkrepository.com` contains not only small but also very large datasets (“Massive Network Data”) of several gigabytes, analyzing everything was not an option. Therefore, we retrieved the table that contains all datasets (http://networkrepository.com/networks.php, last accessed 4 September 2017) as a starting point. A script loads this table and downloads all datasets (the download link is part of the metadata) that are smaller than approximately 70 megabytes (desired minimum/maximum sizes are configurable).
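The size-based selection can be sketched as a simple filter over the metadata table. Everything in this example is hypothetical: the field names (`name`, `size_bytes`), the dataset names and sizes, and the function `select_datasets` are made up for illustration and are not taken from the actual script.

```python
def select_datasets(metadata, min_bytes=0, max_bytes=70 * 10**6):
    """Keep the datasets whose file size lies in [min_bytes, max_bytes)."""
    return [row for row in metadata if min_bytes <= row["size_bytes"] < max_bytes]

# Made-up metadata rows standing in for the repository table.
metadata = [
    {"name": "tiny", "size_bytes": 2_000},
    {"name": "medium", "size_bytes": 45_000_000},
    {"name": "massive", "size_bytes": 3_000_000_000},
]
selected = select_datasets(metadata)
print([row["name"] for row in selected])
```

Keeping both bounds as parameters mirrors the configurable minimum and maximum sizes mentioned above.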

#### Appendix A.2. Main Analysis

`%%MatrixMarket matrix coordinate pattern symmetric`

`scipy.io.mminfo` and `scipy.io.mmread` were used (http://scipy.org/scipylib/, last accessed 4 September 2017, release version 0.18.1 was used).

`scipy.io.mmread` returns a matrix that can then be used as input parameter for a `networkx.Graph` (http://networkx.github.io/, last accessed 4 September 2017, release version 1.11 was used).
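To make the header line above concrete, the following standard-library sketch shows what reading a `coordinate pattern symmetric` file amounts to: a banner line, optional `%` comments, a size line, and then one `row column` pair per stored entry. It is an illustration only and not a replacement for `scipy.io.mmread`, which the study used; the function name `read_pattern_symmetric` and the sample file content are assumptions.

```python
def read_pattern_symmetric(lines):
    """Parse a 'coordinate pattern symmetric' Matrix Market file into (n, edges)."""
    it = iter(lines)
    banner = next(it).split()
    expected = ["matrix", "coordinate", "pattern", "symmetric"]
    if banner[0] != "%%MatrixMarket" or [w.lower() for w in banner[1:]] != expected:
        raise ValueError("unsupported Matrix Market header")

    # The first non-comment line holds: rows, columns, number of stored entries.
    size_line = None
    for line in it:
        line = line.strip()
        if line and not line.startswith("%"):
            size_line = line
            break
    if size_line is None:
        raise ValueError("missing size line")
    rows, cols, nnz = map(int, size_line.split())

    # Each remaining line is a 1-based (row, column) pair; only one triangle
    # of the symmetric matrix is stored, so every pair is one undirected edge.
    edges = set()
    for line in it:
        line = line.strip()
        if not line or line.startswith("%"):
            continue
        i, j = map(int, line.split()[:2])
        edges.add((min(i, j), max(i, j)))
    return rows, sorted(edges)

sample = """%%MatrixMarket matrix coordinate pattern symmetric
% a triangle, written by hand for this example
3 3 3
2 1
3 1
3 2
""".splitlines()
n, edge_list = read_pattern_symmetric(sample)
print(n, edge_list)
```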

- had no header and a simple edge list (cf. the matrix position),
- had a header `bip unweighted` or `bipartite unweighted` and a simple edge list,
- had a header `sym unweighted` or `symmetric unweighted` or `undirected unweighted` and a simple edge list,
- had a header or not (according to the rules above) and an edge list, where the third column’s values are timestamps, not weights.

`bip`/`bipartite` had to be transformed, as the nodes from either node set were labeled $1,\dots ,{k}_{1}$ and $1,\dots ,{k}_{2}$. After transformation, the nodes were labeled $1,\dots ,{k}_{1}$ and ${k}_{1}+1,\dots ,{k}_{1}+{k}_{2}$.
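The relabeling of bipartite edge lists can be sketched in a few lines. The assumptions here are that the first column of each edge refers to the first node set and the second column to the second node set, and that ${k}_{1}$ can be inferred as the largest id in the first column (which would miss isolated nodes with larger ids); the function name `relabel_bipartite` is made up for this example.

```python
def relabel_bipartite(edges):
    """Shift the ids of the second node set by k1 so that both sets are disjoint."""
    k1 = max(u for u, _ in edges)  # assumed size of the first node set
    return k1, [(u, k1 + v) for u, v in edges]

# Both node sets start counting at 1 in the raw file.
raw_edges = [(1, 1), (1, 2), (2, 1)]
k1, relabeled = relabel_bipartite(raw_edges)
print(k1, relabeled)
```

Without this shift, node 1 of the first set and node 1 of the second set would be merged into a single node when the graph is built.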

`node_id label`. This information was turned into a vector of node labels and used as an additional parameter to call `saucy`.

`saucy` (http://vlsicad.eecs.umich.edu/BK/SAUCY/, last accessed 21 November 2017, release version 3 was used), which was called through our own Python binding `pysaucy` (https://github.com/FabianBall/pysaucy, last accessed 22 December 2017, release version 0.3.1 was used). If a vector of node labels was available, this information was provided as an additional parameter for the `saucy` call. Modularity was the result of utilizing the `RG` graph clustering algorithm [27]. The call was also wrapped by our Python binding `pycggcrg` (https://github.com/FabianBall/pycggcrg, last accessed 22 December 2017, release version 0.2.3 was used). The results of the main step were again saved to a csv file.

#### Appendix A.3. Statistics

`n`, `m`, and `num_orbits`. For the graphs that had the same values for these three characteristics, all the other information was compared (it also had to be the same or similar if the graphs were duplicates) and, as a last check, the names were compared. If the names were obviously similar, a duplicate was found. For example, `karate.mtx` and `soc-karate.mtx` both are the Karate network [29].
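The first step of this duplicate search, grouping graphs by the triple of `n`, `m`, and `num_orbits`, can be sketched as follows. The record layout, the file names, and all numbers are made up for illustration; the later manual checks (comparing the remaining statistics and the names) are not automated here.

```python
from collections import defaultdict

def duplicate_candidates(records):
    """Group dataset names by (n, m, num_orbits); return groups with more than one member."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["n"], rec["m"], rec["num_orbits"])].append(rec["name"])
    return [sorted(names) for names in groups.values() if len(names) > 1]

# Made-up records; the values are not results from the study.
records = [
    {"name": "a.mtx", "n": 10, "m": 20, "num_orbits": 7},
    {"name": "soc-a.mtx", "n": 10, "m": 20, "num_orbits": 7},
    {"name": "b.mtx", "n": 10, "m": 21, "num_orbits": 7},
]
candidates = duplicate_candidates(records)
print(candidates)
```

Matching on these three values only yields candidates; two different graphs can share all three, which is why the additional comparisons described above are needed.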

## Appendix B. Maximum Fraction of Affected Nodes for Fixed Normalized Redundancy

**Theorem A1.**

**Proof.**

**Lemma A1.**

**Proof.**

**Example A1.**

**Figure A1.** A graph with an even number of nodes $n$ for which the normalized network redundancy does not capture the high degree of symmetry, as no node is fixed by the automorphism group. The example shows that the (normalized) redundancy does not distinguish between the question of how many redundancies exist (${r}_{G}^{\prime}\approx 0.5$ means 50% of the nodes have a redundant counterpart) and the question of how robust an existing redundancy is (${r}_{G}^{\prime}\approx 0.5$ means 50% are structurally identical, therefore redundant, and the other 50% are not redundant).

## Appendix C. Additional Diagrams

**Figure A2.** The histogram and box-plots for the number of nodes n. The x-axis and the bin widths follow a log-scale. Most graphs have between 10 and 100 nodes, and relatively fewer symmetries were found in medium-sized graphs, which coincides with the findings in Figure 2.

**Figure A3.** The density $\rho = m/\binom{n}{2}$ of the graphs given their modularity Q. As stated in Section 4, modularity decreases with increasing density.

## Appendix D. Results of Simplified Networks

**Table A1.** Statistics for the simplified `networkrepository.com` datasets. 163 of the 797 graphs are asymmetric, therefore $79.5$% of the graphs are symmetric. The row and column labels are the same as in Table 1.

| | n | m | Q | ${r}_{G}^{\prime}$ | $\left|Aut\left(G\right)\right|$ |
|---|---|---|---|---|---|
| count | 797 | 797 | 797 | 797 | 797 |
| mean | 70,041 | $7.0852 \times 10^{5}$ | 0.73453 | 0.4275 | $2.3286 \times 10^{7,237,014}$ |
| std | $2.9308 \times 10^{5}$ | $1.6665 \times 10^{6}$ | 0.20054 | 0.36822 | $6.5740 \times 10^{7,237,015}$ |
| min | 16 | 46 | 0 | 0 | 1 |
| 25% | 2534 | 11,173 | 0.65179 | 0.00412 | 2 |
| 50% | 10,605 | 89,927 | 0.79553 | 0.49254 | $3.3692 \times 10^{32}$ |
| 75% | 43,618 | $5.5493 \times 10^{5}$ | 0.87875 | 0.76424 | $2.9131 \times 10^{2486}$ |
| max | $6.6865 \times 10^{6}$ | $1.7233 \times 10^{7}$ | 0.99771 | 1 | $1.8559 \times 10^{7,237,017}$ |

## References

1. Gross, D.J. The Role of Symmetry in Fundamental Physics. Proc. Natl. Acad. Sci. USA **1996**, 93, 14256–14259.
2. Balaban, A.T. Applications of Graph Theory in Chemistry. J. Chem. Inf. Comput. Sci. **1985**, 25, 334–343.
3. Dehmer, M.; Mowshowitz, A. A History of Graph Entropy Measures. Inf. Sci. **2011**, 181, 57–78.
4. Ball, F.; Geyer-Schulz, A. Weak Invariants of Actions of the Automorphism Group of a Graph. Arch. Data Sci. Ser. A **2017**, 2, 1–22.
5. Read, R.C.; Corneil, D.G. The Graph Isomorphism Disease. J. Graph Theory **1977**, 1, 339–363.
6. Lubiw, A. Some NP-Complete Problems Similar to Graph Isomorphism. SIAM J. Comput. **1981**, 10, 11–21.
7. McKay, B.D. Practical Graph Isomorphism. Congr. Numerantium **1981**, 30, 45–87.
8. McKay, B.D.; Piperno, A. Practical Graph Isomorphism, II. J. Symb. Comput. **2014**, 60, 94–112.
9. Junttila, T.; Kaski, P. Engineering an Efficient Canonical Labeling Tool for Large and Sparse Graphs. In Proceedings of the 2007 Ninth Workshop on Algorithm Engineering and Experiments (ALENEX), New Orleans, LA, USA, 6 January 2007; pp. 135–149.
10. Darga, P.T.; Sakallah, K.A.; Markov, I.L. Faster Symmetry Discovery Using Sparsity of Symmetries. In Proceedings of the 2008 45th ACM/IEEE Design Automation Conference, Anaheim, CA, USA, 8–13 June 2008; pp. 149–154.
11. Babai, L. Graph Isomorphism in Quasipolynomial Time. arXiv, 2015; arXiv:1512.03547 [cs, math].
12. Babai, L. Graph Isomorphism in Quasipolynomial Time [Extended Abstract]. In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, Cambridge, MA, USA, 19–21 June 2016; ACM: New York, NY, USA, 2016; pp. 684–697.
13. Garlaschelli, D.; Ruzzenenti, F.; Basosi, R. Complex Networks and Symmetry I: A Review. Symmetry **2010**, 2, 1683–1709.
14. Erdős, P.; Rényi, A. Asymmetric Graphs. Acta Math. Hung. **1963**, 14, 295–315.
15. Erdős, P.; Rényi, A. On Random Graphs I. Publ. Math. **1959**, 6, 290–297.
16. Newman, M.E. The Structure and Function of Complex Networks. SIAM Rev. **2003**, 45, 167–256.
17. Xiao, Y.; MacArthur, B.D.; Wang, H.; Xiong, M.; Wang, W. Network Quotients: Structural Skeletons of Complex Systems. Phys. Rev. E **2008**, 78, 046102.
18. Wang, H.; Yan, G.; Xiao, Y. Symmetry in World Trade Network. J. Syst. Sci. Complex. **2009**, 22, 280–290.
19. MacArthur, B.D.; Sánchez-García, R.J.; Anderson, J.W. Symmetry in Complex Networks. Discret. Appl. Math. **2008**, 156, 3525–3531.
20. Garrido, A. Symmetry in Complex Networks. Symmetry **2011**, 3, 1–15.
21. Rossi, R.A.; Ahmed, N.K. The Network Data Repository with Interactive Graph Analytics and Visualization. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; AAAI Press: Austin, TX, USA, 2015; pp. 4292–4293.
22. Bastian, M.; Heymann, S.; Jacomy, M. Gephi: An Open Source Software for Exploring and Manipulating Networks. In Proceedings of the Third International AAAI Conference on Weblogs and Social Media, San Jose, CA, USA, 17–20 May 2009.
23. Leskovec, J.; Krevl, A. SNAP Datasets: Stanford Large Network Dataset Collection; SNAP: Stanford, CA, USA, 2014.
24. Bader, D.A.; Meyerhenke, H.; Sanders, P.; Schulz, C.; Kappes, A.; Wagner, D. Benchmarking for Graph Clustering and Partitioning. In Encyclopedia of Social Network Analysis and Mining; Springer: New York, NY, USA, 2014; pp. 73–82.
25. Boisvert, R.F.; Pozo, R.; Remington, K.A. The Matrix Market Exchange Formats: Initial Design; Technical Report NISTIR 5935; National Institute of Standards and Technology, Applied and Computational Mathematics Division: Gaithersburg, MD, USA, 1996.
26. Kunegis, J. Handbook of Network Analysis [KONECT—The Koblenz Network Collection]. arXiv, 2014; arXiv:1402.5500v3 [cs.SI].
27. Ovelgönne, M.; Geyer-Schulz, A.; Stein, M. Randomized Greedy Modularity Optimization for Group Detection in Huge Social Networks. In Proceedings of the 4th Workshop on Social Network Mining and Analysis, Washington, DC, USA, 25 July 2010; ACM: New York, NY, USA, 2010; pp. 1–9.
28. Newman, M.E.J.; Girvan, M. Finding and Evaluating Community Structure in Networks. Phys. Rev. E **2004**, 69, 026113.
29. Zachary, W.W. An Information Flow Model for Conflict and Fission in Small Groups. J. Anthropol. Res. **1977**, 33, 452–473.
30. Fortunato, S.; Barthélemy, M. Resolution Limit in Community Detection. Proc. Natl. Acad. Sci. USA **2007**, 104, 36–41.
31. Quintas, L.V. Extrema Concerning Asymmetric Graphs. J. Comb. Theory **1967**, 3, 57–82.
32. Helmberg, C.; Rendl, F. A Spectral Bundle Method for Semidefinite Programming. SIAM J. Optim. **2000**, 10, 673–696.
33. Xu, K.; Li, W. Many Hard Examples in Exact Phase Transitions. Theor. Comput. Sci. **2006**, 355, 291–302.
34. Borgwardt, K.M.; Ong, C.S.; Schönauer, S.; Vishwanathan, S.V.N.; Smola, A.J.; Kriegel, H.P. Protein Function Prediction via Graph Kernels. Bioinformatics **2005**, 21, i47–i56.
35. Albert, R.; Barabási, A.L. Statistical Mechanics of Complex Networks. Rev. Mod. Phys. **2002**, 74, 47–97.
36. Jalili, M.; Perc, M. Information Cascades in Complex Networks. J. Complex Netw. **2017**, 5, 665–693.
37. Lubotzky, A. Discrete Groups, Expanding Graphs and Invariant Measures; Birkhäuser Basel: Basel, Switzerland, 1994.
38. Hoory, S.; Linial, N.; Wigderson, A. Expander Graphs and Their Applications. Bull. Am. Math. Soc. **2006**, 43, 439–561.

**Figure 1.** Example of a two-step simplification of the graph ${G}_{w+l}$ (left), which is asymmetric. ${G}_{w}$ in the middle is still weighted but has a non-trivial automorphism group, and $G$ (right) is simple with a more complex automorphism group. The automorphism group of each more complex graph is a subgroup of the automorphism group of each simpler graph: $Aut\left({G}_{w+l}\right)\le Aut\left({G}_{w}\right)\le Aut\left(G\right)$. This example shows how simplification of graphs by removing weights and/or loops can lead to additional symmetries. (**a**) The weighted graph ${G}_{w+l}$, which also has a loop at node 3. It has the trivial automorphism group $Aut\left({G}_{w+l}\right)=\left\{\mathbf{1}\right\}$; (**b**) A simplified version ${G}_{w}$ of the graph ${G}_{w+l}$ with the following symmetries: $Aut\left({G}_{w}\right)=\{\mathbf{1},(1\,2)(3\,4),(2\,3)(1\,4),(1\,3)(2\,4)\}$; (**c**) The simple graph $G$, which has the automorphism group $Aut\left(G\right)=Aut\left({G}_{w}\right)\cup \{(1\,3),(2\,4),(1\,2\,3\,4),(1\,4\,3\,2)\}$.

**Figure 2.** The histogram and box-plots for the number of edges m. The x-axis and the bin widths follow a log-scale. In every bin, a substantial number of graphs contain symmetries. An explanation of why the graphs with $4\le {log}_{10}m<6$ are relatively less symmetric is given in the text. Nonetheless, symmetric graphs are present at all sizes.

**Figure 3.** The histogram and box-plots for the modularity Q as a measure of cluster structure. All bins have an equal width of 0.05. Again, many graphs with partitions having a certain modularity value Q are found to be symmetric. Only graphs with a very low modularity seem to be substantially less symmetric, which is discussed in the text. There is no concise definition of the values of Q for which a partition of a graph is good, because this depends on the graph’s size. For smaller graphs, values greater than 0.3 are fine; for large graphs, values greater than 0.9 are not unusual.

**Figure 4.** The histogram and box-plot for the normalized redundancy ${r}_{G}^{\prime}$ for all 902 graphs. In total, there are 272 asymmetric and 13 completely symmetric (transitive) graphs. Most of the graphs have a quite low normalized redundancy; nevertheless, about 70% of the analyzed data contain symmetries. The bin width is 0.05. The right skew of the distribution is underlined by the often-true relationship that the median (0.0625) is smaller than the mean (0.1514).

**Figure 5.** The normalized redundancy distributions per class. The classes are (as named on `networkrepository.com`): `bhoslib` (“BHOSLIB”), `bio` (“Biological Networks”), `ca` (“Collaboration Networks”), `chem` (“Cheminformatics”), `dimacs` (“DIMACS”), `dimacs10` (“DIMACS10”), `ia` (“Interaction Networks”), `inf` (“Infrastructure Networks”), `misc` (“Miscellaneous Networks”), `rec` (“Recommendation Networks”), `rt` (“Retweet Networks”), `sc` (“Scientific Computing”), `soc` (“Social Networks”), `socfb` (“Facebook Networks”), `tech` (“Technological Networks”), `tscc` (“Temporal Reachability Networks”), `web` (“Web Graphs”). After each class name, the tuple (“number of class members”, “number of symmetric class members”) is given. For this plot, duplicates from different classes were not removed to prevent a bias within classes. 975 networks are spread over the 17 classes. Details on the six largest classes are given in the text. The distributions of normalized redundancy per class differ strongly and, with the exception of `bhoslib`, each class contains a large number of symmetric graphs.

**Table 1.** Statistics for the `networkrepository.com` datasets: Only 272 of the 902 graphs are asymmetric. count is the number of datasets on which the statistics per column were computed, and mean (std) is the respective mean value (standard deviation). min and max are the minimal/maximal values observed, and the three remaining rows in between denote the two quartiles (25% and 75%) and the median (50%). The columns n (m) yield statistics on the number of nodes (edges) of the graphs, and Q is the modularity, which measures (clustering) partition quality. The last two columns contain information on the symmetries: ${r}_{G}^{\prime}$ is the normalized redundancy (see Equation (2)) and $\left|Aut\left(G\right)\right|$ is the size of the automorphism group.

| | n | m | Q | ${r}_{G}^{\prime}$ | $\left|Aut\left(G\right)\right|$ |
|---|---|---|---|---|---|
| count | 902 | 902 | 902 | 902 | 902 |
| mean | $1.2796 \times 10^{5}$ | $4.1845 \times 10^{5}$ | 0.52569 | 0.1514 | $6.3659 \times 10^{2,517,003}$ |
| std | $8.6096 \times 10^{5}$ | $1.8654 \times 10^{6}$ | 0.24689 | 0.23447 | $1.9119 \times 10^{2,517,005}$ |
| min | 2 | 1 | 0 | 0 | 1 |
| 25% | 27 | 51 | 0.41563 | 0 | 1 |
| 50% | 42 | 84 | 0.57931 | 0.0625 | 4 |
| 75% | 800 | 24,489 | 0.66007 | 0.18182 | 24 |
| max | $1.1951 \times 10^{7}$ | $2.5166 \times 10^{7}$ | 0.99867 | 1 | $5.7420 \times 10^{2,517,006}$ |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Ball, F.; Geyer-Schulz, A.
How Symmetric Are Real-World Graphs? A Large-Scale Study. *Symmetry* **2018**, *10*, 29.
https://doi.org/10.3390/sym10010029
