## 1. Introduction

It is increasingly clear that the specific connectivities of biological networks are far from random. In neuroscience, the overall behavior and dynamics of a neuronal network are dictated both by the properties of individual neurons and by the specific complex structure of the network through which they are connected [1, 2]. Understanding the behavior of such a neuronal network must therefore involve inferring its underlying structural properties and design principles. However, it is unclear a priori which structural features of a network are most important in doing so. Distinguishing these important features is one of the goals in quantifying the complexity of a network, as a measure of network complexity could allow us to determine which features of a network cause it to be complex, and then to infer the purpose of these complex structures.

As of yet, there exists no broadly accepted definition of network complexity which would allow for its unambiguous quantification, though many quantitative measures have been proposed and applied [3, 4]. A necessary property of any such measure, however, is that it must distinguish complex, structured networks from networks which are either completely random or completely ordered. One such measure was developed by Sakhanenko and Galas [5], who defined a general complexity measure, based on the Kolmogorov complexity, which correctly vanishes in these limiting cases. Indeed, even this simple approach appears capable of indicating certain complex structures existing within the network [5, 6].

If a network is determined via some measure to be highly complex, the question yet remains: what is the source of that complexity? That is, what are the substructures or features of the network which make it “complex”? Can these measures be applied in such a way so as to identify complexity-causing structures within the network, and can we attribute biophysiological meaning to these structures?

The nematode C. elegans is an ideal system in which to attempt to address such questions. It remains the only organism for which the full connectivity of its neuronal network (its “Connectome”) is known. The C. elegans Connectome consists of both directed synaptic connections and undirected gap junction connections among its 302 neurons [7]. These neurons can be broadly classified as sensory neurons (those receiving external sensory input), motorneurons (those synapsing onto muscle), or, otherwise, as interneurons (if lacking explicit sensory input or motor output) [7]. Despite not directly receiving external input or driving motor output, many interneurons play important computational roles in the network. For example, the “command” interneurons are individually crucial to the control of locomotion: if the command interneuron pair AVAL/R is ablated, the ability of the worm to crawl backwards is severely diminished, whereas ablating the command interneuron pair AVBL/R diminishes the ability of the worm to crawl forwards [8].

Even with its relatively small nervous system, C. elegans is capable of a fairly broad range of behaviors. In addition to responding to a range of mechanical and chemical stimuli [8, 9, 10], the worm must navigate [11], mate [12, 13], and lay eggs [14, 15]. The network controls these various behaviors through shared pathways of overlapping subcircuits, all while the structure appears to approximately minimize the wiring cost between nodes [16]. Previous studies indicate that, as with other neuronal networks [17, 18, 19, 20, 21, 22, 23], the network partially accomplishes this trade-off through its “rich club” structure, having a highly-connected hub of “rich” neurons with high betweenness centrality [24]. The specifically-tuned complex structure of the network may ultimately encode behavioral responses to inputs. For example, simulations of Connectome dynamics which treat all neurons as identical units are capable, through their connectivity structure alone, of generating biophysiologically reasonable dynamical responses to specific inputs (e.g., in [25]).

It was recently demonstrated by Kim et al. [26] that vulnerability analysis is capable of identifying many important functional structures within the C. elegans connectome. The core idea of vulnerability analysis is that important links or nodes within the network can be identified by considering the amount by which different structural properties of the network change when said link/node is removed. Kim et al. computationally analyzed the effects of removing any one given node/edge from the Connectome on its clustering coefficient, global efficiency, and betweenness centrality, and found that such an approach identified biophysiologically relevant subcircuits.

In this paper, we study the complexity of the C. elegans gap junction Connectome based on our previously defined measure and find that it is vastly more complex than a random graph with the same degree distribution. Its complexity score is 16.5 standard deviations above the randomly-expected mean (see Methods Section 4.5 for specifics). We then extend the use of vulnerability analysis on the C. elegans connectome by considering how the complexity of the graph is altered if any given edge is deleted. A large fraction of the network’s complexity is seen to be caused by a relatively small set of connections involving the known “command” interneurons. We then extend the idea of vulnerability analysis by using a greedy algorithm to iteratively delete these “complexity-causing” connections. When these links are eliminated, the network becomes significantly less complex than a random graph. Furthermore, the deleted structure is seen to have a clear biophysical interpretation: our algorithm implicates a set of edges involving neurons from the synaptic network’s “rich club” as being the source of the network’s excess complexity. Thus, this study supports a view of the Connectome as consisting of a low-complexity structure whose complexity is dramatically enhanced by the addition of unique connectivities of the network’s rich club.

## 3. Discussion

This study shows that the C. elegans neuronal gap junction network is vastly more complex than one would expect at random given its degree distribution. Expanding the concept of vulnerability analysis, we analyzed the complexity vulnerability of the graph and used an iterative procedure to successively eliminate edges based upon how much each deletion reduced graph complexity. After many edge deletions, we were left with a graph much less complex than an equivalent random graph. This suggests that the graph consists of a low-complexity component, which makes up the majority of the graph and can be described fairly simply, and a subset of edges that is responsible for the excess complexity of the network. This complexity-causing set of edges belongs disproportionately to the neurons which are members of the associated synaptic network’s previously-identified rich club.

The fact that the procedure reduces the graph to one with a remarkably low complexity raises interesting questions concerning the topology of graphs. Many graphs, particularly in biology and neuroscience and including the C. elegans neuronal network, are rather well-described by a collection of over-represented motifs [7, 29, 30, 31]. A common view of neuronal networks, as consisting of a highly structured graph of repeated motifs with the repetitive structure violated by important hub neurons, is interesting to compare with the results in Figure 4. Is the iterative procedure simply stripping away these hub structures to leave a simple, highly-repetitive core? Given that this complexity measure disappears when the mutual information between nodes approaches 1, it is plausible that particularly motif-heavy graphs, consisting of sets of nodes having similar connectivity patterns/high mutual information, may be classified as having a particularly low complexity by this measure. The exact relationship between motif structures, symmetry and symmetry-breaking and their effects upon network complexity measures deserves a deeper theoretical investigation, which will be the subject of future work.

These results also suggest that future work should investigate the broader range of structural properties to which this complexity measure is sensitive. It is important to note that this procedure does not simply identify rich-club structure; recall that we consider the “rich club” set of neurons due to its previously-identified importance, but this set is defined from the synaptic network, not the gap junction network, which lacks such structure. What functionally relevant structures does this approach reveal, and what are the associated topological properties? For example, one could consider the change in average betweenness centrality as edges are iteratively deleted. Initially, the network has an average betweenness which is a factor of about 1.29 higher than that of a random graph (with the same degree distribution). After 40 deletions, this factor is reduced to 1.22. This small reduction does not appear to explain the large reduction in relative complexity over the same range (as seen in Figure 4). Other preliminary work has been similarly inconclusive, suggesting that the relationship between complexity and other topological measures is nontrivial. The more general relationship between network complexity and various other global topological properties of the network is beyond the scope of this paper, but is the subject of future work.

How would the identified structures differ if one uses a different quantification of network complexity? There is no universally recognized quantification of network complexity, and many such measures exist [3, 4, 32, 33]. Furthermore, the Ψ measure is inherently dependent only on pairwise measures of connectivity in the sense that it is a sum over pairs of mutual information between nodes based on their connectivity. Measures that include higher numbers of nodes and their informational interdependence might reveal other features [23, 32, 33]. It should be possible to perform a similar deconstruction procedure using other measures, which may distinguish different features and could prove useful both as a tool for the exploration of the structure of networks and for illuminating the differences between the measures themselves.

Further development of network complexity quantification and exploration of different measures could incorporate information which we have ignored thus far in our analysis. Since we used an undirected network complexity measure, we focused upon the undirected gap junction network. However, the directed synaptic network is also vitally important, and the method can be modified to incorporate directed connections. Similarly, we consider the unweighted binary graph, ignoring edge-weighting data such as the number of gap junctions between each node. Future analysis could make use of such information, perhaps by using previously-developed extensions of this complexity measure to include multiple edge types [6].

In spite of these current limitations, this study strongly suggests that quantifying a network’s global complexity, performing vulnerability analysis using this complexity measure, and then iteratively eliminating edges based upon their vulnerability at each step is a useful direction for identifying physiologically important structures within complex graphs. Such methods are increasingly important as increasingly large and difficult-to-comprehend neuronal networks are measured by the burgeoning field of connectomics [34, 35, 36]. In a real network, the interaction between the structural features, the behavior and flexibility of the network function, the vulnerability to damage and the costs of wiring the networks are fundamental. The trade-offs can only be fully understood by careful quantitative analysis of the kind begun here. Given the potentially scale-invariant nature of neuroscience networks [24], the architectural principles we infer from these complex networks may yield broader design insights into the function and structure of nervous system networks.

## 4. Materials and Methods

#### 4.1. Network Complexity $\Psi (G)$

The graph complexity measure $\Psi (G)$ is an information theoretic measure with a basis in the Kolmogorov complexity and was introduced and explored for undirected binary graphs by Sakhanenko and Galas [5]. Consider an undirected binary graph with N nodes, characterized by a symmetric adjacency matrix $A=\left\{{a}_{ij}\right\}$, where ${a}_{ij}=1$ if nodes i and j are connected, and ${a}_{ij}=0$ if they are not. The measure is based on a Shannon-like description of information, such that the fundamental properties of a node are its connection probabilities. We denote the connection probability of node i by:

Similarly, the disconnection probability is given by:

The complexity of an individual node is then given by:

We can then calculate the mutual information between two nodes i and j:

The graph complexity Ψ can then be calculated from the mutual informations and individual node complexities:

Note that the summands will disappear when either ${m}_{ij}=0$ or ${m}_{ij}=1$. The former case occurs when the connectivity patterns of the two nodes contain no information about each other, as is the case for infinite random graphs. The latter case occurs when two nodes carry perfect information about each other, as is the case with completely connected or completely disconnected graphs. Thus, the graph complexity will disappear in both of these limiting cases.
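The quantities described in this subsection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: it takes $p_i(1)$ to be the fraction of the other $N-1$ nodes that node i connects to, $K_i$ to be the binary Shannon entropy of that probability, ${m}_{ij}$ to be the mutual information (in bits) between the connection patterns of i and j over all other nodes, and $\Psi$ to be the average over node pairs of $\max(K_i,K_j)\,{m}_{ij}(1-{m}_{ij})$. The normalization and all function names are assumptions; see [5] for the exact definitions.

```python
from math import log2


def node_complexity(A, i):
    """K_i: binary entropy (bits) of node i's connection probability (assumed form)."""
    n = len(A)
    p1 = sum(A[i][j] for j in range(n) if j != i) / (n - 1)
    return -sum(p * log2(p) for p in (p1, 1.0 - p1) if p > 0)


def mutual_information(A, i, j):
    """m_ij: mutual information (bits) between the patterns (a_ik, a_jk), k != i, j."""
    others = [k for k in range(len(A)) if k != i and k != j]
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for k in others:
        counts[(A[i][k], A[j][k])] += 1
    total = len(others)
    row = [counts[(a, 0)] + counts[(a, 1)] for a in (0, 1)]
    col = [counts[(0, b)] + counts[(1, b)] for b in (0, 1)]
    mi = 0.0
    for (a, b), c in counts.items():
        if c:
            mi += (c / total) * log2(c * total / (row[a] * col[b]))
    # Clamp to [0, 1] to guard against floating-point round-off.
    return min(1.0, max(0.0, mi))


def psi(A):
    """Graph complexity sketch for a symmetric 0/1 adjacency matrix with N >= 3."""
    n = len(A)
    K = [node_complexity(A, i) for i in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            m = mutual_information(A, i, j)
            total += max(K[i], K[j]) * m * (1.0 - m)
    # Assumed normalization: average over the N(N-1)/2 unordered pairs.
    return 2.0 * total / (n * (n - 1))


def path_graph(n):
    """Adjacency matrix of a simple path 0-1-...-(n-1), used only as a toy input."""
    A = [[0] * n for _ in range(n)]
    for k in range(n - 1):
        A[k][k + 1] = A[k + 1][k] = 1
    return A


complete = [[int(i != j) for j in range(5)] for i in range(5)]
empty = [[0] * 5 for _ in range(5)]
path = path_graph(5)
```

Under these assumptions, `psi(complete)` and `psi(empty)` both vanish, matching the limiting cases described above, while the path graph gives a small positive value.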

#### 4.2. $\Delta \Psi (G)$ and Iterative Edge Removal

Our greedy algorithm for complexity vulnerability analysis consists of iteratively deleting the graph edges whose deletions cause the largest reduction in graph complexity. That is, for an initial adjacency matrix ${A}^{(0)}$, we consider all one-edge deletions and recalculate the resulting Ψ, finding the edge $({i}_{1},{j}_{1})$ which causes the largest reduction in Ψ when deleted. We refer to the new adjacency matrix, with the single edge deletion, as ${A}^{(1)}$. We then repeat this process with ${A}^{(1)}$ to identify $({i}_{2},{j}_{2})$. This is repeated until all $M$ edges are deleted (where $M$ is the number of edges in ${A}^{(0)}$, as distinct from the number of nodes N), yielding an edge deletion order $e=\{({i}_{1},{j}_{1}),({i}_{2},{j}_{2}),({i}_{3},{j}_{3}),\dots ,({i}_{M},{j}_{M})\}$.
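The greedy loop described above can be sketched independently of the particular complexity measure. In the following minimal Python illustration, `greedy_edge_deletion` accepts any scoring function on adjacency matrices; the `triangle_count` score is a hypothetical stand-in chosen only to keep the example short, and is not the Ψ measure. All names here are assumptions made for illustration.

```python
def edges_of(A):
    """List the undirected edges (i, j), i < j, of a symmetric 0/1 matrix."""
    n = len(A)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if A[i][j]]


def without_edge(A, i, j):
    """Return a copy of A with the undirected edge (i, j) removed."""
    B = [row[:] for row in A]
    B[i][j] = 0
    B[j][i] = 0
    return B


def greedy_edge_deletion(A, score):
    """Repeatedly delete the edge whose removal yields the lowest score.

    Returns a list of (edge, score_after_deletion) in deletion order.
    """
    A = [row[:] for row in A]
    order = []
    while True:
        candidates = edges_of(A)
        if not candidates:
            break
        # Largest reduction in score == smallest score after deletion.
        best = min(candidates, key=lambda e: score(without_edge(A, *e)))
        A = without_edge(A, *best)
        order.append((best, score(A)))
    return order


def triangle_count(A):
    """Hypothetical stand-in score: number of triangles in the graph."""
    n = len(A)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        for k in range(j + 1, n)
        if A[i][j] and A[j][k] and A[i][k]
    )


# Toy graph: a triangle 0-1-2 with a pendant edge 2-3.
toy = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
deletion_order = greedy_edge_deletion(toy, triangle_count)
```

Each step rescoring every remaining edge means a graph with $M$ edges needs $M(M+1)/2$ score evaluations in total, which is worth keeping in mind when the score itself is expensive.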

For notational purposes, we define a matrix $\Delta (i,j)$ which is equal to one at entry $(i,j)$ and is zero for all other entries. That is,

This allows us to write the adjacency matrix with a single edge deletion at $({i}^{\prime},{j}^{\prime})$ as $A-\Delta ({i}^{\prime},{j}^{\prime})$. We select the first edge to delete by calculating:

That is, we choose the indices of the edge deletion resulting in the graph with the least complexity. We can write the resulting adjacency matrix as:

We write the corresponding change in complexity as:

This process is repeated iteratively until all edges are deleted. At the nth edge deletion we may write:
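The expressions described in this subsection can be written out from the prose roughly as follows. This is a reconstruction under assumptions rather than a quotation: the superscript notation $\Delta\Psi^{(n)}$ and its sign convention (negative when complexity falls) are assumed, the minimization is taken over existing edges only, and for a symmetric $A$ removing an undirected edge presumably also clears the $({j}^{\prime},{i}^{\prime})$ entry, even though the single-term form below follows the notation of the text.

```latex
\begin{align}
[\Delta(i,j)]_{kl} &= \delta_{ik}\,\delta_{jl}, \\
(i_1, j_1) &= \operatorname*{arg\,min}_{(i',j')\,:\,a^{(0)}_{i'j'}=1}
              \Psi\!\left(A^{(0)} - \Delta(i',j')\right), \\
A^{(1)} &= A^{(0)} - \Delta(i_1, j_1), \\
\Delta\Psi^{(1)} &= \Psi\!\left(A^{(1)}\right) - \Psi\!\left(A^{(0)}\right), \\
(i_n, j_n) &= \operatorname*{arg\,min}_{(i',j')\,:\,a^{(n-1)}_{i'j'}=1}
              \Psi\!\left(A^{(n-1)} - \Delta(i',j')\right), \\
A^{(n)} &= A^{(n-1)} - \Delta(i_n, j_n), \\
\Delta\Psi^{(n)} &= \Psi\!\left(A^{(n)}\right) - \Psi\!\left(A^{(n-1)}\right).
\end{align}
```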

#### 4.3. Illustrative Example: 12 Nodes of Degree Four

Using the Python package graph-tool [37], we generated 10,000 random Erdős–Rényi graphs, all consisting of 12 nodes which all have degree four. We calculated Ψ for all of these 10,000 graphs, giving the distribution seen in Figure 1a. We then applied the iterative edge-removal procedure from Section 4.2 to the random graph which had the highest value of Ψ. This gave the edge removal ordering as partially visualized in Figure 1b.

#### 4.4. The C. elegans Connectome

The C. elegans neuronal network consists of 302 neurons connected via both a directed synaptic network (with 6393 synapses) and undirected gap junction network (with 890 gap junctions) [7]. The bulk of these neurons, 282 out of 302, belong to the somatic nervous system. We use the connectivity data for the giant component of the somatic nervous system, consisting of 279 neurons, as provided by Varshney et al. [7], who consolidated and updated earlier connectome measurements [38, 39, 40].

Since this work focuses on the structure of undirected graphs, our analysis is of the gap junction network. All of the 279 neurons within the network are connected synaptically, but many lack gap junctions entirely. We eliminate the neurons that have no gap junction connections, leaving the 253 somatic neurons with both synaptic and gap junction connections. Many nodes share multiple connections, but we simplify our analysis by considering the unweighted network: ${a}_{ij}=1$ if nodes i and j have one or more gap junction connections, or ${a}_{ij}=0$ if they have no gap junction connections. Thus, our adjacency matrix ${A}^{(0)}$ consists of 253 nodes connected by 514 binary edges.

#### 4.5. Comparison of C. elegans to Random Connectivity

To compare the complexity of the actual Connectome against what we might expect at random, we use the random_rewire function included within the Python package graph-tool [37], selecting the “uncorrelated” rewiring model. This procedure is described in the graph-tool documentation and can be summarized as follows: for each edge $(i,j)$, the algorithm randomly selects a second edge $({i}^{\prime},{j}^{\prime})$. It then attempts to swap the target of each edge, such that the edges would then become $(i,{j}^{\prime})$ and $({i}^{\prime},j)$. This swap is rejected if it would result in parallel edges or self-loops. This swapping procedure is repeated for all edges $(i,j)$ within the graph. This rewires a graph to randomize connections while preserving the exact degree sequence.
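The swap procedure summarized above can be illustrated with a short, standard-library-only sketch. It is not graph-tool's implementation: the function names are hypothetical, and the random choice of which endpoint of the second edge counts as its "target" is an assumption introduced here to handle undirected edges.

```python
import random


def rewire_once(edges, rng):
    """One sweep of degree-preserving target swaps over an undirected edge list."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for a in range(len(edges)):
        b = rng.randrange(len(edges))
        if a == b:
            continue
        i, j = edges[a]
        ip, jp = edges[b]
        # Assumption: orient the second (undirected) edge at random.
        if rng.random() < 0.5:
            ip, jp = jp, ip
        if i == jp or ip == j:
            continue  # swap would create a self-loop
        new_a, new_b = frozenset((i, jp)), frozenset((ip, j))
        if new_a in present or new_b in present:
            continue  # swap would create a parallel edge
        present.discard(frozenset(edges[a]))
        present.discard(frozenset(edges[b]))
        present.update((new_a, new_b))
        edges[a], edges[b] = (i, jp), (ip, j)
    return edges


def degree_sequence(edges, n):
    """Degree of each of the n nodes for a given undirected edge list."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


# Toy input: a 10-node ring, in which every node has degree two.
ring = [(k, (k + 1) % 10) for k in range(10)]
rewired = rewire_once(ring, random.Random(0))
```

Because every accepted swap replaces $(i,j)$ and $({i}^{\prime},{j}^{\prime})$ with $(i,{j}^{\prime})$ and $({i}^{\prime},j)$, each of the four nodes keeps exactly the same number of incident edges, so `degree_sequence(rewired, 10)` equals that of the original ring.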

For the distribution of Figure 2, we generated 10,000 randomly-rewired graphs with the same degree distribution as the C. elegans gap junction network, yielding the displayed distribution with a mean and standard deviation of $\Psi ({G}_{rand})=0.001173\pm 0.000015$. As we iteratively eliminated edges, we wished to continue comparing the reduced complexity against what we would expect at random. At each step n, we therefore randomly rewired the partially deleted graph ${A}^{(n)}$ to generate 256 graphs with the same (partially deleted) degree distribution.
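A comparison of this kind, expressing an observed complexity as a number of standard deviations from the mean of a rewired ensemble, might be computed as in the sketch below. The function name is hypothetical, the use of the sample (rather than population) standard deviation is an assumption, and the numbers in the example are purely illustrative and are not values from this study.

```python
from statistics import mean, stdev


def complexity_z_score(observed, null_samples):
    """Standard deviations separating `observed` from the null-ensemble mean."""
    mu = mean(null_samples)
    sigma = stdev(null_samples)  # sample standard deviation (assumption)
    return (observed - mu) / sigma


# Illustrative numbers only: a tiny made-up null ensemble.
example_null = [1.0, 2.0, 3.0]
example_z = complexity_z_score(4.0, example_null)
```

In practice `null_samples` would hold the Ψ values of the rewired graphs at a given deletion step, and the score would be recomputed at every step n against its own ensemble.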

#### 4.6. C. elegans Rich Club

The “rich club” is defined by the rich club coefficient $\Phi (k)$, which measures the connection probability between nodes of degree greater than k. For some degree k, define ${N}_{>k}$ as the number of nodes with degree greater than k, and ${M}_{>k}$ as the number of connections between said nodes. The rich club coefficient is then just the ratio between the actual number of connections ${M}_{>k}$ and the number of possible connections [24, 28, 41]:
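For an undirected binary graph without self-loops, the number of possible connections among ${N}_{>k}$ nodes is ${N}_{>k}({N}_{>k}-1)/2$, so the ratio described here would read $\Phi (k)=2{M}_{>k}/\left({N}_{>k}({N}_{>k}-1)\right)$. The following minimal sketch computes this quantity under that assumption; the function name and the toy graph are hypothetical and are not taken from the cited works.

```python
def rich_club_coefficient(A, k):
    """Phi(k) for a symmetric 0/1 adjacency matrix without self-loops.

    Returns None when fewer than two nodes have degree greater than k,
    since the ratio is then undefined.
    """
    n = len(A)
    degree = [sum(row) for row in A]
    rich = [i for i in range(n) if degree[i] > k]
    if len(rich) < 2:
        return None
    # Count each undirected connection among the rich nodes once.
    links = sum(A[i][j] for a, i in enumerate(rich) for j in rich[a + 1:])
    return 2.0 * links / (len(rich) * (len(rich) - 1))


def toy_graph():
    """Triangle 0-1-2 in which each triangle node also carries one leaf (3, 4, 5)."""
    A = [[0] * 6 for _ in range(6)]
    for i, j in [(0, 1), (0, 2), (1, 2), (0, 3), (1, 4), (2, 5)]:
        A[i][j] = A[j][i] = 1
    return A


phi_1 = rich_club_coefficient(toy_graph(), 1)
phi_0 = rich_club_coefficient(toy_graph(), 0)
```

In the toy graph the three degree-three nodes are fully interconnected, so the coefficient at k = 1 is 1, whereas at k = 0 all six nodes are included and only 6 of the 15 possible connections exist.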

Towlson et al. [24] found that the C. elegans Connectome has a significant rich club coefficient for degrees $35\le k\le 73$, which implicates the following 11 nodes as belonging to the rich club: AVAL/R, AVBL/R, AVDL/R, AVEL/R, PVCL/R, and DVA. It is important to note that this is the set of neurons which we use as the “rich club” of the network, and we do not recalculate the rich club based on our particular graph; Towlson et al. calculate the rich club set from a binary form of the synaptic network, whereas we consider a binary form of the gap junction network. We refer to this set of neurons due to its known biological significance.