The Graph of Our Mind

Over the last two decades, graph theory has penetrated sociology, molecular biology, genetics, chemistry, computer engineering, and numerous other fields of science. One of its more recent areas of application is the study of the connections of the human brain. With the development of diffusion magnetic resonance imaging (diffusion MRI), it is possible today to map the connections between the 1–1.5 cm² regions of the gray matter of the human brain. These connections can be viewed as a graph. We have computed 1015-vertex graphs with thousands of edges for hundreds of human brains from one of the highest quality data sources: the Human Connectome Project. Here we analyze the male and female braingraphs graph-theoretically and show statistically significant differences in numerous parameters between the sexes: the female braingraphs are better expanders, have more edges, larger bipartition widths, and larger vertex covers than the braingraphs of the male subjects. These parameters are closely related to the quality measures of highly parallel computer interconnection networks: the better expanding property, the large bipartition width, and the large vertex cover characterize high-quality interconnection networks. We use the data of 426 subjects and demonstrate statistically significant (corrected) differences in 116 graph parameters between the sexes.


Introduction
It is an old dream to describe the neuronal-level braingraph (or connectome) of different organisms, where the vertices correspond to the neurons and two neurons are connected by an edge if there is a connection between them. The connectome of the roundworm Caenorhabditis elegans with 302 neurons was mapped 30 years ago [1], but larger braingraphs, especially the complete fruit fly Drosophila melanogaster braingraph (the "flybrain") with approximately 100,000 neurons, remained unmapped in their entirety, despite enormous resources and efforts worldwide. Mapping the connections in the human brain on the neuronal level is completely hopeless today, mostly because there are, on average, 86 billion neurons in the human brain [2]. Constructing human braingraphs (or "connectomes"), where the vertices are not single neurons but much larger areas of the gray matter of the brain (called Regions of Interest, ROIs), is possible, and it is the subject of very intensive research today. Two vertices, corresponding to ROIs, are connected by an edge if a diffusion-MRI based workflow finds neuronal connections between them. In the course of the Human Connectome Project [3], an enormous amount of data and numerous tools related to the mapping of the human brain were created, and the resulting data were deposited in publicly available databases of dozens of terabytes.
Our focus in this work is the graph theoretical analysis of the connections of the brain; consequently, we just sketch the process of the construction of this graph here.
The human brain tissue, roughly, has two distinct parts: the white matter and the gray matter. The gray matter, with some simplification, consists of the cell bodies (or somas) of the neurons, and the white matter consists of the fibers of axons (long projections from the somas), insulated by lipid-like myelin sheaths. The cortex of the brain, and also some sub-cortical areas, contain gray matter, and most of the inner parts of the brain contain white matter. Again with some simplification, the connections between the somas of the neurons, the axons, run in the white matter, except for the very short axons that run entirely in the gray matter.
Diffusion magnetic resonance imaging (MRI) is, again roughly speaking, capable of measuring the direction of the diffusion of water molecules in living tissues, without any contrast agent. The gray matter of the brain consists of the cell bodies (somas) of the neurons; consequently, there is no distinguished direction of the diffusion of the water molecules in the somas: the molecules can move freely in each direction. In the white matter, however, the neuronal fibers consist of long axons, so the water molecules move more easily, and with higher probability, in the direction of the axons than perpendicularly, through the cell membrane bordering the axons. Therefore, at each point of a given axon in the white matter, the diffusion of the water molecules is larger in directions parallel to the axon and smaller in other directions.
This way one can distinguish the white matter and the gray matter of the brain (this step is called partitioning). Moreover, by following or tracking the directions of the stronger diffusion, it is possible to map the orbits of the neuronal fibers in the white matter (this step is called tractography). Certainly, where fiber tracts cross, it is not easy to follow the correct directions of the axons.
After the tractography is performed, one gets an image similar to Figure 1. Most of the fibers start and end on the surface (the cortex) of the brain.
We are interested in the connections between the gray matter areas, mostly the cortical areas, and we ignore the exact orbits of the neuronal fibers in the white matter. That is, we are not interested in where the "wires" run, only in the fact that separate areas of the gray matter are connected. Naturally, the length or the number of the neuronal fibers connecting the gray matter ROIs can be included in the graph as different weight functions on the edges.
Consequently, we define the graph as follows: the vertices are the small anatomical areas of the gray matter (ROIs), and two ROIs are connected by an edge if, in the tractography phase, at least one fiber is tracked between these two ROIs. We consider five different resolutions of ROIs, and also five different weight functions, computed from the properties of the fibers connecting the ROIs.

Previous work
Numerous publications cover the connectome [4,5] of healthy humans [6,7,8,9] and also the connectomes of the healthy and the diseased brain [10,11,12,13,14]. Usually, these works analyze only 80-100 vertex graphs on the whole brain, and they use concepts that originate from network science, developed for large graphs of millions of vertices, such as the graph of the World Wide Web.
Here we present another approach: we analyze larger graphs, with up to 1015 vertices, and our algorithms originate from graph theory and not from network science. In other words, we also compute graph parameters that are quite hopeless to compute for graphs with millions of vertices.
In our previous work, we have made comparisons between the braingraphs of numerous subjects with several focuses: (i) We have constructed the Budapest Reference Connectome Server (http://connectome.pitgroup.org), which generates the common edges of up to 477 graphs of 1015 vertices, according to selectable parameters [15,16]. The Budapest Reference Connectome Server, apart from the common-edge demonstration, is also a good tool for the instant visualization of the braingraph.
(ii) We have compared the diversity of the edges in distinct cerebral areas in 392 individual brains in [17]; (iii) Based on a feature of the Budapest Reference Connectome Server, we have found a probable connection between the consensus connectomes and the axon-development of the human brain [18,19,20,21].
(iv) We have described the most frequent small subgraphs of the human braingraph in [22]. In [23] we have listed the most frequent complete subgraphs of the human connectome. In [24,25] we have introduced the method of Frequent Network Neighborhood Mapping and applied it to the neighbors of the hippocampus, one of the most important small functional entities of the brain.
(v) We have compared women's and men's connectomes in 96 subjects in [26], and found that the braingraphs of females have numerous, statistically significant advantages in graph theoretical properties that are characteristic of the better connections. We have found 13 parameters in which the difference remained significant after the very strict Holm-Bonferroni statistical correction [27].
In the present work, we have found 116 graph parameters (vs. the 13 parameters in [26]), which differ significantly between the sexes, after the Holm-Bonferroni correction.

Results
In the present work we are considering a 426-subject dataset from the Human Connectome Project public release [3]. For each subject, we compute braingraphs with five different vertex-numbers: 83, 129, 234, 463 and 1015. The vertices correspond to anatomical areas of the gray matter in different resolutions.
The goal is to assign the same named vertex to the same anatomical region for all subjects. Naturally, the size and the shape of the brain differ between subjects; therefore, it is a non-trivial task to assign the same nodes (or ROIs) to the same anatomical regions for all subjects. This is called the "registration problem", and we have applied the solution from the FreeSurfer suite of programs [28], which maps the individual brains to the Desikan-Killiany brain atlas [29]. Roughly, the registration method applies homeomorphisms in order to match the major cortical patterns of sulci and gyri between individual cortices.
We used five different resolutions with 83, 129, 234, 463 and 1015 vertices, because for smaller values the graph structure is poorer, and for the higher resolutions registration errors become possible, due to the potentially too small areas corresponding to the vertices. Therefore, we have computed and analyzed the graph properties for all five resolutions, instead of deciding arbitrarily that one of the resolutions is the best for our goals.
For every graph, we have considered five different edge weights. Four of these describe quantities related to the neuronal fibers that define the edge in question. More exactly, the weight functions are:
• Unweighted: Each edge has the same weight 1;
• FiberN: The number of fibers discovered in the tractography step between the nodes corresponding to the ROIs;
• FAMean: The average of the fractional anisotropies [30] of the neuronal fibers connecting the endpoints of the edge;
• FiberLengthMean: The average length of the fibers connecting the endpoints of the edge;
• FiberNDivLength: The number of fiber tracts connecting the end-nodes, divided by the mean length of those fibers.
The last weight function, FiberNDivLength, describes a conductance-like quantity in a very simple electrical model: the resistance is proportional to the average fiber length and inversely proportional to the number of wires connecting the endpoints. Similarly, it also describes a reliability measure of the edge: longer fibers are less reliable due to tractography errors, but multiple fibers between the same ROIs increase the reliability.
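The five weight functions can be illustrated with a minimal sketch that aggregates tracked fibers into weighted edges. The record layout `(roi_a, roi_b, length, fa)`, the function name `build_braingraph` and the toy fibers are assumptions made for this illustration only; they are not the data format of the actual workflow.

```python
from collections import defaultdict


def build_braingraph(fibers):
    """Aggregate tracked fibers into weighted, undirected edges between ROIs.

    `fibers` is an iterable of (roi_a, roi_b, length, fa) tuples; this
    record layout is hypothetical and chosen only for the sketch.
    """
    grouped = defaultdict(list)
    for roi_a, roi_b, length, fa in fibers:
        if roi_a == roi_b:
            continue  # fibers starting and ending in the same ROI give loops
        edge = (min(roi_a, roi_b), max(roi_a, roi_b))  # undirected edge key
        grouped[edge].append((length, fa))

    graph = {}
    for edge, items in grouped.items():
        n = len(items)
        mean_length = sum(length for length, _ in items) / n
        mean_fa = sum(fa for _, fa in items) / n
        graph[edge] = {
            "Unweighted": 1,
            "FiberN": n,
            "FAMean": mean_fa,
            "FiberLengthMean": mean_length,
            # conductance-like weight: many short fibers give a large value
            "FiberNDivLength": n / mean_length,
        }
    return graph


# Toy input with made-up values
fibers = [
    ("A", "B", 40.0, 0.5),
    ("B", "A", 60.0, 0.3),
    ("A", "C", 20.0, 0.4),
    ("C", "C", 5.0, 0.2),  # loop, not kept
]
graph = build_braingraph(fibers)
```

In this toy input the edge between A and B is supported by two fibers of mean length 50, so its FiberNDivLength weight is 2/50, while the single shorter fiber between A and C gives 1/20.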
Other authors have considered the number of edges (weighted or unweighted) running between pre-defined areas of the brain. One of the main focuses of these works was the ratio of the number of edges running between the two hemispheres of the brain to the number of edges running within each hemisphere [31,32]. The authors of [32] considered 95-node graphs, computed from 949 subjects of a publicly unavailable dataset, and found that, relatively, males have more intra-hemispheric edges while females have more inter-hemispheric edges.
We were interested, instead of simple edge-counting between pre-defined vertex-sets, in computing much more elaborate graph-theoretic parameters of the braingraphs.
More exactly, we have computed the following parameters for each graph, similarly as in [26,33]:
• Number of edges (Sum): The weighted version of this number is the sum of the weights of the edges in the graph.
• Normalized largest eigenvalue (AdjLMaxDivD): The largest eigenvalue of the generalized adjacency matrix, divided by the average degree of the graph. The adjacency matrix of an $n$-vertex graph is an $n \times n$ matrix, where $a_{ij}$ is 1 if $\{v_i, v_j\}$ is an edge, and 0 otherwise. The generalized adjacency matrix contains the weight of the edge $\{v_i, v_j\}$ in $a_{ij}$. The division by the average degree of the vertices is important since the largest eigenvalue is bounded by the average and the maximum degrees [34], so a dense graph has a large largest eigenvalue $\lambda_{\max}$ because of its larger average degree. Since the vertex numbers are fixed, the average degree is already defined by the sum of weights for each graph.
• Eigengap of the transition matrix (PGEigengap): The transition matrix $P_G$ is defined by dividing the rows of the generalized adjacency matrix by the generalized degree of the node, where the generalized degree is the sum of the weights of the edges incident to the vertex. A random walk on a graph can be characterized by the probabilities, for each $i$ and $j$, of moving from vertex $v_i$ to vertex $v_j$. These probabilities are the elements of the transition matrix $P_G$, with all the row-sums equal to 1. The eigengap is the difference between the largest and the second largest eigenvalue of $P_G$, and it is characteristic of the expander property of the graph: the larger the gap, the better expander the graph is (see [35]).
• Hoffman's bound (HoffmanBound): If $\lambda_{\max}$ and $\lambda_{\min}$ denote the largest and smallest eigenvalues of the adjacency matrix, then Hoffman's bound is defined as $1 + \lambda_{\max} / |\lambda_{\min}|$. This quantity is a lower bound for the chromatic number of the graph.
• Logarithm of the number of the spanning forests (LogAbsSpanningForestN): The number of spanning trees in a connected graph can be computed from the spectrum of its Laplacian [36,37]. Graphs with more edges usually have more spanning trees, since the addition of an edge does not decrease the number of spanning trees. For non-connected graphs, the number of spanning forests is the product of the numbers of the spanning trees of their components. The quantity LogAbsSpanningForestN is defined to be the logarithm of the number of spanning forests in the unweighted case. For other weight functions, if we define the weight of a tree as the product of the weights of its edges, then LogAbsSpanningForestN equals the sum of the logarithms of the weights of the spanning trees in the forests.
• Balanced minimum cut, divided by the number of edges (MinCutBalDivSum): If the nodes of a graph are partitioned into two classes, then a cut is the set of the edges running between these two classes. When we are looking for a minimum cut in a graph, most frequently one of the classes is small (say, it contains just one vertex) and the other contains all the remaining vertices. Therefore, the most interesting case is when the sizes of the two classes of the partition differ by at most one. Finding such a partition with the smallest cut is the "balanced minimum cut" or the "minimal bisection width" problem. This quantity, in a certain sense, describes the "bottleneck" of the graph, and it is an important characteristic of the interconnection networks (like the butterfly, the cube connected cycles, or the De Bruijn network, [38]) in computer engineering. For the whole brain graph, one may expect that the minimum cut corresponds to the partition into the two hemispheres, and this is what we found when we analyzed the results. Consequently, this quantity is interesting within the hemispheres, when only the nodes of the right or the left hemisphere are partitioned into two classes of equal size. Computing the balanced minimum cut is NP-hard [39], but its computation for the input-sizes of this study is possible with contemporary integer programming software. If we double every edge in a graph (allowing two edges between two vertices), then the minimum balanced cut will also be doubled. So, it is natural to expect that graphs with more edges may have a larger minimum balanced cut simply because more edges are present. However, if we normalize the balanced minimum cut (i.e., divide it by the number of the edges in the graph examined), then this effect can be factored out: for example, in the doubled-edge graph the balanced minimum cut is also doubled, but when its size is divided by the doubled edge number, the normalized value will be the same as in the original graph. So, when MinCutBalDivSum is considered, the effects of the edge-numbers are factored out.
• Minimum weighted vertex cover (MinVertexCover): We need to assign to each vertex a non-negative weight such that, for each edge, the sum of the weights of its two endpoints is at least 1. This is the relaxation of the NP-hard vertex-cover problem [41], since here we allow fractional weights, too. The sum of all vertex-weights with this constraint can be minimized in polynomial time by linear programming.
• Minimum vertex cover (MinVertexCoverBinary): Same as the quantity above, but the weights need to be 0 or 1. Alternatively, this number gives the size of the smallest vertex-set such that each edge is incident to at least one of the vertices in the set. Computing this graph parameter is NP-hard, and we computed it only for the unweighted case with the integer programming (IP) solver SCIP (http://scip.zib.de) [42,43].
• Maximum matching (MaxMatching): A graph matching is a set of edges without common vertices. A maximum matching contains the largest number of edges. A maximum matching in a weighted graph is the matching with the maximum sum of weights taken on its edges.
• Maximum fractional matching (MaxFracMatching): The linear-programming relaxation of the maximum matching problem. In the unweighted case, non-negative values $x(e)$ are searched for each edge $e$ in the graph, satisfying that for each vertex $v$ in the graph, the sum of the values $x(e)$ over the edges incident to $v$ is at most 1. The maximum of the sum $\sum_e x(e)$ is the maximum fractional matching for a graph. For the weighted version with weight function $w$, $\sum_e x(e) w(e)$ needs to be maximized.
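The three spectral parameters in the list above can be sketched in a few lines. The sketch below is only an illustration, not the code used for the study: the function name `spectral_parameters` and the toy graphs are assumptions, NumPy is assumed to be available, and Hoffman's bound is taken in its standard form $1 + \lambda_{\max}/|\lambda_{\min}|$. The input is assumed to be a symmetric, non-negative weight matrix without isolated vertices.

```python
import numpy as np


def spectral_parameters(weights):
    """Return (AdjLMaxDivD, PGEigengap, HoffmanBound) for a symmetric,
    non-negative weight matrix with no isolated vertices (assumed)."""
    a = np.asarray(weights, dtype=float)
    degrees = a.sum(axis=1)                 # generalized degrees
    adj_eigs = np.linalg.eigvalsh(a)        # eigenvalues in ascending order
    lam_min, lam_max = adj_eigs[0], adj_eigs[-1]

    # P_G = D^-1 A is similar to the symmetric matrix D^-1/2 A D^-1/2,
    # so both matrices have the same (real) eigenvalues.
    scale = 1.0 / np.sqrt(degrees)
    walk_eigs = np.linalg.eigvalsh(a * np.outer(scale, scale))

    adj_lmax_div_d = lam_max / degrees.mean()
    pg_eigengap = walk_eigs[-1] - walk_eigs[-2]
    hoffman_bound = 1.0 + lam_max / abs(lam_min)
    return adj_lmax_div_d, pg_eigengap, hoffman_bound


# Toy graphs: the unweighted 4-cycle and the complete graph on 4 vertices
c4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
k4 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
c4_params = spectral_parameters(c4)
k4_params = spectral_parameters(k4)
```

For the complete graph on four vertices the bound equals 4, which matches its chromatic number; for the bipartite 4-cycle it equals 2.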
The above parameters were computed for all five resolutions and the left and the right hemispheres and also for the whole connectome, with all five weight functions (with the following exceptions: MinVertexCoverBinary was computed only for the unweighted case, and the MinSpanningTree was not computed for the unweighted case).
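The effect of the normalization in MinCutBalDivSum can be checked on a toy graph. The sketch below finds the balanced minimum cut by exhaustive search, which is a stand-in for the integer programming approach and is feasible only for very small graphs; the function names and the example graph (two triangles joined by one edge) are assumptions made for this illustration.

```python
from itertools import combinations


def min_balanced_cut(vertices, edges):
    """Smallest cut weight over all partitions whose two class sizes differ
    by at most one. `edges` maps (u, v) pairs to weights. Exhaustive search,
    exponential in the number of vertices."""
    vertices = list(vertices)
    half = len(vertices) // 2
    best = None
    for side in combinations(vertices, half):
        side = set(side)
        # an edge is in the cut if exactly one endpoint lies in `side`
        cut = sum(w for (u, v), w in edges.items() if (u in side) != (v in side))
        if best is None or cut < best:
            best = cut
    return best


def min_cut_bal_div_sum(vertices, edges):
    """Balanced minimum cut divided by the total edge weight."""
    return min_balanced_cut(vertices, edges) / sum(edges.values())


# Two triangles joined by a single edge (toy example)
nodes = range(6)
edges = {(0, 1): 1, (1, 2): 1, (0, 2): 1,
         (3, 4): 1, (4, 5): 1, (3, 5): 1,
         (2, 3): 1}
doubled = {edge: 2 * w for edge, w in edges.items()}

cut_original = min_balanced_cut(nodes, edges)
cut_doubled = min_balanced_cut(nodes, doubled)
ratio_original = min_cut_bal_div_sum(nodes, edges)
ratio_doubled = min_cut_bal_div_sum(nodes, doubled)
```

Doubling every edge doubles the balanced minimum cut, while the normalized value stays the same, as argued in the description of MinCutBalDivSum.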
The results, for each subject, each resolution and each weight function are detailed in a large Excel table, downloadable from the site http://uratim.com/bigtableB.zip.

The syntax of the results:
Each parameter-name in the table at http://uratim.com/bigtableB.zip and elsewhere in this work contains two separating "_" symbols that define three parts of the name. The first part describes the hemisphere or the whole connectome with the words Left, Right or All. The second part describes the parameter computed, and the third part the weight function used. For example, All_AdjLMaxDivD_FiberNDivLength means that the normalized largest eigenvalue AdjLMaxDivD was computed for the whole brain, with the FiberNDivLength weight function (see above).
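A parameter name of this form can be split mechanically. The following sketch assumes that the separating symbol is the underscore character and that none of the three parts contains it; the function name `parse_parameter_name` is hypothetical.

```python
def parse_parameter_name(name, sep="_"):
    """Split a name such as 'All_AdjLMaxDivD_FiberNDivLength' into its parts.

    The underscore separator is an assumption of this sketch.
    """
    region, parameter, weight = name.split(sep)  # raises ValueError unless 3 parts
    if region not in ("Left", "Right", "All"):
        raise ValueError("unexpected first part: " + region)
    return {"region": region, "parameter": parameter, "weight": weight}


parsed = parse_parameter_name("All_AdjLMaxDivD_FiberNDivLength")
```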
In the table at http://uratim.com/bigtableB.zip, the first column, the round index, is used in the statistical analysis. The second column, "id", is the anonymized subject ID of the Human Connectome Project's 500-subject public release. Column 3 gives the sex of the subject, 0: female, 1: male. The fourth column gives the age-groups 0: 22-25 years; 1: 26-30 years; 2: 31-35 years; 3: 35+ years. Column 5 gives the number of vertices of the graph analyzed.

Discussion
The data that we used from the public release of the Human Connectome Project contains diffusion MRI recordings from healthy male and female subjects of age 22 through 35. Therefore, if we want to find correlations of the graph theoretical characteristics of the connectomes with some biological properties, we may easily use either the sex or the age of the subjects.
Our main finding now, on a large data set, validates our earlier results, obtained on a much smaller data set in [26]: in numerous graph theoretical parameters, women's connectomes show statistically significant advantages over the men's respective parameters. The parameters in question are related to "better connectivity" in several aspects.
In the supporting material, we are enclosing several large tables with the results. In Table 1, the results of the statistical analysis are detailed: the parameters with a bold last column all differ significantly between the female and the male connectomes; the vast majority is "better" for the females. If the last column is not bold, but the fifth column is typeset in italic, then those parameters, one-by-one, differ significantly between the sexes, but it is unlikely that all of them differ significantly (type II statistical errors are possible).
For example, as seen in Table 1, differences in the PGEigengap values show the better expander property in the braingraph of the females, in both hemispheres. The differences in the Sum quantity show that in both hemispheres, women have more edges than men, and this statement remains true for weighted edges with most weight functions. Very strong statistical evidence shows the difference and the women's advantage in the edge-number normalized balanced minimum cut in the left hemisphere. Matching numbers (both fractional and integer) are also significantly larger in the case of females.
Seemingly, in the left hemisphere the women's advantage is stronger in several parameters: the first several rows of Table 1 contain mostly "Left" or "All" prefixes in the second column.
In very few cases men have better parameters: e.g., in resolution 83, All_MinSpanningForest_FiberLengthMean is significantly larger for men than for women. Similarly, the same FiberLengthMean-weighted parameter, All_MinSpanningForest_FiberLengthMean, is also larger for males in the 234-resolution. We believe that the larger brain size, with the FiberLengthMean weighting, compensates for the fewer connections of the males in these cases.
In the supporting material, we are also enclosing Tables 2, 3, 4, 5 and 6 that give the detailed averaged results for each resolution for each graph parameter with ANOVA statistical analysis. The subject-level data are also available at http://uratim.com/bigtableB.zip.

Materials and Methods
We have used the Connectome Mapper Toolkit [44] (http://cmtk.org) for brain tissue segmentation into gray and white matter, partitioning the brain into anatomical regions, for tractography (tracking the axonal fibers in the white matter) and for the construction of the graphs from the fibers identified in the tractography phase of the workflow. The partitioning was based on the FreeSurfer suite of programs [28], according to the Desikan-Killiany brain anatomy atlas [29]. The tractography used the MRtrix processing tool [45] with randomized seeding and with the deterministic streamline method.
The graphs were constructed using the results of the tractography step: two nodes, corresponding to ROIs, were connected if at least one fiber connected them. Loops were deleted from the graph.
The assembled graphs that we analyzed in the present work can be downloaded at the site http://braingraph.org/download-pit-group-connectomes/. The individual graph results are detailed in a large Excel table at the site http://uratim.com/bigtableB.zip.

Statistical analysis
Our statistical null-hypothesis [46] was that the graph parameters do not differ between males and females. For dealing with both type I and type II statistical errors, we have partitioned the subjects into classes quasi-randomly: subjects whose IDs have even digit-sums went to group 0, and those with odd digit-sums went to group 1 (cf. the first column of http://uratim.com/bigtableB.zip).
We used group 0 as a base set, for making hypotheses, and group 1 as a holdout set, for testing those hypotheses. The hypotheses on group 0 were filtered by "Analysis of variance" (ANOVA) [47]: only the hypotheses with a p-value of less than 1% were selected for testing in the holdout set. Next, the selected hypotheses were tested on group 1, with the rather strict Holm-Bonferroni correction method [27]. The significance level in the Holm-Bonferroni correction was set to 5%.

Handling possible artifacts
While we have applied the same computational workflow to the data of both sexes, it is still possible that some non-sex-specific artifact caused the significant differences in the graph parameters between male and female subjects. One possible cause may be the statistical difference between the brain sizes of the sexes [48]. In the tractography step, it may happen that the longer neural fibers of the males cannot be tracked as reliably as the shorter fibers of the females. To rule out this possible error, we have selected 36 small-brain males and 36 large-brain females such that all the females have larger brains than all the males in the data set [33]. Next, we have computed the graph theoretical parameters as in the present work. Our two main findings were: (i) the small-brain men did not have the advantages identified in the set of the women in the present study; (ii) in several parameters, mostly with the weight function FAMean, women still have the statistically significant advantages identified in the present study.
We consider this result decisive evidence that the graph-theoretical differences in the connectomes are due to sex differences and not size differences.

Figure 1: The result of the tractography phase. Note the fibers starting and ending on the outer surface, the cortex of the brain, which consists of gray matter. The fibers are tracked only in the anisotropic white matter.