Network-Based Integrative Analysis of Genomics, Epigenomics and Transcriptomics in Autism Spectrum Disorders

Current studies suggest that autism spectrum disorders (ASDs) may be caused by many genetic factors. In fact, collectively considering multiple studies aimed at characterizing the basic pathophysiology of ASDs, a large number of genes has been proposed. Addressing the problem of molecular data interpretation using gene networks helps to explain genetic heterogeneity in terms of shared pathways. Besides, the integrative analysis of multiple omics has emerged as an approach to provide a more comprehensive view of a disease. In this work, we carry out a network-based meta-analysis of the genes reported as associated with ASDs by studies that involved genomics, epigenomics, and transcriptomics. Collectively, our analysis provides a prioritization of the large number of genes proposed to be associated with ASDs, based on genes’ relevance within the intracellular circuits, the strength of the supporting evidence of association with ASDs, and the number of different molecular alterations affecting genes. We discuss the presence of the prioritized genes in the SFARI (Simons Foundation Autism Research Initiative) database and in gene networks associated with ASDs by other investigations. Lastly, we provide the full results of our analyses to encourage further studies on common targets amenable to therapy.


Introduction
Autism spectrum disorders (ASDs) are among the most common neurodevelopmental disorders. ASDs are characterized by impaired social interactions, repetitive behavior, and restricted interests, and they are often comorbidities with other conditions such as epilepsy, mental retardation, inflammation, and gastrointestinal disorders. Despite the fact that the high heritability of ASDs is well established, the exact underlying causes are unknown in at least 70% of the cases [1]. Large genome-wide association studies (GWAS), Copy Number Variation (CNV) testing and genome sequencing yielded many non-overlapping genes, a fact that underlines the complex genetic heterogeneity of ASDs [1] and reflects the architecture of intracellular networks, in which several possible combinations of genetic variations are likely to lead to a common pathological phenotype [2,3].
The identification of the key molecular pathways that link many ASDs-causing genes is of prominent importance in developing therapeutic interventions [1]. In this context, network-based and pathway-based analyses provide functional explanations to non-overlapping genes and narrow the

Results
Firstly, we describe the results obtained regarding the genes associated with ASDs on the basis of genomics. We collected these genes from the SFARI Gene database [8], two recent large studies [23,24], and a series of previous studies summarized by Mosca et al. [7], for a total of 1133 genes (Table 1). Following the criteria adopted by SFARI, we distinguished between the genes with the strongest genomics evidence (334 genes, the "genomics-major" group) from the others (799, the "genomics-minor"). Table 1. Datasets considered in this study. Selected: The number of genes for which at least a high confidence interaction with any other gene is catalogued in the STRING database (see methods). G: Genomics; E: Epigenomics; T: Transcriptomics. ** major evidence; * minor evidence.

Type of Evidence
Description Subjects
-G Meta-analysis of GWAS of over 16,000 individuals with autism spectrum disorder [23]. 15,954 G Synaptic, transcriptional and chromatin genes disrupted in autism [24]. 13,808 E Case-control meta-analysis of blood DNA methylation and autism spectrum disorder [10].
1654 416 1444 272 955 T Gene expression profiling differentiates autism case-controls and phenotypic variants of autism spectrum disorders [25]. T Blood gene expression signatures distinguish autism spectrum disorders from controls [15]. 285 T Disrupted functional networks in autism underlie early brain mal-development and provide accurate classification. [26].

147
T Gene expression in blood of children with autism spectrum disorder [27]. 47 Subsequently, we could describe the multi-omics analysis in which we also considered evidences emerged in studies that focused on epigenomics [10] and transcriptomics [15,[25][26][27]. Additionally in these cases, we distinguished between the genes with the strongest evidences ("epigenomics-major" and "transcriptomics-major") from the others ("epigenomics-minor" and "transcriptomics-minor"), following the indications of the corresponding studies from which we collected the data (Table 1).

Genomics Analysis
Recently, the "omnigenic model" was proposed to explain the inheritance of complex diseases [3]. In this model, the genes whose genetic damage tend to have the strongest effects on disease risk are considered core genes, while those genes that have a minor impact on disease risk are designated as peripheral. The number of peripheral genes may be large as a consequence of the multiple ways in which these genes may interact with core genes throughout cell regulatory networks. Importantly, such classification may be on a graduated scale rather than simply binary [3].
In this context, ND provides an opportunity to define quantitatively the degree of peripherality of all genes in relation to "seed" genes, exploring all possible network paths among genes in intracellular networks. We applied ND on the human interactome (high confidence functional and biophysical interactions catalogued in STRING), considering as seeds: the core genes for ASDs, among which we included those classified in SFARI as "syndromic," "high confidence," "strong candidate," "suggestive evidence," and "syndromic minimal evidence," for a total of 334 genes; other 799 genes proposed to have a role in ASDs (Supplementary Table S1).
We found several genes with a significant network proximity to core genes ( Figure 1A, Supplementary Table S1). From a topological point of view, among these genes, we found both hubs (genes that establish many interactions, such as UBC, ubiquitin C; DYNC1H1, dynein cytoplasmic 1 heavy chain 1; and EP300, E1A binding protein p300) and genes with a lower number of connections (e.g., CHD2, chromodomain helicase DNA binding protein 2; NUDCD2, NudC domain containing 2; and SETD5, SET domain containing 5), which are nevertheless important for the information flow within the network ( Figure 1A).
within the network ( Figure 1A).
Interestingly, 13 genes obtained scores comparable to those of core genes ( Figure 1). These results indicate that these 13 genes closely interact with the core genes, and, in almost all cases, the number of interactions that these genes establish with the core genes is significant (  Figure S1). From now on, we will call the set of core genes and the 13 genes closely related to the core genes as "core+13." In the core+13 gene network, the 13 genes act as linkers between groups of core genes not directly connected with each other; for instance, WDR37 (WD repeat domain 37) links PACS1 (phosphofurin acidic cluster sorting protein 1) and PACS2 (phosphofurin acidic cluster sorting protein 2). The resulting largest connected component involves 204 genes, while the remaining 143 genes are mostly isolated or form very small modules of two or three genes.
We checked whether any of these 13 genes, currently not included in the highest categories of SFARI, are nevertheless classified in other categories corresponding to a lower degree of evidence or have been reported in other network-based analyses of ASDs data. We found that six genes belong to the categories designated as "minimal evidence" or "hypothesized but untested," and eight genes were proposed as part of gene networks associated with ASDs ( Table 2).
The association with ASDs for 12 of the 13 genes is supported at genomic level. In addition, HCN4 (hyperpolarization activated cyclic nucleotide gated potassium channel 4) was found with epigenetic modifications in a study of Andrews et al. [10], while PRKCA (protein kinase C alpha) was found both epigenetically modified [10] and differentially expressed [26]. WDR37 does not have supporting evidences at genomic level, but it was found differentially expressed [25]. Diffusion score (Xs) normalized by its empirical p-value (horizontal axis) and number of interactions (|I|, vertical axis); only genes with p < 0.05 are shown. (B) Connected components of "core+13" network. (A,B) Blue points: 13 genes of "core+13"; pink points: core genes; yellow points: Significant genes outside "core+13" genes; red border of points: Genes supported at transcriptomic and/or epigenetic levels. Diffusion score (Xs) normalized by its empirical p-value (horizontal axis) and number of interactions (|I|, vertical axis); only genes with p < 0.05 are shown. (B) Connected components of "core+13" network. (A,B) Blue points: 13 genes of "core+13"; pink points: core genes; yellow points: Significant genes outside "core+13" genes; red border of points: Genes supported at transcriptomic and/or epigenetic levels.
Interestingly, 13 genes obtained scores comparable to those of core genes ( Figure 1). These results indicate that these 13 genes closely interact with the core genes, and, in almost all cases, the number of interactions that these genes establish with the core genes is significant (Table 2, Supplementary Figure S1). From now on, we will call the set of core genes and the 13 genes closely related to the core genes as "core+13." In the core+13 gene network, the 13 genes act as linkers between groups of core genes not directly connected with each other; for instance, WDR37 (WD repeat domain 37) links PACS1 (phosphofurin acidic cluster sorting protein 1) and PACS2 (phosphofurin acidic cluster sorting protein 2). The resulting largest connected component involves 204 genes, while the remaining 143 genes are mostly isolated or form very small modules of two or three genes.
We checked whether any of these 13 genes, currently not included in the highest categories of SFARI, are nevertheless classified in other categories corresponding to a lower degree of evidence or have been reported in other network-based analyses of ASDs data. We found that six genes belong to the categories designated as "minimal evidence" or "hypothesized but untested," and eight genes were proposed as part of gene networks associated with ASDs (Table 2). Table 2. The 13 genes that closely interact with the core genes of ASDs. |I|: Number of interactors; |Ic|: Number of interactors that are core genes; p: Hypergeometric probability of observing |Ic| in a hypergeometric experiment; G: Genomics; E: Epigenomics; T: Transcriptomics; ** major; * minor; 0: No evidence; SFARI score: "minimal evidence" (4), "hypothesized but untested" (5); other modules: Reference of gene-networks studies of ASDs in which the gene is mentioned. The total number of genes considered is equal to the interactome size: 12,739 genes.  The association with ASDs for 12 of the 13 genes is supported at genomic level. In addition, HCN4 (hyperpolarization activated cyclic nucleotide gated potassium channel 4) was found with epigenetic modifications in a study of Andrews et al. [10], while PRKCA (protein kinase C alpha) was found both epigenetically modified [10] and differentially expressed [26]. WDR37 does not have supporting evidences at genomic level, but it was found differentially expressed [25].

Multi-Omics Analysis
We assessed the significance of the overlaps among the lists of genes associated with ASDs by genomics, epigenomics, and transcriptomics evidences. We observed significant overlaps between the list of genes from genomics and those supported by epigenomics or transcriptomics ( Table 3). The intersection among the three gene lists consists of 40 genes, 34 of which are included in the considered interactome (shortly "shared") ( Figure 2). Out of the shared genes, 26 do not interact directly with any other shared gene, while eight genes form three connected components composed of: DYNC1H1, TRAPPC6B (trafficking protein particle complex 6B), TRAPPC9 (trafficking protein particle complex 9) and CSNK1D (casein kinase 1 delta); GNAS (GNAS complex locus) and PRKCA (protein kinase C alpha); EP400 and TRRAP (transformation/transcription domain associated protein) (Supplementary Figure S2). the list of genes from genomics and those supported by epigenomics or transcriptomics ( Table 3). The intersection among the three gene lists consists of 40 genes, 34 of which are included in the considered interactome (shortly "shared") ( Figure 2). Out of the shared genes, 26 do not interact directly with any other shared gene, while eight genes form three connected components composed of: DYNC1H1, TRAPPC6B (trafficking protein particle complex 6B), TRAPPC9 (trafficking protein particle complex 9) and CSNK1D (casein kinase 1 delta); GNAS (GNAS complex locus) and PRKCA (protein kinase C alpha); EP400 and TRRAP (transformation/transcription domain associated protein) (Supplementary Figure S2). Table 3. Overlaps among the lists of genes associated with ASDs. G: Genomic; E: Epigenomics; T: Transcriptomics; ** major; * minor; core+13(E) and core+13(T) indicate genes belonging to the core+13 set and which are supported by E and T, respectively.  In order to find modules of functionally related genes supported by one or more types of evidences ("layers" from now on), we used ND (see methods) and obtained a final diffusion score that summarized the relevance of each gene in relation to its location in the interactome and its network proximity to other genes associated with ASDs in one or more layers (genomics, epigenomics, and transcriptomics). The higher the final diffusion score, the closer the gene to ASDs genes in one or more of the considered layers.
At the top of the resulting genome-wide ranking, we found genes with significant scores ( Figure  3, Supplementary Table S2). To assess whether these highly ranked genes formed significantly connected gene modules, we used network resampling [28] and found a multi-omics integrative gene module  Table S3). In addition to the 144 INT-MODULE genes occurring among the highest SFARI categories, we found that 10 genes are classified as "minimal evidence" and "hypothesized but untested" in SFARI (Table 4, In order to find modules of functionally related genes supported by one or more types of evidences ("layers" from now on), we used ND (see methods) and obtained a final diffusion score that summarized the relevance of each gene in relation to its location in the interactome and its network proximity to other genes associated with ASDs in one or more layers (genomics, epigenomics, and transcriptomics). The higher the final diffusion score, the closer the gene to ASDs genes in one or more of the considered layers.
At the top of the resulting genome-wide ranking, we found genes with significant scores (Figure 3, Supplementary Table S2). To assess whether these highly ranked genes formed significantly connected gene modules, we used network resampling [28] and found a multi-omics integrative gene module  Table S3). In addition to the 144 INT-MODULE genes occurring among the highest SFARI categories, we found that 10 genes are classified as "minimal evidence" and "hypothesized but untested" in SFARI (Table 4, Supplementary Figure S4), and seven of these 10 genes were reported by other network-based analyses ( Table 4). The INT-MODULE includes also LRRC46 (leucine rich repeat containing 46), the only gene of the module that does not occur in any of the input gene lists (Figure 3, Supplementary Figure S4).

RBM8A
RNA binding motif protein 8A 10 0 ** * 5 -SH3KBP1 SH3-domain kinase binding protein 1 12 * 0 ** 5 -   To functionally characterize the INT-MODULE, we partitioned its largest connected component (266 genes) in topological clusters and assessed both the enrichment of each cluster in terms of molecular pathways and the types of evidences associated with each cluster (Supplementary Tables S2, S4-S6). We explored several community detection strategies and found the highest modularity with a partition of 12 clusters ( Figure 4A, Supplementary Figure S5).  (Supplementary Tables  S2, S4-S6). We explored several community detection strategies and found the highest modularity with a partition of 12 clusters ( Figure 4A, Supplementary Figure S5).

Discussion
The integrative analysis of gene related evidence (e.g., DNA polymorphism or mutations, epigenetic changes, transcriptional variations) and gene-gene interaction evidences allows for the extraction of otherwise hidden patterns. In this context, it is worth noting that 68 genes that we proposed as relevant to ASDs in a previous network-based analysis of genetic data are now classified in SFARI (Supplementary Table S7).
In light of the utmost importance of jointly analyzing omics data and intracellular networks, Boyle et al. [3] proposed an explanation for the large number of genes that may be involved in a The two largest clusters are composed of 61 (cluster #8) and 53 (cluster #3) genes, and they are characterized by a similar proportion of supporting evidences ( Figure 4B). These two central clusters contain genes that are part of the same pathways, such as the Wnt signaling pathway (#8: q = 1.83 × 10 −2 ; #3: q = 1.79 × 10 −5 ) and IL-7 signal transduction (#8: q = 4.54 × 10 −2 ; #3: q = 6.02 × 10 −4 ), but they are also marked by specific pathways. In particular, among the pathways specifically enriched in cluster #8 and #3, we found chromatin organization (q = 1.27 × 10 −27 ) and signaling by VEGF (q = 5.41 × 10 −23 ), respectively. Cluster #7 (41 genes) is the most enriched in differentially expressed genes and significantly associated with pathways involved in cell cycle processes. Cluster #5 is mainly enriched in genes associated with epigenetic and transcriptional changes, and it is marked by mRNA splicing (q = 1.92 × 10 −12 ). Cluster #6 is particularly enriched in genes with epigenetic changes and associated it with extracellular matrix organization (q = 6.10 × 10 −5 ). Cluster #2 (nine genes) is supported at the genomics and epigenomics levels and is enriched in genes of the calcium signaling pathway (q = 4.88 × 10 −12 ). Lastly, clusters #11 and #4 are composed of genes associated with ASDs mainly at the genetic level, which, respectively, control the GABAergic synapse (#4: q = 3.90 × 10 −6 ) and encode for cell adhesion molecules (#11: q = 1.58 × 10 −5 ) active in the neuronal system.

Discussion
The integrative analysis of gene related evidence (e.g., DNA polymorphism or mutations, epigenetic changes, transcriptional variations) and gene-gene interaction evidences allows for the extraction of otherwise hidden patterns. In this context, it is worth noting that 68 genes that we proposed as relevant to ASDs in a previous network-based analysis of genetic data are now classified in SFARI (Supplementary Table S7).
In light of the utmost importance of jointly analyzing omics data and intracellular networks, Boyle et al. [3] proposed an explanation for the large number of genes that may be involved in a complex disease as the result of the highly interacting nature of molecular networks. Following the omnigenic model [3], we considered as core genes of ASDs those whose variations are highly scored in SFARI and quantified, by means of network proximity, the degree of peripherality of all other genes in relation to the core genes. This analysis led to the identification of 13 genes significantly connected with the core genes. The strong functional relationship we found between these 13 genes and the core genes suggests that even the former can play an important role in ASDs.
As for the 13 predicted genes ( Table 2) that closely interact with the core genes of ASDs, they mainly belong to different neuronal pathways and are especially involved in synaptic function and plasticity that, if impaired, could actively contribute to the pathogenesis of ASDs and/or to their comorbidities. Genes encoding for the ion channel were found among these genes, and the role of various ion channel gene defects (channelopathies) is known in the pathogenesis of ASDs. For instance, HCN2 and HCN4 belong to the hyperpolarization-activated cyclic nucleotide-gated (HCN) channels family, encoding for non-selective voltage-gated cation channels, and they are strongly expressed in the brain. These channels establish the slow native pacemaker currents contributing to membrane resting potentials, input resistance, dendritic integration, synaptic transmission, and neuronal excitability. Interestingly, it seems that SHANK3, strongly linked to ASDs, works in organization of HCN-channels [29] and that its expression negatively influences those of HCN2 [30], so variations in the SHANK3 gene are reflected in pacemaker current abnormalities. In addition, variants in HCN1, another member of the HCN family, were detected in patients with epileptic encephalopathy and clinical features of Dravet syndrome, intellectual disability, and autistic features [31].
Some of the predicted genes, such as EPB41 and EPB41L1, take part in cytoskeleton and synaptic structures. EPB41 is the founding member of the large family of proteins that associate with membrane proteins and cytoskeleton and in neurons is involved in protein-protein interactions at synaptic level. It interacts with NRXN1 and NRXN2, as well as NLGN1, -2, -3, and -4X. These proteins act at the presynaptic and post synaptic level and causative variations in NRXN1, -2 [32,33], as well and NLGN2 (also in core+13 gene set), -3, and -4X [34,35] have already been described in ASDs. Furthermore, EPB41L1 (highly expressed in the brain) and the ionotropic glutamate receptor GRIA1, were listed in the 13 predicted and in core genes, respectively, interact thus contributing to glutamate neurotransmission. An alteration of glutamate neurotransmission was found in ASDs. Interestingly, EPB41L1 is associated with mental retardation, deafness autosomal dominant 11 and autosomal dominant non-syndromic intellectual disability.
Then again, DLGAP2 is a member of the postsynaptic density proteins (as SHANK3), probably involved in molecular organization of synapses and signaling in neuronal cells, with implications in synaptogenesis and plasticity. In particular, DLGAP2 could be an adapter protein linking the ion channel to the sub-synaptic cytoskeleton. Animal models demonstrated that DLGAP2 has key role in social behaviors and synaptic functions [36]. Case studies also report rare DLGAP2 duplications in ASDs [37][38][39]. Then again, the DLGAP2 gene has an important paralog, DLGAP1, already associated with ASDs. DLGAP1 proteins interact with other ASDs-associated proteins such as DLG1, DLG4, SHANK1, SHANK2 and SHANK3 [18]. Moreover, the analysis of rare copy number variants in ASDs found numerous de novo and inherited events in many novel ASDs genes including DLGAP2 [22].
Among the 13 predicted genes, syntaxin-1A (STX1A) is also involved in synaptic signaling. This gene encodes for part of complex of proteins mediating fusion of synaptic vesicles with the presynaptic plasma membrane. A dysregulation of STX1A expression [40][41][42] has been reported in high functioning autism and Asperger syndrome. A significant association between three STX1A SNPs (Single Nucleotide Polymorhpisms) and Asperger syndrome was recently described. These SNPs could alter transcription factor binding sites both directly and through other variants in linkage disequilibrium [43].
The list of predicted genes includes GABRA5. It transcribes for the subunit 5 of GABA receptor alpha whose reduced expression and reduced protein level have been described in autism [44], and the SNPs of this gene are biomarkers of symptoms and developmental deficit in Han Chinese with autism [45]. The inclusion of this gene in the core list strengthens the evidences of imbalance between excitatory and inhibitory neurotransmission in ASDs and abnormalities in glutamate and GABA signaling as possible causative pathological mechanisms of ASDs.
Few of these predicted genes encode for proteins involved in non-neuronal specific signaling pathways, which are also important for ASDs: PRKCA, WDR37 and UBC. PRKCA regulates many signaling pathways such as cell proliferation, apoptosis, differentiation, tumorigenesis, angiogenesis, platelet function, and inflammation. A meta-analysis performed on the de novo mutation data of 10,927 individuals with neurodevelopmental disorders found an excess of missense variants in the PRKCA gene [46]. The WDR37 gene encodes a member of a protein family that is involved in many cellular processes such as cell cycle progression, signal transduction, apoptosis, and gene regulation. WDR37 is a nuclear protein ubiquitous expressed and particularly abundant in the cerebellum and whole brain. There are no direct evidences for ASDs development and WDR37-however, recently, it has been demonstrated that WDR47 shares functional characteristics with PAFAH1B1, which causes lissencephaly. PAFAH1B1 also constitutes a key protein-network interaction node with high-risk ASDs genes expressed in the synapse that can impact synaptogenesis and social behavior [47].
Our analysis confirms the importance of the X-linked gene in the aetiopathogenesis of ASDs. Mutations of CACNA1F (located at Xp11.23) mainly cause X-linked eye disorders. Since the role of various ion channel gene defects (channelopathies) in the pathogenesis of ASDs is becoming evident, the deep resequencing of these functional genomic regions has been performed. These studies revealed potentially causative rare variants contributing to ASDs in CACNA1F. Then again, CACNA1D, an important paralog of CACNA1F, displayed de novo missense variants in ASDs probands from the Simons Simplex Collection [48,49]. Moreover, the gene being X-linked could contribute to the sex bias of ASDs.
Out of the 13 genes tightly interconnected with the core, the occurrence in SFARI (six genes in "minimal evidence" or "hypothesized but untested"), the inclusion in networks associated with ASDs by other studies (eight genes) and the presence of epigenetics and/or transcriptional changes modified in ASDs patients vs controls (four genes), constitute further evidences in favor of these genes. A similar reasoning can be extended to peripheral genes, for which we proposed a graduated scale of relevance in relation to the core genes. To this aim, we provided the full results.
Overall, we observed a significant overlap between the lists of genes associated with ASDs by studies of genomics, as well as by studies of epigenomics and transcriptomics from blood samples, with a total of 40 genes supported by all the three types of evidences (of which 34 had high confidence functional interactions). We also observed that a significant number of the core+13 genes has been reported as epigenetically and/or transcriptionally modified in ASDs patients. The observation that different types of alterations refer to the same genes further stresses the role of these genes in ASDs. These results are in line with those of previous studies that suggested the potential role of genetic factors in contributing to DNA methylation differences in ASDs [10]. Moreover, blood-derived epigenetic changes observed in genes whose sequence variations are associated with ASDs are more likely to have a common function across tissues compared to those not related to genetic changes [17].
The existence of molecular relations between altered genes increases the likelihood that such alterations have a role in ASDs, suggesting molecular pathways that encompass such genes and functional relations among different types of alterations. Our analysis highlighted a network of 275 genes, which is strongly supported by genomics, epigenomics, and transcriptomics. Importantly, this network gathers 22 genes not directly linked to each other but supported by all three types of evidences. Interestingly, 157 of the INT-MODULE genes were proposed by other network-based studies on ASDs, different in terms of input data and analysis approach. A total of 144 genes belong to the highest scoring SFARI categories. Ten other genes of the network are currently classified in SFARI as "minimal evidence" and "hypothesized but untested" and are also supported by epigenomics and/or transcriptomics. Therefore, they deserve special attention among the genes of such categories.
The largest connected component of the network (275 genes) can be partitioned in 12 subgroups or topological sub-modules. This analysis suggests a different role of the sub-modules by function and by association with one or more types of alterations. For example, cluster #3, equally supported by all the three types of evidence, includes genes that belong to inflammatory mediator regulation of transient receptor potential (TRP) channels. Inflammation and immune system dysfunctions are in comorbidity with ASDs, and TRP canonical channel 6 (TRPC6) is emerging as a functional element for the control of calcium currents in immune-committed cells and target tissues, influencing leukocytes tasks. Interestingly, TRPC6 is also involved in neuronal development and variants in the TRPC6 gene (within core gene) were found in patients with ASDs. Moreover, MeCP2, a transcriptional regulator whose mutations cause Rett syndrome, was found abundant in a TRPC6 promoter region resulting a transcriptional regulator of this gene [50] TRPC6, in turn, activates neuronal pathways, including BDNF, CAMKIV, Akt, and CREB signaling pathways, also involved in ASDs [51].
The prioritization of genes in terms of causality is a relevant challenge, especially in a complex and multi-genic disorder like ASDs. Nevertheless, it is possible to distinguish, among the functional themes highlighted by our integrative analysis, possible causative pathways considering their function and/or alteration type. For example, genes of clusters #4 and #11 mainly display genetic alterations and participate, respectively, in neuronal cell adhesion and GABAergic synapses, pathways already associated with brain morpho-functional abnormalities in ASDs.
Our integrative analysis of the large number of genes reported by studies on ASDs that focused on genomics, epigenomics, and transcriptomics prioritized a series of genes interconnected by functional relations and associated with one or more types of molecular alteration. Since this rich information might unearth common targets that are amenable to therapy [1], we have provided the full results of our network-based meta-analyses.

Molecular Interactions
Molecular interactions were collected from the STRING database [52] for a total of 12,739 genes and 355,171 links with high confidence (score ≥ 700). Native identifiers were mapped to Entrez Gene [53] identifiers. In case multiple proteins mapped to the same gene identifier, only the pair of gene identifiers with the highest STRING confidence score was considered.
Genes belonging to the genomics-minor group were collected from Mosca et al. [7], in which genes associated with SNPs, mutations, and CNV emerging from several large studies were reported, the meta-analysis study of GWAS of over 16,000 individuals with ASDs [23], and the whole-exome sequencing study of rare coding variation in 3871 autism cases and 9937 ancestry-matched or parental controls [24]. Native gene identifiers were converted to Entrez Gene identifiers [53].

Epigenomics
Genes associated with ASDs at the epigenomics level were collected from a previous study [10] in which the authors performed a case-control meta-analysis of blood DNA methylation among two large case-control studies of autism (796 ASDs cases and 868 controls) using METAL software [54] on the probes that were present in both studies. All genes found by their meta-analysis with p < 10 −3 were assigned to epigenomics-major group, while the genes with 10 −3 ≤ p < 5 × 10 −3 were assigned to epigenomics-minor. Native gene identifiers were converted to Entrez Gene [53] identifiers.

Transcriptomics
Genes associated with ASDs at transcriptomics level were collected from the four studies [15,[25][26][27] reported in [55], in which the original authors generated blood-based gene expression profiles from microarray experiments with sample sizes greater than 40 and provided list of differentially expressed genes. Following the approach by [55], genes reported as differentially expressed in at least two studies were assigned to the transcriptomics-major group, while the other differentially expressed genes were assigned to the transcriptomics-minor group. Native gene identifiers were converted to Entrez Gene [53] identifiers, and only genes occurring in STRING network were considered in network-based analyses.

Gene Prioritization Based on Network Diffusion
Network diffusion (ND) was performed using an approach previously described [7,28,56,57]. A genes-by-layers input matrix X 0 = (x 1 , x 2 , x 3 ) was defined where each element x ij was set to: 1 if the gene i was member of a "-major" group in layer j; 0.5 if the gene i was member of a "-minor" group in layer j; and 0 if the gene was i was not associated with ASDs in layer j. ND was applied to X 0 using the genome-wide interactome represented by the symmetric normalized adjacency matrix W, according to the following iterative procedure: where α ∈ (0, 1) is a scalar that weights the relative importance of the two addends and was set to 0.7, a value that represents a good trade-off between diffusion rate and computational cost and determined consistent results in previous studies [7,28,[56][57][58]. The resulting matrix X ss , containing ND scores, was column-wise normalized by the maximum of each column, obtaining the matrix X * . Similarly to what was done by Ruffalo et al. [59], a final diffusion score d i was calculated for each gene i, multiplying the sum of its three scores x * i1 , x * i2 , x * i3 by the sum of the three averages y * i1 , y * i2 , y * i3 obtained considering the top 3 direct neighbors of i with the highest diffusion scores in each layer [60]. Statistical significance of gene scores was assessed by empirical p values, calculated using 1000 permutations of the input matrix X 0 .

Functional Characterization of the INT-MODULE
Topological community identification was performed using methods based on different rationales such as modularity/energy function optimization, edge removal, label propagation, leading eigenvector, and random walks. Modularity was quantified using the Newman definition [61]. Community identification and modularity quantification were performed using functions implemented R package igraph [62].
Pathway analysis was carried out using gene-pathway associations from Biosystems [63] and MSigDB Canonical Pathways [64]. Each pathway was assessed for the over-representation of genes from each cluster using the hypergeometric test (R functions "phyper" and "dhyper"). Nominal p values were corrected for multiple testing using the Bonferroni-Hochberg method (R function "p.adjust"), obtaining q values.
The enrichment of each cluster in terms of a type A of evidence (e.g., genomics) was quantified as the ratio between the fraction of genes supported by A in the cluster and the fraction of genes supported by A in the INT-MODULE.