Reconstruction and Analysis of Gene Networks of Human Neurotransmitter Systems Reveal Genes with Contentious Manifestation for Anxiety, Depression, and Intellectual Disabilities

Background: The study of the biological basis of anxiety, depression, and intellectual disabilities in humans is one of the most actual problems of modern neurophysiology. Of particular interest is the study of complex interactions between molecular genetic factors, electrophysiological properties of the nervous system, and the behavioral characteristics of people. The neurobiological understanding of neuropsychiatric disorders requires not only the identification of genes that play a role in the molecular mechanisms of the occurrence and course of diseases, but also the understanding of complex interactions that occur between these genes. A systematic study of such interactions obviously contributes to the development of new methods of diagnosis, prevention, and treatment of disorders, as the orientation to allele variants of individual loci is not reliable enough, because the literature describes a number of genes, the same alleles of which can be associated with different, sometimes extremely different variants of phenotypic traits, depending on the genetic background, of their carriers, habitat, and other factors. Results: In our study, we have reconstructed a series of gene networks (in the form of protein–protein interactions networks, as well as networks of transcription regulation) to build a model of the influence of complex interactions of environmental factors and genetic risk factors for intellectual disability, depression, and other disorders in human behavior. Conclusion: A list of candidate genes whose expression is presumably associated with environmental factors and has potentially contentious manifestation for behavioral and neurological traits is identified for further experimental verification.


Introduction
Anxiety and major depressive disorders are the most common stress-related neuropsychiatric disorders affecting approximately 10-15% of the human population [1,2]. Depressive disorders can be long-lasting, and are often accompanied by low self-esteem, loss of interest in normally enjoyable activities, low energy, pain without a clear cause, and increased risk of suicide [3]. Different forms of intellectual disability have a pronounced relationship with the risk of depression development and anxiety disorder. On the one hand, children and adults who have intellectual disability and autism often demonstrate comorbid anxiety disorder and/or depression [4,5]. On the other hand, depression traits. Their further experimental verification might elucidate the possible mechanisms of development of anxiety, depression, and intellectual disabilities, and therefore make new treatment strategies.
The reconstruction of the gene networks was conducted using web services String DB [22]. String DB is a database of known and predicted protein-protein interactions. Interactions include direct and indirect associations; they are based on computer prediction and relationships collected from other (primary) databases. The network of manually selected genes was expanded using the Cytoscape StringExpand Network tool (http://apps.cytoscape.org/apps/stringApp), with Selectivity = 0.4. Analysis of enrichment of gene sets with terms of gene ontology (GO) was identified by DAVID 6.8 service [23]. All settings were used in the default mode.

Reconstruction of Transcriptional Regulation Networks
For all the genes from the sample, the coordinates of the exact location on the chromosome were obtained via Ensemble Biomart web service (GRCh38.p12 assembly of the genome was used). Promoter region sequences of length 300 b.p., 1000 b.p., and 2000 b.p. were extracted from the reference genome. Using AnimalTFDB database [24], we found transcription factors presented in our gene set (TP53, SREBF2, SREBF1). Using MoLoTool web-service, transcription factor binding sites (TFBS) were predicted for the three found transcription factors within upstream regulatory regions for all the genes. MoLoTool is an interactive web-based program to identify the sequence motifs of transcription factor binding for a given set of sequences [25]. It allows user to analyze up to 15 sequences at the same time, and gives the values of the TFBS significance. We used significance threshold p ajusted < 10 −4 , which is recommended by the web-service. Thus, the information about the hypothesized transcriptional regulation of genes inside a sample was received. Networks were visualized in Cytoscape [26].

SNP Analysis
We have analyzed SNP (single nucleotide polymorphism) distribution within both coding and promoter regions of genes. SNP analysis was performed to identify the most variable genes at the cross-section of the human population among the sample associated with intellectual disability depressive disorder. Such genes may have a more significant connection with the disorder due to the prevalence of their various variants in humans and, at the same time, with the high prevalence of the disorder itself in the population. SNP search was performed using 1000 Genomes [27] Project data with SAM Tools. Information on SNP distribution in coding and promoter (up to 2000 b.p. upstream) was extracted from the Project data, and then processed with Python scripts to summarize SNP enrichment for those regions. This information was finally superimposed on the networks.

Comparison with Current Gene Panels
We have compared our gene networks with genes from commercial sequencing panels prepared by Roche, Illumina, and ThermoFisher for the study of neurological diseases (Appendix A). The gene sets covered by commercial panels from the Illumina (AmpliSeq for Illumina Neurological Research Panel, New-York, US) and ThermoFisher (Ion AmpliSeq™ Neurological Research Panel, Waltham, US) turned out to be identical, and therefore, we considered them as one set in the comparisons. Analysis of the overlapping of sets was performed using the tools of the Python programming language. The comparison is visualized using Venn diagrams.

Formation of the Initial List of Genes
To reconstruct the desired gene networks, we chose an initial set of genes. The main factor for choosing a gene was an ambiguity in phenotypic manifestation for the same variants in different human populations. Below there are six genes chosen as our initial gene set.
• SLC6A4-serotonin transporter gene. Many studies have shown a relationship between 5-HTTLPR polymorphism and anxiety [2], depression, and bipolar disorder, antidepressant sensitivity [28], risk of depression, and suicide [29]. However, in a recent study, it has been shown to reverse the effects of this polymorphism in several Asian populations such as Japanese [12], Chinese [30], and Tuvinians [11]. • HLA-B (human leukocyte antigen) is one of the genes of the histocompatibility complex of class 1 in humans. Its allele, HLA-B *1502, plays an important role in the development of such carbamazepine-caused diseases as Stevens-Johnson syndrome and Lyell's syndrome, but its effect is seen only in Chinese [31] and Malaysian populations [32]. In individuals of European origin, its influence was not found to be significant [33]. • ABCB1 (ATP-binding cassette) is a gene of the membrane protein P-glycoprotein, which may transport xenobiotic substances through the cell membrane. C3435T allele of this gene supposedly plays a role in the development of drug resistance to epilepsy in a number of populations: Japan [34], Taiwan [35], but does not show a significant role in Korea [36], Turkey [37], Great Britain [38,39], and Ireland [40].

•
Apolipoprotein E (ApoE) is one of the most important apolipoproteins, involved in the metabolism of blood lipids on the one hand, and cholesterol metabolism in the brain on the other. The ε4 allele of this gene appears to be associated with an increased risk of temporal epilepsy after head injuries. Also, ApoE genotype is related to a level of intellectual disability, and presence of dementia in Down syndrome [41]. However, studies confirming this relationship were obtained only by researchers from the United States [42,43]. In other countries, no such relationship has been identified [44,45]. • BDNF-neurotrophin, a signaling protein that stimulates and supports the development of neurons. Peripheral BDNF level is associated with attention deficit and intellectual disability in preschool children [46]. In Japanese population, it is a trait of susceptibility to partial epilepsy [47], but in European populations, such dependence could not be detected [48,49].

•
COMT is an enzyme that plays an important role in the degradation of catecholamines, including dopamine, adrenaline and norepinephrine and the role of rs4680 polymorphism in the etiology of idiopathic intellectual disability was revealed [50]. Additionally, the role of rs4680 as a risk factor for bipolar disorder was shown in Chinese population [51], as well as in Ashkenazi population [52], but its influence was considered insignificant in European populations [53].

Gene Networks Reconstruction
Further, for the initial genes, the search of intergenic interactions was carried out via StringDB. According to the data obtained, reconstruction of protein-protein gene networks in the Cytoscape program was performed. Using the built-in String Expand tool from the String DB database, the resulting networks were built up with other proteins associated with our model (Figure 1).

Gene Sets Formation
From the reconstructed gene networks, a list of genes for sequencing was selected for further experimental validation of our model (Table S1). The list was analyzed using David GO web service. The results of the analysis indicate that our model is highly enriched with genes involved in neurophysiological and behavioral processes of neuropsychiatric diseases (Table 1).

Gene Sets Formation
From the reconstructed gene networks, a list of genes for sequencing was selected for further experimental validation of our model (Table S1). The list was analyzed using David GO web service. The results of the analysis indicate that our model is highly enriched with genes involved in neurophysiological and behavioral processes of neuropsychiatric diseases (Table 1).

Comparison with Current Gene Panels
Although the results of our screening do not overlap with the Roche panel (Figure 2), and Illumina/ThermoFisher panels (Figure 2), it is likely that this can be explained by the choice of genes: We prioritized the genes of neurological diseases with differences in phenotypic manifestation in different populations. Obviously, such a policy is not suitable for genes whose influence is modulated by environmental factors, which was our reason for not using standard panels for sequencing.

Comparison with Current Gene Panels
Although the results of our screening do not overlap with the Roche panel (Figure 2), and Illumina/ThermoFisher panels (Figure 2), it is likely that this can be explained by the choice of genes: We prioritized the genes of neurological diseases with differences in phenotypic manifestation in different populations. Obviously, such a policy is not suitable for genes whose influence is modulated by environmental factors, which was our reason for not using standard panels for sequencing.  The reconstructed network of transcriptional regulation built on upstream promoter regions of 300 b.p. length is shown in Figure 3. The least variable gene in it is ARBA1 (0.7% SNP), the most variable is APOA2 (4.5% SNP). Transcription factors SREBF2 and SREBF1 are combined into one cluster. It should be noted that the most part of genes (excluding LILRB1), controlled by SREBF1, have binding sites for SREBF2. At the same time, SREBF2 regulates additional genes, which are not regulated by SREBF1. Moreover, SREBF1 and SREBF2 regulate each other. Transcription factor TP53 form separate clusters with the four regulated genes: DRD3, ahsg, mr1, and klrc1.
Networks built on upstream promoter regions of 1000 b.p./2000 b.p. (Figures 4 and 5, correspondingly) look similar, so they will be described together. For each transcription factor, there is a set of genes regulated only by it, as well as clusters of genes regulated together with other factors. The largest number of regulated genes has SREBF2, the smallest-TP53. Among the genes constituting both the initial set of genes and the reconstructed gene networks, the highest share of SNPs is in those of the immune system (such as KIR2DL3, LILRB2, HLA-A, HLA-B, HLA-C). On the other hand, substitutions in the most conservative genes among the sample (for example, BEX3) can lead to more serious consequences for the functioning of molecular genetic systems. It is noteworthy that part of the genes (BEX3, SHC2) has not yet been studied, that is, there is no direct connection between not only molecular processes, but also phenotypes. Such genes are the main goal of our further experimental work.
Networks built on upstream promoter regions of 1000 b.p./2000 b.p. (Figures 4 and 5, correspondingly) look similar, so they will be described together. For each transcription factor, there is a set of genes regulated only by it, as well as clusters of genes regulated together with other factors. The largest number of regulated genes has SREBF2, the smallest-TP53. Among the genes constituting both the initial set of genes and the reconstructed gene networks, the highest share of SNPs is in those of the immune system (such as KIR2DL3, LILRB2, HLA-A, HLA-B, HLA-C). On the other hand, substitutions in the most conservative genes among the sample (for example, BEX3) can lead to more serious consequences for the functioning of molecular genetic systems. It is noteworthy that part of the genes (BEX3, SHC2) has not yet been studied, that is, there is no direct connection between not only molecular processes, but also phenotypes. Such genes are the main goal of our further experimental work.

Discussion
We constructed the initial set of genes based on the principle of ambiguous phenotypic manifestation for the same polymorphisms for people from different populations, or for people exposed to different environmental influences. Resulted gene networks, constructed by adding genes functionally related to genes from the initial set, suggest hypothetical mechanisms for the realization of such a "contentious" phenotype. The fact that the vast majority of genes from gene networks are not contained in the diagnostic panels of psychiatric and neurological diseases, as well as behavioral disorders, suggests that these genes may be characterized by ambiguous phenotypic manifestation. This is supported by the GO enrichment analysis performed with the DAVID service-these genes are certainly associated with behavioral, neurological, and psychiatric disorders.
The next step of this research is to check the influence of the genes identified by us on such (endo)phenotypic characteristics as indicators of psychological tests, EEG parameters at rest, and under functional load on the brain and experimental behavioral data, determined for a sample of about 1000 people living in different regions of Russia and Mongolia.

Discussion
We constructed the initial set of genes based on the principle of ambiguous phenotypic manifestation for the same polymorphisms for people from different populations, or for people exposed to different environmental influences. Resulted gene networks, constructed by adding genes functionally related to genes from the initial set, suggest hypothetical mechanisms for the realization of such a "contentious" phenotype. The fact that the vast majority of genes from gene networks are not contained in the diagnostic panels of psychiatric and neurological diseases, as well as behavioral disorders, suggests that these genes may be characterized by ambiguous phenotypic manifestation. This is supported by the GO enrichment analysis performed with the DAVID service-these genes are certainly associated with behavioral, neurological, and psychiatric disorders.
The next step of this research is to check the influence of the genes identified by us on such (endo)phenotypic characteristics as indicators of psychological tests, EEG parameters at rest, and under functional load on the brain and experimental behavioral data, determined for a sample of about 1000 people living in different regions of Russia and Mongolia.
Analysis of transcriptional regulation networks (Figures 3-5) built on information about TFBS distributions within upstream promoter regions of 300, 1000, and 2000 b.p. length has shown the 4 (in case of 300 b.p.) and 6 (in case of 1000 and 2000 b.p.) clusters of regulated genes. These clusters (SREBF1-, SREBF2-, SREBF1/SREBF2and TP53-regulated genes in the first case and these plus pairwise regulated genes for the remaining pairs in the second case) may indicate several ways of transmission of transcriptional regulation signals, which also requires further experimental verification.
Thus, the genes found by us can become the basis for the formation of a new diagnostic panel to explain the "contentious" phenotypic manifestations and predict the impact of certain effects on their further manifestation.

Funding:
The research was partially funded by the RFBR Grant 18-29-13027 and the Budget Project 0259-2019-0008. The publication of this paper was supported by the project "Investigation, analysis and complex independent expertise of projects of the National technological initiatives, including the accompanying of projects of "road map" "NeuroNet"", which is executed within the framework of the state assignment № 28.12487.2018/12.1 of the Ministry of Science and Higher Education of the Russian Federation.