A Study of the Genomic Variations Associated with Autistic Spectrum Disorders in a Russian Cohort of Patients Using Whole-Exome Sequencing

This study provides new data on the whole-exome sequencing of a cohort of children with autistic spectrum disorders (ASD) from an underexplored Russian population. Using both a cross-sectional approach involving a control cohort of the same ancestry and an annotation-based approach involving relevant public databases, we explored exonic single nucleotide variants and copy-number variation potentially involved in the manifestation of ASD. The study results reveal new potential ASD candidate-variants found in the studied Russian cohort and show a high prevalence of common ASD-associated genomic variants, especially those in the genes known to be associated with the manifestation of intellectual disabilities. Our screening of an ASD cohort from a previously understudied population allowed us to flag at least a few novel genes (IGLJ2, FAM21A, OR11H12, HIP1, PRAMEF10, and ZNF717) regarding their potential involvement in ASD.


Introduction
Autism spectrum disorders (ASDs) represent a broad spectrum of neurodevelopmental conditions characterized by severe impairment in social interactions and communication skills and the manifestation of restricted, stereotypical behaviors. In 50-70 percent of cases, an ASD co-occurs with intellectual disability [1][2][3]. In addition, multiple other comorbid medical, psychiatric, neurological, and psychological conditions are commonly observed in ASD [3][4][5]. Emerging in early childhood, ASD are lifespan disorders that can be highly disabling [2]. ASD are high-incidence disorders. With slight regional variations, a worldwide average of ASD prevalence was estimated at~7.6 per 1000 or one in 132 persons in 2010 [6]. As new practices for identifying ASD continue to be developed, these estimates tend to increase. For example, in the USA, the 2016 prevalence estimate of 18.5 per 1000 children was 2.8 times higher than that reported in 2000-2002 [7].

Participants
The study participants were 193 children with ASD (mean age = 7 ± 4 years; 18 girls and 175 boys), ascertained through the 'Genetico' Center for Genetics and Reproductive Medicine (Moscow, Russia). The children were enrolled in a project focused on the genetic screening of ASD in a west-Russian population, based on ASD diagnoses established by child psychiatrists. The diagnostics of ASD in Russia are carried out in accordance with the Clinical Recommendations for Autism Spectrum Disorders (ID:594) issued by the Association of Psychiatrists and Psychologists for Evidence-Based Practice and approved by the Ministry of Health of the RF (https://cr.minzdrav.gov.ru/recomend/594_1, accessed on 15 May 2022). The diagnosis of ASD was established following the Autism Diagnostic Observation Schedule Second Edition (ADOS-II) based on the clinical picture and developmental history characteristics of ASD-a combination of symptoms of qualitative disorders of social interaction, communication, and limited, stereotyped, repetitive behavior. According to the Clinical Guidelines, the following tools to screen for the ASD symptoms were used: Checklist for Autism Spectrum Disorders (CASD), Social Communication Questionnaire (SCQ), and Autism Diagnostic Interview-Revised (ADI-R).
The children's demographics and comorbid diagnoses, along with the participants' family histories, are shown in Supplementary Table S1. The data on the clinical features in ASD individuals are summarized in Table 1. These data were collected from the children's medical records and interviews with parents. In addition, 51 individuals without ASD from the west-Russian general population enrolled via the Almazov National Medical Research Centre (St. Petersburg, Russia) were involved in the study as a comparison group; hereafter, they are referred to as nonASD. The nonASD comparison group included children and adults (mean age = 23 ± 16 years; 29 females and 23 males) who did not have a lifetime or current medical history of ASD, and any neurodevelopmental and psychiatric disorders, as per the interview with the participants and their medical records.

Exome Library Preparation and WES Data Processing
For the sequencing library preparation, three different exome capture assays were utilized. For the ASD cohort, the TruSeq DNA Exome and the SureSelect Human All Exon V7 were used for 73 and 120 individual samples, respectively. Whole-exome sequencing (WES) was performed using the Illumina HiSeq 4000 platform. The library preparation and sequencing were conducted at the 'Genetico' Center. WES data for the controls-51 nonASD individuals-were obtained using the SureSelect V6 r2 (hg19 build) capture assay for library preparation followed by sequencing on the Illumina platform. The sequencing was performed at the Almazov National Medical Research Centre.
Sequencing data quality control was performed using the FastQC v. 0.11.9 [24]. Sequencing reads were aligned using the BWA v. 0.7.17-r1188 [25] to a reference genome depending on the exome enrichment panel utilized: the hg19 (UCSC) was used for the samples prepared with the TruSeq DNA Exome and the SureSelect V6 panels, and the hg38 (UCSC) for the samples prepared with the SureSelect V7 exome capture assay. Subsequently, the aligned sequences were subjected to sorting, deduplication, and base quality score recalibration as per the GATK Best Practices [26].

Single Nucleotide Variant Calling
Detection of single nucleotide polymorphisms (SNPs) was performed as follows. First, base recalibrated BAM files were submitted to the Germline short variant (SNPs and indels) discovery workflow implemented in the GATK v. 4.1.6. Second, each sample was submitted to the HaplotypeCaller separately with an appropriate BED file of intervals. Since three different capture assays were used for the sequencing library preparation, the corresponding three datasets were subjected to joint-genotyping separately. Third, the consolidated GVCF files were submitted to an exact test of the heterozygosity excess, applying the hard filtering parameter of ExcessHet > 54.69. The subsequent filtering procedures were performed using the VariantRecalibrator algorithm implemented in the GATK with the following truth sensitivity level parameters: 99.95 and 99.90 for the SureSelect and Truseq ASD-data subsets, respectively, and 99.50 for the nonASD subset. The established thresholds were chosen empirically to increase the possibility of detecting rare variants. Specifically, this allowed the detection of the maximum number of true-positive calls with minimal risk for false-positive calls, as can be seen in the tranches plots presented in Supplementary  Figures S1 and S2).
The hg38 SNP call-sets were converted to the hg19 build using the Assembly Converter. Three SNP call-sets were merged using the bcftools v. 1.10.2 [27]; the 'include-non-variantsites flag' function implemented in the GenotypeGVCF pipeline was applied to account for the missing and reference genotypes. The indels were excluded, and the final merged SNP call-set was used in the downstream analysis. The SNP call-set was annotated to several databases: refGene, cytoBand, ClinVar, ExAC, avSNP147, dbNFSP v. 3.0a, and gnomAD Exome, using the ANNOVAR tools [28].

CNV Detection in WES Data
The detection of copy number variations (CNVs) in the WES data was performed by the 'exomecn.mops' function implemented in the cn.mops R package v. 1.32.0 [29]. The following parameters were adjusted: priorImpact = 100, upperThreshold = 0.59, low-erThreshold = −0.99, and useMedian = TRUE. Both segmentation algorithms, fastseg [29] and DNAcopy [30], were applied. Sixteen ASD samples were excluded: one sample prepared with the SureSelect V7 panel, due to a low genome coverage; and 15 samples prepared with the TruSeqExome panel, due to a batch-effect related to the use of IDT adapters. The CNV calling was performed in three separate runs for the subsets of samples processed with different exome capture assays for the library preparation-the TruSeq Exome (N = 58 ASD), the SureSelect V7 (N = 119 ASD), and the SureSelect V6 (N = 51 nonASD).

Analysis of ASD-Associated Variants
The association analysis was performed using the PLINK v1.9 toolset [31]. Prior to the association analysis, the SNP call-set underwent linkage disequilibrium (LD) pruning and clumping procedures. The LD pruning was conducted according to the following parameters of the 'indep' command: window size = 50, window shift = 5, and VIF threshold = 2. The SNP call-set was adjusted based on the following QC parameters: -maf 0.01, -geno 0.05, -hwe 0.001. After relatedness testing, nine individuals from closely related pairs (PI HAT ≥ 0.125) in the ASD cohort were removed. A case-control association analysis was performed using Fisher's exact test to generate significance values adjusted for multiple testing using Bonferroni correction. Established candidate SNP variants had to meet inclusion criteria based on the predicted pathogenicity score thresholds: a SIFT score < 0.05 [32] and PolyPhen2 HDIV score ≥ 0.453 [33]. The polymorphisms with unknown pathogenicity were also considered as potential candidate variants. In addition, the SNP and CNV callsets were intersected with an autism gene database, AutDB [34]. For the CNV call-sets, the intersection with the list of CNVs reported in the AutDB (validated) was performed using the following command: 'bedtools intersect -a test.bed -b autdb.bed -f 0.70 -r -wa -wb > output.' For a successful query, a test region had to overlap at least 70% of an AutDB record.

Genome-Wide SNP Association Analysis
A summary of the variant calling statistics across the comparison groups and the WESdata subsets is shown in Supplementary Table S2. Altogether we detected 237,019 SNPs across all individuals from both comparison cohorts, ASD and nonASD. The mean number of the detected SNPs per individual varied from 23 to 30K depending on the exome enrichment assay applied, which corresponds to the value of~25K SNPs per individual expected for the exome data [35]. The transition/transversion (Ti/Tv) ratio varied from 2.73 to 2.93, which corresponds to the value of 3.0 observed in the exonic regions [36].
After variant filtering and LD pruning, 22,249 remaining SNPs were included in the case-control association analysis. The association analysis identified ten variants related to eight genes that surpassed the genome-wide significance threshold (Table 2). These ten SNPs were novel variants not previously reported in association with ASD. Moreover, the eight genes harboring these variants also were not found among the over a thousand human genes implicated in ASD as per records in relevant databases-the SFARI [37] and AutDB [34]. According to the SNP functional annotation, the following variants might be highlighted: two likely pathogenic synonymous substitutions in the IGLJ2 (rs8033) and HIP1 (rs1167801) genes and five missense variants with moderate or high deleterious effects located in the PRAMEF10, ZNF717, FAM21A, and OR11H12 genes. The annotation of these genes against the databases on human diseases MalaCards [38] and OMIM [39], and human phenotype ontologies HPO [40], did not reveal associations with ASD and ASD-related phenotypes ( Table 2). Note. † The CADD (Combined Annot deleterious effect of the variant on pr and a C-score between 10 and 20-a ASD cohort are shown. ⁑ The data on human diseases, MalaCards [38] and databases. ** The genes detected in th with ASD, as per records in the most

CNV Burden in the ASD Cohor
Altogether 4991 CNVs acros tected: 4084 and 907 CNV events tively (Supplementary Table S3) differed between the comparison dicated a statistical significance o we observed a wider range in CN a higher prevalence of larger CN we found a remarkable differenc comparison groups. In both com tions: the deletions/duplications r respectively. However, the pred group, where a significantly great p = 1.222 × 10 −7 ).  Note. † The CADD (Combined Annotation-Dependent Depletion) score [41] indicates a predicted deleterious effect of the variant on protein function: a C-Score > 20 defines a pathogenic variant, and a C-score between 10 and 20-a likely pathogenic variant [42,43]. † † Allele frequencies in the ASD cohort are shown. Note. † The CADD (Combined Annotation-Depen deleterious effect of the variant on protein functi and a C-score between 10 and 20-a likely patho ASD cohort are shown. ⁑ The data on the associa human diseases, MalaCards [38] and OMIM [39] databases. ** The genes detected in this study ha with ASD, as per records in the most representat

CNV Burden in the ASD Cohort Compared
Altogether 4991 CNVs across all indivi tected: 4084 and 907 CNV events were ident tively (Supplementary Table S3). As seen i The data on the associations with phenotypes are provided based on the human diseases, MalaCards [38] and OMIM [39], and human phenotype ontologies, HPO [40], databases. ** The genes detected in this study have not been previously reported in association with ASD, as per records in the most representative relevant databases, SFARI and AutDB.

CNV Burden in the ASD Cohort Compared to nonASD
Altogether 4991 CNVs across all individuals from both comparison cohorts were detected: 4084 and 907 CNV events were identified in the ASD and nonASD cohorts, respectively (Supplementary Table S3). As seen in Figure 1, the distribution of the CNV sizes differed between the comparison cohorts; the results of the Kolmogorov-Smirnov test indicated a statistical significance of this difference (D = 0.0934, p = 4.749 × 10 −6 ). In particular, we observed a wider range in CNV length with a lower prevalence of smaller CNVs and a higher prevalence of larger CNVs in the ASD group compared to nonASD. In addition, we found a remarkable difference in the occurrence of different CNV-types between the comparison groups. In both comparison groups, deletions predominated over duplications: the deletions/duplications ratio was 2.22 and 1.48 for the ASD and nonASD cohorts, respectively. However, the predominance of deletions was more profound in the ASD group, where a significantly greater proportion of deletions was found (OR = 1.5; z = 5.365; p = 1.222 × 10 −7 ).
Note. † The CADD (Combined Annotation-Dependent Depletion) score [41] indicates a predicted deleterious effect of the variant on protein function: a C-Score > 20 defines a pathogenic variant, and a C-score between 10 and 20-a likely pathogenic variant [42,43]. † † Allele frequencies in the ASD cohort are shown. ⁑ The data on the associations with phenotypes are provided based on the human diseases, MalaCards [38] and OMIM [39], and human phenotype ontologies, HPO [40], databases. ** The genes detected in this study have not been previously reported in association with ASD, as per records in the most representative relevant databases, SFARI and AutDB.

CNV Burden in the ASD Cohort Compared to nonASD
Altogether 4991 CNVs across all individuals from both comparison cohorts were detected: 4084 and 907 CNV events were identified in the ASD and nonASD cohorts, respectively (Supplementary Table S3). As seen in Figure 1, the distribution of the CNV sizes differed between the comparison cohorts; the results of the Kolmogorov-Smirnov test indicated a statistical significance of this difference (D = 0.0934, p = 4.749 × 10 −6 ). In particular, we observed a wider range in CNV length with a lower prevalence of smaller CNVs and a higher prevalence of larger CNVs in the ASD group compared to nonASD. In addition, we found a remarkable difference in the occurrence of different CNV-types between the comparison groups. In both comparison groups, deletions predominated over duplications: the deletions/duplications ratio was 2.22 and 1.48 for the ASD and nonASD cohorts, respectively. However, the predominance of deletions was more profound in the ASD group, where a significantly greater proportion of deletions was found (OR = 1.5; z = 5.365; p = 1.222 × 10 −7 ).

Genome-Wide Screening of Common ASD-Associated Variants, SNPs and CNVs
The total unfiltered call-set of 237K SNPs identified in both comparison cohorts, ASD and nonASD, was intersected with the list of 891 common ASD-associated variants derived from the AutDB repository [34]. We found 138 of such SNPs in the studied groups (Supplementary Table S4). The distribution of these SNPs across the groups is shown in Figure 2. Although we did not find a significant overrepresentation of the common candidate variants in the ASD cohort compared to the nonASD controls, a greater number of

Genome-Wide Screening of Common ASD-Associated Variants, SNPs and CNVs
The total unfiltered call-set of 237K SNPs identified in both comparison cohorts, ASD and nonASD, was intersected with the list of 891 common ASD-associated variants derived from the AutDB repository [34]. We found 138 of such SNPs in the studied groups (Supplementary Table S4). The distribution of these SNPs across the groups is shown in Figure 2. Although we did not find a significant overrepresentation of the common candidate variants in the ASD cohort compared to the nonASD controls, a greater number of such SNPs were identified in the discovery group-137 SNPs in ASD vs. 102 SNPs in nonASD. Additionally, Fisher's exact test did not reveal a significant difference in the frequency of the ASD-associated SNPs between the comparison groups. such SNPs were identified in the discovery group-137 SNPs in ASD vs. 102 SNPs in nonASD. Additionally, Fisher's exact test did not reveal a significant difference in the frequency of the ASD-associated SNPs between the comparison groups. The CNV-call sets (ASD and nonASD) were also intersected with relevant data on the common ASD-related variations from the AutDB repository as per the procedures described in the Methods section. The list of overlapping CNVs was filtered according to the following criteria: localization within the same chromosome band, the same type of variation (either deletions or duplications), and the same gene content. The CNVs located on sex chromosomes and the CNVs containing HLA (human leukocyte antigen) genes, having the most extensive variability, were removed from the analysis. Detailed results of the intersection analysis are reported in Supplementary Table S5, and a summary is shown in Table 3. Altogether 29 CNVs previously associated with ASD were detected in the studied cohorts-23 and 13 CNVs in the ASD and nonASD groups, respectively, including 8 CNVs that overlapped between the comparison groups (Table 3; Figure 2). Similar to the SNPbased analysis results, despite the lack of significance in the CNV frequencies between the comparison groups, we observed a greater number of the common ASD-associated CNVevents in the ASD cohort than in the nonASD controls. Table 3. The distribution of 29 common ASD-associated CNVs in the studied ASD and nonASD cohorts. Despite a lack of significant differences in the variants' frequencies between the comparison groups, a greater number of the ASD-associated CNVs were detected in the ASD cohort compared to the nonASD controls, 23 vs. 13 CNVs.  The CNV-call sets (ASD and nonASD) were also intersected with relevant data on the common ASD-related variations from the AutDB repository as per the procedures described in the Methods section. The list of overlapping CNVs was filtered according to the following criteria: localization within the same chromosome band, the same type of variation (either deletions or duplications), and the same gene content. The CNVs located on sex chromosomes and the CNVs containing HLA (human leukocyte antigen) genes, having the most extensive variability, were removed from the analysis. Detailed results of the intersection analysis are reported in Supplementary Table S5, and a summary is shown in Table 3. Altogether 29 CNVs previously associated with ASD were detected in the studied cohorts-23 and 13 CNVs in the ASD and nonASD groups, respectively, including 8 CNVs that overlapped between the comparison groups (Table 3; Figure 2). Similar to the SNP-based analysis results, despite the lack of significance in the CNV frequencies between the comparison groups, we observed a greater number of the common ASD-associated CNV-events in the ASD cohort than in the nonASD controls. Table 3. The distribution of 29 common ASD-associated CNVs in the studied ASD and nonASD cohorts. Despite a lack of significant differences in the variants' frequencies between the comparison groups, a greater number of the ASD-associated CNVs were detected in the ASD cohort compared to the nonASD controls, 23 vs. 13 CNVs.  In addition, the CNV call-sets were compared with the CNV morbidity map [59,60], or the list of structural genomic variants that have been linked to severe pediatric diseases, including developmental delays, intellectual disability, and ASD. The development delay track derived using the UCSC Genome Browser tools consists of over 29 thousand individual entries for case subjects. An intersection of these records with the ASD and non-ASD CNV call-sets resulted in 52 overlapping entries (Supplementary Table S6). It was remarkable that only four of 51 nonASD individuals (7.8%) had those CNVs; in contrast, among ASD participants, 21 of 148 individuals (14.2%) harbored CNVs linked to a developmental disorder, including the deletion CNVR6294.56 on chromosome 14, known as a very common variant [61].

Gene-Based and Gene Ontology-Based Analyses
Concerning the gene content of the loci harboring CNVs, we identified 1562 and 990 genes having CNVs in the ASD and nonASD cohorts, respectively (Supplementary Table S7). We performed several overrepresentation analyses (ORA) to identify particular gene ontology (GO) and human phenotype ontology (HPO) terms enriched in these gene sets. It is necessary to note that both the ASD and nonASD sets of genes harboring CNVs were extremely enriched in those encoding olfactory receptors (OR), which were assigned to the GO: olfactory receptor activity. Specifically, 173 of 1562 genes in the ASD gene-set (Fold Enrichment = 5.45; FDR = 5.61 × 10 −59 ) and 167 of 990 genes in the nonASD gene-set (Fold Enrichment = 8.35; FDR = 1.88 × 10 −84 ) were OR genes. Consequently, this superfamily of highly polymorphic OR genes was removed from the gene-sets prior to the ORAs.
The ORA results are reported in the Supplementary Table S8. A graphical summary showing a top list of the most overrepresented (at an Enrichment FDR < 10 −5 ) GO terms is shown in Figure 3. As can be seen in Figure 3, at the established significance threshold, the only functional category overrepresented among the genes having CNVs in the nonASD group is the JAK-STAT pathway. This signaling pathway mediates cellular transcriptional responses to cytokines and, as a consequence, is related, first of all, to the immune response. Notably, STAT protein-related pathways were also found among the GO terms overrepre-sented in the set of genes having CNVs in the ASD cohort ( Figure 3). In turn, in comparison to the nonASD gene-set, the ASD gene-set was remarkably enriched in genes related to meiotic processes, in particular chromosome segregation, and in genes involved in the primary cilium assembly and organization (Figure 3).
Genes 2022, 13, x FOR PEER REVIEW 10 of 17 the only functional category overrepresented among the genes having CNVs in the nonASD group is the JAK-STAT pathway. This signaling pathway mediates cellular transcriptional responses to cytokines and, as a consequence, is related, first of all, to the immune response. Notably, STAT protein-related pathways were also found among the GO terms overrepresented in the set of genes having CNVs in the ASD cohort ( Figure 3). In turn, in comparison to the nonASD gene-set, the ASD gene-set was remarkably enriched in genes related to meiotic processes, in particular chromosome segregation, and in genes involved in the primary cilium assembly and organization (Figure 3). Figure 3. The plot shows functional categories (GO: biological process terms) significantly (at an Enrichment FDR < 10 −5 ) enriched in the sets of genes harboring CNVs in the ASD cohort (green) and nonASD cohort (blue). A network indicates GO terms sharing 30% or more genes; thicker edges represent more overlapped genes. Bigger nodes correspond to larger gene sets, and darker nodes correspond to more significant enrichment FDR-values. The enrichment tests and the network constructions were performed using ShinyGO tools [62].
The lists of genes with CNVs were also submitted to the HPO-based ORA. No significant enrichment was found for the nonASD gene-set. In contrast, the ASD gene-set was significantly enriched in a number of phenotypes related, first of all, to intellectual disability and developmental delays (Table 4). Tracking HPO-related genes in the data on the CNV distribution across individuals (Supplementary Table S9), we observed that several highlighted phenotypes had been reported in the participants' medical records. In particular, a representative number of individuals (N = 79) harbored CNVs in the genes associated with intellectual disability. One of the ASD individuals harboring CNVs in genes related to hypertelorism manifested this phenotype according to his medical history. Two of four participants who had records on the epicanthic fold carried CNVs in the genes Figure 3. The plot shows functional categories (GO: biological process terms) significantly (at an Enrichment FDR < 10 −5 ) enriched in the sets of genes harboring CNVs in the ASD cohort (green) and nonASD cohort (blue). A network indicates GO terms sharing 30% or more genes; thicker edges represent more overlapped genes. Bigger nodes correspond to larger gene sets, and darker nodes correspond to more significant enrichment FDR-values. The enrichment tests and the network constructions were performed using ShinyGO tools [62].
The lists of genes with CNVs were also submitted to the HPO-based ORA. No significant enrichment was found for the nonASD gene-set. In contrast, the ASD gene-set was significantly enriched in a number of phenotypes related, first of all, to intellectual disability and developmental delays (Table 4). Tracking HPO-related genes in the data on the CNV distribution across individuals (Supplementary Table S9), we observed that several highlighted phenotypes had been reported in the participants' medical records. In particular, a representative number of individuals (N = 79) harbored CNVs in the genes associated with intellectual disability. One of the ASD individuals harboring CNVs in genes related to hypertelorism manifested this phenotype according to his medical history. Two of four participants who had records on the epicanthic fold carried CNVs in the genes associated with the epicanthus. Two ASD participants were recorded as having hypotonia and one with confirmed microcephaly; however, they did not have CNVs in the genes linked to these phenotypes as per the HPO. Important to note that enrichment of functional groups of genes does not equate to the presence of phenotypes; thus, not necessary for every patient with perturbations in particular genes to develop the corresponding phenotype. However, the top list of HPO terms in Table 4 indicates a burden of CNVs in the genes directly related to the manifestation of severe developmental issues mostly known in disorders with autosomal recessive inheritance. Remarkable, in contrast to microcephaly, we did not observe an enrichment in the HPO Macrocephaly-a clinical feature also overrepresented in individuals with ASD, which reported rates are 10-20%; see, e.g., [63][64][65]. We tend to attribute this to a power insufficiency of the enrichment analysis for this particular HPO term, for which a limited amount of associated genes (about ten) are known compared to the HPO Microcephaly, which has been reliably linked to hundreds of various genes.

Discussion
In summarizing the results of this study exploring exonic variations in a Russian cohort of children with ASD, several findings and observations need to be pointed out and discussed. First, based on the data aggregated in relevant repositories, such as AutDB [34], SFARI [37], and the CNV morbidity map of developmental delay [59,60], we tracked the genomic variants known to be implicated in ASD and related developmental disorders in the studied ASD cohort compared to the nonASD individuals from the general population of the same origin. Although below the threshold of statistical significance, an elevated prevalence of common ASD-associated genomic alterations, both SNPs and CNVs, was observed in the ASD compared to the nonASD cohorts.
Second, in comparing the CNV metrics (length, prevalence, and distribution) between the comparison groups, we observed that the ASD cohort is characterized by a higher CNV burden. Specifically, we noted a higher prevalence of larger CNV events and a remarkable predominance of deletions over duplications (about 1.5 times) in the ASD compared to the nonASD group. These observations are consistent with the consensus in the literature that CNVs are one of the most prominent sources of genetic risks for ASD. Specifically, it has been reported that CNVs, as a genomic event, are highly prevalent (i.e., observed in up to 20%) in individuals with an ASD [66]. Additionally, it has been shown that de novo CNV events occur almost five times more frequently in individuals with ASD than in unaffected siblings and controls (5-10% vs. 1-2%, respectively), and large CNVs were consistently observed in the cases with developmental delays or intellectual disability [49,51]. A recent study exploring the effect sizes of the CNV types on the development of multiple cognitive domains and overall ASD risk suggested a differential effect of deletions and duplications on different phenotypic features of ASD. Specifically, whereas both CNV types may equally affect motor skills, IQ-related cognitive deficits in ASD have been predominantly attributed to haploinsufficiency due to deletions [66]. Remarkably, our phenotype-focused enrichment tests revealed a significant overrepresentation of the comorbid phenotypes related to intellectual disabilities and developmental delays among those associated with genes having CNVs, predominantly deletions, in the studied ASD cohort. It is important to note that, in addition to these two major phenotypes, several other comorbid conditions and health problems were highlighted by the CNV gene-set enrichment analysis; for example, conditions of hypertelorism and epicanthic fold were tracked in participants' medical records. Altogether, these observations provide additional evidence supporting the essential role of structural genomic variations in the etiology of diverse ASD phenotypes often accompanied by multiple comorbid developmental conditions. Third, we explored potential functional outcomes of the CNV burden across the comparison groups based on tests of gene-set enrichment in the specific biological pathways that these genes control. In contrast to the nonASD controls, the list of genes harboring CNVs in the genomes of individuals with ASD showed a significant overrepresentation of gene ontologies related to meiosis and chromosome segregation, and to the biological pathways involved in primary cilium assembly and organization. Both of these biological pathways might be potentially linked to the ASD phenotype. Specifically, the former may indicate an aggravated risk of chromosomal rearrangements in ASD genomes; such rearrangements, both de novo and inherited, are known to be involved in the etiology of ASD [67][68][69]. In fact, defects and deficits in primary cilia, known to impact brain development and maturation [70][71][72][73], have been demonstrated, directly and indirectly, to contribute to ASD [74] and ASD-related phenotypes, such as Asperger syndrome [75] and Fragile X syndrome [76].
Fourth, our case-control association analysis allowed us to identify seven SNPs with a predicted deleterious effect, which showed a significantly higher prevalence in the ASD cohort compared to controls. Six genes harboring these variants: IGLJ2, FAM21A, OR11H12, HIP1, PRAMEF10, and ZNF717, have not been previously linked to ASD or ASD-related phenotypes. However, several studies have shown that ZNF717, HIP1, and several genes from the PRAME (Preferentially Expressed Antigen In Melanoma) family might be involved in cognitive development, learning disorders, and developmental disorders with autistic features [77,78]. Specifically, mutations in PRAMEF5 and PRAMEF7 were described in patients with delayed speech and language development, hearing deficits, and reading disability [77]. In the same study [77], a mutation in the ZNF717 gene has been identified among 16 other rare homozygous variants in at least two families, those of patients with Joubert syndrome-a disorder characterized by autistic behavior and intellectual disability. Two reciprocal microduplications inclusive of HIP1 have been described in three children from two unrelated families who had neurobehavioral problems: one child had an expressive language disorder, and two children had attention deficit hyperactivity disorder and manifested aggressive behavior [78]. The detection of variants within the OR11H12 gene encoding an olfactory receptor is also not surprising, as rare and common variants in at least several OR genes have been reported in association with autism, such as OR2M4 [79], OR2T10 [80,81], and OR52M1 [82].
In conclusion, despite the main limitation-a relatively small sample size-the study's findings and observations are consistent with the growing body of evidence supporting the genetic bases of such heterogeneous disorders as ASD. We also report on several suggestive candidate genes that might be associated with ASD. Undoubtedly, follow-up research involving additional extended cohorts from the population is required to confirm the involvement of these genes in ASD; we consider these findings preliminary. To our knowledge, our study is one of the first attempts to investigate genome-wide polymorphic variants, SNPs and CNVs, in a previously understudied cohort of ASD from the Russian Federation, using a whole-exome sequencing technique.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/genes13050920/s1, Figure S1: Tranches plot (SureSelect V7), Figure S2: Tranches plot (TruSe-qExome), Table S1: Participants. Demographics and phenotypic data, Table S2: Summary statistics on genotyping results, provided separately for three WES data subsets prepared with three exome capture assays, Table S3: CNVs identified across all individuals from both comparison groups, ASD and nonASD, Table S4: The distribution of common ASD-associated SNPs in the studied ASD and nonASD cohorts, as per the results of an intersection analysis with relevant data sets from the AutDB repository, Table S5: Common ASD-associated CNVs detected in the studied cohorts, as per the results of an intersection analysis with relevant data sets from the AutDB repository, Table S6: CNV dataset and Developmental delay morbidity map intersection results, Table S7: List of genes located within the CNVs' loci detected in the studied cohorts, ASD and nonASD, Table S8: Top-30 list of GO terms ranked by Enrichment P-value, which were significantly overrepresented in the sets of genes harboring CNVs in ASD and nonASD, respectively, Table S9: HPO tracking in individual data.