Evaluation of Chromosome Microarray Analysis in a Large Cohort of Females with Autism Spectrum Disorders: A Single Center Italian Study

Autism spectrum disorders (ASD) encompass a heterogeneous group of neurodevelopmental disorders resulting from the complex interaction between genetic and environmental factors. The introduction of chromosome microarray analysis (CMA) into clinical practice has made possible the accurate identification and characterization of submicroscopic deletions/duplications (copy number variants, CNVs) associated with ASD. However, the widely acknowledged excess of males on the autism spectrum is reflected in a paucity of CMA studies specifically focused on females with ASD (f-ASD). In this framework, we aimed to evaluate the frequency of causative CNVs in a single-center cohort of idiopathic f-ASD. Among the 90 f-ASD analyzed, we found 20 patients with one or two potentially pathogenic CNVs, including CNVs previously associated with ASD (located in the 16p13.2, 16p11.2, 15q11.2, and 22q11.21 regions). An exploratory genotype/phenotype analysis revealed that f-ASD with causative CNVs had statistically significantly lower severity scores for restricted and repetitive behaviors than those without CNVs or with non-causative CNVs. Future work should focus on further understanding the genetic underpinnings of f-ASD, taking advantage of next-generation sequencing technologies, with the ultimate goal of contributing to precision medicine in ASD.


Introduction
Autism spectrum disorders (ASD) are a heterogeneous group of neurodevelopmental disorders characterized by early-onset abnormalities in social communication and interaction, as well as atypically restricted and repetitive behaviors and interests [1]. Although the exact pathogenesis of idiopathic ASD has not yet been fully elucidated, recent evidence suggests an interaction between

Methods
We collected the clinical data of a group of 93 females referred consecutively to the Autism Spectrum Disorders Unit of our Children Neuropsychiatry Hospital between 2015 and 2016. The age at the last clinical evaluation ranged from 21 months to 17 years. All participants received a clinical diagnosis of ASD based on the criteria of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) [1]. All the patients were unrelated.
According to our ASD screening protocol, neurometabolic conditions and hypoxic-ischemic injury were investigated. All participants were evaluated by an expert clinical geneticist to exclude recognizable monogenic syndromes. Prior to this study, each individual had also been tested for expansion of the CGG repeat in the 5′-UTR of the FMR1 gene, as previously reported [42].
Based on this screening, we excluded two females with a history of perinatal hypoxia and diffuse white matter disease detected on brain magnetic resonance imaging (MRI), and one patient with macrocephaly harboring a pathogenic mutation in PTEN. In a single case (patient P11), we analyzed CNVs despite a low-level somatic mosaicism for a full-mutation/premutation FMR1 allele, because the patient's phenotype could not be fully explained by this genetic condition.
Hence, 90 females with ASD were tested for CNVs. Participants were classified as clinically affected by "essential" autism, based on the absence of major congenital abnormalities and major dysmorphism [43,44].
Cognitive evaluation was performed in 87 participants using cognitive scales chosen according to age and language level. Depending on age, children were tested with the Griffiths Mental Development Scale-Revised (GMDS-R) [45], the Wechsler Preschool and Primary Scale of Intelligence, third edition (WPPSI-III) [46], or the Wechsler Intelligence Scale for Children, fourth edition (WISC-IV) [47]. Non-verbal females were evaluated using the Leiter International Performance Scale-Revised (Leiter-R) [48]. In three participants, cognitive assessment was not performed because of poor compliance due to severe autism symptoms.
Clinical assessment of expressive language skills identified 27 females with a complete absence of language and 63 "verbal" f-ASD.
The semi-structured Autism Diagnostic Observation Schedule, second edition (ADOS-2) [49], which provides a measure of autism severity, was available for 67 participants. We recorded the scores on the Social Affect (SA) and Restricted and Repetitive Behaviors (RRB) domains for each proband. Because different ADOS modules were used according to each patient's non-echolalic expressive language level at the time of evaluation, we converted the global ADOS scores and the SA and RRB domain sub-scores into the corresponding Calibrated Severity Scores (CSS) [50,51].
This study was approved by the Pediatric Ethics Committee of the Tuscany Region (Italy) and was performed according to the Declaration of Helsinki and its later amendments or comparable ethical standards. All parents or legal representatives signed an informed consent form before their child was included in the study. The identities of all individuals were omitted.

Genetic Analysis
CMA was performed using the Agilent 8 × 60K oligonucleotide microarray platform, with a median resolution of 100 kb, according to the manufacturer's protocol (Agilent Technologies, Santa Clara, CA, USA). CNV coordinates refer to the Genome Reference Consortium Human Build 37 (GRCh37/hg19).
In each proband, CNVs were confirmed by quantitative polymerase chain reaction (qPCR). Segregation analyses in parental DNA (whenever available) were also performed by qPCR. Polymorphic CNVs, based on data from the Database of Genomic Variants (DGV) [52], were filtered out.
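The text does not state how qPCR signals were converted into copy numbers or how overlap with DGV entries was judged. The sketch below assumes two common choices: the comparative ΔΔCt method against a two-copy reference locus, with a normal-control calibrator and roughly 100% amplification efficiency; and a 50% reciprocal-overlap threshold (with matching CNV type) for calling a CNV polymorphic. The function names, the `Interval` type and the threshold are hypothetical illustrations, not the authors' pipeline.

```python
from dataclasses import dataclass


def relative_copy_number(ct_target, ct_reference, ct_target_calib, ct_reference_calib,
                         calibrator_copies=2):
    """Comparative ddCt copy-number estimate (assumes ~100% efficiency for both amplicons)."""
    delta_sample = ct_target - ct_reference              # dCt in the proband
    delta_calib = ct_target_calib - ct_reference_calib   # dCt in a normal two-copy control
    ddct = delta_sample - delta_calib
    return calibrator_copies * 2 ** (-ddct)


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int   # 1-based, inclusive (GRCh37/hg19 coordinates)
    end: int     # 1-based, inclusive
    kind: str    # "del" or "dup"


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Minimum of the two overlap fractions; 0 for different chromosomes or no overlap."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start + 1), overlap / (b.end - b.start + 1))


def is_polymorphic(cnv: Interval, dgv_entries, threshold=0.5) -> bool:
    """Hypothetical filter: same CNV type and >= threshold reciprocal overlap with a DGV entry."""
    return any(d.kind == cnv.kind and reciprocal_overlap(cnv, d) >= threshold
               for d in dgv_entries)
```

Under these assumptions a ΔΔCt of +1 gives an estimate of one copy (a heterozygous deletion) and a ΔΔCt of −1 gives four copies; a duplication (three copies) corresponds to a ΔΔCt of about −0.58.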
Non-polymorphic CNVs were classified as "causative" (C-CNVs) or "non-causative" (N-CNVs) according to the American College of Medical Genetics and Genomics (ACMG) guidelines [53]. We considered as "causative": (i) CNVs encompassing genomic regions or genes associated with ASD or with other neuropsychiatric conditions (i.e., intellectual disability, epilepsy and schizophrenia) in the Online Mendelian Inheritance in Man (OMIM) database [54]; (ii) CNVs containing "high confidence" ASD-genes reported in the Simons Foundation Autism Research Initiative (SFARI) Gene database [55] with a score < 3 or in the Autism Knowledge Base version 2.0 (Autism KB 2.0) database [56] with a score > 16; (iii) CNVs involving "candidate-genes" for ASD either reported in association with autism in the literature, or listed in the aforementioned databases with a SFARI Gene score ≥ 3 or an Autism KB score ≤ 16 (suggestive or "low confidence" candidate-genes). Conversely, CNVs were considered non-causative (N-CNVs) if they had never been associated with ASD or other neurodevelopmental disorders (NDDs). Patients who tested negative for CNVs were classified as "without CNVs" (w-CNVs).
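The three criteria above can be read as a decision rule applied to each non-polymorphic CNV. The sketch below is a hypothetical encoding of that rule: the `CNVAnnotation` fields, the placeholder gene names and the order in which criteria are checked ((i), then (ii), then (iii)) are assumptions, and database-specific flags such as SFARI's syndromic category are not modelled. Scores follow the convention implied in the text: lower SFARI scores and higher Autism KB scores indicate stronger evidence.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class CNVAnnotation:
    """Hypothetical per-CNV annotation gathered from OMIM, SFARI Gene, Autism KB 2.0 and the literature."""
    in_omim_ndd_region: bool = False                                 # criterion (i)
    sfari_scores: Dict[str, int] = field(default_factory=dict)       # gene -> SFARI Gene score
    autismkb_scores: Dict[str, float] = field(default_factory=dict)  # gene -> Autism KB 2.0 score
    literature_candidates: Set[str] = field(default_factory=set)     # genes linked to autism in published reports


def classify(cnv: CNVAnnotation) -> str:
    """Apply criteria (i)-(iii) in order; anything else is non-causative."""
    if cnv.in_omim_ndd_region:
        return "C-CNV (i)"
    # (ii) "high confidence" ASD gene: SFARI score < 3 or Autism KB score > 16
    if any(s < 3 for s in cnv.sfari_scores.values()) or any(s > 16 for s in cnv.autismkb_scores.values()):
        return "C-CNV (ii)"
    # (iii) suggestive / "low confidence" candidate: literature report, SFARI >= 3 or Autism KB <= 16
    if (cnv.literature_candidates
            or any(s >= 3 for s in cnv.sfari_scores.values())
            or any(s <= 16 for s in cnv.autismkb_scores.values())):
        return "C-CNV (iii)"
    return "N-CNV"
```

For example, `classify(CNVAnnotation(sfari_scores={"GENE_A": 1}))` falls under criterion (ii), while an annotation with no database or literature hits is returned as an N-CNV.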
To identify significantly enriched functional modules, ASD-candidate genes encompassed by C-CNVs were evaluated with bioinformatics tools. A Core analysis run in Variant Effects Analysis mode with the Ingenuity Pathway Analysis (IPA) software [57] identified cellular processes related to our gene dataset (21 genes), and a functional network encompassing our ASD-candidate genes was generated. Bridging nodes were identified by evaluating both direct and indirect interactions, with a stringent level of confidence and restricted to neurological diseases. Gene ontology (GO) categorization was carried out using ToppGene Suite [58]. The top three ontologies for Molecular Function and Cellular Component were annotated, and the statistical significance of GO terms was reported as −log10 (p-value).
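The exact statistics used by ToppGene (background gene set, multiple-testing correction) are not described here. As a generic illustration only, the sketch below computes a one-sided hypergeometric over-representation p-value for one GO term and converts it to the −log10(p) scale used in the figures; the function names and arguments are hypothetical. As a check of the transformation, the p-value of 6.05 × 10⁻⁹ reported for synaptic transmission corresponds to −log10(p) ≈ 8.22.

```python
import math


def hypergeom_upper_tail(k: int, population: int, annotated: int, drawn: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: `drawn` genes sampled from `population`,
    of which `annotated` carry the GO term."""
    upper = min(annotated, drawn)
    hits = sum(math.comb(annotated, i) * math.comb(population - annotated, drawn - i)
               for i in range(k, upper + 1))
    return hits / math.comb(population, drawn)


def neg_log10(p: float) -> float:
    """Transform a p-value to -log10(p); an underflowed p of 0 maps to infinity."""
    return math.inf if p == 0.0 else -math.log10(p)
```

In a real analysis, `drawn` would be the size of the input gene list (21 genes here), `k` the number of those genes annotated to the term, and `population`/`annotated` would come from the chosen background annotation, which is the part this sketch leaves open.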

Statistical Analyses
We used a chi-square test to investigate the association of CNV subtype (causative vs. non-causative) with CNV type (duplication or deletion) and with the pattern of inheritance (de novo or inherited; paternal or maternal). A Mann-Whitney test was used to assess differences in CNV burden between CNV subtypes (excluding patient P23, who carried a whole X-chromosome duplication).
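For a 2 × 2 table (for instance, CNV subtype in rows and deletion/duplication in columns) the chi-square test has one degree of freedom, and its p-value has a closed form. The sketch below is an uncorrected Pearson test; whether a continuity correction or Fisher's exact test was used for small expected counts is not stated in the text, so that choice is an assumption. The helper names are hypothetical, and tables with an empty row or column (zero expected counts) are not handled. As a consistency check, χ²(1) = 0.41, the value reported in the Phenotypic Characterization results, gives p ≈ 0.52.

```python
import math


def p_value_df1(chi2: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # For df = 1, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square without continuity correction for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2, p_value_df1(chi2)
```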
We also investigated the phenotype of individuals with the different CNV subtypes, using the chi-square test to assess the association between CNV subtype and cognitive (IQ ≤ 70 vs. > 70) and language (non-verbal vs. verbal) levels. A Mann-Whitney test was used to confirm that the groups with different CNV subtypes were matched for age and to assess differences in the CSS obtained on the total ADOS and on its SA and RRB domains. In case of statistically significant differences, we computed r as an effect size index, interpreted as negligible (r < 0.10), small (0.10 ≤ r < 0.30), medium (0.30 ≤ r < 0.50), or large (r ≥ 0.50).
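The text does not give the formula for r. A common choice, assumed here, is r = |z| / √N, where z is the normal approximation of the Mann-Whitney U statistic and N the total sample size; with |z| = 2.48 and the 67 participants who had ADOS scores this gives r ≈ 0.30, consistent with the value reported for the RRB comparison. The sketch below uses hypothetical helper names and omits the tie correction that statistical packages usually apply, which matters for integer-valued scores such as the CSS.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Return (U, z): the smaller U statistic and its normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, (u - mean_u) / sd_u


def effect_size_r(z, n_total):
    """r = |z| / sqrt(N), with N the total number of observations in both groups."""
    return abs(z) / math.sqrt(n_total)


def interpret_r(r):
    """Thresholds stated in the Statistical Analyses section."""
    if r < 0.10:
        return "negligible"
    if r < 0.30:
        return "small"
    if r < 0.50:
        return "medium"
    return "large"
```

Because `mann_whitney` returns the smaller U, its z is negative; the sign only reflects which group is listed first, which is why r uses |z|.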
Results
Sixty-one f-ASD (67.8% of the whole group) had no CNVs (w-CNVs).
Out of 35 CNVs, 25 were classified as C-CNVs (71.4%) and 10 as N-CNVs (28.6%). In the whole group of 90 f-ASD, 20 patients harbored at least one possibly disease-causing CNV (diagnostic yield 22.2%) (Figure 1). Table 1 illustrates the results of CMA investigations. There were no recurrent C-CNVs, with the exception of a 15q11-q13 microduplication carried by two unrelated subjects. Ten CNVs involved genomic regions already associated with known contiguous gene deletion/duplication syndromes linked to ASD or NDDs, five CNVs encompassed "high-confidence" ASD-genes, and ten involved genes reported in the literature or in the SFARI Gene/Autism KB databases as possible candidates for autism.
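The percentages above follow directly from the reported counts. The short sketch below recomputes them (variable names are illustrative); it also shows that 90 − 61 = 29 patients carried at least one CNV of any class, the number of participants listed in Table 1.

```python
def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


cohort = 90          # f-ASD tested by CMA
without_cnvs = 61    # w-CNVs
with_c_cnv = 20      # patients with at least one C-CNV
c_cnvs, n_cnvs = 25, 10

carriers = cohort - without_cnvs  # patients with at least one CNV of any class

summary = {
    "w-CNVs, % of cohort": percent(without_cnvs, cohort),
    "diagnostic yield, %": percent(with_c_cnv, cohort),
    "C-CNVs, % of all CNVs": percent(c_cnvs, c_cnvs + n_cnvs),
    "N-CNVs, % of all CNVs": percent(n_cnvs, c_cnvs + n_cnvs),
}
for label, value in summary.items():
    print(f"{label}: {value}")
```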
The function and evidence of possible disease-association of the reported candidate-genes are summarized in Table 2. Bioinformatic analysis showed that 11 out of 21 of the reported disease-associated and candidate genes are involved in synaptic structure and transmission (ADARB1, ASIC2, CADM2, DMD, GRIN2A, GRM7, NEDD4, NRXN1, PCDH15, PTPRD, TRPM2) ( Figure 2).
Among 24 f-ASD carrying 29 CNVs, we established a de novo origin for 8 CNVs and paternal inheritance for 12, whereas 9 CNVs were maternally inherited. In 5 children, segregation could not be assessed because parental DNA was not available. Table 3 shows the proportion of duplications and deletions and the mode of inheritance for the different CNV subtypes. Overall, the rate of de novo CNVs was 9.4%. All de novo CNVs involved known NDD-associated genes/chromosomal regions. CNVs encompassing suggestive or "low confidence" ASD-genes were all inherited; 6 out of 9 disrupted more than one NDD-gene or were associated with an additional C-CNV. Seven out of 9 maternally inherited CNVs vs. 6 out of 12 paternally inherited CNVs were causative.
Table 1. Chromosomal microarray (CMA) results in the 29 participants carrying at least one copy number variant (CNV). For each participant with positive CMA results, the table reports the genomic location and breakpoints of each CNV, the CNV subtype (deletion or duplication), the size in base pairs, the inheritance status, the associated known genetic syndrome or Autism Spectrum Disorders (ASD) candidate genes involved in the rearrangement, and the CNV classification (causative or non-causative).

Figure 2. Bioinformatic analyses performed on ASD-candidate genes encompassed by C-CNVs. (A) A Core analysis run in Variant Effects Analysis mode using the Ingenuity Pathway Analysis software identified cellular processes related to our gene dataset (21 genes), generating a functional network encompassing 11 genes (in red). Synaptic transmission was the most significant functional annotation (p = 6.05 × 10⁻⁹). Bridging nodes (in white) were identified by evaluating both direct and indirect interactions related only to neurological diseases, with a stringent level of confidence. (B) Gene ontology (GO) categorization was carried out using ToppGene Suite. The top three ontologies for Molecular Function (dark grey) and Cellular Component (light grey) were annotated; the statistical significance of GO terms is reported as −log10 (p-value). The number of genes belonging to each category is reported to the right of each bar.

Phenotypic Characterization
Twenty-seven f-ASD had no language, whereas 63 were "verbal". Cognitive evaluation was performed in 87 participants, as three were unfit for psychometric testing. Forty-two of the tested individuals had IQ ≤ 70 and 45 had IQ > 70.
Supplementary Table S1 recapitulates clinical data of the studied population.
The type of genomic micro-rearrangement (deletion vs. duplication) was not statistically associated with the causative/non-causative classification (χ²(1) = 0.41, p = 0.52); however, excluding CNVs associated with contiguous-gene syndromes, the breakpoints of most causative duplications (n = 6/8) lay within at least one NDD-candidate gene. C-CNVs had a statistically significantly higher CNV burden than N-CNVs (mean (SD) = 1.14 (1.43) vs. 0.19 (0.16); Mann-Whitney U = 52.50, z = 2.56, p = 0.01, r = 0.49). Table 4 shows the age, the cognitive and linguistic levels, and the autism severity of the three groups of individuals defined by CNV subtype (causative, non-causative and without CNVs).
To investigate whether clinical features differed between groups, we merged participants with negative CMA results (N-CNVs and w-CNVs) and compared their characteristics with those of cases with C-CNVs. The two groups were matched for age. The relative frequencies of the phenotypic features were as follows: in the group with C-CNVs, 55% (11/20) had IQ ≤ 70, 60% (9/15) had a moderate-to-severe level of autism symptoms, and 35% (7/20) had absent language; in the group with negative CMA results, 46% (31/67) had IQ ≤ 70, 75% (47/62) had a moderate-to-severe level of autism symptoms, and 28% (20/70) had absent language. None of these differences was statistically significant.
By contrast, f-ASD with C-CNVs had a statistically significantly lower CSS in the ADOS RRB domain than those without CNVs or with non-causative CNVs (mean (SD) = 6.08 (2.14) vs. 7.50 (2.27); Mann-Whitney U = 197, z = 2.48, p = 0.01, r = 0.30).
Table 4. Demographic features of participants grouped according to CMA results. For each group (with causative CNVs, with non-causative CNVs, or without CNVs), the table reports the mean age at the last examination (in months), the proportion of patients with IQ > 70 vs. ≤ 70, the proportion of verbal vs. non-verbal patients, and the mean calibrated severity scores (CSS) for the global Autism Diagnostic Observation Schedule (ADOS) score and for the Social Affect (SA) and Restricted and Repetitive Behaviors (RRB) domain sub-scores. Language level was assessed in all 90 participants; IQ and ADOS scores were available for 87 and 67 of the 90 individuals, respectively.

Discussion
Although a recent meta-analysis and multidisciplinary consensus statement proposes exome sequencing at the beginning of the evaluation of unexplained neurodevelopmental disorders [73], CMA is still the recommended first-tier genetic analysis in the evaluation of ASD subjects [40,74].
In the last few years, investigations of large cohorts of ASD individuals [13,37,75] have identified a high burden of CNVs, with rare C-CNVs found in 5–10% of idiopathic ASD [76]. However, these data are often affected by a gender bias due to the high M/F ratio in the vast majority of studies, and even more recent investigations addressing the type and frequency of C-CNVs did not allow, with few exceptions, for separate analyses by sex because of relatively small sample sizes [77–80].
Herein, we focused exclusively on a cohort of f-ASD and found clinically significant CNVs in about 22% of patients. Few investigations have compared CNVs and clinical features between f-ASD and males with ASD. In one study, large CNVs (>400 kb) were more frequent in f-ASD than in males (29% vs. 16%), and this difference was even greater (F/M 3:1) when analyses were limited to regions containing genes involved in NDDs [81]. Similarly, Levy and colleagues (2011) [13] found a higher frequency of de novo CNVs in f-ASD (11.7% vs. 7.4% in males), and Sanders et al. (2015) [15] identified a significant difference in the rate of de novo CNVs between boys (5.3%) and girls (8.7%). Our data from a cross-sectional, single-center, female-only cohort are consistent with a similar sex effect, with a high diagnostic yield and a 9.4% rate of de novo variants.
All de novo CNVs involved known NDD-associated chromosomal regions, whereas CNVs encompassing suggestive or "low confidence" ASD-genes were all inherited and mostly disrupted more than one NDD-gene or were associated with an additional C-CNV. Among C-CNVs, there was an excess of maternally inherited, potentially pathogenic CNVs. These findings support the "two-hit model" suggested in previous studies, in which the compound effect of a small number of rare variants may contribute to the phenotypic heterogeneity of ASD [82].
While the ASD literature has reported an excess of clinically significant deletions, we did not find a correlation between the type of genomic rearrangement and the causative/non-causative classification. Haploinsufficiency for genes within a deletion is a well-recognized cause of genetic disease. Conversely, interpreting the phenotypic consequences of microduplications is often challenging because the pathogenicity of most duplications cannot be explained by triplosensitivity. By sequencing the breakpoints of 119 duplications, Newman et al. (2015) demonstrated that, rather than an extra-copy effect, the phenotype of microduplications can be related to the misregulation of genes spanning the breakpoints, through loss-of-function mechanisms due to altered transcription or translation, or to the creation of fusion proteins with unknown functions [83]. In our f-ASD cohort, the breakpoints of most causative non-syndromic duplications disrupted at least one NDD-candidate gene; hence, we hypothesize that similar mechanisms could underlie the pathogenic phenotype.
Unlike previous reports [78], we did not find any association between C-CNVs and IQ or language deficits. When comparing the phenotypic features of females with C-CNVs with those of females with negative CMA results, the only statistically significant difference was a lower score in the ADOS restricted and repetitive behaviors (RRB) domain in f-ASD with clinically significant variants. Recently, Barone et al. reported more severe autistic symptoms in individuals with C-CNVs [79]. The discrepancy with our data could reflect the different characteristics of the studied populations; indeed, several studies have suggested a sex effect on RRB scores, which are consistently reported to be lower in female than in male subgroups [28,84–86]. Crucially, several lines of evidence suggest that the social-communication (SC) and RRB symptom domains are underpinned by different genetic mechanisms. For instance, a recent genome-wide association study demonstrated that the RRB-related trait "systemizing" is heritable and genetically correlated with autism in the general population, and that the SC and RRB domains in autistic subjects show low shared genetics [87]. In particular, the contribution of genetic factors to the RRB domain is supported by the significant presence of RRB traits in both parents [88] and siblings [89] of probands with ASD. Overall, the impact of C-CNVs on ASD symptoms is still unclear, and a recent work highlighted the contribution of environmental factors (i.e., maternal infections during pregnancy) to RRB severity in individuals with CNVs [90]. We can only speculate that our f-ASD with positive CMA results had lower RRB scores because this sample represents the mild end of a genomically "simple" disorder, whereas the girls with negative results could reflect a group of f-ASD with a "complex" multifactorial etiology, like the largest portion of idiopathic autistic males.
With the exception of the two subjects with a 15q11-q13 microduplication, no overlapping CNVs were detected, confirming the high genetic heterogeneity of ASD. Fifteen CNVs involved previously identified ASD/NDD-associated genes or genomic regions, whereas 10 CNVs encompassed genes reported as possible candidates for ASD in the literature or in ASD databases (Tables 1 and 2). The contribution of each CNV to the phenotype of our f-ASD patients is discussed in Supplementary File S1. Of these, a few cases are worth discussing here.
The known contiguous-gene deletion/duplication syndromes detected in our cases were associated with a diagnosis of "idiopathic" ASD because these patients did not display any of the additional non-neurodevelopmental features specific to these syndromes, such as the dysmorphisms or congenital defects that can be seen in Smith-Magenis (P8), 17q12 microdeletion (P10), 2p15p16 deletion (P19), 22q11 duplication (P28) and SHOX duplication (P29) syndromes. These patients could represent the mild end of the phenotypic spectrum of these genomic disorders, owing to the "NDD-protective effect" reported in females [16].
In some cases, reverse phenotyping allowed the investigation and prevention of important comorbidities, as in P25, who carries a de novo partial duplication of the DMD gene, which in females could manifest with muscle weakness and cardiomyopathy, and in P20, who carries a 16p11.2 duplication widely reported in ASD studies which is associated with the risk of developing psychotic symptoms [91].
Among clinically relevant rearrangements, aneuploidy was identified in a single subject, who presented with X chromosome trisomy (47,XXX). Interestingly, the literature does not report a greater risk of autism in X chromosome trisomy [92], even though difficulties in social functioning and, more broadly, an increased vulnerability to autistic traits have been described [68].
The de novo 16p13 duplication detected in one patient (P3) partially involves USP7. Variants affecting this gene were recently reported in 23 individuals with syndromic developmental delay/intellectual disability [93], about half of whom had ASD. P3 presents with mild motor developmental delay, absent speech, behavioral anomalies and ASD, suggesting that USP7 haploinsufficiency should be considered in cases of ASD with absent speech and behavioral disorders. The CNV detected in P3 also spans GRIN2A and RBFOX1, so we cannot exclude an additional role of these genes in the patient's phenotype.
The deletions found in P11, P14 and P15 reinforce the evidence of a possible contribution of PCDH15, GRM7, CADM2 and IMMP2L genes to ASD susceptibility.
When novel and previously known genes pinpointed by CMA studies were combined into functional modules using IPA and ToppGene Suite, we observed an enrichment in genes involved in synaptic function and transmission, which are well-established biological processes in autism and NDDs [94].
In conclusion, this study provides a representative picture of the spectrum of CNVs in f-ASD investigated in a clinical setting. As expected, no specific CNV was found to be required for developing ASD, supporting the heterogeneity of the affected molecular pathways. However, genes in the C-CNVs of our f-ASD sample mainly code for proteins that could be grouped into two functional systems: synaptic function/structure and mRNA/protein processing. Of note, environmental exposures during specific windows of vulnerability in prenatal and perinatal life critically interact with genetic susceptibility, contributing to ASD pathogenesis [95]. Our study suggests that females with idiopathic ASD have a high rate of pathogenic CNVs encompassing both known and new candidate ASD genes. Hence, studies on large samples of f-ASD carefully assessed from a clinical point of view could help unravel the genetic determinants of autism. Moreover, f-ASD with normal array comparative genomic hybridization results could benefit from whole exome or genome sequencing [96], paving the way for personalized treatments based on genetic findings.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2075-4426/10/4/160/s1: Table S1: Phenotypic characteristics of participants; Supplementary File S1: Contribution of each CNV to the phenotype of f-ASD patients.
Funding: This work was partially supported by a grant from the IRCCS Fondazione Stella Maris (Ricerca Corrente, and the "5 × 1000" voluntary contributions, Italian Ministry of Health). S.C. was partially funded by AIMS-2-Trials.