Environmental Influences Measured by Epigenetic Clock and Vulnerability Components at Birth Impact Clinical ASD Heterogeneity

Although Autism Spectrum Disorders (ASD) is recognized as being heavily influenced by genetic factors, the role of epigenetic and environmental factors is still being established. This study aimed to identify ASD vulnerability components based on familial history and intrauterine environmental stress exposure, explore possible vulnerability subgroups, access DNA methylation age acceleration (AA) as a proxy of stress exposure during life, and evaluate the association of ASD vulnerability components and AA to phenotypic severity measures. Principal Component Analysis (PCA) was used to search the vulnerability components from 67 mothers of autistic children. We found that PC1 had a higher correlation with psychosocial stress (maternal stress, maternal education, and social class), and PC2 had a higher correlation with biological factors (psychiatric family history and gestational complications). Comparing the methylome between above and below PC1 average subgroups we found 11,879 statistically significant differentially methylated probes (DMPs, p < 0.05). DMPs CpG sites were enriched in variably methylated regions (VMRs), most showing environmental and genetic influences. Hypermethylated probes presented higher rates in different regulatory regions associated with functional SNPs, indicating that the subgroups may have different affected regulatory regions and their liability to disease explained by common variations. Vulnerability components score moderated by epigenetic clock AA was associated with Vineland Total score (p = 0.0036, adjR2 = 0.31), suggesting risk factors with stress burden can influence ASD phenotype.


Introduction
Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by alterations in social communication and the presence of stereotyped and restrictive behaviors. It is a polygenic multifactorial disorder with a complex genetic architecture and a heterogeneous clinical presentation [1,2].
The genetic component seems to have a major role in ASD vulnerability. The genetic risk derives mostly from common genetic variants with a smaller contribution from de novo and rare inherited variation (2.6% of the variance in liability) [3]. In some ASD cases, a polygenic variation additively with strong acting de novo variants could increase the risk [4]. The polygenic model assumes that many inherited variants contribute to ASD, each with a small effect. This model is supported by many aspects, such as more than ten times increased risk for autism in siblings of probands, and observation of sub-threshold autisticlike traits in ASD first-degree relatives [5]. A polygenic liability to psychiatric disorders has been discussed by studies showing genetic correlation among psychiatric disorders, suggesting that they share common genetic factors [6,7]. Accordingly, not only ASD but also other psychiatric disorders are more common among relatives of ASD cases [8,9].
The heritability of ASD is estimated to be around 50%, although varying between different studies [3,10], which shows that environmental factors are also important. Different studies have shown an association between intrauterine stress exposure and a greater risk for ASD [11][12][13]. Many environmental factors associated with ASD can change epigenetic status, such as DNA methylation, and affect regulatory mechanisms in brain development [11][12][13][14][15][16][17]. Genetic liability in combination with environmental factors crosses a risk threshold triggering the disorder development [11,18]. Recent studies showed that variability in methylation levels in some genomic locations could be better explained by a combination of genetic (G) and environmental (E) factors [19][20][21][22][23][24][25]. These regions, named variably methylated regions (VMR), are located close to other functional genomic variations and enriched with CpGs sites that could influence gene expression. These VMRs did not change with age and are over-represented in Genome-Wide Association Studies (GWAS) of both psychiatric and other complex diseases, possibly representing biological markers [19].
Besides the heterogeneous etiopathology, the ASD clinical presentation is also highly heterogeneous and its dimensional diagnosis represents the idea that its features fall along a continuum of severity [26][27][28]. It has been shown that besides core symptoms, intellectual function, language abilities, and functionality contribute to this heterogeneity and severity of clinical presentation [29]. Functionality measured by the Vineland scores seems to have greater variability than ASD symptoms and has an important impact on ASD trajectories [30]. To better understand the phenotypic heterogeneity and the severity spectrum, different studies searched for ASD subgroups based on phenotype or genetic variations, but none considered environmental influences [2,18,27,28,[31][32][33].
We hypothesize that a model including environmental exposure factors in addition to genetic ones could better explain the severity spectrum. The prenatal period is probably the stage in which ASD environmental risk factors interacting with genetic play their major role, however epigenetic programming, altered by environmental factors during infancy, has an important impact on neurodevelopmental trajectories [34] and could contribute to the severity spectrum separated from gestational exposure.
Differences between epigenetic and chronological age (age acceleration, AA) have been associated with stress exposure [35,36]. This mechanism, known as 'Epigenetic age', has the potential to assess an individual's biological age based on DNAm levels. This measurement can show the cumulative effect of environmental factors on the epigenetic maintenance system, being used as a proxy of postnatal stress exposures [37]. So, we used AA as a proxy of postanal environmental exposure. To study gestational environmental exposure, considering that ASD vulnerability probably arises due to a genetic and gestational exposure interaction model [11][12][13] we used a Principal Component Analysis (PCA) to capture the most important factors of vulnerability. We searched principal components based on familial psychiatric history, pregnancy complications, psychosocial (maternal schooling, social class, stress), and toxic environmental exposures (tobacco, alcohol, drugs) during gestation. The genetic component was assessed based on family history considering its association with polygenic risk scores [38] and the greater contribution of common variation in ASD risk [3]. As the objective of PC analysis was to find components of vulnerability represented by genetic and gestational exposure to better explore individuals with a distinct gradient of vulnerability we search if they have differences in methylation patterns enriched by VMRs associated with genetic and environmental factors that did not change according to age. As it would be expected to find de novo and very rare variants (MAF < 0.005) with higher effects for ASD with low IQ [4,39,40], we performed a whole-exome analysis in a subsample to check if any individuals assigned to different gradients of vulnerability according to our PC analysis was not a consequence of bearing more rare genetic variations with higher functional impact.
Our results showed two components of ASD vulnerability: one is more associated with psychosocial stress exposure and the other with a biological vulnerability. A gradient of psychosocial stress exposure was detected and split into two groups, above and below average. A biological difference along this gradient was supported by a different pattern of VMRs between formed subgroups. The principal components and age acceleration were associated with Vineland scores measured at 7 years of age. The current study suggests that environmental factors can contribute to ASD phenotypic severity spectrum.

Participants
The sample of the present study was obtained from a Brazilian population that lives in São Paulo city and contained 68 ASD children with blood samples available from a previously published randomized clinical trial of a video modeling parenting training [41]. One patient that did not complete the Risk Scores questionnaire used to calculate the Risk Scores was excluded from the PCA analysis. The inclusion criteria were: patients with ASD diagnosed using ADI-R, ages between 3 and 7 years old, and IQ between 50 and 75. Exclusion criteria were the presence of known genetic syndrome evaluated by a clinical geneticist and individuals without the Risk Scores evaluation.
Patients were diagnosed using Autism Diagnostic Interview-Revised (ADI-R) [42], a 93-item structured interview that is conducted with parents to measure the following: reciprocal social interactions, communication, language, and behavior patterns [43]. The revised version has been abridged and modified to suit children with a mental age of approximately 18 months to adulthood and is linked to the DSM-IV and ICD-10 criteria. The interview lasts approximately 1.5 h for children up to 4 years old and becomes a little longer for older children. The score is based on the interviewer's judgment regarding the codes that best represent the behaviors described by the respondent, where the cutoff points for the diagnosis are 10 points for the total, 10 points for item B (Social Interaction), 7 or 8 points for item C (Communication, 8 for verbal patients and 7 for non-verbal patients) and 3 points for item D (restricted and repetitive behavior patterns).
IQ assessed by the Snijders-Oomen Nonverbal Intelligent Test-Revised (SON-R 2 1 2 -7) [44] used in children of 2 years and a half to 7 years. It evaluates the cognitive skills without using verbal or written language. ASD patients' functionality was assessed with Vineland Adaptative Behavior Scale [45], which measures adaptative behavior in domains related to communication, social skills, and daily life activities.
ASD severity was accessed with the Childhood Autism Rating Scale (CARS) [46,47], an instrument used to assess the severity of autism symptoms. The scale assesses the patient's behavior through an informant (parents or guardians) in 15 domains that include: personal relationships, imitation, emotional response, body use, use of objects, response to changes, visual response, auditory response, response, and use of taste, smell and touch, fear or nervousness, verbal communication, non-verbal communication, activity level, level and consistency of intellectual response and general impressions of the examiner.
Scores between 30 and 36 indicate mild to moderate autism and scores above 36 indicate severe autism.
The patients' internalizing and externalizing symptoms with the Child Behavior Checklist CBCL [48]. The CBCL is part of an assessment system developed by Achenbach and Rescorla [49] that assesses children's behavior by age group. There are two versions: one for children aged 1 1⁄2 to 5 years and one for children and teenagers aged 6 to 18 years. In both instruments, the informant should be the parents or caregivers. The first version is composed of 99 sentences that must be evaluated by the respondent as not true-as far as is known; a little true or sometimes true; or very true or often true, which corresponds respectively to 0, 1, and 2 points on the scale. The version for children and adolescents aged 6 to 18 years comprises 138 sentences, of which 118 refer to behavior problems and 20 to social competence. This scale consists of 20 items that include the child's activities, such as games, games, and tasks; participation in groups; relationship with family and friends; independence to play and school performance. In most of these items, parents are asked to compare their children's behavior with that of other children of the same age, in terms of performance and time spent on each activity, marking it as below average, within average, or above average. The instrument assesses emotional reactivity, anxiety/depression, somatic complaints, attention problems, aggressive behavior, sleep problems, social problems, thinking problems, and rule violations. From the scores obtained on these scales, the child or adolescent can be included in the clinical, borderline, or normal ranges, in relation to their global functioning and internalizing and externalizing profiles. For inclusion in the internalizing profile, the items are considered isolation, somatic complaints, anxiety/depression, and for inclusion in the externalizing profile, the items evaluated are violations of rules and aggressive behavior. Carvalho et al., [50] report that the CBCL is indicated in the literature as one of the most effective instruments in quantifying parental responses about their children's behavior.
The final sample used contains n = 67 children and mother pairs: 55 males (82.1%), ages between 3 and 7 years old (mean 4. ADI-R scores between 25-60 (mean 47.0, SD 7.0). The clinical genetic examination was evaluated by a team of geneticists and no genetic syndromes were described.
Blood and data were collected at baseline for 66 children for methylation analysis and an arbitrary subsample of 33 children, mother and father trios for exome analysis. The present study was approved by the Hospital das Clinicas (University of São Paulo Medical School) Ethical Committee (CAAE: 57067016.2.0000.0068, assessment #1.637.312). All experiments were performed following the ethical principles for medical research involving human subjects (Declaration of Helsinki), Informed consent was signed by the legal guardians of all patients.

Risk Factor Variables
The risk factor was scored according to (a) Family social class [51,52] measured by Criterio Brasil [53], (b) Maternal schooling [52], (c) Maternal stress during gestation (familial fights, illness, or death in the family, verbal or physical aggression, and emotional or depression symptoms during pregnancy) [14,54], (d) Toxic environmental exposures (Tobacco, alcohol and drug, exposure to pesticides, and use of known deleterious medications during pregnancy), (e) Familial psychiatric history (ASD, schizophrenia, depression, bipolar disorder, alcohol and/or drug abuse, obsessive-compulsive disorder, anxiety, attention deficit, and hyperactivity disorder) [18] and (f) Gestational complications: Eclampsia/preeclampsia, gestational diabetes and excess of weight gain during gestation (Appendix A and Table S1).

Statistical Analysis
The statistical language R (v3.5.2) on RStudio was used for all statistical analyses. For descriptive analyses, mean scores, standard deviations (SD), and frequencies were calculated. To achieve comparability among variables, scores were transformed into standardized z-scores. A principal component analysis (PCA) was performed with the risk factors variables using the "prcomp" function with default parameters, except for the scale set to TRUE. Exposure factors represented in at least 30% of the respondent sample were included in PCA analysis. Only components with an eigenvalue greater than one were retained. To select variables with important contributions to PCs compositions we considered correlations >0.70 between variables and components.

Methylation Analysis
For the methylation analysis, DNA from whole blood samples was treated using the EZ DNA Methylation kit (Zymo Research, Irvine, CA, USA). The bisulfite-converted DNA was hybridized in the BeadChip Human Methylation 450 microarrays (Illumina Inc., San Diego, CA, USA) following the manufacturer's protocol. Raw data was extracted by iScan SQ scanner using the GenomeStudio software (v.2011.1), with the methylation module v.1.9.0 IDAT files. Raw data is available at the GEO database (GSE164563).
Data analysis was performed using the minfi package [55] and other Bioconductor packages [56] (http://www.bioconductor.org, accessed on 15 September 2017). Quantile normalization [57] performed signal intensity normalization of Infinium I and II probes and 'noob' method (Normal-exponential convolution using out-of-band probes) background correction [58]. Batch effects were corrected using the ComBat [59] from the ChAMP package (v. 2.8.3) [60]. Cell composition was estimated based on the Houseman method [61], and no difference was found between the groups. Quality control steps removed 2784 probes (detection p-value > 0.01), 11,214 probes in sex chromosomes, 16,474 probes in SNPs and 26,480 located in cross-reactive sites [62], resulting in 428,560 for the differential analysis.
Identification of differentially methylated probes (DMPs) was performed with Mvalues (logit of β-values) employing an empirical Bayesian framework linear model from the limma package [63,64], sex factor was used as an adjusted variable Only DMPs with adjusted p-value < 0.05, corrected for multiple comparison effects by the Benjamini and Hochberg method, were considered further. Functional Epigenetic Modules (FEM) package (v.3.10.0) was used to find the differential methylation interactive hotspots [65]. DNA methylation age and age acceleration (AA) were calculated using Horvath's epigenetic clock [36].

Enrichment Analyses rVarbase and VMRs
A set of 1000 random groups was created based on CpG coordinates from the Illumina Infinium Human Methylation 450K BeadChip platform. The specific number of probes selected for each group was based on the number of CpGs analyzed for each cluster. Coordinated comparison of CpGs/random groups and rVarBase [66] database were performed to calculate overlapping elements in rVarBase using BEDTools [67] and estimate the p-value. For VMRs enrichment analysis, the Multivariate State Estimation Technique (MSET) algorithm [68] with 10,000 permutations was used. We used VMRs lists from 4 different studies: Islam et al. [23], Teh et al. [20], Garg et al. [25], and Hachiya et al. [19] (Databases are described in detail in Appendix B).

Regression Models
All linear regression analyses were done using the normal distribution of errors in the R core base package. Dependent variables used were represented by phenotype scores measured at the baseline of the clinical trial study and the independent variables were sex, age, and vulnerability scores.

Participants Characteristics
The presented psychosocial, genetic, and epigenetic data was obtained for the current study from a clinical sample participating in a recently published randomized clinical trial [41]. There are n = 67 mother and child pairs being considered here, engaged before the trial's randomization phase.
The range for maternal schooling was 16% with incomplete primary education to incomplete high school, 52% with complete high school, and 32% with higher education. Most of the families were from B, C, and D social classes, with only one family from social class A (higher); 56% from class B; 29% class C, 15% class D (lower) [77]. During gestation, 37 women (54%) did not report any kind of stress; however, 27 (39%) had 1 to 3 stressful events. Medical complications (gestational diabetes, preeclampsia, and more than 12 kg gain during gestation) were present in 38 women (55%), however, most of them had only one type (28 women, 41% of the total). The toxic environmental exposure was reported in 12 women (18%), six women reported alcohol exposure, 5 women reported smoking during gestation, and 1 reported drug use, therefore we did not consider those variables to search principal components. A family history of psychiatric disorders was present in 43 families (62%). There were 7 families (10%) that reported other cases of ASD in the family. Depression and Schizophrenia were the most prevalent disorders among the families. The first was reported in 32 relatives (from 24 families), and the second in 12 (from 11 families). The paternal age at birth was assessed and 41 families (62%) presented this information with an average value of 33.42 (SD ± 7.73).

Finding Components of Vulnerability
Considering as important exposure factors the ones represented in more than 30% of the respondent mothers' sample, we performed a PCA analysis. Two principal components (PCs) had eigenvalues >1, explaining up to 60% of the variance (Table 1).  Table 1). In this analysis, we can also see that PC3 appears with an eigenvalue of almost 1 and it adds 18% to the first two for a cumulative variance of 78%, however, none of the variables were correlated equal or greater than 0.7 with PC3. Figure 1a shows PC1 and PC2 projection in a biplot. The first PC was interpreted as a Psychosocial stress Vulnerability component (PV), while the second PC was interpreted as a Biological Vulnerability component (BV).

Finding Components of Vulnerability
Considering as important exposure factors the ones represented in more than 30% of the respondent mothers' sample, we performed a PCA analysis. Two principal components (PCs) had eigenvalues >1, explaining up to 60% of the variance (Table 1).  Table 1). In this analysis, we can also see that PC3 appears with an eigenvalue of almost 1 and it adds 18% to the first two for a cumulative variance of 78%, however, none of the variables were correlated equal or greater than 0.7 with PC3. Figure  1a shows PC1 and PC2 projection in a biplot. The first PC was interpreted as a Psychosocial stress Vulnerability component (PV), while the second PC was interpreted as a Biological Vulnerability component (BV).  We defined two groups: above (A, n = 29) and below (B, n = 38) average PC1 values (red and blue groups in Figure 1b, respectively). Aiming robustness, instead of a hard split between the groups around the average value, a buffer zone was defined, and their grouping was purposely mixed alternately (subjects between vertical lines in Figure 1b). Statistical analysis did not show any significant differences between A and B groups regarding the clinical variables (Table S2). Evidently, the groups presented differences related to exposure to psychological and social stressors, the main contributors to PC1 (Maternal Stress, p-value = 3.1 × 10 −4 ; Maternal Schooling, p-value = 2.1 × 10 −7 , and Social Class, p-value = 9.8 × 10 −5 ). Although there was no difference regarding gestational complications, the group with higher psychosocial stress scores presented lower family history scores (p-value = 0.0340).

Methylation Analysis
Methylation analysis was performed with two different aims, the first was to search biological differences between individuals assign to different gradients of vulnerability according to our PC analysis, and the second was to assess AA as a proxy of postnatal stress exposure. To investigate if the differences observed in PV were also reflected in the children methylomes, we searched for DMPs between above (group A) and below (Group B) average PV values on PC1 (Figure 1b). After the probe exclusion described in the Methods section, 428,560 CpGs sites remained for further analysis. A total of 11,879 DMPs (corrected p-value < 0.05) were identified when comparing A and B groups. To verify if these DMPs could be altered in the brain, we searched on the BECon datasets [78], which presents a correspondence analysis between blood and 3 brain regions (BA7, BA10, and BA20). A total of 11,384 (96%) and identified CpGs with at least a correlation |r| ≥ 0.5 (BA7 = 384 CpGs, B10 = 416 CpGs and B20 = 362 CpGs), with a low CpGs overlap among brain regions ( Figure S1), resulting in 964 (~8%) CpGs with correspondence between blood and brain.
Differences in children's methylomes suggest the presence of CpG regions that vary according to differences measured in mother-based PV. To support these findings, we compared our results with public datasets, one regarding VMRs, which represent methylated regions that vary according to genetic and environmental factors. Comparison of DMPs with all VMR datasets currently openly available showed that our set is enriched for VMRs, and the majority was best explained by environmental and genetic influences ( Table 2). Using the DMPs present in the VMRs database, we performed an unsupervised hierarchical clustering analysis which yielded a clear two groups pattern that recovered the original A and B partition ( Figure 2).  Analyzing our DMPs with rVarbase [66], which contains regulatory regions harboring annotated SNPs, there was an overlap of 2285 CpGs, although without enrichment when compared to random groups (p = 1). Using these 2285 overlapping CpGs, we observed that in group B (1326 CpGs), hypermethylated CpGs were more localized in active promoters and gene enhancers, whereas in group A (959 CpGs) they were more localized in bivalent promoters and enhancers (Χ 2 -test, p-value < 0.05, Table S3).
To investigate the pathways associated with DMPs, we performed the Functional Modules Analysis using the FEM package, which uses the properties of protein-protein interactions to build modules, then it performs gene set enrichment analysis for pathways within the modules. This resulted in 3 modules of functionally related genes (Table S4 and Figure 3), with two of these modules being enriched for genes from immune system pathways functions and stress response (Table S5). Analyzing our DMPs with rVarbase [66], which contains regulatory regions harboring annotated SNPs, there was an overlap of 2285 CpGs, although without enrichment when compared to random groups (p = 1). Using these 2285 overlapping CpGs, we observed that in group B (1326 CpGs), hypermethylated CpGs were more localized in active promoters and gene enhancers, whereas in group A (959 CpGs) they were more localized in bivalent promoters and enhancers (X 2 -test, p-value < 0.05, Table S3).
To investigate the pathways associated with DMPs, we performed the Functional Modules Analysis using the FEM package, which uses the properties of protein-protein interactions to build modules, then it performs gene set enrichment analysis for pathways within the modules. This resulted in 3 modules of functionally related genes (Table S4 and Figure 3), with two of these modules being enriched for genes from immune system pathways functions and stress response (Table S5). To investigate the postnatal stress burden, we used the epigenetic age, following Horvath's method [36]. As we are using it as a proxy of postnatal stress exposure, we expected that individuals assign to different gradients of vulnerability according to our PC analysis will not have AA differences. We calculated the methylation age and observed that both groups presented higher epigenetic age than chronological age, without a statistically significant difference between the groups (p = 0.087) (Figure 4a-c). The same was observed for Age Acceleration (p = 0.342) (Figure 4d). Although not statistically significant, when comparing the values of PC1, which resumes the psychosocial stress variables, we found that it was (weakly) positively correlated to epigenetic age acceleration (cor = 0.19, p = 0.13, Figure 4f). To investigate the postnatal stress burden, we used the epigenetic age, following Horvath's method [36]. As we are using it as a proxy of postnatal stress exposure, we expected that individuals assign to different gradients of vulnerability according to our PC analysis will not have AA differences. We calculated the methylation age and observed that both groups presented higher epigenetic age than chronological age, without a statistically significant difference between the groups (p = 0.087) (Figure 4a-c). The same was observed for Age Acceleration (p = 0.342) (Figure 4d). Although not statistically significant, when comparing the values of PC1, which resumes the psychosocial stress variables, we found that it was (weakly) positively correlated to epigenetic age acceleration (cor = 0.19, p = 0.13, Figure 4f).

Whole-Exome Analysis
To investigate if individuals assign to different gradients of vulnerability according to our PC analysis are not a consequence of the presence of rare variants of higher impact, we performed a whole-exome analysis on a subset of children and their both parents (8 from A and 25 from B) that were already available in our laboratory. Of the 6625 selected deleterious variants (of which 1966 are on the 25% most intolerant RVIS list), 5512 are in patients in group B (RVIS: 1547, de novo: 18) and 2,578 in patients in group A (RVIS: 736, de novo: 8). Χ 2 -test found no difference between groups (p-value > 0.05).

Regression Analyses
Using phenotypes as dependent variables, we built linear regression models using sex, age acceleration, the two PV groups, and PC scores, to understand their predictive power regarding clinical severity and functionality since they have been associated with clinical severity and prognostic [19,28,79]. Clinical severity and functionality were estimated by Child Autism Ratio Scale (CARS) and Vineland scores, respectively. We also built models introducing the father's age as covariable to represent a different source of the genetic contribution [80]. However, no significant model was associated with the phenotypes investigated.
First, we used the moderation analysis using the interaction between groups, and age acceleration. No significant model was observed using these parameters in the moderation analysis. However, using the variables without the moderation effect, the model was built to predict CARS based on age acceleration, groups, and sex, resulting in a significant regression equation (F(3, 62) = 2.889, p-value = 0.043), with an R 2 = 0.12. In this model, only grouping was a significant predictor (Table 3). A small variation of severity can be explained by this model: being in group A decreased the CARS score to 3.79 on average.

Whole-Exome Analysis
To investigate if individuals assign to different gradients of vulnerability according to our PC analysis are not a consequence of the presence of rare variants of higher impact, we performed a whole-exome analysis on a subset of children and their both parents (8 from A and 25 from B) that were already available in our laboratory. Of the 6625 selected deleterious variants (of which 1966 are on the 25% most intolerant RVIS list), 5512 are in patients in group B (RVIS: 1547, de novo: 18) and 2,578 in patients in group A (RVIS: 736, de novo: 8). X 2 -test found no difference between groups (p-value > 0.05).

Regression Analyses
Using phenotypes as dependent variables, we built linear regression models using sex, age acceleration, the two PV groups, and PC scores, to understand their predictive power regarding clinical severity and functionality since they have been associated with clinical severity and prognostic [19,28,79]. Clinical severity and functionality were estimated by Child Autism Ratio Scale (CARS) and Vineland scores, respectively. We also built models introducing the father's age as covariable to represent a different source of the genetic contribution [80]. However, no significant model was associated with the phenotypes investigated.
First, we used the moderation analysis using the interaction between groups, and age acceleration. No significant model was observed using these parameters in the moderation analysis. However, using the variables without the moderation effect, the model was built to predict CARS based on age acceleration, groups, and sex, resulting in a significant regression equation (F(3, 62) = 2.889, p-value = 0.043), with an R 2 = 0.12. In this model, only grouping was a significant predictor (Table 3). A small variation of severity can be explained by this model: being in group A decreased the CARS score to 3.79 on average. Considering a dimensional approach, we performed moderation analysis with the PC scores instead of groups. Using the vulnerability scores, we built two models, one with only PC1 and PC2 and the other using the 3 PCs (PC1, PC2, and PC3) to have more than 70% of variability explained. The moderation analysis using the first model, sex, PC1, PC2, and age acceleration, resulted in a significant regression equation (F(8, 54) = 2.149, p-value = 0.047), with an adjusted R 2 = 0.13 only considering the Vineland score as the outcome. The significant variables were AA:PC1 and AA:PC2 (Table 4). In the second model, applying the 3 first PCs, Vineland Total Score was significantly predicted (F(16, 46) = 2.761, p-value = 0.0036 and adjusted R 2 = 0.31) mostly by PC3, Sex, AA:PC2, AA:PC3 and AA:PC1:PC3 variables (Table 4). In regards to this model using 3 PCs, moderation analysis using CARS as a dependent variable did not produce a significant model (F(16, 49) = 1.74, p-value = 0.071 and adjusted R 2 = 0.15).

Discussion
In the present work, we showed how adverse environmental exposure during gestation and genetics could contribute to ASD vulnerability and including a biological measure as a proxy of adverse environmental exposure after birth we also presented their contribution to ASD clinical severity.
We searched among a clinically homogeneous group of ASD patients, principal components of ASD vulnerability, considering psychiatric family history representing a polygenic contribution and different gestation environmental exposure as entered features. Two final components explained 60% of the variability: one psychosocial (named PV, correlated with psychological stress factors and socioeconomic problems exposition) and the other biological (named BV, correlated with gestation problems and psychiatric family history). We arbitrarily divided the samples into two groups (A and B, above and below PV average respectively), but we cannot see a clear separation between groups, and the formed subgroups were mainly a consequence of the psychosocial exposure component reinforced by statistically differences when comparing Maternal Stress, Maternal Schooling and Social Class between groups. So, even individuals in group B having more familial history than in group A, which implies a larger polygenic genetic background [38,81], the subgroups were not represented by extremes of genetic background and environmental exposure as previously suggested [82]. It is important to note that out of 7 individuals with positive familial ASD history, 3 were in group B and 4 in group A, so we are not forming subgroups of simplex and complex ASD as already described in the literature [83]. Moreover, in the whole-exome analysis, we could not find differences in the burden of de novo and very rare variants with the higher impact between groups.
Genetic and environmental exposures were entered features used to achieve a dimension reduction so we expected that VMRs (variable methylated regions), polymorphic methylation sites that have been aggregated in genomic regions associated with Genetic, Environmental and GxE effects and independent of age would be enriched in any methylome comparison. Accordingly, differentially methylated positions between groups divided by PC1 were enriched by genetic VMRs or those representing the GxE interaction, that is, associated with SNPs (informative VMRs, and VMRs in EWAS studies) [20,23]. As we are looking at differentially methylated positions, we cannot affirm that one group has more G, E, or GxE VMRs than the other, but a different pattern provided by the CpG cluster analysis reinforces biological differences between groups. Moreover, there was an absence of DMP enrichment in rVarbase regulatory elements, but the presence of different elements in both groups suggests that these regions may be affected by common genomic variations in important regulatory regions in different ways. Additionally, DNA hypermethylation in bivalent domains was observed in group A with higher psychosocial stress vulnerability, epigenetic changes in bivalent domains under stress or environmental exposures were observed in cancer cells [84,85]. The differentially expressed DMPs were enriched in databases related to 2 different tissue VMRs (saliva and blood), and in Cells B, T, Fibroblasts, Glia, and Neuron VMRs, which means that these VMRs should not be associated with tissue-specific programming [23,25]. The fact that VMRs are age invariable and the enriched VMRs are not tissue-specific, is important to support their hole in the vulnerability components, considering that our methylation array has been performed when kids were 7 years of age. But it is important to note that enrichment in glial and neuron VMRs together with correspondence analysis between blood and brain suggest that the variation of these CpGs may also have a role in gene expression alterations in brain tissue [25].
In addition to our DMPs being enriched in genomic regions related to genetic and environmental effects, gene enrichment from immune system pathways functions and stress response, already associated with environmental exposition and ASD, were found when analyzing differently methylated gene promoters from epigenetic functional modules. During pregnancy complications as metabolic disorders like insulin resistance, obesity, and chronic hypertension are associated with systemic inflammation [86]. Obesity leads to a persistent state of low-grade inflammation through the recruitment and secretion of pro-inflammatory cytokines by hypertrophic adipocytes. Moreover, excessive body fat and hyperglycemia are sources of oxidative stress [87]. Psychosocial exposure also is associated with inflammation and oxidative stress [34]. Since the epigenetic clock was not calculated when individuals were born, it probably reflects stress accumulated throughout the whole life and not only during gestation. According, the AA was associated with PC1, but there is a lack of differences in the epigenetic clock between the groups. Methylation analysis showed that the DMPs were enriched in VMRs influenced by genetic or environmental factors that participate in pathways important to ASD, supporting our vulnerability components and suggesting that individuals with similar symptoms may have differences in their biological basis To achieve our second goal, we used regression analysis considering CARS and Vineland measured at seven years old as outcomes and formed groups, vulnerability components, and AA as a proxy of adverse environmental factors after birth as predictors. In a categorical view, groups were associated with CARS at seven years of age, whereas AA and sex were not. In a dimensional view, PCs, AA, and sex showed interactions associated with Vineland. Evidence for this is clearer when we apply the regression analyses using vulnerability score as a numeric variable represented by PC1, PC2, and later PC3 loads. Using only the first two PCs as independent variables, Vineland Total Score could be associated with the vulnerability score moderated by AA, suggesting that vulnerability scores calculated through risk factors could explain in part the variation in Vineland Total Score. The property of vulnerability score as a continuum is more pronounced using the model with the three PCs loads. This model resulted in an association with Vineland Total Score with a larger adjusted R 2 , meaning that the variation of these variables can better explain the variation of Vineland Total Score.
This article has a relatively small sample of only Brazilian young patients with severe ASD and it may not reflect the total ASD population. A previous study showed that ASD with and without ID share underlying mechanisms, although they have unique etiologic components [88]. It is a transversal study done with data that relied on the quality of the information provided by the patients' relatives. Part of the individuals had to be excluded due to missing information from the questionnaires. Another limitation is that the specific timing of stress exposure was not measured. Even considering these limitations, we showed how genetic and adverse gestation environmental factors could contribute to ASD vulnerability. Showed differences in methylation sites associated with inherited common genetic inheritance and environmental response to stress, as important pathways for ASD, corroborating our findings. Finally, the regression analysis confirmed that components of vulnerability including environmental influences after birth could contribute to explain phenotypic heterogeneity.

Conclusions
This paper suggests that individuals with similar symptoms and diagnoses may present differences in their pathophysiological basis and, the variability in phenotypes in the autism spectrum could be explained considering the vulnerability components and postnatal environmental exposure.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/genes12091433/s1, Table S1: Score of environmental factors and familial genetic history, Table S2: Summary of descriptive statistics of Groups, Table S3: Type of rVarBase elements in DMPs, Table S4: Functional Epigenetic Modules, Table S5: Enrichment analysis of FEM Genes (GO, KEGG and Phenotype Database). Figure S1: Venn Diagram of correspondence analysis of DNA methylation between blood and brain.  Data Availability Statement: Raw methylation database is available at the GEO database (GSE164563) and whole-exome sequenced reads are available at SRA PRJNA525890.

Acknowledgments:
The authors thank Mariana Maschietto and Caroline Camilo for their technical and scientific support.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A
Social and economic factors: Maternal schooling [52] score was 1 for complete college education, 2 for incomplete college, 3 for complete high School, 4 for incomplete high school, 5 complete basic school, 6 incomplete basic school, and 7 for no education. Family social class [51,52] score was no points for mothers from A and B class, 1 point for from C and 2 points from D.
Maternal stress SCORE: Stress during gestation [14,54], familial fights, illness, or death in the family, verbal, or physical aggression, and emotional or depression symptoms during pregnancy [89] were evaluated. The presence of each represented 1 point in the final score that varied from 0 (no stress) to 6 (presence of all the above).
Toxic environmental exposures SCORE: Tobacco, alcohol and drug use, exposure to pesticides, and use of known deleterious medications during pregnancy-e.g., diazepam, valproic acid, fluoxetine, and sertraline [13,90,91]. The presence of each was worth 0 to 2 points, depending on frequency and quantity of exposure. The total sum represented a score from 0 to 10.
Familial Genetic risk SCORE: All psychiatric disorders reported from the proband families were combined in a single score [8,9]. The pathologies listed were: ASD, schizophrenia, depression, bipolar disorder, alcohol and/or drug abuse, obsessive-compulsive disorder, anxiety, attention deficit, and hyperactivity disorder. We used the combination of all the disorders to represent a polygenic risk of the psychiatric disorder [18]. Each score was balanced according to the degree of familial proximity to the proband. First-degree relatives had weight 3, second-degree 2, and third-degree 1. We also used reliability of information to weigh the score: if the family member was hospitalized for the psychiatric disorder (weight 3), if they used medications or did therapy because of the disorder (weight 2), and if none of the above (weight 1). The final score varied from 0 to 18. For Example Subject 18, (a) mother with depression with no use of medication and no treatment for the condition; (b) uncle with schizophrenia, hospitalized for the condition. The score was 9, calculated as follows: 3 (1st degree relative with depression) × 1 (low reliability) + 2 (2nd degree relative with schizophrenia) × 3 (high reliability).
Gestational complications SCORE: Eclampsia/preeclampsia, gestational diabetes, and excess weight gain during gestation (more than 12 kg) were used to form the score [86,87,90,92,93]. The presence of each was worth 1 point in the final score that varied from 0 to 3.