It is currently estimated that 1.7% of children in the United States are diagnosed with autism spectrum disorder (ASD) [1]. The diagnosis of ASD is based on assessment of behavioral symptoms, which include major impairments in social communication, stereotyped behaviors, and restricted interests [2]. Although there is strong evidence that ASD often begins prenatally due to a complex interaction of genetic and environmental factors [3], diagnosis of ASD postnatally is difficult at early ages since some obvious symptoms are not present in early infancy and other symptoms are difficult to distinguish from normal development. It is important to diagnose children with ASD as young as possible since available interventions are most effective if started early in life [5]. One national prevalence study of eight-year-olds with ASD found that the median age of diagnosis was 46 months for autism and 52 months for ASD [1]; however, this study did not account for children and adults diagnosed at ages above eight years, so the true median age of diagnosis is likely even higher. Stable diagnoses of ASD have been found in children as young as 18 months [6], representing a significant disconnect between current and ideal outcomes.
Although ASD is currently diagnosed based upon behavior, there are physiological factors affected by or contributing to ASD. Development of a biomarker-based test for ASD, using quantifiable measures rather than qualitative judgment, would assist with screening for and diagnosing ASD earlier in childhood [7]. This, in turn, would indicate if further evaluation is needed and allow for intervention and/or therapy to begin as early as possible. Early intervention may maximize the opportunity for improving neural connectivity while brain plasticity is still high [8], likely helping to reduce the severity of ASD or even prevent it from fully manifesting [9]. A number of intervention models have been demonstrated to be significantly beneficial for many children with ASD [10], such as the Early Start Denver Model that has been found effective when started in early infancy [11]. It is thus important to diagnose children with ASD, or at high risk of ASD, as soon as possible so that intervention can begin.
Besides aiding with diagnosis, ASD-related biomarkers may offer value for evaluating treatment efficacy. These would serve as complements to current behavioral/symptom assessments and help to further elucidate the underlying biological mechanisms contributing to ASD. For example, multivariate statistical analysis of changes in plasma metabolites has been found to offer value for modeling changes in metabolic profiles and adaptive behavior resulting from clinical intervention [12]. Functional neuroimaging biomarkers may also be promising indicators of biological response to treatment [13]. In addition, eye-tracking metrics could represent further avenues for quantifying changes in behavior resulting from intervention [14]. As with diagnostic biomarkers, such approaches can help to mitigate subjectivity in treatment assessment arising from the use of behavioral measures.
Beyond its core symptoms, ASD is also associated with a number of co-occurring conditions that contribute to significant heterogeneity in clinical manifestations of the disorder [16]. Gastrointestinal (GI) problems are one such group of conditions that are common in children with ASD [17], especially constipation and/or diarrhea, and are strongly correlated with more severe ASD-related symptoms [19]. Chronic GI symptoms may be due to perturbed gut microbiome homeostasis in individuals with ASD [20], with the resulting metabolic abnormalities possibly contributing to altered GI and nervous system function [21]. If the gut microbiome does indeed have roles in ASD pathophysiology, then correcting its abnormalities may offer one therapeutic pathway for alleviating the symptoms of ASD and its co-occurring conditions [22].
This work presents the results of a pilot study using multivariate statistical modeling to highlight differences between 18 children with ASD and chronic GI disorders and a group of 20 typically developing peers (TD) without GI symptoms. The classification model was then applied to the same participants with ASD during and after treatment with Microbiota Transfer Therapy (MTT) [24] to validate its use as a marker of metabolic changes due to clinical treatment.
2. Materials and Methods
2.1. Study Population and MTT Treatment
The details of the study population and MTT protocol are outlined in a previous study [24]. Briefly, the study involved 18 children with ASD and chronic GI problems (ASD + GI cohort) and 20 TD children without GI problems (TD − GI cohort), all aged 7–16 years. ASD + GI participants’ medical records from the previous two years were extensively reviewed by the study physician to determine eligibility for the study. Diagnoses of ASD were then verified by an evaluator using the Autism Diagnostic Interview-Revised, administered through a phone interview with the parents. This was followed by a general physical health examination by the study physician to verify that the children had moderate to severe chronic GI symptoms. Exclusion criteria included antibiotic use in the previous six months or probiotic use in the previous three months, dependence on tube feeding, the presence of life-threatening GI problems, having recent or scheduled surgeries, being severely malnourished or underweight, and being diagnosed with a single-gene disorder, major brain malformation, ulcerative colitis, Crohn’s disease, celiac disease, or eosinophilic esophagitis. TD children were identified as those not having a diagnosed mental disorder including ASD, attention-deficit hyperactivity disorder, depression, or anxiety; in addition, none of the TD children had parents or siblings with ASD.
MTT consisted of two weeks of oral vancomycin (an antibiotic to reduce pathogenic bacteria), one day of fasting and MoviPrep (a bowel cleanse to remove the vancomycin and further reduce levels of intestinal bacteria), one or two days of high-dose fecal microbiota (FM), and seven or eight weeks of low-dose FM. The FM consisted of a full spectrum of highly-purified microbiota extracted from stool samples of healthy, carefully-screened donors and prepared as previously described [25]. Prilosec, a stomach acid suppressant, was also administered during eight weeks of treatment to increase the survival of orally administered FM through the stomach.
Improvement of GI symptoms through MTT was primarily assessed by the Gastrointestinal Symptom Rating Scale (GSRS) [26] as completed by parents/guardians. The GSRS contains fifteen questions scored in five domains (abdominal pain, reflux, indigestion, diarrhea, and constipation) for evaluating GI symptoms during the previous week on a seven-point Likert scale. From the beginning to the end of MTT, the average GSRS score decreased 82% compared to baseline; eight weeks after treatment stopped, the average score was still 77% lower than at baseline. Changes in ASD-related symptoms were evaluated by the Childhood Autism Rating Scale (CARS), Social Responsiveness Scale, Aberrant Behavior Checklist, and Parental Global Impressions-III (PGI-III). Compared to baseline, the average CARS score decreased by 22% after MTT and by 24% after the eight weeks of follow-up [24]. A significant negative correlation was also detected between the change in GSRS and PGI-III (Spearman rank correlation coefficient of −0.59) [24].
2.2. Metabolite Measurements
Plasma samples were collected by phlebotomists in the morning from fasting participants. The samples were frozen immediately and stored in a −80 °C freezer. When all samples for the study were collected, they were shipped on dry ice to Metabolon (Durham, NC, USA), where sample preparation and data acquisition were performed to obtain metabolite profiles. Samples were extracted and analyzed on the Metabolon platform using ultrahigh performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS) instruments. The Metabolon platform consists of sample accessioning, sample preparation, quality assurance/quality control, and UPLC-MS/MS measurements; detailed information is provided by Long et al. [27]. Measurements for a total of 621 plasma metabolites were available for this study.
Plasma samples were collected from all ASD + GI participants at baseline (Week 0), after the administration of oral vancomycin prior to microbiota transfusion (Week 3), and after the end of MTT treatment (Week 10). Plasma samples from TD − GI controls were only collected at Week 0 as these participants did not undergo treatment.
2.3. Statistical Methods
Multivariate analysis was performed with Fisher discriminant analysis (FDA) [28]. The objective of FDA is to determine a linear combination of metabolites that best separates the ASD + GI and TD − GI study cohorts at baseline (pre-treatment). Prior to FDA, each metabolite measurement was rescaled so that the mean value in the combined ASD + GI and TD − GI groups was 0 with a standard deviation of 1. A discriminant score was calculated by FDA for each study participant by multiplying each input metabolite measurement by a calculated parameter value and then summing these products together. The parameters for each metabolite were estimated such that the difference in mean discriminant score between the ASD + GI cohort and TD − GI cohort was maximized, and the variance of scores within each cohort was simultaneously minimized. Further mathematical details of the algorithm are provided in a previous paper by the authors [29]. Although other methods can be used, FDA has been found to be well-suited for this type of research [30].
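The rescaling and scoring steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard two-class Fisher solution, in which the weight vector solves S_w w = (mean of ASD + GI minus mean of TD − GI) with S_w the pooled within-cohort scatter matrix, it uses the sample standard deviation for rescaling, and the function names and toy measurements are hypothetical.

```python
from statistics import mean, stdev

def zscore_columns(rows):
    """Rescale each column to mean 0 and (sample) standard deviation 1 over all rows."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n + 1):
                m[r][c] -= f * m[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def fda_weights(x_asd, x_td):
    """Fisher direction: solve S_w w = (mean_asd - mean_td), S_w = within-cohort scatter."""
    d = len(x_asd[0])
    mu_asd = [mean(c) for c in zip(*x_asd)]
    mu_td = [mean(c) for c in zip(*x_td)]
    sw = [[0.0] * d for _ in range(d)]
    for group, mu in ((x_asd, mu_asd), (x_td, mu_td)):
        for row in group:
            dev = [v - m for v, m in zip(row, mu)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += dev[i] * dev[j]
    return solve(sw, [a - t for a, t in zip(mu_asd, mu_td)])

def discriminant_scores(rows, w):
    """Score = sum over metabolites of (rescaled measurement * weight)."""
    return [sum(v * wi for v, wi in zip(r, w)) for r in rows]

# Hypothetical toy data: two metabolites, four participants per cohort.
asd = [[1.0, 5.2], [1.4, 4.8], [0.9, 5.9], [1.2, 5.5]]
td = [[2.1, 4.9], [2.6, 5.6], [2.3, 5.1], [2.8, 5.3]]
z = zscore_columns(asd + td)
z_asd, z_td = z[:len(asd)], z[len(asd):]
w = fda_weights(z_asd, z_td)
s_asd, s_td = discriminant_scores(z_asd, w), discriminant_scores(z_td, w)
```

With this sign convention the mean ASD + GI score is always at least the mean TD − GI score; the orientation is arbitrary and could be reversed without changing the separation.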
Performing FDA with all 621 metabolites would lead to model overfitting and limit the generalizability of any findings; thus, it was necessary to identify an optimal subset of measurements for FDA. To facilitate this analysis, the most significant metabolites for classification were determined through a two-step process. First, any metabolites having fewer than fifteen measurements (i.e., 40% of participants) above the detection limit were excluded. The rationale for this step was to focus only on metabolites that had continuous distributions of values across participants, while still allowing for the possibility that a measurement could be almost entirely below the detection limit in one cohort and above the limit in the other cohort. A univariate analysis was then conducted to compute a receiver operating characteristic (ROC) curve for each individual metabolite by plotting the true positive rate against the false positive rate at different ASD + GI/TD − GI classification thresholds. The area under the ROC curve (AUROC) was then calculated to quantify the separation between the ASD + GI and TD − GI cohorts offered by the metabolite. AUROC values typically range between 0.5 and 1.0, with 0.5 reflecting uninformative classification and 1.0 denoting perfect separation. A general understanding is that AUROC values of 0.5–0.6 indicate meaningless classification, 0.6–0.7 is poor, 0.7–0.8 is average, and 0.8–0.9 is good [31], although interpretations may vary by discipline. AUROC values of 0.9–1.0 reflect excellent classification and are desirable for diagnostic tests. For this study, metabolites yielding an AUROC of at least 0.70 were selected as candidates for multivariate analysis with FDA.
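The two-step screening can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the AUROC is computed through its rank interpretation (the probability that a randomly chosen value from one cohort exceeds one from the other, with ties counted as one half), values below the detection limit are marked as `None` and replaced by the smallest observed value (an imputation choice the text does not specify), and the function names and toy values are hypothetical.

```python
def auroc(pos, neg):
    """Fraction of (pos, neg) pairs in which pos is larger; ties count as one half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def informative_auroc(pos, neg):
    """Fold the AUROC into [0.5, 1.0], since either direction of change is informative."""
    a = auroc(pos, neg)
    return max(a, 1 - a)

def screen(metabolites, min_detected=15, min_auroc=0.70):
    """metabolites maps name -> (asd_values, td_values); None marks 'below detection limit'."""
    kept = {}
    for name, (asd, td) in metabolites.items():
        detected = [v for v in asd + td if v is not None]
        if len(detected) < min_detected:      # step 1: detection-limit filter
            continue
        floor = min(detected)                 # assumed imputation: minimum observed value
        asd_filled = [floor if v is None else v for v in asd]
        td_filled = [floor if v is None else v for v in td]
        score = informative_auroc(asd_filled, td_filled)
        if score >= min_auroc:                # step 2: univariate AUROC filter
            kept[name] = score
    return kept

# Hypothetical toy panel (thresholds lowered to suit the tiny sample).
toy = {
    "metabolite_a": ([1.0, 2.0, None], [5.0, 6.0, 7.0]),
    "metabolite_b": ([1.0, None, None], [None, None, 2.0]),
}
candidates = screen(toy, min_detected=4)
```

In this toy panel only `metabolite_a` passes: `metabolite_b` has just two detected values and is removed by the first filter.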
Classification with FDA first involved using the top candidate metabolites to exhaustively evaluate all combinations of up to five metabolites. For each number of metabolites used (two, three, four, or five), the 1000 combinations producing the highest AUROC from the fitted discriminant scores were retained. The distributions of discriminant scores within each cohort yielded by each top combination in FDA were then estimated with kernel density estimation; this method uses Gaussian kernels to approximate the probability density functions (PDFs) of the discriminant scores. Defining our classification threshold for separating these distributions to be the null hypothesis H0, which states that a given sample belongs to the TD − GI cohort, the Type I (false positive) error α is then taken to be the probability of incorrectly diagnosing a TD − GI participant as being ASD + GI. Similarly, the Type II (false negative) error β is defined as the probability of incorrectly diagnosing an ASD + GI participant as being TD − GI. The Type I and Type II errors were calculated based on H0 with respect to the PDFs obtained from model fitting.
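One way to read the threshold and error definitions above is sketched below. The sketch assumes that ASD + GI discriminant scores lie above TD − GI scores, that the kernel bandwidth follows Scott's rule (the text does not state the bandwidth choice), and that the threshold is located by bisection on the kernel-smoothed cumulative distribution; all names and scores are hypothetical.

```python
from math import erf, sqrt
from statistics import stdev

def scott_bandwidth(sample):
    """Scott's rule for one-dimensional data: h = sd * n^(-1/5)."""
    return stdev(sample) * len(sample) ** (-1 / 5)

def kde_cdf(x, sample, h):
    """CDF of a Gaussian kernel density estimate: average of normal CDFs centred on the data."""
    return sum(0.5 * (1 + erf((x - s) / (h * sqrt(2)))) for s in sample) / len(sample)

def threshold_for_beta(asd_scores, beta, tol=1e-9):
    """Find the threshold whose ASD + GI mass on the TD side (Type II error) equals beta."""
    h = scott_bandwidth(asd_scores)
    lo = min(asd_scores) - 10 * h
    hi = max(asd_scores) + 10 * h
    while hi - lo > tol:                      # bisection on a monotone CDF
        mid = (lo + hi) / 2
        if kde_cdf(mid, asd_scores, h) < beta:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def type_i_error(td_scores, threshold):
    """TD - GI mass above the threshold, i.e. TD participants called ASD + GI."""
    return 1 - kde_cdf(threshold, td_scores, scott_bandwidth(td_scores))

# Hypothetical discriminant scores.
asd_scores = [2.0, 2.5, 3.0, 3.5, 4.0]
td_scores = [-1.0, -0.5, 0.0, 0.5, 1.0]
h0 = threshold_for_beta(asd_scores, beta=0.05)
alpha = type_i_error(td_scores, h0)
```

Repeating the call with β = 0.01, 0.10, and 0.20 moves the threshold along the ASD + GI distribution and trades Type II error against Type I error.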
The top 1000 combinations for each number of metabolites were evaluated with leave-one-out cross-validation, in which the classification of each participant was predicted using an FDA model fitted to the remaining (n minus 1) participants’ samples. This step is important, as it means that rather than merely fitting to the data, an estimate of the model’s ability to predict new data was obtained. It generally yields lower accuracies than fitting procedures, but the results are more likely to reflect generalizability to larger data sets. To evaluate each candidate model, the cross-validated sensitivity (or true positive rate, TPR, calculated as the number of correctly classified ASD + GI participants divided by the total number of ASD + GI participants) and specificity (or true negative rate, TNR, calculated as the number of correctly classified TD − GI children divided by the total number of TD−GI children) were calculated at values of the classification threshold H0 for which β = 0.01, 0.05, 0.10, and 0.20; modulating H0 in this manner allowed for characterization of the cross-validated performance of each model when placing the threshold at different points along the ASD + GI distribution.
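The leave-one-out procedure and the sensitivity/specificity calculation can be sketched generically. To keep the example short, a deliberately simple stand-in classifier (a cut-off halfway between the two cohort means of a single score) replaces the FDA model with its kernel-density threshold; only the cross-validation loop is meant to mirror the text, and all names and values are hypothetical.

```python
from statistics import mean

def fit_midpoint(asd, td):
    """Stand-in classifier (not FDA): cut-off halfway between the cohort means."""
    m_asd, m_td = mean(asd), mean(td)
    cut = (m_asd + m_td) / 2
    asd_is_high = m_asd >= m_td
    return lambda x: (x >= cut) == asd_is_high   # True means "predicted ASD + GI"

def loocv(asd, td, fit=fit_midpoint):
    """Refit on n - 1 participants, predict the held-out one, and tally the results."""
    true_pos = sum(fit(asd[:i] + asd[i + 1:], td)(x) for i, x in enumerate(asd))
    true_neg = sum(not fit(asd, td[:i] + td[i + 1:])(x) for i, x in enumerate(td))
    sensitivity = true_pos / len(asd)            # TPR
    specificity = true_neg / len(td)             # TNR
    return sensitivity, specificity

# Hypothetical scores; the last ASD + GI value overlaps the TD - GI range.
asd_values = [3.0, 3.5, 4.0, 0.2]
td_values = [0.0, 0.5, 1.0, 1.5]
sens, spec = loocv(asd_values, td_values)
```

Because the model is refit without the held-out participant each time, the overlapping ASD + GI value is misclassified here, which is the kind of error that fitting to the full data set can hide.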
To further evaluate individual FDA models, sample-level classification accuracies (CAs) and misclassification errors (MEs) resulting from leave-one-out cross-validation were also assessed. While holding out each participant in cross-validation, the PDFs of discriminant scores for the remaining n minus 1 participants were estimated. The percent membership of a held-out sample in its own cohort’s PDF (i.e., the probability of being classified in the correct cohort) was taken to be that sample’s CA, while the percent membership in the incorrect cohort’s PDF (i.e., the probability of being classified in the incorrect cohort) was defined to be the sample’s ME. High-confidence models are those having many samples with CA greater than 0.05 and ME less than 0.05.
The FDA model developed at baseline was used to assess changes in the plasma metabolite data for ASD + GI participants at Week 3 and Week 10 of MTT. Data from these time points were rescaled according to the mean and standard deviation parameters used to rescale the Week 0 metabolites. Changes resulting from MTT were quantified by the Type II error, with respect to the threshold H0, associated with the PDF of the ASD + GI cohort’s discriminant scores at each time point. While the goal in hypothesis testing is typically to minimize Type II error, a larger Type II error is desired in this analysis since it is expected that successful treatment will make the ASD + GI cohort’s distribution less distinguishable from that of the TD − GI cohort. An additional metric of the MTT effect was the effect size, calculated at each time point as the median difference in discriminant score from baseline (where each participant’s sample was paired with their baseline sample); the 95% confidence interval (CI) for the effect size was calculated by non-parametric bootstrap resampling with 10,000 resamples [32]. Significance level α = 0.05 was used for all hypothesis testing.
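The effect size and its bootstrap interval can be sketched as below. The percentile method is assumed for the interval, since the text specifies only non-parametric resampling with 10,000 resamples, and the function name, seed, and score values are hypothetical.

```python
import random
from statistics import median

def bootstrap_median_ci(diffs, n_resamples=10_000, level=0.95, seed=0):
    """Median of paired differences with a percentile bootstrap confidence interval."""
    rng = random.Random(seed)                    # fixed seed for reproducibility
    n = len(diffs)
    # Resample participants with replacement and record the median each time.
    meds = sorted(median(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return median(diffs), (meds[lo_idx], meds[hi_idx])

# Hypothetical discriminant scores for the same participants at two time points.
week0 = [3.1, 2.8, 3.6, 2.9, 3.3, 3.0, 2.7]
week10 = [1.2, 1.9, 0.8, 2.5, 1.1, 1.4, 2.0]
paired_diffs = [after - before for before, after in zip(week0, week10)]
effect, (ci_low, ci_high) = bootstrap_median_ci(paired_diffs)
```

Resampling the paired differences, rather than the two time points separately, preserves the within-participant pairing described in the text.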
Univariate analysis of plasma metabolites revealed that many individual metabolites could modestly differentiate between the ASD + GI and TD − GI cohorts, with 61 metabolites yielding an AUROC of at least 0.7 and the highest AUROC of 0.89 associated with nicotinamide riboside. No individual metabolite could classify ASD + GI with an AUROC value greater than 0.9. In contrast, multivariate modeling with FDA identified at least 1000 combinations of metabolites that could classify with an AUROC of 0.97 or greater through model fitting. One of the most promising combinations, a three-metabolite panel referred to as the PM3, was able to classify ASD + GI with 94% sensitivity and 100% specificity after cross-validation. This multivariate approach achieved a level of separation between the ASD + GI and TD − GI cohorts that could not be attained from the metabolites individually.
Many top metabolites were found to be significantly correlated with each other, possibly due to these metabolites coming from the same or closely connected metabolic pathways [33]. Multivariate approaches such as FDA are appropriate for addressing correlations in biological networks [34] and do not require that the relationships between measurements be specified or well-defined. By identifying metabolites for the PM3 that were largely uncorrelated, it is possible to maximize the amount of discriminating information (i.e., metabolic patterns separating the ASD + GI and TD − GI cohorts) with a minimal number of metabolites. Further investigation of the biological significance of these metabolites and the metabolites they are correlated with is warranted.
Classification performance of the PM3 was evaluated with leave-one-out cross-validation, which supported the classifier’s ability to generalize to independent data sets. Although implementing other methods of cross-validation such as k-fold cross-validation may help to further support these conclusions (especially given the large panel of metabolites involved), the small sample size introduces limitations with respect to how much the data set can be partitioned without approaching very small sample sizes in those partitions. A true validation set containing new ASD + GI and TD − GI participants (without treatment) would help to further evaluate the PM3 and alleviate potential concerns of overfitting, which is still not completely ruled out here given the small sample size and large initial number of available metabolites. Applying the model to the MTT Week 3 and Week 10 data for ASD + GI participants suggested that it may be a useful biomarker of treatment efficacy, and that major changes in metabolites associated with ASD and/or GI symptoms did occur; this is consistent with reported improvements in GI and ASD-related symptoms after treatment [24] and with a recent study that found mice colonized with the gut microbiota of children with ASD to show significantly different metabolic and behavioral profiles from mice colonized with the gut microbiota of TD children [35]. It is also worth highlighting that the large initial metabolic shift observed at Week 3 reflects the effect of vancomycin by itself, while the later shift observed at Week 10 reflects the effect of vancomycin in addition to MTT. Future studies may aim to include additional sample collection time points to evaluate the contributions of the individual treatment steps and better characterize the metabolic changes brought about by MTT.
For the PM3 metabolites, it is interesting to note that sarcosine at baseline had a bimodal distribution, with most ASD + GI participants having very low levels (15% of the TD − GI median) but with several in the normal range. After MTT they were all in the normal range. Conversely, the distribution of IMP was unimodal and broadly low in the ASD + GI group but clearly improved after treatment. The majority of values for tyramine O-sulfate (78%) were below the detection limit in the ASD + GI cohort at baseline, and at Week 3 and Week 10 there were still 78% and 72% of samples below the detection limit, respectively. In contrast, less than half of the values for tyramine O-sulfate (45%) were below the detection limit at baseline in the TD − GI cohort. The lack of improvement in this metabolite after MTT may indicate it (and/or correlated metabolites) as a target for future interventions to further improve the metabolic profiles of children with ASD and GI symptoms; however, it is also possible that its production/consumption are host-driven and not microbial, and thus is less likely to be responsive to MTT. Additionally, since so many values were below the detection limit in this study, it would be beneficial to measure this metabolite with greater accuracy in future studies to increase the reliability of the measurements for classification and evaluation of treatment efficacy.
Previous work by the authors revealed that blood markers of DNA methylation and oxidative stress from the folate-dependent one-carbon metabolism (FOCM) and transsulfuration (TS) pathways could be used to predict ASD status with 98% sensitivity and 96% specificity [36] with subsequent validation in a follow-up study [30]. Multivariate analysis of FOCM/TS markers has also been found to provide an indication of metabolic and behavioral improvement resulting from clinical interventions [12]. The current panel of metabolites is not directed specifically at these pathways but was used to achieve (preliminary) results comparable to those obtained using targeted FOCM/TS measurements, albeit in individuals with known gastrointestinal symptoms. That being said, several metabolites appearing in the top 61 metabolites, such as sarcosine, cysteinylglycine, and glutamate, do have roles in FOCM and/or TS, and sarcosine was further included in the PM3. Future plasma metabolomics studies may wish to target additional FOCM/TS markers to further explore their validity for accurately classifying ASD.
There have been many other studies of metabolites in plasma or serum of children with ASD [37]. Several of these studies attempted to classify ASD versus TD children, with some of the most successful studies including Anwar et al. [37] (four-metabolite model with sensitivity/specificity of 92%/84%) and Momeni et al. [40] (three-peptide model with sensitivity/specificity of 95%/85%). Among these studies, none of them included cross-validation or validation in another study, unlike the aforementioned studies analyzing FOCM/TS metabolites [30]. Similarly, in this paper we demonstrated that a three-metabolite model was able to distinguish ASD + GI from TD − GI with 94% sensitivity and 100% specificity after leave-one-out cross-validation. Only one metabolomics paper [47] specifically investigated the subset of children with ASD who had GI problems (using urinary metabolites) and it was found that the ASD group with GI problems had four gut bacterial metabolites that were significantly different. Additionally, none of the previous metabolite studies assessed changes after treatment to determine if the model could be used as a biomarker of treatment efficacy.
In studies such as this one where the prevalence of ASD in the study sample did not match the prevalence of ASD in the overall population, the classifier’s positive and negative predictive values may be incorrectly represented with respect to its true clinical values [48]. To gain an indication of the classifier’s true clinical utility, the Bayes’ adjusted positive and negative predictive values should be calculated by incorporating the true population prevalence. Without adjusting for prevalence, the positive and negative predictive values of the PM3 were 100% and 95%, respectively. After adjusting for ASD population prevalence, assuming the current U.S. prevalence estimate of 1.7% [1], the positive predictive value remains 100% and the negative predictive value increases to 99%. There is thus minimal mismatch between the predictive values of the PM3 in the study population and the adjusted estimates for the general population.
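The prevalence adjustment follows directly from Bayes' theorem and can be checked with a short sketch. The counts used here (17 of 18 ASD + GI and 20 of 20 TD − GI participants correctly classified) are inferred from the reported 94% sensitivity and 100% specificity and are an assumption, as is the function name.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values for a given prevalence (Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

sens, spec = 17 / 18, 20 / 20        # assumed counts behind 94% and 100%

# Prevalence in the study sample: 18 ASD + GI out of 38 participants.
ppv_study, npv_study = predictive_values(sens, spec, prevalence=18 / 38)

# Prevalence in the general U.S. population, as cited in the text.
ppv_pop, npv_pop = predictive_values(sens, spec, prevalence=0.017)
```

With perfect specificity there are no false positives, so the positive predictive value stays at 100% for any prevalence, while the negative predictive value rises as ASD becomes rarer in the population being tested.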
A major limitation of our study was its small sample size. Although the PM3 was able to accurately classify a small number of participants, it remains to be seen whether the metabolic patterns used for classification would hold up for a larger study population. The results thus require validation on larger cohorts beyond what our cross-validation procedure was able to accomplish. Moreover, potential subgroups in the ASD + GI cohort should be considered, since previous studies have reported the presence of subgroups in ASD [49]. The small sample size might also influence the presence of outliers in the post-treatment ASD + GI participants (as seen at Week 10). It would have been valuable to validate the PM3 on later time points for TD − GI participants, but plasma samples were only collected from the TD − GI cohort at baseline. Additionally, this study compared individuals with ASD and GI symptoms to TD individuals without GI issues, meaning the classification of ASD versus TD and subsequent assessment of ASD at later time points was confounded by the presence of GI issues in the ASD + GI cohort. The observed metabolic shifts at Week 3 and Week 10 might not be due solely to improvement in ASD-related symptoms and might be influenced by improvements in GI-related symptoms. The PM3 should be further developed for a more general population of individuals, regardless of GI (or other co-occurring condition) status and would also ideally be generalizable to individuals with or without single-gene disorders.