Multivariate Analysis of Fecal Metabolites from Children with Autism Spectrum Disorder and Gastrointestinal Symptoms before and after Microbiota Transfer Therapy

Fecal microbiota transplant (FMT) holds significant promise for patients with Autism Spectrum Disorder (ASD) and gastrointestinal (GI) symptoms. Prior work has demonstrated that plasma metabolite profiles of children with ASD become more similar to those of their typically developing (TD) peers following this treatment. This work measures the concentration of 669 biochemical compounds in feces of a cohort of 18 ASD and 20 TD children using ultrahigh performance liquid chromatography-tandem mass spectroscopy. Subsequent measurements were taken from the ASD cohort over the course of 10-week Microbiota Transfer Therapy (MTT) and 8 weeks after completion of this treatment. Univariate and multivariate statistical analysis techniques were used to characterize differences in metabolites before, during, and after treatment. Using Fisher Discriminant Analysis (FDA), it was possible to attain multivariate metabolite models capable of achieving a sensitivity of 94% and a specificity of 95% after cross-validation. Observations made following MTT indicate that the fecal metabolite profiles become more like those of the TD cohort. There was an 82–88% decrease in the median difference of the ASD and TD group for the panel metabolites, and among the top fifty most discriminating individual metabolites, 96% report more comparable values following treatment. Thus, these findings are similar, although less pronounced, as those determined using plasma metabolites.


Introduction
Autism spectrum disorder (ASD) encompasses a large group of early onset neurological conditions that result in impairments in social behavior and communication, which are estimated to affect 1 of gut microbiota transfer therapy on the fecal metabolites of the ASD group. The study involved 38 children, aged 7-16 years, 18 of these professionally diagnosed with ASD by a healthcare provider (verified with the Autism Diagnostic Interview-Revised) and 20 determined to be typically developing. The participants with ASD were required to have moderate to severe GI problems, and the range of GI issues included constipation, diarrhea, and alternating diarrhea/constipation. GI symptoms were assessed biweekly with the Gastrointestinal Severity Rating Scale (GSRS) and daily with a Daily Stool Record using the Bristol Stool Form scale [25]. The study consisted of 2 weeks of antibiotic therapy, 1 day of bowel cleans, and a high major initial dose and 7-8 weeks of lower maintenance doses of FMT treatment followed by evaluation at 8 weeks post treatment. The TD group did not undergo MTT, but instead was used as a comparison group whose measurements were taken at the same time as the ASD group before treatment. The MTT experimental protocol and details of the study population are outlined in Kang et al. [25].
The pre-treatment protocol consisted of two weeks of oral vancomycin, which is a broad spectrum non-absorbable antibiotic. This treatment was intended to reduce pathogenic bacteria and prime the GI system for MTT. The dose of vancomycin administered was individualized to the weight of each participant at 40 mg/kg, with a maximum dose of 2 g [25]. Participants were then subjected to one day of fasting and a bowel cleanser (MoviPrep) in order to remove the vancomycin and further reduce levels of intestinal bacteria. Standardized Human Gut Microbiota (SHGM) consisted of a full spectrum of highly purified microbiota from healthy, carefully screened donors. The ASD cohort was split into two groups, each one following a different initial high dose (2.5 × 10 12 cells/day) SHGM treatment. One MTT treatment consisted of a single dose administered rectally (n = 6) while the other involved doses administered orally on two days (n = 12). Both techniques were followed by a lower concentration SHGM maintenance dosage (around 2.5 × 10 9 cells) given orally, with treatment ending 8 weeks after the initial high dose [25]. However, the protocol differed slightly for both groups of ASD children as those that received SHGM rectally waited for one week prior to beginning low dose SHGM.

Metabolite Measurements
Once the study had concluded, aliquots of the fecal samples were shipped overnight on dry ice to Metabolon (Durham, NC, USA). Both the control and autism samples were blinded and randomized prior to being shipped. Metabolon utilized ultrahigh performance liquid chromatography-tandem mass spectroscopy (UHPLC-MS/MS) instruments for obtaining metabolomic information on 669 metabolites. A detailed overview regarding this protocol can be found in Long et al. [31]. By using this technique, it is possible to determine a signal intensity corresponding to a metabolite's presence in a sample. Subsequently, the signal intensity is used to derive the relative abundance of each metabolite. For this objective, peak area integration using the area under the curve was utilized. In the case of missing values, imputation was performed by taking the lowest value of each compound measurement divided by the square root of 2.
Fecal samples were taken at four time points from the participants with ASD ( Figure 1). Parents were instructed to freeze these sample immediately after collection for up to 3 days, and the samples were then transported to Arizona State University on dry ice where they were stored in a −80 • C freezer. Initial fecal samples were collected from all participants at Week 0. Samples were also taken from ASD participants at the Week 3 mark from the beginning of the treatment (after about five days of microbiota transplant) and at the end of MTT treatment (Week 10). The ASD group was sampled again 8 weeks after administration of SHGM ceased (Week 18). In total, 18 ASD participants collected samples at all time points aside from Week 3, where only 17 samples were collected. The TD group received no treatment and 20 were sampled at the beginning (Week 0).

Statistical Analysis
The data collected for each of the metabolites underwent various forms of statistical analysis to assess differences between the ASD + GI and TD cohort. By comparing the differences observed for metabolites before and after MTT, it might be possible to gain some understanding of the role that this therapy could play in altering metabolic processes. Both univariate and multivariate techniques were used in this regard, and the implementation of the analysis routines was done in MATLAB.
J. Pers. Med. 2020, 10, x FOR PEER REVIEW 4 of 28 Figure 1. Timeline for the experimental protocol, which can be divided into three main phases. From Week 0 to Week 3, the Autism Spectrum Disorder (ASD) cohort is primed for Microbiota Transfer Therapy (MTT). From Week 3 to Week 10 the ASD cohort receives low dose fecal microbiota (FM) or is prepared for low dose FM, and finally from Week 10 to Week 18 no treatment is given to the individuals. A closeup is provided of Week 2-3 as this is when Standardized Human Gut Microbiota (SHGM) is initialized.

Statistical Analysis
The data collected for each of the metabolites underwent various forms of statistical analysis to assess differences between the ASD + GI and TD cohort. By comparing the differences observed for metabolites before and after MTT, it might be possible to gain some understanding of the role that this therapy could play in altering metabolic processes. Both univariate and multivariate techniques were used in this regard, and the implementation of the analysis routines was done in MATLAB.

Preprocessing
In order to ensure continuous distribution of values across all participants, metabolites with too many values below the detection limit at their initial stool sample (Week 0) were removed. The detection limit for a metabolite was determined to be the minimum value recorded for that metabolite. If less than 40% of all measurements were above the detection limit, the metabolite was removed from subsequent analysis. This step accounted for the possibility that a measurement could be almost entirely below the detection limit in one cohort, while simultaneously being above the limit in the other cohort. The remaining metabolites were then normalized such that for each metabolite the median value was 1.0 in the Week 0 TD cohort.

Univariate Analysis
Univariate analysis identifies metabolites that are differentially expressed between the ASD and TD cohorts. Using this information, it is possible to examine common correlations and relationships across different measurement quantities. In turn, this has the potential to identify underlying mechanisms of ASD etiology as well as provide guidance for the development of a multivariate model that can more accurately distinguish between both groups. As there are 669 metabolites under investigation, there is significant concern related to overfitting of statistical models if many or all these measurements are used to develop a multivariate model. By reducing the number of measurements to a smaller subset, it is possible to alleviate some of the concerns related to overfitting. Timeline for the experimental protocol, which can be divided into three main phases. From Week 0 to Week 3, the Autism Spectrum Disorder (ASD) cohort is primed for Microbiota Transfer Therapy (MTT). From Week 3 to Week 10 the ASD cohort receives low dose fecal microbiota (FM) or is prepared for low dose FM, and finally from Week 10 to Week 18 no treatment is given to the individuals. A closeup is provided of Week 2-3 as this is when Standardized Human Gut Microbiota (SHGM) is initialized.

Preprocessing
In order to ensure continuous distribution of values across all participants, metabolites with too many values below the detection limit at their initial stool sample (Week 0) were removed. The detection limit for a metabolite was determined to be the minimum value recorded for that metabolite. If less than 40% of all measurements were above the detection limit, the metabolite was removed from subsequent analysis. This step accounted for the possibility that a measurement could be almost entirely below the detection limit in one cohort, while simultaneously being above the limit in the other cohort. The remaining metabolites were then normalized such that for each metabolite the median value was 1.0 in the Week 0 TD cohort.

Univariate Analysis
Univariate analysis identifies metabolites that are differentially expressed between the ASD and TD cohorts. Using this information, it is possible to examine common correlations and relationships across different measurement quantities. In turn, this has the potential to identify underlying mechanisms of ASD etiology as well as provide guidance for the development of a multivariate model that can more accurately distinguish between both groups. As there are 669 metabolites under investigation, there is significant concern related to overfitting of statistical models if many or all these measurements are used to develop a multivariate model. By reducing the number of measurements to a smaller subset, it is possible to alleviate some of the concerns related to overfitting.
Metabolites were individually analyzed for their ability to classify between the ASD and TD cohorts at their Week 0 measurements. The area under the receiver operator curve (AUROC) served as an assessment of the potential of a metabolite to distinguish between ASD and TD groups. This metric is defined as the false positive rate against the false negative rate at different ASD/TD classification thresholds. An AUROC of 1.0 indicates the capacity for perfect separation, while an AUROC of 0.5 indicates that there is no ability to distinguish between the groups. Metabolites with an AUROC value above 0.6 were selected as candidates for use in multivariate analysis in this work.
Univariate analysis techniques evaluated whether significant changes had occurred among the metabolites between the beginning and end of the study for the ASD cohort. For this purpose, the metabolite measurements at Week 0 and Week 18 were compared using a parametric or non-parametric test, depending upon their distribution. An Anderson-Darling test for normality was used at both time points to determine the distribution of each set of measurements. Subsequently, either a Wilcoxon signed-rank test or a paired t-test was performed on the ASD group, comparing measurements from Week 0 to Week 18. A relatively normal distribution employs the parametric paired t-test; otherwise, the non-parametric Wilcoxon signed-rank test is used. The resulting p-value indicates how significantly the concentration of the metabolite changed for the cohort over the course of the study.
As there were a considerable number of quantities measured per study participant, it was imperative that multiple hypothesis correction tests were utilized. Subsequently, a false discovery rate (FDR) for each individual metabolite was computed using leave-n-out (n = 1, 2, 3) cross validation (see Table A1). Leave-n-out is an iterative process and involves removing n individual data points from the total dataset and rerunning the univariate analysis on this subset. This procedure is repeated so that all possible combinations with n removed individuals are assessed. The FDR is calculated as the proportion of univariate results that were not deemed significant.

Multivariate Analysis
Fisher discriminant analysis (FDA), metabolites that had been identified as having an AUROC value above 0.6 were used to develop a multivariate model for distinguishing between the ASD and TD cohorts. FDA is a dimensionality reduction technique that seeks to separate classes of data by determining a projection where such separation is maximized [32]. This is achieved by maximizing the ratio between the between-class variability S B and the within-class variability S W for a weight vector W.
In the case of K classes with n number of samples and m measurements, S B is defined as follows, where x denotes the global mean, x k denotes the local class mean, and n k is the number of samples within class k: In contrast, the within-class covariance matrix S W is defined as the following, where x i corresponds to an individual sample: Thus, FDA simultaneously maximizes the scatter between classes and minimizes the scatter within each class to find k-1 vectors that maximize the objective function. Subsequently, the eigenvector corresponding to the k-1 largest eigenvalue of S B S W corresponds to the optimum weight vector.
For this dataset, the objective of FDA is to separate the ASD and TD cohorts with a combination of metabolites. The initial stool samples (Week 0) were used to develop these models, so that the model classifies individuals before any treatment.
The previously mentioned preprocessing and univariate analysis steps were performed to reduce the set of metabolites considered for FDA. An FDA model could potentially be created with all 669 metabolites, but this model would likely overfit the data. To account for this, only metabolites that passed the preprocessing step and achieved a univariate AUROC of over 0.6 remained in consideration. This resulted in 165 metabolites under further investigation.
An exhaustive search was performed through all possible combinations of 2, 3, or 4 metabolites of the reduced set of 165 metabolites to determine the models which best separate the ASD and TD groups at Week 0. The AUROC was used here as well, measuring how well the multivariate models classify the two groups. Using kernel density estimation, the probability density function of each model was computed. Iterating through all combinations, the models were assessed, and the combination of metabolites was determined for each number of variables. For each number of metabolites, the 1000 models that had achieved the highest possible AUROC were recorded. To derive the five-metabolite models, all 1000 four metabolite models that had achieved the highest AUROC were augmented with each of the remaining 161 metabolites that had an AUROC greater than 0.6. The top 1000 five metabolite models that had the highest AUROC were then subjected to leave-one-out cross validation.

Cross-Validation
Leave-one-out cross-validation was performed on the optimal FDA models to evaluate robustness and statistical independence. Cross-validation ensures that, rather than merely fitting a model to presented data, the model obtained is also capable of classifying new data. Although cross-validation generally has a lower accuracy than what is computed just by fitting a model to data, the cross-validation accuracy will better reflect generalizability to new data sets, i.e., data not used for developing a classifier. Leave-one-out cross-validation proceeds iteratively, as a single individual's data is removed from the total dataset, then an FDA model is computed with measurements from the remaining individuals [33]. The measurements from the removed individual are now used as a test case to determine if the model prediction regarding classification is correct. This process is repeated for measurements from each of the individuals in the dataset: their data are removed, a model is developed with the remaining data, then they are classified with this model, until the data for each individual has been removed once. A confusion matrix is computed which includes the true positive rate (TPR), or sensitivity, and the true negative rate (TNR), or specificity. Additionally, for each model, the Type II (false negative) error β was modulated between 0.01, 0.05, 0.1, and 0.2 during cross-validation. The Type II error determined the threshold value for separating the two groups. By alternating the values of β, it was possible to evaluate the cross-validated performances along different positions of the ASD distribution. Lowering β meant raising the Type I error while lowering the Type II error and the converse also holds true. Thus, each of the four models (2, 3, 4, or 5 metabolites) had cross-validation performed four times, with corresponding computation of TPRs and TNRs.

Model Evaluation
The models obtained after cross validating at different thresholds for data collected at Week 0 were used to make predictions about the ASD group at the other MTT time points. Specifically, measurements at Week 3, Week 10 and Week 18 were used to monitor the change in classification performance over the course of the MTT protocol. Data from these time points were rescaled with respect to the TD Week 0 median and standard deviation. The probability density functions were compared between the time points, and the discriminate scores for each model as well as of their constituent metabolites were determined. Changes resulting from MTT were quantified using the Type II error, with respect to the threshold associated with the probability density function (PDF) of the ASD + GI cohort's discriminant scores at each time point. Thus, both univariate assessments were performed as well as the total assessment of the multivariate models' discriminant score. Additionally, correlation analysis between significant metabolite pairs was performed to determine possible underlying relationships.

Univariate Analysis
In total, there were 669 fecal metabolites that were measured in the study. Through the preprocessing step, 86 metabolites were determined not to have the prerequisite number of observations above the detection limit for further analysis. In order to classify ASD and TD cohorts at their Week 0 measurements, the area under the AUROC was used as an assessment of the potential of a metabolite to distinguish between ASD and TD groups. The remaining 583 metabolites were ranked according to their univariate AUROC, and 165 metabolites with an AUROC of at least 0.6 were identified. No single metabolite perfectly separated the cohorts (which would correspond to an AUROC = 1.0), as the metabolite with the highest AUROC, carnitine, achieved a value of 0.77 (Table 1).
Using the 165 metabolites with an AUROC greater than 0.6, additional univariate testing was performed to assess the degree to which measurements shifted following MTT. ASD metabolite samples measured at Week 0 were compared to their values following MTT at Week 18 using either a paired t-test or a Wilcoxon signed-rank test depending upon the distribution determined for the data via the Anderson-Darling normality test. It was found that 10.9% of the metabolites significantly changed (p < 0.05) following the MTT therapy when comparing the ASD group before and after treatment (see Table A1). The metabolites that had a threshold AUROC value of 0.6 were subsequently used for model discovery for the 2-, 3-, 4-and 5-metabolite models.

FDA Models
The FDA models with the greatest AUROC values for each number of constituent metabolites are listed in Table 2. The probability density function (PDF) of discriminant scores for the 2-, 3-, 4-, and 5-metabolite models that achieved the highest accuracy following cross validation are shown in Figure 2. There were two distinct models that were identified, using five separate metabolites, as having achieved the same accuracy after cross-validation. With the exception of one metabolite which differed between them (Adenosine and Indole), the constituents of these panels are identical. These two metabolite models are both shown in Table 2 and will be referred to as OFM-A and OFM-I, optimized fecal model-adenosine and optimized fecal model-indole, respectively (OFM-I/A). For all optimized metabolite panels, the TPR and TNR values for each are presented when the β value was modulated. The 5-metabolite models had higher AUROCs than the 2-, 3-, and 4-metabolite models, so they are the focus of the following analysis, due to their higher accuracy (0.95 specificity and 0.94 sensitivity). Modulating β revealed that the optimal cut-offs between the ASD and TD distributions for the OFM-I and OFM-A models was β = 0.05 for both the OFM-A and OFM-I.
For the 1000 best models with five metabolites, the AUROC ranged from 0.97 to 1.00 which are high values. The reason for using the 1000 best models is that there are not only one or two best models as judged by AUROC alone. Each of these models was subjected to cross validation, with OFM-I/A being derived from those that achieved the highest accuracy. The metabolites ultimately utilized for the development of a five-component model were all found to be in the top quartile of prevalence in the 1000 top models ( Figure 3). Notably, among the top fecal metabolite models, adenosine and hydroxyproline appeared in 36.3% and 62.4% of models, respectively. Only three metabolites were present in more than 25% of the top 1000 models that were not among those included in the OFM-I/A panels. These metabolites were Adenine, 2-aminobutyrate and 1,7-dimethylurate (corresponding to the 5th, 57th, and 86th highest AUROC rank, respectively). Table 1. The top 50 fecal metabolites (area under the receiver operator curve (AUROC) ≥ 0.66) distinguishing between ASD and typically developing (TD) groups at baseline ranked by univariate AUROC with at least 40% values above the detection limit (15 measurements). The AUROC at Week 18 is also provided. The entire list of top 165 metabolites is not shown due to space constraints (see Table A1). Either a Wilcoxon signed-rank test or a paired t-test was performed on the ASD group, comparing measurements from Week 0 to Week 18 for each of the metabolites. The p-value of those metabolites that significantly changed after MTT (p-value < 0.05) are presented. The primary associated sub-pathway for each metabolite is also provided [34].

Correlation Analysis
Correlation analysis was performed on the OFM metabolites as these were the ones that had been identified as being able to distinguish between the ASD and TD cohorts with the highest accuracy after cross-validation. It can be observed that many of the top 50 metabolites (AUROC ≥ 0.66) were significantly correlated with the OFM metabolites (Table 3). In contrast, the individual OFM metabolites for both models had little to no correlation with each other, apart from hydroxyproline with adenosine and 2-hydroxy-3-methylvalerate; these findings were expected since, if individual OFM metabolites were highly correlated with each other, then they would not be useful in the model due to their correlation.

Assessing Effects of MTT
Univariate assessment of the top 50 metabolites as ranked by AUROC demonstrated that 14% of these 50 metabolites showed significant differences in their Week 0 and Week 18 ASD measurements and that 47 of these 50 metabolites achieved a lower AUROC eight weeks following treatment (Table 1). In addition to classification at baseline, the multivariate models developed can be used to observe changes in fecal metabolome composition over the course of the study. Most metabolites in the OFM-I and OFM-A models changed significantly after MTT and have values closer to the TD group after MTT (see Table 4). The average difference between the median of the five metabolites for TD group at Week 0 and the ASD measurements at Week 18 compared to measurements at Week 0 diminished by 88% and 82% for the OFM-I and OFM-A models (see Table 4), so the ASD group became much more similar to the TD group.  Figure 2. PDFs of ASD and TD discriminant scores at Week 0. The probability density function of the FDA score provides a visualization of a model's ability to distinguish between the ASD and TD cohorts. The (a) two-metabolite model has most of its FDA scores highly concentrated near the region where thresholds would be applied. The (b) three-metabolite model is not as highly concentrated, but there is a significant amount of overlap between the scores of the ASD and TD participants, which is visible in both plots. The four (c) and five (d,e) metabolite models better separated the cohorts, with little overlap in the discriminant scores of the ASD and TD groups.
For the 1000 best models with five metabolites, the AUROC ranged from 0.97 to 1.00 which are high values. The reason for using the 1000 best models is that there are not only one or two best models as judged by AUROC alone. Each of these models was subjected to cross validation, with OFM-I/A being derived from those that achieved the highest accuracy. The metabolites ultimately utilized for the development of a five-component model were all found to be in the top quartile of prevalence in the 1000 top models ( Figure 3). Notably, among the top fecal metabolite models, Figure 2. PDFs of ASD and TD discriminant scores at Week 0. The probability density function of the FDA score provides a visualization of a model's ability to distinguish between the ASD and TD cohorts. The (a) two-metabolite model has most of its FDA scores highly concentrated near the region where thresholds would be applied. The (b) three-metabolite model is not as highly concentrated, but there is a significant amount of overlap between the scores of the ASD and TD participants, which is visible in both plots. The four (c) and five (d,e) metabolite models better separated the cohorts, with little overlap in the discriminant scores of the ASD and TD groups.

Correlation Analysis
Correlation analysis was performed on the OFM metabolites as these were the ones that had been identified as being able to distinguish between the ASD and TD cohorts with the highest accuracy after cross-validation. It can be observed that many of the top 50 metabolites (AUROC ≥ 0.66) were significantly correlated with the OFM metabolites (Table 3). In contrast, the individual OFM metabolites for both models had little to no correlation with each other, apart from hydroxyproline with adenosine and 2-hydroxy-3-methylvalerate; these findings were expected since, if individual OFM metabolites were highly correlated with each other, then they would not be useful  The OFM-I/A models were applied to the ASD samples at all distinct time points to assess their accuracy for classifying a sample as belonging to the ASD or TD cohort. The effectiveness of OFM-I/A for classification changed significantly before and after MTT. The type II error rate was initially observed to be 5% for both models, indicating that the ASD and TD distributions are quite distinct, but was observed to rise to 56% eight weeks after MTT was completed (Table 4), thereby indicating that distinguishing between the ASD and TD cohort is not reliably possible after MTT. The PDF curves are shown in Figure 4 to demonstrate the changes in the ASD cohort over time with respect to the values of the FDA score. The distributions indicate that the ASD cohort became more metabolically similar to the TD cohort after treatment, since the curves are shifted towards the TD curve. Notably, the distribution of scores for the ASD cohort become somewhat bimodal at the later time points for both models. The discriminant score for both models decreased substantially as time progressed, indicating that the metabolites of the ASD group were becoming more similar to that of the TD group.

Discussion
Preliminary analysis using univariate methods revealed that none of the individual fecal metabolites achieved a high AUROC value by itself. Interpretations regarding the threshold value needed for an AUROC to be deemed an effective classifier vary by discipline. AUROC values between 0.9-1.0 are desirable for diagnostic tests and are seen to be reflective of excellent  Table 4. Change in the difference between OFM-I/A metabolites measured in the TD and ASD cohort over the course of the study. The discriminant score was calculated by first taking the absolute value of the difference between measurements at each time point and the median of the TD group, then normalizing the difference by the standard deviation of the TD Week 0 measurements, and then adding the normalized absolute difference for each of the five metabolites. The background color distinguished the individual metabolites from the multivariate models.

Discussion
Preliminary analysis using univariate methods revealed that none of the individual fecal metabolites achieved a high AUROC value by itself. Interpretations regarding the threshold value needed for an AUROC to be deemed an effective classifier vary by discipline. AUROC values between 0.9-1.0 are desirable for diagnostic tests and are seen to be reflective of excellent classification [35]. However, this value was not chosen here as the AUROC is employed here as a pre-screening tool for reducing the number of metabolites for classification, and not for determining a metabolite that by itself can distinguish between the two groups. The highest AUROC value for an individual metabolite was 0.77, corresponding to carnitine, indicating that the ASD group is somewhat heterogeneous. In contrast, all optimized multivariate models using three or more elements were able to achieve an AUROC greater than 0.9, highlighting that a multivariate analysis can provide better classification than that which can be determined using univariate analysis alone. Nonetheless, 94% of the top 50 univariate metabolites report lower AUROC (Table 1) eight weeks following MTT, which indicates greater similarity between the ASD and TD measurements after treatment.
Analysis of all possible significant metabolites at Week 0 resulted in the OFM-I and OMF-A models, consisting of five metabolites. Four of these five metabolites were identical between the two models and both achieved AUROC values greater than 0.99. Interestingly, the two metabolites that differed between them, Adenosine and Indole, are associated with different metabolic processes and have no significant correlation. Furthermore, cross-validation revealed that using the OFM-I/A models at the Week 0 timepoint resulted in a 0.95 TPR and 0.94 TNR. Subsequently, there was an overall 94.7% accuracy for correctly classifying an individual into the ASD/TD groups after leave-one-out cross-validation.
Many of the metabolites identified as being differentially expressed between the ASD and TD cohorts have also been previously examined for their relationship to ASD. Specifically, among the top five metabolites ranked by their AUROC value, carnitine, indole and sphingosine have all been found to be differentially expressed in some capacity among individuals with ASD [14,17,18,36]. In a meta-analysis, 10-20% of individuals with ASD were found to have disorders with synthesizing carnitine, which was the metabolite that had achieved the largest AUROC value [36]. Plasma carnitine concentration has also previously been shown to be lower among cohorts with ASD [36]. It should be noted that following MTT the AUROC values of carnitine reduced to 0.68, which indicates that the ASD and TD carnitine distributions were less different after MTT (see Table A1). Sphingolipids such as sphingosine have been found to play an active role in the crosstalk between microbiota and intestinal cells [37]. The significant change in concentration for metabolites such as N-palmitoyl-sphingosine (d18:1/16:0) may have been associated with the changed microbiome composition resulting from MTT (Table 1). Approximately 68% of the variance observed in the fecal metabolome can be explained by the gut microbiome [38], which underscores the potential impact FMT can have on reshaping metabolite concentrations.
Among the metabolites which form part of the OFM-I/A models, theobromine exhibited a significant change between its measurements at Week 0 compared to the Week 18 value for the ASD cohort when a sign ranked test was applied at both timepoints (Table 1). Theobromine is not a microbial metabolite, and its source in fecal samples likely stems from dietary intake and from human metabolism of caffeine [38]. Consequently, this may account for the reason why it was not observed to be correlated with any other metabolite and why the median discriminate score often took the value of the detection limit. However, the metabolization of theobromine is primarily via hepatic demethylation and oxidation, which are processes that have at least been hypothesized to be perturbed in ASD [39,40]. The median concentration for this metabolite was also not found to change following the bowel cleanse measurement (see Table A1). We conducted a secondary analysis of a five-metabolite model without theobromine and found that it results in significantly lower sensitivity and specificity, so including theobromine seems to be important for developing a classification model.
Nonetheless, in the case of all metabolites present in the OFM-I/A models, the average difference between the Week 0 TD measurements and ASD group decreased greatly (82-88%) by the end of the study (Table 4). Hydroxyproline, which is another of the OFM metabolites, has been previously determined to be expressed in significantly higher concentration in the plasma of children with ASD, consistent with the higher levels in feces [41] and in the present study. Indole, which was also one of the OFM metabolites, has been found in higher concentration in fecal samples in children with ASD and other neurodevelopmental conditions [17], consistent with the results of this study, and is an important metabolite for tryptophan metabolism [42]. Thus, the shift to a lower discriminant score following the completion of the treatment is consistent with measurements of ASD fecal metabolites becoming more like those of their TD counterparts following MTT.
The OFM-I/A models in their totality demonstrated similar behavior when contrasting measurements taken at Week 0 and Week 18 of the ASD cohort. This study found that some metabolic changes had begun by Week 3 (after vancomycin, bowel cleanse, and approximately five days of FMT). It is also notable that the distributions of FDA scores within both the five-metabolite models at later timepoints (Week 10 and Week 18) are bimodal. This suggests that some individuals may respond differently to MTT than others. This finding was similar to the analysis performed on plasma metabolites where a steep decline in median discriminant score was also observed at Week 3 and Week 10 [29].
The OFM-I/A metabolites demonstrated limited correlation among themselves. This was to be expected as FDA seeks to maximize the amount of discriminating information with a minimal number of utilized metabolites. For this reason, within this subset of fecal metabolites, those with few correlations tended to appear more frequently in the top 1000 models. Specifically, there was a high proportion of top 1000 models featuring theobromine (28.5%), which was ranked as the fourth most common metabolite present in the models. Notably, this metabolite was not significantly correlated with any of the other top 50 metabolites as discussed above. In total, 44 of the 50 metabolites with the highest AUROC were correlated with the OFM metabolite panel, suggesting that there are at least six common types of metabolic abnormalities associated with ASD. Although adenosine and indole were not found to be correlated, they were included in the OFM-A and OFI-I models, respectively, and both metabolites are related to distinct biological pathways, with adenosine being associated with purine metabolism, while indole is associated with tryptophan metabolism. It was observed in one study that metabolites associated with these two pathways were the most different in urine of ASD and TD children [43].
While there are several metabolic pathways that are related to the top 50 metabolites identified, about 45% of the top metabolites were connected to phenylalanine and tyrosine metabolism, fatty acid metabolism or sphingolipid metabolism. Differences in tyrosine metabolites such as decreased concentration of phenylalanine and increased concentration of p-cresol have been previously observed in studies examining the gut metabolite composition in TD and ASD children [14,44]. The role of the microbiota in this pathway is also very significant. Tyrosine metabolism pathway downregulation was observed in an ASD cohort to be associated with an increased prevalence of Bacteroides vulgatus while upregulation was associated with Eggerthella lenta [44]. Similarly, the relationship between sphingolipid metabolism and microbiome crosstalk has been suggested, and differences in short chain fatty acids of children with ASD and their TD peers have been noted [24,36].
Several metabolites related to mitochondrial metabolism and regulation such as carnitine, betaine, and adenine were also determined to have particularly high AUROC values [35,[45][46][47]. Carnitine serves as the cofactor that transports long-chain fatty acids to the mitochondria matrix, and betaine plays a role in increasing mitochondrial membrane potential [45,47]. There has been considerable investigation into the relationship between ASD and mitochondrial dysfunction. It is estimated that around 4-7% of children with ASD are affected by mitochondrial disease, but it is speculated that up to 80% may have abnormalities in mitochondrial function [48,49].
Prior work has shown similar relationships between ASD fecal metabolite profiles as were observed in this study. GABA, an important neurotransmitter, was one of the metabolites identified as having a lower concentration in the ASD group prior to MTT, which is consistent with prior work [16,17]. Similarly, the fecal concentrations of free carnitine have been previously observed to be higher in children with ASD, which was also observed in this study [14]. The fecal metabolite measurements are consistent with prior work as the average indole measurements for the ASD cohort were more than twice the value of their TD counterparts at Week 0 (see Table A1) [17]. It has also been observed that fecal metabolites associated with glutamate metabolism such as 2-Keto-glutaramic acid and l-Aspartic acid were downregulated in children with ASD [44]. These metabolites were not measured in this study. Nonetheless, one of the metabolites associated with glutamate metabolism, carboxyethyl-GABA, was identified in significantly lower concentration in the ASD + GI cohort at baseline.
While there are indeed some similarities between the analysis of fecal metabolites and prior assessment of plasma samples taken from these participants, there are key distinctions. None of the metabolites identified as being utilized in the optimum multivariate models were previously identified as being significant for classification in the multivariate plasma models. The general performance of the fecal metabolites when subjected to univariate analysis had generally lower AUROC values than plasma metabolites [29]. However, despite having lower AUROC scores, multivariate analysis achieved high accuracy in distinguishing ASD and TD children. That being said, we were able to achieve greater separation using three plasma metabolites than with five fecal metabolites. This may be due to the greater homogeneity in plasma samples vs. stool samples. There have also been far more studies conducted examining plasma metabolite concentrations in individuals with ASD than studies focused on fecal samples [50,51].
Although the models were able to classify between the ASD + GI and TD cohorts with high accuracy, this study also has several limitations. The study focused exclusively on children with ASD with initially moderate to severe GI problems, which were compared to TD children with no GI issues. Therefore, assessments regarding ASD were confounded with GI problems in this analysis. ASD subgroups differentiated by variations in GI abnormalities were ignored in the analysis as subgroups were too small for a robust statistical assessment [52,53]. Furthermore, the study-cohort was not large and the ASD cohort was further split up into two different initial treatments. Future studies with a larger sample size examining cohorts with and without GI symptoms would allow for an assessment on the effectiveness of MTT in ameliorating behavioral symptoms in addition to GI related pathology.

Conclusions
This study investigated differences in fecal metabolites between a group of children diagnosed with ASD and GI symptoms and their typically developing peers with no history of GI symptoms. The univariate analysis demonstrated that individual fecal metabolites had limited potential to distinguish between ASD+GI and TD cohorts, unlike the previous study of plasma; this may be due to greater heterogeneity in stool compared to plasma. However, multivariate statistical analysis resulted in five-metabolite models that had high accuracy even after cross-validation. Both the OFM metabolite panels were shown to be capable of achieving 95% specificity and 94% sensitivity.
Following MTT, 14% of the top 50 metabolites that were found to have the greatest difference in concentration between the TD and ASD group shifted such that their distributions were significantly different eight weeks after the treatment ended. Furthermore, 94% of these metabolites reported lower AUROC following treatment, indicating diminished capacity to distinguish between the ASD and TD group. When considering a normalized average of the metabolites in the OFM models, the difference between the ASD and TD groups decreased by 82-88% at 18 weeks. These findings are similar, although less pronounced, as those determined using plasma metabolites, and both suggest that MTT resulted in shifting the metabolic profile of the ASD group towards becoming more similar to the TD group. Future work should be performed to validate the effect of MTT on fecal metabolites using a larger study cohort and a placebo arm. Appendix A Table A1. Univariate assessment of all metabolites with an AUROC greater than 0.60. The AUROC is provided at Week 0 for the TD and ASD cohort as well as average ultrahigh performance liquid chromatography-tandem mass spectroscopy (UHPLC-MS/MS) signal intensity measurements for both cohorts. The p-value for the significance between the ASD and TD cohorts at Week 0 and between the ASD cohort at Week 0 and Week 18 are also shown. Finally, the results of leave-n-out (n = 1-3) cross-validation are provided.

TD Mean
Week 0

ASD Mean
Week 0 Week 0 vs. Week

ASD p-Value
Week 0 ASD vs.