Article

Gut Microbiota Analysis and In Silico Biomarker Detection of Children with Autism Spectrum Disorder across Cohorts

1 School of Life and Pharmaceutical Sciences, Hainan University, 58 Renmin Avenue, Haikou 570228, China
2 State Key Laboratory of Marine Resource Utilization in South China Sea, Hainan University, 58 Renmin Avenue, Haikou 570228, China
* Author to whom correspondence should be addressed.
Microorganisms 2023, 11(2), 291; https://doi.org/10.3390/microorganisms11020291
Submission received: 14 November 2022 / Revised: 15 January 2023 / Accepted: 17 January 2023 / Published: 22 January 2023
(This article belongs to the Section Gut Microbiota)

Abstract

The study of the human gut microbiota has attracted increasing interest in the life sciences and healthcare. However, the complex and interconnected associations between the gut microbiota and human diseases remain difficult to characterize in a predictive fashion. Artificial intelligence methods such as machine learning (ML) and deep learning can assist in processing and interpreting biological datasets. In this study, we aggregated data from different studies on the species composition and relative abundance of the gut microbiota in children with autism spectrum disorder (ASD) and typically developed (TD) individuals, and analyzed the commonalities and differences of ASD-associated microbiota across cohorts. We built a predictive model using an ML algorithm to explore the diagnostic value of the gut microbiome for children with ASD and to identify potential biomarkers for ASD diagnosis. The Shenzhen cohort achieved an area under the receiver operating characteristic curve (AUROC) of 0.984 with 97% accuracy, whereas the Moscow cohort achieved an AUROC of 0.81 with 67% accuracy. For the two cohorts combined, the average prediction had an AUROC of 0.86 and 80% accuracy. Our cross-cohort analysis suggests that factors such as population characteristics, geographical region, and dietary habits should be taken into consideration in microbiota transplantation or dietary therapy. Collectively, our prediction strategy based on the gut microbiota can strengthen the clinical diagnosis of ASD and help provide a more complete method for assessing the risk of the disorder.

1. Introduction

Autism spectrum disorder (ASD) comprises a large group of heterogeneous neurodevelopmental disorders characterized by symptoms such as clumsiness, repetitive behavior, abnormal social interaction, and difficulties in speech and communication [1]. In 2011, the number of individuals with ASD reached 67 million worldwide. In the United States, 1 in 68 children had ASD in 2014, and the ratio rose to 1 in 45 in 2016 [2]. Similarly, there were more than 10 million individuals with ASD in China in 2016, of whom more than 2 million were children [3]. These statistics indicate that ASD has become a global disorder with increasing prevalence. At present, almost all individuals with ASD need special care and education services, placing a significant burden on their families and on society. In 2014, the lifetime cost of supporting an individual with ASD and intellectual disability was estimated at 2.4 million USD in the United States and 2.2 million USD in the United Kingdom, while the cost for an individual with ASD without intellectual disability was 1.4 million USD in both countries [4]. The estimated monthly cost of caring for an individual with ASD in Chinese families in 2014 was over RMB 5000, not including parental productivity loss [5]. Although ASD has attracted worldwide attention and become a global public health issue, its pathogenesis and mechanisms are still unknown, and there is therefore no complete cure. However, it is generally recognized that the earlier ASD is detected and evidence-based interventions and treatments are started, the better the prognosis. Timely detection and accurate diagnosis of ASD are therefore of great practical significance in helping individuals and their families to live normal lives.
At present, ASD cannot be diagnosed by a single test, such as an oral swab, blood test, or urine test. Current ASD diagnosis is based on symptoms, behaviors, medical history, and social function, and therefore depends heavily on subjective judgment. With developments in biology, many scientists have begun to seek biomarkers to support ASD diagnosis and detection. Biomarker exploration has mostly focused on the brain [6,7,8], the nervous system [9,10,11,12,13], genetic traits [14,15,16,17,18,19,20,21], and the presence of certain metabolites [22,23,24,25,26,27], as it is generally believed that ASD results from genetic [14,15,16,17,18,19,20,28], environmental [29,30], neurological, and immunological factors [31,32]. Recent studies have revealed a close interaction between the host and its gut microbiota, and an imbalance of the gut microbiota can cause ASD symptoms in children [33]. Therefore, differences in the gut microbiota between neurotypical individuals and those with ASD have emerged as a new type of potential biomarker for ASD diagnosis. Although the specific causal relationship and communication mechanism between the gut microbiota and ASD remain uncertain, many studies support the existence of a microbiome–gut–brain axis [34,35,36,37]. It is believed that certain factors derived from the gut microbiota, such as key metabolites and cytokines, may affect the development of the central nervous system and thereby lead to mental disorders such as ASD.
When gut microbial homeostasis is disrupted in individuals with ASD, differences in the diversity and abundance of their microbial communities are observed, as shown in Table 1. Several researchers have attempted to diagnose ASD using certain gut microbiota phenotypes as possible biomarkers and to treat ASD by manipulating the gut microbiota; these studies are also summarized in Table 1. Although these studies confirmed that the gut microbiota of individuals with ASD is abnormal, they reached no consistent conclusion on which microbial species are responsible for ASD. The difficulties relate to several factors, such as small sample sizes, single-cohort designs, and inconsistent analytical models for data mining. Therefore, more refined and effective strategies are needed to assess the risk of ASD and to detect and predict it with biomarker-based methods.
Because the gut microbiota is extremely complex in composition and function, including its variety, quantity, and interactions, traditional statistical analyses often fail to clarify its correlations with human diseases. An increasing number of studies have therefore adopted artificial intelligence, such as ML techniques, to assist in the diagnosis and prediction of human diseases and conditions, such as preterm birth [48], glycemic responses [49], Vibrio cholerae infection [50], alcoholic hepatitis [51], colorectal carcinoma [52], and even biological age [53] and death [54]. These prediction models performed well, showing the substantial potential of artificial intelligence techniques in analyzing the gut microbiota for human health.
For ASD diagnosis, several studies have developed prediction models with satisfactory accuracy. Bosl et al. [8] used ML algorithms to predict autism in infants from standard electroencephalography (EEG), which records the electrical activity of the brain. In a group of 188 infants aged about 9 months, the model achieved an accuracy of over 80% in distinguishing infants at high risk of autism from typically developing infants. Hazlett et al. [7] developed a deep learning algorithm that used brain surface area information from magnetic resonance images acquired at 6 and 12 months of age to predict, at 24 months of age, the status of children at high risk of autism. This method achieved an accuracy of 81% and a sensitivity of 88%, a large improvement in reliability over the traditional behavioral questionnaire, whose accuracy is 50%. However, the prediction was limited to individuals with high familial risk and may not perform as well in general cases. In addition, both models rely on brain monitoring, which may be of limited use for follow-up treatment given the complexity of the brain and our limited understanding of its function. More recently, cross-cohort analyses have emerged as a new way to investigate human diseases. Some such studies have shown that individuals from different racial and ethnic groups have markedly different gut microbiota [47,55,56]. Cross-cohort analyses enable researchers to find similarities and differences in the gut microbiota of individuals with ASD from different regions and of different races, and to shift the gut microbiota toward that of healthy people by changing the patients' diet or by microbiota transplantation, with the aim of treating ASD.
Owing to population heterogeneity, high costs, and the difficulty of data acquisition, most current studies on ASD are confined to single-cohort data with an insufficient number of samples, which makes the clinical implementation of microbiota-based diagnostic tools challenging. This work aims to explore the characteristics of the gut microbiota of individuals with ASD across regions and to look for potential biomarkers that can assist in ASD diagnosis. It integrates and annotates multi-source ASD data from different countries, cohorts, and ethnicities with a unified processing procedure. The commonalities and differences in the gut microbiota composition of individuals with ASD across cohorts are explored to provide a more comprehensive and robust assessment of the correlation between ASD and the gut microbiota. In addition, ML is used to build an ASD prediction model and to identify potential biomarkers in silico, providing a novel method for assessing the risk of ASD. Our results illustrate the role of AI in interpreting the gut microbiome for ASD prediction and suggest that a focus on gut microbiota biomarkers could be helpful in diagnosing ASD in the future.

2. Materials and Methods

2.1. Gut Microbiome Data Acquisition

To obtain comprehensive information on the gut microbiota, we used shotgun metagenomic sequencing data from NCBI. To avoid bias from inconsistent data processing procedures, we started from the raw sequencing data in the Sequence Read Archive (SRA) rather than from microbial classification data published on existing research platforms. Processing the original reads of all datasets in the same way makes the comparison more reproducible. To minimize errors arising from differences in sequencing technology, two independent SRA datasets generated with similar sequencing methods and procedures (PRJNA516054 [28] and PRJEB23052 [24]) were selected. A total of 51 typically developed individuals (TD group) and 73 individuals with a clinical diagnosis of ASD (ASD group) from Moscow, Russia, and Shenzhen, China, were aggregated (Table 2).

2.2. Microbiome Bioinformatics

Because SRA files are binary (non-text) data that cannot be analyzed directly, readable FASTQ files were first extracted from each SRA file. Second, the quality of the sequencing data in the FASTQ files was evaluated. Based on the FastQC [57] quality control reports, we used Trimmomatic [58] for both the Moscow and Shenzhen cohorts to obtain high-quality reads by cutting the adapter sequences TACACTCTTTCCCTACACGACGCTCTTCCGATCT and GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT and filtering out unsatisfactory reads with the parameters “LEADING:3 TRAILING:3 HEADCROP:4 SLIDINGWINDOW:4:15 MINLEN:36”. For the Shenzhen cohort, the adapter sequence GGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAA was also removed. Third, we used Bowtie2 [59] to remove human reads by aligning the reads to the hg38 genome index. MetaPhlAn2 [60] was then used to obtain the microbiome profile of each sample, including the kingdom, phylum, class, order, family, genus, and species, together with the corresponding abundances. All per-sample results were merged using the MetaPhlAn2 script merge_metaphlan_tables.py. To reduce interference from noisy data, two criteria were designed to remove noise species, as described in Section 3.1. The results before and after filtering were visualized using a Venn diagram.
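The processing chain described above can be sketched as a set of command lines. This is a minimal sketch, not the authors' scripts: the run accession, file names, thread count, the Bowtie2 index prefix `hg38`, and the ILLUMINACLIP settings `2:30:10` are assumptions, and the commands are only built as argument lists, not executed. The options should be checked against the installed versions of the SRA Toolkit, FastQC, Trimmomatic, Bowtie2, and MetaPhlAn2.

```python
"""Build per-sample command lines for the read-processing steps (not executed).

Hypothetical choices: file names, thread count, the Bowtie2 index prefix
"hg38", and the ILLUMINACLIP settings 2:30:10.
"""

# Adapter sequences given in the text; the third is clipped only for Shenzhen.
COMMON_ADAPTERS = [
    "TACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT",
]
SHENZHEN_ADAPTER = "GGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAA"

# Trimming steps in the order given in the text.
TRIM_STEPS = ["LEADING:3", "TRAILING:3", "HEADCROP:4", "SLIDINGWINDOW:4:15", "MINLEN:36"]


def adapter_fasta(cohort: str) -> str:
    """FASTA text of the adapters to clip for one cohort."""
    seqs = COMMON_ADAPTERS + ([SHENZHEN_ADAPTER] if cohort == "Shenzhen" else [])
    return "".join(f">adapter{i}\n{s}\n" for i, s in enumerate(seqs, start=1))


def sample_commands(run: str, adapters: str, threads: int = 4) -> list[list[str]]:
    """Argument lists for one paired-end SRA run, in processing order."""
    r1, r2 = f"{run}_1.fastq", f"{run}_2.fastq"
    p1, u1, p2, u2 = (f"{run}_{tag}.fq" for tag in ("1P", "1U", "2P", "2U"))
    return [
        # 1. SRA -> paired FASTQ files
        ["fasterq-dump", "--split-files", run],
        # 2. Quality report
        ["fastqc", r1, r2],
        # 3. Adapter clipping and quality trimming (paired-end mode)
        ["java", "-jar", "trimmomatic.jar", "PE", "-threads", str(threads),
         r1, r2, p1, u1, p2, u2, f"ILLUMINACLIP:{adapters}:2:30:10", *TRIM_STEPS],
        # 4. Keep read pairs that do not align concordantly to the human genome
        ["bowtie2", "-p", str(threads), "-x", "hg38", "-1", p1, "-2", p2,
         "--un-conc", f"{run}_nohuman_%.fq", "-S", "/dev/null"],
        # 5. Species-level taxonomic profile
        ["metaphlan2.py", f"{run}_nohuman_1.fq,{run}_nohuman_2.fq",
         "--input_type", "fastq", "--bowtie2out", f"{run}.bt2.txt",
         "--nproc", str(threads), "-o", f"{run}_profile.txt"],
    ]


def merge_command(runs: list[str]) -> list[str]:
    """Merge per-sample profiles; the script writes the merged table to stdout."""
    return ["merge_metaphlan_tables.py", *(f"{r}_profile.txt" for r in runs)]
```

Each list could be passed to `subprocess.run(cmd, check=True)`; for the merge step, standard output would be redirected to a file. Keeping the steps as data rather than shell strings makes it easy to apply exactly the same parameters to both cohorts.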

2.3. Microbiome Data Analysis

The similarity between microbial samples was assessed by principal coordinate analysis (PCoA) based on Bray–Curtis dissimilarity. Correlations among species were visualized as a co-occurrence network using Gephi version 0.9.2 [61]. The Spearman correlation coefficient between each pair of species was calculated using the corr.test function of the R psych package. Correlations with FDR-corrected p values less than or equal to 0.05 and absolute R values greater than or equal to 0.2 were considered significant. LEfSe analysis [62] was performed to identify differentially abundant species, using the online module provided by the Galaxy platform (http://huttenhower.sph.harvard.edu/galaxy, accessed on 15 July 2021). Differentially abundant species between the ASD and TD groups were tested using the pairwise Wilcoxon rank-sum test followed by linear discriminant analysis (LDA). The threshold for the logarithmic LDA score of discriminative features was set to 3.0, with p = 0.05.
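A minimal Python sketch of Bray–Curtis PCoA is shown below. It assumes a samples × species matrix of relative abundances, computes Bray–Curtis dissimilarities with SciPy, and performs classical PCoA by double-centering the squared distances and taking an eigendecomposition. The text does not state which software was used for PCoA; discarding negative eigenvalues when computing the proportion of variation explained is one common convention and is an assumption here.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def pcoa(abundance: np.ndarray, n_axes: int = 2):
    """Classical PCoA on Bray-Curtis dissimilarities.

    abundance: samples x species matrix of relative abundances.
    Returns (sample coordinates, proportion of variation explained per axis).
    """
    d = squareform(pdist(abundance, metric="braycurtis"))
    n = d.shape[0]
    # Gower double-centering of -0.5 * squared distances
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    eigval, eigvec = np.linalg.eigh(b)
    order = np.argsort(eigval)[::-1]          # largest eigenvalues first
    eigval, eigvec = eigval[order], eigvec[:, order]
    positive = eigval > 1e-12                 # Bray-Curtis can give negative eigenvalues
    explained = eigval[positive] / eigval[positive].sum()
    coords = eigvec[:, :n_axes] * np.sqrt(np.maximum(eigval[:n_axes], 0))
    return coords, explained[:n_axes]
```

The proportions returned for the first two axes correspond to the percentages that PCoA plots usually print on the PC1 and PC2 axis labels.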

2.4. Prediction

Traditional statistical assessments are limited in feature selection, biomarker discovery, and diagnostic detection, so a more intelligent method is needed. ML is a collection of data analysis techniques that learn patterns from multidimensional datasets and build predictive models based on associations between the features of a given dataset. Training an ML model amounts to finding a set of optimal model parameters that convert the features of the input data into accurate label predictions. The random forest (RF) algorithm is commonly considered effective when the number of features greatly exceeds the number of samples. The main ML workflow consists of three steps: processing the input data, learning or training the underlying model, and making predictions on new data. As an ML algorithm, RF follows these three steps:
  • Processing input data. The input data in this paper are the microbial species and their relative abundances in each sample, obtained after species annotation; the “features” are the microbial species and their relative abundances, and the “labels” are the category of each sample (neurotypical individual or individual with ASD).
  • Learning or training the model. This step finds the optimal model parameters by repeating the sub-steps of “parameter estimation”, “model performance evaluation”, and “error identification and correction”.
  • Making predictions. Once the optimal parameters are determined in step 2, the model with these parameters is used to make predictions on new input data.
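The three steps can be sketched with scikit-learn. The text does not name the RF implementation, so `RandomForestClassifier` is used here as a stand-in, with `n_estimators` playing the role of the number of trees and `max_features` the number of species tried at each split. The abundance matrix and labels are synthetic and purely illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Step 1: input data. Rows are samples, columns are species ("features");
# values are relative abundances. Labels: 1 = ASD, 0 = TD. Synthetic data only.
n_samples, n_species = 60, 30
y = rng.integers(0, 2, size=n_samples)
X = rng.dirichlet(np.ones(n_species), size=n_samples)
X[:, 0] += 0.05 * y                   # give one species a weak group signal
X /= X.sum(axis=1, keepdims=True)     # re-normalize to relative abundances

# Step 2: training on the first 50 samples with fixed hyperparameters.
model = RandomForestClassifier(n_estimators=500, max_features="sqrt", random_state=0)
model.fit(X[:50], y[:50])

# Step 3: prediction for samples the model has not seen.
predicted_labels = model.predict(X[50:])
asd_probability = model.predict_proba(X[50:])[:, 1]
```

In the actual workflow the hyperparameters in step 2 are tuned rather than fixed, and the held-out samples come from cross-validation folds rather than a single split.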
We used stratified 10-fold cross-validation to train and test the model, taking into account the imbalance in group sizes. As shown in Figure 1, the TD_Moscow, ASD_Moscow, TD_Shenzhen, and ASD_Shenzhen samples were stratified into 10 folds so that each fold contained approximately the same proportion of each group as the original dataset (a ratio of about 2:3:3:4 based on group size). Each sample was assigned a single fold index from 1 to 10. In each round, the samples with the same index were used as the test set and the remaining samples as the training set. This design ensured that every sample was used in turn as training data in nine folds and as test data in one fold. During training, a grid search was used to choose the optimal value of the feature-entry parameter, trying each value in turn, while the number of trees was kept at its default value of 500. The error rates for all feature-entry values were compared, and the value with the minimum error rate was chosen as optimal. With this feature entry fixed, the number of trees was then set to 100, 200, 300, and so on up to a maximum of 10,000; the minimum error rate was calculated and the corresponding number of trees was chosen as the optimal value. This yielded the trained prediction model. During testing, the true labels of the test data (neurotypical individual or individual with ASD) were first withheld. The features of the test data were then input into the prediction model, which output a predicted label for each sample in the test set.
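One possible reading of the stratification and two-stage grid search is sketched below with scikit-learn. Assumptions: the "feature entry" is interpreted as `max_features` (the number of species tried at each split, `mtry` in R's randomForest), both grids are shortened drastically for illustration, and the abundance matrix is synthetic; only the four group sizes come from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

# Group sizes reported in the text (TD/ASD x Moscow/Shenzhen).
sizes = {"TD_Moscow": 20, "ASD_Moscow": 30, "TD_Shenzhen": 31, "ASD_Shenzhen": 43}
strata = np.repeat(list(sizes), list(sizes.values()))
y = np.array([s.startswith("ASD") for s in strata], dtype=int)  # 1 = ASD

rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(40), size=len(y))  # synthetic relative abundances

# Stratify on the four cohort x group labels, so every fold keeps ~2:3:3:4.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_index = np.empty(len(y), dtype=int)
for k, (_, test_idx) in enumerate(skf.split(X, strata), start=1):
    fold_index[test_idx] = k  # each sample receives exactly one index 1..10


def cv_error(max_features, n_estimators):
    """Misclassification rate pooled over the 10 folds for one setting."""
    errors = 0
    for k in range(1, 11):
        train, test = fold_index != k, fold_index == k
        rf = RandomForestClassifier(n_estimators=n_estimators,
                                    max_features=max_features, random_state=0)
        rf.fit(X[train], y[train])
        errors += int((rf.predict(X[test]) != y[test]).sum())
    return errors / len(y)


# Stage 1: tune max_features with the number of trees fixed at 500.
best_mtry = min([2, 6], key=lambda m: cv_error(m, 500))
# Stage 2: with max_features fixed, tune the number of trees.
best_ntree = min([100, 200], key=lambda t: cv_error(best_mtry, t))
```

Stratifying on the combined cohort × diagnosis label, rather than on diagnosis alone, keeps both cities represented in every fold, which matters when the cohorts differ as strongly as they do here.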
All predicted labels (neurotypical individual or individual with ASD) were compared with the true labels to evaluate the overall performance across all cross-validation folds, which is reported as accuracy and as the AUROC value.
Stratified 10-fold cross-validation performs 10 calculations on different combinations of the same sample data. This method makes full use of the limited sample data and traverses the possible splits, preventing chance results caused by a particular data combination. The AUROC metric summarizes the true-positive rate (TPR) and false-positive rate (FPR) and is insensitive to unequal proportions of the two outcomes [63]. TPR is the probability of correctly identifying an ASD sample among all ASD samples and represents the sensitivity of the model; FPR is the probability of misclassifying a TD sample as an ASD sample and equals 1 − specificity. In general, the larger the AUROC, the better the classification performance. A total of 100 repeats of stratified 10-fold cross-validation were run to reduce the influence of chance on the model. In each repeat, grid search was used to choose the optimal parameters for training. Each repeat output an AUROC value, and the five highest AUROC values and their corresponding models were selected from the 100 repeats. To improve the prediction results, the mean decrease in accuracy (MDA) method (named “importance”) embedded in the RF algorithm was used to calculate the importance of each feature, i.e., the contribution of each microbial species and its relative abundance to the prediction. Uninformative species with an MDA value less than or equal to 0 were removed, and the remaining species and their relative abundances were used as new sample data to repeat the preceding steps. Thus, in each round, the five models with the highest AUROC values and their important species were selected from the 100 runs to start the next round. The iteration was repeated until the AUROC value increased little or not at all (an increase of less than 0.01), and the corresponding model was taken as optimal.
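The relation between TPR, FPR, specificity, and AUROC can be checked numerically. This Python sketch computes TPR and FPR from hard predictions, and computes AUROC as the probability that a randomly chosen ASD sample receives a higher score than a randomly chosen TD sample (ties counted as one half), which equals the area under the ROC curve; scikit-learn's `roc_auc_score` serves only as a cross-check. The labels and scores are made up.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def rates(y_true, y_pred):
    """TPR (sensitivity) and FPR (= 1 - specificity); ASD is the positive class (1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return tp / (tp + fn), fp / (fp + tn)


def auroc_by_ranks(y_true, scores):
    """P(score of a random ASD sample > score of a random TD sample), ties count 1/2."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Toy example: three ASD (1) and three TD (0) samples with predicted ASD scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [int(s >= 0.5) for s in scores]
tpr, fpr = rates(y_true, y_pred)           # tpr = 2/3, fpr = 1/3
auc = auroc_by_ranks(y_true, scores)       # 8 of 9 ASD/TD pairs ordered correctly
```

Accuracy depends on the 0.5 threshold, whereas AUROC uses only the ordering of the scores, which is why the two metrics can diverge on the same predictions.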
This iterative method reduced the accuracy loss caused by limited sample data, reduced the influence of chance on the optimal model, and provided almost unbiased performance estimates. The iterative method is illustrated in Figure 2.
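The feature-elimination loop can be sketched as follows. This is a simplified sketch, not the authors' implementation: scikit-learn's `permutation_importance` on held-out folds (the mean drop in accuracy when one species is shuffled) substitutes for the out-of-bag MDA of R's randomForest, each round uses a single stratified 10-fold pass instead of 100 repeats with the top five models, and 50 trees are used for speed. The data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold


def evaluate(X, y, seed=0, n_splits=10):
    """One stratified k-fold pass: pooled AUROC and fold-averaged permutation importance."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = np.zeros(len(y))
    importance = np.zeros(X.shape[1])
    for train, test in skf.split(X, y):
        rf = RandomForestClassifier(n_estimators=50, random_state=seed).fit(X[train], y[train])
        scores[test] = rf.predict_proba(X[test])[:, 1]
        # Mean drop in held-out accuracy when one species is shuffled (MDA-like).
        pi = permutation_importance(rf, X[test], y[test], n_repeats=3, random_state=seed)
        importance += pi.importances_mean / n_splits
    return roc_auc_score(y, scores), importance


def iterative_selection(X, y, names, tol=0.01, max_rounds=5):
    """Drop species with importance <= 0 until AUROC improves by less than tol."""
    keep = np.arange(X.shape[1])
    best_auc, best_keep = -np.inf, keep
    for _ in range(max_rounds):
        auc, importance = evaluate(X[:, keep], y)
        if auc - best_auc < tol:          # little or no improvement: stop
            break
        best_auc, best_keep = auc, keep
        informative = importance > 0
        if informative.all() or not informative.any():
            break                         # nothing (or everything) would be removed
        keep = keep[informative]
    return best_auc, [names[i] for i in best_keep]


# Synthetic example: 80 samples, 15 species, the first three carry a group signal.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 40)
X = rng.random((80, 15))
X[:, :3] += 0.6 * y[:, None]
names = [f"species_{i}" for i in range(15)]
best_auc, selected = iterative_selection(X, y, names)
```

Returning the last feature set that still improved AUROC by at least the tolerance mirrors the stopping rule in the text; the surviving species play the role of candidate biomarkers.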

3. Results

3.1. Species Composition of Gut Microbiota

Adapter sequences were removed from all acquired SRA data, and the reads were quality-trimmed and filtered using Trimmomatic. After human reads were removed by aligning the reads to the hg38 genome index with Bowtie2, MetaPhlAn2 was used to obtain the microbiome profiles (Supplementary Data S1). As expected, only 30–75% of the species in each sample were clearly identified, because the gut microbiome is complex and diverse and the reference genome databases currently available are far from complete. As shown in Figure 3A and Supplementary Data S2, a total of 749 microbial species were annotated from the microbiome profiles of the Shenzhen and Moscow cohorts, of which 648 species were identified in the Shenzhen cohort and 602 in the Moscow cohort. The Shenzhen cohort therefore showed slightly richer microbial diversity than the Moscow cohort.
In the Shenzhen samples, 555 microbial species were found in the ASD group (ASD_Shenzhen) and 514 in the control group (TD_Shenzhen); 441 species (68.1%) were found in both groups. In the Moscow samples, 544 species were found in the ASD group (ASD_Moscow) and 439 in the control group (TD_Moscow), of which 381 species (63.3%) were found in both groups. This agrees with the current understanding of gut microbiota composition, in which strictly beneficial and strictly harmful bacteria account for only a small proportion of the gut community. The most abundant bacteria are opportunistic pathogens, which are affected by the environment and by other bacteria and easily switch between being beneficial and harmful. Data from both cohorts showed that the ASD group had more microbial species than the control group (Shenzhen: 555 vs. 514; Moscow: 544 vs. 439). However, in both cohorts, the ASD and control groups shared more than 60% of their species. The ASD groups of the two cohorts contained a total of 574 species, of which 425 (74.0%) were common to both; only 130 species were unique to the Shenzhen cohort and 119 to the Moscow cohort. The control groups of the two cohorts contained 588 species, of which 329 (56.0%) were common to both; 177 species were unique to the Shenzhen cohort and 82 to the Moscow cohort. This suggests that the microbes associated with ASD may be largely shared at the species level, even across geographically distant populations.
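The shared-species percentages are consistent with dividing the number of shared species by the number of species in the union of the two groups (for example, 425/574 ≈ 74.0% and 329/588 ≈ 56.0%), which is the convention assumed in this small sketch. The species names are hypothetical placeholders.

```python
def overlap(a: set[str], b: set[str]) -> tuple[int, int, int, float]:
    """Shared, unique-to-a, and unique-to-b counts, plus the shared/union fraction."""
    shared = a & b
    return len(shared), len(a - b), len(b - a), len(shared) / len(a | b)


# Hypothetical species sets for two groups.
group_a = {"sp1", "sp2", "sp3", "sp4"}
group_b = {"sp1", "sp3", "sp5"}
print(overlap(group_a, group_b))  # (2, 2, 1, 0.4)
```

The three counts are exactly the regions of a two-set Venn diagram, so the same function can report the numbers behind such a figure.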
To maintain accuracy, the data were filtered following the method in [48]. Two criteria were designed to remove noise: (1) species with abundance values greater than or equal to 0.01 in less than 5% of the samples, and (2) species with abundance values greater than or equal to 0.001 in less than 15% of the samples. The number of retained species decreased from 749 to 285, of which 256 species were shared by all four subgroups (Figure 3B and Supplementary Data S3).
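The two filtering criteria can be expressed as a short pandas function. The text does not say whether a species is removed when it meets either criterion or only when it meets both, so the sketch exposes this as a hypothetical `either` flag (defaulting to "either"). It also assumes a species × samples table such as the merged MetaPhlAn2 output and applies the thresholds in whatever units that table uses.

```python
import pandas as pd


def filter_noise(table: pd.DataFrame, either: bool = True) -> pd.DataFrame:
    """Drop low-prevalence species from a species x samples abundance table.

    Criterion 1: abundance >= 0.01 in fewer than 5% of samples.
    Criterion 2: abundance >= 0.001 in fewer than 15% of samples.
    either=True drops a species meeting either criterion (an assumption);
    either=False drops only species meeting both.
    """
    frac_high = (table >= 0.01).mean(axis=1)   # fraction of samples above 0.01
    frac_low = (table >= 0.001).mean(axis=1)   # fraction of samples above 0.001
    crit1 = frac_high < 0.05
    crit2 = frac_low < 0.15
    drop = (crit1 | crit2) if either else (crit1 & crit2)
    return table.loc[~drop]
```

Because the two interpretations can retain different species, the choice would change the 285-species count, which is why it is left explicit here.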
Phylum-level analysis was performed after noise filtering. The abundant phyla were Actinobacteria, Synergistetes, Firmicutes, Bacteroidetes, Proteobacteria, Fusobacteria, and Verrucomicrobia (Supplementary Data S4). In both cohorts, the two dominant phyla were Firmicutes and Bacteroidetes. Compared with the TD group, the ASD group showed a higher Firmicutes/Bacteroidetes ratio, in agreement with previous studies [24,42] (Figure 3C). At the genus level, the ten most abundant genera in both the ASD and TD groups were Bacteroides, Faecalibacterium, Bifidobacterium, Eubacterium, Alistipes, Prevotella, Roseburia, Blautia, Ruminococcus, and Shigella (Figure 3D and Supplementary Data S5). In addition, the hierarchical heatmap (Figure 3E) indicated that the genus Intestinimonas was more abundant in the TD group, while Prevotella, Coprococcus, and Sellimonas were more abundant in the ASD group. Notably, our results for Prevotella are inconsistent with those of a previous study [42].
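A phylum-level table and the Firmicutes/Bacteroidetes ratio can be derived from a MetaPhlAn-style merged table, in which each row is a clade string such as `k__Bacteria|p__Firmicutes|...|s__Eubacterium_rectale`. This sketch sums only species-level rows (those ending in `s__`) per phylum, which avoids double-counting the higher-rank rows that MetaPhlAn also reports; the toy table in the usage is invented.

```python
import re

import numpy as np
import pandas as pd


def phylum_abundance(merged: pd.DataFrame) -> pd.DataFrame:
    """Sum species-level rows of a MetaPhlAn-style table by phylum.

    merged: rows indexed by clade strings, columns are samples.
    """
    is_species = np.array([c.split("|")[-1].startswith("s__") for c in merged.index])
    species = merged[is_species]
    phyla = [re.search(r"p__([^|]+)", c).group(1) for c in species.index]
    return species.groupby(np.array(phyla)).sum()


def firmicutes_bacteroidetes_ratio(phyla: pd.DataFrame) -> pd.Series:
    """Per-sample Firmicutes/Bacteroidetes ratio."""
    return phyla.loc["Firmicutes"] / phyla.loc["Bacteroidetes"]
```

A ratio above 1 means that Firmicutes outweigh Bacteroidetes in that sample; group-level comparisons would then average the per-sample ratios within the ASD and TD groups.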
At the species level, the ten most abundant species in the ASD group were Faecalibacterium prausnitzii, Bacteroides vulgatus, Bacteroides uniformis, Prevotella copri, Bifidobacterium pseudocatenulatum, Bacteroides fragilis, Bacteroides dorei, Bacteroides ovatus, Eubacterium rectale, and Alistipes putredinis, while the ten most abundant species in the TD group were Bacteroides vulgatus, Bacteroides uniformis, Faecalibacterium prausnitzii, Bacteroides dorei, Bacteroides fragilis, Bacteroides plebeius, Alistipes putredinis, Escherichia coli, Bifidobacterium longum, and Ruminococcus gnavus. As shown in Figure 3F, six of the ten most abundant species were the same in the ASD and TD groups. These shared species were all highly abundant, and their relative abundances differed only slightly between groups. This again shows that the highly abundant core microbes are common to individuals with and without ASD.

3.2. The ASD Group Was More Heterogeneous than the TD Group

As there was little difference in species diversity between the ASD and TD groups, a principal coordinate analysis (PCoA) based on Bray–Curtis dissimilarity was performed to investigate the variation in the microbial communities of the datasets from the two cities (Figure 4 and Supplementary Data S6). The PCoA showed clear clustering by dataset (Shenzhen samples lay toward the left and Moscow samples toward the right), implying that geographic location was an important factor. In addition, both the Shenzhen and the Moscow cohorts showed a similar tendency: the microbiota composition of the ASD group was more heterogeneous than that of the TD group. Samples from neurotypical individuals were more similar to one another, whereas samples from individuals with ASD were more diverse. These results are similar to those obtained by 16S rRNA gene sequence analysis in [25]. However, the separation between the ASD and TD groups was not obvious in either cohort, and there was no significant difference between ASD and TD samples. Moreover, PC1 and PC2 explained only 11.25% and 8.93% of the variation, respectively, indicating that the differences between the ASD and TD groups were not obvious.

3.3. No Biomarker Was Observed in the Species with Low Abundance

Linear discriminant analysis effect size (LEfSe), a method for discovering high-dimensional biomarkers, was applied (p < 0.05, LDA score > 3). LEfSe revealed a significant increase in the abundance of Bacteroides cellulosilyticus, Eubacterium rectale, Eubacterium_sp_CAG_180, Bacteroides intestinalis, Roseburia faecis, Ruthenibacterium lactatiformans, Firmicutes_bacterium_AM55_24TS, Coprococcus eutactus, Megamonas funiformis, Lachnospira pectinoschiza, Firmicutes_bacterium_CAG_65, Megamonas hypermegale, and Eubacterium_sp_CAG_581 in the ASD group compared with the TD group. Meanwhile, the ASD group showed a significant decrease in the abundance of Veillonella parvula, Bacteroides_coprocola_CAG_162, Eubacterium siraeum, Eubacterium_sp_CAG_248, Eubacterium_sp_CAG_202, Bacteroides stercoris, Bacteroides plebeius, and Bacteroides dorei (Figure 5). LEfSe was also performed on the original 749 species before filtering, and the result was the same as that after denoising. This suggests that the filtering preserved the important information and that no biomarkers were found among the low-abundance species.
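The nonparametric screening stage of this kind of analysis can be illustrated in Python. This sketch is not LEfSe: it only applies SciPy's `mannwhitneyu` (the Wilcoxon rank-sum test) to each species and reports the group with the higher mean abundance, and it does not compute the LDA effect size or apply the LDA score threshold of 3. The input layout (samples × species table plus a boolean ASD indicator) is an assumption, and the example data are toy values.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu


def rank_sum_screen(table: pd.DataFrame, is_asd: np.ndarray, alpha: float = 0.05) -> pd.DataFrame:
    """Species whose abundance differs between ASD and TD (two-sided rank-sum p < alpha).

    table: samples x species relative abundances; is_asd: boolean array per sample.
    """
    rows = []
    for species in table.columns:
        asd = table.loc[is_asd, species]
        td = table.loc[~is_asd, species]
        p = mannwhitneyu(asd, td, alternative="two-sided").pvalue
        if p < alpha:
            # Direction: the group with the higher mean abundance.
            rows.append((species, p, "ASD" if asd.mean() > td.mean() else "TD"))
    return pd.DataFrame(rows, columns=["species", "p", "enriched_in"])
```

In LEfSe, species passing this kind of test are then ranked by their LDA effect size, which is what the LDA score threshold acts on.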

3.4. Correlations in the ASD Group Were More Complex than Those in the TD Group

To explore potential relationships among species within the gut microbial communities, co-occurrence networks of the species in the ASD and TD groups were constructed from significant Spearman correlations (false discovery rate < 0.05, |rho| ≥ 0.2) and visualized using Gephi version 0.9.2. The network of the ASD group (285 nodes, 6997 edges, Supplementary Data S7A–C, Supplementary Figure S1A) was slightly more complex than that of the TD group (285 nodes, 6255 edges, Supplementary Data S8A–C, Supplementary Figure S1B). The proportions of positive correlations (blue curves; 4626, 66.11% in the ASD group; 4290, 68.59% in the TD group) and negative correlations (yellow curves; 2371, 33.89% in the ASD group; 1965, 31.41% in the TD group) were almost the same in both groups, although the absolute numbers differed slightly. In both groups, positive correlations were far more frequent than negative correlations.
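Edge selection for such a network can be sketched in Python. The sketch computes all pairwise Spearman correlations with SciPy, adjusts the p-values with the Benjamini–Hochberg procedure (assumed here to be the FDR correction used; the text does not name the procedure), and keeps pairs with FDR ≤ 0.05 and |rho| ≥ 0.2. It assumes a samples × species array with at least three species, since `spearmanr` returns scalars rather than matrices for two columns; the resulting edge list could then be exported for Gephi.

```python
import numpy as np
from scipy.stats import spearmanr


def bh_adjust(p):
    """Benjamini-Hochberg (FDR) adjusted p-values."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downwards.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.minimum(adjusted, 1.0)
    return out


def cooccurrence_edges(abundance, names, fdr=0.05, min_abs_rho=0.2):
    """Edges (species_i, species_j, rho) with FDR <= fdr and |rho| >= min_abs_rho.

    abundance: samples x species array with at least three species columns.
    """
    rho, p = spearmanr(abundance)            # species-by-species matrices
    iu = np.triu_indices(len(names), k=1)     # each species pair once
    q = bh_adjust(p[iu])
    keep = (q <= fdr) & (np.abs(rho[iu]) >= min_abs_rho)
    return [(names[i], names[j], float(rho[i, j]))
            for i, j, ok in zip(iu[0], iu[1], keep) if ok]
```

Using the absolute value of rho keeps both positive and negative correlations, so the sign of each edge can then be counted to obtain the positive/negative proportions.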

3.5. Prediction Model Based on Random Forest Algorithm

Neither PCoA nor co-occurrence network analysis revealed qualitative differences in the microbial community between the TD and ASD groups. We therefore built a random forest (RF) [64] classifier, which can capture non-linear and multidimensional relationships, to classify samples as ASD or TD from species-level gut microbiota profiles. In this model, a nested iterative procedure was built around stratified 10-fold cross-validation (Figure 1); details are given in Materials and Methods. The Moscow cohort (30 ASD and 20 TD samples) and the Shenzhen cohort (43 ASD and 31 TD samples) were analyzed separately and together. Prediction performance was evaluated using the AUROC metric, and the average AUROC value of the final five models was taken as the prediction result of the RF model. For the Shenzhen cohort alone, the model achieved a high AUROC of 0.984 and an accuracy of 97% after only one round of the 100-repeat run, using an average of 39 of the full set of 749 species as features (Supplementary Data S9). The Moscow cohort gave a poorer average result (AUROC = 0.81, accuracy = 67%) over a total of six rounds of the 100-repeat run; the iterations stabilized in the fourth round, with an optimal average feature set of 41 species (Figure 6A–C and Supplementary Data S10). For the two cohorts combined, the optimal model was obtained in the third round, with an average AUROC of 0.86, an accuracy of 80%, and an average feature set of 67 species (Figure 6D–F and Supplementary Data S11). These results support the idea that the relative abundance of the microbial community can be used to predict the clinical diagnosis of ASD accurately.
We also tested the cross-cohort independence (generalizability) of the models for the Moscow and Shenzhen cohorts: a prediction model was trained on the data from one cohort and validated on the other. The models showed poor independence, with low AUROC values (around 0.5). This is consistent with a previous study on colorectal cancer prediction [65].
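Training on one cohort and testing on the other requires both feature tables to share the same species columns. One way to do this, assumed in the sketch below, is to take the union of species and set species missing from a cohort to zero. The sketch uses scikit-learn and two synthetic cohorts whose ASD signal sits in a different species in each cohort, a hypothetical situation that mimics region-specific markers.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score


def cross_cohort_auroc(train_X, train_y, test_X, test_y, seed=0):
    """Fit on one cohort and report AUROC on the other cohort."""
    # Use the union of species; a species missing from a cohort is set to 0.
    train_X, test_X = train_X.align(test_X, join="outer", axis=1, fill_value=0)
    rf = RandomForestClassifier(n_estimators=300, random_state=seed)
    rf.fit(train_X.to_numpy(), train_y)
    return roc_auc_score(test_y, rf.predict_proba(test_X.to_numpy())[:, 1])


rng = np.random.default_rng(0)


def synthetic_cohort(n, species, signal):
    """Hypothetical cohort whose ASD samples are enriched in one cohort-specific species."""
    y = rng.integers(0, 2, size=n)
    X = pd.DataFrame(rng.random((n, len(species))), columns=species)
    X[signal] += 0.5 * y
    return X, y


X_a, y_a = synthetic_cohort(50, [f"sp{i}" for i in range(10)], "sp0")
X_b, y_b = synthetic_cohort(74, [f"sp{i}" for i in range(2, 12)], "sp5")
auc_a_to_b = cross_cohort_auroc(X_a, y_a, X_b, y_b)
auc_b_to_a = cross_cohort_auroc(X_b, y_b, X_a, y_a)
```

When the discriminative species differ between cohorts, a model trained on one cohort has little to use in the other, so its AUROC on the other cohort tends toward chance level.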

3.6. Potential Biomarkers of ASD Diagnosis

To capture potential biomarkers (i.e., species) that support ASD diagnosis, we used the feature-importance measure of mean decrease in accuracy embedded in the RF algorithm while seeking optimal predictions. To reduce the feature set, the iterative method retained only the important features selected in each round. For the Shenzhen cohort, the five models identified 41, 46, 33, 41, and 33 species as important features, giving an optimal average feature set of 39 species. For the Moscow cohort, the five models identified 48, 34, 46, 38, and 38 species as important features (Table 3), an average of 41 species. For the combined cohorts, the final five optimal models identified 59, 77, 56, 72, and 71 species as important features (Table 4), an optimal average feature set of 67 species. Furthermore, 27, 46, and 67 species appeared in at least three of the five best models for the Shenzhen cohort, the Moscow cohort, and their combination, respectively (Supplementary Data S9–S11). These important species contribute most to the prediction models and are considered potential biomarkers for autism disorders. However, when the potential biomarkers from the Moscow cohort, the Shenzhen cohort, and their combination were compared, only Eubacterium_sp_CAG_248 and Prevotella copri appeared in all three sets (Figure 7 and Supplementary Data S12). This limited overlap of potential biomarkers is consistent with the poor independence of the prediction models reported in Section 3.5. It is probably due to regional differences, which have been reported in previous gut microbiome studies [66]. Therefore, given the complexity of the gut microbiota, potential biomarkers may need to be interpreted as a combination of species rather than as the presence of single species.
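The consensus step (species selected in at least three of the five best models) and the comparison across analyses reduce to counting and set intersection. A minimal Python sketch, with hypothetical species labels:

```python
from collections import Counter


def consensus(feature_sets, min_models=3):
    """Species chosen as important by at least `min_models` of the models."""
    counts = Counter(sp for fs in feature_sets for sp in set(fs))
    return {sp for sp, c in counts.items() if c >= min_models}


def shared_across(*consensus_sets):
    """Species present in every consensus set (e.g., two cohorts and their combination)."""
    return set.intersection(*consensus_sets)


# Hypothetical important-species sets from five models of one analysis.
models = [{"a", "b"}, {"a", "c"}, {"a", "b"}, {"b", "d"}, {"c"}]
print(sorted(consensus(models)))  # ['a', 'b']
```

Requiring agreement among several of the best models filters out species that were selected only by chance in a single run.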

4. Discussion

Metabolic homeostasis maintained by the gut microbiota is important not only for host nutrition and viability but also for human health and disease avoidance. A disturbed gut microbiota is believed to be a cause of many mental disorders, including ASD. We aimed to explore the characteristics of the gut microbiota of individuals with ASD across regions and to look for potential biomarkers that could assist ASD diagnosis. To do so, we aggregated and annotated multi-source human ASD data from different countries, cohorts, and ethnicities using a uniform processing standard, obtaining harmonized data that are comparable across studies. The gut microbiota is highly complex, and in-depth analysis is required to explore its system-level mechanisms. Traditional microbial analysis is not able to distinguish neurotypical individuals from those with ASD. It is, therefore, justified to explore ML technology for the accurate diagnosis of ASD from the gut microbiota.
We established a prediction model using an RF algorithm based on the relative abundance of 285 species from two gut microbiota datasets. The prediction results showed that the RF model retained predictive power even with only a few dozen microbial species as features. The accuracy of our models is comparable to that of other prediction models based on brain monitoring data [7,8]. Our method, however, may be more practical for exploring follow-up treatments, because the gut microbiota is relatively simple to regulate. A marked difference between the two cohorts was also observed (Figure 7), which is consistent with the clear regional clustering found in the PCoA analysis (Figure 4). Other studies listed in Table 1 achieved positive results based on 16S rRNA gene sequencing data. These results further support the view that microbial information could serve as a promising diagnostic tool for ASD. However, because samples and processing methods differ between studies, the consistency of potential biomarkers for autism still leaves room for improvement. This indicates that such findings are dataset-specific and that data should be normalized to a common standard. In our results, Eubacterium_sp_CAG_248 and Prevotella copri were the most likely biomarker candidates, as they were the only species shared by all three sets of potential biomarkers. Prevotella copri is one of the dominant species in the ordinary human gut microbiome [67], and some studies [25,68] have shown that its relative abundance was significantly lower in the ASD group than in the TD group. In addition, our LEfSe analysis and RF prediction suggested that the relative abundance of Eubacterium_sp_CAG_248 differed markedly between the ASD and TD groups. This agrees with previous studies; for example, a recent study [25] indicated that Eubacterium_sp_CAG_248 was associated with ASD.
Recent studies have also reported correlations between Eubacterium spp. and ASD or other mental disorders. Eubacterium spp. sampled from the vagina were reported to be more abundant in individuals with ASD [25]. Beyond ASD, Eubacterium_sp_CAG_248, together with Eubacterium_sp_CAG_28 and Eubacterium_sp_CAG_86, was negatively associated with five phenotypes: colorectal cancer, liver cirrhosis, inflammatory bowel disease, type 2 diabetes, and atherosclerotic cardiovascular disease [69]. Dan et al. [25] also showed that Eubacterium_sp_CAG_38 was positively correlated with hexanoic acid levels. El-Ansary et al. [70] demonstrated that blood levels of acetic, valeric, hexanoic, and stearidonic acids were significantly higher in autistic patients. Another study [71] indicated that Eubacterium_sp_CAG_202 and Eubacterium_sp_CAG_156 were 2 of the 29 species depleted in patients with major depressive disorder. Taken together, these reports support our conclusion that Eubacterium_sp_CAG_248 and Prevotella copri are potential ASD biomarkers, although further investigation is needed. Challenges remain for precise ML-assisted diagnosis of ASD, because the relationship between the gut microbiota and ASD is currently too complex and unclear for accurate analysis. More comprehensive studies are, therefore, required to determine which genes or metabolites drive the gut metabolic mechanisms underlying ASD-associated microbial biomarkers.
To improve prediction ability, we constructed two types of deep learning models. The first was a DNN classifier based on structured data of gut microbiota species and relative abundance. The second was a VGG net based on graphical representations of the species and relative abundance of each sample. Unfortunately, neither model achieved satisfactory results; both produced lower AUROC values than the RF model. This may be because the sample size was insufficient to train these two deep learning models. RF is generally regarded as retaining good accuracy when the number of samples is smaller than the number of features. This result is consistent with several previous findings on predicting other diseases with deep learning models. For example, Yuan's team developed the DeepGene model based on a deep neural network [72] to identify cancer types by learning from the somatic point mutation data of individuals with cancer. Although 3122 samples covering 12 cancer types and 22,834 genes were used, the predicted AUROC value was only 0.6. This also indicates that DNN classifier and VGG net models may need to be supported by massive data; otherwise, models that perform well with limited samples need to be developed. In the future, more samples from different cities need to be added to the prediction model. Given the scarcity of available data, we plan to increase the number of "neurotypical" samples. This plan assumes that the gut microbiota of TD individuals, even from different cohorts, shares similar characteristics that could serve as possible biomarkers. We believe that improving the prediction capabilities of ML-based models is critical for developing new strategies for smart ASD diagnostic assistance.
In conclusion, ASD prediction strategies based on the gut microbiota could be used to assist the diagnosis of ASD and to assess ASD risk. The findings of our work may be useful for the development of novel ASD diagnosis and treatment procedures. The results of our cross-cohort analysis also suggest that various influencing factors, such as population characteristics, geographic regions, and dietary habits, should be taken into consideration. Moreover, biomarker detection may be sensitive to data collection and processing and could therefore become dataset-specific. A large amount of standardized data covering more factors, collected and processed with the same criteria, should therefore be analyzed in silico to explore the potential for clinical practice. The results presented in this study are far from ready for direct use in the actual diagnosis of ASD. Nevertheless, they serve as a starting point to inspire subsequent research and development and provide a paradigm for the study of other human diseases associated with the gut microbiota.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/microorganisms11020291/s1.

Author Contributions

Conceptualization, W.W. and P.F.; methodology, W.W.; software, W.W.; validation, W.W.; formal analysis, W.W.; writing—original draft preparation, W.W.; writing—review and editing, P.F.; visualization, W.W.; supervision, P.F.; funding acquisition, P.F. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Research Start-Up Funds from Hainan University in China (No. KYQD_ZR2017212).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The prediction code was written in R and is available online at https://github.com/dubi77/RF.git, accessed on 10 October 2022.

Acknowledgments

We are sincerely thankful for access to the public data resources PRJEB23052 (https://www.ncbi.nlm.nih.gov/bioproject/PRJEB23052, accessed on 26 December 2019) and PRJNA516054 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA516054, accessed on 15 November 2020).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Battle, D.E. Diagnostic and Statistical Manual of Mental Disorders (DSM). CoDAS 2013, 25, 190–191.
  2. Christensen, D.L.; Braun, K.V.N.; Baio, J.; Bilder, D.; Charles, J.; Constantino, J.N.; Daniels, J.; Durkin, M.S.; Fitzgerald, R.T.; Kurzius-Spencer, M.; et al. Prevalence and Characteristics of Autism Spectrum Disorder among Children Aged 8 Years – Autism and Developmental Disabilities Monitoring Network, 11 Sites, United States, 2012. MMWR Surveill. Summ. 2018, 65, 1–23.
  3. Wucailu Autism Research Institute. Report on the Industry Development of Autism Education and Rehabilitation in China II; Huaxia Publishing House: Beijing, China, 2017. (In Chinese)
  4. Buescher, A.V.S.; Cidav, Z.; Knapp, M.; Mandell, D.S. Costs of Autism Spectrum Disorders in the United Kingdom and the United States. JAMA Pediatr. 2014, 168, 721.
  5. China Association of Persons with Psychiatric Disability and Their Relatives. Blue Papers on Needs of Parents with Autistic Children in China; China Association of Persons with Psychiatric Disability and Their Relatives: Beijing, China, 2014. (In Chinese)
  6. Piven, J.; Arndt, S.; Bailey, J.; Havercamp, S.; Andreasen, N.C.; Palmer, P. An MRI Study of Brain Size in Autism. Am. J. Psychiatry 1995, 152, 1145–1149.
  7. Hazlett, H.C.; Gu, H.; Munsell, B.C.; Kim, S.H.; Styner, M.; Wolff, J.J.; Elison, J.T.; Swanson, M.R.; Zhu, H.; Botteron, K.N.; et al. Early Brain Development in Infants at High Risk for Autism Spectrum Disorder. Nature 2017, 542, 348–351.
  8. Bosl, W.; Tierney, A.; Tager-Flusberg, H.; Nelson, C. EEG Complexity as a Biomarker for Autism Spectrum Disorder Risk. BMC Med. 2011, 9, 18.
  9. Voineagu, I.; Wang, X.; Johnston, P.; Lowe, J.K.; Tian, Y.; Horvath, S.; Mill, J.; Cantor, R.M.; Blencowe, B.J.; Geschwind, D.H. Transcriptomic Analysis of Autistic Brain Reveals Convergent Molecular Pathology. Nature 2011, 474, 380–384.
  10. Cao, W.; Lin, S.; Xia, Q.; Du, Y.; Yang, Q.; Zhang, M.; Lu, Y.; Xu, J.; Duan, S.; Xia, J.; et al. Gamma Oscillation Dysfunction in MPFC Leads to Social Deficits in Neuroligin 3 R451C Knockin Mice. Neuron 2018, 97, 1253–1260.e7.
  11. Russo, F.B.; Freitas, B.C.; Pignatari, G.C.; Fernandes, I.R.; Sebat, J.; Muotri, A.R.; Beltrão-Braga, P.C.B. Modeling the Interplay between Neurons and Astrocytes in Autism Using Human Induced Pluripotent Stem Cells. Biol. Psychiatry 2018, 83, 569–578.
  12. Cai, Y.; Tang, X.; Chen, X.; Li, X.; Wang, Y.; Bao, X.; Wang, L.; Sun, D.; Zhao, J.; Xing, Y.; et al. Liver X Receptor β Regulates the Development of the Dentate Gyrus and Autistic-like Behavior in the Mouse. Proc. Natl. Acad. Sci. USA 2018, 115, E2725–E2733.
  13. Fernandez, A.; Meechan, D.W.; Karpinski, B.A.; Paronett, E.M.; Bryan, C.A.; Rutz, H.L.; Radin, E.A.; Lubin, N.; Bonner, E.R.; Popratiloff, A.; et al. Mitochondrial Dysfunction Leads to Cortical Under-Connectivity and Cognitive Impairment. Neuron 2019, 102, 1127–1142.e3.
  14. Xu, L.-M.; Li, J.-R.; Huang, Y.; Zhao, M.; Tang, X.; Wei, L. AutismKB: An Evidence-Based Knowledgebase of Autism Genetics. Nucleic Acids Res. 2012, 40, D1016–D1022.
  15. Tang, X.; Kim, J.; Zhou, L.; Wengert, E.; Zhang, L.; Wu, Z.; Carromeu, C.; Muotri, A.R.; Marchetto, M.C.N.; Gage, F.H.; et al. KCC2 Rescues Functional Deficits in Human Neurons Derived from Patients with Rett Syndrome. Proc. Natl. Acad. Sci. USA 2016, 113, 751–756.
  16. Xu, X.; Li, C.; Gao, X.; Xia, K.; Guo, H.; Li, Y.; Hao, Z.; Zhang, L.; Gao, D.; Xu, C.; et al. Excessive UBE3A Dosage Impairs Retinoic Acid Signaling and Synaptic Plasticity in Autism Spectrum Disorders. Cell Res. 2018, 28, 48–68.
  17. Brandler, W.M.; Antaki, D.; Gujral, M.; Kleiber, M.L.; Whitney, J.; Maile, M.S.; Hong, O.; Chapman, T.R.; Tan, S.; Tandon, P.; et al. Paternally Inherited Cis-Regulatory Structural Variants Are Associated with Autism. Science 2018, 360, 327–331.
  18. Hannon, E.; Schendel, D.; Ladd-Acosta, C.; Grove, J.; iPSYCH-Broad ASD Group; Hansen, C.S.; Andrews, S.V.; Hougaard, D.M.; Bresnahan, M.; Mors, O.; et al. Elevated Polygenic Burden for Autism Is Associated with Differential DNA Methylation at Birth. Genome Med. 2018, 10, 19.
  19. Satterstrom, F.K.; Walters, R.K.; Singh, T.; Wigdor, E.M.; Lescai, F.; Demontis, D.; Kosmicki, J.A.; Grove, J.; Stevens, C.; Bybjerg-Grauholm, J.; et al. Autism Spectrum Disorder and Attention Deficit Hyperactivity Disorder Have a Similar Burden of Rare Protein-Truncating Variants. Nat. Neurosci. 2019, 22, 1961–1965.
  20. Bachmann, S.O.; Sledziowska, M.; Cross, E.; Kalbassi, S.; Waldron, S.; Chen, F.; Ranson, A.; Baudouin, S.J. Behavioral Training Rescues Motor Deficits in Cyfip1 Haploinsufficiency Mouse Model of Autism Spectrum Disorders. Transl. Psychiatry 2019, 9, 29.
  21. Gazestani, V.H.; Pramparo, T.; Nalabolu, S.; Kellman, B.P.; Murray, S.; Lopez, L.; Pierce, K.; Courchesne, E.; Lewis, N.E. A Perturbed Gene Network Containing PI3K–AKT, RAS–ERK and WNT–β-Catenin Pathways in Leukocytes Is Linked to ASD Genetics and Symptom Severity. Nat. Neurosci. 2019, 22, 1624–1634.
  22. Endo, T.; Shioiri, T.; Kitamura, H.; Kimura, T.; Endo, S.; Masuzawa, N.; Someya, T. Altered Chemical Metabolites in the Amygdala-Hippocampus Region Contribute to Autistic Symptoms of Autism Spectrum Disorders. Biol. Psychiatry 2007, 62, 1030–1037.
  23. Kang, D.-W.; Ilhan, Z.E.; Isern, N.G.; Hoyt, D.W.; Howsmon, D.P.; Shaffer, M.; Lozupone, C.A.; Hahn, J.; Adams, J.B.; Krajmalnik-Brown, R. Differences in Fecal Microbial Metabolites and Microbiota of Children with Autism Spectrum Disorders. Anaerobe 2018, 49, 121–131.
  24. Wang, M.; Wan, J.; Rong, H.; He, F.; Wang, H.; Zhou, J.; Cai, C.; Wang, Y.; Xu, R.; Yin, Z.; et al. Alterations in Gut Glutamate Metabolism Associated with Changes in Gut Microbiota Composition in Children with Autism Spectrum Disorder. mSystems 2019, 4, e00321-18.
  25. Dan, Z.; Mao, X.; Liu, Q.; Guo, M.; Zhuang, Y.; Liu, Z.; Chen, K.; Chen, J.; Xu, R.; Tang, J.; et al. Altered Gut Microbial Profile Is Associated with Abnormal Metabolism Activity of Autism Spectrum Disorder. Gut Microbes 2020, 11, 1246–1267.
  26. Vargason, T.; Roth, E.; Grivas, G.; Ferina, J.; Frye, R.E.; Hahn, J. Classification of Autism Spectrum Disorder from Blood Metabolites: Robustness to the Presence of Co-Occurring Conditions. Res. Autism Spectr. Disord. 2020, 77, 101644.
  27. Al-Ayadhi, L.; Zayed, N.; Bhat, R.S.; Moubayed, N.M.S.; Al-Muammar, M.N.; El-Ansary, A. The Use of Biomarkers Associated with Leaky Gut as a Diagnostic Tool for Early Intervention in Autism Spectrum Disorder: A Systematic Review. Gut Pathog. 2021, 13, 54.
  28. Kovtun, A.S.; Averina, O.V.; Alekseeva, M.G.; Danilenko, V.N. Antibiotic Resistance Genes in the Gut Microbiota of Children with Autistic Spectrum Disorder as Possible Predictors of the Disease. Microb. Drug Resist. 2020, 26, 1307–1320.
  29. Arora, M.; Reichenberg, A.; Willfors, C.; Austin, C.; Gennings, C.; Berggren, S.; Lichtenstein, P.; Anckarsäter, H.; Tammimies, K.; Bölte, S. Fetal and Postnatal Metal Dysregulation in Autism. Nat. Commun. 2017, 8, 15493.
  30. Schmidt, R.J.; Iosif, A.-M.; Guerrero Angel, E.; Ozonoff, S. Association of Maternal Prenatal Vitamin Use With Risk for Autism Spectrum Disorder Recurrence in Young Siblings. JAMA Psychiatry 2019, 76, 391.
  31. Piao, H.H.; Tam, V.T.M.; Na, H.S.; Kim, H.J.; Ryu, P.Y.; Kim, S.Y.; Rhee, J.H.; Choy, H.E.; Kim, S.W.; Hong, Y. Immunological Responses Induced by Asd and Wzy/Asd Mutant Strains of Salmonella Enterica Serovar Typhimurium in BALB/c Mice. J. Microbiol. 2010, 48, 486–495.
  32. Marchezan, J.; Winkler dos Santos, E.G.A.; Deckmann, I.; Riesgo, R. dos S. Immunological Dysfunction in Autism Spectrum Disorder: A Potential Target for Therapy. Neuroimmunomodulation 2018, 25, 300–319.
  33. Dinan, T.G.; Cryan, J.F. Gut Instincts: Microbiota as a Key Regulator of Brain Development, Ageing and Neurodegeneration: Microbiota-Gut-Brain Axis across the Lifespan. J. Physiol. 2017, 595, 489–503.
  34. Wang, Y.; Kasper, L.H. The Role of Microbiome in Central Nervous System Disorders. Brain Behav. Immun. 2014, 38, 1–12.
  35. Carabotti, M.; Scirocco, A.; Maselli, M.A.; Severi, C. The Gut-Brain Axis: Interactions between Enteric Microbiota, Central and Enteric Nervous Systems. Ann. Gastroenterol. 2015, 28, 203–209.
  36. Agustí, A.; García-Pardo, M.P.; López-Almela, I.; Campillo, I.; Maes, M.; Romaní-Pérez, M.; Sanz, Y. Interplay between the Gut-Brain Axis, Obesity and Cognitive Function. Front. Neurosci. 2018, 12, 155.
  37. Alharthi, A.; Alhazmi, S.; Alburae, N.; Bahieldin, A. The Human Gut Microbiome as a Potential Factor in Autism Spectrum Disorder. Int. J. Mol. Sci. 2022, 23, 1363.
  38. Tomova, A.; Husarova, V.; Lakatosova, S.; Bakos, J.; Vlkova, B.; Babinska, K.; Ostatnikova, D. Gastrointestinal Microbiota in Children with Autism in Slovakia. Physiol. Behav. 2015, 138, 179–187.
  39. Ma, B.; Liang, J.; Dai, M.; Wang, J.; Luo, J.; Zhang, Z.; Jing, J. Altered Gut Microbiota in Chinese Children with Autism Spectrum Disorders. Front. Cell. Infect. Microbiol. 2019, 9, 40.
  40. Kang, D.-W.; Park, J.G.; Ilhan, Z.E.; Wallstrom, G.; LaBaer, J.; Adams, J.B.; Krajmalnik-Brown, R. Reduced Incidence of Prevotella and Other Fermenters in Intestinal Microflora of Autistic Children. PLoS ONE 2013, 8, e68322.
  41. Zhang, M.; Ma, W.; Zhang, J.; He, Y.; Wang, J. Analysis of Gut Microbiota Profiles and Microbe-Disease Associations in Children with Autism Spectrum Disorders in China. Sci. Rep. 2018, 8, 13981.
  42. Strati, F.; Cavalieri, D.; Albanese, D.; De Felice, C.; Donati, C.; Hayek, J.; Jousson, O.; Leoncini, S.; Renzi, D.; Calabrò, A.; et al. New Evidences on the Altered Gut Microbiota in Autism Spectrum Disorders. Microbiome 2017, 5, 24.
  43. Hsiao, E.Y.; McBride, S.W.; Hsien, S.; Sharon, G.; Hyde, E.R.; McCue, T.; Codelli, J.A.; Chow, J.; Reisman, S.E.; Petrosino, J.F.; et al. Microbiota Modulate Behavioral and Physiological Abnormalities Associated with Neurodevelopmental Disorders. Cell 2013, 155, 1451–1463.
  44. Grimaldi, R.; Cela, D.; Swann, J.R.; Vulevic, J.; Gibson, G.R.; Tzortzis, G.; Costabile, A. In Vitro Fermentation of B-GOS: Impact on Faecal Bacterial Populations and Metabolic Activity in Autistic and Non-Autistic Children. FEMS Microbiol. Ecol. 2017, 93, fiw233.
  45. Newell, C.; Bomhof, M.R.; Reimer, R.A.; Hittel, D.S.; Rho, J.M.; Shearer, J. Ketogenic Diet Modifies the Gut Microbiota in a Murine Model of Autism Spectrum Disorder. Mol. Autism 2016, 7, 37.
  46. Li, N.; Yang, J.; Zhang, J.; Liang, C.; Wang, Y.; Chen, B.; Zhao, C.; Wang, J.; Zhang, G.; Zhao, D.; et al. Correlation of Gut Microbiome between ASD Children and Mothers and Potential Biomarkers for Risk Assessment. Genom. Proteom. Bioinform. 2019, 17, 26–38.
  47. Xu, Y.; Wang, Y.; Xu, J.; Song, Y.; Liu, B.; Xiong, Z. Leveraging Existing 16SrRNA Microbial Data to Define a Composite Biomarker for Autism Spectrum Disorder. Microbiol. Spectr. 2022, 10, e00331-22.
  48. Fettweis, J.M.; Serrano, M.G.; Brooks, J.P.; Edwards, D.J.; Girerd, P.H.; Parikh, H.I.; Huang, B.; Arodz, T.J.; Edupuganti, L.; Glascock, A.L.; et al. The Vaginal Microbiome and Preterm Birth. Nat. Med. 2019, 25, 1012–1021.
  49. Zeevi, D.; Korem, T.; Zmora, N.; Israeli, D.; Rothschild, D.; Weinberger, A.; Ben-Yacov, O.; Lador, D.; Avnit-Sagi, T.; Lotan-Pompan, M.; et al. Personalized Nutrition by Prediction of Glycemic Responses. Cell 2015, 163, 1079–1094.
  50. Midani, F.S.; Weil, A.A.; Chowdhury, F.; Begum, Y.A.; Khan, A.I.; Debela, M.D.; Durand, H.K.; Reese, A.T.; Nimmagadda, S.N.; Silverman, J.D.; et al. Human Gut Microbiota Predicts Susceptibility to Vibrio Cholerae Infection. J. Infect. Dis. 2018, 218, 645–653.
  51. Smirnova, E.; Puri, P.; Muthiah, M.D.; Daitya, K.; Brown, R.; Chalasani, N.; Liangpunsakul, S.; Shah, V.H.; Gelow, K.; Siddiqui, M.S.; et al. Fecal Microbiome Distinguishes Alcohol Consumption from Alcoholic Hepatitis but Does Not Discriminate Disease Severity. Hepatology 2020, 72, 271–286.
  52. Feng, Q.; Liang, S.; Jia, H.; Stadlmayr, A.; Tang, L.; Lan, Z.; Zhang, D.; Xia, H.; Xu, X.; Jie, Z.; et al. Gut Microbiome Development along the Colorectal Adenoma–Carcinoma Sequence. Nat. Commun. 2015, 6, 6528.
  53. Galkin, F.; Aliper, A.; Putin, E.; Kuznetsov, I.; Gladyshev, V.N.; Zhavoronkov, A. Human Microbiome Aging Clocks Based on Deep Learning and Tandem of Permutation Feature Importance and Accumulated Local Effects. bioRxiv 2018.
  54. Salosensaari, A.; Laitinen, V.; Havulinna, A.; Meric, G.; Cheng, S.; Perola, M.; Valsta, L.; Alfthan, G.; Inouye, M.; Watrous, J.D.; et al. Taxonomic Signatures of Long-Term Mortality Risk in Human Gut Microbiota. medRxiv 2020.
  55. Chong, C.W.; Ahmad, A.F.; Lim, Y.A.L.; Teh, C.S.J.; Yap, I.K.S.; Lee, S.C.; Chin, Y.T.; Loke, P.; Chua, K.H. Effect of Ethnicity and Socioeconomic Variation to the Gut Microbiota Composition among Pre-Adolescent in Malaysia. Sci. Rep. 2015, 5, 13338.
  56. Gupta, V.K.; Paul, S.; Dutta, C. Geography, Ethnicity or Subsistence-Specific Variations in Human Microbiome Composition and Diversity. Front. Microbiol. 2017, 8, 1162.
  57. Andrews, S. FastQC: A Quality Control Tool for High Throughput Sequence Data; Babraham Institute: Babraham, UK, 2010.
  58. Bolger, A.M.; Lohse, M.; Usadel, B. Trimmomatic: A Flexible Trimmer for Illumina Sequence Data. Bioinformatics 2014, 30, 2114–2120.
  59. Langmead, B.; Salzberg, S.L. Fast Gapped-Read Alignment with Bowtie 2. Nat. Methods 2012, 9, 357–359.
  60. Truong, D.T.; Franzosa, E.A.; Tickle, T.L.; Scholz, M.; Weingart, G.; Pasolli, E.; Tett, A.; Huttenhower, C.; Segata, N. Erratum: MetaPhlAn2 for Enhanced Metagenomic Taxonomic Profiling. Nat. Methods 2016, 13, 101.
  61. Bastian, M.; Heymann, S.; Jacomy, M. Gephi: An Open Source Software for Exploring and Manipulating Networks. In Proceedings of the Third International Conference on Weblogs and Social Media, San Jose, CA, USA, 17–20 May 2009; pp. 17–20.
  62. Segata, N.; Izard, J.; Waldron, L.; Gevers, D.; Miropolsky, L.; Garrett, W.S.; Huttenhower, C. Metagenomic Biomarker Discovery and Explanation. Genome Biol. 2011, 12, R60.
  63. Pasolli, E.; Truong, D.T.; Malik, F.; Waldron, L.; Segata, N. Machine Learning Meta-Analysis of Large Metagenomic Datasets: Tools and Biological Insights. PLoS Comput. Biol. 2016, 12, e1004977.
  64. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  65. Thomas, A.M.; Manghi, P.; Asnicar, F.; Yue, F.; Pasolli, E.; Armanini, F.; Zolfo, M.; Beghini, F.; Manara, S.; Karcher, N.; et al. Metagenomic Analysis of Colorectal Cancer Datasets Identifies Cross-cohort Microbial Diagnostic Signatures and a Link with Choline Degradation. Nat. Med. 2019, 25, 667–678.
  66. Sun, Z.; Huang, S.; Zhu, P.; Yue, F.; Zhao, H.; Yang, M.; Niu, Y.; Jing, G.; Su, X.; Li, H.; et al. A Microbiome-Based Index for Assessing Skin Health and Treatment Effects for Atopic Dermatitis in Children. mSystems 2019, 4, e00293-19.
  67. Prasoodanan, P.K.V.; Sharma, A.K.; Mahajan, S.; Dhakan, D.B.; Maji, A.; Scaria, J.; Sharma, V.K. Western and Non-Western Gut Microbiomes Reveal New Roles of Prevotella in Carbohydrate Metabolism and Mouth–Gut Axis. npj Biofilms Microbiomes 2021, 7, 77.
  68. Zhang, M.; Chu, Y.; Meng, Q.; Ding, R.; Shi, X.; Wang, Z.; He, Y.; Zhang, J.; Liu, J.; Zhang, J.; et al. A Quasi-Paired Cohort Strategy Reveals the Impaired Detoxifying Function of Microbes in the Gut of Autistic Children. Sci. Adv. 2020, 6, eaba3760.
  69. Tierney, B.T.; Tan, Y.; Kostic, A.D.; Patel, C.J. Gene-Level Metagenomic Architectures across Diseases Yield High-Resolution Microbiome Diagnostic Indicators. Nat. Commun. 2021, 12, 2907.
  70. El-Ansary, A.K.; Ben Bacha, A.G.; Al-Ayahdi, L.Y. Plasma Fatty Acids as Diagnostic Markers in Autistic Patients from Saudi Arabia. Lipids Health Dis. 2011, 10, 62.
  71. Yang, J.; Zheng, P.; Li, Y.; Wu, J.; Tan, X.; Zhou, J.; Sun, Z.; Chen, X.; Zhang, G.; Zhang, H.; et al. Landscapes of Bacterial and Metabolic Signatures and Their Interaction in Major Depressive Disorders. Sci. Adv. 2020, 6, eaba8555.
  72. Yuan, Y.; Shi, Y.; Li, C.; Kim, J.; Cai, W.; Han, Z.; Feng, D.D. DeepGene: An Advanced Cancer Type Classifier Based on Deep Learning and Somatic Point Mutations. BMC Bioinform. 2016, 17, 476.
Figure 1. Stratified 10-fold cross-validation. All models were evaluated by stratified 10-fold cross-validation with the dataset divided into training and test sets.
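Stratified k-fold splitting, as used to evaluate all models here, can be sketched in plain Python. Each class is shuffled and then dealt round-robin across the folds, so every test fold keeps roughly the ASD/TD proportions of the full dataset. This is an illustrative stand-in, not the authors' R routine, whose splitting details are not described. The function name, the labels "ASD"/"TD", and the seed are assumptions. Calling it with different seeds would give the repeated cross-validation runs described in Figure 2.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_kfold(labels: Sequence[str], k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k (train, test) pairs that preserve class proportions."""
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class stopped,
        # so that fold sizes stay balanced and each class is spread over all folds.
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset += len(indices)

    splits = []
    for fold in folds:
        test = sorted(fold)
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        splits.append((train, test))
    return splits


# Hypothetical cohort of 30 ASD and 20 TD samples.
labels = ["ASD"] * 30 + ["TD"] * 20
for train, test in stratified_kfold(labels, k=10, seed=42)[:2]:
    print(len(train), len(test))  # 45 5
```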
Figure 2. The main iterative processes of our random forest algorithm. (A) Stratified 10-fold cross-validation was conducted with the dataset divided into training and test sets. (B) A total of 100 repeats of stratified 10-fold cross-validation were run, and (C) the five models with the highest AUROC values were selected. (D) After the unimportant species with a mean decrease accuracy less than or equal to 0 were removed, the remaining species were used to run the next round of 100 repeats of stratified 10-fold cross-validation. The iteration was repeated until there was little or no further increase in the AUROC value (increase < 0.01), and the corresponding model was considered optimal.
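Mean decrease accuracy can be illustrated with a model-agnostic permutation-importance sketch. Each feature column is shuffled, and the average drop in accuracy is recorded. Species whose value is less than or equal to 0 would be the ones removed in step (D). This sketch is an approximation chosen for illustration. Breiman's random forest computes the measure tree by tree on out-of-bag samples, which is not reproduced here. The `predict` callable and the toy data are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Predictor = Callable[[Row], int]


def accuracy(predict: Predictor, X: Sequence[Row], y: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def mean_decrease_accuracy(
    predict: Predictor, X: Sequence[Row], y: Sequence[int], n_repeats: int = 10, seed: int = 0
) -> List[float]:
    """Permutation importance: mean accuracy drop after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [list(row[:j]) + [value] + list(row[j + 1:]) for row, value in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical data: feature 0 separates the two classes, feature 1 is ignored by the model.
X = [[float(i % 2), float(i % 3)] for i in range(20)]
y = [i % 2 for i in range(20)]


def predict(row: Row) -> int:
    return int(row[0] > 0.5)


print(mean_decrease_accuracy(predict, X, y))
```

Here feature 1 scores exactly 0, so under the rule in step (D) it would be dropped before the next round.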
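The stopping rule in panel (D) can be sketched as a loop. Each round drops features with a mean decrease accuracy of 0 or less, and the loop stops once the AUROC gain falls below 0.01. The caption does not say which of the two final rounds is kept. This sketch assumes the previous round's model is retained when the gain is too small, and that assumption may differ from the authors' choice. The `evaluate` callable is hypothetical. It stands in for one round of 100 repeated stratified 10-fold cross-validations and returns the mean AUROC together with a per-feature importance map.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: evaluate(features) -> (mean AUROC, {feature: mean decrease accuracy}).
Evaluator = Callable[[List[str]], Tuple[float, Dict[str, float]]]


def iterative_selection(
    features: List[str],
    evaluate: Evaluator,
    min_gain: float = 0.01,
    max_rounds: int = 20,
) -> Tuple[List[str], float, List[Tuple[int, float]]]:
    """Drop features with MDA <= 0 each round until the AUROC gain falls below min_gain."""
    current = list(features)
    best_auroc, importance = evaluate(current)
    history = [(len(current), best_auroc)]
    for _ in range(max_rounds):
        kept = [f for f in current if importance.get(f, 0.0) > 0.0]
        if not kept or len(kept) == len(current):
            break  # nothing left to remove, or every feature would be removed
        auroc, new_importance = evaluate(kept)
        if auroc - best_auroc < min_gain:
            break  # little or no improvement: keep the previous round (assumption)
        current, best_auroc, importance = kept, auroc, new_importance
        history.append((len(current), best_auroc))
    return current, best_auroc, history


# Toy evaluator with fixed, made-up importances and AUROC values.
TOY_MDA = {"a": 0.5, "b": 0.2, "c": 0.0, "d": -0.1, "e": 0.3}


def toy_evaluate(feats: List[str]) -> Tuple[float, Dict[str, float]]:
    auroc = {5: 0.80, 3: 0.85}[len(feats)]
    return auroc, {f: TOY_MDA[f] for f in feats}


selected, auroc, history = iterative_selection(list("abcde"), toy_evaluate)
print(selected, auroc)  # ['a', 'b', 'e'] 0.85
```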
Figure 3. Microbiome profiles of ASD and TD groups. (A) A total of 749 gut microbial species were included in our analysis. (B) After data reduction to exclude noisy and low-information species, 285 species remained. (C) The most abundant phyla in the ASD and TD groups. (D) The top ten most abundant genera in the ASD and TD groups. (E) Hierarchical heatmap of the genera that were more abundant in the TD group and those that were more abundant in the ASD group. (F) The top ten species by relative abundance in the ASD and TD groups.
Figure 4. PCoA based on Bray–Curtis dissimilarity. The results showed clear clustering by dataset and, in both datasets, a similar tendency for the gut microbiota composition of the ASD group to be more heterogeneous than that of the TD group.
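The Bray–Curtis dissimilarity underlying this PCoA can be computed directly from relative-abundance profiles. For two samples it is the sum of absolute abundance differences divided by the sum of all abundances, so 0 means identical profiles and 1 means no shared species. The minimal Python sketch below builds the pairwise matrix that an ordination would take as input. The sample profiles are hypothetical. The ordination step itself, such as classical scaling with R's `cmdscale` or scikit-bio's `pcoa`, is not shown.

```python
from itertools import combinations
from typing import List, Sequence


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two non-negative abundance profiles."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator > 0 else 0.0


def distance_matrix(profiles: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric pairwise Bray-Curtis matrix with zeros on the diagonal."""
    n = len(profiles)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = bray_curtis(profiles[i], profiles[j])
    return matrix


# Hypothetical relative abundances of three species in three samples.
samples = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5], [0.0, 0.0, 1.0]]
for row in distance_matrix(samples):
    print([round(d, 3) for d in row])
```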
Figure 5. LEfSe analysis. The bar chart (left) shows the significantly different species in TD and ASD groups, and the circle chart (right) shows the taxonomic rank of the different species from phylum to genus.
Figure 6. The results of the prediction model. The Moscow cohort alone (A–C) had a lower average result of AUROC = 0.81 and accuracy = 0.67 after a total of four rounds of 100 runs, and the optimal average feature set was 41 species. The Shenzhen cohort alone achieved a high AUROC of 0.984 and an accuracy of 0.968 after just one round of 100 runs, and the optimal model used an average of 39 of the full set of 285 species as features. For the combination of the two cohorts (D–F), the optimal model was reached in the third round, with an average prediction result of AUROC = 0.86, accuracy = 0.80, and an average feature set of 67 species.
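Each result in Figure 6 is an AUROC and an accuracy averaged over 100 runs. The classifier itself is not described in this caption, so the sketch below covers only the evaluation step. Given true labels (assumed coding: 1 = ASD, 0 = TD) and predicted ASD probabilities from each run, it computes AUROC as the probability that a randomly chosen ASD sample scores higher than a randomly chosen TD sample (ties count one half), accuracy at an assumed 0.5 threshold, and the mean of both across runs. All function names are hypothetical.

```python
import numpy as np


def auroc(y_true, scores):
    """Probability that a random positive (ASD) outscores a random negative (TD)."""
    y, s = np.asarray(y_true), np.asarray(scores, dtype=float)
    pos, neg = s[y == 1], s[y == 0]
    diff = pos[:, None] - neg[None, :]  # every positive-negative pair
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


def accuracy(y_true, scores, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    y = np.asarray(y_true)
    pred = (np.asarray(scores, dtype=float) >= threshold).astype(int)
    return float((pred == y).mean())


def summarize_runs(runs):
    """Mean AUROC and mean accuracy over repeated runs of (labels, scores)."""
    aucs = [auroc(y, s) for y, s in runs]
    accs = [accuracy(y, s) for y, s in runs]
    return float(np.mean(aucs)), float(np.mean(accs))


runs = [([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]), ([1, 0], [0.8, 0.2])]
print(summarize_runs(runs))  # (0.875, 0.75)
```

The pairwise form makes the meaning of AUROC explicit; for large test sets, a rank-based formula gives the same value more efficiently.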
Figure 7. Potential biomarkers. Only two species, Eubacterium_sp_CAG_248 and Prevotella copri, were present across the Moscow cohort, the Shenzhen cohort, and the two cohorts combined; they were therefore considered potential biomarkers.
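The selection rule in Figure 7 (keep only species that appear in every feature set) is a set intersection. In the sketch below, only the two species named in the caption come from the study; the other entries (`species_a`, `species_b`, `species_c`) are hypothetical placeholders standing in for the much longer feature lists.

```python
def shared_features(*feature_sets):
    """Return the features present in every input set, sorted alphabetically."""
    common = set(feature_sets[0])
    for features in feature_sets[1:]:
        common &= set(features)  # keep only features also found in this set
    return sorted(common)


# Hypothetical feature lists; only the two named species come from Figure 7.
moscow = ["Eubacterium_sp_CAG_248", "Prevotella copri", "species_a"]
shenzhen = ["Prevotella copri", "species_b", "Eubacterium_sp_CAG_248"]
combined = ["species_a", "Eubacterium_sp_CAG_248", "species_c", "Prevotella copri"]
print(shared_features(moscow, shenzhen, combined))
# ['Eubacterium_sp_CAG_248', 'Prevotella copri']
```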
Table 1. Findings of previous studies. Differences in gut microbiota between individuals with autism spectrum disorder (ASD group) and typically developed individuals (TD group).
| Model | Number of samples | Country or region | Sequencing method | Manifestation of species disorder (ASD) | Reference |
|---|---|---|---|---|---|
| children | TD: 10; ASD: 10 | Slovakia | Real-time PCR | phyla Bacteroidetes/Firmicutes: ↓ *; Lactobacillus: ↑; Bifidobacterium/Lactobacillus, Streptococcus thermophilus, the total bacteria content: - | [38] |
| children | TD: 45; ASD: 45 | China | 16S rRNA V3-V4 | at phylum level: -; genera Lachnoclostridium, Tyzzerella subgroup 4, Flavonifractor, unidentified_Lachnospiraceae: ↓ | [39] |
| children | TD: 20; ASD: 20 | - | 16S rRNA V2-V3 | genera Prevotella, Coprococcus, and unclassified_Veillonellaceae: ↓ | [40] |
| children | TD: 35; ASD: 6 | China | 16S rRNA V3-V4 | phyla Bacteroidetes/Firmicutes; genera Sutterella, Odoribacter, and Butyricimonas: ↑; genera Veillonella and Streptococcus: ↓ | [41] |
| children | TD: 40; ASD: 40 | Italy | 16S rRNA V3-V5 | phylum Bacteroidetes; genera Alistipes, Bilophila, Dialister, Parabacteroides, Veillonella: ↓; phyla Firmicutes/Bacteroidetes; genera Collinsella, Corynebacterium, Dorea, and Lactobacillus; Escherichia/Shigella and Clostridium cluster XVII; fungal genus Candida: ↑ | [42] |
| mice | TD: 10; ASD: 10 | USA | 16S rRNA V3-V5 | classes Bacteroidia, Clostridia: ↑ | [43] |
| children | TD: 3; ASD: 3 | UK | FISH-FCM | Clostridium spp.: ↑; bifidobacteria: ↓ | [44] |
| children | TD: 20; ASD: 18 | USA | 16S rRNA V4 | genera Bifidobacterium, Desulfovibrio: ↓ | [23] |
| mice | TD: 21; ASD: 25 | Canada | qRT-PCR | phylum Firmicutes: ↓; phylum Bacteroidetes: ↑ | [45] |
| children and mothers | TD: 30; ASD: 59 | China | 16S rRNA V1-V2 | Children: phylum Proteobacteria: ↑; genera Enhydrobacter, Chryseobacterium, Streptococcus, and Acinetobacter: ↑; species Acinetobacter rhizosphaerae, Acinetobacter johnsonii, Prevotella melaninogenica: ↓. Mothers: families Moraxellaceae and Enterobacteriaceae, genus Faecalibacterium: ↓ | [46] |
| minors | TD: 450; ASD: 569 | China, Ecuador, Italy, Korea | 16S rRNA V3-V4, V4, V4-V5 | Results varied with the analysis methods and parameter settings used. | [47] |
| children | TD: 31; ASD: 43 | China | Shotgun metagenomic sequencing | phylum Actinobacteria: ↑; three Clostridium taxa, two Eggerthella taxa, two Klebsiella taxa: ↑; taxa Bacteroides vulgatus, Betaproteobacteria, Campylobacter jejuni subsp. jejuni 81–176, Campylobacter jejuni subsp. jejuni ICDCCJ07001, Candidatus Chloracidobacterium thermophilum B, Coraliomargarita akajimensis DSM 45221, Proteus mirabilis HI4320, and Spirochaeta thermophila DSM 6192: ↓ | [24] |
| children | TD: 20; ASD: 30 | Moscow | Shotgun metagenomic sequencing | species Enterococcus faecium, Megasphaera elsdenii, Bacteroides fragilis: | [28] |

*: ↓: significant decrease; ↑: significant increase; -: no significant change.
Table 2. Description of sample data.
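| Characteristic | Moscow cohort | Shenzhen cohort |
|---|---|---|
| Subjects with ASD (n) | 30 | 43 |
| Subjects with TD (n) | 20 | 31 |
| Age (years) | 3–5 | 2–7 |
| Sequencing instrument | Illumina NovaSeq 6000 | Illumina HiSeq 4000 |
| Library layout | PAIRED | PAIRED |
| Average spot length (AvgSpotLen, bp) | 300 | 300 |
| Bytes (Gb) | 1.92–4.08 | 0.526–4.09 |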
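The subject counts in Table 2 also fix the class balance against which the accuracies in Figure 6 can be read: ASD is the larger class in both cohorts, so a trivial classifier that always predicts ASD would already be correct for the ASD share of samples. The short calculation below works this out. The pooled "Combined" row assumes the combined analysis used every subject from both cohorts, which this table does not state explicitly, and `class_balance` is a hypothetical helper.

```python
def class_balance(n_asd, n_td):
    """Sample size, ASD share, and accuracy of always predicting the larger class."""
    n = n_asd + n_td
    return n, n_asd / n, max(n_asd, n_td) / n


# Subject counts from Table 2; "Combined" simply pools the two cohorts.
cohorts = {"Moscow": (30, 20), "Shenzhen": (43, 31)}
cohorts["Combined"] = (30 + 43, 20 + 31)

for name, (n_asd, n_td) in cohorts.items():
    n, asd_share, baseline = class_balance(n_asd, n_td)
    print(f"{name}: n = {n}, ASD share = {asd_share:.3f}, "
          f"majority-class accuracy = {baseline:.3f}")
# Moscow: n = 50, ASD share = 0.600, majority-class accuracy = 0.600
# Shenzhen: n = 74, ASD share = 0.581, majority-class accuracy = 0.581
# Combined: n = 124, ASD share = 0.589, majority-class accuracy = 0.589
```

AUROC does not depend on this baseline in the same way, which is one reason it is usually reported alongside accuracy.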
Table 3. Number of species selected as important features by each of the five optimal prediction models at each iteration for the Moscow cohort. The iterations stabilized in the fourth round.
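| Iteration | Model A | Model B | Model C | Model D | Model E | Average |
|---|---|---|---|---|---|---|
| 1st | 136 | 88 | 111 | 109 | 140 | 117 |
| 2nd | 78 | 49 | 68 | 65 | 90 | 70 |
| 3rd | 60 | 38 | 54 | 44 | 60 | 51 |
| 4th | 48 | 34 | 46 | 38 | 38 | 41 |
| 5th | 43 | 28 | 43 | 36 | 35 | 37 |
| 6th | 36 | 24 | 34 | 35 | 30 | 32 |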
Table 4. Number of species selected as important features by each of the five optimal prediction models at each iteration for the combined Moscow and Shenzhen cohorts. The iterations stabilized in the third round.
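| Iteration | Model A | Model B | Model C | Model D | Model E | Average |
|---|---|---|---|---|---|---|
| 1st | 112 | 125 | 110 | 116 | 109 | 114 |
| 2nd | 93 | 96 | 73 | 80 | 84 | 85 |
| 3rd | 59 | 77 | 56 | 72 | 71 | 67 |
| 4th | 51 | 61 | 50 | 62 | 53 | 55 |
| 5th | 48 | 56 | 45 | 58 | 51 | 52 |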
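The Average column in Tables 3 and 4 is the mean number of selected species across the five models, rounded to a whole number. The sketch below recomputes it from the table values and adds the relative drop from one iteration to the next, which makes the slowing reduction easier to see. The authors' stabilization criterion is not stated in these captions, so the drop is descriptive only and is not meant to reproduce their stopping rule; `summarize` is a hypothetical helper.

```python
# Species counts per model (A-E) at each iteration, copied from Tables 3 and 4.
moscow = [[136, 88, 111, 109, 140], [78, 49, 68, 65, 90], [60, 38, 54, 44, 60],
          [48, 34, 46, 38, 38], [43, 28, 43, 36, 35], [36, 24, 34, 35, 30]]
combined = [[112, 125, 110, 116, 109], [93, 96, 73, 80, 84], [59, 77, 56, 72, 71],
            [51, 61, 50, 62, 53], [48, 56, 45, 58, 51]]


def summarize(table):
    """Per-iteration rounded mean feature count and its relative drop from the previous iteration."""
    rows, previous = [], None
    for iteration, counts in enumerate(table, start=1):
        mean = sum(counts) / len(counts)
        # The drop uses unrounded means; the first iteration has no predecessor.
        drop = None if previous is None else (previous - mean) / previous
        rows.append((iteration, round(mean), drop))
        previous = mean
    return rows


for name, table in [("Moscow", moscow), ("Moscow + Shenzhen", combined)]:
    for iteration, mean, drop in summarize(table):
        note = "" if drop is None else f", {drop:.1%} fewer than previous"
        print(f"{name}, iteration {iteration}: average {mean} species{note}")
```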

Share and Cite

MDPI and ACS Style

Wang, W.; Fu, P. Gut Microbiota Analysis and In Silico Biomarker Detection of Children with Autism Spectrum Disorder across Cohorts. Microorganisms 2023, 11, 291. https://doi.org/10.3390/microorganisms11020291

Chicago/Turabian Style

Wang, Wenjuan, and Pengcheng Fu. 2023. "Gut Microbiota Analysis and In Silico Biomarker Detection of Children with Autism Spectrum Disorder across Cohorts" Microorganisms 11, no. 2: 291. https://doi.org/10.3390/microorganisms11020291
