Usefulness of Machine Learning-Based Gut Microbiome Analysis for Identifying Patients with Irritable Bowel Syndrome

Irritable bowel syndrome (IBS) is diagnosed by subjective clinical symptoms. We aimed to establish an objective IBS prediction model based on gut microbiome analyses employing machine learning. We collected fecal samples and clinical data from 85 adult patients who met the Rome III criteria for IBS, as well as from 26 healthy controls. The fecal gut microbiome profiles were analyzed by 16S ribosomal RNA sequencing, and the determination of short-chain fatty acids was performed by gas chromatography–mass spectrometry. The IBS prediction model based on gut microbiome data after machine learning was validated for its consistency for clinical diagnosis. The fecal microbiome alpha-diversity indices were significantly smaller in the IBS group than in the healthy controls. The amount of propionic acid and the difference between butyric acid and valerate were significantly higher in the IBS group than in the healthy controls (p < 0.05). Using LASSO logistic regression, we extracted a featured group of bacteria to distinguish IBS patients from healthy controls. Using the data for these featured bacteria, we established a prediction model for identifying IBS patients by machine learning (sensitivity >80%; specificity >90%). Gut microbiome analysis using machine learning is useful for identifying patients with IBS.


Introduction
Irritable bowel syndrome (IBS) is currently accepted to be a functional gastrointestinal disorder characterized by symptoms such as abdominal pain or discomfort, bloating, and stool irregularities without any structural or organic lesions [1]. Many factors (visceral sensitivity, bowel motility, mucosal immunity, psychological stress, etc.) are involved in the pathophysiology of IBS [2], making it difficult to clarify the pathophysiological mechanism, diagnosis, and treatments for IBS. With regard to diagnosis, although some objective molecular markers based on blood, stool, and intestinal tissue sampling have been proposed, no valid biomarkers of IBS have yet been established [3]. In this context, the Rome Foundation have developed symptom-based criteria for diagnosing and distinguishing the clinical types of IBS, and the Rome Criteria are now used as a global standard.
The Rome III criteria, revised into Rome IV in 2016 [4], have been used for the diagnosis of IBS since 2006, and numerous data based on the Rome III criteria have been accumulated over the last decade. Ford et al. reported that the sensitivity and specificity of the Rome III criteria for the diagnosis of IBS are 68.8% and 79.5%, respectively [5,6]. This suggests that the Rome III criteria have room for further improvement, and, in fact, it is possible to categorize inflammatory bowel disease (IBD) or celiac disease into IBS on the basis of the Rome III criteria [7,8]. Thus, there is still a need for objective biomarkers with improved diagnostic accuracy for IBS, which can help identify individuals who will develop IBS in the future. Recent studies have strongly suggested that the gut microbiome may play a pivotal role in the pathophysiology of IBS [9,10]. In this context, the possibility that the intestinal microbiota signature might be a candidate biomarker for evaluating the severity of IBS symptoms has been suggested by a European group [11]. However, further detailed studies are needed to clarify whether the microbiota signature could be useful for diagnosis, group typing, and evaluation of both clinical severity and response to therapy. On the other hand, it is well known that the gut microbiota profile differs among human racial groups [12]. Therefore, in the present study, we investigated the gut microbiota profile and associated short-chain fatty acids in Japanese IBS patients and healthy controls, its relationship to clinical data and molecular samples, and its possible usefulness as a biomarker in clinical subsets of IBS.

Study Design and Participants
Eighty-five adult patients, aged 20-65 years, who fulfilled the Rome III criteria for IBS were recruited prospectively at secondary/tertiary care outpatient clinics (Matsuda Hospital, JCHO Tokyo Shinjuku Medical Center, and Hyogo College of Medicine in Japan) between February 2017 and February 2018. A healthy control group of 26 individuals was also recruited by advertisement and checked by interview and a questionnaire to exclude any chronic diseases and/or current gastrointestinal symptoms. All subjects provided written informed consent to participate after receiving verbal and written information about the study. All of the procedures complied with the principles of the Declaration of Helsinki and were approved by the Ethical Review Board at Matsuda Hospital (IRB No. H29-2), JCHO Tokyo Shinjuku Medical Center (IRB No. 2016-04) and Hyogo College of Medicine (IRB No. 2700).
Demographic information and body mass index were collected from all subjects. They were also asked to complete a questionnaire designed to obtain information about medical and medication history. For IBS patients, the Bristol stool form scale score and the characteristics and frequency of gastrointestinal symptoms were recorded [13]. Classification into IBS subtypes according to the Rome III criteria was performed based on the Bristol Stool Form scale characteristics: IBS with constipation (IBS-C), IBS with diarrhea (IBS-D), mixed IBS (IBS-M), or unsubtyped IBS (IBS-U) [1]. Exclusion criteria for all subjects included (i) use of antibiotics or antacids within one month before inclusion, (ii) having a major psychiatric disorder or use of psychotropic medication within one month before inclusion, and (iii) habitual use of tobacco or alcohol.

Fecal Sampling, DNA Extraction, and Sequencing
Fecal samples were collected using a brush-type collection kit containing a guanidine thiocyanate solution (Techno Suruga Laboratory, Shizuoka, Japan) and stored at 4 °C. DNA was extracted from fecal samples using an automated DNA extraction machine (GENE PREP STAR PI-480, Kurabo Industries Ltd., Osaka, Japan) according to the manufacturer's standard protocol. The 16S ribosomal RNA (rRNA) regions (V1-V2) were amplified using a forward primer (16S_27Fmod: TCG TCG GCA GCG TCA GAT GTG TAT AAG AGA CAG AGR GTT TGA TYM TGG CTC AG) and reverse primer (16S_338R: GTC TCG TGG GCT CGG AGA TGT GTA TAA GAG ACA GTG CTG CCT CCC GTA GGA GT) with KAPA HiFi Hot Start Ready Mix (Kapa Biosystems, Wilmington, MA, USA). To sequence 16S amplicons by the Illumina MiSeq platform, dual index adapters were attached using the Nextera XT Index kit (Illumina, San Diego, CA, USA). Each library was diluted to 5 ng/µL, and equal volumes were mixed to 4 nM. The DNA concentration of the mixed libraries was quantified by qPCR with KAPA SYBR FAST qPCR Master mix (KK4601, KAPA Biosystems, Wilmington, MA, USA) using primer 1 (AAT GAT ACG GCG ACC ACC) and primer 2 (CAA GCA GAA GAC GGC ATA CGA). The library preparations were carried out according to the 16S library preparation protocol of Illumina (Illumina, San Diego, CA, USA). Libraries were sequenced using the MiSeq Reagent Kit v2 (500 Cycles) for 250-bp paired-end reads (Illumina, San Diego, CA, USA). Sequence files are available from the NCBI Sequence Read Archive [14].
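The forward primer contains the degenerate bases R, Y, and M, so it denotes a small pool of concrete oligonucleotides rather than a single sequence. The following is a minimal sketch, assuming the standard IUPAC nucleotide codes, that enumerates this pool; the function name `expand_degenerate` is hypothetical and is not part of the study's protocol.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """Return every concrete sequence encoded by a degenerate primer.

    Hypothetical helper for illustration; spaces are treated as visual
    grouping only and are removed before expansion.
    """
    seq = primer.replace(" ", "").upper()
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]


# Forward primer 16S_27Fmod exactly as printed in the text.
FORWARD = "TCG TCG GCA GCG TCA GAT GTG TAT AAG AGA CAG AGR GTT TGA TYM TGG CTC AG"

variants = expand_degenerate(FORWARD)
for v in variants:
    print(v)
```

Each of the three degenerate positions has two possible bases, so the pool is small enough to list in full.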

Taxonomy Assignment Based on the 16S rRNA Gene Sequence
The paired-end reads of partial 16S rRNA gene sequences were clustered by 97% nucleotide identity, and then assigned taxonomic information using the Greengenes database (v13.8) [15] through the QIIME pipeline (v1.8.0) [16]. The steps for data processing and assignment based on the QIIME pipeline were as follows: (i) joining paired-end reads; (ii) quality filtering with an accuracy of Q30 (>99.9%) and a read length of >300 bp (the number of reads per sample before and after quality filtering is listed in Supplementary Data S1); (iii) random extracting of 10,000 reads per sample for subsequent analysis; (iv) clustering of operational taxonomic units (OTUs) with 97% identity by UCLUST (v1.2.22q) [17] (all the relative abundance values for each OTU and sample are listed in Supplementary Data S2); (v) assigning of taxonomic information to each OTU using the Ribosomal Database Project (RDP) classifier [18] with the full-length 16S gene data of Greengenes (v13.8) to determine the identity and composition of the bacterial genera.
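Steps (ii) and (iii) rest on two simple ideas: a Phred score Q corresponds to a per-base error probability of 10^(-Q/10), and subsampling every sample to the same depth makes samples comparable. The sketch below illustrates both in plain Python. It is not the QIIME implementation, and the names `phred_to_accuracy` and `subsample_reads` are hypothetical.

```python
import random


def phred_to_accuracy(q):
    """Convert a Phred quality score to base-call accuracy: 1 - 10^(-Q/10)."""
    return 1.0 - 10.0 ** (-q / 10.0)


def subsample_reads(reads, depth=10_000, seed=0):
    """Randomly draw `depth` reads without replacement from one sample.

    Illustrative stand-in for step (iii); a fixed seed keeps the draw
    reproducible. Samples with too few reads are rejected.
    """
    if len(reads) < depth:
        raise ValueError("sample has fewer reads than the requested depth")
    return random.Random(seed).sample(reads, depth)


# Synthetic read identifiers standing in for one quality-filtered sample.
reads = [f"read_{i}" for i in range(25_000)]
subset = subsample_reads(reads)

print(f"Q30 accuracy: {phred_to_accuracy(30):.3%}")
print(f"reads kept:   {len(subset)}")
```

Drawing without replacement means that no read is counted twice in the rarefied sample.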

Analysis of Bacterial Diversity
Microbiota diversity was assessed by Shannon index, PD (phylogenetic diversity) whole tree, and observed OTUs based on 97% nucleotide sequence identity. These values were calculated by QIIME [16] with a depth of 10,000. Then, p-values were calculated by Welch's test for testing group differences in diversity between the IBS patients and healthy controls. All distances among IBS patients and healthy controls were assessed by unweighted UniFrac distance by QIIME. Principal coordinate analysis (PCoA) was used to show the unweighted UniFrac distance between IBS patients and healthy controls in a low-dimensional space by cmdscale in the R statistical platform, version 3.4.3 [19]. Hierarchical clustering of unweighted UniFrac distance using Ward's method was performed to visualize the relationship between IBS patients and healthy controls using the Python clustering package (Scipy v1.2.1) [20]. All links connecting nodes closer than 2 Euclidean distances were assigned the same color.
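The ordination and clustering steps can be sketched as follows. The `pcoa` function is a NumPy implementation of classical multidimensional scaling, the method behind R's cmdscale, and `ward_clusters` cuts a Ward dendrogram at a fixed height with SciPy. Both function names are hypothetical, and the input is a synthetic Euclidean distance matrix standing in for the unweighted UniFrac matrix, which is not reproduced here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform


def pcoa(dist, n_axes=2):
    """Classical MDS (principal coordinate analysis) of a square distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_axes]      # largest eigenvalues first
    vals = np.clip(eigvals[order], 0.0, None)       # negative eigenvalues set to 0
    return eigvecs[:, order] * np.sqrt(vals)


def ward_clusters(dist, threshold):
    """Ward linkage on a distance matrix, cut at the given dendrogram height."""
    condensed = squareform(np.asarray(dist, dtype=float), checks=False)
    tree = linkage(condensed, method="ward")
    return fcluster(tree, t=threshold, criterion="distance")


# Synthetic stand-in data: two well-separated groups of five samples each.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0, 0.1, (5, 3)), rng.normal(5, 0.1, (5, 3))])
dist = squareform(pdist(points))

coords = pcoa(dist, n_axes=2)
labels = ward_clusters(dist, threshold=2.0)
print(coords.round(2))
print(labels)
```

Ward's method is formally defined for Euclidean distances, so applying it to UniFrac distances, as in the text, should be read as a visualization aid rather than an exact variance decomposition.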

Measurement of Fecal Short-Chain Fatty Acids
Fecal samples were collected from all participants. The fecal samples obtained for measurement of short-chain fatty acids (SCFAs) were immediately frozen at −30 °C and stored at −80 °C until measurement. Fecal SCFAs were measured using a modified protocol described previously [21]. In brief, the SCFA-containing ether layers were collected and pooled for gas chromatography-mass spectrometry (GC/MS) analysis using GCMS-QP2010 Ultra (Shimadzu, Kyoto, Japan). The concentration of each SCFA was determined as µmol/g using external standard calibration over an appropriate concentration range. A p-value of <0.05 by Welch's test was considered to be significant.

Group Differences in Taxonomic Abundance
To reveal associations between taxonomic abundance and IBS status, we tested group differences of genus-level relative abundances using Welch's test. The centered log-ratio transformed values were used as inputs for these univariate analyses to manage 0 count values. Analysis was confined to taxa with a prevalence greater than 10% and a maximum proportion (relative abundance) greater than 0.005. A p value of less than 5% was considered to be significant.
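A minimal sketch of this univariate screen is given below. Several details are assumptions rather than the study's documented procedure: a pseudocount of 0.5 is added before the log so that zero counts are defined, the centered log-ratio is computed over all genera before filtering, and the function names are hypothetical. The data are synthetic.

```python
import numpy as np
from scipy.stats import ttest_ind


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform per sample (rows); pseudocount is an assumption."""
    log_x = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return log_x - log_x.mean(axis=1, keepdims=True)


def keep_taxa(counts, min_prevalence=0.10, min_max_proportion=0.005):
    """Keep taxa with prevalence > 10% and maximum relative abundance > 0.005."""
    counts = np.asarray(counts, dtype=float)
    rel = counts / counts.sum(axis=1, keepdims=True)
    prevalence = (counts > 0).mean(axis=0)
    return (prevalence > min_prevalence) & (rel.max(axis=0) > min_max_proportion)


def welch_by_taxon(counts, is_ibs):
    """Welch's t-test per retained taxon on CLR values (IBS versus controls)."""
    is_ibs = np.asarray(is_ibs, dtype=bool)
    z = clr(counts)
    mask = keep_taxa(counts)
    t, p = ttest_ind(z[is_ibs][:, mask], z[~is_ibs][:, mask],
                     axis=0, equal_var=False)
    return mask, t, p


# Synthetic counts: 40 samples x 6 genera, genus 0 enriched in the first 20.
rng = np.random.default_rng(0)
counts = rng.poisson(50, size=(40, 6))
is_ibs = np.arange(40) < 20
counts[is_ibs, 0] = rng.poisson(200, size=20)

mask, t, p = welch_by_taxon(counts, is_ibs)
print(np.round(t, 2), np.round(p, 4))
```

Because the centered log-ratio is compositional, enrichment of one genus lowers the transformed values of the others, which is worth keeping in mind when reading per-genus p-values.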

Prediction Model for IBS and Statistical Analyses of IBS Biomarkers
To establish a methodology for identifying IBS patients based on fecal bacteria data, we tried a machine learning approach. Before machine learning, bacterial abundances were logarithmically transformed. As bacterial data included 689 taxa at the genus level, such a large number of features would have tended to induce the curse of dimensionality for machine learning. Therefore, we first extracted feature-taxa by L1 regularized logistic regression (LASSO; least absolute shrinkage and selection operator) [22] as used previously for feature-taxa extraction [11]. We next identified IBS by random-forest analysis [23] using the extracted taxa. The random forest was packaged in a pipeline of Python scikit-learn to prevent data leakage [24] and subjected to repeated cross-validation (10-fold, one hundred repeats). A parameter of inverse of regularization strength for logistic regression was optimized by inner 5-fold cross-validation. The performance of the classifier was quantified by area under the receiver-operating characteristic (ROC) curves, averaged over the resulting one thousand models. The source code for the prediction model is available from GitHub [25].
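The overall structure described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' published code [25]: the data are synthetic stand-ins for log-transformed abundances, the folds are stratified, the number of repeats is reduced from 100 to 2 to keep the run short, and `build_model` is a hypothetical name. Placing the LASSO selection step inside the pipeline means it is refitted on each training fold, which is what prevents leakage from the held-out samples.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline


def build_model(seed=0):
    """L1 logistic regression for feature selection, then a random forest."""
    # Inner 5-fold CV tunes C, the inverse of the regularization strength.
    lasso = LogisticRegressionCV(Cs=10, cv=5, penalty="l1", solver="liblinear",
                                 scoring="roc_auc", random_state=seed)
    return Pipeline([
        # Keep only features whose LASSO coefficient is effectively non-zero.
        ("select", SelectFromModel(lasso, threshold=1e-5)),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=seed)),
    ])


# Synthetic data with roughly the study's class sizes (26 controls, 85 IBS).
X, y = make_classification(n_samples=111, n_features=30, n_informative=5,
                           n_clusters_per_class=1, class_sep=1.5,
                           weights=[0.23, 0.77], random_state=0)

# Outer loop: repeated 10-fold CV (2 repeats here; the text uses 100).
outer = RepeatedStratifiedKFold(n_splits=10, n_repeats=2, random_state=0)
auc = cross_val_score(build_model(), X, y, cv=outer, scoring="roc_auc")
print(f"AUC = {auc.mean():.3f} +/- {auc.std():.3f} over {auc.size} models")
```

With 10 folds and 100 repeats the same loop yields the one thousand models referred to in the text.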

Statistical Analyses of the Fecal Microbiome to Determine the Featured Taxa in IBS Patients
To determine the featured taxa in IBS patients, we used the LASSO logistic regression algorithm as developed by Tap et al. [11]. This algorithm extracts features (bacterial OTUs) as non-zero coefficients from 100 LASSO models (trained in 10-fold cross-validation and ten repeats). As train and test data, our OTU-based data were filtered to remove OTUs that were detected in only one sample or less than 10 reads as a total amount for all samples. The labels for classification were IBS and healthy control. For comparison with the features of Swedish IBS patients, we extracted OTUs whose assigned taxonomy at the genus level had been commonly observed in the Swedish data [11] and our data. Each of the featured taxa (OTUs) was assessed by BLAST [26].
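The idea of counting how often an OTU receives a non-zero coefficient across 100 LASSO models can be sketched as below. This is not the implementation of Tap et al. [11]; the fixed value of C, the stratified folds, the synthetic data, and the names `filter_otus` and `lasso_selection_counts` are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold


def filter_otus(counts):
    """Drop OTUs seen in only one sample or with fewer than 10 reads in total."""
    counts = np.asarray(counts)
    n_samples_present = (counts > 0).sum(axis=0)
    return (n_samples_present > 1) & (counts.sum(axis=0) >= 10)


def lasso_selection_counts(X, y, C=0.1, n_splits=10, n_repeats=10, seed=0):
    """Count, per feature, the models in which its LASSO coefficient is non-zero."""
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats,
                                 random_state=seed)
    selected = np.zeros(X.shape[1], dtype=int)
    for train_idx, _ in cv.split(X, y):
        model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        model.fit(X[train_idx], y[train_idx])
        selected += model.coef_.ravel() != 0
    return selected


# Synthetic example: only feature 0 carries signal about the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(111, 20))
y = (X[:, 0] + 0.3 * rng.normal(size=111) > 0).astype(int)

selected = lasso_selection_counts(X, y)
print(selected)
```

Features that are selected in most or all of the models are the natural candidates for "featured taxa", whereas features selected only occasionally are more likely to reflect noise in a particular training split.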

Patient Characteristics and Clinical Status
Clinical and demographic characteristics for all of the subjects (85 IBS patients and 26 healthy controls) enrolled in this study are summarized in Table 1. Among the 85 IBS patients, 27 were diagnosed as IBS-C, 33 as IBS-D, 22 as IBS-M, and 3 as IBS-U according to the Rome III criteria. The various parameters including age, gender, and body mass index (BMI) did not differ significantly between the healthy controls and the IBS patients as a whole (Table 1).
The characteristics of the various IBS subtypes are also shown in Table 1. Age, gender, and BMI did not differ between IBS-D and IBS-M, but age was higher in IBS-C than in the healthy controls. Stool frequency was significantly lower in IBS-C than in controls, whereas it was significantly higher in IBS-D. Bristol Stool Scale score was significantly higher in IBS-D than in controls.
Table notes: Data are shown as mean ± SD. The frequency of IBS symptoms was graded as 1, 3-9 days/month; 2, 10-19 days/month; 3, from 20 days/month to every day. NA, not available.

Table notes (recovered column headers: Factors; Healthy, n = 26): Data are shown as mean ± SD. The p value for IBS-U was not indicated because of low numbers. t, t value. † Short-chain fatty acid data lack 4 IBS samples (2 IBS-C and 2 IBS-M).

Distance of Microbial Composition between IBS and Healthy Controls
PCoA of unweighted UniFrac distances of microbial composition is shown in Figure 2A. Healthy controls were positioned in the region where PC1 and PC2 were less than 0.1 and 0.2, respectively. Some IBS patients fell within the same region, but others showed higher PC1 and/or PC2 values, indicating that some IBS patients had healthy control-like properties whereas others were clearly distinguishable. Overall, IBS-C patients did not show high PC2 values, but some showed high PC1 values (>0.1). Hierarchical clustering of unweighted UniFrac distance was also performed to visualize purely IBS clusters and IBS-healthy control mixed clusters (Figure 2B). The green clusters furthest apart from the IBS-healthy control mixed clusters were purely IBS clusters comprising 16 samples (8 D-type, 2 C-type, 5 M-type and 1 U-type) (Figure 2B).

Comparisons of Relative Abundance of Each Taxon between Healthy Controls and IBS Patients
Univariate analysis identified taxa at the genus level whose abundance differed between groups. Welch's test indicated statistically significant differences in the relative abundances of several taxa between healthy controls and IBS patients (Table 3 and Figure 3).
Table 3. Differences in abundance of single taxa between healthy controls and IBS patients.

Classification of IBS and Healthy Controls by Machine Learning with Featured Taxa and Short-Chain Fatty Acids
We attempted to establish a model for distinguishing IBS patients from healthy control groups using taxa-assigned and/or SCFA data (Figure 4). We first tested whether a combination of logistic regression and random forest would be better than either approach alone. We found that the combination of logistic regression and random forest for taxa-assigned data yielded an area under the curve (AUC) of 0.911 ± 0.088 (95% CI, 0.905-0.916), whereas the AUC obtained by logistic regression was 0.887 ± 0.112 (95% CI, 0.880-0.894) and that obtained by random forest was 0.846 ± 0.130 (95% CI, 0.837-0.854) (Supplementary Figure S1). This confirmed that a combination of logistic regression and random forest was significantly better than either approach alone (p < 0.05, t ~5.20 for logistic regression and 12.9 for random forest); therefore, we decided to use this combination to establish a model for distinguishing IBS patients from control subjects.
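The reported intervals and t values can be roughly recovered from the summary statistics alone. The sketch below assumes that each AUC summary is a mean ± SD over the 1000 cross-validation models, that the 95% CI is the normal approximation mean ± 1.96·SD/√n, and that the comparison is a Welch-type t statistic; these are one plausible reading of the reported figures, not a statement of how the authors computed them.

```python
import math

N_MODELS = 1000  # 10 folds x 100 repeats, as described in the methods


def ci95(mean, sd, n=N_MODELS):
    """Normal-approximation 95% confidence interval of a mean."""
    half_width = 1.96 * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


def welch_t(mean1, sd1, mean2, sd2, n1=N_MODELS, n2=N_MODELS):
    """Welch t statistic computed from summary statistics."""
    return (mean1 - mean2) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# Mean and SD of AUC as reported in the text.
combined = (0.911, 0.088)
logistic = (0.887, 0.112)
forest = (0.846, 0.130)

for name, (mean, sd) in [("combined", combined), ("logistic", logistic),
                         ("forest", forest)]:
    low, high = ci95(mean, sd)
    print(f"{name:9s} 95% CI: {low:.3f}-{high:.3f}")

print(f"t (combined vs logistic): {welch_t(*combined, *logistic):.2f}")
print(f"t (combined vs forest):   {welch_t(*combined, *forest):.2f}")
```

Small discrepancies from the published values are expected, because the inputs here are already rounded to three decimals. Note also that AUC values from repeated cross-validation share training data and are therefore not independent, so intervals of this kind are likely to be narrower than the true uncertainty.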

Figure 4. Area under the curve (AUC) scores and receiver-operating characteristic (ROC) curves for IBS prediction using taxa and short-chain fatty acid (SCFA) data. (A) Boxplots of AUC scores describing the prediction performance for IBS using taxa-assigned data, short-chain fatty acid data, and both. Solid lines in boxes indicate median and dashed lines indicate mean. (B) ROC curves describing specificity and sensitivity using taxa-assigned data, short-chain fatty acid data, and both. The gray shadow indicates the standard deviation of the ROC curve obtained using taxa-assigned data.

Comparison of Japanese IBS Featured Taxa with Swedish IBS
To determine whether the microbiomes in Japanese and Swedish IBS patients are similar or different, we extracted featured taxa using the LASSO logistic regression algorithm developed by Tap et al., which has been used to analyze Swedish IBS data [11]. The features extracted from our Japanese IBS data showed some bacteria that were not evident in the Swedish data, such as Halomonas, Klebsiella, Dorea, Prevotella, Lachnobacterium, Ruminococcus, Collinsella, Streptococcus, Bifidobacterium, and Oscillospira (Table 3 and Supplementary Table S1). Featured genera commonly observed in both the Swedish and our Japanese data were Bacteroides, Faecalibacterium, Parabacteroides, and Blautia (Supplementary Table S1).

Discussion
To establish an objective tool for diagnosis of IBS, we investigated the fecal gut microbiota profile in Japanese healthy subjects and IBS patients. Overall, the α-diversity of the gut microbiome was significantly decreased in Japanese IBS patients relative to that of healthy subjects and was lower in IBS-D than in the other types of IBS (Figure 1). However, the reduction of diversity was not as great as that in obesity [27] or patients with IBD [28], indicating that the dysbiosis in IBS may be comparatively subtle. Furthermore, since the gut microbiome data were based on 16S rRNA gene sequencing, the possible influence of PCR bias should be taken into account. In a considerable proportion of IBS patients, the microbiome composition was similar to that in healthy subjects, although in some it was clearly different (Figure 2), similar to previous findings by Labus et al. [29]. This variation may not be surprising, as a number of factors (e.g., race, diet, age, gender, social environment) that might possibly affect the gut microbiome profile play a role in the development of IBS, creating heterogeneity among patients [30]. The gut microbiota profile is known to be affected by race [31], and even within the same racial group, healthy individuals may show differences [12]. In this context, it was interesting to compare our data for Japanese IBS patients with those of Swedish IBS patients obtained using a similar study design [11]. This allowed us to extract some bacteria that were specific to Japanese IBS patients (Table 3 and Supplementary Table S1) and not observed in the Swedish study. Since the amplicon regions of the 16S rRNA gene differed between our study and the Swedish one, these two studies need to be compared with reference to the difference in the bioinformatics protocols employed. However, it was perhaps noteworthy that we detected a decrease of specific genera (Bacteroides, Faecalibacterium, Parabacteroides, and Blautia; Supplementary Table S1) that were common to both the Swedish and Japanese cohorts, suggesting that these genera may be highly reliable for distinguishing IBS patients from healthy controls.
There is also the issue of whether the difference in the gut microbiome profile is causative of IBS, or results from its development. This would appear difficult to address as both the gut microbiome profile and IBS pathophysiology are influenced by common environmental factors such as diet, psychological stress, lifestyle, and hormones [30]. In this context, fecal microbiota transplantation might seem to be an appealing approach for clarifying whether alteration of the gut microbiome is a possible cause of IBS. Interestingly, in germ-free animals, transplantation of the fecal microbiota from IBS patients has been shown to reproduce the visceral hypersensitivity or gastrointestinal dysmotility characteristic of IBS [32,33], indicating that the gut microbiome may indeed be a possible cause of IBS. However, Halkjaer et al. have reported that transplantation of the fecal microbiota from healthy subjects to IBS patients conferred no benefit in terms of symptom relief [34]. Taken together, therefore, at least in humans, the existing data suggest that specific alteration of the gut microbiome profile may have no pathophysiological significance in IBS. On the other hand, among the environmental factors mentioned above, diet may have a critical impact on both the gut microbiome profile and IBS pathophysiology [35,36]. Using gnotobiotic methodology, Gordon's group has suggested that diet plays an essential role in defining the gut microbiome profile [37], and moreover that certain dietary components such as fermentable oligosaccharides, disaccharides, monosaccharides, and polyols (FODMAP) not only change the composition of the human gut microbiome but also exacerbate the symptoms of IBS patients [38,39]. Unfortunately, the scope of the present study did not extend to analysis of the influence of diet on the gut microbiota profile in IBS patients, thus representing a qualitative limitation. However, not only diet but also various environmental factors influence the gut microbiome profile as well as IBS pathophysiology; therefore, it appears extremely difficult to clarify whether gut microbiome alterations are of crucial significance in this context.
The gut microbiota interacts with the host by producing SCFAs as mediators [40]. Indeed, it has been clarified that SCFAs act via specific receptors not only on epithelial cells but also immune cells in intestinal tissues [40,41], suggesting that SCFAs play a pivotal role in the pathophysiology of various gastrointestinal diseases. Our data indicated that propionic acid and the difference between butyric acid and valerate were significantly increased in IBS patients whereas acetic acid tended to be decreased (Table 2). When our patients were divided into groups according to IBS subtypes, those with IBS-D showed a significantly increased difference between butyric acid and valerate values, whereas acetic acid was decreased. As SCFAs are products of bacterial dietary fiber metabolism, their properties are determined by a combination of diet and gut microbiome composition. Although the existing data are conflicting, several studies have revealed that the propionic-acid-producing genus Veillonella is increased whereas the butyrate-producing Erysipelotrichaceae are decreased in feces from IBS patients [42-44]. In this context, the increase of propionic acid is consistent with previous reports [42-44], but we were unable to observe such alterations in the bacterial strains investigated here. Although a low FODMAP diet is useful for symptom relief in 50-80% of IBS patients [39], the mechanism of its effect is still unclear. Interestingly, a low FODMAP diet leads to a reduction of Bifidobacteria and butyrate-producing bacteria [45,46], which characterize IBS. Moreover, although a low FODMAP diet is likely to reduce the production of SCFAs, several studies have obtained conflicting results regarding the effects of such a diet [45-47]. Furthermore, we found that the differences in SCFAs among the various IBS subtypes were not so distinct (Table 2), implying that differences in fecal SCFA concentrations may not play a very significant role in determining the specific symptoms of IBS patients.
Our present goal was to establish a model for diagnosis of IBS using data for the fecal microbiome and SCFAs. Using LASSO regularized multiple logistic regression [22], we evaluated data obtained by 16S rRNA gene sequencing, and finally established a machine learning model for diagnosis of patients with IBS using fecal microbiome data (Figure 4B; sensitivity > 80% and specificity > 90%). We had initially expected that the SCFA data would have an additive effect on the diagnostic model, since SCFAs are also potential markers for IBS diagnosis [44]. However, as shown in Figure 4B, the SCFA data were of little additional advantage for our diagnostic model of IBS. This may not be surprising in view of the only slight differences in SCFAs between IBS patients and healthy subjects (Table 2). There is a need for powerful biomarkers that can aid in the objective diagnosis of IBS and/or prediction of the response to therapy, and numerous candidates (e.g., serum molecules, fecal metabolites, motility, psychological aspects) have been investigated [48-50]. Although data on fecal microbiota signatures have varied among studies of IBS patients, such differences may have been at least partly due to not only the design of such studies but also the geographic regions where they were conducted, as this aspect can affect diet and lifestyle [30]. Nevertheless, it is interesting that features observed at the phylum level, such as a Firmicutes/Bacteroidetes ratio, have been almost consistent among IBS patients [9], and some common findings at the genus level have also been reported for different cohorts such as the Swedish and present Japanese ones. We have no exact explanation for why certain taxa are common to IBS patients in different geographic regions; however, a machine learning system or statistical analysis may help to reveal the complex associations among IBS-related environmental factors and improve the sensitivity and specificity of tools for IBS diagnosis based on microbiome information.
In summary, we have clarified the gut microbiome characteristics of Japanese IBS patients and the SCFAs they produce. Moreover, we have established a machine learning model for diagnosis of IBS using fecal microbiome data. However, we concede that this study had several limitations. First, it lacked any functional investigations of the microbial community, and the number of control subjects was small, thus diminishing the study relevance. To advance this study, integration of meta-omics approaches such as metagenomics, metatranscriptomics, metaproteomics, or metabolomics would be required. Second, the lack of dietary information might have concealed any effect of diet on the gut microbiota profile. In addition, it might be questionable whether our diagnostic tool would be able to classify IBS patients into various subtypes. In this context, we aimed to create a model based on gut microbiome data and preliminary indications suggested that our strategy might also contribute to the establishment of a machine learning model for subclassification of IBS patients (Supplementary Figure S2). Although further analyses will be needed before this diagnostic model can be established, our present work represents a first step towards devising an objective tool based on gut microbiome data for identifying IBS patients or individuals likely to develop the condition.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2077-0383/9/8/2403/s1, Supplementary Figure S1: AUC scores classifying healthy control and IBS using taxa-assigned data by logistic regression, random forest, and a combination of both. Supplementary Figure S2: AUC scores classifying healthy controls and each IBS-subtype using taxa-assigned data. Supplementary Table S1: Taxa associated with IBS. Each OTU was assigned by a BLAST search. Supplementary Data S1: The number of reads per sample before and after quality filtering. Supplementary Data S2: Relative abundances for each OTU and sample.