Dietary Efficacy Evaluation by Applying a Prediction Model Using Clinical Fecal Microbiome Data of Colorectal Disease to a Controlled Animal Model from an Obesity Perspective

Obesity associated with a Western diet such as a high-fat diet (HFD) is a known risk factor for inflammatory bowel disease (IBD) and colorectal cancer (CRC). In this study, we aimed to develop fecal microbiome data-based deep learning algorithms for the risk assessment of colorectal diseases. The effects of a HFD and a candidate food (Nypa fruticans, NF) on IBD and CRC risk reduction were also evaluated. Fecal microbiome data were obtained from 109 IBD patients, 111 CRC patients, and 395 healthy control (HC) subjects by 16S rDNA amplicon sequencing. IBD and CRC risk assessment prediction models were then constructed by deep learning algorithms. Dietary effects were evaluated based on fecal microbiome data from rats fed on a regular chow diet (RCD), HFD, and HFD plus ethanol extracts or water extracts of NF. There were significant differences in taxa when IBD and CRC were compared with HC. The diagnostic performance (area under curve, AUC) of the deep learning algorithm was 0.84 for IBD and 0.80 for CRC prediction. Based on the rat fecal microbiome data, IBD and CRC risks were increased in HFD-fed rats versus RCD-fed rats. Interestingly, in the HFD-induced obesity model, the IBD and CRC risk scores were significantly lowered by the administration of ethanol extracts of NF, but not by the administration of water extracts of NF. In conclusion, changes in the fecal microbiome of obesity by Western diet could be important risk factors for the development of IBD and CRC. The risk prediction model developed in this study could be used to evaluate dietary efficacy.


Introduction
Obesity has become a public problem as Westernized diets, including high-sucrose and high-fat diets, have become more common for decades. From 1980 to 2013, the proportion of overweight adults increased by 28% in developed countries and nearly 60% in developing countries. Obesity is an epidemic issue with no country reporting a decrease in prevalence during that period [1]. A Westernized diet that includes a high-sucrose, highfat diet is known to accelerate obesity-related metabolic syndrome. In previous studies, overweight and obese people have been found to be more susceptible to chronic localized inflammation than those of normal weight [2,3]. In epidemiological studies, obesity has been associated with liver, gallbladder, and pancreatic cancers [4]. Adipose tissue in the body plays an important role in immune homeostasis. It is known to induce changes in intestinal immunity and influence the intestinal microflora, intestinal barrier function, and innate and adaptive immune cells residing in the gut [5]. Symbiotic microbes have also been found to be associated with obesity through their contributions to dietary energy 2.3. DNA Extraction and NGS Library Preparation for Sequencing of 16S rRNA Amplicon DNA was extracted from each stool using a DNeasy PowerSoil kit (QIAGEN, Düsseldorf, Germany) according to the manufacturer's instructions. The extracted DNA was quantified using a QIAxpert system (QIAGEN, Düsseldorf, Germany). The 16S rDNA was amplified using V3 forward primer (5 -TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCT ACGGGNGGCWGCAG-3 ) and V4 reverse primer (5 -GTCTCGTGGGCTCGGAGATGTG TATAAGAGACAGGACTACHVGGGTATCTAATCC-3 ). Index PCR with sequencing adapters attached to amplified DNA was performed using a Nextera XT Index kit (Illumina, San Diego, CA, USA). After quantifying, normalizing, and pooling the libraries, each amplicon was sequenced using a MiSeq instrument (Illumina, San Diego, CA, USA).

Microbiome Profiling
Paired-end reads that matched adapter sequences were trimmed with Cutadapt version 1.1.6 [29]. The resulting FASTQ files containing paired-end reads were merged with CASPER and then quality filtered with the Phred (Q) score-based criteria described by Bokulich [30,31]. Any reads shorter than 350 bp or longer than 550 bp after merging were also discarded. To identify chimeric sequences, a reference-based chimera detection step was conducted with VSEARCH against the SILVA gold database [32,33]. Next, sequence reads were clustered into operational taxonomic units (OTUs) using VSEARCH with a closed-reference clustering algorithm under a threshold of 97% sequence similarity. Representative sequences of OTUs were finally classified using the SILVA 132 database with UCLUST (parallel_assign_taxonomy_uclust.py script on QIIME version 1.9.1) under default parameters [34]. The Chao1 index, an estimator of the richness of taxa per individual, was estimated to measure the diversity of each sample. If clusters could not be assigned at the genus level due to a lack of sequences or redundant sequences in the database, the taxon was assigned at the next highest level as indicated in parentheses.

Data Preprocessing through Taxonomic Accumulation
Before using features for the model, we employed a novel hierarchal accumulation method to give weight to imprecisely classified genera through their verified upper-level taxonomy, while simultaneously reducing the influence of zero-inflation in the microbiome relative abundance data. First, the assigned species were summarized to their respective genus level and all abundance values were scaled by log2. Next, the sum of the taxa abundance values for each sample was set to one to represent the relative abundance of each OTU. Finally, taxonomic values from the genus to the kingdom level were accumulated from all samples using the following formula: V ACCUMULATION : accumulated value of a genus. V G : relative abundance of a genus. V f/O/C/P/K : sum of V G for genus' higher taxonomy (family, order, class, phylum, kingdom).
The accumulated taxonomic values of the sequences obtained from bacteria in all samples were calculated as described above and used for the downstream disease risk predictive model.

Risk Assessment Model
After data preprocessing, 1639 taxa from CRC and 809 taxa from IBD were selected as features for the colon disease model development. We selected the GBM method as a disease classification model. The GradientBoostingRegressor function in the scikit-learn package (version 0.21.3) was used in Python [35]. To overcome limitations caused by overfitting, cross-validation was applied with data randomly split into training sets and test sets at a 7:3 ratio for 30 iterations. Training data sets were used for model training and test sets were used for validation.

Statistical Analysis
The age difference between the healthy control group and the patient group was tested using a t-test. The gender difference was tested using a chi-squared test. The alpha diversity of the microbial composition was measured using the Simpson diversity index and rarefied to compare between the patient group and healthy control group. It was rarefied to 1099, which is the minimum read depth of the total samples. The beta diversity was analyzed using principal coordinate analysis (PCoA) based on Bray-Curtis dissimilarity. Significant differences between the healthy control group and the patient group were determined using a t-test for continuous variables. Before using the t-test, an f-test was performed to ensure that the variances were equal. Additionally, the Kruskal-Wallis test and Mann-Whitney U test were performed to analyze the microbiome differences in rat stool samples, with the p-value corrected using the Benjamini-Hochberg method. Significant differences were considered if the p-value was less than 0.05 or the adjusted p-value (Ad. p) was less than 0.05. All statistical analyses were performed using R version 4.1.0 (Vienna, Austria).

Clinical Characteristics of Subjects
To establish a risk assessment model based on the stool microbial profiles of the healthy control group and IBD and CRC patient groups, there was no statistically significant difference in age between the groups (p = 0.313 for healthy control group vs. IBD group, p = 0.462 for healthy control group vs. CRC group). However, there was a statistically significant difference in gender between the control group and the patient group (p < 0.001 for healthy control group vs. IBD group, p = 0.001 for healthy control group vs. CRC group). The mean age of the control group used in the IBD assessment model was 39.99 years. The mean age of IBD patients was 38.27 years. The mean age of the control group used in the CRC assessment model was 62.75 years. The mean age of CRC patients was 63.5 years, as shown in Table 1.

Comparison of Alpha and Beta Diversity between Healthy Controls and IBD Patients
The Simpson diversity index was used to calculate the alpha diversity of the microbial composition of the healthy control group and IBD patient group. The IBD patient group showed a significant decrease (p = 0.01) in the Simpson index compared to the healthy control group (Figure 1a). Beta diversity was measured based on PCoA using Bray-Curtis dissimilarity to determine the segregation of the clinical samples. The beta diversity significantly separated the clinical samples at both phylum and genus levels (p < 0.001) (Figure 1b Right-side bar plots highlight different average relative abundance of individual key taxa between the control and IBD subjects' stool microbiota at (d) phylum and (e) genus levels. Significance between groups was assessed by a t-test (*, p < 0.05; **, p < 0.01).

Comparison of Alpha and Beta Diversity between Healthy Controls and CRC Patients
Alpha diversity of the microbial composition of the healthy control group and CRC patient group was measured using the Simpson diversity index. Through this analysis, the CRC group showed a significantly lower diversity index than the control group (p = 0.0013) ( Figure 2a). Beta diversity was measured based on PCoA using Bray-Curtis dissimilarity to determine the segregation of the clinical samples. At both the phylum and genus levels, the clinical samples were separated significantly (p < 0.001) (Figure 2b,c).

Differences in Gut Microbiome Abundance between Healthy Controls and CRC Patients
As a result of metagenomic analysis at the phylum level, Actinobacteria and Acidobacteria were significantly higher in the case group than in the control group, while the carriage of Bacteroidetes was significantly lower in the CRC patient group than in the healthy control group (p < 0.05). Bacteroidetes and Actinobacteria showed fold changes of 0.77-and 1.35-fold, respectively (Figure 2d). Notably, Acidobacteria showed a 64-fold difference between the CRC patients and healthy subjects. At the genus level, Subdoligranulum and the [Ruminococcus] torques group were increased, whereas Prevotella 9, Ruminococcaceae UCG-014, Haemophilus, and Parabacteroides were significantly decreased (p < 0.05) in the CRC group compared with the control group. Haemophilus, Ruminococcaceae UCG-014, and Parabacteroides showed fold changes of 0.49-, 0.55-, and 0.59-fold, respectively. The For the model with the highest ROC curve in each disease group model, we extracted the characteristic importance calculated as the mean and standard deviation of the accumulation of impurity reduction. As a result, in the model predicting the control group and IBD group, Dorea, Blautia, Enhydrobacter, and so on were shown as the genera to have a high importance in that order. In the model predicting the control group and CRC group, Parvimonas, Peptostreptococcus, Psychrilyobacter, and so on were shown as the genera with a high importance in that order (Figure 3c). Microorganisms 2022, 10, x FOR PEER REVIEW 10 of 18

Body Weight Measure for 6 Weeks and Comparison of Alpha and Beta Diversity in HFD Induced Rat Model
The body weight of the HFD model was measured to evaluate dietary efficacy of NF extracts for weight reduction. At 6 weeks, the body weight of the HFD group was higher than that of the normal group. The NFW group showed no difference in body weight compared to the HFD group, while the NFE group showed lower body weight than the HFD group (Figure 4a). As a result of expressing as a bar graph to compare the weight at 6 weeks of dietary intake, the normal group and the high-fat diet group showed a significant difference. However, there was no significant difference in weight between the NFE group and the normal group (Figure 4b). mulation of impurity reduction. As a result, in the model predicting the control group and IBD group, Dorea, Blautia, Enhydrobacter, and so on were shown as the genera to have a high importance in that order. In the model predicting the control group and CRC group, Parvimonas, Peptostreptococcus, Psychrilyobacter, and so on were shown as the genera with a high importance in that order (Figure 3c).

Body Weight Measure for 6 Weeks and Comparison of Alpha and Beta Diversity in HFD Induced Rat Model
The body weight of the HFD model was measured to evaluate dietary efficacy of NF extracts for weight reduction. At 6 weeks, the body weight of the HFD group was higher than that of the normal group. The NFW group showed no difference in body weight compared to the HFD group, while the NFE group showed lower body weight than the HFD group (Figure 4a). As a result of expressing as a bar graph to compare the weight at 6 weeks of dietary intake, the normal group and the high-fat diet group showed a significant difference. However, there was no significant difference in weight between the NFE group and the normal group (Figure 4b). The alpha diversity of the gut microbiota composition in a murine model was measured using the Simpson index. As a result, the HFD group showed significantly lower enrichment than the normal group (p = 0.0019). The NFE group showed a significantly higher diversity value in the Simpson index than the HFD group (p = 0.021). The NFW group showed a significantly lower diversity value than the normal group (p = 0.0006), and was lower than the HFD group, although the difference was not statistically significant (p = 0.26) (Figure 5a). The beta diversity was measured based on PCoA plot using Bray-Curtis dissimilarity at the phylum and genus levels. At all levels, the rat stool samples of the four groups showed significant separation (p < 0.001) (Figure 5b,c). The alpha diversity of the gut microbiota composition in a murine model was measured using the Simpson index. As a result, the HFD group showed significantly lower enrichment than the normal group (p = 0.0019). The NFE group showed a significantly higher diversity value in the Simpson index than the HFD group (p = 0.021). The NFW group showed a significantly lower diversity value than the normal group (p = 0.0006), and was lower than the HFD group, although the difference was not statistically significant (p = 0.26) (Figure 5a). The beta diversity was measured based on PCoA plot using Bray-Curtis dissimilarity at the phylum and genus levels. At all levels, the rat stool samples of the four groups showed significant separation (p < 0.001) (Figure 5b,c).

Differences in Microbial Composition in HFD Induced Rat Model
We confirmed the microbiome configuration of the phylum level and the genus level and compared the difference among groups. At the phylum level, Proteobacteria and Actinobacteria were significantly higher in the HFD group than in the normal group, while Bacteroidetes was significantly lower in the HFD group than in the normal group. Firmicutes was significantly decreased in the NFE group than in the HFD group. The carriage of Bacteroidetes and Cyanobacteria was significantly increased in the NFE and NFW groups. However, Patescibacteria was significantly decreased in the two groups (Ad. p < 0.05) (Figure 5d). At the genus level, [Desulfovibrionaceae], Lachnoclostridium, [Eubacterium] coprostanoligenes group, and [Lachnospiraceae] were significantly higher in the HFD group than in the normal group, while [Muribaculaceae], Prevotellaceae Ga6A1 group, Lactobacillus, [Ruminococcaceae], Ruminococcus 1, and Ruminococcaceae UCG-013 were significantly lower in the HFD group. Bacteroides was significantly decreased in the NFE group than in the HFD group. The Prevotellaceae Ga6A1 group was significantly increased in the NFW group than in the HFD group. The carriage of [Muribaculaceae] and Phascolarctobacterium was significantly increased in NFE and NFW groups. However, Ruminococcaceae UCG-014, [Eubacterium] coprostanoligenes group, [Lachnospiraceae], and [Ruminococcaceae] were significantly decreased in both groups (Ad. p < 0.05) (Figure 5e) (Figure S3).

External Validation of Risk Assessment Models
To determine whether the colorectal disease risk assessment model could evaluate well in a rigorous animal model, we calculated predictive values by substituting the rat stool metagenome data from the normal diet, HFD, NFE, and NFW groups. In both models for evaluating IBD and CRC risk, the predicted values in all groups were higher than 0.5, the standard for distinguishing the healthy control group from the patient group. However, the predicted risk score was increased in the HFD group compared with the normal diet group (Figure 6a,b). Among the groups taking NF with HFD, the NFW group tended to show increased risk scores while the NFE group tended to show a level lower than or similar to the normal group. This trend was more conspicuously observed in the IBD model than in the CRC model.

Discussion
In the present study, we constructed a deep learning-based risk prediction model using fecal microbiome data from IBD and CRC patients. As shown in Figure 3b, the AUC Figure 6. Risk prediction values of IBD and CRC models for rat groups. Predicted values of IBD (a) and CRC (b) risk were plotted as boxplots for normal, HFD, NFE, and NFW rat groups (*, p < 0.05; **, p < 0.01).

Discussion
In the present study, we constructed a deep learning-based risk prediction model using fecal microbiome data from IBD and CRC patients. As shown in Figure 3b, the AUC value used as predictive performance of the deep learning algorithm of the model was 0.84 for IBD and 0.80 for CRC. Although the microbial profiles from normal individuals and patients with IBD are sometimes obtained and compared, this comparison is seldom performed to examine the performances of these prediction models in controlled animal models [18,36,37]. We further evaluated the risk by applying this predictive model for colonic disease to a controlled obese rat model. As shown in Figure 6, the risk prediction model for IBD and CRC showed that the HFD group increased the risk compared to the regular diet group in the rat experiment. This is an important finding because it suggests that microbiome compositions induced by a high-fat diet and those of IBD and CRC are shared. Additionally, we assessed whether the consumption of NF extract reduced the risk of CRC or IBD when consumed during a high-fat diet. Our results confirmed the correlation that the intake of ethanol-extracted NFs could increase the microbiome composition, which lowers the IBD risk score. The novel approach suggests that disease risk prediction models can be established using well-controlled animal models and used as tools to assess dietary efficacy through changes in microbiome composition.
There are many reports on the relationship between food and gut microbiome changes or the association between microbiome composition changes and disease. The MetaHIT Consortium extracted DNA from human feces and compared a group with a low number of gut microbiome genes and a group with a high number of gut microbiome genes to calculate a low gut microbiome gene [43]. In the treated group, high inflammatory cytokine secretion and high obesity rates were confirmed. In addition, it was confirmed that the genes of intestinal microbes were increased in both groups through a caloric control diet for 6 weeks. It was also confirmed that increasing fiber through the intake of fruits and vegetables could increase the abundance of high symbiotic microorganisms [44]. In another study, as a result of comparing a low-gluten diet group with a normal diet group, the relative abundance of Bifidobacterium species was decreased in the group that adhered to a low-gluten diet. Furthermore, a decreased amount of Bifidobacterium species induces a reduction of IL-1β, which is produced by immune cell-induced inflammation, inflammasome, and cytokines [45]. With respect to cardiac metabolism, the symbiotic microorganism Prevotella copri has been shown to have a protective association between the Mediterranean dietary pattern and cardiometabolic health. Moreover, among animal models, a high-fat diet habit could delay the disease according to immune regulation by changing the intestinal microflora in a mouse auto-inflammatory bone disease model [46]. These results suggest that it is possible to change intestinal symbiotic microorganisms through dietary intervention for disease prevention and treatment. Furthermore, disease occurrence could be prevented by changing microbiome compositions.
In this study, we used a machine learning classification method to build a risk assessment model considering the characteristics of stool metagenomic data. With the development of NGS technology, research on the microbial metagenome has exploded over the past decade, producing a huge amount of data. Various classification models for data analysis have been developed. As a result, a machine learning algorithm has been adopted as a powerful tool for interpreting the complex big data output of microbiome profiling [47,48]. To overcome some general challenges of microbiome analysis, a new data preprocessing method was also developed and applied. Microbiome count data are generally very rich in zero values, leading to bias. Thus, such data need to be modified before they are used as model input features [49]. Additionally, taxonomic assignment is based on a public database in which many entries are loosely specified at the genus level [50]. Instead of eliminating all incorrectly labeled classifications to reduce data resolution, we used a new method of clustering taxonomic hierarchical values of a given OTU into a single code value. Through this method, we aimed to address these biases and improve the rationale for our risk prediction model. Although further verification is required, the high predictive strength shown through multiple iterations of the model trained with coded metagenomic data supports the advantages of the taxonomic hierarchical coding method.
Widely distributed from tropical to subtropical regions of southeast Asia and Oceania, Nypa fruticans Wurmb (Arecaceae), a palm tree, is traditionally used in Myanmar to treat dental-related inflammation [24]. In the stool microbial profile of the HFD-induced rat model, Bacteroidetes, Phascolarctobacterium, and Lactobacillus were increased at the genus level compared to the control group, while Lachnospiraceae, Ruminococcaceae, and UCG-014 [Eubacterium] were decreased. Ethanol-extracted NFs can lower the imbalance of these bacteria [51]. In particular, the increase in Bacteroidetes and Lactobacillus strains associated with obesity and lack of a significant difference between the normal diet group and the NFE group suggest that NFE may inhibit weight gain during HFD intake.
Our study has some limitations. A risk assessment model was created based on a human-derived fecal microbiome profile. However, its risk assessment was applied to a rat animal model. In addition, controlled clinical samples are extremely rare, making it an inevitable choice. Furthermore, although there are various models of colitis (UC/CD) in murine, these methods are mainly postulated to chemically alter the microbial profile by disrupting the mucosal barrier and inducing inflammation of the colon via intestinal permeation by symbiotic microbiota [52]. On the other hand, the HFD rat model, which can induce chronic inflammation by natural intestinal flora, was selected as a controlled external validation model for our colorectal cancer and colitis risk assessment model [53,54]. When the four groups were scored with the IBD and CRC risk assessment models based on identical fecal microbiome composition data from obese rats, the IBD model distinguished each group better than the CRC model. This leads us to infer that the microbiome induced by the HFD model is relatively correlated with IBD. However, in the case of CRC, the obesity model appears to be an unsuitable model for risk assessment. It is judged that the level of evaluation is not dramatic because cancer develops through a long process. If possible, controlled clinical samples would be more appropriate for risk assessment.
In conclusion, this study demonstrates that the risk prediction model generated based on the microbial profile of clinical disease can be externally validated by utilizing appropriate animal disease models. This study proves that a well-assessed microbiome profile-based prediction model can be a platform for rapidly estimating various indicators or markers or the efficacy of diets and foods.