Environment-Related Genes Analysis of Limosilactobacillus fermentum Isolated from Food and Human Gut: Genetic Diversity and Adaption Evolution

Limosilactobacillus fermentum is ubiquitous in traditional fermented vegetables, meat products, and the human gut. It is regarded as a “generally recognized as safe” organism by the US Food and Drug Administration. So far, the genetic features and evolutionary strategies of L. fermentum from the human gut and food remain unknown. In this study, a comparative genomic analysis of 224 L. fermentum strains isolated from food and the human gut (164 strains isolated from the human gut were sequenced in our lab) was performed to assess genetic diversity and explore genomic features associated with environment. The 224 L. fermentum strains contained a total of 20,505 gene families and separated mainly into six clades in the phylogenetic tree, in a pattern connected with their origin. Food-source L. fermentum strains carried more carbohydrate-active enzyme genes (belonging to glycosyltransferase family 2, glycoside hydrolase family 43_11, and glycoside hydrolase family 68) than human gut strains, and L. fermentum derived from food showed a higher ability to degrade xylulose and ribose. Moreover, genes encoding otr(A), tetA(46), lmrB, poxtA, and efrB were more abundant in food-source L. fermentum, which was consistent with the number of CRISPR spacers and prophages in food-source L. fermentum. This study provides new insight into the adaptation of L. fermentum to food and the human intestinal tract, suggesting that the genomic evolution of L. fermentum was to some extent driven by environmental stress.


Introduction
Limosilactobacillus fermentum is a facultatively anaerobic, gas-producing, and obligately heterofermentative bacterium [1]. In 1901, some basic physiological and biochemical characteristics (cellular morphology, nutritional requirement, and carbohydrate fermentation) of this species were described for the first time in Bergey's Manual of Systematic Bacteriology; it can actively ferment sugars, such as glucose, fructose, sucrose, lactose, mannose, and ribose, but usually shows little or no fermentation of xylose, cellobiose, and trehalose [2]. L. fermentum can convert the carbohydrates in food to acid to alter flavor, prolong shelf life, and improve nutritional quality. It is used as a common starter culture in traditional fermentation of fruits and vegetables [3,4]. Furthermore, it was regarded as a "generally recognized as safe" organism by the US Food and Drug Administration in 2013. Accumulating evidence has shown that L. fermentum is ubiquitous in the intestinal tract of [...] begun to dominate the global probiotics market [7]. Researchers have indicated that the metabolites (exopolysaccharides [30], antimicrobial compounds [31], bile salt hydrolase [32], organic acid, and lactase [33]) produced by probiotics in the host tissue may modulate host biology and disease processes. Genes related to the synthesis of bile salt hydrolase [32], branched short-chain fatty acids [34], reuterin, and cobalamin [35] have been clarified.
Since the probiotic properties and fermentation characteristics of L. fermentum are intimately linked to specific genes in the strain, it is essential to understand the genomic traits of L. fermentum strains. Furthermore, the nutrition characteristics, temperature, pH, oxygen, osmotic pressure, and redox potential [34] of the human gut and fermented food could affect the evolution of L. fermentum, which is harbored in both environments [35]; whether these differences cause the loss or gain of specific genes is still unknown. In this study, 224 L. fermentum strains derived from the human gut and food were collected for comprehensive genomic analysis (164 were isolated from the human gut in our lab and 60 were obtained online), and the genes encoding orthologous proteins, antibiotic resistance, carbohydrate-active enzymes, CRISPR/Cas9, virulence factors, and prophages in L. fermentum were analyzed.

Genome Sequence
A total of 164 L. fermentum strains were isolated from the gut of 153 healthy Chinese individuals (samples WX111-WX115 were from the same person, as were samples HN112 and HN14-HN1110), and the Fast DNA Spin Kit was used to extract the DNA of the strains [36]. The DNA amplicons were then sequenced using the Illumina Hiseq 10 platform (San Diego, CA, USA). Sixty L. fermentum genomes were obtained from the NCBI microbial genome database. The basic information of these strains is provided in Table S1.

Average Nucleotide Identity Values, Pan-, and Core-Genome, and Phylogenetic Analyses
Average nucleotide identity (ANI) was calculated using Python, and the pan- and core-genomes were analyzed using PGAP 1.2.1 [37]. OrthoMCL 1.4 was used to identify orthologous genes, and the maximum likelihood method was used to perform a phylogenetic analysis of the 224 L. fermentum strains (based on 615 orthologous genes).
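To illustrate what a pan- and core-genome analysis measures, the following is a minimal sketch and not the PGAP pipeline used in this study. It assumes that each genome has already been reduced to a set of gene-family identifiers (the strain names and gene families below are hypothetical toy data). The pan-genome is the union of gene families over the genomes sampled so far, the core-genome is their intersection, and both are averaged over random orderings of the genomes to obtain accumulation curves.

```python
import random


def pan_core_curve(gene_sets, n_orderings=100, seed=0):
    """Mean pan- and core-genome sizes after adding 1..N genomes.

    gene_sets maps a strain name to the set of gene families it carries.
    Sizes are averaged over random orderings of the genomes.
    """
    rng = random.Random(seed)
    names = sorted(gene_sets)
    n = len(names)
    pan_totals = [0] * n
    core_totals = [0] * n
    for _ in range(n_orderings):
        order = names[:]
        rng.shuffle(order)
        pan = set()
        core = None
        for i, name in enumerate(order):
            genes = gene_sets[name]
            pan |= genes                                   # union grows
            core = set(genes) if core is None else core & genes  # intersection shrinks
            pan_totals[i] += len(pan)
            core_totals[i] += len(core)
    pan_mean = [t / n_orderings for t in pan_totals]
    core_mean = [t / n_orderings for t in core_totals]
    return pan_mean, core_mean


def unique_genes(gene_sets):
    """Gene families present in exactly one genome, reported per genome."""
    result = {}
    for name, genes in gene_sets.items():
        others = set().union(*(g for n, g in gene_sets.items() if n != name))
        result[name] = genes - others
    return result


# Hypothetical toy input: three strains and seven gene families.
toy = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g5"},
    "strain_C": {"g1", "g2", "g3", "g6", "g7"},
}
pan_mean, core_mean = pan_core_curve(toy)
strain_specific = unique_genes(toy)
```

The last entries of `pan_mean` and `core_mean` correspond to the total number of gene families and the number of shared genes, respectively, and `strain_specific` corresponds to the per-strain unique genes shown in a Venn diagram.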

Clusters of Orthologous Groups (COGs) Analysis
The genomes of the 224 L. fermentum strains were searched with BLAST against all annotated Clusters of Orthologous Groups (COGs) in the COG database (https://www.ncbi.nlm.nih.gov/COG (accessed on 5 June 2021)). The dominant COGs in each clade are shown in Table S2.

Carbohydrate Metabolism
The genomes of the 224 L. fermentum strains were searched with BLAST against all annotated CAZyme proteins in the Carbohydrate-Active enZyme (CAZy) database. These genomes were also searched with BLAST and annotated against sequences in the non-redundant protein sequence database (NR), and the enzymes involved in carbohydrate metabolism were analyzed.

Statistical Analysis
PERMANOVA and pairwise comparison analyses were used to test for differences between groups (* p < 0.05, ** p < 0.01, and *** p < 0.001). The data from the ANI, pan- and core-genome, COG, carbohydrate metabolism, antibiotic resistance gene (ARG), CRISPR-Cas system, and prophage analyses were visualized using R (ggplot2 package). Microsoft PowerPoint and Adobe Illustrator were used to assemble the figures.
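For readers unfamiliar with PERMANOVA, the following is a minimal sketch of a one-way test, written under stated assumptions and not the implementation used in this study. It assumes Bray-Curtis distances between per-genome gene-count profiles (the distance measure is our assumption, and the profiles and group labels are hypothetical toy data). The pseudo-F statistic compares among-group with within-group sums of squared distances, and the p-value is the fraction of label permutations with a pseudo-F at least as large as the observed one.

```python
import random
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative count profiles."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))


def pseudo_f(dist, labels):
    """One-way PERMANOVA pseudo-F from a square distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(dist, labels, n_perm=999, seed=0):
    """Return (pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so that floating-point noise does not break ties.
        if pseudo_f(dist, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy profiles: counts of three gene categories per genome.
profiles = [
    [10, 2, 8], [11, 3, 7], [9, 2, 9], [10, 1, 8],   # labelled "food"
    [3, 9, 2], [2, 10, 3], [4, 8, 2], [3, 9, 1],     # labelled "gut"
]
labels = ["food"] * 4 + ["gut"] * 4
dist = [[bray_curtis(u, v) for v in profiles] for u in profiles]
f_stat, p_value = permanova(dist, labels)
```

A pairwise comparison can be sketched in the same way by restricting `dist` and `labels` to two groups at a time and correcting the resulting p-values for multiple testing.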

Data Deposition
The genomes of 164 L. fermentum strains screened in our lab were sequenced and uploaded to the Sequence Read Archive database in NCBI Data Bank with biosample accession numbers SAMN15891013-SAMN15891179.

Genetic Diversity and Phylogenetic Analysis of 224 L. fermentum Strains
The nucleotide-level genomic similarity between the coding regions of every pair of L. fermentum genomes in this study was greater than 97% (Figure 1A). The similarities, differences, and relationships between the genomes of the 224 L. fermentum strains are presented in the Venn diagram in Figure 1B; 615 genes were shared by the genomes of all L. fermentum strains, and 11-525 unique genes were present in each strain. Pan-genome analysis revealed that the number of pan genes increased sharply as the number of L. fermentum genomes increased, and 20,505 gene families existed in the genomes of the 224 L. fermentum strains. Compared with the pan-genome curve, the core-genome curve declined gradually, and 502 core genes were shared by the genomes of the 224 L. fermentum strains (Figure 1C).
On the phylogenetic tree, the 224 L. fermentum strains were divided into six clades (clades I, II, III, IV, V, and VI) (Figure 2). Of the 60 L. fermentum strains obtained from NCBI, 22 were derived from human fecal samples and 35 were isolated from food sources (Table S1). L. fermentum strains belonging to clades III and IV mostly originated from food sources, while L. fermentum strains isolated from the human gut mainly clustered in clades I, II, and VI. In clade V, half of the strains were from human sources and half from food sources.

Analysis of Clusters of Orthologous Groups (COGs) in L. fermentum Strains
A total of 1434 clusters of orthologous groups were identified in the genomes of the 224 L. fermentum strains using the COG database. Principal coordinates analysis (PCoA) of COGs across the six phylogenetic clades showed that the orthologous groups of proteins in the genomes of clades I and II were more similar to each other, and that those in clade VI were differentiated from those of all other clades (Figure 3A). PERMANOVA and pairwise comparison showed no significant difference between clades III and IV (Figure 3B). Among all COG functional categories, genes categorized as mobilome, prophages, and transposons (functional category X in COGs) varied the most between clades, and the number of these genes in clades III and IV was significantly higher than in any other clade (Figure 3C).
Analysis of the functional categories enriched in L. fermentum genomes may provide new ideas for identifying environmental characteristics or stress. Based on the above results, we observed significant differences between L. fermentum genomes of human and food source. LEfSe analysis of COG categories in the two groups of L. fermentum genomes was then performed, and the numbers of dominant COG functional categories belonging to human-source and food-source L. fermentum were 31 and 74, respectively. Food-source L. fermentum strains contained significantly fewer dominant COG categories than human-source strains (Table S2). Remarkably, some functional genes belonging to the COG category of mobilome, prophages, and transposons were widely shared by food-source L. fermentum strains. Among these, genes annotated as COG2826, COG3328, COG2801, COG0675, COG1943, COG2963, COG3464, COG3436, and COG3293 were all related to transposase and were the most differentially distributed in the food-source L. fermentum genomes. Compared with food-source genomes, human gut-source L. fermentum genomes had significantly more genes annotated as energy production and conversion, and as amino acid transport and metabolism.
Dominant COG categories with an LDA (linear discriminant analysis) score greater than 2.5 were COG1309 (DNA-binding protein), COG1028 (NAD(P)-dependent dehydrogenase), COG0538 (isocitrate dehydrogenase), COG0531 (serine transporter YbeC), COG0716 (flavodoxin), and COG1063 (threonine dehydrogenase or related Zn-dependent dehydrogenase). Overall, compared with food-source L. fermentum strains, human-source L. fermentum genomes contained significantly more dominant COG categories, such as functional categories C, E, G, K, L, and R.

Figure 3 (caption, partial). [...] L. fermentum strains in 6 phylogenetic clades; (C) the violin plots show the number of genes annotated with diverse COG functional categories in L. fermentum in phylogenetic clades I, II, III, IV, V, and VI; asterisks indicate a significant difference in functional category X between groups (** p < 0.01, *** p < 0.001).

Identification of Carbohydrate Metabolism in L. fermentum Strains
CAZyme families included in the genomes of L. fermentum strains were glycoside hydrolases (GHs), glycosyltransferases (GTs), carbohydrate esterases (CEs), and carbohydrate-binding modules (CBMs). Among these, glycosyltransferase family 2, glycosyltransferase family 4, glycoside hydrolase family 73, and carbohydrate-binding module family 50 were the major families in the genomes of both groups of L. fermentum strains. The CAZyme families of food-source and human feces-source L. fermentum strains were compared by PERMANOVA and pairwise comparison, and the results showed that the number of CAZyme genes in the food-source group was significantly higher than that in the human gut-source group. A relatively large number of glycoside hydrolases not yet assigned to a family (GH0) were found in human feces-source L. fermentum strains (Figure 4A). To determine which families differed significantly between the two L. fermentum groups, LEfSe analysis with a Kruskal-Wallis test was used. Of 38 CAZyme families, 22 were significantly different between the human and food sources. Glycoside hydrolase family 3, glycoside hydrolase family 13_20, and glycoside hydrolase family 13_29 were common in human-derived L. fermentum but rare in food-derived strains (Figure 4B), while glycosyltransferase family 2, glycoside hydrolase family 43_11, and glycoside hydrolase family 68 were dominant in food-source L. fermentum strains.
Based on the non-redundant protein sequence database (NR), the enzymes involved in carbohydrate metabolic pathways are presented in Figure 5A. Genes encoding enzymes involved in the metabolism of L-arabinose, D-galactose, D-glucose, D-ribose, D-mannose, maltose, melibiose, manninotriose, sucrose, stachyose, lactose, and raffinose were present in almost all L. fermentum strains (more than 220). Genes encoding enzymes

Characteristic of Antibiotic Resistance Genes in L. fermentum Strains
The genomes of the 224 L. fermentum strains were annotated using the Comprehensive Antibiotic Resistance Database (CARD), and a total of 58 antibiotic resistance gene categories were found. Based on LEfSe analysis with a Kruskal-Wallis test, 19 antibiotic resistance gene families differed significantly between sources (Figure 6). Of note, the antibiotic resistance gene families otr(A) (tetracycline antibiotic) and tetA(46) (tetracycline antibiotic) were found almost exclusively in food-source L. fermentum strains. Genes belonging to the CARD categories lmrB (lincosamide antibiotic), poxtA (tetracycline antibiotic, phenicol antibiotic, and oxazolidinone antibiotic), and efrB (fluoroquinolone antibiotic, rifamycin antibiotic, macrolide antibiotic) were also more abundant in food-source L. fermentum strains. In human-derived L. fermentum strains, the antibiotic resistance gene families pmrA (fluoroquinolone antibiotic), bcrA (peptide antibiotic), arlR (fluoroquinolone antibiotic), vanRF (glycopeptide antibiotic), and mdtG (phosphonic acid antibiotic) were all more abundant.
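The Kruskal-Wallis test is the first screening step of a LEfSe-style analysis. The following minimal sketch shows that step only (it does not reproduce LEfSe's subsequent Wilcoxon and LDA stages), restricted to the two-group case so that the p-value can be obtained from the chi-square distribution with one degree of freedom using the standard library. The gene counts are hypothetical toy values, not data from this study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # positions i..j share this 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis_two_groups(x, y):
    """Tie-corrected Kruskal-Wallis H and p-value for two groups (df = 1).

    Assumes the pooled values are not all identical (otherwise the tie
    correction is zero and H is undefined).
    """
    values = list(x) + list(y)
    n = len(values)
    ranks, tie_sizes = average_ranks(values)
    rank_sums = [sum(ranks[:len(x)]), sum(ranks[len(x):])]
    sizes = [len(x), len(y)]
    h = 12 / (n * (n + 1)) * sum(r * r / m for r, m in zip(rank_sums, sizes)) - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    h = max(h / correction, 0.0)
    # Survival function of chi-square with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2))
    return h, p


# Hypothetical copy numbers of one resistance gene family per genome.
food_counts = [3, 4, 2, 5, 4]
gut_counts = [0, 1, 0, 1, 2]
h_stat, p_value = kruskal_wallis_two_groups(food_counts, gut_counts)
```

In practice, one such test would be run per gene family, and only families passing the significance threshold would be carried forward to effect-size estimation.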

Figure 6. Heatmap of the number of differential antibiotic resistance genes annotated with the Comprehensive Antibiotic Resistance Database in the genomes of L. fermentum from human gut and food. (Antibiotic resistance genes with an LDA score greater than 2 in the linear discriminant analysis effect size analysis are listed.)

Identification of CRISPR-Cas Systems in L. fermentum Strains
CRISPRs and cas genes in the genomes of the 224 L. fermentum strains were analyzed. The genomes of 210 L. fermentum strains contained at least one CRISPR, and the genomes of 159 L. fermentum strains included cas genes (Table S3). Five CRISPR subtypes (Types IE, IIA, IIC, IIIA, and IC) were identified in the 224 L. fermentum strains; class 1 Type IE was the most abundant subtype, followed by class 2 Type IIA (Table S3). Except for class 2 Type IIC, the abundances of CRISPR Types IE, IIA, IIIA, and IC were all higher in food-source L. fermentum than in human gut-source L. fermentum. Remarkably, food-source L. fermentum strains had significantly more CRISPR class 2 Type IIA and class 1 Type IIIA systems, almost 2.5 to 4 times as many as L. fermentum derived from the human gut (Figure 7). Phylogenetic analysis of Cas1 and Cas2 (differing by CRISPR subtype) showed that the distribution of Cas1 and Cas2 genes in L. fermentum varied independently of strain origin (Figure S1).
Spacers are small fragments of foreign DNA that bacteria incorporate into their own CRISPR loci to protect against invasion by foreign genetic elements. On the phylogenetic tree, the spacers of L. fermentum clustered into nearly 50 phylogenetic groups (Figure 8A). Distinct spacer sequences of L. fermentum are color-coded in the branches of the phylogenetic tree, and more abundant spacer sequences were contained in L. fermentum of human gut source. Some spacers were found only in human-source L. fermentum strains. The spacer abundance of the 224 L. fermentum strains was analyzed using PERMANOVA and pairwise comparison, and the number of spacers differed significantly between the two source groups. The number of spacers in food-source L. fermentum was significantly higher than that in human gut-source L. fermentum (Figure 8B).
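The relationship between repeats and spacers, and the idea of source-specific spacers, can be illustrated with a minimal sketch. It assumes perfectly conserved direct repeats and uses hypothetical sequences and strain names; dedicated CRISPR-detection tools tolerate mismatches in the repeats, which this sketch does not.

```python
def extract_spacers(array_seq, repeat):
    """Return the sequences lying between consecutive copies of the repeat.

    The first and last pieces are the flanking leader and trailer regions
    and are therefore discarded.
    """
    parts = array_seq.split(repeat)
    return [part for part in parts[1:-1] if part]


# Hypothetical direct repeat and CRISPR arrays (not real L. fermentum data).
REPEAT = "GTTTTAGAGC"
arrays = {
    "food_1": "AAT" + REPEAT + "ACGTACGTAC" + REPEAT + "TTGGCCAATT"
              + REPEAT + "GGATCCGATA" + REPEAT + "CC",
    "gut_1": "AAT" + REPEAT + "ACGTACGTAC" + REPEAT + "CATGCATGCA"
             + REPEAT + "CC",
}

spacers = {name: extract_spacers(seq, REPEAT) for name, seq in arrays.items()}
spacer_counts = {name: len(found) for name, found in spacers.items()}

# Spacers found in one source but not the other.
food_only = set(spacers["food_1"]) - set(spacers["gut_1"])
gut_only = set(spacers["gut_1"]) - set(spacers["food_1"])
shared = set(spacers["food_1"]) & set(spacers["gut_1"])
```

Per-strain values such as `spacer_counts` are the kind of quantity compared between source groups, whereas sets such as `gut_only` correspond to spacers carried exclusively by strains of one source.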

Identification of Prophages in L. fermentum Strains
The number of prophages in L. fermentum strains predicted to be "intact" by PHASTER is shown in Figure 9. PHAGE_Lactob_LfeSau and PHAGE_Lactob_LF1 were the most abundant prophages across all L. fermentum strains, and food-source L. fermentum contained more PHAGE_Staphy_SPbeta_like. Furthermore, less common prophages such as PHAGE_Paenib_Xenia, PHAGE_Lactob_phiPYB5, and PHAGE_Lactob_phig1e were distributed sporadically in L. fermentum of both sources. PHAGE_Lactob_JCL1032 and PHAGE_Lactob_521B were found only in human gut-source L. fermentum strains.

Discussion
Research has shown that Lactobacillus species populate nutrient-rich habitats, such as fermented plant matter and animals (both vertebrates and invertebrates, including humans) [6]. It is generally believed that microbes constantly evolve through gene variants and horizontal gene transfer between distinct microbes to face a range of selective pressures in a variety of ecological environments [38]. In this study, the average nucleotide identity between the 224 L. fermentum strains was more than 97%, while the Venn diagram showed that the maximum number of strain-specific genes of L. fermentum reached 525 (Figure 1). Pan- and core-genome analyses also showed that as the number of L. fermentum strains increased, the number of pan genes increased and the number of core genes continued to shrink. Research has also shown that lactobacilli in distinct habitats can evolve with their environment and generate unique genes [14]. We speculated that if the number of sources of L. fermentum increased, the curves of pan and core genes would become steeper. Good et al. showed that molecular evolution in Escherichia coli was dynamic, driven by the accumulation of mutations, and constantly created new genetic opportunities for adaptation of strains [39]. This may also explain the growing number of pan genes and the numerous unique genes in the genomes of L. fermentum strains.
Phylogenetic analysis of L. fermentum strains derived from the human gut and food (such as yogurt, dairy, sourdough, kimchi, fermented plant material, and fermented meat) was conducted in this study, and the 224 L. fermentum strains mainly clustered into six clades (Figure 2). Most of the L. fermentum strains isolated from the human gut clustered in clade II, while the rest were mainly found in clades I, V, and VI. Research over the past few decades has clarified that symbiotic microbes and their metabolites (SCFAs, endotoxins, peptidoglycans, and polysaccharide antigens) play a crucial role in defending against pathogen colonization, in host physiology (immunoregulation), and in metabolism, which is widely believed to be a result of coevolution [40]. Many factors, such as exposure to xenobiotics and host diet [41], may cause the host to exert unique selective pressures on its gut microbiota [42]. Filannino et al. showed that lactic acid bacteria in plant foods participated in a series of reactions (fatty acid metabolism, carbon metabolism, nitrogen metabolism, and phenolic metabolism) through specific bacterial enzymes (such as linoleate isomerase, fatty acid hydratases, mannitol dehydrogenase, reductase, and amine dehydrogenase), and that the fermentation process relies on the rapid adaptation and metabolic capability of Lactobacillus with the available nutrients [43]. Since the ecological environments of the human intestinal tract and food are distinct, the phylogenetic analysis in our study may illustrate the niche-specific adaptation of L. fermentum strains to different habitats. Batstone et al. explored whether the host could actively choose more cooperative microbial strains through a cross-inoculation experiment, and the results showed that rhizobia rapidly adapted and gave preference to their original legume genotype, evolved to be more beneficial, and that the process was not affected by host selection [44]. It is possible that the separation of L. fermentum strains from different sources is also the result of long-term coevolution between L. fermentum and its sources.
We acknowledge that the number of food L. fermentum strains was lower than that of human feces strains. Based on current studies, we think that the impact of the unequal number of strains from different origins on our results was limited. Verce et al. showed that 28 L. fermentum strains isolated from mammal tissues, milk, and plant material fermentations clustered into five clades, and that the clustering was independent of their sources [45]. Another study by Oh et al. indicated that the evolution of L. reuteri lineages was adaptive for the different host species, although the sample numbers from the different hosts were unequal (humans (n = 35), mice (n = 35), rats (n = 26), pigs (n = 41), chickens (n = 26), and turkeys (n = 5)) [46]. Although phylogenetic analysis revealed that L. fermentum strains isolated from the human gut and food clustered separately, clade V contained L. fermentum strains from both sources; additionally, the source-specific clusters (I, II, VI, human source; III, IV, food source) contained some strains originating from the other source. Pennisi reported that the widest range of microbes was found in soil and free-living samples, followed by plants, algae, and carnivores, and that microbes could spread across hosts and habitats [47]. Pasolli et al. analyzed the relationship between 666 food-source microbiomes and 154,723 human sample microbiomes and speculated that food was the main source of lactic acid bacteria in the human gut [48]. Food and the human gut could be regarded as open systems, and some L. fermentum strains might have been recently introduced and be transient in their current environment. This may explain why some L. fermentum strains were found in clades dominated by the other source.
The COG database, initially created in 1997, has undergone a series of updates, currently includes complete genomes of 122 archaea and 1187 bacteria, and is a popular tool for the annotation of functional proteins [49]. The 224 L. fermentum genomes contained an average of about 2000 coding sequences each, and approximately 1400 COGs were annotated in these genomes (Table S1). PCoA, PERMANOVA, and pairwise analysis of COGs in L. fermentum strains showed that significant differences existed between the food-source and human-source clades, which presumably reflects the relationship between phylogenetic clade and source. Genes involved in mobilomes, prophages, and transposons (functional category X in COGs) were significantly more numerous in the food-source clades (III and IV). Carr et al. showed that mobile genetic elements often move via horizontal gene transfer within a community [50], and studies have also shown that microbes in plants are more diverse than those in the human gut [47]. Wibowo et al. analyzed the microbial genomes from palaeofaeces samples and present-day human gut samples and indicated that mobile genetic elements in human gut microbiomes decreased over time [51].
Since large differences existed between human gut-source and food-source L. fermentum in both phylogeny and homologous genes, we then focused on the specific differential genes associated with each environment. L. fermentum strains derived from the human gut contained significantly more dominant COG categories, and these main COG classes were related to various functions including energy production and conversion (C), amino acid transport and metabolism (E), carbohydrate transport and metabolism (G), transcription (K), replication, recombination, and repair (L), and general function prediction only (R) (Table S2). Hao Luo analyzed the Ka/Ks ratio of genes in COG functional categories and found that genes in functional categories G, H, I, J, K, and L were more evolutionarily conserved and more essential in coping with strong selective pressure [52]. Perhaps the genome-scale differences in L. fermentum were due to individual evolution in the host gut niche, reflecting the specific host physiology or dietary habits [53]. Food-source L. fermentum had more genes encoding transposase. We speculated that the microbiota in food was more complex than that in the human gut, and that L. fermentum strains from food sources were more easily exposed to mobile genetic elements.
Unique metabolic capacity is highly associated with the adaptation of microorganisms to their specific niche [54]. CAZy analysis showed that a total of 38 carbohydrate active enzyme families existed in 224 L. fermentum strains and the distribution of these enzymes