Comparative Genomics Analysis of Lactobacillus ruminis from Different Niches

Lactobacillus ruminis is a commensal, motile lactic acid bacterium that lives in the intestinal tract of humans and animals. Although a few L. ruminis genomes have been published, most of them are animal derived. To explore the genetic diversity and potential niche-specific adaptations of L. ruminis, draft genomes of 81 L. ruminis strains isolated from humans, bovines, piglets, and other animals were sequenced in the current work, and a comparative genomic analysis was performed. The average genome size and GC content of L. ruminis were 2.16 Mb and 43.65%, respectively. Both the origin and the sampling distance of the strains had a strong influence on their phylogenetic relationships. The human-derived L. ruminis strains were more consistent in their utilization of carbon sources than the animal-derived strains. L. ruminis appeared to increase its competitiveness within its niches mainly by producing class II bacteriocins. The clustered regularly interspaced short palindromic repeats/CRISPR-associated (CRISPR/Cas) systems present in L. ruminis were mainly of subtype II-A. The diversity of the CRISPR/Cas loci arose from high variability in the number and sequence of spacers, whereas the Cas1 protein was relatively conserved. The genetic differences among the newly sequenced L. ruminis strains highlight gene gains and losses attributable to niche adaptation.


Introduction
Lactobacillus ruminis is a lactic acid bacterium that is phylogenetically close to Lactobacillus salivarius [1] and is strictly anaerobic, with a low GC content [2]. L. ruminis was first isolated from human feces in 1960, when it was originally identified as Catenabacterium catenaforme [3], and was subsequently isolated from bovine rumen [4]. L. ruminis can trigger certain protective responses in humans and animals when given as a probiotic or symbiotic supplement [5]. The niches of L. ruminis are variable,

Strains, Ethics Approval Statement, Culturing, Genome Sequencing, and Data Assembly
This study was approved by the Ethics Committee of Jiangnan University, China (SYXK 2012-0002). All fecal samples from healthy persons were collected for public health purposes, and these were the only human materials used in the present study. Written informed consent for the use of fecal samples was obtained from the participants or their legal guardians. All participants completed health questionnaires before sampling, and no human experiments were involved. The collection of fecal samples carried no predictable risk of harm or discomfort to the participants, and the sampling of homemade fermented food and domestic animals was consented to by the owners.
Eighty-one strains of L. ruminis, previously isolated from animal and human feces from different regions, were cultured in De Man, Rogosa and Sharpe (MRS) medium [20] and incubated in an anaerobic workstation for 24 h. The draft genomes of all the strains were sequenced on an Illumina HiSeq X Ten platform (Illumina, San Diego, CA, USA) at a coverage depth of no less than 100×. The reads were assembled with SOAPdenovo [21]. Ten publicly available genomes from the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/), listed in Table S1, were used for comparison.

Pan-Genome and Core-Genome Analysis
PGAP v1.2.1 was used to calculate the pan-genome and core-genome [26]. The protein sequences extracted from the 91 strains were clustered with OrthoMCL (minimum identity of 50%; E-value cutoff of 1e-4), and the results were used to draw a Venn diagram [27].
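The relationship between ortholog clusters and the pan-genome, core genome, and strain-specific genes can be illustrated with a minimal Python sketch. It is not the PGAP or OrthoMCL implementation. The input format (a mapping from cluster ID to the set of strains carrying a member gene), the function name `pan_and_core`, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple


def pan_and_core(
    clusters: Dict[str, Set[str]], n_strains: int
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split ortholog clusters into pan, core, and strain-specific sets.

    clusters maps a (hypothetical) cluster ID to the set of strain names
    that carry at least one gene of that cluster.
    """
    # Pan-genome: every cluster seen in at least one strain.
    pan = set(clusters)
    # Core genome: clusters present in all strains.
    core = {cid for cid, strains in clusters.items() if len(strains) == n_strains}
    # Strain-specific (unique) genes: clusters present in exactly one strain.
    unique = {cid for cid, strains in clusters.items() if len(strains) == 1}
    return pan, core, unique


# Toy example with three hypothetical strains.
toy_clusters = {
    "og1": {"s1", "s2", "s3"},
    "og2": {"s1", "s2"},
    "og3": {"s3"},
}
pan, core, unique = pan_and_core(toy_clusters, n_strains=3)
```

In a real analysis, the cluster table would be parsed from the clustering output rather than typed by hand.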

Genotype/Phenotype Association Applied to Carbohydrate Metabolism
The enzymes and genes involved in carbohydrate metabolism were annotated using the Carbohydrate-Active Enzymes database (CAZy, http://www.cazy.org/) [33], and the strains were clustered and visualized using HemI software [34].
Twenty-seven different sugars were selected for the carbohydrate utilization analysis of L. ruminis. A stock solution of each carbohydrate was filtered (0.22 µm) into carbohydrate-free MRS (cfMRS), and bromocresol purple was added as a pH indicator, to give a final fermentation concentration of 1%. Each strain was activated twice to ensure its activity and was then inoculated into the medium at 1%. After anaerobic incubation for 24 h at 37 °C, utilization was judged by observing the color of the medium. The test was performed three times in duplicate on different occasions.

Bacteriocin Prediction
BAGEL4 is a web server that helps mine and visualize ribosomally synthesized and post-translationally modified peptides and bacteriocin-producing gene clusters in prokaryotic genomes (http://bagel4.molgenrug.nl/index.php) [35]. DNA nucleotide sequences were used as the input files for the BAGEL4 web server. The conservation of RNA secondary structure was predicted by WebLogo (https://weblogo.berkeley.edu/logo.cgi) [36].

CRISPR Identification
CRISPRFinder (https://crisprcas.i2bc.paris-saclay.fr/CrisprCasFinder/Index) was used to discover CRISPR loci in the L. ruminis genomes and to predict CRISPR repeats and spacers [37]. The secondary structure of the repeats was predicted with RNAfold (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi) [38]. Phylogenetic analyses were carried out on the amino acid sequences of the Cas1 protein and the nucleotide sequences of the CRISPR repeats. The phylogenetic trees were constructed by the Neighbor-Joining method with the p-distance model and 1000 bootstrap replicates in Molecular Evolutionary Genetics Analysis (MEGA) 7.0 software (Sudhir Kumar, Philadelphia, PA, USA). The distribution of spacer numbers was plotted using GraphPad Prism 6 software (GraphPad Software Inc., San Diego, CA, USA).
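The p-distance used for the Neighbor-Joining trees is simply the proportion of aligned sites at which two sequences differ. The following Python sketch illustrates the calculation for one pair of aligned sequences. It is not MEGA's implementation. The function name `p_distance` and the choice to skip sites where either sequence has a gap (pairwise deletion) are assumptions, since the gap treatment is not stated in the text.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites with a gap ('-') in either sequence are skipped; this pairwise
    deletion is an assumption of the sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped sites
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy aligned repeats (hypothetical sequences): one mismatch in four sites.
d = p_distance("ACGT", "ACGA")
```

Applying the function to every pair of aligned sequences yields the distance matrix from which a Neighbor-Joining tree is built.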

Prophage Identification
PHASTER (http://phaster.ca/) was used to identify prophages and to assess their completeness [39]. The relationship between the number of spacer sequences and the number of prophages was plotted using GraphPad Prism 6 software.

Genomic Characterization of L. ruminis
In previous work in our lab, 81 strains of L. ruminis were isolated from human feces (75 strains), piglets (three strains), dogs (two strains), and cows (one strain) sampled in fifteen cities in China (Table S1). Following whole-genome sequencing, the draft genomes of these 81 strains, together with ten publicly available L. ruminis genomes (ATCC 27782 [40], ATCC 25644 [40], DSM 20403 [41], DPC 6830 [1], DPC 6832 [1], SPM0211 [42], S23 [1], bz_0080 [43], ICIS-540 and TF10-9AT) from the NCBI GenBank database, were analyzed. The 91 L. ruminis strains had an average genome size of 2.16 Mb, an average of 2310 genes, and an average GC content of 43.65%. Genome size ranged from 1.94 Mb (S23) to 2.4 Mb (ICIS-540), and the animal-derived strains had smaller genomes than the human isolates. Similarly, the gene number of the animal-derived strains was lower than that of the human-derived strains. There was no obvious relationship between GC content and source host.
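Genome size and GC content of the kind reported above can be derived directly from an assembly. The sketch below is a minimal illustration in Python and is not the pipeline used in the study. It assumes the draft genome is available as FASTA text; the function name `genome_stats` and the toy contigs are hypothetical. Ambiguous bases such as N count toward genome size but are excluded from the GC denominator, which is a design choice of the sketch.

```python
from typing import Tuple


def genome_stats(fasta_text: str) -> Tuple[int, float]:
    """Return (total length in bp, GC content in percent) for FASTA text."""
    # Concatenate all sequence lines, skipping headers and blank lines.
    sequence = "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")
    ).upper()
    size = len(sequence)
    gc = sequence.count("G") + sequence.count("C")
    acgt = gc + sequence.count("A") + sequence.count("T")
    gc_percent = (100.0 * gc / acgt) if acgt else 0.0
    return size, gc_percent


# Toy draft genome with two hypothetical contigs.
toy_fasta = ">contig1\nGGCC\n>contig2\nAATT\n"
size, gc_percent = genome_stats(toy_fasta)
```

Averaging these two values over all strains gives summary figures comparable to those quoted in the text.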

Pan-Genome and Core-Genome of L. ruminis
A pan-genome analysis was performed to determine the total number of different genes present in the L. ruminis genomes, and the pan-genome curve displayed an asymptotic trend. The number of new genes added gradually decreased from 466 at the beginning to 50 for the last group (Figure 1a). The mathematical function of the pan-genome displayed above the graph had an exponent of less than 0.5, indicating that the pan-genome was closing. Based on the equation and the number of genomes involved, the core genome of L. ruminis harbors 1188 genes (Figure 1a). The Venn diagram shows the strain-specific genes and the homologous core genes among all 91 L. ruminis strains: 1166 genes were shared by all the strains assayed, while the number of unique genes per strain ranged from 3 to 565 (Figure 1b).
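The exponent of a pan-genome curve can be estimated by fitting a power law to the cumulative gene counts. The fitted function itself is not reproduced in the text, so the sketch below assumes the simple form P(n) = κ·n^γ and fits it by least squares on log-transformed values. This may differ from the exact function reported by PGAP. The function name `fit_power_law` and the synthetic data are hypothetical.

```python
import math
from typing import Iterable, Tuple


def fit_power_law(n_genomes: Iterable[float], pan_sizes: Iterable[float]) -> Tuple[float, float]:
    """Fit pan_size = kappa * n ** gamma by linear regression in log-log space."""
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(p) for p in pan_sizes]
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx                            # slope = exponent
    kappa = math.exp(mean_y - gamma * mean_x)    # intercept = log(kappa)
    return kappa, gamma


# Synthetic curve (not data from the study) generated with a known exponent.
genome_counts = list(range(1, 21))
synthetic_pan = [1500 * n ** 0.3 for n in genome_counts]
kappa, gamma = fit_power_law(genome_counts, synthetic_pan)
```

A small fitted exponent means that each additional genome contributes few new genes, which is the behaviour described for the L. ruminis curve.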

Phylogenetic Analyses of L. ruminis
To analyze the phylogenetic relationships of the L. ruminis strains, a phylogenetic tree was built from the orthologous genes of the 91 genomes that constituted the core genome (Figure 2a). The resulting tree divided the strains into five clades (clades A to E). Clade E consisted of five strains, of which one was isolated from a horse (DPC 6832) and the other four from piglets (DPC 6830, FYNLJ94L3, FYNLJ99L1, and FYNLJ111L2). Clade D included three strains, all of which were bovine derived. Clades A to C gathered 83 L. ruminis strains, of which two were isolated from dogs, one from milk, and the remainder from human feces. The strains from different cities were divided into three regions separated by relative distances of more than 1000 km (Figure 2b). The strains from region A were mainly concentrated in clade A, most of the strains from region B clustered in clade B, and most strains in clade C were isolated from region C.

ANI Values of L. ruminis
The average nucleotide identity (ANI) is a classical measure for delineating species or potential subspecies among strains. Pairwise ANI values of the 91 genomes were calculated and compared against the 95% threshold to further confirm their species assignment [44]. The results showed that all 91 strains belonged to L. ruminis and that there were no potential subspecies of L. ruminis (Figure 3). The ANI values between L. ruminis strains from different sources were lower than those between strains from the same source.
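The two observations above (all pairs meeting the 95% threshold, and lower ANI between sources than within a source) can be checked from a table of pairwise ANI values. The sketch below shows one possible way to do so in Python. It does not compute ANI itself. The function name `summarize_ani`, the input layout, and the toy values are hypothetical and are not data from the study.

```python
from statistics import mean
from typing import Dict, Optional, Tuple


def summarize_ani(
    ani: Dict[Tuple[str, str], float],
    source: Dict[str, str],
    threshold: float = 95.0,
) -> Dict[str, Optional[object]]:
    """Summarize pairwise ANI values against a species threshold.

    ani maps a pair of strain names to their ANI (percent);
    source maps each strain name to its isolation source.
    """
    within, between = [], []
    for (a, b), value in ani.items():
        # Separate same-source pairs from different-source pairs.
        (within if source[a] == source[b] else between).append(value)
    return {
        "all_same_species": all(v >= threshold for v in ani.values()),
        "mean_within": mean(within) if within else None,
        "mean_between": mean(between) if between else None,
    }


# Toy values for three hypothetical strains.
toy_source = {"h1": "human", "h2": "human", "p1": "piglet"}
toy_ani = {("h1", "h2"): 98.5, ("h1", "p1"): 96.0, ("h2", "p1"): 96.4}
summary = summarize_ani(toy_ani, toy_source)
```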

Genes 2020, 10, x FOR PEER REVIEW 6 of 18
The glycosyl hydrolases (GHs) in each genome were predicted computationally using the CAZy database to evaluate the carbohydrate fermentation genotype of L. ruminis (Figure 4b). This analysis identified seventeen GH families, with a predominance of genes encoding GHs of the GH1, GH13, and GH109 families, whose members include β-glucosidases (GH1), α-glucosidases (GH13), and α-N-acetylgalactosaminidases (GH109). The porcine-derived and bovine-derived strains lacked the GH2 and GH42 families, indicating that they could not synthesize β-galactosidase. In addition, the human-derived strains encoded a higher number of GH families than any of the animal-derived strains (Figure 4b).
A genotype and phenotype association analysis was performed for carbohydrate utilization. Four putative carbohydrate utilization operons were annotated in L. ruminis (Figure 5). The utilization of lactose was related to the β-galactosidases of the GH2 and GH42 families, and this enzyme activity was predicted to be encoded by the lacZ gene. The lactose operon of the porcine-derived and bovine-derived strains did not encode the lacZ gene, consistent with the absence of the GH2 and GH42 families in those strains, which led to their inability to utilize lactose (Figure 5A). The sucrose operon was predicted to be responsible for the transport of sucrose and the hydrolysis of sucrose-6-phosphate. In L. ruminis it appeared to involve the β-fructofuranosidase and sucrose-6-phosphate hydrolase of the GH32 family, both of which were encoded by sacA in the sucrose operon. The majority of L. ruminis strains, such as FHNXY44L3, contained a complete sucrose operon. The porcine-derived strain FYNLJ111L2, however, could not utilize sucrose, which was predicted to result from the overlap and surplus of phosphotransferase system (PTS) transporter genes (scrA) causing termination of transcription (Figure 5B). β-Fructofuranosidase has been identified as the key enzyme involved in FOS utilization in other Lactobacillus species. The FOS operon was composed of a LacI family transcriptional regulator, a β-fructofuranosidase, and a major facilitator superfamily (MFS) transporter. The porcine-derived strains could not use FOS because they lacked the sacA genes encoding β-fructofuranosidase (Figure 5C). Additionally, the utilization of raffinose was mainly governed by an active transport system (permease) and α-galactosidase. Insertion of a transposase in the raffinose operon of L. ruminis may affect its normal transcription, leading to the inability to utilize raffinose (Figure 5D).

Prediction of Bacteriocin Operons in L. ruminis
BAGEL was used to identify potential bacteriocin operons in the L. ruminis strains. In this study, 51 of the 91 L. ruminis strains were predicted to produce bacteriocins (Table S2). Analysis of the putative bacteriocins indicated that class II bacteriocins were the most common, being present in 47 L. ruminis strains, followed by class I bacteriocins, which were present in thirteen strains; class III bacteriocins were absent from all 91 genomes. L. ruminis was predicted to synthesize five different bacteriocins, namely sactipeptides (class I) and plantaricin 423, leucocin, coagulin A, and hiracin JM79 (class II). Most of the strains with a sactipeptide operon were located in cluster D of the phylogenetic tree, while the strains with plantaricin 423-encoding genes were in cluster C. Additionally, most of the animal-derived strains in clusters A and B of the tree had the genes encoding coagulin A (Table S2).
The four class II bacteriocins showed high sequence conservation, and their precursor peptides included a conserved functional domain (YGNGVXCXXXXCXVXWXXA) that assigns them to the class IIa bacteriocins (Figure 6). The biosynthetic gene clusters for class IIa bacteriocins were analyzed in L. ruminis (Figure 7). The class IIa bacteriocin gene cluster generally contained three types of genes: a structural prebacteriocin gene encoding the core peptide, an immunity gene encoding an immunity protein, and transporter genes. When BAGEL was used to analyze sactipeptides, the gene cluster contained only a structural gene encoding a putative cysteine-containing peptide associated with the synthesis of sactipeptides. However, the specific ATP-binding cassette (ABC) transporters were never present in the sactipeptide gene cluster (Figure 7E). Therefore, the gene cluster encoding sactipeptides was incomplete and was not considered a potential bacteriocin operon.
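The conserved class IIa domain quoted above can be searched for in a peptide sequence with a regular expression. The Python sketch below is an illustration only and is not how BAGEL detects bacteriocins. It assumes that X in the motif stands for any single amino acid residue; the function name `motif_to_regex` and the toy peptide are hypothetical.

```python
import re

CLASS_IIA_MOTIF = "YGNGVXCXXXXCXVXWXXA"  # domain as written in the text


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Convert a motif with X wildcards into a compiled regular expression.

    X is assumed to match any single uppercase residue letter.
    """
    return re.compile("".join("[A-Z]" if c == "X" else re.escape(c) for c in motif))


pattern = motif_to_regex(CLASS_IIA_MOTIF)

# Hypothetical toy peptide containing the motif after a three-residue prefix.
toy_peptide = "MKKYGNGVTCGKHSCSVDWGKAT"
hit = pattern.search(toy_peptide)
```

Here `hit.start()` gives the zero-based position of the motif in the peptide, and `pattern.search` returns `None` for sequences that lack the domain.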

Prediction of CRISPR/Cas Systems in L. ruminis
The CRISPR/Cas systems in the 91 L. ruminis genomes were investigated with CRISPRFinder, and a total of 59 CRISPR loci were identified in 49 of the 91 genomes (54%) (Table S3). In some cases, CRISPRFinder detected no adjacent cas gene in the CRISPR region; these regions were considered invalid and were excluded from the subsequent genetic analysis. Of the 49 genomes containing CRISPR, ten carried more than one CRISPR locus. Subtypes I-B, I-C, I-E, II-A, and III-A of the CRISPR/Cas system were identified; type II was the most abundant, with II-A being the dominant subtype. Only L. ruminis DPC6832 contained a subtype I-B CRISPR/Cas system.
The number of spacer sequences in the CRISPR loci of the different subtypes was analyzed (Figure S1). The number in subtype II-A loci varied greatly, from 5 to 66, with an average of about 24. Subtype I-C loci had the fewest spacer sequences. Repeat sequences from the same type of CRISPR/Cas system showed high homology in the phylogenetic tree (Figure 8a). The repeat sequences were explored further by predicting their secondary structures (Figure 9). The repeats of subtypes I-B, I-C, I-E, and III-A identified in L. ruminis differed by only a few base pairs within each subtype and had largely the same frequency and secondary structure. Therefore, only one secondary structure was predicted for the four subtypes. Repeat sequences from subtype II-A were more variable, with a total of three structures. The RNA secondary structures of the repeat sequences of the CRISPR/Cas systems in the L. ruminis genomes showed that both ends of the repeat contained a large loop and a small loop (Figure 9D-F), which is a typical stable stem-loop structure. Among these subtypes, I-E and II-A contained G:U base pairs, which are characteristic of conserved RNA secondary structures (Figure 9C-F). Mapping the CRISPR/Cas loci onto the Cas1 tree demonstrated considerable agreement between the phylogeny of Cas1 and the locus types and subtypes (Figure 8b). The cas1 gene of subtype I-E was strictly monophyletic, while the cas1 genes of subtypes I-C, II-A, and III-A were largely monophyletic, with a few exceptions.

Prediction of Prophage in L. ruminis
The prophages in the L. ruminis strains were predicted by PHASTER, and the results are listed in Table S4. The prophages were predicted to be 'intact', 'incomplete', or 'questionable'. The labels 'incomplete' and 'questionable' describe regions whose CDSs are associated with a prophage gene cluster but that do not constitute a properly defined prophage. Fifty-five intact prophages were identified in 40 of the 91 L. ruminis genomes (44%) (Table S4). Among them, 28 L. ruminis strains carried only one intact prophage, nine strains carried two prophages, and three strains carried three prophages. Interestingly, some prophage gene clusters encoding structural and lysis components were identified as questionable or incomplete. In addition, the number of spacer sequences was found to be negatively correlated with the number of prophages in L. ruminis strains (Figure 10).
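A negative relationship between spacer number and prophage number can be quantified with a correlation coefficient. The text does not state which statistic was used, so the Python sketch below assumes the Pearson coefficient. The function name `pearson_r` and the per-strain counts are hypothetical and are not data from the study.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-strain counts: spacers in CRISPR loci and intact prophages.
toy_spacers = [5, 12, 24, 40, 66]
toy_prophages = [3, 3, 2, 1, 0]
r = pearson_r(toy_spacers, toy_prophages)
```

A value of r below zero indicates that strains with more spacers tend to carry fewer prophages, as described for Figure 10.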

Figure 9. Secondary structure of repeats in L. ruminis.

corresponding to dietary diversity of human hosts. To some extent, these differences reflected not only the genomic variety of L. ruminis, but also niche adaptation through the acquisition or loss of metabolically related genes [56].
Bacteriocins are small antimicrobial peptides produced by many bacteria, including Lactobacillus, and may display either a narrow spectrum against closely related species or a broad spectrum against species of different genera [57]. To date, the known bacteriocins produced by L. ruminis belong to class IIa; for example, L. ruminis ATCC 27782 produces a class II pediocin-like bacteriocin [17]. Sactipeptides are sulphur-to-α-carbon-containing peptides, known as sactibiotics when they show antibacterial activity [58]. Notably, sactibiotic gene clusters commonly comprise immunity proteins, structural genes, transporters, and S-adenosylmethionine enzymes that include a classic conserved domain [59]. In addition, gene clusters containing a precursor peptide with one or more cysteine residues and an ABC-type bacteriocin transport system are considered potential biosynthetic gene clusters encoding sactipeptides [16]. In the current study, the sactipeptide cluster of class I bacteriocins identified by BAGEL lacked a transporter and was not considered a credible bacteriocin operon. No sactipeptide from a lactic acid bacterium has yet been characterized, and only presumed bacteriocin clusters have been identified via in silico analysis [60]; this needs further investigation.
This study examined the variety and distribution of CRISPR/Cas loci in 91 strains of L. ruminis, in which the number of spacer sequences reflected the activity of the CRISPR system. An active CRISPR/Cas system has been shown to acquire spacer sequences continuously. Conversely, in the absence of selection pressure, spacer sequences are deleted in order to retain the activity of the CRISPR/Cas system [61]. From the number of spacer sequences, it can be inferred that the subtype II-A loci in L. ruminis were more active and better able to resist the insertion of exogenous genes. Similar results were obtained in previous studies [62]. The presence of G:U base pairs highlighted the significance of stem loops in the repeat sequence for the function of CRISPRs [63]. The high variability of the CRISPR loci was illustrated by the variety in the number and sequence of the CRISPR spacers, despite the conservation of the CRISPR repeats and the homology of the Cas proteins. In addition, it was interesting to note that incomplete and questionable prophages contained some phage components. For instance, some gene clusters encoding structural, lysin, or lysis modules were found in prophages classed as questionable or incomplete. It is inferred that the retention of such prophage remnants may be beneficial to the host. The discovery that bacteria containing incomplete prophages can have compatible functions, such as bacteriocin production, supports this viewpoint [64]. Streptococcus pyogenes strains that lacked a CRISPR system contained evidently more prophages than CRISPR-possessing strains [65], and a similar result was found for L. ruminis in the current work: the inverse relationship between the number of spacers and the number of prophages was obvious. It can be presumed that strains with a larger number of CRISPR/Cas systems have an advantage over strains without them when it comes to using DNA as a nutrient.

Conclusions
In this study, the genome sequences of 91 L. ruminis strains provided a basis for functional gene analysis of this species. For a species with a host-adapted lifestyle, differences in niche had a great impact on the evolution of bacterial genes. Adaptation to the competitive intestinal environments of different hosts involved the utilization of carbohydrates, the production of bacteriocins, and the presumed large number of CRISPR loci and prophages, all of which contribute to the persistence of L. ruminis as a native colonizer of the gastrointestinal tract.
Funding: This research was supported by the National Natural Science Foundation of China (Nos. 31771953, 31820103010), national first-class discipline program of Food Science and Technology (JUFSTR20180102), Collaborative innovation center of food safety and quality control in Jiangsu Province.
Acknowledgments: 2′-FL was kindly shared by FrieslandCampina DOMO, the Netherlands.

Conflicts of Interest: The authors declare no conflict of interest.