The mutL Gene as a Genome-Wide Taxonomic Marker for High Resolution Discrimination of Lactiplantibacillus plantarum and Its Closely Related Taxa

The current taxonomy of the Lactiplantibacillus plantarum group comprises of 17 closely related species that are indistinguishable from each other by using commonly used 16S rRNA gene sequencing. In this study, a whole-genome-based analysis was carried out for exploring the highly distinguished target genes whose interspecific sequence identity is significantly less than those of 16S rRNA or conventional housekeeping genes. In silico analyses of 774 core genes by the cano-wgMLST_BacCompare analytics platform indicated that csbB, morA, murI, mutL, ntpJ, rutB, trmK, ydaF, and yhhX genes were the most promising candidates. Subsequently, the mutL gene was selected, and the discrimination power was further evaluated using Sanger sequencing. Among the type strains, mutL exhibited a clearly superior sequence identity (61.6–85.6%; average: 66.6%) to the 16S rRNA gene (96.7–100%; average: 98.4%) and the conventional phylogenetic marker genes (e.g., dnaJ, dnaK, pheS, recA, and rpoA), respectively, which could be used to separat tested strains into various species clusters. Consequently, species-specific primers were developed for fast and accurate identification of L. pentosus, L. argentoratensis, L. plantarum, and L. paraplantarum. During this study, one strain (BCRC 06B0048, L. pentosus) exhibited not only relatively low mutL sequence identities (97.0%) but also a low digital DNA–DNA hybridization value (78.1%) with the type strain DSM 20314T, signifying that it exhibits potential for reclassification as a novel subspecies. Our data demonstrate that mutL can be a genome-wide target for identifying and classifying the L. plantarum group species and for differentiating novel taxa from known species.


Introduction
Following reclassification of Lactobacillus according to physiological criteria, cladespecific signature genes, core genome phylogeny, organism ecology, and pairwise average amino acid identity, this varied genus now comprises of 25 genera, with 23 of them being novel [1]. This change to the taxonomy has affected major probiotic Lactobacillus species; for example, Lactobacillus plantarum has become Lactiplantibacillus plantarum. Because the new genera suggested for this group all still begin with an "L", the abbreviated genus and species names are unchanged. L. plantarum is a versatile lactic acid bacteria (LAB) markers for species-level discrimination, and their pairwise sequence identity matrices were calculated using Clustal Omega [40].

L. plantarum Group Strains and Culture Conditions
The Bioresource Collection and Research Center (BCRC, Hsinchu, Taiwan) served as the source of most of the reference strains employed in this study ( Table 1). The bacterial strains were cultivated on lactobacilli MRS agar (Difco Laboratories, Detroit, MI, USA) anaerobically for 24 h at 37 • C. Genomic DNA was extracted using the DNeasy kit (Qiagen, Valencia, CA, USA) following the manufacturer's protocols.

Degenerate PCR Primer Design, Nucleotide Sequencing and Phylogenetic Analysis on mutL Gene
On comparison with the mutL gene sequence from the whole genome among the species of the L. plantarum group, the degenerate primers, LpmutL-F (5 -TSGAYGTSAAYGTKCAYCC-3 ) and LpmutL-R (5 -ATGYGGRCARTTRAANGGAT-3 ), were designed and targeted to the conserved region. PCRs were performed using 23 µL of sterile MilliQ water, 3 µL of 10× PCR buffer, 0.5 µL of denucleoside triphosphates (10 mM), 1.2 µL of forward primer (10 mM), 1.2 µL of reverse primer (10 mM), 1.5 U of Taq DNA polymerase (DreamTaq, Thermo Scientific, Waltham, MA, USA), and 1 µL of template DNA (100 ng/µL). The thermal protocol consisted of the following conditions: initial strand denaturation at 94 • C for 5 min, followed by 30 cycles at 94 • C for 1 min, 58 • C for 1 min, and 72 • C for 1 min, with a final extension step at 72 • C for 7 min. The resulting amplicons were purified using a QIA quick PCR Purification Kit (Qiagen, Inc., Valencia, CA, USA) and sequenced using a BigDye Terminator v3.1 cycle-sequencing kit on a 3730 DNA Analyzer (Applied Biosystems and Hitachi, Foster City, CA, USA). All sequences were aligned using ClustalX version 1.8 [41], and the sequence identities were calculated using Clustal Omega. A phylogenetic tree was reconstructed using the software package mega version 7.0 [42]. The number of haplotypes was calculated using DnaSP version.5.1 [43].

Species-Specific Primer Design and Direct PCR-Based Identification
L. argentoratensis-, L. paraplantarum-, L. pentosus-, and L. plantarum-specific PCR primer sets were determined on the basis of mutL sequences using VectorNTI version 9.0 (Invitrogen, Carlsbad, CA, USA). Species-specific PCR testing was conducted on all L. plantarum group strains and 12 nontarget strains (Table 1). Regarding the steps constituting the thermal cycler protocol, initial strand denaturation was executed at 94 • C for 5 min; followed by 30 cycles of 94 • C for 1 min, 60 • C for 1 min, and 72 • C for 1 min; and finally an extension step executed at 72 • C for 7 min.
Serial dilution and plating methods were used to isolate the LAB isolates from various sources (Supplementary Table S1), and were discriminated using a MALDI Microflex LT mass spectrometer (Bruker Daltonics, Bremen, Germany), as previously described [44]. This was followed by species-specific PCR and mutL sequencing.

WGS of BCRC 06B0048 and Calculation of dDDH and Phylogenomic Tree Analysis
Genomic DNA was extracted using the EasyPrepHY genomic DNA extraction kit (Biotools Co. Ltd., Taipei, Taiwan) in accordance with the manufacturer's protocols. The draft genomes of strain BCRC 06B0048 was sequenced from an Illumina paired-end library with an average insert size of 350 bp by using an Illumina HiSeq4000 platform with the PE 150 strategy at Beijing Novogene Bioinformatics Technology Co., Ltd. (Beijing, China). The resulting raw reads were assembled de novo using CLC Genomics Workbench version 20.0 (QIAGEN). The dDDH value and the phylogenomic tree were estimated and reconstructed using the TYGS server [45].

Differentiation of Strain BCRC 06B0048 and BCRC 11053 T Based on Biochemical and Chemotaxonomic Characteristics
Profiles of carbohydrate fermentation and enzymic activities were determined using the API 50 CH and API ZYM systems (bioMérieux, Marcy-l'Étoile, France), respectively, according to the manufacturer's instructions. Whole-cell fatty acids were analyzed as fatty acid methyl esters (FAMEs) with the Sherlock Microbial Identification System (MIDI, Inc., Newark, NJ, USA), as described previously [46].

Results and Discussion
The gold standard marker for prokaryotic species identification has long been the 16S rRNA gene. Strains of a single species commonly exhibit 98.7% sequence identity with the 16S rRNA gene [47]. However, the 16S rRNA gene exhibits poor resolution in closely related species groups; for example, the Lentilactobacillus buchneri, Lacticaseibacillus casei, L. plantarum, and Latilactobacillus sakei groups [48][49][50][51][52]. Accordingly, researchers have employed comparative sequences of the housekeeping genes dnaJ, dnaK, hsp60, pheS, recA, rpoA, rpoB, and tuf to distinguish the members of the L. plantarum group [11,45,51,53,54]. Among them, recA exhibited the highest resolution at the interspecific level, along with relatively low sequence identities (75.6-93.0%; average: 80.6%; Supplementary Table S1). Moreover, researchers have established species-specific PCR primers that are targeted to the variable regions of the recA gene for species identification of L. argentoratensis, L. paraplantarum, L. pentosus, and L. plantarum [9,48]. However, because several phylogenetically closely related novel L. plantarum group species have been proposed in recent years [7,8], the usability and specificity of recA gene-based PCR and sequencing as well as species-specific primers should be reassessed through in silico methods. To identify highly distinguished taxonomic markers with deep-level phylogeny or species-specific genes according to the gain and loss of variable regions, the use of genome comparison methods have been applied to resolve the phylogenetically and phenotypically closely related species; this serves as an alternative to the conventional universal phylogenetic targets [55][56][57][58][59][60].
The genomic data of the L. plantarum group species exhibited sequences of high quality, as directly evidenced by the relatively small number of contigs (median, 48; Supplementary  Table S2); we thus used the aforementioned data to execute our subsequent comparative genomic analyses. The pan genomic analysis of L. plantarum and its most closely related species was performed on the cano-wgMLST_BacCompare analytics platform; our derived results indicated that the pan genome contained 9382 genes, of which 13.12% (1231 genes) were present in all of the tested genomes. Subsequently, the 774 genes with the largest allele size were selected as the taxonomic markers, and target genes with high discrimination levels were developed by comparing their sequence identities, including maximum, minimum, and mean values within the L. plantarum group; the results revealed that csbB, morA, murI, mutL, ntpJ, rutB, trmK, ydaF, and yhhX genes were the most promising candidates (Supplementary Table S1). mutL encodes a DNA mismatch repair protein that has an essential role in the maintenance of genomic stability by correcting DNA replication error [61]. It is regarded as a crucial taxonomic marker for species-and strain-level differentiation in the L. casei group [44,62,63]. Thus, the discrimination power of mutL in the L. plantarum group merits further evaluation. A degenerate primer pair (LpmutL-F/R) was used to successfully amplify a PCR amplicon that comprised approximately of 1000 bp of mutL from L. plantarum group strains; this product was subsequently employed for sequencing. Among the type strains, the average sequence identity of mutL was 66.6% (61.6-85.6%), which was clearly superior to that of the 16S rRNA gene (96.7-100%; average: 98.4%) ( Table 2) and conventional phylogenetic markers (Supplementary Table S1) and could be applied to effectively separate tested strains into various species clusters with a high bootstrap value in the phylogenetic tree ( Figure 1). Consequently, species-specific primers for four species in the L. plantarum group were designed (Table 3); these primers could successfully generate a single specific amplicon (319, 115, 176, and 385 bp) when they were used in PCR processes with DNA from reference strains of L. argentoratensis, L. paraplantarum, L. plantarum, and L. pentosus (Table 1), with no cross-reaction against nontarget LAB species. Furthermore, 20 LAB isolates were isolated from various sources (animal feces, fermented foods, and silage), and MALDI-TOF MS indicated that 13 of them were related to L. plantarum; these LAB isolates were next identified to the species-level of L. plantarum by using mutL-based species-specific PCR and sequencing (Supplementary Table S3). On the other hand, all 33 L. plantarum strains could be assigned to a combination of 16 different haplotypes using mutL gene sequence, which was comparable with that of other conventional MLST targets to distinguishing between strains in L. plantarum species [64,65].
We note that the BCRC 06B0048 strain, which was originally identified as L. pentosus, had relatively low mutL sequence identities (97.0%) with the type strain DSM 20314 T ; moreover, the mutL-based phylogenetic tree indicates that the five strains of the species L. pentosus could be divided into two subclusters and that the nodes exhibited high bootstrap values (99%; Figure 1). Traditionally, subspecies-level differentiation mainly relied on the phenotypic and genotypic characteristics. Nevertheless, genome-based taxonomic designation for subspecies with the use of 79% dDDH has been suggested [66]. This approach has been successfully applied to propose and officially validate novel bacterial subspecies such as Bifidobacterium catenulatum, B. gallinarum B. thermacidophilum, and L. reuteri [67,68]. The draft genome size of the BCRC 06B0048 strain was 3.7 Mb (G+C content: 46.2 mol%), and it contained 3,293 protein-coding genes (Supplementary Table S2). The dDDH value between the BCRC 06B0048 and DSM 20314 T strains was 78.1%, which is below the subspecies delineation threshold value. The phylogenomic tree revealed that the 17 type strains of the L. plantarum group were clearly differentiated into distinct species clusters, and the species L. pentosus could also be divided into two independent subclusters (Figure 2). In addition, strain BCRC 06B0048 has also shown several differential characteristics from the L. pentosus type strain BCRC 11053 T (Supplementary Table S4). These results indicate that strain BCRC 06B0048 has the potential for reclassification as an independent subspecies of L. pentosus.

Conclusions
In this study, we successfully applied a genome-based analysis method to assist in the selection of the promising target genes for a closely related species group by employing the cano-wgMLST_BacCompare analytics platform. The results suggest that the genome-wide taxonomic marker of mutL is an excellent phylogenetic target for precisely discriminating and identifying L. plantarum and related taxa through the use of direct sequencing as well as species-specific PCR assays. According to the genotypic and phenotypic characteristics, we deduced that strain BCRC 06B0048 may represent an undescribed subspecies of L. pentosus. We will endeavor to integrate polyphasic and combined with the genomic strategies to describe novel subspecies in the future.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/microorganisms9081570/s1, Table S1: Interspecific analysis of the genome-based selection targets within the L. plantarum group. Table S2: Genomic characteristics of L. plantarum group type strains and BCRC 06B0048. Table S3: Identification of L. plantarum isolates from various sources using MALDI-TOF MS, species-specific PCR and mutL sequencing methods. Table S4: Differential characteristics between the L. pentosus BCRC 06B0048 and BCRC 11053 T .

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.