Quantitative Detection of Bifidobacterium longum Strains in Feces Using Strain-Specific Primers

We adopted a bioinformatics-based technique to identify strain-specific markers, which were then used to quantify the abundances of three distinct B. longum sup. longum strains in fecal samples of humans and mice. A pangenome analysis of 205 B. longum sup. longum genomes revealed the accumulation of considerable strain-specific genes within this species; specifically, 28.7% of the total identified genes were strain-specific. We identified 32, 14, and 49 genes specific to B. longum sup. longum RG4-1, B. longum sup. longum M1-20-R01-3, and B. longum sup. longum FGSZY6M4, respectively. After performing an in silico validation of these strain-specific markers using a nucleotide BLAST against both the B. longum sup. longum genome database and an NR/NT database, RG4-1_01874 (1331 bp), M1-20-R01-3_00324 (1745 bp), and FGSZY6M4_01477 (1691 bp) were chosen as target genes for strain-specific quantification. The specificities of the qPCR primers were validated against 47 non-target microorganisms and fecal baseline microbiota to ensure that they produced no PCR amplification products. The performance of the qPCR primer-based analysis was further assessed using fecal samples. After oral administration, the target B. longum strains appeared to efficiently colonize both the human and mouse guts, with average population levels of >108 CFU/g feces. The bioinformatics pipeline proposed here can be applied to the quantification of various bacterial species.


Introduction
Intestinal commensals play an important role in host health via being involved in various aspects of host physiology, such as tissue development, metabolism, and immunomodulation [1,2]. Many of these organisms are believed to be beneficial to the host. Bifidobacterium is a genus of bacterial species that colonizes the gut early in life [3] and is considered beneficial to host health [4]. The abundances of various Bifidobacterium species in the gut vary widely among individuals according to differences in dietary patterns [5,6], age groups [7], and physiological statuses [8]. Among these species, B. longum stands out as a member of the core human microbiome [9] and the most dominant species within the Bifidobacterium genus in the gut, regardless of the host age [7]. B. longum is distributed broadly across subjects of various ages [10], and is among the limited number of bacterial species that can colonize the gut over years [11]. Therefore, B. longum is an excellent example of host-microbe co-evolution, and is considered to be among the most potent probiotic species that are likely to engraft and persist in the gut after oral ingestion [12].
Compared with probiotic strains that merely transit through the gut, those probiotic strains that are able to successfully reside in the gut, would interact closely with the gut immune system, mucosa, epithelial cells, and native microbial communities, thereby possibly harboring better probiotic effects. However, as a consequence of the difficulty of strainlevel detection, there is clearly a knowledge gap regarding gut colonization mechanisms of probiotics [13]. Current generally used approaches to detect strain colonization include plate counting and species-level PCR [14][15][16]. However, these methods are not accurate, considering the natural occurrence of phylogenetically related species with the ingested probiotic strains in the indigenous microbiota. Therefore, detection and quantitation of probiotics at the strain level are critically important for accessing gut colonization by various strains, and further understanding their functionality and related mechanistic insight.
Multiple approaches have been developed to measure the presence and abundance of specific probiotic strains in the gastrointestinal tract. Initially, methods based on selective culture medium and colony identification (e.g., bacterial morphology, biochemical analysis, pulsed field gel electrophoresis (PFGE), 16S ribosomal DNA (rDNA) PCR, internally transcribed spacer (ITS)-PCR, random amplified polymorphic DNA (RAPD)-PCR, and monoclonal antibodies) were commonly used [14,[17][18][19][20][21]. However, these methods are time consuming, laborious, and often inaccurate. Fluorescence [22] or antibiotic labeling [23], and group-specific fluorescence in situ hybridization (FISH) [24] are also ineffective, because of the recurrent loss of plasmids with these tags by strains during gut transition, the low detection sensitivity of fluorescence signals, and safety considerations regarding the application of these approaches in human subjects. Species-specific PCR assays that target 16S rDNA variable regions or 16S-23S ITS rDNA sequences have also been used to directly determine ingested probiotic strains in fecal samples [25,26]. However, this approach cannot distinguish the target strain from phylogenetically related species present in the baseline microbiota. Recently, with the accumulation of sequenced bacterial genomes, strain-specific gene markers have been identified at unprecedented speeds, impelling us to use these unique markers to detect and quantify strains using molecular methods.
Strain-specific detection depends on the identification of DNA regions unique to specific strains. Before the era of large-scale genomic sequencing, selected RAPD electrophoresis bands, specific DNA fragments from suppression subtractive hybridization (SSH), or known sequences related to specific traits (e.g., Lactobacillus rhamnosus GG (LGG) harbors a pili structure, but LC705 does not) were used to design strain-specific primers for qualification of some probiotic strains in the gut/fecal samples, including LGG [27], B. bifidum OLB6378 [28], L. gasseri K7 [29], B. breve Yakult [30], and L. reuteri DSM 16350 [31]. However, this strain specificity remained within narrow confidence intervals because the identification of the strain-specific DNA regions was based on a limited number of bacterial strains. Additionally, these methods usually required pure cultures of various bacterial strains for laborious electrophoretic analyses. Fortunately, recent emerged bioinformatics strategies based on the sequenced bacterial genomes provide an alternative to find nearly "true" strain-specific DNA sequences. Theoretically, some bioinformatics pipelines (Pan-Seq [32], PGAT [33], PGAP [34], and Roary [35]) can be used to search bacterial strain-specific DNA segments, which can subsequently be used as templates for strain-specific primers. However, no previous study has used these bioinformatics tools to identify strain-specific sequences and achieve strain-level bacterial detection.
In this study, we selected B. longum sup. longum as an example due to the fact that it has been reported to be the most potent probiotic species with long-term gut colonization potential, and used genomics analyses to identify DNA sequences specific to three B. longum sup. longum strains isolated from the fecal samples of three Chinese subjects. First, we used a Roary-based pangenome analysis to identify unique gene markers that were present only in a single strains of B. longum sup. longum but absent from all the other strains of this species. Next, we validated this strain specificity in the context of other microbes and the baseline microbiota, and then targeted these unique sequences to design strain-specific primers. Finally, we applied these strain-specific primers and quantitative PCR (qPCR) to quantify the colonized biomasses of these three B. longum sup. longum strains in the feces of humans and mice after ingestion.

Bacterial Strains, Culture Conditions and Genomic DNA Extraction
As shown in Table 1, 48 bacterial strains were used in this study. The genomic DNA of each bacterial strain was extracted using rapid bacterial genomic DNA isolation kit (Sangon Biotech Co., Ltd., Shanghai, China).

Bacterial Genome Sequencing and Retrieval of Publicly Available Genomes
Three B. longum strains (RG4-1, FGSZY6M4, and M1-20-R01-3) were isolated from fecal samples of three Chinese individuals, and under genome sequencing by Illumina HiSeq 2000. Briefly, a paired-end sequencing library (average insert size of 350 bp and maximum read length of 150 bp) was built according to manufacturers' instructions (Illumina Inc., San Diego, CA, USA). On average, 3 GB paired-end raw reads were generated for each sample. After removing adaptors and low-quality reads, the resulting clean reads were assembled using SOAPdenovo v2.04 Software for short-read de novo assembler, BGI HK Research Institute: Hong Kong, China, 2012 [36], as described previously [37]. A total of 202 publicly available B. longum genomes were downloaded from National Center for Biotechnology Information (NCBI) database (Table 2). In total, 205 B. longum assemblies were finally used in this study.

Single Nucleotide Polymorphism (SNP) Calling and Phylogeny Reconstruction
The SNPs were recalled for 205 B. longum genomes by mapping the assemblies against the reference genome (B. longum NCC 2705) using MUMmer [38], as previously described [37], and only bi-allelic SNPs in the core genome were included in following analysis. The sequences of concatenated SNPs were used to construct phylogenetic tree (neighbor-joining method) using TreeBest (http://treesoft.sourceforge.net/treebest.shtml, accessed on 23 September 2020). It should be mentioned that before conducting the above phylogenetic analysis, we confirmed that the used 205 B. longum assemblies belonged to B. longum sup. longum by building a phylogenetic tree based on core genome bi-SNPs with the genomes of B. longum subsp. suillum, B. longum subsp. infantis, and B. longum subsp. suis as the outgroup.

Identification of Strain-Specific Markers
B. longum genomes were re-annotated using Prokka [39], and the obtained protein sequences were used to perform pangenome analysis via Roary (with a minimum BLASTP percentage identity of 90%) [35]. The genes that were only present in a single newly sequenced strain (RG4-1, FGSZY6M4 or M1-20-R01-3) and absent from all the other 204 strains were preliminarily identified. Considering the above gene presence/absence analysis was conducted at the protein level, we further validated the specificity of these strain-specific genes in nucleotide level. A nucleotide database containing 205 B. longum genomes was constructed using the makeblastdb command, and the above strain-specific gene sequences were analyzed through BLASTN against this database [40]. The DNA sequences that were only present in the target strain were retained. After the validation for intra-species specificity, we then tested the specificity of the DNA sequences under the background of all the representative microbes via website-based nucleotide BLAST against NR/NT database of NCBI database. The DNA sequences showing no hit in the database were finally selected.

Design of Strain-Specific Primers and Validation of Their Specificity via Electrophoresis
Based on the selected strain-specific DNA sequence for each B. longum strain, we designed corresponding qPCR primer pairs using Primer Premier5.0. The specificity of each primer pair was checked by Primer-blast in NCBI database. We conducted three titers of electrophoresis analysis to evaluate the primer specificity. For the intra-species specificity, another ten B. longum strains, in addition to each target strain (RG4-1, FGSZY6M4 or M1-20-R01-3), were used. For the intra-genus specificity within Bifidobacterium, six other Bifidobacterium species were selected. For the specificity against other gut bacteria, 32 representative members of intestinal microbes were adopted. Genome DNA was extracted for each strain, and amplification was performed using the designed strain-specific primers. Bio-Rad T100 Thermal Cycler was used for PCR amplifications. Each reaction mixture (50 µL) consisted of 2 µL bacterial genomic DNA, 25 µL 2× Taq Plus MasterMix, 2 µL forward primer (0.4 µmol in the final mixture), 2 µL reverse primer (0.4 µmol in the final mixture), and 19 µL ddH 2 O. The PCR conditions were as follows: 94 • C for 2 min, followed by 94 • C for 30 s, 65 • C for 30 s, and 72 • C for 30 s conducting 35 cycles, and then 72 • C for 2 min. The electrophoresis analysis was conducted to separate PCR products at 120 V using a 1.5% agarose gel.

Strain-Specific qPCR Designs and Standard Curves for Absolute Quantification
To evaluate the specificity of these strain-specific primer pairs against the background of complex fecal bacterial communities, we collected fecal samples from 30 humans and 15 mice. These samples were all the bassline samples (i.e., before B. longum administration) in the following described animal experiment and human trial. The fecal DNA was extracted using FastDNA Spin Kit for Soil (Catalog number: 116570200, MP Biomedicals, Santa Ana, CA, USA) according to the manufacturers' instructions. The PCR program was optimized to ensure no positive amplification for these baseline samples. Positive control with DNA of each target B. longum strain and negative control using water instead of genomic DNA were included in all PCR runs. The PCR system (20 µL) consisted of 2 µL genomic DNA (template), 10 µL 2× supermix (BIO-RAD), 2 µL forward primer (0.4 µmol in the final mixture), 2 µL reverse primer (0.4 µmol in the final mixture), and 4 µL ddH 2 O. The following qPCR program yielded the most selective amplification: an initial denaturation step at 95 • C for 2 min; 30 cycles of denaturation at 95 • C for 5 s and annealing/extension at 65 • C for 30 s; a melt curve analysis between 65 • C and 95 • C in 0.5 • C increments at 2-5 s/step; and polymerase activation and DNA denaturation at 95 • C for 5 min.
For absolute quantification of target B. longum strains, standard curves were prepared. A pure culture-based standard series of each target B. longum strain was obtained using DNA extracted from a tenfold dilution series of each B. longum strain in MRS (16 h). The exact bacterial cell numbers of the first serial decimal dilution were determined using plate counting. Cycle threshold values (C T ) were plotted versus equivalent log cell numbers. The amplification efficiency of each design was determined by the slope of the standard curves: E (%) = (10 −1/slope − 1) × 100.

Quantification of Ingested B. longum Strains in the Fecal Samples
Each B. longum-specific qPCR system was used to access the colonized biomass of corresponding target strain in the gut of mice after strain administration. For animal experiments, 5-week-old male Balb/c mice used in this study were purchased from the Shanghai Laboratory Animal Center (Shanghai, China). Animal care and study protocols were approved by the Ethics Committee of Jiangnan University, China (JN . No20181130b1200130[261]). All of the applicable institutional and national guidelines for the care and use of animals were followed. All mice were kept in the mouse facility of the Laboratory Animal Center of the Department of Food Science and Technology, Jiangnan University, Wuxi, China, on a 12-h light/dark cycle in a temperature-(22 • C ± 1 • C) and humidity-controlled (55% ± 10%) room. Mice were assigned to three experiment groups (n = 5): the RG4-1 group, the FGSZY6M4 group and the M1-20-R01-3 group. The experimental period was 14 days, including a 7-day accommodation period, and followed by a 7-day B. longum intervention period. MRS broth-cultured B. longum strains, after being resuspended in sterile saline, were prepared each day, and routinely subjected to plate counting to ensure a gavage dose of 10 8 -10 9 CFU/d for each mouse. Fecal samples were collected at day 7 and day 14.
For the human trial, the 30 volunteers were students at Jiangnan University who ranged in age from 20 to 30 years. No use of antibiotics was reported by the subjects within the 1 month before or during the study, and probiotic foods were not allowed during the trial. The experimental period included a 2-week baseline period (without any treatment) and a 2-week probiotic intervention period. The subjects were randomly assigned to three groups (n = 10 for each group), in which each subject in each B. longum intervention group (RG4-1, FGSZY6M4 or M1-20-R01-3) was administrated 10 9 -10 10 viable cells/d of the corresponding B. longum strain. Fecal samples were collected at day 14 (±3 days) of the baseline period, and day 14 (±3 days) of the treatment period. The colonized biomass of the three B. longum strains in fecal samples was determined using the respective qPCR primers. All volunteers provided informed consent. The Ethics Committee of Jiangnan University (Wuxi, China) provided ethical clearance for this human trial in accordance with the Declaration of Helsinki.
The colonized biomass of each B. longum strains was confirmed by selective culture medium with colony typing with the designed strain-specific primers. In brief, fecal samples were used to isolate bifidobacteria by cultivation on deMan Rogosa Sharpe (MRS) agar supplemented with 50 mg/L mupirocin and 0.1% L-cysteine HCl. After incubation at 37 • C for 48 h in an anaerobic chamber (80% N 2 , 10% H 2 , 10% CO 2 ), colonies were counted, picked, and then subjected to conventional PCR using the strain-specific primers.

Genomic Diversity of B. longum
We reconstructed a phylogenetic tree based on three newly sequenced and 202 publicly available B. longum strain genomes (Tables 2 and 3, and Figure 1A), and calculated the pair-wise genetic distances and accessory gene numbers to reveal the intra-species genomic diversity ( Figure 1B-D). The parameters for the newly sequenced genomes are shown in Table 3. As shown in Figure 1A, the three B. longum strains (RG4-1, M1-20-R01-3, and FGSZY6M4) isolated from the fecal samples of three Chinese individuals are closely clustered in the phylogenetic tree. This is unsurprising, because the majority of publicly available strains were isolated from geographically distant areas (i.e., other countries or continents). Nevertheless, the three strains exhibited obvious genetic distances indicative of their distinct genotypes.  We also determined the number of variable SNPs in the core genomes of 205 B. longum strains ( Figure 1B). The results indicated that the average SNP distance across the 205 strains was 6691. The most phylogenetically related strains exhibited an SNP distance of 0, while the most phylogenetically distant strains exhibited an SNP distance of 11,145. For each of the three novel B. longum strains, the minimum genetic distances between the respective strain and the other 204 strains in the dataset were 6000 (RG4-1), 5257 (M1-20-R01-3), and 8514 (FGSZY6M4). These data further confirm the genetic differences between each of these three B. longum strains and the other strains ( Figure 1C). As shown in Figure 1D, the pangenome analysis indicated that accessory genes accounted for 85.1% of the total genes, and that new genes were accumulated frequently as new strains were added to the analysis.
Overall, B. longum showed a high level of intra-species genomic diversity in terms of the pair-wise SNP distances and the total numbers of accessory genes. The target strains selected for strain-specific detection were phylogenetically distinct and exhibited considerable genetic dissimilarity with the other strains in this dataset, thus implying the possibility of finding appropriate strain-specific markers.

In Silico Identification and Validation of Strain-Specific Gene Markers
Considering the relatively high intra-species genetic similarity relative to inter-species and inter-genus similarities, we initially searched for strain-specific genes within the B. longum genomes. The pipeline used to construct a strain-specific detection tool is shown in Figure 2A. The publicly available B. longum genomes used in this study were derived from different projects, and the formats of their annotation files were not uniform, which prevented a pangenome analysis. Accordingly, we re-annotated these publicly available genomes and the three target B. longum strain genomes using Prokka, and then conducted a gene presence/absence analysis based on the annotated protein sequences. We defined unique genes (present in only one strain within the dataset of 205 strains), core genes (present in ≥99% of strains), soft core genes (present in ≥95% to <99%), shell genes (present in ≥15% to <95%), and cloud genes (present in ≥0.5% to <15%). As shown in Figure 2B, this analysis preliminarily identified 2398 strain-specific genes across the 205 B. longum strains, which accounted for 28.7% of the total genes. Notably, 32, 14, and 49 strain-specific genes were identified, respectively, for B. longum RG4-1 (Table 4), B. longum M1-20-R01-3 (Table 5), and B. longum FGSZY6M4 (Table 6).     Zein seed storage protein 755 FGSZY6M4_01993 Next, we explored whether the specificities of the identified strain-specific genes would be maintained against other microbial taxa. Rather than using amino acid sequences in Roary analysis, we tested the specificities of the preliminary strain-specific genes at the nucleotide level using an in-house nucleotide BLAST against a B. longum genome database to further ensure the strain-specificity of the identified genes. We also used a website-based nucleotide BLAST tool against the NR/NT database (NCBI) to determine which sequences could not be identified in any other representative microbes. Finally, this screening process identified RG4-1_01874 (1331 bp), M1-20-R01-3_00324 (1745 bp), and FGSZY6M4_01477 (1691 bp), which were selected as the target DNA sequences for strain-specific quantification.

Strain-Specific qPCR Designs and Electrophoretic Validation
Next, qPCR primer pairs were designed for each of the three B. longum strains (RG4-1, M1-20-R01-3, and FGSZY6M4) based on their respective strain-specific DNA sequences (Table 7), with predicted product sizes of 115, 199, and 144 bp, respectively. Using electrophoresis, the specificities of the three qPCR probe sets were validated against genomic DNA derived from various microbial strains (Table 1), including 10 other B. longum strains, 6 other Bifidobacterium species, and 32 representative members of the intestinal microbiome. The results indicated that PCR amplification based on the strain-specific primers was only successful for each of the respective target strains, and no amplification of genomic material from non-target microorganisms was observed ( Figure 3). Multiple qPCR primer pairs were designed for each target strain, and the primers with the best performance at this stage were selected and are reported here.

Specificities, Standard Curves, and Amplification Efficiencies of qPCR Assays
Primer specificity was evaluated by qPCR against a complex microbial community present in baseline fecal samples that had not been enriched for the target strain (i.e., pre-treatment). The DNA of each target B. longum strain was used as a positive control in the qPCR runs. Under optimized conditions, no amplification was observed for the baseline samples, whereas the positive controls exhibited good amplification. As shown in Figure 4, standard curves corresponding to RG4-1, M1-20-R01-3, and FGSZY6M4 exhibited good linearity over a 4-log range (10 3 -10 7 CFU/qPCR system, R 2 > 0.99), a 5-log range (10 1 -10 6 CFU/qPCR system, R 2 > 0.99) and a 4-log range (10 3 -10 7 CFU/qPCR system, R 2 > 0.99), respectively. The equations of the regression curves for RG4-1, M1-20-R01-3, and FGSZY6M4 were as follows: Ct = −3.4789lgCFU + 38.217, Ct = −3.5901lgCFU + 35.128, and Ct = −3.2936lgCFU + 38.371; and the corresponding amplification efficiencies were 93.8%, 90.0% and 101.2%, respectively. Collectively, the three qPCR primer pairs exhibited specificity in the context of the complex bacterial communities in fecal samples from both humans and mice, the standard curves used for absolute quantification were of good linearity, and the amplification efficiencies were qualified, suggesting that these primers can be used to detect the abundances of the target B. longum strains in fecal samples.

Quantification of Ingested B. longum Strains in Mouse and Human Fecal Samples
We used the strain-specific primers to detect the abundances of three target B. longum strains in the feces of mice and human volunteers who had consumed these B. longum strains ( Figure 5). For the animal experiments, the colonized biomasses of the three B. longum strains in samples collected 1 week after oral administration exceeded 10 8 CFU/g feces, with average abundances of 1.54 × 10 9 CFU/g feces for RG4-1, 4.60 × 10 8 CFU/g feces for M1-20-R01-3, and 1.06 × 10 9 CFU/g feces for FGSZY6M4. For the human trial, the average population levels of each strain reached >10 8 CFU/g feces, with average abundances of 4.00 × 10 8 CFU/g feces for RG4-1, 3.78 × 10 8 CFU/g feces for M1-20-R01-3, and 7.18 × 10 8 CFU/g feces for FGSZY6M4. These results indicate that orally ingested B. longum strains can be detected at considerable levels in the feces of both mice and humans during a period of intervention, suggesting short-term engraftment of the ingested bacteria in the gut. The colonized abundances of the three B. longum strains obtained from selective culture medium combined with colony typing using our specific qPCR primers further confirmed the conclusion of short-term engraftment by those strains, despite the bacterial numbers detected by this culture-dependent method were generally 10-fold lower ( Figure 5).

Discussion
It has become increasingly clear that the beneficial effects of probiotic bacteria on the host are strain-specific, and the viability of probiotic strains in the gut after ingestion is believed to be an essential factor for them to demonstrate health-promoting functions. Therefore, determining the presence and colonized biomasses of specific probiotic strains in the gut is a key step toward the informed use of probiotics for therapeutic ends. Compared with traditional microbiological methods (e.g., selective media with colony identification), PCR-based molecular methods, which can distinguish the target strain from the baseline microbiota, are the most popular because of their superior sensitivity and specificity [13]. Here, we adopted a pangenome analysis-based approach to identify strain-specific DNA sequences and designed qPCR primers based on these unique markers, which enabled us to detect B. longum strains in fecal samples at a strain-level resolution. In addition to determining the abundances of probiotic bacteria at the strain level, this strategy advantageously involves the identification of strain-specific sequences based on a large set of bacterial genomes. The resulting strain-specificity was more "true" than those identified in an RAPD analysis based on a restricted number of pure cultured bacteria. This is the first known example of using a bioinformatics method to search for unique gene markers and design strain-specific molecular tools to detect and quantify individual bacterial strains. Previous studies have used the abovementioned RAPD methods to identify strain-specific sequences against a background of a limited number of strains of the same species. Daranas et al. screened a unique marker of L. plantarum PM411 against seven other L. plantarum strains in an assay based on seven random RAPD primers [41]. In another study, B. bifidum BF-1 was detected by targeting a strain-specific sequence in an analysis involving 27 RAPD primers and 30 B. bifidum strains [42]. In this study, we identified 32, 14, and 49 strain-specific genes corresponding to B. longum RG4-1 (Table 5), B. longum M1-20-R01-3 (Table 6), and B. longum FGSZY6M4, respectively (Table 7), in the context of 205 B. longum genomes. Obviously, our pangenome-based approach could identify a larger number of strain-specific fragments within a wider confidence interval, because it uses whole genome sequences rather than random PCR products, and a large set of bacterial genomes rather than a limited number of available pure cultures. In addition, we also observed a high frequency of strain-specific sequences among B. longum isolates. Specifically, we detected 2398 strain-specific genes among 205 B. longum strains, which accounted for 28.7% of the total genes, and 178 of these strains harbored unique gene markers. Our data suggest the existence of considerable intra-species genomic diversity within B. longum in terms of the accessory gene number. This finding allowed us to construct strain-specific primers to identify and quantify most of these strains based on single gene markers. For those strains without unique genes, future approaches may use two or more gene sequences as unique combination amplification targets.
The strain-specific sequences identified in this study were first confirmed in silico, by electrophoresis and by the absence of qPCR amplification signals in baseline microbiota of the tested fecal samples derived from humans and mice. In previous studies, strain-specific sequences identified using RAPD methods often showed homology with sequences from other microbes and thus were not truly strain-specific. A previous BLAST analysis of a B. bifidum OLB6378-specific RAPD fragment showed a high similarity (98%) to the parB gene sequences in publicly available B. bifidum genomes [28]. The strain-specific RAPD band identified by Karjalainen et al., and used for the strain-level detection of Propionibacterium freudenreichii (P. freudenreichii). JS was found to encode a 103-bp region with 91% identity to P. freudenreichii ssp. shermanii CIRM-BIA1 [43]. Similarly, the unique RAPD band for B. breve 99 was shown to contain regions homologous to those detected in strains of B. longum, B. adolescentis, B. dentium Bd1, and B. animalis ssp. lactis [43]. In this study, we validated the strain-specific sequences identified preliminarily via nucleotide BLAST against a selfconstructed database of the 205 B. longum genomes and further checked the specificities of the sequences against all available gene information from representative microbes via searching the NR/NT NCBI database. Our selected strain-specific fragments (RG4-1_01874, M1-20-R01-3_00324, and FGSZY6M4_01477) passed the nucleotide BLAST against the B. longum genome database and generated no hits in the NR/NT database. Therefore, the strain-specific sequences identified in this study were highly specific against a background of various microbes and can probably be applied in a broader taxonomic context. We further validated the specificities of the newly designed qPCR primers via conventional PCR against 10 other B. longum strains, 6 other Bifidobacterium species, and 32 representative members of the intestinal microbiome. We additionally confirmed the specificities of the primers by the absence of qPCR amplification products from fecal samples derived from both humans and mice that were free from the target B. longum strains.
The bioinformatics pipeline proposed in this study can be expanded to search for strain-specific markers and achieve strain-level qualifications of various bacteria, including probiotic species. A large number of genomes corresponding to probiotic Lactobacillus and Bifidobacterium species have been sequenced and made publicly available, including 544 L. plantarum, 198 L. paracasei, 112 B. breve, and 176 L. salivarius genomes in the NCBI database. Previous genomic analyses of these species have demonstrated open pangenomes, high intra-species genomic diversity, and high new gene discovery rates for L. salivarius [44,45] and L. casei [46] as the number of included genomes increases. Therefore, the pipeline constructed here can be directly translated for use in other probiotic species, provided that sufficient genomic data and abundant strain-specific genes are available. For this study, we selected Roary because of the simple command line and relatively higher running speed compared with other available tools. Other pangenome analysis tools, such as Pan-Seq [32], PGAT [33], and PGAP [34], are also suitable choices for constructing strain-specific primers.
Our qPCR assays further indicated that the three target B. longum strains could colonize the guts of both humans and mice at high levels of abundance (>10 8 CFU/g feces for both humans and mice) during an intervention period (1 week for mice and 2 weeks for humans), and the colonized biomasses were validated using selective culture and colonytyping methods with the strain-specific primers. Absolute qualification was achieved using highly linear standard curves (R 2 > 0.99) and amplification efficiency was qualified (>90.0%), both of which are comparable to previous studies [41]. In line with our results, a previous study demonstrated that at 2 weeks post-ingestion, the abundances of B. longum AH1206 ranged from 10 7 to 10 10 cells/g feces in human subjects [47]. Selective culture methods combined with strain typing using our strain-specific primers confirmed the colonization of these target B. longum strains. However, the bacterial numbers generated using this culture-based method were nearly 10-fold lower than those determined using qPCR. We attribute this underestimation to an inherent defect of culturebased methods, as previous studies have reported that both the frequencies and numbers of bacterial cells detected in feces by culture methods were substantially lower than the corresponding values obtained using qPCR [42,48]. In addition, PCR with our strainspecific primers enabled us to identify colonies of the three B. longum strains on selective agar efficiently and accurately.

Conclusions
In conclusion, benefiting from the large number of sequenced genomes of probiotic species, we took the most potent probiotic gut colonizer, B. longum, as an example, and proposed a precedent in which a pangenome analysis-based approach can be used to identify unique gene markers for a given bacterial strain, and targeted these markers to achieve strain-level qualification. The qPCR primers designed in this study were able to successfully detect and quantify the colonized biomasses of the given B. longum strains in fecal samples from humans and mice. Therefore, we demonstrated the ability of these efficient in silico analyses to replace existing time-and labor-intensive RAPD methods. Furthermore, by including as many bacterial genomes as possible, the annotated unique sequences are highly specific and can be applied in a broader taxonomic context involving a more complex microbial ecology. The pipeline constructed herein can also be adapted to identify strain-specific markers and design strain-level qPCR primers for other probiotic species.
Institutional Review Board Statement: Animal care and study protocols were approved by the Ethics Committee of Jiangnan University, China (JN. No20181130b1200130[261]). All of the applicable institutional and national guidelines for the care and use of animals were followed. For the human trial, the Ethics Committee of Jiangnan University (Wuxi, China) provided ethical clearance for this human trial according to the Helsinki Declaration.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data in this study are available from the authors upon request.

Conflicts of Interest:
There are no conflict of interest to declare.