Comparative Genomics and In Silico Evaluation of Genes Related to the Probiotic Potential of Bifidobacterium breve 1101A

Bifidobacterium breve is among the first microorganisms colonizing the intestinal tract in humans and is a predominant species in the gut microbiota of newborns and children. This bacterium is widely used in the probiotic industry due to its capacity to improve host health. The search for new targets with probiotic properties is an increasing trend with the help of next-generation sequencing as they facilitate the characterization of the bacterial features. B. breve 1101A was isolated from the faeces of healthy children in Brazil and therefore could play a protective role in the gut. To investigate the beneficial properties of this strain, the present study performed a comprehensive characterization of the genetic features involved in the bacterium resistance and adaptation to gastrointestinal conditions, production of nutrients, and immunomodulatory compounds. Furthermore, this study carried out the prediction of genomic elements (plasmids, prophages, CRISPR-Cas systems, insertion sequences, genomic islands, antibiotic resistance genes) to evaluate the safety of B. breve 1101A. A comparative genomics approach using 45 B. breve complete genomes based on pangenome and phylogenomic analysis was also performed to identify specific genes in B. breve 1101A. The prediction of genetic elements, possibly safety-related, did not detect plasmids, but only one incomplete prophage, two non-functional CRISPR systems, and seven genomic islands. Additionally, three antibiotic resistance genes were identified: ileS (resistance to mupirocin), rpoB, and erm(X). In the comparative genomic analysis, the pangenome was revealed to be open, and B. breve 1101A presented 63 unique genes associated with several processes, such as transmembrane transport, membrane components, DNA processes, and carbohydrate metabolism. In conclusion, B. breve 1101A is potentially safe and welladapted for intestinal disorder therapeutics, although the role of its unique genetic repertoire needs further investigation.


Introduction
Bifidobacterium species are Gram-positive bacteria, non-spore-forming, non-motile, with irregular rod-shaped cells, but sometimes with a Y or V shape [1,2]. Species from this genus are among the first colonizers of the intestinal tract in newborns [3,4]. Specifically, B. breve is the dominant species in the gut of breastfed babies [5].
Probiotics are microorganisms that confer health benefits on the host [6]. A considerable part of probiotics is from Lactobacillus and Bifidobacterium [7]. The principal mechanisms of action of probiotics are resistance to acid and bile salts [8], the capacity of adhesion to the host epithelium cells [9], improvement of the intestinal epithelial barrier [10], competition with pathogens and production of antimicrobial compounds [11], immunomodulatory effects [12], among others which are considered of relevance in the evaluation process of probiotics.
Several studies have explored genes in a bacterial genome using in silico analysis when looking for beneficial properties in new strains for probiotics [13]. It is noteworthy that some genes are involved in the aspects discussed above. However, other factors are related to the safety aspect (antibiotic resistance, virulence factors) and the genome stability (phages, plasmids, insertion sequences).
Therefore, genomic islands and CRISPR-Cas systems must be evaluated because they could confer some additional features [14][15][16]. For instance, the genome evaluation of Lactobacillus reuteri PNW1 evidenced the presence of genes related to antibiotic resistance, and it has allowed us to test virulence factors and genes of specific interest, among other elements [17]. Moreover, the functional analysis of Bacillus velezensis FTCo1 permitted the identification of genes related to adhesion and acid resistance [18]. Additionally, the in silico evaluation of B. coagulans HS243 showed the adaptation and probiosis potential of that strain [19].
Comparative genomics is another strategy used to evaluate several genomes of probiotic bacterial strains, being defined as pan-probiosis [20]. This Pan-genomic derivative approach allows the identification of genes related with probiotic properties, which are shared among strains or are either unique within a bacterium genus or species. Moreover, integration with phylogenomic analyses provides studies with the ability to connect genotypes and phenotypes to select strains for specific clinical or biotechnological applications [21]. Some studies have followed this perspective, such as the comparative evaluation of the Lactobacillus johnsonii ZLJ010 genome with others from the same group, to characterize the probiotic profile of this species [22]. Similarly, other recent studies have been successful in understanding the probiotic potential of other species like Lactobacillus helveticus [23].
In a previous study, Bifidobacterium breve 110 1A was evaluated in vitro with other strains of Bifidobacterium, isolated from the faeces of healthy children in Brazil [24]. Some of the parameters were growth rate, oxygen tolerance, antagonism, cell wall hydrophobicity, and antimicrobial susceptibility. In that study, B. breve 110 1A showed high aerotolerance, a feature considered as a positive property for industrial processes, a high inhibition rate (66.7%) of pathogens (Clostridium difficile, Listeria monocytogenes, Shigella sonnei, Vibrio cholera, among others), and an antibiotic resistance related to cefoxitin, erythromycin, and metronidazole classes [24]. In this respect, the genomic analysis could help elucidate some of these features and mechanisms and help to facilitate the exploration of others of interest. Therefore, the aim of the present study was the in silico characterization of the B. breve 110 1A genome using a comparative genome approach with 45 available complete genomes of the species, along with the searching of genes related to beneficial features to explore the probiotic potential of this strain.

Sample Preparation and DNA Extraction
B. breve 110 1A was isolated from infant faecal samples in Belo Horizonte, Brazil [24]. This strain was maintained in De Man, Rogosa, and Sharpe (MRS; Difco, Sparks, NV, USA) broth supplemented with 0.5% L-cysteine for 48h in an anaerobic chamber to 37 • C. The genomic DNA extraction was performed following an adapted protocol [25].
The 45 complete genome sequences of B. breve available in the NCBI GenBank database were downloaded in nucleotide FASTA format. All genomes were annotated using Prokka v1.14.5 [29]. Synteny was evaluated on the B. breve 110 1A genome with the other complete genomes of the species. Multiple whole-genome sequence alignments were conducted using the implemented Mauve v2.4 [30].

Taxonomy, Phylogenomics, and Evolutionary Analysis
Average Nucleotide Identity (ANI) values were calculated for the 46 genomes of B. breve and the outgroup species (Bifidobacterium longum NCTC 11818, B. bifidum JCM 1255, and B. animalis subsp. animalis ATCC 25527). The results were visualized with the heatmap R package (https://cran.r-/pheatmap/, accessed on 15 January 2022). The phylogenomic tree was performed using the Codon Tree Test method of the Pathosystems Resource Integration Center (PATRIC) (http://www.patricbrc.org, accessed on 15 January 2022) with 376 genes of a single copy. Support values were generated using replicates of 100 by the RaxML tool.

Prediction of Mobile Elements, Insertion Sequences, Bacteriocins, and CRISPR-Cas Systems in Bifidobacterium breve 110 1A
The in silico identification of plasmids on the genome of B. breve 110 1A was done with PlasmidFinder 2.0 [31]. The presence of phages was evaluated with PHASTER [32], and the prediction of insertion sequences (IS) was performed with Insertion Sequence Semi-Automatic Genome Annotation (ISsaga v2.0) from the ISfinder tool [33]. Therefore, the prediction of putative bacteriocins was performed with BAGEL4 [34], and the presence of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and Cas proteins were analyzed with the CRISPRCasfinder tool [35].

Genomic Plasticity Analysis
The prediction of putative genomic islands in the B. breve 110 1A genome, such as the product of possible horizontal gene transfer (HGT) events, was performed using GIPSy v1.1.2 [41] using B. breve Bifido_07 genome (ENA: FTRK01000000) as a reference genome, both genomes in Genbank format. This last bacterium was present in a report in a bacteremia case [42]. The circular genomic maps were visualized with BRIG v0.95 [43]. To explore the genes in the islands, functional annotation was performed with EggNOG 5.0 [44].

Pangenome Analysis
The pangenome size calculation was performed with Roary v3.11.2 [45]. The pangenome's openness was estimated with Bacterial Pan Genome Analysis, BPGA v1.3 [46], and it was considered the exponential parameter b of the empirical power law equation of the pangenome curve obtained with protein sequences. They were pre-processed, the posterior clustering step was done with USEARCH v11 with default parameters and an identity cut-off of 95%, considering an atypical average GC content (2*standard deviation). The functional analysis was done with Cluster of Orthologous Genes (COG) assignments based on representative sequences for core, accessory, and unique gene families. The number of unique genes of B. breve strains was represented in a flower chart, and the unique genes for B. breve 110 1A were processed with the GOfeat tool by functional annotation [47]. For the core genome SNP phylogenetic analysis, a rapid multiple-alignment of B. breve genomes was performed with ParSNP [48] and used for building the tree.

Identification of Genes Related to Probiotic Features
Genes involved in mechanisms of adhesion; resistance to stress conditions (acid, bile salts, heat, osmotic); repair and protection of DNA and proteins, and production of vitamins were retrieved from the literature regarding the genera Bifidobacterium and Lactobacillus [8,17,49,50] Protein sequences of these genes were aligned with our genome of study with Basic Local Alignment Search Tool (BLAST) with a cut-off 1 E 5 and a minimal identity percentage of 70%.

Whole-Genome Characterization of Bifidobacterium breve 110 1A
The complete genome of B. breve 110 1A was a circular chromosome of 2,371,121 bp with a GC content of 58.8%. Initially, the genome assembly reached 17 contigs with a coverage of 1234; N50 value of 643.298 bp, and after the gap-closure process, it was possible to obtain the complete genome in a single contig. The annotation process identified 1986 CDS (being 907 hypothetical proteins), 55 tRNA, 9 rRNA, and one tmRNA. Regarding its source information, most samples were isolated from children's faeces, whereas a few were obtained from adult faeces, vagina, environment, and human milk (Table 1).
In evaluating the conservation on the genome structure, B. breve 110 1A showed collinearity of the gene blocks with most of the evaluated genomes (Supplementary Figure S1A). In contrast, other few strains presented rearrangements over large parts of the genome (Supplementary Figure S1B). In this respect, B. breve ACS-071-V-Sch8b showed a large inversion and B. breve JCM7017 presented a minor inversion in the central region of its genome (around the 1.2Mbp locus). In addition, B. breve BR3 seems to have two inversion events, one covering most part of the genome and the other in the terminal region (200 kb size, approximately). * Specific information of sample origin was absent in the GenBank description. These samples were reported within the project "Comparative genomics and methylome analysis of the gut commensal Bifidobacterium breve [51]. ** Strains with probiotic properties confirmed experimentally by in vivo studies.

Taxonomy and Phylogenomics Analysis
Taxonomic characterization through similarity comparison using ANI values estimated for the 46 strains of B. breve is shown in Figure 1. All comparisons of B. breve 110 1A with other B. breve strains showed ANI values between 0.97-0.98 grouped with these strains, confirming its high nucleotide identity with this species.

Prediction of Mobile Elements, Bacteriocins, Insertion Sequences, CRISPR-Cas Systems, and Antibiotic Resistance Genes in Bifidobacterium breve 110 1A
The analysis performed with PlasmidFinder2.0 did not identify plasmids and bacterioc in-coding genes in the genome of B. breve 110 1A . The PHASTER tool predictions has identified an incomplete prophage region of 8518 bp coding six phage-like and three hypothetical proteins (Supplementary Figure S2, Table 2). The screening of insertion sequences (IS) carried out in ISfinder revealed 29 elements, being seven of them complete ORFs. The IS were classified into six family groups (IS3, IS256, IS21, ISL3, IS30, and IS5) ( Figure 3). Considering other genomic elements identified, three regions were related to Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) in the genome of B. breve 110 1A (Table 3); however, there was a lack of arrangements codifying for Cas proteins. The detection of resistance genes to antibiotics revealed, in total, three genes: ileS, rpoB, and erm(X). The importance of coverage percentage for every hit was above 99.2% (Table 4).

Genomic Plasticity Analysis
Seven genomic islands (GEI) were predicted in B. breve 110 1A : two Resistance Islands (RI) and five Genomic Islands (GI) (Figure 4). Some genes in resistance islands (RI1, RI2) are related to antibiotic resistance (methyltransferase domain, bacterial regulatory proteins-tetR family, VanZ-like family) and metal ion binding (Cupin two conserved barrel domain protein). Moreover, genes involved in DNA binding and transcription regulation (two genes for RelB antitoxin), DNA binding (addiction module antidote protein HigA, PemK-like, MazF-like toxin of type II toxin-antitoxin system, and two gene helix-turnhelix domain), transferases (HipA-like C-terminal domain, FR47-like protein) and Major facilitator Superfamily (MFS) were identified (Supplementary Tables S1 and S2). Gene content in Genomic Islands (GI1-GI5) were involved in processes such as transport proteins (ytfL, transporter associated domain, ABC transporter) and DNA binding to transcription regulation (transcriptional regulatory protein C terminal). Other genes involved in the active transport of solutes (four genes codifying for bacterial extracellular solute-binding protein, transmembrane transport), an integral membrane component (six codifying for binding-protein-dependent transport system inner membrane component, branched-chain amino acid transport system/permease component, and branched-chain amino acid ABC transporter, permease protein). Also, genes related to metabolism and transport of carbohydrates (glycosyl hydrolase family 36 N-terminal domain) with αgalactosidase activity, and others (FGGY kinase family, raffinose synthase or seed imbibition protein Sip1, ABC-type sugar transport system periplasmic component), uptake and translocation of the essential macronutrient phosphorus (phosphate transporter) (Supplementary Tables S3-S7).

Pangenome Analysis
Pangenome size estimation identified 5943 genes in total based on their distribution in the 46 genomes. Core genes were 1174, soft-core genes were 163, shell genes were 1197, and cloud genes were 3409 (Supplementary Figure S3).
The pangenome was visualized in a matrix that shows a core and accessory genome (Supplementary Figure S4). Therefore, the estimation of the pangenome showed the exponential parameter (b = 0.3047) that suggested to be considered as an open pangenome, which has been increased in size with the addition of new genomes in the analysis ( Figure 5A). The distribution of functional classifications in the COG for the pangenome showed the prevalence of several categories, according to the organization of genes present in the core, accessory, and unique ( Figure 5B).
The unique genes that were strain-specific in the pangenome analysis were in the range of 0-181 with a mean value of 34 genes per genome (Supplementary Figure S5). B. breve 110 1A presented 63 unique genes associated with processes, such as transmembrane transport, membrane components, DNA processes, carbohydrate metabolism, among others (Supplementary Table S8). Thirty genes were uncharacterized protein (47%), of which ten genes had GO terms related to intermembrane integral components and related to DNA binding. There were five genes involved in carbohydrate metabolism (α-galactosidase, β-galactosidase, dTDP-4-dehydrorhamnose reductase, glycosidase) and one gene for transport of maltose (maltose ABC transporter permease). Regarding genes related to DNA process events, genes received GO terms of DNA integration, transposition, and DNA binding (IS3 family transposase, integrase, proteins containing domains for IS or integrase). In addition, other genes related only to DNA binding and regulation of transcription were predicted (putative transcriptional regulator XRE family, transcriptional regulator LacI family). Moreover, six genes associated with the cellular membrane were identified (ABC transporter substrate-binding protein, ABC transporter permease protein probably xylobiose porter, MFS transporter, putative membrane protein, histidine kinase, sortase family protein, LPXTG-motif cell wall anchor domain protein) with functions, according to GO terms, such as an integral component of membrane, membrane transport, plasma membrane.
About the phylogenomic tree based on SNP variants from the core genome, the B. breve 110 1A formed a clade with DRBB26 from the Netherlands (Figure 6). Only in part, B. breve strains from Ireland were grouped based in SNP variants; therefore, there were no evident differences concerning the source of isolation (faecal and gut samples). However, there were few samples from other sources, such as 12L isolate from human milk which formed a clade with gut samples (CNCMI4321 and DRBB30); ACS071VSch8b isolate from the vagina, which formed a clade with a gut sample (DRBB28). The LMC520 isolate formed a clade with faecal samples (BR3, lw01, and JCM 7019). In contrast, the unique sample with clinical origin was FDAARGOS561, rooting the tree, and representatives from the intestine (NCTC11815) and gut (NRBB01) (Figure 6).

Identification of Genes Related to Probiotic Features
The analysis found 18 genes related to adhesion, some of them were sequences codifying for sortases, related as Tad-like protein (A, B, C, Z), TadE, and TadF. Two sequences were codifying for secretion proteins, such as LPXTG-type cell surface and only one sequence for luxS or autoinducer-2 (AI-2) (Supplementary Table S9).
Furthermore, it was identified that 39 CDS were involved in stress resistance, among them, multiple subunits of F0F1-type ATPase (a, b, alpha, beta, delta, epsilon, and gamma). In addition to this, some genes codifying for chaperons and SOS response genes (Supplementary Table S10).
Concerning genes associated with repairing and protection, genes were identified with ten sequences codifying for nucleoside triphosphate pyrophosphohydrolase (mutT) and DNA-binding ferritin-like protein (dps) methionine sulfoxide reductase (msr). Moreover, other sequences were found codifying for copper-transporting ATPase (copA), such as subunit A of the exonuclease ATP-binding cassette ABC complex (uvrA) ( Supplementary  Table S11). Additionally, there were 11 genes involved in the production of vitamins, such as B2, B9, and B12 (Supplementary Table S12).

Evolutionary Relationships from Phylogenomic Analysis
The B. breve strain 110 1A taxonomy was confirmed based on the orthologous multiple loci phylogenomic analysis showing a close relationship to other B. breve strains. Moreover, our results revealed DRBB26, isolated in the Netherlands from infant faeces as well (Table 1), to be the most closely related strain from 110 1A . Interestingly, the core genome SNP analysis also supported this finding, although no correlation was evident between the strain clusters and the countries of origin. It is important to notice that some of the B. breve clades observed in the multilocus approach were in contrast to previous phylogenomic analysis [51]. Although some clades remain as in the previous study, such as the clonal groups 1, 2, and 3 (Figure 2), the DRBB26 strain formed a different clade with B. breve NRBB01 in contrast to our study. In this context, B. breve 110 1A phylogenetic group could be replaced. As was shown in phylogenomic results, the closer relationship of B. breve strains, B. breve fell into the B. longum clade based on orthologous genes and a multilocus approach [52,53].
In addition, only three strains with complete genomes were confirmed to present probiotic properties in previous experimental studies, UCC2003 [54], lw01 [55,56], and BR3 [57,58]. Therefore, the phylogenetic analysis may not provide further insights regarding the probiotic profile of B. breve 110 1A as these strains do not show a close relationship with its clade. However, our study suggests that JCM 7019, MGYG-HGUT-02469, and JSRL01 strains possibly present beneficial properties due to their proximity with the probiotic strains, although experimental studies should be performed as well.
Genomic rearrangements are a very important feature regarding the safety of probiotic strains as it might be useful to detect genetic plasticity and instability. Genetic alterations associated with the growth of bifidobacteria, such as small deletions throughout the genome, make them able to adapt to extreme conditions, such as in the gastrointestinal tract or during a fermentation process [59]. The Bifidobacterium genus has been reported to present a highly conservative level of synteny [60], despite there being a degree of diversity related to each species. As expected, our study has shown the synteny of gene blocks was kept in most of the evaluated genomes, including the B. breve 110 1A , indicated by the locally collinear blocks (LCBs) on multiple alignment analysis. Nevertheless, three genomes presented a different genome organization, such as inversions on regions of different sizes. Two of these genomes (ACS-071-V-Sch8b and JCM7017) were reported previously in a comparative study of this species [52], suggesting that Bifidobacterium breve exhibits genomic regions which are affected by events of inversions, insertions, and deletions [60]. For example, urea metabolism genes and the way they utilize carbon from different sources (such as milk and plants) contribute to the classification of these subspecies of B. longum [61]. Prediction of mobile elements, bacteriocins, and CRISPR-Cas systems in Bifidobacterium breve 110 1A .
Regarding mobile elements, it is known that plasmids provide new characteristics to probiotic bacteria that increases the possibility to survive in other environmental conditions, acquire additional properties in bacterial metabolism, adherence, and antibiotic resistance, being the last one, is an undesired attribute for a probiotic bacterium [14]. Although plasmids were not detected in B. breve 110 1A , they were previously reported in the genus Bifidobacterium [62]. Regarding B. breve, plasmids were detected in 40% of 42 genomes analyzed [63]; in contrast, they were absent in 106 evaluated isolates of this species in another study [64]. A reduced number of plasmids has been reported for B. breve strains with complete genomes, particularly only the strains, NCFB 2258 [65] and BR3 [57].
Concerning the predicted prophage in the genome of B. breve 110 1A , the region was classified as incomplete and did not contain typical genes that codify for structural proteins (such as the tail, capsid), DNA regulation, integrases, lysis, and others for a functional bacteriophage. Therefore, this prophage region could be considered as a defective prophage. It presented some genes involved in response to oxidative stress (Fe-S cluster assembly ATPase SufC, cysteine desulfurase, SUF system NifU family Fe-S cluster assembly protein), biosynthesis of glycogen (glucose-1-phosphate adenylyltransferase), among other processes (RNA methyltransferase, histidine triad domain protein, PhoH family protein). Prophages could represent a future lysis event for the bacteria, or they could provide some additional properties as a double-edged sword [66]. The frequency of integrated prophages is standard in bacterial genomes [67]. A considerable part of them is defective, possibly due to the phage domestication event [68]. Previous studies have reported the mentioned IS families in the Bifidobacterium genus [52,69]. For example, IS3 was referred to as the IS family with the most widespread and significant number being identified in Bifidobacterium [69]. Contrastingly, the study of the mobilome of B. breve showed the IS30 as the most frequent IS family [52].
Genes codifying for bacteriocins were not identified in the analyzed B. breve 110 1A . These molecules allow the competence against other bacteria, being specific for an intragenus or less clear for inter-genus strains, in the gut environment; it is a desirable feature for probiotic bacteria. Some studies have reported some produced bacteriocins in the genus Bifidobacterium [9,70]. However, in other studies, the production of these compounds in Bifidobacterium was relatively rare [71] or not reported in gut samples [72]. Although any gene for bacteriocin was not predicted, the antagonism showed in in vitro [24] could be due to other mechanisms of the strain that were evident in probiotic bacteria, such as adhesive capability, which allows for the inhibition of pathogen colonization [73] or the reduction of pH in the environment with the glucose fermentation and production of short-chain fat chain (SCFA) seen in bifidobacteria [74,75].
At the same time, CRISPR regions were identified. These components are essential for the bacteria to deal with phage sequences from the environment, especially in the gut tract where there is a viral community. Phages can lyse bacteria, and it can drastically affect the survival of the bacterial population in the production of probiotics, fermentation time, taste, among other parameters [76]. Considering that phages are resistant to the pasteurization process and its elimination is difficult, the search of probiotics with the ability to be protected from phages and other DNA invaders, such as plasmids [16]. Our results identified three CRISPR loci in the B. breve 110 1A genome. These CRISPR regions were considered not functional because there are no Cas proteins; it was impossible to determine the type and subtype of the systems due to the necessary presence of Cas proteins for its classification. In a previous study, the occurrence of CRISPR-Cas systems in the genus Bifidobacterium was 77% of the 48 analyzed species, representing a high presence in the genus [77].

Assessing the Risk of Antibiotic Resistance Genes in Probiotic Bacteria
The antibiotic resistance genes in the B. breve 110 1A were ileS, rpoB, and erm(X). The ileS confers resistance to mupirocin [78] and rpoB gene confers resistance to rifampicin [79]. Both genes were detected in the other 45 genomes of B. breve to extend this predictive analysis. These results were supported by other studies, where resistance to mupirocin is considered as intrinsic resistance in bifidobacteria [78,80]. Using different resistance databases, the erm(X) gene was identified in B. breve 110 1A . This gene can enzymatically modify the DNA sequence of the 23S rRNA gene by methylation to avoid the binding of macrolides, lincosamides, streptogramin B, and conferring resistance [81].
The presence of homologous erm gene was reported in B. longum and some strains of B. breve (BR-14 and DPC6330), and it was suggested the acquisition by HGT [82]. More specifically, erm(X) was reported in B. thermophilum and B. animalis subsp. lactis strains from pigs, founded in the Tn5432-like transposon, and similar transposon found in the pathogenic bacteria, such as Corynebacterium [83]. This antibiotic resistance gene was found in two strains of B. breve in another study [84]: NRBB51 (three copies) and DRBB26 (two copies) interleaved by genes codifying transposases. In the present study, only a transposase IS1249 (IS256 family) was detected in the vicinity of erm(X), and both within the RI2 could suggest that the island was acquired by HGT. Contrasting to these in silico results with the experimental essays of antibiotic susceptibility in B. breve 110 1A [24], this suggests that the identified erm(X), in this study, could be the responsible gene for the showed resistance to erythromycin. The species B. breve is currently considered with QPS status, Quality Presumption of Safety [85], due to not being related to any infective process. Although the presence of genes of resistance is not considered a safety issue per se, it is necessary to determine the possible transference of this resistance [79].

Genome Plasticity Analysis
The analysis of genomic islands (GEIs) prediction in B. breve 110 1A revealed seven islands (Figure 4, Tables S1 and S2). Gene content GEI1-GEI5 islands were concerned with transport proteins (ytfL, transporter associated domain, ABC transporter), membrane components, DNA binding (two genes helix-turn-helix lactose operon repressor), C-terminal transcriptional regulatory protein (two genes), and the metabolism of carbohydrates, such as genes related to galactose and raffinose that could contribute to these other functions of the bacteria.
Other genes involved in the active transport of solutes (four codifying for bacterial extracellular solute-binding protein, transmembrane transport), are an integral component of the membrane (six codifying for binding-protein-dependent transport system inner membrane component, branched-chain amino acid transport system/permease component and branched-chain amino acid ABC transporter, permease protein). In addition, genes related to metabolism and the transport of carbohydrates (glycosyl hydrolase family 36 N-terminal domain) with α-galactosidase activity, and others (FGGY kinase family, raffinose synthase or seed imbibition protein Sip1, ABC-type sugar transport system periplasmic component), uptake and translocation of the essential macronutrient phosphorus (phosphate transporter). Among other processes (lysR substrate binding domain, DNAbinding transcription factor 2, sugar-phosphate isomerase involved in capsule formation) (Tables S3-S7).

Pangenome Analysis
In a previous study, an analysis with MCL to perform a pangenome analysis with B. breve, the estimation of pangenome size was 3667 gene families, and the core genome was composed of 1307 gene families using 13 genomes (including 8 complete and 5 draft genomes) [52]. In another study, the pangenome size estimation was 6138 gene families, and the core genome size was 1282 gene families, using 73 genomes with 37 complete and 36 drafts [51]. In a recent study, the species' pangenome was evaluated with 55 genomes, with 46 drafts originally from China that were composed of 6707 gene families and the core genome composed of 1111 gene families [86]. Considering our results with 46 complete genomes of B. breve, with a pangenome size of 5943 genes and a core genome size of 1174, we are closer to the second study [84], even with the considerable difference in the number of used genomes. In this respect, genome assembly and annotation are two main factors determining the performance of a pangenome analysis and an adequate number of complete genomes [87]. Furthermore, it is essential to mention that draft genomes are helpful in some studies; however, due to the underrepresentation of some genes, it could affect comparative studies because of low-read coverage or assembly errors [88].
Concerning the result of the matrix representation of the pangenome, the similarity of a group composed of: (i) NRBB02, NRBB8, NRBB18, NRBB19, NRBB20, NRBB27, and NRBB49; (ii) DSM 20213, NCTC11815, NRBB01, and FDAARGOS561; and (iii) CNCM I4321 and DRBB30 by the ANI values = 0.9999 in each group. Moreover, the first two groups were reported as clonal strains [84]. Considering the estimation of the curve parameters of the pangenome suggested, it as an open pangenome because it tends to continue to increase in size with the addition of new genomes in the simulations ( Figure 5). Our results agree with a previous study with thirteen genomes that considered the pangenome as opened [52] and the seventy-three genomes as opened [51] of the B. breve pangenome. However, the last reference [86] considered the pangenome as not entirely closed but gradually saturated. The distribution of COG showed a high percentage of genes related to replication, recombination, and repair processes concentrated in the core and accessory genome. In the second place, many genes are involved in carbohydrate transport, and the metabolism in the core and accessory genome. The COG items were distributed according to basic (core genome) and adaptive processes (accessory genome).
According to the phylogenomic tree based on SNPs of the core genome, there was no marked distribution of the strains according to geographical distribution and source of isolation. Referring to B. breve 110 1A , the closest strain was DRBB26 isolated from the gut in the Netherlands and without additional information. A comparative perspective and the pangenome analysis allow for its segmentation and the identification of core, accessory, and unique genes [89]. This last part is relevant to know the strain-specific genetic repertoire that could represent adaptive features to different niches, metabolic advantages, or even genes related to pathogenicity. The comparative approach has been previously used in industrial starter cultures [21]. The results about mean value of unique genes per genome, in this study, was of 34 genes considering 46 complete genomes; contrasting with a previous study for the species of 53 unique genes per genome using eight complete genomes [52]. From the pool of unique genes, 47% were considered uncharacterized proteins in this study that coincide with the predominant presence of genes with this label in a previous B. breve genomic comparison [52]. More generally, genome projects identified hypothetical proteins that have a range of 30-40% of total identified proteins in prokaryotes [90] that partially interfere in the characterization of the strain. Moreover, mobile elements were present in minor frequency [52], represented by transposases and integrases. This could be related to the unique genes and the acquisition by HGT. Therefore, unique genes were involved in the metabolism and transport of carbohydrates that could represent additional features that allow the utilization of a wide variety of sources of energy. Especially, galactosidase and β -galactosidase are enzymes of interest because they are used commercially to improve symptoms of lactose intolerance and use this carbon source [91]. These enzymes were identified in bifidobacterial, and probiotics were used to alleviate lactose intolerance [92,93]. Other genes codified for the sortase family protein and LPXTG-motif cell wall anchor domain protein, which were involved in cellular adhesion, represent a desirable characteristic in potential candidates for probiotic bacteria in the gut colonization stage [17]. Moreover, an MFS transporter was also predicted in the unique gene group, and its overexpression was reported under acid bile exposition [94].

Identification of Genes Related to Probiotic Features in B. breve 110 1A
Some of these genes were related to adhesion mechanisms, among them: sortases (srtA1, srtA2, and srtA3) LPXTG-type proteins that probably have a role in the attachment to cells or mucus in the gut [49]. Another identified protein was Tad (ABCEFZ), essential for the pili structure necessary for bacterial colonization [95]. Therefore, luxS has been involved with acid tolerance and adherence to intestinal epidermal cells in Lactobacillus plantarum [96]. Acid and bile resistance is common in Lactobacillus strains, as in the work of Chou and Weimer, causing the bacteria to survive and adapt to stressful conditions, being potentially probiotic targets [97].
Some genes related to stress resistance, such as F0F1-type ATPase, work as proton pumps in Gram-positive bacteria. These transcription activities were evidenced in acid conditions in Bifidobacterium species [98][99][100]. Genes also identified (dnaK, groEL, groES) in codifying for chaperones are involved in a general response to stress by protection, removal of damaged proteins, and other related functions. Moreover, dnaJ and grpE, identified in the analysis, have shown response (upregulation) to acid environments [101]. Some other genes (lexA, ruvA, recA, and mutY) are involved in SOS response. The lexA has shown to be induced under temperature stress in B. breve, and recA showed an upregulated response in heat stress [102]. Moreover, reduced survival in L. reuteri was demonstrated in mutation in clp, in exposure to bile [103], indicating its importance for survival in conditions of acidity and general stress. In addition, it was identified that 10 CDS with functions were related to the repair and protection of DNA or proteins, such as DNA-binding ferritin-like protein and nucleoside triphosphate pyrophosphohydrolase (mutT) that act in the reparation due to the oxidative damage removing hydroxyl radicals [49]. Other identified genes, such as mutants of methionine sulfoxide reductase, have shown a reduced capacity in stress conditions in L. reuteri 100-23 [104], and mutants of copper-transporting ATPase (copA) in L. plantarum WCFS1 have shown a reduced competitive ability in mouse [105]. Therefore, copA is involved in the nucleotide reparation in acid conditions in L. helveticus CNBL 1156 [106]. Furthermore, there were 11 genes involved in the biosynthesis of vitamins. Some of them form parts of operons for specific vitamins. The integrity of the operons for vitamins is of interest for considering the functionality of its production by the B. breve 110 1A .

Conclusions
Probiotics are known for their beneficial health properties and there is an increasing tendency to look for and select new candidates for human usage. The present study applied a comparative genomic approach to explore the features of the first complete genome of B. breve from Brazil, strain 110 1A , isolated from healthy children's faeces. The analysis of this strain showed positive characteristics to be considered as a candidate based on the identified genes related to probiotic properties (adhesion, resistance to stress for acidity and heat, and production of vitamins) that suggest good survival ability in the gastrointestinal tract. Furthermore, some unique genes for B. breve 110 1A are involved in the adhesion and metabolism of some carbohydrates representing the desired condition for bacteria to consume different sources of energy from the gut niche. Moreover, the antagonism against pathogens observed in previous in vitro studies seems be promoted by other factors (i.e., competence by adherence, reduction of pH) due to the absence of bacteriocin genes. On the other hand, considering the safety criteria, a crucial point for further elucidation is whether the transference of the antibiotic resistance erm(x) gene in exploratory essays is possible. Moreover, the evaluation of unique genes with unknown functions and possible pathways should be explored. In this context, in vitro and in vivo studies with B. breve 110 1A should provide supportive evidence.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/bacteria1030013/s1, Figure S1: Representation of multiple sequence alignments of whole genomes of 46 Bifidobacterium breve strains shows conserved synteny and collinearity among gene blocks; Figure S2: Representation of the incomplete prophage region in Bifidobacterium breve 110 1A . The prophage showed 9 CDS: 6 CDS in dark turquoise identified as phagelike protein and 3 CDS in light green as a hypothetical protein; Figure S3: Pie chart representing the number of genes shared in the core, softcore, shell and cloud genome for the 46 Bifidobacterium breve; Figure S4: Matrix representation of the pangenome based on the presence-absence of genes of the pan-genome; Figure S5: Plot diagram showing the core-genome size of all 46 genomes of Bifidobacterium breve (central area) and unique genes for each strain; Table S1: Functional annotation of genes in RI1 of B. breve 110 1A ; Table S2: Functional annotation of genes in RI2 of B. breve 110 1A ; Table S3: Functional annotation of genes in GI1 of B. breve 110 1A ; Table S4: Functional annotation of genes in GI2 of B. breve 110 1A ; Table S5: Functional annotation of genes in GI3 of B. breve 110 1A ; Table S6: Functional annotation of genes in GI4 of B. breve 110 1A ; Table S7: Functional annotation of genes in GI5 of B. breve 110 1A ; Table S8: Functional annotation information about the 63 unique genes of B. breve 110 1A ; Table S9: Identified genes potentially involved in adhesion mechanisms in B. breve 110 1A ; Table S10: Identified genes potentially involved in resistance mechanisms to stress in B. breve 110 1A ; Table S11: Identified genes potentially involved in repair and protection of DNA and protein mechanisms in B. breve 110 1A ; Table S12

Conflicts of Interest:
The authors declare no conflict of interest.