In Silico Comparative Genomic Analysis Revealed a Highly Conserved Proteolytic System in Lactobacillus delbrueckii

Lactobacillus delbrueckii, the type species of the genus Lactobacillus, is widely recognized as the primary starter culture in the dairy industry due to its proteolytic activity, which enables it to growth in milk. In this study, a comprehensive genomic analysis of the proteolytic system was conducted on L. delbrueckii strains. The analysis included 27 genomes of L. delbrueckii, with a specific focus on the key enzyme involved in this system, the cell envelope-associated proteinase (CEP). The amino acid sequences, as well as the protein-structure prediction of the CEPs, were compared. Additionally, syntenic analysis of the genomic locus related to the CEPs revealed high conservation in L. delbrueckii subsp. bulgaricus strains, while L. delbrueckii subsp. lactis strains exhibited greater variability, including the presence of insertion sequences, deletions, and rearrangements. Finally, the CEP promoter region and putative regulatory elements responsible for controlling the expression of the proteolytic system in lactobacilli were investigated. Our genomic analysis and in silico characterization of the CEPs contribute to our understanding of proteolytic activity and the potential applications of these lactic acid bacteria in the dairy industry. Further research in this area will expand our knowledge and potential practical uses of these findings.


Introduction
Lactic acid bacteria (LAB) have a significant impact on the food industry, fulfilling crucial roles as probiotics, postbiotics, and starter cultures in the production of fermented foods. The genus Lactobacillus includes the type species Lactobacillus delbrueckii [1,2] which consists of six recognized subspecies: Lactobacillus delbrueckii subsp. delbrueckii, L. delbrueckii subsp. bulgaricus, L. delbrueckii subsp. indicus, L. delbrueckii subsp. lactis, L. delbrueckii subsp. Sunkii, and L. delbrueckii subsp. jakobsenii. L. delbrueckii subsp. lactis was the first subspecies isolated by G. Leichmann (1896) from a dairy product [1]; this subspecies is used as a starter culture for the elaboration of a variety of fermented dairy products, such as fermented sour milks, mozzarella, and Swiss and Italian cheeses [1]. L. delbrueckii subsp. bulgaricus, which was initially isolated from a Bulgarian milk (1919) [1], is used in combination with Streptococcus thermophilus as a starter culture for commercial yogurt production, as well as in the manufacture of Swiss and Italian cheeses. On the other hand, L. delbrueckii subsp. indicus was first isolated from a dairy product in India [1], while L. delbrueckii subsp. delbrueckii, L. delbrueckii subsp. sunkii, and L. delbrueckii subsp. jakobsenii were isolated

Phylogenetic Analysis
The ANIm analysis of the 27 selected strains (Supplementary Figure S1) revealed identity percentages greater than 97%. The resulting ANI values ranged from 97.28 to 99.98%, falling within the generally recommended ranges for prokaryotic species [15,16]. Notably, the strains were categorized into two distinct groups. The first group consisted exclusively of strains belonging to the species L. delbrueckii subsp. bulgaricus, while the second group comprised L. delbrueckii subsp. delbrueckii and L. delbrueckii subsp. lactis strains. Surprisingly, among the 15 strains of L. delbrueckii subsp. bulgaricus studied, the strain ND02 was grouped with the second cluster, demonstrating a high degree of identity (99.97%) with L. delbrueckii subsp. lactis CIDCA133. Additionally, both strains shared an identical plasmid (100% identity and 0 gaps), annotated as CP002342.1 and CP065514.1 in ND02 and CIDCA133, respectively. These findings suggest that the strain ND02 should be classified as belonging to the L. delbrueckii subsp. lactis subspecies rather than L. delbrueckii subsp. bulgaricus.
To further elucidate the subspecies to which strain ND02 belongs, an MLSA was conducted. The 27 selected strains were effectively segregated into two distinct clusters ( Figure 1). The first cluster exclusively comprised strains belonging to L. delbrueckii subsp. bulgaricus, with the exception of the ND02 strain. In contrast, the second cluster encompassed strains of L. delbrueckii subsp. delbrueckii and L. delbrueckii subsp. lactis. Moreover, no discernible differences were observed between the strain ND02s and CIDCA133, both of which were positioned within the second cluster. The ANIm analysis of the 27 selected strains (Supplementary Figure S1) revealed identity percentages greater than 97%. The resulting ANI values ranged from 97.28 to 99.98%, falling within the generally recommended ranges for prokaryotic species [15,16]. Notably, the strains were categorized into two distinct groups. The first group consisted exclusively of strains belonging to the species L. delbrueckii subsp. bulgaricus, while the second group comprised L. delbrueckii subsp. delbrueckii and L. delbrueckii subsp. lactis strains. Surprisingly, among the 15 strains of L. delbrueckii subsp. bulgaricus studied, the strain ND02 was grouped with the second cluster, demonstrating a high degree of identity (99.97%) with L. delbrueckii subsp. lactis CIDCA133. Additionally, both strains shared an identical plasmid (100% identity and 0 gaps), annotated as CP002342.1 and CP065514.1 in ND02 and CIDCA133, respectively. These findings suggest that the strain ND02 should be classified as belonging to the L. delbrueckii subsp. lactis subspecies rather than L. delbrueckii subsp. bulgaricus.
To further elucidate the subspecies to which strain ND02 belongs, an MLSA was conducted. The 27 selected strains were effectively segregated into two distinct clusters (Figure 1). The first cluster exclusively comprised strains belonging to L. delbrueckii subsp. bulgaricus, with the exception of the ND02 strain. In contrast, the second cluster encompassed strains of L. delbrueckii subsp. delbrueckii and L. delbrueckii subsp. lactis. Moreover, no discernible differences were observed between the strain ND02s and CIDCA133, both of which were positioned within the second cluster.

Peptidases
The pangenome analysis of the 27 strains revealed a total of 2818 genes. Among them, the core-genome consisted of 894 genes, the softcore-genome (25 genomes) comprised 326 genes, the shell-genome (3-25 genomes) contained 996 genes, and the cloud-genome (1-2 genomes) included 602 genes. Within the cloud-genome, 309 genes were found to be unique to specific strains. Considering the proteolytic system of LAB, we identified several peptidase genes, pepC, pepD, pepF, pepI, pepM, pepN, pepO, pepQ, pepR, pepT, pepV, and pepX, in the core-genome, ( Table 1). The genes pepA, pepG/E, and pepP were part of the softcore-genome but were absent in the KCTC 13731, DSM 20072, and 2038 strains, respectively ( Table 1). The pepL gene was only found in the strains CIDCA133, CRL 581, DSM 20074, KCCM 34717, KCTC 13731, KCTC 3034, MAG_rmk202_Idel, NBRC 3202, ND02, and TUA4408L. Additionally, all evaluated strains possessed the pcp (OG0000669) and dpp (OG0000898) genes, encoding pyroglutamyl-peptidase I and dipeptidyl-peptidase VI, respectively. Furthermore, seven putative peptidases genes (OG0000061, OG0000393, OG0000665, OG0000907, OG0000926, OG0000928, and OG0000957) were identified, although their specific assignments were not determined. All L. delbrueckii strains evaluated in this study bear only one prt gene in each genome, named prtB in the strains belonging to the L. delbrueckii subsp. bulgaricus subspecies and prtL in the L. delbrueckii subsp. lactis and L. delbrueckii subsp. delbrueckii subspecies. This prt gene encodes a CEP, which generally possesses an amino acid sequence of about 1924 residues. The enzyme is synthesized as a preproprotein and matures after the removal of the PrePro domain [17], which consists of a putative 34-amino-acid signal sequence and a 158-amino-acid prosequence, with the predicted cleavage site between T192 and D193. Therefore, the mature proteinase contains approximately 1733 residues. The N-terminal region of the mature PrtL proteinase (497 amino acids) from the CRL 581 strain corresponds to the catalytic domain (PR), which exhibits similarity to subtilisin-type serine proteinases known as subtilases. Thus, these proteinase sequences can be classified within this proteinase family [17]. This catalytic domain is characterized by a catalytic triad composed of D30, H94, and S425 ( Figure 2). However, in L. delbrueckii subsp. delbrueckii KCTC 13731, L. delbrueckii subsp. delbrueckii NBRC 3202, and L. delbrueckii subsp. lactis KCTC 3035, there is a premature termination codon, resulting in a truncated mature proteinase of approximately 155, 155, and 843 residues, respectively (Supplementary Figure S2). The proteinase from the KCTC 13731 and NBRC 3202 strains would not be functional since they lack the S425 amino acid, which is part of the catalytic triad and essential for enzyme activity (Supplementary Figure S2). Similarly, the putative proteinase in the KCTC 3035 strain lacks the D30 and H94 amino acids that are crucial components of the active site of the enzyme (Supplementary Figure S2). All L. delbrueckii strains evaluated in this study bear only one prt gene in each genome, named prtB in the strains belonging to the L. delbrueckii subsp. bulgaricus subspecies and prtL in the L. delbrueckii subsp. lactis and L. delbrueckii subsp. delbrueckii subspecies. This prt gene encodes a CEP, which generally possesses an amino acid sequence of about 1924 residues. The enzyme is synthesized as a preproprotein and matures after the removal of the PrePro domain [17], which consists of a putative 34-amino-acid signal sequence and a 158-amino-acid prosequence, with the predicted cleavage site between T192 and D193. Therefore, the mature proteinase contains approximately 1733 residues. The N-terminal region of the mature PrtL proteinase (497 amino acids) from the CRL 581 strain corresponds to the catalytic domain (PR), which exhibits similarity to subtilisin-type serine proteinases known as subtilases. Thus, these proteinase sequences can be classified within this proteinase family [17]. This catalytic domain is characterized by a catalytic triad composed of D30, H94, and S425 ( Figure 2). However, in L. delbrueckii subsp. delbrueckii KCTC 13731, L. delbrueckii subsp. delbrueckii NBRC 3202, and L. delbrueckii subsp. lactis KCTC 3035, there is a premature termination codon, resulting in a truncated mature proteinase of approximately 155, 155, and 843 residues, respectively (Supplementary Figure S2). The proteinase from the KCTC 13731 and NBRC 3202 strains would not be functional since they lack the S425 amino acid, which is part of the catalytic triad and essential for enzyme activity (Supplementary Figure S2). Similarly, the putative proteinase in the KCTC 3035 strain lacks the D30 and H94 amino acids that are crucial components of the active site of the enzyme (Supplementary Figure S2).  A phylogenetic analysis of CEPs from different L. delbrueckii strains is shown in Figure 3. The proteinase sequences were clustered into three main groups. The first group consisted of putative PrtB sequences, which corresponded to the 14 strains of L. delbrueckii subsp. bulgaricus. The PrtB sequences exhibited more than 98.8% identity among them. 6

of 16
The second cluster included proteinases from the strains ND02 and CIDCA133, sharing the same predicted proteinase with 100% identity. Finally, the remaining strains belonging to L. delbrueckii subsp. delbrueckii (TUA4408L and ATCC 9649) and L. delbrueckii subsp. lactis (six strains) were clustered together. The analysis of alignments revealed that position 282 of the CEPs in L. delbrueckii subsp. lactis and L. delbrueckii subsp. delbrueckii, referred to as PrtL, is characterized by a serine residue, while PrtB (specific to L. delbrueckii subsp bulgaricus) exhibits a cysteine residue at the same position (Supplementary Figures S3  and S4). Similarly to PrtL, the ND02 strain also displays a serine residue at position 282 (position 90 in the mature CEP). An analysis carried out using Missense3D [18] to predict the structural changes introduced by an amino acid substitution showed that this substitution did not alter the conformational structure of the enzyme (Supplementary Figure S3).
A phylogenetic analysis of CEPs from different L. delbrueckii strains is shown in Figure 3. The proteinase sequences were clustered into three main groups. The first group consisted of putative PrtB sequences, which corresponded to the 14 strains of L. delbrueckii subsp. bulgaricus. The PrtB sequences exhibited more than 98.8% identity among them. The second cluster included proteinases from the strains ND02 and CIDCA133, sharing the same predicted proteinase with 100% identity. Finally, the remaining strains belonging to L. delbrueckii subsp. delbrueckii (TUA4408L and ATCC 9649) and L. delbrueckii subsp. lactis (six strains) were clustered together. The analysis of alignments revealed that position 282 of the CEPs in L. delbrueckii subsp. lactis and L. delbrueckii subsp. delbrueckii, referred to as PrtL, is characterized by a serine residue, while PrtB (specific to L. delbrueckii subsp bulgaricus) exhibits a cysteine residue at the same position (Supplementary Figures  S3 and S4). Similarly to PrtL, the ND02 strain also displays a serine residue at position 282 (position 90 in the mature CEP). An analysis carried out using Missense3D [18] to predict the structural changes introduced by an amino acid substitution showed that this substitution did not alter the conformational structure of the enzyme (Supplementary Figure  S3). Evolutionary relationships were inferred using the method of joining neighbors. Evolutionary distances were calculated using the p distance method and are expressed as the number of different amino acids per site. The PrtP proteinase from Lactococcus lactis subsp. cremoris SK11 was used as an outgroup. The percentages of identity between the proteinases of each strain are shown in the Table. L. delbrueckii subsp. bulgaricus strains are marked with a blue dot, L. delbrueckii subsp. delbrueckii strains with a green dot, and L. delbrueckii subsp. lactis with a red dot.
To characterize the genomic locus of CEPs in L. delbrueckii, a syntenic analysis was performed. An overview of the genomic context of CEPs is presented in Figure 4, with more detailed information available in Supplementary Figures S5-S7. Synteny is highly conserved among all 14 strains of L. delbrueckii subsp. bulgaricus subspecies (Supplementary Figure S5). The prt gene is consistently located downstream of the acetyltransferase To characterize the genomic locus of CEPs in L. delbrueckii, a syntenic analysis was performed. An overview of the genomic context of CEPs is presented in Figure 4, with more detailed information available in Supplementary Figures S5-S7. Synteny is highly conserved among all 14 strains of L. delbrueckii subsp. bulgaricus subspecies (Supplementary Figure S5). The prt gene is consistently located downstream of the acetyltransferase and aspartate ammonium lyase genes (LBLM1_05370 and asnA, respectively) and upstream of the patC (cystathionine beta-lyase) and htpx2 (heat shock protein) genes (  and aspartate ammonium lyase genes (LBLM1_05370 and asnA, respectively) and upstream of the patC (cystathionine beta-lyase) and htpx2 (heat shock protein) genes ( Figure  4, Supplementary Figures S5-S7).  Figure S7). In L. delbrueckii subsp. lactis DSM 20072, an insertion sequence was found between the patC and htpX2 genes. This insertion sequence contains imperfect reverse repeats of 26 bp flanked by direct repeats rich in A + T of 8 bp, and encodes a putative transposase consisting of 455 amino acids ( Figure 4). Both the repeats and peptide sequence perfectly match what was previously reported by Ravin and Alatossava [19] for ISLdl1, exhibiting 100% identity in the amino acid sequence. Additionally, a second insertion sequence was located at the 5' end of this sequence, showing 98% identity with the sequence described for IS110 by Bruton and Chater [20].

Analysis of Promoter Regions and Putative Regulators
As mentioned before, CEPs exhibit a high degree of identity among them. However, there are significant variations in proteolytic activity and the strength of repression when strains are grown in the presence of peptides [4,7]. For instance, despite having 99% identity in their amino acid sequences, the CRL 581 strain shows ten-fold higher activity compared to the DSM 20072 strain. These two proteinases differ by only four amino acids out of a total of 1924 residues. These substitutions do not affect the active site environment of the enzyme (amino acids 1 to 497). Specifically, the substitutions from the strains CRL 581 to DSM 20072 are D606 to E606, T610 to A610, G638 to V638, and N658 to D658. Structural prediction analyses [18] indicate that these amino acid substitutions do not impact the three-dimensional structure of the proteinase . Therefore, an in silico analysis was per-  Figure S7). In L. delbrueckii subsp. lactis DSM 20072, an insertion sequence was found between the patC and htpX2 genes. This insertion sequence contains imperfect reverse repeats of 26 bp flanked by direct repeats rich in A + T of 8 bp, and encodes a putative transposase consisting of 455 amino acids (Figure 4). Both the repeats and peptide sequence perfectly match what was previously reported by Ravin and Alatossava [19] for ISLdl1, exhibiting 100% identity in the amino acid sequence. Additionally, a second insertion sequence was located at the 5' end of this sequence, showing 98% identity with the sequence described for IS110 by Bruton and Chater [20].

Analysis of Promoter Regions and Putative Regulators
As mentioned before, CEPs exhibit a high degree of identity among them. However, there are significant variations in proteolytic activity and the strength of repression when strains are grown in the presence of peptides [4,7]. For instance, despite having 99% identity in their amino acid sequences, the CRL 581 strain shows ten-fold higher activity compared to the DSM 20072 strain. These two proteinases differ by only four amino acids out of a total of 1924 residues. These substitutions do not affect the active site environment of the enzyme (amino acids 1 to 497). Specifically, the substitutions from the strains CRL 581 to DSM 20072 are D606 to E606, T610 to A610, G638 to V638, and N658 to D658. Structural prediction analyses [18] indicate that these amino acid substitutions do not impact the threedimensional structure of the proteinase. Therefore, an in silico analysis was performed on the promoter region of the prt genes. Furthermore, the presence of transcriptional regulators associated with the lactobacilli proteolytic system (BCAAR, PrcR, and YebC), along with their consensus sequences in the prt gene promoter region, was also examined.
These in silico analyses of the proteolytic system of L. delbrueckii were primarily focused on CEPs, which serve as the key enzymes in the system and act as the primary drivers of proteolytic activity.
The core promoter showed a high degree of conservation among the L. delbrueckii strains analyzed (Table 2, Figure 5). Only the promoter region associated with the proteinase of L. delbrueckii subsp. delbrueckii ATCC 9649 exhibited the substitution of guanine with thymidine at the -35 element. With the exception of L. delbrueckii subsp. bulgaricus MN-BM-F01 and L. delbrueckii subsp. bulgaricus VHProbi R03, which possess an additional nucleotide between the UP element and the -35 element, the extended promoter region remained conserved in all the examined strains of L. delbrueckii subsp. bulgaricus. In contrast, the UP element was less conserved in L. delbrueckii subsp. lactis strains, with small variations in 1-2 nucleotides. Based on these findings, a consensus sequence for the prt promoter was established ( Figure 6).  Nucleotides that differ from those of the promoter described for the NCDO1489 strain are shown in blue. 1 distance between the UP element and the -35 element.; 2 distance between the -35 element and -10 element.     A pangenomic analysis to identify potential regulators, including the Lactobacillus helveticus BCAA-responsive transcriptional regulator (BCAAR), the Lactobacillus delbrueckii DNA-binding protein YebC, and the Lactobacillus casei OmpR family response regulator, PrcR, was performed. Remarkably, BCAAR (OG0000219), YebC (OG0000477), and PrcR (OG0000151) were identified in all 27 L. delbrueckii strains examined. However, it is noteworthy that the consensus sequences described for BCAAR and PrcR were not identified in the promoter region of the analyzed prt genes. On the other hand, studies investigating the consensus binding region of YebC have not been conducted yet. This finding highlights the need for further investigation in this specific area, offering new opportunities for future research and exploration.

Discussion
Phylogenetic analysis using ANIm and MLSA revealed a clear distinction between L. delbrueckii subsp. bulgaricus and L. delbrueckii subsp. lactis. Surprisingly, the ND02 strain, initially identified as L. delbrueckii subsp. bulgaricus based on 16S RNA analysis and sugar assimilation profiles, was found to be closely related to the L. delbrueckii subsp. lactis CIDCA133 strain. Moreover, the ND02 strain has a considerably larger genome size (2.13 Mb) compared to the reported genome size of the L. delbrueckii subsp. bulgaricus subspecies (1.82-1.89 Mb), aligning more closely with the genome size of the L. delbrueckii subsp. lactis subspecies (2.00-2.26 Mb). Liu et al. (2010) [11] conducted comparative studies on the proteolytic system of 22 LAB strains, which revealed that CEPs were present in only a few strains. In contrast, various peptidases genes, such as pepC, pepN, pepM, pepX, pepQ, pepO, and pepV, were found in all LAB genomes. Although comparative genomics approaches can distinguish different subgroups of peptidases, differences in enzyme specificity between these subgroups remain unclear.
The advent of advanced high-throughput sequencing techniques has led to a vast abundance of sequenced bacterial genomes becoming easily accessible. Therefore, this study aimed to analyze the genomic diversity of the proteolytic system, focusing specifically on the L. delbrueckii species. Through the analysis of 26 L. delbrueckii strains, a substantial repertoire of peptidase enzymes similar to those found in L. delbrueckii subsp. lactis CRL 581 were identified [7]. On the other hand, all strains featured a single prt gene encoding a putative CEP. This enzyme is of great significance in the proteolytic system as it plays a critical role in the initial step of protein hydrolysis. PrtB was found to be specific to L. delbrueckii subsp. bulgaricus, while PrtL was represented by strains of L. delbrueckii subsp. delbrueckii and L. delbrueckii subsp. lactis. A pangenomic analysis to identify potential regulators, including the Lactobacillus helveticus BCAA-responsive transcriptional regulator (BCAAR), the Lactobacillus delbrueckii DNA-binding protein YebC, and the Lactobacillus casei OmpR family response regulator, PrcR, was performed. Remarkably, BCAAR (OG0000219), YebC (OG0000477), and PrcR (OG0000151) were identified in all 27 L. delbrueckii strains examined. However, it is noteworthy that the consensus sequences described for BCAAR and PrcR were not identified in the promoter region of the analyzed prt genes. On the other hand, studies investigating the consensus binding region of YebC have not been conducted yet. This finding highlights the need for further investigation in this specific area, offering new opportunities for future research and exploration.

Discussion
Phylogenetic analysis using ANIm and MLSA revealed a clear distinction between L. delbrueckii subsp. bulgaricus and L. delbrueckii subsp. lactis. Surprisingly, the ND02 strain, initially identified as L. delbrueckii subsp. bulgaricus based on 16S RNA analysis and sugar assimilation profiles, was found to be closely related to the L. delbrueckii subsp. lactis CIDCA133 strain. Moreover, the ND02 strain has a considerably larger genome size (2.13 Mb) compared to the reported genome size of the L. delbrueckii subsp. bulgaricus subspecies (1.82-1.89 Mb), aligning more closely with the genome size of the L. delbrueckii subsp. lactis subspecies (2.00-2.26 Mb).
Liu et al. (2010) [11] conducted comparative studies on the proteolytic system of 22 LAB strains, which revealed that CEPs were present in only a few strains. In contrast, various peptidases genes, such as pepC, pepN, pepM, pepX, pepQ, pepO, and pepV, were found in all LAB genomes. Although comparative genomics approaches can distinguish different subgroups of peptidases, differences in enzyme specificity between these subgroups remain unclear.
The advent of advanced high-throughput sequencing techniques has led to a vast abundance of sequenced bacterial genomes becoming easily accessible. Therefore, this study aimed to analyze the genomic diversity of the proteolytic system, focusing specifically on the L. delbrueckii species. Through the analysis of 26 L. delbrueckii strains, a substantial repertoire of peptidase enzymes similar to those found in L. delbrueckii subsp. lactis CRL 581 were identified [7]. On the other hand, all strains featured a single prt gene encoding a putative CEP. This enzyme is of great significance in the proteolytic system as it plays a critical role in the initial step of protein hydrolysis. PrtB was found to be specific to L. delbrueckii subsp. bulgaricus, while PrtL was represented by strains of L. delbrueckii subsp. delbrueckii and L. delbrueckii subsp. lactis.
The activity and specificity of a CEP is influenced not only by its amino acid sequence but also by its three-dimensional structure. In CEP alignments, the analyses revealed a high degree of identity among the CEP primary sequences, indicating a significant level of similarity in their amino acid compositions. Moreover, the predicted three-dimensional structures of these proteins exhibit a remarkable degree of conservation, suggesting that the overall folding and arrangement of critical protein domains are highly preserved. However, despite the high sequence identity and shared conformation among L. delbrueckii proteinases, their activity levels vary among different strains. This indicates that additional factors, beyond sequence and structure, contribute to the observed variation in enzyme activity. These differences may be influenced by various mechanisms, such as posttranslational modifications, the regulation of gene expression, or the presence of specific regulatory elements within the proteolytic system. Therefore, a comparative analysis of putative promoters and regulatory sequences was performed to investigate their potential roles in modulating enzyme activity.
In bacterial transcription, the process begins with the recognition of specific consensus sequences known as promoters, which are located upstream of the transcription start site (TSS). The core promoter, consisting of the -10 and -35 elements, is essential for optimal transcription initiation, with their optimal positioning relative to the TSS [21]. To enhance transcription, a third element known as the UP element, a sequence rich in A + T located between positions −40 and −60, is often present [22]. Gilbert et al. [23] identified these three elements in the promoter region of the prtB gene in L. delbrueckii subsp. bulgaricus NCDO1489. In our study, the putative promoter of prt was found to be conserved in the majority of the evaluated strains, and the -10 and -35 elements were separated by a distance of 16 bp according to Gilbert et al. [23]. In addition, the first 500 bp upstream of the initiation codon ATG were highly conserved in all three subspecies. This suggests that factors beyond differences in amino acid and promoter sequences contribute to the activity differences.
Microorganisms have evolved a regulatory mechanism to control the proteolytic system in response to changes in nitrogen availability, ensuring a balanced nitrogen metabolism within the cell. In Lactococcus lactis, which is one of the most extensively studied Gram-positive microorganisms after Bacillus, the expression of proteolytic enzymes is repressed through nitrogen catabolite repression when readily assimilable nitrogenous compounds are present [7,24]. This repression is mediated by CodY, which undergoes a conformational change upon binding to isoleucine, leucine, and valine, allowing it to bind to the operator on DNA and block the transcription of proteolytic enzyme synthesis genes.
The regulatory mechanisms of the proteolytic system in the Lactobacillaceae family are not fully understood. CodY homologues have not been identified in their genomes. In L. helveticus CM4, the BCAAR regulator binds to the 5 -AAAAANNCTWTTATT-3 sequence within the promoter region of several genes involved in protein degradation and transport, including the pepT2, pepCE, pepO, pepO2, and dppD genes [25]. This binding occurs in the presence of branched-chain amino acids, leading to the repression of their transcription [25]. On the other hand, Alcantara et al. [26] identified a response regulator, PrcR, which regulates the expression of genes involved in amino acid biosynthesis, transport, intracellular peptidases, and proteinase in Lacticaseibacillus paracasei BL23. Although the exact DNA binding sequence for PrcR has not been determined, an AAAA motif may be involved in this regulation. Previously, we identified a putative transcriptional regulator belonging to the YebC family in L. delbrueckii subsp. lactis CRL 581 [7]. This regulator was found to bind to the promoter region of the prtL gene. Although the consensus sequence with which YebC matches has not yet been defined, its overexpression was observed in the presence of Casitone, a peptide-rich nitrogenous source [7]. In this in silico analysis of the 27 strains of L. delbrueckii, the presence of the three transcriptional regulators (YebC, BCAAR, and PrcR) was conserved. However, the consensus sequence of BCAAR or PrcR in the surroundings of the prt promoter was not identified. Therefore, further experiments are necessary to gain a comprehensive understanding of proteolytic activity regulation.

Average Nucleotide Identity (ANI) Analysis
DNA-DNA hybridization (DDH) is traditionally a widely used genomic taxonomic method. However, with the advancement of sequencing technologies and the reduction in sequencing costs, a new approach has emerged, which involves the massive sequencing of strains followed by the analysis and comparison of complete genomes or a large number of markers using bioinformatics tools [27,28]. One commonly used parameter in this approach is based on BLAST (ANIb). The calculation of ANI involves the comparison of base pairs in the genomes by identifying shared orthologous proteins or by dividing the genome into fragments of 1020 nucleotides [15]. Since the establishment of DDH, more advanced algorithms have emerged, such as MUMmer (ANIm) software, which creates and searches for data structures called suffix trees. These suffix trees can rapidly create sequence alignments containing millions of nucleotides [15]. ANIm provides more robust results when the pair of genomes compared share a high degree of similarity (ANI > 90%) [15]. Taking this into account, ANIm was used as a parameter for our comparative genomic analysis. ANIm calculation of the 28 strains (Table 3) was carried out using the JspeciesWS online server [28]. From these data, an ANI heatmap was constructed using the heatmap.2 function of the gplots R package (version 3.1.3).

Multilocus Sequence Analysis (MLSA) and Pangenomic Analysis
Genomes were re-annotated using PGAP v4.8 [29] in the stand-alone configuration. Amino acid sequences encoded in L. delbrueckii genomes were compared for ortholog group inference using OrthoFinder v2.5.4. [30]. A phylogenetic tree was constructed from the following housekeeping gene sequences: clpX, dnaA, hsp60, murE, pheS, and pyrG. Housekeeping genes for MLSA were selected based on previous work [31]. Using a multiple alignment program (MUSCLE) [32], the sequences were aligned and the tree was built from the maximum likelihood estimation statistical test [33], available in MEGAX [34].

In Silico Analysis of Promoter Region and Predicted Amino Acid Sequence of prt Genes
Genomes downloaded from Genbank were uploaded to the RAST server [35]. CEP sequences were obtained from the RAST server by performing BlastP alignment with the proteinase PrtL sequence of CRL 581. Protein sequence alignments were carried out using the T-coffee online tool [36] and colored using the Boxshade v3.2 tool. The aligned sequences were later used to build a tree using MEGA X version 10.1 software [34]. A phylogenetic tree was inferred using the neighbor-joining method, with the p distance as a criterion and the PrtP sequence of Lactococcus Lactis SK11 as an outgroup. The promoters of prt genes were identified via homology with the prtB promoter sequence from L. delbrueckii subsp. bulgaricus NCDO 1489 described by Gilbert et al. [23], considering the 500 pb sequence upstream of the ATG of each prt. Sequence logos were generated using the web-based application WebLogo 3.7.11 [37]. The synteny plots around the prt genes of L. delbrueckii strains were created using Easyfig v2.2.5 software [38] and the BLASTn algorithm.

Protein Modeling and Visualization
The Phyre2 web portal was utilized for protein modeling, prediction, and analysis [39]. For the visualization and selection of molecular chains, the EzMol software was employed [40].

Conclusions
This comprehensive genomic analysis of the proteolytic system in L. delbrueckii strains revealed the presence of various peptidase genes, including pepC, pepD, pepF, pepI, pepM, pepN, pepO, pepQ, pepR, pepT, pepV, and pepX, in the core-genome. The prt gene encoding CEPs was present in all L. delbrueckii strains, with two subtypes (prtB in L. delbrueckii subsp. bulgaricus and prtL in L. delbrueckii subsp. lactis and L. delbrueckii subsp. delbrueckii). The structural analysis of the CEPs confirmed the presence of conserved catalytic triads, and the predicted structural impact of the amino acid substitution at position 90 (S to C) did not alter the conformational structure of the enzyme. Synteny analysis demonstrated high conservation of the genomic context of the prt gene among L. delbrueckii subsp. bulgaricus strains, while L. delbrueckii subsp. lactis strains displayed greater variability, including the presence of insertion sequences, deletions, and rearrangements. In silico analysis of the prt gene promoter region revealed a high degree of conservation in the core promoter and extended promoter regions among the L. delbrueckii strains. Consensus sequences for the prt promoter were established. These findings contribute to our understanding of the proteolytic activity of L. delbrueckii and its potential applications in the dairy industry. Further research in this field will expand our knowledge and practical utilization of these findings.   (19H00965, 23H00354), and by the research program on the development of innovative technology grants (JPJ007097) from the project of the Bio-oriented Technology Research Advancement Institution (BRAIN), awarded to H.K. The study was also supported by AMED (Moonshot R&D-MILLENNIA Program, Grant Number JP21zf0127001). This work was also supported by the JSPS Core-to-Core Program, and the Advanced Research Networks project entitled "Establishment of international agricultural immunology research-core for a quantum improvement in food safety".

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.