1. Introduction
Streptococcus thermophilus is a commercially important fermentation starter that is used extensively in products such as yogurt and cheese, with a market value exceeding 40 billion USD [
1]. The high-density fermentation of
S. thermophilus has a crucial impact on industrial applications. M17 medium, a formula widely used for the laboratory culture of
S. thermophilus, contains β-disodium glycerophosphate (β-GP; 19 g/L) as its most abundant component. β-GP is also the most expensive component (analytical reagent-grade material costs approximately 200 to 650 USD per 500 g), which greatly limits the industrial application of M17 medium. β-GP is added to M17 medium in place of phosphate buffer because phosphate forms precipitates with alkaline earth metal ions. The role of β-GP in the medium is therefore often attributed to its buffering capacity [
2,
3], but there is no definitive research to confirm this speculation. Subsequent studies found that β-GP inhibits the growth of
Lactobacillus bulgaricus and
Lactobacillus helveticus [
3]. Recent studies on β-GP also suggest that it may have antimicrobial properties by binding to bacterial cell membranes to induce cell lysis [
4]. Therefore, while β-GP improves the performance of the medium, it may also have potential negative effects. In order to develop a medium more suitable for industrial production, it is necessary to conduct in-depth research on the role of β-GP in the growth of
S. thermophilus.
The proliferative effect of urea on
S. thermophilus has been confirmed in our previous studies, where the addition of urea to a chemically defined medium (CDM) significantly increased the biomass of
S. thermophilus S-3 [
5]. This is beneficial for the high-density cultivation of
S. thermophilus. Research indicates that the urease activity is significantly correlated with lactose utilization [
6]. Studies on energetically discharged cells with blocked transcription and translation suggested that the production of ammonia can stimulate glycolysis [
7]. We also found that the addition of urea to the CDM could maintain the pH of the fermentation broth under neutral conditions, which gave us an insight: perhaps urea can replace β-GP in M17 medium for the cultivation of urease-positive bacteria. By analyzing the similarities and differences between β-GP and urea in promoting the growth of
S. thermophilus, regulatory targets for the proliferation of urease-negative bacteria can be screened.
The metabolic regulation of cell growth is highly complex, and systems biology methods can establish global relationships among genes, transcription, and metabolism. Omics technologies are developing rapidly and play a major role in many areas of biological research. Data from a single omics layer reflect only part of the regulatory picture, so the joint analysis of multi-omics data can explain biological mechanisms more comprehensively. Many methods have been developed for the integration of transcriptomics and metabolomics, but these methods rely mainly on subjective judgment when interpreting biological associations [
8]. It is therefore desirable to integrate omics data within the context of existing biological knowledge about the organism. Genome-scale metabolic network models (GEMs) are systems biology tools constructed from the genome, which can be used to place the interactions of different biological components in context and to interpret these networks [
9]. Integrating omics data into GEMs allows for the study of organisms in a specific context, exploring metabolic mechanisms from a systems biology perspective. These methods have been successfully applied in model organisms such as
E. coli [
10] and yeast [
11]. REMI (Relative Expression and Metabolomic Integrations) is the first method that can integrate both transcriptomic and metabolomic data into GEMs for analysis [
12]. Models constrained in this way can reflect the different physiological states of cells better than the original models.
To better understand the mechanisms by which β-GP and urea promote the proliferation of S. thermophilus, we measured the physiological phenotypes, transcriptomes, and metabolomes of S. thermophilus under different culture conditions and used REMI to integrate these data into a GEM for analysis. By establishing condition-specific models for the different culture conditions, we elucidated the metabolic status of the cells and predicted their intracellular metabolic fluxes. This study provides a basis for improving the culture medium of S. thermophilus by replacing β-GP with urea, and lays the foundation for subsequent research on the high-density culture and regulatory targets of S. thermophilus.
2. Materials and Methods
2.1. Micro-Organisms and Culture Conditions
The S. thermophilus strains used in this study were all preserved in a −80 °C freezer in our laboratory. The strains were activated by culturing on M17 solid medium (containing 20 g/L lactose, the same below) at 37 °C for 24 h. A representative single colony was picked, inoculated into M17 liquid medium, and cultured until the exponential phase. The cells were harvested by centrifugation (5000× g, 5 min) and washed three times with sterile physiological saline. The cells were then resuspended and inoculated (inoculation amount: 2%) into three types of media: M17 medium (M17), M17 medium with urea replacing β-GP (UM17), and M17 medium without β-GP (M17-β). The cultures were incubated at 37 °C for 12 h, and samples were taken at regular intervals to measure the indicators described below.
2.2. Determination of Sugar, Lactate, Urea, and Biomass
The concentrations of lactose, galactose, and lactic acid were analyzed using a high-performance liquid chromatography system (model 2695; Waters Corp., Milford, MA, USA). The concentration of urea was determined using the diacetylmonoxime method. Cell dry weight was determined by weighing. All procedures followed previously established methods [
5].
2.3. Transcriptomic Analysis
When the bacterial cells had grown to the late exponential phase, the fermentation broth was centrifuged (5000× g, 5 min) to collect the cells. The cells were washed three times with pre-cooled PBS (0.01 M) and then resuspended in PBS to the same OD600 value (1.0 ± 0.05). The suspension was aliquoted into new 1.5 mL centrifuge tubes. After centrifugation, the supernatant was discarded, and the cells were flash-frozen in liquid nitrogen and stored in a −80 °C freezer. Total RNA was extracted from the cells using the CTAB method, and genomic DNA was removed. Only high-quality RNA samples were used to construct the sequencing library. Ribosomal RNA (rRNA) was depleted using the RiboCop rRNA Depletion Kit for Mixed Bacterial Samples (Lexogen, Hampshire, MA, USA) rather than by poly(A) purification. Following this, all mRNA was fragmented into short segments (200 nt) by the addition of fragmentation buffer. Double-stranded cDNA was then synthesized using random hexamer primers (Illumina, San Diego, CA, USA). During the synthesis of the second strand of cDNA, dUTP was incorporated in place of dTTP. The synthesized cDNA underwent end-repair, phosphorylation, and ‘A’ base addition according to Illumina’s library construction protocol. The RNA-seq transcriptome library was prepared from total RNA following the Illumina® Stranded mRNA Prep, Ligation protocol (San Diego, CA, USA). The paired-end RNA-seq library was sequenced on the Illumina NovaSeq 6000 (Illumina Inc., San Diego, CA, USA). The original images were processed to sequences, with base-calling and quality value calculation. Clean reads were obtained by removing low-quality sequences, reads with more than 10% N bases (unknown bases), and reads containing adaptor sequences. Three biological replicates were set up for each group. The transcriptomic analysis, including sample preprocessing, RNA extraction, sequencing library construction, and Illumina sequencing, was carried out by Shanghai Meiji Biomedical Technology Co., Ltd. (Shanghai, China). Transcriptomic data are available from the Sequence Read Archive (SRA) repository of the National Center for Biotechnology Information (accession number: PRJNA1081653).
2.4. Untargeted LC-MS/MS Metabolomic Analysis
For untargeted metabolomics analysis, six biological replicates were set up for each group. Bacterial cells were collected as described for the transcriptomic analysis. A 50 mg sample of cells was placed in a 2 mL centrifuge tube along with a 6 mm grinding bead. Metabolites were extracted using 400 μL of extraction solution (methanol:water, 4:1 v/v) containing the internal standard L-2-chlorophenylalanine at 0.02 mg/mL. The samples were ground using a Wonbio-96c frozen tissue grinder (Shanghai Wanbo Biotechnology Co., Ltd., Shanghai, China) for 6 min at −10 °C and 50 Hz, followed by low-temperature ultrasonic extraction for 30 min at 5 °C and 40 kHz. The samples were then left at −20 °C for 30 min and centrifuged for 15 min at 4 °C and 13,000× g. The supernatant was transferred to an injection vial for LC-MS/MS analysis. The LC-MS/MS analysis was conducted on a Thermo UHPLC-Q Exactive HF-X system equipped with an ACQUITY HSS T3 column (100 mm × 2.1 mm i.d., 1.8 μm; Waters, Milford, MA, USA) at Majorbio Bio-Pharm Technology Co. Ltd. (Shanghai, China). The mobile phases consisted of 0.1% formic acid in water:acetonitrile (95:5, v/v) (solvent A) and 0.1% formic acid in acetonitrile:isopropanol:water (47.5:47.5:5, v/v) (solvent B). The flow rate was set at 0.40 mL/min and the column temperature was maintained at 40 °C. The mass spectrometric data were collected using a Thermo UHPLC-Q Exactive HF-X Mass Spectrometer (Thermo Fisher, Waltham, MA, USA) equipped with an electrospray ionization (ESI) source operating in both positive and negative modes. The optimal conditions were set as follows: source temperature, 425 °C; sheath gas flow rate, 50 arb; auxiliary gas flow rate, 13 arb; ion-spray voltage floating (ISVF), −3500 V in negative mode and 3500 V in positive mode; and normalized collision energy, 20–40–60 V rolling for MS/MS.
Full MS resolution was 60,000, and MS/MS resolution was 7500. Data acquisition was performed with the Data-Dependent Acquisition (DDA) mode. The detection was carried out over a mass range of 70–1050 m/z.
2.5. Data Processing and Statistical Analysis
2.5.1. Transcriptomics
The data generated on the Illumina platform were used for bioinformatics analysis. All analyses were conducted using the Majorbio Cloud Platform, a free online platform provided by Shanghai Majorbio Bio-pharm Technology Co., Ltd. The primary software and parameters were as follows: (i) Mapping reads to the reference genome: High-quality reads from each sample were mapped to the reference genome [
13]. (ii) rRNA contamination assessment: In this step, 10,000 raw reads randomly selected from each sample were aligned to the Rfam database using BLAST. The percentage of rRNA in each sample was calculated based on the annotation results, providing an estimate of rRNA contamination. (iii) Expression analysis: Gene and isoform abundances were quantified from RNA-Seq data using RSEM [
14]. RSEM computes maximum likelihood abundance estimates using the Expectation Maximization algorithm for its statistical model. This includes the modeling of paired-end and variable-length reads, fragment length distributions, and quality scores to determine which transcripts are isoforms of the same gene. The transcripts per million reads (TPM) method was used to calculate expression levels. (iv) Differential expression analysis: Differentially expressed genes (DEGs) were identified for each dataset using DESeq2 [
15]. (v) Gene Ontology (GO) enrichment analysis: The Gene Ontology project provides an ontology of defined terms representing gene properties, covering three domains: Cellular Component, Molecular Function, and Biological Process. GO enrichment analysis identifies the GO terms in which DEGs are enriched, helping to illustrate the functional differences between two samples. Goatools [
16] was used to identify significantly enriched GO terms using Fisher’s exact test, and the false discovery rate (FDR) was controlled with the Benjamini-Hochberg (BH) method to reduce Type I errors. After multiple testing correction, GO terms with an adjusted
p-value ≤ 0.05 were considered significantly enriched in DEGs. (vi) KEGG enrichment analysis: Differentially expressed genes typically interact with each other in vivo to perform certain biological functions. Compared with the whole-genome background, KEGG enrichment analysis can identify the most important metabolic and signal transduction pathways in which DEGs are involved; it was performed using KOBAS 2.0 [
17].
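Two of the calculations named above, TPM normalization and BH correction, can be sketched in a few lines. This is a minimal illustration in plain Python, not the RSEM or Goatools implementation; the read counts, gene lengths, and raw p-values are hypothetical values chosen only for the example.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from read counts and transcript lengths."""
    rates = [c / l for c, l in zip(counts, lengths_bp)]  # reads per base
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical inputs, for illustration only.
counts = [100, 300, 600]       # mapped reads per gene
lengths = [1000, 1500, 3000]   # gene lengths in bp
print(tpm(counts, lengths))

p = [0.001, 0.02, 0.03, 0.5]   # raw enrichment p-values
adj = benjamini_hochberg(p)
significant = [x <= 0.05 for x in adj]  # threshold used in the text
print(adj, significant)
```

TPM values always sum to one million within a sample, which is what makes them comparable across samples with different sequencing depths.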
2.5.2. Metabolomics
The LC/MS raw data were preprocessed using Progenesis QI 2.0 software (Waters Corporation, Milford, MA, USA), resulting in the export of a three-dimensional data matrix in CSV format. This matrix contained sample information, metabolite names, and mass spectral response intensities. The data matrix was cleaned by removing internal standard peaks and known false-positive peaks (including noise, column bleed, and derivatization reagent peaks), and the remaining peaks were de-duplicated and pooled. Metabolites were identified by searching databases, primarily HMDB 5.0 [
18], Metlin [
19], and Majorbio Database [
20]. The resulting data matrix was uploaded to the Majorbio cloud platform for further analysis. Metabolic features detected in at least 80% of the samples in any one group were retained. For samples in which a metabolite level was below the lower limit of quantification, the value was imputed with the minimum metabolite value. To mitigate errors caused by sample preparation and instrument instability, the response intensities of the mass spectrometry peaks were normalized using the sum normalization method. Variables with a relative standard deviation (RSD) greater than 30% in the QC samples were excluded, and the remaining data were log10-transformed to obtain the final data matrix for subsequent analysis. The R package “ropls” (Version 1.6.2) was used to perform principal component analysis (PCA) and orthogonal partial least squares discriminant analysis (OPLS-DA), with 7-fold cross-validation to evaluate the stability of the model. Metabolites with a Variable Importance in the Projection (VIP) greater than 1 and a
p-value less than 0.05 were determined to be significantly different metabolites. These were based on the VIP obtained by the OPLS-DA model and the
p-value generated by Student’s t-test. Differential metabolites between two groups were mapped into their biochemical pathways through metabolic enrichment and pathway analysis based on the KEGG database [
21]. These metabolites could be classified according to the pathways they were involved in or the functions they performed. Enrichment analysis tests whether a group of metabolites is over-represented at a given functional node; in principle, it extends the annotation of single metabolites to the annotation of a group of metabolites. The Python package “scipy.stats” was used to perform the enrichment analysis and identify the biological pathways most relevant to the experimental treatments.
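The preprocessing and filtering steps described in this subsection can be illustrated with a short sketch. It assumes all intensities are positive (that is, missing values have already been imputed), uses only the Python standard library rather than the Majorbio platform, and all feature names, intensities, VIP scores, and p-values are hypothetical.

```python
import math
import statistics


def sum_normalize(sample):
    """Scale one sample's peak intensities so that they sum to 1."""
    total = sum(sample.values())
    return {feat: val / total for feat, val in sample.items()}


def rsd_percent(values):
    """Relative standard deviation (sample SD / mean) in percent."""
    return statistics.stdev(values) / statistics.mean(values) * 100


def preprocess(samples, qc_samples, rsd_cutoff=30.0):
    """Sum-normalize, drop features with QC RSD > cutoff, log10-transform."""
    norm_qc = [sum_normalize(s) for s in qc_samples]
    keep = [f for f in norm_qc[0]
            if rsd_percent([q[f] for q in norm_qc]) <= rsd_cutoff]
    matrix = []
    for s in samples:
        n = sum_normalize(s)
        matrix.append({f: math.log10(n[f]) for f in keep})
    return matrix, keep


def select_differential(vip, p_values, vip_cutoff=1.0, p_cutoff=0.05):
    """Keep metabolites with VIP > cutoff and p-value < cutoff."""
    return [m for m in vip if vip[m] > vip_cutoff and p_values[m] < p_cutoff]


# Hypothetical QC injections: feature m3 is unstable and should be dropped.
qc = [{"m1": 100, "m2": 50, "m3": 10},
      {"m1": 102, "m2": 49, "m3": 30},
      {"m1": 98, "m2": 51, "m3": 2}]
samples = [{"m1": 120, "m2": 40, "m3": 12},
           {"m1": 80, "m2": 70, "m3": 9}]
matrix, kept = preprocess(samples, qc)
print(kept)

# Hypothetical VIP scores (from OPLS-DA) and t-test p-values.
vip = {"m1": 1.4, "m2": 0.8}
pvals = {"m1": 0.01, "m2": 0.2}
print(select_differential(vip, pvals))
```

The VIP scores themselves come from the fitted OPLS-DA model and are taken as given here; only the thresholding step is shown.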
2.6. Genome-Scale Model Refinement
Starting from the previous model iCH492, we further refined the model and performed gap filling based on the transcriptome and metabolome data. To study the impact of β-GP on cell growth, reactions related to β-GP metabolism were added.
2.7. Integration of Transcriptome Data and Metabolome Data into Genome-Scale Models
REMI was used to integrate the transcriptome and metabolome data into the model. REMI can combine gene expression data and metabolomics data as constraints for the model, greatly reducing the feasible flux solution space. Moreover, REMI extensively enumerates alternative solutions. Because of the complexity of the metabolic network, multiple feasible pathways may lead to the same phenotype, so the set of solutions provided by REMI can reflect the physiological state of the cell more accurately. REMI represents the ratio of gene expression or metabolite abundance between two conditions as the flux perturbation of each reaction and imposes it as a constraint on individual fluxes. It can integrate data into the model in three ways: (1) REMI-Gex: integrating transcriptome data into the GEM; (2) REMI-M: integrating metabolome data into the GEM; and (3) REMI-GexM: integrating both transcriptome and metabolome data as model constraints. The transcriptome and metabolomics data of M17-β vs. M17 and M17-β vs. UM17 were used to impose constraints on the model, and the metabolic fluxes of substrates and products were calculated according to the previously established method. Following the method of N. Hadadi et al. [
22], the upper and lower bounds of the specific growth rate of the context-specific models were constrained to within ±0.1 of the value measured in the phenotypic experiment.
REMI aims to maximize the consistency between gene expression, metabolite concentration, and metabolic flux levels for a given condition pair. An optimization problem is formulated that maximizes the number of constraints imposed by relative gene expression and metabolite abundance. These constraints are integrated into the model to predict growth phenotypes. Two scores are calculated: the Theoretical Maximum Consistency Score (TMCS), which represents the number of available omics data; and the Maximum Consistency Score (MCS), which represents the number of omics data consistent with metabolic flux, i.e., the number of genes/metabolites that can be integrated into the model. The mixed-integer linear programming (MILP) formulation can enumerate alternative sets from the given constraints. The IBM CPLEX 12.8 solver was used to calculate the maximum consistency score. All source code used to analyze the data is available on GitHub (
https://github.com/houcj2001/GP, accessed on 20 February 2024).
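The idea behind the two consistency scores can be illustrated with a toy calculation. The sketch below is not REMI and does not solve a MILP: it only scores one given pair of flux distributions against a set of expression ratios, whereas REMI searches for the fluxes that maximize this agreement. The fold-change thresholds (2.0 and 0.5), the reaction names, and all numbers are hypothetical assumptions made for the example.

```python
def classify(ratio, up=2.0, down=0.5):
    """Label a condition-2 / condition-1 ratio as 'up', 'down', or None."""
    if ratio >= up:
        return "up"
    if ratio <= down:
        return "down"
    return None


def consistency_scores(ratios, flux_ref, flux_cond):
    """Return (TMCS, MCS) for one candidate pair of flux distributions.

    TMCS counts reactions that carry a usable up/down call; MCS counts
    the calls whose direction matches the change in absolute flux.
    """
    tmcs = 0
    mcs = 0
    for rxn, ratio in ratios.items():
        call = classify(ratio)
        if call is None:
            continue  # no significant change, so no constraint is imposed
        tmcs += 1
        v1, v2 = abs(flux_ref[rxn]), abs(flux_cond[rxn])
        if (call == "up" and v2 > v1) or (call == "down" and v2 < v1):
            mcs += 1
    return tmcs, mcs


# Hypothetical expression ratios and fluxes (mmol/gDW/h) for four reactions.
ratios = {"R_gal": 3.0, "R_pgm": 0.4, "R_ure": 2.5, "R_x": 1.1}
flux_ref = {"R_gal": 1.0, "R_pgm": 2.0, "R_ure": 0.5, "R_x": 1.0}
flux_cond = {"R_gal": 1.8, "R_pgm": 2.4, "R_ure": 1.2, "R_x": 1.0}
scores = consistency_scores(ratios, flux_ref, flux_cond)
print(scores)
```

In this toy case three reactions carry a call and two of them agree with the flux change; a solver-based formulation would instead treat the fluxes as variables and look for distributions, and alternative distributions, that reach the highest attainable MCS.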
4. Discussion
This study attempted to explain the impact of β-GP and urea on the proliferation of
S. thermophilus. By integrating omics data into GEM, additional constraints can be applied to the model, which can more accurately reflect the distribution of metabolic fluxes in cells. For this study, we manually curated and iteratively refined the GEM of
S. thermophilus S-3 based on omics data, resulting in the model iCH502. Phenotypic experiments, and transcriptomic and metabolomic analyses revealed significant differences in
S. thermophilus under different culture conditions, but the molecular mechanisms causing these differences have not yet been revealed. The results of the fermentation experiment indicate that both urea and β-GP have a positive effect on the growth of
S. thermophilus S-3, with urea demonstrating superior performance. In previous research, we used FBA to evaluate the role of urea metabolism in the growth of
S. thermophilus. FBA is performed based on the maximization of an objective function (such as biomass yield). However, this method does not reflect the impact of environmental stress and other factors on cells. Based on a small amount of substrate and product quantitative data, the specific growth rates calculated by FBA were all higher than the experimental values (
Table 1). In contrast, the integration of omics data using REMI significantly improved the consistency between the predicted values and the experimental values, demonstrating that the integration of omics data can enhance the predictive capability of the model. REMI-generated alternative solutions allow us to characterize the intracellular state of
S. thermophilus by identifying the changing fluxes in metabolic pathways.
Transcriptomic analysis showed that the gene for glycerol uptake permease in the M17 group was significantly upregulated, leading us to speculate that, in addition to transporting glycerol, it may also be involved in the transport of β-GP. During the simulation analysis of the model, although we set a high uptake rate for the β-GP transport reaction (10 mmol/gDW/h), the flux of this reaction was small in all alternative solutions (0.38 ± 0.14 mmol/gDW/h). This suggests that, while β-GP can be utilized by cells as a substrate, it is not directly metabolized in large amounts. After hydrolysis, β-GP provides phosphate ions to the cells. As is well-known, phosphate groups play an important role in energy metabolism. The lack of phosphates in the M17-β group medium may be one of the reasons for its smaller biomass.
During the growth process of
S. thermophilus, a large amount of lactic acid is produced and excreted from the cell, which gradually increases the osmotic pressure outside the cell. In addition, the large amount of lactic acid produced by the cells during the exponential phase cannot be excreted in time, leading to a decrease in the intracellular pH. This decrease mainly occurs because the H+/lactic acid symport performs poorly under high external lactic acid concentrations [
25]. Batch fermentation experiments showed that the pH of both the M17-β group and the M17 group decreased significantly. The lactic acid production of the M17 group was several times that of the M17-β group (
Figure 2D), yet its final pH was still higher than that of the M17-β group, which indicates that β-GP plays a buffering role. In previous research, we confirmed that urea metabolism consumes H+ inside the cell and that F0F1-ATPase transports H+ from outside the cell to the inside, keeping the pH inside and outside the cell near neutral; the REMI results again support this view [
5]. The phosphate group contained in β-GP gives it a buffering capacity within a certain range, and the phosphoric acid obtained after β-GP hydrolysis can also exert a buffering effect inside the cell. However, β-GP uptake requires proton symport: when one molecule of β-GP is transported into the cell, two protons enter with it, which lowers the intracellular pH and partly offsets the buffering effect of phosphoric acid. The buffering effect of β-GP should therefore be exerted mainly outside the cell. From the perspective of maintaining the cytoplasmic pH close to neutral, urea is more effective than β-GP. In addition, the proton gradient generated by urea metabolism allows extracellular protons to enter the cell through the F0F1-ATPase channel, producing a large amount of ATP for cell growth. Therefore, when culturing urease-positive
S. thermophilus, urea can completely replace β-GP and have a better effect.
In the M17 group, three upregulated DEGs are related to tryptophan synthase. In previous research, we found that the absence of tryptophan has a detrimental effect on the growth of
S. thermophilus [
5], implying that genes related to tryptophan synthase are likely closely related to growth. The common reaction analysis also identified a reaction (IGPS) related to tryptophan metabolism, the product of which is used for the further synthesis of tryptophan. In addition to being used for protein synthesis, some studies suggest that tryptophan can be catalyzed by tryptophanase to form indole. As a signaling molecule, indole plays a role in responding to environmental stress and regulating biofilm formation [
26].
In the common reaction analysis, both the M17 and UM17 groups have multiple reactions belonging to the galactose metabolism pathway. In the case of perfect stoichiometry, the theoretical galactose–lactose exchange coefficient should be 0.5 g galactose/g lactose, but, in reality, the majority of strains are below this value [
27]. This implies that galactose is not completely excreted, but that a portion is directly utilized by the cells. Compared with the M17-β group, both gene expression and the REMI-derived reaction fluxes in the galactose metabolic pathway were upregulated in the M17 and UM17 groups, and the galactose metabolic pathway was significantly enriched in the metabolome. Together, these results indicate that β-GP and urea enhance the gene expression and metabolic flux of the galactose metabolic pathway. This suggests that galactose metabolism plays an important role in the growth of
S. thermophilus when lactose is the substrate. Some key genes in galactose metabolism (such as
galM,
galK) are expected to be targets for subsequent modification. This method can be utilized for the selection of additional modification targets suitable for growth regulation, as shown in
Table S8.
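The theoretical exchange coefficient mentioned above follows from the stoichiometry of one mole of galactose released per mole of lactose hydrolyzed. The short calculation below shows where the value comes from and how a measured coefficient translates into the fraction of galactose retained by the cells. The molar masses are standard values that are not given in the text, and the measured concentrations are hypothetical.

```python
M_LACTOSE = 342.30    # g/mol, C12H22O11 (standard value, not from the text)
M_GALACTOSE = 180.16  # g/mol, C6H12O6 (standard value, not from the text)

# One mole of galactose is released per mole of lactose hydrolyzed.
THEORETICAL_YIELD = M_GALACTOSE / M_LACTOSE  # g galactose per g lactose


def fraction_retained(lactose_consumed, galactose_excreted):
    """Fraction of released galactose that is not excreted by the cells."""
    observed = galactose_excreted / lactose_consumed
    return 1.0 - observed / THEORETICAL_YIELD


print(round(THEORETICAL_YIELD, 3))

# Hypothetical measurement: 10 g/L lactose consumed, 4 g/L galactose excreted.
retained = fraction_retained(10.0, 4.0)
print(round(retained, 2))
```

On a mass basis the ratio is slightly above 0.5, which matches the rounded coefficient quoted in the text; any measured coefficient below it implies that part of the galactose was metabolized rather than excreted.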
In
S. thermophilus, galactose is used to synthesize teichoic acid, polysaccharides, and nucleoside sugars [
28]. The predicted metabolic fluxes show that more UDP-galactose in the M17 and UM17 groups flows to the lipoteichoic acid synthesis pathway, and the results of the transcriptome enrichment analysis and metabolome VIP analysis also indicate active synthesis of macromolecules such as peptidoglycan. The expression of the gene
tagE related to lipoteichoic acid synthesis in the M17 group is upregulated compared to the other two groups, implying that β-GP may play a role in promoting cell wall synthesis. It is worth noting that, although the reaction flux of UDP-glucose to α-D-Glucose-1-phosphate in the M17 group is higher than that in the UM17 group, the reaction flux from α-D-Glucose-1-phosphate to α-D-Glucose-6-phosphate is lower than that in the UM17 group. Metabolome data also show that the relative abundance of α-D-Glucose-1-phosphate in the M17 group is significantly higher than that in the UM17 group, which is consistent with the lower expression of the gene
pgm. The difference between the UM17 group and the M17 group may be explained by the limitation of the total protein (enzyme) amount. In unicellular organisms, the finite intracellular volume restricts the unlimited increase of enzyme molecules, thereby limiting the concentration of available enzymes [
29]. The M17 group needs to synthesize more proteins in the cell wall synthesis pathway to resist acid stress, while the UM17 group avoids acid stress due to urea metabolism. The optimal growth of bacteria is a balance between maximizing yield and minimizing protein burden [
30], thus the UM17 group can allocate more carbon for the synthesis of biomass precursors.
REMI yielded as many as 920 alternative solutions for the M17-β vs. M17 group, suggesting that β-GP broadly impacts cellular metabolic pathways. In the transcriptome enrichment analysis, the M17 group showed upregulation in the ribosomal pathway, indicating that the cell was actively synthesizing proteins, which may be related to the cell’s response to environmental stress [
31]. In studies where β-GP inhibited the growth of certain lactic acid bacteria, researchers found that the more easily metabolized α-isomer had no inhibitory effect. In our study, we also found that switching to other brands of β-GP resulted in growth inhibition (similar to M17-β); we therefore hypothesize that the effect of β-GP on cell growth is related to its specific structure and that β-GP plays a complex role in the cell’s response to acid stress.
Interestingly, all strains in our study tested positive for urease activity, suggesting that urease-positive strains are prevalent in
S. thermophilus. Genomic analysis indicated that
S. thermophilus S-3 possesses an 11-gene cluster encoding ureases, which is highly conserved within the species [
32]. This cluster is considered part of the core genome of
S. thermophilus [
33]. Despite the existence of urease-negative mutants in industrial fermenters, UM17 medium remains practical due to the abundance of urease-positive strains and the cost-effectiveness of urea compared to β-GP, both in laboratory research and industrial production.