Employing Active Learning in Medium Optimization for Selective Bacterial Growth

Abstract: Medium optimization and development for selective bacterial cultures are essential for isolating and functionalizing individual bacteria in microbial communities; nevertheless, it remains challenging due to the unknown mechanisms between bacterial growth and medium components. The present study first tried combining machine learning (ML) with active learning to fine-tune the medium components for the selective culture of two divergent bacteria, i.e., Lactobacillus plantarum and Escherichia coli. ML models considering multiple growth parameters of the two bacterial strains were constructed to predict the fine-tuned medium combinations for higher specificity of bacterial growth. The growth parameters were designed as the exponential growth rate (r) and maximal growth yield (K), which were calculated according to the growth curves. The eleven chemical components in the commercially available medium MRS were subjected to medium optimization and specialization. High-throughput growth assays of both strains grown separately were performed to obtain thousands of growth curves in more than one hundred medium combinations, and the resultant datasets linking the growth parameters to the medium combinations were used for the ML training. Repeated rounds of active learning (i.e., ML model construction, medium prediction, and experimental verification) successfully improved the specific growth of a single strain out of the two. Both r and K showed maximized differentiation between the two strains. A further analysis of all the data accumulated in active learning identified the decision-making medium components for growth specificity and the differentiated, determinative manner of growth decisions of the two strains. In summary, this study demonstrated the efficiency and practicality of active learning in medium optimization for selective cultures and offered novel insights into the contribution of the chemical components to specific bacterial growth.


Introduction
Culturomics has emerged as a vital method for studying complex microbial environments. It often combines various medium conditions for selective cultures to identify bacterial species. In environmental microbiology, culturomics has led to a reevaluation of microbial diversity, particularly for those microbes that are challenging to culture [1]. In clinical microbiology, culturomics has led to the cultivation of 341 bacterial species from 212 different culture conditions, with over half of these being newly discovered in the human gut [2]. The primary objective in the development of culturomics is to enable a method to provide diverse culture conditions that promote the growth of fastidious bacteria [3]. With the aim of screening and identifying specific microorganisms within samples, the development of culture media for specific bacterial growth has become increasingly crucial. By incorporating various growth inhibitors into the culture media, unwanted microbial populations can be eliminated, facilitating the growth of the target microorganisms [4]. Scientists have been exploring novel compositions for culture media, such as those that mimic natural marine environments, leading to the detection of new microorganisms [5]. Due to the complexity of increasing samples and the demands for screening and identification, medium development for selective cultures has faced new challenges [6][7][8]. The selective culture ensured the target bacterial growth and prevented other microbial communities from growing [4]. The typical approach of adding inhibitors might also suppress the growth of the target bacterium. In the food industry, selective culture media are frequently used to detect microbial contamination and spoilage in food materials, which might be unsuitable for competitive bacteria [9,10]. Therefore, medium optimization and specialization are highly required in the field.
Medium optimization was challenging due to the high complexity of the microbiomes and combinations of medium components [11]. Traditional methods of Design of Experiments (DOE) [12,13] and Response Surface Methodology (RSM) [14,15] employed a quadratic polynomial approximation; thus, they might not fully capture the complex interactions between the medium and cells [16]. Machine learning (ML) has been introduced to predict unknown events by learning a dataset [17]. This approach has been widely applied in drug development [18,19], protein structure and function prediction [20,21], and epidemic surveillance [22,23] and has exhibited better outcomes than DOE or RSM [24]. Lately, combining active learning with ML has successfully optimized the culture media for mammalian cells [25,26]. These studies strongly suggested the efficiency of ML-associated active learning for medium development and its availability to improve the selective effect of a culture medium for specific bacterial growth, so-called medium specialization.
Therefore, to meet the current needs for culture medium optimization and specialization, a new method of medium optimization was developed in the present study by referring to the growth dynamics of microorganisms in a wide range of medium conditions. This method combined the high-throughput growth assay and machine learning techniques to fine-tune the medium composition for the selective culture of microorganisms. ML-combined active learning considering single or multiple growth parameters was conducted to fine-tune the medium compositions for the specific growth of Lactobacillus plantarum or Escherichia coli. High-throughput growth assays were performed to acquire the training data and for experimental verification. Multiple benchmarks, i.e., scores, were newly designed to be associated with ML to predict better medium combinations for specific bacterial growth. The datasets connecting the medium combinations with the goodness of bacterial growth obtained during active learning were analyzed to discover the contribution of medium components to bacterial growth specificity. The decision-making elements for bacterial growth and growth specificity were identified. The study tried to provide a representative case of employing active learning for medium specialization and insights into the medium's contribution to selective bacterial cultures.

Experimental and Computational Design of Active Learning for Medium Optimization
The initial training data were experimentally acquired, linking medium combinations to bacterial growth. Escherichia coli (Ec) and Lactobacillus plantarum (Lp) were used, as they were of different growth preferences and commonly employed in laboratories and production tests using selective culture media [27][28][29][30]. Although the media appropriate for both strains were well known, whether the culture medium specific for Lp growth could be fine-tuned via machine learning for Ec growth was tested. Eleven components in the commercially available MRS medium for Lp growth were used to prepare the medium combinations. Theoretically, any media or components would be fine for medium optimization. The choice of MRS (11 components) was to benefit from machine learning, which is powerful when the number of variables (medium components) is large enough. Note that agar in the MRS medium was removed from the optimization medium, as the growth assay was performed in liquid media. These components were mixed in a broad range of concentration gradients, changing on a logarithmic scale (Figure 1A). Ec and Lp were cultured independently in 98 medium combinations (n = 4) to obtain the growth curves for calculating the growth parameters of growth rate (r) and maximal population density (K) (Figure 1B). As the initial training data, the medium combinations connecting with the growth parameters of both strains, i.e., r_Lp, K_Lp, r_Ec, and K_Ec, were acquired. These four parameters were used as the machine learning (ML) objective variables, either in a single mode or a multiple combination (Figure 2A).
Appl. Microbiol. 2023, 3, FOR PEER REVIEW
ML-assisted medium optimization and specialization for different strains were performed using the gradient-boosting decision tree (GBDT), which has been repeatedly validated to have superior predictive performance and interpretability compared to other algorithms [31][32][33]. The initial training data (R0) were applied to the GBDT model to improve r_Lp or K_Lp (R1 and R2) (Figure 2A). Medium optimization and specialization were performed by active learning, following model construction, prediction, and experimental verification steps. The top 10~20 predicted medium combinations of the best objective values (e.g., r, K) were subjected to experimental validation. The results were included in the training data for the following round of ML model construction and prediction (Figure 2B). Active learning was conducted for five rounds for each strain: R1 and R2 considered r_Lp or K_Lp for medium optimization of Lp, and S1~S3 considered multiple parameters for the medium specialization of Lp or Ec (Figure 2A). That is, S1-1 and S1-2 considered the pairs of r or K (i.e., r_Lp vs. r_Ec, K_Lp vs. K_Ec) to maximize the difference of r or K between Lp and Ec, and S2-1, S2-2, and S3 considered all parameters to maximize the difference of both r and K between Lp and Ec (Figure 2C).
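The loop described above (model construction, prediction, selection of the top-ranked candidates, and experimental verification) can be illustrated with a minimal sketch. It is not the authors' implementation: the booster below is a hand-rolled gradient-boosting regressor built from depth-1 trees so that the example needs no external library, the `assay` function is a hypothetical stand-in for the wet-lab growth assay, and the objective (a simple difference favouring one strain) is an assumption, since the scoring formulas are only given in the Methods.

```python
import itertools
import random

def fit_stump(X, residuals):
    """Find the single-split (depth-1) regression tree that best fits the residuals."""
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every observed level except the largest,
        # so both sides of the split are always non-empty.
        for thr in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return None if best is None else best[1:]

def fit_gbdt(X, y, n_trees=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        stump = fit_stump(X, [t - p for t, p in zip(y, pred)])
        if stump is None:  # no feature can be split any further
            break
        j, thr, lm, rm = stump
        stumps.append(stump)
        pred = [p + learning_rate * (lm if row[j] <= thr else rm)
                for p, row in zip(pred, X)]
    return base, learning_rate, stumps

def predict(model, row):
    base, lr, stumps = model
    return base + sum(lr * (lm if row[j] <= thr else rm) for j, thr, lm, rm in stumps)

def active_learning_round(X_train, y_train, candidates, top_k, measure):
    """Train, rank untested candidates, 'measure' the top_k, and extend the training set."""
    model = fit_gbdt(X_train, y_train)
    chosen = sorted(candidates, key=lambda c: predict(model, c), reverse=True)[:top_k]
    return X_train + chosen, y_train + [measure(c) for c in chosen]

# Toy setting (all values hypothetical): 3 components, 5 concentration levels each.
random.seed(0)
grid = list(itertools.product(range(5), repeat=3))

def assay(combo):
    # Hypothetical stand-in for the growth assay: a score that rewards
    # component 0 and penalizes component 1.
    return combo[0] - 0.5 * combo[1]

X = random.sample(grid, 20)          # initial training round
y = [assay(c) for c in X]
initial_best = max(y)
for _ in range(2):                   # two further rounds
    untested = [c for c in grid if c not in X]
    X, y = active_learning_round(X, y, untested, top_k=10, measure=assay)
print(len(X), initial_best, max(y))
```

In practice, a library GBDT implementation would replace the toy booster, and `measure` would be the experimental verification step; only the control flow of the rounds is meant to carry over.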

Active Learning Successfully Fine-Tuned the Media for Selective Bacterial Growth
Active learning considering the single parameter of r_Lp and K_Lp (R1 and R2), which was started from the initial training data (R0), successfully increased r_Lp and K_Lp within two rounds; however, the media optimized for Lp also improved the growth of Ec (Figure 3A,B, R1 and R2). Active learning considering multiple growth parameters was designed to maximize the difference of r or K between Lp and Ec, e.g., promoting the growth of Lp but repressing the growth of Ec. Three formulas were employed for the ML prediction and medium selection (see Methods). Three rounds of active learning (considering multiple parameters) increased the medium specialization: significant Lp growth and no Ec growth (Figure 3A,B, S1-1, S1-2, and S2-1). Intriguingly, although the optimization targeted a single parameter (r_Lp or K_Lp), the other parameter was also improved to a certain extent (Figure 3E,F, S1-1, S1-2, and S2-1). Moreover, the medium combinations suitable for Ec growth were successfully developed by active learning (Figure 3C,D), despite the medium components originating from MRS, which is developed explicitly for Lp. Three rounds of active learning improved the growth of Ec but had poor specificity because Lp grew as well (Figure 3C,D, S1-1, S1-2, and S2-1), and the parameters other than the targeted one were unsatisfactory (Figure 3G,H, S1-1, S1-2, and S2-1). Two additional rounds considerably increased the medium specialization for Ec, both the targeted (Figure 3C,D, S2-2, and S3) and untargeted parameters (Figure 3G,H, S2-2, and S3).
Six medium combinations of high specificity for Lp (M1-3_Lp) or Ec (M1-3_Ec), newly developed via active learning, were selected for further verification. As the active learning prediction was performed in the mono-culture condition, whether these media remained specific in the presence of both Lp and Ec remained uncertain. The co-culture of Lp and Ec was performed in the six media, the medium compositions of which differed from that of MRS (Table 1). Nearly all of them exhibited significant specificity for the growth of the target strain, Lp or Ec, regardless of mono- or co-culture (Figure 4). Although Lp producing acetic acid might inhibit Ec [34][35][36], the media developed for Ec growth (M1-3_Ec) retained specificity in the presence of Lp. The results suggested that the ML-assisted medium optimization and specialization were practical, and the resultant media were robust regardless of mono- or co-culture.

Changes in Growth Parameters Revealed the Effectiveness of Active Learning
How the growth was fine-tuned during active learning was further analyzed. The distributions of the growth parameters significantly differentiated between the strains, considerably changing during active learning (Figure 5A). Bimodal distributions were commonly observed in Ec (Figure 5A, bottom), indicating the medium combinations predicted in the active learning were highly selective for Ec growth. In contrast, monomodal distributions were more often found in Lp, although the transition from monomodal to bimodal was observed in r_Lp (Figure 5A, upper). The peaks of distributions altered significantly as active learning proceeded (Figure 5, color variation), revealing the effectiveness of active learning for medium optimization and specialization. In addition, a correlation analysis of the four growth parameters showed significant cross-correlations, except for the pair of r_Ec and r_Lp (Figure 5B, red). The positive correlations between r and K in both strains (Figure 5B, blue) indicated a common feature of improved growth rate associated with increased population density. The negative correlation across the strains (Figure 5B, black) reasonably presented the trade-offs in the growth of Lp and Ec, as the active learning aimed to improve the medium specificity for a single strain. Taken together, the features of the datasets acquired during active learning well reflected the process of medium optimization and specialization.
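The pairwise correlation analysis of the four growth parameters can be sketched as follows. This is a minimal illustration assuming Pearson's coefficient (the text does not state which coefficient or significance test was used), and the numbers are made up for the example, not taken from the study.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative values only: one entry per medium combination.
params = {
    "r_Lp": [0.10, 0.25, 0.40, 0.55, 0.70],
    "K_Lp": [0.30, 0.55, 0.80, 1.00, 1.30],
    "r_Ec": [0.60, 0.45, 0.50, 0.20, 0.10],
    "K_Ec": [1.10, 0.90, 0.70, 0.40, 0.20],
}

# One coefficient for each of the six parameter pairs.
pairs = {(a, b): pearson(params[a], params[b]) for a, b in combinations(params, 2)}
for (a, b), rho in pairs.items():
    print(f"{a} vs {b}: {rho:+.2f}")
```

With real data, a positive coefficient within a strain (r vs. K) and a negative one across strains would correspond to the patterns described for Figure 5B.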

Decision-Making Medium Components for the Changes in Bacterial Growth
The GBDT analysis showed that all four growth parameters were primarily determined by a single medium component (Figure 6). Differentiated decision-making components were observed in Lp, i.e., yeast extract and acetic acid for K and r, respectively (Figure 6, upper). As yeast extract was reported to provide initial nutrients for cell division and substance synthesis [37,38], the abundance of the resource might determine the final population size. It was reasonable that acetic acid, which often inhibited microbe growth [39][40][41], targeted r_Lp, as Lp preferred an acidic environment. On the other hand, both r_Ec and K_Ec were commonly determined by K2HPO4 (Figure 6, bottom), which might provide a buffering effect in response to the changes in pH caused by Lp.
A hierarchical clustering analysis of the normalized feature importance intriguingly divided the medium components into four main categories (Figure 7A). The medium components assigned in the same categories showed neither common chemical properties nor similar biological functions. It strongly suggested that the novel classification of medium components depended on their impact on bacterial growth. Four different trends of medium components contributing to the growth parameters were identified, that is, highly relevant to K_Lp (pink), r_Lp (yellow), r (purple), and Ec (grey), respectively (Figure 7B). Such specificity of medium components might be applied to the medium's development for differentiated bacterial growth.

Medium Components Adjusted via Active Learning for Bacterial Growth Specificity
The medium components contributing to the bacterial growth specificity could be identified according to the medium specialization that proceeded via active learning. The scores (S), calculated in active learning considering multiple growth parameters, were subjected to the GBDT analysis. As they represented the goodness of the growth specificity, the medium components of high feature importance indicated a significant contribution to the growth specificity. The results showed that yeast extract and glucose primarily determined the specificity of K for Lp and Ec, respectively (Figure 8A, blue), and K2HPO4 was the common component determining the specificity of r for Lp and Ec (Figure 8A, green). Yeast extract and K2HPO4 are shown in Figure 8A in black. The findings revealed that adjusting a single component differentiated the growth of Lp and Ec to a great extent.
Overall, the bacterial growth specificity was roughly determined by a single component, regardless of considering one or both of the parameters r and K. As r and K were the most representative features that quantitatively described the bacterial growth dynamics [42,43], the determinative manner of medium components contributing to r and K revealed the working principle for specific growth control. The buffering capacity and nutritional richness might influence the growth specificity during the exponential and stationary phases, respectively (Figure 8B). K2HPO4 was supposed to control the pH condition and execute the buffering effect. Yeast extract and glucose were considered to provide nutrients, such as carbon resources, for metabolism, supporting bacterial growth. The differentiation in the essential components for bacterial growth specificity was well supported by the findings that the buffering agents influenced cell division and biosynthesis [44,45] and the nutritional resources affected the organic acid metabolism [46,47]. In summary, ML-assisted medium optimization and specialization provided a practical tool for medium development and discovered novel insights into bacterial growth for precise culture control.

Discussion
The present study first demonstrated ML-assisted medium specialization for differentiated bacterial growth. ML was remarkably significant in medium optimization for microbial and mammalian cells [26,31], which could be widely applied to synthetic construction and production [48,49]. The present study provides alternative applications in medium development for selective cultures. The results indicated that combining ML with active learning was highly practical for precisely fine-tuning the medium compositions for the selective culture of particular bacteria. Further applications of optimizing selective culture media for complex microbiomes were perceived, such as the systematic development of selective media for individual bacteria living in the environmental microbial community.
The present study made a first trial to combine the growth parameters, determined according to the experimental records, as the quantitative reference values for model construction and prediction. Medium optimization for multiple strains might raise the cost of data acquisition for training and testing. Theoretically, more data led to a more accurate ML model, and more targets (variables) required more experimental data [50]. To save labor and cost, three combinations of four growth parameters (r_Lp, K_Lp, r_Ec, and K_Ec), representing the growth features of two different bacterial strains, were employed in active learning here. The success in medium optimization demonstrated that combining multiple parameters was highly recommended for fine-tuning the selective culture media. Note that many other combinations of the growth parameters could be considered, which might show higher efficiency or better selectivity.
On the other hand, improving multiple growth parameters simultaneously in model construction was theoretically ideal but might be biologically impractical, as the living cells and their communities were highly self-regulated and coordinated [51][52][53][54]. In the present study, the constructed ML models tried to improve the growth rate (r) and maximize the population size (K) simultaneously, which was assumed to be impossible because of the potential trade-offs between r and K [55,56]. Intriguingly, active learning allowed us to find the medium combinations that improve both r and K, demonstrating the availability of the parallel optimization of multiple growth parameters. The differentiated growth of two bacterial strains was also successfully achieved when considering the growth parameters of both strains. Moreover, the selective culture media developed in the mono-culture maintained specificity for bacterial growth in the co-culture. As the two strains (Lp and Ec) in the present study were ecologically and genetically far from each other, whether the present approach for medium specialization is practical for closely related bacteria or bacteria sharing a habitat remains an open question. If single species played the dominant role in the microbial communities [57][58][59][60], the interactions among multiple species might be ignored in active learning by weighting the particular growth parameters in ML models. Nevertheless, further technical and experimental developments are required.
In addition, the big dataset acquired in active learning allowed us to investigate the contribution of medium components to bacterial growth. A novel understanding of the role of the chemical components in bacterial growth was achieved. As an example of the new findings, acetic acid was commonly used to adjust the pH of the media for culturing Lp (e.g., MRS) to suppress the growth of other microbes growing under neutral conditions [39][40][41]. The GBDT analytical results showed that the inhibitory effect of acetic acid on Ec was limited, and yeast extract played a more significant role in selective culture. The finding indicated that the commonly used or commercially available media could be further fine-tuned for better performance or milder conditions. For instance, antibiotics or dyes were often added to the media for selective microbial culture, which might cause increased resistance due to frequent usage [61][62][63][64]. Optimizing medium components other than antibiotics should be tried to acquire milder conditions for suppressing bacterial growth, associated with a reduced potential for acquiring antibiotic resistance.
ML-assisted medium optimization often resulted in novel insights that were outside of current knowledge. Besides the present findings of the medium contributions to differentiated bacterial growth, our previous studies observed the secondary contribution of glucose to bacterial growth [65], the differentiated contribution of carbon, sulfate, and nitrogen for survival [32], and the diversified metabolic strategies in transcriptome reorganization for increased productivity [31]. Additionally, the cluster analysis intriguingly divided the medium components into four clusters, which were outside of any well-known chemical or biological categories. The mono-culture data showed higher accuracy in predicting interspecies relationships than the metabolic or phylogenetic data [33]. Taken together, active learning for medium optimization and specialization allowed better cell culture and provided a dataset connecting medium compositions to microbial growth for a better understanding and application of microorganisms.

Bacterial Strains
Escherichia coli BW25113 and Lactobacillus plantarum (ATCC8014) were used; they were obtained from the National BioResource Project at the National Institute of Genetics (Shizuoka, Japan) and the National Biological Resource Center (Tsukuba, Japan), respectively. As previously described in detail [65,66], stock solutions of bacterial cells grown to the exponential phase were prepared in advance for the growth assay, and hundreds of aliquots (60 µL each) were stored at −80 °C for future use.

Medium Combinations
The medium combinations were initially decided according to the commercially available medium MRS (Wako, Japan). The components (chemical compounds, reagents, etc.) of MRS were purchased from Wako, except tryptone (Sigma, Kawasaki, Kanagawa), yeast extract (MP Biochemicals, Santa Ana, CA, USA), and Tween 80 (MP Biochemicals). The lowest concentration of each component was set at 1% of its concentration in MRS. The highest concentrations were determined based on the literature and the manufacturers' instructions. The stock solutions of the medium components were prepared as described previously [32,65]. They were aliquoted into 100~1000 µL portions for single use and stored at −30 °C. The medium combinations were prepared by mixing the stock solutions, with the concentration of each component varied on a logarithmic scale across five levels, as previously reported [31,32,65]. Initially, 96 medium combinations were prepared for the growth assay of both bacterial strains as the training data. A total of 192 combinations were prepared to test both strains in the present study.
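The concentration design described above (a lower bound at 1% of the MRS concentration, an upper bound taken from the literature, and five logarithmically spaced levels) can be sketched as follows. This is only an illustration: the component names, the MRS and upper-bound values, the random sampling of levels, and the helper names `log_levels` and `sample_combinations` are assumptions made for the example, not the values or procedure used in the study.

```python
import math
import random


def log_levels(low, high, n_levels=5):
    """Return n_levels concentrations evenly spaced on a log10 scale."""
    step = (math.log10(high) - math.log10(low)) / (n_levels - 1)
    return [10 ** (math.log10(low) + i * step) for i in range(n_levels)]


def sample_combinations(levels_by_component, n_combinations, seed=0):
    """Draw medium combinations by picking one level per component at random.

    Random sampling is an assumption of this sketch; the paper does not
    state how the initial combinations were chosen.
    """
    rng = random.Random(seed)
    return [
        {name: rng.choice(levels) for name, levels in levels_by_component.items()}
        for _ in range(n_combinations)
    ]


# Placeholder values in g/L (hypothetical, for illustration only).
MRS_G_PER_L = {"glucose": 20.0, "yeast_extract": 4.0, "sodium_acetate": 5.0}
UPPER_G_PER_L = {"glucose": 40.0, "yeast_extract": 20.0, "sodium_acetate": 20.0}

# Lower bound is 1% of the MRS concentration, as stated in the text.
levels_by_component = {
    name: log_levels(0.01 * MRS_G_PER_L[name], UPPER_G_PER_L[name])
    for name in MRS_G_PER_L
}
combos = sample_combinations(levels_by_component, n_combinations=96)
```

Each entry of `combos` maps a component name to one of its five concentration levels, which is the shape of explanatory data the ML models need.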

Growth Assay and Calculation
The prepared culture mixtures were dispensed into a 96-well microplate (Costar, Washington, DC, USA) at 200 µL per well, with 3-4 biological replicates per combination placed in different positions on the plate. The 96-well plate was incubated at 37 °C with shaking at 567 rpm in a microplate reader (Epoch2, BioTek, Winooski, VT, USA). Cell growth was monitored as the optical density at 600 nm (OD600), and readings were taken at 30 min intervals over 48 h. The temporal changes in OD600 were exported from the microplate reader and processed with Python programs to calculate the two representative growth parameters, the growth rate (r) and the maximal OD600 (K), as described elsewhere in detail [65,66]. In total, 1660 growth curves were experimentally obtained, and the mean values of r and K (biological replicates, N = 4~6) were used for machine learning and computational analyses.
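As a rough illustration of how r and K can be derived from an OD600 time series, the sketch below takes K as the maximal reading and r as the steepest slope of ln(OD600) over a short sliding window. The exact procedure used in the study is described in [65,66] and may differ; the window size, the blank handling, and the function name `growth_parameters` are assumptions of this example.

```python
import math


def growth_parameters(times_h, od, window=5, blank=0.0):
    """Estimate (r, K) from one growth curve.

    r: maximal least-squares slope of ln(OD) over `window` consecutive
       readings (per hour). K: maximal blank-corrected OD.
    """
    values = [v - blank for v in od]
    k_max = max(values)
    r_max = 0.0
    for start in range(len(values) - window + 1):
        t = times_h[start:start + window]
        y = values[start:start + window]
        if min(y) <= 0:  # the logarithm is undefined for non-positive readings
            continue
        ln_y = [math.log(v) for v in y]
        t_mean = sum(t) / window
        y_mean = sum(ln_y) / window
        num = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, ln_y))
        den = sum((ti - t_mean) ** 2 for ti in t)
        r_max = max(r_max, num / den)
    return r_max, k_max


# Synthetic logistic curve sampled every 30 min over 48 h (illustrative data).
true_r, true_k, n0 = 0.8, 1.2, 0.001
times = [0.5 * i for i in range(97)]
curve = [true_k / (1 + (true_k - n0) / n0 * math.exp(-true_r * t)) for t in times]
r_est, k_est = growth_parameters(times, curve)
```

On noisy plate-reader data a wider window or smoothing is usually needed, since the maximal slope is sensitive to single outlying readings.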

Machine Learning and Computational Analyses
Python 3 was used for machine learning (ML), as described previously [26,31,32]. The ML models used the "GradientBoostingRegressor" from the "ensemble" module in the "scikit-learn" library. The explanatory and target variables were the medium components and the growth parameters, respectively. The 'random_state' and 'n_estimators' of the model were set to 0 and 300, respectively. The 'learning_rate' was searched between 0.001 and 0.5 in increments of 0.005, and the 'max_depth' was searched among 2, 3, 4, and 5. The root-mean-square error (RMSE) was calculated to assess prediction accuracy using 'mean_squared_error' from the 'metrics' module in "scikit-learn". The 'feature_importances_' values were calculated using an outer five-fold cross-validation. Both outer and inner cross-validations were performed using the 'cross_val_score' function from the 'model_selection' module in "scikit-learn". 'GridSearchCV' was used for the hyperparameter search, with 'learning_rate' ranging between 0.01 and 0.5 in increments of 0.01 and 'max_depth' chosen among 2, 3, 4, and 5. The 'n_estimators' was set to 300, while the other hyperparameters were left at their defaults. The average of the 'feature_importances_' values derived from the five-fold cross-validation was used. Additionally, the 'feature_importances_' values were normalized using 'normalize' from the "sklearn.preprocessing" package and subjected to hierarchical clustering analysis with the "ward" method.
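The nested procedure described above (an inner 'GridSearchCV' for hyperparameters, an outer five-fold split for RMSE and averaged 'feature_importances_') might look roughly like the following. This is a sketch under stated assumptions: the data are synthetic, the hyperparameter grid is far coarser than the ranges given in the text so the example runs quickly, and the three-fold inner split and the explicit `KFold` loop (in place of 'cross_val_score') are choices made for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in data: 60 media x 11 components (log10 concentrations)
# and one growth parameter that depends on components 0 and 3 only.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 0.0, size=(60, 11))
y = 0.8 * X[:, 0] - 0.5 * X[:, 3] + rng.normal(0.0, 0.05, size=60)

# Coarse grid for illustration; the text describes a much finer search.
param_grid = {"learning_rate": [0.05, 0.2], "max_depth": [2, 3]}

outer = KFold(n_splits=5, shuffle=True, random_state=0)
rmses, importances = [], []
for train_idx, test_idx in outer.split(X):
    # Inner cross-validation selects hyperparameters on the training fold.
    search = GridSearchCV(
        GradientBoostingRegressor(n_estimators=300, random_state=0),
        param_grid,
        cv=3,
        scoring="neg_mean_squared_error",
    )
    search.fit(X[train_idx], y[train_idx])
    best = search.best_estimator_
    # Outer fold gives an RMSE estimate on data unseen during the search.
    pred = best.predict(X[test_idx])
    rmses.append(float(np.sqrt(mean_squared_error(y[test_idx], pred))))
    importances.append(best.feature_importances_)

# Average feature importance over the five outer folds.
mean_importance = np.mean(importances, axis=0)
```

Averaging importances over the outer folds reduces the dependence of the ranking on any single train/test split, which matters with only around a hundred media per strain.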

Model Construction and Active Learning
As previously reported [26], ML model construction and prediction in active learning were conducted using the supercomputer Cygnus system (NEC LX 124Rh-4G). The GBDT models (R0~R2) were constructed initially for active learning, i.e., learning, prediction, and validation. The top 10~20 predicted medium combinations that showed the best r or K of individual strains were subjected to experimental verification. The resultant experimental outputs were included in the training dataset, which was used for the following round of ML model construction. Subsequently, ML models combining both growth parameters (r and K) were constructed using the following formulas (Equations (1) …):

$$S3 = \left[\mathrm{norm}(K_{tar} - K_{con}) + 5 \times \mathrm{norm}(r_{tar} - r_{con})\right] \times (K_{tar} \times r_{tar})$$

Here, Para represents any of the growth parameters of any strain. Para_tar and Para_con indicate the parameters of the target and control strains, respectively. Norm indicates data normalization. K_tar, K_con, r_tar, and r_con represent K and r of the target or control strain, respectively. The resultant scores (S1~S3) were used as the target variables. The higher the score, the larger the difference in growth parameters, indicating higher specificity for the growth of the target bacterium. The top 10~20 medium combinations showing the highest scores (S1~S3) were experimentally tested and added to the training dataset for subsequent learning and prediction. Repeated rounds of active learning, with the ML models changed between rounds, were performed.
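To make the S3 score and the selection step concrete, the sketch below computes S3 for a handful of candidate media and picks the highest-scoring ones. The text does not specify the normalization, so min-max scaling across candidates is assumed here; the function names `min_max`, `s3_scores`, and `top_n` and the numbers are hypothetical.

```python
def min_max(values):
    """Scale values to [0, 1]; an assumed stand-in for the unspecified 'norm'."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def s3_scores(k_tar, k_con, r_tar, r_con):
    """S3 = [norm(K_tar - K_con) + 5 * norm(r_tar - r_con)] * (K_tar * r_tar)."""
    dk = min_max([kt - kc for kt, kc in zip(k_tar, k_con)])
    dr = min_max([rt - rc for rt, rc in zip(r_tar, r_con)])
    return [
        (nk + 5 * nr) * (kt * rt)
        for nk, nr, kt, rt in zip(dk, dr, k_tar, r_tar)
    ]


def top_n(scores, n):
    """Indices of the n highest-scoring medium combinations."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]


# Three illustrative candidate media (made-up growth parameters).
k_tar, k_con = [1.0, 0.5, 0.8], [0.2, 0.5, 0.9]
r_tar, r_con = [0.6, 0.3, 0.5], [0.1, 0.3, 0.6]
scores = s3_scores(k_tar, k_con, r_tar, r_con)
selected = top_n(scores, 2)
```

The multiplicative term K_tar × r_tar penalizes media that separate the two strains only by suppressing both, so a candidate must also support good growth of the target strain to rank highly.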

Co-Culture Verification
The cell stocks of the two bacterial strains were each diluted 1000-fold in 1 mL of the identical medium. The diluted E. coli culture was dispensed into a 24-well plate (Greiner Bio-One), and the diluted L. plantarum culture was placed into a Transwell insert (ThinCert® Cell Culture Inserts, pore size 0.4 µm, Greiner Bio-One), which was then positioned in the well containing the E. coli culture. This allowed the two bacterial strains to grow under the same medium conditions without mixing with each other. The 24-well plate was incubated at 37 °C with shaking at 567 rpm (Epoch2, BioTek) for 24 h. The two cultures were then transferred individually into separate wells of another 24-well microplate (1 mL per well), and the OD600 was measured using the same microplate reader. Six fine-tuned selective media were tested with three biological replicates.

Figure 1. Growth assay under medium combinations. (A) Concentration gradients of medium components. Circles indicate the concentrations used in the medium combinations, shown on a logarithmic scale. (B) High-throughput growth assays. Monoculture of two bacterial strains was performed under hundreds of medium combinations. The growth parameters calculated from the growth curves, i.e., growth rate and growth yield, are indicated as r and K, respectively.

Figure 2. Active learning for medium optimization. (A) ML models considering single or multiple growth parameters. The growth parameters to be increased or suppressed are indicated in cyan and grey, respectively. R1 and R2 consider a single parameter out of the four; S1 considers the paired parameters; and S2 and S3 consider all four. (B) Repeated rounds of active learning. The process of active learning is presented, i.e., ML model construction (described in (A)), medium prediction, and experimental verification. (C) Number of experimentally tested medium combinations in each round of active learning. R0 indicates the initial training data. S1-1, S1-2, S2-1, and S2-2 represent two rounds of active learning with the ML models of S1 and S2, respectively.

Figure 3. Active learning improved medium specialization. Medium optimization for the specific growth of Lp or Ec is shown in the left (A,B,E,F) and right (C,D,G,H) four panels, respectively. Boxplots represent the growth parameters experimentally obtained under the 10~20 medium combinations predicted per round of active learning. The colored boxes represent the parameters to be improved in active learning, and the gray ones represent those to be repressed. The small quarter squares indicate the ML models used in active learning. The growth parameters to be improved in ML model construction are indicated in green and blue (r and K, respectively), and those to be suppressed are indicated in grey. R0 shows the initial training data. R1 and R2 show the experimental results of two rounds of active learning with the ML model considering either r_Lp or K_Lp solely. S1-1 and S1-2 show the experimental results of two rounds of active learning with the ML model considering the paired parameters of r or K. S2-1, S2-2, and S3 show the experimental results of three rounds of active learning with the ML models considering all four parameters.

Figure 4. Co-culture of both strains in fine-tuned media for selective culture. Six media optimized for specific growth of either Lp or Ec are indicated as MLp1~3 and MEc1~3, respectively. The OD600 values of Lp and Ec after 24 h co-culture are shown. Orange and blue represent Lp and Ec, respectively. Standard errors of biological replicates (N = 3) are indicated.

Figure 5. Data analysis of the growth parameters acquired in active learning. (A) Distributions of the growth parameters. The relative frequency of the parameters acquired in the indicated rounds of active learning is shown. Color variation indicates the rounds of active learning. (B) Correlation analysis of the four growth parameters. Scatter plots, histograms, correlation coefficients, and statistical significance are presented.

Figure 6. Contribution of medium components to bacterial growth. Feature importance of the medium components analyzed with GBDT is shown. Blue and green represent the growth yield (K) and the growth rate (r), respectively. The bacterial strains are indicated.

Figure 7. Categorization of medium components according to their contributions to bacterial growth. (A) Hierarchical clustering of medium components. (B) Radar plots of the feature importance of medium components to the four growth parameters. Color variation indicates the four categories.

Figure 8. Contribution of medium components to the growth differentiation. (A) Feature importance of medium components to the scores considering multiple growth parameters. S_K, S_r, and S_K,r indicate the scores considering the pair of K_Lp and K_Ec, the pair of r_Lp and r_Ec, and all four parameters, respectively. (B) Hypothesis of differentiated factors determining the bacterial growth specificity in exponential and stationary phases.

Table 1. ML-predicted medium combinations of significant growth specificity. Six media were selected for the co-culture test. MLp1, 2, and 3 and MEc1, 2, and 3 indicate the ML-predicted media specifically for Lp and Ec growth, respectively. The concentrations of the medium components are shown in g/L in comparison to the original medium MRS.