Chemical-Mediated Microbial Interactions Can Reduce the Effectiveness of Time-Series-Based Inference of Ecological Interaction Networks

Network-based assessments are important for disentangling complex microbial and microbial–host interactions and can provide the basis for microbial engineering. There is a growing recognition that chemical-mediated interactions are important for the coexistence of microbial species. However, so far, the methods used to infer microbial interactions have been validated with models assuming direct species-species interactions, such as generalized Lotka–Volterra models. Therefore, it is unclear how effective existing approaches are in detecting chemical-mediated interactions. In this paper, we used time series of simulated microbial dynamics to benchmark five major/state-of-the-art methods. We found that only two methods (CCM and LIMITS) were capable of detecting interactions. While LIMITS performed better than CCM, it was less robust to the presence of chemical-mediated interactions, and the presence of trophic competition was essential for the interactions to be detectable. We show that the existence of chemical-mediated interactions among microbial species poses a new challenge to overcome for the development of a network-based understanding of microbiomes and their interactions with hosts and the environment.


Introduction
There is a growing recognition that microbiome science needs to move beyond descriptive studies to a more systematic understanding that would facilitate mechanical, predictive, and manipulative approaches to rational microbial engineering [1]. Networkbased approaches will help disentangle complex microbial and microbe-host/environment interactions, which could have applications universally applicable to medicine and health care, agriculture, and other environmental and industrial areas [2][3][4]. Current attempts to understand interaction networks combine comprehensive quantification of microbiota using next-generation sequencing technologies with various network inference methodologies based on statistical and machine learning approaches.
Recent studies revealed that the exchange of metabolites plays an essential role in microbial interactions. All microorganisms exchange metabolites, such as vitamins, amino acids, nucleotides, or growth factors, by releasing them into the surrounding environment [5][6][7]. This metabolite cross-feeding is common both among different bacterial species and between bacteria and members of other kingdoms [6]. In consequence, a microbial community forms a unique chemical environment known as the exometabolome, which comprises hundreds of metabolites, the majority of which are derived from living cells [7,8]. Considering its prominent role in microbial interactions, the exometabolome and chemicalmediated interactions would be tightly linked with the dynamics of microbial communities, including composition, stability, and functionality.
However, most of the benchmarking of methods proposed for inferring microbial interactions has been performed using generalized Lotka-Volterra (gLV) models, which assumes direct (species-species) interactions [9][10][11]. Some methods were developed based on the gLV equation itself [12][13][14][15]. Recent studies [16][17][18] revealed that the species-species interaction models are insufficient to capture dynamics that occur through chemical-mediated interactions, while such interactions would serve a prominent role in the coexistence of diverse microbial species [19]. Therefore, benchmarking with the direct interaction model alone would be insufficient. If the presence of chemical-mediated interactions reduces the reliability of the network inference methods, it poses a new challenge for studies of microbial interaction networks.
In this paper, we investigated how accurate existing time-series-based inferences of ecological interactions would be when underlying interactions are mediated by chemicals. For this purpose, we started with an in-silico mediator-explicit model of microbial population dynamics whose parameter values had been experimentally calibrated [19]. We compared the performance of five major/state-of-the-art methods under different model assumptions, including the process of direct or chemical-mediated interactions, the presence or absence of competition for nutrients, as well as the effect of different sampling intervals and magnitudes of stochasticity.

Models
Our model is based on the mediator-explicit model proposed and validated for actual microbial systems by Niehaus et al. [19]. In this model, each species can produce multiple chemicals, and each chemical can influence multiple species. Each influence is represented by a coefficient defined for pairs of species and chemicals. The mediator-explicit model (M) is defined as follows: Here, variables x i and c k are the abundance of microbes and the amount of chemicals, respectively, and i = 1, 2, . . . , n and k = 1, 2, . . . , m, i.e., we assume a system consisting of n microbes and m chemicals. r i is the intrinsic growth rate, K is the carrying capacity, κ ik is the half-saturation density of the effect of chemical k on species i, ρ ik + and ρ ik − represent the positive and negative effect of chemical k on species i, respectively, α ki is the maximum rate of consumption of chemical k by species i, β ki is the rate of production of chemical k by species i, and δ is the constant dilution rate (see Table 1 for the parameter values). We also use the matrix representation of parameters as follows: n × m matrices κ = {κ ik }, ρ + = {ρ ik + } and ρ − = {ρ ik − }, and m × n matrices α = {α ki } and β = {β ki }.
While Niehaus et al. [19] did not include the carrying capacity, we introduced it to avoid overflow in numerical calculations. Moreover, it is reasonable to assume that population growth is limited due to resource availability. Model M assumes that each species has an independent bound for their population growth, i.e., there is no competition for resources among species. In addition to model M, we introduce model M that includes the effect of competition for nutrients: This model assumes that all species equally depend on all available nutrients, and therefore the increase of any species reduces the growth rate of other species. To be consistent with model M, we assumed that the total abundance Σ j x j is bounded by nK.
As the third and fourth models, we considered models with direct interactions comparable to the models M and M . These models are introduced based on the following considerations. Since the overall effect of chemicals on the species' growth rate is ρ = ρ + − ρ − and the rate of production of chemicals by each species is β, ρβ represents the instantaneous effects of a species on another, which is comparable to the direct interactions represented by an interaction matrix A = a ij . Therefore, we defined model D as and the model D as These models are simpler than models M and M in that they do not include the effects of the consumption of chemicals by microbes.
To obtain the time-series data, we introduced stochasticity and small influx to the above models and obtained numerical solutions using the Euler-Maruyama Scheme: Here, t is the step size of numerical simulation and is fixed as t = 0.025, W t is a normal distribution with s.d. of 1, σ represents the magnitude of noise, and Ω is Ω ∈ {M, M , D, D }. During the numerical simulations, some species would approach zero infinitely. Thus, to avoid underflow, we introduced a small influx . The param-eter values are scaled so that t = 1 corresponds to 1 day and K = 1 corresponds to 10 7 individuals/mL.

Effective Interaction Matrix
In models M and M , it is necessary to interpret the microbial interactions that occur via chemicals as direct interactions to evaluate the network inference methods. For this purpose, we first combine Equations (1) and (2) and describe them as (dx i /dt, . . . , dc k /dt, . . .) = (F M (x i ), . . . , G M (c k )). Under this expression, the Jacobi matrix can be written as Here, the m × n block, J F M (c) , contains the positive and negative effects of chemicals on organisms, and the n × m block, J G M (x) , contains the effects of production and consumption by microbes on chemicals. Then, we interpret A * = J F M (c) J G M (x) as a matrix representing the effective microbial interactions (effective interaction matrix), which is the target of the evaluation of the network inference methods. The above argument holds for M as well. In models D and D , the non-zero part of the Jacobi matrix is consistent with that of A. Therefore, in D and D , we identify A with A * .

Data Preparation
For each model, we generated the time series for generating data sets using pairs of a parameter set θ = (κ, ρ + , ρ − , α, β, σ) and an initial state X 0 = (x 1 , . . . , x n , c 1 , . . . , c m ). Since it was difficult to obtain multispecies coexistence with the initial parameter set, these pairs are obtained by several optimization steps as follows (see Figure 1 for a summary of the procedure and Table 2 for the parameter values). At T = 0, we initialized M p pairs of a parameter set and an initial state. For the initial states, abundances of microbes were drawn from a uniform distribution between 10 and 100 , and in models M and M , the density of chemicals was set to zero. Then, we numerically solved the differential equation with each pair up to t max steps (corresponding to time series of t max ∆t days). After discarding the first t 0 steps, we calculated the score of each pair according to the following evaluation function: Threshold value for the evaluation function 5 Figure 1. Procedure for generating a data set.

Network Inference Methods
We compared five methods (Pearson and Spearman rank correlation, LSA, CCM, and LIMITS) in this paper. Taking multivariate time series as input values, these methods return matrices of statistics indicating the presence/absence of interaction between species pairs, represented by correlation coefficients in Pearson, Spearman, and LSA, predictive skills in CCM, and interaction coefficients in LIMITS. These matrices are accompanied by matrices of p-values, and these can also be used to detect interactions. Except for the statistics of LIMITS, the matrices are dense such that every element has a non-zero value.
We refer to such a matrix as the inferred interaction matrix and consider the statistic and p-value of each method as a classifier to predict the presence or absence of effect from one species to another. Interactions are represented by non-zero values of the non-diagonal elements of an actual/inferred interaction matrix. When considering the presence or Here, denoting the time-averaged abundance of species i in the time series asx i , S * is the number of "major" species (number of species whosex i is larger than η), is the Simpson's diversity index, and c = L/n(n − 1) is the connectance (L is the number of non-zero values for off-diagonal elements in an interaction matrix). D and c were calculated for the major species. This function takes its maximum value when the number of major species is n, their time-averaged abundance is equal, and the connectance is 0.5. We adopted the evaluation function to obtain a state where the number of coexisting species is as high as possible, their interactions are neither too sparse nor too dense, and a small number of species are not dominant. Then, top m p pairs (parents) are selected as members of the next generation in addition to M p − m p pairs that are generated by mutations, where one of the non-zero elements of a matrix in θ is selected and switched with another element of the same matrix; this procedure is repeated one to four times with equal probability. Then, M p pairs are again evaluated as above. However, at each numerical simulation, each x i in X 0 are perturbed by multiplying a random value drawn from a uniform distribution between 0.8 to 1.2. After repeating the same procedure for T max steps, a pair with the highest z θ,X 0 is accepted if z θ,X 0 > ω. This procedure is repeated until we obtain N accepted time series for each model.
We obtained the data sets for testing the network inference methods by generating time series with various noise magnitude σ and resampling them at different intervals τ.
To evaluate the effect of the sampling interval, we first generated four data sets from a time series of a fixed noise magnitude σ = 1 and sampled 100 of the last points with different intervals τ ∈ {10, 20, 40, 80} (this corresponds to time series of 25, 50, 100 and 200 days sampled every 0.25, 0.5, 1 and 2 days). Then, to evaluate the effect of noise magnitude, we generated 4 data sets from time series of different noise magnitude σ ∈ {0.5, 1, 2, 4} and sampled 100 of the last points with a fixed interval τ = 40. Basic characteristics of the communities obtained under different noise magnitude are described in Appendix B.

Network Inference Methods
We compared five methods (Pearson and Spearman rank correlation, LSA, CCM, and LIMITS) in this paper. Taking multivariate time series as input values, these methods return matrices of statistics indicating the presence/absence of interaction between species pairs, represented by correlation coefficients in Pearson, Spearman, and LSA, predictive skills in CCM, and interaction coefficients in LIMITS. These matrices are accompanied by matrices of p-values, and these can also be used to detect interactions. Except for the statistics of LIMITS, the matrices are dense such that every element has a non-zero value.
We refer to such a matrix as the inferred interaction matrix and consider the statistic and p-value of each method as a classifier to predict the presence or absence of effect from one species to another. Interactions are represented by non-zero values of the nondiagonal elements of an actual/inferred interaction matrix. When considering the presence or absence of an interaction, it is necessary to distinguish between the direction of the interaction. In the following, in addition to the term "interaction", we regard the presence or absence of effect from species A to B and that of species B to A if strict consideration of directionality is needed.

Pearson and Spearman Correlation Coefficient
Pearson correlation coefficient [20] is a measure of linear correlation between two variables. It is a parametric measure that assumes Gaussian distributions of variables. Spearman rank correlation coefficient [21] is a nonparametric measure of rank correlation. It is relevant even if both or one of the variables is non-Gaussian and thus is more broadly applicable than Pearson correlation. These correlation coefficients are frequently used for the network-based analysis of biological systems [9]. Both methods provide correlation coefficients between −1 and 1 for each pair of variables, accompanied by the p-values. The correlation matrix and p-values are symmetric, i.e., they are inherently imprecise if the actual ecological interactions involve asymmetric interactions.

Local Similarity Analysis (LSA)
LSA (see Ruan et al. [22] for detail; also refer to Beman et al., [23], Steele et al. [24], Xia et al. [25]) also considers the association between time series but is optimized to detect non-linear, time-sensitive relationships. It captures local and potentially time-delayed association patterns that cannot be identified by ordinary correlation analysis [25]. A previous study [9] showed that, for Lotka-Volterra sparse ecological relationships, it attains greater performance than other correlation-based approaches. We performed LSA analysis according to the implementation by the authors [22] and calculated the p-value with 2000 bootstrap samples. This method returns a correlation coefficient between −1 and 1 for each pair of variables, accompanied by the p-values. The correlation matrix and p-values are symmetric, i.e., it has the same problem as Pearson and Spearman rank correlation.

Convergent Cross Mapping (CCM)
CCM [26] is a statistical test for a causal relationship between two time series variables and attempts to address the problem that correlation is often not an indicator of the presence or absence of actual causal relationships. This method is based on Takens' embedding theorem [27], which states that the essential information of a multi-dimensional dynamical system is retained in the time series of any single variable of that system. Based on this theory, in a pair of time series (X, Y), if X has a high forecasting skill in predicting Y, causality will be detected in the direction of Y → X . The predictive skill, denoted as ρ Y→X , is quantified by the Pearson correlation coefficient between actual Y and Y predicted by X. Different from the previous correlation-based methods, ρ Y→X is usually unequal to ρ X→Y . Its p-value is calculated by the bootstrapping procedure to account for the effects of the predictability inherent in the target time series (such as periodicity). To apply this method to systematic network inference, we needed to extend the implementation by the authors (Appendix A).

LIMITS
LIMITS [12] is based on the generalized Lotka-Volterra difference equation. The algorithm combines forward stepwise regression and bootstrap aggregation (bagging) to determine, in a majority voting fashion, the pairs of interacting variables and the value of their interaction coefficient. For each species, the logarithm of the change in abundance per time step is used as the response variable. Then, this method selects variables to be explanatory variables of linear regression step by step, as long as the predictive performance is improved. The species selected as the explanatory variables are candidates for the interaction partners. This process is repeated (here, 500 times) as bootstrap replicates, and the species selected in more than half of all iterations are finally determined as the interaction partners, i.e., variables that can affect the response variable. The interaction is estimated asymmetrically and in a way that includes positive and negative values. For each pair of species, the p-value was obtained as the number of bootstrap trials in which the interaction coefficient was zero divided by the number of all trials. This method returns a sparse matrix for the interaction matrix, but that of the p-values can be dense.

Evaluation
Since the number of major species varies from trial to trial, we evaluated only the interaction network of the top five abundant species. Due to the condition for acceptance (ω = 5 in z(θ, X 0 ) > ω), every trial contains at least 5 major species (Appendix B).
We here focus on the performance of the above methods in detecting the interactions because, in both real ecological communities and our model, interactions between species are sparse (Appendix B). To evaluate the performance of network inference methods through a classifier (correlation coefficient, predictive skill, or interaction coefficient), we used ROC-AUC (the area under the curve of a receiver-operator characteristic curve; Supporting Figure S1). An ROC curve is the plot of (1-specificity, sensitivity). Here, sensitivity is the number of interacting species pairs that a method was able to find divided by the actual number of them, and specificity is the number of non-interacting species pairs that a method correctly omitted divided by the actual number (in both cases, the presence/absence of an effect from species A to B and B to A must be distinguished). Given a matrix of statistics, an ROC curve is obtained when we keep changing the threshold value by which we judge that an interaction exists between species pairs. When the threshold value is too high, none of the elements are accepted, and the value takes (0,0). When the threshold value is too low, all of the elements are accepted, and the value takes (1,1). (This explanation is valid for cases where larger values indicate a greater likelihood that an interaction exists, but the opposite is true for cases such as p-values where smaller values suggest the presence of an interaction.) When the threshold is varied over a range covering (0,0) to (1,1), the more convex the ROC curve is to the upper left (in other words, the more sensitivity can be increased without decreasing specificity), the better the statistic is at detecting interactions. Here, the value of the ROC's AUC varies from 0 to 1 and becomes 1 when a method performs the best and 0.5 when it is indifferent to random selection.
The ROC-AUC can evaluate the overall performance even when the optimal threshold is not known, but from a practical standpoint, it is useful to have evaluations obtained under certain conditions. For this reason, we evaluated the precision for the interactions when half of the matrix elements are selected according to the descending (ascending) order (for simplicity, we refer to it as Precision (c = 0.5)), that is, how many correct interaction pairs are contained in the top 50% of candidates.

Software
We used Mathematica 10.2 and 11.0 for our analysis. The Mathematica notebook files used for the analysis can be downloaded from: https://drive.google.com/file/d/11suHcj6 RCC2p6gQ6zjTvzbo-DeExzR-g/view?usp=sharing (accessed on 19 January 2022).

Results
To evaluate five network inference methods, we generated in silico time-series microbial population datasets based on the mediator-explicit model (M) and direct interaction model (D) as well as modified models by adding the effect of resource competition (M and D ) (Appendix B; Figure A1). For the four models M, D, M , and D , we obtained network estimation results with five methods under different simulation conditions (see Supporting Figures S2-S5 for representative results). Results from the ROC-AUC indicate that in the presence of resource competition (M and D ), LIMITS and CCM, in this order, performed better than other methods (Figure 2a,b). While LIMITS attained the highest performance when p-values were used to detect interactions (Figure 2b), when the interaction was chemical-mediated (M ), the performance tended to decrease in the case of direct interaction (D ). The heatmap on the right side of each panel aids in the comparison between different conditions/methods. Both in Figure 2a,b, the value of row D 5 and column M 5 is positive, which means that the median of LIMITS in D was higher than that of M (black dot indicates that the difference is significant (p < 0.05) in terms of Mann-Whitney test). On the other hand, CCM was robust to the effect of direct or chemical-mediated interactions compared to LIMITS. In Figure 2a,b, the value of row D'4 and column M'4 shows that the difference between the median of CCM was close to zero and not significant. Relative to LIMITS and CCM, the correlation-based methods, Pearson and Spearman rank correlation and LSA, were ineffective in detecting both types of interactions, with median values always around 0.5. On the other hand, in the absence of nutrient competition (M and D), it was difficult to detect interactions with any of the methods. None of the methods significantly outperformed the others.
For M and D , the superiority of LIMITS over other methods was also evident when evaluated using precision (Figure 2c,d). In contrast to the case of ROC-AUC, there was no significant decrease in the performance of LIMITS in 'M' relative to D . This can be considered superficial for two reasons. The first reason is that precision is a criterion for identified interactions, but validity is not considered for non-identified interactions. The second reason is that it is necessary to set a single threshold for a classifier to obtain a precision value. Since the setting of the threshold is arbitrary, the same result may not always be obtained. Therefore, it was suggested that the effects of mediators could be overlooked in the evaluation by precision. Precision was higher in M relative to D and M (Figure 2c,d), but this was due to the difference in connectance since connectance was highest in M in all simulation conditions compared to other models (Appendix B; Figure A1) and precision was highly correlated to connectance (Supporting Figure S6). Thus, it is inappropriate to conclude that either the presence of mediators or nutrient competition improved the performance of network inference.
Within the range of the sampling interval (τ) and the magnitude of noise (σ) we studied, the above results were mostly robust, and there were no systematic dependencies of performance of the methods on either of the parameters (see Supporting Figures S7-S13). between the different pairs of models/methods. The value of a cell is obtained by subtracting the median of the pair of models/method in a column from the same value of the pair in a row. Black dots indicate that the difference is significant (p < 0.05) based on Mann-Whitney test. We compared the performance of the different methods applied to the same model, and we compared the performance when the condition of competition was the same but the property of the interaction was different (M and D or M and D ) and when the property of the interaction was the same but the condition of competition was different (M and M or D and D ).

Discussion
We compared the performance of five network inference methods in detecting interactions based on time series when interactions are mediated by chemicals or occur through direct interactions between species, as well as in different competitive contexts. Our results suggest that: (1) the existence of mediators can make microbial interactions difficult to detect, but the degree of difficulty may be method dependent, (2) nutrient competition can play an essential role for the detectability of interactions, and (3) correlation-based methods are not useful for detecting interactions from time series.
Among the methods evaluated in this study, LIMITS, which is derived from the discrete version of the generalized Lotka-Volterra equation, was found to be the most reliable. However, its performance will be reduced if chemical-mediated interactions are dominant due to a mismatch between the processes that the method assumes and those that actually occur. Although the performance of CCM was not as good as LIMITS, CCM would be more robust to the presence of chemical-mediated interactions compared to LIMITS. This may be due to the fact that CCM is not dependent on any specific equation and is based on a nonlinear forecasting method that can flexibly capture the relationship between variables. In summary, LIMITS is recommended as the network inference method in general since it outperformed the other methods under any application conditions in M and D and no method clearly outperformed the others in M and D. However, further investigation around the tools of nonlinear forecasting, especially integration with the machine learning framework used in LIMITS to improve its applicability to nonlinear dynamics, will be a promising direction for time-series-based network inference targeting microbial communities, where chemical-mediated interactions are thought to play a major role.
In the models with resource competition (M , D ), an increase in one species negatively affects the growth rate of other species and can reduce the population size. Thus, nutrient competition can facilitate coordinated variation in abundance and likely promote the detection of interactions. In fact, the difference between M, D and M , D was mostly characterized by the coefficient of variation (Appendix B). In microbiota, there are a few essential nutrients that are common to many species [28], as well as diverse chemicals that mediate interactions. Space will also be an important resource if, for example, the substrate for colonization can be a limiting factor. Thus, while we need to be careful about its strength, the network inference would usually benefit from competitive processes.
Finally, although correlation-based methods are widely used to infer interaction networks from time series, it should be noted that their reliability can be very low, regardless of whether the interactions are direct or chemical-mediated. This has been pointed out repeatedly in previous studies [12,26,29] for direct interactions, but we feel it is important to highlight again.
There are multiple levels of understanding of networks, from properties of the network as a whole, such as degree distribution and average degree, to properties of individual nodes, such as network centrality. The more attention we pay to finer scale properties, the more accurate the network inference needs to be. Therefore, improving the accuracy of the method used for network inference is essential for a network-based understanding of biological systems.

Conclusions
We found that the existence of mediators can make microbial interactions difficult to detect. However, the degree of difficulty was different among the methods. CCM and LIMITS were capable of detecting interactions from the time series. While LIMITS performed better than CCM, it was less robust to the presence of chemical-mediated interactions. Our result also suggests that the presence of nutrient competition can facilitate the detection of interactions. The existence of chemical-mediated interactions among microbial species poses a new challenge to overcome for the development of a network-based understanding of microbiomes and their relationship to hosts and the environment. Our study would provide an in silico experimental system of microbial population dynamics, including chemical-mediated interactions, to evaluate network inference methods that will be developed in the future.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/ijerph19031228/s1, Figure S1. Evaluation criteria for binary classification problems and their relationship to ROC curves; Figure S2. Representative result of network inference with the 5 methods for model M (τ = 40 and σ = 1); Figure S3. Representative result of network inference with the 5 methods for model D (τ = 40 and σ = 1); Figure S4. Representative result of network inference with the 5 methods for model M' (τ = 40 and σ = 1); Figure S5. Representative result of network inference with the 5 methods for model D' (τ = 40 and σ = 1); Figure S6. Relationship between the basic properties of a community and the performance of network inference for different pairs of model and method; Figure S7. Relationship between the parameters for generating data sets (sampling interval τ and magnitude of noise σ ) and the performance of network inference for different pairs of model and method; Figure S8. Performance of network inference methods for different models (left) and the comparison of the different pairs of model and method (right) for τ = 10 and σ = 1; Figure S9. Performance of network inference methods for different models (left) and the comparison of the different pairs of model and method (right) for τ = 20 and σ = 1; Figure S10. Performance of network inference methods for different models (left) and the comparison of the different pairs of model and method (right) for τ = 80 and σ = 1; Figure S11. Performance of network inference methods for different models (left) and the comparison of the different pairs of model and method (right) for τ = 40 and σ = 0.5; Figure S12. Performance of network inference methods for different models (left) and the comparison of the different pairs of model and method (right) for τ = 40 and σ = 2; Figure S13. Performance of network inference methods for different models (left) and the comparison of the different pairs of model and method (right) for τ = 40 and σ = 4.