1. Introduction
Evolutionary dynamics deals with networks where each node represents an individual that can be either a resident or a mutant. These networks can be complete—in which case, the dynamics form a Moran process [1]—or, more generally, change according to a birth–death updating rule and a uniform initialization process: at each discrete time point, the individual at a given node can implant its offspring in an adjacent node. The probability of implantation is proportional to the fitness of the parent individual and to the weight of the edge connecting the two nodes. Depending on the structure of the network, certain configurations facilitate the spread of the mutant, accelerating fixation, while others slow it down or prevent it [2,3,4]. The probability of fixation is the probability that a single mutant initially placed in one of the nodes of the network will asymptotically establish itself in all nodes [2]. Fixation probabilities also depend on the temporal trajectories of the networks [5].
How can the amplifying power of a network be reduced? Looking for a way to reduce it, Bhaumik and Masuda [6] proposed alternating the network with another fixed network, at intervals of equal or unequal duration. Both networks can be amplifiers. Even assuming constant mutant fitness over time, they obtained less amplification than with either of the two networks used over the entire period. Alcalde Cuesta et al. [7] modified the edge weights by removing all non-trivial loops, allowing the expected time to fixation or extinction to be shortened. Alcalde Cuesta et al. [8,9] gave examples of suppressor graphs, noting that these graphs are “hard to isolate or identify”. This is precisely the purpose of this article: to treat any network with a given number of neighbors in order to make it less amplifying, if not suppressive.
Moreover, Li et al. [10] modified a given network into a given target network after T time steps with the addition of sufficient “energy” at each time t from 0 to T. The dynamics evolve under this control, and the energy, a quadratic form in the control (where ⊤ denotes transposition), is to be minimized. These authors did not address the issue of reducing the amplifying effect of networks, which is the topic of this article.
Research question: Instead of choosing a network that oscillates between two fixed networks, can a given directed graph invaded by a mutant with fixed relative fitness s—which measures the reproductive success of an individual relative to others in the population—have its edge weights modified under resource constraints in T time steps, in order to make the resulting network as weakly amplifying as possible?
The successive modifications of the initial network have no target network but are required to yield the smallest possible fixation probability at horizon T. Optimization on the disturbance matrices will, firstly, make it possible to characterize the initial graph as amendable or not. Secondly, if it is amendable, the optimization will modify the initial matrix under resource constraints in order to reduce the amplifying effect as much as possible. The mathematical point of interest is to interweave stochastic optimization on a network with eigenvector solving, and to address a question relevant to evolutionary biology.
Implementing network control is relevant to cancer therapies, for example. Knowledge of the effects of individual drugs and of the interactions between multiple drugs has made it possible to predict the effect of drug synergies on cell-to-cell signaling [11]. Pharmacologically, it is possible to manipulate protein interactions (which constitute the interactome) [12] and influence the plasticity of intercellular networks [13]. Li et al. [10] provide further examples of network control in oncology, in ant behavior, and in networks among friends.
After recalling the calculation of the fixation probability (Section 2.1), the initial networks, the resource constraint, the number of nodes, the relative fitness of the mutant, and the time horizon are varied to estimate the difference between the fixation probabilities associated with the initial networks and with their modified versions (Section 2.2). In addition to the parameters, these quantities will be related to the variation in the mean variances of the incoming and outgoing edge weights.
The main finding is that the amplification effect of certain initial graphs can be reduced. The probability of being amendable is higher when each node has a larger number of neighbors, when the mean variance of the weights of incoming edges is high, and when that of the weights of outgoing edges is low. Among these amendable graphs, the fixation probability is reduced more as the time horizon increases, as the nodes have fewer neighbors, as the resource and the relative fitness are higher, and as the mean variance of incoming edge weights is low and that of outgoing edge weights is high. Moreover, the modification leads to more heterogeneous outgoing edge weights and to more homogeneous incoming edge weights. This optimal trade-off on the edge weight distributions is estimated numerically.
2. Materials and Methods
2.1. The Fixation Probability for a Static Network as a Fixed Point of Recursion
The N vertices of a network associated with an adjacency matrix C are occupied either by a wild (resident) individual with fitness 1 or by a mutant with fitness s. A state of the network is described by the N-dimensional vector b = (b_1, \ldots, b_N), with b_i = 1 if vertex i is occupied by a mutant type and b_i = 0 if i is occupied by a resident type. The set B of all states b has dimension 2^N. A bijection h assigns a unique integer from \{0, \ldots, 2^N - 1\} to each state b and conversely. An individual placed at a vertex i reproduces with a probability proportional to its fitness (s if it is a mutant or 1 if not) [3]. Its offspring replaces the occupant of vertex j with a probability proportional to the weight C_{ij} of the directed edge (i, j). The process continues until all nodes are occupied by the same type, which is then said to be “fixated”. This recursion is described by a 2^N \times 2^N Markov transition matrix P, whose entries are the probabilities that the state of the network changes from one value to another in one time step [6].
A state b with m mutants, with b_{\sigma(i)} = 1 for i \le m and b_{\sigma(i)} = 0 for i > m, where \sigma is a permutation on \{1, \ldots, N\}, is changed into a state b' with m + 1 mutants, such that b'_{\sigma(i)} = 1 for i \le m + 1 and b'_{\sigma(i)} = 0 for i > m + 1. The entry of the Markov transition matrix from b to b' is

P(b, b') = \frac{s}{sm + N - m} \sum_{i : b_i = 1} C_{i\,\sigma(m+1)}. (1)

Besides, the entry of the Markov transition matrix from b to the state b'' having m - 1 mutants, such that b''_{\sigma(i)} = 1 for i \le m - 1 and b''_{\sigma(i)} = 0 for i \ge m, where \sigma is the same permutation, is

P(b, b'') = \frac{1}{sm + N - m} \sum_{i : b_i = 0} C_{i\,\sigma(m)}. (2)

The probability that b is unchanged is

P(b, b) = 1 - \sum_{b' \ne b} P(b, b'). (3)
The fixation probability of a mutant initially occupying state b is the h(b)-th coordinate \rho_{h(b)} of the eigenvector \rho = (\rho_0, \ldots, \rho_{2^N - 1})^\top, where \top here denotes transposition, of P with the eigenvalue 1 [2,14]:

\rho = P \rho. (4)
As \rho_{h((0,\ldots,0))} = 0—because if the mutant is absent from all nodes, it has no chance of becoming fixated—and \rho_{h((1,\ldots,1))} = 1—because if the mutant already occupies all nodes, it is fixated for certain—solving Equation (4) amounts to solving 2^N - 2 equations.
The fixation probability is the mean over all N vertices of the coordinates of \rho for which the mutant initially occupies a single vertex [6]. The subset of the states associated with these coordinates is denoted B_1, so that the fixation probability is

\bar{\rho} = \frac{1}{N} \sum_{b \in B_1} \rho_{h(b)}. (5)
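As a concrete illustration, the recursion of Equations (1)–(3) and the boundary conditions above can be assembled and solved numerically for a small network. The sketch below is in Python rather than the article's Ratfor/IMSL code, and the function name is ours; it builds the 2^N-state transition matrix and solves the linear system for the mean single-mutant fixation probability:

```python
import itertools
import numpy as np

def fixation_probability(C, s):
    """Mean fixation probability of a single mutant under birth-death
    updating on a weighted digraph C (rows rescaled to sum to 1),
    for a mutant with relative fitness s."""
    N = C.shape[0]
    states = list(itertools.product([0, 1], repeat=N))  # the 2^N states
    h = {b: k for k, b in enumerate(states)}            # bijection h
    P = np.zeros((2**N, 2**N))
    for b in states:
        m = sum(b)
        if m in (0, N):                  # absorbing states
            P[h[b], h[b]] = 1.0
            continue
        total_fitness = s * m + (N - m)
        for i in range(N):
            fit = s if b[i] == 1 else 1.0
            for j in range(N):
                # i reproduces; its offspring replaces the occupant of j
                b2 = list(b)
                b2[j] = b[i]
                P[h[b], h[tuple(b2)]] += fit / total_fitness * C[i, j]
    empty, full = h[(0,) * N], h[(1,) * N]
    transient = [k for k in range(2**N) if k not in (empty, full)]
    # Solve rho = P rho with rho(empty) = 0 and rho(full) = 1
    A = np.eye(len(transient)) - P[np.ix_(transient, transient)]
    rho_t = np.linalg.solve(A, P[transient, full])
    rho = np.zeros(2**N)
    rho[full] = 1.0
    rho[transient] = rho_t
    # Mean over the single-mutant initial states (the set B_1)
    singles = [h[tuple(int(i == j) for i in range(N))] for j in range(N)]
    return rho[singles].mean()
```

On the complete graph with uniform weights, an isothermal graph, the result matches the Moran formula (1 − 1/s)/(1 − 1/s^N), which provides a quick sanity check.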
2.2. Controlling for the Least Possible Amplification
The adjacency matrix C is augmented, at each time t, with a perturbation matrix to form the adjacency matrix at time t + 1. The perturbation is denoted with the subscript t to emphasize that it is a control. Each control matrix satisfies the resource constraint that its entries are bounded in magnitude by a small enough positive constant.
The variation in the fixation probability after adding the control matrices over the time steps t = 1, \ldots, T is the difference between the final and the initial fixation probabilities (Equation (6)). The minimization program of the fixation probability after T time steps is Equation (7), under the resource constraints of Equation (8).
The expression of the fixation probability at horizon T provides no analytical formula for the optimal controls, due to the non-linear entanglement of the successive controls in the adjacency matrix C_T, each row of which is re-scaled, then in the transition matrix P, then in its first eigenvector \rho, then in the expression of the fixation probability, which involves only the single-mutant components of \rho. Numerical minimization is required. The computer code is in Ratfor, calling IMSL routines for solving the system of equations. It uses a stochastic optimization algorithm, presented in Appendix A and adapted from [15]. The calculation with a powerful computer (two 64-bit “AMD EPYC 7402” physical processors (maximum speed: 3 GHz), for a total of 96 logical processors) took more than six months. The article therefore fills a gap, but the solution is computational and the analysis is necessarily statistical.
As with all network problems, network dimension is an issue. For networks of dimension 8 with 4 neighbors, the minimization already involves many controls per time step: one control to parameterize the upper bound of the perturbations, plus the entry-wise controls for the closest neighbors and the reflexive loop at each node. Beyond that, for larger networks, parallelization and distributed architectures can handle very large dimensions, particularly with distributed stochastic methods [16], which involve calculating the gradient on a subset of data on different nodes and then aggregating the results. With this method, minimization can in principle involve millions of variables, and the network can have millions of nodes. However, the purpose of this article is not to conquer large dimensions, but to demonstrate the potential of a program designed to minimize the fixation probability. Doing so up to dimension 8 proves the point, and analyzing results for dimensions 6 through 8 is sufficient to identify econometric patterns.
For each combination of parameter values (time horizon T, dimension N, number of closest neighbors, relative fitness s, and resource constraint), at least 30 initial matrices associated with weighted directed graphs are simulated. Robustness is verified using bootstrapping. For some parameter combinations, and for each dimension N, the mean obtained by direct calculation fell outside the bootstrap confidence interval. Ten more draws were added for each of these combinations, ensuring that the associated bootstrap confidence intervals included the means obtained by direct calculation. The weights of the edges between the closest neighbors were drawn by uniform sampling between 0 and 1, so that the addition of the control matrices was calibrated relative to C. The edges between the closest neighbors are weighted in each direction; the others are set to zero.
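The bootstrap check described above can be sketched as follows (a minimal percentile-bootstrap illustration; the function name and default settings are ours, not the article's):

```python
import numpy as np

def bootstrap_mean_ci(x, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of sample x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Resample with replacement and record the mean of each resample
    means = np.array([rng.choice(x, size=x.size, replace=True).mean()
                      for _ in range(n_boot)])
    alpha = 1.0 - level
    return tuple(np.quantile(means, [alpha / 2, 1.0 - alpha / 2]))
```

If the directly computed mean falls outside the interval for some parameter combination, more draws are added, as done in the text.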
The minimization program {(7), (8)} is solved numerically by stochastic optimization on all entries of the control matrices for t = 1, \ldots, T. At each step, it requires calculating the current adjacency matrix, solving Equation (4) for the eigenvector \rho, computing the fixation probability of Equation (5), and repeating this process in the framework of stochastic optimization.
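The loop just described can be sketched in simplified form. The fragment below is a simulated-annealing sketch in Python, not the article's Ratfor/IMSL implementation; the proposal scale, cooling schedule, and generic `objective` callable are our assumptions. It perturbs the control matrices under the resource constraint, rescales the rows at each time step, and keeps the best value found:

```python
import numpy as np

def rescale_rows(M):
    """Rescale each row to sum to 1, as done for each adjacency matrix."""
    M = np.clip(M, 0.0, None)
    sums = M.sum(axis=1, keepdims=True)
    return M / np.where(sums > 0, sums, 1.0)

def anneal_controls(C0, objective, T=3, r=0.05, n_iter=2000, seed=0):
    """Choose T control matrices, each entry bounded by r in magnitude,
    to minimize objective(C_T); the article's objective is the fixation
    probability, but any callable on the final matrix works here."""
    rng = np.random.default_rng(seed)
    N = C0.shape[0]

    def final_matrix(controls):
        C = C0.copy()
        for d in controls:           # apply one control per time step
            C = rescale_rows(C + d)
        return C

    cur = np.zeros((T, N, N))        # start from the unmodified network
    cur_val = objective(final_matrix(cur))
    best, best_val = cur.copy(), cur_val
    temp = 0.1
    for _ in range(n_iter):
        cand = np.clip(cur + rng.normal(scale=0.01, size=cur.shape), -r, r)
        val = objective(final_matrix(cand))
        # Accept improvements always, worse moves with Boltzmann probability
        if val < cur_val or rng.random() < np.exp((cur_val - val) / temp):
            cur, cur_val = cand, val
            if val < best_val:
                best, best_val = cand.copy(), val
        temp *= 0.999                # geometric cooling
    return best, best_val
```

Plugging in a fixation-probability routine as `objective` reproduces, in miniature, the minimization of {(7), (8)}.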
3. Results
Figure 1 shows an example of such an optimal modification. In this example, the fixation probability decreases at each time step, but this is not compulsory.
Figure 2 presents the means of the reduction in the fixation probability, computed on samples of 30 or 40 matrices (so as to obtain means by direct calculation included in their bootstrap confidence intervals) drawn at random for each parameter value, for a number of time steps taken as an example (similar figures are obtained for the other horizons).
At horizon T, for certain matrices C, the additions of the control matrices fail numerically to reduce the fixation probability. The percentages of amendable matrices in Table 1 increase with the resources, the relative fitness s, and the number of neighbors. The percentages do not differ much according to the value of T: across the four horizons considered, the overall percentages are 56%, 56%, 57%, and 58%.
Estimation: As previously mentioned, for some simulated matrices, the fixation probability is not reduced after minimization in T steps. The question arises as to whether to account for this fact, which requires using Heckman’s two-stage selection regression [17], or to estimate the determinants of the reduction in a single regression, regardless of whether this criterion is zero or not. The lack of improvement after T steps may indicate a lack of linearity in the relationship between the initial matrices and the final difference in fixation probabilities. Matrices for which the criterion does not improve may have a particular structure: the first random draw would already yield the optimal adjacency matrix. This possibility is supported by the fact that the difference, when strictly negative, is small, as Figure 2 shows. This structural difference between amendable and non-amendable networks is accounted for by Heckman’s two-stage regression. The first stage characterizes this structural difference, while the second stage applies only to the amended matrices. Equation (9) below will show that amendable matrices have a different structure from matrices that are not amendable. Therefore, Heckman’s procedure is more consistent than mixing all the matrices into a single regression.
The estimation system consists of the probit model of the probability that the fixation probability is reduced, which provides the inverse Mills ratio (IMR), to be included as an explanatory variable in the regression of the reduction for the draws in which a reduction occurs.
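For reference, the inverse Mills ratio that links the two stages is the ratio of the standard normal density to its cumulative distribution, evaluated at the fitted probit index; a minimal computation (using SciPy, a choice of ours) is:

```python
from scipy.stats import norm

def inverse_mills_ratio(z):
    """IMR lambda(z) = phi(z) / Phi(z), where z is the fitted probit index;
    it enters Heckman's second-stage regression as an extra regressor."""
    return norm.pdf(z) / norm.cdf(z)
```

At z = 0 the ratio equals 2/sqrt(2*pi), and it decreases toward 0 as z grows, reflecting a vanishing selection correction for draws almost certain to be selected.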
In response to a reviewer, the probit model is based on the cumulative distribution function of the normal distribution. Its sigmoid curve, which allows for a gradual transition between 0 and 1, reflects a nonlinear relationship between the explanatory variables and the probability that the binary variable equals 1 (the matrix is amendable) versus 0 (it is not). The use of normality in the probit response function, as in any binary regression model, does not mean that the residuals must be normally distributed. The reason is that the residuals from binary regression are of the deviance or Pearson type. They measure the difference between the observed value (0 or 1) and the predicted probability. These residuals are discrete, or fit a binomial distribution, which does not correspond to a continuous normal distribution. In logistic or probit regression, more important than the normality of the residuals are the significance of the explanatory variables and the consistency of the estimates, which is the case here for Equation (9) below. The distribution of the residuals may be skewed or have heavy tails [18,19,20].
The probit model includes as explanatory variables the parameters and the mean variances of the outgoing and incoming edge weights of the initial adjacency matrix C, that is, the variance of the edge weights at each vertex, averaged over vertices, in each direction. Because each row of C is rescaled to build the transition matrix, each row being a probability distribution, the mean out-degree (the out-degree at a vertex is the sum of the weights of edges leaving the vertex) is constant (equal to 1). This is also the case for the mean in-degree (the in-degree at a vertex is the sum of the weights of edges arriving at this vertex).
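Under the natural reading of these definitions (our interpretation, since the formal expressions are not reproduced here), the two mean variances can be computed as:

```python
import numpy as np

def mean_edge_weight_variances(C):
    """Mean variance of outgoing edge weights (per row, averaged over rows)
    and of incoming edge weights (per column, averaged over columns)."""
    var_out = float(C.var(axis=1).mean())
    var_in = float(C.var(axis=0).mean())
    return var_out, var_in
```

For a row-rescaled matrix, perfectly uniform rows give a zero outgoing mean variance, the homogeneous benchmark against which the trade-off discussed in the Results is measured.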
A possible further explanatory variable, the initial fixation probability, turns out to be collinear with s on the entire set of simulated data and is therefore not included in the probit. For the regression conditioned on the draws with a reduction, its collinearity with s is sufficiently reduced that it can be included among the explanatory variables, with quadratic and cross-effects with s. Its presence among the explanatory variables reflects the fluidity properties of the initial network C. Adding the control matrices is likely to increase or decrease the weights of the edges, which is reflected in the variations of the two mean variances. These two variables themselves depend on the controls; hence a system of three regressions, each incorporating the IMR and estimated by taking into account the variance–covariance matrix between the residuals of these three equations.
Estimation result: On normalized variables, Heckman’s two-step system is estimated on 21,600 observations. The probit, given in Equation (9) for the jth draw, uses a star to indicate significance at the 5% threshold; the first number under each coefficient is the associated standard deviation. The perturbations are homoscedastic with zero expectation. The model correctly classifies 63% of the predictions, which is not low, but not that high either. The point is that several explanatory variables have significant coefficients, so taking the selection effect into account should improve the estimation of the associated three regressions below. The average marginal effects (AME), presented under the coefficients in second position, highlight the major conflicting influences (AME = −0.67 and 1.89) of the two mean variances of the weights of the outgoing and incoming edges of the initial adjacency matrix C, and, in third place, overshadowing the effects of the other parameters, the role played by the number of neighbors of each node (AME = 0.46). The probit model was run successively without each of the two mean variances: the coefficients varied only slightly compared to the results of the probit model that included both. This indicates that the multicollinearity between the two mean variances is sufficiently low to validate the probit model including both variables.
Out of the 21,600 simulated draws, 12,308 exhibit a reduction. For the jth of these draws, the estimation on normalized variables is given in System (10), where the numbers in parentheses are standard deviations; the perturbations are homoscedastic with zero expectation; and the correlations between the residuals of the three equations are also estimated. The coefficients of determination are 0.40, 0.45, and 0.30, respectively. In a Heckman model with a nonlinear relationship and cross-effects, R² can often appear relatively low, even if the model is actually relevant and useful [21]. This stems from the nonlinear and inextricable dependence of the dependent variable on its predictors. An R² of this order is acceptable and even good in this context of nonlinear relationships [21], and specifically in the Heckman model [22].
4. Discussion
The probit model of Equation (9) has shown that a graph modified by the controls is more likely to be less amplifying with more neighbors, which gives more possibilities, with more resources, and with a higher relative fitness s. The more nodes the graph has (the higher N), the less the graph can be amended towards less amplification, but this effect is small (0.02). The time horizon T has a negative effect, because if the matrices are not modified at the first step, they are not modified at the subsequent steps.
On the subset of draws for which C manages to be amended towards less amplification, System (10) of simultaneous equations shows the determinants that reduce amplification (their coefficients are significant and negative) and those that increase amplification (their coefficients are significant and positive). System (10) thus shows that a longer time horizon T favors reduction (coefficient −0.04), as a longer time horizon allows more resources to be used to modify the graph. This is also the case for the dimension N of the graph (coefficient −0.02): the larger it is, the less likely the mutant becomes fixated, as it faces greater competition from more numerous residents. This is also the case for the relative fitness s and its cross-effect with the initial probability of fixation, as this term corresponds to a downward curve as a function of s, regardless of the other parameters. From this term, the decrease in the fixation probability accelerates when the initial fixation probability increases. Therefore, from the probit model, an initial matrix C with a higher relative fitness s has a higher probability of being amended towards less amplification, and, from the first regression of (10), the expected reduction in amplification is all the greater.
This regression also shows that the fixation probability is reduced when the initial adjacency matrix has a high mean variance of outgoing edge weights (coefficient −0.32) and a small mean variance of incoming edge weights (coefficient 0.45).
Moreover, an increase in the mean variance of the weights of the outgoing edges between the initial and the final graph favors a reduction in the amplification effect (coefficient −0.44). As the weights of the outgoing edges are normalized by row for the calculation of the transition matrix, an increase in the mean variance corresponds to greater heterogeneity of the outgoing edge weights. Then, after the modification sequence, certain weights are relatively weaker compared to the strongest links, which amounts to weakening the probability for the mutant to spread through these edges. This is consistent with the coefficient 0.08 of the number of neighbors, which thwarts the reduction in amplification.
The growth in the mean variance of the weights of the incoming edges, in contrast, thwarts the reduction in the amplifying effect (coefficient 0.17). Limiting amplification requires distributing the risks of contamination among the incoming flows, thus tending toward uniform weighting of incoming edges, which, because of the correlation of 0.36 between the two variations, runs up against the amplification-inhibiting role of the variation in the outgoing mean variance. The regression of the incoming variation on the outgoing one gives a coefficient of 0.49 (SD = 0.01), which leaves an advantage for the outgoing effect. The search for less amplification therefore involves increasing the heterogeneity of the weights of the outgoing edges, at the cost of increasing the homogeneity of the incoming edge weights.
Finally, the significance of the IMR coefficient confirms that the bias arising from selection must be taken into account.
The variance–covariance matrix provides correlations close to 0 between the residuals of the first regression and those of the regressions of the two other dependent variables, so that taking the variance–covariance matrix into account does not change the estimates of each regression taken individually, but this deserved to be tested.
Limitations: As is often the case with graph-related problems, a limitation is the size N of the graph. Parallelization and distributed architectures could handle larger networks. Also, rather than solving the system of equations, the fixation probability could be computed by iterating the transition matrix on an initial state vector until convergence, but this adds even more computing time. The aim here, however, is not to explore large graphs; it is simply to show that minimization is possible and that the largest reduction in the fixation probability is related to the properties of the graph. The time horizon is another limitation, since it lengthens the sequence of control matrices accordingly. Each adjacency matrix was simulated with a fixed number of neighbors. This has the advantage of quantifying the influence of this number by varying it and of working with a fixed set of parameters. However, modifying the edge weights in stages can be adapted to any given initial network, whether it was, for example, generated to be scale-free, random, or small-world.
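The iterative alternative just mentioned can be sketched as follows (a minimal illustration of iterating the recursion with the absorbing boundary conditions held fixed; the function name and tolerance are ours):

```python
import numpy as np

def fixation_by_iteration(P, full_idx, tol=1e-12, max_iter=100_000):
    """Iterate rho <- P rho, holding rho at the all-mutant state at 1,
    until the fixation probabilities converge (the all-resident state
    stays at 0 automatically when it is absorbing)."""
    rho = np.zeros(P.shape[0])
    rho[full_idx] = 1.0
    for _ in range(max_iter):
        new = P @ rho
        new[full_idx] = 1.0          # keep the boundary condition fixed
        if np.max(np.abs(new - rho)) < tol:
            return new
        rho = new
    return rho
```

This trades one linear solve for many matrix–vector products, which is why the text notes that it adds computing time in the stochastic optimization loop.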
5. Conclusions
The question was whether it was possible to amend a given directed network in order to reduce its power to amplify the spread of a mutant spreading according to the birth–death rule. The idea was to modify the weights of the edges of the graph associated with each step of a given horizon, subject to resource constraints. As amplification power is measured by the fixation probability associated with the adjacency matrix C of this graph, reducing this probability as much as possible led to a minimization program under resource constraints. This program cannot be solved analytically. A stochastic minimization in large dimension showed that, yes, reducing the fixation probability under resource constraints is feasible. This is the first main result. However, the computational cost is quite high for dimensions ranging from 6 to 8, because stochastic minimization requires repeated calculation of the transition matrices and their first eigenvectors in a simulated annealing scheme.
The second main result comes from the simulation of 21,600 random directed matrices, which helped clarify the influences of the parameters, the fixation probability of the initial adjacency matrix, the mean variances of the weights of outgoing and incoming edges of the initial graph, and the variations of these mean variances. The trade-off between these mean variances is estimated in the probit model by the coefficients −1.69 and 4.73 and in the first regression of System (10) by the coefficients −0.32 and 0.45 for the initial graph. It is backed up by the effects in the same opposite directions (−0.44 and 0.17), in the first regression of System (10), of the differences between the initial and final graphs of these mean variances. A promising research direction is the simultaneous minimization and maximization of competing objectives, in order to examine how the resulting Pareto optimal solutions [23] influence the fixation probability.
These results make it possible to estimate the probability of reducing the amplifying effect of any given adjacency matrix C and to stipulate which perturbation matrices to use in order to reduce the amplifying effect as much as possible, by exploiting the variability and distribution of the connection weights.