Backtracking and Mixing Rate of Diffusion on Uncorrelated Temporal Networks

We consider the problem of diffusion on temporal networks, where the dynamics of each edge is modelled by an independent renewal process. Despite the apparent simplicity of the model, the trajectories of a random walker exhibit non-trivial properties. Here, we quantify the walker’s tendency to backtrack at each step (return where he/she comes from), as well as the resulting effect on the mixing rate of the process. As we show through empirical data, non-Poisson dynamics may significantly slow down diffusion due to backtracking, by a mechanism intrinsically different from the standard bus paradox and related temporal mechanisms. We conclude by discussing the implications of our work for the interpretation of results generated by null models of temporal networks.


Introduction
Random walks (RWs) play a key role in network theory [1]. RWs are at the core of algorithms to explore the network structure and to uncover its important features, such as the centrality of the nodes [2,3] or the presence of communities and modules [4,5]. RWs are also often used as a simplified model for the diffusion of an entity, e.g., people in a network of places, or a virus or information on a social network, and several works have focused on the impact of static properties of a network, e.g., its degree distribution, on the dynamical properties of a RW [6]. Driven by the availability of longitudinal data of empirical networked systems, and the increased importance of temporal networks [7,8], researchers have only much more recently considered how the temporal properties of a network affect diffusion. Empirical observations have shown that temporal properties of networks strongly differ from classical homogeneous Poisson processes, due to their non-stationarity [9], correlations between the activation times of network entities [10-12] and fat-tailed inter-event times of activations [13]. A central question is to understand the mechanisms that either accelerate or slow down the diffusion, for instance through the characteristic time for the dynamics to converge to the equilibrium state. This question has been considered by means of numerical simulations, by simulating a diffusive process on empirical temporal network data [14], and comparing its speed with the same process run on randomized null models [15]. A theoretical approach, which we also adopt here, consists in neglecting correlations between activations of different edges, and modelling their dynamics as independent renewal processes [16,17]. In the taxonomy of RWs on networks, this corresponds to the popular edge-centric passive RW [1]. In particular, we explore in detail the implications of an apparently paradoxical situation [17,18]: despite the fact that edges are independent processes, they cease to be independent along the path of a walker when the inter-event time distribution is non-exponential, which may lead to biases in its dynamics and non-Markovian trajectories.
In this paper, we illustrate and analyse this effect for a specific dependency pattern between successive jumps, namely the tendency for the random walker to backtrack, i.e., to return to the previously visited node more often than a purely Markovian walker would. Our contributions are twofold. First, we compute the backtrack probability as a function of the shape of the inter-event time distribution. Second, we estimate the impact of the resulting backtracking bias on the mixing rate of the process. Taken together, these results allow us to quantify a mechanism that may either slow down or accelerate diffusion by changing the number of steps leading to mixing. This mechanism is inherently different from well-known mechanisms such as the bus paradox [19] or other temporal mechanisms [20], which only affect the time to relaxation, and not the number of steps. Our observations also allow us to gain insight into unexpected properties of a standard null model for temporal network analysis.

Bias on the Probability of Backtracking
The edge-centric passive RW is defined as follows. Edges are activated for an infinitesimal duration. The time between two consecutive activations of an edge is governed by a renewal process, with i.i.d. inter-activation times distributed according to a probability density function. The activation processes on different edges are independent. For the sake of notation, we consider the same inter-activation distribution $f(t)$ for every edge; nonetheless, the forthcoming analytical derivations hold for distinct edge activities as well. The random walker waiting on a node jumps through the first edge incident to the node that is activated. As we consider continuous inter-activation distributions, the probability that two edges activate simultaneously is zero, and the walker never has to choose between multiple available edges. In order to describe the process, it is crucial to estimate the waiting-time distribution $g(t)$, i.e., the distribution of the time that the random walker arriving on a node has to wait before a given edge activates. Under the assumption that the arrival of the walker on the node is independent of the edge activation, the relation between the inter-activation distribution $f(t)$ on the given edge and the waiting-time distribution $g(t)$ is given by:

$$ g(t) = \frac{1}{\langle \tau \rangle} \int_t^{+\infty} f(\tau)\, d\tau, \tag{1} $$

where $\langle \tau \rangle \equiv \int_0^{+\infty} \tau f(\tau)\, d\tau$ is the mean inter-activation time. The observation that $g(t)$ may have very different moments from $f(t)$, for instance a much larger mean, is known under the name of bus paradox, or inspection paradox [19]. In a popular example, one may think of someone arriving at a bus stop and having to wait (mean of $g$) for a bus much longer than the mean inter-arrival time between two buses (mean of $f$).
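As an illustration of the bus paradox, the mean of $g(t)$ obtained from Equation (1) equals $\langle \tau^2 \rangle / (2 \langle \tau \rangle)$, which can be estimated directly from samples of $f(t)$. The following is a minimal sketch; the function names and the choice of test distributions are assumptions made for this example and do not come from the analysis above.

```python
import random


def mean_wait_from_samples(inter_activation_times):
    """Estimate the mean of g(t) as <tau^2> / (2 <tau>) from samples of f(t)."""
    n = len(inter_activation_times)
    first_moment = sum(inter_activation_times) / n
    second_moment = sum(t * t for t in inter_activation_times) / n
    return second_moment / (2.0 * first_moment)


def compare_means(sampler, n_samples=100_000, seed=0):
    """Return (mean of f, estimated mean of g).

    `sampler` is a hypothetical callable taking a random.Random instance
    and returning one inter-activation time.
    """
    rng = random.Random(seed)
    samples = [sampler(rng) for _ in range(n_samples)]
    return sum(samples) / n_samples, mean_wait_from_samples(samples)


if __name__ == "__main__":
    # Poisson case (exponential f): the two means should be close.
    print(compare_means(lambda rng: rng.expovariate(1.0)))
    # Bursty gamma with shape 0.25 and mean 1: the mean wait exceeds the mean of f.
    print(compare_means(lambda rng: rng.gammavariate(0.25, 4.0)))
```

For the exponential distribution the two returned values should agree up to sampling noise, whereas a distribution with a larger coefficient of variation yields a mean wait above the mean inter-activation time.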
It is important to note that this independence assumption is, in general, not respected if the walker passes several times by the same edge, as information about the previous passage time may help to predict the next activation time. This effect is most apparent for undirected networks, which we consider from now on. Consider a walker taking an edge from $i$ to $j$. While activation times to nodes different from $i$ may be affected by the bus paradox, this is not the case for the edge going back from $j$ to $i$, i.e., the backtracking transition, see Figure 1. The waiting-time distribution for $j$ to $i$ is thus the inter-activation distribution $f(t)$, while we assume it is reasonably approximated by $g(t)$ for other edges (we neglect, in particular, memory effects due to the random walker exploring other short cycles such as triangles, leading to a similar if attenuated effect, see Discussion). The resulting statistical difference between backtracking and non-backtracking transitions may thus lead to biases in the dynamics of the walk, as the backtracking edge will be either favoured or penalized compared to the other edges depending on the underlying dynamics.
Figure 1. Illustration of the backtracking bias on an edge $ij$. When the walker arrives on $j$ via $ij$, the next activation time $t$ of the edge $ij$ is given by the distribution $f(t)$ of the renewal process, whereas the next activation time $\tau$ of another independent edge $jk$ is given by the distribution $g(\tau)$ associated to the bus paradox.
Let us now determine the probability $P_d$ that the walker performs a backtracking jump as a function of the degree $d$ of a node. Denoting by $X_1, \ldots, X_{d-1} \sim g(t)$ the independent, identically distributed waiting times for the activation of the $d-1$ other competing edges, we have

$$ P_d = \int_0^{+\infty} f(t)\, \Pr\left[\min(X_1, \ldots, X_{d-1}) > t\right] dt. \tag{2} $$

The independence of the edges implies

$$ \Pr\left[\min(X_1, \ldots, X_{d-1}) > t\right] = \prod_{i=1}^{d-1} \Pr\left[X_i > t\right] \tag{3} $$

$$ \phantom{\Pr\left[\min(X_1, \ldots, X_{d-1}) > t\right]} = \left( \int_t^{+\infty} g(\tau)\, d\tau \right)^{d-1}, \tag{4} $$

where we have assumed that each edge has the same waiting-time distribution $g(t)$ for the sake of simplicity. When this assumption is not verified, the forthcoming development holds using Equation (3) instead of (4). Injecting (1) and (4) in (2) and permuting the integrals yields

$$ P_d = \frac{1}{\langle \tau \rangle^{d-1}} \int_0^{+\infty} f(t) \left[ \int_t^{+\infty} (\tau - t)\, f(\tau)\, d\tau \right]^{d-1} dt. \tag{5} $$

The expression of $P_d$ can also be rewritten so that the cumulative distribution function of $f(t)$ appears explicitly:

$$ P_d = \frac{1}{\langle \tau \rangle^{d-1}} \int_0^{+\infty} f(t) \left[ \langle \tau \rangle - t + \int_0^{t} F(\tau)\, d\tau \right]^{d-1} dt, $$

where $F(t) = \int_0^{t} f(\tau)\, d\tau$. The probability $P_d$ depends on the number of competing edges $d-1$ but also on the shape of the distribution. In particular, the presence of powers of $t$ in the integral indicates that the shape of the distribution impacts the backtracking probability $P_d$ at least through its first $d-1$ moments and thus through its variance. In the Poisson case, where $f(t)$ is an exponential distribution $\lambda e^{-\lambda t}$, the backtracking probability simplifies into the memoryless case

$$ P_d = \frac{1}{d}. $$

Another interesting case is the power-law

$$ f(t) = \frac{\alpha - 1}{t_{\min}} \left( \frac{t}{t_{\min}} \right)^{-\alpha} $$

for $t \geq t_{\min}$ and $f(t) = 0$ for $t < t_{\min}$ with $\alpha > 2$ (since the expression of $g(t)$ assumes finite mean), where

$$ P_d = \frac{1}{(\alpha - 1)^{d-2} \left[ d(\alpha - 2) + 1 \right]}. $$

Numerical simulations illustrate these results in Figure 2, where $f(t)$ follows various distributions including the exponential, gamma and power-law distributions. Note that the numerical convergence of the simulation is not guaranteed when the variance of the inter-activation distribution becomes infinite, which happens, for instance, for power-law distributions of exponent $\alpha < 3$. For each of these families of distributions, the higher the variance, the higher the probability of backtracking. However, as mentioned before, the backtracking probability depends, in general, on the full shape of the distribution.
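The backtracking probability can be checked by direct sampling. The sketch below does so for the power-law case: the backtracking edge draws a fresh inter-activation time from $f(t)$, while each competing edge draws a waiting time from $g(t)$, obtained here as a uniform fraction of a length-biased interval. The closed-form expression used for comparison follows from evaluating Equation (5) for this density and was worked out for this illustration; all function names and parameter values are assumptions.

```python
import random


def backtrack_probability_power_law(alpha, d):
    """Closed form of P_d for f(t) ~ t^(-alpha) on t >= t_min, with alpha > 2.

    Obtained by evaluating Equation (5) for this density (illustrative derivation).
    """
    return 1.0 / ((alpha - 1.0) ** (d - 2) * (d * (alpha - 2.0) + 1.0))


def simulate_backtrack_power_law(alpha, d, n_trials=50_000, seed=0, t_min=1.0):
    """Monte-Carlo estimate of P_d for a node of degree d."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Backtracking edge: fresh inter-activation time drawn from f
        # by inverting its survival function (t / t_min)^-(alpha - 1).
        u = 1.0 - rng.random()  # uniform on (0, 1]
        t_back = t_min * u ** (-1.0 / (alpha - 1.0))
        # Competing edges: waiting times drawn from g, i.e. a uniform
        # fraction of a length-biased interval (density ~ t^(1 - alpha)).
        first_other = float("inf")
        for _ in range(d - 1):
            u = 1.0 - rng.random()
            biased_interval = t_min * u ** (-1.0 / (alpha - 2.0))
            first_other = min(first_other, rng.random() * biased_interval)
        if t_back < first_other:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    for alpha, d in [(2.2, 3), (2.5, 2), (3.5, 3)]:
        print(alpha, d,
              simulate_backtrack_power_law(alpha, d),
              backtrack_probability_power_law(alpha, d),
              1.0 / d)  # memoryless reference value
```

Comparing the last two columns printed for each pair shows whether a given exponent favours or penalizes backtracking relative to the memoryless value $1/d$.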
As a next step, we test the importance of this effect in real-world systems by considering four datasets of face-to-face contacts described in [21-25]. From the recorded contacts, we obtain the largest component, with a typical size of a few hundred nodes. We extract the inter-activation times between each pair of individuals, and aggregate them in the empirical inter-activation distribution $f(t)$. We simulate a RW on the corresponding homogeneous network where every edge activity is a renewal process governed by $f(t)$. The probability of backtracking as a function of the node degree, computed up to the largest node degree of the network, is displayed in Figure 3. We observe in the real-world data that backtracking is, overall, higher than in the memoryless case. This result shows that the backtracking bias is inherent to random walk processes, even when edge activities are uncorrelated. The paradox lies in the fact that the random walker has a tendency to take one particular edge over the others, even if each edge is statistically equivalent.

Impact of Backtracking on the Mixing Rate of the Random Walk
In the previous section, we have shown that the shape of the inter-activation distribution may induce a backtracking bias for random walkers on a temporal network. We now estimate how this bias impacts the speed of diffusion, by estimating the mixing rate of the process. From now on, we take a discrete-time perspective and no longer consider the timings at which events take place. Time is measured by the number of steps $k$ performed by the walker, thus focusing on the question: on average, how many steps does a walker have to perform for the process to reach equilibrium? This approach is in contrast with previous works focusing on the impact of the inter-activation distribution on the mixing time [20] and neglecting backtracking biases.
We first consider a standard memoryless RW process, i.e., where no backtracking bias is present. In that case, the mixing rate is obtained from the second dominant eigenvalue of the transition matrix of the process, equivalent to the spectral gap of the corresponding normalized Laplacian of the graph. As we show in Appendix A, the spectral properties of the transition matrix are equivalent to those of a transition matrix defined on the so-called line graph, where edges of the original graph define nodes in the line graph. This equivalence is relatively intuitive, as both processes are essentially equivalent (only their representation changes), but it is crucial as a line graph formulation is natural to represent second-order Markov processes. In the following, we thus consider the spectral gap of the transition matrix of the line graph, defined as $1 - |\lambda_2|$, where $\lambda_2$ is the eigenvalue with the second largest modulus. The corresponding eigenmode describes the asymptotic dynamics of the process and is associated to the presence of bottlenecks/modules in the network [5]. The spectral gap provides information on the speed of convergence to stationarity since the distance between the transient state of the initial condition and the stationary state decays to 0 as $|\lambda_2|^k$ for large $k$. Therefore, the characteristic number of jumps for relaxation to stationarity is of the order $-1/\log(|\lambda_2|)$.
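As an illustration, the quantities discussed above can be computed numerically for a memoryless walker on a small graph. The sketch below assumes NumPy is available and a connected undirected graph given by its adjacency matrix; the function name `mixing_summary` is hypothetical. The characteristic number of steps is taken as the decay constant of $|\lambda_2|^k$, that is $-1/\log|\lambda_2|$.

```python
import numpy as np


def mixing_summary(adjacency):
    """Return (|lambda_2|, spectral gap, characteristic number of steps)
    for the memoryless random walk on an undirected, connected graph."""
    A = np.asarray(adjacency, dtype=float)
    # Row-stochastic transition matrix T_s = D^{-1} A.
    T = A / A.sum(axis=1, keepdims=True)
    # Eigenvalue moduli sorted in decreasing order; the first one is 1.
    moduli = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    lam2 = float(moduli[1])
    gap = 1.0 - lam2
    if lam2 <= 0.0:
        steps = 0.0            # stationarity is reached immediately
    elif lam2 >= 1.0 - 1e-12:
        steps = float("inf")   # e.g. bipartite graphs: the walk does not mix
    else:
        steps = -1.0 / float(np.log(lam2))
    return lam2, gap, steps


if __name__ == "__main__":
    # Complete graph on four nodes, used here only as a toy example.
    K4 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
    print(mixing_summary(K4))
```

A network with strong community structure has $|\lambda_2|$ close to 1 and therefore a large characteristic number of steps.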
Table 1 compares the spectral gaps of the Markovian walker and the backtracking walker (with a backtracking probability computed by (5)) for four datasets, showing a significant slowdown of mixing due to backtracking alone. As a next step, we quantify this intuitive effect by performing a first-order approximation around the Markovian case, and focusing on regular networks for the sake of simplicity. Take an undirected network of $\nu$ nodes and $\mu$ edges, with a $\nu \times \nu$ adjacency matrix $A$. From $A$, we get the stochastic transition matrices $T_s$ of the network, and $G_s$ of its line graph $G$ associated to the standard memoryless RW. Importantly, it can be shown that both transition matrices $T_s$ and $G_s$ share the same non-zero eigenvalues (see Appendix A for details) and correspond to the same linear dynamics described from the point of view of nodes and edges respectively. We now consider a system where the trajectories of the walker are non-Markovian, such that the transition matrix $M_s$ on the line graph differs from the transition matrix $G_s$ of the line graph associated to the Markovian case. We consider a small deviation due to the probability of backtracking, by adding a perturbation matrix $P$:

$$ M_s = G_s + P. \tag{6} $$

Each row of $P$ captures the bias $\epsilon_{ji}$ of backtracking from edge $i \to j$ to edge $j \to i$ compared to a Markovian RW on $T_s$. The row of $P$ corresponding to a jump from an edge $i \to j$ has the entry $\epsilon_{ji}$ in the column corresponding to edge $j \to i$, and $-\epsilon_{ji}/(d(j) - 1)$ in the columns of the $d(j) - 1$ other edges leaving the node $j$, where $d(j)$ is the degree of node $j$. For the sake of simplicity, we calculate the impact of $\epsilon_{ji}$ on the spectrum of $M_s$ for regular networks, where the degree is constant and the backtracking bias is thus $\epsilon_{ji} = \epsilon$ for every edge $j \to i$. In this case, each eigenvalue $\lambda$ of $M_s$ is associated to an eigenvalue $\lambda_0$ of $G_s$ through the equality $\lambda = \lambda_0 + \lambda^*$, where the perturbation $\lambda^*$ is to be determined. Standard derivations lead to the first order approximation (see Appendix B for details):

$$ \lambda^* = \frac{w_L^T P\, w_R}{w_L^T w_R} \tag{7} $$

$$ \phantom{\lambda^*} = \frac{1}{\lambda_0}\, \frac{v_L^T K_{out} P K_{in}^T v_R}{v_L^T D_A v_R}, \tag{8} $$

where $w_L$, $w_R$, $v_L$ and $v_R$ correspond to left and right eigenvectors of $G_s$ and $T_s$ respectively, associated to $\lambda_0$, $K_{in}$ and $K_{out}$ are the in and out incidence matrices, and $D_A$ is the diagonal matrix of node degrees (see Appendix A). The sign of the corresponding shift can be determined as follows. On the one hand, $D_A$ is positive definite; on the other hand, for a positive bias $\epsilon$, $K_{out} P K_{in}^T$ is symmetric diagonally dominant with real non-negative diagonal entries by construction, hence positive semidefinite. Therefore, the sign of $\lambda^*$ takes the sign of $\lambda_0$, and consequently the spectral gap of $M_s$ decreases when the dynamics favours backtracking, and increases otherwise.
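The first-order shift can be checked numerically on a small regular graph. The sketch below builds the biased transition matrix on the directed edges of a circulant graph and compares its second largest eigenvalue modulus with a first-order prediction. For a $d$-regular graph the quadratic forms in the perturbation formula can be evaluated explicitly, giving $\lambda^* \approx \epsilon\, d\, (1 - \lambda_0^2) / ((d-1)\, \lambda_0)$; this closed form is a specialisation worked out for this illustration rather than a formula quoted from the text, and the graph, the value of $\epsilon$ and all function names are assumptions. NumPy is assumed to be available.

```python
import numpy as np


def circulant_adjacency(n, offsets=(1, 2)):
    """Regular ring in which node i is linked to i +/- s for each offset s."""
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for s in offsets:
            A[i, (i + s) % n] = 1
            A[(i + s) % n, i] = 1
    return A


def biased_line_graph_matrix(A, eps):
    """Transition matrix on directed edges of a d-regular graph, with
    backtracking probability 1/d + eps and 1/d - eps/(d-1) for other edges."""
    n = A.shape[0]
    d = int(A[0].sum())
    edges = [(i, j) for i in range(n) for j in range(n) if A[i, j]]
    index = {edge: k for k, edge in enumerate(edges)}
    M = np.zeros((len(edges), len(edges)))
    for (i, j), row in index.items():
        for k in range(n):
            if A[j, k]:
                M[row, index[(j, k)]] = (1.0 / d + eps) if k == i else (1.0 / d - eps / (d - 1))
    return M


def second_largest_modulus(M):
    return float(np.sort(np.abs(np.linalg.eigvals(M)))[-2])


def first_order_eigenvalue(lam0, d, eps):
    """lam0 + lam*, with lam* specialised to d-regular graphs (illustrative)."""
    return lam0 + eps * d * (1.0 - lam0 ** 2) / ((d - 1) * lam0)


def compare_on_circulant(n=12, eps=1e-3):
    A = circulant_adjacency(n)
    d = int(A[0].sum())
    # For this graph the second eigenvalue of T_s is positive, so its
    # modulus can be used directly as the signed value lam0.
    lam0 = second_largest_modulus(A / d)
    exact = second_largest_modulus(biased_line_graph_matrix(A, eps))
    return lam0, exact, first_order_eigenvalue(lam0, d, eps)


if __name__ == "__main__":
    print(compare_on_circulant())
```

With a positive bias the computed modulus lies above the Markovian value, i.e., the spectral gap shrinks, in line with the sign argument given above.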
Finally, we validate our linear approximation of the real shift of the spectral gap $(1 - |\lambda_2|)$ with respect to the backtracking perturbation $\epsilon$ by computing the relative error on the spectral shift on several regular graphs. As an illustration, Figure 4 shows the results for a regular network made of two communities of 50 nodes each, which is in the typical range of size of the four studied real-world networks (from 75 to 327 nodes). In each case, the approximation provides a small relative error on the estimation of the spectral gap. Moreover, numerical simulations show that the linear approximation gives a lower bound to the true value of the spectral gap, and confirm the trend of slowing-down of the process under positive backtracking bias.

Discussion
The main purpose of this paper was to highlight the existence of a neglected, yet important, correlation taking place in a null model actually designed to destroy temporal correlations in temporal networks. In models where edges are undirected and their activations are independent stochastic processes, dependencies between successive jumps of a random walker are introduced, making the RW non-Markovian. Although we focused on backtracking in this paper, it is clear that further memory is created in the RW by triangles, or short cycles in general. Thus, while backtracking is absent in directed networks (as a return edge does not necessarily exist or is ruled by an independent activation process), the effects from short cycles remain, and as such, are an interesting topic for further research.
Our findings question the relevance of standard models of diffusion on temporal networks: in the presence of bursty activation patterns, one cannot avoid correlations between events, either in the activation process or in the jumping process, making it a non-trivial task to characterize the 'simplest' diffusion process with a given degree of burstiness. While we do know that real-world diffusion in social or mobility networks exhibits non-Markovian patterns [12,26], those patterns sometimes favour, and sometimes reject, backtracking regardless of the degree of burstiness of the process, making it clear that they cannot be entirely accounted for by the effect at play in this paper. Whether the burstiness-induced memory is an undesirable artefact of the model or a useful and economical way to generate non-Markovian walks remains to be seen.
We have computed the effect of backtracking on the mixing rate of the diffusion process, due to its modification of the trajectories of the RW. This effect should be further amended by the effect of triangles or longer cycles, likely leading to a further asymptotic slowdown of the diffusion. This is a new mechanism for the impact of network temporality on diffusive processes, adding to intrinsically different mechanisms such as the bus paradox, where a walker may sit on a node for a very long time [11,19], and the further fact that the mixing time of a bursty walker may be much larger than the naive estimate given by the number of jumps required to explore the network multiplied by the average waiting time of the walker at each step [20]. This is a tribute to the extraordinary richness of phenomena brought by the sole departure from a Poisson or discrete-time diffusion process. A comprehensive theory articulating all these effects on a general temporal network while yielding useful insight into the diffusion in a complex environment is yet to be established.

Appendix A. Shared Eigenvalues of the Transition Matrix and Its Associated Transition Line Graph
We prove here that the transition matrix $T_s$ of a network and the transition matrix $G_s$ of its associated line graph share the same non-zero eigenvalues.
We consider an undirected network of $\nu$ nodes and $\mu$ edges, with a $\nu \times \nu$ adjacency matrix $A$ and its associated $\nu \times 2\mu$ incidence matrix $K$, listing each edge of the network in two consecutive columns of $K$ with a $(+1, -1)$ entry and a $(-1, +1)$ entry for the two extremities of the edge; which extremity receives $(+1, -1)$ and which receives $(-1, +1)$ is arbitrary. We decompose the incidence matrix into the difference of two binary matrices, $K = K_{in} - K_{out}$.
First, it is direct that the adjacency matrix $A$ and the adjacency matrix $G$ of its associated line graph share the same non-zero eigenvalues, since they are the commuted products of the same two rectangular matrices of in and out incidence:

$$ A = K_{out} K_{in}^T, \qquad G = K_{in}^T K_{out}. $$

As a side note, $G$ has at least $2\mu - \nu$ zero eigenvalues, where $\nu$ and $\mu$ are respectively the number of nodes and edges of the network. The transition matrices are obtained by normalizing the adjacency matrices by the degree of the nodes:

$$ T_s = D_A^{-1} A, \qquad G_s = D_G^{-1} G, $$

where $D_A$ and $D_G$ are the diagonal matrices of degrees of $A$ and $G$ respectively, and verify

$$ D_G K_{in}^T = K_{in}^T D_A. $$

Let $\lambda$ be a non-zero eigenvalue of $T_s$, and $v$ an associated eigenvector.
Then, $w_R = K_{in}^T v$ is a right eigenvector of $G_s$ associated to $\lambda$. Indeed:

$$ G_s w_R = D_G^{-1} K_{in}^T K_{out} K_{in}^T v = K_{in}^T D_A^{-1} A\, v = K_{in}^T T_s v = \lambda\, w_R. $$

Similarly, the left eigenvector of $G_s$ associated to $\lambda$ is given by $w_L = K_{out}^T v$.
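The statement of this appendix can be checked numerically on a small example. The sketch below assumes NumPy is available and uses binary in and out incidence matrices in which each undirected edge contributes two directed columns, with the adjacency matrix obtained as $K_{out} K_{in}^T$ and the line graph as $K_{in}^T K_{out}$; this ordering of the product, the function names and the toy graph are assumptions made for the illustration.

```python
import numpy as np


def incidence_matrices(n_nodes, undirected_edges):
    """Binary in/out incidence matrices; each undirected edge yields
    two consecutive directed columns (i -> j and j -> i)."""
    directed = []
    for i, j in undirected_edges:
        directed += [(i, j), (j, i)]
    K_in = np.zeros((n_nodes, len(directed)))
    K_out = np.zeros((n_nodes, len(directed)))
    for col, (tail, head) in enumerate(directed):
        K_out[tail, col] = 1.0  # node the directed edge leaves
        K_in[head, col] = 1.0   # node the directed edge enters
    return K_in, K_out


def transition_matrices(n_nodes, undirected_edges):
    """Return (T_s, G_s) for the network and for its line graph."""
    K_in, K_out = incidence_matrices(n_nodes, undirected_edges)
    A = K_out @ K_in.T   # node adjacency matrix
    G = K_in.T @ K_out   # line-graph adjacency: edge e -> e' if e enters the node e' leaves
    T_s = A / A.sum(axis=1, keepdims=True)
    G_s = G / G.sum(axis=1, keepdims=True)
    return T_s, G_s


def nonzero_eigenvalues(M, tol=1e-6):
    """Sorted real parts of the eigenvalues with modulus above tol."""
    eig = np.linalg.eigvals(M)
    return np.sort(eig[np.abs(eig) > tol].real)


if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant node 3 attached to node 2 (toy example).
    T_s, G_s = transition_matrices(4, [(0, 1), (1, 2), (2, 0), (2, 3)])
    print(nonzero_eigenvalues(T_s))
    print(nonzero_eigenvalues(G_s))
```

The two printed arrays should coincide up to numerical precision, the remaining eigenvalues of $G_s$ being zero.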

Appendix B. First Order Approximation of the Spectral Gap for Regular Network
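We provide here the development to obtain the first order approximation of the shift of the spectral gap of a regular network. The eigenvalue of the transition matrix $M_s$ is $\lambda = \lambda_0 + \lambda^*$, where $\lambda_0$ is an eigenvalue of $G_s$. The goal is to find the perturbation $\lambda^*$ on the eigenvalue. The right eigenvector $x_R$ of $M_s$ associated to $\lambda$ may also be expressed as $x_R = w_R + y_R$, where $w_R$ is a right eigenvector of $G_s$ associated to $\lambda_0$, and this leads to

$$ M_s x_R = (G_s + P)(w_R + y_R) = G_s w_R + G_s y_R + P w_R + P y_R, \tag{A1} $$

$$ \lambda x_R = (\lambda_0 + \lambda^*)(w_R + y_R) = \lambda_0 w_R + \lambda_0 y_R + \lambda^* w_R + \lambda^* y_R. \tag{A2} $$

By definition of an eigenvector, (A1) and (A2) are equal. Keeping the terms of first order in $\epsilon$ yields

$$ G_s y_R + P w_R = \lambda_0 y_R + \lambda^* w_R. \tag{A3} $$

Isolating $\lambda^*$ in (A3), by left-multiplying with $w_L^T$ and using $w_L^T G_s = \lambda_0 w_L^T$, leads to Equations (7) and (8):

$$ \lambda^* = \frac{w_L^T P\, w_R}{w_L^T w_R} = \frac{1}{\lambda_0}\, \frac{v_L^T K_{out} P K_{in}^T v_R}{v_L^T D_A v_R}. $$

As discussed, $\lambda^*$ takes the sign of $\lambda_0$, so under positive backtracking bias $\epsilon > 0$, the eigenvalues of the transition matrix are shifted away from 0, resulting in a decrease of the spectral gap.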

Figure 2. Probability of backtracking on an edge for various distributions: Power-law, Gamma (standard deviation σ) and Exponential. Monte-Carlo simulations (circle) and theoretical curves obtained with (5) (solid line) coincide. For a given family, the higher the variance $\sigma^2$, the higher the probability of backtracking. For power-law distributions with small exponent, the backtracking probability remains large and decreases slowly as the degree increases.

Figure 3. Probability of backtracking on an edge for real data. Each edge is governed by an i.i.d. renewal process. All inter-activation times on all edges have been aggregated into a unique global distribution $f(t)$. The probability of backtracking is much larger than $1/d$ (corresponding to the Markovian case), even under destruction of the correlations between edge activities. Standard deviations σ of the empirical inter-activation distributions $f(t)$ are given in parentheses. For each dataset, the probabilities have been computed up to the largest node degree in the corresponding network.

Figure 4. Relative error of the linear approximation (9) in a network composed of 2 communities (2 cliques of 50 nodes of degree 50).

Table 1. Shift of the spectral gap of the transition matrix due to the backtracking bias induced by the network temporality. The spectral gap is largely reduced, showing the strong impact of the inter-activity distribution on the number of steps required to explore the network.