1. Introduction
Graphs, also referred to as networks, provide a natural mathematical framework to represent a wide variety of complex systems arising in molecular, ecological, technological, and social contexts [1]. In this representation, the set of vertices $V$ typically corresponds to the entities of the system, while the set of edges $E$ encodes their interactions. A fundamental mechanism governing the transport of mass, energy, or information across such networks is diffusion [2,3]. However, due to the complexity of the environments in which many real-world networks are embedded, transport processes often deviate from classical diffusive behavior [4,5,6,7].
Subdiffusion, characterized by a slower-than-linear growth of the mean squared displacement, is especially prevalent in complex systems. For instance, subdiffusive dynamics are ubiquitous in the crowded interior of biological cells [8,9,10,11], where millions of macromolecules interact, forming intricate networks of biochemical processes. Similarly, crowding effects caused by vehicle density and driving fluctuations in urban transportation systems have been shown to induce subdiffusive traffic states [12]. Subdiffusion has also been observed in information transmission processes on online social networks, such as Twitter (now X) and Digg [13].
A standard mathematical approach to model such a crowded and heterogeneous system is to replace the classical time derivative in the diffusion equation with a fractional-time derivative [14,15]. When anomalous diffusion takes place on a graph $G=(V,E)$, this leads to the following fractional diffusion equation:

$$ {}^{C}D_t^{\alpha}u(t) = -\kappa\,L\,u(t), \qquad u(0)=u_0, \tag{1} $$

where $\kappa>0$ denotes the diffusivity, ${}^{C}D_t^{\alpha}$ is the Caputo time-fractional derivative, and $L$ is the graph Laplacian operator acting on functions defined on the vertex set. Specifically, denoting by $\mathbb{C}^{V}$ the set of all complex-valued functions on $V$, $L$ is the linear mapping from $\mathbb{C}^{V}$ into itself given by

$$ (Lf)(v) = \sum_{w:\,\{v,w\}\in E}\big(f(v)-f(w)\big), \qquad v\in V. $$

The operator ${}^{C}D_t^{\alpha}$ denotes the Caputo time-fractional derivative of order $\alpha\in(0,1)$ [16], defined by

$$ {}^{C}D_t^{\alpha}u(t) = \frac{1}{\Gamma(1-\alpha)}\int_0^t \frac{u'(s)}{(t-s)^{\alpha}}\,ds, $$

where $u'(s)$ is the first derivative of $u$ evaluated at time $s$. Despite their broad relevance in continuous settings, time-fractional diffusion models have been only sparsely explored on graphs and networks. To date, applications have been largely restricted to epidemiological modeling [17,18] and fractional diffusion on the human proteome, proposed as an alternative framework to account for the multi-organ damage associated with SARS-CoV-2 infection [19]. An exception is the use of (1) in an engineering context for achieving consensus in autonomous systems, where it is referred to as fractional-order consensus [20,21,22,23,24,25]. A rapidly growing line of research explores the integration of fractional derivatives into learning algorithms, leading to the development of fractional-order neural networks and related architectures, with emerging applications in machine learning and artificial intelligence [26,27].
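As a numerical illustration of the Caputo derivative defined above, the following minimal Python sketch uses the standard L1 scheme, which replaces $u$ by its piecewise-linear interpolant on a uniform grid. The scheme and the helper name `caputo_l1` are our own illustrative choices, not taken from the original text. For $u(t)=t$ the scheme is exact and returns $t^{1-\alpha}/\Gamma(2-\alpha)$. For smooth $u$ its error is of order $h^{2-\alpha}$.

```python
import math

def caputo_l1(u_values, h, alpha):
    """L1 approximation of the Caputo derivative of order alpha in (0, 1)
    at the last grid point t_n = n*h, given samples u(t_0), ..., u(t_n)
    on a uniform grid with step h (piecewise-linear interpolation of u)."""
    n = len(u_values) - 1
    total = 0.0
    for k in range(n):
        # weight b_k = (k+1)^(1-alpha) - k^(1-alpha) multiplies the increment
        # of u on the subinterval [t_{n-k-1}, t_{n-k}]
        b_k = (k + 1) ** (1 - alpha) - k ** (1 - alpha)
        total += b_k * (u_values[n - k] - u_values[n - k - 1])
    return total / (math.gamma(2 - alpha) * h ** alpha)

# Usage: Caputo derivative of u(t) = t^2 at t = 1 for alpha = 0.5
alpha, n = 0.5, 2000
h = 1.0 / n
samples = [(j * h) ** 2 for j in range(n + 1)]
approx = caputo_l1(samples, h, alpha)
exact = 2.0 / math.gamma(3 - alpha)  # D^alpha t^2 = 2 t^(2-alpha) / Gamma(3-alpha)
print(approx, exact)
```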
Apart from crowding and excluded-volume effects, which reduce mobility by limiting the available space for motion, several additional mechanisms can give rise to subdiffusion in complex systems. This happens, for instance, in ecological networks, where structural disorder and heterogeneity (manifested through irregular geometries, bottlenecks, and hierarchical or fractal structures) constrain transport pathways and significantly slow down spatial exploration [28]. These features often induce trapping events and lead to broad distributions of residence times. Another characteristic of complex systems is the existence of memory. Temporal memory effects have been observed, for instance, in transcription regulator–DNA interactions in live bacterial cells [29], in addition to the extensive evidence accumulated from single-cell experiments. Since memory refers to the phenomenon whereby past events influence a system's current and future states or behaviors, the evolution of such complex systems at a given time depends on their entire past history rather than solely on their instantaneous state.
In such non-Markovian settings, classical diffusion equations fail to provide an adequate description. Instead, diffusion models with memory kernels naturally arise, capturing the nonlocal-in-time response induced by trapping, heterogeneity, and temporal correlations [30,31,32,33,34,35]. Once again, in the case of graphs, this leads to the generalized diffusion equation

$$ \frac{du(t)}{dt} = -\int_0^t K(t-s)\,L\,u(s)\,ds, $$

where $K$ denotes a memory kernel. Such formulations provide a unifying framework for describing subdiffusive transport and establishing a direct connection between microscopic mechanisms and macroscopic anomalous diffusion. Related non-Markovian formulations on networks arise naturally when vertex-to-vertex motion is modeled as a non-Poisson continuous-time random walk. In this setting, the node-state evolution is governed by a generalized master equation in which the graph Laplacian (or transition operator) appears convolved in time with a kernel determined by the waiting-time distribution, and memory effects are known to substantially alter diffusion and mixing properties on (temporal) networks [36,37,38].
While time-fractional diffusion equations and memory-kernel formulations are widely used to model subdiffusive transport in continuous settings, their role in the context of networks and graph-based diffusion has so far been explored mainly at a phenomenological level. Existing studies typically emphasize anomalous scaling, spectral properties, or long-time relaxation, but often treat memory as a uniform slowing-down mechanism acting on otherwise classical graph diffusion [39,40,41]. As a result, comparatively little is known about how non-Markovian effects interact with the discrete geometry of a graph, how they influence transport pathways, or how they modify vertex-level and path-level behavior beyond global decay rates.
The present work is motivated by the need to clarify how subdiffusion reshapes diffusion on graphs at the structural level. By exploiting the subordination principle [42] and a mass-preserving sum-of-exponentials representation of the Mittag-Leffler operator, we connect fractional diffusion on graphs to a superposition of classical heat processes acting at different internal times. This perspective allows us to study, in a unified way, memory effects, vertex-dependent waiting times, effective distances, and path selection in subdiffusive dynamics. In doing so, we show that memory does not merely slow down diffusion uniformly, but induces heterogeneous temporal behavior across vertices and leads to well-defined subdiffusive distances and paths that differ from their classical diffusive counterparts. These results provide a concrete link between fractional models, memory kernels, and graph-based transport, and offer a more detailed understanding of what subdiffusion means when the underlying space is a network.
1.1. Subdiffusion: Physical Mechanisms and Mathematical Models
A central hallmark of anomalous diffusion is the deviation of the mean squared displacement (MSD) from the linear-in-time growth predicted by Fickian diffusion. Instead, one observes a power-law scaling [5,6,43]

$$ \langle \Delta x^2(t)\rangle \propto t^{\alpha}, \qquad 0<\alpha<1, $$

which defines subdiffusive behavior. As emphasized in [6], such subdiffusion is not a single phenomenon but rather a collective outcome of different physical mechanisms, each leading to distinct stochastic and mathematical descriptions.
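In practice, the exponent in this power law is usually estimated as the slope of the MSD in log–log coordinates. Below is a minimal Python sketch, assuming noise-free synthetic data. The helper name `fit_msd_exponent` and the data are hypothetical and only illustrate the arithmetic. A slope below 1 signals subdiffusion, and a slope of 1 signals Fickian diffusion.

```python
import math

def fit_msd_exponent(times, msd):
    """Least-squares slope of log(MSD) versus log(t); a slope below 1
    indicates subdiffusion, a slope of 1 indicates normal (Fickian) diffusion."""
    xs = [math.log(t) for t in times]
    ys = [math.log(m) for m in msd]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    return cov / var

# Usage with hypothetical, noise-free data MSD(t) = 2 * t^0.6
times = [10 ** (k / 4) for k in range(13)]
msd = [2.0 * t ** 0.6 for t in times]
print(round(fit_msd_exponent(times, msd), 6))  # 0.6
```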
One prominent mechanism underlying subdiffusion is trapping [6,14,15]. In crowded or energetically disordered environments, particles may experience long waiting times between successive displacements due to transient binding or deep potential wells. This situation is naturally described by the continuous-time random walk (CTRW) framework, where the waiting times between steps are independent random variables drawn from a heavy-tailed distribution

$$ \psi(\tau) \sim \tau^{-1-\alpha}, \qquad 0<\alpha<1, $$

implying a diverging mean waiting time.
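The heavy tail is easy to inspect numerically. The sketch below uses a hypothetical helper, not part of the original text, that draws Pareto-distributed waiting times with tail exponent $\alpha$ by inverse-transform sampling. The Pareto law is only one convenient choice among the many distributions that share the tail $\psi(\tau)\sim\tau^{-1-\alpha}$. The scale `tau0` and the seed are assumptions made for reproducibility.

```python
import random

def sample_waiting_times(alpha, n, tau0=1.0, seed=0):
    """Draw n waiting times from a Pareto law with survival function
    P(tau > x) = (x / tau0)^(-alpha) for x >= tau0, so that the density
    decays like tau^(-1-alpha); for alpha < 1 the mean is infinite."""
    rng = random.Random(seed)
    # inverse-transform sampling: tau = tau0 * U^(-1/alpha) with U in (0, 1]
    return [tau0 * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]

taus = sample_waiting_times(alpha=0.5, n=200_000)
# The empirical survival at x = 100 should be close to 100^(-0.5) = 0.1
print(sum(t > 100.0 for t in taus) / len(taus))
```

Because the mean is infinite for $\alpha<1$, the running sample mean of `taus` keeps drifting upward as more samples are drawn instead of settling.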
In the scaling limit, CTRW dynamics lead to a fractional diffusion equation (FDE) for the probability density $P(x,t)$. Sokolov [6] stresses that the use of such fractional equations is physically justified only when the underlying trapping assumptions are valid. CTRW models generally exhibit aging, weak ergodicity breaking, and large trajectory-to-trajectory fluctuations.
Another class of subdiffusive systems arises from transport in labyrinthine or fractal structures, such as percolation clusters or tortuous channel networks [6,44,45,46]. In these systems, anomalous diffusion originates from geometric constraints rather than trapping times. The particle explores a space with no translational invariance and a broad distribution of path lengths.
A paradigmatic example is diffusion on critical percolation clusters or related fractal media, for which the MSD again follows a subdiffusive power law. While fractional diffusion equations may reproduce the probability density in unbounded domains, Sokolov [6] emphasizes that they generally fail to capture important properties such as first-passage times or confined-domain behavior. Consequently, geometric subdiffusion and trapping-induced subdiffusion can yield identical PDFs while corresponding to fundamentally different physical processes.
Subdiffusion may also emerge in viscoelastic environments [47,48,49], where the tagged particle is embedded in a complex interacting medium, such as a polymer network or the cytoskeleton. In this case, the dynamics are governed not by trapping or geometry but by long-range temporal correlations in the particle's motion.
These systems are often described by fractional Brownian motion (fBm), a Gaussian process characterized by correlated increments, or equivalently by generalized Langevin equations (GLEs) with memory kernels,

$$ m\,\ddot{x}(t) = -\int_0^t \gamma(t-s)\,\dot{x}(s)\,ds + \eta(t), $$

where the friction kernel typically follows a power law $\gamma(t)\propto t^{-\alpha}$, and $\eta(t)$ is a correlated noise term. Depending on the noise properties, the resulting MSD scales subdiffusively. Unlike CTRW-based models, fBm and GLE dynamics are ergodic and do not exhibit aging, a distinction that is crucial for interpreting experimental data.
Another way of modeling subdiffusion is via models based on diffusion equations with time-dependent diffusion coefficients [50,51],

$$ \frac{\partial P(x,t)}{\partial t} = \mathcal{D}(t)\,\frac{\partial^2 P(x,t)}{\partial x^2}, \qquad \mathcal{D}(t)\propto t^{\alpha-1}. $$

Although such models reproduce the same MSD scaling as subdiffusive processes, they are primarily phenomenological fitting tools (see [6]). Despite yielding the same probability density as fractional Brownian motion, they lack its correlation structure and are more closely related to mean-field descriptions of CTRW dynamics.
In closing, real systems often exhibit subdiffusion of mixed origin, combining trapping, geometric constraints, and viscoelastic effects. In such cases, different models may predict the same MSD or PDF while differing fundamentally in their aging properties, ergodicity, and trajectory statistics [6].
We emphasize that the analytical ingredients underlying fractional diffusion—such as the subordination identity and the complete monotonicity of the Mittag-Leffler function—are classical results. The novelty of the present work lies not in these foundational properties themselves, but in their synthesis and interpretation in the context of graph-based diffusion, leading to new geometric, operator-theoretic, and vertex-level insights.
1.2. Contributions of the Paper
We employ the subordination relation (Section 2) to construct a sum-of-exponentials (SOE) approximation for the Mittag-Leffler function. While the scalar counterpart is well understood, we work with a matrix-valued approximation, tailored to studying the time evolution governed by the graph Laplacian. Heuristically, we interpret the SOE as a superposition of diffusion-like processes across multiple timescales. As noted above, the subordination identity and the complete monotonicity of the Mittag-Leffler function are well-established results in fractional calculus; here they serve as the foundation for a new graph-based framework. Our contribution is to develop a structural and dynamical synthesis of fractional diffusion on graphs that combines operator representations, geometric constructions, and vertex-level interpretations of memory.
In Section 3, we introduce the mathematical framework of the sum-of-exponentials structure and derive appropriate tail bounds that guarantee the accuracy of the approximation as the number of terms grows to infinity (Theorem 1 and Section 3.2). In Section 4, we describe the error metrics used to assess the accuracy of the SOE approximation and discuss the influence of the fractional parameter $\alpha$ on the contributions of the different diffusion-like processes.
In Section 5, we present new metrizations of the graph, which assign a weight to each edge based on the diffusive, subdiffusive, or SOE-based behavior, respectively. We then introduce our main experimental tool, the shortest paths in these new metrics on the graph. We prove that in the small-time limit, the subdiffusive shortest paths coincide exactly with the shortest paths in the original metric (Theorems 2 and 3), with a preference for traversing high-degree regions (Theorem 4).
Our numerical experiments show that the shortest paths in the diffusive and subdiffusive metrics exhibit completely different behaviors: while the diffusive shortest paths tend to cover broader regions of the graph for different time values, the subdiffusive shortest paths preserve path history even over extended timescales. The shortest paths in the SOE metric function as a bridge between these two phenomena. We propose a memory interpretation based on the influence of previous time instants from the diffusive point of view, and on remembering the most efficient paths at smaller values of t from the metrization point of view, revealing memory-assisted navigation on a network.
Next, we study how memory affects the dynamics on a graph from a vertex-based perspective. In Section 6, we prove that a vertex may receive larger contributions from the more remote or more recent past, depending on the local temporal curvature of the solution to (1). This happens for positive temporal curvature for any $\alpha\in(0,1)$, and for negative temporal curvature in the limiting regime identified in Theorem 5. In Section 7, we provide examples of how different remote or recent past biases may occur at the same time for different vertices: early-time convexity at sources and concavity at their neighbors. Furthermore, we discuss how algebraic (rather than exponential) relaxation and degree-dependent waiting-time effects fundamentally alter transport, trapping, and path selection on networks.
Finally, in Section 8, we show how fractional dynamics can be interpreted as arising from a singular limit of multi-rate diffusion with memory, in an appropriate asymptotic sense. Time-fractional equations can be viewed as scale-free limits of a finite superposition of diffusions in the Laplace domain, equivalently describable via operator-valued Volterra memory kernels (Propositions 7 and 8). This provides a common architecture linking SOE approximations, memory equations, and fractional calculus.
In summary, the genuinely new contribution of this work lies in the graph-based synthesis of fractional diffusion, including (i) a mass-preserving, operator-level sum-of-exponentials (SOE) construction for the Mittag-Leffler propagator on graphs; (ii) the introduction of subdiffusive distances and shortest paths that define a new geometry on networks; and (iii) a vertex-resolved interpretation of memory effects, revealing heterogeneous, degree-dependent temporal behavior. These elements provide a unified perspective linking fractional dynamics, classical diffusion processes, and graph structure.
3. Sum of Exponentials Approximation
In this section, we present a practical, mass-preserving approximation of the Mittag-Leffler function as a sum of exponential functions from a mathematical point of view. We start with the following fundamental definition.
Definition 1. The sum-of-exponentials (SOE) approximation is

$$ S_J(t) = \sum_{j=1}^{J} w_j\,e^{-\theta_j t^{\alpha} L}, $$

with coefficients $w_j$ and nodes $\theta_j$ that depend only on $\alpha$ and not on $t$. Our aim is to obtain

$$ S_J(t) \approx E_\alpha(-t^{\alpha}L). $$

This approximation is constructed as a quadrature of (11) whose window is logarithmically scaled in $\theta$. In the rest of the section, we analyze the mathematical properties of the SOE approximation. Specifically, we do the following:
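1. Provide a log–trapezoidal construction with $w_j>0$ and $\sum_{j=1}^{J} w_j = 1$ (mass conservation);
2. Prove a uniform quadrature error bound of order $O(e^{-2\pi d/h})$ in the grid step $h$, ensuring geometric convergence in $J$ for a fixed window;
3. Derive explicit, uniform tail bounds from the M–Wright density $M_\alpha$ to select the window $[\theta_{\min},\theta_{\max}]$;
4. Tabulate ready-to-use window endpoints $\theta_{\min}$ (and $\theta_{\max}$) for typical values of $\alpha$ (see Appendix A);
5. Provide practical error metrics (relative/probe, mass error) for a posteriori assessment.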
The SOE representation is obtained as a quadrature approximation of the subordination integral and should be understood primarily at the operator level. In particular, the coefficients arise from a discretization of the integral over the M–Wright density and do not, by themselves, define a stochastic mixture or a dynamical decomposition of independent processes.
Nevertheless, the representation may be viewed heuristically as a superposition of diffusion-like operators acting at rescaled internal times. This interpretation serves as an intuitive guide for understanding multi-scale behavior, but it is not intended as a literal probabilistic construction unless additional structure is imposed.
3.1. SOE via Log–Trapezoidal Quadrature
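Set $\theta = e^{s}$, $d\theta = e^{s}\,ds$, and truncate to $s\in[a,b]$. With a uniform grid $s_j = a + (j-1)h$, $j=1,\dots,J$, $h = (b-a)/(J-1)$, define

$$ \theta_j = e^{s_j}, \qquad \tilde w_j = c_j\,h\,\theta_j\,M_\alpha(\theta_j), \quad c_1 = c_J = \tfrac12,\ c_j = 1 \text{ otherwise}, \qquad w_j = \frac{\tilde w_j}{\sum_{i=1}^{J}\tilde w_i}. $$

Then

$$ E_\alpha(-t^{\alpha}L) \approx S_J(t) = \sum_{j=1}^{J} w_j\,e^{-\theta_j t^{\alpha}L}, \qquad \sum_{j=1}^{J} w_j = 1. $$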
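The log–trapezoidal construction can be checked in a few lines of Python. The sketch below assumes $\alpha = 1/2$, for which both the M–Wright density, $M_{1/2}(\theta) = e^{-\theta^2/4}/\sqrt{\pi}$, and the Mittag-Leffler function, $E_{1/2}(-x) = e^{x^2}\operatorname{erfc}(x)$, have closed forms. The trapezoidal endpoint factors, the normalization step, the window $[10^{-8}, 12]$, $J = 200$, and all function names are illustrative assumptions rather than the authors' implementation. The scalar argument $x$ plays the role of $t^{\alpha}\lambda$.

```python
import math

def m_wright_half(theta):
    """M-Wright density for alpha = 1/2: exp(-theta^2 / 4) / sqrt(pi)."""
    return math.exp(-theta * theta / 4.0) / math.sqrt(math.pi)

def ml_half(x):
    """E_{1/2}(-x) = exp(x^2) * erfc(x); reliable only for moderate x."""
    return math.exp(x * x) * math.erfc(x)

def soe_log_trapezoid(density, a, b, J):
    """Nodes theta_j = exp(s_j) on a uniform grid s_j in [a, b] and weights
    h * theta_j * density(theta_j) (halved at the two endpoints), normalized
    so that they sum to one (mass conservation)."""
    h = (b - a) / (J - 1)
    thetas, raw = [], []
    for j in range(J):
        theta = math.exp(a + j * h)
        endpoint_factor = 0.5 if j in (0, J - 1) else 1.0
        thetas.append(theta)
        raw.append(endpoint_factor * h * theta * density(theta))
    total = sum(raw)
    return thetas, [w / total for w in raw]

def soe_scalar(x, thetas, weights):
    """Scalar SOE sum_j w_j exp(-theta_j x), approximating E_alpha(-x)."""
    return sum(w * math.exp(-th * x) for th, w in zip(thetas, weights))

thetas, weights = soe_log_trapezoid(m_wright_half, math.log(1e-8), math.log(12.0), 200)
for x in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(x, soe_scalar(x, thetas, weights), ml_half(x))
```

With these settings, the residual error is dominated by the truncated left tail, whose mass is about $\theta_{\min}/\sqrt{\pi}$, rather than by the quadrature itself.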
Remark 1. The nodes correspond to quadrature points in the variable θ arising from the subordination formula, and the weights approximate the probability density in an integral sense. However, the discrete representation should not be interpreted as an exact probabilistic mixture of independent diffusion processes. Rather, it is a deterministic operator approximation whose coefficients inherit their meaning from the underlying integral representation. Any interpretation in terms of multiple time scales or internal clocks is therefore heuristic and serves primarily as a conceptual aid.
Convention 1. In all that follows, we fix the nodes $\{(\theta_j, w_j)\}_{j=1}^{J}$ once (they depend only on $\alpha$ and the chosen log–window) and, for any $t\ge 0$, evaluate the SOE as $S_J(t) = \sum_{j=1}^{J} w_j\,e^{-\theta_j t^{\alpha}L}$. Thus, the coefficients do not depend on $t$ or on $L$. We then have the following results:
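Proposition 2. If $L\mathbf{1} = 0$ and $\sum_{j=1}^{J} w_j = 1$, then for all $t\ge0$ the SOE approximation preserves mass, that is, $\mathbf{1}^{\top}S_J(t)\,u_0 = \mathbf{1}^{\top}u_0$.
Proof. Since $L\mathbf{1} = 0$, $e^{-\theta_j t^{\alpha}L}\mathbf{1} = \mathbf{1}$ for each $j$. Thus, $S_J(t)\mathbf{1} = \sum_{j} w_j\mathbf{1} = \mathbf{1}$, and left-multiplying by $u_0^{\top}$ gives $u_0^{\top}S_J(t)\mathbf{1} = u_0^{\top}\mathbf{1}$. Since $S_J(t)$ is symmetric, this is exactly $\mathbf{1}^{\top}S_J(t)\,u_0 = \mathbf{1}^{\top}u_0$. □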
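Proposition 2 holds for any positive weights summing to one, whatever the nodes. The self-contained Python sketch below checks this at the operator level on a small graph. It uses a scaling-and-squaring Taylor approximation of the matrix exponential, which is our own choice for a dependency-free example. The graph, the nodes, the weights, $t$, and $\alpha$ are all hypothetical.

```python
import math

def matmul(X, Y):
    """Product of two square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def expm(M, terms=20):
    """Matrix exponential via scaling and squaring of a truncated Taylor series
    (adequate for the small, well-scaled matrices used here)."""
    n = len(M)
    norm = max(sum(abs(x) for x in row) for row in M)
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    X = [[x / 2 ** s for x in row] for row in M]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, X)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):
        result = matmul(result, result)
    return result

def laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of a simple undirected graph."""
    L = [[0.0] * n for _ in range(n)]
    for v, w in edges:
        L[v][w] -= 1.0
        L[w][v] -= 1.0
        L[v][v] += 1.0
        L[w][w] += 1.0
    return L

def soe_operator(L, t, alpha, thetas, weights):
    """S_J(t) = sum_j w_j exp(-theta_j t^alpha L)."""
    n = len(L)
    S = [[0.0] * n for _ in range(n)]
    for theta, w in zip(thetas, weights):
        E = expm([[-theta * t ** alpha * x for x in row] for row in L])
        S = [[s + w * e for s, e in zip(srow, erow)] for srow, erow in zip(S, E)]
    return S

# Hypothetical nodes and weights: any positive weights summing to one will do
thetas = [0.1, 0.5, 1.0, 3.0]
weights = [0.4, 0.3, 0.2, 0.1]
L = laplacian(5, [(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)])
S = soe_operator(L, t=2.0, alpha=0.7, thetas=thetas, weights=weights)
u0 = [1.0, 0.0, 0.0, 0.0, 0.0]
u = [sum(S[i][j] * u0[j] for j in range(5)) for i in range(5)]
print(sum(u))  # total mass, equal to 1 up to rounding
```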
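Lemma 1. For self-adjoint $L$ and bounded Borel functions $f, g$ on $\sigma(L)$,

$$ \|f(L) - g(L)\|_2 = \sup_{\lambda\in\sigma(L)}|f(\lambda) - g(\lambda)|. $$

In particular, the SOE error is

$$ \|E_\alpha(-t^{\alpha}L) - S_J(t)\|_2 = \sup_{\lambda\in\sigma(L)}|f_t(\lambda) - g_t(\lambda)|, $$

with $f_t(\lambda) = E_\alpha(-t^{\alpha}\lambda)$ and $g_t(\lambda) = \sum_{j=1}^{J} w_j\,e^{-\theta_j t^{\alpha}\lambda}$. Proof. Diagonalize $L = U\Lambda U^{\top}$ with $U$ orthogonal. Then $f(L) - g(L) = U\big(f(\Lambda) - g(\Lambda)\big)U^{\top}$ and $\|f(L) - g(L)\|_2 = \max_k |f(\lambda_k) - g(\lambda_k)|$. □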
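Assuming that the spectrum of $L$ lies in a compact interval $[0,\lambda_{\max}]$, the error of the SOE approximation on a fixed window decreases geometrically in the number of terms $J$, uniformly in $\lambda\in[0,\lambda_{\max}]$. This assumption holds provided that we have an upper bound on the eigenvalues of $L$; for the combinatorial Laplacian of a finite graph, $\lambda_{\max}\le 2k_{\max}$, where $k_{\max}$ is the maximum degree.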
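Theorem 1. Let $\lambda\in[0,\lambda_{\max}]$ with $t>0$, $\alpha\in(0,1)$. For fixed $[a,b]$ and step $h$,

$$ \Big|\int_a^b F_\lambda(s)\,ds - \sum_{j=1}^{J}\tilde w_j\,e^{-\theta_j t^{\alpha}\lambda}\Big| \le C\,e^{-2\pi d/h}, \qquad F_\lambda(s) = M_\alpha(e^{s})\,e^{s}\,e^{-e^{s}t^{\alpha}\lambda}, $$

with constants $C, d>0$ independent of $h$ and $\lambda$. Consequently,

$$ \Big\|\int_{\theta_{\min}}^{\theta_{\max}} M_\alpha(\theta)\,e^{-\theta t^{\alpha}L}\,d\theta - \sum_{j=1}^{J}\tilde w_j\,e^{-\theta_j t^{\alpha}L}\Big\|_2 \le C\,e^{-2\pi d/h}. $$

Proof (Analytic trapezoidal rule). The map $s\mapsto F_\lambda(s)$ is analytic in a strip $|\operatorname{Im}s|<d$ and decays rapidly as $|\operatorname{Re}s|\to\infty$ (exhibiting doubly exponential decay as $s\to+\infty$ and being integrable towards $s\to-\infty$). For analytic, rapidly decaying functions, the (periodized) trapezoidal rule error on a finite interval is $O(e^{-2\pi d/h})$ via Poisson summation; see (Theorem 2.1, [57]). Uniform strip bounds over $\lambda\in[0,\lambda_{\max}]$ yield uniform constants $C, d$. Finally, Lemma 1 transfers the scalar quadrature error to the operator norm bound. □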
3.2. Tail Bounds for the Subordination Integral and Window Selection
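Having determined that we can approximate the integral on $[\theta_{\min},\theta_{\max}]$, we now show how to choose the window endpoints. We start by considering the subordination identity previously defined and its scalar counterpart

$$ E_\alpha(-t^{\alpha}\lambda) = \int_0^\infty M_\alpha(\theta)\,e^{-\theta t^{\alpha}\lambda}\,d\theta $$

for $\lambda\ge0$. For window endpoints $0<\theta_{\min}<\theta_{\max}<\infty$ (equivalently, $a = \log\theta_{\min}$ and $b = \log\theta_{\max}$), we define the truncated operator

$$ E_\alpha^{[a,b]}(t) = \int_{\theta_{\min}}^{\theta_{\max}} M_\alpha(\theta)\,e^{-\theta t^{\alpha}L}\,d\theta, $$

so that the truncation error splits as the left and right tails:

$$ E_\alpha(-t^{\alpha}L) - E_\alpha^{[a,b]}(t) = \int_0^{\theta_{\min}} M_\alpha(\theta)\,e^{-\theta t^{\alpha}L}\,d\theta + \int_{\theta_{\max}}^{\infty} M_\alpha(\theta)\,e^{-\theta t^{\alpha}L}\,d\theta. $$

By the spectral theorem (Lemma 1), the operator truncation error satisfies

$$ \|E_\alpha(-t^{\alpha}L) - E_\alpha^{[a,b]}(t)\|_2 \le \sup_{\lambda\in\sigma(L)}\big(T_{\rm left}(\lambda) + T_{\rm right}(\lambda)\big), $$

where $T_{\rm left}(\lambda) = \int_0^{\theta_{\min}} M_\alpha(\theta)e^{-\theta t^{\alpha}\lambda}\,d\theta$ and $T_{\rm right}(\lambda) = \int_{\theta_{\max}}^{\infty} M_\alpha(\theta)e^{-\theta t^{\alpha}\lambda}\,d\theta$.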
3.2.1. Left Tail (Small $\theta$)
Near $\theta = 0$, the M–Wright density satisfies $M_\alpha(\theta) = \frac{1}{\Gamma(1-\alpha)} + O(\theta)$ (see [54,55]). A simple bound follows.
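Proposition 3. For all $\theta_{\min}>0$, $t\ge0$ and $\lambda\ge0$,

$$ T_{\rm left}(\lambda) \le C_\alpha\,\frac{1 - e^{-\theta_{\min}t^{\alpha}\lambda}}{t^{\alpha}\lambda}, \qquad C_\alpha := \sup_{\theta\ge0} M_\alpha(\theta) < \infty. $$

Consequently, uniformly in $\lambda\ge0$, $T_{\rm left}(\lambda)\le C_\alpha\,\theta_{\min}$. Proof. Use $M_\alpha(\theta)\le C_\alpha$ for $\theta\in[0,\theta_{\min}]$ and integrate (with the obvious limit $\theta_{\min}$ as $t^{\alpha}\lambda\to0$). □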
We remark that taking $\theta_{\min}\le\varepsilon/(2C_\alpha)$ forces the left-tail contribution below $\varepsilon/2$ for all eigenvalues (including the zero mode).
3.2.2. Right Tail (Large $\theta$)
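As $\theta\to\infty$, the density exhibits a stretched-exponential decay (Chapter 4, [55]), (Chapter 2, [54]): there exist $K_\alpha, c_\alpha>0$ and $p_\alpha\in\mathbb{R}$ such that, for all sufficiently large $\theta$,

$$ M_\alpha(\theta) \le K_\alpha\,\theta^{p_\alpha}\exp\!\big(-c_\alpha\theta^{\beta}\big), \qquad \beta = \frac{1}{1-\alpha} > 1. \tag{18} $$

This yields two complementary bounds.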
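Proposition 4. Let (18) hold for all $\theta\ge\theta_{\max}$. Then for any $t\ge0$, and $\lambda\ge0$,

$$ T_{\rm right}(\lambda) \le \frac{2K_\alpha}{c_\alpha\beta}\,\theta_{\max}^{\,p_\alpha-\beta+1}\,e^{-c_\alpha\theta_{\max}^{\beta}}. \tag{19} $$

In particular, the bound is independent of $\lambda$ (and thus controls the zero eigenvalue). Proof. Dropping the factor $e^{-\theta t^{\alpha}\lambda}\le1$, we bound the tail of a monotone stretched-exponential via the standard inequality $\int_x^\infty \theta^{p}e^{-c\theta^{\beta}}\,d\theta \le \frac{2}{c\beta}\,x^{p-\beta+1}e^{-cx^{\beta}}$, which holds for large $x$. □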
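Proposition 5. If the graph is connected and we restrict to $\mathbf{1}^{\perp}$ (thus $\lambda\ge\lambda_2>0$), then for any $\theta_{\max}>0$,

$$ T_{\rm right}(\lambda) \le e^{-\theta_{\max}t^{\alpha}\lambda_2}. \tag{20} $$

Proof. Use $e^{-\theta t^{\alpha}\lambda}\le e^{-\theta_{\max}t^{\alpha}\lambda_2}$ for $\theta\ge\theta_{\max}$ and $\int_{\theta_{\max}}^{\infty}M_\alpha(\theta)\,d\theta\le1$. □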
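We now consider the practical choices based on the previous results:
All modes (including $\lambda = 0$): Pick $\theta_{\max}$ so that $\frac{2K_\alpha}{c_\alpha\beta}\,\theta_{\max}^{\,p_\alpha-\beta+1}e^{-c_\alpha\theta_{\max}^{\beta}}\le\varepsilon/2$; then by (19), $T_{\rm right}(\lambda)\le\varepsilon/2$ for large enough $\theta_{\max}$.
Mean-zero subspace: Set $\theta_{\max} = \log(2/\varepsilon)/(t^{\alpha}\lambda_2)$ to guarantee $T_{\rm right}(\lambda)\le\varepsilon/2$ by (20).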
3.2.3. Window Rules: Left and Right Together
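Combining Propositions 3–5 yields explicit choices for $(\theta_{\min},\theta_{\max})$ that ensure the total tail remains below a target tolerance $\varepsilon$:
Corollary 1. Given $\varepsilon\in(0,1)$ and $t>0$:
General (all modes). If we choose $\theta_{\min} = \varepsilon/(2C_\alpha)$ and $\theta_{\max}$ such that $\frac{2K_\alpha}{c_\alpha\beta}\,\theta_{\max}^{\,p_\alpha-\beta+1}e^{-c_\alpha\theta_{\max}^{\beta}}\le\varepsilon/2$, then $\sup_{\lambda\ge0}\big(T_{\rm left}(\lambda)+T_{\rm right}(\lambda)\big)\le\varepsilon$.
Mean-zero subspace (connected graph). To estimate the operator on $\mathbf{1}^{\perp}$, we employ the following choice of parameters

$$ \theta_{\min} = \frac{\varepsilon}{2C_\alpha}, \qquad \theta_{\max} = \frac{\log(2/\varepsilon)}{t^{\alpha}\lambda_2}. $$

If one intends to reuse the same nodes for multiple times $t\in[t_{\min},t_{\max}]$, it is sufficient to replace $t$ with $t_{\min}$ in the expression for $\theta_{\max}$, so that the nodes are $t$-independent. Then $\sup_{\lambda\ge\lambda_2}\big(T_{\rm left}(\lambda)+T_{\rm right}(\lambda)\big)\le\varepsilon$ for all $t\ge t_{\min}$.
Remark 2. The log–trapezoidal SOE uses $s_j = a+(j-1)h$ on $[a,b]$, with $a = \log\theta_{\min}$ and $b = \log\theta_{\max}$. After fixing the window by Corollary 1, increasing $J$ then controls the discretization error inside the window with geometric rate (Theorem 1).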
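Remark 3. The choice of $\theta_{\max}$ via Proposition 5 with $\lambda_2$ replaced by $\lambda_{\max}$ yields a conservative upper bound on the tail for high-frequency modes and motivates rules of the form $\theta_{\max}\propto 1/(t^{\alpha}\lambda_{\max})$. Empirically, choosing $\theta_{\max}$ so that $e^{-\theta_{\max}t^{\alpha}\lambda_{\max}}$ falls near machine precision works well when the focus is on modes away from the zero eigenvalue; the rigorous alternative (19) controls all modes using only properties of $M_\alpha$.
3.2.4. Constants in the Stretched-Exponential Bound
Explicit asymptotics for $M_\alpha(\theta)$ as $\theta\to\infty$ yield $\beta = 1/(1-\alpha)$ and an exponent constant $c_\alpha = (1-\alpha)\,\alpha^{\alpha/(1-\alpha)}$. The prefactor has a power $\theta^{p_\alpha}$ with $p_\alpha = (\alpha-1/2)/(1-\alpha)$ and a multiplicative constant depending only on $\alpha$ (see Section 2.3, [54] and Section 4.3, [55]). Using these in (19) yields a fully explicit $\theta_{\max}$.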
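For concreteness, the constants of Section 3.2.4 can be evaluated directly. The Python sketch below uses a function name of our own. It computes $\beta$, $c_\alpha$ and $p_\alpha$ and deliberately leaves the multiplicative prefactor uncomputed. Two closed forms serve as sanity checks. For $\alpha = 1/2$, the Gaussian case $M_{1/2}(\theta) = e^{-\theta^2/4}/\sqrt{\pi}$ gives $\beta = 2$, $c = 1/4$, $p = 0$. For $\alpha = 1/3$, the Airy representation $M_{1/3}(\theta) = 3^{2/3}\operatorname{Ai}(\theta/3^{1/3})$ gives $\beta = 3/2$, $c = 2/(3\sqrt{3})$, $p = -1/4$.

```python
import math

def stretched_exp_constants(alpha):
    """Large-theta asymptotics of the M-Wright density,
    M_alpha(theta) ~ K * theta**p * exp(-c * theta**beta), with
    beta = 1/(1-alpha), c = (1-alpha) * alpha**(alpha/(1-alpha)),
    p = (alpha - 1/2)/(1-alpha); the prefactor K is not computed here."""
    beta = 1.0 / (1.0 - alpha)
    c = (1.0 - alpha) * alpha ** (alpha / (1.0 - alpha))
    p = (alpha - 0.5) / (1.0 - alpha)
    return beta, c, p

print(stretched_exp_constants(0.5))  # (2.0, 0.25, 0.0)
```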
5. Subdiffusive Distance and Paths
In this section, we introduce new geometrizations of a graph related to the diffusive processes, subdiffusive processes, and the SOE approximation of the latter. We study the shortest paths under these new metrics and observe, via numerical experiments, a distinct difference in behavior: while the diffusive case explores broader regions of the graph, the subdiffusive case recovers the usual shortest paths in the standard metric.
5.1. Subdiffusive Distance
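Let us consider the solution to the Caputo time-fractional diffusion equation

$$ {}^{C}D_t^{\alpha}u(t) = -L\,u(t), $$

with initial condition $u(0) = e_v$, where $e_v$ is the vector with a one at position $v$ and zeros elsewhere; that is, $u(t) = E_\alpha(-t^{\alpha}L)\,e_v$. We then evaluate the capacity of the whole graph to diffuse mass between the vertices $v$ and $w$ at time $t$. That is, we consider at time $t$ the difference between the mass remaining at the origin, i.e., vertex $v$, and the mass diffused to vertex $w$:

$$ \Delta_{v\to w}(t) = \big[E_\alpha(-t^{\alpha}L)\big]_{vv} - \big[E_\alpha(-t^{\alpha}L)\big]_{wv}. $$

Similarly, the capacity of the whole graph to move mass from $w$ to $v$, conditioned on all initial mass being allocated at $w$, is as follows:

$$ \Delta_{w\to v}(t) = \big[E_\alpha(-t^{\alpha}L)\big]_{ww} - \big[E_\alpha(-t^{\alpha}L)\big]_{vw}. $$

As we only consider undirected graphs here, the total capacity of mass diffusion between both vertices is as follows:

$$ \xi_{vw}(t) = \Delta_{v\to w}(t) + \Delta_{w\to v}(t) = \big[E_\alpha(-t^{\alpha}L)\big]_{vv} + \big[E_\alpha(-t^{\alpha}L)\big]_{ww} - 2\big[E_\alpha(-t^{\alpha}L)\big]_{vw}. $$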
Then, we have the following result.
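Proposition 6. Let $\xi_{vw}(t)$ be defined as before. Then $\xi_{vw}(t)$ is real and non-negative. There exists an embedding of vertices $v\mapsto x_v\in\mathbb{R}^{n}$, with $n = |V|$, such that $\xi_{vw}(t) = \|x_v - x_w\|^2$.
Proof. Using positivity, $E_\alpha(-t^{\alpha}\lambda_k)>0$ for all eigenvalues $\lambda_k\ge0$ of $L$, $k = 1,\dots,n$, we write

$$ E_\alpha(-t^{\alpha}L) = U\Lambda_t U^{\top} = \big(U\Lambda_t^{1/2}\big)\big(U\Lambda_t^{1/2}\big)^{\top}, \qquad \Lambda_t = \operatorname{diag}\big(E_\alpha(-t^{\alpha}\lambda_1),\dots,E_\alpha(-t^{\alpha}\lambda_n)\big). $$

Denote $X = U\Lambda_t^{1/2}$ and by $x_v$ the $v$-th row of $X$. Thus,

$$ \big[E_\alpha(-t^{\alpha}L)\big]_{vw} = \langle x_v, x_w\rangle. $$

Then, $v\mapsto x_v$ is an embedding of vertices in $\mathbb{R}^{n}$. Therefore

$$ \xi_{vw}(t) = \langle x_v,x_v\rangle + \langle x_w,x_w\rangle - 2\langle x_v,x_w\rangle = \|x_v - x_w\|^2, $$

so $\xi_{vw}(t)$ is a squared Euclidean distance between the vertices $v$ and $w$. □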
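The spectral construction in this proof translates directly into code. The self-contained Python sketch below is our own illustration. It diagonalizes a small Laplacian with cyclic Jacobi rotations and assembles $E_\alpha(-t^{\alpha}L)$ and $\xi_{vw}(t)$. It assumes $\alpha = 1/2$, for which $E_{1/2}(-x) = e^{x^2}\operatorname{erfc}(x)$ is available in closed form. The path graph is a hypothetical example. The accompanying checks confirm the properties of Proposition 6, namely zero diagonal, symmetry, non-negativity and the triangle inequality for $\sqrt{\xi}$. They also confirm the small-time expansion $\xi_{vw}(t)\approx 2 - t^{\alpha}(k_v+k_w+2)/\Gamma(\alpha+1)$ for adjacent vertices, which follows from the series of $E_\alpha(-t^{\alpha}L)$.

```python
import math

def jacobi_eigh(A, tol=1e-22, max_sweeps=100):
    """Eigen-decomposition of a small real symmetric matrix by cyclic Jacobi
    rotations. Returns (eigenvalues, V) with eigenvectors in the columns of V."""
    n = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                phi = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = (1.0 if phi >= 0 else -1.0) / (abs(phi) + math.sqrt(phi * phi + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J (accumulate eigenvectors)
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V

def ml_half(x):
    """E_{1/2}(-x) = exp(x^2) erfc(x); closed form, reliable for moderate x."""
    return math.exp(x * x) * math.erfc(x)

def subdiffusive_sq_distance(L, t):
    """xi_vw(t) = E_vv + E_ww - 2 E_vw with E = E_{1/2}(-t^{1/2} L),
    computed spectrally as E = V diag(E_{1/2}(-t^{1/2} lambda_k)) V^T."""
    lams, V = jacobi_eigh(L)
    n = len(L)
    f = [ml_half(math.sqrt(t) * lam) for lam in lams]
    E = [[sum(V[i][k] * f[k] * V[j][k] for k in range(n)) for j in range(n)] for i in range(n)]
    return [[E[v][v] + E[w][w] - 2.0 * E[v][w] for w in range(n)] for v in range(n)]

# Path graph 0-1-2-3 (combinatorial Laplacian L = D - A), hypothetical example
L = [[1, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 1]]
xi = subdiffusive_sq_distance([[float(x) for x in row] for row in L], t=1e-6)
print(xi[0][1])  # close to 2 - 1e-3 * (1 + 2 + 2) / Gamma(1.5)
```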
Hereafter, we call $\xi_{vw}(t)$ the (squared) subdiffusive distance between the corresponding vertices in the graph when $0<\alpha<1$. Notice that for $\alpha = 1$, the quantity $\xi_{vw}(t)$ is the diffusion distance defined by Coifman et al. [58,59]. These distances are part of a large family of diffusion-like distances on graphs [60,61].
5.2. Subdiffusive Shortest Paths
The subdiffusive distance represents how easy it is to transmit mass between two vertices, regardless of whether they are adjacent or not. However, this distance does not directly determine which pathways are utilized for this transmission. Heuristically, paths should traverse the most efficient edges, although it is not immediately obvious how to identify them. To determine these paths in both the standard and subdiffusive regimes, we define the following weighted graph.
Definition 2. Let $\Xi(t)$ be the matrix of (subdiffusive) distances, intended as the entry-wise square root of $\xi(t) = [\xi_{vw}(t)]_{v,w\in V}$. Define a weighted graph whose adjacency matrix is the Hadamard product $A\circ\Xi(t)$, which assigns to each edge the cost induced by the discrepancy of the diffusion profiles of its endpoints at time $t$. That is,

$$ \big(A\circ\Xi(t)\big)_{vw} = \begin{cases}\sqrt{\xi_{vw}(t)}, & \{v,w\}\in E,\\ 0, & \text{otherwise}.\end{cases} $$

The resulting weighted graph $G_t$ is the geometrization of the original graph $G$.
The weighted graph $G_t$ can be seen as a representation of $G$ as a one-dimensional complex, i.e., a metric length space where each edge $e = \{v,w\}$ is a compact one-dimensional manifold with boundary $\partial e = \{v,w\}$, parameterized as $[0,\sqrt{\xi_{vw}(t)}]$ (see [62,63]).
The resulting shortest paths (computed via Dijkstra's algorithm) identify chains of vertices whose subdiffusion states remain maximally coherent at time $t$. We call these paths the subdiffusive shortest paths. These are not, in general, the paths that require the fewest edges of $A$, which we call topological shortest paths or geodesic shortest paths. They generalize the shortest communicability paths introduced in [64], which were defined using a matrix function of the graph.
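Once each edge carries the weight $\sqrt{\xi_{vw}(t)}$, Dijkstra's algorithm applies directly. The following minimal, self-contained Python sketch is our own illustration built on the standard `heapq` module. The edge costs in the usage example are hypothetical numbers standing in for $\sqrt{\xi_{vw}(t)}$, not computed distances.

```python
import heapq

def symmetric_weights(edge_costs):
    """Turn {(v, w): cost} for an undirected graph into an adjacency dict."""
    adj = {}
    for (v, w), c in edge_costs.items():
        adj.setdefault(v, {})[w] = c
        adj.setdefault(w, {})[v] = c
    return adj

def dijkstra_path(weights, source, target):
    """Shortest path in a graph with non-negative edge weights, given as a dict
    {vertex: {neighbor: weight}}; returns (length, list of vertices).
    Assumes target is reachable from source."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)
        if v == target:
            break
        for w, c in weights[v].items():
            nd = d + c
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                prev[w] = v
                heapq.heappush(heap, (nd, w))
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

# Hypothetical edge costs standing in for sqrt(xi_vw(t))
graph = symmetric_weights({(0, 1): 1.0, (1, 2): 1.0, (0, 3): 1.5, (3, 2): 1.5, (0, 2): 2.5})
print(dijkstra_path(graph, 0, 2))  # (2.0, [0, 1, 2])
```

Here the route with two cheap edges beats the direct but expensive edge, which is precisely how a subdiffusive shortest path can differ from a topological one.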
Consider the subdiffusive shortest paths on the weighted graph of Definition 2 as time varies. When $t$ is close to zero, the effects of diffusion are minimal; that is, each particle has explored only a small part of the graph. In this limit, the weighted metric on the graph exhibits the same behavior as the standard metric, and topological shortest paths are recovered, as the following two results show.
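Theorem 2 (Shortest-path dominance for the fractional heat kernel as $t\to0^+$). Let $G=(V,E)$ be a simple undirected graph with combinatorial Laplacian $L = D - A$, and let $\alpha\in(0,1]$. Denote by $d(v,w)$ the graph distance between vertices $v,w\in V$. Then, for every $v\neq w$,

$$ \big[E_\alpha(-t^{\alpha}L)\big]_{vw} = O\big(t^{\alpha d(v,w)}\big), \qquad t\to0^+. $$

In particular,

$$ \lim_{t\to0^+}\frac{\big[E_\alpha(-t^{\alpha}L)\big]_{vw}}{t^{\alpha d(v,w)}} = \frac{N_{d(v,w)}(v,w)}{\Gamma(\alpha d(v,w)+1)} > 0, $$

and the leading-order contribution is determined by topological shortest paths between $v$ and $w$. Moreover, writing $d = d(v,w)$,

$$ \big[E_\alpha(-t^{\alpha}L)\big]_{vw} = \frac{N_d(v,w)}{\Gamma(\alpha d+1)}\,t^{\alpha d} + O\big(t^{\alpha(d+1)}\big), $$

where $N_d(v,w) = (A^d)_{vw}$ is the number of shortest walks of length $d$ from $v$ to $w$. Proof. The Mittag-Leffler function admits the operator series representation

$$ E_\alpha(-t^{\alpha}L) = \sum_{m=0}^{\infty}\frac{(-t^{\alpha})^m}{\Gamma(\alpha m+1)}\,L^m, $$

which converges absolutely for all $t\ge0$ since $L$ is bounded (see, e.g., [5]). Taking the $(v,w)$ entry yields

$$ \big[E_\alpha(-t^{\alpha}L)\big]_{vw} = \sum_{m=0}^{\infty}\frac{(-1)^m t^{\alpha m}}{\Gamma(\alpha m+1)}\,(L^m)_{vw}. $$

As in the classical heat-kernel case, write $L = D - A$. Any term contributing to $(L^m)_{vw}$ corresponds to a product of $m$ factors, each equal to either $D$ or $-A$. Diagonal factors $D$ do not change the vertex index, while each factor $A$ corresponds to a single edge traversal. Hence, in order for a word to contribute to $(L^m)_{vw}$, it must contain at least $d(v,w)$ factors of $A$. Consequently,

$$ (L^m)_{vw} = 0 \qquad \text{for } m<d(v,w). $$

For $m = d(v,w)$, the only contributing word is $(-A)^{d}$, since any appearance of $D$ would prevent reaching $w$ in exactly $d$ steps. Therefore,

$$ (L^d)_{vw} = (-1)^d (A^d)_{vw} = (-1)^d N_d(v,w). $$

Substituting into the series above, the first nonzero term occurs at $m = d$, where $(-1)^d\cdot(-1)^d N_d(v,w) = N_d(v,w)$, which proves the stated expansion. □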
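The combinatorial fact used in this proof can be verified mechanically on a small example: $(L^m)_{vw} = 0$ for $m<d(v,w)$ and $(L^{d})_{vw} = (-1)^{d}(A^{d})_{vw}$. The Python sketch below uses the 6-cycle, chosen only for illustration, in which exactly two shortest walks of length 3 join opposite vertices. The helper names are ours.

```python
from collections import deque

def matmul(X, Y):
    """Product of two square integer matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def matrix_powers(M, m_max):
    """Return [M^0, M^1, ..., M^m_max]."""
    n = len(M)
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    powers = [P]
    for _ in range(m_max):
        P = matmul(P, M)
        powers.append(P)
    return powers

def bfs_distance(adj, v, w):
    """Graph distance d(v, w) by breadth-first search."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist[w]

# Cycle graph C_6 (illustrative choice): vertices 0..5, edges {i, i+1 mod 6}
n = 6
adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
A = [[int(j in adj[i]) for j in range(n)] for i in range(n)]
L = [[(len(adj[i]) if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]

v, w = 0, 3
d = bfs_distance(adj, v, w)
Lp, Ap = matrix_powers(L, d), matrix_powers(A, d)
print(d, [Lp[m][v][w] for m in range(d + 1)], Ap[d][v][w])  # 3 [0, 0, 0, -2] 2
```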
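Theorem 3 (Short-time selection of shortest paths). Let $G$ be a finite connected graph and let $(X_t)_{t\ge0}$ be the time-fractional continuous-time random walk associated with the Caputo fractional diffusion equation on $G$ (with $\alpha\in(0,1)$). Let $N_t$ be the number of jumps of $X$ up to time $t$, and let $d = d(v,w)$ be the graph distance between distinct vertices $v,w\in V$. Then, for every such pair,

$$ \lim_{t\to0^+}\mathbb{P}_v\big(N_t = d \,\big|\, X_t = w\big) = 1. $$

That is, conditioned on arrival at $w$ at very short times, the process selects a topologically shortest path with probability tending to 1. Proof. Write $d = d(v,w)$. Let $(Y_n)_{n\ge0}$ be the embedded discrete-time jump chain of $X$. The fractional CTRW representation yields

$$ \mathbb{P}_v(X_t = w) = \sum_{n\ge0}\mathbb{P}(N_t = n)\,\mathbb{P}_v(Y_n = w), $$

see [65,66]. Let $P$ be the one-step transition matrix of $(Y_n)$, so that $\mathbb{P}_v(Y_n = w) = (P^n)_{vw}$. By definition of graph distance, $(P^n)_{vw} = 0$ for all $n<d$. Hence,

$$ \mathbb{P}_v(X_t = w) = \sum_{n\ge d}\mathbb{P}(N_t = n)\,(P^n)_{vw}. $$

For the time-fractional walk, the jump-count distribution satisfies the short-time scaling

$$ \mathbb{P}(N_t = n) = c_n\,t^{\alpha n}\big(1+o(1)\big), \qquad t\to0^+,\quad c_n>0, $$

see [66]. Therefore, the leading contribution comes from $n = d$, and we obtain

$$ \mathbb{P}_v(X_t = w) = c_d\,(P^d)_{vw}\,t^{\alpha d} + o\big(t^{\alpha d}\big). $$

Similarly,

$$ \mathbb{P}_v(X_t = w,\ N_t = d) = \mathbb{P}(N_t = d)\,(P^d)_{vw} = c_d\,(P^d)_{vw}\,t^{\alpha d} + o\big(t^{\alpha d}\big). $$

Dividing the two expansions and using $(P^d)_{vw}>0$ yields

$$ \mathbb{P}_v\big(N_t = d \,\big|\, X_t = w\big) = 1 + o(1), \qquad t\to0^+, $$

which proves the claim.
Finally, on the event $\{N_t = d,\ X_t = w\}$, the jump sequence has length $d$ and connects $v$ to $w$; hence, it is a topologically shortest path. □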
Remark 4 (Physical interpretation of short-time path selection). The previous results (Theorems 2 and 3) show that, for both classical and time-fractional diffusion on graphs, the short-time behavior is governed by topological constraints rather than by long-time transport mechanisms. In the fractional case, memory effects and heavy-tailed waiting times manifest themselves through the slower time scaling of transition probabilities; however, they do not alter the mechanism by which mass first propagates across the graph. At very short times, redundant or backtracking moves are suppressed by additional powers of $t^{\alpha}$, so any realization that reaches a vertex $w$ from $v$ uses, with probability tending to one, the minimal number of jumps permitted by the graph distance. Thus, topologically shortest paths dominate not because they are energetically or entropically preferred, but because they are the only routes that contribute at leading order in the short-time regime. Memory affects when such paths become observable, but not which paths contribute to the leading-order behavior. This provides a precise mathematical explanation for the observed agreement between diffusion-based distances and graph distances at very small times, even in the presence of anomalous (fractional) temporal dynamics.
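Using Theorem 2 for two adjacent vertices $v\sim w$, we can compute the first-order term of the subdiffusive distance for $t\to0^+$. Since $[E_\alpha(-t^{\alpha}L)]_{vv} = 1 - k_v t^{\alpha}/\Gamma(\alpha+1) + O(t^{2\alpha})$, where $k_v$ is the degree of $v$, and $[E_\alpha(-t^{\alpha}L)]_{vw} = t^{\alpha}/\Gamma(\alpha+1) + O(t^{2\alpha})$,

$$ \xi_{vw}(t) = 2 - \frac{t^{\alpha}}{\Gamma(\alpha+1)}\big(k_v + k_w + 2\big) + O\big(t^{2\alpha}\big). \tag{26} $$

Note that usually there are multiple topologically shortest paths between the same pair of vertices. Theorem 4 describes which of these paths coincide with the subdiffusive shortest path as $t\to0^+$. We need the following definition: for an edge $e = \{x,y\}\in E$, its edge degree is

$$ \delta_e = k_x + k_y - 2. $$

Theorem 4. Let $G=(V,E)$ be a simple, finite, undirected graph and let $v,w\in V$ be two vertices. Let $P$ be the subdiffusive shortest path according to the metric $\sqrt{\xi(t)}$ between $v$ and $w$, passing through $l$ edges. Then, in the limit $t\to0^+$, $P$ is a topological shortest path between $v$ and $w$ that maximizes the sum of the edge degrees $\sum_{e\in P}\delta_e$. Proof. The length of the edge $\{x,y\}\in P$ is the square root of the communicability distance $\xi_{xy}(t)$. In the limit $t\to0^+$, from (26) we obtain

$$ \sqrt{\xi_{xy}(t)} = \sqrt{2} - \frac{t^{\alpha}}{2\sqrt{2}\,\Gamma(\alpha+1)}\big(k_x + k_y + 2\big) + O\big(t^{2\alpha}\big). $$

Therefore, the length of $P$ is the sum of the lengths of all its constituent edges

$$ \ell(P) = \sqrt{2}\,l - \frac{t^{\alpha}}{2\sqrt{2}\,\Gamma(\alpha+1)}\sum_{\{x,y\}\in P}\big(k_x + k_y + 2\big) + O\big(t^{2\alpha}\big). $$

The first term depends only on $l$, which in the limit $t\to0^+$ is the length of (any) topologically shortest path between $v$ and $w$ (Theorems 2 and 3). Therefore, $P$ is the subdiffusive shortest path if and only if it minimizes the second term, or equivalently (since $k_x + k_y + 2 = \delta_{\{x,y\}} + 4$ and $l$ is fixed), if $P$ has the maximum sum of the edge degrees $\sum_{e\in P}\delta_e$. □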
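In the small-time limit, Theorem 4 reduces to a purely combinatorial selection rule. The Python sketch below is our own illustration on a hypothetical six-vertex graph. It enumerates all topologically shortest paths via BFS layers and keeps the one with the largest sum of $k_x + k_y$ over its edges. By the first-order expansion used in the proof, this ranking of equal-length paths is the same as ranking them by their first-order subdiffusive length. Adding a constant per edge does not change the ranking.

```python
from collections import deque

def all_shortest_paths(adj, source, target):
    """Enumerate all topologically shortest paths from source to target,
    following only edges that move one BFS layer away from the source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    paths = []

    def extend(path):
        x = path[-1]
        if x == target:
            paths.append(path)
            return
        for y in adj[x]:
            # branches that overshoot the target layer are dropped
            if dist.get(y) == dist[x] + 1 and dist[y] <= dist[target]:
                extend(path + [y])

    extend([source])
    return paths

def degree_sum(adj, path):
    """Sum of k_x + k_y over the edges {x, y} of the path."""
    return sum(len(adj[x]) + len(adj[y]) for x, y in zip(path, path[1:]))

# Hypothetical graph: vertex 1 is a hub, so the route through it should win
adj = {0: [1, 2], 1: [0, 3, 4, 5], 2: [0, 3], 3: [1, 2], 4: [1], 5: [1]}
paths = all_shortest_paths(adj, 0, 3)
best = max(paths, key=lambda p: degree_sum(adj, p))
print(paths, best)  # [[0, 1, 3], [0, 2, 3]] [0, 1, 3]
```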
5.3. Subdiffusive Shortest Paths on a Geometric Graph
We now illustrate our analytical findings numerically on an example graph. We focus on the family of large-world networks for two reasons. First, large-diameter networks make it easier to differentiate between diffusion and subdiffusion both numerically and visually. Second, in small-diameter networks, both subdiffusive and topological shortest paths comprise very few edges relative to the network's total size, making them difficult to distinguish.
Specifically, we consider a randomly generated Gabriel graph with vertices and 1156 edges. We use this graph because it is geometric, in the sense that its vertices are embedded in , which allows for clear visual interpretation, particularly when illustrating paths. Gabriel graphs are defined as follows.
Definition 3. Let $P\subset\mathbb{R}^{d}$ be a finite set of points, referred to as generators. The Gabriel graph associated with $P$ has vertex set $P$. Two distinct vertices $p,q\in P$ are connected by an edge if and only if the closed ball having the segment $\overline{pq}$ as its diameter contains no other points of $P$.
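Definition 3 can be implemented by brute force. A point $r$ lies in the closed ball with diameter $\overline{pq}$ exactly when $|p-r|^2 + |q-r|^2 \le |p-q|^2$, that is, when the angle at $r$ is at least a right angle. The Python sketch below is an illustration of ours rather than the generator used for the experiments. It runs in $O(n^3)$ time, which is adequate for small point sets.

```python
from itertools import combinations

def gabriel_graph(points):
    """Edges (i, j) such that the closed ball with diameter [p_i, p_j] contains
    no other point; equivalently |p_i - p_k|^2 + |p_j - p_k|^2 > |p_i - p_j|^2
    for every other point p_k (brute force, O(n^3))."""
    def sq(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    edges = []
    for i, j in combinations(range(len(points)), 2):
        dij = sq(points[i], points[j])
        if all(sq(points[i], points[k]) + sq(points[j], points[k]) > dij
               for k in range(len(points)) if k not in (i, j)):
            edges.append((i, j))
    return edges

# Unit square: each diagonal's diametral ball has the other corners on its
# boundary, so the closed-ball rule excludes both diagonals
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(gabriel_graph(square))  # [(0, 1), (0, 3), (1, 2), (2, 3)]
```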
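In this work, we restrict our attention to the planar case $d = 2$ and embed the points in a rectangle with a length-to-width ratio of 2:1. We geometrize the Gabriel graph using both the subdiffusive communicability distance $\xi_{vw}(t)$ and its sum-of-exponentials (SOE) approximation. Specifically, we use

$$ S_J(t) = \sum_{j=1}^{J} w_j\,e^{-\theta_j t^{\alpha}L} $$

and introduce the corresponding approximate squared distance

$$ \xi^{\rm SOE}_{vw}(t) = \big[S_J(t)\big]_{vv} + \big[S_J(t)\big]_{ww} - 2\big[S_J(t)\big]_{vw}. $$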
Throughout this section, we study a low-memory regime with . Using the square roots of both and , we construct the weighted graph as in Definition 2 and compute the subdiffusive shortest paths.
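For readers who want to experiment, the sketch below builds one possible SOE operator and the associated approximate squared distance. It does not reproduce the nodes and weights behind the SOE used in the experiments.

As a labeled assumption, it discretizes the subordination identity $E_\alpha(-x) = \int_0^\infty M_\alpha(r)\,e^{-xr}\,\mathrm{d}r$, where $M_\alpha$ is the M-Wright (Mainardi) function. It does so only for the special case $\alpha = 1/2$. In that case $M_{1/2}(r) = e^{-r^2/4}/\sqrt{\pi}$, and $E_{1/2}(-x) = e^{x^2}\operatorname{erfc}(x)$ is available in closed form for checking.

A $J$-point Gauss–Legendre rule on a truncated interval gives positive weights. Renormalizing them to sum to one makes the approximation exactly mass-preserving, because $\mathbf{1}^\top e^{-\tau L} = \mathbf{1}^\top$. The diffusivity `kappa`, the cutoff `R`, and all function names are hypothetical.

```python
import math

import numpy as np

ALPHA = 0.5  # assumption: alpha = 1/2, where the M-Wright density is Gaussian


def m_wright_half(r):
    """M-Wright function M_{1/2}(r) = exp(-r^2/4)/sqrt(pi), a probability density on r >= 0."""
    return np.exp(-(r ** 2) / 4.0) / math.sqrt(math.pi)


def soe_nodes(J, R=12.0):
    """Hypothetical SOE nodes r_j and weights w_j: a J-point Gauss-Legendre rule on [0, R]
    for E_{1/2}(-x) = int_0^inf M_{1/2}(r) exp(-x r) dr. The weights are renormalised to
    sum to one, so the resulting operator conserves mass exactly."""
    z, g = np.polynomial.legendre.leggauss(J)
    r = 0.5 * R * (z + 1.0)
    w = 0.5 * R * g * m_wright_half(r)
    return r, w / w.sum()


def soe_operator(L, t, J, kappa=1.0):
    """sum_j w_j exp(-tau_j L) with effective times tau_j = r_j * kappa * t**ALPHA,
    approximating E_{1/2}(-kappa t^{1/2} L) for a symmetric Laplacian L."""
    r, w = soe_nodes(J)
    lam, U = np.linalg.eigh(L)
    tau = r * kappa * t ** ALPHA
    g = w @ np.exp(-np.outer(tau, lam))  # scalar SOE evaluated at each eigenvalue
    return (U * g) @ U.T


def squared_distance(F):
    """Matrix of F_uu + F_vv - 2 F_uv for all vertex pairs (u, v)."""
    d = np.diag(F)
    return d[:, None] + d[None, :] - 2.0 * F


# Scalar check against the closed form E_{1/2}(-x) = exp(x^2) erfc(x) at x = 1.
for J in (2, 5, 20):
    r, w = soe_nodes(J)
    err = abs(w @ np.exp(-r) - math.exp(1.0) * math.erfc(1.0))
    print(J, f"{err:.1e}")  # the error drops as J grows

# Mass conservation on a path graph with three vertices.
L = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
u0 = np.array([1.0, 0.0, 0.0])
u = soe_operator(L, t=2.0, J=5) @ u0
print(abs(u.sum() - u0.sum()))  # of the order of machine precision
```

A fixed quadrature on $[0, R]$ is accurate only for moderate arguments $x = \kappa t^{1/2}\lambda$. For large $t$ or large eigenvalues, more nodes near $r = 0$ would be needed.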
Experimental Results
We begin by reporting the topological shortest paths (TSPs) between two vertices located near opposite corners of the Gabriel graph (see Figure 3a). As is typical, even for planar graphs such as Gabriel graphs, there exist multiple TSPs between a given pair of vertices. These TSPs are colored according to the average edge degree along the path (see Equation (27)).
We next consider the shortest paths induced by the SOE distance with $J = 1$ (see Figure 3b). This case corresponds to a standard diffusive process, since it involves a single exponential term, which is the solution of the classical diffusion equation. We observe that the diffusive paths coincide with the TSP at early times, as predicted by Theorems 2 and 3, but progressively deviate from it as time increases.
As the number of exponentials $J$ in the SOE approximation increases, the corresponding shortest paths converge toward the TSP, as illustrated in Figure 3c–e. The exact solution based on the Mittag-Leffler function is shown in Figure 3f and clearly demonstrates that the subdiffusive shortest paths coincide with the TSPs. Moreover, the limiting paths correspond to the TSPs with the largest average edge degree, in agreement with Theorem 4.
The time evolution of the vector probing errors,
and
, is displayed in
Figure 2A. The mass error is at the level of machine precision, confirming numerically that the SOE approximation conserves mass. The relative error is zero at $t = 0$ (since the two initial conditions coincide). As $t$ grows, the error increases and then remains above a certain threshold, which indicates that the two solutions
and
are meaningfully distinct.
Figure 2B illustrates how the time averages of the operator and vector probing errors decay as
J increases. Both errors initially decrease before stagnating around
and
, respectively. The mass error remains on the order of machine precision and is therefore omitted.
5.4. Physical Interpretation: Memory Reinforcement of Past States
From a mathematical point of view, increasing $J$ improves the accuracy of the SOE approximation to the Mittag-Leffler function, which leads to the convergence of the corresponding shortest paths. To interpret this convergence physically, recall that the SOE represents a superposition of diffusive processes. Let
. Then we can rewrite the SOE operator as
where the effective time instants are given by
.
In the example graph, the maximum diffusion speed is
(see
Table 1). Each term
therefore represents a diffusion process with speed
, evaluated at a slowed time
. Consequently, subdiffusion can be interpreted not only as a superposition of diffusion processes with different speeds, but also as the sampling of a single diffusion process at multiple past time instants.
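A worked form of this interpretation can be given under an explicit assumption. Suppose the SOE is read as a quadrature of the subordination identity for the Mittag-Leffler function; this is a standard result, not necessarily the construction behind Equation (31). Then the effective times can be tied to a probability density, and their average can be computed in closed form.

In the block below, $M_\alpha$ denotes the M-Wright (Mainardi) function, a probability density on $[0,\infty)$ with mean $1/\Gamma(1+\alpha)$, and $\kappa$ denotes the diffusivity. The quadrature nodes $r_j$ and positive weights $w_j$ are hypothetical.

```latex
\begin{aligned}
E_\alpha(-\kappa t^\alpha L) &= \int_0^\infty M_\alpha(r)\, e^{-r\kappa t^\alpha L}\,\mathrm{d}r
  \;\approx\; \sum_{j=1}^{J} w_j\, e^{-\tau_j L},
  \qquad \tau_j = r_j\,\kappa\,t^\alpha,\quad \sum_{j=1}^{J} w_j = 1,\\[4pt]
\bar{\tau}(t) &= \sum_{j=1}^{J} w_j\,\tau_j \;\approx\; \kappa t^\alpha \int_0^\infty r\,M_\alpha(r)\,\mathrm{d}r
  = \frac{\kappa\, t^\alpha}{\Gamma(1+\alpha)},
\qquad
\frac{\bar{\tau}(t)}{\kappa t} = \frac{t^{\alpha-1}}{\Gamma(1+\alpha)} \xrightarrow[t\to\infty]{} 0 .
\end{aligned}
```

For $0 < \alpha < 1$, the average effective time therefore falls ever further behind the clock $\kappa t$ of a classical diffusion with the same diffusivity. At late times the superposition samples, on average, states that a classical diffusion would have visited much earlier.

For $\alpha = 1$, $M_1$ is a point mass at $r = 1$ and every $\tau_j$ equals $\kappa t$. The single heat semigroup $e^{-\kappa t L}$ is then recovered.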
This behavior reflects the intrinsic memory property of subdiffusion. Classical diffusion evolves according to its current state alone, whereas subdiffusive dynamics depend on the entire history of the process. As shown in Figure 3, increasing $J$ incorporates a larger number of past states, allowing the process to remember the shortest paths that prevail at early times. In this sense, both the Caputo fractional derivative and subdiffusion can be viewed as weighted averages over the system's past evolution.

We propose a memory-based interpretation for the convergence of the subdiffusive shortest paths to the TSP shown in Figure 3. The SOE approximation (31) at time $t$ recalls past time instants of the diffusive process. In the same way, the shortest paths in the SOE metric at time $t$ are influenced by the shortest paths at earlier effective times, which are strongly linked to the TSP as these times approach zero (Theorems 2 and 3).
The fractional order $\alpha$ controls the strength of this memory effect. As $\alpha \to 1$, the fractional diffusion equation reduces to the classical diffusion equation, and memory effects vanish. Conversely, as $\alpha$ decreases, greater weight is assigned to earlier times.

This behavior is illustrated in Figure 4, where we plot the dissimilarities of the subdiffusive shortest paths with respect to the topological shortest path for different values of $\alpha$ and $J$. The dissimilarities are computed with the Levenshtein distance (also known as edit distance), which counts the minimum number of edits (insertions, deletions, or substitutions) needed to transform the edge sequence of one path into that of the other. We observe that smaller values of $\alpha$ require fewer SOE terms to recover the Mittag-Leffler shortest paths. For instance, when
, only two distinct paths are observed for
, whereas multiple paths persist for
. This is the reason for choosing
in the experiments: it is far enough from 1 to allow memory effects, but not so small that memory completely suppresses the discovery of alternative paths.
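The dissimilarity measure can be sketched as follows. A path given as a vertex sequence is turned into its sequence of undirected edges. The classical Wagner–Fischer dynamic program then computes the Levenshtein distance between two such sequences, with unit costs for insertions, deletions, and substitutions.

The helper names are hypothetical, and the experiments may have relied on a library implementation instead.

```python
def edge_sequence(path):
    """Undirected edges of a vertex path, each stored as a sorted pair."""
    return [tuple(sorted(e)) for e in zip(path, path[1:])]


def levenshtein(a, b):
    """Minimum number of insertions, deletions, or substitutions turning sequence a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


# Two hypothetical paths that share their first and last edges.
tsp = edge_sequence([0, 1, 2, 3, 4])
sub = edge_sequence([0, 1, 5, 3, 4])
print(levenshtein(tsp, sub))  # 2: edges (1, 2), (2, 3) are replaced by (1, 5), (3, 5)
```

Working on edge sequences rather than vertex sequences means that two paths sharing a vertex but leaving it through different edges are still counted as different at that step.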
The contrast between the diffusive (Figure 3b) and subdiffusive (Figure 3f) behaviors is striking. In the absence of memory, the diffusion-based geometrization explores the graph, as evidenced by the multiple paths it produces. The onset of memory allows the process to recall the topologically shortest path and to gravitate towards it. Memory is strengthened either by lowering $\alpha$ or by considering more terms (i.e., by increasing $J$), as each term recalls a previous time instant. Stronger memory yields a smaller Levenshtein distance to the TSP, as shown in Figure 4.
6. Caputo Fractional Derivative and Memory
We have seen that memory governs the behavior of subdiffusion and of the underlying time-fractional differential equation. In this section, we analyze the memory contributions contained in the Caputo derivative, distinguishing between the influence of the remote past and that of more recent times. We show that these effects are connected to the convexity of the solution to the subdiffusion equation.
6.1. Remote Versus Recent Memory in the Caputo Fractional Derivative
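Definition 4. Consider the time-fractional Caputo derivative of a function $f$, as defined in (3). Divide the interval $[0, t]$ into two equal halves, and consider the contribution from each subinterval:
Remote past:
$$\frac{1}{\Gamma(1-\alpha)} \int_0^{t/2} (t-s)^{-\alpha} f'(s)\, \mathrm{d}s;$$
Recent past (which also includes the 'present' at time $t$):
$$\frac{1}{\Gamma(1-\alpha)} \int_{t/2}^{t} (t-s)^{-\alpha} f'(s)\, \mathrm{d}s.$$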
When $f = u_i$ is the $i$-th component of the solution $\mathbf{u}(t)$ of the fractional diffusion Equation (1), we refer to the two contributions above as the 'remote past' and 'recent past' contributions of vertex $i$, respectively.
Definition 5. For a fixed vertex $i$ and a fixed time $t > 0$, we say that vertex $i$ recalls its remote past more strongly than its recent past on $[0, t]$ if
and we say that $i$ recalls its recent past more strongly than its remote past if
A vertex may exhibit either behavior, depending on whether the derivative of its trajectory is increasing or decreasing, as shown in the following results.
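Theorem 5 (Memory Bias). Let $\mathbf{u}(t)$ be the solution of the fractional diffusion Equation (1). Fix a vertex $i \in V$ and a time $t > 0$.
If the derivative $u_i'$ is non-decreasing on $[0, t]$, then, for any $\alpha$, the recent-past contribution of $u_i$ is greater than or equal to its remote-past contribution. The inequality is strict if $\alpha > 0$, or if $\alpha = 0$ and $u_i'$ is strictly increasing on $[0, t]$. In this case, vertex $i$ recalls its recent past more strongly than its remote past on $[0, t]$.
If the derivative $u_i'$ is non-increasing on $[0, t]$, then, in the limit $\alpha \to 0$, the remote-past contribution of $u_i$ is greater than or equal to its recent-past contribution. The inequality is strict if $u_i'$ is strictly decreasing on $[0, t]$. In this case, vertex $i$ recalls its remote past more strongly than its recent past on $[0, t]$.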
Proof. The proof is purely scalar and uses only the behavior of $u_i$ on the interval $[0, t]$. The graph structure enters only through the fact that $u_i$ arises from the fractional diffusion equation.
For the first part, by assumption $u_i'$ is non-decreasing on $[0, t]$; therefore
For $\alpha > 0$, this inequality is strict, while for $\alpha = 0$ it is strict only if $u_i'$ is strictly increasing. By integrating over the interval $[0, t/2]$ and performing the change of variables $s \mapsto s + t/2$, we obtain
For the second part, in the limit $\alpha \to 0$, the multiplying kernel $(t-s)^{-\alpha}/\Gamma(1-\alpha)$ tends to 1. Thus, for $\alpha = 0$, the remote and recent past contributions are, respectively,
By the assumption that $u_i'$ is non-increasing on $[0, t]$, the integrand of the first expression is greater than or equal to that of the second; therefore, we obtain
The inequality is strict if $u_i'$ is strictly decreasing. □
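Definition 4 and Theorem 5 can be explored numerically for scalar trajectories. The sketch below is a minimal illustration under two assumptions:

- Equation (3) has the standard Caputo form $\frac{1}{\Gamma(1-\alpha)}\int_0^t (t-s)^{-\alpha} f'(s)\,\mathrm{d}s$.
- $\alpha = 0$ is admitted as the limiting case used in the proof.

The substitution $t - s = v^{1/(1-\alpha)}$ turns $(t-s)^{-\alpha}\,\mathrm{d}s$ into $\frac{1}{1-\alpha}\,\mathrm{d}v$. This removes the weak singularity at $s = t$, so each half can be integrated with a Gauss–Legendre rule. The function name and the test trajectories are hypothetical.

```python
import math

import numpy as np


def caputo_halves(df, t, alpha, n=64):
    """Remote- and recent-past parts of the Caputo derivative of f at time t (0 <= alpha < 1).

    remote = 1/Gamma(1-alpha) * int_0^{t/2} (t-s)^(-alpha) f'(s) ds
    recent = 1/Gamma(1-alpha) * int_{t/2}^t (t-s)^(-alpha) f'(s) ds
    With t - s = v**p, p = 1/(1-alpha), the factor (t-s)^(-alpha) ds becomes p dv, so the
    weak singularity at s = t disappears and Gauss-Legendre quadrature applies.
    `df` is the derivative f' and must accept NumPy arrays.
    """
    p = 1.0 / (1.0 - alpha)
    x, w = np.polynomial.legendre.leggauss(n)

    def integral(a, b):
        # p * int_a^b f'(t - v**p) dv, with the nodes mapped from [-1, 1] to [a, b]
        v = 0.5 * (b - a) * (x + 1.0) + a
        return p * 0.5 * (b - a) * np.sum(w * df(t - v ** p))

    v_half, v_full = (t / 2.0) ** (1.0 - alpha), t ** (1.0 - alpha)
    c = 1.0 / math.gamma(1.0 - alpha)
    remote = c * integral(v_half, v_full)  # v in [v_half, v_full]  <=>  s in [0, t/2]
    recent = c * integral(0.0, v_half)     # v in [0, v_half]       <=>  s in [t/2, t]
    return float(remote), float(recent)


# f(s) = s: f' = 1 is (weakly) non-decreasing; for alpha > 0 the recent past dominates.
print("%.4f %.4f" % caputo_halves(lambda s: np.ones_like(s), t=1.0, alpha=0.5))  # 0.3305 0.7979

# f(s) = 1 - exp(-s): f' = exp(-s) is decreasing; at alpha = 0 the remote past dominates.
remote, recent = caputo_halves(lambda s: np.exp(-s), t=4.0, alpha=0.0)
print(remote > recent)  # True
```

For $f(s) = s$ the two halves are available in closed form, $\frac{t^{1-\alpha} - (t/2)^{1-\alpha}}{\Gamma(2-\alpha)}$ and $\frac{(t/2)^{1-\alpha}}{\Gamma(2-\alpha)}$. The recent half is larger for every $\alpha > 0$ and equal at $\alpha = 0$, matching the strictness conditions of the first part.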
Corollary 2. Let $\mathbf{u}$ be as in Theorem 5, and fix a vertex $i \in V$. Let and be two disjoint time intervals with . Define the corresponding left and right recent contributions for :
Assume that $u_i'$ is strictly decreasing on ;
$u_i'$ is strictly increasing on .
Then
In particular, on , vertex $i$ recalls its more distant past more strongly than its more recent past, whereas on the bias is reversed.
Proof. Define on and on . Then is strictly decreasing and is strictly increasing on their respective domains, and the points , define uniform grids. Applying Theorem 5 to and yields the desired inequalities. □
Corollary 3. Let $\mathbf{u}$ be as in Theorem 5, and consider two vertices $i, j \in V$. Fix a time $t > 0$ and, for , consider the remote and recent past contributions of the two vertices:
Assume that $u_i'$ is strictly decreasing on $[0, t]$;
$u_j'$ is strictly increasing on $[0, t]$.
Then
Thus, over the same time interval and for the same fractional model, vertex $i$ exhibits a remote-past memory bias, whereas vertex $j$ exhibits a recent-past memory bias.
Proof. Apply Theorem 5 to the scalar functions $u_i$ and $u_j$ on the interval $[0, t]$. □
The recent past term incorporates the contribution of the 'present' instant $t$. However, in the integral formulation, the value of the integrand at a single point does not affect the integral. Thus, one may consider the following discretization.
Definition 6. For odd $k$, let the interval $[0, t]$ be partitioned into $k$ subintervals of equal length $h = t/k$, where . We define the following regions within $[0, t]$:
Remote past:
Recent past:
Present:
where the coefficients are determined by the trapezoidal rule.
The accuracy of this quadrature rule for
was shown by Odibat [67].
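Theorem 6. The fractional Caputo derivative can be expressed as the sum of its remote-past, recent-past, and present contributions plus an error term:
In the limit $\alpha \to 1$, the remote-past and recent-past contributions vanish. The only surviving contribution is the present one, which reduces to the classical derivative $f'(t)$.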
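A discretization in the spirit of Definition 6 and Theorem 6 can be sketched with the product trapezoidal rule. Here $f'$ is interpolated linearly on the grid $t_j = jh$, with $h = t/k$, and the kernel is integrated exactly. This yields weights proportional to $h^{1-\alpha}/\Gamma(3-\alpha)$.

The grouping of nodes is an assumption made for illustration, as are all names:

- remote: $t_j < t/2$;
- recent: $t/2 < t_j < t$;
- present: $t_j = t$.

For odd $k$ the midpoint $t/2$ is never a grid node. Setting $\alpha = 1$ shows the mechanism of Theorem 6 directly: the remote and recent weights vanish and the present weight equals one.

```python
import math


def trapezoid_weights(k, alpha):
    """Weights c_0..c_k of the product trapezoidal rule, so that
    D^alpha f(t) ~ h^(1-alpha) * sum_j c_j f'(t_j) with t_j = j*h and h = t/k.
    f' is interpolated linearly between grid points and the kernel
    (t - s)^(-alpha) / Gamma(1 - alpha) is integrated exactly (0 <= alpha <= 1)."""
    b = 2.0 - alpha
    g = math.gamma(3.0 - alpha)
    c = [0.0] * (k + 1)
    c[0] = ((k - 1) ** b - (k - 2 + alpha) * k ** (1.0 - alpha)) / g  # node s = 0
    for j in range(1, k):
        m = k - j  # distance from node j to the present node, in units of h
        c[j] = ((m + 1) ** b - 2.0 * m ** b + (m - 1) ** b) / g
    c[k] = 1.0 / g  # present node s = t
    return c


def caputo_regions(df, t, alpha, k=9):
    """(remote, recent, present) parts of the trapezoidal Caputo sum for odd k.
    Assumed grouping: remote t_j < t/2, recent t/2 < t_j < t, present t_j = t."""
    h = t / k
    c = trapezoid_weights(k, alpha)
    terms = [h ** (1.0 - alpha) * c[j] * df(t * j / k) for j in range(k + 1)]
    half = (k - 1) // 2  # for odd k, nodes 0..half lie strictly before t/2
    return sum(terms[: half + 1]), sum(terms[half + 1 : k]), terms[k]


# f(s) = s^2, so f'(s) = 2s is linear and the rule is exact:
# D^alpha s^2 = 2 t^(2 - alpha) / Gamma(3 - alpha).
remote, recent, present = caputo_regions(lambda s: 2.0 * s, t=1.0, alpha=0.5)
print(remote + recent + present, 2.0 / math.gamma(2.5))  # agree up to rounding error

# alpha = 1 (Theorem 6): remote and recent weights vanish, the present term is f'(t) = 2.
print(caputo_regions(lambda s: 2.0 * s, t=1.0, alpha=1.0))  # (0.0, 0.0, 2.0)
```

At $\alpha = 1$ every interior weight is a second difference of a linear function, and therefore zero. The endpoint weight at $s = 0$ also cancels, which is exactly the collapse onto the present described in Theorem 6.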
6.2. Physical Interpretation: Memory Regimes on a Graph
The Memory Bias Theorem (Theorem 5) shows that, within the Caputo-driven fractional diffusion dynamics on a graph for $\alpha \in (0, 1)$, the relative importance of the "remote" versus "recent" parts of the past is not uniform across the network. Instead, it depends sensitively on the local temporal curvature of the solution at each vertex: the sign of the second derivative $u_i''$ determines whether $u_i'$ is increasing or decreasing.
If the temporal derivative $u_i'$ is increasing within a given window, the trajectory bends upwards (it is convex), and older information within that window receives a smaller weight than more recent information. In this regime, vertex $i$ is said to exhibit a recent-past memory bias. Our results show that the recent past dominates the remote past for any $\alpha$.
Conversely, if $u_i'$ is decreasing on a given window, the trajectory bends downwards (it is concave), and older information can receive a larger weight than more recent information. In the limit $\alpha \to 0$, the vertex exhibits a remote-past memory bias.
A striking consequence is that different vertices on the same graph may simultaneously reside in opposite memory regimes, even though they are driven by the same fractional dynamics and are evaluated over the same time interval. Similarly, the same vertex may switch memory regimes over time, depending on the evolution of its temporal curvature. This expresses a fundamental “heterogeneity of memory” in fractional diffusion on graphs.
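This heterogeneity can be reproduced on the smallest nontrivial example. The sketch below assumes:

- Equation (1) has the form $D_t^\alpha \mathbf{u} = -\kappa L \mathbf{u}$ with $\kappa = 1$ and $\alpha = 1/2$, so that $\mathbf{u}(t) = E_{1/2}(-t^{1/2}L)\,\mathbf{u}(0)$;
- the graph is a path on three vertices with all initial mass on vertex 0.

The closed form $E_{1/2}(-x) = e^{x^2}\operatorname{erfc}(x)$ is used to evaluate $\mathbf{u}(t)$ spectrally. The regime of each vertex is then read off from the sign of its discrete second differences, that is, from whether $u_i'$ increases or decreases on the window.

Because $t \mapsto E_\alpha(-\lambda t^\alpha)$ is completely monotone:

- the source vertex, with trajectory $1/3 + E_{1/2}(-t^{1/2})/2 + E_{1/2}(-3t^{1/2})/6$, is convex;
- vertex 1, with trajectory $1/3 - E_{1/2}(-3t^{1/2})/3$, is concave.

All names and parameters are illustrative.

```python
import math

import numpy as np


def ml_half(x):
    """E_{1/2}(-x) = exp(x^2) * erfc(x), elementwise, for moderate x >= 0."""
    x = np.asarray(x, dtype=float)
    vals = [math.exp(v * v) * math.erfc(v) for v in x.ravel()]
    return np.array(vals).reshape(x.shape)


def subdiffusion_path3(times):
    """Rows u(t) = E_{1/2}(-t^{1/2} L) u0 on the path graph 0-1-2 with u0 = e_0."""
    L = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
    lam, U = np.linalg.eigh(L)
    lam = np.clip(lam, 0.0, None)  # drop round-off negatives at the zero eigenvalue
    coeff = U.T @ np.array([1.0, 0.0, 0.0])
    return np.array([U @ (ml_half(lam * math.sqrt(t)) * coeff) for t in times])


def memory_regime(samples):
    """Classify a uniformly sampled trajectory by the sign of its second differences:
    'recent' if convex (u' non-decreasing), 'remote' if concave (u' non-increasing)."""
    d2 = np.diff(samples, n=2)
    if np.all(d2 >= 0):
        return "recent"
    if np.all(d2 <= 0):
        return "remote"
    return "mixed"


times = np.linspace(0.5, 4.0, 36)
traj = subdiffusion_path3(times)  # shape (36, 3): one column per vertex
for i in range(3):
    print(i, memory_regime(traj[:, i]))  # vertices 0 and 1 fall in opposite regimes
```

Vertices 0 and 1 share the same fractional dynamics and the same time window, yet they sit in opposite memory regimes. By Theorem 5, the label "remote" describes the regime in which the remote past dominates in the limit $\alpha \to 0$.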
In summary, the fractional order $\alpha$ determines how much of the past is remembered globally, but the shape of the temporal evolution at each vertex dictates which part of the past (remote versus recent) is preferentially recalled. This generates rich, spatially distributed memory patterns that reflect both the graph geometry and the initial configuration.
9. Conclusions
In this work, we have investigated subdiffusion on graphs from structural, dynamical, and mechanistic perspectives, emphasizing the role of memory rather than modifications of spatial connectivity or transition rules. By grounding time-fractional diffusion in a random time-change framework, we have shown that subdiffusion on networks is not merely a slower version of classical diffusion, but a fundamentally non-Markovian process in which past states actively shape future evolution. Importantly, this loss of Markovianity occurs without sacrificing linearity or mass conservation, making time-fractional models both analytically tractable and physically meaningful for networked systems.
A central outcome of our analysis is the demonstration that Mittag-Leffler graph dynamics admit a convex, mass-preserving representation as a superposition of classical heat semigroups evaluated at rescaled times. This operator-level viewpoint provides the foundation for the sum-of-exponentials (SOE) approximation, which enables the efficient numerical evaluation of fractional dynamics through a finite number of matrix exponentials. From a computational perspective, this reduces the problem to repeatedly applying classical diffusion operators, allowing one to leverage existing scalable algorithms while incorporating non-Markovian effects.
From a global perspective, the results of the paper can be understood as different manifestations of this operator-level structure. At the vertex level, the superposition of time scales induces heterogeneous memory effects: different nodes, and even the same node at different times, exhibit distinct biases toward the remote past, recent past, or present contributions, depending on the local temporal curvature of the evolving state. At the level of paths, this same structure translates into a geometric reformulation of transport: the subdiffusive distance induces a metrization of the graph that reshapes connectivity in a time-dependent manner. In particular, subdiffusive shortest paths recover topological shortest paths in the small-time limit while preferentially selecting routes through high-degree regions, revealing a form of memory-assisted navigation that contrasts with classical diffusion.
Taken together, these results show that the operator structure, vertex-level memory, and path-level geometry are not independent phenomena, but are tightly coupled aspects of a single underlying mechanism: the redistribution of diffusion across multiple intrinsic time scales. This integrated viewpoint clarifies how memory reshapes network transport in a structurally coherent way.
From a practical standpoint, the proposed framework is particularly relevant for systems in which transport is affected by memory or trapping effects, such as biological interaction networks, communication systems with delays, or transport in heterogeneous media. In such settings, the SOE approximation provides a computationally viable alternative to direct evaluation of fractional operators, enabling the simulation of subdiffusive dynamics on large graphs using standard diffusion solvers. Compared to classical diffusion, the present approach captures non-exponential relaxation, heterogeneous waiting times, and path-selection effects, while retaining computational efficiency through its reduction to a finite set of semigroup evaluations.
Finally, by connecting fractional diffusion to finite superpositions of classical diffusions and to operator-valued Volterra memory equations in the Laplace domain, we have placed time-fractional graph dynamics within a broader hierarchy of diffusion models with memory. In this context, fractional equations can be interpreted as scale-free limits of multi-rate processes, providing a unifying framework that links analytical theory, numerical approximation, and physical interpretation.
Overall, our results provide a coherent picture of subdiffusion on graphs, showing how memory, geometry, and dynamics interact across scales. This perspective not only deepens the theoretical understanding of fractional diffusion on networks, but also opens the way to practical applications and further quantitative comparisons with classical diffusion models.