Fractional Diffusion on Graphs: Superposition of Laplacian Semigroups Incorporating Memory

Deniskin, Nikita; Estrada, Ernesto

doi:10.3390/fractalfract10040273

by Nikita Deniskin ¹ and Ernesto Estrada ²,*

¹ Faculty of Sciences, Scuola Normale Superiore, 56126 Pisa, Italy

² Institute for Cross-Disciplinary Physics and Complex Systems (IFISC), CSIC-UIB, 07122 Palma de Mallorca, Spain

* Author to whom correspondence should be addressed.

Fractal Fract. 2026, 10(4), 273; https://doi.org/10.3390/fractalfract10040273

Submission received: 18 March 2026 / Revised: 15 April 2026 / Accepted: 16 April 2026 / Published: 21 April 2026

(This article belongs to the Special Issue Fractal Analysis and Data-Driven Complex Systems)

Abstract

Subdiffusion on graphs is often modeled by time-fractional diffusion equations; yet, its structural and dynamical consequences remain unclear. We show that subdiffusive transport on graphs is a memory-driven process generated by a random time change that compresses operational time, produces long-tailed waiting times, and breaks Markovianity while preserving linearity and mass conservation. While the subordination representation and complete monotonicity properties of the Mittag-Leffler function are classical, we develop a graph-based synthesis in which Mittag-Leffler dynamics admit an exact convex, mass-preserving representation as a superposition of Laplacian semigroups evaluated at rescaled times. This perspective reveals fractional diffusion as ordinary diffusion acting across multiple intrinsic time scales and enables new structural and dynamical interpretations of graphs. This framework uncovers heterogeneous, vertex-dependent memory effects and induces transport biases absent in classical diffusion, including algebraic relaxation, degree-dependent waiting times, and early-time asymmetries between sources and neighbors. These features define a subdiffusive geometry on graphs, enabling the recovery of global shortest paths, in contrast to the graph exploration of diffusive geometry, while simultaneously favoring high-degree regions. Finally, we show that time-fractional diffusion can be interpreted as a singular limit of multi-rate diffusion, in an appropriate asymptotic sense.

Keywords:

subdiffusion on graphs; time-fractional diffusion; Mittag-Leffler functions; memory kernels; sum-of-exponentials; shortest paths

MSC:

26A33; 15A16; 35R11; 47D06; 45D05

1. Introduction

Graphs

G = (V, E)

—also referred to as networks—provide a natural mathematical framework to represent a wide variety of complex systems arising in molecular, ecological, technological, and social contexts [1]. In this representation, the set of vertices V typically corresponds to the entities of the system, while the set of edges E encodes their interactions. A fundamental mechanism governing the transport of mass, energy, or information across such networks is diffusion [2,3]. However, due to the complexity of the environments in which many real-world networks are embedded, transport processes often deviate from classical diffusive behavior [4,5,6,7].

Subdiffusion, characterized by a slower-than-linear growth of the mean squared displacement, is especially prevalent in complex systems. For instance, subdiffusive dynamics are ubiquitous in the crowded interior of biological cells [8,9,10,11], where millions of macromolecules interact, forming intricate networks of biochemical processes. Similarly, crowding effects caused by vehicle density and driving fluctuations in urban transportation systems have been shown to induce subdiffusive traffic states [12]. Subdiffusion has also been observed in information transmission processes on online social networks, such as Twitter (now X) and Digg [13].

A standard mathematical approach to model such a crowded and heterogeneous system is to replace the classical time derivative in the diffusion equation with a fractional-time derivative [14,15]. When anomalous diffusion takes place on a graph G, this leads to the following fractional diffusion equation:

\left\{ \begin{matrix} D_{t}^{α} u_{θ} (t) + θ L u_{θ} (t) & = 0, \\ u_{θ} (0) & = u_{0}, \end{matrix} \right.

(1)

where

θ > 0

denotes the diffusivity,

D_{t}^{α}

is the Caputo time-fractional derivative, and L is the graph Laplacian operator acting on functions defined on the vertex set. Specifically, denoting by

C (V)

the set of all complex-valued functions on V, L is the linear mapping from

C (V)

into itself given by

(L f) (v) : = \sum_{(v, w) \in E} (f (v) - f (w)), f \in C (V) .

(2)

The operator

D_{t}^{α}

denotes the Caputo time-fractional derivative of order

0 < α < 1

[16], defined by

D_{t}^{α} u (t) = \frac{1}{Γ (1 - α)} \int_{0}^{t} \frac{u^{'} (τ)}{{(t - τ)}^{α}} d τ,

(3)

where

u^{'} (τ)

is the first derivative of u evaluated at time

τ

. Despite their broad relevance in continuous settings, time-fractional diffusion models have been only sparsely explored on graphs and networks. To date, applications have been largely restricted to epidemiological modeling [17,18] and fractional diffusion on the human proteome, proposed as an alternative framework to account for the multi-organ damage associated with SARS-CoV-2 infection [19]. An exception is the use of (1) in an engineering context for achieving consensus in autonomous systems, where it is referred to as fractional-order consensus [20,21,22,23,24,25]. A rapidly growing line of research explores the integration of fractional derivatives into learning algorithms, leading to the development of fractional-order neural networks and related architectures, with emerging applications in machine learning and artificial intelligence [26,27].
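As a quick sanity check of definition (3), which is not taken from the paper, consider the test function u(t) = t, for which u'(τ) = 1. Under this assumption the integral can be evaluated in closed form:

```latex
D_{t}^{\alpha}\, t
  = \frac{1}{\Gamma(1-\alpha)} \int_{0}^{t} (t-\tau)^{-\alpha} \, d\tau
  = \frac{t^{1-\alpha}}{(1-\alpha)\,\Gamma(1-\alpha)}
  = \frac{t^{1-\alpha}}{\Gamma(2-\alpha)} , \qquad 0 < \alpha < 1 .
```

In the limit α → 1 this reduces to the ordinary derivative of t, namely 1, and constants are annihilated for every α because their first derivative vanishes inside the integral.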

Apart from crowding and excluded-volume effects, which reduce mobility by limiting the available space for motion, several additional mechanisms can give rise to subdiffusion in complex systems. This happens, for instance, in ecological networks, where structural disorder and heterogeneity (manifested through irregular geometries, bottlenecks, and hierarchical or fractal structures) constrain transport pathways and significantly slow down spatial exploration [28]. These features often induce trapping events and lead to broad distributions of residence times. Another characteristic of complex systems is the existence of memory. Temporal memory effects have been observed, for instance, in transcription regulator–DNA interactions in live bacterial cells [29], in addition to the extensive evidence accumulated from single-cell experiments. Because memory means that past events influence a system’s current and future states, the evolution of these complex systems at a given time depends on their entire past history rather than solely on their instantaneous state.

In such non-Markovian settings, classical diffusion equations fail to provide an adequate description. Instead, diffusion models with memory kernels naturally arise, capturing the nonlocal-in-time response induced by trapping, heterogeneity, and temporal correlations [30,31,32,33,34,35]. Once again, in the case of graphs, this leads to the generalized diffusion equation

\left\{ \begin{matrix} \int_{0}^{t} γ (t - s) \partial_{s} u_{θ} (s) d s + θ L u_{θ} (t) & = 0, \\ u_{θ} (0) & = u_{0}, \end{matrix} \right.

(4)

where

γ (t)

denotes a memory kernel. Such formulations provide a unifying framework for describing subdiffusive transport and establish a direct connection between microscopic mechanisms and macroscopic anomalous diffusion. Related non-Markovian formulations on networks arise naturally when vertex-to-vertex motion is modeled as a non-Poisson continuous-time random walk. In this setting, the node-state evolution is governed by a generalized master equation in which the graph Laplacian (or transition operator) appears convolved in time with a kernel determined by the waiting-time distribution, and memory effects are known to substantially alter diffusion and mixing properties on (temporal) networks [36,37,38].

While time-fractional diffusion equations and memory-kernel formulations are widely used to model subdiffusive transport in continuous settings, their role in the context of networks and graph-based diffusion has so far been explored mainly at a phenomenological level. Existing studies typically emphasize anomalous scaling, spectral properties, or long-time relaxation, but often treat memory as a uniform slowing-down mechanism acting on otherwise classical graph diffusion [39,40,41]. As a result, comparatively little is known about how non-Markovian effects interact with the discrete geometry of a graph, how they influence transport pathways, or how they modify vertex-level and path-level behavior beyond global decay rates.

The present work is motivated by the need to clarify how subdiffusion reshapes diffusion on graphs at the structural level. By exploiting the subordination principle [42] and a mass-preserving sum-of-exponentials representation of the Mittag-Leffler operator, we connect fractional diffusion on graphs to a superposition of classical heat processes acting at different internal times. This perspective allows us to study, in a unified way, memory effects, vertex-dependent waiting times, effective distances, and path selection in subdiffusive dynamics. In doing so, we show that memory does not merely slow down diffusion uniformly, but induces heterogeneous temporal behavior across vertices and leads to well-defined subdiffusive distances and paths that differ from their classical diffusive counterparts. These results provide a concrete link between fractional models, memory kernels, and graph-based transport, and offer a more detailed understanding of what subdiffusion means when the underlying space is a network.

1.1. Subdiffusion: Physical Mechanisms and Mathematical Models

A central hallmark of anomalous diffusion is the deviation of the mean squared displacement (MSD) from the linear-in-time growth predicted by Fickian diffusion. Instead, one observes a power-law scaling [5,6,43]

〈 x^{2} (t) 〉 \sim t^{α}, 0 < α < 1,

(5)

which defines subdiffusive behavior. As emphasized in [6], such subdiffusion is not a single phenomenon but rather a collective outcome of different physical mechanisms, each leading to distinct stochastic and mathematical descriptions.

One prominent mechanism underlying subdiffusion is trapping [6,14,15]. In crowded or energetically disordered environments, particles may experience long waiting times between successive displacements due to transient binding or deep potential wells. This situation is naturally described by the continuous-time random walk (CTRW) framework, where the waiting times between steps are independent random variables drawn from a heavy-tailed distribution

ψ (τ) \sim τ^{- 1 - α}, 0 < α < 1,

(6)

implying a diverging mean waiting time.

In the scaling limit, CTRW dynamics lead to a fractional diffusion equation (FDE) for the probability density

p (x, t)

. Sokolov [6] stresses that the use of such fractional equations is physically justified only when the underlying trapping assumptions are valid. CTRW models generally exhibit aging, weak ergodicity breaking, and large trajectory-to-trajectory fluctuations.

Another class of subdiffusive systems arises from transport in labyrinthine or fractal structures, such as percolation clusters or tortuous channel networks [6,44,45,46]. In these systems, anomalous diffusion originates from geometric constraints rather than trapping times. The particle explores a space with no translational invariance and a broad distribution of path lengths.

A paradigmatic example is diffusion on critical percolation clusters or related fractal media, for which the MSD again follows a subdiffusive power law. While fractional diffusion equations may reproduce the probability density in unbounded domains, Sokolov [6] emphasizes that they generally fail to capture important properties such as first-passage times or confined-domain behavior. Consequently, geometric subdiffusion and trapping-induced subdiffusion can yield identical PDFs while corresponding to fundamentally different physical processes.

Subdiffusion may also emerge in viscoelastic environments [47,48,49], where the tagged particle is embedded in a complex interacting medium, such as a polymer network or the cytoskeleton. In this case, the dynamics are governed not by trapping or geometry but by long-range temporal correlations in the particle’s motion.

These systems are often described by fractional Brownian motion (fBm), a Gaussian process characterized by correlated increments, or equivalently by generalized Langevin equations (GLEs) with memory kernels,

m \dot{v} (t) = - \int_{0}^{t} G (t - t^{'}) v (t^{'}) d t^{'} + ξ (t),

(7)

where the friction kernel typically follows a power law

G (t) \sim t^{- β}

, and

ξ (t)

is a correlated noise term. Depending on the noise properties, the resulting MSD scales subdiffusively. Unlike CTRW-based models, fBm and GLE dynamics are ergodic and do not exhibit aging—a distinction that is crucial for interpreting experimental data.

Another way of modeling subdiffusion is via models based on diffusion equations with time-dependent diffusion coefficients [50,51],

\frac{\partial p (x, t)}{\partial t} = D (t) \frac{\partial^{2} p (x, t)}{\partial x^{2}}, D (t) \sim t^{α - 1} .

(8)

Although such models reproduce the same MSD scaling as subdiffusive processes, they are primarily phenomenological fitting tools (see [6]). Despite yielding the same probability density as fractional Brownian motion, they lack its correlation structure and are more closely related to mean-field descriptions of CTRW dynamics.

In closing, real systems often exhibit subdiffusion of mixed origin, combining trapping, geometric constraints, and viscoelastic effects. In such cases, different models may predict the same MSD or PDF while differing fundamentally in their aging properties, ergodicity, and trajectory statistics [6].

We emphasize that the analytical ingredients underlying fractional diffusion—such as the subordination identity and the complete monotonicity of the Mittag-Leffler function—are classical results. The novelty of the present work lies not in these foundational properties themselves, but in their synthesis and interpretation in the context of graph-based diffusion, leading to new geometric, operator-theoretic, and vertex-level insights.

1.2. Contributions of the Paper

We employ the subordination relation (Section 2) to construct a sum-of-exponentials (SOE) approximation for the Mittag-Leffler function. While the scalar counterpart is well understood, we work with a matrix-valued approximation, tailored to studying the time evolution governed by the graph Laplacian. Physically, we interpret the SOE as a superposition of diffusion-like processes across multiple timescales. The subordination identity and the complete monotonicity of the Mittag-Leffler function are well-established results in fractional calculus. In this work, these classical tools serve as the foundation for a new graph-based framework. Our contribution is to develop a structural and dynamical synthesis of fractional diffusion on graphs that combines operator representations, geometric constructions, and vertex-level interpretations of memory.

In Section 3, we introduce the mathematical framework of the sum-of-exponentials structure and derive appropriate tail bounds that guarantee the accuracy of the approximation as the number of terms grows to infinity (Theorem 1 and Section 3.2). In Section 4, we describe the error metrics used to assess the accuracy of the SOE approximation and discuss the influence of the fractional parameter

α

on the contributions of the different diffusion-like processes.

In Section 5, we present new metrizations of the graph, which assign a weight to each edge based on the diffusive, subdiffusive, or SOE-based behavior, respectively. We then introduce our main experimental tool, the shortest paths in these new metrics on the graph. We prove that in the small-time limit, the subdiffusive shortest paths coincide exactly with the shortest paths in the original metric (Theorems 2 and 3), with a preference for traversing high-degree regions (Theorem 4).

Our numerical experiments show that the shortest paths in the diffusive and subdiffusive metrics exhibit completely different behaviors: while the diffusive shortest paths tend to cover broader regions of the graph for different time values, the subdiffusive shortest paths preserve path history even over extended timescales. The shortest paths in the SOE metric function as a bridge between these two phenomena. We propose a memory interpretation based on the influence of previous time instants from the diffusive point of view, and on remembering the most efficient paths at smaller values of t from the metrization point of view, revealing memory-assisted navigation on a network.

Next, we study how memory affects the dynamics on a graph from a vertex-based perspective. In Section 6, we prove that a vertex may receive larger contributions from the more remote or more recent past, depending on the local temporal curvature of the solution to (1). This happens for positive temporal curvature for any

α \in [0, 1)

, and for negative temporal curvature in the limit

α \to 0

(Theorem 5). In Section 7, we provide examples of how different remote or recent past biases may occur at the same time for different vertices: early-time convexity at sources and concavity at their neighbors. Furthermore, we discuss how algebraic (rather than exponential) relaxation and degree-dependent waiting-time effects fundamentally alter transport, trapping, and path selection on networks.

Finally, in Section 8, we show how fractional dynamics can be interpreted as arising from a singular limit of multi-rate diffusion with memory, in an appropriate asymptotic sense. Time-fractional equations can be viewed as scale-free limits of a finite superposition of diffusions in the Laplace domain, equivalently describable via operator-valued Volterra memory kernels (Propositions 7 and 8). This provides a common architecture linking SOE approximations, memory equations, and fractional calculus.

In summary, the genuinely new contribution of this work lies in the graph-based synthesis of fractional diffusion, including (i) a mass-preserving, operator-level sum-of-exponentials (SOE) construction for the Mittag-Leffler propagator on graphs; (ii) the introduction of subdiffusive distances and shortest paths that define a new geometry on networks; and (iii) a vertex-resolved interpretation of memory effects, revealing heterogeneous, degree-dependent temporal behavior. These elements provide a unified perspective linking fractional dynamics, classical diffusion processes, and graph structure.

2. Preliminaries

2.1. Notation and Assumptions

We work on

R^{n}

with the Euclidean norm

{∥ \cdot ∥}_{2}

and the induced operator norm

{∥ \cdot ∥}_{2}

. We consider

G = (V, E)

to be a simple undirected graph on n vertices with

A = (A_{i j})

being its adjacency matrix, and

D = diag (d_{i})

with

d_{i} = \sum_{j} A_{i j}

the diagonal matrix of vertex degrees. For a finite graph, the Laplacian operator is realized by the so-called graph Laplacian matrix

L = D - A

. L is a real symmetric, positive semi-definite matrix satisfying

L 1 = 0

; if G is connected, then

\ker (L) = span {1}

. The spectrum of L is

0 = λ_{1} \leq λ_{2} \leq \dots \leq λ_{n} = : λ_{\max}

; let

L = V Λ V^{⊤}

be an orthogonal diagonalization. For a bounded Borel function

f : [0, \infty) \to R

we use the spectral calculus

f (L) : = V f (Λ) V^{⊤}

.
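The notation above can be made concrete with a minimal Python sketch. It assumes NumPy; the 4-vertex path graph and the helper names `graph_laplacian` and `spectral_function` are illustrative choices and do not come from the paper.

```python
import numpy as np

def graph_laplacian(A):
    """Return L = D - A for a symmetric adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A

def spectral_function(f, lam, V):
    """Spectral calculus f(L) = V f(Lambda) V^T for L = V Lambda V^T."""
    return (V * f(lam)) @ V.T

# Hypothetical example: path graph on 4 vertices, 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
L = graph_laplacian(A)

# Orthogonal diagonalization; eigh returns eigenvalues in ascending order.
lam, V = np.linalg.eigh(L)

print(L @ np.ones(4))   # zero vector, since constants lie in ker(L)
print(lam)              # smallest eigenvalue is 0 up to round-off
```

Because the example graph is connected, only the first eigenvalue vanishes, in line with the statement that the kernel is spanned by the constant vector.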

If

{(λ_{n}, ϕ_{n})}_{n = 1}^{N}

are the eigenpairs of L, then the solution to the abstract Cauchy problem in Equation (1) is given by

\begin{matrix} u (t) & = E_{α} (- t^{α} L) u_{0} \\ & = \sum_{n = 1}^{N} E_{α} (- λ_{n} t^{α}) 〈 u_{0}, ϕ_{n} 〉 ϕ_{n}, \end{matrix}

(9)

where

E_{α} (M) = E_{α, 1} (M)

is the Mittag-Leffler matrix function of

M,

which has the following power-series definition:

E_{α} (M) = \sum_{k = 0}^{\infty} \frac{M^{k}}{Γ (α k + 1)},

(10)

with

Γ (\cdot)

being the Euler gamma function.
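The eigen-expansion (9) and the series (10) can be compared numerically in a special case. The sketch below is an illustration under stated assumptions rather than the authors' implementation: it fixes α = 1/2, where the standard closed form E_{1/2}(-x) = exp(x²) erfc(x) is available through `scipy.special.erfcx`, and it uses a hypothetical 4-vertex path graph. The truncated series is only reliable for small arguments, which is why it is evaluated at a small time.

```python
import numpy as np
from math import gamma
from scipy.special import erfcx

ALPHA = 0.5  # special case with a closed form: E_{1/2}(-x) = exp(x^2) erfc(x)

def ml_half(x):
    """Scalar Mittag-Leffler function E_{1/2}(-x) for x >= 0."""
    return erfcx(x)

def ml_series(M, alpha, terms=60):
    """Truncated power series (10); only reliable when the norm of M is small."""
    out = np.zeros_like(M, dtype=float)
    P = np.eye(M.shape[0])
    for k in range(terms):
        out += P / gamma(alpha * k + 1)
        P = P @ M
    return out

def solve_fractional(L, u0, t):
    """u(t) = E_alpha(-t^alpha L) u0 via the eigen-expansion (9), alpha = 1/2."""
    lam, V = np.linalg.eigh(L)
    lam = np.clip(lam, 0.0, None)  # remove tiny negative round-off
    return V @ (ml_half(lam * t**ALPHA) * (V.T @ u0))

# Hypothetical example: path graph 0-1-2-3 with unit mass on vertex 0.
A = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)
L = np.diag(A.sum(axis=1)) - A
u0 = np.array([1.0, 0.0, 0.0, 0.0])

u = solve_fractional(L, u0, t=2.0)
u_small = solve_fractional(L, u0, t=0.01)
u_series = ml_series(-(0.01**ALPHA) * L, ALPHA) @ u0

print(u, u.sum())                        # total mass stays equal to 1
print(np.max(np.abs(u_small - u_series)))  # series and closed form agree
```

For other values of α a dedicated Mittag-Leffler routine would be needed; the spectral structure of the computation stays the same.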

2.2. Subordination Identity and the M–Wright Density

For

0 < α < 1

,

E_{α} (- t^{α} L) = \int_{0}^{\infty} M_{α} (θ) e^{- θ t^{α} L} d θ,

(11)

where

M_{α} (θ) \geq 0

is the M–Wright (Mainardi) density,

\int_{0}^{\infty} M_{α} (θ) d θ = 1

, and a convenient series form is [52]

M_{α} (θ) = \sum_{k = 0}^{\infty} \frac{{(- θ)}^{k}}{k! Γ (1 - α (k + 1))} (θ \geq 0) .

(12)

The integral in (11) is a Bochner integral of the operator-valued map

s \mapsto e^{- s L}

against probability measures.
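A scalar version of the subordination identity (11) can be checked numerically. The following sketch assumes α = 1/2, for which the M–Wright density has the standard closed form M_{1/2}(θ) = exp(-θ²/4)/√π and the Mittag-Leffler function equals `scipy.special.erfcx`. The function names are hypothetical, and the series (12) is truncated, so it should only be trusted for moderate θ.

```python
import numpy as np
from math import factorial
from scipy.integrate import quad
from scipy.special import erfcx, rgamma

def m_wright_series(theta, alpha, terms=60):
    """Truncated series (12); rgamma(z) = 1/Gamma(z) vanishes at the poles."""
    return sum((-theta) ** k * rgamma(1.0 - alpha * (k + 1)) / factorial(k)
               for k in range(terms))

def m_wright_half(theta):
    """Closed form for alpha = 1/2: M_{1/2}(theta) = exp(-theta^2/4)/sqrt(pi)."""
    return np.exp(-theta**2 / 4.0) / np.sqrt(np.pi)

def subordination_integral(x):
    """Scalar right-hand side of (11) with x = t^alpha * lambda."""
    value, _ = quad(lambda th: m_wright_half(th) * np.exp(-th * x), 0.0, np.inf)
    return value

series_val = m_wright_series(1.5, 0.5)
closed_val = m_wright_half(1.5)
total_mass, _ = quad(m_wright_half, 0.0, np.inf)
lhs, rhs = erfcx(0.7), subordination_integral(0.7)

print(series_val, closed_val)  # series (12) against the closed form
print(total_mass)              # the density integrates to 1
print(lhs, rhs)                # E_{1/2}(-x) against the subordination integral
```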

2.3. Complete Monotonicity and Bernstein’s Theorem

The following facts clarify why the fractional propagator is a positive, mass-preserving contraction.

Proposition 1.

Fix

0 < α < 1

. The scalar function

λ \mapsto E_{α} (- t^{α} λ)

is completely monotone on

[0, \infty)

. There is a probability density

M_{α} (θ)

on

(0, \infty)

such that

E_{α} (- t^{α} λ) = \int_{0}^{\infty} e^{- θ t^{α} λ} M_{α} (θ) d θ,

and in the operator case (Bochner integral):

E_{α} (- t^{α} L) = \int_{0}^{\infty} e^{- θ t^{α} L} M_{α} (θ) d θ .

Furthermore,

E_{α} (- t^{α} L)

is self-adjoint, positive, and

{∥ E_{α} (- t^{α} L) ∥}_{2} = 1

. Moreover, if

L 1 = 0

then

E_{α} (- t^{α} L) 1 = 1

.

Proof.

For 0 < α < 1, the Mittag-Leffler function of negative argument, y ↦ E_α(−y), is completely monotone on [0, ∞). By Bernstein’s Theorem [53], any completely monotone function can be represented as a mixture of exponentials:

E_{α} (- y) = \int_{0}^{\infty} e^{- y θ} M_{α} (θ) d θ .

For the Mittag-Leffler function, the density is given by the Mainardi function

M_{α} (θ)

. For a thorough exposition of these results, see [52,53,54,55]. Using the change of variables

y = t^{α} λ

, we obtain the desired scalar identity. Functional calculus then yields the following operator identity:

\begin{matrix} E_{α} (- t^{α} L) & = V E_{α} (- t^{α} Λ) V^{⊤} = V (\int_{0}^{\infty} e^{- θ t^{α} Λ} M_{α} (θ) d θ) V^{⊤} \\ & = \int_{0}^{\infty} (V e^{- θ t^{α} Λ} V^{⊤}) M_{α} (θ) d θ = \int_{0}^{\infty} e^{- θ t^{α} L} M_{α} (θ) d θ . \end{matrix}

We have positivity

E_{α} (- t^{α} λ) \geq 0

for

0 < α < 1

. From monotonicity, we obtain

E_{α} (- t^{α} λ) \leq E_{α} (0) = 1

for

λ \geq 0

. Since L is self-adjoint and positive semi-definite, its eigenvalues are non-negative and

λ_{1} = 0

, so we have

{∥ E_{α} (- t^{α} L) ∥}_{2} = max_{λ \in σ (L)} | E_{α} (- t^{α} λ) | = E_{α} (0) = 1 .

The mass property follows from

e^{- θ t^{α} L} 1 = 1

and

\int_{0}^{\infty} M_{α} (θ) d θ = 1

. □

If G is connected, then

{lim}_{t \to \infty} E_{α} (- t^{α} L) u_{0} = \frac{1}{n} (1^{⊤} u_{0}) 1

. For a graph with

c > 1

components, the limit projects

u_{0}

to the vector that is constant on each component, with the corresponding component averages.
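The properties listed in Proposition 1 and the long-time limit can be inspected on a small example. This is a minimal sketch under the assumption α = 1/2 (so that the propagator can be built from `scipy.special.erfcx`); the star graph and the helper name `propagator_half` are illustrative only.

```python
import numpy as np
from scipy.special import erfcx

def propagator_half(L, t):
    """E_{1/2}(-t^{1/2} L) via spectral calculus, using E_{1/2}(-x) = erfcx(x)."""
    lam, V = np.linalg.eigh(L)
    lam = np.clip(lam, 0.0, None)  # remove tiny negative round-off
    return (V * erfcx(lam * np.sqrt(t))) @ V.T

# Hypothetical example: star graph with centre 0 and three leaves.
A = np.zeros((4, 4))
A[0, 1:] = A[1:, 0] = 1.0
L = np.diag(A.sum(axis=1)) - A

P = propagator_half(L, t=1.0)
u0 = np.array([1.0, 0.0, 0.0, 0.0])
u_late = propagator_half(L, t=1e12) @ u0

print(np.linalg.norm(P, 2))  # spectral norm equal to 1
print(P @ np.ones(4))        # each row sums to 1
print(P.min())               # entries are non-negative
print(u_late)                # close to the average 1/4 in every entry
```

Even at t = 10¹² the deviation from the average is of order 10⁻⁷ here, a consequence of the algebraic rather than exponential decay of the Mittag-Leffler function.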

We note that the subordination identity and the complete monotonicity properties summarized above are classical; their role here is to provide the analytical foundation for the graph-based operator constructions and interpretations developed in the subsequent sections.

2.4. Terminology: Subordination and the Random Clock

Consider the heat semigroup on the graph

{e^{- s L}}_{s \geq 0}

. Let

{(S_{τ})}_{τ \geq 0}

be the

α

-stable subordinator and

E_{t} = inf {τ > 0 : S_{τ} > t}

the inverse subordinator. Here, subordination means that the time-fractional evolution is obtained by running the baseline (time-linear) heat dynamics with the random clock

E_{t}

in place of deterministic time,

E_{α} (- t^{α} L) = \int_{0}^{\infty} e^{- s L} g_{α} (s, t) d s = \int_{0}^{\infty} M_{α} (θ) e^{- θ t^{α} L} d θ,

(13)

with

g_{α} (s, t) = t^{- α} M_{α} (s / t^{α})

; see [42,52,56]. We do not consider Bochner subordination by a general Bernstein function

ϕ

(which would lead to space-fractional

ϕ (L)

). Equation (13) means that the operator

E_{α} (- t^{α} L)

is the expected value of the standard diffusion operator

e^{- s L}

, where

g_{α} (s, t)

is the probability density of the random clock reading s when the elapsed physical time is t.

2.5. The Time-Changed Process $X_{E_{t}}$

Let

{X_{s}}_{s \geq 0}

be the continuous–time Markov process on V with generator

- L

:

\frac{d}{d s} p (s) = - L p (s), p (s) = {(P [X_{s} = j | X_{0} = i])}_{i, j} = e^{- s L} .

Starting from node i, the holding time is an exponential random variable with decay rate

d_{i} = \sum_{j} A_{i j}

. The next node of the random walk is then chosen to be j with probability

A_{i j} / d_{i}

.

Let

E_{t} : = inf {τ > 0 : S_{τ} > t}

be the inverse

α

–stable subordinator (

0 < α < 1

), independent of X. The time-changed process is

Y_{t} : = X_{E_{t}}

, i.e., the baseline diffusion observed at the random operational time

E_{t}

. Averaging over the random clock yields

P [Y_{t} = j | Y_{0} = i] = \int_{0}^{\infty} P [X_{s} = j | X_{0} = i] g_{α} (s, t) d s = {[E_{α} (- t^{α} L)]}_{i j} .

Hence, for any initial

u_{0}

,

u (t) = E_{α} (- t^{α} L) u_{0}

solves the Caputo differential equation

D_{t}^{α} u (t) = - L u (t)

.

Y_{t}

is semi–Markov in physical time t: the holding time

T_{i}

at node i has survival function

P (T_{i} > t) = E_{α} (- d_{i} t^{α})

, which decays algebraically (

\sim t^{- α}

) and thereby produces trapping and memory. Since

L 1 = 0

and the integral representation averages stochastic kernels,

Y_{t}

preserves total mass. On a connected graph, the law of

Y_{t}

converges, as t tends to infinity, to the same equilibrium as the baseline walk (i.e., the uniform distribution on V).
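The contrast between the heavy-tailed holding times of the time-changed process and the exponential holding times of the baseline walk can be seen in a few lines. The sketch assumes α = 1/2, so that the survival function is available in closed form through `scipy.special.erfcx`, and it uses the standard large-argument behaviour E_α(-x) ≈ 1/(x Γ(1-α)); the function names are hypothetical.

```python
import numpy as np
from math import gamma
from scipy.special import erfcx

ALPHA = 0.5

def survival_fractional(t, degree):
    """P(T_i > t) = E_{1/2}(-d_i t^{1/2}), closed form for alpha = 1/2."""
    return erfcx(degree * np.sqrt(t))

def survival_markov(t, degree):
    """Exponential holding time of the baseline walk X_s."""
    return np.exp(-degree * t)

def tail_asymptote(t, degree, alpha=ALPHA):
    """Leading large-t behaviour t^{-alpha} / (d_i Gamma(1 - alpha))."""
    return t ** (-alpha) / (degree * gamma(1.0 - alpha))

t = 1.0e4
for d in (1, 5):
    print(d, survival_fractional(t, d), tail_asymptote(t, d), survival_markov(t, d))
```

The fractional survival probability remains of order t^{-α} and is smaller at higher-degree vertices, whereas the exponential one is numerically zero at the same time.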

3. Sum of Exponentials Approximation

In this section, we present a practical, mass-preserving approximation of the Mittag-Leffler function as a sum of exponential functions from a mathematical point of view. We start with the following fundamental definition.

Definition 1.

The sum-of-exponentials (SOE) approximation is

F_{J} (t, L) = \sum_{j = 1}^{J} a_{j} e^{- b_{j} t^{α} L},

(14)

with coefficients

a_{j}, b_{j}

that depend only on α and not on t.

Our aim is to obtain

E_{α} (- t^{α} L) \approx \sum_{j = 1}^{J} a_{j} e^{- b_{j} t^{α} L} .

This approximation is constructed as a quadrature of (11) whose window is logarithmically scaled in

θ

. In the rest of the section, we analyze the mathematical properties of the SOE approximation. Specifically, we perform the following:

1. Provide a log–trapezoidal construction with $a_{j} > 0$, $\sum_{j} a_{j} = 1$ (mass conservation);
2. Prove geometric convergence in J for a fixed window;
3. Derive explicit, uniform tail bounds from $M_{α}$ to select $(θ_{\min}, θ_{\max})$;
4. Tabulate ready-to-use window endpoints $(θ_{\min}, θ_{\max})$ (and $(y_{\min}, y_{\max})$) for typical $α, ε$ (see Appendix A);
5. Provide practical error metrics (relative/probe, mass error) for a posteriori assessment.

The SOE representation is obtained as a quadrature approximation of the subordination integral and should be understood primarily at the operator level. In particular, the coefficients

(a_{j}, b_{j})

arise from a discretization of the integral over the M–Wright density and do not, by themselves, define a stochastic mixture or a dynamical decomposition of independent processes.

Nevertheless, the representation

E_{α} (- t^{α} L) \approx \sum_{j} a_{j} e^{- b_{j} t^{α} L}

may be viewed heuristically as a superposition of diffusion-like operators acting at rescaled internal times. This interpretation serves as an intuitive guide for understanding multi-scale behavior, but it is not intended as a literal probabilistic construction unless additional structure is imposed.

3.1. SOE via Log–Trapezoidal Quadrature

Set

θ = e^{y}

,

y \in R

, and truncate to

[y_{\min}, y_{\max}]

. With a uniform grid

y_{j} = y_{\min} + (j - 1) h

,

h = \frac{y_{\max} - y_{\min}}{J - 1}

, define

b_{j} = θ_{j} = e^{y_{j}}, {\tilde{a}}_{j} = h M_{α} (b_{j}) b_{j}, a_{j} = \frac{{\tilde{a}}_{j}}{\sum_{k = 1}^{J} {\tilde{a}}_{k}} .

(15)

Then

E_{α} (- t^{α} L) \approx \sum_{j = 1}^{J} a_{j} e^{- b_{j} t^{α} L}, a_{j} > 0, b_{j} > 0, \sum_{j = 1}^{J} a_{j} = 1 .

(16)
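The construction (15) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: it takes α = 1/2, where M_{1/2}(θ) = exp(-θ²/4)/√π and E_{1/2}(-x) = `erfcx(x)` are standard closed forms, and the window endpoints and J below are illustrative values rather than the ones tabulated in Appendix A.

```python
import numpy as np
from scipy.special import erfcx

def m_wright_half(theta):
    """M_{1/2}(theta) = exp(-theta^2/4)/sqrt(pi) (closed form, alpha = 1/2)."""
    return np.exp(-theta**2 / 4.0) / np.sqrt(np.pi)

def soe_coefficients(J, theta_min, theta_max, density=m_wright_half):
    """Log-trapezoidal nodes b_j and normalized weights a_j following (15)."""
    y = np.linspace(np.log(theta_min), np.log(theta_max), J)
    h = y[1] - y[0]
    b = np.exp(y)
    a_tilde = h * density(b) * b
    return a_tilde / a_tilde.sum(), b

def soe_scalar(x, a, b):
    """Scalar SOE evaluated at x = t^alpha * lambda (array-valued x allowed)."""
    x = np.atleast_1d(x)
    return np.exp(-np.outer(x, b)) @ a

# Illustrative window and number of terms (assumed values).
a, b = soe_coefficients(J=200, theta_min=1e-6, theta_max=12.0)
x = np.linspace(0.0, 50.0, 501)
max_err = np.max(np.abs(soe_scalar(x, a, b) - erfcx(x)))

print(a.sum(), a.min() > 0)  # weights are positive and sum to 1
print(max_err)               # uniform scalar error on the sampled range
```

The coefficients are computed once and reused for every t, since time enters only through the product t^α λ.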

Remark 1.

The nodes

b_{j}

correspond to quadrature points in the variable θ arising from the subordination formula, and the weights

a_{j}

approximate the probability density

M_{α} (θ)

in an integral sense. However, the discrete representation

\sum_{j} a_{j} e^{- b_{j} t^{α} L}

should not be interpreted as an exact probabilistic mixture of independent diffusion processes. Rather, it is a deterministic operator approximation whose coefficients inherit their meaning from the underlying integral representation. Any interpretation in terms of multiple time scales or internal clocks is therefore heuristic and serves primarily as a conceptual aid.

Convention 1.

In all that follows, we fix the nodes

b_{j}

once (they depend only on α and the chosen log–window) and, for any

t > 0

, evaluate the SOE as

\sum_{j = 1}^{J} a_{j} e^{- (t^{α} b_{j}) L}

. Thus, the coefficients

(a_{j}, b_{j})

do not depend on t or on L. We then have the following results:

Proposition 2.

If

L 1 = 0

and

\sum_{j} a_{j} = 1

, then for all

t \geq 0

the SOE approximation preserves mass, that is,

1^{⊤} (\sum_{j = 1}^{J} a_{j} e^{- b_{j} t^{α} L}) u_{0} = 1^{⊤} u_{0}

.

Proof.

Since

L 1 = 0

,

e^{- b_{j} t^{α} L} 1 = 1

for each j. Thus,

(\sum_{j} a_{j} e^{- b_{j} t^{α} L}) 1 = (\sum_{j} a_{j}) 1 = 1

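, and since each exponential is symmetric, transposing this identity gives

1^{⊤} (\sum_{j} a_{j} e^{- b_{j} t^{α} L}) = 1^{⊤}

. Right-multiplying by

u_{0}

gives

1^{⊤} (\cdot) u_{0} = 1^{⊤} u_{0}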

. □

Lemma 1.

For self-adjoint L and bounded Borel

f, g

,

{∥ f (L) - g (L) ∥}_{2} = \max_{λ \in σ (L)} | f (λ) - g (λ) | .

In particular, the SOE error is

{∥ E_{α} (- t^{α} L) - F_{J} (t, L) ∥}_{2} = \max_{λ \in σ (L)} | g (λ) - g_{J} (λ) | \leq \max_{λ \in [0, λ_{\max}]} | g (λ) - g_{J} (λ) |,

with

g (λ) = E_{α} (- t^{α} λ)

and

g_{J} (λ) = \sum_{j} a_{j} e^{- b_{j} t^{α} λ}

.

Proof.

Diagonalize

L = V Λ V^{⊤}

. Then

g (L) - g_{J} (L) = V (g (Λ) - g_{J} (Λ)) V^{⊤}

and

{∥ g (L) - g_{J} (L) ∥}_{2} = {∥ g (Λ) - g_{J} (Λ) ∥}_{2} = \max_{k} | g (λ_{k}) - g_{J} (λ_{k}) | .

□
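Proposition 2 and Lemma 1 can both be checked at the operator level. The sketch below is a hedged illustration: it assumes α = 1/2 (closed forms for the density and for E_{1/2}), an illustrative log-window, and a hypothetical 5-vertex cycle graph; the matrix exponentials are formed with `scipy.linalg.expm`.

```python
import numpy as np
from scipy.linalg import expm
from scipy.special import erfcx

ALPHA = 0.5

def soe_coefficients(J=200, theta_min=1e-6, theta_max=12.0):
    """Log-trapezoidal SOE weights and nodes for alpha = 1/2 (assumed window)."""
    y = np.linspace(np.log(theta_min), np.log(theta_max), J)
    b = np.exp(y)
    w = (y[1] - y[0]) * b * np.exp(-b**2 / 4.0) / np.sqrt(np.pi)
    return w / w.sum(), b

def soe_operator(L, t, a, b):
    """F_J(t, L) = sum_j a_j exp(-b_j t^alpha L)."""
    return sum(aj * expm(-bj * t**ALPHA * L) for aj, bj in zip(a, b))

# Hypothetical example: cycle graph on 5 vertices.
n = 5
A = np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)
L = np.diag(A.sum(axis=1)) - A

t = 0.8
a, b = soe_coefficients()
F = soe_operator(L, t, a, b)

lam, V = np.linalg.eigh(L)
lam = np.clip(lam, 0.0, None)
E = (V * erfcx(lam * t**ALPHA)) @ V.T            # exact propagator, alpha = 1/2
g_J = np.exp(-np.outer(lam * t**ALPHA, b)) @ a     # scalar SOE on the spectrum

u0 = np.random.default_rng(0).random(n)
mass_error = abs(np.ones(n) @ F @ u0 - u0.sum())   # Proposition 2
op_error = np.linalg.norm(E - F, 2)                # left side of Lemma 1
spec_error = np.max(np.abs(erfcx(lam * t**ALPHA) - g_J))  # right side

print(mass_error, op_error, spec_error)
```

The operator-norm error coincides with the largest scalar error over the eigenvalues, so in practice only the scalar functions need to be compared.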

Assuming that

λ

is bounded in a compact set, the error of the SOE approximation decreases geometrically in the number of terms J and uniformly in

λ

. This assumption holds provided that we have an upper bound on the eigenvalues of L.

Theorem 1.

Let

f_{λ} (y) : = M_{α} (e^{y}) e^{y} e^{- e^{y} t^{α} λ}

with

t > 0

,

λ \in [0, λ_{\max}]

. For fixed

[y_{\min}, y_{\max}]

and step

h = (y_{\max} - y_{\min}) / (J - 1)

,

sup_{λ \in [0, λ_{\max}]} |\int_{y_{\min}}^{y_{\max}} f_{λ} (y) d y - h \sum_{j = 1}^{J} f_{λ} (y_{j})| \leq C_{1} e^{- C_{2} / h},

with constants

C_{1}, C_{2} > 0

independent of h and λ. Consequently,

{∥ E_{α} (- t^{α} L) - \sum_{j = 1}^{J} a_{j} e^{- b_{j} t^{α} L} ∥}_{2} \leq C_{1} e^{- C_{2} (J - 1) / (y_{\max} - y_{\min})} .

Proof

(Analytic trapezoidal rule). The map

y \mapsto f_{λ} (y)

is analytic in a strip

S_{a} = {| Im y | < a}

and decays rapidly as

Re y \to \pm \infty

(exhibiting doubly exponential decay as

Re y \to + \infty

and being integrable towards

- \infty

). For analytic, rapidly decaying functions, the (periodized) trapezoidal rule error on a finite interval is

\leq C e^{- 2 π a / h}

via Poisson summation; see (Theorem 2.1, [57]). Uniform strip bounds over

λ \in [0, λ_{\max}]

yield uniform constants

C_{1}, C_{2}

. Finally, Lemma 1 transfers the scalar quadrature error to the operator norm bound. □

3.2. Tail Bounds for the Subordination Integral and Window Selection

Having determined that we can approximate the integral on

[y_{\min}, y_{\max}]

, we now show how to choose the window endpoints. We start by considering the subordination identity previously defined and its scalar counterpart

F (λ) = \int_{0}^{\infty} M_{α} (θ) e^{- θ t^{α} λ} d θ

for

λ \geq 0

. For window endpoints

0 < θ_{\min} < θ_{\max} < \infty

(equivalently,

y_{\min} = log θ_{\min}

and

y_{\max} = log θ_{\max}

), we define the truncated operator

F_{win} (L) = \int_{θ_{\min}}^{θ_{\max}} M_{α} (θ) e^{- θ t^{α} L} d θ,

so that the truncation error splits as the left and right tails:

T_{L} (λ) : = \int_{0}^{θ_{\min}} M_{α} (θ) e^{- θ t^{α} λ} d θ, T_{R} (λ) : = \int_{θ_{\max}}^{\infty} M_{α} (θ) e^{- θ t^{α} λ} d θ .

By the spectral theorem (Lemma 1), the operator truncation error satisfies

{∥ F (L) - F_{win} (L) ∥}_{2} = \max_{λ \in σ (L)} (T_{L} (λ) + T_{R} (λ)) \leq sup_{λ \in [0, λ_{\max}]} T_{L} (λ) + sup_{λ \in [0, λ_{\max}]} T_{R} (λ) .

3.2.1. Left Tail (Small $θ$ )

Near

θ = 0

, the M–Wright density satisfies

M_{α} (θ) = \frac{1}{Γ (1 - α)} + O (θ)

(see [54,55]). A simple bound follows.

Proposition 3.

For all

t \geq 0

,

λ \geq 0

and

0 < θ_{\min} \leq 1

,

T_{L} (λ) \leq \frac{1}{Γ (1 - α)} \{\begin{matrix} \frac{1 - e^{- t^{α} λ θ_{\min}}}{t^{α} λ}, & λ > 0, \\ θ_{\min}, & λ = 0 . \end{matrix}

(17)

Consequently, uniformly in

λ \in [0, λ_{\max}]

,

sup_{λ \in [0, λ_{\max}]} T_{L} (λ) \leq \frac{θ_{\min}}{Γ (1 - α)} .

Proof.

Use

M_{α} (θ) \leq \frac{1}{Γ (1 - α)}

for

θ \in (0, 1]

and integrate

\int_{0}^{θ_{\min}} e^{- t^{α} λ θ} d θ

(with the obvious limit as

λ ↓ 0

). □

We remark that taking

θ_{\min} = ε Γ (1 - α)

forces the left-tail contribution below

ε

for all eigenvalues (including the zero mode).

3.2.2. Right Tail (Large $θ$ )

As

θ \to \infty

, the density exhibits a stretched-exponential decay (Chapter 4, [55]), (Chapter 2, [54]): there exist

C_{α}, c_{α} > 0

and

q_{α} : = \frac{1}{1 - α} > 1

such that, for all sufficiently large

θ

,

M_{α} (θ) \leq C_{α} θ^{p_{α}} exp (- c_{α} θ^{q_{α}}), p_{α} : = \frac{α - 2}{2 (1 - α)} .

(18)

This yields two complementary bounds.

Proposition 4.

Let (18) hold for all

θ \geq θ_{0} (α)

. Then for any

t \geq 0

,

λ \geq 0

and

θ_{\max} \geq θ_{0} (α)

,

T_{R} (λ) \leq \int_{θ_{\max}}^{\infty} C_{α} θ^{p_{α}} e^{- c_{α} θ^{q_{α}}} d θ \leq \frac{C_{α}}{q_{α} c_{α}} θ_{\max}^{p_{α} + 1 - q_{α}} exp (- c_{α} θ_{\max}^{q_{α}}) .

(19)

In particular, the bound is independent of λ (and thus controls the zero eigenvalue).

Proof.

Dropping the factor

e^{- t^{α} λ θ} \leq 1

, we bound the tail of a monotone stretched-exponential via the standard inequality

\int_{x}^{\infty} u^{p} e^{- c u^{q}} d u \leq \frac{1}{q c} x^{p + 1 - q} e^{- c x^{q}}

, which holds for large x. □

Proposition 5.

If the graph is connected and we restrict to

1^{⊥}

(thus

λ \geq λ_{2} > 0

), then for any

θ_{\max} > 0

,

sup_{λ \in [λ_{2}, λ_{\max}]} T_{R} (λ) \leq e^{- t^{α} λ_{2} θ_{\max}} \int_{θ_{\max}}^{\infty} M_{α} (θ) d θ \leq e^{- t^{α} λ_{2} θ_{\max}} .

(20)

Proof.

Use

e^{- t^{α} λ θ} \leq e^{- t^{α} λ_{2} θ}

and

\int_{θ_{\max}}^{\infty} M_{α} (θ) d θ \leq 1

. □

We now consider the practical choices based on the previous results:

All modes (including $λ = 0$ ): Pick $θ_{\max}$ so that $c_{α} θ_{\max}^{q_{α}} \geq log (2 / ε)$ ; then by (19), ${sup}_{λ} T_{R} (λ) ≲ \frac{C_{α}}{q_{α} c_{α}} θ_{\max}^{p_{α} + 1 - q_{α}} e^{- c_{α} θ_{\max}^{q_{α}}} \leq ε / 2$ for large enough $θ_{\max}$ .
Mean-zero subspace: Set $θ_{\max} \geq \frac{1}{t^{α} λ_{2}} log (2 / ε)$ to guarantee ${sup}_{λ \geq λ_{2}} T_{R} (λ) \leq ε / 2$ by (20).

3.2.3. Window Rules: Left and Right Together

Combining Propositions 3–5 yields explicit choices for

(θ_{\min}, θ_{\max})

that ensure the total tail remains below a target tolerance

ε

:

Corollary 1.

Given

ε \in (0, 1)

and

0 < α < 1

:

General (all modes). If we choose

θ_{\min} = \frac{ε}{2} Γ (1 - α), \quad θ_{\max} \text{ such that } c_{α} θ_{\max}^{q_{α}} \geq log \frac{2}{ε},

then

{sup}_{λ \in [0, λ_{\max}]} (T_{L} (λ) + T_{R} (λ)) \leq ε

.

Mean-zero subspace (connected graph). To estimate the operator on

1^{⊥}

, we employ the following choice of parameters

θ_{\min} = \frac{ε}{2} Γ (1 - α), θ_{\max} = \frac{1}{t^{α} λ_{2}} log \frac{2}{ε} .

If one intends to reuse the same nodes for multiple times

t \in [t_{\min}, t_{\max}]

, it is sufficient to replace t with

t_{\min}

in the expression for

θ_{\max}

, so that the nodes

(a_{j}, b_{j})

are t-independent. Then

{sup}_{λ \in [λ_{2}, λ_{\max}]} (T_{L} (λ) + T_{R} (λ)) \leq ε

.

Remark 2.

The log–trapezoidal SOE uses

y = log θ

on

[y_{\min}, y_{\max}]

, with

y_{\min} = log θ_{\min}

and

y_{\max} = log θ_{\max}

. After fixing the window by Corollary 1, increasing J then controls the discretization error inside the window with geometric rate (Theorem 1).

Remark 3.

The choice of

θ_{\max}

via Proposition 5 with

λ_{2}

replaced by

λ_{\max}

yields a conservative upper bound on the tail for high-frequency modes and motivates rules of the form

θ_{\max} \sim C / (t^{α} λ)

. Empirically, setting

C \approx 32

ensures that

exp (- θ_{\max} t^{α} λ)

falls near machine precision and works well when the focus is on modes away from the zero eigenvalue; the rigorous alternative (19) controls all modes using only properties of

M_{α}

.

3.2.4. Constants in the Stretched-Exponential Bound

Explicit asymptotics for

M_{α}

yield

q_{α} = \frac{1}{1 - α}

and an exponent constant

c_{α} = (1 - α) α^{α / (1 - α)}

. The prefactor has a power

θ^{p_{α}}

with

p_{α} = \frac{α - 2}{2 (1 - α)}

and a multiplicative constant depending only on

α

(see Section 2.3, [54] and Section 4.3, [55]). Using these in (19) yields a fully explicit

θ_{\max} (ε, α)

.
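The window rules can be collected into a small helper. This is only a sketch: the function name `soe_window` and its signature are hypothetical, the all-modes rule uses the simplified condition c_α θ_max^{q_α} = log(2/ε) and ignores the algebraic prefactor in (19), and the mean-zero rule follows the bound (20) with the factor t^α.

```python
import math

def soe_window(eps, alpha, t=1.0, lam2=None):
    """Return (theta_min, theta_max) for tolerance eps and 0 < alpha < 1.

    lam2=None -> all-modes rule based on the stretched-exponential tail (19).
    lam2>0    -> mean-zero-subspace rule based on the spectral-gap bound (20).
    """
    theta_min = 0.5 * eps * math.gamma(1.0 - alpha)
    if lam2 is None:
        q = 1.0 / (1.0 - alpha)                                # q_alpha
        c = (1.0 - alpha) * alpha ** (alpha / (1.0 - alpha))   # c_alpha
        theta_max = (math.log(2.0 / eps) / c) ** (1.0 / q)
    else:
        theta_max = math.log(2.0 / eps) / (t**alpha * lam2)
    return theta_min, theta_max

print(soe_window(1e-6, 0.5))                   # all modes
print(soe_window(1e-6, 0.5, t=2.0, lam2=0.3))  # mean-zero subspace
```

To reuse one set of nodes over a time range, one would call the helper with t set to the smallest time of interest, as described in Corollary 1.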

4. Error of the SOE Approximation

In this section, we define several error metrics to assess the accuracy of the SOE approximation, and test them using a sample graph.

4.1. Error Metrics

Given a probe vector

u_{0}

, let

u^{★} (t)

be the solution of (1), namely

u^{★} (t) = E_{α} (- t^{α} L) u_{0} .

Denote by

u^{SOE} (t)

the time evolution of

u_{0}

obtained using the SOE approximation

u^{SOE} (t) = F_{J} (t, L) u_{0} = \sum_{j = 1}^{J} a_{j} e^{- b_{j} t^{α} L} u_{0} .

We employ the following metrics to evaluate the error between

u^{SOE} (t)

and

u^{★} (t)

.

4.1.1. Relative Error

relerr (u_{0}) = \frac{∥ u^{★} (t) - u^{SOE} (t) ∥_{2}}{∥ u^{★} (t) ∥_{2}} .

(21)

We use the relative error in the two-norm because it reflects the operator-level error, that is, the precision with which the operator

F_{J} (t, L)

approximates

E_{α} (- t^{α} L)

.

4.1.2. Mass Conservation Error

masserr (u_{0}) = |1^{⊤} u^{★} (t) - 1^{⊤} u^{SOE} (t)| .

(22)

Ideally,

masserr (u_{0}) = 0

; with floating-point arithmetic, it is typically at the level of machine precision due to

\sum_{j} a_{j} = 1

and

L 1 = 0

.
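The reason the mass error sits at round-off level can be illustrated directly: each semigroup factor preserves the total mass because L 1 = 0, so any convex combination does as well. The sketch below uses a small hand-picked graph and hypothetical weights and rates (they are not the quadrature nodes of the paper); since the exact solution conserves mass, the defect is measured against the initial mass.

```python
import numpy as np

# Small connected test graph: a 5-cycle with one chord (illustrative choice).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
lam, V = np.linalg.eigh(L)

def heat(b, s, u0):
    """Apply exp(-b * s * L) to u0 through the eigendecomposition of L."""
    return V @ (np.exp(-b * s * lam) * (V.T @ u0))

a = np.array([0.5, 0.3, 0.2])  # hypothetical convex weights, sum to 1
b = np.array([1.5, 1.0, 0.4])  # hypothetical rates
t, alpha = 2.0, 0.8
u0 = np.zeros(n)
u0[0] = 1.0

u_soe = sum(aj * heat(bj, t**alpha, u0) for aj, bj in zip(a, b))
mass_defect = abs(u0.sum() - u_soe.sum())
print(mass_defect)
```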

4.1.3. Scalar Error

The scalar error, understood on the spectral interval of the matrix functions, is

max_{λ \in [0, λ_{\max}]} |E_{α} (- t^{α} λ) - F_{J} (t, λ)| .

(23)

According to Theorem 1, and uniformly for

λ \in [0, λ_{\max}]

, the scalar error decays geometrically with J once the window captures the effective support of the integrand. While the raw error as a function of J need not be strictly decreasing (typical oscillations of spectrally convergent trapezoidal rules), its envelope decays until it reaches the desired accuracy [57].

In all cases, the SOE inherits the correct long-time limit and conservation properties by construction (Proposition 2). Accuracy improves with J and when the window is adapted to the relevant scale

t, λ

.

4.1.4. Operator Error

If an operator-level diagnostic is desired, one may estimate

\frac{∥ E_{α} (- t^{α} L) - F_{J} (t, L) ∥_{2}}{∥ E_{α} (- t^{α} L) ∥_{2}}

(24)

via power iteration on the difference operator; in practice, the probe-based relative error (21) is sufficient and cheaper.

We have constructed the SOE approximation for scalars, valid on an interval containing the eigenvalues of L, and then extended it to matrices. Therefore, we use the two-norm both for vectors (relative error) and for matrices (operator error) to provide a clear correspondence between the scalar and the operator settings. Indeed, note that in exact arithmetic, the operator error and the scalar spectral error are identical. This is because

{∥ E_{α} (- t^{α} L) ∥}_{2} = 1

and the two-norm of the symmetric matrix

E_{α} (- t^{α} L) - F_{J} (t, L)

is the magnitude of its largest eigenvalue; the latter is precisely the scalar error according to the spectral theorem applied to the function

g (λ) = E_{α} (- t^{α} λ) - F_{J} (t, λ)

. However, matrix computations in floating-point arithmetic may introduce numerical inaccuracies, as evidenced by a comparison of Figure 1 and Figure 2B.

4.2. Computational Results

In this section, we consider an Erdős–Rényi (ER) random graph with 250 vertices and 1000 edges. The graph is simple and connected. We compute the scalar error for three different values of

α

and a wide range of times,

11 \leq t \leq 1001

. The results are illustrated in Figure 1 for (a)

α = 0.8

, (b)

α = 0.5

, and (c)

α = 0.25

. The convergence rate of the scalar error is higher for smaller values of

α

; that is, fewer terms are needed to obtain a sufficiently accurate approximation.

Let us now focus on the nature of the SOE for the three different values of

α

, with

J = 61

. Using our approach, we obtain the operator-level approximations

E_{0.8} (- t^{0.8} L) \approx 0.242 e^{- 1.52 t^{0.8} L} + 0.220 e^{- 1.19 t^{0.8} L} + 0.148 e^{- 0.92 t^{0.8} L} + \dots,

E_{0.5} (- t^{0.5} L) \approx 0.119 e^{- 1.48 t^{0.5} L} + 0.115 e^{- 1.15 t^{0.5} L} + 0.108 e^{- 1.90 t^{0.5} L} + \dots,

and

E_{0.25} (- t^{0.25} L) \approx 0.096 e^{- 1.10 t^{0.25} L} + 0.095 e^{- 1.42 t^{0.25} L} + 0.091 e^{- 0.86 t^{0.25} L} + \dots .

The coefficients

a_{j} > 0

are quadrature weights that approximate the contribution of different regions of the subordination integral, and their normalization

\sum_{j} a_{j} = 1

reflects mass preservation at the operator level. The parameters

b_{j}

correspond to the quadrature nodes and determine the effective scaling of the semigroup factors

e^{- b_{j} t^{α} L}

.

While it is convenient to view the coefficients

a_{j}

as indicating the relative contribution of different exponential components, and the

b_{j}

as associated with different effective time scales, this interpretation should be understood as heuristic. The SOE representation is not derived from a decomposition into independent stochastic processes, but from a deterministic approximation of the integral representation of the Mittag-Leffler operator.

For larger values of

α

, the coefficients

a_{j}

are more concentrated, and subdiffusion can be represented as a superposition of fewer operators. Indeed, in the limit

α \to 1

, we exactly recover the diffusive behavior, so we expect

a_{1} = 1

and

a_{j} = 0

for

j > 1

. As

α

decreases, the contributions of the other operators become more prominent. The three leading coefficients of the SOE for

α = 0.25

indicate that the corresponding exponential components contribute with comparable weights in the operator approximation. This means that multiple diffusive processes are needed in the strong memory regime to effectively recover subdiffusion. We explain the connection between the scaling parameters

b_{j}

and the memory structure in the following section.

5. Subdiffusive Distance and Paths

In this section, we introduce new geometrizations of a graph related to the diffusive processes, subdiffusive processes, and the SOE approximation of the latter. We study the shortest paths under these new metrics and observe, via numerical experiments, a distinct difference in behavior: while the diffusive case explores broader regions of the graph, the subdiffusive case recovers the usual shortest paths in the standard metric.

5.1. Subdiffusive Distance

Let us consider the solution to the Caputo time-fractional diffusion equation:

u (t) = E_{α} (- t^{α} L) u_{0}

with initial condition

u_{0} = e_{v}

, where

e_{v}

is the vector with a one at position v and zeros elsewhere. We then evaluate the capacity of the whole graph to diffuse mass between the vertices v and w at time t. That is, we consider at time t the difference between the mass remaining at the origin, i.e., vertex v, and the mass diffused to vertex w:

u_{v | u_{0} = e_{v}} (t) - u_{w | u_{0} = e_{v}} (t) = {(E_{α} (- t^{α} L))}_{v v} - {(E_{α} (- t^{α} L))}_{v w} .

Similarly, the capacity of the whole graph to move mass from w to v conditioned to all initial mass being allocated at w is as follows:

u_{w | u_{0} = e_{w}} (t) - u_{v | u_{0} = e_{w}} (t) = {(E_{α} (- t^{α} L))}_{w w} - {(E_{α} (- t^{α} L))}_{w v} .

As we only consider undirected graphs here, the total capacity of mass diffusion between both vertices is as follows:

D_{α, t} (v, w) = {(E_{α} (- t^{α} L))}_{v v} + {(E_{α} (- t^{α} L))}_{w w} - 2 {(E_{α} (- t^{α} L))}_{v w} .

(25)

Then, we have the following result.

Proposition 6.

Let

D_{α, t} (v, w)

be defined as before. Then

D_{α, t} (v, w)

is real and non-negative. There exists an embedding of vertices

V \to R^{n}

,

i \to x_{i}

such that

D_{α, t} (v, w) = {∥ x_{v} - x_{w} ∥}_{2}^{2}

.

Proof.

Using positivity,

E_{α} (- t^{α} λ) \geq 0

for

0 < α < 1

, we write

\begin{matrix} E_{α} (- t^{α} L) & = V E_{α} (- t^{α} Λ) V^{⊤} \\ & = V {(E_{α} (- t^{α} Λ))}^{1 / 2} {(E_{α} (- t^{α} Λ))}^{1 / 2} V^{⊤} \\ & = (V {(E_{α} (- t^{α} Λ))}^{1 / 2}) {(V {(E_{α} (- t^{α} Λ))}^{1 / 2})}^{⊤} . \end{matrix}

Denote

X_{α} (t) = V {(E_{α} (- t^{α} Λ))}^{1 / 2}

and by

x_{α, t}^{⊤} (v)

the v-th row of X. Thus,

{(E_{α} (- t^{α} L))}_{v w} = x_{α, t}^{⊤} (v) x_{α, t} (w) .

Then,

v \mapsto x_{α, t}^{⊤} (v)

is an embedding of vertices in

R^{n}

. Therefore

D_{α, t} (v, w) = {∥ x_{α, t} (v) - x_{α, t} (w) ∥}_{2}^{2},

so

D_{α, t} (v, w)

is a square Euclidean distance between the vertices v and w. □

Hereafter, we call

D_{α, t} (v, w)

the (squared) subdiffusive distance between the corresponding vertices in the graph when

α < 1

. Notice that for

α = 1

, the quantity

D_{1, t} (v, w)

is the diffusion distance defined by Coifman et al. [58,59]. These distances are part of a large family of diffusion-like distances on graphs [60,61].
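Definition (25) and the embedding of Proposition 6 can be checked numerically. The sketch below is restricted to α = 1/2, where E_{1/2}(-s) = e^{s²} erfc(s) gives the spectral function in closed form; the small test graph, the time t = 1.5, and the variable names are illustrative assumptions.

```python
import math
import numpy as np

def ml_half(s):
    """E_{1/2}(-s) = exp(s^2) * erfc(s) for s >= 0 (closed form, alpha = 1/2 only)."""
    return np.array([math.exp(x * x) * math.erfc(x) for x in s])

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3), (3, 4)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
lam, V = np.linalg.eigh(L)
lam = np.clip(lam, 0.0, None)  # remove tiny negative round-off

t, alpha = 1.5, 0.5
g = ml_half(t**alpha * lam)    # E_alpha(-t^alpha lambda_k) >= 0
K = (V * g) @ V.T              # E_alpha(-t^alpha L) = V diag(g) V^T
d = np.diag(K)
D = d[:, None] + d[None, :] - 2.0 * K   # squared subdiffusive distance, Eq. (25)

X = V * np.sqrt(g)             # rows are the embedded vertices x_v
D_embed = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
print(np.max(np.abs(D - D_embed)))
```

For other values of α, the closed form would have to be replaced by a general Mittag-Leffler evaluator or by the SOE approximation.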

5.2. Subdiffusive Shortest Paths

The subdiffusive distance represents how easy it is to transmit mass between two vertices, regardless of whether they are adjacent or not. However, this distance does not directly determine which pathways are utilized for this transmission. Heuristically, paths should traverse the most efficient edges, although it is not immediately obvious how to identify them. To determine these paths in both the standard and subdiffusive regimes, we define the following weighted graph.

Definition 2.

Let

\sqrt{D_{α, t}}

be the matrix of (subdiffusive) distances, understood as the entry-wise square root of

D_{α, t}

. Define a weighted graph whose adjacency matrix is the Hadamard product

W = A ⊙ \sqrt{D_{α, t}}

, which assigns to each edge

(i, j)

the cost induced by the discrepancy of their diffusion profiles at time t. That is,

W_{α, t} (v, w) = \{\begin{matrix} \sqrt{D_{α, t} (v, w)}, & (v, w) \in E, \\ 0, & (v, w) \notin E . \end{matrix}

The resulting weighted graph

\tilde{G}

is the geometrization of the original graph

G .

The weighted graph

\tilde{G}

can be seen as a representation of G as a one-dimensional complex, i.e., a metric length space where each edge

e = (v, w) \in E

is a compact one-dimensional manifold with boundary

\partial e = {v, w}

, parameterized as

{\tilde{e}}_{α, t} (v, w) = [0, W_{α, t} (v, w)]

(see [62,63]).

The resulting shortest paths (computed via Dijkstra’s algorithm) identify chains of vertices whose subdiffusion states remain maximally coherent at time t. We call these paths the subdiffusive shortest paths. These are not, in general, the paths that require the fewest edges of A, which we call topological shortest paths or geodesic shortest paths. They generalize the shortest communicability paths introduced in [64], which were defined using the matrix function

f (A) = exp (β A)

.
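The construction of Definition 2 followed by Dijkstra's algorithm can be sketched as follows. For simplicity the example again assumes α = 1/2 (closed form e^{s²} erfc(s)) and uses a 7-cycle as test graph; the function names are hypothetical and the code is a minimal illustration rather than the implementation used for the figures.

```python
import heapq
import math
import numpy as np

def subdiffusive_weights(A, t):
    """W = A (Hadamard) sqrt(D_{1/2,t}), using E_{1/2}(-s) = exp(s^2) erfc(s)."""
    L = np.diag(A.sum(axis=1)) - A
    lam, V = np.linalg.eigh(L)
    s = math.sqrt(t) * np.clip(lam, 0.0, None)
    g = np.array([math.exp(x * x) * math.erfc(x) for x in s])
    K = (V * g) @ V.T
    d = np.diag(K)
    D = np.clip(d[:, None] + d[None, :] - 2.0 * K, 0.0, None)
    return A * np.sqrt(D)

def dijkstra_path(W, src, dst):
    """Shortest path from src to dst; nonzero entries of W are edge lengths."""
    n = W.shape[0]
    dist = [math.inf] * n
    prev = [None] * n
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue  # stale heap entry
        for v in np.nonzero(W[u])[0]:
            nd = du + W[u, v]
            if nd < dist[v]:
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, int(v)))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

n = 7
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
W = subdiffusive_weights(A, t=1e-3)
path = dijkstra_path(W, 0, 3)
print(path)
```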

Consider the subdiffusive shortest paths on

W (t)

as time varies. When t is close to zero, the effects of diffusion are minimal. That is, each particle has explored only a small part of the graph. In this limit, the weighted metric on the graph exhibits the same behavior as the standard metric, and topological shortest paths are recovered, as the following two results show.

Theorem 2

(Shortest-path dominance for the fractional heat kernel as

t \to 0

). Let

G = (V, E)

be a simple undirected graph with combinatorial Laplacian

L = D - A

, and let

0 < α < 1

. Denote by

d (v, w)

the graph distance between vertices

v \neq w

. Then, for every

v \neq w

,

{(E_{α} (- t^{α} L))}_{v w} = \frac{{(- t^{α})}^{d (v, w)}}{Γ (α d (v, w) + 1)} {(L^{d (v, w)})}_{v w} + O (t^{α (d (v, w) + 1)}), t ↓ 0 .

In particular,

{(E_{α} (- t^{α} L))}_{v w} = O (t^{α d (v, w)}),

and the leading-order contribution is determined by topological shortest paths between v and w. Moreover,

{(E_{α} (- t^{α} L))}_{v w} = \frac{t^{α d (v, w)}}{Γ (α d (v, w) + 1)} {(A^{d (v, w)})}_{v w} + O (t^{α (d (v, w) + 1)}),

where

{(A^{d (v, w)})}_{v w}

is the number of shortest walks of length

d (v, w)

from v to w.

Proof.

The Mittag-Leffler function admits the operator series representation

E_{α} (- t^{α} L) = \sum_{m = 0}^{\infty} \frac{{(- t^{α})}^{m}}{Γ (α m + 1)} L^{m},

which converges absolutely for all

t \geq 0

since L is bounded (see, e.g., [5]).

Taking the

(v, w)

entry yields

{(E_{α} (- t^{α} L))}_{v w} = \sum_{m = 0}^{\infty} \frac{{(- t^{α})}^{m}}{Γ (α m + 1)} {(L^{m})}_{v w} .

As in the classical heat-kernel case, write

L = D - A

. Any term contributing to

{(L^{m})}_{v w}

corresponds to a product of m factors, each equal to either D or

- A

. Diagonal factors D do not change the vertex index, while each factor A corresponds to a single edge traversal. Hence, in order for

{(L^{m})}_{v w}

to be nonzero, the word must contain at least

d (v, w)

factors of A. Consequently,

{(L^{m})}_{v w} = 0 for all m < d (v, w) .

For

m = d (v, w)

, the only contributing word is

{(- A)}^{d (v, w)}

, since any appearance of D would prevent reaching w in exactly

d (v, w)

steps. Therefore,

{(L^{d (v, w)})}_{v w} = {(- 1)}^{d (v, w)} {(A^{d (v, w)})}_{v w} .

Substituting into the series above, the first nonzero term occurs at

m = d (v, w)

, which proves the stated expansion. □
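The leading-order term of Theorem 2 can be verified with the operator series used in the proof. The sketch below truncates the series at 40 terms (an arbitrary choice that is adequate only when t^α ∥L∥ is small) and compares the (v, w) entries on a path graph with four vertices; the values α = 0.7 and t = 10^{-4} are illustrative.

```python
import math
import numpy as np

def ml_series(L, t, alpha, terms=40):
    """Truncated series sum_m (-t^alpha)^m / Gamma(alpha m + 1) L^m."""
    out = np.zeros_like(L)
    P = np.eye(L.shape[0])  # holds L^m
    for m in range(terms):
        out += ((-t**alpha) ** m / math.gamma(alpha * m + 1.0)) * P
        P = P @ L
    return out

# Path graph 0 - 1 - 2 - 3.
A = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)
L = np.diag(A.sum(axis=1)) - A

alpha, t = 0.7, 1e-4
K = ml_series(L, t, alpha)

ratios = []
for v, w, dist in [(0, 2, 2), (0, 3, 3)]:
    walks = np.linalg.matrix_power(A, dist)[v, w]  # number of shortest walks
    lead = t ** (alpha * dist) / math.gamma(alpha * dist + 1.0) * walks
    ratios.append(K[v, w] / lead)
print(ratios)  # both ratios should be close to 1 for small t
```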

Theorem 3

(Short-time selection of shortest paths). Let G be a finite connected graph and let

X (t)

be the time-fractional continuous-time random walk associated with the Caputo fractional diffusion equation on G (with

0 < α < 1

). Let

N (t)

be the number of jumps of

X (\cdot)

up to time t, and let

d (v, w)

be the graph distance between distinct vertices

v \neq w

. Then, for every

v \neq w

,

lim_{t ↓ 0} P (N (t) = d (v, w) | X (t) = w, X (0) = v) = 1 .

That is, conditioned on arrival at w at very short times, the process selects a topologically shortest path with probability tending to 1.

Proof.

Write

d = d (v, w)

. Let

Y_{m}

be the embedded discrete-time jump chain of

X (t)

. The fractional CTRW representation yields

P (X (t) = w ∣ X (0) = v) = \sum_{m = 0}^{\infty} P (N (t) = m) P (Y_{m} = w ∣ Y_{0} = v),

see [65,66]. Let P be the one-step transition matrix of

Y_{m}

, so that

P (Y_{m} = w ∣ Y_{0} = v) = {(P^{m})}_{v w}

. By definition of graph distance,

{(P^{m})}_{v w} = 0

for all

m < d

. Hence,

P (X (t) = w ∣ X (0) = v) = \sum_{m = d}^{\infty} P (N (t) = m) {(P^{m})}_{v w} .

For the time-fractional walk, the jump-count distribution satisfies the short-time scaling

P (N (t) = m) = \frac{t^{α m}}{Γ (α m + 1)} + O (t^{α (m + 1)}), t ↓ 0,

see [66]. Therefore, the leading contribution comes from

m = d

, and we obtain

P (X (t) = w ∣ X (0) = v) = \frac{t^{α d}}{Γ (α d + 1)} {(P^{d})}_{v w} + O (t^{α (d + 1)}) .

Similarly,

P (N (t) = d, X (t) = w ∣ X (0) = v) = \frac{t^{α d}}{Γ (α d + 1)} {(P^{d})}_{v w} + O (t^{α (d + 1)}) .

Dividing the two expansions yields

P (N (t) = d | X (t) = w, X (0) = v) = 1 + O (t^{α}),

which proves the claim.

Finally, on the event

{N (t) = d, X (t) = w, X (0) = v}

, the jump sequence

(Y_{0}, \dots, Y_{d})

has length d and connects v to w; hence, it is a topological shortest path. □

Remark 4

(Physical interpretation of short-time path selection). The previous results (Theorems 2 and 3) show that, for both classical and time-fractional diffusion on graphs, the short-time behavior is governed by topological constraints rather than by long-time transport mechanisms. In the fractional case, memory effects and heavy-tailed waiting times manifest themselves through the slower time scaling

t^{α d (i, j)}

of transition probabilities; however, they do not alter the mechanism by which mass first propagates across the graph. At very short times, the process has insufficient opportunity to perform redundant or backtracking moves, and any realization that reaches a vertex j from i must therefore use the minimal number of jumps permitted by the graph distance. Thus, topologically shortest paths dominate not because they are energetically or entropically preferred, but because they are the only dynamically admissible routes in the short-time regime. Memory affects when such paths become observable, but not which paths contribute to the leading-order behavior. This provides a precise mathematical explanation for the observed agreement between diffusion-based distances and graph distances at very small times, even in the presence of anomalous (fractional) temporal dynamics.

Using Theorem 2 for two adjacent vertices

v, w

, we can compute the first-order term of the subdiffusive distance for

t \to 0

\begin{matrix} D_{α, t} (v, w) & = E_{α} {(- t^{α} L)}_{v v} + E_{α} {(- t^{α} L)}_{w w} - 2 E_{α} {(- t^{α} L)}_{v w} \\ & = 2 - \frac{t^{α}}{Γ (α + 1)} (L_{v v} + L_{w w} - 2 L_{v w}) + O (t^{2 α}) \\ & = 2 - \frac{t^{α}}{Γ (α + 1)} (d_{v} + d_{w} + 2) + O (t^{2 α}) . \end{matrix}

(26)

Note that usually there are multiple topologically shortest paths between the same pair of vertices. Theorem 4 describes which of these paths coincide with the subdiffusive shortest path as

t \to 0

. We need the following definition: for an edge

(v, w)

, its edge degree is

δ_{(v, w)} = d_{v} + d_{w} - 2 .

(27)

Theorem 4.

Let

G = (V, E)

be a simple, finite, undirected graph and let

v \neq w

be two vertices. Let

P = (v_{0}, \dots, v_{l})

be the subdiffusive shortest path according to the metric

D_{α, t}

between

v = v_{0}

and

w = v_{l}

, passing through l edges. Then, in the limit

t \to 0

, P is a topological shortest path between v and w that maximizes the sum of the edge degrees

\sum_{k = 0}^{l - 1} δ_{(v_{k}, v_{k + 1})} = d_{v} + 2 \sum_{k = 1}^{l - 1} d_{v_{k}} + d_{w} - 2 l .

(28)

Proof.

The length of the edge

(v_{k}, v_{k + 1})

is the square root of the squared subdiffusive distance

D_{α, t} (v_{k}, v_{k + 1})

. In the limit

t \to 0

, from (26) we obtain

W_{α, t} (v_{k}, v_{k + 1}) = \sqrt{2} - \frac{\sqrt{2}}{4} \frac{t^{α}}{Γ (α + 1)} (δ_{(v_{k}, v_{k + 1})} + 4) + O (t^{2 α}) .

(29)

Therefore, the length of P is the sum of the lengths of all its constituent edges

W_{α, t} (P) = \sqrt{2} l (1 - \frac{t^{α}}{Γ (α + 1)}) - \frac{\sqrt{2}}{4} \frac{t^{α}}{Γ (α + 1)} \sum_{k = 0}^{l - 1} δ_{(v_{k}, v_{k + 1})} + O (t^{2 α}) .

The first term depends only on l, which in the limit

t \to 0

is the length of (any) topologically shortest path between v and w (Theorems 2 and 3). Therefore, P is the subdiffusive shortest path if and only if it minimizes the second term, or equivalently, if P has the maximum sum of the edge degrees

\sum_{k = 0}^{l - 1} δ_{(v_{k}, v_{k + 1})}

. □
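The selection rule of Theorem 4 is purely combinatorial in the limit, so the limiting path can be found without any matrix function: among all topological shortest paths, pick one with the largest edge-degree sum (28). The sketch below does this with a breadth-first search followed by a dynamic programme over the shortest-path DAG; the function name and the adjacency-list input format are assumptions made for illustration, and w is assumed reachable from v.

```python
from collections import deque

def best_shortest_path(adj, v, w):
    """Among hop-shortest v-w paths, return one maximizing the edge-degree sum."""
    deg = {u: len(nb) for u, nb in adj.items()}
    dist = {v: 0}
    order = []
    queue = deque([v])
    while queue:                      # breadth-first search from v
        u = queue.popleft()
        order.append(u)
        for x in adj[u]:
            if x not in dist:
                dist[x] = dist[u] + 1
                queue.append(x)
    score = {v: 0}                    # best edge-degree sum of a shortest path to u
    parent = {v: None}
    for u in order:                   # BFS order: predecessors are final before u
        for x in adj[u]:
            if dist[x] == dist[u] + 1:
                cand = score[u] + deg[u] + deg[x] - 2   # edge degree, Eq. (27)
                if x not in score or cand > score[x]:
                    score[x], parent[x] = cand, u
    path = [w]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1], score[w]

# Two shortest paths from 0 to 3; vertex 2 has two extra pendant neighbours.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3, 4, 5], 3: [1, 2], 4: [2], 5: [2]}
print(best_shortest_path(adj, 0, 3))
```

In this example the route through the degree-4 vertex has edge-degree sum 8, against 4 for the alternative, so it is the one selected.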

5.3. Subdiffusive Shortest Paths on a Geometric Graph

We computationally analyze our previous analytical findings on an example graph. We focus our attention on the family of large-world networks for two reasons. First, large-diameter networks make it easier to differentiate between diffusion and subdiffusion both numerically and visually. Second, in small-diameter networks, both subdiffusive and topological shortest paths comprise very few edges relative to the network’s total size, making them difficult to distinguish.

Specifically, we consider a randomly generated Gabriel graph with

n = 600

vertices and 1156 edges. We use this graph because it is geometric, in the sense that its vertices are embedded in

R^{d}

, which allows for clear visual interpretation, particularly when illustrating paths. Gabriel graphs are defined as follows.

Definition 3.

Let

P \subset R^{d}

be a finite set of points, referred to as generators. The Gabriel graph

G_{G} = (V_{G}, E_{G})

associated with P has vertex set

V_{G} = P

. Two distinct vertices

v, w \in P

are connected by an edge

{v, w} \in E_{G}

if and only if the closed ball having the segment

[v w]

as its diameter contains no other points of P.

In this work, we restrict our attention to the planar case

d = 2

and embed the points in a rectangle with a length-to-width ratio of 2:1. We geometrize the Gabriel graph using both the subdiffusive communicability distance

D_{α, t} (v, w)

and its sum-of-exponentials (SOE) approximation. Specifically, we use

F_{J} (t, L) = \sum_{j = 1}^{J} a_{j} e^{- t^{α} b_{j} L},

and introduce the corresponding approximate squared distance

{\tilde{D}}_{J, α, t} (v, w) = {(F_{J} (t, L))}_{v v} + {(F_{J} (t, L))}_{w w} - 2 {(F_{J} (t, L))}_{v w} .

(30)

Throughout this section, we study a low-memory regime with

α = 0.85

. Using the square roots of both

D_{α, t} (v, w)

and

{\tilde{D}}_{J, α, t} (v, w)

, we construct the weighted graph as in Definition 2 and compute the subdiffusive shortest paths.

Experimental Results

We begin by reporting the topological shortest paths (TSPs) between two vertices located near opposite corners of the Gabriel graph (see Figure 3a). As is typical—even for planar graphs such as Gabriel graphs—there exist multiple TSPs between a given pair of vertices. These TSPs are colored according to the average edge degree along the path (see Equation (27)).

We next consider the shortest paths induced by

{\tilde{D}}_{J, α, t} (v, w)

for

J = 1

(see Figure 3b). This case corresponds to a standard diffusive process, since it involves a single exponential term, which is the solution of the classical diffusion equation. We observe that, while at early times the diffusive paths coincide with the TSP as predicted by Theorems 2 and 3, they progressively deviate as time increases.

As the number of exponentials J in the SOE approximation increases, the corresponding shortest paths converge toward the TSP, as illustrated in panels (c)–(e) of Figure 3. The exact solution based on the Mittag-Leffler function is shown in Figure 3f, and clearly demonstrates that the subdiffusive shortest paths coincide with the TSPs. Moreover, the limiting paths correspond to those TSPs with the largest average edge degree, in agreement with the analytical results derived in Theorem 4.

The time evolution of the vector probing errors,

relerr (u_{0} (t))

and

masserr (u_{0} (t))

, is displayed in Figure 2A. The mass error is at the level of machine precision, confirming that mass conservation of the SOE approximation is numerically guaranteed. The relative error is zero at

t = 0

(since the two initial conditions coincide); as t grows, the error increases and then remains above a certain threshold, which indicates that the two solutions

u^{★} (t)

and

u^{SOE} (t)

are meaningfully distinct. Figure 2B illustrates how the time averages of the operator and vector probing errors decay as J increases. Both errors initially decrease before stagnating around

10^{- 5}

and

10^{- 7}

, respectively. The mass error remains on the order of machine precision and is therefore omitted.

5.4. Physical Interpretation: Memory Reinforcement of Past States

It is clear that, from a mathematical point of view, increasing J improves the accuracy of the SOE approximation to the Mittag-Leffler function, leading to convergence of the corresponding shortest paths. To interpret this convergence from a physical perspective, recall that the SOE represents a superposition of diffusive processes. Let

b_{p} = \max {b_{j} : 1 \leq j \leq J}

. Then we can rewrite the SOE operator as

F_{J} (t, L) = \sum_{j = 1}^{J} a_{j} e^{- b_{j} t^{α} L} = \sum_{j = 1}^{J} a_{j} e^{- b_{p} \frac{b_{j}}{b_{p}} t^{α} L} = \sum_{j = 1}^{J} a_{j} e^{- b_{p} t_{j}^{α} L},

(31)

where the effective time instants are given by

t_{j} = {(\frac{b_{j}}{b_{p}})}^{\frac{1}{α}} t

.
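The rescaling in (31) amounts to a one-line computation. In the sketch below the three rates are the rounded leading values quoted earlier for α = 0.8, used purely as an illustration (b_p is therefore the maximum over these three values only, not over a full set of J nodes), and t = 10 is an arbitrary choice.

```python
import numpy as np

alpha, t = 0.8, 10.0
b = np.array([1.52, 1.19, 0.92])  # illustrative rates b_j (rounded)
b_p = b.max()                     # fastest rate among the listed terms

# Effective past instants t_j = (b_j / b_p)^(1/alpha) * t, so t_j <= t.
t_eff = (b / b_p) ** (1.0 / alpha) * t
print(t_eff)

# Consistency with (31): b_p * t_j^alpha equals b_j * t^alpha term by term.
print(np.allclose(b_p * t_eff**alpha, b * t**alpha))
```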

In the example graph, the maximum diffusion speed is

b_{p} = b_{2} = 1.53

(see Table 1). Each term

e^{- b_{p} t_{j}^{α} L}

therefore represents a diffusion process with speed

b_{p}

, evaluated at a slowed time

t_{j}^{α}

. Consequently, subdiffusion can be interpreted not only as a superposition of diffusion processes with different speeds, but also as the sampling of a single diffusion process at multiple past time instants.

This behavior reflects the intrinsic memory property of subdiffusion. Unlike classical diffusion, whose evolution depends solely on the current state, subdiffusive dynamics depend on the entire history of the process. As shown in Figure 3, increasing J incorporates a larger number of past states, allowing the process to remember the shortest paths taken in its early-time behavior. In this sense, both the Caputo fractional derivative and subdiffusion can be viewed as time-averaged processes over the system’s past evolution. We propose a memory-based interpretation for the convergence of the subdiffusive shortest paths to the TSP shown in Figure 3: just as the SOE approximation (31) at time t recalls past time instants of the diffusive process, the shortest paths in the SOE metric at time t are influenced by the shortest paths at earlier times

τ < t

, which are strongly linked to the TSP as

τ \to 0

.

The fractional order

α

controls the strength of this memory effect. As

α \to 1

, the fractional diffusion equation reduces to the classical diffusion equation, and memory effects vanish. Conversely, as

α \to 0

, greater weight is assigned to earlier times. This behavior is illustrated in Figure 4, where we plot the dissimilarities of the subdiffusive shortest paths with respect to the topological shortest path for different values of

α

and J. The dissimilarities are computed with the Levenshtein distance (also known as edit distance), which counts the minimum number of edits (insertions, deletions, or substitutions) needed to transform the edge sequence of one path into the other. We observe that smaller values of

α

require fewer SOE terms to recover the Mittag-Leffler shortest paths. For instance, when

J = 20

, only two distinct paths are observed for

α = 0.25

, whereas multiple paths persist for

α = 0.85

. This is the reason for choosing

α = 0.85

in the experiments: it is far enough from 1 to allow memory effects, but it is not small enough for memory to completely suppress the discovery of alternative paths.
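For reference, the Levenshtein distance between two paths can be computed with the standard dynamic programme applied to their edge sequences. The sketch below is a generic implementation; representing each undirected edge as a `frozenset` is an assumption made here so that orientation does not matter.

```python
def levenshtein(p, q):
    """Minimum number of insertions, deletions, or substitutions turning p into q."""
    prev = list(range(len(q) + 1))
    for i, x in enumerate(p, start=1):
        cur = [i]
        for j, y in enumerate(q, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def edge_sequence(path):
    """Turn a vertex path into its sequence of undirected edges."""
    return [frozenset(e) for e in zip(path, path[1:])]

p = edge_sequence([0, 1, 2, 3])
q = edge_sequence([0, 4, 2, 3])
print(levenshtein(p, q))
```

The two example paths share only their last edge, so two substitutions are needed.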

The contrast between the diffusive (Figure 3b) and subdiffusive (Figure 3f) behaviors is striking. When memory is absent, the diffusion-based geometrization explores the graph, as the multiple possible paths show. Introducing memory allows the process to recall the topologically shortest path and gravitate towards it. Memory is strengthened either by lowering the parameter

α

, or by considering more terms (i.e., by increasing J), as each term recalls a previous time instant. Stronger memory means that the Levenshtein distance to the TSP is smaller, as shown in Figure 4.

6. Caputo Fractional Derivative and Memory

We have seen that memory dictates the behavior of subdiffusion and the underlying fractional-time differential equation. In this section, we analyze the memory contributions present in the Caputo derivative, distinguishing between the influence of the remote past and more recent times. We show an underlying connection between these effects and the convexity of the solution to the subdiffusion equation.

6.1. Remote Versus Recent Memory in the Caputo Fractional Derivative

Definition 4.

Consider the time-fractional Caputo derivative of a function

x (t)

, as defined in (3). Divide the interval

[0, t]

into two equal halves, and consider the contribution from each subinterval:

Remote past:

$R (α) = \frac{1}{Γ (1 - α)} \int_{0}^{t / 2} \frac{x^{'} (τ)}{{(t - τ)}^{α}} d τ;$
Recent past (which also includes the ‘present’ at time $τ = t$ ):

$P (α) = \frac{1}{Γ (1 - α)} \int_{t / 2}^{t} \frac{x^{'} (τ)}{{(t - τ)}^{α}} d τ .$

When

x (t)

is the solution of the fractional diffusion Equation (1), we consider

x_{i} (t)

, the i-th component of the vector

x (t)

. Denote by

R^{(i)} (α)

and

P^{(i)} (α)

its ‘remote past’ and ‘recent past’ contributions, respectively.

Definition 5.

For a fixed vertex i and fixed

t > 0

, we say that vertex i recalls its remote past more strongly than its recent past on

[0, t]

if

R^{(i)} (α) > P^{(i)} (α),

and we say that i recalls its recent past more strongly than its remote past if

R^{(i)} (α) < P^{(i)} (α) .

A vertex may exhibit both behaviors, depending on whether the derivative

x_{i}^{'} (t)

is increasing or decreasing, as shown in the following results.

Theorem 5

(Memory Bias). Let

x (t) = {(x_{1} (t), \dots, x_{n} (t))}^{⊤}

be the solution of the fractional diffusion Equation (1). Fix a vertex

i \in V

and a time

t > 0

.

If the derivative $x_{i}^{'} (τ)$ is non-decreasing for $τ \in [0, t]$ , then for any $α \in [0, 1)$

$R^{(i)} (α) \leq P^{(i)} (α),$

with strict inequality if $α \in (0, 1)$ , or if $α = 0$ and $x_{i}^{'} (τ)$ is strictly increasing on $[0, t]$ . In this case, vertex i recalls its recent past more strongly than its remote past on $[0, t]$ .
If the derivative $x_{i}^{'} (τ)$ is non-increasing for $τ \in [0, t]$ , then in the limit $α \to 0$

$R^{(i)} (0) \geq P^{(i)} (0),$

with strict inequality if $x_{i}^{'} (τ)$ is strictly decreasing on $[0, t]$ . In this case, vertex i recalls its remote past more strongly than its recent past on $[0, t]$ .

Proof.

The proof is purely scalar and uses only the behavior of

x_{i}^{'} (t)

on the interval

[0, t]

; the graph structure enters only through the fact that

x_{i}^{'} (t)

arises from the fractional diffusion equation.

For the first part, by assumption

x_{i}^{'} (τ)

is non-decreasing on

[0, t]

, therefore

\frac{x_{i}^{'} (τ)}{{(t - τ)}^{α}} \leq \frac{x_{i}^{'} (τ + \frac{t}{2})}{{(t - \frac{t}{2} - τ)}^{α}}, for τ \in [0, t / 2] .

For

α \in (0, 1)

, this inequality is strict, while for

α = 0

, it is strict only if

x_{i}^{'} (τ)

is strictly increasing. By integrating over the interval

[0, t / 2]

and performing a change of variables

τ^{'} = τ + \frac{t}{2}

, we obtain

\begin{matrix} R^{(i)} (α) & = \frac{1}{Γ (1 - α)} \int_{0}^{t / 2} \frac{x_{i}^{'} (τ)}{{(t - τ)}^{α}} d τ \\ & \leq \frac{1}{Γ (1 - α)} \int_{0}^{t / 2} \frac{x_{i}^{'} (τ + \frac{t}{2})}{{(t - \frac{t}{2} - τ)}^{α}} d τ \\ & = \frac{1}{Γ (1 - α)} \int_{t / 2}^{t} \frac{x_{i}^{'} (τ^{'})}{{(t - τ^{'})}^{α}} d τ^{'} = P^{(i)} (α) . \end{matrix}

For the second part, in the limit

α \to 0

, the weight of the multiplying kernel becomes

{lim}_{α \to 0} \frac{1}{{(t - τ)}^{α}} = 1

. Thus, for

α = 0

, the remote and recent past contributions are, respectively,

R^{(i)} (0) = \int_{0}^{t / 2} x_{i}^{'} (τ) d τ, and P^{(i)} (0) = \int_{t / 2}^{t} x_{i}^{'} (τ) d τ .

By the assumption that

x_{i}^{'} (τ)

is non-increasing on

[0, t]

, we have $x_{i}^{'} (τ) \geq x_{i}^{'} (τ + t / 2)$ for every $τ \in [0, t / 2]$ ; integrating over $[0, t / 2]$ and shifting the variable in the second integral, we obtain

R^{(i)} (0) \geq P^{(i)} (0) .

The inequality is strict if

x_{i}^{'} (τ)

is strictly decreasing. □
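The two contributions of Definition 4 can be evaluated numerically with a short sketch such as the one below. It assumes a scalar derivative supplied as a Python callable and removes the endpoint singularity of the kernel with the substitution $s = (t - τ)^{1 - α}$ before applying Gauss-Legendre quadrature; the function name `caputo_halves` and the two test derivatives are ours. Both test derivatives are nonnegative on $[0, t]$, which is a choice made for this illustration only.

```python
import math
import numpy as np


def caputo_halves(dx, t, alpha, n_nodes=64):
    """Return (R, P): remote- and recent-past parts of the Caputo integral on [0, t]."""
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    beta = 1.0 - alpha

    def piece(lo, hi):
        # With s = (t - tau)**beta, the weight (t - tau)**(-alpha) d tau becomes ds / beta.
        s_lo, s_hi = (t - hi) ** beta, (t - lo) ** beta
        s = 0.5 * (s_hi - s_lo) * nodes + 0.5 * (s_hi + s_lo)
        tau = t - s ** (1.0 / beta)
        integral = 0.5 * (s_hi - s_lo) * np.sum(weights * dx(tau))
        return integral / (beta * math.gamma(beta))

    return piece(0.0, t / 2), piece(t / 2, t)


# x'(tau) = tau is increasing: part 1 of Theorem 5, here with alpha = 1/2.
R_inc, P_inc = caputo_halves(lambda tau: tau, t=1.0, alpha=0.5)
# x'(tau) = 1 - tau is decreasing: part 2 of Theorem 5, with alpha = 0.
R_dec, P_dec = caputo_halves(lambda tau: 1.0 - tau, t=1.0, alpha=0.0)
```

In the first case the recent half dominates, and in the second case the remote half dominates, in line with the two parts of the theorem.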

Corollary 2.

Let

x (t)

be as in Theorem 5, and fix a vertex

i \in V

. Let

[t_{1}, t_{2}]

and

[t_{3}, t_{4}]

be two disjoint time intervals with

0 \leq t_{1} < t_{2} \leq t_{3} < t_{4}

. Define the corresponding first-half (remote) and second-half (recent) contributions for

α = 0

:

\begin{matrix} R^{(i, 1)} (0) & : = \frac{1}{Γ (1 - α)} \int_{t_{1}}^{\frac{t_{1} + t_{2}}{2}} \frac{x_{i}^{'} (τ)}{{(t - τ)}^{α}} d τ, & P^{(i, 2)} (0) & : = \frac{1}{Γ (1 - α)} \int_{\frac{t_{1} + t_{2}}{2}}^{t_{2}} \frac{x_{i}^{'} (τ)}{{(t - τ)}^{α}} d τ, \\ R^{(i, 3)} (0) & : = \frac{1}{Γ (1 - α)} \int_{t_{3}}^{\frac{t_{3} + t_{4}}{2}} \frac{x_{i}^{'} (τ)}{{(t - τ)}^{α}} d τ, & P^{(i, 4)} (0) & : = \frac{1}{Γ (1 - α)} \int_{\frac{t_{3} + t_{4}}{2}}^{t_{4}} \frac{x_{i}^{'} (τ)}{{(t - τ)}^{α}} d τ . \end{matrix}

Assume that

$x_{i}^{'} (t)$ is strictly decreasing on $[t_{1}, t_{2}]$ ;
$x_{i}^{'} (t)$ is strictly increasing on $[t_{3}, t_{4}]$ .

Then

R^{(i, 1)} (0) > P^{(i, 2)} (0), and R^{(i, 3)} (0) < P^{(i, 4)} (0) .

In particular, on

[t_{1}, t_{2}]

, vertex i recalls its more distant past more strongly than its more recent past, whereas on

[t_{3}, t_{4}]

the bias is reversed.

Proof.

Define

z_{1} (s) = x_{i} (t_{1} + s)

on

[0, t_{2} - t_{1}]

and

z_{2} (s) = x_{i} (t_{3} + s)

on

[0, t_{4} - t_{3}]

. Then

z_{1}^{'}

is strictly decreasing and

z_{2}^{'}

is strictly increasing on their respective domains. Applying Theorem 5 to

z_{1}

and

z_{2}

yields the desired inequalities. □

Corollary 3.

Let

x (t)

be as in Theorem 5, and consider two vertices

i, j \in V

. Fix a time

t > 0

and, for

α = 0

, consider the remote and recent past contributions of the two vertices:

R^{(i)} (0), P^{(i)} (0), R^{(j)} (0), P^{(j)} (0) .

Assume that

$x_{i}^{'} (τ)$ is strictly decreasing on $[0, t]$ ;
$x_{j}^{'} (τ)$ is strictly increasing on $[0, t]$ .

Then

R^{(i)} (0) > P^{(i)} (0), and R^{(j)} (0) < P^{(j)} (0) .

Thus, over the same time interval

[0, t]

and for the same fractional model, vertex i exhibits a remote-past memory bias, whereas vertex j exhibits a recent-past memory bias.

Proof.

Apply Theorem 5 to the scalar functions

τ \mapsto x_{i} (τ)

and

τ \mapsto x_{j} (τ)

on the interval

[0, t]

. □

The recent past term

P (α)

incorporates the contribution of the ‘present’ instant

τ = t

; however, in the integral formulation, the value of the function at a single point does not affect the integral. Thus, one may consider the following discretization.

Definition 6.

For odd k, let the interval

[0, t]

be partitioned into k subintervals

[t_{j}, t_{j + 1}]

of equal length

h = t / k

, where

j = 0, \dots, k - 1

. We define the following specific regions within

[0, t]

:

Remote past: $R (k, α) = h^{1 - α} \sum_{j = 0}^{(k - 1) / 2} c_{j} (k, α) x^{'} (t_{j});$
Recent past: $P (k, α) = h^{1 - α} \sum_{j = (k - 1) / 2}^{k - 1} c_{j} (k, α) x^{'} (t_{j});$
Present: $P_{p r} (k, α) = h^{1 - α} c_{k} (k, α) x^{'} (t);$

where the coefficients

c_{j} (k, α)

are appropriately determined by the trapezoidal rule.

The accuracy of this quadrature rule for

h \to 0

was shown by Odibat [67].

Theorem 6.

The fractional Caputo derivative

D_{t}^{α} x (t)

can be expressed as

\begin{matrix} D_{t}^{α} x (t) & = R (k, α) + P (k, α) + P_{p r} (k, α) + E_{C} (f, h, α) \end{matrix},

(32)

where

E_{C} (f, h, α) = O (h^{2})

is the error term.

In the limit

α \to 1

, the contributions of

P (k, α)

and

R (k, α)

vanish; the only surviving contribution is

P_{p r} (k, α)

, which reduces to

x^{'} (t)

.

6.2. Physical Interpretation: Memory Regimes on a Graph

The Memory Bias Theorem shows that, within the Caputo-driven fractional diffusion dynamics on a graph for

α \to 0

, the relative importance of the “remote” versus “recent” parts of the past is not uniform across the network. Instead, it depends sensitively on the local temporal curvature of the solution at each vertex. That is, the sign of the second derivative

x_{i}^{″} (t)

determines whether

x_{i}^{'} (t)

is increasing or decreasing.

If the temporal derivative

x_{i}^{'} (t)

is increasing within a given window, the trajectory

x_{i} (t)

is bending upwards, and older information within that window receives a smaller weight than more recent information. In this regime, vertex i is said to exhibit a recent–past memory bias. Our results show that the recent past dominates the remote past for any

α \in [0, 1)

.

Conversely, if

x_{i}^{'} (t)

is decreasing on a given window, the trajectory is bending downwards, and the older information receives a larger weight than the more recent information. In the

α \to 0

limit, the vertex exhibits a remote–past memory bias.

A striking consequence is that different vertices on the same graph may simultaneously reside in opposite memory regimes, even though they are driven by the same fractional dynamics and are evaluated over the same time interval. Similarly, the same vertex may switch memory regimes over time, depending on the evolution of its temporal curvature. This expresses a fundamental “heterogeneity of memory” in fractional diffusion on graphs.

In summary, the fractional order determines how much of the past is remembered globally, but the shape of the temporal evolution at each vertex dictates which part of the past (remote versus recent) is preferentially recalled. This generates rich, spatially distributed memory patterns that reflect both the graph geometry and the initial configuration.

7. Emergence of Memory from Fractional Diffusion on a Graph

Having seen how convexity influences the recall of the early or late past, we now study how convexity can emerge in a network, depending on the initial mass distribution of the process. The strong influence of memory is also shown through the underlying random clock process.

7.1. Local Convexity and Concavity in Caputo Fractional Diffusion: A Mittag-Leffler and SOE-Based Analysis

We consider the Caputo fractional diffusion equation on a finite graph G with combinatorial Laplacian

L = D - A

:

D_{t}^{α} x (t) = - L x (t), x (0) = e_{v},

where

0 < α < 1

and

e_{v}

is the vth standard basis vector. The mild solution is given by the Mittag-Leffler matrix function

x (t) = E_{α} (- t^{α} L) e_{v} .

We are interested in the curvature of the time evolution at early times, i.e., the sign of

x_{i}^{″} (t)

for sufficiently small

t > 0

. This sign determines whether the temporal trajectory at a vertex is locally convex or concave. We show below that, regardless of the graph, the vertex where the mass is initially placed has an early-time convex and decreasing profile, while every neighbor exhibits an early-time concave and increasing profile. Both statements hold rigorously for every

0 < α < 1

.

Theorem 7

(Local convexity/concavity from Mittag-Leffler and SOE). Let G be a finite graph with Laplacian

L = D - A

. Consider the Caputo fractional diffusion equation

D_{t}^{α} x (t) = - L x (t), 0 < α < 1, x (0) = e_{v} .

Then:

At vertex v, the time evolution $x_{v} (t)$ is strictly decreasing and strictly convex for all $t \in (0, ε_{v})$ , for some $ε_{v} > 0$ .
If w is any neighbor of v, i.e., $A_{w v} > 0$ , then $x_{w} (t)$ is strictly increasing and strictly concave for all $t \in (0, ε_{w})$ , for some $ε_{w} > 0$ .

In particular, there exists

ε > 0

such that for all

t \in (0, ε)

the excited vertex v exhibits a convex decay, while each of its neighbors exhibits a concave growth.

Proof.

We use two complementary arguments: (i) the small-time expansion of the Mittag-Leffler matrix function, and (ii) a sum-of-exponentials (SOE) representation for

E_{α}

.

Step 1: Exact short-time Mittag-Leffler expansion. The matrix Mittag-Leffler function admits the convergent series

E_{α} (- t^{α} L) = I - \frac{t^{α}}{Γ (α + 1)} L + \frac{t^{2 α}}{Γ (2 α + 1)} L^{2} + O (t^{3 α}) .

Applying this to the initial condition

x (t) = E_{α} (- t^{α} L) e_{v}

yields

x (t) = e_{v} - \frac{t^{α}}{Γ (α + 1)} L e_{v} + O (t^{2 α}) .

At the excited vertex v we have

x_{v} (t) = 1 - \frac{deg (v)}{Γ (α + 1)} t^{α} + O (t^{2 α}),

because

{(L e_{v})}_{v} = deg (v)

. Differentiating yields

x_{v}^{'} (t) = - \frac{deg (v)}{Γ (α + 1)} α t^{α - 1} + O (t^{2 α - 1}) < 0,

x_{v}^{″} (t) = - \frac{deg (v)}{Γ (α + 1)} α (α - 1) t^{α - 2} + O (t^{2 α - 2}) > 0,

since

α (α - 1) < 0

for

0 < α < 1

. Note that the exponents

α - 1

,

2 α - 2

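are negative, and for small t the leading terms, of orders $t^{α - 1}$ and $t^{α - 2}$ , dominate the remainders and fix the signs. Thus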

x_{v} (t)

is strictly decreasing and strictly convex on

(0, ε_{v})

.

At a neighbor w of v,

x_{w} (t) = - \frac{t^{α}}{Γ (α + 1)} {(L e_{v})}_{w} + O (t^{2 α}) = \frac{A_{w v}}{Γ (α + 1)} t^{α} + O (t^{2 α}),

because

{(L e_{v})}_{w} = - A_{w v}

for

w \neq v

. Thus

x_{w}^{'} (t) = \frac{A_{w v}}{Γ (α + 1)} α t^{α - 1} + O (t^{2 α - 1}) > 0,

x_{w}^{″} (t) = \frac{A_{w v}}{Γ (α + 1)} α (α - 1) t^{α - 2} + O (t^{2 α - 2}) < 0 .

Hence,

x_{w} (t)

is strictly increasing and strictly concave on

(0, ε_{w})

.

Step 2: Interpretation via SOE approximation. We have developed an SOE scheme to approximate the Mittag-Leffler function by

E_{α} (- λ t^{α}) \approx \sum_{m = 1}^{M} c_{m} e^{- d_{m} t}, c_{m} > 0, d_{m} > 0 .

Applying the same SOE to the matrix L yields

x (t) = E_{α} (- t^{α} L) e_{v} \approx \sum_{m = 1}^{M} c_{m} e^{- d_{m} t L} e_{v} .

Each term

e^{- d_{m} t L} e_{v}

is the solution of a classical diffusion equation with rate

d_{m}

and is therefore strictly convex at the source v and strictly concave at neighbors w at early times. Because all coefficients

c_{m}

are positive, the SOE sum preserves the convexity at v and concavity at its neighbors. This matches exactly the signs obtained in the rigorous Mittag-Leffler expansion above.

Combining Steps 1 and 2 proves (i) and (ii).

Thus, over the same time interval

t \in (0, ε_{v})

and for the same fractional model, vertex v exhibits a remote-past memory bias, i.e., it recalls the past more than the present, whereas vertex w exhibits a recent-past memory bias, i.e., it recalls the present more than the past. □
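The early-time signs in Theorem 7 can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the graph is a path on three vertices with the source at an end vertex, $α = 1 / 2$, and the Mittag-Leffler function is evaluated by a truncated power series, which is adequate only for small arguments. The names `ml_scalar` and `fractional_state` are ours.

```python
import math
import numpy as np


def ml_scalar(z, alpha, terms=60):
    """Truncated series E_alpha(z) = sum_k z**k / Gamma(alpha*k + 1); small |z| only."""
    return sum(z ** k / math.gamma(alpha * k + 1) for k in range(terms))


def fractional_state(L, v, t, alpha):
    """x(t) = E_alpha(-t**alpha L) e_v via the eigendecomposition of the symmetric L."""
    lam, V = np.linalg.eigh(L)
    return V @ (ml_scalar(-lam * t ** alpha, alpha) * V[v, :])


# Path graph 0 - 1 - 2 with unit mass initially at vertex v = 0; w = 1 is its neighbor.
L = np.array([[1.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 1.0]])
alpha, v, w = 0.5, 0, 1
x1, x2, x3 = (fractional_state(L, v, t, alpha) for t in (1e-3, 2e-3, 3e-3))

first_diff = x2 - x1             # sign of x'(t) on the sampled window
second_diff = x1 - 2 * x2 + x3   # sign of x''(t) on the sampled window
```

On this window the source component decreases with a positive second difference, while the neighbor component increases with a negative second difference.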

7.2. Random Time Change Induces Subdiffusion and Memory

We summarize the mechanism by which the random clock

E_{t}

slows down spreading (subdiffusion) and introduces memory into the dynamics on a graph. Under the subordination identity,

u (t) = E_{α} (- t^{α} L) u_{0} = E [e^{- E_{t} L}] u_{0},

the state at physical time t equals the baseline heat state evaluated at the random operational time

s = E_{t}

. For

0 < α < 1

, the inverse

α

-stable clock satisfies

E [E_{t}] = \frac{t^{α}}{Γ (1 + α)}, Var (E_{t}) = \frac{2 t^{2 α}}{Γ (1 + 2 α)} - \frac{t^{2 α}}{Γ {(1 + α)}^{2}},

so the typical amount of diffusion time available by physical time t scales like

t^{α}

(rather than t). Thus, any diffusive spread measure that, at baseline, scales with s (e.g., mean-square displacement in Euclidean space, or mixing surrogates on graphs), will scale like

t^{α}

under the random clock. This is the essence of subdiffusion: slower-than-classical spreading (

t^{α / 2}

instead of

t^{1 / 2}

in continuum settings), and on graphs, a slower homogenization than the exponential-in-t decay of the standard heat semigroup.
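The two moment formulas for the random clock are easy to evaluate directly. The following sketch simply transcribes them; the function names `clock_mean` and `clock_variance` are ours. As a sanity check, setting $α = 1$ in the formulas gives mean t and zero variance, i.e., a deterministic clock.

```python
import math


def clock_mean(t, alpha):
    """E[E_t] = t**alpha / Gamma(1 + alpha)."""
    return t ** alpha / math.gamma(1 + alpha)


def clock_variance(t, alpha):
    """Var(E_t) = 2 t**(2 alpha) / Gamma(1 + 2 alpha) - t**(2 alpha) / Gamma(1 + alpha)**2."""
    return (2 * t ** (2 * alpha) / math.gamma(1 + 2 * alpha)
            - t ** (2 * alpha) / math.gamma(1 + alpha) ** 2)


# Operational time available at physical time t = 100 for alpha = 1/2.
mean_half = clock_mean(100.0, 0.5)
var_half = clock_variance(100.0, 0.5)
```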

In the time-changed walk

Y_{t} = X_{E_{t}}

, the survival function of node i is the tail of the cumulative distribution function of the holding time

T_{i}

, given for

t \geq 0

by:

P (T_{i} > t) = \int_{0}^{\infty} M_{α} (θ) e^{- d_{i} t^{α} θ} d θ = E_{α} (- d_{i} t^{α}),

(33)

with

d_{i} = \sum_{j} A_{i j}

. The Mittag-Leffler tail obeys

E_{α} (- d_{i} t^{α}) \sim \frac{1}{d_{i} Γ (1 - α)} t^{- α}

for

t \to \infty

, which is a power law, in contrast to the exponentially decaying tail of the diffusive case. These rare but very long pauses act as traps that stretch physical time relative to operational time, producing a subdiffusive spread on the graph.

The survival function can also be computed for the SOE approximation. Recall the numerical quadrature from Section 3.1, with non-normalized weights

{\tilde{a}}_{j} = h M_{α} (b_{j}) b_{j}

for

1 \leq j \leq J

. Then, for the operator, we normalize the weights as

a_{j} = {\tilde{a}}_{j} / \sum_{k = 1}^{J} {\tilde{a}}_{k}

, while for scalar quantities, we use the non-normalized weights, leading to the SOE vertex waiting times

S_{i}^{SOE} = \sum_{j = 1}^{J} {\tilde{a}}_{j} e^{- d_{i} t^{α} b_{j}} .

(34)

From (34), we see that the waiting times in the SOE approximation also exhibit a heavy tail dependent upon the degree

d_{i}

; these are modulated by the global internal-time samples

b_{j}

, whose mixtures simultaneously approximate all vertex waiting-time distributions.
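For $α = 1 / 2$, both sides of (33) have closed forms, so the SOE survival function (34) can be compared with the exact Mittag-Leffler tail. The sketch below is a minimal check under several assumptions that are ours: the nodes $b_{j}$ lie on a uniform grid in $\log θ$ with spacing h (which is what the weight $h M_{α} (b_{j}) b_{j}$ suggests), the grid limits are chosen by hand, and we use the known identities $M_{1 / 2} (θ) = e^{- θ^{2} / 4} / \sqrt{π}$ and $E_{1 / 2} (- z) = e^{z^{2}} erfc (z)$. The quadrature of Section 3.1 may differ in its details.

```python
import math
import numpy as np


def m_wright_half(theta):
    """M-Wright function for alpha = 1/2: exp(-theta**2 / 4) / sqrt(pi)."""
    return np.exp(-theta ** 2 / 4) / math.sqrt(math.pi)


def soe_weights(J, u_min=-20.0, u_max=5.0):
    """Nodes b_j = exp(u_j) on a uniform log-grid and weights h * M(b_j) * b_j."""
    u = np.linspace(u_min, u_max, J)
    h = u[1] - u[0]
    b = np.exp(u)
    return h * m_wright_half(b) * b, b


def survival_soe(t, degree, a_tilde, b, alpha=0.5):
    """SOE survival function: sum_j a_tilde_j * exp(-d_i t**alpha b_j)."""
    return float(np.sum(a_tilde * np.exp(-degree * t ** alpha * b)))


def survival_exact_half(t, degree):
    """E_{1/2}(-z) = exp(z**2) * erfc(z) with z = degree * sqrt(t)."""
    z = degree * math.sqrt(t)
    return math.exp(z * z) * math.erfc(z)


a_tilde, b = soe_weights(J=400)
approx = survival_soe(2.0, 3, a_tilde, b)
exact = survival_exact_half(2.0, 3)
```

The same nodes and weights are reused for every degree, which is the sense in which the global internal-time samples approximate all vertex waiting-time distributions at once.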

The Caputo equation

\partial_{t}^{α} u (t) = - L u (t), \partial_{t}^{α} u (t) = \frac{1}{Γ (1 - α)} \int_{0}^{t} {(t - τ)}^{- α} u^{'} (τ) d τ,

is explicitly history dependent: the instantaneous rate

u^{'} (t)

depends on the full past with a power-law kernel

{(t - τ)}^{- α}

. In renewal terms, the holding-time hazard for Mittag-Leffler waiting times is decreasing in age. The longer the process has been waiting, the less likely it is to jump immediately. Hence, the future depends on how long the current wait has lasted (“aging”), which breaks the Markov property in physical time t. Conditioned on

E_{t} = s

, the baseline path is Markov; after averaging over the random clock, the observed dynamics inherit memory.

On the other hand, with

L = V Λ V^{⊤}

, each mode decays as

E_{α} (- t^{α} λ_{k}) instead of e^{- t λ_{k}} .

For

t \to \infty

,

E_{α} (- t^{α} λ_{k}) \sim \frac{1}{λ_{k} Γ (1 - α)} t^{- α}

: algebraic decay replaces exponential decay. Hence, mixing, return probabilities, and any observable built from the heat kernel trace decay more slowly, reflecting both subdiffusion (slower spread) and long memory (long tails). The practical consequences on networks are as follows:

Slower homogenization: Community imbalances, gradients, or initial heterogeneities persist longer (power–law tail).
Trapping in dense/central regions: Large $d_{i}$ increases the attempt rate, but the heavy-tailed clock still produces long residence episodes; dwell-time distributions are broad across nodes.
Non-exponential relaxation: Observables fit Mittag-Leffler or power–law decays rather than exponentials; log–log slopes reveal $α$ .

In summary, the random clock

E_{t}

compresses operational time from t to

t^{α}

on average and introduces power–law waiting between moves; together these yield subdiffusive spreading and memory (aging) in the network dynamics.

8. A Generalized Physico-Mathematical Context of Subdiffusion on Graphs

The fractional-time differential equation that gives rise to subdiffusion is widely known. The SOE, however, is more than a function approximation: we show how a superposition of diffusions can emerge on a graph, which provides a bridge to graph dynamics and places these concepts within the general framework of kernel-based differential equations.

8.1. A Multiplex Diffusion

Let us consider that there are J parallel diffusion processes occurring on the graph G, each with a diffusivity constant

β_{j}

. Therefore, we can consider a representation of the graph as a multiplex graph formed by J layers [68,69], each of them representing the same underlying graph. Every pair of layers

l_{i}

and

l_{j}

is interconnected by means of inter-layer edges connecting each vertex

v \in l_{i}

to its corresponding counterpart in

l_{j}

.

Then, we create the super-Laplacian matrix, which was introduced in [70]:

\mathcal{L} = \oplus_{j = 1}^{J} β_{j} L + ω I \otimes (J - I),

(35)

where ⊗ is the Kronecker product,

ω \in R^{+} \cup \{0\}

is the coupling strength between the layers, J is an all-ones matrix and I is the identity matrix. Then,

\mathcal{L} = (\begin{matrix} β_{1} L & C_{12} & \dots & C_{1 J} \\ C_{21} & β_{2} L & \dots & C_{2 J} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ C_{J 1} & C_{J 2} & \dots & β_{J} L \end{matrix}) .

(36)

Let us write the diffusion equation on the multiplex:

\partial_{t} u (t) + \mathcal{L} u (t) = 0,

(37)

with initial condition

u (0) = γ \otimes φ

, where

γ = {[γ_{1} \dots γ_{J}]}^{T},

γ_{j} \in R

, and

φ \in R^{n \times 1}

. The solution of the abstract Cauchy problem is then

u (t) = e^{- t \mathcal{L}} u (0) .

(38)

Let us consider a very weak coupling strength between pairs of layers,

ω ≪ 1

, such that we can consider the diffusion at each layer almost independently of the diffusion on other layers:

u (t) ≅ \oplus_{j = 1}^{J} γ_{j} e^{- t β_{j} L} φ .

(39)

We can then consider the sum of the concentrations at each vertex of the graph in all layers:

c (t) = \sum_{j = 1}^{J} u_{j} (t) = \sum_{j = 1}^{J} γ_{j} e^{- t β_{j} L} φ,

(40)

which is related to the SOE approximation we have found for the Mittag-Leffler Laplacian function. Therefore, the subdiffusive process on the graph can be seen as the aggregate diffusion occurring in parallel at different layers, where each layer has its own diffusivity

β_{j}

and the initial conditions on a layer are scalar multiples of the initial conditions on any other layer.
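The weak-coupling argument can be illustrated with a small numerical sketch. It assumes a block matrix with $β_{j} L$ on the diagonal and off-diagonal blocks $C_{i j} = ω I$, which is our reading of (36) and not necessarily the exact convention of [70]; the graph, the rates, and the helper names (`expm_sym`, `supra_matrix`, `aggregate`, `soe_sum`) are hypothetical.

```python
import numpy as np


def expm_sym(M, t):
    """exp(-t M) for a symmetric matrix M via its eigendecomposition."""
    lam, V = np.linalg.eigh(M)
    return (V * np.exp(-t * lam)) @ V.T


def supra_matrix(L, betas, omega):
    """Diagonal blocks beta_j * L, off-diagonal blocks omega * I (assumed form)."""
    J, n = len(betas), L.shape[0]
    coupling = omega * np.kron(np.ones((J, J)) - np.eye(J), np.eye(n))
    return np.kron(np.diag(betas), L) + coupling


def aggregate(L, betas, gammas, phi, t, omega):
    """Sum over layers of u(t) = exp(-t * supra) (gamma kron phi)."""
    u = expm_sym(supra_matrix(L, betas, omega), t) @ np.kron(gammas, phi)
    return u.reshape(len(betas), L.shape[0]).sum(axis=0)


def soe_sum(L, betas, gammas, phi, t):
    """Right-hand side of (40): sum_j gamma_j exp(-t beta_j L) phi."""
    return sum(g * expm_sym(b * L, t) @ phi for g, b in zip(gammas, betas))


# Hypothetical data: path graph on three vertices and three layers.
L = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
betas = np.array([0.5, 1.0, 2.0])
gammas = np.array([0.2, 0.5, 0.3])
phi = np.array([1.0, 0.0, 0.0])
uncoupled = aggregate(L, betas, gammas, phi, t=1.0, omega=0.0)
weakly_coupled = aggregate(L, betas, gammas, phi, t=1.0, omega=1e-6)
```

With zero coupling the aggregate coincides with the superposition in (40), and a very small coupling perturbs it only slightly.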

8.2. Factorized High-Order Temporal Diffusion Equation

Let us consider J independent diffusive species

v_{1} (t), \dots, v_{J} (t) \in R^{n}

evolving according to

\partial_{t} v_{j} (t) + β_{j} L v_{j} (t) = 0, j = 1, \dots, J,

(41)

with distinct diffusion rates

β_{j} > 0

. We define the observable field

u (t) : = \sum_{j = 1}^{J} v_{j} (t) .

(42)

Since L is diagonalizable, it suffices to work in a single Laplacian eigenmode with eigenvalue

λ \geq 0

and corresponding eigenvector

z_{λ} \in R^{n}

. Let

x_{j} (t)

be the scalar component of

v_{j} (t)

associated with the eigenmode

λ

; that is,

v_{j} (t) = \sum_{λ \in Sp (L)} x_{j} (t) z_{λ}

. Let

y (t) = \sum_{j = 1}^{J} x_{j} (t)

. The system (41) reduces to

\partial_{t} x_{j} (t) + β_{j} λ x_{j} (t) = 0, j = 1, \dots, J,

whose solutions are

x_{j} (t) = e^{- β_{j} λ t} x_{j} (0)

. Hence,

y (t) = \sum_{j = 1}^{J} e^{- β_{j} λ t} x_{j} (0) .

Each exponential

e^{- β_{j} λ t}

is annihilated by the operator

(\partial_{t} + β_{j} λ)

, and therefore

\prod_{m = 1}^{J} (\partial_{t} + β_{m} λ) y (t) = 0 .

Since this holds for every Laplacian eigenvalue

λ

, we obtain, in operator form, the multiplicative diffusion equation

\prod_{m = 1}^{J} (\partial_{t} + β_{m} L) u (t) = 0,

(43)

with suitable initial conditions. Then, we claim that

u (t) = \sum_{j = 1}^{J} γ_{j} e^{- t β_{j} L} φ

is an exact solution of the multiplicative diffusion equation.

8.2.1. Verification

It suffices to check that each term

u_{j} (t) : = e^{- t β_{j} L} φ

lies in the kernel of the operator

\prod_{m = 1}^{J} (\partial_{t} + β_{m} L)

. First, observe that for each fixed m and j,

(\partial_{t} + β_{m} L) u_{j} (t) = (- β_{j} L + β_{m} L) e^{- t β_{j} L} φ = (β_{m} - β_{j}) L e^{- t β_{j} L} φ .

Applying the full product, we obtain

\prod_{m = 1}^{J} (\partial_{t} + β_{m} L) u_{j} (t) = (\prod_{m = 1}^{J} (β_{m} - β_{j})) L^{J} e^{- t β_{j} L} φ .

If the

β_{m}

are pairwise distinct and

j \in \{1, \dots, J\}

, then one of the factors in the product is

(β_{j} - β_{j}) = 0

. Hence,

\prod_{m = 1}^{J} (\partial_{t} + β_{m} L) u_{j} (t) = 0 for each j = 1, \dots, J .

By linearity,

\prod_{m = 1}^{J} (\partial_{t} + β_{m} L) u (t) = \sum_{j = 1}^{J} γ_{j} \prod_{m = 1}^{J} (\partial_{t} + β_{m} L) u_{j} (t) = 0 .

Thus, the function

u (t) = \sum_{j = 1}^{J} γ_{j} e^{- t β_{j} L} φ

is an exact solution of (43).

8.2.2. Initial Conditions

Equation (43) is of order J in time, so one can prescribe

u (0), \partial_{t} u (0), \dots, \partial_{t}^{J - 1} u (0)

. The representation

u (t) = \sum_{j = 1}^{J} e^{- t β_{j} L} ψ_{j}

is the general solution of (43), and the vectors

ψ_{j}

are uniquely determined from the initial data via a Vandermonde-type linear system (mode by mode in the eigenbasis of L). Choosing

ψ_{j} = γ_{j} φ

yields precisely

u (t) = \sum_{j = 1}^{J} γ_{j} e^{- t β_{j} L} φ .

Therefore, for any prescribed coefficients

β_{j}

,

γ_{j}

and profile

φ

, there exist initial conditions for (43) such that the unique solution is exactly

u (t) = \sum_{j = 1}^{J} γ_{j} e^{- t β_{j} L} φ .
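The Vandermonde step can be made concrete for a single eigenmode with $λ > 0$. In that mode, $y^{(k)} (0) = \sum_{j} {(- β_{j} λ)}^{k} x_{j} (0)$ for $k = 0, \dots, J - 1$, so the amplitudes follow from a linear solve. The sketch below assumes pairwise distinct rates; the function names and the numerical values are ours.

```python
import numpy as np


def mode_amplitudes(betas, lam, initial_derivs):
    """Solve sum_j r_j**k x_j(0) = y^(k)(0), k = 0..J-1, with r_j = -beta_j * lam."""
    r = -np.asarray(betas, dtype=float) * lam
    V = np.vander(r, increasing=True).T   # V[k, j] = r_j**k
    return np.linalg.solve(V, np.asarray(initial_derivs, dtype=float))


def mode_solution(betas, lam, amplitudes, t):
    """y(t) = sum_j x_j(0) * exp(-beta_j * lam * t)."""
    return float(np.sum(amplitudes * np.exp(-np.asarray(betas) * lam * t)))


# Hypothetical mode: build the initial data from known amplitudes, then recover them.
betas, lam = [0.5, 1.0, 2.0], 3.0
true_amplitudes = np.array([0.2, 0.5, 0.3])
rates = -np.array(betas) * lam
derivs = [float(np.sum(true_amplitudes * rates ** k)) for k in range(3)]
recovered = mode_amplitudes(betas, lam, derivs)
```

For the mode $λ = 0$ all rates vanish and the system is singular, so that mode has to be treated separately.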

8.2.3. Physical Interpretation

Equation (43) does not describe diffusion of a single species. Rather, it is the effective evolution equation satisfied by the total concentration u obtained by summing J independent diffusive species with distinct diffusion rates. The product structure arises from eliminating the hidden fields

v_{j}

and encodes the presence of multiple diffusive time scales. Thus, the multiplicative diffusion Equation (43) represents the minimal closed description of a multi-rate diffusion process when only the aggregate observable is accessible.

8.3. Operator-Valued Memory and Diffusion on Graphs

8.3.1. Operator-Valued Volterra Kernels

Let X be a finite-dimensional Hilbert space and denote by

L (X)

the space of bounded linear operators on X. An operator-valued Volterra kernel is a strongly measurable mapping

K : R_{+} \to L (X)

such that

\int_{0}^{T} {∥ K (t) ∥}_{L (X)} d t < \infty for all T > 0 .

Given such a kernel, the associated Volterra convolution operator acts on sufficiently regular functions

u : R_{+} \to X

by

(K u) (t) : = \int_{0}^{t} K (t - s) u (s) d s, t \geq 0 .

An evolution equation of the form

u^{'} (t) + K u (t) = 0

(44)

is called a Volterra evolution equation with operator-valued memory. The Volterra structure enforces causality, while the operator-valued nature of K allows for mode-dependent memory effects.

Remark 5.

Equation (44) describes memory acting directly on the state u. Caputo-type formulations, in which the kernel acts on

u^{'}

, can be interpreted later as arising in a singular limit of this general framework.

8.3.2. Multiplicative Diffusion on Graphs

Let $L \in R^{n \times n}$ be a symmetric graph Laplacian ( $L ⪰ 0$ and $ker L = span {1}$ ), and let $β_{1}, \dots, β_{J} > 0$ be pairwise distinct. We consider the multiplicative diffusion equation

\prod_{j = 1}^{J} (\partial_{t} + β_{j} L) u (t) = 0, t > 0,

(45)

supplemented with J initial conditions

u (0), u^{'} (0), \dots, u^{(J - 1)} (0)

.

Solutions of (45) admit the representation

u (t) = \sum_{j = 1}^{J} e^{- β_{j} t L} ψ_{j},

(46)

where the vectors

ψ_{j} \in R^{n}

are uniquely determined by the initial conditions. Consequently, the Laplace transform

\hat{u} (s) = \mathcal{L} \{u\} (s)

satisfies

\hat{u} (s) = \sum_{j = 1}^{J} {(s I + β_{j} L)}^{- 1} ψ_{j}, u (0) = \sum_{j = 1}^{J} ψ_{j} .

(47)

8.3.3. Equivalence with an Operator-Valued Memory Equation

We now relate (45) to a Volterra evolution equation of the form

u^{'} (t) + \int_{0}^{t} K (t - s) L u (s) d s = 0, t > 0,

(48)

where

K : R_{+} \to L (R^{n})

is an operator-valued Volterra kernel.

Taking the Laplace transform of (48) yields

(s I + \hat{K} (s) L) \hat{u} (s) = u (0),

(49)

for

ℜ (s)

sufficiently large.

Assume for simplicity that

u (0) \in Ran (L)

, so that L is invertible on the dynamically relevant subspace (more generally,

L^{- 1}

may be replaced by the Moore–Penrose inverse

L^{†}

). Define

G^{(J)} (s) : = \sum_{j = 1}^{J} {(s I + β_{j} L)}^{- 1} P_{j},

(50)

where

P_{j} : R^{n} \to R^{n}

are linear maps such that

ψ_{j} = P_{j} u (0)

. If

G^{(J)} (s)

is invertible for

ℜ (s)

sufficiently large, we define

\hat{K} (s) : = L^{- 1} (G^{(J)} {(s)}^{- 1} - s I) .

(51)

Then (47) implies

(s I + \hat{K} (s) L) \hat{u} (s) = u (0),

which coincides with (49). Hence, solutions of the multiplicative diffusion Equation (45) also satisfy the Volterra Equation (48) with operator-valued kernel K.

8.3.4. Spectral Representation

Let

L = U Λ U^{⊤}

with

Λ = diag (λ_{ℓ})

. On each eigenmode

λ > 0

, the kernel is characterized by

{\hat{K}}_{λ} (s) = \frac{1}{λ} (\frac{1}{\sum_{j = 1}^{J} \frac{p_{j} (λ)}{s + β_{j} λ}} - s),

(52)

where

p_{j} (λ)

are the scalar multipliers induced by

P_{j}

. Thus, the memory kernel is generally mode-dependent and cannot be reduced to a single scalar kernel.
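Formula (52) can be evaluated mode by mode with a few lines of code. The sketch below is an illustration only; the function name `kernel_symbol` and the numerical values are ours. For a single term with $p_{1} = 1$, the formula returns the constant $β_{1}$, i.e., ordinary diffusion without memory, whereas for two terms the symbol depends on s.

```python
def kernel_symbol(s, lam, betas, p):
    """K_hat_lambda(s) from (52): (1/lam) * (1 / sum_j p_j / (s + beta_j * lam) - s)."""
    g = sum(pj / (s + bj * lam) for pj, bj in zip(p, betas))
    return (1.0 / g - s) / lam


# One exponential: the symbol is constant in s (no memory).
single = kernel_symbol(2.0 + 1.0j, 3.0, [0.7], [1.0])

# Two exponentials (hypothetical rates and multipliers): the symbol varies with s.
k_low = kernel_symbol(1.0, 2.0, [0.5, 2.0], [0.4, 0.6])
k_high = kernel_symbol(5.0, 2.0, [0.5, 2.0], [0.4, 0.6])
```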

8.3.5. Fractional Diffusion as a Singular Memory Limit

In the next proposition, we show that the fractional diffusion equation is equivalent to a Volterra equation with a suitably chosen subdiffusion kernel

K_{S}

.

Proposition 7.

Let

L \in R^{n \times n}

be a symmetric, positive semidefinite operator (e.g., a graph Laplacian), and let

0 < α < 1

. Consider the Volterra evolution equation

u^{'} (t) + \int_{0}^{t} K (t - s) L u (s) d s = 0, t > 0,

(53)

with initial condition

u (0) = u_{0}

. If the kernel is chosen as

K_{S} (t) = \frac{1}{Γ (α - 1)} t^{α - 2}, t > 0,

(54)

then (53) is equivalent to the Caputo fractional diffusion equation

D_{t}^{α} u (t) = - L u (t), t > 0,

(55)

where

D_{t}^{α}

denotes the Caputo derivative of order α. Moreover, the unique solution is given by

u (t) = E_{α} (- t^{α} L) u_{0},

(56)

where

E_{α}

is the Mittag-Leffler function.

Proof.

Let

\hat{u} (s) = \mathcal{L} \{u\} (s)

denote the Laplace transform of u. Using the identities

\mathcal{L} \{u^{'}\} (s) = s \hat{u} (s) - u_{0}, \mathcal{L} \{\int_{0}^{t} K (t - s) L u (s) d s\} = \hat{K} (s) L \hat{u} (s),

the Laplace transform of (53) yields

(s I + \hat{K} (s) L) \hat{u} (s) = u_{0} .

(57)

For the kernel (54), the classical Laplace-transform formula yields

{\hat{K}}_{S} (s) = s^{1 - α}, ℜ (s) > 0 .

Substituting this expression into (57) yields

(s I + s^{1 - α} L) \hat{u} (s) = u_{0} .

Multiplying both sides by

s^{α - 1}

yields

(s^{α} I + L) \hat{u} (s) = s^{α - 1} u_{0}

(58)

On the other hand, the Laplace transform of the Caputo derivative

D_{t}^{α} u

is

\mathcal{L} \{D_{t}^{α} u\} (s) = s^{α} \hat{u} (s) - s^{α - 1} u_{0} .

Taking the Laplace transform of (55) therefore yields

s^{α} \hat{u} (s) - s^{α - 1} u_{0} = - L \hat{u} (s),

which rearranges to (58). Thus, Equations (53) and (55) are equivalent.

Finally, the expression of

\hat{u} (s)

in terms of

u_{0}

can be obtained from (58)

\hat{u} (s) = s^{α - 1} {(s^{α} I + L)}^{- 1} u_{0} .

(59)

The inverse Laplace transform of

s^{α - 1} {(s^{α} I + L)}^{- 1}

is the operator-valued Mittag-Leffler function

E_{α} (- t^{α} L)

, which proves (56). □
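For a single eigenmode and $α = 1 / 2$, the last step of the proof can be checked numerically, since $E_{1 / 2} (- z) = e^{z^{2}} erfc (z)$ is available in closed form. The sketch below integrates $e^{- s t} E_{1 / 2} (- λ t^{1 / 2})$ over $t \geq 0$ after the substitution $t = x^{2}$ and compares the result with the scalar form of (59). The truncation point, the number of quadrature nodes, and the function names are our assumptions.

```python
import math
import numpy as np


def ml_half(z):
    """E_{1/2}(-z) = exp(z**2) * erfc(z), for moderate z >= 0."""
    return math.exp(z * z) * math.erfc(z)


def laplace_of_solution(s, lam, x_max=8.0, n_nodes=200):
    """Gauss-Legendre estimate of int_0^inf exp(-s t) E_{1/2}(-lam sqrt(t)) dt, t = x**2."""
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    x = 0.5 * x_max * (nodes + 1.0)
    f = np.array([2.0 * xi * math.exp(-s * xi * xi) * ml_half(lam * xi) for xi in x])
    return 0.5 * x_max * float(np.sum(weights * f))


def resolvent_symbol(s, lam, alpha=0.5):
    """Scalar form of (59): s**(alpha - 1) / (s**alpha + lam)."""
    return s ** (alpha - 1.0) / (s ** alpha + lam)


numeric = laplace_of_solution(2.0, 1.5)
closed_form = resolvent_symbol(2.0, 1.5)
```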

Remark 6.

We note that the Laplace-domain representation is written in a resolvent form. More generally, finite superpositions of Laplacian semigroups of the form

u (t) = \sum_{j = 1}^{J} e^{- β_{j} t L} ψ_{j}

lead (via the construction in (52)) to operator-valued memory kernels whose Laplace symbols are rational in s (mode-by-mode in the spectrum of L). When the number of terms increases and the associated time scales become dense, these rational symbols may converge, in an appropriate sense, to the fractional symbol

s^{1 - α}

. In this resolvent sense, time-fractional diffusion can be viewed as a singular, scale-free limit of diffusion with operator-valued Volterra memory.

Note that the kernel

K_{S} (t)

is negative, since

0 < α < 1

and thus

Γ (α - 1) < 0

. The stability of the Volterra heat equation has been studied in [71], in a continuous spatial setting, with particular attention to positive and negative kernels. An example of an application with a negative kernel in a stochastic setting is given in [47].

We now formalize the connection between the SOE approximation and fractional diffusion at the level of Laplace-domain resolvents. We emphasize that the result below concerns operator-valued symbols in the Laplace domain and should not be interpreted as a full dynamical equivalence in physical time. In particular, the statement establishes a form of resolvent convergence under suitable assumptions, rather than a complete convergence of time-domain solutions or memory kernels.

Proposition 8.

Let

L \in R^{n \times n}

be a symmetric graph Laplacian with spectrum

σ (L) \subset [0, λ_{\max}]

and let

0 < α < 1

. For s with

ℜ (s) > 0

, define the fractional resolvent family

{\hat{G}}_{S} (s) : = s^{α - 1} {(s^{α} I + L)}^{- 1} .

(60)

Assume that for each

J \in N

we are given coefficients

a_{j}, b_{j} > 0

(

j = 1, \dots, J

) and define the rational operator-valued symbol

{\hat{G}}^{(J)} (s) : = \sum_{j = 1}^{J} a_{j} {(s I + b_{j} L)}^{- 1} .

(61)

Suppose there exists a domain

Ω \subset {s \in C : ℜ (s) > 0}

and constants

M, δ > 0

such that for all

s \in Ω

:

\begin{matrix} ∥ {\hat{G}}_{S} (s) ∥ & \leq M, \end{matrix}

(62)

\begin{matrix} ∥ {\hat{G}}_{S} {(s)}^{- 1} ∥_{Ran (L) \to Ran (L)} & \leq M, \end{matrix}

(63)

\begin{matrix} ∥ {\hat{G}}^{(J)} (s) - {\hat{G}}_{S} (s) ∥ & \leq δ, with δ M < 1 . \end{matrix}

(64)

(Here,

{\hat{G}}_{S} {(s)}^{- 1}

denotes the inverse on

Ran (L)

, and the norm is the operator norm on

R^{n}

.)

Then for every

s \in Ω

, the operator

{\hat{G}}^{(J)} (s)

is invertible on

Ran (L)

and

∥ {\hat{G}}^{(J)} {(s)}^{- 1} - {\hat{G}}_{S} {(s)}^{- 1} ∥_{Ran (L) \to Ran (L)} \leq \frac{M^{2}}{1 - δ M} ∥ {\hat{G}}^{(J)} (s) - {\hat{G}}_{S} (s) ∥ .

(65)

Define the SOE-induced operator-valued memory symbol by

{\hat{K}}^{(J)} (s) : = ({\hat{G}}^{(J)} {(s)}^{- 1} - s I) L^{†}, s \in Ω,

(66)

where

L^{†}

is the Moore–Penrose inverse. Define analogously

{\hat{K}}_{S} (s) : = ({\hat{G}}_{S} {(s)}^{- 1} - s I) L^{†} = s^{1 - α} I on Ran (L) .

(67)

Then

{(s I + {\hat{K}}^{(J)} (s) L)}^{- 1} = {\hat{G}}^{(J)} (s), {(s I + {\hat{K}}_{S} (s) L)}^{- 1} = {\hat{G}}_{S} (s),

(68)

and moreover, the resolvents converge in operator norm:

∥ {(s I + {\hat{K}}^{(J)} (s) L)}^{- 1} - {(s I + s^{1 - α} L)}^{- 1} ∥ = ∥ {\hat{G}}^{(J)} (s) - {\hat{G}}_{S} (s) ∥ \underset{J \to \infty}{\to} 0, s \in Ω .

(69)

Proof.

Fix

s \in Ω

.

Step 1: Stability of invertibility on

Ran (L)

. Let

A : = {\hat{G}}_{S} (s)

and

B : = {\hat{G}}^{(J)} (s)

, viewed as operators on

Ran (L)

. By assumption, A is invertible there and

∥ A^{- 1} ∥ \leq M

. Moreover,

∥ A^{- 1} (B - A) ∥ \leq ∥ A^{- 1} ∥ ∥ B - A ∥ \leq M δ < 1 .

Hence, B is invertible on

Ran (L)

and the standard resolvent identity yields

B^{- 1} - A^{- 1} = B^{- 1} (A - B) A^{- 1} .

Using

∥ B^{- 1} ∥ \leq ∥ A^{- 1} ∥ / (1 - ∥ A^{- 1} (B - A) ∥) \leq M / (1 - δ M)

, we obtain

∥ B^{- 1} - A^{- 1} ∥ \leq \frac{M}{1 - δ M} ∥ A - B ∥ M = \frac{M^{2}}{1 - δ M} ∥ B - A ∥,

which proves (65).

Step 2: Definition of the memory symbols and resolvent identities. By definition (66),

{\hat{G}}^{(J)} {(s)}^{- 1} = s I + {\hat{K}}^{(J)} (s) L on Ran (L),

because

L^{†} L

is the identity on

Ran (L)

. Taking inverses on

Ran (L)

yields

{(s I + {\hat{K}}^{(J)} (s) L)}^{- 1} = {\hat{G}}^{(J)} (s) on Ran (L),

and the same argument yields

{(s I + {\hat{K}}_{S} (s) L)}^{- 1} = {\hat{G}}_{S} (s)

. This proves (68).

Step 3: Identifying

{\hat{K}}_{S} (s)

and resolvent convergence. From (60) we have

{\hat{G}}_{S} {(s)}^{- 1} = s^{1 - α} (s^{α} I + L) = s I + s^{1 - α} L,

hence,

{\hat{K}}_{S} (s) = s^{1 - α} I

as in (67) on

Ran (L)

. Therefore,

{(s I + {\hat{K}}_{S} (s) L)}^{- 1} = {(s I + s^{1 - α} L)}^{- 1} = {\hat{G}}_{S} (s) .

Finally, (69) follows immediately from (68) and the assumed closeness (64), and, in particular, tends to 0 if

∥ {\hat{G}}^{(J)} (s) - {\hat{G}}_{S} (s) ∥ \to 0

as

J \to \infty

. □
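The inverse-stability estimate (65) can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the path graph, the values of α and s, and the random perturbation E (which stands in for the difference between an SOE symbol and the fractional symbol) are all hypothetical choices made here, not quantities taken from the paper.

```python
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian of the path graph on n vertices."""
    adjacency = np.zeros((n, n))
    i = np.arange(n - 1)
    adjacency[i, i + 1] = adjacency[i + 1, i] = 1.0
    return np.diag(adjacency.sum(axis=1)) - adjacency

def fractional_resolvent(L, s, alpha):
    """G_S(s) = s^(alpha-1) (s^alpha I + L)^(-1), principal branch of the power."""
    n = L.shape[0]
    return s ** (alpha - 1.0) * np.linalg.inv(s ** alpha * np.eye(n) + L)

def check_inverse_bound(n=8, alpha=0.5, s=1.0 + 0.5j, seed=0):
    """Return both sides of (65) and the product delta*M for a random perturbation."""
    L = path_laplacian(n)
    _, eigvecs = np.linalg.eigh(L)
    V = eigvecs[:, 1:]  # orthonormal basis of Ran(L); the path graph is connected
    A = V.T @ fractional_resolvent(L, s, alpha) @ V  # G_S(s) restricted to Ran(L)
    A_inv = np.linalg.inv(A)
    M = max(np.linalg.norm(A, 2), np.linalg.norm(A_inv, 2))  # covers (62) and (63)
    rng = np.random.default_rng(seed)
    E = rng.standard_normal(A.shape)  # hypothetical stand-in for G^(J) - G_S
    delta = 0.1 / M                   # guarantees delta * M = 0.1 < 1
    E *= delta / np.linalg.norm(E, 2)
    B = A + E
    lhs = np.linalg.norm(np.linalg.inv(B) - A_inv, 2)
    rhs = M ** 2 / (1.0 - delta * M) * np.linalg.norm(B - A, 2)
    return lhs, rhs, delta * M

if __name__ == "__main__":
    lhs, rhs, dm = check_inverse_bound()
    print(f"delta*M = {dm:.2f}, lhs = {lhs:.3e}, rhs = {rhs:.3e}")
```

The restriction to Ran(L) is carried out by dropping the eigenvector associated with the zero eigenvalue, which is legitimate here because the example graph is connected.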

Proposition 8 shows that, under suitable assumptions, the SOE-based resolvents approximate the fractional resolvent in operator norm on the domain

Ω

. This provides a precise sense in which fractional dynamics can be approximated by finite superpositions of classical diffusions at the level of Laplace-domain operators.
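The identities (67) and (68) for the fractional symbol can also be verified directly in floating-point arithmetic. The following is a minimal sketch, assuming a cycle graph and particular values of α and s chosen here purely for illustration; it uses `numpy.linalg.pinv` for the Moore–Penrose inverse.

```python
import numpy as np

def cycle_laplacian(n):
    """Combinatorial Laplacian of the cycle graph on n >= 3 vertices."""
    shift = np.roll(np.eye(n), 1, axis=1)
    return 2.0 * np.eye(n) - shift - shift.T

def check_fractional_symbol(n=6, alpha=0.7, s=0.8 + 0.3j):
    """Check K_S(s) = s^(1-alpha) on Ran(L) and (s I + K_S L)^(-1) = G_S(s)."""
    L = cycle_laplacian(n)
    I = np.eye(n)
    G = s ** (alpha - 1.0) * np.linalg.inv(s ** alpha * I + L)
    L_pinv = np.linalg.pinv(L)
    P = L @ L_pinv  # orthogonal projector onto Ran(L)
    K = (np.linalg.inv(G) - s * I) @ L_pinv  # definition (67)
    symbol_ok = np.allclose(K, s ** (1.0 - alpha) * P)
    resolvent_ok = np.allclose(np.linalg.inv(s * I + K @ L), G)
    return symbol_ok, resolvent_ok

if __name__ == "__main__":
    print(check_fractional_symbol())
```

As a matrix on the whole space, the computed symbol equals $s^{1 - α}$ times the projector onto Ran(L), which is the precise meaning of "$s^{1 - α} I$ on Ran(L)" in this finite-dimensional check.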

An analysis of subdiffusion equations with memory, in relation to rational kernels in the Laplace domain, can be found in [32].

Remark 7.

Proposition 8 states that an SOE approximation does not merely approximate a solution curve, but induces an approximate evolution law in the same analytic class as the limiting fractional dynamics, namely through the Laplace-domain resolvent. It makes explicit the structural pathway

\begin{matrix} \text{SOE} ⟺ \text{operator-valued memory} ⟹ \text{fractional limit} \end{matrix}

where the double arrow refers to the exact algebraic construction

{\hat{G}}^{(J)} \leftrightarrow {\hat{K}}^{(J)}

via (66) and (68), while the single arrow denotes the limiting process

J \to \infty

in the resolvent sense (69). This is the natural notion of convergence for nonlocal-in-time equations, since the resolvent family uniquely characterizes the corresponding linear evolution.

Remark 8.

Proposition 8 should be interpreted strictly as a Laplace-domain (resolvent-level) approximation result. The following clarifications delimit the scope of this statement and prevent overinterpretation in terms of time-domain dynamics:

Time-domain kernel convergence. It does not claim that the inverse Laplace transforms satisfy

K^{(J)} (t) \to K_{S} (t)

pointwise, in

L_{loc}^{1} (R_{+})

, or in any distributional sense.

Strong convergence of semigroup families in physical time. If a physical-time representation involves nonlinear reparametrizations (e.g.,

τ = t^{α}

), the proposition does not claim that uniform approximation of

G^{(J)} (τ)

implies convergence of Laplace transforms in the physical time variable.

Equivalence to a Caputo equation for each finite J. For finite J, the induced kernel is regular (typically a finite exponential mixture in time), and the proposition does not claim that the corresponding memory equation is a Caputo fractional differential equation.

Global-in-s convergence. No statement is made about convergence uniformly for all

ℜ (s) > 0

, nor about behavior near

s = 0

or

| s | \to \infty

, unless such regions are included in Ω and the hypotheses are verified there.

In summary, the result establishes a controlled approximation at the level of Laplace-domain operators, which supports the interpretation of fractional diffusion as an effective, scale-free limit of multi-rate processes, but does not by itself constitute a full convergence result in the time domain.

9. Conclusions

In this work, we have investigated subdiffusion on graphs from structural, dynamical, and mechanistic perspectives, emphasizing the role of memory rather than modifications of spatial connectivity or transition rules. By grounding time-fractional diffusion in a random time-change framework, we have shown that subdiffusion on networks is not merely a slower version of classical diffusion, but a fundamentally non-Markovian process in which past states actively shape future evolution. Importantly, this loss of Markovianity occurs without sacrificing linearity or mass conservation, making time-fractional models both analytically tractable and physically meaningful for networked systems.

A central outcome of our analysis is the demonstration that Mittag-Leffler graph dynamics admit a convex, mass-preserving representation as a superposition of classical heat semigroups evaluated at rescaled times. This operator-level viewpoint provides the foundation for the sum-of-exponentials (SOE) approximation, which enables the efficient numerical evaluation of fractional dynamics through a finite number of matrix exponentials. From a computational perspective, this reduces the problem to repeatedly applying classical diffusion operators, allowing one to leverage existing scalable algorithms while incorporating non-Markovian effects.
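The reduction to a finite number of semigroup evaluations can be illustrated for $α = 1 / 2$, where the classical closed forms $M_{1 / 2} (θ) = e^{- θ^{2} / 4} / \sqrt{π}$ and $E_{1 / 2} (- x) = e^{x^{2}} erfc (x)$ are available. The sketch below is not the authors' implementation: the node placement (a uniform grid in $y = log θ$ with the $ε = 10^{- 6}$ window from the appendix tables), the path graph, and the use of an eigendecomposition in place of a scalable matrix-exponential routine are all assumptions made here for a small, self-contained example.

```python
import math
import numpy as np

def soe_nodes_weights(J=64, y_min=-13.936, y_max=2.031):
    """Trapezoidal nodes/weights in y = log(theta) for alpha = 1/2 (assumed scheme)."""
    y = np.linspace(y_min, y_max, J)
    h = y[1] - y[0]
    theta = np.exp(y)
    w = h * theta * np.exp(-0.25 * theta ** 2) / math.sqrt(math.pi)
    w[[0, -1]] *= 0.5  # trapezoidal end-point weights
    return theta, w

def soe_solution(L, u0, t, theta, w, alpha=0.5):
    """u(t) ~ sum_j w_j exp(-theta_j t^alpha L) u0, applied via eigendecomposition."""
    lam, V = np.linalg.eigh(L)
    coeffs = V.T @ u0
    tau = t ** alpha  # rescaled time
    u = np.zeros(len(u0))
    for th, wj in zip(theta, w):
        u += wj * (V @ (np.exp(-th * tau * lam) * coeffs))
    return u

def exact_solution(L, u0, t):
    """u(t) = E_{1/2}(-t^{1/2} L) u0 using E_{1/2}(-x) = exp(x^2) erfc(x)."""
    lam, V = np.linalg.eigh(L)
    x = np.clip(lam, 0.0, None) * math.sqrt(t)
    ml = np.array([math.exp(xi * xi) * math.erfc(xi) for xi in x])
    return V @ (ml * (V.T @ u0))

def path_laplacian(n):
    """Combinatorial Laplacian of the path graph on n vertices."""
    A = np.zeros((n, n))
    i = np.arange(n - 1)
    A[i, i + 1] = A[i + 1, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

if __name__ == "__main__":
    L = path_laplacian(6)
    u0 = np.zeros(6)
    u0[0] = 1.0  # unit mass on the first vertex
    theta, w = soe_nodes_weights()
    u = soe_solution(L, u0, 1.0, theta, w)
    print("max error:", np.max(np.abs(u - exact_solution(L, u0, 1.0))))
    print("mass:", u.sum(), "weights >= 0:", bool(np.all(w >= 0)))
```

Because every weight is non-negative and the weights sum to approximately one, the approximation is a near-convex combination of heat semigroups, so the total mass is preserved up to the tail truncation error.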

From a global perspective, the results of the paper can be understood as different manifestations of this operator-level structure. At the vertex level, the superposition of time scales induces heterogeneous memory effects: different nodes, and even the same node at different times, exhibit distinct biases toward the remote past, recent past, or present contributions, depending on the local temporal curvature of the evolving state. At the level of paths, this same structure translates into a geometric reformulation of transport: the subdiffusive distance induces a metrization of the graph that reshapes connectivity in a time-dependent manner. In particular, subdiffusive shortest paths recover topological shortest paths in the small-time limit while preferentially selecting routes through high-degree regions, revealing a form of memory-assisted navigation that contrasts with classical diffusion.

Taken together, these results show that the operator structure, vertex-level memory, and path-level geometry are not independent phenomena, but are tightly coupled aspects of a single underlying mechanism: the redistribution of diffusion across multiple intrinsic time scales. This integrated viewpoint clarifies how memory reshapes network transport in a structurally coherent way.

From a practical standpoint, the proposed framework is particularly relevant for systems in which transport is affected by memory or trapping effects, such as biological interaction networks, communication systems with delays, or transport in heterogeneous media. In such settings, the SOE approximation provides a computationally viable alternative to direct evaluation of fractional operators, enabling the simulation of subdiffusive dynamics on large graphs using standard diffusion solvers. Compared to classical diffusion, the present approach captures non-exponential relaxation, heterogeneous waiting times, and path-selection effects, while retaining computational efficiency through its reduction to a finite set of semigroup evaluations.

Finally, by connecting fractional diffusion to finite superpositions of classical diffusions and to operator-valued Volterra memory equations in the Laplace domain, we have placed time-fractional graph dynamics within a broader hierarchy of diffusion models with memory. In this context, fractional equations can be interpreted as scale-free limits of multi-rate processes, providing a unifying framework that links analytical theory, numerical approximation, and physical interpretation.

Overall, our results provide a coherent picture of subdiffusion on graphs, showing how memory, geometry, and dynamics interact across scales. This perspective not only deepens the theoretical understanding of fractional diffusion on networks, but also opens the way to practical applications and further quantitative comparisons with classical diffusion models.

Author Contributions

Conceptualization, E.E.; methodology, E.E.; software, E.E. and N.D.; validation, E.E. and N.D.; writing—original draft preparation, E.E.; writing—review and editing, E.E. and N.D.; visualization, E.E. and N.D.; supervision, E.E. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Agencia Estatal de Investigación (AEI, MCI, Spain) MCIN/AEI/10.13039/501100011033 under grant PID2023-149473NB-I00, and by Agencia Estatal de Investigación (AEI, MCI, Spain) MCIN/AEI/10.13039/501100011033 and Fondo Europeo de Desarrollo Regional (FEDER, UE) under the María de Maeztu Program for Units of Excellence in R&D (grant CEX2021-001164-M).

Data Availability Statement

The original contributions presented in this study are included in the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Numerical Window Tables and Usage

We summarize practical window choices for the log–trapezoidal SOE, based on the tail bounds previously found. For the right tail, the stretched–exponential decay of

M_{α}

implies that

θ_{\max} \geq {(\frac{log (2 / ε)}{c_{α}})}^{1 / q_{α}}, c_{α} = (1 - α) α^{α / (1 - α)}, q_{α} = \frac{1}{1 - α},

is sufficient to make

sup_{λ \in [0, λ_{\max}]} \int_{θ_{\max}}^{\infty} M_{α} (θ) e^{- θ t λ} d θ ≲ ε / 2

uniformly in

λ

(i.e., including the zero mode); see (19). For the left tail, taking

θ_{\min} = \frac{ε}{2} Γ (1 - α)

ensures

sup_{λ \in [0, λ_{\max}]} \int_{0}^{θ_{\min}} M_{α} (θ) e^{- θ t λ} d θ \leq ε / 2

(Proposition 3).

Table A1 and Table A2 list these cutoffs for typical

α

and

ε

values, together with

y_{\max} = log θ_{\max}

and

y_{\min} = log θ_{\min}

, which are the actual endpoints used by the SOE in the variable

y = log θ

.
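The two cutoff formulas above are simple enough to evaluate directly. The following sketch uses only the Python standard library; the function names are introduced here for illustration and do not come from the paper.

```python
import math

def right_tail_cutoff(alpha, eps):
    """theta_max = (log(2/eps) / c_alpha)^(1/q_alpha) with the constants given above."""
    c = (1.0 - alpha) * alpha ** (alpha / (1.0 - alpha))
    q = 1.0 / (1.0 - alpha)
    return (math.log(2.0 / eps) / c) ** (1.0 / q)

def left_tail_cutoff(alpha, eps):
    """theta_min = (eps / 2) * Gamma(1 - alpha)."""
    return 0.5 * eps * math.gamma(1.0 - alpha)

def soe_window(alpha, eps):
    """End points (y_min, y_max) of the window in the variable y = log(theta)."""
    return (math.log(left_tail_cutoff(alpha, eps)),
            math.log(right_tail_cutoff(alpha, eps)))

if __name__ == "__main__":
    for alpha in (0.2, 0.3, 0.5, 0.7, 0.8):
        y_min, y_max = soe_window(alpha, 1e-6)
        print(f"alpha={alpha:.2f}  y_min={y_min:.3f}  y_max={y_max:.3f}")
```

For $α = 0.50$ and $ε = 10^{- 6}$ these functions give $θ_{\max} \approx 7.618$ and $θ_{\min} \approx 8.86 \times 10^{- 7}$, in agreement with the corresponding table entries.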

Table A1. Conservative right–tail cutoff

θ_{\max}

ensuring

sup_{λ \in [0, λ_{\max}]} \int_{θ_{\max}}^{\infty} M_{α} (θ) e^{- θ t λ} d θ ≲ ε / 2

. Brackets show

y_{\max} = log θ_{\max}

for direct use in the log–trapezoidal SOE. Values are independent of t and of the spectrum; they come from the uniform bound (19) using the stretched–exponential tail of

M_{α}

with constants

c_{α}, q_{α}

[54,55].

$α$	$ε = 10^{- 6}$	$ε = 10^{- 8}$	$ε = 10^{- 10}$	$ε = 10^{- 12}$
0.20	14.016 [2.640]	17.474 [2.861]	20.768 [3.033]	23.936 [3.175]
0.30	11.979 [2.483]	14.529 [2.676]	16.899 [2.827]	19.134 [2.951]
0.50	7.618 [2.031]	8.744 [2.168]	9.740 [2.276]	10.644 [2.365]
0.70	4.109 [1.413]	4.464 [1.496]	4.762 [1.561]	5.023 [1.614]
0.80	2.816 [1.035]	2.976 [1.090]	3.107 [1.134]	3.219 [1.169]

Table A2. Left–tail cutoff

θ_{\min} = \frac{ε}{2} Γ (1 - α)

guaranteeing

sup_{λ \in [0, λ_{\max}]} \int_{0}^{θ_{\min}} M_{α} (θ) e^{- θ t λ} d θ \leq ε / 2

. Brackets show

y_{\min} = log θ_{\min}

(rounded to three significant digits).

$α$	$ε = 10^{- 6}$	$ε = 10^{- 8}$	$ε = 10^{- 10}$	$ε = 10^{- 12}$
0.20	$5.82 \times 10^{- 7}$ [ $- 14.357$ ]	$5.82 \times 10^{- 9}$ [ $- 18.962$ ]	$5.82 \times 10^{- 11}$ [ $- 23.567$ ]	$5.82 \times 10^{- 13}$ [ $- 28.172$ ]
0.30	$6.49 \times 10^{- 7}$ [ $- 14.248$ ]	$6.49 \times 10^{- 9}$ [ $- 18.853$ ]	$6.49 \times 10^{- 11}$ [ $- 23.458$ ]	$6.49 \times 10^{- 13}$ [ $- 28.063$ ]
0.50	$8.86 \times 10^{- 7}$ [ $- 13.936$ ]	$8.86 \times 10^{- 9}$ [ $- 18.541$ ]	$8.86 \times 10^{- 11}$ [ $- 23.147$ ]	$8.86 \times 10^{- 13}$ [ $- 27.752$ ]
0.70	$1.50 \times 10^{- 6}$ [ $- 13.413$ ]	$1.50 \times 10^{- 8}$ [ $- 18.018$ ]	$1.50 \times 10^{- 10}$ [ $- 22.623$ ]	$1.50 \times 10^{- 12}$ [ $- 27.228$ ]
0.80	$2.30 \times 10^{- 6}$ [ $- 12.985$ ]	$2.30 \times 10^{- 8}$ [ $- 17.590$ ]	$2.30 \times 10^{- 10}$ [ $- 22.195$ ]	$2.30 \times 10^{- 12}$ [ $- 26.800$ ]

These tables are used as follows.

1. Fix a target tolerance $ε$ for the tails (e.g., $ε = 10^{- 12}$ in double precision).
2. Read off $(θ_{\min}, y_{\min})$ from Table A2 and $(θ_{\max}, y_{\max})$ from Table A1 for your $α$ and $ε$.
3. Set the SOE window in the log variable as $[y_{\min}, y_{\max}]$; then increase J until the in-window discretization error decays to your overall tolerance (Theorem 1 shows geometric decay with J).
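The three steps can be sketched for the scalar case with $α = 1 / 2$, for which $M_{1 / 2} (θ) = e^{- θ^{2} / 4} / \sqrt{π}$ and $E_{1 / 2} (- x) = e^{x^{2}} erfc (x)$ are known in closed form. This is an illustrative sketch only: the uniform grid of J nodes with trapezoidal end-point weights is an assumption about how the log–trapezoidal rule is laid out, the window is the $ε = 10^{- 6}$ row of the tables, and $x$ stands for the product $λ t^{α}$.

```python
import math

def m_wright_half(theta):
    """M-Wright density for alpha = 1/2: exp(-theta^2 / 4) / sqrt(pi)."""
    return math.exp(-0.25 * theta * theta) / math.sqrt(math.pi)

def log_trapezoid_soe(y_min, y_max, J):
    """Nodes theta_j and weights w_j of the trapezoidal rule in y = log(theta)."""
    h = (y_max - y_min) / (J - 1)
    nodes, weights = [], []
    for j in range(J):
        theta = math.exp(y_min + j * h)
        end = 0.5 if j in (0, J - 1) else 1.0  # halve the two end-point weights
        nodes.append(theta)
        weights.append(end * h * theta * m_wright_half(theta))
    return nodes, weights

def soe_eval(x, nodes, weights):
    """Sum-of-exponentials approximation of E_{1/2}(-x)."""
    return sum(w * math.exp(-theta * x) for theta, w in zip(nodes, weights))

def mittag_leffler_half(x):
    """E_{1/2}(-x) = exp(x^2) erfc(x); adequate for moderate x >= 0."""
    return math.exp(x * x) * math.erfc(x)

def max_error(J, y_min=-13.936, y_max=2.031, xs=(0.0, 0.1, 0.5, 1.0, 2.0, 5.0)):
    """Largest absolute SOE error over a few sample values of x = lambda * t^alpha."""
    nodes, weights = log_trapezoid_soe(y_min, y_max, J)
    return max(abs(soe_eval(x, nodes, weights) - mittag_leffler_half(x)) for x in xs)

if __name__ == "__main__":
    for J in (8, 16, 32, 64):
        print(J, f"{max_error(J):.2e}")
```

Once J is large enough for the in-window error to be negligible, the remaining error is set by the tail truncation, so it saturates near the tolerance $ε$ used to choose the window rather than decreasing further.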

Variants and remarks.

(Mean-zero subspace.) If you only act on $1^{⊥}$ and know the spectral gap $λ_{2} > 0$, you may choose $θ_{\max} = \frac{1}{t λ_{2}} log (2 / ε)$ (Proposition 5), which can be much smaller than the uniform value in Table A1. For reuse over $t \in [t_{\min}, t_{\max}]$, use $t_{\min}$ in this formula so that the nodes remain t-independent.
(Heuristic rule of thumb.) For high-frequency modes it is common to take $θ_{\max} \approx \frac{32}{t λ}$ so that $e^{- θ_{\max} t λ} \approx 10^{- 14}$ (machine precision). The uniform bound above is conservative and includes the zero mode.
(Scaling with α.) As $α ↑ 1$ the exponent $1 / q_{α} = 1 - α$ in the cutoff formula tends to zero and the required $θ_{\max}$ shrinks, even though the tail constant $c_{α}$ itself decreases; as $α ↓ 0$ the reverse happens, hence the larger values in the top rows of Table A1.
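The first two variants can be compared with the uniform cutoff in a few lines. The sketch below is a hedged illustration: the cycle graph, whose smallest nonzero Laplacian eigenvalue is $2 - 2 cos (2 π / n)$, and the values of $t_{\min}$ and $ε$ are hypothetical choices made here, and the function names are not from the paper.

```python
import math

def theta_max_uniform(alpha, eps):
    """Uniform right-tail cutoff (the quantity tabulated in Table A1)."""
    c = (1.0 - alpha) * alpha ** (alpha / (1.0 - alpha))
    return (math.log(2.0 / eps) / c) ** (1.0 - alpha)

def theta_max_gap(t_min, lambda_2, eps):
    """Gap-based cutoff log(2/eps) / (t_min * lambda_2) on the mean-zero subspace."""
    return math.log(2.0 / eps) / (t_min * lambda_2)

def cycle_spectral_gap(n):
    """Smallest nonzero Laplacian eigenvalue of the cycle graph on n vertices."""
    return 2.0 - 2.0 * math.cos(2.0 * math.pi / n)

if __name__ == "__main__":
    eps, alpha, t_min = 1e-12, 0.5, 100.0
    lam2 = cycle_spectral_gap(10)
    print("uniform cutoff  :", theta_max_uniform(alpha, eps))
    print("gap-based cutoff:", theta_max_gap(t_min, lam2, eps))
    print("exp(-32)        :", math.exp(-32.0))
```

The gap-based value is smaller than the uniform one only when $t_{\min} λ_{2}$ is sufficiently large; for short times or a small spectral gap the comparison can go the other way.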

References

1. Estrada, E. The Structure of Complex Networks: Theory and Applications; Oxford University Press: Oxford, UK, 2012.
2. López-Pintado, D. An overview of diffusion in complex networks. In Complex Networks and Dynamics: Social and Economic Interactions; Springer: Cham, Switzerland, 2016; pp. 27–48.
3. Masuda, N.; Porter, M.A.; Lambiotte, R. Random walks and diffusion on networks. Phys. Rep. 2017, 716, 1–58.
4. Nicolaides, C.; Cueto-Felgueroso, L.; Juanes, R. Anomalous physical transport in complex networks. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 2010, 82, 055101.
5. Metzler, R.; Klafter, J. The random walk’s guide to anomalous diffusion: A fractional dynamics approach. Phys. Rep. 2000, 339, 1–77.
6. Sokolov, I.M. Models of anomalous diffusion in crowded environments. Soft Matter 2012, 8, 9043–9052.
7. Medina, P.; Carrasco, S.; Correa-Burrows, P.; Rogan, J.; Valdivia, J.A. Nontrivial and anomalous transport on weighted complex networks. Commun. Nonlinear Sci. Numer. Simul. 2022, 114, 106684.
8. Weiss, M.; Elsner, M.; Kartberg, F.; Nilsson, T. Anomalous subdiffusion is a measure for cytoplasmic crowding in living cells. Biophys. J. 2004, 87, 3518–3524.
9. Gupta, S.; Biehl, R.; Sill, C.; Allgaier, J.; Sharp, M.; Ohl, M.; Richter, D. Protein entrapment in polymeric mesh: Diffusion in crowded environment with fast process on short scales. Macromolecules 2016, 49, 1941–1949.
10. Grimaldo, M.; Roosen-Runge, F.; Zhang, F.; Schreiber, F.; Seydel, T. Dynamics of proteins in solution. Q. Rev. Biophys. 2019, 52, e7.
11. Basak, S.; Sengupta, S.; Chattopadhyay, K. Understanding biochemical processes in the presence of sub-diffusive behavior of biomolecules in solution and living cells. Biophys. Rev. 2019, 11, 851–872.
12. Combinido, J.S.L.; Lim, M.T. Crowding effects in vehicular traffic. PLoS ONE 2012, 7, e48151.
13. Foroozani, A.; Ebrahimi, M. Anomalous information diffusion in social networks: Twitter and Digg. Expert Syst. Appl. 2019, 134, 249–266.
14. Evangelista, L.R.; Lenzi, E.K. Fractional Diffusion Equations and Anomalous Diffusion; Cambridge University Press: Cambridge, UK, 2018.
15. Sokolov, I.M.; Klafter, J.; Blumen, A. Fractional kinetics. Phys. Today 2002, 55, 48–54.
16. Caputo, M. Linear models of dissipation whose Q is almost frequency independent. Ann. Geophys. 1966, 19, 383–393.
17. Abadias, L.; Estrada-Rodriguez, G.; Estrada, E. Fractional-order susceptible-infected model: Definition and applications to the study of COVID-19 main protease. Fract. Calc. Appl. Anal. 2020, 23, 635–655.
18. D’Alessandro, M.; Van Mieghem, P. Fractional derivative in continuous-time Markov processes and applications to epidemics in networks. Phys. Rev. Res. 2025, 7, 013017.
19. Estrada, E. Fractional diffusion on the human proteome as an alternative to the multi-organ damage of SARS-CoV-2. Chaos Interdiscip. J. Nonlinear Sci. 2020, 30, 081104.
20. Sun, W.; Li, Y.; Li, C.; Chen, Y. Convergence speed of a fractional order consensus algorithm over undirected scale-free networks. Asian J. Control 2011, 13, 936–946.
21. Yan, X.; Li, K.; Yang, C.; Zhuang, J.; Cao, J. Consensus of fractional-order multi-agent systems via observer-based boundary control. IEEE Trans. Netw. Sci. Eng. 2024, 11, 3370–3382.
22. Sun, F.; Han, Y.; Zhu, W.; Kurths, J. Group consensus for fractional-order heterogeneous multi-agent systems under cooperation-competition networks with time delays. Commun. Nonlinear Sci. Numer. Simul. 2024, 133, 107951.
23. Huang, C.; Wang, F. Distributed Consensus Tracking of Incommensurate Heterogeneous Fractional-Order Multi-Agent Systems Based on Vector Lyapunov Function Method. Fractal Fract. 2024, 8, 575.
24. Lu, J.; Shen, J.; Cao, J.; Kurths, J. Consensus of networked multi-agent systems with delays and fractional-order dynamics. In Consensus and Synchronization in Complex Networks; Springer: Berlin/Heidelberg, Germany, 2012; pp. 69–110.
25. Cao, Y.; Li, Y.; Ren, W.; Chen, Y. Distributed coordination of networked fractional-order systems. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2009, 40, 362–370.
26. Pang, G.; Lu, L.; Karniadakis, G.E. fPINNs: Fractional physics-informed neural networks. SIAM J. Sci. Comput. 2019, 41, A2603–A2626.
27. Joshi, M.; Bhosale, S.; Vyawahare, V.A. A survey of fractional calculus applications in artificial neural networks. Artif. Intell. Rev. 2023, 56, 13897–13950.
28. Kim, H.; Lawley, S.D. Cover times of many diffusive or subdiffusive searchers. SIAM J. Appl. Math. 2024, 84, 602–620.
29. Jung, W.; Chen, T.Y.; Santiago, A.G.; Chen, P. Memory effects of transcription regulator-DNA interactions in bacteria. Proc. Natl. Acad. Sci. USA 2024, 121, e2407647121.
30. Sandev, T.; Tomovski, Z.; Dubbeldam, J.L.; Chechkin, A. Generalized diffusion-wave equation with memory kernel. J. Phys. A Math. Theor. 2018, 52, 015201.
31. Gurtin, M.E.; Pipkin, A.C. A general theory of heat conduction with finite wave speeds. Arch. Ration. Mech. Anal. 1968, 31, 113–126.
32. Ponce, R. Discrete Subdiffusion Equations with Memory. Appl. Math. Optim. 2021, 84, 3475–3497.
33. Saif, S.; Malik, S. An inverse problem for a two-dimensional diffusion equation with arbitrary memory kernel. Math. Methods Appl. Sci. 2023, 46, 11007–11020.
34. Trimper, S.; Zabrocki, K.; Schulz, M. Memory-controlled diffusion. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 2004, 70, 056133.
35. Toan, N.D.; Thuy, L.T. The nonclassical diffusion equations with time-dependent memory kernels and a new class of nonlinearities. Glasg. Math. J. 2022, 64, 716–733.
36. Hoffmann, T.; Porter, M.A.; Lambiotte, R. Random walks on stochastic temporal networks. In Temporal Networks; Springer: Berlin/Heidelberg, Germany, 2013; pp. 295–313.
37. Lambiotte, R.; Salnikov, V.; Rosvall, M. Effect of memory on the dynamics of random walks on networks. J. Complex Netw. 2015, 3, 177–188.
38. Scholtes, I.; Wider, N.; Pfitzner, R.; Garas, A.; Tessone, C.J.; Schweitzer, F. Causality-driven slow-down and speed-up of diffusion in non-Markovian temporal networks. Nat. Commun. 2014, 5, 5024.
39. Kosztołowicz, T.; Dworecki, K.; Mrówczyński, S. Measuring subdiffusion parameters. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 2005, 71, 041105.
40. Kepten, E.; Weron, A.; Sikora, G.; Burnecki, K.; Garini, Y. Guidelines for the fitting of anomalous diffusion mean square displacement graphs from single particle tracking experiments. PLoS ONE 2015, 10, e0117722.
41. Gallos, L.K.; Song, C.; Havlin, S.; Makse, H.A. Scaling theory of transport in complex biological networks. Proc. Natl. Acad. Sci. USA 2007, 104, 7746–7751.
42. Applebaum, D. Lévy Processes and Stochastic Calculus, 2nd ed.; Cambridge Studies in Advanced Mathematics; Cambridge University Press: Cambridge, UK, 2009; Volume 116.
43. Bouchaud, J.P.; Georges, A. Anomalous diffusion in disordered media: Statistical mechanisms, models and physical applications. Phys. Rep. 1990, 195, 127–293.
44. Polanowski, P.; Sikorski, A. Simulation of diffusion in a crowded environment. Soft Matter 2014, 10, 3597–3607.
45. Meinecke, L. Multiscale modeling of diffusion in a crowded environment. Bull. Math. Biol. 2017, 79, 2672–2695.
46. Fanelli, D.; McKane, A.J. Diffusion in a crowded environment. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 2010, 82, 021113.
47. Goychuk, I. Viscoelastic subdiffusion in a random Gaussian environment. Phys. Chem. Chem. Phys. 2018, 20, 24140–24155.
48. Chauhan, T.; Kalyanaraman, K.; Sircar, S. Quantifying macrostructures in viscoelastic sub-diffusive flows. J. Math. Phys. 2024, 65, 073101.
49. Goychuk, I.; Pöschel, T. Fingerprints of viscoelastic subdiffusion in random environments: Revisiting some experimental data and their interpretations. Phys. Rev. E 2021, 104, 034125.
50. Saxton, M.J. Anomalous subdiffusion in fluorescence photobleaching recovery: A Monte Carlo study. Biophys. J. 2001, 81, 2226–2240.
51. Lim, S.C.; Muniandy, S.V. Self-similar Gaussian processes for modeling anomalous diffusion. Phys. Rev. E 2002, 66, 021114.
52. Mainardi, F.; Mura, A.; Pagnini, G. The M-Wright Function in Time-Fractional Diffusion Processes: A Tutorial Survey. Int. J. Differ. Equ. 2010, 2010, 104505.
53. Schilling, R.L.; Song, R.; Vondraček, Z. Bernstein Functions: Theory and Applications, 2nd ed.; De Gruyter Studies in Mathematics; De Gruyter: Berlin, Germany, 2012; Volume 37.
54. Gorenflo, R.; Kilbas, A.A.; Mainardi, F.; Rogosin, S.V. Mittag-Leffler Functions, Related Topics and Applications; Springer Monographs in Mathematics; Springer: Berlin, Germany, 2014.
55. Mainardi, F. Fractional Calculus and Waves in Linear Viscoelasticity: An Introduction to Mathematical Models; Imperial College Press: London, UK, 2010.
56. Meerschaert, M.M.; Sikorskii, A. Stochastic Models for Fractional Calculus; De Gruyter Studies in Mathematics; De Gruyter: Berlin, Germany, 2011; Volume 43.
57. Trefethen, L.N.; Weideman, J.A.C. The Exponentially Convergent Trapezoidal Rule. SIAM Rev. 2014, 56, 385–458.
58. Coifman, R.R.; Lafon, S.; Lee, A.B.; Maggioni, M.; Nadler, B.; Warner, F.; Zucker, S.W. Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proc. Natl. Acad. Sci. USA 2005, 102, 7426–7431.
59. Coifman, R.R.; Lafon, S. Diffusion maps. Appl. Comput. Harmon. Anal. 2006, 21, 5–30.
60. Estrada, E. The communicability distance in graphs. Linear Algebra Its Appl. 2012, 436, 4317–4328.
61. Estrada, E.; Sanchez-Lirola, M.; De La Peña, J.A. Hyperspherical embedding of graphs and networks in communicability spaces. Discret. Appl. Math. 2014, 176, 53–77.
62. Markvorsen, S. Minimal webs in Riemannian manifolds. Geom. Dedicata 2008, 133, 7–34.
63. Bridson, M.R.; Haefliger, A. Metric Spaces of Non-Positive Curvature; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013; Volume 319.
64. Silver, G.; Akbarzadeh, M.; Estrada, E. Tuned communicability metrics in networks. The case of alternative routes for urban traffic. Chaos Solitons Fractals 2018, 116, 402–413.
65. Baeumer, B.; Meerschaert, M.M. Stochastic solutions for fractional Cauchy problems. Fract. Calc. Appl. Anal. 2001, 4, 481–500.
66. Meerschaert, M.M.; Nane, E.; Vellaisamy, P. The fractional Poisson process and the inverse stable subordinator. Electron. J. Probab. 2011, 16, 1600–1620.
67. Odibat, Z. Approximations of fractional integrals and Caputo fractional derivatives. Appl. Math. Comput. 2006, 178, 527–533.
68. Kivelä, M.; Arenas, A.; Barthelemy, M.; Gleeson, J.P.; Moreno, Y.; Porter, M.A. Multilayer networks. J. Complex Netw. 2014, 2, 203–271.
69. Boccaletti, S.; Bianconi, G.; Criado, R.; Del Genio, C.I.; Gómez-Gardenes, J.; Romance, M.; Sendina-Nadal, I.; Wang, Z.; Zanin, M. The structure and dynamics of multilayer networks. Phys. Rep. 2014, 544, 1–122.
70. Gomez, S.; Diaz-Guilera, A.; Gomez-Gardenes, J.; Perez-Vicente, C.J.; Moreno, Y.; Arenas, A. Diffusion dynamics on multiplex networks. Phys. Rev. Lett. 2013, 110, 028701.
71. Li, L.; Zhou, X.; Gao, H. The stability and exponential stabilization of the heat equation with memory. J. Math. Anal. Appl. 2018, 466, 199–214.

Figure 1. Heatmaps illustrating the scalar error of the SOE approximation on a base-10 logarithmic scale, plotted as a function of the number of addends J and time t. The underlying network is an Erdős–Rényi (ER) random graph with 250 vertices and 1000 edges. Results are shown for fractional orders (a)

α = 0.8

, (b)

α = 0.5

, and (c)

α = 0.25

.

Figure 2. (A) Relative (blue) and mass (orange) errors of the probing vector

u (t)

between the exact Mittag-Leffler solution and the SOE approximation with

J = 5

. (B) The time-averaged relative (blue) and operator (green) errors are shown as a function of J. The average is computed over 300 time steps in the interval

0.1 \leq t \leq 1000

.

Figure 3. (a) Illustration of all topological shortest paths between a fixed pair of vertices in the example Gabriel graph, colored by average edge degree. (b–e) Shortest paths of SOE approximation for

J = 1, 10, 20, 40

, respectively, with

α = 0.85

. (f) Subdiffusive shortest path between the same vertex pair derived from the Mittag-Leffler function. In (b–f), the paths are colored according to the observation time (

0.1 \leq t \leq 1000

, sampled in 300 steps). The edge thickness is proportional to the frequency with which an edge is traversed by the corresponding paths.

Figure 4. Comparison of the average Levenshtein distance between the shortest subdiffusive paths and the topologically shortest path in the example Gabriel graph, for five values of

α

. Average over 300 time instants in the interval

0.1 \leq t \leq 1000

.

Table 1. The ten largest coefficients in magnitude for the

a_{j}

of the SOE approximation in the general case of Corollary 1 with

α = 0.85

. These coefficients are displayed for the SOE approximation on the example Gabriel graph.

j	1	2	3	4	5	6	7	8	9	10
$a_{j}$	0.286	0.269	0.167	0.094	0.056	0.035	0.0234	0.016	0.011	0.008
$b_{j}$	1.19	1.53	0.93	0.725	0.57	0.44	0.34	0.27	0.21	0.16
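
As a rough illustration of how the tabulated values might be used, the sketch below evaluates a truncated sum of exponentials from the ten entries of Table 1. It assumes, as a labeled assumption rather than the paper's stated formula, that the approximation has the form $\sum_{j} a_{j} e^{-b_{j} x}$ with $x \geq 0$ standing for $λ t^{α}$; the precise scaling is the one defined in Section 3. Only ten coefficients are listed here, so this is a truncation and not the full approximation. The name `truncated_soe` is hypothetical.

```python
import math

# Values copied from Table 1 (alpha = 0.85): ten largest a_j and their b_j.
A = [0.286, 0.269, 0.167, 0.094, 0.056, 0.035, 0.0234, 0.016, 0.011, 0.008]
B = [1.19, 1.53, 0.93, 0.725, 0.57, 0.44, 0.34, 0.27, 0.21, 0.16]


def truncated_soe(x: float) -> float:
    """ASSUMED form: sum_j a_j * exp(-b_j * x), with x standing for lambda * t**alpha."""
    return sum(a * math.exp(-b * x) for a, b in zip(A, B))


# Total weight carried by the ten tabulated coefficients.
weight_sum = sum(A)

# Value of the truncated sum at x = 0 (equals weight_sum) and at x = 1.
value_at_zero = truncated_soe(0.0)
value_at_one = truncated_soe(1.0)
```

The ten tabulated weights add up to about 0.965. Under the assumed form, that is also the value of the truncated sum at $x = 0$.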
