On the Construction of Some Deterministic and Stochastic Non-Local SIR Models

: Fractional-order epidemic models have become widely studied in the literature. Here, we consider the generalization of a simple SIR model in the context of generalized fractional calculus and we study the main features of such model. Moreover, we construct semi-Markov stochastic epidemic models by using time changed continuous time Markov chains, where the parent process is the stochastic analog of a simple SIR epidemic. In particular, we show that, differently from what happens in the classic case, the deterministic model does not coincide with the large population limit of the stochastic one. This loss of ﬂuid limit is then stressed in terms of numerical examples.


Introduction
Starting from the seminal paper by Kermack and McKendrick [1], epidemic models have been a fruitful field of applications of dynamical systems. However, stochastic models for epidemic were already proposed by McKendrick in [2]. Starting from these papers, different epidemic models, both deterministic and stochastic, have been considered. The approach we focus on in this paper is the Continuous-Time Markov Chain one, used for instance by [3] and then successively extended to different deterministic models (see, e.g., [4,5] and references therein). For more recent works, we refer, for instance, to those of Ball [6] (in which the authors describe some exact results on the distribution of final size of the epidemics), Lefèvre and Simon [7] (for a matrix analysis approach), Ball et al. [8] (in which social distancing is modeled in terms of adaptive random graphs), and Liu et al. [9] (in which the authors modeled environmental noise via Lévy jumps).
In modern times, epidemic models with information variables have been considered, in the context of behavioral epidemiology (to cite a contemporary paper, see [10], which deals with the COVID-19 pandemic). The information variable in behavioral epidemiology is a first important example of the introduction of memory effects in deterministic epidemic models (see, e.g., [11][12][13]). Indeed, the information variable can be modeled as the convolution product between a function of the state of the system (representing the information that individuals consider relevant) and a memory kernel (representing the weight that individuals give to the past history of the system).
In stochastic epidemic modeling, instead, the necessity of introducing some memory effects is solved by considering semi-Markov epidemic models (in place of Markov ones), which, however, are more difficult to handle (see, e.g., [14]).
With the idea of introducing memory effects in mind, we can consider the case of non-integer order epidemic models. Indeed, fractional calculus operators (e.g., the Caputo fractional derivative) are non-local convolution operators that weight in some sense the variation of a function during its entire lifetime. To cite some fractional-order epidemic models, let us recall the works of Diethelm

Inverse Subordinators and Non-Local Convolution Derivatives
Before introducing the actual model, let us give some preliminaries on the non-local operators we use in the following. Let us fix in the whole text a filtered probability space (Ω, F , F t , P). Definition 1. We say that Φ : [0, +∞) → R is a Bernstein function if Φ ∈ C ∞ (R + ) (where we denote R + = (0, +∞)), Φ(λ) ≥ 0 for any λ ≥ 0 and, for any n ∈ N and λ ≥ 0, For generalities on Bernstein functions, we refer to [32]. Let us denote by BF the convex cone of Bernstein functions. For any Φ ∈ BF , there exists a unique triplet (a Φ , b Φ , ν Φ ), where a Φ , b Φ ≥ 0 and ν Φ (dt) is a measure on (0, +∞) with the integrability property The triple (a Φ , b Φ , ν Φ ) is said to be the characteristic triple of Φ and ν Φ is the Lévy measure of Φ.
From now on, let us denoteν Φ (t) = ν Φ (t, +∞) and g Φ (ds; y) the law of the random variable σ Φ (y). In [34], it is shown that L Φ (t) admits a probability density function f Φ (s; t) given by Moreover, denoting by L t→λ the Laplace transform operator, it holds that for any λ > 0. Now, we can introduce the non-local convolution derivatives (of Caputo type) associated with Φ, as given in [26,27].
Definition 3. Let u : R + → R be an absolutely continuous function. Then, we define the non-local convolution derivative induced by Φ of u as

Remark 1.
In analogy of what is usually done with Riemann-Liouville and Caputo derivatives, one can introduce the following regularized non-local derivative (4) observing that it coincides with the previous definition on absolutely continuous functions.
Let us recall that, if both u and d Φ u dt Φ are Laplace transformable, then We are interested in the eigenfunctions of such operators. It can be shown, by Laplace transform arguments (see, e.g., [28,35]) or Green functions arguments (see [36]), that the (eigenvalue) Cauchy problem admits a unique solution for any λ ∈ R and it is given by e Φ (t; λ) := E[e λE Φ (t) ] (hence, in particular, it is a completely monotone function in the variable −λ ∈ [0, +∞) for fixed t).
We are also interested in the following subclass of Bernstein functions.
is still a Bernstein function, called the conjugate Bernstein function.
Let us denote by SBF the class of special Bernstein functions. If Φ ∈ SBF , we call σ Φ a special subordinator. Let us also introduce the renewal function of σ Φ Under the assumption a Φ = b Φ = 0 and ν Φ (0, +∞) = +∞, if Φ ∈ SBF , we know there exists a non-increasing non-negative function u Φ (t) such that In [38], it is shown that, for any absolutely continuous function f (t), it holds that In particular, if g is Laplace transformable, it holds that By using such property, we can show the following comparison principle.
, y(t) are Laplace transformable with their non-local convolution derivatives and such that x(0) ≤ y(0) and for some function ξ(t) ≥ 0 that is Laplace transformable by hypotheses. Let us take the Laplace transform on both sides to obtain Dividing everything by Φ(λ), we have Now, let us take the inverse Laplace transform to obtain concluding the proof.
From now on, let us consider Φ ∈ SBF . We can also show a generalized mean value theorem, following the lines of ([39] Theorem 1).
and consider t ∈ (0, T]. Then, there exists ξ ∈ (0, t) such that Proof. Let us observe that, by definition, By the integral mean value theorem, we know that there exists ξ ∈ (0, t) such that On the other hand, we have Let us finally recall the following definition, which is useful in what follows.
We say that f is regularly varying at 0 + of order α ∈ R if f (1/t) is regularly varying at infinity of order α.

A Non-Local SIR Model
Let us consider the following model where β, γ > 0 are, respectively, the infection parameter and the removal parameter. A schematization of the compartments of the model is given in Figure 1. This model constitute a non-local counterpart to the simple SIR model without population dynamics. By existence and uniqueness of uniformly bounded solutions (see [28]), we have that where N is a fixed constant. Let us set for simplicity N = 1. In such way, we can consider S Φ (t), I Φ (t), and R Φ (t) to be, respectively, the density of susceptible, infected, and removed entities in the population. In particular, we can recover R Φ (t) = 1 − S Φ (t) − I Φ (t), thus we only need to study the dynamics of S Φ (t) and I Φ (t), through the system As a first step, we need to show that both S Φ (t) and I Φ (t) are non-negative if we start from initial datum S 0 , I 0 > 0. To do this, let us first observe that, by existence and uniqueness of uniformly bounded solutions (see [28] Corollary 4.6 for the linear case: the proof can be easily adapted to the bilinear case), we have that the equilibrium points of the dynamical system are of the form (S * , 0). Indeed, if (S(0), I(0)) = (S * , 0), then the unique solution of the system is given by (S Φ (t), I Φ (t)) ≡ (S * , 0). Now, we can show the following proposition.
Proof. Let us first observe that (S Φ (t), I Φ (t)) cannot leave the R 2 + quadrant (where R 2 + = {(x, y) ∈ R 2 : x > 0, y > 0}) through a point of the form (S * , 0) for S * ≥ 0 by existence and uniqueness of uniformly bounded solutions. To show that if we start from (S 0 , I 0 ) ∈ R 2 + we cannot leave the R 2 + quadrant through a point (0, I * ) with I * > 0, let us show that for any I * > 0 we can construct a solution (S Φ (t), I Φ (t)) starting from a certain (0, I 0 ) with I 0 > 0 and such that S Φ (t) ≡ 0 and I(t * ) = I * for any fixed t * > 0.
Proof. By the previous proposition, we know that I Φ (t) > 0 for any t > 0. Moreover, since S 0 + I 0 = 1, we have that R(0) = R 0 = 0. By the comparison principle in Proposition 1, we know that R Φ (t) ≥ 0 for any t > 0. We conclude the proof observing that this implies S Φ (t) + I Φ (t) ≤ 1.

Remark 4.
In the case R 0 > 1, we have that, at least for short times, by the comparison principle, I Φ (t) ≥ I 0 . However, we could conclude that even in this case lim t→+∞ I Φ (t) = 0 if we are able to show that there exists a time s > 0 for which θ 1 (t, s, ψ(·, (S 0 , I 0 ))) < γ β . This new condition (with respect to the ordinary case) is due to the lack of semigroup property of the flow ψ(·, (S 0 , I 0 )), together with the fact that we do not have any asymptote theorem for non-local derivatives. However, numerical examples, at least in the fractional-order case, seem to confirm the fact that lim t→+∞ I Φ (t) = 0 even if R 0 > 1.

Some Generalities on the Stochastic SIR Model
Let us now give some generalities on the (simple) stochastic SIR model. We prevalently follow the lines of Daley and Gani [4]. First, let us construct a SIR model by denoting with S 1 (t) the number of susceptible individuals, I 1 (t) the number of infective individuals, and R 1 (t) the number of removed individuals at time t > 0. Suppose the population is closed, that is to say there exists a constant N ∈ N such that . Thus, we are interested in the bivariate process (S 1 (t), I 1 (t)), which is a Continuous Time Markov Chain. Let us consider as state space the set Now, let us define β, γ > 0 as the infection and removal parameters, respectively, as done in the previous section. Let us denote the transition probability functions as Then, for any (s, i) ∈ S and (s 2 , These transition probability functions are compatible with the choice of the state space, since it is easy to see that S 1 (t) + I 1 (t) ≤ N and S 1 (t), I 1 (t) ≥ 0 for any t ≥ 0. A graphical schematization of the bivariate chain (S 1 (t), I 1 (t)) is given in Figure 2.
Schematization of a node for the stochastic SIR model without population dynamics.
In the following, let us fix two constants S 0 , I 0 ∈ N such that N = S 0 + I 0 and let us define the set S * = S \ {(S 0 , I 0 )}. For the bivariate process (S 1 (t), I 1 (t)), we can define the state probabilities p 1 (t, s, i) = p 1 (t, (s, i); 0, (S 0 , I 0 )), (s, i) ∈ S, that solve the following forward Kolmogorov equations If we introduce the probability generating function then we have the following proposition (see [4]).

Proposition 4.
The functions (p 1 (·, s, i)) (s,i)∈S are solutions of (8) if and only if ϕ 1 is solution of the following initial value problem: Let us also observe that this process admits (s, 0) as absorbent states for any 0 ≤ s ≤ S 0 and does not admit any recurrent state. In particular, (S 1 (t), I 1 (t)) becomes almost surely definitely constant. In particular, for a fixed ω ∈ Ω, we have that (S 1 (t, ω), Let us define the ultimate size of the epidemic as I tot (ω) = S 0 − S ∞ (ω) and the intensity of the epidemic as I rel = I tot S 0 . We obtain the distribution (P ∞ (s)) s≤S 0 of the random variable I tot by the formula On the other hand, let us denote by π ∞ the cumulative distribution function of I rel , i.e., If there exists ξ ∈ (0, 1) such that π ∞ (ξ) < 1, then the event that at least one individual has been infected occurs with non-null probability. For this reason, we say that a major outbreak occurs (for the initial datum (S 0 , I 0 ) and the couple of parameter (β, γ)) if and only if there exists ξ ∈ (0, 1) such that π ∞ (ξ) < 1. Let us also recall that, since I tot ≤ S 0 , I rel ≤ 1 and π ∞ (1) = 1. On the other hand, if for any ξ ∈ (0, 1] it holds that π ∞ (ξ) = 1, then, by continuity, π ∞ (0) = 1 and then I rel = 0 almost surely.
A closed form of π ∞ is not known in general. However, it is still useful to obtain some bounds on π ∞ , depending on the basic reproductive number R 0 , which is defined in the stochastic model in the same way as stated above. In particular, the following threshold theorem holds (see [40]).

Theorem 2.
Let us consider a stochastic SIR model with initial datum (S 0 , I 0 ) and basic reproductive number R 0 . Consider ξ ∈ (0, 1). • This threshold theorem is improved, e.g., in [41], but we focus only on a particular consequence, which is commonly called R 0 -dogma.

Corollary 2. A major outbreak occurs if and only if
Let us recall that this is true also in the deterministic case (both ordinary and non-local), where with the statement "a major outbreak occurs" we intend that at a certain time t > 0 it holds that The duration of the epidemics is defined as and is almost surely finite with finite mean. Indeed, since (S 1 (t), I 1 (t)) is a continuous time Markov chain, it is easy to see that T 1 can be controlled almost surely by an Erlang random variable with shape parameter 2N − 1 and rate γ. This is because, if we define T 1 as the sojourn time of (S 1 (t), I 1 (t)) in the state (S 1 (0), I 1 (0)), we have where for which r(s, i) ≥ γ for any (s, i) ∈ S with i = 0. The conditional distribution of any sojourn time is given by Equation (10) by using the Markov property of (S 1 (t), I 1 (t)) in the jump times. On the other hand, we can consider the state space S as the set of vertices of an oriented graph G = (S, E ) where for any (s, i) ∈ S the edges starting from (s It is easy to see that the maximal length of paths in G is 2N − 1. Thus, T 1 is controlled by the time of 2N − 1 jumps (which is the worst case scenario) with rate γ (which is the lowest possible rate). Let us finally recall that, as usual for continuous time Markov chains, we can consider the jump chain embedded in (S 1 (t), I 1 (t)). In particular, the distributions of S ∞ , I tot and I rel depend only on the jump chain.

The Time-Changed Stochastic SIR Model
Now, we want to introduce our time-changed stochastic SIR model. To do this, let us consider the stochastic SIR process (S 1 (t), I 1 (t)) discussed in the previous section. Let Φ ∈ SBF and consider the inverse subordinator L Φ (t) independent of (S 1 (t), I 1 (t)). Definition 6. The time-changed stochastic SIR process induced by (S 1 (t), I 1 (t)) and Φ is defined as We call (S 1 (t), I 1 (t)) the parent process.
Let us first remark that such kinds of processes are not Markov processes but only semi-Markov (see, e.g., [42][43][44][45]). Thus, in particular, Markov property still holds in correspondence of the jump times, leading to the definition of the same embedded jump chain, independently of the choice of Φ ∈ SBF . This implies that the distributions of the random variable S ∞ , I tot and I rel do not depend on the choice of the Bernstein function Φ and coincide with the ones of the Markov case. Thus, both Whittle's threshold theorem (i.e., Theorem 2) and the R 0 -dogma (i.e., Corollary 2) still hold true. What actually changes is the transient state. This can be first seen by the fact that the distribution of the sojourn time depend on the choice of Φ by means of the eigenfunctions of the operator ∂ Φ ∂t Φ .
Proof. Let us first observe that, by definition, it holds that However, since σ Φ and T 1 are independent and σ Φ does not admit any fixed discontinuity point, by a simple conditioning argument, one obtains Since T 1 and L Φ (t) are independent, we obtain where we use Equation (10) and the definition of e Φ .
Despite the process is not Markov, we can still define the state probabilities as Arguing as above, we can show the following integral representation.
Proof. Let us observe that L Φ (0) = 0 almost surely and it is independent of (S 1 (t), I 1 (t)), hence it holds that As done in the classical case, we can introduce the probability generating function for which, by linearity, the following integral representation holds In the next subsection, we obtain the non-local forward Kolmogorov equations for the process (S Φ (t), I Φ (t)).

The Non-Local Forward Kolmogorov Equations
The first thing we have to do is to establish an equivalence between the (non-local) linear system and the (non-local) partial differential equation Here, as solution of Equation (17) (and analogously of Equation (16)), we intend a function ϕ Φ that satisfies both the initial condition and the equation in (17) pointwise. Before doing this, we need the following easy technical Lemma. Lemma 2. Let (p Φ (t, s, i)) (s,i)∈S be solutions of Equation (16) and p Φ (t, s, i) ≡ 0 for any (s, i) ∈ S. Suppose I 0 > 0. Then, for any integer 1 ≤ k ≤ I 0 , it holds that p Φ (t, S 0 + k, I 0 − k) ≡ 0.
Proof. Let us argue by induction on m = I 0 − k. Let us first consider m = 0. Then, k = I 0 . Observing that (S 0 + I 0 , 0) = (N, 0) ∈ S, we have that However, (N, 1) ∈ S, thus we actually have Thus, we can rewrite the equation as Now, we can show the equivalence between Equations (16) and (17).
On the other hand, multiplying the first equation of (16) by z S 0 w I 0 and the second one by z s w i , and then summing over (s, i) ∈ S, we have, by linearity of d Φ dt Φ , Recalling that (S 0 , I 0 + 1) ∈ S, we have p Φ (t, S 0 , I 0 + 1) ≡ 0. Moreover, by Lemma 2, we know that p Φ (t, S 0 + 1, I 0 − 1) ≡ 0. Thus, the previous equality is equivalent to Thus, we have showing that ϕ Φ is solution of (17). Conversely, if ϕ Φ is solution of (17), then, since we have that p Φ (0, S 0 , I 0 ) = 1 and p Φ (0, s, i) = 0 for (s, i) ∈ S * . Moreover, by equations (17) and (18), we have However, by linearity thus, by identity of polynomials, we conclude the proof. Now that we have this equivalence result, we can directly work with ϕ Φ to show that Equation (16) are the actual non-local forward Kolmogorov equations of the coupled process (S Φ (t), I Φ (t)). Indeed, we have: (14) is solution of (17).
Proof. Let us recall Equation (15) and the fact that p Φ (0, s, i) = p 1 (0, s, i), since L Φ (0) = 0 almost surely. This implies that ϕ Φ (0, s, i) = ϕ 1 (0, s, i) = z S 0 w I 0 . On the other hand, for fixed z, w ∈ R, we have that ϕ Φ (t, z, w) is Laplace transformable since p Φ (t, s, i) are non-negative and bounded for t ∈ [0, +∞). Let us denote by ϕ Φ (λ, z, w) the Laplace transform of ϕ Φ (t, z, w) with respect to the variable t, which is well-defined for λ ∈ C such that (λ) > 0. By Fubini's theorem, we have Now, let us observe that ϕ 1 (t, z, w) is differentiable in t and lim t→+∞ ϕ 1 (t, z, w)e −tΦ(λ) = 0 for any z, w ∈ R and λ ∈ C such that (λ) > 0 (since, in such case, (Φ(λ)) > 0). Thus, we can integrate by parts to obtain Now, by Equation (9), we have Writing 1 λ z S 0 w I 0 = +∞ 0 e −λt z S 0 w I 0 dt and observing that ϕ 1 (t, z, w) is a polynomial in z and w, thus we can exchange the derivative and the integral sign, we have Multiplying everything by Φ(λ) (that is not 0 as (λ) > 0), we have As above, since ϕ Φ (t, z, w) is a polynomial in z, w, we can exchange the derivative sign with the Laplace transform obtaining Now, let us divide everything by λ to achieve Recalling Equation (2), we have thus, by injectivity of the Laplace transform, Since we writeν Φ * (ϕ Φ (·, z, w) − ϕ Φ (0, z, w))(t) as an integral, we know that it is absolutely continuous, hence we can differentiate both sides for almost any t > 0, obtaining that the equation of the Cauchy problem (17) holds for almost any t > 0.
Let us observe that this implies that (p Φ (t, s, i)) (s,i)∈S satisfies (16) for almost any t > 0. However, Equation (16) admits a unique continuous solution (see, e.g., [28]), thus (p Φ (t, s, i)) (s,i)∈S is continuous in t > 0 and so does ϕ Φ (t, z, w). From this, we have also that the function is continuous. Thus, one can differentiate on both sides of Equation (19) for any t > 0, concluding the proof.

Remark 5.
In the previous theorem, we also prove that (p Φ (t, s, i)) (s,i)∈S is the unique continuous solution of (16).

The Equation of the Mean
Let us recall that the state space S is finite, thus the process (S Φ (t), I Φ (t)) admits moments of any order. In particular, we have the following relations: By using these relations, we obtain a (non-local) Cauchy problem for the functions E[S Φ (t)] and E[I Φ (t)].

Corollary 3. The functions t → E[S Φ (t)] and t → E[I Φ (t)] are solutions of
Proof. Since ϕ Φ (t, z, w) is a polynomial in z, w, one easily obtains and the same holds for ∂ ∂w and ∂ 2 ∂z ∂w .
First, let us observe that, by definition, E[S Φ (0)] = S 0 and E[I Φ (0)] = I 0 . Now, let us consider the Cauchy problem (17) and differentiate the first equation with respect to z to obtain Substituting z = w = 1, we have However, recalling that we have Now, let us consider again Equation (17) and differentiate both sides with respect to w to achieve As above, substituting w = z = 1, we have and then, by using again relation (21), we get concluding the proof.

Remark 6.
Let us observe that, if we neglect the covariance term, Equations (20) and (7) coincide. The same happens in the classical case. However, the results in [46] can be used to show the convergence for big population of (S 1 (t), I 1 (t)) to the solution of (7) (for Φ(λ) = λ, i.e., the classical SIR model). More details are contained in [47].
On the other hand, in [48,49], the authors proved the mean-field convergence of some stochastic epidemic models (in particular, in [49], the SIR model) to their deterministic counterpart with an elementary proof.
Giving an interpretation of the aforementioned convergence result, we could say that considering a very big population we decouple in a certain sense the two processes S 1 (t) and I 1 (t), in such a way that any individual is independent from each other. This principle is typical of the mean field approach, as for instance one can see in the seminal theory of mean field games (see [50,51]). However, as we show in Section 6, this is not the case of the time-changed SIR model.

Delaying the Epidemic
Before exploiting the mean field convergence results, let us make some considerations on the outbreak duration. Indeed, we observe above that both Whittle's threshold theorem and the R 0 -dogma still hold in this case, since the jump chain of the process is left invariant by the time-change.
However, one thing that drastically changes is the outbreak duration. Let us denote by T Φ the outbreak duration in the time-changed case, i.e.
We observe above that T 1 is almost surely finite with finite mean. Concerning T Φ , we have quite a different behavior. Proposition 6. Let Φ be regularly varying at 0 + of order α ∈ [0, 1). Then, the random variable T Φ is almost surely finite and it holds that as t → +∞, where T 1 is the outbreak duration of the parent process.
Proof. Let us first show that T Φ is almost surely finite. To do this, let us just observe that we can (stochastically) control T Φ with the sum of 2N − 1 copies of a variable τ Φ whose distribution is given by Indeed, we can consider the oriented graph G = (S, E ) defined in Section 4 and observe that the maximum length of a path in G is given by 2N − 1. Moreover, by complete monotonicity of the function λ → e Φ (t; −λ) and the fact that r(s, i) ≥ γ for any (s, i) ∈ S with i = 0, it holds, by Proposition 5, that for any (s, i) ∈ S with i = 0. Thus, denoting by the usual stochastic ordering, we have that T Φ τ Φ . Moreover, being the coupled process (S Φ (t), I Φ (t)) a semi-Markov process, we can conclude that any sojourn time is stochastically less than τ Φ . Thus, we have that T Φ is made, in the worst case scenario, of 2N − 1 sojourn times, each one controlled by an independent copy of τ Φ , i.e.
Φ is the sum of 2N − 1 (hence, a finite number) of independent almost surely finite random variables, hence T Φ is also almost surely finite. Now, let us show Equation (22). Let us observe that we can equip the state space S with the discrete topology τ = P (S), where P (S) is the set of all subsets of S. Thus, we have that the set U = {(s, i) ∈ S : i > 0} is an open set and we can see T Φ as the first exit time of (S Φ (t), I Φ (t)) from U . Thus, we obtain Equation (22)

Remark 7.
Let us observe that the hypothesis of regular variation comes into play only in the second part of the proof, hence we can conclude that T Φ is almost surely finite even if Φ is not regularly varying. Moreover, the asymptotic result on P(T Φ > t) implies that E[T Φ ] = +∞.
The latter result implies that, in a certain sense, the time-change procedure we applied lead to a delay in the epidemics, which is something expected since the new clock L Φ (t) admits segments of constancy.

Large Population Limits
In this section, we study a large population limit for our model. Let M be a metric space and X n be a sequence of M-valued random variables. Let also X be an M-valued random variable. We say that X n weakly converges towards X (and we denote it as X n ⇒ X) if for any functional F ∈ M * it holds that E[F (X n )] → E[F (X)] as n → +∞.
On the set C[0, T], let us consider the uniform norm Let us observe that such norm can be applied to any bounded function f defined almost everywhere on [0, T] (where sup denotes the essential supremum). We have that C := (C[0, T], · L ∞ (0,T) ) is a Banach space, hence a metric space. Moreover, let us denote d J 1 the following metric on D[0, T]: T], f is increasing}. For more details, we refer to [53]. Let us show the following limit result.  Let us consider the Cauchy problem and let us define the process ( S Φ (t), I Φ (t)) = (S 1 (L Φ (t)), I 1 (L Φ (t))).
Then, we have that (s (N) Proof. The proof is an easy consequence of the fluid limit for the stochastic SIR model and the continuous mapping theorem. Indeed, the parent processes X N (t) := (s (N) 1 (t)) converge uniformly almost surely towards X(t) := (S 1 (t), I 1 (t)) in C and then weakly in D (see, e.g., [47]). Being L Φ and X N independent, we have (X N , L Φ ) ⇒ (X, L Φ ) in D × D. Moreover, X(t) is a continuous function and L Φ belongs almost surely to D ↑ [0, T]. Thus, by continuity of the composition map on C[0, T] × D ↑ [0, T] (see [53] Theorem 13.2.2) and the continuous mapping theorem (see [53] Theorem 3.4.3), we conclude the proof.

Remark 8.
Let us observe that, differently to the classical case, in the time-changed case, we lose the fluid limit. Indeed, we actually have (E[ S Φ (t)], E[ I Φ (t)]) = (S Φ (t), I Φ (t)) even if Φ is a complete Bernstein function (see [32] for the definition). By recalling that (see, e.g., [26]) and, with an analogous argument, for any t > 0.

Numerical Simulations
Let us show some numerical simulations of both the stochastic and the deterministic model. All the simulations were obtained using R (see [54]). In this section, we consider Φ(λ) = λ α , since we do not want to add some complications to simulation procedure. Concerning the numerical solution of the system (7), we used a one-step explicit rectangular product-integration rule, as shown, for instance, in [55]. For other Bernstein functions, one could provide product-integration rules in the same spirit as the one described.
To see the effects of the non-locality in the deterministic case, we plot in Figure 3 the phase portrait (first for α = 0.7 and then in the classical case) and in Figure 4 the functions I Φ (t) and I 1 (t). The phase portrait of the case α = 0.7 and α = 1 are quite similar. However, one can observe the following difference between the two phase portrait. In the classical case, it is well known that, if R 0 > 1, the decreasing phase of I 1 (t) starts as soon as S 1 (t) = γ β . This is not true in the non-local case. Up to now, we are not able to determine a sufficient condition for the decreasing phase to start, but we can see in Figure 3 that the decreasing phase seems to start generally before S Φ (t) reaches γ β (the threshold is represented in the figure as the vertical dotted line). In any case, the phase portrait confirms the R 0 dogma shown in Proposition 3. Indeed, as S 0 ≤ γ β , we can see from the phase portrait that I Φ (t) is decreasing, while, as S 0 > γ β , the function I Φ (t) is increasing for short times.  the decreasing phase to start, but we can see from Figure 3 that the decreasing phase seems to start generally before S Φ (t) reaches γ β (the threshold is represented in the figure as the vertical dotted line). In any case, the phase portrait confirms the R 0 dogma shown in Proposition 3. Indeed, as S 0 ≤ γ β , we can see from the phase portrait that I Φ (t) is decreasing, while, as S 0 > γ β , the function I Φ (t) is increasing for short times. On the other hand, in Figure 4 we can see that the whole dynamics of the epidemic (in particular of the function I Φ represented in the plot) is delayed in the decreasing phase. Indeed, for Φ(λ) = λ α with α ∈ (0, 1), one expects a power-law decay (of order α) in place of the exponential one for I 1 . This, for instance, can be seen in the generalization of the Hartmann-Grobmann theorem given in [19], which, however, is not applicable in our case since the equilibria are not hyperbolic. Concerning the stochastic model presented in Section 5, one can adapt Gillespie's algorithm (see [56]) to the semi-Markov case, together with the fact that a procedure to simulate Mittag-Leffler random variables is known (see [57]). Such a procedure is, for instance, illustrated, for Φ(λ) = λ α , in [58]. On the other hand, in Figure 4, we can see that the whole dynamics of the epidemic (in particular of the function I Φ represented in the plot) is delayed in the decreasing phase. Indeed, for Φ(λ) = λ α with α ∈ (0, 1), one expects a power-law decay (of order α) in place of the exponential one for I 1 . This, for instance, can be seen in the generalization of the Hartmann-Grobmann theorem given in [19], which, however, is not applicable in our case since the equilibria are not hyperbolic.
Concerning the stochastic model presented in Section 5, one can adapt Gillespie's algorithm (see [56]) to the semi-Markov case, together with the fact that a procedure to simulate Mittag-Leffler random variables is known (see [57]). Such a procedure is, for instance, illustrated, for Φ(λ) = λ α , in [58]. For other Bernstein functions, one could provide simulation procedures for random variables with distribution 1 − e Φ (t; −λ) by using the fact that the Laplace transform is known (see [59]). In Figure 5, different simulated paths of the process I Φ (t) are represented, all for I 0 = 100 (and S 0 = 900), β = 1.5/S 0 , and γ = 1 (in such a way that R 0 = 1.5). As one can see, the sojourn times vary in length in such a way to produce some actually long constancy intervals. Let us indeed recall that, despite being almost surely finite, the sojourn times admit infinite mean, thus we can expect long sojourn times with non-negligible probability (even more in the case Φ(λ) = λ α , since we know that E α (−λt α ) decays as t −α and then the sojourn times admit a power-law decay). Even though the shapes of the phase portraits are similar, the dynamics happen in different time-scales, as one can see in Figure 7. Indeed, we show above that the large population limit of the time-changed continuous time Markov chain does not coincide with the solution of the non-local dynamical system (7). Numerical evidence of this loss of fluid limit is given in Table 1. Here, for a fixed time horizon T > 0, we consider the error function Table 1. The values of E T (N) as N varies for different values of α, T = 10, I 0 = 0.1, and S 0 = 0.9 for the deterministic case (while for the stochastic one I 0 = 0.1N and S 0 = 0.9N), as well as γ = 1 and β = 2 for the deterministic case (while for the stochastic one β = 2/N). The discretization time interval is given by δt = 0.05.

Conclusions
Let us summarize here the main results and some observations that follow from them. In Section 3, we introduce a non-local generalization of the SIR model by simply substituting the classical derivative with a Caputo-type derivative. Despite the naive substitution, this new model exhibits some features that are more complicated to study. We are able to recover (in some form) the R 0 -dogma. However, the lack of semigroup property leads to some unexpected results. We do not have any sufficient condition for monotonicity of functions by knowing the sign of their Caputo-type derivative. The lack of such sufficient condition can also be seen from the phase portrait in Figure 3, as the maximum is reached in a region in which d Φ I Φ dt Φ is still strictly positive. Let us indeed recall that the classical Fermat theorem on extremal points (see, e.g., [60] Theorem 5.9) becomes an inequality in the non-local context (see, e.g., [61] Theorem 1 in the fractional case or [62] Proposition 2.2 in the general Caputo-type case), justifying the fact that, after the function reaches a maximum and starts decreasing, the non-local derivative could still be non-negative. Finally, let us observe that the lack of semigroup property can be overtaken by considering a deformation map θ, as shown in Appendix A, that takes in consideration the fact that the system remembers all its history.
On the other hand, in Section 5, we study the properties of a time-changed stochastic SIR model. In the linear context, time-changed processes provide the direct non-local counterpart of their parent process, as one can see, for instance, in [22][23][24][25]63,64], and many others. In the non-linear context, as for the SIR model, things can go differently. Indeed, even in the classical case, the equation of the expected value is affected by the non-linearity that appears as the covariance between the two components of the process. However, in the Markov case, we still have a fluid limit that links the deterministic case with the mean field dynamics of the stochastic case. In the time-changed case, this does not happen, as shown in Section 6. Thus, even though the time-changed SIR model preserves all the properties of the stationary state of the parent model and its forward Kolmogorov equations can be obtained from the parent's one by considering a Caputo-type derivative in place of the classical one, the whole model cannot be directly seen as the stochastic counterpart of the non-local SIR model introduced in Section 3. All these observations and statements are then supported by numerical evidence, as shown in Section 7.
Finally, concerning modeling purposes, let us observe that the time-changed SIR model should be used only to model subdiffusive epidemics. Indeed, one could use a continuous mapping approach to prove that Löf's diffusive approximation of the classical stochastic SIR model (see [5] and references therein) leads to a subdiffusive approximation of the time-changed SIR model. Thus, when working with time-changed models (including also fractional ones), one should always take in consideration the fact that such models exhibit an approximatively subdiffusive behavior and then should be used only in contextualization of such behavior with empirical evidence.
Here, we want to show the same result for any Bernstein function Φ ∈ SBF , to extend the definition of flow of a dynamical system to the non-local case.