On a Retarded Nonlocal Ordinary Differential System with Discrete Diffusion Modeling Life Tables

: In this paper, we consider a system of ordinary differential equations with non-local discrete diffusion and ﬁnite delay and with either a ﬁnite or an inﬁnite number of equations. We prove several properties of solutions such as comparison, stability and symmetry. We create a numerical simulation showing that this model can be appropriate to model dynamical life tables in actuarial or demographic sciences. In this way, some indicators of goodness and smoothness are improved when comparing with classical techniques.


Introduction
Nonlocal diffusion problems generated by convolution operators have been used widely in the literature in order to model diffusion processes. As explained in [1], a convolution operator can describe transport due to long-range dispersal mechanisms. Another application of such models is the approximation of reaction-diffusion equations, as done in [2] in the continuous case and in [3] in the discrete case. They have been also used to model phase transitions (see [4][5][6][7] among others). The long-time behavior of the solutions of such systems has been considered for example in [7,8]. In [9], the relation between a continuous nonlocal diffusion equation and its discrete counterpart was studied.
In our previous paper [10], we studied analytically and numerically the nonlocal ordinary differential system with discrete diffusion d dt where D is a suitable subset of Z, showing that it is appropriate for modeling life tables. The main downside is that the predictions are only valid for a short term of about two or three years. It seems, however, that some memory effects are present on life tables, so it would be better to consider a model with delay. This allows us to extend the validity of the predictions for larger periods of about five to ten years. Therefore, in this paper, we modify system (1) by means of the introduction of some delay terms. For life tables, the delay pattern is established via improvement rates [11]. We point out that our model is deterministic, non-parametric and predicts the evolution of the probability of death by age over time.
In Section 2, we study some properties of the solutions of the retarded nonlocal model such as a comparison principle, the stability of stationary points or the symmetry of solutions under certain assumptions. The analysis is carried out for a system with either a finite or an infinite number of equations, extending to the model with delay similar results from [10].
In Section 3, we apply the new model to life tables in Spain. We use the observed values in the period from 1908 to 2012 to fit the model and then validate it by making predictions in the period 2013-2018. We compare the predicted values with the observed ones by using two indicators of goodness. In both cases, the conclusion is that the model is appropriate for making quality predictions in the middle-term. We repeat the same calculations but using the Lee-Carter model, which is seminal and paradigmatic for this theory, obtaining that in almost all cases, our model improves the aforementioned indicators when comparing it with this classical model.

Properties of Solutions of Non-Local Diffusion Problems with Finite Delay
Let D be a subset of Z of one of the following types: For the cases (3)-(5), we define the Banach spaces with the usual norms u l ∞ 1 = sup i∈D |u i |, u l ∞ 2 = sup i∈Z\D |u i |, u l p 1 = ∑ i∈D |u i | p 1 p , p . If D = Z, then l ∞ 1 = l ∞ , l p 1 = l p . In the case (2), the phase space is R m , m = m 2 − m 1 + 1, with its norm denoted by · R m . The scalar products in R m and l 2 j , j = 1, 2, will be given by (·,·) R m and (·,·) l 2 j , respectively.
We consider the following non-autonomous retarded problem with non-local diffusion: where τ is the initial moment of time, and J : R + × C([−h, 0], X) → X, with either X = R m , X = l ∞ 1 or X = l 1 1 , is the non-autonomous convolution operator defined by where u(t) = u i (s) i∈D , u t = u(t + s) for s ∈ [−h, 0], j : Z → R, g : R × Z\D → R and dµ(s) = ξ(s)ds being ξ(·) a probability density. We assume the following conditions: (H1) j k ≥ 0 for all k ∈ Z.
(H3) g ∈ C(R, l ∞ 2 ). (H4) α i ∈ C([−h, 0], R), α i (s) ≥ 0 for i ∈ D, s ∈ [−h, 0]. Throughout the paper, the space C([−h, 0], X) will be denoted for short by C h and its supremum norm by · C h when no confusion arises. All the results in this section extend analogous ones from [10] in the same problem but without delay, that is, when h = 0. In the case with delay we need to add conditions concerning the functions α i (see (H4), (H5), (H7)).

A Finite Number of Equations
We assume now that D is given by (2), so X = R m . Let us prove first the existence and uniqueness of solutions by applying the abstract theory for differential equations with delay [12]. Lemma 1. Let (H1)-(H4) hold. For any φ ∈ C([−h, 0], R m ) and the initial moment τ, there exists a unique solution u(·) ∈ C 1 ([τ, +∞), R m ) of problem (6).
Proof. We write (6) in the form where The map f : C h → R m is globally Lipschitz continuous, which follows from On the other hand, the map h : [τ, +∞) → R m is obviously continuous as g ∈ C(R, l ∞ 2 ). Therefore, the map J : , is jointly continuous in both variables and Lipschitz continuous on the second variable. Hence, Theorem 2.3 in [12] implies the existence of a unique solution Since f is globally Lipschitz, it has sublinear growth, that is, there are constants Thus, it takes any bounded set of C h into a bounded set of R h . Since h is continuous, the map J takes closed bounded subsets of R × C h into bounded subsets of R m .
Let us prove that the solution exists for any t ≥ τ. If a solution u(·) were noncontinuable on [τ − h, b), for any closed bounded set U in R × C h there would exists a time t U such that u(t) ∈ U for any t U ≤ t < b ( [12], Theorem 3.2). We will obtain an estimate showing that the last is impossible for any solution. Multiplying (7) by u and using (8) and g ∈ C(R, l ∞ 2 ) we have Integrating between τ and t + θ, Taking the supremum over θ ∈ [−h, 0] and taking into account that Gronwall's lemma implies that Let B 0 be the closed ball in C h centered at 0 with radius R = 2 φ 2 C h + C 5 e C 6 (b−τ) and put U = [τ, b] × B 0 . Then, if u(·) were non-continuable, there would exist t U such that u(t) ∈ U for any t U ≤ t < b. This contradicts (9).
Further, we will prove that the solutions are non-negative if the initial condition is non-negative. This result implies in turn a comparison principle.
Then, the solution u(·) of problem (6) satisfies that u i (t) ≥ 0 for all i ∈ D, t ≥ τ.
Let us consider the autonomous case, that is, the function g does not depend on t. Assume also that α does not depend on i and denote A fixed point u ∈ R m has to satisfy the following system: We can rewrite it in the matrix form where We consider the following condition: Let r i be the sum of the elements of the row i of the matrix A. Then, (11) implies that Hence, A is diagonal dominant, so that the system has a unique solution, i.e., there exists a unique fixed point u.
We will show that it is stable. For this aim, we need to study the properties of the solutions of the homogeneous system We recall the definition of stability of stationary points. Definition 1. The zero solution of (12) is stable in the sense of Lyapunov if for each > 0 there exists γ( ) > 0 such that if v(0) R m < γ, then v(t) R m < ∀t ≥ 0. Additionally, we say that 0 is asymptotically stable if it is stable and there is Then, the zero solution of problem (12) is exponentially stable (hence, asymptotically stable). Moreover, it attracts uniformly as t → +∞ any solution starting on a bounded set.

Remark 1.
Although the condition h < − 1 2 log(1 − δ(h)) is an inequation, it is easy to see that for h small enough it holds true. Indeed, we can choose first h 0 > 0 such that M 1 (h 0 ) ∑ r∈D j i−r < 1 for any i ∈ D, which implies that M 1 (h) ∑ r∈D j i−r < 1 for all h ≤ h 0 , i ∈ D. Then, we choose h ≤ h 0 such that h < − 1 2 log(1 − δ(h 0 )) ≤ − 1 2 log(1 − δ(h)), where we have used that δ(h) is a non-increasing function.
We finish this subsection by showing that symmetry of the initial condition and the functions g, α and j imply symmetry of the solution.
Then, the unique solution to problem (6) is symmetric, that is, Then, the hypothesis of the lemma implies that is the unique solution to problem (6) with initial condition φ. Hence,

An Infinite Number of Equations
We consider now the case where the system can admit an infinite number of equations. For this we define D as a subset of Z of one of the types given in (3)-(5).
First, we study the Equation (6) We will need the following extra assumption:

Remark 2.
Conceptually, in life tables, it can be of interest to use an infinite number of equations. For example, domains of type (5) can be used for the biometric function Life Expectancy, which is defined (at year t) as ∑ ∞ i=0 p i (t) · i, where p i (t) represents the survival probability at age i at year t. We note that in this work we use the function q i = (1 − p i ), where q i represents the risk of death at age i. In Section 3.1.2, we apply model (6) with an infinite number of equations to the variable q i .
In order to obtain the existence and uniqueness of solutions we use once again the abstract theory for differential equations with delay given in ( [12], Chapter 2). Although the results are stated in the phase space R n , the theorems are valid for a Banach space as well.
Proof. We proceed as in Lemma 1. System (6) is considered in the form (7), where in this case ( f i (u t )) i∈D and (h i (t)) i∈D are column vectors with infinite elements.
We prove first that the map f : l ∞ 1 → l ∞ 1 is globally Lipschitz continuous. Indeed, by (H2) and (H5), we have Second, we check that h(t) is a continuous map for the cases (4) and (5). If t n → t in R, then by (H3) we obtain that In order to prove that the solution is globally defined for t ≥ τ, we need to obtain an estimate of its growth. Let b > τ be the maximal time of existence of the solution u(·), Arguing as in the proof of Lemma 1 we deduce that Again by the same arguments of the proof of Lemma 1, we arrive at a contradiction.
Further, we will obtain a comparison principle as in Proposition 1. For this aim, we need two additional assumptions: (H6)There exist K > 0 and a ∈ l 2 1 such that a i > 0, for all i ∈ D, and ∑ i∈D a 2 i a 2 r j i−r ≤ K for every r ∈ D.
It is obvious that (H7) is stronger than (H5). Condition (H6) is satisfied for any kernel j i with subexponential growth, that is, j i ≤ ce −α|i| for some α, c > 0 [10].
The function v satisfies the equality Multiplying it by (−v) + , we have Making use of (H7), the last term is estimated by By (H6) and (H7), for the penultimate term, we obtain is convergent for each r ∈ D and the iterated limit is convergent as well, the change of order in the summatories is justified. Further- , so it is correct to change the order of the summatory and the integral in the penultimate inequality of (15).
Thus, there exists a constant M > 0 satisfying ds.
Arguing as in the proof of Proposition 1 it follows that u(t) ≥ 0 for all t ≥ τ.
By (H2) and (H7), we have If t n → t in R, then by (H3b) and (H7), we obtain that The change of the order in the summatories is justified in the same way as in Lemma 2. On the other hand, since the functions Thus, in both cases, the limit function is integrable and it is legitimate to change the order of the summatory and the integral.
We study the comparison principle also in this case.
We finish this section by showing that if D = Z and the initial condition and the functions j i , α i and the initial condition are symmetric, then the solution is symmetric well.
. Then, the unique solution given in Lemmas 4 and 5 satisfies: Proof. The proof follows the same lines as in Lemma 3 except that the term containing the function g is now missing.

Application to Life Tables
Life tables are a widely used tool in demographic sciences or in actuarial field to study the behavior of a population in relation to the survival, or mortality, in a region or in a period of time. The implications of the results derived from them have a great importance, for example, for the analysis of the sustainability of the pension system; for the comparison of populations via indicators as expectation of life; for the estimation of the insurance premium; or for the calculation of the mathematical reserves to guarantee the pay of the claims [14].
Life tables are structured in biometric functions and the relations between them (see, for example, [15][16][17]). Some of these important functions, among others, can be: the survival probability at exact age x, p x , or its complement, the probability of death, q x (the focus of this work); the expectation of life at concrete age x, e x ; the central rate of mortality by age, m x , etc. Usually, we do not know the true values of the biometric functions and, then, they are treated as underlying values, which are unknown. In this situation, these values can be approximated using techniques or methodologies from this field of knowledge (see, [18], https://www.mortality.org/Public/Docs/MethodsProtocol.pdf or [19], among others). In this work, as in our previous paper [10], the aim is to estimate and predict the true probabilities of death (at each age x). In general, in order to estimate the values of a biometric function, different points of view and several types of techniques can be used. We consider three important paradigms: a first classification between the stochastic and the non-stochastic paradigms; second, we distinguish the situations when the model is based on the parametric point of view or, on the contrary, it does not consider parameters; finally, in the third level, we differentiate between the dynamic point of view (if the model considers the evolution with time) or the static point of view (the changes with time are not considered). The difference between the stochastic and non-stochastic frameworks is based, in actuarial practice, on the use or the interpretation of the results of the technique. For example, the Gompertz law [20] can be used in a deterministic framework (the results of the fit of the Gompertz law are used as unique values and as input in other steps of the process as, for example, in tarification) or it can be used in a stochastic framework, for example, by considering the fit as a realization of the random process of interest. The literature on this topic is extensive. The papers [11,15] contain a good introduction to the stochastic framework. The works of [21,22] introduce us to parametric and non parametric models. The works [16,[23][24][25][26] allow us to know examples of stochastic, parametric and statics models. Concerning stochastic, parametric and dynamical models, the works [27][28][29][30][31] are interesting. We can find examples of stochastic, nonparametric and static models in [32][33][34][35]; for non-parametric and dynamical models, see [36].
As in the previous work [10], in this paper, we consider a non-stochastic paradigm, with a non-parametric model and a dynamical point of view. However, in this work, we consider the previous information as well. Thus, the difference between these two works is the use of the past information, via improvement rates. In the paper [10], history is not considered, and only the current (actual) data are used to estimate the predictions for the future. That procedure has the important limitation that the predictions are only valid for a short term, for periods of two or three years. The technique that is proposed in this work avoids this limitation and, by considering the history of the mortality rates, improves the previous results because it allows us to make predictions for periods of five to ten years.
In order to introduce the previous information, we establish a delay pattern via improvement rates [11].

Dynamical Kernel Graduation with Delay
In order to improve the dynamical model (1), which was proposed in [10], in this work, we introduce the history using the concept of Improvement Rate (see [37][38][39][40]). These rates characterize the evolution of the mortality year-to-year or between two arbitrary moments, t 0 and t 1 . In this sense, we use the classical definition of improvement rate between t 0 and t 1 (t 0 < t 1 ) at age x, given by r t 0 ,t 1 x = q t 1 x /q t 0 x . If we consider as delay the value d = t 1 − t 0 > 0, then, r t 0 ,t 1 x ≡ r t 0 ,t 0 +d x . Similarly, we define the global improvement rate between t 0 and t 1 , r t 0 ,t 1 , as the coefficient of the linear model without constant term of the observed death rates, that is, we fit the linear model

Procedure by Steps
The proposed method can be summarized as follows:

1.
We consider the observed mortality rates at each age (from 0 to 100) and each year in the considered period (from 1908 to 2019), and denote them by q t x , x ∈ D = {0, . . . , 100}, t ∈ {1908, . . . , 2019}. Additionally, we consider the values of g x (t), which is the rate of death either at "negative ages" or after the actuarial infinite (we have chosen it to be equal to 100).

2.
We estimate the improvement rates by age and delays, that is: year delay

3.
We estimate the mean, with respect to the delay, of the improvement rates: being #(d) the number of couples (t 0 , t 1 ) such that d = |t 1 − t 0 |. We will refer to these values as the global improvement rates for the delay d.

4.
The annual improvement rates by delay, r d , play an important role in the procedure because they contain previous information of the mortality process. However, the experience of studying this phenomenon allows us to assure that the importance of these rates are not the same for all delays, and it is clear (under usual conditions such as non-pandemic ones) that the behavior of the improvement rates become more important if they are close to the time of prediction. Using this realistic assumption, we assume that the importance of these rates is modulated by a probability distribution function. In particular, we consider a modified exponential function. To do this, we consider the exponential probability density function with the form and obtain a density function in the interval [0, h] by putting Then, we define the density function ξ(·) from (6) by ξ(s) = f λ (−s) for s ∈ [−h, 0]. In the particular case when in the numerical approximations we consider only integer delays, we can discretize the interval [0, h] by using a finite number of integer delays s = {0, 1, . . . , d max }, where d max = h ∈ N is the maximum delay to be considered. Thus, instead of (17), we will use the discretized probability function which approximates (17) at s = 0, 1, ..., d max .

Remark 3.
We could also use the values of the function f λ (s) at s = 0, 1, ..., d max to define the approximative function f * (s), but we have used (18) because the results in the numerical simulations were quite good. The use of the exponential function to derive f * is arbitrary. Thus, it is reasonable to think that the selected function is not optimal. In this sense, we consider that this topic could be a possible new line of investigation to improve the work.

5.
Using the annual improvement rates, and the exponential distribution, we define the weighted improvement rates as 6. Using the improvement rates by delay, r d , we can define the functions α(s) from (6), by putting where r 0 = 1, and then defininig α(s) at other points by linear interpolation, that is, This procedure can be implemented for a non-integer step as well. Namely, let b be a divisor of d max . Then, we define in a similar way as above the improvement rates r d for d = b, 2b, ..., d max . We observe that in this particular situation, the functions α(s) are independent of i ∈ D.
It is usual to assume some hypothesis in the behavior of the mortality. Namely, H1. For each age x, for an arbitrary moment t and for a small time step τ > 0, the graduate value at t + τ, denoted by • q x (t + τ), depends on: • all graduate values at moment t, • q z (t), z ∈ D ⊂ Z (via Gaussian kernel graduation, see [17]). • all previous moments of time t + s, s ∈ [−h, 0] (via the exponential probability density function and the improvement rates).

H2. The relation between
• q x (t + τ) and • q z (t) , where j · is a suitable kernel (in this work a Gaussian kernel), g z (·) is the rate of death either at "negative ages" or after the actuarial infinite, dµ(s) = ξ(s)ds and α(s) is defined above using the annual improvement rates from the past.

Remark 4.
The hypothesis H2 is consistent in the sense that if τ → 0, then . Additionally, if τ = 0, the new graduate value depends on the step τ and the delay.

Remark 5.
We note that if the history is not important, then the annual improvement rates are equal to 1, so µ(s) can be interpreted as a degenerated probability function such that dµ(s) = 1 at 0. In this situation, the expression (21) is analogous to (1). (21) is based on the empirical experience and on the actuarial practice in a similar way as in [10].

Remark 6. The proposed relation
Next, we establish the relation between (21) and (6). From (21), we easily obtain that Formally, if we consider a fixed time t, and we take the limits as τ → 0, then

Application of Theoretical Results
Equation (22)  The Gaussian kernel obviosuly satisfies (H1) and (H2). The function g r (t) ≡ g r ∈ l ∞ 2 is a constant function, where g r is defined as follows: Hence, (H3) is satisfied. Here, g 0 is the probability of death after the actuarial infinite. In this case, α i (s) = α(s), where α(·) ∈ C([−h, 0], R) is given in (20). Therefore, (H4) holds true. Since the function g does not depend on, the problem is autonomous, so we take τ = 0.
We obtain that for small h there is a unique exponentially stable fixed point as well.

Lemma 7.
We choose h 0 > 0 such that Then for any h ≤ h 0 such that (22) possesses a unique fixed point which is exponentially stable.
Proof. Since j is a Gaussian kernel, ∑ i∈D j i−r < 1 for any r, so M 0 ≤ 1 implies that δ 0 > 0. The result follows then from Remark 1 and Corollary 2.
Although in the actuarial practice, the set D is finite because the probability of death is considered to be constant after some age (the actuarial infinite), it is also possible to estimate the values of • q x for any non-negative x. In this case, D is of the type (5); more precisely, D = {i ∈ Z : i ≥ 0}. As before, (H1), (H2), (H4) are satisfied and the function g r (t) ≡ g r is a constant function, where g r is defined by: g r = 0 if r < 0.

Details and Example
In this section, we will give the details of the numerical simulations.

Data, Period and Software
The data used in this study are downloaded from the Human Mortality Data Base [18]. It refers to the death rates of the Spanish population in the period 1908-2018. In this study, we combine several sub-periods to obtain the estimations. The period 1908-2012 is used to obtain some preliminary estimates; a sub-period of them, 1978-2012, is used to fit the significant improvement rates. We note that this last sub-period coincides with a period of significant changes in Spain. Among other events, the actual democratic status begins, which motivates the consolidation of the welfare system and implies the consolidation or changes of health protocols in Spain such as the transfer of home births to the hospital, significantly reducing the mortality at first ages [41,42]. The period 2013-2018 is used to validate the method.
The software used for all calculations and the construction of the figures is the R-Software [43]. Additionally, the data are downloaded from [18] and they are manipulated and visualized using the R-package [44]. This demographic package is also used to estimate the Lee-Carter Model.

Validation
The validation process has two parts. In the first part, we compare the forecasted data with the observed data in the same period. In the second part, to validate the model, we compare the proposed method with the classical Lee-Carter model [28].
In the first part, we fit the proposed model using the observed data in the period 1978-2012; next, we forecast the mortality in the years 2013-2018 to obtain the prediction of the mortality rates, q t, f x . Then, for each year in the validation period, we compare the observed death rates at each age, q t,obs x , with the forecasted values. The comparison is made using It is remarkable that a modification of the previous indicators can be appropriate. In particular, we can use the smoothed values and not the observed or forecasted values directly. We denote this modified indicators as I t SMSD and χ t SMSD . The second part of the validation consists in comparing the proposed method in this work with the classical Lee-Carter model [28]. The comparison is made in two ways. Firstly, we forecast the mortality rates by the two methodologies in the period 2013-2018. Then, we compute I t MSD and χ t MSD (or its smoothed versions) and compare the results directly. Additionally, we project the mortality in the period 2019-2030 with both techniques and compare it to determine if they have a similar magnitude.
The Lee-Carter model was formulated to estimate the central mortality rates, m t x , at year t. This model can be formulated as where a x , b x are factors depending only on the age; k t is a factor that describes the evolution of the mortality over the years; e t x is a realization of the white noise. It is remarkable that b x modulates the factor k t to adequate the behavior of the mortality due to social improvements such as the control of diseases, the improvement in health habits, nutrition... If we consider this model without other conditions, the number of parameters is 2 × #(number o f ages) + #(number o f years) (in our case, 2 × 101 + 111 = 313 parameters). It is usual to model the k t values as an autoregressive process of type ARIMA, and also that the fit provides us an autoregressive model of order 1 (one parameter); in this case, we reduce the number of parameters to 202. It is convenient to remark that the Lee-Carter model is a parametric model and the proposed method is nonparametric. In this sense, the assumptions of the proposed model are less restrictive.

Remark 8.
The election of the Lee-Carter model is not arbitrary and it is justified by the importance of that work. The paper of Lee-Carter model [28] is considered as a seminal model. It is the model with more influence in all ambits of this field of knowledge. We note that, in the actuality, there exist several versions of this model that improve it. However, we consider the comparison with this model by several reasonings: (i) It is usual in the scientific literature to use this model in comparisons; (ii) others models based on the Lee-Carter model either are not better in all conditions or the improvements are small; (iii) the Lee-Carter model has a great number of parameters, 200 or more; comparing this fact with our model, the difference is significant, as the number is less in our case.

Estimations and Results
It is known that the mortality has a dynamic behavior in all country or regions. Figure 1 shows the evolution of the crude mortality rates in the period 1908-2018. This figure shows the evolution of the death rates over the last 111 years.
We note that the highest series in the graphic correspond to the first years of the study, at the beginning of the XX century. The lines at the bottom of the figure corresponds to the recent years. This indicates to us that there exists an evident improvement in the mortality rates.
In order to apply the expression (21) to the observed data and obtain the forecasted values, it is necessary to know the values of the function g z after the actuarial infinite. In this work, we consider that these values correspond to the values of the mortality rates for ages greater than a characteristic age, when an effect known as "Plateau effect" begins [45,46]. This effect indicates to us that the risk of death does not increase to 1, but remains constant from certain age. Figure 1 shows this effect (simulated to facilitate interpretation), which can be observed in the final part of each curve and corresponds to the mortality risk values for ages in the range 90-130. In practice, the age at which the Plateau effect begins is unknown and it can be only estimated from the observed data. In this work, we consider that, for each year t greater than a characteristic age, g t z = max q t z z>70 . Figure 2 shows the evolution of the death risk at last ages. Evolution of the last mortality rates  and theoretical risk limit (from literature) year Logarithmic scale of rates The tendency of the series in Figure 2 is decreasing and the values of this series converge to the limit 0.35 (blue line at the bottom of the figure), which is consistent with the works of other authors ( [40], p. 41) in the Spanish case, for both genders and for ages up to 105 years. Other authors consider that this limit is 0.684 [46] (yellow-middle line in the figure), or 0.5 [45] (pink-top line in the figure). This type of assertion is not without controversy [47]. In our simulations, we have taken g 2012 z ≡ 0.385638 as the probability of death after the actuarial infinite. This is the value of g 0 in (23).
In step 2 of the method, the global improvement rates are estimated using (16). Figure 3 shows these rates for the first 21 lags; we find that the behavior is similar to the first 40 lags. We have analyzed these first delays and consider that they are enough to capture the dynamic of the improvements. Due to this, we have chosen the value of h equal to 21.
We fit a linear model for the global improvement rates (dashed line in Figure 3) without the constant coefficient and the obtained value of the parameter isb = −0.004171019. In this method, we consider as global improvement rates the values derived from the expression . . , 21}. Hence, with such a procedure, the function α(s) is linear. On other hand, we assume that the impact of the improvement rates is not equal for all of them. Its importance is modulated by another function. As we have said, the improvement rates are modulated by a probability function which is exponential, as in expression (18), and the weigthed improvement rates are obtained as in (19). The parameter λ in (18) is equal to 11 12 . Figure 4 shows the probability function which is used. Now, we have defined all the elements which are necessary to apply the method for forecasting.
In order to estimate the solution, we use expression (21) with τ = 1 and approximate the integral term with the Riemann summatory.
Firstly, and with the objective of using the observed data to measure the quality of the technique, we apply the method using the data in the period 1908-2012 and obtain two types of predictions for the period 2013-2018: (i) applying the proposed technique; (ii) applying the Lee-Carter method. Additionally, the indicators of quality defined by (24) are estimated. Figure 5 shows the predictions with the two techniques, the proposed technique and the Lee-Carter method. We can observe that these two series of values are close to the observed data. The magnitude of the values are coherent in the two methods. Comparing them with a graphical analysis, we are not able to identify whether one method is better than the other.  In order to clarify if one technique is better than the other, we need to calculate a numerical indicator. In this sense, we consider the quality indicators defined in (24). Using the observed data as reference data and the estimation made with each technique as the assumption, we obtain the values in Table 1. We can observe that the proposed technique is more accurate that the Lee-Carter method in both indicators of goodness. In the case of the indicators I t MSD , the improvement is for all years and the values of the error of the proposed technique with respect to the observed values are between 30% and 80% of the error of the Lee-Carter model. In the case of the indicators χ t MSD , we observe that the improvement is not for all years, as in 2015, the Lee-Carter model has better results, but we think that this occurs because that year has a great oscillation at initial ages. For the rest of the years, the rate of improvement of the proposed method with respect to the Lee-Carter model was between 30% and 80%. Now, as we said before, we will compare the proposed technique with the Lee-Carter method in forecasting. Figure 6 shows the predictions with these two methods for the period 2019-2028.  We observe that the Lee-Carter method has greater variability than the proposed method for the ages in the intervals 5-15 and 40-50. In addition, the mortality curves have similar values for the rest of the ages. As we do not have observed values of mortality for the period 2019-2028, it is difficult to determine which of the two techniques is better. In this situation, we will only compare the magnitude of the predictions with these two techniques, checking whether the values are similar or not. Since the Lee-Carter method is a reference method, we will verify if our results are within the confidence intervals of the Lee-Carter predictions. Figure 7 shows the values of the predictions with the proposed technique jointly with the upper and lower confidence curves of the Lee-Carter predictions (at a 95% confidence-level) for the years 2021-2028. It is clear that the values are between the two bounds of the Lee-Carter model, which indicates us that the method is appropriate. Additionally, considering that with the predictions between 2013 and 2018, the proposed method has turned out to be more accurate than the Lee-Carter method, it is convenient to reflect on which of these methods is appropriate.
We note that we can measure the goodness of the forecasting otherwise by using the quality indicators (24) but changing the reference values. First, we must assume as correct the values of the Lee-Carter values; second, we assume as correct the values of the proposed method. The comparison is made using the caution principle, depending of the application. For example, if we use the values of the proposed method when the true values are the Lee-Carter ones, the indicator I 2019 MSD was equal to 1.812869, which tells us that "globally" the discrepancies were close to 81%. If we use the Lee-Carter values when the true values are the ones of the proposed technique, then I 2019 MSD was 2.551214, which indicates to us that, "globally", discrepancies were close to 155%. The "possible error" was double if we used the Lee-Carter values. The same situation occurs for the rest of the years.  These two ways of measuring the quality of the proposed method allow us to affirm that, in the middle-term, the proposed method is quite appropriate.
It is important to point out that there exist several ways to improve this work: (i) introducing one improvement rate for each age instead of estimating a unique improvement rate for all ages, q t 1 · = r t 0 ,t 1 · q t 0 · (obviously, this increases the number of parameters of the problem, but no more than in the Lee-Carter model); (ii) introducing a process that allows us to select the better probability function.
Finally, we would like to remark that the main differences in the numerical simulations of this paper with respect to the ones in [10] are the following:

•
In the model of this paper, we have considered the influence of the values of the variable in the past and not only in the current moment of time, as in [10]. • The functions α(s) are new in this paper and we have approximated them via the improvement rates r d .

•
We have considered the probability functions (17), (18) in order to modulate the improvement rates r d .
• In [10], the value of the function g was taken to be equal to 0 outside the domain D, whereas in the present work, the value of g for ages greater than 100 was taken to be equal to a positive number due to the "Plateau effect". • The numerical simulations show that the predictions are good enough for at least a period of 8 years, whereas in the model without delay, the predictions were good for a period of 3 years.

Conclusions
In this work, we considered a system of ordinary differential equations with non-local discrete diffusion and finite delay. This paper extends the results in [10], incorporating prior information in the form of delay. With this, it is possible to improve the prediction window with respect to the previous work; in [10], the horizon of the prediction is adequate up to three years; however, in this work, it is verified (with the data in the considered period and for the defined measures) that the prediction horizon can be extended up to 8 years, and that it gives coherent values, in magnitude, when we compare it with other classical techniques such as the Lee-Carter model, up to 18 years.
The first part of the paper is dedicated to theoretically analyzing the system: firstly, a system with a finite number of equations; secondly, a system with an infinite number of them. In both cases, we prove the existence of solutions and several properties of them such as comparison results, stability and symmetry.
In the second part of the work, we applied the model to life tables, defining the parameters of the model (for example, the delay is incorporated as improvement rates of the probability of death) and applying the theoretical results of the first part by estimating and projecting to the future the risk of mortality (by age). In order to calculate the numerical estimations, we considered the data of Spanish mortality in the period 1908-2018 from [18]; the choice of this source of information is due to the fact that this organization uses a methodology that unifies certain methodological aspects of the statistical bureaus of several countries, providing a common framework that allows us to assure that the proposed model can be applied to any region in the world. In order to validate the proposed model and compare it with the Lee-Carter model, we defined some indicators of goodness and smoothness. In this sense, these measures indicate to us that this model is more accurate than the classical Lee-Carter model in the considered period of validation (between 2013 and 2018). Additionally, the proposed model gives us coherent values up to 2028 (maybe even for a larger period of time) when we compare these values with the values of the Lee-Carter model.
Finally, we consider several aspects of this work that could be improved in future researches. For example, we could consider improvement rates which are dependent of age or use other density probability functions, and not only the exponential distribution. It is possible to introduce other aspects in the model as the cohort effect, which is usual in the demographic and actuarial fields.