Two-Parameter Stochastic Weibull Diffusion Model: Statistical Inference and Application to Real Modeling Example

This paper describes the use of the non-homogeneous stochastic Weibull diffusion process, based on the two-parameter Weibull density function (the trend of which is proportional to the two-parameter Weibull probability density function). The trend function (conditioned and non-conditioned) is analyzed to obtain fits and forecasts for a real data set, taking into account the mean value of the process, the maximum likelihood estimators of the parameters of the model and the computational problems that may arise. To carry out the task, we employ the simulated annealing method for finding the estimators values and achieve the study. Finally, to evaluate the capacity of the model, the study is applied to real modeling data where we discuss the accuracy according to error measures.


Introduction
A diffusion process X t is a solution of the stochastic differential equation (SDE) of the form dX t = a(t, X t , θ)dt + σ(t, X t , θ)dw t , with w t a standard unidimensional or multidimensional Wiener process and a and σ known functions (with a vector-valued and σ matrix-valued if X t is a multivariate process). θ indicates the unknown parameter and the inference issue discussed in that of estimating θ under continuous observation or discrete observations of X t . In order to give an example of stochastic processes, we cite the Brownian motion which plays a central role in the development of stochastic analysis. It is a process which is Gaussian, Markov, self-similar, a martingale and has stationary, independent increments. Brownian motion is also known as a Wiener process in honor of Norbert Wiener who's work appeared in a series of papers in the early 1920s, a decade before Kolmogorov's monograph that set probability theory on a rigorous mathematical foundation.
Stochastic modeling deals with real-world situations in which uncertainty is present and employ probability skills to model those circumstances. Therefore, the purpose of stochastic modeling is to study a forecast and to estimate the probability of its outcomes, to explain what conditions or decisions might happen under different situations for good results. Stochastic diffusion processes are well adapted to illustrate the advancement of diverse phenomena and to forecast their future trends, by using statistical inference methods. For instance, stochastic diffusion processes have been employed with respect to demography [1], electricity consumption [2], life expectancy at birth [3], effect of therapy on tumors [4] and population extinction [5].
These models are defined by stochastic diffusion processes, considered using stochastic calculus methods and on the corresponding statistical inference. In general, the solution to an Itô-type SDE is a diffusion process, whose trend function E[X(t)] = f (t) has a form similar to a curve associated with known distribution. In some cases, the maximum likelihood (ML) method is the feasible procedure since the transition density function of the diffusion process is known explicitly.
The difficulty of estimating parameters of the drift coefficient has collected important interest in latest years. In most cases, the statistical inference is based on approximating the ML methodology, see for example, Prakasa Rao [6].
In the same context described above, we propose in this paper a study of the Weibull-type stochastic diffusion model. The trend function (TF) of this model corresponds to the graph of the probability density function of the Weibull distribution. From the explicit expression of the transition density function of the process, the ML method is applied to find out the estimators of the parameters of the process. In the measure to estimate the parameters, we must overcome the difficulty appeared when we were solving the ML principle. To carry out the problem, we suggest using the simulated annealing (SA) method. This methodology is implemented on an example with real data also illustrated with simulated data by employing the resulted values of the parameters. The estimation of parameters produces a computational problem. This paper is structured as next: in Section 2, we introduce the non-homogeneous Weibull diffusion process and its probabilistic aspects. The parameters are then estimated in Section 3, using the ML method with discrete sampling in time and considering the computational problems involved by means of SA presented in Section 4. We then determine the approximate confidence bounds of the process. Finally, in Section 5, this method is applied to real data.

The pdf and Moments of the Process
The SWDP, which is the proposed model in this study, is established as the non-homogeneous diffusion process depending on time {x(t), t ∈ [t 1 , T], t 1 > 0} and taking values in (0, +∞) by the next Itô's SDE where w(t) is a univariate standard Wiener process and x 1 is a constant. Thus, we give the infinitesimal moments by the equations: where σ > 0, and α and β are real constants. This model is an extension of the SWDP defined in Reference [7]. In fact, by considering a constant β instead of the terme α + 1 in the drift coefficient in Reference [7]; that is, a(x, t) = α t − (α + 1)t α x. Then, we obtain our stochastic Weibull diffusion process with a new drift coefficient defined in Equation (2).
Since, the functions a(t, x) and b(t, x), 0 < x < +∞, are Borel measurable and satisfy the uniform Lipschitz and the growth conditions (see Kloeden and Platen [8]). We conclude that there exists a constant C > 0 such as the infinitesimal moments specified in Equation (2) verify the Lipschitz and growth conditions ∀x, y ∈ R + and t ∈ [t 1 , T].
In fact, let us consider x, y ∈ R + and t ∈ [t 1 , T], then from one side we have From another side, for the particular case where y = 0, we have Thus, there exist an (a.s.) continuous process {x(t), t ∈ [t 1 , T]; t 1 > 0}, separable and measurable, which is the unique (a.s.) solution of the SDE (1). This solution is obtained by using Itô's formula. Let us define a new variable by y(t) = log(x(t)), so that This equation can be directly integrated, thus obtaining and hence The analytical expression of the solution of Equation (1) is easily deduced from Equation (4): Since y(t) conditionally on {y(s) = y s } has a one-dimensional normal distribution From the above, the probability density function (PDF) of the process considered has the next form

Moments of the Process
To determine the moments of the process, we take into account the useful property of the lognormal distribution, that the r-th conditional moment of the process is defined by As matter of fact, when we consider the situation where r = 1, the conditional trend function (CTF) of the process is: Thereby under the initial condition P[x(t 1 ) = x 1 ] = 1, the TF of the process is expressed by: Remark 1.
-As mentioned above, this process is a generalisation of the one defined in Reference [7]. In fact, assuming β = α + 1, the SWDP obtained becomes the SWDP based on the two-parameters Weibull distribution.
-Moreover, the trend function of the process, given in Equation (9), is corresponding to the PDF of the Weibull distribution.

Maximum Likelihood Estimation
The drift and diffusion parameters of the process that are α, β and σ 2 are estimated by ML method and discrete sampling. Therefore, we treat a discrete sampling of the process x(t 1 ), x(t 2 ), . . . , x(t n ) at times t 1 , t 2 , . . . , t n , and we denote x(t i ) = x i , for i = 1, . . . , n in the following. Moreover, we presume that the time gap among two successive observations is constant (i.e., t i − t i−1 = h, for i = 2, . . . , n). Hereafter, by taking P[x(t 1 ) = x 1 ] = 1 the initial condition, the linked likelihood function can be obtained from Equation (7) by: Since taking derivatives of a product is tedious, the log-likelihood for Equation (10) is usually maximised, that is, By applying the principle of ML, we obtainα,β andσ 2 , which are the estimators of α, β and σ 2 respectively. As a matter of fact, we derivate the log-likelihood function with respect to α, β and σ 2 then we get the next equations: For j = 2, . . . , n, we denote: From Equation (12a), we obtain (as a positive solution) the expression of the estimatorσ 2 on the following result:σ And consequently, by substituting σ 2 2 in Equations (12b) and (12c) by the expression of its estimator (see Equation (13)), the following nonlinear equations are obtained for the estimatorsα andβ:

Confidence Bounds of the Process
The confidence bounds (CB) of the process are obtained using the same procedure as in Reference [9]. Thus, from Equation (5), we consider the variable Since ∀ t ≥ t 1 , the random variable (w(t) − w(t 1 ) is the so-called independent increments and is normally distributed N 1 (0, t − t 1 ) an estimation for the variable Y, is normally distributed Thus, the 95% CB for the variable x(t) is obtained from the next characteristic: A CB for x(t) with the following form can thus be obtained: where,

Estimated TF and Estimated CBs
From Zenha's theorem [10], by replacing the parameters by their estimators in Equations (8) and (9), the estimated conditional trend (ECTF) function can be obtained from: and the estimated trend function (ETF) is given by: What is more, the ECB are contructed by replacing the parameters by their estimators in Equation (15).

Simulated Annealing Method
Simulated Annealing (SA) was first introduced by References [11,12], who showed up significant initial results, following a prior investigation by Reference [13] who attempted to minimise a function on a very large, finite set. The actual approach was subsequently applied to optimising a continuous set by Reference [14].
SA is a technique to approximating the solution to tough combinatorial optimisation questions. The problem we get into is max S∈F ( f (S)) , or equivalently min Under the proposed algorithm, in every repetition, we have an actual solution x which is represented by an objective function value f (x), for this solution a neighbour x is chosen from the neighbourhood of x indicated K(x), and determined as the set of all its nearest neighbours. For every move, the objective variance η = f (x ) − f (x) is measured. From maximisation problems, x takes the place of x when η ≥ 0. Moreover, x could also be admitted with a probability ω = e −η T . The approval probability is compared to a randomly-generated number r and x is accepted whenever ω > r. We have to fulfill the stopping criteria to find out the point x * which is a close solution to the issue.
In our situation, the problem is to maximise log-likelihood function obtained in Equation (11). Therefore, the objective function to minimise is a function of parameters α, β and σ 2 : In SA the motivation is to avoid trapping local optima, thereby enabling upward moves to higher-cost solutions under the orientation of a control parameter termed 'temperature'.

Application
The following time-dependent stochastic variable (stochastic process) is considered: x(t), that is, the ratio of the dependent population (those aged under 15 or over 65 years) to the working-age population (aged 15 to 65 years) during year t in Morocco. This ratio is expressed as the number of "dependents" per 100 "workers". This indicator is a decisive quantity of concern for demographic analysis also for pay-as-you-go retirement structure, social security system and health care insurance [15,16]. Indeed, the age dependency ratio measures the charge that the old population shows for the workers also it demonstrates how the dependency between young and old populations is making progress during demographic transitions. Formally, the age dependency ratio r(t), where g([a 1 , a 2 ), ) = [a 1 ,a 2 ) g(a, t) da represents the number of individuals with age a ∈ [a 1 , a 2 ) at time t. We also introduce g(a, t) for the average number of individuals with age a at time t.
The age dependency ratio in Morocco has significantly decreased; according to official Data in Table 1, the annual age dependency ratio fell from 105.58% (i.e., 105.58 dependents per 100 persons of working age) in 1968 to 51.89% in 2017. The mean ratio during this period was 74.55% with a minimum of 51.64% in 2015. The evolution of this ratio is associated with factors such as birth rate, fertility rate, employment trends, life expectancy and economic growth rates. The data used for this purpose correspond to the period 1968-2017 (see Table 1) and were provided in World Bank's database. The method applied is composed of two phases:

•
Step 2: Data for 2015-2017 are explored to forecast the expected values of the process. The results in Table 2 resume the behaviour of the conditional and the non-conditional trend functions given, respectively, by Equations (16) and (17) also the values of the confidence bounds (given 95%) established from Equation (15). The performance of the SWDP for the previsions is represented in Figures 1 and 2.

Goodness of Fit
The following scale-dependent quantities are based on the absolute error or squared errors and measures based on percentage errors: Mean Absolute Error (MAE) = mean (| e t |), Root Mean Square Error (RMSE) = mean(e 2 t ), Mean Absolute Percentage Error (MAPE) = mean(100 * e t /x(t)), assuming e t = x(t) −x(t) withx(t) is obtained by substituting the parameters in Equation (5) by their estimators.
The values obtained for the above error measures are acceptably low, especially the MAPE according to Table 3. The statistical measures obtained are illustrated in the Table 4.

Simulation
The sample paths were simulated by Equation (5), taking values of α, β, σ 2 and x 1 tight to those evaluated for these parameters in the real example in the application for which this investigation was established in Section 5.1. Ten trajectories with 500 values each were generated and the following time instants considered. Figure 3 shows the simulated trajectories of the SWDP, where the red curve represents the theoretical trend function, for the particular case of α = −0.5337, β = 0.8457, σ 2 = 3.8755 × 10 −5 , h = 0.096, t 1 = 1968, x 1 = 105.5770, which match, respectively to values near to those obtained in the study of x(t).

Conclusions
The SWDP was applied to analyse the age dependency ratio in Morocco. This obtained an improved description of the time series considered (1968-2014) and improved medium-term forecasts (2015-2017). From the results obtained (see Table 2, Figures 1 and 2), we deduce that when the real case considered is modelled by the SWDP model according to the estimation procedure designated in Section 3, the fit and prediction achieved, based on ETF and ECTF, present an important degree of accuracy Table 4.
From one hand, as the retirement age is stable, when the life expectancy is rising, an important part of one's lifetime is spent in pension. On the other hand, while the birth rates is decreasing, the part of population who will afterwards represent the support to the rest of the population is going down. In view of the fact that the dependency ratio indicates how many people need to be supported relative to the number of people who are working, consequently, the increasing number of retirees and the decreasing workforce drive up the dependency ratio.
An interesting area for future research would be to examine the possibility of defining a non-homogeneous Weibull model, introducing exogenous factors into the drift, similarly to the approach adopted for other diffusions [17,18]. This would enable us to study the factors affecting the evolution of the age dependency ratio for example: fertility, immigration, mortality, health and work ability. Funding: This reasearch was financed by LAMSAD from "Fonds propres de l'Université Hassan I" (Morocco) and FQM-147 from "Plan Andaluz de l'Investigaciòn" (Spain).