A Stochastic Lomax Diffusion Process: Statistical Inference and Application

: In this paper, we discuss a new stochastic diffusion process in which the trend function is proportional to the Lomax density function. This distribution arises naturally in the studies of the frequency of extremely rare events. We ﬁrst consider the probabilistic characteristics of the proposed model, including its analytic expression as the unique solution to a stochastic differential equation, the transition probability density function together with the conditional and unconditional trend functions. Then, we present a method to address the problem of parameter estimation using maximum likelihood with discrete sampling. This estimation requires the solution of a non-linear equation, which is achieved via the simulated annealing method. Finally, we apply the proposed model to a real-world example concerning adolescent fertility rate in Morocco.


Introduction
Stochastic diffusion models are used to analyze the evolution of phenomena in multiple fields of science, including biology, finance, energy consumption and physics. In addition to traditional applications, stochastic diffusion processes (SDPs) have attracted considerable attention as analytical tools in areas such as cell growth, population growth and environmental studies. In this respect, see for example: Lognormal [1]; Gompertz [2]; Logistic [3]; Hyperbolic [4]; Rayleigh [5]; Pearson [6]; Weibull [7] and Brennan-Schwartz [8].
However in most of these studies, the processes considered are time homogeneous, in other words, the present state of the process depend only on the previous states and not on time. In contrast observations from many fields such a as neuroscience, finance and biology, suggest otherwise. Various non-homogeneous SDPs have been proposed to reflect this time dependent behavior, see for example: Lognormal [9], Gompertz [10], Vasicek [11], Brennan-Schwartz [12], and Gamma [13] processes.
In most of the aforementioned studies, the statistical inference is based on the maximum likelihood function, which is the product of transition densities. However, in some cases the closed form of the transition density is unknown, or has complicated expression, so the maximum likelihood method remains difficult to implement. Therefore many methods based on an approximation of the maximum likelihood were developed, such as: Prakasa-Rao [14], Kloeden et al. [15], Bibby et al. [16] and among others.
The Pareto type (II) distribution or Pearon type (IV) distribution, also called Lomax distribution, was introduced and studied by Lomax [17]. This distribution is commonly used in reliability and many lifetime testing studies. It is also used to analyze business data.
The density function of a Lomax distribution on [0, +∞[ with β > 0 (scale parameter), and α > 0 (shape parameter) is given by: This distribution is a special case of a more general one called the Generalized Pareto distribution, the density function of which has the following form: , where µ and k are real parameters and σ > 0. This distribution encompasses the Pareto distribution as a special case since if we set µ = 0 and k = 1 α we obtain the Equation (1).
In the present paper, we introduce a new Stochastic Lomax Diffusion Process (SLDP) as a non-homogeneous extension of the lognormal process, and which presents a trend function that is proportional to the Lomax density function. Moreover, the term adopted for the model we study will be improved by stochastic calculus. In this work, we will present a detailed and complete study of the Lomax Model. To this end, we will proceed as follows: In Section 2, we define the model in terms of stochastic differential equation (SDE), we then give the analytical expression of the solution of the proposed model. After which, we determine the Transition Probability Density Function (TPDF) and the trend functions. In Section 3, we deal with the problem of parameter estimation using Maximum Likelihood (ML) in the basis of discrete sampling. In this case, the system of likelihood equations does not have an explicit solution, so as a result the ML estimators cannot be given in the closed form. Then, one possible way to solve this basic problem is the use of numerical methods. In Section 4, we propose the simulated annealing method approximating the ML estimator then we show the results of the simulation of the process in Section 5. Moreover, in Section 6, we illustrate the results obtained by this method by reference to real data, namely the adolescent fertility rate in Morocco. Finally, we summarize the main conclusions drawn from this work.

The Model
The proposed model is the one-dimensional non-homogeneous SDP {x(t), t ∈ [t 1 , T], t 1 ≥ 0} taking values on [0, ∞] and with drift and diffusion coefficients: where σ > 0, β > −t 1 and α are real parameters. Alternatively, the process defined above can be considered as the unique solution to the following SDE: where w(t) is the one-dimensional standard Wiener process and x t 1 is fixed in R * + .

Distribution of the Process
The SDE in Equation (3) has a unique solution (see Kloeden et al [15]). In order to obtain this solution, we consider the appropriate transformation y(t) = log(x(t)), then, by means of Itô formula, the Equation (3) becomes: The solution to which is: For s ∈ [t 1 , t]. Hence, we deduce the expression of the solution to SDE in Equation (3): then x(t)|x(s) = x s follows a Lognormal distribution: where Λ is the Lognarmal distribution. As a result, the TPDF of this process is found to be:

Trend Functions
From the properties of the Lognormal distribution, the main characteristics of the process can be determined, in particular the r-th conditional moment of the process is given by: Then, by considering the case where r = 1 in the previous expression, the conditional trend function of the process is: In addition, taking into account the initial condition P(x(t 1 ) = x t 1 ) = 1, the trend function of the process is: We note here that: • The trend function as defined in Equation (6) is proportional to the Lomax density function Equation (1). • Otherwise, in the absence of white noise (i.e., σ = 0) the solution to equation Equation (3) is x(t) = x s ( s+β t+β ) α which is proportional in this case to the Lomax density function [18], with shape parameter α and scale parameter β, which can be denoted P(I I)(β, α, µ = 0).

Maximum Likelihood Estimation
We consider a discrete sample of n observations of the process x(t) which we denote here (x i ) i=1,...,n , let (t 0 < t 1 < ... < t n ) denotes the moments when the process was observed with x i = x(t i ) moreover we set t i − t i−1 = h, and finally θ = (α, σ) is the parameters vector.
We know that the likelihood function l(x, θ) is the product of the densities functions: The Log-likelihood is given by: We differentiate this function with respect to the elements of vector θ to obtain the following equations: Equation (8) is a second-degree equation in σ 2 , which admits two solutions (since . Therefore, from the non-negative solution corresponding to σ 2 , the estimatorσ 2 is given by: By replacing σ 2 byσ 2 in Equation (7), the estimator of α is satisfying the following non-linear equation: On the other hand, substituting σ 2 byσ 2 in Equation (10), Obviously, this is a set of non-linear equations whose solutions may be difficult to find. To address this problem we use numerical resolution methods.

Computational Aspects
In this paper we suggest the simulated annealing (SA) method for solving the equations Equations (11) and (12). Hereafter is the description of the method.
Simulated annealing is a stochastic optimisation algorithm, developed in 1983 by [19], which approaches the global optimum of a given cost function by means of a random search. The fundamental idea of the algorithm is inspired by the process of annealing of metals in metallurgy. At each step of the simulated annealing algorithm a new point is randomly generated, if the new point improves the cost function it is accepted, otherwise, it is accepted with a probability exp(−∆ f /T), where f is the cost function and T is the temperature. Accepting points tat don't improve the cost function allows the algorithm to escape local optima. The main disadvantage of this method is that the adjustment of the parameters (initial temperature, minimum temperature, cooling process and stopping conditions ...) considerably affects the time required to reach the extremum.

Simulation
To illustrate the process described by Equation (3)  Using the simulated annealing method to solve Equations (11) and (12) The values obtained used this method are α = 1.511872058, β = 90.002739556 and σ 2 = 0.000099480. We then calculated the mean value of the simulated paths at each time , where x j is the sample path j, t i is the time step i and m is the number of simulated trajectories. We also obtained the estimated trend functions of the process using each method, the result are plotted in Figure 2.

Data Description
In this application, we examined the variable x(t) defined by the adolescent fertility rate, which is the number of births per 1000 women aged 15 to 19, these data are annual and are available on the site: https://data.worldbank.org/. The average value for Morocco during the period from 1979 to 2018 is 42.395655 with a minimum value of 30.6810 in 2018 and a maximum value of 92.9376 in 1979. Table 1 illustrates the observed values, as well as the ETF and Estimated Conditional Trend Function (ECTF) during this period.
In Figure 3, real data are plotted against trend functions (conditional and unconditional). The unconditional trend function provides a good estimates for the real values, the accuracy of those estimates can be more accurate if we consider the conditional trend function. Table 2 shows the estimated data for the years from 2016 to 2018 that were not used in the modeling and the actual data.

Goodness of Fit of the Model
The absolute mean error in percentage (MAPE) is the average of the deviations in absolute value compared to the observed values. It is a practical indicator of comparison, it makes it possible to evaluate the forecasts obtained from the models. We denote by y i , y i and n respectively the real values, the values predicted by the model and the number of predictions, so we have: The symmetric mean percentage absolute error (SMAPE) is a measure of precision based on relative errors and is defined as follows: The values obtained for the MAPE and SMAPE are: 1.756265 and 1.759680, respectively. The MAPE value is less than 10, so according to Lewis [20] the values obtained by this model are "very precise".

Conclusions
In this study of the stochastic Lomax diffusion process, from a theoretical point of view, we conclude that we can determine the basic probabilistic characteristics of the model and we obtain its parameter estimators. Using the maximum likelihood method in the basis of discrete sampling, we obtained a series of non linear equations which were solved by computational methods. We used the simulated annealing method to estimate the parameters of the model. Hence, a set of statistical results are obtained and show that the proposed process is enable to be applied to real data.
The Lomax model is applied to fit data for adolescent fertility rate in Morocco. The ETF presented a good description of the changing levels of the fertility rate. Furthermore, the pe-riod from 2016 to 2018 improved good forecasts. Then, the resulting values obtained by the MAPE and SMAPE were calculated and showed good results. Taking into account these points, we deduced that the methodology applied in the study of this new model was efficient and present a high degree of accuracy.