Gaussian and Lerch Models for Unimodal Time Series Forecasting

We consider unimodal time series forecasting and propose Gaussian and Lerch models for this forecasting problem. The Gaussian model depends on three parameters and the Lerch model on four. We estimate the unknown parameters by minimizing the sum of the absolute values of the residuals. We solve these minimizations with and without a weighted median and compare both approaches. As a numerical application, we consider the daily COVID-19 infections in China using the Gaussian and Lerch models. We derive a confidence interval for the daily infections from each local minimum.


Introduction
The least absolute deviations (LAD) method of curve fitting proposed in [1] consists of fitting the data $(x_i, y_i)$, $i = 1, \dots, n$, to a function $f(x_i, \theta)$, where the parameter $\theta \in \mathbb{R}^p$ minimizes the sum of absolute deviations
$$\sum_{i=1}^{n} \left| y_i - f(x_i, \theta) \right|.$$
According to [2], in the linear regression case $f(x_i, \theta) = \sum_{j=1}^{p} x_{ij}\theta_j$, the minimization of the quantity $\sum_{i=1}^{n} \left| y_i - \sum_{j=1}^{p} x_{ij}\theta_j \right|$ was suggested by Boscovitch (1757) (some asymptotic results are given in [3]); see also [4-6]. The latter objective function is convex with respect to the parameter θ. Hence, it has only one minimum value, but it may have many minimizers.
The linear regression case of LAD optimization is inherently more complex than the minimization of the sum of squares. Interest in the LAD method is associated with the development of robust methods: the LAD method is more resistant to outliers in the data (see [7,8]).
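A toy illustration of this robustness, not taken from the paper: for the constant model $f(x, \theta) = \theta$, a median of the $y_i$ minimizes the sum of absolute deviations, while the mean minimizes the sum of squares. The data below are made up for illustration; a single outlier drags the least-squares fit far away while the LAD fit barely moves.

```python
import statistics

def lad_loss(c, ys):
    """Sum of absolute deviations for the constant model f(x, c) = c."""
    return sum(abs(y - c) for y in ys)

def ls_loss(c, ys):
    """Sum of squared deviations for the same constant model."""
    return sum((y - c) ** 2 for y in ys)

clean = [9.8, 10.1, 10.0, 9.9, 10.2]
dirty = clean + [100.0]  # one gross outlier (illustrative data)

for ys in (clean, dirty):
    c_lad = statistics.median(ys)  # a minimizer of lad_loss
    c_ls = statistics.mean(ys)     # the minimizer of ls_loss
    print(f"LAD fit = {c_lad:.3f}, least-squares fit = {c_ls:.3f}")
```

For an even number of points, every value between the two middle observations minimizes the LAD loss; `statistics.median` simply returns their midpoint, which mirrors the non-uniqueness of LAD minimizers mentioned above.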
The aim of our work is to analyze LAD minimization for a nonlinear regression motivated by the daily infections of COVID-19 in China during the first wave. We denote by I(t) the observed number of infected persons at time t ∈ [1, T], with T ≤ 60 (see Figure 1). The variable t = 1, ..., T represents day 1, ..., T. Justified by the sigmoidal nature of a pandemic, we propose two models for predicting the observed number of infected persons I(t) at time t: the Gaussian model (see [9]) and the Lerch model.
The Gaussian model is $I_{\mathrm{Gauss}}(t) = a \exp\left(-(t-l)^2/s^2\right)$ with parameter θ = (a, l, s), where a, l, and s denote, respectively, the peak, the location of the peak, and the width of the first wave of COVID-19.
The Lerch probability distribution on the non-negative integers is proportional to the function $z^t/(v+t)^s$, t = 0, 1, ..., with parameters z ∈ (0, 1) and v > 0. The Lerch probability distribution is strongly unimodal when s < −1 and v ≥ 1; in this case, its mode is given explicitly in [10] in terms of the integer part [·]. The Lerch model is $I_{\mathrm{Lerch}}(t) = a z^t/(v+t)^s$ with parameter θ = (a, z, v, s).
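The closed-form mode is the one given in [10]; as an independent cross-check, the sketch below finds the integer mode by scanning the unnormalized weights. For s < 0, the log-weight t ln z − s ln(v + t) is concave in t with stationary point t = s/ln z − v (our own elementary derivation, not the formula of [10]), so the integer mode lies within one unit of that point. The truncation `t_max` and the parameter values are illustrative assumptions, and the function names are hypothetical.

```python
import math

def lerch_log_weight(t, z, v, s):
    """Log of the unnormalized Lerch weight z**t / (v + t)**s."""
    return t * math.log(z) - s * math.log(v + t)

def lerch_mode(z, v, s, t_max=10_000):
    """Integer mode on {0, ..., t_max}, found by a direct scan.

    t_max truncates the support (an assumption); since z < 1 the weights
    eventually decay geometrically, so a large t_max contains the mode.
    """
    return max(range(t_max + 1), key=lambda t: lerch_log_weight(t, z, v, s))

def stationary_point(z, v, s):
    """Zero of the derivative of t*ln(z) - s*ln(v + t), i.e. t = s/ln(z) - v."""
    return s / math.log(z) - v

z, v, s = 0.5, 1.0, -3.0  # illustrative values with s < -1 and v >= 1
print(lerch_mode(z, v, s), round(stationary_point(z, v, s), 3))
```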
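To estimate the three parameters θ = (a, l, s) (respectively, the four parameters θ = (a, z, v, s)) from the T observations, we consider the LAD nonlinear regression
$$\hat\theta = \arg\min_{\theta} f_m(T, \theta), \qquad f_m(T, \theta) = \frac{1}{T}\sum_{t=1}^{T} \left| I(t) - I_m(t, \theta) \right|,$$
where $I_m(t, \theta)$ is the model curve, with the subscript m = Gauss (respectively, m = Lerch). The Gaussian model was studied in our previous work [11].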
As in our previous work [11], we propose to solve our LAD regression using the Nelder-Mead simplex algorithm, as implemented in the optim function of the R software. The Nelder-Mead algorithm [7,12-14] optimizes functions without using derivatives. It is a simplex method for finding a local minimum of a function; it is the most widely used direct search method for solving optimization problems and is considered one of the most popular derivative-free nonlinear optimization algorithms.
The output of the optim function depends on the initialization and is, in general, not a minimizer of the objective function. Restarting the Nelder-Mead algorithm from the last solution obtained (and continuing to restart it until there is no further improvement) can only improve the final solution, and the latter is, in general, a local minimizer. Algorithm 1 iterates the optim function in this way until convergence: starting from an initial value θ_0, it computes θ_{k+1} = optim(θ_k) and stops as soon as f_m(T, θ_{k+1}) is no smaller than f_m(T, θ_k).
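The restart loop can be sketched as follows. This is not R's `optim`: it is a small, self-contained Python re-implementation of a simplified Nelder-Mead (reflection, expansion, inside contraction and shrink with the usual coefficients 1, 2, 0.5, 0.5), and the function names `nelder_mead` and `restarted_nelder_mead`, the initial step and the tolerances are our own assumptions.

```python
def nelder_mead(f, x0, step=0.1, max_iter=500, tol=1e-10):
    """Simplified Nelder-Mead search; returns (best point, best value)."""
    n = len(x0)
    # Initial simplex: x0 plus one perturbed vertex per coordinate.
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step if vertex[i] == 0 else step * abs(vertex[i])
        simplex.append(vertex)
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if values[-1] - values[0] < tol:
            break  # simplex values have collapsed
        # Centroid of all vertices except the worst one.
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        worst = simplex[-1]
        reflected = [2 * centroid[i] - worst[i] for i in range(n)]
        f_r = f(reflected)
        if f_r < values[0]:
            expanded = [3 * centroid[i] - 2 * worst[i] for i in range(n)]
            f_e = f(expanded)
            if f_e < f_r:
                simplex[-1], values[-1] = expanded, f_e
            else:
                simplex[-1], values[-1] = reflected, f_r
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            contracted = [0.5 * (centroid[i] + worst[i]) for i in range(n)]
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:  # shrink every vertex towards the best one
                best = simplex[0]
                simplex = [best] + [[0.5 * (best[i] + v[i]) for i in range(n)]
                                    for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    k = min(range(n + 1), key=lambda j: values[j])
    return simplex[k], values[k]


def restarted_nelder_mead(f, x0, max_restarts=100):
    """Restart Nelder-Mead from its last output until the value stops improving."""
    x, fx = nelder_mead(f, x0)
    for _ in range(max_restarts):
        x_new, fx_new = nelder_mead(f, x)
        if fx_new >= fx:
            break  # no further improvement
        x, fx = x_new, fx_new
    return x, fx
```

Because each restart includes the previous solution as a vertex of its initial simplex, the returned value can never increase, which is the monotonicity property the restart argument relies on.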

Probabilistic Interpretation of LAD Regression
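Let us assume that
$$I(t) = I_m(t, \theta) + e(t), \qquad t = 1, \dots, T,$$
where the errors (e(t)) are i.i.d. with the common Laplace probability density
$$p(e) = \frac{1}{2\lambda} \exp\left(-\frac{|e|}{\lambda}\right), \qquad e \in (-\infty, \infty),$$
with location parameter 0 and scale parameter λ > 0 (see, e.g., [15,16]). The Laplace distribution, named after Pierre-Simon Laplace (1749-1827), is the distribution whose likelihood is maximized when the location parameter is set to the median. Based on the data (I(1), ..., I(T)), the likelihood is equal to
$$L(\theta, \lambda) = \prod_{t=1}^{T} \frac{1}{2\lambda} \exp\left(-\frac{|I(t) - I_m(t, \theta)|}{\lambda}\right) = (2\lambda)^{-T} \exp\left(-\frac{T f_m(T, \theta)}{\lambda}\right).$$
It follows that the maximum likelihood estimators of the parameters θ and λ are
$$\hat\theta = \arg\min_{\theta} f_m(T, \theta), \qquad \hat\lambda = f_m(T, \hat\theta).$$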
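The passage from the likelihood to the two estimators can be made explicit. The derivation below assumes $f_m(T,\theta) = \frac{1}{T}\sum_{t=1}^{T}|I(t) - I_m(t,\theta)|$ (the normalization by $1/T$ is what makes $\hat\lambda = f_m(T,\hat\theta)$ hold) and $f_m(T,\hat\theta) > 0$.

```latex
\ell(\theta,\lambda) = \ln L(\theta,\lambda)
  = -T\ln(2\lambda) - \frac{T\, f_m(T,\theta)}{\lambda}.
% For every fixed \lambda > 0, \ell decreases in f_m, so \hat\theta = \arg\min_\theta f_m(T,\theta).
\frac{\partial \ell}{\partial \lambda}(\hat\theta,\lambda)
  = -\frac{T}{\lambda} + \frac{T\, f_m(T,\hat\theta)}{\lambda^{2}} = 0
  \quad\Longrightarrow\quad \hat\lambda = f_m(T,\hat\theta),
\qquad
\frac{\partial^{2} \ell}{\partial \lambda^{2}}(\hat\theta,\hat\lambda)
  = \frac{T}{\hat\lambda^{2}} - \frac{2T\, f_m(T,\hat\theta)}{\hat\lambda^{3}}
  = -\frac{T}{\hat\lambda^{2}} < 0 .
```

The negative second derivative confirms that $\hat\lambda$ is a maximum of the profile likelihood in λ.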
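In practice, $\hat\theta$ is given by an optimization algorithm and is usually only a local minimizer. Given $\hat\theta$ and the scale $\hat\lambda$, we derive a confidence interval for I(t), t > T, of the form $[\hat I_m(t) - q, \hat I_m(t) + q]$, where q is the solution of the equation
$$P(|e(t)| \le q) = 1 - \exp(-q/\hat\lambda) = 0.95,$$
given by $q = -\hat\lambda \ln(0.05) = 2.995732\,\hat\lambda$. This yields a confidence interval for I(t) with confidence level 0.95. Here, $\hat I_m(t) = \hat a \exp\left(-(t - \hat l)^2/\hat s^2\right)$ in the Gaussian case, and $\hat I_m(t) = \hat a \hat z^t/(\hat v + t)^{\hat s}$ in the Lerch case.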
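The interval computation is short enough to sketch directly. The helper names below are hypothetical, and the numerical arguments in the example call are placeholders rather than fitted values from this work; with real data, `i_hat` would be $\hat I_m(t)$ from a minimizer and `lam` the corresponding $\hat\lambda = f_m(T, \hat\theta)$.

```python
import math

def laplace_half_width(lam, level=0.95):
    """Half-width q such that P(|e| <= q) = level when e ~ Laplace(0, lam).

    P(|e| <= q) = 1 - exp(-q / lam), hence q = -lam * ln(1 - level);
    for level = 0.95 this is q = -lam * ln(0.05), about 2.995732 * lam.
    """
    return -lam * math.log(1.0 - level)

def gauss_curve(t, a, l, s):
    """Gaussian model a * exp(-(t - l)^2 / s^2)."""
    return a * math.exp(-((t - l) ** 2) / s ** 2)

def lerch_curve(t, a, z, v, s):
    """Lerch model a * z^t / (v + t)^s."""
    return a * z ** t / (v + t) ** s

def confidence_interval(i_hat, lam, level=0.95):
    """Interval [i_hat - q, i_hat + q] for the observation I(t)."""
    q = laplace_half_width(lam, level)
    return i_hat - q, i_hat + q

# Illustrative call with placeholder parameters (not results from the paper).
lo, hi = confidence_interval(gauss_curve(12, a=1000.0, l=20.0, s=8.0), lam=50.0)
print(round(lo, 2), round(hi, 2))
```

The lower bound can be negative for small predicted counts; clipping it at zero would be a natural, but additional, modelling choice.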

LAD Regression Analysis Using Weighted Median
Before going further, we recall the definition of the weighted median.

Weighted Median
We recall, in the following proposition, the definition and the computation of the weighted median. For more details, we refer the reader to [17].
Proposition 1. Let (x(t), w(t)), t = 1, ..., T, be a sequence of real numbers with positive weights w(t) > 0. A minimizer of the function $a \mapsto \sum_{t=1}^{T} w(t)\,|a - x(t)|$ (called the weighted median) is obtained as follows. We compute the permutation p(1), ..., p(T) that rearranges the sequence (x(t) : t = 1, ..., T) into ascending order. We form the sequence (w(p(t)) : t = 1, ..., T), and we find the largest integer k ∈ {0, ..., T − 1} that satisfies
$$\sum_{t=1}^{k} w(p(t)) \le \frac{1}{2} \sum_{t=1}^{T} w(t);$$
then the weighted median is a = x(p(k + 1)).
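Proposition 1 translates almost line by line into code. In this minimal sketch (the name `weighted_median` is ours), the first sorted point whose cumulative weight exceeds half of the total weight is exactly x(p(k + 1)) for the largest k satisfying the inequality above.

```python
def weighted_median(x, w):
    """Minimizer of a -> sum_t w[t] * |a - x[t]| for positive weights w.

    Sort x, then return the first sorted point whose cumulative weight
    exceeds W / 2 (W = total weight), i.e. x(p(k + 1)) in Proposition 1.
    """
    if len(x) != len(w) or not x:
        raise ValueError("x and w must be non-empty and of equal length")
    if any(wi <= 0 for wi in w):
        raise ValueError("weights must be positive")
    p = sorted(range(len(x)), key=lambda t: x[t])  # ascending permutation
    half = sum(w) / 2.0
    cumulative = 0.0
    for t in p:
        cumulative += w[t]
        if cumulative > half:
            return x[t]
    return x[p[-1]]  # not reached for positive weights; kept for safety

# With equal weights this is an ordinary median; heavy weights pull it.
print(weighted_median([3.0, 1.0, 2.0], [1.0, 1.0, 1.0]))
print(weighted_median([1.0, 2.0, 3.0], [1.0, 1.0, 5.0]))
```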

Back to Our Proposed LAD Regression
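Both our LAD regressions have the form
$$f(T, a, b) = \frac{1}{T} \sum_{t=1}^{T} \left| I(t) - a\, g(t, b) \right|,$$
with a > 0, where g : (0, +∞) × D → (0, +∞) is a continuous positive map and D is a Euclidean domain (b = (l, s) in the Gaussian case and b = (z, v, s) in the Lerch case). We can now state the following corollary.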
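This achieves the proof of (3). The proof of (2) works as follows. There exists a neighborhood V of b* such that f(T, a(b), b) ≥ f(T, a(b*), b*) for all b ∈ V. By (1), f(T, a, b) ≥ f(T, a(b), b) ≥ f(T, a(b*), b*) for every a and every b ∈ V, so (a(b*), b*) is a local minimizer of the map (a, b) → f(T, a, b).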

Numerical Results
In China, COVID-19 appeared on 23 December 2019 in the Wuhan region, and after its fast initial spread, strict social distancing rules were imposed almost one month later. Three months after the first reported cases, the spread in China subsided. The China data in Figure 1 were extracted from owid/COVID-19-data, available on the web.
Figure 1 shows that the peak and its location are, respectively, a = 15,136 and l = 22.

Confidence Intervals Using Six Minima with T = 10 in the Gaussian Model
We recall that the confidence interval of I(t) from the minimizer $(\hat a, \hat l, \hat s)$ is given by $[\hat I_{\mathrm{Gauss}}(t) - q, \hat I_{\mathrm{Gauss}}(t) + q]$, where $\hat I_{\mathrm{Gauss}}(t) = \hat a \exp\left(-(t - \hat l)^2/\hat s^2\right)$, $q = 2.995732\,\hat\lambda$, and $\hat\lambda = f_{\mathrm{Gauss}}(T, \hat a, \hat l, \hat s)$. In Figure 2, we present, for the Gaussian model, the confidence intervals for the global minimum and five local minima from the list of minimizers obtained with T = 10. The predictions using the minima 335.047 and 336.92 are clearly bad. However, among the six minima, these two give the predictions at the location l = 22 that are closest to the real peak. The percentage of predictions is given in Table 1. An R source code is given in Appendix A; it can be used to determine the confidence intervals for the other values of T once the list of minima has been determined using one of the three considered methods.

LAD Regression Using the Lerch Model with T = 10
By varying the initial condition and applying Algorithm 1 to the function $(z, v, s) \mapsto f_{\mathrm{Lerch}}(T, a(T, z, v, s), z, v, s)$, we found a huge number of minima; Table 2 shows some of them. We recall that a(T, z, v, s) is the weighted median of the sequence $(x(t) = I(t)(v + t)^s/z^t : t = 1, \dots, T)$ with the weights $(w(t) = z^t/(v + t)^s : t = 1, \dots, T)$. We also recall that, in the Gaussian case, the surface $(l, s) \mapsto f_{\mathrm{Gauss}}(T, a(T, l, s), l, s)$ has only one minimum, which coincides with the global minimum of the map $(a, l, s) \mapsto f_{\mathrm{Gauss}}(T, a, l, s)$.
Figures 4 and 5 show, respectively, the optimal time series for the Lerch and Gauss models for T = 10, using our list of local minima. The Lerch model gives better predictions of I(t) for t > 10, whereas the Gauss model better predicts the location and the size of the peak. In Table 3, we report the mode position and the corresponding amplitude for each minimizer. The percentage of predictions is given in Table 2. Observe that the best percentage of predictions is attained at the global minimum for the Lerch model, but at the local minimizer 325.186 for the Gauss model.
In Figure 6, we present, for the Lerch model, the confidence intervals for the global minimum and five local minima (six minima in total) from the list of minimizers obtained with T = 10.

The Case T > 10
In this section, we consider the cases T = 20 and T = 60. In the Lerch model, LAD regression still has many minimizers for T = 20 (see Table 4), whereas in the Gauss model, LAD regression has only one minimizer for T = 20 (see Table 5). Tables 4 and 5 show that the percentage of predictions of the Lerch model is better than that of the Gauss model. Finally, for T = 60, the minimizer of each model is unique: for the Gauss model, $(\hat a, \hat l, \hat s) = (3318.433, 16.94084, 10.19735)$ with minimum value 617.2386, and for the Lerch model, $(\hat a, \hat z, \hat v, \hat s) = (0.05691053, 0.7445283, 3.3413322, -5.2886448)$ with minimum value 570.7861098.
Figure 7 shows the optimal time series for the Lerch and Gauss models for T = 60, computed from these minimizers. We observe that the Lerch approximation has a heavier tail than the Gaussian approximation.
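The heavier-tail remark can be checked numerically by evaluating both fitted curves, using the T = 60 minimizers reported above, at a few days past the peak. This sketch assumes the reported tuples list (a, l, s) and (a, z, v, s) in that order, with the minimum value of the objective as the last entry (not used here). Because ŝ < 0, the Lerch curve decays like a polynomial times a geometric factor, whereas the Gaussian curve decays like exp(−t²).

```python
import math

# Minimizers for T = 60 as reported in the text (objective value omitted).
GAUSS = dict(a=3318.433, l=16.94084, s=10.19735)
LERCH = dict(a=0.05691053, z=0.7445283, v=3.3413322, s=-5.2886448)

def gauss_fit(t, a, l, s):
    """Fitted Gaussian curve a * exp(-(t - l)^2 / s^2)."""
    return a * math.exp(-((t - l) ** 2) / s ** 2)

def lerch_fit(t, a, z, v, s):
    """Fitted Lerch curve a * z^t / (v + t)^s."""
    return a * z ** t / (v + t) ** s

for t in (20, 40, 60, 80):
    g, l_ = gauss_fit(t, **GAUSS), lerch_fit(t, **LERCH)
    print(f"t={t}: Gauss={g:.3g}, Lerch={l_:.3g}, ratio={l_ / g:.3g}")
```

The Lerch-to-Gauss ratio grows rapidly with t, which is the heavier tail visible in Figure 7.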

Conclusions
In this work, we considered LAD nonlinear regression and showed numerically, in the Gaussian case $g(t, l, s) = \exp\left(-(t-l)^2/s^2\right)$, that the map $(a, l, s) \mapsto f(T, a, l, s)$ has a huge number of local minima, whereas the surface $S(T, l, s) = f(T, a(T, l, s), l, s)$ has a single minimum, its global minimum, which is also the global minimum of the map $f(T, \cdot)$. However, in the Lerch case $g(t, z, v, s) = z^t/(t+v)^s$, contrary to the Gaussian case, we showed that the maps $(a, z, v, s) \mapsto f(T, a, z, v, s)$ and $(z, v, s) \mapsto S(T, z, v, s) = f(T, a(T, z, v, s), z, v, s)$ each have a huge number of local minima. We derived confidence intervals for the daily infections from each local minimum. Our message is that each local minimum contains part of the information and can be used to predict some of the parameters.

Corollary 1. For each fixed b, the minimum of the function a → f(T, a, b) is attained at the weighted median a(b) of the sequence (x(t) = I(t)/g(t, b) : t = 1, ..., T) endowed with the weights (w(t) = g(t, b) : t = 1, ..., T). Moreover, if (a*, b*) is a local minimizer of the function (a, b) → f(T, a, b), then a* is the weighted median of (x*(t) = I(t)/g(t, b*) : t = 1, ..., T) endowed with the weights (w*(t) = g(t, b*) : t = 1, ..., T).
Proof. For fixed b, since g(t, b) > 0, we have |I(t) − a g(t, b)| = g(t, b) |x(t) − a|, so f(T, a, b) = (1/T) ∑ w(t) |a − x(t)| is a convex function of a, minimized at the weighted median by Proposition 1. Now, let us assume that (a*, b*) is a local minimizer of the function (a, b) → f(T, a, b). Then a* is a local, hence global, minimizer of the convex function a → f(T, a, b*). Hence, a* is the weighted median of (x*(t) = I(t)/g(t, b*) : t = 1, ..., T) endowed with the weights (w*(t) = g(t, b*) : t = 1, ..., T).
Comparison of the Minimizers of the Map b → f(T, a(b), b) and the Minimizers of the Map (a, b) → f(T, a, b)
The following proposition is obvious.
Proposition 2. (1) For each fixed a, the map b → f(T, a, b) lies above the map b → f(T, a(b), b), and they intersect on the curve a = a(b).
(2) If b* is a local minimizer of the map b → f(T, a(b), b), then (a(b*), b*) is also a local minimizer of the map (a, b) → f(T, a, b).
(3) The local minimizers of the map (a, b) → f(T, a, b) belong to the set {(a(b), b) : b}. If (a(b*), b*) is a local minimizer of the map (a, b) → f(T, a, b), then, in general, b* is not a local minimizer of the map b → f(T, a(b), b). However, if (a(b*), b*) is a global minimizer of (a, b) → f(T, a, b), then b* is also a global minimizer of b → f(T, a(b), b).
Proof. For each fixed b, the minimum of the map a → f(T, a, b) is attained at the point a(b), which implies that f(T, a, b) ≥ f(T, a(b), b) and proves (1). Let (a*, b*) be a local minimizer of the map (a, b)
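Corollary 1 gives the reduced ("profile") objective b → f(T, a(b), b) that is minimized over b. The sketch below evaluates it for any positive g; the function names and the toy counts in the usage note are hypothetical, and only `gauss_g` and `lerch_g` encode the two models of this work.

```python
import math

def weighted_median(x, w):
    """First sorted point whose cumulative weight exceeds W/2 (Proposition 1)."""
    p = sorted(range(len(x)), key=lambda t: x[t])
    half, cumulative = sum(w) / 2.0, 0.0
    for t in p:
        cumulative += w[t]
        if cumulative > half:
            return x[t]
    return x[p[-1]]

def lad_objective(I, a, g, b):
    """f(T, a, b) = (1/T) * sum_{t=1}^{T} |I(t) - a g(t, b)|."""
    T = len(I)
    return sum(abs(I[t - 1] - a * g(t, b)) for t in range(1, T + 1)) / T

def profile(I, g, b):
    """Return (a(b), f(T, a(b), b)), where a(b) is the weighted median of
    x(t) = I(t)/g(t, b) with weights w(t) = g(t, b) (Corollary 1)."""
    T = len(I)
    gt = [g(t, b) for t in range(1, T + 1)]
    a_b = weighted_median([I[t] / gt[t] for t in range(T)], gt)
    return a_b, lad_objective(I, a_b, g, b)

def gauss_g(t, b):
    l, s = b
    return math.exp(-((t - l) ** 2) / s ** 2)

def lerch_g(t, b):
    z, v, s = b
    return z ** t / (v + t) ** s
```

For example, `profile(I, lerch_g, (z, v, s))` returns the quantity that Algorithm 1 minimizes over (z, v, s) in the Lerch case; by Proposition 2 (1), no other choice of a can give a smaller value of f(T, a, b) for the same b.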

Proposition 3 .
Assume that the map b → f(T, a(b), b) has a unique local minimizer, namely its global minimizer. Then, the map (a, b) → f(T, a, b) may have many local minimizers, and b → a(b) is discontinuous at any point b*, other than the global minimizer, such that (a(b*), b*) is a local minimizer of the map (a, b) → f(T, a, b). Proof. By definition of a local minimizer, there exists a neighborhood V of (a(b*), b*) such that f(T, a, b) ≥ f(T, a(b*), b*) for each point (a, b) ∈ V. If b → a(b) were continuous at b*, then (a(b), b) would belong to V for all b near b*, so that f(T, a(b), b) ≥ f(T, a(b*), b*) for all b near b*, and b* would be a local minimizer of the map b → f(T, a(b), b). This is absurd, because b* is not the global minimizer and b → f(T, a(b), b) has no other local minimizer.

Confidence Intervals Using Six Minima with T = 10 in the Lerch Model

Figure 6. Confidence intervals using six minima with T = 10 in the Lerch model.

Figure 7. Optimal time series for T = 60 for the Lerch and Gauss models.

Table 1. A list of minimizers of (a, l, s) → f_Gauss(T, a, l, s) with T = 10.

Table 2. A list of minimizers of (a, z, v, s) → f_Lerch(T, a, z, v, s) with T = 10.

Table 3. Mode position and amplitude using the Lerch model.

Table 4. A list of minimizers of (a, z, v, s) → f_Lerch(T, a, z, v, s) with T = 20. The last column contains the percentage of predictions.

Table 5. The unique minimizer of (a, l, s) → f_Gauss(T, a, l, s) with T = 20.