Abstract
This paper describes one of the non-linear (and non-stationary) stochastic models, the GSB (Gaussian, or Generalized, Split-BREAK) process, which is used in the analysis of time series with pronounced and accentuated fluctuations. In the beginning, the stochastic structure of the GSB process and its important distributional and asymptotic properties are given. To that end, a method based on characteristic functions (CFs) was used. Various procedures for the estimation of model parameters, asymptotic properties, and numerical simulations of the obtained estimators are also investigated. Finally, as an illustration of the practical application of the GSB process, an analysis is presented of the dynamics and stochastic distribution of the infected and immunized population in relation to the disease COVID-19 in the territory of the Republic of Serbia.
Keywords:
stochastic processes; emphatic fluctuations; non-stationarity; asymptotic normality; Gaussian distribution; estimation; COVID-19
MSC:
60E10; 60F05; 62M10
1. Introduction
Stochastic models which are used in the analysis of time series with pronounced and permanent fluctuations are of particular importance in contemporary research. For this purpose, we start from the basic results of Engle and Smith [1], who first introduced the so-called STOchastic Permanent BREAKing process, popularly called the STOPBREAK process. Many authors have since considered the STOPBREAK notion, primarily in the field of econometrics. Some of its modifications were considered, among others, in [2,3,4,5], while its application was presented, for instance, in [6,7,8].
The original modification of the STOPBREAK process, named the Split-BREAK model, was introduced in [9]. After that, the general form of this process, named the Gaussian (or Generalized) Split-BREAK (GSB) process, was proposed in [10,11,12]. This stochastic model can also be viewed as a generalization of STOPBREAK, as well as of the well-known linear Auto-Regressive Moving Average (ARMA) model. As such, the GSB process has already been applied in analyzing non-linear time series with pronounced and permanent fluctuations. Note that the works mentioned above mainly considered the stochastic properties of the stationary components of the GSB process. The main goal of this paper is a more detailed investigation of the non-stationary components (time series) of the GSB model. These series naturally have a more complex stochastic structure, but they are of particular interest in contemporary research [13,14,15,16,17,18]. To this end, the asymptotic properties of the distributions of the GSB series will also be of specific interest.
In addition to the theoretical aspects, we also consider the application of the GSB process in describing the dynamics, and finding an adequate stochastic distribution, of the infected and immunized population with respect to COVID-19 on the territory of the Republic of Serbia. Many authors dealing with this still-current issue have contributed theoretical models that investigate it from several aspects. For instance, rigorous mathematical models, usually based on analyzing and solving systems of partial coupled equations, have been proposed, among others, in [19,20,21]. On the other hand, the works [22,23,24,25] combine deterministic and stochastic approaches, such as multiple and logistic regression, multifactor correlation, and least squares estimation, to predict the various effects caused by the COVID-19 pandemic. A particularly interesting approach is given in [26,27], where machine learning techniques and the construction of a complete information system are used to predict the COVID-19 dynamics more accurately. Finally, to the best of our knowledge, most stochastic approaches to date in the analysis of infection, immunization, and other indicators related to COVID-19 have been based on the gamma distribution [21,28] or the log-normal distribution [29]. This is one of the reasons why we believe that the approach given here is different, primarily in the stochastic modeling of this problem. At the same time, let us emphasize that our main goal is to model the temporal dynamics of COVID-19 based on a formal study of the stochastic structure of the GSB model. In this sense, the fact that other indicators and features of the disease that can also affect its dynamics (see, for instance, [30,31,32]) are not taken into account can, to a certain degree, be a limitation of this approach.
In the next section, starting from previous works [9,10,11,12], some definitions and basic stochastic properties of the GSB process are discussed. Section 3 contains the main and novel results related to this process’s detailed stochastic structure and asymptotic properties, where the method of characteristic functions (CFs) was used as the basic tool. Section 4 presents the procedure for estimating the unknown parameters of the GSB process and an investigation of the asymptotic properties of the obtained estimators. Numerical Monte Carlo simulations of the obtained estimators are considered in Section 5. In addition, the application of the GSB process in describing the dynamics and distribution of the size of infected and immunized populations on the territory of the Republic of Serbia is given here. Finally, concluding remarks are highlighted in Section 6.
2. Definition and Main Properties of the GSB Process
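The basic series $(y_t)$ of the GSB process is defined by the following equality:
$$y_t = m_t + \varepsilon_t. \tag{1}$$
Here, $t$ denotes the known time points, $(m_t)$ is the series of the so-called martingale means, and $(\varepsilon_t)$ are the innovations, i.e., a series of independent and identically distributed (IID) Gaussian random variables (RVs) with zero mean and variance $\sigma^2$. Moreover, it is assumed that $(y_t)$ is defined on a probability space $(\Omega, \mathcal{F}, P)$, equipped with a filtration $(\mathcal{F}_t)$, i.e., a nondecreasing family of $\sigma$-algebras $\mathcal{F}_t \subseteq \mathcal{F}$. In a practical sense, the filtration represents the set of “information” available at time $t$. Therefore, it is assumed that, for each $t$, the RVs $\varepsilon_t$ are $\mathcal{F}_t$-adapted. Accordingly, the conditional expectation and the conditional variance of the RVs $y_t$ are, respectively,
$$E(y_t \mid \mathcal{F}_{t-1}) = m_t, \qquad \mathrm{Var}(y_t \mid \mathcal{F}_{t-1}) = \sigma^2.$$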
On the other hand, for the martingale means $(m_t)$, we assume that they are defined by the following recurrence relation:
$$m_t = m_{t-1} + q_{t-1}\,\varepsilon_{t-1}. \tag{2}$$
Here, we can effectively assume that and . Meanwhile, $q_t$ is the so-called noise indicator, i.e., the RV that depends on the innovations $(\varepsilon_t)$ in the following way:
$$q_t = I\left(\varepsilon_{t-1}^2 > c\right) = \begin{cases} 1, & \varepsilon_{t-1}^2 > c, \\ 0, & \varepsilon_{t-1}^2 \le c. \end{cases}$$
The value $c > 0$ represents the critical value of the reaction, i.e., the level of significance that the previous realizations of the innovations must exceed for their present values to be included in Equation (2). In other words, the value $q_{t-1} = 0$ indicates that there is no change in the martingale mean $m_t$ compared to the previous value $m_{t-1}$. Consequently, the value $y_t$ is obtained with a “small” fluctuation, which depends only on $\varepsilon_t$. By contrast, in the case $q_{t-1} = 1$, an emphatic (permanent) fluctuation of $y_t$ is registered. Thus, the level of previous realizations of the series $(\varepsilon_t)$ affects the degree of variation in the series $(y_t)$, that is, it indicates the intensity of fluctuations in the GSB process. Furthermore, according to the previous equalities, it follows that:
from which we conclude that the series realizations are “close” to the martingale means . Moreover, it is valid to put:
i.e., the mean values of the series and have equal, constant values. We notice that the previous equalities speak a lot about the stochastic nature of the GSB process, that is, the additive decomposition (1). Since the sequence is measurable concerning the field , it represents a component of predictability and stability of the GSB process. In contrast, the innovations series is the deviation factor (white noise) of the basic GSB series in relation to the martingale means .
Further, we determine the conditional variance of the series from the equation:
and from here, one obtains:
For each , it also holds that:
where . It follows that the variance of martingale means , under the assumption can be expressed as:
From here, the variance of the basic series can be obtained as follows:
According to the previous equalities, the variances of the series and have non-constant values that depend on the point in time in which they are observed.
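The time dependence of these variances can be made explicit with a short worked derivation. It is only a sketch under stated assumptions: the recursion $m_t = m_{t-1} + q_{t-1}\varepsilon_{t-1}$ with a non-random starting value $\mu$, IID $\mathcal{N}(0,\sigma^2)$ innovations, and an indicator $q_j$ that depends only on $\varepsilon_{j-1}$ (hence is independent of $\varepsilon_j$). The symbols $a_c$ and $n_t$ are notation introduced here for illustration; $n_t$ is the number of terms accumulated in $m_t$, which depends on the start-up convention.

```latex
\begin{aligned}
m_t &= \mu + \sum_{j} q_j\,\varepsilon_j
      \quad (n_t \text{ accumulated terms}), \\
E(q_j \varepsilon_j) &= E(q_j)\,E(\varepsilon_j) = 0, \qquad
E(q_j^2 \varepsilon_j^2) = E(q_j)\,E(\varepsilon_j^2) = a_c\,\sigma^2, \\
E(q_i \varepsilon_i\, q_j \varepsilon_j) &= E(q_i \varepsilon_i\, q_j)\,E(\varepsilon_j) = 0
      \quad (i < j), \\
\mathrm{Var}(m_t) &= n_t\, a_c\, \sigma^2, \qquad
\mathrm{Var}(y_t) = \mathrm{Var}(m_t) + \sigma^2 = \sigma^2\,(1 + n_t\, a_c), \\
a_c &= P(\varepsilon_t^2 > c) = 2\left[1 - \Phi\!\left(\sqrt{c}/\sigma\right)\right].
\end{aligned}
```

Under these assumptions both variances grow linearly in the number of accumulated terms, which is the source of the non-stationarity noted above.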
Correlation functions of the series and can be obtained in a similar way. Note that for every , it holds that:
and it is easy to see that the covariance of the series satisfies:
From here, the correlation function of the martingale means is obtained:
Similarly, according to equalities:
the correlation function for , can be obtained as follows:
Therefore, both correlation functions depend on the time arguments and indicate the non-stationarity of the series and . This fact requires some more complex techniques to examine their properties. Moreover, note that when ,
Thus, the correlation functions of both series and satisfy the L2-continuity condition.
At the end of this section, we define a series of increments $(X_t)$ of the GSB process by the following equality:
$$X_t = y_t - y_{t-1}. \tag{3}$$
Almost all authors who have studied STOPBREAK processes highlight the importance of this sequence. This series, as can be easily seen from Equations (1) and (2), can be given in the following form:
$$X_t = \varepsilon_t - \theta_{t-1}\,\varepsilon_{t-1}, \tag{4}$$
where $\theta_t = 1 - q_t$. The series $(X_t)$ is named a Splitting Moving Average process (of order 1), shortened to Split-MA (1) process, because it operates in two regimes. Emphasized fluctuations of the innovations at the previous time moment imply $\theta_{t-1} = 0$, so the equality $X_t = \varepsilon_t$ holds. On the other hand, fluctuations that do not exceed the critical value $c$ give a representation of $X_t$ in the form of a standard, linear MA (1) process. In this way, $(X_t)$ has properties similar to those of the MA (1) models, which can be used in its investigation. Thus, under the earlier assumptions, the mean value and variance of this series, obtained by simple computation, are:
where Moreover, the covariance of this sequence is:
and obviously has an identical structure to the standard MA (1) series. Based on the obtained covariance, we can easily see that the series is stationary and that its correlation function can be written in the form:
Finally, according to Equations (3) and (4), it follows that:
which can be viewed as a non-linear Integrated Auto-Regressive Moving Average (ARIMA) model with “temporary” components . These imply the specific structure of the series , as well as other components of the GSB process.
Later in the paper, as we have already pointed out, we also discuss the application of the GSB model in describing the dynamics of infection and immunization of the population on the territory of the Republic of Serbia. As will be seen, this kind of dynamics has pronounced fluctuations that can be described by the non-stationary components of the GSB process, primarily by its main time series $(y_t)$. In that case, due to its stationarity, the Split-MA (1) process $(X_t)$ plays an important role. As an illustration, Figure 1 shows the realizations of all the above-mentioned series obtained by the Monte Carlo simulation of the GSB model.
Figure 1.
Dynamics of the basic series of the GSB model. (Parameter values are: and ).
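Realizations like those in Figure 1 can be generated with a short Monte Carlo sketch. The code below is a minimal illustration and not the authors' implementation. It assumes the recursion y_t = m_t + eps_t, m_t = m_{t-1} + q_{t-1} eps_{t-1}, with the indicator q_t = 1 when eps_{t-1}^2 > c, Gaussian innovations, and a start-up convention m_0 = mu, q_0 = 0 chosen here for convenience. The function name `simulate_gsb` and its default parameter values are hypothetical.

```python
import random


def simulate_gsb(T, mu=0.0, sigma=1.0, c=1.0, seed=None):
    """Simulate one path of the (assumed) GSB recursion for t = 0..T.

    Returns four lists of length T + 1: innovations eps, martingale
    means m, the basic series y, and the increments x (x[0] is set to 0).
    """
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, sigma) for _ in range(T + 1)]
    m = [mu] * (T + 1)          # assumed start: m_0 = mu
    y = [0.0] * (T + 1)
    x = [0.0] * (T + 1)
    y[0] = m[0] + eps[0]
    for t in range(1, T + 1):
        # q_{t-1} = 1 if eps_{t-2}^2 > c; assumed q_0 = 0 at start-up
        q_prev = 1 if (t >= 2 and eps[t - 2] ** 2 > c) else 0
        m[t] = m[t - 1] + q_prev * eps[t - 1]   # martingale-mean recursion
        y[t] = m[t] + eps[t]                    # additive decomposition
        x[t] = y[t] - y[t - 1]                  # increments (Split-MA(1))
    return eps, m, y, x
```

For example, `eps, m, y, x = simulate_gsb(300, mu=0.0, sigma=1.0, c=1.0, seed=42)` returns the four series. By construction, each increment satisfies `x[t] = eps[t] - theta * eps[t-1]` with `theta = 1 - q_{t-1}`, up to floating-point rounding.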
3. Stochastic Distribution and Asymptotic Properties of the GSB Process
In this section, some stochastic properties of the GSB process, regarding the distribution and asymptotic behavior of its basic stochastic components, are discussed in more detail. As explained in the previous section, the GSB model, given by Equations (1)–(4), contains four stochastic components: the basic series , innovations , the martingale means , and the series of increments . At the same time, series and represent the stationary components of the GSB process, where is “close” to the linear MA model. In general form, the stochastic structure of the series is described in [12], where the method of characteristic functions (CFs) was used. Following this approach, the basic stochastic properties of the series can be expressed by the following statement.
Theorem 1.
Let be the Split-MA (1) process defined by Equation (4). For arbitrary and , the cumulative distribution function (CDF) of this stochastic process is given by:
where and are CDFs of RVs and , respectively.
Proof.
For arbitrary let us denote the series of RVs . Since and are mutually independent RVs, it follows
Moreover, it is simply shown that holds for every , i.e., is a series of uncorrelated RVs. By applying conditional probabilities, the CDF of these RVs can be obtained as follows:
where is the CDF of the RV. Based on that, for the CF of the RVs, one obtains:
Here, and are the CFs of the RVs and , respectively. By substituting these CFs into the previous equality, we have:
whence, by applying Equation (4), it follows that the CF of RVs is:
According to the last equality and Lévy’s correspondence theorem (see, e.g., [33] (p. 181)), Equation (5) immediately follows, that is, the statement of the theorem is proved. □
Remark 1.
As shown in [12], the CDF of RVs can also be given in the following form:
where denotes the convolution of two (arbitrary) CDFs :
The equivalence of Equations (5a) and (5b) is directly obtained from the fact that CDF is neutral for the convolution operator, i.e.,
Finally, note that by differentiating Equation (5), one obtains the probability density function (PDF) of the series :
By a similar procedure as in the previous theorem and using the convolutions of CDFs, we describe the stochastic distribution of other components of the GSB process, i.e., the series and . As already shown in the previous section, these series represent non-stationary stochastic processes with a constant mean . Accordingly, the following statement is valid.
Theorem 2.
Let $(y_t)$ and $(m_t)$ be the time series defined by Equations (1) and (2), respectively, where . For arbitrary and , the CDFs of these series are as follows:
Here, and are the CDFs of the previously defined RVs and , respectively, and is the CDF of the RV . In addition, when , the following convergences (in distribution) are valid:
Proof.
For arbitrary , let us introduce a series of RVs . In the same way as in the proof of the previous theorem, it is shown that is a series of mutually uncorrelated RVs, with where By reapplying the conditional probabilities, the CDF of is obtained as follows:
According to this, their corresponding CF is obtained:
Applying Equation (2), we find that the CFs of the RVs are as follows:
where is CF of the RV Then, Equation (6) immediately follows from Equation (9) and Lévy’s correspondence theorem [33] (p. 181).
Similarly, by applying the previous Equations (1) and (9), the CFs of the RVs are obtained:
From here, by reapplying the theorem of Lévy, Equation (7) immediately follows.
To prove the second part of the theorem, i.e., Equation (8), note first that the CFs of the RVs and , when according to Equations (9) and (10), can be written as follows:
Here, is an infinitely small value of a higher order than when . Hence, for a fixed but arbitrary , we have:
and the convergences thus obtained confirm the asymptotic relations in Equation (8). □
Remark 2.
Note again that the proofs of the previous two theorems are based on determining the CFs of the corresponding time series of the GSB process. In this sense, the CFs of the uncorrelated series of RVs and play a fundamental role. The series and can be viewed as “new” innovations with “optional” non-zero values, which essentially describe the stochastic structure of the GSB process. Nevertheless, as the relation holds for each it is sufficient to consider only one of these two series of uncorrelated RVs (which is what was done in the statement of Theorem 2). Moreover, it can be easily shown that CDFs:
are continuous almost everywhere, with the only point of discontinuity where they have “jumps” of the values and , respectively (for more detail, see [34,35]). Therefore, the CDFs of the series and are mixtures of Gaussian and discrete-type distributions, usually named the Contaminated Gaussian Distribution (CGD). This is another important fact that prevents the application of some of the standard procedures in investigating the properties of the non-stationary series and .
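The discrete component of such a mixture can be checked numerically. The sketch below is an illustration under assumptions, not the paper's procedure: it takes the indicator to be q_t = 1 when eps_{t-1}^2 > c, so that the "switched" innovation q_t * eps_t is exactly zero with probability P(eps^2 <= c) and Gaussian otherwise. The function name is hypothetical.

```python
import random
from statistics import NormalDist


def zero_mass_of_switched_innovations(n, sigma=1.0, c=1.0, seed=0):
    """Compare the empirical and theoretical point mass at zero of q_t * eps_t."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, sigma) for _ in range(n + 1)]
    # q_t * eps_t equals eps_t when eps_{t-1}^2 > c and is exactly 0 otherwise
    switched = [eps[t] if eps[t - 1] ** 2 > c else 0.0 for t in range(1, n + 1)]
    empirical = sum(1 for v in switched if v == 0.0) / n
    # P(eps^2 <= c) = 2 * Phi(sqrt(c) / sigma) - 1 for Gaussian innovations
    theoretical = 2.0 * NormalDist().cdf(c ** 0.5 / sigma) - 1.0
    return empirical, theoretical
```

The empirical share of exact zeros approximates the size of the jump of the corresponding CDF at the origin; the remaining mass is spread according to the Gaussian law.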
On the other hand, Equation (8) shows that even non-stationary time series and can generate series and that converge toward a normal distribution when . Moreover, based on the properties of the non-stationary components of the GSB process described in Section 2, the time series has a constant variance . These facts will be of importance in the practical application of the GSB process and can be readily observed based on the convergence of the corresponding CFs and . As an illustration, Figure 2 shows convergences of the modulus of these CFs, for different time indices .
Figure 2.
Graphs of the convergence of modulus of the characteristic functions and , when . (Parameter values are: ).
At the end of this section, we additionally describe some more asymptotic properties of series obtained by transformations of non-stationary time series and . They also refer to the possibility of finding their asymptotically normal (AN) distributions, which can be shown by the following statement:
Theorem 3.
For arbitrary and time series $(y_t)$ and $(m_t)$, given by Equations (1) and (2), respectively, let us define the so-called α-mean series:
Then the following statements hold:
- (i) When , the time series and have an asymptotically normal distribution, i.e., the following relations, when , are valid:
- (ii) When , the time series and asymptotically vanish, i.e.,
Proof.
We show the statement of the theorem first for the time series . Based on the definition of time series , i.e., Equation (2), one obtains:
Thus, the series is represented as a sum of uncorrelated RVs , . By applying the well-known properties of the CFs, as well as the expressions for the CF of the series , the CFs of are as follows:
Taking the logarithm of the function gives a function:
where After some computation, we find that, when
Thus, the functions have local maxima at the point Using a similar procedure as in [34], that is, by Laplace approximation of functions at , one obtains:
Then, by taking the asymptotic value in the last expression, when , it follows:
Substituting this expression into the CFs , it is easy to conclude that the first part of the theorem, in the sense of the series , is valid.
The proof for the series is carried out analogously. Using Equation (1), as well as the previously proven facts, we have that
Since RVs , , are mutually independent, after some computation, we obtain the CFs of series as follows:
From here, using the same procedure as in the previous part of the proof, i.e., by taking the logarithm of the function , and by developing at the point , we have:
Finally, taking the asymptotic values, when , one obtains:
Substituting this expression into CFs , the entire statement of the theorem is proved. □
Remark 3.
In the previous theorem, the case is particularly interesting because Equation (11) then gives the following convergences:
We will call these convergences, in the usual way, central limit theorems (CLTs) for the GSB process. As will be seen below, they will be helpful for estimating the unknown parameters of the GSB process, primarily the conditional variance .
4. Parameter Estimation Procedures
Now, let us consider the problem of estimating the (unknown) parameters of the GSB process: the critical value , the mean value , and the conditional variance $\sigma^2$. To estimate the first parameter , a series of increments will be used as the (only) observable and stationary component of the GSB model. Recall that we have named this series the Split-MA (1) process because it is close to standard, linear MA models. Although some of the estimation procedures we present here resemble standard estimation methods for MA models (see, for instance, [36]), the specificity of the Split-MA (1) model requires additional testing and analysis, primarily of the quality of the obtained estimates. To that end, the consistency and asymptotic normality of the estimators were examined. After that, several new approaches were considered, based on the observation of non-stationary time series . These procedures aim to obtain the estimated values of the parameters and .
4.1. Estimates of Critical Value (c)
Let be the Split-MA (1) process defined by Equation (4). As we have already shown, the first correlation coefficient of this series is:
From here, solving for , we get the estimated value of this parameter:
where:
is the estimated value of the first correlation. Based on the estimate , the corresponding estimate of the critical value can be determined as a solution to the equation:
According to Equation (14), it is easy to see that and are appropriate estimates if the following inequalities hold:
In [9], it was shown that the estimators thus obtained are strongly consistent if the innovations have a continuous distribution. Moreover, the estimates and will also be asymptotically normal (AN) if the RVs have a symmetric distribution. Note that both conditions are fulfilled in the case of Gaussian innovations , when the RVs $\varepsilon_t^2/\sigma^2$ have a $\chi^2_1$ distribution. Thus, the estimate of the critical value is simply found from the equality:
Here, is the estimated variance of innovations which will be described later.
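The moment-type estimation described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's exact formulas: it assumes that the first correlation of the increments equals -b/(1 + b), where b = P(eps^2 <= c), that the innovations are Gaussian (so that eps^2/sigma^2 is chi-square with one degree of freedom), and that the sample correlation is the usual mean-centered one. The function names and the returned tuple are hypothetical.

```python
from statistics import NormalDist


def lag1_autocorr(x):
    """Usual mean-centered sample autocorrelation at lag 1."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def moment_estimates(x, sigma2):
    """Return (rho_hat, b_hat, c_hat), or None if rho_hat is not admissible.

    Assumes rho(1) = -b / (1 + b), so b_hat = -rho_hat / (1 + rho_hat),
    which lies in (0, 1) only when -1/2 < rho_hat < 0.
    """
    rho = lag1_autocorr(x)
    if not (-0.5 < rho < 0.0):
        return None
    b = -rho / (1.0 + rho)
    # chi-square(1) quantile via the normal quantile:
    # P(Z^2 <= q) = b  <=>  q = (Phi^{-1}((1 + b) / 2))^2
    z = NormalDist().inv_cdf((1.0 + b) / 2.0)
    c = sigma2 * z * z
    return rho, b, c
```

The admissibility check mirrors the requirement that the estimated probability lie strictly between 0 and 1; when it fails, no estimate is returned instead of a value outside the parameter space.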
However, it can be shown that, as for the linear MA series, the estimate is not the most efficient estimate for (asymptotic efficiency of the estimate is analyzed at the end of this subsection). To obtain more efficient estimates of the given parameters, we will modify the well-known Gauss-Newton method of estimating the parameters of nonlinear functions (see, for instance [36]). First, notice that Equation (4) can be written in the form:
or, in functional form,
On the other hand, if we define a series of RVs as
then it is easy to see that the RVs are adapted, for each , and thus independent of and . According to mentioned properties of RVs and , it follows that is a stationary and ergodic series of RVs (see, for more detail [37]) with and correlation function To this series, using the procedure described in [38], we add the so-called residual series:
The RVs are also adapted and mutually non-correlated, which can easily be shown. Namely, by applying Equations (16)–(18), for any integer one obtains:
Thus, Equation (18) defines the series as a linear autoregressive (AR) process with innovations . From here, we obtain another estimate of the unknown parameter by the following algorithmic procedure:
- (1) Applying Equation (14), determine as (the initial) estimate of , and according to Equation (15), determine estimate .
- (2) Based on Equations (16)–(18) and having obtained an estimate , compute, for each , the values: where , .
- (3) Using the standard regression procedure, i.e., the correlation function when , obtain an estimate of in the form:
- (4) As in the first step, based on the estimate , the critical value can be estimated as a solution of the equation (concerning ):
We emphasize that in [9], the strong consistency and AN of the estimates and , as well as and , were proved. At the same time, the distribution of innovations was not explicitly used there. In the case of the GSB process, where innovations are Gaussian distributed, we can express these results as follows:
Theorem 4.
Estimates and are strongly consistent for the parameter , i.e.,
Moreover, the estimates and are asymptotically normal for , i.e.,
where and
Remark 4.
Based on the previous theorem, the consistency and AN of the estimates and as continuous functions of and , is also valid (see, for instance [9] or [39] p. 24). Additionally, for any , the inequality holds when the equality is valid only for , as can be seen in Figure 3. This means that asymptotic variance , as a measure of “scattering” from the true value , is (significantly) smaller than . So, is a more efficient estimate than , which justifies its introduction.
Figure 3.
Graphs of the asymptotic variances of the estimates (dashed line) and (solid line), depending on .
4.2. Estimates of Mean
As an estimator for the parameter , the sample mean of series was usually used:
This estimator is obviously unbiased , but its variance is not bounded. Namely, using the previously defined -mean series when , we can represent the estimator as a sum of uncorrelated RVs:
Thus, for the variance of we get:
Note that, as expected, the variance is asymptotically identical to that in Theorem 3, i.e., as in Equation (11), when . Moreover, when , that is, in the case of extremely large values of the parameter . However, in practical applications, this condition is usually not met.
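The unbounded variance of the sample mean can be illustrated by simulation. The sketch below is a rough Monte Carlo check, not the paper's computation: it assumes the recursion y_t = m_t + eps_t, m_t = m_{t-1} + q_{t-1} eps_{t-1} with q_t = 1 when eps_{t-1}^2 > c, a start m_0 = mu with q_0 = 0, and it averages y_1, ..., y_T on each replication. The function name and defaults are hypothetical.

```python
import random


def sample_mean_variance(T, reps=400, mu=0.0, sigma=1.0, c=1.0, seed=0):
    """Monte Carlo variance of the sample mean of y_1..y_T over `reps` paths."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        m = mu                             # assumed start m_0 = mu
        eps_prev2 = None                   # eps_{t-2}, undefined at start-up
        eps_prev = rng.gauss(0.0, sigma)   # eps_0
        total = 0.0
        for _t in range(1, T + 1):
            # q_{t-1} = 1 if eps_{t-2}^2 > c (q_0 = 0 by assumption)
            q_prev = 1 if (eps_prev2 is not None and eps_prev2 ** 2 > c) else 0
            m += q_prev * eps_prev
            eps = rng.gauss(0.0, sigma)
            total += m + eps               # accumulate y_t = m_t + eps_t
            eps_prev2, eps_prev = eps_prev, eps
        means.append(total / T)
    grand = sum(means) / reps
    return sum((v - grand) ** 2 for v in means) / (reps - 1)
```

Comparing, say, `sample_mean_variance(50)` with `sample_mean_variance(400)` shows the variance of the sample mean increasing with the series length instead of shrinking, which is the behavior that motivates looking for an alternative estimator.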
An alternative way to obtain an estimate for is to take the sample mean of the mean series , when , i.e.,
Here, and are the harmonic numbers, with assumption . Obviously, is also an unbiased estimate of the parameter , but with weights that are more pronounced at the “older” points of time in which realizations of the series are observed. This is consistent with the fact that the covariances of RVs depend on these “older” time indices. Moreover, as shown in Section 2, at these time points, the covariances of RVs are equal to their variances. For these reasons, it is expected that the estimate will be more efficient than . Indeed, using a similar procedure as before, we first represent the estimate as a sum of uncorrelated RVs:
As for each , the statement below holds:
it follows that it can also be written:
where . Thus, after some computation, the variance of is obtained as:
Notice that the variance of is also unbounded, but with a lower asymptotic order than , since:
This means that the estimate is (asymptotically) more efficient than , which can be seen in Figure 4. Here are shown 3D plots of both variances and , which were observed as functions of two variables and .
Figure 4.
Variances shown as 3D plots of the estimate (a) and estimate (b), depending on and . (The variance of innovations is ).
4.3. Estimates of Variance
Let us consider determining the estimates of the third unknown parameter , which represents the variance of the innovations , that is, the conditional variance of the base series . It is precisely these facts that enable different estimation procedures for the parameter . First, notice that based on the previously obtained estimates and , i.e., the modeled innovation values given by Equation (16), the variance can be easily estimated. The usual estimation procedure is based on sampling variance:
Here, are modeled innovation values obtained from the estimates and , respectively. Notice that in the case of Gaussian innovations , the estimates given by Equation (21) are identical to the maximum likelihood estimators. Indeed, the log-likelihood function then reads as follows:
and by solving the equation , the estimate of is obtained as in Equation (21), that is, as the sample variance of the series . Thus, the consistency and AN of both estimates and can be readily shown. We note that due to their equivalence, only the estimate will be further considered (see Theorem 5 below).
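The maximum likelihood step can be written out as a short worked derivation. It is a sketch of the standard zero-mean Gaussian case, assuming the modeled innovations $\hat{\varepsilon}_1, \dots, \hat{\varepsilon}_T$ are treated as an IID $\mathcal{N}(0, \sigma^2)$ sample; the symbols $T$ and $\hat{\varepsilon}_t$ are notation introduced here.

```latex
\begin{aligned}
\ell(\sigma^2) &= -\frac{T}{2}\,\ln\!\left(2\pi\sigma^2\right)
                  - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\hat{\varepsilon}_t^{\,2}, \\
\frac{\partial \ell}{\partial \sigma^2}
               &= -\frac{T}{2\sigma^2}
                  + \frac{1}{2\sigma^4}\sum_{t=1}^{T}\hat{\varepsilon}_t^{\,2} = 0
\quad\Longrightarrow\quad
\hat{\sigma}^2 = \frac{1}{T}\sum_{t=1}^{T}\hat{\varepsilon}_t^{\,2}.
\end{aligned}
```

The stationary point is the second sample moment of the modeled innovations, which is why the likelihood-based and sample-variance estimates coincide in this setting.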
On the other hand, note that the previous estimation procedure is based on unobservable, modeled values of innovations . Another approach to estimating the variance is based on the so-called two-stage procedure, using the previously estimated parameter . By applying the equality , as well as the sample variance of the series , we can obtain an estimate:
Then, it follows:
Theorem 5.
Estimates and are strongly consistent for the parameter , i.e.,
Moreover, the estimates and are asymptotically normal for , i.e.,
where and
Proof.
Since is an IID series of RVs, the stationarity and ergodicity of this series are apparent. Applying the strong law of large numbers (SLLN), it follows:
Furthermore, it can easily be shown that is the variance of the estimate . Thus, applying the central limit theorem (CLT), the first convergence in Equation (23) is obtained.
To prove the properties of the estimate , we note that is also a stationary and ergodic series of RVs. If the SLLN is now applied to the following statistics:
then one obtains:
At the same time, according to Theorem 4, we have that is a strongly consistent estimator of i.e., when Thus, the last two convergences give:
To prove the AN of the estimate , note first that the sequence is 1-dependent, in the sense of Definition 6.3.1 in [36] (p. 245). According to the Cauchy-Schwarz and Minkowski inequalities, applied to Equation (4), i.e., the sixth moment of the sum , it follows that:
Then, the Hoeffding-Robbins theorem [40] can be applied, based on which it follows:
for which:
By applying the almost sure convergence of the estimate and the previously obtained convergence in Equation (25), we have
where . Thus, according to Theorem 4, the second convergence in Equation (23) is obtained. □
Remark 5.
As in Theorem 4, by comparing the asymptotic variances and for the estimates and , respectively, it is easy to see that inequality holds. At the same time, the equality is valid only when (Figure 5a), so the estimator is more efficient than .
Figure 5.
(a) Graphs of the asymptotic variances of the estimates (dashed line) and (solid line), depending on . (b) Plot in 3D of the variance of statistics , depending on and . (The variance of the innovations is ).
However, according to the proof of the previous theorem, it can be easily seen that the following holds for the variance of the statistic , given by Equation (24) (Figure 5b):
Thus, can be used as an estimator of the “hybrid” parameter , which will be of interest for practical research, that is, the application of the GSB model discussed below.
Finally, another approach to finding estimates of the variance is based on the observations of the non-stationary series . Applying Theorem 3, i.e., the previously proven convergence in Equation (13), we have:
If we now consider the statistics:
after some computation, one obtains:
Thus, is an asymptotically unbiased estimator for , and using the estimate , an estimator of the parameter can be taken as:
5. Numerical Simulation and Application of the GSB Process
As already mentioned in the introductory section, two important aspects related to the practical implementation of the GSB process will be explored here. Firstly, numerical Monte Carlo simulations of previously obtained GSB estimators are analyzed. Then, based on actual data, the GSB process was applied to analyze the dynamics and distribution of the infected and immunized population with respect to COVID-19 disease in the territory of the Republic of Serbia.
5.1. Numerical Simulations of GSB Estimators
We first describe a pseudo-algorithm for estimating the parameters of the GSB model based on independent Monte Carlo replications of the GSB series. To that end, we assume that all series have size , which is close to the length of the actual series to be considered below. The primary aim is to examine the convergence, i.e., the quality of the previously proposed estimators on a sample of a given length. Therefore, corresponding estimation errors will also be investigated for this purpose. Using the previously presented theoretical facts, the pseudo-algorithm for estimating the parameters of the GSB process can be formulated as follows:
- In the first estimation step, compute the sample correlation for a series of increments . If the condition is fulfilled, by using Equation (14), the estimator can be obtained.
- Compute statistics , given by Equation (24), as an estimate of the “hybrid” parameter The following variance estimator is then obtained:
- According to Equation (15) and previously obtained estimates and , compute the estimator .
- By using the estimate , for each , generate the (modeled) values of series and , by applying the iterative procedure:where and is given by Equation (20).
- According to previously obtained series , and by using Equation (21), compute a (more efficient) variance estimator
- By applying the Gauss-Newton procedure, i.e., Equations (16)–(18), the estimate can be obtained.
- According to previously obtained estimates and , compute the estimator .
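The second, fourth and fifth steps of this pseudo-algorithm can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: it takes the "hybrid" parameter to be sigma^2 (1 + b), estimated by the second sample moment of the zero-mean increments; it rebuilds the modeled series with the recursion m_t = m_{t-1} + q_{t-1} eps_{t-1}, eps_t = y_t - m_t, q_t = 1 when eps_{t-1}^2 > c, started at m_0 = mu_hat with q_0 = 0; and it expects the estimates `b_hat`, `c_hat` and `mu_hat` to be supplied by the other steps. All function names are hypothetical.

```python
def two_stage_variance(x, b_hat):
    """Variance estimate from the assumed relation Var(X_t) = sigma^2 * (1 + b)."""
    s2 = sum(v * v for v in x) / len(x)   # second sample moment of the increments
    return s2 / (1.0 + b_hat)


def reconstruct_innovations(y, mu_hat, c_hat):
    """Rebuild modeled martingale means and innovations from y_0..y_T."""
    m_hat = [mu_hat]                      # assumed start m_0 = mu_hat
    eps_hat = [y[0] - mu_hat]
    for t in range(1, len(y)):
        # q_{t-1} = 1 if eps_{t-2}^2 > c_hat; q_0 = 0 by assumption
        q_prev = 1 if (t >= 2 and eps_hat[t - 2] ** 2 > c_hat) else 0
        m_hat.append(m_hat[t - 1] + q_prev * eps_hat[t - 1])
        eps_hat.append(y[t] - m_hat[t])
    return m_hat, eps_hat


def refined_variance(eps_hat):
    """Second sample moment of the modeled (zero-mean) innovations."""
    return sum(e * e for e in eps_hat) / len(eps_hat)
```

Keeping the reconstruction separate from the variance formulas makes it easy to rerun the later steps once a more efficient estimate of the critical value is available.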
We point out that in the above-mentioned pseudo-algorithm, the second step can be replaced by the following alternative step:
2’. Compute statistics , given by Equation (26), and estimate the “hybrid” parameter . Then, according to Equation (27), the variance can be estimated as:
where .
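The first estimation step and the Monte Carlo replication scheme can be illustrated by a minimal sketch. It is not the authors' implementation: it assumes the standard Split-BREAK recursion y_t = m_t + eps_t, m_t = m_{t-1} + q_{t-1} eps_{t-1}, q_t = 1 if eps_{t-1}^2 > c (else 0), with Gaussian innovations, and it uses the moment relation rho(1) = -b/(1 + b), where b = P(eps^2 <= c), which follows from that assumed recursion and may differ in form from Equation (14). All function names and the values of T, R, c and sigma are hypothetical.

```python
import math
import random


def simulate_gsb(T, c, sigma=1.0, seed=None):
    """Simulate T values of (y_t, m_t, eps_t) under the ASSUMED recursion
    y_t = m_t + eps_t, m_t = m_{t-1} + q_{t-1} * eps_{t-1},
    q_t = 1 if eps_{t-1}^2 > c else 0, with N(0, sigma^2) innovations."""
    rng = random.Random(seed)
    eps_prev2 = rng.gauss(0.0, sigma)  # pre-sample value eps_{t-2}
    eps_prev = rng.gauss(0.0, sigma)   # pre-sample value eps_{t-1}
    m_prev = 0.0                       # arbitrary starting level
    y, m, eps = [], [], []
    for _ in range(T):
        q_prev = 1.0 if eps_prev2 ** 2 > c else 0.0  # indicator q_{t-1}
        m_t = m_prev + q_prev * eps_prev
        e_t = rng.gauss(0.0, sigma)
        y.append(m_t + e_t)
        m.append(m_t)
        eps.append(e_t)
        eps_prev2, eps_prev, m_prev = eps_prev, e_t, m_t
    return y, m, eps


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i - 1] - mean) for i in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def estimate_b(increments):
    """Moment estimate of b = P(eps^2 <= c) from rho(1) = -b / (1 + b).
    Returns NaN when the admissibility condition -1/2 < rho < 0 fails."""
    rho = lag1_autocorr(increments)
    if not (-0.5 < rho < 0.0):
        return float("nan")
    return -rho / (1.0 + rho)


def true_b(c, sigma=1.0):
    """Exact b = P(eps^2 <= c) for Gaussian innovations."""
    return math.erf(math.sqrt(c) / (sigma * math.sqrt(2.0)))


def monte_carlo(R, T, c, sigma=1.0, seed=0):
    """Mean, minimum, maximum and mean squared error of the b-estimates
    over R independent replications (inadmissible replications dropped)."""
    estimates = []
    for r in range(R):
        y, _, _ = simulate_gsb(T, c, sigma, seed=seed + r)
        x = [y[t] - y[t - 1] for t in range(1, T)]
        b_hat = estimate_b(x)
        if not math.isnan(b_hat):
            estimates.append(b_hat)
    mean = sum(estimates) / len(estimates)
    mse = sum((b - true_b(c, sigma)) ** 2 for b in estimates) / len(estimates)
    return mean, min(estimates), max(estimates), mse
```

A call such as `monte_carlo(R=50, T=2000, c=1.0)` returns summary values of the same kind as those reported for each estimator (Mean, Min., Max., MSEE); the remaining steps of the pseudo-algorithm depend on Equations (15)-(27) and are not reproduced here.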
By applying this pseudo-algorithm, the obtained values of the estimated parameters can be summarized as shown in Table 1, where their average values (Mean), minimums (Min.), and maximums (Max.) can also be seen, along with the corresponding mean squared errors of estimation (MSEE) given in parentheses. Furthermore, testing results concerning the AN of the estimates thus obtained are also presented in Table 1. To that end, the Anderson-Darling and Cramér-von Mises normality tests were used. Their test statistics (denoted as AD and W, respectively), as well as their corresponding p-values, were calculated using procedures from the R-package “nortest” [41].
Table 1.
Summary statistics of estimated parameters of the GSB process, obtained by a Monte Carlo study, along with realized statistics of normality tests.
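The two normality statistics mentioned above can be computed with a short stdlib-only sketch. It uses the textbook formulas for the Anderson-Darling and Cramér-von Mises statistics with the mean and standard deviation estimated from the sample; it is an illustration rather than a reimplementation of the “nortest” procedures, and it deliberately omits the p-value approximations. The function name `normality_statistics` is hypothetical.

```python
import math


def _std_normal_cdf(z):
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def _std_normal_sf(z):
    """Standard normal survival function 1 - CDF."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def normality_statistics(sample):
    """Return (AD, W): the Anderson-Darling and Cramer-von Mises statistics
    for the composite hypothesis of normality (mean and sd estimated)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    z = sorted((v - mean) / sd for v in sample)  # standardized order statistics
    ad_sum = 0.0
    cvm_sum = 0.0
    for i in range(1, n + 1):
        cdf_i = _std_normal_cdf(z[i - 1])
        # AD pairs the i-th smallest value with the i-th largest one
        ad_sum += (2 * i - 1) * (math.log(cdf_i) + math.log(_std_normal_sf(z[n - i])))
        cvm_sum += (cdf_i - (2 * i - 1) / (2.0 * n)) ** 2
    ad = -n - ad_sum / n
    w = 1.0 / (12.0 * n) + cvm_sum
    return ad, w
```

Applied to the replicated values of one estimator, small values of both statistics are consistent with the AN property, while large values speak against it.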
According to the values in Table 1, it is evident that most estimators exhibit the AN property. This applies even to the estimates of the mean value and , which are obtained from realizations of the non-stationary GSB series . As already explained, this is related to Theorems 2 and 3, which respectively describe the AN properties of the series and the so-called α-means series. Notice that the asymptotic variance of these estimators is not bounded, hence the large range of their observed values. On the other hand, the AN property is less pronounced in the case where the critical value is estimated. This is because both estimates and are obtained by a three-step procedure: estimates for the parameters and should first be determined, and only then for . In the case of the variance estimators and , obtained from the modeled innovations , it is easy to see that they have the highest and almost the same efficiency. Furthermore, the values of the estimator are only slightly “weaker” than those of and . This is expected since, according to Theorem 5, the AN property holds for all these variance estimators. However, the estimate is by far the weakest variance estimate and can be omitted from further analysis. Moreover, based on previously obtained theoretical results, also confirmed through simulations, the most robust estimates of the unknown parameters , are , respectively. For those reasons, these estimators will be used for GSB modeling of actual data on COVID-19, which will be discussed below.
5.2. Application of the GSB Process: A Case Study of COVID-19 Dynamics
In this section, we illustrate a practical application of the GSB process in the stochastic modeling of actual data. As mentioned in the introductory section, we show that it can be an adequate stochastic model for describing the dynamics of the infected and vaccinated populations in relation to the SARS-CoV-2 virus in the territory of the Republic of Serbia. To that end, we observe realizations of two time series and which represent, on a daily basis, the total number of infected persons and of persons vaccinated with the first dose of the vaccine, respectively, starting from 24 December 2020 (the start date of vaccination in Serbia) and ending with 6 June 2022. The dynamics of both time series, of length , are shown in Figure 6.
Figure 6.
Dynamics of the total infected (a) and vaccinated population (b) in relation to the SARS-CoV-2 virus in the territory of the Republic of Serbia.
The main statistical indicators of these series (also labeled as Series A and Series B, respectively) are shown in Table 2. Based on these values, it can be concluded that these are time series with distinct, pronounced fluctuations. For instance, the average number of infected people is (approximately) 3650 per day, ranging from 60 to 19,901 infected people. Similarly, the average number of vaccinated persons is 6348 per day, but the daily number varies from only 4 to as many as 68,678 persons. Therefore, we further consider the possibility that the GSB process can be used here as an appropriate stochastic model. For this purpose, as basic sequences, we observe the realizations of the so-called log-volumes, i.e., logarithmic values of series and :
Notice that the main goal of this transformation is to obtain more evenly distributed values of both series; since the logarithmic function is increasing, the pronounced fluctuations are nevertheless preserved. Additionally, the inequalities imply the non-negativity of both log-volume series .
Table 2.
Basic statistical indicators of observed actual series.
Further, using the log-volumes as a basic series, and using Equation (3), the series of increments , are determined entirely. Based on them, the estimates of GSB process parameters can be obtained by applying the pseudo-algorithm presented above. We emphasize that here the estimation procedure is repeated twice, i.e., for both series (A and B). Thus, modeled values of martingale means and innovations series, generated by Equation (29), are as follows:
where . As initial values of the iterative procedure (30), as before, we have taken as well as . Table 3 contains the basic statistical indicators of the actual series, i.e., the log-volumes and increments , as well as of the modeled series, i.e., the martingale means and innovations .
Table 3.
Basic statistical indicators of actual and modeled series.
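The construction of the log-volumes and the iterative reconstruction of the modeled series described above can be sketched as follows. The recursion is the assumed Split-BREAK form m_t = m_{t-1} + q_{t-1} eps_{t-1}, eps_t = y_t - m_t, q_t = 1 if eps_{t-1}^2 > c (else 0), and the initial values m_0 = y_0, eps_0 = 0, q_0 = 0 are assumptions made only for this illustration; they need not coincide with those used in the iterative procedure (30). The function names are hypothetical.

```python
import math


def log_volumes(counts):
    """Natural logarithms of daily counts; counts >= 1 give values >= 0."""
    if any(v < 1 for v in counts):
        raise ValueError("counts must be >= 1 so that log-volumes are non-negative")
    return [math.log(v) for v in counts]


def reconstruct_components(y, c):
    """Recover modeled martingale means m_t and innovations eps_t from a
    series y_t and a critical value c, under the assumed recursion
    m_t = m_{t-1} + q_{t-1} * eps_{t-1},  eps_t = y_t - m_t,
    q_t = 1 if eps_{t-1}^2 > c else 0,
    started (by assumption) at m_0 = y_0, eps_0 = 0, q_0 = 0."""
    m = [y[0]]
    eps = [0.0]
    q = [0.0]
    for t in range(1, len(y)):
        m_t = m[t - 1] + q[t - 1] * eps[t - 1]
        e_t = y[t] - m_t
        q_t = 1.0 if eps[t - 1] ** 2 > c else 0.0
        m.append(m_t)
        eps.append(e_t)
        q.append(q_t)
    return m, eps
```

With c = 0 the reconstructed innovations coincide with the increments after the start-up values, whereas a very large c keeps the martingale mean at its initial level; both limiting cases are convenient checks of an implementation.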
By analyzing the values in Table 3, an interesting connection can be observed, which can be explained by the previous theoretical results. Firstly, the average values of the log-volumes are “close” to the averages of the martingale means, which is in accordance with the equality . Moreover, for Series A, the other statistical indicators (standard deviations, for instance) are also almost equal. This can also be seen by comparing the corresponding statistical indicators of increments and innovations , which will be explained below. Table 4 shows the above-mentioned estimators obtained according to the previously described procedures. In addition, some other estimates are shown, such as the sample linear correlation and estimates of the value . Accordingly, note that the condition is fulfilled for both series. Moreover, let us notice, for instance, that the estimated values for in the case of Series B are “close” to unity, so it can be assumed that innovations in this case have a standard normal distribution.
Table 4.
Estimated values of GSB process parameters.
As we have already pointed out, the most robust estimators of the GSB process are and based on them, modeled values of the series () and () were obtained. Let us recall that these series, respectively, represent the stability and the impact of fluctuations in the dynamics of the total number of infected and vaccinated people. The agreement between the modeled series and the actual data can be seen in Figure 7a where, along with the empirical values of the log-volumes (), modeled values of martingale means () are given. On the other hand, the agreement of a series of increments, i.e., the Split-MA(1) process () with innovations () is shown in Figure 7b.
Figure 7.
Graphs of empirical and modeled data: (a) log-volumes (solid lines) and martingale means (dashed lines); (b) Split-MA(1) process (solid lines) and innovations series (dashed lines). The upper panels represent the dynamics of the COVID-19 infection (Series A), and the lower panels represent the dynamics of the vaccinated population (Series B).
It should also be noted that the high agreement between the actual and modeled series is particularly noticeable in the case of series A. This can be explained theoretically, in the way it was done in Section 2. If at some points in time, innovations () have a pronounced fluctuation, they become equal to increments () at the next moment. The agreement between the realizations of these two series will be all the better if, in addition to large and pronounced fluctuations of (), the critical value is relatively small. Note that this is precisely the case with series A, where “small” estimated values of the parameter indicate the possibility that the true value of this parameter is (or, equivalently, ). If the sample size is large enough, this assumption can be formally tested by the null hypothesis or, equivalently, . According to Theorem 4, testing procedures can be based on the normal distribution, that is, using some standard, well-known statistical tests.
Note that in that case, the series of increments () is equalized with innovations (). This implies that () is a series with independent increments, i.e.,
According to Equation (1), it follows that , so all “information from the past” is contained in the previous realization of the series (). In that way, the entire statistical analysis of this series, i.e., the dynamics of the infected population, gains simplicity; namely, series A then has (only) two stochastic components () and (), i.e., it represents a random walk series.
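The reduction to a random walk can be written out explicitly. The following is a sketch in assumed notation (the standard Split-BREAK recursion with indicator q_t, increments X_t, and i.i.d. Gaussian innovations with variance σ²), which may differ in indexing from Equations (1)-(3):

```latex
% Assumed model: y_t = m_t + \varepsilon_t,
%                m_t = m_{t-1} + q_{t-1}\varepsilon_{t-1},
%                q_t = I(\varepsilon_{t-1}^2 > c).
\begin{align*}
c = 0 \;&\Longrightarrow\; q_t = I(\varepsilon_{t-1}^2 > 0) = 1 \quad \text{(almost surely)}, \\
m_t &= m_{t-1} + \varepsilon_{t-1} = y_{t-1}, \\
X_t &= y_t - y_{t-1} = \varepsilon_t, \\
y_t &= y_{t-1} + \varepsilon_t = y_0 + \sum_{j=1}^{t} \varepsilon_j .
\end{align*}
```

Under these assumptions, and for a fixed initial value y_0, the series at time t is normally distributed with mean y_0 and variance tσ², so its distribution depends on time even though the increments are stationary.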
Finally, using the inverse transformations of those given in Equation (29), PDFs of actual series and are readily obtained:
Here, are the PDFs of log-volumes (), obtained by differentiating the CDFs given by Equation (9), which can be done simply. Still, due to the non-stationarity of the mentioned series, which also depends on time, it is necessary to apply some numerical procedures to calculate their PDFs. For this purpose, the R-package “distr” [42] has been used, and the results of the applied procedure are shown in Figure 8.
Figure 8.
Empirical distributions of actual data (histograms) and their fitted PDFs (lines), obtained by the proposed estimation procedure: (a) distribution of the infected population (Series A); (b) distribution of the vaccinated population (Series B).
Figure 8 shows the empirical distributions, i.e., histograms of the number of infected and vaccinated persons per day, together with their fitted PDFs, obtained using Equations (32). Due to the non-stationarity of the time series and , and to allow a comparison of the theoretical PDFs, fitting was also performed for the PDFs and of length (shown with dashed lines in Figure 8). In the case of the infected population (Series A), according to Equation (31) and the condition c ≈ 0, it follows that the RVs have an (approximately) normal distribution. Thus, the RVs will have an (approximately) log-normal distribution, shown with the solid line in Figure 8a. Note that this result is close to that obtained in [29]. In contrast, the distribution of the vaccinated population (Series B), shown with the solid line in Figure 8b, has a more pronounced “peak” close to the origin. This can also be explained by previous theoretical results, primarily given in Theorem 2, i.e., by Equation (8), which concerns the asymptotic behavior of the main GSB series .
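The passage from the PDF of a log-volume to the PDF of the original count is a change of variables: if y = ln A has density f_y, then A has density f_A(x) = f_y(ln x) / x for x > 0. The sketch below illustrates this for the Gaussian case, which yields the log-normal density. It is a generic illustration, not the “distr”-based procedure used for Figure 8, and the parameter values mu and sigma are hypothetical.

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def pdf_of_exp(x, log_pdf):
    """Density at x of A = exp(y), given the density log_pdf of y = ln A.
    Change of variables: f_A(x) = f_y(ln x) / x for x > 0, else 0."""
    if x <= 0.0:
        return 0.0
    return log_pdf(math.log(x)) / x


def trapezoid(f, a, b, steps):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * h)
    return total * h


# Hypothetical parameters of the (approximately normal) log-volume
mu, sigma = 0.0, 0.5


def lognormal_pdf(x):
    """PDF of exp(y) for y ~ N(mu, sigma^2), via the change of variables."""
    return pdf_of_exp(x, lambda y: normal_pdf(y, mu, sigma))
```

For a non-Gaussian or time-dependent log-volume density, only the `log_pdf` argument changes; the transformation step itself stays the same.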
6. Conclusions
The stochastic analysis of the GSB process presented in this paper confirms its suitability for modeling actual time series with pronounced fluctuations. The applied methods of dynamic and statistical analysis, based on this process, aim to describe the long-term tendency of the SARS-CoV-2 virus behavior, as well as of the immunization process. Along with other contemporary research, we hope this work can contribute to the further development of successful methods for overcoming the pandemic. To this end, notice that new strains of the SARS-CoV-2 virus, which are very common, can affect the overall symptoms as well as the disease dynamics of COVID-19 (see, e.g., [43,44,45]). They may therefore change the dynamics of both time series investigated here, which may be a goal and motivation for future research.
Finally, let us emphasize that one of the main stochastic advantages of the GSB model is that it allows the simultaneous use of both stationary and non-stationary components. Thereby, the asymptotic behavior of the GSB time series, as well as of the corresponding estimates, is of particular importance. It should also be noted that the proposed parameter estimation procedure can be implemented algorithmically in a relatively simple way. Additionally, some other estimation methods, such as the Empirical Characteristic Function (ECF) method described in [12], can be used. As shown in [11,12], the GSB process can also be used to model some other types of real data with pronounced and persistent fluctuations.
Author Contributions
Conceptualization, M.J.; data curation, M.J.; formal analysis, V.S.; methodology, K.K.; project administration, B.P.; software, K.K. and B.P.; supervision, V.S.; validation, P.Č.; visualization, P.Č.; writing—original draft, M.J., V.S. and K.K.; writing—review and editing, B.P. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Ministry of Education, Science and Technological Development of the Republic of Serbia. (Grant number: III 47016.)
Data Availability Statement
Not applicable.
Acknowledgments
The authors would like to thank the Electronic Government of the Republic of Serbia and the Institute for Public Health “Milan Jovanović-Batut” for providing datasets used in this research.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Engle, R.F.; Smith, A.D. Stochastic Permanent Breaks. Rev. Econ. Stat. 1999, 81, 553–574.
- Diebold, F.X.; Inoue, A. Long Memory and Regime Switching. J. Econom. 2001, 105, 131–159.
- Gonzalo, J.; Martínez, O. Large Shocks vs. Small Shocks. (Or does size matter? May be so.). J. Econom. 2006, 135, 311–347.
- Dendramis, Y.; Kapetanios, G.; Tzavalis, E. Level Shifts in Stock Returns Driven by Large Shocks. J. Empir. Financ. 2014, 29, 41–51.
- Dendramis, Y.; Kapetanios, G.; Tzavalis, E. Shifts in Volatility Driven by Large Stock Market Shocks. J. Econom. Dynam. Control 2015, 55, 130–147.
- Huang, B.-N.; Fok, R.C.W. Stock Market Integration—an Application of the Stochastic Permanent Breaks Model. Appl. Econ. Lett. 2001, 8, 725–729.
- González, A. A Smooth Permanent Surge Process. In SSE/EFI Working Paper Series in Economics and Finance No. 572; Stockholm School of Economics, The Economic Research Institute: Stockholm, Sweden, 2004.
- Kapetanios, G.; Tzavalis, E. Modeling Structural Breaks in Economic Relationships Using Large Shocks. J. Econom. Dynam. Control 2010, 34, 417–436.
- Stojanović, V.; Popović, B.Č.; Popović, P. The Split-BREAK Model. Braz. J. Probab. Stat. 2011, 25, 44–63.
- Stojanović, V.; Popović, B.Č.; Popović, P. Stochastic Analysis of GSB Process. Publ. Inst. Math. 2014, 95, 149–159.
- Stojanović, V.; Popović, B.Č.; Popović, P. Model of General Split-BREAK Process. REVSTAT Stat. J. 2015, 13, 145–168.
- Stojanović, V.; Milovanović, G.V.; Jelić, G. Distributional Properties and Parameters Estimation of GSB Process: An Approach Based on Characteristic Functions. ALEA—Lat. Am. J. Probab. Math. Stat. 2016, 13, 835–861.
- Xu, Z.; Wang, H.; Zhang, H.; Zhao, K.; Gao, H.; Zhu, Q. Non-Stationary Turbulent Wind Field Simulation of Long-Span Bridges Using the Updated Non-Negative Matrix Factorization-Based Spectral Representation Method. Appl. Sci. 2019, 9, 5506.
- Granero-Belinchón, C.; Roux, S.G.; Garnier, N.B. Information Theory for Non-Stationary Processes with Stationary Increments. Entropy 2019, 21, 1223.
- Zhao, D.; Gelman, L.; Chu, F.; Ball, A. Novel Method for Vibration Sensor-Based Instantaneous Defect Frequency Estimation for Rolling Bearings Under Non-Stationary Conditions. Sensors 2020, 20, 5201.
- Qu, C.; Li, J.; Yan, L.; Yan, P.; Cheng, F.; Lu, D. Non-Stationary Flood Frequency Analysis Using Cubic B-Spline-Based GAMLSS Model. Water 2020, 12, 1867.
- Aguejdad, R. The Influence of the Calibration Interval on Simulating Non-Stationary Urban Growth Dynamic Using CA-Markov Model. Remote Sens. 2021, 13, 468.
- Narr, C.F.; Chernyavskiy, P.; Collins, S.M. Partitioning Macroscale and Microscale Ecological Processes Using Covariate-Driven Non-Stationary Spatial Models. Ecol. Appl. 2022, 32, e02485.
- Vaz, S.; Torres, D.F.M. A Discrete-Time Compartmental Epidemiological Model for COVID-19 with a Case Study for Portugal. Axioms 2021, 10, 314.
- Alqahtani, R.T.; Musa, S.S.; Yusuf, A. Unravelling the Dynamics of the COVID-19 Pandemic with the Effect of Vaccination, Vertical Transmission and Hospitalization. Results Phys. 2022, 39, 105715.
- Ghosh, S.; Volpert, V.; Banerjee, M. An Epidemic Model with Time Delay Determined by the Disease Duration. Mathematics 2022, 10, 2561.
- Almeshal, A.M.; Almazrouee, A.I.; Alenizi, M.R.; Alhajeri, S.N. Forecasting the Spread of COVID-19 in Kuwait Using Compartmental and Logistic Regression Models. Appl. Sci. 2020, 10, 3402.
- Rossi, C.; Bonanomi, A.; Oasi, O. Psychological Wellbeing during the COVID-19 Pandemic: The Influence of Personality Traits in the Italian Population. Int. J. Environ. Res. Public Health 2021, 18, 5862.
- Ponkratov, V.; Kuznetsov, N.; Bashkirova, N.; Volkova, M.; Alimova, M.; Ivleva, M.; Vatutina, L.; Elyakova, I. Predictive Scenarios of the Russian Oil Industry; with a Discussion on Macro and Micro Dynamics of Open Innovation in the COVID-19 Pandemic. J. Open Innov. Technol. Mark. Complex. 2020, 6, 85.
- Hassan, S.M.; Riveros Gavilanes, J.M. First to React Is the Last to Forgive: Evidence from the Stock Market Impact of COVID-19. J. Risk Financ. Manag. 2021, 14, 26.
- Flora, J.; Khan, W.; Jin, J.; Jin, D.; Hussain, A.; Dajani, K.; Khan, B. Usefulness of Vaccine Adverse Event Reporting System for Machine-Learning Based Vaccine Research: A Case Study for COVID-19 Vaccines. Int. J. Mol. Sci. 2022, 23, 8235.
- Kouamé, K.-M.; Mcheick, H. An Ontological Approach for Early Detection of Suspected COVID-19 among COPD Patients. Appl. Syst. Innov. 2021, 4, 21.
- Sarría-Santamera, A.; Abdukadyrov, N.; Glushkova, N.; Russell Peck, D.; Colet, P.; Yeskendir, A.; Asúnsolo, A.; Ortega, M.A. Towards an Accurate Estimation of COVID-19 Cases in Kazakhstan: Back-Casting and Capture–Recapture Approaches. Medicina 2022, 58, 253.
- Shim, E.; Choi, W.; Song, Y. Clinical Time Delay Distributions of COVID-19 in 2020–2022 in the Republic of Korea: Inferences from a Nationwide Database Analysis. J. Clin. Med. 2022, 11, 3269.
- Jankhonkhan, J.; Sawangtong, W. Model Predictive Control of COVID-19 Pandemic with Social Isolation and Vaccination Policies in Thailand. Axioms 2021, 10, 274.
- Queirós-Reis, L.; Gomes da Silva, P.; Gonçalves, J.; Brancale, A.; Bassetto, M.; Mesquita, J.R. SARS-CoV-2 Virus−Host Interaction: Currently Available Structures and Implications of Variant Emergence on Infectivity and Immune Response. Int. J. Mol. Sci. 2021, 22, 10836.
- Xu, L.; Xie, L.; Zhang, D.; Xu, X. Elucidation of Binding Features and Dissociation Pathways of Inhibitors and Modulators in SARS-CoV-2 Main Protease by Multiple Molecular Dynamics Simulations. Molecules 2022, 27, 6823.
- Williams, D. Probability with Martingales; Cambridge University Press: Cambridge, UK, 1991.
- Stojanović, V.; Popović, B.Č.; Milovanović, G.V. The Split-SV model. Comput. Statist. Data Anal. 2016, 100, 560–581.
- Stojanović, V.; Kevkić, T.; Jelić, G. Application of the Homotopy Analysis Method in Approximation of Convolutions Stochastic Distributions. Univ. Politeh. Buchar. Sci. Bull. 2017, 79, 103–112.
- Fuller, W.A. Introduction to Statistical Time Series; John Wiley & Sons: New York, NY, USA, 1996.
- Popović, B.Č. The First Order Random Coefficient (RC) Autoregressive Time Series. Sci. Rev. 1992, 21–22, 131–136.
- Lawrence, A.J.; Lewis, P.A.W. Reversed Residuals in Autoregressive Time Series Analysis. J. Time Series Anal. 1992, 13, 253–266.
- Serfling, R.J. Approximation Theorems of Mathematical Statistics, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2002.
- Hoeffding, W.; Robbins, H. The central limit theorem for dependent random variables. Duke Math. J. 1948, 15, 773–780.
- Gross, L. Tests for normality. R Package Version 1.0-2. 2013. Available online: http://CRAN.R-project.org/package=nortest (accessed on 21 September 2022).
- Ruckdeschel, P.; Kohl, M.; Stabla, T.; Camphausen, F. S4 Classes for Distributions. R News 2006, 6, 2–6. Available online: https://CRAN.R-project.org/doc/Rnews (accessed on 21 September 2022).
- Sivakumar, B.; Deepthi, B. Complexity of COVID-19 Dynamics. Entropy 2022, 24, 50.
- Beškovnik, B.; Zanne, M.; Golnar, M. Dynamic Changes in Port Logistics Caused by the COVID-19 Pandemic. J. Mar. Sci. Eng. 2022, 10, 1473.
- Zakharov, V.; Balykina, Y.; Ilin, I.; Tick, A. Forecasting a New Type of Virus Spread: A Case Study of COVID-19 with Stochastic Parameters. Mathematics 2022, 10, 3725.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).