Article

Modeling High-Frequency Zeros in Time Series with Generalized Autoregressive Score Models with Explanatory Variables: An Application to Precipitation

by Pedro Vidal-Gutiérrez 1, Sergio Contreras-Espinoza 1 and Francisco Novoa-Muñoz 2,*
1 Departamento de Estadística, Facultad de Ciencias, Universidad del Bío-Bío, Concepción 4051381, Chile
2 Departamento de Enfermería, Facultad de Ciencias de la Salud y de los Alimentos, Universidad del Bío-Bío, Chillán 3800708, Chile
* Author to whom correspondence should be addressed.
Axioms 2024, 13(1), 15; https://doi.org/10.3390/axioms13010015
Submission received: 5 November 2023 / Revised: 23 November 2023 / Accepted: 25 November 2023 / Published: 25 December 2023
(This article belongs to the Special Issue Statistical Methods and Applications)

Abstract

This paper extends the Generalized Autoregressive Score (GAS) model for time series with excess null observations, proposed by Harvey and Ito, to include explanatory variables. The extended model is applied to precipitation data from a city in Chile. It is concluded that the model provides adequate prediction, and an analysis of the relationship between precipitation and the explanatory variables is presented. This relationship is compared with the meteorology literature and is found to agree with it.

1. Introduction

In recent times, models with varying parameters have gained increasing popularity for working with time-series data. One of these models is the Generalized Autoregressive Score (GAS), or the Dynamic Conditional Score (DCS). According to Creal et al. [1], these models belong to the class of observation-driven models.
Sometimes, a significant proportion of observations in a time series is zeros, while the remaining observations are positive and are measured on a continuous scale. An example of this is daily precipitation, where there are many days with no rainfall, resulting in these days being recorded as zeros. To work with this type of data, it is necessary to utilize the zero-augmented distributions introduced by Hautsch et al. [2]. This approach, developed by Harvey and Ito [3], provides a framework for working with GAS models in the presence of a significant frequency of zeros.
The objective and contribution of this paper is to extend the model proposed in [3] to include explanatory variables. To achieve this, this research is structured as follows: Section 2 briefly introduces the necessary theory for conducting this study and incorporates the explanatory variables. Section 3 presents the obtained results, which are discussed in Section 4. Finally, the conclusions are drawn in Section 5.

2. Materials and Methods

This section provides a summary of the GAS models [1], the zero-augmented distributions [2], and the integration of these concepts. The goal is to subsequently expand the model using explanatory variables.

2.1. GAS Models

GAS models [1], also known as DCS models [4], are observation-driven models. Blasques et al. [5] define these models for an observed time series $y_1, \ldots, y_T$ with a density given by $y_t \sim p_y(y_t \mid f_t, y_{1:t-1}; \theta)$, $t = 1, \ldots, T$. This density depends on the time-varying parameter $f_t$, the past observations $y_{1:t-1} := \{y_1, y_2, \ldots, y_{t-1}\}$, and the static parameters $\theta$. The time-varying parameter is defined as the function $f_t := f_t(y_{1:t-1}; \theta)$.
An example of an update equation is $f_{t+1} = \omega + \beta f_t + \alpha\, s(y_t, f_t; \theta)$, where $\theta = (\omega, \alpha, \beta)$, $s(y_t, f_t; \theta) = S_t(f_t; \theta)\, \dfrac{\partial \log p_y(y_t \mid f_t, y_{1:t-1}; \theta)}{\partial f_t}$ is the weighted score, $S_t(f_t; \theta) = I_t^{-d}$ with $d = 0, 1$, and $I_t = -E_t\!\left[\dfrac{\partial^2 \log p_y(y_t \mid f_t, y_{1:t-1}; \theta)}{\partial f_t^2}\right]$.
Since the vector $\theta$ is unknown, it is estimated by maximum likelihood, maximizing $\ell(\theta) := \sum_{t=1}^{T} \log p_y(y_t \mid f_t, y_{1:t-1}; \theta)$.

2.2. Zero-Augmented Distribution for Non-Negative Variables

Consider a non-negative continuous random variable $X$ with independent observations $\{X_t\}_{t=1}^{n}$. To account for excess zeros, Hautsch et al. [2] allocate a probability mass at the exact zero value and define the probabilities $\pi := P(X > 0)$ and $1 - \pi := P(X = 0)$.
Conditional on $X > 0$, $X$ follows a continuous distribution with density $g_X(x) := f_X(x \mid X > 0)$, which is continuous for $x \in (0, \infty)$. Consequently, the unconditional distribution of $X$ is semicontinuous with a discontinuity at zero. This implies the density $f_X(x) = (1 - \pi)\,\delta(x) + \pi\, g_X(x)\, I(x > 0)$, where $0 \le \pi \le 1$, $\delta(x)$ is a point probability mass at $x = 0$, and $I(x > 0)$ denotes the indicator function that takes the value 1 for $x > 0$ and 0 otherwise. The probability $\pi$ is treated as a parameter of the distribution that determines how much probability mass is assigned to the strictly positive part of the support of $X$. In [3], a GAS model using a zero-augmented distribution is presented and applied to precipitation data. That work raises the possibility of extending the model with explanatory variables, which is addressed in this paper.
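The mixture structure above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the positive part $g_X$ is an Exponential(1) density (the paper uses a generalized beta density of the second kind instead); the names `za_density_positive`, `za_sample`, `g`, and `draw_g` are hypothetical and do not come from the paper.

```python
import math
import random

def za_density_positive(x, pi, g):
    """Continuous part of the zero-augmented density: pi * g(x) for x > 0, else 0."""
    return pi * g(x) if x > 0 else 0.0

def za_sample(n, pi, draw_positive, rng):
    """Draw n values: exactly 0 with probability 1 - pi, otherwise a draw from g."""
    return [draw_positive(rng) if rng.random() < pi else 0.0 for _ in range(n)]

# Illustrative assumption: g is the Exponential(1) density (not the paper's choice).
def g(x):
    return math.exp(-x)

def draw_g(rng):
    return rng.expovariate(1.0)

rng = random.Random(0)
pi = 0.55  # hypothetical probability of a positive observation
sample = za_sample(20000, pi, draw_g, rng)
share_zero = sum(1 for x in sample if x == 0.0) / len(sample)
print(f"share of zeros: {share_zero:.3f} (target 1 - pi = {1 - pi:.2f})")
```

The simulated share of exact zeros should be close to $1 - \pi$, which is the point mass that $\delta(x)$ represents in the density.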

2.3. Dynamic Model for the Zero-Augmented Distribution Model

To model time series with excess null observations, Harvey and Ito [3] defined a probability density function $g(\cdot)$ for which it is possible to identify a scale parameter $\varphi$. In the context of GAS models, it is necessary to use a link function to introduce dynamics to the parameter, setting $\varphi = \exp(\lambda)$.
According to Harvey and Ito [3], in a parameter-driven model, the dynamics should be introduced through the parameter $\lambda$. Conversely, the DCS model is observation-driven, with the predictive distribution defined conditional on a filtered value of $\lambda$, denoted as $\lambda_{t|t-1}$.
For an observed time series $y_1, y_2, \ldots, y_T$, let $y_t \sim f(y_t \mid \lambda_{t|t-1}; \theta)$, where $f(\cdot)$ is the probability density function of $y_t$ obtained from a zero-augmented distribution. In other words, $f(y_t \mid \lambda_{t|t-1}) = (1 - \pi)(1 - I(y_t > 0)) + \pi\, g(y_t \mid \lambda_{t|t-1})\, I(y_t > 0)$. Harvey and Ito [3] introduced dynamics to $\pi$ through a logistic transformation, so when $\pi_t$ depends on $\lambda_{t|t-1}$, it yields:
$$\pi_{t|t-1} = \frac{\exp(\delta_0 + \delta_1 \lambda_{t|t-1})}{1 + \exp(\delta_0 + \delta_1 \lambda_{t|t-1})}. \tag{1}$$
Thus, the probability density function associated with $y_t$ takes the form:
$$f(y_t \mid \pi_{t|t-1}; \lambda_{t|t-1}) = (1 - \pi_{t|t-1})(1 - I(y_t > 0)) + \pi_{t|t-1}\, g(y_t \mid \lambda_{t|t-1})\, I(y_t > 0). \tag{2}$$

2.4. Derivation of the Model’s Score

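To obtain the score of the model, (2) is rewritten as follows:
$$f(y_t \mid \lambda_{t|t-1}) = \begin{cases} 1 - \pi_{t|t-1}, & \text{if } y_t = 0, \\ \pi_{t|t-1}\, g(y_t \mid \lambda_{t|t-1}), & \text{if } y_t > 0. \end{cases} \tag{3}$$
By taking the derivative of the logarithm of (3) with respect to $\lambda_{t|t-1}$ and considering (1), the score of the model is given by:
$$\frac{\partial \log f(y_t \mid \lambda_{t|t-1})}{\partial \lambda_{t|t-1}} = \begin{cases} -\delta_1 \pi_{t|t-1}, & \text{if } y_t = 0, \\ \delta_1 (1 - \pi_{t|t-1}) + \dfrac{\partial \log g(y_t \mid \lambda_{t|t-1})}{\partial \lambda_{t|t-1}}, & \text{if } y_t > 0. \end{cases}$$
When expressed in terms of the indicator function $I(y_t > 0)$, this becomes:
$$\frac{\partial \log f(y_t \mid \lambda_{t|t-1})}{\partial \lambda_{t|t-1}} = -\delta_1 \pi_{t|t-1} (1 - I(y_t > 0)) + \left[ \delta_1 (1 - \pi_{t|t-1}) + \frac{\partial \log g(y_t \mid \lambda_{t|t-1})}{\partial \lambda_{t|t-1}} \right] I(y_t > 0).$$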

2.5. Generalized Beta Distribution of the Second Kind

For precipitation data, Harvey and Ito [3] recommend using the generalized beta distribution of the second kind [6], which is given by:
$$g(y \mid a, b, p, q) = \begin{cases} \dfrac{a\, (y/b)^{ap-1}}{b\, B(p, q) \left[ 1 + (y/b)^{a} \right]^{p+q}}, & 0 < y < +\infty, \\ 0, & \text{otherwise}, \end{cases} \tag{5}$$
where $a, b, p, q > 0$, with $b$ being the scale parameter and $a$, $p$, and $q$ the shape parameters. According to Kleiber and Kotz [7], the non-central moments of order $k \in \mathbb{N}$ are given by:
$$E\!\left(Y^{k}\right) = \frac{b^{k} B(p + k/a,\; q - k/a)}{B(p, q)} = \frac{b^{k}\, \Gamma(p + k/a)\, \Gamma(q - k/a)}{\Gamma(p)\, \Gamma(q)}. \tag{6}$$
The density of a generalized beta distribution of the second kind exhibits considerable flexibility, as demonstrated in Figure 1.
Special cases encompass a broad range of distributions for non-negative variables. For instance, when $p = 1$, the distribution becomes the Burr distribution, and when, in addition, $q = 1$, it becomes a log-logistic distribution (McDonald [8]).
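As a hedged illustration of the density (5) and the moment formula (6), the following Python sketch evaluates both using only the standard library. The function names `gb2_pdf` and `gb2_moment` and all parameter values are illustrative assumptions (the paper's computations were done in R with the GB2 package); the existence condition $-ap < k < aq$ enforced for the moments is the standard condition for this distribution.

```python
import math

def log_beta(p, q):
    """log B(p, q) computed through log-gamma for numerical stability."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

def gb2_pdf(y, a, b, p, q):
    """Density (5) of the generalized beta distribution of the second kind."""
    if y <= 0:
        return 0.0
    z = (y / b) ** a
    log_pdf = (math.log(a) + (a * p - 1) * math.log(y / b) - math.log(b)
               - log_beta(p, q) - (p + q) * math.log1p(z))
    return math.exp(log_pdf)

def gb2_moment(k, a, b, p, q):
    """Moment of order k from (6); it is finite only when -a*p < k < a*q."""
    if not (-a * p < k < a * q):
        raise ValueError("moment of this order does not exist")
    return b ** k * math.exp(log_beta(p + k / a, q - k / a) - log_beta(p, q))

# Special case p = q = 1: the density should coincide with the log-logistic one.
a, b, y = 2.0, 1.5, 1.3   # hypothetical values
loglogistic = (a / b) * (y / b) ** (a - 1) / (1 + (y / b) ** a) ** 2
print(gb2_pdf(y, a, b, 1.0, 1.0), loglogistic)
print(gb2_moment(1, 3.0, 1.0, 1.0, 1.0))  # mean of a log-logistic with a = 3, b = 1
```

Working on the log scale avoids overflow in the beta function when $p$ or $q$ is large.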

2.6. GAS Model for a Zero-Augmented Distribution

We work with the generalized beta distribution of the second kind, for which the density is given by (5). Using the exponential link function $b = \exp(\lambda)$ and incorporating the time dynamics yields:
$$g(y_t \mid \lambda_{t|t-1}) = \begin{cases} \dfrac{a\, \left( y_t / \exp(\lambda_{t|t-1}) \right)^{ap-1}}{\exp(\lambda_{t|t-1})\, B(p, q) \left[ 1 + \left( y_t / \exp(\lambda_{t|t-1}) \right)^{a} \right]^{p+q}}, & 0 < y_t < +\infty, \\ 0, & \text{otherwise}. \end{cases} \tag{7}$$
Taking the logarithm of expression (7), differentiating with respect to $\lambda_{t|t-1}$, and considering (4) results in
$$\frac{\partial \log g(y_t \mid \lambda_{t|t-1})}{\partial \lambda_{t|t-1}} = \frac{a (p + q) \left( y_t \exp(-\lambda_{t|t-1}) \right)^{a}}{\left( y_t \exp(-\lambda_{t|t-1}) \right)^{a} + 1} - a p. \tag{8}$$
Thus, the model for $y_t$ in terms of the time-varying parameter $\lambda_{t|t-1}$ is:
$$y_t \sim f(y_t \mid y_1, \ldots, y_{t-1}, \pi_{t|t-1}; \lambda_{t|t-1}; \theta),$$
with a probability density function given by:
$$f(y_t \mid y_1, \ldots, y_{t-1}, \pi_{t|t-1}; \lambda_{t|t-1}; \theta) = (1 - \pi_{t|t-1})(1 - I(y_t > 0)) + \pi_{t|t-1}\, g(y_t \mid \lambda_{t|t-1})\, I(y_t > 0),$$
where $g(\cdot)$ is the density of a generalized beta distribution of the second kind given in (7), $\theta = (a, p, q, \omega, \phi, \kappa, \delta_0, \delta_1)$, $\pi_{t|t-1}$ is defined in (1), and
$$\lambda_{t+1|t} = \omega + \phi \lambda_{t|t-1} + \kappa u_t,$$
where $u_t$ is the conditional score of the model and $\kappa$ is the weight assigned to it.
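The derivative in (8) can be checked numerically. The sketch below, written in Python as an illustration rather than as the authors' code, compares the closed form with a central finite difference of the log-density (7); the function names and the parameter values are hypothetical.

```python
import math

def log_gb2(y, lam, a, p, q):
    """Logarithm of density (7) with scale b = exp(lam), valid for y > 0."""
    log_beta = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    z = (y * math.exp(-lam)) ** a
    return (math.log(a) + (a * p - 1) * (math.log(y) - lam) - lam
            - log_beta - (p + q) * math.log1p(z))

def dlog_gb2_dlam(y, lam, a, p, q):
    """Closed-form derivative (8) of log g with respect to lam."""
    z = (y * math.exp(-lam)) ** a
    return a * (p + q) * z / (z + 1) - a * p

# Hypothetical evaluation point and shape parameters.
y, lam, a, p, q = 3.2, 0.7, 1.8, 0.6, 2.5
h = 1e-6
numeric = (log_gb2(y, lam + h, a, p, q) - log_gb2(y, lam - h, a, p, q)) / (2 * h)
analytic = dlog_gb2_dlam(y, lam, a, p, q)
print(numeric, analytic)
```

Because $z/(z+1)$ lies in $(0, 1)$, the expression in (8) is bounded between $-ap$ and $aq$, so a single very large observation has a bounded effect on the update of $\lambda_{t|t-1}$.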
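The recursion for $\lambda_{t+1|t}$ together with the score and the density can be sketched as a short filter. This is a minimal Python illustration under stated assumptions: the starting value `lam0`, the function names, and every numerical value are hypothetical, and the explanatory variables introduced later in the paper are left out.

```python
import math

def score_and_loglik(y, lam, a, p, q, d0, d1):
    """Return (u_t, log f(y_t | lam)) for the zero-augmented GB2 model."""
    pi = 1.0 / (1.0 + math.exp(-(d0 + d1 * lam)))        # logistic link (1)
    if y == 0:
        return -d1 * pi, math.log(1.0 - pi)
    z = (y * math.exp(-lam)) ** a
    dlog_g = a * (p + q) * z / (z + 1) - a * p            # Equation (8)
    log_beta = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    log_g = (math.log(a) + (a * p - 1) * (math.log(y) - lam) - lam
             - log_beta - (p + q) * math.log1p(z))        # log of density (7)
    return d1 * (1.0 - pi) + dlog_g, math.log(pi) + log_g

def gas_filter(ys, theta, lam0):
    """Run lam_{t+1|t} = omega + phi * lam_{t|t-1} + kappa * u_t over the sample."""
    a, p, q, omega, phi, kappa, d0, d1 = theta
    lam, lams, loglik = lam0, [], 0.0
    for y in ys:
        lams.append(lam)
        u, ll = score_and_loglik(y, lam, a, p, q, d0, d1)
        loglik += ll
        lam = omega + phi * lam + kappa * u
    return lams, loglik

ys = [0.0, 2.5, 0.0, 1.0]                                 # toy series with zeros
theta = (1.5, 1.0, 2.0, 0.1, 0.9, 0.05, 0.2, 0.5)         # hypothetical parameters
lams, loglik = gas_filter(ys, theta, lam0=1.0)
print(lams, loglik)
```

The returned log-likelihood is the quantity that a numerical optimizer would maximize over $\theta$ in a maximum likelihood fit.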

2.7. Explanatory Variables

In Harvey and Luati [9], it is shown that, for a model in which the location parameter $\mu$ is time-varying, the parameter can depend on a set of explanatory variables, collected in a $k \times 1$ vector $w_t$, as well as on past values and the score, through the following formulation:
$$\mu_{t|t-1} = \omega + w_t^{\top} \beta + \mu^{\dagger}_{t|t-1}, \quad t = 1, \ldots, T, \tag{9}$$
$$\mu^{\dagger}_{t+1|t} = \phi\, \mu^{\dagger}_{t|t-1} + \kappa u_t, \quad t = 1, \ldots, T, \tag{10}$$
where $\mu^{\dagger}_{t|t-1}$ denotes the dynamic, score-driven component and $\beta$ is a $k \times 1$ vector of parameters, one for each explanatory variable, that are estimated in the model.

2.8. Diagnosis

Diebold et al. [10] state that, to evaluate whether a model for $y_t$ is well fitted, it should be shown that the probability integral transforms (PITs)
$$z_t = \int_{-\infty}^{y_t} p_t(u)\, du$$
are independent and identically distributed as the uniform distribution $U(0, 1)$, where $p_t(\cdot)$ represents the density forecasts of the generating process $f_y(y_t)$.
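The PIT check can be sketched on a toy example. The Python code below is an illustration only: it assumes exponentially distributed data (not the paper's model), computes the PITs under a correct and under a deliberately misspecified forecast distribution, and measures the Kolmogorov–Smirnov distance to $U(0, 1)$. The function name `ks_uniform_stat` is hypothetical.

```python
import math
import random

def ks_uniform_stat(z):
    """Kolmogorov-Smirnov distance between the ECDF of z and the U(0, 1) CDF."""
    zs = sorted(z)
    n = len(zs)
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(zs))

rng = random.Random(1)
ys = [rng.expovariate(1.0) for _ in range(2000)]        # toy data, rate 1

pit_good = [1.0 - math.exp(-y) for y in ys]             # PIT under the true CDF
pit_bad = [1.0 - math.exp(-3.0 * y) for y in ys]        # PIT under a wrong CDF (rate 3)

d_good = ks_uniform_stat(pit_good)
d_bad = ks_uniform_stat(pit_bad)
print(d_good, d_bad)
```

A small distance for the correctly specified forecast and a large one for the misspecified forecast is the pattern that the diagnostic is designed to detect.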

2.9. Prediction

To obtain predictions, Blasques et al. [5] create confidence bands for the time-varying parameter $f_{t+1}$. They consider the model for an observed time series $y_1, y_2, \ldots, y_T$ given by $y_t \sim p_y(y_t \mid f_t; \theta)$ with the update equation
$$f_{t+1} = \phi(y_t, f_t; \theta). \tag{11}$$
In GAS models, $f_{T+1}$ depends, by construction, only on $y_1, y_2, \ldots, y_T$ and is therefore known at time $T$, so the time-varying parameter needs to be simulated only from time $T + 2$ onward.
Harvey and Ito [3] accomplish this through computational simulation, following the steps outlined below for $n \ge 2$:
(A) Given the maximum likelihood point estimate $\hat{\theta}_T$ and the filtered value $\hat{f}_{T+1}$ obtained from (11) for $\theta = \hat{\theta}_T$ and $t = T$, simulate $S$ realizations $y^{1}_{T+1}, \ldots, y^{S}_{T+1}$ from the estimated conditional density at time $T + 1$. In other words,
$$y^{s}_{T+1} \sim p_y\!\left( y_{T+1} \mid \hat{f}_{T+1}; \hat{\theta}_T \right), \quad s = 1, \ldots, S.$$
(B) Given the simulated observations $y^{1}_{T+1}, \ldots, y^{S}_{T+1}$ and Equation (11), obtain the filtered values $\hat{f}^{1}_{T+2}, \ldots, \hat{f}^{S}_{T+2}$, conditioned on $\hat{\theta}_T$ and $\hat{f}_{T+1}$, using:
$$\hat{f}^{s}_{T+2} = \phi\!\left( y^{s}_{T+1}, \hat{f}_{T+1}; \hat{\theta}_T \right), \quad s = 1, \ldots, S.$$
(C) For $\hat{f}^{s}_{T+2}$, $s = 1, \ldots, S$, repeat steps (A) and (B) for the periods $T + 2, \ldots, T + n$.
(D) Use $\hat{f}^{s}_{T+n}$ to calculate forecast bands at the desired percentiles.
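Steps (A) to (D) can be sketched for the zero-augmented model as follows. This Python code is a simplified illustration, not the authors' implementation: it omits explanatory variables, uses a crude empirical quantile, and draws positive values through the relation $Y = b\,(U/(1-U))^{1/a}$ with $U \sim \mathrm{Beta}(p, q)$, which yields a generalized beta variable of the second kind. All names and parameter values are hypothetical.

```python
import math
import random

def draw_y(lam, a, p, q, d0, d1, rng):
    """Step (A): one draw from the zero-augmented GB2 density given lam."""
    pi = 1.0 / (1.0 + math.exp(-(d0 + d1 * lam)))
    if rng.random() >= pi:
        return 0.0                                   # no rain with probability 1 - pi
    u = rng.betavariate(p, q)                        # U ~ Beta(p, q)
    return math.exp(lam) * (u / (1.0 - u)) ** (1.0 / a)

def update(y, lam, a, p, q, omega, phi, kappa, d0, d1):
    """Step (B): lam_next = omega + phi * lam + kappa * u_t."""
    pi = 1.0 / (1.0 + math.exp(-(d0 + d1 * lam)))
    if y == 0:
        u_t = -d1 * pi
    else:
        z = (y * math.exp(-lam)) ** a
        u_t = d1 * (1.0 - pi) + a * (p + q) * z / (z + 1) - a * p
    return omega + phi * lam + kappa * u_t

def forecast_bands(lam_next, theta, n, S, rng, probs=(0.025, 0.5, 0.975)):
    """Steps (C)-(D): simulate S paths and return quantiles for T+2, ..., T+n."""
    a, p, q, omega, phi, kappa, d0, d1 = theta
    paths = [lam_next] * S                           # all paths start at the known value
    bands = []
    for _ in range(n - 1):
        paths = [update(draw_y(l, a, p, q, d0, d1, rng), l, a, p, q,
                        omega, phi, kappa, d0, d1) for l in paths]
        srt = sorted(paths)
        bands.append(tuple(srt[min(S - 1, int(pr * S))] for pr in probs))
    return bands

rng = random.Random(0)
theta = (1.5, 1.0, 2.0, 0.1, 0.9, 0.05, 0.2, 0.5)    # hypothetical parameters
bands = forecast_bands(1.0, theta, n=5, S=500, rng=rng)
for horizon, band in enumerate(bands, start=2):
    print(f"T+{horizon}: {band}")
```

Each tuple holds the lower band, the median, and the upper band of the simulated values of the time-varying parameter at that horizon.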

2.10. Brier Probability Score

To evaluate the quality of the prediction, the Brier probability score (BPS) is used as a measure of accuracy. This metric is widely employed in such cases (Wilks [11]). The BPS was introduced by Brier [12] and is given by:
$$BPS = \frac{1}{n} \sum_{t=1}^{n} \left( p_t - \alpha_t \right)^{2},$$
where $n$ represents the number of predicted values, $p_t$ is the predicted probability at time $t$, and $\alpha_t$ takes the value 1 if the event occurred at time $t$ and 0 otherwise. The score satisfies $0 \le BPS \le 1$, with lower values indicating better predictions; Salvador [13] suggests that predictions are acceptable if $BPS \le 0.35$.
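A minimal Python sketch of the score defined above follows; the function name `brier_score` and the example forecasts are hypothetical and are not taken from the paper's data.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(probs, outcomes)) / len(probs)

probs = [0.9, 0.2, 0.6, 0.1]      # hypothetical predicted probabilities of the event
outcomes = [1, 0, 0, 0]           # 1 if the event occurred, 0 otherwise
bps = brier_score(probs, outcomes)
print(bps)
```

As a point of reference, a forecaster who always predicts 0.5 obtains a score of exactly 0.25, since $(0.5 - \alpha_t)^2 = 0.25$ for either outcome.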

2.11. Application

The zero-augmented GAS model to be formulated will be applied to precipitation data. Let $y_t$ represent the amount of precipitation in period $t$. Then,
$$y_t \sim p_y(y_t \mid y_1, \ldots, y_{t-1}, \lambda_{t|t-1}; \theta).$$
It is assumed that the data-generating process for precipitation follows the zero-augmented generalized beta distribution of the second kind. As a result, the conditional density of $y_t$ is defined as:
$$p(y_t \mid y_1, \ldots, y_{t-1}, \lambda_{t|t-1}; \theta) = (1 - \pi_{t|t-1})(1 - I(y_t > 0)) + \pi_{t|t-1}\, g(y_t \mid \lambda_{t|t-1})\, I(y_t > 0),$$
where $\pi_{t|t-1}$ is given in (1) and $g(y_t \mid \lambda_{t|t-1})$ is the density of the generalized beta distribution of the second kind. According to Harvey and Ito [3], for improved estimates, this density should be reparameterized from (7) in terms of the reciprocal of the tail index, $\bar{\eta} = 1/\eta$, where $\eta = aq$ is the tail index. This leads to:
$$g(y_t \mid \lambda_{t|t-1}) = \begin{cases} \dfrac{a\, \left( y_t / \exp(\lambda_{t|t-1}) \right)^{ap-1}}{\exp(\lambda_{t|t-1})\, B\!\left( p, \frac{1}{a \bar{\eta}} \right) \left[ 1 + \left( y_t / \exp(\lambda_{t|t-1}) \right)^{a} \right]^{p + \frac{1}{a \bar{\eta}}}}, & 0 < y_t < +\infty, \\ 0, & \text{otherwise}, \end{cases}$$
where $a, p > 0$ are shape parameters, and $\exp(\lambda_{t|t-1})$ is the scale parameter modeled through $\lambda_{t|t-1}$, which acts as the location parameter. Replacing $\mu_{t|t-1}$ with $\lambda_{t|t-1}$ in Equations (9) and (10) results in:
$$\lambda_{t|t-1} = \omega + w_t^{\top} \beta + \lambda^{\dagger}_{t|t-1}, \quad t = 1, \ldots, T, \qquad \lambda^{\dagger}_{t+1|t} = \phi\, \lambda^{\dagger}_{t|t-1} + \kappa u_t, \quad t = 1, \ldots, T, \tag{12}$$
where $\lambda^{\dagger}_{t|t-1}$ denotes the dynamic, score-driven component; $\phi$ and $\kappa$ are parameters to be estimated; $w_t = (w_1, w_2, \ldots, w_k)^{\top}$, where $w_i$ with $i \in \{1, \ldots, k\}$ are the explanatory variables; $\beta = (\beta_1, \beta_2, \ldots, \beta_k)^{\top}$, where $\beta_i$ with $i \in \{1, \ldots, k\}$ are the parameters to be estimated; and $u_t$ is the conditional score of the model, given by:
$$u_t = \frac{\partial \log p(y_t \mid y_1, \ldots, y_{t-1}, \lambda_{t|t-1}; \theta)}{\partial \lambda_{t|t-1}} = -\delta_1 \pi_{t|t-1} (1 - I(y_t > 0)) + \left[ \delta_1 (1 - \pi_{t|t-1}) + \frac{\partial \log g(y_t \mid \lambda_{t|t-1})}{\partial \lambda_{t|t-1}} \right] I(y_t > 0),$$
where $\dfrac{\partial \log g(y_t \mid \lambda_{t|t-1})}{\partial \lambda_{t|t-1}}$ is given by Equation (8).
The conditional mean, obtained directly from (6), is given by:
$$E(y_t \mid Y_{t-1}) = \pi_{t|t-1} \exp(\lambda_{t|t-1})\, \frac{B\!\left( p + 1/a,\; 1/(a \bar{\eta}) - 1/a \right)}{B\!\left( p,\; \frac{1}{a \bar{\eta}} \right)}. \tag{13}$$
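The conditional mean (13) can be evaluated with log-gamma functions, as in the following Python sketch. The function name `conditional_mean` and the inputs are hypothetical; the check $\bar{\eta} < 1$ reflects the requirement $q - 1/a > 0$ with $q = 1/(a\bar{\eta})$, without which the beta function in the numerator is not defined.

```python
import math

def conditional_mean(lam, a, p, eta_bar, d0, d1):
    """Conditional mean (13): pi * exp(lam) * B(p + 1/a, q - 1/a) / B(p, q)."""
    q = 1.0 / (a * eta_bar)                    # reparameterization q = 1 / (a * eta_bar)
    if q - 1.0 / a <= 0:                       # equivalent to eta_bar >= 1
        raise ValueError("the mean is finite only if eta_bar < 1")
    pi = 1.0 / (1.0 + math.exp(-(d0 + d1 * lam)))
    # The ratio of beta functions simplifies because both share Gamma(p + q).
    log_ratio = (math.lgamma(p + 1.0 / a) + math.lgamma(q - 1.0 / a)
                 - math.lgamma(p) - math.lgamma(q))
    return pi * math.exp(lam) * math.exp(log_ratio)

# Hypothetical inputs: with d0 = d1 = 0 the rain probability is 0.5.
mean_value = conditional_mean(lam=0.0, a=3.0, p=1.0, eta_bar=1.0 / 3.0, d0=0.0, d1=0.0)
print(mean_value)
```

With these inputs $q = 1$, so the positive part is log-logistic with mean $(\pi/3)/\sin(\pi/3)$ (here $\pi$ is the mathematical constant), and the printed value is half of that.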

2.12. Dataset

The employed time series corresponds to the daily precipitation in the city of Puerto Montt in Chile, as shown in Figure 2. This variable is measured in millimeters (mm) and is equivalent to the liters of water that have fallen per square meter. The dataset was divided into two parts: the first part was used for model estimation and covers from 1 January 2011 to 31 December 2020 with a total of 3653 observations, out of which 1648 data points are zeros. The second part consisted of the following 244 observations, of which 127 data points were zeros. The data used were obtained from the website of the Dirección Meteorológica de Chile “http://www.meteochile.gob.cl/ (accessed on 22 November 2022)”, and the records belong to the El Tepual Puerto Montt Ap Station (code 410005).
As explanatory variables, the following were used: $w_1$ := relative humidity, measured in percentage (%); $w_2$ := atmospheric pressure, measured in hectopascals (hPa); and $w_3$ := temperature, measured in degrees Celsius (°C). The daily maximum values reached by these variables were used.
Figure 3, Figure 4 and Figure 5 present the graphs of the explanatory variables, while Table 1 shows the descriptive statistics of these explanatory variables.

2.13. Parameter Estimation

The estimation of the vector $\theta = (\omega, \phi, \kappa, \delta_0, \delta_1, a, p, \bar{\eta}, \beta_1, \beta_2, \beta_3)$ was performed using the method of maximum likelihood, formulating the maximization problem as:
$$\hat{\theta} = \arg\max_{\theta} \sum_{t=1}^{T} \log p\!\left( y_t \mid y_1, \ldots, y_{t-1}, \lambda_{t|t-1}, \theta \right).$$
The calculations were performed in the R programming language using the GB2 package (Graf and Nedyalkova [14]), maxLik package (Henningsen and Toomet [15]), pracma package (Borchers [16]), and DEoptim package (Mullen et al. [17]).

3. Results

In Table 2, the parameter estimates of the model are presented, along with their statistical significance and standard deviation in parentheses.
Figure 6 presents a graph of the precipitation in Puerto Montt and the adjusted mean.
Figure 7 depicts the empirical cumulative distribution function (ECDF) plotted against the probability integral transforms (PITs) for positive observations, while Table 3 displays the result of the Kolmogorov–Smirnov test along with its p-value.
The graph for the predicted scale parameter is shown in Figure 8, and Figure 9 displays the graph for the prediction of the conditional mean $E(y_{T+\ell} \mid Y_T)$, as given in (13).
Figure 10 illustrates the predictions for the probability of no rainfall. The shaded regions represent the 95% confidence bands. Notably, $BPS = 0.24$.

4. Discussion

The goodness-of-fit of the model is evident from Figure 6 and is supported by the information in Table 2, where most of the estimated parameters are significant.
Additionally, the plot of the PITs against the ECDF in Figure 7 suggests that the data follow the distribution estimated by the model. This is further supported by the Kolmogorov–Smirnov test results presented in Table 3, which are consistent with the model's PITs following a uniform distribution $U(0, 1)$.
To assess whether the PITs follow a uniform distribution $U(0, 1)$, two methods were considered:
(a) The classic Kolmogorov–Smirnov test;
(b) A permutation and bootstrap approach. For this, the algorithm described in Præstgaard [18] was implemented, as suggested by one of the reviewers.
Figure 8 illustrates the predictions of the scale parameter of the model, which, combined with the estimated parameter vector $\theta$, allows for density function forecasts at each prediction time point. Using these density functions (obtained from estimated parameters), conditional means are calculated and presented in Figure 9 as point estimates.
Figure 10 depicts the behavior of the parameter associated with the probability of $y_t$ taking a value of zero in the predictions. These values are used to calculate the Brier probability score of the model, which is 0.24, indicating adequate model performance according to Salvador [13].
In Figure 8, Figure 9 and Figure 10, from January to March, it can be observed that the predicted values cluster around one end of the band. This behavior arises from the nature of the zero-augmented distribution model, as explained below:
Initially, in period $T + 1$, the values of the scale and the probability of rainfall need to be determined, yielding $b_{T+1|T} = 4.4489$ and $\pi_{T+1|T} = 0.2416$, respectively. As the value of $\pi_{T+1|T}$ is close to 0, the simulations initially produce many zeros compared to positive values. This phenomenon directly impacts the behavior observed at the lower end of Figure 8 and Figure 9 and at the upper end of Figure 10, as it corresponds to the predictions of $1 - \pi_{T+1|T}$.
With the values from the preceding paragraph along with the estimation of $\theta$, the density presented in Figure 11 is fully determined. Following the procedure outlined by Blasques et al. [5], values of $y$ are simulated from this density. For this purpose, 1000 simulations were conducted, and the resulting histogram is displayed in Figure 12.
As is characteristic of a zero-augmented distribution density, Figure 12 exhibits a high frequency of zeros since the probability of no precipitation is $1 - 0.2416 = 0.7584$. Therefore, such a proportion of zeros was expected.
With the aforementioned results, the conditional score, $u_t$, is obtained, as shown in Figure 13. It is noticeable that it inherits the shape of the graph in Figure 11.
Now, it is possible to calculate the scale parameter for time $T + 2$, as depicted in Figure 14, which exhibits a similar pattern to that of Figure 11.
The same applies to the parameter for the probability of rain for time $T + 2$, which is presented in Figure 15.
From the histograms of Figure 12, Figure 13, Figure 14, Figure 15, Figure 16 and Figure 17, the high frequency of zeros in the simulated observations causes the calculated values to inherit the same pattern. Therefore, if a specific point prediction is desired, such as the median, for instance, it should be approximated towards the side where the highest frequency lies. As mentioned before, this situation occurs during the period from January to March, which is natural due to it being the summer season in Chile. In other words, the probability of no rainfall is significantly higher compared to the other months. This pattern changes in the following months as the probabilities of no rainfall decrease (see Figure 10), and this is reflected in the corresponding predictions.
Next, the estimated coefficients of the explanatory variables $\beta_1$, $\beta_2$, and $\beta_3$ are interpreted. From (12), it follows that:
$$\lambda_{t|t-1} = \omega + w_1 \beta_1 + w_2 \beta_2 + w_3 \beta_3 + \lambda^{\dagger}_{t|t-1},$$
where $\lambda^{\dagger}_{t|t-1}$ denotes the dynamic, score-driven component.
Since the scale parameter of the model is
$$b_{t|t-1} = \exp(\lambda_{t|t-1}),$$
differentiating with respect to any of the explanatory variables, $w_i$ with $i \in \{1, 2, 3\}$, while holding $\lambda^{\dagger}_{t|t-1}$ fixed, gives:
$$\frac{\partial b_{t|t-1}}{\partial w_i} = \frac{\partial b_{t|t-1}}{\partial \lambda_{t|t-1}} \frac{\partial \lambda_{t|t-1}}{\partial w_i} = \exp(\lambda_{t|t-1})\, \beta_i;$$
hence, the sign of $\beta_i$ determines whether $b_{t|t-1}$ increases or decreases with $w_i$. If $\beta_i > 0$, then $b_{t|t-1}$ increases, and if $\beta_i < 0$, then $b_{t|t-1}$ decreases. Additionally, higher values of the scale parameter result in greater dispersion of the density, while lower values of the scale parameter lead the density to concentrate more around zero. This concentration causes a decrease in the probabilities of high values of the variable, in contrast to when the density becomes more spread out.
Regarding the probability of rain, $\pi_{t|t-1}$ given in (1), differentiating it with respect to any of the explanatory variables, $w_i$ with $i \in \{1, 2, 3\}$, gives:
$$\frac{\partial \pi_{t|t-1}}{\partial w_i} = \frac{\partial \pi_{t|t-1}}{\partial \lambda_{t|t-1}} \frac{\partial \lambda_{t|t-1}}{\partial w_i} = \frac{\delta_1 \exp(\delta_0 + \delta_1 \lambda_{t|t-1})}{\left[ 1 + \exp(\delta_0 + \delta_1 \lambda_{t|t-1}) \right]^{2}}\, \beta_i.$$
Since $\delta_1 > 0$ (see Table 2), the sign of $\beta_i$ determines whether $\pi_{t|t-1}$ increases or decreases.
Finally, by differentiating the conditional mean, $E(y_t \mid Y_{t-1})$ given in (13), with respect to any of the explanatory variables, $w_i$ with $i \in \{1, 2, 3\}$, we have:
$$\frac{\partial E(y_t \mid Y_{t-1})}{\partial w_i} = \frac{B\!\left( p + 1/a,\; 1/(a \bar{\eta}) - 1/a \right)}{B\!\left( p,\; \frac{1}{a \bar{\eta}} \right)} \frac{\partial}{\partial w_i} \left( \pi_{t|t-1}\, b_{t|t-1} \right) = \frac{B\!\left( p + 1/a,\; 1/(a \bar{\eta}) - 1/a \right)}{B\!\left( p,\; \frac{1}{a \bar{\eta}} \right)} \left( \frac{\partial \pi_{t|t-1}}{\partial w_i}\, b_{t|t-1} + \pi_{t|t-1}\, \frac{\partial b_{t|t-1}}{\partial w_i} \right).$$
From this, the sign of $\beta_i$ determines whether the conditional mean increases or decreases. Since $b_{t|t-1}$ and $\pi_{t|t-1}$ are positive, if $\beta_i > 0$, both derivatives within the last parentheses are positive, and when $\beta_i < 0$, both are negative.
In summary, the following cases can be observed:
(I) If $\beta_i > 0$, then the scale, $b_{t|t-1}$, increases, increasing the dispersion for $y_t > 0$ and making higher values more likely. Additionally, the probability of rain, $\pi_{t|t-1}$, increases, and the conditional mean, $E(y_t \mid Y_{t-1})$, also increases.
(II) If $\beta_i < 0$, then the scale, $b_{t|t-1}$, decreases, concentrating the density of the distribution around zero for $y_t > 0$ and making higher values less likely. Additionally, the probability of rain, $\pi_{t|t-1}$, decreases, and the conditional mean, $E(y_t \mid Y_{t-1})$, also decreases.
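Cases (I) and (II) can be checked numerically with finite differences. In the Python sketch below, the three coefficients are the estimates quoted in the text for humidity, pressure, and temperature; every other value (the regressor levels, $\omega$, the dynamic component, the shape parameters, $\delta_0$, and $\delta_1 > 0$) is a hypothetical placeholder, and the function names are illustrative.

```python
import math

def model_quantities(w, beta, omega, lam_dyn, a, p, eta_bar, d0, d1):
    """Return (scale b, rain probability pi, conditional mean) for regressors w."""
    lam = omega + sum(wi * bi for wi, bi in zip(w, beta)) + lam_dyn   # Equation (12)
    b = math.exp(lam)
    pi = 1.0 / (1.0 + math.exp(-(d0 + d1 * lam)))                     # Equation (1)
    q = 1.0 / (a * eta_bar)
    ratio = math.exp(math.lgamma(p + 1 / a) + math.lgamma(q - 1 / a)
                     - math.lgamma(p) - math.lgamma(q))
    return b, pi, pi * b * ratio                                      # Equation (13)

def marginal_effects(w, beta, h=1e-5, **params):
    """Forward finite differences of (b, pi, mean) with respect to each w_i."""
    base = model_quantities(w, beta, **params)
    effects = []
    for i in range(len(w)):
        bumped_w = list(w)
        bumped_w[i] += h
        bumped = model_quantities(bumped_w, beta, **params)
        effects.append(tuple((x1 - x0) / h for x0, x1 in zip(base, bumped)))
    return effects

beta = (0.0457, -0.0010, -0.0884)   # humidity, pressure, temperature (from the text)
w = (85.0, 1015.0, 15.0)            # hypothetical regressor values (%, hPa, deg C)
params = dict(omega=-1.0, lam_dyn=0.0, a=1.5, p=1.0,
              eta_bar=0.3, d0=0.0, d1=0.5)                            # hypothetical
effects = marginal_effects(w, beta, **params)
for name, b_i, eff in zip(("humidity", "pressure", "temperature"), beta, effects):
    print(name, b_i, ["+" if e > 0 else "-" for e in eff])
```

For each variable, the three signs printed (scale, probability of rain, conditional mean) should match the sign of the corresponding coefficient, which is the content of cases (I) and (II) when $\delta_1 > 0$.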
Since $\beta_1 = 0.0457 > 0$ (see Table 2) is the coefficient associated with humidity and, according to Llasat Botija et al. [19], humidity promotes the formation of clouds that will lead to rainfall, this aligns with case (I).
On the other hand, $\beta_2 = -0.0010 < 0$ (see Table 2), which is the coefficient associated with pressure and corresponds to case (II). According to García de Pedraza [20], when pressure increases, the skies are clearer, a condition that does not favor rainfall. Conversely, if the pressure decreases, it is a condition that favors cloud formation and rain. Therefore, the results align with meteorological science.
Meanwhile, $\beta_3 = -0.0884 < 0$ (see Table 2), which is the coefficient associated with temperature and also corresponds to case (II). Regarding this, Trenberth et al. [21] mention that during the warm season over continents, higher temperatures are associated with lower precipitation amounts, while in colder seasons, lower temperatures indicate higher precipitation. Thus, an inverse relationship between temperature and rainfall would exist, but it is more related to the time of year. It is worth noting that this relationship is complex, and exceptions can occur. For example, higher temperatures could also promote cloud formation through water evaporation.

5. Conclusions

A model has been extended for data originating from a zero-augmented distribution; that is, it is intended for time series with a high proportion of zeros. The non-zero data are assumed to come from a continuous distribution with positive support, following the GAS modeling guidelines of Harvey and Ito [3], since such data cannot be handled with classical models such as those of Box and Jenkins [22]. The model has been applied in meteorology to precipitation data from a city in Chile. It has been successfully fitted and responds well to diagnostic tests.
When evaluating the predictive capability of the proposed model, the Brier probability score yielded a value of 0.24, categorizing the model as suitable, in contrast to the values presented by Harvey and Ito [3], which were around 0.72 and 0.75. The low Brier probability score of the proposed model suggests that incorporating explanatory variables can improve the fit of this type of model.
Regarding the explanatory variables, the estimated coefficients were interpreted, and the relationships between precipitation and humidity, pressure, and temperature implied by the proposed model generally align with what is established in meteorology.
It is interesting to analyze how these models behave when the distribution associated with the non-zero part is not necessarily positive and/or continuous. For example, a discrete distribution could be used to analyze time series of the number of COVID-19 fatalities, where there is a high frequency of zeros. This could help determine whether the prediction quality remains consistent under such circumstances.
For applications in meteorology, it would be worthwhile to explore how to incorporate explanatory variables related to wind. Such variables are known in the literature as 'circular data' and require a distinctive treatment. This aspect has been studied in works by Harvey et al. [23] and Fisher and Lee [24].
It could also be important to analyze the scenario where a specific distribution cannot be identified for the non-zero part. In this case, it could be relevant to explore how to incorporate a more advanced system into these models, such as kernel density estimations for time series. These have also been studied in works such as those by Harvey and Oryshchenko [25] and Harvey [4], where non-parametric statistical tools are used to create distribution-free time-series models.

Author Contributions

Conceptualization, S.C.-E. and P.V.-G.; methodology, F.N.-M.; software, P.V.-G.; validation, S.C.-E., F.N.-M. and P.V.-G.; formal analysis, F.N.-M.; investigation, S.C.-E. and P.V.-G.; resources, F.N.-M.; data curation, P.V.-G.; writing—original draft preparation, F.N.-M.; writing—review and editing, S.C.-E.; visualization, P.V.-G.; supervision, F.N.-M.; project administration, S.C.-E. All authors have read and agreed to the published version of the manuscript.

Funding

Novoa-Muñoz’s research was fully supported by project 2220529 IF/R and Fondo de Apoyo a la Participación a Eventos Internacionales (FAPEI) at Universidad del Bío-Bío, Chile. Contreras-Espinoza was supported by Fondo de Apoyo a la Participación a Eventos Internacionales at Universidad del Bío-Bío, Chile.

Data Availability Statement

The data were obtained from the Meteorological Directorate of Chile, “http://www.meteochile.gob.cl/” (accessed on 22 November 2022), specifically from “https://climatologia.meteochile.gob.cl/” (accessed on 22 November 2022). The records belong to the El Tepual Puerto Montt Ap Station (code 410005).

Acknowledgments

The authors would like to thank the anonymous reviewers and the editor of this journal for their valuable time and their careful comments and suggestions because of which the quality of this paper has been improved.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
GAS: Generalized Autoregressive Score
DCS: Dynamic Conditional Score
BPS: Brier Probability Score
ECDF: Empirical Cumulative Distribution Function
PIT: Probability Integral Transform

References

  1. Creal, D.; Koopman, S.J.; Lucas, A. Generalized autoregressive score models with applications. J. Appl. Econom. 2013, 28, 777–795.
  2. Hautsch, N.; Malec, P.; Schienle, M. Capturing the zero: A new class of zero-augmented distributions and multiplicative error processes. J. Financ. Econom. 2014, 12, 89–121.
  3. Harvey, A.; Ito, R. Modeling time series when some observations are zero. J. Econom. 2020, 214, 33–45.
  4. Harvey, A.C. Dynamic Models for Volatility and Heavy Tails: With Applications to Financial and Economic Time Series; Cambridge University Press: New York, NY, USA, 2013; Volume 52.
  5. Blasques, F.; Koopman, S.J.; Łasak, K.; Lucas, A. In-sample confidence bands and out-of-sample forecast bands for time-varying parameters in observation-driven models. Int. J. Forecast. 2016, 32, 875–887.
  6. McDonald, J.B.; Xu, Y.J. A generalization of the beta distribution with applications. J. Econom. 1995, 66, 133–152.
  7. Kleiber, C.; Kotz, S. Statistical Size Distributions in Economics and Actuarial Sciences; John Wiley & Sons: Hoboken, NJ, USA, 2003.
  8. McDonald, J.B. Some generalized functions for the size distribution of income. In Modeling Income Distributions and Lorenz Curves; Springer: New York, NY, USA, 2008; pp. 37–55.
  9. Harvey, A.; Luati, A. Filtering with heavy tails. J. Am. Stat. Assoc. 2014, 109, 1112–1122.
  10. Diebold, F.X.; Gunther, T.A.; Tay, A.S. Evaluating Density Forecasts with Applications to Financial Risk Management. Int. Econ. Rev. 1998, 39, 863–883.
  11. Wilks, D.S. Statistical Methods in the Atmospheric Sciences; Academic Press: Cambridge, MA, USA, 2011; Volume 100.
  12. Brier, G.W. Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 1950, 78, 1–3.
  13. Salvador, J.A.F. Data Analysis Advances in Marine Science for Fisheries Management: Supervised Classification Applications. Ph.D. Thesis, Department of Computer Science and Artificial Intelligence of the University of the Basque Country, Leioa, Spain, 2011.
  14. Graf, M.; Nedyalkova, D. GB2: Generalized Beta Distribution of the Second Kind: Properties, Likelihood, Estimation. R Package Version 2.1.1. 2022. Available online: https://CRAN.R-project.org/package=GB2 (accessed on 22 November 2022).
  15. Henningsen, A.; Toomet, O. maxLik: A package for maximum likelihood estimation in R. Comput. Stat. 2011, 26, 443–458.
  16. Borchers, H. pracma: Practical Numerical Math Functions. R Package Version 2.4.2. 2022. Available online: https://CRAN.R-project.org/package=pracma (accessed on 22 November 2022).
  17. Mullen, K.; Ardia, D.; Gil, D.L.; Windover, D.; Cline, J. DEoptim: An R package for global optimization by differential evolution. J. Stat. Softw. 2011, 40, 1–26.
  18. Præstgaard, J.T. Permutation and Bootstrap Kolmogorov-Smirnov Tests for the Equality of Two Distributions. Scand. J. Stat. 1995, 22, 305–322.
  19. Llasat, B.M.D.C.; Llasat-Botija, M.; Ter, C.A. Con el agua al cuello. 2009. Available online: http://hdl.handle.net/2445/8727 (accessed on 1 November 2022).
  20. García de Pedraza, L. Adecuado uso del barómetro. 2002. Available online: http://hdl.handle.net/20.500.11765/12031 (accessed on 1 November 2022).
  21. Trenberth, K.E.; Jones, P.D.; Ambenje, P.; Bojariu, R.; Easterling, D.; Klein, T.A.; Parker, D.; Rahimzadeh, F.; Renwick, J.A.; Rusticucci, M.; et al. Observations. Surface and Atmospheric Climate Change; Cambridge University Press: Cambridge, UK, 2007; Chapter 3.
  22. Box, G.E.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; John Wiley & Sons: Hoboken, NJ, USA, 2015.
  23. Harvey, A.; Hurn, S.; Thiele, S. Modeling Directional (Circular) Time Series; Apollo—University of Cambridge Repository: Cambridge, UK, 2019.
  24. Fisher, N.I.; Lee, A. Time series analysis of circular data. J. R. Stat. Soc. Ser. B (Methodol.) 1994, 56, 327–339.
  25. Harvey, A.; Oryshchenko, V. Kernel density estimation for time series data. Int. J. Forecast. 2012, 28, 3–14.
Figure 1. Generalized beta distribution of the second kind density function for $b = 1$, $p = 0.5$, $q = 2$.
Figure 2. Precipitation in Puerto Montt, Chile.
Figure 3. Humidity in Puerto Montt, Chile.
Figure 4. Pressure in Puerto Montt, Chile.
Figure 5. Temperature in Puerto Montt, Chile.
Figure 6. Fitted model for rainfall in Puerto Montt, Chile.
Figure 7. Probability integral transform (PIT) against the empirical cumulative distribution function (ECDF).
Figure 8. Prediction of the scale parameter $\exp(\lambda_{T+\ell|T})$ with $\ell \in \{1, 2, \ldots, n\}$ ($T = 3653$, $n = 244$) and the confidence band for each time within the observation period.
Figure 9. Prediction of the conditional mean $E(y_{T+\ell} \mid Y_T)$ with $\ell \in \{1, 2, \ldots, n\}$ ($T = 3653$, $n = 244$) and its corresponding confidence band for each time within the observation period.
Figure 10. Prediction of $(1 - \pi_{T+\ell|T})$ with $\ell \in \{1, 2, \ldots, n\}$ ($T = 3653$, $n = 244$) and the confidence band for each time within the observation period.
Figure 11. Probability density function $p(y_{T+1} \mid y_1, \ldots, y_T, \lambda_{T+1|T}; \theta)$ with $T = 3653$.
Figure 12. Simulations of $y_{T+1} \sim p(y_{T+1} \mid y_1, \ldots, y_T, \lambda_{T+1|T}; \theta)$ with $T = 3653$.
Figure 13. Simulations of the score $u_{T+1}$ with $T = 3653$.
Figure 14. Scale simulations $\exp(\lambda_{T+2|T})$ with $T = 3653$.
Figure 15. Simulations of $\pi_{T+2|T}$ with $T = 3653$.
Figure 16. Simulations of $1 - \pi_{T+2|T}$ with $T = 3653$.
Figure 17. Simulations of $E(y_{T+2} \mid Y_T)$ with $T = 3653$.
Table 1. Descriptive statistics.

| Statistic | Precipitation | Humidity | Pressure | Temperature |
| --- | --- | --- | --- | --- |
| Mean | 3.9453 | 96.4646 | 1009.912 | 14.7442 |
| Standard Deviation | 7.2298 | 2.8045 | 5.1589 | 4.2798 |
| Minimum | 0 | 71 | 987.3 | 4 |
| Maximum | 69 | 100 | 1028 | 34.1 |
| Asymmetry | 2.9967 | −1.3453 | −0.0555 | 0.4220 |
| Kurtosis | 15.6330 | 9.0157 | 3.6322 | 2.9062 |
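
The statistics in Table 1 can be illustrated with a minimal Python sketch. It assumes moment-based estimators and a non-excess kurtosis (equal to 3 for a normal distribution), which is consistent with the temperature value close to 3 but is not stated in the table; the function name `describe` and the toy series are hypothetical and are not the Puerto Montt data.

```python
import math

def describe(values):
    """Mean, sample standard deviation, range, asymmetry and non-excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    # Central moments with divisor n (an assumption; other conventions exist).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "sd": math.sqrt(m2 * n / (n - 1)),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "asymmetry": m3 / m2 ** 1.5,        # positive for a long right tail
        "kurtosis": m4 / m2 ** 2,           # 3 for a normal distribution
    }

# Made-up daily rainfall with many zeros, for illustration only.
toy_rain = [0.0, 0.0, 0.0, 1.2, 0.0, 5.4, 0.0, 20.1, 0.0, 0.3]
stats = describe(toy_rain)
print(stats)
```

A series dominated by zeros with occasional large values, like the toy one above, yields a positive asymmetry, the same qualitative pattern as the precipitation column.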
Table 2. Estimated parameters of the model.

| Parameters | Estimation |
| --- | --- |
| $\omega$ | 0.0647 (1.0330) |
| $\phi$ | 0.3356 *** (0.0603) |
| $\kappa$ | 0.2418 *** (0.0323) |
| $\delta_0$ | −4.3049 *** (1.0917) |
| $\delta_1$ | 2.1178 *** (0.1693) |
| $a$ | 0.9785 *** (0.1001) |
| $p$ | 1.0032 *** (0.1384) |
| $\bar{\eta}$ | 0.4585 *** (0.1114) |
| $\beta_1$ | 0.0457 *** (0.0085) |
| $\beta_2$ | −0.0010 *** (0.0003) |
| $\beta_3$ | −0.0884 *** (0.0071) |

*** $p < 0.01$.
Table 3. Kolmogorov–Smirnov test results.

| Kolmogorov–Smirnov Test | p-value |
| --- | --- |
| KS | 0.0518 |
| Bootstrap KS | 0.0510 |
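
As a rough illustration of the kind of check summarized in Table 3, the following Python sketch runs a one-sample Kolmogorov-Smirnov test of PIT values against the Uniform(0, 1) distribution using the asymptotic Kolmogorov series. It is a sketch under stated assumptions, not the authors' procedure: the function name `ks_uniform` and the input values are hypothetical, and the bootstrap variant reported in the table is not reproduced.

```python
import math

def ks_uniform(pit):
    """One-sample Kolmogorov-Smirnov test of values in [0, 1] against Uniform(0, 1)."""
    u = sorted(pit)
    n = len(u)
    # D = sup |ECDF(x) - x|, attained just before or at an order statistic.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(u))
    t = n * d * d
    if t < 0.04:
        # sqrt(n) * D below 0.2: the p-value is 1 up to negligible error,
        # and the alternating series below converges slowly there.
        return d, 1.0
    # Asymptotic p-value: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 n D^2).
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# Hypothetical inputs: an evenly spaced grid (close to uniform) and a distorted copy.
grid = [(i + 0.5) / 200 for i in range(200)]
d_grid, p_grid = ks_uniform(grid)
d_skew, p_skew = ks_uniform([x ** 3 for x in grid])
print(d_grid, p_grid)
print(d_skew, p_skew)
```

Under this convention, a p-value above the chosen significance level means uniformity of the PIT values is not rejected; both p-values in Table 3 lie just above 0.05.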
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Vidal-Gutiérrez, P.; Contreras-Espinoza, S.; Novoa-Muñoz, F. Modeling High-Frequency Zeros in Time Series with Generalized Autoregressive Score Models with Explanatory Variables: An Application to Precipitation. Axioms 2024, 13, 15. https://doi.org/10.3390/axioms13010015
