
Bayesian Subset Selection of Seasonal Autoregressive Models

1 Department of Statistics, Mathematics, and Insurance, Faculty of Commerce, Menoufia University, Menoufia 32952, Egypt
2 Department of Statistics and Operation Research, Faculty of Science, King Saud University, P.O. Box 2455, Riyadh 11451, Saudi Arabia
3 Department of Mathematics, University of Caen-Normandie, 14000 Caen, France
* Authors to whom correspondence should be addressed.
Mathematics 2023, 11(13), 2878; https://doi.org/10.3390/math11132878
Submission received: 23 May 2023 / Revised: 21 June 2023 / Accepted: 25 June 2023 / Published: 27 June 2023
(This article belongs to the Special Issue Bayesian Inference, Prediction and Model Selection)

Abstract

Seasonal autoregressive (SAR) models have many applications in different fields, such as economics and finance. It is well known in the literature that these models are nonlinear in their coefficients and that their Bayesian analysis is complicated. Accordingly, choosing the best subset of these models is a challenging task. Therefore, in this paper, we tackled this problem by introducing a Bayesian method for selecting the most promising subset of the SAR models. In particular, we introduced latent variables for the SAR model lags, assumed model errors to be normally distributed, and adopted and modified the stochastic search variable selection (SSVS) procedure for the SAR models. Thus, we derived the full conditional posterior distributions of the SAR model parameters in closed form, and we then introduced the Gibbs sampler, along with SSVS, to present an efficient algorithm for the Bayesian subset selection of the SAR models. In this work, we employed mixture–normal, inverse gamma, and Bernoulli priors for the SAR model coefficients, variance, and latent variables, respectively. Moreover, we introduced a simulation study and a real-world application to evaluate the accuracy of the proposed algorithm.

1. Introduction

Seasonal autoregressive (SAR) time series models are widely used in different fields such as economics and finance to fit and forecast time series that are characterized by seasonality [1]. As is well known, time series modeling starts with the specification of the model order, followed by estimation, diagnostic checks, and forecasting [2]. Therefore, the model specification phase is important, since all other modeling phases depend on its accuracy. In most real-world applications, the number of time series lags incorporated in a proposed model for an underlying time series is unknown; this number is known as the model order, and it needs to be specified or estimated based on the given time series data and its assumed probability distribution [3].
Although the time series model order is usually unknown, a maximum value of this order can be assumed, and different methods can be introduced to select the best subset and obtain a parsimonious model. Traditional subset selection methods include information criteria, such as the Akaike information criterion (AIC) [4] and the corrected AIC (AICc) [5]. These traditional selection methods use exhaustive searches based on parameter estimation and order selection. Many researchers have used these methods for subset selection in autoregressive (AR) time series models, including McClave [6], Penm and Terrell [7], Thanoon [8], and Sarkar and Kanjilal [9].
However, it is very computationally expensive to apply these traditional methods to complicated models with high orders, such as the SAR models and other time series models with multiple seasonalities [10,11]. Accordingly, other subset selection procedures based on Markov chain Monte Carlo (MCMC) methods have been proposed for reducing the computational cost and efficiently selecting the best subset of time series models. Some researchers have adopted the stochastic search variable selection (SSVS) procedure, which was introduced by George and McCulloch [12], for selecting the best subset of linear regression models to be applied to the subset selection of time series models. Chen [13] proposed the Gibbs sampler, along with the SSVS procedure, to select the best subset of AR models. This work has been extended by different researchers to other time series models. So et al. [14] extended it for the subset selection of AR models with exogenous variables, and Chen et al. [15] extended it for the subset selection of threshold ARMA models.
On the other hand, the Bayesian analysis of the SAR model is complicated, because the likelihood function is a nonlinear function of the SAR model coefficients, and, accordingly, its posterior density is analytically intractable. Different approaches have been introduced to facilitate this analysis, including Markov chain Monte Carlo (MCMC)-based approximations [16]. Barnett et al. [17,18] applied MCMC methods to estimate SAR and ARMA models based on sampling functions for partial autocorrelations. Ismail [19,20] applied the Gibbs sampler to introduce the Bayesian analysis of SAR and SMA models. Ismail and Amin [16] applied the Gibbs sampler to present the Bayesian estimation of seasonal ARMA (SARMA) models, and, recently, Amin [1] used the same approach to introduce the Bayesian prediction of SARMA models. For modeling time series with multiple seasonalities, Amin and Ismail [21] and Amin [22,23] applied the Gibbs sampler to introduce the Bayesian estimation of double SAR, SMA, and SARMA models. Recently, Amin [24,25] applied the Gibbs sampler to introduce the Bayesian analysis of double and triple SAR models.
From a real-world application perspective, it is crucial to introduce an efficient Bayesian method for selecting the best subset of the SAR models, with the aim to obtain a parsimonious SAR model. However, most of the existing work has focused only on the Bayesian estimation and the prediction of SAR processes, and none of them has tried to tackle this problem of selecting the best subset of the SAR models. Therefore, in this paper, we aim to fill this gap and enrich real-world applications of the SAR models by introducing a Bayesian method for subset selection of these models based on modifying the SSVS procedure and adopting the Gibbs sampler. In particular, we first introduce latent variables for the nonseasonal and seasonal SAR model lags, assume that the SAR model errors are normally distributed, and employ mixture–normal, inverse gamma, and Bernoulli priors for the SAR model coefficients, variance, and latent variables, respectively. We then derive full conditional posteriors of the SAR model parameters in the closed form, and we apply the Gibbs sampler, along with SSVS, to develop an efficient algorithm for the best subset selection of the SAR models. In order to evaluate the performance of the proposed algorithm, we conduct a simulation study and a real-world application.
The remainder of this paper is organized as follows: We summarize the SAR models and related Bayesian concepts in Section 2. We then introduce the posterior analysis and proposed algorithm for the Bayesian best subset selection of the SAR models in Section 3. In Section 4, we present and discuss simulations and the real-world application of the proposed Bayesian subset selection algorithm. Finally, we conclude this work in Section 5.

2. Seasonal Autoregressive (SAR) Models and Bayesian Concepts

A mean-deleted time series $\{z_t\}$ is generated by a seasonal autoregressive model of orders $p_1$ and $p_2$, designated SAR$(p_1)(p_2)_s$, if it satisfies [2]:

$$\phi_1(B)\,\phi_2(B^s)\, z_t = \varepsilon_t, \tag{1}$$

where the SAR errors $\{\varepsilon_t\}$ are assumed to follow a normal distribution with mean zero and variance $\sigma^2$, $s$ is the seasonal period, and $B$ is the backshift operator defined by $B^r z_t = z_{t-r}$. $\phi_1(B) = 1 - \phi_{11}B - \phi_{12}B^2 - \dots - \phi_{1p_1}B^{p_1}$ and $\phi_2(B^s) = 1 - \phi_{21}B^s - \phi_{22}B^{2s} - \dots - \phi_{2p_2}B^{p_2 s}$ are the nonseasonal and seasonal autoregressive polynomials of orders $p_1$ and $p_2$, respectively. Also, $\phi_1 = (\phi_{11}, \phi_{12}, \dots, \phi_{1p_1})^T$ and $\phi_2 = (\phi_{21}, \phi_{22}, \dots, \phi_{2p_2})^T$ are the nonseasonal and seasonal autoregressive coefficient vectors, respectively.
We can expand the SAR model (1) and write it as follows:
$$z_t = \sum_{i=1}^{p_1} \phi_{1i}\, z_{t-i} + \sum_{j=1}^{p_2} \phi_{2j}\, z_{t-js} - \sum_{i=1}^{p_1} \sum_{j=1}^{p_2} \phi_{1i}\phi_{2j}\, z_{t-i-js} + \varepsilon_t. \tag{2}$$
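Expansion (2) gives a direct recursion for generating data from a SAR process under the normal-error assumption. The following minimal sketch (our own illustrative helper, not the paper's code; the function name and warm-up scheme are our assumptions) simulates a SAR$(p_1)(p_2)_s$ series from that recursion:

```python
import numpy as np

def simulate_sar(phi1, phi2, s, n, sigma=1.0, burn=200, seed=0):
    """Simulate n observations from the expanded SAR recursion (2).

    phi1, phi2 : nonseasonal and seasonal AR coefficients
    s          : seasonal period; burn : discarded warm-up draws
    """
    rng = np.random.default_rng(seed)
    p1, p2 = len(phi1), len(phi2)
    maxlag = p1 + p2 * s                      # longest lag in the expansion
    z = np.zeros(n + burn + maxlag)
    eps = rng.normal(0.0, sigma, size=z.shape)
    for t in range(maxlag, len(z)):
        val = eps[t]
        for i in range(1, p1 + 1):
            val += phi1[i - 1] * z[t - i]
        for j in range(1, p2 + 1):
            val += phi2[j - 1] * z[t - j * s]
            for i in range(1, p1 + 1):        # cross-product terms enter negatively
                val -= phi1[i - 1] * phi2[j - 1] * z[t - i - j * s]
        z[t] = val
    return z[burn + maxlag:]
```

For example, `simulate_sar([0.5], [0.6], s=12, n=300)` yields a stationary series with pronounced autocorrelation at the seasonal lag 12.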
With the objective of simplifying the Bayesian analysis, we write the SAR model (2) in matrix notation as follows:

$$y = X\beta + E, \tag{3}$$

where $y = (z_1, z_2, \dots, z_n)^T$, $E = (\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n)^T$, and $X$ is an $n \times m$ matrix, with $m = (1 + p_1)(1 + p_2) - 1$, whose $t$th row is defined as:

$$X_t = \left(z_{t-1}, \dots, z_{t-p_1},\; z_{t-s}, z_{t-s-1}, \dots, z_{t-s-p_1},\; \dots,\; z_{t-p_2 s}, z_{t-p_2 s-1}, \dots, z_{t-p_2 s - p_1}\right), \tag{4}$$

and $\beta$ is the coefficient vector, which is defined as:

$$\beta = \left(\phi_{11}, \dots, \phi_{1p_1},\; \phi_{21}, -\phi_{11}\phi_{21}, \dots, -\phi_{1p_1}\phi_{21},\; \dots,\; \phi_{2p_2}, -\phi_{11}\phi_{2p_2}, \dots, -\phi_{1p_1}\phi_{2p_2}\right)^T. \tag{5}$$
The products of the coefficients, $\phi_{1i}\phi_{2j}$, are part of the SAR model, and, accordingly, this model is a nonlinear function of $\phi_1$ and $\phi_2$, thereby leading to complications in its Bayesian analysis.
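To make the vectorization in (3)–(5) concrete, here is a small sketch (our own illustration; the helper names are ours) that builds the coefficient vector $\beta$ and a design row $X_t$, with the cross-product coefficients carrying the minus sign from expansion (2):

```python
import numpy as np

def sar_beta(phi1, phi2):
    """Build beta of (5), length (1+p1)(1+p2)-1: nonseasonal coefficients,
    then for each seasonal lag j: phi2[j] followed by -phi1[i]*phi2[j]."""
    beta = list(phi1)
    for p2j in phi2:
        beta.append(p2j)
        beta.extend(-p1i * p2j for p1i in phi1)
    return np.array(beta)

def sar_design_row(z, t, p1, p2, s):
    """Row X_t of (4): unsigned lagged values matching the beta layout."""
    row = [z[t - i] for i in range(1, p1 + 1)]
    for j in range(1, p2 + 1):
        row.append(z[t - j * s])
        row.extend(z[t - i - j * s] for i in range(1, p1 + 1))
    return np.array(row)
```

With these two helpers, $X_t \cdot \beta$ reproduces exactly the right-hand side of (2) minus the error term.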
As we mentioned in the introduction, one of the main challenges of time series analysis is specifying the SAR model orders $p_1$ and $p_2$, since these values are unknown and depend on the stochastic structure of the time series under study. Thus, we assume that the maximum value of the SAR model order is known, and we adopt and modify the SSVS procedure for the Bayesian best subset selection of the SAR model. Accordingly, we first introduce a latent variable for each coefficient of the SAR model, i.e., $\delta_{ij}$ for $\phi_{ij}$, $j = 1, \dots, p_i$, $i = 1, 2$, where $\delta_{ij}$ equals one when the time series lag corresponding to $\phi_{ij}$ is selected, and it equals zero otherwise. We then represent the prior distribution of each SAR model coefficient $\phi_{ij}$ using a mixture–normal distribution that is defined as:

$$\phi_{ij} \mid \delta_{ij} \sim (1 - \delta_{ij})\, N(0, \tau_{ij}^2) + \delta_{ij}\, N(0, c_{ij}^2 \tau_{ij}^2), \tag{6}$$

and

$$p(\delta_{ij} = 1) = 1 - p(\delta_{ij} = 0) = P_{ij}, \quad \forall\, i, j. \tag{7}$$
Thus, the prior distribution of $\phi_1$ and $\phi_2$ can be presented as a multivariate normal distribution that is defined as follows:

$$\phi_i \mid \delta_i \sim N_{p_i}(0,\, M_{\delta_i} W_i M_{\delta_i}), \quad i = 1, 2, \tag{8}$$

where $\delta_i = (\delta_{i1}, \dots, \delta_{ip_i})^T$, $M_{\delta_i}$ is a diagonal matrix, i.e., $M_{\delta_i} = \mathrm{diag}[b_{i1}\tau_{i1}, \dots, b_{ip_i}\tau_{ip_i}]$ with $b_{ij} = 1$ if $\delta_{ij} = 0$ and $b_{ij} = c_{ij}$ if $\delta_{ij} = 1$, and $W_i$ is a prior correlation matrix. Here, $M_{\delta_i}$ is specified as a scaling of the prior covariance matrix so that the prior specification in (6) is satisfied. In particular, we choose $\tau_{i1}, \dots, \tau_{ip_i}$ to be small, so the $\phi_{ij}$s associated with $\delta_{ij} = 0$ are likely to be close to zero. In addition, we choose $c_{i1}, \dots, c_{ip_i}$ to be large enough to make $c_{ij}^2 \tau_{ij}^2$ much greater than $\tau_{ij}^2$; thus, the $\phi_{ij}$s associated with $\delta_{ij} = 1$ tend to have high variation and to lie away from zero, and the corresponding time series lags are selected into the best subset SAR model. For more information about setting these constants in the SSVS procedure, we refer to George and McCulloch [12].
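The scaling matrix $M_{\delta_i}$ can be assembled directly from $\delta_i$, $\tau$, and $c$. A minimal sketch (the function name is our own):

```python
import numpy as np

def prior_cov(delta, tau, c, W):
    """Prior covariance M_delta W M_delta from (8).

    b_j = 1 when delta_j = 0 (spike: variance tau_j^2) and b_j = c_j when
    delta_j = 1 (slab: variance c_j^2 tau_j^2)."""
    b = np.where(np.asarray(delta) == 1, c, 1.0)
    M = np.diag(b * np.asarray(tau))
    return M @ W @ M
```

With $W_i = I$, the diagonal entries of this matrix are exactly the two mixture variances in (6).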
As the prior distribution of $\sigma^2$, we specify an inverse gamma distribution that is defined as:

$$\sigma^2 \mid \delta_i \sim IG\left(\frac{\nu}{2}, \frac{\nu\lambda}{2}\right). \tag{9}$$
Using the marginal distribution of the $\delta_{ij}$s given in (7), we can write the joint prior distribution of $\delta_i$ as:

$$\zeta(\delta_i) = \prod_{j=1}^{p_i} P_{ij}^{\delta_{ij}} (1 - P_{ij})^{1 - \delta_{ij}}, \quad i = 1, 2. \tag{10}$$

It is worth noting that the uniform prior of $\delta_i$, i.e., $\zeta(\delta_i) = 2^{-p_i}$, $i = 1, 2$, is a special case, since under it each time series lag has the same probability of being selected.
The likelihood function of the SAR model (3) with normally distributed errors can be presented as:

$$L(\phi_1, \phi_2, \sigma^2 \mid y) \propto (\sigma^2)^{-\frac{n}{2}} \exp\left\{-\frac{1}{2\sigma^2} E^T E\right\} = (\sigma^2)^{-\frac{n}{2}} \exp\left\{-\frac{1}{2\sigma^2} (y - X\beta)^T (y - X\beta)\right\}. \tag{11}$$
We can obtain the joint posterior distribution of the SAR models by simply multiplying the prior distributions given in (8)–(10) by the likelihood function given in (11), which can be presented as:

$$\zeta(\phi_1, \phi_2, \sigma^2, \delta_1, \delta_2 \mid y) \propto \prod_{i=1}^{2} \prod_{j=1}^{p_i} P_{ij}^{\delta_{ij}} (1 - P_{ij})^{1 - \delta_{ij}}\; (\sigma^2)^{-\left(\frac{n+\nu}{2} + 1\right)} \times \exp\left\{-\frac{1}{2\sigma^2}\left[\nu\lambda + \sigma^2 \sum_{i=1}^{2} \phi_i^T (M_{\delta_i} W_i M_{\delta_i})^{-1} \phi_i + (y - X\beta)^T (y - X\beta)\right]\right\}. \tag{12}$$

3. Bayesian Subset Selection of the SAR Models

The introduction of the Bayesian subset selection of the SAR models is based on posterior analysis; however, the joint posterior (12) of the SAR model parameters is a nonlinear function of the coefficients ϕ 1 and ϕ 2 . Accordingly, this joint posterior is analytically intractable, and, thus, the marginal posterior of each parameter cannot be analytically derived in closed forms. One of the solutions that can be applied to tackle this problem and ease the Bayesian subset selection of the SAR models is introducing the Gibbs sampler to approximate the required marginal posteriors of these models. In this section, we first derive the conditional posterior distributions of the SAR model as a requirement to employ the Gibbs sampler, and we then introduce our proposed algorithm for the Bayesian subset selection of the SAR models.

3.1. Conditional Posteriors of the SAR Models

As we introduced in our previous work [1,10], deriving the conditional posterior of each SAR model parameter can be simply done from the joint posterior (12) by first combining related terms to that parameter and then integrating out all unrelated terms. Following the same approach, we derive here the full conditional posteriors of the SAR parameters, i.e., ϕ 1 , ϕ 2 , σ 2 , δ 1 , and δ 2 , that are required to adopt the Gibbs sampler, along with SSVS for selecting the best subset of the SAR models.
We rewrite the SAR model (3) as follows:

$$y = Z_{\phi_1} \phi_1 + L_{\phi_1} \phi_2 + E. \tag{13}$$

We substitute (13) into the joint posterior (12), complete the square in the exponent with respect to $\phi_1$, and integrate out all unrelated terms to obtain the conditional posterior of $\phi_1$ given $\phi_2$, $\sigma^2$, $\delta_1$, $\delta_2$, and $y$ to be $N_{p_1}(\mu_{\phi_1}, V_{\phi_1})$, where:

$$\mu_{\phi_1} = \left[\sigma^{-2} Z_{\phi_1}^T Z_{\phi_1} + (M_{\delta_1} W_1 M_{\delta_1})^{-1}\right]^{-1} \sigma^{-2} Z_{\phi_1}^T (y - L_{\phi_1} \phi_2), \quad V_{\phi_1} = \left[\sigma^{-2} Z_{\phi_1}^T Z_{\phi_1} + (M_{\delta_1} W_1 M_{\delta_1})^{-1}\right]^{-1}, \tag{14}$$

where $Z_{\phi_1}$ is an $n \times p_1$ matrix with $(t, i)$ element $Z_{\phi_1, ti} = z_{t-i} - \sum_{j=1}^{p_2} \phi_{2j} z_{t-i-js}$, and $L_{\phi_1}$ is an $n \times p_2$ matrix with $t$th row $L_{\phi_1, t} = (z_{t-s}, z_{t-2s}, \dots, z_{t-p_2 s})$.
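Conditional (14) is a standard multivariate normal and can be sampled directly. A sketch under our own naming, assuming the prior covariance $M_{\delta_1} W_1 M_{\delta_1}$ is passed in precomputed:

```python
import numpy as np

def draw_phi(Z, target, sigma2, prior_cov, rng):
    """One draw of phi_1 from N(mu, V) in (14).

    Z         : the matrix Z_phi1
    target    : y - L_phi1 @ phi2 (the partial residual)
    prior_cov : M_delta1 W_1 M_delta1
    """
    precision = Z.T @ Z / sigma2 + np.linalg.inv(prior_cov)
    V = np.linalg.inv(precision)
    mu = V @ (Z.T @ target) / sigma2
    return rng.multivariate_normal(mu, V)
```

By symmetry, the same routine draws $\phi_2$ from the corresponding conditional with $Z_{\phi_2}$ and $y - L_{\phi_2}\phi_1$.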
In the same way, we rewrite the SAR model (3) as follows:

$$y = Z_{\phi_2} \phi_2 + L_{\phi_2} \phi_1 + E. \tag{15}$$

We substitute (15) into the joint posterior (12), complete the square in the exponent with respect to $\phi_2$, and integrate out all unrelated terms to obtain the conditional posterior of $\phi_2$ given $\phi_1$, $\sigma^2$, $\delta_1$, $\delta_2$, and $y$ to be $N_{p_2}(\mu_{\phi_2}, V_{\phi_2})$, where:

$$\mu_{\phi_2} = \left[\sigma^{-2} Z_{\phi_2}^T Z_{\phi_2} + (M_{\delta_2} W_2 M_{\delta_2})^{-1}\right]^{-1} \sigma^{-2} Z_{\phi_2}^T (y - L_{\phi_2} \phi_1), \quad V_{\phi_2} = \left[\sigma^{-2} Z_{\phi_2}^T Z_{\phi_2} + (M_{\delta_2} W_2 M_{\delta_2})^{-1}\right]^{-1}, \tag{16}$$

where $Z_{\phi_2}$ is an $n \times p_2$ matrix with $(t, j)$ element $Z_{\phi_2, tj} = z_{t-js} - \sum_{i=1}^{p_1} \phi_{1i} z_{t-i-js}$, and $L_{\phi_2}$ is an $n \times p_1$ matrix with $t$th row $L_{\phi_2, t} = (z_{t-1}, z_{t-2}, \dots, z_{t-p_1})$.
Moreover, from the joint posterior (12), we easily derive the conditional posterior of $\sigma^2$ given $\phi_1$, $\phi_2$, $\delta_1$, $\delta_2$, and $y$ to be an inverse gamma $IG\left(\frac{n+\nu}{2}, \frac{\lambda^*}{2}\right)$, where $\lambda^* = \nu\lambda + (y - X\beta)^T (y - X\beta)$.
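The inverse gamma draw reduces to the reciprocal of a gamma variate. A minimal sketch (our own helper, not the paper's code):

```python
import numpy as np

def draw_sigma2(y, X, beta, nu, lam, rng):
    """One draw of sigma^2 from IG((n + nu)/2, (nu*lam + SSR)/2).

    Uses the fact that if G ~ Gamma(a, 1) then b / G ~ IG(a, b)."""
    resid = y - X @ beta
    shape = (len(y) + nu) / 2.0
    scale = (nu * lam + resid @ resid) / 2.0
    return scale / rng.gamma(shape)
```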
Now, in order to simplify the derivation of the conditional posteriors of the latent variables, we first simplify the notation. In particular, for the $j$th latent variable in the $i$th vector, we denote the remaining latent variables in that vector by $\delta_{i(j)}$, $j = 1, \dots, p_i$, $i = 1, 2$. In addition, for the $i$th latent variable vector, we denote the other latent variable vector by $\delta_{(i)}$. Accordingly, we derive the conditional posterior of each latent variable $\delta_{ij}$ given $\phi_1$, $\phi_2$, $\sigma^2$, $\delta_{i(j)}$, $\delta_{(i)}$, and $y$ to be a Bernoulli distribution with a probability defined as follows:

$$p(\delta_{ij} = 1 \mid \phi_1, \phi_2, \sigma^2, \delta_{i(j)}, \delta_{(i)}, y) = \frac{a_{ij}}{a_{ij} + b_{ij}}, \quad \forall\, i, j, \tag{17}$$

where $a_{ij} = P_{ij} \times \zeta(\phi_i \mid \phi_{(i)}, \sigma^2, \delta_{i(j)}, \delta_{(i)}, y, \delta_{ij} = 1)$, and $b_{ij} = (1 - P_{ij}) \times \zeta(\phi_i \mid \phi_{(i)}, \sigma^2, \delta_{i(j)}, \delta_{(i)}, y, \delta_{ij} = 0)$.
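Since the two weights $a_{ij}$ and $b_{ij}$ are prior normal densities evaluated at the current $\phi_i$, the Bernoulli probability can be computed on the log scale for numerical stability. A self-contained sketch (our own helper names; the underscore-prefixed functions are assumptions, not from the paper):

```python
import numpy as np

def _prior_cov(delta, tau, c, W):
    """M_delta W M_delta, as in the prior (8)."""
    b = np.where(np.asarray(delta) == 1, c, 1.0)
    M = np.diag(b * np.asarray(tau))
    return M @ W @ M

def _mvn_logpdf(x, cov):
    """Log density of N(0, cov) at x."""
    k = len(x)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (k * np.log(2.0 * np.pi) + logdet + x @ np.linalg.solve(cov, x))

def delta_prob(phi, delta, j, P, tau, c, W):
    """P(delta_j = 1 | phi, remaining deltas) = a_j / (a_j + b_j) from (17)."""
    d1 = np.array(delta); d1[j] = 1
    d0 = np.array(delta); d0[j] = 0
    la = np.log(P[j]) + _mvn_logpdf(phi, _prior_cov(d1, tau, c, W))
    lb = np.log(1.0 - P[j]) + _mvn_logpdf(phi, _prior_cov(d0, tau, c, W))
    return 1.0 / (1.0 + np.exp(lb - la))
```

A coefficient far from zero relative to the spike scale $\tau$ pushes this probability toward one; a coefficient near zero pushes it toward $P_j$ scaled down by the spike/slab density ratio.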

3.2. Proposed Algorithm for Bayesian Subset Selection of the SAR Models

Based on the work of the previous subsection, the required conditional posteriors of the SAR parameters are available, and we are accordingly able to adopt the SSVS and Gibbs sampler to propose an algorithm for the Bayesian subset selection of the SAR models.
We can implement our proposed algorithm for the Bayesian subset selection of the SAR models in the following steps:
1. Set a maximum value for the SAR model order as $(p_1, p_2)$.
2. Apply the OLS method to estimate the SAR$(p_1)(p_2)_s$ model, and set these estimates as the initial values for the Gibbs sampler: $\{\phi_1^{(0)}, \phi_2^{(0)}, (\sigma^2)^{(0)}, \delta_1^{(0)}, \delta_2^{(0)}\}$.
3. Set the Gibbs sampler simulation design, which includes the number of simulations, the burn-in, and the thinning level.
4. Let $r$ denote the current simulation, and simulate from the conditional posteriors as follows:
   • $\delta_{1j}^{(r)} \sim \zeta\big(\delta_{1j} \mid y, \phi_1^{(r-1)}, \phi_2^{(r-1)}, (\sigma^2)^{(r-1)}, \delta_{1(j)}, \delta_2^{(r-1)}\big) = Bin\big(1, \frac{a_{1j}}{a_{1j}+b_{1j}}\big)$, $j = 1, \dots, p_1$.
   • $\delta_{2j}^{(r)} \sim \zeta\big(\delta_{2j} \mid y, \phi_1^{(r-1)}, \phi_2^{(r-1)}, (\sigma^2)^{(r-1)}, \delta_{2(j)}, \delta_1^{(r)}\big) = Bin\big(1, \frac{a_{2j}}{a_{2j}+b_{2j}}\big)$, $j = 1, \dots, p_2$.
   • $\phi_1^{(r)} \sim \zeta\big(\phi_1 \mid y, \phi_2^{(r-1)}, (\sigma^2)^{(r-1)}, \delta_1^{(r)}, \delta_2^{(r)}\big) = N(\mu_{\phi_1}, V_{\phi_1})$.
   • $\phi_2^{(r)} \sim \zeta\big(\phi_2 \mid y, \phi_1^{(r)}, (\sigma^2)^{(r-1)}, \delta_1^{(r)}, \delta_2^{(r)}\big) = N(\mu_{\phi_2}, V_{\phi_2})$.
   • $(\sigma^2)^{(r)} \sim \zeta\big(\sigma^2 \mid y, \phi_1^{(r)}, \phi_2^{(r)}, \delta_1^{(r)}, \delta_2^{(r)}\big) = IG\left(\frac{n+\nu}{2}, \frac{\nu\lambda + (y - X\beta)^T (y - X\beta)}{2}\right)$.
   In this $r$th simulation, the generated values together form the $r$th state of the Markov chain, i.e., $\{\phi_1^{(r)}, \phi_2^{(r)}, (\sigma^2)^{(r)}, \delta_1^{(r)}, \delta_2^{(r)}\}$.
5. Repeat step (4) until all the required Gibbs sampler simulations have been conducted.
6. Apply the burn-in and thinning processes to the simulated Markov chain, and monitor convergence using autocorrelations, the Raftery and Lewis diagnostics [26], and the Geweke diagnostics [27]. For more information about these convergence diagnostics, see LeSage [28] and Amin [1,10].
7. Once the convergence of the simulated Markov chain is confirmed, select the best SAR subset as the value of the latent variables with the highest frequency in the simulated Markov chain, and also (whenever it is needed) compute the Bayesian estimates of the SAR parameters directly as the sample averages of these simulation outputs.
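To give a feel for how steps 2–5 fit together, the following is a compact, runnable sketch of SSVS for the purely nonseasonal case ($p_2 = 0$), where model (2) is linear. It is our own simplification with $W_i = I$ and scalar hyperparameters, not the paper's implementation; the full algorithm alternates two such conditional blocks, one for $\phi_1$ and one for $\phi_2$:

```python
import numpy as np

def ssvs_ar_gibbs(z, p, n_iter=1500, burn=300, tau=0.05, c=20.0,
                  P=0.5, nu=3.0, lam=1.0, seed=0):
    """SSVS Gibbs sampler for an AR(p) subset (simplified: W = I).

    Returns the posterior inclusion frequency of each lag, estimated from
    the post-burn-in draws of the latent variables."""
    rng = np.random.default_rng(seed)
    n = len(z)
    X = np.column_stack([z[p - i - 1: n - i - 1] for i in range(p)])
    y = z[p:]
    phi = np.linalg.lstsq(X, y, rcond=None)[0]      # step 2: OLS initial values
    delta = np.ones(p, dtype=int)
    sigma2 = np.var(y - X @ phi)
    counts = np.zeros(p)
    for r in range(n_iter):
        # step 4a: delta_j | rest, via the log ratio of slab/spike densities
        for j in range(p):
            la = np.log(P) - 0.5 * (phi[j] / (c * tau)) ** 2 - np.log(c * tau)
            lb = np.log(1 - P) - 0.5 * (phi[j] / tau) ** 2 - np.log(tau)
            delta[j] = rng.random() < 1.0 / (1.0 + np.exp(lb - la))
        # step 4b: phi | rest, a conjugate multivariate normal draw
        prior_prec = np.diag(1.0 / np.where(delta == 1, (c * tau) ** 2, tau ** 2))
        V = np.linalg.inv(X.T @ X / sigma2 + prior_prec)
        phi = rng.multivariate_normal(V @ X.T @ y / sigma2, V)
        # step 4c: sigma^2 | rest, an inverse gamma draw
        resid = y - X @ phi
        sigma2 = (nu * lam + resid @ resid) / 2.0 / rng.gamma((len(y) + nu) / 2.0)
        if r >= burn:
            counts += delta                          # step 7: tally inclusions
    return counts / (n_iter - burn)
```

On data generated from an AR(2) with a fitted maximum order of four, the inclusion frequencies of lags 1 and 2 should dominate those of lags 3 and 4, mirroring how Table 2 is read in Section 4.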

4. Simulations and Real Application

In this section, we introduce a simulation study and a real-world application of the proposed Bayesian algorithm for selecting the best subset of the SAR models, wherein we aim to evaluate its accuracy and applicability.

4.1. Simulation Study

We performed simulations from four SAR models using a simulation design that is presented in Table 1. The parameters of these SAR models were selected to cover different seasonality patterns, without any bias to select specific models or parameters. In particular, the first two SAR models are examples of SAR(2)(1) 12 , the third SAR model is an example of SAR(1)(2) 12 , and the fourth SAR model is an example of SAR(2)(2) 12 .
We first generated 1,000 time series from each SAR model with different sizes, from 100 to 500 with an increment of 100, and we then applied the proposed Bayesian algorithm for selecting the best subset of the SAR model, as described in Section 3.2.
We set the Gibbs sampler simulation design as follows: the number of Gibbs sampler simulations was 11,000, the burn-in was 1000, and the thinning level was 10. In addition, we set the maximum order of the SAR model (2) to five, i.e., $p_i = 5$, $i = 1, 2$. Using the Gibbs sampler draws, we computed the frequency of each latent-variable configuration and took the best subset of the SAR model to be the configuration with the highest frequency. Also, whenever needed, we can easily compute Bayesian estimates of the SAR model parameters as the following summary statistics: the mean, the standard deviation, and the (2.5th, 97.5th) percentiles of the draws as a 95% credible interval.

We evaluated the accuracy of our proposed Bayesian algorithm for the best subset selection of the SAR models by simply computing the percentage of correctly selected best subset SAR models. In addition, for the sake of comparison, we used the traditional subset selection methods, including the AIC and AICc, to select the best subset of the simulated SAR models, and we also computed their percentages of correctly selected best subset SAR models.

We can illustrate in some detail how our proposed algorithm works for the best subset selection of the SAR models by presenting all the results of the Gibbs sampler draws for one time series of size n = 300 generated from Model I. For this generated time series, we display the Bayesian subset selection results in Table 2 and the estimation results in Table 3.
As can be seen from Table 2, for the nonseasonal latent variables δ 1 , the values ( 1 , 1 , 0 , 0 , 0 ) had the highest frequency among all the possible values, with a percentage of about 71.4%, and, for the seasonal latent variables δ 2 , the values ( 1 , 0 , 0 , 0 , 0 ) had the highest frequency, with a percentage of about 60%. Therefore, for this generated time series, the algorithm selected SAR(2)(1) 12 as the best subset, which was the same as the true SAR model used to generate the time series, which highlights the accuracy of our proposed algorithm for the subset selection of SAR models. On the other hand, even though the estimation of the SAR models was not our objective in this work, the Bayesian estimates of the SAR parameters presented in Table 3 were very close to their true values in the simulated SAR models.
All of these results were only based on the time series generated from Model I, and, in the following, we present and discuss all the simulation study results. Since our objective in this work was the Bayesian best subset selection of the SAR models, we only display the simulation results for the Bayesian subset selection of all the simulated SAR models, and the Bayesian estimation results are not of our interest here. In particular, in Table 4, we show the percentage of correctly selected best subset SAR models using our proposed Bayesian algorithm and the traditional subset selection methods, i.e., the AIC and AIC c .
From Table 4, we can state general conclusions:
  • First, for our proposed algorithm, the larger the size of the time series, the higher the percentage of correctly selected subset SAR models that were obtained, which implies that the proposed Bayesian subset selection is a consistent estimator of the best subset of the SAR models. However, this was not the case for the traditional subset selection methods, where the simulation results showed that they are inconsistent estimators.
  • Second, for small time series sizes, i.e., n = 100 , our proposed algorithm had comparable accuracy to those of the traditional subset selection methods. However, once the time series size became larger, our proposed algorithm had substantially higher accuracy than those of the traditional subset selection methods. For instance, when the time series size n = 300 , our proposed algorithm had a percentage of correctly selected subset SAR models of at least 93%, compared to 75% at most for the traditional subset selection methods.
  • Third, the accuracy of our proposed algorithm was almost the same across all the simulated SAR models, which indicates the robustness of our proposed algorithm against the different stochastic behaviors of time series exhibiting seasonal patterns.
In general, all these simulation results confirm that the traditional subset selection methods do not completely fail to select the best subset of SAR models, but they are inconsistent estimators, and their best subset selection was achieved mostly with a low accuracy. On the other hand, the proposed Bayesian algorithm for the subset selection of the SAR models was a consistent estimator with a high accuracy of best subset selection.

4.2. Real Application

In this subsection, we evaluate the applicability of our proposed Bayesian subset selection algorithm to real time series. We conducted the Bayesian subset selection of the SAR models with a real-world time series exhibiting seasonal patterns. The real time series that we considered in our application is the monthly Federal Reserve Board (FRB) production index, with data from January 1948 to December 1978. For more details about this time series, see, for example, Amin [1].

We present the FRB production index time series in Table 5 and visualize this real time series in Figure 1. As can be seen from Figure 1a, the FRB production index is nonstationary. We tried to stationarize the time series by applying the first (nonseasonal) difference, as visualized in Figure 1b, but the differenced time series was still not stationary in the seasonal component. Accordingly, we employed both nonseasonal and seasonal differences to stationarize it, as visualized in Figure 1c. Therefore, we applied our proposed Bayesian subset selection algorithm to the stationary differenced FRB production index, not the nonstationary raw data, with the same Gibbs sampler settings used in our simulation study.
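The double differencing applied here is $(1 - B)(1 - B^{12})z_t$; since the two operators commute, the order in which they are applied does not matter. A minimal sketch (our own helper):

```python
import numpy as np

def ns_and_seasonal_diff(z, s=12):
    """Apply the nonseasonal then the seasonal difference: (1-B)(1-B^s) z_t."""
    dz = np.diff(z)          # (1 - B) z_t
    return dz[s:] - dz[:-s]  # then (1 - B^s)
```

The result has $n - s - 1$ observations; a series with a linear trend plus a fixed seasonal pattern is reduced to (approximately) zero by this transformation, which is why it stationarizes the FRB index.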
We present the Bayesian subset selection results of the SAR models for the differenced FRB production index in Table 6, and we also display the estimation results in Table 7. As can be seen from Table 6, for the nonseasonal latent variables δ 1 , the values ( 1 , 0 , 0 , 0 , 0 ) had the highest frequency among all the possible values with a percentage of about 45.3%. For the seasonal latent variables δ 2 , the values ( 1 , 1 , 1 , 1 , 0 ) had the highest frequency with a percentage of about 37.5%, but another set of values ( 1 , 1 , 1 , 0 , 0 ) had a very similar frequency with a percentage of about 36%. Thus, in this case, we had to look at the estimation results to check the significance of the SAR model coefficients using the 95% credible interval to decide between these two subsets of the SAR model. As we can see from Table 7, for the nonseasonal AR coefficients, only the first coefficient, i.e., ϕ 11 , was significant, and all other nonseasonal AR coefficients were insignificant. On the other hand, for the seasonal AR coefficients, the first three coefficients, i.e., ϕ 21 , ϕ 22 , and ϕ 23 , were significant, which supported the selection of ( 1 , 1 , 1 , 0 , 0 ) , not ( 1 , 1 , 1 , 1 , 0 ) . Therefore, for the differenced FRB production index, the SAR(1)(3) 12 was selected as the best subset of the SAR model. For the sake of comparison, we also applied the AIC and AIC c to select the best subset, and the results show that both methods selected the same subset, SAR(2)(4) 12 , which was very close to our algorithm selection.

5. Conclusions

In this paper, we developed a Bayesian subset selection of the SAR models based on the SSVS procedure and the Gibbs sampler. By introducing latent variables for the nonseasonal and seasonal SAR model lags, we adopted and modified the SSVS procedure to select the best subset SAR model. We employed mixture–normal, inverse gamma, and Bernoulli priors for the SAR model coefficients, variance, and latent variables, respectively. By deriving the full conditional posteriors of the SAR model parameters in closed form, we introduced the Gibbs sampler along with SSVS to present an efficient algorithm for the subset selection of the SAR models. We performed a simulation study and a real-world application to evaluate the accuracy of the proposed algorithm; the results of the simulation study confirmed its accuracy, while the results of the real-world application showed its applicability to selecting the best subset of the SAR model for real time series with seasonality. As part of future work, we plan to extend this work to select the best subset of time series models with multiple seasonalities, as introduced in [11], and also to select the best subset of multivariate autoregressive models, i.e., vector autoregressive models.

Author Contributions

Methodology, A.A.A.; Validation, A.A.A.; Writing—original draft, A.A.A.; Writing—review & editing, W.E., Y.T. and C.C.; Supervision, C.C.; Project administration, W.E.; Funding acquisition, W.E. and Y.T. All authors have read and agreed to the published version of the manuscript.

Funding

The study was funded by the Researchers Supporting Project number (RSPD2023R488), King Saud University, Riyadh, Saudi Arabia.

Data Availability Statement

All the datasets used in this paper are available for download (https://aymanamin.rbind.io/publication/2023-sar_subsetselection/; accessed on 24 June 2023).

Acknowledgments

The authors are thankful to the editor and reviewers for their valuable suggestions that improved the paper. The study was funded by the Researchers Supporting Project number (RSPD2023R488), King Saud University, Riyadh, Saudi Arabia; and the authors thank King Saud University for the financial support.

Conflicts of Interest

The authors declare that they have no conflict of interest.

References

  1. Amin, A.A. Gibbs Sampling for Bayesian Prediction of SARMA Processes. Pak. J. Stat. Oper. Res. 2019, 15, 397–418. [Google Scholar] [CrossRef]
  2. Box, G.E.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; John Wiley & Sons: Hoboken, NJ, USA, 2015. [Google Scholar]
  3. Amin, A.A. Sensitivity to prior specification in Bayesian identification of autoregressive time series models. Pak. J. Stat. Oper. Res. 2017, 13, 699–713. [Google Scholar] [CrossRef] [Green Version]
  4. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control. 1974, 19, 716–723. [Google Scholar] [CrossRef]
  5. Hurvich, C.M.; Tsai, C.L. Regression and time series model selection in small samples. Biometrika 1989, 76, 297–307. [Google Scholar] [CrossRef]
  6. McClave, J.T. Estimating the order of autoregressive models: The max χ2 method. J. Am. Stat. Assoc. 1978, 73, 122–128. [Google Scholar]
  7. Penm, J.H.; Terrell, R. On the recursive fitting of subset autoregressions. J. Time Ser. Anal. 1982, 3, 43–59. [Google Scholar] [CrossRef]
  8. Thanoon, B. Subset threshold autoregression with applications. J. Time Ser. Anal. 1990, 11, 75–87. [Google Scholar] [CrossRef]
  9. Sarkar, A.; Kanjilal, P. On a method of identification of best subset model from full ar-model. Commun. Stat.-Theory Methods 1995, 24, 1551–1567. [Google Scholar] [CrossRef]
  10. Amin, A.A. Bayesian analysis of double seasonal autoregressive models. Sankhya B 2020, 82, 328–352. [Google Scholar] [CrossRef]
  11. Amin, A.A. Gibbs sampling for Bayesian estimation of triple seasonal autoregressive models. Commun. Stat.-Theory Methods 2022. [Google Scholar] [CrossRef]
  12. George, E.I.; McCulloch, R.E. Variable selection via Gibbs sampling. J. Am. Stat. Assoc. 1993, 88, 881–889. [Google Scholar] [CrossRef]
  13. Chen, C.W. Subset selection of autoregressive time series models. J. Forecast. 1999, 18, 505–516. [Google Scholar] [CrossRef]
  14. So, M.K.; Chen, C.W.; Liu, F.C. Best subset selection of autoregressive models with exogenous variables and generalized autoregressive conditional heteroscedasticity errors. J. R. Stat. Soc. Ser. (Appl. Stat.) 2006, 55, 201–224. [Google Scholar] [CrossRef]
  15. Chen, C.W.; Liu, F.C.; Gerlach, R. Bayesian subset selection for threshold autoregressive moving-average models. Comput. Stat. 2011, 26, 1–30. [Google Scholar] [CrossRef]
  16. Ismail, M.A.; Amin, A.A. Gibbs Sampling For SARMA Models. Pak. J. Stat. 2014, 30, 153–168. [Google Scholar]
  17. Barnett, G.; Kohn, R.; Sheather, S. Bayesian estimation of an autoregressive model using Markov chain Monte Carlo. J. Econom. 1996, 74, 237–254. [Google Scholar] [CrossRef]
  18. Barnett, G.; Kohn, R.; Sheather, S. Robust Bayesian Estimation Of Autoregressive–Moving-Average Models. J. Time Ser. Anal. 1997, 18, 11–28. [Google Scholar] [CrossRef]
  19. Ismail, M.A. Bayesian Analysis of Seasonal Autoregressive Models. J. Appl. Stat. Sci. 2003, 12, 123–136. [Google Scholar]
  20. Ismail, M.A. Bayesian Analysis of the Seasonal Moving Average Model: A Gibbs Sampling Approach. Jpn. J. Appl. Stat. 2003, 32, 61–75. [Google Scholar] [CrossRef]
  21. Amin, A.A.; Ismail, M.A. Gibbs Sampling for Double Seasonal Autoregressive Models. Commun. Stat. Appl. Methods 2015, 22, 557–573. [Google Scholar] [CrossRef] [Green Version]
  22. Amin, A.A. Bayesian Inference for Double Seasonal Moving Average Models: A Gibbs Sampling Approach. Pak. J. Stat. Oper. Res. 2017, 13, 483–499. [Google Scholar] [CrossRef] [Green Version]
  23. Amin, A.A. Gibbs Sampling for Double Seasonal ARMA Models. In Proceedings of the 29th Annual International Conference on Statistics and Computer Modeling in Human and Social Sciences, Cairo, Egypt, 28–30 March 2017. [Google Scholar]
  24. Amin, A.A. Full Bayesian analysis of double seasonal autoregressive models with real applications. J. Appl. Stat. 2023, 1–21. [Google Scholar] [CrossRef]
  25. Amin, A.A. Bayesian Inference of Triple Seasonal Autoregressive Models. Pak. J. Stat. Oper. Res. 2022, 18, 853–865. [Google Scholar] [CrossRef]
  26. Raftery, A.E.; Lewis, S.M. The number of iterations, convergence diagnostics and generic Metropolis algorithms. Pract. Markov Chain. Monte Carlo 1995, 7, 763–773. [Google Scholar]
  27. Geweke, J. Evaluating the accuracy of sampling-based approaches to the calculations of posterior moments. Bayesian Stat. 1992, 4, 641–649. [Google Scholar]
  28. LeSage, J.P. Applied Econometrics Using MATLAB; Technical Report; Department of Economics, University of Toronto: Toronto, ON, Canada, 1999. [Google Scholar]
Figure 1. Plots of FRB production index time series. (a) FRB production index. (b) Nonseasonal differenced FRB production index. (c) Nonseasonal and seasonal differenced FRB production index.
Table 1. Simulation design.

Model   ϕ11    ϕ12    ϕ21    ϕ22    s    σ²
I       0.5    0.3    0.4    -      12   1.0
II      1.5    −0.9   0.6    -      12   1.0
III     0.6    -      0.5    0.4    12   1.0
IV      1.5    −0.9   0.5    0.4    12   1.0
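The designs in Table 1 correspond to multiplicative subset SAR models with seasonal period s = 12. As an illustrative sketch (not the authors' code), the following simulates a series from Model I, assuming the multiplicative form (1 − ϕ11B − ϕ12B²)(1 − ϕ21B¹²)y_t = ε_t with Gaussian errors; the function name and the dict-based lag encoding are my own choices.

```python
import numpy as np

def simulate_sar(n, phi, Phi, s=12, sigma2=1.0, burn=300, seed=0):
    # Simulate from a multiplicative subset SAR model
    #   (1 - sum_i phi[i] B^i)(1 - sum_j Phi[j] B^(j*s)) y_t = e_t,
    # with e_t ~ N(0, sigma2); phi and Phi are {lag: coefficient} dicts.
    rng = np.random.default_rng(seed)
    coef = dict(phi)                      # coef[k] multiplies y_{t-k}
    for j, b in Phi.items():
        coef[j * s] = coef.get(j * s, 0.0) + b
        for i, a in phi.items():          # cross terms from expanding the product
            coef[i + j * s] = coef.get(i + j * s, 0.0) - a * b
    p = max(coef)
    e = rng.normal(0.0, np.sqrt(sigma2), size=n + burn)
    y = np.zeros(n + burn)
    for t in range(p, n + burn):
        y[t] = sum(c * y[t - k] for k, c in coef.items()) + e[t]
    return y[burn:]                       # drop the burn-in transient

# Model I of Table 1: phi_11 = 0.5, phi_12 = 0.3, phi_21 = 0.4
y = simulate_sar(500, phi={1: 0.5, 2: 0.3}, Phi={1: 0.4})
```

Expanding the two AR polynomials into a single one keeps the recursion a plain linear filter, so subset models (with gaps in the lags) need no special casing.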
Table 2. Bayesian subset selection results for one time series generated from Model I.

Non-Seasonal Latent Variables          Seasonal Latent Variables
δ1            Freq   Percent           δ2            Freq   Percent
(1,1,0,0,0)   714    71.4              (1,0,0,0,0)   589    58.9
(1,1,1,0,0)   100    10.0              (1,0,1,0,0)   114    11.4
(1,1,0,0,1)    90     9.0              (1,0,0,0,1)    90     9.0
(1,1,0,1,0)    83     8.3              (1,0,0,1,0)    87     8.7
(1,0,0,0,0)    47     4.7              (1,1,0,0,0)    87     8.7
(1,1,1,1,0)    16     1.6              (1,0,0,1,1)    26     2.6
(1,1,1,0,1)    15     1.5              (1,0,1,1,0)    20     2.0
(1,1,0,1,1)    14     1.4              (1,1,0,0,1)    19     1.9
(1,0,0,0,1)     6     0.6              (1,0,1,0,1)    18     1.8
(1,0,1,0,0)     5     0.5              (1,1,1,0,0)    17     1.7
(1,0,0,1,0)     4     0.4              (1,1,0,1,0)    17     1.7
(1,0,1,1,0)     2     0.2              (1,0,1,1,1)     5     0.5
(1,0,0,1,1)     2     0.2              (1,1,1,1,0)     4     0.4
(0,1,0,0,1)     1     0.1              (1,1,1,0,1)     4     0.4
(1,0,1,0,1)     1     0.1              (1,1,0,1,1)     3     0.3
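Tables of this kind are obtained by tallying the latent indicator vectors δ sampled across the Gibbs iterations and ranking the visited subsets by frequency. A minimal sketch (the function name and the toy draws are hypothetical, not from the paper):

```python
from collections import Counter

def tally_latent_draws(draws):
    # Summarize sampled latent indicator vectors (one tuple per Gibbs
    # iteration) into the frequency/percent layout of Table 2.
    counts = Counter(map(tuple, draws))
    total = len(draws)
    return [(delta, freq, 100.0 * freq / total)
            for delta, freq in counts.most_common()]

# Hypothetical draws for a 3-lag model; lag 1 is always retained here.
draws = [(1, 1, 0)] * 7 + [(1, 0, 0)] * 2 + [(1, 0, 1)]
rows = tally_latent_draws(draws)
for delta, freq, pct in rows:
    print(delta, freq, f"{pct:.1f}")
```

The most frequently visited δ vector is then reported as the selected best subset model.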
Table 3. Bayesian estimates results for one time series generated from Model I.

Parameter   Actual   μ        σ       L        U
ϕ11         0.5      0.432    0.063   0.303    0.556
ϕ12         0.3      0.287    0.072   0.129    0.417
ϕ13         0.0      0.000    0.053   −0.108   0.104
ϕ14         0.0      −0.003   0.050   −0.100   0.093
ϕ15         0.0      −0.018   0.046   −0.114   0.065
ϕ21         0.4      0.401    0.061   0.285    0.518
ϕ22         0.0      0.017    0.052   −0.078   0.119
ϕ23         0.0      −0.040   0.054   −0.155   0.063
ϕ24         0.0      −0.008   0.053   −0.112   0.096
ϕ25         0.0      −0.036   0.054   −0.139   0.066
σ²          1.0      1.071    0.086   0.906    1.244
μ and σ : posterior mean and standard deviation; L and U: lower and upper 95% credible interval limits.
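The μ, σ, L, and U columns are standard posterior summaries of the retained Gibbs draws for each parameter. A minimal sketch of how one row of Tables 3 and 7 is computed from a vector of sampled values (the function name is my own):

```python
import numpy as np

def posterior_summary(draws):
    # Posterior mean, standard deviation, and 95% credible limits
    # (2.5% and 97.5% posterior quantiles) from retained Gibbs draws.
    draws = np.asarray(draws, dtype=float)
    lo, hi = np.quantile(draws, [0.025, 0.975])
    return draws.mean(), draws.std(ddof=1), lo, hi

# Toy example with an artificial chain of 100 values.
m, sd, lo, hi = posterior_summary(np.arange(1.0, 101.0))
```

Quantile-based limits make no symmetry assumption about the posterior, unlike a mean ± 2σ interval.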
Table 4. Percentage of correctly selected best subset SAR models for the simulation study.

        Results for Model I          Results for Model II
n       SSVS   AIC    AICc          SSVS   AIC    AICc
100     46.5   52.9   59.3          83.8   59.0   70.8
200     83.8   68.1   74.9          96.7   61.9   67.5
300     93.4   67.7   72.7          98.7   61.1   64.6
400     98.3   69.0   73.0          99.6   58.7   61.6
500     99.3   67.9   71.0          99.9   59.7   61.5

        Results for Model III        Results for Model IV
n       SSVS   AIC    AICc          SSVS   AIC    AICc
100     68.8   66.5   70.9          55.3   47.8   50.6
200     95.5   72.3   78.2          90.1   59.1   67.9
300     98.6   69.4   74.4          94.5   58.5   63.0
400     99.8   70.5   74.8          96.8   60.9   62.9
500     99.9   69.1   71.8          98.3   61.1   64.9
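The AIC and AICc competitors in Table 4 penalize in-sample fit by the number k of estimated coefficients, with AICc adding the Hurvich–Tsai small-sample correction. The exact variant used in the paper is not restated here; one common Gaussian-likelihood form (up to an additive constant) can be sketched as:

```python
import numpy as np

def aic_aicc(resid, k):
    # AIC and bias-corrected AICc for a subset model with k estimated
    # coefficients, using the ML estimate of the error variance.
    n = len(resid)
    sigma2_hat = np.mean(np.square(resid))
    aic = n * np.log(sigma2_hat) + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)   # small-sample correction
    return aic, aicc

# Toy residual vector: unit residuals give sigma2_hat = 1, so log term = 0.
a, c = aic_aicc(np.ones(10), k=2)
```

The criterion is evaluated for every candidate subset, and the subset minimizing it is selected; the correction term matters most when n is small relative to k, which is consistent with AICc outperforming AIC at n = 100 in Table 4.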
Table 5. FRB production index time series.

Year    1      2      3      4      5      6      7      8      9      10     11     12
1948    40.6   41.1   40.5   40.1   40.4   41.2   39.3   41.6   42.3   43.2   41.8   40.5
1949    40.0   40.1   39.3   38.5   37.7   37.9   36.0   39.0   40.0   39.2   39.0   38.8
1950    39.8   40.3   41.6   42.6   43.0   44.7   43.4   48.3   48.7   50.0   48.3   48.2
1951    48.5   49.6   50.0   49.4   48.6   49.1   45.5   48.0   49.1   49.5   48.9   48.2
1952    49.0   50.1   50.4   49.1   48.3   48.4   44.9   50.6   53.4   54.5   54.5   53.6
1953    54.2   55.6   56.3   55.8   55.8   55.9   53.2   55.8   55.5   55.8   53.2   51.0
1954    51.3   52.0   51.8   51.2   51.3   51.9   48.8   51.3   52.3   53.5   53.6   53.4
1955    54.7   56.2   57.5   57.9   58.4   59.1   55.8   58.6   60.1   61.9   61.3   60.4
1956    60.5   61.1   61.2   61.6   60.7   61.0   55.2   60.4   62.6   63.8   62.6   62.0
1957    62.0   63.5   63.6   62.3   61.8   63.1   59.4   62.8   63.0   62.6   60.4   57.8
1958    57.0   56.4   55.8   54.7   55.0   57.6   59.4   58.7   60.7   61.3   62.1   60.8
1959    61.9   63.8   65.2   66.4   67.2   68.5   62.9   63.3   64.5   64.6   63.6   65.9
1960    67.8   68.2   68.0   67.5   67.3   67.5   63.4   65.6   66.6   66.9   64.5   61.7
1961    62.0   62.7   63.5   65.1   66.1   68.0   64.6   67.9   69.5   71.1   70.5   69.4
1962    68.9   71.0   72.1   72.4   72.4   73.4   69.7   71.9   74.7   75.0   73.6   71.8
1963    72.4   74.7   75.7   76.4   77.1   78.6   73.4   75.5   78.9   80.2   78.7   76.5
1964    77.5   79.8   80.2   81.5   81.9   83.2   78.3   81.3   84.6   83.9   84.5   83.4
1965    84.8   87.0   88.8   88.8   89.5   91.7   86.7   89.4   92.4   94.5   93.0   91.4
1966    92.7   95.3   97.3   97.1   97.8   100.0  93.8   97.2   101.7  102.8  100.0  97.5
1967    98.1   99.1   99.0   99.6   98.7   100.9  94.4   99.6   102.7  103.4  103.1  101.4
1968    101.8  104.5  105.6  104.9  106.5  109.3  102.5  105.5  109.6  110.1  109.6  106.2
1969    107.3  110.4  111.6  110.6  110.5  114.0  107.3  111.6  115.1  115.1  112.0  108.3
1970    106.5  109.1  109.4  108.8  108.6  110.8  104.5  108.0  110.4  108.0  105.1  104.1
1971    105.5  108.3  108.6  108.8  109.5  112.5  105.4  108.8  113.5  113.9  111.6  108.5
1972    111.5  115.6  116.8  118.7  118.4  121.8  114.2  120.5  125.5  126.8  125.2  121.8
1973    122.7  128.1  128.8  128.6  129.6  133.0  126.4  130.3  134.8  135.3  132.9  126.7
1974    126.3  129.8  130.8  129.9  131.7  135.3  127.3  131.4  135.5  133.1  125.5  114.9
1975    111.8  113.0  111.8  113.0  113.8  119.2  114.5  121.4  125.9  125.4  123.8  119.8
1976    122.2  128.3  128.6  128.7  130.0  133.2  126.5  131.7  134.3  133.8  132.1  128.3
1977    128.8  133.6  135.7  136.2  137.2  141.5  134.1  138.2  142.4  142.7  139.5  134.9
1978    134.8  139.6  141.4  144.2  144.2  148.8  141.9  146.9  152.0  152.6  149.7  145.0
Table 6. Bayesian subset selection results for the differenced FRB production index.

Nonseasonal Latent Variables           Seasonal Latent Variables
δ1            Freq   Percent           δ2            Freq   Percent
(1,0,0,0,0)   453    45.3              (1,1,1,1,0)   375    37.5
(1,0,0,1,0)   115    11.5              (1,1,1,0,0)   358    35.8
(1,1,0,0,0)   113    11.3              (1,1,0,0,0)   109    10.9
(1,0,0,0,1)   102    10.2              (1,1,1,0,1)    58     5.8
(1,0,1,0,0)    59     5.9              (1,1,1,1,1)    55     5.5
(1,0,0,1,1)    32     3.2              (1,1,0,1,0)    20     2.0
(1,1,0,0,1)    24     2.4              (1,1,0,0,1)    20     2.0
(1,1,1,0,0)    20     2.0              (1,1,0,1,1)     5     0.5
Table 7. Bayesian estimates results for the differenced FRB production index.

Parameter   μ        σ       L        U
ϕ11         0.309    0.065   0.178    0.430
ϕ12         0.073    0.057   −0.027   0.188
ϕ13         −0.006   0.051   −0.116   0.090
ϕ14         0.062    0.055   −0.043   0.179
ϕ15         −0.064   0.052   −0.177   0.031
ϕ21         −0.692   0.070   −0.827   −0.549
ϕ22         −0.614   0.089   −0.786   −0.441
ϕ23         −0.274   0.105   −0.475   −0.072
ϕ24         −0.132   0.092   −0.334   0.018
ϕ25         0.026    0.057   −0.089   0.135
σ²          1.387    0.108   1.196    1.609

Share and Cite

Amin, A.A.; Emam, W.; Tashkandy, Y.; Chesneau, C. Bayesian Subset Selection of Seasonal Autoregressive Models. Mathematics 2023, 11, 2878. https://doi.org/10.3390/math11132878
