Semiparametric Block Bootstrap Prediction Intervals for Parsimonious Autoregression

This paper investigates the research question of whether the principle of parsimony carries over into interval forecasting, and proposes new semiparametric prediction intervals that apply the block bootstrap to the first-order autoregression. The AR(1) model is parsimonious in which the error term may be serially correlated. Then, the block bootstrap is utilized to resample blocks of consecutive observations to account for the serial correlation. The Monte Carlo simulations illustrate that, in general, the proposed prediction intervals outperform the traditional bootstrap intervals based on nonparsimonious models.


Introduction
It is well known that a parsimonious model may produce superior out-of-sample point forecasts compared to a complex model with overfitting issue, see [1]. One objective of this paper is to examine whether the principle of parsimony (POP) can be extended to interval forecasts. Toward that end, this paper proposes a semiparametric block bootstrap prediction intervals (BBPI) based on a parsimonious first-order autoregression AR (1). By contrast, the standard or iid bootstrap prediction intervals developed by Thombs and Schucany [2] (called TS intervals thereafter) are based on a dynamically adequate AR(p), where p can be large.
A possibly overlooked fact is that there is inconsistency between the ways of obtaining point forecasts and interval forecasts, in terms of whether POP is applied. When the goal is the point forecast, the models selected by information criteria of AIC and BIC are typically parsimonious, but not necessarily adequate (see Enders Enders [3] for instance). However, POP is largely forgone by the classical Box-Jenkins prediction intervals and the TS intervals; both require serially uncorrelated error terms, and the chosen models can be very complicated.
This paper attempts to address that inconsistency. The key is to note that the essence of time series forecasting is to utilize the serial correlation, and there are multiple ways to do that. One way is to use a dynamically adequate AR(p) with serially uncorrelated errors, which is fully parametric. This paper instead employs a parsimonious AR(1) with possibly serially correlated errors. Our model is semiparametric since no specific function form is assumed for the error process. Our semiparametric approach of leaving some degree of serial correlation in the error term is similar to the famous Cochrane-Orcutt procedure of Cochrane and Orcutt [4].
We employ the AR(1) model in order to generate the bootstrap replicate. In particular, we are not interested in the autoregressive coefficient, and for our purposes, it becomes irrelevant that the OLS estimate may be inconsistent due to the autocorrelated error. Using the AR(1) has another advantage: the likelihood of multicollinearity is minimized, which can result in a more efficient estimate for the coefficient. On the other hand, we do want to make use of the correlation structure in the error, and that is fulfilled by using the block bootstrap. More explicitly, the block bootstrap redraws with replacement random blocks of consecutive residuals of the AR(1). The blocking is intended to preserve the time dependence structure.
Constructing the BBPI involves three steps. In step one, the AR(1) regression is estimated by ordinary least squares (OLS), and the residual is saved. In step two, a backward AR(1) regression is fitted, and random blocks of residuals are used to generate the bootstrap replicate. In step three the bootstrap replicate is used to run the AR(1) regression again and random blocks of residuals are used to compute the bootstrap out-of-sample forecast. After repeating steps two and three many times, the BBPI is determined by the percentiles of the empirical distribution of the bootstrap forecast (In the full-length version of the paper, which is available upon request, we discuss technical issues such as correcting the bias of autoregressive coefficients, selecting the block size, choosing between overlapping and non-overlapping blocks, and using the stationary bootstrap developed by Politis and Romano [5]).
We implement the Monte Carlo experiment that compares the average coverage rate of the BBPI to the TS intervals. There are two main findings. The first is that the BBPI dominates when the error term shows a strong serial correlation. The second is that the BBPI always outperforms the TS intervals for the one-step forecast. For a longer forecast horizon, the TS intervals may perform better. This second finding highlights a tradeoff between preserving correlation and adding variation when obtaining the bootstrap intervals. The block bootstrap achieves the former but sacrifices the latter.

Semiparametric Block Bootstrap Prediction Intervals
Let {y t } be a strictly stationary and weakly dependent time series with mean of zero. In practice, y t may represent the demeaned, differenced, detrended or deseasonalized series. At first, it is instructive to emphasize a fact: there are multiple ways to model a time series. For instance, suppose the data generating process (DGP) is an AR(2) with serially uncorrelated errors: where e t can be white noise or martingale difference. Then, we can always rewrite Equation (1) as an AR(1) with new error v t , and the new error follows an AR(1) process, so is serially correlated: where φ 1 = ρ + φ and φ 2 = −ρφ by construction. The point is, the exact form of the DGP does not matter. In this example, it can be AR(1) or AR (2). What matters is the serial correlation of y t , which can be captured by Equation (1), or Equation (2) along with Equation (3) equally well. This example indicates that it is plausible to obtain forecasts based on the parsimonious AR(1) model, as long as the serial correlation in v t has been accounted for, even if the "true" DGP is a general AR(p). There is concern that the estimated coefficient ofφ in Equation (2) will be inconsistent due to the autocorrelated error v t . However, this issue is largely irrelevant here because our focal point is forecasting y, not estimating the coefficient. One may use the generalized least squares method such as Cochrane-Orcutt estimation to mitigate the effect of serial correlation bias. Our Monte Carlo experiment shows that the proposed intervals perform well even without correcting the serial correlation bias.
Using the parsimonious model (2) has two benefits that are overlooked in the forecasting literature. First, notice that y t−1 is correlated with y t−2 . As a result, there is the issue of multicollinearity (correlated regressors) for Equation (1), but not Equation (2). The absence of multicollinearity can reduce the variance and improve the efficiency ofφ, which explains why a simple model can outperform a complicated model in terms of out-ofsample forecasting. Second, it is well known that the autoregressive coefficient estimated by OLS can be biased-see Shaman and Stine [15], for instance. As more coefficients need to be estimated in a complex AR model, its forecast can be less accurate than that of a parsimonious model.

Iterated Block Bootstrap Prediction Intervals
The goal is to find the prediction intervals for future values (y n+1 , y n+2 , . . . , y n+h ), where h is the maximum forecast horizon, after observing Ω = (y 1 , . . . , y n ). This paper focuses on the bootstrap prediction intervals because (i) they do not assume the distribution of y n+i conditional on Ω is normal, and (ii) the bootstrap intervals can automatically take into account the sampling variability of the estimated coefficients.
The TS intervals of Thombs and Schucany [2] are based on a "long" p-th order autoregression: The TS intervals assume that the error e t is serially uncorrelated, because the standard or iid bootstrap only works in the independent setting. This assumption of independent errors requires that the model (4) be dynamically adequate, i.e., a sufficient number of lagged values should be included. It is not uncommon that the chosen model can be complicated (e.g., for series with a long memory), which contradicts the principle of parsimony. Actually, the model (4) is just a finite-order approximation if the true DGP is an ARMA process with AR(∞) representation. In that case, the error e t is always serially correlated no matter how large p is. This extreme case implies that the assumption of serially uncorrelated errors can be too restrictive in practice.
This paper relaxes that independence assumption, and proposes the block bootstrap prediction intervals (BBPI) based on a "short" autoregression. Consider the AR(1), the simplest one: Most often, the error v t is serially correlated, so model (5) is inadequate. Nevertheless, the serial correlation in v t can be utilized to improve the forecast. Toward that end, the block bootstrap will later be applied to the residual whereφ 1 is the coefficient estimated by OLS. But first, any bootstrap prediction intervals should account for the sampling variability ofφ 1 . This is accomplished by running repeatedly the regression (5) using the bootstrap replicate, a pseudo time series. Following Thombs and Schucany [2] we generate the bootstrap replicate using the backward representation of the AR(1) model Note that the regressor is lead not lag. Denote the OLS estimate byθ 1 , and the residual byû t :û then one series of the bootstrap replicate (y * 1 , . . . , y * n ) is computed in a backward fashion as (starting with the last observation, then moving backward) By using the backward representation we can ensure the conditionality of AR forecasts on the last observed value y n . Put differently, all the bootstrap replicate series have the same last observation, y * n = y n . See Figure 1 of Thombs and Schucany [2] for an illustration of this conditionality.
In Equation (9), the randomness of the bootstrap replicate comes from the pseudo error termû * t , which is obtained by the block bootstrap as follows: 1.
Save the residual of the backward regressionû t given in Equation (8).

2.
Let b denote the block size (length). The first (random) block of residuals is where the index number i1 is a random draw from the discrete uniform distribution between 1 and n − b + 1. For instance, let b = 3 and suppose a random draw produces i1 = 20, then B 1 = (û 20 ,û 21 ,û 22 ). In this example the first block contains three consecutive residuals starting from the 20th observation. By redrawing the index number with replacement we can obtain the second block B 2 = (û i2 ,û i2+1 , . . . ,û i2+b−1 ), the third block B 3 = (û i3 ,û i3+1 , . . . ,û i3+b−1 ), and so on. We stack up these blocks until the length of the stacked series becomes n.û * t denotes the t-th observation of the stacked series.
Resampling blocks of residuals is intended to preserve the serial correlation of the error term in the parsimonious model. Generally speaking, the block bootstrap can be applied to any weakly dependent stationary series. Here it is applied to the residual of the short autoregression.
After generating the bootstrap replicate series using Equation (9), next, we refit the model (5) using the bootstrap replicate (y * 2 , . . . , y * n ). Denote the newly estimated coefficient (called bootstrap coefficient) byφ * 1 . Then, we can compute the iterated block bootstrap l-step forecastŷ * n+l asŷ * n = y n ,ŷ * n+l =φ * 1ŷ * n+l−1 +v * l , (l = 1, . . . , h) where the pseudo errorv * l is obtained by block bootstrapping the residual (6). For example, let h = 8, b = 4. Then two blocks of residuals (6) are randomly drawn, and they are (11) represents the l-th observation of the stacked series The ordering of B 1 and B 2 in the stacked series (12) does not matter. It is the ordering of the observations within each block that matters. That within-block ordering preserves the temporal structure.
Notice that the block bootstrap has been invoked twice: first it is applied toû t (8), then it is applied tov t (6). The first application adds randomness to the bootstrap replicate y * t ; whereas the second application randomizes the predicted valueŷ * n+l . To get the BBPI, we need to generate C series of the bootstrap replicate (9), use them to fit the model (5), and use Equation (11) to obtain a series of the iterated block bootstrap l-step forecasts where i is the index. The l-step iterated BBPI at the α nominal level are given by . Throughout this paper, we let α = 0.90. To avoid the discreteness problem, one may let C = 999, see Booth and Hall [16]. In this paper we use C = 1000 and find no qualitative difference.
Basically, we apply the percentile method of Efron and Tibshirani [17] to construct the BBPI. De Gooijer and Kumar [18] emphasize the percentile method performs well when the conditional distribution of the predicted values is unimodal. In preliminary simulation, we conduct the DIP test of Hartigan and Hartigan [19] and find that the distribution is indeed unimodal.

Direct Block Bootstrap Prediction Intervals
We call the BBPI (14) iterated because the forecast is computed in an iterative fashion: in Equation (11), the previous step forecastŷ * n+l−1 is used to compute the next stepŷ * n+l . Alternatively, we can use the bootstrap replicate (y * 1 , . . . , y * n ) to run a set of direct regressions using only one regressor. In total there are h direct regressions. More explicitly, the l-th direct regression uses y * t as the dependent variable and y * t−l as the independent variable. Denote the estimated direct coefficient byρ * l . The residual is computed aŝ Then, the direct bootstrap forecast is computed aŝ whereη * l is a random draw with replacement from the empirical distribution ofη t,l . The l-step direct BBPI at the α nominal level is given by There are other ways to obtain the direct prediction intervals. For example, the bootstrap replicate (y * 1 , . . . , y * n ) can be generated based on the backward form of direct regression. Ing [20] compares the mean-squared prediction errors of the iterated and direct point forecasts. In the next section, we will compare the iterated and direct BBPIs.

Error Distributions
This section compares the performances of various bootstrap prediction intervals using the Monte Carlo experiment. First, we investigate the distribution of error terms. Following Thombs and Schucany [2], the data generating process (DGP) is an AR (2): where φ 1 = 0.75, φ 2 = −0.5, t = 1, . . . , 55. The error u t follows an independently and identically distributed process. Three distributions are considered for u t : the standard normal distribution, the exponential distribution with mean of 0.5, and mixed normal distribution 0.9N(−1, 1) + 0.1N(9, 1). The exponential distribution is skewed; the mixed normal distribution is bimodal and skewed. All distributions are centered to have zero mean. We compare three bootstrap prediction intervals. The iterated block bootstrap prediction intervals (IBBPI) are based on the "short" AR(1) regression (5) and its backward form (7). The TS intervals of Thombs and Schucany [2] are based on the "long" AR(2) regression (18) and its backward form. Finally, the direct block bootstrap prediction intervals (DBBPI) are based on a series of first-order direct autoregressions. Each bootstrap prediction intervals are obtained from the empirical distribution of 1000 bootstrap forecasts. That is, we let C = 1000 in Equation (13) for the IBBPI, and so on. For the IBBPI and DBBPI, the block size b is 4. The TS intervals use the iid bootstrap, so b = 1.
The first 50 observations (n = 50) are used to fit the regression. Then, we evaluate whether the last five observations are inside the prediction intervals. That is, we focus on the out-of-sample forecasting. The main criterion for comparison is the average coverage rate (ACR): where 1( .) denotes the indicator function. The number of iteration is set as m = 20,000. The forecast horizon h ranges from 1 to 5. The nominal coverage α is 0.90. The intervals whose ACR is closest to 0.90 are deemed the best. Figure 1 plots the ACR against h, in which the ACRs of the IBBPI, TS intervals, and DBBPI are denoted by circle, square and star, respectively. In the leftmost graph, the error follows the standard normal distribution. It is shown that the ACR of the IBBPI is closest to the nominal coverage 0.90, followed by the TS intervals. The DBBPI have the worst performance. For instance, when h = 5, the IBBPI has ACR of 0.883, the TS intervals have ACR of 0.854, and the DBBPI has ACR of 0.829. The ranking remains largely unchanged for the exponential distribution (in the middle graph) and mixed normal distribution (in the rightmost graph). Overall, Figure 1 indicates that (i) the IBBPI has the best performance, and (ii) the DBBPI has the worst performance. Finding (ii) complements Ing [20], which shows that the iterated point forecast outperforms the direct point forecast. Finding (i) is new. By comparing the three graphs in Figure 1, we see no significant change in ACRs as the error distribution varies. This is expected because all intervals are bootstrap intervals that do not assume normality.

Autoregressive Coefficients
Now, we consider varying autoregressive coefficients in Equation (18): where t = 1, . . . , 55 and u t ∼ iidn(0, 1). The leftmost graph in Figure 2 looks similar to that in Figure 1 because the same DGP is used. In the middle graph we see no change in the ranking. The rightmost graph is interesting, where the sum of autoregressive coefficients is 1.2 − 0.2 = 1. Therefore, the series becomes nonstationary (having unit root). Obviously, nonstationarity causes distortion in the coverage rate, particularly when h is large. In light of this, we recommend applying the prediction intervals to the differenced data if unit roots are present. It is surprising to see the direct intervals are the best in the presence of nonstationarity, which may be explained by the fact that they are based on the direct regression (More simulations can be found in the full-length version of the paper where we examine the effects of sample sizes and sizes of blocks, and we compare block bootstrap intervals vs stationary bootstrap intervals, and overlapping vs non-overlapping blocks).

Principle of Parsimony
So far the DGP has been the AR(2). Next we use an ARMA(1,1) as the new DGP: where t = 1, . . . , 55 and u t ∼ iidn(0, 1). In theory, there is AR(∞) representation for this DGP. Thus, any AR(p) model is a finite-order approximation.
We verify the principle of parsimony (POP) in two ways. Figure 3 compares the iterated block bootstrap prediction intervals based on the AR(1) regression, to the TS intervals based on the AR(2) regression (TS2, denoted by diamond), the AR(3) regression (TS3, denoted by square) and the AR(4) regression (TS4, denoted by star). For the TS intervals, we do not check whether the residual is serially uncorrelated. That job is left to Figure 4. Figure 3 uses three sets of φ and θ. We see the block bootstrap intervals have the best performance in all cases. The performance of the TS deteriorates as the autoregression gets longer. This is the first evidence that POP may be applicable for interval forecasts.
The second evidence is presented in Figure 4, where the TS intervals are based on the autoregression whose order is determined by the Breusch-Godfrey test, which is appropriate since the regressors are lagged dependent variables (so the regressors are not strictly exogenous). This is how the model selection works. We start from the AR(1) regression. If the residual passes the Breusch-Godfrey test, then the AR(1) regression is chosen for constructing the TS intervals. Otherwise, we move to the AR(2) regression, apply the Breusch-Godfrey test again, and so on. In the end, the TS intervals are based on an adequate autoregression with serially uncorrelated errors.  In the leftmost graph of Figure 4, φ = 0.4, θ = 0.2. We see the BBPI outperforms the TS intervals when h equals 1 and 2; for higher h their ranking reverses. In the middle and rightmost graphs, more serial correlation is induced as θ rises from 0.2 to 0.6, and as φ rises from 0.4 to 0.9. In those two graphs, the BBPI dominates the TS intervals.
The fact that the BBPI fails to dominate the TS intervals in the leftmost graph indicates a tradeoff between preserving serial correlation and adding variation. Remember that the BBPI uses the block bootstrap that emphasizes preserving serial correlation. By contrast, the TS intervals use the iid bootstrap, which can generate more variation in the bootstrap replicate than the block bootstrap.
Keeping that in mind, then the leftmost graph makes sense. In that graph, θ is 0.2, close to zero. That means the ARMA(1,1) model is essentially an AR(1) model with weakly correlated errors. For such series preserving correlation becomes secondary.
Therefore, the TS intervals may perform better than the BBPI in the presence of weakly correlated errors. It is instructive to consider the limit, when the serial correlation becomes 0 and the error term becomes serially uncorrelated. Then, the block size should reduce to 1, and the block bootstrap degenerates to the iid bootstrap, which works best in the independent setting. Finally, from Figures 3 and 4 we notice that when h = 1, the BBPI always outperforms the TS intervals, no matter the serial correlation is weak or strong. This fact adds value to the BBPI for the short-horizon forecasts.

Conclusions
This paper proposes new prediction intervals by applying the block bootstrap to the first-order autoregression. The AR(1) model is parsimonious in which the error term can be serially correlated. Then, the block bootstrap is utilized to resample blocks of consecutive observations in order to maintain the time series structure of the error term. The forecasts can be obtained in an iterated manner, or by running direct regressions. The Monte Carlo experiment shows (1) there is evidence that the principle of parsimony can be extended to interval forecast; (2) there is a trade-off between preserving correlation and adding variation; (3) the proposed intervals have superior performance for one-step forecast.