Bootstrap Prediction Intervals of Temporal Disaggregation

In this article, we propose an interval estimation method to trace an unknown disaggregate series within certain bandwidths. First, we consider two model-based disaggregation methods called the GLS disaggregation and the ARIMA disaggregation. Then, we develop iterative steps to construct AR-sieve bootstrap prediction intervals for model-based temporal disaggregation. As an illustration, we analyze the quarterly total balances of U.S. international trade in goods and services between the first quarter of 1992 and the fourth quarter of 2020.


Introduction
Temporally aggregated data over calendar periods (e.g., weekly, monthly, quarterly, or annual) are widely used in various fields because aggregation is a simple and convenient way to summarize long sequential measurements and reduce their length. However, temporal aggregation inevitably causes a significant loss of information, as a time series of relatively high frequency is compressed into periodic totals of relatively low frequency (see [1][2][3]). Thus, restoring the original information when decompressing the totals remains a considerable challenge in time series analysis, and numerous solutions for temporal disaggregation have been proposed in the modern statistical and econometric literature.
Refs. [4][5][6][7][8] solved disaggregation problems by using additional information obtained from related indicators observed at the desired high frequency. Refs. [9,10] estimated unknown consecutive observations in a disaggregate series based on the model structure of a given aggregate series. In addition, as discussed in [11][12][13], most recent publications on disaggregation problems focus on point estimation approaches. In other words, there has as yet been no systematic examination of interval estimation for temporal disaggregation. The aim of this study is therefore to propose an interval estimation method and to broaden our understanding of temporal disaggregation.
Refs. [14][15][16] introduced an AR-sieve bootstrap procedure to construct nonparametric prediction intervals for linear time series models such as stationary and invertible autoregressive integrated moving-average (ARIMA) processes. AR-sieve bootstrap interval estimation provides a theoretical basis for solving the model-based disaggregation problems of [9,10]. This article is organized as follows. Section 2 presents two model-based disaggregation methods: (1) the generalized least squares (GLS) disaggregation and (2) the ARIMA disaggregation, developed by [9,10], respectively. In Section 3, we propose a modified procedure for finding AR-sieve bootstrap prediction intervals of temporal disaggregation. In Section 4, we analyze a real time series, U.S. international trade balances in goods and services from 1992 to 2020, using the proposed approach. Finally, some further remarks are given in Section 5.

The GLS Disaggregation
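Suppose that $x_t$ is a finite time series for $t = 1, 2, \ldots, n$, and the $d$th differenced series
$$u_t = (1 - B)^d x_t \qquad (1)$$
is covariance stationary with mean zero and autocovariance $\gamma_u(k) = E(u_t u_{t+k})$ for $k \in \mathbb{Z}$, where $B$ is the backshift operator and $\mathbb{Z}$ is the set of integers. The $m$-periodic temporal aggregate (or, simply, $m$-aggregate) series $X_T$ is defined as the nonoverlapping sum of the $m$ sequential observations $x_{m(T-1)+1}, x_{m(T-1)+2}, \ldots, x_{mT}$, i.e.,
$$X_T = \sum_{i=1}^{m} x_{m(T-1)+i}$$
for $T = 1, 2, \ldots, N$ and $n = mN$, and the $d$th differenced series of $X_T$ is given by
$$V_T = (1 - \mathcal{B})^d X_T = (1 + B + \cdots + B^{m-1})^{d+1} u_{mT}, \qquad (3)$$
where $\mathcal{B} = B^m$ is the backshift operator on the aggregate time index, i.e., $\mathcal{B} X_T = X_{T-1}$.
Equations (1) and (3) can be rewritten in matrix form as $u = \Delta_n x$ and $V = \Delta_N X$, where $x = (x_1, \ldots, x_n)'$, $X = (X_1, \ldots, X_N)'$, $u = (u_{d+1}, \ldots, u_n)'$, $V = (V_{d+1}, \ldots, V_N)'$, and $\Delta_k$ is the $(k - d) \times k$ matrix whose rows contain the coefficients of $(1 - B)^d$. In addition, $u$ and $V$ are linearly associated, i.e., $V = \Omega u$, where $\Omega$ is the $(N - d) \times (n - d)$ matrix whose rows contain the coefficients of $(1 + B + \cdots + B^{m-1})^{d+1}$. Then, the GLS disaggregation estimates of $x$ and $u$ are given by
$$\hat{x} = \left[\begin{array}{c} \Delta_n \\ \hline I_d \otimes 1_m' \;\big|\; 0_{d \times (n - md)} \end{array}\right]^{-1} \left[\begin{array}{c} \Sigma_u \Omega' \Sigma_V^{-1} \Delta_N \\ \hline I_d \;\big|\; 0_{d \times (N - d)} \end{array}\right] X \qquad (6)$$
and
$$\hat{u} = \Sigma_u \Omega' \Sigma_V^{-1} V, \qquad (7)$$
respectively, where $\Sigma_u$ and $\Sigma_V = \Omega \Sigma_u \Omega'$ are the covariance matrices of $u_t$ and $V_T$, $0_{i \times j}$ is the zero matrix of dimension $i \times j$, $1_m = (1, \ldots, 1)'$ is the all-ones vector of size $m$, and $I_d$ is the identity matrix of dimension $d \times d$. The symbol $\otimes$ denotes the Kronecker product, and the vertical and horizontal lines indicate partitioned matrices in (6). (For a more detailed discussion, see [9].) However, the solutions of (6) and (7) cannot be directly derived unless the covariance matrix $\Sigma_u$ is known. In the following subsection, a model-based solution for the unknown covariance matrix $\Sigma_u$ is discussed.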
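The matrix relations above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes NumPy, a toy setting with $m = 3$, $d = 1$, and $N = 8$, and a known IMA(1, 1) disaggregate model with hypothetical values $\theta = 0.5$ and $\sigma_e^2 = 1$. The helper names `diff_matrix`, `omega_matrix`, and `ma1_cov` are illustrative only.

```python
import numpy as np
from math import comb


def diff_matrix(k, d):
    """(k-d) x k matrix whose rows hold the coefficients of (1 - B)^d."""
    coef = [(-1) ** j * comb(d, j) for j in range(d + 1)]
    D = np.zeros((k - d, k))
    for r in range(k - d):
        for j, c in enumerate(coef):
            D[r, r + d - j] = c  # weight on the j-th lag of the (r+d+1)-th value
    return D


def omega_matrix(N, m, d):
    """(N-d) x (mN-d) matrix Omega such that V = Omega @ u."""
    c = np.ones(1)
    for _ in range(d + 1):
        c = np.convolve(c, np.ones(m))  # coefficients of (1+B+...+B^(m-1))^(d+1)
    Om = np.zeros((N - d, m * N - d))
    for r in range(N - d):
        T = r + d + 1  # 1-based aggregate time index
        for j, cj in enumerate(c):
            Om[r, m * T - j - (d + 1)] = cj  # weight on u_{mT-j}; u starts at t = d+1
    return Om


def ma1_cov(size, theta, sigma2):
    """Covariance matrix of an MA(1) series u_t = e_t - theta * e_{t-1}."""
    lag = np.abs(np.subtract.outer(np.arange(size), np.arange(size)))
    return np.where(lag == 0, (1 + theta ** 2) * sigma2,
                    np.where(lag == 1, -theta * sigma2, 0.0))


# Hypothetical toy setting (assumed values, not taken from the article).
m, d, N = 3, 1, 8
n = m * N
theta, sigma2 = 0.5, 1.0
rng = np.random.default_rng(0)
e = rng.normal(scale=np.sqrt(sigma2), size=n + 1)
x = np.cumsum(e[1:] - theta * e[:-1])   # "true" disaggregate series, unknown in practice
X = x.reshape(N, m).sum(axis=1)         # nonoverlapping m-period totals

Dn, DN, Om = diff_matrix(n, d), diff_matrix(N, d), omega_matrix(N, m, d)
u, V = Dn @ x, DN @ X                   # u = Delta_n x and V = Delta_N X

Su = ma1_cov(n - d, theta, sigma2)
SV = Om @ Su @ Om.T                     # Sigma_V = Omega Sigma_u Omega'
G = Su @ Om.T @ np.linalg.inv(SV)       # Sigma_u Omega' Sigma_V^{-1}
u_hat = G @ V                           # GLS estimate of u, as in (7)

sel = np.hstack([np.eye(d), np.zeros((d, N - d))])    # (I_d | 0)
lhs = np.vstack([Dn, np.kron(sel, np.ones((1, m)))])  # partitioned left matrix in (6)
rhs = np.vstack([G @ DN, sel])                        # partitioned right matrix in (6)
x_hat = np.linalg.solve(lhs, rhs @ X)                 # GLS estimate of x

print(np.allclose(V, Om @ u))
print(np.allclose(x_hat.reshape(N, m).sum(axis=1), X))
```

Both checks should hold: the first confirms $V = \Omega u$, and the second confirms that the disaggregate estimates add back up to the observed totals, which is the constraint that the partitioned form of (6) enforces.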
Ref. [10] pointed out that (9) is not necessarily a one-to-one mapping and solved the disaggregation problem under some reasonable assumptions on ARIMA models. Suppose that an ARIMA($p$, $d$, $q$) model of $x_t$,
$$\left(1 - \sum_{i=1}^{p} \phi_i B^i\right)(1 - B)^d x_t = \left(1 - \sum_{j=1}^{q} \theta_j B^j\right) e_t, \qquad (10)$$
is a disaggregation solution for the ARIMA($p$, $d$, $r$) model (11) of $X_T$, with $r \le p + d + 1$, where $e_t$ and $E_T$ are Gaussian white noise with mean zero and constant variances $\sigma_e^2$ and $\sigma_E^2$, respectively. It is also assumed that the AR and MA polynomials of (10), $1 - \sum_{i=1}^{p} \phi_i B^i$ and $1 - \sum_{j=1}^{q} \theta_j B^j$, share no common factors. The same condition of no common factors is assumed for the model polynomials in (11). Under the condition of no hidden periodicity for aggregation order $m$ (see [18], pp. 512-513), the model parameters $\phi_i$ and $\theta_j$ in (10) can be solved as follows [10]:
1. The AR part:
• When $m$ is odd and the AR polynomial of the aggregate model is factorized into linear factors, the disaggregate AR parameters $\phi_i$ are determined from the $m$th roots of the corresponding aggregate AR factors (see [10]).
• Otherwise, a disaggregate model cannot be uniquely determined.

2. The MA part: Ref. [19] showed that the $m$-aggregate model for ARIMA($p$, $d$, $q$) with $q \le p + d + 1$ is ARIMA($p$, $d$, $r$) with $r \le p + d + 1$. Thus, it is reasonable to assume that the maximum MA order of the disaggregate model is $q = p + d + 1$.
Consider $k = p + d + 1$ in (9). Because $\gamma_V(j)$ for $j > d + 1$ depends only on $\gamma_u(i)$ for $i > d + 1$, the aggregation transformation matrix $A$ in (9) can be partitioned as
$$A = \begin{pmatrix} P_{11} & P_{12} \\ 0 & P_{22} \end{pmatrix},$$
where $P_{11}$ is a square matrix of size $d + 2$ such that $\Gamma_V(d + 1) = P_{11} \Gamma_u(d + 1)$, $P_{12}$ is a block matrix of dimension $(d + 2) \times ((l + 1) - (d + 2))$, and $P_{22}$ is the remaining lower-right block. Using the approach given in Theorem 1 of [10], Equation (9) is rewritten as (15). In other words, if the matrix in (15) is nonsingular, we obtain the solution (16). As discussed in [10], the inverse matrix in (16) exists under the condition of no hidden periodicity for order $m$. Then, we can derive the disaggregate MA parameters $\theta_j$ from the autocovariances of $u_t$. Two special cases of the ARIMA disaggregation are illustrated below.
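The autocovariance mapping between $u_t$ and $V_T$ can be illustrated for the simplest case. The sketch below is an assumption-laden illustration rather than the general construction of [10]: it takes $u_t$ to be a pure MA($q$) series, so that $\gamma_u(i) = 0$ for $i > q$, and builds the reduced matrix that maps $(\gamma_u(0), \ldots, \gamma_u(q))$ to $(\gamma_V(0), \ldots, \gamma_V(K))$ from the coefficients of $(1 + B + \cdots + B^{m-1})^{d+1}$. The function names are hypothetical. For $m = 3$, $d = 1$, $q = 1$, inverting the matrix gives the weights $11/81$, $-32/81$, $-4/81$, and $19/81$ used in Section 4.

```python
from fractions import Fraction


def ones_poly_power(m, power):
    """Coefficients of (1 + B + ... + B^(m-1))^power, lowest order first."""
    c = [1]
    for _ in range(power):
        nxt = [0] * (len(c) + m - 1)
        for i, ci in enumerate(c):
            for j in range(m):
                nxt[i + j] += ci
        c = nxt
    return c


def aggregation_matrix(m, d, q, K):
    """Rows k = 0..K, columns i = 0..q: gamma_V(k) = sum_i A[k][i] * gamma_u(i).

    Assumes u_t is MA(q), so autocovariances beyond lag q vanish.
    """
    c = ones_poly_power(m, d + 1)
    A = [[0] * (q + 1) for _ in range(K + 1)]
    for k in range(K + 1):
        for a, ca in enumerate(c):
            for b, cb in enumerate(c):
                lag = abs(m * k + a - b)  # lag between u_{mT-a} and u_{m(T+k)-b}
                if lag <= q:
                    A[k][lag] += ca * cb
    return A


A = aggregation_matrix(m=3, d=1, q=1, K=1)
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
A_inv = [[Fraction(A[1][1], det), Fraction(-A[0][1], det)],
         [Fraction(-A[1][0], det), Fraction(A[0][0], det)]]

# Aggregate autocovariance estimates quoted in Section 4.
gamma_V = [112841784, -46810651]
gamma_u = [float(sum(A_inv[r][c] * gamma_V[c] for c in range(2))) for r in range(2)]
print(A, A_inv, gamma_u)
```

Exact rational arithmetic via `fractions.Fraction` is used only so that the inverse can be compared with the fractions quoted in the text; floating point would serve equally well in practice.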

Example 1.
Consider an $m$-aggregate series $X_T$ written as the ARIMA(1, 1, 0) model with aggregation order $m = 3$. We remark that the ARIMA(1, 1, 3) model (18) is a possible solution for the unknown disaggregate series $x_t$, under the conditions of $p = 1$ and $q \le p + d + 1 = 3$. We find the model parameters of (18) as follows.

AR-Sieve Bootstrap Prediction Intervals of Temporal Disaggregation
In this section, we extend the framework of temporal disaggregation to interval estimation. Traditional parametric approaches to finding prediction intervals are not applicable to this disaggregation problem because the true distributions of realizations are unknown at disaggregate time points. Thus, we instead consider a nonparametric method for $100(1 - \alpha)\%$ AR-sieve bootstrap prediction intervals, developed by [14][15][16]. Based on the bootstrap approach, we present a modified procedure for constructing bootstrap prediction intervals of temporal disaggregation.
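Step 2: The $d$th differenced series $u_t = (1 - B)^d x_t$ is assumed to be stationary and invertible in (10). The invertibility admits an AR($\infty$) representation for $u_t$ such that
$$u_t = \sum_{i=1}^{\infty} \pi_i u_{t-i} + e_t,$$
and the $i$th coefficient estimate $\hat{\pi}_i$ is derived from the numerical association
$$1 - \sum_{i=1}^{\infty} \hat{\pi}_i B^i = \frac{1 - \sum_{i=1}^{p} \hat{\phi}_i B^i}{1 - \sum_{j=1}^{q} \hat{\theta}_j B^j}.$$
Furthermore, we choose an appropriate AR approximation of order $s$ under the condition of $|\hat{\pi}_{s+h}| < \tau$ for $h = 1, 2, \ldots$, where $\tau$ is a predetermined positive value close to zero.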
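Step 2 can be sketched as a short power-series division. The code below is a minimal illustration under stated assumptions: it uses the sign convention of (10), in which the AR and MA polynomials are $1 - \sum \phi_i B^i$ and $1 - \sum \theta_j B^j$, it truncates the AR($\infty$) expansion at a finite `max_lag`, and the threshold `tau = 0.01` is a hypothetical choice. The function names are not from the article.

```python
def ar_inf_coeffs(phi, theta, max_lag):
    """Return pi_1..pi_max_lag with 1 - sum_i pi_i B^i = phi(B) / theta(B)."""
    a = [1.0] + [-p for p in phi]    # phi(B)   = sum_i a_i B^i
    b = [1.0] + [-t for t in theta]  # theta(B) = sum_j b_j B^j
    psi = []                         # coefficients of phi(B) / theta(B)
    for i in range(max_lag + 1):
        a_i = a[i] if i < len(a) else 0.0
        # Match the coefficient of B^i in theta(B) * psi(B) = phi(B).
        carry = sum(b[j] * psi[i - j] for j in range(1, min(i, len(b) - 1) + 1))
        psi.append(a_i - carry)
    return [-c for c in psi[1:]]


def choose_order(pi, tau):
    """Smallest s such that every computed |pi_{s+h}|, h >= 1, is below tau."""
    s = len(pi)
    while s > 0 and abs(pi[s - 1]) < tau:
        s -= 1
    return s


# Example: an MA(1) differenced series with the estimate reported in Section 4.
pi_hat = ar_inf_coeffs(phi=[], theta=[0.8130118], max_lag=100)
s = choose_order(pi_hat, tau=0.01)
print(s, pi_hat[:3])
```

For a pure MA(1) model the recursion reduces to $\hat{\pi}_i = -\hat{\theta}^i$, so the selected order grows as $\hat{\theta}$ approaches the invertibility boundary.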
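Step 3: We compute the centered residuals of the AR($s$) approximation, defined as
$$\tilde{e}_t = \hat{e}_t - \frac{1}{n - s} \sum_{j=s+1}^{n} \hat{e}_j$$
for $t = s + 1, \ldots, n$, where $\hat{e}_t$ is the residual of the fitted AR($s$) approximation at time $t$. In addition, we obtain the empirical distribution function of the centered residuals, which assigns to each $z$ the proportion $(n - s)^{-1} \sum_{t=s+1}^{n} 1_{\tilde{e}_t \le z}$, where the indicator $1_{\tilde{e}_t \le z} = 1$ if $\tilde{e}_t \le z$ and 0 otherwise.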
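The residual centering and the empirical distribution function of Step 3 can be sketched in a few lines of standard-library Python. This is an illustration only: the input series `u` and the AR weights `pi_hat` below are hypothetical toy values, and drawing with replacement from the centered residuals is shown as one plausible way to sample from their empirical distribution.

```python
import bisect
import random


def ar_residuals(u, pi):
    """Residuals u_t - sum_i pi_i u_{t-i} of an AR(s) approximation, s = len(pi)."""
    s = len(pi)
    return [u[t] - sum(pi[i] * u[t - 1 - i] for i in range(s))
            for t in range(s, len(u))]


def center(residuals):
    """Subtract the sample mean so that the residuals average to zero."""
    mean = sum(residuals) / len(residuals)
    return [r - mean for r in residuals]


def ecdf(sample):
    """Empirical distribution function z -> proportion of sample values <= z."""
    ordered = sorted(sample)

    def F(z):
        return bisect.bisect_right(ordered, z) / len(ordered)

    return F


# Hypothetical inputs: a toy differenced series and truncated AR weights.
rng = random.Random(0)
u = [rng.gauss(0.0, 1.0) for _ in range(200)]
pi_hat = [-(0.5 ** i) for i in range(1, 7)]

centered = center(ar_residuals(u, pi_hat))
F = ecdf(centered)
resample = rng.choices(centered, k=len(centered))  # i.i.d. draws from the EDF
print(F(0.0), len(resample))
```

Sampling with `random.Random.choices` is equivalent to drawing independently from the empirical distribution, since each centered residual is selected with probability $1/(n - s)$.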
Step 8: We repeat Steps 4-7 many times and find the bootstrap distribution function of $\hat{x}_t^*$, denoted by $F_n(\hat{x}_t^*)$. Finally, we construct the $100(1 - \alpha)\%$ prediction interval for the unknown disaggregate observation $x_t$, given by
$$\left[Q^*(\alpha/2),\; Q^*(1 - \alpha/2)\right],$$
where $Q^*(k)$ is the $k$th quantile (i.e., the $100k$th percentile) of $F_n(\hat{x}_t^*)$.
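The percentile construction in Step 8 can be sketched as follows. This is a minimal illustration under assumptions: `paths` stands for a hypothetical collection of bootstrap replicate series (one list per repetition of Steps 4-7), and the quantile is taken as the inverse of the empirical distribution function, which is one of several common conventions.

```python
import math
import random


def empirical_quantile(sample, p):
    """Smallest sample value whose empirical distribution function is >= p."""
    ordered = sorted(sample)
    n = len(ordered)
    idx = math.ceil(p * n - 1e-9) - 1  # small tolerance guards against float round-off
    return ordered[min(max(idx, 0), n - 1)]


def prediction_band(paths, alpha):
    """Pointwise 100(1 - alpha)% percentile intervals from bootstrap replicates."""
    band = []
    for t in range(len(paths[0])):
        values = [path[t] for path in paths]
        band.append((empirical_quantile(values, alpha / 2),
                     empirical_quantile(values, 1 - alpha / 2)))
    return band


# Hypothetical replicates: 500 bootstrap series of length 12 around a linear trend.
rng = random.Random(1)
paths = [[10.0 * t + rng.gauss(0.0, 2.0) for t in range(12)] for _ in range(500)]
band = prediction_band(paths, alpha=0.05)
print(band[0], band[-1])
```

Linking the lower and upper limits across $t$ gives the band that traces the unknown disaggregate series.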
Our modified version differs from the original procedure proposed by [14][15][16] in two main ways. First, the AIC (Akaike information criterion) or BIC (Bayesian information criterion) model identification is replaced with the GLS and ARIMA disaggregations in Step 1. From this perspective, the unknown disaggregate model can be directly selected from the structure of the known aggregate series without extra information about the model criteria of the disaggregate series. Second, while the $h$-step ahead forecasting is used in the original procedure for $h > 0$, we sequentially estimate the unknown disaggregate series with the one-step forward predictions for $t = s + 1, \ldots, n$ and the one-step backward predictions for $t = s, s - 1, \ldots, 1$ in Step 7. Since the one-step ahead forecast limit is generally narrower than the $h$-step ahead forecast limit, the modification can prevent the prediction intervals from being unreasonably wide or unbounded.
Furthermore, we conduct a simulation study to analyze the properties of the AR-sieve bootstrap resamples of Step 4. Suppose that a finite disaggregate series $x_t$, $t = 1, 2, \ldots, 5000$, follows an ARIMA(1, 1, 1) model of $(1 - \phi B)(1 - B)x_t = (1 - \theta B)e_t$, where $e_t$ is Gaussian white noise with mean 0 and standard deviation $\sigma_e = \sqrt{3}$. We generate 12 different time series for all the coefficient pairs of $\phi$ and $\theta \in \{-0.75, -0.35, 0.35, 0.75\}$ with $\phi \ne \theta$. Through Steps 2-4, we obtain the centered residuals $\tilde{e}_t$ and an AR-sieve bootstrap resample of size 5000 for each case. For all 12 pairs, the probability density plots of the bootstrapped $\tilde{e}_t$ are drawn in Figure 1, and their mean values, standard deviations, and the Shapiro-Wilk test results, including the W-statistics and p-values, are reported in Table 1. These results show that each bootstrap resample has a bell-shaped distribution consistent with the assumption of mean 0 and $\sigma_e = \sqrt{3} \approx 1.73205$, and its normality is not rejected at significance level $\alpha = 0.05$.
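One cell of this simulation design can be imitated with the standard library alone. The sketch below is a simplification and not a reproduction of Table 1: it assumes the true parameters $\phi = 0.35$ and $\theta = 0.75$ are known rather than obtained through disaggregation, uses the closed-form AR($\infty$) weights $\pi_i = (\phi - \theta)\theta^{i-1}$ of an ARMA(1, 1) model, takes a hypothetical threshold `tau = 1e-3`, and reports only the mean and standard deviation of the resample; the Shapiro-Wilk test is not included.

```python
import math
import random
import statistics


def simulate_arima111(n, phi, theta, sd, rng):
    """Simulate x_t from (1 - phi B)(1 - B) x_t = (1 - theta B) e_t."""
    e_prev = u_prev = level = 0.0
    x = []
    for _ in range(n):
        e = rng.gauss(0.0, sd)
        u = phi * u_prev + e - theta * e_prev  # ARMA(1, 1) differenced series
        level += u                             # integrate once to obtain x_t
        x.append(level)
        e_prev, u_prev = e, u
    return x


def centered_ar_residuals(x, phi, theta, tau=1e-3):
    """Centered residuals of the truncated AR(inf) form of the differenced series."""
    u = [b - a for a, b in zip(x, x[1:])]
    pi = []
    weight = phi - theta                       # pi_1 = phi - theta
    while abs(weight) >= tau:                  # weights decay geometrically in theta
        pi.append(weight)
        weight *= theta
    s = len(pi)
    res = [u[t] - sum(pi[i] * u[t - 1 - i] for i in range(s))
           for t in range(s, len(u))]
    mean = sum(res) / len(res)
    return [r - mean for r in res]


rng = random.Random(2022)
x = simulate_arima111(5000, phi=0.35, theta=0.75, sd=math.sqrt(3), rng=rng)
centered = centered_ar_residuals(x, phi=0.35, theta=0.75)
resample = rng.choices(centered, k=5000)       # AR-sieve bootstrap resample
print(statistics.fmean(resample), statistics.pstdev(resample))
```

With this setup the resample mean should be near 0 and its standard deviation near $\sqrt{3}$, in line with the pattern described above.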

Real Data Analysis: The U.S. International Trade in Goods and Services
The U.S. Census Bureau and the U.S. Bureau of Economic Analysis periodically publish the total balance, defined as the difference between exports and imports, of U.S. international trade in goods and services. The trade balance data, as one of the main economic indicators, play an important role in explaining the U.S. economy (see [22]). To demonstrate the applicability of the bootstrap interval estimation proposed in Section 3, we investigate the quarterly balances of U.S. international trade in goods and services from the first quarter of 1992 to the fourth quarter of 2020, collected from [23,24]. Let $X_T$ be the quarterly total balance of U.S. international trade in goods and services at time $T = 1, 2, \ldots, 116$. Since the augmented Dickey-Fuller test for $X_T$ does not reject the null hypothesis of a unit root (test statistic: −2.1213; p-value: 0.5262), the lag-one differenced series $V_T = X_T - X_{T-1}$ is considered. In addition, the BIC values of the fitted ARMA($p$, $q$) models of $V_T$, for $p$ and $q = 0, 1, \ldots, 10$, are computed with maximum likelihood (ML) estimation. Since the smallest BIC value, 2439.75, is found at $p = 0$ and $q = 1$ as listed in Table 2, the MA(1) model of $V_T$ or, equivalently, the IMA(1, 1) model of $X_T$ is taken as the most appropriate fit for the data. Here, $\hat{\gamma}_V(0) = 112841784$, $\hat{\gamma}_V(1) = -46810651$, and $\sigma_E^2 = 87918193$. Assume that $X_T$ is an $m$-aggregate series with $m = 3$ and that the monthly trade balance series $x_t$, $t = 1, 2, \ldots, 348$, is unknown and needs to be estimated. From (26) in Example 2, we obtain $\hat{\gamma}_u(0) = \frac{11}{81}\hat{\gamma}_V(0) - \frac{32}{81}\hat{\gamma}_V(1) = 33817290$ and $\hat{\gamma}_u(1) = -\frac{4}{81}\hat{\gamma}_V(0) + \frac{19}{81}\hat{\gamma}_V(1) = -16552710$. As described in (27), we solve the quadratic equation $\hat{\gamma}_u(1)\theta^2 + \hat{\gamma}_u(0)\theta + \hat{\gamma}_u(1) = 0$ and find $\hat{\theta} \approx 0.8130118$ or $1.2299944$. However, due to the invertibility condition $|\theta| < 1$, we retain only the IMA(1, 1) model of $x_t$,
$$(1 - B)x_t = (1 - 0.8130118B)e_t,$$
with $\hat{\sigma}_e^2 = \hat{\gamma}_u(0)/(1 + \hat{\theta}^2) = -\hat{\gamma}_u(1)/\hat{\theta} = 20359741$. Furthermore, we calculate $\hat{x}_t$, $t = 1, 2, \ldots, 348$, using the GLS disaggregation formula of (6). The monthly disaggregate estimates $\hat{x}_t$ and the monthly averages of the observed aggregate series, i.e., $X_T/3$, are compared in Figure 2. As illustrated in the line graph of Figure 2, $\hat{x}_t$ interpolates smoothly between the monthly averages $X_T/3$ and provides a plausible description of the unknown monthly pattern.
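The moment-matching step for the MA(1) parameter can be verified directly from the reported autocovariances. The sketch below assumes the sign convention $u_t = e_t - \theta e_{t-1}$ of (10), so that $\gamma_u(0) = (1 + \theta^2)\sigma_e^2$ and $\gamma_u(1) = -\theta\sigma_e^2$; the function name is illustrative.

```python
import math


def ma1_from_autocov(g0, g1):
    """Invertible MA(1) parameter and innovation variance from gamma(0), gamma(1).

    Eliminating the variance from g0 = (1 + theta^2) * s2 and g1 = -theta * s2
    gives the quadratic g1 * theta^2 + g0 * theta + g1 = 0.
    """
    if g1 == 0:
        return 0.0, g0, [0.0]
    disc = g0 * g0 - 4.0 * g1 * g1
    if disc < 0:
        raise ValueError("no real MA(1) solution: |gamma(1) / gamma(0)| exceeds 1/2")
    roots = [(-g0 + sign * math.sqrt(disc)) / (2.0 * g1) for sign in (1.0, -1.0)]
    theta = min(roots, key=abs)  # the two roots are reciprocals; keep |theta| < 1
    sigma2 = -g1 / theta
    return theta, sigma2, roots


# Disaggregate autocovariance estimates reported in the text.
theta_hat, sigma2_hat, both_roots = ma1_from_autocov(33817290.0, -16552710.0)
print(theta_hat, sigma2_hat, both_roots)
```

Because the product of the two roots equals one, exactly one root satisfies the invertibility condition whenever the discriminant is positive, which is why a single IMA(1, 1) model is retained.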

Concluding Remarks
In this research, we have developed a modified procedure for constructing AR-sieve bootstrap prediction intervals of an unknown disaggregate series $x_t$. As demonstrated in Section 4, when the upper and lower limits of the prediction intervals are linked over the indexed time, we can visually outline the sequential pattern of unknown disaggregate observations within certain bandwidths. Therefore, the proposed prediction intervals can provide a measure of reliability for temporal disaggregation and account for the uncertainty in estimating $x_t$.
In addition, the proposed method is composed of the model-based point estimation introduced in Section 2 and the nonparametric bootstrap interval estimation described in Section 3. That is, the iterative procedure of bootstrap resampling generates empirical probability distributions for unknown disaggregate observations, while a disaggregate ARIMA model is established under the assumption of Gaussian innovations. From this perspective, the proposed method can be interpreted as a semiparametric approach to estimating and predicting unknown $x_t$ values.
Our underlying assumption is that the white noise distribution is Gaussian. However, non-Gaussian time series are also frequently found in practice. For example, a time series of whole-number counts can be modeled with a Poisson count process. Refs. [25][26][27][28] developed generalized AR models and related estimation methods for the Poisson scenarios. In future work, as an extension of the AR-sieve method to non-Gaussian models, we will explore Poisson AR-sieve bootstrap prediction intervals of temporal disaggregation. Moreover, since the current GLS and ARIMA disaggregations do not accommodate non-Gaussian conditions, we will examine other model-based disaggregation methods, for example, [8,29].