Simultaneous Indirect Inference, Impulse Responses and ARMA Models

Abstract: A two-stage simulation-based framework is proposed to derive identification robust confidence sets by applying Indirect Inference, in the context of Autoregressive Moving Average (ARMA) processes for finite samples. Resulting objective functions are treated as test statistics, which are inverted rather than optimized, via the Monte Carlo test method. Simulation studies illustrate accurate size and good power. Projected impulse-response confidence bands are simultaneous by construction and exhibit robustness to parameter identification problems. The persistence of shocks to oil prices and returns is analyzed via impulse-response confidence bands. Our findings support the usefulness of impulse-responses as an empirically relevant transformation of the confidence set.


Introduction
Indirect inference refers to resampling-based statistical methods that use simulations to calculate estimating functions including likelihoods, scores or moments. The idea of replacing complicated or intractable statistical functions by computer simulations motivated initial applications and underlying theory; see Smith (1993); Gouriéroux et al. (1993); Gouriéroux and Monfort (1996) and Gallant and Tauchen (1996). Since then, available evidence suggests that indirect inference often works better than traditional methods in finite samples for bias or misspecification corrections in various situations. We propose a two-stage framework that yields identification robust confidence sets via Indirect Inference in finite samples. The primary focus concerns inference in Autoregressive Moving Average (ARMA) models and the production of simultaneous confidence bands from their impulse-response functions. Despite the simplicity of ARMA models in time series, identification and boundary issues raise enduring complications for estimation and inference. First, Autoregressive (AR) unit roots cause the so-called non-uniform convergence problem, which means that methods requiring stationarity break down near the boundary. Our approach instead inverts the objective function as a test, and then minimizes and maximizes each function of interest over the non-rejected values of the parameters. The confidence intervals obtained are thus simultaneous by construction and robust to the aforementioned identification problems.
We conduct a simulation study on ARMA and MA processes and compare our method with MLE-based inference. Simple objective functions as in Zinde-Walsh (1994, 1997), as well as over-identified criteria, are considered, in addition to various lag structures. Results can be summarized as follows. In contrast to MLE, our method eradicates pile-up, root cancelation and boundary problems, even with very small samples of no more than 50 observations. MLE is oversized for both individual-parameter and joint tests. None of the estimating functions consistently outperforms the rest for all sample sizes and pairs tested. Thus, we incorporate a joint criterion, constructed by averaging the three proposed objective functions, to exploit the power advantages of each of them. Leads improve power for highly persistent processes. Overidentification yields important power benefits. On balance, autocorrelation-based methods require a larger sample to catch up, power-wise, with AR-based counterparts. Inverting the asymptotic Wald test based on MLE estimates is almost infeasible at corner solutions with parameters close to the unit boundary, since in many instances the variance-covariance matrix is not invertible. This result is expected and provides support for our research.
Our methodology is applied from a univariate perspective to investigate the persistence of oil shocks via impulse-response confidence bands. We select daily spot prices of the West Texas Intermediate (WTI), a high quality, light sweet crude oil, delivered at Cushing, Oklahoma and priced in U.S. dollars per barrel. We focus on the logarithmic transformation of prices and returns at the weekly and monthly frequencies. The main advantage of our method is the possibility of remaining agnostic with respect to the process followed by these series. Among our results, we highlight that our simultaneous impulse-response confidence bands seem well identified, despite the presence of parameter uncertainty in these series. We find that root cancelation does not bias our results; however, it affects predictability, in particular by obscuring the behaviour of impulse-responses for oil returns. Hence, when the focus of the analysis is on predictability, researchers should favour the study of prices instead of growth in prices. We find that shocks to oil prices are permanent, taking about a decade or so to die down. Also, relying on MLE to estimate the effect of shocks when processes are highly persistent can mislead researchers when conducting estimation and forecasting. In contrast, our proposed impulse-response confidence bands represent an empirically relevant transformation of the joint confidence set.
We introduce our framework in Section 2 and discuss inference by test inversion in Section 3. The ARMA special case is presented in Section 4. Simulation results are presented in Section 5. An empirical application with focus on benchmark oil prices and returns is explored in Section 6 and we conclude in Section 7.

General Framework
Consider the general model in (1), where X is the sample space, such that a sample of size T is represented as X = (X_1, ..., X_t, ..., X_T)′, P(ϑ) is a probability distribution over X indexed by ϑ, and Θ is the parameter space. We also assume that ϑ can be partitioned as ϑ = (β′, σ′)′, where Θ_β and Θ_σ are partitions of Θ; β is the q-dimensional parameter of interest and σ is a nuisance parameter, which we aim to partial out.
Pseudo-data can be simulated from (1), and an auxiliary parameter denoted λ, of dimension p ≥ q, can be defined in such a way that it is linked to β via an exact or limiting binding function, λ = L(ϑ). In general, L(.) need not have a known analytical form. Furthermore, we assume that a simple statistic is available to estimate λ given X.
Denote by λ̂ = Λ̂(X) the considered data-based estimate of λ, where Λ̂(.) is the estimating function. Its simulation-based counterpart, given a specific value of ϑ, can be derived, for example, as follows: (i) creating H simulated paths, each of size T, from (1) given ϑ; (ii) applying the same estimation method to each simulated path, leading to a series of estimates, Λ̂(x_h(ϑ)), h = 1, ..., H; and then (iii) computing their average, Λ(ϑ) = (1/H) Σ_{h=1}^{H} Λ̂(x_h(ϑ)). The distance between Λ̂(X) and Λ(ϑ) forms the basis of Indirect Inference. A common special case is the Wald-type distance measure (Λ̂(X) − Λ(ϑ))′ Ω̂ (Λ̂(X) − Λ(ϑ)), where Ω̂ is a positive definite weighting matrix. Gouriéroux et al. (1993) suggest Ω̂ = I_p, where I_p denotes a p-dimensional identity matrix, because the loss in efficiency with respect to an optimal estimator can be disregarded. This practice is also suggested, more recently, by Gouriéroux et al. (2010).
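The averaging step above can be sketched in code. This is a minimal illustration, not the paper's exact specification: the AR(1) data generating process, the autocorrelation-based choice of Λ̂, and all function names (`simulate_path`, `auxiliary_stat`, `simulated_binding`, `wald_distance`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_path(beta, T, rng):
    """Hypothetical DGP: an AR(1) with coefficient beta and unit-variance errors."""
    x = np.zeros(T)
    e = rng.standard_normal(T)
    x[0] = e[0]
    for t in range(1, T):
        x[t] = beta * x[t - 1] + e[t]
    return x

def auxiliary_stat(x, p=2):
    """Hypothetical Lambda-hat: the first p empirical autocorrelations."""
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    return np.array([np.dot(xc[:-j], xc[j:]) / denom for j in range(1, p + 1)])

def simulated_binding(beta, T, H, rng):
    """Steps (i)-(iii): average the auxiliary statistic over H simulated paths."""
    stats = [auxiliary_stat(simulate_path(beta, T, rng)) for _ in range(H)]
    return np.mean(stats, axis=0)

def wald_distance(x_data, beta, T, H, rng):
    """Identity-weighted Wald-type distance (Omega-hat = I_p)."""
    d = auxiliary_stat(x_data) - simulated_binding(beta, T, H, rng)
    return float(d @ d)
```

With the identity weighting matrix, the distance is simply the squared Euclidean norm of the discrepancy between the data-based and simulation-averaged auxiliary estimates.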

Nuisance Parameters
Focusing on β as the parameter of interest requires a strategy to account for σ. In some settings, σ can be easily estimated given β. Alternatively, an auxiliary estimate, or objective function that is exactly invariant to σ can be considered. We emphasize the latter case in this study, defined via the following assumptions that characterize the model and its respective estimating functions.
This assumption ensures that the objective function is invariant to the transformation that maps X* into X†. Indeed, applying Assumption 1 to the calibration in (6), it follows that σ can be set to any value in Θ_σ for simulation purposes, with no bearing on estimation nor inference, via objective functions of the form in (10). In line with the above cited literature, (10) uses an identity weighting matrix. Other choices are possible as long as S(X; ϑ) remains invariant to σ, or the distribution of S(X; ϑ) does not depend on σ under the null hypothesis that fixes β.
A common special case is scale invariance, which implies robustness to multiplying the data by any positive scalar a; Assumption 1 then yields Λ̂(x_h((β′, 1)′)) = Λ̂(x_h((β′, a)′)).
Its multivariate counterpart involves multiplying the data by an invertible matrix. For illustrations in finite sample cases, see Beaulieu et al. (2007, 2013) and Dufour et al. (2003). Invariance also implies that, in the simulation study, σ can be set to any known value in its parameter space, since it will be partialled out. Hence, we simplify our notation by suppressing σ in S(X; ϑ), Λ(ϑ) and Λ̂(ϑ), redefining them as S(X; β), Λ(β) and Λ̂(β).
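Scale invariance is easy to verify numerically for an autocorrelation-based estimating function: rescaling the sample by any positive scalar leaves the statistic unchanged, so σ may be fixed to any value in simulation. The function name and the choice of statistic below are illustrative assumptions.

```python
import numpy as np

def autocorr(x, p=3):
    """First p empirical autocorrelations (scale-free by construction)."""
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    return np.array([np.dot(xc[:-j], xc[j:]) / denom for j in range(1, p + 1)])

rng = np.random.default_rng(1)
x = rng.standard_normal(200)
a = 7.5  # any positive scalar

# multiplying the data by a > 0 leaves the auxiliary statistic unchanged
assert np.allclose(autocorr(x), autocorr(a * x))
```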

Inference via Test Inversion
Standard applications of Indirect Inference resort to minimizing (10) and formulating adequate regularity conditions, so that the resulting estimator, denoted β̂, is (typically) asymptotically normal with a covariance matrix that can be estimated consistently. To compare and contrast our proposed inference method with common practices, and to define test inversion within a familiar setting, let us revisit the traditional α-level confidence interval for a given scalar function g(β), where Σ̂_g^{1/2} denotes the relevant standard error estimate computed (for example, via the delta-method) from the estimated asymptotic variance-covariance matrix of β̂, and z_{α/2} is a two-tailed standard normal critical point. As is well known, CI(g(β); α) is derived by solving the following inequality over g(β_0): |g(β̂) − g(β_0)| ≤ z_{α/2} Σ̂_g^{1/2}. Said differently, the solution of (15) inverts a t-type test of g(β) = g(β_0) based on the statistic |g(β̂) − g(β_0)|/Σ̂_g^{1/2}. The null hypothesis may also cover the full model specification in conjunction with testing g(β), in which case the alternative hypothesis is no longer restricted to g(β) ≠ g(β_0). Interpreting rejections when the model is jointly tested calls for caution; and when such a test is inverted, the outcome may be the empty set, which indicates that the model under the null is misspecified. The test that we invert below follows these principles, in the sense that model misspecification is covered by the alternative hypothesis.
Ideally, P[g(β) ∈ CI(g(β); α)] ≥ 1 − α, at least as T → ∞. However, identification failures or stationarity concerns will cause intervals of this form to severely under-cover, even with large samples. In our research, we depart from this approach and propose a confidence set for β based on inverting a test of the form H_0(β_0): β = β_0, using (10), which we treat as a test statistic. Inverting a test of H_0(β_0) at a given level α consists in collecting, numerically or analytically, the β_0 values that are not rejected using the considered test at the specified level. For example, given the right-tailed test statistic T(β_0), and assuming (for the moment and for illustrative purposes) that an α-level cut-off point T_c is available, the test inversion involves solving, over β_0, the inequality T(β_0) < T_c. The solution of this inequality is a parameter space subset, denoted CS(β; α), that satisfies P[β ∈ CS(β; α)] ≥ 1 − α. Since the finite sample distribution of (10) is intractable, and since there is no reason to expect that a cut-off point invariant to β_0 can be found, a numerical solution is required. We propose collecting, via numerical search, the β_0 values that are not rejected using a MC p-value that we will define in the next section, denoted p̂(β_0). In other words, we propose to search for the β_0 values for which p̂(β_0) > α. The result is a joint α-level confidence region denoted CS(β; α), as mentioned before. The set thus derived can be empty, which provides a diagnostic test of the hypothesized model. Said differently, an empty confidence set occurs when all values of β_0 are rejected at the considered level, meaning that, in this case, the selected specification would be incompatible with the available data. Our proposed variant of Indirect Inference thus preserves its specification-robustness attributes. If identification is weak, the set will be unbounded or diffuse, which also provides useful information regarding model fit (or lack thereof).
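The numerical search described here reduces to evaluating the p-value over a grid and retaining the non-rejected points. The following sketch assumes a user-supplied `mc_p_value` function (defined by whatever MC scheme is in use); the function and argument names are illustrative, not the paper's notation.

```python
import numpy as np

def invert_test(x_data, grid, mc_p_value, alpha=0.05):
    """Collect, by grid search, the beta_0 values whose MC p-value exceeds
    alpha.  The result approximates the joint confidence set CS(beta; alpha);
    an empty result signals that the model itself is rejected at level alpha.

    `mc_p_value(x_data, beta_0)` is a user-supplied, possibly
    simulation-based, p-value function."""
    accepted = [b for b in grid if mc_p_value(x_data, b) > alpha]
    return np.array(accepted)
```

In practice the grid is refined near the boundary of the accepted region; an unbounded or diffuse accepted set is itself informative about weak identification.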
If a confidence interval for the individual components of β or, more generally, for a given function g(β) is desired, the above defined CS(β; α) can be projected. This implies minimizing and maximizing g(β) over the β values in CS(β; α). We study a special case of ARMA-based impulse-response functions that illustrates the usefulness of such projections.
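Given a finite set of accepted parameter points, projection amounts to minimizing and maximizing g over that set. A minimal sketch, with illustrative function names:

```python
import numpy as np

def project(cs_points, g):
    """Projection-based interval for g(beta): minimize and maximize g over
    the accepted points of CS(beta; alpha).  Returns None for an empty set,
    which corresponds to rejecting the model at level alpha."""
    if len(cs_points) == 0:
        return None
    vals = np.array([g(b) for b in cs_points])
    return vals.min(), vals.max()
```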

Exact p-Values
We propose applying Dufour's (2006) MC method to compute the above defined empirical p-value p̂(β_0). Given that the null distribution of the considered statistic is nuisance parameter free, the MC method controls the type I error in finite samples, even if S(X, β_0) is itself simulation-based. This method is summarized as follows.
First, we introduce a new layer of simulation for each value of β_0: we draw L replications satisfying the null hypothesis in (17) from the model. These replications should be generated independently from the simulations underlying S(X, β_0). Next, we apply Λ̂(.) to each replication l, leading to Λ̂(x_l(β_0)), l = 1, ..., L, and compute a series of statistics that are similar to the one in (18), but now considering the vector of the estimated parameters obtained from Λ̂(x_l(β_0)), for each replication l, instead of the vector obtained by applying the estimation method to the data. We are able to use the same vector of calibrated parameters, Λ(β_0) from (18), due to the exchangeability property of the MC test; see Dufour (2006). Lastly, the empirical p-value associated with the null hypothesis is computed as p̂(β_0) = (G_L + 1)/(L + 1), where G_L denotes the number of times the simulated statistic S(x_l(β_0)) is greater than or equal to the data-based statistic S(X, β_0).
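The p-value step can be sketched as follows, assuming the L null-simulated statistics have already been computed; the function name is an illustrative assumption (tie-handling refinements in Dufour (2006) are omitted for simplicity).

```python
import numpy as np

def mc_p_value(s_data, s_sims):
    """Monte Carlo p-value in the spirit of Dufour (2006): with L statistics
    simulated under H0, p-hat = (G_L + 1) / (L + 1), where G_L counts the
    simulated statistics greater than or equal to the data-based statistic."""
    s_sims = np.asarray(s_sims, dtype=float)
    L = s_sims.size
    G = int(np.sum(s_sims >= s_data))
    return (G + 1) / (L + 1)
```

With L = 199 and α = 0.05, as in the simulation study below, rejection at level α is exact by construction since (L + 1)α is an integer.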

ARMA Special Case
We consider the zero mean Gaussian ARMA(1,1) process of size T with unknown parameters ϑ = (θ, ψ, σ)′, X_t = ψ X_{t−1} + e_t + θ e_{t−1}, e_t = σ u_t, t = 1, ..., T, where |θ| < 1 guarantees invertibility, or an AR(∞) representation, |ψ| < 1 ensures stationarity in causal processes, and the joint distribution of u_1, ..., u_t, ..., u_T is known. For example, u_t can be assumed independent and identically distributed with a standard normal distribution. This notation implies that X_0 is the pre-sample observation (i.e., it is not observed), in which case further assumptions are needed due to our indirect inference focus. In a stationary environment, X_0 should be set as in (27). Alternatively, e_0 can be set to zero, which, maintaining (27), fixes X_0 to zero. Gaussian and Student-t errors are considered in our Monte Carlo (MC) simulations and empirical application; the generalized lambda family of distributions, as considered in Gospodinov and Ng (2015), is another interesting parametrization. Non-zero mean and exogenous variables can be projected out. With regards to the above notation, we maintain σ as the nuisance parameter and we focus on inverting tests as in (17), for H_0: β = β_0.
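A simulator for this process can be sketched as follows. For simplicity, the sketch handles initialization by burning in pre-sample observations (the alternative mentioned in the footnote) rather than drawing X_0 from its stationary law; the function name and the burn-in length are illustrative assumptions.

```python
import numpy as np

def simulate_arma11(psi, theta, T, rng, sigma=1.0, burn=200):
    """Draw T observations from X_t = psi*X_{t-1} + e_t + theta*e_{t-1},
    with e_t = sigma*u_t and u_t iid standard normal.  The first `burn`
    observations are discarded to approximate stationarity."""
    n = T + burn
    e = sigma * rng.standard_normal(n + 1)
    x = np.zeros(n)
    x[0] = e[1] + theta * e[0]  # X_0 and e_0 implicitly set to zero
    for t in range(1, n):
        x[t] = psi * x[t - 1] + e[t + 1] + theta * e[t]
    return x[burn:]
```

Student-t errors are obtained by replacing `standard_normal` with a suitably scaled t draw; by the invariance argument above, the value of `sigma` is immaterial for the statistics used here.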
In this context, for any positive scalar a and number of lags p, the invariance conditions of Assumption 1 hold. Because of our focus on invariant statistics, when drawing such data generating processes (DGPs), the choice of σ is immaterial: any value selected would give exactly the same sequence of simulated statistics in (21) and thus, using (23), exactly the same p-value. (Alternatively, the data can be burned in to induce stationarity.) Extensions to the mean non-stationarity framework, as in Khalaf and Saunders (2019), are a worthy research direction.
These invariance properties guarantee that the argument minimizing Q(.) would numerically coincide whether the data generating process (DGP) is drawn with unit variance or any other value of σ; the same holds for the empirical autocorrelations. To simplify the notation, we use the same number of leads and lags in Q(X), although, clearly, invariance to σ does not require this restriction. The choice of the number of lags p is user defined, as is the case with all auxiliary model-based methods.
In contrast to most existing methods, the methodology that we propose controls the significance level for any given p, and thus regardless of the truncation order. This is discussed further in the context of our simulation study in Section 5. Conformably, we consider three choices for our estimating function Λ̂(.): (i) two-sided OLS estimation with forward and backward-looking components; (ii) OLS estimation of a backward-looking or long-AR; and (iii) empirical autocorrelations. For clarity, and given any sample vector X = (X_1, ..., X_T)′ of size T, we will refer to these choices as follows, where Q(X) corresponds to (30) and ρ̂_j is given by (31). For each choice of the estimating function, we construct a test statistic considering the distance between the previously estimated and the simulation-calibrated auxiliary parameters. In addition, to harvest all available information, we construct a fourth test statistic as a simple average of those three.
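The three choices of estimating function can be sketched as plain OLS and autocorrelation computations. This is a simplified illustration of the three families named above (no intercepts, equal numbers of leads and lags); the function names are assumptions, not the paper's notation.

```python
import numpy as np

def long_ar(x, p):
    """(ii) Backward-looking long-AR: OLS of X_t on X_{t-1}, ..., X_{t-p}."""
    T = len(x)
    Y = x[p:]
    Z = np.column_stack([x[p - j:T - j] for j in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return coef

def two_sided_ols(x, q):
    """(i) Two-sided OLS: regress X_t on q lags and q leads of X_t."""
    T = len(x)
    Y = x[q:T - q]
    lags = [x[q - j:T - q - j] for j in range(1, q + 1)]
    leads = [x[q + j:T - q + j] for j in range(1, q + 1)]
    Z = np.column_stack(lags + leads)
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return coef

def autocorrs(x, p):
    """(iii) First p empirical autocorrelations."""
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    return np.array([np.dot(xc[:-j], xc[j:]) / denom for j in range(1, p + 1)])
```

With the Section 5 recommendation of 8 auxiliary parameters, these would be called as `two_sided_ols(x, 4)`, `long_ar(x, 8)` and `autocorrs(x, 8)`; the simplified MA(1) auxiliary parameter retains only `long_ar(x, 8)[0]`.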
Because we focus on invariant auxiliary statistics, to compute their simulation-based counterparts for any given value of the parameters, it suffices to create H simulated paths. Population autocorrelations need not be derived by simulation; yet invariance is explicitly imposed this way for any p. For the MA(1) special case, we also use a simplified auxiliary parameter: we fit a long-AR to observed and simulated data and retain the coefficient of the first lag, X_{t−1}.

Impulse-Response Confidence Bands
Impulse-response functions are among the most empirically and policy relevant transformations of ARMA parameters. If the stationarity condition is satisfied, (24) admits an MA(∞) representation using the lag operator L, and the impulse-response coefficients can be computed following the well-known iterative process: g_0(β) = 1, g_1(β) = ψ + θ, and g_i(β) = ψ g_{i−1}(β) for i ≥ 2. These equations define a series of known functions of the vector β = (θ, ψ)′, denoted g_i(β), i = 1, ..., m, that can be projected in turn, which implies minimizing and maximizing g_i(β) over the β values in CS(β; α).
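The recursion is straightforward to implement. The sketch below assumes the ARMA(1,1) parametrization X_t = ψX_{t−1} + e_t + θe_{t−1}; the function name is illustrative.

```python
import numpy as np

def irf_arma11(theta, psi, m):
    """MA(infinity) impulse-response coefficients of an ARMA(1,1):
    g_0 = 1, g_1 = psi + theta, g_i = psi * g_{i-1} for i >= 2."""
    g = np.empty(m + 1)
    g[0] = 1.0
    if m >= 1:
        g[1] = psi + theta
    for i in range(2, m + 1):
        g[i] = psi * g[i - 1]
    return g
```

Note that root cancelation (θ = −ψ) makes g_i = 0 for all i ≥ 1, so observationally white-noise-like pairs map to flat impulse responses, which is exactly the case the projections below must handle.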
Resulting confidence intervals are simultaneous in the following sense. Let g_i(CS(β; α)) denote the image of CS(β; α) under the function g_i. Then, (19) implies that P[g_i(β) ∈ g_i(CS(β; α)), i = 1, ..., m] ≥ 1 − α. The simulation study reported below includes an illustrative analysis of such projections, emphasizing close-to-boundary values of the MA parameter θ and the AR parameter ψ, as well as robustness to root cancelation.

Simulation Study
We conduct MC simulation experiments to assess the performance of our method for MA(1) and ARMA(1,1) processes. Our objective is to study power and size when MA parameters are close to the non-invertibility region for both processes, and the AR parameter is close to the non-stationary region for the ARMA. Our experiments are designed as follows. Simulated samples with number of observations T equal to 50, 100 and 200, used here as pseudo-data, are drawn from the ARMA model in (24). First, time series of errors are drawn from a Gaussian, or a Student-t distribution with 5 degrees of freedom, setting σ = 1, in line with Assumption 1. These choices of sample size correspond to samples employed in other MC studies, as well as data used in empirical applications.
The dimension of the auxiliary parameter is assessed by simulation. Power is studied by selecting various combinations of the number of auxiliary parameters and simulated paths. Various tests are performed by changing the number of auxiliary parameters in our experiments (refer to the definitions in Section 2) from 1 to 3 and 8 for the MA, and from 6 to 8 and 12 for the ARMA. We take into consideration the recommendations of Galbraith and Zinde-Walsh (1994) and Ghysels et al. (2003) for the long-AR estimation, and Gorodnichenko et al. (2012) for the estimation with autocorrelations. We find that selecting 8 auxiliary parameters provides the highest power for all the sample sizes. Hence, we suggest that, in empirical practice, the total number of auxiliary parameters be set to 8 for each of the choices of estimating function explained in Section 4: (i) γ_f = γ_b = 4 in the OLS estimation with forward and backward-looking components, (ii) γ_b = 8 in the OLS estimation of the long-AR, and (iii) j = 8 in the estimation with empirical autocorrelations. In the case of the simplified auxiliary parameter estimation for the MA process, the long-AR is estimated with 8 parameters, but only the first coefficient is retained.
As part of previous experiments, we test the number of simulated paths in the first simulation stage, setting H = 1 and H = 3, as suggested by Ghysels et al. (2003) for the MA, and also H = 5, to examine whether it could improve power for the ARMA (refer to the notation in Section 2). However, H = 3 is finally selected, keeping the number of auxiliary parameters equal to 8, as the value which provides the highest power for all sample sizes. The number of replications in the second simulation step, involving the application of Dufour's (2006) MC method, is L = 199, and we consider a level of significance α = 0.05 (see notation in Section 3). An identity weighting matrix is used to preserve the scale invariance of the statistic associated with each choice of estimating function. Lastly, 1000 replications are considered in each MC simulation experiment.

Main Results
Each pair of {MA, AR} parameters employed in the data generating process (DGP) of the ARMA(1,1) is denoted in our experiments as {θ_DGP, ψ_DGP}. Outcomes from all our simulation studies are available upon request. We focus on the following set of pairs to illustrate our results: {θ_DGP, ψ_DGP} = {{0.6, 0}; {0.99, 0}; {0.65, −0.65}; {0.99, 0.99}}. Note that we set the AR parameter ψ_DGP equal to zero in (24) a priori to draw samples from MA(1) processes, for instance, when the value of θ_DGP is 0.6 or 0.99. Useful insight beyond the ARMA(1,1) on the minimum dimension of the auxiliary parameter is found in the literature, in particular Galbraith and Zinde-Walsh (1997) and, more recently, Gospodinov and Ng (2015).
We report power in terms of empirical rejections, and our main results are summarized along the following lines. The results of the experiments conducted for MA(1) processes are presented in Tables 1 and 2, and the studies related to the ARMA(1,1) process are shown in Tables 3 and 4. We use abbreviations for each estimating function: OLS-FBLC, OLS-Long AR and AutoCorr correspond to the OLS estimation with forward and backward-looking components, the OLS estimation of the long-AR and the estimation with empirical autocorrelations, respectively. Size is controlled with our method for all sample sizes, choices of the estimating function, and for both Gaussian and Student-t errors, irrespective of how close the true parameters are to the unit root. It is well known that MLE exhibits size distortions when the sample size is small, for both individual-parameter and joint tests. Size seems to improve for the MLE when the sample size is increased and the parameters are far from the unit root regions. However, when the parameters approach the unit root regions, size fails for the MLE as the sample size increases, which confirms its asymptotic failure. Results obtained in terms of power for processes with Student-t errors are in line with the ones obtained with Gaussian errors. This outcome confirms the robustness of our framework to fatter tails, since we purposely ignore the fact that the errors are drawn from the Student-t distribution when we apply our method.
Tables 1 and 2 exhibit results from experiments on MA processes. The top sections of these tables compare our method (denoted SIM METHOD in the tables) with MLE in terms of size. Note that MLE is oversized, while size remains exact for our method. We can identify that the pile-up effect starts for values of the parameter θ higher than 0.6 in finite samples. Although our method does not produce size distortions, it cannot discriminate between tested values higher than 0.85 when the true DGP value is between 0.85 and 1. This result is expected and traced back to, e.g., Stock (1991) and references therein. Typically, when values of the parameter near the boundary (around 0.99) cannot be refuted, values substantially different from the boundary (for instance, 0.85) are also hard to refute in small samples. For small values of the parameter, the method with the simplified auxiliary parameter estimation, inspired by Zinde-Walsh (1994, 1997), shows an increase in power with respect to the OLS estimation of the long-AR. However, as the tested value of the parameter increases and gets closer to the unit root region, the latter method shows higher power than the former. This result indicates that overidentification is useful to increase power when we have higher persistence in MA(1) processes.
In general, the power of our method increases for the three estimating functions applied to the ARMA as the sample size increases, but none of them consistently shows the highest power for all sample sizes and pairs tested. When the AR parameter is closer to the unit root than the MA parameter and the sample is very small, 50 or 100 observations, our method applied with the OLS estimation with forward and backward-looking components offers the highest power on the boundary. However, when the reverse occurs, and the MA parameter is closer to the unit root than the AR parameter, our framework with the autocorrelations enables better discrimination between the true pair and any pair with values that are closer to the boundaries. Results obtained with the root cancelation pair, {θ_DGP, ψ_DGP} = {0.65, −0.65}, are reported in Table 3. As before, the top section of the table compares our method with MLE in terms of size. Size remains exact for our method in this case; however, MLE is oversized, which is noteworthy in view of the problems discussed in the literature cited above. When the OLS estimation with forward and backward-looking components is applied, it gives the highest power for almost all the pairs tested for a sample size of T = 50 observations. Once we increase the sample size to T = 100, this estimating function still shows the highest power, followed closely by the second choice of estimating function, the OLS estimation of the long-AR. When the sample size is T = 200, the power obtained with the autocorrelations approaches the power obtained with the other two estimating functions. Lastly, we examine the almost unit root pair as the true DGP pair, {θ_DGP, ψ_DGP} = {0.99, 0.99}, in Table 4. For most of the pairs tested, applying the method with the OLS estimation of the long-AR gives the highest power for sample sizes of T = 50 and T = 100.
However, for the same sample sizes, using autocorrelations gives the highest power for the last three pairs tested, for which the value of the MA parameter is closer to the boundary than the value of the AR parameter. As we increase the sample size, power reaches similar values for our method with the three estimating functions.
Note: Numbers reported are empirical rejections for the hypothesis H_0: θ = θ_0 and ψ = ψ_0. Size is measured with θ_0 = θ_DGP and ψ_0 = ψ_DGP. Other choices for θ_0 and ψ_0 are reported as tested values in the power tables.

Robustness Checks
In addition to our previous experiments, we conducted simple robustness checks for our method. Two examples are shown in Tables 5 and 6. We present the power values for two ARMA processes which are misspecified as ARMA(1,1), with a sample size of T = 100. In Table 5 we have an ARMA(2,1) with {θ_DGP, ψ_1,DGP, ψ_2,DGP} = {0.30, 0.60, 0.85}, so a high-persistence AR(2) is mixed with a low-persistence MA(1). In general, the method applied with autocorrelations offers more power in rejecting the pairs where the AR parameter is closer to the boundary than the MA parameter, while with the OLS estimation with forward and backward-looking components we obtain more power in rejecting the pairs where the MA parameter is closer to the boundary than the AR parameter.
The ARMA(2,1) in Table 6, with {θ_1,DGP, θ_2,DGP, ψ_DGP} = {0.30, 0.85, 0.60}, shows a high-persistence AR(1) mixed with an MA(2) which has higher persistence in its second parameter. In this case, we find that the method applied with both the OLS estimation of forward and backward-looking components and the OLS estimation of the long-AR is more powerful in rejecting the misspecification than with the autocorrelations. Note that in both examples, our three choices of estimating functions provide good power, rejecting the misspecification for all the pairs tested.
Note: Numbers reported are empirical rejections for the hypothesis H_0: θ = θ_0 and ψ = ψ_0. Various choices for θ_0 and ψ_0 are reported as tested values in the power tables.

Empirical Application
Our two-stage simulation-based framework is applied from a univariate perspective to investigate the persistence of oil shocks via impulse-response confidence bands. Apart from supply and demand conditions, estimation and forecasting of oil prices are complicated by the effect of non-market related characteristics, such as regulations, technology and geopolitical considerations; see Hamilton (1983, 1996, 2003, 2009); Kilian (2008a, 2008b, 2008c) and Kilian and Vigfusson (2017). Main concerns in the literature related to oil prices are, among others, whether the impact of oil shocks is temporary or permanent and whether a particular model can offer better performance than a random walk. Most studies support analyzing the real price of oil based on its explanatory power for macroeconomic forecasting of output, for instance, Baumeister and Kilian (2012) and Alquist et al. (2013). However, Hamilton (2011) argues that nonlinear transformations of the nominal price of oil can reflect threshold responses based on consumer sentiment. In addition, he notes that deflating nominal prices by the Consumer Price Index (CPI) introduces measurement errors that can affect forecasting results. Hence, we explore the behaviour of oil shocks considering the logarithmic transformation of prices and returns (log-differences), at the weekly and monthly frequencies, in both nominal and real terms.
We have identified four lines of research where oil series are characterized as a mean-reverting process, in particular ARMA: (i) study of long-run persistence in oil prices, (ii) study of oil volatility persistence, (iii) DSGE calibration and (iv) application of structural Vector Autoregression (VAR) family models. The first line of research focuses on oil price dynamics with long series and makes modeling decisions considering mean-reverting and random walk processes; see Pindyck (1999); Lee et al. (2006); Bernard et al. (2012); Gil-Alana and Gupta (2014). Oil volatility persistence is studied in the second line of research, either by fitting ARMA family models to volatility proxies based on transformations of oil returns, or by applying Generalized Autoregressive Conditional Heteroscedasticity (GARCH) family models directly to oil returns; see e.g., Choi and Hammoudeh (2009); Charles and Darné (2014) and references therein. In the third line of research, DSGE models frequently apply calibrations by fitting ARMA family models to oil prices, for instance, in the work of Atkeson and Kehoe (1999) and Leduc and Sill (2004). Lastly, the literature applying VAR family models is very extensive and is mostly dedicated to forecasting at short to middle horizons; see Pindyck (2004); Kilian (2009); Alquist and Kilian (2010); Baumeister and Kilian (2012); Alquist et al. (2013); Kilian (2014a, 2014b); Baumeister et al. (2014). In VAR models, impulse-responses are analyzed either by applying simulations or by Bayesian methods. Our simultaneous analysis is related to the research of Staszewska-Bystrova (2007, 2013); Jordà (2009); Staszewska-Bystrova and Winker (2013); Jordà and Marcellino (2010); Kilian (2013, 2016); Guerron-Quintana et al. (2017) and Montiel et al. (2019). In contrast to the VAR literature, our proposed impulse-response confidence bands are compatible with small sample sizes and robust to parameter identification problems.
Our framework remains agnostic with respect to the process followed by oil prices and returns by allowing for processes with no persistence, spurious white noise related to root cancelation, heavy persistence, as well as near-unit roots. Hence, we are able to capture the range of movement of the series under various observationally equivalent processes, that is, through pairs of parameters that could generate the same log-likelihood function. This characteristic is related to the work of Sims (2001) and references therein, and to Schorfheide (2003, 2004).
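Observational equivalence under root cancelation can be illustrated through the ARMA(1,1) impulse-response coefficients: when the AR and MA parameters cancel (ψ = −θ), every lagged response is zero, so the process is indistinguishable from white noise. A minimal sketch (parameter values purely illustrative):

```python
import numpy as np

def arma11_irf(psi, theta, horizon=10):
    """Impulse-response coefficients of an ARMA(1,1)
    x_t = psi*x_{t-1} + e_t + theta*e_{t-1}:
    h_0 = 1 and h_k = psi**(k-1) * (psi + theta) for k >= 1."""
    irf = np.empty(horizon + 1)
    irf[0] = 1.0
    for k in range(1, horizon + 1):
        irf[k] = psi ** (k - 1) * (psi + theta)
    return irf

# Root cancelation: psi = -theta makes every lagged response zero,
# so the process is observationally equivalent to white noise.
print(arma11_irf(0.5, -0.5))
# Heavy persistence: responses decay very slowly.
print(arma11_irf(0.97, 0.1))
```

Two very different parameter pairs on the cancelation line (e.g., ψ = 0.5, θ = −0.5 and ψ = 0.9, θ = −0.9) thus generate the same likelihood, which is exactly the identification problem the framework must remain robust to.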

Data
Daily spot prices of West Texas Intermediate (WTI) crude oil in U.S. dollars per barrel are obtained from the webpage of the Energy Information Administration (EIA); see details in the reference EIA (2019). The series is selected after the deregulation and the post OPEC-administered pricing system, from January 1986 to June 2019. 11 Wednesday prices are selected for the weekly series, and the second Wednesday of the month is selected for the monthly series, to avoid calendar effects such as the Monday or weekend effect and the Friday effect, see French (1980), and the turn-of-the-month effect, see Ariel (1987). The behaviour of the weekly nominal spot price series for this benchmark is shown in Figure 1.
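The sampling scheme (Wednesday observations for the weekly series, the second Wednesday of each month for the monthly series) can be sketched with pandas; the synthetic series below merely stands in for the EIA daily WTI data:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the EIA daily WTI series (values are illustrative).
idx = pd.bdate_range("1986-01-01", "1986-06-30")   # business days only
daily = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Weekly sample: Wednesday observations only (Monday == 0, so Wednesday == 2).
wednesdays = daily[daily.index.dayofweek == 2]
weekly = wednesdays

# Monthly sample: the second Wednesday of each month
# (rank Wednesdays within each month and keep rank 1, i.e. the second one).
rank_in_month = wednesdays.groupby(wednesdays.index.to_period("M")).cumcount()
monthly = wednesdays[rank_in_month == 1]
```

Fixing the weekday avoids mixing day-of-week effects into the lower-frequency samples.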
We explore the persistence of oil shocks on the logarithmic transformation of prices and returns (log-differences), at the weekly and monthly frequencies, in both nominal and real terms. Monthly nominal prices are deflated using the U.S. Consumer Price Index for All Urban Consumers: All Items (CPIAUCSL), published by the U.S. Bureau of Labor Statistics, taking May 2019 as the base date; see details in the reference U.S. Bureau of Labor Statistics (2019). 12 Table 7 exhibits summary statistics for the selected (demeaned) samples. The series of returns exhibit higher levels of skewness and kurtosis than the logarithmic transformation of prices. Overall, the levels of skewness and kurtosis of these samples indicate the presence of heavy tails and values that are more clustered around zero. 13

9 Refer to Kilian (2014) for a comprehensive walk through the literature on oil price shocks.
10 Lütkepohl et al. (2015) provide an overview and coverage evaluation of the available literature dedicated to impulse-responses for VAR models and propose a new approach based on adjustments to the Bonferroni bands.
11 Alquist et al. (2013) explain that after the mid-1980s, due to the U.S. deregulation, the co-movement between U.S. oil price series, including the WTI and refiners' acquisition costs for domestically produced and imported crude oil, became stronger. Hence we can take the WTI, since results obtained with other series are expected to be similar due to the high correlation between the various U.S. oil price series.
13 After 2014, WTI prices plummeted over the following three years, down to US$26.68 in January 2016, mainly due to supply strategies, the surprising growth of U.S. shale oil production and slower-than-expected global growth, among other factors. The behaviour of WTI prices afterwards is associated with geopolitical risks and growth expectations for the U.S. and Chinese economies.
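Deflation to real terms with a fixed base date, followed by the logarithmic transformations used throughout, can be sketched as follows (the numbers are synthetic stand-ins for the WTI and CPIAUCSL series, not actual data):

```python
import numpy as np
import pandas as pd

# Synthetic monthly nominal prices and CPI levels; illustrative stand-ins
# for the WTI and CPIAUCSL series, not actual observations.
months  = pd.period_range("2019-01", "2019-06", freq="M")
nominal = pd.Series([51.6, 55.0, 58.2, 63.9, 60.8, 54.7], index=months)
cpi     = pd.Series([252.6, 253.3, 254.1, 255.2, 255.3, 255.2], index=months)

# Deflate to real terms, taking May 2019 as the base date.
base = cpi[pd.Period("2019-05", freq="M")]
real = nominal * base / cpi

log_price = np.log(real)               # logarithmic transformation of prices
returns   = log_price.diff().dropna()  # returns (log-differences)
```

By construction, the real and nominal prices coincide at the base date, so the choice of base date only rescales the level series, not the log-differences.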

Impulse-Response Confidence Bands for Oil Series
Our framework produces confidence sets and impulse-response confidence bands for oil series considering the Gaussian and the Student-t distribution, with degrees of freedom calibrated at 8, which are commonly used in macroeconomic and financial time series applications. Note that all series are demeaned before our two-stage simulation-based methodology is applied. To implement our method, we conduct a grid search on the four quadrants with a step of 0.01. Values of the parameters with a p-value higher than the level of significance α = 0.05 are selected to construct our confidence sets. There is no statistical reason to predict any particular shape of the confidence region. It is worth noting that an empty confidence set indicates a rejection of the null hypothesis for every parameter pair; this means a rejection of the parametric ARMA(1,1) specification as a whole. In this section, we highlight our main findings. All results are available upon request.
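The test inversion over the grid can be sketched as follows. This is a simplified Monte Carlo test that uses sample autocorrelations as the auxiliary statistic; it is not the paper's OLS forward- and backward-looking or averaging objective functions, and all parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_arma11(psi, theta, n, rng):
    """Simulate x_t = psi*x_{t-1} + e_t + theta*e_{t-1} with Gaussian shocks."""
    e = rng.standard_normal(n + 1)
    x = np.empty(n)
    x[0] = e[1] + theta * e[0]
    for t in range(1, n):
        x[t] = psi * x[t - 1] + e[t + 1] + theta * e[t]
    return x

def stat(y, psi, theta, lags=3):
    """Distance between sample autocorrelations and their theoretical
    ARMA(1,1) values under the null (psi, theta)."""
    y = y - y.mean()
    acf = np.array([np.dot(y[:-k], y[k:]) for k in range(1, lags + 1)]) / np.dot(y, y)
    rho1 = (1 + psi * theta) * (psi + theta) / (1 + 2 * psi * theta + theta ** 2)
    theo = rho1 * psi ** np.arange(lags)
    return np.sum((acf - theo) ** 2)

def mc_pvalue(y, psi, theta, N=99, rng=rng):
    """Monte Carlo test p-value for H0: the data follow ARMA(1,1) with (psi, theta)."""
    s0 = stat(y, psi, theta)
    sims = np.array([stat(simulate_arma11(psi, theta, len(y), rng), psi, theta)
                     for _ in range(N)])
    return (1 + np.sum(sims >= s0)) / (N + 1)

# Inverting the test: keep every grid point whose p-value exceeds 0.05.
y = simulate_arma11(0.9, 0.3, 300, rng)
grid = [(p, t) for p in np.arange(-0.99, 1.0, 0.01)
               for t in np.arange(-0.99, 1.0, 0.01)]
# conf_set = [(p, t) for (p, t) in grid if mc_pvalue(y, p, t) > 0.05]  # full sweep is slow
```

An empty `conf_set` corresponds to a rejection of the ARMA(1,1) specification as a whole, exactly as described above.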
In general, outcomes of our method for the series of the logarithmic transformation of oil prices (price series, henceforth), with both Gaussian and Student-t distributions, lead to similar conclusions at the two frequencies examined, in both nominal and real terms. Root cancelation is only present at the monthly frequency, on confidence sets resulting from the method with the OLS estimation of forward- and backward-looking components as auxiliary function. Root cancelation would invalidate standard inference; nevertheless, it is revealed by our method and does not bias our results. Non-empty and closed (one set) confidence sets are obtained for the price series in all cases examined.
We focus on the framework with the averaging objective function, which refutes root cancelation for the price series. This result reinforces the usefulness of our combined strategy. Figure 2 focuses on the fatter-tails calibration and the averaging objective function at the monthly nominal frequency and contains two diagrams: (i) the joint 95% confidence set and (ii) the impulse-response confidence bands. Additionally, the second diagram displays the MLE impulse-response function; although it lies within our confidence bands, relying on it alone can be misleading. The confidence set obtained with our method in Figure 2a is sharp and supports the ARMA parametrization. Although the projections of the MA parameter θ contain zero, the MA root ranges from −0.18 to 0.36. The AR parameter ψ takes values between 0.95 and 0.99. As a result, our impulse-response confidence bands in Figure 2b show a persistent effect of the shock on prices, which takes a decade or so to die down.
We refute the ARMA(1,1) specification on weekly nominal returns, with both the Gaussian and the Student-t distributions, when combined with the OLS estimation of forward- and backward-looking components as auxiliary function. This means that the model is not complex enough at this particular frequency. At the monthly nominal frequency, we also obtain an empty confidence set if the Gaussian distribution is combined with the same auxiliary function. In contrast, the model with the fatter-tails calibration is never rejected at the monthly frequency. Non-empty confidence sets, obtained with both distributions at the monthly frequency, are closed (one set) or disjoint (with two or three subsets) and exhibit shapes that are sharp and tight around root cancelation. Confidence sets are informative, since a large area of the parameter space is refuted. However, we cannot refute root cancelation, even with fatter tails. Figure 3 shows results with the fatter-tails calibration and the averaging method at the monthly nominal frequency of returns, following the same order as in the previous figure. Figure 3a exhibits the confidence set, which is sharp around root cancelation. The resulting impulse-response confidence bands in Figure 3b indicate that there is no persistence of the shock on returns, due to the effect of root cancelation. Taken together, our results support research at the frequency typically used in macro analysis for oil series, with an ARMA in levels, including the Gaussian shock hypothesis. Thus, we find that shocks to oil prices are permanent, taking about a decade or so to die down. Confidence sets obtained with oil returns are quite sharp around root cancelation, which affects predictability at the frequencies examined.
In addition, the ARMA specification is refuted at the weekly frequency of returns, which suggests that alternative specifications allowing for, e.g., conditional heteroskedasticity and skewness are interesting research paths at frequencies more relevant to finance than to macro applications.

Conclusions
Our framework delivers simultaneous impulse-response confidence bands for ARMA(1,1) processes by combining Indirect Inference, MC test methods and confidence set projections. We treat the proposed objective functions as test statistics, which we invert through a two-stage simulation to obtain joint confidence sets for the parameters. Invariance principles are proposed to ensure finite-sample exactness in general nuisance-parameter-dependent settings. In contrast to MLE-based tests, our proposed methods achieve size control and good power, irrespective of how close the parameters are to the unit-root boundary or whether root cancelation is present.
We investigate the persistence of shocks on oil prices and returns of the West Texas Intermediate (WTI) benchmark via projected impulse-response confidence bands. The main advantage of our method is the possibility of remaining agnostic with respect to the process followed by these series. It is noteworthy that the resulting impulse-response confidence bands seem well identified. Our results support a persistent ARMA model in levels. In contrast, confidence sets for monthly returns are tightly concentrated on the root cancelation line. The ARMA model, with or without fat tails, is rejected for the weekly return series. This noteworthy result reinforces the usefulness of confidence set methods that allow for empty outcomes. Lastly, our findings support the applicability of impulse-responses obtained from heavy-tailed distributions, which leaves the door open for further research, in particular with asymmetric distributions.