## 1. Introduction

Indirect inference refers to resampling-based statistical methods that use simulations to calculate estimating functions, including likelihoods, scores or moments. The idea of replacing complicated or intractable statistical functions by computer simulations motivated initial applications and the underlying theory; see Smith (1993); Gouriéroux et al. (1993); Gouriéroux and Monfort (1996) and Gallant and Tauchen (1996). Since then, available evidence suggests that indirect inference often works better than traditional methods in finite samples for bias or misspecification corrections in various situations.

We propose a two-stage framework that yields identification-robust confidence sets via Indirect Inference in finite samples. The primary focus is inference in Autoregressive Moving Average (**ARMA**) models and the production of simultaneous confidence bands for their impulse-response functions.

Despite the simplicity of ARMA models in time series, identification and boundary issues raise enduring complications for estimation and inference. First, Autoregressive (**AR**) unit roots cause the so-called non-uniform convergence problem, which means that methods requiring stationarity break down at, or close to, the unit boundary, whereas local-to-unity motivated methods perform poorly away from unity; see Hansen (1999); Mikusheva (2007); Andrews and Guggenberger (2010); Gorodnichenko et al. (2012); Mikusheva (2012, 2014) and Phillips (2014). Second, identification failure related to the invertibility of the Moving Average (**MA**) component causes the so-called pile-up effect, since estimating functions, including likelihoods, can have a local maximum at the invertibility boundary even when the true process is invertible; see Sargan and Bhargava (1983); Davis and Dunsmuir (1996); Gospodinov (2002); Genton and Ronchetti (2003); Billio and Monfort (2003); Davis and Song (2011); and Gospodinov and Ng (2015).

Third, root cancelation, also known as common factors or parameter redundancy (described, e.g., in Harvey (1981)), occurs when the AR polynomial and the MA polynomial have one or more roots in common. Root cancelation and near-root cancelation produce spurious inference related to lack of identification or weak identification of the parameters in question. Identification and inference concerns in this context have been extensively analyzed in the literature. For instance, Ansley and Newbold (1980) document the severe bias suffered by Maximum Likelihood Estimation (MLE) in finite samples. Galbraith and Zinde-Walsh (1997) and references therein focus on providing identification devices for ARMA models. Nelson and Startz (2007) emphasize that near-root cancelations lead to spurious inference regardless of sample size, even when asymptotic theory is expected to hold. More recently, see Andrews and Cheng (2012) and Andrews and Mikusheva (2014, 2015), most of which remains asymptotic.

Our two-stage simulation approach complements the asymptotic frameworks available in the literature, in the sense that it presents a finite-sample version; see Khalaf and Saunders (2019). Indirect Inference relies on a binding function that links parameters of interest to auxiliary parameters, for which simple estimators are available. The binding function admits a simulation-based approximation. Hence, we propose the following Ordinary Least Squares (OLS) auxiliary estimates: (i) least squares in two-sided regressions with leads and lags, (ii) least squares in long autoregressions (AR), and (iii) empirical autocorrelations. Methods with leads and lags generally aim to deal with feedback effects; see Stock and Watson (1993); Dufour and Torres (2000). AR-based approximations have long been popular for ARMA models; see Hannan and Rissanen (1982); Galbraith and Zinde-Walsh (1994, 1997); Ghysels et al. (2003); Guay and Scaillet (2003); Gospodinov and Ng (2015); Dufour and Pelletier (2014). Empirical autocorrelations and related measures, including cumulants and local projections, have recently been re-introduced from a minimum-distance perspective; see, e.g., Jorda and Kozicki (2011); Gorodnichenko et al. (2012); Gospodinov and Ng (2015) and references therein. Our two-stage simulation framework unifies all of the above for confidence-interval purposes.

The first simulation stage involves constructing a Wald-type minimum-distance metric that matches auxiliary parameters with corresponding estimates. Typically in Indirect Inference, standard asymptotic arguments are employed to derive confidence sets and hypothesis tests; however, these assume that parameters are identified. In contrast to the standard extremum estimation strategy, we propose to treat this Wald-type distance metric as a test statistic, which we invert rather than minimize. Inverting a test involves collecting the values of the parameter that are not rejected via this test at a given level of significance $\alpha $ to construct a confidence set. Exactness and identification robustness are achieved by applying the Monte Carlo (MC) test method from Dufour (2006) to compute simulated p-values in a second stage. This method achieves size control when the null distribution of the statistic, even if not computable analytically, does not depend on unknown or nuisance parameters. In our ARMA application, a nuisance-parameter-free distribution results from the scale invariance of OLS in AR settings, and from the standardization inherent to empirical autocorrelations. Lastly, the confidence set is projected to construct impulse-response confidence bands, which entails maximizing and minimizing each function over the non-rejected values of the parameters. The confidence intervals obtained are thus simultaneous by construction and robust to the aforementioned identification problems.

We conduct a simulation study on ARMA and MA processes and compare our method with MLE-based inference. Simple objective functions as in Galbraith and Zinde-Walsh (1994, 1997), as well as over-identified criteria, are considered, in addition to various lag structures. Results can be summarized as follows. In contrast to MLE, our method eradicates pile-up, root cancelation and boundary problems, even with very small samples of no more than 50 observations. MLE is oversized for both individual-parameter and joint tests. None of the estimating functions steadily outperforms the rest for all sample sizes and pairs tested. Thus, we incorporate a joint criterion, constructed by averaging the three proposed objective functions, to exploit the power advantages of each of them. Leads improve power for highly persistent processes. Overidentification yields important power benefits. On balance, autocorrelation-based methods require a larger sample to catch up power-wise with AR-based counterparts. Inverting the asymptotic Wald test based on MLE estimates is almost infeasible at corner solutions with parameters close to the unit boundary, since in many instances the variance-covariance matrix is not invertible. This result is expected and provides support for our research.

Our methodology is applied with a univariate perspective to investigate the persistence of oil shocks via impulse-response confidence bands. We select daily spot prices of the West Texas Intermediate (WTI), a high-quality, light sweet crude oil, delivered at Cushing, Oklahoma and priced in U.S. dollars per barrel. We focus on the logarithmic transformation of prices and returns at the weekly and monthly frequencies. The main advantage of our method is the possibility of remaining agnostic with respect to the process followed by these series. Among our results, we highlight that our simultaneous impulse-response confidence bands seem well identified, despite the presence of parameter uncertainty in these series. We find that root cancelation does not bias our results; however, it affects predictability, in particular by obscuring the behaviour of impulse-responses for oil returns. Hence, when the focus of the analysis is on predictability, researchers should favour the study of prices instead of growth in prices. We find that shocks to oil prices are permanent, taking about a decade or so to die down. Also, relying on MLE to estimate the effect of shocks when processes are highly persistent can mislead researchers when conducting estimation and forecasting. In contrast, our proposed impulse-response confidence bands represent an empirically relevant transformation of the joint confidence set.

We introduce our framework in Section 2 and discuss inference by test inversion in Section 3. The ARMA special case is presented in Section 4. Simulation results are presented in Section 5. An empirical application with focus on benchmark oil prices and returns is explored in Section 6, and we conclude in Section 7.

## 2. General Framework

Consider the general model

$$X\sim \mathcal{P}\left(\vartheta \right),\phantom{\rule{1em}{0ex}}\vartheta \in \Theta ,\phantom{\rule{2em}{0ex}}(1)$$

where $\mathcal{X}$ is the sample space, such that a sample of size T is represented as $X={({X}_{1},\dots ,{X}_{T})}^{\prime}\in \mathcal{X}$, $\mathcal{P}\left(\vartheta \right)$ is a probability distribution over $\mathcal{X}$ indexed by $\vartheta $, and $\Theta $ is the parameter space. We also assume that $\vartheta $ can be partitioned as

$$\vartheta ={({\beta}^{\prime},{\sigma}^{\prime})}^{\prime},\phantom{\rule{1em}{0ex}}\beta \in {\Theta}_{\beta},\phantom{\rule{0.5em}{0ex}}\sigma \in {\Theta}_{\sigma},$$

where ${\Theta}_{\beta}$ and ${\Theta}_{\sigma}$ are partitions of $\Theta $, $\beta $ is the q-dimensional parameter of interest, and $\sigma $ is a nuisance parameter, which we aim to partial out.

Pseudo-data can be simulated from (1), and an auxiliary parameter denoted $\lambda $, of dimension $p\ge q$, can be defined in a way that it is linked to $\beta $ via an exact or limiting binding function, $\lambda =\mathcal{L}\left(\vartheta \right)$. In general, $\mathcal{L}(.)$ need not have a known analytical form. Furthermore, a simple statistic is available to estimate $\lambda $ given X.

Denote by $\widehat{\lambda}=\widehat{\Lambda}\left(X\right)$ the considered data-based estimate of $\lambda $, where $\widehat{\Lambda}(.)$ is the estimating function. Its simulation-based counterpart, given a specific value of $\vartheta $, can be derived, for example, in this way: (i) creating H simulated paths, ${x}_{h}\left(\vartheta \right)$, each of size T, from (1) given $\vartheta $; (ii) applying the same estimation method to each simulated path, leading to a series of estimates $\widehat{\Lambda}\left({x}_{h}\left(\vartheta \right)\right)$, $h=1,\dots ,H$; and then (iii) computing their average as follows,

$$\Lambda \left(\vartheta \right)=\frac{1}{H}\sum _{h=1}^{H}\widehat{\Lambda}\left({x}_{h}\left(\vartheta \right)\right).\phantom{\rule{2em}{0ex}}(6)$$

The distance between $\widehat{\Lambda}\left(X\right)$ and $\Lambda \left(\vartheta \right)$ forms the basis of Indirect Inference. A common special case is the Wald-type distance measure ${(\widehat{\Lambda}\left(X\right)-\Lambda \left(\vartheta \right))}^{\prime}\widehat{\Omega}(\widehat{\Lambda}\left(X\right)-\Lambda \left(\vartheta \right))$, where $\widehat{\Omega}$ is a positive definite weighting matrix. Gouriéroux et al. (1993) suggest $\widehat{\Omega}={I}_{p}$, where ${I}_{p}$ denotes the p-dimensional identity matrix, because the loss in efficiency with respect to an optimal estimator can be disregarded. This practice is also suggested, more recently, by Gouriéroux et al. (2010).
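As a concrete illustration of steps (i)–(iii) and the Wald-type distance, the following minimal Python sketch uses a hypothetical zero-mean AR(1) data generating process standing in for model (1), with a long-AR(2) OLS fit as the auxiliary estimator $\widehat{\Lambda}(.)$; all function names are ours, not the paper's.

```python
import numpy as np

def simulate_path(beta, T, rng, sigma=1.0):
    # Hypothetical DGP standing in for model (1): a zero-mean AR(1),
    # x_t = psi * x_{t-1} + sigma * u_t, with beta = [psi].
    psi = beta[0]
    u = rng.standard_normal(T)
    x = np.empty(T)
    x[0] = sigma * u[0]
    for t in range(1, T):
        x[t] = psi * x[t - 1] + sigma * u[t]
    return x

def lambda_hat(x, p=2):
    # Auxiliary estimator: OLS coefficients of a long-AR(p).
    T = len(x)
    Y = x[p:]
    Z = np.column_stack([x[p - j:T - j] for j in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return coef

def binding_function(beta, T, H, rng):
    # Steps (i)-(iii): average the auxiliary estimates over H simulated
    # paths, giving the simulation-based calibration of the binding function.
    return np.mean([lambda_hat(simulate_path(beta, T, rng)) for _ in range(H)],
                   axis=0)

def wald_distance(x, beta, T, H, rng):
    # Wald-type distance with identity weighting, as suggested by
    # Gourieroux et al. (1993).
    d = lambda_hat(x) - binding_function(beta, T, H, rng)
    return float(d @ d)
```

With data generated at the true parameter, the distance tends to be small near that value and to grow away from it, which is the property exploited when the distance is later treated as a test statistic.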

#### 2.1. Nuisance Parameters

Focusing on $\beta $ as the parameter of interest requires a strategy to account for $\sigma $. In some settings, $\sigma $ can be easily estimated given $\beta $. Alternatively, an auxiliary estimate, or objective function that is exactly invariant to $\sigma $ can be considered. We emphasize the latter case in this study, defined via the following assumptions that characterize the model and its respective estimating functions.

**Assumption** **1.** In the context of model (1), an estimating function $\widehat{\Lambda}(.)$ is invariant to $\sigma $ if $\widehat{\Lambda}\left({X}^{*}\right)=\widehat{\Lambda}\left({X}^{\dagger}\right)$, where ${X}^{*}$ and ${X}^{\dagger}$ are samples from (1) with $\mathcal{P}\left({({\beta}^{\prime},{\sigma}_{*}^{\prime})}^{\prime}\right)$ and $\mathcal{P}\left({({\beta}^{\prime},{\sigma}_{\dagger}^{\prime})}^{\prime}\right)$, respectively, and ${\sigma}_{*}\ne {\sigma}_{\dagger}$ are two different values of $\sigma $.

This assumption ensures that the estimating function is unaffected by the transformation that maps ${X}^{*}$ into ${X}^{\dagger}$, which yields an invariant objective function. Indeed, applying Assumption **1** to the calibration in (6), we have $\Lambda \left({({\beta}^{\prime},{\sigma}_{*}^{\prime})}^{\prime}\right)=\Lambda \left({({\beta}^{\prime},{\sigma}_{\dagger}^{\prime})}^{\prime}\right)$. It follows that $\sigma $ can be set to any value in ${\Theta}_{\sigma}$ for simulation purposes, with no bearing on estimation nor inference, via objective functions of the form

$$S(X;\vartheta )={\left(\widehat{\Lambda}\left(X\right)-\Lambda \left(\vartheta \right)\right)}^{\prime}\left(\widehat{\Lambda}\left(X\right)-\Lambda \left(\vartheta \right)\right).\phantom{\rule{2em}{0ex}}(10)$$

In line with the above-cited literature, (10) uses an identity weighting matrix. Other choices are possible as long as $S(X;\vartheta )$ remains invariant to $\sigma $, or the distribution of $S(X;\vartheta )$ does not depend on $\sigma $ under the null hypothesis that fixes $\beta $.

A common special case is scale invariance, which implies robustness to multiplying the data by any positive scalar a; that is, **Assumption 1** holds with ${X}^{\dagger}=a{X}^{*}$, so that $\widehat{\Lambda}\left(a{X}^{*}\right)=\widehat{\Lambda}\left({X}^{*}\right)$, which in turn implies that $S(X;\vartheta )$ does not depend on the scale of the data.

Invariance also implies that in the simulation study, $\sigma $ can be set to any known value in its parameter space, since it will be partialled out. Hence, we simplify our notation by suppressing $\sigma $ in $S(X;\vartheta )$, $\Lambda \left(\vartheta \right)$ and $\widehat{\Lambda}\left(\vartheta \right)$, redefining them as $S(X;\beta )$, $\Lambda \left(\beta \right)$ and $\widehat{\Lambda}\left(\beta \right)$.
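The scale-invariance property can be checked numerically. The sketch below, using illustrative data and an arbitrary positive scale a playing the role of $\sigma $, verifies that empirical autocorrelations and long-AR OLS coefficients are unchanged when the sample is multiplied by a:

```python
import numpy as np

def empirical_autocorr(x, J):
    # rho_hat_j, j = 1..J; the standardization by the sample variance
    # makes these statistics exactly scale-invariant.
    xc = x - x.mean()
    denom = xc @ xc
    return np.array([(xc[j:] @ xc[:-j]) / denom for j in range(1, J + 1)])

def long_ar_ols(x, p):
    # OLS coefficients of a long-AR(p); invariant to x -> a * x (a > 0)
    # because regressand and regressors rescale by the same factor.
    T = len(x)
    Y = x[p:]
    Z = np.column_stack([x[p - j:T - j] for j in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return coef

rng = np.random.default_rng(1)
x = rng.standard_normal(300)   # illustrative data
a = 7.3                        # arbitrary positive scale (role of sigma)
print(np.allclose(empirical_autocorr(x, 4), empirical_autocorr(a * x, 4)))  # True
print(np.allclose(long_ar_ols(x, 4), long_ar_ols(a * x, 4)))                # True
```

This is why the weighting matrix in (10) is kept as the identity: a data-dependent weighting could break the exact invariance that the second-stage MC test relies on.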

## 3. Inference via Test Inversion

Standard applications of Indirect Inference resort to minimizing (10) and formulating adequate regularity conditions, so that the resulting estimator, denoted $\widehat{\beta}$, is (typically) asymptotically normal with a covariance matrix that can be estimated consistently. To compare and contrast our proposed inference method with common practices, and to define test inversion within a familiar setting, let us revisit the traditional $\alpha $-level confidence interval,

$$CI\left(g\left(\beta \right);\alpha \right)=\left[g\left(\widehat{\beta}\right)-{z}_{\alpha /2}{\widehat{\Sigma}}_{g}^{1/2},\phantom{\rule{0.5em}{0ex}}g\left(\widehat{\beta}\right)+{z}_{\alpha /2}{\widehat{\Sigma}}_{g}^{1/2}\right],$$

for a given scalar function $g\left(\beta \right)$, where ${\widehat{\Sigma}}_{g}^{1/2}$ denotes the relevant standard error estimate computed (for example, via the delta method) from the estimated asymptotic variance-covariance matrix of $\widehat{\beta}$, and ${z}_{\alpha /2}$ is a two-tailed standard normal critical point. As is well known, $CI\left(g\right(\beta );\alpha )$ is derived by solving the following inequality over $g\left({\beta}_{0}\right)$:

$$\left|g\left(\widehat{\beta}\right)-g\left({\beta}_{0}\right)\right|/{\widehat{\Sigma}}_{g}^{1/2}\le {z}_{\alpha /2}.\phantom{\rule{2em}{0ex}}(15)$$

Said differently, the solution of (15) inverts a t-type test of ${H}_{0}:g\left(\beta \right)=g\left({\beta}_{0}\right)$ based on the statistic $|g\left(\widehat{\beta}\right)-g\left(\beta \right)|/{\widehat{\Sigma}}_{g}^{1/2}$.

The null hypothesis may also cover the full model specification in conjunction with testing $g\left(\beta \right)$, in which case, the alternative hypothesis is no longer restricted to $g\left(\beta \right)\ne g\left({\beta}_{0}\right)$. Interpreting rejections when the model is jointly tested calls for caution; and when such a test is inverted, the outcome may be the empty set, which indicates that the model under the null is misspecified. The test that we invert below follows these principles, in the sense that model misspecification is covered by the alternative hypothesis.

Ideally, $\mathrm{P}\left[g\right(\beta )\in CI(g\left(\beta \right);\alpha \left)\right]\ge 1-\alpha $, at least as $T\to \infty $. However, identification failures or stationarity concerns will cause intervals of this form to severely under-cover, even with large samples. In our research, we depart from this approach and propose a confidence set for $\beta $ based on inverting a test of the form

$${H}_{0}\left({\beta}_{0}\right):\beta ={\beta}_{0},\phantom{\rule{2em}{0ex}}(17)$$

using (10), which we treat as a test statistic. Inverting a test of ${H}_{0}\left({\beta}_{0}\right)$ at a given level $\alpha $ consists in collecting, numerically or analytically, the ${\beta}_{0}$ values that are not rejected using the considered test at the specified level. For example, given the right-tailed test statistic

$$\mathcal{T}\left({\beta}_{0}\right)=S(X,{\beta}_{0})={\left(\widehat{\Lambda}\left(X\right)-\Lambda \left({\beta}_{0}\right)\right)}^{\prime}\left(\widehat{\Lambda}\left(X\right)-\Lambda \left({\beta}_{0}\right)\right),\phantom{\rule{2em}{0ex}}(18)$$

and assuming (for the moment and for illustrative purposes) that an $\alpha $-level cut-off point ${\mathcal{T}}_{c}$ is available, the test inversion involves solving, over ${\beta}_{0}$, the inequality $\mathcal{T}({\beta}_{0})<{\mathcal{T}}_{c}$. The solution of this inequality is a parameter space subset, denoted $CS(\beta ;\alpha )$, that satisfies:

$$\mathrm{P}\left[\beta \in CS(\beta ;\alpha )\right]\ge 1-\alpha .\phantom{\rule{2em}{0ex}}(19)$$

Since the finite sample distribution of (10) is intractable, and since there is no reason to expect that a cut-off point invariant to ${\beta}_{0}$ can be found, a numerical solution is required.

We propose collecting via numerical search, the ${\beta}_{0}$ values that are not rejected using a MC p-value that we will define in the next section, denoted $\widehat{p}\left({\beta}_{0}\right)$. In other words, we propose to search for the ${\beta}_{0}$ values for which $\widehat{p}\left({\beta}_{0}\right)>\alpha $. The result is a joint $\alpha $ -level confidence region denoted $CS(\beta ;\alpha )$, as mentioned before. The set thus derived can be empty, which will provide a diagnostic test of the hypothesized model. Said differently, an empty confidence set would occur when all values of $\beta $ are rejected at the considered level, meaning that in this case, the selected specification would be incompatible with the available data. Our proposed variant of Indirect Inference preserves its specification-robustness attributes. If identification is weak, the set will be unbounded or diffuse, which will also provide useful information regarding model fit (or lack thereof).
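This numerical search can be sketched as follows, with a hypothetical scalar AR(1) parameter for concreteness. The p-value uses the standard $({G}_{L}+1)/(L+1)$ rule from the MC test literature; the grid, lag choice and simulation sizes are illustrative, not the paper's settings.

```python
import numpy as np

def simulate_ar1(psi, T, rng):
    # Hypothetical illustrative DGP: zero-mean AR(1) with unit-variance
    # Gaussian errors (sigma fixed by invariance, as in Section 2.1).
    u = rng.standard_normal(T)
    x = np.empty(T)
    x[0] = u[0]
    for t in range(1, T):
        x[t] = psi * x[t - 1] + u[t]
    return x

def lam_hat(x):
    # Scalar auxiliary estimate: OLS slope of x_t on x_{t-1}.
    return float(x[:-1] @ x[1:] / (x[:-1] @ x[:-1]))

def stat_S(lam_data, beta0, T, H, rng):
    # S(., beta0): squared distance to the simulation-calibrated
    # auxiliary parameter, with identity weighting.
    lam_cal = np.mean([lam_hat(simulate_ar1(beta0, T, rng)) for _ in range(H)])
    return (lam_data - lam_cal) ** 2

def mc_p_value(x, beta0, T, H=3, L=99, seed=0):
    # MC p-value: L independent replications drawn under H0(beta0);
    # p_hat = (G_L + 1) / (L + 1), with G_L = #{S_l >= S(X, beta0)}.
    rng = np.random.default_rng(seed)
    s_data = stat_S(lam_hat(x), beta0, T, H, rng)
    s_null = [stat_S(lam_hat(simulate_ar1(beta0, T, rng)), beta0, T, H, rng)
              for _ in range(L)]
    G_L = sum(s >= s_data for s in s_null)
    return (G_L + 1) / (L + 1)

def confidence_set(x, grid, T, alpha=0.05):
    # Test inversion: keep the beta0 values that are not rejected.
    return [b for b in grid if mc_p_value(x, b, T) > alpha]
```

An empty list would flag misspecification, and a wide set would signal weak identification, exactly as discussed above.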

If a confidence interval for the individual components of $\beta $ or, more generally, for a given function $g\left(\beta \right)$ is desired, the above defined $CS(\beta ;\alpha )$ can be projected. This implies minimizing and maximizing $g\left(\beta \right)$ over the $\beta $ values in $CS(\beta ;\alpha )$. We study a special case of ARMA-based impulse-response functions that illustrates the usefulness of such projections.

#### 3.1. Exact p-Values

We propose applying Dufour’s (2006) MC method to compute the above $\widehat{p}\left({\beta}_{0}\right)$ empirical p-value. Given that the null distribution of the considered statistic is nuisance-parameter free, the MC method controls the type I error in finite samples, even if $S(X,{\beta}_{0})$ is itself simulation-based. This method is summarized as follows.

First, we introduce a new layer of simulation for each value of ${\beta}_{0}$: we draw L replications satisfying the null hypothesis in (17) from the model. These replications should be generated independently from the simulations underlying $S(X,{\beta}_{0})$. Next, we apply $\widehat{\Lambda}(.)$ to each replication l and compute a series of statistics that are similar to the one in (18), but now considering the vector of estimated parameters obtained from $\widehat{\Lambda}\left({x}_{l}\left({\beta}_{0}\right)\right)$, for each replication l, instead of the vector obtained by applying the estimation method to the data. We are able to use the same vector of calibrated parameters, $\Lambda \left({\beta}_{0}\right)$ from (18), due to the exchangeability property of the MC test; see Dufour (2006). Lastly, the empirical p-value associated with the null hypothesis is computed as

$$\widehat{p}\left({\beta}_{0}\right)=\frac{{G}_{L}+1}{L+1},$$

where ${G}_{L}$ denotes the number of times the statistic ${S}_{l}\left({x}_{l}\left({\beta}_{0}\right)\right)$ is greater than or equal to the data-based statistic $S(X,{\beta}_{0})$.

## 4. ARMA Special Case

We consider the zero-mean Gaussian ARMA(1,1) process of size T with unknown parameters $\beta =\{\theta ,\psi \}$, where $\left|\theta \right|<1$ guarantees invertibility, or an AR(∞) representation, $\left|\psi \right|<1$ ensures stationarity in causal processes, and the joint distribution of ${u}_{1},\dots ,{u}_{t},\dots ,{u}_{T}$ is known. For example, ${u}_{t}$ can be assumed independent and identically distributed with a standard normal distribution. This notation implies that ${X}_{0}$ is the pre-sample observation (i.e., it is not observed), in which case further assumptions are needed due to our indirect inference focus. In a stationary environment, ${X}_{0}$ should be drawn from the stationary distribution of the process. Alternatively, ${e}_{0}$ can be set to zero, which, maintaining (27), fixes ${X}_{0}$ to zero.
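A stationary-start simulation of an ARMA(1,1) path can be sketched as below. The sign convention on $\theta $ and the treatment of the pre-sample value are assumptions for illustration; the paper's equations (24) and (27) fix these details.

```python
import numpy as np

def simulate_arma11(theta, psi, T, rng, sigma=1.0, stationary_start=True):
    # Sketch of an ARMA(1,1) path, assuming the convention
    # x_t = psi * x_{t-1} + sigma * (u_t + theta * u_{t-1});
    # the paper's sign convention for theta in (24) may differ.
    u = rng.standard_normal(T + 1)        # u_0 is the pre-sample shock
    x = np.empty(T)
    if stationary_start:
        # Approximate stationary start: draw X_0 from the marginal
        # stationary distribution, whose variance is
        # sigma^2 * (1 + 2*theta*psi + theta^2) / (1 - psi^2).
        var0 = sigma ** 2 * (1 + 2 * theta * psi + theta ** 2) / (1 - psi ** 2)
        x_prev = np.sqrt(var0) * rng.standard_normal()
    else:
        x_prev = 0.0                      # the e_0 = 0 alternative: X_0 = 0
    for t in range(T):
        x[t] = psi * x_prev + sigma * (u[t + 1] + theta * u[t])
        x_prev = x[t]
    return x
```

Either initialization is compatible with the invariance argument of Section 2.1, since $\sigma $ can be fixed to one in the simulations.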

Gaussian and Student-$\mathsf{t}$ errors are considered in our Monte Carlo (MC) simulations and empirical application; the generalized lambda family of distributions, as considered in Gospodinov and Ng (2015), is another interesting parametrization. Non-zero means and exogenous variables can be projected out. With regard to the above notation, we maintain $\sigma $ as the nuisance parameter and we focus on inverting tests as in (17), for ${H}_{0}:\beta ={\beta}_{0}$.

In this context, for any positive scalar a and number of lags p, the minimizer of $Q(.)$ and the empirical autocorrelations are invariant to rescaling the data by a. These invariance properties guarantee that the argument that minimizes $Q(.)$ coincides numerically whether the data generating process (DGP) is drawn with unit variance or any other value of $\sigma $; the same holds for the empirical autocorrelations. To simplify the notation, we use the same number of leads and lags in $Q\left(X\right)$, although clearly, invariance to $\sigma $ does not require this restriction. The choice of the number of lags p is user defined, as is the case with all auxiliary-model-based methods. In contrast to most existing methods, the methodology that we propose controls the significance level for any given p, and thus regardless of the truncation order. This is discussed further in the context of our simulation study in Section 5.

Conformably, we consider three choices for our estimating function $\widehat{\Lambda}(.)$: (i) two-sided OLS estimation with forward- and backward-looking components; (ii) OLS estimation of a backward-looking, or long-AR, model; and (iii) empirical autocorrelations. For clarity, and given any sample vector $X={({X}_{1},\dots ,{X}_{T})}^{\prime}$ of size T, we define these choices so that $Q\left(X\right)$ corresponds to (28) and ${\widehat{\rho}}_{j}$ is given by (31).

For each choice of the estimating function, we construct a test statistic based on the distance between the previously estimated and the simulation-calibrated auxiliary parameters. In addition, to harvest all available information, we construct a fourth test statistic as a simple average of those three.

Because we focus on invariant auxiliary statistics, to compute their simulation-based counterparts for any given value of the parameters, it suffices to create H simulated paths, ${x}_{h}\left(\beta \right)={\left({x}_{h,1},\dots ,{x}_{h,t},\dots ,{x}_{h,T}\right)}^{\prime}$, from (24) with $\sigma $ set to one. Population autocorrelations need not be derived by simulation; yet invariance is explicitly imposed this way for any p. For the MA(1) special case, we also use a simplified auxiliary parameter: we fit a long-AR to observed and simulated data and retain the coefficient of the first lag ${X}_{t-1}$.
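The three auxiliary estimators can be sketched as plain OLS and sample-moment computations. The defaults below anticipate the choices reported in Section 5 ($\gamma_f=\gamma_b=4$, $\gamma_b=8$ and $j=8$); the function names are ours, not the paper's.

```python
import numpy as np

def _ols(Y, Z):
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return coef

def two_sided_ols(x, gf=4, gb=4):
    # (i) OLS of x_t on gb lags and gf leads
    # (forward- and backward-looking components).
    T = len(x)
    Y = x[gb:T - gf]
    lags = [x[gb - j:T - gf - j] for j in range(1, gb + 1)]
    leads = [x[gb + j:T - gf + j] for j in range(1, gf + 1)]
    return _ols(Y, np.column_stack(lags + leads))

def long_ar(x, gb=8):
    # (ii) OLS of x_t on gb lags (long-AR); its first coefficient is the
    # simplified auxiliary parameter used for the MA(1) special case.
    T = len(x)
    Z = np.column_stack([x[gb - j:T - j] for j in range(1, gb + 1)])
    return _ols(x[gb:], Z)

def autocorr(x, J=8):
    # (iii) empirical autocorrelations rho_hat_1, ..., rho_hat_J.
    xc = x - x.mean()
    d = xc @ xc
    return np.array([(xc[j:] @ xc[:-j]) / d for j in range(1, J + 1)])
```

Each estimator returns a vector of auxiliary parameters; applying the same function to observed and simulated paths, and averaging the latter, yields the data-based and calibrated vectors whose distance forms the test statistic.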

#### 4.1. Impulse-Response Confidence Bands

Impulse-response functions are the most empirically and policy-relevant transformations of ARMA parameters. If the stationarity condition is satisfied, (24) admits an MA(∞) representation using the lag operator L, and the empirical impulse-response coefficients can be computed following the well-known iterative process. These equations define a series of known functions of the vector $\beta =(\theta ,\psi )$, denoted ${g}_{i}\left(\beta \right)$, $i=1,\dots ,m$, that can be projected in turn, which implies minimizing and maximizing ${g}_{i}\left(\beta \right)$ over the $\beta $ values in $CS(\beta ;\alpha )$.
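For the ARMA(1,1), the iterative computation of the MA(∞) coefficients and the projection of $CS(\beta ;\alpha )$ can be sketched as follows, assuming the convention ${X}_{t}=\psi {X}_{t-1}+{u}_{t}+\theta {u}_{t-1}$; the confidence set is represented as a list of non-rejected $(\theta ,\psi )$ pairs.

```python
import numpy as np

def impulse_responses(theta, psi, m):
    # MA(infinity) coefficients g_i(beta) of an ARMA(1,1), assuming
    # x_t = psi * x_{t-1} + u_t + theta * u_{t-1}:
    # g_0 = 1, g_1 = psi + theta, g_i = psi * g_{i-1} for i >= 2.
    # (Under the opposite sign convention, replace theta by -theta.)
    g = np.empty(m + 1)
    g[0] = 1.0
    if m >= 1:
        g[1] = psi + theta
    for i in range(2, m + 1):
        g[i] = psi * g[i - 1]
    return g

def projected_bands(cs, m):
    # Projection: minimize and maximize each g_i over the non-rejected
    # (theta, psi) pairs in CS(beta; alpha).
    irfs = np.array([impulse_responses(th, ps, m) for th, ps in cs])
    return irfs.min(axis=0), irfs.max(axis=0)
```

Because every pair in the set satisfies the joint coverage statement, the resulting bands are simultaneous across horizons by construction.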

The resulting confidence intervals are simultaneous, in the following sense. Let ${g}_{i}\left(CS\left(\beta ;\alpha \right)\right)$ denote the image of $CS\left(\beta ;\alpha \right)$ under the function ${g}_{i}$. Then, (19) implies that

$$\mathrm{P}\left[{g}_{i}\left(\beta \right)\in {g}_{i}\left(CS\left(\beta ;\alpha \right)\right),\phantom{\rule{0.5em}{0ex}}i=1,\dots ,m\right]\ge 1-\alpha .$$

The simulation study reported below includes an illustrative analysis of such projections, emphasizing close-to-boundary values of the MA parameter $\theta $ and the AR parameter $\psi $, as well as robustness to root cancelation.

## 5. Simulation Study

We conduct MC simulation experiments to assess the performance of our method for MA(1) and ARMA(1,1) processes. Our objective is to study power and size when MA parameters are close to the non-invertibility region for both processes, and the AR parameter is close to the non-stationarity region for the ARMA. Our experiments are designed as follows. Simulated samples with T = 50, 100 and 200 observations, used here as pseudo-data, are drawn from the ARMA model in (24). First, time series of errors are drawn from a Gaussian, or a Student-t distribution with 5 degrees of freedom, setting $\sigma =1$, in line with Assumption $\mathbf{1}$. These choices of sample size correspond to samples employed in other MC studies, as well as data used in empirical applications.

The dimension of the auxiliary parameter is assessed by simulation. Power is studied by selecting various combinations of the number of auxiliary parameters and simulated paths explored in our model. Various tests are performed by changing the number of auxiliary parameters in our experiments (refer to the definitions in Section 2) from 1 to 3 and 8 for the MA, and from 6 to 8 and 12 for the ARMA. We take into consideration the recommendations from Galbraith and Zinde-Walsh (1994) and Ghysels et al. (2003) for the long-AR estimation, and Gorodnichenko et al. (2012) for the estimation with autocorrelations. We find that selecting 8 auxiliary parameters provides the highest power for all sample sizes. Hence, we suggest that in empirical practice the total number of auxiliary parameters be set to 8 for each of the choices of estimating function explained in Section 4: (i) ${\gamma}_{f}={\gamma}_{b}=4$ in the OLS estimation with forward- and backward-looking components, (ii) ${\gamma}_{b}=8$ in the OLS estimation of the long-AR, and (iii) $j=8$ in the estimation with empirical autocorrelations. In the case of the simplified auxiliary parameter estimation for the MA process, the long-AR is estimated with 8 parameters, but only the first coefficient is retained.

As part of preliminary experiments, we test the number of simulated paths in the first simulation stage, setting $H=1$ and $H=3$, as suggested by Ghysels et al. (2003) for the MA, and also $H=5$, to examine whether it could improve power for the ARMA (refer to the notation in Section 2). However, $H=3$ is finally selected, keeping the number of auxiliary parameters equal to 8, as the value which provides the highest power for all sample sizes. The number of replications in the second simulation step, involving the application of Dufour’s (2006) MC method, is $L=199$, and we consider a significance level $\alpha =0.05$ (see the notation in Section 3). An identity weighting matrix is used to preserve the scale invariance of the statistic associated with every choice of the estimating function. Lastly, 1000 replications are considered in each MC simulation experiment.

#### 5.1. Main Results

Each pair of $\{MA,AR\}$ parameters employed in the data generating process (DGP) of the ARMA(1,1) is denoted in our experiments as $\{{\theta}_{DGP},{\psi}_{DGP}\}$. Outcomes from all our simulation studies are available upon request. We focus on the following set of pairs to illustrate our results: $\{{\theta}_{DGP},{\psi}_{DGP}\}=\{\{0.6,0\};\{0.99,0\};\{0.65,-0.65\};\{0.99,0.99\}\}$. Note that we set the AR parameter ${\psi}_{DGP}$ equal to zero in (24) a priori to draw samples from MA(1) processes, for instance, when the value of ${\theta}_{DGP}$ is 0.6 or 0.99.

We report power in terms of empirical rejections, and our main results are summarized along the following lines. The results of the experiments conducted for MA(1) processes are presented in Table 1 and Table 2, and the studies related to the ARMA(1,1) process are shown in Table 3 and Table 4. We use abbreviations for each estimating function: OLS-FBLC, OLS-Long AR and AutoCorr correspond to the OLS estimation with forward- and backward-looking components, the OLS estimation of the long-AR, and the estimation with empirical autocorrelations, respectively. Size is controlled with our method for all sample sizes, choices of the estimating function, and for both Gaussian and Student-t errors, irrespective of how close the true parameters are to the unit root. It is well known that MLE gives size distortions when the sample size is small, for both individual-parameter and joint tests. Size seems to improve for the MLE when the sample size is increased and the parameters are far from the unit root regions. However, when the parameters approach the unit root regions, size fails for the MLE as the sample size increases, which confirms its asymptotic failure.

Results obtained in terms of power for processes with Student-t errors are in line with the ones obtained with Gaussian errors. This outcome confirms the robustness of our framework to fatter tails, since we purposely ignore the fact that the errors are drawn from the Student-t distribution when we apply our method.

Table 1 and Table 2 exhibit results from experiments on MA processes. The top sections of these tables compare our method (denoted SIM METHOD in the tables) with MLE in terms of size. Note that MLE is oversized, while size remains exact for our method. We can identify that the pile-up effect starts for values of the parameter $\theta $ higher than 0.6, in finite samples. Although our method does not produce size distortions, it cannot discriminate between tested values higher than 0.85 when the true DGP value is between 0.85 and 1. This result is expected and traced back to, e.g., Stock (1991) and references therein. Typically, when values of the parameter near the boundary (around 0.99) cannot be refuted, values substantially different from the boundary (for instance, 0.85) are also hard to refute in small samples. For small values of the parameter, the method with the simplified auxiliary parameter estimation, inspired by Galbraith and Zinde-Walsh (1994, 1997), shows an increase in power with respect to the OLS estimation of the long-AR. However, as the tested value of the parameter increases and gets closer to the unit root region, the latter method shows higher power than the former. This result indicates that overidentification is useful to increase power when we have higher persistence in MA(1) processes.

In general, the power of our method increases for the three estimating functions applied to the ARMA as the sample size increases, but none of them steadily shows the highest power for all sample sizes and pairs tested. When the AR parameter is closer to the unit root than the MA parameter and the sample is very small, 50 or 100 observations, our method applied with the OLS estimation with forward and backward looking components offers the highest power on the boundary. However, when the reverse occurs, the MA parameter is closer to the unit root than the MA parameter, our framework with the autocorrelations enables better discrimination between the true pair and any pair with values that are closer to the boundaries. Results obtained with the root cancelation pair,

$\{{\theta}_{DGP},{\psi}_{DGP}\}=\{0.65,-0.65\}$, are reported in Table 3. As before, the top section of the table compares our method versus MLE in terms of size. Size remains exact for our method in this case; however, MLE is oversized. When the OLS estimation with forward and backward-looking components is applied, it gives the highest power for almost all the pairs tested for a sample size of T = 50 observations. Once we increase the sample size to T = 100, this estimating function still shows the highest power, followed closely by the second choice of estimating function, the OLS estimation of the long-AR. When the sample size is T = 200, the power obtained with the autocorrelations comes close to the power obtained with the other two estimating functions.
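Root cancelation makes this pair observationally equivalent to white noise: with $\theta =-\psi $, the AR and MA lag polynomials of the ARMA(1,1) cancel. A minimal simulation check, assuming the sign convention $x_t = \psi x_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1}$ (sample size and seed are illustrative, not the paper's settings):

```python
import numpy as np

def simulate_arma11(theta, psi, n, seed=1, burn=200):
    """Simulate x_t = psi*x_{t-1} + e_t + theta*e_{t-1}, discarding a burn-in."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        x[t] = psi * x[t - 1] + e[t] + theta * e[t - 1]
    return x[burn:]

def acf1(x):
    """Sample first-order autocorrelation of a demeaned series."""
    x = x - x.mean()
    return float(x[:-1] @ x[1:] / (x @ x))

# the root cancelation pair {theta, psi} = {0.65, -0.65}:
# the simulated series behaves like white noise at lag 1
x = simulate_arma11(theta=0.65, psi=-0.65, n=5000)
```

This is exactly why standard ARMA inference breaks down on such pairs: the likelihood cannot distinguish them from a pure white-noise process.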

Lastly, we examine the almost-unit-root pair as the true DGP pair, $\{{\theta}_{DGP},{\psi}_{DGP}\}=\{0.99,0.99\}$, in Table 4. For most of the pairs tested, applying the method with the OLS estimation of the long-AR gives the highest power for sample sizes of T = 50 and T = 100. However, for the same sample sizes, using autocorrelations gives the highest power for the last three pairs tested, for which the value of the MA parameter is closer to the boundary than the value of the AR parameter. As we increase the sample size, power reaches similar values for our method with the three estimating functions.

#### 5.2. Robustness Checks

In addition to our previous experiments, we conducted simple robustness checks for our method. Two examples are shown in Table 5 and Table 6. We present the power values for two ARMA processes which are misspecified as ARMA(1,1), with a sample size of T = 100. In Table 5 we have an ARMA(2,1) with $\{{\theta}_{DGP},{\psi}_{1,DGP},{\psi}_{2,DGP}\}=\{0.30,0.60,0.85\}$, that is, a high-persistence AR(2) mixed with a low-persistence MA(1). In general, the method applied with autocorrelations offers more power in rejecting the pairs where the AR parameter is closer to the boundary than the MA parameter, while with the OLS estimation with forward and backward-looking components we obtain more power in rejecting the pairs where the MA parameter is closer to the boundary than the AR parameter.

The ARMA(1,2) in Table 6, with $\{{\theta}_{1,DGP},{\theta}_{2,DGP},{\psi}_{DGP}\}=\{0.30,0.85,0.60\}$, shows a high-persistence AR(1) mixed with an MA(2) which has higher persistence on its second parameter. In this case, we find that the method applied with both the OLS estimation of forward and backward-looking components and the OLS estimation of the long-AR is more powerful in rejecting the misspecification than with the autocorrelations. Note that in both examples, our three choices of estimating functions provide good power, rejecting the misspecification for all the pairs tested.

## 6. Empirical Application

Our two-stage simulation-based framework is applied with a univariate perspective to investigate the persistence of oil shocks via impulse-response confidence bands. Apart from supply and demand conditions, estimation and forecasting of oil prices are complicated by the effect of non-market characteristics, such as regulations, technology and geopolitical considerations; see Hamilton (1983, 1996, 2003, 2009); Kilian (2008a, 2008b, 2008c) and Kilian and Vigfusson (2017). Main concerns in the literature on oil prices include, among others, whether the impact of oil shocks is temporary or permanent and whether a particular model can offer better performance than a random walk. Most studies support analyzing the real price of oil based on its explanatory power for macroeconomic forecasting of output; see, for instance, Baumeister and Kilian (2012) and Alquist et al. (2013). However, Hamilton (2011) argues that nonlinear transformations of the nominal price of oil can reflect threshold responses based on consumer sentiment. In addition, he notes that deflating nominal prices by the Consumer Price Index (CPI) introduces measurement errors that can affect forecasting results. Hence, we explore the behaviour of oil shocks considering the logarithmic transformation of prices and returns (log-differences), at both the weekly and the monthly frequency, in nominal and real terms.

We have identified four lines of research where oil series are characterized as a mean-reverting process, in particular ARMA: (i) the study of long-run persistence in oil prices, (ii) the study of oil volatility persistence, (iii) DSGE calibration and (iv) the application of structural Vector Autoregression (VAR) family models. The first line of research focuses on oil price dynamics with long series and makes modeling decisions considering mean-reverting and random walk processes; see Pindyck (1999); Lee et al. (2006); Bernard et al. (2012); Gil-Alana and Gupta (2014). Oil volatility persistence is studied in the second line of research, either by fitting ARMA family models to volatility proxies based on transformations of oil returns, or by applying Generalized Autoregressive Conditional Heteroscedasticity (GARCH) family models directly to oil returns; see, e.g., Choi and Hammoudeh (2009); Charles and Darné (2014) and references therein. In the third line of research, DSGE models frequently apply calibrations by fitting ARMA family models to oil prices, for instance, in the work of Atkeson and Kehoe (1999) and Leduc and Sill (2004). Lastly, the literature applying VAR family models is very extensive and mostly dedicated to forecasting at short to medium horizons; see Pindyck (2004); Kilian (2009); Alquist and Kilian (2010); Baumeister and Kilian (2012); Alquist et al. (2013); Baumeister and Kilian (2014a, 2014b); Baumeister et al. (2014); Baumeister and Kilian (2015); Baumeister et al. (2015).

9 In VAR models, impulse-responses are analyzed either by applying simulations or by Bayesian methods.

10 Our simultaneous analysis is related to the research of Staszewska-Bystrova (2007, 2011, 2013); Jorda (2009); Staszewska-Bystrova and Winker (2013); Jorda and Marcellino (2010); Inoue and Kilian (2013, 2016); Guerron-Quintana et al. (2017) and Montiel et al. (2019). In contrast to the VAR literature, our proposed impulse-response confidence bands are compatible with small sample sizes and robust to identification failure. Our framework remains agnostic with respect to the process followed by oil prices and returns by allowing for processes with no persistence, spurious white noise related to root cancelation, heavy persistence, as well as almost unit roots. Hence, we are able to capture the range of movement of the series under various observationally equivalent processes, that is, through pairs of parameters that could generate the same log-likelihood function. This characteristic is related to the work by Sims (2001) and references therein, and Lubik and Schorfheide (2003, 2004).

#### 6.1. Data

Daily spot prices of West Texas Intermediate (WTI) crude oil in U.S. dollars per barrel are obtained from the webpage of the Energy Information Administration (EIA); see details in the reference EIA (2019). The series is selected for the period after deregulation and the OPEC-administered pricing system, from January 1986 to June 2019.11 Wednesday prices are selected for the weekly series, and the second Wednesday of the month is selected for the monthly series, to avoid any calendar effects, such as the Monday or weekend effect, see French (1980), the Friday effect, and the turn-of-the-month effect, see Ariel (1987). The behaviour of the weekly nominal spot price series for this benchmark is shown in Figure 1.

We explore the persistence of oil shocks in the logarithmic transformation of prices and returns (log-differences), at both the weekly and the monthly frequency, in nominal and real terms. Monthly nominal prices are deflated using the U.S. Consumer Price Index for All Urban Consumers: All Items (CPIAUCSL), published by the U.S. Bureau of Labor Statistics, taking May 2019 as the base date; see details in the reference U.S. Bureau of Labor Statistics (2019).
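The deflation step can be sketched as follows; this is a simplified illustration of deflating to base-period dollars, not the paper's data-handling code:

```python
def deflate(nominal, cpi, base_index):
    """Real price in base-period dollars: nominal_t * (CPI_base / CPI_t)."""
    cpi_base = cpi[base_index]
    return [p * cpi_base / c for p, c in zip(nominal, cpi)]
```

With the May 2019 CPI at `base_index`, the real series coincides with the nominal series only at the base date; elsewhere it rescales prices by cumulative inflation.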

12 Table 7 exhibits summary statistics for the selected (demeaned) samples. The series of returns exhibit higher levels of skewness and kurtosis than the logarithmic transformation of prices. Overall, the levels of skewness and kurtosis of these samples indicate the presence of heavy tails and values that are more clustered around zero.
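The moment-based measures reported in Table 7 can be computed as below; this is the plain central-moment (non-excess, unadjusted) definition, which may differ from any small-sample corrections used in the paper's tables:

```python
def skew_kurt(x):
    """Return (skewness, kurtosis) from central moments: m3/m2**1.5, m4/m2**2."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2
```

A Gaussian sample has skewness near 0 and kurtosis near 3; heavier tails push kurtosis above 3, as observed for the return series.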

The U.S. subprime mortgage crisis triggered an international financial crisis in 2008. According to the U.S. National Bureau of Economic Research (NBER), the U.S. recession is identified in terms of peaks and troughs from the last quarter of 2007 to the second quarter of 2009.

#### 6.2. Impulse-Response Confidence Bands for Oil Series

Our framework produces confidence sets and impulse-response confidence bands for oil series considering the Gaussian and the Student-t distribution, with degrees of freedom calibrated as 8, which are commonly used in macroeconomic and financial time series applications. Note that all series are demeaned before our two-stage simulation-based methodology is applied. To implement our method, we conduct a grid search on the four quadrants with a step of 0.01. Values of the parameters with a p-value higher than the level of significance $\alpha =0.05$ are selected to construct our confidence sets. There is no statistical reason to predict any particular shape of the confidence region. It is worth noting that an empty confidence set indicates a rejection of the null hypothesis for every pair, that is, a rejection of the parametric ARMA(1,1) specification as a whole. In this section, we highlight our main findings. All results are available upon request.
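The grid-search inversion just described can be sketched as follows; `mc_pvalue` is a hypothetical placeholder for the simulation-based Monte Carlo p-value of a chosen objective function, not the paper's actual implementation:

```python
import numpy as np

def invert_test(mc_pvalue, alpha=0.05, step=0.01, bound=0.99):
    """Keep every (theta, psi) grid pair whose p-value exceeds alpha.

    The grid covers the four quadrants of the parameter space; an empty
    result rejects the ARMA(1,1) specification as a whole.
    """
    grid = np.round(np.arange(-bound, bound + step / 2, step), 2)
    return [(t, p) for t in grid for p in grid if mc_pvalue(t, p) > alpha]
```

Projecting the retained set onto each axis then yields marginal confidence intervals for $\theta $ and $\psi $.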

In general, outcomes of our method for the series of the logarithmic transformation of oil prices (price series, henceforth), with both the Gaussian and the Student-t distribution, lead to similar conclusions at the two frequencies examined, in both nominal and real terms. Root cancelation is only present at the monthly frequency, on confidence sets resulting from the method with the OLS estimation of forward and backward-looking components as auxiliary function. Root cancelation would invalidate standard inference; nevertheless, it is revealed by our method and does not bias our results. Non-empty and closed (one set) confidence sets are obtained for the price series in all cases examined.

We focus on the framework with the averaging objective function, which refutes root cancelation for the price series. This result reinforces the usefulness of our combined strategy.

Figure 2 focuses on the fatter-tails calibration and the averaging objective function for the monthly nominal frequency and contains two diagrams: (i) the joint $95\%$ confidence set and (ii) the impulse-response confidence bands. Additionally, the second diagram displays the MLE impulse-response function, to illustrate that relying on this point estimate alone can be misleading, even though it lies within our confidence bands. The confidence set obtained with our method in Figure 2a is sharp and supports the ARMA parametrization. Although the projections of the MA parameter $\theta $ contain zero, the MA root ranges from −0.18 to 0.36. The AR parameter $\psi $ takes values between 0.95 and 0.99. As a result, our impulse-response confidence bands in Figure 2b show a persistent effect of the shock on prices, which takes a decade or so to die down.
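Under the convention $x_t = \psi x_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1}$ (assumed here), the impulse-response coefficients are $c_0 = 1$ and $c_h = \psi^{h-1}(\psi + \theta)$ for $h \geq 1$, so projected bands can be sketched by taking pointwise extremes of the IRF over all pairs retained in the confidence set:

```python
import numpy as np

def irf_arma11(theta, psi, horizon):
    """IRF of x_t = psi*x_{t-1} + e_t + theta*e_{t-1}: c0 = 1, ch = psi**(h-1)*(psi+theta)."""
    c = np.empty(horizon + 1)
    c[0] = 1.0
    for h in range(1, horizon + 1):
        c[h] = psi ** (h - 1) * (psi + theta)
    return c

def irf_bands(conf_set, horizon):
    """Pointwise min/max of the IRF over every (theta, psi) pair in the set."""
    irfs = np.array([irf_arma11(t, p, horizon) for t, p in conf_set])
    return irfs.min(axis=0), irfs.max(axis=0)
```

With $\psi $ projected between 0.95 and 0.99, every retained IRF decays slowly, which is what produces the decade-long bands in Figure 2b.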

We refute the ARMA(1,1) on weekly nominal returns, with both the Gaussian and the Student-t distributions, when combined with the OLS estimation of forward and backward-looking components as auxiliary function. This means that the model is not complex enough at this particular frequency. At the monthly nominal frequency, we also obtain an empty confidence set if the Gaussian distribution is combined with the same auxiliary function. In contrast, the model with the fatter-tails calibration is never rejected at the monthly frequency. Non-empty confidence sets, obtained with both distributions at the monthly frequency, are closed (one set) or disjoint (with two or three subsets) and exhibit shapes that are sharp and tight around root cancelation. Confidence sets are informative, since a large area of the parameter space is refuted. However, we cannot refute root cancelation, even with fatter tails.

Figure 3 shows results with the fatter-tails calibration and the averaging method for the monthly nominal frequency of returns, following the same order as in the previous figure. Figure 3a exhibits the confidence set, which is sharp around root cancelation. The resulting impulse-response confidence bands in Figure 3b indicate that there is no persistence of the shock on returns, due to the effect of root cancelation.

Taken together, our results support research at the frequency typically used in macro analysis for oil series, with an ARMA in levels, including the Gaussian shock hypothesis. Thus, we find that shocks to oil prices are persistent, taking about a decade or so to die down. Confidence sets obtained with oil returns are quite sharp around root cancelation, which affects predictability at the frequencies examined. In addition, the ARMA specification is refuted at the weekly frequency of returns, which suggests that alternative specifications allowing for, e.g., conditional heteroskedasticity and skewness are interesting research paths, at frequencies more relevant to finance than to macro applications.
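The "decade or so" can be cross-checked from the AR projection range: when the response is dominated by $\psi^{h}$, the number of months for it to fall below a fraction $f$ is $\ln f / \ln \psi$. A quick check (the 30% threshold is an illustrative choice, not the paper's criterion):

```python
import math

def months_to_decay(psi, fraction):
    """Months h at which psi**h falls to `fraction` (AR(1)-style decay)."""
    return math.log(fraction) / math.log(psi)

# psi = 0.99 needs roughly 120 months (a decade) to decay below 30%
```

For $\psi $ at the lower end of the projection, 0.95, the same decay takes about two years, so the band width in Figure 2b largely reflects the spread of $\psi $ within the confidence set.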

## 7. Conclusions

Our framework delivers simultaneous impulse-response confidence bands for ARMA(1,1) processes by combining Indirect Inference, MC test methods and confidence set projections. We treat the proposed objective functions as test statistics, which we invert through a two-stage simulation to obtain joint confidence sets for the parameters. Invariance principles are proposed to ensure finite-sample exactness in general nuisance-parameter-dependent settings. In contrast to MLE-based tests, our proposed methods achieve size control and good power, irrespective of how close the parameters are to the unit root boundary, or whether root cancelation is present.

We investigate the persistence of shocks on oil prices and returns of the West Texas Intermediate (WTI) benchmark via projected impulse-response confidence bands. The main advantage of our method is the possibility of remaining agnostic with respect to the process followed by these series. It is noteworthy that the resulting impulse-response confidence bands seem well identified. Our results support a persistent ARMA model in levels. In contrast, confidence sets for monthly returns are tightly concentrated on the root cancelation line. The ARMA, with or without fat tails, is rejected for the weekly return series. This noteworthy result reinforces the usefulness of confidence set methods that allow for empty outcomes. Lastly, our findings support the applicability of impulse-responses obtained from asymmetric and heavy-tailed distributions, which leaves the door open for further research.