## 1. Introduction

Almost all economic processes, like production, refinement or trading, involve some kind of transmission of input prices to output prices. For example, in the fuel market, crude oil prices are transmitted to the gasoline prices paid by consumers at retail filling stations. Such a price transmission is said to be asymmetric if its characteristics differ between periods of increasing and decreasing prices. It is frequently suspected that oil refining companies, due to their market power, tend to delay passing on decreases in crude oil prices while transmitting increases immediately. In standard economic theory, such asymmetric price transmission is considered the result of a market failure that should be avoided.

Various statistical methods have been developed to test whether price transmission in a given market is asymmetric.¹ All approaches based on historical time series face the problem that price series usually follow non-stationary processes. The literature distinguishes between short-run asymmetry, which refers to asymmetric reactions to transitory price movements, and long-run asymmetry, which refers to different speeds of adjustment after equilibrium errors. Models of the latter type require a cointegration relationship between input prices and output prices.

In the original residual-based cointegration model by Engle and Granger (1987), the only type of cointegrating relation allowed was a static linear equation whose stationarity was assessed through a single adjustment coefficient. Since this cannot capture asymmetries in price transmission, Enders and Granger (1998) and Enders and Siklos (2001) devised the concept of threshold cointegration, which allows the cointegrating relation to revert to its long-run equilibrium at two different speeds of adjustment. Their first model specification is based on the self-exciting threshold autoregressive (SETAR) model introduced by Tong (1978). Here, the speed of adjustment depends on whether the deviation from equilibrium is above or below some threshold value. Alternatively, the momentum threshold autoregressive (MTAR) model replaces the deviation from equilibrium by its first difference, hence allowing for different speeds of adjustment depending on whether the momentum lies above or below a threshold value. In both cases, the existence of a cointegration relationship can be confirmed by a residual-based test for cointegration. However, empirical studies on asymmetric price transmission are mainly interested in a formal test for the equality of the two adjustment coefficients (Galeotti et al. 2003; Godby et al. 2000; Grasso and Manera 2007; Mohammadi 2011; Simioni et al. 2013). A rejection of the null hypothesis then constitutes statistical evidence for asymmetric long-run adjustment.

Enders and Siklos (2001) state that ‘... the null hypothesis of symmetric adjustment (i.e., ${\varrho}_{1}={\varrho}_{2}$) can be tested using a standard F-distribution’ as long as a cointegration relationship can be confirmed.

The approach outlined above leads to a hierarchy of two tests: the primary aim is to reject the null hypothesis of no cointegration, and only if this null is rejected is the test for asymmetry conducted. This would not be a serious problem if evidence for cointegration were easy to obtain. However, tests for cointegration usually have very low power, i.e., they often fail to reject the null hypothesis of no cointegration even when it is false, so that the test for asymmetry is performed only in rather special situations (see, for example, Karoglou and Morley 2012; Payne and Waters 2008; Thompson 2006). In this paper, we demonstrate by means of simulation experiments that tests for asymmetry in SETAR and MTAR models excessively reject their null hypothesis of symmetry in small samples and for slow adjustment rates. The extent of the oversizing appears to be inversely related to the power of the primary threshold cointegration test. Furthermore, we find that the size and power properties of standard F-tests for asymmetry vary considerably depending on whether a fixed or an optimizing threshold is used. We provide simulation evidence that bootstrapping the test statistic leads to the correct size of the test while maintaining suitable power properties.

The performance of tests for asymmetry in residual-based cointegration models has been investigated in prior simulation studies. Von Cramon-Taubadel and Meyer (2000) diagnose a bias towards overrejecting the null hypothesis of symmetry. However, the authors deal with the effects of structural breaks, which, from the perspective of the test, constitute a misspecification. Hence, their results do not in general render the test invalid. Other papers point in the opposite direction, towards a tendency to underreject the hypothesis of symmetry. Cook et al. (1999) investigate the power of tests for asymmetry and find that it is typically low but seems to increase with the sample size. Galeotti et al. (2003) state that tests for asymmetry are biased toward accepting the null of symmetry in small samples, without providing references or simulation evidence for this finding. They suggest bootstrapping the F-statistic but neither describe the bootstrap algorithm in detail nor evaluate its properties. Grasso and Manera (2007) also use a bootstrap test for asymmetry. Honarvar (2010) conducts extensive simulation experiments for several asymmetric error correction models.² These models measure the potentially asymmetric adjustment contributed by upstream and downstream prices, whereas the adjustment of each variable is not explicitly modelled in our simulation experiments. Still, his results provide further evidence that tests for asymmetry have low power against the alternative of threshold adjustment. It is argued that this is caused by the Engle-Granger two-step procedure, which produces biased estimates of the cointegration vector in small samples. Additionally, Norman (2008) investigates the small-sample properties of threshold estimators in two-regime SETAR models. He finds that the estimator is biased in small samples because of its imprecision and because the threshold is restricted to the range of the data.

In the remainder of the paper, we briefly describe the threshold cointegration model and the corresponding tests for asymmetry in Section 2, discuss our simulation experiments in Section 3 and provide an empirical illustration using fuel market data in Section 4. Section 5 concludes and offers some recommendations for practitioners.

## 2. Models and Tests

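For their residual-based cointegration test, Enders and Siklos (2001) follow the Engle-Granger two-step procedure and estimate the long-run equilibrium equation

$$y_t = \beta_0 + \beta_1 x_t + z_t \tag{1}$$

for integrated processes $x_t$ and $y_t$ by OLS. Under the null hypothesis of no cointegration, the error term $z_t$ is assumed to be generated by a unit root process,

$$\Delta z_t = \sum_{j=1}^{k}\gamma_j \Delta z_{t-j} + \epsilon_t, \tag{2}$$

while $z_t$ is assumed to follow a stationary SETAR/MTAR model under the alternative,

$$\Delta z_t = I_t \varrho_1 z_{t-1} + (1-I_t)\varrho_2 z_{t-1} + \sum_{j=1}^{k}\gamma_j \Delta z_{t-j} + \epsilon_t. \tag{3}$$

The Heaviside indicator variable $I_t$ is specified according to the TAR model,

$$I_t = \begin{cases} 1 & \text{if } z_{t-1} \ge \tau \text{ (SETAR) or } \Delta z_{t-1} \ge \tau \text{ (MTAR)},\\ 0 & \text{otherwise}. \end{cases} \tag{4}$$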

Petruccelli and Woolford (1984) have shown that the SETAR process is stationary if $\varrho_1<0$, $\varrho_2<0$ and $(1+\varrho_1)(1+\varrho_2)<1$ hold, while Lee and Shin (2000) have proven that the MTAR process is stationary if $\varrho_1<0$, $\varrho_2<0$, $(1+\varrho_1)(1+\varrho_2)<1$, $(1+\varrho_1)(1+\varrho_2)^2<1$ and $(1+\varrho_1)^2(1+\varrho_2)<1$ hold.
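
The stationarity conditions above are easy to mistype when setting up a simulation. The following Python sketch simply encodes them as a check on a pair of adjustment coefficients. The function name `tar_stationary` is hypothetical, and the check covers only the conditions quoted here; it is not a general ergodicity test.

```python
def tar_stationary(rho1: float, rho2: float, model: str = "SETAR") -> bool:
    """Check the stationarity conditions quoted in the text.

    SETAR: Petruccelli and Woolford (1984); MTAR: Lee and Shin (2000).
    """
    phi1, phi2 = 1.0 + rho1, 1.0 + rho2      # AR(1) coefficient in each regime
    setar_ok = rho1 < 0 and rho2 < 0 and phi1 * phi2 < 1
    if model == "SETAR":
        return setar_ok
    if model == "MTAR":
        # The momentum version adds two further product conditions
        return setar_ok and phi1 * phi2 ** 2 < 1 and phi1 ** 2 * phi2 < 1
    raise ValueError("model must be 'SETAR' or 'MTAR'")


# Slow symmetric adjustment is stationary under both specifications
print(tar_stationary(-0.1, -0.1, "SETAR"), tar_stationary(-0.1, -0.1, "MTAR"))  # True True
# This pair satisfies the SETAR conditions but violates (1 + rho1)(1 + rho2)^2 < 1
print(tar_stationary(-0.5, -2.9, "SETAR"), tar_stationary(-0.5, -2.9, "MTAR"))  # True False
```

The second example shows that the MTAR region is strictly smaller than the SETAR region, so a coefficient pair chosen for a SETAR experiment should not be reused for an MTAR experiment without re-checking.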

To test for cointegration, Enders and Siklos (2001) recommend estimating the linear regression in Equation (3) and conducting an F-test ($F_{CI}$) of the null hypothesis that both coefficients are zero,

$$H_{01}: \varrho_1 = \varrho_2 = 0. \tag{5}$$

While a straightforward alternative would be to evaluate the maximum t-statistic, Enders and Siklos (2001) provide simulation evidence that the F-test has considerably more power.

Since the regressors $I_t z_{t-1}$ and $(1-I_t)z_{t-1}$ are orthogonal, we can write the test statistic as

$$F_{CI} = \frac{t_1^2 + t_2^2}{2}, \tag{6}$$

where $t_1$ and $t_2$ are the t-ratios for $\hat{\varrho}_1$ and $\hat{\varrho}_2$, respectively. Considering the stationarity regions for SETAR and MTAR processes, the null should only be rejected if both coefficients have the correct (negative) sign. This means that large values of the F-statistic do not lead to a rejection of the null hypothesis if at least one coefficient is positive.³ Consequently, the alternative hypothesis takes the form

$$H_{11}: \varrho_1 < 0 \text{ and } \varrho_2 < 0. \tag{7}$$

Critical values for rejecting the null hypothesis are much larger than those from the conventional F-distribution, and the power of this test is quite poor for small samples and slow adjustment. Given that the test decision is in favor of cointegration, the test for asymmetry amounts to an F-test of the null hypothesis

$$H_{02}: \varrho_1 = \varrho_2 \tag{8}$$

against the alternative

$$H_{12}: \varrho_1 \neq \varrho_2 \tag{9}$$

using standard critical values (Enders and Siklos 2001). We denote the F-statistic by $F_{apt}$ and the corresponding critical value by $F_{apt}^{*}$. To simplify notation, we only implicitly assume that both coefficients have a negative sign if $H_{02}$ is rejected, i.e., in the event that $F_{apt}>F_{apt}^{*}$. The test for linear cointegration against threshold cointegration is therefore based on the composite hypothesis

$$H_0: \varrho_1 = \varrho_2 < 0 \quad \text{vs.} \quad H_1: \varrho_1 \neq \varrho_2,\ \varrho_1 < 0,\ \varrho_2 < 0. \tag{10}$$

The size of the composite test, i.e., the probability of rejecting the null hypothesis of linear cointegration when $H_0$ holds true, is given by

$$P(F_{CI}>F_{CI}^{*}\cap F_{apt}>F_{apt}^{*}\mid H_0) = P(F_{apt}>F_{apt}^{*}\mid F_{CI}>F_{CI}^{*}, H_0)\,P(F_{CI}>F_{CI}^{*}\mid H_0). \tag{11}$$

Rearranging the equation yields

$$P(F_{apt}>F_{apt}^{*}\mid F_{CI}>F_{CI}^{*}, H_0) = \frac{\alpha}{P(F_{CI}>F_{CI}^{*}\mid H_0)}, \tag{12}$$

where $\alpha = P(F_{CI}>F_{CI}^{*}\cap F_{apt}>F_{apt}^{*}\mid H_0)$ is the chosen nominal level of significance and $P(F_{CI}>F_{CI}^{*}\mid H_0)$ is the power curve of the Enders-Siklos cointegration test under symmetric adjustment as a function of the sample size and the speed of adjustment. We conclude that the test for asymmetry only maintains the chosen level of significance if the power of the primary cointegration test is unity, which we know is not the case for small samples or adjustment coefficients close to zero. If, for example, the power of the cointegration test is 0.8, the actual size of the test for asymmetry exceeds the nominal level by the factor $1/0.8 = 1.25$.
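
The inflation factor is simply the reciprocal of the power of the cointegration pretest. A minimal Python sketch, using a hypothetical helper `conditional_size` and a few assumed power levels, makes the arithmetic explicit:

```python
def conditional_size(alpha: float, power_ci: float) -> float:
    """Actual size of the asymmetry test given that cointegration was found.

    Evaluates alpha / P(F_CI > F_CI* | H0), where power_ci is the (assumed)
    power of the primary cointegration test under symmetric adjustment.
    """
    if not 0.0 < power_ci <= 1.0:
        raise ValueError("power_ci must lie in (0, 1]")
    return alpha / power_ci


for power in (1.0, 0.8, 0.5, 0.25):
    size = conditional_size(0.05, power)
    print(f"power {power:.2f}: conditional size {size:.4f}, factor {size / 0.05:.2f}")
```

With $\alpha = 0.05$ and a power of 0.8 the conditional size is $0.05/0.8 = 0.0625$, the factor 1.25 stated above; at a power of 0.25 the factor rises to 4, which illustrates why slow adjustment in small samples is the critical case.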

The difficulty of obtaining the correct size for tests for asymmetry with cointegration pretesting can also be illustrated geometrically. In general, the acceptance region for a joint hypothesis on uncorrelated coefficients ${\varrho}^{+}$ and ${\varrho}^{-}$ is an ellipse whose axes are parallel to the coordinate axes. Since $F_{CI} = (t_1^2+t_2^2)/2 \le Q_F(1-\alpha)$ is equivalent to $(\hat{\varrho}_1/se(\hat{\varrho}_1))^2 + (\hat{\varrho}_2/se(\hat{\varrho}_2))^2 \le 2Q_F(1-\alpha)$, the end-to-end length along one axis is $d_1 = 2\,se(\hat{\varrho}_1)\sqrt{2Q_F(1-\alpha)}$ and along the other axis $d_2 = 2\,se(\hat{\varrho}_2)\sqrt{2Q_F(1-\alpha)}$, where $Q_F(\cdot)$ denotes the quantile function of $F_{CI}$ and the origin corresponds to the null hypothesis (here: ${\varrho}^{+}={\varrho}^{-}=0$). In this particular model, we can assume without loss of generality that the standard errors of ${\varrho}^{+}$ and ${\varrho}^{-}$ have the same magnitude. Hence, the ellipse becomes a circle with radius $r = d_1/2 = d_2/2$.

Figure 1 displays a sketch of the acceptance region in the Cartesian plane. Since the null hypothesis of no cointegration is not rejected if one of the coefficients is positive, we restrict the analysis to the third quadrant of the plane. Choosing a higher confidence level enlarges the circle, so that its boundary moves further away from the origin. The null hypothesis of symmetric adjustment is represented by the dashed bisector, and its acceptance region is indicated by two dotted lines. The composite rejection region is located outside the circle and outside both dotted lines. We have added three points to the plane to illustrate the properties of the composite test. Point A corresponds to a coefficient combination that should not lead to a rejection of symmetric adjustment. Point B lies within the acceptance region of the cointegration test. Although the combination of ${\varrho}^{+}$ and ${\varrho}^{-}$ is asymmetric, we cannot reject the null hypothesis of no cointegration, and the test for equality of coefficients is not conducted. If we maintain the same ratio of ${\varrho}^{+}$ to ${\varrho}^{-}$ and move to Point C, the $F_{CI}$ statistic increases and the null hypothesis of symmetric adjustment is rejected. Close to the origin, the acceptance region of the symmetry test is wide relative to the area of all possible combinations of ${\varrho}^{+}$ and ${\varrho}^{-}$, so it is more difficult to reject symmetric adjustment there, whereas it is much easier to reject far from the origin. Compared with the composite test, the unconditional test for asymmetry does not account for the exclusion of the cointegration test's acceptance region, a region in which the null hypothesis of symmetry is hardly ever rejected although the coefficients differ. Consequently, using critical values obtained from the standard F-distribution leads to size distortions.
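
Both statistics of the composite test come from a single OLS fit of the TAR regression in Equation (3). The following numpy-only sketch computes them as Wald-type F-statistics; the function name `tar_tests` and its arguments are illustrative assumptions. It does not supply critical values: those for $F_{CI}$ are nonstandard, and $H_{01}$ should only be rejected if $F_{CI}$ exceeds them and both estimates are negative.

```python
import numpy as np


def tar_tests(z, tau=0.0, model="MTAR", k=0):
    """OLS fit of dz_t = I_t r1 z_{t-1} + (1 - I_t) r2 z_{t-1} + sum_j g_j dz_{t-j} + e_t.

    Returns the coefficients, their t-ratios, F_CI (H0: r1 = r2 = 0) and
    F_apt (H0: r1 = r2), both computed as Wald-type F-statistics.
    """
    if model not in ("SETAR", "MTAR"):
        raise ValueError("model must be 'SETAR' or 'MTAR'")
    z = np.asarray(z, dtype=float)
    t = np.arange(max(2, k + 1), len(z))            # indices with all lags available
    z_lag = z[t - 1]
    signal = z_lag if model == "SETAR" else z[t - 1] - z[t - 2]
    ind = (signal >= tau).astype(float)              # Heaviside indicator I_t
    cols = [ind * z_lag, (1.0 - ind) * z_lag]
    cols += [z[t - j] - z[t - j - 1] for j in range(1, k + 1)]   # lagged differences
    X, y = np.column_stack(cols), z[t] - z_lag
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    n, p = X.shape
    cov = (resid @ resid / (n - p)) * np.linalg.inv(X.T @ X)
    t_ratios = b / np.sqrt(np.diag(cov))

    def wald_f(R):
        Rb = R @ b
        return float(Rb @ np.linalg.solve(R @ cov @ R.T, Rb) / R.shape[0])

    R_ci = np.eye(2, p)                              # selects r1 and r2
    R_apt = np.zeros((1, p))
    R_apt[0, :2] = [1.0, -1.0]                       # r1 - r2
    return b, t_ratios, wald_f(R_ci), wald_f(R_apt)


rng = np.random.default_rng(42)
z = np.zeros(100)
for s in range(1, 100):
    z[s] = 0.9 * z[s - 1] + rng.standard_normal()     # symmetric adjustment, rho = -0.1
b, t_ratios, f_ci, f_apt = tar_tests(z, tau=0.0, model="MTAR")
both_negative = bool(np.all(b[:2] < 0))              # H01 may only be rejected if True
print(f_ci, (t_ratios[0] ** 2 + t_ratios[1] ** 2) / 2, f_apt, both_negative)
```

Without lagged differences, $X'X$ is diagonal, so `f_ci` coincides with $(t_1^2+t_2^2)/2$ up to rounding. Because the Wald statistic uses the unrestricted error variance, `f_apt` also equals the SSE-based F-statistic for the single restriction $\varrho_1=\varrho_2$.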

In order to improve the ability of the cointegration test to detect cointegration, Enders and Siklos (2001) suggest following Chan (1993) and selecting the threshold $\tau$ by minimizing the SSE of regression (3). In practice, this is achieved by estimating many regressions with fixed $\tau$, where $\tau$ runs through all values of the cointegration residual series (SETAR) or its first difference (MTAR). Chan (1993) has shown that this procedure yields a superconsistent estimator of the threshold value, which we denote by $\tau^{*}$. Usually, a $100q$ percent lateral trimming is applied to prevent $\tau^{*}$ from lying too close to the extreme values, thereby ensuring a sufficient number of observations in each regime. This procedure, however, has important implications for subsequent tests for asymmetry, as the threshold parameter is not identified under the null hypothesis. Considering a specification with no additional lags, the F-statistic takes the form

$$F(\tau) = \frac{SSE_1 - SSE_2(\tau)}{SSE_2(\tau)/(T - p_2)}, \tag{13}$$

where $SSE_1$ denotes the sum of squared errors of the symmetric model (${\varrho}^{+}={\varrho}^{-}$), $SSE_2(\tau)$ denotes the sum of squared errors of the asymmetric model, which depends on the threshold parameter $\tau$, $T$ is the number of observations used in the regression and $p_2$ denotes the number of parameters in the asymmetric model (here $p_2=2$). Therefore, the F-statistic is a function of the nuisance parameter $\tau$. Since $SSE_1$ is fixed, $F(\tau)$ increases as $SSE_2(\tau)$ decreases. Optimizing the threshold value with respect to the SSE criterion therefore generally leads to oversized tests for asymmetry if the standard F-distribution, which is incorrect in these cases, is assumed (Hansen 1996, 1999).
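
The grid search and its effect on the F-statistic can be sketched in a few lines of numpy. This is a minimal illustration assuming no lagged differences (the case of Equation (13)), candidate thresholds equal to the observed values of the threshold variable, and a trimming share `trim`; all function names are hypothetical.

```python
import numpy as np


def _sse(X, y):
    """Sum of squared OLS residuals of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return float(r @ r)


def _pieces(z, model):
    """Return dz_t, z_{t-1} and the threshold variable (SETAR: z_{t-1}, MTAR: dz_{t-1})."""
    z = np.asarray(z, dtype=float)
    t = np.arange(2, len(z))
    z_lag = z[t - 1]
    signal = z_lag if model == "SETAR" else z[t - 1] - z[t - 2]
    return z[t] - z_lag, z_lag, signal


def f_asym(z, tau, model="MTAR"):
    """F(tau) = (SSE_1 - SSE_2(tau)) / (SSE_2(tau) / (T - p_2)) without lagged differences."""
    dz_t, z_lag, signal = _pieces(z, model)
    n, p2 = len(dz_t), 2
    sse1 = _sse(z_lag[:, None], dz_t)                        # symmetric model
    ind = (signal >= tau).astype(float)
    sse2 = _sse(np.column_stack([ind * z_lag, (1 - ind) * z_lag]), dz_t)
    return (sse1 - sse2) / (sse2 / (n - p2))


def f_asym_optimized(z, model="MTAR", trim=0.15):
    """Grid search over trimmed candidate thresholds; returns (tau*, F(tau*))."""
    _, _, signal = _pieces(z, model)
    cand = np.sort(signal)
    n = len(cand)
    grid = cand[int(np.floor(trim * n)):int(np.ceil((1 - trim) * n))]
    fs = np.array([f_asym(z, tau, model) for tau in grid])
    i = int(np.argmax(fs))      # max F(tau) <=> min SSE_2(tau) because SSE_1 is fixed
    return float(grid[i]), float(fs[i])


rng = np.random.default_rng(0)
z = np.zeros(100)
for s in range(1, 100):
    z[s] = 0.9 * z[s - 1] + rng.standard_normal()    # symmetric adjustment, rho = -0.1
tau_star, f_opt = f_asym_optimized(z, "MTAR", trim=0.15)
f_fixed = f_asym(z, 0.0, "MTAR")
print(tau_star, f_opt, f_fixed)
```

Even though the simulated adjustment is symmetric, `f_opt` can never fall below the F-statistic of any threshold that yields the same regime split as a grid point (here including $\tau=0$), because the search picks the smallest $SSE_2(\tau)$. Comparing it with a standard F critical value therefore rejects too often.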

One way to obtain correctly sized tests is to bootstrap the distribution of the F-statistic. We employ a residual bootstrap algorithm similar to the procedure described in Hansen (1996) for SETAR processes and in Caner and Hansen (2001) for MTAR processes with a potential unit root. The algorithm is designed as follows:

- (1)
Estimate the long-run equilibrium equation to obtain ${\widehat{\beta}}_{0}$, ${\widehat{\beta}}_{1}$ and the cointegration residuals ${\widehat{z}}_{t}$. Conduct the F-test for asymmetry based on the MTAR/SETAR model and save ${F}_{apt}$.

- (2)
Estimate the symmetric model $\Delta\hat{z}_t = \varrho\hat{z}_{t-1} + \sum_{j=1}^{k}\gamma_j\Delta\hat{z}_{t-j} + \epsilon_t$ to obtain $\hat{\varrho}$, $\hat{\gamma}_1,\dots,\hat{\gamma}_k$ and save the residuals $\hat{\epsilon}_t$.

- (3)
Draw randomly with replacement from the residuals $\hat{\epsilon}_t$ to obtain a bootstrap sample $\epsilon_t^b$.

- (4)
Generate the bootstrap cointegration residual series as $\Delta z_t^b = \hat{\varrho} z_{t-1}^b + \sum_{j=1}^{k}\hat{\gamma}_j \Delta z_{t-j}^b + \epsilon_t^b$ and use $(z_1^b,\dots,z_{k+1}^b) = (\hat{z}_1,\dots,\hat{z}_{k+1})$ as initial observations, since the recursion requires $z_{t-1}^b$ and $k$ lagged differences.

- (5)
Generate the bootstrap variable ${y}_{t}^{b}={\widehat{\beta}}_{0}+{\widehat{\beta}}_{1}{x}_{t}+{z}_{t}^{b}$.

- (6)
Estimate the long-run equilibrium equation for ${y}_{t}^{b}$ and ${x}_{t}$ and re-estimate the MTAR/SETAR model to compute the bootstrap F-statistic, ${F}_{apt}^{b}$.

- (7)
Repeat steps (3) to (6) $B$ times to obtain the empirical distribution of $F_{apt}^b$. Compute the p-value for $F_{apt}$ from this bootstrap distribution.
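
Steps (1) to (7) can be sketched in numpy as follows. This is an illustration under stated assumptions, not the authors' code: the threshold $\tau$ is fixed, the p-value is taken as the share of bootstrap statistics at least as large as the observed one, and `f_apt` and `bootstrap_f_apt` are hypothetical names.

```python
import numpy as np


def _ols(X, y):
    """OLS coefficients and residuals."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, y - X @ b


def f_apt(z, tau=0.0, model="MTAR", k=0):
    """F-statistic for H0: rho1 = rho2 in the TAR regression with k lagged differences."""
    t = np.arange(max(2, k + 1), len(z))
    z_lag, dz_t = z[t - 1], z[t] - z[t - 1]
    lags = [z[t - j] - z[t - j - 1] for j in range(1, k + 1)]
    signal = z_lag if model == "SETAR" else z[t - 1] - z[t - 2]
    ind = (signal >= tau).astype(float)
    X1 = np.column_stack([z_lag] + lags)                            # symmetric
    X2 = np.column_stack([ind * z_lag, (1 - ind) * z_lag] + lags)   # asymmetric
    sse1 = np.sum(_ols(X1, dz_t)[1] ** 2)
    sse2 = np.sum(_ols(X2, dz_t)[1] ** 2)
    return float((sse1 - sse2) / (sse2 / (len(t) - X2.shape[1])))


def bootstrap_f_apt(y, x, tau=0.0, model="MTAR", k=0, B=499, seed=None):
    """Residual bootstrap p-value for F_apt following steps (1) to (7)."""
    rng = np.random.default_rng(seed)
    y, x = np.asarray(y, float), np.asarray(x, float)
    T = len(y)
    L = np.column_stack([np.ones(T), x])
    beta, z_hat = _ols(L, y)                                 # step (1)
    f_obs = f_apt(z_hat, tau, model, k)
    t = np.arange(k + 1, T)                                  # step (2): symmetric model
    lags = [z_hat[t - j] - z_hat[t - j - 1] for j in range(1, k + 1)]
    coef, eps = _ols(np.column_stack([z_hat[t - 1]] + lags), z_hat[t] - z_hat[t - 1])
    rho, gamma = coef[0], coef[1:]
    f_boot = np.empty(B)
    for i in range(B):
        e_b = rng.choice(eps, size=T, replace=True)          # step (3)
        z_b = np.empty(T)
        z_b[: k + 1] = z_hat[: k + 1]                        # step (4): initial values
        for s in range(k + 1, T):
            lag_part = sum(gamma[j - 1] * (z_b[s - j] - z_b[s - j - 1]) for j in range(1, k + 1))
            z_b[s] = z_b[s - 1] + rho * z_b[s - 1] + lag_part + e_b[s]
        y_b = beta[0] + beta[1] * x + z_b                    # step (5)
        _, z_b_hat = _ols(L, y_b)                            # step (6)
        f_boot[i] = f_apt(z_b_hat, tau, model, k)
    return f_obs, float(np.mean(f_boot >= f_obs))            # step (7)


rng = np.random.default_rng(7)
T = 120
x = np.cumsum(rng.standard_normal(T))                # simulated I(1) input price
z = np.zeros(T)
for s in range(1, T):
    z[s] = 0.8 * z[s - 1] + rng.standard_normal()
y = 1.0 + x + z
f_obs, p_value = bootstrap_f_apt(y, x, model="MTAR", k=1, B=199, seed=1)
print(f_obs, p_value)
```

The bootstrap series are generated under symmetric adjustment with the estimated $\hat{\varrho}$, so the resulting distribution of $F_{apt}^b$ reflects the null of symmetry together with the first-step estimation of the long-run equation.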

The performance of bootstrapped tests for asymmetry is evaluated in the following section.

## 3. Simulations

In order to investigate the empirical size of tests for asymmetry in fixed threshold and optimizing threshold models, we simulate a large number of symmetrically cointegrated time series $x_t$ and $y_t$ and report the probability that the test, performed at the 5% level, falsely detects asymmetry. The data-generating process is given by

where $\beta_0=0$, $\beta_1=1$ and $\sigma^2=1$. Note that we thereby conform completely to the modelling framework of Enders and Siklos (2001): the long-run equilibrium equation is not misspecified, and the error terms are normally distributed and iid and, in particular, not serially correlated, since the number of lagged differences $\Delta z_{t-j}$ is known in advance.⁴

We first estimate an MTAR model with a known threshold of $\tau=0$ and display the rejection rates as a function of the statistic $F_{CI}$ obtained from the primary cointegration test. The results are plotted in the upper panel of Figure 2. The rejection rates are calculated from 20,000 replications using a sample size of $T=100$. We observe that the test is severely oversized for $\varrho=-0.1$ and that the extent of oversizing increases with the value of the F-statistic of the primary cointegration test (solid line in Figure 2). This means that in empirical applications with small samples in which a cointegration relationship is confirmed, we automatically tend to report asymmetric adjustment. It is not surprising that a test performed conditional on the outcome of a primary test does not match its nominal level of significance. The striking point here is that the mismatch always favors falsely detecting asymmetry and that this discrepancy seems to be inversely related to the p-value of the cointegration test. In contrast, we do not find any size distortions that depend on the primary cointegration test for moderate adjustment coefficients ($\varrho<-0.5$), where the power of the cointegration test is close to unity. In those cases, the empirical size is slightly below the nominal size of 5%. Further simulations, which are not reported, show that the size distortions vanish as the sample size increases, providing further evidence that they are inversely related to the power of the primary cointegration test. As expected from our theoretical derivations, we do not find any size distortions if the power of the cointegration test reaches unity.
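
The conditioning effect can be reproduced in miniature with the sketch below. It rests on assumptions of ours, not necessarily the exact design behind Figure 2: $x_t$ is a Gaussian random walk, $y_t = x_t + z_t$ with $\Delta z_t = \varrho z_{t-1} + \epsilon_t$ and no lagged differences, the MTAR threshold is fixed at $\tau=0$, and the $F_{CI}$ critical value is simulated under a unit root instead of being taken from tables. All function names are hypothetical, and no particular output values are claimed.

```python
import numpy as np


def tar_stats(z, tau=0.0):
    """(F_CI, F_apt, both_negative) for an MTAR regression without lagged differences."""
    t = np.arange(2, len(z))
    z_lag, dz_t = z[t - 1], z[t] - z[t - 1]
    ind = (z[t - 1] - z[t - 2] >= tau).astype(float)        # MTAR indicator I_t
    X = np.column_stack([ind * z_lag, (1 - ind) * z_lag])
    b, *_ = np.linalg.lstsq(X, dz_t, rcond=None)
    resid = dz_t - X @ b
    s2 = resid @ resid / (len(t) - 2)
    se = np.sqrt(s2 / np.sum(X ** 2, axis=0))               # X'X is diagonal here
    t1, t2 = b / se
    f_ci = (t1 ** 2 + t2 ** 2) / 2
    f_apt = (b[0] - b[1]) ** 2 / (se[0] ** 2 + se[1] ** 2)   # Wald F for rho1 = rho2
    return f_ci, f_apt, bool(b[0] < 0 and b[1] < 0)


def simulate_pair(rho, T, rng):
    """Assumed DGP: x_t Gaussian random walk, y_t = x_t + z_t, dz_t = rho z_{t-1} + e_t."""
    x = np.cumsum(rng.standard_normal(T))
    e = rng.standard_normal(T)
    z = np.zeros(T)
    for s in range(1, T):
        z[s] = (1.0 + rho) * z[s - 1] + e[s]
    return x + z, x                                          # beta_0 = 0, beta_1 = 1


def residual_stats(y, x, tau=0.0):
    """Engle-Granger first step, then the MTAR statistics on the residuals."""
    L = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(L, y, rcond=None)
    return tar_stats(y - L @ coef, tau)


def mc_conditional_size(rho=-0.1, T=100, R=2000, alpha=0.05, seed=0):
    """Return (share with cointegration found, unconditional size, conditional size)."""
    rng = np.random.default_rng(seed)
    # Simulated stand-in for tabulated F_CI critical values (unit root, rho = 0)
    f_null = [residual_stats(*simulate_pair(0.0, T, rng))[0] for _ in range(R)]
    ci_crit = np.quantile(f_null, 1 - alpha)
    # Standard F(1, n - 2) critical value with n = T - 2 regression observations
    apt_crit = np.quantile(rng.f(1, T - 4, 200_000), 1 - alpha)
    stats = [residual_stats(*simulate_pair(rho, T, rng)) for _ in range(R)]
    found = [s for s in stats if s[0] > ci_crit and s[2]]     # cointegration detected
    uncond = float(np.mean([s[1] > apt_crit for s in stats]))
    cond = float(np.mean([s[1] > apt_crit for s in found])) if found else float("nan")
    return len(found) / R, uncond, cond


share, uncond, cond = mc_conditional_size(rho=-0.1, T=100, R=1000)
print(share, uncond, cond)
```

The first returned value is a Monte Carlo estimate of the power $P(F_{CI}>F_{CI}^{*}\mid H_0)$; comparing the unconditional and conditional rejection rates shows how conditioning on a detected cointegration relationship shifts the size of the asymmetry test.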

Repeating our simulations for optimizing threshold values with 15% lateral trimming yields very different results, which are displayed in the lower panel of Figure 2. Although we still notice oversizing caused by requiring evidence for cointegration, we find that the test for asymmetry in the MTAR model with optimizing threshold values is already substantially oversized for moderate adjustment coefficients. Since the threshold is not identified when adjustment is symmetric, the conditional least squares procedure seems to spuriously select threshold values that artificially generate asymmetric adjustment estimates and produce a substantial difference in the sum of squared errors between the symmetric and the asymmetric specification (see Equation (13)). Moreover, as reported in Table 1, the extent of oversizing increases with the sample size and cannot be sufficiently controlled by extensive lateral trimming.

We now turn to the SETAR models. The results with a known threshold $\tau=0$, displayed in the upper panel of Figure 3, differ surprisingly from the fixed threshold MTAR results. The empirical size is close to zero across a wide range of adjustment coefficients, which makes it difficult to assess the dependence on the primary cointegration test. However, taking the generally low power of the test for asymmetry in SETAR models into account, we still observe the effect of the power curve inflation factor. This result is also in line with prior studies reporting an underrejection of the null hypothesis (see, for example, Cook et al. 1999; Galeotti et al. 2003). Conversely, the test for asymmetry in SETAR models using optimizing threshold values, displayed in the lower panel of Figure 3, behaves similarly to the MTAR model. In this case, it is possible, although not recommended, to control the size of the test by using stronger lateral trimming (see Table 1).

Finally, we analyze the empirical size and power of the proposed bootstrap tests for different combinations of ${\varrho}^{+}$ and ${\varrho}^{-}$. A Monte Carlo experiment involving bootstrap procedures requires $B, R\to\infty$, where $R$ is the number of Monte Carlo replications and $B$ is the number of bootstrap draws. For a fixed number of bootstrap replications, each additional Monte Carlo replication requires another $B$ bootstrap re-estimations, so the overall computational cost grows with the product $B\cdot R$. To avoid this inefficiency, we use the ‘warp-speed’ bootstrap described in Giacomini et al. (2013). The authors provide a formal proof that it is sufficient to draw only one bootstrap replication in each Monte Carlo replication and to evaluate the statistic of interest against the resulting bootstrap distribution of size $R$. The results for fixed and optimizing threshold values are displayed in Table 2 and Table 3. We observe that the bootstrap F-tests maintain approximately the correct size and show suitable power properties both for fixed threshold values and for optimizing threshold values with 15% lateral trimming. However, we still find size distortions originating from the hierarchical testing principle (not reported).
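
The warp-speed idea is generic, so it can be sketched independently of the threshold model. In the numpy sketch below (function names hypothetical), each Monte Carlo replication contributes one original statistic and a single bootstrap statistic, and the pooled bootstrap statistics provide the critical value. To keep the block short, the demonstration uses a toy zero-mean test rather than the paper's design; plugging in $F_{apt}$ and one pass of bootstrap steps (3) to (6) would follow the same pattern.

```python
import numpy as np


def warp_speed_rejection_rate(simulate, statistic, resample_once, R=2000, alpha=0.05, seed=None):
    """Warp-speed Monte Carlo rejection rate with one bootstrap draw per replication."""
    rng = np.random.default_rng(seed)
    stats, boot = np.empty(R), np.empty(R)
    for r in range(R):
        data = simulate(rng)                            # one Monte Carlo sample
        stats[r] = statistic(data)                      # statistic on the original sample
        boot[r] = statistic(resample_once(data, rng))   # a single bootstrap statistic
    # The R pooled bootstrap statistics form the bootstrap distribution
    crit = np.quantile(boot, 1.0 - alpha)
    return float(np.mean(stats > crit))


# Toy check (not the paper's design): test of a zero mean when H0 is true
def simulate(rng, n=50):
    return rng.standard_normal(n)


def statistic(d):
    return np.sqrt(len(d)) * abs(d.mean()) / d.std(ddof=1)


def resample_once(d, rng):
    return rng.choice(d - d.mean(), size=len(d), replace=True)   # impose H0 by centring


size = warp_speed_rejection_rate(simulate, statistic, resample_once, R=2000, seed=0)
print(size)
```

The cost is $2R$ evaluations of the statistic instead of $R(B+1)$, which is what makes a size and power grid over many $({\varrho}^{+},{\varrho}^{-})$ combinations affordable.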

## 4. Application: (A)symmetric Fuel Price Transmissions

In the following, we investigate the potentially asymmetric price transmission from crude oil prices to retail gasoline prices in the US, Canada, France, Great Britain, Germany and Italy. We apply the threshold cointegration model to the data and test for asymmetry using conventional and bootstrapped critical values. Our weekly data cover the period from January 2010 until September 2017. WTI is the leading crude oil benchmark for the North American market, and Brent takes that role for the European market. Both crude oil price series and the gasoline prices excluding taxes and duties are obtained from Thomson Reuters Datastream; the Canadian prices are obtained from the Kent Group database. With 405 observations and assuming a moderate speed of adjustment, the power of the residual-based cointegration test is approximately unity; hence, the first effect described in this paper (inflation of the significance level by conditioning on the primary cointegration test) should not play a role.

The results for the MTAR and SETAR specifications are reported in Table 4. All crude oil and retail gasoline price pairs are cointegrated. We compare the results of the conventional test for long-run asymmetry with those of the bootstrap test described in Section 2. In general, the p-values of the conventional F-tests are higher than those obtained by bootstrapping for the fixed threshold models and lower for the models using optimizing threshold values. This is not surprising, considering that our simulation experiments have shown that conventional tests for long-run asymmetry are undersized for fixed thresholds and oversized for optimizing threshold values. If we use bootstrap-corrected p-values, we only find evidence for asymmetric adjustment at the 5% significance level in the case of the Great Britain retail gasoline market (MTAR model with optimal threshold value). For the US retail gasoline market and the SETAR specification, we would reject the null hypothesis of symmetric adjustment with conventional F-tests but not with our bootstrap algorithm. The same holds for Canada, France and Italy if we employ an MTAR model with optimal threshold values.