Correction published on 5 February 2020, see Econometrics 2020, 8(1), 4.
Article

Return and Risk of Pairs Trading Using a Simulation-Based Bayesian Procedure for Predicting Stable Ratios of Stock Prices

by David Ardia (1,2), Lukasz T. Gatarek (3), Lennart Hoogerheide (4) and Herman K. Van Dijk (4,5,*)
1 Institute of Financial Analysis, University of Neuchatel, Neuchatel, 2000, Switzerland
2 Department of Finance, Insurance and Real Estate, Laval University, Quebec City, G1V 0A6, Canada
3 Institute of Econometrics and Statistics, Faculty of Economics and Sociology, University of Lodz, Lodz, 90-255, Poland
4 Department of Econometrics and Tinbergen Institute, Vrije Universiteit Amsterdam, Amsterdam, 1081 HV, The Netherlands
5 Econometric Institute, Erasmus University Rotterdam, Rotterdam, 3062 PA, The Netherlands
* Author to whom correspondence should be addressed.
Econometrics 2016, 4(1), 14; https://doi.org/10.3390/econometrics4010014
Submission received: 3 September 2015 / Revised: 26 January 2016 / Accepted: 28 January 2016 / Published: 10 March 2016
(This article belongs to the Special Issue Computational Complexity in Bayesian Econometric Analysis)

Abstract

We investigate the direct connection between the uncertainty in estimated stable ratios of stock prices and the risk and return of two pairs trading strategies: a conditional statistical arbitrage method and an implicit statistical arbitrage method. We introduce a simulation-based Bayesian procedure for predicting stable stock price ratios, defined in a cointegration model. Using this class of models and the proposed inferential technique, we are able to connect estimation and model uncertainty with the risk and return of stock trading. In terms of methodology, we show the effect of using an encompassing prior under an orthogonal normalization. This prior is shown to be equivalent to a Jeffreys’ prior. We consider its effect first on the selection of pairs of cointegrated stock prices and second on the estimation and prediction of the spread between cointegrated stock prices. We distinguish between models with a normal and a Student-t distribution, since the latter typically provides a better description of daily price changes on financial markets. As an empirical application, we use stocks that are components of the Dow Jones Composite Average index. The results show that normalization has little effect on the selection of pairs of cointegrated stocks on the basis of Bayes factors. However, the results stress the importance of the orthogonal normalization for the estimation and prediction of the spread (the deviation from the equilibrium relationship). Compared with a standard linear normalization, it leads to better results in terms of profit per capital engagement and risk.

1. Introduction

In this paper we consider statistical arbitrage strategies. Such strategies presume that patterns observed in historical data will be repeated in the future. That is, statistical arbitrage is an advanced but purely descriptive approach designed to exploit market inefficiencies.
Khandani and Lo [1] consider a specific strategy, first proposed by Lehmann [2] and Lo and MacKinlay [3], that can be analyzed directly using individual equity returns. Given a collection of securities, they consider a long/short market-neutral equity strategy consisting of equal dollar amounts of long and short positions. At each rebalancing interval, the long positions are made up of “losers” (underperforming stocks, relative to some market average) and the short positions are made up of “winners” (outperforming stocks, relative to the same market average). By buying yesterday’s losers and selling yesterday’s winners at each date, such a strategy actively bets on mean reversion across all stocks, profiting from reversals that occur within the rebalancing interval. For this reason, such strategies have been called “contrarian” trading strategies that benefit from market overreaction, i.e., when underperformance is followed by positive returns and vice versa for outperformance. The same key idea is the basis of pairs trading strategies, which constitute another form of statistical arbitrage.
The idea of pairs trading relies on a long-term equilibrium between a pair of stocks. If such an equilibrium exists, then it is presumed that a specific linear combination of prices reverts to zero. A trading rule can then be set up to exploit the temporary deviations (the spread) to generate profit. When the spread between two assets is positive, the spread is sold; that is, the outperforming stock is shorted and a long position is opened in the underperforming stock. In the opposite case, when the spread is negative, the spread is bought. Gatev et al. [4] investigate the performance of this arbitrage rule over a period of 40 years and find strong empirical evidence in its favor. Precise estimation of the current and expected spread between the stock prices is therefore fundamental for pairs trading.
In this paper we interpret the spread as the temporary deviation from the equilibrium in a cointegration model. Equilibrium in a cointegration model is interpreted as time series behavior characterized by stable (in other words, stationary) long-run relations, to which the actual series return after temporary deviations. This approach differs from that of Gatev et al. [4], who implement a nonparametric framework. These authors choose a matching partner for each stock by finding the security that minimizes the sum of squared deviations between the two normalized price series. Pairs are thus formed by exhaustive matching between normalized daily prices, where price includes reinvested dividends. In the cointegration analysis that we perform, by contrast, the spread between two assets is modeled as the temporary deviation from the long-run stable relations among the time series of asset prices. This deviation is computed as a linear combination of stock prices, where the weights in the linear combination are given by the cointegrating vector. Long-run stability also implies that there exists a finite uncertainty in the predictability of stock prices that can be used in devising trading strategies. Therefore, pairs trading strategies are strongly dependent on the stability of ratios of pairs of stocks.
The estimated and predicted spreads are both computed from the estimated cointegration model. We introduce a simulation-based Bayesian estimation procedure that allows us to combine estimation and model uncertainty in a natural way with the decision uncertainty associated with a decision process such as a trading strategy. For the Bayesian estimation of the cointegration model, we work with a Metropolis-Hastings (M-H) type of sampler derived under an encompassing prior. We show that, under certain conditions, the encompassing prior is equivalent to the well-known Jeffreys’ or information matrix prior. This sampling algorithm was derived by Kleibergen and Van Dijk [5] for the Simultaneous Equations Model and extended by Kleibergen and Paap [6] to the cointegration model. The latter authors specify a linear normalization to identify the parameters in the model. However, Strachan and Van Dijk [7] point to possible distortions of prior beliefs associated with the linear normalization. Moreover, in our application we find that the distribution of the spread is particularly sensitive to the choice of normalization.
Therefore we make use of an alternative normalization, the orthogonal normalization, in order to identify the parameters in the cointegration model. Given that one is usually only interested in a linear combination of price series, this normalization is a natural one since it treats the variables in the series in a symmetric way. More details are given in Section 3. Hence, we implement the M-H sampler for the cointegration model under this normalization. We compare the performance of the pairs trading strategy under the orthogonal normalization with the performance of the counterpart under the linear normalization and find that, for our set of data, the orthogonal normalization is highly favored over the linear normalization with respect to the profitability and risk of the trading strategies.
The results imply that, within the statistical arbitrage approach of pairs trading based on the cointegration model, the normalization is not merely a useful device that eases parameter identification. It is an important part of the model itself.
To take into account the non-normality of the conditional distribution of daily returns, we extend our approach from the normal distribution to the Student-t distribution.
The outline of the paper is as follows. In Section 2 the conditional and implicit statistical arbitrage approaches are discussed. In Section 3 our Bayesian analysis of the cointegration model under the encompassing prior is explained. In Section 4 we consider an empirical application using stocks in the Dow Jones Composite Average index. Section 5 concludes. The appendices contain technical derivations and additional tables with detailed results from our empirical application.

2. Pairs Trading: Implicit and Conditional Statistical Arbitrage

Suppose that there exists a statistical fair price relationship [8] between the prices $y_{t,1}$ and $y_{t,2}$ of two stocks, where the spread
$$s_t = \beta_1 y_{t,1} + \beta_2 y_{t,2} \tag{1}$$
is the deviation from this statistical fair price relationship, or “statistical mispricing”, at the end of day $t$. In this paper we consider two types of trading strategies that are based upon the existence of such a long-run equilibrium relationship: conditional statistical arbitrage (CSA) and implicit statistical arbitrage (ISA). Here we use the classification of Burgess [8]. We implement these strategies so that the holding is updated at the end of each day and then kept constant for a day. In the CSA strategy the desired holding at the end of day $t$ is given by
$$CSA(s_t, k) = \operatorname{sign}\big(E(\Delta s_{t+1} \mid I_t)\big)\, \big|E(\Delta s_{t+1} \mid I_t)\big|^{k}, \tag{2}$$
where $I_t$ is the information set at the end of day $t$, and where we consider $k = 0$ and $k = 1$. A positive value of $CSA(s_t, k)$ means that we buy $CSA(s_t, k)$ spreads, and a negative value means that we short $-CSA(s_t, k)$ spreads. That is, if $\beta_1 > 0$ and $\beta_2 < 0$, then a positive value of $CSA(s_t, k)$ means that we buy $\beta_1 \times CSA(s_t, k)$ of stock 1 and short $(-\beta_2) \times CSA(s_t, k)$ of stock 2. For $k = 1$ the intuition of the CSA strategy is that we want to invest more in periods with larger expected profits; in this way we evaluate the accuracy of the method used. For $k = 0$ we only look at the sign of the expected change on the next day; in this way we evaluate the directional accuracy of the method used. Note that the expectation in (2) is taken over the distribution of $\Delta s_{t+1}$, given the information set $I_t$ and the “fixed” values of $\beta_1$ and $\beta_2$. In the remainder of this paper, we use the posterior median to obtain estimates of the model parameters, and the expectation in (2) is still taken given these “fixed” estimated values. We use the posterior median because the posterior distribution has Cauchy-type tails in one of the model specifications that we investigate. These Cauchy-type tails imply that the posterior means of the coefficients do not exist.
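The Python snippet below is a minimal sketch of rule (2). It maps an expected change of the spread into a CSA holding and then into share positions. It assumes that the expected change $E(\Delta s_{t+1} \mid I_t)$ and the weights $\beta_1, \beta_2$ are already available as plain numbers. The function names `csa_holding` and `stock_positions` are hypothetical and not taken from the paper.

```python
import numpy as np


def csa_holding(expected_change: float, k: int) -> float:
    """Desired number of spreads at the end of day t, following rule (2).

    k = 0 keeps only the sign of the expected change (directional accuracy);
    k = 1 scales the position by the size of the expected change (accuracy).
    """
    return float(np.sign(expected_change) * abs(expected_change) ** k)


def stock_positions(holding: float, beta1: float, beta2: float) -> tuple[float, float]:
    """Shares of stock 1 and stock 2 implied by holding `holding` spreads.

    One spread s_t = beta1 * y_t1 + beta2 * y_t2 consists of beta1 shares of
    stock 1 and beta2 shares of stock 2; negative numbers are short positions.
    """
    return holding * beta1, holding * beta2


# Hypothetical inputs: the expected change of the spread is -0.4.
print(csa_holding(-0.4, k=0))   # short one spread
print(csa_holding(-0.4, k=1))   # short 0.4 spreads
# With beta1 > 0 and beta2 < 0, buying 2 spreads means buying 1.2 shares of
# stock 1 and shorting 1.6 shares of stock 2.
print(stock_positions(2.0, beta1=0.6, beta2=-0.8))
```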
In the ISA strategy the desired holding at the end of day t is given by:
$$ISA(s_t) = -s_t. \tag{3}$$
A positive value of $ISA(s_t)$ means that we buy $ISA(s_t)$ spreads, and a negative value of $ISA(s_t)$ means that we short $-ISA(s_t)$ spreads. Equivalently, a negative value of $s_t$ means that we buy $-s_t$ spreads, and a positive value of $s_t$ means that we short $s_t$ spreads. That is, if $\beta_1 > 0$ and $\beta_2 < 0$, then a positive value of $ISA(s_t)$ means that we buy $\beta_1 \times ISA(s_t) = \beta_1 \times (-s_t)$ of stock 1 and short $(-\beta_2) \times ISA(s_t) = \beta_2 \times s_t$ of stock 2. In the remainder of this paper, we substitute the posterior medians of $\beta_1$ and $\beta_2$ into (1) to obtain an estimate of the spread.
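The ISA rule and the resulting one-day profit can be illustrated in the same hypothetical style. The sketch assumes that the weights are fixed over the day and ignores transaction costs. The function names `isa_holding` and `one_day_pnl` and all prices are made up for illustration. In the example the spread moves from $-1$ towards 0, so the long spread position makes a profit.

```python
def isa_holding(spread: float) -> float:
    """Desired number of spreads under the ISA rule: hold -s_t spreads."""
    return -spread


def one_day_pnl(holding, beta, prices_t, prices_next):
    """Profit/loss from holding `holding` spreads between two closing dates.

    The weights beta are kept fixed over the day, so the P&L equals
    holding * (s_{t+1} - s_t) with both spreads computed from the same beta.
    Transaction costs are ignored in this sketch.
    """
    spread_t = beta[0] * prices_t[0] + beta[1] * prices_t[1]
    spread_next = beta[0] * prices_next[0] + beta[1] * prices_next[1]
    return holding * (spread_next - spread_t)


# Hypothetical pair with beta = (1, -0.5)': s_t = 10 - 0.5 * 22 = -1, so ISA buys one spread.
beta = (1.0, -0.5)
h = isa_holding(beta[0] * 10.0 + beta[1] * 22.0)
pnl = one_day_pnl(h, beta, prices_t=(10.0, 22.0), prices_next=(10.5, 21.8))
print(h, round(pnl, 6))  # 1.0 0.6
```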
The CSA and ISA strategies raise several questions. First, how do we define such long-run equilibrium relationships, and how are the coefficients $\beta_1$ and $\beta_2$ estimated? Second, how do we find pairs of stocks that satisfy such a long-run equilibrium relationship? Third, how do we estimate how the stock prices adjust towards their long-run equilibrium relationship? In the next section, we consider how our Bayesian analysis of the cointegration model (under linear or orthogonal normalization) provides answers to all these questions. To answer the first and third questions we use the posterior distribution (more precisely, the posterior median) of the parameters in the cointegration model. To answer the second question, we compute the Bayes factor of a model with a cointegration relationship versus a model without one, for a large number of pairs of stocks.
At this point, we explain why we make use of the CSA and ISA strategies rather than the approach of Gatev et al. [4]. In the strategy of Gatev et al. [4] a position is taken as soon as a pair of prices has substantially diverged. The position is then held constant until the prices have completely converged to the equilibrium relationship. A disadvantage of that trading strategy is that relatively little trading takes place (in most periods there is no trading at all). This makes it more difficult to assess differences in quality between models over a finite period. Equivalently, a very long period may be required to find credible differences in trading results between models.

3. Bayesian Analysis of the Cointegration Model Under Linear and Orthogonal Normalization

Consider a vector autoregressive model of order 1 (VAR(1)) for an $n$-dimensional vector of time series $\{Y_t\}_{t=1}^{T}$:
$$Y_t = \Phi Y_{t-1} + \varepsilon_t, \tag{4}$$
where $\varepsilon_t$ is an independently distributed $n$-dimensional vector process with zero mean and an $n \times n$ positive definite symmetric (PDS) covariance matrix $\Sigma$. We consider two alternative distributions for $\varepsilon_t$: a multivariate normal distribution and a multivariate Student’s t distribution. $\Phi$ is an $n \times n$ matrix of autoregressive coefficients. The initial values in $Y_1$ are assumed fixed. The VAR model in (4) can be written in error correction form
$$\Delta Y_t = \Pi Y_{t-1} + \varepsilon_t, \tag{5}$$
where $\Pi = \Phi - I_n$ (with $I_n$ the $n \times n$ identity matrix) is the long-run multiplier matrix; see, e.g., Johansen [9] and Kleibergen and Paap [6].
If $\Pi$ is a zero matrix, the series $Y_t$ contains $n$ unit roots and there is no opportunity for long-term predictability with finite uncertainty. If $\Pi$ has full rank, the univariate series in $Y_t$ are stationary and long-run equilibrium relations are assumed to hold. Cointegration appears if the rank of $\Pi$ equals $r$ with $0 < r < n$. In that case $\Pi$ can be written as the product of two full-rank $n \times r$ matrices $\alpha$ and $\beta$:
$$\Pi = \alpha\beta'.$$
The matrix $\beta$ contains the cointegration vectors, which reflect the stationary long-run (equilibrium) relations between the univariate series in $Y_t$. That is, each element of $\beta' Y_t$ can be interpreted as a temporary deviation from a long-run (equilibrium) relation. The matrix $\alpha$ contains the adjustment parameters, which indicate the speed of adjustment to the long-run (equilibrium) relations.
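To save on notation, we write (5) in matrix notation:
$$\Delta Y = Y_{-1}\Pi' + \varepsilon,$$
with $(T-1) \times n$ matrices $\Delta Y = (\Delta Y_2, \ldots, \Delta Y_T)'$, $Y_{-1} = (Y_1, \ldots, Y_{T-1})'$ and $\varepsilon = (\varepsilon_2, \ldots, \varepsilon_T)'$.
Under the cointegration restriction $\Pi = \alpha\beta'$, we have $\Pi' = \beta\alpha'$, and this model is given by:¹
$$\Delta Y = Y_{-1}\beta\alpha' + \varepsilon.$$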
The individual parameters in $\beta\alpha'$ are not identified, since $\beta\alpha' = \beta B B^{-1}\alpha'$ for any nonsingular $r \times r$ matrix $B$. That is, postmultiplying $\beta$ by an invertible matrix $B$ and premultiplying $\alpha'$ by its inverse leaves the matrix $\beta\alpha'$ unchanged. Therefore, $r \times r$ identification restrictions are required to identify the elements of $\beta$ and $\alpha$, so that these become estimable. In this paper we consider two different normalization restrictions for identification purposes. The first is the commonly used linear normalization, where
$$\beta = \begin{pmatrix} I_r \\ \beta_2 \end{pmatrix}.$$
That is, the first $r$ rows of $\beta$ must form the $r \times r$ identity matrix. The intuition behind this normalization is as follows. For the case of two series, the second series is assumed to have an effect on the first series, as in the linear regression model, where one measures the effect of a right-hand side explanatory variable on a left-hand side dependent variable. The second normalization is the orthogonal normalization, where
$$\beta'\beta = I_r.$$
Here the interpretation is that the series are treated symmetrically and only the linear combination matters. This normalization and interpretation come naturally for a set of time series of different, symmetrically treated prices, where one is mainly interested in stable linear combinations.
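For $r = 1$ the nonsingular matrix $B$ is a scalar, so both normalizations amount to rescaling $\beta$ and compensating in $\alpha$. The following sketch, with hypothetical function names and numbers, rescales an unnormalized pair $(\alpha, \beta)$ to each normalization. It then checks that $\Pi = \alpha\beta'$ is unchanged. For the orthogonal case it also imposes $\beta_2 \geq 0$, the sign restriction used below for $n = 2$.

```python
import numpy as np


def linear_normalization(alpha, beta):
    """Rescale so that beta = (1, beta2)'; requires beta[0] != 0.

    With r = 1 the matrix B is a scalar b: beta -> beta / b and alpha -> alpha * b,
    which leaves Pi = alpha beta' unchanged.
    """
    b = beta[0]
    return alpha * b, beta / b


def orthogonal_normalization(alpha, beta):
    """Rescale so that beta'beta = 1 and beta[1] >= 0, leaving Pi = alpha beta' unchanged."""
    b = np.linalg.norm(beta)
    if beta[1] < 0:
        b = -b  # flip the sign so that the second element becomes non-negative
    return alpha * b, beta / b


alpha = np.array([-0.05, 0.02])   # hypothetical adjustment coefficients
beta = np.array([2.0, -1.0])      # hypothetical, unnormalized cointegrating vector
Pi = np.outer(alpha, beta)        # Pi = alpha beta'

alpha_lin, beta_lin = linear_normalization(alpha, beta)         # beta_lin = (1, -0.5)'
alpha_orth, beta_orth = orthogonal_normalization(alpha, beta)   # beta_orth = (-2, 1)' / sqrt(5)
print(np.allclose(np.outer(alpha_lin, beta_lin), Pi),
      np.allclose(np.outer(alpha_orth, beta_orth), Pi))
```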
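In this paper we consider the case of $n = 2$ time series (of stock prices) in $Y_t = (y_{t,1}, y_{t,2})'$, where the rank of $\Pi$ is equal to $r = 1$:
$$\begin{pmatrix} \Delta y_{t,1} \\ \Delta y_{t,2} \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} (\beta_1 y_{t-1,1} + \beta_2 y_{t-1,2}) + \begin{pmatrix} \varepsilon_{t,1} \\ \varepsilon_{t,2} \end{pmatrix}$$
with spread
$$s_t = \beta_1 y_{t,1} + \beta_2 y_{t,2} \tag{10}$$
and
$$\begin{pmatrix} E(\Delta y_{t+1,1} \mid I_t) \\ E(\Delta y_{t+1,2} \mid I_t) \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} (\beta_1 y_{t,1} + \beta_2 y_{t,2}) = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} s_t,$$
so that
$$E(\Delta s_{t+1} \mid I_t) = \beta_1 E(\Delta y_{t+1,1} \mid I_t) + \beta_2 E(\Delta y_{t+1,2} \mid I_t) = (\alpha_1\beta_1 + \alpha_2\beta_2)\, s_t. \tag{12}$$
From (10) and (12) it is clear that our ISA trading strategy depends on $\beta_1$ and $\beta_2$, whereas our CSA trading strategy also depends on $\alpha_1$ and $\alpha_2$.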
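Premultiplying the bivariate error correction model by $\beta'$ makes the spread dynamics explicit:
$$\Delta s_{t+1} = \beta'\Delta Y_{t+1} = (\alpha_1\beta_1 + \alpha_2\beta_2)\, s_t + \beta_1\varepsilon_{t+1,1} + \beta_2\varepsilon_{t+1,2}.$$
The spread therefore follows an AR(1) process with coefficient $1 + \alpha_1\beta_1 + \alpha_2\beta_2$, and it reverts to zero when this coefficient lies strictly between $-1$ and $1$. The simulation below is a sketch with hypothetical parameter values and normal innovations. It checks numerically that the realized change of the spread equals the expectation in (12) plus $\beta'\varepsilon_{t+1}$.

```python
import numpy as np


def simulate_pair(alpha, beta, y0, T, sigma, rng):
    """Simulate Delta Y_t = alpha * (beta' Y_{t-1}) + eps_t with i.i.d. normal innovations.

    Returns prices Y (T x 2) and innovations eps (T x 2); eps[0] is set to zero
    because the initial value Y[0] = y0 is treated as fixed.
    """
    eps = sigma * rng.standard_normal((T, 2))
    eps[0] = 0.0
    Y = np.empty((T, 2))
    Y[0] = y0
    for t in range(1, T):
        s_prev = beta @ Y[t - 1]                   # spread s_{t-1}
        Y[t] = Y[t - 1] + alpha * s_prev + eps[t]
    return Y, eps


alpha = np.array([-0.10, 0.05])   # hypothetical adjustment coefficients
beta = np.array([0.8, -0.6])      # hypothetical cointegrating vector with beta'beta = 1
rho = 1.0 + alpha @ beta          # AR(1) coefficient of the spread: 1 - 0.11 = 0.89

rng = np.random.default_rng(0)
Y, eps = simulate_pair(alpha, beta, y0=np.array([30.0, 40.0]), T=500, sigma=0.5, rng=rng)
s = Y @ beta                                   # spreads s_t, eq. (10)
expected_change = (alpha @ beta) * s[:-1]      # E(Delta s_{t+1} | I_t), eq. (12)
surprise = np.diff(s) - expected_change        # should equal beta' eps_{t+1}
print(np.allclose(surprise, eps[1:] @ beta))   # True
```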
Under the linear normalization we have $\beta_1 = 1$:
$$\beta = \begin{pmatrix} 1 \\ \beta_2 \end{pmatrix},$$
whereas under the orthogonal normalization we have
$$\beta'\beta = \beta_1^2 + \beta_2^2 = 1,$$
which is (under the further identification restriction $\beta_2 \geq 0$) equivalent to
$$\beta_2 = \sqrt{1 - \beta_1^2}.$$
Since the adjustment coefficients $\alpha_1$ and $\alpha_2$ may be close to 0, there may be substantial uncertainty about the equilibrium relationship. The linear normalization allows $\beta_2$ to take values in $(-\infty, \infty)$, whereas the orthogonal normalization allows $\beta_1$ to take values in $[-1, 1]$ and $\beta_2$ in $[0, 1]$. One may argue that the spread under the linear normalization is just a rescaled version of the spread under the orthogonal normalization: dividing the latter by $\beta_1$ gives the former. However, we consider a moving window in which the parameters are updated every day, so that the rescaling factor is not constant over time. Therefore, the profit/loss of the ISA strategy under the linear normalization is not just a rescaled version of the profit/loss of the ISA strategy under the orthogonal normalization. Further, we estimate the parameters using their posterior medians. The posterior median of $\beta_2$ under the linear normalization will typically differ from the ratio of the posterior medians of $\beta_2$ and $\beta_1$ under the orthogonal normalization. Under the linear normalization, the profit/loss of the strategies may be much affected by a small number of days on which $\beta_2$ is estimated to be very large (in an absolute sense). Under the orthogonal normalization, the profit/loss may be more evenly affected by the different days, as (the estimates of) $\beta_1$ and $\beta_2$ cannot “escape” to extreme values outside $[-1, 1] \times [0, 1]$.
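The last point can be made concrete with three hypothetical posterior draws. This sketch assumes, purely for illustration, that draws under the linear normalization are obtained by transforming draws under the orthogonal normalization. Even in that simplified case, the median of the ratio $\beta_2/\beta_1$ need not equal the ratio of the medians. A draw of $\beta_1$ near zero also produces an extreme value of $\beta_2$ under the linear normalization.

```python
import numpy as np

# Hypothetical posterior draws of beta1 under the orthogonal normalization.
beta1_orth = np.array([-0.1, 0.2, 0.9])
beta2_orth = np.sqrt(1.0 - beta1_orth**2)   # beta2 >= 0 on the unit circle

# Map each draw to the linear normalization: beta / beta1 = (1, beta2 / beta1)'.
beta2_lin = beta2_orth / beta1_orth

ratio_of_medians = np.median(beta2_orth) / np.median(beta1_orth)  # about 4.899
median_of_ratios = np.median(beta2_lin)                           # about 0.484
print(ratio_of_medians, median_of_ratios)

# A draw of beta1 close to zero "escapes" under the linear normalization.
print(np.sqrt(1.0 - 0.01**2) / 0.01)        # about 99.995
```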

3.1. The Encompassing and Jeffreys’ Framework for Prior Specification and Posterior Simulation

As mentioned above, we consider the case of $n = 2$ time series (of stock prices), where the rank of $\Pi$ is equal to $r = 1$. That is, the matrix $\Pi$ needs to satisfy a reduced rank restriction. A natural way to specify a prior for $\alpha$ and $\beta$ is given by the encompassing framework. In this framework one first specifies a prior on $\Pi$ without imposing a reduced rank restriction. One then obtains the prior in our model as the conditional prior of $\Pi$ given that the rank of $\Pi$ is equal to 1.
The number of nonzero singular values of a matrix equals its rank, and singular values play the role of eigenvalues for non-symmetric matrices. They are therefore a natural way to represent the rank of a matrix. Using singular values we can artificially construct a full rank specification of $\Pi$ via an auxiliary parameter given by the $(n-r) \times (n-r)$ matrix $\lambda$; i.e., $\lambda$ is a scalar in our case with $n = 2$ and $r = 1$. The reduced rank matrix $\Pi' = \beta\alpha'$ of the matrix form of the model is extended into the full rank specification
$$\Pi' = \beta\alpha' + \beta_\perp \lambda \alpha_\perp',$$
where $\beta_\perp$ and $\alpha_\perp$ are $n \times (n-r)$ matrices specified such that $\beta'\beta_\perp = 0$, $\beta_\perp'\beta_\perp = I_{n-r}$, $\alpha'\alpha_\perp = 0$ and $\alpha_\perp'\alpha_\perp = I_{n-r}$. The full rank specification encompasses the reduced rank case, given by $\lambda = 0$. In this framework the posterior density of $\lambda$ evaluated at zero, $p(\lambda = 0 \mid Y)$, can be interpreted as a measure quantifying the likelihood of reduced rank. This specification is obtained using the singular value decomposition $\Pi' = USV'$. Here the $n \times n$ matrices $U$ and $V$ are orthogonal, such that $U'U = I_n$ and $V'V = I_n$. The $n \times n$ matrix $S$ is diagonal, with the singular values of $\Pi$ on its diagonal in decreasing order.
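To express the elements of this specification in terms of $\Pi$, we partition the singular value decomposition according to the chosen normalization. Under the linear normalization, we partition the matrices $U$, $S$ and $V'$ as follows:
$$U = \begin{pmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{pmatrix}, \quad S = \begin{pmatrix} S_1 & 0 \\ 0 & S_2 \end{pmatrix}, \quad V' = \begin{pmatrix} V_{11}' & V_{21}' \\ V_{12}' & V_{22}' \end{pmatrix},$$
where $U_{11}$, $S_1$ and $V_{11}$ are $r \times r$, and $U_{22}$, $S_2$ and $V_{22}$ are $(n-r) \times (n-r)$. In terms of the blocks of $U$, $S$ and $V$, the matrices in the decomposition are given by
$$\begin{aligned} \alpha' &= U_{11} S_1 \begin{pmatrix} V_{11}' & V_{21}' \end{pmatrix}, & \alpha_\perp' &= (V_{22} V_{22}')^{1/2} V_{22}'^{-1} \begin{pmatrix} V_{12}' & V_{22}' \end{pmatrix}, \\ \beta_2 &= U_{21} U_{11}^{-1}, & \beta_\perp &= \begin{pmatrix} U_{12} \\ U_{22} \end{pmatrix} U_{22}^{-1} (U_{22} U_{22}')^{1/2}, \\ \lambda &= (U_{22} U_{22}')^{-1/2} U_{22} S_2 V_{22}' (V_{22} V_{22}')^{-1/2}. \end{aligned}$$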
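For $n = 2$ and $r = 1$ all blocks in these formulas are scalars. For instance, $(U_{22}U_{22}')^{1/2} = |U_{22}|$. The sketch below, with a hypothetical function name and input matrix, applies the formulas to the SVD computed by `numpy.linalg.svd`, which returns $V'$ rather than $V$. It checks that $\beta\alpha' + \beta_\perp\lambda\alpha_\perp'$ reproduces the input and that $|\lambda|$ equals the smaller singular value.

```python
import numpy as np


def decompose_linear(P):
    """Split a 2 x 2 matrix P = beta alpha' + beta_perp lam alpha_perp' under the
    linear normalization beta = (1, beta2)', using the block formulas with
    n = 2, r = 1 (all blocks are scalars). Requires U11, U22 and V22 to be nonzero.
    """
    U, S, Vt = np.linalg.svd(P)          # P = U diag(S) Vt, where Vt = V'
    U11, U12, U21, U22 = U[0, 0], U[0, 1], U[1, 0], U[1, 1]
    V22 = Vt[1, 1]                       # second row of V' is (V12', V22')
    S1, S2 = S
    beta = np.array([1.0, U21 / U11])                 # beta2 = U21 U11^{-1}
    alpha = U11 * S1 * Vt[0, :]                       # alpha' = U11 S1 (V11', V21')
    beta_perp = np.sign(U22) * np.array([U12, U22])   # (U12; U22) U22^{-1} |U22|
    alpha_perp = np.sign(V22) * Vt[1, :]              # |V22| V22'^{-1} (V12', V22')
    lam = np.sign(U22) * S2 * np.sign(V22)            # |U22|^{-1} U22 S2 V22' |V22|^{-1}
    return alpha, beta, alpha_perp, beta_perp, lam, S


P = np.array([[-0.08, 0.05], [0.03, -0.04]])          # hypothetical full-rank matrix
alpha, beta, alpha_perp, beta_perp, lam, S = decompose_linear(P)
rebuilt = np.outer(beta, alpha) + lam * np.outer(beta_perp, alpha_perp)
print(np.allclose(rebuilt, P), abs(lam), S[1])        # |lam| equals S2
```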
Under the orthogonal normalization, the matrices are partitioned as
$$U = \begin{pmatrix} U_1 & U_2 \end{pmatrix}, \quad S = \begin{pmatrix} S_1 & 0 \\ 0 & S_2 \end{pmatrix}, \quad V = \begin{pmatrix} V_1 & V_2 \end{pmatrix},$$
and the following relations hold:
$$\alpha' = S_1 V_1', \quad \alpha_\perp = V_2, \quad \beta = U_1, \quad \beta_\perp = U_2, \quad \lambda = S_2.$$
Under the orthogonal normalization $\lambda$ is directly equal to $S_2$, whereas under the linear normalization it is just a rotation of $S_2$. In both cases the restriction $\lambda = 0$ is equivalent to restricting the $n - r$ smallest singular values of $\Pi$ to 0.
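Under the orthogonal normalization the same decomposition is read off the SVD directly. The SVD leaves the joint sign of $U_1$ and $V_1$ undetermined. In this sketch (hypothetical names and inputs) that sign is fixed so that $\beta_2 \geq 0$. For a rank-1 input the returned $\lambda$ is zero up to rounding error, which illustrates that $\lambda = 0$ corresponds to reduced rank.

```python
import numpy as np


def decompose_orthogonal(P):
    """Split a 2 x 2 matrix P = beta alpha' + beta_perp lam alpha_perp' under the
    orthogonal normalization beta'beta = 1, using the SVD P = U S V'.

    The SVD determines U1 and V1 only up to a joint sign; the sign is chosen here
    so that beta2 >= 0, matching the identification restriction in the text.
    """
    U, S, Vt = np.linalg.svd(P)
    sign = 1.0 if U[1, 0] >= 0 else -1.0
    beta = sign * U[:, 0]            # beta = U1, so beta'beta = 1
    alpha = sign * S[0] * Vt[0, :]   # alpha' = S1 V1'
    beta_perp = U[:, 1]              # beta_perp = U2
    alpha_perp = Vt[1, :]            # alpha_perp = V2
    lam = S[1]                       # lambda = S2, the smaller singular value
    return alpha, beta, alpha_perp, beta_perp, lam


P = np.array([[-0.08, 0.05], [0.03, -0.04]])             # hypothetical full-rank matrix
alpha, beta, alpha_perp, beta_perp, lam = decompose_orthogonal(P)
rebuilt = np.outer(beta, alpha) + lam * np.outer(beta_perp, alpha_perp)

P_rank1 = np.outer([0.8, -0.6], [-0.1, 0.05])            # beta alpha' with rank 1
alpha1, beta1, _, _, lam1 = decompose_orthogonal(P_rank1)
print(np.allclose(rebuilt, P), beta1, alpha1, lam1)      # beta1 = (-0.8, 0.6), lam1 ~ 0
```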
The prior on $(\alpha, \beta)$ is equal to the conditional prior of the parameters $(\alpha, \beta, \lambda)$ given that $\lambda = 0$. This is proportional to the joint prior for $(\alpha, \beta, \lambda)$ evaluated at $\lambda = 0$:
$$p(\alpha, \beta) \propto p(\alpha, \beta, \lambda)\big|_{\lambda = 0} \propto p\big(\Pi(\alpha, \beta, \lambda)\big)\big|_{\lambda = 0}\, \big|J\big(\Pi, (\alpha, \beta, \lambda)\big)\big|\Big|_{\lambda = 0},$$
where $|_{\lambda = 0}$ denotes evaluation at $\lambda = 0$, and $J(\Pi, (\alpha, \beta, \lambda))$ denotes the Jacobian of the transformation from $\Pi$ to $(\alpha, \beta, \lambda)$. Kleibergen and Paap [6] derive a closed-form expression for the determinant of the Jacobian, $|J(\Pi, (\alpha, \beta, \lambda))|$, for the general case of $n$ variables and reduced rank $r$ under the linear normalization. In Appendix B the Jacobian is derived under the orthogonal normalization of $\beta$.
Bastürk et al. [10] prove that under certain conditions the encompassing prior is equivalent to Jeffreys’ prior in the cointegration model with normally distributed innovations, irrespective of the normalization applied. We emphasize this equivalence, as the information matrix or Jeffreys’ prior is better known than the encompassing approach. Since the information matrix prior may yield certain desirable properties of the posterior, we conclude that an encompassing approach may also serve this purpose.
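In a similar fashion, the posterior of $(\alpha, \beta)$ is equal to the conditional posterior of the parameters $(\alpha, \beta, \lambda)$ given that $\lambda = 0$. This is proportional to the joint posterior for $(\alpha, \beta, \lambda)$ evaluated at $\lambda = 0$:
$$p(\alpha, \beta \mid Y) = p(\alpha, \beta \mid \lambda = 0, Y) \propto p(\alpha, \beta, \lambda \mid Y)\big|_{\lambda = 0} = p\big(\Pi(\alpha, \beta, \lambda) \mid Y\big)\big|_{\lambda = 0}\, \big|J\big(\Pi, (\alpha, \beta, \lambda)\big)\big|\Big|_{\lambda = 0}, \tag{18}$$
where the detailed expression for $p(\Pi(\alpha, \beta, \lambda) \mid Y)$ is given by Kleibergen and Paap [6], and where
$$p(\alpha, \beta, \lambda \mid Y) = p\big(\Pi(\alpha, \beta, \lambda) \mid Y\big)\Big|_{\Pi' = \beta\alpha' + \beta_\perp \lambda \alpha_\perp'}\, \big|J\big(\Pi, (\alpha, \beta, \lambda)\big)\big|. \tag{19}$$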
For Bayesian estimation of the cointegration model we need an algorithm to sample from the posterior density in (18). However, this posterior density does not belong to any known class of distributions (see Kleibergen and Paap [6]) and as such cannot be sampled directly. The M-H algorithm generates draws from the target density by constructing a Markov chain whose distribution converges to the target distribution. It does so using draws from a candidate density and an acceptance-rejection scheme. Kleibergen and Paap [6] present the M-H algorithm to sample from (18) for the cointegration model with normally distributed disturbances under the linear normalization. In this algorithm (19) is used to form a candidate density. The general outline of this sampling algorithm is presented in Appendix A. Appendix B presents the approach to evaluate the acceptance-rejection weights under the orthogonal normalization. The posteriors of the coefficients under the linear normalization have Cauchy-type tails, so the posterior means of the coefficients do not exist. Therefore, we estimate the coefficients using the posterior median. We do so under both normalizations to keep the comparison between them as fair as possible.
Given that the daily price changes in our data are not normally distributed, we also consider the model under a multivariate Student’s t distribution for the innovations $\varepsilon_t$. The M-H algorithms extend straightforwardly to this case; see Geweke [11].
Since we make use of the independence-chain Metropolis-Hastings algorithm, two steps can easily be performed in parallel: the simulation of candidate draws and the evaluation of the importance weights (used in the probability of accepting a candidate draw). This can substantially increase the speed of the computations. Only the final step of the method, the actual acceptance or rejection of candidate draws, cannot be performed in parallel. However, this step takes relatively little computing time. As an alternative, one can make use of importance sampling, in which case the whole method can be performed in parallel.
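The division of labor described above can be sketched for a generic independence-chain M-H sampler: candidate draws and weights are computed in a vectorized pass, followed by a cheap sequential accept/reject loop. The target is a toy Cauchy density with location 1. It was chosen only because, like the posteriors under the linear normalization, it has no mean. It is not the cointegration posterior, and the Cauchy candidate is not the candidate density based on (19). As in the text, the median of the chain serves as the point estimate.

```python
import numpy as np


def independence_mh(log_target, log_candidate, draw_candidate, n_draws, rng):
    """Independence-chain Metropolis-Hastings.

    Step 1 (vectorized, parallelizable): draw all candidates and compute their log
    importance weights log w = log p(x) - log q(x) in one pass.
    Step 2 (sequential, cheap): accept candidate i with probability
    min(1, w_i / w_current); otherwise repeat the current draw.
    """
    x = draw_candidate(n_draws, rng)
    log_w = log_target(x) - log_candidate(x)
    log_u = np.log(rng.uniform(size=n_draws))
    chain = np.empty(n_draws)
    current, current_log_w = x[0], log_w[0]
    accepted = 0
    for i in range(n_draws):
        if log_u[i] <= log_w[i] - current_log_w:
            current, current_log_w = x[i], log_w[i]
            accepted += 1
        chain[i] = current
    return chain, accepted / n_draws


def log_target(x):
    # Unnormalized Cauchy density with location 1 (toy target without a mean).
    return -np.log1p((x - 1.0) ** 2)


def log_candidate(x):
    # Unnormalized Cauchy density with scale 2; its tails dominate the target's.
    return -np.log1p((x / 2.0) ** 2)


def draw_candidate(n, rng):
    return 2.0 * rng.standard_cauchy(n)


rng = np.random.default_rng(42)
chain, acceptance_rate = independence_mh(log_target, log_candidate, draw_candidate, 50_000, rng)
print(np.median(chain), acceptance_rate)   # median close to 1
```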

3.2. Bayes Factors

We evaluate the Bayes factor of rank 1 versus rank 2 and the Bayes factor of rank 0 versus rank 2. The Bayes factor of rank 1 versus rank 0 is then given by the ratio of these two Bayes factors. To evaluate these Bayes factors, we extend the method of Kleibergen and Paap [6] to the orthogonal normalization. These authors evaluate the Bayes factor as the Savage-Dickey density ratio (see Dickey [12] and Verdinelli and Wasserman [13]). The restricted model has $\lambda = 0$ (where $\Pi$ has rank 0 or 1); the unrestricted model has unrestricted $\lambda$ (where $\Pi$ has rank 2). The Bayes factor of the restricted versus the unrestricted model equals the ratio of the marginal posterior density of $\lambda$ and the marginal prior density of $\lambda$, both evaluated at $\lambda = 0$. However, under our diffuse prior specification this Bayes factor for rank reduction is not defined, as the marginal prior density of $\lambda$ is improper.
Therefore, we follow Chao and Phillips [14], who use $(2\pi)^{(2nr - r^2)/2}$ as the prior height to construct their posterior information criterion (PIC). We assume equal prior probabilities of $1/3$ for rank 0, 1 and 2, so that the Bayes factor is equal to the posterior odds, the ratio of posterior model probabilities. For most pairs of stock prices we observe that the estimated posterior model probability is highest for rank 0, the case of two random walk processes without cointegration. Only for a small fraction of pairs is the estimated posterior model probability highest for rank 1, the case of two cointegrated random walk processes.
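With equal prior probabilities of $1/3$, the posterior probability of each rank is proportional to its marginal likelihood. It can therefore be computed from the Bayes factors against rank 2 alone, since the Bayes factor of rank 2 against itself is 1. The sketch below uses hypothetical Bayes factor values and hypothetical function names. Obtaining the Bayes factors themselves (via the Savage-Dickey density ratio and the prior height above) is not shown.

```python
def rank_probabilities(bf_0_vs_2: float, bf_1_vs_2: float) -> dict[int, float]:
    """Posterior probabilities of rank 0, 1 and 2, given the Bayes factors of
    rank 0 and rank 1 against rank 2 and equal prior probabilities of 1/3."""
    weights = {0: bf_0_vs_2, 1: bf_1_vs_2, 2: 1.0}   # rank 2 versus itself is 1
    total = sum(weights.values())
    return {rank: w / total for rank, w in weights.items()}


def bf_1_vs_0(bf_0_vs_2: float, bf_1_vs_2: float) -> float:
    """Bayes factor of rank 1 versus rank 0 as the ratio of the two Bayes factors."""
    return bf_1_vs_2 / bf_0_vs_2


# Hypothetical Bayes factors for one pair of stocks.
probs = rank_probabilities(bf_0_vs_2=6.0, bf_1_vs_2=3.0)
print(probs)                   # {0: 0.6, 1: 0.3, 2: 0.1}
print(bf_1_vs_0(6.0, 3.0))     # 0.5
```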

4. Empirical Application

The CSA and ISA strategies are applied to components of the Dow Jones Composite Average index. We work with daily closing prices recorded over a period of one year, from 1 January 2009 until 31 December 2009, and consider the 65 stocks with the highest liquidity. First, we identify cointegrated pairs based on the estimated posterior probability of cointegration (i.e., of $\Pi$ having rank 1), computed for the first half year of the data. That is, among the $65 \times 64 / 2 = 2080$ pairs we select the 10 pairs with the highest Bayes factor of rank 1 versus rank 0 (where these Bayes factors are larger than 1). We do this for both the linear and the orthogonal normalization. The 10 pairs are identical for both normalizations; they are listed in Table 1. Second, these pairs are used in the CSA and ISA trading strategies during the last 6 months of 2009. We use a rolling window, where the parameter estimates are updated at the end of each trading day. The positions are then updated and kept constant until the end of the next trading day. We analyze the profits from these trading strategies, taking into account a common level of transaction costs of 0.1% ($c = 0.001$).
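The pair selection step can be sketched as follows. The callable `bayes_factor` stands in for the rank 1 versus rank 0 Bayes factor computed on the first half year of data. Here it is replaced by random, entirely fake values, and the ticker names are placeholders. The sketch only illustrates the enumeration of the $65 \times 64 / 2 = 2080$ pairs and the ranking rule.

```python
from itertools import combinations

import numpy as np


def select_pairs(tickers, bayes_factor, n_pairs=10):
    """Score every unordered pair by its Bayes factor of rank 1 versus rank 0 and
    keep the n_pairs largest among those exceeding 1."""
    candidates = list(combinations(tickers, 2))
    scored = [(bayes_factor(pair), pair) for pair in candidates]
    scored = [(bf, pair) for bf, pair in scored if bf > 1.0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:n_pairs], len(candidates)


tickers = [f"S{i:02d}" for i in range(65)]                  # placeholder names for 65 stocks
rng = np.random.default_rng(1)
fake_bf = {pair: float(rng.lognormal(mean=-1.0, sigma=1.0))  # fake Bayes factors
           for pair in combinations(tickers, 2)}
top_pairs, n_candidates = select_pairs(tickers, lambda pair: fake_bf[pair])
print(n_candidates)   # 2080
```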
Next to the “standard” CSA approach described before, we also perform a more cautious, more conservative CSA strategy that takes parameter uncertainty into account. Here we only take a position if we are more certain about the sign of the current spread (and hence of the sign of the expected change of the spread, which is the opposite sign). Specifically, we only take a position if the $(50 + \xi/2)\%$ percentile and the $(50 - \xi/2)\%$ percentile of the posterior distribution of the current spread have the same sign. We consider the cases $\xi = 20\%, 30\%, 40\%, 50\%$ and $60\%$. The case $\xi = 60\%$ is the most cautious strategy, where the posterior 20% and 80% percentiles of the spread must have the same sign. Note that for $\xi = 0\%$ this strategy reduces to the original CSA strategy.
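The following is a minimal sketch of the cautious rule. It assumes that posterior draws of the current spread are available as an array. The function name, the draws and the expected change are hypothetical. A percentile exactly equal to zero is treated as an indeterminate sign, so no position is taken in that case.

```python
import numpy as np


def cautious_csa_holding(spread_draws, expected_change, k, xi):
    """CSA holding that is set to zero unless the (50 - xi/2)% and (50 + xi/2)%
    posterior percentiles of the current spread have the same nonzero sign.

    xi is in percent: xi = 60 compares the 20% and 80% percentiles, while
    xi = 0 compares the median with itself and so recovers the standard rule.
    """
    lower, upper = np.percentile(spread_draws, [50 - xi / 2, 50 + xi / 2])
    if np.sign(lower) * np.sign(upper) <= 0:
        return 0.0
    return float(np.sign(expected_change) * abs(expected_change) ** k)


# Hypothetical posterior draws of the current spread, mostly positive.
draws = np.linspace(-1.0, 3.0, 401)
print(cautious_csa_holding(draws, expected_change=-0.3, k=0, xi=20))  # -1.0
print(cautious_csa_holding(draws, expected_change=-0.3, k=0, xi=60))  # 0.0
```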
We evaluate the CSA and ISA strategies in the cointegration models under the linear and orthogonal normalization, and under a normal and a Student’s t distribution for the innovations. For this we compute two measures. The strategies cannot be evaluated in terms of the percentage return on an initial capital investment, since we are not only buying but also shorting stocks. Suppose that we perform our strategies for $T$ consecutive trading days (where in our case $T$ is the number of trading days in the last 6 months of 2009). Then the average daily capital engagement is given by:
$$ADCE_{CSA}(k, x\%) \equiv \frac{1}{T} \sum_{t=0}^{T-1} \big|CSA(s_t, k, x\%)\big| \left(|\hat\beta_{t,1}|\, y_{t,1} + |\hat\beta_{t,2}|\, y_{t,2}\right), \tag{20}$$
$$ADCE_{ISA} \equiv \frac{1}{T} \sum_{t=0}^{T-1} \big|ISA(s_t)\big| \left(|\hat\beta_{t,1}|\, y_{t,1} + |\hat\beta_{t,2}|\, y_{t,2}\right), \tag{21}$$
Here $CSA(s_t, k, x\%)$ denotes the CSA holding under the cautious rule with $\xi = x\%$. The quantities $\hat\beta_{t,1}$ and $\hat\beta_{t,2}$ are the posterior medians of $\beta_1$ and $\beta_2$ computed at the end of the $t$-th day in the trading period. That is, $\hat\beta_{0,1}$ and $\hat\beta_{0,2}$ are computed at the end of the last trading day before the trading period. Our first performance measure is a profitability measure, given by the total return of the strategy divided by the average daily capital engagement:
$$Profitability_{CSA}(k, x\%) = \frac{\text{Cumulative return of } CSA(k, x\%) \text{ at time } T}{ADCE_{CSA}(k, x\%)}, \tag{22}$$
$$Profitability_{ISA} = \frac{\text{Cumulative return of } ISA \text{ at time } T}{ADCE_{ISA}}. \tag{23}$$
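The capital engagement and profitability measures can be computed from daily holdings, estimated weights and closing prices. The paper does not spell out how the 0.1% transaction costs enter the cumulative return. The cost model below is therefore an assumption: it charges $c$ times the value of the shares traded at each daily rebalance. The function name and the two-day example data are also hypothetical.

```python
import numpy as np


def profitability(holdings, betas, prices, c=0.001):
    """Cumulative return divided by the average daily capital engagement (ADCE).

    holdings: (T,)     spreads held from the end of day t to the end of day t+1
    betas:    (T, 2)   estimated weights (e.g. posterior medians) at the end of day t
    prices:   (T+1, 2) closing prices y_t for t = 0, ..., T
    c:        proportional transaction cost. Charging c times the traded value
              whenever share positions change is an ASSUMPTION of this sketch;
              the final position is not closed out at day T.
    Returns (profitability, cumulative return, ADCE).
    """
    holdings = np.asarray(holdings, dtype=float)
    betas = np.asarray(betas, dtype=float)
    prices = np.asarray(prices, dtype=float)
    T = len(holdings)

    # ADCE: average over t = 0..T-1 of |h_t| * (|beta_t1| y_t1 + |beta_t2| y_t2)
    engagement = np.abs(holdings) * (np.abs(betas) * prices[:T]).sum(axis=1)
    adce = engagement.mean()

    # Shares of each stock held from day t to day t+1, and the resulting gross P&L
    shares = holdings[:, None] * betas
    gross = (shares * np.diff(prices, axis=0)).sum()

    # Assumed cost model: c times the value of the shares traded at each rebalance
    traded = np.abs(np.diff(shares, axis=0, prepend=np.zeros((1, 2))))
    costs = c * (traded * prices[:T]).sum()

    cumulative_return = gross - costs
    return cumulative_return / adce, cumulative_return, adce


# Hypothetical two-day example with fixed weights beta = (1, -0.5)'.
prices = [[10.0, 22.0], [10.5, 21.8], [10.2, 21.0]]
betas = [[1.0, -0.5], [1.0, -0.5]]
print(profitability([1.0, -1.0], betas, prices, c=0.0))   # ADCE = 21.2, return = 0.5
```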
Our second performance measure concerns the risk of the strategies. In order to estimate risk we use paths of cumulative return. When the cumulative return at time t + 1 is often lower than the cumulative return at time t, then a strategy can be considered risky. On the other hand, if the cumulative return is growing or remains steady over most periods the strategy can be considered as having low risk. In the latter case the signals generated by the trading rule are accurate and yield (mostly) profit. We define our measure as:
$$\mathrm{Risk} \equiv \frac{\#\{\text{cum. return at time } t+1 < \text{cum. return at time } t\}}{\#\{\text{cum. return at time } t+1 \neq \text{cum. return at time } t\}} \tag{24}$$
for each strategy.
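The Risk measure only needs the path of cumulative returns. The minimal sketch below reads the denominator as the number of days on which the cumulative return changes; returning 0 when the cumulative return never changes (for example, when a conservative strategy never trades) is our own convention, and the function name is hypothetical.

```python
import numpy as np

def risk_measure(cum_returns):
    """Share of day-to-day changes in cumulative return that are decreases."""
    diffs = np.diff(np.asarray(cum_returns, dtype=float))
    changed = np.count_nonzero(diffs != 0)
    if changed == 0:
        return 0.0  # assumed convention: no changes in cumulative return, zero risk
    return np.count_nonzero(diffs < 0) / changed
```

For instance, the path `[0, 1, 1, 0.5, 2, 1.5]` has four changes, two of which are decreases, giving a Risk value of 0.5.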
Table 2 presents the average of the Profitability measure in (22)–(23) over the ten selected (cointegrated) pairs of stocks. Detailed results for every pair of stocks are presented in Appendix C. The empirical findings support the hypothesis that the normalization plays an important role in pairs trading strategies: Table 2 shows that the orthogonal normalization substantially outperforms its linear counterpart, irrespective of the assumed distribution for the innovations. The difference is particularly pronounced when the profitability of the CSA strategies under $k = 0$ (directional accuracy) is compared with that under $k = 1$ (accuracy). For the linear normalization, the move from $k = 0$ to $k = 1$ is associated with a substantial decrease in profitability. This suggests that predictions of the change of the spread under the linear normalization are relatively poor compared with the orthogonal case: for $k = 1$, where not only the direction but also the size of the predicted change of the spread plays an important role, the linear normalization performs relatively poorly. In contrast, the orthogonal normalization shows an appreciable increase in profitability for $k = 1$ compared with $k = 0$.
As expected, the Risk measure for the CSA strategies in Table 3 decreases for more cautious, more conservative strategies with larger values of ξ. The Risk measure under the linear normalization is similar to or worse than its counterpart under the orthogonal normalization. Note, however, that this measure only reflects the percentage of trading days with a decrease of the cumulative return; it does not capture the size of these decreases, which may be larger. In further research, we will take a closer look at the riskiness of the alternative strategies in the different models.
We now compare the performance under the normal distribution and under the Student’s t distribution. The Risk measure is slightly better under the Student’s t distribution. For the Profitability measure, the difference in favor of the Student’s t distribution is somewhat more pronounced. However, the difference between the orthogonal and linear normalizations is much larger than the difference between the Student’s t and the normal distribution: for our data set, the normalization is clearly the key factor for the profitability of the trading strategies. One possible reason for this result is that the profit/loss of the strategies under the linear normalization may be strongly affected by a small number of days on which $\beta_2$ is estimated to be very large in absolute value, whereas under the orthogonal normalization the profit/loss may be more evenly affected by the different days, as (the estimates of) $\beta_1$ and $\beta_2$ cannot “escape” to extreme values far outside $[-1, 1] \times [0, 1]$. Such extreme values may occur under the linear normalization if the adjustment coefficients $\alpha_1$ and $\alpha_2$ are close to 0, which may be the case for certain pairs of empirical stock price series (in certain periods) where the error correction is rather slow. In future research, we will take a closer look at the reasons for the substantial difference in profitability between the normalizations.
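To illustrate why the orthogonal normalization keeps the coefficients bounded, the sketch below maps a few hypothetical linear-normalization estimates of $\beta_2$ to the orthogonal normalization. It assumes that the linear normalization fixes $\beta_1 = 1$ and that the orthogonal normalization rescales $\beta$ to unit length with $\beta_1 \geq 0$ (as in footnote 2); this is our own illustration of the same cointegrating direction, not the authors' estimation procedure.

```python
import numpy as np

def orthogonal_normalize(beta):
    """Rescale a cointegrating vector to unit length with a non-negative first element."""
    beta = np.asarray(beta, dtype=float)
    beta = beta / np.linalg.norm(beta)
    return -beta if beta[0] < 0 else beta

for beta2 in (-0.8, -5.0, -200.0):      # hypothetical linear-normalization estimates
    linear = np.array([1.0, beta2])      # assumed linear normalization: beta_1 = 1
    print(beta2, orthogonal_normalize(linear))
```

However large $|\beta_2|$ becomes under the linear normalization, the orthogonally normalized vector stays on the unit circle, with the extreme case corresponding to $\beta_1$ close to 0 rather than $\beta_2$ close to infinity.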

5. Conclusions and Topics for Further Research

In this paper we explored the connection between the well-known statistical cointegration model and decision strategies for pairs trading of stocks, using a simulation-based Bayesian procedure. We considered two pairs trading strategies: a conditional statistical arbitrage method and an implicit statistical arbitrage method. The simulation-based Bayesian procedure was used to predict stable ratios of pairs of stock prices, defined in a cointegration model. We showed the effect that using an encompassing (equivalently, Jeffreys’) prior under an orthogonal normalization has on the selection of pairs of cointegrated stock prices and on the estimation and prediction of the spread between cointegrated stock prices and its uncertainty. In an empirical application using stocks that are constituents of the Dow Jones Composite Average index, the results showed that the normalization has little effect on the selection of pairs of cointegrated stocks on the basis of Bayes factors. However, the results stressed the importance of the orthogonal normalization for the estimation and prediction of the spread, which leads to better results in terms of profit per capital engagement and risk than a standard linear normalization.
An important issue for future research is to investigate the robustness of our empirical results. Here we list six topics for further research. First, the results may be sensitive to the specific data; we already indicated that our results may be sensitive to particular data in a subperiod. Second, and more generally, if one considers the percentiles of the predictive distribution of the future spread during the trading strategy, taking into account the uncertainty in future innovations, then it is important to specify the distribution of the innovations carefully. In future research, we will consider a finite mixture of Gaussian distributions for the innovations. However, the algorithm for posterior simulation will then not be a straightforward extension of the algorithm under the normal distribution, as it was for the Student’s t distribution. We will extend the partial and permutation-augmented MitISEM (Mixture of t by Importance Sampling weighted Expectation Maximization) approaches of Hoogerheide et al. [15] to perform posterior simulation for the cointegration model with errors following a finite mixture distribution. Third, a learning strategy by the decision maker can be introduced. Fourth, the economic performance of econometric predictions can be evaluated using a utility-based metric to obtain certainty equivalents of the strategies, which penalizes excess variation in predictions perceived as the “risk” of a strategy; see West et al. [16]. Fifth, the methods can be compared with a strategy that invests 50% in the risk-free rate and 50% in a risky asset, which is quite a successful and robust strategy; see Marquering and Verbeek [17]. Sixth, the performance of the methods can be compared with alternative Bayesian cointegration approaches; see Furmston et al. [18] and Bracegirdle and Barber [19].

Author Contributions

The authors contributed equally to the paper.

Funding

This research was funded by the National Science Center, Poland, grant number 2013/09/N/HS4/03751.

Acknowledgments

The authors are indebted to two anonymous reviewers and a guest-editor for very helpful comments on an earlier version of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A: Posterior Distribution and Sampling Algorithm Under Encompassing Prior

In this appendix we make use of two models, where we follow the terminology of Kleibergen and Paap [6]. First, the linear error correction (LEC) model is given by
$$\Delta Y_t = \Pi Y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \Sigma), \tag{25}$$
in which we impose no rank reduction restriction on
$$\Pi = \beta\alpha + \beta_\perp \lambda \alpha_\perp. \tag{26}$$
We will use the posterior of ( α , β , λ , Σ ) in this LEC model as a candidate distribution in a Metropolis-Hastings algorithm.
Second, the cointegration model is given by
$$\Delta Y_t = \Pi Y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \Sigma), \tag{27}$$
in which we impose the rank reduction restriction
$$\Pi = \beta\alpha. \tag{28}$$
The posterior of (α, β, Σ) in this cointegration model is the actual target distribution that we are interested in. A straightforward Gibbs sampler that simulates α and β from their full conditional posteriors is not feasible because of their complicated dependence structure; see Kleibergen and Van Dijk [20].
The encompassing prior assumes that the rank restriction on Π is expressed explicitly using the decomposition (26), where not only the prior but also the posterior of $(\alpha, \beta, \lambda) \mid \Sigma, Y$ in the LEC model follows from the transformation of random variables defined by (26), such that
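$$p_{LEC}(\alpha, \beta, \lambda \mid \Sigma, Y) = p_{LEC}(\Pi \mid \Sigma, Y)\big|_{\Pi = \beta\alpha + \beta_\perp\lambda\alpha_\perp}\, \big|J(\Pi, (\alpha, \beta, \lambda))\big|, \tag{29}$$
where $J(\Pi, (\alpha, \beta, \lambda))$ denotes the Jacobian of the transformation from Π to (α, β, λ).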
The conditional posterior of $(\alpha, \beta) \mid \lambda, \Sigma, Y$ in the LEC model, which is proportional to the joint posterior, can be evaluated at λ = 0 to obtain the posterior of $(\alpha, \beta) \mid \Sigma, Y$ in our cointegration model with rank reduction:
$$p(\alpha, \beta \mid \Sigma, Y) = p_{LEC}(\alpha, \beta \mid \lambda, \Sigma, Y)\big|_{\lambda = 0} \propto p_{LEC}(\alpha, \beta, \lambda \mid \Sigma, Y)\big|_{\lambda = 0} = p_{LEC}(\Pi \mid \Sigma, Y)\big|_{\Pi = \beta\alpha}\, \big|J(\Pi, (\alpha, \beta, \lambda))\big|_{\lambda = 0}, \tag{30}$$
since our model results from the linear error correction model by imposing λ = 0. Hence, we can consider the rank reduction as the parameter realization λ = 0.

Sampling Algorithm Based on Diffuse Prior on Π and Nesting

We specify a diffuse prior on the parameters Π and Σ in the LEC model: $p_{LEC}(\Sigma) \propto |\Sigma|^{-\frac{n+1}{2}}$ and $p_{LEC}(\Pi \mid \Sigma) \propto 1$ (where n is the dimension of $Y_t$). The conditional posterior of Π given Σ in the LEC model is a matric-variate normal distribution, and the marginal posterior of Σ in the LEC model is an inverted Wishart distribution; see Zellner [21] for a discussion of Bayesian analysis in the linear model. The decomposition in (26) allows us to obtain a (joint) draw of α and β (and λ) from a draw of Π. The dependencies between α and β are fully taken into account by determining α and β simultaneously.
This poses the problem that our posteriors of interest in the cointegration model, $p(\alpha, \beta, \Sigma \mid Y)$ and $p(\alpha, \beta \mid Y, \Sigma)$, do not involve λ, whereas λ is sampled in the posterior draws for the LEC model. Kleibergen and Paap [6] adopt the approach suggested by Chen [22]: the posterior $p(\alpha, \beta \mid \Sigma, Y)$ is first extended with an artificial extra parameter λ, whose density we denote by $g(\lambda \mid \alpha, \beta, \Sigma, Y)$. We use a Metropolis-Hastings (M-H) sampling algorithm to simulate from the joint density
$$p_g(\alpha, \beta, \lambda, \Sigma \mid Y) = g(\lambda \mid \alpha, \beta, \Sigma, Y)\, p(\alpha, \beta, \Sigma \mid Y). \tag{31}$$
The posterior $p_{LEC}(\alpha, \beta, \lambda \mid \Sigma, Y)$ from (29) is used as the candidate generating density. When $p_g(\alpha, \beta, \lambda, \Sigma \mid Y)$ is marginalized with respect to λ in order to remove the artificial parameter λ, the resulting distribution is $p(\alpha, \beta, \Sigma \mid Y)$. The simulated values of α, β and Σ (discarding λ) therefore form a sample from $p(\alpha, \beta, \Sigma \mid Y)$.
The choice of $g(\lambda \mid \alpha, \beta, \Sigma, Y)$ determines the weight function $w(\alpha, \beta, \lambda, \Sigma)$ used in the M-H algorithm: the acceptance probability depends on this weight function, which is the ratio of the target density (31) to the candidate generating density (29),
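$$\begin{aligned} w(\alpha, \beta, \lambda, \Sigma) &= \frac{p_g(\alpha, \beta, \lambda, \Sigma \mid Y)}{p_{LEC}(\alpha, \beta, \lambda, \Sigma \mid Y)} = \frac{g(\lambda \mid \alpha, \beta, \Sigma, Y)\, p(\alpha, \beta \mid \Sigma, Y)}{p_{LEC}(\alpha, \beta, \lambda, \Sigma \mid Y)} \\ &= \frac{g(\lambda \mid \alpha, \beta, \Sigma, Y)\, \exp\left(-\frac{1}{2}\,\mathrm{tr}\left((\beta\alpha - \hat\Pi)'(\beta\alpha - \hat\Pi)\right)\right)}{\exp\left(-\frac{1}{2}\,\mathrm{tr}\left((\beta\alpha + \beta_\perp\lambda\alpha_\perp - \hat\Pi)'(\beta\alpha + \beta_\perp\lambda\alpha_\perp - \hat\Pi)\right)\right)}\, \frac{|J|\,\big|_{\lambda = 0}}{|J|}, \end{aligned}$$
where $\hat\Pi$ is the OLS estimator of Π.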
The exponentiated trace expressions in numerator and denominator are related to each other by
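$$\begin{aligned} \mathrm{tr}\left((\beta\alpha + \beta_\perp\lambda\alpha_\perp - \hat\Pi)'(\beta\alpha + \beta_\perp\lambda\alpha_\perp - \hat\Pi)\right) &= \mathrm{tr}\left((\beta\alpha - \hat\Pi)'(\beta\alpha - \hat\Pi)\right) + \mathrm{tr}\left((\lambda - \beta_\perp'\hat\Pi\alpha_\perp')'(\lambda - \beta_\perp'\hat\Pi\alpha_\perp')\right) - \mathrm{tr}\left((\beta_\perp'\hat\Pi\alpha_\perp')'\beta_\perp'\hat\Pi\alpha_\perp'\right) \\ &= \mathrm{tr}\left((\beta\alpha - \hat\Pi)'(\beta\alpha - \hat\Pi)\right) + \mathrm{tr}\left((\lambda - \tilde\lambda)'(\lambda - \tilde\lambda)\right) - \mathrm{tr}(\tilde\lambda'\tilde\lambda), \end{aligned}$$
where $\tilde\lambda = \beta_\perp'\hat\Pi\alpha_\perp'$. A sensible choice for the density function $g(\lambda \mid \alpha, \beta, \Sigma, Y)$ thus turns out to be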
$$g(\lambda \mid \alpha, \beta, \Sigma, Y) \propto \exp\left(-\frac{1}{2}\,\mathrm{tr}\left((\lambda - \tilde\lambda)'(\lambda - \tilde\lambda)\right)\right).$$
Using this choice of g ( λ | α , β , Σ , Y ) the weight function reduces to
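$$w(\alpha, \beta, \lambda, \Sigma) \propto \exp\left(-\frac{1}{2}\,\mathrm{tr}(\tilde\lambda'\tilde\lambda)\right) \frac{|J(\Pi, (\alpha, \beta, \lambda))|\,\big|_{\lambda = 0}}{|J(\Pi, (\alpha, \beta, \lambda))|}.$$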
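The trace decomposition behind this choice of $g$ can be checked numerically. The sketch below is our own check, not part of the paper: it builds $\beta$, $\beta_\perp$ and $\alpha_\perp$ from an SVD so that $\beta'\beta_\perp = 0$, $\beta_\perp'\beta_\perp = I$ and $\alpha_\perp\alpha_\perp' = I$ hold (as under the orthogonal normalization), and verifies that the full trace equals $\mathrm{tr}((\beta\alpha - \hat\Pi)'(\beta\alpha - \hat\Pi)) + \mathrm{tr}((\lambda - \tilde\lambda)'(\lambda - \tilde\lambda)) - \mathrm{tr}(\tilde\lambda'\tilde\lambda)$ with $\tilde\lambda = \beta_\perp'\hat\Pi\alpha_\perp'$. Because the middle term is exactly the exponent of $g$, it cancels in the weight, leaving $\exp(-\frac{1}{2}\mathrm{tr}(\tilde\lambda'\tilde\lambda))$ times the Jacobian ratio. The matrices are random placeholders.

```python
import numpy as np

def trace_identity_gap(Pi_hat, alpha, beta, beta_perp, alpha_perp, lam):
    """LHS minus RHS of the trace decomposition; approximately 0 if it holds."""
    A = beta @ alpha - Pi_hat
    full = A + beta_perp @ lam @ alpha_perp
    lam_tilde = beta_perp.T @ Pi_hat @ alpha_perp.T
    lhs = np.trace(full.T @ full)
    rhs = (np.trace(A.T @ A)
           + np.trace((lam - lam_tilde).T @ (lam - lam_tilde))
           - np.trace(lam_tilde.T @ lam_tilde))
    return lhs - rhs

rng = np.random.default_rng(0)
p, r = 3, 1
U, _, Vt = np.linalg.svd(rng.standard_normal((p, p)))  # orthonormal bases
beta, beta_perp = U[:, :r], U[:, r:]
alpha_perp = Vt[r:, :]
alpha = rng.standard_normal((r, p))
lam = rng.standard_normal((p - r, p - r))
Pi_hat = rng.standard_normal((p, p))
print(trace_identity_gap(Pi_hat, alpha, beta, beta_perp, alpha_perp, lam))
```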
For the determinant of the Jacobian | J ( Π , ( α , β , λ ) ) | under the linear normalization we refer to the appendix of Kleibergen and Paap [6]. In Appendix B we derive the determinant of the Jacobian under the orthogonal normalization.
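The steps required in the sampling algorithm are:
  • Draw $\Sigma_{i+1}$ from $p_{LEC}(\Sigma \mid Y)$.
  • Draw $\Pi_{i+1}$ from $p_{LEC}(\Pi \mid \Sigma_{i+1}, Y)$.
  • Compute $\alpha_{i+1}$, $\beta_{i+1}$, $\lambda_{i+1}$ from $\Pi_{i+1}$ using the singular value decomposition.
  • Accept $\Sigma_{i+1}$, $\alpha_{i+1}$ and $\beta_{i+1}$ with probability $\min\left\{\frac{w(\alpha_{i+1}, \beta_{i+1}, \lambda_{i+1}, \Sigma_{i+1})}{w(\alpha_i, \beta_i, \lambda_i, \Sigma_i)}, 1\right\}$; otherwise retain $(\Sigma_i, \alpha_i, \beta_i, \lambda_i)$ as the $(i+1)$-th draw.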
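The four steps above can be sketched as one independence-chain M-H routine. This is a minimal sketch under our own assumptions, not the authors' code: it uses the column form $\Delta Y_t = \Pi Y_{t-1} + \varepsilon_t$ without deterministic terms, stacks the data as $p \times T$ arrays, and uses the standard results for this diffuse prior (Σ | Y inverted Wishart with the residual cross-product matrix as scale and $T - p$ degrees of freedom; Π | Σ, Y matric-variate normal around the OLS estimate with $\mathrm{Cov}(\mathrm{vec}\,\Pi) = (\sum_t Y_{t-1}Y_{t-1}')^{-1} \otimes \Sigma$). All function names are hypothetical, and the log weight, which must contain $-\frac{1}{2}\mathrm{tr}(\tilde\lambda'\tilde\lambda)$ and the log Jacobian ratio, is supplied by the caller.

```python
import numpy as np

def draw_lec_posterior(dY, Y_lag, rng):
    """Draw (Sigma, Pi) from the LEC posterior; dY, Y_lag are (p, T) arrays."""
    p, T = dY.shape
    XXt_inv = np.linalg.inv(Y_lag @ Y_lag.T)
    Pi_hat = dY @ Y_lag.T @ XXt_inv                  # OLS estimator of Pi
    resid = dY - Pi_hat @ Y_lag
    S = resid @ resid.T                              # residual cross-product matrix
    # Sigma | Y ~ inverted Wishart(S, T - p): invert a Wishart(T - p, S^{-1}) draw
    L = np.linalg.cholesky(np.linalg.inv(S))
    Z = rng.standard_normal((T - p, p)) @ L.T        # rows ~ N(0, S^{-1})
    Sigma = np.linalg.inv(Z.T @ Z)
    # Pi | Sigma, Y matric-variate normal: Cov(vec Pi) = (Y_lag Y_lag')^{-1} kron Sigma
    A = np.linalg.cholesky(Sigma)
    B = np.linalg.cholesky(XXt_inv)
    Pi = Pi_hat + A @ rng.standard_normal((p, p)) @ B.T
    return Sigma, Pi

def svd_split(Pi, r):
    """(alpha, beta, lambda) from Pi via the SVD (orthogonal normalization)."""
    U, s, Vt = np.linalg.svd(Pi)
    return np.diag(s[:r]) @ Vt[:r], U[:, :r], np.diag(s[r:])

def mh_sampler(dY, Y_lag, r, log_weight, n_iter, rng):
    """Independence M-H chain over (Sigma, alpha, beta) following the listed steps."""
    Sigma, Pi = draw_lec_posterior(dY, Y_lag, rng)
    alpha, beta, lam = svd_split(Pi, r)
    lw = log_weight(alpha, beta, lam, Sigma)
    chain, accepted = [], 0
    for _ in range(n_iter):
        Sigma_new, Pi_new = draw_lec_posterior(dY, Y_lag, rng)
        alpha_new, beta_new, lam_new = svd_split(Pi_new, r)
        lw_new = log_weight(alpha_new, beta_new, lam_new, Sigma_new)
        # accept with probability min(w_new / w_old, 1), on the log scale
        if np.log(rng.uniform()) < lw_new - lw:
            Sigma, alpha, beta, lam, lw = Sigma_new, alpha_new, beta_new, lam_new, lw_new
            accepted += 1
        chain.append((Sigma, alpha, beta))           # on rejection the old draw repeats
    return chain, accepted / n_iter
```

Working with log weights avoids overflow in the ratio of weights; a constant log weight reduces the chain to plain independent draws from the LEC posterior.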

Appendix B: Jacobian of the Transformation from Π to (α, β, λ) under Orthogonal Normalization

Under the orthogonal normalization the components in decomposition (26) can be computed from Π using the singular value decomposition with matrices U , S and V partitioned according to
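$$\Pi = USV' = \begin{pmatrix} U_1 & U_2 \end{pmatrix} \begin{pmatrix} S_1 & 0 \\ 0 & S_2 \end{pmatrix} \begin{pmatrix} V_1' \\ V_2' \end{pmatrix},$$
where $U = (U_1\ U_2)$ and $V = (V_1\ V_2)$ are orthonormal matrices. $U_1$ and $V_1$ are $p \times r$, $U_2$ and $V_2$ are $p \times (p - r)$, and $S_1$ and $S_2$ are diagonal $r \times r$ and $(p - r) \times (p - r)$ matrices. Then, for the orthogonal normalization, the following relations hold: $\beta = U_1$, $\alpha = S_1 V_1'$, $\beta_\perp = U_2$, $\alpha_\perp = V_2'$, and $\lambda = S_2$.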
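A minimal NumPy sketch of this SVD-based decomposition follows; the function name is hypothetical. The sign flip is our own convention: for $r = 1$ it imposes the restriction $\beta_1 \geq 0$ from footnote 2 by flipping $\beta$ together with $\alpha$, which leaves $\beta\alpha$ unchanged; for $r > 1$ it does not by itself identify $\beta_1$ uniquely.

```python
import numpy as np

def orthogonal_decomposition(Pi, r):
    """Split Pi into beta, alpha, beta_perp, alpha_perp, lambda using its SVD."""
    U, s, Vt = np.linalg.svd(Pi)            # Pi = U diag(s) Vt
    beta, beta_perp = U[:, :r], U[:, r:]   # beta = U_1, beta_perp = U_2
    alpha = np.diag(s[:r]) @ Vt[:r, :]     # alpha = S_1 V_1'
    alpha_perp = Vt[r:, :]                 # alpha_perp = V_2'
    lam = np.diag(s[r:])                   # lambda = S_2
    # Assumed sign convention: first element of each column of beta non-negative,
    # with the matching row of alpha flipped so that beta @ alpha is unchanged.
    signs = np.where(beta[0, :] < 0, -1.0, 1.0)
    return beta * signs, alpha * signs[:, None], beta_perp, alpha_perp, lam
```

Setting the smallest $p - r$ singular values (that is, λ) to zero gives the closest rank-$r$ matrix in the Frobenius norm, which is one way to see why λ = 0 corresponds to the rank restriction.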
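Since $\beta_1$ is uniquely identified by $\beta_2$ (see footnote 2), it suffices to derive $J(\Pi, (\alpha, \beta_2, \lambda))$:
$$J(\Pi, (\beta_2, \alpha, \lambda)) = \begin{pmatrix} \dfrac{\partial\, \mathrm{vec}\, \Pi}{\partial (\mathrm{vec}\, \beta_2)'} & \dfrac{\partial\, \mathrm{vec}\, \Pi}{\partial (\mathrm{vec}\, \alpha)'} & \dfrac{\partial\, \mathrm{vec}\, \Pi}{\partial (\mathrm{vec}\, \lambda)'} \end{pmatrix}.$$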
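The expression for $\partial\, \mathrm{vec}\, \Pi / \partial (\mathrm{vec}\, \alpha)'$ is given by
$$\frac{\partial\, \mathrm{vec}\, \Pi}{\partial (\mathrm{vec}\, \alpha)'} = (I_p \otimes \beta) + \frac{\partial\, \mathrm{vec}\, \Pi}{\partial (\mathrm{vec}\, \alpha_\perp)'} \frac{\partial\, \mathrm{vec}\, \alpha_\perp}{\partial (\mathrm{vec}\, \alpha)'}$$
and
$$\frac{\partial\, \mathrm{vec}\, \Pi}{\partial (\mathrm{vec}\, \alpha_\perp)'} = (I_p \otimes \beta_\perp \lambda).$$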
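If we define $c = \begin{pmatrix} I_r \\ 0 \end{pmatrix}$ and $c_\perp = \begin{pmatrix} 0 \\ I_{p-r} \end{pmatrix}$, we have $\alpha_\perp = c_\perp' \left(I_p - c(\alpha c)^{-1}\alpha\right)'$ and
$$\frac{\partial\, \mathrm{vec}\, \alpha_\perp}{\partial (\mathrm{vec}\, \alpha)'} = \left[\left(c(\alpha c)^{-1} \otimes \left(c(\alpha c)^{-1}\alpha c_\perp\right)'\right) - \left(c(\alpha c)^{-1} \otimes c_\perp'\right)\right] K_{r,p},$$
so that
$$\frac{\partial\, \mathrm{vec}\, \Pi}{\partial (\mathrm{vec}\, \alpha)'} = (I_p \otimes \beta) + \left[c(\alpha c)^{-1} \otimes \beta_\perp \lambda c_\perp' \left(c(\alpha c)^{-1}\alpha - I_p\right)'\right] K_{r,p}.$$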
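Then for $\beta_2$ we obtain
$$\begin{aligned} \frac{\partial\, \mathrm{vec}\, \Pi}{\partial (\mathrm{vec}\, \beta_2)'} &= \frac{\partial\, \mathrm{vec}\, \Pi}{\partial (\mathrm{vec}\, \beta)'} \frac{\partial\, \mathrm{vec}\, \beta}{\partial (\mathrm{vec}\, \beta_1)'} \frac{\partial\, \mathrm{vec}\, \beta_1}{\partial (\mathrm{vec}\, \beta_2)'} + \frac{\partial\, \mathrm{vec}\, \Pi}{\partial (\mathrm{vec}\, \beta_\perp)'} \frac{\partial\, \mathrm{vec}\, \beta_\perp}{\partial (\mathrm{vec}\, \beta)'} \frac{\partial\, \mathrm{vec}\, \beta}{\partial (\mathrm{vec}\, \beta_1)'} \frac{\partial\, \mathrm{vec}\, \beta_1}{\partial (\mathrm{vec}\, \beta_2)'} \\ &= (\alpha' \otimes I_n) \left(I_r \otimes \begin{pmatrix} I_r \\ 0_{(n-r) \times r} \end{pmatrix}\right) \frac{\partial\, \mathrm{vec}\, \beta_1}{\partial (\mathrm{vec}\, \beta_2)'} + (\alpha_\perp' \lambda' \otimes I_n) \frac{\partial\, \mathrm{vec}\, \beta_\perp}{\partial (\mathrm{vec}\, \beta)'} \left(I_r \otimes \begin{pmatrix} I_r \\ 0_{(n-r) \times r} \end{pmatrix}\right) \frac{\partial\, \mathrm{vec}\, \beta_1}{\partial (\mathrm{vec}\, \beta_2)'}. \end{aligned}$$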
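The formula for $\partial\, \mathrm{vec}\, \beta_1 / \partial (\mathrm{vec}\, \beta_2)'$ is derived from the orthogonal normalization condition. We have:
$$I_r = \beta_1'\beta_1 + \beta_2'\beta_2 \;\Rightarrow\; 0 = d(\beta_1'\beta_1) + d(\beta_2'\beta_2) \;\Rightarrow\; 0 = (\beta_1' \otimes I_r)\, d\, \mathrm{vec}\, \beta_1' + (I_r \otimes \beta_1')\, d\, \mathrm{vec}\, \beta_1 + (\beta_2' \otimes I_r)\, d\, \mathrm{vec}\, \beta_2' + (I_r \otimes \beta_2')\, d\, \mathrm{vec}\, \beta_2.$$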
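As $K_r\, \mathrm{vec}(A) = \mathrm{vec}(A')$, we have
$$\begin{aligned} 0 &= K_r (I_r \otimes \beta_1')\, d\, \mathrm{vec}\, \beta_1 + (I_r \otimes \beta_1')\, d\, \mathrm{vec}\, \beta_1 + K_r (I_r \otimes \beta_2')\, d\, \mathrm{vec}\, \beta_2 + (I_r \otimes \beta_2')\, d\, \mathrm{vec}\, \beta_2 \\ &= (K_r + I_{r^2})(I_r \otimes \beta_1')\, d\, \mathrm{vec}\, \beta_1 + (K_r + I_{r^2})(I_r \otimes \beta_2')\, d\, \mathrm{vec}\, \beta_2 \\ &= 2N_r (I_r \otimes \beta_1')\, d\, \mathrm{vec}\, \beta_1 + 2N_r (I_r \otimes \beta_2')\, d\, \mathrm{vec}\, \beta_2, \end{aligned}$$
where $N_r = \frac{1}{2}(I_{r^2} + K_r)$. Thus we obtain
$$0 = 2N_r (I_r \otimes \beta_1')\, d\, \mathrm{vec}\, \beta_1 + 2N_r (I_r \otimes \beta_2')\, d\, \mathrm{vec}\, \beta_2$$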
and
$$\frac{\partial\, \mathrm{vec}\, \beta_1}{\partial (\mathrm{vec}\, \beta_2)'} = -\left[N_r (I_r \otimes \beta_1')\right]^{-1} N_r (I_r \otimes \beta_2').$$
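For $r = 1$ the commutation matrix $K_1$ equals 1, so $N_1 = 1$, $I_1 \otimes \beta_1' = \beta_1$ and $I_1 \otimes \beta_2' = \beta_2'$, and the formula reduces to $-\beta_2'/\beta_1$. The sketch below (our own check, with hypothetical function names) compares this with central finite differences of $\beta_1 = \sqrt{1 - \beta_2'\beta_2}$, using the restriction $\beta_1 \geq 0$ of footnote 2.

```python
import numpy as np

def dbeta1_dbeta2_formula(beta2):
    """-[N_1 (I_1 kron beta_1')]^{-1} N_1 (I_1 kron beta_2') for r = 1."""
    beta2 = np.asarray(beta2, dtype=float)     # (n - 1,) entries of beta_2
    beta1 = np.sqrt(1.0 - beta2 @ beta2)       # beta_1 >= 0, see footnote 2
    N1 = 1.0                                   # K_1 = 1, so N_1 = (1 + 1) / 2
    return -(N1 * beta2) / (N1 * beta1)

def dbeta1_dbeta2_numeric(beta2, h=1e-6):
    """Central finite differences of beta_1 = sqrt(1 - beta_2' beta_2)."""
    beta2 = np.asarray(beta2, dtype=float)
    grad = np.empty_like(beta2)
    for j in range(beta2.size):
        step = np.zeros_like(beta2)
        step[j] = h
        up = np.sqrt(1.0 - (beta2 + step) @ (beta2 + step))
        down = np.sqrt(1.0 - (beta2 - step) @ (beta2 - step))
        grad[j] = (up - down) / (2 * h)
    return grad
```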
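Further, because $\beta = U_1$ and $\beta_\perp = U_2$, we can derive $\partial\, \mathrm{vec}\, \beta_\perp / \partial (\mathrm{vec}\, \beta)'$ based on the orthomorphic transformation between $U = (U_1\ U_2)$ and $\tilde X$, where $\tilde X = (I + U)^{-1}(I - U)$ and $U = (I + \tilde X)^{-1}(I - \tilde X)$. We find that
$$\begin{aligned} \frac{\partial\, \mathrm{vec}\, \beta_\perp}{\partial (\mathrm{vec}\, \beta)'} &= \frac{\partial\, \mathrm{vec}\, \beta_\perp}{\partial (\mathrm{vec}\, \tilde X)'} \frac{\partial\, \mathrm{vec}\, \tilde X}{\partial (\mathrm{vec}\, \beta)'} \\ &= \left[-\left(\begin{pmatrix} 0 & I_{p-r} \end{pmatrix} \otimes I_p\right)\left((I_p + U)' \otimes (I_p + \tilde X)^{-1}\right)\right] \times \left[-\left((I_p + \tilde X)' \otimes (I_p + U)^{-1}\right)\left(\begin{pmatrix} I_r \\ 0 \end{pmatrix} \otimes I_p\right)\right], \end{aligned}$$
where $U_1 = U \begin{pmatrix} I_r \\ 0 \end{pmatrix}$ and $U_2 = U \begin{pmatrix} 0 \\ I_{p-r} \end{pmatrix}$, and $d\left((I + U)^{-1}(I - U)\right) = -(I + U)^{-1}\, dU\, (I + U)^{-1}(I - U) - (I + U)^{-1}\, dU$.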
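For $\partial\, \mathrm{vec}\, \Pi / \partial (\mathrm{vec}\, \lambda)'$ we obtain
$$\frac{\partial\, \mathrm{vec}\, \Pi}{\partial (\mathrm{vec}\, \lambda)'} = \alpha_\perp' \otimes \beta_\perp.$$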
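As a numerical cross-check of the Jacobian pieces, the sketch below evaluates $|J(\Pi, (\beta_2, \alpha, \lambda))|$ by central finite differences for $n = 2$, $r = 1$. The parametrization is our own assumption: $\beta = (\sqrt{1 - \beta_2^2}, \beta_2)'$, $\beta_\perp = (-\beta_2, \beta_1)'$ and $\alpha_\perp = (-\alpha_2, \alpha_1)/\|\alpha\|$, i.e. unit-length complements as in the SVD. For this particular parametrization one can show that $|J| = |\rho^2 - \lambda^2| / (\beta_1 \rho)$ with $\rho = \|\alpha\|$, which the numerical value can be compared with; the ratio $|J|\,|_{\lambda=0} / |J|$ in the weight function then equals $\rho^2 / |\rho^2 - \lambda^2|$. Function names are hypothetical.

```python
import numpy as np

def pi_from_params(theta):
    """Pi = beta alpha + beta_perp lambda alpha_perp for n = 2, r = 1."""
    beta2, a1, a2, lam = theta
    beta1 = np.sqrt(1.0 - beta2 ** 2)                      # beta_1 >= 0 (footnote 2)
    beta = np.array([[beta1], [beta2]])                    # 2 x 1, unit length
    beta_perp = np.array([[-beta2], [beta1]])              # unit vector orthogonal to beta
    alpha = np.array([[a1, a2]])                           # 1 x 2
    alpha_perp = np.array([[-a2, a1]]) / np.hypot(a1, a2)  # unit row orthogonal to alpha
    return beta @ alpha + lam * (beta_perp @ alpha_perp)

def jacobian_abs_det(theta, h=1e-6):
    """|det J(Pi, (beta_2, alpha, lambda))| by central finite differences."""
    theta = np.asarray(theta, dtype=float)
    J = np.empty((4, 4))
    for j in range(4):
        step = np.zeros(4)
        step[j] = h
        # vec() stacks columns, hence Fortran order
        J[:, j] = (pi_from_params(theta + step).ravel(order="F")
                   - pi_from_params(theta - step).ravel(order="F")) / (2 * h)
    return abs(np.linalg.det(J))

theta = (0.3, 0.5, -0.2, 0.1)                  # (beta_2, alpha_1, alpha_2, lambda)
ratio = jacobian_abs_det((0.3, 0.5, -0.2, 0.0)) / jacobian_abs_det(theta)
print(ratio)                                   # |J| at lambda = 0 divided by |J|
```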

Appendix C: Tables for Ten Pairs of Stocks

Table 4. Performance evaluation measure Profitability in (22)–(23), which is the ratio of cumulative income to the average daily capital engagement (in %), under a normal distribution for the innovations. The ten pairs of stocks are listed in Table 1. Outperforming normalization in boldface.
k = 0 (Directional Accuracy) k = 1 (Accuracy)
CSA ISA CSA
0%20%30%40%50%60% 0%20%30%40%50%60%
linear normalization of β
AA-OSG 19.525.215.523.225.540.1 22.5 4.22.13.52.175.2
DUK-IBM 7.31716.222.916.774.3 6.5 0.91.81.52.52.37.5
DUK-OSG −310.515.947.746.282.7 24.9 2.94.34.99.211.416.1
NI-NSC 24.935.940.433.533.938.8 7.1 13.219.12525.833.837.4
NI-OSG 11.219.922.334.147.970.6 33.2 3.44.85.77.71014.7
CNP-OSG 14.824.83229.72337.3 10.3 1.62.12.72.83.58.8
MO-UPS −5.8−5.910.123.330.516.6 19.8 1.41.534.35.24.6
NI-R 39.232.922.914.42013.4 24.1 17.815.28.34.73.83.3
NI-UNP 11.29.18.318.631.644 6.4 0.91.31.62.84.76.9
NI-UTX 12.99.59.418.332.148.6 6.5 1.11.51.934.77.7
orthogonal normalization of β
AA-OSG 108.7138.6145.2155.2167.4181.2 133.3 229293.3324.6358.6397.1428.2
DUK-IBM 23.636.332.660.782.3364 14.9 19.734.736.465.9112.9626.1
DUK-OSG 80.985.785.3124.3136.559.7 62.2 77.596.5100.7130.7174.412.1
NI-NSC 46.454.859.881.281.580.9 17.5 32.344.456.482.195.1132.8
NI-OSG 67.1100.394.2113.3160.4150.2 65.5 7496.6108.9130.2164.4103.5
CNP-OSG 25.521.416.917.46.49.2 8.8 3.42.71.82.81.12.1
MO-UPS 0.24.42.815.63266.7 17.9 11.11.32.84.67.5
NI-R 23.712.92.5−10.2−9.412.4 17.2 9.77.75.511.52.6
NI-UNP 14.27.51.516.830.721.6 5.8 211.12.74.73.8
NI-UTX 180.7186.1195.3191.3147.1176.1 118 229.2252.3265.7269.4225.1288.1
Table 5. Performance evaluation measure Profitability in (22)–(23), which is the ratio of cumulative income to the average daily capital engagement (in %), under a Student’s t distribution for the innovations. The ten pairs of stocks are listed in Table 1.
k = 0 (Directional Accuracy) k = 1 (Accuracy)
CSA ISA CSA
0%20%30%40%50%60% 0%20%30%40%50%60%
linear normalization of β
AA-OSG 24.531.118.824.728.841.6 24.6 2.53.233.546.4
DUK-IBM 10.516.515.425.317.895.4 7 1.32.42.13.23.314.9
DUK-OSG 13.82735.845.247.782.6 24.7 2.84.55.17.1811.7
NI-NSC 41.553.173.271.367.679.8 17.8 34.848.270.487.793.8106.3
NI-OSG 2935.742.8517786.8 34.1 4.86.989.913.718
CNP-OSG 17.124.334.217.726.461.2 10.3 1.52.231.73.111.6
MO-UPS 9.97.920.839.84448.7 20.5 2.32.23.75.66.15.6
NI-R 32.428.124.64.211.514.8 23.9 17.415.916.21.52.63.3
NI-UNP 4.98.37.418.631.550.2 6.5 0.91.71.72.44.38.2
NI-UTX 19.717.216.82325.229.8 6.6 4.53.92.92.733.5
orthogonal normalization of β
AA-OSG 118.3150.9170.1182.5192.7213.2 141.1 245.1314351.8391.9427.4466.3
DUK-IBM 25.929.726.663.198.4350.3 14.9 19.830.43565.397.7608.5
DUK-OSG 78.484.296.7115.3148.5101.9 60.5 73.790.5102.5121.3133.516.4
NI-NSC 43.554.658.269.95316.1 12.8 23.44141.554.825.27.8
NI-OSG 55.57774.3101.5152.1119.6 54.8 53.669.980.8114145.1100.2
CNP-OSG 29.726.918.920.815.46.6 9.7 5.87.32.53.32.92.9
MO-UPS 11.14.92.315.430.768.9 18.5 1.71.51.52.94.67.7
NI-R 33.422.113.82.6-12.41.7 16.9 9.36.750.71.31.8
NI-UNP 16159.235.440.448.8 6.5 2.93.43.46.97.811.3
NI-UTX 190.9198.8206.6211.3196.5213.4 127.5 248.5272.8289.7299.3293.8333.8
Table 6. Risk evaluation measure Risk in (24), the fraction of trading days with decreases in cumulative income, under a normal distribution for the innovations. The ten pairs of stocks are listed in Table 1. Outperforming normalization in boldface.
k = 0 (Directional Accuracy) k = 1 (Accuracy)
CSA ISA CSA
0%20%30%40%50%60% 0%20%30%40%50%60%
linear normalization of β
AA-OSG 0.420.430.420.460.410.39 0.48 0.490.480.430.460.420.39
DUK-IBM 0.460.440.460.430.360 0.49 0.550.550.520.50.50
DUK-OSG 0.550.480.490.410.440.36 0.5 0.580.50.520.430.440.36
NI-NSC 0.40.380.370.430.530.54 0.48 0.50.420.40.480.530.54
NI-OSG 0.460.430.430.40.40.36 0.42 0.480.440.440.410.40.36
CNP-OSG 0.420.340.290.30.290.33 0.44 0.480.390.340.330.290.33
MO-UPS 0.520.520.470.420.390.42 0.5 0.580.580.530.460.430.46
NI-R 0.380.390.420.450.450.45 0.49 0.420.440.470.490.470.47
NI-UNP 0.410.450.430.450.420.33 0.44 0.460.490.430.450.420.33
NI-UTX 0.420.470.440.480.460.33 0.45 0.50.490.440.480.460.33
orthogonal normalization of β
AA-OSG 0.450.420.420.40.390.37 0.4 0.510.480.490.450.450.4
DUK-IBM 0.460.420.460.40.360 0.49 0.540.480.520.430.360
DUK-OSG 0.40.440.440.380.420.36 0.4 0.490.520.510.450.480.36
NI-NSC 0.360.360.350.350.390.46 0.44 0.440.420.390.390.430.54
NI-OSG 0.390.320.360.360.310.27 0.37 0.440.390.410.390.340.23
CNP-OSG 0.40.390.40.420.420.4 0.45 0.440.410.420.450.420.4
MO-UPS 0.520.520.540.490.440.31 0.53 0.580.580.60.560.50.34
NI-R 0.450.460.50.540.570.51 0.5 0.490.520.560.610.630.56
NI-UNP 0.440.460.540.50.460.47 0.46 0.520.520.590.570.540.53
NI-UTX 0.220.250.220.230.260.29 0.41 0.350.340.320.320.350.38
Table 7. Risk evaluation measure Risk in (24), the fraction of trading days with decreases in cumulative income, under a Student’s t distribution for the innovations. The ten pairs of stocks are listed in Table 1.
k = 0 (Directional Accuracy) k = 1 (Accuracy)
CSA ISA CSA
0%20%30%40%50%60% 0%20%30%40%50%60%
linear normalization of β
AA-OSG 0.450.460.490.460.420.37 0.48 0.480.480.490.460.420.37
DUK-IBM 0.450.480.490.460.430 0.5 0.540.550.530.50.430
DUK-OSG 0.510.460.460.380.440.36 0.49 0.540.470.480.410.440.36
NI-NSC 0.370.320.270.390.410.4 0.48 0.450.380.360.430.470.47
NI-OSG 0.410.390.390.380.320.33 0.44 0.450.40.40.390.340.36
CNP-OSG 0.450.380.330.340.290.25 0.46 0.50.420.390.420.330.25
MO-UPS 0.490.490.470.410.420.42 0.49 0.530.550.530.480.510.46
NI-R 0.420.410.440.490.50.47 0.48 0.450.440.480.530.520.49
NI-UNP 0.450.450.40.40.380.25 0.45 0.540.510.460.470.460.25
NI-UTX 0.430.440.450.430.410.4 0.5 0.490.480.480.470.440.44
orthogonal normalization of β
AA-OSG 0.460.410.390.370.350.34 0.42 0.490.430.420.380.370.36
DUK-IBM 0.420.430.440.340.230 0.45 0.510.520.520.380.230
DUK-OSG 0.390.410.40.370.270.18 0.4 0.510.520.510.460.370.27
NI-NSC 0.290.320.320.350.40.5 0.43 0.390.370.320.310.350.42
NI-OSG 0.440.40.380.390.320.38 0.42 0.50.450.440.430.350.43
CNP-OSG 0.370.380.350.380.360.4 0.44 0.430.420.380.380.360.4
MO-UPS 0.520.530.540.490.430.3 0.5 0.550.550.560.510.430.3
NI-R 0.420.450.490.530.570.55 0.5 0.50.50.510.560.590.55
NI-UNP 0.430.470.480.40.440.4 0.47 0.460.470.480.40.440.4
NI-UTX 0.240.250.240.250.250.24 0.49 0.380.40.370.370.380.36

References

  1. A.E. Khandani, and A.W. Lo. “What happened to the quants in August 2007?” J. Invest. Manag. 5 (2007): 29–78.
  2. B. Lehmann. “Fads, martingales and market efficiency.” Q. J. Econ. 105 (1990): 1–28.
  3. A.W. Lo, and A.C. MacKinlay. “When are contrarian profits due to stock market overreaction?” Rev. Financ. Stud. 3 (1990): 175–206.
  4. E. Gatev, W.N. Goetzmann, and K.G. Rouwenhorst. “Pairs trading: Performance of a relative-value arbitrage rule.” Rev. Financ. Stud. 19 (2006): 797–827.
  5. F.R. Kleibergen, and H.K. van Dijk. “Bayesian simultaneous equation analysis using reduced rank structures.” Econom. Theory 14 (1998): 701–743.
  6. F.R. Kleibergen, and R. Paap. “Priors, posteriors and Bayes factors for a Bayesian analysis of cointegration.” J. Econom. 111 (2002): 223–249.
  7. R.W. Strachan, and H.K. van Dijk. Valuing Structure, Model Uncertainty and Model Averaging in Vector Autoregressive Processes. Econometric Institute Report EI 2004-23; Rotterdam, the Netherlands: Erasmus University Rotterdam, 2004.
  8. A.N. Burgess. “A Computational Methodology for Modelling the Dynamics of Statistical Arbitrage.” Ph.D. Thesis, University of London, London Business School, London, UK, 1999.
  9. S. Johansen. “Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models.” Econometrica 59 (1991): 1551–1580.
  10. N. Baştürk, L.F. Hoogerheide, R. Kleijn, and H.K. van Dijk. “Prior ignorance, likelihood shape and posterior existence in a cointegration model.” Unpublished working paper. 2015.
  11. J. Geweke. “Bayesian treatment of independent Student-t linear model.” J. Appl. Econom. 8 (1993): 19–40.
  12. J. Dickey. “The weighted likelihood ratio, linear hypothesis on normal location parameters.” Ann. Math. Stat. 42 (1971): 204–223.
  13. I. Verdinelli, and L. Wasserman. “Computing Bayes factors using a generalization of the Savage-Dickey density ratio.” J. Am. Stat. Assoc. 90 (1995): 614–618.
  14. J.C. Chao, and P.C.B. Phillips. “Model selection in partially nonstationary vector autoregressive processes with reduced rank structure.” J. Econom. 91 (1999): 227–271.
  15. L.F. Hoogerheide, A. Opschoor, and H.K. van Dijk. “A class of adaptive importance sampling weighted EM algorithms for efficient and robust posterior and predictive simulation.” J. Econom. 171 (2012): 101–120.
  16. K.D. West, H.J. Edison, and D. Cho. “A utility-based comparison of some models of exchange rate volatility.” J. Int. Econ. 35 (1993): 23–45.
  17. W. Marquering, and M. Verbeek. “The economic value of predicting stock index returns and volatility.” J. Financ. Q. Anal. 39 (2004): 407–429.
  18. T. Furmston, S. Hailes, and A.J. Morton. “A Bayesian Residual-Based Test for Cointegration.” 2013. Available online: http://arxiv.org/abs/1311.0524 (accessed on 21 February 2016).
  19. C. Bracegirdle, and D. Barber. “Bayesian Conditional Cointegration.” 2012. Available online: http://arxiv.org/abs/1206.6459 (accessed on 21 February 2016).
  20. F.R. Kleibergen, and H.K. van Dijk. “On the shape of the likelihood/posterior in cointegration models.” Econom. Theory 10 (1994): 514–551.
  21. A. Zellner. An Introduction to Bayesian Inference in Econometrics. New York, NY, USA: Wiley, 1971.
  22. M.H. Chen. “Importance-weighted marginal Bayesian posterior density estimation.” J. Am. Stat. Assoc. 89 (1994): 818–824.
Table 1. Ten pairs of stocks with highest Bayes factor of model with Π having rank 1 (cointegration) versus model with Π having rank 0 (two random walks) under both the linear normalization and the orthogonal normalization (among stocks in the Dow Jones Composite Average index, using daily closing prices recorded over the period of 1 January 2009 until 30 June 2009).
AA-OSG:ALCOA Inc.-Overseas Shipholding Group, Inc.
CNP-OSG:CenterPoint Energy, Inc.-Overseas Shipholding Group, Inc.
DUK-IBM:Duke Energy Corp.-International Business Machines Corp.
DUK-OSG:Duke Energy Corp.-Overseas Shipholding Group, Inc.
MO-UPS:Altria Group, Inc.-United Parcel Service, Inc.
NI-NSC:NiSource, Inc.-Norfolk Southern Corp.
NI-OSG:NiSource, Inc.-Overseas Shipholding Group, Inc.
NI-R:NiSource, Inc.-Ryder System, Inc.
NI-UNP:NiSource, Inc.-Union Pacific Corp.
NI-UTX:NiSource, Inc.-United Technologies Corp.
Table 2. Performance evaluation measure Profitability in (22)–(23) (in %). Average over 10 pairs of assets.
CSA ( k = 0 , Directional Accuracy)
ξ
0%20%30%40%50%60%
linear normalization
normal 12.5217.0819.7226.9431.3247.37
Student’s t 19.8724.2330.1132.938.7461.03
orthogonal normalization
normal 57.1064.863.6176.5683.49112.2
Student’s t 60.2766.4167.6781.7891.53114.05
ISA CSA ( k = 1 , accuracy)
ξ
0%20%30%40%50%60%
linear normalization
normal 15.42 4.805.736.076.988.8211.89
Student’s t 16.82 7.819.7712.5713.5315.3220.34
orthogonal normalization
normal 46.11 67.7883.0390.24104.62118.09160.68
Student’s t 46.32 68.3883.7591.37106.04113.93155.67
Table 3. Risk evaluation measure Risk in (24). Average over 10 pairs of assets.
CSA ( k = 0 , Directional Accuracy)
ξ
0%20%30%40%50%60%
linear normalization
normal 0.450.430.420.420.420.35
Student’s t 0.440.420.410.410.400.32
orthogonal normalization
normal 0.400.400.420.410.400.34
Student’s t 0.390.400.400.390.360.33
ISA CSA ( k = 1 , accuracy)
ξ
0%20%30%40%50%60%
linear normalization
normal 0.47 0.510.480.450.450.440.35
Student’s t 0.48 0.500.470.460.460.440.34
orthogonal normalization
normal 0.45 0.480.460.480.460.450.37
Student’s t 0.46 0.470.470.450.420.390.35
  • 1. We also considered models with a constant term inside the cointegration relationship and/or drift terms in the model equation. The inclusion of such terms did not change the conclusions of our paper.
  • 2. In general we can add restrictions to the normalization restriction $\beta'\beta = I_r$ in order to uniquely identify the $r \times r$ matrix $\beta_1$ given the $(n - r) \times r$ matrix $\beta_2$, where $\beta = (\beta_1'\ \beta_2')'$. For example, if $n = 2$ and $r = 1$, we can add the restriction $\beta_1 \geq 0$, so that only $\beta_1 = \sqrt{1 - \beta_2^2}$ satisfies $\beta_1^2 + \beta_2^2 = 1$. Without the restriction $\beta_1 \geq 0$ we could have $\beta_1 = \sqrt{1 - \beta_2^2}$ or $\beta_1 = -\sqrt{1 - \beta_2^2}$, so that in that case $\beta_1$ would not be uniquely identified by $\beta_2$.

Share and Cite

MDPI and ACS Style

Ardia, D.; Gatarek, L.T.; Hoogerheide, L.; Van Dijk, H.K. Return and Risk of Pairs Trading Using a Simulation-Based Bayesian Procedure for Predicting Stable Ratios of Stock Prices. Econometrics 2016, 4, 14. https://doi.org/10.3390/econometrics4010014

AMA Style

Ardia D, Gatarek LT, Hoogerheide L, Van Dijk HK. Return and Risk of Pairs Trading Using a Simulation-Based Bayesian Procedure for Predicting Stable Ratios of Stock Prices. Econometrics. 2016; 4(1):14. https://doi.org/10.3390/econometrics4010014

Chicago/Turabian Style

Ardia, David, Lukasz T. Gatarek, Lennart Hoogerheide, and Herman K. Van Dijk. 2016. "Return and Risk of Pairs Trading Using a Simulation-Based Bayesian Procedure for Predicting Stable Ratios of Stock Prices" Econometrics 4, no. 1: 14. https://doi.org/10.3390/econometrics4010014

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
