Pair-Copula Constructions for Financial Applications: A Review

Abstract: This survey reviews the large and growing literature on the use of pair-copula constructions (PCCs) in financial applications. Using a PCC, multivariate data that exhibit complex patterns of dependence can be modeled using bivariate copulae as simple building blocks. Hence, this model represents a very flexible way of constructing higher-dimensional copulae. In this paper, we survey inference methods and goodness-of-fit tests for such models, as well as empirical applications of PCCs in finance and economics.


Introduction
Understanding and quantifying dependence is at the core of all modeling efforts in financial econometrics. For modeling high-dimensional data that exhibit non-linear dependence, a copula approach is often taken [1,2]. The concept of copulae was introduced as early as 1959 by Sklar [3], but it was the seminal work of Embrechts et al. [4], introducing copulae to the field of financial risk management, that really led to the incredible growth in papers published on this subject over the last 15 years. From a practical point of view, the advantage of the copula-based approach is that the appropriate marginal distributions for the components of a multivariate system can be selected freely and then linked through a suitable copula. Hence, the dependence structure may be modeled independently of the marginal distributions.
For bivariate models, there exists a long and varied list of copula families; see, e.g., [1]. However, in higher dimensions, the selection of parametric copulae is still rather limited [5]. This has led to the development of hierarchical copula-based structures, of which the most promising is the pair-copula construction (PCC). This structure was originally proposed by Joe [6] and further explored and discussed by Bedford and Cooke [7,8] and Kurowicka and Cooke [9]. However, it was the work of Aas et al. [10], putting the PCC in an inferential context, that really spurred a surge in empirical applications of these constructions. During the last eight years, pair-copula constructions have been applied within a number of different fields, including finance and insurance, genetics, marketing, health and hydrology. The focus of this survey is, however, on the financial applications.
In the sections that follow, we first give an overview of the pair-copula construction and its subclass, the regular vine. We then consider inference methods and goodness-of-fit tests for these models, and finally, we present a survey of some of the numerous applications of PCCs that have appeared in the economics and finance literature.

The Pair-Copula Construction and the Regular Vine
A PCC is a multivariate copula that is constructed from a set of bivariate ones, so-called pair-copulae. More specifically, the copula density is decomposed into a product of pair-copula densities. All of these bivariate copulae may be selected completely freely, as the resulting structure is guaranteed to be a valid copula. Hence, PCCs are highly flexible and able to characterize a wide range of complex dependencies. Inference on PCCs is in general demanding, but the subclass of regular vines has many appealing computational properties and, hence, constitutes an exception in the inferential context.
The notion of regular vines (R-vines) was introduced by Bedford and Cooke [8] and described in more detail in [9]. It involves the specification of a sequence of trees, each edge of which corresponds to a pair-copula. These pair-copulae constitute the building blocks of the joint R-vine distribution. According to Definition 4.4 of [9], an R-vine $V$ on $d$ variables consists of the trees (also denoted levels) $T_1, \ldots, T_{d-1}$. Let $N_i$ and $E_i$ be the sets of nodes and edges, respectively, in tree $T_i$. Then, the following conditions are satisfied:
1. Tree $T_1$ has nodes $N_1 = \{1, \ldots, d\}$ and edges $E_1$.
2. For $i = 2, \ldots, d-1$, the nodes in tree $T_i$ are the edges in tree $T_{i-1}$, i.e., $N_i = E_{i-1}$.
3. Proximity condition: if two edges in tree $T_i$ are to be joined as nodes in tree $T_{i+1}$ by an edge, they must share a common node in $T_i$.
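The proximity condition can be made concrete with a small check. The following is only a sketch under our own conventions, not part of [8,9]: an edge is represented as a `frozenset` of its two end nodes, so that the edges of one tree can serve directly as the nodes of the next. The function names are hypothetical, and only condition 3 is verified (not that each level is actually a tree).

```python
def shares_node(edge_a, edge_b):
    """True if two edges of the same tree have exactly one node in common."""
    return len(edge_a & edge_b) == 1


def satisfies_proximity(trees):
    """Check condition 3 for a list of trees, each given as a list of edges.

    trees[0] holds the edges of T_1 (frozensets of variable indices);
    an edge of a later tree is a frozenset of two edges of the tree below it.
    """
    for tree in trees[1:]:
        for edge in tree:
            lower_a, lower_b = tuple(edge)
            if not shares_node(lower_a, lower_b):
                return False
    return True


# A valid specification on four variables (a path 1 - 2 - 3 - 4 in T_1).
e12, e23, e34 = frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})
e13_2, e24_3 = frozenset({e12, e23}), frozenset({e23, e34})
valid_vine = [[e12, e23, e34], [e13_2, e24_3], [frozenset({e13_2, e24_3})]]
print(satisfies_proximity(valid_vine))
```

Joining the edges {1, 2} and {3, 4} in the second tree would violate the condition, since they share no node in the first tree.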
To build an R-vine with node set $N := \{N_1, \ldots, N_{d-1}\}$ and edge set $E := \{E_1, \ldots, E_{d-1}\}$, one associates each edge $e$ in $E_i$ with a bivariate copula $C_{j(e),k(e)|D(e)}$. The nodes $j(e)$ and $k(e)$ are called the conditioned nodes, while $D(e)$ is denoted the conditioning set and the union $\{j(e), k(e), D(e)\}$ the constraint set. The copulae in tree $T_1$ have an empty conditioning set; in tree $T_2$, these sets consist of one node, in tree $T_3$ of two nodes, and so on. Take, for instance, the edge 5,4|231 joining {5,2|31} and {4,2|13} in the fourth tree of Figure 1, which displays a seven-dimensional R-vine tree specification. The conditioned nodes are 5 and 4; the conditioning set is {1, 2, 3}; and the constraint set is {5, 4, 1, 2, 3}.
Let the random vector $X$ follow an R-vine distribution. Further, let $X_{D(e)}$ denote the subvector of $X$ determined by the indices constituting $D(e)$. Then, Theorem 4.2 in [9] states that the joint density of $X$ can be written as:

$$f(x) = \prod_{k=1}^{d} f_k(x_k) \prod_{i=1}^{d-1} \prod_{e \in E_i} c_{j(e),k(e)|D(e)}\big(F(x_{j(e)}|x_{D(e)}), F(x_{k(e)}|x_{D(e)})\big). \tag{1}$$

The right factor of the right-hand side of (1) is a product of $d(d-1)/2$ bivariate copula densities and is called an R-vine copula. Note that the arguments of the pair-copulae are conditional distributions in all trees but the first, where they are the univariate margins.
The key to the construction in (1) is that all copulae involved in the decomposition are bivariate and can belong to different families. There are no restrictions regarding the copula types that can be combined; the resulting structure is guaranteed to be valid. A further advantage of the R-vine copula is that the conditional distributions $F(x|v)$ constituting the pair-copula arguments can be evaluated using a recursive formula derived in [6]:

$$F(x|v) = \frac{\partial C_{x v_j|v_{-j}}\big(F(x|v_{-j}), F(v_j|v_{-j})\big)}{\partial F(v_j|v_{-j})}.$$

Here, $C_{x v_j|v_{-j}}$ is a bivariate copula; $v_j$ is an arbitrary component of $v$; and $v_{-j}$ denotes the vector $v$ excluding $v_j$. By construction, R-vines have the important characteristic that the copulae in question are always present in the preceding trees of the structure, so that they are available without extra computations.
In order to find an expression for a general R-vine density, one needs an efficient way of storing the indices involved in the pair-copulae. One such approach was proposed by Morales-Napoles [11] and explored in more detail in [12]. It involves the specification of a lower triangular matrix $M = (m_{i,j})_{i,j=1,\ldots,d} \in \{0, \ldots, d\}^{d \times d}$ whose diagonal entries $m_{i,i}$ are the nodes $1, \ldots, d$ of the first tree. Further, each row of $M$ from the bottom up represents a tree. The conditioned set of an edge is determined by a diagonal entry and the corresponding column entry of the row under consideration, while the conditioning set is given by the column entries below this row. Consider the R-vine matrix corresponding to the R-vine in Figure 1 (matrix not shown). To determine the edges in $T_1$, we combine the numbers in the bottom row with the diagonal elements in the corresponding columns, i.e., the edges are (6,5), (7,5), (5,1), and so on. The edges of $T_2$ are given by the numbers in the second row from the bottom, associated with the diagonal elements, conditioning on the elements in the bottom row, namely (6,1|5), (7,1|5), etc. Proceeding like this, the only edge in $T_6$ is found by coupling the two upper elements in the leftmost column with the remaining five entries of the column as a conditioning set, i.e., (6,4|7,2,3,1,5).
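To illustrate the recursive formula for $F(x|v)$ discussed in this section, the sketch below uses Gaussian pair-copulae, for which the partial derivative (often called an h-function) has the closed form $\Phi\big((\Phi^{-1}(u) - \rho\,\Phi^{-1}(v))/\sqrt{1-\rho^2}\big)$. The choice of the Gaussian family, the function names and the parameter values are our assumptions for illustration; any other bivariate family with a known h-function could be substituted.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def gaussian_h(u, v, rho):
    """h-function of the Gaussian copula: F(u | v) for correlation rho."""
    x = _STD_NORMAL.inv_cdf(u)
    y = _STD_NORMAL.inv_cdf(v)
    return _STD_NORMAL.cdf((x - rho * y) / sqrt(1.0 - rho * rho))


def cdf_given_two(u1, u2, u3, rho12, rho23, rho13_2):
    """F(x1 | x2, x3) by the recursion with v_j = x3 and v_{-j} = x2.

    The two inner conditional distributions come from the first tree
    (copulae C_12 and C_23); the outer step uses the second-tree copula
    C_13|2, so no extra computations beyond earlier trees are required.
    """
    f1_given_2 = gaussian_h(u1, u2, rho12)
    f3_given_2 = gaussian_h(u3, u2, rho23)
    return gaussian_h(f1_given_2, f3_given_2, rho13_2)


# Hypothetical inputs: margins already transformed to the unit interval.
print(cdf_given_two(0.3, 0.6, 0.8, rho12=0.5, rho23=0.4, rho13_2=0.2))
```

With all correlations set to zero, the function returns `u1` unchanged, as expected under independence.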
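Based on $M$, the R-vine density may be written as in [12]:

$$f(x) = \prod_{k=1}^{d} f_k(x_k) \prod_{j=1}^{d-1} \prod_{i=j+1}^{d} c_{m_{j,j},\, m_{i,j} | m_{i+1,j}, \ldots, m_{d,j}},$$

where the pair-copulae have arguments $F(x_{m_{j,j}}|x_{m_{i+1,j}}, \ldots, x_{m_{d,j}})$ and $F(x_{m_{i,j}}|x_{m_{i+1,j}}, \ldots, x_{m_{d,j}})$.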
Corresponding copula types and parameters can conveniently be stored in matrices similar to M.
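The rule for reading edges off the matrix can be sketched in a few lines. The 4 × 4 matrix below is a hypothetical example of our own (a D-vine on the path 1 - 2 - 3 - 4), not the matrix belonging to Figure 1, and the function name is likewise an assumption.

```python
def edges_from_rvine_matrix(matrix):
    """Return {tree level: [((conditioned pair), conditioning tuple), ...]}.

    Each column pairs its diagonal entry with an entry further down;
    the entries below that row form the conditioning set.
    """
    d = len(matrix)
    trees = {level: [] for level in range(1, d)}
    for col in range(d - 1):
        diagonal = matrix[col][col]
        for row in range(d - 1, col, -1):
            level = d - row  # the bottom row corresponds to tree 1
            partner = matrix[row][col]
            conditioning = tuple(matrix[r][col] for r in range(row + 1, d))
            trees[level].append(((diagonal, partner), conditioning))
    return trees


# Hypothetical lower triangular R-vine matrix (zeros above the diagonal).
M = [
    [1, 0, 0, 0],
    [4, 2, 0, 0],
    [3, 4, 3, 0],
    [2, 3, 4, 4],
]
for level, edges in edges_from_rvine_matrix(M).items():
    print(level, edges)
```

For this matrix, the first tree contains the edges (1,2), (2,3) and (3,4), the second (1,3|2) and (2,4|3), and the third the single edge (1,4|3,2).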

Simplifying Assumption
In their general form, PCCs can represent most continuous multivariate distributions. However, to keep them tractable for inference, it is usually assumed that the pair-copulae $c_{j(e),k(e)|D(e)}\big(F(x_{j(e)}|x_{D(e)}), F(x_{k(e)}|x_{D(e)})\big)$ are independent of the conditioning variables $x_{D(e)}$, except through the conditional distributions. This leads to the so-called simplified PCC.
Even though not all multivariate distributions can be represented by a simplified PCC, it may always be used as an approximation. The work in [13] shows that the approximation may in fact be a good one, even when the simplifying assumption is far from being fulfilled. This subject has also been investigated by Stöber et al. [14], Killiches et al. [15] and Spanhel and Kurz [16].
There have been some attempts at estimating non-simplified vines [17,18]. However, since the use of such methods in financial applications is still very limited, we do not describe them here.

Canonical Vines and D-Vines
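In financial applications, two special cases of regular vines have mainly been used. These are denoted canonical vines and D-vines, respectively [19]. Each model gives a specific way of decomposing the density. Figure 2 shows the specification corresponding to a five-dimensional D-vine. It consists of four trees $T_j$, $j = 1, \ldots, 4$. Tree $T_j$ has $6 - j$ nodes and $5 - j$ edges. Each edge corresponds to a pair-copula density, and the edge label corresponds to the subscript of the pair-copula density, e.g., edge 14|23 corresponds to the copula density $c_{14|23}(\cdot)$. The whole decomposition is defined by the $n(n-1)/2$ edges and the marginal densities of each variable. The nodes in tree $T_j$ are only necessary for determining the labels of the edges in tree $T_{j+1}$. As can be seen from Figure 2, two edges in $T_j$, which become nodes in $T_{j+1}$, are joined by an edge in $T_{j+1}$ only if these edges in $T_j$ share a common node.
The density $f(x_1, \ldots, x_n)$ corresponding to a D-vine may be written as:

$$f(x_1, \ldots, x_n) = \prod_{k=1}^{n} f(x_k) \prod_{j=1}^{n-1} \prod_{i=1}^{n-j} c_{i,i+j|i+1,\ldots,i+j-1}\big(F(x_i|x_{i+1}, \ldots, x_{i+j-1}), F(x_{i+j}|x_{i+1}, \ldots, x_{i+j-1})\big),$$

where index $j$ identifies the trees, while $i$ runs over the edges in each tree. The D-vine in Figure 2 has density:

$$\begin{aligned} f(x_1, \ldots, x_5) = {} & f(x_1)\, f(x_2)\, f(x_3)\, f(x_4)\, f(x_5) \\ & \cdot c_{12}\big(F(x_1), F(x_2)\big)\, c_{23}\big(F(x_2), F(x_3)\big)\, c_{34}\big(F(x_3), F(x_4)\big)\, c_{45}\big(F(x_4), F(x_5)\big) \\ & \cdot c_{13|2}\big(F(x_1|x_2), F(x_3|x_2)\big)\, c_{24|3}\big(F(x_2|x_3), F(x_4|x_3)\big)\, c_{35|4}\big(F(x_3|x_4), F(x_5|x_4)\big) \\ & \cdot c_{14|23}\big(F(x_1|x_2,x_3), F(x_4|x_2,x_3)\big)\, c_{25|34}\big(F(x_2|x_3,x_4), F(x_5|x_3,x_4)\big) \\ & \cdot c_{15|234}\big(F(x_1|x_2,x_3,x_4), F(x_5|x_2,x_3,x_4)\big). \end{aligned}$$

Figure 2. A D-vine with 5 variables, 4 trees and 10 edges. Each edge may be associated with a pair-copula.

In a D-vine, no node in any tree $T_j$ is connected to more than two edges. In a canonical vine, each tree $T_j$ has a unique node that is connected to $n - j$ edges. Figure 3 shows a canonical vine with five variables. The $n$-dimensional density corresponding to a canonical vine is given by:

$$f(x_1, \ldots, x_n) = \prod_{k=1}^{n} f(x_k) \prod_{j=1}^{n-1} \prod_{i=1}^{n-j} c_{j,j+i|1,\ldots,j-1}\big(F(x_j|x_1, \ldots, x_{j-1}), F(x_{j+i}|x_1, \ldots, x_{j-1})\big).$$

The canonical vine in Figure 3 has density:

$$\begin{aligned} f(x_1, \ldots, x_5) = {} & f(x_1)\, f(x_2)\, f(x_3)\, f(x_4)\, f(x_5) \\ & \cdot c_{12}\big(F(x_1), F(x_2)\big)\, c_{13}\big(F(x_1), F(x_3)\big)\, c_{14}\big(F(x_1), F(x_4)\big)\, c_{15}\big(F(x_1), F(x_5)\big) \\ & \cdot c_{23|1}\big(F(x_2|x_1), F(x_3|x_1)\big)\, c_{24|1}\big(F(x_2|x_1), F(x_4|x_1)\big)\, c_{25|1}\big(F(x_2|x_1), F(x_5|x_1)\big) \\ & \cdot c_{34|12}\big(F(x_3|x_1,x_2), F(x_4|x_1,x_2)\big)\, c_{35|12}\big(F(x_3|x_1,x_2), F(x_5|x_1,x_2)\big) \\ & \cdot c_{45|123}\big(F(x_4|x_1,x_2,x_3), F(x_5|x_1,x_2,x_3)\big). \end{aligned}$$

Fitting a canonical vine might be advantageous when a particular variable is known to be a key variable that governs interactions in the dataset. In such a situation, one may decide to locate this variable at the root of the canonical vine, as we have done with Variable 1 in Figure 3.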
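The two decompositions differ only in how the edge labels are indexed, which a short enumeration makes visible. The sketch assumes the usual labelling in which tree $j$ of a D-vine holds the edges $i, i+j \,|\, i+1, \ldots, i+j-1$ and tree $j$ of a canonical vine holds the edges $j, j+i \,|\, 1, \ldots, j-1$ (consistent with the edge 14|23 mentioned for Figure 2 and with Variable 1 as root in Figure 3); the function names are ours.

```python
def dvine_edges(n):
    """Edge labels ((i, i+j), conditioning set) for each tree j of a D-vine."""
    return {
        j: [((i, i + j), tuple(range(i + 1, i + j))) for i in range(1, n - j + 1)]
        for j in range(1, n)
    }


def cvine_edges(n):
    """Edge labels ((j, j+i), conditioning set) for each tree j of a canonical vine."""
    return {
        j: [((j, j + i), tuple(range(1, j))) for i in range(1, n - j + 1)]
        for j in range(1, n)
    }


for name, trees in (("D-vine", dvine_edges(5)), ("C-vine", cvine_edges(5))):
    for j, edges in trees.items():
        labels = [f"{a}{b}|{''.join(map(str, cond))}".rstrip("|") for (a, b), cond in edges]
        print(name, f"T{j}:", ", ".join(labels))
```

Both enumerations produce $n(n-1)/2 = 10$ edges for $n = 5$; in the canonical vine, the first index in tree $j$ is always the root $j$.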

Serial Dependence
To date, pair-copula constructions have been employed largely to account for cross-sectional dependence. Applications to serial dependence in time series and longitudinal data are rare. There are, however, some exceptions. The work in [20], studying intraday electricity load data, was the first to demonstrate the usefulness of serial PCCs. Later, Vaz de Melo Mendes and Accioly [21] used canonical vines for modeling nonlinear temporal dependences of Brazilian series of realized volatilities, while in [22], the focus is on serial dependence in equity time series. Finally, Brechmann and Czado [23] recently introduced the so-called copula autoregressive model, COPAR, which allows for non-linear and non-symmetric modeling of both serial and between-series dependence.

Inference
Inference on R-vines consists of three tasks: (i) selecting the structure with all its trees; (ii) choosing a copula family for each of the $d(d-1)/2$ pair-copulae; and (iii) estimating the parameters of each pair-copula. Ideally, Steps (i)-(iii) should be performed simultaneously. In practice, however, this is usually done stepwise. In what follows, we give a short review of the main approaches that have been used for each step. See [24,25] for more comprehensive surveys of the various model selection and estimation methods that have been used for regular vine copulae.

Structure Selection
The number of possible R-vines on $d$ variables is $2^{\binom{d-2}{2}-1}\, d!$ [11]. Finding the globally optimal R-vine structure for a given high-dimensional dataset is therefore infeasible, but several useful strategies have been proposed. Since the first trees can be estimated with more precision, a natural strategy is to build the structure starting from the bottom, trying to maximize the dependence in the first trees. This strategy was originally proposed in [10] for canonical vines and D-vines and later extended to regular vines by Dißmann et al. [12]. The latter algorithm starts by finding the maximum spanning tree over the $d$ nodes corresponding to the $d$ variables using the well-known algorithm of [26]. This is a tree on all nodes that maximizes the sum of the weights of the edges, using measures of pairwise dependence as weights. The subsequent trees are built in a similar manner, under the additional restriction that the proximity condition must be fulfilled. This procedure, which to our knowledge is by far the most widely used in practical applications, requires the simultaneous selection of pair-copula types, as well as the estimation of the parameters. There are alternatives to this bottom-up strategy; the approach in [27], for example, starts by selecting the weakest conditional dependencies for the highest trees. See [25] for a comparison of these two selection procedures.
Recently, Bayesian approaches for estimating the posterior distribution of the tree structure of a regular vine have been developed [28]. They are not treated further here, since their use in financial applications has been limited.
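The first step of the bottom-up strategy can be sketched as follows. This is a minimal illustration, assuming absolute Kendall's τ as the edge weight and a simple greedy (Prim-style) search; the text only refers to the algorithm of [26], so the specific search routine, the function names and the toy data are our assumptions.

```python
def kendall_tau(x, y):
    """Kendall's tau from concordant minus discordant pairs (assumes no ties)."""
    n = len(x)
    score = 0
    for i in range(n):
        for k in range(i + 1, n):
            s = (x[i] - x[k]) * (y[i] - y[k])
            score += (s > 0) - (s < 0)
    return 2.0 * score / (n * (n - 1))


def maximum_spanning_tree(weights):
    """Greedy maximum spanning tree on a symmetric weight matrix."""
    d = len(weights)
    in_tree = {0}
    edges = []
    while len(in_tree) < d:
        # Heaviest edge leading from the current tree to a new node.
        best = max(
            ((i, k) for i in in_tree for k in range(d) if k not in in_tree),
            key=lambda e: weights[e[0]][e[1]],
        )
        edges.append(best)
        in_tree.add(best[1])
    return edges


def first_tree(columns):
    """Edges of T_1 for a dataset given as a list of variable columns."""
    d = len(columns)
    weights = [
        [abs(kendall_tau(columns[i], columns[k])) if i != k else 0.0 for k in range(d)]
        for i in range(d)
    ]
    return maximum_spanning_tree(weights)


# Hypothetical toy data: three variables, six observations each.
data = [[1, 2, 3, 4, 5, 6], [2, 1, 3, 5, 4, 6], [6, 4, 5, 1, 3, 2]]
print(first_tree(data))
```

Higher trees would repeat the same search on the transformed (conditional) observations, restricted to edges allowed by the proximity condition.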

Choosing Copula Families
There are many possible pair-copula families, e.g., Gaussian, t, Gumbel and Clayton. See [1,2] for a more comprehensive list. The copula types are typically chosen one by one, using either a model selection criterion, such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC) or the copula information criterion (CIC) [29], or a copula goodness-of-fit test. In [30], four different strategies are compared, among them AIC and the goodness-of-fit test based on the Cramér-von Mises statistic. In this study, the AIC turned out to be the most reliable selection criterion.
It should be noted that the selection of a family for a copula in a specific level of the vine depends on the choices made at the preceding level. This is due to the fact that in the sequential estimation procedure, the observations at one level are given as partial derivatives of the copulae at the preceding level. As discussed in [24,31], this selection strategy clearly accumulates uncertainty in the selection, and hence, the final model has to be carefully evaluated.
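Criterion-based family selection for a single pair-copula reduces to comparing penalized log-likelihoods. The sketch below uses the standard definitions AIC $= -2\ell + 2k$ and BIC $= -2\ell + k \ln n$; the fitted log-likelihood values are made-up numbers for illustration, and the function names are ours.

```python
from math import log


def aic(loglik, n_params):
    """Akaike information criterion; smaller is better."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; smaller is better."""
    return -2.0 * loglik + n_params * log(n_obs)


def select_family(candidates, criterion=aic):
    """Pick the family with the smallest criterion value.

    candidates maps a family name to (maximized log-likelihood, number of parameters).
    """
    return min(candidates, key=lambda name: criterion(*candidates[name]))


# Hypothetical fits of three families to the same pair of pseudo-observations.
fits = {"gaussian": (12.3, 1), "t": (13.0, 2), "clayton": (9.1, 1)}
print(select_family(fits))
print(select_family(fits, criterion=lambda ll, k: bic(ll, k, n_obs=500)))
```

In this example, the extra parameter of the t copula is not justified by its small gain in log-likelihood, so both criteria favor the Gaussian copula.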

Parameter Estimation for a Given Structure and Copula Families
The pair-copula construction is by definition a multivariate copula. Hence, the parameters of a given PCC may be estimated using any multivariate copula estimator, such as the inference function for margins (IFM) method [1,32] or the maximum pseudo likelihood (MPL) estimator [33,34]. However, the number of parameters of a PCC grows quickly with the dimension, meaning that in medium to high dimensions, these standard methods may be too demanding computationally. Therefore, Aas et al. [10] proposed a sequential method, in which the parameters are estimated level by level, conditioning on the parameters from the preceding levels of the structure. For more details, see [10], as well as [31], where the asymptotic properties of this approach are investigated. Further, [35] performs a comparison of the sequential approach and the standard copula estimators.
It should be noted that the above methods assume that the observations of each variable are independent over time. Hence, in the presence of temporal dependence, pair-copula constructions are usually fitted on standardized residuals obtained by filtering the original time series with ARIMA-GARCH models; see, e.g., [10] for an example.
Alternatives to the standard maximum likelihood estimators have been proposed. In [24,36,37], Bayesian techniques to select the pair-copula families for D-vines are covered, while [18,38–40] discuss vines with non-parametric pair-copulae. In practical applications, however, the use of the Bayesian and non-parametric approaches has been very limited.
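The filtering step can be sketched as follows, under simplifying assumptions of our own: a zero-mean GARCH(1,1) model with known (not estimated) parameters, started at its unconditional variance, followed by a rank transform to pseudo-observations on the unit interval. In practice, the ARIMA and GARCH parameters would be estimated first; all names and numbers here are hypothetical.

```python
from math import sqrt


def garch11_standardized_residuals(returns, omega, alpha, beta):
    """Divide each return by its GARCH(1,1) conditional standard deviation."""
    variance = omega / (1.0 - alpha - beta)  # unconditional variance as start value
    residuals = []
    for r in returns:
        residuals.append(r / sqrt(variance))
        variance = omega + alpha * r * r + beta * variance
    return residuals


def pseudo_observations(values):
    """Rank transform to (0, 1): rank / (n + 1), assuming no ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1.0)
    return u


# Hypothetical daily returns and GARCH parameters.
returns = [0.012, -0.020, 0.005, -0.001, 0.030]
z = garch11_standardized_residuals(returns, omega=1e-5, alpha=0.08, beta=0.90)
print(pseudo_observations(z))
```

The resulting pseudo-observations, one series per asset, are what the pair-copulae in the first tree would be fitted to.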

Time-Varying Models
An observation often reported by market professionals is that during major market events, correlations change dramatically. The possible existence of changes in the correlation or, more precisely, of changes in the dependency structure between assets has obvious implications for risk assessment and portfolio management. For instance, if all stocks tend to fall together as the market falls, the value of diversification may be overstated by those not taking the increase in downside correlations into account. Due to the challenge of high dimensionality in many applications, the parameters of the pair-copula constructions are usually assumed to be constant over time. There are, however, some exceptions. One direction of research uses parametric dependence models. The work in [41] builds time-varying models by combining the pair-copula constructions with stochastic autoregressive copula (SCAR) models to capture dependence that changes over time. More specifically, they utilize the fact that for all of the most well-known copula families, there exists a one-to-one relationship between the copula parameter and Kendall's τ, and let Kendall's τ for each bivariate copula be driven by a latent Gaussian AR(1)-process. A very similar approach is taken in [42], while So and Yeung [43] propose a model for the time-varying dependence (where the dependence measure may be either linear correlation, rank correlation or Kendall's τ), which is inspired by the Dynamic Conditional Correlation (DCC)-GARCH model of [44].
Another popular direction combines pair-copula constructions with regime-switching models. In essence, such models assume that a hidden underlying process, which may be understood as the state of the economy in financial applications, influences the development of a time series. The work in [45] estimates a regime-switching model for the dependence of the stock indices of the G5 and of four Latin American countries. They allow for two regimes, which are modeled by a canonical vine and a Gaussian copula, respectively, and assume that the unobserved latent state variable follows a Markov chain. Later, Stöber and Czado [46] extended this approach to a model with K regimes, where each regime is described by a different R-vine.
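The idea of a latent AR(1) process driving Kendall's τ can be sketched as below. The mapping of the latent state to τ through `tanh` is an assumption made here to keep τ inside (-1, 1) and is not taken from [41]; the conversions from τ to the copula parameter, $\rho = \sin(\pi\tau/2)$ for the Gaussian copula and $\theta = 2\tau/(1-\tau)$ for the Clayton copula, are the standard one-to-one relationships. Parameter values are hypothetical.

```python
import math
import random


def simulate_tau_path(n_steps, mu, phi, sigma, seed=0):
    """Simulate Kendall's tau driven by a latent Gaussian AR(1) process."""
    rng = random.Random(seed)
    state = mu
    taus = []
    for _ in range(n_steps):
        state = mu + phi * (state - mu) + sigma * rng.gauss(0.0, 1.0)
        taus.append(math.tanh(state))  # assumed link function, see lead-in
    return taus


def tau_to_gaussian_rho(tau):
    """Gaussian copula correlation implied by Kendall's tau."""
    return math.sin(math.pi * tau / 2.0)


def tau_to_clayton_theta(tau):
    """Clayton copula parameter implied by Kendall's tau (requires 0 < tau < 1)."""
    return 2.0 * tau / (1.0 - tau)


taus = simulate_tau_path(250, mu=0.3, phi=0.95, sigma=0.05)
rhos = [tau_to_gaussian_rho(t) for t in taus]
print(min(rhos), max(rhos))
```

Each pair-copula in the vine would receive its own latent process, so the number of latent states grows with the number of time-varying pairs.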

Pruning and Truncation
The flexibility of R-vines comes at the price of the number of pair-copulae, and hence of parameters, growing quadratically with the dimension. In high-dimensional applications, it is therefore necessary to reduce the number of parameters. One strategy is to identify as many pair-copulae as possible being equal to the independence copula, which amounts to specifying a series of conditional independencies. This may be done either by testing individual copulae for independence, so-called pruning, or by checking the contribution of all trees above a certain level, which is denoted truncation.

Pruning
Pruning a particular copula $C_{jk|D}$ in the R-vine structure is the same as stating that $X_j$ and $X_k$ are conditionally independent given $X_D$. Pruning may be performed using a copula goodness-of-fit test, e.g., the bivariate asymptotic test based on Kendall's τ [47]. However, such a test is, strictly speaking, not an independence test unless the copulae are Gaussian, since τ = 0 implies independence only for those copulae. Another option is therefore to use the Cramér-von Mises test proposed by Hobaek Haff and Segers [38].

Truncation
A truncated R-vine at level $K$ is an R-vine where all pair-copulae with a conditioning set of size equal to or larger than $K$ are replaced by independence copulae. If $K = 1$, the truncated R-vine becomes a Markov tree distribution that only models unconditional relationships. The density of an R-vine copula truncated at level $K$ is given by:

$$c(u) = \prod_{i=1}^{K} \prod_{e \in E_i} c_{j(e),k(e)|D(e)}\big(F(u_{j(e)}|u_{D(e)}), F(u_{k(e)}|u_{D(e)})\big),$$

where $u = (u_1, \ldots, u_d)$.
The use of truncated R-vines may be justified as follows. As stated in Section 3.1, the selection algorithm of [12] builds the structure from the bottom up, trying to maximize the dependence in the first trees. Hence, if this procedure is successful, the most important and strongest (conditional) dependencies among the variables are captured by the pair-copulae in the first trees. At high levels of the structure, the parameters quantify conditional dependence with a very large number of conditioning variables. The uncertainty of the estimated copula parameters is large because of the repeated transformations of the original data using estimated conditional distribution functions [35]. Moreover, the parameter estimates for the upper levels do not seem to affect the lower-order dependencies much. This indicates that it might be appropriate to truncate large structures after a certain level.
Several methods have been proposed for determining the optimal truncation level; see, e.g., [27,48,49]. In the approach by Brechmann et al. [48], one starts with $K = 1$ and fits the corresponding truncated R-vine (for $K = 0$, a pre-test of joint independence can be performed). $K$ is thereafter increased by one. If the gain from fitting the extra tree is negligible, one stops and uses the resulting specification. If not, one proceeds until one reaches a truncation level $K_0$, for which the contribution from an extra level is not significant. To assess whether the gain from fitting the extra tree is negligible, the likelihood ratio-based test proposed by Vuong [50] is used; see [48] for more details.
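The reduction in model size achieved by truncation is easy to quantify. Since tree $T_i$ of an R-vine on $d$ variables has $d - i$ edges, a vine truncated at level $K$ retains only the pair-copulae of the first $K$ trees. The worked example below uses $d = 50$ and $K = 3$, values chosen by us purely for illustration.

```latex
\[
  \underbrace{\sum_{i=1}^{K} (d - i)}_{\text{truncated at level } K}
  = \frac{K(2d - K - 1)}{2},
  \qquad
  \underbrace{\sum_{i=1}^{d-1} (d - i)}_{\text{full R-vine}}
  = \frac{d(d-1)}{2}.
\]
\[
  d = 50,\; K = 3:\qquad
  \frac{3\,(100 - 3 - 1)}{2} = 144
  \quad\text{versus}\quad
  \frac{50 \cdot 49}{2} = 1225 .
\]
```

With one-parameter families, the truncated model in this example therefore has roughly one eighth of the parameters of the full vine.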
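The stepwise search for the truncation level can be sketched as follows. This is a simplified illustration rather than the procedure of [48]: it uses the plain Vuong statistic $\sqrt{n}\,\bar{m}/s_m$ on the pointwise log-likelihood gains $m_i$ of each extra tree, without any correction for the number of parameters, and a one-sided 5% critical value; both choices, the function names and the numbers are our assumptions.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev


def vuong_statistic(pointwise_gain):
    """Standardized mean of per-observation log-likelihood differences."""
    n = len(pointwise_gain)
    return sqrt(n) * mean(pointwise_gain) / pstdev(pointwise_gain)


def choose_truncation_level(tree_gains, alpha=0.05):
    """Increase K while the next tree adds a significant likelihood gain.

    tree_gains[k] holds, for every observation, the summed log pair-copula
    densities of tree k + 1 (i.e., what that tree adds to the log-likelihood).
    """
    critical = NormalDist().inv_cdf(1.0 - alpha)
    level = 1
    for gains in tree_gains[1:]:
        if vuong_statistic(gains) <= critical:
            break  # extra tree not significant: stop at the current level
        level += 1
    return level


# Hypothetical per-observation contributions of three trees.
gains = [
    [0.50, 0.60, 0.40, 0.55, 0.45, 0.50, 0.52, 0.48],
    [0.20, 0.25, 0.15, 0.22, 0.18, 0.20, 0.21, 0.19],
    [0.01, -0.01, 0.02, -0.02, 0.00, 0.01, -0.01, 0.00],
]
print(choose_truncation_level(gains))
```

Here, the second tree adds a clear gain while the third does not, so the search stops at level 2.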

Model Validation
To evaluate whether a copula or copula construction appropriately fits the data at hand, goodness-of-fit (GOF) testing is called upon. In [10], a goodness-of-fit test based on the probability integral transform (PIT) of [51] and a transformation introduced by Breymann et al. [52] was suggested for vine copulae, but neither studied further nor tested. The work in [53] applied two approaches based on the empirical copula and Kendall's process, which were originally proposed for standard multivariate copulae [54,55]. After these early attempts, goodness-of-fit testing for vines was not treated until two new tests arising from the information matrix equality and the specification test of [56] were introduced by Schepsmeier [57,58]. The first test is an extension of the bivariate GOF test of [59], while the second is inspired by the work of [60]. An extensive simulation study in a high-dimensional setting shows that the two new tests have excellent performance with respect to size and power.

Financial Applications
In this section, we briefly review some of the financial applications of pair-copula constructions, divided into broad groups according to the nature of the application.

Market Risk
The main application area of pair-copula constructions in finance has been the assessment of market risk. Market risk is usually measured by value-at-risk (VaR) or conditional value-at-risk (cVaR). Both of these measures are designed to quantify the risk of large losses, leading to a demand for flexible dependency models like the pair-copula constructions. In the seminal paper [10], the PCC was used to model the dependency structure of a portfolio consisting of two stock and two bond return indices. Since then, these constructions have been used for equities [48,53,61–63], interest rates [48,64,65], exchange rates [61,66–69], electricity prices and other commodities [61,70–74] and housing prices [75]. In most of these studies, the PCC shows excellent performance compared to alternative dependency models.
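Once portfolio losses have been simulated from a fitted dependency model, both risk measures can be estimated empirically. The sketch below assumes a plain list of simulated losses (positive values are losses) and uses simple order-statistic estimators; the scenario values and function names are hypothetical, and other quantile conventions are equally common.

```python
from math import ceil


def value_at_risk(losses, level=0.95):
    """Empirical VaR: the smallest loss with at least `level` of scenarios at or below it."""
    ordered = sorted(losses)
    index = ceil(level * len(ordered)) - 1
    return ordered[index]


def conditional_value_at_risk(losses, level=0.95):
    """Empirical cVaR: the average of all losses at or beyond the VaR."""
    var = value_at_risk(losses, level)
    tail = [x for x in losses if x >= var]
    return sum(tail) / len(tail)


# Hypothetical simulated portfolio losses (in percent of portfolio value).
scenarios = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(value_at_risk(scenarios, level=0.75))
print(conditional_value_at_risk(scenarios, level=0.75))
```

Because cVaR averages over the tail, it is more sensitive than VaR to how the dependency model describes joint extreme losses.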

Capital Asset Pricing
Traditionally, assets have been valued using the classical capital asset pricing model (CAPM) [76,77]. This model assumes that asset returns follow a multivariate normal distribution. Today, it is a well-known fact that returns in financial markets do not follow a normal distribution. Moreover, their dependency structure exhibits features such as tail dependence and asymmetry. Addressing both of these issues, Heinen and Valdesogo [78] developed an extension of the CAPM, which can capture the non-linear and non-Gaussian behavior of the cross-section of asset returns, as well as model their dependencies to the market and the respective sector. Their model is based on two major building blocks: marginal GARCH models and a canonical vine structure. It is therefore denoted the canonical vine autoregressive (CAVA) model.
Later, Brechmann [79] extended the CAVA model to the more general structure of R-vines, resulting in the regular vine market sector (RVMS) model. In an extensive application to European stock market returns, the authors demonstrate the superior performance of this model in comparison to relevant benchmark models, among them the classical CAPM and the CAVA model.

Credit Risk
From a methodological viewpoint, a misconception of credit risk was a core reason for the financial crisis of 2008 [80]. Modeling the correlation structure of a credit portfolio has traditionally been based on the Gaussian copula, and this has received much criticism, even in a non-academic context [81]. The works in [82–84] show how vine copulae can be used to derive a more accurate and reliable estimate of the economic capital of a loan portfolio.
In [85], the pair-copula construction is used for a different credit risk application. The focus of this paper is to determine the probability of default (PD) for firms. They consider a contingent claim model based on balance sheet data, where the dynamics of the equity are modeled via a D-vine.

Operational Risk
Operational risk data, when available, are usually scarce, heavy-tailed and possibly dependent. In [86,87], the aims are to model operational loss severities and frequencies, respectively, using pair-copula constructions. Empirical results on real-world data show that such flexible explicit dependence modeling might have a significant impact on the risk capital, leading to a clear diversification benefit compared to the standard Basel comonotonicity assumption.

Liquidity Risk
Liquidity risk is of major concern to both investors and portfolio managers. Studies on the commonality in liquidity have revealed clear empirical evidence for strong comovements in the bid-ask spreads of individual stocks; see, e.g., [88]. To account for non-linear dependence between bid-ask spreads across firms, Weiß and Supper [62] proposed a model based on a D-vine to forecast liquidity-adjusted risk measures for a multivariate stock portfolio. The model is estimated from intraday bid-ask spreads and stock returns from NASDAQ, and the authors show that neglecting the non-linearities in the dependence between returns and bid-ask spreads in the forecasting of portfolio VaR may lead to a severe underestimation of losses on the portfolio.

Systemic Risk
The Financial Stability Board defines systemic risk as "the risk of disruption to financial services that is (i) caused by an impairment of all or parts of the financial system; and (ii) has the potential to have serious negative consequences for the real economy" [89]. The systemic relevance of an institution may be defined as the potential impact of its failure on other institutions. Hence, when measuring systemic risk, it is crucial to take the interdependence between institutions and markets into account. There might be considerably different relationships among institutions depending on industry sector and geographical region. Moreover, it is often observed that in times of crisis, the dependence of joint negative events increases. Such heterogeneous dependencies cannot be appropriately captured by standard copulae, but they may be accounted for using a pair-copula construction. Hence, PCCs have been used in several studies treating systemic risk. The work in [90] analyzes the interdependence among 38 major financial institutions from all over the world using credit default swaps (CDS). The interdependence among financial institutions is also the subject of [91]. The work in [92] studies the dependency between different Eurozone financial markets, while Abbara and Zevallos [93] investigate linkages and contagion among important stock markets in Latin America. In [94], the effects of the increased interdependence between international stock markets on the probability of global crashes are examined, and Reboredo and Ugolini [95] investigate the systemic sovereign debt distress affecting European financial systems. Finally, the dependency between sovereign spreads of the European countries against Germany is studied in [96].

Portfolio Optimization
Portfolio optimization has come a long way from the seminal work of [97]. After it was demonstrated by Rockafellar [98] that linear programming techniques can be used for cVaR optimization, this approach has become quite popular. The cVaR optimization usually takes scenarios as input. Hence, the returns of the instruments constituting the portfolio may in principle have any multivariate distribution. In [99], individual asset returns are assumed to be distributed according to the skewed Student t-distribution of [100], while a canonical vine copula is used to model their dependency structure. This model is found to produce the highest-ranked outcomes across a range of statistical and economic metrics when compared to other models incorporating elliptical or symmetric dependence structures. Other papers that show the usefulness of pair-copula constructions for portfolio optimization are [101–103].

Option Pricing
Multivariate options are widely used when there is a need to hedge against a number of risks simultaneously. An example of such a derivative is a basket option. For such a derivative, the payoff depends on the value of a basket of assets instead of a single stock. The principal reason for using basket options is that they are usually cheaper to use for portfolio insurance than a corresponding portfolio of plain vanilla options. This is due to the correlation structure between the assets.
The pricing of basket options is a non-trivial task, as there is no analytic expression for the distribution of the weighted sum of the underlying assets in the basket. The most straightforward extension of the univariate Black and Scholes model is the Gaussian copula model, also called the multivariate Black and Scholes model. Several authors have, however, shown that calibrating the Gaussian copula model to market data may lead to non-meaningful parameter values, especially in distressed periods. Hence, the joint dynamics of a number of stocks should be modeled in a more realistic way. In [104], the dependence among the assets in a basket option is modeled using pair-copula constructions. The authors show that the choice of dependency structure has a significant effect on the option price and that using Gaussian or t-copulae may underprice the basket option.

Conclusions
In this survey, we have reviewed the literature on the use of pair-copula construction-based models in financial applications. We have discussed different inference methods and model validation approaches for PCCs, and a brief survey of the many applications of pair-copula constructions in the economics and finance literature is provided.

Figure 3. A canonical vine with 5 variables, 4 trees and 10 edges. Each edge may be associated with a pair-copula.