Next Article in Journal / Special Issue
Bayesian Inference for Latent Factor Copulas and Application to Financial Risk Forecasting
Previous Article in Journal
Maximum Likelihood Estimation of the I(2) Model under Linear Restrictions
Previous Article in Special Issue
Goodness-of-Fit Tests for Copulas of Multivariate Time Series
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Copula-Based Factor Models for Multivariate Asset Returns

1
Department of Economics, University of Augsburg, Universitätsstr. 16, 86159 Augsburg, Germany
2
Department of Mathematics, Technical University of Munich, Boltzmannstr. 3, 85748 Garching, Germany
*
Author to whom correspondence should be addressed.
Econometrics 2017, 5(2), 20; https://doi.org/10.3390/econometrics5020020
Submission received: 29 September 2016 / Revised: 3 May 2017 / Accepted: 3 May 2017 / Published: 17 May 2017
(This article belongs to the Special Issue Recent Developments in Copula Models)

Abstract

:
Recently, several copula-based approaches have been proposed for modeling stationary multivariate time series. All of them are based on vine copulas, and they differ in the choice of the regular vine structure. In this article, we consider a copula autoregressive (COPAR) approach to model the dependence of unobserved multivariate factors resulting from two dynamic factor models. However, the proposed methodology is general and applicable to several factor models as well as to other copula models for stationary multivariate time series. An empirical study illustrates the forecasting superiority of our approach for constructing an optimal portfolio of U.S. industrial stocks in the mean-variance framework.

1. Introduction

It took almost four decades before the statistical usefulness and attractiveness of copulas was widely recognized after the seminal papers by Frees and Valdez (1998), Li (2000), and Embrechts et al. (2002). Copulas are now a standard tool for modeling a dependence structure of multivariate iid data in applied science. The foundation of the copula theory was laid by the famous Sklar’s theorem (see Sklar 1959), which states that any multivariate distribution can be represented through its copula and marginal distributions. If marginal distributions are continuous, then the copula of a multivariate distribution is unique. This approach is particularly flexible, since margins and the dependence structure—which is dictated by the copula—can be modeled independently.
To our knowledge, Darsow et al. (1992) initiated a theoretical application of copulas to specify univariate Markov processes of first order. Thus, conditional independence can be stated in terms of copulas, and this results in a copula counterpart of the Chapman–Kolmogorov equations for the transition probabilities of a Markov process. Ibragimov (2009) generalized their approach for univariate Markov processes of higher order as well as for non-Markov processes. Furthermore, he introduced new classes of copulas for modeling univariate time series. Estimation of copula-based stationary time series models can still be pursued in the classical framework as for iid data. For example, Chen and Fan (2006) investigated theoretical aspects of the two-step estimation when marginal distributions are fitted non-parametrically in the first step and copula parameters are then estimated by maximum likelihood.
The first non-Gaussian VAR models were introduced by Biller and Nelson (2003), where smartly chosen Gaussian VAR time series were transformed to achieve desired autocorrelation structure and marginal distributions. Recently, Brechmann and Czado (2015), Beare and Seo (2015), and Smith (2015) simultaneously developed copula-based models for stationary multivariate time series. Although these models differ from each other, their generality consists of an underlying R-vine pair-copula construction (see Aas et al. 2009) to describe the cross-sectional and temporal dependence jointly. To capture the cross-sectional dependence, Brechmann and Czado (2015) employ C-Vine, while Beare and Seo (2015) and Smith (2015) consider D-vine. Further, Brechmann and Czado (2015) and Beare and Seo (2015) assume the existence of a key variable, whose temporal dependence was explicitly modeled, and this assumption combined with C- or D-vine for the cross-sectional dependence results in a corresponding R-vine for a multivariate time series. In contrast, Smith (2015) explicitly modeled the temporal dependence between multivariate observations and constructed a general D-vine for them consisting of D-vines for the cross-sectional dependence.
In the time of big data, factor models offer an elegant solution to describe a high-dimensional panel data with a few unobservable (latent) variables, called factors. The idea behind factor models is that observable variables are driven by two orthogonal, hidden processes: one captures their co-movements and arises from a linear combination of the latent factors, whereas the other covers their individual nature in the form of idiosyncratic shocks. The dimension of the observed data usually significantly exceeds the number of factors, and so a reduction in dimension takes place.
In the seminal works of Stock and Watson (1999, 2002a, 2002b), factor models supported the forecasting of univariate time series. It was shown for large panel data that the unobserved factors can be consistently estimated, and this served for a consistent forecasting framework. In particular, Stock and Watson (2002a) illustrated that the forecasting of several macroeconomic variables based on factor models can outperform those obtained from competing models such as autoregression (AR), VAR, and leading indicators. In the literature, factor models are classified as static or dynamic with respect to the stochastic dynamics of the unobserved factors. Static factor models suppose iid normally distributed factors, while dynamic factor models assume that the factor obeys a VAR model of order p 1 .
In this paper, we apply the copula autoregressive (COPAR) model of Brechmann and Czado (2015) to quantify the dependence structure of estimated unobserved factors in dynamic factor models. More precisely, we consider two dynamic factor models and estimate them separately with the maximum likelihood by employing the Kalman filter and smoother. The estimated factors are then combined with a COPAR model, from which latent factors are simulated for a forecasting purpose. Thus, our approach allows several estimated dynamic factor models to be coupled with a copula and admits a non-Gaussian dependence structure of simulated latent factors. To gain information from the estimated factors, a forecasted variable of each market is regressed on the corresponding simulated factors and its previous value.
It should be noted that our modeling approach is different from the factor copula modeling of Krupskii and Joe (2013) and Oh and Patton (2017). We first estimate unobserved factors and then fit copula for them as for observable data. With factor models, we reduce the data dimension, and using autoregressive structure of factors we decrease the number of copula parameters essentially. In contrast, Krupskii and Joe (2013) and Oh and Patton (2017) treat iid data and reduce the dimension of copula parameters by considering conditional independence with respect to unobserved factors. Numerically integrating the unobserved factors, a copula with low dimensional parameters for observed iid data is obtained. For multivariate time series, Oh and Patton (2016) extended the factor copula models with time-varying parametric copulas.
In the empirical application, we consider monthly U.S. financial and macroeconomic panel data to filter driving factors later employed for a mean-variance portfolio optimization. Our main contribution is a new method to improve portfolio performance using factor predictions sampled from the COPAR model. In contrast to dynamic factor models, we explicitly allow for non-Gaussian cross-sectional and temporal dependence between factors. The forecasted factors with non-linear dependence structure are used to assess the future variability of multivariate asset returns. This allows us to construct an optimal portfolio in the mean-variance framework. For comparison, three benchmark portfolios are constructed using dynamic factor models as well as empirical moments of observed asset returns. Thus, our optimal portfolio outperforms the benchmark portfolios according to several risk-adjusted performance measures.
The rest of the paper is organized as follows. Section 2 outlines dynamic factor models and their estimation in the maximum likelihood framework. Section 3 briefly considers vine copulas. Section 4 reviews COPAR models and discusses an algorithm to extend COPAR(1) model for another multivariate observation. Section 5 presents our proposed methodology for an optimal asset allocation of 35 industrial stocks from S&P500 listed in Appendix A and compares it with three benchmark portfolios. Finally, we conclude and discuss further research. Appendix B gives an overview on the considered monthly panel data. Appendix C presents bivariate pair-copulas considered in a model selection procedure for cross-sectional dependence. Appendices Appendix D and Appendix E contain detailed numerical results for portfolio comparisons. Appendix F summarizes testing results of Granger causality between estimated factors.

2. Factor Models

In our application, we deal only with dynamic factor models (DFMs). Further, we restrict our exposition to the simplest factor dynamics of order 1, since any VAR ( p ) can be written in VAR ( 1 ) form (see Lütkepohl 2005, p. 15).
Definition 1. 
(Dynamic Factor Model) For any point in time t, let X t R m be a stationary vector process with zero mean. Let F t R q , q m , denote the multivariate factor at time t, and let the vector ε t R m collect all idiosyncratic shocks. Then, a dynamic factor model of order 1 is given by
X t = Λ F t + ε t , F t = A F t 1 + u t ,
with constant matrices Λ R m × q and A R q × q . The idiosyncratic shocks ε t are iid Gaussian with zero mean and covariance matrix R R m × m and the error vectors u t R q are iid Gaussian with zero mean and covariance matrix Q R q × q .
Since X t is a stationary process, Definition 1 implicitly assumes that unobserved factor process F t is also stationary. The stationarity of F t can be ensured if the roots of the characteristic polynomial I q A z lie outside of the complex unit circle. In this case, the moving average representation for F t yields its stationary q dimensional zero-mean Gaussian distribution (see Lütkepohl 2005, pp. 18–21) given by
N q ( 0 , Σ F ) with Σ F = i = 1 A i Q A i .
For known or estimated Q and A, the factors can be drawn from (1) by truncating the infinite sum for a pre-specified error tolerance of 10 5 for all entries of Σ F .
Parameters Λ , A , R , and Q of the dynamic factor model in Definition 1 can be estimated in the maximum likelihood framework with Expectation-Maximization Algorithm (EM-Algorithm) of Dempster et al. (1977). This was first done by Shumway and Stoffer (1982) and Watson and Engle (1983), though Shumway and Stoffer (1982) assumed a known Λ , and Watson and Engle (1983) did not directly maximize the log-likelihood of dynamic factor models. Recently, Bork (2009) and Bańbura and Modugno (2014) derived the EM-Algorithm for the dynamic factor models in Definition 1, on which we rely in our empirical application. Note that the convergence properties of the EM-Algorithm has been theoretically shown for an exponential family by Wu (1983).
For the convenience of the reader, we outline the estimation procedure of Bork (2009) and Bańbura and Modugno (2014). and refer to the original works for further details. Ignoring the unobservability of the factors, the log-likelihood function of the model in Definition 1 for a data sample of length T can be derived by iterative conditioning on observations (e.g., in Bork 2009, p. 45 or Bańbura and Modugno 2014, p. 156). However, the factors F t are unobservable, and therefore the log-likelihood is integrated out with respect to the factor distribution. This results in the expected log-likelihood conditioned on the observed panel data, which constitutes the expectation step of the EM-Algorithm. In contrast to the unconditioned log-likelihood, here factors are replaced by their corresponding conditional moments of first and second order, which a single run of the Kalman filter and smoother given in (Bork 2009, p.43) can provide.
In the maximization step of the EM-Algorithm, Bork (2009) and Bańbura and Modugno (2014) treat the conditional factor moments as constants, when the partial derivatives of the conditional expectation of the log-likelihood with respect to the model parameters are computed. Next, they search for the zeros of the arising system of linear equations to determine the maximum of the expected log-likelihood function. The iterative parameter updates of the EM-Algorithm from Bork (2009) and Bańbura and Modugno (2014) is summarized in Theorem 1.
Theorem 1. 
(EM-Algorithm as in Bork (2009) and Bańbura and Modugno (2014)) Assume the dynamic factor model in Definition 1 and let the matrix X = X 1 , , X T R m × T collect all panel data. Let the index l 0 indicate the current loop of the EM-Algorithm and let E · | X , θ l denote the expectation conditioned on the panel data and the parameters estimated in loop l. Then, the parameter updates in loop ( l + 1 ) are given as follows:
Λ l + 1 = t = 1 T X t E F t | X , θ l t = 1 T E F t F t | X , θ l 1 , A l + 1 = t = 1 T E F t F t 1 | X , θ l t = 1 T E F t 1 F t 1 | X , θ l 1 , R l + 1 = 1 T t = 1 T X t X t Λ l + 1 t = 1 T E F t | X , θ l X t , Q l + 1 = 1 T t = 1 T E F t F t | X , θ l A l + 1 t = 1 T E F t 1 F t | X , θ l ,
where Z 1 stands for the inverse matrix and conditional factor moments are computed using the Kalman filter and smoother for each loop l.
The iterative estimation procedure in Theorem 1 requires a termination criterion. In our application, we terminate the above EM-Algorithm as soon as the change in log-likelihood is smaller than 10 8 . Finally, note that the estimated factors are unique up to rotation with orthogonal matrices. For forecasting purposes, one can ignore this fact, since estimated parameters of factors in forecasting equations will then be transformed correspondingly with no effect on forecasting variable. We illustrate this point later in our application.

3. Vine Copulas

Since the statistical applicability of vine copulas with non-Gaussian building bivariate copulas was recognized by Aas et al. (2009), vine copulas became a standard tool to describe the dependence structure of multivariate data (see Aas 2016 for a recent review). Moreover, Brechmann and Czado (2015), Beare and Seo (2015), and Smith (2015) have applied vine copulas to model temporal dependence of multivariate time series as well as the cross-sectional dependence between univariate time series. In this section, we review the COPAR model of Brechmann and Czado (2015), which is used to describe the stochastic dynamics and the dependence structure of the estimated factors from Section 2. We start with the concept of regular vines from Kurowicka and Cooke (2006). We do not consider illustrating examples on pair-copula constructions, and refer instead to Aas (2016) or Czado (2010) for more intuition on them.
Definition 2. 
(Regular vine) A collection of trees V = T 1 , , T d 1 is a regular vine on d elements if
1. 
T 1 is a tree with nodes N 1 = 1 , , d connected by a set of non-looping edges E 1 .
2. 
For i = 2 , , d 1 , T i is a connected tree with edge set E i and node set N i = E i 1 , where N i = d i 1 and E i = d i are the number of edges and nodes, respectively.
3. 
For i = 2 , , d 1 , e = a , b E i : a b = 1 (two nodes a , b N i are connected by an edge e in T i if the corresponding edges a and b in T i 1 share one node (proximity condition)).
A tree T = N , E is an acyclic graph, where N is its set of nodes and E is its set of edges. Acyclic means that there exits no path such that it cycles. In a connected tree we can reach each node from all other nodes on this tree. R-vine is simply a sequence of connected trees such that the edges of T i are the nodes of T i + 1 . A traditional example of these structures are canonical vines (C-vines) and drawable vines (D-vines) (see Czado (2010) and Aas et al. (2009)) in Figure 1. Every tree of a C-vine is defined by a root node, which has d i incoming edges, in each tree T i , i 1 , , d 1 , whereas a D-vine is solely defined through its first tree, where each node has at most two incoming edges.
Regular vines are a powerful tool to systemize all possible factorizations of a d dimensional density as a product of d univariate marginal densities with a product of d ( d 1 ) / 2 conditional and unconditional bivariate copulas (see Theorem 4.2 in Kurowicka and Cooke (2006)). Thus, the unconditional and conditional bivariate copulas—called pair-copulas—of a given factorization can be uniquely mapped onto the edge set E of a particular regular vine, and vice versa. Then, the conditional copulas and their arguments depend on conditioned values. The dependence of conditional copulas on conditioned values is crucial, and allows for statistical applications only for a subclass of elliptical distributions (see Stöber et al. (2013)), since in this case, conditioned pair-copulas depend on conditioned values only through their arguments. Aas et al. (2009) first developed this observation further and went beyond the elliptical world. Thus, they considered regular vine factorizations with arbitrary fixed conditional copulas and showed that this results in valid multivariate distributions and copulas. In this paper, we also assume that conditional copulas depend on conditioned values only through their arguments, and so they can be chosen from bivariate copula families.
The number of possible vine structures on d random variables can be immense. For d 4 , only C- and D-vines are possible. For d > 4 , there are d ! 2 different C- and D-vines. The total amount of regular vine structures has been computed by Morales-Nápoles et al. (2010), and is equal to d ! 2 · 2 d 2 2 . To select conditional copulas on these graphical structures, we define the following sets as in Czado (2010).
Definition 3. 
(Conditioned and conditioning sets) For any edge e = a , b E i of a regular vine V , the complete union of e is the subset
A e = v N 1 : m = 1 , , i 1 , e j m E m s . t . v e j 1 e j i 1 e .
The conditioning set associated with e is
D e = A a A b .
The conditioned sets associated with e are
i ( e ) = A a \ D e j ( e ) = A b \ D e .
The copula for this edge will be denoted by
C e C i e , j e D e .
Given a regular vine, we specify a regular vine copula by assigning a (conditional) pair copula (with parameters) to each edge of the regular vine. In doing so, we follow Czado et al. (2012).
Definition 4. 
(Regular vine copula). A regular vine copula C = V , B V , θ B V in d dimensions is a multivariate distribution function such that for a random vector U = U 1 , , U d C with uniform margins
  • V is a regular vine on d elements,
  • B V = C e e E m , m = 1 , , d 1 is a set of d d 1 / 2 copula families identifying the unconditional distributions of U i e , j e as well as the conditional distributions of U i e , j e U D e ,
  • θ B V = θ e e E m , m = 1 , , d 1 is the set of parameter vectors corresponding to the copulas in B V .
To facilitate statistical inference, a matrix representation of R-vines was proposed by Morales-Nápoles et al. (2010) and further developed by Dissmann et al. (2013). To specify a d-dimensional R-vine in matrix form, one needs several lower triangular d × d matrices: one that stores the structure of the R-vine, one with copula families, and another two with the first and second parameters.
For a d-dimensional R-vine, the matrix with the structure has the following form
M = m 1 , 1 m d 1 , 1 m d 1 , d 1 m d , 1 m d , d 1 m d , d ,
where m i , j 1 , , d . The rules for reading from this matrix are as follows. The conditioned set for an entry m i , j is the entry itself and the diagonal entry of the column m j , j , whereas the conditioning set is composed of variables under the entry; i.e., for m i , j , the conditioned set will be m i , j , m j , j , the conditioning set is m i + 1 , j , , m d , j . Thus, m i , j denotes the node m j , j , m i , j m i + 1 , j , , m d , j . We will assume that the diagonal of M is sorted in descending order, which can always be achieved by reordering the node labels, so that we have m i , i = n i + 1 . To illustrate the R-vine matrix notation, we consider the C-vine from Figure 1 and give below its R-vine matrix representation.
5 4 4 3 3 3 2 2 2 2 1 1 1 1 1
The first column encodes the following nodes
  • Node 54 321 is saved through m 1 , 1 , m 2 , 1 given m 3 , 1 , m 4 , 1 and m 5 , 1 ;
  • Node 53 21 is saved through m 1 , 1 , m 3 , 1 given m 4 , 1 and m 5 , 1 ;
  • Node 52 1 is saved through m 1 , 1 , m 4 , 1 given m 5 , 1 ;
  • Node 51 is saved through m 1 , 1 , m 5 , 1 .
In the sequel, we utilize C-vines to capture the cross-sectional dependence of a multivariate time series at any time point t. To capture the dependence between multivariate observations, the first tree of the C-vine for multivariate observation at time point t is connected to the first trees of the C-vines for existing neighboring multivariate observations at time points t 1 and t + 1 with one edge, correspondingly. This results in the first tree of an R-vine for all multivariate observations treated as one huge sample point. Depending on the choice of C-vines and the connection of the first trees, the copula autoregressive model of Brechmann and Czado (2015) from the next section can be obtained.
Finally, note that the information on copula families and their parameters is similarly stored in lower triangular d × d matrices. Each element of the R-vine matrix below the diagonal specifies a conditional or unconditional pair-copula depending on the diagonal entry above it. The family and parameters of this pair-copula are now entered at the same entry place of matrices for copula families and parameters. Since the diagonal entries of the R-vine matrix alone do not determine any pair-copulas, no entries for copula family and parameters matrices are needed on the diagonal. To avoid confusion with space character, we fill the main diagonal with * sign.

4. Copula Autoregressive Model

R-vines have been mostly used to model contemporaneous dependence. We now present a special R-vine structure called copula autoregressive (COPAR) model from Brechmann and Czado (2015) which is designed to capture cross-sectional, serial, and cross-serial dependence in multivariate time series, and allows a general Markovian structure. Let F t , G t t = 1 , , T be an observable bivariate stationary time series. To illustrate how the two individual time series are interdependent, consider the mapping of dependencies for T = 4 in Figure 2. Vertical solid lines represent the cross-sectional dependence, horizontal solid lines and curved lines represent the serial dependence for each time series, and dotted and dashed lines represent the cross-serial dependence.
Traditionally, R-vines were used to model only the cross-sectional dependence (pictured by the vertical solid lines in Figure 2), but under the assumption that other dependencies are absent (i.e., data is iid). The COPAR model is designed to additionally capture serial and cross-serial dependence. The following definition of a COPAR model without Markovian structure for two time series adopted to our notation is taken from Brechmann and Czado (2015). The vectors F s , , F t and G s , , G t are denoted as F s : t and G s : t , respectively.
Definition 5. 
(COPAR model for the bivariate case) The COPAR model for stationary continuous time series F t t = 1 , , T and G t t = 1 , , T has the following components.
(i) 
Unconditional marginal distributions of each time series are independent of time.
(ii) 
An R-vine for the serial and between-series dependence of F t t = 1 , , T and G t t = 1 , , T , where the following pairs are selected.
1. 
Serial dependence of F t t = 1 , , T : The pairs of serial D-vine copula for F 1 , , F T ; that is,
F s , F t F s + 1 : t 1 , 1 s < t T .
2. 
Between-series dependence:
F s , G t F s + 1 : t , 1 s t T ,
and
G s , F t F 1 : t 1 , G s + 1 : t 1 , 1 s < t T .
3. 
Conditional serial dependence of G t t = 1 , , T : The pairs of a serial D-vine copula for G 1 , , G T conditioned on all previous values of F t t = 1 , , T ; that is,
G s , G t F 1 : t , G s + 1 : t 1 , 1 s < t T .
Pair copulas of the same lag length t s , t s , are identical. We associate
1. 
copula C t s F C F s , F t F s + 1 : t 1 with (2),
2. 
copulas C t s F G C F s , G t F s + 1 : t and C t s G F C G s , F t F 1 : t 1 , G s + 1 : t 1 with (3) and (4), respectively, and
3. 
copula C t s G C G s , G t F 1 : t , G s + 1 : t 1 with (5).
Let us explain the above definition on an example of a bivariate time series F t , G t t = 1 , , 4 with four observations. Using a D-vine, Equation (2) captures the dependence structure of F 1 , F 2 , F 3 , and F 4 expressed by the top lines connecting them in Figure 2. For s = 1 and t = 4 , Equation (3) describes the conditional dependence between F 1 and G 4 conditioned on F 2 , F 3 , and F 4 . Thus, Equation (3) captures the conditional dependence between F s and G t for s < t reflected by the dashed lines in Figure 2. For s = t , Equation (3) models the unconditional dependence between F t and G t expressed by the vertical lines. Similarly, Equation (4) describes the conditional serial dependence between univariate time series illustrated by the dotted lines in Figure 2. However, Equations (3) and (4) are not symmetric with respect to conditioned sets. In particular, the conditional distribution of F s and G t for s < t is independent of G 1 , , G t 1 . This already indicates that the first time series F t t = 1 , , T plays a key role in the stochastic dynamics of the time series, since it is sufficient to describe the conditional dependence between F s and G t for s < t . Therefore, the univariate time series are not interchangeable. Using a D-vine, Equation (5) finally describes the dependence structure of G 1 , G 2 , , G T (connecting bottom lines in Figure 2) conditioned on F 1 , F 2 , , F T . Due to the property that copulas of same lag length are identical, the COPAR model defines a stationary bivariate time series.
Now, we expand the COPAR model to an arbitrary number of variables with Markov structure. We consider q univariate time series observed at T time points; that is, F 1 , t t = 1 , , T , , F q , t t = 1 , , T . We denote a random vector of q variables observed at times t = 1 , , T as F t = F 1 , t , , F q , t , time series of individual variables for i = 1 , , q as F i = F i , 1 , , F i , T , and also introduce a vector F l : q , s : t = F l , s , , F l , t , , F q , s , , F q , t . Finally, we define a COPAR k model, which is COPAR model for a multivariate time series with a Markovian structure of the k-th order. Since unobservable factors in Section 2 are also denoted by F t , our notation above should not lead to a confusion. Our modeling approach utilizes estimated factors even if they are not observed. Therefore, for reader convenience, we denote here multivariate time series with F t and present COPAR models in terms of F t .
Definition 6. 
(COPAR k model for q time series of length T) The COPAR k model for a q-dimensional stationary time series F t R q × 1 , t = 1 , , T has the following components.
(i) 
Unconditional marginal distributions of each time series are independent of time.
(ii) 
An R-vine for the serial and between-series dependence of F t R q × 1 , t = 1 , , T , where the following pairs are selected.
1. 
Serial dependence of F 1 : The pairs of serial D-vine copula for F 1 , 1 , , F 1 , T ; that is,
F 1 , s , F 1 , t F 1 , s + 1 : t 1 , 1 s < t T .
2. 
Between-series dependence of F i and F j for i < j , i , j = 1 , , q :
F i , s , F j , t F 1 : i 1 , 1 : t , F i , s + 1 : t 1 s t T ,
and
F j , s , F i , t F 1 : i 1 , 1 : t , F i : j 1 , 1 : t 1 , F j , s + 1 : t 1 1 s < t T ,
3. 
Conditional serial dependence of F i for 2 i q :
F i , s , F i , t F 1 : i 1 , 1 : t , F i , s + 1 : t 1 , 1 s < t T ,
whereas
(a) 
copulas i and for lag length t s > k are independent copulas,
(b) 
copulas for i , j = 1 , , q for the same lag length t s , t s are identical.
Definition 6 introduces the COPAR model, if a) is neglected. The dependence of F i is modeled conditioned on F 1 , , F i 1 , and consequently, the order of variables cannot be simply interchanged. The number of pair copulas in COPAR models for time series with Markovian structure does not depend on T, and is less than the number needed for a general R-vine. With respect to the number of pair-copulas, the following result from Brechmann and Czado (2015) holds.
Lemma 1.
The number of copulas needed for a COPAR k model of q univariate time series is q 2 k + q q 1 2 .
Note, the number of parameters in a VAR k for the between-series dependence of q time series—not counting parameters for marginal distributions—is equal to the number of pair-copulas in a COPAR k model. Therefore, the number of parameters in a COPAR k model is bounded by a multiple of the number of VAR parameters, depending on the number of parameters of the involved copula families. In contrast, a general R-vine requires q T q T 1 2 pair-copulas, resulting in a huge number of parameters.
Figure 3 illustrates the first 4 trees of a COPAR model for three univariate time series F t , G t , and H t with T = 4 (i.e., for four observations). Examining the first tree, we observe that it is a sequence of connected C-vines, where the central nodes are the time points of the first factor F t . Thus, unconditional contemporaneous dependence is modeled with a C-vine.
The matrix representation of the R-vine structure of this COPAR model is given by
H 4 H 1 G 4 H 2 H 1 F 4 H 3 H 2 H 1 H 3 G 1 H 3 H 2 H 1 G 3 G 2 G 1 H 3 H 2 H 1 F 3 G 3 G 2 G 1 G 1 H 2 H 1 H 2 G 4 G 3 G 2 G 2 G 1 H 2 H 1 G 2 F 1 F 1 G 3 G 3 G 2 G 1 G 1 H 1 F 2 F 2 F 2 F 1 F 1 F 1 G 2 G 2 G 1 H 1 H 1 F 3 F 3 F 2 F 2 F 2 F 1 F 1 F 1 G 1 G 1 G 1 F 4 F 4 F 3 F 3 F 3 F 2 F 2 F 2 F 1 F 1 F 1 F 1
and the matrix of copulas for three-dimensional COPAR model is given by
* C 3 H * C 2 H C 3 H G * C 1 H C 2 H G C 3 H F * C 3 G H C 1 H G C 2 H F C 2 H * C 2 G H C 3 G C 1 H F C 1 H C 2 G H * C 1 G H C 2 G C 3 G F C 2 G H C 1 G H C 2 H F * C G H C 1 G C 2 G F C 1 G H C 2 G C 1 H F C 1 H * C 3 F H C 3 F G C 1 G F C G H C 1 G C 2 G F C 1 G H C 1 G H * C 2 F H C 2 F G C 3 F C 2 F H C 2 F G C 1 G F C G H C 1 G C 1 H F * C 1 F H C 1 F G C 2 F C 1 F H C 1 F G C 2 F C 1 F H C 1 F G C 1 G F C G H * C F H C F G C 1 F C F H C F G C 1 F C F H C F G C 1 F C F H C F G * .
If a COPAR k model is estimated based on the first T multivariate observations, then it can easily be extended to T + 1 observations. This allows us to sample the ( T + 1 ) -th observation according to the estimated dependence structure between the k th subsequent multivariate observations. Let us illustrate this point for a COPAR 1 model and the above example with three univariate time series. The matrix representation of the R-vine structure of this COPAR 1 model remains unchanged, and the matrix of copulas for three-dimensional COPAR 1 model is now simplified to the matrix in (6), where 0 stands for independence copula
* 0 * 0 0 * C 1 H 0 0 * 0 C 1 H G 0 0 * 0 0 C 1 H F C 1 H 0 * C 1 G H 0 0 0 C 1 G H 0 * C G H C 1 G 0 C 1 G H 0 C 1 H F C 1 H * 0 0 C 1 G F C G H C 1 G 0 C 1 G H C 1 G H * 0 0 0 0 0 C 1 G F C G H C 1 G C 1 H F * C 1 F H C 1 F G 0 C 1 F H C 1 F G 0 C 1 F H C 1 F G C 1 G F C G H * C F H C F G C 1 F C F H C F G C 1 F C F H C F G C 1 F C F H C F G * .
Now, if we want to expand this COPAR 1 model by adding a new time point (i.e., T T + 1 —in our case, 4 5 ), the matrix representation will be changed as follows
  • Add three blank columns to the left of the matrix and add H T + 1 , G T + 1 , F T + 1 to the diagonal;
  • Under F T + 1 comes H 1 , , H T , then G 1 , , G T and F 1 , , F T ;
  • Under G T + 1 comes H 1 , , H T , then G 1 , , G T and F 1 , , F T + 1 ;
  • Under H T + 1 comes H 1 , , H T , then G 1 , , G T + 1 and F 1 , , F T + 1 .
The expanded matrix representation is as follows, where the new columns are marked in bold:
H 5 H 1 G 5 H 2 H 1 F 5 H 3 H 2 H 1 H 4 H 4 H 3 H 2 H 1 G 4 G 1 H 4 H 3 H 2 H 1 F 4 G 2 G 1 H 4 H 3 H 2 H 1 H 3 G 3 G 2 G 1 G 1 H 3 H 2 H 1 G 3 G 4 G 3 G 2 G 2 G 1 H 3 H 2 H 1 F 3 G 5 G 4 G 3 G 3 G 2 G 1 G 1 H 2 H 1 H 2 F 1 F 1 G 4 G 4 G 3 G 2 G 2 G 1 H 2 H 1 G 2 F 2 F 2 F 1 F 1 F 1 G 3 G 3 G 2 G 1 G 1 H 1 F 2 F 3 F 3 F 2 F 2 F 2 F 1 F 1 F 1 G 2 G 2 G 1 H 1 H 1 F 4 F 4 F 3 F 3 F 3 F 2 F 2 F 2 F 1 F 1 F 1 G 1 G 1 G 1 F 5 F 5 F 4 F 4 F 4 F 3 F 3 F 3 F 2 F 2 F 2 F 1 F 1 F 1 F 1 .
We also expand the matrix of copulas by adding three columns to the left of (6), as follows:
* 0 * 0 0 * 0 0 0 * C 1 H 0 0 0 * 0 C 1 HG 0 0 0 * 0 0 C 1 HF C 1 H 0 0 * 0 0 0 0 C 1 H G 0 0 * C 1 GH 0 0 0 0 C 1 H F C 1 H 0 * C GH C 1 G 0 C 1 G H 0 0 0 C 1 G H 0 * 0 0 C 1 GF C G H C 1 G 0 C 1 G H 0 C 1 H F C 1 H * 0 0 0 0 0 C 1 G F C G H C 1 G 0 C 1 G H C 1 G H * 0 0 0 0 0 0 0 0 C 1 G F C G H C 1 G C 1 H F * C 1 FH C 1 FG 0 C 1 F H C 1 F G 0 C 1 F H C 1 F G 0 C 1 F H C 1 F G C 1 G F C G H * C FH C FG C 1 F C F H C F G C 1 F C F H C F G C 1 F C F H C F G C 1 F C F H C F G * .
Thus, the three new columns are just replications of the previous three columns by expanding the uninterrupted sequences of zeros with an additional zero.

5. Empirical Application

The idea to model asset returns with factors arising from some observed data and idiosyncratic components is quite popular in modern finance theory. The most prominent example is the capital asset-pricing model (CAPM) of Sharpe (1964), Litner (1965), Mossin (1966), and Treynor (2012), which is a one-factor model with the market return as the only common driver of asset prices. Another well-known approach is the arbitrage pricing theory (APT) of Ross (1976). In this case, a multi-factor model describes the return of an asset as the sum of an asset-specific return, an exposure to systematic risk factors, and an error term. A third example is Stock and Watson (2002a), who extracted factors from a large number of predictors to forecast the log-returns of the Federal Reserve Board’s Index of Industrial Production. Ando and Bai (2014) provide further empirical evidence that stock returns are related to macro- and microeconomic factors. In this section, latent factors from U.S. macroeconomic data are extracted and then used for portfolio optimization. We consider 35 assets from S&P 500, which are classified as “Industrials” according to Global Industry Classification Standard. We assume that the estimated factors have the most prediction power for these assets.
The U.S. panel data includes such economic indicators as government bond yields along the curve, currency index, main commodity prices, indicators of money stock, inflation, consumer consumption, and industrial production gauges. Altogether, we have 22 time series listed in Appendix B. Each series contains monthly data from 31 January 1986 to 30 November 2016, 371 data points in total. Next, we split the panel data into financial and macroeconomic groups according to Table A2 and Table A3 in Appendix B. In the sequel, each time series is transformed in order to eliminate trends and achieve its stationarity. Table A2 and Table A3 in Appendix B also contain information on considered data transformations. Further, we consider three factor models: separately for the two groups of the panel data, and one joint model for all monthly indicators. Thereby, we aim to illustrate that modeling nonlinear dependence between estimated factors of different groups of panel data with COPAR may lead to a better asset allocation.
We consider the following three DFMs for i = f i n , m a c r o , a l l :
X t i = Λ i F t i + ε t i
F t i = A i F t 1 i + u t i ,
where X t i R m i × 1 collects observed macroeconomic data and F t i R q i × 1 is the vector of factors. The dimension m i of the panel data for i = f i n , m a c r o and a l l is equal to 9 , 13 and 22, respectively.
In the first step, we estimate three DFMs—one for each group (macro and financial) and for the full panel data. The starting time frame is from January 1986 to December 2005, and expands successively by one month until November 2016 is reached. For the three DFMs, we apply the EM-Algorithm from Section 2 to the panel data from each expanding time window and obtain a monthly sequence of estimated factors for each month of the considered time period. For i = f i n , m a c r o , and a l l , the EM-Algorithm requires the factor dimension q i , which is not known. In what follows, we perform the model selection for the factor dimension.
The dimension of factors is selected using principal components analysis (see Jolliffe 2002). We choose the number of principal components (PCs) such that we capture more than 95% of the empirical data variance, based on the initial time frame from January 1986 to December 2005. Figure 4 illustrates the fraction of variance captured by eight or fewer PCs for the considered time window. Thus, two factors are sufficient for the financial group to capture 95.4% of the corresponding data variability. For macroeconomic indicators, we need three factors (96.5% of variability), while four factors are enough for the whole panel data (96.2% of variability).
In general, the latent factors of DFMs may follow a VAR model of order p. Initiated by a referee’s comment, we also consider autoregressive order p = 2 for the joint DFM with all monthly indicators and compare its forecasting performance for p = 1 and p = 2 . Our decision criterion is based on the root-mean-square error (RMSE) of point predictions defined for univariate observations x 1 , , x T * as
R M S E = t = 1 T * x t x ^ t 2 T * ,
where x ^ 1 , , x ^ T * are predicted values. Thus, we compute the RMSE for each time series and take their average value. Thereby, our first prediction is done for January 2006 and ends in November 2016, resulting in 131 point forecasts. The averaged values of the RMSE for p = 1 and for p = 2 are 0.0302 and 0.0306 , respectively. Therefore, our initial choice of p = 1 is justified.
Figure 5 illustrates the correlation coefficients of the financial and macro factors filtered from data up to October 2010. The estimated factors show some moderate linear dependence within groups and a weak linear dependence between groups. Note that dynamic factor models assume factors to be multivariate normally distributed. Nevertheless, the estimated factors could exhibit non-Gaussian dependence, which we aim to capture using the COPAR model.
To improve the joint forecasting of asset returns at month t, for a subsequent month, we link the two factor models; namely, we capture the dependence structure of the estimated factors with a COPAR model. Thereby, we combine the estimated factors from two groups as a single five-dimensional vector for each data time window. We put the financial factors first, then the macroeconomic. Using fitted marginal normal distribution functions, the filtered factors are translated to copula data.
We consider a COPAR(1) model, since the transition equation of the DFMs is supposed to be a VAR(1). Further, Brechmann and Czado (2015) discuss a sophisticated copula family selection procedure for two time series. Here, we follow a more simple approach consisting of three steps. In the first step, the selection of copulas for contemporaneous cross-sectional dependence of the filtered factors is done by neglecting serial dependence. Note that the cross-sectional dependence is described by a C-vine with fixed order of variables. Therefore, we only have to select copula families of pair-copulas. Thus, we consider the estimated five-dimensional factors as iid data and perform sequential selection of copula families for each pair-copula using Akaike information criterion. Copula families for the pair-copulas associated with the first tree of the C-vine are selected first, then with the second tree, and so on. For more details on model selection for C-vines, we refer to Czado et al. (2012). Equation (9) presents the selected copula families for cross-sectional dependence in R-vine matrix notation until December 2005 (for copula families in (9), see Appendix C):
Families = * 2 * 5 2 * 16 3 2 * 16 2 36 2 * .
It is remarkable that none of the selected copula families in (9) is Gaussian. Moreover, only 0.5% of families selected for cross-sectional dependence for all expending time windows are the Gaussian copula.
In the second step, we use the Gaussian copulas as conditional pair-copulas to model the temporal dependence of the filtered factors. Note that this simple approach does not imply that the joint distribution of the time-shifted factors is Gaussian except for the first component of F t f i n ; i.e., F 1 , t 1 f i n , F 1 , t 2 f i n , , F 1 , t k f i n . Non-Gaussian pair-copulas for cross-sectional dependence destroy the Gaussianity of the time-shifted factors, and only the joint distribution of the first component of F t f i n remains Gaussian. In the third step, the R-vine matrix of the COPAR(1) model for the estimated factors F t f i n , F t m a c r o is constructed, and the maximum likelihood estimation is performed. Copula selection and parameter estimation of the COPAR model for the factors is done for every expanding window of data; that is, as soon as the DFMs for both groups are estimated and the latent factors are filtered out. Finally, note that we also test Granger causality between components of the multivariate time series F t f i n , F t m a c r o in the whole time period, and can confirm it. For this, we regress univariate series of F t f i n , F t m a c r o on the lagged values of F t 1 f i n , F t 1 m a c r o . Table A11 in Appendix F summarizes all five linear regressions. For example, the first component of F t f i n t = 1 , , T Granger causes almost all other univariate time series at least at 10 % of significance level with the exception of the second component of F t m a c r o t = 1 , , T .
Next, we present a conditional method for forecasting using COPAR, which follows Brechmann and Czado (2015) with a small modification. For a prediction time point, we simulate factors from the estimated COPAR model, but conditioned on the past values of the factors. The sampled factors are further used to forecast assets returns and to construct an optimal mean-variance portfolio. Since we assume an autoregressive order of one, it is enough to condition only on the last value of factors; i.e., we condition only on F ^ T f i n and F ^ T m a c r o to make a forecast for time point T + 1 . The full algorithm for both markets is given in Algorithm 1, employing notation F ¯ 1 : 0 , t = .
Algorithm 1: Conditional method of forecasting using COPAR.
  • Estimate COPAR model based on the first T observations and get factor estimates F ^ t = F ^ t f i n , F ^ t m a c r o ;
  • Set j = 1 ;
  • Repeat the following steps
    (a)
    If j = q + 1 then Stop;
    (b)
    Determine the estimated conditional density of F j , T + 1 F ^ 1 : q , T , F ¯ 1 : ( j 1 ) , T + 1 on the equidistant grid on ( 0 , 1 ) with step Δ = 10 4 , i.e., for u = n 10000 , n = 1 , , 9999 ;
    c Φ 1 F ¯ 1 , T + 1 , , Φ j 1 F ¯ j 1 , T + 1 , u , Φ 1 F ^ 1 , T , , Φ q F ^ q , T f F ¯ 1 , T + 1 , , F ¯ j 1 , T + 1 , F ^ 1 , T , , F ^ q , T
    · k = 1 j 1 ϕ k F ¯ k , T + 1 · ϕ j Φ j 1 u · k = 1 q ϕ k F ^ k , T ,
    where ϕ k and Φ k are the estimated Gaussian density and distribution function of latent factor F k , t ( i ) ;
    (c)
    Determine the estimated conditional cumulative distribution function on the above grid;
    (d)
    Simulate 10,000 F ˜ j , T + 1 from the estimated conditional cumulative distribution function and set forecast F ¯ j , T + 1 to their empirical mean;
    (e)
    Set j = j + 1 and go to (a);
To illustrate the idea behind this method, we consider a small-scale example of two variables G t and H t . Let us assume that we observe some values of these two random variables at time point t; i.e., G t = G ^ t , H t = H ^ t . We want to find the distribution of G t + 1 and H t + 1 conditioned on the observed values that we have observed. For this purpose, we consider the decomposition of the conditional distribution of G t + 1 , H t + 1 G t = G ^ t , H t = H ^ t given by
F G t + 1 , H t + 1 G t = G ^ t , H t = H ^ t            = F G t + 1 G t = G ^ t , H t = H ^ t F H t + 1 G t = G ^ t , H t = H ^ t , G t + 1 = G ¯ t + 1 ,
where G ¯ t + 1 is some forecast. One has a free choice for forecast methods for G t + 1 . As Brechmann and Czado (2015), we opt for the conditional mean estimated by the sample mean. Next, we first estimate the conditional density, and then, the conditional distribution function. For this simple case of two variables G t and H t , the estimated density f ^ of G t + 1 G t = G ^ t , H t = H ^ t on a grid from 0 to 1 with step 0.0001 is estimated. Then, we search for the estimated distribution function F ^ . With f ^ and F ^ , one can estimate the mean.
In the next step, we use the estimated factors to model the asset returns of 35 industrial stocks from S&P 500 and the sampled factors to construct an optimal portfolio in the mean-variance framework for each expanding time window starting from January 2006 up to November 2016. We consider and estimate the following model for each asset return j = 1 , , 35 :
r j , t = α j , 0 + α j , 1 r j , t 1 + Γ j F ^ t + v j , t ,
where the constants α j , 0 , α j , 1 , and the vector Γ j constitute unknown regression parameters and the errors v j , t are assumed to be iid Gaussian with respect to t. The error terms v j , t are assumed to be independent across j due to the dimensionality of the error covariance matrix. Moreover, we treat all linear regression separately for each asset j due to the dimensionality of regression parameters and couple them with the common estimated factors. In Section 2, we have pointed out that the estimated factors are unique up to a rotation with an orthogonal matrix R; i.e., R R = R R is an identity matrix. Since Γ j F t ^ = Γ j R R F ^ t holds, the impact of the rotated and unrotated estimated factors together with their regression coefficients on asset returns remains the same.
To construct an optimal portfolio in the mean-variance framework, we determine portfolio weights w at month t by solving the following optimization problem with respect to w :
max w · E r t + 1 s . t . 1 T · w = 1 , w 0 , w · V a r r t + 1 · w σ m o n t h l y 2 ,
where r t = ( r 1 , t , , r 35 , t ) is a vector of asset returns at t, E r t + 1 and V a r r t + 1 are the expectation and covariance matrix of r t + 1 . Thus, our optimal portfolio does not allow short-selling.
Since E r t + 1 and V a r r t + 1 are unknown at month t, we estimate them with four different methods, resulting in four portfolios. The first portfolio is the COPAR portfolio, which is constructed using asset returns modeled with (10). In this case, E r t + 1 and V a r r t + 1 are empirically estimated at time point t based on 10,000 sampled factors from the COPAR model and 10,000 sampled errors from the estimated univariate Gaussian distribution. The second portfolio is the DFM portfolio similarly constructed with sampled factors from the estimated stationary distribution (1) for (7)–(8) and i = a l l (i.e., the DFM for the full panel data). The third portfolio—called independent DFM—uses sampled factors from the estimated stationary distribution (1) for (7)–(8) and i = f i n , m a c r o ; i.e., assets returns are driven by F t f i n , F t m a c r o . The fourth portfolio is the historical portfolio, which employs the empirical mean and covariance matrix based on the data up to time t. We consider the historical portfolio as a benchmark. Finally note that the comparison of these four portfolios enables the assessment of the economic relevance of factors.
For the above mean-variance optimization problem, we choose three monthly volatilities σ m o n t h l y = 2.89 % , 3.75 % , 4.62 % , which correspond to annual volatilities of σ = 10 % , 13 % , 16 % . These choices of σ ’s are practically reasonable, and the optimizer diversifies the portfolio for them. For higher values of σ , the optimal portfolio consists of only one stock, if no constraints on maximal weights are imposed. We start to determine an optimal portfolio for the panel data up to January 2006. Then, we sequentially expand the time window by one month and find optimal weights for the considered four portfolios and chosen level of volatilities. The performance of the four portfolios with initial investment of 1 USD during the out-of-sample time period up to November 2016 is illustrated in Figure 6.
We observe that the COPAR portfolio outperforms the DFM, independent DFM, and historical portfolios. Further, two portfolios based on DFMs deliver a higher overall return than the historical portfolio, and the independent DFM (abbreviated in Figures and Tables as ind) is preferred over the DFM portfolio.
To compare the four portfolios, we consider several portfolio performance and risk measures summarized in Appendix D. First, note that observed standard deviations of portfolio returns are higher than prespecified ones. This is natural due to the prediction error. According to the Sharpe and Omega ratio, the COPAR portfolio is preferred over all remaining portfolios. For both risk measures, the independent DFM portfolio outperforms the DFM portfolio. We explain this finding with a fortunate split of the panel data. Further, if we consider 95 % Value at Risk (VaR) and 95 % Conditional Value at Risk (CVaR), then the historical portfolio outperforms the others except for one case. For σ = 16 % and 95 % VaR, the COPAR portfolio is slightly superior.
To statistically assess the differences in the Sharpe ratios, we perform the Jobson–Korkie Test from Jobson and Korkie (1981). The null hypothesis of this test is that the Sharpe ratios of the two considered portfolios are equal. The normalized and centered test statistics are asymptotically Gaussian distributed with mean 0 and variance 1 under the null hypothesis. If the null hypothesis is rejected, then there is significant statistical evidence for different Sharpe ratios. Appendix E presents results of the Jobson–Korkie Test for all pairs of portfolios and σ = 10 % , 13 % , 16 % . The COPAR portfolio significantly outperforms the independent DFM portfolio, and this advocates our approach to model estimated factors with a COPAR model. For σ = 10 % and σ = 13 % , we do not see a statistically significant difference in Sharpe ratios for historical and COPAR portfolios. We explain this finding with lower standard deviations of the historical portfolio returns.

6. Conclusions and Final Remarks

This paper applies copulas to capture the dependence structure of estimated latent factors from dynamic factor models. The proposed modeling approach is especially convenient when several factor models under consideration are estimated separately. In this context, we combine the filtered latent factors with the COPAR model of Brechmann and Czado (2015), which results in a non-Gaussian dependence between the factors. The gained flexibility of the factor dependence is then used for modeling asset returns and building optimal mean-variance portfolios.
In our empirical study, we consider U.S. panel data consisting of 9 financial and 13 macroeconomic monthly observable indicators. The nature of indicators suggests the consideration of two separate dynamic factor models. We also treat a joint dynamic factor model for all indicators. Estimated factors from the considered DFMs are used to model returns of 35 industrial stocks from S&P500. Then, factors’ predictions from different models spanning almost 11 years are employed for portfolio optimization in the mean-variance framework.
Our main contribution is a performance improvement of portfolios based on DFMs. For this, we propose to capture the dependence structure of filtered factors from DFMs with a COPAR model. This allows us to sample factor forecasts from the estimated COPAR model conditionally on past values of estimated factors. The gained factor predictions are then utilized to construct an optimal mean-variance portfolio. Thus, we compare the COPAR-based portfolio with two portfolios derived from DFMs, as well as with the classical mean-variance approach utilizing empirical means and covariance matrices. For the considered panel data and industrial stocks, we observe the outperformance of the COPAR portfolio in terms of the total return, the Sharpe and Omega ratio. The superiority of the COPAR-portfolio in terms of the Sharpe ratio is even statistically significant for several portfolio comparisons.
A possible improvement of the proposed approach is its extension with a model selection criterion for a general R-vine, which best captures the cross-sectional and temporal dependence. Thus, one departs from the COPAR model and generalizes it as well as the copula based multivariate time series models of Beare and Seo (2015) and Smith (2015). In general, one can completely revise our approach and alternatively develop dynamic versions of copula factor models as proposed by Krupskii and Joe (2013) and Oh and Patton (2017) for iid data. The first methodology in this direction is provided by Oh and Patton (2016), who capture cross-sectional dependence with time-varying copulas. These points are the subject of further research.

Acknowledgments

The authors want to thank the editor and the anonymous reviewers for their very helpful suggestions, which contributed essentially to the improvement of our manuscript. We are also obliged to Eike Brechmann, who kindly provided R-code for COPAR model, which served as an inspiration for computer implementation of this paper. Franz Ramsauer gratefully acknowledges the support of Pioneer Investments during his doctoral phase. Eugen Ivanov gratefully acknowledges the support of Deutsche Forschungsgemeinschaft during his doctoral phase. The authors also express gratitude to Leibniz-Rechenzentrum for providing computational resources and assistance.

Author Contributions

Eugen Ivanov, Aleksey Min and Franz Ramsauer analyzed the data, constructed the model and designed the estimation procedure. Asset allocation procedure based on dynamical factor models is proposed by Franz Ramsauer. Eugen Ivanov and Franz Ramsauer performed the complete computational implementation. All three authors wrote the paper.

Conflicts of Interest

The authors declare no conflict of interest. The sponsors had no role in the design of the study, in the collection, analyses, or interpretation of data, in the writing of the manuscript, and in the decision to publish the results.

Appendix A. Assets for Portfolio Optimization

Table A1. Assets used for portfolio construction.
Table A1. Assets used for portfolio construction.
TickerNameGICS Sub Industry
CMICummins Inc.Industrial Machinery
EMREmerson Electric CompanyIndustrial Conglomerates
CSXCSX Corp.Railroads
MMM3M CompanyIndustrial Conglomerates
BABoeing CompanyAerospace & Defense
CHRWC. H. Robinson WorldwideAir Freight & Logistics
CATCaterpillar Inc.Construction & Farm Machinery & Heavy Trucks
CTASCintas CorporationDiversified Support Services
DHRDanaher Corp.Industrial Conglomerates
DEDeere & Co.Construction & Farm Machinery & Heavy Trucks
ETNEaton CorporationIndustrial Conglomerates
EFXEquifax Inc.Research & Consulting Services
GDGeneral DynamicsAerospace & Defense
GEGeneral ElectricIndustrial Conglomerates
LLLL-3 Communications HoldingsIndustrial Conglomerates
LEGLeggett & PlattIndustrial Conglomerates
NSCNorfolk Southern Corp.Railroads
PBIPitney-BowesTechnology, Hardware, Software and Supplies
RTNRaytheon Co.Aerospace & Defense
RHIRobert Half InternationalHuman Resource & Employment Services
ROKRockwell Automation Inc.Industrial Conglomerates
COLRockwell CollinsIndustrial Conglomerates
UNPUnion PacificRailroads
UPSUnited Parcel ServiceAir Freight & Logistics
UTXUnited TechnologiesIndustrial Conglomerates
WMWaste Management Inc.Environmental Services
TXTTextron Inc.Industrial Conglomerates
FLRFluor Corp.Diversified Commercial Services
FDXFedEx CorporationAir Freight & Logistics
PCARPACCAR Inc.Construction & Farm Machinery & Heavy Trucks
GWWGrainger (W.W.) Inc.Industrial Materials
MASMasco Corp.Building Products
RRyder SystemIndustrial Conglomerates
LMTLockheed Martin Corp.Aerospace & Defense
NOCNorthrop Grumman Corp.Aerospace & Defense

Appendix B. Underlying Panel Data

In the following tables, FRED stands for Federal Reserve Bank of St. Louis.
Table A2. U.S. financial indicators.
Table A2. U.S. financial indicators.
IndicatorDescriptionTransformationSource
USA Treasury 3 mYield of U.S. government bonds with maturity of 3 monthsMonthly changeFRED
USA Treasury 1yYield of U.S. government bonds with maturity of 1 yearMonthly changeFRED
USA Treasury 2yYield of U.S. government bonds with maturity of 2 yearsMonthly changeFRED
USA Treasury 3yYield of U.S. government bonds with maturity of 3 yearsMonthly changeFRED
USA Treasury 5yYield of U.S. government bonds with maturity of 5 yearsMonthly changeFRED
USA Treasury 7yYield of U.S. government bonds with maturity of 7 yearsMonthly changeFRED
USA Treasury 10yYield of U.S. government bonds with maturity of 10 yearsMonthly changeFRED
GoldGold Fixing Price 10:30 A.M. (London time) in London Bullion Market, based in U.S. Dollars.ReturnFRED
WTI CrudeCrude Oil Prices: West Texas Intermediate (WTI)—Cushing, Oklahoma.ReturnFRED
Table A3. U.S. macroeconomical indicators.
Table A3. U.S. macroeconomical indicators.
IndicatorDescriptionTransformationSource
M1 U.S.Narrow money (M1) includes currency, i.e., banknotes and coins, as well as balances which can immediately be converted into currency or used for cashless payments, i.e., overnight deposits.ReturnFRED
M2 U.S.“Intermediate” money (M2) comprises narrow money (M1) and, in addition, deposits with a maturity of up to two years and deposits redeemable at a period of notice of up to three months.ReturnFRED
UnemploymentUnemployment rateReturnFRED
CPIConsumer Price Index (CPI) measures changes in the price level of a market basket of consumer goods and services purchased by households.ReturnFRED
PPIProducer Price Index (PPI) measures the average changes in prices received by domestic producers for their outputReturnFRED
PCEMeasures goods and services purchased by U.S. residentsReturnFRED
Personal saving ratePersonal saving as a percentage of disposable personal income (DPI), frequently referred to as “the personal saving rate,” is calculated as the ratio of personal saving to DPI.ReturnFRED
PayrollsAll Employees: Total Nonfarm Payrolls.ReturnFRED
UnemploymentUnemploymentReturnFRED
Initial claimsInitial claimsReturnFRED
Housing StartsHousing StartsReturnFRED
Capacity utilizationCapacity utilizationReturnFRED
Dollar IndexTrade Weighted U.S. Dollar IndexReturnFRED

Appendix C. Copula Families

Table A4. Copula families as numbered in the R-package VineCopula.
Table A4. Copula families as numbered in the R-package VineCopula.
NumberFamily
0Independence copula
1Gaussian copula
2Student’s t copula
3Clayton copula
4Gumbel copula
5Frank copula
6Joe copula
13Rotated Clayton copula (180 degrees; “survival Clayton”)
14Rotated Gumbel copula (180 degrees; “survival Gumbel”)
16Rotated Joe copula (180 degrees; “survival Joe”)
23Rotated Clayton copula (90 degrees)
24Rotated Gumbel copula (90 degrees)
26Rotated Joe copula (90 degrees)
33Rotated Clayton copula (270 degrees)
34Rotated Gumbel copula (270 degrees)
36Rotated Joe copula (270 degrees)

Appendix D. Portfolio Characteristics

Table A5. Portfolio characteristics for annual volatility σ = 10 % .
Table A5. Portfolio characteristics for annual volatility σ = 10 % .
COPARDFMind DFMhist
Log-return (Total)1.05780.95920.96610.6982
Log-return (Monthly)0.00790.00720.00720.0050
Std. deviation (Annual)0.17750.18080.17990.1571
Sharpe Ratio (Monthly)0.18160.16650.16680.1354
Omega Ratio (Monthly)1.63891.56171.56961.4420
95% VaR−0.1930−0.1916−0.1993−0.1899
95% CVaR−0.1965−0.1956−0.20373−0.1924
Table A6. Portfolio characteristics for annual volatility σ = 13 % .
Table A6. Portfolio characteristics for annual volatility σ = 13 % .
COPARDFMind DFMhist
Log-return (Total)1.18280.96571.02570.6855
Log-return (Monthly)0.00890.00730.00770.0048
Std. deviation (Annual)0.18380.18900.18880.1661
Sharpe Ratio (Monthly)0.19560.16340.17040.1268
Omega Ratio (Monthly)1.69411.54201.57831.3974
95% VaR 0 .2106 0 .2135 0 .2219 0 .1920
95% CVaR 0 .2162 0 .2191 0 .2283 0 .1944
Table A7. Portfolio characteristics for annual volatility σ = 16 % .
Table A7. Portfolio characteristics for annual volatility σ = 16 % .
COPARDFMind DFMhist
Log-return (Total)1.25540.90681.07500.2660
Log-return (Monthly)0.00950.00690.00810.0015
Std. deviation (Annual)0.19310.19810.19960.2015
Sharpe Ratio (Monthly)0.19990.15140.17150.0581
Omega Ratio (Monthly)1.70181.48911.58601.1814
95% VaR 0 . 2332−0.2362 0 .2458 0 .2355
95% CVaR 0 .2408 0 .2434 0 .2541 0 .2379

Appendix E. Significance Tests for Sharpe Ratios

Table A8. Test statistics of the Jobson–Korkie Test for pairwise comparison of Sharpe ratios for four portfolios from the mean-variance optimization problem with σ = 10 % .
Table A8. Test statistics of the Jobson–Korkie Test for pairwise comparison of Sharpe ratios for four portfolios from the mean-variance optimization problem with σ = 10 % .
COPARDFMind DFM
COPAR
DFM1.02
ind DFM1.89 *0.03
hist1.040.660.70
Note: * p-value < 0.1, ** p-value < 0.05, *** p-value < 0.01.
Table A9. Test statistics of the Jobson–Korkie Test for pairwise comparison of Sharpe ratios for four portfolios from the mean-variance optimization problem with σ = 13 % .
Table A9. Test statistics of the Jobson–Korkie Test for pairwise comparison of Sharpe ratios for four portfolios from the mean-variance optimization problem with σ = 13 % .
COPARDFMind DFM
COPAR
DFM1.62
ind DFM2.20 **0.42
hist1.300.710.82
Note: * p-value < 0.1, ** p-value < 0.05, *** p-value < 0.01.
Table A10. Test statistics of the Jobson–Korkie Test for pairwise comparison of Sharpe ratios for four portfolios from the mean-variance optimization problem with σ = 16 % .
Table A10. Test statistics of the Jobson–Korkie Test for pairwise comparison of Sharpe ratios for four portfolios from the mean-variance optimization problem with σ = 16 % .
COPARDFMind DFM
COPAR
DFM2.03 **
ind DFM2.27 **0.95
hist1.95 *1.391.60
Note: * p-value < 0.1, ** p-value < 0.05, *** p-value < 0.01.

Appendix F. Testing for Granger Causality

Table A11. Summary of linear regressions for testing Granger causality. Columns contain estimate of regression coefficients, and their standard errors are given in brackets.
Table A11. Summary of linear regressions for testing Granger causality. Columns contain estimate of regression coefficients, and their standard errors are given in brackets.
Dependent Variable:
y
F 1 , t fin F 2 , t fin F 1 , t macro F 2 , t macro F 3 , t macro
(1)(2)(3)(4)(5)
F 1 , t 1 f i n 0.216 ***0.655 ***−0.189 ***0.076−0.094 *
(0.054)(0.060)(0.053)(0.056)(0.055)
F 2 , t 1 f i n 0.086 **0.214 ***0.052−0.0560.048
(0.041)(0.045)(0.040)(0.042)(0.042)
F 1 , t 1 m a c r o 0.0760.009−0.385 ***0.037−0.104 **
(0.047)(0.052)(0.046)(0.048)(0.048)
F 2 , t 1 m a c r o 0.077 *−0.0060.020−0.456 ***0.024
(0.046)(0.051)(0.045)(0.047)(0.047)
F 3 , t 1 m a c r o −0.0090.0650.192 ***−0.045−0.554 ***
(0.043)(0.047)(0.042)(0.044)(0.044)
Constant0.0030.002−0.00050.0030.0002
(0.052)(0.058)(0.051)(0.054)(0.053)
Observations369369369369369
R 2 0.0930.3690.2360.2150.315
Adjusted R 2 0.0800.3600.2250.2040.305
Residual Std. Error (df = 363)1.0031.1090.9811.0301.026
F Statistic (df = 5; 363)7.428 ***42.464 ***22.408 ***19.893 ***33.339 ***
Note: * p-value < 0.1; ** p-value < 0.05; *** p-value < 0.01.

References

  1. Aas, Kjersti. 2016. Pair-copula constructions for financial applications: A review. Econometrics 4: 43. [Google Scholar] [CrossRef]
  2. Aas, Kjersti, Claudia Czado, Arnoldo Frigessi, and Henrik Bakken. 2009. Pair-copula constructions of multiple dependence. Insurance: Mathematics and economics 44: 182–98. [Google Scholar] [CrossRef]
  3. Ando, Tomohiro, and Jushan Bai. 2014. Asset pricing with a general multifactor structure. Journal of Financial Econometrics 13: 556–604. [Google Scholar] [CrossRef]
  4. Bańbura, Marta, and Michele Modugno. 2014. Maximum likelihood estimation of factor models on data sets with arbitrary pattern of missing data. Journal of Applied Econometrics 29: 133–60. [Google Scholar] [CrossRef]
  5. Beare, Brendan K., and Juwon Seo. 2015. Vine copula specifications for stationary multivariate markov chains. Journal of Time Series Analysis 36: 228–46. [Google Scholar] [CrossRef]
  6. Biller, Bahar, and Barry L. Nelson. 2003. Modeling and generating multivariate time-series input processes using a vector autoregressive technique. ACM Transactions on Modeling and Computer Simulation 13: 211–37. [Google Scholar] [CrossRef]
  7. Bork, Lasse. 2009. Estimating US Monetary Policy Shocks Using a Factor-Augmented Vector Autoregression: An EM Algorithm Approach. CREATES Research Paper 2009-11. Denmark: Department of Economics and Business Economics, Aarhus University. [Google Scholar]
  8. Brechmann, Eike Christian, and Claudia Czado. 2015. COPAR—Multivariate time series modeling using the copula autoregressive model. Applied Stochastic Models in Business and Industry 31: 495–514. [Google Scholar] [CrossRef]
  9. Chena, Xiaohong, and Yanqin Fan. 2006. Estimation of copula-based semiparametric time series models. Journal of Econometrics 130: 307–35. [Google Scholar] [CrossRef]
  10. Czado, Claudia. 2010. Pair-copula constructions of multivariate copulas. In Copula Theory and Its Applications. Edited by Wolfgang Härdle, Piotr Jaworski and Tomasz Rychlik. New York: Springer, pp. 93–109. [Google Scholar]
  11. Czado, Claudia, Eike Christian Brechmann, and Lutz Gruber. 2012. Selection of vine copulas. In Copulae in Mathematical and Quantitative Finance. Edited by Fabrizio Durante and Piotr Jaworski. New York: Springer, pp. 17–37. [Google Scholar]
  12. Czado, Claudia, Ulf Schepsmeier, and Aleksey Min. 2012. Maximum likelihood estimation of mixed C-vines with application to exchange rates. Statistical Modelling 12: 229–55. [Google Scholar] [CrossRef]
  13. Darsow, William F., Bao Nguyen, and Elwood T. Olsen. 1992. Copulas and Markov processes. Illinois Journal of Mathematics 36: 600–642. [Google Scholar]
  14. Dempster, Arthur P., Nan M. Laird, and Donald B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 39: 1–38. [Google Scholar]
  15. Dissmann, Jeffrey, Eike Christian Brechmann, Claudia Czado, and Dorota Kurowicka. 2013. Selecting and estimating regular vine copulae and application to financial returns. Computational Statistics & Data Analysis 59: 52–69. [Google Scholar] [CrossRef]
  16. Embrechts, Paul, Alexander McNeil, and Daniel Straumann. 2002. Correlation and dependency in risk management: Properties and pitfalls. In Risk Management: Value at Risk and Beyond. Edited by Michael Dempster and Henry Moffat. Cambridge: Cambridge University Press, pp. 176–223. [Google Scholar]
  17. Frees, Edward W., and Emiliano A. Valdez. 1998. Understanding relationships using copulas. North American Actuarial Journal 2: 1–25. [Google Scholar] [CrossRef]
  18. Ibragimov, Rustam. 2009. Copula-based characterizations for higher order Markov processes. Econometric Theory 25: 819–46. [Google Scholar] [CrossRef]
  19. Jolliffe, Ian. 2002. Principal Component Analysis. New York: Springer. [Google Scholar]
  20. Jobson, John, and Bob Korkie. 1981. Performance hypothesis testing with the Sharpe and Treynor measures. The Journal of Finance 36: 889–908. [Google Scholar] [CrossRef]
  21. Krupskii, Pavel, and Harry Joe. 2013. Factor copula models for multivariate data. Journal of Multivariate Analysis 120: 85–101. [Google Scholar] [CrossRef]
  22. Kurowicka, Dorota, and Roger Cooke. 2006. Uncertainty Analysis with High Dimensional Dependence Modelling. New York: John Wiley & Sons. [Google Scholar]
  23. Li, David X. 2000. On default correlation: A copula function approach. Journal of Fixed Income 9: 43–54. [Google Scholar] [CrossRef]
  24. Litner, John. 1965. The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets. Review of Economics and Statistics 47: 13–37. [Google Scholar] [CrossRef]
  25. Lütkepohl, Helmut. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer. [Google Scholar]
  26. Mossin, Jan. 1966. Equilibrium in a Capital Asset Market. Econometrica 34: 768–83. [Google Scholar] [CrossRef]
  27. Morales-Nápoles, Oswaldo, Roger Cooke, and Dorota Kurowicka. 2010. About the number of vines and regular vines on n nodes. [Google Scholar] [CrossRef]
  28. Oh, Dong Hwan, and Andrew J. Patton. 2017. Modelling dependence in high dimensions with factor copulas. Journal of Business & Economic Statistics 35: 139–154. [Google Scholar] [CrossRef]
  29. Oh, Dong Hwan, and Andrew J. Patton. 2016. Time-varying systemic risk: Evidence from a dynamic copula model of CDS spreads. Journal of Business & Economic Statistics. [Google Scholar] [CrossRef]
  30. Ross, Stephen. 1976. The arbitrage theory of capital asset pricing. Journal of Economic Theory 13: 341–60. [Google Scholar] [CrossRef]
  31. Shumway, Robert, and David Stoffer. 1982. An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis 3: 253–64. [Google Scholar] [CrossRef]
  32. Sklar, Abe. 1959. Fonctions de répartition à n dimensions et leurs marges. Publications de l’Institut Statistique de l’Université de Paris 8: 229–31. [Google Scholar]
  33. Smith, Michael Stanley. 2015. Copula modelling of dependence in multivariate time series. International Journal of Forecasting 31: 815–33. [Google Scholar] [CrossRef]
  34. Stock, James, and Mark Watson. 1999. Diffusion indices. Working Paper No. 6702. Cambridge, MA, USA: National Bureau of Economic Research. [Google Scholar]
  35. Stock, James, and Mark Watson. 2002. Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association 97: 1167–79. [Google Scholar] [CrossRef]
  36. Stock, James, and Mark Watson. 2002. Macroeconomic forecasting using diffusion indexes. Journal of Business & Economic Statistics 20: 147–62. [Google Scholar] [CrossRef]
  37. Stöber, Jakob, Harry Joe, and Claudia Czado. 2013. Simplified pair copula constructions—Limitations and extensions. Journal of Multivariate Analysis 119: 101–18. [Google Scholar] [CrossRef]
  38. Sharpe, William F. 1964. Capital asset prices: A theory of market equilibrium under conditions of risk. The Journal of Finance 19: 425–42. [Google Scholar] [CrossRef]
  39. Treynor, Jack L. 2012. Toward a Theory of Market Value of Risky Assets. In Treynor on Institutional Investing. Edited by Jack L. Treynor. Hoboken: John Wiley & Sons. [Google Scholar] [CrossRef]
  40. Watson, Mark, and Robert Engle. 1983. Alternative algorithms for the estimation of dynamic factor, mimic and varying coefficient regression models. Journal of Econometrics 23: 385–400. [Google Scholar] [CrossRef]
  41. Wu, Jeff. 1983. On the convergence properties of the EM algorithm. The Annals of Statistics 11: 95–103. [Google Scholar] [CrossRef]
Figure 1. C-vine (Left) and D-vine (Right) representation for d = 5 .
Figure 1. C-vine (Left) and D-vine (Right) representation for d = 5 .
Econometrics 05 00020 g001
Figure 2. Dependencies of a simultaneous bivariate time series for T = 4 . Vertical solid lines: cross-sectional dependence; horizontal solid lines and curved lines: serial dependence for each time series; dotted and dashed lines: cross-serial dependence.
Figure 2. Dependencies of a simultaneous bivariate time series for T = 4 . Vertical solid lines: cross-sectional dependence; horizontal solid lines and curved lines: serial dependence for each time series; dotted and dashed lines: cross-serial dependence.
Econometrics 05 00020 g002
Figure 3. R-vine structure of three-dimensional time series for T = 4 .
Figure 3. R-vine structure of three-dimensional time series for T = 4 .
Econometrics 05 00020 g003
Figure 4. Amount of principal components (PCs) vs. captured variance for three dynamic factor models (DFMs) considered up to December 2005.
Figure 4. Amount of principal components (PCs) vs. captured variance for three dynamic factor models (DFMs) considered up to December 2005.
Econometrics 05 00020 g004
Figure 5. Correlations heatmap as of October 2010.
Figure 5. Correlations heatmap as of October 2010.
Econometrics 05 00020 g005
Figure 6. Performance of four portfolios with initial investment of 1 USD over the out-of-sample time period from January 2006 to November 2016 for σ = 10 % , 13 % , 16 % . COPAR: copula autoregressive; Hist: historical; Ind: independent DFM.
Figure 6. Performance of four portfolios with initial investment of 1 USD over the out-of-sample time period from January 2006 to November 2016 for σ = 10 % , 13 % , 16 % . COPAR: copula autoregressive; Hist: historical; Ind: independent DFM.
Econometrics 05 00020 g006

Share and Cite

MDPI and ACS Style

Ivanov, E.; Min, A.; Ramsauer, F. Copula-Based Factor Models for Multivariate Asset Returns. Econometrics 2017, 5, 20. https://doi.org/10.3390/econometrics5020020

AMA Style

Ivanov E, Min A, Ramsauer F. Copula-Based Factor Models for Multivariate Asset Returns. Econometrics. 2017; 5(2):20. https://doi.org/10.3390/econometrics5020020

Chicago/Turabian Style

Ivanov, Eugen, Aleksey Min, and Franz Ramsauer. 2017. "Copula-Based Factor Models for Multivariate Asset Returns" Econometrics 5, no. 2: 20. https://doi.org/10.3390/econometrics5020020

APA Style

Ivanov, E., Min, A., & Ramsauer, F. (2017). Copula-Based Factor Models for Multivariate Asset Returns. Econometrics, 5(2), 20. https://doi.org/10.3390/econometrics5020020

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop