Copula-Based Factor Models for Multivariate Asset Returns

Ivanov, Eugen; Min, Aleksey; Ramsauer, Franz

doi:10.3390/econometrics5020020

Open Access Article


by Eugen Ivanov ¹,*, Aleksey Min ² and Franz Ramsauer ²

¹ Department of Economics, University of Augsburg, Universitätsstr. 16, 86159 Augsburg, Germany

² Department of Mathematics, Technical University of Munich, Boltzmannstr. 3, 85748 Garching, Germany

* Author to whom correspondence should be addressed.

Econometrics 2017, 5(2), 20; https://doi.org/10.3390/econometrics5020020

Submission received: 29 September 2016 / Revised: 3 May 2017 / Accepted: 3 May 2017 / Published: 17 May 2017

(This article belongs to the Special Issue Recent Developments in Copula Models)


Abstract:

Recently, several copula-based approaches have been proposed for modeling stationary multivariate time series. All of them are based on vine copulas, and they differ in the choice of the regular vine structure. In this article, we consider a copula autoregressive (COPAR) approach to model the dependence of unobserved multivariate factors resulting from two dynamic factor models. However, the proposed methodology is general and applicable to several factor models as well as to other copula models for stationary multivariate time series. An empirical study illustrates the forecasting superiority of our approach for constructing an optimal portfolio of U.S. industrial stocks in the mean-variance framework.

Keywords:

COPAR model; dynamic factor model; multivariate time series; optimal mean-variance portfolio; vine copula

JEL Classification:

C58; C53; C10; G10

1. Introduction

It took almost four decades for the statistical usefulness and attractiveness of copulas to be widely recognized, a development driven by the seminal papers of Frees and Valdez (1998), Li (2000), and Embrechts et al. (2002). Copulas are now a standard tool for modeling the dependence structure of multivariate iid data in applied science. The foundation of copula theory was laid by Sklar's famous theorem (see Sklar 1959), which states that any multivariate distribution can be represented through its copula and its marginal distributions. If the marginal distributions are continuous, then the copula of a multivariate distribution is unique. This approach is particularly flexible, since the margins and the dependence structure, which is dictated by the copula, can be modeled separately.
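Sklar's construction can also be run in reverse to simulate data: draw from a copula, then apply inverse marginal distribution functions. The following minimal sketch (Python standard library only) illustrates this under assumptions chosen purely for illustration, namely a bivariate Gaussian copula with correlation `rho` and exponential margins with rates `rates`; the function name `sample_gaussian_copula_exponential` is hypothetical and not part of the paper.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (cdf / inv_cdf)


def sample_gaussian_copula_exponential(rho, rates, n, seed=0):
    """Draw n pairs whose copula is Gaussian with correlation rho and whose
    margins are exponential with the given rates (Sklar's theorem in reverse)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        # Correlated standard normals via a 2 x 2 Cholesky factor
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # U_i = Phi(Z_i) has the Gaussian copula; the exponential quantile is
        # x = -log(1 - u) / rate, and 1 - Phi(z) = Phi(-z) avoids rounding 1 - u to 0.
        x1 = -math.log(STD_NORMAL.cdf(-z1)) / rates[0]
        x2 = -math.log(STD_NORMAL.cdf(-z2)) / rates[1]
        pairs.append((x1, x2))
    return pairs
```

Changing `rates` alters only the margins; the rank dependence between the two coordinates is governed by `rho` alone.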

To our knowledge, Darsow et al. (1992) initiated the theoretical use of copulas to specify univariate first-order Markov processes. In their framework, conditional independence can be stated in terms of copulas, which results in a copula counterpart of the Chapman–Kolmogorov equations for the transition probabilities of a Markov process. Ibragimov (2009) generalized their approach to univariate Markov processes of higher order as well as to non-Markov processes. Furthermore, he introduced new classes of copulas for modeling univariate time series. Copula-based stationary time series models can still be estimated in the classical framework used for iid data. For example, Chen and Fan (2006) investigated theoretical aspects of two-step estimation, in which the marginal distributions are fitted non-parametrically in the first step and the copula parameters are then estimated by maximum likelihood.
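For orientation, the copula counterpart of the Chapman–Kolmogorov equations can be written with the $*$-product of Darsow et al. (1992). We state it here as a standard result for a first-order Markov process whose bivariate copula between times $s$ and $t$ is $C_{s,t}$, with $\partial_1$ and $\partial_2$ denoting partial derivatives with respect to the first and second argument:

```latex
(A * B)(u, v) = \int_{0}^{1} \partial_{2} A(u, w)\, \partial_{1} B(w, v)\, \mathrm{d}w,
\qquad
C_{s,t} = C_{s,r} * C_{r,t} \quad \text{for all } s < r < t .
```

For example, the independence copula $\Pi(u, v) = uv$ satisfies $A * \Pi = \Pi * A = \Pi$ for any copula $A$, and two Gaussian copulas with correlations $\rho_1$ and $\rho_2$ combine into the Gaussian copula with correlation $\rho_1 \rho_2$, as for a Gaussian AR(1) process.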

The first non-Gaussian VAR models were introduced by Biller and Nelson (2003), in which suitably chosen Gaussian VAR time series are transformed to achieve a desired autocorrelation structure and desired marginal distributions. Recently, Brechmann and Czado (2015), Beare and Seo (2015), and Smith (2015) simultaneously developed copula-based models for stationary multivariate time series. Although these models differ from each other, they share an underlying R-vine pair-copula construction (see Aas et al. 2009) that describes the cross-sectional and temporal dependence jointly. To capture the cross-sectional dependence, Brechmann and Czado (2015) employ C-vines, while Beare and Seo (2015) and Smith (2015) consider D-vines. Further, Brechmann and Czado (2015) and Beare and Seo (2015) assume the existence of a key variable whose temporal dependence is explicitly modeled; combined with a C- or D-vine for the cross-sectional dependence, this assumption yields a corresponding R-vine for the multivariate time series. In contrast, Smith (2015) explicitly models the temporal dependence between multivariate observations and constructs a general D-vine for them that consists of D-vines for the cross-sectional dependence.

In the era of big data, factor models offer an elegant way to describe high-dimensional panel data with a few unobservable (latent) variables, called factors. The idea behind factor models is that the observable variables are driven by two orthogonal hidden processes: one captures their co-movements and arises from a linear combination of the latent factors, whereas the other covers their individual behavior in the form of idiosyncratic shocks. Since the dimension of the observed data usually far exceeds the number of factors, this yields a substantial reduction in dimension.

In the seminal works of Stock and Watson (1999, 2002a, 2002b), factor models were used to forecast univariate time series. They showed that for large panels the unobserved factors can be estimated consistently, which provides the basis for a consistent forecasting framework. In particular, Stock and Watson (2002a) illustrated that forecasts of several macroeconomic variables based on factor models can outperform those obtained from competing models such as autoregression (AR), VAR, and leading indicators. In the literature, factor models are classified as static or dynamic according to the stochastic dynamics of the unobserved factors. Static factor models assume iid normally distributed factors, while dynamic factor models assume that the factors follow a VAR model of order $p \geq 1$.

In this paper, we apply the copula autoregressive (COPAR) model of Brechmann and Czado (2015) to quantify the dependence structure of the estimated unobserved factors of dynamic factor models. More precisely, we consider two dynamic factor models and estimate each of them separately by maximum likelihood, employing the Kalman filter and smoother. The estimated factors are then jointly modeled with a COPAR model, from which latent factors are simulated for forecasting purposes. Thus, our approach allows several separately estimated dynamic factor models to be coupled by a copula and admits a non-Gaussian dependence structure of the simulated latent factors. To exploit the information in the estimated factors, the variable to be forecast in each market is regressed on the corresponding simulated factors and on its own previous value.

It should be noted that our modeling approach differs from the factor copula models of Krupskii and Joe (2013) and Oh and Patton (2017). We first estimate the unobserved factors and then fit a copula to them as if they were observed data. The factor models reduce the data dimension, and exploiting the autoregressive structure of the factors substantially decreases the number of copula parameters. In contrast, Krupskii and Joe (2013) and Oh and Patton (2017) treat iid data and reduce the dimension of the copula parameter by assuming conditional independence given unobserved factors. Numerically integrating out the unobserved factors yields a copula with a low-dimensional parameter for the observed iid data. For multivariate time series, Oh and Patton (2016) extended factor copula models with time-varying parametric copulas.

In the empirical application, we consider monthly U.S. financial and macroeconomic panel data to filter the driving factors, which are later employed for mean-variance portfolio optimization. Our main contribution is a new method to improve portfolio performance using factor predictions sampled from the COPAR model. In contrast to dynamic factor models, we explicitly allow for non-Gaussian cross-sectional and temporal dependence between factors. The forecasted factors with a non-linear dependence structure are used to assess the future variability of multivariate asset returns, which allows us to construct an optimal portfolio in the mean-variance framework. For comparison, three benchmark portfolios are constructed using dynamic factor models as well as empirical moments of the observed asset returns. According to several risk-adjusted performance measures, our optimal portfolio outperforms the benchmark portfolios.

The rest of the paper is organized as follows. Section 2 outlines dynamic factor models and their estimation in the maximum likelihood framework. Section 3 briefly reviews vine copulas. Section 4 reviews COPAR models and discusses an algorithm to extend a COPAR(1) model by a further multivariate observation. Section 5 presents our proposed methodology for an optimal asset allocation of 35 industrial stocks from the S&P 500, listed in Appendix A, and compares it with three benchmark portfolios. Finally, we conclude and discuss further research. Appendix B gives an overview of the monthly panel data considered. Appendix C presents the bivariate pair-copulas considered in a model selection procedure for the cross-sectional dependence. Appendices D and E contain detailed numerical results for the portfolio comparisons. Appendix F summarizes the results of Granger causality tests between the estimated factors.

2. Factor Models

In our application, we deal only with dynamic factor models (DFMs). Further, we restrict our exposition to the simplest factor dynamics of order 1, since any VAR$(p)$ can be written in VAR$(1)$ form (see Lütkepohl 2005, p. 15).
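The reduction from VAR$(p)$ to VAR$(1)$ stacks the current and lagged factors into one state vector $Y_t = (F_t', F_{t-1}', \dots, F_{t-p+1}')'$ with noise $(u_t', 0, \dots, 0)'$. A minimal sketch in plain Python, assuming the coefficient matrices $A_1, \dots, A_p$ are given as nested lists; the name `companion_matrix` is our own.

```python
def companion_matrix(coeffs):
    """Stack VAR(p) coefficients A_1, ..., A_p (each q x q, nested lists) into the
    (qp x qp) companion matrix of the equivalent VAR(1):
        F_t = A_1 F_{t-1} + ... + A_p F_{t-p} + u_t   becomes   Y_t = A Y_{t-1} + U_t."""
    p, q = len(coeffs), len(coeffs[0])
    n = q * p
    A = [[0.0] * n for _ in range(n)]
    # First block row: [A_1  A_2  ...  A_p]
    for k, Ak in enumerate(coeffs):
        for i in range(q):
            for j in range(q):
                A[i][k * q + j] = Ak[i][j]
    # Identity blocks below the diagonal shift the lagged factors down by one block
    for i in range(q, n):
        A[i][i - q] = 1.0
    return A
```

Stationarity of the VAR$(p)$ is then equivalent to all eigenvalues of the companion matrix lying inside the unit circle.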

Definition 1.

(Dynamic Factor Model) For any point in time $t$, let $X_{t} \in \mathbb{R}^{m}$ be a stationary vector process with zero mean. Let $F_{t} \in \mathbb{R}^{q}$, $q \leq m$, denote the multivariate factor at time $t$, and let the vector $ε_{t} \in \mathbb{R}^{m}$ collect all idiosyncratic shocks. Then, a dynamic factor model of order 1 is given by

\begin{matrix} X_{t} & = Λ F_{t} + ε_{t}, \\ F_{t} & = A F_{t - 1} + u_{t}, \end{matrix}

with constant matrices $Λ \in \mathbb{R}^{m \times q}$ and $A \in \mathbb{R}^{q \times q}$. The idiosyncratic shocks $ε_{t}$ are iid Gaussian with zero mean and covariance matrix $R \in \mathbb{R}^{m \times m}$, and the error vectors $u_{t} \in \mathbb{R}^{q}$ are iid Gaussian with zero mean and covariance matrix $Q \in \mathbb{R}^{q \times q}$.
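To make the data-generating process of Definition 1 concrete, here is a minimal simulation sketch in plain Python. It assumes, for simplicity only, diagonal covariance matrices $R$ and $Q$ (passed as lists of variances) and a start at $F_0 = 0$, so the first draws should be discarded as burn-in; `simulate_dfm` is a hypothetical helper name.

```python
import random


def simulate_dfm(Lam, A, r_diag, q_diag, T, seed=0):
    """Simulate X_t = Lam F_t + eps_t and F_t = A F_{t-1} + u_t for t = 1..T.
    Assumes diagonal R and Q given as variance lists, and F_0 = 0."""
    rng = random.Random(seed)
    m, q = len(Lam), len(A)
    F_prev = [0.0] * q
    X, F = [], []
    for _ in range(T):
        # Factor transition with Gaussian innovation u_t
        F_t = [sum(A[i][j] * F_prev[j] for j in range(q)) + rng.gauss(0.0, q_diag[i] ** 0.5)
               for i in range(q)]
        # Observation equation with idiosyncratic shock eps_t
        X_t = [sum(Lam[i][j] * F_t[j] for j in range(q)) + rng.gauss(0.0, r_diag[i] ** 0.5)
               for i in range(m)]
        F.append(F_t)
        X.append(X_t)
        F_prev = F_t
    return X, F
```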

Since $X_{t}$ is a stationary process, Definition 1 implicitly assumes that the unobserved factor process $F_{t}$ is also stationary. The stationarity of $F_{t}$ is ensured if all roots of the characteristic polynomial $\det(I_{q} - A z)$ lie outside the complex unit circle or, equivalently, if all eigenvalues of $A$ lie inside it. In this case, the moving average representation $F_{t} = \sum_{i = 0}^{\infty} A^{i} u_{t - i}$ yields the stationary $q$-dimensional zero-mean Gaussian distribution of $F_{t}$ (see Lütkepohl 2005, pp. 18–21), given by

N_{q} (0, Σ_{F}) \quad \text{with} \quad Σ_{F} = \sum_{i = 0}^{\infty} A^{i} Q {(A^{i})}^{'} . \qquad (1)

For known or estimated $Q$ and $A$, the factors can be drawn from (1) by truncating the infinite sum at a pre-specified error tolerance of $10^{-5}$ for all entries of $Σ_{F}$.
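A minimal sketch of both steps for the bivariate case, in plain Python with hypothetical helper names: `eig2` checks the eigenvalue form of the stationarity condition for a $2 \times 2$ matrix $A$, and `stationary_cov` sums the series for $Σ_F$ starting from the $i = 0$ term $Q$ (because $F_t = \sum_{i \geq 0} A^i u_{t-i}$). As a simple proxy for the truncation error, the loop stops once every entry of the next summand is below `tol`; the result can be checked against the fixed-point equation $Σ_F = A Σ_F A' + Q$.

```python
import cmath


def matmul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def eig2(A):
    """Eigenvalues of a 2 x 2 matrix from its trace and determinant."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


def stationary_cov(A, Q, tol=1e-5, max_iter=10_000):
    """Truncated series Sigma_F = sum_{i >= 0} A^i Q (A^i)'.
    Stops once every entry of the next summand is below tol."""
    sigma = [row[:] for row in Q]      # i = 0 term
    A_pow = [row[:] for row in A]      # A^1
    for _ in range(max_iter):
        term = matmul(matmul(A_pow, Q), transpose(A_pow))
        if max(abs(v) for row in term for v in row) < tol:
            return sigma
        sigma = [[s + t for s, t in zip(rs, rt)] for rs, rt in zip(sigma, term)]
        A_pow = matmul(A_pow, A)
    raise RuntimeError("series did not converge; A may not be stable")
```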

The parameters $Λ$, $A$, $R$, and $Q$ of the dynamic factor model in Definition 1 can be estimated in the maximum likelihood framework with the Expectation-Maximization Algorithm (EM-Algorithm) of Dempster et al. (1977). This was first done by Shumway and Stoffer (1982) and Watson and Engle (1983), although Shumway and Stoffer (1982) assumed a known $Λ$, and Watson and Engle (1983) did not directly maximize the log-likelihood of dynamic factor models. More recently, Bork (2009) and Bańbura and Modugno (2014) derived the EM-Algorithm for the dynamic factor model in Definition 1, on which we rely in our empirical application. Note that the convergence properties of the EM-Algorithm were established theoretically for the exponential family by Wu (1983).

For the convenience of the reader, we outline the estimation procedure of Bork (2009) and Bańbura and Modugno (2014) and refer to the original works for further details. If the factors were observable, the log-likelihood function of the model in Definition 1 for a data sample of length $T$ could be derived by iterative conditioning on the observations (e.g., in Bork 2009, p. 45 or Bańbura and Modugno 2014, p. 156). However, the factors $F_{t}$ are unobservable, and therefore the log-likelihood is integrated with respect to the conditional distribution of the factors given the data. This results in the expected log-likelihood conditioned on the observed panel data, which constitutes the expectation step of the EM-Algorithm. In contrast to the unconditional log-likelihood, the factors are here replaced by their conditional moments of first and second order, which can be obtained from a single run of the Kalman filter and smoother (Bork 2009, p. 43).
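The smoothed moments required by the expectation step can be illustrated in the simplest setting of one observed series and one factor. The sketch below is a textbook Kalman filter with a Rauch–Tung–Striebel smoother in plain Python, not the multivariate implementation of Bork (2009); the scalar model, the initialization at the stationary variance, and the name `kalman_smoother_scalar` are our own assumptions. The lag-one moment uses the standard identity $\mathrm{Cov}(F_{t+1}, F_t \mid X) = P_{t+1 \mid T} J_t$, where $P_{t+1 \mid T}$ is the smoothed variance and $J_t$ the smoother gain.

```python
def kalman_smoother_scalar(x, lam, a, r, q, m0=0.0, p0=None):
    """Kalman filter + Rauch-Tung-Striebel smoother for the scalar model
        x_t = lam * f_t + e_t,   e_t ~ N(0, r),
        f_t = a * f_{t-1} + u_t, u_t ~ N(0, q),
    with f_0 ~ N(m0, p0). Returns E[f_t | x], E[f_t^2 | x] (t = 1..T)
    and E[f_t f_{t-1} | x] (t = 2..T)."""
    T = len(x)
    if p0 is None:
        p0 = q / (1.0 - a * a)           # stationary variance, requires |a| < 1
    mp, pp = [0.0] * T, [0.0] * T        # predicted mean / variance
    mf, pf = [0.0] * T, [0.0] * T        # filtered mean / variance
    m_prev, p_prev = m0, p0
    for t in range(T):
        mp[t] = a * m_prev                               # prediction step
        pp[t] = a * a * p_prev + q
        k = pp[t] * lam / (lam * lam * pp[t] + r)       # Kalman gain
        mf[t] = mp[t] + k * (x[t] - lam * mp[t])         # update step
        pf[t] = (1.0 - k * lam) * pp[t]
        m_prev, p_prev = mf[t], pf[t]
    ms, ps = mf[:], pf[:]                # smoothed moments, initialised at t = T
    lag_cov = [0.0] * T                  # lag_cov[t] = Cov(f_t, f_{t-1} | x), 0-based t >= 1
    for t in range(T - 2, -1, -1):
        j = pf[t] * a / pp[t + 1]                        # smoother gain J_t
        ms[t] = mf[t] + j * (ms[t + 1] - mp[t + 1])
        ps[t] = pf[t] + j * j * (ps[t + 1] - pp[t + 1])
        lag_cov[t + 1] = ps[t + 1] * j                   # P_{t+1|T} J_t
    e_ff = [ps[t] + ms[t] ** 2 for t in range(T)]
    e_f_lag = [lag_cov[t] + ms[t] * ms[t - 1] for t in range(1, T)]
    return ms, e_ff, e_f_lag
```

With almost no observation noise (`r` close to 0), the smoothed means approach `x / lam`; with very large `r`, they shrink towards the prior mean of zero.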

In the maximization step of the EM-Algorithm, Bork (2009) and Bańbura and Modugno (2014) treat the conditional factor moments as constants when computing the partial derivatives of the conditional expectation of the log-likelihood with respect to the model parameters. Setting these derivatives to zero yields a system of linear equations whose solution maximizes the expected log-likelihood function. The resulting iterative parameter updates of the EM-Algorithm are summarized in Theorem 1.

Theorem 1.

(EM-Algorithm as in Bork (2009) and Bańbura and Modugno (2014)) Assume the dynamic factor model in Definition 1 and let the matrix $X = [X_{1}, \dots, X_{T}] \in \mathbb{R}^{m \times T}$ collect all panel data. Let the index $l \geq 0$ indicate the current loop of the EM-Algorithm and let $E [\cdot | X, θ_{(l)}]$ denote the expectation conditioned on the panel data and the parameters estimated in loop $l$. Then, the parameter updates in loop $(l + 1)$ are given as follows:

\begin{matrix} Λ_{(l + 1)} & = (\sum_{t = 1}^{T} X_{t} E [F_{t}^{'} | X, θ_{(l)}]) {(\sum_{t = 1}^{T} E [F_{t} F_{t}^{'} | X, θ_{(l)}])}^{- 1}, \\ A_{(l + 1)} & = (\sum_{t = 1}^{T} E [F_{t} F_{t - 1}^{'} | X, θ_{(l)}]) {(\sum_{t = 1}^{T} E [F_{t - 1} F_{t - 1}^{'} | X, θ_{(l)}])}^{- 1}, \\ R_{(l + 1)} & = \frac{1}{T} (\sum_{t = 1}^{T} X_{t} X_{t}^{'} - Λ_{(l + 1)} \sum_{t = 1}^{T} E [F_{t} | X, θ_{(l)}] X_{t}^{'}), \\ Q_{(l + 1)} & = \frac{1}{T} (\sum_{t = 1}^{T} E [F_{t} F_{t}^{'} | X, θ_{(l)}] - A_{(l + 1)} \sum_{t = 1}^{T} E [F_{t - 1} F_{t}^{'} | X, θ_{(l)}]), \end{matrix}

where $Z^{-1}$ denotes the inverse of a matrix $Z$, and the conditional factor moments are computed using the Kalman filter and smoother in each loop $l$.
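For a single factor ($q = 1$), the updates of Theorem 1 reduce to ratios of sums. The following plain-Python sketch takes the smoothed moments as given (for example, from a Kalman smoother). As a simplifying assumption of ours, the sums for $A$ and $Q$ run over $t = 2, \dots, T$ with divisor $T - 1$, which avoids specifying moments of an initial factor $F_0$; `m_step_single_factor` is a hypothetical name.

```python
def m_step_single_factor(X, e_f, e_ff, e_f_lag):
    """One M-step of Theorem 1 for q = 1.

    X       : list of T observation vectors (each of length m)
    e_f     : E[F_t | X],          t = 1..T
    e_ff    : E[F_t^2 | X],        t = 1..T
    e_f_lag : E[F_t F_{t-1} | X],  t = 2..T (length T - 1)
    Returns (Lambda as a length-m list, A, R as an m x m list, Q)."""
    T, m = len(X), len(X[0])
    lam = [sum(X[t][i] * e_f[t] for t in range(T)) / sum(e_ff) for i in range(m)]
    # Assumption: A and Q use t = 2..T, so no moments of F_0 are needed
    a = sum(e_f_lag) / sum(e_ff[:-1])
    R = [[(sum(X[t][i] * X[t][j] for t in range(T))
           - lam[i] * sum(e_f[t] * X[t][j] for t in range(T))) / T
          for j in range(m)] for i in range(m)]
    Q = (sum(e_ff[1:]) - a * sum(e_f_lag)) / (T - 1)
    return lam, a, R, Q
```

In a full EM loop, this step would alternate with the smoother until the change in log-likelihood falls below the chosen tolerance.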

The iterative estimation procedure in Theorem 1 requires a termination criterion. In our application, we terminate the EM-Algorithm as soon as the change in the log-likelihood is smaller than $10^{-8}$. Finally, note that the estimated factors are unique only up to rotation by orthogonal matrices. For forecasting purposes, this can be ignored: if the factors are rotated, the estimated coefficients of the factors in the forecasting equations are transformed correspondingly, with no effect on the forecasted variable. We illustrate this point later in our application.
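The rotation argument can be checked numerically. In the sketch below (plain Python; all numbers are hypothetical), rotating the factors by an orthogonal matrix $H$ and the loadings and forecasting coefficients correspondingly, $Λ \mapsto ΛH'$ and $β \mapsto Hβ$, leaves both the common component $ΛF_t$ and a linear forecast $β'F_t$ unchanged, because $H'H = I$.

```python
import math


def matvec(M, v):
    """Matrix (nested list) times vector (list)."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def rotation(theta):
    """A 2 x 2 orthogonal matrix H with H'H = I."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


# Hypothetical loadings, factor value and forecasting coefficients
Lam = [[1.0, 0.5], [0.2, -0.3], [0.7, 0.1]]
F = [0.4, -1.2]
beta = [0.8, -0.5]

H = rotation(0.7)
F_rot = matvec(H, F)                        # rotated factors H F_t
Lam_rot = [matvec(H, row) for row in Lam]   # rows of Lam H' are H times the rows of Lam
beta_rot = matvec(H, beta)                  # correspondingly rotated coefficients H beta

common, common_rot = matvec(Lam, F), matvec(Lam_rot, F_rot)
forecast = sum(b * f for b, f in zip(beta, F))
forecast_rot = sum(b * f for b, f in zip(beta_rot, F_rot))
```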

3. Vine Copulas

Since Aas et al. (2009) recognized the statistical applicability of vine copulas built from non-Gaussian bivariate copulas, vine copulas have become a standard tool for describing the dependence structure of multivariate data (see Aas 2016 for a recent review). Moreover, Brechmann and Czado (2015), Beare and Seo (2015), and Smith (2015) have applied vine copulas to model the temporal dependence of multivariate time series as well as the cross-sectional dependence between univariate time series. In this section and the next, we review the COPAR model of Brechmann and Czado (2015), which is used to describe the stochastic dynamics and the dependence structure of the estimated factors from Section 2. We start with the concept of regular vines from Kurowicka and Cooke (2006). We do not give illustrative examples of pair-copula constructions and instead refer to Aas (2016) or Czado (2010) for more intuition.

Definition 2.

(Regular vine) A collection of trees $V = (T_{1}, \dots, T_{d - 1})$ is a regular vine on $d$ elements if

1.: $T_{1}$ is a tree with nodes $N_{1} = \{1, \dots, d\}$ connected by a set of non-looping edges $E_{1}$ .
2.: For $i = 2, \dots, d - 1$, $T_{i}$ is a connected tree with edge set $E_{i}$ and node set $N_{i} = E_{i - 1}$, where $|N_{i}| = d - (i - 1)$ and $|E_{i}| = d - i$ are the numbers of nodes and edges, respectively.
3.: For $i = 2, \dots, d - 1$, $\forall e = \{a, b\} \in E_{i}$: $|a \cap b| = 1$ (two nodes $a, b \in N_{i}$ can be connected by an edge $e$ in $T_{i}$ only if the corresponding edges $a$ and $b$ in $T_{i - 1}$ share a common node; proximity condition).
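Definition 2 can be checked mechanically. The sketch below is our own plain-Python illustration, not code from the cited works: first-tree edges are frozensets of labels $1, \dots, d$, and each edge of $T_i$, $i \geq 2$, is a frozenset of two edges of $T_{i-1}$, so that the proximity condition becomes an intersection test. The function names are hypothetical.

```python
def is_tree(nodes, edges):
    """A graph is a tree iff it has |nodes| - 1 edges and is connected."""
    nodes = list(nodes)
    if len(edges) != len(nodes) - 1:
        return False
    adj = {v: set() for v in nodes}
    for e in edges:
        a, b = tuple(e)
        if a not in adj or b not in adj:
            return False
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {nodes[0]}, [nodes[0]]    # depth-first search for connectivity
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)


def is_regular_vine(d, trees):
    """trees[0]: edges of T_1 as frozensets of labels 1..d;
    trees[i]: edges of T_{i+1} as frozensets of two edges of T_i."""
    if len(trees) != d - 1:
        return False
    nodes = set(range(1, d + 1))
    for i, edges in enumerate(trees):
        if not is_tree(nodes, edges):
            return False
        if i > 0 and any(len(a & b) != 1 for a, b in map(tuple, edges)):
            return False                      # proximity condition violated
        nodes = set(edges)                    # edges of T_i become nodes of T_{i+1}
    return True
```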

A tree $T = (N, E)$ is a connected acyclic graph, where $N$ is its set of nodes and $E$ is its set of edges. Acyclic means that there exists no path that forms a cycle, and connected means that each node can be reached from every other node of the tree. An R-vine is simply a sequence of trees such that the edges of $T_{i}$ are the nodes of $T_{i + 1}$. Traditional examples of these structures are canonical vines (C-vines) and drawable vines (D-vines), shown in Figure 1 (see Czado (2010) and Aas et al. (2009)). In each tree $T_{i}$, $i \in \{1, \dots, d - 1\}$, of a C-vine, there is a root node with $d - i$ incident edges, whereas a D-vine is determined solely by its first tree, in which each node has at most two incident edges.
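For concreteness, the edges of the standard C- and D-vines on $d$ elements can be listed as (conditioned pair, conditioning set). In this plain-Python sketch (hypothetical function names), the C-vine has root $j$ in tree $T_j$, and the D-vine has the first tree $1 - 2 - \cdots - d$; each tree $T_j$ has $d - j$ edges.

```python
def c_vine_edges(d):
    """C-vine with root j in tree T_j: edges (j, k | 1, ..., j-1) for k > j."""
    return [[(j, k, tuple(range(1, j))) for k in range(j + 1, d + 1)]
            for j in range(1, d)]


def d_vine_edges(d):
    """D-vine with first tree 1-2-...-d: edges (k, k+j | k+1, ..., k+j-1) in tree T_j."""
    return [[(k, k + j, tuple(range(k + 1, k + j))) for k in range(1, d - j + 1)]
            for j in range(1, d)]
```

For $d = 5$, both functions return $4 + 3 + 2 + 1 = 10$ edges, one per pair-copula.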

Regular vines are a powerful tool to systematize all possible factorizations of a $d$-dimensional density into the product of its $d$ univariate marginal densities and $d (d - 1) / 2$ conditional and unconditional bivariate copula densities (see Theorem 4.2 in Kurowicka and Cooke (2006)). The unconditional and conditional bivariate copulas of a given factorization, called pair-copulas, can be uniquely mapped onto the edge set $E$ of a particular regular vine, and vice versa. In general, the conditional copulas depend on the values of the conditioning variables, both directly and through their arguments. This dependence is crucial: in general, it permits statistical applications only for a subclass of elliptical distributions (see Stöber et al. (2013)), for which the conditional pair-copulas depend on the conditioning values only through their arguments. Aas et al. (2009) first developed this observation further and went beyond the elliptical world: they considered regular vine factorizations with arbitrary fixed conditional copulas and showed that this results in valid multivariate distributions and copulas. In this paper, we also assume that conditional copulas depend on the conditioning values only through their arguments, so that they can be chosen from bivariate copula families.
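As a standard illustration, for $d = 3$ the D-vine with first tree $1 - 2 - 3$ gives the following factorization into three marginal densities and $d(d - 1)/2 = 3$ pair-copula densities. Under the assumption just stated, $c_{13 \mid 2}$ depends on $x_2$ only through its arguments, where $F_{1 \mid 2}(x_1 \mid x_2) = \partial C_{12}(u_1, u_2) / \partial u_2$ evaluated at $u_i = F_i(x_i)$, and analogously for $F_{3 \mid 2}$:

```latex
f(x_1, x_2, x_3) = f_1(x_1)\, f_2(x_2)\, f_3(x_3)\;
  c_{12}\{F_1(x_1), F_2(x_2)\}\;
  c_{23}\{F_2(x_2), F_3(x_3)\}\;
  c_{13 \mid 2}\{F_{1 \mid 2}(x_1 \mid x_2), F_{3 \mid 2}(x_3 \mid x_2)\}
```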

The number of possible vine structures on $d$ random variables can be immense. For $d \leq 4$, only C- and D-vines are possible. For $d > 4$, there are still $d!/2$ different C-vines and $d!/2$ different D-vines, but they no longer exhaust all regular vines. The total number of regular vine structures was computed by Morales-Nápoles et al. (2010) and equals

\frac{d!}{2} \cdot 2^{\binom{d - 2}{2}} .

To select conditional copulas on these graphical structures, we define the following sets as in Czado (2010).
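The vine counts quoted above grow quickly with $d$. The following plain-Python sketch (hypothetical function names) tabulates them with `math.factorial` and `math.comb`; for $d = 4$, the 12 C-vines and 12 D-vines make up all 24 regular vines, whereas for $d = 5$ they account for only 120 of 480.

```python
from math import comb, factorial


def n_c_vines(d):
    """Number of distinct C-vines on d >= 3 elements (there are as many D-vines)."""
    return factorial(d) // 2


def n_regular_vines(d):
    """Number of regular vines on d >= 2 elements: (d!/2) * 2^binom(d-2, 2)."""
    return factorial(d) // 2 * 2 ** comb(d - 2, 2)


for d in range(3, 8):
    print(d, n_c_vines(d), n_regular_vines(d))
```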

Definition 3.

(Conditioned and conditioning sets) For any edge $e = \{a, b\} \in E_{i}$ of a regular vine $V$, the complete union of $e$ is the subset

A_{e} = \{v \in N_{1} : \forall m = 1, \dots, i - 1 \ \exists\, e_{j_{m}} \in E_{m} \text{ s.t. } v \in e_{j_{1}} \in e_{j_{2}} \in \dots \in e_{j_{i - 1}} \in e\} .

The conditioning set associated with $e$ is

D_{e} = A_{a} \cap A_{b} .

The conditioned sets associated with $e$ are

i (e) = A_{a} \setminus D_{e}, \qquad j (e) = A_{b} \setminus D_{e} .

The copula for this edge will be denoted by

C_{e} ≔ C_{i (e), j (e) ∣ D_{e}} .
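Definition 3 translates into a short recursion. In this plain-Python sketch (our own, with hypothetical names), edges of $T_1$ are frozensets of integer labels and an edge of $T_i$ is a frozenset of two edges of $T_{i-1}$; the complete union collects all first-tree nodes reachable from an edge. For the D-vine on four elements, the edge of $T_3$ yields the conditioned sets $\{1\}$ and $\{4\}$ and the conditioning set $\{2, 3\}$, i.e., the copula $C_{1,4 \mid 2,3}$.

```python
def complete_union(edge):
    """Complete union A_e: all first-tree nodes (integers) contained in an edge,
    found by recursively descending through the nested edges of lower trees."""
    if isinstance(edge, int):
        return frozenset({edge})
    result = frozenset()
    for part in edge:
        result |= complete_union(part)
    return result


def edge_sets(edge):
    """Return (i(e), j(e), D_e) for an edge e = {a, b} as in Definition 3."""
    a, b = tuple(edge)
    A_a, A_b = complete_union(a), complete_union(b)
    D_e = A_a & A_b
    return A_a - D_e, A_b - D_e, D_e
```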

Given a regular vine, we specify a regular vine copula by assigning a (conditional) pair copula (with parameters) to each edge of the regular vine. In doing so, we follow Czado et al. (2012).

Definition 4.

(Regular vine copula) A regular vine copula $C = (V, B (V), θ (B (V)))$ in $d$ dimensions is a multivariate distribution function such that for a random vector $U = {(U_{1}, \dots, U_{d})}^{'} \sim C$ with uniform margins:

$V$ is a regular vine on d elements,
$B (V) = \{C_{e} ∣ e \in E_{m}, m = 1, \dots, d - 1\}$ is a set of $d (d - 1) / 2$ copula families identifying the unconditional distributions of $U_{i (e), j (e)}$ as well as the conditional distributions of $U_{i (e), j (e)} ∣ U_{D (e)},$
$θ (B (V)) = \{θ_{e} ∣ e \in E_{m}, m = 1, \dots, d - 1\}$ is the set of parameter vectors corresponding to the copulas in $B (V) .$

To facilitate statistical inference, a matrix representation of R-vines was proposed by Morales-Nápoles et al. (2010) and further developed by Dissmann et al. (2013). To specify a $d$-dimensional R-vine in matrix form, one needs several lower triangular $d \times d$ matrices: one that stores the structure of the R-vine, one with the copula families, and two more with the first and second copula parameters.

For a $d$-dimensional R-vine, the structure matrix has the following form:

M = (\begin{matrix} m_{1, 1} \\ ⋮ & ⋱ \\ m_{d - 1, 1} & \dots & m_{d - 1, d - 1} \\ m_{d, 1} & \dots & m_{d, d - 1} & m_{d, d} \end{matrix}),

where $m_{i, j} \in \{1, \dots, d\}$. The rules for reading this matrix are as follows. For an entry $m_{i, j}$ below the diagonal ($i > j$), the conditioned set consists of the entry itself and the diagonal entry $m_{j, j}$ of its column, whereas the conditioning set is composed of the entries below it; i.e., for $m_{i, j}$, the conditioned set is $(m_{i, j}, m_{j, j})$ and the conditioning set is $(m_{i + 1, j}, \dots, m_{d, j})$. Thus, $m_{i, j}$ denotes the node $(m_{j, j}, m_{i, j} ∣ m_{i + 1, j}, \dots, m_{d, j})$. We assume that the diagonal of $M$ is sorted in descending order, which can always be achieved by relabeling the nodes, so that $m_{i, i} = d - i + 1$. To illustrate the R-vine matrix notation, we consider the C-vine from Figure 1 and give its R-vine matrix representation below.

(\begin{matrix} 5 \\ 4 & 4 \\ 3 & 3 & 3 \\ 2 & 2 & 2 & 2 \\ 1 & 1 & 1 & 1 & 1 \end{matrix})

The first column encodes the following nodes:

Node $54 ∣ 321$ is stored via $m_{1, 1}, m_{2, 1}$ given $m_{3, 1}, m_{4, 1}$, and $m_{5, 1}$;
Node $53 ∣ 21$ is stored via $m_{1, 1}, m_{3, 1}$ given $m_{4, 1}$ and $m_{5, 1}$;
Node $52 ∣ 1$ is stored via $m_{1, 1}, m_{4, 1}$ given $m_{5, 1}$;
Node $51$ is stored via $m_{1, 1}, m_{5, 1}$.
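The reading rule is easy to automate. The following plain-Python sketch (hypothetical name `decode_rvine_matrix`; rows stored as lists of increasing length, with 0-based indices) lists every pair-copula encoded by a structure matrix as (diagonal entry, entry, conditioning entries). Applied to the C-vine matrix above, it returns the ten pair-copulas of the five-dimensional C-vine, starting with $54 \mid 321$.

```python
def decode_rvine_matrix(M):
    """List the pair-copulas encoded by a lower-triangular R-vine matrix M
    (row r holds the entries of columns 0..r). Entry m_{i,j}, i > j, encodes
    (m_{j,j}, m_{i,j} | m_{i+1,j}, ..., m_{d,j})."""
    d = len(M)
    pairs = []
    for j in range(d - 1):
        for i in range(j + 1, d):
            conditioning = tuple(M[r][j] for r in range(i + 1, d))
            pairs.append((M[j][j], M[i][j], conditioning))
    return pairs
```

Because the family and parameter matrices use the same positions, the same double loop can be used to look up the family and parameters of each decoded pair-copula.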

In the sequel, we use C-vines to capture the cross-sectional dependence of a multivariate time series at any time point $t$. To capture the dependence between multivariate observations, the first tree of the C-vine for the observation at time point $t$ is connected by one edge each to the first trees of the C-vines for the neighboring observations at time points $t - 1$ and $t + 1$, where these exist. This results in the first tree of an R-vine for all multivariate observations, treated as one huge sample point. With a suitable choice of the C-vines and of the connections between their first trees, one obtains the copula autoregressive model of Brechmann and Czado (2015) presented in the next section.

Finally, note that the information on copula families and their parameters is similarly stored in lower triangular $d \times d$ matrices. Each element of the R-vine matrix below the diagonal specifies, together with the diagonal entry of its column, a conditional or unconditional pair-copula. The family and parameters of this pair-copula are entered at the same position in the matrices of copula families and parameters. Since the diagonal entries of the R-vine matrix alone do not determine any pair-copula, the family and parameter matrices need no diagonal entries. To avoid confusion with blank entries, we fill their main diagonal with the symbol *.

4. Copula Autoregressive Model

R-vines have mostly been used to model contemporaneous dependence. We now present a special R-vine structure, the copula autoregressive (COPAR) model of Brechmann and Czado (2015), which is designed to capture cross-sectional, serial, and cross-serial dependence in multivariate time series and allows for a general Markovian structure. Let ${\{F_{t}, G_{t}\}}_{t = 1, \dots, T}$ be an observable bivariate stationary time series. To illustrate how the two individual time series are interdependent, consider the map of dependencies for $T = 4$ in Figure 2. Vertical solid lines represent the cross-sectional dependence, horizontal solid lines and curved lines represent the serial dependence within each time series, and dotted and dashed lines represent the cross-serial dependence.

Traditionally, R-vines were used to model only the cross-sectional dependence (depicted by the vertical solid lines in Figure 2), under the assumption that all other dependencies are absent (i.e., the data are iid). The COPAR model is designed to additionally capture serial and cross-serial dependence. The following definition of a COPAR model without Markovian structure for two time series, adapted to our notation, is taken from Brechmann and Czado (2015). The vectors $(F_{s}, \dots, F_{t})$ and $(G_{s}, \dots, G_{t})$ are denoted by $F_{s : t}$ and $G_{s : t}$, respectively.

Definition 5.

(COPAR model for the bivariate case) The COPAR model for stationary continuous time series ${\{F_{t}\}}_{t = 1, \dots, T}$ and ${\{G_{t}\}}_{t = 1, \dots, T}$ has the following components.

(i) The unconditional marginal distribution of each time series does not depend on time.

(ii) An R-vine for the serial and between-series dependence of ${\{F_{t}\}}_{t = 1, \dots, T}$ and ${\{G_{t}\}}_{t = 1, \dots, T}$, where the following pairs are selected.

1.: Serial dependence of ${\{F_{t}\}}_{t = 1, \dots, T}$: The pairs of a serial D-vine copula for $F_{1}, \dots, F_{T}$; that is,

$F_{s}, F_{t} ∣ F_{(s + 1) : (t - 1)}, \quad 1 \leq s < t \leq T .$ (2)
2.: Between-series dependence:

$F_{s}, G_{t} ∣ F_{(s + 1) : t}, \quad 1 \leq s \leq t \leq T,$ (3)

and

$G_{s}, F_{t} ∣ F_{1 : (t - 1)}, G_{(s + 1) : (t - 1)}, \quad 1 \leq s < t \leq T .$ (4)
3.: Conditional serial dependence of ${\{G_{t}\}}_{t = 1, \dots, T}$: The pairs of a serial D-vine copula for $G_{1}, \dots, G_{T}$ conditioned on all values of ${\{F_{t}\}}_{t = 1, \dots, T}$ up to time $t$; that is,

$G_{s}, G_{t} ∣ F_{1 : t}, G_{(s + 1) : (t - 1)}, \quad 1 \leq s < t \leq T .$ (5)

Pair copulas of the same lag length $t - s$, $t \geq s$, are identical. We associate

1.: copula $C_{t - s}^{F} ≔ C_{F_{s}, F_{t} ∣ F_{(s + 1) : (t - 1)}}$ with (2),
2.: copulas $C_{t - s}^{F G} ≔ C_{F_{s}, G_{t} ∣ F_{(s + 1) : t}}$ and $C_{t - s}^{G F} ≔ C_{G_{s}, F_{t} ∣ F_{1 : (t - 1)}, G_{(s + 1) : (t - 1)}}$ with (3) and (4), respectively, and
3.: copula $C_{t - s}^{G} ≔ C_{G_{s}, G_{t} ∣ F_{1 : t}, G_{(s + 1) : (t - 1)}}$ with (5).

Let us illustrate the above definition with the example of a bivariate time series ${\{F_{t}, G_{t}\}}_{t = 1, \dots, 4}$ with four observations. Using a D-vine, Equation (2) captures the dependence structure of $F_{1}$, $F_{2}$, $F_{3}$, and $F_{4}$, expressed by the top lines connecting them in Figure 2. For $s = 1$ and $t = 4$, Equation (3) describes the conditional dependence between $F_{1}$ and $G_{4}$ given $F_{2}$, $F_{3}$, and $F_{4}$. In general, Equation (3) captures the conditional dependence between $F_{s}$ and $G_{t}$ for $s < t$, reflected by the dashed lines in Figure 2. For $s = t$, Equation (3) models the unconditional dependence between $F_{t}$ and $G_{t}$, expressed by the vertical lines. Similarly, Equation (4) describes the conditional cross-serial dependence between $G_{s}$ and $F_{t}$ for $s < t$, illustrated by the dotted lines in Figure 2. However, Equations (3) and (4) are not symmetric with respect to their conditioning sets. In particular, the conditional dependence between $F_{s}$ and $G_{t}$ for $s < t$ does not involve $G_{1}, \dots, G_{t - 1}$. This already indicates that the first time series ${\{F_{t}\}}_{t = 1, \dots, T}$ plays a key role in the stochastic dynamics, since its values alone suffice to describe the conditional dependence between $F_{s}$ and $G_{t}$ for $s < t$. Therefore, the univariate time series are not interchangeable. Finally, using a D-vine, Equation (5) describes the dependence structure of $G_{1}, G_{2}, \dots, G_{T}$ (the connecting bottom lines in Figure 2) conditioned on $F_{1}, F_{2}, \dots, F_{T}$. Since copulas of the same lag length are identical, the COPAR model defines a stationary bivariate time series.
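Definition 5 can be made concrete by enumerating its pairs. The plain-Python sketch below (our own illustration; `copar_bivariate_pairs` is a hypothetical name) lists each pair with its type, lag, and conditioning set. For $T = 4$ it yields $2T(2T - 1)/2 = 28$ pairs, one for every pair of the eight variables, i.e., a complete R-vine; because copulas of the same type and lag are identical, only 13 distinct (type, lag) labels remain, in line with the count $q^2 k + q(q - 1)/2$ for $q = 2$ and $k = T - 1 = 3$ given in Lemma 1 below.

```python
def copar_bivariate_pairs(T):
    """Pair-copulas of Definition 5 for series F, G of length T. Each entry:
    (type, lag, (var1, var2), conditioning list), variables as ('F', t) or ('G', t)."""
    def F(t):
        return ("F", t)

    def G(t):
        return ("G", t)

    pairs = []
    for t in range(1, T + 1):
        for s in range(1, t + 1):
            lag = t - s
            # (3): F_s, G_t | F_{(s+1):t}, including the contemporaneous case s = t
            pairs.append(("FG", lag, (F(s), G(t)), [F(r) for r in range(s + 1, t + 1)]))
            if s == t:
                continue
            # (2): F_s, F_t | F_{(s+1):(t-1)}
            pairs.append(("F", lag, (F(s), F(t)), [F(r) for r in range(s + 1, t)]))
            # (4): G_s, F_t | F_{1:(t-1)}, G_{(s+1):(t-1)}
            pairs.append(("GF", lag, (G(s), F(t)),
                          [F(r) for r in range(1, t)] + [G(r) for r in range(s + 1, t)]))
            # (5): G_s, G_t | F_{1:t}, G_{(s+1):(t-1)}
            pairs.append(("G", lag, (G(s), G(t)),
                          [F(r) for r in range(1, t + 1)] + [G(r) for r in range(s + 1, t)]))
    return pairs
```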

We now extend the COPAR model to an arbitrary number of variables with Markov structure. We consider $q$ univariate time series observed at $T$ time points; that is, ${\{F_{1, t}\}}_{t = 1, \dots, T}, \dots, {\{F_{q, t}\}}_{t = 1, \dots, T}$. We denote the random vector of the $q$ variables observed at time $t = 1, \dots, T$ by $F_{t} = {(F_{1, t}, \dots, F_{q, t})}^{'}$, the time series of the individual variables $i = 1, \dots, q$ by $F^{(i)} = (F_{i, 1}, \dots, F_{i, T})$, and also introduce the vector $F_{l : q, s : t} = (F_{l, s}, \dots, F_{l, t}, \dots, F_{q, s}, \dots, F_{q, t})$. Finally, we define a COPAR$(k)$ model, that is, a COPAR model for a multivariate time series with a Markovian structure of order $k$. Although the unobservable factors in Section 2 are also denoted by $F_{t}$, this notation should not lead to confusion: our modeling approach applies the COPAR model to the estimated factors, which we treat as observed data. Therefore, for the reader's convenience, we denote the multivariate time series here by $F_{t}$ and present COPAR models in terms of $F_{t}$.

Definition 6.

(COPAR $(k)$ model for q time series of length T) The COPAR$(k)$ model for a $q$-dimensional stationary time series $F_{t} \in \mathbb{R}^{q \times 1}$, $t = 1, \dots, T$, has the following components.

(i) The unconditional marginal distribution of each time series does not depend on time.

(ii) An R-vine for the serial and between-series dependence of $F_{t} \in \mathbb{R}^{q \times 1}$, $t = 1, \dots, T$, where the following pairs are selected.

1.: Serial dependence of $F^{(1)}$ : The pairs of serial D-vine copula for $F_{1, 1}, \dots, F_{1, T}$ ; that is,

$F_{1, s}, F_{1, t} ∣ F_{1, (s + 1) : (t - 1)}, 1 \leq s < t \leq T .$
2.: Between-series dependence of $F^{(i)}$ and $F^{(j)}$ for $i < j$ , $i, j = 1, \dots, q$ :

$F_{i, s}, F_{j, t} ∣ F_{1 : (i - 1), 1 : t}, F_{i, (s + 1) : t}, \quad 1 \leq s \leq t \leq T,$

and

$F_{j, s}, F_{i, t} ∣ F_{1 : (i - 1), 1 : t}, F_{i : (j - 1), 1 : (t - 1)}, F_{j, (s + 1) : (t - 1)}, \quad 1 \leq s < t \leq T,$
3.: Conditional serial dependence of $F^{(i)}$ for $2 \leq i \leq q$:

$F_{i, s}, F_{i, t} ∣ F_{1 : (i - 1), 1 : t}, F_{i, (s + 1) : (t - 1)}, \quad 1 \leq s < t \leq T,$

where

(a): all copulas with lag length $t - s > k$ are independence copulas,
(b): for all $i, j = 1, \dots, q$, copulas of the same lag length $t - s$, $t \geq s$, are identical.

If condition (a) is omitted, Definition 6 describes the general COPAR model without Markovian structure. The dependence of $F^{(i)}$ is modeled conditionally on $F^{(1)}, \dots, F^{(i - 1)}$; consequently, the order of the variables cannot simply be interchanged. The number of distinct pair copulas in COPAR models for time series with Markovian structure does not depend on $T$ and is smaller than the number needed for a general R-vine. For this number, the following result from Brechmann and Czado (2015) holds.

Lemma 1.

The number of copulas needed for a COPAR$(k)$ model of $q$ univariate time series is $q^{2} k + \frac{q (q - 1)}{2}$.
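Lemma 1 can be reproduced by counting the distinct copula labels allowed by conditions (a) and (b) of Definition 6: for each series, one serial copula per lag $1, \dots, k$, and for each pair $i < j$, copulas for lags $0, \dots, k$ in one direction and lags $1, \dots, k$ in the other, which gives $qk + \frac{q(q-1)}{2}(2k + 1) = q^2 k + \frac{q(q-1)}{2}$. A plain-Python sketch with hypothetical names:

```python
def copar_k_labels(q, k):
    """Distinct pair-copulas of a COPAR(k) model for q series (Definition 6),
    as (i, j, lag) with 1-based series indices; i == j marks serial copulas."""
    labels = []
    for i in range(1, q + 1):
        # Serial (i = 1) or conditional serial (i >= 2) dependence, lags 1..k
        labels += [(i, i, lag) for lag in range(1, k + 1)]
        for j in range(i + 1, q + 1):
            labels += [(i, j, lag) for lag in range(0, k + 1)]   # F_{i,s}, F_{j,t}, s <= t
            labels += [(j, i, lag) for lag in range(1, k + 1)]   # F_{j,s}, F_{i,t}, s < t
    return labels


def lemma1_count(q, k):
    return q * q * k + q * (q - 1) // 2
```

For $q = 3$ and $k = 3$, this gives 30 distinct copulas, which matches the distinct entries of the copula matrix for the three-dimensional example with $T = 4$ shown at the end of this section.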

Note that the number of parameters of a VAR$(k)$ model for the between-series dependence of $q$ time series (not counting the parameters of the marginal distributions) equals the number of pair-copulas in a COPAR$(k)$ model. Therefore, the number of parameters in a COPAR$(k)$ model is bounded by a multiple of the number of VAR parameters, where the multiple depends on the number of parameters of the involved copula families. In contrast, a general R-vine requires $\frac{q T (q T - 1)}{2}$ pair-copulas, resulting in a huge number of parameters.
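The counts in Lemma 1 and in the comparison with a general R-vine are easy to tabulate. The short Python sketch below (all function names are hypothetical) evaluates them; for the VAR side it counts $q^{2}k$ autoregressive coefficients plus $q(q-1)/2$ off-diagonal error covariances, which is our reading of "parameters for the between-series dependence". With the $q = 5$ factors and the initial window of $T = 240$ months (January 1986 to December 2005) used in Section 5, a COPAR(1) model needs 35 pair-copulas, whereas an unrestricted R-vine on all $qT$ variables would need 719,400.

```python
def copar_pair_copulas(q: int, k: int) -> int:
    """Distinct pair-copulas in a COPAR(k) model of q univariate series (Lemma 1)."""
    return q * q * k + q * (q - 1) // 2


def var_dependence_params(q: int, k: int) -> int:
    """VAR(k) parameters for between-series dependence: q*q coefficients per lag
    plus q(q-1)/2 off-diagonal error covariances (intercepts and variances excluded)."""
    autoregressive = q * q * k
    error_covariances = q * (q - 1) // 2
    return autoregressive + error_covariances


def rvine_pair_copulas(q: int, T: int) -> int:
    """Pair-copulas of an unrestricted R-vine on all q*T variables."""
    n = q * T
    return n * (n - 1) // 2


print(copar_pair_copulas(5, 1))       # 35
print(var_dependence_params(5, 1))    # 35
print(rvine_pair_copulas(5, 240))     # 719400
```

The two first functions agree for every $q$ and $k$, which is exactly the equality stated above; only the R-vine count grows with $T$.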

Figure 3 illustrates the first 4 trees of a COPAR model for three univariate time series $F_{t}$, $G_{t}$, and $H_{t}$ with $T = 4$ (i.e., for four observations). Examining the first tree, we observe that it is a sequence of connected C-vines, where the central nodes are the time points of the first factor $F_{t}$. Thus, unconditional contemporaneous dependence is modeled with a C-vine.

The matrix representation of the R-vine structure of this COPAR model is given by

$$
\begin{pmatrix}
H_{4} \\
H_{1} & G_{4} \\
H_{2} & H_{1} & F_{4} \\
H_{3} & H_{2} & H_{1} & H_{3} \\
G_{1} & H_{3} & H_{2} & H_{1} & G_{3} \\
G_{2} & G_{1} & H_{3} & H_{2} & H_{1} & F_{3} \\
G_{3} & G_{2} & G_{1} & G_{1} & H_{2} & H_{1} & H_{2} \\
G_{4} & G_{3} & G_{2} & G_{2} & G_{1} & H_{2} & H_{1} & G_{2} \\
F_{1} & F_{1} & G_{3} & G_{3} & G_{2} & G_{1} & G_{1} & H_{1} & F_{2} \\
F_{2} & F_{2} & F_{1} & F_{1} & F_{1} & G_{2} & G_{2} & G_{1} & H_{1} & H_{1} \\
F_{3} & F_{3} & F_{2} & F_{2} & F_{2} & F_{1} & F_{1} & F_{1} & G_{1} & G_{1} & G_{1} \\
F_{4} & F_{4} & F_{3} & F_{3} & F_{3} & F_{2} & F_{2} & F_{2} & F_{1} & F_{1} & F_{1} & F_{1}
\end{pmatrix}
$$

and the matrix of copulas for the three-dimensional COPAR model is given by

$$
\begin{pmatrix}
* \\
C_{3}^{H} & * \\
C_{2}^{H} & C_{3}^{HG} & * \\
C_{1}^{H} & C_{2}^{HG} & C_{3}^{HF} & * \\
C_{3}^{GH} & C_{1}^{HG} & C_{2}^{HF} & C_{2}^{H} & * \\
C_{2}^{GH} & C_{3}^{G} & C_{1}^{HF} & C_{1}^{H} & C_{2}^{GH} & * \\
C_{1}^{GH} & C_{2}^{G} & C_{3}^{GF} & C_{2}^{GH} & C_{1}^{GH} & C_{2}^{HF} & * \\
C^{GH} & C_{1}^{G} & C_{2}^{GF} & C_{1}^{GH} & C_{2}^{G} & C_{1}^{HF} & C_{1}^{H} & * \\
C_{3}^{FH} & C_{3}^{FG} & C_{1}^{GF} & C^{GH} & C_{1}^{G} & C_{2}^{GF} & C_{1}^{GH} & C_{1}^{GH} & * \\
C_{2}^{FH} & C_{2}^{FG} & C_{3}^{F} & C_{2}^{FH} & C_{2}^{FG} & C_{1}^{GF} & C^{GH} & C_{1}^{G} & C_{1}^{HF} & * \\
C_{1}^{FH} & C_{1}^{FG} & C_{2}^{F} & C_{1}^{FH} & C_{1}^{FG} & C_{2}^{F} & C_{1}^{FH} & C_{1}^{FG} & C_{1}^{GF} & C^{GH} & * \\
C^{FH} & C^{FG} & C_{1}^{F} & C^{FH} & C^{FG} & C_{1}^{F} & C^{FH} & C^{FG} & C_{1}^{F} & C^{FH} & C^{FG} & *
\end{pmatrix}.
$$

If a COPAR$(k)$ model is estimated based on the first $T$ multivariate observations, then it can easily be extended to $T + 1$ observations. This allows us to sample the $(T + 1)$-th observation according to the estimated dependence structure between multivariate observations that are at most $k$ time points apart. Let us illustrate this point for a COPAR$(1)$ model and the above example with three univariate time series. The matrix representation of the R-vine structure of this COPAR$(1)$ model remains unchanged, and the matrix of copulas for the three-dimensional COPAR$(1)$ model simplifies to the matrix in (6), where 0 stands for the independence copula:

$$
\begin{pmatrix}
* \\
0 & * \\
0 & 0 & * \\
C_{1}^{H} & 0 & 0 & * \\
0 & C_{1}^{HG} & 0 & 0 & * \\
0 & 0 & C_{1}^{HF} & C_{1}^{H} & 0 & * \\
C_{1}^{GH} & 0 & 0 & 0 & C_{1}^{GH} & 0 & * \\
C^{GH} & C_{1}^{G} & 0 & C_{1}^{GH} & 0 & C_{1}^{HF} & C_{1}^{H} & * \\
0 & 0 & C_{1}^{GF} & C^{GH} & C_{1}^{G} & 0 & C_{1}^{GH} & C_{1}^{GH} & * \\
0 & 0 & 0 & 0 & 0 & C_{1}^{GF} & C^{GH} & C_{1}^{G} & C_{1}^{HF} & * \\
C_{1}^{FH} & C_{1}^{FG} & 0 & C_{1}^{FH} & C_{1}^{FG} & 0 & C_{1}^{FH} & C_{1}^{FG} & C_{1}^{GF} & C^{GH} & * \\
C^{FH} & C^{FG} & C_{1}^{F} & C^{FH} & C^{FG} & C_{1}^{F} & C^{FH} & C^{FG} & C_{1}^{F} & C^{FH} & C^{FG} & *
\end{pmatrix}. \tag{6}
$$

Now, if we want to expand this COPAR$(1)$ model by adding a new time point (i.e., $T \to T + 1$; in our case, $4 \to 5$), the matrix representation changes as follows:

1. Add three blank columns to the left of the matrix and add $(H_{T+1}, G_{T+1}, F_{T+1})$ to the diagonal;
2. Under $F_{T+1}$ comes $(H_{1}, \dots, H_{T})'$, then $(G_{1}, \dots, G_{T})'$ and $(F_{1}, \dots, F_{T})'$;
3. Under $G_{T+1}$ comes $(H_{1}, \dots, H_{T})'$, then $(G_{1}, \dots, G_{T})'$ and $(F_{1}, \dots, F_{T+1})'$;
4. Under $H_{T+1}$ comes $(H_{1}, \dots, H_{T})'$, then $(G_{1}, \dots, G_{T+1})'$ and $(F_{1}, \dots, F_{T+1})'$.

The expanded matrix representation is as follows, where the first three columns are the new ones:

$$
\begin{pmatrix}
H_{5} \\
H_{1} & G_{5} \\
H_{2} & H_{1} & F_{5} \\
H_{3} & H_{2} & H_{1} & H_{4} \\
H_{4} & H_{3} & H_{2} & H_{1} & G_{4} \\
G_{1} & H_{4} & H_{3} & H_{2} & H_{1} & F_{4} \\
G_{2} & G_{1} & H_{4} & H_{3} & H_{2} & H_{1} & H_{3} \\
G_{3} & G_{2} & G_{1} & G_{1} & H_{3} & H_{2} & H_{1} & G_{3} \\
G_{4} & G_{3} & G_{2} & G_{2} & G_{1} & H_{3} & H_{2} & H_{1} & F_{3} \\
G_{5} & G_{4} & G_{3} & G_{3} & G_{2} & G_{1} & G_{1} & H_{2} & H_{1} & H_{2} \\
F_{1} & F_{1} & G_{4} & G_{4} & G_{3} & G_{2} & G_{2} & G_{1} & H_{2} & H_{1} & G_{2} \\
F_{2} & F_{2} & F_{1} & F_{1} & F_{1} & G_{3} & G_{3} & G_{2} & G_{1} & G_{1} & H_{1} & F_{2} \\
F_{3} & F_{3} & F_{2} & F_{2} & F_{2} & F_{1} & F_{1} & F_{1} & G_{2} & G_{2} & G_{1} & H_{1} & H_{1} \\
F_{4} & F_{4} & F_{3} & F_{3} & F_{3} & F_{2} & F_{2} & F_{2} & F_{1} & F_{1} & F_{1} & G_{1} & G_{1} & G_{1} \\
F_{5} & F_{5} & F_{4} & F_{4} & F_{4} & F_{3} & F_{3} & F_{3} & F_{2} & F_{2} & F_{2} & F_{1} & F_{1} & F_{1} & F_{1}
\end{pmatrix}.
$$
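The expansion rule lends itself to a mechanical check. The Python sketch below (helper names are hypothetical) starts from an empty matrix and applies the rule once per time point; for the three-series example it reproduces both the $T = 4$ matrix given earlier and the expanded $T = 5$ matrix above. It only builds the variable labels, not the copulas.

```python
def labels(name, upto):
    """Labels name1, ..., name<upto>; empty when upto == 0."""
    return [f"{name}{i}" for i in range(1, upto + 1)]


def copar_rvine_columns(T, names=("H", "G", "F")):
    """Columns (diagonal entry first) of the R-vine matrix of the three-series COPAR
    example, built by applying the T -> T+1 expansion rule T times to an empty matrix."""
    h_lab, g_lab, f_lab = names
    cols = []
    for t in range(1, T + 1):                       # t plays the role of the new T + 1
        col_f = [f"{f_lab}{t}"] + labels(h_lab, t - 1) + labels(g_lab, t - 1) + labels(f_lab, t - 1)
        col_g = [f"{g_lab}{t}"] + labels(h_lab, t - 1) + labels(g_lab, t - 1) + labels(f_lab, t)
        col_h = [f"{h_lab}{t}"] + labels(h_lab, t - 1) + labels(g_lab, t) + labels(f_lab, t)
        cols = [col_h, col_g, col_f] + cols         # the three new columns go on the left
    return cols


def to_rows(cols):
    """Rows of the lower-triangular matrix: row r holds cols[c][r - c] for c <= r."""
    return [[cols[c][r - c] for c in range(r + 1)] for r in range(len(cols))]


for row in to_rows(copar_rvine_columns(4)):
    print(" ".join(f"{x:>3}" for x in row))
```

Storing the matrix column by column keeps the rule literal: each step prepends three lists, and `to_rows` only converts to the row layout used in the text.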

We also expand the matrix of copulas by adding three columns to the left of (6), as follows:

$$
\begin{pmatrix}
* \\
0 & * \\
0 & 0 & * \\
0 & 0 & 0 & * \\
C_{1}^{H} & 0 & 0 & 0 & * \\
0 & C_{1}^{HG} & 0 & 0 & 0 & * \\
0 & 0 & C_{1}^{HF} & C_{1}^{H} & 0 & 0 & * \\
0 & 0 & 0 & 0 & C_{1}^{HG} & 0 & 0 & * \\
C_{1}^{GH} & 0 & 0 & 0 & 0 & C_{1}^{HF} & C_{1}^{H} & 0 & * \\
C^{GH} & C_{1}^{G} & 0 & C_{1}^{GH} & 0 & 0 & 0 & C_{1}^{GH} & 0 & * \\
0 & 0 & C_{1}^{GF} & C^{GH} & C_{1}^{G} & 0 & C_{1}^{GH} & 0 & C_{1}^{HF} & C_{1}^{H} & * \\
0 & 0 & 0 & 0 & 0 & C_{1}^{GF} & C^{GH} & C_{1}^{G} & 0 & C_{1}^{GH} & C_{1}^{GH} & * \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & C_{1}^{GF} & C^{GH} & C_{1}^{G} & C_{1}^{HF} & * \\
C_{1}^{FH} & C_{1}^{FG} & 0 & C_{1}^{FH} & C_{1}^{FG} & 0 & C_{1}^{FH} & C_{1}^{FG} & 0 & C_{1}^{FH} & C_{1}^{FG} & C_{1}^{GF} & C^{GH} & * \\
C^{FH} & C^{FG} & C_{1}^{F} & C^{FH} & C^{FG} & C_{1}^{F} & C^{FH} & C^{FG} & C_{1}^{F} & C^{FH} & C^{FG} & C_{1}^{F} & C^{FH} & C^{FG} & *
\end{pmatrix}.
$$

Thus, the three new columns simply replicate the previous first three columns, with each uninterrupted run of zeros extended by one additional zero.
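The rule for the copula matrix can be written as a small column transformation. The Python sketch below uses hypothetical string labels such as "C1H" for $C_{1}^{H}$ and "CGH" for $C^{GH}$, and lengthens every maximal run of independence copulas ("0") by one entry; applied to the first column of (6), it yields the first column of the expanded copula matrix.

```python
def extend_zero_runs(column, zero="0"):
    """Copy of a copula-matrix column (top to bottom) in which every maximal run
    of independence copulas is lengthened by exactly one entry."""
    out = []
    for i, entry in enumerate(column):
        out.append(entry)
        run_ends_here = entry == zero and (i + 1 == len(column) or column[i + 1] != zero)
        if run_ends_here:
            out.append(zero)       # one extra zero per run, placed at its end
    return out


first_column_of_6 = ["*", "0", "0", "C1H", "0", "0", "C1GH", "CGH", "0", "0", "C1FH", "CFH"]
print(extend_zero_runs(first_column_of_6))
```

Entries that are not zero, including adjacent non-zero pairs such as "C1GH" followed by "CGH", are copied unchanged.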

5. Empirical Application

The idea of modeling asset returns with factors extracted from observed data plus idiosyncratic components is quite popular in modern finance theory. The most prominent example is the capital asset pricing model (CAPM) of Sharpe (1964), Lintner (1965), Mossin (1966), and Treynor (2012), which is a one-factor model with the market return as the only common driver of asset prices. Another well-known approach is the arbitrage pricing theory (APT) of Ross (1976). In this case, a multi-factor model describes the return of an asset as the sum of an asset-specific return, an exposure to systematic risk factors, and an error term. A third example is Stock and Watson (2002a), who extracted factors from a large number of predictors to forecast the log-returns of the Federal Reserve Board’s Index of Industrial Production. Ando and Bai (2014) provide further empirical evidence that stock returns are related to macro- and microeconomic factors. In this section, latent factors are extracted from U.S. macroeconomic data and then used for portfolio optimization. We consider 35 assets from the S&P 500 that are classified as “Industrials” according to the Global Industry Classification Standard (GICS). We assume that the estimated factors have the greatest predictive power for these assets.

The U.S. panel data include economic indicators such as government bond yields along the curve, a currency index, main commodity prices, indicators of money stock, inflation, personal consumption, and industrial production gauges. Altogether, we have 22 time series, listed in Appendix B. Each series contains monthly data from 31 January 1986 to 30 November 2016, i.e., 371 data points. Next, we split the panel data into financial and macroeconomic groups according to Table A2 and Table A3 in Appendix B. Each time series is then transformed to eliminate trends and achieve stationarity; Table A2 and Table A3 also list the applied transformations. Further, we consider three factor models: one for each of the two groups of the panel data and one joint model for all monthly indicators. Thereby, we aim to illustrate that modeling nonlinear dependence between the estimated factors of different groups of panel data with COPAR may lead to a better asset allocation.

We consider the following three DFMs for $i = \mathrm{fin}, \mathrm{macro}, \mathrm{all}$:

$$X_{t}^{(i)} = \Lambda^{(i)} F_{t}^{(i)} + \varepsilon_{t}^{(i)}, \tag{7}$$

$$F_{t}^{(i)} = A^{(i)} F_{t-1}^{(i)} + u_{t}^{(i)}, \tag{8}$$

where $X_{t}^{(i)} \in \mathbb{R}^{m_{i} \times 1}$ collects the observed panel data and $F_{t}^{(i)} \in \mathbb{R}^{q_{i} \times 1}$ is the vector of factors. The dimension $m_{i}$ of the panel data for $i = \mathrm{fin}$, $\mathrm{macro}$, and $\mathrm{all}$ is equal to 9, 13, and 22, respectively.

In the first step, we estimate three DFMs: one for each group (financial and macroeconomic) and one for the full panel data. The initial time window runs from January 1986 to December 2005 and is successively expanded by one month until November 2016 is reached. For each of the three DFMs, we apply the EM algorithm from Section 2 to the panel data of each expanding time window and obtain a sequence of estimated factors for every month of the considered time period. For $i = \mathrm{fin}$, $\mathrm{macro}$, and $\mathrm{all}$, the EM algorithm requires the factor dimension $q_{i}$, which is not known. In what follows, we select the factor dimension.

The dimension of factors is selected using principal components analysis (see Jolliffe 2002). We choose the number of principal components (PCs) such that we capture more than 95% of the empirical data variance, based on the initial time frame from January 1986 to December 2005. Figure 4 illustrates the fraction of variance captured by eight or fewer PCs for the considered time window. Thus, two factors are sufficient for the financial group to capture 95.4% of the corresponding data variability. For macroeconomic indicators, we need three factors (96.5% of variability), while four factors are enough for the whole panel data (96.2% of variability).
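A minimal sketch of the 95% rule, assuming PCA on the sample covariance matrix of the transformed indicators; the text does not say whether the series are standardized first (with standardized series one would use the correlation matrix instead). `n_components_for_variance` is a hypothetical helper name.

```python
import numpy as np


def n_components_for_variance(X, threshold=0.95):
    """Smallest number of principal components whose cumulative share of total
    variance exceeds `threshold`, together with the vector of cumulative shares."""
    X = np.asarray(X, dtype=float)
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]   # largest first
    share = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(share, threshold, side="right")) + 1   # first share > threshold
    return k, share


# Simulated panel: 9 series driven by 2 factors plus small noise.
rng = np.random.default_rng(0)
loadings = np.array([[1, 1, 1, 1, 1, 0, 0, 0, 0],
                     [0, 0, 0, 0, 0, 1, 1, 1, 1]], dtype=float)
X = rng.normal(size=(240, 2)) @ loadings + 0.05 * rng.normal(size=(240, 9))
k, share = n_components_for_variance(X)
print(k, share[:3].round(3))
```

Using `side="right"` implements "more than 95%": a cumulative share of exactly 0.95 does not yet stop the search.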

In general, the latent factors of DFMs may follow a VAR model of order $p$. Prompted by a referee’s comment, we also consider the autoregressive order $p = 2$ for the joint DFM with all monthly indicators and compare its forecasting performance for $p = 1$ and $p = 2$. Our decision criterion is based on the root-mean-square error (RMSE) of point predictions, defined for univariate observations $x_{1}, \dots, x_{T^{*}}$ as

$$RMSE = \sqrt{\frac{\sum_{t=1}^{T^{*}} (x_{t} - \hat{x}_{t})^{2}}{T^{*}}},$$

where $\hat{x}_{1}, \dots, \hat{x}_{T^{*}}$ are the predicted values. We compute the RMSE for each time series and average these values across series. The first prediction is made for January 2006 and the last for November 2016, resulting in 131 point forecasts. The averaged RMSE values for $p = 1$ and $p = 2$ are $0.0302$ and $0.0306$, respectively. Therefore, our initial choice of $p = 1$ is justified.
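The averaged RMSE criterion takes only a few lines. In this NumPy sketch (the function name is hypothetical), rows index the $T^{*}$ forecast dates and columns index the series; the order with the smaller value, here $p = 1$ with 0.0302 versus 0.0306, is preferred.

```python
import numpy as np


def average_rmse(actual, predicted):
    """Per-series RMSE over the forecast dates (rows), averaged across series (columns)."""
    err = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    rmse_per_series = np.sqrt(np.mean(err ** 2, axis=0))
    return float(rmse_per_series.mean())


# Two series, two forecast dates: per-series RMSEs are 1 and 2, so the average is 1.5.
print(average_rmse([[0.0, 0.0], [0.0, 0.0]], [[1.0, 2.0], [1.0, 2.0]]))
```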

Figure 5 illustrates the correlation coefficients of the financial and macro factors filtered from data up to October 2010. The estimated factors show some moderate linear dependence within groups and a weak linear dependence between groups. Note that dynamic factor models assume factors to be multivariate normally distributed. Nevertheless, the estimated factors could exhibit non-Gaussian dependence, which we aim to capture using the COPAR model.

To improve the joint forecasting of asset returns at month $t$ for the subsequent month, we link the two group-specific factor models; namely, we capture the dependence structure of their estimated factors with a COPAR model. To this end, we stack the estimated factors of the two groups into a single five-dimensional vector for each data time window, with the financial factors first and the macroeconomic factors second. Using fitted marginal normal distribution functions, the filtered factors are then transformed into copula data.
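A minimal sketch of this probability integral transform, assuming each factor's normal margin is fitted by its sample mean and sample standard deviation (the text does not specify the estimator); `to_copula_data` is a hypothetical name. The financial factors are assumed to be stacked before the macroeconomic ones, e.g., `np.hstack([f_fin, f_macro])`.

```python
import numpy as np
from statistics import NormalDist


def to_copula_data(factors):
    """Map each factor column to (0, 1) through its fitted normal CDF."""
    factors = np.asarray(factors, dtype=float)
    u = np.empty_like(factors)
    for k in range(factors.shape[1]):
        col = factors[:, k]
        margin = NormalDist(mu=col.mean(), sigma=col.std(ddof=1))   # fitted Gaussian margin
        u[:, k] = [margin.cdf(x) for x in col]
    return u


rng = np.random.default_rng(0)
f_fin, f_macro = rng.normal(size=(240, 2)), rng.normal(size=(240, 3))
u = to_copula_data(np.hstack([f_fin, f_macro]))   # five-dimensional copula data
print(u.shape)
```

Because each normal CDF is strictly increasing, the transform preserves the ranks of every factor series.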

We consider a COPAR(1) model, since the transition equation of the DFMs is assumed to be a VAR(1). Brechmann and Czado (2015) discuss a sophisticated copula family selection procedure for two time series. Here, we follow a simpler approach consisting of three steps. In the first step, the copulas for the contemporaneous cross-sectional dependence of the filtered factors are selected while neglecting serial dependence. Note that the cross-sectional dependence is described by a C-vine with a fixed order of variables, so only the copula families of the pair-copulas have to be selected. Thus, we treat the estimated five-dimensional factors as iid data and select the copula family of each pair-copula sequentially using the Akaike information criterion (AIC): the families of the pair-copulas in the first tree of the C-vine are selected first, then those in the second tree, and so on. For more details on model selection for C-vines, we refer to Czado et al. (2012). Equation (9) presents, in R-vine matrix notation, the copula families selected for cross-sectional dependence based on data until December 2005 (for the family numbers in (9), see Appendix C):

$$
\text{Families} = \begin{pmatrix}
* \\
2 & * \\
5 & 2 & * \\
16 & 3 & 2 & * \\
16 & 2 & 36 & 2 & *
\end{pmatrix}. \tag{9}
$$

It is remarkable that none of the copula families selected in (9) is Gaussian. Moreover, across all expanding time windows, only 0.5% of the families selected for cross-sectional dependence are Gaussian copulas.
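To illustrate AIC-based family selection for a single pair-copula, the following self-contained Python sketch compares only three candidates (independence, Gaussian, and Clayton, numbers 0, 1, and 3 in Table A4) and fits each one-parameter family by a coarse grid search. It is not the procedure used in the paper, which draws on the wider family set of the R package VineCopula; function names and grids are our own assumptions.

```python
import numpy as np
from statistics import NormalDist

_ND = NormalDist()


def gaussian_copula_ll(rho, x, y):
    """Gaussian pair-copula log-likelihood; x, y are normal scores Phi^{-1}(u), Phi^{-1}(v)."""
    r2 = 1.0 - rho ** 2
    return float(np.sum(-0.5 * np.log(r2) - (rho ** 2 * (x ** 2 + y ** 2) - 2 * rho * x * y) / (2 * r2)))


def clayton_copula_ll(theta, u, v):
    """Clayton pair-copula log-likelihood for theta > 0."""
    s = u ** -theta + v ** -theta - 1.0
    return float(np.sum(np.log1p(theta) - (1 + theta) * np.log(u * v) - (2 + 1 / theta) * np.log(s)))


def select_pair_copula(u, v):
    """Family with the smallest AIC = 2 * (#parameters) - 2 * (maximized log-likelihood)."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    x = np.array([_ND.inv_cdf(a) for a in u])
    y = np.array([_ND.inv_cdf(b) for b in v])
    aic = {"independence": 0.0}                      # log-likelihood 0, no parameters
    ll_gauss = max(gaussian_copula_ll(r, x, y) for r in np.linspace(-0.98, 0.98, 99))
    ll_clayton = max(clayton_copula_ll(t, u, v) for t in np.linspace(0.05, 15.0, 300))
    aic["gaussian"] = 2 - 2 * ll_gauss
    aic["clayton"] = 2 - 2 * ll_clayton
    return min(aic, key=aic.get), aic


# Clayton(theta = 3) sample via conditional inversion of C(v | u).
rng = np.random.default_rng(42)
theta = 3.0
u = rng.uniform(0.001, 0.999, 1000)
w = rng.uniform(0.001, 0.999, 1000)
v = (u ** -theta * (w ** (-theta / (1 + theta)) - 1) + 1) ** (-1 / theta)
print(select_pair_copula(u, v)[0])
```

In the sequential C-vine procedure described above, such a selection would be repeated tree by tree on the (conditional) copula data of each pair.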

In the second step, we use Gaussian copulas as conditional pair-copulas to model the temporal dependence of the filtered factors. Note that this simple approach does not make the joint distribution of the time-shifted factors Gaussian: the non-Gaussian pair-copulas for cross-sectional dependence destroy Gaussianity, and only the joint distribution of the first component of $F_{t}^{(\mathrm{fin})}$, i.e., of $F_{1,t_{1}}^{(\mathrm{fin})}, F_{1,t_{2}}^{(\mathrm{fin})}, \dots, F_{1,t_{k}}^{(\mathrm{fin})}$, remains Gaussian, because its serial D-vine consists only of Gaussian pair-copulas. In the third step, the R-vine matrix of the COPAR(1) model for the estimated factors $(F_{t}^{(\mathrm{fin})'}, F_{t}^{(\mathrm{macro})'})'$ is constructed, and the model is estimated by maximum likelihood. Copula selection and parameter estimation of the COPAR model are carried out for every expanding window of data, that is, as soon as the DFMs for both groups are estimated and the latent factors are filtered out. Finally, note that we also test for Granger causality between the components of the multivariate time series $(F_{t}^{(\mathrm{fin})'}, F_{t}^{(\mathrm{macro})'})'$ over the whole time period and confirm its presence. For this, we regress each univariate component of $(F_{t}^{(\mathrm{fin})'}, F_{t}^{(\mathrm{macro})'})'$ on the lagged vector $(F_{t-1}^{(\mathrm{fin})'}, F_{t-1}^{(\mathrm{macro})'})'$. Table A11 in Appendix F summarizes all five linear regressions. For example, the first component of $\{F_{t}^{(\mathrm{fin})}\}_{t=1,\dots,T}$ Granger-causes almost all other univariate time series at least at the 10% significance level, with the exception of the second component of $\{F_{t}^{(\mathrm{macro})}\}_{t=1,\dots,T}$.
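With a single lag, testing whether component $k$ Granger-causes component $i$ reduces to testing one coefficient in the regression of component $i$ on all lagged components. A NumPy sketch of these regressions follows; it uses ordinary least squares and normal-approximation p-values, which may differ from the exact tests behind Table A11, and the function name is hypothetical.

```python
import numpy as np
from statistics import NormalDist


def lagged_regressions(F):
    """Regress every column of F (T x d) on a constant and all lagged columns F[t-1].
    Returns coefficients B (d x (d+1)) and two-sided p-values P (normal approximation);
    P[i, 1 + k] tests whether lagged component k helps predict component i."""
    F = np.asarray(F, dtype=float)
    Y = F[1:]
    X = np.column_stack([np.ones(len(Y)), F[:-1]])
    XtX_inv = np.linalg.inv(X.T @ X)
    B = XtX_inv @ X.T @ Y                                   # (d+1) x d, one column per equation
    resid = Y - X @ B
    sigma2 = (resid ** 2).sum(axis=0) / (len(Y) - X.shape[1])
    se = np.sqrt(np.outer(np.diag(XtX_inv), sigma2))        # standard errors, same shape as B
    z = np.abs(B / se)
    P = 2.0 * (1.0 - np.vectorize(NormalDist().cdf)(z))
    return B.T, P.T


# Simulated example: component 0 drives component 1, component 2 is pure noise.
rng = np.random.default_rng(3)
F = np.zeros((400, 3))
for t in range(1, 400):
    F[t, 0] = 0.5 * F[t - 1, 0] + rng.normal()
    F[t, 1] = 0.8 * F[t - 1, 0] + rng.normal()
    F[t, 2] = rng.normal()
B, P = lagged_regressions(F)
print(P.round(3))
```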

Next, we present a conditional forecasting method based on COPAR, which follows Brechmann and Czado (2015) with a small modification. For a prediction time point, we simulate factors from the estimated COPAR model conditionally on the past values of the factors. The sampled factors are then used to forecast asset returns and to construct an optimal mean-variance portfolio. Since we assume an autoregressive order of one, it suffices to condition on the last value of the factors; i.e., to make a forecast for time point $T + 1$, we condition only on $\hat{F}_{T}^{(\mathrm{fin})}$ and $\hat{F}_{T}^{(\mathrm{macro})}$. The full algorithm for both factor groups is given in Algorithm 1, using the convention $\bar{F}_{1:0,t} = \emptyset$.

Algorithm 1: Conditional method of forecasting using COPAR.

1. Estimate the COPAR model based on the first $T$ observations and obtain the factor estimates $\hat{F}_{t} = (\hat{F}_{t}^{(\mathrm{fin})'}, \hat{F}_{t}^{(\mathrm{macro})'})'$;
2. Set $j = 1$;
3. Repeat the following steps:
   (a) If $j = q + 1$, then stop;
   (b) Determine the estimated conditional density of $F_{j,T+1} \mid \hat{F}_{1:q,T}, \bar{F}_{1:(j-1),T+1}$ on the equidistant grid on $(0, 1)$ with step $\Delta = 10^{-4}$, i.e., for $u = \frac{n}{10000}$, $n = 1, \dots, 9999$:

   $$\frac{c\big(\Phi_{1}(\bar{F}_{1,T+1}), \dots, \Phi_{j-1}(\bar{F}_{j-1,T+1}), u, \Phi_{1}(\hat{F}_{1,T}), \dots, \Phi_{q}(\hat{F}_{q,T})\big)}{f(\bar{F}_{1,T+1}, \dots, \bar{F}_{j-1,T+1}, \hat{F}_{1,T}, \dots, \hat{F}_{q,T})} \cdot \prod_{k=1}^{j-1} \phi_{k}(\bar{F}_{k,T+1}) \cdot \phi_{j}(\Phi_{j}^{-1}(u)) \cdot \prod_{k=1}^{q} \phi_{k}(\hat{F}_{k,T}),$$

   where $\phi_{k}$ and $\Phi_{k}$ are the estimated Gaussian density and distribution function of the latent factor $F_{k,t}^{(i)}$;
   (c) Determine the estimated conditional cumulative distribution function on the above grid;
   (d) Simulate 10,000 values $\tilde{F}_{j,T+1}$ from the estimated conditional cumulative distribution function and set the forecast $\bar{F}_{j,T+1}$ to their empirical mean;
   (e) Set $j = j + 1$ and go to (a).
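The loop of Algorithm 1 can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the authors' implementation: the fitted COPAR vine copula is replaced by a hypothetical Gaussian copula with correlation matrix `corr` over $(F_{1:q,T+1}, F_{1:q,T})$, so that integrating out the components not yet forecast only requires selecting a sub-matrix. Because the denominator in (b) does not depend on $u$, the sketch normalizes the copula density directly on the $u$-grid, samples $u$ by inverse transform, and maps the draws back with $\Phi_{j}^{-1}$; this is one way to carry out steps (b) to (d). The names `copar_style_forecast` and `gaussian_copula_logpdf` are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD = NormalDist()
GRID = np.arange(1, 10000) / 10000.0                  # u = n / 10000, n = 1, ..., 9999
Z_GRID = np.array([_STD.inv_cdf(u) for u in GRID])    # standard normal scores of the grid


def gaussian_copula_logpdf(z, corr):
    """Row-wise log-density of a Gaussian copula at normal scores z (n x d)."""
    _, logdet = np.linalg.slogdet(corr)
    m = np.linalg.inv(corr) - np.eye(len(corr))
    return -0.5 * logdet - 0.5 * np.einsum("ij,jk,ik->i", z, m, z)


def copar_style_forecast(f_hat_T, means, sds, corr, n_sim=10_000, seed=0):
    """Sequential grid-based conditional forecasts in the spirit of Algorithm 1.

    f_hat_T    : last filtered factors F_hat_{1:q,T}
    means, sds : parameters of the fitted Gaussian margins Phi_k
    corr       : (2q x 2q) Gaussian-copula correlation of (F_{1:q,T+1}, F_{1:q,T});
                 a stand-in for the fitted COPAR vine copula (assumption)
    """
    rng = np.random.default_rng(seed)
    means, sds = np.asarray(means, float), np.asarray(sds, float)
    q = len(f_hat_T)
    z_lag = (np.asarray(f_hat_T, float) - means) / sds    # normal scores of Phi_k(F_hat_k)
    z_fc = []                                             # normal scores of F_bar_{1:(j-1)}
    f_bar = np.empty(q)
    n = len(GRID)
    for j in range(q):                                    # stops after the q-th component, step (a)
        idx = list(range(j + 1)) + list(range(q, 2 * q))  # forecasts so far, target, lags
        z = np.empty((n, len(idx)))
        z[:, :j] = np.asarray(z_fc, float)
        z[:, j] = Z_GRID                                  # target evaluated on the grid
        z[:, j + 1:] = z_lag
        logc = gaussian_copula_logpdf(z, corr[np.ix_(idx, idx)])
        w = np.exp(logc - logc.max())                     # (b): density on the u-grid, up to a constant
        cdf = np.cumsum(w) / w.sum()                      # (c): conditional CDF on the grid
        k = np.minimum(np.searchsorted(cdf, rng.random(n_sim)), n - 1)
        draws = means[j] + sds[j] * Z_GRID[k]             # F_tilde = Phi_j^{-1}(u)
        f_bar[j] = draws.mean()                           # (d): forecast = empirical mean
        z_fc.append((f_bar[j] - means[j]) / sds[j])
    return f_bar


corr = np.array([[1.0, 0.2, 0.5, 0.1],
                 [0.2, 1.0, 0.1, 0.4],
                 [0.5, 0.1, 1.0, 0.3],
                 [0.1, 0.4, 0.3, 1.0]])
print(copar_style_forecast([1.0, -0.5], np.zeros(2), np.ones(2), corr))
```

Because both the copula and the margins are Gaussian in this toy setting, the forecasts should be close to the closed-form conditional means, which gives a simple sanity check; with a genuine COPAR model only `gaussian_copula_logpdf` would have to be replaced by the vine density.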

To illustrate the idea behind this method, we consider a small-scale example with two variables $G_{t}$ and $H_{t}$. Let us assume that we observe values of these two random variables at time point $t$, i.e., $G_{t} = \hat{G}_{t}$ and $H_{t} = \hat{H}_{t}$. We want to find the distribution of $G_{t+1}$ and $H_{t+1}$ conditioned on these observed values. For this purpose, we consider the decomposition of the conditional distribution of $G_{t+1}, H_{t+1} \mid G_{t} = \hat{G}_{t}, H_{t} = \hat{H}_{t}$ given by

$$F(G_{t+1}, H_{t+1} \mid G_{t} = \hat{G}_{t}, H_{t} = \hat{H}_{t}) = F(G_{t+1} \mid G_{t} = \hat{G}_{t}, H_{t} = \hat{H}_{t}) \, F(H_{t+1} \mid G_{t} = \hat{G}_{t}, H_{t} = \hat{H}_{t}, G_{t+1} = \bar{G}_{t+1}),$$

where $\bar{G}_{t+1}$ is some forecast of $G_{t+1}$. Any forecasting method can be used for $G_{t+1}$; like Brechmann and Czado (2015), we opt for the conditional mean, estimated by the sample mean. To obtain it, we first estimate the conditional density and then the conditional distribution function. In this simple case, the density $\hat{f}$ of $G_{t+1} \mid G_{t} = \hat{G}_{t}, H_{t} = \hat{H}_{t}$ is estimated on a grid from 0 to 1 with step 0.0001, and the distribution function $\hat{F}$ is then computed from it. With $\hat{f}$ and $\hat{F}$, one can estimate the mean.

In the next step, we use the estimated factors to model the returns of 35 industrial stocks from the S&P 500 and the sampled factors to construct an optimal portfolio in the mean-variance framework for each expanding time window from January 2006 up to November 2016. For each asset $j = 1, \dots, 35$, we consider and estimate the following model for its return:

$$r_{j,t} = \alpha_{j,0} + \alpha_{j,1} r_{j,t-1} + \Gamma_{j}' \hat{F}_{t} + v_{j,t}, \tag{10}$$

where the constants $\alpha_{j,0}$ and $\alpha_{j,1}$ and the vector $\Gamma_{j}$ are unknown regression parameters, and the errors $v_{j,t}$ are assumed to be iid Gaussian with respect to $t$. Because of the high dimension of the error covariance matrix, the error terms $v_{j,t}$ are also assumed to be independent across $j$. For the same dimensionality reason, we estimate the linear regression separately for each asset $j$; the regressions are coupled only through the common estimated factors. In Section 2, we pointed out that the estimated factors are unique only up to a rotation with an orthogonal matrix $R$, i.e., $R R' = R' R$ is an identity matrix. Since $\Gamma_{j}' \hat{F}_{t} = \Gamma_{j}' R' R \hat{F}_{t} = (R \Gamma_{j})' (R \hat{F}_{t})$ holds, the rotated factors $R \hat{F}_{t}$ with coefficients $R \Gamma_{j}$ have the same impact on asset returns as the unrotated factors with coefficients $\Gamma_{j}$.
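A short NumPy sketch of fitting (10) by ordinary least squares for one asset, followed by a numerical check of the rotation argument: refitting with rotated factors $R\hat{F}_{t}$ returns $R\Gamma_{j}$ and leaves the intercept and the autoregressive coefficient unchanged. The data are simulated, and `fit_return_model` is a hypothetical helper.

```python
import numpy as np


def fit_return_model(r, F):
    """OLS fit of model (10) for one asset: r_t = a0 + a1 r_{t-1} + Gamma' F_t + v_t.
    r: (T,) returns; F: (T, q) estimated factors aligned with r.
    Returns a0, a1, Gamma and the residual standard deviation sigma_v."""
    r, F = np.asarray(r, float), np.asarray(F, float)
    X = np.column_stack([np.ones(len(r) - 1), r[:-1], F[1:]])
    coef, *_ = np.linalg.lstsq(X, r[1:], rcond=None)
    resid = r[1:] - X @ coef
    sigma_v = np.sqrt(resid @ resid / (len(resid) - X.shape[1]))
    return coef[0], coef[1], coef[2:], sigma_v


rng = np.random.default_rng(1)
T, q = 200, 4
F = rng.normal(size=(T, q))
r = 0.01 + 0.01 * F @ np.array([0.5, -0.2, 0.1, 0.3]) + 0.01 * rng.normal(size=T)
R, _ = np.linalg.qr(rng.normal(size=(q, q)))          # a random orthogonal matrix
a0, a1, gamma, _ = fit_return_model(r, F)
b0, b1, gamma_rot, _ = fit_return_model(r, F @ R.T)   # rows are the rotated factors R F_t
print(np.allclose(gamma_rot, R @ gamma), np.isclose(a0, b0), np.isclose(a1, b1))   # True True True
```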

To construct an optimal portfolio in the mean-variance framework, we determine the portfolio weights $w$ at month $t$ by solving the following optimization problem with respect to $w$:

$$\max_{w} \; w' E[r_{t+1}] \quad \text{s.t.} \quad \mathbf{1}' w = 1, \quad w \geq 0, \quad w' \operatorname{Var}[r_{t+1}] \, w \leq \sigma_{\mathrm{monthly}}^{2},$$

where $r_{t} = (r_{1,t}, \dots, r_{35,t})'$ is the vector of asset returns at $t$, and $E[r_{t+1}]$ and $\operatorname{Var}[r_{t+1}]$ are the expectation and covariance matrix of $r_{t+1}$. Because of the constraint $w \geq 0$, our optimal portfolio does not allow short-selling.
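The optimization problem above has a linear objective and one quadratic constraint, so any constrained solver will do; the text does not name one. The sketch below swaps in SciPy's SLSQP method via `scipy.optimize.minimize`, rescales the objective and the risk constraint for numerical conditioning, and uses a hypothetical function name. With a loose volatility cap it puts all weight on the asset with the highest expected return, which mirrors the observation in the text that for large $\sigma$ the optimal portfolio consists of a single stock.

```python
import numpy as np
from scipy.optimize import minimize


def max_return_weights(mu, cov, sigma_max):
    """Long-only weights maximizing w'mu subject to sum(w) = 1 and w'cov w <= sigma_max**2."""
    mu, cov = np.asarray(mu, float), np.asarray(cov, float)
    n = len(mu)
    scale = max(np.abs(mu).max(), 1e-12)               # rescale objective for conditioning
    constraints = [
        {"type": "eq", "fun": lambda w: w.sum() - 1.0},                          # fully invested
        {"type": "ineq", "fun": lambda w: 1.0 - (w @ cov @ w) / sigma_max ** 2},  # >= 0: risk budget met
    ]
    res = minimize(lambda w: -(w @ mu) / scale, x0=np.full(n, 1.0 / n),
                   method="SLSQP", bounds=[(0.0, 1.0)] * n, constraints=constraints)
    # Production code should check res.success and res.message before using res.x.
    return res.x


mu = np.array([0.010, 0.020, 0.015])        # hypothetical expected monthly returns
cov = np.diag([0.0009, 0.0016, 0.0012])     # hypothetical covariance matrix
print(max_return_weights(mu, cov, 0.0289).round(3))
```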

Since $E[r_{t+1}]$ and $\operatorname{Var}[r_{t+1}]$ are unknown at month $t$, we estimate them with four different methods, resulting in four portfolios. The first portfolio is the COPAR portfolio, which is constructed using asset returns modeled with (10). In this case, $E[r_{t+1}]$ and $\operatorname{Var}[r_{t+1}]$ are estimated empirically at time point $t$ from 10,000 factors sampled from the COPAR model and 10,000 errors sampled from the estimated univariate Gaussian distributions. The second portfolio is the DFM portfolio, constructed similarly with factors sampled from the estimated stationary distribution (1) for (7)–(8) and $i = \mathrm{all}$ (i.e., the DFM for the full panel data). The third portfolio, called independent DFM, uses factors sampled from the estimated stationary distribution (1) for (7)–(8) and $i = \mathrm{fin}, \mathrm{macro}$; i.e., asset returns are driven by $(F_{t}^{(\mathrm{fin})'}, F_{t}^{(\mathrm{macro})'})'$. The fourth portfolio is the historical portfolio, which employs the empirical mean and covariance matrix based on the data up to time $t$; we consider it as a benchmark. Finally, note that the comparison of these four portfolios enables an assessment of the economic relevance of the factors.
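A sketch of how the moments could be estimated by simulation, assuming NumPy and hypothetical function names. For the DFM-based portfolios we assume that the stationary distribution (1) is Gaussian with mean zero (there is no intercept in (8)) and covariance $S$ solving $S = A S A' + Q$ with $Q = \operatorname{Cov}(u_{t})$; equation (1) itself is not reproduced in this section. For the COPAR portfolio, `factor_draws` would instead hold the conditional COPAR samples.

```python
import numpy as np


def var1_stationary_cov(A, Q):
    """Stationary covariance S of F_t = A F_{t-1} + u_t with Cov(u_t) = Q, i.e. the
    solution of S = A S A' + Q (all eigenvalues of A must lie inside the unit circle)."""
    q = A.shape[0]
    vec_s = np.linalg.solve(np.eye(q * q) - np.kron(A, A), Q.reshape(-1))
    return vec_s.reshape(q, q)


def simulated_return_moments(alpha0, alpha1, Gamma, sigma_v, r_t, factor_draws, rng):
    """Monte Carlo mean vector and covariance matrix of r_{t+1} under model (10).
    Shapes: alpha0, alpha1, sigma_v, r_t: (n,); Gamma: (n, q); factor_draws: (N, q)."""
    N = factor_draws.shape[0]
    errors = rng.normal(size=(N, len(alpha0))) * sigma_v     # independent across assets
    sims = alpha0 + alpha1 * r_t + factor_draws @ Gamma.T + errors
    return sims.mean(axis=0), np.cov(sims, rowvar=False)


rng = np.random.default_rng(0)
A = np.array([[0.5, 0.1], [0.0, 0.3]])      # hypothetical transition matrix
Q = np.array([[1.0, 0.2], [0.2, 0.5]])      # hypothetical innovation covariance
S = var1_stationary_cov(A, Q)
draws = rng.multivariate_normal(np.zeros(2), S, size=10_000)
mean, cov = simulated_return_moments(np.array([0.01, 0.02]), np.array([0.1, -0.1]),
                                     np.array([[0.004, 0.0], [0.0, 0.003]]),
                                     np.array([0.05, 0.03]), np.array([0.02, 0.01]), draws, rng)
print(mean, cov, sep="\n")
```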

For the above mean-variance optimization problem, we choose three monthly volatilities $\sigma_{\mathrm{monthly}} = 2.89\%, 3.75\%, 4.62\%$, which correspond to annual volatilities of $\sigma = 10\%, 13\%, 16\%$ via $\sigma_{\mathrm{monthly}} = \sigma / \sqrt{12}$. These values of $\sigma$ are practically reasonable, and for them the optimizer diversifies the portfolio. For higher values of $\sigma$, the optimal portfolio consists of only one stock if no constraints on maximal weights are imposed. We first determine an optimal portfolio for the panel data up to January 2006. Then, we sequentially expand the time window by one month and find the optimal weights of the four portfolios for each chosen level of volatility. The performance of the four portfolios with an initial investment of 1 USD during the out-of-sample period up to November 2016 is illustrated in Figure 6.

We observe that the COPAR portfolio outperforms the DFM, independent DFM, and historical portfolios. Further, both portfolios based on DFMs deliver a higher overall return than the historical portfolio, and the independent DFM portfolio (abbreviated as ind in Figures and Tables) is preferred over the DFM portfolio.

To compare the four portfolios, we consider several portfolio performance and risk measures, summarized in Appendix D. First, note that the observed standard deviations of portfolio returns are higher than the prespecified ones. This is to be expected because of prediction errors. According to the Sharpe and Omega ratios, the COPAR portfolio is preferred over all remaining portfolios. For both of these measures, the independent DFM portfolio outperforms the DFM portfolio, which we attribute to a fortunate split of the panel data. If we consider the 95% Value at Risk (VaR) and the 95% Conditional Value at Risk (CVaR), the historical portfolio outperforms the others except in one case: for $\sigma = 16\%$ and the 95% VaR, the COPAR portfolio is slightly superior.
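Appendix D is not reproduced here, so the following NumPy sketch uses common textbook definitions as an assumption: the Sharpe ratio as mean excess return over the standard deviation, the Omega ratio at threshold $r_f$ as the sum of gains over the sum of losses, and historical 95% VaR and CVaR as the negated 5% quantile and the negated mean of the returns at or below it. The exact definitions in Appendix D may differ, and `performance_measures` is a hypothetical name.

```python
import numpy as np


def performance_measures(returns, rf=0.0, level=0.95):
    """Sharpe ratio, Omega ratio (threshold rf), historical VaR and CVaR of portfolio returns."""
    r = np.asarray(returns, dtype=float)
    excess = r - rf
    sharpe = excess.mean() / r.std(ddof=1)
    gains = np.clip(excess, 0.0, None).sum()
    losses = np.clip(-excess, 0.0, None).sum()
    omega = gains / losses
    var = -np.quantile(r, 1.0 - level)            # loss exceeded with probability 1 - level
    cvar = -r[r <= -var].mean()                    # average loss in the tail beyond VaR
    return {"sharpe": sharpe, "omega": omega, "VaR": var, "CVaR": cvar}


print(performance_measures([0.02, -0.01, 0.03, -0.02, 0.01]))
```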

To statistically assess the differences in the Sharpe ratios, we perform the Jobson–Korkie test (Jobson and Korkie 1981). The null hypothesis of this test is that the Sharpe ratios of the two considered portfolios are equal. Under the null hypothesis, the centered and normalized test statistic is asymptotically standard normal. If the null hypothesis is rejected, there is significant statistical evidence for different Sharpe ratios. Appendix E presents the results of the Jobson–Korkie test for all pairs of portfolios and $\sigma = 10\%, 13\%, 16\%$. The COPAR portfolio significantly outperforms the independent DFM portfolio, which supports our approach of modeling the estimated factors with a COPAR model. For $\sigma = 10\%$ and $\sigma = 13\%$, we do not find a statistically significant difference between the Sharpe ratios of the historical and COPAR portfolios. We attribute this finding to the lower standard deviation of the historical portfolio returns.
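One widely used form of the Jobson–Korkie statistic, including the variance correction of Memmel (2003), is sketched below in NumPy; whether Appendix E uses this corrected variant is not stated here, so treat it as an assumption. The inputs are excess returns of two portfolios over the same months, and the function name is hypothetical.

```python
import numpy as np
from statistics import NormalDist


def jobson_korkie(ri, rj):
    """Test of equal Sharpe ratios for two excess-return series of equal length.
    Returns the z statistic (asymptotically N(0, 1) under H0) and a two-sided p-value."""
    ri, rj = np.asarray(ri, float), np.asarray(rj, float)
    T = len(ri)
    mi, mj = ri.mean(), rj.mean()
    cov = np.cov(ri, rj)                              # 2 x 2 sample covariance matrix
    si, sj, sij = np.sqrt(cov[0, 0]), np.sqrt(cov[1, 1]), cov[0, 1]
    theta = (2 * si ** 2 * sj ** 2 - 2 * si * sj * sij
             + 0.5 * mi ** 2 * sj ** 2 + 0.5 * mj ** 2 * si ** 2
             - mi * mj / (si * sj) * sij ** 2) / T    # Memmel-corrected asymptotic variance
    z = (sj * mi - si * mj) / np.sqrt(theta)          # > 0 when portfolio i has the higher Sharpe ratio
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


rng = np.random.default_rng(7)
a = 0.02 + 0.03 * rng.normal(size=240)    # hypothetical monthly excess returns
b = 0.00 + 0.03 * rng.normal(size=240)
print(jobson_korkie(a, b))
```

The statistic is antisymmetric in its arguments, so swapping the two portfolios only flips its sign.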

6. Conclusions and Final Remarks

This paper applies copulas to capture the dependence structure of estimated latent factors from dynamic factor models. The proposed modeling approach is especially convenient when several factor models under consideration are estimated separately. In this context, we combine the filtered latent factors with the COPAR model of Brechmann and Czado (2015), which results in a non-Gaussian dependence between the factors. The gained flexibility of the factor dependence is then used for modeling asset returns and building optimal mean-variance portfolios.

In our empirical study, we consider U.S. panel data consisting of 9 financial and 13 macroeconomic monthly observable indicators. The nature of the indicators suggests the consideration of two separate dynamic factor models. We also treat a joint dynamic factor model for all indicators. Estimated factors from the considered DFMs are used to model the returns of 35 industrial stocks from the S&P 500. Then, factor predictions from the different models, spanning almost 11 years, are employed for portfolio optimization in the mean-variance framework.

Our main contribution is a performance improvement of portfolios based on DFMs. For this, we propose to capture the dependence structure of filtered factors from DFMs with a COPAR model. This allows us to sample factor forecasts from the estimated COPAR model conditionally on past values of the estimated factors. The resulting factor predictions are then used to construct an optimal mean-variance portfolio. We compare the COPAR-based portfolio with two portfolios derived from DFMs, as well as with the classical mean-variance approach utilizing empirical means and covariance matrices. For the considered panel data and industrial stocks, the COPAR portfolio outperforms the other portfolios in terms of total return and the Sharpe and Omega ratios. The superiority of the COPAR portfolio in terms of the Sharpe ratio is even statistically significant for several portfolio comparisons.

A possible improvement of the proposed approach is its extension with a model selection criterion for a general R-vine that best captures the cross-sectional and temporal dependence. Thus, one departs from the COPAR model and generalizes it as well as the copula-based multivariate time series models of Beare and Seo (2015) and Smith (2015). More generally, one could completely revise our approach and instead develop dynamic versions of the copula factor models proposed by Krupskii and Joe (2013) and Oh and Patton (2017) for iid data. The first methodology in this direction is provided by Oh and Patton (2016), who capture cross-sectional dependence with time-varying copulas. These points are the subject of further research.

Acknowledgments

The authors thank the editor and the anonymous reviewers for their very helpful suggestions, which contributed substantially to the improvement of our manuscript. We are also obliged to Eike Brechmann, who kindly provided R code for the COPAR model, which served as an inspiration for the computer implementation of this paper. Franz Ramsauer gratefully acknowledges the support of Pioneer Investments during his doctoral phase. Eugen Ivanov gratefully acknowledges the support of the Deutsche Forschungsgemeinschaft during his doctoral phase. The authors also express their gratitude to the Leibniz-Rechenzentrum for providing computational resources and assistance.

Author Contributions

Eugen Ivanov, Aleksey Min and Franz Ramsauer analyzed the data, constructed the model and designed the estimation procedure. The asset allocation procedure based on dynamic factor models was proposed by Franz Ramsauer. Eugen Ivanov and Franz Ramsauer performed the complete computational implementation. All three authors wrote the paper.

Conflicts of Interest

The authors declare no conflict of interest. The sponsors had no role in the design of the study, in the collection, analyses, or interpretation of data, in the writing of the manuscript, and in the decision to publish the results.

Appendix A. Assets for Portfolio Optimization

Table A1. Assets used for portfolio construction.

Ticker	Name	GICS Sub Industry
CMI	Cummins Inc.	Industrial Machinery
EMR	Emerson Electric Company	Industrial Conglomerates
CSX	CSX Corp.	Railroads
MMM	3M Company	Industrial Conglomerates
BA	Boeing Company	Aerospace & Defense
CHRW	C. H. Robinson Worldwide	Air Freight & Logistics
CAT	Caterpillar Inc.	Construction & Farm Machinery & Heavy Trucks
CTAS	Cintas Corporation	Diversified Support Services
DHR	Danaher Corp.	Industrial Conglomerates
DE	Deere & Co.	Construction & Farm Machinery & Heavy Trucks
ETN	Eaton Corporation	Industrial Conglomerates
EFX	Equifax Inc.	Research & Consulting Services
GD	General Dynamics	Aerospace & Defense
GE	General Electric	Industrial Conglomerates
LLL	L-3 Communications Holdings	Industrial Conglomerates
LEG	Leggett & Platt	Industrial Conglomerates
NSC	Norfolk Southern Corp.	Railroads
PBI	Pitney-Bowes	Technology, Hardware, Software and Supplies
RTN	Raytheon Co.	Aerospace & Defense
RHI	Robert Half International	Human Resource & Employment Services
ROK	Rockwell Automation Inc.	Industrial Conglomerates
COL	Rockwell Collins	Industrial Conglomerates
UNP	Union Pacific	Railroads
UPS	United Parcel Service	Air Freight & Logistics
UTX	United Technologies	Industrial Conglomerates
WM	Waste Management Inc.	Environmental Services
TXT	Textron Inc.	Industrial Conglomerates
FLR	Fluor Corp.	Diversified Commercial Services
FDX	FedEx Corporation	Air Freight & Logistics
PCAR	PACCAR Inc.	Construction & Farm Machinery & Heavy Trucks
GWW	Grainger (W.W.) Inc.	Industrial Materials
MAS	Masco Corp.	Building Products
R	Ryder System	Industrial Conglomerates
LMT	Lockheed Martin Corp.	Aerospace & Defense
NOC	Northrop Grumman Corp.	Aerospace & Defense

Appendix B. Underlying Panel Data

In the following tables, FRED stands for Federal Reserve Economic Data, the database of the Federal Reserve Bank of St. Louis.

Table A2. U.S. financial indicators.

Indicator	Description	Transformation	Source
USA Treasury 3m	Yield of U.S. government bonds with maturity of 3 months	Monthly change	FRED
USA Treasury 1y	Yield of U.S. government bonds with maturity of 1 year	Monthly change	FRED
USA Treasury 2y	Yield of U.S. government bonds with maturity of 2 years	Monthly change	FRED
USA Treasury 3y	Yield of U.S. government bonds with maturity of 3 years	Monthly change	FRED
USA Treasury 5y	Yield of U.S. government bonds with maturity of 5 years	Monthly change	FRED
USA Treasury 7y	Yield of U.S. government bonds with maturity of 7 years	Monthly change	FRED
USA Treasury 10y	Yield of U.S. government bonds with maturity of 10 years	Monthly change	FRED
Gold	Gold Fixing Price 10:30 A.M. (London time) in London Bullion Market, based in U.S. Dollars.	Return	FRED
WTI Crude	Crude Oil Prices: West Texas Intermediate (WTI)—Cushing, Oklahoma.	Return	FRED

Table A3. U.S. macroeconomic indicators.

Indicator	Description	Transformation	Source
M1 U.S.	Narrow money (M1) includes currency, i.e., banknotes and coins, as well as balances which can immediately be converted into currency or used for cashless payments, i.e., overnight deposits.	Return	FRED
M2 U.S.	“Intermediate” money (M2) comprises narrow money (M1) and, in addition, deposits with a maturity of up to two years and deposits redeemable at a period of notice of up to three months.	Return	FRED
Unemployment	Unemployment rate	Return	FRED
CPI	Consumer Price Index (CPI) measures changes in the price level of a market basket of consumer goods and services purchased by households.	Return	FRED
PPI	Producer Price Index (PPI) measures the average changes in prices received by domestic producers for their output.	Return	FRED
PCE	Personal consumption expenditures (PCE) measure the goods and services purchased by U.S. residents.	Return	FRED
Personal saving rate	Personal saving as a percentage of disposable personal income (DPI), frequently referred to as “the personal saving rate,” is calculated as the ratio of personal saving to DPI.	Return	FRED
Payrolls	All Employees: Total Nonfarm Payrolls.	Return	FRED
Unemployment	Unemployment	Return	FRED
Initial claims	Initial claims	Return	FRED
Housing Starts	Housing Starts	Return	FRED
Capacity utilization	Capacity utilization	Return	FRED
Dollar Index	Trade Weighted U.S. Dollar Index	Return	FRED
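
The two transformations listed in Tables A2 and A3 can be illustrated with a short sketch. It assumes that "Monthly change" means the first difference of month-end levels and that "Return" means the simple return x_t/x_{t−1} − 1; the tables do not say whether simple or log-returns are used, so a log-return variant is included as well. The function names and input values are hypothetical and not taken from FRED.

```python
import math


def monthly_change(levels):
    """First difference x_t - x_{t-1}, e.g. for Treasury yields ('Monthly change')."""
    return [curr - prev for prev, curr in zip(levels, levels[1:])]


def simple_return(levels):
    """Simple return x_t / x_{t-1} - 1 (one possible reading of 'Return')."""
    return [curr / prev - 1.0 for prev, curr in zip(levels, levels[1:])]


def log_return(levels):
    """Log-return log(x_t / x_{t-1}) (an alternative reading of 'Return')."""
    return [math.log(curr / prev) for prev, curr in zip(levels, levels[1:])]


# Hypothetical month-end observations, not actual FRED data
yields_pct = [4.10, 4.25, 4.05]   # yields in percent: differenced, not divided
gold_usd = [500.0, 550.0, 495.0]  # prices: converted to returns
yield_changes = monthly_change(yields_pct)
gold_returns = simple_return(gold_usd)
```

Differencing suits yields, which are already rates and can be close to zero, whereas returns suit price or quantity levels.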

Appendix C. Copula Families

Table A4. Copula families as numbered in the R package VineCopula.

Number	Family
0	Independence copula
1	Gaussian copula
2	Student’s t copula
3	Clayton copula
4	Gumbel copula
5	Frank copula
6	Joe copula
13	Rotated Clayton copula (180 degrees; “survival Clayton”)
14	Rotated Gumbel copula (180 degrees; “survival Gumbel”)
16	Rotated Joe copula (180 degrees; “survival Joe”)
23	Rotated Clayton copula (90 degrees)
24	Rotated Gumbel copula (90 degrees)
26	Rotated Joe copula (90 degrees)
33	Rotated Clayton copula (270 degrees)
34	Rotated Gumbel copula (270 degrees)
36	Rotated Joe copula (270 degrees)
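
The rotated families in Table A4 are obtained from a base copula C by reflecting one or both arguments. The sketch below does this for the Clayton copula, assuming the common rotation convention C90(u, v) = v − C(1 − u, v), C180(u, v) = u + v − 1 + C(1 − u, 1 − v) and C270(u, v) = u − C(u, 1 − v). It is an illustration, not the VineCopula implementation, and the function names are hypothetical. Here theta is the positive parameter of the unrotated Clayton copula; to our understanding, VineCopula itself parameterizes families 23 and 33 with a negative parameter.

```python
def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta) for theta > 0."""
    if u <= 0.0 or v <= 0.0:
        return 0.0  # boundary condition C(u, 0) = C(0, v) = 0 (avoids 0 ** -theta)
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def rotated_cdf(base_cdf, u, v, theta, degrees):
    """CDF of the base copula rotated by 0, 90, 180 or 270 degrees."""
    if degrees == 0:
        return base_cdf(u, v, theta)
    if degrees == 90:
        return v - base_cdf(1.0 - u, v, theta)
    if degrees == 180:  # survival copula
        return u + v - 1.0 + base_cdf(1.0 - u, 1.0 - v, theta)
    if degrees == 270:
        return u - base_cdf(u, 1.0 - v, theta)
    raise ValueError("degrees must be 0, 90, 180 or 270")


def clayton_tau(theta, degrees=0):
    """Kendall's tau theta / (theta + 2); the 90/270 rotations flip its sign."""
    tau = theta / (theta + 2.0)
    return -tau if degrees in (90, 270) else tau
```

The 180-degree rotation turns the lower-tail dependence of the Clayton copula into upper-tail dependence, while the 90- and 270-degree rotations produce negative dependence, which is why these families are needed for negatively associated pairs in a vine.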

Appendix D. Portfolio Characteristics

Table A5. Portfolio characteristics for annual volatility σ = 10%.

	COPAR	DFM	ind DFM	hist
Log-return (Total)	1.0578	0.9592	0.9661	0.6982
Log-return (Monthly)	0.0079	0.0072	0.0072	0.0050
Std. deviation (Annual)	0.1775	0.1808	0.1799	0.1571
Sharpe Ratio (Monthly)	0.1816	0.1665	0.1668	0.1354
Omega Ratio (Monthly)	1.6389	1.5617	1.5696	1.4420
95% VaR	−0.1930	−0.1916	−0.1993	−0.1899
95% CVaR	−0.1965	−0.1956	−0.2037	−0.1924

Table A6. Portfolio characteristics for annual volatility σ = 13%.

	COPAR	DFM	ind DFM	hist
Log-return (Total)	1.1828	0.9657	1.0257	0.6855
Log-return (Monthly)	0.0089	0.0073	0.0077	0.0048
Std. deviation (Annual)	0.1838	0.1890	0.1888	0.1661
Sharpe Ratio (Monthly)	0.1956	0.1634	0.1704	0.1268
Omega Ratio (Monthly)	1.6941	1.5420	1.5783	1.3974
95% VaR	−0.2106	−0.2135	−0.2219	−0.1920
95% CVaR	−0.2162	−0.2191	−0.2283	−0.1944

Table A7. Portfolio characteristics for annual volatility σ = 16%.

	COPAR	DFM	ind DFM	hist
Log-return (Total)	1.2554	0.9068	1.0750	0.2660
Log-return (Monthly)	0.0095	0.0069	0.0081	0.0015
Std. deviation (Annual)	0.1931	0.1981	0.1996	0.2015
Sharpe Ratio (Monthly)	0.1999	0.1514	0.1715	0.0581
Omega Ratio (Monthly)	1.7018	1.4891	1.5860	1.1814
95% VaR	−0.2332	−0.2362	−0.2458	−0.2355
95% CVaR	−0.2408	−0.2434	−0.2541	−0.2379
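
The statistics in Tables A5–A7 follow standard definitions. The sketch below computes them from a series of monthly portfolio returns under stated assumptions: a zero risk-free rate for the Sharpe ratio, a zero threshold for the Omega ratio, and historical (empirical-quantile) VaR and CVaR at the 95% level. The appendix does not specify these conventions (for instance, the horizon of the VaR figures), so the sketch is not expected to reproduce the table values exactly, and all function names are hypothetical. If "Log-return (Total)" denotes the cumulative log-return, it maps to terminal wealth in Figure 6 via the exponential: for COPAR at σ = 10%, exp(1.0578) ≈ 2.88 USD per 1 USD invested.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation (per period)."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


def omega_ratio(returns, threshold=0.0):
    """Sum of gains above the threshold divided by the sum of shortfalls below it."""
    gains = sum(max(r - threshold, 0.0) for r in returns)
    losses = sum(max(threshold - r, 0.0) for r in returns)
    return math.inf if losses == 0.0 else gains / losses


def historical_var(returns, level=0.95):
    """Empirical (1 - level)-quantile of the returns; a negative value is a loss."""
    ordered = sorted(returns)
    k = math.ceil(round((1.0 - level) * len(ordered), 9))  # round() removes float noise
    return ordered[max(k - 1, 0)]


def historical_cvar(returns, level=0.95):
    """Average return in the tail at or below the historical VaR."""
    var = historical_var(returns, level)
    return statistics.mean([r for r in returns if r <= var])


def annualized_std(monthly_returns):
    """Monthly sample standard deviation scaled by sqrt(12)."""
    return statistics.stdev(monthly_returns) * math.sqrt(12)


def terminal_wealth(total_log_return, initial=1.0):
    """Wealth at the end of the period implied by a cumulative log-return."""
    return initial * math.exp(total_log_return)
```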

Appendix E. Significance Tests for Sharpe Ratios

Table A8. Test statistics of the Jobson–Korkie test for pairwise comparison of Sharpe ratios for four portfolios from the mean-variance optimization problem with σ = 10%.

	COPAR	DFM	ind DFM
COPAR
DFM	1.02
ind DFM	1.89 *	0.03
hist	1.04	0.66	0.70

Note: * p-value < 0.1, ** p-value < 0.05, *** p-value < 0.01.

Table A9. Test statistics of the Jobson–Korkie test for pairwise comparison of Sharpe ratios for four portfolios from the mean-variance optimization problem with σ = 13%.

	COPAR	DFM	ind DFM
COPAR
DFM	1.62
ind DFM	2.20 **	0.42
hist	1.30	0.71	0.82

Note: * p-value < 0.1, ** p-value < 0.05, *** p-value < 0.01.

Table A10. Test statistics of the Jobson–Korkie test for pairwise comparison of Sharpe ratios for four portfolios from the mean-variance optimization problem with σ = 16%.

	COPAR	DFM	ind DFM
COPAR
DFM	2.03 **
ind DFM	2.27 **	0.95
hist	1.95 *	1.39	1.60

Note: * p-value < 0.1, ** p-value < 0.05, *** p-value < 0.01.
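
The Jobson–Korkie statistic compares two Sharpe ratios μ_i/σ_i and μ_j/σ_j estimated from T paired returns through z = (σ_j μ_i − σ_i μ_j)/√θ, which is asymptotically standard normal under the null hypothesis of equal Sharpe ratios. The sketch below uses the delta-method variance for i.i.d. normal returns, θ = [2σ_i²σ_j² − 2σ_iσ_jσ_ij + ½μ_i²σ_j² + ½μ_j²σ_i² − μ_iμ_jσ_ij²/(σ_iσ_j)]/T; whether the tables used exactly this estimator is not stated here, and the function names are hypothetical. The significance stars in Tables A8–A10 are consistent with two-sided standard-normal p-values: 1.89 gives p ≈ 0.059, 2.20 gives p ≈ 0.028, and 1.62 gives p ≈ 0.105.

```python
import math


def _moments(x, y):
    """Means, standard deviations and covariance (divisor T) of two paired return series."""
    t = len(x)
    mx, my = sum(x) / t, sum(y) / t
    vx = sum((a - mx) ** 2 for a in x) / t
    vy = sum((b - my) ** 2 for b in y) / t
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / t
    return t, mx, my, math.sqrt(vx), math.sqrt(vy), cxy


def jobson_korkie_z(x, y):
    """z statistic for H0: equal Sharpe ratios; positive when portfolio x has the higher one."""
    t, mi, mj, si, sj, sij = _moments(x, y)
    numerator = sj * mi - si * mj
    theta = (2 * si**2 * sj**2 - 2 * si * sj * sij
             + 0.5 * mi**2 * sj**2 + 0.5 * mj**2 * si**2
             - mi * mj * sij**2 / (si * sj)) / t
    return numerator / math.sqrt(theta)


def two_sided_p(z):
    """Two-sided p-value P(|Z| > |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical monthly returns of two portfolios
port_a = [0.02, 0.01, 0.03, 0.00, 0.02, 0.015]
port_b = [0.01, -0.01, 0.02, -0.02, 0.01, 0.0]
z_ab = jobson_korkie_z(port_a, port_b)
```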

Appendix F. Testing for Granger Causality

Table A11. Summary of linear regressions for testing Granger causality. Columns contain the estimated regression coefficients; standard errors are given in parentheses.

	Dependent variable:
	$F_{1,t}^{fin}$	$F_{2,t}^{fin}$	$F_{1,t}^{macro}$	$F_{2,t}^{macro}$	$F_{3,t}^{macro}$
	(1)	(2)	(3)	(4)	(5)
$F_{1,t-1}^{fin}$	0.216 ***	0.655 ***	−0.189 ***	0.076	−0.094 *
	(0.054)	(0.060)	(0.053)	(0.056)	(0.055)
$F_{2,t-1}^{fin}$	0.086 **	0.214 ***	0.052	−0.056	0.048
	(0.041)	(0.045)	(0.040)	(0.042)	(0.042)
$F_{1,t-1}^{macro}$	0.076	0.009	−0.385 ***	0.037	−0.104 **
	(0.047)	(0.052)	(0.046)	(0.048)	(0.048)
$F_{2,t-1}^{macro}$	0.077 *	−0.006	0.020	−0.456 ***	0.024
	(0.046)	(0.051)	(0.045)	(0.047)	(0.047)
$F_{3,t-1}^{macro}$	−0.009	0.065	0.192 ***	−0.045	−0.554 ***
	(0.043)	(0.047)	(0.042)	(0.044)	(0.044)
Constant	0.003	0.002	−0.0005	0.003	0.0002
	(0.052)	(0.058)	(0.051)	(0.054)	(0.053)
Observations	369	369	369	369	369
$R^2$	0.093	0.369	0.236	0.215	0.315
Adjusted $R^2$	0.080	0.360	0.225	0.204	0.305
Residual Std. Error (df = 363)	1.003	1.109	0.981	1.030	1.026
F Statistic (df = 5; 363)	7.428 ***	42.464 ***	22.408 ***	19.893 ***	33.339 ***

Note: * p-value < 0.1; ** p-value < 0.05; *** p-value < 0.01.
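
Each column of Table A11 is one equation of a first-order VAR in the five factors, estimated by OLS. The sketch below is a hypothetical, standard-library-only illustration of the underlying idea in the simplest bivariate, one-lag case: x Granger-causes y if adding x_{t−1} to a regression of y_t on y_{t−1} reduces the residual sum of squares significantly. It also shows how the overall F statistic in the last row relates to R², F = (R²/k)/((1 − R²)/(n − k − 1)); with k = 5 and n = 369 this gives about 7.44 for column (1), matching the reported 7.428 up to the rounding of R². The simulated data and function names are assumptions, not the paper's code.

```python
import random


def ols_rss(y, X):
    """Residual sum of squares of an OLS regression of y on the columns of X (rows = observations)."""
    k = len(X[0])
    # Normal equations (X'X) beta = X'y
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def granger_f(y, x):
    """F statistic for H0: x_{t-1} does not help predict y_t given y_{t-1} (one lag, intercept)."""
    target = y[1:]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    rss_u = ols_rss(target, unrestricted)
    rss_r = ols_rss(target, restricted)
    df_resid = len(target) - 3  # observations minus regressors in the unrestricted model
    return (rss_r - rss_u) / (rss_u / df_resid)  # one restriction, so q = 1


def overall_f_from_r2(r2, k, n):
    """Overall F statistic of a regression with k slope coefficients and n observations."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Hypothetical simulated data: lagged x drives y, but not the other way round
random.seed(1)
n = 300
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [0.0] + [0.8 * x[t - 1] + random.gauss(0.0, 1.0) for t in range(1, n)]
f_xy = granger_f(y, x)  # large: x Granger-causes y by construction
f_yx = granger_f(x, y)  # typically small
```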

References

Aas, Kjersti. 2016. Pair-copula constructions for financial applications: A review. Econometrics 4: 43.
Aas, Kjersti, Claudia Czado, Arnoldo Frigessi, and Henrik Bakken. 2009. Pair-copula constructions of multiple dependence. Insurance: Mathematics and Economics 44: 182–98.
Ando, Tomohiro, and Jushan Bai. 2014. Asset pricing with a general multifactor structure. Journal of Financial Econometrics 13: 556–604.
Bańbura, Marta, and Michele Modugno. 2014. Maximum likelihood estimation of factor models on data sets with arbitrary pattern of missing data. Journal of Applied Econometrics 29: 133–60.
Beare, Brendan K., and Juwon Seo. 2015. Vine copula specifications for stationary multivariate Markov chains. Journal of Time Series Analysis 36: 228–46.
Biller, Bahar, and Barry L. Nelson. 2003. Modeling and generating multivariate time-series input processes using a vector autoregressive technique. ACM Transactions on Modeling and Computer Simulation 13: 211–37.
Bork, Lasse. 2009. Estimating US Monetary Policy Shocks Using a Factor-Augmented Vector Autoregression: An EM Algorithm Approach. CREATES Research Paper 2009-11. Denmark: Department of Economics and Business Economics, Aarhus University.
Brechmann, Eike Christian, and Claudia Czado. 2015. COPAR—Multivariate time series modeling using the copula autoregressive model. Applied Stochastic Models in Business and Industry 31: 495–514.
Chen, Xiaohong, and Yanqin Fan. 2006. Estimation of copula-based semiparametric time series models. Journal of Econometrics 130: 307–35.
Czado, Claudia. 2010. Pair-copula constructions of multivariate copulas. In Copula Theory and Its Applications. Edited by Wolfgang Härdle, Piotr Jaworski and Tomasz Rychlik. New York: Springer, pp. 93–109.
Czado, Claudia, Eike Christian Brechmann, and Lutz Gruber. 2012. Selection of vine copulas. In Copulae in Mathematical and Quantitative Finance. Edited by Fabrizio Durante and Piotr Jaworski. New York: Springer, pp. 17–37.
Czado, Claudia, Ulf Schepsmeier, and Aleksey Min. 2012. Maximum likelihood estimation of mixed C-vines with application to exchange rates. Statistical Modelling 12: 229–55.
Darsow, William F., Bao Nguyen, and Elwood T. Olsen. 1992. Copulas and Markov processes. Illinois Journal of Mathematics 36: 600–642.
Dempster, Arthur P., Nan M. Laird, and Donald B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 39: 1–38.
Dissmann, Jeffrey, Eike Christian Brechmann, Claudia Czado, and Dorota Kurowicka. 2013. Selecting and estimating regular vine copulae and application to financial returns. Computational Statistics & Data Analysis 59: 52–69.
Embrechts, Paul, Alexander McNeil, and Daniel Straumann. 2002. Correlation and dependency in risk management: Properties and pitfalls. In Risk Management: Value at Risk and Beyond. Edited by Michael Dempster and Henry Moffat. Cambridge: Cambridge University Press, pp. 176–223.
Frees, Edward W., and Emiliano A. Valdez. 1998. Understanding relationships using copulas. North American Actuarial Journal 2: 1–25.
Ibragimov, Rustam. 2009. Copula-based characterizations for higher order Markov processes. Econometric Theory 25: 819–46.
Jobson, John, and Bob Korkie. 1981. Performance hypothesis testing with the Sharpe and Treynor measures. The Journal of Finance 36: 889–908.
Jolliffe, Ian. 2002. Principal Component Analysis. New York: Springer.
Krupskii, Pavel, and Harry Joe. 2013. Factor copula models for multivariate data. Journal of Multivariate Analysis 120: 85–101.
Kurowicka, Dorota, and Roger Cooke. 2006. Uncertainty Analysis with High Dimensional Dependence Modelling. New York: John Wiley & Sons.
Li, David X. 2000. On default correlation: A copula function approach. Journal of Fixed Income 9: 43–54.
Lintner, John. 1965. The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets. Review of Economics and Statistics 47: 13–37.
Lütkepohl, Helmut. 2005. New Introduction to Multiple Time Series Analysis. New York: Springer.
Morales-Nápoles, Oswaldo, Roger Cooke, and Dorota Kurowicka. 2010. About the number of vines and regular vines on n nodes.
Mossin, Jan. 1966. Equilibrium in a Capital Asset Market. Econometrica 34: 768–83.
Oh, Dong Hwan, and Andrew J. Patton. 2016. Time-varying systemic risk: Evidence from a dynamic copula model of CDS spreads. Journal of Business & Economic Statistics.
Oh, Dong Hwan, and Andrew J. Patton. 2017. Modelling dependence in high dimensions with factor copulas. Journal of Business & Economic Statistics 35: 139–154.
Ross, Stephen. 1976. The arbitrage theory of capital asset pricing. Journal of Economic Theory 13: 341–60.
Sharpe, William F. 1964. Capital asset prices: A theory of market equilibrium under conditions of risk. The Journal of Finance 19: 425–42.
Shumway, Robert, and David Stoffer. 1982. An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis 3: 253–64.
Sklar, Abe. 1959. Fonctions de répartition à n dimensions et leurs marges. Publications de l’Institut Statistique de l’Université de Paris 8: 229–31.
Smith, Michael Stanley. 2015. Copula modelling of dependence in multivariate time series. International Journal of Forecasting 31: 815–33.
Stock, James, and Mark Watson. 1999. Diffusion indices. Working Paper No. 6702. Cambridge, MA, USA: National Bureau of Economic Research.
Stock, James, and Mark Watson. 2002. Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association 97: 1167–79.
Stock, James, and Mark Watson. 2002. Macroeconomic forecasting using diffusion indexes. Journal of Business & Economic Statistics 20: 147–62.
Stöber, Jakob, Harry Joe, and Claudia Czado. 2013. Simplified pair copula constructions—Limitations and extensions. Journal of Multivariate Analysis 119: 101–18.
Treynor, Jack L. 2012. Toward a Theory of Market Value of Risky Assets. In Treynor on Institutional Investing. Edited by Jack L. Treynor. Hoboken: John Wiley & Sons.
Watson, Mark, and Robert Engle. 1983. Alternative algorithms for the estimation of dynamic factor, mimic and varying coefficient regression models. Journal of Econometrics 23: 385–400.
Wu, Jeff. 1983. On the convergence properties of the EM algorithm. The Annals of Statistics 11: 95–103.

Figure 1. C-vine (left) and D-vine (right) representation for d = 5.

Figure 2. Dependencies of a simultaneous bivariate time series for T = 4. Vertical solid lines: cross-sectional dependence; horizontal solid lines and curved lines: serial dependence for each time series; dotted and dashed lines: cross-serial dependence.

Figure 3. R-vine structure of a three-dimensional time series for T = 4.

Figure 4. Number of principal components (PCs) vs. captured variance for the three dynamic factor models (DFMs), based on data up to December 2005.

Figure 5. Correlation heatmap as of October 2010.

Figure 6. Performance of four portfolios with an initial investment of 1 USD over the out-of-sample period from January 2006 to November 2016 for σ = 10%, 13%, 16%. COPAR: copula autoregressive; Hist: historical; Ind: independent DFM.

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
