Model Free Inference on Multivariate Time Series with Conditional Correlations

Abstract: New results on volatility modeling and forecasting are presented based on the NoVaS transformation approach. Our main contribution is that we extend the NoVaS methodology to modeling and forecasting conditional correlation, thus allowing NoVaS to work in a multivariate setting as well. We present exact results on the use of univariate transformations and on their combination for joint modeling of the conditional correlations: we show how the NoVaS-transformed series can be combined and how the likelihood function of their product can be expressed explicitly, thus allowing for optimization and correlation modeling. While this keeps the original "model-free" spirit of NoVaS, it also makes the new multivariate NoVaS approach for correlations "semi-parametric", which is why we introduce an alternative based on cross-validation. We also present a number of auxiliary results regarding the empirical implementation of NoVaS based on different criteria for distributional matching. We illustrate our findings using simulated and real-world data, and evaluate our methodology in the context of portfolio management.


Introduction
Joint modeling of the conditional second moments, volatilities and correlations, of a vector of asset returns is considerably more complicated (and with far fewer references) than individual volatility modeling. With the exception of realized correlation measures, based on high-frequency data, the literature on conditional correlation modeling is plagued with the "curse of dimensionality": parametric or semi-parametric correlation models are usually dependent on a large number of parameters (always greater than the number of assets being modeled). Besides the always lurking misspecification problems, one is faced with the difficult task of multi-parameter numerical optimization under various constraints. Some recent advances, see for example Ledoit et al. [1] and Palandri [2], propose simplifications by breaking the modeling and optimization problem into smaller, more manageable, sub-problems but one still has to make ad-hoc assumptions about the way volatilities and correlations are parametrized.
In this paper we present a novel approach for modeling conditional correlations building on the NoVaS (NOrmalizing and VAriance Stabilizing) transformation approach introduced by Politis [3][4][5][6] and significantly extended by Politis and Thomakos [7,8]. Our work has both similarities and differences with the related literature. The main similarity is that we also begin by modeling the volatilities of the individual series and estimate correlations using the standardized return series.
Stats 2020, 3

The main differences are that (a) we do not make distributional assumptions for the standardized returns, (b) we assume no "model" for the volatilities and the correlations, and (c) calibration and estimation of the parameters require only one-dimensional optimizations over the unit interval and simple numerical integration.
The main advantages of using NoVaS transformations for volatility modeling and forecasting, see Politis and Thomakos [8], are that the method is data-adaptable without making any a priori assumptions about the distribution of returns (e.g., their degree of kurtosis) and that it can work in a multitude of environments (e.g., globally and locally stationary models, models with structural breaks, etc.). These advantages carry over to the case of correlation modeling. In addition to our main results on correlations, we also present some auxiliary results on the use of different criteria for distributional matching, thus allowing for a more "automated" application of the NoVaS methodology. Furthermore, we apply NoVaS to portfolio analysis. A referee has suggested that we consider non-linear types of correlations, asymmetric distributions for the returns and other types of portfolios. These are well-suited topics for future research, especially the extension to non-linear types of association, but such an extension is beyond the scope of the current paper. The assumption of asymmetric distributions for the returns is innocuous in the current context: the main idea here is a transformation to normality no matter what the original return distribution might be.
The related literature on conditional correlation modeling is focused on finding parsimonious, easy-to-optimize, parametric and semi-parametric representations of volatilities and correlations, and on approaches that can handle the presence of excess kurtosis in asset returns. Early references for parametric multivariate models of volatility and correlation include Bollerslev, Engle and Wooldridge [9] (the VEC model), Bollerslev [10] (the constant conditional correlation, CCC, model), Engle et al. [11] (the BEKK model), and Pelletier and Silvennoinen et al. [12-15] (extensions of the CCC-GARCH). For an alternative Bayesian treatment of GARCH models see Vrontos et al. [16]. Engle [17] introduced the popular dynamic conditional correlation (DCC) model, which was extended and generalized by various authors: see, among others, Sheppard [18] and Hafner and Franses [19]. Independently, Tse and Tsui [20] developed the VC-GARCH. For a review of the class of multivariate GARCH-type models see Bauwens et al. [21] and Silvennoinen and Teräsvirta [14], and for a review of volatility and correlation forecast evaluation see Patton and Sheppard [22]. A recent paper linking the BEKK and DCC models is Caporin and McAleer [23].
Part of the literature treats the problem in a semi-parametric or non-parametric manner, as in Long and Ullah [24] and Hafner et al. [25]. Ledoit et al. [1] and Palandri [2] propose simplifications to the modeling process, at both the parametrization and the optimization level.
The NoVaS approach we present in this paper also has some similarities with copula-based modeling where the marginal distributions of standardized returns are specified and then joined to form a multivariate distribution; for applications in the current context, see Jondeau and Rockinger [26] and Patton [27], and see Andersen et al. [28] for the realized correlation measures.
Finally, although time-series correlation modeling originated in the econometrics literature for financial applications, it recently caught on in the neuroscience literature after Lindquist et al. [29] presented DCC to the neuroimaging community. Some very recent and related papers in this strand of the literature, some of which propose robust methods, are those of Warnick et al. [30], John et al. [31], Lee and Kim [32,33], Behboundi and Farnoosh [34] and Faghiri et al. [35].

The rest of the paper is organized as follows: in Section 2, we briefly review the general development of the NoVaS approach; in Section 3, we present the new results on NoVaS-based modeling and forecasting of correlations; in Section 4, we present a proposal for "model" selection in the context of NoVaS; in Section 5, we present some limited simulation results and a possible application of the methodology in portfolio analysis, while in Section 6, we present an illustrative empirical application; Section 7 offers some concluding remarks.

Review of the NoVaS Methodology
In this section we present a brief overview of the univariate NoVaS methodology: the NoVaS transformation, the implied NoVaS distribution and the methods for distributional matching. For brevity we do not review the NoVaS volatility forecasting methodology, which can be found along with additional discussion in Politis and Thomakos [8].

NoVaS Transformation and Implied Distribution
Consider a zero-mean, strictly stationary time series {X_t}_{t∈Z} corresponding to the returns of a financial asset. We assume that the basic properties of X_t correspond to the 'stylized facts' of financial returns (departures from these 'stylized facts' have been discussed in [7,8]):
1. X_t has a non-Gaussian, approximately symmetric distribution that exhibits excess kurtosis.
2. X_t has time-varying conditional variance (volatility), denoted by h_t^2.
3. X_t is dependent, although it possibly exhibits low or no autocorrelation, which suggests possible nonlinearity.
The first step in the NoVaS transformation is variance stabilization to address the time-varying conditional variance of the returns. We construct an empirical measure of the time-localized variance of X_t based on the information set F_{t|t-p} def= {X_t, X_{t-1}, ..., X_{t-p}}:

γ_t def= G(F_{t|t-p}; α, a), γ_t > 0, (1)

where α is a scalar control parameter, a def= (a_0, a_1, ..., a_p)' is a (p + 1) × 1 vector of control parameters, and G(·; α, a) is to be specified. The function G(·; α, a) can be expressed in a variety of ways, using a parametric or a semi-parametric specification. For parsimony, assume that G(·; α, a) is additive and takes the following form:

γ_t = α s_{t-1} + ∑_{j=0}^{p} a_j g(X_{t-j}), (2)

with the implied restrictions (to maintain positivity for γ_t) that α ≥ 0, a_j ≥ 0, g(·) > 0, and a_p ≠ 0 for identifiability. The "natural" choices for g(z) are g(z) = z^2 or g(z) = |z|. With these designations, our empirical measure of the time-localized variance becomes a combination of an unweighted, recursive estimator s_{t-1} of the unconditional variance of the returns σ^2 = E[X_1^2] (or of the mean absolute deviation of the returns δ = E|X_1|), and a weighted average of the current and the past p values of the squared or absolute returns. The necessity and advantages of including the current value are elaborated upon by Politis [3-6,36-38].
Using g(z) = z 2 results in a measure that is reminiscent of an ARCH(p) model which was employed in Politis [3,4,36]. The use of absolute returns, i.e., g(z) = |z| has also been advocated for volatility modeling; see, e.g., Ghysels and Forsberg [39] and the references therein. Robustness in the presence of outliers is an obvious advantage of absolute vs. squared returns. In addition, note that the mean absolute deviation is proportional to the standard deviation for the symmetric distributions that will be of current interest. The practical usefulness of the absolute value measure was demonstrated also in Politis and Thomakos [7,8].
The second step in the NoVaS transformation is to use γ_t in constructing a studentized version of the returns, akin to the standardized innovations in the context of a parametric (e.g., GARCH-type) model. Consider the series W_t defined as:

W_t def= X_t / φ(γ_t), (3)

where φ(z) is the time-localized standard deviation that is defined relative to our choice of g(z), i.e., φ(z) = √z when g(z) = z^2 and φ(z) = z when g(z) = |z|. The aim now is to choose the NoVaS parameters in such a way as to make W_t follow as closely as possible a chosen target distribution that is easier to work with. The natural choice for such a distribution is the normal, hence the 'normalization' in the NoVaS acronym; other choices (such as the uniform) are also possible in applications, although perhaps not as intuitive; see, e.g., Politis and Thomakos [7,8]. Note, however, that the uniform distribution is far easier to work with in both the univariate and multivariate context.

Remark 1. The above distributional matching should not only focus on the first marginal distribution of the transformed series W_t. Rather, the joint distributions of W_t should be normalized as well; this can be accomplished by attempting to normalize linear combinations of the form W_t + λW_{t-k} for different values of the lag k and the weight parameter λ; see, e.g., Politis [3,4,36]. For practical applications it appears that the distributional matching of the first marginal distribution is quite sufficient.
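As an illustration of the two transformation steps, the following sketch studentizes a simulated heavy-tailed series using 'simple' equal weights and g(z) = z^2; the simulated series, the order p and the weights are illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_t(df=5, size=500)      # heavy-tailed toy "returns"

p = 10
a = np.ones(p + 1) / (p + 1)            # 'simple' NoVaS weights, alpha = 0
g = x**2                                # g(z) = z^2
gamma = np.array([np.dot(a, g[t - p:t + 1][::-1])   # time-localized variance
                  for t in range(p, len(x))])
w = x[p:] / np.sqrt(gamma)              # studentized series W_t

# Since a_0 > 0 and X_t itself enters gamma_t, W_t is bounded:
# |W_t| <= 1/sqrt(a_0); this is what tames the heavy tails.
```

The bound on |W_t| is also why a truncated target distribution appears below: the studentized series can never exceed 1/√a_0 in absolute value.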
A related idea is the notion of an implied model that is associated with the NoVaS transformation, put forth by Politis [36,37] for the univariate and for the multivariate case respectively. For example, solving for X_t in (3), and using the fact that γ_t depends on X_t, it follows that:

X_t = U_t A_{t-1}, (4)

where (corresponding to using squared returns, g(z) = z^2) the two terms on the right-hand side above are given by

U_t = W_t / √(1 - a_0 W_t^2) (5)

and

A_{t-1} = √( α s_{t-1} + ∑_{j=1}^{p} a_j X_{t-j}^2 ), (6)

with analogous expressions when absolute returns are used. If one postulates that the U_t are i.i.d. according to some desired distribution, then (4) becomes a bona fide model. In particular, when g(z) = z^2, then (4) is tantamount to an ARCH(p) model. For example, if the distribution of U_t is the one implied by (4) with W_t having a (truncated) normal distribution, then (4) is the model that is 'associated' with NoVaS. Details on the exact form and probabilistic properties of the resulting implied distributions for U_t for all four combinations of target distributions (normal and uniform) and variance estimates (squared and absolute returns) are available on request.

Remark 2. Equation (4) can not only be viewed as an implied model; it also gives us a backwards transformation from W_t back to X_t. Assuming we have transformed our series X_t and are now working with W_t, we can recapture X_t, for example in the case of g(z) = z^2, by:

X_t = W_t √( ( α s_{t-1} + ∑_{j=1}^{p} a_j X_{t-j}^2 ) / ( 1 - a_0 W_t^2 ) ). (7)

This is going to be useful in later parts of this work.
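The invertibility of the transformation (for g(z) = z^2 and α = 0) can be checked numerically; the weights and values below are arbitrary illustrative numbers:

```python
import numpy as np

a = np.array([0.4, 0.3, 0.3])     # (a_0, a_1, a_2), summing to one, alpha = 0
x_hist = np.array([1.2, -0.8])    # X_{t-1}, X_{t-2}
x_t = 0.5                         # the value we will recover

# forward: gamma_t and W_t
gamma = a[0] * x_t**2 + a[1] * x_hist[0]**2 + a[2] * x_hist[1]**2
w_t = x_t / np.sqrt(gamma)

# backward: X_t = W_t * sqrt((a_1 X_{t-1}^2 + a_2 X_{t-2}^2) / (1 - a_0 W_t^2))
rest = a[1] * x_hist[0]**2 + a[2] * x_hist[1]**2
x_back = w_t * np.sqrt(rest / (1 - a[0] * w_t**2))
```

The algebra is exact: squaring the studentization equation and solving for X_t^2 isolates the a_0 X_t^2 term, which produces the 1 - a_0 W_t^2 denominator.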

Weight Selection
We next turn to the issue of optimal selection (calibration) of the NoVaS parameters. The objective is to achieve the desired distributional matching with as few parameters as possible (parsimony). The free parameters are p (the NoVaS order) and (α, a). The parameters α and a are constrained to be nonnegative to ensure positivity of the variance. In addition, motivated by unbiasedness considerations, Politis [3,4,36] suggested the convexity condition α + ∑_{j=0}^{p} a_j = 1. Finally, thinking of the coefficients a_j as local smoothing weights, it is intuitive to assume a_i ≥ a_j for i < j.
We discuss the case when α = 0; see Politis and Thomakos [7,8] for the case of α ≠ 0. The simplest scheme that satisfies the above conditions is equal weighting, that is, a_j = 1/(p + 1) for all j = 0, 1, ..., p. These are the 'simple' NoVaS weights proposed in Politis [3,4,36]. An alternative, allowing relatively greater weight to be placed on the most recent lags, is to consider exponential weights of the form:

a_j = c' e^{-bj}, for j = 0, 1, ..., p,

where b > 0 is the decay rate and c' a normalizing constant; these are the 'exponential' NoVaS weights proposed in Politis [3,4,36]. In the exponential NoVaS, p is chosen as a big enough number such that the weights a_j are negligible for j > p. Both the 'simple' and the 'exponential' NoVaS require the calibration of two parameters: a_0 and p for the 'simple', and a_0 and b for the 'exponential'. Nevertheless, the exponential weighting scheme allows for greater flexibility and will be our preferred method. In this connection, let θ def= (p, b) denote the free parameters, which determine (α, a), and denote the studentized series by W_t ≡ W_t(θ) rather than W_t ≡ W_t(α, a). For any given value of the parameter vector θ we need to evaluate the 'closeness' of the marginal distribution of W_t to the target distribution. To do this, an appropriately defined objective function is needed, as discussed in the next subsection.
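The construction of the exponential weights can be sketched as follows, with the truncation point p chosen so that the discarded weights fall below a tolerance; the helper name, the tolerance and the normalization detail are our own illustrative choices:

```python
import numpy as np

def exp_novas_weights(b, tol=1e-6, alpha=0.0):
    """'Exponential' NoVaS weights a_j proportional to e^(-b*j), truncated at
    the p where they become negligible, and normalized so that
    alpha + sum_j a_j = 1 (the convexity condition)."""
    p = int(np.ceil(-np.log(tol) / b))      # smallest p with e^(-b*p) <= tol
    a = np.exp(-b * np.arange(p + 1))
    return (1.0 - alpha) * a / a.sum()

a = exp_novas_weights(b=0.3)
```

A larger rate b concentrates the weight on the most recent observations and shortens the effective memory p of the scheme.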

Objective Functions for Optimization
To evaluate whether the distributional matching to the target distribution has been achieved, many different objective functions could be used. For example, one could use moment-based matching (e.g., kurtosis matching as originally proposed by Politis [3,4,36]), or complete distributional matching via a goodness-of-fit statistic, such as the Kolmogorov-Smirnov statistic, the quantile-quantile correlation coefficient (a Shapiro-Wilk type of statistic) and others. All these measures are essentially distance-based, and the optimization will attempt to minimize the distance between empirical (sample) and target values.
Consider the simplest case first, i.e., moment matching. Assuming that the data are approximately symmetrically distributed and only have excess kurtosis, one first computes the sample excess kurtosis of the studentized returns as:

K_n(θ) def= [ n^{-1} ∑_t (W_t - W̄_n)^4 ] / s_n^4 - κ*,

where W̄_n denotes the sample mean and s_n^2 def= n^{-1} ∑_t (W_t - W̄_n)^2 denotes the sample variance of the W_t(θ) series, and κ* denotes the theoretical kurtosis coefficient of the target distribution. For the normal distribution κ* = 3, whereas for the uniform κ* = 9/5.
The objective function for this case can be taken to be the absolute value, i.e., D n (θ) def = |K n (θ)|, and one would adjust the values of θ so as to minimize D n (θ). As noted by Politis [3,4,36] such an optimization procedure will always have a solution in view of the intermediate value theorem. To see this, note that when p = 0, a 0 must equal 1, and thus W t = sign(X t ) that corresponds to K n (θ) < 0 for any choice of the target distribution. On the other hand, for large values of p we expect that K n (θ) > 0, since it is assumed that the data have large excess kurtosis. Therefore, there must be a value of θ that will make the sample excess kurtosis approximately equal to zero. Politis [3,4,36] describes a suitable algorithm that can be used to optimize D n (θ).
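The kurtosis-matching optimization can be sketched as a one-dimensional search, here over the order p of simple NoVaS on simulated Student-t returns; all choices below (series, range of p, target) are illustrative:

```python
import numpy as np

def studentized_excess_kurtosis(x, p, target_kurt=3.0):
    # studentize with 'simple' equal weights and g(z) = z^2, then return K_n
    a = np.ones(p + 1) / (p + 1)
    g = x**2
    gamma = np.array([np.dot(a, g[t - p:t + 1][::-1]) for t in range(p, len(x))])
    w = x[p:] / np.sqrt(gamma)
    wc = w - w.mean()
    return (wc**4).mean() / (wc**2).mean() ** 2 - target_kurt

rng = np.random.default_rng(1)
x = rng.standard_t(df=4, size=2000)     # simulated heavy-tailed returns

# D_n(p) = |K_n|: small p gives bounded, platykurtic W_t (K_n < 0), large p
# leaves excess kurtosis (K_n > 0), so some intermediate p nearly matches.
D = {p: abs(studentized_excess_kurtosis(x, p)) for p in range(1, 31)}
p_star = min(D, key=D.get)
```

This mirrors the intermediate-value argument in the text: the objective changes sign as p grows, so a near-zero value of D_n exists along the grid.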
Alternative specifications for the objective function that we have successfully used in previous applied work include the QQ-correlation coefficient and the Kolmogorov-Smirnov statistic. The first is easily constructed as follows. For any given value of θ compute the order statistics W_(t), W_(1) ≤ W_(2) ≤ ... ≤ W_(n), and the corresponding quantiles of the target distribution, say Q_(t), obtained from the inverse cdf. The squared correlation coefficient R^2(θ) in the simple regression on the pairs (Q_(t), W_(t)) is a measure of distributional goodness of fit, and corresponds to the well-known Shapiro-Wilk test for normality when the target distribution is the standard normal. We now have that:

D_n(θ) def= 1 - R^2(θ).

In a similar fashion one can construct an objective function that is based on the Kolmogorov-Smirnov statistic as:

D_n(θ) def= sup_x | F_W(x) - F(x) |,

where F_W(x) and F(x) are two cumulative distribution functions; F_W(x) is the empirical distribution of the W_t data and F(x) is the target distribution. Note that for any choice of the objective function we have that D_n(θ) ≥ 0, and the optimal values of the parameters are clearly determined by the condition:

θ*_n def= argmin_θ D_n(θ),

with the final studentized series given by W*_t ≡ W_t(θ*_n).
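Both alternative objectives are straightforward to compute with the standard library's normal distribution; the function names below are our own, and a normal target is assumed:

```python
import numpy as np
from statistics import NormalDist

def qq_objective(w):
    # 1 minus the squared QQ-correlation against standard-normal quantiles
    n = len(w)
    ws = np.sort(np.asarray(w, float))
    probs = (np.arange(1, n + 1) - 0.5) / n
    q = np.array([NormalDist().inv_cdf(p) for p in probs])
    return 1.0 - np.corrcoef(q, ws)[0, 1] ** 2

def ks_objective(w):
    # sup_x |F_W(x) - Phi(x)|, with F_W the empirical cdf of the data
    n = len(w)
    ws = np.sort(np.asarray(w, float))
    cdf = np.array([NormalDist().cdf(v) for v in ws])
    return max(np.max(np.arange(1, n + 1) / n - cdf),
               np.max(cdf - np.arange(0, n) / n))

rng = np.random.default_rng(2)
z = rng.standard_normal(1000)          # already matches the normal target
y = rng.standard_t(df=3, size=1000)    # heavy-tailed: should fit worse
```

Both objectives are nonnegative and shrink toward zero as the sample approaches the target, so either can be plugged into the argmin condition above.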

Remark 3.
While the above approach is theoretically and empirically suitable for achieving distributional matching in a univariate context, the question of its suitability in a multivariate context naturally arises. For example, why not use a multivariate version of a kurtosis statistic (e.g., Mardia [40], Wang and Serfling [41]) or a multivariate normality statistic (e.g., Royston [42], Villasenor-Alva and Gonzalez-Estrada [43])? This is certainly possible, and follows along the same arguments as above. However, it also means that multivariate numerical optimization (over a unit hyperplane) would need to be used, thus making the multivariate approach unattractive for large-scale problems. Our preferred method is to perform univariate distributional matching for the individual series and then model their correlations, as we show in the next section.

Multivariate NoVaS & Correlations
We now turn to multivariate NoVaS modeling. Our starting point is similar to that of many other correlation modeling approaches in the literature. In a parametric context one first builds univariate models for the volatilities and then uses the fitted volatility values to standardize the returns, the standardized returns then being used to build a model for the correlations. Our approach is similar. The first step is to use the univariate NoVaS transformation to obtain the (properly aligned) studentized series W*_{t,i} and W*_{t,j} for a pair of returns (i, j). There are two main advantages to the use of NoVaS in the present context: (a) the individual volatility series are potentially more accurate, since there is no problem of parametric misspecification, and (b) there is only one univariate optimization per pair of returns analyzed. To fix ideas, first recall that the studentized return series use information up to and including time t. Note that this is different from the standardization used in the rest of the literature, where the standardization is made from the model, not from the data, i.e., from X_t/A_{t-1} in the present notation. This allows us to use the time-t information when computing the correlation measure.
We start by giving a definition concerning the product of two series.

Definition 1. Consider a pair (i, j) of studentized returns W*_{t,i} and W*_{t,j}, which have been scaled to zero mean and unit variance, and let Z_t(i, j) ≡ Z_t def= W*_{t,i} W*_{t,j} denote their product.
The unconditional mean E[Z_t] = ρ_{ij} is the constant correlation coefficient between the returns and can be consistently estimated by the sample mean of Z_t as ρ̄_n def= (1/n) ∑_t Z_t, while the conditional mean E[Z_t | F_{t-s}] = ρ_{t|t-s}, for s = 0, 1, is the conditional correlation coefficient between the returns. For the case that s = 0 the expectation operator is formally redundant, but see Equation (13) and the discussion around it.
The unconditional correlation can be estimated by the sample mean of the Z_t. The remaining task is therefore to propose a suitable form for the conditional correlation and to estimate its parameters. To stay in line with the "model-free" spirit of this paper, when choosing a method to estimate the conditional correlation we opt for parsimony, computational simplicity and compatibility with other models in the related literature. The easiest scheme is exponential smoothing, as in Equation (14) below, which can be compactly represented as the following autoregressive model:

ρ_{t|t-s} = λ ρ_{t-1|t-1-s} + (1 - λ) Z_{t-s}, (13)

and can therefore be estimated by:

ρ̂_{t|t-s} = (1 - λ) ∑_{k=0}^{L} λ^k Z_{t-s-k}, (14)

for s = 0, 1, with λ ∈ (0, 1) the smoothing parameter and L a (sufficiently high) truncation parameter. This is of the form of a local average, so different weights can be applied. An alternative, more general formulation could, for example, be as follows:

ρ_{t|t-s} = A(B) Z_{t-s}, A(B) def= ∑_{k=0}^{L} a_k B^k,

with B the backshift operator. Choosing exponential weights, as in the univariate NoVaS, we have a_k = (1 - λ)λ^k, up to truncation. For any specification similar to the above, we can impose an "unbiasedness" condition (similar to other models in the literature) where the mean of the conditional correlation matches the unconditional correlation, i.e., the weights a_k are normalized to sum to one so that ρ̂_{t|t-s} averages to ρ̄_n. Note what exactly is implied by the use of s = 0 in the context of Equation (13): the correlation is still conditional, but it now uses data up to and including time t. Both the s = 0 and s = 1 options can be used in applications with little difference in their in-sample performance; their out-of-sample performance needs to be further investigated.
Other specifications are, of course, possible, but they would entail additional parameters and move us away from the NoVaS smoothing approach. For example, at the expense of one additional parameter we could account for asymmetries in the correlation in a standard fashion, by adding to the smoother a term driven by d_{t-s} def= I(Z_{t-s} < 0), the indicator function for negative products of returns. Finally, to ensure that the estimated correlations lie within [-1, 1], it is convenient to work with an (optional) scaling condition, such as the Fisher transformation and its inverse. For example, we can model the transformed series:

y_{t|t-s} def= (1/2) log[ (1 + ρ_{t|t-s}) / (1 - ρ_{t|t-s}) ],

and then transform and recover the correlations from the inverse transformation:

ρ_{t|t-s} = [ exp(2 y_{t|t-s}) - 1 ] / [ exp(2 y_{t|t-s}) + 1 ].

What is left to do is to estimate λ. In the following, we propose two approaches. One involves maximum likelihood estimation and is based on the distribution of the product of the two studentized series. The other is more in line with the "model-free" spirit of the NoVaS approach and uses cross-validation.
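The exponential smoother of the product series, together with the optional Fisher-transform scaling, can be sketched as follows; the simulated product series and the parameter values are illustrative:

```python
import numpy as np

def smoothed_rho(z, lam, L):
    # rho_hat_{t|t-1} = (1 - lam) * sum_{k=0}^{L} lam^k * Z_{t-1-k}
    weights = (1 - lam) * lam ** np.arange(L + 1)
    return np.array([np.dot(weights, z[t - 1 - np.arange(L + 1)])
                     for t in range(L + 1, len(z))])

def fisher(r):
    # Fisher transformation; tanh is its inverse
    return 0.5 * np.log((1 + r) / (1 - r))

rng = np.random.default_rng(3)
w = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], size=2000)
z = w[:, 0] * w[:, 1]                   # product series with E[Z_t] = 0.3

rho = smoothed_rho(z, lam=0.97, L=150)
# optional scaling: map to the Fisher scale and back so values stay in (-1, 1)
rho_scaled = np.tanh(fisher(np.clip(rho, -0.99, 0.99)))
```

With constant true correlation the smoothed path fluctuates around the unconditional value 0.3, and the Fisher/tanh round trip guarantees the output remains a valid correlation.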

Maximum Likelihood Estimation
We first summarize some interesting properties concerning the product of two studentized series in the following proposition.

Proposition 1. With Definition 1, and under the assumptions of strict stationarity and distributional matching, the following holds.
1. Assuming that both studentized series were obtained using the same target distribution, the (conditional or unconditional) density function of Z_t can be obtained from the result of Rohatgi (1976) and has the generic form of:

f_Z(z) = ∫_{-∞}^{∞} |w|^{-1} f_{W_i,W_j}(w, z/w) dw,

where f_{W_i,W_j}(·,·) is the joint density of the studentized series. In particular: (a) If the target distribution is normal, and using the unconditional correlation ρ, the density function of Z_t is given by Craig (1936) and has the form f_Z(z; ρ) = I_1(z; ρ) - I_2(z; ρ), where I_1(z; ρ) is the integral of w^{-1} f_{W_i,W_j}(w, z/w; ρ) over the interval (0, ∞) and I_2(z; ρ) is the integral of the same function over the interval (-∞, 0). Note that the result in Craig (1936) is for the normal, not the truncated normal distribution; however, the truncation involved in NoVaS has a negligible effect on the validity of the result. (b) If the target distribution is uniform, and again using the unconditional correlation ρ, the density function of Z_t can be derived using the Karhunen-Loeve transform and is given (apart from a constant) in closed form. In this case, and in contrast to the previous case with a truncated normal target distribution, the result obtained is exact.
2. A similar result as in 1 above holds when we use the conditional correlation ρ t|t−s , for s = 0, 1.

Remark 4.
Proposition 1 allows for a straightforward interpretation of unconditional and conditional correlation using NoVaS transformations on the individual series. Moreover, note how we can make use of the distributional matching, based on the marginal distributions, to form an explicit likelihood for the product of the studentized series; this is different from the copula-based approach to correlation modeling, where one goes from the marginal distributions to a joint distribution; the joint distribution is simply not needed in the NoVaS context. We can now use the likelihood function of the product Z_t to obtain an estimate of λ, as in Equation (13).
Given the form of the conditional correlation function, the truncation parameter L and the above transformation, the smoothing parameter λ is estimated by maximum likelihood as:

λ̂_n def= argmax_{λ∈(0,1)} ∑_t log f_Z( Z_t; ρ_{t|t-s}(λ) ).

Remark 5. Even though we do not need the explicit joint distribution of the studentized series, we still need to know the distribution of the product. Furthermore, using an MLE approach makes the multivariate NoVaS "semi-parametric". We therefore introduce a second method, based on cross-validation, which does not require knowledge of the distribution and hence keeps NoVaS model-free.
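Under a normal target, the Craig (1936) density can be used to recover a constant correlation by maximum likelihood. The sketch below evaluates the Bessel function K_0 by numerical integration so that only numpy is needed; the closed form of the density stated in the comment is our assumption (the well-known product-normal density), not a formula quoted from the paper:

```python
import numpy as np

def bessel_k0(x):
    # K_0(x) = integral_0^infinity exp(-x*cosh(t)) dt, via the trapezoid rule
    t = np.linspace(0.0, 12.0, 2001)
    return np.trapz(np.exp(-np.outer(x, np.cosh(t))), t, axis=1)

def craig_loglik(z, rho):
    """Log-likelihood of products of correlated standard normals, assuming the
    density f(z; rho) = exp(rho*z/(1-rho^2)) * K_0(|z|/(1-rho^2))
    / (pi * sqrt(1-rho^2))."""
    s = 1.0 - rho**2
    return float(np.sum(rho * z / s + np.log(bessel_k0(np.abs(z) / s))
                        - 0.5 * np.log(s) - np.log(np.pi)))

# recover a constant correlation by maximizing the likelihood on a grid
rng = np.random.default_rng(4)
rho_true = 0.5
w = rng.multivariate_normal([0, 0], [[1, rho_true], [rho_true, 1]], size=2000)
z = w[:, 0] * w[:, 1]

grid = np.linspace(-0.9, 0.9, 37)
rho_hat = float(grid[np.argmax([craig_loglik(z, r) for r in grid])])
```

In the conditional case one would replace the constant ρ by ρ_{t|t-s}(λ) inside the sum and search over λ instead.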

Cross Validation (CV)
Our aim in this subsection is to find an estimate for λ as in Equation (13), without using a maximum likelihood method, and without using the density of Z t . We instead use an error minimization procedure. We start by suggesting different objective functions which we then compare for suitability.
We still use Equation (13), but without knowing the density of Z_t we can only rely on the data. In the following, we define an objective function Q(λ) which describes how well a given λ is globally suited to describe the conditional correlation. Q(λ) is then minimized with respect to λ, in order to find the optimal λ in Equation (13) for capturing the conditional correlation.
• CV 1. A first intuitive approach is to define the objective function through the squared distance between the products Z_t and the fitted conditional correlations ρ̂_{t|t-1}.
• CV 2. Assume we observe the series X_{j,1}, X_{j,2}, ..., X_{j,T}, X_{j,T+1}, ..., X_{j,n} and transform them individually with univariate NoVaS to get the studentized series. Assuming we used NoVaS with a normal target distribution, due to the properties of the multivariate normal distribution, the best estimator for W_{i,T+1} given W_{j,T+1} is:

Ŵ_{i,T+1} = ρ̂_{T+1|T} W_{j,T+1}. (22)

Assuming now that we furthermore observe X_{i,T+1}, ..., X_{i,n}, and therefore the entire series X_{i,1}, ..., X_{i,n} and X_{j,1}, ..., X_{j,n}, we can use the estimates Ŵ_{i,k+1} with k = T, ..., n - 1 as in Equation (22) to get to the objective function:

Q(λ) = ∑_{t=T+1}^{n} ( W_{i,t} - Ŵ_{i,t} )^2. (23)

In this context, T should be chosen large enough, in order to guarantee that the estimate of the conditional correlation in Equation (13) has enough data to work with. For practical implementation, we use T ≈ n/4.
• CV 3. To account for the symmetry of the correlation, one can add to the term in Equation (23) the symmetric term ∑_{t=T+1}^{n} ( W_{j,t} - Ŵ_{j,t} )^2 to get to an objective function that sums both squared errors.
• CV 4. Remaining in the same state of mind as for Methods 2 and 3, one might think that ρ_{t|t-1} should rather describe the dependency between X_{i,t} and X_{j,t} than between W_{i,t} and W_{j,t}. One could therefore argue that it would be more sensible to use (X_{j,t} - X̂_{j,t}) as an error. Still, to get to X̂_{j,t}, one has to go through Ŵ_{j,t}, which we get by applying Equation (22); one can then use the inverse transformation discussed in Equation (7) to map Ŵ_{j,t} to X̂_{j,t}, and once again define the objective error function as a sum of squared errors in the returns.
• CV 5. With the same motivation as in Method 3, i.e., to account for the symmetry of the correlation, one could add the corresponding symmetric term in the returns as well.
• CV 6. With the motivation of capturing the correct sign of the correlation, one can define an objective function that gets larger if the sign of the correlation at time point t is not predicted correctly. More formally, we define a 0-1 loss function L(t) that equals one whenever the sign of Ŵ_{i,t} differs from the sign of W_{i,t}, for t = T + 1, ..., n, and with Ŵ_{i,t} defined as in Equation (22).
Our objective error function is then the sum of these losses over t = T + 1, ..., n. No matter which of the six methods is used, the goal will in every case be to choose λ̂ as:

λ̂ def= argmin_{λ∈(0,1)} Q(λ).

Using this estimate in Equation (13) then yields the captured correlation ρ̂_{t|t-s}.

Remark 6. Note, however, that the captured correlation is first of all the correlation between the series W_{t,i} and W_{t,j}, while we are ultimately interested in the correlation between X_{t,i} and X_{t,j}. To be more precise, we have an estimate ρ̂_{t|t-s,W} for the conditional correlation of (W_{t,i}, W_{t,j}), whereas what we would like to get is an estimate ρ̂_{t|t-s,X} for the conditional correlation of (X_{t,i}, X_{t,j}). With Equation (7), in the case of g(z) = z^2 the returns are nonlinear functions of the studentized series, so analytic computation of this correlation is difficult; it is more sensible to use the i.i.d. structure of the (W_{t,i}, W_{t,j}). Assuming a normal target distribution for the NoVaS transformation, we can sample from the bivariate normal distribution of (W_{t,i}, W_{t,j}) with unit variances and correlation ρ̂_{t|t-s,W}. We then transform the sampled i.i.d. pairs (W_{t,i}, W_{t,j}) back to (X̂_{t,i}, X̂_{t,j}) using the backwards transformation in Equation (25). Doing that, we can for every t construct an empirical distribution of the (X_{t,i}, X_{t,j}), which we then use to compute ρ̂_{t|t-s,X}, using again Equation (14).
Interestingly, practical applications show that the captured correlation ρ̂_{t|t-s,W} barely differs from ρ̂_{t|t-s,X}. This might be due to the fact that, at least in the case of a normal target, the implied distribution of the returns is actually bell-shaped, albeit with heavy tails. We are still investigating why this empirical finding also holds for a uniform target.
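A CV-type selection of λ in the spirit of method CV 2 can be sketched as follows: predict W_{i,t} by ρ̂_{t|t-1} W_{j,t} and pick the λ minimizing the post-T squared error; the smoother, the grid of λ values and the simulated data are illustrative:

```python
import numpy as np

def rho_hat(z, lam, t, L=200):
    # exponential-smoothing estimate of rho_{t|t-1} from the product series Z
    k = np.arange(L + 1)
    return (1 - lam) * np.dot(lam**k, z[t - 1 - k])

def cv_objective(wi, wj, lam, T):
    """Mean squared error of predicting W_{i,t} by rho_hat_{t|t-1} * W_{j,t}
    for t = T, ..., n-1 (a sketch of the CV 2 criterion)."""
    z = wi * wj
    errs = [(wi[t] - rho_hat(z, lam, t) * wj[t])**2 for t in range(T, len(wi))]
    return float(np.mean(errs))

rng = np.random.default_rng(5)
rho = 0.7
w = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=1500)
wi, wj = w[:, 0], w[:, 1]

lams = np.linspace(0.5, 0.99, 25)
scores = [cv_objective(wi, wj, lam, T=len(wi) // 4) for lam in lams]
lam_star = float(lams[int(np.argmin(scores))])
```

With a constant true correlation, heavier smoothing (larger λ) reduces the variance of ρ̂ and therefore the prediction error, so the selected λ should sit toward the upper end of the grid; under genuinely time-varying correlation the trade-off reverses.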

Going from the Bivariate Paradigm to a Fully Multivariate Setting
All the discussion in Section 3 has focused on estimating the conditional correlation coefficient of a pair (i, j) of studentized returns W * t,i and W * t,j ; the notationρ t|t−s and ρ t|t−s was used for the estimator and estimand respectively.
Of course, the realistic challenge is to construct an estimate of the conditional correlation coefficient matrix-denoted R_{t|t-s}-that is associated with a d-dimensional multivariate series, i.e., a vector series that at time t equals (W*_{t,1}, ..., W*_{t,d})'. Our proposal is to construct a matrix estimator R̂_{t|t-s} whose (i, j)-th element is given by the aforementioned conditional correlation coefficient of returns W*_{t,i} and W*_{t,j}. Hence, for each pair (i, j), we obtain an optimal estimator of the (i, j) entry of R_{t|t-s} that involves its own optimal smoothing parameter λ. For example, suppose that estimating the correlation between series 1 and 2 requires a small λ whereas for the correlation between series 3 and 4 it would be better to use a large λ. Using the same smoothing parameter λ for all the entries of the matrix could be grossly suboptimal.
Our procedure constructs a matrix estimator that is optimized entry-by-entry; the disadvantage is that the estimatorR t|t−s will not necessarily be a nonnegative definite matrix. Here, we will adopt the philosophy of Politis [2011], i.e., start with a matrix estimator that is optimized entry-by-entry, and adjust its eigenvalues to make it nonnegative definite.
Although the eigenvalue correction can be performed on the correlation matrix estimatorR t|t−s , it is not guaranteed that the result will be a correlation matrix despite being nonnegative definite. Hence, it is more expedient to do the eigenvalue correction on an estimate of the covariance matrix instead, and then turn the corrected covariance matrix estimator into a new correlation matrix estimator.
The complete algorithm is outlined in what follows.
1. Construct R̂_{t|t−s}, the estimator of the d × d correlation matrix R_{t|t−s}. The construction is carried out using entry-by-entry optimized estimators, each of which is constructed according to one of the methods described in the previous subsections of Section 3.
2. Turn the correlation matrix estimator R̂_{t|t−s} into a covariance matrix estimator (denoted Ĉ_{t|t−s}) by multiplying each correlation entry by the two respective volatilities, i.e., the square roots of the local variance estimates. In other words, the (i, j) entry of Ĉ_{t|t−s} consists of the (i, j) entry of R̂_{t|t−s} multiplied by the product of the volatility estimate of series i and the volatility estimate of series j.

Note that Step 6 of the above algorithm implies that the square root of the (i, i) entry of Ĉ*_{t|t−s} is to be treated as a new volatility estimate of series i for the purposes of multivariate modeling. However, this volatility estimate differs little from the standard one obtained from univariate modeling, e.g., the one used in Step 2 of the above algorithm. The reason is that the eigenvalue adjustment towards nonnegative definiteness is just a small-sample correction. The estimators R̂_{t|t−s} and R̂*_{t|t−s} perform identically in large samples; see Politis [2011] for a discussion in a related context.
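The eigenvalue correction at the heart of the algorithm can be sketched as follows. This is an illustrative implementation, not the authors' code; the symmetrization step and the simple clipping of negative eigenvalues to zero are our own choices:

```python
import numpy as np

def correct_to_correlation(C):
    """Eigenvalue-correct a symmetric covariance-matrix estimate C that
    was assembled entry-by-entry (and may therefore fail to be
    nonnegative definite), then rescale it back to a correlation matrix."""
    C = (C + C.T) / 2.0                     # enforce exact symmetry
    vals, vecs = np.linalg.eigh(C)
    vals = np.clip(vals, 0.0, None)         # zero out negative eigenvalues
    C_star = vecs @ np.diag(vals) @ vecs.T  # corrected covariance estimate
    d = np.sqrt(np.diag(C_star))            # new volatility estimates
    R_star = C_star / np.outer(d, d)        # back to a correlation matrix
    np.fill_diagonal(R_star, 1.0)
    return R_star
```

Because the rescaling is a congruence transformation, the corrected matrix remains nonnegative definite while regaining unit diagonal.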

Model Selection
The NoVaS methodology offers many different combinations for constructing the volatility measures and performing the distributional matching. One can mix squared and absolute returns, uniform and normal marginal target distributions, different matching functions (kurtosis, QQ-correlation and the KS-statistic) and different cross-validation methods to capture the conditional correlation. In applications one can either proceed by careful examination of the properties of the individual series and then use a particular NoVaS combination, or one can perform some kind of "model selection" by searching across the different combinations and selecting the one that gives the best results. In the univariate case, the best "results" were defined by the closest distributional match; details and empirical analysis are given in Politis and Thomakos (2008a,b). In our multivariate setting we are instead interested in the NoVaS combination that is best suited to capture the correlation.
The choice of matching function depends on the target distribution. While the kurtosis makes sense when opting for a normal target distribution, it is not the most intuitive choice for a uniform target. Practical experimentation suggests that using the kurtosis as a matching measure works well for the normal target, whereas the QQ-correlation coefficient is more suitable when trying to match a uniform distribution. Another important point is that we choose the same target distribution for both univariate series: since we are trying to capture correlation, differently distributed series are undesirable.
The choice of combination can be made as follows. Fix the type of normalization used (squared or absolute returns) and the target distribution (normal or uniform), and calculate the correlation between the transformed series with all seven of the methods described in Sections 3.1 and 3.2. Calculate the mean squared error between this captured correlation and the realized correlation, and record the results in a (7 × 1) vector, say D_m(ν, τ), where m ∈ {Method 1, ..., Method 6, MLE Method}, ν ∈ {squared, absolute returns} and τ ∈ {normal, uniform target distribution}. Then repeat the optimizations with respect to all seven methods for all combinations of (ν, τ). The "optimal" combination (m*, ν*, τ*) is then the one that minimizes D_m(ν, τ) across all possible combinations (m, ν, τ). Since the realized correlation is in general not known in practice, one can alternatively evaluate the quality of the captured correlation between, say, X_t and Y_t by using it to forecast X_n given Y_n via X̂_n = ρ̂_n Y_n; the optimal NoVaS transformation is then the one that minimizes (X_n − X̂_n)².
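A minimal sketch of this forecast-based selection criterion follows. The dictionary `rho_hats`, mapping a combination label (m, ν, τ) to its captured-correlation path, is a hypothetical input that an existing NoVaS implementation would supply:

```python
import numpy as np

def select_combination(X, Y, rho_hats):
    """Pick the NoVaS combination whose captured correlation best
    forecasts the last observation of X from Y, i.e., the one minimizing
    (X_n - rho_n * Y_n)^2.

    X, Y       : return series (array-like), last entries X_n and Y_n
    rho_hats   : dict mapping combination label -> captured correlation path
    """
    scores = {}
    for combo, rho in rho_hats.items():
        X_hat = rho[-1] * Y[-1]             # forecast X_n by rho_n * Y_n
        scores[combo] = (X[-1] - X_hat) ** 2
    return min(scores, key=scores.get)      # label with smallest error
```

In practice one would average this criterion over a hold-out window rather than a single observation, but the single-step form above mirrors the description in the text.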
The choice of the truncation parameter L in Equation (13) can be based on the chosen lengths of the individual NoVaS transformations (i.e., on p from Equation (2)), or on a multiple thereof, or it can be selected via the AIC or a similar criterion (since a likelihood function is available).

Application to Portfolio Analysis
In what follows, we apply the NoVaS transformation to return series for portfolio analysis. We consider a portfolio consisting of two assets, with prices p_{1,t} and p_{2,t} at time t and continuously compounded returns r_{1,t} = log(p_{1,t}/p_{1,t−1}) and r_{2,t} = log(p_{2,t}/p_{2,t−1}). Denote by μ_1 and μ_2 the assumed time-invariant mean returns. The variances are σ²_{1,t} and σ²_{2,t}, and the covariance between the two assets is σ_{12,t} = ρ_{12,t} σ_{1,t} σ_{2,t}.
Let us further assume that the portfolio consists of β_t units of asset 1 and (1 − β_t) units of asset 2. The portfolio return is therefore given by

r_{p,t} ≈ β_{t−1} r_{1,t} + (1 − β_{t−1}) r_{2,t},

where we use the linear approximation of the logarithm, since returns can be expected to be small, a setting in which this approximation works well. Note that β is indexed by t − 1 because the composition of the portfolio has to be chosen before the return at time t is known. We assume that no short sales are allowed, and therefore impose 0 ≤ β_{t−1} ≤ 1 for all t. The portfolio variance is given by

σ²_{p,t} = β²_{t−1} σ²_{1,t} + (1 − β_{t−1})² σ²_{2,t} + 2 β_{t−1} (1 − β_{t−1}) σ_{12,t}.

The goal of portfolio analysis in this context is to choose β_{t−1} such that the utility of the investor is maximized.
The utility of the investor is a function of the portfolio return and the portfolio variance. Assuming that the investor is risk-averse with risk-aversion parameter η, a general form of the utility function is

U_t = E_{t−1}[r_{p,t}] − (η/2) Var_{t−1}(r_{p,t}) = μ_{p,t} − (η/2) σ²_{p,t},

where the last equality is exact if we assume efficient markets.
A rational investor will try to maximize his utility with respect to β_{t−1}:

β_{t−1} = [μ_1 − μ_2 + η(σ²_{2,t} − σ_{12,t})] / [η(σ²_{1,t} + σ²_{2,t} − 2σ_{12,t})],

which simplifies to the minimum-variance weight when we assume zero means:

β_{t−1} = (σ²_{2,t} − σ_{12,t}) / (σ²_{1,t} + σ²_{2,t} − 2σ_{12,t}).

Recall that we need to impose 0 ≤ β_{t−1} ≤ 1 under the assumption of no short sales. As expected, the optimal hedge ratio depends on the correlation and can therefore be time-varying.
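A small sketch of the resulting weight computation, under the mean-variance utility above; with zero means the expression reduces to the minimum-variance weight, and the no-short-sales constraint is enforced by clipping to [0, 1]:

```python
def hedge_ratio(mu1, mu2, s1, s2, rho, eta):
    """Utility-maximizing weight on asset 1 under mean-variance utility
    with risk aversion eta, clipped to [0, 1] (no short sales).

    s1, s2 are the conditional volatilities, rho the conditional
    correlation. With mu1 = mu2 = 0 this reduces to the minimum-variance
    weight (s2^2 - s12) / (s1^2 + s2^2 - 2 s12)."""
    s12 = rho * s1 * s2                       # conditional covariance
    beta = (mu1 - mu2 + eta * (s2**2 - s12)) / (
        eta * (s1**2 + s2**2 - 2 * s12))
    return min(max(beta, 0.0), 1.0)           # enforce 0 <= beta <= 1
```

For example, with zero means, equal volatilities and zero correlation the weight is 1/2, the equally weighted portfolio.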

Simulation Study
In this section we report results from a simulation study. We use two types of simulated data. First, we use a simple bivariate model as the data generating process (DGP), as in [22], which we call DGP-PS; it allows consistent realized covariances and correlations to be computed. Next, we assume two univariate GARCH models and specify a deterministic time-varying correlation between them.
We start by illustrating the approach discussed in Section 4.1 and continue by comparing the performance of NoVaS with other standard methods from the literature. Finally, we apply NoVaS to portfolio analysis.

DGP-PS Simulation
Let [X_t, Y_t]′ denote the (2 × 1) vector of returns; the DGP-PS is a scalar BEKK-GARCH model of order 1, where Σ̄ is a (2 × 2) matrix with unit diagonal and off-diagonal entries of 0.3. We let t = 1, . . . , 1000.
We use the model selection approach of the previous section. We compute D_m(ν, τ) of Equation (30) for all m and (ν, τ) and repeat the calculations 1000 times. Table 1 summarizes the mean squared error between the realized correlation ρ_{t|t−s} and our estimated conditional correlation ρ̂_{t|t−s} for all 28 combinations.
We use the specified NoVaS transformation, where the kurtosis as in Equation (9) was used when fitting to a normal target, and the QQ-correlation as in Equation (11) was used when fitting to a uniform target. Furthermore, we set s = 0 and used exponential weights. The NoVaS transformation with normal target and absolute returns, combined with the MLE method to estimate the correlation, yields the best result. A normal target with squared returns combined with methods CV 2 and CV 4 performs competitively. The good performance of the MLE relative to the other methods can be attributed to the Gaussian structure of the data. In this context, and given the nature of the DGP, it would be hard for a non-parametric and "model-free" method to beat a parametric one, especially one using a normal distribution for constructing the model's innovations. In practice, when the DGP is unknown and the data have much more kurtosis, the results of the NoVaS approach may differ. We explore this in Section 6.
We now focus on a different type of simulated data with deterministic and time-varying correlation.

Multivariate Normal Returns
We now assume that our bivariate return series follows a multivariate normal distribution, where the variances are determined by two volatility processes that follow GARCH dynamics. At the same time we specify a deterministic correlation process between the two return series. More precisely:

σ²_{1,t} = 0.01 + 0.05 X²_{1,t−1} + 0.94 σ²_{1,t−1}
σ²_{2,t} = 0.05 + 0.20 X²_{2,t−1} + 0.50 σ²_{2,t−1}

with

ρ_{1,t} = 0.5 + 0.4 cos(2πt/400)

or a linearly increasing alternative. Both examples of ρ_t, the first implying a sinusoidal correlation, the second a linearly increasing correlation, will be examined.
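As a concrete illustration, the sinusoidal design above can be simulated along these lines. This is a sketch: initializing the recursions at the unconditional variances is our choice, and the linearly increasing correlation case is omitted here:

```python
import numpy as np

def simulate_bivariate(n=1000, seed=0):
    """Simulate two GARCH(1,1) returns with the parameters given in the
    text and the deterministic sinusoidal correlation
    rho_t = 0.5 + 0.4 cos(2 pi t / 400)."""
    rng = np.random.default_rng(seed)
    X = np.zeros((n, 2))
    v1 = 0.01 / (1 - 0.05 - 0.94)   # unconditional variance of series 1
    v2 = 0.05 / (1 - 0.20 - 0.50)   # unconditional variance of series 2
    rho = 0.5 + 0.4 * np.cos(2 * np.pi * np.arange(n) / 400)
    for t in range(n):
        if t > 0:  # GARCH(1,1) variance recursions
            v1 = 0.01 + 0.05 * X[t - 1, 0] ** 2 + 0.94 * v1
            v2 = 0.05 + 0.20 * X[t - 1, 1] ** 2 + 0.50 * v2
        # correlated standard normals via the 2x2 Cholesky factor
        z1, z2 = rng.standard_normal(2)
        e1 = z1
        e2 = rho[t] * z1 + np.sqrt(1 - rho[t] ** 2) * z2
        X[t, 0] = np.sqrt(v1) * e1
        X[t, 1] = np.sqrt(v2) * e2
    return X, rho
```

By construction the deterministic correlation path stays within [0.1, 0.9].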
For both multivariate processes, we again compute D_m(ν, τ) (as in Equation (30)). We repeat the computations 1000 times in order to get a robust idea of which method works best. Table 2 shows the mean squared error between the true deterministic correlation and the estimates from the 28 different NoVaS methods. As we can see in Table 2, in the case of a sinusoidal correlation structure, NoVaS with a uniform target distribution and absolute returns works best when using the MLE method to capture the correlation, but CV 2 and CV 4 with absolute returns perform competitively. In the case of the linearly increasing correlation, one should again use a uniform target, either with squared returns and CV 4 or with absolute returns and CV 2. Interestingly, in this case, a uniform target distribution clearly outperforms the normal target. We show plots of the resulting estimated correlations in Figure 1.

Comparison of NoVaS to Standard Methods and Portfolio Analysis
We now evaluate the performance of NoVaS by comparing the error between the NoVaS-captured correlation and the realized correlation with the corresponding error of standard methods from the literature. As baselines we use a naive approach, where the hedge ratio β_t is held constant at 0.5, and a linear model, where the hedge ratio is obtained by linear regression. We furthermore compare NoVaS to GARCH-based methods, namely DCC, BEKK and CCC.
In Table 3, we report the mean squared error between captured correlation and realized correlation for the simulation examples above, averaged over 1000 repetitions of the simulations. NoVaS in Table 3 corresponds to the method that performed best in capturing the correlation according to Tables 1 and 2 (hence normal target and absolute returns combined with the MLE method for the DGP-PS data; uniform target with squared returns and the MLE method for the sinusoidal correlation; and uniform target with absolute returns and CV 4 for the linearly increasing correlation). Table 3 shows that NoVaS is outperformed by the classic DCC and BEKK approaches on all three types of simulated data when considering the MSE between realized and estimated correlation. However, given the structure of the simulated datasets, NoVaS performs better than expected, especially when considering the correlation between realized and estimated correlation. We expect that NoVaS will perform even better on datasets with heavier tails and less structure.

Table 3. MSE and correlation between realized correlation and the correlation estimated by the respective method, averaged over 1000 simulations. NoVaS corresponds to the best method according to Tables 1 and 2. For BEKK, we fit a BEKK-GARCH(1,1,1) defined as in Engle and Kroner (1995).

We now apply NoVaS to portfolio analysis. We use the correlation captured by the different methods above in order to calculate the optimal hedge ratio defined in Equation (34). More precisely, the following algorithm is applied: Algorithm 1: Computation of portfolio weights and associated returns.
We compute the mean, standard deviation and Sharpe ratio of the portfolio returns. The simulations are repeated 100 times, and average results are shown in Table 4.

Table 4. Mean, standard deviation and Sharpe ratio of portfolio returns, where the hedge ratio is based on the methods in the left column. NoVaS stands for the NoVaS method that had the most convincing results when capturing the correlation for the specific dataset.

Table 4 confirms the previous results. On structured simulated data, NoVaS is not able to outperform known methods based on multivariate GARCH. However, especially in terms of the standard deviation of the returns, NoVaS is on par. In Figure 1 we show plots of the correlation captured by the different methods on exemplary simulated data.
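The performance figures of Table 4 can be computed from a path of hedge ratios along the following lines. This is a sketch; a zero risk-free rate is assumed for the Sharpe ratio:

```python
import numpy as np

def portfolio_metrics(r1, r2, beta):
    """Mean, standard deviation and Sharpe ratio of portfolio returns,
    where beta[t-1] (decided one step ahead) is the weight on asset 1
    applied to the returns at time t. Assumes a zero risk-free rate."""
    # r_p,t = beta_{t-1} r_{1,t} + (1 - beta_{t-1}) r_{2,t}
    rp = beta[:-1] * r1[1:] + (1 - beta[:-1]) * r2[1:]
    m, sd = rp.mean(), rp.std(ddof=1)
    return m, sd, m / sd  # mean, std dev, Sharpe ratio
```

Note the one-period lag: the weight path is shifted so that each portfolio return only uses information available when the weight was chosen.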

However, once again, one should not forget that we are dealing with very structured datasets with normal innovation processes, something that should benefit parametric GARCH-type methods. In the next section, we observe how NoVaS performs when the data have heavier tails and the dynamics of the dataset are unknown.

Empirical Illustration
In this section we offer a brief empirical illustration of the NoVaS-based correlation estimation using two data sets. First we consider data from the following three series: the S&P500, the 10-year bond (10-year Treasury constant maturity rate series) and the USD/Japanese Yen exchange rate. Then, to assess performance on smaller samples, we focus on returns of the SPY and TLT Exchange Traded Funds (ETFs). Daily data are obtained from the beginning of each series and then trimmed and aligned. Daily log-returns are computed, and from them we compute monthly returns, realized volatilities and realized correlations. The final data sample runs from 01/1971 to 02/2010, for a total of n_1 = 469 available observations for the first three series, and from 08/2002 to 02/2016 for the second sample, for a total of n_2 = 135. Figure 2 plots the monthly realized and fitted correlations, and Tables 5 and 6 summarize some descriptive statistics. From Tables 5 and 6 we can see that all the series have excess kurtosis and appear to be non-normally distributed (bootstrapped p-values from the Shapiro-Wilk normality test, not reported, reject the hypothesis of normality). In addition, there is negative skewness for the S&P500, the USD/JPY and the SPY series.

Table 5. Descriptive statistics for monthly data; sample size is n_1 = 469 months from 01/1970 to 02/2010; SW is short for Shapiro-Wilk. The first three columns refer to returns, the middle three columns to realized volatilities, and the last three columns to correlations.
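The monthly realized correlations used here can be constructed from the aligned daily log-returns along the following lines. This is the standard realized-measure construction, not necessarily the authors' exact procedure; the `month_ids` grouping vector is our own device:

```python
import numpy as np

def realized_corr(daily1, daily2, month_ids):
    """Monthly realized correlations from two aligned daily log-return
    series; month_ids assigns each day to a month (any hashable label)."""
    months = np.unique(month_ids)
    out = np.empty(len(months))
    for k, m in enumerate(months):
        mask = month_ids == m
        x, y = daily1[mask], daily2[mask]
        # realized covariance over sqrt of the two realized variances
        out[k] = np.sum(x * y) / np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))
    return out
```

Monthly returns and realized volatilities follow the same pattern, summing daily returns and daily squared returns within each month.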

Capturing the Correlation
After performing univariate NoVaS transformation of the return series individually, we move on to compute the NoVaS-based correlations. We use exponential weights as in Equations (15) and (16) with s = 0 and L set to a multiple of the lags used in the individual NoVaS transformations (results are similar when we use s = 1). Applying the model selection approach of Section 4, we look for the optimal combination of target distribution, squared or absolute returns and the method to capture the correlation. The results are summarized in Table 7.
When working with the S&P500 and Bonds dataset, the MSE between realized and estimated correlation is minimized when using the MLE Method, together with uniform target and absolute returns. In the other two cases, normal target and squared returns yield better results. The dataset with S&P500 and USD/YEN works best with CV 4, whereas the Bonds and USD/YEN dataset works better with CV 2. In the case of the shorter SPY and TLT index return series, using a normal target distribution and absolute returns, together with the MLE method provides the best result.
We now assess the performance of NoVaS using the same measures as in the simulation study and compare the results with the same methods as before. Table 8 summarizes the results and Figure 2 plots the realized correlations along with the fitted values from different benchmark methods.
The table entries show that the NoVaS approach provides better estimates of the conditional correlation than the other models. For all four datasets the mean squared error when using NoVaS is either better than or on par with that of the other methods, and the correlation between realized and estimated correlation is larger.
Our results are, of course, conditional on the data at hand and the benchmark models. However, we should note that among the advantages of NoVaS are data-adaptability and parameter parsimony. There are different, more complicated, types of correlation models that include asymmetries, regime switching, factors etc. All these models rely on distributional and parametric assumptions and are far less parsimonious than the NoVaS approach suggested in this paper. In addition, they are computationally very hard to handle even when the number of series used is small (this is true even for the sequential DCC model of Palandri [2009]). NoVaS-based correlations do not suffer from these potential problems and can be very competitive in applications.

Application to Portfolio Analysis
We finally present the results of the proposed application to portfolio analysis. As in the previous section with simulated data, we use the captured correlation in order to calculate the optimal hedge ratio, as in Equation (34). We then compare the mean, standard deviation and Sharpe ratio of the portfolio returns. We use Algorithm 1 and set T_0 = (1/2)N and η = 3. Table 9 summarizes the results.
The results of Table 9 show that NoVaS again performs best or on par when compared to the other methods. In three out of four of our examples, the NoVaS portfolio achieves the highest Sharpe ratio. Only when constructing a portfolio out of Bonds and USD/YEN does the CCC method give a higher Sharpe ratio, although NoVaS achieves the smallest standard deviation of the returns. In further analysis, not reported here, we compared results based on different choices of the risk-aversion coefficient η in Equation (34) and of T_0; the performance of NoVaS was largely insensitive to these choices.

Table 9. Mean (M), standard deviation (SD) and Sharpe ratio (SR) of portfolio returns, where the portfolio weights are chosen as in Equation (34), with the correlation estimated by the methods in the left column. The NoVaS transformation used is the one that performed best in Table 7. Naive assumes an equally weighted portfolio; the linear model estimates the conditional correlation by linear regression.

Does NoVaS Work Well in Larger Dimensions?
There is but one remaining question that we need to address: is NoVaS the way to go when we increase the dimensionality of a problem? The answer to this question could be a paper of its own, so here we provide a shortcut illustrating both the efficacy of the purely multivariate algorithm presented in Section 3 and its efficacy in a practical application. To this end we expand our universe of assets beyond a single pair in two sets of experiments: in the first experiment we add just one more asset, so as to have a representation of equities, fixed income/bonds and cash; in the second experiment we take several assets that comprise the industry components of the S&P500. For the first experiment we add SHY as our cash-proxy ETF, and for the second experiment we take the 9 ETFs that form the industry groups of the S&P500: XLB, XLE, XLF, XLI, XLK, XLP, XLU, XLV, XLY.
We thus have a 3-dimensional, a 4-dimensional, and a 9-dimensional example of application of the purely multivariate approach. To better understand the way NoVaS operates on these examples we consider two different sample splits for our estimation, one-third and two-thirds of our total available observations. Furthermore, to examine whether the frequency of portfolio rebalancing matters we consider a weekly and a monthly rebalancing. All our results are collected in Tables 10-13 that follow.
For the weekly rebalancing we find that, across the two sample splits, NoVaS has the highest or second highest portfolio return in 7 out of 12 cases examined (58%), but only 4 out of 12 (33%) of the highest or second highest Sharpe ratios. For the monthly rebalancing we find that, across the two sample splits, NoVaS has the highest or second highest portfolio return in 10 out of 12 cases examined (83%), while it has only 3 out of 12 (25%) of the highest or second highest Sharpe ratios. Although the results presented here are only illustrative, they suggest a further avenue of future research for the application of NoVaS in a purely multivariate setting: (a) the application of NoVaS is clearly a return-generator insofar as portfolio performance is concerned, at the (expected) expense of higher portfolio risk; (b) lower-frequency rebalancing improves performance dramatically for returns but leaves Sharpe ratios relatively unchanged; (c) NoVaS is highly competitive in larger dimensions because it involves simpler/less optimization and is robust across sample sizes: a relatively large sample is not a theoretical prerequisite for achieving good estimation performance.

Table 11. Mean (M), standard deviation (SD) and Sharpe ratio (SR) of portfolio returns with monthly rebalancing and with weights computed on 1/3 of the available observations.

Of course, this summary of our multivariate results does not exhaust the problem of NoVaS multivariate applications. It rather leaves open the question of how further improvements can be made; as our results show, these improvements are necessarily specific to what we are trying to achieve. Further research can show where multivariate NoVaS fares better: in the estimation of correlations or in problems like portfolio allocation.

Concluding Remarks
In this paper we extend the univariate NoVaS methodology for volatility modeling and forecasting, put forth by Politis (2003a,b, 2007, 2015) and Politis and Thomakos (2008a,b), to a multivariate context. Using a simple, parsimonious parametrization and smoothing arguments similar to the univariate case, we show how the conditional correlation can be estimated and predicted. A limited simulation study and an empirical application using real data show that the NoVaS approach to correlation modeling can be very competitive with, and possibly outperform, a popular benchmark such as the DCC model. We furthermore evaluate the NoVaS-based correlations in the context of portfolio management. An important advantage of the whole NoVaS approach is its data-adaptability and lack of distributional or parametric assumptions. This is particularly important in a multivariate context, where most competing models are parametric and much more difficult to handle in applications, especially when the number of assets is large.
There are, of course, open issues that we do not address in this paper but that are important both for further assessing the NoVaS approach to correlation modeling and for practical use. Among them are: (a) evaluation of the forecasting performance of NoVaS-based correlations; (b) additional comparisons of NoVaS-based correlations with other benchmark models; (c) a further and more elaborate exploration of how NoVaS can be applied in high-dimensional problems, and what the relevant implications may be.