Copula–based vMEM Specifications versus Alternatives: The Case of Trading Activity

We discuss several multivariate extensions of the Multiplicative Error Model by Engle (2002) to take into account dynamic interdependence and contemporaneously correlated innovations (vector MEM or vMEM). We suggest copula functions to link Gamma marginals of the innovations, in a specification where past values and conditional expectations of the variables can be simultaneously estimated. Results with realized volatility, volumes and number of trades of the JNJ stock show that significantly superior realized volatility forecasts are delivered with a fully interdependent vMEM relative to a single equation. Alternatives involving log–Normal or semiparametric formulations produce substantially equivalent results.


Introduction
Dynamics in financial markets can be characterized by many indicators of trading activity such as absolute returns, high-low range, number of trades in a certain interval (possibly labeled as buys or sells), volume, ultra-high frequency based measures of volatility, financial durations, spreads and so on.Engle (2002) reckons that one striking regularity behind financial time series is that persistence and clustering characterizes the evolution of such processes.As a result, the dynamics of such variables can be specified as the product of a conditionally deterministic scale factor, which evolves according to a GARCH-type equation, and an innovation term which is iid with unit mean.Such models are labeled Multiplicative Error Models (MEM) and can be seen as a generalization of the GARCH (Bollerslev 1986) and of the Autoregressive Conditional Duration (ACD, Engle and Russell (1998)) approaches.One of the advantages of such a model is to avoid the need to resort to logs (not possible when zeros are present in the data) and to provide conditional expectations of the variables of interest directly (rather than expectations of the logs).Empirical results show a good performance of these types of models in capturing the stylized facts of the observed series (e.g., for daily range, Chou (2005); for volatility, volume and trading intensity Hautsch (2008)).
The model was specified by Engle and Gallo (2006) in a multivariate context (vector MEM or vMEM) allowing just the lagged values of each variable of interest to affect the conditional expectation of the other variables.Such a specification lends itself to producing multi-step ahead forecasts: in Engle and Gallo (2006) three different measures of volatility (absolute returns, daily range and realized volatility), influence each other dynamically.Although such an equation-by-equation estimation ensures consistency of the estimators in a quasi-maximum likelihood context, given the stationarity conditions discussed by Engle (2002), correlation among the innovation terms is not taken into account and leads to a loss in efficiency.
However, full interdependence is not allowed by an equation-by-equation approach both in the form of past conditional expectations influencing the present and a contemporaneous correlations of the innovations.The specification of a multivariate distribution of the innovations is far from trivial: a straightforward extension to a joint Gamma probability distribution is not available except in very special cases.In this paper, we want to compare the features of a novel maximum likelihood estimation strategy adopting copula functions to other parametric and semiparametric alternatives.
Copula functions allow to link together marginal probability density functions for individual innovations specified as Gamma as in Engle and Gallo (2006) or as zero-augmented distributions (as in Hautsch et al. (2013), distinguishing between the zero occurrences and the strictly positive realizations).Copula functions are used in a Multiplicative Error framework, but in a Dynamic Conditional Correlation context by Bodnar and Hautsch (2016).Within a Maximum Likelihood approach to the vector MEM, we want also to explore the performance of a multivariate log-Normal specification for the joint distribution of the innovations and the semiparametric approach presented in Cipollini et al. (2013) resulting in a GMM estimator.The range of potential applications of a vector MEM is quite wide: dynamic interactions among different values of volatility, volatility spillovers across markets (allowing multivariate-multi-step ahead forecasts and impulse response functions, order execution dynamics Noss (2007) specifies a MEM for execution depths).
This paper aims at discussing merits of each approach and their performance in forecasting realized volatility with augmented information derived from the dynamic interaction with other measures of market activity (namely, the volumes and the number of trades).We want conditional expectations to depend just on past values (not also on some contemporary information as in Manganelli (2005) and Hautsch (2008)).
What the reader should expect is the following: in Section 2 we lay out the specification of a vector Multiplicative Error Model, discussing the issues arising from the adoption of several types of copula functions linking univariate Gamma marginal distributions.In Section 3 we describe the Maximum Likelihood procedure leading to the inference on the parameters.In Section 4 we discuss the features of a parametric specification with a log-Normal and of a semiparametric approach suggested by Cipollini et al. (2013).In Section 5 we present the empirical application to three series of market trading activity, namely realized kernel volatility, traded volumes and the number of trades.The illustration is performed on the JNJ stock over a period between 2007 and 2013.What we find is that specifying the joint distribution of the innovations allowing for contemporaneous correlation dramatically improves the log-likelihood over an independent (i.e., equation-by-equation) approach.Richer specifications (where simultaneous estimation is unavoidable) deliver a better fit, improved serial correlation diagnostics, and a better performance in out-of-sample forecasting.The Student-T copula possesses better features than the Normal copula.Overall, the indication is that we will have significantly superior realized volatility forecasts when other trading activity indicators and contemporaneous correlations are considered.Concluding remarks follow.

A Copula Approach to Multiplicative Error Models
Let x t be a K-dimensional process with non-negative components.A vector Multiplicative Error Model (vMEM) for x t is defined as where indicates the Hadamard (element-by-element) product and diag(•) indicates a diagonal matrix with its vector argument on the main diagonal.Conditionally upon the information set F t−1 , a fairly general specification for µ t is where ω is (K, 1) and α, γ and β are (K, K).The vector x (−) t has a generic element x t,i multiplied by a function related to a signed variable, be it a positive or negative return (0, 1 values) or a signed trade (buy or sell 1, −1 values), as to capture asymmetric effects.Let the parameters relevant for µ t be collected in a vector θ.
Conditions for stationarity of µ t are a simple generalization of those of the univariate case (e.g., Bollerslev (1986); Hamilton (1994)): a vMEM(1,1) with µ t defined as in Equation ( 2) is stationary in mean if all characteristic roots of A = α + β + γ/2 are smaller than 1 in modulus.We can think of A as the impact matrix in the expression If more lags are considered, the model is where L is the maximum lag occurring in the dynamics.It is often convenient to represent the system (3) in its equivalent companion form where µ * t+L|t−1 = (µ t+L|t−1 ; µ t+L−1|t−1 ; . . .; µ t+1|t−1 ) is a KL × 1 vector obtained by stacking its elements columnwise and The same stationarity condition holds in terms of eigenvalues of A * .The innovation vector ε t is a K-dimensional iid process with probability density function (pdf) defined over a [0, +∞) K support, the unit vector 1 as expectation and a general variance-covariance matrix Σ, (5) The previous conditions guarantee that where the latter is a positive definite matrix by construction.
The distribution of the error term ε t |F t−1 can be specified in different ways.We discuss here a flexible approach using copula functions; different specifications are presented in Section 4.
Using copulas (cf., among others, Joe (1997) and Nelsen (1999), Embrechts et al. (2002), Cherubini et al. (2004), McNeil et al. (2005) and the review of Patton (2013) for financial applications), the conditional pdf of ε t |F t−1 is given by where c(u t ; ξ) is the pdf of the copula, f i (ε t,i ; φ i ) and u t,i = F i (ε t,i ; φ i ) are the pdf and the cdf, respectively, of the marginals, ξ and φ i are parameters.A copula approach, hence, requires the specification of two objects: the distribution of the marginals and the copula function.
In view of the flexible properties shown elsewhere (Engle and Gallo 2006), for the first we adopt Gamma pdf's (but other choices are possible, such as Inverse-Gamma, Weibull, log-Normal, and mixtures of them).This has the important feature of guaranteeing an interpretation of a Quasi Maximum Likelihood Estimator even if the choice of the distribution is not appropriate.For the second, we discuss some possible specifications within the class of Elliptical copulas which provides an unified framework encompassing the Normal, the Student-T and any other member of this family endowed with an explicit pdf.
Elliptical copulas have interesting features and widespread applicability in Finance (for details see McNeil et al. (2005), Frahm et al. (2003), Schmidt (2002)).They are copulas generated by Elliptical distributions, exactly in the same way as the Normal copula and the Student-T copula stem from the multivariate Normal and Student-T distributions, respectively. 1 We consider a copula generated by an Elliptical distribution whose univariate 'standardized' marginals (intended here with location parameter 0 and dispersion parameter 1) have an absolutely continuous symmetric distribution, centered at zero, with pdf g(.; ν) and cdf G(.; ν) (ν represents a vector of shape parameters).The density of the copula can then be written as for suitable choices of K * (., .),g 1 (.; ., .)and g 2 (.; .),where q = (q 1 ; . . .; q K ), q i = G −1 (u i ; ν).Two notable cases will be considered.The first is the Normal copula (McNeil et al. (2005), Cherubini et al. (2004), Bouyé et al. (2000)); with no explicit shape parameter ν, we have K * (K) ≡ 1, g 1 (x; K) = g 2 (x) ≡ exp(−x/2).The pdf in Equation (9) becomes: where q = (q 1 ; . . .; q K ), q i = Φ −1 (u i ) and Φ(x) denotes the cdf of the standard Normal distribution computed at x.The Normal copula has many interesting properties: the ability to reproduce a broad range of dependencies (the bivariate version, according to the value of the correlation parameter, is capable of attaining the lower Fréchet bound, the product copula and the upper Fréchet bound), the analytical tractability, the ease of simulation.When combined with Gamma(φ i , φ i ) marginals, the resulting multivariate distribution is a special case of dispersion distribution generated from a Gaussian copula 1 In some applications, especially involving returns, their elliptical symmetry may constitute a limit (cf., for example, Patton (2006), Okimoto (2008), Cerrato et al. (2015) and reference therein).Copulas in the Archimedean family (Clayton, Gumbel, Joe-Clayton, etc.) offer a way to bypass such a limitation but suffer from other drawbacks and, in any case, seem to be less relevant for the variables of interest for a vMEM (here, different indicators of trading activity).They will not be pursued in what follows.(Song 2000).We note that the conditional correlation matrix of ε t has generic element approximately equal to R ij , which can assume negative values too.
One of the limitations of the Normal copula is the asymptotic independence of its tails.Empirically, tail dependence is a behavior observed frequently in financial time series (see McNeil et al. (2005), among others).Elements of x (being different indicators of the same asset or different assets) tend to be affected by the same extreme events.For this reason, as an alternative, we consider the Student-T copula which allows for asymptotically dependent tails.Differently from the Normal copula, when R = I we get uncorrelated but not independent marginals (details in McNeil et al. (2005)).Thus, when considering the Student-T copula in Equation ( 9), with a scalar ν shape parameter, we have , and its explicit pdf is where q = (q 1 ; . . .; q K ), q i = T −1 (u i ; ν) and T(x; ν) denotes the cdf of the Student-T distribution with ν degrees of freedom computed at x.Further specifications of the Student-T copula are in Demarta and McNeil (2005).

Maximum Likelihood Inference
In this section we discuss how to get full Maximum Likelihood (ML) inferences from the vMEM with the parametric specification (3) for µ t (dependent on a parameter vector θ) and a copula-based formulation f ε (ε t |F t−1 ) of the conditional distribution of the vector error term (characterized by the parameter vector λ).Inference on θ and λ can be discussed in turn, given that from the model assumptions the log-likelihood function l is Considering a generic time t, it is useful to recall the sequence of calculations: where θ i is the parameter involved in the i-th element of the µ t vector.

Parameters in the Conditional Mean
Irrespective of the specification chosen for f ε (ε t |F t−1 ), the structure of the vMEM allows to express the portion of the score function corresponding to θ as where In order to have a zero expected score, we need E(w t |F t−1 ) = 0 or, equivalently, E(ε t b t |F t−1 ) = −1.
As a consequence, the information matrix and the expected Hessian are given by and respectively, where the matrices For a particular parametric choice of the conditional distribution of ε t , we need to plug the specific expression of ln f ε (ε t |F t−1 ) into b t .For instance, considering the generic copula formulation (8), then In what follows we provide specific formulas for the elliptical copula formulation (9) and its main sub-cases.

Parameters in the pdf of the Error Term
Under a copula approach, the portion of the score function corresponding to the term ∑ T t=1 ln f (ε t |F t−1 ) (cf.Equation ( 12)) depends on a vector λ = (ξ; φ) (ξ and φ are the parameters of the copula function and of the marginals, respectively-cf.Section 2), and As detailed before, beside a possible shape parameter ν, elliptical copulas are characterized by a correlation matrix R which, in view of its full ML estimation, can be expressed (cf. McNeil et al. (2005), p. 235) as where c is an upper-triangular matrix with ones on the main diagonal and D is a diagonal matrix with diagonal entries D 1 = 1 and for j = 2, . . ., K. By so doing, the estimation of R is transformed into an unconstrained problem, since the K(K − 1)/2 free elements of c can vary into R.
We can then write ξ = (c; ν).Let us introduce a compact notation as follows: C = cD, q t = (q t,1 ; . . .; q t,K ), q t,i = G −1 (u t,i ; ν), q t = C −1 q t , q * t = R −1 q t , q t = q t R −1 q t = q t q t .We can then write where we used 1 2 ln(|R|) = ∑ K i=2 ln D i .In specific cases we get:

Parameters entering the matrix c
The portion of the score relative to the free parameters of the c matrix has elements Using some algebra we can show that By replacing them into (23), we obtain

Parameters entering the vector ν
The portion of the score relative to ν is The derivative of ln K * = ln K * (ν; K) can sometimes be computed analytically.For instance, for the Student-T copula we have For the remaining quantities, we suggest numerical derivatives when, as in the Student-T case, the quantile function G −1 (x; ν) cannot be computed analytically.

Parameters entering the vector φ
The portion of the score relative to φ has elements After some algebra we obtain where For instance, if a marginal has a distribution Gamma(φ i , φ i ) then ) can be computed numerically.

Parameters entering the vector θ
By exploiting the notation introduced in this section, we can now detail the structure of b t entering into ( 16) and then into the portion of the score function relative to θ.From ( 22), ε t b t + 1 (cf.( 16)) has elements where d t,i is given in (25).For our choice, f i (ε t,i ) is the pdf of a Gamma(φ i , φ i ) distribution, so that

Expectation Targeting
Assuming weak-stationarity of the process, numerical stability and a reduction in the number of parameters to be estimated can be achieved by expressing ω in terms of the unconditional mean of the process, say µ, which can be easily estimated by the sample mean (expectation targeting 2 ).Since E(x t ) = E(µ t ) = µ is well defined and is equal to Equation ( 3) becomes In a GARCH framework, the consequences of this practice have been been investigated by Kristensen and Linton (2004) and, more recently, by Francq et al. (2011) who show its merits in terms of stability 2 This is equivalent to variance targeting in a GARCH context (Engle and Mezrich 1995), where the constant term of the conditional variance model is assumed to be a function of the sample unconditional variance and of the other parameters.In this context, other than a preference for the term expectation targeting since we are modeling a conditional mean, the main argument stays unchanged.
of estimation algorithms and accuracy in estimation of both coefficients and long term volatility (cf.Appendix A for some details in the present context).
From a technical point of view, a trivial replacement of µ by the sample mean x T followed by a ML estimation of the remaining parameters, preserves consistency but leads to wrong standard errors (Kristensen and Linton 2004).The issue can be faced by identifying the inference problem as involving a two-step estimator (Newey and McFadden (1994, ch. 6)), namely by rearranging (θ; λ) as (µ; ϑ), where ϑ collects all model parameters but µ.Under conditions able to guarantee consistency and asymptotic normality of (µ; ϑ) (in particular, the existence of E(µ t µ t ): see Francq et al. ( 2011)), we can adapt the notation of Newey and McFadden (1994) to write the asymptotic variance of where and is the moment function giving the sample average x T as an estimator of µ.The Ω matrix denotes the variance-covariance matrix of (∇ ϑ l t ; m t ) partitioned in the corresponding blocks.
To give account of the necessary modifications to be adopted with expectation targeting, we provide sufficiently general and compact expressions for the parameters µ (the unconditional mean) and θ (the remaining parameters) in the conditional mean expressed by (29).
The expression for Ω µ,µ is obtained by using the technique in Horváth et al. (2006) and Francq et al. (2011).In this sense, we extend the cited works to a multivariate formulation including asymmetric effects as well.Some further simplification is also possible when the model is correctly specified since, in such a case, H (ε) for what concerns the inner part of (30).

Concentrated Log-Likelihood
Some further numerical estimation stability and reduction in the number of parameters can be achieved-if needed-within the framework of elliptical copulas: we can use current values of residuals to compute current estimates of R (Kendall correlations are suggested by Lindskog et al. (2003)) and of the shape parameter ν (tail dependence indices are proposed by Kostadinov (2005)).This approach may be fruitful with other copulas as well when sufficiently simple moment conditions can be exploited.A similar strategy can be applied also to the parameters of the marginals.For instance, if they are assumed Gamma(φ i , φ i ) distributed, the relationship V(ε t,i |F t−1 ) = 1/φ i leads to very simple estimator of φ i from current values of residuals.By means of this approach, the remaining parameters can be updated from a sort of pseudo-loglikelihood conditioned on current estimates of the pre-estimated parameters.
In the case of a Normal copula a different strategy can be followed.The formula of the (unconstrained) ML estimator of the R matrix (Cherubini et al. 2004, p. 155), namely where q = (q 1 ; . . .; q T ) is a T × K matrix, can be plugged into the log-likelihood in place of R obtaining a sort of concentrated log-likelihood leading to a relatively simple structure of the score function.However, this estimator of R is obtained without imposing any constraint relative to its nature as a correlation matrix (diag(R) = 1 and positive definiteness).Computing directly the derivatives with respect to the off-diagonal elements of R we obtain, after some algebra, that the constrained ML estimator of R satisfies the following equations: for i = j = 1, . . ., K, where R i. and R .jindicate, respectively, the i-th row and the j-th column of the matrix R. Unfortunately, these equations do not have an explicit solution. 3An acceptable compromise 3 Even when R is a (2, 2) matrix, the value of R 12 has to satisfy the cubic equation: which should increase efficiency, although formally it cannot be interpreted as an ML estimator, is to normalize the estimator Q obtained above in order to transform it to a correlation matrix: where . This solution can be justified observing that the copula contribution to the likelihood depends on R exactly as if it was the correlation matrix of iid rv's q t normally distributed with mean 0 and correlation matrix R (see also (McNeil et al. 2005, p. 235)).Using this constrained estimator of R, the concentrated log-likelihood becomes It is interesting to note that, as long as ( 31), (32) too gives a relatively simple structure of the score function.Using some tedious algebra, we can show that the components of the score corresponding to θ and φ have exactly the same structure as above, with the quantity d t,i into (25) changed to where the C matrix is here given by and φ(.) indicates here the pdf of the standard normal computed at its argument.Of course, also in this case the parameters of the marginals can be updated by means of moment estimators computed from current residuals (instead that via ML) exactly as explained above.

Asymptotic Variance-Covariance Matrix
As customary in ML inference, the asymptotic variance-covariance matrix of parameter estimators can be expressed by using the Outer-Product of Gradients (OPG), the Hessian matrix, or in the sandwich form, more robust toward error distribution misspecification.The last option is what we use in the application of Section 5.

Multivariate Gamma
The generalization of the univariate gamma adopted by Engle and Gallo (2006) to a multivariate counterpart is frustrated by the limitations of the multivariate Gamma distributions available in the literature (all references below come from (Johnson et al. 2000, chapter 48)): many of them are bivariate formulations, not sufficiently general for our purposes; others are defined via the joint characteristic function, so that they require tedious numerical inversion formulas to find their pdf's.The formulation that is closest to our needs (it provides all univariate marginal probability functions for ε i,t as Gamma(φ i , φ i )), is a particular version of the multivariate Gamma's by Cheriyan and Ramabhadran (henceforth GammaCR, which is equivalent to other versions by Kowalckzyk and Trycha and by Mathai and Moschopoulos): where φ = (φ 1 ; . . .; φ K ) and 0 < φ 0 < min(φ 1 , . . ., φ K ) (Johnson et al. 2000, pp. 454-470).The multivariate pdf is expressed in terms of a cumbersome integral and the conditional correlations matrix of ε t has generic element which is restricted to be positive and is strictly related to the variances 1/φ i and 1/φ j .Given these drawbacks, Multivariate Gamma's will not be adopted here.

Multivariate Lognormal
A different specification is obtained assuming ε t conditionally log-Normal, where m and V are, respectively, the mean vector and the variance-covariance matrix of ln ε t |F t−1 and the constraint in ( 34) is needed in order for E(ε t |F t−1 ) to be equal to 1 (cf.Equation ( 5)).
The log-likelihood function is obtained by replacing the expression of ln f ε (ε t |F t−1 ) coming from (34) into Equation ( 12), leading to Accordingly, the portion of the score related to the derivative of l in θ (the parameters in the conditional mean) is given by Equation ( 14), with A t defined as in ( 15) and leading to A full application of the ML principle would require optimization of (35) also in the elements of the matrix V .However, differently from the maximization with respect to the same parameters of a Gaussian log-likelihood, this approach is tedious since the diagonal elements of V appear also in the term m, complicating the constraints to preserve its positive definiteness.
A possible approach to bypass this issue would be to resort to a reparameterization of V based on its Cholesky decomposition, as we did in Section 3 to represent the correlation matrix R in the copula-based specifications.Here we adopted a simpler approach, based on a method of moment estimator of the V elements.The ensuing estimation algorithm follows the sequence: • We compute log-residuals ln ε t (t = 1, . . ., T), according to the first two steps of (13), at the current parameter estimates; • We then estimate their variance-covariance matrix V ; • We estimate m = −0.5 diag(V ); • We use Equation (37) to update the estimate of θ, iterating until convergence.
Following (Newey and McFadden 1994, ch. 6) once again, the resulting asymptotic variance-covariance matrix of θ can be expressed as where 37) and λ = vech(V ) is estimated with the corresponding portion of the variance-covariance matrix of log-residuals ln ε t (t = 1, . . ., T).After some algebra we get where A t is defined in (15) and Ω λλ {u, v} indicates the (u, v)-th element of Ω λλ . 4Because of ( 39) and (40), Equation (38) simplifies to As an alternative, we can use the semiparametric approach by Cipollini et al. (2013), where the distribution of the error term ε t is left unspecified.In this case the pseudo-score equation corresponding to the θ parameter is again given by Equation ( 14), with A t defined as in (15) and leading to the equation The ensuing estimation algorithm works similarly to the log-Normal case iterating until convergence the following sequence: • We compute residuals ε t (t = 1, . . ., T), according the first two steps of (13), at the current parameter estimates;

•
We estimate Σ as their variance-covariance matrix;

•
We update the estimate of θ using Equation (43).

4
(40) and ( 41) come from standard properties of the multivariate Normal distribution: the former is implied by the fact that all centered odd moments of ln ε t are zero; the latter is implied by the structure of fourth centered moments of ln ε t .
The asymptotic variance-covariance matrix of θ is given by avar .
the form of which looks very similar to Ω −1 θθ in Equation ( 39).The (pseudo-)score Equations ( 37) and ( 43) look quite similar: beside a simpler expression of the GMM asymptotic variance-covariance matrix of θ, the essential difference rests in the nature of the martingale difference (MD) w t 'driving' the equation.The GMM w t relies on minimal model assumptions and does not require logs; as a consequence, it is feasible also in case of zeros in the data.Apart from the need to take logs, the expression of w t in the log-Normal case is a MD only if ln ε t |F t−1 has −0.5 diag(V ) as its mean; if this is not true, we would have a corresponding bias in estimating the ω coefficient appearing in the conditional mean µ t .

Trading Activity and Volatility within a vMEM
Trading activity produces a lot of indicators characterized by both simultaneous and dynamic interdependence.In this application, we concentrate on the joint dynamics of three series stemming from such trading activity, namely volatility (measured as realized kernel volatility, cf.Barndorff-Nielsen et al. (2011), but also Andersen et al. (2006) and references therein), volume of shares and number of shares traded daily.The relationship between volatility and volume as relating to trading activity was documented, for example, in the early contribution by Andersen (1996).Ever since, the evolution of the structure of financial markets, industry innovation, the increasing participation of institutional investors and the adoption of automated trading practices have strengthened such a relationship, and the number of trades clearly reflects an important aspect of trading strategies.For the sake of parsimony, we choose to focus on the realized volatility forecasts as the main product of interest from the multivariate model exploiting the extra information in the other indicators, as well as the contemporaneous correlation in the error terms.
The availability of ultra high frequency data allows us to construct daily series of the variables exploiting the most recent development in the volatility measurement literature.
As a leading example, we consider Johnson & Johnson (JNJ) between 3 January 2007 to 30 June 2013 (1886 observations).Such a stock has desirable properties of liquidity and a limited riskiness represented by a market beta generally smaller than 1.Raw trade data from TAQ are cleaned according to the Brownlees and Gallo (2006) algorithm.Subsequently, we build the daily series of realized kernel volatility, following Barndorff-Nielsen et al. (2011), computing the value at day t as where k(x) is the Parzen kernel x j is the j-th high frequency return computed according to (Barndorff-Nielsen et al. 2009, Section 2.2) and x j is the intradaily return of the j-th bin (equally spaced on 15 min intervals).For volumes (vol), that is the total number of shares traded during the day, and the number of trades (nt) executed during the day (irrespective of their size), we simply aggregate the data (sum of intradaily volumes and count of the number of trades, respectively).According to Figure 1, the turmoil originating with the subprime mortgage crisis is clearly affecting the profile of the series with an underlying low frequency component.For volatility, the presence of a changing average level was analyzed by Gallo and Otranto (2015) with a number of non-linear MEM's.Without adopting their approach, in what follows we implement a filter (separately for each indicator) in order to identify short term interactions among series apart from lower frequency movements.The presence of an upward slope in the first portion of the sample is apparent and is reminiscent of the evidence produced by Andersen (1996) for volumes on an earlier period.Remarkably, this upward movement is interrupted after the peak of the crisis in October 2008, with a substantial and progressive reduction of the average level of the variables.To remove this pattern, we adopt a solution similar to Andersen (1996), that is, a flexible function of time which smooths out the series.Extracting low frequency movements in a financial market activity series with a spline is reminiscent of the stream of literature initiated by Engle and Rangel (2008) with a spline-GARCH and carried out in Brownlees and Gallo (2010) within a MEM context.In detail, assuming that this component is multiplicative, we remove it in each indicator as follows: • we take the log of the original series; • we fit on each log-series a spline based regression with additive errors, using time t (a progressive counter from the first to the last observation in the sample) as an independent variable; 5 • the residuals of the previous regression are then exponentiated to get the adjusted series.
When used to produce out-of-sample forecasts of the original quantities, the described approach is applied assuming that the low frequency component remains constant at the last in-sample estimate.This strategy is simple to implement and fairly reliable for forecasting at moderate horizons.
Table 1 shows some similarities across original and adjusted series, with some expected increase in the correlation between volumes and number of trades in the adjusted series (due to a smaller correlation in the corresponding low frequency components).Although still quite large overall, after adjustment, correlations are more spread apart: as mentioned, the value concerning volumes and the number of trades, above 0.9, is the highest one; cor(rkv, vol) is about 0.6, whereas the cor(rkv, nt) is still above 0.7, all confirming the intuition that the information contained in other variables related to trading activity relevant.

Modeling Results
In the application, we consider a vMEM on adjusted data, where the conditional expectation µ t has the form (cf. Equation (3)) In order to appreciate the contribution of the different model variables, we consider alternative specifications for the coefficient matrices α 1 and β 1 (α 2 and γ 1 are kept diagonal in all specifications), and for the error term.
As of the former, we consider specifications with both α 1 and β 1 diagonal (labeled D); α 1 full and β 1 diagonal (labeled A); α 1 diagonal and β 1 full (labeled B); both α 1 and β 1 full (labeled AB).For the joint distribution of the errors, we adopt a Student-T, a Normal and an Independent copula (T, N and I as respective labels), in all cases with Gamma distributed marginals.As a comparison, the AB specification is also estimated by ML with log-Normal (AB-LN) and with GMM in the semiparametric AB-S.The estimated specifications are summarized in Table 2.When coupled with the conditional means in the table, the specifications with the Independent copula can be estimated equation-by-equation.

5
Alternative methods, such as a moving average of fixed length (centered or uncentered), can be used but in practice they deliver very similar results and will not be discussed in detail here.The spline regression is estimated with the gam() function in the R package mgcv by using default settings.
Table 2.Estimated specifications of the vMEM defined by ( 1) and ( 44), with error distribution specified, alternatively, by (8), or (34), or (5).Notation: I for Independent copula with Gamma marginals; N for Normal copula with Gamma marginals; T for Student-T copula with Gamma marginals; LN for log-Normal; S for semiparametric.

Error Distribution Conditional Mean (Parameters)
Estimation results are reported in Table 3, limiting ourselves to the equation for the realized volatility. 6Among copula-based specifications, the Student-T copula turns out to be the favorite version, judging upon a significantly higher log-likelihood function value, and lower information criteria; 7 the equation-by-equation approach (Independent copula) is dominated in both respects. 8 An estimation of the Diagonal model with the Normal/Student-T copula function (details not reported) shows log-likelihood values of 1947.91,respectively, 2034.07 (relative to the base value of 205.53 in the first column), pointing to both a substantial improvement coming from the contemporaneous correlation of the innovations and to the joint significance of the other indicators when the A specification is adopted.The comparison of the A (α 1 full; β 1 and γ 1 diagonal) versus the B (β 1 full; α 1 and γ 1 diagonal) formulations, seems to indicate a dominance of the former, at least judging upon the overall log-likelihood values.It is also interesting to note that model residual autocorrelation is substantially reduced only in the case of richer parameterizations (AB), where non-diagonality in both α 1 and β 1 captures possible interdependencies, with a sharp improvement in the LB diagnostics (although only marginally satisfying).
In general, it seems that the more relevant contribution in determining realized volatility comes from the lagged number of trades, but only in the AB models (some collinearity effect, diminishing individual significance, is to be expected); the relevance of such information is also highlighted by the Granger Causality tests, to check whether lagged model components attributable to either vol or nt have joint significance.In detail, using three indices where the first refers to the LHS variable, the second to the conditioning variable, and the third the lag, we test H 0 : α 1,j,1 = 0 (j = 2, 3) in the A-based formulations; H 0 : β 1,j,1 = 0 (j = 2, 3) in the B-based formulations; H 0 : α 1,j,1 = β 1,j,1 = 0 (j = 2, 3) in the AB-based formulations.
The own volatility coefficients at lag 2 are always significant (with negative signs); surprisingly, the leverage effect related to the sign of the returns is not.
The Normal and the Student-T specifications appear to provide similar point estimates, except for the non-diagonal β coefficients: for the latter, once again, the picture may be clouded by collinearity, since a formal log-likelihood test does indicate joint significance. 96 All models are estimated using Expectation Targeting (Section 3.2.1).The Normal copula based specifications are estimated resorting to the concentrated log-likelihood approach (Section 3.2.2).We omit estimates of the constant term ω.

7
The estimated degrees of freedom are 8. 53 (s.e. 1.16) and 8.18 (s.e. 1.05), respectively, in the A-T and AB-T formulations.We also tried full ML estimation of the AB-N specification getting a value of the log-likelihood equal to 2046.57, very close to the concentrated log-likelihood approach (Section 3.2.2) used in Table 3.

8
The stark improvement in the likelihood functions, coming from the explicit consideration of the correlation structure, does not guarantee similar gains in the forecasting ability of the same variables.This is similar to what happens in modeling returns, when the likelihood function of ARMA models is improved by superimposing a GARCH structure on the conditional variances: no substantive change in the fit of the conditional mean and no better predictive ability.9 No attempt at pruning the structure of the model following the automated procedure suggested in Cipollini and Gallo (2010) was performed.
The overview of the estimation results is complete by examining the Table 4, where we report the square root of the estimated elements on the main diagonal of Σ for all specifications, showing substantial similarity.Table 4. Square root of the estimated elements on the main diagonal of Σ (cf.Equation ( 5)).Finally, the correlation coefficients implied by the various specifications are reported in Table 5, showing that a strong correlation among innovations further supports the need for taking simultaneity into account.The alternative log-Normal and Semiparametric formulations deliver parameter inferences qualitatively similar to the corresponding versions based on copulas.The AIC and BIC statistics of the log-Normal are better than the copula based formulations (for the semiparametric they are not available); also its Ljung-Box diagnostics are marginally better.

D-I A-I A-N A-T B-N B-T AB-N AB-T AB-LN AB-
A comparison of the estimation times across specifications 10 reveals some differences.Student-T copula based formulations are the slowest; the Normal copula based estimated with the concentrated likelihood approach takes about 1/6 of time; finally, the log-Normal and the Semiparametric are a lot faster (about 1/20 of the Normal based).Differences notwithstanding, the largest times involved are very low on an absolute scale.
Figure 2 shows a comparison between the estimated residuals against their theoretical counterparts in the case of the Gamma and log-Normal distributions.In both cases, the fit seems to break apart for the tail of the distribution, pointing to the need for some care about the evolution of volatility of volatility (a mixture of distribution hypothesis is contemplated in, say, De Luca and Gallo ( 2009) in a MEM model).

Forecasting
We left the period 1 July-31 December 2013 (128 observations) for out-of-sample forecasting comparisons.We adopt a Diebold and Mariano (1995) test statistic for superior predictive ability using the error measures where x t and µ t denote here the observed and the predicted values, respectively.e N,t is the squared error, and can be interpreted as the loss behind an x t Normally distributed with mean µ t ; similarly, e G,t can be interpreted as the loss we can derive considering x t as Gamma distributed with mean µ t and variance proportional to µ 2 t .Comparisons are performed on both Original and Adjusted (i.e., removing the low frequency component) series.
10 Calculations with our routines written in R were performed on an Intel i7-5500U 2.4Ghz processor.We did not perform an extensive comparison of estimation times and we did not optimize performance by removing tracing of intermediate results.
Moreover, the copula-based and the alternative specifications are optimized resorting to different algorithms: the first ones, more cumbersome to optimize, are estimated toward a combination of NEWUOA (Powell 2006) and Newton-Raphson; for the second ones we used a dogleg algorithm (Nocedal and Wright 2006, ch. 4).
Table 6 reports the values of the Diebold-Mariano test statistic of different model formulations, against the AB-T specification (which has the lowest values of the information criteria among copula-based specifications, cf.Table 3), considering one-step ahead predictions.
There are no substantial differences in performance between the results on the original and the adjusted series. 11We notice a definite improvement of the specification involving simultaneous innovations over the equation-by-equation specifications (first two rows).The Student-T Copula improves upon the Normal formulation (marginally, line AB-N), but is substantially equivalent to the AB-LN (slightly better) and the AB-S (slightly worse).

Conclusions
The Multiplicative Error Model was originally introduced by Engle (2002) as a positive valued product process between a scale factor following a GARCH type dynamics and a unit mean innovation process.In this paper we have presented a general discussion of a vector extension of such a model with a dynamically interdependent formulation for the scale factors where lagged variables and conditional expectations may be both allowed to impact each other's conditional mean.Engle and Gallo (2006) estimate a vMEM where the innovations are Gamma-distributed with a diagonal variance-covariance matrix (estimated equation-by-equation).The extension to a truly multivariate process requires a complete treatment of the interdependence among the innovation terms.One possibility is to avoid the specification of the distribution and adopt a semiparametric GMM approach as in Cipollini et al. (2013).In this paper we have derived a maximum likelihood estimator by framing the innovation vector as a copula function linking Gamma marginals; a parametric alternative with a multivariate log-Normal distribution is discussed, while we show that the specification using a multivariate Gamma distribution has severe shortcomings.
We illustrate the procedure on three indicators related to market trading activity: realized volatility, volumes, and number of trades.The empirical results are presented in reference to daily data on the Johnson and Johnson (JNJ) stock.The data on the three variables show a (slowly moving) time varying local average which was removed before estimating the parameters.The three components are highly correlated with one another, but interestingly, their removal does not have a substantial impact on the correlation among the adjusted series.The specifications adopted start from a diagonal structure where no dynamic interaction is allowed, and an Independent copula (de facto an equation-by-equation specification).Refinements are obtained by inserting a Normal copula and a Student-T copula (which dramatically improve the estimated log-likelihood function values) and then allowing for the presence of dynamic interdependence in the form of links either between lagged variables, or between lagged conditional means, or both.Although hindered by the presence of some collinearity, the results clearly show a significant improvement for the fit of the equation for realized volatility when volumes and number of trades are considered.This is highlighted by significantly better model log-likelihoods, lower overall information criteria and improved autocorrelation diagnostics.The results of an out-of-sample forecasting exercise confirm the need for a simultaneous approach to model specification and estimation, with a substantial preference for the Student-T copula results within the copula-based formulations.From a interpretive point of view, the results show that the past realized volatility is not enough information by itself to reconstruct the dynamics: its modeling and forecastability are clearly influenced by other indicators of market trading activity, and simultaneous correlations in the innovations must be accounted for.
Under equal models for conditional means, the copula approach is substantially equivalent to other forms of specifications, be they parametric with a multivariate log-Normal error, or semiparametric (with GMM estimation).The advantage of a slightly more expensive (from a computational point of view) procedure lies in gaining flexibility in the tail dependence offered by the Student-T copula, which can be a plus when reconstructing joint conditional distributions, or when reproducing more realistic behavior in simulation is desirable.

Figure 1 .
Figure 1.Time series of the trading activity indices for JNJ (3 January 2007-30 June 2013).(a,c,e) original data; (b,d,f) data after removing the low frequency component.(a,b) realized kernel volatility (annualized percentage); (c,d) volumes (millions); (e,f) number of trades (thousands).The spike on 13 June 2012 corresponds to an important acquisition and a buy back of some of its common stock by a subsidiary.

Figure 2 .
Figure 2. QQ-plot between theoretical and empirical residuals for the realized volatility of JNJ.Comparison with Gamma and log-Normal distributions.Estimation period: 3 January 2007-30 June 2013.

Table 3 .
Estimated coefficients of the realized volatility equation for different model formulations (cf.Table 2) for JNJ (3 January 2007-30 June 2013).Robust t-stats in parentheses.An empty space indicates that the specification did not include the corresponding coefficient.Rows labeled vol and nt report the p-values for causality tests (see the text for details).Overall model diagnostics report Log-likelihood values, Akaike and Bayesian Information Criteria, and p-values of a joint Ljung-Box test of no autocorrelation at various lags.Last row reports the estimation time (in seconds).

Table 5 .
Estimated correlation matrices of the copula functions (specifications -N and -T), and of ε t (specifications -LN and -S).

Table 6 .
Diebold-Mariano test statistics for unidirectional comparisons, against the AB-T specification, considering 1-step ahead forecasts (out-of-sample period 1 July-31 December 2013).The error measures are defined as e N,t = 0.5(x t − µ t ) 2 (Normal loss) and e G,t = x t /µ t − 1 − ln(x t /µ t ) (Gamma loss), where x t and µ t denote the observed and the predicted values, respectively (cf.Section 5.2 for the interpretation).Boldface (italic) indicates 5% (10%) significant statistics.