5.1. Empirical Size and Power
We present some evidence on the performance of the proposed tests when they are applied to systems that might (or might not) exhibit within-dependence (i.e., autodependent processes, whose simplest form is autocorrelation) and/or dependence between processes (between-dependence or cross-dependence). We refer to the statistics given in Equations (11) and (13) as Test1 and Test2, respectively. For simplicity, the statistics have been computed assuming that the correct time lag has been obtained. To this end, we have considered the following systems of relations:
We have selected these systems (with these parameters, sample sizes and relationships) because they have been studied in previous works, which makes comparison straightforward. In particular, system S1 specifies two within-independent processes that are independent between themselves, while S2 specifies two between-independent Gaussian autoregressive AR(1) processes. System S3 represents two between-independent processes that exhibit a nonlinear form of within-dependence. On the other hand, systems S4, S5, S6, and S7 are between-dependent processes. S4 is a linear within-dependent model that entails ideal conditions (normality and linearity) for the application of parametric statistics. System S5 has a stable bivariate nonlinear autoregressive within-dependence. System S6 shows within-dependence in one of the variables, but not in the other. System S7 first shows a nontrivial nonlinear relation between the processes and, second, considers two different forms of within-dependence. Finally, system S8 is of particular interest, as it is a nonlinear deterministic process formed from the chaotic logistic equation.
Each system was simulated 2000 times, and the following tables collect the proportion of rejections of the null hypothesis at the 5% nominal level. The tests were performed for . The selection of parameter m is left to the practitioner, and we address this issue later in this section.
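The rejection proportions reported in the tables can be obtained with a simple Monte Carlo loop. The following is a minimal sketch, not the code used for the paper: the function names are hypothetical, the null system is a stand-in made of two independent Gaussian i.i.d. series, and a large-sample Pearson correlation test is used as a placeholder for Test1 and Test2, whose statistics are defined elsewhere in the paper.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_rejects(x, y, z_crit=1.96):
    """Placeholder test (assumption): reject independence when sqrt(n)*|r|
    exceeds the two-sided 5% normal critical value."""
    return math.sqrt(len(x)) * abs(pearson_r(x, y)) > z_crit


def simulate_null(n, rng):
    """Hypothetical null system: two independent i.i.d. N(0, 1) series."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    return x, y


def rejection_rate(simulate, rejects, n, n_rep=2000, seed=0):
    """Proportion of replications in which the test rejects the null."""
    rng = random.Random(seed)
    hits = sum(rejects(*simulate(n, rng)) for _ in range(n_rep))
    return hits / n_rep


rate = rejection_rate(simulate_null, pearson_rejects, n=200)
print(f"empirical size at n=200: {rate:.3f}")
```

Under the null, the printed proportion should be close to the nominal 0.05. Replacing `simulate_null` with a dependent system turns the same loop into an empirical power estimate.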
Table 1 shows the power and size results of Test1 for the eight models, for the null of the processes being i.i.d. and independent between themselves.
We observe first that, when the stochastic system is generated under the null (i.e., two independent processes that are both auto-independent), the test rejects at approximately the nominal 5% level. In this regard, the new test Test1 behaves as expected under the null, regardless of the sample size. The results for systems S2 and S3 suggest that Test1 easily detects a departure from one of the conditions of the null (namely, within-dependence), even when the processes are between-independent. The results for S4, S5 and S7 indicate that, when the departure from the null is larger (in this case both null conditions are violated), Test1 behaves powerfully, even in the case of nonlinear dependencies, as in systems S5 and S7. As regards system S6, notice that, despite the fact that process Y is i.i.d., process X depends on Y. Even when one process has no within-dependence but there is between-dependence, the test exhibits power to detect these departures. Finally, the results for system S8 are especially interesting because the processes involved are purely deterministic (there is no random term) and the dependence between them is evident. Despite this peculiar dynamic structure, it is noticeable that the Pearson cross-correlation test has a rejection rate of the null of 5%; in other words, Pearson’s test rejects only at the nominal level, which implies that it systematically suggests not rejecting the null of independence between processes when there is an obvious deterministic dependence. In contrast, Test1 rejects the null with great power. The explanation for this performance is that Pearson’s test is limited to detecting linear relationships between variables, while Test1 considers any form of potential relationship.
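The blindness of linear correlation to deterministic nonlinear dependence can be illustrated with a small sketch. The pair below is not system S8 (whose equations are not reproduced here); it is a hypothetical example in which one series is the logistic map with parameter 4 and the other is its one-step-ahead image, so that the second series is an exact function of the first.

```python
import math


def logistic_series(n, x0=0.3, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x) for n steps from x0."""
    xs = [x0]
    for _ in range(n):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


path = logistic_series(3000)
x, y = path[:-1], path[1:]  # y_t = 4 * x_t * (1 - x_t), with no noise term
r_xy = pearson_r(x, y)
print(f"Pearson correlation between x and y: {r_xy:.3f}")
```

Although y is fully determined by x, the sample correlation stays close to zero, because the relationship is symmetric around 0.5 and therefore has no linear component.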
On the other hand, it can also be concluded from the above comments on the simulations that Test1 is not capable of distinguishing between the forms of dependence (between and/or within). In other words, if the final user obtains a rejection of the null, she does not know the reason for the rejection. Given the simplicity of the test and its power with complex dependence forms, it is advisable to use Test1 as a first step. However, to complete the process, it is necessary either to use Test2 or to apply some sort of pre-whitening process. We now explore the first solution (Test2) and, later in the paper, we consider the behavior of the test in the case of pre-whitening in a multivariate scenario.
Table 2 shows the power and size results of Test2 for the eight models, for the null of the processes being independent only between themselves.
The empirical behavior of other available tests on the same systems and sample sizes is reported in the Appendix. Based on the results for these models, we make the following remarks:
(i) The output for systems S1, S2 and S3 suggests that Test2 correctly handles models that exhibit several forms of within-dependence, and that this internal dependence does not impair the ability of Test2 to indicate, at a nominal level of 0.05, that both processes are independent. In this regard, it is worth noting that a Haugh-type test could not have been used with system S3, because these tests report reliable results only for systems of linear and Gaussian dependence, as is the case of S2.
(ii) When within-dependence is linear, as in S4, the power of the test is impressive regardless of the sample size. This empirical fact implies that Test2 can be used to detect simple cases of linear relationships between variables. As reported in [14], Haugh-based tests also have extremely good power for this type of linear dependence. Accordingly, the final user could safely use either the nonparametric Test2 or parametric Haugh-based tests.
(iii) However, when there is within-dependence of a nonlinear nature, as in S5, S6, S7 and S8, it is well known that Haugh-based tests are unable to detect dependence, regardless of the sample size. As can be observed from this simulation, Test2 detects dependence when the sample is large enough.
From (i)–(iii), it can be concluded that Test2 can be used to effectively detect dependence between variables with fewer restrictions than other available tests in the literature, and, from a practical point of view, the larger the sample size, the more reliable the results are.
As mentioned earlier, an interesting advantage of both tests is that they can be used in a multivariate setting. We now consider two new sets of multivariate systems. The first set consists of systems S9 and S10. Each is a three-variable stochastic linear system that is used in this paper to show that Test1 can be satisfactorily applied to pre-whitened data, as proved in the previous section:
These systems were generated and estimated by Ordinary Least Squares (OLS) to obtain the linear structure of each variable, and the residuals were then tested with Test1. The results are in Table 3.
After removing the linear structure, independent estimated errors are obtained. Given that the errors are simulated independently, it is expected that Test1 will not reject the null beyond the nominal level, as shown in the table above.
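The pre-whitening step can be sketched as follows. This is only an illustration under stated assumptions: systems S9 and S10 are three-variable systems whose equations are not reproduced here, so the example uses a single hypothetical AR(1) series with coefficient 0.6, fits it by OLS, and checks that the residuals no longer show lag-1 autocorrelation. The function names are hypothetical.

```python
import random


def ols_ar1_residuals(x):
    """Regress x_t on a constant and x_{t-1} by OLS; return (slope, residuals)."""
    lagged, current = x[:-1], x[1:]
    n = len(current)
    ml, mc = sum(lagged) / n, sum(current) / n
    sxy = sum((a - ml) * (b - mc) for a, b in zip(lagged, current))
    sxx = sum((a - ml) ** 2 for a in lagged)
    slope = sxy / sxx
    intercept = mc - slope * ml
    resid = [b - intercept - slope * a for a, b in zip(lagged, current)]
    return slope, resid


def lag1_autocorr(x):
    """Sample autocorrelation of x at lag 1."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, n))
    den = sum((v - m) ** 2 for v in x)
    return num / den


rng = random.Random(1)
series = [0.0]
for _ in range(2000):
    # Hypothetical data generating process: AR(1) with Gaussian errors.
    series.append(0.6 * series[-1] + rng.gauss(0, 1))

slope, resid = ols_ar1_residuals(series)
print(f"estimated slope: {slope:.3f}")
print(f"lag-1 autocorrelation, raw: {lag1_autocorr(series):.3f}")
print(f"lag-1 autocorrelation, residuals: {lag1_autocorr(resid):.3f}")
```

The residual series would then be passed to the independence test in place of the raw data.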
The second set of systems is designed to study the behavior of Test2 in a multivariate system of complex relationships. To this end, we have considered three systems:
S11 considers the case where two variables, X, do not have between-dependence (cross-dependence) but are both driven by a common variable Z. One can think of Z as an environmental variable that determines (explains) X, and therefore both are related by this external variable. Here, we expect Test2 to reject the null. S12 is a nonlinear and more complex model than S11 but is, in essence, similar. This system has two non-interacting variables, X, that share common environmental forcing. Finally, S13 considers the case of three variables where one of them has no dynamic structure, and the other two only have one-sided dependence. We expect Test2 to be able to detect this dependence. The results are given in Table 4.
The results suggest that Test2 is able to clearly detect departures from the null in a multivariate context. Even in the case of hidden common variables, the test unveils the indirect relationship, regardless of whether it is linear or nonlinear and regardless of the sample size. We also conclude that Test2 needs more observations (larger sample sizes) to detect dependences in scenarios like S13, where, of six potential relationships between variables, only one exists (from Y to X).
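The common-driver mechanism can be made concrete with a small simulation. The system below is hypothetical and is not S11 or S12 (their equations are not reproduced here): two series share the lagged driver z with an assumed coefficient of 0.8 and otherwise have independent Gaussian noise. A plain correlation is used only to make the induced dependence visible; it is not one of the tests discussed in this paper.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(7)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n + 1)]  # common driver

# Neither series enters the equation of the other; both load on z_{t-1}.
x = [0.8 * z[t - 1] + rng.gauss(0, 1) for t in range(1, n + 1)]
y = [0.8 * z[t - 1] + rng.gauss(0, 1) for t in range(1, n + 1)]

r_raw = pearson_r(x, y)

# Removing the (known, simulated) driver leaves only the independent noise.
x_clean = [x[t - 1] - 0.8 * z[t - 1] for t in range(1, n + 1)]
y_clean = [y[t - 1] - 0.8 * z[t - 1] for t in range(1, n + 1)]
r_clean = pearson_r(x_clean, y_clean)

print(f"correlation with common driver present: {r_raw:.3f}")
print(f"correlation after removing the driver:  {r_clean:.3f}")
```

The two series appear related only through the shared driver, which is the kind of indirect relationship the null of between-independence is expected to reject.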
5.2. Comparison with Other Tests
As we indicated in the introductory section, the technical literature on this topic has produced several statistics that test for independence between time series. This subsection compares the most relevant tests.
A comparison among tests can be conducted at several levels. We compare at the level of: the assumptions required to derive and implement the test, the parameters that the final user has to fix to conduct the test, and the empirical power of the test.
According to the literature, the improvements have centered on some criteria that are, to some extent, related to the assumptions required for deriving the statistic and implementing the test(s). In this regard, scholars have mainly focused on the following criteria:
(i) stationarity (or not) of the system generating process,
(ii) linearity (or not) of the system, and
(iii) robustness (or not) to the presence of outliers.
On the other hand, all available statistics require the final user to make certain decisions that will necessarily affect the final result of the test. When the test is applied to the residuals of a model, one of the most important requirements is that a correct model be estimated. Obviously, pre-estimation (or not) of an autoregressive model before using the test is a critical decision. Another important choice arises because some of the tests rely on the use of kernels. Throughout the literature, there has been some controversy regarding how to choose the kernel and to what extent the empirical behavior of the test changes with the selected kernel. Along with the kernel, truncation parameters must also be selected. Finally, all the tests require choosing the number of observations in the lag vector, which is equivalent to parameter m (embedding) of our tests.
These observations lead us to complete the previous list with the following items:
Table 5 allows us to compare the tests considered in terms of their robustness to processes that might be nonstationary and nonlinear, and to the presence of outliers (criteria (i)–(iii)). The table also facilitates comparisons in line with the choices that the user has to make before using the test(s) (criteria (iv)–(vi)).
According to the previous table, the tests presented in this work have a greater range of applicability: the data that can be analyzed are compatible with a wide range of models, whereas other tests are less generally applicable. From a practical point of view, the new tests facilitate the user’s work, since she has to select a smaller number of parameters. This is especially relevant since we remove the burden of modeling a (correct) autoregressive process. Each of our techniques only requires selecting parameter m, which is a necessary parameter in all the available tests.
As explained in the introductory section, nearly all the available tests derive from Haugh’s seminal test, which is the best known along with Hong’s test, the one with the best behavior in terms of power. To complete the comparison, we now compare the results in terms of power. To do so, we consider these two well-known tests, namely the Haugh and Hong tests, and compare them with Test2, which is the most general one. To make a fair comparison, it is only conducted on models to which all the tests can be applied.
We first describe these competing tests and then show their results on the corresponding systems.
Haugh’s (1976) [1] procedure considers a portmanteau statistic based on the residual cross-correlations between the two residual series of length n, obtained by fitting univariate models to each of the series. The maximum lag included in the statistic is a fixed integer that must be chosen a priori. The asymptotic distribution of the statistic is chi-square under the null hypothesis of independence, and the hypothesis is rejected for large values of the test statistic.
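As an illustration of the portmanteau idea, the sketch below computes the statistic in the form in which Haugh’s test is usually written, n times the sum of squared residual cross-correlations over lags from -M to M, to be compared with a chi-square distribution with 2M + 1 degrees of freedom. The symbol M, the function names and the simulated inputs are assumptions made for this example; for brevity, simulated white noise stands in for the residuals of fitted univariate models.

```python
import math
import random


def cross_corr(u, v, k):
    """Sample cross-correlation between u_t and v_{t+k}; k may be negative."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if k >= 0:
        pairs = zip(u[:n - k], v[k:])
    else:
        pairs = zip(u[-k:], v[:n + k])
    return sum((a - mu) * (b - mv) for a, b in pairs) / (su * sv)


def haugh_statistic(u, v, max_lag):
    """Return (statistic, degrees of freedom) for lags -max_lag..max_lag."""
    n = len(u)
    total = sum(cross_corr(u, v, k) ** 2 for k in range(-max_lag, max_lag + 1))
    return n * total, 2 * max_lag + 1


rng = random.Random(3)
n = 500
u = [rng.gauss(0, 1) for _ in range(n)]
v_indep = [rng.gauss(0, 1) for _ in range(n)]   # unrelated to u
v_dep = [a + rng.gauss(0, 1) for a in u]        # linearly related to u

stat_indep, df = haugh_statistic(u, v_indep, max_lag=5)
stat_dep, _ = haugh_statistic(u, v_dep, max_lag=5)
print(f"independent pair: S = {stat_indep:.1f} on {df} degrees of freedom")
print(f"dependent pair:   S = {stat_dep:.1f} on {df} degrees of freedom")
```

For the independent pair the statistic should be of the same order as its degrees of freedom, while a linear relationship inflates it by orders of magnitude.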
Hong (1996) [8] generalizes Haugh’s statistic. In fact, Hong’s test is based on a weighted sum of squared residual cross-correlations. The weighting depends on a kernel function k and a smoothing parameter d (both have to be selected a priori). Under the null hypothesis, the test statistic is asymptotically standard normal, and the test rejects the null for large values of the statistic.
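For orientation, the standardized form in which Hong’s statistic is commonly written is sketched below. The notation is an assumption made here and may differ from that of [8]: the residual cross-correlation at lag j is denoted \hat{r}_{uv}(j), the name Q_n and the centering and scaling terms S_n(k) and D_n(k) are labels introduced for this sketch, and k and d are the kernel and smoothing parameter mentioned above.

```latex
Q_n = \frac{n \sum_{j=1-n}^{n-1} k^2(j/d)\, \hat{r}_{uv}^{2}(j) - S_n(k)}
           {\sqrt{2\, D_n(k)}},
\qquad
S_n(k) = \sum_{j=1-n}^{n-1} \left(1 - \frac{|j|}{n}\right) k^2(j/d),
\qquad
D_n(k) = \sum_{j=2-n}^{n-2} \left(1 - \frac{|j|}{n}\right)
         \left(1 - \frac{|j|+1}{n}\right) k^4(j/d).
```

With a truncated (uniform) kernel the weights are constant over a finite window of lags, which is the sense in which this statistic can be read as a standardized generalization of Haugh’s.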
The empirical power results on the corresponding systems are collected in Table 6.
The results point to several observations: (a) all tests have maximum power for the simplest system (S4), so all are highly competitive there, and the following comments therefore apply only to systems S5, S6 and S7. (b) None of the tests compared is competitive for the small sample sizes (200 and 500). (c) The Haugh and Hong tests do not improve in power as the sample size increases, whereas Test2 not only improves but also reaches levels close to full power.
5.3. Selecting Parameter m
As mentioned earlier, all tests involve the selection of a parameter, m, that defines the basic units of analysis. This parameter is an integer that stands for the fixed length of the vectors that are formed as inputs to the tests. These vectors are generally formed from m consecutive observations from a time series (raw data or residuals). Throughout this paper, we have referred to this parameter as the embedding dimension or m-history. This terminology is very common in the fields related to entropy and nonlinear chaotic dynamical systems.
None of the cited tests provide advice for how to select this parameter. In this section, we reflect on how to select the parameter m and we analyze the empirical effect of increasing m.
An obvious observation is that, if the system generating process consists of two (or more) cross-dependent equations where dependence occurs at lags larger than m, then no test will capture such dependence, and the test will treat the series as independent. One natural solution is to construct the same m-histories with some fixed time delay.
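A minimal sketch of this construction follows. The function name and the parameter name `tau` for the time delay are assumptions made for the example; with `tau = 1` the function returns ordinary m-histories of consecutive observations.

```python
def m_histories(series, m, tau=1):
    """Build delay vectors (x_t, x_{t+tau}, ..., x_{t+(m-1)*tau}) from a series."""
    span = (m - 1) * tau  # distance between first and last element of a vector
    return [
        tuple(series[t + j * tau] for j in range(m))
        for t in range(len(series) - span)
    ]


data = [1, 2, 3, 4, 5, 6, 7]
print(m_histories(data, m=3))         # consecutive observations
print(m_histories(data, m=3, tau=2))  # every second observation
```

Increasing the delay lets a vector of the same length m reach observations further apart, at the cost of fewer vectors per series.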
This problem, and also this solution, are rarely found in Economics and Finance; however, they are more frequent in Physics and in subfields related to nonlinear and chaotic behavior. Indeed, the modeling and prediction of chaotic time series require proper reconstruction of the state space from the available data in order to successfully estimate invariant properties of the embedded attractor. Thus, one must choose an appropriate delay time and embedding dimension m for phase space reconstruction. For the aim of the presented tests, there is no need to go beyond what other tests do for analyzing dependences.
Given that the new tests rely on symbolic measures, it is worth noting that previous research on these measures (see [24]) has provided rules for selecting m. Each m induces a number of symbols. The number of symbols has to be large enough to capture departures from the null. For m = 2, only 4 symbols are evaluated, which are expected to be too few to detect departures from the null(s). For the next integer, m = 3, 36 symbols are evaluated, and increasing to an embedding dimension of 4 means that 576 symbols are used.
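The symbol counts quoted above (36, 576 and, later in this section, 14,400) match (m!)^2, that is, the number of pairs of orderings of m elements for two series. The sketch below assumes, as this pattern suggests, that each m-history is mapped to the permutation that sorts it (an ordinal pattern); that mapping and the function names are assumptions made for the example, not a restatement of the paper's definitions.

```python
from math import factorial


def ordinal_pattern(window):
    """Return the permutation of indices that sorts the window in increasing order."""
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))


def joint_symbol_count(m, n_series=2):
    """Number of joint symbols when each of n_series m-histories has m! patterns."""
    return factorial(m) ** n_series


print(ordinal_pattern((0.3, 0.9, 0.1)))
for m in range(2, 6):
    print(m, joint_symbol_count(m))
```

The factorial growth explains why even moderate values of m quickly demand very long series.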
According to the authors, gains in terms of power can be obtained by increasing m according to the following rule: given a data set of T observations, the embedding dimension will be the largest m that satisfies with m = 2, 3, 4, .... In our case, the rule is . The intuition behind this rule is clear: on the one hand, the larger the number of symbols, the greater the sensitivity for detecting departures from the null. On the other hand, the number of symbols grows very fast, and statistical devices require enough samples to behave normally. Note that the larger the m, the finer the search for dependences, but at the cost of increasing the sample size required to satisfy the rule.
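A rule of this type can be sketched as a search for the largest m whose symbol count, scaled by a constant, does not exceed the number of observations. The exact inequality is not reproduced in the text above, so the form used here, c times (m!)^2 at most T with c = 5, is purely an assumption for illustration; the constant `c`, the function name and the upper bound `m_max` are hypothetical.

```python
from math import factorial


def largest_m(T, c=5, n_series=2, m_max=10):
    """Largest m >= 2 with c * (m!)**n_series <= T, or None if even m = 2 fails.

    The inequality and the default c = 5 are assumptions for illustration.
    """
    best = None
    for m in range(2, m_max + 1):
        if c * factorial(m) ** n_series <= T:
            best = m
        else:
            break  # the symbol count only grows with m
    return best


for T in (200, 500, 3000):
    print(T, largest_m(T))
```

Under these assumed values, a series of 3000 observations would support m = 4 but not m = 5, which agrees with the remark that 14,400 symbols are too many for the sample sizes considered.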
Finally, the study is completed by empirically analyzing the effect (in terms of power) of a change in the embedding parameter. We consider the basic models (S4–S8) and evaluate the test for m = 2, 3 and 4. It makes no sense to use m = 5 because, in this case, 14,400 symbols would be used while (at most) only 3000 observations are available. Results are collected in Table 7.
From these results, we first observe that, for a small number of symbols (i.e., for m = 2), the test has no power (as expected), except for the deterministic system and the linear one. Second, the power of the test tends to increase with m. For this reason, we recommend that potential users of the test adhere to the automatic rule for choosing parameter m.