1. Introduction
In many scientific fields, a natural or experimentally controlled phenomenon is observed and a dataset is collected. From these observations, one may be interested in testing basic assumptions with respect to some theoretical model. One of these assumptions that often appears in physical models is the so-called symmetry hypothesis; see, for example, [1]. In order to validate a model under investigation, one typically wants to thoroughly test these kinds of hypotheses with the help of a statistical method.
There are various types of symmetry that need to be distinguished first. The most common concerns random variables taking values in the space $\mathbb{R}$ of real numbers. In this context, a random variable X is said to be symmetric around the origin if $X \stackrel{d}{=} -X$, where here and in the sequel, $\stackrel{d}{=}$ means equality in distribution. More generally, X is symmetric around $a \in \mathbb{R}$ if and only if $X - a \stackrel{d}{=} a - X$. For a pair $(X,Y)$ of random variables taking values in $\mathbb{R}^2$, many types of symmetry have been proposed in the literature. The pair $(X,Y)$ is said to be exchangeable if and only if $(X,Y) \stackrel{d}{=} (Y,X)$. This definition entails that X and Y have the same distribution. Another notion is reflected symmetry: $(X,Y)$ is reflection symmetric around $(a,b) \in \mathbb{R}^2$ if and only if $(X-a, Y-b) \stackrel{d}{=} (a-X, b-Y)$. This definition entails in particular the symmetry of X around a and the symmetry of Y around b. While this paper focuses on the two above-mentioned notions of bivariate symmetry, other definitions have been proposed, e.g., joint symmetry and spherical symmetry.
In the statistics and probability literature, there are two main ways to characterize the stochastic behaviour of random variables and random vectors. The most widely used is the distribution function approach. In that case, one works with the function $F(x) = P(X \le x)$ in the univariate case and with the joint distribution $H(x,y) = P(X \le x, Y \le y)$ in the bivariate case. A less popular alternative uses the so-called characteristic functions associated with random variables and random vectors. Since one can recover the distribution function of a random variable (or vector) from its characteristic function, and vice versa, the various hypotheses of symmetry described previously can equivalently be stated in terms of distribution functions or using characteristic functions. As will be seen, these two approaches lead to different and competing statistical procedures.
This paper focuses on consistent nonparametric tests of symmetry based on Cramér–von Mises functionals of empirical distribution and characteristic functions. These tests are attractive since they do not require any assumptions on the form of the underlying distribution and provide universally consistent procedures. In addition, as will be seen, these test statistics for symmetry can be expressed as V-statistics. This representation allows for the derivation of their asymptotic behaviour and, most importantly, suggests a resampling method based on the multiplier bootstrap for the computation of p-values. Compared to permutation methods, which are generally employed when testing symmetry, this strategy is substantially quicker and provides elegant formulas that make the tests easy to implement. The main features of this work are the following:
- (i) Describe a general family of Cramér–von Mises test statistics for symmetry hypotheses based on empirical distributions and characteristic functions. In the case of univariate symmetry, exchangeability and reflected symmetry, some of these statistics have already been proposed in the literature.
- (ii) Deduce the asymptotic behaviour of these test statistics under the null hypothesis upon noting that they are related to degenerate V-statistics.
- (iii) Suggest an efficient alternative to the use of permutations based on the multiplier bootstrap method adapted to V-statistics.
- (iv) Present the results of a simulation study that investigates the properties of the tests under the null hypothesis, as well as under violations of symmetry hypotheses.
- (v) Develop a general framework for testing a broad class of symmetry hypotheses.
The paper is organized as follows. Section 2 provides some results on degenerate V-statistics and their multiplier versions that will prove useful throughout the paper. Section 3 focuses on tests of symmetry for random variables, while Section 4 is devoted to tests of bivariate exchangeability and reflected symmetry. The results of an extensive simulation study are presented and discussed in Section 5. A unified framework that contains as special cases the univariate and bivariate tests of symmetry encountered in Section 3 and Section 4, but also many other types of symmetry, is developed in Section 6. Technical arguments are relegated to the Appendix.
2. Some Preliminaries on V-statistics
All of the test statistics for symmetry that will be encountered in this work are related to first-order degenerate V-statistics. Therefore, their asymptotic behaviour can be derived using results that one can find, for instance, in the books by [2] and [3]. In what follows,
are identically distributed independent observations in
. Some of the test statistics that will be described are of the form:
where
is a symmetric kernel of degree two that is first-order degenerate in the sense that
for all
. In that case,
where
and
are the U-statistics:
The following result is a straightforward consequence of Theorem 1, p. 79, in [2].
Proposition 1. If , the statistic converges in distribution to: where are independent random variables and are the eigenvalues of the mapping .
Now, consider the statistic:
where
is a kernel of degree three that satisfies the following assumptions:
- 𝒜1.
for all , i.e., ϕ is symmetric with respect to its first two components;
- 𝒜2.
for all .
The large-sample behaviour of
is stated as a proposition whose proof is deferred to the
Appendix.
Proposition 2. The test statistic is asymptotically equivalent to the V-statistic with degenerate bivariate kernel , i.e., As a consequence, if , then converges in distribution to: where are independent random variables and are the eigenvalues of the mapping .
As mentioned in the Introduction, the proposed methodology for the computation of p-values will be based on the multiplier bootstrap. Specifically, a multiplier sample is obtained by generating, independently of the data, a random sample
of independent and identically distributed random variables, such that
and
. The suggested multiplier versions of
and
are given, respectively, by:
From a slight adaptation of Theorem 3.1 in [4], which applies to first-order degenerate U-statistics, one obtains that
is a valid replicate of
asymptotically. For
, one could show using arguments similar to those in the proof of Proposition 2 that
is asymptotically equivalent to:
so that the validity of
to replicate
asymptotically can be deduced, as well.
For computational purposes, define the matrices
, such that:
Letting
and
, one can then write:
In practice, the multiplier procedure is repeated B times by generating independent vectors of multiplier random variables, i.e., for each , . Then, one computes and using the above formulas. These replicates of and are very quick to compute since the matrices A and need to be evaluated only once from the data.
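The computational recipe above can be sketched in a few lines. The displayed formulas are not reproduced in this text, so the sketch assumes the common normalisation $V_n = n^{-1} \sum_{i,j} K(X_i, X_j)$ and a multiplier replicate of the form $n^{-1} \sum_{i,j} (\xi_i - 1)(\xi_j - 1) K(X_i, X_j)$, with exponential multipliers of mean one and variance one. The function names and the choice of multiplier law are illustrative assumptions, not the author's code.

```python
import numpy as np

def multiplier_replicates(K, B=1000, rng=None):
    """Return the V-statistic and B multiplier replicates.

    K is the n x n matrix of kernel evaluations K(X_i, X_j); it is computed
    once from the data and reused for every replicate.
    Assumed forms (not taken from the paper's displays):
        V_n     = (1/n) * sum_{i,j} K_ij
        V_n^(b) = (1/n) * sum_{i,j} (xi_i - 1)(xi_j - 1) K_ij
    """
    rng = np.random.default_rng(rng)
    n = K.shape[0]
    v_n = K.sum() / n
    # Assumed multiplier law: exponential with mean 1 and variance 1.
    xi = rng.exponential(1.0, size=(B, n))
    delta = xi - 1.0  # centred multipliers, one row per replicate
    # Quadratic form delta_b' K delta_b for each replicate b.
    reps = np.einsum("bi,ij,bj->b", delta, K, delta) / n
    return v_n, reps

def multiplier_pvalue(K, B=1000, rng=None):
    """Proportion of multiplier replicates at least as large as V_n."""
    v_n, reps = multiplier_replicates(K, B, rng)
    return float(np.mean(reps >= v_n))
```

Because only the vectors of multipliers change between replicates, the cost per replicate is one quadratic form in the precomputed matrix, which is what makes the approach quicker than recomputing a statistic on permuted data.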
3. Tests of Univariate Symmetry
Many tests of univariate symmetry have been proposed over the years. An early contribution is that of [5] based on a Cramér–von Mises statistic. Tests of symmetry about an unspecified point have been studied by [6,7]; see also the more recent contribution by [8], where invariant tests based on the empirical characteristic function are proposed. Extensions of these tests are investigated by [9]. Tests based on kernel density estimation have been investigated by [10,11], where the computation of p-values relies on the bootstrap. Data-driven smooth tests of symmetry have been proposed by [12].
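Here, one focuses on consistent tests based on distribution and characteristic functions in the case of a known center of symmetry. To this end, let $X_1, \ldots, X_n$ be independent and identically distributed copies of a continuous random variable X. For $x \in \mathbb{R}$, let $F(x) = P(X \le x)$ be the distribution function of X, and for $t \in \mathbb{R}$, let $c(t) = E(e^{\mathrm{i} t X})$ be its characteristic function. Here and in the sequel, $\mathrm{i}^2 = -1$, and E is the expectation operator. The goal in this section is to describe test procedures for the null hypothesis $\mathcal{H}_0 : X - a \stackrel{d}{=} a - X$. One can focus on the case $a = 0$ only, i.e., $\mathcal{H}_0 : X \stackrel{d}{=} -X$. Indeed, the methodology extends easily to the case $a \neq 0$ by observing that $X - a \stackrel{d}{=} a - X$ is equivalent to $X' \stackrel{d}{=} -X'$, where $X' = X - a$, and by working with the sample of transformed data $X_1', \ldots, X_n'$, where $X_i' = X_i - a$ for each $i \in \{1, \ldots, n\}$.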
The first step is to note that one can write the null hypothesis
from a distribution function or a characteristic function point-of-view. If
is true, then
for all
and
for all
. Hence, the null hypothesis can be written equivalently as:
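As a consequence, consistent test statistics can be based either on the empirical version of F or on the empirical version of c given, respectively, by:
$$F_n(x) = \frac{1}{n} \sum_{i=1}^n \mathbb{I}(X_i \le x) \quad \text{and} \quad c_n(t) = \frac{1}{n} \sum_{i=1}^n e^{\mathrm{i} t X_i}.$$
Here and in the sequel, $\mathbb{I}(s) = 1$ if the statement s is true and zero otherwise. Natural test statistics for univariate symmetry are therefore given by: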
where
and
denotes the modulus of the complex number
z. In the definition of the Cramér–von Mises statistic
,
puts mass
at each element of the sample. This statistic is a special case of the one proposed by [13] when X is continuous. An asymptotically equivalent version of this test statistic has been investigated by [14]; see also [5]. To the best of the author’s knowledge,
has not been investigated yet. The test statistic
uses the characteristic function point-of-view and is based on a nonnegative weight function
ω that must be specified by the experimenter. Some examples of weight functions are described in Section 5.2. The following lemma provides formulas for the computation of these test statistics.
Lemma 3. One has:where ,and . Since and , the fact that under the null hypothesis entails and . As a consequence, and are V-statistics of order two with first-order degeneracy, and their large-sample behaviour follows from Proposition 1. Note, however, that an additional requirement on is necessary in order that . In particular, it will hold true if the moment of order two exists.
Since
is symmetric with respect to its first two components and from the fact that
, which entails
for all
, the asymptotic behaviour of
is deduced from Proposition 2. Finally, the multiplier versions of
,
and
are derived from the formulas in (4).
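To illustrate how a characteristic-function statistic for symmetry about zero reduces to a double sum, here is a minimal sketch for one specific choice that is an assumption of this example: the weight ω is taken to be the standard normal density, and the statistic is taken to be $n \int \{\mathrm{Im}\, c_n(t)\}^2 \omega(t)\, dt$, where $c_n$ is the empirical characteristic function. With that weight, the identity $\int \cos(tu)\,\omega(t)\,dt = e^{-u^2/2}$ gives the kernel $K(x,y) = \tfrac{1}{2}\{e^{-(x-y)^2/2} - e^{-(x+y)^2/2}\}$. The statistic used in the paper may differ from this one by a constant factor or by the weight family; the function names are hypothetical.

```python
import numpy as np

def symmetry_kernel_matrix(x):
    """Kernel matrix K(X_i, X_j) for symmetry about zero.

    Assumes a standard normal weight function, so that
    K(x, y) = 0.5 * (exp(-(x - y)^2 / 2) - exp(-(x + y)^2 / 2)).
    """
    x = np.asarray(x, dtype=float)
    diff = x[:, None] - x[None, :]
    summ = x[:, None] + x[None, :]
    return 0.5 * (np.exp(-0.5 * diff**2) - np.exp(-0.5 * summ**2))

def cf_symmetry_statistic(x):
    """n * integral of Im(c_n(t))^2 against the N(0,1) density, as a V-statistic."""
    K = symmetry_kernel_matrix(x)
    return K.sum() / len(x)
```

When X is symmetric about zero, $E\{K(x, X)\} = 0$ for every x, which is the first-order degeneracy used in the text. A sample made of values together with their negatives has a real empirical characteristic function, so the statistic vanishes for it.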
4. Tests of Bivariate Symmetry
Although bivariate symmetry has received less attention than the univariate symmetry hypothesis, many tests of bivariate symmetry have been proposed. The earliest contributions come from [15,16], where nonparametric tests were developed; these tests have been reconsidered by [17]. A test using the empirical distribution function has been suggested by [18]. An investigation comparing some tests of bivariate symmetry was done by [19]. Extensions to tests of multivariate symmetry were considered by [20].
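In this section, the focus is put on bivariate exchangeability and reflected symmetry. In the sequel, $(X_1,Y_1), \ldots, (X_n,Y_n)$ are independent and identically distributed copies of a continuous random pair $(X,Y)$. For $(x,y) \in \mathbb{R}^2$, the joint distribution of $(X,Y)$ is $H(x,y) = P(X \le x, Y \le y)$, and for $(s,t) \in \mathbb{R}^2$, its characteristic function is $C(s,t) = E\{e^{\mathrm{i}(sX + tY)}\}$. The proposed test statistics will be based on the sample versions of H and C, namely:
$$H_n(x,y) = \frac{1}{n} \sum_{i=1}^n \mathbb{I}(X_i \le x, Y_i \le y) \quad \text{and} \quad C_n(s,t) = \frac{1}{n} \sum_{i=1}^n e^{\mathrm{i}(s X_i + t Y_i)}.$$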
4.1. Exchangeability
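The goal here is to test the null hypothesis $\mathcal{H}_0 : (X,Y) \stackrel{d}{=} (Y,X)$. When $\mathcal{H}_0$ is true, $H(x,y) = H(y,x)$ and $C(s,t) = C(t,s)$. Hence, the null hypothesis can be written equivalently as:
$H(x,y) - H(y,x) = 0$ for all $(x,y) \in \mathbb{R}^2$;
$C(s,t) - C(t,s) = 0$ for all $(s,t) \in \mathbb{R}^2$.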
In view of these two characterizations of the null hypothesis, consider:
where Ω is a nonnegative and integrable weight function. The test statistic
was introduced by [16], where a test of symmetry is performed using an approximation of the distribution under the null hypothesis. Because the latter is inaccurate under high levels of dependence, an alternative procedure was proposed by [21]. Explicit formulas for
and
are provided in the next lemma.
Lemma 4. One has:where:and:where for , The kernel
is symmetric with respect to its first two components. In addition,
under the null hypothesis, because
. The asymptotic behaviour of
can then be deduced from Proposition 2. Similarly,
, so that
and
is a V-statistic with first-order degeneracy. Its large-sample behaviour then follows from Proposition 1. Multiplier versions of
and
derive from the formulas in Equation (4).
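As an illustration of the characteristic-function route to exchangeability, the sketch below evaluates $n \int\!\!\int |C_n(s,t) - C_n(t,s)|^2\, \Omega(s,t)\, ds\, dt$ in closed form, where $C_n$ is the empirical characteristic function of the pairs. It assumes that Ω is the density of two independent standard normal variables, which is only one possible weight and not necessarily one used in the paper. Under that assumption, expanding the squared modulus gives the kernel $K\{(x_1,y_1),(x_2,y_2)\} = 2\{e^{-[(x_1-x_2)^2 + (y_1-y_2)^2]/2} - e^{-[(x_1-y_2)^2 + (y_1-x_2)^2]/2}\}$. The function names are hypothetical.

```python
import numpy as np

def exchangeability_kernel_matrix(x, y):
    """Kernel matrix for the exchangeability statistic.

    Assumes Omega is the bivariate standard normal density (independent
    components); the kernel compares each pair with the swapped version
    of every other pair.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Squared distance between (x_i, y_i) and (x_j, y_j).
    same = (x[:, None] - x[None, :])**2 + (y[:, None] - y[None, :])**2
    # Squared distance between (x_i, y_i) and the swapped pair (y_j, x_j).
    swapped = (x[:, None] - y[None, :])**2 + (y[:, None] - x[None, :])**2
    return 2.0 * (np.exp(-0.5 * same) - np.exp(-0.5 * swapped))

def cf_exchangeability_statistic(x, y):
    """V-statistic form (1/n) * sum_{i,j} K_ij of the weighted L2 distance."""
    K = exchangeability_kernel_matrix(x, y)
    return K.sum() / len(x)
```

The statistic is nonnegative by construction and equals zero for a sample that contains every pair together with its swapped version.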
4.2. Reflected Symmetry
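As mentioned in the Introduction, the null hypothesis of reflected symmetry around $(a,b)$ is $\mathcal{H}_0 : (X-a, Y-b) \stackrel{d}{=} (a-X, b-Y)$. For simplicity, one assumes that $(a,b) = (0,0)$, so that the focus is put on $\mathcal{H}_0 : (X,Y) \stackrel{d}{=} (-X,-Y)$. The extension to arbitrary $(a,b)$ is straightforward upon noting that the null hypothesis is equivalent to $(X',Y') \stackrel{d}{=} (-X',-Y')$, where $X' = X - a$ and $Y' = Y - b$. Hence, one would only have to consider the sample of transformed data $(X_1',Y_1'), \ldots, (X_n',Y_n')$, where $X_i' = X_i - a$ and $Y_i' = Y_i - b$ for each $i \in \{1, \ldots, n\}$.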
When
is true,
and
. Letting
, the distribution function and characteristic function versions of
are then respectively:
;
.
Letting
, consider the test statistics:
Explicit formulas are given next.
Lemma 5. One has:where:and:where . Proceeding similarly as with
, one can show that
. Thus, since
is symmetric with respect to its first two components, the asymptotic behaviour of
follows from Proposition 2. Furthermore, since
, one deduces
, and
is a first-order degenerate V-statistic whose large-sample behaviour follows from Proposition 1. Multiplier versions of
and
are derived from Equation (4).
4.3. A Note on Copula Symmetry
A class of bivariate symmetries, less well known than exchangeability and reflected symmetry, is based on copulas. Copulas shed new light on the understanding of bivariate symmetry. The starting point is a theorem by [22] that states that there exists a function
called a copula, such that
for all
. If the marginal distributions
and
are continuous, then
D is unique. As a consequence,
D completely characterizes the dependence between
X and
Y when
is continuous.
Because Sklar’s representation entails that the random pair
is distributed as
D, exchangeability and reflected symmetry can be reformulated as follows:
- (i)
The pair is exchangeable if and only if and ;
- (ii)
The pair is reflection symmetric around if and only if , and , where and .
The reader is referred to [23] for more details on the general theory of copulas.
Assuming the availability of independent random copies of a pair $(U,V)$ distributed as D, one can test for the exchangeability and reflected symmetry of the copula only. This setup is equivalent to assuming that the marginal distributions of X and Y are known, so that a random sample $(X_1,Y_1), \ldots, (X_n,Y_n)$ can be transformed to the copula scale by applying the marginal distribution function of X to each $X_i$ and that of Y to each $Y_i$. For copula exchangeability, the method described in Subsection 4.1 can be applied directly; for copula reflected symmetry, this corresponds to the case $(a,b) = (1/2, 1/2)$, and then, the methodology in Subsection 4.2 may be used with the centred pairs $(U_i - 1/2, V_i - 1/2)$.
The marginal distributions
and
are generally unknown. In that case, it is suggested to work instead with
, where
and
,
are the empirical distribution functions. However, doing so results in much more complicated limit distributions and calls for suitably adapted multiplier methods. See the works by [24] on copula exchangeability and by [25] on copula reflected symmetry (called radial symmetry in that case) for details.
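The transformation to the copula scale with empirical margins can be sketched as follows. Evaluating an empirical distribution function at its own sample points gives rank divided by n when there are no ties, which holds with probability one for continuous data. The function name is hypothetical, and this sketch covers only the transformation itself, not the adapted multiplier methods of [24] and [25].

```python
import numpy as np

def pseudo_observations(x, y):
    """Map a bivariate sample to the copula scale with empirical margins.

    Returns (u, v) with u_i = F_n(x_i) = rank(x_i) / n and likewise for v,
    where F_n is the empirical distribution function. Assumes no ties.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # argsort of argsort gives zero-based ranks when there are no ties.
    u = (np.argsort(np.argsort(x)) + 1) / n
    v = (np.argsort(np.argsort(y)) + 1) / n
    return u, v
```

The pseudo-observations depend on the data only through the ranks, so they are unchanged by strictly increasing transformations of either margin. They are no longer independent across observations, which is one reason the limit distributions become more complicated.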
5. Monte Carlo Study of the Sampling Properties of the Tests
5.1. Parameters of the Simulations
This section explores the sample properties of the tests for the three null hypotheses considered in
Section 3 and
Section 4, namely
,
and
. Specifically, the ability of the tests to keep their 5% nominal level under the null hypothesis and their power against alternative hypotheses will be investigated with the help of simulated datasets. The probability of rejection of the null hypothesis will be estimated from 1000 replicates under each scenario. The computation of p-values will be based on
bootstrap samples using a version of the multiplier method called the Bayesian bootstrap. In that case,
are replaced by
,
, where
are independent and identically distributed from the exponential law with mean one; see [26] for details. Many other choices are possible for the stochastic structure of the multiplier variables, but in the author’s experience, this choice has little influence on the performance of the tests.
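A minimal sketch of how Bayesian bootstrap multipliers could be generated is given below. It assumes that the weights are mean-one exponential draws divided by their sample mean, which is the usual form of the Bayesian bootstrap; the exact expression is not reproduced in this text, so this normalisation and the function name are assumptions.

```python
import numpy as np

def bayesian_bootstrap_multipliers(n, B, rng=None):
    """Generate B vectors of n Bayesian bootstrap weights.

    Each row contains exponential(1) draws divided by their own mean
    (assumed form), so the weights are positive and average to one.
    """
    rng = np.random.default_rng(rng)
    e = rng.exponential(1.0, size=(B, n))
    return e / e.mean(axis=1, keepdims=True)
```

For example, `bayesian_bootstrap_multipliers(100, 1000, rng=0)` returns an array with one row of weights per bootstrap replicate.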
5.2. Size and Power of the Tests of Univariate Symmetry
This subsection investigates the properties of the tests based on
,
and
for testing the null hypothesis of univariate symmetry
. The computation of
calls for the choice of a weight function
ω. For the simulation results that will be presented, one considers
and
for
. One can show that for
and
,
where
is the density of the standard univariate normal distribution.
In order to investigate the ability of the tests to reject the null hypothesis of univariate symmetry around zero, one considers the general family of skew-asymmetric densities, as defined by [27]. Specifically, for a given symmetric density f and a given absolutely continuous distribution function G, such that $g = G'$ is a symmetric density around zero, a skew-asymmetric density is defined for $\delta \in \mathbb{R}$ by $f_\delta(x) = 2 f(x)\, G(\delta x)$. The case $\delta = 0$ corresponds to a situation under the null hypothesis. When f and G are respectively the density and the cumulative distribution function of the standard normal distribution, one recovers the skew-normal family as introduced by [28]. For the simulation results that are reported in Table 1, one also considers the skew-T distribution with three degrees of freedom and the skew-Cauchy distribution (which is in fact the skew-T with one degree of freedom). Since $f_\delta(x) \le 2 f(x)$ for all $x \in \mathbb{R}$, datasets from $f_\delta$ can be generated using the rejection method; see [29] for more details. The idea is to simulate repeatedly X from f and U from the uniform distribution on $(0,1)$ until $U \le G(\delta X)$; then $X \sim f_\delta$.
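The rejection step can be sketched for the skew-normal case, where f and G are the standard normal density and distribution function. A draw X from f is accepted when an independent uniform U satisfies $U \le G(\delta X)$, so the accepted values have density proportional to $f(x) G(\delta x)$, that is, $2 f(x) G(\delta x)$ after normalisation. The function names are hypothetical.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def sample_skew_normal(n, delta, rng=None):
    """Draw n values from the density 2 * phi(x) * Phi(delta * x) by rejection."""
    rng = np.random.default_rng(rng)
    out = np.empty(n)
    filled = 0
    while filled < n:
        x = rng.standard_normal()  # proposal from the symmetric density f
        u = rng.uniform()          # uniform on (0, 1)
        if u <= norm_cdf(delta * x):
            out[filled] = x        # accept: x follows the skewed density
            filled += 1
    return out
```

On average, half of the proposals are accepted whatever the value of δ, because $E\{G(\delta X)\} = 1/2$ when X is symmetric about zero. Other members of the family are obtained by replacing the proposal and `norm_cdf` with another symmetric density and distribution function.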
Looking at
Table 1, one can say that the six tests are very good at keeping their 5% nominal level under the null hypothesis, even when
. An exception occurs for
under the Cauchy distribution, where the test is too conservative. This behaviour is explained by the fact that the requirement
is not satisfied in that case. The power of these tests increases with the sample size, as expected from their theoretical consistency. The power also increases as a function of the parameter
δ that controls the level of asymmetry. Note that departures from
based on skew-Student and skew-Cauchy alternatives are more easily detected than those from the skew-normal distribution. Overall, the best tests are those based on
and
, as well as on the characteristic function statistics
and
.
5.3. Size and Power of the Tests of Exchangeability
The test statistics
and
are investigated here for testing the null hypothesis
of exchangeability. Two weight functions are considered for
, namely:
As highlighted in Subsection 4.3, the hypothesis of exchangeability of a pair
requires that
and that
. For the simulation results that will be presented, one assumes a
distribution for both
X and
Y, so that the asymmetry will be controlled solely by the form of the copula. Here, one considers a general class of asymmetric bivariate distributions of the form:
where Φ is the cumulative distribution function of the
law and
D is a symmetric copula,
i.e.,
for all
. The special case
corresponds to a scenario under the null hypothesis of exchangeability. This construction is based on a proposal by [
30]. For the results in
Table 2, the copula
D belongs either to the normal or the Gumbel–Hougaard family of symmetric models,
i.e.,
where
is the bivariate standard normal density with correlation
and
. These parameters are taken so that they match a Kendall’s tau of 0.75,
i.e.,
and
. The values of the asymmetry parameter are
.
In light of simulations not presented here, the values
offer the best performance for the test statistics
and
. From the entries in
Table 2, one can see that the five tests are rather good at keeping their size under
, having in mind the fact that the multiplier method is valid asymptotically as
. As expected, the power of the tests increases with the sample size. Here, the level of asymmetry is not necessarily monotone in
δ. Indeed, the highest level of asymmetry occurs for values of
δ around
when it is measured, for example, by the index introduced by [31]; the simulation results agree with this fact, since the highest power is observed when
. Here, the test based on the empirical distribution function statistic
is significantly less powerful than those based on the empirical characteristic function; a similar feature has been documented by [32] when testing for copula symmetry. The best tests overall are those based on
. Finally, note that asymmetries based on the Gumbel–Hougaard copula are better detected than those based on the normal copula.
5.4. Size and Power of the Tests of Reflected Symmetry
For the same weight functions
and
considered in the preceding subsection for testing exchangeability, one can show that:
where
,
,
and
.
Following [33], reflected asymmetric bivariate densities can be built from a generalization of skew asymmetric univariate densities. Specifically, consider a density
f, such that
, and a one-dimensional distribution function
G, such that its density
is symmetric around zero. Then,
is a skew asymmetric bivariate density. In the special case when
and
is the cumulative distribution function of the
distribution, one recovers the so-called skew-normal distribution with correlation coefficient
, namely:
For the results in
Table 3,
and
. Results not presented here with
show that the power is one, even for a sample size as low as
. Here, similar comments as for the tests of exchangeability apply for the ability of the tests to keep their nominal level and for their power as
n increases. Compared with the results in Table 2, however, the estimated probabilities of rejection are higher here. This can be explained, at least in part, by the fact that the asymmetry in the bivariate skew asymmetric model
affects both the marginal distributions and the copula. Here, reflected asymmetry increases as a function of
δ, resulting in power results that increase with
δ. Overall, the test based on
performs well under all of the scenarios that were considered. The characteristic function statistics also perform well, the best being
. Finally, note that the power is higher when
compared to
.
6. Unification into a General Framework
The hypotheses considered so far can be treated somewhat simultaneously by taking a general group of transformations. To this end, take a random vector
in
with joint distribution function
,
and
p-variate characteristic function
,
. Then, let
be a symmetric matrix, such that
and consider testing the null hypothesis
against
. When
and
, one recovers the univariate symmetry encountered in
Section 3. In the case
, the exchangeability and reflected symmetry hypotheses treated in
Section 4 correspond respectively to:
Letting
and upon noting that
, the null hypothesis
can be written equivalently as:
From a sample
of independent copies of
, define the empirical versions of
F and
C respectively by:
A Cramér–von Mises statistic based on the sample distribution function is:
where
is the distribution function of
. Taking Ω to be a nonnegative integrable weight function defined on
, a characteristic-function statistic is:
From computations similar to those in Lemmas 3–5, one can show that:
where for
,
and for
,
Since
, it follows that
under
. Since in addition,
is symmetric with respect to its first two components, the asymptotic distribution of
under the null hypothesis can be deduced from Proposition 2. One also has
, and then,
is a first-order degenerate V-statistic with bivariate kernel
whose asymptotic distribution follows from Proposition 1. The multiplier versions of these statistics follow from the formulas in Equation (4).
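The unified characteristic-function statistic can be sketched for a generic transformation matrix, written `M` here as a hypothetical name. The sketch assumes that `M` is symmetric with $M^2 = I$ (hence orthogonal) and that Ω is the density of p independent standard normal variables; neither the symbol nor the weight is taken from the paper. Under these assumptions, $n \int |C_n(t) - C_n(Mt)|^2\, \Omega(t)\, dt$ equals $n^{-1} \sum_{i,j} K(X_i, X_j)$ with $K(x,y) = 2\{e^{-\|x-y\|^2/2} - e^{-\|x - My\|^2/2}\}$.

```python
import numpy as np

def involution_statistic(X, M):
    """Characteristic-function statistic for H0: X =d M X.

    X is an (n, p) data matrix and M a symmetric p x p matrix with M @ M = I
    (hypothetical notation). Assumes a standard normal weight on R^p.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D array as n univariate observations
    M = np.atleast_2d(np.asarray(M, dtype=float))
    n, p = X.shape
    if M.shape != (p, p) or not np.allclose(M, M.T) \
            or not np.allclose(M @ M, np.eye(p)):
        raise ValueError("M must be a symmetric p x p matrix with M @ M = I")
    XM = X @ M.T  # transformed observations M X_j, one per row

    def sqdist(A, B):
        # Matrix of squared Euclidean distances between rows of A and rows of B.
        return ((A[:, None, :] - B[None, :, :])**2).sum(axis=2)

    K = 2.0 * (np.exp(-0.5 * sqdist(X, X)) - np.exp(-0.5 * sqdist(X, XM)))
    return K.sum() / n
```

Taking `M = [[-1]]` corresponds to univariate symmetry about zero, `M = [[0, 1], [1, 0]]` to exchangeability, and `M = -np.eye(2)` to reflected symmetry about the origin, in line with the special cases mentioned in the text.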
To close this section, note that many symmetry hypotheses are related to a group of transformations rather than to a single transformation matrix
. This situation has been considered by [34] from a distribution function point-of-view using a bootstrap method for the computation of p-values. In order to handle this case under the framework of the current paper, let
be a set of
symmetric matrices and consider the null hypothesis
for all . For example, spherical symmetry corresponds to
being the set of all orthogonal transformations in
, while multivariate exchangeability occurs when
is the set of all permutation matrices in
.
The key here is to work with a combination matrix
, such that for
,
if and only if
is a constant vector. Then, define
and
and note that under the null hypothesis
,
and
are
-dimensional vectors of identical functions in
. With this in hand, the null hypothesis can be re-written either as
or
. Hence, letting
and
, with
, test statistics are given by:
It can be shown that is of the form required in Proposition 2, while is a V-statistic with a bivariate kernel having a first-order degeneracy, hence falling under the requirements of Proposition 1.