1. Introduction
Let  be a compact group of linear transformations (operators) from  to . A Borel probability measure P on  is called -symmetric if and only if there exist an affine nonsingular transformation  from  onto  and a -invariant Borel probability measure  such that  In other words, if X is a random vector with the distribution  then there exists an affine nonsingular transformation  such that the random vector  is -invariant, which in turn means that  for all transformations S from the group . Clearly, the affine transformation  is not unique, as one could take instead  for an arbitrary transformation S from the group . On the other hand, one can always fix  by normalizing it in a suitable way.
Obviously, we can define  as the distribution of the random variable . The pair  will be called the parameters (specifiers) of the -symmetric probability measure  We denote by  the set of all -symmetric distributions on 
We are interested in the following problem. Given an independent identically distributed (i.i.d.) sample  from the distribution  defined on the probability space , construct and study tests for -symmetry of the distribution P.
Our group approach unifies the theory of different tests for different types of symmetry. Below we give some examples of types of symmetry  that are common in the literature, together with the corresponding choices of .
Example 1. Let . Then a -invariant probability measure  is a so-called sign change symmetric measure. A probability measure P is -symmetric if there exists an affine transformation  of the form  with some  such that . In this case, P is often called a diagonally or reflectively symmetric measure.
Let  denote the Euclidean norm of a vector . If  one can define . Obviously, the parameter  and the transformation  are uniquely defined for any P such that the corresponding X is integrable (not necessarily -symmetric). We can also define in such a generality  as the distribution of the random variable .
 Example 2. Let  be the group of all orthogonal transformations in . Then a -symmetric probability measure P is often called an ellipsoidally symmetric, elliptically symmetric, or elliptically contoured measure, and  is a spherically symmetric one. In this case, if  one can define  as the mean  and  as the square root of the covariance operator of  so that  for any x from . Define  as the distribution of the random variable . As in the previous case, these parameters are defined for any P such that the corresponding X is square-integrable (not necessarily -symmetric).
 We refer to the papers [1, 2, 3, 4, 5, 6, 7, 8, 9] for results on ellipsoidal and spherical symmetry testing.
Example 3. Let  and let  be the group of all transformations mapping a regular polygon with k vertices centered at 0 onto itself. Clearly,  is a subgroup of the group of all orthogonal transformations. Thus, an affine transformation  can be fixed in the same way as in example 2.
 Example 4. Let  be a group of all reflections about hyperplanes . Then for each  there exists a permutation  such that  for all . In this case, a -invariant probability measure  is called a permutation symmetric measure.
 As in examples 2 and 3 we can define 
 for any 
x from 
, where 
 and 
 is the square root of the covariance operator of 
 provided 
 (see also [10, 11]).
Tests for symmetry of a multivariate distribution play an important role in statistics and in various fields of science. To name a few examples: in finance theory, log-returns of assets are assumed to be ellipsoidally symmetric; in genetics, gene expression values are assumed to be diagonally symmetrically distributed; in image analysis, components are assumed to be spherically symmetric; in linear programming, the distribution of feasible solutions is assumed to be permutation symmetric. In statistics, the sliced inverse regression method due to Li [12] works for ellipsoidally symmetric distributions. Moreover, since tests for normality extend to tests for ellipsoidal symmetry, any research field that employs multivariate analysis based on a normality assumption can benefit from relaxing it to an ellipsoidal symmetry assumption. Symmetry tests are therefore needed in applications. See [13] for a detailed survey on the use of symmetry in various scientific fields.
The rest of the paper is organized as follows. We introduce notation and construct test statistics, with examples, in Section 2. The main results and the bootstrapped test statistics are given in Section 3, followed by a detailed example in Section 4. The proofs are in Section 5. Closing remarks are in Section 6, followed by technical details in the Appendix.
  2. Notations and Preliminaries
Let 
m denote the uniform distribution (the normalized Haar measure) on the group 
 Given a bounded Borel function 
f on 
 we define
      
      It is easy to check that for a 
-symmetric 
P distribution with specifiers 
 we have
      
      for any bounded Borel function 
f. Indeed, 
 for any 
. Since 
 is a compact group, one can integrate the equality over 
 with respect to the uniform measure. Thus, 
 implies (
2).
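The averaging argument behind (2) can be checked numerically for a finite group. The following Python sketch is only an illustration under stated assumptions: it assumes that the averaged function is the finite-group analogue of integration with respect to the uniform measure m, that is, the map sending x to the mean of f(Sx) over all S in the group, and it takes the rotations by multiples of 2π/k in the plane as a hypothetical example. The names `rotation`, `group_average` and the test function `f` are introduced here for illustration and do not come from the paper.

```python
import numpy as np

def rotation(angle):
    """2x2 rotation matrix by the given angle."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def group_average(f, group, x):
    """Average f over a finite group: (1/|G|) * sum_S f(S x), for each row x."""
    return np.mean([f(x @ S.T) for S in group], axis=0)

def f(x):
    """A bounded smooth test function (hypothetical choice)."""
    return np.tanh(x @ np.array([1.0, 0.5]))

k = 5  # assumed number of polygon vertices
group = [rotation(2 * np.pi * j / k) for j in range(k)]

rng = np.random.default_rng(0)
base = rng.exponential(size=(40, 2))            # an asymmetric point cloud
orbit = np.vstack([base @ S.T for S in group])  # exactly invariant empirical measure

# For the invariant sample, the mean of f equals the mean of its group average.
gap_symmetric = abs(f(orbit).mean() - group_average(f, group, orbit).mean())
# For the asymmetric sample, the two means differ.
gap_asymmetric = abs(f(base).mean() - group_average(f, group, base).mean())
print(gap_symmetric, gap_asymmetric)
```

The first gap vanishes up to rounding because the orbit sample is invariant under every element of the group, while the second gap is a crude "measure of asymmetry" of the original sample for this particular f.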
As a result, if a class 
 characterizes the distribution, i.e.
      
      implies that 
 then 
P is 
-symmetric if and only if (
2) holds for all 
 In general, we call 
P a 
-asymmetric distribution if and only if there exists a function 
 such that (
2) does not hold. This observation is the key idea behind the tests that we construct and study. Naturally, a class 
 should be rich enough and possess good properties for further analysis. Let us describe it. Let 
 be a 
semialgebraic subgraph class as introduced in [
7]. Basically, for a function from such a class, its subgraph can be constructed from a union of intersections of a finite number of subgraphs of polynomials of a finite degree in 
. The same should be true for a product of two functions from such a class. For instance, one can use polynomials of bounded degree or trigonometric functions of bounded frequency as 
. The precise definition can be found in Appendix. We will provide a few examples later on.
Let 
 be the empirical distribution based on the sample 
 Assume in what follows that 
 is a 
-consistent estimator of 
. And, furthermore, there exists such a function 
 that 
 and as 
For example, assuming 
 let us define 
 and 
 as
      
      and
      
      where all vectors are columns and superscript 
T denotes transposition. Then one can define 
 in examples 2-4 and 
 in example 1 for any 
. Under the condition 
  is a 
-consistent estimator of 
. Weaker moment assumptions on 
P can be imposed if other statistics are considered for estimation of 
, such as a sample median and an 
M-estimator for the covariance matrix. See, e.g., [14].
The scaled residuals of the observations 
 are defined as
      
      Let 
 denote the empirical distribution based on the sample 
 Our approach to the problem of testing for 
-symmetry will be to use the sup-norms of the stochastic process
      
     as test statistics
      
     Such functionals can be viewed as “measures of asymmetry” of the empirical distribution because of the relationship (
2).
Note that a nonsingular affine transformation of the data  results in an orthogonal transformation of the scaled residuals. If a class  is invariant with respect to all orthogonal transformations (i.e. for all  and any orthogonal transformation O we have ), then the test statistic defined as the sup-norm of the process  is affine invariant. This is the case in the following examples.
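The claim that a nonsingular affine transformation of the data acts on the scaled residuals as an orthogonal transformation can be verified numerically. The sketch below assumes the moment-based estimators (sample mean and symmetric square root of the sample covariance); the helper names `inv_sqrt_spd` and `scaled_residuals`, the matrix `B` and the shift `c` are hypothetical and serve only as an illustration.

```python
import numpy as np

def inv_sqrt_spd(mat):
    """Inverse of the symmetric square root of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(mat)
    return (v / np.sqrt(w)) @ v.T

def scaled_residuals(x):
    """Rows of x are observations; returns cov^{-1/2} (x_i - mean) as rows."""
    mu = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    return (x - mu) @ inv_sqrt_spd(cov)  # the inverse root is symmetric

rng = np.random.default_rng(1)
x = rng.exponential(size=(50, 3))
B = rng.normal(size=(3, 3)) + 3.0 * np.eye(3)  # a nonsingular matrix (hypothetical)
c = np.array([1.0, -2.0, 0.5])
y = x @ B.T + c                                # affine transformation of the data

z_x, z_y = scaled_residuals(x), scaled_residuals(y)

# Recover the matrix O with z_y[i] = O z_x[i] by least squares, then check
# that O is orthogonal and reproduces the transformed residuals.
O = np.linalg.lstsq(z_x, z_y, rcond=None)[0].T
print(np.allclose(O @ O.T, np.eye(3)), np.allclose(z_x @ O.T, z_y))
```

Consequently, any statistic that depends on the residuals only through a class of functions invariant under orthogonal transformations takes the same value on both data sets.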
Example 1.1. Consider 
 from example 1. Let
      
      be the class of all half-spaces in 
, where 
 denotes the unit sphere in 
. Consider the class
      
      For 
, we have 
 The process 
 becomes
      
      The test statistic is represented as
      
 Example 1.2. Consider 
 from example 1. Let 
 be the class of all half-spaces in 
 as in the previous example. Let
      
      for 
 with 
. Then we have 
. The process 
 becomes
      
      and the test statistic looks like
      
 In one-dimensional case 
 this test statistic becomes
      
      One gets the expression that resembles a well known test for symmetry based on the empirical distribution function 
,
      
      See, for instance, the discussion in [15].
Example 2.1. Consider 
 from example 2. Let
      
      be the class of “caps” on the unit sphere 
 Consider the class
      
      For 
 we have 
 The process 
 becomes now
      
      The test statistic 
 can also be represented as
      
      where 
 is the rearrangement of 
 such that 
 These tests were studied in [
7] and [
9].
 Example 2.2. Consider 
 from example 2. Let 
 denote the linear space of spherical harmonics of degree less than or equal to 
l in 
 and let 
 be the unit ball in 
 Denote
      
      Then for 
 we have 
 and 
, where 
 is the average of 
 on 
 In this case, the process 
 becomes
      
      The statistic 
 becomes
      
      where 
 denotes an orthonormal basis of the space 
   for 
 and 0 otherwise, and 
  of a set denotes the number of elements of the set. These tests were studied in [7] and [9], where their superiority over other tests in level preservation and power was shown both theoretically and in a simulation study. A similar approach was used to test for multivariate normality in [16]. The authors of [6] developed a different kind of test for ellipsoidal symmetry based on spherical harmonics.
 Example 2.3. Consider 
 from example 2. Let 
 be the class of all half-spaces in 
 as in the example 1.1. For 
 where 
 we have 
 where 
 The process 
 in this case is
      
      and the test statistic can be defined as
      
      This type of test statistic was systematically studied in [7, 9, 10].
 Example 2.4. Consider 
 from example 2. Let
      
      For 
 we have 
 where 
 Thus, the process 
 becomes
      
      and the test statistic can be chosen as
      
 Example 2.5. Consider 
 from example 2. Consider the class
      
      For 
 we have 
, where 
 denotes the Bessel function of the 
l-th order, the constant 
 depends only on 
d. The process 
 becomes
      
      and the test statistic can be chosen as
      
 Consider  from example 3. Due to similarity between examples 2 and 3, one can choose the same classes of functions for  from example 3. We give just one of the examples as an illustration.
Example 3.1. Consider the class 
 from example 2.1. Then for 
 we have 
 for all 
, where 
 is the rotation on angle 
, 
. In this case, the process 
 is
      
      The test statistic 
 can be also represented as
      
      where 
 is the rearrangement of 
 such that 
 Example 4.1. Consider 
 from example 4. Let 
 be the class of all half-spaces in 
 as in the example 1.1. Denote
      
      Then for 
 such that 
 for 
 we have
      
      where the summation is over all permutations 
 of 
. In this case the process 
 is
      
      and the test statistic can be defined as
      
      where the last supremum is taken over all combinations 
 out of 
. Well known and frequently used Friedman’s rank tests are based on the similar choice of a class 
. For reference see the papers [
17] and [
11].
 It is not hard to see that the function classes defined in examples 1.1, 1.2, 2.1–2.4, 3.1, 3.2, and 4.1 are semialgebraic subgraph classes. In addition, the classes  characterize the distribution in the case of examples 1.1, 2.1, 2.3, 2.4, 3.1, 4.1 above.
We say the class of transformations  preserves the semialgebraic property if for any polynomial p on  of degree less than or equal to r the set  belongs to  for some q and l (see Appendix for the definition). Classes , defined in examples 1–4, preserve the semialgebraic property.
Let
      
      It follows from (
2) that, for a 
-symmetric distribution 
P and for all 
f
Let 
 denote the 
P-Brownian bridge, i.e. a centered Gaussian process indexed by functions in 
 with the covariance
      
      We will frequently use integral notation for 
      As always, 
 denotes the space of all uniformly bounded functions on 
 with the sup-norm 
A sequence of stochastic processes 
 is said to converge weakly in 
 (in the sense of Hoffmann-Jørgensen) to the stochastic process 
 if and only if there exists a Radon probability measure 
 on 
 such that 
 is the distribution of 
 and, for all bounded and 
-continuous functionals 
 we have 
 where 
 stands for the outer expectation, which is defined as 
 for a 
. See, for instance, [18].
We assume in what follows that the class 
 satisfies standard measurability assumptions used in the theory of empirical processes (see [
19] or [
18]). We also need smoothness conditions (S) on 
P and 
, which are given in Appendix.
  3. Main Results
Theorem 1 Suppose that  is a semialgebraic subgraph class, the smoothness conditions (S) hold and  Define a Gaussian stochastic process
      
      whose distribution is a Radon measure in . Then the sequence of stochastic processes
      
      converges weakly in the space  to the process  In particular, if P is -symmetric with specifiers  then the sequence  converges weakly in the space  to the process 
Define the test statistics
      
      Given 
 let
      
      Let 
 be the hypothesis that 
 and let 
 be the alternative that 
 Also, denote by 
 the alternative that 
P is 
-asymmetric.
Theorem 1 and the well-known theorem of Cirel’son on the continuity of the distribution of the sup-norm of Gaussian processes (see [20]) imply the following.
Corollary 1 Suppose all conditions of Theorem 1 hold. Under the hypothesis 
      
      and under the alternative 
      
      In particular, if  characterizes the distribution, then under the alternative , i.e. for a fixed -asymmetric distribution P, 
In most cases, however, the limit distributions of such statistics as 
 depend on the unknown parameters of the distribution 
 Thus, to implement the test one has to evaluate the distribution of the test statistic using, for instance, a bootstrap method. We describe below a version of the conditional bootstrap for 
-symmetry testing. It is a generalization of the bootstrap method proposed in [
7].
Given 
 let 
 denote the 
-symmetric distribution with specifiers 
 It will be called 
the -symmetrization of 
 Denote by 
 the 
-symmetric distribution with specifiers 
 Let 
, …, 
 be an i.i.d. sample from the distribution 
 defined on a probability space 
 One can construct such a sample using the following procedure. Take an i.i.d. sample 
 from 
, which is a resampling from 
. Define
      
      Then conditionally on 
, 
 is an i.i.d. sample from the 
-symmetric distribution 
In particular, for 
 from example 1
      
      where 
 is a Rademacher i.i.d. sample, that is 
 with probability 1/2, 
 independent of 
.
For 
 from example 2 one can take an i.i.d. sample 
 uniformly distributed on 
 and an i.i.d. sample 
 from 
, the empirical distribution based on 
, independent of 
 In other words, 
 is the resampling from the sample 
. Then
      
For 
 from example 3 let 
 be an i.i.d. sample uniformly distributed on 
 independent of 
, then
      
      where 
 is a rotation on the angle 
 about 0.
Finally, for 
 from example 4 consider 
n independent permutations 
 of 
, 
 independent of 
. Then
      
      where 
 is a reflection transformation such that 
 for 
.
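The resampling constructions for examples 1 to 4 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the exact procedure: it assumes that a bootstrap observation is obtained by resampling a scaled residual, applying a randomly drawn element of the group to it, and mapping the result back with the estimated mean and the symmetric square root of the sample covariance. For example 1 the scaling cancels, since a sign change commutes with any linear map. For example 3 the sketch follows the description above and uses rotations only. All function names (`sqrt_spd`, `act_sign`, `act_orthogonal`, `act_polygon`, `act_permutation`, `symmetrized_resample`) are hypothetical.

```python
import numpy as np

def sqrt_spd(mat):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(mat)
    return (v * np.sqrt(w)) @ v.T

def act_sign(z, rng):
    """Example 1: multiply each residual by an independent Rademacher sign."""
    return rng.choice([-1.0, 1.0], size=(len(z), 1)) * z

def act_orthogonal(z, rng):
    """Example 2: keep each norm, draw the direction uniformly on the unit sphere."""
    g = rng.normal(size=z.shape)
    u = g / np.linalg.norm(g, axis=1, keepdims=True)
    return np.linalg.norm(z, axis=1, keepdims=True) * u

def act_polygon(z, rng, k=5):
    """Example 3 (d = 2): rotate each residual by a random multiple of 2*pi/k."""
    ang = 2.0 * np.pi * rng.integers(0, k, size=len(z)) / k
    c, s = np.cos(ang), np.sin(ang)
    return np.column_stack([c * z[:, 0] - s * z[:, 1], s * z[:, 0] + c * z[:, 1]])

def act_permutation(z, rng):
    """Example 4: permute the coordinates of each residual independently."""
    idx = np.argsort(rng.random(z.shape), axis=1)
    return np.take_along_axis(z, idx, axis=1)

def symmetrized_resample(x, act, rng):
    """Draw a bootstrap sample from an (assumed) symmetrization of the empirical law."""
    n = len(x)
    mu = x.mean(axis=0)
    root = sqrt_spd(np.cov(x, rowvar=False))
    z = np.linalg.solve(root, (x - mu).T).T   # scaled residuals
    z_tilde = z[rng.integers(0, n, size=n)]   # resample with replacement
    return mu + act(z_tilde, rng) @ root      # root is symmetric

rng = np.random.default_rng(2)
x = rng.exponential(size=(30, 2))
samples = {a.__name__: symmetrized_resample(x, a, rng)
           for a in (act_sign, act_orthogonal, act_polygon, act_permutation)}
```

Each `act_*` function is an isometry applied row by row, so the norms of the resampled residuals are preserved; only their positions within the orbits of the group are randomized.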
Let 
 denote the empirical measure based on the sample 
 and let
      
      Define 
the bootstrapped scaled residuals as
      
      Let 
 denote the empirical distribution based on the sample 
The bootstrap version of 
 is the process
      
Let 
 denote the set of all functionals 
 such that 
 for all 
 and 
 for all 
. Given two stochastic processes 
 we define the following bounded Lipschitz distance:
      where 
 denotes the outer expectation.
Now we are going to consider a bootstrap version of Theorem 1.
Theorem 2 Suppose that  is a semialgebraic subgraph class, the smoothness conditions (S) hold and  Then the sequence of stochastic processes  converges weakly in the space  to a version  of the process  (defined on the probability space ) in probability  More precisely,
      
      In particular, if P is -symmetric,  converges weakly to a version of the process 
Define the test statistics
      
      Given 
 let
      
      In other words, 
 is a 
-quantile of the distribution of 
 conditional on the sample 
.
Then Theorems 1 and 2 imply the following.
Corollary 2 Suppose all the conditions of Theorems 1 and 2 hold. Under the hypothesis 
      
      and under the alternative 
      
      In particular, if  characterizes the distribution, the bootstrap test is consistent against any asymmetric alternative (subject to the smoothness conditions (S)): under the alternative  
Thus, when the class of functions characterizes the distribution, our method provides tests that are consistent against any -asymmetric alternative.
  4. Detailed Example
In this section we provide an example for which we verify all the assumptions and supply a step-by-step computational algorithm. Let 
 and consider the problem of testing whether 
P is an elliptically contoured measure (Example 2). For a vector 
 let 
 be its polar coordinates. For a fixed integer 
l let
      
     where 
 denotes the linear span of 
 with all the functions bounded by 1, see Example 2.2. This class 
 satisfies the following assumptions.
1. It characterizes the distribution only for 
. For a finite 
l it does not characterize the distribution, since one might find two different distributions 
 such that
      
      and
      
      for all 
.
2. 
 is a semialgebraic subgraph class. Indeed, for 
 the sets
      
      can be represented as unions of a finite number of intersections of polynomial sets of finite degree. For instance, for 
 we have the following representation
      
      The representations for any 
 can be obtained similarly using trigonometric identities. Obviously, similar arguments work for sines, linear combinations of sines and cosines and products of any two functions from 
.
3.  is invariant with respect to all orthogonal transformations, which are rotations on the unit circle. Indeed, for any rotation on an angle  a vector  is transformed into the vector . So for any  we have  or , where both functions belong to the linear span of  and are bounded by 1. So any linear combination of such functions, which is bounded by 1, would also lie in .
4. Condition (S2) holds. Indeed,  is uniformly bounded by 1. For any  and any  we have  for some constant , so that the measure of the set defined in (S2) is zero for .
Also note that the group 
 of all orthogonal transformations of 
 preserves the semialgebraic property. Indeed, for any rotation by an angle 
 the sets
      
      are semialgebraic. The same holds for sines and linear combinations of sines and cosines.
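The semialgebraic representations used in item 2 and in the remark on rotations can be illustrated numerically. The sketch below takes frequency 2 and the level c = 0.3 as hypothetical choices and relies only on the identities cos(2θ)ρ² = x₁² − x₂² and sin(2θ)ρ² = 2x₁x₂: the level sets of cos(2θ), before and after a rotation by an angle φ, coincide with polynomial sets of degree 2.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=(10000, 2))
x1, x2 = x[:, 0], x[:, 1]
theta = np.arctan2(x2, x1)  # polar angle
r2 = x1**2 + x2**2          # squared polar radius
c = 0.3                     # hypothetical level

# cos(2*theta) * rho^2 = x1^2 - x2^2, so the level set is a polynomial set of degree 2.
trig_set = np.cos(2 * theta) >= c
poly_set = x1**2 - x2**2 - c * r2 >= 0

# After a rotation by phi, cos(2*(theta + phi)) * rho^2 is again a quadratic polynomial.
phi = 0.7
rot_trig_set = np.cos(2 * (theta + phi)) >= c
rot_poly_set = (np.cos(2 * phi) * (x1**2 - x2**2)
                - np.sin(2 * phi) * 2 * x1 * x2 - c * r2) >= 0

print((trig_set == poly_set).mean(), (rot_trig_set == rot_poly_set).mean())
```

Higher frequencies can be handled in the same way, since cos(kθ)ρᵏ and sin(kθ)ρᵏ are the real and imaginary parts of (x₁ + i x₂)ᵏ and hence polynomials of degree k.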
We also require the following two conditions on P: (S1) holds and . The first condition is satisfied for absolutely continuous P with the uniformly bounded and continuously differentiable Lebesgue density with the corresponding derivative approaching zero at infinity faster than , . For example, distributions with densities on a finite support and normal distributions satisfy (S1). The last condition is satisfied if .
Given a random sample  from a distribution P let us describe a step-by-step testing algorithm.
1. Obtain . In our example  is the sample mean and  is the square root of the sample covariance for the sample .
2. Calculate residuals .
3. Find the test statistics, which can be simplified as follows
      
      where 
 is the polar coordinate of 
.
4. Choose a number of bootstrap repetitions, say M. In practice we often take a large number, for instance . The next four steps are then repeated M times.
4.1. Generate a sample . In this example, first, generate a sample  from a uniform distribution on the unit circle, independent of . Secondly, resample with replacement from  to obtain . Thirdly, .
4.2. Obtain . In our example  is the sample mean and  is the square root of the sample covariance for the sample .
4.3. Calculate residuals .
4.4. Find the bootstrapped test statistics, which can be simplified as follows
      
      where 
 is the polar coordinate of 
.
5. Based on  find the empirical -quantile of the distribution of , conditional on . Let us denote it as .
6. If  then reject  is an elliptically contoured distribution, at the significance level .
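Steps 1 to 6 above can be sketched in Python for d = 2. This is a minimal sketch under stated assumptions, not a definitive implementation. In particular, the simplified form of the test statistic is assumed here to be the maximum, over the observations ordered by polar radius, of the Euclidean norm of the partial sums of cos(kθ) and sin(kθ), k = 1, …, l, divided by √n; this corresponds to taking the supremum over trigonometric polynomials with coefficient vector in the Euclidean unit ball. The function names and the default values of `l`, `n_boot` and `alpha` are hypothetical.

```python
import numpy as np

def sqrt_spd(mat):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(mat)
    return (v * np.sqrt(w)) @ v.T

def scaled_residuals(x):
    """Steps 1-2: sample mean, root of sample covariance, scaled residuals."""
    mu = x.mean(axis=0)
    root = sqrt_spd(np.cov(x, rowvar=False))
    return np.linalg.solve(root, (x - mu).T).T, mu, root

def harmonic_statistic(z, l):
    """Step 3 (assumed form): sup over radial thresholds of the norm of partial sums."""
    n = len(z)
    rho = np.hypot(z[:, 0], z[:, 1])
    theta = np.arctan2(z[:, 1], z[:, 0])[np.argsort(rho)]  # angles ordered by radius
    k = np.arange(1, l + 1)
    feats = np.hstack([np.cos(np.outer(theta, k)), np.sin(np.outer(theta, k))])
    partial = np.cumsum(feats, axis=0)
    return np.sqrt((partial**2).sum(axis=1)).max() / np.sqrt(n)

def bootstrap_test(x, l=3, n_boot=500, alpha=0.05, seed=0):
    """Steps 1-6: returns the statistic, the bootstrap critical value, the decision."""
    rng = np.random.default_rng(seed)
    n = len(x)
    z, mu, root = scaled_residuals(x)
    t_obs = harmonic_statistic(z, l)
    rho = np.hypot(z[:, 0], z[:, 1])
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        phi = rng.uniform(0.0, 2.0 * np.pi, size=n)  # step 4.1: uniform directions
        r = rho[rng.integers(0, n, size=n)]          # step 4.1: resampled radii
        x_star = mu + np.column_stack([r * np.cos(phi), r * np.sin(phi)]) @ root
        z_star, _, _ = scaled_residuals(x_star)      # steps 4.2-4.3
        t_boot[b] = harmonic_statistic(z_star, l)    # step 4.4
    crit = np.quantile(t_boot, 1.0 - alpha)          # step 5
    return t_obs, crit, bool(t_obs > crit)           # step 6

# Usage on a correlated Gaussian sample, which is elliptically contoured.
demo_rng = np.random.default_rng(4)
x_demo = demo_rng.normal(size=(200, 2)) @ np.array([[1.0, 0.6], [0.0, 0.8]])
t_obs, crit, reject = bootstrap_test(x_demo, n_boot=200)
print(t_obs, crit, reject)
```

Because the assumed statistic depends on the angles only through sums of squares of cosine and sine partial sums at each frequency, it is unchanged by rotations and reflections of the residuals, in line with item 3 above.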
  5. Proofs
We use the ideas and methods of [7]. That technique was developed for ellipsoidal symmetry and needs to be adjusted for group symmetry. Basically, one should change 
 to 
 throughout the proofs. However, there are technical difficulties associated with using transformations 
A instead of 
. These are hidden in the proofs of the lemmas. We give a few details for completeness.
Let 
 denote a subset of all nonsingular linear transformations in 
 Given a transformation 
 denote 
 For a function 
f on 
 let
      
      Given a class 
 of functions on 
, define 
      Now the process 
 is represented as
      
      and the process 
 as
      
      Clearly,
      
      Define
      
      Given a function 
g on 
 we can write
      
      where
      
     A similar computation shows that
      
      Let
      
      Given a class 
 of functions, define
      
We restate the following versions of the lemmas from [7], which describe the smoothness properties of the functions introduced above and the Donsker properties of the classes of functions given above. The convergence of transformations is with respect to the operator norm on the set of all linear transformations. The smoothness conditions are used in the proof of Lemma 1. Properties of Vapnik-Chervonenkis subgraph classes are used in the proof of Lemma 2. See [18] for details on Vapnik-Chervonenkis, Glivenko-Cantelli, and Donsker classes of functions.
Lemma 1 Suppose that P and  satisfy the smoothness conditions (S). Then the following statements hold:
(C2) The function  is differentiable at the point  for any , and the Taylor expansion of the first order
      
      holds uniformly in .
(C3) Similarly, the function  is differentiable at the point  for any , and the Taylor expansion of the first order
      
      holds uniformly in .
(C4) The function  is continuous with respect to A at  uniformly in 
(C5) The function  is differentiable at the point  for any  and, moreover, the Taylor expansion of the first order
      
      holds uniformly in  Moreover, the matrix-valued function  is continuous at  uniformly in 
(C6) if  then for all and 
Lemma 2 For a uniformly bounded semialgebraic subgraph class  the classes   and  are uniformly Donsker and uniformly Glivenko–Cantelli.
 Proof of Theorem 1. Define a process
      
      (C1) and 
 being a 
P-Donsker class by Lemma 2 imply that we can use asymptotic equicontinuity to obtain
      
      for all 
. Clearly,
      
      If 
 we have
      
      Note that
      
      Using (
5) and 
-consistency of 
 we obtain
      
      as 
 uniformly in 
. It follows from (C1) and 
-consistency of 
 that
      
      uniformly in 
. Representations (
6) and (
7), the fact that 
 is a uniformly Donsker class from Lemma 2 and (C1) imply that the sequence 
 converges weakly in the space 
 to the Gaussian stochastic process 
 This implies the first statement of the theorem. If 
P is ellipsoidally symmetric then 
, which concludes the proof of Theorem 1.
 Proof of Theorem 2. Define a process
      
      By Lemma 2, the class 
 is uniformly Glivenko–Cantelli. This together with (C4) and representations (
3), (
4) implies that
      
      uniformly in 
. Similarly, since the class 
 is uniformly Glivenko–Cantelli, by (C5) and representations (
3), (
4), we obtain
      
      uniformly in 
.
 Since 
 is a uniformly Donsker class, we can use Corollary 2.7 in [
21] to prove that a.s. 
 converges weakly in the space 
 to the same limit as 
, where 
 is the empirical measure based on a sample from 
, i.e. to the 
-Brownian bridge 
 Asymptotic equicontinuity and (C1) yield that for all 
 a.s.
      
      Define
      
      Since 
 we can write
      
      If 
 we have
      
      Note that
      
      Using (
8), (
9) and standard asymptotic properties of the estimators 
 we obtain
      
      as 
 uniformly in 
. Here and in what follows the remainder term 
 converges to 0 as 
 uniformly in 
 in probability 
.
Applying the asymptotic equicontinuity condition to the process 
 and using (C1), we obtain
      
      as 
 Now we can write
      
      Note that by (
3) and (
4)
      
      Since, by Lemma 2, 
 is uniformly Donsker class and since (C5) and (C6) hold, it is easy to prove the weak convergence of the processes
      
      in the space 
 where 
 is a ball in 
 with the center 
 Using the asymptotic equicontinuity and (C6), we obtain
      
      It follows from (C3) and standard asymptotic properties of the estimators 
  that
      
      uniformly in 
. Relationships (10)–(14) along with, again, Corollary 2.7 in [
21], imply the statement of the theorem.
  6. Conclusion
We propose and study a general class of tests for group symmetry, which encompasses different types of symmetry, such as ellipsoidal and permutation symmetries. Our approach is based on supremum norms of special empirical processes combined with the bootstrap.
There are several advantages to our methodology. First, the test statistics are indexed by classes of functions that are rich enough and still relatively simple to use. This provides flexibility in choosing a suitable class of functions and hence an appropriate test. Second, when the class of functions characterizes the distribution, these tests are consistent against any asymmetric alternative (subject to the smoothness conditions (S)). Third, they are affine invariant whenever the class of functions is invariant with respect to all orthogonal transformations. Fourth, these are bootstrap tests. This could be considered a drawback, but the bootstrap is a way to deal with the complex nature of the asymptotic null distribution of a non-bootstrap semiparametric test, and the resulting tests have good theoretical properties. Fifth, this approach gathers separate ideas and methods developed for various types of symmetry under one umbrella: it provides a unified theory for studying the statistical properties of seemingly different tests for different types of symmetry.
  7. Appendix
Definition of a semialgebraic set. For any polynomial p on  of degree less than or equal to r we will call  a polynomial set of degree less than or equal to r in . Let  denote the class of all polynomial sets in  of degree less than or equal to r. Then any set from the union  is called a semialgebraic set of degree less than or equal to r and order less than or equal to l,  being the minimal set algebra generated by . Let  denote the class of all semialgebraic sets of degree less than or equal to r and order less than or equal to l in .
A class  of functions on  is a semialgebraic subgraph class if and only if for some  for all functions g from  the set  belongs to  and for all functions  from  the set  belongs to 
Conditions on P and . We also introduce the following smoothness conditions on the distribution P and the class 
(S1) 
P is absolutely continuous with a uniformly bounded and continuously differentiable density 
p such that for some 
        where 
 denotes the derivative of the density 
p.
(S2) The class 
 is uniformly bounded and for all 
 and 
        Here 
 denotes Lebesgue measure in 
 and
        
The classes  characterize the distribution. The classes 
 characterize the distribution in the case of the examples 1.1, 2.1, 2.3, 2.4, 3.1, 4.1 above. Indeed, this is a well-known property of the classes used in examples 1.1, 2.3 and 4.1. As to the Example 2.4, we refer, e.g., to the paper [
22] for similar statements. To prove that this is the case in Example 2.1 (and in Example 3.1, similarly), consider the map 
 Since this map is a Borel isomorphism (even a homeomorphism), it suffices to show that for any two finite measures 
 in 
 the condition
        
        implies 
 We will prove that, in fact, for any two finite measures 
 on 
 the condition
        
        where 
 is the class of all half-spaces in 
 implies that 
 (the previous statement then follows, since one can consider two measures in 
 both supported in 
). The condition
        
        is equivalent to the following one
        
        for all 
 Using a standard approximation of Borel functions by simple functions, we extend this to the equality
        
        that holds for all bounded Borel functions 
 If we set 
 and 
 we obtain that the characteristic functions of 
P and 
Q are equal, which implies that