1. Introduction
This section reviews regression models where the response variable Y depends on the p × 1 vector of predictors x only through the sufficient predictor SP = α + x^T β. Then there are n cases (Y_i, x_i^T)^T for i = 1, ..., n. For the regression models, the conditioning and subscripts, such as i, will often be suppressed. This paper gives a high dimensional test for H_0: β = 0 versus H_1: β ≠ 0, where 0 is the p × 1 vector of zeroes.
A useful multiple linear regression (MLR) model is Y_i = α + x_i^T β + e_i for i = 1, ..., n. Assume that the e_i are independent and identically distributed (iid) with expected value E(e_i) = 0 and variance V(e_i) = σ². In matrix form, this model is

Y = Xβ + e,   (1)

where Y is an n × 1 vector of dependent variables, X is an n × p matrix with ith row x_i^T, β is a p × 1 vector, and e is an n × 1 vector of unknown errors. Also E(e) = 0 and Cov(e) = σ² I_n, where I_n is the n × n identity matrix.
For a multiple linear regression model with heterogeneity, assume model (1) holds with E(e) = 0 and Cov(e) = Σ_e, an n × n positive definite matrix. Under regularity conditions, the ordinary least squares (OLS) estimator β̂_OLS can be shown to be a consistent estimator of β.
For estimation with ordinary least squares, let the covariance matrix of x be Cov(x) = Σ_x = E[(x − E(x))(x − E(x))^T] and the p × 1 vector η = Σ_xY = Cov(x, Y) = E[(x − E(x))(Y − E(Y))]. Let Σ̂_x and Σ̂_xY be the corresponding sample covariance matrices. For a multiple linear regression model with iid cases, β̂_OLS = Σ̂_x^{−1} Σ̂_xY is a consistent estimator of β under mild regularity conditions, while η̂ = Σ̂_xY is a consistent estimator of Σ_xY.
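For example, these estimators can be computed as follows (a minimal R sketch with simulated data; the variable names are ours and are not from the paper's software):

set.seed(1)
n <- 100; p <- 3
x <- matrix(rnorm(n*p), n, p)
y <- x %*% c(1, 2, 0) + rnorm(n)
etahat <- cov(x, y) #Sigma hat_xY, a p by 1 matrix
bols <- solve(cov(x), etahat) #betahat_OLS = Sigma hat_x^{-1} Sigma hat_xY
max(abs(bols - coef(lm(y ~ x))[-1])) #agrees with the lm() slopes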
Ref. [1] showed that the one component partial least squares (OPLS) estimator β̂_OPLS = λ̂ Σ̂_xY estimates β_OPLS = λ Σ_xY, where λ = Σ_xY^T Σ_xY / (Σ_xY^T Σ_x Σ_xY) for Σ_xY ≠ 0. If Σ_xY = 0, then β_OPLS = 0. Also see [2,3,4]. Ref. [5] derived the large sample theory for η̂_OPLS = Σ̂_xY and OPLS under milder regularity conditions than those in the previous literature, where η_OPLS = Σ_xY. Ref. [6] showed that for iid cases (Y_i, x_i^T)^T, these results still hold for multiple linear regression models with heterogeneity.
The marginal maximum likelihood estimator (MMLE or marginal least squares estimator) is due to [7,8]. This estimator computes the marginal regression of Y on x_j, such as Poisson regression, resulting in the estimator β̂_jM for j = 1, ..., p. Then β̂_MMLE = (β̂_1M, ..., β̂_pM)^T. For multiple linear regression, the marginal estimators are the simple linear regression estimators. Hence β̂_MMLE = [diag(Σ̂_x)]^{−1} Σ̂_xY. If the t_j are the predictors that are scaled or standardized to have unit sample variances, then β̂_MMLE(t, Y) = Σ̂_tY, where (t, Y) denotes that Y was regressed on t. Ref. [6] derived large sample theory for the MMLE for multiple linear regression models, including models with heterogeneity.
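To illustrate (a hedged R sketch with simulated data, not code from the paper), for multiple linear regression the MMLE slopes are the marginal simple linear regression slopes, which equal the elements of [diag(Σ̂_x)]^{−1} Σ̂_xY:

set.seed(2)
n <- 100; p <- 4
x <- matrix(rnorm(n*p), n, p)
y <- x[,1] - x[,2] + rnorm(n)
bmmle <- apply(x, 2, function(xj) coef(lm(y ~ xj))[2]) #marginal slopes
max(abs(bmmle - c(cov(x, y))/apply(x, 2, var))) #essentially 0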
For Poisson regression and related models, the response variable Y is a nonnegative count variable. A useful Poisson regression (PR) model is Y | x ∼ Poisson(exp(α + x^T β)). This model has E(Y | x) = V(Y | x) = exp(α + x^T β). The quasi-Poisson regression model has E(Y | x) = exp(α + x^T β) and V(Y | x) = φ exp(α + x^T β), where the dispersion parameter φ > 0. Note that this model and the Poisson regression model have the same conditional mean function, and the conditional variance functions are the same if φ = 1.
Some notation is needed for the negative binomial regression model. If Y has a (generalized) negative binomial distribution, Y ∼ NB(μ, κ), then the probability mass function (pmf) of Y is

f(y) = [Γ(y + κ)/(Γ(κ) y!)] [κ/(μ + κ)]^κ [1 − κ/(μ + κ)]^y

for y = 0, 1, 2, ..., where μ > 0 and κ > 0. Then E(Y) = μ and V(Y) = μ + μ²/κ. The negative binomial regression model states that Y_1, ..., Y_n are independent random variables with Y_i | x_i ∼ NB(exp(α + x_i^T β), κ). This model has E(Y_i | x_i) = exp(α + x_i^T β) and V(Y_i | x_i) = exp(α + x_i^T β)[1 + exp(α + x_i^T β)/κ]. Following Ref. [9] (p. 560), as κ → ∞, it can be shown that the negative binomial regression model converges to the Poisson regression model.
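This convergence is easy to check numerically (our illustration; R's dnbinom with the mu parameterization matches the above pmf with size = κ):

mu <- 4; y <- 0:20
for(kap in c(1, 10, 100, 10000))
  print(max(abs(dnbinom(y, size = kap, mu = mu) - dpois(y, mu))))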
Let the log transformation Z = log(Y) if Y ≥ 1 and Z = 0 if Y = 0. This transformation often results in a linear model with heterogeneity: Z_i = α + x_i^T β + e_i, where the e_i are independent with expected value E(e_i) = 0 and variance V(e_i) = σ_i². For Poisson regression, the minimum chi-square estimator is the weighted least squares estimator from the regression of Z_i on x_i with weights w_i = Y_i. See [9] (pp. 611–612).
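A minimal R sketch of this fit (our illustration; cases with Y_i = 0 get weight 0 here, which is an assumption of this sketch rather than a detail from the paper):

set.seed(3)
n <- 200
x <- rnorm(n)
y <- rpois(n, exp(0.5 + 0.5*x))
z <- ifelse(y >= 1, log(y), 0) #the Z transformation
fit <- lm(z ~ x, weights = y) #weighted least squares with weights Y_i
coef(fit)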
If the regression model for Y depends on x only through α + x^T β, and if the predictors x are iid from a large class of elliptically contoured distributions, then [10,11] showed that, under regularity conditions, β_OLS = cβ for some constant c. Hence Σ_xY = Σ_x β_OLS = c Σ_x β. Thus Σ_xY = cdβ if Σ_x = d I_p, where d > 0 and I_p is the p × p identity matrix. If c ≠ 0 in this case, then Σ_xY = 0 implies that β = 0. The constant c is typically nonzero unless the model has a lot of symmetry about the distribution of x^T β. Simulation with such models can be difficult if the population values of c and d are unknown. Results from [12] (p. 89) suggest that for the Poisson regression model, a rough approximation is β̂_PR ≈ β̂_OLS, where Z is regressed on x. Results from [13] suggest that for binary logistic regression, a rough approximation is β̂_LR ≈ β̂_OLS/MSE, where MSE is the mean square error from the OLS regression.
Ref. [14] has an interesting result for the multiple linear regression model (1). Assume that the cases (Y_i, x_i^T)^T are iid with E(Y) = μ_Y, E(x) = μ_x, and nonsingular Cov(x) = Σ_x. Let w_i = (x_i − μ_x)(Y_i − μ_Y). Then testing H_0: β = 0 versus H_1: β ≠ 0 is equivalent to testing H_0: Σ_xY = 0 versus H_1: Σ_xY ≠ 0 with E(w_i) = Σ_xY, where β = β_OLS, and a one sample test can be applied to the w_i. Ref. [14] notes that there are only a few high dimensional analogs of the low dimensional multiple linear regression F-test for H_0: β = 0 versus H_1: β ≠ 0. See [15,16,17,18]. The assumptions on the predictors in these four papers are very strong.
This paper uses the above test for H_0: Σ_xY = 0, which is equivalent to a test for H_0: β_OLS = 0. The resulting test is not limited to OLS for multiple linear regression with iid errors. As shown below and in the following paragraph, the test can be used for multiple linear regression when heterogeneity is present, and the test can also be used for many regression models that depend on the predictors only through x^T β. Suppose η = D Σ_xY, where D is a p × p positive definite matrix. Then η = 0 if and only if Σ_xY = 0. Then D = λ I_p for OPLS, D = Σ_x^{−1} for OLS, and D = [diag(Σ_x)]^{−1} for the MMLE. The k-component partial least squares estimator can be found by regressing Y on a constant and on η_j^T x for j = 1, ..., k, where η_j = Σ_x^{j−1} Σ_xY for j ≥ 1. See [19]. Hence β_kPLS = 0 if Σ_xY = 0. Thus if the cases are iid, then using the w_i gives tests for H_0: β_OLS = 0, H_0: β_OPLS = 0, H_0: β_MMLE = 0, H_0: β_kPLS = 0, and H_0: Σ_xY = 0. For multiple linear regression with heterogeneity, Σ̂_xY is still a consistent estimator of Σ_xY. Hence the test can be used when the constant variance assumption is violated.
Under iid cases with E(w_i) = Σ_xY, if the response variables depend on the x only through x^T β, then β = 0 implies that the response does not depend on x. Hence the Y_i are iid and do not depend on x, and thus satisfy a multiple linear regression model with β_OLS = 0. For a parametric regression, such as a generalized linear model, assume Y | x ∼ D(m(x^T β), γ), where D is the parametric distribution and m is a real valued function. For example, D could be the negative binomial distribution with μ = exp(α + x^T β) and κ = γ. If β = 0, then the iid Y_i ∼ D(m(0), γ). Typically, if β ≠ 0, then Σ_xY ≠ 0, and the test can have good power. An exception is when there is a lot of symmetry, which rarely occurs with real data. For example, suppose Y = m(x^T β) + e, where the iid errors are independent of the predictors, x ∼ N_p(0, I_p), and the function m is symmetric about 0, e.g., m(t) = t². Then Σ_xY = 0 and β_OLS = 0 even if β ≠ 0.
If μ_x and μ_Y are unknown, then estimate w_i = (x_i − μ_x)(Y_i − μ_Y) by ŵ_i = (x_i − x̄)(Y_i − Ȳ), and E(w_i) = Σ_xY = 0 under H_0. Then apply a high dimensional one sample test on the ŵ_i. Note that the sample mean w̄ = n^{−1} Σ_{i=1}^n ŵ_i = [(n − 1)/n] Σ̂_xY.
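The ŵ_i and the identity for their sample mean can be verified numerically (a small R sketch; the names are ours):

set.seed(4)
n <- 50; p <- 5
x <- matrix(rnorm(n*p), n, p)
y <- rnorm(n)
what <- sweep(x, 2, colMeans(x)) * (y - mean(y)) #ith row is what_i
max(abs(colMeans(what) - (n-1)/n * c(cov(x, y)))) #essentially 0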
Section 2.1 reviews and derives some results for the one sample test that will be used.
Section 2.2 reviews some two sample tests.
Section 2.3 gives theory for the test given in the above paragraph.
2. Materials and Methods
2.1. A High Dimensional One Sample Test
This section reviews and derives some results for the one sample test that will be used. Suppose w_1, ..., w_n are iid random vectors with E(w_i) = μ and covariance matrix Cov(w_i) = Σ_w. Then the test H_0: μ = 0 versus H_1: μ ≠ 0 is equivalent to the test H_0: μ^T μ = 0 versus H_1: μ^T μ > 0. Let θ = μ^T μ. A U-statistic for estimating θ is

T_n = [1/(n(n−1))] Σ_{i≠j} w_i^T w_j = [1/(n(n−1))] [n² w̄^T w̄ − tr(W W^T)],   (7)

where W is the n × p matrix with ith row w_i^T and tr() is the trace function. See, for example, [20]. To see that the last equality holds, note that tr(W W^T) = Σ_{i=1}^n w_i^T w_i. Now Σ_{i=1}^n Σ_{j=1}^n w_i^T w_j = (Σ_i w_i)^T (Σ_j w_j) = n² w̄^T w̄. Thus Σ_{i≠j} w_i^T w_j = n² w̄^T w̄ − Σ_{i=1}^n w_i^T w_i. Thus E(T_n) = E(w_i^T w_j) = E(w_i)^T E(w_j) = μ^T μ = θ for i ≠ j.
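The identity behind the last equality in (7) is easy to check numerically (a minimal R sketch):

set.seed(5)
n <- 20; p <- 10
w <- matrix(rnorm(n*p), n, p)
G <- w %*% t(w) #Gram matrix of the w_i^T w_j
Tbrute <- (sum(G) - sum(diag(G)))/(n*(n-1)) #double sum over i != j
a <- colSums(w)
Tform <- (sum(a^2) - sum(w^2))/(n*(n-1)) #n^2 wbar^T wbar - sum ||w_i||^2
all.equal(Tbrute, Tform)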
Next, we derive a simple test. Let the variance V(w_i^T w_j) = σ_d² for i ≠ j. Let m = floor(n/2) be the integer part of n/2, so floor(100/2) = floor(101/2) = 50. Let the iid random variables D_i = w_{2i−1}^T w_{2i} for i = 1, ..., m. Note that E(D_i) = θ and V(D_i) = σ_d². Let D̄ = m^{−1} Σ_{i=1}^m D_i, and let S_D² be the sample variance of the D_i:

S_D² = [1/(m − 1)] Σ_{i=1}^m (D_i − D̄)².
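In R, the D_i, D̄, and S_D² can be computed as follows (a sketch; we form the m disjoint pairs from consecutive rows):

set.seed(6)
n <- 100; p <- 50
w <- matrix(rnorm(n*p), n, p)
m <- floor(n/2)
odd <- seq(1, 2*m, by = 2); even <- odd + 1
D <- rowSums(w[odd, ] * w[even, ]) #D_i = w_{2i-1}^T w_{2i}
Dbar <- mean(D); SD2 <- var(D) #estimate theta and sigma_d^2
Dbar + c(-1, 1)*qt(0.975, m - 1)*sqrt(SD2/m) #t interval for theta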
The following new theorem follows from the univariate central limit theorem.
Theorem 1. Assume w_1, ..., w_n are iid, E(w_i) = μ, and the variance V(w_i^T w_j) = σ_d² for i ≠ j. Let m, D̄, and S_D² be defined as above. Then
(a) √m (D̄ − θ)/S_D → N(0, 1) in distribution as m → ∞. The following theorem derives the variance V(T_n) under simpler regularity conditions than those in the literature, and the new proof of the theorem is also simpler.
Theorem 2. Assume w_1, ..., w_n are iid, E(w_i) = μ, and the variance V(w_i^T w_j) = σ_d² for i ≠ j. Let U_{ij} = w_i^T w_j for i ≠ j. Let T_n be given by Equation (7), where θ = μ^T μ and Σ_w = Cov(w_i). Then
(a) V(T_n) = [2/(n(n−1))] σ_d² + [4(n−2)/(n(n−1))] μ^T Σ_w μ.
(b) If H_0: μ = 0 is true, then E(T_n) = 0 and V(T_n) = 2σ_d²/(n(n−1)).
Proof. (a) To find the variance V(T_n) with T_n from Equation (7), let S = n(n−1)T_n = Σ_{i≠j} U_{ij}, and note that V(T_n) = V(S)/[n(n−1)]². Then

V(S) = Σ_{i≠j} Σ_{k≠l} Cov(U_{ij}, U_{kl}).   (8)

The covariances are of 3 types. First, if {k, l} = {i, j} with i ≠ j, then Cov(U_{ij}, U_{kl}) = V(U_{ij}) = σ_d² since U_{ij} = U_{ji}. Second, if i, j, k, l are distinct with i ≠ j and k ≠ l, then U_{ij} and U_{kl} are independent with Cov(U_{ij}, U_{kl}) = 0. Third, there are terms where exactly three of the four subscripts are distinct, which have the form Cov(U_{ij}, U_{il}), where i, j, l are distinct, or Cov(U_{ij}, U_{kj}), where i, j, k are distinct. These covariance terms are all equal to the same number μ^T Σ_w μ since, for example,

Cov(w_i^T w_j, w_i^T w_l) = E[(w_j^T w_i)(w_i^T w_l)] − θ² = μ^T E(w_i w_i^T) μ − (μ^T μ)² = μ^T (Σ_w + μμ^T) μ − (μ^T μ)² = μ^T Σ_w μ.

The number of ways to get exactly three distinct subscripts is a − b − c = 4n(n−1)(n−2) since a = [n(n−1)]² is the number of terms on the right hand side of (8), b = n(n−1)(n−2)(n−3) is the number of terms where i, j, k, l are distinct with i ≠ j and k ≠ l, and c = 2n(n−1) is the number of terms where {k, l} = {i, j} with i ≠ j. [Note that n(n−1) terms have i and j distinct. Half of these terms have i < j and half have i > j. Similarly, n(n−1) terms have k, l distinct, and half of the n(n−1) terms have k < l, while half of the n(n−1) terms have k > l.] Thus

V(T_n) = [2n(n−1)σ_d² + 4n(n−1)(n−2) μ^T Σ_w μ]/[n(n−1)]² = [2/(n(n−1))] σ_d² + [4(n−2)/(n(n−1))] μ^T Σ_w μ.

This calculation was adapted from [21] (pp. 336–337). Thus (a) holds.
(b) Now E(T_n) = E(U_{ij}) = μ^T μ = θ, where w_i and w_j are iid for i ≠ j. Hence E(T_n) = 0 under H_0. Under H_0, μ^T Σ_w μ = 0, and thus V(T_n) = 2σ_d²/(n(n−1)) by (a). □
Note that T_n is the sample mean of the n(n−1)/2 distinct, identically distributed U_{ij} = w_i^T w_j for i < j. When μ = 0, Theorem 2 proves that the U_{ij} are uncorrelated. Hence when H_0 is true, T_n satisfies V(T_n) = 2σ_d²/(n(n−1)) (Theorem 2b). Ref. [14] (p. 2024) showed that σ_d² = tr(Σ_w²) + 2μ^T Σ_w μ. Plugging this value into (Theorem 2a) gives the [22] result

V(T_n) = [2/(n(n−1))] tr(Σ_w²) + (4/n) μ^T Σ_w μ.
Note that σ_d² can be consistently estimated as follows. Let m = floor(n/2). Let D_1 = w_1^T w_2, D_2 = w_3^T w_4, D_3 = w_5^T w_6, ..., D_{m−1} = w_{2m−3}^T w_{2m−2}, D_m = w_{2m−1}^T w_{2m}. Then S_D² is the sample variance of the D_i, where i = 1, ..., m. Note that a consistent estimator of V(T_n) under H_0 is 2S_D²/(n(n−1)).
Let σ̂_d² and V̂(T_n) = 2σ̂_d²/(n(n−1)) be consistent estimators of σ_d² and V(T_n), respectively. Then refs. [22,23,24,25], and others proved that under mild regularity conditions when H_0 is true,

Z_n = T_n/√(V̂(T_n)) → N(0, 1)

in distribution. Under regularity conditions when H_0 is false, ref. [25] proved that (T_n − θ)/√(V̂(T_n)) → N(0, 1) in distribution as n → ∞ for fixed p, where θ = μ^T μ.
A consistent estimator of V(T_n) needs a consistent estimator of σ_d². Let U_{ij} = w_i^T w_j. Then one estimator is S_D² from Theorem 1. An estimator nearly the same as the one used by [25] is

σ̂_d2² = [1/(n(n−1))] Σ_{i≠j} (U_{ij} − T_n)².

Note that σ_d can be proportional to p since w_i^T w_j is a sum of p random variables, and σ_d is the standard deviation of such a sum. Thus to have good asymptotic power against all alternatives, we likely need n/√p → ∞ as n, p → ∞. When μ ≠ 0, the test using T_n tends to have more power than the test using D̄ since V(T_n) ≈ 2σ_d²/n² while V(D̄) = σ_d²/m ≈ 2σ_d²/n. Suppose μ = δ1, where the constant δ ≠ 0 and 1 is the p × 1 vector of ones. Then θ = μ^T μ = pδ², and the test using T_n may have good power for small |δ| when p is large or for fixed p as n → ∞.
For computing V̂(T_n), a question is whether to use an estimator of σ_d² = tr(Σ_w²) + 2μ^T Σ_w μ or of tr(Σ_w²). Let the (i, j)th element of Σ_w be σ_ij with σ_ii = σ_i². Let ‖Σ_w‖_F = √(Σ_i Σ_j σ_ij²) be the Frobenius norm of Σ_w, and let ‖a‖₂ be the Euclidean norm of vector a. Let vec(Σ_w) be the vector formed by stacking the columns of Σ_w into a vector. Then tr(Σ_w²) = ‖Σ_w‖_F² = ‖vec(Σ_w)‖₂². There is a level-power tradeoff. Using σ̂_d² is good for controlling the level = P(type I error) when H_0 is true. Since σ_d² ≥ tr(Σ_w²), the parameter tr(Σ_w²) can be much smaller than σ_d², and using a good estimator of tr(Σ_w²) may result in better power.
In high dimensions, it is often very difficult to estimate a p × 1 vector μ when p > n. This result is a form of "the curse of dimensionality." If a consistent estimator μ̂ of μ is available, then the squared norm ‖μ̂ − μ‖₂² is a sum of p squared errors, which tends to be large if p is large. Hence estimators of tr(Σ_w²) that use many parameters, such as plug in estimators tr(Σ̂_w²), are likely to be poor. The two parameter estimator σ̂_d² − 2w̄^T Σ̂_w w̄ likely has more variability than σ̂_d² when H_0 is true, and better estimators of tr(Σ_w²) are needed. In simulations, this estimator was often negative. Let the modified estimator equal σ̂_d² − 2w̄^T Σ̂_w w̄ if this quantity is positive, and σ̂_d², otherwise. In limited simulations, the modified estimator did about as well as σ̂_d². Obtaining an estimator that clearly outperforms σ̂_d² would improve the omnibus test, but is beyond the scope of this paper.
We also considered replacing the w_i by the spatial sign vectors u_i = s(w_i), where the spatial sign function s(w) = w/‖w‖₂ if w ≠ 0, and s(w) = 0 otherwise. This function projects the nonzero w_i onto the unit p-dimensional hypersphere centered at 0. Let T_n(u) denote the statistic T_n computed from an iid sample u_1, ..., u_n. Since the u_i are iid if the w_i are iid, use T_n(u) to test H_0: μ_u = 0 versus H_1: μ_u ≠ 0, where μ_u = E(u_i). In general, μ = 0 does not imply μ_u = 0, but μ_u = 0 can occur if the w_i have a lot of symmetry about 0. In particular, μ_u = 0 if the w_i are iid from an elliptically contoured distribution with μ = 0. The test based on the statistic T_n(u) can be useful if the first or second moments of the w_i do not exist, for example if the w_i are iid from a multivariate Cauchy distribution. These results may be useful for understanding papers such as [26].
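A sketch of the spatial sign transformation in R (our code; zero vectors are left at 0):

spatialsign <- function(w){ #rows of w are the w_i^T
  d <- sqrt(rowSums(w^2))
  d[d == 0] <- 1 #so zero rows map to 0, matching s(0) = 0
  w/d #divides the ith row by ||w_i||
}
u <- spatialsign(matrix(rnorm(200), 20, 10))
range(rowSums(u^2)) #each nonzero u_i has unit length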
The nonparametric bootstrap draws a bootstrap data set w_1*, ..., w_n* with replacement from the w_1, ..., w_n and computes T_n* by applying T_n to the bootstrap data set. This process is repeated B times to get a bootstrap sample T_1*, ..., T_B*. For the statistic T_n, the nonparametric bootstrap fails in high dimensions because terms like w_i^T w_i need to be avoided, and the nonparametric bootstrap has replicates: the proportion of cases in the bootstrap sample that are not replicates is about 0.632. The m out of n bootstrap draws a sample of size m without replacement from the n cases. Using m = floor(n/2) worked well in simulations. Sampling without replacement is also known as subsampling and the delete-d jackknife.
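A sketch of the m out of n bootstrap for T_n (our illustration; the statistic and m = floor(n/2) follow the text, while the percentile interval is one simple choice rather than the shorth interval used in the simulations):

Tstat <- function(w){
  n <- nrow(w); a <- colSums(w)
  (sum(a^2) - sum(w^2))/(n*(n-1))
}
set.seed(7)
n <- 100; p <- 50
w <- matrix(rnorm(n*p), n, p)
m <- floor(n/2); B <- 200
Tb <- replicate(B, Tstat(w[sample(n, m), ])) #without replacement
quantile(Tb, c(0.025, 0.975)) #reject H0: mu = 0 if 0 is outside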
2.2. Three High Dimensional Two Sample Tests
If the cases (x_i^T, y_i^T)^T come in correlated pairs, a high dimensional analog of the paired t test applies the one sample test on the w_i = x_i − y_i.
Now suppose there are two independent random samples x_1, ..., x_{n_1} and y_1, ..., y_{n_2} from two populations or groups, and that it is desired to test H_0: μ_1 = μ_2 versus H_1: μ_1 ≠ μ_2, where the μ_j are p × 1 vectors. Let n = min(n_1, n_2). Let Σ̂_j be the sample covariance matrix of sample j, and let μ̂_j be the sample mean of sample j for j = 1, 2.
A simple test takes n = min(n_1, n_2) and w_i = x_i − y_i for i = 1, ..., n. Then apply the one sample test from Theorem 2 to the w_i. This paired test might work well in high dimensions because of the superior power of the Theorem 2 test, but in low dimensions, it is known that there are better tests.
Let n = min(n_1, n_2), and let v_i = x_i − y_i for i = 1, ..., n, so that each x_i is paired with the y_i that has the same subscript. Then E(v_i) = μ_1 − μ_2. Note that v_i = w_i from the paired test if n_1 = n_2. Ref. [27] (pp. 177–178) proved that E(v_i) = μ_1 − μ_2, that v_i and v_j are uncorrelated for i ≠ j, that Cov(v_i) = Σ_1 + Σ_2, and that V(v_i^T v_j) is the same for all i ≠ j. Ref. [25] showed that T_n(v)/√(V̂(T_n(v))) → N(0, 1) in distribution under H_0, where the v denotes that the one sample test was computed using the v_i.
Note that μ_1 = μ_2 holds if and only if ‖μ_1 − μ_2‖₂² = μ_1^T μ_1 + μ_2^T μ_2 − 2μ_1^T μ_2 = 0. These terms can be estimated by T_L = T_1 + T_2 − 2μ̂_1^T μ̂_2, where T_1 and T_2 are the one sample test statistic applied to samples 1 and 2 and μ̂_1^T μ̂_2 = x̄^T ȳ.
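A sketch of this estimator in R (our code; T_1 and T_2 reuse the one sample statistic):

Tstat <- function(w){
  n <- nrow(w); a <- colSums(w)
  (sum(a^2) - sum(w^2))/(n*(n-1))
}
set.seed(8)
n1 <- 60; n2 <- 80; p <- 100
x <- matrix(rnorm(n1*p), n1, p)
y <- matrix(rnorm(n2*p), n2, p)
TL <- Tstat(x) + Tstat(y) - 2*sum(colMeans(x)*colMeans(y))
TL #estimates ||mu_1 - mu_2||^2, near 0 here since mu_1 = mu_2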
Let

T_CQ = [1/(n_1(n_1−1))] Σ_{i≠j} x_i^T x_j + [1/(n_2(n_2−1))] Σ_{i≠j} y_i^T y_j − [2/(n_1 n_2)] Σ_{i=1}^{n_1} Σ_{j=1}^{n_2} x_i^T y_j,

and note that E(T_CQ) = μ_1^T μ_1 + μ_2^T μ_2 − 2μ_1^T μ_2 = ‖μ_1 − μ_2‖₂². Let θ = ‖μ_1 − μ_2‖₂². Let σ_CQ² be the variance of T_CQ when H_0 is true. Assume σ̂_CQ² is a consistent estimator of σ_CQ². Under H_0 and additional regularity conditions, ref. [22] showed that T_CQ/σ̂_CQ → N(0, 1) in distribution and that

σ_CQ² = 2 tr(Σ_1²)/(n_1(n_1−1)) + 2 tr(Σ_2²)/(n_2(n_2−1)) + 4 tr(Σ_1 Σ_2)/(n_1 n_2).

Let σ_d1² = V(x_i^T x_j), where i ≠ j; σ_d2² = V(y_i^T y_j), where i ≠ j; and σ_d3² = V(x_i^T y_j), where the x_i and y_j are independent.
Ref. [22] showed that this variance formula holds under H_0. Ref. [28], using arguments similar to Theorem 2, showed that σ_d1² = tr(Σ_1²) + 2μ_1^T Σ_1 μ_1, σ_d2² = tr(Σ_2²) + 2μ_2^T Σ_2 μ_2, and σ_d3² = tr(Σ_1 Σ_2) + μ_1^T Σ_2 μ_1 + μ_2^T Σ_1 μ_2. Thus tr(Σ_1²) = σ_d1² − 2μ_1^T Σ_1 μ_1 and tr(Σ_1 Σ_2) = σ_d3² − μ_1^T Σ_2 μ_1 − μ_2^T Σ_1 μ_2. Hence

V(T_CQ) = 2σ_d1²/(n_1(n_1−1)) + 2σ_d2²/(n_2(n_2−1)) + 4σ_d3²/(n_1 n_2)

if μ_1 = μ_2 = 0. If μ_1 = μ_2 = 0, then the σ_di² equal the corresponding trace terms, and the formula with the σ_di² worked well in simulations. Note that σ_d1², σ_d2², and the σ_d3² can be estimated as in Section 2.1. Let n = min(n_1, n_2), and let D_i = x_i^T y_i for i = 1, ..., n. Let S_D3² be the sample variance of the D_i. Another estimator of σ_d3² is

[1/(n_1 n_2)] Σ_{i=1}^{n_1} Σ_{j=1}^{n_2} (x_i^T y_j − x̄^T ȳ)².
2.3. Theory for Testing H_0: Σ_xY = 0
Consider tests of the form H_0: Aη = 0 versus H_1: Aη ≠ 0, where η = Σ_xY. The omnibus test uses A = I_p and tests H_0: η = 0 versus H_1: η ≠ 0.
Let ŵ_i = (x_i − x̄)(Y_i − Ȳ) and w_i = (x_i − μ_x)(Y_i − μ_Y) for i = 1, ..., n. Then T_n(ŵ)/√(V̂(T_n(ŵ))) → N(0, 1) in distribution under mild regularity conditions by Section 2.1, where ŵ indicates that the test was applied to the ŵ_i. Ref. [14] showed that replacing the w_i by the ŵ_i does not change the large sample theory, and used the ŵ_i for multiple linear regression in their simulations.
Let x = (x_I^T, x_O^T)^T, β = (β_I^T, β_O^T)^T, and ŵ_iO = (x_iO − x̄_O)(Y_i − Ȳ). Then testing H_0: β_O = 0 uses the one sample test on the ŵ_iO. This test is equivalent to testing H_0: Σ_{x_O Y} = 0 and H_0: β_{O,OPLS} = β_{O,MMLE} = 0. Note that data splitting could be used to select O. For multiple linear regression and the MMLE and OPLS estimators, these tests are high dimensional analogs of the OLS partial F tests for testing whether a reduced model is good. If H_0: β_O = 0 holds, then I corresponds to the predictors in the reduced model while O corresponds to the predictors out of the reduced model.
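A sketch of this test in R (our code, mirroring the hdomni function of Section 4; the index set O here is hypothetical, chosen for illustration):

set.seed(9)
n <- 100; p <- 20; O <- 11:20 #hypothetical set of predictors out of the model
x <- matrix(rnorm(n*p), n, p)
y <- x[,1] + rnorm(n) #only predictor 1 matters, so H0: beta_O = 0 holds
what <- sweep(x, 2, colMeans(x))*(y - mean(y))
wO <- what[, O]; k <- n*(n-1)
a <- colSums(wO)
Tn <- (sum(a^2) - sum(wO^2))/k
ss <- (wO %*% t(wO) - Tn)^2
sd2 <- (sum(ss) - sum(diag(ss)))/k #sigma hat_d2^2
2*pnorm(-abs(Tn/sqrt(2*sd2/k))) #two tail p-value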
In low dimensions, important tests for regression include (a) H_0: β_j = 0 (the Wald tests for MLR), (b) H_0: β = 0 (the Anova F test for MLR), and (c) H_0: β_O = 0 (the partial F test for MLR). The above paragraph shows how to do these high dimensional tests for the multiple linear regression OPLS and MMLE estimators, with or without heterogeneity. Data splitting is not needed if O is known. Note that (a) corresponds to testing H_0: Cov(x_j, Y) = 0 while (c) corresponds to testing H_0: Σ_{x_O Y} = 0.
The next subsection reviews competitors for the above tests when k is small compared to n.
2.4. Theory for Certain A
This subsection reviews some large sample theory for η̂ = Σ̂_xY and OPLS for the multiple linear regression model, including some high dimensional tests for low dimensional quantities such as H_0: Aη = 0 or H_0: Aβ_OPLS = 0. These tests depend on iid cases, but not on linearity or the constant variance assumption. Hence the tests are useful for multiple linear regression with heterogeneity.
The following [5] theorem gives the large sample theory for η̂ = Σ̂_xY. Ref. [6] gave alternative proofs. This theory needs Σ_w = Cov(w) to exist for η̂ to be a consistent estimator of η = Σ_xY. Let μ_x = E(x) and μ_Y = E(Y), and let Σ_x and Σ_xY be defined as in Section 1, where w_i = (x_i − μ_x)(Y_i − μ_Y). Then low order moments are needed for Σ̂_w to be a consistent estimator of Σ_w.
Theorem 3. Assume the cases (x_i^T, Y_i)^T are iid. Assume E(x_ij^k Y_i^m) exist for j = 1, ..., p and k, m = 0, 1, 2. Let μ_x = E(x) and η = Σ_xY. Let w_i = (x_i − μ_x)(Y_i − μ_Y) with sample mean w̄_n. Let Σ_w = Cov(w_i). Then
(a) √n (w̄_n − η) → N_p(0, Σ_w) in distribution.
(b) Let η̂ = Σ̂_xY. Then √n (η̂ − η) → N_p(0, Σ_w) in distribution. Hence √n (Σ̂_xY − Σ_xY) → N_p(0, Σ_w) in distribution.
(c) Let A be a k × p full rank constant matrix with k ≤ p, assume H_0: Aη = 0 is true, and assume Σ̂_w → Σ_w in probability. Then

n (Aη̂)^T (AΣ̂_w A^T)^{−1} (Aη̂) → χ_k² in distribution.

For the following theorem, consider a subset of k distinct elements from Σ̂_x or from η̂ = Σ̂_xY. Stack the elements into a vector, and let each vector have the same ordering. For example, the largest subset of distinct elements of Σ̂_x corresponds to vech(Σ̂_x). For random variables X and Z, use notation such as X̄_n for the sample mean of the X_i, σ̂_X² for the sample variance, and σ̂_XZ for the sample covariance. For general vectors of elements, the ordering of the vectors will all be the same and be denoted by vectors such as c, ĉ_n, s_i, and s̄_n. Let s̄_n be the sample mean of the s_i. Assuming that Σ_s = Cov(s_i) exists, then √n (s̄_n − E(s_i)) → N_k(0, Σ_s) in distribution.
The following [6] theorem provides large sample theory for Σ̂_x and Σ̂_xY. We use the s_i to avoid confusion with the w_i used in Theorem 3. Note that the Y_i are dummy variables and could be replaced by Y_i1, ..., Y_im to get information about m response variables Y_1, ..., Y_m. Testing H_0: Σ_{xY_1} = ⋯ = Σ_{xY_m} = 0 could likely be done by applying the one sample test to the vectors (ŵ_i1^T, ..., ŵ_im^T)^T, assuming m is small and iid cases.
Theorem 4. Assume the cases (x_i^T, Y_i)^T are iid and that Σ_s = Cov(s_i) exists. Using the above notation with s_i a k × 1 vector,
(i) √n (s̄_n − E(s_i)) → N_k(0, Σ_s) in distribution.
(ii) √n (ĉ_n − c) → N_k(0, Σ_s) in distribution.
(iii) Σ̂_x → Σ_x and Σ̂_xY → Σ_xY in probability.
2.5. Testing H_0: Aη = 0
As noted by [5], the following simple testing method reduces a possibly high dimensional problem to a low dimensional problem. Testing H_0: Aη = 0 versus H_1: Aη ≠ 0 is equivalent to testing H_0: Aβ_OPLS = 0 versus H_1: Aβ_OPLS ≠ 0, where A is a k × p constant matrix. Let Σ_w be the asymptotic covariance matrix of √n (η̂ − η). In high dimensions where n < 5p, we can't get a good nonsingular estimator of Σ_w, but we can get good nonsingular estimators of AΣ_wA^T with n ≥ Jk, where J ≥ 10 and k is small. Here u denotes the predictors that are in the hypothesis test. (Values of J much larger than 10 may be needed if some of the k predictors and/or Y are skewed.) Simply apply Theorem 3 to the predictors u used in the hypothesis test, and thus use the sample covariance matrix of the vectors ŵ_i(u) = (u_i − ū)(Y_i − Ȳ). Hence we can test hypotheses like H_0: Cov(x_1, Y) = Cov(x_2, Y). In particular, testing H_0: η_O = 0 is equivalent to testing H_0: β_{O,OPLS} = 0, where η_O corresponds to the predictors x_O.
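A sketch of the resulting chi-square test in R (our illustration of Theorem 3(c) under the assumptions stated there; here H_0: Aη = 0 holds since Y is generated independently of x):

set.seed(10)
n <- 200; p <- 5
x <- matrix(rnorm(n*p), n, p)
y <- rnorm(n) #independent of x, so eta = Sigma_xY = 0
what <- sweep(x, 2, colMeans(x))*(y - mean(y))
etahat <- colMeans(what)
A <- diag(p)[1:2, ] #test H0: A eta = 0 with k = 2
Sw <- cov(what)
stat <- drop(n*t(A %*% etahat) %*% solve(A %*% Sw %*% t(A), A %*% etahat))
pchisq(stat, df = 2, lower.tail = FALSE) #approximate chi-square(k) p-value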
2.6. High Dimensional Outlier Detection
High dimensional outlier detection is important. This subsection follows [29] closely. See [29,30] for examples and simulations. Let W be a data matrix, where the rows w_i^T correspond to cases. For example, W = X. One of the simplest outlier detection methods uses the Euclidean distances D_i of the w_i from the coordinatewise median MED(W). Concentration type steps compute the weighted median MED_j(W): the coordinatewise median computed from the "half set" of cases w_i with D_i² ≤ MED(D_i²(MED_{j−1}(W))), where MED_0(W) = MED(W). We often used j = 0 (no concentration type steps) or j = 9. Let D_i = D_i(MED_j(W)). Let W_i = 1 if D_i ≤ MED(D_1, ..., D_n) + k MAD(D_1, ..., D_n), where k ≥ 0 and k = 5 is the default choice. Let W_i = 0, otherwise. Using k ≥ 0 insures that at least half of the cases get weight 1. This weighting corresponds to the weighting that would be used in a one sided metrically trimmed mean (Huber type skipped mean) of the distances. Here, the sample median absolute deviation is MAD(D_1, ..., D_n) = MED(|D_i − MED(D_1, ..., D_n)|, i = 1, ..., n), where MED(D_1, ..., D_n) is the sample median of D_1, ..., D_n.
Let the covmb2 set B of at least n/2 cases correspond to the cases with weight W_i = 1. Then the covmb2 estimator (T, C) is the sample mean and sample covariance matrix applied to the cases in set B. Hence

T = Σ_{i=1}^n W_i w_i / Σ_{i=1}^n W_i and C = Σ_{i=1}^n W_i (w_i − T)(w_i − T)^T / (Σ_{i=1}^n W_i − 1).

This estimator was built for speed, applications, and outlier resistance.
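A sketch of the weighting behind covmb2 in R (our code, with j = 0 concentration steps and the default k = 5; mad(..., constant = 1) gives the unscaled MAD defined above):

set.seed(11)
n <- 100; p <- 10
w <- matrix(rnorm(n*p), n, p)
w[1:5, ] <- w[1:5, ] + 10 #plant 5 outliers
med <- apply(w, 2, median) #coordinatewise median
d <- sqrt(rowSums(sweep(w, 2, med)^2)) #Euclidean distances from med
keep <- d <= median(d) + 5*mad(d, constant = 1) #cases with weight W_i = 1
which(!keep) #flags the planted outliers
Tb <- colMeans(w[keep, ]); Cb <- cov(w[keep, ]) #covmb2-type estimates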
Another method to get an outlier resistant estimator of Σ̂_xY is to use the following identity. If X and Y are random variables, then

Cov(X, Y) = [Var(X + Y) − Var(X − Y)]/4.

Then replace Var(X ± Y) by [σ̂(X ± Y)]², where σ̂ is a robust estimator of scale or standard deviation, such as σ̂ = 1.483 MAD. We used σ̂ = 1.483 MAD, where MAD is the sample median absolute deviation, so that σ̂ estimates the standard deviation at the normal distribution. Hence the jth element of the robust estimator of Σ_xY is [σ̂(x_j + Y)² − σ̂(x_j − Y)²]/4.
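A sketch of this identity and its robust version in R (our code; mad() with its default scaling constant 1.4826 is used as the robust scale estimator, one choice among several):

set.seed(12)
x <- rnorm(200); y <- 0.5*x + rnorm(200)
(var(x + y) - var(x - y))/4 #equals cov(x, y) exactly
cov(x, y)
(mad(x + y)^2 - mad(x - y)^2)/4 #robust analog of cov(x, y)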
The function ddplot5 plots the Euclidean distances from the coordinatewise median versus the Euclidean distances from the covmb2 location estimator. Typically the plotted points in this DD plot cluster about the identity line, and outliers appear in the upper right corner of the plot with a gap between the bulk of the data and the outliers.
The function rcovxy makes the classical and three robust estimators of η̂ = Σ̂_xY, and makes a scatterplot matrix of the four estimated sufficient predictors and Y. Only two robust estimators are made if p > n.
3. Results
Example 1. The [31] data was collected from districts in Prussia in 1843. Let Y = the number of women married to civilians in the district, with a constant and predictors x_1 = the population of the district in 1843, x_2 = the number of married civilian men in the district, x_3 = the number of married men in the military in the district, and x_4 = the number of women married to husbands in the military in the district. Sometimes the person conducting the survey would not count a spouse if the spouse was not at home. Hence Y and x_2 are highly correlated but not equal. Similarly, x_3 and x_4 are highly correlated but not equal. We expect Y ≈ x_2, so Σ_xY ≠ 0. Let Z_O denote the omnibus test statistic applied to the ŵ_i. Then Z_O was very large, and the hypotheses H_0: Σ_xY = 0, H_0: β_OLS = 0, and H_0: β_MMLE = 0 are all rejected. The classical F-test also rejects H_0 with p-value = 0.

Example 2. The [32] pottery data has pottery shards of Roman earthware produced between second century B.C. and fourth century A.D. Often the pottery was stamped by the manufacturer. A chemical analysis was done on the shards, with the chemical compositions as the variables; the types of pottery were 1-Arretine, 2-not-Arretine, 3-North Italian, 4-Central Italian, and 5-questionable origin. Let the binary response variable Y = 1 for type 1 and Y = 0 for types 2–5. The omnibus test had Z_O ≈ 2.15 for a two sided p-value of 0.0319 and the more correct right tailed p-value of 0.016. The chi-square logistic regression test for H_0: β = 0 had p-value = 0.0002, but the GLM did not converge.

3.1. One Sample Tests
In the simulations, we examined five one sample tests. The first "test" used the m out of n bootstrap to compute a confidence interval for θ with m = floor(n/2). We used the shorth bootstrap confidence interval described in [30] (ch. 2). This "test" has not been proven to have level 0.05. The second test computed the usual t confidence interval D̄ ± t_{m−1, 0.975} S_D/√m for θ based on the D_i from Theorem 1. The third and fourth tests used Theorem 2(b) and T_n/√(V̂(T_n)) → N(0, 1) in distribution if σ̂_d² is a consistent estimator of σ_d² when H_0 is true. The third test used σ̂_d2², while the fourth test used S_D² based on Theorem 1. These two tests computed intervals ("confidence intervals for 0")

[T_n − t_{m−1, 0.975} √(V̂(T_n)), T_n + t_{m−1, 0.975} √(V̂(T_n))].

The tests 2–4 use the same cutoff t_{m−1, 0.975} so that the average interval lengths are more comparable. The fifth test used the Theorem 2 test applied to the spatial sign vectors u_i with σ̂_d2²(u).
The simulation used four distribution types for w = μ + Az with μ = δ1, where 1 is the p × 1 vector of ones. Type 1 used z ∼ N_p(0, I_p), type 2 used a mixture distribution z ∼ 0.6 N_p(0, I_p) + 0.4 N_p(0, 25 I_p), type 3 used z for a multivariate t distribution, and type 4 used z for a multivariate lognormal distribution, where z_j = exp(v_j) − E(exp(v_j)) with the v_j iid N(0, 1), so that E(z) = 0. The covariance matrix type depended on the matrix A. Type 1 used A = I_p, type 2 used A = diag(1, √2, ..., √p), and type 3 used A = (a_ij) with a_ii = 1 and a_ij = ψ for i ≠ j, giving cor(w_i, w_j) = ρ for i ≠ j, where ρ = 0 if ψ = 0, ρ → 1/(1 + c) as p → ∞ if ψ = 1/√(cp) where c > 0, and ρ → 1 as p → ∞ if ψ ≠ 0 is a constant. We used values of δ ≥ 0 chosen so that at least one test had good power. The simulation used 5000 runs, the 4 w distributions, and the 3 matrices A. For the third A, we used ψ = 1/√p.
Table 1 and Table 2 summarize some simulation results. There are two lines for each simulation scenario. The first line gives the simulated power = proportion of times H_0 was rejected. The second line gives the average length of the confidence interval for 0, where H_0 is rejected if 0 is not in the confidence interval. When δ = 0, observed coverage between 0.04 and 0.06 suggests coverage = power = level is close to the nominal value 0.05. For larger δ, we want the coverage near 1 for good power. See [28] for more simulations.
The bootstrap test corresponds to the boot column, and the tests using S_D², σ̂_d2², and the modified estimator of Section 2.1 correspond to the next three columns. The last column corresponds to the spatial sign test. This test tends to have much shorter lengths because of the transformation of the data. The test using S_D² has simple large sample theory, but low power compared to the other methods. This test's length is approximately √(n − 1) times the length of that corresponding to σ̂_d2², where n is the sample size used in the tables. The bootstrap test was sometimes conservative with observed coverage less than 0.04 when δ = 0. For xtype = 4 and δ = 0, H_0: μ_u = 0 was not true for the spatial test. Hence the coverage for the spatial test was sometimes higher than 0.06 for this scenario. For δ = 0, the test with S_D² sometimes had coverage less than 0.04, while the test with σ̂_d2² sometimes had coverage greater than 0.06. In the simulations, the spatial test often performed well, but typically μ_u ≠ cμ for a constant c, which makes the spatial test harder to use. For testing H_0: μ = 0, the test with σ̂_d2² appeared to perform better than the three competitors.
3.2. Two Sample Tests
In the simulations, we examined three two sample tests. The first "test" used the m out of n bootstrap, where m_i = floor(n_i/2) for i = 1, 2, to bootstrap the [22] test that estimates ‖μ_1 − μ_2‖₂². The second test was the "paired test" with n = min(n_1, n_2) and w_i = x_i − y_i for i = 1, ..., n. Then apply the one sample test from Theorem 2 to the w_i. The third test was the [25] Li test. Both of these tests used σ̂_d2² applied to the w_i or to the two samples.
The simulation used four distribution types, where x = μ_1 + Az and y = μ_2 + Az with z as in Section 3.1, so that x and y had the same distribution apart from the means, with μ_1 = 0 and μ_2 = δ1. Type 1 used z ∼ N_p(0, I_p), type 2 used a mixture distribution z ∼ 0.6 N_p(0, I_p) + 0.4 N_p(0, 25 I_p), type 3 used z for a multivariate t distribution, and type 4 used z for a multivariate lognormal distribution, where z_j = exp(v_j) − E(exp(v_j)) with the v_j iid N(0, 1). The covariance matrix type depended on the matrix A. For the covariance types, A = I_p for covtyp = 1, A = (a_ij) with a_ii = 1 and a_ij = ψ for i ≠ j for covtyp = 2, and A = diag(1, 2, ..., p) for covtyp = 3.
Table 3 shows some results. Two lines were used for each simulation scenario, with coverages on the first line and lengths on the second line. When n_1 = n_2, the paired test and Li test gave the same results. When n_1/n_2 was not near 1, the Li test had better power and shorter length. Increasing δ could greatly increase the length for the bootstrap test, but the coverage would be 1. Improving the one sample test would improve the Li test, but the Li test performed well in simulations.
3.3. Theorem 3 Tests
We illustrate Theorem 3 and Section 2.5 for Poisson regression and negative binomial regression. This simulation is similar to that done by [6] for multiple linear regression with and without heterogeneity. Let x be the p × 1 vector of nontrivial predictors. Let β_j = 1 for j = 1, ..., k and β_j = 0 for j = k + 1, ..., p. Hence SP = α + x^T β and β = (1, ..., 1, 0, ..., 0)^T with k ones and p − k zeros. Here β is the Poisson regression parameter vector β_PR or the negative binomial regression parameter vector β_NBR. Let Z = log(Y) if Y ≥ 1 and Z = 0 if Y = 0. Then a multiple linear regression model with heterogeneity is Z_i = α_Z + x_i^T β_Z + e_i, where the e_i are independent with expected value E(e_i) = 0 and variance V(e_i) = σ_i². Since the cases (Z_i, x_i^T)^T are iid, the OLS estimator β̂_Z is a consistent estimator of β_Z = cβ because the x are multivariate normal, and hence elliptically contoured. Thus β_Z = cβ with the first k values equal to c and p − k zeros.
Let η = Σ_xZ = (η_1, ..., η_p)^T. Then the Theorem 3 large sample 95% confidence interval (CI)

η̂_j ± t_{n−1, 0.975} SE(η̂_j)

could be computed for each j = 1, ..., p. If 0 is not in the confidence interval, then H_0: η_j = 0 and H_0: β_{E,j} = 0 are both rejected for estimators E = OPLS and MMLE for the multiple linear regression model with Z. In the simulations with the smaller values of n, the maximum observed undercoverage was small. Hence the program has the option to replace the cutoff t_{n−1, 0.975} by a slightly larger cutoff, where the inflation decreases as n increases, and the cutoff t_{n−1, 0.975} is used if n is large. This correction factor was used in the simulations for the nominal 95% CIs, where the correction factor uses a cutoff that is between t_{n−1, 0.975} and the cutoff t_{n−1, 0.9875} that would be used for a 97.5% CI. The nominal coverage was 0.95. Observed coverage between 0.94 and 0.96 suggests coverage is close to the nominal value. Ref. [33] noted that weighted least squares tests tend to reject H_0 too often (liberal tests with undercoverage).
To summarize the confidence intervals, the average length of the confidence intervals over 5000 runs was computed. Then the minimum, mean, and maximum of the average lengths were computed. The proportion of times each confidence interval contained zero was computed. These proportions were the observed coverages of the confidence intervals. Then the minimum observed coverage was found. The percentages of the observed coverages that were ≥ 0.9, 0.92, 0.93, 0.94, and 0.96 were also recorded. The test H_0: η_O = 0 was also done where H_0 was true. The coverage of the test was recorded, and a correction factor was not used. Negative binomial regression and Poisson regression were used, where κ = ∞ indicates that Poisson regression was used.
Table 4 illustrates Theorem 3(a), where Table 4 replaces Y with Z. For Table 4, confidence intervals were made for η_j for j = 1, ..., p, and the coverage was the proportion of the 5000 CIs that contained 0. Here η_1 ≠ 0, but η_j = 0 for j ≥ 2. The first two lines of Table 4 correspond to Poisson regression. The confidence interval for η_1 never contained 0, hence the minimum coverage was 0 with observed power 1. The proportion of CIs that had coverage ≥ 0.9 was 0.9898 (98/99 CIs). Hence this was also the proportion of CIs with coverage ≥ 0.92 and ≥ 0.93. The proportion of CIs that had coverage ≥ 0.94 was 0.8081 (80/99 CIs). The typical coverage was near 0.965, hence the correction factor was slightly too large. The test H_0: η_O = 0 did not use a correction factor, and coverage was 0.9438. The minimum average CI length was 0.4166, the sample mean of the average CI lengths was 0.4187, and the maximum average length was 0.4875, corresponding to η_1. The second two lines and below of Table 4 were for the negative binomial regression with the values of kappa given in the table. For κ = 1000 and 10,000, the simulations were very similar to those for κ = ∞. Using Y instead of Z gave similar results with longer lengths.
3.4. Omnibus Test
Multiple Linear Regression
For this simulation, the x were generated as in Section 3.1, and then Y = α + x^T β + e, where β = δ1. Hence H_0: β = 0 is true when δ = 0. The one sample test was applied on the ŵ_i using σ̂_d2² and S_D². The zero mean iid errors e_i were iid from five distributions: (i) N(0, 1), (ii) a t distribution, (iii) EXP(1) − 1, (iv) uniform(−1, 1), and (v) 0.9 N(0, 1) + 0.1 N(0, 100). Only distribution (iii) is not symmetric. With 5000 runs, we would like the coverage to be between 0.04 and 0.06 when δ = 0. In Table 5, the coverage was a bit high when S_D² was used (second to last column) instead of σ̂_d2² (fourth column). Power near 0.95 was good for the nonzero values of δ.
Poisson Regression
For this simulation, the x_i were generated in a manner similar to Section 3.1 when the x_i were from a multivariate normal distribution. Let β = δ(1, ..., 1, 0, ..., 0)^T, where there were k 1's and p − k 0's. Then the x_i were scaled such that V(x^T β) = 1 when δ = 1. In general, V(x^T β) = δ² for this scaling. Hence the population Poisson regression was fairly strong for δ = 1 and rather weak for δ near 0.
Table 6 shows that using σ̂_d2² controlled the nominal level 0.05 better than using S_D². As p got larger, the power performance could decrease. See line 8 of Table 6.
Sample R code for the above two tables is shown below.
mlrcovxysim (n=100,p=500,nruns=5000,xtype=3,etype=2,delta=0)
prcovxysim (n=500,p=100,k=100,nruns=5000,psi=0,delta=0)
4. Discussion
The omnibus test is resistant to model misspecification. For example, (a) the constant variance multiple linear regression model could be assumed when there is heterogeneity, and (b) for count data, a multiple linear regression model, or a negative binomial regression model, or a quasi-Poisson regression model may fit the data much better than the count model actually chosen. The test can also be used in low dimensions when the MLE fails to converge.
Based on the simulations and the theory, (a) the omnibus test and one sample test will not have good power against all alternatives unless n/√p → ∞ as n, p → ∞. (b) The omnibus test and one sample test tended to have simulated observed level near the nominal level (control the type I error) if σ̂_d2² was used, but the omnibus test could be conservative if n was small: n = 100 for multiple linear regression and n = 500 for Poisson regression in the simulations. Sometimes the variance estimator exploded if p was large or if H_0 was false. (c) The omnibus test and one sample test have little outlier resistance. Thus it is important to check for outliers before performing the tests. (d) Both tests worked fairly well in simulations for n as small as 100 and p as large as 500, and Ref. [14] used similar values in their simulations for multiple linear regression.
Right tail tests should be used for H_0: Σ_xY = 0 since they have more power, but two tail tests are easier to explain and compare. Ref. [14] used a statistic like T_n with an estimator of tr(Σ_w²) and an extra term. This statistic can also be used for an omnibus test when the cases are iid. The extra term was used to increase power and is likely a good idea, but better formulas for the variance estimator may be needed.
Ref. [28] has many references for high dimensional one and two sample tests. For classification with two groups, let Σ_pool be the pooled covariance matrix. Then Σ_pool^{−1}(μ_1 − μ_2) = 0 if and only if μ_1 = μ_2, which can be tested with a two sample test. For the importance of Σ_pool^{−1}(μ_1 − μ_2) in discriminant analysis, see, for example, [34].
Let the "fail to reject region" be the complement of the rejection region. Often the fail to reject region is a confidence region for the parameter or parameter vector of interest, where a confidence interval is a special case of a confidence region. In high dimensions, the length or volume of the fail to reject region does not necessarily converge to 0 as n → ∞, and the volume could diverge to ∞ if p/n → ∞. For the one sample test, the test using T_n has much more power than the test using a confidence interval for θ = μ^T μ based on D̄.
Simulations were done in
R. See [
35]. The collection of [
30]
R functions
slpack, available from (
http://parker.ad.siu.edu/Olive/slpack.txt, accessed on 28 October 2025), has some useful functions for the inference. The function
hdomni does the omnibus test. The relevant
R code is shown below.
hdomni <- function(x, y, alpha = 0.05){
  n <- length(y)
  k <- n*(n-1)
  xx <- scale(x, scale = F) #centered but not scaled
  v <- xx*c(y - mean(y)) #ith row of v is what_i = (x_i - xbar)(Y_i - Ybar)
  a <- apply(v, 2, sum)
  Thd <- (t(a)%*%a - sum(v^2))/k #1 by 1 matrix
  Thd <- as.double(Thd) #so the test statistic Thd = Tn is a scalar
  sscp <- v%*%t(v) #n by n matrix of the what_i^T what_j
  ss <- sscp - Thd
  ss <- ss^2
  vw1 <- (sum(ss) - sum(diag(ss)))/k #sigma hat_d2^2
  Vohat <- 2*vw1/k #estimates V(Tn) under H0
  Z <- Thd/sqrt(Vohat)
  pval <- 2*pnorm(-abs(Z)) #two tail pvalue
  rpval <- 1 - pnorm(Z) #right tail pvalue
  list(Tn = Thd, Z = Z, pval = pval, rpval = rpval)
}
The function
hdhot1sim3 was used to simulate the five one sample tests, and was used for
Table 1 and
Table 2. The function
hdhot1sim4 added the test using the modified variance estimator. The function hdhot2sim simulates the two sample tests, which apply the fast paired test on the w_i = x_i − y_i for i = 1, ..., min(n_1, n_2), the [25] test, and the two sample [22] test based on subsampling with m_i = floor(n_i/2) for i = 1, 2. See
Table 3. Proofs for Theorems 3 and 4 were not given, but are available from preprints of the corresponding published papers from (
http://parker.ad.siu.edu/Olive/preprints.htm, accessed on 28 October 2025).
For Table 4, the function nbinroplssimz was used to create negative binomial regression data sets for finite κ, while the function PRoplssimz was used to create the Poisson regression data sets corresponding to κ = ∞. The functions without the z do not use the Z = log(Y) transformation.
For the omnibus test, the function mlrcovxysim was used for multiple linear regression, while the function prcovxysim was used for Poisson regression.
The spatial sign vectors have some outlier resistance. If the predictor variables are all continuous, the
covmb2 and
ddplot5 functions are useful for detecting outliers in high dimensions. See [
30] (section 1.4.3). Ref. [
36] gave estimators for the variance of U-statistics.