1. Introduction
In this study, we propose a permutation-based methodology, built on multi-aspect testing, to address goodness-of-fit problems when the sample size is small and numeric multivariate data are available.
Parametric techniques rely on specific assumptions about the distribution of the population from which the sample is drawn. When such assumptions are violated, inference can be highly unreliable. For this reason, appropriate tests should be conducted beforehand to detect any departure from the assumed distribution.
In the multivariate setting, a few solutions have been proposed in the literature to assess multivariate normality.
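For example, Mardia [1] proposed a pair of tests based on multivariate versions of the skewness and kurtosis measures. Let us suppose that $\mathbf{x}_1, \ldots, \mathbf{x}_n$, with $\mathbf{x}_i = (x_{i1}, \ldots, x_{iV})'$, is the sample of interest, where n is the sample size and V is the number of multivariate components, and let $\bar{\mathbf{x}}$ and $\mathbf{S}$ denote the sample mean vector and the sample variance–covariance matrix. Equation (1) displays the proposed skewness test, while the kurtosis test is reported in Equation (2).
$$A = \frac{n}{6}\, b_{1,V}, \qquad b_{1,V} = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \left[ (\mathbf{x}_i - \bar{\mathbf{x}})' \mathbf{S}^{-1} (\mathbf{x}_j - \bar{\mathbf{x}}) \right]^3 \tag{1}$$
$$B = \frac{b_{2,V} - V(V+2)}{\sqrt{8V(V+2)/n}}, \qquad b_{2,V} = \frac{1}{n} \sum_{i=1}^{n} \left[ (\mathbf{x}_i - \bar{\mathbf{x}})' \mathbf{S}^{-1} (\mathbf{x}_i - \bar{\mathbf{x}}) \right]^2 \tag{2}$$
Under multivariate normality, A is asymptotically distributed as a chi-square with V(V+1)(V+2)/6 degrees of freedom, and B as a standard normal. Since, for small samples, the skewness test may fail to control the type I error rate and may lose power, the author also proposed a corrected version of the skewness test, using $A^{*} = \frac{n k}{6}\, b_{1,V}$ as test statistic, where $k = \frac{(V+1)(n+1)(n+3)}{n\left[(n+1)(V+1) - 6\right]}$.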
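To make Equations (1) and (2) concrete, the following Python sketch computes Mardia's statistics with NumPy. It assumes the sample covariance matrix with divisor n and the usual small-sample factor k; chi-square p-values for the skewness statistic are not computed, so that the sketch depends on NumPy only. The function name `mardia_tests` is hypothetical and not part of any package.

```python
import math

import numpy as np


def mardia_tests(x):
    """Mardia's multivariate skewness and kurtosis statistics (sketch).

    x: array of shape (n, V), one row per observation.
    """
    x = np.asarray(x, dtype=float)
    n, v = x.shape
    centred = x - x.mean(axis=0)
    # Sample covariance with divisor n (an assumption of this sketch)
    s = centred.T @ centred / n
    # g[i, j] = (x_i - xbar)' S^{-1} (x_j - xbar)
    g = centred @ np.linalg.solve(s, centred.T)
    b1 = float((g ** 3).sum()) / n ** 2           # multivariate skewness
    b2 = float((np.diag(g) ** 2).sum()) / n       # multivariate kurtosis
    df = v * (v + 1) * (v + 2) / 6                # chi-square degrees of freedom for A
    a_stat = n * b1 / 6                           # skewness statistic, Equation (1)
    k = (v + 1) * (n + 1) * (n + 3) / (n * ((n + 1) * (v + 1) - 6))
    a_small = k * a_stat                          # small-sample corrected statistic
    b_stat = (b2 - v * (v + 2)) / math.sqrt(8 * v * (v + 2) / n)  # Equation (2)
    # two-sided p-value of B under the standard normal approximation
    p_kurtosis = math.erfc(abs(b_stat) / math.sqrt(2))
    return {"b1": b1, "b2": b2, "A": a_stat, "A_small": a_small,
            "df": df, "B": b_stat, "p_kurtosis": p_kurtosis}


rng = np.random.default_rng(0)
result = mardia_tests(rng.normal(size=(20, 3)))
print({key: round(value, 3) for key, value in result.items()})
```

Both b_{1,V} and b_{2,V} are affine invariant, which gives a quick sanity check: transforming the data by a nonsingular matrix and a shift should leave them unchanged.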
A small sample size makes multivariate goodness-of-fit problems quite challenging, since we cannot rely on asymptotic properties. To deal with such circumstances, Arboretti et al. [2] proposed a permutation-based method relying on the nonparametric combination (NPC) methodology [3]. Given a random sample $\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_n)$ from a population with distribution $F_X$ and a theoretical distribution $F_0$, the authors followed the approach suggested by Friedman [4], addressing a goodness-of-fit problem as a two-sample equality-in-distribution problem. They drew an additional sample $\mathbf{Y} = (\mathbf{y}_1, \ldots, \mathbf{y}_m)$ from the theoretical distribution $F_0$ and used $\mathbf{X}$ and $\mathbf{Y}$ to test whether $F_X = F_0$. Arboretti et al. [2] also recommended the use of permutation tests, highlighting their nonparametric, distribution-free nature and their power.
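A minimal Python sketch of this idea, assuming NumPy: an auxiliary sample of size m is drawn from the hypothesised distribution and compared with the observed sample through any two-sample test. The helper names are hypothetical, and the simple permutation test on mean differences is only a placeholder for illustration; it is not the NPC procedure of Arboretti et al. [2].

```python
import numpy as np


def gof_via_two_sample(x, sample_from_f0, m, two_sample_test, rng):
    """Recast H0: F_X = F_0 as a two-sample problem (hypothetical helper).

    x: observed sample of shape (n, V).
    sample_from_f0: callable (size, rng) -> array (size, V) drawn from F_0.
    two_sample_test: callable (x, y) -> p-value for equality in distribution.
    """
    y = sample_from_f0(m, rng)  # auxiliary sample from the theoretical distribution
    return two_sample_test(x, y)


def mean_shift_permutation_test(x, y, n_perm=999, seed=None):
    """Placeholder two-sample permutation test on summed |mean differences|."""
    rng = np.random.default_rng(seed)
    z = np.vstack([x, y])
    n = len(x)

    def stat(data):
        return np.abs(data[:n].mean(axis=0) - data[n:].mean(axis=0)).sum()

    observed = stat(z)
    exceed = sum(stat(z[rng.permutation(len(z))]) >= observed for _ in range(n_perm))
    return (exceed + 1) / (n_perm + 1)


def standard_normal_f0(size, generator):
    # hypothetical theoretical distribution F_0 = N(0, I_3)
    return generator.standard_normal((size, 3))


rng = np.random.default_rng(42)
x_shifted = rng.standard_normal((20, 3)) + 3.0
p = gof_via_two_sample(x_shifted, standard_normal_f0, 20, mean_shift_permutation_test, rng)
print(p)
```

Because the theoretical sample is drawn by the analyst, its size m is a free design parameter of this reformulation.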
In a generic equality-in-distribution problem, many aspects can determine a difference and need to be monitored: the distribution functions may differ as a whole, but differences may also concern location, variability or shape parameters. For this reason, multi-aspect permutation tests are a solution worth considering. These tests follow the idea proposed by Fisher in 1947, who stated that different tests can be adopted to evaluate different aspects of the same null hypothesis [5,6]. Multi-aspect testing procedures are indeed aimed at simultaneously testing multiple features of the same null hypothesis $H_0$.
The NPC methodology allows us to easily extend such procedures to the multivariate setting and, in turn, to multivariate goodness-of-fit problems. In Section 2, we propose a possible extension, providing a detailed description of the underlying algorithm and of a possible competing technique. Section 3 is then devoted to investigating its performance through a simulation study. In Section 4, a real data application is presented. Finally, in Section 5, we draw conclusions from the study.
2. Multi-Aspect Permutation Solution
Arboretti et al. [2] showed that goodness-of-fit problems can easily be converted into a two-sample equality-in-distribution problem. For this reason, the nonparametric combination methodology can provide suitable and quite powerful solutions.
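The NPC methodology essentially requires three steps:
Decomposition of the global null hypothesis into a finite set of partial (sub-)hypotheses;
Computation of a partial test for each sub-hypothesis, together with its permutation distribution and partial p-value;
Combination of the partial p-values into a global test by means of a suitable combining function.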
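In a multivariate scenario, the first step implies the following decomposition:
$$H_0 = \bigcap_{v=1}^{V} H_{0v}: \left\{F_{X_v} = F_{0v}\right\}, \qquad H_1 = \bigcup_{v=1}^{V} H_{1v}: \left\{F_{X_v} \neq F_{0v}\right\},$$
where we create a sub-system of hypotheses for each of the V components $X_1, \ldots, X_V$ of the multivariate outcome, $F_{X_v}$ and $F_{0v}$ denoting the v-th marginal distributions.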
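On the other hand, NPC-based solutions for multi-aspect testing [7,8,9,10] also require an initial decomposition of the system of hypotheses, defining a sub-problem for each aspect to be considered. For the sake of simplicity, in this study we focus on three aspects, namely location (L), variability (S) and cumulative distribution function (D), so that $H_0 = H_{0L} \cap H_{0S} \cap H_{0D}$, with the related sub-systems:
Location: $H_{0L}: \mu_X = \mu_0$ against $H_{1L}: \mu_X \neq \mu_0$;
Variability: $H_{0S}: \sigma^2_X = \sigma^2_0$ against $H_{1S}: \sigma^2_X \neq \sigma^2_0$;
Cumulative distribution function: $H_{0D}: F_X = F_0$ against $H_{1D}: F_X \neq F_0$.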
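To address the aforementioned multivariate goodness-of-fit problem, we propose a multivariate multi-aspect test, and therefore we need to combine the two different decompositions:
$$H_0 = \bigcap_{a \in \{L, S, D\}} \bigcap_{v=1}^{V} H_{0av}, \qquad H_1 = \bigcup_{a \in \{L, S, D\}} \bigcup_{v=1}^{V} H_{1av},$$
where $H_{0av}$ denotes the null sub-hypothesis for aspect a on component $X_v$.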
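For each individual aspect, we then identify a suitable test statistic. In particular, for each component v = 1, ..., V, we detect differences as follows.
For location, we use the absolute difference between sample means:
$$T_{Lv} = \left| \bar{x}_v - \bar{y}_v \right|.$$
For variability, we adopt the ratio of the two estimated variances $\hat{\sigma}^2_{X_v}$ and $\hat{\sigma}^2_{Y_v}$:
$$T_{Sv} = \frac{\hat{\sigma}^2_{X_v}}{\hat{\sigma}^2_{Y_v}}.$$
For the cumulative distribution function, we use the Anderson–Darling test statistic [3]:
$$T_{Dv} = \sum_{z \in \mathbf{Z}_v:\ \bar{F}_v(z) < 1} \frac{\left[\hat{F}_{X_v}(z) - \hat{F}_{Y_v}(z)\right]^2}{\bar{F}_v(z)\left[1 - \bar{F}_v(z)\right]},$$
where $\mathbf{Z}$, obtained by stacking the n rows of $\mathbf{X}$ and the m rows of $\mathbf{Y}$, is the pooled sample (with v-th column $\mathbf{Z}_v$), n and m are the individual sample sizes, $N = n + m$, $\hat{F}_{X_v}(z) = n^{-1} \sum_{i=1}^{n} \mathbb{I}(x_{iv} \le z)$ and $\hat{F}_{Y_v}(z) = m^{-1} \sum_{j=1}^{m} \mathbb{I}(y_{jv} \le z)$ are the empirical distribution functions of the two samples, $\mathbb{I}(\cdot)$ is the indicator function, and $\bar{F}_v(z) = \left[n \hat{F}_{X_v}(z) + m \hat{F}_{Y_v}(z)\right]/N$ is the pooled empirical distribution function.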
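The three partial statistics can be sketched in Python for a single component v, assuming NumPy. The sketch uses unbiased variance estimates (ddof=1) in the variance ratio and a discrete two-sample Anderson–Darling form that sums $[\hat{F}_X - \hat{F}_Y]^2 / \{\bar{F}(1 - \bar{F})\}$ over the pooled observations where the pooled empirical distribution function is below 1; these conventions are assumptions and may differ in detail from the authors' R code.

```python
import numpy as np


def location_stat(xv, yv):
    """Absolute difference between the two sample means."""
    return abs(np.mean(xv) - np.mean(yv))


def variance_ratio_stat(xv, yv):
    """Ratio of the two (unbiased) sample variances."""
    return np.var(xv, ddof=1) / np.var(yv, ddof=1)


def anderson_darling_stat(xv, yv):
    """Discrete two-sample Anderson-Darling statistic over the pooled sample."""
    xv = np.asarray(xv, dtype=float)
    yv = np.asarray(yv, dtype=float)
    n, m = len(xv), len(yv)
    z = np.concatenate([xv, yv])
    # empirical CDFs of each sample, evaluated at every pooled observation
    fx = np.searchsorted(np.sort(xv), z, side="right") / n
    fy = np.searchsorted(np.sort(yv), z, side="right") / m
    fbar = (n * fx + m * fy) / (n + m)  # pooled empirical CDF (always > 0 here)
    keep = fbar < 1.0                   # skip points where the pooled CDF equals 1
    return float(np.sum((fx[keep] - fy[keep]) ** 2
                        / (fbar[keep] * (1.0 - fbar[keep]))))


print(anderson_darling_stat([1.0, 2.0], [3.0, 4.0]))  # approximately 6.667, i.e. 20/3
```

Large values of each statistic indicate a departure from the null hypothesis for the corresponding aspect.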
The second step of the NPC methodology can thereafter be undertaken. We apply each test statistic to each univariate component of the outcome and compute the related partial p-values via multivariate permutation.
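The adopted algorithm is as follows:
Step 1: Apply the three test statistics to the original pooled data set $\mathbf{Z}$. The observed values $T^{o}_{Lv}$, $T^{o}_{Sv}$ and $T^{o}_{Dv}$, v = 1, ..., V, are obtained.
Step 2: For b = 1, ..., B:
- Shuffle the rows of $\mathbf{Z}$ (i.e., the same permutation scheme is applied to each component), implicitly taking into account the existing correlation among variables;
- Apply the three test statistics to the permuted data set and retrieve $T^{*}_{Lvb}$, $T^{*}_{Svb}$ and $T^{*}_{Dvb}$.
Step 3: Compute the partial p-values $\lambda_{Lv}$, $\lambda_{Sv}$ and $\lambda_{Dv}$ by comparing the values of the observed test statistics to those of the permuted test statistics (e.g., $\lambda_{Lv} = B^{-1} \sum_{b=1}^{B} \mathbb{I}(T^{*}_{Lvb} \ge T^{o}_{Lv})$), together with their permutation distributions $\lambda^{*}_{Lvb}$, $\lambda^{*}_{Svb}$ and $\lambda^{*}_{Dvb}$, b = 1, ..., B (e.g., $\lambda^{*}_{Lvb} = B^{-1} \sum_{r=1}^{B} \mathbb{I}(T^{*}_{Lvr} \ge T^{*}_{Lvb})$); for further details, see Pesarin and Salmaso [3].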
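The permutation step described above can be sketched in Python, assuming NumPy. Rows of the pooled matrix are shuffled jointly, so the dependence among the V components is preserved; partial p-values are the proportions of permuted statistics at least as large as the observed ones, and the same rule applied to each permuted statistic yields their permutation distributions. The function name and the convention that large values of every statistic are significant are assumptions of this sketch.

```python
import numpy as np


def partial_pvalues(x, y, stats, n_perm=2000, seed=None):
    """Partial p-values and their permutation distributions (sketch).

    x: (n, V) observed sample; y: (m, V) theoretical sample.
    stats: list of K functions f(xv, yv) -> float; large values are significant.
    Returns p_obs with shape (K, V) and p_perm with shape (n_perm, K, V).
    """
    rng = np.random.default_rng(seed)
    z = np.vstack([x, y])
    n, v = len(x), z.shape[1]

    def all_stats(data):
        # K x V matrix: every statistic applied to every component
        return np.array([[f(data[:n, j], data[n:, j]) for j in range(v)]
                         for f in stats])

    t_obs = all_stats(z)
    t_perm = np.empty((n_perm,) + t_obs.shape)
    for b in range(n_perm):
        # one row permutation shared by all components keeps their correlation
        t_perm[b] = all_stats(z[rng.permutation(len(z))])

    # observed partial p-values: share of permuted statistics >= observed
    p_obs = (t_perm >= t_obs).mean(axis=0)
    # permutation distribution: the same rule applied to each permuted statistic
    p_perm = np.array([(t_perm >= t_perm[b]).mean(axis=0) for b in range(n_perm)])
    return p_obs, p_perm


def abs_mean_diff(a, b):
    return abs(a.mean() - b.mean())


rng = np.random.default_rng(1)
x = rng.standard_normal((15, 4)) + np.array([3.0, 0.0, 0.0, 0.0])
y = rng.standard_normal((15, 4))
p_obs, p_perm = partial_pvalues(x, y, [abs_mean_diff], n_perm=200, seed=2)
print(p_obs)
```

Only the first component is shifted in this example, so only its partial p-value is expected to be small.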
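The last step consists of the combination of the partial p-values and the computation of the global p-value. To do that, the following procedure needs to be followed:
Step 1: For each aspect a ∈ {L, S, D}, apply a combining function ψ to the V partial p-values and their permutation distributions to achieve the second-order test statistics $T^{\prime\prime}_{a} = \psi(\lambda_{a1}, \ldots, \lambda_{aV})$ and their estimated distributions $T^{\prime\prime *}_{ab} = \psi(\lambda^{*}_{a1b}, \ldots, \lambda^{*}_{aVb})$, b = 1, ..., B.
Step 2: Compute the second-order p-values $\lambda^{\prime\prime}_{a} = B^{-1} \sum_{b=1}^{B} \mathbb{I}(T^{\prime\prime *}_{ab} \ge T^{\prime\prime}_{a})$ (and the related distributions $\lambda^{\prime\prime *}_{ab}$) comparing $T^{\prime\prime}_{a}$ to the permuted values $T^{\prime\prime *}_{ab}$, b = 1, ..., B.
Step 3: Apply a combining function ψ' to the second-order p-values and their permutation distributions to achieve a third-order test statistic $T^{\prime\prime\prime} = \psi'(\lambda^{\prime\prime}_{L}, \lambda^{\prime\prime}_{S}, \lambda^{\prime\prime}_{D})$ and its estimated distribution $T^{\prime\prime\prime *}_{b} = \psi'(\lambda^{\prime\prime *}_{Lb}, \lambda^{\prime\prime *}_{Sb}, \lambda^{\prime\prime *}_{Db})$.
Step 4: Compute the global p-value $\lambda^{\prime\prime\prime} = B^{-1} \sum_{b=1}^{B} \mathbb{I}(T^{\prime\prime\prime *}_{b} \ge T^{\prime\prime\prime})$ comparing $T^{\prime\prime\prime}$ to the permuted values $T^{\prime\prime\prime *}_{b}$, b = 1, ..., B.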
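The two combination stages described above can be sketched as follows, assuming NumPy arrays of partial p-values of shape (K, V) and their permutation distribution of shape (B, K, V), with K aspects and V components. Fisher's function is used at both stages here purely for illustration, and p-values are clipped away from zero before taking logarithms; both are choices of this sketch, not necessarily those of the study.

```python
import numpy as np


def fisher(p):
    """Fisher's combining function along the last axis: -2 * sum(log p)."""
    return -2.0 * np.log(np.clip(p, 1e-300, 1.0)).sum(axis=-1)


def p_from_null(t_obs, t_null):
    """Share of null (permuted) values >= the observed value(s), along axis 0."""
    return (t_null >= t_obs).mean(axis=0)


def two_stage_npc(p_obs, p_perm, psi1=fisher, psi2=fisher):
    """p_obs: (K, V) partial p-values; p_perm: (B, K, V) permutation distribution."""
    # first combination: across the V components, separately for each aspect
    t2_obs = psi1(p_obs)                       # shape (K,)
    t2_perm = psi1(p_perm)                     # shape (B, K)
    p2_obs = p_from_null(t2_obs, t2_perm)      # second-order p-values, shape (K,)
    p2_perm = np.array([p_from_null(t, t2_perm) for t in t2_perm])  # (B, K)
    # second combination: across the K aspects
    t3_obs = psi2(p2_obs)                      # scalar
    t3_perm = psi2(p2_perm)                    # shape (B,)
    return p2_obs, float(p_from_null(t3_obs, t3_perm))


rng = np.random.default_rng(0)
p_perm = rng.uniform(size=(500, 3, 4))        # stand-in null distribution
p2, global_p = two_stage_npc(np.full((3, 4), 0.001), p_perm)
print(p2, global_p)
```

Every combined statistic is compared only with its own permutation distribution, so the dependence among components and aspects is accounted for without modelling it explicitly.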
The choice of the combining function can be a key factor in determining the power of the proposed test. According to Pesarin and Salmaso [3], a combining function should satisfy four fundamental properties, which are as follows:
It should be non-increasing in each partial p-value and possibly symmetric;
It should reach its supremum value even when only a single partial p-value attains 0;
For each significance level α, the related critical value should be finite and strictly lower than the supremum value;
The rejection region of the resulting combined test should be convex.
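Among the functions satisfying these requirements, with $\lambda_1, \ldots, \lambda_k$ denoting the partial p-values to be combined, we have the following:
Fisher's combining function [11]: $\psi_F = -2 \sum_{i=1}^{k} \log(\lambda_i)$;
The truncated product method [12], a modification of Fisher's combining function which generally helps in gaining power with highly dependent data [13]: $\psi_{TP} = -2 \sum_{i=1}^{k} \log(\lambda_i)\, \mathbb{I}(\lambda_i \le \tau)$, where τ is a truncation threshold;
Tippett's combining function [14]: $\psi_T = \max_{1 \le i \le k} (1 - \lambda_i)$.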
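The combining functions can be written compactly in Python (NumPy assumed); Tippett's function $\max_i(1 - \lambda_i)$ [14] is included for comparison with Fisher's function and the truncated product method. The default truncation threshold τ = 0.05 is only a placeholder, not necessarily the value adopted in the study. The checks that follow illustrate the supremum property: a single zero p-value sends Fisher's function to infinity and Tippett's to 1.

```python
import numpy as np


def fisher(p):
    """Fisher: -2 * sum(log p_i); equals +inf if any p_i is 0."""
    p = np.asarray(p, dtype=float)
    with np.errstate(divide="ignore"):
        return float(-2.0 * np.sum(np.log(p)))


def truncated_product(p, tau=0.05):
    """Truncated product: Fisher's sum restricted to p-values <= tau."""
    p = np.asarray(p, dtype=float)
    with np.errstate(divide="ignore"):
        return float(-2.0 * np.sum(np.log(p[p <= tau])))


def tippett(p):
    """Tippett: max(1 - p_i); reaches its supremum 1 if any p_i is 0."""
    return float(np.max(1.0 - np.asarray(p, dtype=float)))


partial = [0.01, 0.20, 0.60]
for psi in (fisher, truncated_product, tippett):
    print(psi.__name__, round(psi(partial), 3))
```

With the truncated product, p-values above τ contribute nothing, so many moderately large, highly dependent p-values cannot dilute a few small ones.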
In this study, we decided to set and to investigate the impact of choosing the truncated product method over Fisher's as the first combining function used to combine the V vectors of p-values related to the V components of the multivariate outcome. A simulation study was indeed conducted to evaluate the power of our proposal, implementing two different versions of the test, one using Fisher's combining function $\psi_F$ (indicated as NPC—Fisher) and one using the truncated product method $\psi_{TP}$ (indicated as NPC—Truncated).
A Competing Method
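For a better evaluation of the performance of the proposed method, we also considered a possible competing method. In particular, we focused on the two-sample energy test introduced by Székely and Rizzo [15] to deal with equality in distribution in high-dimensional problems. The test statistic they suggested is
$$\mathcal{E}_{n,m} = \frac{nm}{n+m} \left( \frac{2}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} \lVert \mathbf{x}_i - \mathbf{y}_j \rVert - \frac{1}{n^2} \sum_{i=1}^{n} \sum_{k=1}^{n} \lVert \mathbf{x}_i - \mathbf{x}_k \rVert - \frac{1}{m^2} \sum_{j=1}^{m} \sum_{l=1}^{m} \lVert \mathbf{y}_j - \mathbf{y}_l \rVert \right),$$
where $\lVert \cdot \rVert$ is the Euclidean distance. Large values of this statistic lead to rejection, and a permutation approach is used to obtain an adequate p-value.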
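A direct NumPy transcription of the two-sample energy statistic, with a permutation p-value, is sketched below for illustration. The study used the R energy package [19] (which provides, e.g., eqdist.etest); the Python function names here are hypothetical.

```python
import numpy as np


def energy_statistic(x, y):
    """Two-sample energy statistic, scaled by nm / (n + m)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)

    def mean_dist(a, b):
        # average Euclidean distance over all pairs (a_i, b_j), zeros included
        return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).mean()

    e = 2.0 * mean_dist(x, y) - mean_dist(x, x) - mean_dist(y, y)
    return n * m / (n + m) * e


def energy_test(x, y, n_perm=999, seed=None):
    """Permutation p-value: large values of the statistic lead to rejection."""
    rng = np.random.default_rng(seed)
    z = np.vstack([x, y])
    n = len(x)
    observed = energy_statistic(x, y)
    exceed = 0
    for _ in range(n_perm):
        zp = z[rng.permutation(len(z))]
        exceed += energy_statistic(zp[:n], zp[n:]) >= observed
    return observed, (exceed + 1) / (n_perm + 1)


rng = np.random.default_rng(3)
stat, p = energy_test(rng.standard_normal((20, 3)) + 1.5,
                      rng.standard_normal((20, 3)), n_perm=199, seed=4)
print(round(stat, 3), p)
```

Unlike the NPC approach, the energy statistic works directly on the joint distribution through pairwise distances, with no decomposition into components or aspects.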
3. Simulation Study
In this study, we considered several different scenarios to accurately evaluate the performance of the proposed NPC-based approach.
First, we considered three different multivariate distributions:
Multivariate normal distribution with mean vector $\boldsymbol{\mu}$ and variance–covariance matrix $\boldsymbol{\Sigma}$;
Multivariate log-normal distribution, with the mean vector of the log of the distribution equal to $\boldsymbol{\mu}$ and the variance–covariance matrix of the log of the distribution equal to $\boldsymbol{\Sigma}$;
Multivariate Student’s t distribution with 3 degrees of freedom, location parameter $\boldsymbol{\mu}$ and scale matrix $\boldsymbol{\Sigma}$.
Data were generated using the mvtnorm [16] (function rmvnorm), LaplacesDemon [17] and compositions [18] packages implemented in R. The energy package [19] was adopted to apply the competing method, while R code implementing the two versions of the multi-aspect test is available upon request.
Initially, the sizes of the observed and theoretical samples were both set to 20 (n = m = 20), while two possible numbers of variables V were considered, namely 6 and 10.
Under the null hypothesis, the observed and theoretical distributions are expected to be the same. Therefore, under $H_0$, we set the same V-dimensional vector $\boldsymbol{\mu}$ and the same $(V \times V)$-dimensional matrix $\boldsymbol{\Sigma}$ for both samples. We considered three possible values of the correlation parameter ρ of $\boldsymbol{\Sigma}$, i.e., 0, 0.25 and 0.5, to introduce different degrees of correlation.
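Under stated assumptions, the data-generating step for the three distributions listed above can be sketched in Python with NumPy instead of the R packages used in the study. Here ρ denotes a common correlation, and the compound-symmetric Σ (unit variances, all correlations equal to ρ) is a hypothetical choice matching a single correlation parameter, not necessarily the matrix used by the authors. The t draw uses the normal/chi-square mixture representation with location μ and scale matrix Σ.

```python
import numpy as np


def compound_symmetry(v, rho):
    """Hypothetical V x V matrix with unit variances and common correlation rho."""
    return (1.0 - rho) * np.eye(v) + rho * np.ones((v, v))


def draw(dist, n, mu, sigma, rng, df=3):
    """Draw n rows from a multivariate normal, log-normal or Student's t."""
    mu = np.asarray(mu, dtype=float)
    if dist == "normal":
        return rng.multivariate_normal(mu, sigma, size=n)
    if dist == "lognormal":
        # exp of a normal vector whose log has mean mu and covariance sigma
        return np.exp(rng.multivariate_normal(mu, sigma, size=n))
    if dist == "t":
        # normal / sqrt(chi-square / df) mixture: location mu, scale matrix sigma
        z = rng.multivariate_normal(np.zeros(len(mu)), sigma, size=n)
        w = rng.chisquare(df, size=n) / df
        return mu + z / np.sqrt(w)[:, None]
    raise ValueError(f"unknown distribution: {dist}")


rng = np.random.default_rng(0)
sigma = compound_symmetry(6, 0.25)
samples = {d: draw(d, 20, np.zeros(6), sigma, rng) for d in ("normal", "lognormal", "t")}
print({d: s.shape for d, s in samples.items()})
```

Only the log-normal case is asymmetric, which matters when reading the simulation results.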
Under the alternative hypothesis, we focused on scenarios where the observed and theoretical distributions differed in terms of both location and scale parameters. For the theoretical sample $\mathbf{Y}$, the aforementioned $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ were adopted. To generate the observed sample $\mathbf{X}$, on the other hand, a different mean vector $\boldsymbol{\mu}_X$ and a different matrix $\boldsymbol{\Sigma}_X$ were used. Again, three possible values of ρ, i.e., 0, 0.25 and 0.5, were considered, each time using the same value for both distributions.
Given the well-known properties of the NPC methodology, increasing the size m of the theoretical sample could increase the power of our procedure. To better illustrate this phenomenon, we also varied m in a final scenario, where the theoretical sample is generated with $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ and the observed sample with $\boldsymbol{\mu}_X$ and $\boldsymbol{\Sigma}_X$, for a fixed value of ρ. In particular, we considered three possible values of m (i.e., 20, 30 and 40), while keeping n fixed at 20.
It is worth noting that the current choice of test statistics is not ideal for situations where the observed and theoretical samples differ in terms of correlation structure. This shortcoming is illustrated and further discussed through an additional simulation scenario, in which the theoretical and observed samples are generated with different correlation structures.
The number of simulation runs was set equal to 5000, while the number of permutations was equal to 2000.
Results and Discussion
Under the null hypothesis, both versions of the NPC-based test (using Fisher's combining function and the truncated product method) and the energy test maintained the nominal level. With the significance level α set to 0.01, the rejection rates (i.e., the proportion of p-values less than or equal to α) are always quite close to 0.01, with some slight random fluctuations (see Table 1).
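As a rough check of what counts as a slight random fluctuation, the Monte Carlo standard error of an estimated rejection rate can be computed. Assuming α = 0.01 (consistent with the null rejection rates in Table 1) and R = 5000 simulation runs:

$$\mathrm{SE} = \sqrt{\frac{\alpha(1-\alpha)}{R}} = \sqrt{\frac{0.01 \times 0.99}{5000}} \approx 1.41 \times 10^{-3}, \qquad \alpha \pm 3\,\mathrm{SE} \approx [0.0058,\ 0.0142].$$

All rejection rates in Table 1 (from 0.007 to 0.014) fall within this band.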
Under the alternative hypothesis, some differences between the considered methods emerge (see Table 2).
First of all, the choice of the combining function does not appear to considerably affect the performance of the NPC-based multi-aspect test. This is probably because two sequential combination steps are undertaken, which mitigates the impact of the specific function used in the first step. However, when a larger number of variables and a multivariate log-normal distribution are considered, the solution adopting Fisher's combining function appears to perform slightly better.
Both the Fisher and truncated product versions show reasonably high rejection rates and outperform the energy test in the vast majority of scenarios. For the multivariate log-normal distribution, however, this is not the case: with this asymmetric distribution, the methods show similar performance when V = 6, and the energy test even achieves the highest rejection rates when a larger number of variables is considered (i.e., when V = 10).
A high correlation among variables also appears to be detrimental to the power of the methods: all the considered tests show considerably lower rejection rates when ρ = 0.5 than when ρ = 0. The performance of the NPC-based solutions, however, remains reasonably good, even for ρ = 0.5, when the symmetric distributions are considered.
Additionally, a higher number of informative variables appears to lead to an increase in power. This is a well-known property of NPC-based tests, called finite-sample consistency [20]: for a fixed sample size, the power tends to one as the number of informative variables increases. Whatever the sample size, reasonably good power can therefore be reached if a sufficiently large number of informative variables is available. This is a useful property, which offers a potential way to deal with the shortcomings of small-sample scenarios.
Table 3 shows the positive effect on power of an increase in the theoretical sample size m. In particular, the permutation-based solutions with m = 40 are able to outperform the competing method even for V = 10 and a multivariate log-normal distribution, i.e., the only case in which the energy test performed best with smaller sample sizes. To further enhance the power of these tests, the user could therefore consider increasing the size of the theoretical sample, which can be freely chosen. However, such an approach could lead to a substantial increase in the computational burden.
Investigating the potential shortcoming due to the current choice of test statistics, we observed that the methods indeed fail to detect differences in the correlation structure. This is true for both the NPC-based solutions and the energy test: as shown in Table 4, the rejection rates remain close to the nominal level expected under $H_0$, with only a slight increase for the energy test (up to 0.028). However, including an additional test statistic specifically designed to detect differences in correlation could make it possible to address such a scenario as well [3].
4. Real Data Application
We considered a real data application to better show the usefulness of the proposed procedure. In particular, we applied our approach to an industrial problem, where an operator is interested in checking the quality of a production process in terms of multiple key performance indicators. Initially, 25 different bottles (i.e., the output of the process) were randomly selected. Their diameters, measured at three key positions, were expected to be, on average, equal to 2.5 cm (Diameter A), 5 cm (Diameter B) and 7 cm (Diameter C), respectively (i.e., $\boldsymbol{\mu}_0 = (2.5, 5, 7)'$). Additionally, following an application of the Six Sigma methodology, the expected variance–covariance matrix $\boldsymbol{\Sigma}_0$ was known in advance, with the diameter values following a multivariate normal distribution.
The gathered sample showed a potential shift from the expected mean value in Diameter A (i.e., the diameter of the neck of the bottle), as can be seen in Table 5. We therefore applied both versions of the NPC-based multi-aspect test and the energy test to further investigate this hypothesis.
Table 6 reports the global p-values obtained. All the considered methods reject the null hypothesis at the 5% significance level, indicating that the gathered sample does not follow a multivariate normal distribution with mean vector $\boldsymbol{\mu}_0$ and variance–covariance matrix $\boldsymbol{\Sigma}_0$. Looking at the adjusted partial p-values (see Table 7), we can also identify which aspects lead to this rejection; in particular, a significant shift in mean did occur.
5. Conclusions
In this paper, we introduced a multi-aspect permutation test to deal with the multivariate goodness-of-fit (GoF) problem. First, we adopted the approach already proposed by Arboretti et al. [2], transforming the GoF problem into a traditional two-sample problem. Then, we introduced an extension of the nonparametric combination (NPC) methodology [3] that is able to detect differences in location, scale and cumulative distribution function between the observed sample distribution and the theoretical distribution.
To evaluate the performance of this solution, we conducted a simulation study, which showed the good performance of our proposal, also in comparison with a possible competing testing procedure (i.e., the energy test proposed by Székely and Rizzo [15]). Its power appears to be negatively affected by high correlation among variables, but it tends to increase substantially as the number of informative variables increases. The choice of the combining function adopted in the first combination step of the NPC methodology does not appear to significantly affect the performance of the proposed test.
The simulation study also showed the benefits of choosing a large size m for the sample drawn from the theoretical distribution, which appears to lead to an increase in power. Future studies could therefore provide guidelines on the appropriate ratio between the observed and theoretical sample sizes.
A shortcoming of the current configuration of the proposed approach also emerged: it fails to detect differences in the correlation structure. For this reason, future studies could focus on introducing a further test statistic, specifically designed to detect such differences, which could improve the performance in such scenarios.
A real data application was also proposed, which allowed us to show the usefulness of our approach.
Overall, our proposal proved to be a quite powerful solution to goodness-of-fit problems; it is highly flexible and leaves room for further improvement and investigation.
Author Contributions
Conceptualization, R.C., N.B. and E.B.; Methodology, R.C., N.B. and E.B.; Software, R.C., N.B. and E.B.; Validation, R.C., N.B. and E.B.; Formal Analysis, R.C., N.B. and E.B.; Investigation, R.C., N.B. and E.B.; Resources, R.C., N.B. and E.B.; Data Curation, L.C., L.S. and R.A.; Writing—Original Draft Preparation, L.C., L.S. and R.A.; Writing—Review and Editing, L.C., L.S. and R.A.; Visualization, L.C., L.S. and R.A.; Supervision, L.C., L.S. and R.A.; Project Administration, L.S. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data presented in this study are available on request from the corresponding author.
Conflicts of Interest
The authors declare no conflict of interest.
References
[1] Mardia, K.V. Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies. Sankhyā Indian J. Stat. Ser. B 1974, 36, 115–128.
[2] Arboretti, R.; Ceccato, R.; Salmaso, L. Permutation testing for goodness-of-fit and stochastic ordering with multivariate mixed variables. J. Stat. Comput. Simul. 2021, 91, 876–896.
[3] Pesarin, F.; Salmaso, L. Permutation Tests for Complex Data: Theory, Applications and Software; John Wiley & Sons: Hoboken, NJ, USA, 2010.
[4] Friedman, J. On Multivariate Goodness-of-Fit and Two-Sample Testing; SLAC National Accelerator Lab.: Menlo Park, CA, USA, 2004.
[5] Fisher, R.A. The Design of Experiments, 4th ed.; Hafner Press: New York, NY, USA, 1947.
[6] Lehmann, E.L. The Fisher, Neyman-Pearson theories of testing hypotheses: One theory or two? J. Am. Stat. Assoc. 1993, 88, 1242–1249.
[7] Salmaso, L.; Solari, A. Multiple aspect testing for case-control designs. Metrika 2005, 62, 331–340.
[8] Brombin, C.; Salmaso, L. Multi-aspect permutation tests in shape analysis with small sample size. Comput. Stat. Data Anal. 2009, 53, 3921–3931.
[9] Brombin, C.; Salmaso, L.; Ferronato, G.; Galzignato, P.F. Multi-aspect procedures for paired data with application to biometric morphing. Commun. Stat. Comput. 2010, 40, 1–12.
[10] Corain, L.; Salmaso, L. Improving power of multivariate combination-based permutation tests. Stat. Comput. 2015, 25, 203–214.
[11] Fisher, R. Statistical Methods for Research Workers; Oliver and Boyd: Edinburgh, UK, 1932.
[12] Zaykin, D.V.; Zhivotovsky, L.A.; Westfall, P.H.; Weir, B.S. Truncated product method for combining P-values. Genet. Epidemiol. Off. Publ. Int. Genet. Epidemiol. Soc. 2002, 22, 170–185.
[13] Arboretti Giancristofaro, R.; Bonnini, S.; Corain, L.; Salmaso, L. Dependency and truncated forms of combinations in multivariate combination-based permutation tests and ordered categorical variables. J. Stat. Comput. Simul. 2016, 86, 3608–3619.
[14] Tippett, L.H.C. The Methods of Statistics. An Introduction Mainly for Workers in the Biological Sciences; Williams & Norgate: London, UK, 1931.
[15] Székely, G.J.; Rizzo, M.L. Testing for equal distributions in high dimension. InterStat 2004, 5, 1249–1272.
[16] Genz, A.; Bretz, F.; Miwa, T.; Mi, X.; Leisch, F.; Scheipl, F.; Hothorn, T. mvtnorm: Multivariate Normal and t Distributions; R Package Version 1.1-3; R Foundation for Statistical Computing: Vienna, Austria, 2021.
[17] Statisticat LLC. LaplacesDemon: Complete Environment for Bayesian Inference; R Package Version 16.1.6; R Foundation for Statistical Computing: Vienna, Austria, 2021.
[18] van den Boogaart, K.G.; Tolosana-Delgado, R.; Bren, M. compositions: Compositional Data Analysis; R Package Version 2.0-4; R Foundation for Statistical Computing: Vienna, Austria, 2022.
[19] Rizzo, M.; Szekely, G. energy: E-Statistics: Multivariate Inference via the Energy of Data; R Package Version 1.7-8; R Foundation for Statistical Computing: Vienna, Austria, 2021.
[20] Pesarin, F.; Salmaso, L. Finite-sample consistency of combination-based permutation tests with application to repeated measures designs. J. Nonparametric Stat. 2010, 22, 669–684.
Table 1.
Rejection rates with significance level α = 0.01 under the null hypothesis.
| ρ | V | Method | Multivariate Normal | Multivariate Log-Normal | Multivariate Student’s t |
|---|---|---|---|---|---|
| 0 | 6 | Energy | 0.010 | 0.009 | 0.010 |
| 0 | 6 | NPC—Fisher | 0.012 | 0.010 | 0.011 |
| 0 | 6 | NPC—Truncated | 0.012 | 0.009 | 0.010 |
| 0 | 10 | Energy | 0.007 | 0.011 | 0.010 |
| 0 | 10 | NPC—Fisher | 0.008 | 0.009 | 0.011 |
| 0 | 10 | NPC—Truncated | 0.007 | 0.009 | 0.013 |
| 0.25 | 6 | Energy | 0.010 | 0.010 | 0.009 |
| 0.25 | 6 | NPC—Fisher | 0.010 | 0.008 | 0.009 |
| 0.25 | 6 | NPC—Truncated | 0.009 | 0.009 | 0.011 |
| 0.25 | 10 | Energy | 0.008 | 0.012 | 0.009 |
| 0.25 | 10 | NPC—Fisher | 0.009 | 0.013 | 0.009 |
| 0.25 | 10 | NPC—Truncated | 0.008 | 0.014 | 0.009 |
| 0.5 | 6 | Energy | 0.009 | 0.010 | 0.011 |
| 0.5 | 6 | NPC—Fisher | 0.009 | 0.010 | 0.009 |
| 0.5 | 6 | NPC—Truncated | 0.009 | 0.009 | 0.009 |
| 0.5 | 10 | Energy | 0.012 | 0.011 | 0.011 |
| 0.5 | 10 | NPC—Fisher | 0.009 | 0.012 | 0.011 |
| 0.5 | 10 | NPC—Truncated | 0.009 | 0.012 | 0.011 |
Table 2.
Rejection rates with significance level α = 0.01 under the alternative hypothesis, for different values of ρ.
| ρ | V | Method | Multivariate Normal | Multivariate Log-Normal | Multivariate Student’s t |
|---|---|---|---|---|---|
| 0 | 6 | Energy | 0.757 | 0.448 | 0.807 |
| 0 | 6 | NPC—Fisher | 0.965 | 0.453 | 0.976 |
| 0 | 6 | NPC—Truncated | 0.958 | 0.450 | 0.972 |
| 0 | 10 | Energy | 0.949 | 0.677 | 0.970 |
| 0 | 10 | NPC—Fisher | 0.999 | 0.582 | 1.000 |
| 0 | 10 | NPC—Truncated | 0.999 | 0.558 | 1.000 |
| 0.25 | 6 | Energy | 0.647 | 0.425 | 0.734 |
| 0.25 | 6 | NPC—Fisher | 0.931 | 0.435 | 0.955 |
| 0.25 | 6 | NPC—Truncated | 0.926 | 0.430 | 0.946 |
| 0.25 | 10 | Energy | 0.854 | 0.595 | 0.891 |
| 0.25 | 10 | NPC—Fisher | 0.990 | 0.551 | 0.997 |
| 0.25 | 10 | NPC—Truncated | 0.987 | 0.535 | 0.995 |
| 0.5 | 6 | Energy | 0.563 | 0.378 | 0.645 |
| 0.5 | 6 | NPC—Fisher | 0.877 | 0.380 | 0.908 |
| 0.5 | 6 | NPC—Truncated | 0.876 | 0.374 | 0.901 |
| 0.5 | 10 | Energy | 0.709 | 0.479 | 0.770 |
| 0.5 | 10 | NPC—Fisher | 0.961 | 0.461 | 0.959 |
| 0.5 | 10 | NPC—Truncated | 0.953 | 0.455 | 0.955 |
Table 3.
Rejection rates with significance level α = 0.01 under the alternative hypothesis, for different values of m.
| m | V | Method | Multivariate Normal | Multivariate Log-Normal | Multivariate Student’s t |
|---|---|---|---|---|---|
| 20 | 6 | Energy | 0.667 | 0.421 | 0.731 |
| 20 | 6 | NPC—Fisher | 0.941 | 0.434 | 0.958 |
| 20 | 6 | NPC—Truncated | 0.935 | 0.444 | 0.952 |
| 20 | 10 | Energy | 0.847 | 0.572 | 0.911 |
| 20 | 10 | NPC—Fisher | 0.994 | 0.506 | 0.996 |
| 20 | 10 | NPC—Truncated | 0.990 | 0.492 | 0.995 |
| 30 | 6 | Energy | 0.711 | 0.432 | 0.800 |
| 30 | 6 | NPC—Fisher | 0.985 | 0.467 | 0.986 |
| 30 | 6 | NPC—Truncated | 0.984 | 0.456 | 0.982 |
| 30 | 10 | Energy | 0.902 | 0.578 | 0.934 |
| 30 | 10 | NPC—Fisher | 0.998 | 0.523 | 1.000 |
| 30 | 10 | NPC—Truncated | 0.997 | 0.515 | 1.000 |
| 40 | 6 | Energy | 0.753 | 0.433 | 0.825 |
| 40 | 6 | NPC—Fisher | 0.993 | 0.524 | 0.993 |
| 40 | 6 | NPC—Truncated | 0.992 | 0.518 | 0.992 |
| 40 | 10 | Energy | 0.929 | 0.580 | 0.959 |
| 40 | 10 | NPC—Fisher | 1.000 | 0.595 | 1.000 |
| 40 | 10 | NPC—Truncated | 1.000 | 0.582 | 1.000 |
Table 4.
Rejection rates with significance level α = 0.01 with differences in the correlation structure.
|  | V | Method | Multivariate Normal | Multivariate Log-Normal | Multivariate Student’s t |
|---|---|---|---|---|---|
| 1.5 | 6 | Energy | 0.013 | 0.012 | 0.014 |
| 1.5 | 6 | NPC—Fisher | 0.010 | 0.013 | 0.012 |
| 1.5 | 6 | NPC—Truncated | 0.010 | 0.015 | 0.010 |
| 2.0 | 6 | Energy | 0.019 | 0.014 | 0.018 |
| 2.0 | 6 | NPC—Fisher | 0.012 | 0.009 | 0.010 |
| 2.0 | 6 | NPC—Truncated | 0.015 | 0.013 | 0.009 |
| 2.5 | 6 | Energy | 0.021 | 0.022 | 0.028 |
| 2.5 | 6 | NPC—Fisher | 0.012 | 0.009 | 0.014 |
| 2.5 | 6 | NPC—Truncated | 0.013 | 0.009 | 0.014 |
Table 5.
Descriptive statistics.
| Value | Diameter A | Diameter B | Diameter C |
|---|---|---|---|
| Average (cm) | 2.516 | 4.999 | 7.002 |
| Variance (cm²) | 1.04 × | 1.17 × | 2.03 × |
Table 6.
Global p-values.
| Energy | NPC—Fisher | NPC—Truncated |
|---|---|---|
| 4.99 × | 1.14 × | 2.79 × |
Table 7.
Partial p-values.
| Test Statistics | NPC—Fisher | NPC—Truncated |
|---|---|---|
|  | 0.014 | 0.019 |
|  | 0.095 | 0.147 |
|  | 0.014 | 0.019 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).