Article

Combined Permutation Tests for Pairwise Comparison of Scale Parameters Using Deviances

by Scott J. Richter 1,* and Melinda H. McCann 2

1 Department of Mathematics and Statistics, University of North Carolina at Greensboro, Greensboro, NC 27402, USA
2 Department of Statistics, Oklahoma State University, Stillwater, OK 74075, USA
* Author to whom correspondence should be addressed.
Stats 2024, 7(2), 350-360; https://doi.org/10.3390/stats7020021
Submission received: 27 February 2024 / Revised: 21 March 2024 / Accepted: 25 March 2024 / Published: 28 March 2024
(This article belongs to the Section Statistical Methods)

Abstract

Nonparametric combinations of permutation tests for pairwise comparison of scale parameters, based on deviances, are examined. Permutation tests for comparing two or more groups based on the ratio of deviances have been investigated, and a procedure based on Higgins’ RMD statistic was found to perform well, but two other tests were sometimes more powerful. Thus, combinations of these tests are investigated. A simulation study shows a combined test can be more powerful than any single test.

1. Introduction

Tests for homogeneity of scale are of interest in many areas of application, including industrial quality assurance, agricultural production and education [1]. Parametric tests for comparing scale (e.g., [2,3,4]) are generally not robust to nonnormality (see [5]). Consequently, more robust alternatives are of interest.
An approximate test using the ANOVA F-test on the absolute deviations from the mean was proposed [6]. Replacing the mean with the median, so that the F-test is applied to absolute deviations from the median (referred to as deviances in the remainder of this paper), was later suggested [7]; this is referred to as the W50 test. However, no uniformly best test for scale has been demonstrated in the literature. In fact, without more stringent distributional assumptions, the minimal sufficient statistic would generally be the n-dimensional vector of order statistics. Thus, no single statistic exists that summarizes the information contained in the data, and a uniformly best test statistic does not generally exist. In spite of this, the W50 test has been recommended as a computationally simple test showing good overall performance with respect to power and robustness to nonnormality in several comparative studies ([8,9,10]). More recently, a study [5] compared 25 omnibus tests for homogeneity of variance and recommended the W50 test as “superior”. A modification of Levene’s test [6] (referred to as OB) was proposed [11], which has been recommended over the W50 test for light-tailed distributions [12]. The W50 and OB tests, as well as permutation versions of these tests, were evaluated [5], and it was found that the permutation versions tended to be more robust and have higher power. The W50 test was recommended as a computationally simple robust test, as was the permutation version of the OB test for symmetric and lighter-tailed skewed distributions. Another test for scale utilizing deviances, based on the ratio of the mean deviances, was also proposed [13]. This test will be referred to as the RMD test. The RMD test was found [14] to be generally superior to W50 and OB, although there were still cases where each of W50 and OB had higher power. Since no test has been found to be uniformly superior, it is of interest to develop a test that combines these three tests.
A combined test of scale parameters based on the IQR was studied [15] and the combined test was found to be more powerful than its constituent tests in some scenarios. Similarly, we will investigate nonparametric combinations of the RMD, W50 and OB tests to determine if combining the tests can provide increased power compared to individual tests.

2. Methods for Comparing Scale Parameters

Consider a one-way layout with $t$ treatments and $n_i$ observations per treatment. We assume a location-scale model, $y_{ij} = \mu_i + \sigma_i \varepsilon_{ij}$, $i = 1, \ldots, t$, $j = 1, \ldots, n_i$, where $\mu_i$ and $\sigma_i$ are the location and scale parameters, respectively, of treatment $i$, and $\varepsilon_{ij}$ are independent and identically distributed with median 0. It is desired to test $H_0: \sigma_1 = \sigma_2 = \cdots = \sigma_t$ versus $H_a: \sigma_i \ne \sigma_j$ for some $i$ and $j$.

2.1. Brown–Forsythe (W50) Test

First, compute the deviances, $\tilde{z}_{ij} = |y_{ij} - \tilde{y}_i|$, where $\tilde{y}_i$ is the sample median of treatment $i$. The ANOVA F test is performed on these scores, and the p-value is based on the F distribution with $t - 1$ and $n - t$ degrees of freedom, where $n = \sum_i n_i$ is the total sample size [7].
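The W50 statistic can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' software; the function names `deviances`, `anova_f` and `w50_statistic` are hypothetical, and the sketch returns only the F statistic (no F-distribution p-value is computed).

```python
from statistics import mean, median

def deviances(group):
    """Absolute deviations of the observations from their own group median."""
    m = median(group)
    return [abs(y - m) for y in group]

def anova_f(score_groups):
    """One-way ANOVA F statistic computed on already-transformed scores."""
    t = len(score_groups)
    n = sum(len(g) for g in score_groups)
    grand_mean = mean([z for g in score_groups for z in g])
    group_means = [mean(g) for g in score_groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(score_groups, group_means))
    ss_within = sum((z - m) ** 2
                    for g, m in zip(score_groups, group_means) for z in g)
    return (ss_between / (t - 1)) / (ss_within / (n - t))

def w50_statistic(samples):
    """ANOVA F statistic applied to the deviances of each treatment."""
    return anova_f([deviances(g) for g in samples])

# Illustrative data only: the second group has twice the spread of the first.
samples = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]]
print(w50_statistic(samples))
```

The statistic is location-invariant within each group because each group is centered at its own median before the F statistic is formed.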

2.2. Higgins’ (RMD) Test

The statistic is defined as $RMD = \dfrac{\max(\bar{\tilde{z}}_i, \bar{\tilde{z}}_j)}{\min(\bar{\tilde{z}}_i, \bar{\tilde{z}}_j)}$, where $\bar{\tilde{z}}_i$ is the mean of the deviances, $\tilde{z}_{ij}$, for treatment $i$. The deviances $\tilde{z}_{ij} = |y_{ij} - \tilde{y}_i|$ are the same as those used by the W50 test. The permutation distribution of the RMD statistic was used to calculate a p-value [13].
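As an illustration, the RMD statistic for a pair of treatments might be computed as below. The function names are hypothetical, and the sketch assumes each group has a nonzero mean deviance (which holds for continuous data with at least two distinct values per group).

```python
from statistics import mean, median

def mean_deviance(group):
    """Mean absolute deviation of a group from its own median."""
    m = median(group)
    return mean([abs(y - m) for y in group])

def rmd(group_i, group_j):
    """Ratio of the larger mean deviance to the smaller one (always >= 1)."""
    a, b = mean_deviance(group_i), mean_deviance(group_j)
    return max(a, b) / min(a, b)

# Illustrative data only: the second group has twice the spread of the first.
print(rmd([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

Because the larger mean deviance is always placed in the numerator, the statistic does not depend on the order of the two groups, and large values indicate a difference in scale in either direction.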

2.3. O’Brien’s (OB) Test

First, compute the scores $r_{ij}(w) = \dfrac{(w + n_i - 2)\, n_i (y_{ij} - \bar{y}_i)^2 - w s_i^2 (n_i - 1)}{(n_i - 1)(n_i - 2)}$, where $0 \le w \le 1$, and $\bar{y}_i$ and $s_i^2$ are the sample mean and sample variance of treatment $i$. At one extreme, when $w = 0$, the score reduces to $r_{ij}(0) = \dfrac{n_i (y_{ij} - \bar{y}_i)^2}{n_i - 1}$, which is a slight modification of Levene’s test, which uses $\bar{z}_{ij}^2 = (y_{ij} - \bar{y}_i)^2$. At the other extreme, when $w = 1$, $r_{ij}(1) = q_{ij} = \dfrac{n_i (y_{ij} - \bar{y}_i)^2 - s_i^2}{n_i - 2} = n_i s_i^2 - (n_i - 1) s_{i(-j)}^2$, where $s_{i(-j)}^2$ is the sample variance of treatment $i$ with observation $j$ removed; $q_{ij}$ was referred to as a “jackknife pseudovalue of $s_i^2$” [11]. The ANOVA F test is performed on these scores, and the p-value is based on the F distribution with $t - 1$ and $n - t$ degrees of freedom. Tests based on $\bar{z}_{ij}^2$ have been shown to have inflated Type I error rates, while those based on $q_{ij}$ tend to have low power. Since $r_{ij}(w)$ is a weighted average of the two scores, it provides a way to balance the drawbacks of the two tests. A “utility” value of $w = 0.5$ was suggested for most situations [11], and this is the value employed in this study.
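A minimal sketch of the O’Brien scores follows, assuming at least three observations per treatment so that the denominator is nonzero. The function name `obrien_scores` is hypothetical. A convenient check, which follows directly from the formula, is that the scores of a treatment average to its sample variance for any value of $w$.

```python
from statistics import mean, variance

def obrien_scores(group, w=0.5):
    """O'Brien scores r_ij(w) for one treatment; w = 0.5 is the value used in this study."""
    n = len(group)
    if n < 3:
        raise ValueError("at least three observations are needed")
    ybar = mean(group)
    s2 = variance(group)  # sample variance with divisor n - 1
    return [((w + n - 2) * n * (y - ybar) ** 2 - w * s2 * (n - 1))
            / ((n - 1) * (n - 2)) for y in group]

# Illustrative data only.
group = [1, 2, 3, 4, 10]
scores = obrien_scores(group)
print(mean(scores), variance(group))  # the two values agree up to rounding
```

The same ANOVA F statistic used for the deviances can then be applied to these scores to obtain the OB statistic.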

2.4. Permutation Tests

While the permutation test using the RMD statistic was suggested [13], the W50 and OB tests described previously were proposed as approximate tests based on the F distribution. However, p-values for W50 and OB can also be calculated using permutation distributions. A simulation study [1] found for the two-treatment case that the permutation versions tended to be more robust and have greater power than the approximate tests. Thus, only the permutation versions of these tests will be considered when forming the combined tests. Test statistics will be computed for a large number of random reassignments of observations to treatments, and the p-value will be calculated as the proportion of values in the permutation distribution that are at least as extreme as the observed test statistic value.
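The permutation p-value described above can be sketched for the two-treatment case as follows. This is an illustration under stated assumptions, not the authors' code: the function names, the seed, and the choice to permute the raw observations are assumptions, and the observed statistic is counted as one member of the permutation distribution.

```python
import random
from statistics import mean, median

def rmd(a, b):
    """Ratio of mean deviances (larger over smaller) for two groups."""
    da = mean([abs(y - median(a)) for y in a])
    db = mean([abs(y - median(b)) for y in b])
    return max(da, db) / min(da, db)

def permutation_pvalue(x, y, stat, n_perm=1999, seed=0):
    """Proportion of permutation values at least as large as the observed value."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    observed = stat(x, y)
    values = [observed]  # the observed value is part of the distribution
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of observations to groups
        values.append(stat(pooled[:len(x)], pooled[len(x):]))
    return sum(v >= observed for v in values) / len(values)

# Illustrative data only: y has 20 times the scale of x.
x = [-0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.7]
y = [20 * v for v in x]
print(permutation_pvalue(x, y, rmd, n_perm=499))
```

Any two-sample statistic for which large values indicate unequal scales can be passed as `stat`, so the same function serves for permutation versions of W50 and OB.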

3. Combined Tests

A two-step approach to create a nonparametric combination of dependent tests was proposed [16] and described as follows:
  • Step 1. Analyze the data using the tests of interest, referred to as partial tests;
  • Step 2. Combine the partial tests to assess the global hypothesis.
Several different combining functions have been developed that satisfy the properties required for a suitable combining function [16]. Since the relative power of different combining functions can vary across conditions, we consider combined tests using three of the best-known combining functions: the Fisher, Liptak and Tippett combining functions [15].
Let $\lambda_i$ be the p-value associated with the $i$th test to be combined. Then, the test statistics for the Fisher, Liptak and Tippett functions are as follows:
  • The Fisher combining function is $T_F = -\sum_i \ln \lambda_i$;
  • The Liptak combining function is $T_L = \sum_i \Phi^{-1}(1 - \lambda_i)$;
  • The Tippett combining function is $T_T = \max_i (1 - \lambda_i)$.
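The three combining functions listed above translate directly into code. The sketch below uses only the standard library; the function names are hypothetical. One detail is an assumption of this sketch rather than something stated in the text: a permutation p-value can equal exactly 1, for which $\Phi^{-1}(0)$ is undefined, so the Liptak function here clips p-values to a small interior interval.

```python
import math
from statistics import NormalDist

def fisher(pvals):
    """Fisher combining function: large when the p-values are small."""
    return -sum(math.log(p) for p in pvals)

def liptak(pvals, eps=1e-10):
    """Liptak combining function; clipping to (eps, 1 - eps) is an assumption."""
    std_normal = NormalDist()
    return sum(std_normal.inv_cdf(1 - min(max(p, eps), 1 - eps)) for p in pvals)

def tippett(pvals):
    """Tippett combining function: driven by the smallest p-value."""
    return max(1 - p for p in pvals)

# Illustrative partial p-values only.
pvals = [0.01, 0.20, 0.60]
print(fisher(pvals), liptak(pvals), tippett(pvals))
```

All three are oriented so that larger values give stronger evidence against the global null hypothesis, which is what allows a common "at least as large" rule for the combined permutation p-value.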
The Tippett function tends to have the highest power when one or a few, but not all, of the constituent tests reject the null hypothesis; the Liptak function tends to have the highest power when all tests reject the null hypothesis; the power of the Fisher function will tend to lie between the other two, making it the more general option and thus probably the most popular [16]. The combined tests are carried out as follows [16].
  • Compute the observed test statistic value ($T_F$, $T_L$, $T_T$) according to the above definitions, using the permutation p-values of RMD, W50 and OB.
  • To compute the permutation test p-value associated with each combined statistic:
    • For the ith statistic in the permutation distributions constructed for RMD, W50 and OB, compute the ith partial p-value as the proportion of test statistic values at least as large as the ith statistic value.
    • Using the partial p-values for RMD, W50 and OB, use the respective combining function to compute a test statistic value ($T_F$, $T_L$, $T_T$) for each permutation. This results in a permutation distribution for each of the combined statistics.
    • For each combined test, the permutation p-value is then the proportion of values in the permutation distribution at least as large as the observed statistic value.
Note that all tests are based on the same set of randomly generated permutations.
Since the RMD, W50 and OB tests were each most powerful for at least some scenarios in past simulations (e.g., [5]), combinations of these three tests will be examined. In addition, since RMD and W50 were usually more powerful than OB, a combination of only RMD and W50 will also be considered. The p-values for each of the constituent tests in each combination will be estimated using the permutation distribution of the statistic. The power and Type I error rates of the Fisher, Liptak and Tippett combining functions will be estimated and compared, and these will also be compared to those of the individual tests.
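The combination procedure described in this section can be sketched for two treatments and two partial tests (RMD and W50) with the Fisher function. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the seed, the default number of permutations, and the use of raw observations in the shuffling step are all hypothetical choices. The key point it shows is that the partial statistics are computed on the same permutations, so their dependence is carried into the combined distribution.

```python
import math
import random
from bisect import bisect_left
from statistics import mean, median

def deviances(g):
    m = median(g)
    return [abs(y - m) for y in g]

def rmd(a, b):
    da, db = mean(deviances(a)), mean(deviances(b))
    return max(da, db) / min(da, db)

def w50(a, b):
    """Two-group ANOVA F statistic on the deviances."""
    za, zb = deviances(a), deviances(b)
    ma, mb, grand = mean(za), mean(zb), mean(za + zb)
    between = len(za) * (ma - grand) ** 2 + len(zb) * (mb - grand) ** 2
    within = sum((z - ma) ** 2 for z in za) + sum((z - mb) ** 2 for z in zb)
    return between / (within / (len(za) + len(zb) - 2))

def partial_pvalues(column):
    """For each value, the proportion of values in the column at least as large."""
    n = len(column)
    ordered = sorted(column)
    return [(n - bisect_left(ordered, v)) / n for v in column]

def fisher(pvals):
    return -sum(math.log(p) for p in pvals)

def combined_pvalue(x, y, stats=(rmd, w50), combine=fisher, n_perm=999, seed=0):
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    rows = [[s(x, y) for s in stats]]  # row 0 holds the observed statistics
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[:len(x)], pooled[len(x):]
        rows.append([s(a, b) for s in stats])  # same permutation for every test
    p_cols = [partial_pvalues([r[k] for r in rows]) for k in range(len(stats))]
    combined = [combine([col[i] for col in p_cols]) for i in range(len(rows))]
    return sum(c >= combined[0] for c in combined) / len(combined)

# Illustrative data only: y has 20 times the scale of x.
x = [-0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.7]
y = [20 * v for v in x]
print(combined_pvalue(x, y, n_perm=499))
```

Adding OB as a third partial test would only require passing a third statistic in `stats`; a different combining function can be supplied through `combine`, provided it handles partial p-values equal to 1.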

4. Strong Familywise Error Rate Control for Pairwise Comparisons

The familywise error rate (FWER) will be controlled using the restricted permutation method of Richter and McCann [17], which was proposed to provide strong control of the FWER for pairwise comparison of location parameters. This method will be extended to the present case of comparing scale parameters as follows. First, the two-sample test statistic for a given method will be calculated for each of the possible $t(t-1)/2$ pairs of treatments. Then, the maximum value of the test statistic across all pairs will be calculated. Next, observations will be reassigned at random to treatments within each pair of treatments, a test statistic calculated for each pair of treatments, and the maximum value determined. This will be repeated many times to build the permutation distribution of the maximum, and the p-value for comparing each pair of treatments will be calculated as the proportion of values in the permutation distribution that are at least as extreme as the observed value for that pair.
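One possible reading of this procedure is sketched below with RMD as the two-sample statistic. Several details are assumptions of the sketch and are not specified in the text: each pair is shuffled independently of the other pairs, raw observations are shuffled, the observed maximum is included in the distribution, and all function names are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, median

def rmd(a, b):
    da = mean([abs(y - median(a)) for y in a])
    db = mean([abs(y - median(b)) for y in b])
    return max(da, db) / min(da, db)

def pairwise_maxstat_pvalues(groups, stat=rmd, n_perm=999, seed=0):
    """Adjusted p-value for every pair, referred to the distribution of the maximum."""
    rng = random.Random(seed)
    pairs = list(combinations(range(len(groups)), 2))
    observed = {(i, j): stat(groups[i], groups[j]) for i, j in pairs}
    max_dist = [max(observed.values())]
    for _ in range(n_perm):
        perm_stats = []
        for i, j in pairs:
            # Restricted permutation: reassign observations only within this pair.
            pooled = list(groups[i]) + list(groups[j])
            rng.shuffle(pooled)
            ni = len(groups[i])
            perm_stats.append(stat(pooled[:ni], pooled[ni:]))
        max_dist.append(max(perm_stats))
    n = len(max_dist)
    return {pair: sum(m >= obs for m in max_dist) / n
            for pair, obs in observed.items()}

# Illustrative data only: the third group has 20 times the scale of the first.
g0 = [-0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.7]
g1 = [-0.65, -0.45, -0.35, -0.05, 0.05, 0.25, 0.55, 0.75]
g2 = [20 * v for v in g0]
print(pairwise_maxstat_pvalues([g0, g1, g2], n_perm=499))
```

Referring every pair's observed statistic to the distribution of the maximum is what produces a single-step adjustment: a pair is declared different only if its statistic is unusual relative to the largest statistic expected across all pairs.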

5. Simulation Study

5.1. Procedures Studied

A simulation study estimated and compared the familywise Type I error rate and “any-pair” power (probability of detecting at least one true difference) of the methods described in Sections 2 and 3:
  • RMD: Higgins’ RMD procedure.
  • W50: Brown and Forsythe’s W50 test.
  • OB: O’Brien’s method using means.
  • F3: Fisher’s combination test of RMD, W50 and OB.
  • F2: Fisher’s combination test of RMD and W50.
  • L3: Liptak’s combination test of RMD, W50 and OB.
  • L2: Liptak’s combination test of RMD and W50.
  • T3: Tippett’s combination test of RMD, W50 and OB.
  • T2: Tippett’s combination test of RMD and W50.

5.2. Sample Sizes and Differences in Scale Parameters

Both equal and unequal sample size settings were examined for five treatments. For equal sample size cases, $n_i = 10$ and $n_i = 30$ were used. For unequal sample size cases, settings of $(n_1, n_2, n_3, n_4, n_5) = (5, 5, 10, 15, 15)$ and $(10, 10, 20, 30, 30)$ were utilized. Maximum scale parameter ratios, $\sigma_{max}/\sigma_{min}$, ranging from 1 to 5 were examined, with different patterns of smaller ratios present. Settings of $(\sigma, 1, 1, 1, 1)$ and $(\sigma, (\sigma + 1)/2, 1, 1, 1)$ were used. The first setting we refer to as the “single extreme scale parameter” setting, while the second setting has an intermediate scale value midway between the minimum (1) and maximum ($\sigma$). The specific settings used for $(\sigma_1, \sigma_2, \sigma_3, \sigma_4, \sigma_5)$ were as follows:
  1. $(1, 1, 1, 1, 1)$;
  2. $(3, 1, 1, 1, 1)$;
  3. $(3, 2, 1, 1, 1)$;
  4. $(5, 1, 1, 1, 1)$;
  5. $(5, 3, 1, 1, 1)$.

5.3. Distributions

Several different g-and-h distributions [18] were used to simulate data from distributions with different characteristics. g-and-h distributions are monotonic functions of normal distributions and allow investigation of nonnormal distributions with specific characteristics. The g-and-h random variable is defined as $Y_{g,h}(Z) = \dfrac{\exp(gZ) - 1}{g} \exp\left(\dfrac{hZ^2}{2}\right)$, where $Z \sim N(0, 1)$. When g = h = 0 (taking the limit as g approaches 0), $Y_{g,h}(Z) \sim N(0, 1)$. Nonzero values of g increase the skewness and positive values of h increase the elongation (tail heaviness) of the distribution. Changing the values of g and h does not affect the location of the distribution. The following cases were considered, and representative plots are shown in Figure 1:
  • g = 0, h = 0—Normally distributed (symmetric, light tails);
  • g = 0, h = 0.4—Symmetric, moderately heavy tails;
  • g = 0, h = 0.8—Symmetric, very heavy tails;
  • g = 0.4, h = 0—Moderately skewed, light tails;
  • g = 0.8, h = 0—Heavily skewed, light tails;
  • g = 0.4, h = 0.4—Moderately skewed, moderately heavy tails;
  • g = 0.8, h = 0.4—Heavily skewed, moderately heavy tails.
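Data from the g-and-h cases listed above can be generated by transforming standard normal draws. The sketch below is illustrative only; the function names, the seed, and the way the location-scale model is applied to the transformed values are assumptions, not details taken from the study's software.

```python
import math
import random

def g_and_h(z, g=0.0, h=0.0):
    """Transform a standard normal value z into a g-and-h value."""
    skew_part = z if g == 0 else (math.exp(g * z) - 1) / g  # limit as g -> 0 is z
    return skew_part * math.exp(h * z * z / 2)

def simulate_group(n, g, h, scale=1.0, location=0.0, rng=None):
    """n observations from the location-scale model with g-and-h errors."""
    rng = rng or random.Random()
    return [location + scale * g_and_h(rng.gauss(0, 1), g, h) for _ in range(n)]

# Illustrative setting: scales (3, 2, 1, 1, 1), n_i = 10, g = 0.4, h = 0.4.
rng = random.Random(2024)
scales = (3, 2, 1, 1, 1)
data = [simulate_group(10, 0.4, 0.4, scale=s, rng=rng) for s in scales]
print([len(group) for group in data])
```

Because the transformation is monotone and maps 0 to 0, the median of the error distribution stays at 0 for every choice of g and h, which matches the median-zero assumption of the location-scale model.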
Type I error rate and power were estimated based on 1000 randomly selected data sets from each distribution, for each setting of sample sizes and scale parameter patterns. It has been suggested [19] that only 253 random permutations are necessary with 1000 random data sets if the goal of the simulation is to estimate the power of a test and only a “rough” estimate of the permutation p-value is required, while a random sample of at least 1600 permutations was recommended [20] to estimate the exact p-value for a permutation test. Since precise estimation of the permutation test p-values was considered important, a conservative sample of 1999 random permutations was utilized, and thus the permutation distribution for each test was based on 2000 values: the observed test statistic value plus 1999 values based on random permutations of the observed data.

6. Simulation Results

6.1. Familywise Type I Error

All tests were robust in the sense that estimated rates of Type I error were close to the nominal level of 0.05 (see Tables 1–6), with only one exceeding 0.075 (0.084 for RMD in the equal sample size $n_i = 30$ case, g = 0.8, h = 0.4). Note that in the tables, the first row of each distribution represents the equal scale case, and thus the value given is the estimated Type I error rate.
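When reading the estimated error rates, it can help to keep the Monte Carlo uncertainty in mind. The sketch below computes the binomial standard error of an estimated rejection rate based on 1000 simulated data sets, together with an approximate 95% interval around the nominal level. It assumes independent data sets and a normal approximation; it is offered as a reading aid and is not a criterion stated by the authors.

```python
import math

def mc_standard_error(rate, n_datasets):
    """Binomial standard error of an estimated rejection rate."""
    return math.sqrt(rate * (1 - rate) / n_datasets)

def mc_interval(rate, n_datasets, z=1.96):
    """Approximate 95% interval for the estimate when the true rate is `rate`."""
    half_width = z * mc_standard_error(rate, n_datasets)
    return rate - half_width, rate + half_width

se = mc_standard_error(0.05, 1000)
low, high = mc_interval(0.05, 1000)
print(round(se, 4), round(low, 4), round(high, 4))
```

With 1000 data sets the standard error at a true rate of 0.05 is a little under 0.007, so estimates within roughly 0.014 of 0.05 are consistent with sampling variation alone.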

6.2. Any-Pair Power

When sample sizes were equal (Table 1 and Table 2), RMD tended to have the highest power, although in some cases the Fisher or Liptak combined test was most powerful.
When sample sizes were small and unequal and the larger scales were associated with the smaller samples (Table 3), the F2 and L2 combined tests were most powerful for all scale configurations, with L2 usually having the higher power. The lone exception was when the distribution was symmetric with very heavy tails (g = 0, h = 0.8), where the RMD had similar power to F2 and L2. When the sample sizes increased to $n_i = 10, 10, 20, 30, 30$, however, the power advantage of the combined tests over RMD tended to diminish, except for the skewed, light-tailed distributions, where the combined tests were still more powerful (see Table 4).
Neither of the Tippett combined tests was as powerful as the Liptak and Fisher versions.
When the sample sizes were small and unequal but the larger scales were associated with the larger samples (Table 5 and Table 6), L2 and F2 had the highest power for normal and moderately skewed-only distributions. Meanwhile, RMD had the highest power for all distributions with heavy tails ($h = 0.4, 0.8$). As before, as sample sizes increased, the power advantages of the combined tests diminished while RMD maintained power advantages for heavier-tailed distributions.
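Table 1. Proportion of at least one rejection at $\alpha = 0.05$, five treatments, equal samples of size $n_i = 10$.

| Distribution | Scale $(\sigma_1, \sigma_2, \sigma_3, \sigma_4, \sigma_5)$ | W50 | OB | RMD | F3 | F2 | L3 | L2 | T3 | T2 |
|---|---|---|---|---|---|---|---|---|---|---|
| g = 0, h = 0 | (1, 1, 1, 1, 1) | 0.040 | 0.016 | 0.039 | 0.039 | 0.042 | 0.041 | 0.043 | 0.025 | 0.034 |
| | (3, 1, 1, 1, 1) | 0.669 | 0.620 | 0.742 | 0.708 | 0.726 | 0.708 | 0.727 | 0.689 | 0.710 |
| | (3, 2, 1, 1, 1) | 0.609 | 0.466 | 0.689 | 0.648 | 0.692 | 0.655 | 0.697 | 0.604 | 0.632 |
| | (5, 1, 1, 1, 1) | 0.911 | 0.850 | 0.965 | 0.935 | 0.954 | 0.933 | 0.950 | 0.944 | 0.957 |
| | (5, 3, 1, 1, 1) | 0.889 | 0.697 | 0.970 | 0.925 | 0.956 | 0.924 | 0.957 | 0.929 | 0.944 |
| g = 0, h = 0.4 | (1, 1, 1, 1, 1) | 0.010 | 0.000 | 0.065 | 0.020 | 0.034 | 0.014 | 0.031 | 0.030 | 0.043 |
| | (3, 1, 1, 1, 1) | 0.099 | 0.028 | 0.244 | 0.155 | 0.205 | 0.126 | 0.204 | 0.191 | 0.202 |
| | (3, 2, 1, 1, 1) | 0.084 | 0.016 | 0.274 | 0.141 | 0.212 | 0.112 | 0.205 | 0.197 | 0.225 |
| | (5, 1, 1, 1, 1) | 0.285 | 0.090 | 0.476 | 0.378 | 0.445 | 0.331 | 0.446 | 0.411 | 0.433 |
| | (5, 3, 1, 1, 1) | 0.192 | 0.046 | 0.495 | 0.332 | 0.434 | 0.247 | 0.421 | 0.409 | 0.433 |
| g = 0, h = 0.8 | (1, 1, 1, 1, 1) | 0.004 | 0.000 | 0.071 | 0.014 | 0.031 | 0.005 | 0.024 | 0.036 | 0.051 |
| | (3, 1, 1, 1, 1) | 0.017 | 0.003 | 0.140 | 0.057 | 0.098 | 0.032 | 0.084 | 0.092 | 0.110 |
| | (3, 2, 1, 1, 1) | 0.011 | 0.001 | 0.182 | 0.060 | 0.110 | 0.034 | 0.088 | 0.108 | 0.133 |
| | (5, 1, 1, 1, 1) | 0.053 | 0.011 | 0.237 | 0.130 | 0.192 | 0.079 | 0.174 | 0.189 | 0.208 |
| | (5, 3, 1, 1, 1) | 0.034 | 0.007 | 0.286 | 0.129 | 0.208 | 0.070 | 0.182 | 0.214 | 0.240 |
| g = 0.4, h = 0 | (1, 1, 1, 1, 1) | 0.042 | 0.020 | 0.049 | 0.037 | 0.053 | 0.036 | 0.056 | 0.031 | 0.037 |
| | (3, 1, 1, 1, 1) | 0.620 | 0.552 | 0.656 | 0.660 | 0.681 | 0.665 | 0.686 | 0.627 | 0.630 |
| | (3, 2, 1, 1, 1) | 0.549 | 0.402 | 0.637 | 0.597 | 0.630 | 0.599 | 0.638 | 0.553 | 0.585 |
| | (5, 1, 1, 1, 1) | 0.891 | 0.816 | 0.942 | 0.933 | 0.943 | 0.928 | 0.942 | 0.931 | 0.935 |
| | (5, 3, 1, 1, 1) | 0.842 | 0.614 | 0.931 | 0.900 | 0.928 | 0.889 | 0.928 | 0.889 | 0.906 |
| g = 0.4, h = 0.4 | (1, 1, 1, 1, 1) | 0.007 | 0.001 | 0.062 | 0.015 | 0.029 | 0.011 | 0.028 | 0.023 | 0.030 |
| | (3, 1, 1, 1, 1) | 0.090 | 0.028 | 0.240 | 0.146 | 0.204 | 0.118 | 0.200 | 0.184 | 0.200 |
| | (3, 2, 1, 1, 1) | 0.081 | 0.014 | 0.272 | 0.130 | 0.216 | 0.099 | 0.201 | 0.195 | 0.222 |
| | (5, 1, 1, 1, 1) | 0.251 | 0.090 | 0.442 | 0.343 | 0.418 | 0.301 | 0.415 | 0.384 | 0.406 |
| | (5, 3, 1, 1, 1) | 0.182 | 0.040 | 0.477 | 0.310 | 0.406 | 0.248 | 0.390 | 0.374 | 0.406 |
| g = 0.8, h = 0 | (1, 1, 1, 1, 1) | 0.034 | 0.014 | 0.058 | 0.042 | 0.047 | 0.036 | 0.052 | 0.033 | 0.042 |
| | (3, 1, 1, 1, 1) | 0.426 | 0.359 | 0.472 | 0.514 | 0.490 | 0.513 | 0.497 | 0.465 | 0.458 |
| | (3, 2, 1, 1, 1) | 0.346 | 0.236 | 0.468 | 0.432 | 0.451 | 0.430 | 0.459 | 0.413 | 0.413 |
| | (5, 1, 1, 1, 1) | 0.765 | 0.658 | 0.800 | 0.835 | 0.826 | 0.826 | 0.828 | 0.813 | 0.802 |
| | (5, 3, 1, 1, 1) | 0.640 | 0.426 | 0.801 | 0.780 | 0.791 | 0.754 | 0.799 | 0.755 | 0.762 |
| g = 0.8, h = 0.4 | (1, 1, 1, 1, 1) | 0.011 | 0.002 | 0.063 | 0.015 | 0.030 | 0.009 | 0.027 | 0.028 | 0.038 |
| | (3, 1, 1, 1, 1) | 0.074 | 0.025 | 0.195 | 0.138 | 0.168 | 0.104 | 0.164 | 0.154 | 0.171 |
| | (3, 2, 1, 1, 1) | 0.064 | 0.014 | 0.231 | 0.131 | 0.179 | 0.089 | 0.167 | 0.156 | 0.185 |
| | (5, 1, 1, 1, 1) | 0.186 | 0.065 | 0.384 | 0.294 | 0.348 | 0.242 | 0.346 | 0.313 | 0.338 |
| | (5, 3, 1, 1, 1) | 0.138 | 0.032 | 0.430 | 0.272 | 0.356 | 0.204 | 0.346 | 0.333 | 0.364 |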
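Table 2. Proportion of at least one rejection at $\alpha = 0.05$, five treatments, equal samples of size $n_i = 30$. Cases that were uninformative for comparing methods were omitted.

| Distribution | Scale $(\sigma_1, \sigma_2, \sigma_3, \sigma_4, \sigma_5)$ | W50 | OB | RMD | F3 | F2 | L3 | L2 | T3 | T2 |
|---|---|---|---|---|---|---|---|---|---|---|
| g = 0, h = 0 | (1, 1, 1, 1, 1) | 0.044 | 0.023 | 0.045 | 0.041 | 0.047 | 0.042 | 0.047 | 0.038 | 0.041 |
| g = 0, h = 0.4 | (1, 1, 1, 1, 1) | 0.007 | 0.001 | 0.060 | 0.014 | 0.030 | 0.012 | 0.028 | 0.030 | 0.034 |
| | (3, 1, 1, 1, 1) | 0.276 | 0.076 | 0.419 | 0.318 | 0.392 | 0.272 | 0.395 | 0.372 | 0.389 |
| | (3, 2, 1, 1, 1) | 0.225 | 0.050 | 0.450 | 0.309 | 0.402 | 0.242 | 0.399 | 0.384 | 0.401 |
| | (5, 1, 1, 1, 1) | 0.619 | 0.224 | 0.702 | 0.643 | 0.716 | 0.585 | 0.715 | 0.681 | 0.692 |
| | (5, 3, 1, 1, 1) | 0.461 | 0.134 | 0.730 | 0.601 | 0.706 | 0.512 | 0.704 | 0.684 | 0.698 |
| g = 0, h = 0.8 | (1, 1, 1, 1, 1) | 0.002 | 0.000 | 0.075 | 0.013 | 0.029 | 0.009 | 0.019 | 0.033 | 0.043 |
| | (3, 1, 1, 1, 1) | 0.026 | 0.007 | 0.155 | 0.074 | 0.112 | 0.053 | 0.097 | 0.116 | 0.132 |
| | (3, 2, 1, 1, 1) | 0.023 | 0.003 | 0.199 | 0.070 | 0.124 | 0.042 | 0.103 | 0.143 | 0.167 |
| | (5, 1, 1, 1, 1) | 0.078 | 0.015 | 0.278 | 0.183 | 0.241 | 0.124 | 0.228 | 0.241 | 0.252 |
| | (5, 3, 1, 1, 1) | 0.059 | 0.008 | 0.339 | 0.187 | 0.274 | 0.110 | 0.245 | 0.289 | 0.299 |
| g = 0.4, h = 0 | (1, 1, 1, 1, 1) | 0.029 | 0.020 | 0.049 | 0.037 | 0.044 | 0.034 | 0.044 | 0.033 | 0.040 |
| g = 0.4, h = 0.4 | (1, 1, 1, 1, 1) | 0.004 | 0.001 | 0.073 | 0.014 | 0.030 | 0.011 | 0.022 | 0.035 | 0.045 |
| | (3, 1, 1, 1, 1) | 0.222 | 0.068 | 0.380 | 0.280 | 0.349 | 0.228 | 0.346 | 0.332 | 0.352 |
| | (3, 2, 1, 1, 1) | 0.168 | 0.041 | 0.409 | 0.264 | 0.349 | 0.199 | 0.340 | 0.338 | 0.360 |
| | (5, 1, 1, 1, 1) | 0.533 | 0.174 | 0.635 | 0.561 | 0.660 | 0.506 | 0.664 | 0.613 | 0.628 |
| | (5, 3, 1, 1, 1) | 0.394 | 0.099 | 0.671 | 0.546 | 0.642 | 0.448 | 0.636 | 0.613 | 0.634 |
| g = 0.8, h = 0 | (1, 1, 1, 1, 1) | 0.026 | 0.016 | 0.053 | 0.034 | 0.038 | 0.034 | 0.039 | 0.039 | 0.043 |
| | (3, 1, 1, 1, 1) | 0.935 | 0.793 | 0.921 | 0.917 | 0.942 | 0.903 | 0.943 | 0.913 | 0.923 |
| | (3, 2, 1, 1, 1) | 0.861 | 0.584 | 0.912 | 0.865 | 0.919 | 0.852 | 0.923 | 0.862 | 0.892 |
| g = 0.8, h = 0.4 | (1, 1, 1, 1, 1) | 0.005 | 0.001 | 0.084 | 0.012 | 0.035 | 0.009 | 0.031 | 0.036 | 0.050 |
| | (3, 1, 1, 1, 1) | 0.138 | 0.046 | 0.303 | 0.215 | 0.265 | 0.170 | 0.263 | 0.256 | 0.276 |
| | (3, 2, 1, 1, 1) | 0.103 | 0.024 | 0.336 | 0.200 | 0.274 | 0.154 | 0.262 | 0.267 | 0.296 |
| | (5, 1, 1, 1, 1) | 0.371 | 0.118 | 0.526 | 0.456 | 0.514 | 0.392 | 0.517 | 0.490 | 0.507 |
| | (5, 3, 1, 1, 1) | 0.264 | 0.067 | 0.573 | 0.436 | 0.533 | 0.325 | 0.528 | 0.510 | 0.531 |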
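Table 3. Proportion of at least one rejection at $\alpha = 0.05$, five treatments, unequal samples of size $n_i = 5, 5, 10, 15, 15$, larger scale associated with smaller sample size. Cases that were uninformative for comparing methods were omitted.

| Distribution | Scales $(\sigma_1, \sigma_2, \sigma_3, \sigma_4, \sigma_5)$ | W50 | OB | RMD | F3 | F2 | L3 | L2 | T3 | T2 |
|---|---|---|---|---|---|---|---|---|---|---|
| g = 0, h = 0 | (1, 1, 1, 1, 1) | 0.046 | 0.006 | 0.022 | 0.025 | 0.038 | 0.027 | 0.038 | 0.017 | 0.024 |
| | (3, 1, 1, 1, 1) | 0.331 | 0.288 | 0.128 | 0.322 | 0.294 | 0.336 | 0.301 | 0.268 | 0.258 |
| | (3, 2, 1, 1, 1) | 0.320 | 0.222 | 0.048 | 0.302 | 0.275 | 0.317 | 0.297 | 0.235 | 0.232 |
| | (5, 1, 1, 1, 1) | 0.502 | 0.451 | 0.348 | 0.543 | 0.525 | 0.557 | 0.546 | 0.470 | 0.460 |
| | (5, 3, 1, 1, 1) | 0.507 | 0.330 | 0.244 | 0.512 | 0.541 | 0.546 | 0.575 | 0.413 | 0.416 |
| g = 0, h = 0.4 | (1, 1, 1, 1, 1) | 0.006 | 0.004 | 0.047 | 0.007 | 0.026 | 0.005 | 0.023 | 0.022 | 0.029 |
| | (3, 1, 1, 1, 1) | 0.053 | 0.024 | 0.060 | 0.052 | 0.075 | 0.056 | 0.077 | 0.048 | 0.053 |
| | (3, 2, 1, 1, 1) | 0.050 | 0.023 | 0.056 | 0.051 | 0.069 | 0.051 | 0.083 | 0.030 | 0.039 |
| | (5, 1, 1, 1, 1) | 0.120 | 0.053 | 0.132 | 0.144 | 0.186 | 0.140 | 0.195 | 0.109 | 0.124 |
| | (5, 3, 1, 1, 1) | 0.120 | 0.046 | 0.114 | 0.137 | 0.184 | 0.141 | 0.196 | 0.093 | 0.122 |
| g = 0, h = 0.8 | (1, 1, 1, 1, 1) | 0.003 | 0.003 | 0.068 | 0.006 | 0.023 | 0.006 | 0.017 | 0.026 | 0.037 |
| g = 0.8, h = 0 | (1, 1, 1, 1, 1) | 0.027 | 0.005 | 0.032 | 0.019 | 0.039 | 0.018 | 0.042 | 0.020 | 0.032 |
| | (3, 1, 1, 1, 1) | 0.217 | 0.181 | 0.090 | 0.220 | 0.210 | 0.222 | 0.217 | 0.181 | 0.177 |
| | (3, 2, 1, 1, 1) | 0.224 | 0.141 | 0.040 | 0.221 | 0.205 | 0.227 | 0.223 | 0.174 | 0.165 |
| | (5, 1, 1, 1, 1) | 0.409 | 0.318 | 0.216 | 0.431 | 0.415 | 0.430 | 0.433 | 0.365 | 0.370 |
| | (5, 3, 1, 1, 1) | 0.421 | 0.251 | 0.143 | 0.402 | 0.428 | 0.415 | 0.453 | 0.327 | 0.344 |
| g = 0.4, h = 0 | (1, 1, 1, 1, 1) | 0.036 | 0.009 | 0.024 | 0.026 | 0.042 | 0.027 | 0.044 | 0.018 | 0.028 |
| | (3, 1, 1, 1, 1) | 0.306 | 0.257 | 0.114 | 0.286 | 0.280 | 0.304 | 0.282 | 0.232 | 0.234 |
| | (3, 2, 1, 1, 1) | 0.304 | 0.196 | 0.044 | 0.274 | 0.270 | 0.291 | 0.287 | 0.220 | 0.221 |
| | (5, 1, 1, 1, 1) | 0.483 | 0.415 | 0.306 | 0.523 | 0.501 | 0.532 | 0.519 | 0.434 | 0.436 |
| | (5, 3, 1, 1, 1) | 0.421 | 0.277 | 0.180 | 0.423 | 0.427 | 0.440 | 0.461 | 0.345 | 0.353 |
| g = 0.4, h = 0.4 | (1, 1, 1, 1, 1) | 0.007 | 0.001 | 0.056 | 0.005 | 0.023 | 0.007 | 0.023 | 0.027 | 0.032 |
| | (3, 1, 1, 1, 1) | 0.046 | 0.022 | 0.061 | 0.058 | 0.065 | 0.051 | 0.074 | 0.047 | 0.051 |
| | (3, 2, 1, 1, 1) | 0.054 | 0.019 | 0.054 | 0.050 | 0.073 | 0.053 | 0.082 | 0.046 | 0.050 |
| | (5, 1, 1, 1, 1) | 0.116 | 0.054 | 0.127 | 0.132 | 0.169 | 0.123 | 0.180 | 0.108 | 0.124 |
| | (5, 3, 1, 1, 1) | 0.084 | 0.040 | 0.070 | 0.089 | 0.114 | 0.094 | 0.127 | 0.069 | 0.077 |
| g = 0.8, h = 0.4 | (1, 1, 1, 1, 1) | 0.006 | 0.003 | 0.064 | 0.004 | 0.027 | 0.004 | 0.020 | 0.033 | 0.035 |
| | (3, 1, 1, 1, 1) | 0.037 | 0.017 | 0.052 | 0.047 | 0.058 | 0.045 | 0.063 | 0.039 | 0.041 |
| | (3, 2, 1, 1, 1) | 0.045 | 0.020 | 0.053 | 0.051 | 0.073 | 0.049 | 0.074 | 0.045 | 0.057 |
| | (5, 1, 1, 1, 1) | 0.104 | 0.044 | 0.126 | 0.116 | 0.147 | 0.108 | 0.159 | 0.097 | 0.119 |
| | (5, 3, 1, 1, 1) | 0.097 | 0.040 | 0.104 | 0.113 | 0.146 | 0.118 | 0.156 | 0.097 | 0.107 |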
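Table 4. Proportion of at least one rejection at $\alpha = 0.05$, five treatments, unequal samples of size $n_i = 10, 10, 20, 30, 30$, larger scale associated with smaller sample size.

| Distribution | Scales $(\sigma_1, \sigma_2, \sigma_3, \sigma_4, \sigma_5)$ | W50 | OB | RMD | F3 | F2 | L3 | L2 | T3 | T2 |
|---|---|---|---|---|---|---|---|---|---|---|
| g = 0, h = 0 | (1, 1, 1, 1, 1) | 0.039 | 0.014 | 0.029 | 0.034 | 0.036 | 0.034 | 0.038 | 0.024 | 0.030 |
| | (3, 1, 1, 1, 1) | 0.813 | 0.777 | 0.771 | 0.830 | 0.816 | 0.834 | 0.823 | 0.770 | 0.769 |
| | (3, 2, 1, 1, 1) | 0.735 | 0.591 | 0.746 | 0.792 | 0.789 | 0.800 | 0.806 | 0.667 | 0.687 |
| | (5, 1, 1, 1, 1) | 0.965 | 0.920 | 0.977 | 0.974 | 0.984 | 0.975 | 0.984 | 0.963 | 0.967 |
| | (5, 3, 1, 1, 1) | 0.948 | 0.769 | 0.990 | 0.975 | 0.987 | 0.975 | 0.989 | 0.960 | 0.972 |
| g = 0, h = 0.4 | (1, 1, 1, 1, 1) | 0.005 | 0.002 | 0.049 | 0.010 | 0.019 | 0.008 | 0.017 | 0.022 | 0.030 |
| | (3, 1, 1, 1, 1) | 0.105 | 0.033 | 0.176 | 0.134 | 0.179 | 0.124 | 0.181 | 0.129 | 0.145 |
| | (3, 2, 1, 1, 1) | 0.063 | 0.026 | 0.181 | 0.106 | 0.158 | 0.092 | 0.176 | 0.113 | 0.142 |
| | (5, 1, 1, 1, 1) | 0.296 | 0.110 | 0.412 | 0.336 | 0.417 | 0.322 | 0.422 | 0.350 | 0.374 |
| | (5, 3, 1, 1, 1) | 0.184 | 0.062 | 0.410 | 0.289 | 0.399 | 0.263 | 0.412 | 0.316 | 0.346 |
| g = 0, h = 0.8 | (1, 1, 1, 1, 1) | 0.003 | 0.001 | 0.062 | 0.009 | 0.022 | 0.005 | 0.013 | 0.028 | 0.034 |
| | (3, 1, 1, 1, 1) | 0.016 | 0.008 | 0.064 | 0.029 | 0.055 | 0.023 | 0.054 | 0.036 | 0.044 |
| | (3, 2, 1, 1, 1) | 0.011 | 0.007 | 0.074 | 0.021 | 0.050 | 0.019 | 0.046 | 0.040 | 0.049 |
| | (5, 1, 1, 1, 1) | 0.041 | 0.012 | 0.122 | 0.080 | 0.114 | 0.069 | 0.117 | 0.089 | 0.106 |
| | (5, 3, 1, 1, 1) | 0.026 | 0.008 | 0.122 | 0.063 | 0.100 | 0.060 | 0.104 | 0.086 | 0.096 |
| g = 0.8, h = 0 | (1, 1, 1, 1, 1) | 0.030 | 0.009 | 0.035 | 0.030 | 0.039 | 0.031 | 0.041 | 0.025 | 0.030 |
| | (3, 1, 1, 1, 1) | 0.489 | 0.391 | 0.443 | 0.525 | 0.525 | 0.529 | 0.536 | 0.479 | 0.449 |
| | (3, 2, 1, 1, 1) | 0.378 | 0.258 | 0.389 | 0.445 | 0.451 | 0.446 | 0.472 | 0.371 | 0.366 |
| | (5, 1, 1, 1, 1) | 0.843 | 0.710 | 0.827 | 0.863 | 0.860 | 0.862 | 0.865 | 0.832 | 0.826 |
| | (5, 3, 1, 1, 1) | 0.680 | 0.477 | 0.803 | 0.795 | 0.833 | 0.786 | 0.840 | 0.726 | 0.758 |
| g = 0.4, h = 0 | (1, 1, 1, 1, 1) | 0.007 | 0.001 | 0.055 | 0.012 | 0.027 | 0.013 | 0.020 | 0.026 | 0.032 |
| | (3, 1, 1, 1, 1) | 0.732 | 0.687 | 0.682 | 0.763 | 0.742 | 0.769 | 0.753 | 0.714 | 0.686 |
| | (3, 2, 1, 1, 1) | 0.633 | 0.484 | 0.630 | 0.679 | 0.699 | 0.690 | 0.719 | 0.584 | 0.590 |
| | (5, 1, 1, 1, 1) | 0.948 | 0.887 | 0.955 | 0.961 | 0.959 | 0.963 | 0.965 | 0.942 | 0.946 |
| | (5, 3, 1, 1, 1) | 0.886 | 0.686 | 0.956 | 0.936 | 0.960 | 0.936 | 0.965 | 0.918 | 0.928 |
| g = 0.4, h = 0.4 | (1, 1, 1, 1, 1) | 0.019 | 0.007 | 0.045 | 0.021 | 0.035 | 0.020 | 0.038 | 0.023 | 0.026 |
| | (3, 1, 1, 1, 1) | 0.080 | 0.039 | 0.161 | 0.117 | 0.159 | 0.111 | 0.161 | 0.120 | 0.137 |
| | (3, 2, 1, 1, 1) | 0.056 | 0.023 | 0.166 | 0.099 | 0.146 | 0.089 | 0.155 | 0.101 | 0.130 |
| | (5, 1, 1, 1, 1) | 0.265 | 0.097 | 0.375 | 0.316 | 0.383 | 0.293 | 0.384 | 0.307 | 0.339 |
| | (5, 3, 1, 1, 1) | 0.169 | 0.059 | 0.382 | 0.277 | 0.371 | 0.259 | 0.375 | 0.285 | 0.315 |
| g = 0.8, h = 0.4 | (1, 1, 1, 1, 1) | 0.006 | 0.001 | 0.064 | 0.011 | 0.027 | 0.010 | 0.024 | 0.029 | 0.031 |
| | (3, 1, 1, 1, 1) | 0.060 | 0.025 | 0.128 | 0.095 | 0.116 | 0.087 | 0.120 | 0.091 | 0.103 |
| | (3, 2, 1, 1, 1) | 0.043 | 0.022 | 0.140 | 0.075 | 0.118 | 0.075 | 0.122 | 0.084 | 0.097 |
| | (5, 1, 1, 1, 1) | 0.189 | 0.063 | 0.287 | 0.242 | 0.296 | 0.232 | 0.300 | 0.238 | 0.255 |
| | (5, 3, 1, 1, 1) | 0.113 | 0.045 | 0.299 | 0.213 | 0.274 | 0.195 | 0.280 | 0.221 | 0.248 |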
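Table 5. Proportion of at least one rejection at $\alpha = 0.05$, five treatments, unequal samples of size $n_i = 15, 15, 10, 5, 5$, larger scale associated with larger sample size.

| Distribution | Scales $(\sigma_1, \sigma_2, \sigma_3, \sigma_4, \sigma_5)$ | W50 | OB | RMD | F3 | F2 | L3 | L2 | T3 | T2 |
|---|---|---|---|---|---|---|---|---|---|---|
| g = 0, h = 0 | (1, 1, 1, 1, 1) | 0.046 | 0.006 | 0.022 | 0.025 | 0.038 | 0.027 | 0.038 | 0.017 | 0.024 |
| | (3, 1, 1, 1, 1) | 0.539 | 0.008 | 0.493 | 0.518 | 0.614 | 0.544 | 0.655 | 0.413 | 0.469 |
| | (3, 2, 1, 1, 1) | 0.511 | 0.003 | 0.461 | 0.468 | 0.589 | 0.504 | 0.614 | 0.383 | 0.445 |
| | (5, 1, 1, 1, 1) | 0.781 | 0.016 | 0.915 | 0.854 | 0.929 | 0.861 | 0.940 | 0.815 | 0.850 |
| | (5, 3, 1, 1, 1) | 0.764 | 0.011 | 0.888 | 0.818 | 0.901 | 0.822 | 0.916 | 0.780 | 0.819 |
| g = 0, h = 0.4 | (1, 1, 1, 1, 1) | 0.006 | 0.004 | 0.047 | 0.007 | 0.026 | 0.005 | 0.023 | 0.022 | 0.029 |
| | (3, 1, 1, 1, 1) | 0.014 | 0.000 | 0.215 | 0.049 | 0.131 | 0.029 | 0.116 | 0.130 | 0.147 |
| | (3, 2, 1, 1, 1) | 0.008 | 0.000 | 0.255 | 0.032 | 0.128 | 0.022 | 0.097 | 0.145 | 0.167 |
| | (5, 1, 1, 1, 1) | 0.038 | 0.000 | 0.455 | 0.153 | 0.333 | 0.091 | 0.284 | 0.327 | 0.357 |
| | (5, 3, 1, 1, 1) | 0.019 | 0.000 | 0.490 | 0.127 | 0.316 | 0.054 | 0.240 | 0.343 | 0.374 |
| g = 0, h = 0.8 | (1, 1, 1, 1, 1) | 0.003 | 0.003 | 0.068 | 0.006 | 0.023 | 0.006 | 0.017 | 0.026 | 0.037 |
| | (3, 1, 1, 1, 1) | 0.000 | 0.000 | 0.127 | 0.110 | 0.058 | 0.003 | 0.025 | 0.079 | 0.092 |
| | (3, 2, 1, 1, 1) | 0.000 | 0.002 | 0.172 | 0.012 | 0.012 | 0.070 | 0.002 | 0.027 | 0.105 |
| | (5, 1, 1, 1, 1) | 0.002 | 0.000 | 0.238 | 0.052 | 0.148 | 0.009 | 0.078 | 0.170 | 0.190 |
| | (5, 3, 1, 1, 1) | 0.000 | 0.001 | 0.314 | 0.045 | 0.163 | 0.004 | 0.058 | 0.212 | 0.241 |
| g = 0.8, h = 0 | (1, 1, 1, 1, 1) | 0.027 | 0.005 | 0.032 | 0.019 | 0.039 | 0.018 | 0.042 | 0.020 | 0.032 |
| | (3, 1, 1, 1, 1) | 0.213 | 0.004 | 0.364 | 0.250 | 0.382 | 0.231 | 0.403 | 0.269 | 0.313 |
| | (3, 2, 1, 1, 1) | 0.161 | 0.000 | 0.375 | 0.217 | 0.361 | 0.188 | 0.366 | 0.269 | 0.306 |
| | (5, 1, 1, 1, 1) | 0.370 | 0.013 | 0.690 | 0.563 | 0.728 | 0.505 | 0.739 | 0.616 | 0.664 |
| | (5, 3, 1, 1, 1) | 0.281 | 0.004 | 0.691 | 0.486 | 0.682 | 0.392 | 0.684 | 0.582 | 0.627 |
| g = 0.4, h = 0 | (1, 1, 1, 1, 1) | 0.036 | 0.009 | 0.024 | 0.026 | 0.042 | 0.027 | 0.044 | 0.018 | 0.028 |
| | (3, 1, 1, 1, 1) | 0.442 | 0.006 | 0.450 | 0.440 | 0.566 | 0.442 | 0.595 | 0.395 | 0.449 |
| | (3, 2, 1, 1, 1) | 0.380 | 0.004 | 0.436 | 0.378 | 0.521 | 0.373 | 0.544 | 0.352 | 0.409 |
| | (5, 1, 1, 1, 1) | 0.652 | 0.015 | 0.853 | 0.774 | 0.899 | 0.757 | 0.909 | 0.755 | 0.796 |
| | (5, 3, 1, 1, 1) | 0.605 | 0.006 | 0.832 | 0.714 | 0.856 | 0.680 | 0.867 | 0.714 | 0.759 |
| g = 0.4, h = 0.4 | (1, 1, 1, 1, 1) | 0.007 | 0.001 | 0.056 | 0.005 | 0.023 | 0.007 | 0.023 | 0.027 | 0.032 |
| | (3, 1, 1, 1, 1) | 0.007 | 0.000 | 0.214 | 0.050 | 0.134 | 0.031 | 0.101 | 0.138 | 0.156 |
| | (3, 2, 1, 1, 1) | 0.007 | 0.001 | 0.249 | 0.035 | 0.136 | 0.019 | 0.098 | 0.148 | 0.180 |
| | (5, 1, 1, 1, 1) | 0.030 | 0.000 | 0.430 | 0.158 | 0.314 | 0.090 | 0.275 | 0.315 | 0.337 |
| | (5, 3, 1, 1, 1) | 0.014 | 0.000 | 0.479 | 0.133 | 0.303 | 0.056 | 0.231 | 0.331 | 0.366 |
| g = 0.8, h = 0.4 | (1, 1, 1, 1, 1) | 0.006 | 0.003 | 0.064 | 0.004 | 0.027 | 0.004 | 0.020 | 0.033 | 0.035 |
| | (3, 1, 1, 1, 1) | 0.006 | 0.000 | 0.211 | 0.036 | 0.125 | 0.018 | 0.089 | 0.133 | 0.149 |
| | (3, 2, 1, 1, 1) | 0.003 | 0.001 | 0.246 | 0.031 | 0.131 | 0.016 | 0.074 | 0.156 | 0.179 |
| | (5, 1, 1, 1, 1) | 0.017 | 0.000 | 0.377 | 0.137 | 0.283 | 0.057 | 0.236 | 0.301 | 0.302 |
| | (5, 3, 1, 1, 1) | 0.010 | 0.001 | 0.437 | 0.119 | 0.288 | 0.035 | 0.197 | 0.330 | 0.362 |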
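Table 6. Proportion of at least one rejection at $\alpha = 0.05$, five treatments, unequal samples of size $n_i = 30, 30, 20, 10, 10$, larger scale associated with larger sample size. Cases that were uninformative for comparing methods were omitted.

| Distribution | Scales $(\sigma_1, \sigma_2, \sigma_3, \sigma_4, \sigma_5)$ | W50 | OB | RMD | F3 | F2 | L3 | L2 | T3 | T2 |
|---|---|---|---|---|---|---|---|---|---|---|
| g = 0, h = 0 | (1, 1, 1, 1, 1) | 0.039 | 0.014 | 0.029 | 0.034 | 0.036 | 0.034 | 0.038 | 0.024 | 0.030 |
| | (3, 1, 1, 1, 1) | 0.959 | 0.361 | 0.977 | 0.968 | 0.983 | 0.971 | 0.983 | 0.946 | 0.961 |
| | (3, 2, 1, 1, 1) | 0.936 | 0.319 | 0.960 | 0.944 | 0.970 | 0.949 | 0.971 | 0.912 | 0.934 |
| g = 0, h = 0.4 | (1, 1, 1, 1, 1) | 0.005 | 0.002 | 0.049 | 0.010 | 0.019 | 0.008 | 0.017 | 0.022 | 0.030 |
| | (3, 1, 1, 1, 1) | 0.019 | 0.000 | 0.332 | 0.123 | 0.239 | 0.066 | 0.204 | 0.243 | 0.268 |
| | (3, 2, 1, 1, 1) | 0.011 | 0.000 | 0.355 | 0.107 | 0.231 | 0.048 | 0.175 | 0.256 | 0.285 |
| | (5, 1, 1, 1, 1) | 0.057 | 0.000 | 0.620 | 0.381 | 0.569 | 0.205 | 0.515 | 0.560 | 0.570 |
| | (5, 3, 1, 1, 1) | 0.023 | 0.000 | 0.65 | 0.305 | 0.539 | 0.122 | 0.424 | 0.577 | 0.602 |
| g = 0, h = 0.8 | (1, 1, 1, 1, 1) | 0.003 | 0.001 | 0.062 | 0.009 | 0.022 | 0.005 | 0.013 | 0.028 | 0.034 |
| | (3, 1, 1, 1, 1) | 0.000 | 0.000 | 0.156 | 0.024 | 0.089 | 0.004 | 0.042 | 0.113 | 0.122 |
| | (3, 2, 1, 1, 1) | 0.000 | 0.000 | 0.210 | 0.029 | 0.104 | 0.006 | 0.038 | 0.143 | 0.163 |
| | (5, 1, 1, 1, 1) | 0.000 | 0.000 | 0.290 | 0.095 | 0.199 | 0.013 | 0.119 | 0.232 | 0.257 |
| | (5, 3, 1, 1, 1) | 0.000 | 0.000 | 0.369 | 0.092 | 0.230 | 0.009 | 0.116 | 0.287 | 0.314 |
| g = 0.8, h = 0 | (1, 1, 1, 1, 1) | 0.030 | 0.009 | 0.035 | 0.030 | 0.039 | 0.031 | 0.041 | 0.025 | 0.030 |
| | (3, 1, 1, 1, 1) | 0.372 | 0.034 | 0.662 | 0.550 | 0.677 | 0.496 | 0.695 | 0.566 | 0.617 |
| | (3, 2, 1, 1, 1) | 0.271 | 0.007 | 0.651 | 0.458 | 0.610 | 0.382 | 0.622 | 0.512 | 0.558 |
| | (5, 1, 1, 1, 1) | 0.607 | 0.065 | 0.952 | 0.917 | 0.970 | 0.820 | 0.970 | 0.924 | 0.936 |
| | (5, 3, 1, 1, 1) | 0.484 | 0.020 | 0.944 | 0.857 | 0.941 | 0.716 | 0.946 | 0.905 | 0.921 |
| g = 0.4, h = 0 | (1, 1, 1, 1, 1) | 0.007 | 0.001 | 0.055 | 0.012 | 0.027 | 0.013 | 0.020 | 0.026 | 0.032 |
| | (3, 1, 1, 1, 1) | 0.807 | 0.170 | 0.921 | 0.889 | 0.942 | 0.876 | 0.951 | 0.864 | 0.889 |
| | (3, 2, 1, 1, 1) | 0.748 | 0.111 | 0.892 | 0.825 | 0.903 | 0.805 | 0.911 | 0.811 | 0.844 |
| | (5, 1, 1, 1, 1) | 0.947 | 0.242 | 0.999 | 0.997 | 0.999 | 0.989 | 0.999 | 0.996 | 0.998 |
| | (5, 3, 1, 1, 1) | 0.936 | 0.189 | 0.998 | 0.991 | 0.999 | 0.975 | 0.999 | 0.993 | 0.993 |
| g = 0.4, h = 0.4 | (1, 1, 1, 1, 1) | 0.019 | 0.007 | 0.045 | 0.021 | 0.035 | 0.020 | 0.038 | 0.023 | 0.026 |
| | (3, 1, 1, 1, 1) | 0.009 | 0.000 | 0.316 | 0.098 | 0.216 | 0.040 | 0.180 | 0.228 | 0.252 |
| | (3, 2, 1, 1, 1) | 0.005 | 0.000 | 0.349 | 0.086 | 0.217 | 0.030 | 0.171 | 0.240 | 0.266 |
| | (5, 1, 1, 1, 1) | 0.043 | 0.000 | 0.582 | 0.346 | 0.528 | 0.180 | 0.461 | 0.507 | 0.529 |
| | (5, 3, 1, 1, 1) | 0.015 | 0.000 | 0.627 | 0.299 | 0.511 | 0.106 | 0.396 | 0.550 | 0.575 |
| g = 0.8, h = 0.4 | (1, 1, 1, 1, 1) | 0.006 | 0.001 | 0.064 | 0.011 | 0.027 | 0.010 | 0.024 | 0.029 | 0.031 |
| | (3, 1, 1, 1, 1) | 0.002 | 0.000 | 0.259 | 0.071 | 0.164 | 0.023 | 0.128 | 0.188 | 0.208 |
| | (3, 2, 1, 1, 1) | 0.001 | 0.000 | 0.301 | 0.067 | 0.167 | 0.016 | 0.112 | 0.208 | 0.235 |
| | (5, 1, 1, 1, 1) | 0.016 | 0.000 | 0.503 | 0.277 | 0.438 | 0.115 | 0.360 | 0.434 | 0.455 |
| | (5, 3, 1, 1, 1) | 0.005 | 0.000 | 0.569 | 0.235 | 0.432 | 0.064 | 0.310 | 0.488 | 0.518 |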

7. Discussion

In this paper, we studied the performance of nonparametric combined tests for multiple comparisons of scale parameters. The RMD test had been shown in previous studies to be the preferred test to compare scale parameters, although it was not always the most powerful test, as the W50 and OB tests were able to outperform RMD in some situations. We found that combinations of two or more of these tests could be more powerful than any individual test. Distribution and sample size configurations for which W50 and/or OB were more powerful than RMD tended to be the cases where a combined test was found to be most powerful. Combined tests tended to outperform RMD for skewed, lighter-tailed distributions, while RMD tended to be more powerful when distributions were heavier-tailed, since in these scenarios, RMD enjoyed large power advantages over W50 and OB. Combined tests involving OB never showed an advantage over combinations of only RMD and W50.
As with any simulation study, generalization of results requires caution. These results may not extend to situations where the true scales are very different than those studied here, and/or where the data do not come from the distributions studied here. In addition, the conclusions rely on the assumption of a location-scale model, and thus may not be valid if that assumption is not plausible.

Author Contributions

Conceptualization, S.J.R. and M.H.M.; methodology, S.J.R. and M.H.M.; software, S.J.R.; validation, S.J.R.; formal analysis, S.J.R. and M.H.M.; investigation, S.J.R.; resources, S.J.R.; data curation, S.J.R.; writing—original draft preparation, S.J.R.; writing—review and editing, S.J.R. and M.H.M.; visualization, S.J.R.; project administration, S.J.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Marozzi, M. Levene type tests for the ratio of two scales. J. Stat. Comput. Simul. 2011, 81, 815–826.
  2. Bartlett, M.S. Properties of sufficiency and statistical tests. Proc. R. Soc. Lond. A 1937, 268–282.
  3. Cochran, W.G. Problems arising in the analysis of a series of similar experiments. J. R. Stat. Soc. 1937, 4, 102–118.
  4. Hartley, H.O. The use of range in analysis of variance. Biometrika 1950, 37, 271–280.
  5. Sharma, D.; Kibria, B.M. On some test statistics for testing homogeneity of variances: A comparative study. J. Stat. Comput. Simul. 2013, 83, 1944–1963.
  6. Levene, H. Robust tests for equality of variances. In Contributions to Probability and Statistics; Olkin, I., Hotelling, H., Eds.; Stanford University Press: Palo Alto, CA, USA, 1960; pp. 278–292.
  7. Brown, M.B.; Forsythe, A.B. Robust tests for the equality of variances. J. Am. Stat. Assoc. 1974, 69, 364–367.
  8. Keselman, H.J.; Games, P.A.; Clinch, J.J. Tests for homogeneity of variance. Commun. Stat. Simul. Comput. 1979, 8, 113–119.
  9. Conover, W.J.; Johnson, M.E.; Johnson, M.M. A comparative study of tests for homogeneity of variances, with applications to the outer continental shelf bidding data. Technometrics 1981, 23, 351–361.
  10. Balakrishnan, N.; Ma, C.W. A comparative study of various tests for the equality of two population variances. J. Stat. Comput. Simul. 1990, 35, 41–89.
  11. O’Brien, R.G. A general ANOVA method for robust tests of additive models for variances. J. Am. Stat. Assoc. 1979, 74, 877–880.
  12. Olejnik, S.F.; Algina, J. Tests of variance equality when distributions differ in form and location. Educ. Psychol. Meas. 1988, 48, 317–329.
  13. Higgins, J.J. Introduction to Modern Nonparametric Statistics; Duxbury: Pacific Grove, CA, USA, 2004.
  14. Richter, S.J.; McCann, M.H. Permutation tests of scale using deviances. Commun. Stat. Simul. Comput. 2017, 46, 5553–5565.
  15. Marozzi, M. A modified Hall-Padmanabhan test for the homogeneity of scales. Commun. Stat. Theory Methods 2012, 41, 3068–3078.
  16. Pesarin, F.; Salmaso, L. Permutation Tests for Complex Data; Wiley: Chichester, UK, 2001.
  17. Richter, S.J.; McCann, M.H. Multiple Comparison of Medians Using Permutation Tests. J. Mod. Appl. Stat. Methods 2007, 6, 399–412.
  18. Hoaglin, D.C. Summarizing shape numerically: The g-and-h distributions. In Exploring Data Tables, Trends, and Shapes; Hoaglin, D.C., Mosteller, F., Tukey, J.W., Eds.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 1985.
  19. Marozzi, M. Multivariate tests based on interpoint distances with application to magnetic resonance imaging. Stat. Methods Med. Res. 2016, 25, 2593–2610.
  20. Keller-McNulty, S.; Higgins, J.J. Effect of tail weight and outliers on power and Type-I error of robust permutation tests for location. Commun. Stat. Simul. Comput. 1987, 16, 17–36.
Figure 1. Example boxplots of the simulated distributions. Note that the “Value” axis has been truncated to omit extreme values from distributions 3 and 7.