Article

Estimation and Testing of Wilcoxon–Mann–Whitney Effects in Factorial Clustered Data Designs

by Kerstin Rubarth 1,2, Paavo Sattler 3, Hanna Gwendolyn Zimmermann 4 and Frank Konietschke 1,2,*
1
Institute of Biometry and Clinical Epidemiology, Charité—Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Charitéplatz 1, 10117 Berlin, Germany
2
Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Straße 2, 10178 Berlin, Germany
3
Department of Statistics, TU Dortmund University, TU Dortmund, 44221 Dortmund, Germany
4
Experimental and Clinical Research Center, Charité—Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Charitéplatz 1, 10117 Berlin, Germany
*
Author to whom correspondence should be addressed.
Symmetry 2022, 14(2), 244; https://doi.org/10.3390/sym14020244
Submission received: 12 November 2021 / Revised: 7 December 2021 / Accepted: 16 December 2021 / Published: 26 January 2022
(This article belongs to the Section Life Sciences)

Abstract

Clustered data arise frequently in many practical applications whenever units are repeatedly observed under a certain condition. A typical example is animal experiments, in which several animals share the same cage and should not be assumed to be completely independent. Standard methods for the analysis of such data are Linear Mixed Models and Generalized Estimating Equations; however, checking their assumptions is not easy, especially in scenarios with small sample sizes or with highly skewed, count, ordinal, or binary data. In such situations, Wilcoxon–Mann–Whitney type effects are suitable alternatives to mean-based or other distributional approaches because they require no specific data distribution, symmetric or asymmetric. In this work, we present different estimation techniques for such effects in clustered factorial designs and discuss quadratic and multiple-contrast-type testing procedures for hypotheses formulated in terms of Wilcoxon–Mann–Whitney effects. Additionally, the framework allows for missing data: estimation and hypothesis testing are based on all available data instead of complete cases. An extensive simulation study investigates the precision of the estimators and the behavior of the test procedures in terms of their type-I error control. A real-world dataset exemplifies the applicability of the newly proposed procedures.

1. Introduction

Clustered data are commonly encountered in medical research and various other disciplines and occur whenever a subject is observed not only once under a certain condition, but multiple times. Animals sharing the same cage, students in classes, and skin irritations are examples of clustered data. In these situations, subjects provide multiple, possibly dependent observations, and the clusters are not necessarily equally sized. Standard methods for the analysis of independent observations (e.g., t-test, linear regression, Analysis of Variance (ANOVA)) are not applicable in such scenarios. Ignoring the cluster structure might result in inflated type-I error rates and estimation bias, and therefore appropriate models are necessary for making inference. Reducing the clusters to a single point by, e.g., computing their mean or median, typically results in a loss of power and decreased precision of point estimates [1,2,3]. Furthermore, estimation of treatment effects becomes an issue because of the presence of intra-cluster correlations and unequally sized clusters, see, e.g., Gao [4]. Under certain assumptions such as multivariate normality and linear relationships, Linear Mixed Models and Generalized Estimating Equations can be used. However, testing these assumptions is difficult in practice, as noted by Fitzmaurice et al. [5] and Johnson and Wichern [6], especially when dealing with small sample sizes. Further, if count, ordinal, or highly skewed data are present, mean-based approaches are not applicable and thus another type of measure is needed. In contrast, Wilcoxon–Mann–Whitney-type effects $p = P(X < Y) + \tfrac{1}{2} P(X = Y)$ [7] are purely non-parametric quantities, which can be used for the definition of a treatment effect for metric, discrete, ordinal, and even dichotomous data in a unified way. Thus, the response variable is not assumed to be symmetrically distributed.
Here, $X$ and $Y$ represent two independent random variables coming from different populations. In the literature, $p$ is also called relative effect [8] or probabilistic index [9,10], see Brunner et al. [11] for an overview. It is the aim of this paper to discuss different estimation techniques of such effects in factorial repeated measures designs with a clustered structure. We distinguish between non-informative and informative cluster sizes by presenting weighted and unweighted estimators. Here, informative cluster sizes mean that the cluster sizes might be related to the outcome. In addition to estimation, we introduce different statistical inference methods for testing hypotheses formulated in terms of these effects. All methods allow for the occurrence of missing data and take all available data into consideration, which is a novelty since previous methods by Akritas et al. [12], Fong et al. [13], Domhof et al. [14] and Amro et al. [15] can only be used for testing hypotheses in terms of distribution functions in scenarios with missing data and do not allow for clustered data. Akritas and Brunner [16] and Brunner et al. [17] propose ranking procedures for testing hypotheses formulated in terms of distribution functions in clustered data designs, see Brunner et al. [18] for an excellent overview. Thus, the aim of this work is to provide a framework for estimating Wilcoxon–Mann–Whitney-type effects, as well as to present test procedures for hypotheses formulated in terms of these effects in factorial repeated measures designs with clustered data. The paper is organized as follows. First, a real world example is introduced in Section 2 that motivates the development of the methods. Next, in Section 3 the factorial repeated measures model with a clustered data structure is introduced. Subsequently, point estimators and their asymptotic distributions are derived in Section 4 and Section 5.
Further, test procedures and multiple hypotheses in this framework are presented in Section 6 and Section 7. The results of extensive simulation studies are presented in Section 8. Finally, the motivating example is analyzed by using the newly proposed methods in Section 9, and a discussion of the findings and a conclusion are given in Section 10. All proofs can be found in Appendix A.

2. Motivating Example

In order to motivate the development of methods for factorial repeated measures data with a clustered structure, we consider the secondary/exploratory outcome analysis of retinal thickness in the ’Sunphenon in progressive forms of multiple sclerosis’ (SUPREMES) trial by Klumbies et al. [19], which was a relatively small clinical trial in progressive multiple sclerosis (MS) patients. MS is the most common autoimmune disorder of the central nervous system, affecting approximately 2.8 million people worldwide [20]. In MS, immune cells attack the myelin sheaths of nerve fibers, often resulting in neurodegeneration and thus permanent disability. In most cases, the disease starts with a relapsing–remitting course, followed by a progressive course around 15–20 years after diagnosis. Around 15% of patients feature a progressive course from onset of the disease [21]. MS is associated with visual impairment caused by optic nerve and posterior visual pathway damage, which can be quantified with optical coherence tomography based thickness measurements of the retinal nerve fiber layer, the combined ganglion cell, and inner plexiform layer (GCIP) and inner nuclear layer (INL) [22]. Retinal thickness analysis has been suggested as an outcome parameter in MS clinical trials [23]. In animal models of MS, epigallocatechin gallate (EGCG), which is an anti-inflammatory agent, indicated neuroprotective properties. The recent paper of Klumbies et al. [19] investigated the effect of EGCG on retinal thickness as an indicator for treatment response in progressive MS. For this motivating example, only the parameter peripapillary retinal nerve fiber layer (pRNFL) will be investigated. Longitudinal OCT data were available from 31 patients, from which 15 patients were assigned to the intervention group and 16 to the control group, respectively. 
For most patients, both eyes were investigated, thus leading to possibly dependent observations, since assuming independence of the eyes from the same patient would be dubious. Further, missing values occur: From 61 patients in the SUPREMES trial, only 31 contributed to the final analysis and at 3-year follow-up, only 8 patients remained in the study. Table 1 displays the numbers of patients with pRNFL measurements in each group at each time point. Due to the extremely small number of measurements after 3 years, this time point was discarded from the analysis.
Since the sample size was quite small and many missing values occurred, the authors tested—besides other hypotheses—whether there exists a statistical interaction of treatment and time using non-parametric analysis of longitudinal data in factorial experiments, as proposed by Brunner et al. [18]. One disadvantage of this procedure is that it cannot handle missing values and a clustered data structure. Therefore, Klumbies et al. [19] conducted a complete-case analysis and modeled the eyes as a second sub-plot factor besides the sub-plot factor time. In order to close this methodological gap, a general factorial model with repeated measurements, allowing for possibly correlated dependent replicates will be introduced in the next section.

3. The Factorial Repeated Measures Model with Missing Values

First, we study the general factorial repeated measures model without a clustered structure with independent random vectors
$$\boldsymbol{X}_{ik} = (\lambda_{i1k}, X_{i1k}, \ldots, \lambda_{idk}, X_{idk})', \quad i = 1, \ldots, a; \; k = 1, \ldots, n_i, \quad \text{with} \quad \lambda_{isk} = \begin{cases} 1, & X_{isk} \text{ is observed}, \\ 0, & X_{isk} \text{ is missing}. \end{cases} \tag{1}$$
Here, the random variable $X_{isk} \sim F_{is}$, $i = 1, \ldots, a$; $s = 1, \ldots, d$; $k = 1, \ldots, n_i$, represents the $s$-th repeated measurement of the $k$-th subject in group $i$.
To account for metric, discrete, ordinal and ordered categorical data in a unified way, we use the normalized version of the distribution function
$$F_{is}(x) = P(X_{isk} < x) + \tfrac{1}{2} P(X_{isk} = x),$$
which is the average of the left-continuous and the right-continuous versions of the distribution function, $F_{is}^{-}(x) = P(X_{is1} < x)$ and $F_{is}^{+}(x) = P(X_{is1} \le x)$, and was first introduced by Ruymgaart [24].
In model (1), the number of non-missing observations at time point $s$ in group $i$, the overall sample size and the minimal number of observations over all groups and time points are given by
$$\lambda_{is\cdot} = \sum_{k=1}^{n_i} \lambda_{isk}, \quad N = \sum_{i=1}^{a} n_i \quad \text{and} \quad \lambda_{\min} = \min(\lambda_{11\cdot}, \ldots, \lambda_{ad\cdot}).$$
We propose to use unweighted relative effects
$$p_{is} = \int G \, dF_{is} = P(Z < X_{is1}) + \tfrac{1}{2} P(Z = X_{is1}), \quad i = 1, \ldots, a; \; s = 1, \ldots, d, \tag{2}$$
with $G = \frac{1}{ad} \sum_{i=1}^{a} \sum_{s=1}^{d} F_{is}$ being the unweighted mean distribution function and $Z \sim G$, independent of $X_{isk}$. The relative effect $p_{is}$ models the relationship of the distribution $F_{is}$ to the average distribution $G$. If $p_{is} > p_{jt}$, then data coming from $F_{is}$ tend to be larger than data coming from $F_{jt}$. If $p_{is} = p_{jt}$, then neither distribution tends toward larger or smaller values than the other. For more information on unweighted and weighted relative effects, we refer to Brunner et al. [25] and Brunner et al. [26]. In the following, the term relative effect always refers to the unweighted relative effect.

General Factorial Model with Clustered Data

In the following, we introduce a general factorial longitudinal model with clustered data. In comparison with model (1), we observe random vectors $\boldsymbol{X}_{ik} = (\boldsymbol{X}_{i1k}', \ldots, \boldsymbol{X}_{idk}')'$ with
$$\boldsymbol{X}_{isk} = (\lambda_{isk}, X_{isk1}, \ldots, X_{iskm_{isk}})', \quad \text{where } X_{isku} \sim F_{is}, \; u = 1, \ldots, m_{isk},$$
and $m_{isk}$ denotes the number of possibly dependent replicates of subject $k$ in group $i$ at time $s$, and $m_{is\cdot} = \sum_{k=1}^{n_i} m_{isk} \lambda_{isk}$ denotes the total number of possibly dependent replicates in group $i$ at time $s$. Thus, the number of dependent replicates may vary for each subject and may not be under experimental control. Note that we do not assume any correlation structure of the dependent replicates. Similarly to (2), we define the relative effect as
$$p_{is} = \int G \, dF_{is} = P(Z < X_{is11}) + \tfrac{1}{2} P(Z = X_{is11})$$
with
$$F_{is}(x) = P(X_{isku} < x) + \tfrac{1}{2} P(X_{isku} = x).$$
Note that model (1) is contained within this model as the special case $m_{isk} \equiv 1$. In order to derive asymptotic results, we impose the following model assumptions:
Assumption A1.
  • A1.1: $n_i / N \to \kappa_i \in (0, 1]$;
  • A1.2: $N \to \infty$ such that $N / \lambda_{\min} < N_0$, $N_0$ being a fixed constant;
  • A1.3: $N \to \infty$ such that $m_{is\cdot} < M_0$, $M_0$ being a fixed constant.
Assumption A1.1 ensures that none of the groups vanishes asymptotically, whereas Assumptions A1.2 and A1.3 ensure that the total sample size $N$ (relative to $\lambda_{\min}$) and the number of clustered observations $m_{is\cdot}$ are bounded. In the following section, estimators for the relative effect $\boldsymbol{p}$ will be derived.

4. Estimators and Their Asymptotic Distributions

We will first study estimators of relative effects in factorial repeated measures designs without clustered data (i.e., $m_{isk} \equiv 1$). In order to account for possible missing values, we define the empirical distribution function of the data at time point $s$ in group $i$ as the average over all available data by
$$\hat{F}_{isk}(x) = c(x - X_{isk}) \lambda_{isk}, \quad \text{resulting in} \quad \hat{F}_{is}(x) = \frac{1}{\lambda_{is\cdot}} \sum_{k=1}^{n_i} \hat{F}_{isk}(x).$$
Here, $c(u) = 0, \tfrac{1}{2}, 1$ if $u <, =, > 0$, respectively. By plugging in the empirical counterparts $\hat{F}_{is}$, we obtain
$$\hat{G}(x) = \frac{1}{ad} \sum_{i=1}^{a} \sum_{s=1}^{d} \hat{F}_{is}(x),$$
$$\hat{p}_{is} = \int \hat{G} \, d\hat{F}_{is} = \frac{1}{\lambda_{is\cdot}} \sum_{k=1}^{n_i} \frac{\lambda_{isk}}{ad} \sum_{j=1}^{a} \sum_{t=1}^{d} \sum_{\ell=1}^{n_j} \frac{\lambda_{jt\ell}}{\lambda_{jt\cdot}} \, c(X_{isk} - X_{jt\ell}) \tag{4}$$
and define
$$\hat{\boldsymbol{p}} = (\hat{p}_{11}, \ldots, \hat{p}_{1d}, \hat{p}_{21}, \ldots, \hat{p}_{ad})'.$$
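The all-available-data estimator above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the nested-list data layout (`data[i][s]` holds one value per subject, with `None` marking a missing value) and all function names are assumptions made for this example, and every cell is assumed to contain at least one observed value.

```python
from typing import List, Optional

Cell = List[Optional[float]]  # one entry per subject; None marks a missing value


def count(u: float) -> float:
    """Count function c(u): 0, 1/2 or 1 for u < 0, u = 0, u > 0."""
    if u > 0:
        return 1.0
    if u < 0:
        return 0.0
    return 0.5


def ecdf(cell: Cell, x: float) -> float:
    """Normalized empirical distribution function of one (group, time) cell,
    averaged over the observed values only (all-available data)."""
    observed = [v for v in cell if v is not None]
    return sum(count(x - v) for v in observed) / len(observed)


def relative_effects(data: List[List[Cell]]) -> List[List[float]]:
    """Estimated relative effects p_hat[i][s] = integral of G_hat d F_hat_is."""
    a, d = len(data), len(data[0])
    cells = [data[j][t] for j in range(a) for t in range(d)]

    def g_hat(x: float) -> float:
        # Unweighted mean of the a*d cell-wise empirical distribution functions.
        return sum(ecdf(c, x) for c in cells) / (a * d)

    p_hat = []
    for i in range(a):
        row = []
        for s in range(d):
            observed = [v for v in data[i][s] if v is not None]
            row.append(sum(g_hat(v) for v in observed) / len(observed))
        p_hat.append(row)
    return p_hat
```

Because $\hat{G}$ is the unweighted mean of the $\hat{F}_{is}$, the estimates average to exactly $1/2$ over all $a \cdot d$ cells, which gives a convenient sanity check for any implementation.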

4.1. Effect Estimation in Factorial Designs with Clustered Data

To generalize the estimation of empirical distribution functions and relative effects to the case of clustered data, we follow the idea of Roy et al. [1], who proposed two different approaches for estimating the distribution functions by using the cluster sizes as weighting schemes. In the first version of the estimator of the relative effect $\boldsymbol{p}$, larger clusters add more weight to the estimation than smaller ones; in the second version, all clusters add the same weight, regardless of their size. Analogously to Roy et al. [1], these estimators are called weighted and unweighted, respectively. Note that Obuchowski [27] also used the weighted version in the two-sample case.
The two different versions of the empirical distribution functions (unweighted and weighted) are defined as follows:
$$\hat{F}_{is}^{(\upsilon_1)}(x) = \frac{1}{\lambda_{is\cdot}} \sum_{k=1}^{n_i} \frac{1}{m_{isk}} \sum_{u=1}^{m_{isk}} c(x - X_{isku}) \lambda_{isk}, \qquad \hat{F}_{is}^{(\upsilon_2)}(x) = \frac{1}{m_{is\cdot}} \sum_{k=1}^{n_i} \sum_{u=1}^{m_{isk}} c(x - X_{isku}) \lambda_{isk}.$$
$\hat{F}_{is}^{(\upsilon_1)}(x)$ is the unweighted estimator of $F_{is}(x)$, where the average of the count function is calculated separately for each cluster and these averages are then averaged again. $\hat{F}_{is}^{(\upsilon_2)}(x)$ is the weighted estimator of $F_{is}(x)$, where the counts are averaged over all observations.
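In order to write the estimators in a unified way, we define weights
$$w_{isk}^{\upsilon_1} = \frac{1}{\lambda_{is\cdot} \, m_{isk}} \quad \text{and} \quad w_{isk}^{\upsilon_2} = \frac{1}{m_{is\cdot}},$$
then estimators for $F_{is}(x)$ and $G(x)$ are given by
$$\hat{F}_{is}^{*}(x) = \sum_{k=1}^{n_i} \sum_{u=1}^{m_{isk}} w_{isk}^{*} \, c(x - X_{isku}) \lambda_{isk}, \quad * \in \{\upsilon_1, \upsilon_2\},$$
and
$$\hat{G}^{*}(x) = \frac{1}{ad} \sum_{i=1}^{a} \sum_{s=1}^{d} \hat{F}_{is}^{*}(x) = \frac{1}{ad} \sum_{i=1}^{a} \sum_{s=1}^{d} \sum_{k=1}^{n_i} \lambda_{isk} \sum_{u=1}^{m_{isk}} w_{isk}^{*} \, c(x - X_{isku}).$$
It then follows that an estimator of $p_{is}$ is given by
$$\hat{p}_{is}^{*} = \int \hat{G}^{*} \, d\hat{F}_{is}^{*} = \frac{1}{ad} \sum_{j=1}^{a} \sum_{t=1}^{d} \sum_{k=1}^{n_i} \sum_{u=1}^{m_{isk}} \lambda_{isk} w_{isk}^{*} \hat{F}_{jt}^{*}(X_{isku})$$
$$= \frac{1}{ad} \sum_{k=1}^{n_i} \sum_{j=1}^{a} \sum_{t=1}^{d} \sum_{\ell=1}^{n_j} \sum_{u=1}^{m_{isk}} \sum_{v=1}^{m_{jt\ell}} \lambda_{isk} \lambda_{jt\ell} \, w_{isk}^{*} w_{jt\ell}^{*} \, c(X_{isku} - X_{jt\ell v}).$$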
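The unified estimator can be illustrated with a short Python sketch. It is not the authors' code: the data layout (`data[i][s]` is a list of clusters, one per subject, where an empty list marks a missing cluster) and the scheme names `"unweighted"` and `"weighted"` are assumptions chosen for this example.

```python
from typing import List

Cluster = List[float]   # replicates of one subject at one time point; [] = missing
Cell = List[Cluster]    # all subjects of one (group, time) cell


def count(u: float) -> float:
    """Count function c(u): 0, 1/2 or 1 for u < 0, u = 0, u > 0."""
    return 1.0 if u > 0 else (0.0 if u < 0 else 0.5)


def weights(cell: Cell, scheme: str) -> List[float]:
    """Per-cluster weights w_isk; missing clusters (lambda_isk = 0) get weight 0."""
    lam = sum(1 for cl in cell if cl)        # number of observed clusters
    m_total = sum(len(cl) for cl in cell)    # number of observed replicates
    if scheme == "unweighted":               # every cluster counts equally
        return [1.0 / (lam * len(cl)) if cl else 0.0 for cl in cell]
    if scheme == "weighted":                 # every observation counts equally
        return [1.0 / m_total if cl else 0.0 for cl in cell]
    raise ValueError("scheme must be 'unweighted' or 'weighted'")


def ecdf(cell: Cell, w: List[float], x: float) -> float:
    """Weighted normalized empirical distribution function of one cell."""
    return sum(wk * count(x - v) for cl, wk in zip(cell, w) for v in cl)


def relative_effects(data: List[List[Cell]], scheme: str) -> List[List[float]]:
    """Estimated relative effects for the chosen weighting scheme."""
    a, d = len(data), len(data[0])
    cells = [data[j][t] for j in range(a) for t in range(d)]
    cell_weights = [weights(c, scheme) for c in cells]

    def g_hat(x: float) -> float:
        return sum(ecdf(c, w, x) for c, w in zip(cells, cell_weights)) / (a * d)

    result = []
    for i in range(a):
        row = []
        for s in range(d):
            cell = data[i][s]
            w = weights(cell, scheme)
            row.append(sum(wk * g_hat(v) for cl, wk in zip(cell, w) for v in cl))
        result.append(row)
    return result
```

With equal cluster sizes in every cell the two schemes coincide; they differ only when cluster sizes vary, which is exactly the situation in which the choice of weights matters.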
Note that in order to derive the theory of these estimators, the weights need to fulfill the following properties:
Proposition 1.
If Assumption A1.3 is fulfilled, it holds that
  • A2.1: $\lambda_{isk} w_{isk}^{*} m_{isk} = O\left(\frac{1}{\lambda_{\min}}\right)$;
  • A2.2: $\sum_{k=1}^{n_i} \lambda_{isk} w_{isk}^{*} m_{isk} = 1$.
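As a quick check, added here and using only the definitions of the two weighting schemes, property A2.2 can be verified directly for both the unweighted and the weighted version:

```latex
\sum_{k=1}^{n_i} \lambda_{isk}\, w_{isk}^{\upsilon_1}\, m_{isk}
  = \sum_{k=1}^{n_i} \frac{\lambda_{isk}\, m_{isk}}{\lambda_{is\cdot}\, m_{isk}}
  = \frac{1}{\lambda_{is\cdot}} \sum_{k=1}^{n_i} \lambda_{isk} = 1,
\qquad
\sum_{k=1}^{n_i} \lambda_{isk}\, w_{isk}^{\upsilon_2}\, m_{isk}
  = \frac{1}{m_{is\cdot}} \sum_{k=1}^{n_i} \lambda_{isk}\, m_{isk} = 1 .
```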
Furthermore note, that the application of all weights which fulfill both properties is theoretically possible. For example, Zou [28] developed an ’optimal’ estimator, which incorporates information on cluster sizes and intra-cluster correlations by a mixed model approach.
First, we will study the asymptotic properties of general estimators for p in the following proposition.
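Proposition 2.
The estimator $\hat{\boldsymbol{p}}^{*} = (\hat{p}_{11}^{*}, \ldots, \hat{p}_{1d}^{*}, \hat{p}_{21}^{*}, \ldots, \hat{p}_{ad}^{*})'$ is asymptotically unbiased and strongly consistent, i.e.,
  • $E(\hat{\boldsymbol{p}}^{*}) = \boldsymbol{p} + O\left(\frac{1}{\lambda_{\min}}\right)$;
  • $\hat{\boldsymbol{p}}^{*} - \boldsymbol{p} \xrightarrow{a.s.} \boldsymbol{0}$, $\lambda_{\min} \to \infty$.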
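Subsequently, the asymptotic distribution of the statistic $\sqrt{N}(\hat{\boldsymbol{p}}^{*} - \boldsymbol{p})$ will be derived. The next theorem shows that, under A1.1 and A1.2, $\sqrt{N}(\hat{\boldsymbol{p}}^{*} - \boldsymbol{p})$ has asymptotically the same distribution as the random vector $\sqrt{N} \boldsymbol{B}^{*}$, with
$$\sqrt{N} \boldsymbol{B}^{*} = \sum_{h=1}^{a} \sqrt{N} \boldsymbol{B}_{h}^{*} = \sum_{h=1}^{a} \frac{\sqrt{N}}{n_h} \sum_{k=1}^{n_h} \left( \boldsymbol{\Psi}_{hk}^{*} - E(\boldsymbol{\Psi}_{hk}^{*}) \right)$$
based on random variables defined as
$$\Psi_{is,hk}^{*} := \begin{cases} -\dfrac{n_h}{ad} \sum\limits_{t=1}^{d} \sum\limits_{u=1}^{m_{htk}} \lambda_{htk} w_{htk}^{*} F_{is}(X_{htku}), & \text{for } h \neq i, \\[2ex] \dfrac{n_h}{ad} \sum\limits_{j \neq i}^{a} \sum\limits_{t=1}^{d} \sum\limits_{u=1}^{m_{isk}} \lambda_{isk} w_{isk}^{*} F_{jt}(X_{isku}) + \dfrac{n_h}{ad} \sum\limits_{t=1}^{d} \left( \sum\limits_{u=1}^{m_{isk}} \lambda_{isk} w_{isk}^{*} F_{it}(X_{isku}) - \sum\limits_{u=1}^{m_{itk}} \lambda_{itk} w_{itk}^{*} F_{is}(X_{itku}) \right), & \text{else}. \end{cases}$$
The expectation of $\Psi_{is,hk}^{*}$ can be written as
$$\beta_{is,hk}^{*} := E(\Psi_{is,hk}^{*}) = \begin{cases} -\dfrac{n_h}{ad} \sum\limits_{t=1}^{d} \lambda_{htk} m_{htk} w_{htk}^{*} \, p(is, ht), & \text{for } h \neq i, \\[2ex] \dfrac{n_h}{ad} \sum\limits_{j \neq i}^{a} \sum\limits_{t=1}^{d} m_{isk} \lambda_{isk} w_{isk}^{*} \, p(jt, is) + \dfrac{n_h}{ad} \sum\limits_{t=1}^{d} \left( m_{isk} \lambda_{isk} w_{isk}^{*} \, p(it, is) - m_{itk} \lambda_{itk} w_{itk}^{*} \, p(is, it) \right), & \text{else}, \end{cases}$$
with $p(is, ht) := \int F_{is} \, dF_{ht}$ denoting pairwise relative effects between groups $i$ and $h$ and time points $s$ and $t$.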
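Theorem 1.
Let $\sqrt{N} \boldsymbol{B}^{*} = \sum_{h=1}^{a} \sqrt{N} \boldsymbol{B}_{h}^{*} = \sum_{h=1}^{a} \sqrt{N} (B_{11,h}^{*}, \ldots, B_{ad,h}^{*})'$ be the vector of the random variables $\sqrt{N} B_{is}^{*}$, $i = 1, \ldots, a$; $s = 1, \ldots, d$. If A1.1 and A1.2 hold true, then
$$\left\| \sqrt{N}(\hat{\boldsymbol{p}}^{*} - \boldsymbol{p}) - \sqrt{N} \boldsymbol{B}^{*} \right\|_2^2 = O\left(\frac{1}{N}\right).$$
It follows that the asymptotic covariance matrix of $\sqrt{N}(\hat{\boldsymbol{p}}^{*} - \boldsymbol{p})$ is given by
$$\boldsymbol{V}_{N}^{*} = Cov(\sqrt{N} \boldsymbol{B}^{*}).$$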
The asymptotic multivariate normality of the linear statistic $\sqrt{N}(\hat{\boldsymbol{p}}^{*} - \boldsymbol{p})$ is given in the next theorem.
Theorem 2.
Under Assumptions A1.1 and A1.2, the statistic $\sqrt{N}(\hat{\boldsymbol{p}}^{*} - \boldsymbol{p})$ follows asymptotically, as $N \to \infty$, a multivariate normal distribution with expectation $\boldsymbol{0}$ and covariance matrix $\boldsymbol{V}_{N}^{*}$.
However, this covariance matrix is mostly unknown in practical applications and must be estimated in order to be able to make statistical inferences. In Section 5 we will derive a consistent and positive semi-definite estimator of the covariance matrix.

4.2. Informative Cluster Sizes

In many applications, the cluster sizes $m_{isk}$ (might) depend on the outcome of interest, i.e.,
Assumption 2.
$E(X_{isku}) \neq E(X_{isku} \mid m_{isk})$, $i = 1, \ldots, a$; $s = 1, \ldots, d$; $k = 1, \ldots, n_i$; $u = 1, \ldots, m_{isk}$.
This makes the cluster sizes non-ignorable, or informative [29]. As an example, consider the periodontal disease (an inflammation of the gums and bone that surround and support the teeth) study [30]. Severe periodontitis ends in the loss of teeth and, thus, cluster sizes (the number of a patient’s teeth) depend on the clinical outcome. Hoffmann et al. [29], among others, suggest a Within-Cluster-Resampling (WCR) method for the analysis of informative clustered binary data. This approach is also applicable in the rather general model considered here and is described in the following:
A randomly chosen observation $X_{isk}^{q}$ is sampled from cluster $\boldsymbol{X}_{isk}$. This is done for each of the $N \cdot d$ clusters, resulting in a dataset involving single observations only. The procedure is repeated $Q$ times, e.g., $Q$ = 10,000, and for each of the $Q$ datasets, the vector of relative effects $\boldsymbol{p}$ is estimated by adapting Equation (4):
$$\hat{p}_{is}^{\Delta, q} = \frac{1}{\lambda_{is\cdot}} \sum_{k=1}^{n_i} \frac{\lambda_{isk}}{ad} \sum_{j=1}^{a} \sum_{t=1}^{d} \sum_{\ell=1}^{n_j} \frac{\lambda_{jt\ell}}{\lambda_{jt\cdot}} \, c(X_{isk}^{q} - X_{jt\ell}^{q}).$$
An estimator and its asymptotic distribution are given in the following theorem.
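Theorem 3.
Let
$$\hat{\boldsymbol{p}}^{\Delta} = \frac{1}{Q} \sum_{q=1}^{Q} \hat{\boldsymbol{p}}^{\Delta, q}$$
denote the Within-Cluster-Resampling based estimator. If $N \to \infty$, then
$$\sqrt{N}(\hat{\boldsymbol{p}}^{\Delta} - \boldsymbol{p}) \to N(\boldsymbol{0}, \boldsymbol{\Sigma}^{\Delta}),$$
where $\boldsymbol{\Sigma}^{\Delta}$ is finite.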
A consistent variance estimator is provided in the next theorem.
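Theorem 4.
Let $N \to \infty$ and let $\hat{\boldsymbol{p}}^{\Delta}$ be defined as in Theorem 3. Then, an estimator of $\boldsymbol{\Sigma}^{\Delta}$ is given by
$$\hat{\boldsymbol{\Sigma}}^{\Delta} = \widehat{Var}\left( \sqrt{N}(\hat{\boldsymbol{p}}^{\Delta} - \boldsymbol{p}) \right) = N \left( \frac{1}{Q} \sum_{q=1}^{Q} \hat{\boldsymbol{\Sigma}}_{q} - \frac{Q-1}{Q} \boldsymbol{S}_{p}^{2} \right), \tag{7}$$
where $\hat{\boldsymbol{\Sigma}}_{q}$ is the estimated covariance matrix from the $q$-th analysis (see the following section for the derivation of an estimator) and
$$\boldsymbol{S}_{p}^{2} = \frac{1}{Q-1} \sum_{q=1}^{Q} (\hat{\boldsymbol{p}}^{\Delta, q} - \hat{\boldsymbol{p}}^{\Delta}) (\hat{\boldsymbol{p}}^{\Delta, q} - \hat{\boldsymbol{p}}^{\Delta})'$$
is the estimated covariance matrix among the $Q$ resample-based estimates $\hat{\boldsymbol{p}}^{\Delta, q}$. Then, $\hat{\boldsymbol{\Sigma}}^{\Delta}$ is consistent for $\boldsymbol{\Sigma}^{\Delta} = Var\left( \sqrt{N}(\hat{\boldsymbol{p}}^{\Delta} - \boldsymbol{p}) \right)$.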
The proofs of Theorems 3 and 4 can be found in the appendix of Hoffmann et al. [29]. The WCR approach proposed by Hoffmann et al. [29] is computationally intensive and could possibly lead to negative variance estimators due to the subtraction in the variance estimation in Equation (7). Hoffmann et al. [29] noted that this occurs rarely and concluded that in these scenarios, the number of resampled datasets $Q$ or the number of clusters $N$ may be too small for making inferences. However, the WCR-based approach is equivalent to the unweighted estimation of the relative effects $\boldsymbol{p}$ as proposed by Roy et al. [1]; in both analyses, all clusters are given equal weight, regardless of their size. Thus, the unweighted estimator should be preferred over the WCR-based approach, since its computation is less intensive and always leads to positive variance estimators. It should also be noted that ignorable cluster sizes (the opposite of Assumption 2) are never assumed during the development of the theory in this work. Therefore, all weighting schemes that fulfill properties A2.1 and A2.2 can be applied in the case of non-ignorable cluster sizes; however, the resulting estimators have a different interpretation.
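The stated equivalence between Within-Cluster-Resampling and the unweighted estimator can be illustrated numerically. The following Python sketch is an illustration under assumptions, not the authors' code: the data layout (`data[i][s]` is a list of clusters, an empty list marking a missing cluster), the function names, and the defaults `Q=2000` and `seed=1` are all hypothetical choices for this example.

```python
import random
from typing import List

Cell = List[List[float]]  # clusters of one (group, time) cell; [] = missing cluster


def count(u: float) -> float:
    """Count function c(u): 0, 1/2 or 1 for u < 0, u = 0, u > 0."""
    return 1.0 if u > 0 else (0.0 if u < 0 else 0.5)


def unweighted_effects(data: List[List[Cell]]) -> List[List[float]]:
    """Unweighted estimator: average within each cluster, then over clusters."""
    a, d = len(data), len(data[0])
    cells = [data[j][t] for j in range(a) for t in range(d)]

    def ecdf(cell: Cell, x: float) -> float:
        observed = [cl for cl in cell if cl]
        return sum(sum(count(x - v) for v in cl) / len(cl)
                   for cl in observed) / len(observed)

    def g_hat(x: float) -> float:
        return sum(ecdf(c, x) for c in cells) / (a * d)

    out = []
    for i in range(a):
        row = []
        for s in range(d):
            observed = [cl for cl in data[i][s] if cl]
            row.append(sum(sum(g_hat(v) for v in cl) / len(cl)
                           for cl in observed) / len(observed))
        out.append(row)
    return out


def wcr_effects(data: List[List[Cell]], Q: int = 2000, seed: int = 1) -> List[List[float]]:
    """Within-Cluster-Resampling: draw one observation per cluster, estimate the
    relative effects on the reduced dataset, and average over Q resamples."""
    rng = random.Random(seed)
    a, d = len(data), len(data[0])
    total = [[0.0] * d for _ in range(a)]
    for _ in range(Q):
        resampled = [[[[rng.choice(cl)] if cl else [] for cl in cell]
                      for cell in group] for group in data]
        # With singleton clusters the unweighted estimator reduces to the
        # estimator for data without dependent replicates.
        p_q = unweighted_effects(resampled)
        for i in range(a):
            for s in range(d):
                total[i][s] += p_q[i][s] / Q
    return total
```

For moderately large `Q`, the resampling average should agree with the direct unweighted estimate up to Monte Carlo error, at a fraction of the computational cost for the latter.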

5. Estimation of the Covariance Matrix

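Now, an estimator of the covariance matrix $\boldsymbol{V}_{N}^{*}$ is derived. As in Rubarth et al. [31], the random variables $\Psi_{is,hk}^{*}$ are not observable. Otherwise, an estimator of $\boldsymbol{V}_{N}^{*}$ would be given by
$$\tilde{\boldsymbol{V}}_{N}^{*} = \sum_{h=1}^{a} \frac{N}{n_h} \tilde{\boldsymbol{V}}_{N,h}^{*}$$
with $\tilde{\boldsymbol{V}}_{N,h}^{*} = \frac{1}{n_h - 1} \sum_{k=1}^{n_h} (\boldsymbol{\Psi}_{hk}^{*} - \boldsymbol{\beta}_{hk}^{*}) (\boldsymbol{\Psi}_{hk}^{*} - \boldsymbol{\beta}_{hk}^{*})'$. Therefore, we replace the unknown $\Psi_{is,hk}^{*}$ with observable random variables. Define the vectors $\hat{\boldsymbol{\Psi}}_{hk}^{*} = (\hat{\Psi}_{11,hk}^{*}, \ldots, \hat{\Psi}_{ad,hk}^{*})'$ with
$$\hat{\Psi}_{is,hk}^{*} := \begin{cases} -\dfrac{n_h}{ad} \sum\limits_{t=1}^{d} \sum\limits_{u=1}^{m_{htk}} \lambda_{htk} w_{htk}^{*} \hat{F}_{is}^{*}(X_{htku}), & \text{for } h \neq i, \\[2ex] \dfrac{n_h}{ad} \sum\limits_{j \neq i}^{a} \sum\limits_{t=1}^{d} \sum\limits_{u=1}^{m_{isk}} \lambda_{isk} w_{isk}^{*} \hat{F}_{jt}^{*}(X_{isku}) + \dfrac{n_h}{ad} \sum\limits_{t=1}^{d} \left( \sum\limits_{u=1}^{m_{isk}} \lambda_{isk} w_{isk}^{*} \hat{F}_{it}^{*}(X_{isku}) - \sum\limits_{u=1}^{m_{itk}} \lambda_{itk} w_{itk}^{*} \hat{F}_{is}^{*}(X_{itku}) \right), & \text{else} \end{cases}$$
and expectation values
$$\hat{\beta}_{is,hk}^{*} := E(\hat{\Psi}_{is,hk}^{*}) = \begin{cases} -\dfrac{n_h}{ad} \sum\limits_{t=1}^{d} \lambda_{htk} m_{htk} w_{htk}^{*} \, \hat{p}^{*}(is, ht), & \text{for } h \neq i, \\[2ex] \dfrac{n_h}{ad} \sum\limits_{j \neq i}^{a} \sum\limits_{t=1}^{d} m_{isk} \lambda_{isk} w_{isk}^{*} \, \hat{p}^{*}(jt, is) + \dfrac{n_h}{ad} \sum\limits_{t=1}^{d} \left( m_{isk} \lambda_{isk} w_{isk}^{*} \, \hat{p}^{*}(it, is) - m_{itk} \lambda_{itk} w_{itk}^{*} \, \hat{p}^{*}(is, it) \right), & \text{else}, \end{cases}$$
where $\hat{p}^{*}(is, ht) = \int \hat{F}_{is}^{*} \, d\hat{F}_{ht}^{*} = \sum_{k=1}^{n_h} \sum_{u=1}^{m_{htk}} \lambda_{htk} w_{htk}^{*} \hat{F}_{is}^{*}(X_{htku})$ denote the estimators of the pairwise relative effects $p(is, ht)$. Finally, an estimator for the unknown covariance matrix $\boldsymbol{V}_{N,h}^{*}$ is given by
$$\hat{\boldsymbol{V}}_{N,h}^{*} = \frac{1}{n_h - 1} \sum_{k=1}^{n_h} (\hat{\boldsymbol{\Psi}}_{hk}^{*} - \hat{\boldsymbol{\beta}}_{hk}^{*}) (\hat{\boldsymbol{\Psi}}_{hk}^{*} - \hat{\boldsymbol{\beta}}_{hk}^{*})'$$
and an estimator for $\boldsymbol{V}_{N}^{*} := \sum_{h=1}^{a} \kappa_h^{-1} \boldsymbol{V}_{N,h}^{*}$ is given by
$$\hat{\boldsymbol{V}}_{N}^{*} := \sum_{h=1}^{a} \frac{N}{n_h} \hat{\boldsymbol{V}}_{N,h}^{*}.$$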
Its properties are presented in the next theorem.
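Theorem 5.
For $N \to \infty$, such that A1.1 and A1.3 are fulfilled, it holds that
  • $\hat{\boldsymbol{V}}_{N,h}^{*}$ and $\hat{\boldsymbol{V}}_{N}^{*}$ are positive semi-definite;
  • $\boldsymbol{V}_{N,h}^{*} - \hat{\boldsymbol{V}}_{N,h}^{*} \xrightarrow{a.s.} \boldsymbol{0}$;
  • $\boldsymbol{V}_{N}^{*} - \hat{\boldsymbol{V}}_{N}^{*} \xrightarrow{a.s.} \boldsymbol{0}$.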

6. Multiple Hypotheses

In this section, the formulation of hypotheses for main and interaction effects in the factorial repeated measures framework will be outlined. Let $\boldsymbol{C} = (\boldsymbol{c}_1, \ldots, \boldsymbol{c}_q)' \in \mathbb{R}^{q \times ad}$ be an arbitrary contrast matrix and let
$$\Omega = \{ H_0^{(\ell)} : \boldsymbol{c}_\ell' \boldsymbol{p} = 0, \; \ell = 1, \ldots, q \}$$
be a family of hypotheses, where $\boldsymbol{c}_\ell'$ denotes the $\ell$-th row vector of $\boldsymbol{C}$. Which contrast matrix is appropriate depends on the specific research question. Well-known types of contrast matrices are the Tukey-type contrast matrix, used for all-pairwise comparisons, and the Dunnett-type contrast matrix, used for the comparison of several groups to one control group. User-specified contrast matrices can also be applied, as long as they have the defining property of a contrast matrix, namely that each row of $\boldsymbol{C}$ sums to 0 (i.e., $\sum_{m=1}^{ad} c_{\ell, m} = 0$ for all $\ell = 1, \ldots, q$).
Since the layout in this paper is multifactorial, it is briefly demonstrated how to define appropriate contrast matrices for testing main effects of group membership and time and interaction effect between group membership and time.
  • Main effect group membership G In order to make comparisons in terms of group membership, it is necessary to center and average over the repeated measures. Thus, a contrast matrix to test for no group effect will be defined as
    $$\boldsymbol{C}_G := \boldsymbol{C}_g \left( \boldsymbol{P}_a \otimes \tfrac{1}{d} \boldsymbol{1}_d' \right) \in \mathbb{R}^{q \times ad},$$
    with $\boldsymbol{C}_g$ being a contrast matrix for the group effect without the time structure.
  • Main effect time T Similarly for the time effect, the measurements across the groups need to be centered and averaged, leading to a contrast matrix to test for no time effect as
    $$\boldsymbol{C}_T := \boldsymbol{C}_t \left( \tfrac{1}{a} \boldsymbol{1}_a' \otimes \boldsymbol{P}_d \right) \in \mathbb{R}^{q \times ad}.$$
    Again, $\boldsymbol{C}_t$ denotes a contrast matrix for the effect over time without the group structure.
  • Interaction effect G × T For the test of no interaction between group membership and time, the centering matrix
    $$\boldsymbol{C}_{GT} = \boldsymbol{P}_a \otimes \boldsymbol{P}_d \in \mathbb{R}^{ad \times ad}$$
    will be used.
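The three contrast matrices can be assembled with a few helper functions. The sketch below uses plain Python lists and assumes that $\boldsymbol{P}_n$ denotes the centering matrix $\boldsymbol{I}_n - \frac{1}{n} \boldsymbol{J}_n$; the function names and the default choice $\boldsymbol{C}_g = \boldsymbol{P}_a$, $\boldsymbol{C}_t = \boldsymbol{P}_d$ are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def centering(n: int) -> Matrix:
    """Centering matrix P_n = I_n - (1/n) J_n (assumed meaning of P_n)."""
    return [[(1.0 if r == c else 0.0) - 1.0 / n for c in range(n)] for r in range(n)]


def mean_row(n: int) -> Matrix:
    """Averaging row vector (1/n) 1_n' as a 1 x n matrix."""
    return [[1.0 / n] * n]


def kron(A: Matrix, B: Matrix) -> Matrix:
    """Kronecker product of A and B."""
    return [[x * y for x in ra for y in rb] for ra in A for rb in B]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Ordinary matrix product A B."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def contrast_matrices(a: int, d: int,
                      C_g: Optional[Matrix] = None,
                      C_t: Optional[Matrix] = None) -> Tuple[Matrix, Matrix, Matrix]:
    """Return (C_G, C_T, C_GT) for a groups and d time points.

    The columns follow the ordering (p_11, ..., p_1d, p_21, ..., p_ad).
    """
    C_g = centering(a) if C_g is None else C_g   # illustrative default
    C_t = centering(d) if C_t is None else C_t   # illustrative default
    C_G = matmul(C_g, kron(centering(a), mean_row(d)))
    C_T = matmul(C_t, kron(mean_row(a), centering(d)))
    C_GT = kron(centering(a), centering(d))
    return C_G, C_T, C_GT
```

For a pure group effect, for example $\boldsymbol{p} = (0.4, 0.4, 0.4, 0.6, 0.6, 0.6)'$ with $a = 2$ and $d = 3$, the product $\boldsymbol{C}_T \boldsymbol{p}$ vanishes while $\boldsymbol{C}_G \boldsymbol{p}$ does not.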

7. Test Statistics

In this section, we will present different test procedures for testing global and multiple hypotheses concerning the null hypothesis $H_0^{p}: \boldsymbol{C} \boldsymbol{p} = \boldsymbol{0}$, with $\boldsymbol{C}$ being an appropriate contrast matrix tailored to the specific research question. First, we propose two quadratic test procedures, a Wald-type statistic (WTS) and an ANOVA-type statistic (ATS), as already described by Brunner et al. [17], Domhof et al. [14], and Rubarth et al. [31]. These procedures can only be used to test the global null hypothesis and cannot be inverted to obtain (simultaneous) confidence intervals. Therefore, we will present a Multiple Contrast Test Procedure (MCTP), which was introduced by Konietschke et al. [32] in a general non-parametric factorial framework and by Rubarth et al. [31] for the case of incompletely observed data. Using this procedure, multiple hypotheses can be tested simultaneously, and adjusted confidence intervals and p-values are directly obtained.

7.1. Quadratic Test Procedures

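Following Konietschke et al. [33] and Rubarth et al. [31], we consider the Wald-type statistic (WTS)
$$Q_{N}^{*} = N \hat{\boldsymbol{p}}^{*\prime} \boldsymbol{C}' \left( \boldsymbol{C} \hat{\boldsymbol{V}}_{N}^{*} \boldsymbol{C}' \right)^{+} \boldsymbol{C} \hat{\boldsymbol{p}}^{*},$$
which can be approximated by a $\chi_{\hat{f}}^{2}$ distribution with $\hat{f} = rank(\boldsymbol{C} \hat{\boldsymbol{V}}_{N}^{*} \boldsymbol{C}')$ degrees of freedom (see the discussion on further assumptions on $\boldsymbol{V}_{N}^{*}$ in Brunner et al. [25]). Here, $[\cdot]^{+}$ denotes the Moore-Penrose inverse of a matrix. However, simulation studies by Konietschke et al. [33], Domhof et al. [14] and Rubarth et al. [31] indicate that the WTS is very liberal in small or moderate sample size scenarios. Therefore, Akritas et al. [34] and Brunner et al. [25], among others, approximate the (asymptotic) distribution of
$$A_{N}^{*} = \frac{N}{tr(\boldsymbol{M} \boldsymbol{V}_{N}^{*})} \hat{\boldsymbol{p}}^{*\prime} \boldsymbol{M} \hat{\boldsymbol{p}}^{*}$$
by a scaled $\chi_{f}^{2} / f$ distribution with
$$f = \frac{[tr(\boldsymbol{M} \boldsymbol{V}_{N}^{*})]^{2}}{tr(\boldsymbol{M} \boldsymbol{V}_{N}^{*} \boldsymbol{M} \boldsymbol{V}_{N}^{*})}$$
degrees of freedom. Here, $\boldsymbol{M} = \boldsymbol{C}' (\boldsymbol{C} \boldsymbol{C}')^{-} \boldsymbol{C}$, and $(\boldsymbol{C} \boldsymbol{C}')^{-}$ denotes a generalized inverse of $\boldsymbol{C} \boldsymbol{C}'$. Since $\boldsymbol{M}$ is a projection matrix, it holds that $\boldsymbol{M} \boldsymbol{p} = \boldsymbol{0} \Leftrightarrow \boldsymbol{C} \boldsymbol{p} = \boldsymbol{0}$. The unknown traces $tr(\boldsymbol{M} \boldsymbol{V}_{N}^{*})$ and $tr(\boldsymbol{M} \boldsymbol{V}_{N}^{*} \boldsymbol{M} \boldsymbol{V}_{N}^{*})$ are estimated by replacing $\boldsymbol{V}_{N}^{*}$ with $\hat{\boldsymbol{V}}_{N}^{*}$, see Brunner et al. [17] for the derivation.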
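A compact sketch of the ANOVA-type statistic and its estimated degrees of freedom is given below. It assumes that the projection matrix $\boldsymbol{M}$ and a covariance estimate are already available as plain Python lists; the function names are hypothetical, and the computation of a p-value from the scaled $\chi^2_f / f$ distribution is deliberately left out. For the interaction hypothesis, $\boldsymbol{C}_{GT} = \boldsymbol{P}_a \otimes \boldsymbol{P}_d$ is itself a symmetric projection, so $\boldsymbol{M} = \boldsymbol{C}_{GT}$ can be used directly.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Ordinary matrix product A B."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def trace(A: Matrix) -> float:
    """Sum of the diagonal elements of a square matrix."""
    return sum(A[r][r] for r in range(len(A)))


def centering(n: int) -> Matrix:
    """Centering matrix P_n = I_n - (1/n) J_n."""
    return [[(1.0 if r == c else 0.0) - 1.0 / n for c in range(n)] for r in range(n)]


def kron(A: Matrix, B: Matrix) -> Matrix:
    """Kronecker product of A and B."""
    return [[x * y for x in ra for y in rb] for ra in A for rb in B]


def anova_type_statistic(p_hat: List[float], V_hat: Matrix,
                         M: Matrix, N: int) -> Tuple[float, float]:
    """Return (A_N, f_hat): the ATS and its estimated degrees of freedom.

    p_hat : estimated relative effects, ordered (p_11, ..., p_1d, p_21, ..., p_ad)
    V_hat : estimate of the covariance matrix V_N (ad x ad)
    M     : projection matrix C'(CC')^- C of the hypothesis
    N     : total number of subjects
    """
    MV = matmul(M, V_hat)
    tr_MV = trace(MV)
    tr_MVMV = trace(matmul(MV, MV))
    Mp = [sum(m * p for m, p in zip(row, p_hat)) for row in M]
    quad = sum(p * v for p, v in zip(p_hat, Mp))   # p' M p
    return N * quad / tr_MV, tr_MV ** 2 / tr_MVMV
```

As a usage example with purely illustrative inputs, `anova_type_statistic(p_hat, V_hat, kron(centering(2), centering(3)), N)` evaluates the interaction hypothesis in a design with two groups and three time points.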

7.2. Multiple Contrast Test Procedure

To overcome the above outlined disadvantages of the quadratic test procedures, Konietschke et al. [32] proposed a rank-based MCTP for factorial designs, whereas Rubarth et al. [31] proposed a procedure for repeated measures designs with missing values.
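Consider the $\ell$-th individual null hypothesis $H_0^{(\ell)}: \boldsymbol{c}_\ell' \boldsymbol{p} = 0$ and the corresponding test statistic
$$T_\ell^{*} = \frac{\sqrt{N} \, \boldsymbol{c}_\ell' (\hat{\boldsymbol{p}}^{*} - \boldsymbol{p})}{\sqrt{\boldsymbol{c}_\ell' \hat{\boldsymbol{V}}_{N}^{*} \boldsymbol{c}_\ell}}$$
with $\boldsymbol{c}_\ell'$ being the $\ell$-th row vector of $\boldsymbol{C}$. All test statistics are collected in the vector
$$\boldsymbol{T}^{*} = (T_1^{*}, \ldots, T_q^{*})'.$$
Note that the test statistics $T_\ell^{*}$ and $T_m^{*}$ ($\ell \neq m$) are not necessarily independent, depending on the chosen contrast and the repeated measures.
The distribution of $T_\ell^{*}$ is asymptotically standard normal. It then follows from Theorem 2 and Slutsky’s theorem that $\boldsymbol{T}^{*}$ follows, asymptotically, as $N \to \infty$, a multivariate normal distribution with expectation $\boldsymbol{0}$ and correlation matrix
$$\boldsymbol{R}^{*} = \boldsymbol{D}^{*, -1/2} \, \boldsymbol{C} \boldsymbol{V}_{N}^{*} \boldsymbol{C}' \, \boldsymbol{D}^{*, -1/2},$$
with $\boldsymbol{D}^{*}$ being the diagonal matrix of the diagonal elements of $\boldsymbol{C} \boldsymbol{V}_{N}^{*} \boldsymbol{C}'$. For large samples, the local null hypothesis $H_0^{(\ell)}: \boldsymbol{c}_\ell' \boldsymbol{p} = 0$ will be rejected if $|T_\ell^{*}| \ge z_{1-\alpha, 2, \boldsymbol{R}^{*}}$. Here, $z_{1-\alpha, 2, \boldsymbol{R}^{*}}$ denotes the two-sided $(1-\alpha)$ equicoordinate quantile of the $N(\boldsymbol{0}, \boldsymbol{R}^{*})$ distribution [35]. By inverting the corresponding test statistic, simultaneous confidence intervals for the effects $\delta_\ell = \boldsymbol{c}_\ell' \boldsymbol{p}$ can be obtained by
$$CI_\ell = \left[ \boldsymbol{c}_\ell' \hat{\boldsymbol{p}}^{*} \pm \frac{z_{1-\alpha, 2, \boldsymbol{R}^{*}}}{\sqrt{N}} \sqrt{\boldsymbol{c}_\ell' \hat{\boldsymbol{V}}_{N}^{*} \boldsymbol{c}_\ell} \right].$$
It follows directly that the global null hypothesis $H_0^{p}: \boldsymbol{C} \boldsymbol{p} = \boldsymbol{0}$ will be rejected if $T_0 = \max\{|T_1^{*}|, \ldots, |T_q^{*}|\} \ge z_{1-\alpha, 2, \boldsymbol{R}^{*}}$. As in Konietschke et al. [36] and Rubarth et al. [31], the correlation matrix is unknown but can be consistently estimated by
$$\hat{\boldsymbol{R}}_{N} = \hat{\boldsymbol{D}}^{*, -1/2} \, \boldsymbol{C} \hat{\boldsymbol{V}}_{N}^{*} \boldsymbol{C}' \, \hat{\boldsymbol{D}}^{*, -1/2}.$$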
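Again, $\hat{\boldsymbol{D}}^{*}$ denotes the diagonal matrix obtained from the diagonal elements of $\boldsymbol{C} \hat{\boldsymbol{V}}_{N}^{*} \boldsymbol{C}'$. We note that the method asymptotically controls the family-wise error rate $\alpha$ in the strong sense. However, the proposed procedure is only valid for large sample sizes, and the convergence of $\boldsymbol{T}^{*}$ to its asymptotic distribution is rather slow [32]. Therefore, we follow Konietschke et al. [32], who proposed a small-sample approximation by using a central multivariate $T(\nu, \boldsymbol{0}, \hat{\boldsymbol{R}}^{*})$ distribution with $\nu$ degrees of freedom and correlation matrix $\hat{\boldsymbol{R}}^{*}$. We define for each linear contrast $\boldsymbol{c}_\ell = (c_{\ell 11}, \ldots, c_{\ell ad})'$, $\ell = 1, \ldots, q$, random variables $\Phi_{\ell hk}^{*} = \boldsymbol{c}_\ell' \boldsymbol{\Psi}_{hk}^{*}$. It can be directly seen that
$$\sqrt{N} \boldsymbol{c}_\ell' (\hat{\boldsymbol{p}}^{*} - \boldsymbol{p}) \doteq \sqrt{N} \sum_{h=1}^{a} \frac{1}{n_h} \sum_{k=1}^{n_h} \left[ \Phi_{\ell hk}^{*} - E(\Phi_{\ell hk}^{*}) \right]$$
and by independence of $\Phi_{\ell hk}^{*}$ and $\Phi_{\ell hk'}^{*}$ ($k \neq k'$) we obtain for the variance
$$Var\left( \sqrt{N} \sum_{h=1}^{a} \frac{1}{n_h} \sum_{k=1}^{n_h} \left[ \Phi_{\ell hk}^{*} - E(\Phi_{\ell hk}^{*}) \right] \right) = N \sum_{h=1}^{a} \frac{1}{n_h} Var(\Phi_{\ell h1}^{*}) = N \sum_{h=1}^{a} \frac{1}{n_h} \omega_{\ell h}^{2,*}$$
with $\omega_{\ell h}^{2,*} = Var(\boldsymbol{c}_\ell' \boldsymbol{\Psi}_{h1}^{*}) = Var(\Phi_{\ell h1}^{*})$. The unknown variances $\omega_{\ell h}^{2,*}$ can be consistently estimated by $\hat{\omega}_{\ell h}^{2,*} = \frac{1}{n_h - 1} \sum_{k=1}^{n_h} (\Phi_{\ell hk}^{*} - \bar{\Phi}_{\ell h \cdot}^{*})^{2}$ with $\bar{\Phi}_{\ell h \cdot}^{*} = \frac{1}{n_h} \sum_{k=1}^{n_h} \Phi_{\ell hk}^{*}$. We follow Gao et al. [37] and estimate the degrees of freedom by
$$\nu = \max\left\{ 1, \min_{\ell = 1, \ldots, q} \{ \nu_1, \ldots, \nu_q \} \right\}$$
with
$$\nu_\ell = \frac{\left( \sum_{h=1}^{a} \hat{\omega}_{\ell h}^{2,*} / n_h \right)^{2}}{\sum_{h=1}^{a} \left( \hat{\omega}_{\ell h}^{2,*} \right)^{2} / \left( n_h^{2} (n_h - 1) \right)}, \quad \ell = 1, \ldots, q.$$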
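The degrees-of-freedom computation can be sketched as follows. The sketch assumes the usual Satterthwaite-type form, i.e., with the squared variance estimate in the denominator, and takes the per-group variance estimates as given inputs; the function names are hypothetical.

```python
from typing import List, Sequence


def satterthwaite_df(omega2: Sequence[float], n: Sequence[int]) -> float:
    """Degrees of freedom nu_l for one contrast.

    omega2[h] : estimated variance of the contrast-specific variables in group h
    n[h]      : number of subjects in group h
    """
    numerator = sum(w / nh for w, nh in zip(omega2, n)) ** 2
    denominator = sum(w ** 2 / (nh ** 2 * (nh - 1)) for w, nh in zip(omega2, n))
    return numerator / denominator


def mctp_df(omega2_by_contrast: List[Sequence[float]], n: Sequence[int]) -> float:
    """Overall degrees of freedom: the smallest nu_l, but never below 1."""
    return max(1.0, min(satterthwaite_df(w, n) for w in omega2_by_contrast))
```

With equal variances and equal group sizes, the formula reduces to $a(n - 1)$; for example, two groups of 15 subjects give 28 degrees of freedom. Taking the minimum over all contrasts is the conservative choice.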

8. Simulation Study

Within this section, the precision of the unweighted and weighted estimator and the behavior of the introduced test procedures in terms of their type-I error control are examined. The investigated metrics for the precision are Mean Squared Errors (MSEs) and biases, defined as
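$$\text{bias} = \frac{1}{n_{sim}} \sum_{i_{sim}=1}^{n_{sim}} \frac{1}{ad} \sum_{i=1}^{a} \sum_{s=1}^{d} \left( \widehat{p}_{is}^{*} - \frac{1}{2} \right), \qquad \text{MSE} = \frac{1}{n_{sim}} \sum_{i_{sim}=1}^{n_{sim}} \frac{1}{ad} \sum_{i=1}^{a} \sum_{s=1}^{d} \left( \widehat{p}_{is}^{*} - \frac{1}{2} \right)^2 .$$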
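As a small illustration of these two metrics, the following sketch averages the deviations from the null value $1/2$ first over the $a \cdot d$ cells of one simulation run and then over all runs. The function name and the input layout (one list of estimated effects per run) are assumptions made for this example.

```python
def bias_and_mse(estimates, true_value=0.5):
    """Return (bias, MSE) of simulated relative-effect estimates.

    estimates: one list per simulation run, each holding the a*d
               estimated effects of that run
    """
    n_sim = len(estimates)
    run_bias = [sum(p - true_value for p in run) / len(run) for run in estimates]
    run_mse = [sum((p - true_value) ** 2 for p in run) / len(run) for run in estimates]
    return sum(run_bias) / n_sim, sum(run_mse) / n_sim


# Two toy runs with two cells each (made-up numbers, not simulation results).
bias, mse = bias_and_mse([[0.5, 0.6], [0.4, 0.5]])
```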
As already pointed out by Domhof et al. [14], Konietschke et al. [33], and Rubarth et al. [31], the WTS requires large sample sizes to be able to maintain the type-I error rate. Therefore, only the ATS and the MCTP will be examined.

8.1. Set-Up

The simulation study was conducted in R [38], version 4.1.0, and 10,000 simulation runs were performed for each scenario. The complete simulation code can be found at https://github.com/KerstinRubarth/Clustered (last accessed on 10 November 2021). Due to the abundance of possible scenarios, the simulation study was restricted to the following parameter constellations: the number of independent groups was set to $a = 2$ and the number of repeated measures to $d = 3$. The sample sizes $n_1$ and $n_2$ were chosen to model balanced designs with $(n_1, n_2) \in \{(15, 15), (30, 30)\}$ as well as unbalanced designs with $(n_1, n_2) \in \{(20, 10), (40, 20)\}$. The number of dependent replicates of subject $k$ at time $s$ in group $i$, $m_{isk}$, was chosen to be
  • $m_{isk} \equiv 1$ (no dependent replicates);
  • $m_{isk} \equiv 2$ (two dependent replicates);
  • $m_{isk}$ realizations of a Binomial distribution, $Binom(5, 0.6) + 1$;
  • $m_{isk}$ realizations of a Binomial distribution, $Binom(10, 0.4) + 1$.
The correlation of the dependent replicates within a cluster was set to be
  • $\rho_{isk} \equiv 0$, $\rho_{isk} \equiv 0.3$, $\rho_{isk} \equiv 0.9$ (same correlation within each cluster);
  • $\rho_{isk}$ realizations of a Binomial distribution, $Binom(10, 0.6) / 10$ (different correlations within each cluster).
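The two lists above can be mimicked with a short sketch that draws cluster sizes and intra-cluster correlations. It is not the authors' R code; the scenario labels (`"one"`, `"two"`, `"binom5"`, `"binom10"`) and function names are hypothetical, and the binomial draw is implemented as a sum of Bernoulli trials to stay within the standard library.

```python
import random


def draw_binomial(rng, size, prob):
    """One Binomial(size, prob) draw as a sum of Bernoulli trials."""
    return sum(1 for _ in range(size) if rng.random() < prob)


def draw_cluster_size(rng, scenario):
    """Number of dependent replicates m_isk under one of the four scenarios."""
    if scenario == "one":
        return 1
    if scenario == "two":
        return 2
    if scenario == "binom5":
        return draw_binomial(rng, 5, 0.6) + 1
    if scenario == "binom10":
        return draw_binomial(rng, 10, 0.4) + 1
    raise ValueError("unknown scenario: " + scenario)


def draw_correlation(rng, fixed=None):
    """Intra-cluster correlation: a fixed value or Binomial(10, 0.6) / 10."""
    if fixed is not None:
        return fixed
    return draw_binomial(rng, 10, 0.6) / 10


rng = random.Random(1)  # fixed seed for reproducibility
sizes = [draw_cluster_size(rng, "binom5") for _ in range(1000)]
rhos = [draw_correlation(rng) for _ in range(1000)]
```

Adding 1 to the binomial draw guarantees at least one replicate per cluster. The expected cluster sizes are $5 \cdot 0.6 + 1 = 4$ and $10 \cdot 0.4 + 1 = 5$, and the expected random correlation is 0.6.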
Data were generated by drawing from multivariate normal distributions with expectation $\mu_{ik} = (\mu_{i1}, \ldots, \mu_{i1}, \ldots, \mu_{id}, \ldots, \mu_{id})' \in \mathbb{R}^{m_{ik}}$ ($m_{ik} = \sum_{s=1}^{d} m_{isk}$) and covariance matrices $\Sigma_{ik} \in \mathbb{R}^{m_{ik} \times m_{ik}}$ with
$$\Sigma_{ik} = \begin{pmatrix} A_{i1k} & \sigma_{i12} J & \sigma_{i13} J \\ \sigma_{i21} J & A_{i2k} & \sigma_{i23} J \\ \sigma_{i31} J & \sigma_{i32} J & A_{i3k} \end{pmatrix}, \qquad A_{isk} = \begin{pmatrix} \sigma_{is}^2 & \rho_{isk} & \cdots & \rho_{isk} \\ \rho_{isk} & \sigma_{is}^2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \rho_{isk} \\ \rho_{isk} & \cdots & \rho_{isk} & \sigma_{is}^2 \end{pmatrix} \in \mathbb{R}^{m_{isk} \times m_{isk}},$$
where $J$ denotes a matrix of ones of conforming dimension, that is, every entry of the off-diagonal block linking time points $s$ and $t$ equals $\sigma_{ist}$.
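The variances $\sigma_{is}^2$ and covariances $\sigma_{ist}$ are obtained from the following homo- and heteroscedastic covariance matrices of multivariate normal distributions:
$$\Sigma_1 = \begin{pmatrix} 1 & 0.2 & 0.2 \\ 0.2 & 1 & 0.2 \\ 0.2 & 0.2 & 1 \end{pmatrix} \quad \text{and} \quad \Sigma_2 = \begin{pmatrix} 1 & 0.1 & 0.2 \\ 0.1 & 1.2 & 0.3 \\ 0.2 & 0.3 & 1.5 \end{pmatrix}.$$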
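To make the block structure of the cluster covariance matrix concrete, the following sketch assembles it for one subject from a $d \times d$ matrix of variances and covariances, the intra-cluster values $\rho_{isk}$ and the cluster sizes $m_{isk}$. The function name `build_cluster_cov` is hypothetical, and the sketch follows the description in the text literally (the value $\rho_{isk}$ is placed directly on the within-cluster off-diagonal positions); it is not taken from the published simulation code.

```python
SIGMA_1 = [[1.0, 0.2, 0.2], [0.2, 1.0, 0.2], [0.2, 0.2, 1.0]]
SIGMA_2 = [[1.0, 0.1, 0.2], [0.1, 1.2, 0.3], [0.2, 0.3, 1.5]]


def build_cluster_cov(sigma, rho, m):
    """Covariance matrix of all replicates of one subject.

    sigma: d x d matrix of variances (diagonal) and covariances between time points
    rho:   within-cluster value per time point (length d)
    m:     number of dependent replicates per time point (length d)
    """
    d = len(m)
    # Time point belonging to each row/column of the stacked observation vector.
    time_of = [s for s in range(d) for _ in range(m[s])]
    size = len(time_of)
    cov = [[0.0] * size for _ in range(size)]
    for r in range(size):
        for c in range(size):
            s, t = time_of[r], time_of[c]
            if r == c:
                cov[r][c] = sigma[s][s]   # variance of a single replicate
            elif s == t:
                cov[r][c] = rho[s]        # two replicates of the same cluster
            else:
                cov[r][c] = sigma[s][t]   # replicates from different time points
    return cov


# Example: heteroscedastic setting, rho = 0.3, cluster sizes (2, 1, 2).
cov = build_cluster_cov(SIGMA_2, [0.3, 0.3, 0.3], [2, 1, 2])
```

The resulting matrix is symmetric by construction; whether it is positive definite depends on the chosen values and should be checked before sampling from it.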
Since the simulation study of Rubarth et al. [31] indicated that the performance of the procedure does not depend on the distribution, no other data-generating distributions were considered. Analogously, we restricted our simulations to Missing-Completely-At-Random (MCAR) scenarios, since no different behavior of the methods of Rubarth et al. [31] could be observed in MAR scenarios compared to MCAR scenarios. Thus, the indicators $\lambda_{isk}$ were generated by drawing from Binomial distributions $B(1 - r)$, with $r$ being the percentage of missing values, $r = (r_1, r_2) \in \{(0\%, 0\%), (0\%, 20\%), (10\%, 10\%), (30\%, 30\%)\}$. Since the power of the methods was already investigated in detail by Rubarth et al. [31], the simulation study of the present paper focused solely on type-I error rates. In addition, we investigated the precision of the unweighted and weighted estimators.

8.2. Results—Type-I Error Rate

First, an overview of the impact of different sample sizes in scenarios with completely observed data is given in Figure 1. It can be readily seen that both procedures control the type-I error quite well, even if the sample size is as low as $n_1 = n_2 = 15$. Interestingly, the MCTP works better if sample sizes are unbalanced, whereas the ATS works better in the case of balanced sample sizes.
Next, the impact of missing values is inspected. In Figure 2, the sample sizes were fixed at $n_1 = n_2 = 30$. The empirical type-I error rates of both procedures increase if missing values occur, and the higher the relative frequency of missing values, the higher the type-I error rates. Furthermore, the simulation study indicates that the MCTP is more affected by the occurrence of missing values than the ATS, which was already noted by Rubarth et al. [31].
The relationship between the cluster sizes $m_{isk}$ and the type-I error rates is depicted in Figure 3. For this comparison, the sample sizes were again fixed at $n_1 = n_2 = 30$ and only completely observed data were investigated. It can be readily seen that type-I error rates decrease if two dependent replicates of each subject are present in comparison to a dataset without a clustered structure. However, type-I error rates of the ATS increase if the number of dependent replicates $m_{isk}$ is arbitrary with an expected number of dependent replicates of 5 or 4, respectively. In contrast, the type-I error rates of the MCTP decrease on median in these scenarios.
Next, the influence of the intra-cluster correlations $\rho_{isk}$ on type-I error rates is investigated in scenarios with sample sizes $n_1 = n_2 = 30$ and without missing data (Figure 4). The type-I error rates of the ATS decrease if higher non-arbitrary intra-cluster correlations are present, whereas the type-I error rates of the MCTP increase in the case of higher (non-arbitrary) intra-cluster correlations $\rho_{isk}$. Interestingly, if arbitrary correlations $\rho_{isk}$ are present with a mean correlation of 0.6, the type-I error rates of the ATS increase in comparison to scenarios with fixed correlations, whereas the type-I error rates of the MCTP are on the same level as in scenarios with a fixed correlation $\rho_{isk} = 0.9$.
Figure 5 depicts the impact of the homo- and heteroscedastic covariance matrices $\Sigma_1$ and $\Sigma_2$ in settings with $n_1 = n_2 = 30$ and without missing data. Again, it can be seen that the type-I error rates of the ATS are on median smaller than those of the MCTP for both homo- and heteroscedastic covariance matrices. Type-I error rates of the ATS seem to be a bit smaller in the case of homoscedasticity, whereas type-I error rates of the MCTP seem to be slightly larger in homoscedastic scenarios.
Next, the relationship between unweighted and weighted estimation and type-I error rates is inspected in scenarios with $n_1 = n_2 = 30$ and without missing data (Figure 6). It can be readily seen that the type-I error rates of the ATS do not differ on median between weighted and unweighted estimation of the relative effect $p$; only the interquartile range is increased in the case of weighted estimation. In contrast, the type-I error rates of the MCTP are on median smaller in the case of unweighted estimation of the relative effect $p$, but without an enlargement of the respective interquartile range.
To conclude, an analysis of the joint impact of unweighted and weighted estimation of the relative effect $p$ and the fixed intra-cluster correlation $\rho_{isk}$ is presented in Figure 7 (in scenarios with $n_1 = n_2 = 30$ and without missing data). The type-I error rates of the ATS decrease if the intra-cluster correlations $\rho_{isk}$ increase, as already depicted in Figure 4. Interestingly, the medians of the type-I error rates are comparable for unweighted and weighted estimation if no intra-cluster correlation is present. However, in these scenarios, the interquartile range of the type-I error rates is considerably larger for weighted estimation than for unweighted estimation. If a medium intra-cluster correlation is present, type-I error rates of the unweighted estimator are on median smaller than those of the weighted estimator. Here, the interquartile ranges are quite comparable. In scenarios with high intra-cluster correlations, the weighted estimator yields smaller type-I error rates on median, as well as a larger interquartile range.
Again, as already outlined in Figure 4, type-I error rates of the MCTP increase with higher intra-cluster correlations. The type-I error rates of the unweighted and weighted estimator are quite comparable if no intra-cluster correlation is present. However, they are quite different if a medium correlation is present: type-I error rates obtained with the unweighted estimator tend to be smaller on median than those obtained from weighted estimation. If high correlations are present, the unweighted estimator yields smaller type-I error rates than the weighted version.

8.3. Results—Precision

Analogously to the previous section, we first explore the impact of the sample size on the precision of the unweighted and weighted estimators in scenarios with completely observed data. It can be readily seen in Figure 8 that the MSEs of the unweighted and weighted estimators are quite comparable. The MSEs decrease as sample sizes increase; balanced settings exhibit smaller MSEs than unbalanced settings. Regarding the bias of the estimators, scenarios with smaller sample sizes tend to exhibit biases in the negative direction, whereas scenarios with larger sample sizes exhibit biases in the positive direction. Interestingly, the interquartile range of biases is considerably enlarged in scenarios with $n_1 = 40$ and $n_2 = 20$.
Next, the impact of missing data on the precision of the estimators is inspected in scenarios with $n_1 = n_2 = 30$ (see Figure 9). As seen before, the MSEs of unweighted and weighted estimators are quite comparable. The MSEs increase with an increasing missing rate. Interestingly, the interquartile ranges of the biases of the two estimators are quite different; the weighted estimator exhibits a larger interquartile range (especially if 10% of data are missing) than the unweighted estimator. Further, biases of the unweighted estimator tend to be positive (except if the missing rate is 30%). In contrast, the biases of the weighted estimator tend to be negative.
In Figure 10, the influence of the number of dependent replicates $m_{isk}$ on the precision is presented (again in scenarios with $n_1 = n_2 = 30$ and without missing data). As already pointed out, the distribution of MSEs of the unweighted and weighted estimator is very similar. Interestingly, there is very little variation if only one observation per subject and time point is available compared to scenarios with more possibly dependent replicates. Further, the MSEs decrease considerably as soon as two possibly dependent observations are available, and the more clustered data are available, the smaller the MSEs. The same observations can be made regarding the biases of both estimators.
Finally, the relationship between intra-cluster correlations and precision is inspected (see Figure 11, again in scenarios with $n_1 = n_2 = 30$ and without missing data). Again, the MSEs of the unweighted and the weighted estimator do not differ much. It can be readily seen that in scenarios with fixed cluster correlations, the MSEs increase with increasing intra-cluster correlations. In contrast, the biases of both estimators approach 0 with increasing fixed intra-cluster correlations, but the two estimators behave differently: the biases of the unweighted estimator tend to be positive, whereas the biases of the weighted estimator tend to be negative.

9. Analysis of the Motivating Example

The parameter pRNFL from the SUPREMES study introduced in Section 2 can now be analyzed using the newly proposed methodology for factorial repeated measures designs with dependent replicates. It can be readily seen in Figure 12 and Figure 13 that pRNFL baseline values in the placebo group tend to be smaller than those from the EGCG group.
Further, Figure 13 indicates that pRNFL values seem to increase at 2-year follow-up; however, this needs to be interpreted with caution, since almost 50% of the patients could not be measured at this time point. These were mostly patients with smaller baseline pRNFL values. This could possibly violate the Missing Completely at Random (MCAR) assumption. However, the simulation study of Rubarth et al. [31] indicated that the proposed method is robust against violations of the MCAR assumption. In order to account for the baseline differences between the two groups, following Klumbies et al. [19], the analysis is not based on raw pRNFL values but on differences to baseline. In contrast to the analysis of Klumbies et al. [19], the MCTP on baseline differences is applied. The Tukey contrast was chosen to compare all pairwise differences; thus, the contrast matrix for testing the null hypothesis $H_0^p: C p = 0$ is as follows:
$$C = \begin{pmatrix} c_1' \\ c_2' \\ c_3' \\ c_4' \\ c_5' \\ c_6' \end{pmatrix} = \begin{pmatrix} -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ -1 & 0 & 0 & 1 \\ 0 & -1 & 1 & 0 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -1 & 1 \end{pmatrix}.$$
The relative effects were estimated by using the unweighted version of the estimator, leading to $\widehat{p} = (\widehat{p}_{11}, \widehat{p}_{12}, \widehat{p}_{21}, \widehat{p}_{22})' = (0.525, 0.445, 0.505, 0.525)'$, indicating that the differences to baseline are the largest at 1-year follow-up simultaneously in group 1 (intervention group) and group 2 (Placebo group) and the smallest at 2-year follow-up in group 1. The results, including the values of the test statistics $T$, p-values and 95% simultaneous confidence intervals, are displayed in Table 2.
Note that no adjustment for multiplicity was necessary, since the same critical value obtained from the MCTP was used for each comparison. It follows that no evidence exists to reject the global null hypothesis $H_0^p: C p = 0$, resulting in the same conclusion as already pointed out by Klumbies et al. [19]: retinal thickness analysis did not reveal neuroprotective effects of EGCG, especially when considering the contrasts $\widehat{p}_{21} - \widehat{p}_{11} = 0$ and $\widehat{p}_{22} - \widehat{p}_{12} = 0$, which allow a direct comparison of the differences to baseline at 1-year and 2-year follow-up in the two groups.
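The point estimates of the six pairwise contrasts can be reproduced from the reported effect estimates with a few lines. This is a minimal sketch that assumes the usual all-pairs (Tukey) convention, in which each row subtracts the earlier from the later component; the helper `tukey_contrasts` is hypothetical. It only yields the estimated contrasts, not the test statistics, p-values or confidence intervals of Table 2, which additionally require the covariance estimator.

```python
from itertools import combinations


def tukey_contrasts(k):
    """All-pairs contrast matrix for k effects (assumed sign convention: -1 before +1)."""
    rows = []
    for i, j in combinations(range(k), 2):
        row = [0] * k
        row[i], row[j] = -1, 1
        rows.append(row)
    return rows


# Reported estimates in the order (p_11, p_12, p_21, p_22).
p_hat = [0.525, 0.445, 0.505, 0.525]
C = tukey_contrasts(4)
contrast_estimates = [sum(c * p for c, p in zip(row, p_hat)) for row in C]
```

Under this convention, the second and fifth rows correspond to the between-group comparisons at 1-year and 2-year follow-up discussed in the text.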

10. Discussion and Conclusions

In the present paper, we presented different estimation techniques for Wilcoxon–Mann–Whitney effects in factorial repeated measures designs with clustered data. As a first step, knowing whether cluster sizes are informative or non-informative is key and plays a major role in precise effect size estimation. Furthermore, as indicated by Zou [28], the use of the intraclass correlation increases the precision of the estimators. Besides estimation, we discussed how to test global and multiple hypotheses in terms of Wilcoxon–Mann–Whitney effects using any of the aforementioned estimators. Here, no specific data distribution (symmetric or asymmetric) is required. The presented estimators and test procedures should be preferred over standard parametric methods such as Linear Mixed Models or Generalized Estimating Equations in scenarios with small sample sizes, heteroscedastic variances, or count or ordinal outcomes. We recommend using the MCTP instead of quadratic-type test procedures in most practical applications, since testing global hypotheses does not usually answer the research questions. However, the MCTP presented in this paper does not precisely hold the nominal type-I error rate in the case of very small sample sizes, high correlations, and strong heteroscedasticity between groups or time points. Recently, Friedrich et al. [39] proposed novel resampling methods in purely non-parametric designs. Extending these ideas to such designs will be part of future research, in particular to improve the MCTP in “extreme” scenarios.
Although an extensive simulation study was conducted to evaluate the precision and type-I error rates of the procedures in several scenarios, it is advised to conduct further simulation studies in practical applications, e.g., in the planning or data analysis phase of a study for a specific scenario. Further examinations indicate that the methods are applicable even in situations with rather small sample sizes, such as $n = 10$. The actual level and accuracy depend, however, on the design of interest.
Furthermore, it is important to note that many scientists in applied research fields, e.g., biomedicine, have the misconception that the Wilcoxon–Mann–Whitney (WMW) test is a test for equality of means or medians when outcomes are metric and distributions are skewed or ordinal, and that this test is the non-parametric equivalent of a classic two-sample t-test [40]. As investigated by Fagerland et al. [41], the true significance level of the WMW test deviates enormously from the nominal level when the test is used for comparing means or medians in scenarios with deviations from a pure shift model (two populations having equal shapes and scales). In practical applications, the pure shift model is rarely present, since skewed distributions with different means most likely also have different variances. Especially in such scenarios, the Brunner–Munzel test [8] should be applied. Another disadvantage of the application of WMW tests, as noted by Bergmann et al. [42], is that many versions and implementations exist in statistical software packages, e.g., large sample approximation, exact permutation form, versions with or without correction for continuity or ties and different algorithm variants, all leading to possibly different p-values and eventually to different conclusions. In this work we present a unified approach that needs neither a correction for ties nor one for continuity. Furthermore, the WMW test and its p-value are rarely accompanied by their corresponding effect estimate, the Wilcoxon–Mann–Whitney parameter $p$, and its corresponding confidence interval. As noted by Fay et al. [43], classic confidence interval procedures for the WMW parameter are not compatible with exact WMW tests, meaning that the test rejects a hypothesis at significance level $\alpha$ but the confidence interval for $p$ includes $\frac{1}{2}$. Thus, Fay et al. [43] developed compatible confidence intervals for asymptotic WMW tests and for some exact WMW tests. Furthermore, Fay et al. [44] indicate that the WMW parameter $p$ can be framed as a causal parameter (the probability that a randomly chosen subject from one population, e.g., the treatment group in a randomized-controlled trial, will have a larger response than one subject from the other population, e.g., the control group). However, this parameter is not equal to another closely related and non-identifiable causal effect, the probability that a randomly chosen subject will have a larger response under treatment than under control [44]. This paradox was first introduced by Hand et al. [45]. Therefore, caution is required when interpreting effect estimates from non-parametric procedures. Thus, the literacy in non-parametric statistics of scientists working in applied fields should be fostered. This work aims to accomplish this by first introducing and explaining Wilcoxon–Mann–Whitney parameters $p$ in special designs and then, in a second step, providing a flexible model for the analysis of factorial repeated measures designs with a clustered structure that allows for missing values. To make the methods user-friendly, it is planned to extend the R software package nparLD [46].

Author Contributions

Conceptualization, K.R. and F.K.; methodology, K.R., P.S. and F.K.; formal analysis, K.R.; software, K.R. and P.S.; data curation, H.G.Z.; writing—original draft preparation, K.R. and P.S.; writing review and editing K.R., P.S., F.K. and H.G.Z.; funding acquisition, F.K. All authors have read and agreed to the published version of the manuscript.

Funding

The work of Kerstin Rubarth and Frank Konietschke was funded by Deutsche Forschungsgemeinschaft Grant/Award Number DFG KO 4680/3-2. The work of Paavo Sattler was funded by Deutsche Forschungsgemeinschaft Grant/Award Number DFG PA 2409/3-2.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The R code of the simulation study is available at https://github.com/KerstinRubarth/Clustered, last accessed on 10 November 2021.

Conflicts of Interest

The authors declare no conflict of interest. H.G.Z. received honoraria from Bayer Healthcare and Novartis, and research grants from Novartis, independent of the current study.

Appendix A

In this section, we present all the proofs of the theoretical results achieved.
Proof of Proposition 1.
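For the concrete weights, we calculate
$$\lambda_{jtk}\, w_{jtk}^{\upsilon_1}\, m_{jtk} = \lambda_{jtk}\, m_{jtk} \cdot \frac{1}{m_{jtk}\, \lambda_{jt}} \leq \frac{1}{\lambda_{\min}}, \qquad \lambda_{jtk}\, w_{jtk}^{\upsilon_2}\, m_{jtk} = \lambda_{jtk}\, m_{jtk} \cdot \frac{1}{m_{jt}} \leq \frac{M_0}{\lambda_{\min}}.$$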
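Finally, we calculate
$$\sum_{k=1}^{n_i} m_{isk}\, \lambda_{isk}\, w_{isk}^{\upsilon_1} = \sum_{k=1}^{n_i} m_{isk}\, \lambda_{isk} \cdot \frac{1}{\lambda_{is}\, m_{isk}} = \frac{1}{\lambda_{is}} \sum_{k=1}^{n_i} \lambda_{isk} = 1,$$
and
$$\sum_{k=1}^{n_i} m_{isk}\, \lambda_{isk}\, w_{isk}^{\upsilon_2} = \sum_{k=1}^{n_i} m_{isk}\, \lambda_{isk} \cdot \frac{1}{m_{is}} = 1.$$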
Since these are all necessary properties of the weights, the following results hold for all kinds of weights fulfilling these properties.
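The normalisation property used in this proof can be checked numerically. The sketch below assumes, as the calculations above suggest, that $\lambda_{is}$ is the number of subjects with an observed cluster in cell $(i, s)$ and $m_{is}$ the total number of observed replicates in that cell; the function names are hypothetical.

```python
def weights_v1(lam, m):
    """Weights 1 / (lambda_is * m_isk): every observed cluster carries the same total weight."""
    n_observed = sum(lam)
    return [1.0 / (n_observed * mk) for mk in m]


def weights_v2(lam, m):
    """Weights 1 / m_is: every observed replicate carries the same weight."""
    n_replicates = sum(l * mk for l, mk in zip(lam, m))
    return [1.0 / n_replicates for _ in m]


def total_weight(lam, m, w):
    """sum_k m_isk * lambda_isk * w_isk, which should equal 1."""
    return sum(mk * l * wk for mk, l, wk in zip(m, lam, w))


# Four subjects; the second one is missing in this cell (lambda = 0).
lam = [1, 0, 1, 1]
m = [2, 3, 1, 4]
total_v1 = total_weight(lam, m, weights_v1(lam, m))
total_v2 = total_weight(lam, m, weights_v2(lam, m))
```

Both totals equal 1 up to floating-point error, regardless of how unbalanced the cluster sizes are.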
Moreover, it becomes clear that condition A1.3 is only necessary for the first inequality and the weighted estimator. Since these inequalities are required for nearly all of the following statements, the condition is assumed there. Note that if the assumption of bounded cluster sizes is difficult to justify in practical applications, the unweighted estimator can be used without any restrictions.
Proposition A1.
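For the empirical distributions, under condition A1.3, it holds that
$$\| \widehat{F}_{is}^{*} - F_{is} \|_\infty \xrightarrow{a.s.} 0, \ \lambda_{\min} \to \infty \quad \text{and} \quad \| \widehat{G}^{*} - G \|_\infty \xrightarrow{a.s.} 0, \ \lambda_{\min} \to \infty, \qquad * \in \{\upsilon_1, \upsilon_2\}.$$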
Proof. 
First, we demonstrate the pointwise almost sure convergence of the empirical distribution function $\widehat{F}_{is}^{*}$. Denote by $\Xi_{is} = \{k = 1, \ldots, n_i : \lambda_{isk} > 0\}$ the set containing the indices of subjects from group $i$ with an existing observation in the $s$-th component. It is clear that $|\Xi_{is}| = \lambda_{is}$. Moreover, for fixed $x \in \mathbb{R}$ we define independent random variables $Y_{isk}^{*} := \lambda_{is}\, w_{isk}^{*} \sum_{u=1}^{m_{isk}} c(x - X_{isku})$. Then it holds that
$$\widehat{F}_{is}^{*}(x) = \frac{1}{\lambda_{is}} \sum_{k=1}^{n_i} \lambda_{isk}\, Y_{isk}^{*} = \frac{1}{\lambda_{is}} \sum_{\ell \in \Xi_{is}} Y_{is\ell}^{*}.$$
For the expectation of this sum, we calculate
$$E\left( \sum_{\ell \in \Xi_{is}} Y_{is\ell}^{*} \right) = \sum_{\ell \in \Xi_{is}} \lambda_{is}\, w_{is\ell}^{*} \sum_{u=1}^{m_{is\ell}} E\left( c(x - X_{is\ell u}) \right) = \lambda_{is} \sum_{\ell \in \Xi_{is}} w_{is\ell}^{*}\, m_{is\ell}\, F_{is}(x) = \lambda_{is}\, F_{is}(x) \sum_{k=1}^{n_i} \lambda_{isk}\, w_{isk}^{*}\, m_{isk} = \lambda_{is}\, F_{is}(x).$$
Because of $|c(x)| \leq 1$, for the variance of $Y_{is\ell}^{*}$ we obtain
$$Var(Y_{is\ell}^{*}) = \lambda_{is}^2\, (w_{is\ell}^{*})^2\, Var\left( \sum_{u=1}^{m_{is\ell}} c(x - X_{is\ell u}) \right) \leq \lambda_{is}^2\, (w_{is\ell}^{*})^2\, m_{is\ell}^2 .$$
For the application of the strong law of large numbers we finally consider
$$\frac{1}{\lambda_{is}^2} \sum_{\ell \in \Xi_{is}} Var(Y_{is\ell}^{*}) \leq \sum_{\ell \in \Xi_{is}} (w_{is\ell}^{*})^2\, m_{is\ell}^2 = \sum_{k=1}^{n_i} \lambda_{isk}^2\, (w_{isk}^{*})^2\, m_{isk}^2 \leq O(\lambda_{\min}^{-1}) \sum_{k=1}^{n_i} \lambda_{isk}\, w_{isk}^{*}\, m_{isk} = O(\lambda_{\min}^{-1}) \to 0 \quad \text{for } \lambda_{\min} \to \infty.$$
Thus, it holds that
$$\widehat{F}_{is}^{*}(x) \xrightarrow{a.s.} \frac{1}{\lambda_{is}} E\left( \sum_{\ell \in \Xi_{is}} Y_{is\ell}^{*} \right) = F_{is}(x).$$
Replacing $c(x)$ by $c^{(+)}(x)$ or $c^{(-)}(x)$, respectively, leads to the same convergence for the right-continuous and left-continuous versions of these distribution functions.
Now, this pointwise convergence has to be expanded for the supremum norm. This was already done by Domhof [47] for a similar setting and only requires the pointwise convergence proven above. The result for G ^ * and G follows from this with the triangle inequality. □
Proof of Proposition 2.
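To prove the asymptotic unbiasedness, we consider the single components and calculate
$$\begin{aligned}
| E(\widehat{p}_{is}^{*}) - p_{is} | &= \left| \frac{1}{ad} \sum_{k=1}^{n_i} \sum_{j=1}^{a} \sum_{t=1}^{d} \sum_{\ell=1}^{n_j} \sum_{u=1}^{m_{isk}} \sum_{v=1}^{m_{jt\ell}} \lambda_{isk}\, \lambda_{jt\ell}\, w_{jt\ell}^{*}\, w_{isk}^{*} \left[ E\left( c(X_{isku} - X_{jt\ell v}) \right) - \int F_{jt}\, dF_{is} \right] \right| \\
&\leq \frac{1}{ad} \sum_{k=1}^{n_i} \sum_{j=1}^{a} \sum_{t=1}^{d} \sum_{\ell=1}^{n_j} \sum_{u=1}^{m_{isk}} \sum_{v=1}^{m_{jt\ell}} \lambda_{isk}\, \lambda_{jt\ell}\, w_{jt\ell}^{*}\, w_{isk}^{*} \left| E\left( c(X_{isku} - X_{jt\ell v}) \right) - \int F_{jt}\, dF_{is} \right| \\
&= \frac{1}{ad} \sum_{k=1}^{n_i} \sum_{j=1}^{a} \sum_{t=1}^{d} \lambda_{jtk}\, \lambda_{isk}\, w_{isk}^{*}\, w_{jtk}^{*} \sum_{u=1}^{m_{isk}} \sum_{v=1}^{m_{jtk}} \left| E\left( c(X_{isku} - X_{jtkv}) \right) - \int F_{jt}\, dF_{is} \right| \\
&\leq \frac{1}{ad} \sum_{k=1}^{n_i} \sum_{j=1}^{a} \sum_{t=1}^{d} \lambda_{jtk}\, \lambda_{isk}\, w_{isk}^{*}\, w_{jtk}^{*}\, m_{isk}\, m_{jtk} \\
&\leq \frac{1}{ad} \sum_{j=1}^{a} \sum_{t=1}^{d} \sum_{k=1}^{n_i} \lambda_{isk}\, w_{isk}^{*}\, m_{isk} \cdot O(\lambda_{\min}^{-1}) = O(\lambda_{\min}^{-1}).
\end{aligned}$$
It is clear that through conditions A1.1 and A1.2 we can also substitute this with $O(N^{-1})$.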
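For the second part we consider
$$\begin{aligned}
| \widehat{p}_{is}^{*} - p_{is} | &\leq \frac{1}{ad} \sum_{j=1}^{a} \sum_{t=1}^{d} \left| \int \widehat{F}_{jt}^{*}\, d\widehat{F}_{is}^{*} - \int F_{jt}\, dF_{is} \right| \\
&= \frac{1}{ad} \sum_{j=1}^{a} \sum_{t=1}^{d} \left| \int \widehat{F}_{jt}^{*}\, d\widehat{F}_{is}^{*} - \int F_{jt}\, d\widehat{F}_{is}^{*} + \int F_{jt}\, d\widehat{F}_{is}^{*} - \int F_{jt}\, dF_{is} \right| \\
&\leq \frac{1}{ad} \sum_{j=1}^{a} \sum_{t=1}^{d} \left( \left| \int (\widehat{F}_{jt}^{*} - F_{jt})\, d\widehat{F}_{is}^{*} \right| + \left| \int F_{jt}\, d(\widehat{F}_{is}^{*} - F_{is}) \right| \right) \\
&= \frac{1}{ad} \sum_{j=1}^{a} \sum_{t=1}^{d} \left( \left| \int (\widehat{F}_{jt}^{*} - F_{jt})\, d\widehat{F}_{is}^{*} \right| + \left| \int (F_{is} - \widehat{F}_{is}^{*})\, dF_{jt} \right| \right) \\
&\leq \frac{1}{ad} \sum_{j=1}^{a} \sum_{t=1}^{d} \left( \| \widehat{F}_{jt}^{*} - F_{jt} \|_\infty + \| F_{is} - \widehat{F}_{is}^{*} \|_\infty \right).
\end{aligned}$$
From Proposition A1, we know that the differences between the distribution function and the empirical distribution function converge to zero, independent of the kind of weights. Therefore, as a finite sum of null sequences, it holds that $\widehat{p}_{is}^{*} - p_{is} \xrightarrow{a.s.} 0$. From this, it directly follows that $\widehat{p}^{*} - p \xrightarrow{a.s.} 0$. □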
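The plug-in estimator analysed in this proof can be sketched in a few lines. The sketch assumes the representation $\widehat{p}_{is}^{*} = \int \widehat{G}^{*} d\widehat{F}_{is}^{*}$ with $\widehat{F}_{is}^{*}(x) = \sum_k \lambda_{isk} w_{isk}^{*} \sum_u c(x - X_{isku})$ and the normalized count function $c$; the data layout (a dictionary from cell labels to lists of clusters, with an empty list marking a missing cluster), the scheme labels `"v1"` and `"v2"`, and the function names are hypothetical.

```python
def count(d):
    """Normalized count function: 1 if d > 0, 1/2 if d == 0, 0 otherwise."""
    if d > 0:
        return 1.0
    return 0.5 if d == 0 else 0.0


def relative_effects(data, scheme):
    """Plug-in estimates of the relative effects for every cell.

    data:   dict mapping a cell label to a list of clusters (lists of replicates);
            an empty list marks a missing cluster
    scheme: "v1" for weights 1 / (lambda_is * m_isk), "v2" for weights 1 / m_is
    """
    weighted_obs = {}
    for cell, clusters in data.items():
        observed = [c for c in clusters if c]
        n_clusters = len(observed)
        n_values = sum(len(c) for c in observed)
        obs = []
        for c in observed:
            w = 1.0 / (n_clusters * len(c)) if scheme == "v1" else 1.0 / n_values
            obs.extend((x, w) for x in c)
        weighted_obs[cell] = obs
    n_cells = len(data)

    def g_hat(x):
        # Unweighted mean of the cell-wise empirical distribution functions.
        return sum(w * count(x - y) for obs in weighted_obs.values() for y, w in obs) / n_cells

    return {cell: sum(w * g_hat(x) for x, w in obs) for cell, obs in weighted_obs.items()}


# Toy data (made up): two cells with clusters of unequal size, one missing cluster.
toy = {"A": [[1.0, 2.0, 2.5], [3.0], []], "B": [[2.0, 4.0], [5.0]]}
p_v1 = relative_effects(toy, "v1")
p_v2 = relative_effects(toy, "v2")
```

Because $c(d) + c(-d) = 1$ and the weights in every cell sum to one, the estimated effects always average to $1/2$ over all cells, which serves as a simple check of an implementation.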
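With $\psi_{hk}^{*} = (\Psi_{11,hk}^{*}, \Psi_{12,hk}^{*}, \ldots, \Psi_{ad,hk}^{*})'$, $h = 1, \ldots, a$, $k = 1, \ldots, n_h$, consider the vector
$$\sqrt{N} B = \sum_{h=1}^{a} \sqrt{N} \cdot \frac{1}{n_h} \cdot \sum_{k=1}^{n_h} \left[ \psi_{hk}^{*} - E(\psi_{hk}^{*}) \right]$$
based on random variables defined as
$$\Psi_{is,hk}^{*} := \begin{cases} - \dfrac{n_h}{ad} \displaystyle\sum_{t=1}^{d} \sum_{u=1}^{m_{htk}} \lambda_{htk}\, w_{htk}^{*}\, F_{is}(X_{htku}), & \text{for } h \neq i, \\[2ex] \dfrac{n_h}{ad} \displaystyle\sum_{j \neq i}^{a} \sum_{t=1}^{d} \sum_{u=1}^{m_{isk}} \lambda_{isk}\, w_{isk}^{*}\, F_{jt}(X_{isku}) + \dfrac{n_h}{ad} \sum_{t=1}^{d} \left( \sum_{u=1}^{m_{isk}} \lambda_{isk}\, w_{isk}^{*}\, F_{it}(X_{isku}) - \sum_{u=1}^{m_{itk}} \lambda_{itk}\, w_{itk}^{*}\, F_{is}(X_{itku}) \right), & \text{else}, \end{cases}$$
with expectation values
$$\beta_{is,hk}^{*} := E(\Psi_{is,hk}^{*}) = \begin{cases} - \dfrac{n_h}{ad} \displaystyle\sum_{t=1}^{d} \lambda_{htk}\, m_{htk}\, w_{htk}^{*}\, p(is, ht), & \text{for } h \neq i, \\[2ex] \dfrac{n_h}{ad} \displaystyle\sum_{j \neq i}^{a} \sum_{t=1}^{d} m_{isk}\, \lambda_{isk}\, w_{isk}^{*}\, p(jt, is) + \dfrac{n_h}{ad} \sum_{t=1}^{d} \left( m_{isk}\, \lambda_{isk}\, w_{isk}^{*}\, p(it, is) - m_{itk}\, \lambda_{itk}\, w_{itk}^{*}\, p(is, it) \right), & \text{else}. \end{cases}$$
The term $\sqrt{N} B$ can be used to calculate the asymptotic distribution of $\sqrt{N} (\widehat{p}^{*} - p)$.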
Proof of Theorem 1.
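It is clear that
$$\int F_{is}\, d\widehat{G}^{*} = \frac{1}{ad} \sum_{j=1}^{a} \sum_{t=1}^{d} \int F_{is}\, d\widehat{F}_{jt}^{*} = \frac{1}{ad} \sum_{j=1}^{a} \sum_{t=1}^{d} \sum_{k=1}^{n_j} \lambda_{jtk}\, w_{jtk}^{*} \sum_{u=1}^{m_{jtk}} F_{is}(X_{jtku}),$$
$$\int G\, d\widehat{F}_{is}^{*} = \sum_{k=1}^{n_i} \lambda_{isk}\, w_{isk}^{*} \sum_{u=1}^{m_{isk}} G(X_{isku}) = \frac{1}{ad} \sum_{j=1}^{a} \sum_{t=1}^{d} \sum_{k=1}^{n_i} \lambda_{isk}\, w_{isk}^{*} \sum_{u=1}^{m_{isk}} F_{jt}(X_{isku}).$$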
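Analogously to Theorem 1 from Rubarth et al. [31], for the $s$-th component from the $i$-th group it holds that
$$\begin{aligned}
\sqrt{N} (\widehat{p}_{is}^{*} - p_{is}) &= \sqrt{N} \int \widehat{G}^{*}\, d\widehat{F}_{is}^{*} - \sqrt{N} \int G\, dF_{is} \\
&= \sqrt{N} \int \widehat{G}^{*}\, d(\widehat{F}_{is}^{*} - F_{is}) + \sqrt{N} \int \widehat{G}^{*}\, dF_{is} - \sqrt{N} p_{is} \\
&= \sqrt{N} \int G\, d(\widehat{F}_{is}^{*} - F_{is}) + \sqrt{N} \int \widehat{G}^{*}\, dF_{is} - \sqrt{N} p_{is} + o_P(1) \\
&= \sqrt{N} \left( \int G\, d\widehat{F}_{is}^{*} + \int \widehat{G}^{*}\, dF_{is} - 2 p_{is} \right) + o_P(1) \\
&= \sqrt{N} \left( \int G\, d\widehat{F}_{is}^{*} + 1 - \int F_{is}\, d\widehat{G}^{*} - 2 p_{is} \right) + o_P(1) \\
&= \sqrt{N} \left( \frac{1}{ad} \sum_{j=1}^{a} \sum_{t=1}^{d} \sum_{k=1}^{n_i} \sum_{r=1}^{m_{isk}} \lambda_{isk}\, w_{isk}^{*}\, F_{jt}(X_{iskr}) - \frac{1}{ad} \sum_{j=1}^{a} \sum_{t=1}^{d} \sum_{\ell=1}^{n_j} \sum_{u=1}^{m_{jt\ell}} \lambda_{jt\ell}\, w_{jt\ell}^{*}\, F_{is}(X_{jt\ell u}) + (1 - 2 p_{is}) \right) + o_P(1) \\
&= \sqrt{N} \left( \frac{1}{ad} \sum_{j \neq i}^{a} \sum_{t=1}^{d} \sum_{k=1}^{n_i} \sum_{u=1}^{m_{isk}} \lambda_{isk}\, w_{isk}^{*}\, F_{jt}(X_{isku}) + \frac{1}{ad} \sum_{k=1}^{n_i} \sum_{t=1}^{d} \left[ \sum_{u=1}^{m_{isk}} \lambda_{isk}\, w_{isk}^{*}\, F_{it}(X_{isku}) - \sum_{u=1}^{m_{itk}} \lambda_{itk}\, w_{itk}^{*}\, F_{is}(X_{itku}) \right] \right. \\
&\qquad \left. - \frac{1}{ad} \sum_{j \neq i}^{a} \sum_{t=1}^{d} \sum_{\ell=1}^{n_j} \sum_{u=1}^{m_{jt\ell}} \lambda_{jt\ell}\, w_{jt\ell}^{*}\, F_{is}(X_{jt\ell u}) + (1 - 2 p_{is}) \right) + o_P(1) \\
&= \sqrt{N} \left( \sum_{h=1}^{a} \frac{1}{n_h} \sum_{k=1}^{n_h} \Psi_{is,hk}^{*} + (1 - 2 p_{is}) \right) + o_P(1),
\end{aligned}$$
where the stochastic convergence to zero, denoted by $o_P(1)$, holds for $\lambda_{\min} \to \infty$.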
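Considering now the expectations, this leads to
$$\beta_{is,hk}^{*} = - \frac{n_h}{ad} \sum_{t=1}^{d} \lambda_{htk}\, m_{htk}\, w_{htk}^{*}\, p(is, ht)$$
for $h \neq i$ and otherwise to
$$\beta_{is,ik}^{*} = \frac{n_i}{ad} \sum_{j \neq i}^{a} \sum_{t=1}^{d} m_{isk}\, \lambda_{isk}\, w_{isk}^{*}\, p(jt, is) + \frac{n_i}{ad} \sum_{t=1}^{d} \left( m_{isk}\, \lambda_{isk}\, w_{isk}^{*}\, p(it, is) - m_{itk}\, \lambda_{itk}\, w_{itk}^{*}\, p(is, it) \right)$$
with $p(is, ht) := \int F_{is}\, dF_{ht}$. Therefore, we can calculate
$$\begin{aligned}
\sum_{k=1}^{n_i} E(\Psi_{is,ik}^{*}) &= \sum_{k=1}^{n_i} \left[ \frac{n_i}{ad} \sum_{j \neq i}^{a} \sum_{t=1}^{d} m_{isk}\, \lambda_{isk}\, w_{isk}^{*}\, p(jt, is) + \frac{n_i}{ad} \sum_{t=1}^{d} \left( m_{isk}\, \lambda_{isk}\, w_{isk}^{*}\, p(it, is) - m_{itk}\, \lambda_{itk}\, w_{itk}^{*}\, p(is, it) \right) \right] \\
&= \sum_{j \neq i}^{a} \frac{n_i}{ad} \sum_{t=1}^{d} p(jt, is) + \frac{n_i}{ad} \sum_{t=1}^{d} \left( p(it, is) - p(is, it) \right)
\end{aligned}$$
as well as, for $h \neq i$,
$$\sum_{k=1}^{n_h} E(\Psi_{is,hk}^{*}) = - \frac{n_h}{ad} \sum_{k=1}^{n_h} \sum_{t=1}^{d} \lambda_{htk}\, m_{htk}\, w_{htk}^{*}\, p(is, ht) = - \frac{n_h}{ad} \sum_{t=1}^{d} p(is, ht).$$
In total, we obtain
$$\begin{aligned}
E\left( \sum_{h=1}^{a} \frac{1}{n_h} \sum_{k=1}^{n_h} \Psi_{is,hk}^{*} \right) &= \sum_{h \neq i}^{a} \frac{1}{n_h} \sum_{k=1}^{n_h} E(\Psi_{is,hk}^{*}) + \frac{1}{n_i} \sum_{k=1}^{n_i} E(\Psi_{is,ik}^{*}) \\
&= \frac{1}{ad} \left[ - \sum_{h \neq i}^{a} \sum_{t=1}^{d} p(is, ht) + \sum_{j \neq i}^{a} \sum_{t=1}^{d} p(jt, is) + \sum_{t=1}^{d} \left( p(it, is) - p(is, it) \right) \right] \\
&= \frac{1}{ad} \left[ - \sum_{h=1}^{a} \sum_{t=1}^{d} p(is, ht) + \sum_{j=1}^{a} \sum_{t=1}^{d} p(jt, is) \right] = -(1 - p_{is}) + p_{is} = -(1 - 2 p_{is}).
\end{aligned}$$
Together with the other equation, we obtain
$$\sqrt{N} (\widehat{p}_{is}^{*} - p_{is}) = \sqrt{N} \sum_{h=1}^{a} \frac{1}{n_h} \cdot \sum_{k=1}^{n_h} \left[ \Psi_{is,hk}^{*} - E(\Psi_{is,hk}^{*}) \right] + o_P(1)$$
and, respectively,
$$\sqrt{N} (\widehat{p}^{*} - p) = \sqrt{N} \sum_{h=1}^{a} \frac{1}{n_h} \cdot \sum_{k=1}^{n_h} \left[ \psi_{hk}^{*} - E(\psi_{hk}^{*}) \right] + o_P(1).$$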
Proof of Theorem 2. 
Through the construction of $\Psi_{is,hk}$ and A1.2, it holds that $|\Psi_{is,hk}| \leq O(N^0)$; thus, these random variables have finite moments. Due to the independence of all $\psi_{hk}$, by the Lindeberg–Feller theorem we obtain for $B_h = \frac{1}{n_h} \sum_{k=1}^{n_h} \left[ \psi_{hk} - E(\psi_{hk}) \right]$ that $\sqrt{n_h} B_h$ is asymptotically distributed like a normally distributed random vector with expectation vector $0$ and covariance matrix $V_{N,h}$. Here, two of the requirements of the Lindeberg–Feller theorem are obviously fulfilled, since the random variables are uniformly bounded and centered. However, it is not certain that $V_{N,h}$, which depends on $N$, converges to a fixed matrix. Since all random variables are bounded, the covariance matrix is also bounded, and for each sequence we can find a subsequence along which the covariance matrix converges. For each of these subsequences, the asymptotic distribution of $B_h$ matches the actual normal distribution. Therefore, the result holds in general. Through the independence of the $B_h$ and $\sqrt{N} B = \sum_{h=1}^{a} \sqrt{\frac{N}{n_h}} \sqrt{n_h}\, B_h$, together with condition A1.2, the result follows. □
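Since the covariance matrix is unknown and the random variables are not observable, we use estimated versions of these random variables to estimate the covariance matrix. They are defined as
$$\widehat{\Psi}_{is,hk} := \begin{cases} - \dfrac{n_h}{ad} \displaystyle\sum_{t=1}^{d} \sum_{u=1}^{m_{htk}} \lambda_{htk}\, w_{htk}\, \widehat{F}_{is}(X_{htku}), & \text{for } h \neq i, \\[2ex] \dfrac{n_h}{ad} \displaystyle\sum_{j \neq i}^{a} \sum_{t=1}^{d} \sum_{u=1}^{m_{isk}} \lambda_{isk}\, w_{isk}\, \widehat{F}_{jt}(X_{isku}) + \dfrac{n_h}{ad} \sum_{t=1}^{d} \left( \sum_{u=1}^{m_{isk}} \lambda_{isk}\, w_{isk}\, \widehat{F}_{it}(X_{isku}) - \sum_{u=1}^{m_{itk}} \lambda_{itk}\, w_{itk}\, \widehat{F}_{is}(X_{itku}) \right), & \text{else}, \end{cases}$$
and for the expectation
$$\widehat{\beta}_{is,hk} := E(\widehat{\Psi}_{is,hk}) = \begin{cases} - \dfrac{n_h}{ad} \displaystyle\sum_{t=1}^{d} \lambda_{htk}\, m_{htk}\, w_{htk}\, \widehat{p}(is, ht), & \text{for } h \neq i, \\[2ex] \dfrac{n_h}{ad} \displaystyle\sum_{j \neq i}^{a} \sum_{t=1}^{d} m_{isk}\, \lambda_{isk}\, w_{isk}\, \widehat{p}(jt, is) + \dfrac{n_h}{ad} \sum_{t=1}^{d} \left( m_{isk}\, \lambda_{isk}\, w_{isk}\, \widehat{p}(it, is) - m_{itk}\, \lambda_{itk}\, w_{itk}\, \widehat{p}(is, it) \right), & \text{else}. \end{cases}$$
Based on these variables, we define an estimator for the unknown covariance matrix $V_{N,h}$ through
$$\widehat{V}_{N,h} = \frac{1}{n_h - 1} \sum_{k=1}^{n_h} (\widehat{\psi}_{hk} - \widehat{\beta}_{hk}) (\widehat{\psi}_{hk} - \widehat{\beta}_{hk})',$$
whereby an estimator for $V_N := \sum_{h=1}^{a} \kappa_h^{-1} \cdot V_{N,h}$ is given by $\widehat{V}_N := \sum_{h=1}^{a} \frac{N}{n_h} \cdot \widehat{V}_{N,h}$.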
Proof of Theorem 5. 
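1. Let $y = (y_1, \ldots, y_{ad})'$ be an arbitrary vector. Then, it holds that
$$y' \widehat{V}_{N,h}\, y = \frac{1}{n_h - 1} \sum_{k=1}^{n_h} y' (\widehat{\psi}_{hk} - \widehat{\beta}_{hk}) (\widehat{\psi}_{hk} - \widehat{\beta}_{hk})' y = \frac{1}{n_h - 1} \sum_{k=1}^{n_h} \left[ y' (\widehat{\psi}_{hk} - \widehat{\beta}_{hk}) \right]^2 \geq 0 .$$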
Since V ^ N is a convex combination of positive semi-definite matrices, it is also positive semi-definite.
2. It is clear that
$$\widetilde{V}_{N,h} = \frac{1}{n_h - 1} \sum_{k=1}^{n_h} (\psi_{hk} - \beta_{hk}) (\psi_{hk} - \beta_{hk})'$$
is a consistent estimator for $V_{N,h}$. To demonstrate consistency, we use the triangle inequality and prove that $\widehat{V}_{N,h} - \widetilde{V}_{N,h} \xrightarrow{a.s.} 0$. First, we recall that $|\Psi_{hk}| \leq O(N^0)$, and with the same arguments it follows that $|\widehat{\Psi}_{hk}| \leq O(N^0)$, $|\beta_{hk}| \leq O(N^0)$ and $|\widehat{\beta}_{hk}| \leq O(N^0)$.
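Then, we consider the single components, where for the diagonal elements it holds that
$$\begin{aligned}
\left| \frac{1}{n_h - 1} \sum_{k=1}^{n_h} \left[ (\widehat{\Psi}_{is,hk} - \widehat{\beta}_{is,hk})^2 - (\Psi_{is,hk} - \beta_{is,hk})^2 \right] \right| &\leq 2 \cdot \max_{k = 1, \ldots, n_h} \left| (\widehat{\Psi}_{is,hk} - \widehat{\beta}_{is,hk})^2 - (\Psi_{is,hk} - \beta_{is,hk})^2 \right| \\
&= 2 \cdot \max_{k = 1, \ldots, n_h} \left| \widehat{\Psi}_{is,hk} - \widehat{\beta}_{is,hk} + \Psi_{is,hk} - \beta_{is,hk} \right| \cdot \left| \widehat{\Psi}_{is,hk} - \widehat{\beta}_{is,hk} - \Psi_{is,hk} + \beta_{is,hk} \right| \\
&\leq O(N^0) \cdot \max_{k = 1, \ldots, n_h} \left| \widehat{\Psi}_{is,hk} - \widehat{\beta}_{is,hk} - \Psi_{is,hk} + \beta_{is,hk} \right| \\
&\leq O(N^0) \cdot \max_{k = 1, \ldots, n_h} \left| \widehat{\Psi}_{is,hk} - \Psi_{is,hk} \right| + O(N^0) \cdot \max_{k = 1, \ldots, n_h} \left| \beta_{is,hk} - \widehat{\beta}_{is,hk} \right| .
\end{aligned}$$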
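First, we consider for $h \neq i$
$$
\begin{aligned}
| \Psi_{is,hk} - \hat{\Psi}_{is,hk} | &= \Big| \frac{n_h}{ad} \sum_{t=1}^{d} \sum_{u=1}^{m_{htk}} \lambda_{htk} w_{htk} \big( F_{is}(X_{htku}) - \hat{F}_{is}(X_{htku}) \big) \Big| \\
&\leq \frac{n_h}{ad} \sum_{t=1}^{d} m_{htk} \lambda_{htk} w_{htk} \lVert F_{is} - \hat{F}_{is} \rVert_\infty \\
&\leq n_h \cdot O(\lambda_{\min}^{-1}) \cdot \lVert F_{is} - \hat{F}_{is} \rVert_\infty \xrightarrow{a.s.} 0
\end{aligned}
$$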
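The bound above is driven by the uniform distance between a distribution function and its empirical counterpart. The following sketch only illustrates how this distance shrinks with the sample size in the simplest possible setting, independent Uniform(0,1) draws with $F(x) = x$; this is an assumption made for illustration and does not reproduce the clustered, weighted estimators of the paper. The function name is hypothetical.

```python
import random


def sup_distance_uniform(sample):
    """sup_x |F_hat(x) - F(x)| for the Uniform(0,1) CDF F(x) = x.

    The supremum is attained at a jump of the empirical CDF, so it is
    enough to compare F with the step heights just before and at each
    ordered observation.
    """
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


rng = random.Random(1)
for n in (50, 500, 5000, 50000):
    draws = [rng.random() for _ in range(n)]
    print(n, sup_distance_uniform(draws))
```

The printed distances typically decrease roughly at the rate $n^{-1/2}$, in line with the Glivenko-Cantelli theorem for independent observations.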
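and similarly
$$
\begin{aligned}
| \Psi_{is,ik} - \hat{\Psi}_{is,ik} | &= \frac{n_h}{ad} \Big| \sum_{j \neq i}^{a} \sum_{t=1}^{d} \sum_{u=1}^{m_{isk}} \lambda_{isk} w_{isk} \cdot \big( F_{jt}(X_{isku}) - \hat{F}_{jt}(X_{isku}) \big) \\
&\qquad\quad + \sum_{t=1}^{d} \sum_{u=1}^{m_{isk}} \lambda_{isk} w_{isk} \cdot \big( F_{it}(X_{isku}) - \hat{F}_{it}(X_{isku}) \big) - \sum_{t=1}^{d} \sum_{u=1}^{m_{itk}} \lambda_{itk} w_{itk} \cdot \big( F_{is}(X_{itku}) - \hat{F}_{is}(X_{itku}) \big) \Big| \\
&\leq \frac{n_h}{ad} \Big( \sum_{j \neq i}^{a} \sum_{t=1}^{d} m_{isk} \lambda_{isk} w_{isk} \cdot \lVert F_{jt} - \hat{F}_{jt} \rVert_\infty + \sum_{t=1}^{d} m_{isk} \lambda_{isk} w_{isk} \cdot \lVert F_{it} - \hat{F}_{it} \rVert_\infty + \sum_{t=1}^{d} m_{itk} \lambda_{itk} w_{itk} \cdot \lVert F_{is} - \hat{F}_{is} \rVert_\infty \Big) \\
&\leq \frac{n_h}{ad} \Big( \sum_{j \neq i}^{a} \sum_{t=1}^{d} O(\lambda_{\min}^{-1}) \cdot \lVert F_{jt} - \hat{F}_{jt} \rVert_\infty + \sum_{t=1}^{d} O(\lambda_{\min}^{-1}) \cdot \lVert F_{it} - \hat{F}_{it} \rVert_\infty + \sum_{t=1}^{d} O(\lambda_{\min}^{-1}) \cdot \lVert F_{is} - \hat{F}_{is} \rVert_\infty \Big) \\
&\leq \frac{O(N^0)}{ad} \Big( \sum_{j \neq i}^{a} \sum_{t=1}^{d} \lVert F_{jt} - \hat{F}_{jt} \rVert_\infty + \sum_{t=1}^{d} \big( \lVert F_{it} - \hat{F}_{it} \rVert_\infty + \lVert F_{is} - \hat{F}_{is} \rVert_\infty \big) \Big) \xrightarrow{a.s.} 0.
\end{aligned}
$$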
For the expectation, we calculate for $h \neq i$
$$
| \beta_{is,hk} - \hat{\beta}_{is,hk} | = \Big| \frac{n_h}{ad} \sum_{t=1}^{d} \lambda_{htk} m_{htk} w_{htk} \big( p(is,ht) - \hat{p}(is,ht) \big) \Big| \leq \frac{n_h}{ad} \sum_{t=1}^{d} O(\lambda_{\min}^{-1}) \cdot \big| p(is,ht) - \hat{p}(is,ht) \big| \xrightarrow{a.s.} 0
$$
and
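$$
\begin{aligned}
| \beta_{is,ik} - \hat{\beta}_{is,ik} | &\leq \Big| \frac{n_h}{ad} \sum_{j \neq i}^{a} \sum_{t=1}^{d} m_{isk} \lambda_{isk} w_{isk} \big( p(jt,is) - \hat{p}(jt,is) \big) \Big| \\
&\quad + \Big| \frac{n_h}{ad} \sum_{t=1}^{d} \Big( m_{isk} \lambda_{isk} w_{isk} \big( p(it,is) - \hat{p}(it,is) \big) - m_{itk} \lambda_{itk} w_{itk} \big( p(is,it) - \hat{p}(is,it) \big) \Big) \Big| \\
&\leq \frac{n_h}{ad} \sum_{j \neq i}^{a} \sum_{t=1}^{d} m_{isk} \lambda_{isk} w_{isk} \big| p(jt,is) - \hat{p}(jt,is) \big| \\
&\quad + \frac{n_h}{ad} \sum_{t=1}^{d} \Big( m_{isk} \lambda_{isk} w_{isk} \big| p(it,is) - \hat{p}(it,is) \big| + m_{itk} \lambda_{itk} w_{itk} \big| p(is,it) - \hat{p}(is,it) \big| \Big) \\
&\leq \frac{O(N^0)}{ad} \Big( \sum_{j \neq i}^{a} \sum_{t=1}^{d} \big| p(jt,is) - \hat{p}(jt,is) \big| + \sum_{t=1}^{d} \big( \big| p(it,is) - \hat{p}(it,is) \big| + \big| p(is,it) - \hat{p}(is,it) \big| \big) \Big) \xrightarrow{a.s.} 0.
\end{aligned}
$$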
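For the off-diagonal elements we can use the same convergences, but first we define $\hat{\Delta}_{is,hk} := \hat{\Psi}_{is,hk} - \hat{\beta}_{is,hk}$ and $\Delta_{is,hk} := \Psi_{is,hk} - \beta_{is,hk}$, which fulfill $|\hat{\Delta}_{is,hk}| \in O(N^0)$ and $|\Delta_{is,hk}| \in O(N^0)$. Then, for the elements which correspond to $\Psi_{is,hk}$ and $\Psi_{jt,hk}$ we obtain
$$
\begin{aligned}
&\Big| \frac{1}{n_h - 1} \sum_{k=1}^{n_h} \big( \hat{\Delta}_{is,hk} \hat{\Delta}_{jt,hk} - \Delta_{is,hk} \Delta_{jt,hk} \big) \Big| \\
&\quad = \Big| \frac{1}{n_h - 1} \sum_{k=1}^{n_h} \big( \hat{\Delta}_{is,hk} \hat{\Delta}_{jt,hk} - \Delta_{is,hk} \Delta_{jt,hk} + \hat{\Delta}_{is,hk} \Delta_{jt,hk} - \hat{\Delta}_{is,hk} \Delta_{jt,hk} \big) \Big| \\
&\quad = \Big| \frac{1}{n_h - 1} \sum_{k=1}^{n_h} \big( \hat{\Delta}_{is,hk} ( \hat{\Delta}_{jt,hk} - \Delta_{jt,hk} ) + \Delta_{jt,hk} ( \hat{\Delta}_{is,hk} - \Delta_{is,hk} ) \big) \Big| \\
&\quad \leq \frac{1}{n_h - 1} \sum_{k=1}^{n_h} \Big( \big| \hat{\Delta}_{is,hk} ( \hat{\Delta}_{jt,hk} - \Delta_{jt,hk} ) \big| + \big| \Delta_{jt,hk} ( \hat{\Delta}_{is,hk} - \Delta_{is,hk} ) \big| \Big) \\
&\quad \leq \frac{1}{n_h - 1} \sum_{k=1}^{n_h} \Big( O(N^0) \big| \hat{\Delta}_{jt,hk} - \Delta_{jt,hk} \big| + O(N^0) \big| \hat{\Delta}_{is,hk} - \Delta_{is,hk} \big| \Big) \\
&\quad \leq \frac{2}{n_h} \sum_{k=1}^{n_h} \Big( O(N^0) \big| \hat{\Delta}_{jt,hk} - \Delta_{jt,hk} \big| + O(N^0) \big| \hat{\Delta}_{is,hk} - \Delta_{is,hk} \big| \Big) \\
&\quad \leq O(N^0) \max_{k=1,\dots,n_h} \big| \hat{\Delta}_{jt,hk} - \Delta_{jt,hk} \big| + O(N^0) \max_{k=1,\dots,n_h} \big| \hat{\Delta}_{is,hk} - \Delta_{is,hk} \big| \\
&\quad = O(N^0) \max_{k=1,\dots,n_h} \big| \hat{\Psi}_{jt,hk} - \hat{\beta}_{jt,hk} - \Psi_{jt,hk} + \beta_{jt,hk} \big| + O(N^0) \max_{k=1,\dots,n_h} \big| \hat{\Psi}_{is,hk} - \hat{\beta}_{is,hk} - \Psi_{is,hk} + \beta_{is,hk} \big| \\
&\quad \leq O(N^0) \max_{k=1,\dots,n_h} \big| \hat{\Psi}_{jt,hk} - \Psi_{jt,hk} \big| + O(N^0) \max_{k=1,\dots,n_h} \big| \beta_{jt,hk} - \hat{\beta}_{jt,hk} \big| \\
&\qquad + O(N^0) \max_{k=1,\dots,n_h} \big| \hat{\Psi}_{is,hk} - \Psi_{is,hk} \big| + O(N^0) \max_{k=1,\dots,n_h} \big| \beta_{is,hk} - \hat{\beta}_{is,hk} \big| \xrightarrow{a.s.} 0.
\end{aligned}
$$
Since we demonstrated consistency for the diagonal elements and the off-diagonal elements, it follows that $\hat{V}_{N,h} - \tilde{V}_{N,h} \xrightarrow{a.s.} 0$ and consequently $\hat{V}_{N,h} - V_{N,h} \xrightarrow{a.s.} 0$.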
3. Follows directly from 2. with Slutsky's theorem. □

References

  1. Roy, A.; Harrar, S.W.; Konietschke, F. The nonparametric Behrens-Fisher problem with dependent replicates. Stat. Med. 2019, 38, 4939–4962. [Google Scholar] [CrossRef] [PubMed]
  2. Larocque, D.; Haataja, R.; Nevalainen, J.; Oja, H. Two sample tests for the nonparametric Behrens–Fisher problem with clustered data. J. Nonparametric Stat. 2010, 22, 755–771. [Google Scholar] [CrossRef]
  3. Cui, Y.; Konietschke, F.; Harrar, S.W. The nonparametric Behrens–Fisher problem in partially complete clustered data. Biom. J. 2021, 63, 148–167. [Google Scholar] [CrossRef] [PubMed]
  4. Gao, X. A Nonparametric Procedure for the Two-Factor Mixed Model with Missing Data. Biom. J. 2007, 49, 774–788. [Google Scholar] [CrossRef] [PubMed]
  5. Fitzmaurice, G.; Laird, N.; Ware, J. Applied Longitudinal Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2012. [Google Scholar]
  6. Johnson, R.A.; Wichern, D. Applied Multivariate Statistical Analysis; Pearson Education Limited: London, UK, 2007. [Google Scholar]
  7. Mann, H.B.; Whitney, D.R. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. Ann. Math. Stat. 1947, 18, 50–60. [Google Scholar] [CrossRef]
  8. Brunner, E.; Munzel, U. The Nonparametric Behrens-Fisher Problem: Asymptotic Theory and a Small-Sample Approximation. Biom. J. 2000, 42, 17–25. [Google Scholar] [CrossRef]
  9. Thas, O.; Neve, J.D.; Clement, L.; Ottoy, J.P. Probabilistic index models. J. R. Stat. Soc. Ser. B 2012, 74, 623–671. [Google Scholar] [CrossRef] [Green Version]
  10. Acion, L.; Peterson, J.J.; Temple, S.; Arndt, S. Probabilistic index: An intuitive non-parametric approach to measuring the size of treatment effects. Stat. Med. 2006, 25, 591–602. [Google Scholar] [CrossRef]
  11. Brunner, E.; Bathke, A.C.; Konietschke, F. Rank and Pseudo-Rank Procedures for Independent Observations in Factorial Designs; Springer: Berlin/Heidelberg, Germany, 2018. [Google Scholar]
  12. Akritas, M.; Kuha, J.; Osgood, W. A Nonparametric Approach to Matched Pairs with Missing Data. Sociol. Methods Res. 2002, 30, 425–454. [Google Scholar] [CrossRef]
  13. Fong, Y.; Huang, Y.; Lemos, M.; Mcelrath, J. Rank-based two-sample tests for paired data with missing values. Biostatistics 2018, 19, 281–294. [Google Scholar] [CrossRef]
  14. Domhof, S.; Brunner, E.; Osgood, W. Rank Procedures for Repeated Measures with Missing Values. Sociol. Methods Res. 2002, 30, 367–393. [Google Scholar] [CrossRef]
  15. Amro, L.; Konietschke, F.; Pauly, M. Incompletely observed nonparametric factorial designs with repeated measurements: A wild bootstrap approach. arXiv 2021, arXiv:2102.02871. [Google Scholar]
  16. Akritas, M.; Brunner, E. A unified approach to rank tests for mixed models. J. Stat. Plan. Inference 1997, 61, 249–277. [Google Scholar] [CrossRef]
  17. Brunner, E.; Munzel, U.; Puri, M.L. Rank-Score Tests in Factorial Designs with Repeated Measures. J. Multivar. Anal. 1999, 70, 286–317. [Google Scholar] [CrossRef] [Green Version]
  18. Brunner, E.; Domhof, S.; Langer, F. Nonparametric Analysis of Longitudinal Data in Factorial Experiments; Wiley-Interscience: Hoboken, NJ, USA, 2002; Volume 373. [Google Scholar]
  19. Klumbies, K.; Rust, R.; Dörr, J.; Konietschke, F.; Paul, F.; Bellmann-Strobl, J.; Brandt, A.; Zimmermann, H.G. Retinal Thickness Analysis in Progressive Multiple Sclerosis Patients Treated With Epigallocatechin Gallate: Optical Coherence Tomography Results From the SUPREMES Study. Front. Neurol. 2021, 12, 615790. [Google Scholar] [CrossRef]
  20. Walton, C.; King, R.; Rechtman, L.; Kaye, W.; Leray, E.; Marrie, R.A.; Robertson, N.; Rocca, N.L.; Uitdehaag, B.; van der Mei, I.; et al. Rising prevalence of multiple sclerosis worldwide: Insights from the Atlas of MS, third edition. Mult. Scler. J. 2020, 26, 1816–1821. [Google Scholar] [CrossRef]
  21. Reich, D.S.; Lucchinetti, C.F.; Calabresi, P.A. Multiple Sclerosis. N. Engl. J. Med. 2018, 378, 169–180. [Google Scholar] [CrossRef]
  22. Petzold, A.; Balcer, L.J.; Calabresi, P.A.; Costello, F.; Frohman, T.C.; Frohman, E.M.; Martinez-Lapiscina, E.H.; Green, A.J.; Kardon, R.; Outteryck, O.; et al. Retinal layer segmentation in multiple sclerosis: A systematic review and meta-analysis. Lancet Neurol. 2017, 16, 797–812. [Google Scholar] [CrossRef] [Green Version]
  23. Oertel, F.C.; Zimmermann, H.G.; Brandt, A.U.; Paul, F. Optical coherence tomography in neuromyelitis optica spectrum disorders: Potential advantages for individualized monitoring of progression and therapy. Expert Rev. Neurother. 2019, 19, 31–43. [Google Scholar] [CrossRef]
  24. Ruymgaart, F. A Unified Approach to the Asymptotic Distribution Theory of Certain Midrank Statistics; Springer: Berlin/Heidelberg, Germany, 2006; Volume 821, pp. 1–18. [Google Scholar]
  25. Brunner, E.; Konietschke, F.; Pauly, M.; Puri, M. Rank-Based Procedures in Factorial Designs: Hypotheses about Nonparametric Treatment Effects. J. R. Stat. Soc. Ser. B 2016, 79, 1463–1485. [Google Scholar] [CrossRef] [Green Version]
  26. Brunner, E.; Konietschke, F.; Bathke, A.C.; Pauly, M. Ranks and Pseudo-ranks—Surprising Results of Certain Rank Tests in Unbalanced Designs. Int. Stat. Rev. 2020, 89, 349–366. [Google Scholar] [CrossRef]
  27. Obuchowski, N.A. Nonparametric analysis of clustered ROC curve data. Biometrics 1997, 53, 567–578. [Google Scholar] [CrossRef]
  28. Zou, G. Confidence interval estimation for treatment effects in cluster randomization trials based on ranks. Stat. Med. 2021, 40, 3227–3250. [Google Scholar] [CrossRef]
  29. Hoffman, E.; Sen, P.; Weinberg, C. Within-Cluster Resampling. Biometrika 2001, 88, 1121–1134. [Google Scholar] [CrossRef] [Green Version]
  30. Williamson, J.; Datta, S.; Satten, G. Marginal Analyses of Clustered Data When Cluster Size Is Informative. Biometrics 2003, 59, 36–42. [Google Scholar] [CrossRef]
  31. Rubarth, K.; Pauly, M.; Konietschke, F. Ranking Procedures for Repeated Measures Designs with Missing Data: Estimation, Testing and Asymptotic Theory. Stat. Methods Med. Res. 2022, 31, 105–118. [Google Scholar] [CrossRef] [PubMed]
  32. Konietschke, F.; Hothorn, L.; Brunner, E. Rank-based multiple test procedures and simultaneous confidence intervals. Electron. J. Stat. 2012, 6, 738–759. [Google Scholar] [CrossRef] [Green Version]
  33. Konietschke, F.; Bathke, A.; Hothorn, L.; Brunner, E. Testing and estimation of purely nonparametric effects in repeated measures designs. Comput. Stat. Data Anal. 2010, 54, 1895–1905. [Google Scholar] [CrossRef]
  34. Akritas, M.; Arnold, S.; Brunner, E. Nonparametric Hypotheses and Rank Statistics for Unbalanced Factorial Designs. J. Am. Stat. Assoc. 1997, 92, 258–265. [Google Scholar] [CrossRef]
  35. Bretz, F.; Genz, A.; Hothorn, L. On the Numerical Availability of Multiple Comparison Procedures. Biom. J. 2001, 43, 645–656. [Google Scholar] [CrossRef]
  36. Konietschke, F.; Harrar, S.W.; Lange, K.; Brunner, E. Ranking procedures for matched pairs with missing data—Asymptotic theory and a small sample approximation. Comput. Stat. Data Anal. 2012, 56, 1090–1102. [Google Scholar] [CrossRef]
  37. Gao, X.; Alvo, M.; Chen, J.; Li, G. Nonparametric multiple comparison procedures for unbalanced one-way factorial designs. J. Stat. Plan. Inference 2008, 138, 2574–2591. [Google Scholar] [CrossRef]
  38. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2013. [Google Scholar]
  39. Friedrich, S.; Konietschke, F.; Pauly, M. A wild bootstrap approach for nonparametric repeated measurements. Comput. Stat. Data Anal. 2017, 113, 38–52. [Google Scholar] [CrossRef]
  40. Fay, M.P.; Proschan, M.A. Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules. Stat. Surv. 2010, 4, 1–39. [Google Scholar] [CrossRef]
  41. Fagerland, M.W.; Sandvik, L. The Wilcoxon–Mann–Whitney test under scrutiny. Stat. Med. 2009, 28, 1487–1497. [Google Scholar] [CrossRef]
  42. Bergmann, R.; Ludbrook, J.; Spooren, W.P.J.M. Different Outcomes of the Wilcoxon-Mann-Whitney Test from Different Statistics Packages. Am. Stat. 2000, 54, 72–77. [Google Scholar]
  43. Fay, M.; Malinovsky, Y. Confidence intervals of the Mann-Whitney parameter that are compatible with the Wilcoxon-Mann-Whitney test: Confidence Intervals on the Mann-Whitney Parameter. Stat. Med. 2018, 37, 3991–4006. [Google Scholar] [CrossRef]
  44. Fay, M.; Brittain, E.; Shih, J.; Follmann, D.; Gabriel, E. Causal estimands and confidence intervals associated with Wilcoxon-Mann-Whitney tests in randomized experiments. Stat. Med. 2017, 37, 2923–2937. [Google Scholar] [CrossRef]
  45. Hand, D.J. On Comparing Two Treatments. Am. Stat. 1992, 46, 190–192. [Google Scholar] [CrossRef]
  46. Noguchi, K.; Gel, Y.R.; Brunner, E.; Konietschke, F. nparLD: An R software package for the nonparametric analysis of longitudinal data in factorial experiments. J. Stat. Softw. 2012, 50, 12. [Google Scholar] [CrossRef] [Green Version]
  47. Domhof, S. Nichtparametrische Relative Effekte. Ph.D. Thesis, Niedersächsische Staats-und Universitätsbibliothek Göttingen, Göttingen, Germany, 2001. [Google Scholar]
Figure 1. Boxplots of Type-I error rates in relation to sample sizes $n_1$ and $n_2$ in various settings without missing data.
Figure 2. Boxplots of Type-I error rates in relation to missing rates $r_1$ and $r_2$ in various settings with $n_1 = n_2 = 30$.
Figure 3. Boxplots of Type-I error rates in relation to cluster sizes $m_{isk}$ in various settings with $n_1 = n_2 = 30$ and green without missing data.
Figure 4. Boxplots of Type-I error rates in relation to intra-cluster correlation $\rho_{isk}$ in various settings with $n_1 = n_2 = 30$ and green without missing data.
Figure 5. Boxplots of Type-I error rates in relation to covariance matrices $\Sigma_1$ and $\Sigma_2$ in various settings with $n_1 = n_2 = 30$ and without missing data.
Figure 6. Boxplots of Type-I error rates in relation to unweighted and weighted estimation of the relative effect $p$ in various settings with $n_1 = n_2 = 30$ and without missing data.
Figure 7. Boxplots of Type-I error rates in relation to unweighted and weighted estimation of the relative effect $p$ and fixed intra-cluster correlations $\rho_{isk}$ in various settings with $n_1 = n_2 = 30$ and without missing data.
Figure 8. Boxplots of biases and MSEs of estimators $\hat{p}_{is}^{*}$ in Equation (5) in relation to different sample sizes $n_1$ and $n_2$.
Figure 9. Boxplots of biases and MSEs of estimators $\hat{p}_{is}^{*}$ in Equation (5) in relation to the amount of missing data in scenarios with $n_1 = n_2 = 30$.
Figure 10. Boxplots of biases and MSEs of estimators $\hat{p}_{is}^{*}$ in Equation (5) in relation to cluster sizes $m_{isk}$ in scenarios with $n_1 = n_2 = 30$ and without missing data.
Figure 11. Boxplots of biases and MSEs of estimators $\hat{p}_{is}^{*}$ in Equation (5) in relation to intra-cluster correlations $\rho_{isk}$ in scenarios with $n_1 = n_2 = 30$ and without missing data.
Figure 12. Lineplot of pRNFL values in both groups at baseline, 1-year follow-up and 2-year follow-up of the SUPREMES trial.
Figure 13. Boxplot of pRNFL values in both groups at baseline, 1-year follow-up and 2-year follow-up of the SUPREMES trial.
Table 1. Number of patients with pRNFL measurements in each group at baseline, 1-year follow-up, 2-year follow-up, and 3-year follow-up.

| Group | First OCT | 1-Year F/U | 2-Year F/U | 3-Year F/U |
|---|---|---|---|---|
| Verum | 15 | 14 | 7 | 3 |
| Placebo | 16 | 16 | 11 | 5 |
Table 2. Point estimators of differences $p_{is} - p_{jt}$ ($i, j$: group; $s, t$: time point), simultaneous confidence intervals, t-values and p-values for Tukey-type contrasts in relative effects in the SUPREMES trial.

| Comparison | Estimator | 95%-Confidence Interval | t-Value | p-Value |
|---|---|---|---|---|
| $\hat{p}_{12} - \hat{p}_{11}$ | −0.080 | [−0.371, 0.212] | −0.777 | 0.850 |
| $\hat{p}_{21} - \hat{p}_{11}$ | −0.020 | [−0.344, 0.303] | −0.178 | 0.998 |
| $\hat{p}_{22} - \hat{p}_{11}$ | 0.000 | [−0.468, 0.468] | 0.000 | 1.000 |
| $\hat{p}_{21} - \hat{p}_{12}$ | 0.059 | [−0.335, 0.453] | 0.429 | 0.969 |
| $\hat{p}_{22} - \hat{p}_{12}$ | 0.080 | [−0.442, 0.601] | 0.435 | 0.968 |
| $\hat{p}_{22} - \hat{p}_{21}$ | 0.020 | [−0.306, 0.346] | 0.177 | 0.998 |
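The six comparisons in Table 2 are all pairwise differences of four relative effects. As a minimal sketch of how such a Tukey-type contrast matrix can be generated, assuming the effects are ordered as $p_{11}, p_{12}, p_{21}, p_{22}$ (the function name `tukey_contrasts` and the string labels are hypothetical and only serve the illustration):

```python
from itertools import combinations


def tukey_contrasts(labels):
    """All-pairs contrast matrix: one row per pair (i, j) with i < j,
    holding -1 at position i and +1 at position j."""
    names, rows = [], []
    for i, j in combinations(range(len(labels)), 2):
        row = [0] * len(labels)
        row[i] = -1
        row[j] = 1
        rows.append(row)
        names.append(f"{labels[j]} - {labels[i]}")
    return names, rows


names, rows = tukey_contrasts(["p11", "p12", "p21", "p22"])
for name, row in zip(names, rows):
    print(name, row)
```

With this ordering the rows appear in the same sequence as the comparisons in the table, and every row sums to zero, as required for a contrast.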
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rubarth, K.; Sattler, P.; Zimmermann, H.G.; Konietschke, F. Estimation and Testing of Wilcoxon–Mann–Whitney Effects in Factorial Clustered Data Designs. Symmetry 2022, 14, 244. https://doi.org/10.3390/sym14020244
