Nonparametric Limits of Agreement in Method Comparison Studies: A Simulation Study on Extreme Quantile Estimation

Bland–Altman limits of agreement and the underlying plot are a well-established means in method comparison studies on quantitative outcomes. Normally distributed paired differences, a constant bias, and variance homogeneity across the measurement range are implicit assumptions to this end. Whenever these assumptions are not fully met and cannot be remedied by an appropriate transformation of the data or the application of a regression approach, the 2.5% and 97.5% quantiles of the differences have to be estimated nonparametrically. Earlier, a simple Sample Quantile (SQ) estimator (a weighted average of the observations closest to the target quantile), the Harrell–Davis estimator (HD), and estimators of the Sfakianakis–Verginis type (SV) outperformed 10 other quantile estimators in terms of mean coverage for the next observation in a simulation study, based on sample sizes between 30 and 150. Here, we investigate the variability of the coverage probability of these three and another three promising nonparametric quantile estimators with n=50(50)200,250(250)1000. The SQ estimator outperformed the HD and SV estimators for n=50 and was slightly better for n=100, whereas the SQ, HD, and SV estimators performed identically well for n≥150. The similarity of the boxplots for the SQ estimator across both distributions and sample sizes was striking.


Introduction
When comparing methods for the measurement of a continuous outcome, the Bland-Altman Limits of Agreement (BA LoAs) are a well-known and well-established means to this end [1][2][3]. Mean values are plotted against the paired differences in a scatter plot that is supplemented by an estimate of the bias (i.e., the mean of the paired differences) and the so-called LoAs (bias estimate +/− 1.96 standard deviations of the paired differences), including the respective 95% confidence intervals [4][5][6][7]. Under the assumptions of normally distributed paired differences, the BA LoAs represent estimates for the boundaries between which 95% of all population differences are supposed to lie. In the case of the variance heterogeneity of the paired differences, a normalizing transformation (like the natural logarithm) may render the usual analysis (on the log-scale) possible, whereas a bias that is non-constant across the measurement range might require a regression approach [3,8]. If neither a transformation of the data, nor a regression approach are applicable, the LoAs are often estimated by simple empirical 2.5% and 97.5% quantiles. However, various nonparametric quantile estimators have been proposed in the literature during the recent decades, which likewise may serve in the nonparametric estimation of 2.5% and 97.5% quantiles [9][10][11][12]. In an earlier endeavor, we performed a literature search on nonparametric quantile estimators from which we compared 15 in a simulation study [13]. We assessed the performance of these estimators by the average coverage probability for one newly generated observation from 20,000 fictive trials, and we assumed six different distributions and sample sizes between 30 and 150 to this end. We found that a simple sample quantile estimator based on two rank statistics [9], the Harrell-Davis estimator [14], and the estimators of the Sfakianakis-Verginis type [15] performed, on average, closely to the nominal coverage probability of 95%. The purpose of this paper is to illuminate the variability of the coverage probability of these nonparametric quantile estimators in a simulation study. Three further, possibly promising nonparametric quantile estimators from our former investigation [13] and the classical BA LoAs [3] (as a benchmark measure and for illustrative purposes) are added. In Section 2, all nonparametric quantile estimators and the simulation setup are described. In Section 3, the results are given and illustrated by boxplots. A discussion closes the paper.

Materials and Methods
The description of the six different methods of nonparametric quantile estimation considered here is kept brief. Further details can be found elsewhere [13].

Sample Quantile Estimator
A random sample of n paired differences X 1 , . . . X n is sorted in increasing order X (1) ≤ X (2) ≤ . . . ≤ X (n) ; the symbols here denote the order statistics of the random sample. The SQ estimator is a weighted average of the two order statistics that are closest to including p percent of all the observations in the sample: with α = p(n + 1) − r, r = [p(n + 1)], and [x] is the greatest integer that is less than or equal to x [10,12].

Harrell-Davis Estimator
The HD estimator, as well as those estimators that follow, employs linear combinations of all the available order statistics, weighting them according to their relative closeness to the target percentile.
They can be given as L-statistics, which is, n ∑ j=1 W j · X (j) , where W j and X (j) is the weight for the j-th order statistic and the j-th order statistic itself, respectively [16]. The Harrell-Davis estimator is given by: with weight function: where I i/n {a, b} is the incomplete beta function [9,14].

Bernstein Polynomial Estimator
The BP estimator employs the binomial probability of observing exactly i out of n events with an event probability of p, B(i; n, p), and is given by: according to Cheng [17].

HD Estimator Using a Level Crossing Empirical Distribution
Huang [18] modified the Harrell-Davis estimator (2) by applying a weighted empirical distribution function instead of the empirical distribution with equal weights 1/n. The Harrell-Davis estimator using a level crossing empirical distribution function can be written as: with weight function: . . , n, and: if j = 2, 3 . . . , n − 1.

Sfakianakis-Verginis Estimator
Sfakianakis and Verginis [15] proposed a group of three estimators, from which we chose the first one due to the similarity of the results in our former investigation [13]. SV estimators are supposed to better estimate quantiles in the tails of a distribution when using small samples, and they employ also the binomial probabilities B(i; n, p) as weights for the ordered statistics X (i) , i = 1, . . . , n:

Simulation Setup
We assumed six distributions (Figure 1), the choice and number of which were motivated by former simulation studies on nonparametric quantile estimation, i.e., 4-6 symmetric and asymmetric distributions [11,14,15,18,19]: 1. the standard normal distribution; 2. a lognormal distribution with meanlog = 1 and sdlog = 1; 3. a beta distribution with shape parameters α = 2 and β = 5 (non-centrality parameter λ = 0); 4. a beta distribution with shape parameters α = 2 and β = 2 (λ = 0); 5. a chi-squared distribution with 4 degrees of freedom; and 6. an exponential distribution with rate parameter 1. For sample sizes n = 50(50)200, 250(250)1000, each distribution, and quantile estimator, we simulated 5000 fictive trials and derived the LoAs. Their respective coverage probability c was derived by applying the cumulative distribution function F(x) to the LoAs, i.e., This resulted in the distributions of coverage probabilities, which we display with boxplots. We compared the nonparametric quantile estimators with respect to (a) the average (i.e., median and mean) coverage probability and the closeness to the nominal 95% level and (b) their variability in terms of the 5% quantile and the first quartile. The classical BA LoAs served as the benchmark indication of expectable variation under the standard normal distribution; their performance under the remaining distributions illustrated the inappropriateness under non-normal distributions. The Root Mean Squared Error (RMSE) for the estimation of the 2.5% and 97.5% quantiles was supplemented. Here, the true values for these quantiles were known, as were the assumed distributions. Note, though, that the investigation of the coverage probability enabled a simultaneous assessment of both LoAs, whereas the RMSE was related to one of the two LoAs at a time. For instance, underestimation of the lower LoA and underestimation of the upper LoA for one set of estimated LoAs can imply the very same coverage probability as an overestimated lower LoA and an overestimated upper LoA for another set of estimated LoAs. The R code is available as Supplementary Material Code S1 (R Version 4.0.2).

BA LoA
For n = 50, the median (mean) coverage probability was close to 0.95 for three distributions, but above the nominal level for the beta distributions (Distributions 3 and 4 in Figure 2; 0.957 (0.953) and 0.972 (0.967), respectively) and slightly below for the exponential distribution (Distribution 6 in Figure 2; 0.943 (0.940)). For Distribution 4, even the first quartile exceeded 0.95 (0.953), indicating overestimation of the BA LoA, which is visible by the box lying completely above 0.95 in Figure 2. The same applied in the case of Distribution 3 for sample sizes of n ≥ 200 and for Distribution 2 for sample sizes of n ≥ 500 (Figures 2 and 3). In the case of Distribution 6, the median (mean) coverage probability fell short of the nominal level for all sample sizes, even for n = 1000 (0.948 (0.948)).

SQ, n=200
Distribution Coverage Figure 4. Boxplots of the coverage probability for the SQ estimator and n = 50(50)200.
Comparing the HD and BP estimator for n = 200 in terms of the RMSE, the RMSE was smaller for the HD than for the BP estimator for four distributions in the estimation of the 2.5% percentile ( Table 2) and for one distribution in the estimation of the 97.5% percentile (Table 3). For three distributions, the RMSE was smaller for the BP than for the HD estimator in the estimation of the 97.5% percentile.
In general, all nonparametric LoAs performed considerably close to each other in the estimation of the 2.5% percentile for sample sizes of n ≥ 100 (Table 2). In the estimation of the 97.5% percentile, this was the case for n ≥ 200 (Table 3).
The 5% quantile was slightly below that of SQ for n = 50, 100 (Table 1), but identical for n ≥ 150. The SV estimator performed very similar to the SQ estimator for n ≥ 150 and, in general, close to identical to the HD estimator with respect to the 5% quantile, the first quartile, and the median of the coverage probability for all sample sizes (±0.001).

SV, n=200
Distribution Coverage Figure 6. Boxplots of the coverage probability for the SV estimator and n = 50(50)200.

NO Estimator
The NO estimator achieved a median (mean) and first quartile coverage probability of 0.95-0.951 (0.949-0.950) and 0.944-0.945 for n = 500, but fell short of the nominal coverage probability of 0.95 for smaller sample sizes ( Supplementary Figures S9 and S10). Only for sample sizes of n = 500, 750, 1000, the NO estimator performed comparably to the SQ estimator.

Key Finding
The SQ estimator, a simple sample quantile estimator based on two rank statistics, conservatively kept, on average, the nominal coverage level of 95% for n = 50 and converged rapidly towards it with increasing sample size. The variability of the SQ estimator, measured in terms of the 5% quantile and the first quartile of the coverage probability, was smallest amongst all considered nonparametric quantile estimators for n = 50, 100. For sample sizes of n ≥ 150, both the HD and the SV estimator performed likewise well in terms of average performance and small variability. The remaining nonparametric estimators considered in the study did so for larger sample sizes only (BP: n ≥ 750, HD lc: n ≥ 500, and NO: n ≥ 500).

What Does This Add to What Is Known
Harrell and Davis [14] investigated the mean squared error of their estimator to measure its efficiency with sample sizes of a maximum n = 60 and did not recommend the HD estimator for small n and extreme p. Dielman, Lowry, and Pfaffenberger [12] compared several quantile estimators in terms of bias with sample sizes n = 10, 15, 25, 30 and p as small as 0.02. They concluded that there was not one best estimator across scenarios, but the HD estimator performed well in a wide range of cases, except when p = 0.02, 0.98. They highlighted that the HD estimator makes use of many data points and, therefore, performed better in the estimation of middle quantiles. On the contrary, sample quantile estimators, like the SQ estimator, performed well at p = 0.02, and the HD estimator performed better at larger sample sizes. Sfakianakis and Verginis [15] compared the SV estimator with the HD and SQ estimators in terms of bias and mean squared error for p = 0.01, 0.05(0.05), 0.95, 0.99 and n = 5-750. They concluded that the SV estimator performed better than the HD and SQ estimators in most of the examined cases did.
The nonparametric estimation of the LoAs involves two tail percentiles, p = 0.025, 0.975, which need to be considered simultaneously. Earlier, we indicated that the SQ estimator outperformed other quantile estimators for n = 50 in terms of the mean coverage probability for the next observation, but the HD estimator and estimators of the Sfakianakis-Verginis type performed likewise well for sample sizes exceeding 80 observations [13]. Here, we investigated both the average performance and variability of nonparametric quantile estimators for the LoA. The SQ estimator outperformed the HD and SV estimators for n = 50 and was slightly better for n = 100, whereas the SQ, HD, and SV estimators performed identically well for n ≥ 150. The similarity of the boxplots for the SQ estimator across both distributions and sample sizes was striking.

What Is the Implication, and What Should Change Now?
Whenever violated assumptions for the paired differences prohibit the immediate derivation of the classical BA LoA and neither an appropriate preparatory transformation nor a regression approach are practicable alternatives [3,8,[20][21][22], the SQ estimator may serve as a basis for the derivation of nonparametric LoAs. For sample sizes n ≥ 150, this holds also true for the HD and SV estimators, although all available differences contribute to the quantile estimation, and the target quantiles are extreme tail percentiles. The SQ estimator is not defined for n < 40; however, a reasonably sized method comparison study is likely to enroll at least 40 subjects anyway [23][24][25][26]. Finally, the 95% confidence intervals should supplement nonparametric LoAs to indicate the estimates' uncertainty [27,28]. Bootstrapping techniques may serve this purpose for the SQ estimator.
Supplementary Materials: The following are available online at http://www.mdpi.com/1660-4601/17/22/8330/ s1: Code S1: R source code for the simulation study, Figure S2: Boxplots of the coverage probability for the SQ estimator and n = 250(250)1000, Figure S3: Boxplots of the coverage probability for the HD estimator and n = 250(250)1000, Figure S4: Boxplots of the coverage probability for the BP estimator and n = 50(50)200, Figure S5: Boxplots of the coverage probability for the BP estimator and n = 250(250)1000, Figure S6: Boxplots of the coverage probability for the HD lc estimator and n = 50(50)200, Figure S7: Boxplots of the coverage probability for the HD lc estimator and n = 250(250)1000, Figure S8: Boxplots of the coverage probability for the SV estimator and n = 250(250)1000, Figure S9: Boxplots of the coverage probability for the NO estimator and n = 50(50)200, Figure S10: Boxplots of the coverage probability for the NO estimator and n = 250(250)1000.
Funding: This research received no external funding.