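Performance was evaluated over R = 3000 simulation replications using the percentage relative bias (%RB) and the relative root mean squared error (RRMSE), where

$$\%\mathrm{RB} = \frac{100}{R}\sum_{r=1}^{R}\frac{\hat{\theta}^{(r)}-\theta}{\theta}, \qquad \mathrm{RRMSE} = \frac{1}{\theta}\sqrt{\frac{1}{R}\sum_{r=1}^{R}\bigl(\hat{\theta}^{(r)}-\theta\bigr)^{2}},$$

with θ̂(r) denoting the estimate from replication r and θ the target parameter. For the finite distribution function, the bootstrap variance, and the quantiles, θ̂(r) and θ were taken to be the corresponding estimator and its target.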
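The performance of the Woodruff CI was assessed by its coverage probability (%CPξ), lower error rate (%L), and upper error rate (%U):

$$\%\mathrm{CP}_{\xi} = \frac{100}{R}\sum_{r=1}^{R} I\bigl(\hat{L}_{q}^{(r)} \le \xi_{q} \le \hat{U}_{q}^{(r)}\bigr), \qquad \%\mathrm{L} = \frac{100}{R}\sum_{r=1}^{R} I\bigl(\xi_{q} < \hat{L}_{q}^{(r)}\bigr), \qquad \%\mathrm{U} = \frac{100}{R}\sum_{r=1}^{R} I\bigl(\xi_{q} > \hat{U}_{q}^{(r)}\bigr),$$

where I(·) is the indicator function, ξq is the true qth quantile, and L̂q(r) and Ûq(r) denote, respectively, the lower and upper Woodruff CI bounds for the qth quantile in replication r.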
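As an illustration of how these summary measures can be computed from replicated output, the following is a minimal sketch. It assumes the usual definitions: %RB as 100 times the mean relative deviation, RRMSE as the root mean squared error divided by the target, and %L and %U as the percentages of replications in which the true quantile falls below the lower bound or above the upper bound. The function names and the toy inputs are hypothetical and are not taken from the study.

```python
import math


def percent_relative_bias(estimates, theta):
    """%RB: 100 times the average relative deviation of the estimates from theta."""
    return 100.0 * sum((est - theta) / theta for est in estimates) / len(estimates)


def rrmse(estimates, theta):
    """RRMSE: root mean squared error of the estimates, scaled by theta."""
    mse = sum((est - theta) ** 2 for est in estimates) / len(estimates)
    return math.sqrt(mse) / theta


def interval_rates(lower_bounds, upper_bounds, true_quantile):
    """Return (%CP, %L, %U) for a set of replicated confidence intervals."""
    n_rep = len(lower_bounds)
    # Covered: the true quantile lies inside the closed interval.
    covered = sum(lo <= true_quantile <= up
                  for lo, up in zip(lower_bounds, upper_bounds))
    # Lower error: the true quantile lies below the lower bound.
    below = sum(true_quantile < lo for lo in lower_bounds)
    # Upper error: the true quantile lies above the upper bound.
    above = sum(true_quantile > up for up in upper_bounds)
    return (100.0 * covered / n_rep,
            100.0 * below / n_rep,
            100.0 * above / n_rep)


# Toy inputs (made up for illustration, not results from the study).
toy_estimates = [9.0, 10.0, 11.0, 10.0]
toy_lower = [8.0, 9.0, 11.0, 9.0]
toy_upper = [12.0, 9.5, 13.0, 11.0]

print(percent_relative_bias(toy_estimates, 10.0))
print(rrmse(toy_estimates, 10.0))
print(interval_rates(toy_lower, toy_upper, 10.0))
```

In a full study the lists would hold one entry per replication (R = 3000 here), and the three interval rates sum to 100 whenever each lower bound does not exceed its upper bound.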
5.1. Study 1
Following the simulation design of Chen et al. [15], we generated a finite population of size N = 20,000. The study variable y and auxiliary variables x were generated from

$$y_i = x_i^{\top}\beta + \sigma\epsilon_i, \qquad i = 1, \ldots, N,$$

where (x1i, x2i, x3i, x4i) follow the design in Chen et al. [15], and the error terms ϵi ∼ N(0, 1). The parameter σ was chosen such that the correlation coefficient ρ between y and the linear predictor x⊤β equals 0.5.
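The calibration of σ can be made explicit under one assumption: that y equals the linear predictor plus σ times a unit-variance error independent of the predictor. Then corr(y, x⊤β) = sd(x⊤β) / sqrt(var(x⊤β) + σ²), so a target correlation ρ gives σ² = var(x⊤β)(1/ρ² − 1). The sketch below applies this formula to a stand-in linear predictor; the uniform draws, the seed, and the function names are hypothetical and do not reproduce the covariate design of Chen et al. [15].

```python
import math
import random


def sigma_for_target_correlation(linear_predictor, rho):
    """Solve sd(lp) / sqrt(var(lp) + sigma^2) = rho for sigma."""
    n = len(linear_predictor)
    mean = sum(linear_predictor) / n
    var = sum((v - mean) ** 2 for v in linear_predictor) / n
    return math.sqrt(var * (1.0 / rho ** 2 - 1.0))


def pearson(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(2024)
N = 20000
# Stand-in for the linear predictor (assumption: any fixed set of values works).
lp = [rng.uniform(0.0, 4.0) for _ in range(N)]
sigma = sigma_for_target_correlation(lp, rho=0.5)
# Generate y as linear predictor plus scaled standard normal noise.
y = [m + sigma * rng.gauss(0.0, 1.0) for m in lp]
achieved = pearson(y, lp)
print(sigma, achieved)
```

For ρ = 0.5 the formula reduces to σ² = 3 var(x⊤β), and the achieved correlation differs from 0.5 only through sampling noise in the simulated errors.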
We consider four model specification scenarios:
TT: Both δ and M are correctly specified.
TF: δ is correctly specified, but M is misspecified, with x4i omitted from the model.
FT: M is correctly specified, but δ is misspecified, with x4i omitted from the model.
FF: Both models are misspecified, with x4i omitted from each model.
The analysis uses a nonprobability sample SA of size nA = 500 and a probability sample SB of size nB = 1000.
Table 1 reports %RB and RRMSE for the proposed finite distribution function estimators. Under TT, all estimators exhibit low bias and error, demonstrating stable performance. Under TF and FT, the DR estimator attains lower bias and error than the alternatives, highlighting the advantages of the doubly robust property. By contrast, under FF, performance deteriorates markedly for all estimators.
Table 2 compares the bootstrap variance estimators in terms of %RB and %CPv. Under TT, all variance estimators perform satisfactorily. Under TF and FT, despite model misspecification, the variance estimator associated with the DR method retains low bias and a %CPv close to 95%, indicating stable reliability and accuracy. Conversely, under FF, coverage performance deteriorates markedly across all methods.
Table 3 summarizes the results for the quantile estimators. Mirroring the findings for the finite distribution function estimators, all methods perform well under the TT scenario. Under TF and FT, the DR-based quantiles remain stable, confirming the robustness of the doubly robust approach. By contrast, under FF, overall estimation accuracy deteriorates.
Table 4 reports the Woodruff CI results for the quantile estimators, including %CPξ, %L, and %U. Consistent with previous findings, all methods perform well under the TT scenario. Under TF and FT, the DR-based intervals maintain a %CPξ close to the nominal 95% with balanced tail errors, indicating high reliability. By contrast, under FF, coverage deteriorates substantially across methods: %CPξ falls below the nominal level and both tail error rates increase, signaling degraded interval performance.
5.2. Study 2
In the second simulation study, we treat the 2023 Korean Survey of Household Finances and Living Conditions (N = 16,730) as the finite population and repeatedly draw subsamples from it.
Table 5 summarizes the key variables used in the experiment and their definitions.
The nonprobability sample SA was generated to mimic structures commonly observed in practice. The propensity score model was specified as a logistic regression, where the intercept ζ0 was chosen so that the expected size of SA equals the target nA. Under this design, households with higher educational attainment of the household head, non-single households, apartment residents, and households without debt were more likely to be included in SA. The nonprobability sample SA was then selected by Poisson sampling with the inclusion probabilities implied by this model.
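The selection mechanism described above can be sketched as follows, under two assumptions that are not confirmed by the text: that ζ0 is the intercept of the logistic model, and that it is calibrated so the inclusion probabilities sum to the target sample size. The covariate scores, the seed, and the function names are hypothetical; the real design uses the survey covariates listed in Table 5.

```python
import math
import random


def inclusion_probabilities(intercept, scores):
    """Logistic inclusion probabilities for a given intercept and covariate scores."""
    return [1.0 / (1.0 + math.exp(-(intercept + s))) for s in scores]


def calibrate_intercept(scores, target_size, lo=-30.0, hi=30.0, tol=1e-10):
    """Bisection on the intercept: the sum of probabilities is increasing in it."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(inclusion_probabilities(mid, scores)) < target_size:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def poisson_sample(probs, rng):
    """Poisson sampling: each unit is included independently with its own probability."""
    return [i for i, p in enumerate(probs) if rng.random() < p]


rng = random.Random(7)
N = 16730
# Hypothetical covariate contribution to the logit (not the survey variables).
scores = [0.5 * rng.gauss(0.0, 1.0) for _ in range(N)]
zeta0 = calibrate_intercept(scores, target_size=500)
probs = inclusion_probabilities(zeta0, scores)
sample_a = poisson_sample(probs, rng)
print(zeta0, sum(probs), len(sample_a))
```

Because Poisson sampling includes units independently, the realized sample size is random and only its expectation equals the target.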
The probability sample SB was drawn by stratified sampling from nine strata defined by GEO, HOME, and SIZE. A mixed-allocation scheme, combining Neyman and proportional allocation, was used to determine the stratum-specific sample sizes, followed by simple random sampling without replacement within each stratum. The sample sizes were set to nA = 500 and nB = 1000.
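The text does not state how the Neyman and proportional allocations were combined. One common choice, assumed here purely for illustration, is a convex combination with weight alpha on the proportional share, followed by largest-remainder rounding so that the stratum sizes sum to the total. The stratum sizes, standard deviations, weight, and function names below are hypothetical.

```python
import math


def proportional_allocation(stratum_sizes, n):
    """Shares proportional to the stratum population sizes N_h."""
    total = sum(stratum_sizes)
    return [n * size / total for size in stratum_sizes]


def neyman_allocation(stratum_sizes, stratum_sds, n):
    """Shares proportional to N_h * S_h (stratum size times standard deviation)."""
    weights = [size * sd for size, sd in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


def mixed_allocation(stratum_sizes, stratum_sds, n, alpha=0.5):
    """Convex combination of the two allocations (assumed form of the mixing)."""
    prop = proportional_allocation(stratum_sizes, n)
    ney = neyman_allocation(stratum_sizes, stratum_sds, n)
    return [alpha * p + (1.0 - alpha) * q for p, q in zip(prop, ney)]


def round_to_total(shares, n):
    """Largest-remainder rounding so the integer sizes sum exactly to n."""
    sizes = [math.floor(s) for s in shares]
    leftover = n - sum(sizes)
    by_fraction = sorted(range(len(shares)),
                         key=lambda h: shares[h] - sizes[h], reverse=True)
    for h in by_fraction[:leftover]:
        sizes[h] += 1
    return sizes


# Three hypothetical strata (the study itself uses nine strata and n = 1000).
toy_sizes = [5000, 3000, 2000]
toy_sds = [1.0, 2.0, 4.0]
allocation = round_to_total(mixed_allocation(toy_sizes, toy_sds, n=100), 100)
print(allocation)
```

Mixing in the proportional component keeps small or low-variance strata from receiving very few units, which pure Neyman allocation can otherwise produce.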
The study variable of interest was current income (INCOME). Because the true outcome regression model was unknown, we included EXP1 and EXP2, the covariates with comparatively strong explanatory power, as regressors in the working model. This setup allows us to assess the impact of model misspecification on estimation performance and to isolate efficiency gains attributable to the DR estimator. We consider two scenarios regarding the propensity score model: Scenario A, in which the propensity score model is correctly specified, and Scenario B, in which it is misspecified.
Table 6 reports the results for the finite distribution function estimators. Overall, the REG estimator performs reasonably well, although its bias and error are somewhat larger at lower quantiles than at middle and upper quantiles, likely reflecting the limited explanatory power of the auxiliary variables and the possible overrepresentation of high-income households. Under Scenario A, the IPW and DR estimators both exhibit low bias and error, confirming the effectiveness of propensity score adjustment. Under Scenario B, the REG estimator is the most stable, while the DR estimator inherits some bias from the misspecified IPW component and thus loses efficiency. In summary, when the propensity score model is correctly specified, the IPW, REG, and DR estimators all yield stable results. However, when the propensity score model is misspecified, only the REG and DR estimators perform well, with the REG estimator performing best. These findings highlight that the choice of estimator may critically depend on the availability and explanatory power of the auxiliary variables.
Table 7 compares the bootstrap variance estimators in terms of %RB and %CPv. Consistent with the findings for the finite distribution function estimators, the REG estimator shows degraded variance performance at lower quantiles. The IPW estimator maintains a %CPv close to 95% under Scenario A, but its %CPv declines markedly under Scenario B. The DR estimator achieves both low bias and a stable %CPv across scenarios, indicating reliable variance estimation.
Table 8 compares the quantile estimators in terms of %RB and RRMSE. The REG estimator shows substantial bias at lower quantiles, whereas the IPW estimator performs well under Scenario A but deteriorates under Scenario B. The DR estimator maintains moderate bias and error across both scenarios, yielding comparatively stable performance overall.
Table 9 reports results for the Woodruff confidence intervals of the quantile estimators: %CPξ, %L, and %U. The IPW estimator attains a %CPξ close to the nominal 95% under Scenario A, but coverage drops sharply under Scenario B, accompanied by an increase in %U, indicating sensitivity to propensity score misspecification. The REG estimator performs well at the middle and upper quantiles, but shows increased %L at lower quantiles. The DR estimator maintains a stable %CPξ across scenarios, with only a slight increase in %U under Scenario B.
In summary, two simulation experiments were conducted to evaluate the proposed methods. First, when both models were correctly specified, all estimators (IPW, REG, and DR) performed well, while under double misspecification, all methods failed to provide reliable results. Second, under single-model misspecification, the DR estimator consistently maintained low bias and stable inference, confirming the robustness of the approach. Third, in the empirical application using the 2023 Korean Survey of Household Finances and Living Conditions, the central role of auxiliary variables was evident, with the DR estimator showing comparatively the most reliable performance, especially in the lower tail.