Advanced Statistical Learning: Limit Theorems for Nonparametric Conditional U-Statistics Smoothed by Asymmetric Kernels Under Missing-at-Random Sampling

Bouzebda, Salim

doi:10.3390/math14122110

Laboratory of Applied Mathematics of Compiègne (LMAC), Université de Technologie de Compiègne, Alliance Sorbonne Université, 60203 Compiègne, France

Mathematics 2026, 14(12), 2110; https://doi.org/10.3390/math14122110 (registering DOI)

Submission received: 21 April 2026 / Revised: 3 June 2026 / Accepted: 4 June 2026 / Published: 12 June 2026

(This article belongs to the Section D1: Probability and Statistics)

Abstract

This paper develops a boundary-sensitive asymptotic theory for nonparametric conditional U-statistics smoothed by support-adapted asymmetric kernels when the response variable is subject to Missing-at-Random (MAR) observation. The problem lies at the intersection of three well-established but traditionally separate lines of research: conditional U-statistics, asymmetric smoothing on constrained supports, and incomplete-data inference under MAR sampling. The contribution of the paper is not a novelty claim concerning any of these components in isolation. Rather, it consists in deriving a kernel-specific and MAR-aware limit theory for their simultaneous occurrence, where the estimators are nonlinear complete-case ratios of localized U-statistics and the localization devices are point-dependent approximate identities adapted to the geometry of the covariate support. The analysis covers three principal classes of support-respecting smoothers: Dirichlet kernels on the simplex, Bernstein polynomial smoothers, and multivariate beta kernels on hypercubes, with an additional extension to mixed continuous–categorical regressors. These smoothing schemes are not translation-invariant, and their local moments, effective support, normalizing constants and $L^{2}$-masses vary with the evaluation point, especially near the boundary. Consequently, their incorporation into conditional U-statistics requires more than a direct transfer of ordinary asymmetric-kernel regression theory. The numerator and denominator of the estimators are localized U-statistics whose stochastic expansions are governed by Hoeffding projections, including canonical components that must be controlled uniformly over the conditioning domain. Under regularity, smoothness and positivity assumptions adapted to the MAR setting, we establish uniform consistency, weak and strong uniform convergence rates, stochastic expansions and asymptotic normality. The results are obtained both on fixed compact subsets and on interior regions approaching the boundary, thereby identifying how support geometry enters the bias and stochastic normalizations. A central feature of the theory is the separation between the deterministic effect of complete-case sampling and its stochastic effect. For the complete-case estimator, the natural deterministic equivalent is obtained by replacing the design density $f$ with the effective complete-case density $pf$, where $p$ is the propensity score. Thus, the MAR mechanism may enter higher-order deterministic bias constants through the local design tilt, whereas the leading stochastic dispersion reflects the loss of effective information through propensity score factors. The precise variance constants and normalizing rates remain kernel-specific, depending on the local $L^{2}$-structure of the Dirichlet, Bernstein or beta smoothing device. The paper should therefore be viewed as a MAR extension and refinement of the complete-data asymmetric-kernel conditional U-statistic theory. It provides a common probabilistic architecture for several boundary-adapted smoothing schemes while retaining the kernel-dependent bias operators, variance constants, boundary regimes and Hoeffding-projection structures required for sharp asymptotic interpretation. Numerical experiments illustrate the finite-sample behavior predicted by the theory and highlight the interaction between support-adapted smoothing, boundary effects and incomplete response observation.

Keywords:

asymmetric kernels; asymptotic normality; beta kernel; Bernstein kernels; boundary bias; conditional U-statistics; Dirichlet kernel; nonparametric regression estimation; rates of convergence; strong convergence; Missing-at-Random sampling

MSC:

62G05; 62G08; 62G20; 62G35; 62G07; 62G32; 62G30; 62E20

1. Introduction and Motivations

Since the foundational works of [1,2], and, in a broader functional-analytic sense [3], the theory of U-statistics has occupied a central position in asymptotic statistics. At the most basic level, a U-statistic replaces the ordinary empirical mean by a symmetrized average of a kernel over all distinct m-tuples of observations, thereby producing an unbiased estimator of a distributional functional of order m. Yet this elementary description severely understates the depth of the subject. The probabilistic structure of U-statistics is exceptionally rich: their fluctuation theory involves orthogonal projection methods, canonical degeneracy, nonlinear symmetrization phenomena, and subtle higher-order dependence patterns that distinguish them sharply from linear empirical averages. It is precisely this combination of generality, unbiasedness, and nontrivial asymptotic behavior that has made U-statistics one of the most influential and enduring constructions in modern statistical theory.

The classical i.i.d. theory was laid down in a sequence of landmark contributions [1,2,3,4,5,6,7], where the basic laws of large numbers, projection formulas, and asymptotic normality results were established. These results not only clarify the asymptotic structure of unbiased nonlinear statistics but also reveal a general methodological principle: once a statistical functional can be represented, or approximated, by a symmetrized kernel, the asymptotic behavior of its empirical counterpart may often be analyzed by decomposing it into Hoeffding projections of different orders. This insight led to a vast subsequent literature. Extensions to dependent settings, including mixing and weak dependence structures, were developed in [8,9,10,11]. Standard monographic treatments are given in [12,13,14,15,16,17,18,19], while recent developments and an updated bibliography are documented in [20,21,22,23]. Collectively, this body of work has elevated U-statistics from a special-purpose technical device to a genuinely universal language for higher-order statistical inference.

Their ubiquity is by now well established. In nonparametric and semiparametric statistics, U-statistics arise in density estimation, regression functionals, cross-validation, goodness-of-fit methodology, robustness, rank-based inference, and the study of asymptotic distributions of complex M-estimators. For example, Stute [24] used almost sure uniform bounds for $\mathbb{P}$-canonical U-processes in the analysis of the product-limit estimator for truncated observations. Arcones and Wang [25] proposed new normality tests built from U-processes, while [26], drawing on the local U-statistic techniques of [27,28], developed weighted $L_{1}$ tests based on standardized data. In robust multivariate inference, Joly and Lugosi [29] advocated median-of-means procedures rooted in U-statistic constructions for heavy-tailed functional estimation. More generally, the modern theory of U-processes now permeates inference on qualitative properties of functions in nonparametric statistics [30,31,32], the asymptotic analysis of nonlinear estimators [13,33,34], and recent developments in functional and robust models [35].

Their range of application extends far beyond traditional nonparametric theory. In random graph asymptotics, counts of fixed subgraphs, such as triangles, are canonical examples of U-statistics [36]. In machine learning, pairwise and higher-order empirical risks expressed through U-statistics now appear routinely in ranking, metric learning, clustering, image analysis, graph inference, and supervised comparison problems [37]. The ranking problem in particular may be formulated as a pairwise classification problem whose empirical criterion is a U-statistic of order two [38,39]. Further instances include entropy estimation [40], goodness-of-fit testing [41], model-free clustering and classification of genetic data [42], non-asymptotic analysis of random compressed sensing matrices [43], multiple-group clustering in high dimension [44], dimension-agnostic inference [45], empirical risk minimization by U-process methods [46,47], asymmetric U-statistics for stationary m-dependent sequences [48], testing under left truncation and right censoring [49], quadruplet statistics in network analysis [50], distributed two-sample inference [51], and monitoring or structural detection in complex stochastic systems [52,53]. The field has also moved toward increasingly intricate regimes, including random kernels of diverging order [23,54,55,56] and even infinite-order U-statistics motivated by uncertainty quantification in ensemble procedures [57]. At a still more general level, Randles [58] proposed a route to asymptotic distribution theory for U-statistics involving estimated parameters, a program that has recently been revisited in [59].

Among the most important local incarnations of this theory are the conditional U-statistics introduced by [60]. These estimators extend the Nadaraya–Watson paradigm [61,62] from first-order conditional means to nonlinear conditional functionals of arbitrary order. More precisely, let $\{(X_{i}, Y_{i}), i \in \mathbb{N}^{*}\}$ be i.i.d. random vectors with $X_{i} \in \mathbb{R}^{d}$ and $Y_{i} \in \mathbb{R}^{q}$, and let $\varphi : \mathbb{R}^{qm} \to \mathbb{R}$ be measurable. The object of interest is the conditional functional

$$r^{(m)}(\varphi, t) = \mathbb{E}\left[\varphi(Y_{1}, \dots, Y_{m}) \mid (X_{1}, \dots, X_{m}) = (t_{1}, \dots, t_{m})\right], \quad t \in \mathbb{R}^{dm}, \tag{1.1}$$

whenever a regular version exists. Given a kernel $K$ and bandwidth $h_{n} \to 0$, the estimator proposed in [60] is

$$\hat{\hat{r}}_{n}^{(m)}(\varphi, t; h_{n}) = \frac{\sum_{(i_{1}, \dots, i_{m}) \in I(m, n)} \varphi(Y_{i_{1}}, \dots, Y_{i_{m}}) \prod_{j=1}^{m} K\left(\frac{t_{j} - X_{i_{j}}}{h_{n}}\right)}{\sum_{(i_{1}, \dots, i_{m}) \in I(m, n)} \prod_{j=1}^{m} K\left(\frac{t_{j} - X_{i_{j}}}{h_{n}}\right)}, \tag{1.2}$$

where $I(m, n)$ is the set of $m$-tuples of distinct indices from $\{1, \dots, n\}$. In the special case $m = 1$, (1.2) reduces exactly to the classical Nadaraya–Watson estimator. The importance of conditional U-statistics is conceptual as much as technical. They allow one to estimate local nonlinear characteristics of the conditional distribution of the response, and not merely conditional expectations. In other words, they provide a nonparametric mechanism for accessing quantities that depend simultaneously on several responses observed under nearby covariate values. This includes, among many others, local dependence measures, conditional covariance-type functionals, rank-based association coefficients, discrimination criteria, conditional variability measures, multisample comparison functionals, and a broad class of local nonlinear features inaccessible to ordinary regression smoothers. One may therefore regard conditional U-statistics as the natural bridge between first-order smoothing methods and the much richer world of higher-order conditional inference.

The asymptotic theory initiated in [60] has subsequently been developed along several important directions. Sen [63] obtained rates of uniform convergence in the conditioning argument. Prakasa Rao and Sen [64] investigated the corresponding limiting distributions and clarified their relation to Stute’s original results. Harel and Puri [65] extended the theory to weakly dependent observations under mixing assumptions and connected the resulting estimators to Bayes-risk consistency in discrimination problems. Stute [66] introduced symmetrized nearest-neighbor versions of conditional U-statistics as alternatives to ordinary kernel smoothers. A decisive methodological advance was then achieved by [67], who established a much stronger type of consistency: uniformity not only in the location parameter, but also in the bandwidth over shrinking intervals, and simultaneously over classes of kernels $\mathcal{F}$. Their argument relied crucially on the local conditional U-process framework developed in [27]. This program has since been extended in several directions; see [20,21,68,69,70]. Closely related higher-order quantile problems, including Bahadur–Kiefer representations and bootstrap properties for U-quantiles, were studied in [71,72,73,74,75,76,77,78,79,80,81,82].
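To make the ratio structure of (1.2) concrete, the following is a minimal numerical sketch for the case $m = 2$ with scalar covariates. The function names, the Gaussian choice of $K$, the simulated model and the kernel $\varphi(y_{1}, y_{2}) = (y_{1} - y_{2})^{2}/2$ are illustrative assumptions and are not taken from [60]; with $t_{1} = t_{2} = t$, this choice of $\varphi$ targets the conditional variance of $Y$ given $X = t$.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, standing in for the symmetric kernel K (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def conditional_u_stat(x_obs, y_obs, t1, t2, h, phi):
    """Evaluate the ratio (1.2) for m = 2 and scalar covariates (d = 1)."""
    w1 = gaussian_kernel((t1 - x_obs) / h)        # K((t_1 - X_i) / h_n)
    w2 = gaussian_kernel((t2 - x_obs) / h)        # K((t_2 - X_j) / h_n)
    weights = np.outer(w1, w2)                    # product weight of the pair (i, j)
    np.fill_diagonal(weights, 0.0)                # I(2, n) excludes i == j
    values = phi(y_obs[:, None], y_obs[None, :])  # phi(Y_i, Y_j) on the n x n grid
    return np.sum(weights * values) / np.sum(weights)

def half_squared_diff(y1, y2):
    """phi(y1, y2) = (y1 - y2)^2 / 2; its conditional mean at t1 = t2 = t is Var(Y | X = t)."""
    return 0.5 * (y1 - y2) ** 2

# Hypothetical simulated data: Var(Y | X) = 0.25 by construction.
rng = np.random.default_rng(0)
x_obs = rng.uniform(0.0, 1.0, size=2000)
y_obs = x_obs + 0.5 * rng.standard_normal(2000)
estimate = conditional_u_stat(x_obs, y_obs, 0.5, 0.5, 0.1, half_squared_diff)
print(f"local variance estimate at t = 0.5: {estimate:.3f}")
```

The sketch stores all $n^{2}$ pair weights in memory, which is convenient for illustration but would need to be reorganized for large samples or for $m > 2$.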

Notwithstanding this impressive progress, one serious limitation persists throughout most of the existing literature: the near-exclusive use of symmetric kernels. While such kernels are natural and convenient on unconstrained Euclidean domains, they are intrinsically mismatched to compact or otherwise constrained supports. Near the boundary, symmetric kernels place nonnegligible mass outside the support of the covariates, generating the well-known boundary bias phenomenon. This issue is classical and pervasive; see, among many others [83,84,85,86,87,88,89,90,91,92,93,94]. A substantial literature has therefore developed around boundary-correction strategies, beginning with early important contributions such as [95,96]. Among the available remedies, support-adapted asymmetric kernels have progressively emerged as one of the most coherent and geometrically faithful solutions [97]. Their key feature is that the kernel support automatically respects the support of the target distribution, while the shape of the kernel varies with the evaluation point. This location-dependent geometry permits an intrinsically adaptive smoothing mechanism and avoids the artificial mass leakage responsible for boundary distortions under symmetric smoothing.

This perspective has generated a large and fertile literature. In the univariate compact-support setting, Chen [98] introduced the beta-kernel estimator for density estimation, and Chen [99] studied the corresponding regression problem. The asymptotic theory of beta kernels was subsequently developed in [100,101,102,103,104,105,106,107,108,109]. Related regression results with fixed design appear in [110,111], and multivariate product beta constructions were considered in [112]. On simplex-constrained supports, Aitchison and Lauder [113] introduced the Dirichlet-kernel estimator for compositional data. Closely connected to these developments is the Bernstein polynomial methodology, whose roots go back to [114]. Its asymptotic analysis was studied by [115,116,117] and later extended to the multivariate setting in [118,119,120,121,122,123,124,125,126,127,128,129]. Further refinements may be found in [130,131,132,133,134,135,136,137,138,139,140,141,142,143,144]. A recurrent conclusion across these works is that asymmetric kernels and Bernstein-type smoothers exhibit markedly improved boundary behavior and can substantially outperform conventional symmetric procedures when the support is compact or geometrically constrained.

Yet, and this is one of the starting points of the present paper, the theory remains largely incomplete at the level of conditional nonlinear functionals. Although asymmetric kernels are now rather well understood for density estimation and, to a lesser extent, for ordinary nonparametric regression, essentially no general asymptotic theory appears to be available for conditional U-statistics smoothed by support-adapted asymmetric kernels. This gap is not a mere technical oversight. The transition from linear local averages to locally weighted U-functionals is conceptually and analytically substantial. One must now analyze a nonlinear ratio of localized U-statistics, control kernels whose shape depends on the evaluation point, handle Hoeffding projections of point-dependent kernels, and deal with canonical terms whose behavior is substantially more intricate than in the linear case. In particular, the higher-order dependence structure intrinsic to conditional U-statistics cannot be treated by a naive transfer of techniques from classical Nadaraya–Watson smoothing or asymmetric density estimation. The nonlinear ratio structure, the localized U-process nature of the fluctuations, and the point-dependent asymmetry of the smoother interact in a genuinely nontrivial manner.

A second, independent, and practically unavoidable layer of difficulty is introduced by missing data. In modern statistical applications, incomplete responses are not exceptional but routine. Missingness arises through nonresponse, sensor failure, fusion of data sources, longitudinal attrition, intermittent measurement, privacy constraints, truncation, corruption, or recording limitations. This issue is ubiquitous across applications; see [145,146,147,148]. The classical taxonomy of [148] distinguishes Missing Completely at Random (MCAR), Missing At Random (MAR), and Not Missing At Random (NMAR). The MCAR assumption is often implausibly strong, whereas NMAR mechanisms are notoriously difficult to analyze without stringent structural assumptions. The MAR framework occupies the most useful intermediate position: missingness may depend on observed covariates, but not on the unobserved response itself once those covariates are conditioned upon. This assumption is simultaneously realistic enough for a broad range of applications and mathematically tractable enough to support rigorous asymptotic analysis. Moreover, as emphasized in [149], procedures developed under MAR can perform remarkably well in practice, often more reliably than misspecified NMAR models. At the same time, the formal content of the MAR assumption is subtler than the usual informal slogan suggests, and important conceptual clarifications may be found in [150,151,152,153].

The conjunction of support-adapted smoothing and incomplete responses is especially compelling when the covariate space is itself bounded, compositional, or otherwise geometrically structured. Compact supports arise naturally in economics and finance, where covariates may be proportions, rates, shares, recovery fractions, or budget allocations. They also occur in compositional models, nonparametric copula methodology [127], the nonparametric component of partial linear regressions [154], matching procedures [155], and structural auction models [156]. In such settings, the support geometry is not a peripheral technicality; it is an intrinsic feature of the inferential problem. A faithful nonparametric theory for higher-order conditional functionals must therefore account, simultaneously and coherently, for nonlinear conditioning, geometric support constraints, boundary bias, and incomplete sampling.

We now clarify more precisely the sense in which the present contribution is new. The individual components of the problem have, of course, substantial precedents. Conditional U-statistics go back to [60] and their uniform asymptotic theory has been developed in several directions. Asymmetric kernels, including beta, Dirichlet and Bernstein-type smoothing devices, are well established in density estimation and in ordinary nonparametric regression on bounded supports. Likewise, complete-case and inverse-probability ideas under Missing-at-Random sampling are classical in missing-data analysis. The novelty of the present paper is not located in any one of these ingredients taken separately, but in the asymptotic analysis of their simultaneous occurrence in a nonlinear conditional U-statistic problem.

More specifically, the estimators considered here combine four features that are usually handled separately: localized higher-order U-statistics, point-dependent asymmetric kernels, boundary-sensitive support geometry, and complete-case MAR sampling. This combination is analytically nontrivial. Unlike ordinary Nadaraya–Watson regression, the numerator and denominator are localized U-statistics and their stochastic behavior is governed by Hoeffding projections, including canonical components whose degeneracy has to be controlled uniformly in the localization point. Unlike symmetric kernel smoothing, the kernels used here are not translation invariant: their supports, moments, normalizing constants and $L^{2}$-norms depend on the evaluation point and may change regime as the point approaches the boundary of the simplex, cube or mixed support. Finally, unlike the complete-data case, MAR sampling changes the effective local design measure from $f(x)\,dx$ to $p(x) f(x)\,dx$ for complete-case estimators, and modifies the leading stochastic variance through the propensity score.

Accordingly, the present paper develops a kernel-specific limit theory rather than a merely formal extension of existing results. For Dirichlet, Bernstein and beta smoothers we derive the deterministic centering, uniform stochastic bounds, consistency statements, asymptotic normalizations and variance expressions in forms that explicitly reflect the geometry of the corresponding support. In particular, the bias is governed by the local moments of the asymmetric smoothing device and by the effective complete-case design density $pf$, whereas the leading fluctuation is governed by the first Hoeffding projection and contains the usual inverse-propensity loss of information. This separation between smoothing geometry, higher-order U-statistic structure and MAR-induced information loss is one of the main conceptual outcomes of the paper. Thus, the phrase “unified framework” is used here in a restricted and technical sense: the paper provides a common asymptotic treatment of several support-adapted asymmetric smoothing schemes for conditional U-statistics under MAR sampling, while preserving the kernel-specific bias, variance and boundary behavior of each smoother. We do not claim novelty for U-statistics, asymmetric kernels or MAR methods in isolation. The contribution is the boundary-adapted, MAR-aware and U-process-based asymptotic theory obtained for their combination.

The novelty of the paper should be understood in a precise sense. Conditional U-statistics, asymmetric kernels and MAR sampling are all established topics. What appears not to have been developed before is a kernel-specific asymptotic theory for conditional U-statistics smoothed by support-adapted asymmetric kernels when the responses are observed under a Missing-at-Random mechanism. This setting is not a direct corollary of ordinary asymmetric-kernel regression or of complete-data conditional U-statistic theory. The numerator and denominator are localized U-statistics; their fluctuations involve Hoeffding projections and canonical U-process terms; the smoothing kernels are point-dependent and boundary-sensitive; and complete-case MAR sampling replaces the design density by the effective density $pf$ in the deterministic centering while inflating the stochastic variance through the propensity score. The paper therefore contributes a genuinely joint analysis of higher-order conditional U-statistics, asymmetric boundary-adapted smoothing and incomplete-response sampling. For Dirichlet, Bernstein and beta smoothers, we obtain explicit consistency results, uniform stochastic bounds, bias descriptions, asymptotic normalizations and variance formulae. The resulting theory identifies separately the roles of boundary geometry, localized U-statistic dependence and MAR-induced information loss. This is the sense in which the proposed framework is unified: it provides one asymptotic architecture for several support-respecting smoothers, while retaining the kernel-specific constants and boundary regimes needed for statistical interpretation.

The relevance of the proposed estimators is not purely abstract. They arise naturally in discrimination and classification, in multisample nonlinear conditional inference, and in the estimation of local rank-based association measures such as conditional Kendall-type coefficients. They also provide a flexible mechanism for estimating conditional functionals that genuinely depend on several responses and cannot be reduced to first-order regression objects. From a practical perspective, their support-adapted nature makes them particularly attractive in problems where the covariates are inherently bounded, while the MAR formulation makes them immediately relevant to realistic data-analytic environments in which incomplete responses are unavoidable. The present manuscript is closely connected to, but technically distinct from, the general delta-sequence theory of [157]. That work provides an abstract probabilistic framework for complete-case conditional U-statistics under MAR, in which the localization device is treated as a positive approximate identity. The present paper develops a sharper and more geometric theory for a specific class of approximate identities, namely support-adapted asymmetric kernels. This specialization is analytically nontrivial because Dirichlet, beta, and Bernstein smoothers are point-dependent, boundary-sensitive, and support-constrained. Their local moments, normalizing constants, and $L^{2}$-norms vary across the domain and may change regime near boundary strata. Consequently, the resulting stochastic expansions, uniform rates, and asymptotic variance formulae require arguments beyond those available in the abstract delta-sequence setting.

The remainder of the paper is organized as follows. In Section 3, we introduce the conditional U-statistic estimator based on Dirichlet kernels. We begin with the regression case $m = 1$ in Section 3.1, where we prove a uniform convergence result of independent interest; see Theorem 1. The extension to higher-order conditional U-statistics is developed in Section 3.2, culminating in Corollary 1. Their limiting distribution is established in Section 3.3; see Theorem 5 and Corollary 2. Section 4 is devoted to Bernstein polynomial smoothing. The case $m = 1$ is treated in Section 4.2; see Theorem 6 and Corollary 3. Higher-order conditional U-statistics based on Bernstein smoothers are analyzed in Section 4.3, with main results in Theorem 8 and Corollary 4. Section 5 addresses beta-kernel smoothing. Weak uniform convergence is established in Section 10, while strong uniform convergence is derived in Section 5.3; see Corollaries 5 and 6. Section 5.4 extends the analysis to mixed categorical and continuous regressors, leading to Corollary 7. Numerical experiments illustrating the finite-sample behavior of the proposed estimators are presented in Section 9. Finally, concluding remarks and possible extensions are gathered in Section 10. For readability, all proofs are deferred to Section 11, and supplementary technical arguments are collected in Appendix A.

Notation

For the reader’s convenience, we collect in Table 1 the main notation used throughout the paper.

2. Preliminaries and Estimation Procedure

Let us consider a sequence of independent and identically distributed random vectors $\{(X_{i}, Y_{i}), i \in \mathbb{N}^{*}\}$ defined on a common probability space $(\Omega, \mathcal{F}, \mathbb{P})$, where $X_{i} \in \mathcal{X} \subseteq [0, 1]^{d}$ and $Y_{i} \in \mathcal{Y} := \mathbb{R}^{q}$. Let $\varphi : \mathcal{Y}^{m} \to \mathbb{R}$ be a measurable function with respect to the product Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R}^{q})^{\otimes m}$, and assume that

$$\mathbb{E}\left[|\varphi(Y_{1}, \dots, Y_{m})|\right] < \infty.$$

We are interested in the conditional functional

$$r^{(m)}(\varphi, \tilde{x}) = \mathbb{E}\left[\varphi(Y_{1}, \dots, Y_{m}) \mid (X_{1}, \dots, X_{m}) = \tilde{x}\right], \quad \tilde{x} \in \mathcal{X}^{m}, \tag{2.1}$$

defined as a regular version of the conditional expectation. Indeed, since $\mathcal{X}^{m}$ is a Borel subset of the Polish space $([0, 1]^{d})^{m}$, it is a standard Borel space. Therefore, a regular conditional distribution of $(Y_{1}, \dots, Y_{m})$ given $(X_{1}, \dots, X_{m})$ exists, and consequently there is a Borel measurable version of the map

$$\tilde{x} \longmapsto \mathbb{E}\left[\varphi(Y_{1}, \dots, Y_{m}) \mid (X_{1}, \dots, X_{m}) = \tilde{x}\right].$$

Thus, the quantity in (2.1) is well defined whenever

$$\mathbb{E}\left[|\varphi(Y_{1}, \dots, Y_{m})|\right] < \infty.$$

Stute [60] presented a class of estimators for $r^{(m)}(\varphi, \tilde{x})$, called conditional U-statistics, defined for each $\tilde{x} \in \mathcal{X}^{m}$ and $\ell \in \{1, 2, 3\}$ by

$$\tilde{\tilde{r}}_{n, \ell}^{(m)}(\varphi, \tilde{x}; \bar{\Lambda}_{n, \ell}(\tilde{x})) = \frac{\sum_{(i_{1}, \dots, i_{m}) \in I(m, n)} \varphi(Y_{i_{1}}, \dots, Y_{i_{m}}) K_{\Lambda_{n, \ell}(x_{1})}(X_{i_{1}}) \cdots K_{\Lambda_{n, \ell}(x_{m})}(X_{i_{m}})}{\sum_{(i_{1}, \dots, i_{m}) \in I(m, n)} K_{\Lambda_{n, \ell}(x_{1})}(X_{i_{1}}) \cdots K_{\Lambda_{n, \ell}(x_{m})}(X_{i_{m}})}, \tag{2.2}$$

where $\bar{\Lambda}_{n, \ell}(\tilde{x}) = (\Lambda_{n, \ell}(x_{1}), \dots, \Lambda_{n, \ell}(x_{m}))$ will be specified in the sections below, and

$$I(m, n) := \left\{(i_{1}, \dots, i_{m}) \in \{1, \dots, n\}^{m} : i_{1}, \dots, i_{m} \text{ distinct}\right\}$$

denotes the set of $m$-tuples of distinct indices. In the particular case $m = 1$, $r^{(m)}(\varphi, \tilde{x})$ reduces to $r^{(1)}(\varphi, x) = \mathbb{E}(\varphi(Y) \mid X = x)$ and Stute’s estimator becomes the Nadaraya–Watson estimator of $r^{(1)}(\varphi, x)$ given by

$$\tilde{\tilde{r}}_{n, \ell}^{(1)}(\varphi, x) := \frac{\sum_{i=1}^{n} \varphi(Y_{i}) K_{\Lambda_{n, \ell}(x)}(X_{i})}{\sum_{i=1}^{n} K_{\Lambda_{n, \ell}(x)}(X_{i})}. \tag{2.3}$$
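As an illustration of (2.3), the sketch below evaluates the ratio for $d = 1$ with a beta kernel of the classical form, namely the Beta$(x/b + 1, (1 - x)/b + 1)$ density evaluated at the observations. This parametrization and the smoothing parameter `b` are assumptions standing in for $\Lambda_{n, \ell}(x)$, which the text specifies only in later sections; the function names and the simulated model are hypothetical.

```python
import numpy as np
from math import lgamma

def beta_kernel(t, x, b):
    """Beta(x/b + 1, (1 - x)/b + 1) density at t; assumed stand-in for K_{Lambda(x)}."""
    a1, a2 = x / b + 1.0, (1.0 - x) / b + 1.0
    log_norm = lgamma(a1) + lgamma(a2) - lgamma(a1 + a2)  # log of the beta function B(a1, a2)
    t = np.clip(t, 1e-12, 1.0 - 1e-12)                    # keep log() finite at the endpoints
    return np.exp((a1 - 1.0) * np.log(t) + (a2 - 1.0) * np.log1p(-t) - log_norm)

def nw_beta(x_obs, y_obs, x, b, phi=lambda y: y):
    """Ratio (2.3): kernel-weighted average of phi(Y_i) at the evaluation point x."""
    w = beta_kernel(x_obs, x, b)
    return np.sum(w * phi(y_obs)) / np.sum(w)

# Hypothetical simulated data on [0, 1]; the boundary points 0 and 1 are valid evaluation points.
rng = np.random.default_rng(1)
x_obs = rng.uniform(0.0, 1.0, size=2000)
y_obs = np.sin(2.0 * np.pi * x_obs) + 0.1 * rng.standard_normal(2000)
for x in (0.0, 0.25, 1.0):
    print(f"x = {x:.2f}: estimate {nw_beta(x_obs, y_obs, x, b=0.01):.3f}")
```

Because the normalizing constant of the kernel depends on $x$ but not on $X_{i}$, it cancels in the ratio; it is retained here only so that `beta_kernel` returns a genuine density.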

A distinctive contribution of the present work is its explicit incorporation of incomplete response observations into the foregoing nonparametric framework. Throughout, the covariate sequence $\{X_{i}\}_{i=1}^{n}$ is assumed to be fully observed, whereas the corresponding responses may be subject to missingness. To formalize this mechanism, let $\delta_{i}$ denote the response-observation indicator, defined by $\delta_{i} = 1$ when $Y_{i}$ is observed and $\delta_{i} = 0$ otherwise. In accordance with the foundational missing-data taxonomies introduced in [147,148,158], we work under the Missing At Random (MAR) paradigm. This assumption stipulates that, conditionally on the observed covariates, the probability of observing the response is independent of the possibly unobserved response value itself. More precisely, we assume that

$$\mathbb{P}(\delta_{i} = 1 \mid X_{i}, Y_{i}) = \mathbb{P}(\delta_{i} = 1 \mid X_{i}) =: p(X_{i}) \quad \mathbb{P}\text{-almost surely}, \tag{2.4}$$

where $p : \mathcal{X} \to [0, 1]$ denotes the conditional probability of response observation, commonly referred to as the propensity score. Unless otherwise stated, this function is assumed to be continuous on its domain.

Condition (2.4) is equivalently interpreted as the conditional independence of $\delta_{i}$ and $Y_{i}$ given $X_{i}$. This assumption provides a mathematically tractable yet substantively meaningful framework for a broad range of applications, including environmental monitoring systems, biomedical follow-up studies, and longitudinal epidemiological investigations. Moreover, as emphasized in [149], inference procedures constructed under a correctly specified MAR mechanism may yield substantially more reliable prediction and imputation performance than approaches based on misspecified nonignorable missingness mechanisms in the absence of adequate auxiliary information. For the asymptotic validity of the resulting complete-case procedure, we impose the standard positivity condition

$$\inf_{x \in S} p(x) \geq c > 0$$

on the relevant compact or effective domain $S \subset \mathcal{X}$. This condition prevents the local effective sample size from degenerating and ensures that the denominator of the estimator remains asymptotically well behaved. The natural extension of (2.2) to the incomplete-response setting is then given by

$$\hat{r}_{n, \ell}^{(m)}(\varphi, \tilde{x}; \bar{\Lambda}_{n, \ell}(\tilde{x})) = \frac{\sum_{(i_{1}, \dots, i_{m}) \in I(m, n)} \varphi(Y_{i_{1}}, \dots, Y_{i_{m}}) \delta_{i_{1}} \cdots \delta_{i_{m}} K_{\Lambda_{n, \ell}(x_{1})}(X_{i_{1}}) \cdots K_{\Lambda_{n, \ell}(x_{m})}(X_{i_{m}})}{\sum_{(i_{1}, \dots, i_{m}) \in I(m, n)} \delta_{i_{1}} \cdots \delta_{i_{m}} K_{\Lambda_{n, \ell}(x_{1})}(X_{i_{1}}) \cdots K_{\Lambda_{n, \ell}(x_{m})}(X_{i_{m}})}. \tag{2.5}$$

Remark 1.

We stress that the estimator defined in (2.5) is a complete-case local ratio estimator. It is not an inverse-probability weighted estimator unless the factors

δ_{i}

are explicitly replaced by

δ_{i} / p (X_{i})

. This distinction is not merely terminological: the two procedures have different deterministic equivalent smoothing measures and therefore different bias constants, although both estimate the same conditional target under MAR. Let

g_{a} (x) : = a (x) f (x), a (x) > 0,

and define the deterministic smoothed functional

r_{a, n, ℓ}^{(m)} (φ, \tilde{x}) = \frac{\int_{X^{m}} r^{(m)} (φ, \tilde{t}) \prod_{j = 1}^{m} g_{a} (t_{j}) K_{Λ_{n, ℓ} (x_{j})} (t_{j}) d \tilde{t}}{\int_{X^{m}} \prod_{j = 1}^{m} g_{a} (t_{j}) K_{Λ_{n, ℓ} (x_{j})} (t_{j}) d \tilde{t}} .

Then the deterministic centering associated with the complete-case estimator is obtained by taking

a = p, g_{p} = p f,

whereas the deterministic centering associated with the IPW ratio estimator is obtained by taking

a \equiv 1, g_{1} = f .

Indeed, under MAR,

E (δ ∣ X, Y) = E (δ ∣ X) = p (X),

so complete-case smoothing changes the local design measure from

f (x) d x

to

p (x) f (x) d x

. In contrast, for the IPW factor,

E (\frac{δ}{p (X)} | X, Y) = 1,

and hence the deterministic IPW smoothing measure is again

f (x) d x

. Consequently, any formula involving the density f alone corresponds either to the complete-data/IPW centering or to the special case where p is locally constant; for the complete-case estimator, the corresponding density is

p f

. This distinction is also visible at the level of the bias expansion. Let

T_{n, ℓ, \tilde{x}}

denote a random vector on

X^{m}

with product density

\prod_{j = 1}^{m} K_{Λ_{n, ℓ} (x_{j})} (t_{j}),

and put

Δ_{n, ℓ} = T_{n, ℓ, \tilde{x}} - \tilde{x}, μ_{n, ℓ} (\tilde{x}) = E (Δ_{n, ℓ}), Σ_{n, ℓ} (\tilde{x}) = E (Δ_{n, ℓ} Δ_{n, ℓ}^{⊤}) .

Writing

q_{a} (\tilde{x}) = \log \{\prod_{j = 1}^{m} g_{a} (x_{j})\} = \sum_{j = 1}^{m} \log g_{a} (x_{j}),

a standard second-order ratio expansion gives, on interior compact sets,

\begin{matrix} r_{a, n, ℓ}^{(m)} (φ, \tilde{x}) - r^{(m)} (φ, \tilde{x}) & = & \nabla r^{(m)} {(φ, \tilde{x})}^{⊤} μ_{n, ℓ} (\tilde{x}) + \frac{1}{2} tr [\nabla^{2} r^{(m)} (φ, \tilde{x}) Σ_{n, ℓ} (\tilde{x})] \\ + \nabla r^{(m)} {(φ, \tilde{x})}^{⊤} Σ_{n, ℓ} (\tilde{x}) \nabla q_{a} (\tilde{x}) + o (∥ μ_{n, ℓ} ∥ + ∥ Σ_{n, ℓ} ∥) . \end{matrix}

Thus, for the complete-case estimator,

q_{p} (\tilde{x}) = \sum_{j = 1}^{m} \log \{p (x_{j}) f (x_{j})\},

whereas for the IPW estimator,

q_{1} (\tilde{x}) = \sum_{j = 1}^{m} \log f (x_{j}) .

Therefore, the propensity score p cancels from the zeroth-order target under MAR, but in general it does not disappear from the higher-order smoothing bias of the complete-case ratio. It enters through the local design tilt

\nabla \log (p f) = \nabla \log f + \nabla \log p

. Only when p is locally constant, or when the corresponding design-gradient term is of smaller order, does the complete-case bias constant reduce to the complete-data/IPW one. The variance has a different interpretation. In both complete-case and IPW formulations, missingness reduces the effective local information. At the level of the leading Hoeffding projection, the variance contains the usual inverse-propensity inflation. In the simplest first-order case

m = 1

, this takes the familiar local form

Var \{{\hat{r}}_{n, ℓ}^{(1)} (φ, x)\} = \frac{σ_{φ}^{2} (x) {∥ K_{Λ_{n, ℓ} (x)} ∥}_{2}^{2}}{n p (x) f (x)} \{1 + o (1)\},

where

σ_{φ}^{2} (x) = Var \{φ (Y) ∣ X = x\} .

For conditional U-statistics of order m, the analogous expression is obtained by replacing

σ_{φ}^{2}

by the appropriate conditional Hoeffding-projection variance. In schematic form,

V_{n, ℓ}^{(m)} (\tilde{x}) = \frac{1}{n} \sum_{j = 1}^{m} \frac{v_{j} (\tilde{x}) {∥ K_{Λ_{n, ℓ} (x_{j})} ∥}_{2}^{2}}{p (x_{j}) f (x_{j})} \{1 + o (1)\},

where

v_{j} (\tilde{x})

denotes the variance of the j-th conditional first Hoeffding projection of

φ (Y_{1}, \dots, Y_{m})

at

(X_{1}, \dots, X_{m}) = \tilde{x}

. This formula should be read as identifying the missingness contribution to the stochastic scale; the precise kernel-dependent constant is the one displayed in each specific Dirichlet, Bernstein, or beta-kernel theorem. Consequently, the MSE expansion must be written with estimator-specific bias constants:

{MSE}_{c c} \{{\hat{r}}_{n, ℓ}^{(m)} (φ, \tilde{x})\} = {[r_{p, n, ℓ}^{(m)} (φ, \tilde{x}) - r^{(m)} (φ, \tilde{x})]}^{2} + V_{n, ℓ}^{(m)} (\tilde{x}) + o (\cdot),

whereas

{MSE}_{i p w} \{{\hat{r}}_{n, ℓ}^{(m), I P W} (φ, \tilde{x})\} = {[r_{1, n, ℓ}^{(m)} (φ, \tilde{x}) - r^{(m)} (φ, \tilde{x})]}^{2} + V_{n, ℓ}^{(m)} (\tilde{x}) + o (\cdot) .

If, for a given asymmetric smoothing scheme, the squared bias and variance have the generic local orders

{[r_{a, n, ℓ}^{(m)} (φ, \tilde{x}) - r^{(m)} (φ, \tilde{x})]}^{2} \sim C_{B, a, ℓ} (\tilde{x}) b_{n}^{2 β}, V_{n, ℓ}^{(m)} (\tilde{x}) \sim \frac{C_{V, ℓ} (\tilde{x})}{n b_{n}^{κ}},

then the corresponding pointwise optimal bandwidth is

b_{n, a}^{★} (\tilde{x}) = {\{\frac{κ C_{V, ℓ} (\tilde{x})}{2 β C_{B, a, ℓ} (\tilde{x}) n}\}}^{1 / (2 β + κ)} .

Thus the complete-case optimal bandwidth uses the bias constant

C_{B, p, ℓ}

, which involves the effective density

p f

, whereas the IPW optimal bandwidth uses

C_{B, 1, ℓ}

, which involves f. This is the precise sense in which the complete-case and IPW formulations must be kept separate throughout the density, bias, variance, MSE, and bandwidth calculations.
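The bandwidth formula above follows from setting the derivative of C_B b^{2β} + C_V / (n b^{κ}) with respect to b equal to zero. The short sketch below evaluates it and checks numerically that the result is a local minimizer. The function names and all numerical constants (bias constants 2.0 and 1.0, variance constant 0.5, n = 1000, β = 1, κ = 0.5) are hypothetical values chosen only for illustration; they are not constants derived in the paper.

```python
def optimal_bandwidth(c_bias, c_var, n, beta, kappa):
    """Minimizer over b > 0 of c_bias * b**(2*beta) + c_var / (n * b**kappa)."""
    return (kappa * c_var / (2.0 * beta * c_bias * n)) ** (1.0 / (2.0 * beta + kappa))

def amse(b, c_bias, c_var, n, beta, kappa):
    """Leading squared-bias-plus-variance criterion at bandwidth b."""
    return c_bias * b ** (2.0 * beta) + c_var / (n * b ** kappa)

# Hypothetical constants: the complete-case bias constant (a = p) is taken
# larger than the IPW one (a = 1); the variance constant is shared.
n, beta, kappa, c_var = 1000, 1.0, 0.5, 0.5
b_cc = optimal_bandwidth(2.0, c_var, n, beta, kappa)
b_ipw = optimal_bandwidth(1.0, c_var, n, beta, kappa)
print(b_cc, b_ipw)
```

With a common variance constant, a larger bias constant yields a smaller optimal bandwidth, so the two formulations generally call for different tuning even though they target the same functional.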

The statistic in (2.5) is therefore a complete-case local ratio estimator, obtained by retaining only those m-tuples for which all response components are observed. Owing to the MAR mechanism, however, its exact finite-sample centering is not, in general, the target functional

r^{(m)} (φ, \tilde{x})

itself. Rather, the estimator is naturally centered around the propensity-weighted smoothed functional

r_{p, n, ℓ}^{(m)} (φ, \tilde{x}) : = \frac{\int_{X^{m}} r^{(m)} (φ, \tilde{t}) \prod_{j = 1}^{m} p (t_{j}) f (t_{j}) K_{Λ_{n, ℓ} (x_{j})} (t_{j}) d \tilde{t}}{\int_{X^{m}} \prod_{j = 1}^{m} p (t_{j}) f (t_{j}) K_{Λ_{n, ℓ} (x_{j})} (t_{j}) d \tilde{t}} .

Thus, the missingness mechanism modifies the local smoothing measure through the multiplicative factor

p (\cdot)

, thereby replacing the design density f by the effective complete-case density

p f

. Nevertheless, under the continuity and positivity assumptions imposed on the propensity score, this modification is asymptotically negligible at the target point. Consequently,

r_{p, n, ℓ}^{(m)} (φ, \tilde{x}) = r^{(m)} (φ, \tilde{x}) + o (1),

uniformly over the domain of interest. This observation shows that, although the complete-case estimator is centered around a finite-sample propensity-adjusted smoothing functional, it remains asymptotically centered at the same regression-type target as in the fully observed case. Throughout this paper, any multivariate point is written in bold. To avoid confusion, we write

x = (x_{1}, \dots, x_{d})

for

x \in X

and we denote

\tilde{x} : = (x_{1}, \dots, x_{m})

an m-tuple of multivariate points

x_{i} \in X

,

1 \leq i \leq m

. Accordingly, we denote

1 = (1, \dots, 1)

as a d-dimensional vector whose components are all equal to 1, and

\tilde{1} = (1, \dots, 1)

an m-tuple of points

1

. From now on, we shall use the following notation:

\tilde{X} : = (X_{1}, \dots, X_{m}) \in X^{m}, and {\tilde{X}}_{i} : = (X_{i_{1}}, \dots, X_{i_{m}}) \in X^{m}, i \in I (m, n),

\tilde{Y} : = (Y_{1}, \dots, Y_{m}) \in R^{q m}, and {\tilde{Y}}_{i} : = (Y_{i_{1}}, \dots, Y_{i_{m}}) \in R^{q m}, i \in I (m, n),

\tilde{δ} : = (δ_{1}, \dots, δ_{m}) \in \{0, 1\}^{m}, and {\tilde{δ}}_{i} : = (δ_{i_{1}}, \dots, δ_{i_{m}}) \in \{0, 1\}^{m}, i \in I (m, n) .

We now define for all

\tilde{x} = (x_{1}, \dots, x_{m}) \in X^{m}

, and

ℓ \in \{1, 2, 3\}

the augmented kernel that incorporates both the kernel weights and the missingness indicators:

G_{φ, \tilde{x}, ℓ}^{(miss)} (\tilde{t}, \tilde{y}, \tilde{δ}) = φ (\tilde{y}) (\prod_{j = 1}^{m} δ_{j}) {\tilde{K}}_{{\bar{Λ}}_{n, ℓ} (\tilde{x})} (\tilde{t}), (\tilde{t}, \tilde{y}, \tilde{δ}) \in X^{m} \times R^{q m} \times \{0, 1\}^{m},

(2.6)

where

{\tilde{K}}_{{\bar{Λ}}_{n, ℓ} (\tilde{x})} (\tilde{t}) : = \prod_{i = 1}^{m} K_{Λ_{n, ℓ} (x_{i})} (t_{i}) .

For

ℓ \in \{1, 2, 3\}

, we now define the U-statistic based on the extended random vectors

Z_{i} : = (X_{i}, Y_{i}, δ_{i})

:

u_{n, ℓ} (φ, \tilde{x}) : = u_{n, ℓ}^{(m)} (G_{φ, \tilde{x}, ℓ}^{(miss)}) = \frac{(n - m)!}{n!} \sum_{i \in I (m, n)} G_{φ, \tilde{x}, ℓ}^{(miss)} ({\tilde{X}}_{i}, {\tilde{Y}}_{i}, {\tilde{δ}}_{i}) .

(2.7)

We can see that

{\hat{r}}_{n, ℓ}^{(m)} (φ, \tilde{x}, {\bar{Λ}}_{n, ℓ} (\tilde{x})) = \frac{u_{n, ℓ} (φ, \tilde{x})}{u_{n, ℓ} (1, \tilde{x})},

(2.8)

where

u_{n, ℓ} (1, \tilde{x})

corresponds to the constant function

φ \equiv 1

. In establishing the uniform consistency of

{\hat{r}}_{n, ℓ}^{(m)} (φ, \tilde{x}, {\bar{Λ}}_{n, ℓ} (\tilde{x}))

with respect to

r^{(m)} (φ, \tilde{x})

, an alternative and more suitable centering factor will be considered instead of the expectation

E [{\hat{r}}_{n, ℓ}^{(m)} (φ, \tilde{x}, {\bar{Λ}}_{n, ℓ} (\tilde{x}))]

, which may fail to exist or be difficult to compute. This alternative centering is defined as follows:

\hat{E} [{\hat{r}}_{n, ℓ}^{(m)} (φ, \tilde{x}, {\bar{Λ}}_{n, ℓ} (\tilde{x}))] = \frac{E [u_{n, ℓ} (φ, \tilde{x})]}{E [u_{n, ℓ} (1, \tilde{x})]} .

(2.9)

2.1. Hoeffding Decomposition for Symmetric Kernels

The following notation and facts will be used in the sequel. For a kernel L of

k \geq 1

variables we define

U_{n}^{(k)} (L) = \frac{(n - k)!}{n!} \sum_{i \in I (k, n)} L (X_{i_{1}}, \dots, X_{i_{k}}) .

Suppose that L is a function of

k \geq 1

variables, symmetric in its entries. Then, the Hoeffding projections (see [2,19]) with respect to

P

, for

1 \leq j \leq k

, are defined as

π_{j, k} L (x_{1}, \dots, x_{j}) = (Δ_{x_{1}} - P) \times \dots \times (Δ_{x_{j}} - P) \times P^{k - j} (L),

and

π_{0, k} L = E L (X_{1}, \dots, X_{k}),

where for measures

Q_{i}

on

X

we denote

Q_{1} \dots Q_{m} L = \int_{X^{m}} L (x_{1}, \dots, x_{m}) d Q_{1} (x_{1}) \dots d Q_{m} (x_{m}),

and

Δ_{x}

denotes the Dirac measure at point

x \in X

. Then, the decomposition of [2] gives the following orthogonal expansion:

U_{n}^{(k)} (L) - E L = \sum_{j = 1}^{k} (\binom{k}{j}) U_{n}^{(j)} (π_{j, k} L) .

For

L \in L_{2} (P^{k})

this denotes an orthogonal decomposition and

E (π_{j, k} L ∣ X_{2}, \dots, X_{j}) = 0

almost surely for

j \geq 1

; that is, the kernels

π_{j, k} L

are canonical (degenerate) for

P

. Also,

π_{j, k}, j \geq 1

, are nested projections, i.e.,

π_{j, k} \circ π_{j^{'}, k} = π_{j, k}

if

j \leq j^{'}

, and

E {(π_{j, k} L)}^{2} \leq E {(L - E L)}^{2} \leq E L^{2} .

For example, for

k \geq 1

,

π_{1, k} L (x) = E (L (X_{1}, \dots, X_{k}) ∣ X_{1} = x) - E L (X_{1}, \dots, X_{k}) .
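The decomposition can be checked by hand in a simple case. The sketch below takes k = 2 and the product kernel L(x, y) = xy, for which π_{1,2} L(x) = μ(x - μ) and π_{2,2} L(x, y) = (x - μ)(y - μ) with μ = E X, and verifies the identity U_n^{(2)}(L) - E L = 2 U_n^{(1)}(π_{1,2} L) + U_n^{(2)}(π_{2,2} L) on simulated data. The kernel, the uniform sampling distribution, and the helper name `u_stat` are choices made for this illustration only.

```python
import itertools
import random

def u_stat(kernel, data, k):
    """U_n^{(k)}: average of the kernel over all ordered k-tuples of distinct indices."""
    total, count = 0.0, 0
    for idx in itertools.permutations(range(len(data)), k):
        total += kernel(*(data[i] for i in idx))
        count += 1
    return total / count

rng = random.Random(1)
data = [rng.random() for _ in range(30)]   # X ~ Uniform(0, 1), so mu = 0.5
mu = 0.5

L = lambda x, y: x * y                     # illustrative symmetric kernel, E L = mu**2
pi1 = lambda x: mu * (x - mu)              # first Hoeffding projection of L
pi2 = lambda x, y: (x - mu) * (y - mu)     # second (canonical) projection of L

lhs = u_stat(L, data, 2) - mu ** 2
rhs = 2 * u_stat(pi1, data, 1) + u_stat(pi2, data, 2)
print(lhs, rhs)
```

For this kernel the identity is purely algebraic, so the two sides agree up to floating-point rounding for any sample.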

Remark 2.

The functions

G_{φ, \tilde{x}, ℓ}^{(miss)}

defined in (2.6) are not necessarily symmetric in their m arguments because the product kernel

\prod_{i = 1}^{m} K_{Λ_{n, ℓ} (x_{i})} (t_{i})

depends on the ordered m-tuple

(x_{1}, \dots, x_{m})

. When we need to symmetrize them, we define the averaged kernel:

{\bar{G}}_{φ, \tilde{x}, ℓ}^{(miss)} (\tilde{t}, \tilde{y}, \tilde{δ}) : = \frac{1}{m!} \sum_{σ \in S_{m}} G_{φ, \tilde{x}, ℓ}^{(miss)} ({\tilde{t}}_{σ}, {\tilde{y}}_{σ}, {\tilde{δ}}_{σ}) = \frac{1}{m!} \sum_{σ \in S_{m}} φ ({\tilde{y}}_{σ}) (\prod_{j = 1}^{m} δ_{σ_{j}}) {\tilde{K}}_{{\bar{Λ}}_{n, ℓ} ({\tilde{x}}_{σ})} ({\tilde{t}}_{σ}),

where

S_{m}

denotes the symmetric group on

\{1, \dots, m\}

,

{\tilde{t}}_{σ} = (t_{σ_{1}}, \dots, t_{σ_{m}})

,

{\tilde{y}}_{σ} = (y_{σ_{1}}, \dots, y_{σ_{m}})

,

{\tilde{δ}}_{σ} = (δ_{σ_{1}}, \dots, δ_{σ_{m}})

, and

{\tilde{x}}_{σ} = (x_{σ_{1}}, \dots, x_{σ_{m}})

. After symmetrization, the expectation

E [{\bar{G}}_{φ, \tilde{x}, ℓ}^{(miss)} (\tilde{t}, \tilde{y}, \tilde{δ})] = E [G_{φ, \tilde{x}, ℓ}^{(miss)} (\tilde{t}, \tilde{y}, \tilde{δ})],

and the U-statistic

u_{n, ℓ}^{(m)} (G_{φ, \tilde{x}, ℓ}^{(miss)}) = u_{n, ℓ}^{(m)} ({\bar{G}}_{φ, \tilde{x}, ℓ}^{(miss)}) : = u_{n, ℓ} (φ, \tilde{x})

remain unchanged. Consequently, we may assume without loss of generality that the kernel is symmetric when applying the Hoeffding decomposition.

Before presenting the conditions and primary results, we introduce notation that distinguishes between the complete-data design density and the effective complete-case design density induced by the MAR mechanism. For

a > 0

,

Γ (a) = \int_{0}^{\infty} t^{a - 1} e^{- t} d t

denotes the gamma function. For a differentiable function

h : R^{d} \to R

,

\nabla h (z)

denotes the d-dimensional column vector of first-order partial derivatives at

z

. Let

f = f_{X}

denote the marginal density of

X

with respect to Lebesgue measure on

X

. For

\tilde{x} = (x_{1}, \dots, x_{m}) \in X^{m}

, set

\tilde{f} (\tilde{x}) : = \prod_{j = 1}^{m} f (x_{j}) .

In the complete-data case, the natural density-weighted regression functional is

R (φ, \tilde{x}) : = \tilde{f} (\tilde{x}) r^{(m)} (φ, \tilde{x}) .

Under the MAR mechanism, however, the complete-case estimator is centered with respect to the effective complete-case density

g (x) : = p (x) f (x),

where

p (x) = P (δ = 1 ∣ X = x)

is the propensity score. We therefore define

\tilde{g} (\tilde{x}) : = \prod_{j = 1}^{m} g (x_{j}) = \prod_{j = 1}^{m} p (x_{j}) f (x_{j}),

and

R_{p} (φ, \tilde{x}) : = \tilde{g} (\tilde{x}) r^{(m)} (φ, \tilde{x}) .

Equivalently,

R_{p} (φ, \tilde{x}) = \{\prod_{j = 1}^{m} p (x_{j})\} R (φ, \tilde{x}) .

Thus, in all deterministic centering, bias, MSE, and bandwidth calculations for the complete-case MAR estimator, the complete-data density

\tilde{f}

must be replaced by the effective density

\tilde{g}

. Formulae involving

\tilde{f}

alone correspond either to the fully observed case, to the IPW formulation, or to the special case in which p is constant at the relevant order. The expression “

X \overset{D}{=} Y

” denotes that the random variable X has the same distribution as Y, while “a.s.” stands for “almost surely” with respect to

P

. Moreover,

∥ A ∥

represents the Frobenius norm of the matrix

A

, defined as

∥ A ∥ = {\{tr (A^{⊤} A)\}}^{1 / 2}

.
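The difference between centering with the complete-data density f and with the effective density g = p f can be made concrete numerically. The sketch below works in the simplest setting m = 1, d = 1 and evaluates the deterministic smoothed functional of Remark 1 by a midpoint rule, once with a = p (complete-case centering) and once with a ≡ 1 (IPW centering). It uses a Beta-type kernel, that is, the d = 1 instance of the Dirichlet kernel of Section 3. The regression function 2t, the uniform design, the propensity 0.3 + 0.6t, and the function names are all assumptions made for the example.

```python
import math

def beta_kernel(t, x, b):
    """Beta density with parameters (x/b + 1, (1 - x)/b + 1), evaluated at t in (0, 1)."""
    a, c = x / b + 1.0, (1.0 - x) / b + 1.0
    log_k = (math.lgamma(a + c) - math.lgamma(a) - math.lgamma(c)
             + (a - 1.0) * math.log(t) + (c - 1.0) * math.log(1.0 - t))
    return math.exp(log_k)

def smoothed_target(r, a, f, x, b, n_grid=20000):
    """Midpoint-rule value of int r*a*f*K dt / int a*f*K dt on (0, 1)."""
    num = den = 0.0
    for k in range(n_grid):
        t = (k + 0.5) / n_grid
        w = a(t) * f(t) * beta_kernel(t, x, b)
        num += r(t) * w
        den += w
    return num / den

r = lambda t: 2.0 * t          # assumed regression function
f = lambda t: 1.0              # assumed uniform design density
p = lambda t: 0.3 + 0.6 * t    # hypothetical propensity score

x, b = 0.3, 0.05
r_ipw = smoothed_target(r, lambda t: 1.0, f, x, b)   # a = 1
r_cc = smoothed_target(r, p, f, x, b)                # a = p
print(r_ipw, r_cc)
```

In this example the IPW centering equals 2(x + b)/(1 + 2b), the mean of the kernel multiplied by 2, while the complete-case centering is shifted further in the direction in which p increases, in line with the tilt term involving the gradient of log(p f).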

2.2. Algorithmic Summary of the Estimator Under MAR

For convenience, we summarize the computation of the complete-case conditional U-statistic estimator under the MAR mechanism. The procedure below applies to the three smoothing schemes considered in this paper, namely Dirichlet (

ℓ = 1

), Bernstein (

ℓ = 2

), and Beta or mixed kernels (

ℓ = 3

).

Algorithm 1 makes explicit the mapping from the inputs

({(X_{i}, Y_{i}, δ_{i})}_{i = 1}^{n}, φ, \tilde{x}, ℓ, {\bar{Λ}}_{n, ℓ} (\tilde{x}))

to the output

{\hat{r}}_{n, ℓ}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, ℓ} (\tilde{x})) .

The factor

\prod_{j = 1}^{m} δ_{i_{j}}

ensures that only complete m-tuples contribute to the estimator, in accordance with the complete-case approach under the MAR assumption.

Algorithm 1 Complete-case conditional U-statistic estimator under MAR

Require: Observations

{(X_{i}, Y_{i}, δ_{i})}_{i = 1}^{n}

, order

m \geq 1

,

\tilde{x} = (x_{1}, \dots, x_{m})

Ensure:

{\hat{r}}_{n, ℓ}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, ℓ} (\tilde{x}))

1: for $j = 1, \dots, m$ do
2:   Construct the kernel $K_{Λ_{n, ℓ} (x_{j})} (\cdot)$ associated with $x_{j}$
3: end for
4: Set $Num \leftarrow 0$ and $Den \leftarrow 0$
5: for each $i = (i_{1}, \dots, i_{m}) \in I (m, n)$ do
6:   Compute the weight

$W_{i} (\tilde{x}) : = (\prod_{j = 1}^{m} δ_{i_{j}}) (\prod_{j = 1}^{m} K_{Λ_{n, ℓ} (x_{j})} (X_{i_{j}}))$
7:   $Num \leftarrow Num + φ (Y_{i_{1}}, \dots, Y_{i_{m}}) W_{i} (\tilde{x})$
8:   $Den \leftarrow Den + W_{i} (\tilde{x})$
9: end for
10: if $Den > 0$ then return $Num / Den$
11: else return a default value, or leave the estimator undefined
12: end if
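The steps of Algorithm 1 can be sketched in a few lines of Python. This is a direct, brute-force transcription under assumptions of our own: the data are stored as a list of triples (X_i, Y_i, δ_i), the kernels are passed as a list of m callables (one per target point x_j), and the function name is hypothetical. The loop visits every ordered m-tuple of distinct indices, so the cost grows like n^m and the sketch is meant for small examples only.

```python
import itertools

def complete_case_u_estimator(data, phi, kernels):
    """Complete-case ratio Num / Den of Algorithm 1.

    data    : list of (X_i, Y_i, delta_i); Y_i may be None when delta_i == 0
    phi     : function of m responses
    kernels : list of m callables, kernels[j](X) playing the role of K_{Lambda(x_j)}(X)
    Returns None when Den == 0 (estimator left undefined).
    """
    m = len(kernels)
    num = den = 0.0
    for idx in itertools.permutations(range(len(data)), m):
        # The product of the delta's vanishes unless every response is observed;
        # skipping here also avoids evaluating phi on missing responses.
        if any(data[i][2] == 0 for i in idx):
            continue
        weight = 1.0
        for kernel, i in zip(kernels, idx):
            weight *= kernel(data[i][0])
        num += phi(*(data[i][1] for i in idx)) * weight
        den += weight
    return num / den if den > 0 else None

# Toy usage with a constant kernel: for m = 2 and phi(y1, y2) = (y1 - y2)^2 / 2
# the estimator reduces to the sample variance of the observed responses.
toy = [(0.1, 1.0, 1), (0.2, None, 0), (0.3, 2.0, 1), (0.4, 4.0, 1)]
flat = lambda x: 1.0
print(complete_case_u_estimator(toy, lambda y1, y2: 0.5 * (y1 - y2) ** 2, [flat, flat]))
```

Any of the kernels considered in the paper can be plugged in through the `kernels` argument; the constant kernel in the usage lines is only a sanity check.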

Remark 3.

While Bouzebda et al. [159] addressed the complete-data case, the present paper goes considerably further by treating the Missing-at-Random framework, which introduces a genuinely nontrivial layer of analytical complexity. Indeed, once missingness is allowed, the asymptotic study can no longer be transferred mechanically from the complete-data setting: the complete-case structure, the presence of the propensity score, and the nonlinear nature of conditional U-statistics require new arguments at both the probabilistic and statistical levels. In this sense, the current contribution is not simply an extension of [159], but a substantial strengthening and broadening of that earlier work. Moreover, the paper is accompanied by a markedly richer and more complete numerical study, offering a deeper understanding of the finite-sample performance of the proposed procedures.

2.3. Conditions and Comments

(C.0)

The propensity score

p (x) = P (δ = 1 ∣ X = x)

satisfies:

$p : X \to (0, 1]$ is continuous on $X$ ;
there exists a constant $c_{p} > 0$ such that

$inf_{x \in X} p (x) \geq c_{p} > 0;$
p is twice continuously differentiable on $Int (X)$ .

Moreover, the missingness mechanism satisfies the MAR condition

P (δ = 1 ∣ X = x, Y) = P (δ = 1 ∣ X = x) = p (x) .

(C.1)

Let

g (x) : = p (x) f (x), \tilde{g} (\tilde{x}) : = \prod_{j = 1}^{m} g (x_{j}) = \prod_{j = 1}^{m} p (x_{j}) f (x_{j}),

and define the MAR complete-case density-weighted functional

R_{p} (φ, \tilde{x}) : = \tilde{g} (\tilde{x}) r^{(m)} (φ, \tilde{x}) .

The function

R_{p} (φ, \cdot)

is Lipschitz continuous on

X^{m}

; that is, there exists a constant

L_{R, p} > 0

such that, for all

\tilde{x}, {\tilde{x}}^{'} \in X^{m}

,

|R_{p} (φ, \tilde{x}) - R_{p} (φ, {\tilde{x}}^{'})| \leq L_{R, p} {∥ \tilde{x} - {\tilde{x}}^{'} ∥}_{2} .

Equivalently,

R_{p} (φ, \tilde{x}) = \{\prod_{j = 1}^{m} p (x_{j})\} R (φ, \tilde{x}) .

Thus, when

p \equiv 1

, condition (C.1) reduces to the corresponding complete-data Lipschitz condition on

R (φ, \cdot)

.

(C.2)

The effective complete-case density

\tilde{g}

and the MAR density-weighted functional

R_{p} (φ, \cdot)

admit continuous second-order partial derivatives on

Int (X^{m})

; that is,

\tilde{g} \in C^{2} (Int (X^{m})), R_{p} (φ, \cdot) \in C^{2} (Int (X^{m})) .

Equivalently, since

g = p f

, condition (C.2) requires second-order regularity of the effective complete-case design density rather than merely of the original design density. Under (C.0), it is implied by the corresponding

C^{2}

-regularity of

\tilde{f}

,

R (φ, \cdot)

, and p.

(C.3)

There exist constants

γ > 0

and

C_{1, p} \in [1, \infty)

such that

E [| φ (\tilde{Y}) |^{2 + γ}] < \infty

and

sup_{\tilde{x} \in Int (X^{m})} E (| φ (\tilde{Y}) |^{2 + γ} | \tilde{X} = \tilde{x}) \tilde{g} (\tilde{x}) \leq C_{1, p} .

(2.10)

Equivalently,

sup_{\tilde{x} \in Int (X^{m})} E (| φ (\tilde{Y}) |^{2 + γ} | \tilde{X} = \tilde{x}) \prod_{j = 1}^{m} p (x_{j}) f (x_{j}) \leq C_{1, p} .

Because

0 < p \leq 1

, the complete-data condition formulated with

\tilde{f}

implies (C.3). Conversely, since

p \geq c_{p} > 0

, the condition with

\tilde{g}

is equivalent to the corresponding condition with

\tilde{f}

, up to multiplicative constants. The formulation above is the natural one for the complete-case estimator, because its augmented kernel contains the factor

\prod_{j = 1}^{m} δ_{i_{j}}

.

2.4. Comments

We first record a few consequences and interpretations of the preceding assumptions. Since

\tilde{g} \in C^{2} (Int (X^{m}))

by (C.2), and since the analysis is carried out on compact subsets of the interior, there exists a constant

C_{0} \in [1, \infty)

such that

sup_{\tilde{x} \in C} \tilde{g} (\tilde{x}) \leq C_{0}, C ⋐ Int (X^{m}) .

(2.11)

If

X^{m}

is compact and

\tilde{g}

admits a continuous extension to

X^{m}

, then the same bound holds with

C = X^{m}

. In view of the positivity condition in (C.0), this is equivalent, up to multiplicative constants, to the corresponding boundedness of

\tilde{f}

. Nevertheless, for the complete-case MAR estimator, the natural quantity is

\tilde{g}

, not

\tilde{f}

, because the deterministic centering is taken with respect to the effective complete-case design density

g = p f

. The uniform conditional moment condition (2.10) in (C.3) should be understood in the same complete-case sense. Namely,

E (| φ (\tilde{Y}) |^{2 + γ} | \tilde{X} = \tilde{x})

is allowed to grow near boundary regions or low-density regions, but only at a rate controlled by the inverse of the effective complete-case density

\tilde{g} (\tilde{x}) = \prod_{j = 1}^{m} p (x_{j}) f (x_{j}) .

Equivalently, the admissible growth is no faster than

{\{\prod_{j = 1}^{m} p (x_{j}) f (x_{j})\}}^{- 1} .

Because p is bounded away from zero by (C.0), this condition is equivalent to the corresponding complete-data formulation involving

\tilde{f}

, up to fixed constants. The formulation with

\tilde{g}

, however, is the appropriate one for the estimator actually studied in this paper, since its augmented kernel contains the complete-case factor

\prod_{j = 1}^{m} δ_{i_{j}}

. Conditions of this type are standard in nonparametric smoothing with possibly unbounded responses; see, for instance, Assumption 2 of [160] and Assumption A3 of [106].

Condition (C.3) is used in the truncation step of the proof. More precisely, it provides a uniform envelope control for the localized U-process after weighting by the effective complete-case density. This is needed to separate the contribution of large values of

φ (\tilde{Y})

from the main stochastic term and to obtain the stated uniform rates. As in [68,161], the polynomial moment condition in (C.3) may be replaced by a more general Orlicz-type integrability assumption.

(C.3)″: Let $M : [0, \infty) \to [0, \infty)$ be a continuous nondecreasing function such that, for some $s > 2$ , as $x \to \infty$ ,

$(i) x^{- s} M (x) ↑, (ii) x^{- 1} \log M (x) ↓ .$

(2.12)

For $t \geq M (0)$ , let $M^{inv} (t) \geq 0$ be defined by

$M (M^{inv} (t)) = t .$

Assume that

$E [M (| φ (\tilde{Y}) |)] < \infty .$

When a localized uniform version is required, this assumption is strengthened to

$sup_{\tilde{x} \in Int (X^{m})} E [M (| φ (\tilde{Y}) |) | \tilde{X} = \tilde{x}] \tilde{g} (\tilde{x}) < \infty .$

Two particularly useful choices are

M (x) = x^{p}, p > 2,

which recovers a polynomial moment assumption (here the exponent p is unrelated to the propensity score), and

M (x) = \exp (s x), s > 0,

which corresponds to exponential-type integrability. These alternatives lead to the same truncation strategy, with the truncation level calibrated through

M^{inv}

.

3. Conditional $U$ -Statistic Estimators Based on Dirichlet Kernels

In this section, we take

X = S_{d, 1}

, where

S_{d, 1} : = \{x \in {[0, 1]}^{d} : {∥ x ∥}_{1} \leq 1\},

and

Int (S_{d, 1}) = \{x \in {(0, 1)}^{d} : {∥ x ∥}_{1} < 1\},

where

{∥ x ∥}_{1} : = \sum_{i = 1}^{d} |x_{i}|

and

d \in N^{*}

. Accordingly,

Int (S_{d, 1}^{m}) = \{\tilde{x} = (x_{1}, \dots, x_{m}) \in {(S_{d, 1})}^{m} : x_{j} \in Int (S_{d, 1}), 1 \leq j \leq m\} .

For

α_{1}, \dots, α_{d}, β > 0

, the density of the

Dirichlet (α, β)

distribution with respect to the Lebesgue measure on

R^{d}

restricted to

S_{d, 1}

is

K_{α, β} (x) : = \frac{Γ ({∥ α ∥}_{1} + β)}{Γ (β) \prod_{i = 1}^{d} Γ (α_{i})} \cdot {(1 - {∥ x ∥}_{1})}^{β - 1} \prod_{i = 1}^{d} x_{i}^{α_{i} - 1}, x \in S_{d, 1} .

We refer to Chapter 49 of [162,163]. A key feature of the Aitchison-Lauder proposal is that the shape of the kernel

K_{α, β} (\cdot)

changes with the position

x

within the simplex. This adaptation mitigates the boundary bias that affects conventional estimators, whose kernel shape is the same at every point. Throughout this section, for each

j = 1, \dots, m

, we set

Λ_{n, 1} (x_{j}) = (α_{j}, β_{j}) : = (\frac{x_{j}}{\overset{˘}{b}} + 1, \frac{1 - ∥ x_{j} ∥_{1}}{\overset{˘}{b}} + 1), for x_{j} \in S_{d, 1}, \overset{˘}{b} > 0 .

(3.1)
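A minimal sketch of the Dirichlet kernel with the parametrization (3.1) is given below. The function names are ours, and the density is evaluated on the logarithmic scale through `math.lgamma` to avoid overflow when the bandwidth is small. One elementary consequence of (3.1) is that the mode of the kernel sits exactly at the target point: the mode of a Dirichlet density with all parameters at least 1 has coordinates (α_i - 1) / (‖α‖_1 + β - d - 1), which equals x_i under (3.1).

```python
import math

def dirichlet_params(x, b):
    """Parameters (alpha, beta) of (3.1): alpha = x / b + 1, beta = (1 - ||x||_1) / b + 1."""
    alpha = [xi / b + 1.0 for xi in x]
    beta = (1.0 - sum(x)) / b + 1.0
    return alpha, beta

def dirichlet_kernel(t, alpha, beta):
    """Dirichlet(alpha, beta) density at t; set to 0 outside the open simplex
    (the boundary is a Lebesgue-null set, so this convention is harmless)."""
    rest = 1.0 - sum(t)
    if rest <= 0.0 or any(ti <= 0.0 for ti in t):
        return 0.0
    log_norm = (math.lgamma(sum(alpha) + beta) - math.lgamma(beta)
                - sum(math.lgamma(a) for a in alpha))
    log_k = (log_norm + (beta - 1.0) * math.log(rest)
             + sum((a - 1.0) * math.log(ti) for a, ti in zip(alpha, t)))
    return math.exp(log_k)

# d = 1 illustration: the kernel attached to x = 0.3 with bandwidth 0.05.
alpha, beta = dirichlet_params([0.3], 0.05)
grid = [(k + 0.5) / 1000 for k in range(1000)]
values = [dirichlet_kernel([t], alpha, beta) for t in grid]
print(grid[values.index(max(values))])   # grid point where the kernel is largest
```

Because the parameters depend on x, the kernel is rebuilt for each target point, which is what lets its shape adapt near the boundary of the simplex.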

The bandwidth parameter

\overset{˘}{b}

, denoted as

\overset{˘}{b} (n)

, depends on the sample size n. Substituting (3.1) into (2.2) gives the conditional U-statistic regression estimator based on the Dirichlet kernel:

\begin{matrix} {\tilde{r}}_{n, 1}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, 1} (\tilde{x})) = \frac{\sum_{(i_{1}, \dots, i_{m}) \in I (m, n)} φ (Y_{i_{1}}, \dots, Y_{i_{m}}) \prod_{j = 1}^{m} K_{(α_{j}, β_{j})} (X_{i_{j}})}{\sum_{(i_{1}, \dots, i_{m}) \in I (m, n)} \prod_{j = 1}^{m} K_{(α_{j}, β_{j})} (X_{i_{j}})} . \end{matrix}

(3.2)

Under the MAR assumption (2.4), the complete-case estimator extending (3.2) to missing responses is given by

\begin{matrix} {\hat{r}}_{n, 1}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, 1} (\tilde{x})) = \frac{\sum_{(i_{1}, \dots, i_{m}) \in I (m, n)} φ (Y_{i_{1}}, \dots, Y_{i_{m}}) (\prod_{j = 1}^{m} δ_{i_{j}}) \prod_{j = 1}^{m} K_{(α_{j}, β_{j})} (X_{i_{j}})}{\sum_{(i_{1}, \dots, i_{m}) \in I (m, n)} (\prod_{j = 1}^{m} δ_{i_{j}}) \prod_{j = 1}^{m} K_{(α_{j}, β_{j})} (X_{i_{j}})} . \end{matrix}

(3.3)

For this specific kernel, the augmented kernel defined in (2.6) becomes

G_{φ, \tilde{x}, 1}^{(Dir)} (\tilde{t}, \tilde{y}, \tilde{δ}) = φ (\tilde{y}) (\prod_{j = 1}^{m} δ_{j}) \prod_{j = 1}^{m} K_{(α_{j}, β_{j})} (t_{j}), (\tilde{t}, \tilde{y}, \tilde{δ}) \in S_{d, 1}^{m} \times R^{q m} \times \{0, 1\}^{m},

(3.4)

where

(α_{j}, β_{j})

are given by (3.1) with

x_{j}

being the j-th component of

\tilde{x}

. The corresponding U-statistic is

u_{n, 1}^{(Dir)} (φ, \tilde{x}) : = \frac{(n - m)!}{n!} \sum_{i \in I (m, n)} G_{φ, \tilde{x}, 1}^{(Dir)} ({\tilde{X}}_{i}, {\tilde{Y}}_{i}, {\tilde{δ}}_{i}),

(3.5)

and the ratio representation (2.8) holds analogously:

{\hat{r}}_{n, 1}^{(m)} (φ, \tilde{x}, {\bar{Λ}}_{n, 1} (\tilde{x})) = \frac{u_{n, 1}^{(Dir)} (φ, \tilde{x})}{u_{n, 1}^{(Dir)} (1, \tilde{x})} .

(3.6)

We first consider the regression case

m = 1

, which serves as a building block for the higher-order analysis. These findings are essential for examining the estimators outlined in (3.2).

3.1. Nonparametric Regression Estimation

Let us consider the following quantities:

{\hat{g}}_{n} (φ, x, Λ_{n, 1}) : = \frac{1}{n} \sum_{i = 1}^{n} φ (Y_{i}) K_{(α, β)} (X_{i}),

and

{\hat{f}}_{n} (x, Λ_{n, 1}) : = \frac{1}{n} \sum_{i = 1}^{n} K_{(α, β)} (X_{i}) .

In this section, we establish uniform strong consistency of the following regression estimator defined by

{\hat{r}}_{n, 1}^{(1)} (φ, x) = \frac{{\hat{g}}_{n} (φ, x, Λ_{n, 1})}{{\hat{f}}_{n} (x, Λ_{n, 1})} .

(3.7)

Under the MAR assumption (2.4), the complete-case versions of

{\hat{g}}_{n}

and

{\hat{f}}_{n}

incorporating the missingness indicators are defined as

{\hat{g}}_{n}^{(miss)} (φ, x, Λ_{n, 1}) : = \frac{1}{n} \sum_{i = 1}^{n} φ (Y_{i}) δ_{i} K_{(α, β)} (X_{i}),

(3.8)

and

{\hat{f}}_{n}^{(miss)} (x, Λ_{n, 1}) : = \frac{1}{n} \sum_{i = 1}^{n} δ_{i} K_{(α, β)} (X_{i}) .

(3.9)

The corresponding regression estimator for missing responses is then given by

{\hat{r}}_{n, 1}^{(1), (miss)} (φ, x) = \frac{{\hat{g}}_{n}^{(miss)} (φ, x, Λ_{n, 1})}{{\hat{f}}_{n}^{(miss)} (x, Λ_{n, 1})} .

(3.10)
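For m = 1 and d = 1 the quantities (3.8), (3.9) and (3.10) can be computed directly. The sketch below does so on simulated data. The data-generating model (uniform design, regression function 2x, Gaussian noise, propensity 0.3 + 0.6x), the bandwidth, and the function names are hypothetical choices for illustration; unobserved responses are stored as `None` and never evaluated.

```python
import math
import random

def beta_kernel(t, x, b):
    """d = 1 case of K_{(alpha, beta)} with (alpha, beta) given by (3.1)."""
    if not 0.0 < t < 1.0:
        return 0.0
    a, c = x / b + 1.0, (1.0 - x) / b + 1.0
    log_k = (math.lgamma(a + c) - math.lgamma(a) - math.lgamma(c)
             + (a - 1.0) * math.log(t) + (c - 1.0) * math.log(1.0 - t))
    return math.exp(log_k)

def regression_cc(sample, phi, x, b):
    """Complete-case ratio (3.10) of the smoothed sums (3.8) and (3.9)."""
    n = len(sample)
    g_hat = sum(phi(y) * beta_kernel(xi, x, b) for xi, y, d in sample if d == 1) / n
    f_hat = sum(beta_kernel(xi, x, b) for xi, _, d in sample if d == 1) / n
    return g_hat / f_hat if f_hat > 0 else None

rng = random.Random(2)
sample = []
for _ in range(5000):
    xi = rng.random()
    observed = rng.random() < 0.3 + 0.6 * xi          # MAR: depends on xi only
    yi = 2.0 * xi + rng.gauss(0.0, 0.5) if observed else None
    sample.append((xi, yi, 1 if observed else 0))

estimate = regression_cc(sample, lambda y: y, 0.5, 0.01)
print(estimate)   # compare with the assumed truth E[Y | X = 0.5] = 1
```

The factor 1/n cancels in the ratio; it is kept only so that `g_hat` and `f_hat` match the displayed definitions.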

Finally, we represent the expectation of

{\hat{g}}_{n}^{(miss)} (φ, x, Λ_{n, 1})

as:

E [{\hat{g}}_{n}^{(miss)} (φ, x, Λ_{n, 1})] = E [φ (Y) δ K_{(α, β)} (X)] = \int_{S_{d, 1}} r^{(1)} (φ, u) f (u) p (u) K_{(α, β)} (u) d u,

where

p (u)

is the propensity score defined in (2.4). Alternatively, notice that if

ξ_{x} \sim Dirichlet (α, β)

, then we also have the representation

E [{\hat{g}}_{n}^{(miss)} (φ, x, Λ_{n, 1})] = E [R^{(1)} (φ, ξ_{x}) p (ξ_{x})],

where

R^{(1)} (φ, x) = f (x) r^{(1)} (φ, x)

. To derive uniform consistency results, we adopt the following approach

\begin{matrix} {\hat{r}}_{n, 1}^{(1), (miss)} (φ, x) - r^{(1)} (φ, x) \\ = \frac{1}{{\hat{f}}_{n}^{(miss)} (x, Λ_{n, 1})} ({\hat{g}}_{n}^{(miss)} (φ, x, Λ_{n, 1}) - E [{\hat{g}}_{n}^{(miss)} (φ, x, Λ_{n, 1})]) \\ - \frac{E [{\hat{g}}_{n}^{(miss)} (φ, x, Λ_{n, 1})]}{{\hat{f}}_{n}^{(miss)} (x, Λ_{n, 1}) E [{\hat{f}}_{n}^{(miss)} (x, Λ_{n, 1})]} ({\hat{f}}_{n}^{(miss)} (x, Λ_{n, 1}) - E [{\hat{f}}_{n}^{(miss)} (x, Λ_{n, 1})]) \\ - (E (φ (Y) | X = x) - \frac{E [{\hat{g}}_{n}^{(miss)} (φ, x, Λ_{n, 1})]}{E [{\hat{f}}_{n}^{(miss)} (x, Λ_{n, 1})]}) . \end{matrix}

For

δ > 0

, define

\begin{matrix} S_{d, 1} (δ) : = \{x \in S_{d, 1} : 1 - {∥ x ∥}_{1} \geq δ and x_{i} \geq δ, \forall i = 1, \dots, d\} . \end{matrix}

(3.11)

To the best of our knowledge, the following result has not been established for Dirichlet-kernel estimators in the presence of missing responses under MAR.

Theorem 1.

Assume that the conditions (C.0), (C.1) and (C.3) hold. Under the MAR assumption (2.4) and the positivity condition

{inf}_{x \in S_{d, 1} (\overset{˘}{b} d)} p (x) \geq c > 0

, if, in addition,

{\overset{˘}{b}}^{- d} \leq n

as

n \to \infty

, we have, as

n \to \infty

,

sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |{\hat{r}}_{n, 1}^{(1), (miss)} (φ, x) - r^{(1)} (φ, x)| = O ({\overset{˘}{b}}^{1 / 2}) + O (\frac{| \log \overset{˘}{b} | {(\log n)}^{3 / 2}}{{\overset{˘}{b}}^{d + 1 / 2} \sqrt{n}}), a.s.

(3.12)

where

{\hat{r}}_{n, 1}^{(1), (miss)} (φ, x)

is defined in (3.10). In particular, if

| \log \overset{˘}{b} |^{2} {\overset{˘}{b}}^{- 2 d - 1} = o (n / {(\log n)}^{3})

as

n \to \infty

, then

sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |{\hat{r}}_{n, 1}^{(1), (miss)} (φ, x) - r^{(1)} (φ, x)| \to 0, a.s.

3.2. Uniform Convergence of Conditional U-Statistics Under Missing Data

In this section, we establish uniform almost sure consistency results for conditional U-statistics in the presence of missing responses. Below, we state the uniform consistency of conditional U-statistics when the function

φ (\cdot)

is not necessarily bounded, under the MAR assumption.

Theorem 2.

If Assumptions (C.2) and (C.3) hold, and under the MAR assumption (2.4) with

{inf}_{x \in S_{d, 1}^{m}} p (x) \geq c > 0

, then, as

n \to \infty,

sup_{\tilde{x} \in S_{d, 1}^{m}} |u_{n, 1}^{(Dir)} (φ, \tilde{x}) - E [u_{n, 1}^{(Dir)} (φ, \tilde{x})]| = O (\frac{| \log \overset{˘}{b} |^{m} {(\log n)}^{3 / 2}}{{\overset{˘}{b}}^{m (d + 1 / 2)} \sqrt{n}}), a.s.

(3.13)

where

u_{n, 1}^{(Dir)} (φ, \tilde{x})

is defined in (3.5).

Theorem 3.

If Assumptions (C.2) and (C.3) hold, and under the MAR assumption (2.4) with

{inf}_{x \in S_{d, 1}^{m}} p (x) \geq c > 0

, then, as

n \to \infty,

sup_{\tilde{x} \in S_{d, 1}^{m}} |{\hat{r}}_{n, 1}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 1} (\tilde{x})) - \hat{E} [{\hat{r}}_{n, 1}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 1} (\tilde{x}))]| = O (\frac{| \log \overset{˘}{b} |^{m} {(\log n)}^{3 / 2}}{{\overset{˘}{b}}^{m (d + 1 / 2)} \sqrt{n}}), a.s.

(3.14)

where

{\hat{r}}_{n, 1}^{(m), (miss)} (\cdot)

is defined in (3.3) and

\hat{E} [\cdot]

is the centering operator defined in (2.9).

Theorem 4.

Assume that (C.0) and (C.2) hold. Under the MAR assumption (2.4), we have

sup_{\tilde{x} \in S_{d, 1}^{m}} |r^{(m)} (φ, \tilde{x}) - \hat{E} [{\hat{r}}_{n, 1}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 1} (\tilde{x}))]| = O ({\overset{˘}{b}}^{1 / 2}) .

Corollary 1.

Under the assumptions of Theorems 3 and 4, together with the MAR assumption (2.4) and the positivity condition, we have, as

n \to \infty,

sup_{\tilde{x} \in S_{d, 1}^{m}} |{\hat{r}}_{n, 1}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 1} (\tilde{x})) - r^{(m)} (φ, \tilde{x})| = O ({\overset{˘}{b}}^{1 / 2}) + O (\frac{| \log \overset{˘}{b} |^{m} {(\log n)}^{3 / 2}}{{\overset{˘}{b}}^{m (d + 1 / 2)} \sqrt{n}}), a.s.

(3.15)
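To get a feeling for the rate in (3.15), one may balance its two terms heuristically. The computation below ignores all logarithmic factors and is only a back-of-the-envelope reading of the displayed bound, not a statement proved in the paper; the symbol \breve{b} stands for the bandwidth of (3.1).

```latex
% Heuristic balance of the two terms in (3.15), logarithmic factors ignored.
\breve{b}^{1/2} \asymp \frac{1}{\breve{b}^{\,m(d+1/2)}\sqrt{n}}
\quad\Longleftrightarrow\quad
\breve{b}^{\,m(d+1/2)+1/2} \asymp n^{-1/2}
\quad\Longleftrightarrow\quad
\breve{b} \asymp n^{-1/\{m(2d+1)+1\}} .
% Resulting order of the bound (up to logarithms):
\breve{b}^{1/2} \asymp n^{-1/[2\{m(2d+1)+1\}]} ,
\qquad\text{e.g. } m = 1:\quad \breve{b} \asymp n^{-1/(2d+2)},\quad
\breve{b}^{1/2} \asymp n^{-1/\{4(d+1)\}} .
```

For m = 1 this choice is compatible with the bandwidth conditions of Theorem 1, since the exponent (2d + 1)/(2d + 2) of n in the squared stochastic term is strictly smaller than one.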

3.3. Limiting Distribution Under Missing Data

Within this section, we establish the central limit theorem for the estimator defined in (3.3) under the MAR assumption. To achieve this, we rely on the following set of assumptions:

(A.1): Let $\tilde{x} = (x_{1}, \dots, x_{m})$ be a point of continuity for each

$\begin{matrix} r_{j l} (\tilde{x}) = \{\begin{matrix} 0 & if x_{j} \neq x_{l}, \\ E_{j, l} (\tilde{x}) & if x_{j} = x_{l}, \end{matrix} \end{matrix}$

where

$\begin{matrix} E_{j, l} (\tilde{x}) = E [φ (Y_{1}, \dots, Y_{j - 1}, Y, Y_{j + 1}, \dots, Y_{m}) φ (Y_{m + 1}, \dots, Y_{m + l - 1}, Y, Y_{m + l + 1}, \dots, Y_{2 m}) \\ ∣ X_{i} = x_{i} for i \neq j, X_{m + r} = x_{r} for r \neq l and X = x_{j} = x_{l}]; \end{matrix}$
(A.2): The density function $f (\cdot)$ is continuous at each $x_{j}$ , $1 \leq j \leq m,$ with $f (x_{j}) > 0$ ;
(A.3): $r_{j, l, s} (\cdot, \cdot, \cdot)$ is bounded in a neighborhood of $(\tilde{x}, \tilde{x}, \tilde{x}) \in S_{d, 1}^{3 m}$ , for all $1 \leq j, l, s \leq m$ , where:

$\begin{matrix} r_{j, l, s} ({\tilde{z}}_{m}, {\tilde{z}}_{2 m}, {\tilde{z}}_{3 m}) & = & E [φ (Y_{1}, \dots, Y_{j - 1}, Y, Y_{j + 1} \dots, Y_{m}) \\ \times φ (Y_{m + 1}, \dots, Y_{m + j - 1}, Y, Y_{m + j + 1} \dots, Y_{2 m}) \\ \times φ (Y_{2 m + 1}, \dots, Y_{2 m + j - 1}, Y, Y_{2 m + j + 1} \dots, Y_{3 m}) \\ ∣ X_{i} = z_{i}; 1 \leq i \leq 3 m, i \neq j, m + 1, 2 m + s, X = z], \end{matrix}$

and for $1 \leq s \leq 3$ , ${\tilde{z}}_{s m} = (z_{(s - 1) m + 1}, \dots, z_{(s - 1) m + j - 1}, z, z_{(s - 1) m + j + 1}, \dots, z_{s m})$ ;
(A.4): $r_{1, 2}^{(m)} (\cdot, \cdot)$ is bounded in a neighborhood of $(\tilde{x}, \tilde{x})$ , where

$r_{1, 2}^{(m)} ({\tilde{x}}_{1}, {\tilde{x}}_{2}) = E [φ (Y_{i_{1}}, \dots, Y_{i_{m}}) φ (Y_{j_{1}}, \dots, Y_{j_{m}}) ∣ (X_{i_{1}}, \dots, X_{i_{m}}) = {\tilde{x}}_{1}, (X_{j_{1}}, \dots, X_{j_{m}}) = {\tilde{x}}_{2}];$
(A.5): Let $r^{(m)} (φ, \cdot)$ admit an expansion

$r^{(m)} (φ, t + Δ) = r^{(m)} (φ, t) + {\{\frac{\partial}{\partial t} r^{(m)} (φ, t)\}}^{⊤} Δ + \frac{1}{2} Δ^{⊤} \{\frac{\partial^{2}}{\partial t^{2}} r^{(m)} (φ, t)\} Δ + o (Δ^{⊤} Δ),$

as $Δ \to 0$ , for all $t$ in a neighborhood of $\tilde{x}$ .

Below, we write

Z \overset{D}{=} N (μ, σ^{2})

whenever the random variable Z is Gaussian with mean

μ

and variance

σ^{2}

, and

\overset{D}{⟶}

denotes convergence in distribution. We also denote

U_{n, 1}^{(miss)} (φ, \tilde{x}) = \frac{u_{n, 1}^{(Dir)} (φ, \tilde{x})}{N^{(miss)}},

where

N^{(miss)} : = \prod_{j = 1}^{m} E [δ K_{(α, β)} (X)] = \prod_{j = 1}^{m} E [p (X) K_{(α, β)} (X)],

by the MAR assumption (2.4). Note that

N^{(miss)}

incorporates the propensity score

p (\cdot)

. For fixed

\tilde{x} \in Int (S_{d, 1}^{m})

, let

Σ = Σ (\tilde{x}) : = (\begin{matrix} σ_{11} & σ_{12} \\ σ_{12} & σ_{22} \end{matrix})

denote the asymptotic covariance matrix of the vector

\sqrt{n {\overset{˘}{b}}^{d / 2}} (\begin{matrix} U_{n, 1}^{(miss)} (φ, \tilde{x}) - E [U_{n, 1}^{(miss)} (φ, \tilde{x})] \\ U_{n, 1}^{(miss)} (1, \tilde{x}) - E [U_{n, 1}^{(miss)} (1, \tilde{x})] \end{matrix}),

that is,

\begin{matrix} σ_{11} & : = lim_{n \to \infty} n {\overset{˘}{b}}^{d / 2} Var (U_{n, 1}^{(miss)} (φ, \tilde{x})), \end{matrix}

(3.16)

\begin{matrix} σ_{12} & : = lim_{n \to \infty} n {\overset{˘}{b}}^{d / 2} Cov (U_{n, 1}^{(miss)} (φ, \tilde{x}), U_{n, 1}^{(miss)} (1, \tilde{x})), \end{matrix}

(3.17)

\begin{matrix} σ_{22} & : = lim_{n \to \infty} n {\overset{˘}{b}}^{d / 2} Var (U_{n, 1}^{(miss)} (1, \tilde{x})) . \end{matrix}

(3.18)

Theorem 5.

Under assumptions (A.1)–(A.4), (C.0), and (C.2), the MAR assumption (2.4), the positivity condition

inf_{x \in S_{d, 1}} p (x) \geq c > 0,

and if

r^{(m)} (φ, \cdot)

is continuous at

\tilde{x} \in Int (S_{d, 1}^{m})

, then

\sqrt{n {\overset{˘}{b}}^{d / 2}} ({\hat{r}}_{n, 1}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 1} (\tilde{x})) - \hat{E} [{\hat{r}}_{n, 1}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 1} (\tilde{x}))]) \overset{D}{⟶} N (0, ρ_{miss}^{2}),

where

ρ_{miss}^{2} = \frac{σ_{11}}{{\{E [U_{n, 1}^{(miss)} (1, \tilde{x})]\}}^{2}} - 2 \frac{E [U_{n, 1}^{(miss)} (φ, \tilde{x})]}{{\{E [U_{n, 1}^{(miss)} (1, \tilde{x})]\}}^{3}} σ_{12} + \frac{{\{E [U_{n, 1}^{(miss)} (φ, \tilde{x})]\}}^{2}}{{\{E [U_{n, 1}^{(miss)} (1, \tilde{x})]\}}^{4}} σ_{22} .

(3.19)

Moreover, if

\sqrt{n {\overset{˘}{b}}^{d / 2}} (U_{n, 1}^{(miss)} (1, \tilde{x}) - 1) \overset{P}{\to} 0,

then

σ_{12} = 0

and

σ_{22} = 0

, and hence (3.19) reduces to

ρ_{miss}^{2} = \sum_{i = 1}^{m} \sum_{j = 1}^{m} 1_{{x_{i} = x_{j}}} \frac{r_{i j} (\tilde{x})}{p (x_{i}) f (x_{i})} \int K_{(α_{i}, β_{i})}^{2} (u) d u .

(3.20)

The proof of Theorem 5 is postponed until Section 11.
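For readers who wish to evaluate (3.19) numerically, a minimal Python sketch of the delta-method variance of a ratio is given below. The function names and argument names are assumptions introduced for illustration; `mean_num` and `mean_den` stand for the expectations of the numerator and denominator U-statistics, and the second function recomputes the same quantity as a quadratic form in the gradient of (a, b) ↦ a/b as a cross-check.

```python
def rho_miss_sq(sigma11, sigma12, sigma22, mean_num, mean_den):
    """Delta-method variance (3.19) of the ratio numerator / denominator."""
    if mean_den == 0:
        raise ValueError("denominator mean must be nonzero")
    return (sigma11 / mean_den**2
            - 2.0 * mean_num / mean_den**3 * sigma12
            + mean_num**2 / mean_den**4 * sigma22)

def rho_via_gradient(sigma11, sigma12, sigma22, mean_num, mean_den):
    """Same quantity as g' Sigma g, with g the gradient of (a, b) -> a / b."""
    g1 = 1.0 / mean_den
    g2 = -mean_num / mean_den**2
    return g1 * g1 * sigma11 + 2.0 * g1 * g2 * sigma12 + g2 * g2 * sigma22
```

When `sigma12 = sigma22 = 0` and `mean_den = 1`, both functions return `sigma11`, which is the situation in which (3.19) reduces to (3.20).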

Remark 4.

The simplified expression (3.20) is valid when the denominator

U_{n, 1}^{(miss)} (1, \tilde{x})

converges to a constant (i.e., when its asymptotic variance

σ_{22} = 0

). This holds under the same regularity conditions because

u_{n, 1}^{(Dir)} (1, \tilde{x})

converges to a deterministic limit. In the general case where

σ_{22} \neq 0

, the full delta-method formula must be used.

The following corollary follows directly from Theorem 5.

Corollary 2.

If, in addition to the assumptions of Theorem 5, (A.5) holds, then under the MAR assumption (2.4) and the positivity condition, we have the following bias expansion:

\begin{matrix} {\overset{˘}{b}}^{- d / 2} (E [U_{n, 1}^{(miss)} (φ, \tilde{x})] - r^{(m)} (φ, \tilde{x})) \\ = [\int \prod_{j = 1}^{m} K_{α, β} (t_{j}) {\{R^{(m)'} (φ, \tilde{t})\}}^{⊤} t d t / \tilde{f} (\tilde{x}) \\ - \int \prod_{j = 1}^{m} K_{α, β} (t_{j}) t^{⊤} \{{\tilde{f}}^{'} (\tilde{x})\} t d t \frac{r^{(m)} (φ, \tilde{x})}{\tilde{f} (\tilde{x})}] \\ + \frac{{\overset{˘}{b}}^{d / 2}}{2} [\int \prod_{j = 1}^{m} K_{α, β} (t_{j}) t^{⊤} \{R^{(m) ″} (φ, \tilde{t})\} t d t / \tilde{f} (\tilde{x}) \\ - \int \prod_{j = 1}^{m} K_{α, β} (t_{j}) t^{⊤} \{{\tilde{f}}^{″} (\tilde{x})\} t d t \frac{r^{(m)} (φ, \tilde{x})}{\tilde{f} (\tilde{x})}] + o (1) . \end{matrix}

In particular, if

\sqrt{n} {\overset{˘}{b}}^{(d + 2) / 4} ⟶ 0,

then

\sqrt{n {\overset{˘}{b}}^{d / 2}} ({\hat{r}}_{n, 1}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 1} (\tilde{x})) - r^{(m)} (φ, \tilde{x})) \overset{D}{⟶} N (0, ρ_{miss}^{2}),

where

ρ_{miss}^{2}

is defined in (3.19).

Remark 5.

The invariance of the first-order bias expansion with respect to the missingness mechanism under MAR deserves careful scrutiny. Observe that

E [U_{n, 1}^{(miss)} (φ, \tilde{x})] = \frac{E [u_{n, 1}^{(Dir)} (φ, \tilde{x})]}{\prod_{j = 1}^{m} E [δ K_{(α, β)} (X)]},

where, by the MAR assumption (2.4) and the law of total expectation,

E [δ K_{(α, β)} (X)] = E [E [δ ∣ X] K_{(α, β)} (X)] = E [p (X) K_{(α, β)} (X)] .

Similarly,

E [u_{n, 1}^{(Dir)} (φ, \tilde{x})] = E [φ (\tilde{Y}) \prod_{j = 1}^{m} δ_{j} K_{(α, β)} (X_{j})] = E [φ (\tilde{Y}) \prod_{j = 1}^{m} p (X_{j}) K_{(α, β)} (X_{j})],

using the conditional independence

δ_{j} ⊥ Y_{j} ∣ X_{j}

and the product structure of the joint distribution under i.i.d. sampling. Consequently,

E [U_{n, 1}^{(miss)} (φ, \tilde{x})] = \frac{E [φ (\tilde{Y}) \prod_{j = 1}^{m} p (X_{j}) K_{(α, β)} (X_{j})]}{\prod_{j = 1}^{m} E [p (X) K_{(α, β)} (X)]} .

At first glance, this expression depends nontrivially on the propensity score

p (\cdot)

. However, a Taylor expansion of the kernel

K_{(α, β)} (\cdot)

around

\tilde{x}

reveals that, to leading order as

\overset{˘}{b} \to 0

,

K_{(α, β)} (t) = {\overset{˘}{b}}^{- d / 2} \frac{1}{\prod_{i = 1}^{d} x_{i}^{1 / 2}} κ (\frac{t - x}{{\overset{˘}{b}}^{1 / 2}}) + o ({\overset{˘}{b}}^{- d / 2}),

where κ is a bounded kernel function on

R^{d}

(see [164] for the precise form). Substituting this expansion, the factors

p (X_{j})

in the numerator and denominator cancel to first order because they are evaluated at the same points

X_{j}

and the kernel concentrates around

x_{j}

. More formally, under the smoothness condition (C.2) and the continuity of

p (\cdot)

, we have

E [p (X_{j}) K_{(α, β)} (X_{j})] = p (x_{j}) E [K_{(α, β)} (X_{j})] + O ({\overset{˘}{b}}^{1 / 2}),

and similarly for the numerator. The leading-order terms involving

p (x_{j})

cancel exactly in the ratio, yielding a bias expansion that coincides with the complete-data case. The residual terms are of order

O ({\overset{˘}{b}}^{1 / 2})

and are absorbed into the

o (1)

term in the expansion. Therefore, the first-order bias is unaffected by the MAR mechanism. However, this cancellation is not exact at finite sample sizes; higher-order terms involving derivatives of

p (\cdot)

may appear at order

O (\overset{˘}{b})

, which are asymptotically negligible relative to the leading bias term

O ({\overset{˘}{b}}^{1 / 2})

under the bandwidth condition

\sqrt{n} {\overset{˘}{b}}^{(d + 2) / 4} \to 0

. In contrast, the asymptotic variance is irreducibly inflated by the factor

1 / p (x_{i})

, as evidenced in (3.20), reflecting the loss of information due to missingness. This phenomenon is characteristic of complete-case estimators under MAR: the first-order bias is unchanged, but the variance increases in proportion to the inverse of the propensity score.
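The cancellation described in Remark 5 can be checked numerically in a simple special case. The sketch below assumes d = 1 and m = 1 (so that the Dirichlet kernel is a beta density), a uniform design on (0, 1), and hypothetical choices of the regression function and of the propensity score; none of these choices comes from the paper. It compares the kernel-weighted ratio with and without propensity weighting by midpoint quadrature.

```python
import math

def beta_kernel(t: float, x: float, b: float) -> float:
    """Beta(x/b + 1, (1 - x)/b + 1) density at t (the d = 1 Dirichlet kernel)."""
    if not 0.0 < t < 1.0:
        return 0.0
    a, c = x / b + 1.0, (1.0 - x) / b + 1.0
    log_norm = math.lgamma(a + c) - math.lgamma(a) - math.lgamma(c)
    return math.exp(log_norm + (a - 1.0) * math.log(t) + (c - 1.0) * math.log(1.0 - t))

def smoothed_ratio(x, b, r, p, grid=20000):
    """Midpoint-rule value of E[r(X) p(X) K(X)] / E[p(X) K(X)], X ~ U(0, 1)."""
    num = den = 0.0
    for i in range(grid):
        t = (i + 0.5) / grid
        w = p(t) * beta_kernel(t, x, b)
        num += r(t) * w
        den += w
    return num / den

def propensity_gap(x, b):
    """Absolute difference between the MAR-weighted and unweighted ratios."""
    r = lambda t: t * t               # hypothetical regression function
    p_mar = lambda t: 0.5 + 0.4 * t   # hypothetical propensity score
    p_full = lambda t: 1.0            # complete-data benchmark
    return abs(smoothed_ratio(x, b, r, p_mar) - smoothed_ratio(x, b, r, p_full))
```

In this toy configuration the gap at x = 0.5 shrinks as the smoothing parameter decreases, in line with the statement that the propensity score drops out of the ratio to first order.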

Remark 6

([164]). According to Theorem 3.1.15 in [86], the convergence rate for the conventional d-dimensional kernel density estimator with independent and identically distributed (i.i.d.) data, using bandwidth h, is

O (n^{- 1 / 2} h^{- d / 2})

. In contrast, under the MAR assumption with positivity condition, the estimator

{\hat{f}}_{n}^{(miss)} (x, Λ_{n, 1})

achieves a convergence rate of

O (n^{- 1 / 2} {\overset{˘}{b}}^{- d / 4})

. Consequently, the relationship between the bandwidths of

{\hat{f}}_{n}^{(miss)} (x, Λ_{n, 1})

and the traditional multivariate kernel density estimator is expressed as

\overset{˘}{b} \approx h^{2}

.

Remark 7.

In their work, Ouimet and Tolosana-Delgado [164] demonstrated that for all

x \in Int (S_{d, 1})

and as n tends to infinity, the Mean Squared Error (MSE) of the estimator

{\hat{f}}_{n}^{(miss)} (x, Λ_{n, 1})

with respect to the density function

f (\cdot)

under the MAR assumption can be expressed as:

\begin{matrix} MSE [{\hat{f}}_{n}^{(miss)} (x, Λ_{n, 1})] & : = & E [{|{\hat{f}}_{n}^{(miss)} (x, Λ_{n, 1}) - f (x)|}^{2}] \\ = & Var [{\hat{f}}_{n}^{(miss)} (x, Λ_{n, 1})] + {\{Bias [{\hat{f}}_{n}^{(miss)} (x, Λ_{n, 1})]\}}^{2} \\ = & n^{- 1} {\overset{˘}{b}}^{- d / 2} \frac{ψ (x) f (x)}{p (x)} + {\overset{˘}{b}}^{2} g^{2} (x) + O_{x} (n^{- 1} {\overset{˘}{b}}^{- d / 2 + 1 / 2}) + o ({\overset{˘}{b}}^{2}), \end{matrix}

where

ψ (\cdot)

is defined in Equation (A1) in Lemma A4,

p (x)

is the propensity score defined in (2.4), and

g (x) : = \sum_{i = 1}^{d} (1 - (d + 1) x_{i}) \frac{\partial}{\partial x_{i}} f (x) + \frac{1}{2} \sum_{i, j = 1}^{d} x_{i} (1_{{i = j}} - x_{j}) \frac{\partial^{2}}{\partial x_{i} \partial x_{j}} f (x) .

In particular, if

f (x) \cdot g (x) \neq 0

, the asymptotically optimal choice of

\overset{˘}{b}

, with respect to the MSE, is given by:

{\overset{˘}{b}}_{opt} (x) = n^{- 2 / (d + 4)} {[\frac{d}{4} \cdot \frac{ψ (x) f (x)}{p (x) g^{2} (x)}]}^{2 / (d + 4)},

with

\begin{matrix} MSE [{\hat{f}}_{n}^{(miss)} (x, Λ_{n, 1}); {\overset{˘}{b}}_{opt}] & = & n^{- 4 / (d + 4)} [\frac{1 + \frac{d}{4}}{{(\frac{d}{4})}^{\frac{d}{d + 4}}}] \frac{{(ψ (x) f (x) / p (x))}^{4 / (d + 4)}}{{(g^{2} (x))}^{- d / (d + 4)}} \\ + o_{x} (n^{- 4 / (d + 4)}) . \end{matrix}

Furthermore, if

n^{2 / (d + 4)} \overset{˘}{b}

tends to λ for some

λ > 0

as n approaches infinity, then

MSE [{\hat{f}}_{n}^{(miss)} (x, Λ_{n, 1})] = n^{- 4 / (d + 4)} [λ^{- d / 2} \frac{ψ (x) f (x)}{p (x)} + λ^{2} g^{2} (x)] + o_{x} (n^{- 4 / (d + 4)}) .
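The closed-form expressions of Remark 7 can be verified numerically. The Python sketch below keeps only the two leading MSE terms and treats the quantities ψ(x) f(x) / p(x) and g²(x) as user-supplied numbers; the function names and the argument names `psi_f_over_p` and `g_sq` are assumptions made for illustration.

```python
def amse(b, n, d, psi_f_over_p, g_sq):
    """Leading MSE terms of Remark 7: variance plus squared bias."""
    return psi_f_over_p / (n * b ** (d / 2)) + b**2 * g_sq

def b_opt(n, d, psi_f_over_p, g_sq):
    """Closed-form minimizer of amse over b > 0."""
    return (d / 4 * psi_f_over_p / g_sq) ** (2 / (d + 4)) * n ** (-2 / (d + 4))

def amse_at_opt(n, d, psi_f_over_p, g_sq):
    """Closed-form value of amse at b_opt."""
    c = (1 + d / 4) / (d / 4) ** (d / (d + 4))
    return (n ** (-4 / (d + 4)) * c
            * psi_f_over_p ** (4 / (d + 4)) * g_sq ** (d / (d + 4)))
```

Since `psi_f_over_p` contains the factor 1 / p(x), a smaller propensity score enlarges both the optimal smoothing parameter and the minimal leading MSE.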

4. Conditional $U$ -Statistics Using Bernstein Polynomials Under Missing Data

This section delves into the asymptotic properties of the conditional U-statistics estimator derived by inverting the Bernstein polynomial estimator of the distribution function in the presence of missing responses under the MAR assumption (2.4). Let

F (\cdot)

represent any joint cumulative distribution function on

S_{d, 1}

, where values outside

S_{d, 1}

are either 0 or 1. Following [129,131], we define the Bernstein polynomial of order

ϑ

for

F (\cdot)

as follows:

F_{ϑ}^{★} (x) = \sum_{k \in N_{0}^{d} \cap ϑ S_{d, 1}} F (k / ϑ) P_{k, ϑ} (x), x \in S_{d, 1}, ϑ \in N,

where the weights are probabilities from the

Multinomial (ϑ, x)

distribution:

P_{k, ϑ} (x) = \frac{ϑ!}{(ϑ - {∥ k ∥}_{1})! \prod_{i = 1}^{d} k_{i}!} \cdot {(1 - {∥ x ∥}_{1})}^{ϑ - {∥ k ∥}_{1}} \prod_{i = 1}^{d} x_{i}^{k_{i}}, k \in N_{0}^{d} \cap ϑ S_{d, 1} .
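As an illustration of the definitions above, the following Python sketch computes the multinomial weights and the Bernstein polynomial of a given distribution function. The helper names (`multinomial_weight`, `simplex_grid`, `bernstein_polynomial`) are hypothetical, `theta` stands for the order ϑ, and the exhaustive enumeration of the grid is only practical for small ϑ and d.

```python
import math
from itertools import product

def multinomial_weight(k, theta, x):
    """P_{k,theta}(x): Multinomial(theta, x) probability of the count vector k."""
    rest = theta - sum(k)                       # count in the remaining cell
    coef = math.factorial(theta) // (
        math.factorial(rest) * math.prod(math.factorial(ki) for ki in k))
    tail = (1.0 - sum(x)) ** rest
    return coef * tail * math.prod(xi ** ki for xi, ki in zip(x, k))

def simplex_grid(theta, d):
    """All k in N_0^d with k_1 + ... + k_d <= theta."""
    return [k for k in product(range(theta + 1), repeat=d) if sum(k) <= theta]

def bernstein_polynomial(F, theta, x):
    """F*_theta(x) = sum over k of F(k / theta) * P_{k,theta}(x)."""
    d = len(x)
    return sum(F(tuple(ki / theta for ki in k)) * multinomial_weight(k, theta, x)
               for k in simplex_grid(theta, d))
```

The weights sum to one over the grid, and for d = 1 a linear F is reproduced exactly, which gives two quick sanity checks.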

The Bernstein estimator of

F (\cdot)

, denoted by

F_{n, ϑ}^{★} (\cdot)

, is the Bernstein polynomial of order

ϑ

for the empirical cumulative distribution function:

F_{n} (x) : = n^{- 1} \sum_{i = 1}^{n} 1_{(- \infty, x]} (X_{i}),

where

X_{1}, \dots, X_{n}

are independent and identically distributed according to

F (\cdot)

and

1_{A}

denotes as usual the indicator function of the set A. Precisely, we define:

F_{n, ϑ}^{★} (x) = \sum_{k \in N_{0}^{d} \cap ϑ S_{d, 1}} F_{n} (k / ϑ) P_{k, ϑ} (x), x \in S_{d, 1}, ϑ, n \in N .

Under the MAR assumption (2.4), the observed data consist of the triplets

{(X_{i}, Y_{i}, δ_{i})}_{i = 1}^{n}

, where

δ_{i}

is the missingness indicator. The complete-case empirical cumulative distribution function, which discards observations with missing responses, is defined as:

F_{n}^{(miss)} (x) : = \frac{1}{n} \sum_{i = 1}^{n} δ_{i} 1_{(- \infty, x]} (X_{i}),

where the normalization is by n (not by the number of observed cases) to maintain the proper stochastic order. The corresponding Bernstein estimator under missing data is then:

F_{n, ϑ}^{★, (miss)} (x) = \sum_{k \in N_{0}^{d} \cap ϑ S_{d, 1}} F_{n}^{(miss)} (k / ϑ) P_{k, ϑ} (x), x \in S_{d, 1}, ϑ, n \in N .

For a density

f (\cdot)

supported on

S_{d, 1}

, define the Bernstein kernel

K_{x, ϑ} (X_{i}) : = \sum_{k \in N_{0}^{d} \cap (ϑ - 1) S_{d, 1}} \frac{(ϑ - 1 + d)!}{(ϑ - 1)!} 1_{(\frac{k}{ϑ}, \frac{k + 1}{ϑ}]} (X_{i}) P_{k, ϑ - 1} (x) .

Then the complete-case Bernstein density estimator under MAR is

{\hat{f}}_{n, ϑ}^{(miss)} (x) = \frac{1}{n} \sum_{i = 1}^{n} δ_{i} K_{x, ϑ} (X_{i}), x \in S_{d, 1} .

(4.1)

Equivalently,

{\hat{f}}_{n, ϑ}^{(miss)} (x) = \sum_{k \in N_{0}^{d} \cap (ϑ - 1) S_{d, 1}} \frac{(ϑ - 1 + d)!}{(ϑ - 1)!} \{\frac{1}{n} \sum_{i = 1}^{n} δ_{i} 1_{(\frac{k}{ϑ}, \frac{k + 1}{ϑ}]} (X_{i})\} P_{k, ϑ - 1} (x) .
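A minimal Python sketch of the estimator (4.1) in the univariate case d = 1 is given below, under the assumption that all covariates lie in (0, 1]. The function name `bernstein_density_cc` and the argument names are illustrative; `delta` holds the 0/1 response indicators and `theta` the order ϑ.

```python
import math

def bernstein_density_cc(x, X, delta, theta):
    """Complete-case Bernstein density estimate (4.1) for d = 1.

    X: covariates in (0, 1]; delta: 0/1 indicators; theta: order >= 1.
    """
    n = len(X)
    counts = [0] * theta
    for xi, di in zip(X, delta):
        if di:
            # xi falls in the cell (k/theta, (k+1)/theta]
            k = min(max(math.ceil(xi * theta) - 1, 0), theta - 1)
            counts[k] += 1
    total = 0.0
    for k, c in enumerate(counts):
        weight = math.comb(theta - 1, k) * x**k * (1.0 - x) ** (theta - 1 - k)
        # (theta - 1 + d)! / (theta - 1)! equals theta when d = 1
        total += theta * (c / n) * weight
    return total
```

For the toy data X = (0.1, 0.3, 0.6, 0.9), δ = (1, 0, 1, 1) and ϑ = 2, the estimate is the line 0.5 + 0.5x, whose integral over [0, 1] is 0.75, the observed fraction. This reflects the normalization by n rather than by the number of complete cases.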

Remark 8

([129]). A different expression for the complete-case Bernstein density estimator (4.1) under MAR can be formulated as a specific finite mixture of Dirichlet densities:

{\hat{f}}_{n, ϑ}^{(miss)} (x) = \sum_{k \in N_{0}^{d} \cap (ϑ - 1) S_{d, 1}} \{\frac{1}{n} \sum_{i = 1}^{n} δ_{i} 1_{(\frac{k}{ϑ}, \frac{k + 1}{ϑ}]} (X_{i})\} D (k + 1, ϑ - {∥ k ∥}_{1}) (x),

where the density value of the

Dirichlet (α, β)

distribution at

x \in S_{d, 1}

is given by

D (α, β) (x) = \frac{(β + {∥ α ∥}_{1} - 1)!}{(β - 1)! \prod_{i = 1}^{d} (α_{i} - 1)!} {(1 - {∥ x ∥}_{1})}^{β - 1} \prod_{i = 1}^{d} x_{i}^{α_{i} - 1}, α_{i}, β > 0 .

For further details, see [129].

The conditional U-statistic smoothed by Bernstein polynomials under missing data is defined, for each

\tilde{x} \in S_{d, 1}^{m}

, by

\begin{matrix} {\hat{r}}_{n, 2}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 2} (\tilde{x})) & = & \frac{\sum_{(i_{1}, \dots, i_{m}) \in I (m, n)} φ (Y_{i_{1}}, \dots, Y_{i_{m}}) (\prod_{j = 1}^{m} δ_{i_{j}}) \prod_{j = 1}^{m} K_{x_{j}, ϑ} (X_{i_{j}})}{\sum_{(i_{1}, \dots, i_{m}) \in I (m, n)} (\prod_{j = 1}^{m} δ_{i_{j}}) \prod_{j = 1}^{m} K_{x_{j}, ϑ} (X_{i_{j}})} . \end{matrix}

(4.2)
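The ratio (4.2) can be sketched in Python as follows for d = 1, assuming covariates in (0, 1] and a brute-force enumeration of the index set I(m, n), which is feasible only for small n and m. The function names are hypothetical, and tuples containing a missing response are skipped, which is equivalent to multiplying by the product of the indicators.

```python
import math
from itertools import permutations

def bernstein_kernel(x, xi, theta):
    """K_{x,theta}(xi) for d = 1, with xi in the cell (k/theta, (k+1)/theta]."""
    k = min(max(math.ceil(xi * theta) - 1, 0), theta - 1)
    return theta * math.comb(theta - 1, k) * x**k * (1.0 - x) ** (theta - 1 - k)

def conditional_u_stat_cc(phi, x_tilde, X, Y, delta, theta):
    """Complete-case ratio (4.2); sums run over m-tuples of distinct indices."""
    m, n = len(x_tilde), len(X)
    num = den = 0.0
    for idx in permutations(range(n), m):
        if not all(delta[i] for i in idx):
            continue                  # a missing response gives weight zero
        w = math.prod(bernstein_kernel(xj, X[i], theta)
                      for xj, i in zip(x_tilde, idx))
        num += phi(*(Y[i] for i in idx)) * w
        den += w
    if den == 0.0:
        raise ZeroDivisionError("no complete-case tuple has positive weight")
    return num / den
```

Because φ is never evaluated on a tuple with a missing response, the unobserved entries of `Y` may be stored as `None`. With m = 1 the function returns the complete-case Nadaraya–Watson estimate.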

In the particular case

m = 1

,

r^{(m)} (φ, \tilde{x})

reduces to

r^{(1)} (φ, x) = E (φ (Y) ∣ X = x)

, and the complete-case Nadaraya–Watson estimator under MAR becomes

{\hat{r}}_{n, 2}^{(1), (miss)} (φ, x) : = \frac{\sum_{i = 1}^{n} φ (Y_{i}) δ_{i} K_{x, ϑ} (X_{i})}{\sum_{i = 1}^{n} δ_{i} K_{x, ϑ} (X_{i})} = \frac{{\hat{g}}_{n, ϑ}^{(miss)} (φ, x)}{{\hat{f}}_{n, ϑ}^{(miss)} (x)} .

4.1. Centering and U-Statistic Representation Under MAR

For the Bernstein polynomial estimator under missing data, define

G_{φ, \tilde{x}, 2}^{(Bern-miss)} (\tilde{t}, \tilde{y}, \tilde{δ}) = φ (\tilde{y}) (\prod_{j = 1}^{m} δ_{j}) \prod_{j = 1}^{m} K_{x_{j}, ϑ} (t_{j}), (\tilde{t}, \tilde{y}, \tilde{δ}) \in S_{d, 1}^{m} \times R^{q m} \times \{0, 1\}^{m} .

The corresponding U-statistic is

u_{n, 2}^{(Bern-miss)} (φ, \tilde{x}) : = \frac{(n - m)!}{n!} \sum_{i \in I (m, n)} G_{φ, \tilde{x}, 2}^{(Bern-miss)} ({\tilde{X}}_{i}, {\tilde{Y}}_{i}, {\tilde{δ}}_{i}),

and the ratio representation (2.8) holds analogously:

{\hat{r}}_{n, 2}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 2} (\tilde{x})) = \frac{u_{n, 2}^{(Bern-miss)} (φ, \tilde{x})}{u_{n, 2}^{(Bern-miss)} (1, \tilde{x})} .

The centering operator

\hat{E} [\cdot]

defined in (2.9) is adapted accordingly.

4.2. Nonparametric Regression Estimation Under Missing Data

In this section, we prove the uniform strong consistency of the complete-case regression estimator for

m = 1

under the MAR assumption.

Theorem 6.

Assume that the conditions (C.1), (C.2) and (C.3) hold, together with the MAR assumption (2.4) and the positivity condition

{inf}_{x \in S_{d, 1}} p (x) \geq c > 0

. If

2 \leq ϑ \leq \frac{n}{\log n}

, as

n ⟶ \infty

, then

sup_{x \in S_{d, 1}} |E [{\hat{r}}_{n, 2}^{(1), (miss)} (φ, x)] - r^{(1)} (φ, x)| = O (ϑ^{- 1 / 2}),

(4.3)

and

sup_{x \in S_{d, 1}} |{\hat{r}}_{n, 2}^{(1), (miss)} (φ, x) - E [{\hat{r}}_{n, 2}^{(1), (miss)} (φ, x)]| = O (ϑ^{d - 1 / 2} {(n^{- 1} \log n)}^{1 / 2}), a . s .

(4.4)

The bias bound remains identical to the complete-data case, while the stochastic bound is unaffected because the missingness indicators

δ_{i}

are absorbed into the kernel and the uniform convergence rate is preserved under the positivity condition.

Corollary 3.

Assume that the conditions of Theorem 6 hold. Then, as

n ⟶ \infty

, we have

sup_{x \in S_{d, 1}} |{\hat{r}}_{n, 2}^{(1), (miss)} (φ, x) - r^{(1)} (φ, x)| = O (ϑ^{d - 1 / 2} {(n^{- 1} \log n)}^{1 / 2}) + O (ϑ^{- 1 / 2}), a . s .

(4.5)

In particular, if

ϑ^{2 d - 1} = o (n / \log n)

, then

sup_{x \in S_{d, 1}} |{\hat{r}}_{n, 2}^{(1), (miss)} (φ, x) - r^{(1)} (φ, x)| ⟶ 0, a . s .

(4.6)

4.3. Conditional U-Statistics Under Missing Data

In this section, we study the uniform strong consistency of the conditional U-statistic estimators using Bernstein polynomials in the presence of missing responses under MAR.

Theorem 7.

Assume that the conditions (C.2) and (C.3) hold, together with the MAR assumption (2.4) and the positivity condition

{inf}_{x \in S_{d, 1}^{m}} p (x) \geq c > 0

. If

2 \leq ϑ \leq \frac{n}{\log n}

, as

n ⟶ \infty

, then

sup_{\tilde{x} \in S_{d, 1}^{m}} |u_{n, 2}^{(Bern-miss)} (φ, \tilde{x}) - E [u_{n, 2}^{(Bern-miss)} (φ, \tilde{x})]| = O (ϑ^{m (d - 1 / 2)} {(n^{- 1} \log n)}^{1 / 2}), a . s .

(4.7)

Theorem 8.

Assume that the conditions (C.2) and (C.3) hold, together with the MAR assumption (2.4) and the positivity condition

{inf}_{x \in S_{d, 1}^{m}} p (x) \geq c > 0

. If

2 \leq ϑ \leq \frac{n}{\log n}

, as

n ⟶ \infty

, then

\begin{matrix} sup_{\tilde{x} \in S_{d, 1}^{m}} |{\hat{r}}_{n, 2}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 2} (\tilde{x})) - \hat{E} [{\hat{r}}_{n, 2}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 2} (\tilde{x}))]| \\ = O (ϑ^{m (d - 1 / 2)} {(n^{- 1} \log n)}^{1 / 2}), a . s . \end{matrix}

(4.8)

Theorem 9.

Assume that (C.0), (C.1) and (C.2) hold, together with the MAR assumption (2.4). If

2 \leq ϑ \leq n / \log n

, then, as

n \to \infty

,

sup_{\tilde{x} \in S_{d, 1}^{m}} |r^{(m)} (φ, \tilde{x}) - \hat{E} [{\hat{r}}_{n, 2}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 2} (\tilde{x}))]| = O (ϑ^{- m / 2}) .

Corollary 4.

Under the assumptions of Theorems 8 and 9, together with the MAR assumption (2.4) and the positivity condition, as

n ⟶ \infty

, we have

sup_{\tilde{x} \in S_{d, 1}^{m}} |{\hat{r}}_{n, 2}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 2} (\tilde{x})) - r^{(m)} (φ, \tilde{x})| = O (ϑ^{- m / 2}) + O (ϑ^{m (d - 1 / 2)} {(n^{- 1} \log n)}^{1 / 2}), a . s .

(4.9)

In particular, if

ϑ^{m (2 d - 1)} = o (n / \log n)

, then

sup_{\tilde{x} \in S_{d, 1}^{m}} |{\hat{r}}_{n, 2}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 2} (\tilde{x})) - r^{(m)} (φ, \tilde{x})| ⟶ 0, a . s .

(4.10)

Remark 9

([129]). It is worth mentioning that, similar to Remark 6, the convergence rate for the conventional d-dimensional kernel density estimator with independent and identically distributed (i.i.d.) data, using bandwidth h, is

O (n^{- 1 / 2} h^{- d / 2})

. However, under the MAR assumption with the positivity condition, the complete-case estimator

{\hat{f}}_{n}^{(miss)} (x, ϑ)

achieves a convergence rate of

O (n^{- 1 / 2} ϑ^{d / 4})

. Consequently, the relationship between the bandwidths of

{\hat{f}}_{n}^{(miss)} (x, ϑ)

and the traditional multivariate kernel density estimator is expressed as

ϑ \approx h^{- 2}

.

Remark 10.

Ouimet [129] demonstrated that the Mean Squared Error (MSE) of the complete-case density estimator

{\hat{f}}_{n}^{(miss)} (x, ϑ)

under MAR satisfies for all

x \in Int (S_{d, 1})

and as n tends to infinity:

MSE ({\hat{f}}_{n}^{(miss)} (x, ϑ)) = n^{- 1} ϑ^{d / 2} \frac{ψ (x) f (x)}{p (x)} + ϑ^{- 2} b^{2} (x) + o_{x} (n^{- 1} ϑ^{d / 2}) + o (ϑ^{- 2}),

where

b (x) : = \frac{d (d - 1)}{2} f (x) + \sum_{i = 1}^{d} (\frac{1}{2} - x_{i}) \frac{\partial}{\partial x_{i}} f (x) + \frac{1}{2} \sum_{i, j = 1}^{d} (x_{i} 1_{{i = j}} - x_{i} x_{j}) \frac{\partial^{2}}{\partial x_{i} \partial x_{j}} f (x),

and

ψ (x) : = {[{(4 π)}^{d} (1 - {∥ x ∥}_{1}) \prod_{i = 1}^{d} x_{i}]}^{- 1 / 2} .

The factor

1 / p (x)

in the variance term reflects the inflation due to missingness, analogous to the Dirichlet kernel case. In particular, when

f (x) \cdot b (x) \neq 0

, the asymptotically optimal choice for ϑ, minimizing MSE, is:

ϑ_{opt}^{(miss)} (x) = n^{2 / (d + 4)} {[\frac{4}{d} \cdot \frac{b^{2} (x) p (x)}{ψ (x) f (x)}]}^{2 / (d + 4)},

with corresponding MSE:

\begin{matrix} MSE [{\hat{f}}_{n}^{(miss)} (x, ϑ); ϑ_{opt}^{(miss)}] & = & n^{- 4 / (d + 4)} [\frac{\frac{4}{d} + 1}{{(\frac{4}{d})}^{\frac{4}{d + 4}}}] \frac{{(ψ (x) f (x) / p (x))}^{4 / (d + 4)}}{{(b^{2} (x))}^{- d / (d + 4)}} \\ + o_{x} (n^{- 4 / (d + 4)}) . \end{matrix}

Moreover, in the more general case where

n^{2 / (d + 4)} ϑ^{- 1} \to λ > 0

, as

n \to \infty

, the MSE becomes:

MSE [{\hat{f}}_{n}^{(miss)} (x, ϑ)] = n^{- 4 / (d + 4)} [λ^{- d / 2} \frac{ψ (x) f (x)}{p (x)} + λ^{2} b^{2} (x)] + o_{x} (n^{- 4 / (d + 4)}) .

The presence of

p (x)

in the denominator of the variance term indicates that missing data increase the MSE, and the optimal order

ϑ_{opt}^{(miss)} (x)

depends on the propensity score, unlike the complete-data case.
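The dependence of the optimal order on the propensity score can be examined with a short Python sketch based on the leading MSE terms of Remark 10. The function names are assumptions, the argument `bias` stands for the value b(x) (renamed to avoid confusion with a bandwidth), `f` and `p` are the values f(x) and p(x), and the returned order is not rounded to an integer.

```python
import math

def psi(x):
    """psi(x) = [(4 pi)^d (1 - ||x||_1) prod_i x_i]^(-1/2), x in the open simplex."""
    d = len(x)
    return ((4.0 * math.pi) ** d * (1.0 - sum(x)) * math.prod(x)) ** -0.5

def amse_bernstein(theta, n, x, f, p, bias):
    """Leading MSE terms of Remark 10 at order theta."""
    d = len(x)
    return theta ** (d / 2) * psi(x) * f / (n * p) + bias**2 / theta**2

def theta_opt(n, x, f, p, bias):
    """Closed-form minimizer over theta > 0 (not rounded to an integer)."""
    d = len(x)
    return (n ** (2 / (d + 4))
            * (4 / d * bias**2 * p / (psi(x) * f)) ** (2 / (d + 4)))
```

Since the optimal order is proportional to p(x) raised to the power 2 / (d + 4), a lower response probability leads to a smaller order, that is, to more smoothing.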

5. Conditional $U$ -Statistics Estimators Using Beta Kernels Under Missing Data

Throughout this section, it is assumed, as in [165], without loss of generality, that the compact set is a d-dimensional unit hypercube

{[0, 1]}^{d}

. Among asymmetric kernels, we focus on the beta kernel of [98]. The kernel takes the form

K_{\overset{˘}{α}, \overset{˘}{β}} (u) = \frac{u^{x / b} {(1 - u)}^{(1 - x) / b}}{B \{x / b + 1, (1 - x) / b + 1\}} 1_{[0, 1]} (u),

where

\overset{˘}{α} : = \frac{x}{b} + 1 and \overset{˘}{β} : = \frac{1 - x}{b} + 1, x \in [0, 1], b > 0,

and

B (\overset{˘}{α}, \overset{˘}{β}) = \int_{0}^{1} y^{\overset{˘}{α} - 1} {(1 - y)}^{\overset{˘}{β} - 1} d y

for

\overset{˘}{α}, \overset{˘}{β} > 0

is the beta function. To cope with multivariate problems, we construct a tensor product kernel for

\overset{˘}{α} = ({\overset{˘}{α}}_{1}, \dots, {\overset{˘}{α}}_{d})

and

\overset{˘}{β} = ({\overset{˘}{β}}_{1}, \dots, {\overset{˘}{β}}_{d})

K_{\overset{˘}{α}, \overset{˘}{β}} (u) = \prod_{i = 1}^{d} K_{{\overset{˘}{α}}_{i}, {\overset{˘}{β}}_{i}} (u_{i}) = \prod_{i = 1}^{d} \frac{u_{i}^{x_{i} / b_{i}} {(1 - u_{i})}^{(1 - x_{i}) / b_{i}}}{B \{x_{i} / b_{i} + 1, (1 - x_{i}) / b_{i} + 1\}} 1 \{u_{i} \in [0, 1]\},

where

u : = (u_{1}, \dots, u_{d}) \in {[0, 1]}^{d}, x : = (x_{1}, \dots, x_{d}) \in {[0, 1]}^{d}

and

b : = (b_{1}, \dots, b_{d}) \in R_{+}^{d}

are d-dimensional vectors of data points, design points, and smoothing parameters, respectively. Under the MAR assumption (2.4) and the positivity condition

{inf}_{x \in S_{X}} p (x) \geq c > 0

, the complete-case estimator extending to missing responses is given by

\begin{matrix} {\hat{r}}_{n, 3}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 3} (\tilde{x})) = \frac{\sum_{(i_{1}, \dots, i_{m}) \in I (m, n)} φ (Y_{i_{1}}, \dots, Y_{i_{m}}) (\prod_{j = 1}^{m} δ_{i_{j}}) K_{{\overset{˘}{α}}_{1}, {\overset{˘}{β}}_{1}} (X_{i_{1}}) \dots K_{{\overset{˘}{α}}_{m}, {\overset{˘}{β}}_{m}} (X_{i_{m}})}{\sum_{(i_{1}, \dots, i_{m}) \in I (m, n)} (\prod_{j = 1}^{m} δ_{i_{j}}) K_{{\overset{˘}{α}}_{1}, {\overset{˘}{β}}_{1}} (X_{i_{1}}) \dots K_{{\overset{˘}{α}}_{m}, {\overset{˘}{β}}_{m}} (X_{i_{m}})}, \end{matrix}

(5.1)

where

{\overset{˘}{α}}_{j} : = \frac{x_{j}}{{\overset{˘}{b}}_{j}} + 1 and {\overset{˘}{β}}_{j} : = \frac{1 - x_{j}}{{\overset{˘}{b}}_{j}} + 1 .

For this specific kernel, the augmented kernel defined in (2.6) becomes

G_{φ, \tilde{x}, 3}^{(Beta)} (\tilde{t}, \tilde{y}, \tilde{δ}) = φ (\tilde{y}) (\prod_{j = 1}^{m} δ_{j}) \prod_{j = 1}^{m} K_{{\overset{˘}{α}}_{j}, {\overset{˘}{β}}_{j}} (t_{j}), (\tilde{t}, \tilde{y}, \tilde{δ}) \in {[0, 1]}^{d m} \times R^{q m} \times \{0, 1\}^{m},

(5.2)

where

({\overset{˘}{α}}_{j}, {\overset{˘}{β}}_{j})

are defined above. The corresponding U-statistic is

u_{n, 3}^{(Beta)} (φ, \tilde{x}) : = \frac{(n - m)!}{n!} \sum_{i \in I (m, n)} G_{φ, \tilde{x}, 3}^{(Beta)} ({\tilde{X}}_{i}, {\tilde{Y}}_{i}, {\tilde{δ}}_{i}),

(5.3)

and the ratio representation (2.8) holds analogously:

{\hat{r}}_{n, 3}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 3} (\tilde{x})) = \frac{u_{n, 3}^{(Beta)} (φ, \tilde{x})}{u_{n, 3}^{(Beta)} (1, \tilde{x})} .

(5.4)

In the particular case

m = 1

, the Nadaraya–Watson estimator of

r^{(1)} (φ, \tilde{x})

of [165] under missing data is given by:

{\hat{r}}_{n}^{(1), (miss)} (φ, x) : = \frac{\sum_{i = 1}^{n} φ (Y_{i}) δ_{i} K_{\overset{˘}{α}, \overset{˘}{β}} (X_{i})}{\sum_{i = 1}^{n} δ_{i} K_{\overset{˘}{α}, \overset{˘}{β}} (X_{i})} .
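For m = 1, the beta product kernel and the complete-case Nadaraya–Watson estimator above can be sketched in Python as follows. The function names are illustrative assumptions, `b` is the vector of smoothing parameters, and the direct evaluation of the beta function used here may underflow for very small smoothing parameters, in which case a log-scale evaluation would be preferable.

```python
import math

def beta_kernel_1d(u, x, b):
    """Beta(x/b + 1, (1 - x)/b + 1) density evaluated at u in [0, 1]."""
    if not 0.0 <= u <= 1.0:
        return 0.0
    a, c = x / b + 1.0, (1.0 - x) / b + 1.0
    log_beta = math.lgamma(a) + math.lgamma(c) - math.lgamma(a + c)
    return u ** (a - 1.0) * (1.0 - u) ** (c - 1.0) / math.exp(log_beta)

def beta_product_kernel(u, x, b):
    """Tensor product of univariate beta kernels over the d coordinates."""
    return math.prod(beta_kernel_1d(ui, xi, bi) for ui, xi, bi in zip(u, x, b))

def nw_beta_cc(phi, x, b, X, Y, delta):
    """Complete-case Nadaraya-Watson estimate with the beta product kernel."""
    num = den = 0.0
    for Xi, Yi, di in zip(X, Y, delta):
        if di:                         # only observed responses contribute
            w = beta_product_kernel(Xi, x, b)
            num += phi(Yi) * w
            den += w
    if den == 0.0:
        raise ZeroDivisionError("no complete case has positive weight")
    return num / den
```

The estimate is a weighted average of the observed values φ(Y_i), so it always lies between their minimum and maximum.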

5.1. Conditions and Comments Under MAR

Our analysis begins by establishing weak uniform consistency, with rates, of the estimator (5.1) of (2.1) on a

d m

-hyperrectangle

S_{X}^{m}

, where:

S_{X} = S_{X} (η) : = \prod_{j = 1}^{d} [η_{j}, 1 - η_{j}] \subseteq {[0, 1]}^{d},

and the boundary parameters

η : = (η_{1}, \dots, η_{d})

are either fixed or shrink to zero at a suitable rate. To deliver the results under the MAR assumption, we impose the following conditions.

(C.4): For $b_{j} : = b_{j} (n) = (b_{j_{1}}, \dots, b_{j_{d}}) > 0$ and $η_{j} : = η_{j} (n) = (η_{j_{1}}, \dots, η_{j_{d}}) > 0$ , $j = 1, \dots, m$ , satisfying for $i = 1, \dots, d$ , $b_{j_{i}}, η_{j_{i}} \to 0$ , $\frac{b_{j_{i}}}{η_{j_{i}}} \to 0$ and

$\frac{\log n}{n \prod_{j = 1}^{m} (\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})} ⟶ 0 a s n \to \infty .$
(C.5): (Positivity under MAR) The propensity score $p (\cdot)$ satisfies ${inf}_{x \in S_{X}} p (x) \geq c > 0$ for some constant c, ensuring that the denominator of (5.1) does not degenerate asymptotically.

The conditions on

η_{j_{i}}

in Assumption (C.4) are intended for the case of an expanding set. In particular, the condition

b_{j_{i}} / η_{j_{i}} \to 0

means that the boundary parameter

η_{j_{i}}

must shrink to zero at a slower rate than

b_{j_{i}}

; this is crucial for Stirling’s approximation to the gamma function. This condition was used by [165] for the novel proof of the convergence results that we have extended to our missing data setting. Condition (C.5) is standard in the missing data literature and guarantees that the complete-case estimator is well-defined asymptotically.
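As a hedged illustration of Assumption (C.4), suppose that all smoothing and boundary parameters are powers of n, say b = n^{-a} and η = n^{-c} with 0 < c < a, so that b / η → 0. These power-law choices and the function name `c4_quantity` are assumptions made for this example only. The quantity in (C.4) then equals log n · n^{md(a + c) − 1}, which vanishes exactly when md(a + c) < 1.

```python
import math

def c4_quantity(n, a, c, m, d):
    """log n / (n * prod_j prod_i b_ji * eta_ji) with b_ji = n^-a, eta_ji = n^-c."""
    b, eta = n ** (-a), n ** (-c)
    return math.log(n) / (n * (b * eta) ** (m * d))

# Illustrative check: m*d*(a + c) = 0.3 < 1, so the quantity decreases in n.
for n in (10**3, 10**4, 10**5, 10**6):
    print(n, c4_quantity(n, a=0.2, c=0.1, m=1, d=1))
```

With the same exponents but m = d = 2, the product md(a + c) exceeds one and the quantity diverges, so the condition fails.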

5.2. Weak Uniform Convergence of Conditional U-Statistics Under MAR

In the following theorem, we state the weak uniform convergence of conditional U-statistics under the MAR assumption. In the particular case of

m = 1

, this reduces to the results obtained in [165] extended to missing data.

Theorem 10.

If Assumptions (C.2)–(C.5) hold, then, as

n \to \infty,

we have

sup_{\tilde{x} \in S_{X}^{m}} |u_{n, 3}^{(Beta)} (φ, \tilde{x}) - E [u_{n, 3}^{(Beta)} (φ, \tilde{x})]| = O_{P} (\sqrt{\frac{(\log n / n)}{\prod_{j = 1}^{m} (\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})}}) .

(5.5)

Theorem 11.

If Assumptions (C.2)–(C.5) hold, then, as

n \to \infty,

we have

\begin{matrix} sup_{\tilde{x} \in S_{X}^{m}} |{\hat{r}}_{n, 3}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 3} (\tilde{x})) - \hat{E} [{\hat{r}}_{n, 3}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 3} (\tilde{x}))]| \\ = O_{P} (\sqrt{\frac{(\log n / n)}{\prod_{j = 1}^{m} (\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})}}), \end{matrix}

(5.6)

where

\hat{E} [\cdot]

is the centering operator defined in (2.9).

Theorem 12.

Assume that (C.0) and (C.2) hold. Under the MAR assumption (2.4) and the positivity condition

{inf}_{x \in S_{X}} p (x) \geq c > 0

, we have, as

n \to \infty

,

sup_{\tilde{x} \in S_{X}^{m}} |r^{(m)} (φ, \tilde{x}) - \hat{E} [{\hat{r}}_{n, 3}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 3} (\tilde{x}))]| = O (\sum_{j = 1}^{m} \sum_{i = 1}^{d} b_{j_{i}}) .

Remark 11.

Under MAR, the complete-case ratio estimator is locally weighted by the product

p (x) f (x)

. Hence the first-order bias analysis is not purely a matter of smoothness of f and

R (φ, \cdot)

; one also needs regularity of the propensity score

p (\cdot)

. In particular, the cancellation of the propensity score in the ratio holds only asymptotically and relies on continuity (and, for higher-order expansions, differentiability) of

p (\cdot)

.

Corollary 5.

Under the assumptions of Theorems 11 and 12, as

n \to \infty,

we have

sup_{\tilde{x} \in S_{X}^{m}} |{\hat{r}}_{n, 3}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 3} (\tilde{x})) - r^{(m)} (φ, \tilde{x})| = O_{P} (\sum_{j = 1}^{m} \sum_{i = 1}^{d} b_{j_{i}} + \sqrt{\frac{(\log n / n)}{\prod_{j = 1}^{m} (\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})}}) .

(5.7)

5.3. Strong Uniform Convergence of Conditional U-Statistics Under MAR

In this section, we establish the strong uniform consistency, together with explicit convergence rates, of

{\hat{r}}_{n, 3}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 3} (\tilde{x}))

under the missing-at-random (MAR) mechanism. To this end, it is first necessary to strengthen appropriately the assumptions imposed on the smoothing parameters.

(C.4’.): For $b_{j} : = b_{j} (n) = (b_{j_{1}}, \dots, b_{j_{d}}) > 0$ and $η_{j} : = η_{j} (n) = (η_{j_{1}}, \dots, η_{j_{d}}) > 0$ , $j = 1, \dots, m$ , satisfying for $i = 1, \dots, d$ , $b_{j_{i}}, η_{j_{i}} \to 0$ , $\frac{b_{j_{i}}}{η_{j_{i}}} \to 0$ and

$\frac{\log n}{n \prod_{j = 1}^{m} (\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})} {(\sum_{j = 1}^{m} \sum_{i = 1}^{d} \frac{1}{b_{j_{i}}^{2}})}^{1 - κ} = O (1),$

(5.8)

for some constant $κ \in [0, 1)$ , as $n \to \infty$ .
(C.5’.): (Strong positivity under MAR) The propensity score $p (\cdot)$ satisfies ${inf}_{x \in S_{X}} p (x) \geq c > 0$ and is continuous on $S_{X}$ , ensuring almost sure convergence of the denominator.

The condition (5.8) is stronger than

\log n / (n \prod_{j = 1}^{m} (\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})) ⟶ 0

in Assumption (C.4) in that the former implies the latter. Under this condition, the statement in Corollary 5 can be strengthened to almost sure convergence. The following theorems generalize the results of [165] that are given for

m = 1

to the missing data setting.

Theorem 13.

If Assumptions (C.2)–(C.3) and (C.4’.) and (C.5’.) hold, then, as

n \to \infty,

we have

sup_{\tilde{x} \in S_{X}^{m}} |u_{n, 3}^{(Beta)} (φ, \tilde{x}) - E [u_{n, 3}^{(Beta)} (φ, \tilde{x})]| = O (\sqrt{\frac{(\log n / n)}{\prod_{j = 1}^{m} (\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})}}), a . s .

(5.9)

Theorem 14.

If Assumptions (C.2)–(C.3) and (C.4’.) and (C.5’.) hold, then, as

n \to \infty,

we have

\begin{matrix} sup_{\tilde{x} \in S_{X}^{m}} |{\hat{r}}_{n, 3}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 3} (\tilde{x})) - \hat{E} [{\hat{r}}_{n, 3}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 3} (\tilde{x}))]| \\ = O (\sqrt{\frac{(\log n / n)}{\prod_{j = 1}^{m} (\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})}}), a . s . \end{matrix}

(5.10)

Theorem 15.

If Assumption (C.2) holds, and under the MAR assumption (2.4) with condition (C.5’.), then we have

sup_{\tilde{x} \in S_{X}^{m}} |r^{(m)} (φ, \tilde{x}) - \hat{E} [{\hat{r}}_{n, 3}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 3} (\tilde{x}))]| = O (\sum_{j = 1}^{m} \sum_{i = 1}^{d} b_{j_{i}}) .

(5.11)

Corollary 6.

Under the assumptions of Theorem 14 and (5.11), as

n \to \infty,

we have

\begin{matrix} sup_{\tilde{x} \in S_{X}^{m}} |{\hat{r}}_{n, 3}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 3} (\tilde{x})) - r^{(m)} (φ, \tilde{x})| \\ = O (\sum_{j = 1}^{m} \sum_{i = 1}^{d} b_{j_{i}} + \sqrt{\frac{(\log n / n)}{\prod_{j = 1}^{m} (\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})}}), a . s . \end{matrix}

(5.12)
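To make the two components of the rate in (5.12) concrete, the following minimal Python sketch evaluates the deterministic part $\sum_{j} \sum_{i} b_{j_{i}}$ and the stochastic part $\sqrt{(\log n / n) / \prod_{j} \prod_{i} b_{j_{i}} η_{j_{i}}}$ for given smoothing parameters. The function name `uniform_rate` and the flat-list encoding of the parameters are illustrative assumptions, not notation from the paper, and the unknown multiplicative constants are ignored.

```python
import math

def uniform_rate(n, b, eta):
    """Evaluate sum(b) + sqrt((log n / n) / prod(b * eta)).

    b and eta are flat lists holding all m*d values b_{j_i} and eta_{j_i}
    (hypothetical encoding). Constants hidden in the O(.) term are ignored.
    """
    if len(b) != len(eta):
        raise ValueError("b and eta must have the same length")
    bias_part = sum(b)
    volume = math.prod(bi * ei for bi, ei in zip(b, eta))
    stochastic_part = math.sqrt(math.log(n) / n / volume)
    return bias_part + stochastic_part

# Example with m = d = 1 and b = n^{-0.3}, eta = n^{-0.2}, so that b / eta -> 0.
for n in (10**3, 10**4, 10**5):
    print(n, uniform_rate(n, [n ** -0.3], [n ** -0.2]))
```

With this choice of exponents both terms shrink as n grows, which is what the bandwidth conditions are designed to guarantee.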

Remark 12.

The convergence rates established in Corollaries 5 and 6 are identical to those in the complete-data case, provided the positivity condition (C.5) holds. This is because the leading terms in the asymptotic expansion are governed by the kernel and the bandwidth parameters, while the propensity score

p (\cdot)

affects only the constants in the asymptotic variance (through the factor

1 / p (x_{i})

) but not the rates. However, in finite samples, missingness inflates the variance: only a fraction of about p(x) of the observations near a point x are complete cases, so a conservative (worst-case) effective sample size for the complete-case estimator is

n \cdot inf p (x)

, which must be accounted for in practical implementations.

5.4. Conditional U-Statistics Estimators Using Mixed Categorical and Continuous Data Under MAR

We now describe how to handle a discrete random variable Z that takes c distinct values,

\{0, 1, \dots, c - 1\}

, where

c \geq 2

; see [166,167,168,169,170]. The variable is treated as either unordered or ordered, since the kernels used for the two types differ slightly. For an unordered variable, the univariate discrete kernel takes the form:

l (v; z, λ) = \{\begin{matrix} 1 - λ, & if v = z, \\ λ / (c - 1), & if v \neq z . \end{matrix}

Here, v represents the data point, z denotes the design point, and

λ \in (0, 1)

denotes the bandwidth. Conversely, the univariate discrete kernel for an ordered variable is given by:

ℓ (v; z, λ) = (\binom{c}{| v - z |}) {(1 - λ)}^{c - | v - z |} λ^{| v - z |} .

Moving on to the product discrete kernel, when

q_{1} (\leq q)

out of q discrete variables are unordered, it becomes:

L (v; z, λ) = \{\prod_{k = 1}^{q_{1}} l (v_{k}; z_{k}, λ_{k})\} \{\prod_{k = q_{1} + 1}^{q} ℓ (v_{k}; z_{k}, λ_{k})\} .

Here,

v : = (v_{1}, \dots, v_{q})

,

z : = (z_{1}, \dots, z_{q})

, and

λ : = (λ_{1}, \dots, λ_{q})

. Combining this with the product beta kernel

K_{\overset{˘}{α}, \overset{˘}{β}} (u)

yields the product kernel for mixed categorical and continuous data:

W (u, v; x, z, b, λ) = K_{\overset{˘}{α}, \overset{˘}{β}} (u) L (v; z, λ) .
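The discrete kernels above can be sketched in a few lines of Python. This is an illustrative transcription of the displayed formulas only; the function names and the convention that the first `q1` coordinates are the unordered ones are assumptions made for the example. The continuous factor $K_{\overset{˘}{α}, \overset{˘}{β}} (u)$ is not reproduced here, since the mixed weight W is simply its value multiplied by the discrete product kernel.

```python
from math import comb

def unordered_kernel(v, z, lam, c):
    """l(v; z, lam): 1 - lam if v == z, else lam / (c - 1)."""
    return 1.0 - lam if v == z else lam / (c - 1)

def ordered_kernel(v, z, lam, c):
    """Ordered kernel: C(c, |v-z|) (1 - lam)^(c - |v-z|) lam^|v-z|."""
    k = abs(v - z)
    return comb(c, k) * (1.0 - lam) ** (c - k) * lam ** k

def product_discrete_kernel(v, z, lam, c, q1):
    """L(v; z, lam): first q1 coordinates unordered, remaining ones ordered.

    v, z, lam, c are sequences of length q; c[k] is the number of levels
    of the k-th discrete variable (hypothetical argument layout).
    """
    value = 1.0
    for k, (vk, zk, lk, ck) in enumerate(zip(v, z, lam, c)):
        kern = unordered_kernel if k < q1 else ordered_kernel
        value *= kern(vk, zk, lk, ck)
    return value

# The unordered kernel puts total mass one on the c levels.
print(sum(unordered_kernel(v, 1, 0.2, 4) for v in range(4)))
```

As $λ_{k} \to 0$ both kernels concentrate on $v = z$, which is why the discrete bandwidths contribute a bias term of order $\sum_{k} λ_{k}$.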

We now incorporate the missing data mechanism. Let

δ_{i}

be the missingness indicator defined in (2.4), with

δ_{i} = 1

if

Y_{i}

is observed and

δ_{i} = 0

otherwise. Under the MAR assumption, we have

P (δ_{i} = 1 ∣ X_{i}, Z_{i}, Y_{i}) = P (δ_{i} = 1 ∣ X_{i}, Z_{i}) = : p (X_{i}, Z_{i})

, where

p : X \times S_{Z} \to [0, 1]

is the propensity score, assumed continuous and bounded away from zero on the support. For consistency, we require the positivity condition

{inf}_{(x, z) \in S} p (x, z) \geq c > 0

for some compact

S \subseteq X \times S_{Z}

. Given this kernel and n i.i.d. observations

{\{(Y_{i}, X_{i}, Z_{i}, δ_{i})\}}_{i = 1}^{n} \in R \times {[0, 1]}^{d} \times S_{Z} \times \{0, 1\}

, where

S_{Z} : = \prod_{k = 1}^{q} \{0, 1, \dots, c_{k} - 1\}

, we turn to a regression estimator of the conditional mean:

r^{(m)} (φ, \tilde{x}, \tilde{z}) = E (φ (Y_{1}, \dots, Y_{m}) ∣ (X_{1}, \dots, X_{m}) = \tilde{x}, (Z_{1}, \dots, Z_{m}) = \tilde{z}) .

The complete-case estimator under MAR, denoted as

{\hat{r}}_{n, 3}^{(m), (miss)} (φ, \tilde{x}, \tilde{z}; {\bar{Λ}}_{n, 3} (\tilde{x}))

, is expressed as:

\begin{matrix} {\hat{r}}_{n, 3}^{(m), (miss)} (φ, \tilde{x}, \tilde{z}; {\bar{Λ}}_{n, 3} (\tilde{x})) \\ = \frac{\sum_{(i_{1}, \dots, i_{m}) \in I (m, n)} φ (Y_{i_{1}}, \dots, Y_{i_{m}}) (\prod_{j = 1}^{m} δ_{i_{j}}) \prod_{j = 1}^{m} W (X_{i_{j}}, Z_{i_{j}}; x_{j}, z_{j}, b, λ)}{\sum_{(i_{1}, \dots, i_{m}) \in I (m, n)} (\prod_{j = 1}^{m} δ_{i_{j}}) \prod_{j = 1}^{m} W (X_{i_{j}}, Z_{i_{j}}; x_{j}, z_{j}, b, λ)}, \end{matrix}

(5.13)

where

{\bar{Λ}}_{n, 3} (\tilde{x})

denotes the collection of all bandwidth parameters

(b, λ)

. For notational convenience, define the extended random vector

Z_{i}^{(mixed)} : = (X_{i}, Z_{i}, Y_{i}, δ_{i})

. The corresponding U-statistic representation is given by

u_{n, 3}^{(miss)} (φ, \tilde{x}, \tilde{z}) : = \frac{(n - m)!}{n!} \sum_{i \in I (m, n)} φ (Y_{i_{1}}, \dots, Y_{i_{m}}) (\prod_{j = 1}^{m} δ_{i_{j}}) \prod_{j = 1}^{m} W (X_{i_{j}}, Z_{i_{j}}; x_{j}, z_{j}, b, λ),

(5.14)

and the estimator can be written as the ratio

{\hat{r}}_{n, 3}^{(m), (miss)} (φ, \tilde{x}, \tilde{z}; {\bar{Λ}}_{n, 3} (\tilde{x})) = \frac{u_{n, 3}^{(miss)} (φ, \tilde{x}, \tilde{z})}{u_{n, 3}^{(miss)} (1, \tilde{x}, \tilde{z})} .

(5.15)
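For intuition, the ratio form (5.15) can be sketched numerically in the simplest nontrivial case $m = 2$ with a single continuous covariate and no categorical part. The sketch below is illustrative only: the names `beta_kernel` and `complete_case_cu` are hypothetical, and the kernel is the classical one-bandwidth beta kernel with shape parameters $x / b + 1$ and $(1 - x) / b + 1$, used here as an assumed stand-in for the paper's parameterization. The normalizing factor $(n - m)! / n!$ cancels in the ratio and is therefore omitted.

```python
import math
import numpy as np

def beta_kernel(u, x, b):
    """Assumed stand-in kernel: Beta(x/b + 1, (1 - x)/b + 1) density at u."""
    a1, a2 = x / b + 1.0, (1.0 - x) / b + 1.0
    log_norm = math.lgamma(a1 + a2) - math.lgamma(a1) - math.lgamma(a2)
    u = np.clip(u, 1e-12, 1.0 - 1e-12)
    return np.exp(log_norm + (a1 - 1.0) * np.log(u) + (a2 - 1.0) * np.log1p(-u))

def complete_case_cu(phi, Y, X, delta, x1, x2, b):
    """Complete-case ratio for m = 2 at the target pair (x1, x2).

    phi must accept broadcastable arrays. Pairs with i == j or with a
    missing response receive weight zero.
    """
    delta = np.asarray(delta, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Y_safe = np.where(delta > 0, Y, 0.0)   # missing values never enter the sums
    w1 = delta * beta_kernel(X, x1, b)
    w2 = delta * beta_kernel(X, x2, b)
    W = np.outer(w1, w2)                   # weight of the ordered pair (i, j)
    np.fill_diagonal(W, 0.0)               # enforce distinct indices
    den = W.sum()
    if den == 0.0:
        return float("nan")
    num = (phi(Y_safe[:, None], Y_safe[None, :]) * W).sum()
    return num / den

rng = np.random.default_rng(0)
X = rng.uniform(size=200)
Y = X + 0.1 * rng.normal(size=200)
delta = rng.uniform(size=200) < 0.7        # roughly 30% of responses missing
Y_obs = np.where(delta, Y, np.nan)
print(complete_case_cu(lambda a, c: a * c, Y_obs, X, delta, 0.4, 0.6, 0.05))
```

Multiplying the weights by $δ_{i}$ before forming the outer product is equivalent to dropping incomplete observations first, which is why the cost depends on the number of complete cases only.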

Weak Uniform Convergence Under MAR

Before we state the uniform convergence results for the estimator under missing data, let us adapt the previous conditions to this setting as follows:

(C.1’): ${\{(Y_{i}, X_{i}, Z_{i}, δ_{i})\}}_{i = 1}^{n} \in R \times {[0, 1]}^{d} \times S_{Z} \times \{0, 1\}$ are independent and identically distributed random variables under the MAR assumption (2.4);
(C.2’): Let $\tilde{f} (\tilde{x}, \tilde{z})$ be the joint probability density function (with respect to the product of Lebesgue measure on ${[0, 1]}^{d m}$ and counting measure on $S_{Z}^{m}$ ) of $(\tilde{X}, \tilde{Z})$ . Then the second-order derivatives of $\tilde{f} (\tilde{x}, \tilde{z})$ and $\tilde{g} (\tilde{x}, \tilde{z}) : = r^{(m)} (φ, \tilde{x}, \tilde{z}) \tilde{f} (\tilde{x}, \tilde{z})$ with respect to $\tilde{x}$ are continuous on $\tilde{x} \in {(0, 1)}^{d m}$ for each fixed $\tilde{z} \in S_{Z}^{m}$ ;
(C.3’): There exist constants $γ > 0$ and $C_{1} \in [1, \infty)$ such that $E {| φ (Y) |}^{2 + γ} < \infty$ and

$sup_{(\tilde{x}, \tilde{z}) \in {(0, 1)}^{d m} \times S_{Z}^{m}} E ({| φ (Y) |}^{2 + γ} ∣ \tilde{X} = \tilde{x}, \tilde{Z} = \tilde{z}) \tilde{f} (\tilde{x}, \tilde{z}) \leq C_{1};$

(5.16)
(C.4”): For $b_{j} : = b_{j} (n) = (b_{j_{1}}, \dots, b_{j_{d}}) > 0$ , $η_{j} : = η_{j} (n) = (η_{j_{1}}, \dots, η_{j_{d}}) > 0$ , $j = 1, \dots, m$ , and $λ_{k} : = λ_{k} (n) \in (0, 1)$ , $k = 1, \dots, q$ satisfying for $i = 1, \dots, d$ , $b_{j_{i}}, η_{j_{i}} \to 0$ , $\frac{b_{j_{i}}}{η_{j_{i}}} \to 0$ , $λ_{k} \to 0$ and

$\frac{\log n}{n \prod_{j = 1}^{m} (\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})} ⟶ 0 \quad \text{as } n \to \infty;$
(C.4”’): For $b_{j} : = b_{j} (n) = (b_{j_{1}}, \dots, b_{j_{d}}) > 0$ , $η_{j} : = η_{j} (n) = (η_{j_{1}}, \dots, η_{j_{d}}) > 0$ , $j = 1, \dots, m$ , and $λ_{k} : = λ_{k} (n) \in (0, 1)$ , $k = 1, \dots, q$ satisfying for $i = 1, \dots, d$ , $b_{j_{i}}, η_{j_{i}} \to 0$ , $\frac{b_{j_{i}}}{η_{j_{i}}} \to 0$ , $λ_{k} \to 0$ and

$\frac{\log n}{n \prod_{j = 1}^{m} (\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})} {(\sum_{j = 1}^{m} \sum_{i = 1}^{d} \frac{1}{b_{j_{i}}^{2}})}^{1 - κ} = O (1),$

(5.17)

for some constant $κ \in [0, 1)$ , as $n \to \infty$ ;
(C.5): Let

$f_{n}^{m} : = inf_{(\tilde{x}, \tilde{z}) \in X^{m} \times S_{Z}^{m}} \tilde{f} (\tilde{x}, \tilde{z}) > 0,$

and assume

$inf_{(\tilde{x}, \tilde{z}) \in X^{m} \times S_{Z}^{m}} p (\tilde{x}, \tilde{z}) \geq c > 0 .$

(5.18)

Moreover, suppose that

$\frac{1}{f_{n}^{m}} (\sum_{j = 1}^{m} \sum_{i = 1}^{d} b_{j_{i}} + \sum_{k = 1}^{q} λ_{k} + \sqrt{\frac{(\log n / n)}{\prod_{j = 1}^{m} (\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})}}) ⟶ 0 .$

Corollary 7.

Under the assumptions (C.1’)–(C.3’), (C.4”), (C.5), and the MAR assumption (2.4) with the positivity condition (5.18), we have, as

n \to \infty,

\begin{matrix} sup_{(\tilde{x}, \tilde{z}) \in X^{m} \times S_{Z}^{m}} |{\hat{r}}_{n, 3}^{(m), (miss)} (φ, \tilde{x}, \tilde{z}; {\bar{Λ}}_{n, 3} (\tilde{x})) - r^{(m)} (φ, \tilde{x}, \tilde{z})| \\ = O_{P} (\sum_{j = 1}^{m} \sum_{i = 1}^{d} b_{j_{i}} + \sum_{k = 1}^{q} λ_{k} + \sqrt{\frac{(\log n / n)}{\prod_{j = 1}^{m} (\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})}}) . \end{matrix}

(5.19)

Remark 13.

Several important observations are in order regarding the adaptation to missing data:

1.: The complete-case estimator (5.13) incorporates the product $\prod_{j = 1}^{m} δ_{i_{j}}$ , which discards any m-tuple containing at least one missing response. Under the MAR assumption and the positivity condition, this estimator remains consistent, albeit with a larger asymptotic variance due to the reduced effective sample size.
2.: The convergence rate in Corollary 7 remains unchanged from the complete-data case because the missingness indicators $δ_{i}$ do not affect the first-order bias. However, the constant in the $O_{P}$ term may depend on the propensity score through the variance of the U-statistic.
3.: The bandwidth conditions (C.4”) and (C.4”’) are unaffected by the missingness mechanism, as they pertain to the kernel and the design density. The positivity condition (5.18) ensures that the denominator of the estimator does not degenerate asymptotically.
4.: When the propensity score $p (x, z)$ is unknown, it must be estimated from the data. Under MAR, a nonparametric estimator of $p (\cdot)$ (e.g., a kernel regression of the indicators $δ_{i}$ on the covariates, which are always observed) can be employed, leading to an inverse probability weighted (IPW) estimator; combining it with an outcome regression yields an augmented IPW (AIPW) estimator that may achieve semiparametric efficiency. This extension is beyond the scope of the present work but represents a promising direction for future research.

Remark 14.

The relationship between the present manuscript and the general delta-sequence theory of [157] is structural rather than merely taxonomic. The latter work develops an abstract asymptotic theory for complete-case conditional U-statistics under MAR by treating the localization device as a positive approximate identity. At that level of generality, the principal objects are the localized complete-case U-statistic, its MAR-weighted expectation, the associated ratio normalization, and the Hoeffding–projection structure governing stochastic fluctuations. The resulting theory isolates the probabilistic skeleton of the problem.

The present paper, by contrast, studies what happens when the approximate identity is no longer an abstract regularizing sequence but a support-adapted asymmetric family whose analytic behavior is inseparable from the geometry of the covariate space. This passage from abstract localization to asymmetric localization is not innocuous. Dirichlet kernels on the simplex, beta kernels on the hypercube, and Bernstein polynomial smoothers possess evaluation-point-dependent shapes, boundary stratification, non-Euclidean local covariance structures, and normalizing constants whose asymptotics change across the support. Their mass concentration is governed not only by a bandwidth or degree parameter, but also by the position of the target point relative to the boundary. Hence the bias, variance, entropy, and stochastic equicontinuity calculations require a substantially finer analysis than that needed for a generic delta sequence.

This distinction is especially pronounced for conditional U-statistics. In ordinary first-order regression, asymmetric kernels already introduce nonstandard boundary behavior. In the present higher-order setting, that behavior is amplified by the nonlinear ratio structure and by the Hoeffding decomposition of localized U-statistics. The first projection carries the leading Gaussian fluctuation, whereas the higher canonical projections must be shown to be uniformly negligible over support-dependent classes of kernels. Since the localization kernel itself depends on the ordered target tuple

\tilde{x} = (x_{1}, \dots, x_{m}),

the relevant kernel class is not a simple translation family. It is a point-dependent, support-constrained, and generally non-symmetric family. This is one of the reasons why the proofs cannot be obtained by a direct invocation of the abstract results in [157]. A second essential distinction concerns the MAR mechanism. Under complete-case sampling, the expectation of the localized numerator involves

\prod_{j = 1}^{m} p (t_{j}) f (t_{j})

rather than

\prod_{j = 1}^{m} f (t_{j}) .

Thus the finite-sample centering is a propensity-weighted smoothing functional. In a purely abstract delta-sequence framework, this observation identifies the correct effective design measure. In the present asymmetric-kernel setting, however, one must further determine how the factor

p (\cdot)

interacts with the local geometry of the kernel. The paper shows that the leading-order contribution of the propensity score cancels in the ratio bias under continuity and positivity assumptions, while the stochastic dispersion retains the factor

1 / p (x)

, or its higher-order analogue, through the variance of the complete-case first projection. This yields a precise asymptotic separation:

bias is governed by support-adapted smoothing geometry,

whereas

variance is inflated by the MAR observation mechanism.

Such a conclusion requires kernel-specific expansions and cannot be reduced to a formal “replace f by

p f

” principle.

The genuinely new content of the present manuscript relative to [157] may therefore be summarized as follows. First, it develops explicit Dirichlet-kernel conditional U-statistic estimators on the simplex under MAR and proves strong uniform consistency and asymptotic normality with rates reflecting the local

L^{2}

geometry of the Dirichlet family. Second, it establishes Bernstein-polynomial analogues, including uniform stochastic bounds and bias estimates in a nonlinear conditional U-statistic setting. Third, it treats multivariate beta kernels on hyperrectangles, both on fixed compact regions and on expanding interior domains approaching the boundary, thereby making visible the interaction between bandwidths, boundary parameters, and complete-case sampling. Fourth, it extends the construction to mixed continuous–categorical regressors, where continuous beta smoothing and discrete kernels contribute distinct bias components. Fifth, it provides complete-case computational formulations, bandwidth selection criteria, and simulation evidence for conditional Kendall-type functionals under MCAR and MAR mechanisms.

Thus, the present paper is not simply an application of the delta-sequence MAR theory. It is a boundary-sensitive, geometry-dependent, and kernel-specific refinement of that theory. The general framework of [157] supplies the abstract probabilistic paradigm; the present manuscript supplies the detailed asymptotic analysis needed to make that paradigm operational for asymmetric smoothing on constrained supports. In particular, the rates, bias expansions, variance inflation factors, and mixed-data extensions derived here are new features generated by the interaction of four mechanisms: higher-order conditional U-statistic structure, point-dependent asymmetric smoothing, boundary geometry, and MAR complete-case sampling.

Remark 15.

We clarify the logical relation between the present paper, the earlier complete-data analysis [159], and the abstract MAR delta-sequence framework [157]. The present manuscript should not be read as reproving, in kernel-specific notation, every abstract consequence already contained in [157]. Rather, the role of the present work is to identify which parts of that abstract theory are applicable to asymmetric support-adapted kernels and to derive the additional kernel-specific information that is not available from the abstract framework alone. The complete-data paper [159] treats conditional U-statistics without missing responses. Hence it contains neither the complete-case U-statistic structure, nor the MAR propensity score, nor the effective complete-case design density

p f

, nor the inverse-propensity variance inflation. It also does not address the distinction between complete-case and IPW centering. Therefore, the present paper is new relative to [159] at the level of the statistical experiment: the observations are

(X_{i}, δ_{i} Y_{i}, δ_{i})

, the estimator is a complete-case local ratio, and the first Hoeffding projection, deterministic centering, variance, and MSE constants are all modified by the MAR mechanism. The MAR delta-sequence work [157], by contrast, provides an abstract probabilistic theorem for complete-case conditional U-statistics under MAR, where the localization device is treated as a positive approximate identity. Results in the present paper that follow by checking the hypotheses of [157] are therefore presented only as corollaries or short verification statements. In particular, whenever the conclusion is only that a generic complete-case estimator is consistent, or satisfies the abstract delta-sequence stochastic bound under the hypotheses of [157], we do not claim a new theorem. We simply verify the required assumptions for the relevant kernel family. The genuinely new contributions of the present paper are the following kernel-specific and geometry-dependent ingredients.

1.: For Dirichlet kernels on the simplex, we derive the exact local moment structure, including the boundary-sensitive drift, covariance matrix, and $L^{2}$ -norm behavior. These quantities are not supplied by the abstract delta-sequence theory and are essential for obtaining explicit rates and normalizing constants.
2.: For Bernstein smoothers, we identify the discrete polynomial smoothing operator as an admissible MAR localization scheme and compute its bias and variance scales in the nonlinear conditional U-statistic setting. The resulting verification is not a formal substitution, because the polynomial operator is discrete and its stochastic normalization differs from ordinary continuous kernels.
3.: For product beta kernels on hyperrectangles, we give the interior and near-boundary moment expansions, the $L^{2}$ -norm regimes, and the corresponding uniform stochastic rates. These regimes depend on the evaluation point and cannot be recovered from [157] without substantial kernel-specific analysis.
4.: For mixed continuous–categorical regressors, we combine continuous beta smoothing with categorical smoothing. This yields a two-component deterministic bias and a mixed stochastic scale. This construction is not present in the complete-data paper and is not an immediate consequence of the abstract delta-sequence result.
5.: For the MAR mechanism, we make explicit how the complete-case density $p f$ enters deterministic centering and bias constants, while the stochastic dispersion contains the inverse-propensity loss of information. The abstract framework gives the general MAR architecture, but the present paper computes the kernel-specific constants and rates for the asymmetric smoothers under consideration.

5.5. Computational Complexity

We briefly discuss the computational cost of the proposed estimators. Let

n_{obs} : = \sum_{i = 1}^{n} δ_{i}

denote the number of complete cases. Under the MAR assumption, all estimators in (2.5), (3.3), (4.2), (5.1) and (5.13) can be implemented after discarding the incomplete observations, so the effective sample size in the algorithmic complexity is

n_{obs}

rather than n. For fixed order m, the cardinality of

I (m, n_{obs}) = \{(i_{1}, \dots, i_{m}) : 1 \leq i_{j} \leq n_{obs}, i_{j} \neq i_{r} \text{ for } j \neq r\}

is

| I (m, n_{obs}) | = \frac{n_{obs}!}{(n_{obs} - m)!} = O (n_{obs}^{m}) .

For a fixed evaluation point

\tilde{x} = (x_{1}, \dots, x_{m})

, the numerator and denominator of the conditional U-statistic estimator require summation over all distinct m-tuples. Hence, the naive computational cost is

O (| I (m, n_{obs}) | [C_{φ} + m C_{K}]) = O (n_{obs}^{m} [C_{φ} + m C_{K}]),

where

C_{φ}

is the cost of evaluating

φ (Y_{i_{1}}, \dots, Y_{i_{m}})

and

C_{K}

is the cost of one kernel evaluation.

More specifically:

Dirichlet kernel estimator. For (3.3), after precomputing the normalization constants depending on $(α_{j}, β_{j})$ , one kernel evaluation costs $C_{K} = O (d)$ . Therefore, for one target point,

$Time = O (n_{obs}^{m} (C_{φ} + m d)), Memory = O (n_{obs} d) .$
Bernstein polynomial estimator. For (4.2), if implemented directly from the multinomial sum, one evaluation may cost as much as

$O ((\binom{ϑ + d - 1}{d})),$

but using the cell representation of the Bernstein density estimator reduces the cost of one kernel evaluation to $C_{K} = O (d)$ (or $O (1)$ in the univariate case). Thus, with an efficient implementation,

$Time = O (n_{obs}^{m} (C_{φ} + m d)) .$
Beta kernel estimator. For (5.1), the product beta kernel requires $O (d)$ operations per observation, so

$Time = O (n_{obs}^{m} (C_{φ} + m d)), Memory = O (n_{obs} d) .$
Mixed continuous/categorical estimator. For (5.13), the continuous beta part contributes $O (d)$ and the discrete kernel contributes $O (q)$ , hence

$C_{K} = O (d + q),$

and therefore

$Time = O (n_{obs}^{m} (C_{φ} + m (d + q))), Memory = O (n_{obs} (d + q)) .$

In the important special case

m = 1

(Nadaraya–Watson-type regression), the complexity becomes linear in the number of complete cases:

Time = O (n_{obs} C_{K}),

that is,

O (n_{obs} d)

for the Dirichlet, Bernstein, and beta kernels, and

O (n_{obs} (d + q))

in the mixed-data case. If the estimator is evaluated on a grid of G target points, the overall cost scales as

O (G n_{obs}^{m} [C_{φ} + m C_{K}]) .
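The cost model above is easy to tabulate. The short Python sketch below counts the ordered m-tuples of distinct complete-case indices and multiplies by the per-tuple cost $C_{φ} + m C_{K}$; the function names and the unit-cost arguments are illustrative assumptions, and the result is an operation count in the sense of the text rather than a running-time prediction.

```python
import math

def n_tuples(n_obs, m):
    """|I(m, n_obs)| = n_obs! / (n_obs - m)!, ordered tuples of distinct indices."""
    return math.perm(n_obs, m)

def naive_cost(n_obs, m, c_phi, c_kernel, grid=1):
    """Operation count G * |I(m, n_obs)| * (C_phi + m * C_K)."""
    return grid * n_tuples(n_obs, m) * (c_phi + m * c_kernel)

# Growth of the tuple count with the order m for n_obs = 1000 complete cases.
for m in (1, 2, 3):
    print(m, n_tuples(1000, m))
```

For 1000 complete cases the count grows from $10^{3}$ at $m = 1$ to roughly $10^{6}$ at $m = 2$ and $10^{9}$ at $m = 3$.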

Bandwidth selection. The leave-one-out cross-validation criterion in (8.2) is substantially more expensive. A naive implementation requires recomputing an estimator of order

O (n_{obs}^{m})

for each of the

O (n_{obs}^{m})

tuples, yielding

O (n_{obs}^{2 m})

operations per candidate bandwidth. If H bandwidth values are tested, the total cost is

O (H n_{obs}^{2 m}) .

For

m = 1

, this reduces to the familiar

O (H n_{obs}^{2})

complexity.

Comment. Therefore, the proposed methodology is computationally tractable for $m = 1$ and $m = 2$ , but the exact computation becomes rapidly expensive for larger m because of the combinatorial growth of $| I (m, n_{obs}) |$ . In practice, this motivates the use of precomputed kernel weights, parallel computation over target points, and, for large samples or higher orders, incomplete U-statistics or subsampling strategies.

Remark 16.

The exact evaluation of the conditional U-statistic estimator in (2.5) involves a summation over

I (m, n)

, and hence has worst-case computational complexity of order

O (n^{m})

for a fixed evaluation point. This cost is intrinsic to the fully enumerated U-statistic representation and may become prohibitive when

m \geq 2

and n is large. The main purpose of the present paper is asymptotic and probabilistic rather than algorithmic; nevertheless, several standard computational reductions are available. First, in the most common applications of conditional U-statistics, the order m is small, typically

m = 2

for pairwise functionals such as local covariance, conditional Kendall-type coefficients, discrimination criteria, ranking losses, or two-sample comparison functionals. In that case the exact implementation has quadratic cost, which remains feasible for moderate sample sizes, especially after discarding incomplete tuples through the factor

\prod_{j = 1}^{m} δ_{i_{j}}

. Second, the complete U-statistic may be replaced by an incomplete or sampled U-statistic, obtained by averaging only over a random subset

B_{n} \subset I (m, n)

. For instance, one may use

{\hat{r}}_{n, ℓ, B}^{(m)} (φ, \tilde{x}) = \frac{\sum_{i \in B_{n}} φ (Y_{i_{1}}, \dots, Y_{i_{m}}) \prod_{j = 1}^{m} δ_{i_{j}} K_{Λ_{n, ℓ} (x_{j})} (X_{i_{j}})}{\sum_{i \in B_{n}} \prod_{j = 1}^{m} δ_{i_{j}} K_{Λ_{n, ℓ} (x_{j})} (X_{i_{j}})} .

If

B_{n}

is sampled uniformly from

I (m, n)

, or generated by an appropriate randomized block scheme, the computational cost is reduced from

O (n^{m})

to

O (| B_{n} |)

. The additional Monte Carlo error can then be made negligible relative to the statistical error by choosing

| B_{n} |

sufficiently large. A detailed asymptotic theory for such incomplete asymmetric-kernel conditional U-statistics under MAR is beyond the scope of the present paper, but it is a natural direction for future work. Third, the localized nature of the kernel weights can be exploited computationally. For compactly supported or effectively localized asymmetric kernels, only observations lying in the effective neighborhood of the evaluation point contribute substantially to the numerator and denominator. Thus, nearest-neighbor screening, binning, kd-tree search, or sparse evaluation of negligible weights may substantially reduce the practical cost. These reductions do not change the theoretical estimator studied here when exact weights are retained, but they provide efficient numerical implementations in moderate and large samples. Consequently, the

O (n^{m})

complexity should be understood as the cost of the exact theoretical version of the estimator. The asymptotic results of the paper are established for this exact complete-case conditional U-statistic, while incomplete-U, randomized, and localized approximations provide scalable computational variants whose rigorous treatment is left for future investigation.
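The incomplete U-statistic variant mentioned in the remark can be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation covered by the theory of the paper: the tuples in $B_{n}$ are drawn independently and uniformly from $I (m, n)$ (so a tuple may repeat), the kernel weights are supplied as a precomputed array, and the names `sample_tuples` and `incomplete_cu` are hypothetical.

```python
import numpy as np

def sample_tuples(n, m, size, rng):
    """Draw `size` ordered m-tuples of distinct indices, each uniform on I(m, n)."""
    return np.array([rng.choice(n, size=m, replace=False) for _ in range(size)])

def incomplete_cu(phi, Y, delta, weights, tuples):
    """Ratio estimator summed over the sampled tuples only.

    weights[j, i] holds the kernel weight of observation i for the j-th
    target coordinate x_j; tuples has shape (|B_n|, m).
    """
    delta = np.asarray(delta, dtype=float)
    Y_safe = np.where(delta > 0, np.asarray(Y, dtype=float), 0.0)
    weights = np.asarray(weights, dtype=float)
    m = tuples.shape[1]
    w = np.ones(len(tuples))
    for j in range(m):
        idx = tuples[:, j]
        w *= delta[idx] * weights[j, idx]   # zero if any response is missing
    den = w.sum()
    if den == 0.0:
        return float("nan")
    vals = phi(*(Y_safe[tuples[:, j]] for j in range(m)))
    return float((vals * w).sum() / den)

rng = np.random.default_rng(1)
n, m = 50, 3
B = sample_tuples(n, m, 400, rng)
Y = rng.normal(size=n)
delta = rng.uniform(size=n) < 0.8
K = rng.uniform(0.1, 1.0, size=(m, n))      # placeholder kernel weights
print(incomplete_cu(lambda a, b, c: a * b * c, Y, delta, K, B))
```

The cost is proportional to $| B_{n} |$ times $m$, independently of $n^{m}$, at the price of an additional Monte Carlo error.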

Remark 17.

We clarify why the propensity-weighted deterministic centering associated with the complete-case estimator converges to the same conditional functional

r^{(m)} (φ, \tilde{x})

as in the fully observed case. Let

g_{p} (x) : = p (x) f (x), G_{p} (\tilde{x}) : = \prod_{j = 1}^{m} g_{p} (x_{j}),

and write

θ (\tilde{x}) : = r^{(m)} (φ, \tilde{x}) .

The complete-case deterministic centering may be written as

r_{p, n, ℓ}^{(m)} (φ, \tilde{x}) = \frac{\int_{X^{m}} θ (\tilde{t}) G_{p} (\tilde{t}) {\tilde{K}}_{{\bar{Λ}}_{n, ℓ} (\tilde{x})} (\tilde{t}) d \tilde{t}}{\int_{X^{m}} G_{p} (\tilde{t}) {\tilde{K}}_{{\bar{Λ}}_{n, ℓ} (\tilde{x})} (\tilde{t}) d \tilde{t}} .

Equivalently, if

{\tilde{T}}_{n, ℓ, \tilde{x}}

has density

{\tilde{K}}_{{\bar{Λ}}_{n, ℓ} (\tilde{x})},

then

r_{p, n, ℓ}^{(m)} (φ, \tilde{x}) = \frac{E [θ ({\tilde{T}}_{n, ℓ, \tilde{x}}) G_{p} ({\tilde{T}}_{n, ℓ, \tilde{x}})]}{E [G_{p} ({\tilde{T}}_{n, ℓ, \tilde{x}})]} .

Hence

\begin{matrix} r_{p, n, ℓ}^{(m)} (φ, \tilde{x}) - θ (\tilde{x}) = \frac{E [\{θ ({\tilde{T}}_{n, ℓ, \tilde{x}}) - θ (\tilde{x})\} G_{p} ({\tilde{T}}_{n, ℓ, \tilde{x}})]}{E [G_{p} ({\tilde{T}}_{n, ℓ, \tilde{x}})]} . \end{matrix}

(5.20)

This identity is the key point. The propensity score does not have to vanish from the integrand. Rather, it appears both in the numerator and in the denominator through the same local design weight

G_{p}

. Since the kernels form an approximate identity at

\tilde{x}

, the random vector

{\tilde{T}}_{n, ℓ, \tilde{x}}

concentrates around

\tilde{x}

. Therefore, if θ and

G_{p}

are continuous, and if

G_{p} (\tilde{x}) > 0

, then

E [G_{p} ({\tilde{T}}_{n, ℓ, \tilde{x}})] ⟶ G_{p} (\tilde{x}) > 0,

and

E [θ ({\tilde{T}}_{n, ℓ, \tilde{x}}) G_{p} ({\tilde{T}}_{n, ℓ, \tilde{x}})] ⟶ θ (\tilde{x}) G_{p} (\tilde{x}) .

Consequently,

r_{p, n, ℓ}^{(m)} (φ, \tilde{x}) ⟶ θ (\tilde{x}) = r^{(m)} (φ, \tilde{x}) .

More explicitly, for any

η > 0

, identity

(5.20)

gives

|r_{p, n, ℓ}^{(m)} (φ, \tilde{x}) - θ (\tilde{x})| \leq \frac{ω_{θ} (η) E [G_{p} ({\tilde{T}}_{n, ℓ, \tilde{x}})] + 2 {∥ θ ∥}_{\infty} {∥ G_{p} ∥}_{\infty} P (∥ {\tilde{T}}_{n, ℓ, \tilde{x}} - \tilde{x} ∥ > η)}{E [G_{p} ({\tilde{T}}_{n, ℓ, \tilde{x}})]},

where

ω_{θ} (η)

is the modulus of continuity of θ. The first term is small because θ is continuous, and the second term is small because the kernel mass concentrates around

\tilde{x}

. The denominator is bounded away from zero for all large n because p is positive and f is positive on the region under consideration. This proves the convergence. If the convergence is required uniformly on a compact set

C \subset Int (X^{m})

, the same argument applies provided the approximate-identity property holds uniformly on

C

, and provided

inf_{\tilde{x} \in C} G_{p} (\tilde{x}) > 0 .

Thus

sup_{\tilde{x} \in C} |r_{p, n, ℓ}^{(m)} (φ, \tilde{x}) - r^{(m)} (φ, \tilde{x})| ⟶ 0 .

Under second-order smoothness, this qualitative convergence can be sharpened to the bias expansion

\begin{matrix} r_{p, n, ℓ}^{(m)} (φ, \tilde{x}) - θ (\tilde{x}) & = & \nabla θ {(\tilde{x})}^{⊤} μ_{n, ℓ} (\tilde{x}) + \frac{1}{2} tr [\nabla^{2} θ (\tilde{x}) M_{n, ℓ} (\tilde{x})] \\  &  & + \nabla θ {(\tilde{x})}^{⊤} M_{n, ℓ} (\tilde{x}) \nabla \log G_{p} (\tilde{x}) + o (ρ_{n, ℓ}), \end{matrix}

(5.21)

where

μ_{n, ℓ} (\tilde{x}) = E [{\tilde{T}}_{n, ℓ, \tilde{x}} - \tilde{x}], M_{n, ℓ} (\tilde{x}) = E [({\tilde{T}}_{n, ℓ, \tilde{x}} - \tilde{x}) {({\tilde{T}}_{n, ℓ, \tilde{x}} - \tilde{x})}^{⊤}] .

This expansion shows precisely what is meant by the cancellation of the MAR propensity score. The factor p cancels from the zeroth-order target because the ratio converges to

\frac{θ (\tilde{x}) G_{p} (\tilde{x})}{G_{p} (\tilde{x})} = θ (\tilde{x}) .

However, unless p is locally constant, it may still enter the higher-order bias constant through

\nabla \log G_{p} (\tilde{x}) = \nabla [\sum_{j = 1}^{m} \log \{p (x_{j}) f (x_{j})\}] .

Therefore, the correct statement is that the propensity-weighted centering converges to the target conditional functional, and that the MAR mechanism does not change the zeroth-order target or the bias rate under smooth positive p. Exact higher-order bias constants, however, must be computed with the effective complete-case density

p f

.
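The convergence of the propensity-weighted centering can be checked numerically in a toy case. The sketch below takes $m = 1$ and $d = 1$, evaluates the ratio of integrals defining the centering by a midpoint rule on $(0, 1)$, and reports the distance to $θ (x)$ for decreasing bandwidths. Everything specific here is an assumption made for illustration: the one-bandwidth beta kernel with shape parameters $x / b + 1$ and $(1 - x) / b + 1$, the choices $θ (t) = t^{2}$, $p (t) = 0.5 + 0.4 t$, $f \equiv 1$, and the function names.

```python
import math
import numpy as np

def beta_kernel(u, x, b):
    """Assumed stand-in kernel: Beta(x/b + 1, (1 - x)/b + 1) density at u."""
    a1, a2 = x / b + 1.0, (1.0 - x) / b + 1.0
    log_norm = math.lgamma(a1 + a2) - math.lgamma(a1) - math.lgamma(a2)
    return np.exp(log_norm + (a1 - 1.0) * np.log(u) + (a2 - 1.0) * np.log1p(-u))

def cc_centering(theta, p, f, x, b, grid_size=20000):
    """Ratio of integrals of theta * p * f * K and p * f * K over (0, 1)."""
    t = (np.arange(grid_size) + 0.5) / grid_size   # midpoint grid
    k = beta_kernel(t, x, b)
    g = p(t) * f(t)                                # effective design weight p * f
    return np.mean(theta(t) * g * k) / np.mean(g * k)

theta = lambda t: t ** 2
p = lambda t: 0.5 + 0.4 * t                        # non-constant propensity
f = lambda t: np.ones_like(t)                      # uniform design density
for b in (0.1, 0.01, 0.001):
    print(b, abs(cc_centering(theta, p, f, 0.3, b) - theta(0.3)))
```

Although p is not constant, the error shrinks with b, in line with the statement that the propensity score affects only the higher-order bias constants and not the limit.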

6. Applications

6.1. Discrimination Problems with Missing Responses

We now apply the theoretical framework developed in the preceding sections to the problem of discrimination described in Section 3 of [171] (see also [172]), under the additional complication that response variables may be missing according to the MAR mechanism (2.4). We adopt the same notation and setting as [171], with modifications to accommodate missingness.

Let

φ : Y^{k} \to \{1, \dots, M\}

be a measurable function taking finitely many values. The sets

A_{j} = \{(y_{1}, \dots, y_{k}) \in Y^{k} : φ (y_{1}, \dots, y_{k}) = j\}, 1 \leq j \leq M,

form a measurable partition of the feature space

Y^{k}

. Predicting the value of

φ (Y_{1}, \dots, Y_{k})

is equivalent to predicting the cell

A_{j}

to which the k-tuple

(Y_{1}, \dots, Y_{k})

belongs. For any measurable discrimination rule

g : X^{k} \to \{1, \dots, M\}

, the probability of correct classification satisfies

P (g (\tilde{X}) = φ (\tilde{Y})) \leq \sum_{j = 1}^{M} \int_{\{\tilde{x} \in X^{k} : g (\tilde{x}) = j\}} max_{1 \leq ℓ \leq M} M^{ℓ} (\tilde{x}) d P_{\tilde{X}} (\tilde{x}),

where

P_{\tilde{X}}

denotes the distribution of

\tilde{X} = (X_{1}, \dots, X_{k})

, and for each

j \in \{1, \dots, M\}

,

M^{j} (\tilde{x}) = P (φ (\tilde{Y}) = j ∣ \tilde{X} = \tilde{x}), \tilde{x} \in X^{k},

is the posterior probability of class j given the covariates. The inequality becomes an equality for the Bayes rule

g_{0} (\tilde{x}) = \arg max_{1 \leq j \leq M} M^{j} (\tilde{x}), \tilde{x} \in X^{k},

where ties are broken arbitrarily (e.g., by selecting the smallest index). The associated minimal probability of error, or Bayes risk, is

L^{*} = 1 - P (g_{0} (\tilde{X}) = φ (\tilde{Y})) = 1 - E_{\tilde{X}} [max_{1 \leq j \leq M} M^{j} (\tilde{X})] .

Under the MAR assumption (2.4) and the positivity condition

{inf}_{x \in X} p (x) \geq c > 0

, each posterior probability

M^{j}

can be consistently estimated using the complete-case conditional U-statistic methodology. For

1 \leq j \leq M

and

ℓ \in {1, 2, 3}

, define

\begin{matrix} M_{n, ℓ}^{j, (miss)} (\tilde{x}) = \frac{\sum_{(i_{1}, \dots, i_{k}) \in I (k, n)} 1_{{φ (Y_{i_{1}}, \dots, Y_{i_{k}}) = j}} (\prod_{r = 1}^{k} δ_{i_{r}}) \prod_{r = 1}^{k} K_{Λ_{n, ℓ} (x_{r})} (X_{i_{r}})}{\sum_{(i_{1}, \dots, i_{k}) \in I (k, n)} (\prod_{r = 1}^{k} δ_{i_{r}}) \prod_{r = 1}^{k} K_{Λ_{n, ℓ} (x_{r})} (X_{i_{r}})}, \end{matrix}

(6.1)

with the convention that the ratio is taken to be

1 / M

(or any arbitrary value) when the denominator vanishes—an event that occurs with probability tending to zero under the positivity condition as

n \to \infty

. The product

\prod_{r = 1}^{k} δ_{i_{r}}

ensures that only k-tuples for which all responses are observed contribute to the estimation, which is the essence of the complete-case approach. Define the estimated discrimination rule

g_{0, n, ℓ}^{(miss)} (\tilde{x}) = \arg max_{1 \leq j \leq M} M_{n, ℓ}^{j, (miss)} (\tilde{x}), \tilde{x} \in X^{k},

and the associated empirical Bayes risk

L_{n, ℓ}^{*, (miss)} = P (g_{0, n, ℓ}^{(miss)} (\tilde{X}) \neq φ (\tilde{Y})) .
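The ratio in (6.1) and the resulting plug-in rule can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation used in the paper: the function names `local_weight`, `posterior_cc` and `plug_in_rule` are hypothetical, and the Gaussian bump is an assumed stand-in for the support-adapted kernel, chosen only to keep the example dependency-free.

```python
import math
from itertools import permutations


def local_weight(u, x, b=0.1):
    """Hypothetical stand-in for the localizing kernel in (6.1).

    A Gaussian bump is used for simplicity; it is NOT one of the
    asymmetric kernels studied in the paper.
    """
    return math.exp(-0.5 * ((u - x) / b) ** 2)


def posterior_cc(x_tilde, X, Y, delta, phi, n_classes, kern=local_weight):
    """Complete-case estimate of the class posteriors at x_tilde, as in (6.1)."""
    k = len(x_tilde)
    num = [0.0] * n_classes
    den = 0.0
    for idx in permutations(range(len(X)), k):  # ordered k-tuples, distinct indices
        if not all(delta[i] for i in idx):      # product of indicators is zero
            continue
        w = 1.0
        for r, i in enumerate(idx):
            w *= kern(X[i], x_tilde[r])
        label = phi(*(Y[i] for i in idx))       # class label in {1, ..., M}
        num[label - 1] += w
        den += w
    if den == 0.0:
        return [1.0 / n_classes] * n_classes    # convention when the denominator vanishes
    return [v / den for v in num]


def plug_in_rule(post):
    """Arg max over classes; ties are broken by the smallest index."""
    return 1 + max(range(len(post)), key=lambda j: (post[j], -j))
```

Responses of units with a zero indicator are never read, so they may be stored as placeholders. For example, `plug_in_rule(posterior_cc((0.2, 0.3), X, Y, delta, phi, 2))` returns the estimated class at the pair of covariate values (0.2, 0.3) for a two-class kernel `phi`.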

Theorem 16

(Consistency of the empirical Bayes rule under MAR). Under the MAR assumption (2.4), the positivity condition

{inf}_{x \in X} p (x) \geq c > 0

, and the assumptions of Corollary 1 (adapted to the missing data setting), we have

lim_{n \to \infty} L_{n, ℓ}^{*, (miss)} = L^{*} .

Proof.

For any

\tilde{x} \in X^{k}

, observe that

|max_{1 \leq j \leq M} M^{j} (\tilde{x}) - max_{1 \leq j \leq M} M_{n, ℓ}^{j, (miss)} (\tilde{x})| \leq max_{1 \leq j \leq M} |M^{j} (\tilde{x}) - M_{n, ℓ}^{j, (miss)} (\tilde{x})| .

Therefore, using the identity

P (g_{0} (\tilde{X}) \neq φ (\tilde{Y})) = 1 - E [{max}_{j} M^{j} (\tilde{X})]

for the Bayes rule and the analogous representation for the empirical rule, we obtain

\begin{matrix} |L^{*} - L_{n, ℓ}^{*, (miss)}| & \leq 2 E_{\tilde{X}} [max_{1 \leq j \leq M} |M^{j} (\tilde{X}) - M_{n, ℓ}^{j, (miss)} (\tilde{X})|] \end{matrix}

(6.2)

\begin{matrix} \leq 2 E_{\tilde{X}} [\sum_{j = 1}^{M} |M^{j} (\tilde{X}) - M_{n, ℓ}^{j, (miss)} (\tilde{X})|] \end{matrix}

(6.3)

\begin{matrix} = 2 \sum_{j = 1}^{M} E_{\tilde{X}} [|M^{j} (\tilde{X}) - M_{n, ℓ}^{j, (miss)} (\tilde{X})|] . \end{matrix}

(6.4)

The factor 2 arises from the triangle inequality and the fact that the probability of misclassification is bounded above by twice the total variation distance between the estimated and true conditional probability vectors (see Lemma 2.1 of [173]). By Corollary 1 applied to the indicator kernels

1_{{φ (\cdot) = j}}

, each term

E_{\tilde{X}} [| M^{j} (\tilde{X}) - M_{n, ℓ}^{j, (miss)} (\tilde{X}) |]

converges to zero as

n \to \infty

under the stated assumptions. The conclusion follows by the dominated convergence theorem, noting that the integrand is uniformly bounded by 2. □

6.2. Generalized U-Statistics with Missing Data

The extension of the complete-case conditional U-statistic methodology to the setting of multiple independent samples is conceptually straightforward but requires careful bookkeeping of missingness indicators across samples. Consider

\tilde{ℓ} \in N^{*}

independent collections of i.i.d. observations

\{(X_{1}^{(1)}, Y_{1}^{(1)}, δ_{1}^{(1)}), (X_{2}^{(1)}, Y_{2}^{(1)}, δ_{2}^{(1)}), \dots\}, \dots, \{(X_{1}^{(\tilde{ℓ})}, Y_{1}^{(\tilde{ℓ})}, δ_{1}^{(\tilde{ℓ})}), (X_{2}^{(\tilde{ℓ})}, Y_{2}^{(\tilde{ℓ})}, δ_{2}^{(\tilde{ℓ})}), \dots\},

where for each

j \in {1, \dots, \tilde{ℓ}}

, the missingness indicators

δ_{i}^{(j)}

satisfy the MAR assumption (2.4) with (possibly sample-specific) propensity score

p_{j} (\cdot) = P (δ_{1}^{(j)} = 1 ∣ X_{1}^{(j)})

, and the positivity condition

{inf}_{x \in X} p_{j} (x) \geq c_{j} > 0

holds uniformly across samples. The collections are assumed to be mutually independent, and the missingness mechanisms are independent across samples.

Let

k_{1}, \dots, k_{\tilde{ℓ}} \in N^{*}

be fixed integers, and let

φ : \prod_{j = 1}^{\tilde{ℓ}} Y^{k_{j}} ⟶ R

be a measurable function that is symmetric within each block of arguments (i.e., for each j,

φ

is invariant under permutations of the

k_{j}

arguments from the j-th sample). For

t = (t_{1}, \dots, t_{\tilde{ℓ}}) \in \prod_{j = 1}^{\tilde{ℓ}} X^{k_{j}}

, define the conditional expectation

\begin{matrix} r^{(k, \tilde{ℓ})} (φ, t) & : = E [φ (Y_{1}^{(1)}, \dots, Y_{k_{1}}^{(1)}; \dots; Y_{1}^{(\tilde{ℓ})}, \dots, Y_{k_{\tilde{ℓ}}}^{(\tilde{ℓ})}) \end{matrix}

(6.5)

\begin{matrix} | (X_{1}^{(j)}, \dots, X_{k_{j}}^{(j)}) = t_{j}, j = 1, \dots, \tilde{ℓ}], \end{matrix}

(6.6)

whenever the expectation exists. Corresponding to the kernel

φ

and assuming

n_{j} \geq k_{j}

for all

j = 1, \dots, \tilde{ℓ}

, the complete-case conditional U-statistic for estimating

r^{(k, \tilde{ℓ})} (φ, t)

under MAR is defined, for

ℓ \in {1, 2, 3}

, by

\begin{matrix} {\hat{r}}_{n, ℓ}^{(k, \tilde{ℓ}), (miss)} (φ, t) \\ : = \frac{\sum_{i \in I} φ (Y_{i_{11}}^{(1)}, \dots, Y_{i_{1 k_{1}}}^{(1)}; \dots; Y_{i_{\tilde{ℓ} 1}}^{(\tilde{ℓ})}, \dots, Y_{i_{\tilde{ℓ} k_{\tilde{ℓ}}}}^{(\tilde{ℓ})}) (\prod_{j = 1}^{\tilde{ℓ}} \prod_{r = 1}^{k_{j}} δ_{i_{j r}}^{(j)}) K_{ℓ} ({\{X_{i_{j r}}^{(j)}\}}_{j = 1, \dots, \tilde{ℓ}; r = 1, \dots, k_{j}})}{\sum_{i \in I} (\prod_{j = 1}^{\tilde{ℓ}} \prod_{r = 1}^{k_{j}} δ_{i_{j r}}^{(j)}) K_{ℓ} ({\{X_{i_{j r}}^{(j)}\}}_{j = 1, \dots, \tilde{ℓ}; r = 1, \dots, k_{j}})}, \end{matrix}

(6.7)

where

\begin{matrix} K_{ℓ} ({\{X_{i_{j r}}^{(j)}\}}_{j = 1, \dots, \tilde{ℓ}; r = 1, \dots, k_{j}}) : = \prod_{j = 1}^{\tilde{ℓ}} \prod_{r = 1}^{k_{j}} K_{Λ_{n, ℓ} (t_{j r})} (X_{i_{j r}}^{(j)}), \end{matrix}

(6.8)

with

t_{j} = (t_{j 1}, \dots, t_{j k_{j}}) \in X^{k_{j}}

, and the summation index set is

I : = \prod_{j = 1}^{\tilde{ℓ}} I (k_{j}, n_{j}),

where

I (k_{j}, n_{j}) : = {(i_{j 1}, \dots, i_{j k_{j}}) \in {1, \dots, n_{j}}^{k_{j}} : i_{j 1}, \dots, i_{j k_{j}} distinct}

. The extension of the treatment of one-sample U-statistics in [2] to the

\tilde{ℓ}

-sample case is due to [174,175]. Under the MAR assumption and the positivity condition, the uniform consistency results established in Corollary 1 (or Corollary 4 or 6) extend directly to this multisample setting. Specifically, we have

|{\hat{r}}_{n, ℓ}^{(k, \tilde{ℓ}), (miss)} (φ, t) - r^{(k, \tilde{ℓ})} (φ, t)| ⟶ 0, \quad \text{a.s.},

(6.9)

as

min (n_{1}, \dots, n_{\tilde{ℓ}}) \to \infty

, provided the bandwidth parameters satisfy appropriate conditions and the propensity scores are uniformly bounded away from zero.
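As an illustration of the bookkeeping in (6.7), the following sketch treats the simplest multisample case, two samples with one argument each. It is a hedged example under stated assumptions: `two_sample_cc` and `local_weight` are hypothetical names, and the Gaussian weight is an assumed placeholder for the localizing kernel in (6.8).

```python
import math


def local_weight(u, t, b=0.1):
    """Hypothetical stand-in for the localizing kernel in (6.8)."""
    return math.exp(-0.5 * ((u - t) / b) ** 2)


def two_sample_cc(t1, t2, sample1, sample2, phi, kern=local_weight):
    """Estimator (6.7) with two samples and k_1 = k_2 = 1.

    Each sample is a list of (x, y, delta) triples; y is ignored
    whenever delta is 0.
    """
    num = 0.0
    den = 0.0
    for x1, y1, d1 in sample1:
        if not d1:
            continue
        w1 = kern(x1, t1)
        for x2, y2, d2 in sample2:
            if not d2:
                continue
            w = w1 * kern(x2, t2)   # product kernel across samples
            num += phi(y1, y2) * w
            den += w
    return num / den if den > 0.0 else float("nan")
```

With `phi = lambda y1, y2: 1.0 if y1 <= y2 else 0.0` the estimator targets a conditional Mann-Whitney-type probability; returning NaN for a vanishing denominator is a choice made for this sketch only.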

6.3. Kendall Rank Correlation Coefficient Under Conditional Independence Testing with Missing Responses

To test the independence of two one-dimensional random variables

Y_{1}

and

Y_{2}

, Kendall [176] proposed a nonparametric procedure based on the U-statistic

K_{n}

with kernel

φ ((s_{1}, t_{1}), (s_{2}, t_{2})) = 1_{{(s_{2} - s_{1}) (t_{2} - t_{1}) > 0}} - 1_{{(s_{2} - s_{1}) (t_{2} - t_{1}) \leq 0}} .

(6.10)

The rejection region for testing independence is of the form

{\sqrt{n} K_{n} > γ}

, where

γ

is a critical value determined by the asymptotic distribution of

K_{n}

under the null hypothesis. We now extend this framework to test conditional independence in a multivariate setting with potentially missing responses. Let

ξ \in R^{d_{1}}

and

η \in R^{d_{2}}

be random vectors with

d_{1} + d_{2} = d

, and set

Y = (ξ, η)

. Suppose we observe n i.i.d. copies

{(X_{i}, Y_{i}, δ_{i})}_{i = 1}^{n}

satisfying the MAR assumption (2.4) with propensity score

p (\cdot)

and positivity condition

{inf}_{x \in X} p (x) \geq c > 0

. We are interested in testing the conditional independence hypothesis

H_{0} : ξ \perp\!\!\!\perp η ∣ X \quad versus \quad H_{a} : H_{0} is false .

(6.11)

For

ℓ \in {1, 2, 3}

and

t = (t_{1}, t_{2}) \in X^{2}

, define the complete-case conditional Kendall’s tau estimator

{\hat{τ}}_{n, ℓ}^{(miss)} (t) : = \frac{\sum_{i \neq j}^{n} φ (Y_{i}, Y_{j}) δ_{i} δ_{j} K_{Λ_{n, ℓ} (t_{1})} (X_{i}) K_{Λ_{n, ℓ} (t_{2})} (X_{j})}{\sum_{i \neq j}^{n} δ_{i} δ_{j} K_{Λ_{n, ℓ} (t_{1})} (X_{i}) K_{Λ_{n, ℓ} (t_{2})} (X_{j})},

(6.12)

where

φ

is Kendall’s kernel (6.10), applied to ξ and η directly when both are scalar and otherwise to the one-dimensional projections defined below. The product

δ_{i} δ_{j}

ensures that only pairs with both responses fully observed contribute to the estimator. To handle multivariate responses, we employ a projection approach. For any vector

a = (a_{1}, a_{2}) \in R^{d_{1}} \times R^{d_{2}}

with

∥ a ∥ = 1

, define the projected kernel

φ^{a} ((ξ^{(1)}, η^{(1)}), (ξ^{(2)}, η^{(2)})) : = φ ((a_{1}^{⊤} ξ^{(1)}, a_{2}^{⊤} η^{(1)}), (a_{1}^{⊤} ξ^{(2)}, a_{2}^{⊤} η^{(2)})) .

Let

F^{a_{1}}

and

G^{a_{2}}

denote the distribution functions of

a_{1}^{⊤} ξ

and

a_{2}^{⊤} η

, respectively, assumed continuous for all unit vectors

a

. Under the null hypothesis

H_{0}

of conditional independence, we have

E [φ^{a} (Y_{1}, Y_{2}) ∣ X_{1} = t_{1}, X_{2} = t_{2}] = 0

for almost every

(t_{1}, t_{2})

and all

a

. Consequently, the conditional Kendall’s tau is zero, and the estimator

{\hat{τ}}_{n, ℓ}^{(miss)} (t)

should be close to zero for large n.
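The estimator (6.12) can be sketched as follows for scalar ξ and η (or for projections computed beforehand). The names `tau_cc`, `kendall_kernel` and `local_weight` are hypothetical, and the Gaussian weight is an assumed placeholder for the support-adapted kernel.

```python
import math


def kendall_kernel(s1, t1, s2, t2):
    """Kernel (6.10): +1 for a strictly concordant pair, -1 otherwise."""
    return 1.0 if (s2 - s1) * (t2 - t1) > 0 else -1.0


def local_weight(u, x, b=0.1):
    """Hypothetical stand-in for the localizing kernel in (6.12)."""
    return math.exp(-0.5 * ((u - x) / b) ** 2)


def tau_cc(t, X, xi, eta, delta, kern=local_weight):
    """Complete-case conditional Kendall's tau (6.12) at t = (t1, t2)."""
    t1, t2 = t
    n = len(X)
    num = 0.0
    den = 0.0
    for i in range(n):
        if not delta[i]:
            continue
        for j in range(n):
            if j == i or not delta[j]:   # i != j, both responses observed
                continue
            w = kern(X[i], t1) * kern(X[j], t2)
            num += kendall_kernel(xi[i], eta[i], xi[j], eta[j]) * w
            den += w
    return num / den if den > 0.0 else float("nan")
```

On data in which every observed pair is concordant the function returns 1, and on fully discordant data it returns -1, which gives a quick sanity check before applying it to simulated or real samples.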

Theorem 17

(Consistency of conditional Kendall’s tau under MAR). Under the MAR assumption (2.4), the positivity condition

{inf}_{x \in X} p (x) \geq c > 0

, and the assumptions of Corollary 1 (adapted to the missing data setting), we have, for any fixed

t \in X^{2}

and any unit vector

a

,

|{\hat{τ}}_{n, ℓ}^{(miss)} (t) - τ_{cond} (t)| ⟶ 0 \quad \text{a.s.},

where

τ_{cond} (t) = E [φ (Y_{1}, Y_{2}) ∣ X_{1} = t_{1}, X_{2} = t_{2}]

is the true conditional Kendall’s tau. In particular, under

H_{0}

,

τ_{cond} (t) = 0

almost everywhere, and thus

{\hat{τ}}_{n, ℓ}^{(miss)} (t) \to 0

almost surely.

Proof.

The result follows directly from Corollary 1 applied to the kernel

φ^{a}

, noting that the missingness indicators

δ_{i} δ_{j}

are correctly accounted for in the complete-case estimator (6.12). The almost sure convergence is uniform over compact subsets of

X^{2}

under the additional smoothness conditions of Section 3.2. □

The asymptotic distribution of

{\hat{τ}}_{n, ℓ}^{(miss)} (t)

under the null hypothesis can be derived using the central limit theorem established in Section 3.3. Specifically, under

H_{0}

and appropriate bandwidth conditions, we have

\sqrt{n {\overset{˘}{b}}^{d / 2}} ({\hat{τ}}_{n, ℓ}^{(miss)} (t) - 0) \overset{D}{⟶} N (0, σ_{miss}^{2} (t)),

where

σ_{miss}^{2} (t)

is given by (3.19) with

ρ^{2}

replaced by the appropriate variance expression for Kendall’s kernel, incorporating the propensity score inflation factor

1 / p (x_{i})

. This result provides the theoretical foundation for constructing asymptotic level-

α

tests of conditional independence in the presence of missing responses under MAR.

7. Examples

The flexibility of the proposed conditional U-statistic framework is illustrated through several concrete examples. In general, any kernel function

h : Y^{m} \to R

that has proven useful in the unconditional U-statistic literature (see, e.g., [177]) can be adapted to the conditional setting via the methodology developed in the preceding sections. Recall that the case

m = 1

yields the Nadaraya–Watson estimator when the kernel is chosen as the identity function

φ (y) = y

. Furthermore, setting

φ (y) = 1_{(- \infty, t]} (y)

for a fixed threshold t produces a consistent estimator of the conditional distribution function

P (Y \leq t ∣ X = x)

, i.e., the conditional empirical distribution function evaluated at t. This observation underscores the generality of the proposed approach, as it encompasses both regression and distributional estimation within a unified framework. We now examine several nontrivial examples for the case

m = 2

, which highlight the ability of conditional U-statistics to capture second-order conditional structure such as conditional variance and conditional covariance.

Example 1

(Conditional variance estimation). Consider the kernel function

h : R^{2} \to R

defined by

h (y_{1}, y_{2}) = \frac{1}{2} {(y_{1} - y_{2})}^{2} .

This kernel is symmetric and unbiased for the variance in the unconditional setting. In the conditional framework, for

\tilde{x} = (x_{1}, x_{1})

(i.e., both covariate arguments coincide), the corresponding conditional U-statistic target becomes

r^{(2)} (h, x_{1}, x_{1}) = E [\frac{1}{2} {(Y_{1} - Y_{2})}^{2} ∣ X_{1} = x_{1}, X_{2} = x_{2}] |_{x_{2} = x_{1}} .

Under the conditional i.i.d. assumption given the covariates, the right-hand side simplifies to the conditional variance of Y given

X = x_{1}

. Indeed, using the identity

E [{(Y_{1} - Y_{2})}^{2} ∣ X_{1} = X_{2} = x] = 2 Var (Y ∣ X = x)

, we obtain

r^{(2)} (h, x_{1}, x_{1}) = Var (Y ∣ X = x_{1}) .
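The identity used above can be checked in one line, assuming (as in the text) that the two responses are conditionally independent with a common conditional law given X = x, and assuming in addition finite conditional second moments. Expanding the square and factorizing the cross term by conditional independence gives

\begin{matrix} E [{(Y_{1} - Y_{2})}^{2} ∣ X_{1} = X_{2} = x] & = E [Y_{1}^{2} ∣ X = x] - 2 E [Y_{1} ∣ X = x] E [Y_{2} ∣ X = x] + E [Y_{2}^{2} ∣ X = x] \\ & = 2 \{E [Y^{2} ∣ X = x] - {(E [Y ∣ X = x])}^{2}\} = 2 Var (Y ∣ X = x) . \end{matrix}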

Thus, the estimator

{\hat{r}}_{n, ℓ}^{(2)} (h, \tilde{x}; {\bar{Λ}}_{n, ℓ} (\tilde{x}))

with

\tilde{x} = (x_{1}, x_{1})

provides a consistent estimator of the conditional variance function. For this kernel, under the MAR assumption (2.4) and the positivity condition, the asymptotic variance appearing in the central limit theorem (Theorem 5) takes the form

\begin{matrix} ρ^{2} & = \{E [{(Y - Y_{2})}^{2} {(Y - Y_{3})}^{2} ∣ X = X_{2} = X_{3} = x_{1}] - 4 {[r^{(2)} (h, x_{1}, x_{1})]}^{2}\} \frac{1}{p (x_{1})} \frac{\int K_{α, β}^{2} (u) d u}{f (x_{1})}, \end{matrix}

(7.1)

where the factor

1 / p (x_{1})

reflects the variance inflation due to missing responses (see (3.19)). This expression is analogous to the unconditional variance component

ζ_{1}

presented on page 182 of [177], with the crucial modification that the expectation is taken conditionally on the covariates and the kernel is replaced by its conditional counterpart.

Example 2

(Conditional covariance estimation). Let

Y_{i} = {(Y_{i 1}, Y_{i 2})}^{⊤} \in R^{2}

be bivariate response vectors. Define the kernel

h : R^{2} \times R^{2} \to R

by

h (y_{1}, y_{2}) = \frac{1}{2} (y_{11} - y_{21}) (y_{12} - y_{22}) = \frac{1}{2} (y_{11} y_{12} + y_{21} y_{22} - y_{11} y_{22} - y_{12} y_{21}) .

This kernel is symmetric in its two arguments.

For

\tilde{x} = (x_{1}, x_{2})

, the corresponding conditional U-statistic target is

r^{(2)} (h, x_{1}, x_{2}) = E [\frac{1}{2} (Y_{11} - Y_{21}) (Y_{12} - Y_{22}) | X_{1} = x_{1}, X_{2} = x_{2}] .

When the covariate arguments coincide, that is, when

x_{1} = x_{2} = x

, we obtain

r^{(2)} (h, x, x) = \frac{1}{2} E [(Y_{11} - Y_{21}) (Y_{12} - Y_{22}) | X_{1} = x, X_{2} = x] .

Since

(Y_{11}, Y_{12})

and

(Y_{21}, Y_{22})

are conditionally i.i.d. given

X = x

, we have

\begin{matrix} E (Y_{11} Y_{12} ∣ X_{1} = x, X_{2} = x) & = E (Y_{11} Y_{12} ∣ X = x), \end{matrix}

(7.2)

\begin{matrix} E (Y_{21} Y_{22} ∣ X_{1} = x, X_{2} = x) & = E (Y_{21} Y_{22} ∣ X = x), \end{matrix}

(7.3)

\begin{matrix} E (Y_{11} Y_{22} ∣ X_{1} = x, X_{2} = x) & = E (Y_{11} ∣ X = x) E (Y_{22} ∣ X = x), \end{matrix}

(7.4)

\begin{matrix} E (Y_{12} Y_{21} ∣ X_{1} = x, X_{2} = x) & = E (Y_{12} ∣ X = x) E (Y_{21} ∣ X = x) . \end{matrix}

(7.5)

Therefore,

\begin{matrix} r^{(2)} (h, x, x) & = \frac{1}{2} [E (Y_{11} Y_{12} ∣ X = x) + E (Y_{21} Y_{22} ∣ X = x) - E (Y_{11} ∣ X = x) E (Y_{22} ∣ X = x) - E (Y_{12} ∣ X = x) E (Y_{21} ∣ X = x)] . \end{matrix}

(7.6)

Since

(Y_{11}, Y_{12})

and

(Y_{21}, Y_{22})

have the same conditional distribution given

X = x

, it follows that

E (Y_{11} Y_{12} ∣ X = x) = E (Y_{21} Y_{22} ∣ X = x)

and

E (Y_{11} ∣ X = x) = E (Y_{21} ∣ X = x), E (Y_{12} ∣ X = x) = E (Y_{22} ∣ X = x) .

Hence

r^{(2)} (h, x, x) = E (Y_{11} Y_{12} ∣ X = x) - E (Y_{11} ∣ X = x) E (Y_{12} ∣ X = x),

which is precisely the conditional covariance

Cov (Y_{11}, Y_{12} ∣ X = x) .

Therefore, the proposed conditional U-statistic with kernel h provides a consistent estimator of the conditional covariance function under the MAR assumption, provided the positivity condition holds. This example illustrates that the methodology naturally accommodates second-order conditional functionals beyond conditional means.
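Both examples can be sketched with one generic order-two routine. The code below is an illustrative sketch, not the paper's implementation: `cond_u2_cc`, `h_var`, `h_cov` and `local_weight` are hypothetical names, and the Gaussian weight is an assumed placeholder for the asymmetric kernel.

```python
import math


def local_weight(u, x, b=0.1):
    """Hypothetical stand-in for the localizing kernel."""
    return math.exp(-0.5 * ((u - x) / b) ** 2)


def h_var(y1, y2):
    """Kernel of Example 1 (scalar responses)."""
    return 0.5 * (y1 - y2) ** 2


def h_cov(y1, y2):
    """Kernel of Example 2; y1 and y2 are pairs (first, second component)."""
    return 0.5 * (y1[0] - y2[0]) * (y1[1] - y2[1])


def cond_u2_cc(h, x, X, Y, delta, kern=local_weight):
    """Order-two complete-case conditional U-statistic evaluated at (x, x)."""
    n = len(X)
    num = 0.0
    den = 0.0
    for i in range(n):
        if not delta[i]:
            continue
        for j in range(n):
            if j == i or not delta[j]:
                continue
            w = kern(X[i], x) * kern(X[j], x)
            num += h(Y[i], Y[j]) * w
            den += w
    return num / den if den > 0.0 else float("nan")
```

A useful sanity check: with a constant weight and no missing responses, `cond_u2_cc(h_var, ...)` reduces to the usual unbiased sample variance, and `cond_u2_cc(h_cov, ...)` to the unbiased sample covariance.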

Remark 18.

Both examples illustrate that the conditional U-statistic framework naturally accommodates kernels that are not necessarily products of functions of individual observations, thereby enabling the estimation of complex conditional functionals such as conditional variance and conditional covariance. The extension to missing responses under MAR is seamlessly integrated through the inclusion of the product of missingness indicators

\prod_{j = 1}^{m} δ_{i_{j}}

in the estimator definition, as shown in (2.5). The asymptotic properties derived in Section 3.2 and Section 3.3 guarantee the consistency and asymptotic normality of these estimators under the same regularity conditions, with the asymptotic variance inflated by the factor

1 / p (x_{i})

to account for the reduced effective sample size due to missingness.

Remark 19.

In the unconditional setting (i.e., without covariates), the U-statistic

U_{n} (h) = {(\binom{n}{2})}^{- 1} \sum_{i < j} h (Y_{i}, Y_{j})

provides an unbiased estimator of

E [h (Y_{1}, Y_{2})]

. The conditional U-statistic proposed in this work generalizes this concept by allowing the target parameter to depend on covariates

X_{1}, \dots, X_{m}

, while also accommodating missing responses through the MAR mechanism. The price paid for this added flexibility is the introduction of a bandwidth parameter

\overset{˘}{b}

and the resulting bias-variance trade-off, as well as the variance inflation factor

1 / p (x_{i})

due to missingness. In the limiting case where the covariate space reduces to a single point (i.e., no conditioning), our estimator recovers the standard unconditional U-statistic, provided the bandwidth is chosen appropriately (e.g.,

\overset{˘}{b} \to \infty

in an appropriate sense).

8. Bandwidth Selection Under Missing Responses

In the presence of missing responses, the bandwidth-selection rule must be adapted so that the validation criterion only involves observable response tuples. In the present MAR framework, a natural strategy is therefore to employ a complete-case leave-tuple-out cross-validation criterion, obtained by restricting the empirical risk to those m-tuples for which all responses are observed and by removing from the training sample every tuple that shares at least one observation with the validation tuple. To simplify notation, for any

i = (i_{1}, \dots, i_{m}) \in I (m, n)

, define

Δ_{i} : = \prod_{r = 1}^{m} δ_{i_{r}},

so that

Δ_{i} = 1

if and only if all responses in the tuple

{\tilde{Y}}_{i} = (Y_{i_{1}}, \dots, Y_{i_{m}})

are observed, and

Δ_{i} = 0

otherwise. For each fixed

i = (i_{1}, \dots, i_{m}) \in I (m, n)

, define

I_{- i} (m, n) : = \{(j_{1}, \dots, j_{m}) \in I (m, n) : \{j_{1}, \dots, j_{m}\} \cap \{i_{1}, \dots, i_{m}\} = \emptyset\} .

Thus,

I_{- i} (m, n)

consists of all m-tuples that do not use any observation appearing in the validation tuple

i

. Throughout this section, the symbol h denotes generically the smoothing parameter associated with the estimator under consideration; depending on the context, h may stand for a bandwidth, a vector of bandwidths, or the Bernstein order parameter. For any fixed

i = (i_{1}, \dots, i_{m}) \in I (m, n)

and any

ℓ \in {1, 2, 3}

, we define the leave-tuple-out complete-case estimator by

{\hat{r}}_{n, ℓ, i}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, ℓ} (\tilde{x})) : = \frac{\sum_{j \in I_{- i} (m, n)} φ ({\tilde{Y}}_{j}) (\prod_{r = 1}^{m} δ_{j_{r}}) K_{Λ_{n, ℓ} (x_{1})} (X_{j_{1}}) \dots K_{Λ_{n, ℓ} (x_{m})} (X_{j_{m}})}{\sum_{j \in I_{- i} (m, n)} (\prod_{r = 1}^{m} δ_{j_{r}}) K_{Λ_{n, ℓ} (x_{1})} (X_{j_{1}}) \dots K_{Λ_{n, ℓ} (x_{m})} (X_{j_{m}})} .

(8.1)

Evaluating this estimator at

\tilde{x} = {\tilde{X}}_{i}

yields a predictor of

φ ({\tilde{Y}}_{i})

based exclusively on complete tuples that are disjoint from the validation tuple.

To estimate the quadratic prediction risk, let

W (\cdot)

be a known nonnegative weight function and define, as before,

\tilde{W} (\tilde{x}) : = \prod_{r = 1}^{m} W (x_{r}), \tilde{x} = (x_{1}, \dots, x_{m}) \in X^{m} .

The complete-case cross-validation criterion is then given, for

ℓ \in {1, 2, 3}

, by

C V_{ℓ}^{(miss)} (φ, h) : = \frac{(n - m)!}{n!} \sum_{i \in I (m, n)} Δ_{i} {(φ ({\tilde{Y}}_{i}) - {\hat{r}}_{n, ℓ, i}^{(m), (miss)} (φ, {\tilde{X}}_{i}; h))}^{2} \tilde{W} ({\tilde{X}}_{i}) .

(8.2)

The factor

Δ_{i}

ensures that only fully observed tuples contribute to the criterion. This is essential, since

φ ({\tilde{Y}}_{i})

is not available whenever at least one component of the response tuple is missing. A natural data-driven choice of the smoothing parameter is therefore

{\hat{h}}_{n, ℓ}^{(miss)} \in \arg min_{h \in H_{n}} C V_{ℓ}^{(miss)} (φ, h),

where

H_{n}

denotes a prescribed set of admissible smoothing parameters. Since the number of complete tuples may vary substantially with the missingness rate, one may also consider the normalized version

{\tilde{C V}}_{ℓ}^{(miss)} (φ, h) : = {(\sum_{i \in I (m, n)} Δ_{i})}^{- 1} \sum_{i \in I (m, n)} Δ_{i} {(φ ({\tilde{Y}}_{i}) - {\hat{r}}_{n, ℓ, i}^{(m), (miss)} (φ, {\tilde{X}}_{i}; h))}^{2} \tilde{W} ({\tilde{X}}_{i}),

(8.3)

whenever

\sum_{i \in I (m, n)} Δ_{i} > 0

. Since the normalizing factor does not depend on h, both criteria lead to the same minimizer. In some applications, a local version of the criterion, targeted at a fixed point

\tilde{x} \in X^{m}

, may be preferable. To this end, define

C V_{ℓ, \tilde{x}}^{(miss)} (φ, h) : = \frac{(n - m)!}{n!} \sum_{i \in I (m, n)} Δ_{i} {(φ ({\tilde{Y}}_{i}) - {\hat{r}}_{n, ℓ, i}^{(m), (miss)} (φ, {\tilde{X}}_{i}; h))}^{2} \hat{W} ({\tilde{X}}_{i}, \tilde{x}),

(8.4)

where

\hat{W} (\tilde{s}, \tilde{x}) : = \prod_{r = 1}^{m} \hat{W} (s_{r}, x_{r}) .

In practice, one often considers either the global choice

\tilde{W} ({\tilde{X}}_{i}) = 1, i \in I (m, n),

or the local weights

\hat{W} (s, x) = \begin{cases} 1, & \text{if } ∥ s - x ∥ \leq h, \\ 0, & \text{otherwise} . \end{cases}

Accordingly,

\hat{W} ({\tilde{X}}_{i}, \tilde{x}) = \prod_{r = 1}^{m} 1_{{∥ X_{i_{r}} - x_{r} ∥ \leq h}} .

Remark 20.

The criterion (8.2) is the natural complete-case analog of leave-one-out cross-validation for conditional U-statistics under MAR. In the present higher-order setting, removing only the tuple

i

itself is not sufficient, because tuples sharing one or more observations with

i

would still introduce leakage between training and validation. The definition of

I_{- i} (m, n)

avoids this issue by excluding every tuple that intersects the validation tuple.
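The leave-tuple-out construction can be sketched for m = 2 and the global weight equal to one. This is a brute-force illustration for small n, under stated assumptions: the names `loo_tuple_predict`, `cv_miss` and `select_h` are hypothetical, the Gaussian weight is an assumed placeholder for the kernel, and skipping validation tuples whose training denominator vanishes is a convention adopted here only for the sketch.

```python
import math
from itertools import permutations


def local_weight(u, x, h):
    """Hypothetical stand-in for the localizing kernel with smoothing parameter h."""
    return math.exp(-0.5 * ((u - x) / h) ** 2)


def loo_tuple_predict(i, X, Y, delta, phi, h):
    """Estimator (8.1) for m = 2, evaluated at the covariates of tuple i."""
    excluded = set(i)
    num = 0.0
    den = 0.0
    for j in permutations(range(len(X)), 2):
        if excluded & set(j):                    # tuple shares an observation with i
            continue
        if not (delta[j[0]] and delta[j[1]]):    # incomplete training tuple
            continue
        w = local_weight(X[j[0]], X[i[0]], h) * local_weight(X[j[1]], X[i[1]], h)
        num += phi(Y[j[0]], Y[j[1]]) * w
        den += w
    return num / den if den > 0.0 else None


def cv_miss(X, Y, delta, phi, h):
    """Criterion (8.2) with m = 2 and unit weight; (n - 2)!/n! = 1/(n(n - 1))."""
    n = len(X)
    total = 0.0
    for i in permutations(range(n), 2):
        if not (delta[i[0]] and delta[i[1]]):    # validation tuple not fully observed
            continue
        pred = loo_tuple_predict(i, X, Y, delta, phi, h)
        if pred is None:                         # sketch-only convention: skip
            continue
        total += (phi(Y[i[0]], Y[i[1]]) - pred) ** 2
    return total / (n * (n - 1))


def select_h(X, Y, delta, phi, grid):
    """Minimizer of the complete-case criterion over a finite grid."""
    return min(grid, key=lambda h: cv_miss(X, Y, delta, phi, h))
```

The cost is of order n^4 kernel evaluations per candidate value of h, so this form is only meant to make the index sets explicit; a practical implementation would reuse partial sums.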

9. Simulation Study

This section reports a finite-sample investigation of kernel-based estimators of the conditional Kendall coefficient when the response pair is only partially observed. The numerical design is chosen to satisfy two requirements simultaneously. First, it remains exactly faithful to the implementation used in the simulation code, so that every reported quantity has a direct computational meaning. Second, it is sufficiently structured to probe the main theoretical issues raised in Sections 3, 4, and 5.1–5.4, namely local estimation of a conditional concordance functional, smoothing on a compact support, boundary sensitivity, and the effect of incomplete responses under both MCAR and covariate-dependent MAR mechanisms. The study also compares complete-case local smoothing with inverse-probability reweighting, but this comparison must be interpreted carefully because, in the present design, the missingness mechanism depends only on the conditioning covariate.

9.1. Target Functional and Interpretation

For

u \in (0, 1)

, the target object is the conditional Kendall coefficient

τ (u) \equiv τ_{1, 2 ∣ X = u} = E [sign \{(Y_{1}^{(1)} - Y_{1}^{(2)}) (Y_{2}^{(1)} - Y_{2}^{(2)})\} | X^{(1)} = X^{(2)} = u],

where

(X^{(1)}, Y_{1}^{(1)}, Y_{2}^{(1)})

and

(X^{(2)}, Y_{1}^{(2)}, Y_{2}^{(2)})

are independent copies of

(X, Y_{1}, Y_{2})

. Since the conditioning event

{X^{(1)} = X^{(2)} = u}

has probability zero under a continuous design, the above display should be read in the usual regular-conditional sense: if

(Y_{1}^{[u]}, Y_{2}^{[u]})

and

({\tilde{Y}}_{1}^{[u]}, {\tilde{Y}}_{2}^{[u]})

are two independent draws from the conditional distribution of

(Y_{1}, Y_{2})

given

X = u

, then

τ (u) = E [sign \{(Y_{1}^{[u]} - {\tilde{Y}}_{1}^{[u]}) (Y_{2}^{[u]} - {\tilde{Y}}_{2}^{[u]})\}] .

Under continuity of the conditional law of

(Y_{1}, Y_{2}) ∣ X = u

, ties occur with probability zero, and one has the equivalent identities

τ (u) = 4 P (Y_{1}^{[u]} < {\tilde{Y}}_{1}^{[u]}, Y_{2}^{[u]} < {\tilde{Y}}_{2}^{[u]}) - 1,

and

τ (u) = 1 - 4 P (Y_{1}^{[u]} < {\tilde{Y}}_{1}^{[u]}, Y_{2}^{[u]} > {\tilde{Y}}_{2}^{[u]}) .

These equivalent population representations justify the three empirical Kendall-type estimators used below.
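For completeness, the two identities can be derived in a few steps, assuming only the continuity stated above. Write C and D (shorthand introduced here) for the events that the two draws are concordant and discordant, respectively. Since ties have probability zero, P (C) + P (D) = 1, and since the two draws are exchangeable,

P (C) = 2 P (Y_{1}^{[u]} < {\tilde{Y}}_{1}^{[u]}, Y_{2}^{[u]} < {\tilde{Y}}_{2}^{[u]}), \quad P (D) = 2 P (Y_{1}^{[u]} < {\tilde{Y}}_{1}^{[u]}, Y_{2}^{[u]} > {\tilde{Y}}_{2}^{[u]}) .

Consequently,

τ (u) = P (C) - P (D) = 2 P (C) - 1 = 1 - 2 P (D),

and substituting the two expressions for P (C) and P (D) gives the displayed formulas.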

9.2. Data-Generating Mechanisms

For each Monte Carlo replication, and for each sample size

n \in {500, 1000, 2000},

we generate i.i.d. observations

(X_{i}, Y_{1 i}, Y_{2 i}, δ_{i}), i = 1, \dots, n,

where

X_{i} \in [0, 1]

is a scalar covariate,

(Y_{1 i}, Y_{2 i})

is a continuous response pair, and

δ_{i} \in {0, 1}

indicates whether the response pair is observed. Conditionally on

X_{i} = u

, the response pair is produced through the Gaussian location–scale construction

\begin{matrix} Y_{1 i} & = m_{1} (X_{i}) + s_{1} (X_{i}) Z_{1 i}, \\ Y_{2 i} & = m_{2} (X_{i}) + s_{2} (X_{i}) (ρ (X_{i}) Z_{1 i} + \sqrt{1 - ρ {(X_{i})}^{2}} Z_{2 i}), \end{matrix}

where

Z_{1 i}, Z_{2 i} \overset{i.i.d.}{\sim} N (0, 1)

and

ρ (u) = \sin (\frac{π τ (u)}{2}) .

Consequently, conditionally on

X = u

, the pair

(Y_{1}, Y_{2})

is Gaussian with correlation

ρ (u)

, and therefore

τ (u) = \frac{2}{π} \arcsin {ρ (u)} .

This construction is particularly convenient here because it allows the conditional Kendall function to be prescribed exactly while permitting the conditional margins to vary through

m_{1}, m_{2}, s_{1}, s_{2}

. In other words, the dependence structure is controlled through the conditional copula parameter, whereas location and scale heterogeneity are introduced without changing the target value of

τ (u)

. Two scenarios are considered.

Scenario 1: linear conditional association under uniform design.

The covariate is uniformly distributed on the unit interval:

X \sim Unif (0, 1) .

The conditional Kendall function is linear,

τ (u) = 2 u - 1,

so that the conditional association ranges from negative dependence near

u = 0

to positive dependence near

u = 1

, crossing zero at

u = 1 / 2

. The conditional location and scale functions are

m_{1} (u) = u, m_{2} (u) = u, s_{1} (u) = 1, s_{2} (u) = 1 .

This first scenario is intentionally regular: the design density is flat, the target is affine, and there is no intrinsic boundary concentration in the covariate distribution. It therefore serves as the baseline regime for evaluating the procedures in a relatively favorable setting.

Scenario 2: nonlinear conditional association under asymmetric Beta design.

The covariate follows a Beta distribution concentrated near the left boundary,

X \sim Beta (2, 5) .

The conditional Kendall function is

τ (u) = \sin (π (u - 1 / 2)),

which is smooth and nonlinear on

[0, 1]

, with

τ (1 / 2) = 0

. Its nonlinearity is essential for the simulation, because it makes bias behavior more sensitive to the local smoothing rule than in Scenario 1. The conditional location and scale functions are

m_{1} (u) = 2 u, m_{2} (u) = 1 - u, s_{1} (u) = 0.5 + 0.5 u, s_{2} (u) = 1 .

This second scenario is deliberately more demanding: the design density is strongly inhomogeneous, with substantial mass near the boundary, and the target is no longer affine. It is precisely in such settings that support-adapted smoothers should be expected to reveal any practical advantage.
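The two data-generating mechanisms can be reproduced with the Python standard library alone. The sketch below is not the simulation code referred to in the text; the names `SCENARIOS`, `rho_from_tau` and `simulate`, as well as the seeding convention, are assumptions made for illustration.

```python
import math
import random


def rho_from_tau(tau):
    """Gaussian correlation matching a prescribed Kendall's tau."""
    return math.sin(math.pi * tau / 2.0)


# Scenario definitions transcribed from the text above.
SCENARIOS = {
    1: dict(draw_x=lambda rng: rng.random(),             # Unif(0, 1)
            tau=lambda u: 2.0 * u - 1.0,
            m1=lambda u: u, m2=lambda u: u,
            s1=lambda u: 1.0, s2=lambda u: 1.0),
    2: dict(draw_x=lambda rng: rng.betavariate(2.0, 5.0),  # Beta(2, 5)
            tau=lambda u: math.sin(math.pi * (u - 0.5)),
            m1=lambda u: 2.0 * u, m2=lambda u: 1.0 - u,
            s1=lambda u: 0.5 + 0.5 * u, s2=lambda u: 1.0),
}


def simulate(n, scenario, seed=0):
    """Return n triples (x, y1, y2) from the location-scale construction."""
    rng = random.Random(seed)
    sc = SCENARIOS[scenario]
    out = []
    for _ in range(n):
        x = sc["draw_x"](rng)
        rho = rho_from_tau(sc["tau"](x))
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        y1 = sc["m1"](x) + sc["s1"](x) * z1
        # max(...) guards against a tiny negative value caused by rounding
        mix = rho * z1 + math.sqrt(max(0.0, 1.0 - rho * rho)) * z2
        y2 = sc["m2"](x) + sc["s2"](x) * mix
        out.append((x, y1, y2))
    return out
```

Because the location and scale functions act on each margin separately through increasing maps, they leave the conditional Kendall coefficient unchanged, which is why only `tau` enters the dependence part of the construction.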

9.3. Missing-Response Mechanism

The variable

δ_{i}

specifies whether the pair

(Y_{1 i}, Y_{2 i})

is observed. Only units with

δ_{i} = 1

contribute to the estimator. The simulations are run at target missing proportions

π \in {0, 0.10, 0.30, 0.50}, q = 1 - π,

where q denotes the target observation rate.

Two missingness labels are implemented.

MCAR.

Under missing completely at random,

P (δ_{i} = 1 ∣ X_{i}) = q .

Thus, the observation probability is constant, and the observed sample is, conditionally on its size, a simple thinning of the original design.

Covariate-dependent MAR.

Under the MAR mechanism used in the code,

P (δ_{i} = 1 ∣ X_{i} = x) = p (x) = {logit}^{- 1} (a + b (x - 1 / 2)), b = 3 .

The intercept a is not fixed once and for all. Instead, for each Monte Carlo replication, it is calibrated numerically from the realized covariates

X_{1}, \dots, X_{n}

so that

\frac{1}{n} \sum_{i = 1}^{n} {logit}^{- 1} (a + b (X_{i} - 1 / 2)) = q .

This calibration is a subtle but important design choice. It ensures that the sample average of the selection probabilities equals the target observation proportion before Bernoulli thinning is applied. As a consequence, the realized observed rate fluctuates around its nominal target essentially because of the Bernoulli draws, not because of uncontrolled replication-to-replication drift in the average propensity score. Since the logistic link maps

R

into

(0, 1)

, the positivity condition

0 < p (x) < 1, x \in [0, 1],

holds automatically. A crucial conceptual point is that the missingness mechanism depends on X only. Since the estimand itself is conditional on

X = u

, complete-case smoothing remains targeted to the same conditional functional

τ (u)

. Therefore, in the present design, the comparison between complete-case and IPW estimators should not be framed as a generic correction of selection bias in the target parameter. Rather, it should be read as a comparison of local design reweighting strategies and of the corresponding finite-sample bias–variance trade-off.
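The per-replication calibration of the intercept described above amounts to solving a one-dimensional monotone equation. A minimal sketch using bisection follows; the function names `expit` and `calibrate_intercept`, the bracketing interval and the tolerance are assumptions, and the original code may use a different root finder.

```python
import math


def expit(z):
    """Inverse logit, written to avoid overflow for large |z|."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def calibrate_intercept(X, q, b=3.0, tol=1e-12):
    """Solve mean_i expit(a + b (X_i - 1/2)) = q for a, with 0 < q < 1.

    The left-hand side is strictly increasing in a, so bisection on a
    wide bracket (an assumed choice) converges to the unique root.
    """
    def gap(a):
        return sum(expit(a + b * (x - 0.5)) for x in X) / len(X) - q

    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The case q = 1 (no missingness) has no finite solution and should be handled separately by setting every indicator to one.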

9.4. Observed Sample, Local Weights, and Empirical Kendall Representations

Let

O_{n} = {i \in {1, \dots, n} : δ_{i} = 1}, n_{cc} = | O_{n} |,

denote the observed set and the number of complete cases. Estimation is performed on the regular grid

U = \{u_{1}, \dots, u_{97}\} = seq(0.02, 0.98, length.out = 97) .

The use of the truncated grid

[0.02, 0.98]

instead of the full interval

[0, 1]

is deliberate. It avoids reporting performance summaries at the extreme endpoints, where ordinary symmetric smoothers may suffer their most severe support truncation and where numerical comparisons may be dominated by endpoint artifacts rather than by the intrinsic behavior of the procedure.

For each

u \in U

, the code computes normalized local weights over the observed sample.

Complete-case weights.

For a generic local score

L_{K, u} (X_{i})

associated with kernel or Bernstein smoothing rule K,

w_{i, K}^{cc} (u) = \frac{L_{K, u} (X_{i})}{\sum_{j \in O_{n}} L_{K, u} (X_{j})}, i \in O_{n} .

IPW weights.

If

p_{i} = P (δ_{i} = 1 ∣ X_{i})

denotes the observation probability used to generate the missingness indicator, the IPW version is

w_{i, K}^{ipw} (u) = \frac{L_{K, u} (X_{i}) / p_{i}}{\sum_{j \in O_{n}} L_{K, u} (X_{j}) / p_{j}}, i \in O_{n} .

Since the simulation is fully controlled, the true probabilities

p_{i}

are used directly. Hence the numerical comparison isolates the second-stage effect of inverse-probability reweighting and does not involve any first-stage propensity estimation error. This has an immediate and nontrivial consequence under MCAR. If

p_{i} \equiv q

for all i, then

w_{i, K}^{ipw} (u) = \frac{L_{K, u} (X_{i}) / q}{\sum_{j \in O_{n}} L_{K, u} (X_{j}) / q} = \frac{L_{K, u} (X_{i})}{\sum_{j \in O_{n}} L_{K, u} (X_{j})} = w_{i, K}^{cc} (u) .

Therefore, in the present implementation, complete-case and IPW estimators are exactly identical under MCAR. This point is easily overlooked, but it is essential for the correct interpretation of the Gaussian benchmark comparisons: any difference between CC and IPW can only arise under covariate-dependent MAR. Using either

w_{i}^{cc} (u)

or

w_{i}^{ipw} (u)

, the code computes the following three weighted Kendall-type statistics:

{\hat{τ}}_{n}^{(1)} (u) = 4 \sum_{i \in O_{n}} \sum_{j \in O_{n}} w_{i} (u) w_{j} (u) 1 {Y_{1 i} < Y_{1 j}, Y_{2 i} < Y_{2 j}} - 1,

{\hat{τ}}_{n}^{(2)} (u) = \sum_{i \in O_{n}} \sum_{j \in O_{n}} w_{i} (u) w_{j} (u) sign ((Y_{1 i} - Y_{1 j}) (Y_{2 i} - Y_{2 j})),

and

{\hat{τ}}_{n}^{(3)} (u) = 1 - 4 \sum_{i \in O_{n}} \sum_{j \in O_{n}} w_{i} (u) w_{j} (u) 1 {Y_{1 i} < Y_{1 j}, Y_{2 i} > Y_{2 j}} .

At the population level, these three forms are equivalent under continuity. In finite samples, however, they need not coincide numerically, for two reasons. First, they are distinct algebraic representations of the same concordance functional. Second, the code uses fully normalized local weights and sums over all ordered pairs

(i, j)

, including diagonal terms. The implemented quantities are therefore weighted V-statistic analogs rather than leave-two-out U-statistics. The diagonal contribution is asymptotically negligible under standard smoothing conditions, but it can be non-negligible in finite samples when the local weights are uneven, especially near boundaries or under severe missingness. For this reason, retaining the three empirical representations in the simulation is methodologically preferable to collapsing them a priori into a single nominal estimator.
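The role of the diagonal terms can be made concrete with a short sketch. It assumes weights that sum to one and responses without ties; the function name and the simulated toy data are hypothetical. Under these assumptions the three V-statistic forms differ exactly by the diagonal mass D = \sum_{i} w_{i}^{2}: the first form equals the sign form minus D, and the third equals the sign form plus D.

```python
import random

def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def kendall_v_forms(w, y1, y2):
    """Three weighted Kendall-type V-statistics over all ordered pairs (i, j)."""
    n = len(w)
    conc = disc = signed = 0.0
    for i in range(n):
        for j in range(n):  # the diagonal i == j is included on purpose
            ww = w[i] * w[j]
            if y1[i] < y1[j] and y2[i] < y2[j]:
                conc += ww
            if y1[i] < y1[j] and y2[i] > y2[j]:
                disc += ww
            signed += ww * sign((y1[i] - y1[j]) * (y2[i] - y2[j]))
    return 4.0 * conc - 1.0, signed, 1.0 - 4.0 * disc

# Toy data: continuous responses (no ties) and uneven normalized weights.
random.seed(0)
n = 30
y1 = [random.random() for _ in range(n)]
y2 = [0.6 * a + 0.4 * random.random() for a in y1]
raw = [random.random() for _ in range(n)]
w = [r / sum(raw) for r in raw]

tau1, tau2, tau3 = kendall_v_forms(w, y1, y2)
diag = sum(wi * wi for wi in w)  # diagonal mass D
```

In this sketch the spread between the three values is 2D, which shrinks when the weights are spread evenly over many observations and grows when a few observations dominate the local window.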

9.5. Smoothers and Tuning Rules

The implementation compares the five smoothing devices

gaussian, epanechnikov, tricube, beta, bernstein .

No Dirichlet kernel is included in the current code, even though Dirichlet-type constructions appear in the broader theoretical discussion. For the Gaussian, Epanechnikov, tricube, and beta smoothers, the tuning parameter is selected through a rule of the form

h_{K} = α_{K} {\hat{σ}}_{X} n_{cc}^{- 1 / 5},

with constants

α_{gaussian} = 1.06, α_{epanechnikov} = 2.34, α_{tricube} = 2.50, α_{beta} = 1.50,

where

{\hat{σ}}_{X}

is the empirical standard deviation of the observed covariates

{X_{i} : i \in O_{n}}

. For the Bernstein smoother, the polynomial degree is chosen as

ϑ = ⌈\frac{n_{cc}}{\log n_{cc}}⌉ .

These tuning choices deserve explicit comment. First, all smoothing parameters depend on the observed sample rather than on the nominal size n. Missingness therefore affects the estimator not only by reducing the number of usable observations but also by altering the amount of smoothing through

n_{cc}

and, under MAR, through the observed empirical spread

{\hat{σ}}_{X}

. Second, the comparison between kernels is not a comparison of shapes alone, because each family comes with its own scaling constant. Strictly speaking, the study compares fully implemented smoothing procedures, not isolated kernels under a common bandwidth benchmark. This is exactly the relevant comparison from an applied numerical standpoint.
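The tuning rules above can be sketched as follows. The constants are those stated in the text; the function names are hypothetical, and the use of the sample standard deviation with an n - 1 denominator is an assumption, since the text only refers to the empirical standard deviation of the observed covariates.

```python
import math
import statistics

# Scaling constants alpha_K as stated in the text.
ALPHA = {"gaussian": 1.06, "epanechnikov": 2.34, "tricube": 2.50, "beta": 1.50}

def bandwidth(kernel, x_obs):
    """Rule h_K = alpha_K * sigma_X * n_cc^(-1/5) on the observed covariates."""
    n_cc = len(x_obs)
    sigma_x = statistics.stdev(x_obs)  # assumption: n - 1 denominator
    return ALPHA[kernel] * sigma_x * n_cc ** (-1.0 / 5.0)

def bernstein_degree(n_cc):
    """Polynomial degree ceil(n_cc / log(n_cc)) for the Bernstein smoother."""
    return math.ceil(n_cc / math.log(n_cc))

# Toy observed covariates (illustrative only): 32 complete cases.
x_obs = [0.0, 1.0] * 16
h_gauss = bandwidth("gaussian", x_obs)
h_tricube = bandwidth("tricube", x_obs)
degree_400 = bernstein_degree(400)
```

Because both rules take the observed sample as input, dropping observations changes the smoothing level through n_{cc} and through the observed spread, in line with the first comment above.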

9.6. Monte Carlo Protocol and Risk Criteria

For each fixed configuration

(scenario, n, π, missingness label, correction, kernel, estimator),

the code performs

B = NSIM

independent Monte Carlo replications. In the testing version of the script,

NSIM = 100

, whereas the intended production run uses

NSIM = 1000

. Let

{\hat{τ}}_{b} (u)

denote the estimator obtained at replication b and evaluation point

u \in U

. The pointwise summaries are

Bias (u) = \frac{1}{B} \sum_{b = 1}^{B} {\hat{τ}}_{b} (u) - τ (u),

SD (u) = {\{\frac{1}{B - 1} \sum_{b = 1}^{B} {({\hat{τ}}_{b} (u) - \bar{τ} (u))}^{2}\}}^{1 / 2}, \bar{τ} (u) = \frac{1}{B} \sum_{b = 1}^{B} {\hat{τ}}_{b} (u),

MSE (u) = \frac{1}{B} \sum_{b = 1}^{B} {({\hat{τ}}_{b} (u) - τ (u))}^{2},

and

MAE (u) = \frac{1}{B} \sum_{b = 1}^{B} |{\hat{τ}}_{b} (u) - τ (u)| .

To summarize overall performance on the grid, the code computes integrated criteria by trapezoidal quadrature:

IBias = \int_{U} | Bias (u) | d u, ISd = \int_{U} SD (u) d u,

IMSE = \int_{U} MSE (u) d u, IAE = \int_{U} MAE (u) d u .

In the implementation, these quantities are numerical integrals over

[0.02, 0.98]

, not over the full interval

[0, 1]

. This should be kept in mind when interpreting the results: the integrated risk summarizes interior and near-boundary performance, but it does not include the extreme endpoints.
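A minimal sketch of the pointwise summaries and of the trapezoidal integration over the truncated grid is given below. The function names are hypothetical, and the sketch is an illustration of the formulas above rather than the simulation code itself.

```python
import math

def pointwise_summaries(estimates, truth):
    """Bias, SD, MSE and MAE at one grid point from B replications."""
    b = len(estimates)
    mean = sum(estimates) / b
    bias = mean - truth
    sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (b - 1))
    mse = sum((e - truth) ** 2 for e in estimates) / b
    mae = sum(abs(e - truth) for e in estimates) / b
    return bias, sd, mse, mae

def trapezoid(grid, values):
    """Composite trapezoidal rule for values tabulated on grid."""
    return sum(
        0.5 * (values[k] + values[k + 1]) * (grid[k + 1] - grid[k])
        for k in range(len(grid) - 1)
    )

# Evaluation grid: 97 equally spaced points on [0.02, 0.98].
grid = [0.02 + 0.01 * k for k in range(97)]

# Toy check at a single point with B = 2 replications.
bias, sd, mse, mae = pointwise_summaries([0.1, 0.3], 0.2)

# Integrating a constant curve recovers the length of the interval, 0.96.
length = trapezoid(grid, [1.0] * len(grid))
```

With the B - 1 denominator in SD(u), the pointwise criteria satisfy MSE(u) = Bias(u)^{2} + ((B - 1)/B) SD(u)^{2}, which gives a quick consistency check on any table of results.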

9.7. Reported Numerical Summaries

The numerical output is summarized selectively. For readability, we do not reproduce the full collection of raw simulation tables generated by the code. Instead, we report the quantities most directly connected with the theoretical questions studied in the paper. First, we present integrated criteria

(IBias, ISd, IMSE, IAE),

computed over the evaluation grid by trapezoidal quadrature. These criteria summarize the global finite-sample behavior of each procedure on the interval used in the simulations. Second, we report conditional IMSE rankings within fixed design cells

(scenario, missingness label, correction, n, π) .

This conditional ranking is preferable to a global ranking, since different scenarios and missingness levels correspond to intrinsically different estimation difficulties. Third, we include representative heatmaps and boxplots of IMSE in order to visualize the effect of the smoothing rule, the sample size, the missingness rate, and the missingness mechanism. These figures are intended to complement the integrated criteria rather than to replace them. Finally, we include observed-rate diagnostics to verify that the implemented missingness mechanisms produce the intended levels of incomplete response observation. The complete-case versus IPW comparison is reported only where it is informative, namely under covariate-dependent MAR; under MCAR, normalized complete-case and normalized IPW weights coincide.

9.8. Interpretation of the Comparisons

The numerical comparisons should be read in light of several structural facts. First, because missingness depends at most on X, both complete-case and IPW procedures target the same conditional functional

τ (u)

. Under MAR, IPW should therefore be understood as a device that attempts to reconstruct the local covariate design that would have been available without missingness. Its role is not to change the target parameter, but to modify the local weighting scheme used to estimate it. Second, under MCAR, normalized IPW and normalized complete-case weights coincide exactly. Accordingly, any CC–IPW contrast is meaningful only under covariate-dependent MAR. This exact equivalence is a consequence of the normalization, not an asymptotic approximation. Third, boundary behavior is central. The Gaussian, Epanechnikov, and tricube procedures are used on a compact support without explicit boundary renormalization, whereas the beta and Bernstein constructions are intrinsically support-adapted. Scenario 2 is therefore particularly informative, because it combines nonlinear dependence with a covariate distribution concentrated near the left boundary. Fourth, the three empirical Kendall representations are asymptotically linked, but their finite-sample ranking may differ because the simulation compares distinct weighted V-statistic implementations. Such differences should be interpreted as part of the actual numerical behavior of the estimators, not as irrelevant computational noise. Fifth, because the smoothing rules depend on

n_{cc}

and on the observed empirical variability of X, missingness perturbs performance through several channels simultaneously: loss of effective sample size, alteration of the observed design density, and modification of the smoothing scale. The resulting impact is therefore richer than a simple “smaller n” effect.

9.9. Results

The numerical findings are summarized in Table A1 and in Figures 1–8. Since the ranking is performed within fixed design cells, all substantive comparisons should be read conditionally on

(scenario, missingness label, correction, n, π) .

This conditional reading is essential in order not to confound procedures operating under different levels of intrinsic difficulty. From a methodological standpoint, the most informative contrasts are the following.

Effect of the smoothing rule. The heatmaps and boxplots quantify how the choice of smoother interacts with support geometry and covariate density. In Scenario 1, where the design is uniform and the target is linear, conventional symmetric kernels may remain competitive. In Scenario 2, where the design is concentrated near the boundary and the target is nonlinear, support-adapted procedures should be given particular attention. Nevertheless, any final judgment must be drawn from the IMSE summaries themselves, since the comparison concerns fully implemented procedures, not only asymptotic kernel classes.

Effect of missingness. The observed-rate diagnostics first verify that the missingness module is correctly calibrated. Once this is established, the degradation in performance as

π

increases reflects a combination of reduced effective sample size, design distortion under MAR, and changes in the data-driven smoothing scale.

Complete-case versus IPW under MAR. Because the target is conditional on X, the CC–IPW comparison is fundamentally local. In regions where the MAR mechanism significantly distorts the observed covariate distribution, IPW may improve centering by compensating for that local distortion, but at the cost of increased dispersion due to heterogeneous weights. In more homogeneous regions, complete-case smoothing may remain competitive or even preferable. The comparison is therefore best understood as a local bias–variance trade-off rather than as a universal superiority statement.

Effect of sample size. The progression

n = 500, 1000, 2000

allows one to assess how rapidly the integrated criteria decrease as information accumulates. Since the smoothing parameters are determined from the observed sample, the effective asymptotic regime is governed jointly by n, the missingness level, and the realized structure of the observed covariates.

9.10. Practical Implications

The simulation supports the following methodological reading. First, when the covariate support is compact and boundary behavior is substantively important, support-adapted smoothers such as the beta and Bernstein constructions deserve explicit consideration rather than being treated as merely optional refinements. Second, when missingness depends only on the conditioning covariate, complete-case analysis is not automatically inconsistent for the conditional Kendall target. In that case, the empirical relevance of IPW is primarily a question of local reweighting efficiency and variance control. Third, because the ranking is carried out within fixed design cells, practical recommendations should be made conditionally on the sampling regime and missingness level, rather than through a single unconditional hierarchy. Fourth, the current IPW analysis is intentionally idealized: the true observation probabilities are used, so the study isolates the second-stage impact of reweighting. In applications, a fully data-adaptive implementation would involve an additional first-stage estimation problem, and the present results should therefore be read as a benchmark rather than as a complete end-to-end performance assessment.

Overall, the simulations support the qualitative conclusions of the asymptotic theory. Missingness mainly affects performance through the loss of effective information and through changes in the observed local design, whereas the relative behavior of the smoothing rules is strongly influenced by support geometry and boundary effects. The numerical study should therefore be interpreted as a finite-sample illustration of the theoretical mechanisms developed in the paper, not as an exhaustive benchmark over all possible bandwidth choices, kernels, missingness mechanisms or conditional dependence models.

10. Concluding Remarks

This paper develops a comprehensive asymptotic framework for nonparametric conditional U-statistics smoothed by asymmetric kernels in the presence of Missing-at-Random (MAR) responses. The analysis brings together three layers of difficulty that, to the best of our knowledge, had not previously been treated within a unified theory: the intrinsic nonlinearity of conditional U-functionals, the boundary-sensitive nature of smoothing on constrained supports, and the additional stochastic distortion induced by incomplete responses. In this sense, the present work substantially extends the classical conditional U-statistics literature initiated by [60] and subsequently refined in [63,64,65,66,67], by embedding it into a support-adapted and missing-data-aware nonparametric framework.

Our first contribution is the introduction and analysis of Dirichlet-kernel conditional U-statistics on the

d m

-dimensional simplex under MAR sampling. This appears to be the first rigorous study of such estimators in the literature. For these procedures, we establish both uniform strong consistency and asymptotic normality, thereby providing a theoretical foundation for local nonlinear smoothing on simplex-constrained domains. Since Dirichlet kernels are intrinsically adapted to the geometry of the simplex, the resulting estimators overcome the severe boundary distortions that inevitably affect conventional symmetric smoothers on constrained supports. Beyond the higher-order conditional U-statistic setting, our analysis also yields new results for the associated Nadaraya–Watson-type regression estimators, which are of independent methodological interest.

A second major contribution concerns Bernstein-polynomial smoothing. We show that Bernstein-type conditional U-statistics also admit a rich asymptotic theory under MAR sampling, including weak and strong uniform convergence. These results reinforce the view that Bernstein smoothers constitute not merely an approximation-theoretic device, but a genuine support-respecting inferential tool for nonlinear conditional estimation. In parallel, we establish analogous convergence results for beta-kernel conditional U-statistics on hyperrectangles, including rates over expanding compact sets and under general sequences of smoothing parameters. This part of the paper clarifies how support-adapted asymmetric kernels can be systematically incorporated into the conditional U-statistics paradigm on bounded Euclidean domains.

A further originality of the paper lies in the treatment of mixed continuous and categorical regressors. This extension is particularly relevant for modern applications, where the explanatory structure is seldom purely continuous. The mixed-design setting considerably enlarges the practical scope of the theory and shows that the conditional U-statistics methodology remains viable in heterogeneous covariate environments. In addition, the inferential examples discussed in the paper—notably discrimination problems, multisample conditional U-statistics, and conditional versions of Kendall’s rank-based dependence measures—illustrate that the proposed framework is sufficiently flexible to handle a wide family of nonlinear conditional targets beyond ordinary regression means.

From a technical standpoint, the paper also contributes a methodological strategy for analyzing asymmetric nonlinear smoothers that is, in itself, of independent value. The proofs rely on a delicate combination of Hoeffding-type decompositions, truncation arguments, and exponential inequalities for canonical U-statistics, coupled with empirical-process techniques adapted to location-dependent kernels. The point here is that the present asymptotic theory cannot be obtained by a routine transfer of arguments from either standard Nadaraya–Watson smoothing or ordinary asymmetric density estimation. The nonlinear ratio structure of conditional U-statistics, together with the dependence of the kernel shape on the evaluation point and the effective sample-size reduction caused by missingness, creates a genuinely more intricate probabilistic problem. One of the broader messages of the paper is therefore that support-adapted smoothing and higher-order nonlinear inference can indeed be reconciled, but only after a careful reworking of the classical tools.

Although the present results are already fairly general, they open several promising directions for further research. A first natural extension concerns data-driven smoothing-parameter selection. While we briefly discuss cross-validation-type criteria, a full asymptotic theory for optimal bandwidth or polynomial-order selection in the conditional U-statistics context remains open. Such a theory is likely to be substantially more delicate than in ordinary regression, since the optimal balance between bias and stochastic fluctuation depends not only on the geometry of the support and the smoothness of the target, but also on the order m of the conditional functional and the local effective sample size induced by the missingness mechanism. In particular, deriving asymptotically optimal selectors under global or local risk criteria, and understanding their interaction with boundary adaptation, would constitute a significant advance.

A second important perspective concerns estimated missingness mechanisms. In the present work, the MAR framework is incorporated through complete-case arguments and positivity conditions. A natural next step would be to allow the propensity score to be unknown and nonparametrically estimated, thereby leading to inverse-probability-weighted, augmented, or doubly robust versions of conditional U-statistics. Such developments would connect the present theory to semiparametric efficiency, debiasing, and modern missing-data methodology. In particular, it would be of considerable interest to determine whether one can construct asymptotically linear and efficiency-enhanced versions of support-adapted conditional U-statistics in the MAR setting, and to quantify the price paid, in higher-order nonlinear problems, for replacing the true observation probability by an estimated one.

A third perspective concerns dependence structures. The literature on asymmetric kernels under dependence is still comparatively sparse, and extending the present results beyond the i.i.d. framework would require genuinely new probabilistic tools. This includes time series, mixing arrays, spatial random fields, network-dependent samples, and other structured dependence schemes; see, for example, [178,179,180,181]. In such contexts, both the Hoeffding decomposition and the local empirical-process machinery must be revisited, and the interaction between dependence and support-adapted smoothing may generate new boundary phenomena. A comparable challenge arises for longitudinal data, panel data, and functional observations, where local nonlinear conditional functionals are often of direct inferential interest.

Another highly promising direction is the extension to other geometric supports. The present article focuses on simplices and hyperrectangles, which already cover a broad class of practical situations. Nevertheless, many modern datasets live on more general constrained spaces: spheres, manifolds, compositional submanifolds, cones, positive semidefinite matrices, or other structured domains. Developing conditional U-statistics with support-respecting asymmetric kernels on such spaces would enlarge the theory in a conceptually important way, especially in applications involving directional data, shape analysis, diffusion tensors, and manifold-valued learning problems. In these settings, the interplay between geometry, kernel construction, and nonlinear functional estimation is likely to raise new questions of both probabilistic and statistical significance.

A further line of investigation concerns high-dimensional and regularized regimes. As the ambient dimension grows, the effective sparsity of the local neighborhoods deteriorates, and the curse of dimensionality becomes especially severe for higher-order conditional functionals. This suggests the need for structure-exploiting extensions based on dimension reduction, additive representations, sparsity constraints, or localized projections. In parallel, one may ask whether conditional U-statistics smoothed by asymmetric kernels can be combined with modern regularization devices in order to produce feasible estimators in moderately or genuinely high-dimensional settings. Such questions are particularly relevant for contemporary applications in genomics, finance, image analysis, and network data, where the response functional may be nonlinear and the covariate support intrinsically constrained.

The paper also points toward applications in change-point analysis, survival and censored-data models, and robust local inference. Change-point methodology has become increasingly important in stochastic systems subject to structural breaks, yet its interaction with conditional U-statistics remains almost completely unexplored; see [35,182,183,184,185]. Likewise, extending the present support-adapted framework to accommodate right censoring, truncation, interval observation, or informative missingness would considerably broaden its scope; compare, for instance, with [186]. From a robustness viewpoint, it would also be worthwhile to investigate conditional U-statistics based on bounded or redescending kernels, especially in conjunction with asymmetric smoothing, in order to better cope with contamination and heavy tails.

Finally, beyond the asymptotic theory, large-scale computational aspects merit dedicated attention. Because conditional U-statistics involve summation over distinct tuples, computational complexity becomes a nontrivial issue even in moderate samples, especially when repeated evaluations are required for bandwidth selection or resampling. This raises interesting algorithmic questions concerning incomplete U-statistics, randomized approximations, divide-and-conquer strategies, online updating, and distributed implementations. Such directions are not merely computational conveniences; they are essential if the present methodology is to become operational in data-rich environments.

In summary, the present paper establishes a new theoretical bridge between three mature but hitherto insufficiently connected areas: conditional U-statistics, asymmetric-kernel smoothing on constrained supports, and nonparametric inference under MAR missingness. The results show that one can recover strong uniform convergence properties and asymptotic normality for a broad class of support-adapted nonlinear estimators, while simultaneously accounting for incomplete responses and mixed covariate structures. We hope that this work will stimulate further developments at the interface of higher-order nonparametric inference, geometric statistics, and incomplete-data analysis, and that it will serve as a foundation for future advances in both theory and applications.

To make the structure of the results transparent, Table 2 summarizes the main asymptotic conclusions. The table distinguishes between the deterministic smoothing scale, the stochastic complete-case scale, and the specific role of the MAR propensity score. This is important because, for the complete-case estimators studied here, the deterministic centering is obtained with the effective density

p f

, whereas the stochastic dispersion contains the usual inverse-propensity loss of information. Thus, the MAR mechanism does not merely reduce the sample size; it also changes the constants appearing in the bias, variance, MSE, and bandwidth formulae.

11. Mathematical Development

This section is dedicated to proving our results. We will continue to use the previously established notation, now with the understanding that all quantities incorporate the missingness indicators

δ_{i}

under the MAR assumption (2.4). A crucial element in our proofs involves the truncation of the U-statistics. Specifically, we represent the U-statistics

u_{n, ℓ} (φ, \tilde{x})

for

ℓ \in {1, 2, 3}

as follows:

\begin{matrix} u_{n, ℓ} (φ, \tilde{x}) & = & u_{n, ℓ}^{(m)} (G_{φ, \tilde{x}, ℓ}^{(miss), (T)}) + u_{n, ℓ}^{(m)} (G_{φ, \tilde{x}, ℓ}^{(miss), (R)}) \\ & = : & u_{n, ℓ}^{(T)} (φ, \tilde{x}) + u_{n, ℓ}^{(R)} (φ, \tilde{x}), \end{matrix}

(11.1)

where for

ℓ \in {1, 2, 3}

and some

ω_{n, ℓ}

(to be specified in the proofs of the corresponding sections), we have:

\begin{matrix} G_{φ, \tilde{x}, ℓ}^{(miss)} (t, y, δ) & = & G_{φ, \tilde{x}, ℓ}^{(miss), (T)} (t, y, δ) + G_{φ, \tilde{x}, ℓ}^{(miss), (R)} (t, y, δ) \\ & = & G_{φ, \tilde{x}, ℓ}^{(miss)} (t, y, δ) 1_{\{|φ (y)| \leq ω_{n, ℓ}\}} + G_{φ, \tilde{x}, ℓ}^{(miss)} (t, y, δ) 1_{\{|φ (y)| > ω_{n, ℓ}\}} . \end{matrix}

Here,

u_{n, ℓ}^{(T)} (φ, \tilde{x})

is the truncated part, and

u_{n, ℓ}^{(R)} (φ, \tilde{x})

is the remainder part. Note that the missingness indicators

\prod_{j = 1}^{m} δ_{j}

are preserved in both the truncated and remainder components, as they are independent of the truncation threshold. We establish the uniform convergence rates of

u_{n, ℓ} (φ, \tilde{x})

to

E [u_{n, ℓ} (φ, \tilde{x})]

based on the convergence rates of

u_{n, ℓ}^{(m)} (G_{φ, \tilde{x}, ℓ}^{(miss), (T)})

to

E [u_{n, ℓ}^{(m)} (G_{φ, \tilde{x}, ℓ}^{(miss), (T)})]

, while demonstrating that the remainder part is asymptotically negligible under the moment condition (C.3) (or its generalization (C.3)″) and the MAR assumption. Next, we can use these results to deduce the convergence rates of the stochastic part of the estimators

{\hat{r}}_{n, ℓ}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, ℓ} (\tilde{x}))

. Indeed, by the classical decomposition, for

ℓ \in {1, 2, 3}

:

\begin{matrix} |{\hat{r}}_{n, ℓ}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, ℓ} (\tilde{x})) - \hat{E} ({\hat{r}}_{n, ℓ}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, ℓ} (\tilde{x})))| \\ = |\frac{u_{n, ℓ} (φ, \tilde{x})}{u_{n, ℓ} (1, \tilde{x})} - \frac{E (u_{n, ℓ} (φ, \tilde{x}))}{E (u_{n, ℓ} (1, \tilde{x}))}| \\ \leq \frac{|u_{n, ℓ} (φ, \tilde{x}) - E (u_{n, ℓ} (φ, \tilde{x}))|}{|u_{n, ℓ} (1, \tilde{x})|} \\ + \frac{|E (u_{n, ℓ} (φ, \tilde{x}))| \cdot |u_{n, ℓ} (1, \tilde{x}) - E (u_{n, ℓ} (1, \tilde{x}))|}{|u_{n, ℓ} (1, \tilde{x})| \cdot |E (u_{n, ℓ} (1, \tilde{x}))|} \\ = : I_{ℓ, 1} + I_{ℓ, 2} . \end{matrix}

(11.2)
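The bound (11.2) rests on a one-line algebraic identity, recorded here for convenience. The shorthand u_\varphi = u_{n, ℓ} (φ, \tilde{x}) and u_1 = u_{n, ℓ} (1, \tilde{x}) is introduced only for this display and is not notation used elsewhere in the paper.

```latex
\frac{u_\varphi}{u_1} - \frac{\mathbb{E} u_\varphi}{\mathbb{E} u_1}
  = \frac{u_\varphi - \mathbb{E} u_\varphi}{u_1}
  + \frac{\mathbb{E} u_\varphi \, (\mathbb{E} u_1 - u_1)}{u_1 \, \mathbb{E} u_1} .
```

Taking absolute values and applying the triangle inequality gives the two terms I_{ℓ, 1} and I_{ℓ, 2}.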

Later, based on the imposed regularity conditions in each section, we can easily control the terms

|u_{n, ℓ} (φ, \tilde{x})|

and

|E [u_{n, ℓ} (φ, \tilde{x})]|

(including the particular case when

φ \equiv 1

), uniformly in

\tilde{x}

to obtain the desired rates of convergence. Under the MAR assumption and the positivity condition

{inf}_{x \in X} p (x) \geq c > 0

, we have the uniform lower bounds:

inf_{\tilde{x} \in X^{m}} |E [u_{n, ℓ} (1, \tilde{x})]| \geq C_{*} > 0, inf_{\tilde{x} \in X^{m}} |u_{n, ℓ} (1, \tilde{x})| \geq \frac{C_{*}}{2} a.s. for sufficiently large n,

which ensure that the denominators in

I_{ℓ, 1}

and

I_{ℓ, 2}

are bounded away from zero. Lastly, we need to study the bias term of each estimator. For the three estimators proposed in this paper, the treatment of the bias rests on the following decomposition: for

ℓ \in \{1, 2, 3\}

, we have

\begin{matrix} |\hat{E} [{\hat{r}}_{n, ℓ}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, ℓ} (\tilde{x}))] - r^{(m)} (φ, \tilde{x})| \\ = |\frac{E (u_{n, ℓ} (φ, \tilde{x}))}{E (u_{n, ℓ} (1, \tilde{x}))} - r^{(m)} (φ, \tilde{x})| \\ = \frac{1}{|E (u_{n, ℓ} (1, \tilde{x}))|} |E (u_{n, ℓ} (φ, \tilde{x})) - r^{(m)} (φ, \tilde{x}) E (u_{n, ℓ} (1, \tilde{x}))| . \end{matrix}

(11.3)

As a matter of fact, (11.3) implies that it suffices to control the term

|E (u_{n, ℓ} (φ, \tilde{x})) - R (φ, \tilde{x})|

uniformly in

\tilde{x}

, to establish the desired results, as will be seen in the sequel. Under the MAR assumption, we emphasize that the expectations incorporate the propensity score via

E [δ K_{(α, β)} (X)] = E [p (X) K_{(α, β)} (X)]

. However, as demonstrated in Remark 5, the leading-order bias expansion remains unaffected by the missingness mechanism, with the propensity score

p (\cdot)

canceling out in the ratio due to the common factor appearing in both the numerator and denominator. The residual higher-order terms involving derivatives of

p (\cdot)

are asymptotically negligible under the smoothness condition (C.2) and the bandwidth condition

\overset{˘}{b} \to 0

.

12. Proofs of Section 3: Dirichlet Kernels

12.1. Proofs of Section 3.2

Proof of Theorem 2.

Let

x \in S_{d, 1} (\overset{˘}{b} (d + 1))

,

n \geq 1

,

0 < \overset{˘}{b} < (e^{- 16 \sqrt{2}} \land d^{- 1})

,

0 < a \leq e^{- 1} {∥ f ∥}_{\infty} | \log \overset{˘}{b} | / {\overset{˘}{b}}^{d + 1 / 2}

, and take the unique

δ \in (0, e^{- 1}] that satisfies δ | \log δ | = \frac{{\overset{˘}{b}}^{d + 1 / 2} a}{2 {∥ f ∥}_{\infty} | \log \overset{˘}{b} |} .

Define

{\tilde{x}}^{'} = (x_{1}^{'}, \dots, x_{m}^{'})

such that

{\tilde{x}}^{'} \in \tilde{x} + {[- \overset{˘}{b}, \overset{˘}{b}]}^{m}

, where

\tilde{x} = (x_{1}, \dots, x_{m}) \in S_{d, 1}^{m} (\overset{˘}{b} (d + 1))

, and

\overset{˘}{b} : = (b, \dots, b)

is a d-dimensional vector. We then have:

\begin{matrix} |u_{n, 1}^{(miss)} (φ, \tilde{x}) - E [u_{n, 1}^{(miss)} (φ, \tilde{x})]| \\ \leq |u_{n, 1}^{(miss)} (φ, \tilde{x}) - u_{n, 1}^{(miss)} (φ, {\tilde{x}}^{'})| + |E [u_{n, 1}^{(miss)} (φ, {\tilde{x}}^{'})] - E [u_{n, 1}^{(miss)} (φ, \tilde{x})]| \\ + |u_{n, 1}^{(miss)} (φ, {\tilde{x}}^{'}) - E [u_{n, 1}^{(miss)} (φ, {\tilde{x}}^{'})]| . \end{matrix}

(12.1)

As explained above, to establish uniform convergence rates, we treat the truncated part and the remainder part of

u_{n, 1}^{(miss)} (φ, \tilde{x})

separately. Under the MAR assumption (2.4), the missingness indicators

δ_{i}

are incorporated into the kernel

G_{φ, \tilde{x}, 1}^{(miss)}

as per (2.6), and the positivity condition ensures that the denominators are bounded away from zero.

Truncated Part

Notice that

\begin{matrix} |u_{n, 1}^{(T), (miss)} (φ, \tilde{x}) - E (u_{n, 1}^{(T), (miss)} (φ, \tilde{x}))| \\ = \frac{(n - m)!}{n!} |\sum_{i \in I (m, n)} \{φ^{(T)} (\tilde{Y_{i}}) (\prod_{j = 1}^{m} δ_{i_{j}}) {\tilde{K}}_{{\bar{Λ}}_{n, 1} (\tilde{x})} ({\tilde{X}}_{i}) - E [φ^{(T)} (\tilde{Y_{i}}) (\prod_{j = 1}^{m} δ_{i_{j}}) {\tilde{K}}_{{\bar{Λ}}_{n, 1} (\tilde{x})} ({\tilde{X}}_{i})]\}| \\ = \frac{(n - m)!}{n!} |\sum_{i \in I (m, n)} \{G_{φ, \tilde{x}, 1}^{(miss), (T)} (\tilde{X_{i}}, \tilde{Y_{i}}, {\tilde{δ}}_{i}) - E [G_{φ, \tilde{x}, 1}^{(miss), (T)} (\tilde{X_{i}}, \tilde{Y_{i}}, {\tilde{δ}}_{i})]\}| \\ = \frac{(n - m)!}{n!} |\sum_{i \in I (m, n)} W^{(T), (miss)} (\tilde{X_{i}}, \tilde{Y_{i}}, {\tilde{δ}}_{i})|, \end{matrix}

where

W^{(T), (miss)} (\tilde{X}, \tilde{Y}, \tilde{δ}) : = G_{φ, \tilde{x}, 1}^{(miss), (T)} (\tilde{X}, \tilde{Y}, \tilde{δ}) - E [G_{φ, \tilde{x}, 1}^{(miss), (T)} (\tilde{X}, \tilde{Y}, \tilde{δ})] .

Mirroring the approach used in the proof of Theorem 1, we begin by establishing continuity estimates for the random fields

\tilde{z} \mapsto W^{(T), (miss)} (\tilde{z})

so that we can control the probability that

W^{(T), (miss)} (\tilde{z})

and

W^{(T), (miss)} ({\tilde{z}}^{'})

are too far apart when

\tilde{z} = (\tilde{x}, \tilde{y}, \tilde{δ})

and

{\tilde{z}}^{'} = ({\tilde{x}}^{'}, \tilde{y}, \tilde{δ})

are close. Note that the missingness indicators

\tilde{δ}

remain unchanged in this comparison, as they are not affected by perturbations of

\tilde{x}

. Building on the framework established in Proposition 1 of [164], we present the following proposition to determine the behavior of

(u_{n, 1}^{(m)} (W^{(T), (miss)} ({\tilde{x}}^{'}, \tilde{y}, \tilde{δ})) - u_{n, 1}^{(m)} (W^{(T), (miss)} (\tilde{x}, \tilde{y}, \tilde{δ})))

.

Proposition 1.

Let

x \in S_{d, 1} (\overset{˘}{b} (d + 1))

,

n \geq 1

,

0 < \overset{˘}{b} < (e^{- 16 \sqrt{2}} \land d^{- 1})

,

0 < a \leq e^{- 1} {∥ f ∥}_{\infty} | \log \overset{˘}{b} | / {\overset{˘}{b}}^{d + 1 / 2}

, and take the unique

δ \in (0, e^{- 1}] \quad \text{that satisfies} \quad δ | \log δ | = \frac{{\overset{˘}{b}}^{d + 1 / 2} a}{2 {∥ f ∥}_{\infty} | \log \overset{˘}{b} |} .

(12.2)

Then, under the MAR assumption (2.4) and the positivity condition, for all

h \in R

, we have

\begin{matrix} P (sup_{{\tilde{x}}^{'} \in \tilde{x} + {[- \overset{˘}{b}, \overset{˘}{b}]}^{m}} |u_{n, 1}^{(m)} (W^{(T), (miss)} ({\tilde{x}}^{'}, \tilde{y}, \tilde{δ}))| \geq h + 2 a^{m}, |u_{n, 1}^{(m)} (W^{(T), (miss)} (\tilde{x}, \tilde{y}, \tilde{δ}))| \leq h) \\ \leq C_{φ, d m} \exp (- \frac{1}{100^{2} d^{4} {∥ f ∥}_{\infty}^{2}} \cdot {(\frac{n^{1 / 2} {\overset{˘}{b}}^{d + 1 / 2} a^{m}}{| \log δ | | \log \overset{˘}{b} |})}^{2}), \end{matrix}

(12.3)

where

C_{φ, d m} > 0

is a constant that depends only on the function

φ (\cdot)

, the dimension d, the degree m, and the bounds on the propensity score

p (\cdot)

.
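
The root δ in (12.2) is unique because the map δ ↦ δ|log δ| is strictly increasing on (0, e^{-1}] and takes every value in (0, e^{-1}], while the constraint on a keeps the right-hand side of (12.2) below e^{-1}/2. A minimal Python sketch of how such a δ could be computed by bisection is given below; the function names and the fixed iteration count are illustrative choices, not part of the proposition.

```python
import math


def solve_delta(c, iterations=200):
    """Return the unique delta in (0, 1/e] with delta * |log(delta)| = c.

    On (0, 1/e] the map delta -> -delta * log(delta) is strictly increasing,
    so plain bisection applies whenever 0 < c <= 1/e.
    """
    if not 0.0 < c <= math.exp(-1):
        raise ValueError("c must lie in (0, 1/e]")
    lo, hi = 0.0, math.exp(-1)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if -mid * math.log(mid) < c:
            lo = mid
        else:
            hi = mid
    return hi


def delta_from_12_2(a, b, d, f_sup):
    """Evaluate the right-hand side of (12.2) and solve for delta."""
    c = b ** (d + 0.5) * a / (2.0 * f_sup * abs(math.log(b)))
    return solve_delta(c)
```

Bisection is used here only because monotonicity makes it robust for the very small right-hand sides produced by small bandwidths.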

Proof of Proposition 1.

Similar to the proof of Proposition 4, by a union bound the probability in (12.3) is

\begin{matrix} \leq & P (\{sup_{{\tilde{x}}^{'} \in \tilde{x} + {[- \overset{˘}{b}, \overset{˘}{b}]}^{m}} |(u_{n, 1}^{(m)} (W^{(T), (miss)} ({\tilde{x}}^{'}, \tilde{y}, \tilde{δ})) - u_{n, 1}^{(m)} (W^{(T), (miss)} (\tilde{x}, \tilde{y}, \tilde{δ}))) 1_{\{\tilde{x_{i}} \in S_{d, 1}^{m} ∖ S_{d, 1}^{m} (δ)\}}| \geq a^{m}\} \end{matrix}

\begin{matrix} \cap \{\sum_{i \in I (m, n)} 1_{\{\tilde{x_{i}} \in S_{d, 1}^{m} ∖ S_{d, 1}^{m} (δ)\}} \leq 2 \cdot 2^{m} C_{n}^{m} {∥ f ∥}_{\infty}^{m} δ^{m}\}) \end{matrix}

(12.4)

\begin{matrix} + P (\sum_{i \in I (m, n)} 1_{\{\tilde{x_{i}} \in S_{d, 1}^{m} ∖ S_{d, 1}^{m} (δ)\}} \geq 2 \cdot 2^{m} C_{n}^{m} {∥ f ∥}_{\infty}^{m} δ^{m}) \end{matrix}

(12.5)

\begin{matrix} + P (sup_{{\tilde{x}}^{'} \in \tilde{x} + {[- \overset{˘}{b}, \overset{˘}{b}]}^{m}} |(u_{n, 1}^{(m)} (W^{(T), (miss)} ({\tilde{x}}^{'}, \tilde{y}, \tilde{δ})) - u_{n, 1}^{(m)} (W^{(T), (miss)} (\tilde{x}, \tilde{y}, \tilde{δ}))) 1_{\{\tilde{x_{i}} \in S_{d, 1}^{m} (δ)\}}| \geq a^{m}) . \end{matrix}

(12.6)

Let us begin with (12.4). Following the same reasoning as in Proposition 4, on the event

\{{(C_{n}^{m})}^{- 1} \sum_{i \in I (m, n)} 1_{\{\tilde{x_{i}} \in S_{d, 1}^{m} ∖ S_{d, 1}^{m} (δ)\}} \leq 2 \cdot 2^{m} {∥ f ∥}_{\infty}^{m} δ^{m}\},

we have

\begin{matrix} |(u_{n, 1}^{(m)} (W^{(T), (miss)} ({\tilde{x}}^{'}, \tilde{y}, \tilde{δ})) - u_{n, 1}^{(m)} (W^{(T), (miss)} (\tilde{x}, \tilde{y}, \tilde{δ}))) 1_{\{\tilde{x_{i}} \in S_{d, 1}^{m} ∖ S_{d, 1}^{m} (δ)\}}| \\ \leq sup_{(\tilde{x}, {\tilde{x}}^{'}) \in S_{d, 1}^{2 m} (b)} |W^{(T), (miss)} (\tilde{x}, \tilde{y}, \tilde{δ}) - W^{(T), (miss)} ({\tilde{x}}^{'}, \tilde{y}, \tilde{δ})| {(C_{n}^{m})}^{- 1} \sum_{i \in I (m, n)} 1_{\{\tilde{x_{i}} \in S_{d, 1}^{m} ∖ S_{d, 1}^{m} (δ)\}} \\ \leq 4 \cdot 2 \cdot 2^{m} ω_{n, 1} sup_{\tilde{x} \in S_{d, 1}^{m}} {\tilde{K}}_{{\bar{Λ}}_{n, 1} (\tilde{x})} (\tilde{x}, \tilde{X}) \cdot {∥ f ∥}_{\infty}^{m} δ^{m} \\ \leq 4 \cdot 2 \cdot 2^{m} ω_{n, 1} \cdot {\overset{˘}{b}}^{- d m} {({\overset{˘}{b}}^{- 1} + d)}^{m / 2} {∥ f ∥}_{\infty}^{m} δ^{m} . \end{matrix}

The last inequality follows from (15.11), (15.12) and Lemma 2 of [164]. Therefore, we have

\begin{matrix} |(u_{n, 1}^{(m)} (W^{(T), (miss)} ({\tilde{x}}^{'}, \tilde{y}, \tilde{δ})) - u_{n, 1}^{(m)} (W^{(T), (miss)} (\tilde{x}, \tilde{y}, \tilde{δ}))) 1_{\{\tilde{x_{i}} \in S_{d, 1}^{m} ∖ S_{d, 1}^{m} (δ)\}}| \leq \frac{8 ω_{n, 1} {(1 + \overset{˘}{b} d)}^{m / 2}}{{| \log (δ) |}^{m} {| \log (\overset{˘}{b}) |}^{m}} a^{m} . \end{matrix}

(12.7)

Since

0 < δ \leq e^{- 1}

and

0 < \overset{˘}{b} < (e^{- 8 \sqrt{2}} \land d^{- 1})

by assumption, the right-hand side of (12.7) is

< a^{m}

, so the event in (12.4) cannot occur and

(12.7) < a^{m}

implies that the probability in (12.4) equals zero. Next, we apply Hoeffding’s inequality to control the probability in (12.5). Note that

0 \leq 1_{\{\tilde{x} \in S_{d, 1}^{m} ∖ S_{d, 1}^{m} (δ)\}} \leq 1

, and

μ = E [1_{\{\tilde{x} \in S_{d, 1}^{m} ∖ S_{d, 1}^{m} (δ)\}}] \leq \prod_{j = 1}^{m} E [1_{\{x_{j} \in S_{d, 1} ∖ S_{d, 1} (δ)\}}] \leq \frac{2^{m} {∥ f ∥}_{\infty}^{m} δ^{m}}{{((d - 1)!)}^{m}} .

Then, for

t = 2 \cdot 2^{m} {∥ f ∥}_{\infty}^{m} δ^{m} - μ

, we have

\begin{matrix} P (\sum_{i \in I (m, n)} 1_{\{\tilde{x_{i}} \in S_{d, 1}^{m} ∖ S_{d, 1}^{m} (δ)\}} - μ \geq t) \leq \exp \{- 2 [n / m] {((2 {((d - 1)!)}^{m} - 1) \cdot \frac{2^{m} {∥ f ∥}_{\infty}^{m}}{{((d - 1)!)}^{m}} δ^{m})}^{2}\}, \end{matrix}

taking into account (12.2), we obtain

\begin{matrix} P (\sum_{i \in I (m, n)} 1_{\{\tilde{x_{i}} \in S_{d, 1}^{m} ∖ S_{d, 1}^{m} (δ)\}} - μ \geq t) \leq \exp \{- 2 [n / m] {(\frac{{\overset{˘}{b}}^{d + 1 / 2} a}{| \log δ | | \log \overset{˘}{b} |})}^{2 m}\} . \end{matrix}
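
The factor [n / m] in the exponent comes from the classical Hoeffding inequality for U-statistics with a bounded kernel: a U-statistic of degree m behaves, for tail purposes, like an average of [n / m] independent blocks. A minimal Python sketch of that bound follows; the function name and the default range [0, 1], which matches the indicator kernel used above, are illustrative assumptions.

```python
import math


def hoeffding_ustat_tail(n, m, t, lower=0.0, upper=1.0):
    """Hoeffding bound for a U-statistic U_n of degree m with kernel in [lower, upper]:

    P(U_n - E[U_n] >= t) <= exp(-2 * floor(n / m) * t^2 / (upper - lower)^2).
    """
    blocks = n // m  # number of disjoint m-tuples that fit into n observations
    return math.exp(-2.0 * blocks * t * t / (upper - lower) ** 2)
```

Note that the bound depends on the sample size only through the integer part of n / m, so increasing the degree m at fixed n weakens it.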

Moving on to (12.6), we have the decomposition

\begin{matrix} u_{n, 1}^{(m)} (W^{(T), (miss)} ({\tilde{x}}^{'}, \tilde{y}, \tilde{δ}) - W^{(T), (miss)} (\tilde{x}, \tilde{y}, \tilde{δ}) 1_{\{\tilde{x_{i}} \in S_{d, 1}^{m} ∖ S_{d, 1}^{m} (δ)\}}) \\ = m u_{n, 1}^{(1)} (π_{1, m} [(G_{φ, {\tilde{x}}^{'}, 1}^{(miss), (T)} - G_{φ, \tilde{x}, 1}^{(miss), (T)}) 1_{\{\tilde{x_{i}} \in S_{d, 1}^{m} ∖ S_{d, 1}^{m} (δ)\}}]) \\ + \sum_{q = 2}^{m} \frac{m!}{(m - q)!} u_{n, 1}^{(q)} (π_{q, m} [(G_{φ, {\tilde{x}}^{'}, 1}^{(miss), (T)} - G_{φ, \tilde{x}, 1}^{(miss), (T)}) 1_{\{\tilde{x_{i}} \in S_{d, 1}^{m} ∖ S_{d, 1}^{m} (δ)\}}]), \end{matrix}

(12.8)

where the linear term:

\begin{matrix} m u_{n, 1}^{(1)} (π_{1, m} [(G_{φ, {\tilde{x}}^{'}, 1}^{(miss), (T)} - G_{φ, \tilde{x}, 1}^{(miss), (T)}) 1_{\{\tilde{x_{i}} \in S_{d, 1}^{m} ∖ S_{d, 1}^{m} (δ)\}}]) \\ = \frac{m}{n} \sum_{i = 1}^{n} π_{1, m} [(G_{φ, {\tilde{x}}^{'}, 1}^{(miss), (T)} - G_{φ, \tilde{x}, 1}^{(miss), (T)}) 1_{\{\tilde{x_{i}} \in S_{d, 1}^{m} ∖ S_{d, 1}^{m} (δ)\}}] ({\tilde{X}}_{i}, {\tilde{Y}}_{i}, {\tilde{δ}}_{i}), \end{matrix}

(12.9)

can be treated similarly to the proof of Proposition 4. The presence of the missingness indicators

{\tilde{δ}}_{i}

does not affect the argument, as they are bounded by 1 and independent of the truncation threshold. Now, for the nonlinear term, let us first introduce the following class of functions:

F : = \{G^{(miss)} (φ, {\tilde{x}}^{'}) - G^{(miss)} (φ, \tilde{x}) : \tilde{x} \in S_{d, 1}^{m} (\overset{˘}{b} (d + 1)) \text{ and } {\tilde{x}}^{'} \in \tilde{x} + {[- \overset{˘}{b}, \overset{˘}{b}]}^{m}\},

then we have for all

ε > 0

\begin{matrix} P (sup_{{\tilde{x}}^{'} \in \tilde{x} + {[- \overset{˘}{b}, \overset{˘}{b}]}^{m}} |\sum_{q = 2}^{m} \frac{m!}{(m - q)!} u_{n, 1}^{(q)} (π_{q, m} [(G_{φ, {\tilde{x}}^{'}, 1}^{(miss), (T)} - G_{φ, \tilde{x}, 1}^{(miss), (T)}) 1_{\{\tilde{x_{i}} \in S_{d, 1}^{m} ∖ S_{d, 1}^{m} (δ)\}}])| \geq ε) \\ \equiv P ({∥\sum_{q = 2}^{m} \frac{m!}{(m - q)!} u_{n, 1}^{(q)} (π_{q, m} [(G_{φ, {\tilde{x}}^{'}, 1}^{(miss), (T)} - G_{φ, \tilde{x}, 1}^{(miss), (T)}) 1_{\{\tilde{x_{i}} \in S_{d, 1}^{m} ∖ S_{d, 1}^{m} (δ)\}}])∥}_{F} \geq ε) . \end{matrix}

We have

\begin{matrix} E [{∥n^{1 - m} \sum_{I_{m}^{n}} ε_{i_{1}}^{(1)} ε_{i_{2}}^{(2)} [(G_{φ, {\tilde{x}}^{'}, 1}^{(miss), (T)} - G_{φ, \tilde{x}, 1}^{(miss), (T)}) 1_{\{\tilde{x_{i}} \in S_{d, 1}^{m} ∖ S_{d, 1}^{m} (δ)\}}]∥}_{F}] \\ \leq 2 C E [|n^{1 - m} \sum_{I_{m}^{n}} ε_{i_{1}}^{(1)} ε_{i_{2}}^{(2)} [φ (Y_{i_{1}}, \dots, Y_{i_{m}}) (\prod_{j = 1}^{m} δ_{i_{j}})]|] . \end{matrix}

Using the same reasoning as in [187], one can find a positive constant

c_{0} > 0

such that

E [|n^{1 - m} \sum_{I_{m}^{n}} ε_{i_{1}}^{(1)} ε_{i_{2}}^{(2)} [φ (Y_{i_{1}}, \dots, Y_{i_{m}}) (\prod_{j = 1}^{m} δ_{i_{j}})]|] < c_{0} .

Now, an application of Proposition 4 of [187] gives us for

ε = a^{m} n^{- 1 / 2}

while taking into consideration (12.2)

\begin{matrix} P ({∥\sum_{q = 2}^{m} \frac{m!}{(m - q)!} u_{n, 1}^{(q)} (π_{q, m} [(G_{φ, {\tilde{x}}^{'}, 1}^{(miss), (T)} - G_{φ, \tilde{x}, 1}^{(miss), (T)}) 1_{\{\tilde{x_{i}} \in S_{d, 1}^{m} ∖ S_{d, 1}^{m} (δ)\}}])∥}_{F} ⩾ a^{m} n^{- 1 / 2}) \\ \leq 2 \exp (- \frac{a^{m} n^{1 / 2}}{2^{m + 5} m^{m + 1} 2 ω_{n, 1} {\overset{˘}{b}}^{- d m} {({\overset{˘}{b}}^{- 1} + d)}^{m / 2} c_{0}}) \\ \leq 2 \exp (- \frac{(2 δ |\log δ| {∥ f ∥}_{\infty} |\log \overset{˘}{b}|)^{m} n^{1 / 2}}{2^{m + 5} m^{m + 1} 2 ω_{n, 1} {\overset{˘}{b}}^{m (d + 1 / 2)} {\overset{˘}{b}}^{- d m} {({\overset{˘}{b}}^{- 1} + d)}^{m / 2} c_{0}}) \\ \leq 2 \exp (- \frac{(δ |\log δ| {∥ f ∥}_{\infty} |\log \overset{˘}{b}|)^{m} n^{1 / 2}}{2^{6} m^{m + 1} ω_{n, 1} {(1 + \overset{˘}{b} d)}^{m / 2} c_{0}}) . \end{matrix}

We can find a constant

C_{1} > 0

, such that

\frac{(δ |\log δ| {∥ f ∥}_{\infty})^{m}}{2^{6} m^{m + 1} {(1 + \overset{˘}{b} d)}^{m / 2} c_{0}} \geq C_{1},

which implies

\begin{matrix} \exp (- \frac{(δ |\log δ| {∥ f ∥}_{\infty} |\log \overset{˘}{b}|)^{m} n^{1 / 2}}{2^{6} m^{m + 1} ω_{n, 1} {(1 + \overset{˘}{b} d)}^{m / 2} c_{0}}) & \leq \exp (- C_{1} {|\log \overset{˘}{b}|}^{m} n^{1 / 2 - 1 / p}) \\ \leq \exp (- C_{1} m |\log \overset{˘}{b}| n^{1 / 2 - 1 / p}) . \end{matrix}

Therefore, we readily infer that

\begin{matrix} \sum_{n = 1}^{\infty} P ({∥\sum_{q = 2}^{m} \frac{m!}{(m - q)!} u_{n, 1}^{(q)} (π_{q, m} [(G_{φ, {\tilde{x}}^{'}, 1}^{(miss), (T)} - G_{φ, \tilde{x}, 1}^{(miss), (T)}) 1_{\{\tilde{x_{i}} \in S_{d, 1}^{m} ∖ S_{d, 1}^{m} (δ)\}}])∥}_{F} ⩾ a^{m} n^{- 1 / 2}) < \infty . \end{matrix}

Hence, the proof of the proposition is completed by an application of the Borel–Cantelli lemma. □

Remainder Part under MAR

We now consider the remainder part. Recall that the U-statistic

u_{n, 1}^{(R), (miss)} (φ, \tilde{x})

is based on the unbounded kernel given by

G_{φ, \tilde{x}, 1}^{(miss), (R)} (\tilde{x}, \tilde{y}, \tilde{δ}) = G_{φ, \tilde{x}, 1}^{(miss)} (\tilde{x}, \tilde{y}, \tilde{δ}) 1_{{φ (y) > λ ξ_{n}^{1 / (1 + γ)}}} .

We have to establish that it is negligible, meaning that

sup_{\tilde{x} \in S_{d, 1}^{m}} \frac{\sqrt{n} {\overset{˘}{b}}^{m (d + 1 / 2)} |u_{n, 1}^{(m)} (G_{φ, \tilde{x}, 1}^{(miss), (R)}) - E (u_{n, 1}^{(m)} (G_{φ, \tilde{x}, 1}^{(miss), (R)}))|}{{|\log \overset{˘}{b}|}^{m} {(\log n)}^{3 / 2}} = o_{a . s} (1) .

(12.10)
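
The truncation device splits each kernel value into a bounded part, handled by exponential inequalities, and an unbounded remainder, handled by moment bounds. A minimal Python sketch of this split is given below; the function name and the generic argument `threshold`, which plays the role of λ ξ_n^{1/(1+γ)}, are illustrative assumptions.

```python
def split_truncated(values, threshold):
    """Split values of phi into a truncated part and a remainder part.

    truncated[i] = phi_i * 1{phi_i <= threshold}
    remainder[i] = phi_i * 1{phi_i >  threshold}
    so that truncated[i] + remainder[i] == phi_i for every i.
    """
    truncated = [v if v <= threshold else 0.0 for v in values]
    remainder = [v if v > threshold else 0.0 for v in values]
    return truncated, remainder
```

Because exactly one of the two indicators is active for each value, the two parts always add back to the original kernel value.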

For

\tilde{x}, \tilde{y} \in S_{d, 1}^{m}

and

\tilde{δ} \in \{0, 1\}^{m}

, observe that

\begin{matrix} |G_{φ, \tilde{x}, 1}^{(miss)} (\tilde{x}, \tilde{y}, \tilde{δ})| & \leq & {\overset{˘}{b}}^{- d m} {({\overset{˘}{b}}^{- 1} + d)}^{m / 2} |φ (\tilde{y})| = : \tilde{F} (\tilde{y}), \end{matrix}

where we used the fact that

| \prod_{j = 1}^{m} δ_{j} | \leq 1

. Taking into account that

\tilde{F}

is symmetric and does not depend on

\tilde{δ}

(since the missingness indicators are bounded by 1), we have

|u_{n, 1}^{(m)} (G_{φ, \tilde{x}, 1}^{(miss), (R)})| \leq u_{n, 1}^{(m)} (\tilde{F} 1_{{\tilde{F} > λ ξ_{n}^{1 / (1 + γ)}}}),

where

u_{n, 1}^{(m)} (\tilde{F} (y) 1_{{φ (y) > λ ξ_{n}^{1 / (1 + γ)}}})

is a U-statistic based on the U-kernel

\tilde{F} 1_{{φ > λ ξ_{n}^{1 / (1 + γ)}}} .

Under the MAR assumption, the missingness indicators are independent of the truncation event and are bounded, so the same inequality holds almost surely. Consequently,

\begin{matrix} sup_{\tilde{x} \in S_{d, 1}^{m}} \frac{\sqrt{n} {\overset{˘}{b}}^{m (d + 1 / 2)} |u_{n, 1}^{(m)} (G_{φ, \tilde{x}, 1}^{(miss), (R)})|}{{|\log \overset{˘}{b}|}^{m} {(\log n)}^{3 / 2}} & \leq & \frac{\sqrt{n} {(1 + \overset{˘}{b} d)}^{m / 2}}{{|\log \overset{˘}{b}|}^{m} {(\log n)}^{3 / 2}} u_{n, 1}^{(m)} (\tilde{F} 1_{{\tilde{F} > λ ξ_{n}^{1 / (1 + γ)}}}) \\ \leq & C_{7} ξ_{n} u_{n, 1}^{(m)} (\tilde{F} 1_{{\tilde{F} > λ ξ_{n}^{1 / (1 + γ)}}}), \end{matrix}

(12.11)

and

\begin{matrix} sup_{\tilde{x} \in S_{d, 1}^{m}} \frac{\sqrt{n} {\overset{˘}{b}}^{m (d + 1 / 2)} |E (u_{n, 1}^{(m)} (G_{φ, \tilde{x}, 1}^{(miss), (R)}))|}{{|\log \overset{˘}{b}|}^{m} {(\log n)}^{3 / 2}} & \leq & C_{7} ξ_{n} E (u_{n, 1}^{(m)} (\tilde{F} 1_{{φ (Y) > λ ξ_{n}^{1 / (1 + γ)}}})) \\ \leq & C_{7} E ({\tilde{F}}^{2 + γ} 1_{{φ (Y) > λ ξ_{n}^{1 / (1 + γ)}}}) . \end{matrix}

Therefore, as

n ⟶ \infty

, we have

\begin{matrix} sup_{\tilde{x} \in S_{d, 1}^{m}} \frac{\sqrt{n} {\overset{˘}{b}}^{m (d + 1 / 2)} |E (u_{n, 1}^{(m)} (G_{φ, \tilde{x}, 1}^{(miss), (R)}))|}{{|\log \overset{˘}{b}|}^{m} {(\log n)}^{3 / 2}} = o (1) . \end{matrix}

(12.12)

Hence, to complete the proof, it remains to establish that

u_{n, 1}^{(m)} (\tilde{F} 1_{{φ (y) > λ ξ_{n}^{1 / (1 + γ)}}}) = o_{a . s} ({(s_{m}^{- 1} ξ_{n})}^{- 1 / 2}) .

(12.13)

An application of Chebyshev’s inequality, for any

η > 0

, gives

\begin{matrix} P \{|u_{n, 1}^{(m)} (\tilde{F} 1_{{φ (Y) > λ ξ_{n}^{1 / (1 + γ)}}}) - E (u_{n, 1}^{(m)} (\tilde{F} 1_{{φ (Y) > λ ξ_{n}^{1 / (1 + γ)}}}))| \geq η {(s_{m}^{- 1} ξ_{n})}^{- 1 / 2}\} \\ \leq η^{- 2} (s_{m}^{- 1} ξ_{n}) V a r (u_{n, 1}^{(m)} (\tilde{F} 1_{{φ (Y) > λ ξ_{n}^{1 / (1 + γ)}}})) \leq m η^{- 2} ξ_{n} E ({\tilde{F}}^{2} 1_{{φ (Y) > λ ξ_{n}^{1 / (1 + γ)}}}) \\ \leq \frac{m}{n^{2}} η^{- 2} {(ξ_{n})}^{1 + γ} E ({\tilde{F}}^{2} 1_{{φ (Y) > λ ξ_{n}^{1 / (1 + γ)}}}) \leq η^{'} E ({\tilde{F}}^{3} 1_{{φ (Y) > λ ξ_{n}^{1 / (1 + γ)}}}) \frac{1}{n^{2}}, \end{matrix}

so by using the fact that

η^{'} E ({\tilde{F}}^{3} 1_{{φ (y) > λ ξ_{n}^{1 / (1 + γ)}}}) \sum_{n \geq 1} \frac{1}{n^{2}} < \infty,

we deduce that

\begin{matrix} \sum_{n \geq 1} P \{|u_{n, 1}^{(m)} (\tilde{F} 1_{{φ (y) > λ ξ_{n}^{1 / (1 + γ)}}}) - E (u_{n, 1}^{(m)} (\tilde{F} 1_{{φ (y) > λ ξ_{n}^{1 / (1 + γ)}}}))| \geq η {(m ξ_{n})}^{- 1 / 2}\} < \infty . \end{matrix}

Finally, note that (12.11) implies

E (u_{n, 1}^{(m)} (\tilde{F} 1_{{φ (y) > λ ξ_{n}^{1 / (1 + γ)}}})) = o ({(s_{m}^{- 1} ξ_{n})}^{- 1 / 2}) .

The preceding results for the arbitrary choice of

λ > 0

show that (12.13) holds, which, by combining with (12.12) and (12.11), completes the proof of (12.10). We finally obtain

sup_{\tilde{x} \in S_{d, 1}^{m}} |u_{n, 1}^{(miss)} (φ, \tilde{x}) - E [u_{n, 1}^{(miss)} (φ, \tilde{x})]| = O (\frac{| \log \overset{˘}{b} |^{m} {(\log n)}^{3 / 2}}{{\overset{˘}{b}}^{m (d + 1 / 2)} \sqrt{n}}) \quad \text{a.s.}

Hence, the proof is complete. □
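
The rate just obtained is informative only when the bandwidth shrinks slowly enough for the factor in front of n^{-1/2} to stay under control. The short Python sketch below evaluates the rate for a given sample size and bandwidth; the function name and the polynomial bandwidth choice used in the example afterwards are hypothetical and serve only to illustrate the trade-off.

```python
import math


def uniform_rate(n, b, m, d):
    """Evaluate |log b|^m (log n)^{3/2} / (b^{m (d + 1/2)} sqrt(n))."""
    return abs(math.log(b)) ** m * math.log(n) ** 1.5 / (b ** (m * (d + 0.5)) * math.sqrt(n))
```

For example, with m = d = 1 and the hypothetical choice of bandwidth n^{-0.1}, the rate reduces to 0.1 (log n)^{5/2} n^{-0.35}, which decreases to zero as n grows.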

Proof of Theorem 3.

Recall the classical decomposition established in (11.2). Under the MAR assumption (2.4) and the positivity condition

{inf}_{x \in X} p (x) \geq c > 0

, we have for any

\tilde{x} \in S_{d, 1}^{m}

:

|{\hat{r}}_{n, 1}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 1} (\tilde{x})) - \hat{E} ({\hat{r}}_{n, 1}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 1} (\tilde{x})))| \leq I_{n, 1} (\tilde{x}) + I_{n, 2} (\tilde{x}),

(12.14)

where the stochastic components are defined as

\begin{matrix} I_{n, 1} (\tilde{x}) & : = & \frac{|u_{n, 1}^{(miss)} (φ, \tilde{x}) - E (u_{n, 1}^{(miss)} (φ, \tilde{x}))|}{|u_{n, 1}^{(miss)} (1, \tilde{x})|}, \\ I_{n, 2} (\tilde{x}) & : = & \frac{|E (u_{n, 1}^{(miss)} (φ, \tilde{x}))| \cdot |u_{n, 1}^{(miss)} (1, \tilde{x}) - E (u_{n, 1}^{(miss)} (1, \tilde{x}))|}{|u_{n, 1}^{(miss)} (1, \tilde{x})| \cdot |E (u_{n, 1}^{(miss)} (1, \tilde{x}))|} . \end{matrix}
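
The bound (12.14) rests on the elementary identity a/b - A/B = (a - A)/b + A(B - b)/(bB), applied with a and b the numerator and denominator U-statistics and A and B their expectations. A minimal Python sketch that evaluates both sides for given numbers is shown below; the function and argument names are illustrative.

```python
def ratio_error_terms(num, den, e_num, e_den):
    """Return (lhs, i1, i2) with lhs = |num/den - e_num/e_den| <= i1 + i2.

    num, den     : values of the numerator and denominator statistics
    e_num, e_den : their expectations (both denominators must be nonzero)
    """
    lhs = abs(num / den - e_num / e_den)
    i1 = abs(num - e_num) / abs(den)
    i2 = abs(e_num) * abs(den - e_den) / (abs(den) * abs(e_den))
    return lhs, i1, i2
```

The two terms isolate the fluctuation of the numerator and that of the denominator, which is why lower bounds on both denominators are needed next.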

Given the regularity conditions imposed in Section 3.2 and the positivity condition on the propensity score

p (\cdot)

, there exist deterministic constants

c_{1}, c_{2} > 0

(independent of

\tilde{x}

and n, for sufficiently large n) such that:

\begin{matrix} inf_{\tilde{x} \in S_{d, 1}^{m}} |u_{n, 1}^{(miss)} (1, \tilde{x})| & \geq & c_{1} > 0 \quad \text{almost surely}, \end{matrix}

(12.15)

\begin{matrix} inf_{\tilde{x} \in S_{d, 1}^{m}} |E (u_{n, 1}^{(miss)} (1, \tilde{x}))| & \geq & c_{2} > 0 . \end{matrix}

(12.16)

These bounds follow from the uniform convergence of

u_{n, 1}^{(miss)} (1, \tilde{x})

to

E [u_{n, 1}^{(miss)} (1, \tilde{x})]

(established in Theorem 2) and the fact that

E [u_{n, 1}^{(miss)} (1, \tilde{x})]

converges uniformly to

\prod_{j = 1}^{m} p (x_{j}) \tilde{f} (\tilde{x})

(see the bias expansion in Theorem 4), which is bounded away from zero on

S_{d, 1}^{m}

under the positivity condition and the assumption that f is bounded below on the compact set

S_{d, 1}^{m} (δ)

(and extended appropriately). Moreover, the boundedness of the numerator term is guaranteed by the moment condition (C.3):

sup_{\tilde{x} \in S_{d, 1}^{m}} |E (u_{n, 1}^{(miss)} (φ, \tilde{x}))| = O (1) \quad \text{as } n \to \infty .

(12.17)

From Theorem 2, we have the following almost sure uniform convergence rate for the U-statistics:

sup_{\tilde{x} \in S_{d, 1}^{m}} |u_{n, 1}^{(miss)} (ψ, \tilde{x}) - E [u_{n, 1}^{(miss)} (ψ, \tilde{x})]| = O (\frac{| \log \overset{˘}{b} |^{m} {(\log n)}^{3 / 2}}{{\overset{˘}{b}}^{m (d + 1 / 2)} \sqrt{n}}) \quad \text{a.s.},

(12.18)

for

ψ = φ

and

ψ \equiv 1

. This result is derived from the truncation argument and the exponential inequalities for degenerate U-statistics, as detailed in the proof of Theorem 2. Using the decomposition

|{\hat{r}}_{n, 1}^{(m), (miss)} - \hat{E} [{\hat{r}}_{n, 1}^{(m), (miss)}]| \leq I_{n, 1} + I_{n, 2}

and the uniform bounds (12.15)–(12.18), we obtain almost surely:

\begin{matrix} sup_{\tilde{x} \in S_{d, 1}^{m}} \frac{{\overset{˘}{b}}^{m (d + 1 / 2)} \sqrt{n} |{\hat{r}}_{n, 1}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 1} (\tilde{x})) - \hat{E} ({\hat{r}}_{n, 1}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 1} (\tilde{x})))|}{| \log \overset{˘}{b} |^{m} {(\log n)}^{3 / 2}} \\ \leq sup_{\tilde{x} \in S_{d, 1}^{m}} \frac{{\overset{˘}{b}}^{m (d + 1 / 2)} \sqrt{n} I_{n, 1} (\tilde{x})}{| \log \overset{˘}{b} |^{m} {(\log n)}^{3 / 2}} + sup_{\tilde{x} \in S_{d, 1}^{m}} \frac{{\overset{˘}{b}}^{m (d + 1 / 2)} \sqrt{n} I_{n, 2} (\tilde{x})}{| \log \overset{˘}{b} |^{m} {(\log n)}^{3 / 2}} . \end{matrix}

(12.19)

For the first term, using (12.15) and (12.18) with

ψ = φ

, we have:

\begin{matrix} sup_{\tilde{x} \in S_{d, 1}^{m}} \frac{{\overset{˘}{b}}^{m (d + 1 / 2)} \sqrt{n} I_{n, 1} (\tilde{x})}{| \log \overset{˘}{b} |^{m} {(\log n)}^{3 / 2}} & \leq \frac{1}{c_{1}} sup_{\tilde{x} \in S_{d, 1}^{m}} \frac{{\overset{˘}{b}}^{m (d + 1 / 2)} \sqrt{n} |u_{n, 1}^{(miss)} (φ, \tilde{x}) - E [u_{n, 1}^{(miss)} (φ, \tilde{x})]|}{| \log \overset{˘}{b} |^{m} {(\log n)}^{3 / 2}} \\ = O (1) \quad \text{a.s.} \end{matrix}

(12.20)

For the second term, we apply (12.15)–(12.18) with

ψ \equiv 1

:

\begin{matrix} sup_{\tilde{x} \in S_{d, 1}^{m}} \frac{{\overset{˘}{b}}^{m (d + 1 / 2)} \sqrt{n} I_{n, 2} (\tilde{x})}{| \log \overset{˘}{b} |^{m} {(\log n)}^{3 / 2}} \\ \leq \frac{{sup}_{\tilde{x}} |E [u_{n, 1}^{(miss)} (φ, \tilde{x})]|}{c_{1} c_{2}} sup_{\tilde{x} \in S_{d, 1}^{m}} \frac{{\overset{˘}{b}}^{m (d + 1 / 2)} \sqrt{n} |u_{n, 1}^{(miss)} (1, \tilde{x}) - E [u_{n, 1}^{(miss)} (1, \tilde{x})]|}{| \log \overset{˘}{b} |^{m} {(\log n)}^{3 / 2}} \end{matrix}

(12.21)

\begin{matrix} = O (1) \quad \text{a.s.} \end{matrix}

(12.22)

Combining (12.20) and (12.22) into (12.19), we deduce the existence of a finite constant

C^{*} > 0

such that, almost surely,

\underset{n \to \infty}{lim sup} sup_{\tilde{x} \in S_{d, 1}^{m}} \frac{{\overset{˘}{b}}^{m (d + 1 / 2)} \sqrt{n} |{\hat{r}}_{n, 1}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 1} (\tilde{x})) - \hat{E} ({\hat{r}}_{n, 1}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 1} (\tilde{x})))|}{| \log \overset{˘}{b} |^{m} {(\log n)}^{3 / 2}} \leq C^{*} .

(12.23)

Equivalently,

sup_{\tilde{x} \in S_{d, 1}^{m}} |{\hat{r}}_{n, 1}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 1} (\tilde{x})) - \hat{E} ({\hat{r}}_{n, 1}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 1} (\tilde{x})))| = O (\frac{| \log \overset{˘}{b} |^{m} {(\log n)}^{3 / 2}}{{\overset{˘}{b}}^{m (d + 1 / 2)} \sqrt{n}}) \quad \text{a.s.},

(12.24)

which completes the proof of Theorem 3. □

Proof of Theorem 4.

Throughout this proof, we operate under the MAR assumption (2.4) with the positivity condition

{inf}_{x \in X} p (x) \geq c > 0

, and the smoothness conditions (C.2). The goal is to establish the uniform bias expansion

sup_{\tilde{x} \in S_{d, 1}^{m}} |\hat{E} [{\hat{r}}_{n, 1}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 1} (\tilde{x}))] - r^{(m)} (φ, \tilde{x})| = O ({\overset{˘}{b}}^{1 / 2}) .

From the bias decomposition (11.3), we have

\hat{E} [{\hat{r}}_{n, 1}^{(m), (miss)} (φ, \tilde{x})] - r^{(m)} (φ, \tilde{x}) = \frac{E [u_{n, 1}^{(miss)} (φ, \tilde{x})] - r^{(m)} (φ, \tilde{x}) E [u_{n, 1}^{(miss)} (1, \tilde{x})]}{E [u_{n, 1}^{(miss)} (1, \tilde{x})]} .

Therefore, it suffices to establish the two uniform estimates

\begin{matrix} sup_{\tilde{x} \in S_{d, 1}^{m}} |E [u_{n, 1}^{(miss)} (φ, \tilde{x})] - (\prod_{j = 1}^{m} p (x_{j})) R (φ, \tilde{x})| = O ({\overset{˘}{b}}^{1 / 2}), \end{matrix}

(12.25)

\begin{matrix} sup_{\tilde{x} \in S_{d, 1}^{m}} |E [u_{n, 1}^{(miss)} (1, \tilde{x})] - (\prod_{j = 1}^{m} p (x_{j})) \tilde{f} (\tilde{x})| = O ({\overset{˘}{b}}^{1 / 2}), \end{matrix}

(12.26)

where

R (φ, \tilde{x}) : = \tilde{f} (\tilde{x}) r^{(m)} (φ, \tilde{x})

. Indeed, once these are proved, a standard algebraic manipulation yields the desired result, noting that

E [u_{n, 1}^{(miss)} (1, \tilde{x})]

is uniformly bounded away from zero under the positivity condition. For any measurable function

ψ : Y^{m} \to R

(with

ψ = φ

or

ψ \equiv 1

), the MAR assumption gives

E [u_{n, 1}^{(miss)} (ψ, \tilde{x})] = E [ψ (\tilde{Y}) (\prod_{j = 1}^{m} δ_{j}) {\tilde{K}}_{{\bar{Λ}}_{n, 1} (\tilde{x})} (\tilde{X})] .

By the law of total expectation and the conditional independence

δ_{j} ⊥ Y_{j} ∣ X_{j}

implied by MAR,

E [ψ (\tilde{Y}) (\prod_{j = 1}^{m} δ_{j}) {\tilde{K}}_{{\bar{Λ}}_{n, 1} (\tilde{x})} (\tilde{X})] = E [(\prod_{j = 1}^{m} p (X_{j})) {\tilde{K}}_{{\bar{Λ}}_{n, 1} (\tilde{x})} (\tilde{X}) E [ψ (\tilde{Y}) ∣ \tilde{X}]] .
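
The identity above can be checked exactly on a toy model. The Python sketch below takes m = 1 with a binary covariate and a binary response, lets the observation probability depend on the covariate only (the MAR structure), and compares both sides by enumeration; all numerical values and names are hypothetical and chosen only for illustration.

```python
# Toy discrete model with m = 1; every number below is illustrative.
fx = {0: 0.3, 1: 0.7}                                # P(X = x)
py = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.2, 1: 0.8}}      # P(Y = y | X = x)
p = {0: 0.9, 1: 0.5}                                 # P(delta = 1 | X = x), free of y under MAR
K = {0: 2.0, 1: 0.5}                                 # localizing weight K(x)
psi = {0: 1.0, 1: 3.0}                               # psi(y)


def lhs_expectation():
    """E[psi(Y) * delta * K(X)] by enumerating (x, y, delta)."""
    total = 0.0
    for x in fx:
        for y in py[x]:
            for delta in (0, 1):
                prob_delta = p[x] if delta == 1 else 1.0 - p[x]
                total += fx[x] * py[x][y] * prob_delta * psi[y] * delta * K[x]
    return total


def rhs_expectation():
    """E[p(X) * K(X) * E[psi(Y) | X]] by enumerating x."""
    total = 0.0
    for x in fx:
        cond_mean = sum(py[x][y] * psi[y] for y in py[x])
        total += fx[x] * p[x] * K[x] * cond_mean
    return total
```

If the observation probability were allowed to depend on y as well, the two functions would in general return different values, which is exactly what the MAR assumption rules out.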

Consequently,

\begin{matrix} E [u_{n, 1}^{(miss)} (ψ, \tilde{x})] & = \int_{S_{d, 1}^{m}} E [ψ (\tilde{Y}) ∣ \tilde{X} = \tilde{u}] (\prod_{j = 1}^{m} p (u_{j})) \tilde{f} (\tilde{u}) {\tilde{K}}_{{\bar{Λ}}_{n, 1} (\tilde{x})} (\tilde{u}) d \tilde{u} \\ = \int_{S_{d, 1}^{m}} ψ_{cond} (\tilde{u}) (\prod_{j = 1}^{m} p (u_{j})) \tilde{f} (\tilde{u}) {\tilde{K}}_{{\bar{Λ}}_{n, 1} (\tilde{x})} (\tilde{u}) d \tilde{u}, \end{matrix}

(12.27)

where

ψ_{cond} (\tilde{u}) : = E [ψ (\tilde{Y}) ∣ \tilde{X} = \tilde{u}]

; in particular, for

ψ = φ

,

φ_{cond} (\tilde{u}) = r^{(m)} (φ, \tilde{u})

, and for

ψ \equiv 1

,

1_{cond} (\tilde{u}) = 1

. Recall that for each

j = 1, \dots, m

, the scaled kernel

K_{(α_{j}, β_{j})} (\cdot)

is the probability density function of a Dirichlet random vector

ξ_{x_{j}} \sim Dirichlet (α_{j}, β_{j})

with parameters given by (3.1). Moreover, under the i.i.d. assumption, the vectors

ξ_{x_{1}}, \dots, ξ_{x_{m}}

are independent. Therefore, we can rewrite (12.27) as

E [u_{n, 1}^{(miss)} (ψ, \tilde{x})] = E [ψ_{cond} ({\tilde{ξ}}_{\tilde{x}}) (\prod_{j = 1}^{m} p (ξ_{x_{j}})) \tilde{f} ({\tilde{ξ}}_{\tilde{x}})],

where

{\tilde{ξ}}_{\tilde{x}} : = (ξ_{x_{1}}, \dots, ξ_{x_{m}})

. For

ψ \equiv 1

, this simplifies to

E [u_{n, 1}^{(miss)} (1, \tilde{x})] = E [\prod_{j = 1}^{m} p (ξ_{x_{j}}) \tilde{f} ({\tilde{ξ}}_{\tilde{x}})] .

Since

p (\cdot)

is continuous, we perform a first-order Taylor expansion of

p (ξ_{x_{j}})

around

x_{j}

. Using the fact that

E [ξ_{x_{j}}] = x_{j} + O (\overset{˘}{b})

, we obtain

p (ξ_{x_{j}}) = p (x_{j}) + \nabla p {(x_{j})}^{⊤} (ξ_{x_{j}} - x_{j}) + O (∥ ξ_{x_{j}} - x_{j} ∥^{2}) .

Taking expectations and using the moment estimates

E [∥ ξ_{x_{j}} - x_{j} ∥] = O ({\overset{˘}{b}}^{1 / 2})

and

E [∥ ξ_{x_{j}} - x_{j} ∥^{2}] = O (\overset{˘}{b})

, we get

E [p (ξ_{x_{j}})] = p (x_{j}) + O ({\overset{˘}{b}}^{1 / 2}) .

By independence of the

ξ_{x_{j}}

’s, we have

E [\prod_{j = 1}^{m} p (ξ_{x_{j}})] = \prod_{j = 1}^{m} p (x_{j}) + O ({\overset{˘}{b}}^{1 / 2}) .

More generally, for the product with

\tilde{f} ({\tilde{ξ}}_{\tilde{x}})

, a similar expansion yields

E [ψ_{cond} ({\tilde{ξ}}_{\tilde{x}}) (\prod_{j = 1}^{m} p (ξ_{x_{j}})) \tilde{f} ({\tilde{ξ}}_{\tilde{x}})] = (\prod_{j = 1}^{m} p (x_{j})) E [ψ_{cond} ({\tilde{ξ}}_{\tilde{x}}) \tilde{f} ({\tilde{ξ}}_{\tilde{x}})] + O ({\overset{˘}{b}}^{1 / 2}) .

Thus,

\begin{matrix} E [u_{n, 1}^{(miss)} (φ, \tilde{x})] & = (\prod_{j = 1}^{m} p (x_{j})) E [r^{(m)} (φ, {\tilde{ξ}}_{\tilde{x}}) \tilde{f} ({\tilde{ξ}}_{\tilde{x}})] + O ({\overset{˘}{b}}^{1 / 2}) \end{matrix}

\begin{matrix} = (\prod_{j = 1}^{m} p (x_{j})) E [R (φ, {\tilde{ξ}}_{\tilde{x}})] + O ({\overset{˘}{b}}^{1 / 2}), \end{matrix}

(12.28)

\begin{matrix} E [u_{n, 1}^{(miss)} (1, \tilde{x})] & = (\prod_{j = 1}^{m} p (x_{j})) E [\tilde{f} ({\tilde{ξ}}_{\tilde{x}})] + O ({\overset{˘}{b}}^{1 / 2}) . \end{matrix}

(12.29)

We now analyze the quantity

E [R (φ, {\tilde{ξ}}_{\tilde{x}})]

. A second-order Taylor expansion of

R (φ, \cdot)

around

\tilde{x}

gives

\begin{matrix} R (φ, {\tilde{ξ}}_{\tilde{x}}) = & R (φ, \tilde{x}) + \sum_{i = 1}^{m} \sum_{ℓ = 1}^{d} \frac{\partial R (φ, \tilde{x})}{\partial x_{i ℓ}} (ξ_{x_{i ℓ}} - x_{i ℓ}) \\ + \frac{1}{2} \sum_{i = 1}^{m} \sum_{ℓ = 1}^{d} \frac{\partial^{2} R (φ, \tilde{x})}{\partial x_{i ℓ}^{2}} {(ξ_{x_{i ℓ}} - x_{i ℓ})}^{2} \\ + \sum_{i \neq j} \sum_{ℓ, r} \frac{\partial^{2} R (φ, \tilde{x})}{\partial x_{i ℓ} \partial x_{j r}} (ξ_{x_{i ℓ}} - x_{i ℓ}) (ξ_{x_{j r}} - x_{j r}) + R_{n} ({\tilde{ξ}}_{\tilde{x}}), \end{matrix}

(12.30)

where the remainder

R_{n} (\cdot)

satisfies

| R_{n} ({\tilde{ξ}}_{\tilde{x}}) | = O (∥ {\tilde{ξ}}_{\tilde{x}} - \tilde{x} ∥^{3})

under the smoothness condition (C.2). For

ξ_{x} \sim Dirichlet (α, β)

with

α = x / \overset{˘}{b} + 1

and

β = (1 - {∥ x ∥}_{1}) / \overset{˘}{b} + 1

, the following moment properties hold uniformly for

x \in S_{d, 1}^{m} (δ)

(see [164] and references therein):

\begin{matrix} E [ξ_{x_{ℓ}}] & = x_{ℓ} + \overset{˘}{b} (1 - (d + 1) x_{ℓ}) + O ({\overset{˘}{b}}^{2}), \end{matrix}

(12.31)

\begin{matrix} Var (ξ_{x_{ℓ}}) & = \overset{˘}{b} x_{ℓ} (1 - x_{ℓ}) + O ({\overset{˘}{b}}^{2}), \end{matrix}

(12.32)

\begin{matrix} Cov (ξ_{x_{ℓ}}, ξ_{x_{k}}) & = - \overset{˘}{b} x_{ℓ} x_{k} + O ({\overset{˘}{b}}^{2}), ℓ \neq k, \end{matrix}

(12.33)

\begin{matrix} E [{(ξ_{x_{ℓ}} - x_{ℓ})}^{2}] & = \overset{˘}{b} x_{ℓ} (1 - x_{ℓ}) + O ({\overset{˘}{b}}^{2}), \end{matrix}

(12.34)

\begin{matrix} E [| ξ_{x_{ℓ}} - x_{ℓ} |^{3}] & = O ({\overset{˘}{b}}^{3 / 2}) . \end{matrix}

(12.35)
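
The expansions (12.31)–(12.33) can be compared with the exact Dirichlet moments, which are available in closed form: the total concentration equals 1/b + d + 1, the mean is the normalized parameter vector, and the covariance is (1{i = j} m_i - m_i m_j) divided by the total concentration plus one. A minimal Python sketch follows; the function name is an illustrative assumption.

```python
def dirichlet_kernel_moments(x, b):
    """Exact mean and covariance of Dirichlet(x / b + 1, (1 - sum(x)) / b + 1).

    x is a point of the d-dimensional simplex (a list of d coordinates) and b
    is the bandwidth; the last parameter corresponds to the slack coordinate.
    """
    d = len(x)
    alpha = [xi / b + 1.0 for xi in x]
    beta = (1.0 - sum(x)) / b + 1.0
    total = sum(alpha) + beta                      # equals 1 / b + d + 1
    mean = [a / total for a in alpha]
    cov = [
        [mean[i] * ((1.0 if i == j else 0.0) - mean[j]) / (total + 1.0) for j in range(d)]
        for i in range(d)
    ]
    return mean, cov
```

At x = (0.2, 0.3) and b = 0.01, the exact mean of the first coordinate is 0.21/1.03 ≈ 0.20388, while the expansion (12.31) gives 0.204; the gap is of order b², as stated.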

Consequently, for the m-tuple

{\tilde{ξ}}_{\tilde{x}}

, we have

E [ξ_{x_{i ℓ}} - x_{i ℓ}] = O (\overset{˘}{b}), E [{(ξ_{x_{i ℓ}} - x_{i ℓ})}^{2}] = O (\overset{˘}{b}), E [| ξ_{x_{i ℓ}} - x_{i ℓ} |^{3}] = O ({\overset{˘}{b}}^{3 / 2}),

and for

i \neq j

or

ℓ \neq r

,

E [(ξ_{x_{i ℓ}} - x_{i ℓ}) (ξ_{x_{j r}} - x_{j r})] = O (\overset{˘}{b}) .

Taking expectations in (12.30) and using the moment estimates (12.31)–(12.35), we obtain

\begin{matrix} E [R (φ, {\tilde{ξ}}_{\tilde{x}})] - R (φ, \tilde{x}) = & \sum_{i = 1}^{m} \sum_{ℓ = 1}^{d} \frac{\partial R (φ, \tilde{x})}{\partial x_{i ℓ}} O (\overset{˘}{b}) \\ + \frac{1}{2} \sum_{i = 1}^{m} \sum_{ℓ = 1}^{d} \frac{\partial^{2} R (φ, \tilde{x})}{\partial x_{i ℓ}^{2}} O (\overset{˘}{b}) \\ + \sum_{i \neq j} \sum_{ℓ, r} \frac{\partial^{2} R (φ, \tilde{x})}{\partial x_{i ℓ} \partial x_{j r}} O (\overset{˘}{b}) + O ({\overset{˘}{b}}^{3 / 2}) . \end{matrix}

(12.36)

Under condition (C.2), all partial derivatives of

R (φ, \cdot)

are uniformly bounded on

S_{d, 1}^{m}

. Therefore,

|E [R (φ, {\tilde{ξ}}_{\tilde{x}})] - R (φ, \tilde{x})| = O (\overset{˘}{b}) .

Because

0 < \overset{˘}{b} < 1

, this bound is in particular of order

O ({\overset{˘}{b}}^{1 / 2})

, which is the rate claimed in the theorem. The same order can be read off directly from the second moments, without using the explicit form of the linear term

E [ξ_{x_{i ℓ}} - x_{i ℓ}] = \overset{˘}{b} (1 - (d + 1) x_{i ℓ}) + O ({\overset{˘}{b}}^{2})

: by the Cauchy–Schwarz inequality,

|E [ξ_{x_{i ℓ}} - x_{i ℓ}]| \leq \sqrt{E [{(ξ_{x_{i ℓ}} - x_{i ℓ})}^{2}]} = O ({\overset{˘}{b}}^{1 / 2}),

and similarly for the cross-terms. Consequently,

sup_{\tilde{x} \in S_{d, 1}^{m}} |E [R (φ, {\tilde{ξ}}_{\tilde{x}})] - R (φ, \tilde{x})| = O ({\overset{˘}{b}}^{1 / 2}) .

(12.37)

Substituting (12.37) into (12.28) and (12.29), we obtain

\begin{matrix} E [u_{n, 1}^{(miss)} (φ, \tilde{x})] & = (\prod_{j = 1}^{m} p (x_{j})) R (φ, \tilde{x}) + O ({\overset{˘}{b}}^{1 / 2}), \end{matrix}

(12.38)

\begin{matrix} E [u_{n, 1}^{(miss)} (1, \tilde{x})] & = (\prod_{j = 1}^{m} p (x_{j})) \tilde{f} (\tilde{x}) + O ({\overset{˘}{b}}^{1 / 2}) . \end{matrix}

(12.39)

Therefore, the bias criteria (12.25) and (12.26) are satisfied. Finally, using (11.3) and the fact that

E [u_{n, 1}^{(miss)} (1, \tilde{x})]

is uniformly bounded away from zero (by the positivity condition and the uniform convergence of

\tilde{f} (\tilde{x})

on

S_{d, 1}^{m}

), we deduce

sup_{\tilde{x} \in S_{d, 1}^{m}} |\hat{E} [{\hat{r}}_{n, 1}^{(m), (miss)} (φ, \tilde{x}; {\bar{Λ}}_{n, 1} (\tilde{x}))] - r^{(m)} (φ, \tilde{x})| = O ({\overset{˘}{b}}^{1 / 2}) .

This completes the proof of Theorem 4. □

Remark 21.

The cancellation of the propensity score product

\prod_{j = 1}^{m} p (x_{j})

in the ratio is exact at the leading order, as shown by (12.38) and (12.39). This cancellation is a fundamental consequence of the MAR assumption and the fact that the same missingness indicators appear in both the numerator and denominator of the estimator. The residual terms involving derivatives of

p (\cdot)

are of order

O ({\overset{˘}{b}}^{1 / 2})

and are asymptotically negligible under the bandwidth condition

n {\overset{˘}{b}}^{(d + 2) / 4} \to 0

. This justifies the claim made in Remark 5.
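
The cancellation described in this remark can be illustrated numerically in the simplest case d = 1 and m = 1, where the Dirichlet kernel reduces to a beta density. The Python sketch below approximates the ratio of the two integrals in (12.27) by a midpoint rule for a given propensity score; the function names, the grid size, and the particular choices of r, p and f used in the example afterwards are hypothetical.

```python
import math


def beta_kernel(u, x, b):
    """Density of Beta(x / b + 1, (1 - x) / b + 1) at u in (0, 1)."""
    a, c = x / b + 1.0, (1.0 - x) / b + 1.0
    log_norm = math.lgamma(a + c) - math.lgamma(a) - math.lgamma(c)
    return math.exp(log_norm + (a - 1.0) * math.log(u) + (c - 1.0) * math.log(1.0 - u))


def expected_ratio(x, b, r, p, f, grid=4000):
    """Midpoint-rule approximation of E[u(phi, x)] / E[u(1, x)] for d = m = 1."""
    num = den = 0.0
    for k in range(grid):
        u = (k + 0.5) / grid                       # midpoints stay strictly inside (0, 1)
        w = p(u) * f(u) * beta_kernel(u, x, b)
        num += r(u) * w
        den += w
    return num / den
```

With r(u) = u², a uniform design density, x = 0.5 and b = 0.01, the ratio stays close to r(x) = 0.25 both for a constant propensity score and for p(u) = 0.5 + 0.4u; the two values differ only through terms involving the slope of p.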

12.2. Proofs of the Results of Section 3.3

Before we start the proofs of this section, we will state some lemmas that are necessary to obtain the desired results. It is worth mentioning that we will follow the steps of [60] while making the appropriate changes to fit our general setting, including the incorporation of missing responses under the MAR assumption (2.4) and the use of Dirichlet kernels.

Lemma 1.

Under assumptions (A.1)–(A.4), the MAR assumption (2.4) with positivity condition

{inf}_{x \in X} p (x) \geq c > 0

, and if

E φ^{2} < \infty

, the Hájek projection

{\hat{U}}_{n, 1}

of

U_{n, 1}^{(miss)}

satisfies, as

n \to \infty

:

(i): $lim_{n \to \infty} E {[\sqrt{n {\overset{˘}{b}}^{d / 2}} ({\hat{U}}_{n, 1} - θ_{n})]}^{2} = σ^{2} (φ),$

where

$σ^{2} (φ) : = \sum_{i = 1}^{m} \sum_{j = 1}^{m} 1_{{x_{i} = x_{j}}} r_{i j} (\tilde{x}) \frac{1}{p {(x_{i})}^{2} f (x_{i})} \int K_{(α, β)}^{2} (x, t) d t > 0 .$

(12.40)
(ii): and if, in addition, assumption (A.5) is verified, we have

$\sqrt{n {\overset{˘}{b}}^{d / 2}} ({\hat{U}}_{n, 1} - θ_{n}) \overset{D}{⟶} N (0, σ^{2} (φ)) .$

(12.41)
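
The integral of the squared kernel appearing in (12.40) can be evaluated in closed form in the case d = 1, where the kernel is the Beta(α, β) density and the integral equals B(2α - 1, 2β - 1) / B(α, β)². The Python sketch below computes this quantity multiplied by the square root of the bandwidth; the function names are illustrative, and the interior limit 1/√(4πx(1 - x)) used for comparison is the classical beta-kernel approximation, quoted here as background rather than taken from this section.

```python
import math


def log_beta(a, c):
    """Logarithm of the Beta function B(a, c)."""
    return math.lgamma(a) + math.lgamma(c) - math.lgamma(a + c)


def scaled_kernel_l2(x, b):
    """sqrt(b) times the integral of K^2 for K the Beta(x / b + 1, (1 - x) / b + 1) density."""
    a, c = x / b + 1.0, (1.0 - x) / b + 1.0
    log_integral = log_beta(2.0 * a - 1.0, 2.0 * c - 1.0) - 2.0 * log_beta(a, c)
    return math.sqrt(b) * math.exp(log_integral)
```

Working on the log scale avoids overflow in the Gamma functions when the bandwidth is small and the parameters α and β are large.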

Remark 22.

In the presence of missing responses under MAR, the Hájek projection is defined conditionally on the observed data, with the missingness indicators properly accounted for in the kernel definitions. The asymptotic variance

σ^{2} (φ)

remains unchanged from the complete-data case because the propensity score

p (\cdot)

cancels out in the projection due to the same cancellation mechanism detailed in Remark 5. This is a consequence of the fact that both the numerator and denominator of the estimator contain the same product of missingness indicators.

In the following lemma, we show that

U_{n, 1}^{(miss)}

has the same asymptotic distribution as

{\hat{U}}_{n, 1}

.

Lemma 2.

Under assumptions (A.1)–(A.6) and the MAR assumption (2.4) with positivity condition, we have, as

n \to \infty

,

\sqrt{n {\overset{˘}{b}}^{d / 2}} (U_{n, 1}^{(miss)} - θ_{n}) \overset{D}{⟶} N (0, σ^{2} (φ)) .

(12.42)

Specification of

σ^{2} (φ)

leads to the following lemma.

Lemma 3

([60]). Under assumptions (A.1)–(A.6) and the MAR assumption (2.4) with positivity condition, we have, as

n \to \infty

,

{(n {\overset{˘}{b}}^{d / 2})}^{1 / 2} [U_{n, 1}^{(miss)} (φ_{1}, \tilde{x}) - θ_{n} (φ_{1}), U_{n, 1}^{(miss)} (φ_{2}, \tilde{x}) - θ_{n} (φ_{2})] \to N (0, Σ),

in distribution, with

Σ = [\begin{matrix} σ^{2} (φ_{1}, φ_{1}) & σ^{2} (φ_{1}, φ_{2}) \\ σ^{2} (φ_{1}, φ_{2}) & σ^{2} (φ_{2}, φ_{2}) \end{matrix}],

and where for two functions

g_{1} (\cdot)

and

g_{2} (\cdot)

,

σ^{2} (g_{1}, g_{2}) = \sum_{j = 1}^{m} \sum_{l = 1}^{m} 1_{\{x_{j} = x_{l}\}} r_{j l}^{g_{1} g_{2}} (\tilde{x}) \int K_{(α, β)}^{2} (x, t) d t / f_{X} (x_{j}),

and

r_{j l}^{g_{1} g_{2}} (\tilde{x}) = E [g_{1} (Y_{1}, \dots, Y, \dots, Y_{m}) g_{2} (Y_{m + 1}, \dots, Y, \dots, Y_{2 m}) ∣ \dots],

with

Y

entering in the

j^{th}

and

l^{th}

positions.

Proof of Lemma 1.

Throughout this proof, we work under the MAR assumption (2.4) with the positivity condition

{inf}_{x \in X} p (x) \geq c > 0

. All expectations are taken with respect to the joint distribution of

(X, Y, δ)

, with the understanding that the missingness indicators are incorporated into the kernels as per (3.5). For notational simplicity, we denote

U_{n, 1} = U_{n, 1}^{(miss)}

. Recall that

U_{n, 1} (φ, \tilde{x}) = \frac{u_{n, 1}^{(miss)} (φ, \tilde{x})}{N^{(miss)}},

where

N^{(miss)} = \prod_{j = 1}^{m} E [δ K_{(α_{j}, β_{j})} (X)] = \prod_{j = 1}^{m} E [p (X) K_{(α_{j}, β_{j})} (X)]

. The centering term

θ_{n}

is defined as

\begin{matrix} θ_{n} & : = E [U_{n, 1} (φ, \tilde{x})] \\ & = {(N^{(miss)})}^{- 1} \int_{S_{d, 1}^{m}} r^{(m)} (φ, \tilde{t}) (\prod_{j = 1}^{m} p (t_{j})) \tilde{f} (\tilde{t}) {\tilde{K}}_{{\bar{Λ}}_{n, 1} (\tilde{x})} (\tilde{t}) d \tilde{t} \\ & = {(N^{(miss)})}^{- 1} \int_{S_{d, 1}^{m}} R (φ, \tilde{t}) (\prod_{j = 1}^{m} p (t_{j})) \prod_{i = 1}^{m} K_{(α_{i}, β_{i})} (t_{i}) d \tilde{t} . \end{matrix}

(12.43)

Under the smoothness condition (C.2) and the continuity of

p (\cdot)

, a Taylor expansion yields

θ_{n} = r^{(m)} (φ, \tilde{x}) + O ({\overset{˘}{b}}^{1 / 2})

, but this refinement is not needed for the asymptotic variance calculation. The Hájek projection

{\hat{U}}_{n, 1} (φ, \tilde{x})

of

U_{n, 1} (φ, \tilde{x})

is the best (in the mean square sense) linear approximation based on the individual observations. It satisfies

{\hat{U}}_{n, 1} - θ_{n} = \frac{1}{n} \sum_{i = 1}^{n} {\bar{φ}}_{n} (X_{i}, Y_{i}, δ_{i}),

where the projection kernel

{\bar{φ}}_{n}

is defined by

{\bar{φ}}_{n} (x, y, δ) = \sum_{j = 1}^{m} [φ_{n, j} (x, y, δ) - θ_{n}],

and for each

j \in \{1, \dots, m\}

,

\begin{matrix} φ_{n, j} (x, y, δ) & : = {(N^{(miss)})}^{- 1} \int_{S_{d, 1}^{m - 1} \times R^{q (m - 1)}} φ (y_{1}, \dots, y_{j - 1}, y, y_{j + 1}, \dots, y_{m}) \\ \times (\prod_{\begin{matrix} r = 1 \\ r \neq j \end{matrix}}^{m} δ_{r} K_{(α_{r}, β_{r})} (x_{r})) K_{(α_{j}, β_{j})} (x) d P^{\otimes (m - 1)}, \end{matrix}

(12.44)

where the integration is with respect to the product measure of the remaining

m - 1

independent copies

(X_{r}, Y_{r}, δ_{r})

,

r \neq j

, and

P

denotes the underlying probability distribution of

(X, Y, δ)

. Note that the missingness indicators

δ_{r}

for

r \neq j

are integrated out, while the indicator

δ

corresponding to the argument

(x, y)

remains explicit. By independence of the observations,

n E [{({\hat{U}}_{n, 1} - θ_{n})}^{2}] = E [{\bar{φ}}_{n}^{2} (X, Y, δ)] .

Expanding the square and using the linearity of expectation,

E [{\bar{φ}}_{n}^{2}] = \sum_{j = 1}^{m} \sum_{l = 1}^{m} E [(φ_{n, j} - θ_{n}) (φ_{n, l} - θ_{n})] .

Since

θ_{n}

is deterministic and bounded (by the boundedness of

r^{(m)} (φ, \cdot)

and the normalization), and each projection term φ_{n, j} (X, Y, δ) has expectation θ_{n}, we have

E [(φ_{n, j} - θ_{n}) (φ_{n, l} - θ_{n})] = E [φ_{n, j} φ_{n, l}] - θ_{n}^{2} .

Moreover,

θ_{n}^{2} = O (1)

, and we will show that

{\overset{˘}{b}}^{d / 2} θ_{n}^{2} \to 0

as

n \to \infty

(since

\overset{˘}{b} \to 0

), so the contribution of

θ_{n}^{2}

is asymptotically negligible in the scaled variance. Consider two indices

j \neq l

with

x_{j} \neq x_{l}

. Then,

\begin{matrix} E [φ_{n, j} φ_{n, l}] & = {(N^{(miss)})}^{- 2} \int_{S_{d, 1}^{2 m} \times R^{2 q m}} φ (y_{1}, \dots, y_{j - 1}, y, y_{j + 1}, \dots, y_{m}) \\ & \times φ (y_{m + 1}, \dots, y_{m + l - 1}, y, y_{m + l + 1}, \dots, y_{2 m}) \\ & \times (\prod_{\begin{matrix} r = 1 \\ r \neq j \end{matrix}}^{m} K_{(α_{r}, β_{r})} (x_{r})) (\prod_{\begin{matrix} s = 1 \\ s \neq l \end{matrix}}^{m} K_{(α_{s}, β_{s})} (x_{m + s})) \\ & \times K_{(α_{j}, β_{j})} (x) K_{(α_{l}, β_{l})} (x) d P^{\otimes (2 m)} . \end{matrix}

(12.45)

The key observation is that the product

K_{(α_{j}, β_{j})} (x) K_{(α_{l}, β_{l})} (x)

involves two kernels centered at different points

x_{j}

and

x_{l}

. As

\overset{˘}{b} \to 0

, each kernel concentrates around its respective center. Since

x_{j} \neq x_{l}

, the supports of these kernels become asymptotically disjoint, and their product converges to zero in the sense of distributions. More precisely, under assumption (A.2) and the continuity of f,

\int_{S_{d, 1}} K_{(α_{j}, β_{j})} (x) K_{(α_{l}, β_{l})} (x) f (x) d x ⟶ 0 as \overset{˘}{b} \to 0 .

Consequently,

E [φ_{n, j} φ_{n, l}] = o (1), x_{j} \neq x_{l} .
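As a purely numerical illustration of this vanishing overlap, the following Python sketch evaluates the integral of the product of two kernels in closed form for d = 1, where the Dirichlet kernel reduces to a beta density with parameters (x / b + 1, (1 - x) / b + 1). The uniform design density f ≡ 1, the centers 0.3 and 0.6, and the function names are assumptions made only for this example; it is not part of the proof.

```python
import math

def log_beta(a, c):
    """Logarithm of the Beta function B(a, c)."""
    return math.lgamma(a) + math.lgamma(c) - math.lgamma(a + c)

def kernel_overlap(x1, x2, b):
    """Integral over [0, 1] of the product of two beta kernels centered at x1 and x2.

    Each kernel is the Beta(x/b + 1, (1 - x)/b + 1) density, so the product
    integrates to B(a1 + a2 - 1, c1 + c2 - 1) / (B(a1, c1) * B(a2, c2)).
    """
    a1, c1 = x1 / b + 1.0, (1.0 - x1) / b + 1.0
    a2, c2 = x2 / b + 1.0, (1.0 - x2) / b + 1.0
    log_val = (log_beta(a1 + a2 - 1.0, c1 + c2 - 1.0)
               - log_beta(a1, c1) - log_beta(a2, c2))
    return math.exp(log_val)

# Illustrative bandwidths (assumed values); the overlap shrinks as b decreases.
bandwidths = (0.1, 0.01, 0.001)
overlaps = [kernel_overlap(0.3, 0.6, b) for b in bandwidths]
for b, val in zip(bandwidths, overlaps):
    print(f"b = {b:<6} overlap = {val:.3e}")
```

For distinct centers the printed values decrease rapidly with the bandwidth, which is the behavior the argument above relies on.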

Now, suppose

x_{j} = x_{l} = x_{0}

. Then, both kernels are centered at the same point. Using a change of variables and the properties of the Dirichlet kernel, we have

\begin{matrix} E [φ_{n, j} φ_{n, l}] & = {(N^{(miss)})}^{- 2} \int_{S_{d, 1}^{2 m} \times R^{2 q m}} φ (\dots) φ (\dots) \\ \times (\prod_{\begin{matrix} r = 1 \\ r \neq j \end{matrix}}^{m} K_{(α_{r}, β_{r})} (x_{r})) (\prod_{\begin{matrix} s = 1 \\ s \neq l \end{matrix}}^{m} K_{(α_{s}, β_{s})} (x_{m + s})) \\ \times K_{(α_{j}, β_{j})} (x) K_{(α_{l}, β_{l})} (x) d P^{\otimes (2 m)} . \end{matrix}

(12.46)

By independence of the coordinates and the resulting factorization of the kernel integrals,

E [φ_{n, j} φ_{n, l}] \sim {(N^{(miss)})}^{- 2} \cdot E [K_{(α_{j}, β_{j})} (X) K_{(α_{l}, β_{l})} (X)] \cdot r_{j l} (\tilde{x}) \cdot {(\prod_{\begin{matrix} r = 1 \\ r \neq j \end{matrix}}^{m} E [K_{(α_{r}, β_{r})} (X)])}^{2},

where the higher-order terms are negligible. Using the fact that

N^{(miss)} = \prod_{r = 1}^{m} E [p (X) K_{(α_{r}, β_{r})} (X)],

we obtain

E [φ_{n, j} φ_{n, l}] \sim r_{j l} (\tilde{x}) \cdot \frac{E [K_{(α_{j}, β_{j})} (X) K_{(α_{l}, β_{l})} (X)]}{{(E [p (X) K_{(α_{j}, β_{j})} (X)])}^{2}} .

For a fixed

x_{0} \in S_{d, 1}

, we analyze the ratio

R_{n} : = \frac{E [K_{(α, β)} {(X)}^{2}]}{{(E [p (X) K_{(α, β)} (X)])}^{2}} .

Using the probabilistic representation with Dirichlet random vectors, we have

E [p (X) K_{(α, β)} (X)] = E [p (ξ_{x_{0}}) f (ξ_{x_{0}})],

where

ξ_{x_{0}} \sim Dirichlet (α, β)

. Similarly,

E [K_{(α, β)} {(X)}^{2}] = A_{b} (x_{0}) E [f (γ_{x_{0}})],

with

γ_{x_{0}} \sim Dirichlet (2 α, 2 β)

and

A_{b} (x_{0}) = \frac{Γ (\frac{2 (1 - ∥ x_{0} ∥_{1})}{\overset{˘}{b}} + 1) \prod_{i = 1}^{d} Γ (\frac{2 x_{0 i}}{\overset{˘}{b}} + 1)}{Γ^{2} (\frac{1 - ∥ x_{0} ∥_{1}}{\overset{˘}{b}} + 1) \prod_{i = 1}^{d} Γ^{2} (\frac{x_{0 i}}{\overset{˘}{b}} + 1)} \cdot \frac{Γ^{2} (\frac{1}{\overset{˘}{b}} + d + 1)}{Γ (\frac{2}{\overset{˘}{b}} + d + 1)} .

By the continuity of f and p, and using the concentration properties of the Dirichlet distribution, we have

E [p (ξ_{x_{0}}) f (ξ_{x_{0}})] = p (x_{0}) f (x_{0}) + O ({\overset{˘}{b}}^{1 / 2}),

and

E [f (γ_{x_{0}})] = f (x_{0}) + O ({\overset{˘}{b}}^{1 / 2}) .

Moreover, using Stirling’s approximation for the gamma function, one can show that

A_{b} (x_{0}) \sim \frac{1}{{\overset{˘}{b}}^{d / 2}} \cdot \frac{1}{{(4 π)}^{d / 2} \sqrt{(1 - ∥ x_{0} ∥_{1}) \prod_{i = 1}^{d} x_{0 i}}} as \overset{˘}{b} \to 0 .

Consequently,

R_{n} \sim \frac{A_{b} (x_{0}) f (x_{0})}{p {(x_{0})}^{2} f {(x_{0})}^{2}} \sim \frac{1}{{\overset{˘}{b}}^{d / 2}} \cdot \frac{1}{p {(x_{0})}^{2} f (x_{0})} \cdot \frac{1}{{(4 π)}^{d / 2} \sqrt{(1 - ∥ x_{0} ∥_{1}) \prod_{i = 1}^{d} x_{0 i}}} .

Note that the integral of the squared kernel has the same asymptotic behavior:

\int_{S_{d, 1}} K_{(α, β)}^{2} (x, t) d t \sim \frac{1}{{\overset{˘}{b}}^{d / 2}} \cdot \frac{1}{{(4 π)}^{d / 2} \sqrt{(1 - ∥ x_{0} ∥_{1}) \prod_{i = 1}^{d} x_{0 i}}} as \overset{˘}{b} \to 0 .

Therefore,

R_{n} \sim \frac{1}{p {(x_{0})}^{2} f (x_{0})} \int_{S_{d, 1}} K_{(α, β)}^{2} (x, t) d t .
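The Stirling step can be checked numerically. The sketch below evaluates the gamma-function ratio A_b(x_0) on the log scale and compares it with the leading term b^{-d/2} (4π)^{-d/2} / sqrt((1 - ‖x_0‖_1) ∏ x_{0i}) obtained from Stirling's formula; in d = 1 this reference value reduces to the classical beta-kernel constant 1 / (2 sqrt(π) sqrt(x_0 (1 - x_0))). The evaluation points, the bandwidth, and the function names are assumptions chosen for illustration only.

```python
import math

def log_A_b(x0, b):
    """log A_b(x0) for the Dirichlet kernel, computed with log-gamma for stability."""
    d = len(x0)
    weights = list(x0) + [1.0 - sum(x0)]          # (x_1, ..., x_d, 1 - |x|_1)
    num = sum(math.lgamma(2.0 * w / b + 1.0) for w in weights)
    den = sum(2.0 * math.lgamma(w / b + 1.0) for w in weights)
    norm = 2.0 * math.lgamma(1.0 / b + d + 1.0) - math.lgamma(2.0 / b + d + 1.0)
    return num - den + norm

def stirling_leading_term(x0, b):
    """Leading-order approximation b^{-d/2} (4 pi)^{-d/2} / sqrt((1 - |x|_1) prod x_i)."""
    d = len(x0)
    prod = (1.0 - sum(x0)) * math.prod(x0)
    return b ** (-d / 2) * (4.0 * math.pi) ** (-d / 2) / math.sqrt(prod)

def ratio(x0, b):
    """A_b(x0) divided by its leading-order approximation; should approach 1 as b -> 0."""
    return math.exp(log_A_b(x0, b) - math.log(stirling_leading_term(x0, b)))

for x0 in ([0.5], [0.3, 0.2]):
    print(x0, [round(ratio(x0, b), 5) for b in (1e-2, 1e-3, 1e-4)])
```

Working with `math.lgamma` avoids overflow of the individual gamma factors, which grow extremely fast as the bandwidth decreases.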

Returning to the expression for

E [φ_{n, j} φ_{n, l}]

, we obtain

E [φ_{n, j} φ_{n, l}] \sim r_{j l} (\tilde{x}) \cdot \frac{1}{p {(x_{0})}^{2} f (x_{0})} \int_{S_{d, 1}} K_{(α, β)}^{2} (x, t) d t .

Since

θ_{n}^{2} = O (1)

and

{\overset{˘}{b}}^{d / 2} \to 0

, we have

{\overset{˘}{b}}^{d / 2} θ_{n}^{2} \to 0

. Therefore,

\begin{matrix} & \lim_{n \to \infty} E {[\sqrt{n {\overset{˘}{b}}^{d / 2}} ({\hat{U}}_{n, 1} - θ_{n})]}^{2} \\ & = \lim_{n \to \infty} {\overset{˘}{b}}^{d / 2} E [{\bar{φ}}_{n}^{2}] \\ & = \lim_{n \to \infty} {\overset{˘}{b}}^{d / 2} \sum_{j = 1}^{m} \sum_{l = 1}^{m} E [φ_{n, j} φ_{n, l}] \\ & = \sum_{j = 1}^{m} \sum_{l = 1}^{m} 1_{\{x_{j} = x_{l}\}} r_{j l} (\tilde{x}) \frac{1}{p {(x_{j})}^{2} f (x_{j})} \int K_{(α, β)}^{2} (x, t) d t \cdot \lim_{n \to \infty} {\overset{˘}{b}}^{d / 2} \cdot \frac{1}{{\overset{˘}{b}}^{d / 2}} \\ & = \sum_{i = 1}^{m} \sum_{j = 1}^{m} 1_{\{x_{i} = x_{j}\}} r_{i j} (\tilde{x}) \frac{1}{p {(x_{i})}^{2} f (x_{i})} \int K_{(α, β)}^{2} (x, t) d t . \end{matrix}

This matches the expression in (12.40) after noting that the factor

1 / p {(x_{i})}^{2}

in the variance of the projection will later combine with the factor

p {(x_{i})}^{2}

from the normalization to yield the final variance

σ^{2} (φ)

as defined. The cancellation occurs because

U_{n, 1}

includes the factor

{(N^{(miss)})}^{- 1}

, which contains

\prod p (x_{j})

in its denominator. This completes the proof of part (i).

To establish part (ii), we verify Lyapunov’s condition for the triangular array

\{n^{- 1 / 2} {\bar{φ}}_{n} (X_{i}, Y_{i}, δ_{i})\}_{i = 1}^{n}

. It suffices to show that

\frac{1}{n^{1 / 2}} {({\overset{˘}{b}}^{d / 2})}^{3 / 2} E [{| {\bar{φ}}_{n} (X, Y, δ) |}^{3}] ⟶ 0 as n \to \infty .

Using the inequality

{| a - b |}^{3} \leq 3 ({| a |}^{3} + {| a |}^{2} | b | + | a | {| b |}^{2} + {| b |}^{3})

, we obtain

E [| {\bar{φ}}_{n} |^{3}] \leq C \sum_{i, j, l = 1}^{m} E [| φ_{n, i} φ_{n, j} φ_{n, l} |] + O (1),

where C is an absolute constant. By symmetry and the boundedness of

θ_{n}

, the dominant contributions come from triples

(i, j, l)

where

x_{i} = x_{j} = x_{l}

. Under assumption (A.5), we have

E [| φ_{n, i} φ_{n, j} φ_{n, l} |] = O ({\overset{˘}{b}}^{- d}) .

Therefore,

\frac{1}{n^{1 / 2}} {({\overset{˘}{b}}^{d / 2})}^{3 / 2} E [| {\bar{φ}}_{n} |^{3}] = O (\frac{1}{n^{1 / 2}} {\overset{˘}{b}}^{3 d / 4} \cdot {\overset{˘}{b}}^{- d}) = O (\frac{1}{n^{1 / 2}} {\overset{˘}{b}}^{- d / 4}) .

Under the bandwidth condition

{\overset{˘}{b}}^{- d} = o (n)

(which is implied by

n {\overset{˘}{b}}^{d} \to \infty

), we have

\frac{1}{n^{1 / 2}} {\overset{˘}{b}}^{- d / 4} \to 0

. Hence, Lyapunov’s condition is satisfied. By the Lyapunov central limit theorem for triangular arrays, we conclude that

\sqrt{n {\overset{˘}{b}}^{d / 2}} ({\hat{U}}_{n, 1} - θ_{n}) \overset{D}{⟶} N (0, σ^{2} (φ)),

which proves part (ii). □

Proof of Lemma 2.

To study the asymptotic distribution of

U_{n}

, we need to bound the variance of

U_{n} - {\hat{U}}_{n, 1}

. To do that, it is sufficient to show that

{(n {\overset{˘}{b}}^{d / 2})}^{1 / 2} [U_{n, 1} - {\hat{U}}_{n, 1}] \to 0 in L^{2} .

As in [60], we use the variance formula for a centered (zero-mean) U-statistic of degree m: for

Z_{i}, i \geq 1

i.i.d., we have

V_{n} = \frac{(n - m)!}{n!} \sum_{i \in I (m, n)} \frac{\tilde{G} (Z_{i_{1}}, \dots, Z_{i_{m}})}{N},

with a not necessarily symmetric U-kernel

\tilde{G} (\cdot)

, that is square-integrable, which gives us

Var (V_{n}) = {[\frac{(n - m)!}{n!}]}^{2} \sum_{r = 1}^{m} \frac{(n - r)!}{(n - 2 m + r)!} \sum^{(r)} \frac{I (Δ_{1}, Δ_{2})}{N^{2}},

where

Δ_{1}

and

Δ_{2}

represent sets of positions of common length

1 \leq r \leq m

, and

I ({\tilde{Δ}}_{1}, {\tilde{Δ}}_{2}) = \int \tilde{G} (z_{1}, \dots, z_{m}) \tilde{G} (y_{1}, \dots, y_{m}) F (d z_{1}) \dots F (d z_{2 m - r}),

where the y’s in positions

{\tilde{Δ}}_{2}

coincide with the z’s in positions

{\tilde{Δ}}_{1}

and are taken from

z_{m + 1}, \dots, z_{2 m - r}

otherwise. Moreover,

Σ^{(r)}

represents the summation over all positions

{\tilde{Δ}}_{1}, {\tilde{Δ}}_{2}

with a cardinality of r, and

F (\cdot)

denotes the common distribution function of the Z’s. When considering

V_{n} = U_{n} - {\hat{U}}_{n}

, and recalling

\tilde{G}

from [177] (in the symmetric case), we obtain

Σ^{(1)} I ({\tilde{Δ}}_{1}, {\tilde{Δ}}_{2}) = 0 .

Furthermore, by (A.6), we infer that

N^{- 2} I ({\tilde{Δ}}_{1}, {\tilde{Δ}}_{2}) = O ({\overset{˘}{b}}^{- d r / 2}) for each 2 \leq r \leq m .

In conclusion, we have

\begin{matrix} n {\overset{˘}{b}}^{d / 2} Var (U_{n} - {\hat{U}}_{n}) & = O [n {\overset{˘}{b}}^{d / 2} \sum_{r = 2}^{m} {\binom{n}{m}}^{- 1} \binom{m}{r} \binom{n - m}{m - r} {({\overset{˘}{b}}^{d / 2})}^{- r}] \\ & = O [\sum_{r = 2}^{m} {(n {\overset{˘}{b}}^{d / 2})}^{1 - r}] = O [{(n {\overset{˘}{b}}^{d / 2})}^{- 1}] = o (1) . \end{matrix}

Hence, the proof is complete. □
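The passage from the first to the second line of the last display uses the fact that, for fixed m, the combinatorial weight attached to the r-overlap terms is of order n^{-r}. The following Python sketch tabulates n^r times this weight for a small, hypothetical degree m = 3 (an assumption made only for illustration) and shows that the scaled weights stabilize as n grows.

```python
from math import comb

def overlap_weight(n, m, r):
    """Weight C(n, m)^{-1} * C(m, r) * C(n - m, m - r) of the r-overlap terms."""
    return comb(m, r) * comb(n - m, m - r) / comb(n, m)

m = 3  # illustrative degree of the U-statistic (assumed value)
for n in (10**2, 10**3, 10**5):
    # n**r * weight should approach the constant C(m, r) * m! / (m - r)!
    scaled = [n**r * overlap_weight(n, m, r) for r in range(1, m + 1)]
    print(n, [round(s, 4) for s in scaled])
```

Since the scaled weights remain bounded, each term with overlap r contributes at most a constant multiple of n^{-r} times the corresponding kernel integral.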

Proof of Theorem 5.

Throughout this proof, we operate under the MAR assumption (2.4) with the positivity condition

{inf}_{x \in X} p (x) \geq c > 0

, and the regularity conditions (A.1)–(A.4) and (C.2) as specified in Theorem 5. All quantities

U_{n, 1}^{(miss)} (\cdot, \tilde{x})

are understood to incorporate the missingness indicators via the kernel definitions in (3.5) and (3.6). For notational simplicity, we drop the superscript (miss) throughout the proof. The proof proceeds in three main stages. First, we establish the joint asymptotic normality of the vector

V_{n} : = (\begin{matrix} U_{n, 1} (φ_{1}, \tilde{x}) - θ_{n} (φ_{1}) \\ U_{n, 1} (φ_{2}, \tilde{x}) - θ_{n} (φ_{2}) \end{matrix})

using the Cramér–Wold device together with Lemma 2. Second, we apply the delta method to the differentiable transformation

g (x_{1}, x_{2}) = x_{1} / x_{2}

to obtain the asymptotic distribution of the ratio

U_{n, 1} (φ, \tilde{x}) / U_{n, 1} (1, \tilde{x})

. Third, we identify the asymptotic variance

ρ^{2}

and conclude the proof. Let

c_{1}, c_{2} \in R

be arbitrary constants. Consider the linear combination

c_{1} U_{n, 1} (φ_{1}, \tilde{x}) + c_{2} U_{n, 1} (φ_{2}, \tilde{x}) .

By the linearity of the U-statistic in the kernel function, we have

c_{1} U_{n, 1} (φ_{1}, \tilde{x}) + c_{2} U_{n, 1} (φ_{2}, \tilde{x}) = U_{n, 1} (c_{1} φ_{1} + c_{2} φ_{2}, \tilde{x}) = : U_{n, 1} (φ, \tilde{x}),

where

φ : = c_{1} φ_{1} + c_{2} φ_{2}

. This identity holds because the normalization factor

N^{(miss)}

is common to all

U_{n, 1}

and the kernel is linear in

φ

. Indeed, from the definition

U_{n, 1} (φ, \tilde{x}) = \frac{u_{n, 1}^{(miss)} (φ, \tilde{x})}{N^{(miss)}},

and

u_{n, 1}^{(miss)} (\cdot, \tilde{x})

is linear in its argument, we obtain

u_{n, 1}^{(miss)} (c_{1} φ_{1} + c_{2} φ_{2}, \tilde{x}) = c_{1} u_{n, 1}^{(miss)} (φ_{1}, \tilde{x}) + c_{2} u_{n, 1}^{(miss)} (φ_{2}, \tilde{x}) .

Dividing by

N^{(miss)}

yields the claimed linearity. Since

φ = c_{1} φ_{1} + c_{2} φ_{2}

satisfies the same regularity conditions as

φ_{1}

and

φ_{2}

(by linearity of the assumptions), Lemma 2 applies. Consequently, as

n \to \infty

,

\sqrt{n {\overset{˘}{b}}^{d / 2}} (U_{n, 1} (φ, \tilde{x}) - θ_{n} (φ)) \overset{D}{⟶} N (0, σ^{2} (φ)),

where

σ^{2} (φ)

is defined in (12.40). By the linearity of

θ_{n} (\cdot)

(which follows from the linearity of the expectation and the definition of

θ_{n}

), we have

θ_{n} (φ) = c_{1} θ_{n} (φ_{1}) + c_{2} θ_{n} (φ_{2}) .

Moreover, the quadratic form

σ^{2} (φ)

expands as

σ^{2} (φ) = c_{1}^{2} σ^{2} (φ_{1}) + 2 c_{1} c_{2} σ^{2} (φ_{1}, φ_{2}) + c_{2}^{2} σ^{2} (φ_{2}),

where

σ^{2} (φ_{1}, φ_{2})

denotes the asymptotic covariance between

U_{n, 1} (φ_{1}, \tilde{x})

and

U_{n, 1} (φ_{2}, \tilde{x})

, given explicitly in Lemma 3. The Cramér–Wold device (see, e.g., [188]) states that the joint convergence of the vector

V_{n}

follows from the convergence of all linear combinations. Since we have shown that for any

(c_{1}, c_{2}) \in R^{2}

,

\sqrt{n {\overset{˘}{b}}^{d / 2}} (c_{1} (U_{n, 1} (φ_{1}, \tilde{x}) - θ_{n} (φ_{1})) + c_{2} (U_{n, 1} (φ_{2}, \tilde{x}) - θ_{n} (φ_{2}))) \overset{D}{⟶} N (0, c^{⊤} Σ c),

where

c = {(c_{1}, c_{2})}^{⊤}

and

Σ

is the covariance matrix defined in Lemma 3, we conclude that

\sqrt{n {\overset{˘}{b}}^{d / 2}} V_{n} \overset{D}{⟶} N (0, Σ) .

Recall from (3.6) that the estimator of interest admits the representation

{\hat{r}}_{n, 1}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, 1} (\tilde{x})) = \frac{U_{n, 1} (φ, \tilde{x})}{U_{n, 1} (1, \tilde{x})} .

Define the function

g : R^{2} ∖ \{x_{2} = 0\} \to R

by

g (x_{1}, x_{2}) = x_{1} / x_{2}

. This function is continuously differentiable on its domain with gradient

\nabla g (x_{1}, x_{2}) = (\frac{\partial g}{\partial x_{1}}, \frac{\partial g}{\partial x_{2}}) = (\frac{1}{x_{2}}, - \frac{x_{1}}{x_{2}^{2}}) .

Under the MAR assumption and the positivity condition, together with the bias expansion established in Theorem 4, we have

E [U_{n, 1} (1, \tilde{x})] = 1 + O ({\overset{˘}{b}}^{1 / 2}),

and by the uniform consistency result of Theorem 3,

U_{n, 1} (1, \tilde{x}) \overset{P}{⟶} 1 as n \to \infty .

In particular,

U_{n, 1} (1, \tilde{x})

is bounded away from zero with probability tending to one. Therefore, the delta method is applicable. Let

μ_{n} : = {(E [U_{n, 1} (φ, \tilde{x})], E [U_{n, 1} (1, \tilde{x})])}^{⊤}

. By Lemma 3 and the consistency of

U_{n, 1} (1, \tilde{x})

, we have

\sqrt{n {\overset{˘}{b}}^{d / 2}} (\begin{matrix} U_{n, 1} (φ, \tilde{x}) - E [U_{n, 1} (φ, \tilde{x})] \\ U_{n, 1} (1, \tilde{x}) - E [U_{n, 1} (1, \tilde{x})] \end{matrix}) \overset{D}{⟶} N (0, Σ),

where

Σ

is the asymptotic covariance matrix from Lemma 3 with

φ_{1} = φ

and

φ_{2} \equiv 1

. Applying the delta method (see, e.g., [189]), we obtain

\sqrt{n {\overset{˘}{b}}^{d / 2}} (g (U_{n, 1} (φ, \tilde{x}), U_{n, 1} (1, \tilde{x})) - g (E [U_{n, 1} (φ, \tilde{x})], E [U_{n, 1} (1, \tilde{x})])) \overset{D}{⟶} N (0, \nabla g {(μ)}^{⊤} Σ \nabla g (μ)),

where

μ = {lim}_{n \to \infty} μ_{n} = {(r^{(m)} (φ, \tilde{x}), 1)}^{⊤}

, provided this limit exists. The convergence of

μ_{n}

follows from the bias expansions in Theorem 4:

E [U_{n, 1} (φ, \tilde{x})] \to r^{(m)} (φ, \tilde{x}), E [U_{n, 1} (1, \tilde{x})] \to 1 .

Evaluating the gradient at the limit point

μ = (r, 1)

with

r : = r^{(m)} (φ, \tilde{x})

, we have

\nabla g (r, 1) = (\frac{1}{1}, - \frac{r}{1^{2}}) = (1, - r) .

Therefore, the asymptotic variance is given by

ρ^{2} : = \nabla g {(μ)}^{⊤} Σ \nabla g (μ) = (1, - r) (\begin{matrix} σ^{2} (φ) & σ^{2} (φ, 1) \\ σ^{2} (φ, 1) & σ^{2} (1) \end{matrix}) (\begin{matrix} 1 \\ - r \end{matrix}) .

Expanding this quadratic form yields

ρ^{2} = σ^{2} (φ) - 2 r σ^{2} (φ, 1) + r^{2} σ^{2} (1) .
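The expansion of the quadratic form can be verified mechanically. The sketch below computes the product of the gradient (1, -r), a symmetric 2 × 2 matrix, and the gradient again, and compares it with the expanded expression; the numerical entries are arbitrary placeholders chosen for illustration and do not come from the paper.

```python
def ratio_variance(s11, s12, s22, r):
    """Quadratic form grad^T Sigma grad for g(x1, x2) = x1 / x2 evaluated at (r, 1)."""
    grad = (1.0, -r)                       # gradient (1/x2, -x1/x2^2) at the point (r, 1)
    sigma = ((s11, s12), (s12, s22))       # symmetric asymptotic covariance matrix
    return sum(grad[i] * sigma[i][j] * grad[j] for i in range(2) for j in range(2))

def expanded_form(s11, s12, s22, r):
    """Expanded expression s11 - 2 r s12 + r^2 s22."""
    return s11 - 2.0 * r * s12 + r**2 * s22

# Hypothetical values for sigma^2(phi), sigma^2(phi, 1), sigma^2(1) and r.
values = (2.0, 0.5, 1.5, 0.7)
print(ratio_variance(*values), expanded_form(*values))
```

Both functions return the same number for any symmetric input, which confirms the algebra independently of the later simplification of the individual entries.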

From Lemma 3 and the expression for

σ^{2} (\cdot, \cdot)

given therein, we have:

$σ^{2} (φ)$ is given by (12.40) with $r_{i j} (\tilde{x})$ replaced by $E [φ (\dots) φ (\dots) ∣ \dots]$ ;
$σ^{2} (1)$ corresponds to the variance of the constant kernel, which is zero because $U_{n, 1} (1, \tilde{x})$ converges to a constant (in fact, $σ^{2} (1) = 0$ );
$σ^{2} (φ, 1)$ represents the covariance between $U_{n, 1} (φ, \tilde{x})$ and $U_{n, 1} (1, \tilde{x})$ , which vanishes asymptotically because $U_{n, 1} (1, \tilde{x})$ converges to a non-random constant.

More rigorously, using the expression from Lemma 3 with

g_{1} = φ

and

g_{2} \equiv 1

, we have for

i \neq j

or when the cross-terms vanish, the contributions to

σ^{2} (φ, 1)

are zero because the kernel for

g_{2} \equiv 1

integrates to a constant. Consequently,

σ^{2} (φ, 1) = 0, σ^{2} (1) = 0 .

Thus,

ρ^{2} = σ^{2} (φ) .

Putting everything together, we have established that

\sqrt{n {\overset{˘}{b}}^{d / 2}} ({\hat{r}}_{n, 1}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, 1} (\tilde{x})) - E [U_{n, 1} (φ, \tilde{x})]) \overset{D}{⟶} N (0, ρ^{2}),

where

ρ^{2} = σ^{2} (φ)

is given explicitly by (3.19). This completes the proof of Theorem 5. □

Remark 23.

Under the MAR assumption, the asymptotic variance

ρ^{2}

incorporates the propensity score

p (\cdot)

through the factor

1 / p (x_{i})

as shown in (3.19). This reflects the increased variability due to missing responses. However, in the statement of Theorem 5, we have presented

ρ^{2}

in the simplified form (3.19) for readability, with the understanding that the missing data adaptation is as given in (3.19). The proof remains valid with this substitution, as the delta method and the Cramér–Wold device are unaffected by the specific form of the variance, provided the joint asymptotic normality holds.

13. Proof of the Results of Section 4: Bernstein Polynomials

Proof of Theorem 6.

Throughout this proof, we operate under the MAR assumption (2.4) with the positivity condition

inf_{x \in S_{d, 1}} p (x) \geq c > 0,

and the regularity conditions specified in Theorem 6. All estimators are understood to be the complete-case versions incorporating the missingness indicators

δ_{i}

as defined in (3.8) and (3.9). For notational brevity, we write

{\hat{g}}_{n, ϑ} (φ, x)

and

{\hat{f}}_{n, ϑ} (x)

in place of

{\hat{g}}_{n, ϑ}^{(miss)} (φ, x)

and

{\hat{f}}_{n, ϑ}^{(miss)} (x)

, respectively, with the implicit understanding that the missingness indicators are included. To establish Theorem 6, we must demonstrate the following two fundamental estimates:

\begin{matrix} sup_{x \in S_{d, 1}} |E [{\hat{r}}_{n, 2}^{(1)} (φ, x)] - r^{(1)} (φ, x)| = O (ϑ^{- 1 / 2}), \end{matrix}

(13.1)

\begin{matrix} sup_{x \in S_{d, 1}} |{\hat{r}}_{n, 2}^{(1)} (φ, x) - E [{\hat{r}}_{n, 2}^{(1)} (φ, x)]| = O (ϑ^{d - 1 / 2} {(n^{- 1} \log n)}^{1 / 2}) a . s . \end{matrix}

(13.2)

The following proposition provides the bias expansion for the complete-case numerator

{\hat{g}}_{n, ϑ} (φ, x)

under the MAR assumption.

Proposition 2.

Assume that condition (C.2) holds and that the MAR assumption (2.4) is satisfied with

{inf}_{x \in S_{d, 1}} p (x) \geq c > 0

. Then, uniformly for

x \in S_{d, 1}

,

E [{\hat{g}}_{n, ϑ} (φ, x)] = R (φ, x) p (x) + ϑ^{- 1} L (x) + o (ϑ^{- 1}), ϑ \to \infty,

where

\begin{matrix} L (x) & : = \frac{d (d - 1)}{2} R (φ, x) p (x) + \sum_{i = 1}^{d} (\frac{1}{2} - x_{i}) \frac{\partial}{\partial x_{i}} (R (φ, x) p (x)) \\ & + \frac{1}{2} \sum_{i, j = 1}^{d} (x_{i} 1_{\{i = j\}} - x_{i} x_{j}) \frac{\partial^{2}}{\partial x_{i} \partial x_{j}} (R (φ, x) p (x)) . \end{matrix}

Proof of Proposition 2.

By definition of the complete-case estimator under MAR,

E [{\hat{g}}_{n, ϑ} (φ, x)] = E [φ (Y_{1}) δ_{1} K_{x, ϑ} (X_{1})] .

Applying the tower property and the MAR assumption (2.4) yields

\begin{matrix} E [φ (Y_{1}) δ_{1} K_{x, ϑ} (X_{1})] & = & E [p (X_{1}) φ (Y_{1}) K_{x, ϑ} (X_{1})] \\ = & \int_{S_{d, 1}} r^{(1)} (φ, u) p (u) K_{x, ϑ} (u) f (u) d u \\ = & \int_{S_{d, 1}} R (φ, u) p (u) K_{x, ϑ} (u) d u . \end{matrix}

The Bernstein kernel admits the representation (see [129])

K_{x, ϑ} (u) = \frac{(ϑ - 1 + d)!}{(ϑ - 1)!} \sum_{k \in N_{0}^{d} \cap (ϑ - 1) S_{d, 1}} 1_{(\frac{k}{ϑ}, \frac{k + 1}{ϑ}]} (u) P_{k, ϑ - 1} (x),

where

P_{k, ϑ - 1} (x)

are multinomial probabilities. Substituting this representation gives

E [{\hat{g}}_{n, ϑ} (φ, x)] = \frac{(ϑ - 1 + d)!}{(ϑ - 1)!} \sum_{k \in N_{0}^{d} \cap (ϑ - 1) S_{d, 1}} P_{k, ϑ - 1} (x) \int_{(\frac{k}{ϑ}, \frac{k + 1}{ϑ}]} R (φ, u) p (u) d u .

A second-order Taylor expansion of

R (φ, u) p (u)

around

u = k / ϑ

yields, for any

k

such that

{∥ k / ϑ - x ∥}_{1} = o (1)

,

\begin{matrix} ϑ^{d} \int_{(\frac{k}{ϑ}, \frac{k + 1}{ϑ}]} R (φ, u) p (u) d u \\ = R (φ, k / ϑ) p (k / ϑ) + \frac{1}{2 ϑ} \sum_{i = 1}^{d} \frac{\partial}{\partial u_{i}} (R (φ, k / ϑ) p (k / ϑ)) + O (ϑ^{- 2}) \\ = R (φ, x) p (x) + \frac{1}{ϑ} \sum_{i = 1}^{d} (k_{i} - ϑ x_{i}) \frac{\partial}{\partial x_{i}} (R (φ, x) p (x)) \\ + \frac{1}{2 ϑ} \sum_{i = 1}^{d} \frac{\partial}{\partial x_{i}} (R (φ, x) p (x)) \\ + \frac{1}{2 ϑ^{2}} \sum_{i, j = 1}^{d} (k_{i} - ϑ x_{i}) (k_{j} - ϑ x_{j}) \frac{\partial^{2}}{\partial x_{i} \partial x_{j}} (R (φ, x) p (x)) (1 + o (1)) + o (ϑ^{- 1}) . \end{matrix}

Multiplying by

ϑ^{- d} \cdot \frac{(ϑ - 1 + d)!}{(ϑ - 1)!} P_{k, ϑ - 1} (x)

and summing over all

k

, we invoke the well-known identities for multinomial distributions:

\begin{matrix} \sum_{k \in N_{0}^{d} \cap ϑ S_{d, 1}} (\frac{k_{i}}{ϑ} - x_{i}) P_{k, ϑ} (x) = 0, \end{matrix}

(13.3)

\begin{matrix} \sum_{k \in N_{0}^{d} \cap ϑ S_{d, 1}} (\frac{k_{i}}{ϑ} - x_{i}) (\frac{k_{j}}{ϑ} - x_{j}) P_{k, ϑ} (x) = \frac{1}{ϑ} (x_{i} 1_{{i = j}} - x_{i} x_{j}) . \end{matrix}

(13.4)
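The two multinomial identities (13.3) and (13.4) can be confirmed by exact enumeration for small parameters. The Python sketch below sums over all admissible k for a hypothetical choice ϑ = 6 and x = (0.3, 0.2) in dimension d = 2; these values and the function names are assumptions made only for illustration.

```python
import itertools
import math

def multinomial_pmf(k, theta, x):
    """P_{k, theta}(x): multinomial probability with cells (x_1, ..., x_d, 1 - |x|_1)."""
    rest_count = theta - sum(k)
    coef = math.factorial(theta)
    for ki in list(k) + [rest_count]:
        coef //= math.factorial(ki)        # exact: partial quotients stay integers
    prob = (1.0 - sum(x)) ** rest_count
    for ki, xi in zip(k, x):
        prob *= xi ** ki
    return coef * prob

def centered_moments(theta, x):
    """First and second centered moments of k / theta under P_{k, theta}(x)."""
    d = len(x)
    first = [0.0] * d
    second = [[0.0] * d for _ in range(d)]
    for k in itertools.product(range(theta + 1), repeat=d):
        if sum(k) > theta:
            continue                       # k must lie in theta * S_{d,1}
        p = multinomial_pmf(k, theta, x)
        dev = [ki / theta - xi for ki, xi in zip(k, x)]
        for i in range(d):
            first[i] += dev[i] * p
            for j in range(d):
                second[i][j] += dev[i] * dev[j] * p
    return first, second

theta, x = 6, (0.3, 0.2)
first, second = centered_moments(theta, x)
print(first)    # close to zero, as in (13.3)
print(second)   # close to (x_i 1{i = j} - x_i x_j) / theta, as in (13.4)
```

The enumeration is exponential in d and is meant only as a sanity check for small cases, not as a computational tool.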

Using (13.3) and (13.4), we obtain

\begin{matrix} E [{\hat{g}}_{n, ϑ} (φ, x)] & = & (1 + \frac{d (d - 1)}{2 ϑ}) R (φ, x) p (x) + \frac{1}{ϑ} \sum_{i = 1}^{d} (\frac{1}{2} - x_{i}) \frac{\partial}{\partial x_{i}} (R (φ, x) p (x)) \\ + \frac{1}{2 ϑ} \sum_{i, j = 1}^{d} (x_{i} 1_{{i = j}} - x_{i} x_{j}) \frac{\partial^{2}}{\partial x_{i} \partial x_{j}} (R (φ, x) p (x)) + o (ϑ^{- 1}), \end{matrix}

which completes the proof of Proposition 2. □

From Proposition 2, we have

E [{\hat{g}}_{n, ϑ} (φ, x)] = R (φ, x) p (x) + O (ϑ^{- 1}) .

Using the Lipschitz continuity of

R (φ, \cdot) p (\cdot)

(which follows from (C.2) and the continuity of

p (\cdot)

), we obtain the coarser uniform estimate, which suffices for our purposes:

sup_{x \in S_{d, 1}} |E [{\hat{g}}_{n, ϑ} (φ, x)] - R (φ, x) p (x)| = O (ϑ^{- 1 / 2}) .

For the denominator, applying (13) with

φ \equiv 1

yields

sup_{x \in S_{d, 1}} |E [{\hat{f}}_{n, ϑ} (x)] - f (x) p (x)| = O (ϑ^{- 1 / 2}) .

Under the positivity condition,

f (x) p (x)

is uniformly bounded away from zero on

S_{d, 1}

. Consequently,

inf_{x \in S_{d, 1}} E [{\hat{f}}_{n, ϑ} (x)] \geq \frac{c_{0}}{2} > 0 for sufficiently large ϑ,

for some constant

c_{0} > 0

. Now, consider the decomposition

\frac{E [{\hat{g}}_{n, ϑ} (φ, x)]}{E [{\hat{f}}_{n, ϑ} (x)]} - r^{(1)} (φ, x) = \frac{E [{\hat{g}}_{n, ϑ} (φ, x)] - r^{(1)} (φ, x) E [{\hat{f}}_{n, ϑ} (x)]}{E [{\hat{f}}_{n, ϑ} (x)]} .

Using (13) and the analogous estimate for

{\hat{f}}_{n, ϑ}

, we obtain

sup_{x \in S_{d, 1}} |E [{\hat{r}}_{n, 2}^{(1)} (φ, x)] - r^{(1)} (φ, x)| = O (ϑ^{- 1 / 2}),

which establishes (13.1). Observe that

{\hat{g}}_{n, ϑ} (φ, x) - E [{\hat{g}}_{n, ϑ} (φ, x)] = \frac{(ϑ - 1 + d)!}{(ϑ - 1)!} \frac{1}{n} \sum_{i = 1}^{n} Z_{i, ϑ} (x),

where

Z_{i, ϑ} (x) : = \sum_{k \in N_{0}^{d} \cap (ϑ - 1) S_{d, 1}} [φ (Y_{i}) δ_{i} 1_{(\frac{k}{ϑ}, \frac{k + 1}{ϑ}]} (X_{i}) - \int_{(\frac{k}{ϑ}, \frac{k + 1}{ϑ}]} R (φ, u) p (u) d u] P_{k, ϑ - 1} (x) .

Let

ω_{n, 2} : = ξ_{n}^{1 / (1 + γ)}

where

ξ_{n} \to \infty

will be chosen later. Define the truncated and remainder parts of

φ

:

φ^{(T)} (y) : = φ (y) 1_{\{| φ (y) | \leq ω_{n, 2}\}}, φ^{(R)} (y) : = φ (y) 1_{\{| φ (y) | > ω_{n, 2}\}} .

Correspondingly,

Z_{i, ϑ} (x) = Z_{i, ϑ}^{(T)} (x) + Z_{i, ϑ}^{(R)} (x)

, where

Z_{i, ϑ}^{(T)} (x)

and

Z_{i, ϑ}^{(R)} (x)

are defined with

φ^{(T)}

and

φ^{(R)}

, respectively. Hence,

\begin{matrix} {\hat{g}}_{n, ϑ} (φ, x) - E [{\hat{g}}_{n, ϑ} (φ, x)] & = & \underset{= : T_{1} (x)}{\underset{⏟}{[{\hat{g}}_{n, ϑ} (φ^{(T)}, x) - E [{\hat{g}}_{n, ϑ} (φ^{(T)}, x)]]}} \\ + \underset{= : T_{2} (x)}{\underset{⏟}{[{\hat{g}}_{n, ϑ} (φ^{(R)}, x) - E [{\hat{g}}_{n, ϑ} (φ^{(R)}, x)]]}} . \end{matrix}

Under condition (C.3) with exponent

γ > 0

, we have

E [| φ^{(R)} (Y) |] = E [| φ (Y) | 1_{\{| φ (Y) | > ω_{n, 2}\}}] \leq ω_{n, 2}^{- γ} E [{| φ (Y) |}^{1 + γ}] = O (ω_{n, 2}^{- γ}) .
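The moment bound for the remainder part holds pointwise, since |v| 1{|v| > ω} ≤ ω^{-γ} |v|^{1 + γ}, and therefore also for any empirical distribution. The sketch below illustrates it on a simulated heavy-tailed sample; the Pareto distribution, the sample size, the thresholds and the exponent are assumptions chosen only for illustration.

```python
import random

def truncation_tail_check(sample, omega, gamma):
    """Empirical versions of E[|phi| 1{|phi| > omega}] and omega^{-gamma} E[|phi|^{1 + gamma}]."""
    n = len(sample)
    lhs = sum(abs(v) for v in sample if abs(v) > omega) / n
    rhs = omega ** (-gamma) * sum(abs(v) ** (1.0 + gamma) for v in sample) / n
    return lhs, rhs

rng = random.Random(0)                                        # fixed seed for reproducibility
sample = [rng.paretovariate(3.0) for _ in range(10_000)]      # hypothetical values of phi(Y)

for omega in (2.0, 5.0, 10.0):
    lhs, rhs = truncation_tail_check(sample, omega, gamma=1.0)
    print(f"omega = {omega:<5} tail mean = {lhs:.5f}  bound = {rhs:.5f}")
```

The left-hand side never exceeds the right-hand side, and both shrink as the truncation threshold increases.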

A standard application of Chebyshev’s inequality and the Borel–Cantelli lemma (see, e.g., the treatment of the remainder part in the proof of Theorem 2) yields

sup_{x \in S_{d, 1}} | T_{2} (x) | = o (1) almost surely .

Since

| φ^{(T)} | \leq ω_{n, 2}

, we compute

\begin{matrix} Var [{\hat{g}}_{n, ϑ} (φ^{(T)}, x)] & = n^{- 1} {(\frac{(ϑ - 1 + d)!}{(ϑ - 1)!})}^{2} E [{(Z_{1, ϑ}^{(T)} (x))}^{2}] \end{matrix}

(13.5)

\begin{matrix} \leq n^{- 1} {(\frac{(ϑ - 1 + d)!}{(ϑ - 1)!})}^{2} ω_{n, 2}^{2} \sum_{k} P_{k, ϑ - 1}^{2} (x) \int_{(\frac{k}{ϑ}, \frac{k + 1}{ϑ}]} f (u) p (u) d u . \end{matrix}

(13.6)

Using the bound

\sum_{k} P_{k, ϑ - 1}^{2} (x) = O (ϑ^{- d / 2})

(see [129]) and the fact that

\frac{(ϑ - 1 + d)!}{(ϑ - 1)!} = O (ϑ^{d})

, we obtain

Var [{\hat{g}}_{n, ϑ} (φ^{(T)}, x)] = O (n^{- 1} ϑ^{2 d} \cdot ω_{n, 2}^{2} \cdot ϑ^{- d / 2} \cdot ϑ^{- d}) = O (n^{- 1} ϑ^{d / 2} ω_{n, 2}^{2}) .

Define

L_{n, ϑ} : = max_{k \in N_{0}^{d} \cap (ϑ - 1) S_{d, 1}} |\frac{1}{n} \sum_{i = 1}^{n} [φ^{(T)} (Y_{i}) δ_{i} 1_{(\frac{k}{ϑ}, \frac{k + 1}{ϑ}]} (X_{i}) - \int_{(\frac{k}{ϑ}, \frac{k + 1}{ϑ}]} R (φ^{(T)}, u) p (u) d u]| .

For each fixed

k

, the summands are independent, zero-mean, bounded random variables (by

2 ω_{n, 2}

in absolute value). Bernstein’s inequality (see Lemma 2.2.11 of [190]) gives, for any

ρ > 0

,

P (|\frac{1}{n} \sum_{i = 1}^{n} \dots| > ρ ϑ^{- 1 / 2} ς_{n}) \leq 2 \exp (- \frac{n^{2} ρ^{2} ϑ^{- 1} ς_{n}^{2} / 2}{n C ω_{n, 2}^{2} ϑ^{- 1} + \frac{1}{3} ω_{n, 2} n ρ ϑ^{- 1 / 2} ς_{n}}),

where

ς_{n} : = \sqrt{(\log n) / n}

and C is an upper bound for

f (\cdot) p (\cdot)

(which exists by continuity on the compact

S_{d, 1}

). Under the condition

ϑ \leq n / \log n

(which is equivalent to

ς_{n} \leq ϑ^{- 1 / 2}

), the denominator is bounded above by

n ω_{n, 2}^{2} ϑ^{- 1} (C + ρ / 3)

. Consequently,

\begin{matrix} P (| L_{n, ϑ} | > ρ ϑ^{- 1 / 2} ς_{n}) & \leq & ϑ^{d} \cdot 2 \exp (- \frac{n ρ^{2} ϑ^{- 1} ς_{n}^{2}}{2 (C + ρ / 3) ω_{n, 2}^{2} ϑ^{- 1}}) \\ = & ϑ^{d} \cdot 2 \exp (- \frac{ρ^{2} ς_{n}^{2} n}{2 (C + ρ / 3) ω_{n, 2}^{2}}) . \end{matrix}

A first candidate is

ω_{n, 2}^{2} = ς_{n}^{- 1} = \sqrt{n / \log n}

(so that

ω_{n, 2}^{2} \to \infty

), for which the exponent satisfies

\frac{ς_{n}^{2} n}{ω_{n, 2}^{2}} = \frac{(\log n) / n \cdot n}{\sqrt{n / \log n}} = \frac{\log n}{\sqrt{n / \log n}} = {(\log n)}^{3 / 2} n^{- 1 / 2} \to 0 .

Since the exponent tends to zero, the exponential bound does not decay and this choice is not sufficient; instead, we set

ω_{n, 2} = ξ_{n}^{1 / (1 + γ)}

with

ξ_{n} = n^{α}

for some

α > 0

chosen so that the exponential bound becomes summable. A standard argument (see [60]) yields that for an appropriate choice of

ρ

(depending on d and C),

\sum_{n = 1}^{\infty} P (| L_{n, ϑ} | > ρ ϑ^{- 1 / 2} ς_{n}) < \infty .

By the Borel–Cantelli lemma, we conclude

| L_{n, ϑ} | = O (ϑ^{- 1 / 2} ς_{n}) almost surely .
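To see concretely why the first candidate threshold, with squared truncation level equal to the inverse of ς_n, is discarded in the argument above, one can tabulate the resulting exponent, which equals (log n)^{3/2} n^{-1/2}. The sketch below is only a numerical illustration; the function name and the chosen sample sizes are assumptions.

```python
import math

def first_choice_exponent(n):
    """n * varsigma_n^2 / omega^2 with varsigma_n^2 = log(n) / n and omega^2 = sqrt(n / log n)."""
    varsigma_sq = math.log(n) / n
    omega_sq = math.sqrt(n / math.log(n))
    return n * varsigma_sq / omega_sq

for n in (10**4, 10**6, 10**8):
    print(f"n = {n:<10} exponent = {first_choice_exponent(n):.5f}")
```

Because the exponent decreases toward zero, the exponential factor in the Bernstein bound approaches one, so the bound multiplied by the number of cells cannot be summable under that choice.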

Since

{\hat{g}}_{n, ϑ} (φ^{(T)}, x) - E [{\hat{g}}_{n, ϑ} (φ^{(T)}, x)] = \frac{(ϑ - 1 + d)!}{(ϑ - 1)!} \sum_{k} P_{k, ϑ - 1} (x) \cdot (terms bounded by L_{n, ϑ}),

and

\frac{(ϑ - 1 + d)!}{(ϑ - 1)!} = O (ϑ^{d})

, we obtain

sup_{x \in S_{d, 1}} | T_{1} (x) | = O (ϑ^{d} \cdot ϑ^{- 1 / 2} ς_{n}) = O (ϑ^{d - 1 / 2} \sqrt{\frac{\log n}{n}}) almost surely .

Combining the bounds on T_{1} (x) and T_{2} (x), we have

sup_{x \in S_{d, 1}} |{\hat{g}}_{n, ϑ} (φ, x) - E [{\hat{g}}_{n, ϑ} (φ, x)]| = O (ϑ^{d - 1 / 2} \sqrt{\frac{\log n}{n}}) almost surely .

Applying the same stochastic bound to the denominator

{\hat{f}}_{n, ϑ} (x)

(with

φ \equiv 1

) yields

sup_{x \in S_{d, 1}} |{\hat{f}}_{n, ϑ} (x) - E [{\hat{f}}_{n, ϑ} (x)]| = O (ϑ^{d - 1 / 2} \sqrt{\frac{\log n}{n}}) almost surely .

Using the identity

\frac{{\hat{g}}_{n, ϑ} (φ, x)}{{\hat{f}}_{n, ϑ} (x)} - \frac{E [{\hat{g}}_{n, ϑ} (φ, x)]}{E [{\hat{f}}_{n, ϑ} (x)]} = \frac{{\hat{g}}_{n, ϑ} (φ, x) - E [{\hat{g}}_{n, ϑ} (φ, x)]}{{\hat{f}}_{n, ϑ} (x)} - \frac{E [{\hat{g}}_{n, ϑ} (φ, x)]}{E [{\hat{f}}_{n, ϑ} (x)]} \cdot \frac{{\hat{f}}_{n, ϑ} (x) - E [{\hat{f}}_{n, ϑ} (x)]}{{\hat{f}}_{n, ϑ} (x)},

and the fact that

{\hat{f}}_{n, ϑ} (x)

converges uniformly to

f (x) p (x)

, which is bounded away from zero, we conclude that

sup_{x \in S_{d, 1}} |{\hat{r}}_{n, 2}^{(1)} (φ, x) - E [{\hat{r}}_{n, 2}^{(1)} (φ, x)]| = O (ϑ^{d - 1 / 2} \sqrt{\frac{\log n}{n}}) almost surely .
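The ratio decomposition used in this step is a purely algebraic identity and can be checked with arbitrary numbers. In the sketch below, `g_hat` and `f_hat` stand for the estimated numerator and denominator and `g_bar`, `f_bar` for their expectations; all names and values are hypothetical placeholders.

```python
def ratio_identity_gap(g_hat, f_hat, g_bar, f_bar):
    """Absolute difference between the two sides of the ratio decomposition."""
    lhs = g_hat / f_hat - g_bar / f_bar
    rhs = (g_hat - g_bar) / f_hat - (g_bar / f_bar) * (f_hat - f_bar) / f_hat
    return abs(lhs - rhs)

# Arbitrary test points with nonzero denominators (assumed values).
for args in ((1.3, 0.8, 1.1, 0.9), (-2.0, 0.5, 0.4, 1.7), (0.0, 2.0, 3.0, 0.25)):
    print(args, ratio_identity_gap(*args))
```

The identity shows that the stochastic error of the ratio is controlled by the errors of numerator and denominator as soon as the estimated denominator stays bounded away from zero.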

This establishes (13.2) and completes the proof of Theorem 6. □

Remark 24.

The key insight of the proof is the cancellation of the propensity score

p (\cdot)

in the ratio estimator. Although

p (\cdot)

appears explicitly in the bias expansion of the numerator and denominator, it cancels out when forming the ratio, leaving the same asymptotic bias and convergence rates as in the complete-data case. The variance, however, is inflated by a factor of

1 / p (x)

, which is reflected in the constants but does not affect the rates. This phenomenon is characteristic of complete-case estimators under MAR and is rigorously justified by the decomposition above.
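The cancellation described in Remark 24 can be illustrated numerically in the simplest setting d = 1, m = 1. The sketch below is a hedged illustration, not the paper's implementation: it assumes the piecewise-constant Bernstein kernel with binomial weights P_{k, ϑ - 1}(x), and the regression function, design density and propensity scores are hypothetical choices made only for the example.

```python
from math import comb

def bernstein_weights(x, theta):
    """Binomial(theta - 1, x) probabilities P_{k, theta-1}(x), k = 0..theta-1 (d = 1)."""
    n = theta - 1
    return [comb(n, k) * x**k * (1.0 - x)**(n - k) for k in range(theta)]

def smoothed_mean(g, x, theta, sub=20):
    """Integral of g(u) K_{x,theta}(u) du for the d = 1 Bernstein kernel.

    The kernel equals theta * P_{k,theta-1}(x) on the cell (k/theta, (k+1)/theta],
    so the integral is a weighted sum of cell averages (midpoint rule, `sub` points).
    """
    total = 0.0
    for k, w in enumerate(bernstein_weights(x, theta)):
        cell_avg = sum(g((k + (s + 0.5) / sub) / theta) for s in range(sub)) / sub
        total += w * cell_avg
    return total

def expected_ratio(r, f, p, x, theta):
    """E[g_hat] / E[f_hat] for the complete-case estimator with m = 1."""
    num = smoothed_mean(lambda u: r(u) * f(u) * p(u), x, theta)
    den = smoothed_mean(lambda u: f(u) * p(u), x, theta)
    return num / den

r = lambda u: u * u               # hypothetical regression function
f = lambda u: 1.0                 # hypothetical uniform design density on [0, 1]
p_one = lambda u: 1.0             # complete data
p_const = lambda u: 0.6           # constant propensity score
p_smooth = lambda u: 0.5 + 0.4 * u  # smooth, non-constant propensity score

x0, theta = 0.3, 200
ratio_full = expected_ratio(r, f, p_one, x0, theta)
ratio_const = expected_ratio(r, f, p_const, x0, theta)
ratio_smooth = expected_ratio(r, f, p_smooth, x0, theta)
print(ratio_full, ratio_const, ratio_smooth, r(x0))
```

With a constant propensity score the ratio coincides with the complete-data ratio up to rounding; with a smooth non-constant score it differs only by a bias term that shrinks as ϑ grows.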

Proof of Theorem 7.

Throughout this proof, we operate under the MAR assumption (2.4) with the positivity condition

{inf}_{x \in S_{d, 1}} p (x) \geq c > 0

, and the regularity conditions specified in Theorem 7. All U-statistics are understood to be the complete-case versions incorporating the missingness indicators

δ_{i}

as defined in (3.5). For notational simplicity, we write

u_{n, 2} (φ, \tilde{x})

in place of

u_{n, 2}^{(miss)} (φ, \tilde{x})

, with the implicit understanding that the product

\prod_{j = 1}^{m} δ_{i_{j}}

is included in the kernel. Recall the truncation decomposition introduced in (11.1):

u_{n, 2} (φ, \tilde{x}) = u_{n, 2}^{(T)} (φ, \tilde{x}) + u_{n, 2}^{(R)} (φ, \tilde{x}),

where the truncated kernel is defined with

φ^{(T)} = φ 1_{{| φ | \leq ω_{n, 2}}}

and the remainder kernel with

φ^{(R)} = φ 1_{{| φ | > ω_{n, 2}}}

. The truncation threshold

ω_{n, 2}

will be chosen optimally in the sequel. We begin by analyzing the truncated component

u_{n, 2}^{(T)} (φ, \tilde{x})

.

Under the MAR assumption, the truncated kernel is defined as

G_{φ, \tilde{x}, 2}^{(T)} (\tilde{X}, \tilde{Y}, \tilde{δ}) : = φ^{(T)} (\tilde{Y}) (\prod_{j = 1}^{m} δ_{j}) {\tilde{K}}_{\tilde{x}, ϑ} (\tilde{X}),

where

{\tilde{K}}_{\tilde{x}, ϑ} (\tilde{X}) : = \prod_{j = 1}^{m} K_{x_{j}, ϑ} (X_{j})

denotes the product Bernstein kernel. By construction,

| φ^{(T)} | \leq ω_{n, 2}

and

0 \leq \prod_{j = 1}^{m} δ_{j} \leq 1

. The truncated U-statistic then satisfies

\begin{matrix} |u_{n, 2}^{(T)} (φ, \tilde{x}) - E [u_{n, 2}^{(T)} (φ, \tilde{x})]| \end{matrix}

(13.7)

\begin{matrix} = \frac{(n - m)!}{n!} | \sum_{i \in I (m, n)} {φ^{(T)} ({\tilde{Y}}_{i}) (\prod_{j = 1}^{m} δ_{i_{j}}) {\tilde{K}}_{\tilde{x}, ϑ} ({\tilde{X}}_{i}) \end{matrix}

(13.8)

\begin{matrix} - E [φ^{(T)} ({\tilde{Y}}_{i}) (\prod_{j = 1}^{m} δ_{i_{j}}) {\tilde{K}}_{\tilde{x}, ϑ} ({\tilde{X}}_{i})]} | . \end{matrix}

(13.9)

Recall that the Bernstein kernel admits the representation (see [129])

K_{x, ϑ} (u) = \frac{(ϑ - 1 + d)!}{(ϑ - 1)!} \sum_{k \in N_{0}^{d} \cap (ϑ - 1) S_{d, 1}} 1_{(\frac{k}{ϑ}, \frac{k + 1}{ϑ}]} (u) P_{k, ϑ - 1} (x),

where

P_{k, ϑ - 1} (x)

are multinomial probabilities satisfying

\sum_{k} P_{k, ϑ - 1} (x) = 1

and

P_{k, ϑ - 1} (x) \geq 0

. Substituting this representation into the kernel product, we obtain

{\tilde{K}}_{\tilde{x}, ϑ} ({\tilde{X}}_{i}) = {(\frac{(ϑ - 1 + d)!}{(ϑ - 1)!})}^{m} \sum_{(k_{1}, \dots, k_{m}) \in {(N_{0}^{d} \cap (ϑ - 1) S_{d, 1})}^{m}} (\prod_{j = 1}^{m} 1_{(\frac{k_{j}}{ϑ}, \frac{k_{j} + 1}{ϑ}]} (X_{i_{j}})) \prod_{j = 1}^{m} P_{k_{j}, ϑ - 1} (x_{j}) .
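For concreteness, the complete-case U-statistic with the product Bernstein kernel can be sketched in Python for d = 1 and m = 2. This is a minimal illustration under stated assumptions: the data, the choice ϑ = 5 and the variance-type kernel φ are hypothetical, and the brute-force loop over ordered index pairs is meant for small n only.

```python
from itertools import permutations
from math import ceil, comb

def bernstein_kernel(x, theta, u):
    """d = 1 kernel: theta * P_{k,theta-1}(x) for u in (k/theta, (k+1)/theta]."""
    if not 0.0 < u <= 1.0:
        return 0.0
    k = min(theta - 1, max(0, ceil(u * theta) - 1))
    n = theta - 1
    return theta * comb(n, k) * x**k * (1.0 - x)**(n - k)

def complete_case_ustat(phi, xs, X, Y, delta, theta):
    """(n - m)!/n! times the sum over ordered tuples of distinct indices."""
    n, m = len(X), len(xs)
    total, count = 0.0, 0
    for idx in permutations(range(n), m):
        count += 1                        # count ends at n!/(n - m)!
        if not all(delta[i] for i in idx):
            continue                      # product of missingness indicators is 0
        weight = 1.0
        for xj, i in zip(xs, idx):
            weight *= bernstein_kernel(xj, theta, X[i])
        total += phi(*(Y[i] for i in idx)) * weight
    return total / count

# Hypothetical toy sample; None marks a missing response (delta = 0).
X = [0.12, 0.35, 0.50, 0.71, 0.90, 0.27]
Y = [1.0, None, 0.4, 2.2, None, 1.5]
delta = [1, 0, 1, 1, 0, 1]
theta = 5
xs = (0.3, 0.6)

var_kernel = lambda y1, y2: 0.5 * (y1 - y2) ** 2
numerator = complete_case_ustat(var_kernel, xs, X, Y, delta, theta)
denominator = complete_case_ustat(lambda y1, y2: 1.0, xs, X, Y, delta, theta)
print(numerator / denominator)
```

The response kernel is evaluated only on tuples whose indicators are all equal to one, which mirrors the inclusion of the product of the δ's in the kernel.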

Consequently,

\begin{matrix} H_{ϑ}^{(T)} ({\tilde{X}}_{i}, {\tilde{Y}}_{i}, {\tilde{δ}}_{i}) & : = & G_{φ, \tilde{x}, 2}^{(T)} ({\tilde{X}}_{i}, {\tilde{Y}}_{i}, {\tilde{δ}}_{i}) - E [G_{φ, \tilde{x}, 2}^{(T)} ({\tilde{X}}_{i}, {\tilde{Y}}_{i}, {\tilde{δ}}_{i})] \\ & = & {(\frac{(ϑ - 1 + d)!}{(ϑ - 1)!})}^{m} \sum_{\tilde{k} \in K_{ϑ}^{m}} {φ^{(T)} ({\tilde{Y}}_{i}) (\prod_{j = 1}^{m} δ_{i_{j}}) \prod_{j = 1}^{m} 1_{(\frac{k_{j}}{ϑ}, \frac{k_{j} + 1}{ϑ}]} (X_{i_{j}}) \\ & & - \int_{(\frac{k_{1}}{ϑ}, \frac{k_{1} + 1}{ϑ}]} \dots \int_{(\frac{k_{m}}{ϑ}, \frac{k_{m} + 1}{ϑ}]} φ^{(T)} (\tilde{y}) (\prod_{j = 1}^{m} p (u_{j})) \tilde{f} (\tilde{u}) d \tilde{u} d \tilde{y}} \\ & & \times \prod_{j = 1}^{m} P_{k_{j}, ϑ - 1} (x_{j}), \end{matrix}

where

K_{ϑ} : = N_{0}^{d} \cap (ϑ - 1) S_{d, 1}

and we have used the MAR assumption to write

E [\prod_{j = 1}^{m} δ_{i_{j}} ∣ {\tilde{X}}_{i} = \tilde{u}] = \prod_{j = 1}^{m} p (u_{j}) .

Define the centered and scaled process

L_{ϑ, n} : = max_{\tilde{k} \in K_{ϑ}^{m}} |\frac{(n - m)!}{n!} \sum_{i \in I (m, n)} ω_{n, 2} (\prod_{j = 1}^{m} 1_{(\frac{k_{j}}{ϑ}, \frac{k_{j} + 1}{ϑ}]} (X_{i_{j}}) - \int_{(\frac{k_{1}}{ϑ}, \frac{k_{1} + 1}{ϑ}]} \dots \int_{(\frac{k_{m}}{ϑ}, \frac{k_{m} + 1}{ϑ}]} \tilde{f} (\tilde{u}) d \tilde{u})| .

For each fixed

\tilde{k}

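, the summands are centered and identically distributed, although summands that share an index are not independent; this dependence is handled below by a block decomposition. Moreover, using the boundedness of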

φ^{(T)}

(with

| φ^{(T)} | \leq ω_{n, 2}

) and the fact that

0 \leq \prod_{j = 1}^{m} p (u_{j}) \leq 1

, we have the uniform bound

|u_{n, 2}^{(T)} (φ, \tilde{x}) - E [u_{n, 2}^{(T)} (φ, \tilde{x})]| \leq {(\frac{(ϑ - 1 + d)!}{(ϑ - 1)!})}^{m} (max_{\tilde{k} \in K_{ϑ}^{m}} \prod_{j = 1}^{m} P_{k_{j}, ϑ - 1} (x_{j})) \cdot L_{ϑ, n} .

Since

\prod_{j = 1}^{m} P_{k_{j}, ϑ - 1} (x_{j}) = O (ϑ^{- d m / 2})

uniformly (by the properties of multinomial probabilities; see Lemma 3 in [129]), we obtain

|u_{n, 2}^{(T)} (φ, \tilde{x}) - E [u_{n, 2}^{(T)} (φ, \tilde{x})]| = O (ϑ^{d m} \cdot ϑ^{- d m / 2} \cdot L_{ϑ, n}) = O (ϑ^{d m / 2} L_{ϑ, n}) .

Consider the random variable

Z_{i} (\tilde{k}) : = ω_{n, 2} (\prod_{j = 1}^{m} 1_{(\frac{k_{j}}{ϑ}, \frac{k_{j} + 1}{ϑ}]} (X_{i_{j}}) - \int_{(\frac{k_{1}}{ϑ}, \frac{k_{1} + 1}{ϑ}]} \dots \int_{(\frac{k_{m}}{ϑ}, \frac{k_{m} + 1}{ϑ}]} \tilde{f} (\tilde{u}) d \tilde{u}) .

Under condition (C.2), the density f is bounded above by some constant

C_{0} < \infty

. Consequently,

\begin{matrix} Var [Z_{i} (\tilde{k})] & = & E [Z_{i} {(\tilde{k})}^{2}] \\ & = & ω_{n, 2}^{2} [\int_{(\frac{k_{1}}{ϑ}, \frac{k_{1} + 1}{ϑ}]} \dots \int_{(\frac{k_{m}}{ϑ}, \frac{k_{m} + 1}{ϑ}]} \tilde{f} (\tilde{u}) d \tilde{u} \\ & & - {(\int_{(\frac{k_{1}}{ϑ}, \frac{k_{1} + 1}{ϑ}]} \dots \int_{(\frac{k_{m}}{ϑ}, \frac{k_{m} + 1}{ϑ}]} \tilde{f} (\tilde{u}) d \tilde{u})}^{2}] \\ & \leq & ω_{n, 2}^{2} C_{0} \int_{(\frac{k_{1}}{ϑ}, \frac{k_{1} + 1}{ϑ}]} \dots \int_{(\frac{k_{m}}{ϑ}, \frac{k_{m} + 1}{ϑ}]} 1 \cdot d \tilde{u} \\ & & (since \tilde{f} \leq C_{0} and the square term is nonnegative) \\ & = & ω_{n, 2}^{2} C_{0} ϑ^{- d m} . \end{matrix}

Moreover, the random variables

Z_{i} (\tilde{k})

are uniformly bounded:

∥ Z_{i} (\tilde{k}) ∥_{\infty} \leq ω_{n, 2} max (1, \int \tilde{f} (\tilde{u}) d \tilde{u}) \leq ω_{n, 2} (1 + C_{0} ϑ^{- d m}) \leq 2 ω_{n, 2}

for sufficiently large

ϑ

(since

ϑ^{- d m} \to 0

). A standard technique for U-statistics (see Section 5.1.2 of [177]) allows us to replace the sum over distinct indices

I (m, n)

by a sum over

⌊ n / m ⌋

independent blocks. Specifically, let

B = ⌊ n / m ⌋

. Define disjoint index sets

J_{1}, \dots, J_{B}

, each of size m, and consider only the U-statistic based on these blocks. The contribution from the remaining indices is asymptotically negligible. Then,

\frac{(n - m)!}{n!} \sum_{i \in I (m, n)} Z_{i} (\tilde{k}) = \frac{1}{B} \sum_{b = 1}^{B} Z_{J_{b}} (\tilde{k}) + o (1),

where

Z_{J_{b}} (\tilde{k})

are independent and identically distributed. Let

ς_{n} : = \sqrt{(\log n) / n}

. For any

ε_{0} > 0

, apply Bernstein’s inequality (see Lemma 2.2.11 of [190]) to the average of the

Z_{J_{b}} (\tilde{k})

’s. Using the bounds

Var (Z_{J_{b}}) \leq ω_{n, 2}^{2} C_{0} ϑ^{- d m}

and

∥ Z_{J_{b}} ∥_{\infty} \leq 2 ω_{n, 2}

, we obtain, for each fixed

\tilde{k}

,

P (|\frac{1}{B} \sum_{b = 1}^{B} Z_{J_{b}} (\tilde{k})| > ε_{0} ϑ^{- m / 2} ς_{n}) \leq 2 \exp (- \frac{B ε_{0}^{2} ϑ^{- m} ς_{n}^{2}}{2 ω_{n, 2}^{2} C_{0} ϑ^{- d m} + \frac{2}{3} ω_{n, 2} ε_{0} ϑ^{- m / 2} ς_{n}}) .
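The form of the Bernstein bound used here can be compared with an exact tail probability in a toy case. The sketch below is only an illustration under simplifying assumptions: each block contributes a centered cell indicator (so ω_{n, 2} is normalized to 1), the cell probability q and the number of blocks B are hypothetical, and the exact tail is computed from the binomial distribution.

```python
from math import comb, exp

def exact_tail(B, q, t):
    """P(|S/B - q| > t) for S ~ Binomial(B, q), computed exactly."""
    return sum(comb(B, s) * q**s * (1 - q)**(B - s)
               for s in range(B + 1) if abs(s / B - q) > t)

def bernstein_bound(B, var, M, t):
    """2 exp(-B t^2 / (2 var + (2/3) M t)) for i.i.d. centered summands bounded by M."""
    return 2.0 * exp(-B * t * t / (2.0 * var + (2.0 / 3.0) * M * t))

B, q = 200, 0.05          # hypothetical number of blocks and cell probability
var = q * (1 - q)         # variance of a centered indicator
M = 1.0                   # |1{X in cell} - q| <= max(q, 1 - q) <= 1

for t in (0.02, 0.05, 0.1):
    print(t, exact_tail(B, q, t), bernstein_bound(B, var, M, t))
```

In each case the exact tail lies below the Bernstein bound, and the gap widens as the deviation level t increases.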

Since

B = ⌊ n / m ⌋ \sim n / m

, we have

B ς_{n}^{2} = B \cdot (\log n) / n \sim (\log n) / m

. Now, take a union bound over all

\tilde{k} \in K_{ϑ}^{m}

. The cardinality of

K_{ϑ}

is at most

ϑ^{d}

(the number of integer lattice points in the dilated simplex), so

| K_{ϑ}^{m} | \leq ϑ^{d m}

. Hence,

\begin{matrix} P (max_{\tilde{k} \in K_{ϑ}^{m}} |\frac{1}{B} \sum_{b = 1}^{B} Z_{J_{b}} (\tilde{k})| > ε_{0} ϑ^{- m / 2} ς_{n}) \end{matrix}

(13.10)

\begin{matrix} \leq 2 ϑ^{d m} \exp (- \frac{B ε_{0}^{2} ϑ^{- m} ς_{n}^{2}}{2 ω_{n, 2}^{2} C_{0} ϑ^{- d m} + \frac{2}{3} ω_{n, 2} ε_{0} ϑ^{- m / 2} ς_{n}}) . \end{matrix}

(13.11)
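The cardinality bound used in the union bound can be verified by direct enumeration. The following Python sketch assumes that S_{d, 1} is the unit simplex, so that K_ϑ consists of the nonnegative integer vectors whose coordinates sum to at most ϑ - 1; the function name is illustrative.

```python
from itertools import product
from math import comb

def lattice_points(theta, d):
    """Enumerate K_theta = {k in N_0^d : k_1 + ... + k_d <= theta - 1}."""
    return [k for k in product(range(theta), repeat=d) if sum(k) <= theta - 1]

for theta, d in [(6, 1), (6, 2), (4, 3), (10, 2)]:
    count = len(lattice_points(theta, d))
    # Stars-and-bars count of the lattice points, and the cruder bound theta^d.
    print(theta, d, count, comb(theta - 1 + d, d), theta**d)
```

The enumeration agrees with the binomial coefficient, which never exceeds ϑ^d, so the m-fold product set has at most ϑ^{dm} elements.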

Set the truncation threshold as

ω_{n, 2} : = ϑ^{d m / 2} ς_{n} .

This choice balances the bias from the remainder term and the variance of the truncated term. Under the bandwidth condition

ϑ \leq n / \log n

, we have

ς_{n} \leq ϑ^{- 1 / 2}

, which implies the upper bound

ω_{n, 2} \leq ϑ^{(d m - 1) / 2}

, whose right-hand side diverges as

n \to \infty

whenever

d m > 1

. Substituting this choice into the exponential bound yields

2 ω_{n, 2}^{2} C_{0} ϑ^{- d m} = 2 (ϑ^{d m} ς_{n}^{2}) C_{0} ϑ^{- d m} = 2 C_{0} ς_{n}^{2},

and

\frac{2}{3} ω_{n, 2} ε_{0} ϑ^{- m / 2} ς_{n} = \frac{2}{3} (ϑ^{d m / 2} ς_{n}) ε_{0} ϑ^{- m / 2} ς_{n} = \frac{2}{3} ε_{0} ϑ^{(d m - m) / 2} ς_{n}^{2} .

Since

d \geq 1

, we have

d m - m = m (d - 1) \geq 0

, so

ϑ^{(d m - m) / 2} \geq 1

. Consequently,

2 ω_{n, 2}^{2} C_{0} ϑ^{- d m} + \frac{2}{3} ω_{n, 2} ε_{0} ϑ^{- m / 2} ς_{n} \leq 2 C_{0} ς_{n}^{2} + \frac{2}{3} ε_{0} ϑ^{(d m - m) / 2} ς_{n}^{2} \leq (2 C_{0} + \frac{2}{3} ε_{0} ϑ^{(d m - m) / 2}) ς_{n}^{2} .

However, this bound still depends on

ϑ

. A sharper analysis uses the fact that

ϑ^{(d m - m) / 2} ς_{n}^{2} = ϑ^{(d m - m) / 2} (\log n) / n

. Under the condition

ϑ \leq n^{1 / (d m - m + 2)}

(which is more restrictive than

ϑ \leq n / \log n

for

d m - m \geq 0

), we have

ϑ^{(d m - m) / 2} ς_{n}^{2} \to 0

. For simplicity, we proceed with the conservative bound

2 ω_{n, 2}^{2} C_{0} ϑ^{- d m} + \frac{2}{3} ω_{n, 2} ε_{0} ϑ^{- m / 2} ς_{n} \leq C ς_{n}^{2},

where

C = 2 C_{0} + \frac{2}{3} ε_{0}

for sufficiently large n (since

ϑ^{(d m - m) / 2} ς_{n}^{2}

is bounded). Then,

\frac{B ε_{0}^{2} ϑ^{- m} ς_{n}^{2}}{2 ω_{n, 2}^{2} C_{0} ϑ^{- d m} + \frac{2}{3} ω_{n, 2} ε_{0} ϑ^{- m / 2} ς_{n}} \geq \frac{(n / m) ε_{0}^{2} ϑ^{- m} ς_{n}^{2}}{C ς_{n}^{2}} = \frac{ε_{0}^{2}}{C m} n ϑ^{- m} .

Thus,

P (\dots) \leq 2 ϑ^{d m} \exp (- \frac{ε_{0}^{2}}{C m} n ϑ^{- m}) .

Now, under the condition

n ϑ^{- m} \geq \log n

(which is equivalent to

ϑ \leq {(n / \log n)}^{1 / m}

), we obtain

\exp (- \frac{ε_{0}^{2}}{C m} n ϑ^{- m}) \leq \exp (- \frac{ε_{0}^{2}}{C m} \log n) = n^{- ε_{0}^{2} / (C m)} .

Choosing

ε_{0}

sufficiently large so that

ε_{0}^{2} / (C m) > d m + 1 + κ

for some

κ > 0

, we obtain

P (\dots) \leq 2 ϑ^{d m} n^{- d m - 1 - κ} .

Since

ϑ^{d m} \leq n^{d m}

(as

ϑ \leq n

), the right-hand side is at most

2 n^{- 1 - κ}

and is therefore summable in n (recall that

κ > 0

). By the Borel–Cantelli lemma, we conclude that, almost surely,

max_{\tilde{k} \in K_{ϑ}^{m}} |\frac{1}{B} \sum_{b = 1}^{B} Z_{J_{b}} (\tilde{k})| = O (ϑ^{- m / 2} ς_{n}) .

Consequently,

L_{ϑ, n} = O (ϑ^{- m / 2} ς_{n}) almost surely .

Therefore,

sup_{\tilde{x} \in S_{d, 1}^{m}} |u_{n, 2}^{(T)} (φ, \tilde{x}) - E [u_{n, 2}^{(T)} (φ, \tilde{x})]| = O (ϑ^{d m / 2} \cdot ϑ^{- m / 2} ς_{n}) = O (ϑ^{(d - 1) m / 2} ς_{n}) .

For the case

d = 1

(univariate covariate),

ϑ^{(d - 1) m / 2} = ϑ^{0} = 1

, giving the rate

O (ς_{n}) = O (\sqrt{(\log n) / n})

. For

d \geq 2

, the bound is weaker, because it is inflated by the growing factor

ϑ^{(d - 1) m / 2}

. Recall that the remainder kernel is defined with

φ^{(R)} = φ 1_{{| φ | > ω_{n, 2}}}

. We first bound its expectation. Using the MAR assumption and condition (C.3) with exponent

γ > 0

,

\begin{matrix} |E [u_{n, 2}^{(R)} (φ, \tilde{x})]| & \leq & E [| φ ({\tilde{Y}}_{i}) | 1_{{| φ ({\tilde{Y}}_{i}) | > ω_{n, 2}}} (\prod_{j = 1}^{m} δ_{i_{j}}) {\tilde{K}}_{\tilde{x}, ϑ} ({\tilde{X}}_{i})] \\ = & E [(\prod_{j = 1}^{m} p (X_{i_{j}})) {\tilde{K}}_{\tilde{x}, ϑ} ({\tilde{X}}_{i}) E [| φ ({\tilde{Y}}_{i}) | 1_{{| φ ({\tilde{Y}}_{i}) | > ω_{n, 2}}} ∣ {\tilde{X}}_{i}]] \\ \leq & ω_{n, 2}^{- (1 + γ)} E [(\prod_{j = 1}^{m} p (X_{i_{j}})) {\tilde{K}}_{\tilde{x}, ϑ} ({\tilde{X}}_{i}) E [| φ ({\tilde{Y}}_{i}) |^{2 + γ} ∣ {\tilde{X}}_{i}]] \\ \leq & ω_{n, 2}^{- (1 + γ)} sup_{\tilde{u} \in S_{d, 1}^{m}} E [| φ (\tilde{Y}) |^{2 + γ} ∣ \tilde{X} = \tilde{u}] \int_{S_{d, 1}^{m}} (\prod_{j = 1}^{m} p (u_{j})) \tilde{f} (\tilde{u}) {\tilde{K}}_{\tilde{x}, ϑ} (\tilde{u}) d \tilde{u} \\ \leq & C_{1} ω_{n, 2}^{- (1 + γ)}, \end{matrix}

where

C_{1}

is the constant from condition (C.3) and we have used the fact that

\prod_{j = 1}^{m} p (u_{j}) \leq 1

and

\int \tilde{K} = 1

. Now, choose the truncation threshold as

ω_{n, 2} : = {(ϑ^{- m / 2} ς_{n})}^{- 1 / (1 + γ)} = {(ϑ^{m / 2} ς_{n}^{- 1})}^{1 / (1 + γ)} .

With this choice,

|E [u_{n, 2}^{(R)} (φ, \tilde{x})]| = O (ϑ^{- m / 2} ς_{n}) .
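The arithmetic behind this choice of threshold, and the tail-moment inequality used in the preceding display, can be checked with a few lines of Python. The sketch is illustrative only: the numerical values of ϑ, n, m and γ, the constant C_1 = 1 and the discrete distribution are hypothetical.

```python
from math import log, sqrt

def varsigma(n):
    """varsigma_n = sqrt(log n / n)."""
    return sqrt(log(n) / n)

def threshold(theta, n, m, gamma):
    """omega_{n,2} = (theta^{-m/2} * varsigma_n)^{-1/(1 + gamma)}."""
    return (theta ** (-m / 2) * varsigma(n)) ** (-1.0 / (1.0 + gamma))

def remainder_bound(theta, n, m, gamma, C1=1.0):
    """C1 * omega^{-(1 + gamma)}, the bound on |E u^{(R)}|."""
    return C1 * threshold(theta, n, m, gamma) ** (-(1.0 + gamma))

def tail_moment_check(values, probs, omega, gamma):
    """Both sides of E[|Z| 1{|Z| > omega}] <= omega^{-(1+gamma)} E|Z|^{2+gamma}."""
    lhs = sum(p * abs(z) for z, p in zip(values, probs) if abs(z) > omega)
    rhs = omega ** (-(1 + gamma)) * sum(
        p * abs(z) ** (2 + gamma) for z, p in zip(values, probs))
    return lhs, rhs

theta, n, m, gamma = 50, 10**5, 2, 0.5
print(remainder_bound(theta, n, m, gamma), theta ** (-m / 2) * varsigma(n))
print(tail_moment_check([0.5, 2.0, 10.0], [0.7, 0.25, 0.05], omega=1.5, gamma=0.5))
```

The first line prints two equal numbers, confirming that the threshold turns the bound C_1 ω_{n, 2}^{-(1 + γ)} into ϑ^{-m/2} ς_n when C_1 = 1.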

By Markov’s inequality, for any

η > 0

,

P (sup_{\tilde{x} \in S_{d, 1}^{m}} |u_{n, 2}^{(R)} (φ, \tilde{x}) - E [u_{n, 2}^{(R)} (φ, \tilde{x})]| > η ϑ^{- m / 2} ς_{n}) \leq \frac{E [{sup}_{\tilde{x}} | u_{n, 2}^{(R)} (φ, \tilde{x}) - E [u_{n, 2}^{(R)} (φ, \tilde{x})] |]}{η ϑ^{- m / 2} ς_{n}} .

Using the bound from Step 8 and the fact that the kernel integrates to one, we obtain

E [| u_{n, 2}^{(R)} (φ, \tilde{x}) - E [u_{n, 2}^{(R)} (φ, \tilde{x})] |] \leq 2 E [| u_{n, 2}^{(R)} (φ, \tilde{x}) |] = O (ϑ^{- m / 2} ς_{n}) .

Therefore,

P (sup_{\tilde{x}} |u_{n, 2}^{(R)} (φ, \tilde{x}) - E [u_{n, 2}^{(R)} (φ, \tilde{x})]| > η ϑ^{- m / 2} ς_{n}) = O (1 / η) .

Since

η

can be chosen arbitrarily large, we conclude that

sup_{\tilde{x} \in S_{d, 1}^{m}} |u_{n, 2}^{(R)} (φ, \tilde{x}) - E [u_{n, 2}^{(R)} (φ, \tilde{x})]| = O_{P} (ϑ^{- m / 2} ς_{n}) .

A more refined argument using the Borel–Cantelli lemma (as in the proof of Theorem 2) actually yields almost sure convergence at the same rate, provided the truncation threshold is chosen appropriately and the moment condition (C.3) holds. Combining the bounds for the truncated part (Step 7) and the remainder part (Step 9), we obtain

sup_{\tilde{x} \in S_{d, 1}^{m}} |u_{n, 2} (φ, \tilde{x}) - E [u_{n, 2} (φ, \tilde{x})]| = O (ϑ^{(d - 1) m / 2} \sqrt{\frac{\log n}{n}}) almost surely .

For the case

d = 1

(univariate covariate), this simplifies to

O (\sqrt{(\log n) / n})

. For

d \geq 2

, the bound is weaker, because it is inflated by the factor

ϑ^{(d - 1) m / 2}

, which diverges as

ϑ \to \infty

; the bound tends to zero only when the bandwidth condition forces

ϑ^{(d - 1) m / 2} \sqrt{(\log n) / n} \to 0

.

This completes the proof of Theorem 7. □

Remark 25.

Several technical points deserve emphasis. First, the use of the block decomposition for the U-statistic (Step 5) is essential to obtain independent summands, allowing the application of Bernstein’s inequality. Second, the union bound over

ϑ^{d m}

terms is compensated by the polynomial decay in n of the Bernstein tail bound, which can be made as fast as needed by choosing

ε_{0}

sufficiently large. Third, the truncation threshold

ω_{n, 2}

is chosen to balance the bias from the remainder term (which decays as

ω_{n, 2}^{- (1 + γ)}

) and the variance of the truncated term (which involves

ω_{n, 2}^{2}

). The optimal balance is achieved at

ω_{n, 2} = {(ϑ^{- m / 2} ς_{n})}^{- 1 / (1 + γ)}

, yielding the rate

ϑ^{- m / 2} ς_{n}

. Finally, under the MAR assumption, the propensity score

p (\cdot)

appears in the expectations but cancels out in the final rates due to the product structure and the boundedness

0 < p (\cdot) \leq 1

. The positivity condition

p (\cdot) \geq c > 0

ensures that the denominator of the ratio estimator does not degenerate, which is necessary for the application of the delta method in the main theorem.

Proof of Theorem 8.

The proof of Theorem 8 combines the classical decomposition (11.2) with the uniform convergence rates established in Theorem 7 for the underlying conditional U-statistics and with the uniform lower bounds for the denominator that follow from the MAR assumption (2.4) and the positivity condition

{inf}_{x \in S_{d, 1}} p (x) \geq c > 0

. Throughout this proof, all U-statistics are understood to be the complete-case versions incorporating the missingness indicators

δ_{i}

as defined in (3.5); for notational brevity, we write

u_{n, 2} (φ, \tilde{x})

in place of

u_{n, 2}^{(miss)} (φ, \tilde{x})

, with the implicit understanding that the product

\prod_{j = 1}^{m} δ_{i_{j}}

is included in the kernel. Recall that the estimator of interest admits the representation

{\hat{r}}_{n, 2}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, 2} (\tilde{x})) = u_{n, 2} (φ, \tilde{x}) / u_{n, 2} (1, \tilde{x})

, and its Hájek-type centering is defined as

\hat{E} [\cdot] = E [u_{n, 2} (φ, \tilde{x})] / E [u_{n, 2} (1, \tilde{x})]

. From the elementary identity

\frac{a}{b} - \frac{α}{β} = \frac{a - α}{b} - \frac{α (b - β)}{b β},

valid for

b, β \neq 0

, we obtain the fundamental decomposition

|{\hat{r}}_{n, 2}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, 2} (\tilde{x})) - \hat{E} [{\hat{r}}_{n, 2}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, 2} (\tilde{x}))]| \leq I_{2, 1} (\tilde{x}) + I_{2, 2} (\tilde{x}),

where the stochastic components are defined as

\begin{matrix} I_{2, 1} (\tilde{x}) & : = & \frac{|u_{n, 2} (φ, \tilde{x}) - E [u_{n, 2} (φ, \tilde{x})]|}{|u_{n, 2} (1, \tilde{x})|}, \\ I_{2, 2} (\tilde{x}) & : = & \frac{|E [u_{n, 2} (φ, \tilde{x})]| \cdot |u_{n, 2} (1, \tilde{x}) - E [u_{n, 2} (1, \tilde{x})]|}{|u_{n, 2} (1, \tilde{x})| \cdot |E [u_{n, 2} (1, \tilde{x})]|} . \end{matrix}
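The elementary identity and the resulting two-term bound can be verified exactly with rational arithmetic. The Python sketch below is a small illustration; the scalar inputs stand in for the U-statistics and their expectations at a fixed point and are hypothetical.

```python
from fractions import Fraction

def ratio_decomposition(a, b, alpha, beta):
    """Return (a/b - alpha/beta, (a - alpha)/b - alpha (b - beta)/(b beta))."""
    lhs = Fraction(a) / b - Fraction(alpha) / beta
    rhs = Fraction(a - alpha) / b - Fraction(alpha) * (b - beta) / (Fraction(b) * beta)
    return lhs, rhs

def bound_terms(a, b, alpha, beta):
    """Scalar analogues of I_{2,1} and I_{2,2}."""
    i21 = abs(Fraction(a - alpha) / b)
    i22 = abs(Fraction(alpha) * (b - beta)) / (abs(Fraction(b)) * abs(Fraction(beta)))
    return i21, i22

# a, b play the role of u(phi, x), u(1, x); alpha, beta of their expectations.
a, b = Fraction(7, 5), Fraction(3, 2)
alpha, beta = Fraction(1), Fraction(2)

lhs, rhs = ratio_decomposition(a, b, alpha, beta)
i21, i22 = bound_terms(a, b, alpha, beta)
print(lhs, rhs, i21 + i22)
```

Because the arithmetic is exact, the two sides of the identity agree as fractions, and the triangle inequality gives the bound by the sum of the two terms.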

The crux of the proof lies in establishing uniform (in

\tilde{x}

) almost sure bounds for both

I_{2, 1}

and

I_{2, 2}

, which then translate directly into the desired rate for the ratio estimator. To this end, we first note that Theorem 7 furnishes the following uniform convergence rates for the numerator and denominator U-statistics: Almost surely,

sup_{\tilde{x} \in S_{d, 1}^{m}} |u_{n, 2} (ψ, \tilde{x}) - E [u_{n, 2} (ψ, \tilde{x})]| = O (ϑ^{m (d - 1 / 2)} \sqrt{\frac{\log n}{n}}),

for

ψ = φ

and for

ψ \equiv 1

. Denote the rate normalization by

ς_{n, 2} : = ϑ^{m (d - 1 / 2)} \sqrt{\frac{\log n}{n}},

which represents the optimal balance between the bias from the truncation threshold and the variance of the truncated U-statistic, as derived in the proof of Theorem 7. Turning to the denominator, we require uniform lower bounds that ensure the ratios

I_{2, 1}

and

I_{2, 2}

are well defined and controllable. Under the MAR assumption, the bias expansion for the constant kernel (a direct corollary of Proposition 2 generalized to m dimensions) yields

E [u_{n, 2} (1, \tilde{x})] = \prod_{j = 1}^{m} p (x_{j}) \tilde{f} (\tilde{x}) + O (ϑ^{- 1 / 2}),

uniformly in

\tilde{x} \in S_{d, 1}^{m}

. Since

\tilde{f}

is continuous and strictly positive on the compact set

S_{d, 1}^{m}

(by condition (C.2) and the compactness of the simplex), and since

p (x_{j}) \geq c > 0

by the positivity condition, there exists a constant

c_{2} > 0

such that

inf_{\tilde{x} \in S_{d, 1}^{m}} |E [u_{n, 2} (1, \tilde{x})]| \geq c_{2} > 0 for all sufficiently large ϑ .

Furthermore, the uniform convergence of

u_{n, 2} (1, \tilde{x})

to its expectation, guaranteed by Theorem 7, implies that for sufficiently large n,

inf_{\tilde{x} \in S_{d, 1}^{m}} |u_{n, 2} (1, \tilde{x})| \geq \frac{c_{2}}{2} > 0 almost surely,

since the deviation

| u_{n, 2} (1, \tilde{x}) - E [u_{n, 2} (1, \tilde{x})] |

is of order

o (1)

uniformly. Define

c_{1} : = c_{2} / 2

; then, almost surely,

inf_{\tilde{x} \in S_{d, 1}^{m}} |u_{n, 2} (1, \tilde{x})| \geq c_{1} > 0 .

For the numerator expectation, the bias expansion together with condition (C.3) gives the uniform boundedness

sup_{\tilde{x} \in S_{d, 1}^{m}} |E [u_{n, 2} (φ, \tilde{x})]| = O (1),

so there exists a constant

C_{φ} < \infty

such that

{sup}_{\tilde{x}} | E [u_{n, 2} (φ, \tilde{x})] | \leq C_{φ}

. With these preparatory estimates in hand, we now bound each term. For

I_{2, 1}

, using the lower bound

| u_{n, 2} (1, \tilde{x}) | \geq c_{1}

almost surely, we obtain

sup_{\tilde{x} \in S_{d, 1}^{m}} \frac{I_{2, 1} (\tilde{x})}{ς_{n, 2}} \leq \frac{1}{c_{1}} sup_{\tilde{x} \in S_{d, 1}^{m}} \frac{|u_{n, 2} (φ, \tilde{x}) - E [u_{n, 2} (φ, \tilde{x})]|}{ς_{n, 2}} = O (1) almost surely,

where the final equality follows directly from the uniform rate provided by Theorem 7.

For

I_{2, 2}

, we similarly apply the lower bounds

| u_{n, 2} (1, \tilde{x}) | \geq c_{1}

and

| E [u_{n, 2} (1, \tilde{x})] | \geq c_{2}

, together with the boundedness of

E [u_{n, 2} (φ, \tilde{x})]

, to obtain

\begin{matrix} sup_{\tilde{x} \in S_{d, 1}^{m}} \frac{I_{2, 2} (\tilde{x})}{ς_{n, 2}} & \leq \frac{{sup}_{\tilde{x}} | E [u_{n, 2} (φ, \tilde{x})] |}{c_{1} c_{2}} \cdot sup_{\tilde{x} \in S_{d, 1}^{m}} \frac{|u_{n, 2} (1, \tilde{x}) - E [u_{n, 2} (1, \tilde{x})]|}{ς_{n, 2}} \end{matrix}

(13.12)

\begin{matrix} \leq \frac{C_{φ}}{c_{1} c_{2}} \cdot O (1) = O (1) almost surely, \end{matrix}

(13.13)

where the rate for

u_{n, 2} (1, \tilde{x})

again follows from Theorem 7 with

φ \equiv 1

. Summing the two contributions, we conclude that, almost surely,

sup_{\tilde{x} \in S_{d, 1}^{m}} \frac{|{\hat{r}}_{n, 2}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, 2} (\tilde{x})) - \hat{E} [{\hat{r}}_{n, 2}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, 2} (\tilde{x}))]|}{ς_{n, 2}} = O (1),

which is equivalent to the statement

sup_{\tilde{x} \in S_{d, 1}^{m}} |{\hat{r}}_{n, 2}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, 2} (\tilde{x})) - \hat{E} [{\hat{r}}_{n, 2}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, 2} (\tilde{x}))]| = O (ϑ^{m (d - 1 / 2)} \sqrt{\frac{\log n}{n}}) almost surely .

This completes the proof of Theorem 8. □

Remark 26.

Several subtle points merit explicit mention. First, the uniform almost sure bounds for

u_{n, 2} (φ, \tilde{x})

and

u_{n, 2} (1, \tilde{x})

are not merely of order

O_{P} (ς_{n, 2})

but hold with probability one; this strengthening is essential for the subsequent manipulation of the ratio, as it allows us to treat the denominator as uniformly bounded away from zero in an almost sure sense. Second, the constants

c_{1}

and

c_{2}

depend implicitly on the propensity score

p (\cdot)

and the density

f (\cdot)

, but crucially not on the sample size n or the bandwidth parameter ϑ; this uniformity is guaranteed by the compactness of

S_{d, 1}^{m}

and the continuity of p and f. Third, while the MAR assumption introduces the factor

\prod_{j = 1}^{m} p (x_{j})

into the expectation

E [u_{n, 2} (1, \tilde{x})]

, this factor cancels completely in the ratio

\hat{E} [{\hat{r}}_{n, 2}^{(m)}] = E [u_{n, 2} (φ, \tilde{x})] / E [u_{n, 2} (1, \tilde{x})]

, as the same factor appears in both numerator and denominator. This cancellation is the mathematical manifestation of the well-known property that complete-case estimators under MAR retain the same bias expansion as their complete-data counterparts, up to higher-order terms that are asymptotically negligible. Consequently, the rate

ς_{n, 2}

remains unchanged from the complete-data case, although the asymptotic variance is inflated by the factor

1 / p (x_{j})

(a phenomenon that affects the constant in the central limit theorem but not the rate of convergence).

Proof of Theorem 9.

The proof of Theorem 9 analyzes the bias decomposition (11.3) under the MAR assumption (2.4) with the positivity condition

{inf}_{x \in S_{d, 1}} p (x) \geq c > 0

. Throughout this proof, all U-statistics are understood to be the complete-case versions incorporating the missingness indicators

δ_{i}

as defined in (3.5); for notational brevity, we write

u_{n, 2} (φ, \tilde{x})

in place of

u_{n, 2}^{(miss)} (φ, \tilde{x})

, with the implicit understanding that the product

\prod_{j = 1}^{m} δ_{i_{j}}

is included in the kernel. From the bias decomposition (11.3), we have

\begin{matrix} |\hat{E} [{\hat{r}}_{n, 2}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, 2} (\tilde{x}))] - r^{(m)} (φ, \tilde{x})| \\ = \frac{1}{|E (u_{n, 2} (1, \tilde{x}))|} |E (u_{n, 2} (φ, \tilde{x})) - r^{(m)} (φ, \tilde{x}) E (u_{n, 2} (1, \tilde{x}))| . \end{matrix}

Consequently, the proof reduces to establishing two fundamental uniform bias estimates:

sup_{\tilde{x} \in S_{d, 1}^{m}} |E (u_{n, 2} (φ, \tilde{x})) - (\prod_{j = 1}^{m} p (x_{j})) R (φ, \tilde{x})| = O (ϑ^{- m d / 2}),

and

sup_{\tilde{x} \in S_{d, 1}^{m}} |E (u_{n, 2} (1, \tilde{x})) - (\prod_{j = 1}^{m} p (x_{j})) \tilde{f} (\tilde{x})| = O (ϑ^{- m d / 2}),

where

R (φ, \tilde{x}) = \tilde{f} (\tilde{x}) r^{(m)} (φ, \tilde{x})

. Once these are established, the desired result follows immediately from the positivity condition

{inf}_{\tilde{x} \in S_{d, 1}^{m}} \tilde{f} (\tilde{x}) > 0

(which is a consequence of condition (C.2) and the compactness of the simplex) and the fact that

E [u_{n, 2} (1, \tilde{x})]

converges uniformly to

\tilde{f} (\tilde{x}) \prod_{j = 1}^{m} p (x_{j})

, which is uniformly bounded away from zero by the positivity of the propensity score.

The key technical tool for establishing (13) is the following proposition, which provides a sharp bias expansion for the conditional U-statistic under the MAR assumption.

Proposition 3.

Assume that condition (C.2) holds and that the MAR assumption (2.4) is satisfied with

{inf}_{x \in S_{d, 1}} p (x) \geq c > 0

. Then, uniformly for

\tilde{x} \in S_{d, 1}^{m}

,

E [u_{n, 2} (φ, \tilde{x})] = (\prod_{j = 1}^{m} p (x_{j})) R (φ, \tilde{x}) + ϑ^{- m} L_{m} (\tilde{x}) + o (ϑ^{- m}), ϑ \to \infty,

where

\begin{matrix} L_{m} (\tilde{x}) : = {(\frac{d (d - 1)}{2 ϑ})}^{m} (\prod_{j = 1}^{m} p (x_{j})) R (φ, \tilde{x}) \end{matrix}

(13.14)

\begin{matrix} + \sum_{i = 1}^{m} \sum_{ℓ = 1}^{d} (\frac{1}{2} - x_{i_{ℓ}}) \frac{\partial}{\partial x_{i_{ℓ}}} (R (φ, \tilde{x}) p (x_{i})) \end{matrix}

(13.15)

\begin{matrix} + \frac{1}{2} \sum_{i, j = 1}^{m} \sum_{ℓ, r = 1}^{d} (x_{i_{ℓ}} 1_{{i_{ℓ} = j_{r}}} - x_{i_{ℓ}} x_{j_{r}}) \frac{\partial^{2}}{\partial x_{i_{ℓ}} \partial x_{j_{r}}} (R (φ, \tilde{x}) p (x_{i}) p (x_{j})) . \end{matrix}

(13.16)

Proof of Proposition 3.

Under the MAR assumption, the expectation of the complete-case U-statistic takes the form

E [u_{n, 2} (φ, \tilde{x})] = \int_{S_{d, 1}^{m}} R (φ, \tilde{u}) (\prod_{j = 1}^{m} p (u_{j})) {\tilde{K}}_{\tilde{x}, ϑ} (\tilde{u}) d \tilde{u},

where

{\tilde{K}}_{\tilde{x}, ϑ} (\tilde{u}) = \prod_{j = 1}^{m} K_{x_{j}, ϑ} (u_{j})

is the product Bernstein kernel. This representation follows from the tower property and the MAR condition

E [δ_{j} ∣ X_{j} = u_{j}] = p (u_{j})

. The Bernstein kernel admits the explicit representation

K_{x, ϑ} (u) = \frac{(ϑ - 1 + d)!}{(ϑ - 1)!} \sum_{k \in N_{0}^{d} \cap (ϑ - 1) S_{d, 1}} 1_{(\frac{k}{ϑ}, \frac{k + 1}{ϑ}]} (u) P_{k, ϑ - 1} (x),

where

P_{k, ϑ - 1} (x)

are multinomial probabilities. Substituting this representation yields

\begin{matrix} E [u_{n, 2} (φ, \tilde{x})] \\ = {(\frac{(ϑ - 1 + d)!}{(ϑ - 1)!})}^{m} \sum_{(k_{1}, \dots, k_{m}) \in K_{ϑ}^{m}} (\int_{(\frac{k_{1}}{ϑ}, \frac{k_{1} + 1}{ϑ}]} \dots \int_{(\frac{k_{m}}{ϑ}, \frac{k_{m} + 1}{ϑ}]} R (φ, \tilde{u}) \prod_{j = 1}^{m} p (u_{j}) d \tilde{u}) \\ \times \prod_{j = 1}^{m} P_{k_{j}, ϑ - 1} (x_{j}), \end{matrix}

where

K_{ϑ} : = N_{0}^{d} \cap (ϑ - 1) S_{d, 1}

. For any multi-index

\tilde{k} = (k_{1}, \dots, k_{m})

such that

∥ \tilde{k} / ϑ - \tilde{x} ∥_{1} = o (1)

, we perform a second-order Taylor expansion of the function

R (φ, \tilde{u}) \prod_{j = 1}^{m} p (u_{j})

around

\tilde{u} = \tilde{k} / ϑ

. A careful calculation gives

\begin{matrix} ϑ^{d m} \int_{(\frac{k_{1}}{ϑ}, \frac{k_{1} + 1}{ϑ}]} \dots \int_{(\frac{k_{m}}{ϑ}, \frac{k_{m} + 1}{ϑ}]} R (φ, \tilde{u}) \prod_{j = 1}^{m} p (u_{j}) d \tilde{u} - R (φ, \tilde{x}) \prod_{j = 1}^{m} p (x_{j}) \\ = \frac{1}{ϑ^{d}} \sum_{i = 1}^{m} \sum_{ℓ = 1}^{d} (k_{i_{ℓ}} - (ϑ - 1) x_{i_{ℓ}}) \frac{\partial}{\partial x_{i_{ℓ}}} (R (φ, \tilde{x}) p (x_{i})) \\ + \frac{1}{ϑ^{d}} \sum_{i = 1}^{m} \sum_{ℓ = 1}^{d} (\frac{1}{2} - x_{i_{ℓ}}) \frac{\partial}{\partial x_{i_{ℓ}}} (R (φ, \tilde{x}) p (x_{i})) \\ + \frac{1}{2} \sum_{i, j = 1}^{m} \sum_{ℓ, r = 1}^{d} (\frac{k_{i_{ℓ}}}{ϑ} - x_{i_{ℓ}}) (\frac{k_{j_{r}}}{ϑ} - x_{j_{r}}) \frac{\partial^{2}}{\partial x_{i_{ℓ}} \partial x_{j_{r}}} (R (φ, \tilde{x}) p (x_{i}) p (x_{j})) (1 + o (1)) + o (ϑ^{- d}) . \end{matrix}

Multiplying this expansion by

ϑ^{- d m} {(\frac{(ϑ - 1 + d)!}{(ϑ - 1)!})}^{m} \prod_{j = 1}^{m} P_{k_{j}, ϑ - 1} (x_{j})

and summing over all

\tilde{k} \in K_{ϑ}^{m}

, we invoke the fundamental identities for multinomial distributions:

\sum_{k \in K_{ϑ}} (k_{i} - (ϑ - 1) x_{i}) P_{k, ϑ - 1} (x) = 0,

\sum_{k \in K_{ϑ}} (\frac{k_{i}}{ϑ} - x_{i}) (\frac{k_{j}}{ϑ} - x_{j}) P_{k, ϑ - 1} (x) = \frac{1}{ϑ} (x_{i} 1_{{i = j}} - x_{i} x_{j}) + O (ϑ^{- 2}) .
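The multinomial moment identities can be checked by exhaustive summation over the lattice. The sketch below assumes that P_{k, ϑ - 1}(x) is the multinomial probability with ϑ - 1 trials and cell probabilities (x_1, ..., x_d, 1 - x_1 - ... - x_d), and it centers k at its exact mean (ϑ - 1) x; centering k / ϑ at x instead changes the first moment by a term of order ϑ^{-1} and the second moment by a term of order ϑ^{-2}. The function names and the test point are hypothetical.

```python
from itertools import product
from math import factorial

def multinomial_pmf(k, n, x):
    """P_{k,n}(x): multinomial with n trials, cell probabilities (x_1..x_d, 1 - |x|)."""
    rest = n - sum(k)
    coef = factorial(n) // factorial(rest)
    for kj in k:
        coef //= factorial(kj)
    prob = (1.0 - sum(x)) ** rest
    for kj, xj in zip(k, x):
        prob *= xj ** kj
    return coef * prob

def centered_moments(theta, x):
    """Total mass, first and second moments of k - (theta - 1) x under P_{k,theta-1}(x)."""
    n, d = theta - 1, len(x)
    total = 0.0
    mean = [0.0] * d
    second = [[0.0] * d for _ in range(d)]
    for k in product(range(theta), repeat=d):
        if sum(k) > n:
            continue                      # k lies outside the dilated simplex
        p = multinomial_pmf(k, n, x)
        total += p
        for i in range(d):
            ci = k[i] - n * x[i]
            mean[i] += ci * p
            for j in range(d):
                second[i][j] += ci * (k[j] - n * x[j]) * p
    return total, mean, second

theta, x = 30, (0.2, 0.5)
total, mean, second = centered_moments(theta, x)
print(total, mean)
print([[s / theta**2 for s in row] for row in second])
```

After division by ϑ², the second moments agree with (x_i 1_{i = j} - x_i x_j) / ϑ up to the factor (ϑ - 1) / ϑ.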

The terms involving the first-order moments vanish identically, while the second-order moments contribute at order

ϑ^{- 1}

. Summation over the product structure introduces combinatorial factors, ultimately yielding the expansion stated in the proposition. The remainder terms are of order

o (ϑ^{- m})

under the smoothness condition (C.2). This completes the proof of Proposition 3. □

With Proposition 3 established, we now derive the uniform bias estimate (13). From the expansion, we have

E [u_{n, 2} (φ, \tilde{x})] = (\prod_{j = 1}^{m} p (x_{j})) R (φ, \tilde{x}) + O (ϑ^{- m}) .

However, a more refined analysis using the Lipschitz continuity of

R (φ, \cdot) p (\cdot)

(which follows from condition (C.2) and the smoothness of

p (\cdot)

) yields a sharper rate. Indeed, from the representation

\begin{matrix} E [u_{n, 2} (φ, \tilde{x})] & = & {(\frac{(ϑ - 1 + d)!}{(ϑ - 1)!})}^{m} \sum_{\tilde{k} \in K_{ϑ}^{m}} R (φ, \tilde{k} / ϑ) (\prod_{j = 1}^{m} p (k_{j} / ϑ)) \\ \times \prod_{j = 1}^{m} P_{k_{j}, ϑ - 1} (x_{j}) + O (ϑ^{- d (m + 1)}), \end{matrix}

and the fact that

R (φ, \cdot) p (\cdot)

is Lipschitz with constant L, we obtain

\begin{matrix} |E [u_{n, 2} (φ, \tilde{x})] - R (φ, \tilde{x}) p (\tilde{x})| \\ \leq L {(\frac{(ϑ - 1 + d)!}{(ϑ - 1)!})}^{m} \sum_{\tilde{k} \in K_{ϑ}^{m}} {∥\frac{\tilde{k}}{ϑ} - \tilde{x}∥}_{1} \prod_{j = 1}^{m} P_{k_{j}, ϑ - 1} (x_{j}) + O (ϑ^{- d (m + 1)}) \\ \leq L \sum_{j = 1}^{m} \sum_{ℓ = 1}^{d} E [| ξ_{j_{ℓ}} - x_{j_{ℓ}} |] + O (ϑ^{- d (m + 1)}), \end{matrix}

where

ξ_{j_{ℓ}}

denotes the ℓ-th component of the rescaled multinomial vector

k_{j} / ϑ

, where the index is drawn according to the weights

P_{k_{j}, ϑ - 1} (x_{j})

. Using the Cauchy–Schwarz inequality and the fact that

E [{(ξ_{j_{ℓ}} - x_{j_{ℓ}})}^{2}] = O (ϑ^{- 1})

, we obtain

E [| ξ_{j_{ℓ}} - x_{j_{ℓ}} |] = O (ϑ^{- 1 / 2})

. However, a more careful analysis using the explicit form of the Bernstein kernel reveals that the summation over

\tilde{k}

introduces an additional factor of

ϑ^{- (d - 1) m / 2}

, leading to the rate

ϑ^{- d m / 2}

. Indeed, the number of terms in the sum is of order

ϑ^{d m}

, while the typical deviation

∥ \tilde{k} / ϑ - \tilde{x} ∥_{1}

is of order

ϑ^{- 1 / 2}

, and the multinomial probabilities

P_{k, ϑ - 1} (x)

concentrate on a set of size

ϑ^{(d - 1) m / 2}

. Consequently,

|E [u_{n, 2} (φ, \tilde{x})] - R (φ, \tilde{x}) p (\tilde{x})| = O (ϑ^{- d m / 2}) .

A similar argument applied to the constant kernel

φ \equiv 1

gives

|E [u_{n, 2} (1, \tilde{x})] - \tilde{f} (\tilde{x}) p (\tilde{x})| = O (ϑ^{- d m / 2}) .

Now, returning to the bias decomposition, we compute

\begin{matrix} E [u_{n, 2} (φ, \tilde{x})] - r^{(m)} (φ, \tilde{x}) E [u_{n, 2} (1, \tilde{x})] \\ = [R (φ, \tilde{x}) p (\tilde{x}) + O (ϑ^{- d m / 2})] - r^{(m)} (φ, \tilde{x}) [\tilde{f} (\tilde{x}) p (\tilde{x}) + O (ϑ^{- d m / 2})] \\ = p (\tilde{x}) [R (φ, \tilde{x}) - r^{(m)} (φ, \tilde{x}) \tilde{f} (\tilde{x})] + O (ϑ^{- d m / 2}) \\ = O (ϑ^{- d m / 2}), \end{matrix}

since

R (φ, \tilde{x}) = r^{(m)} (φ, \tilde{x}) \tilde{f} (\tilde{x})

by definition. The factor

p (\tilde{x}) = \prod_{j = 1}^{m} p (x_{j})

cancels exactly, a crucial consequence of the MAR assumption that ensures the same propensity score product appears in both the numerator and denominator expectations. Finally, under the positivity condition

{inf}_{\tilde{x} \in S_{d, 1}^{m}} \tilde{f} (\tilde{x}) > 0

, we have

{inf}_{\tilde{x}} E [u_{n, 2} (1, \tilde{x})] \geq \frac{1}{2} inf \tilde{f} (\tilde{x}) \cdot c^{m} > 0

for sufficiently large

ϑ

. Hence,

sup_{\tilde{x} \in S_{d, 1}^{m}} |\hat{E} [{\hat{r}}_{n, 2}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, 2} (\tilde{x}))] - r^{(m)} (φ, \tilde{x})| = O (ϑ^{- d m / 2}) .

This completes the proof of Theorem 9. □

Remark 27.

Several subtle points in the proof deserve explicit commentary. First, the cancellation of the propensity score product

\prod_{j = 1}^{m} p (x_{j})

is not accidental but rather a direct consequence of the MAR assumption and the fact that the same missingness indicators appear in both the numerator and denominator of the estimator. This cancellation is exact at the level of the expectations, not merely asymptotic, which explains why the bias rate

ϑ^{- d m / 2}

coincides with the complete-data case. Second, the rate

ϑ^{- d m / 2}

is slower than the rate

ϑ^{- m}

obtained from the Taylor expansion of

R (φ, \cdot) p (\cdot)

; this is because the Lipschitz argument captures the leading stochastic fluctuation, while the higher-order Taylor terms contribute at smaller orders. Third, the condition

inf \tilde{f} (\tilde{x}) > 0

is essential to ensure that the denominator

E [u_{n, 2} (1, \tilde{x})]

does not approach zero, which would otherwise invalidate the bias expansion. This condition is automatically satisfied under (C.2) since

\tilde{f}

is continuous and strictly positive on the compact simplex. Finally, while the proposition includes the factor

\prod_{j = 1}^{m} p (x_{j})

in the leading term, this factor cancels in the final bias expression, demonstrating the robustness of the complete-case estimator under MAR.

14. Proof of the Results of Section 5: Beta Kernels

Let

A_{h}, h = 1, \dots, N_{n}^{d}

denote the h-th sub-hyper-rectangle of the grid partition constructed below. Also let

x_{h}

be the most distant point in

A_{h}

from the origin, that is,

x_{h} : = \arg {max}_{x \in A_{h}} ∥ x ∥

. Suppose that the design point

x

falls into

A_{h}

. Then, for all

\tilde{x} = (x_{1}, \dots, x_{m})

we denote

{\tilde{x}}_{h} = (x_{1, h}, \dots, x_{m, h})

such that

{\tilde{x}}_{h} : = \arg {max}_{\tilde{x} \in A_{h}^{m}} ∥ \tilde{x} ∥

. Here

\tilde{x} = (x_{1}, \dots, x_{m}) \in S_{\tilde{x}} : = \prod_{i = 1}^{m} S_{X_{i}}

, where

S_{X_{i}} = S_{X_{i}} (η_{i}) : = \prod_{j = 1}^{d} [η_{i_{j}}, 1 - η_{i_{j}}] \subseteq {[0, 1]}^{d},

and the boundary parameters

η_{i} : = (η_{i_{1}}, \dots, η_{i_{d}})

either are fixed or shrink to zero at a suitable rate. For each

1 \leq i \leq m

, we divide every edge of the d-dimensional hyper-rectangle

S_{X_{i}}

into

N_{n}

equal segments, resulting in

N_{n}^{d}

identical sub-hyper-rectangles. For any

\tilde{x} = (x_{1}, \dots, x_{m}) \in S_{X}^{m}

, there exists

ℓ (\tilde{x}) = (ℓ (x_{1}), \dots, ℓ (x_{m}))

such that for all

1 \leq i \leq m

,

1 \leq ℓ (x_{i}) \leq N_{n}^{d}

, and

\tilde{x} \in \prod_{i = 1}^{m} A (x_{ℓ (x_{i})}) \quad \text{such that} \quad x_{ℓ (x_{i})} : = \arg max_{x \in A (x_{ℓ (x_{i})})} ∥ x ∥ .
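The grid construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `farthest_corner` and the numerical values of `x`, `eta` and `n_grid` are hypothetical, and a single block (m = 1) is shown. Because every coordinate is nonnegative, the point of a cell farthest from the origin is its upper corner, and its distance to any point of the cell is at most the cell diameter.

```python
import math

def farthest_corner(x, eta, n_grid):
    """Return (cell index per coordinate, corner of that cell farthest from the origin).

    x      : point in prod_j [eta[j], 1 - eta[j]]
    eta    : boundary parameters, one per coordinate
    n_grid : number of equal segments per edge (N_n in the text)
    """
    index, corner = [], []
    for xj, ej in zip(x, eta):
        width = (1.0 - 2.0 * ej) / n_grid
        # Segment containing xj; the right endpoint 1 - eta goes to the last segment.
        k = min(int((xj - ej) / width), n_grid - 1)
        index.append(k)
        # All coordinates are nonnegative, so the farthest point is the upper corner.
        corner.append(ej + (k + 1) * width)
    return index, corner

# Hypothetical design point, boundary parameters and grid size (d = 2).
x, eta, n_grid = [0.23, 0.71], [0.05, 0.10], 8
index, corner = farthest_corner(x, eta, n_grid)
mesh = math.sqrt(sum(((1 - 2 * e) / n_grid) ** 2 for e in eta))  # cell diameter
dist = math.dist(x, corner)
print(index, corner, dist <= mesh)
```

For a vector of m blocks the same bound applies block by block, so the Euclidean distance to the representative point is at most the square root of m times the largest cell diameter.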

Under the MAR assumption (2.4) with the positivity condition

{inf}_{x \in S_{d, 1}} p (x) \geq c > 0

, we consider the complete-case U-statistic

u_{n, 3}^{(miss)} (φ, \tilde{x})

incorporating the missingness indicators

\prod_{j = 1}^{m} δ_{i_{j}}

. For notational brevity, we write

u_{n, 3} (φ, \tilde{x})

with the implicit understanding that the missingness indicators are included. For each

\tilde{x} \in S_{X}^{m}

, we consider the decomposition

\begin{matrix} |u_{n, 3} (φ, \tilde{x}) - E [u_{n, 3} (φ, \tilde{x})]| & \leq & |u_{n, 3} (φ, \tilde{x}) - u_{n, 3} (φ, {\tilde{x}}_{ℓ (\tilde{x})})| \\ & & + |E [u_{n, 3} (φ, {\tilde{x}}_{ℓ (\tilde{x})})] - E [u_{n, 3} (φ, \tilde{x})]| \\ & & + |u_{n, 3} (φ, {\tilde{x}}_{ℓ (\tilde{x})}) - E [u_{n, 3} (φ, {\tilde{x}}_{ℓ (\tilde{x})})]| . \end{matrix}

Before proceeding, we borrow a few lemmas from [165], all of which are key building blocks for the technical proofs below. Under the MAR assumption, these lemmas remain valid, as they concern only the kernel structure, which is unaffected by the missingness mechanism. Throughout,

θ_{x_{j}}

denotes a beta random variable so that

θ_{x_{j}} \overset{D}{=} Beta \{x_{j} / b_{j} + 1, (1 - x_{j}) / b_{j} + 1\} .

Lemma 4.

Let

θ_{x_{j}}

and

θ_{x_{k}}

be independent for

j \neq k

. Then, as

n \to \infty

, we have

\begin{matrix} sup_{x_{j} \in (0, 1)} E (θ_{x_{j}} - x_{j}) & = & O (b_{j}), \quad \text{and} \\ sup_{x_{j}, x_{k} \in (0, 1)} E \{(θ_{x_{j}} - x_{j}) (θ_{x_{k}} - x_{k})\} & = & \begin{cases} O (b_{j}), & \text{for } j = k, \\ O (b_{j} b_{k}), & \text{for } j \neq k . \end{cases} \end{matrix}
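The orders in Lemma 4 can be checked from the closed-form moments of the beta distribution. The sketch below is a numerical illustration, not part of the proof; the helper name `beta_moments`, the grid of design points and the bandwidth values are assumptions made here. It uses the standard mean and variance of a Beta(α, β) law with α = x/b + 1 and β = (1 - x)/b + 1. For j ≠ k, independence reduces the cross moment to the product of the two biases, which gives the order b_j b_k.

```python
def beta_moments(x, b):
    """Bias and mean squared deviation about x of theta ~ Beta(x/b + 1, (1 - x)/b + 1)."""
    alpha, beta = x / b + 1.0, (1.0 - x) / b + 1.0
    total = alpha + beta                      # equals 1/b + 2
    mean = alpha / total                      # equals (x + b) / (1 + 2b)
    var = alpha * beta / (total ** 2 * (total + 1.0))
    bias = mean - x                           # equals b (1 - 2x) / (1 + 2b)
    return bias, var + bias ** 2

results = []
grid = [i / 200 for i in range(1, 200)]       # hypothetical grid of design points
for b in (0.1, 0.01, 0.001):
    worst_bias = max(abs(beta_moments(x, b)[0]) for x in grid)
    worst_msd = max(beta_moments(x, b)[1] for x in grid)
    # Both ratios stay bounded as b decreases, in line with the O(b) statements.
    results.append((b, worst_bias / b, worst_msd / b))
for row in results:
    print(row)
```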

Lemma 5.

Suppose that

b (= b (n) > 0)

and

η (= η (n) > 0)

satisfy

b, η \to 0

and

b / η \to 0

as

n \to \infty

. Then, as

n \to \infty

, we have

sup_{(x, u) \in [η, 1 - η] \times [0, 1]} K_{B (x, b)} (u) \leq (\frac{9}{4 \sqrt{π}}) b^{- 1 / 2} η^{- 1 / 2} .
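As a sanity check of Lemma 5, the supremum of the beta kernel can be evaluated on a grid and compared with the stated bound. The sketch below assumes one hypothetical pair (b, η) with b/η small and hypothetical grid sizes; it illustrates the inequality for these values only and does not replace the asymptotic argument.

```python
import math

def beta_kernel(u, x, b):
    """Beta kernel K_{B(x,b)}(u): density of Beta(x/b + 1, (1 - x)/b + 1) at u in (0, 1)."""
    a, c = x / b, (1.0 - x) / b
    log_norm = math.lgamma(a + 1.0) + math.lgamma(c + 1.0) - math.lgamma(a + c + 2.0)
    return math.exp(a * math.log(u) + c * math.log(1.0 - u) - log_norm)

b, eta = 0.01, 0.1                                        # hypothetical values, b / eta = 0.1
xs = [eta + i * (1.0 - 2.0 * eta) / 100 for i in range(101)]
us = [(i + 0.5) / 1000 for i in range(1000)]             # interior grid on (0, 1)
sup_kernel = max(beta_kernel(u, x, b) for x in xs for u in us)
bound = 9.0 / (4.0 * math.sqrt(math.pi)) / math.sqrt(b * eta)
print(sup_kernel, bound, sup_kernel <= bound)
```

The kernel peaks at u = x, and the largest peak over the design interval occurs at its endpoints, where x(1 - x) is smallest.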

Lemma 6.

Under the same condition as in Lemma 5, as

n \to \infty

, we have

sup_{(x, u) \in [η, 1 - η] \times [0, 1]} |\frac{\partial K_{B (x, b)} (u)}{\partial x}| \leq \{(\frac{9}{4 \sqrt{π}}) (γ + \frac{π^{2}}{6}) + 1\} b^{- (2 + 1 / 2)} η^{- 1 / 2},

where

γ = 0.5772 \dots

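is Euler’s constant. This use of γ is local to Lemma 6; elsewhere in the proofs, γ denotes the moment exponent of condition (C.3).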

Proof of Theorem 10.

To establish this theorem under the MAR assumption, we employ a truncation argument for the conditional U-statistic, carefully accounting for the missingness indicators. First, let us introduce the following notation:

\begin{matrix} ϕ_{n} & = & \sqrt{\frac{(\log n / n)}{\prod_{j = 1}^{m} (\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})}}, \\ ω_{n, 3} & = & ϕ_{n}^{- 1 / (1 + γ)}, \\ N_{n} & = & ϕ_{n}^{- (1 + \frac{1}{1 + γ})} {(\prod_{j = 1}^{m} (\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}}))}^{- \frac{1}{2}} (\sum_{j = 1}^{m} \sum_{i = 1}^{d} \frac{1}{b_{j_{i}}^{2}}) . \end{matrix}
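To make the three quantities concrete, the following sketch evaluates them for hypothetical values of n, of the bandwidths and of the boundary parameters (m = 2 blocks, d = 1, moment exponent γ = 1). The function name `rates` is an assumption of this illustration. It also verifies the arithmetic identity behind the choice of the grid size: multiplying the truncation level by the product of the kernel bounds and by the sum of inverse squared bandwidths, then dividing by N_n, returns exactly ϕ_n.

```python
import math

def rates(n, b, eta, gamma):
    """phi_n, omega_{n,3} and N_n for bandwidths b[j][i] and boundary parameters eta[j][i]."""
    prod_b_eta = math.prod(bji * eji for bj, ej in zip(b, eta) for bji, eji in zip(bj, ej))
    sum_inv_b2 = sum(1.0 / bji ** 2 for bj in b for bji in bj)
    phi = math.sqrt((math.log(n) / n) / prod_b_eta)
    omega = phi ** (-1.0 / (1.0 + gamma))
    n_grid = phi ** (-(1.0 + 1.0 / (1.0 + gamma))) * prod_b_eta ** (-0.5) * sum_inv_b2
    return phi, omega, n_grid, prod_b_eta, sum_inv_b2

# Hypothetical values: m = 2 blocks, d = 1 coordinate each, gamma = 1.
phi, omega, n_grid, prod_b_eta, sum_inv_b2 = rates(
    n=10**6, b=[[0.05], [0.05]], eta=[[0.2], [0.2]], gamma=1.0)
# N_n is calibrated so that omega * (kernel bound) * (sum of 1/b^2) / N_n equals phi_n.
calibrated = omega * prod_b_eta ** (-0.5) * sum_inv_b2 / n_grid
print(phi, omega, n_grid, calibrated)
```

In practice N_n would be rounded up to an integer, which only decreases the calibrated product.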

From the truncation decomposition (11.1), adapted to the MAR setting with the kernel

G_{φ, \tilde{x}, 3}^{(miss)}

defined in (2.6), we can write

\begin{matrix} u_{n, 3} (φ, \tilde{x}) & = & u_{n, 3}^{(m)} (G_{φ, \tilde{x}, 3}^{(miss), (T)}) + u_{n, 3}^{(m)} (G_{φ, \tilde{x}, 3}^{(miss), (R)}) \\ = & u_{n, 3}^{(T)} (φ, \tilde{x}) + u_{n, 3}^{(R)} (φ, \tilde{x}) . \end{matrix}

Using the same truncation technique we employed in the previous sections’ proofs, we establish the results for the truncated and remainder parts separately. The remainder part is handled analogously to the proof of Theorem 2, utilizing condition (C.3) and the MAR assumption to ensure its asymptotic negligibility. We focus here on the truncated part, where the missingness indicators play a crucial role.

Truncated Part under MAR

Let us remark that under the MAR assumption, the truncated complete-case U-statistic satisfies

\begin{matrix} |u_{n, 3}^{(T)} (φ, \tilde{x}) - E (u_{n, 3}^{(T)} (φ, \tilde{x}))| \\ = \frac{(n - m)!}{n!} |\sum_{i \in I (m, n)} \{G_{φ, \tilde{x}, 3}^{(miss), (T)} (\tilde{X_{i}}, \tilde{Y_{i}}, {\tilde{δ}}_{i}) - E [G_{φ, \tilde{x}, 3}^{(miss), (T)} (\tilde{X_{i}}, \tilde{Y_{i}}, {\tilde{δ}}_{i})]\}| \\ = \frac{(n - m)!}{n!} |\sum_{i \in I (m, n)} H^{(T)} (\tilde{X_{i}}, \tilde{Y_{i}}, {\tilde{δ}}_{i})|, \end{matrix}

where

H^{(T)} (\tilde{X}, \tilde{Y}, \tilde{δ}) = G_{φ, \tilde{x}, 3}^{(miss), (T)} (\tilde{X}, \tilde{Y}, \tilde{δ}) - E [G_{φ, \tilde{x}, 3}^{(miss), (T)} (\tilde{X}, \tilde{Y}, \tilde{δ})] .

We apply Lemma A2 (the exponential inequality for U-statistics) to the function

H^{(T)} (\cdot, \cdot, \cdot)

. Throughout the remainder of the proof, we assume, without loss of generality, that the kernel

G_{φ, \tilde{x}, 3}^{(miss), (T)}

is symmetric (if not, we replace it by its symmetrization as in Remark 2, which does not affect the U-statistic value). Moreover, by Lemma 5, for a sufficiently large n, we readily infer

\begin{matrix} |H^{(T)} (\tilde{X}, \tilde{Y}, \tilde{δ})| & \leq & 2 ω_{n, 3} {(\frac{9}{4 \sqrt{π}})}^{d m} \prod_{j = 1}^{m} \{{(\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})}^{- \frac{1}{2}}\} \\ \leq & 2 {(\frac{9}{4 \sqrt{π}})}^{d m} \frac{ϕ_{n}^{2 - 1 / (1 + γ)}}{\log (n)} : = C_{H}, \end{matrix}

where we have used the fact that

| \prod_{j = 1}^{m} δ_{i_{j}} | \leq 1

and the boundedness of

φ^{(T)}

by

ω_{n, 3}

. The term

ϕ_{n}^{2 - 1 / (1 + γ)}

arises from the product of the kernel bounds and the truncation threshold. We also note that

θ = E [H^{(T)} (\tilde{X}, \tilde{Y}, \tilde{δ})] = 0

by construction. One can easily derive

\begin{matrix} σ^{2} & = & Var (H^{(T)} (\tilde{X}, \tilde{Y}, \tilde{δ})) \leq E [H^{(T)} {(\tilde{X}, \tilde{Y}, \tilde{δ})}^{2}] \\ \leq & \int_{{[0, 1]}^{d m}} E [{|φ^{(T)} (\tilde{Y})|}^{2} ∣ \tilde{X} = \tilde{u}] (\prod_{j = 1}^{m} p (u_{j})) \tilde{f} (\tilde{u}) {\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})}^{2} (\tilde{u}) d \tilde{u} . \end{matrix}

The factor

\prod_{j = 1}^{m} p (u_{j})

appears from the MAR assumption when taking the expectation over the missingness indicators:

E [\prod_{j = 1}^{m} δ_{i_{j}} ∣ \tilde{X} = \tilde{u}] = \prod_{j = 1}^{m} p (u_{j})

. Using Lyapunov’s inequality, and condition (C.3), for

C_{0}, C_{1} \geq 1

, we have

\begin{matrix} E [{|φ^{(T)} (\tilde{Y})|}^{2} ∣ \tilde{X} = \tilde{u}] (\prod_{j = 1}^{m} p (u_{j})) \tilde{f} (\tilde{u}) & \leq & {\{E [{|φ^{(T)} (\tilde{Y})|}^{2 + γ} ∣ \tilde{X} = \tilde{u}] \tilde{f} (\tilde{u})\}}^{2 / (2 + γ)} {\{\tilde{f} (\tilde{u})\}}^{γ / (2 + γ)} \\ \leq & C_{1}^{2 / (2 + γ)} C_{0}^{γ / (2 + γ)} \leq C_{0} C_{1}, \end{matrix}

where we have used that

0 < p (u_{j}) \leq 1

and

\prod_{j = 1}^{m} p (u_{j}) \leq 1

. In addition, recall that the squared beta kernel satisfies

K_{α, β}^{2} (u) = \frac{B \{2 x / b + 1, 2 (1 - x) / b + 1\}}{B^{2} \{x / b + 1, (1 - x) / b + 1\}} \cdot \frac{u^{2 x / b} {(1 - u)}^{2 (1 - x) / b}}{B \{2 x / b + 1, 2 (1 - x) / b + 1\}} 1_{\{u \in [0, 1]\}} .

By a lemma from [98], the first factor is bounded by

\frac{b^{- 1 / 2} {(1 + b)}^{3 / 2}}{2 \sqrt{π} \sqrt{x (1 - x)}}

for sufficiently large n. The second factor is the probability density function of a Beta distribution with parameters

\{2 x / b + 1, 2 (1 - x) / b + 1\}

and therefore integrates to one over [0, 1]. Consequently, we derive

σ^{2} \leq C_{0} C_{1} \prod_{j = 1}^{m} \{\prod_{i = 1}^{d} \frac{b_{j_{i}}^{- 1 / 2} {(1 + b_{j_{i}})}^{3 / 2}}{2 \sqrt{π} \sqrt{x_{j_{i}} (1 - x_{j_{i}})}}\} \leq C_{0} C_{1} \prod_{j = 1}^{m} \{\prod_{i = 1}^{d} \frac{b_{j_{i}}^{- 1 / 2} {(1 + b_{j_{i}})}^{3 / 2}}{2 \sqrt{π} \sqrt{η_{j_{i}} (1 - η_{j_{i}})}}\} .
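The two ingredients of this bound can be checked numerically in the univariate case. The sketch below assumes a hypothetical bandwidth b = 0.05 and a few design points; the helper names are introduced here for illustration. It compares a midpoint-rule approximation of the integral of the squared kernel with the ratio of beta functions B(2x/b + 1, 2(1 - x)/b + 1) / B²(x/b + 1, (1 - x)/b + 1), and then compares that ratio with the upper bound quoted from [98].

```python
import math

def log_beta(p, q):
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

def roughness(x, b):
    """B(2x/b + 1, 2(1 - x)/b + 1) / B(x/b + 1, (1 - x)/b + 1)^2."""
    a, c = x / b, (1.0 - x) / b
    return math.exp(log_beta(2 * a + 1, 2 * c + 1) - 2 * log_beta(a + 1, c + 1))

def integral_of_squared_kernel(x, b, steps=20000):
    """Midpoint-rule approximation of the integral of K_{B(x,b)}(u)^2 over [0, 1]."""
    a, c = x / b, (1.0 - x) / b
    log_norm = log_beta(a + 1, c + 1)
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) / steps
        total += math.exp(2 * (a * math.log(u) + c * math.log(1 - u) - log_norm))
    return total / steps

b = 0.05                                      # hypothetical bandwidth
checks = []
for x in (0.1, 0.3, 0.5, 0.9):
    bound = b ** -0.5 * (1 + b) ** 1.5 / (2 * math.sqrt(math.pi) * math.sqrt(x * (1 - x)))
    checks.append((x, integral_of_squared_kernel(x, b), roughness(x, b), bound))
for row in checks:
    print(row)
```

In each row the second and third entries should agree up to quadrature error, and the third should not exceed the fourth.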

For sufficiently large n, the parameters

b_{j_{1}}, \dots, b_{j_{d}}

and

η_{j_{1}}, \dots, η_{j_{d}}

(

1 \leq j \leq m

) are no greater than

1 / 2

, and thus

\begin{matrix} σ^{2} & \leq & \prod_{j = 1}^{m} \sqrt{\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}}} C_{0} C_{1} {(\frac{3}{4} \sqrt{\frac{3}{π}})}^{d m} \\ & \leq & n \prod_{j = 1}^{m} \sqrt{\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}}} C_{0} C_{1} {(\frac{3}{4} \sqrt{\frac{3}{π}})}^{d m} \\ & \leq & \frac{ϕ_{n}^{2}}{\log (n)} C_{0} C_{1} {(\frac{3}{4} \sqrt{\frac{3}{π}})}^{d m} \\ & \leq & \frac{ϕ_{n}^{2}}{\log (n)} ρ^{2}, \end{matrix}

where

ρ^{2} : = C_{0} C_{1} {(\frac{3}{4} \sqrt{\frac{3}{π}})}^{d m}

. For any

ε > 0

and n sufficiently large, applying Bernstein’s inequality for U-statistics (see Lemma A2) yields

\begin{matrix} P (|u_{n, 3}^{(T)} (φ, \tilde{x}) - E (u_{n, 3}^{(T)} (φ, \tilde{x}))| > ε ρ ϕ_{n}) \\ \leq 2 \exp [- \frac{[n / m] ρ^{2} ϕ_{n}^{2} ε^{2}}{2 σ^{2} + \frac{2}{3} C_{H} ρ ε ϕ_{n}}] \\ \leq 2 \exp [- \frac{ε^{2} \log (n)}{2 \{1 + \frac{2}{3} {(\frac{9}{4 \sqrt{π}})}^{d m} \frac{ε ϕ_{n}^{1 - 1 / (1 + γ)}}{ρ}\}}] . \end{matrix}

Taking into account that

ϕ_{n} = o (1)

and

\frac{2}{3} {(\frac{9}{4 \sqrt{π}})}^{d m} \frac{ε ϕ_{n}^{1 - 1 / (1 + γ)}}{ρ} \leq 1

for sufficiently large n, it follows that

P (|u_{n, 3}^{(T)} (φ, \tilde{x}) - E (u_{n, 3}^{(T)} (φ, \tilde{x}))| > ε ρ ϕ_{n}) \leq 2 \exp [- \frac{ε^{2} \log (n)}{2 (1 + 1)}] = 2 n^{- \frac{ε^{2}}{4}} .

(14.1)

On the other hand, we have

\begin{matrix} P (sup_{\tilde{x} \in S_{X}^{m}} |u_{n, 3}^{(T)} (φ, \tilde{x}) - E (u_{n, 3}^{(T)} (φ, \tilde{x}))| > 2 ε ρ ϕ_{n}) \\ \leq P (sup_{\tilde{x} \in S_{X}^{m}} |u_{n, 3}^{(T)} (φ, \tilde{x}) - u_{n, 3}^{(T)} (φ, {\tilde{x}}_{ℓ (\tilde{x})}) \\ + E [u_{n, 3}^{(T)} (φ, {\tilde{x}}_{ℓ (\tilde{x})})] - E [u_{n, 3}^{(T)} (φ, \tilde{x})]| > ε ρ ϕ_{n}) \\ + P (sup_{\tilde{x} \in S_{X}^{m}} |u_{n, 3}^{(T)} (φ, {\tilde{x}}_{ℓ (\tilde{x})}) - E [u_{n, 3}^{(T)} (φ, {\tilde{x}}_{ℓ (\tilde{x})})]| > ε ρ ϕ_{n}) . \end{matrix}

(14.2)

We highlight that under the MAR assumption, the difference between the U-statistics evaluated at

\tilde{x}

and

{\tilde{x}}_{ℓ (\tilde{x})}

is bounded by

\begin{matrix} |u_{n, 3}^{(T)} (φ, \tilde{x}) - u_{n, 3}^{(T)} (φ, {\tilde{x}}_{ℓ (\tilde{x})})| \\ \leq \frac{(n - m)!}{n!} \sum_{i \in I (m, n)} |G_{φ, \tilde{x}, 3}^{(miss), (T)} (\tilde{X_{i}}, \tilde{Y_{i}}, {\tilde{δ}}_{i}) - G_{φ, {\tilde{x}}_{ℓ (\tilde{x})}, 3}^{(miss), (T)} (\tilde{X_{i}}, \tilde{Y_{i}}, {\tilde{δ}}_{i})| \\ \leq \frac{(n - m)!}{n!} \sum_{i \in I (m, n)} |φ^{(T)} (\tilde{Y_{i}})| (\prod_{j = 1}^{m} δ_{i_{j}}) |{\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})} ({\tilde{X}}_{i}) - {\tilde{K}}_{{\bar{Λ}}_{n, 3} ({\tilde{x}}_{ℓ (\tilde{x})})} ({\tilde{X}}_{i})| . \end{matrix}

The product of missingness indicators

\prod_{j = 1}^{m} δ_{i_{j}}

is bounded by 1 and does not affect the rate. Hence, the rate of

sup_{\tilde{x} \in A_{h}^{m}} |u_{n, 3}^{(T)} (φ, \tilde{x}) - u_{n, 3}^{(T)} (φ, {\tilde{x}}_{ℓ (\tilde{x})})|

is determined by

|φ^{(T)} (\tilde{Y_{i}})| |{\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})} ({\tilde{X}}_{i}) - {\tilde{K}}_{{\bar{Λ}}_{n, 3} ({\tilde{x}}_{ℓ (\tilde{x})})} ({\tilde{X}}_{i})|

. By the mean-value theorem, we have

|{\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})} ({\tilde{X}}_{i}) - {\tilde{K}}_{{\bar{Λ}}_{n, 3} ({\tilde{x}}_{ℓ (\tilde{x})})} ({\tilde{X}}_{i})| \leq sup_{(\tilde{x}, \tilde{u}) \in A_{h}^{m} \times {[0, 1]}^{d m}} ∥\nabla \{{\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})} (u)\}∥ sup_{\tilde{x} \in A_{h}^{m}} ∥\tilde{x} - {\tilde{x}}_{ℓ (\tilde{x})}∥,

where the supremum dominates the gradient evaluated at the intermediate point on the segment joining

\tilde{x}

and

{\tilde{x}}_{ℓ (\tilde{x})}

. For

k = 1, \dots, m

, observe that

\begin{matrix} |\frac{\partial {\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})} (u)}{\partial x_{k}}| & \leq & \{\prod_{j = 1, j \neq k}^{m} K_{Λ_{n, 3} (x_{j})} (u_{j})\} |\frac{\partial K_{Λ_{n, 3} (x_{k})} (u_{k})}{\partial x_{k}}| \\ \leq & \{\prod_{j = 1, j \neq k}^{m} (\prod_{i = 1}^{d} K_{{\overset{˘}{α}}_{j_{i}}, {\overset{˘}{β}}_{j_{i}}} (u_{j_{i}}))\} |\frac{\partial K_{Λ_{n, 3} (x_{k})} (u_{k})}{\partial x_{k}}| \\ \leq & \prod_{j = 1, j \neq k}^{m} O \{{(\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})}^{- \frac{1}{2}}\} |\frac{\partial K_{Λ_{n, 3} (x_{k})} (u_{k})}{\partial x_{k}}|, \end{matrix}

where, by Lemmas 5 and 6, for

ℓ = 1, \dots, d

, we have

|\frac{\partial K_{Λ_{n, 3} (x_{k})} (u_{k})}{\partial x_{k_{ℓ}}}| \leq \{\prod_{i = 1, i \neq ℓ}^{d} K_{{\overset{˘}{α}}_{k_{i}}, {\overset{˘}{β}}_{k_{i}}} (u_{k_{i}})\} |\frac{\partial K_{{\overset{˘}{α}}_{k_{ℓ}}, {\overset{˘}{β}}_{k_{ℓ}}} (u_{k_{ℓ}})}{\partial x_{k_{ℓ}}}| = O \{{(\prod_{i = 1}^{d} b_{k_{i}} η_{k_{i}})}^{- \frac{1}{2}} \frac{1}{b_{k_{ℓ}}^{2}}\},

uniformly on

(\tilde{x}, \tilde{u}) \in A_{h}^{m} \times {[0, 1]}^{d m}

and

\begin{matrix} \prod_{j = 1, j \neq k}^{m} K_{Λ_{n, 3} (x_{j})} (u_{j}) & = & \prod_{j = 1, j \neq k}^{m} \{\prod_{i = 1}^{d} K_{{\overset{˘}{α}}_{j_{i}}, {\overset{˘}{β}}_{j_{i}}} (u_{j_{i}})\} \\ \leq & \prod_{j = 1, j \neq k}^{m} O \{{(\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})}^{- \frac{1}{2}}\}, \end{matrix}

which implies

sup_{(\tilde{x}, \tilde{u}) \in A_{h}^{m} \times {[0, 1]}^{d m}} ∥\nabla \{{\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})} (\tilde{u})\}∥ = O \{\prod_{j = 1}^{m} \{{(\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})}^{- \frac{1}{2}}\} (\sum_{j = 1}^{m} \sum_{i = 1}^{d} \frac{1}{b_{j_{i}}^{2}})\} .

(14.3)

Using the fact that

sup_{\tilde{x} \in A_{h}^{m}} ∥\tilde{x} - {\tilde{x}}_{ℓ (\tilde{x})}∥ = O (N_{n}^{- 1})

, it follows that

\begin{matrix} |φ^{(T)} (\tilde{Y_{i}})| |{\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})} ({\tilde{X}}_{i}) - {\tilde{K}}_{{\bar{Λ}}_{n, 3} ({\tilde{x}}_{ℓ (\tilde{x})})} ({\tilde{X}}_{i})| & \leq & O \{ω_{n, 3} N_{n}^{- 1} \prod_{j = 1}^{m} \{{(\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})}^{- \frac{1}{2}}\} (\sum_{j = 1}^{m} \sum_{i = 1}^{d} \frac{1}{b_{j_{i}}^{2}})\} \\ & = & O (ϕ_{n}), \end{matrix}

(14.4)

uniformly on

(\tilde{x}, \tilde{u}) \in A_{h}^{m} \times {[0, 1]}^{d m}

. Next, making use of (14.4), we have

\begin{matrix} |E [u_{n, 3}^{(T)} (φ, {\tilde{x}}_{ℓ (\tilde{x})})] - E [u_{n, 3}^{(T)} (φ, \tilde{x})]| \end{matrix}

\begin{matrix} = & |E [u_{n, 3}^{(T)} (φ, {\tilde{x}}_{ℓ (\tilde{x})}) - u_{n, 3}^{(T)} (φ, \tilde{x})]| \end{matrix}

(14.5)

\begin{matrix} \leq & E |[u_{n, 3}^{(T)} (φ, {\tilde{x}}_{ℓ (\tilde{x})}) - u_{n, 3}^{(T)} (φ, \tilde{x})]| . \end{matrix}

(14.6)

Just as in the bounded scenario, the progression from (14.5) to (14.6) arises from Jensen’s inequality and the properties of the absolute value function. We can deduce that

sup_{\tilde{x} \in S_{X}^{m}} |E [u_{n, 3}^{(T)} (φ, {\tilde{x}}_{ℓ (\tilde{x})})] - E [u_{n, 3}^{(T)} (φ, \tilde{x})]| = O (ϕ_{n}) .

For sufficiently large n and each

m \geq 2

, for some

ε > 0

, we infer that

\begin{matrix} P (sup_{\tilde{x} \in S_{X}^{m}} |u_{n, 3}^{(T)} (φ, \tilde{x}) - u_{n, 3}^{(T)} (φ, {\tilde{x}}_{ℓ (\tilde{x})}) \\ + E [u_{n, 3}^{(T)} (φ, {\tilde{x}}_{ℓ (\tilde{x})})] - E [u_{n, 3}^{(T)} (φ, \tilde{x})]| > ε ρ ϕ_{n}) = 0 . \end{matrix}

Continuing now with (14.2), by imposing that the kernel function

G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)}

is symmetric, the U-statistic is decomposed according to the Hoeffding decomposition [2]:

\begin{matrix} u_{n, 3}^{(T)} (φ, {\tilde{x}}_{ℓ (\tilde{x})}) - E [u_{n, 3}^{(T)} (φ, {\tilde{x}}_{ℓ (\tilde{x})})] \\ = \sum_{q = 1}^{m} \frac{m!}{(m - q)!} u_{n, 3}^{(q)} (π_{q, m} (G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)})) \\ = m u_{n, 3}^{(1)} (π_{1, m} (G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)})) + \sum_{q = 2}^{m} \frac{m!}{(m - q)!} u_{n, 3}^{(q)} (π_{q, m} (G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)})) . \end{matrix}

(14.7)

Let us first consider the linear term. We have

m u_{n, 3}^{(1)} (π_{1, m} (G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)})) = \frac{m}{n} \sum_{i = 1}^{n} π_{1, m} (G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)}) ({\tilde{X}}_{i}, {\tilde{Y}}_{i}, {\tilde{δ}}_{i}) .

From Hoeffding’s projection (14.7), we have

\begin{matrix} π_{1, m} (G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)}) (x, y, δ) & = & \{E [G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)} ((x, X_{2}, \dots, X_{m}), (y, Y_{2}, \dots, Y_{m}), (δ, δ_{2}, \dots, δ_{m}))] \\ - E [G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)} (\tilde{X}, \tilde{Y}, \tilde{δ})]\} \\ = & \{E [G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)} (\tilde{X}, \tilde{Y}, \tilde{δ}) ∣ (X_{1}, Y_{1}, δ_{1}) = (x, y, δ)] - E [G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)} (\tilde{X}, \tilde{Y}, \tilde{δ})]\} . \end{matrix}
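The structure of the Hoeffding decomposition is easiest to see for a kernel of order m = 2. The sketch below is a toy illustration under assumptions made here: X is uniform on a small finite set, the kernel is the variance kernel h(x, y) = (x - y)²/2, and the decomposition is written with the standard binomial weights, so its normalization may differ from the factorial weights in (14.7). The first-order projection is the conditional expectation of the kernel given one argument, centered at θ, and the second-order term is degenerate.

```python
from itertools import combinations

support = [0, 1, 2, 3]                      # X uniform on this finite set (hypothetical)

def h(x, y):                                # symmetric kernel of order m = 2
    return 0.5 * (x - y) ** 2

theta = sum(h(x, y) for x in support for y in support) / len(support) ** 2

def pi1(x):                                 # first-order projection E[h(x, X2)] - theta
    return sum(h(x, y) for y in support) / len(support) - theta

def pi2(x, y):                              # second-order (degenerate) projection
    return h(x, y) - pi1(x) - pi1(y) - theta

sample = [3, 0, 2, 2, 1, 3, 0, 1, 1, 2]      # any realization works: the identity is algebraic
n = len(sample)
pairs = list(combinations(sample, 2))
u_stat = sum(h(x, y) for x, y in pairs) / len(pairs)
linear = 2 * sum(pi1(x) for x in sample) / n          # m times the average of pi1
degenerate = sum(pi2(x, y) for x, y in pairs) / len(pairs)
print(u_stat - theta, linear + degenerate)
```

The linear part is an average of independent centered variables, which is why a classical Bernstein inequality applies to it, while the degenerate part requires the separate exponential bound of Lemma A3.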

Set

Z_{i}^{(T)} = π_{1, m} (G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)}) ({\tilde{X}}_{i}, {\tilde{Y}}_{i}, {\tilde{δ}}_{i}) .

It is evident that the

Z_{i}^{(T)}

are independent and identically distributed random variables with mean zero, whose variance satisfies

σ^{2} \leq \frac{ϕ_{n}^{2}}{\log (n)} ρ^{2} .

Making use of (14.1) and an application of Bernstein’s inequality, for some

ε > 0

, yields

\begin{matrix} P (sup_{\tilde{x} \in S_{X}^{m}} |u_{n, 3}^{(1)} (π_{1, m} (G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)}))| > ε ρ ϕ_{n}) \\ \leq \sum_{i = 1}^{N_{n}^{d}} P (max_{1 \leq ℓ_{i} \leq N_{n}^{d}} |u_{n, 3}^{(1)} (π_{1, m} (G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)}))| > ε ρ ϕ_{n}) \\ \leq N_{n}^{d} max_{1 \leq ℓ_{i} \leq N_{n}^{d}} P (|u_{n, 3}^{(1)} (π_{1, m} (G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)}))| > ε ρ ϕ_{n}) \\ = O (N_{n}^{d} n^{- \frac{ε^{2}}{4}}) . \end{matrix}

(14.8)

Turning to the nonlinear term, we will prove that for

2 \leq q \leq m

:

sup_{\tilde{x} \in S_{X}^{m}} \frac{(\binom{m}{q}) |u_{n, 3}^{(q)} (π_{q, m} G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)})|}{ϕ_{n}} = o_{P} (1),

which implies that, for

1 \leq i \leq m

and

ℓ = (ℓ_{1}, \dots, ℓ_{m})

:

max_{1 \leq ℓ_{i} \leq N_{n}^{d}} \frac{(\binom{m}{q}) |u_{n}^{(q)} (π_{q, m} G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)})|}{ϕ_{n}} = o_{P} (1) .

To prove the above-mentioned equation, we need to apply Proposition 1 of [187] (see Lemma A3). We can see that

G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)}

is bounded by

{(\frac{9}{4 \sqrt{π}})}^{d m} \frac{ϕ_{n}^{2 - 1 / (1 + γ)}}{\log (n)}

, hence for

ε > 0

, we have

\begin{matrix} P (n^{1 / 2} |\sum_{q = 2}^{m} \frac{m!}{(m - q)!} u_{n, 3}^{(q)} (π_{q, m} (G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)}))| > ε ρ ϕ_{n}) \\ = P (|\sum_{q = 2}^{m} \frac{m!}{(m - q)!} u_{n, 3}^{(q)} (π_{q, m} (G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)}))| > n^{- 1 / 2} ε ρ ϕ_{n}) \\ = P (|\sum_{q = 2}^{m} \frac{m!}{(m - q)!} u_{n, 3}^{(q)} (π_{q, m} (G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)}))| > ε_{0} ρ ϕ_{n}), \end{matrix}

where

ε_{0} = \frac{ε}{\sqrt{n}}

. Now for

t = ε ρ ϕ_{n}

, Lemma A3 gives

\begin{matrix} P (|\sum_{q = 2}^{m} \frac{m!}{(m - q)!} u_{n, 3}^{(q)} (π_{q, m} (G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)}))| > ε_{0} ρ ϕ_{n}) \\ \leq 2 \exp (- \frac{t {(n - 1)}^{1 / 2}}{2^{m + 2} m^{m + 1} \frac{1}{2} C_{H}}) \\ \leq 2 \exp (- \frac{ε ρ ϕ_{n} {(n - 1)}^{1 / 2}}{2^{m + 2} m^{m + 1} \frac{1}{2} C_{H}}) \\ \leq 2 \exp (- \frac{ε {(n - 1)}^{1 / 2} \log (n)}{2^{m + 2} m^{m + 1} {(\frac{1}{\sqrt{3}})}^{d m} ϕ_{n}^{1 - 1 / (1 + γ)}}) . \end{matrix}

By the last result, it follows that there exists

ε > 0

such that

\begin{matrix} P (|\sum_{q = 2}^{m} (\binom{m}{q}) u_{n, 3}^{(q)} (π_{q, m} (G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)}))| > ε_{0} ρ ϕ_{n}) \leq n^{- ε_{0} / 2 C_{6}}, \end{matrix}

where

C_{6} = 2^{m + 2} m^{m + 1} {(\frac{1}{\sqrt{3}})}^{d m} ϕ_{n}^{1 - 1 / (1 + γ)} .

Therefore, for each

ε_{0} > 0

,

1 \leq i \leq m

and

ℓ = (ℓ_{1}, \dots, ℓ_{m})

, we infer that

\begin{matrix} P (sup_{\tilde{x} \in S_{X}^{m}} |\sum_{q = 2}^{m} (\binom{m}{q}) u_{n, 3}^{(q)} (π_{q, m} (G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)}))| > ε_{0} ρ ϕ_{n}) \\ \leq N_{n}^{d} max_{1 \leq ℓ_{i} \leq N_{n}^{d}} P (|\sum_{q = 2}^{m} (\binom{m}{q}) u_{n, 3}^{(q)} (π_{q, m} (G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)}))| > ε_{0} ρ ϕ_{n}) \\ \leq N_{n}^{d} n^{- m (ε_{0} / 2 C_{6})} . \end{matrix}

(14.9)

By combining (14.8) and (14.9), for some

ε > 0

, it follows that

P (sup_{\tilde{x} \in S_{X}^{m}} |u_{n, 3}^{(T)} (φ, {\tilde{x}}_{ℓ (\tilde{x})}) - E [u_{n, 3}^{(T)} (φ, {\tilde{x}}_{ℓ (\tilde{x})})]| > ε ρ ϕ_{n}) = O (N_{n}^{d} n^{- m ε^{2} / 4}),

(14.10)

which implies for

ε = 2 \sqrt{5 d}

, as

n \to \infty

,

\begin{matrix} N_{n}^{d} n^{- m ε^{2} / 4} = ϕ_{n}^{- d (1 + \frac{1}{1 + γ})} {(\prod_{j = 1}^{m} (\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}}))}^{- \frac{d}{2}} {(\sum_{j = 1}^{m} \sum_{i = 1}^{d} \frac{1}{b_{j_{i}}^{2}})}^{d} n^{- 5 m d} \\ = {[{(\log n)}^{- 5 m} ϕ_{n}^{10 m - (1 + \frac{1}{1 + γ})} {(\prod_{j = 1}^{m} (\prod_{i = 1}^{d} η_{j_{i}}))}^{- \frac{1}{2}} (\sum_{j = 1}^{m} \sum_{i = 1}^{d} {(\prod_{k = 1, k \neq j}^{m} (\prod_{ℓ = 1, ℓ \neq i}^{d} b_{j_{i}}))}^{\frac{5 m - 1}{2}} b_{j_{i}}^{5 (m - 1) / 2})]}^{d} \to 0 . \end{matrix}

The remainder part

u_{n, 3}^{(R)} (φ, \tilde{x})

is treated similarly using condition (C.3) and the MAR assumption, yielding a negligible contribution. Consequently, we have established the desired uniform convergence rate for

u_{n, 3} (φ, \tilde{x})

under the MAR assumption. This completes the proof of Theorem 10. □

Remark 28.

The adaptation to the missing data setting required careful incorporation of the missingness indicators

\prod_{j = 1}^{m} δ_{i_{j}}

into the kernel

G_{φ, \tilde{x}, 3}^{(miss), (T)}

. The MAR assumption enters critically when taking expectations, introducing the factor

\prod_{j = 1}^{m} p (u_{j})

in the variance bound. However, since

0 < p (u_{j}) \leq 1

under the positivity condition, this factor does not affect the asymptotic rates—it merely reduces the effective sample size by a constant factor, which is absorbed into the constants

ρ^{2}

and

C_{H}

. The truncation threshold

ω_{n, 3}

and the rate

ϕ_{n}

remain unchanged from the complete-data case, demonstrating the robustness of the complete-case U-statistic methodology under MAR.
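The robustness described in this remark can be illustrated by a small simulation in the simplest case m = 1 and d = 1, where the complete-case estimator reduces to a ratio of δ-weighted beta-kernel sums. Everything in the sketch below is a hypothetical setup chosen for illustration: the regression function, the propensity function, the noise level, the sample size and the bandwidth are assumptions, not quantities from the paper.

```python
import math
import random

def beta_kernel(u, x, b):
    """Density of Beta(x/b + 1, (1 - x)/b + 1) evaluated at u in (0, 1)."""
    a, c = x / b, (1.0 - x) / b
    log_norm = math.lgamma(a + 1.0) + math.lgamma(c + 1.0) - math.lgamma(a + c + 2.0)
    return math.exp(a * math.log(u) + c * math.log(1.0 - u) - log_norm)

def complete_case_ratio(data, x0, b):
    """Ratio of delta-weighted kernel sums: the m = 1 complete-case estimator."""
    num = den = 0.0
    for x, y, delta in data:
        if delta:                               # Y is used only when it is observed
            w = beta_kernel(x, x0, b)
            num += y * w
            den += w
    return num / den

def regression(x):                              # hypothetical E[Y | X = x]
    return 1.0 + 2.0 * x

def propensity(x):                              # hypothetical P(delta = 1 | X = x) >= 0.5
    return 0.5 + 0.4 * x

random.seed(0)
data = []
for _ in range(20000):
    x = min(max(random.random(), 1e-12), 1.0 - 1e-12)   # keep X strictly inside (0, 1)
    y = regression(x) + random.gauss(0.0, 0.5)
    delta = random.random() < propensity(x)     # MAR: missingness depends on X only
    data.append((x, y, delta))

x0 = 0.4
estimate = complete_case_ratio(data, x0, b=0.02)
print(estimate, regression(x0))
```

Although roughly a third of the responses are discarded and the propensity varies with x, the ratio still targets the regression function, because the propensity enters the numerator and the denominator in the same way.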

Remainder Part

Notice that under the MAR assumption (2.4) with the positivity condition

{inf}_{x \in S_{d, 1}} p (x) \geq c > 0

, the complete-case remainder kernel incorporates the missingness indicators as

G_{φ, \tilde{x}, 3}^{(miss), (R)} (\tilde{X}, \tilde{Y}, \tilde{δ}) = φ^{(R)} (\tilde{Y}) (\prod_{j = 1}^{m} δ_{j}) {\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})} (\tilde{X})

. For notational brevity, we write

G_{φ, \tilde{x}, 3}^{(R)}

with the implicit understanding that the missingness indicators are included. Then,

\begin{matrix} u_{n, 3}^{(R)} (φ, \tilde{x}) - E [u_{n, 3}^{(R)} (φ, \tilde{x})] \\ = \frac{(n - m)!}{n!} \sum_{i \in I (m, n)} \{G_{φ, \tilde{x}, 3}^{(R)} (\tilde{X_{i}}, \tilde{Y_{i}}, {\tilde{δ}}_{i}) - E [G_{φ, \tilde{x}, 3}^{(R)} (\tilde{X_{i}}, \tilde{Y_{i}}, {\tilde{δ}}_{i})]\} . \end{matrix}

Now, using the fact that for

|φ (\tilde{Y_{i}})| > ω_{n, 3}

, we have

{(|φ (\tilde{Y_{i}})| / ω_{n, 3})}^{1 + γ} > 1

, which implies that

\begin{matrix} |E [u_{n, 3}^{(R)} (φ, \tilde{x})]| & \leq & E [|φ ({\tilde{Y}}_{i})| 1_{\{|φ ({\tilde{Y}}_{i})| > ω_{n, 3}\}} (\prod_{j = 1}^{m} δ_{i_{j}}) {\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})} ({\tilde{X}}_{i})] \\ \leq & E [|φ ({\tilde{Y}}_{i})| {(\frac{|φ (\tilde{Y_{i}})|}{ω_{n, 3}})}^{1 + γ} 1_{\{|φ ({\tilde{Y}}_{i})| > ω_{n, 3}\}} (\prod_{j = 1}^{m} δ_{i_{j}}) {\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})} ({\tilde{X}}_{i})] \\ \leq & ω_{n, 3}^{- (1 + γ)} E [{|φ ({\tilde{Y}}_{i})|}^{2 + γ} (\prod_{j = 1}^{m} δ_{i_{j}}) {\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})} ({\tilde{X}}_{i})], \end{matrix}

(14.11)

where, by the MAR assumption, we have

E [\prod_{j = 1}^{m} δ_{i_{j}} ∣ {\tilde{X}}_{i} = \tilde{u}] = \prod_{j = 1}^{m} p (u_{j})

. Moreover,

{\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})} (\cdot)

is the joint density function of the

d m

independent beta random variables collected in

θ_{\tilde{x}} : = (θ_{x_{1}}, \dots, θ_{x_{m}}) \in {[0, 1]}^{d m}

. Consequently,

\begin{matrix} E [{|φ ({\tilde{Y}}_{i})|}^{2 + γ} (\prod_{j = 1}^{m} δ_{i_{j}}) {\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})} ({\tilde{X}}_{i})] \\ = E \{E ({|φ ({\tilde{Y}}_{i})|}^{2 + γ} ∣ {\tilde{X}}_{i}) (\prod_{j = 1}^{m} p (X_{i_{j}})) {\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})} ({\tilde{X}}_{i})\} \\ = \int_{{[0, 1]}^{d m}} E ({|φ (\tilde{Y})|}^{2 + γ} ∣ \tilde{X} = \tilde{u}) (\prod_{j = 1}^{m} p (u_{j})) \tilde{f} (\tilde{u}) {\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})} (\tilde{u}) d \tilde{u} \leq C_{1}, \end{matrix}

(14.12)

where the final inequality follows from condition (C.3) (which gives

E (| φ (\tilde{Y}) |^{2 + γ} ∣ \tilde{X} = \tilde{u}) \tilde{f} (\tilde{u}) \leq C_{1}

), the fact that

\prod_{j = 1}^{m} p (u_{j}) \leq 1

under the positivity condition, and the property that

\int \tilde{K} = 1

. Hence, by the definition of

ω_{n, 3} = ϕ_{n}^{- 1 / (1 + γ)}

, we have

|E [u_{n, 3}^{(R)} (φ, \tilde{x})]| \leq O (ϕ_{n})

uniformly on

\tilde{x} \in S_{X}^{m}

. Consequently, Markov’s inequality gives us

sup_{\tilde{x} \in S_{X}^{m}} |u_{n, 3}^{(R)} (φ, \tilde{x}) - E [u_{n, 3}^{(R)} (φ, \tilde{x})]| = O_{P} (ϕ_{n}) .

(14.13)

A more refined argument employing the Borel–Cantelli lemma (see the proof of Theorem 2) actually yields the stronger almost sure convergence at the same rate, given the exponential bounds available for the truncated part and the moment condition for the remainder. Hence, the proof is complete.
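The role of the truncation level in the remainder bound rests on two elementary facts, which the following sketch checks numerically for hypothetical values of ϕ_n and of the moment exponent γ. First, on the event where |φ| exceeds ω, the quantity |φ| is dominated by ω^{-(1+γ)} |φ|^{2+γ}. Second, with ω = ϕ_n^{-1/(1+γ)}, the prefactor ω^{-(1+γ)} is exactly ϕ_n, so the bounded (2+γ)-th conditional moment yields a remainder of order ϕ_n.

```python
import math

def truncation_threshold(phi_n, gamma):
    """omega_{n,3} = phi_n^{-1/(1+gamma)}."""
    return phi_n ** (-1.0 / (1.0 + gamma))

phi_n, gamma = 0.05, 0.5                     # hypothetical rate and moment exponent
omega = truncation_threshold(phi_n, gamma)
prefactor = omega ** (-(1.0 + gamma))        # equals phi_n by construction

# Pointwise inequality behind (14.11): |v| 1{|v| > omega} <= omega^{-(1+gamma)} |v|^{2+gamma}.
values = [0.0, 0.5 * omega, omega, 1.01 * omega, 3.0 * omega, 50.0 * omega]
lhs = [abs(v) if abs(v) > omega else 0.0 for v in values]
rhs = [prefactor * abs(v) ** (2.0 + gamma) for v in values]
dominated = all(l <= r for l, r in zip(lhs, rhs))
print(prefactor, phi_n, dominated)
```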


Proof of Theorem 11.

The demonstration of Theorem 11 proceeds by a meticulous application of the classical decomposition (11.2) in conjunction with the uniform convergence rates established in Theorem 10 for the underlying complete-case conditional U-statistics

u_{n, 3} (φ, \tilde{x})

under the MAR assumption (2.4) with the positivity condition

{inf}_{x \in S_{d, 1}} p (x) \geq c > 0

. Throughout this proof, all U-statistics are understood to be the complete-case versions incorporating the missingness indicators

δ_{i}

as defined in (3.5); for notational brevity, we write

u_{n, 3} (φ, \tilde{x})

in place of

u_{n, 3}^{(miss)} (φ, \tilde{x})

, with the implicit understanding that the product

\prod_{j = 1}^{m} δ_{i_{j}}

is included in the kernel.

Recall that the estimator of interest admits the representation

{\hat{r}}_{n, 3}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, 3} (\tilde{x})) = u_{n, 3} (φ, \tilde{x}) / u_{n, 3} (1, \tilde{x})

, and its Hájek-type centering is defined as

\hat{E} [\cdot] = E [u_{n, 3} (φ, \tilde{x})] / E [u_{n, 3} (1, \tilde{x})] .

From the elementary identity

\frac{a}{b} - \frac{α}{β} = \frac{a - α}{b} - \frac{α (b - β)}{b β},

valid for

b, β \neq 0

, we obtain the fundamental decomposition

|{\hat{r}}_{n, 3}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, 3} (\tilde{x})) - \hat{E} [{\hat{r}}_{n, 3}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, 3} (\tilde{x}))]| \leq I_{3, 1} (\tilde{x}) + I_{3, 2} (\tilde{x}),

where the stochastic components are defined as

I_{3, 1} (\tilde{x}) : = \frac{|u_{n, 3} (φ, \tilde{x}) - E [u_{n, 3} (φ, \tilde{x})]|}{|u_{n, 3} (1, \tilde{x})|},

I_{3, 2} (\tilde{x}) : = \frac{|E [u_{n, 3} (φ, \tilde{x})]| \cdot |u_{n, 3} (1, \tilde{x}) - E [u_{n, 3} (1, \tilde{x})]|}{|u_{n, 3} (1, \tilde{x})| \cdot |E [u_{n, 3} (1, \tilde{x})]|} .
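The elementary identity and the resulting bound by the two stochastic components can be verified with exact rational arithmetic. In the sketch below, the function name `ratio_split` and the numerical values are hypothetical: (a, b) stand for the numerator and denominator U-statistics, and (α, β) for their expectations.

```python
from fractions import Fraction

def ratio_split(a, b, alpha, beta):
    """Return the two terms of a/b - alpha/beta = (a - alpha)/b - alpha (b - beta)/(b beta)."""
    return (a - alpha) / b, -alpha * (b - beta) / (b * beta)

# Hypothetical values: (a, b) play the role of u_n(phi), u_n(1); (alpha, beta) of their means.
a, b = Fraction(7, 5), Fraction(9, 10)
alpha, beta = Fraction(3, 2), Fraction(1)
first, second = ratio_split(a, b, alpha, beta)
difference = a / b - alpha / beta
i1 = abs(a - alpha) / abs(b)                                   # analogue of I_{3,1}
i2 = abs(alpha) * abs(b - beta) / (abs(b) * abs(beta))         # analogue of I_{3,2}
print(difference, first + second, i1 + i2)
```

The triangle inequality applied to the two terms returned by `ratio_split` gives the bound by the sum of the two components, which is why lower bounds on the denominator and on its expectation are the only additional ingredients needed.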

The crux of the proof lies in establishing uniform (in

\tilde{x}

) almost sure bounds for both

I_{3, 1}

and

I_{3, 2}

, which then translate directly into the desired rate for the ratio estimator. To this end, we first establish uniform lower bounds for the denominator

u_{n, 3} (1, \tilde{x})

and its expectation under the MAR assumption. Under the MAR assumption and the positivity condition, together with the bias expansion established in Theorem 4 (adapted to the current kernel setting), we have the following uniform convergence results. For the denominator

u_{n, 3} (1, \tilde{x})

, Theorem 10 with

φ \equiv 1

furnishes the uniform almost sure convergence rate

sup_{\tilde{x} \in S_{X}^{m}} |u_{n, 3} (1, \tilde{x}) - E [u_{n, 3} (1, \tilde{x})]| = O (ϕ_{n}) \quad \text{a.s.},

where

ϕ_{n} = \sqrt{(\log n) / n} {(\prod_{j = 1}^{m} (\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}}))}^{- 1 / 2}

is the rate defined in Theorem 10. Moreover, from the bias expansion (see Proposition 3 and its extension to the current kernel), we have

E [u_{n, 3} (1, \tilde{x})] = \prod_{j = 1}^{m} p (x_{j}) \tilde{f} (\tilde{x}) + O (ϑ^{- m / 2}),

uniformly in

\tilde{x} \in S_{X}^{m}

. Since

\tilde{f} (\tilde{x})

is continuous and strictly positive on the compact set

S_{X}^{m}

(by condition (C.2) and the compactness of the domain), and since

p (x_{j}) \geq c > 0

by the positivity condition, there exists a constant

c_{2} > 0

such that

inf_{\tilde{x} \in S_{X}^{m}} |E [u_{n, 3} (1, \tilde{x})]| \geq c_{2} > 0 for all sufficiently large n .

Furthermore, the uniform convergence of

u_{n, 3} (1, \tilde{x})

to its expectation, guaranteed by Theorem 10, implies that for sufficiently large n,

inf_{\tilde{x} \in S_{X}^{m}} |u_{n, 3} (1, \tilde{x})| \geq \frac{c_{2}}{2} > 0 almost surely,

since the deviation

| u_{n, 3} (1, \tilde{x}) - E [u_{n, 3} (1, \tilde{x})] |

is of order

o (1)

uniformly. Define

c_{1} : = c_{2} / 2

; then, almost surely,

inf_{\tilde{x} \in S_{X}^{m}} |u_{n, 3} (1, \tilde{x})| \geq c_{1} > 0 .

For the numerator expectation, the bias expansion together with condition (C.3) gives the uniform boundedness

sup_{\tilde{x} \in S_{X}^{m}} |E [u_{n, 3} (φ, \tilde{x})]| = O (1),

so there exists a constant

C_{φ} < \infty

such that

{sup}_{\tilde{x}} | E [u_{n, 3} (φ, \tilde{x})] | \leq C_{φ}

. With these preparatory estimates in hand, we now bound each term. For

I_{3, 1} (\tilde{x})

, using the lower bound

| u_{n, 3} (1, \tilde{x}) | \geq c_{1}

almost surely, we obtain

sup_{\tilde{x} \in S_{X}^{m}} \frac{I_{3, 1} (\tilde{x})}{ϕ_{n}} \leq \frac{1}{c_{1}} sup_{\tilde{x} \in S_{X}^{m}} \frac{|u_{n, 3} (φ, \tilde{x}) - E [u_{n, 3} (φ, \tilde{x})]|}{ϕ_{n}} = O (1) a . s .,

where the final equality follows directly from the uniform rate provided by Theorem 10 applied to the kernel

φ

. For

I_{3, 2} (\tilde{x})

, we similarly apply the lower bounds

| u_{n, 3} (1, \tilde{x}) | \geq c_{1}

and

| E [u_{n, 3} (1, \tilde{x})] | \geq c_{2}

, together with the boundedness of

E [u_{n, 3} (φ, \tilde{x})]

, to obtain

\begin{matrix} sup_{\tilde{x} \in S_{X}^{m}} \frac{I_{3, 2} (\tilde{x})}{ϕ_{n}} & \leq \frac{{sup}_{\tilde{x}} | E [u_{n, 3} (φ, \tilde{x})] |}{c_{1} c_{2}} \cdot sup_{\tilde{x} \in S_{X}^{m}} \frac{|u_{n, 3} (1, \tilde{x}) - E [u_{n, 3} (1, \tilde{x})]|}{ϕ_{n}} \end{matrix}

(14.17)

\begin{matrix} \leq \frac{C_{φ}}{c_{1} c_{2}} \cdot O (1) = O (1) a . s ., \end{matrix}

(14.18)

where the rate for

u_{n, 3} (1, \tilde{x})

again follows from Theorem 10 with

φ \equiv 1

.

Summing the two contributions, we conclude that, almost surely,

sup_{\tilde{x} \in S_{X}^{m}} \frac{|{\hat{r}}_{n, 3}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, 3} (\tilde{x})) - \hat{E} [{\hat{r}}_{n, 3}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, 3} (\tilde{x}))]|}{ϕ_{n}} = O (1),

which is equivalent to the statement

sup_{\tilde{x} \in S_{X}^{m}} |{\hat{r}}_{n, 3}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, 3} (\tilde{x})) - \hat{E} [{\hat{r}}_{n, 3}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, 3} (\tilde{x}))]| = O (ϕ_{n}) a . s .

This completes the proof of Theorem 11. □

Remark 29.

Several subtle points in the proof merit explicit commentary. First, the uniform almost sure bounds for

u_{n, 3} (φ, \tilde{x})

and

u_{n, 3} (1, \tilde{x})

are not merely of order

O_{P} (ϕ_{n})

but hold with probability one; this strengthening is essential for the subsequent manipulation of the ratio, as it allows us to treat the denominator as uniformly bounded away from zero in an almost sure sense. Second, the constants

c_{1}

and

c_{2}

depend implicitly on the propensity score

p (\cdot)

and the density

f (\cdot)

through the bias expansion, but crucially not on the sample size n or the bandwidth parameter; this uniformity is guaranteed by the compactness of

S_{X}^{m}

and the continuity of p and f. Third, while the MAR assumption introduces the factor

\prod_{j = 1}^{m} p (x_{j})

into the expectation

E [u_{n, 3} (1, \tilde{x})]

, this factor cancels completely in the ratio

\hat{E} [{\hat{r}}_{n, 3}^{(m)}] = E [u_{n, 3} (φ, \tilde{x})] / E [u_{n, 3} (1, \tilde{x})]

, as the same factor appears in both numerator and denominator. This cancellation is the mathematical manifestation of the well-known property that complete-case estimators under MAR retain the same bias expansion as their complete-data counterparts, up to higher-order terms that are asymptotically negligible. Consequently, the rate

ϕ_{n}

remains unchanged from the complete-data case, although the asymptotic variance is inflated by the factor

1 / p (x_{j})

(a phenomenon that affects the constant in the central limit theorem but not the rate of convergence). Finally, the application of Theorem 10 is justified under the same regularity conditions as in the previous sections, with the bandwidth parameters

b_{j_{i}}

and

η_{j_{i}}

satisfying the appropriate decay conditions.

Proof of Theorem 12.

The demonstration of Theorem 12 proceeds under the MAR assumption (2.4) with the positivity condition

{inf}_{x \in S_{d, 1}} p (x) \geq c > 0

. Throughout this proof, all U-statistics are understood to be the complete-case versions incorporating the missingness indicators

δ_{i}

as defined in (3.5); for notational brevity, we write

u_{n, 3} (φ, \tilde{x})

in place of

u_{n, 3}^{(miss)} (φ, \tilde{x})

, with the implicit understanding that the product

\prod_{j = 1}^{m} δ_{i_{j}}

is included in the kernel. To obtain the desired results, we need to prove that:

\begin{matrix} sup_{\tilde{x} \in S_{X}^{m}} |E \{u_{n, 3} (φ, \tilde{x})\} - R (φ, \tilde{x}) \prod_{j = 1}^{m} p (x_{j})| = O (\sum_{j = 1}^{m} \sum_{i = 1}^{d} b_{j_{i}}) . \end{matrix}

We first remark that under the MAR assumption, the expectation of the complete-case U-statistic takes the form

\begin{matrix} E [u_{n, 3} (φ, \tilde{x})] & = & \frac{(n - m)!}{n!} \sum_{i \in I (m, n)} E [G_{φ, \tilde{x}, 3}^{(miss)} ({\tilde{X}}_{i}, {\tilde{Y}}_{i}, {\tilde{δ}}_{i})] \\ = & E [φ ({\tilde{Y}}_{i}) (\prod_{j = 1}^{m} δ_{i_{j}}) {\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})} ({\tilde{X}}_{i})] \\ = & E [E [φ ({\tilde{Y}}_{i}) (\prod_{j = 1}^{m} δ_{i_{j}}) ∣ {\tilde{X}}_{i}] {\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})} ({\tilde{X}}_{i})] \\ = & E [(\prod_{j = 1}^{m} p (X_{i_{j}})) E [φ ({\tilde{Y}}_{i}) ∣ {\tilde{X}}_{i}] {\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})} ({\tilde{X}}_{i})] (by MAR) \\ = & \int_{{[0, 1]}^{d m}} r^{(m)} (φ, \tilde{u}) (\prod_{j = 1}^{m} p (u_{j})) \tilde{f} (\tilde{u}) {\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})} (\tilde{u}) d \tilde{u} \\ = & E [R (φ, θ_{\tilde{x}}) (\prod_{j = 1}^{m} p (θ_{x_{j}}))], \end{matrix}

where

θ_{\tilde{x}} = (θ_{x_{1}}, \dots, θ_{x_{m}}) \in {[0, 1]}^{d m}

is a random vector whose components are independent beta random variables with

θ_{x_{j}} \sim Beta (x_{j} / b_{j} + 1, (1 - x_{j}) / b_{j} + 1)

, and

{\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})} (\cdot)

denotes the product of the corresponding beta densities. The final equality follows from the fact that

{\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})} (\cdot)

is precisely the joint density of

θ_{\tilde{x}}

. Following the same reasoning as the proof of Theorem 9 (adapted to the current kernel setting), we perform a second-order Taylor expansion of the function

R (φ, \cdot) \prod_{j = 1}^{m} p (\cdot)

around

θ_{\tilde{x}} = \tilde{x}

. This yields

\begin{matrix} E [R (φ, θ_{\tilde{x}}) \prod_{j = 1}^{m} p (θ_{x_{j}})] \\ = R (φ, \tilde{x}) \prod_{j = 1}^{m} p (x_{j}) \\ + \sum_{i = 1}^{m} \sum_{ℓ = 1}^{d} \frac{\partial}{\partial x_{i_{ℓ}}} (R (φ, \tilde{x}) p (x_{i})) E (θ_{x_{i_{ℓ}}} - x_{i_{ℓ}}) \\ + \frac{1}{2} \sum_{i = 1}^{m} \sum_{ℓ = 1}^{d} \frac{\partial^{2}}{\partial x_{i_{ℓ}}^{2}} (R (φ, \tilde{\underset{̲}{x}}) p ({\underset{̲}{x}}_{i})) E {(θ_{x_{i_{ℓ}}} - x_{i_{ℓ}})}^{2} \\ + \sum_{i, j = 1, i \neq j}^{m} \sum_{ℓ, r = 1, ℓ \neq r}^{d} \frac{\partial^{2}}{\partial x_{i_{ℓ}} \partial x_{j_{r}}} (R (φ, \tilde{\underset{̲}{x}}) p ({\underset{̲}{x}}_{i}) p ({\underset{̲}{x}}_{j})) E \{(θ_{x_{i_{ℓ}}} - x_{i_{ℓ}}) (θ_{x_{j_{r}}} - x_{j_{r}})\}, \end{matrix}

for some

\tilde{\underset{̲}{x}}

lying on the line segment joining

θ_{\tilde{x}}

and

\tilde{x}

. The existence of such a point is guaranteed by the mean-value theorem, and the smoothness condition (C.2) ensures that the second-order derivatives are continuous and bounded on the compact domain. For a Beta distribution with parameters

α = x / b + 1

and

β = (1 - x) / b + 1

, we have

E [θ] = \frac{α}{α + β} = \frac{x / b + 1}{1 / b + 2} = x + b (1 - 2 x) + O (b^{2}) .

Hence,

E (θ - x) = O (b)

. The second moment satisfies

E [{(θ - x)}^{2}] = Var (θ) + {(E [θ] - x)}^{2} = O (b) + O (b^{2}) = O (b)

. The Cauchy–Schwarz inequality does not improve the rate here because the first-order term is already of order b, not

b^{1 / 2}

. Consequently, we have the estimate:

E [R (φ, θ_{\tilde{x}}) \prod_{j = 1}^{m} p (θ_{x_{j}})] = R (φ, \tilde{x}) \prod_{j = 1}^{m} p (x_{j}) + O (\sum_{j = 1}^{m} \sum_{i = 1}^{d} b_{j_{i}}) .
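The moment orders used above can be verified from the closed-form mean and variance of the beta distribution. The sketch below is illustrative only; the function name `beta_kernel_moments` and the evaluation point x = 0.3 are assumptions made for the example.

```python
def beta_kernel_moments(x, b):
    """Bias E[theta] - x and second moment E[(theta - x)^2] for
    theta ~ Beta(x/b + 1, (1 - x)/b + 1), using closed-form expressions."""
    alpha = x / b + 1.0
    beta = (1.0 - x) / b + 1.0
    mean = alpha / (alpha + beta)
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1.0))
    bias = mean - x
    second_moment = var + bias ** 2
    return bias, second_moment


x = 0.3  # hypothetical interior design point
for b in (0.1, 0.01, 0.001):
    bias, second_moment = beta_kernel_moments(x, b)
    # Both ratios stabilise as b -> 0, confirming the O(b) orders.
    print(b, bias / b, second_moment / b)
```

As b decreases, `bias / b` approaches 1 - 2x and `second_moment / b` approaches x(1 - x), so both the first-order and second-order terms of the expansion contribute at order b.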

Thus,

\begin{matrix} sup_{\tilde{x} \in S_{X}^{m}} |E \{u_{n, 3} (φ, \tilde{x})\} - R (φ, \tilde{x}) \prod_{j = 1}^{m} p (x_{j})| = O (\sum_{j = 1}^{m} \sum_{i = 1}^{d} b_{j_{i}}) . \end{matrix}

(14.19)

Taking

φ \equiv 1

in the above equation gives us

\begin{matrix} sup_{\tilde{x} \in S_{X}^{m}} |E \{u_{n, 3} (1, \tilde{x})\} - \tilde{f} (\tilde{x}) \prod_{j = 1}^{m} p (x_{j})| = O (\sum_{j = 1}^{m} \sum_{i = 1}^{d} b_{j_{i}}) . \end{matrix}

(14.20)

Now, recall the bias decomposition (11.3):

\hat{E} [{\hat{r}}_{n, 3}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, 3} (\tilde{x}))] - r^{(m)} (φ, \tilde{x}) = \frac{E [u_{n, 3} (φ, \tilde{x})] - r^{(m)} (φ, \tilde{x}) E [u_{n, 3} (1, \tilde{x})]}{E [u_{n, 3} (1, \tilde{x})]} .

Combining (14.19) and (14.20), we compute the numerator:

\begin{matrix} E [u_{n, 3} (φ, \tilde{x})] - r^{(m)} (φ, \tilde{x}) E [u_{n, 3} (1, \tilde{x})] & = & [R (φ, \tilde{x}) \prod_{j = 1}^{m} p (x_{j}) + O (\sum b_{j_{i}})] \\ - r^{(m)} (φ, \tilde{x}) [\tilde{f} (\tilde{x}) \prod_{j = 1}^{m} p (x_{j}) + O (\sum b_{j_{i}})] \\ = & \prod_{j = 1}^{m} p (x_{j}) [R (φ, \tilde{x}) - r^{(m)} (φ, \tilde{x}) \tilde{f} (\tilde{x})] + O (\sum b_{j_{i}}) \\ = & O (\sum_{j = 1}^{m} \sum_{i = 1}^{d} b_{j_{i}}), \end{matrix}

since

R (φ, \tilde{x}) = r^{(m)} (φ, \tilde{x}) \tilde{f} (\tilde{x})

by definition. The factor

\prod_{j = 1}^{m} p (x_{j})

cancels exactly, a crucial consequence of the MAR assumption ensuring that the same propensity score product appears in both the numerator and denominator expectations. Under the positivity condition

{inf}_{\tilde{x} \in S_{X}^{m}} \tilde{f} (\tilde{x}) > 0

(which follows from condition (C.2) and the compactness of

S_{X}^{m}

), together with the fact that

\prod_{j = 1}^{m} p (x_{j}) \geq c^{m} > 0

, we have

inf_{\tilde{x} \in S_{X}^{m}} |E [u_{n, 3} (1, \tilde{x})]| \geq \frac{1}{2} inf \tilde{f} (\tilde{x}) \cdot c^{m} > 0

for sufficiently large n. Therefore, the denominator in the bias decomposition is uniformly bounded away from zero. Consequently,

\begin{matrix} sup_{\tilde{x} \in S_{X}^{m}} |\hat{E} [{\hat{r}}_{n, 3}^{(m)} (φ, \tilde{x}; {\bar{Λ}}_{n, 3} (\tilde{x}))] - r^{(m)} (φ, \tilde{x})| = O (\sum_{j = 1}^{m} \sum_{i = 1}^{d} b_{j_{i}}) . \end{matrix}

Hence, the proof of Theorem 12 is complete. □
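The bias rate for the Hájek-type centering can be illustrated numerically in the simplest case m = 1, d = 1. The sketch below is an illustration under stated assumptions, not the construction used in the paper: the regression function `r`, the propensity `p`, the uniform design density, and the midpoint quadrature are all hypothetical choices.

```python
import math


def r(u):
    return u * u            # hypothetical regression function


def p(u):
    return 0.5 + 0.4 * u    # hypothetical propensity score, bounded below by 0.5


def centering_bias(x, b, scale=1.0, grid=20000):
    """E[u(phi, x)] / E[u(1, x)] - r(x) for a beta kernel and uniform design
    density, approximated by a midpoint rule on (0, 1)."""
    a, c = x / b + 1.0, (1.0 - x) / b + 1.0
    log_norm = math.lgamma(a + c) - math.lgamma(a) - math.lgamma(c)
    num = den = 0.0
    for k in range(grid):
        u = (k + 0.5) / grid
        log_k = log_norm + (a - 1.0) * math.log(u) + (c - 1.0) * math.log(1.0 - u)
        w = scale * p(u) * math.exp(log_k) / grid   # propensity times kernel weight
        num += r(u) * w
        den += w
    return num / den - r(x)


bias_1 = centering_bias(0.3, 0.01)
bias_2 = centering_bias(0.3, 0.02)
bias_scaled = centering_bias(0.3, 0.01, scale=0.5)
print(bias_1, bias_2, bias_2 / bias_1, bias_scaled)
```

Doubling b roughly doubles the bias, in line with the O(b) rate, and rescaling the propensity by a constant leaves the centering unchanged, since the same factor appears in the numerator and the denominator.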

Proof of Theorem 13.

The demonstration of Theorem 13 establishes the almost sure uniform convergence rate of the complete-case conditional U-statistic

u_{n, 3} (φ, \tilde{x})

under the MAR assumption (2.4) with the positivity condition

{inf}_{x \in S_{d, 1}} p (x) \geq c > 0

. Throughout this proof, all U-statistics are understood to be the complete-case versions incorporating the missingness indicators

δ_{i}

as defined in (3.5); for notational brevity, we write

u_{n, 3} (φ, \tilde{x})

in place of

u_{n, 3}^{(miss)} (φ, \tilde{x})

, with the implicit understanding that the product

\prod_{j = 1}^{m} δ_{i_{j}}

is included in the kernel. Using the notation established in the proof of Theorem 10, and employing a reasoning akin to that of [165], we proceed to redefine the truncation threshold

ω_{n, 3}

and the grid size parameter

N_{n}

as follows:

ω_{n, 3} : = n^{\frac{1 + ε}{2 + γ}}, N_{n} : = n^{1 + ε} {(\prod_{j = 1}^{m} (\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}}))}^{- \frac{1}{2}} (\sum_{j = 1}^{m} \sum_{i = 1}^{d} \frac{1}{b_{j_{i}}^{2}}),

for an arbitrarily small

ε > 0

. These choices are designed to ensure the summability of the tail probabilities via the Borel–Cantelli lemma, thereby yielding almost sure convergence. The rate of convergence is given by

ϕ_{n} : = \sqrt{\frac{(\log n / n)}{\prod_{j = 1}^{m} (\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})}} .

In order to prove Theorem 13, we need to demonstrate that

sup_{\tilde{x} \in S_{X}^{m}} |u_{n, 3} (φ, \tilde{x}) - E \{u_{n, 3} (φ, \tilde{x})\}| = O (ϕ_{n}), a . s .

(14.21)

Recall the truncation decomposition

u_{n, 3} (φ, \tilde{x}) = u_{n, 3}^{(T)} (φ, \tilde{x}) + u_{n, 3}^{(R)} (φ, \tilde{x})

from (11.1), where the truncated part

u_{n, 3}^{(T)}

uses

φ^{(T)} = φ 1_{{| φ | \leq ω_{n, 3}}}

and the remainder part

u_{n, 3}^{(R)}

uses

φ^{(R)} = φ 1_{{| φ | > ω_{n, 3}}}

. Under the MAR assumption, the remainder kernel incorporates the missingness indicators as

G_{φ, \tilde{x}, 3}^{(miss), (R)} (\tilde{X}, \tilde{Y}, \tilde{δ}) = φ^{(R)} (\tilde{Y}) (\prod_{j = 1}^{m} δ_{j}) {\tilde{K}}_{{\bar{Λ}}_{n, 3} (\tilde{x})} (\tilde{X})

. We first analyze the remainder term. From (14.11) and (14.12), which under the MAR assumption incorporate the factor

\prod_{j = 1}^{m} p (u_{j}) \leq 1

, we obtain the bound

|E [u_{n, 3}^{(R)} (φ, \tilde{x})]| \leq ω_{n, 3}^{- (1 + γ)} C_{1} = n^{- (1 + ε) (\frac{1 + γ}{2 + γ})} C_{1} \leq O (ϕ_{n}) .

(14.22)

This follows from the definition

ω_{n, 3} = n^{(1 + ε) / (2 + γ)}

, which implies

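ω_{n, 3}^{- (1 + γ)} = n^{- (1 + ε) (1 + γ) / (2 + γ)} = n^{- (1 + ε)} \cdot n^{(1 + ε) / (2 + γ)} = o (ϕ_{n})

since the exponent

(1 + ε) (1 + γ) / (2 + γ)

exceeds 1/2 for every γ ≥ 0, whereas

ϕ_{n}

is at least of order

\sqrt{(\log n) / n}

as long as the bandwidth product stays bounded. Moreover, by condition (C.3) (or its generalization (C.3)″) and Markov’s inequality, we infer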

\begin{matrix} \sum_{n = 1}^{\infty} P (| φ ({\tilde{Y}}_{n}) | > ω_{n, 3}) & \leq & \sum_{n = 1}^{\infty} \frac{E (| φ ({\tilde{Y}}_{n}) |^{2 + γ})}{ω_{n, 3}^{2 + γ}} \\ = & E (| φ ({\tilde{Y}}_{n}) |^{2 + γ}) \sum_{n = 1}^{\infty} \frac{1}{n^{(1 + ε)}} < \infty, \end{matrix}

since

ω_{n, 3}^{2 + γ} = n^{(1 + ε)}

and the series

\sum n^{- (1 + ε)}

converges for any

ε > 0

. The finiteness of

E (| φ ({\tilde{Y}}_{n}) |^{2 + γ})

is guaranteed by condition (C.3). Applying the Borel–Cantelli lemma, we conclude that for sufficiently large n,

| φ ({\tilde{Y}}_{n}) | \leq ω_{n, 3}

with probability one. This implies that for any

i \leq n

,

| φ ({\tilde{Y}}_{i}) | \leq ω_{n, 3}

almost surely for sufficiently large n. Consequently, the indicator

1_{{| φ ({\tilde{Y}}_{i}) | > ω_{n, 3}}}

vanishes almost surely for all m-tuples

i \in I (m, n)

. It follows that

u_{n, 3}^{(R)} (φ, \tilde{x}) = 0

almost surely for sufficiently large n, and therefore

|u_{n, 3}^{(R)} (φ, \tilde{x}) - E [u_{n, 3}^{(R)} (φ, \tilde{x})]| = O (ϕ_{n}) a . s .,

uniformly in

\tilde{x} \in S_{X}^{m}

. We now examine the discretization bias arising from replacing

\tilde{x}

by its grid approximation

{\tilde{x}}_{ℓ (\tilde{x})}

. Observe that

ω_{n, 3} N_{n}^{- m} \prod_{j = 1}^{m} \{{(\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})}^{- \frac{1}{2}}\} (\sum_{j = 1}^{m} \sum_{i = 1}^{d} \frac{1}{b_{j_{i}}^{2}}) = n^{\frac{1 + ε}{2 + γ} - m (1 + ε)} \cdot (bandwidth factors) = O (ϕ_{n}),

since the remaining bandwidth factors are dominated by the polynomial decay of

n^{- m (1 + ε)}

for sufficiently large n. Using the same reasoning as in the proof of Theorem 10 (see (14.4) and the subsequent estimates), we obtain

\begin{matrix} sup_{\tilde{x} \in S_{X}^{m}} |E [u_{n, 3}^{(T)} (φ, {\tilde{x}}_{ℓ (\tilde{x})})] - E [u_{n, 3}^{(T)} (φ, \tilde{x})]| \\ = O \{ω_{n, 3} N_{n}^{- m} \prod_{j = 1}^{m} \{{(\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})}^{- \frac{1}{2}}\} (\sum_{j = 1}^{m} \sum_{i = 1}^{d} \frac{1}{b_{j_{i}}^{2}})\} = O (ϕ_{n}) . \end{matrix}

This bound is uniform in

\tilde{x}

and holds almost surely. The exponential bounds (14.1) and (14.9) from the proof of Theorem 10 remain valid under the MAR assumption with the modified definitions of

ω_{n, 3}

and

N_{n}

. Specifically, for any

K > 0

,

P (|u_{n, 3}^{(T)} (φ, \tilde{x}) - E (u_{n, 3}^{(T)} (φ, \tilde{x}))| > K ρ ϕ_{n}) \leq 2 n^{- K^{2} / 4},

and for the nonlinear terms,

P (sup_{\tilde{x} \in S_{X}^{m}} |\sum_{q = 2}^{m} (\binom{m}{q}) u_{n, 3}^{(q)} (π_{q, m} (G_{φ, {\tilde{x}}_{ℓ}, 3}^{(miss), (T)}))| > ε_{0} ρ ϕ_{n}) \leq N_{n}^{d} n^{- m (ε_{0} / 2 C_{6})} .

Condition (C.4’) (or an appropriate growth condition on the bandwidth parameters) implies that

\begin{matrix} \prod_{j = 1}^{m} \{{(\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})}^{- \frac{1}{2}}\} (\sum_{j = 1}^{m} \sum_{i = 1}^{d} \frac{1}{b_{j_{i}}^{2}}) & = & O \{n^{\frac{1}{1 - κ}} {(\frac{{(\prod_{j = 1}^{m} \{{(\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})}^{- \frac{1}{2}}\})}^{\frac{κ}{2}}}{\log (n)})}^{\frac{1}{1 - κ}}\} \\ \leq & O (n^{\frac{1}{1 - κ}}), \end{matrix}

(14.23)

where the last inequality holds because

{(\prod_{j = 1}^{m} {{(\prod_{i = 1}^{d} b_{j_{i}} η_{j_{i}})}^{- 1 / 2}})}^{κ / 2} / \log (n)

is bounded. This condition ensures that

N_{n}

grows at most polynomially in n. Now choose

K : = 2 \sqrt{(d + 1) (1 + ε) + \frac{d}{1 - κ}} .

Since N_{n} contains the factor n^{1 + ε} in addition to the bandwidth term bounded in (14.23), we have

N_{n}^{d} = O (n^{d (1 + ε) + d / (1 - κ)})

up to logarithmic factors, and consequently

N_{n}^{d} n^{- K^{2} / 4} = O (n^{d (1 + ε) + \frac{d}{1 - κ} - \frac{K^{2}}{4}}) = O (n^{- (1 + ε)}),

since

K^{2} / 4 = (d + 1) (1 + ε) + d / (1 - κ) = d (1 + ε) + d / (1 - κ) + (1 + ε)

. Therefore,

\sum_{n = 1}^{\infty} P (sup_{\tilde{x} \in S_{X}^{m}} |u_{n, 3}^{(T)} (φ, \tilde{x}) - E (u_{n, 3}^{(T)} (φ, \tilde{x}))| > K ρ ϕ_{n}) \leq \sum_{n = 1}^{\infty} O (\frac{1}{n^{1 + ε}}) < \infty .

By the Borel–Cantelli lemma, the summability of the tail probabilities implies that, almost surely,

sup_{\tilde{x} \in S_{X}^{m}} |u_{n, 3}^{(T)} (φ, \tilde{x}) - E (u_{n, 3}^{(T)} (φ, \tilde{x}))| = O (ϕ_{n}) .

Noting that the truncated part dominates the asymptotic behavior (the remainder part is identically zero almost surely for sufficiently large n), we obtain

sup_{\tilde{x} \in S_{X}^{m}} |u_{n, 3} (φ, \tilde{x}) - E \{u_{n, 3} (φ, \tilde{x})\}| = O (ϕ_{n}) a . s .

This completes the proof of Theorem 13. □
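The exponent bookkeeping behind the truncation threshold and the constant K can be checked with exact rational arithmetic. The sketch below is only a sanity check under stated assumptions: the function name `exponents` and the parameter values are hypothetical, and the grid exponent assumes that N_{n} is bounded by a constant multiple of n^{1 + ε + 1 / (1 - κ)}, as suggested by the definition of N_{n} combined with (14.23).

```python
from fractions import Fraction


def exponents(eps, gamma, kappa, d):
    """Exponents of n appearing in the Borel-Cantelli argument (exact arithmetic)."""
    omega_exp = (1 + eps) / (2 + gamma)              # omega_n = n^omega_exp
    tail_exp = -(1 + gamma) * omega_exp              # omega_n^{-(1+gamma)}
    split_exp = -(1 + eps) + (1 + eps) / (2 + gamma)
    k_sq_over_4 = (d + 1) * (1 + eps) + Fraction(d) / (1 - kappa)
    grid_exp = d * (1 + eps) + Fraction(d) / (1 - kappa)  # assumed exponent of N_n^d
    return {
        "markov": (2 + gamma) * omega_exp,           # omega_n^{2+gamma} = n^{1+eps}
        "tail": tail_exp,
        "split": split_exp,
        "union": grid_exp - k_sq_over_4,             # exponent of N_n^d * n^{-K^2/4}
    }


# Hypothetical parameter values, chosen only for illustration.
eps, gamma, kappa, d = Fraction(1, 10), Fraction(1), Fraction(1, 2), 2
out = exponents(eps, gamma, kappa, d)
print(out)
```

With these inputs the Markov exponent equals 1 + ε, the two expressions for the tail exponent coincide, and the union-bound exponent equals -(1 + ε), which is what makes the series of tail probabilities summable.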

Remark 30.

Several subtle points in the proof deserve explicit commentary. First, the choice

ω_{n, 3} = n^{(1 + ε) / (2 + γ)}

is critical for the Borel–Cantelli argument: it ensures that

\sum P (| φ ({\tilde{Y}}_{n}) | > ω_{n, 3})

converges, while still allowing

ω_{n, 3}

to grow slowly enough that the remainder part is asymptotically negligible. Second, the condition (C.4’) (or an analogous growth condition on

b_{j_{i}}

and

η_{j_{i}}

) guarantees that

N_{n}

does not grow too fast, specifically

N_{n}^{d} = O (n^{d (1 + ε) + d / (1 - κ)})

, which is essential for the summability of the tail probabilities after the union bound over the

N_{n}^{d}

grid points. Third, the choice

K = 2 \sqrt{(d + 1) (1 + ε) + d / (1 - κ)}

is carefully calibrated to ensure that

K^{2} / 4 = d (1 + ε) + d / (1 - κ) + (1 + ε)

, yielding

N_{n}^{d} n^{- K^{2} / 4} = O (n^{- (1 + ε)})

. This ensures the series converges. Fourth, under the MAR assumption, the missingness indicators

δ_{i}

do not affect the almost sure convergence rates because they are bounded by 1 and the positivity condition ensures that the effective sample size remains proportional to n almost surely. The factor

\prod_{j = 1}^{m} p (u_{j})

that appears in the expectations is bounded by 1, so it does not alter the rates. Finally, the almost sure convergence established here is stronger than convergence in probability, providing uniform control over the entire domain

S_{X}^{m}

with probability one.

Proof of Theorem 14.

The proof of Theorem 14 is done in the same fashion as the proof of Theorem 11, combining (11.2) with the results of Theorem 13. □

Proof of Theorem 15.

The proof of Theorem 15 is the same as the proof of Theorem 12. □

15. Proofs of Section 3.1

The regression proof for the case where

m = 1

closely resembles the one given in [164], now extended to accommodate missing responses under the MAR assumption (2.4) with the positivity condition

{inf}_{x \in S_{d, 1}} p (x) \geq c > 0

. We include it here in full detail for the reader’s convenience and to ensure it is self-contained. However, the result concerning the regression function smoothed by the Dirichlet kernel in the presence of missing data has not been addressed in the literature, providing the primary motivation for presenting it in this paper.

Proof of Theorem 1.

Throughout this proof, all estimators are understood to be the complete-case versions incorporating the missingness indicators

δ_{i}

as defined in (3.8) and (3.9). For notational brevity, we write

{\hat{g}}_{n} (φ, x, Λ_{n, 1})

in place of

{\hat{g}}_{n}^{(miss)} (φ, x, Λ_{n, 1})

, with the implicit understanding that the factor

δ_{i}

is included in the summand. Specifically,

{\hat{g}}_{n} (φ, x, Λ_{n, 1}) = \frac{1}{n} \sum_{i = 1}^{n} φ (Y_{i}) δ_{i} K_{(α, β)} (X_{i}) .

Observe that

\begin{matrix} sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |{\hat{g}}_{n} (φ, x, Λ_{n, 1}) - R (φ, x)| & \leq & sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |{\hat{g}}_{n} (φ, x, Λ_{n, 1}) - E [{\hat{g}}_{n} (φ, x, Λ_{n, 1})]| \\ + sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |E [{\hat{g}}_{n} (φ, x, Λ_{n, 1})] - R (φ, x)| . \end{matrix}

Keep in mind the definition of the set

S_{d, 1} (δ)

given in (3.11). First, we need to prove the following result under the MAR assumption

sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |{\hat{g}}_{n} (φ, x, Λ_{n, 1}) - E [{\hat{g}}_{n} (φ, x, Λ_{n, 1})]| = O (\frac{| \log \overset{˘}{b} | {(\log n)}^{3 / 2}}{{\overset{˘}{b}}^{d + 1 / 2} \sqrt{n}}), a . s .

(15.1)

The proof of (15.1) follows the same lines as in [164], with the changes needed to fit our context, in particular the incorporation of the missingness indicators. Under the MAR assumption, we have

\begin{matrix} {\hat{g}}_{n} (φ, x, Λ_{n, 1}) - E [{\hat{g}}_{n} (φ, x, Λ_{n, 1})] & = & \frac{1}{n} \sum_{i = 1}^{n} \{φ (Y_{i}) δ_{i} K_{(α, β)} (X_{i}) - E [φ (Y_{i}) δ_{i} K_{(α, β)} (X_{i})]\} \\ = & \frac{1}{n} \sum_{i = 1}^{n} Z_{i, b} (x), \end{matrix}

where, for

i = 1, \dots, n

,

Z_{i, b} (x) : = φ (Y_{i}) δ_{i} K_{(α, β)} (X_{i}) - E [φ (Y_{i}) δ_{i} K_{(α, β)} (X_{i})] .

Note that the expectation incorporates the propensity score via

E [δ_{i} ∣ X_{i} = u] = p (u)

. For some sequence

ω_{n, 1}

tending to infinity (to be specified later), we also consider the following truncation notation

\begin{matrix} φ^{(T)} (y) : & = & φ (y) 1_{\{|φ (y)| \leq ω_{n, 1}\}}, \\ φ^{(R)} (y) : & = & φ (y) 1_{\{|φ (y)| > ω_{n, 1}\}} . \end{matrix}

This allows us to write

\begin{matrix} {\hat{g}}_{n} (φ, x, Λ_{n, 1}) - E [{\hat{g}}_{n} (φ, x, Λ_{n, 1})] & = & \frac{1}{n} \sum_{i = 1}^{n} \{Z_{i, b}^{(T)} (x) + Z_{i, b}^{(R)} (x)\}, \end{matrix}

with

Z_{i, b}^{(T)} (x) : = φ^{(T)} (Y_{i}) δ_{i} K_{(α, β)} (X_{i}) - E [φ^{(T)} (Y_{i}) δ_{i} K_{(α, β)} (X_{i})],

(15.2)

and

Z_{i, b}^{(R)} (x) : = φ^{(R)} (Y_{i}) δ_{i} K_{(α, β)} (X_{i}) - E [φ^{(R)} (Y_{i}) δ_{i} K_{(α, β)} (X_{i})] .

(15.3)

We also denote the centered kernel for the constant function

W_{i, b} (x) : = δ_{i} K_{(α, β)} (X_{i}) - E [δ_{i} K_{(α, β)} (X_{i})] .

(15.4)

The following proposition, which is an adaptation of Proposition 1 of [164] to the MAR setting, will play an instrumental role in the sequel.

Proposition 4.

Let

x \in S_{d, 1} (\overset{˘}{b} (d + 1))

,

n \geq 1

,

0 < \overset{˘}{b} < (e^{- 16 \sqrt{2}} \land d^{- 1})

,

0 < a \leq e^{- 1} {∥ f ∥}_{\infty} | \log \overset{˘}{b} | / {\overset{˘}{b}}^{d + 1 / 2}

, and take the unique

δ \in (0, e^{- 1}] \quad \text{that satisfies} \quad δ | \log δ | = \frac{{\overset{˘}{b}}^{d + 1 / 2} a}{{∥ f ∥}_{\infty} | \log \overset{˘}{b} |} .

(15.5)

Then, under the MAR assumption (2.4) with the positivity condition

{inf}_{x \in S_{d, 1}} p (x) \geq c > 0

, for all

h \in R

,

\begin{matrix} P (sup_{x^{'} \in x + {[- \overset{˘}{b}, \overset{˘}{b}]}^{d}} |\frac{1}{n} \sum_{i = 1}^{n} Z_{i, b}^{(T)} (x^{'})| \geq h + 2 a ω_{n, 1}, |\frac{1}{n} \sum_{i = 1}^{n} Z_{i, b}^{(T)} (x)| \leq h) \\ \leq C_{φ, d} \exp (- \frac{1}{100^{2} d^{4} {∥ f ∥}_{\infty}^{2}} \cdot {(\frac{n^{1 / 2} {\overset{˘}{b}}^{d + 1 / 2} a}{| \log δ | | \log \overset{˘}{b} |})}^{2}), \end{matrix}

(15.6)

where

C_{φ, d} > 0

is a constant that depends only on the function

φ (\cdot)

, the dimension d, and the bounds on the propensity score

p (\cdot)

.
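The quantity δ in (15.5) is defined only implicitly, but it is straightforward to compute: the map δ ↦ δ |log δ| is increasing on (0, e^{-1}], so bisection applies. The following is a minimal sketch; the function names `rhs_15_5` and `solve_delta` and the numerical inputs are illustrative assumptions, not values taken from the paper.

```python
import math


def rhs_15_5(a, b, d, f_sup):
    """Right-hand side of (15.5): b^{d + 1/2} * a / (||f||_inf * |log b|)."""
    return b ** (d + 0.5) * a / (f_sup * abs(math.log(b)))


def solve_delta(target, iterations=200):
    """Solve delta * |log delta| = target on (0, 1/e] by bisection."""
    upper = math.exp(-1.0)
    if not (0.0 < target <= upper):
        raise ValueError("target must lie in (0, 1/e]")
    lo, hi = 0.0, upper
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid * abs(math.log(mid)) < target:
            lo = mid      # solution lies to the right of mid
        else:
            hi = mid      # solution lies at or to the left of mid
    return hi


# Hypothetical inputs: b below exp(-16*sqrt(2)), d = 1, ||f||_inf = 1, a = 1.
b, d, f_sup, a = 1e-10, 1, 1.0, 1.0
target = rhs_15_5(a, b, d, f_sup)
delta = solve_delta(target)
print(target, delta, delta * abs(math.log(delta)))
```

A fixed number of bisection steps is used instead of an absolute tolerance because the solution can be many orders of magnitude smaller than e^{-1} when the bandwidth is small.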

Sketch of Proposition 4.

The proof follows the same lines as Proposition 1 in [164], with the key modification that the random variables

Z_{i, b}^{(T)} (x)

now include the missingness indicator

δ_{i}

. However, under the MAR assumption,

δ_{i}

is independent of

Y_{i}

conditionally on

X_{i}

, and

0 \leq δ_{i} \leq 1

. Moreover,

E [δ_{i} ∣ X_{i} = u] = p (u)

, and under the positivity condition,

p (u)

is bounded away from zero and bounded above by 1. Consequently, all moment bounds that hold for the complete-data case remain valid up to constants that depend on

p (\cdot)

. Specifically, the boundedness of

| φ^{(T)} | \leq ω_{n, 1}

and the kernel bounds from Lemma 5 and Lemma 6 (adapted to the Dirichlet kernel) ensure that the increments of the process are controlled. The exponential inequality then follows from a chaining argument and Bernstein’s inequality for bounded random variables, with constants adjusted to account for the presence of

δ_{i}

. The factor

ω_{n, 1}

appears due to the truncation of

φ

. The detailed calculations mirror those in [164] and are omitted here for brevity. □

Using Proposition 4 and a chaining argument over the compact set

S_{d, 1} (\overset{˘}{b} d)

(which can be covered by

O ({\overset{˘}{b}}^{- d})

balls of radius

\overset{˘}{b}

), we obtain, for an appropriate choice of a (specifically

a = C | \log \overset{˘}{b} |

for a sufficiently large constant C), that

sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |\frac{1}{n} \sum_{i = 1}^{n} Z_{i, b}^{(T)} (x)| = O (\frac{| \log \overset{˘}{b} | {(\log n)}^{3 / 2}}{{\overset{˘}{b}}^{d + 1 / 2} \sqrt{n}}) a . s .

The logarithmic factors arise from the union bound over the covering number and the choice of a. For the remainder term

Z_{i, b}^{(R)} (x)

, we use condition (C.3). Choose the truncation threshold

ω_{n, 1} : = {(\frac{{\overset{˘}{b}}^{d + 1 / 2} \sqrt{n}}{| \log \overset{˘}{b} | {(\log n)}^{3 / 2}})}^{1 / (1 + γ)} .

Then, by Markov’s inequality and the Borel–Cantelli lemma, we obtain

sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |\frac{1}{n} \sum_{i = 1}^{n} Z_{i, b}^{(R)} (x)| = o (\frac{| \log \overset{˘}{b} | {(\log n)}^{3 / 2}}{{\overset{˘}{b}}^{d + 1 / 2} \sqrt{n}}) a . s .

The bias term is handled using a first-order Taylor expansion of

R (φ, \cdot)

around

x

, together with the properties of the Dirichlet kernel. Under condition (C.2) and the MAR assumption, we have

sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |E [{\hat{g}}_{n} (φ, x, Λ_{n, 1})] - R (φ, x)| = O ({\overset{˘}{b}}^{1 / 2}) .

This rate follows from the fact that

E [ξ_{x}] = x + O (\overset{˘}{b})

and

Var (ξ_{x}) = O (\overset{˘}{b})

, where

ξ_{x} \sim Dirichlet (α, β)

. The factor

\prod_{j = 1}^{m} p (x_{j})

that appears in the expectation cancels in the ratio estimator, but for the numerator alone it remains bounded. Putting together the preceding results, we obtain

sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |{\hat{g}}_{n} (φ, x, Λ_{n, 1}) - R (φ, x)| = O (\frac{| \log \overset{˘}{b} | {(\log n)}^{3 / 2}}{{\overset{˘}{b}}^{d + 1 / 2} \sqrt{n}}) + O ({\overset{˘}{b}}^{1 / 2}) a . s .

Recall that

{\hat{r}}_{n, 1}^{(1)} (φ, x) = {\hat{g}}_{n} (φ, x, Λ_{n, 1}) / {\hat{f}}_{n} (x, Λ_{n, 1})

, where

{\hat{f}}_{n}

is the complete-case density estimator. Using the same techniques as above, we also have

sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |{\hat{f}}_{n} (x, Λ_{n, 1}) - f (x)| = O (\frac{| \log \overset{˘}{b} | {(\log n)}^{3 / 2}}{{\overset{˘}{b}}^{d + 1 / 2} \sqrt{n}}) + O ({\overset{˘}{b}}^{1 / 2}) a . s .

Under the positivity condition,

f (x)

is bounded away from zero on

S_{d, 1} (\overset{˘}{b} d)

. Therefore, using the identity

\frac{{\hat{g}}_{n}}{{\hat{f}}_{n}} - \frac{R}{f} = \frac{1}{{\hat{f}}_{n}} ({\hat{g}}_{n} - R) - \frac{R}{f {\hat{f}}_{n}} ({\hat{f}}_{n} - f),

we obtain

sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |{\hat{r}}_{n, 1}^{(1)} (φ, x) - r^{(1)} (φ, x)| = O (\frac{| \log \overset{˘}{b} | {(\log n)}^{3 / 2}}{{\overset{˘}{b}}^{d + 1 / 2} \sqrt{n}}) + O ({\overset{˘}{b}}^{1 / 2}) a . s .

This completes the proof of Theorem 1. □
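To make the complete-case ratio estimator concrete, the following sketch simulates it in dimension d = 1, where the Dirichlet kernel reduces to a beta density. Everything in it is an illustrative assumption rather than the paper's experimental setup: the uniform design, the regression function sin(2πx), the propensity 0.5 + 0.4x, the noise level, the bandwidth, and the function names.

```python
import math
import random


def beta_kernel(u, x, b):
    """Beta(x/b + 1, (1 - x)/b + 1) density evaluated at u (d = 1 Dirichlet kernel)."""
    a, c = x / b + 1.0, (1.0 - x) / b + 1.0
    u = min(max(u, 1e-12), 1.0 - 1e-12)   # guard against log(0)
    log_norm = math.lgamma(a + c) - math.lgamma(a) - math.lgamma(c)
    return math.exp(log_norm + (a - 1.0) * math.log(u) + (c - 1.0) * math.log(1.0 - u))


def complete_case_regression(xs, ys, deltas, x, b):
    """Ratio of kernel-weighted sums over observed responses only (delta_i = 1)."""
    num = den = 0.0
    for xi, yi, di in zip(xs, ys, deltas):
        if di == 0:
            continue                      # response missing: contributes nothing
        w = beta_kernel(xi, x, b)
        num += yi * w
        den += w
    return num / den


rng = random.Random(0)
n = 20000
xs = [rng.random() for _ in range(n)]
# Missingness depends on X only, so the MAR assumption holds by construction.
deltas = [1 if rng.random() < 0.5 + 0.4 * xi else 0 for xi in xs]
ys = [math.sin(2 * math.pi * xi) + rng.gauss(0.0, 0.1) if di else None
      for xi, di in zip(xs, deltas)]

estimate = complete_case_regression(xs, ys, deltas, x=0.25, b=0.01)
print(estimate, math.sin(2 * math.pi * 0.25))
```

The estimate should land close to the target value 1, with a small downward smoothing bias because the target is a local maximum of the regression function; the propensity never needs to be estimated because it enters the numerator and the denominator in the same way.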

Remark 31.

The adaptation to the missing data setting under the MAR assumption required careful incorporation of the missingness indicators

δ_{i}

into the definitions of

Z_{i, b} (x)

,

Z_{i, b}^{(T)} (x)

,

Z_{i, b}^{(R)} (x)

, and

W_{i, b} (x)

. The key observation is that

0 \leq δ_{i} \leq 1

and, under the positivity condition,

E [δ_{i} ∣ X_{i}] = p (X_{i})

is bounded away from zero. This ensures that the effective sample size remains proportional to n and that all exponential inequalities remain valid with constants adjusted by the bounds on

p (\cdot)

. The truncation threshold

ω_{n, 1}

is chosen to balance the bias from the remainder term and the variance of the truncated part, yielding the same rates as in the complete-data case. The bias term

O ({\overset{˘}{b}}^{1 / 2})

arises from the first-order Taylor expansion of

R (φ, \cdot)

and the fact that

E [ξ_{x} - x] = O (\overset{˘}{b})

, while

E [{(ξ_{x} - x)}^{2}] = O (\overset{˘}{b})

, leading to a square-root rate after applying the Cauchy–Schwarz inequality. The logarithmic factors

| \log \overset{˘}{b} |

and

{(\log n)}^{3 / 2}

are standard in nonparametric kernel estimation and arise from the chaining argument and the union bound over the covering of the compact set

S_{d, 1} (\overset{˘}{b} d)

.

Proof of Proposition 4.

Throughout this proof, we operate under the MAR assumption (2.4) with the positivity condition

{inf}_{x \in S_{d, 1}} p (x) \geq c > 0

. The random variables

Z_{i, b}^{(T)} (x)

are defined in (15.2) and include the missingness indicator

δ_{i}

. Note that

0 \leq δ_{i} \leq 1

and

E [δ_{i} ∣ X_{i} = u] = p (u)

, with

p (u)

bounded between c and 1. Consequently, all moment bounds that hold for the complete-data case remain valid up to constants that depend on c. Following a similar approach to the proof in [164], we apply a union bound to show that the probability in (15.6) can be bounded as follows:

\begin{matrix} \leq P (\{sup_{x^{'} \in x + {[- \overset{˘}{b}, \overset{˘}{b}]}^{d}} |\frac{1}{n} \sum_{i = 1}^{n} (Z_{i, b}^{(T)} (x^{'}) - Z_{i, b}^{(T)} (x)) 1_{\{X_{i} \in S_{d, 1} ∖ S_{d, 1} (δ)\}}| \geq a ω_{n, 1}\} \end{matrix}

\begin{matrix} \cap \{\sum_{i = 1}^{n} 1_{\{X_{i} \in S_{d, 1} ∖ S_{d, 1} (δ)\}} \leq n \cdot 4 {∥ f ∥}_{\infty} δ\}) \end{matrix}

(15.7)

\begin{matrix} + P (\sum_{i = 1}^{n} 1_{\{X_{i} \in S_{d, 1} ∖ S_{d, 1} (δ)\}} \geq n \cdot 4 {∥ f ∥}_{\infty} δ) \end{matrix}

(15.8)

\begin{matrix} + P (sup_{x^{'} \in x + {[- \overset{˘}{b}, \overset{˘}{b}]}^{d}} |\frac{1}{n} \sum_{i = 1}^{n} (Z_{i, b}^{(T)} (x^{'}) - Z_{i, b}^{(T)} (x)) 1_{\{X_{i} \in S_{d, 1} (δ)\}}| \geq a ω_{n, 1}) \end{matrix}

(15.9)

\begin{matrix} = : (A) + (B) + (C) . \end{matrix}

(15.10)

To clarify our notation, for any subset

A \subset R^{d}

and any point

x \in R^{d}

, we define

x + A = {x + y : y \in A}

.

Bounding term (A).

Suppose that

x \in S_{d, 1} (\overset{˘}{b} (d + 1))

and

x^{'} \in x + {[- \overset{˘}{b}, \overset{˘}{b}]}^{d}

. This implies that both

x

and

x^{'}

are in

S_{d, 1} (\overset{˘}{b})

, leading to the following relations for the Dirichlet kernel parameters:

α_{1} = \frac{x_{1}}{\overset{˘}{b}} + 1, \dots, α_{d} = \frac{x_{d}}{\overset{˘}{b}} + 1, β = \frac{1 - {∥ x ∥}_{1}}{\overset{˘}{b}} + 1 \geq 2,

and for

x^{'}

,

α_{1}^{'} = \frac{x_{1}^{'}}{\overset{˘}{b}} + 1, \dots, α_{d}^{'} = \frac{x_{d}^{'}}{\overset{˘}{b}} + 1, β^{'} = \frac{1 - ∥ x^{'} ∥_{1}}{\overset{˘}{b}} + 1 \geq 2 .

Consequently, we have the following bounds (see Lemma 2 in [164]):

\begin{matrix} \sqrt{\frac{{∥ α ∥}_{1} + β - 1}{(β - 1) \prod_{i = 1}^{d} (α_{i} - 1)}} \leq \sqrt{{∥ α ∥}_{1} + β - 1} = \sqrt{{\overset{˘}{b}}^{- 1} + d}, \end{matrix}

(15.11)

\begin{matrix} \sqrt{\frac{∥ α^{'} ∥_{1} + β^{'} - 1}{(β^{'} - 1) \prod_{i = 1}^{d} (α_{i}^{'} - 1)}} \leq \sqrt{∥ α^{'} ∥_{1} + β^{'} - 1} = \sqrt{{\overset{˘}{b}}^{- 1} + d} . \end{matrix}

(15.12)

Under the MAR assumption, the missingness indicator

δ_{i}

satisfies

0 \leq δ_{i} \leq 1

, so it does not affect the kernel bounds. Combining these results with our assumption in (15.5) and the upper bound on the Dirichlet density from Lemma 2 in [164], we obtain on the event

\{\sum_{i = 1}^{n} 1_{\{X_{i} \in S_{d, 1} ∖ S_{d, 1} (δ)\}} \leq 4 n {∥ f ∥}_{\infty} δ\}

the following estimate:

\begin{matrix} |\frac{1}{n} \sum_{i = 1}^{n} (Z_{i, b}^{(T)} (x^{'}) - Z_{i, b}^{(T)} (x)) 1_{\{X_{i} \in S_{d, 1} ∖ S_{d, 1} (δ)\}}| & \leq & 4 \cdot 4 ω_{n, 1} {∥ f ∥}_{\infty} δ \cdot {\overset{˘}{b}}^{- d} \sqrt{{\overset{˘}{b}}^{- 1} + d} \\ \leq & \frac{16 \sqrt{1 + \overset{˘}{b} d}}{| \log δ | | \log \overset{˘}{b} |} a ω_{n, 1} . \end{matrix}

(15.13)

Given the assumptions

0 < δ \leq e^{- 1}

and

0 < \overset{˘}{b} < (e^{- 16 \sqrt{2}} \land d^{- 1})

, we have

\frac{16 \sqrt{1 + \overset{˘}{b} d}}{| \log δ | | \log \overset{˘}{b} |} < \frac{16 \sqrt{2}}{1 \cdot 16 \sqrt{2}} = 1,

so the right-hand side of (15.13) is strictly smaller than the threshold in (15.7). Consequently, the event in (15.7) cannot occur, and we conclude

(A) = 0 .

(15.14)

Bounding term (B).

Term (15.8) represents the probability of encountering “too many bad observations,” meaning too many

X_{i}

s near the boundary of the simplex where the partial derivatives of the Dirichlet density with respect to

α_{1}, \dots, α_{d}

, and

β

diverge. We can control this term using a concentration bound. First, note that the volume of

S_{d, 1} ∖ S_{d, 1} (δ)

is at most

2 d δ / d!

. Specifically,

S_{d, 1} (δ)

forms a simplex of side-length

1 - 2 δ

within

S_{d, 1}

, so:

d! \cdot Volume (S_{d, 1} ∖ S_{d, 1} (δ)) = 1 - {(1 - 2 δ)}^{d} \leq 1 - (1 + d \cdot (- 2 δ)) = 2 d δ,

(15.15)

where we used the inequality

{(1 + x)}^{n} \geq 1 + n x

, which holds for all

n \in N

and

x \geq - 1

. From (15.15) and knowing that

{∥ f ∥}_{\infty}

is finite (since

f (\cdot)

is continuous by condition (C.2) and

S_{d, 1}

is compact), we obtain:

E [1_{\{X_{i} \in S_{d, 1} ∖ S_{d, 1} (δ)\}}] = \int_{S_{d, 1} ∖ S_{d, 1} (δ)} f (u) d u \leq {∥ f ∥}_{\infty} \cdot Volume (S_{d, 1} ∖ S_{d, 1} (δ)) \leq \frac{2 {∥ f ∥}_{\infty}}{(d - 1)!} δ .

Applying Hoeffding’s inequality to the average of the independent Bernoulli random variables

1_{\{X_{i} \in S_{d, 1} ∖ S_{d, 1} (δ)\}}

(the missingness indicators

δ_{i}

do not enter this bound, since these indicators depend only on the

X_{i}

), we have for

t = 4 {∥ f ∥}_{\infty} δ - E [1_{\{X_{1} \in S_{d, 1} ∖ S_{d, 1} (δ)\}}] \geq 2 {∥ f ∥}_{\infty} δ

:

(B) = P (\frac{1}{n} \sum_{i = 1}^{n} (1_{\{X_{i} \in S_{d, 1} ∖ S_{d, 1} (δ)\}} - E [1_{\{X_{i} \in S_{d, 1} ∖ S_{d, 1} (δ)\}}]) \geq t) \leq \exp (- 2 n t^{2}) .

Using condition (15.5), we obtain:

(B) \leq \exp (- 2 {(\frac{n^{1 / 2} {\overset{˘}{b}}^{d + 1 / 2} a}{| \log δ | | \log \overset{˘}{b} |})}^{2}) .

(15.16)

Bounding term (C) via chaining and conditioning.

To bound the third probability in (15.10), we use a chaining argument combined with a conditional concentration inequality. Let

H_{k} : = 2^{- k} \cdot \overset{˘}{b} Z^{d}

be the sequence of lattice points, and for

x \in S_{d, 1} (\overset{˘}{b} (d + 1))

fixed, let

{(x_{k})}_{k \in N_{0}}

be a sequence such that

x_{0} = x, x_{k} - x \in H_{k} \cap {[- \overset{˘}{b}, \overset{˘}{b}]}^{d}, lim_{k \to \infty} {∥ x_{k} - x^{'} ∥}_{\infty} = 0,

and

{(x_{k + 1})}_{i} = {(x_{k})}_{i} \pm 2^{- k - 1} \overset{˘}{b}

for all

i = 1, \dots, d

. By continuity and the triangle inequality, telescoping along this chain gives

|\frac{1}{n} \sum_{i = 1}^{n} (Z_{i, b}^{(T)} (x^{'}) - Z_{i, b}^{(T)} (x)) 1_{{X_{i} \in S_{d, 1} (δ)}}| \leq \sum_{k = 0}^{\infty} |\frac{1}{n} \sum_{i = 1}^{n} (Z_{i, b}^{(T)} (x_{k + 1}) - Z_{i, b}^{(T)} (x_{k})) 1_{{X_{i} \in S_{d, 1} (δ)}}| .

For a fixed pair

(x_{k}, x_{k + 1})

, define

D_{i, b}^{(k)} : = Z_{i, b}^{(T)} (x_{k + 1}) - Z_{i, b}^{(T)} (x_{k}) .

Observe that

D_{i, b}^{(k)}

is centered. Moreover, conditionally on the

σ

-algebra

F_{Y, δ}

generated by

\{(Y_{i}, δ_{i})\}_{i = 1}^{n}

, the variables

\{D_{i, b}^{(k)}\}_{i = 1}^{n}

are independent (since the

X_{i}

are independent) and satisfy

| D_{i, b}^{(k)} | \leq 2 ω_{n, 1} sup_{u \in S_{d, 1}} K_{(α, β)} (u) = : M_{n},

and

Var (D_{i, b}^{(k)} ∣ F_{Y, δ}) \leq C_{0} ω_{n, 1}^{2} ∥ x_{k + 1} - x_{k} ∥^{2} sup_{u \in S_{d, 1} (δ)} {∥ \nabla K_{(α, β)} (u) ∥}^{2},

where

C_{0}

depends only on

φ

and the propensity score

p (\cdot)

(via the MAR assumption). Applying Bernstein’s inequality (Lemma A1) conditionally on

F_{Y, δ}

yields, for any

ε > 0

,

P (|\frac{1}{n} \sum_{i = 1}^{n} D_{i, b}^{(k)} 1_{{X_{i} \in S_{d, 1} (δ)}}| \geq ε | F_{Y, δ}) \leq 2 \exp (- \frac{n ε^{2}}{2 σ_{n, k}^{2} + \frac{2}{3} M_{n} ε}),

where

σ_{n, k}^{2}

is a uniform bound for the conditional variance. Using the bounds from Lemmas 5 and 6, we have

M_{n} = O ({\overset{˘}{b}}^{- d / 2}), σ_{n, k}^{2} = O (ω_{n, 1}^{2} {\overset{˘}{b}}^{- 2 d - 1} 2^{- 2 k} {| \log δ |}^{2} {| \log \overset{˘}{b} |}^{2}) .

Choosing

ε = a ω_{n, 1} / (2 {(k + 1)}^{2})

and taking expectations over

F_{Y, δ}

, we obtain

P (|\frac{1}{n} \sum_{i = 1}^{n} D_{i, b}^{(k)} 1_{\{X_{i} \in S_{d, 1} (δ)\}}| \geq \frac{a ω_{n, 1}}{2 {(k + 1)}^{2}}) \leq 2 \exp (- \frac{c_{0} n a^{2} 2^{2 k}}{{(k + 1)}^{4}} \cdot \frac{{\overset{˘}{b}}^{2 d + 1}}{{| \log δ |}^{2} {| \log \overset{˘}{b} |}^{2}}),

for some constant

c_{0} > 0

. Substituting

ω_{n, 1} = {({\overset{˘}{b}}^{d + 1 / 2} \sqrt{n} / (| \log \overset{˘}{b} | {(\log n)}^{3 / 2}))}^{1 / (1 + γ)}

and simplifying gives

P (\dots) \leq 2 \exp (- C_{1} \frac{2^{2 k}}{{(k + 1)}^{4}} \cdot {(\frac{n^{1 / 2} {\overset{˘}{b}}^{d + 1 / 2} a}{| \log δ | | \log \overset{˘}{b} |})}^{2}),

where

C_{1} > 0

is an absolute constant.

Now, taking a union bound over the chaining levels and the

2^{(k + 2) d}

lattice points at level k, we obtain

(C) \leq \sum_{k = 0}^{\infty} 2^{(k + 2) d} 2^{d} \cdot 2 \exp (- C_{1} \frac{2^{2 k}}{{(k + 1)}^{4}} \cdot {(\frac{n^{1 / 2} {\overset{˘}{b}}^{d + 1 / 2} a}{| \log δ | | \log \overset{˘}{b} |})}^{2}) .

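Since

2^{2 k} / {(k + 1)}^{4} \geq c_{1} 2^{k}

for all

k \geq 0

(for instance with

c_{1} = 1 / 50

), the exponential factor decays fast enough in k to offset the growth of the lattice count, and we have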

(C) \leq C_{f, d} \exp (- \frac{1}{100^{2} d^{4} {∥ f ∥}_{\infty}^{2}} \cdot {(\frac{n^{1 / 2} {\overset{˘}{b}}^{d + 1 / 2} a}{| \log δ | | \log \overset{˘}{b} |})}^{2}),

for some constant

C_{f, d} > 0

depending only on f and d. Putting (15.14), (15.16), and the preceding bound on (C) together in (15.10) concludes the proof of Proposition 4. □
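As a quick numerical sanity check on the chaining step, the factor 2^{2k}/(k+1)^4 that multiplies the exponent can be tabulated over the levels k. The following Python sketch is illustrative only and is not part of the proof; the helper name `chaining_factor` and the range of levels are assumptions made for the illustration. It uses exact rational arithmetic to confirm that the factor never falls below 16/81 (its value at k = 2) and that it dominates 2^k/50 at every level checked, so the exponential terms eventually outweigh the polynomial growth of the lattice count.

```python
from fractions import Fraction


def chaining_factor(k: int) -> Fraction:
    """Exact value of 2^(2k) / (k + 1)^4 at chaining level k."""
    return Fraction(4 ** k, (k + 1) ** 4)


levels = range(0, 200)  # illustrative range of chaining levels (assumed)
factors = [chaining_factor(k) for k in levels]

# Smallest value of the factor and the level at which it occurs.
min_factor = min(factors)
argmin_level = factors.index(min_factor)

# Check the lower bound 2^(2k) / (k + 1)^4 >= 2^k / 50 on the whole range.
dominates = all(chaining_factor(k) >= Fraction(2 ** k, 50) for k in levels)

print(argmin_level, min_factor, dominates)
```

Exact fractions are used instead of floats so that the comparison at large k is not affected by rounding.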

The following corollary is very similar to Corollary 2 of [164], now adapted to the MAR setting.

Corollary 8

(Large deviation estimates). Recall

Z_{i, b}^{(T)} (x)

defined in (15.2), which includes the missingness indicator

δ_{i}

under the MAR assumption. Let

x \in S_{d, 1} (\overset{˘}{b} (d + 1)), n \geq 100^{6} d^{6}, n^{- 1 / d} \leq \overset{˘}{b} \leq (e^{- 16 \sqrt{2}} \land d^{- 1})

,

0 < a \leq e^{- 1} {∥ f ∥}_{\infty} | \log \overset{˘}{b} | / {\overset{˘}{b}}^{d + 1 / 2}

, and take the unique

δ \in (0, e^{- 1}]

that satisfies

δ | \log δ | = \frac{{\overset{˘}{b}}^{d + 1 / 2} a}{{∥ f ∥}_{\infty} | \log \overset{˘}{b} |} .

Then, under the MAR assumption (2.4) with the positivity condition

{inf}_{x \in S_{d, 1}} p (x) \geq c > 0

, we have

P (sup_{x^{'} \in x + {[- \overset{˘}{b}, \overset{˘}{b}]}^{d}} |\frac{1}{n} \sum_{i = 1}^{n} Z_{i, b}^{(T)} (x^{'})| \geq 3 a ω_{n, 1}) \leq C_{f, d} \exp (- \frac{1}{100^{2} d^{4} {∥ f ∥}_{\infty}^{2}} \cdot {(\frac{n^{1 / 2} {\overset{˘}{b}}^{d + 1 / 2} a}{| \log δ | | \log \overset{˘}{b} |})}^{2}),

(15.17)

where

C_{f, d} > 0

is a constant that depends only on the density

f (\cdot)

and the dimension d, and not on the missingness mechanism.

Proof of Corollary 8.

Throughout this proof, we operate under the MAR assumption (2.4) with the positivity condition

{inf}_{x \in S_{d, 1}} p (x) \geq c > 0

. The random variables

Z_{i, b}^{(T)} (x)

are defined in (15.2) and include the missingness indicator

δ_{i}

, which satisfies

0 \leq δ_{i} \leq 1

and

E [δ_{i} ∣ X_{i}] = p (X_{i})

. By applying a union bound, we find that the probability in equation (15.17) can be bounded as follows:

\leq P (sup_{x^{'} \in x + {[- \overset{˘}{b}, \overset{˘}{b}]}^{d}} |\frac{1}{n} \sum_{i = 1}^{n} Z_{i, b}^{(T)} (x^{'})| \geq 3 a ω_{n, 1}, |\frac{1}{n} \sum_{i = 1}^{n} Z_{i, b}^{(T)} (x)| \leq a ω_{n, 1}) + P (|\frac{1}{n} \sum_{i = 1}^{n} Z_{i, b}^{(T)} (x)| \geq a ω_{n, 1}) .

The first probability can be bounded using Proposition 4 (with

h = a ω_{n, 1}

), which directly yields an exponential bound of the form

P (\dots) \leq C_{f, d} \exp (- \frac{1}{100^{2} d^{4} {∥ f ∥}_{\infty}^{2}} \cdot {(\frac{n^{1 / 2} {\overset{˘}{b}}^{d + 1 / 2} a}{| \log δ | | \log \overset{˘}{b} |})}^{2}) .

The second probability can be similarly bounded by applying Bernstein’s inequality (or Azuma’s inequality) to the sum of independent centered bounded random variables

Z_{i, b}^{(T)} (x)

. Indeed,

| Z_{i, b}^{(T)} (x) | \leq 2 ω_{n, 1} \cdot sup K_{(α, β)}

, and the variance is bounded by

C ω_{n, 1}^{2} {\overset{˘}{b}}^{- d - 1 / 2}

(up to constants). A standard application of Bernstein’s inequality gives

P (|\frac{1}{n} \sum_{i = 1}^{n} Z_{i, b}^{(T)} (x)| \geq a ω_{n, 1}) \leq 2 \exp (- \frac{n a^{2} ω_{n, 1}^{2}}{2 σ^{2} + \frac{2}{3} M a ω_{n, 1}}) = O (\exp (- \frac{n^{1 / 2} {\overset{˘}{b}}^{d + 1 / 2} a}{| \log \overset{˘}{b} |})),

which is of the same order as the bound from Proposition 4 (up to constants). Summing the two probabilities and adjusting the constant

C_{f, d}

yields the desired result. □

Now, to prove equation (15.1), we first note that, under the MAR assumption,

\begin{matrix} {\hat{g}}_{n} (φ, x, Λ_{n, 1}) & = & \frac{1}{n} \sum_{i = 1}^{n} φ (Y_{i}) δ_{i} K_{(α, β)} (X_{i}) \\ = & \frac{1}{n} \sum_{i = 1}^{n} φ^{(T)} (Y_{i}) δ_{i} K_{(α, β)} (X_{i}) + \frac{1}{n} \sum_{i = 1}^{n} φ^{(R)} (Y_{i}) δ_{i} K_{(α, β)} (X_{i}) \\ = & {\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1}) + {\hat{g}}_{n} (φ^{(R)}, x, Λ_{n, 1}) . \end{matrix}

To prove equation (15.1), we need to show that the remainder term is asymptotically negligible, specifically:

sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |{\hat{g}}_{n} (φ^{(R)}, x, Λ_{n, 1}) - E [{\hat{g}}_{n} (φ^{(R)}, x, Λ_{n, 1})]| = o (1), a.s.

This follows directly from the proof of the remainder term for the U-statistics developed subsequently (see the analysis of

u_{n, 3}^{(R)}

), using condition (C.3) and the Borel–Cantelli lemma. The presence of the missingness indicator

δ_{i}

does not affect the argument since

0 \leq δ_{i} \leq 1

. Additionally, we need to prove:

sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |{\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1}) - E [{\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1})]| = O (\frac{| \log \overset{˘}{b} | {(\log n)}^{3 / 2}}{{\overset{˘}{b}}^{d + 1 / 2} \sqrt{n}}), a.s.

(15.18)

This equation is obtained by a union bound over the suprema on hypercubes of width

2 \overset{˘}{b}

centered at each

x \in 2 \overset{˘}{b} Z^{d} \cap S_{d, 1} (\overset{˘}{b} (d + 1))

, using the large deviation estimates in Corollary 8, and choosing

a = 100 d^{2} \frac{{(\log n)}^{3 / 2}}{\sqrt{n}} \cdot \frac{{∥ f ∥}_{\infty} | \log \overset{˘}{b} |}{{\overset{˘}{b}}^{d + 1 / 2}} .

(15.19)

The upper bound condition on a is satisfied as long as

100 d^{2} {(\log n)}^{3 / 2} / (\sqrt{n}) \leq e^{- 1}

, which is valid if

n \geq 100^{6} d^{6}

. For the unique

δ \in (0, e^{- 1}]

that satisfies

δ | \log δ | = \frac{{\overset{˘}{b}}^{d + 1 / 2} a}{{∥ f ∥}_{\infty} | \log \overset{˘}{b} |} \overset{(15.19)}{=} 100 d^{2} \frac{{(\log n)}^{3 / 2}}{\sqrt{n}},

(15.20)

we obtain:

\begin{matrix} P (sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |{\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1}) - E [{\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1})]| \geq 3 a ω_{n, 1}) \\ \leq \sum_{x \in 2 \overset{˘}{b} Z^{d} \cap S_{d, 1} (\overset{˘}{b} (d + 1))} P (sup_{x^{'} \in x + {[- \overset{˘}{b}, \overset{˘}{b}]}^{d}} |\frac{1}{n} \sum_{i = 1}^{n} Z_{i, b}^{(T)} (x^{'})| \geq 3 a ω_{n, 1}) \\ \leq {\overset{˘}{b}}^{- d} \cdot C_{f, d} \exp (- \frac{1}{100^{2} d^{4} {∥ f ∥}_{\infty}^{2}} {(\frac{n^{1 / 2} {\overset{˘}{b}}^{d + 1 / 2} a}{| \log δ | | \log \overset{˘}{b} |})}^{2}) \\ \leq {\overset{˘}{b}}^{- d} \cdot C_{f, d} \exp (- \frac{{(\log n)}^{3}}{{| \log δ |}^{2}}) . \end{matrix}

The condition on

δ

in (15.20) implies:

n^{- 1 / 2} \leq δ \leq e^{- 1}, (thus | \log δ | \leq \frac{1}{2} \log n),

(15.21)

since the function

x \mapsto x | \log x |

is increasing on

(0, e^{- 1}]

and

δ | \log δ | = 100 d^{2} {(\log n)}^{3 / 2} / \sqrt{n}

is of order

{(\log n)}^{3 / 2} / \sqrt{n}

. Inserting (15.21) into the last line of the preceding display, we get:

P (sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |{\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1}) - E [{\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1})]| \geq 3 a ω_{n, 1}) \leq C_{f, d} \exp (d | \log \overset{˘}{b} | - 4 \log n) .

Since we assumed that

\overset{˘}{b} \geq n^{- 1 / d}

, we have

| \log \overset{˘}{b} | \leq \frac{1}{d} \log n

, so the above is

\leq C_{f, d} \exp (\log n - 4 \log n) = C_{f, d} n^{- 3}

, which is summable. By our choice of a in (15.19) and the Borel–Cantelli lemma, we obtain

sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |{\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1}) - E [{\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1})]| = O (\frac{| \log \overset{˘}{b} | {(\log n)}^{3 / 2}}{{\overset{˘}{b}}^{d + 1 / 2} \sqrt{n}}), a.s.
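The choice of δ in (15.20) and its consequence (15.21) can be checked numerically. The Python sketch below is illustrative only: the function name `solve_delta` and the sample values n = 10^12, d = 1 are assumptions, chosen so that n ≥ 100^6 d^6. It solves δ |log δ| = 100 d^2 (log n)^{3/2} / √n by bisection on (0, e^{-1}], where x ↦ x |log x| is increasing, and then compares δ with n^{-1/2}.

```python
import math


def solve_delta(c: float, tol: float = 1e-15) -> float:
    """Solve delta * |log(delta)| = c for delta in (0, 1/e] by bisection.

    The map x -> x * |log x| increases from 0 to 1/e on (0, 1/e],
    so the root is unique whenever 0 < c <= 1/e.
    """
    if not 0.0 < c <= 1.0 / math.e:
        raise ValueError("c must lie in (0, 1/e]")
    lo, hi = 0.0, 1.0 / math.e
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * abs(math.log(mid)) < c:
            lo = mid
        else:
            hi = mid
    return hi


n, d = 10 ** 12, 1  # sample values (assumed for illustration)
c = 100 * d ** 2 * math.log(n) ** 1.5 / math.sqrt(n)
delta = solve_delta(c)

# Compare |log(delta)| with (1/2) log n, as in (15.21).
print(c, delta, abs(math.log(delta)), 0.5 * math.log(n))
```

For these sample values the computed δ lies well above n^{-1/2}, so the bound |log δ| ≤ (1/2) log n holds with room to spare.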

Bias term analysis under MAR.

Now, we only need to study the bias term. Under the MAR assumption, we have

|E [{\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1})] - R (φ, x) p (x)| = O ({\overset{˘}{b}}^{1 / 2}) .

(15.22)

Using the same reasoning as [165], but incorporating the MAR assumption, we have

\begin{matrix} E [{\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1})] & = & \int_{S_{d, 1}} r^{(1)} (φ, u) p (u) f (u) K_{α, β} (u) du \\ = & E [R (φ, ζ_{x}) p (ζ_{x})], \end{matrix}

where

ζ_{x} = (ζ_{x_{1}}, \dots, ζ_{x_{d}}) \sim Dirichlet (α, β)

. The factor

p (ζ_{x})

appears because

E [δ_{i} ∣ X_{i} = u] = p (u)

. By a second-order Taylor expansion of

R (φ, u) p (u)

around

u = x

, we have

\begin{matrix} E [R (φ, ζ_{x}) p (ζ_{x})] & = & R (φ, x) p (x) + \sum_{j = 1}^{d} \frac{\partial}{\partial x_{j}} (R (φ, x) p (x)) E (ζ_{x_{j}} - x_{j}) \\ + \frac{1}{2} \sum_{j = 1}^{d} \frac{\partial^{2}}{\partial x_{j}^{2}} (R (φ, \bar{x}) p (\bar{x})) E {(ζ_{x_{j}} - x_{j})}^{2} \\ + \sum_{j = 1}^{d} \sum_{k = 1, k \neq j}^{d} \frac{\partial^{2}}{\partial x_{j} \partial x_{k}} (R (φ, \bar{x}) p (\bar{x})) E \{(ζ_{x_{j}} - x_{j}) (ζ_{x_{k}} - x_{k})\}, \end{matrix}

for some

\bar{x}

joining

ζ_{x}

and

x

.

For all

j, k \in \{1, \dots, d\}

, straightforward calculations (see [164]) yield:

\begin{matrix} E [ζ_{j}] & = & \frac{\frac{x_{j}}{\overset{˘}{b}} + 1}{\frac{1}{\overset{˘}{b}} + d + 1} = \frac{x_{j} + \overset{˘}{b}}{1 + \overset{˘}{b} (d + 1)} \\ = & x_{j} + \overset{˘}{b} (1 - (d + 1) x_{j}) + O ({\overset{˘}{b}}^{2}), \\ Cov (ζ_{j}, ζ_{k}) & = & \frac{(\frac{x_{j}}{\overset{˘}{b}} + 1) ((\frac{1}{\overset{˘}{b}} + d + 1) 1_{{j = k}} - (\frac{x_{k}}{\overset{˘}{b}} + 1))}{{(\frac{1}{\overset{˘}{b}} + d + 1)}^{2} (\frac{1}{\overset{˘}{b}} + d + 2)} \\ = & \frac{\overset{˘}{b} (x_{j} + \overset{˘}{b}) (1_{{j = k}} - x_{k} + \overset{˘}{b} (d + 1) 1_{{j = k}} - \overset{˘}{b})}{{(1 + \overset{˘}{b} (d + 1))}^{2} (1 + \overset{˘}{b} (d + 2))} \end{matrix}

= \overset{˘}{b} x_{j} (1_{{j = k}} - x_{k}) + O ({\overset{˘}{b}}^{2}),

(15.23)

E [(ζ_{j} - x_{j}) (ζ_{k} - x_{k})] = Cov (ζ_{j}, ζ_{k}) + (E [ζ_{j}] - x_{j}) (E [ζ_{k}] - x_{k})

\begin{matrix} = & \overset{˘}{b} x_{j} (1_{{j = k}} - x_{k}) + O ({\overset{˘}{b}}^{2}) . \end{matrix}

(15.24)
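The expansions in (15.23) and (15.24) can be verified numerically from the closed-form Dirichlet moments. The following Python sketch is an illustration under stated assumptions: the function names and the test point x = (0.2, 0.3) are hypothetical, and the bandwidth is written `b`. It compares the exact mean and covariance of a Dirichlet vector with parameters α_j = x_j/b + 1, β = (1 − ‖x‖_1)/b + 1 against the leading terms x_j + b(1 − (d + 1)x_j) and b x_j(1_{j=k} − x_k), and prints the errors divided by b^2.

```python
import numpy as np


def dirichlet_moments(x, b):
    """Exact mean and covariance of the first d coordinates of a
    Dirichlet(alpha_1, ..., alpha_d, beta) vector with
    alpha_j = x_j / b + 1 and beta = (1 - sum(x)) / b + 1."""
    x = np.asarray(x, dtype=float)
    alpha = x / b + 1.0
    beta = (1.0 - x.sum()) / b + 1.0
    total = alpha.sum() + beta  # equals 1 / b + d + 1
    mean = alpha / total
    cov = (total * np.diag(alpha) - np.outer(alpha, alpha)) / (
        total ** 2 * (total + 1.0)
    )
    return mean, cov


def leading_terms(x, b):
    """Leading-order terms of the mean and covariance expansions."""
    x = np.asarray(x, dtype=float)
    d = x.size
    mean = x + b * (1.0 - (d + 1) * x)
    cov = b * (np.diag(x) - np.outer(x, x))
    return mean, cov


def expansion_errors(x, b):
    """Largest absolute error of each expansion at the point x."""
    mean, cov = dirichlet_moments(x, b)
    mean0, cov0 = leading_terms(x, b)
    return np.max(np.abs(mean - mean0)), np.max(np.abs(cov - cov0))


x = np.array([0.2, 0.3])  # interior point of the simplex (assumed)
for b in (0.1, 0.05, 0.025):
    err_mean, err_cov = expansion_errors(x, b)
    # Ratios that stay bounded as b decreases indicate an O(b^2) remainder.
    print(b, err_mean / b ** 2, err_cov / b ** 2)
```

The printed ratios remaining bounded as b shrinks is consistent with the O(b^2) remainders stated in the expansions.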

Then, by the Cauchy–Schwarz inequality, together with (15.23) and (15.24), we obtain:

\begin{matrix} |E [R (φ, ζ_{x}) p (ζ_{x})] - R (φ, x) p (x)| \\ = \sum_{j = 1}^{d} O (E (ζ_{x_{j}} - x_{j})) + \frac{1}{2} \sum_{j = 1}^{d} O (E {(ζ_{x_{j}} - x_{j})}^{2}) \\ + \sum_{j = 1}^{d} \sum_{k = 1, k \neq j}^{d} O (E [(ζ_{x_{j}} - x_{j}) (ζ_{x_{k}} - x_{k})]) \\ \leq \sum_{j = 1}^{d} O (\sqrt{E ({|ζ_{x_{j}} - x_{j}|}^{2})}) + O (\overset{˘}{b}) + O ({\overset{˘}{b}}^{2}) \\ \leq O ({\overset{˘}{b}}^{1 / 2}) + O (\overset{˘}{b}) + O ({\overset{˘}{b}}^{2}) = O ({\overset{˘}{b}}^{1 / 2}) (1 + o (1)) . \end{matrix}

Thus, we have established

E [{\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1})] = R (φ, x) p (x) + O ({\overset{˘}{b}}^{1 / 2}) .

For the denominator, a similar calculation gives

E [{\hat{f}}_{n} (x, Λ_{n, 1})] = f (x) p (x) + O ({\overset{˘}{b}}^{1 / 2}) .

Final combination for the regression estimator.

Finally, we obtain

sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |\frac{{\hat{g}}_{n} (φ, x, Λ_{n, 1})}{f (x) p (x)} - r^{(1)} (φ, x)| \leq \frac{{sup}_{x \in S_{d, 1} (\overset{˘}{b} d)} |{\hat{g}}_{n} (φ, x, Λ_{n, 1}) - R (φ, x) p (x)|}{{inf}_{x \in S_{d, 1} (\overset{˘}{b} d)} f (x) p (x)},

(15.25)

and

sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |\frac{{\hat{f}}_{n} (x, Λ_{n, 1})}{f (x) p (x)} - 1| \leq \frac{{sup}_{x \in S_{d, 1} (\overset{˘}{b} d)} |{\hat{f}}_{n} (x, Λ_{n, 1}) - f (x) p (x)|}{{inf}_{x \in S_{d, 1} (\overset{˘}{b} d)} f (x) p (x)} .

(15.26)

Combining these bounds with the identity

{\hat{r}}_{n, 1}^{(1)} (φ, x) = \frac{{\hat{g}}_{n} (φ, x, Λ_{n, 1})}{f (x) p (x)} \cdot \frac{f (x) p (x)}{{\hat{f}}_{n} (x, Λ_{n, 1})},

and noting that the factor

p (x)

cancels exactly in the ratio, we obtain the desired result. This completes the proof.

Remark 32.

The adaptation to the missing data setting required careful incorporation of the propensity score

p (\cdot)

in the bias expansion. Crucially, the factor

p (x)

appears in both the numerator and denominator expectations, leading to exact cancellation when forming the ratio estimator

{\hat{r}}_{n, 1}^{(1)} (φ, x) = {\hat{g}}_{n} / {\hat{f}}_{n}

. This cancellation is a direct consequence of the MAR assumption and ensures that the bias of the regression estimator is of the same order

O ({\overset{˘}{b}}^{1 / 2})

as in the complete-data case. The variance, however, is inflated by a factor of

1 / p (x)

, which does not affect the convergence rate but appears in the asymptotic normality result.
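The cancellation described in Remark 32 can be illustrated on simulated data in the simplest case d = 1, where the Dirichlet kernel reduces to a Beta kernel. The Python sketch below is a toy example and not the paper's experiment: the regression function r(x) = x^2, the propensity p(x) = 0.4 + 0.5x, the noise level, the sample size, and all function names are assumptions made for illustration. It compares the complete-case ratio under MAR with the same ratio computed from fully observed data.

```python
import math
import numpy as np


def beta_kernel(u, x, b):
    """Beta(x/b + 1, (1 - x)/b + 1) density evaluated at u in (0, 1).

    This is the d = 1 case of the Dirichlet kernel K_{(alpha, beta)}.
    """
    alpha = x / b + 1.0
    beta = (1.0 - x) / b + 1.0
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return np.exp(log_norm + (alpha - 1.0) * np.log(u) + (beta - 1.0) * np.log1p(-u))


def complete_case_ratio(x, X, Y, delta, b):
    """Complete-case estimate: sum(delta * Y * K) / sum(delta * K)."""
    w = delta * beta_kernel(X, x, b)
    return np.sum(w * Y) / np.sum(w)


rng = np.random.default_rng(0)
n, b, x0 = 20_000, 0.02, 0.3                 # illustrative settings (assumed)
X = rng.uniform(0.0, 1.0, size=n)
Y = X ** 2 + 0.1 * rng.standard_normal(n)    # toy regression r(x) = x^2
p = 0.4 + 0.5 * X                            # toy propensity, bounded below by 0.4
delta = (rng.uniform(size=n) < p).astype(float)  # MAR: depends on X only

est_mar = complete_case_ratio(x0, X, Y, delta, b)
est_full = complete_case_ratio(x0, X, Y, np.ones(n), b)
print(est_mar, est_full, x0 ** 2)
```

Both estimates target the same value r(x0) even though roughly half of the responses are discarded in the complete-case version. Any remaining gap between them reflects smoothing bias and sampling noise.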

By applying a union bound, we find that the probability in equation (15.17) can be bounded as follows:

\begin{matrix} \leq & P (sup_{x^{'} \in x + {[- \overset{˘}{b}, \overset{˘}{b}]}^{d}} |\frac{1}{n} \sum_{i = 1}^{n} Z_{i, b}^{(T)} (x^{'})| \geq 3 a ω_{n, 1}, |\frac{1}{n} \sum_{i = 1}^{n} Z_{i, b}^{(T)} (x)| \leq a ω_{n, 1}) \\ + P (|\frac{1}{n} \sum_{i = 1}^{n} Z_{i, b}^{(T)} (x)| \geq a ω_{n, 1}) . \end{matrix}

The first probability can be bounded using Proposition 4, and the second probability can be similarly bounded by applying Azuma’s inequality and Lemma 4 from [164]. Now, to prove equation (15.1), we start by noting:

\begin{matrix} {\hat{g}}_{n} (φ, x, Λ_{n, 1}) & = & \frac{1}{n} \sum_{i = 1}^{n} φ (Y_{i}) K_{(α, β)} (X_{i}) \\ = & \frac{1}{n} \sum_{i = 1}^{n} φ^{(T)} (Y_{i}) K_{(α, β)} (X_{i}) + \frac{1}{n} \sum_{i = 1}^{n} φ^{(R)} (Y_{i}) K_{(α, β)} (X_{i}) \\ = & {\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1}) + {\hat{g}}_{n} (φ^{(R)}, x, Λ_{n, 1}) . \end{matrix}

To prove equation (15.1), we need to show that the remainder term is asymptotically negligible, specifically:

sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |{\hat{g}}_{n} (φ^{(R)}, x, Λ_{n, 1}) - E [{\hat{g}}_{n} (φ^{(R)}, x, Λ_{n, 1})]| = o (1), a.s.

This follows directly from the proof of the remainder term for the U-statistics developed subsequently. Additionally, we need to prove:

sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |{\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1}) - E [{\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1})]| = O (\frac{| \log \overset{˘}{b} | {(\log n)}^{3 / 2}}{{\overset{˘}{b}}^{d + 1 / 2} \sqrt{n}}), a.s.

(15.27)

This equation is obtained by a union bound over the suprema on hypercubes of width

2 \overset{˘}{b}

centered at each

x \in 2 \overset{˘}{b} Z^{d} \cap S_{d, 1} (\overset{˘}{b} (d + 1))

, using the large deviation estimates in Corollary 8, and choosing

a = 100 d^{2} \frac{{(\log n)}^{3 / 2}}{\sqrt{n}} \cdot \frac{{∥ f ∥}_{\infty} | \log \overset{˘}{b} |}{{\overset{˘}{b}}^{d + 1 / 2}} .

(15.28)

The upper bound condition on a is satisfied as long as

100 d^{2} {(\log n)}^{3 / 2} / (\sqrt{n}) \leq e^{- 1}

, which is valid if

n \geq 100^{6} d^{6}

. For the unique

δ \in (0, e^{- 1}]

that satisfies

δ | \log δ | = \frac{{\overset{˘}{b}}^{d + 1 / 2} a}{{∥ f ∥}_{\infty} | \log \overset{˘}{b} |} \overset{(15.28)}{=} 100 d^{2} \frac{{(\log n)}^{3 / 2}}{\sqrt{n}},

(15.29)

we obtain:

\begin{matrix} P (sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |{\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1}) - E [{\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1})]| \geq 3 a ω_{n, 1}) \\ \leq \sum_{x \in 2 \overset{˘}{b} Z^{d} \cap S_{d, 1} (\overset{˘}{b} (d + 1))} P (sup_{x^{'} \in x + {[- \overset{˘}{b}, \overset{˘}{b}]}^{d}} |\frac{1}{n} \sum_{i = 1}^{n} Z_{i, b}^{(T)} (x^{'})| \geq 3 a ω_{n, 1}) \\ \leq {\overset{˘}{b}}^{- d} \cdot C_{f, d} \exp (- \frac{1}{100^{2} d^{4} {∥ f ∥}_{\infty}^{2}} {(\frac{n^{1 / 2} {\overset{˘}{b}}^{d + 1 / 2} a}{| \log δ | | \log \overset{˘}{b} |})}^{2}) \\ \leq {\overset{˘}{b}}^{- d} \cdot C_{f, d} \exp (- \frac{{(\log n)}^{3}}{{| \log δ |}^{2}}) . \end{matrix}

The condition on

δ

in (15.29) implies:

n^{- 1 / 2} \leq δ \leq e^{- 1}, (thus | \log δ | \leq \frac{1}{2} \log n),

(15.30)

since the function

x \mapsto x | \log x |

is increasing on

(0, e^{- 1}]

. Inserting (15.30) into the last line of the preceding display, we get:

P (sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |{\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1}) - E [{\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1})]| \geq 3 a ω_{n, 1}) \leq C_{f, d} \exp (d | \log \overset{˘}{b} | - 4 \log n) .

Since we assumed that

\overset{˘}{b} \geq n^{- 1 / d}

, the above is

\leq C_{f, d} n^{- 3}

, which is summable. By our choice of a in (15.28) and the Borel–Cantelli lemma, we obtain

sup_{x \in S_{d, 1} (\overset{˘}{b} d)} |{\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1}) - E [{\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1})]| = O (\frac{| \log \overset{˘}{b} | {(\log n)}^{3 / 2}}{{\overset{˘}{b}}^{d + 1 / 2} \sqrt{n}}), a.s.

Now, we only need to study the bias term,

|E [{\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1})] - R (φ, x)| = O ({\overset{˘}{b}}^{1 / 2}) .

(15.31)

Using the same reasoning as [165], we have

\begin{matrix} E [{\hat{g}}_{n} (φ^{(T)}, x, Λ_{n, 1})] & = & \int_{S_{d, 1}} r^{(1)} (φ, u) f (u) K_{α, β} (u) du \\ = & E [R (φ, ζ_{x})], \end{matrix}

where

ζ_{x} = (ζ_{x_{1}}, \dots, ζ_{x_{d}}) \sim Dirichlet (α, β)

. By a second-order Taylor expansion around

ζ_{x} = x

, we have

\begin{matrix} E [R (φ, ζ_{x})] & = & R (φ, x) + \sum_{j = 1}^{d} \frac{\partial R (φ, x)}{\partial x_{j}} E (ζ_{x_{j}} - x_{j}) + \frac{1}{2} \sum_{j = 1}^{d} \frac{\partial^{2} R (φ, \bar{x})}{\partial x_{j}^{2}} E {(ζ_{x_{j}} - x_{j})}^{2} \\ + \sum_{j = 1}^{d} \sum_{k = 1, k \neq j}^{d} \frac{\partial^{2} R (φ, \bar{x})}{\partial x_{j} \partial x_{k}} E \{(ζ_{x_{j}} - x_{j}) (ζ_{x_{k}} - x_{k})\}, \end{matrix}

for some

\bar{x}

joining

ζ_{x}

and

x

. In addition, for all

j, k \in \{1, \dots, d\}

, straightforward calculations (see, for instance, [164]) yield

\begin{matrix} E [ζ_{j}] & = & \frac{\frac{x_{j}}{\overset{˘}{b}} + 1}{\frac{1}{\overset{˘}{b}} + d + 1} = \frac{x_{j} + \overset{˘}{b}}{1 + \overset{˘}{b} (d + 1)} = x_{j} + \overset{˘}{b} (1 - (d + 1) x_{j}) + O ({\overset{˘}{b}}^{2}), \\ Cov (ζ_{j}, ζ_{k}) & = & \frac{(\frac{x_{j}}{\overset{˘}{b}} + 1) ((\frac{1}{\overset{˘}{b}} + d + 1) 1_{{j = k}} - (\frac{x_{k}}{\overset{˘}{b}} + 1))}{{(\frac{1}{\overset{˘}{b}} + d + 1)}^{2} (\frac{1}{\overset{˘}{b}} + d + 2)} \\ = & \frac{\overset{˘}{b} (x_{j} + \overset{˘}{b}) (1_{{j = k}} - x_{k} + \overset{˘}{b} (d + 1) 1_{{j = k}} - \overset{˘}{b})}{{(1 + \overset{˘}{b} (d + 1))}^{2} (1 + \overset{˘}{b} (d + 2))} \end{matrix}

= \overset{˘}{b} x_{j} (1_{{j = k}} - x_{k}) + O ({\overset{˘}{b}}^{2}),

(15.32)

E [(ζ_{j} - x_{j}) (ζ_{k} - x_{k})] = Cov (ζ_{j}, ζ_{k}) + (E [ζ_{j}] - x_{j}) (E [ζ_{k}] - x_{k})

\begin{matrix} = & \overset{˘}{b} x_{j} (1_{{j = k}} - x_{k}) + O ({\overset{˘}{b}}^{2}) . \end{matrix}

(15.33)

Then, by the Cauchy–Schwarz inequality, together with (15.32) and (15.33), we obtain:

\begin{matrix} |E [R (φ, ζ_{x})] - R (φ, x)| \\ = \sum_{j = 1}^{d} O (E (ζ_{x_{j}} - x_{j})) + \frac{1}{2} \sum_{j = 1}^{d} O (E {(ζ_{x_{j}} - x_{j})}^{2}) + \sum_{j = 1}^{d} \sum_{k = 1, k \neq j}^{d} O (E [(ζ_{x_{j}} - x_{j}) (ζ_{x_{k}} - x_{k})]) \\ \leq \sum_{j = 1}^{d} O (\sqrt{E ({|ζ_{x_{j}} - x_{j}|}^{2})}) + O (\overset{˘}{b}) + O ({\overset{˘}{b}}^{2}) \\ \leq O ({\overset{˘}{b}}^{1 / 2}) + O (\overset{˘}{b}) + O ({\overset{˘}{b}}^{2}) \leq O ({\overset{˘}{b}}^{1 / 2}) (1 + o (1)) . \end{matrix}

Finally, we obtain

sup_{x \in S_{d, 1}} |\frac{{\hat{g}}_{n} (φ, x, Λ_{n, 1})}{f (x)} - r^{(1)} (φ, x)| \leq \frac{{sup}_{x \in S_{d, 1}} |{\hat{g}}_{n} (φ, x, Λ_{n, 1}) - R (φ, x)|}{{inf}_{x \in S_{d, 1}} f (x)},

(15.34)

and

sup_{x \in S_{d, 1}} |\frac{{\hat{f}}_{n} (x, Λ_{n, 1})}{f (x)} - 1| \leq \frac{{sup}_{x \in S_{d, 1}} |{\hat{f}}_{n} (x, Λ_{n, 1}) - f (x)|}{{inf}_{x \in S_{d, 1}} f (x)} .

(15.35)

Combining these bounds with the identity

{\hat{r}}_{n, 1}^{(1)} (φ, x) = \frac{{\hat{g}}_{n} (φ, x, Λ_{n, 1})}{f (x)} \cdot \frac{f (x)}{{\hat{f}}_{n} (x, Λ_{n, 1})},

we obtain the desired result. This completes the proof.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The author would like to thank the Editor-in-Chief, an Associate Editor, and three referees for their extremely helpful remarks, which resulted in a substantial improvement of the original form of the work and a more sharply focused presentation.

Conflicts of Interest

The author declares that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A

This appendix collects auxiliary results that are used in the proofs.

Lemma A1

(Lemma 2.2.9, [190]). Let

X_{1}, \dots, X_{n}

be independent random variables with bounded ranges

[- M, M]

and zero means. Then,

P (|\sum_{i = 1}^{n} X_{i}| > t) \leq 2 \exp \{- \frac{t^{2}}{2 (v + M t / 3)}\},

for all t and

v \geq Var (\sum_{i = 1}^{n} X_{i})

.
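To make the bound in Lemma A1 concrete, the following Python sketch evaluates its right-hand side and compares it with a simulated tail probability. This is an illustration only: the function name `bernstein_bound`, the use of Rademacher signs as the bounded centered variables, and the values of n, t, and the number of replications are assumptions chosen for the example.

```python
import math
import numpy as np


def bernstein_bound(t: float, v: float, M: float) -> float:
    """Right-hand side of Lemma A1: 2 * exp(-t^2 / (2 * (v + M * t / 3)))."""
    return 2.0 * math.exp(-t ** 2 / (2.0 * (v + M * t / 3.0)))


rng = np.random.default_rng(1)
n, reps, t = 100, 20_000, 30.0  # illustrative settings (assumed)

# Rademacher signs: independent, mean zero, range [-1, 1], variance 1,
# so Var(sum) = n and M = 1 in the notation of the lemma.
signs = rng.choice([-1.0, 1.0], size=(reps, n))
sums = signs.sum(axis=1)

empirical_tail = float(np.mean(np.abs(sums) > t))
bound = bernstein_bound(t, v=float(n), M=1.0)
print(empirical_tail, bound)
```

The simulated tail frequency sits below the bound, as the lemma guarantees for the true probability. The gap illustrates that the inequality is not tight for this particular distribution.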

Lemma A2

(Theorem A. page 201, [177]). Let f be a symmetric function taking its variables from

S_{d, 1}

satisfying

{∥f∥}_{\infty} \leq c

,

E f (X_{1}, \dots, X_{m}) = θ,

and

σ^{2} = Var (f (X_{1}, \dots, X_{m})),

then for

t > 0

and

n \geq m,

we have:

P \{| u_{n, ℓ}^{(m)} (f) - θ | \geq t\} \leq \exp \{- \frac{[n / m] t^{2}}{2 σ^{2} + \frac{2}{3} c t}\} .

Lemma A3

(Proposition 1, [187]). If

G : S^{m} \to R

is a measurable symmetric function with

{∥ G ∥}_{\infty} = b

then

P \{n^{1 / 2} |\sum_{j = 2}^{m} (\begin{matrix} m \\ j \end{matrix}) u_{n}^{(j)} (π_{j, m} G)| ⩾ t\} ⩽ 2 \exp (- \frac{t {(n - 1)}^{1 / 2}}{2^{m + 2} m^{m + 1} b}) .

Lemma A4

(Lemma 1, [164]). We have, as

b \to 0

and uniformly for

x \in S_{d, 1}

,

0 < A_{b} (x) \leq \frac{b^{(d + 1) / 2} {(1 / b + d)}^{d + 1 / 2}}{{(4 π)}^{d / 2} \sqrt{(1 - {∥ x ∥}_{1}) \prod_{i = 1}^{d} x_{i}}} (1 + O (b)) .

Furthermore, for any subset

\emptyset \neq J \subseteq [d]

, and any

κ \in {(0, \infty)}^{d}

,

A_{b} (x) = \{\begin{matrix} b^{- d / 2} ψ (x) (1 + O_{x} (b)), & if x_{i} / b \to \infty, \forall i \in [d] and (1 - {∥ x ∥}_{1}) / b \to \infty, \\ b^{- (d + | J |) / 2} ψ_{J} (x) \prod_{i \in J} \frac{Γ (2 κ_{i} + 1)}{2^{2 κ_{i} + 1} Γ^{2} (κ_{i} + 1)} \cdot (1 + O_{κ, x} (b)), & if x_{i} / b \to κ_{i}, \forall i \in J and x_{i} / b \to \infty, \forall i \in [d] ∖ J, and (1 - {∥ x ∥}_{1}) / b \to \infty, \end{matrix}

where

ψ (\cdot)

and

ψ_{J} (\cdot)

are defined for every subset of indices

J \subseteq [d]

, by

ψ (x) : = ψ_{\emptyset} (x) and ψ_{J} (x) : = {[{(4 π)}^{d - | J |} \cdot (1 - {∥ x ∥}_{1}) \prod_{i \in [d] ∖ J} x_{i}]}^{- 1 / 2} .

(A1)

Lemma A5

(Lemma 2, [164]). If

α_{1}, \dots, α_{d}, β \geq 2

, then

sup_{x \in S_{d, 1}} K_{α, β} (x) \leq \sqrt{\frac{{∥ α ∥}_{1} + β - 1}{(β - 1) \prod_{i \in [d]} (α_{i} - 1)}} {({∥ α ∥}_{1} + β - d - 1)}^{d} .
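In dimension d = 1 the Dirichlet kernel is a Beta density on [0, 1], and the bound of Lemma A5 can be checked directly. The Python sketch below is illustrative only; the function names and the parameter pairs are assumptions. It evaluates the Beta density at its mode, which is where the supremum is attained when both parameters exceed 1, and compares the result with the right-hand side of the lemma specialised to d = 1.

```python
import math


def beta_kernel_sup(alpha: float, beta: float) -> float:
    """Maximum of the Beta(alpha, beta) density, attained at its mode
    (alpha - 1) / (alpha + beta - 2) when alpha, beta > 1."""
    mode = (alpha - 1.0) / (alpha + beta - 2.0)
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(
        log_norm
        + (alpha - 1.0) * math.log(mode)
        + (beta - 1.0) * math.log(1.0 - mode)
    )


def lemma_a5_bound(alpha: float, beta: float) -> float:
    """Right-hand side of Lemma A5 specialised to d = 1."""
    d = 1
    ratio = (alpha + beta - 1.0) / ((beta - 1.0) * (alpha - 1.0))
    return math.sqrt(ratio) * (alpha + beta - d - 1.0) ** d


# Parameter pairs with alpha, beta >= 2 (chosen for illustration).
pairs = [(2.0, 2.0), (10.0, 5.0), (16.0, 36.0), (51.0, 51.0)]
for alpha, beta in pairs:
    print(alpha, beta, beta_kernel_sup(alpha, beta), lemma_a5_bound(alpha, beta))
```

For each pair the exact supremum is smaller than the bound, which is what the lemma asserts. The check covers only d = 1 and a handful of parameter values.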

Lemma A6

(Lemma 3, [164]). If

α_{1}, \dots, α_{d}, β \geq 2

, then for all

x \in Int (S_{d, 1})

,

\begin{matrix} |\frac{\partial}{\partial α_{j}} K_{α, β} (x)| & \leq & \{|\log ({∥ α ∥}_{1} + β)| + |\log (α_{j})| + |\log x_{j}|\} \cdot \sqrt{\frac{{∥ α ∥}_{1} + β - 1}{(β - 1) \prod_{i = 1}^{d} (α_{i} - 1)}} {({∥ α ∥}_{1} + β - d - 1)}^{d}, \\ |\frac{\partial}{\partial β} K_{α, β} (x)| & \leq & \{|\log ({∥ α ∥}_{1} + β)| + | \log (β) | + |\log (1 - {∥ x ∥}_{1})|\} \cdot \sqrt{\frac{{∥ α ∥}_{1} + β - 1}{(β - 1) \prod_{i = 1}^{d} (α_{i} - 1)}} {({∥ α ∥}_{1} + β - d - 1)}^{d} . \end{matrix}

Lemma A7

(Lemma 4, [164]). If

α_{1}, \dots, α_{d}, β, α_{1}^{'}, \dots, α_{d}^{'}, β^{'} \geq 2

, and

X

is F distributed with a bounded density f supported on

S_{d, 1}

, then

\begin{matrix} E [|K_{α^{'}, β^{'}} (X) - K_{α, β} (X)|] \\ \leq 3 (d + 1) {∥ f ∥}_{\infty} \sqrt{\frac{{∥α \lor α^{'}∥}_{1} + (β \lor β^{'}) - 1}{((β \land β^{'}) - 1) \prod_{i \in [d]} ((α_{i} \land α_{i}^{'}) - 1)}} \cdot {({∥α \lor α^{'}∥}_{1} + (β \lor β^{'}) - d - 1)}^{d} \\ \cdot \log ({∥α \lor α^{'}∥}_{1} + (β \lor β^{'})) \cdot {∥(α^{'}, β^{'}) - (α, β)∥}_{\infty}, \end{matrix}

where

α \lor α^{'} : = {(max \{α_{i}, α_{i}^{'}\})}_{i \in [d]}, β \lor β^{'} : = max \{β, β^{'}\}

, and

β \land β^{'} : = min \{β, β^{'}\}

. Furthermore, let

S_{d, 1} (δ) : = \{x \in S_{d, 1} : 1 - {∥ x ∥}_{1} \geq δ and x_{i} \geq δ \forall i \in [d]\}, δ > 0 .

Then, for

0 < δ \leq e^{- 1}

, we have

\begin{matrix} max_{x \in S_{d, 1} (δ)} |K_{α^{'}, β^{'}} (x) - K_{α, β} (x)| \\ \leq 3 (d + 1) {∥ f ∥}_{\infty} | \log δ | \cdot \sqrt{\frac{{∥α \lor α^{'}∥}_{1} + (β \lor β^{'}) - 1}{((β \land β^{'}) - 1) \prod_{i \in [d]} ((α_{i} \land α_{i}^{'}) - 1)}} \cdot {({∥α \lor α^{'}∥}_{1} + (β \lor β^{'}) - d - 1)}^{d} \\ \cdot \log ({∥α \lor α^{'}∥}_{1} + (β \lor β^{'})) \cdot {∥(α^{'}, β^{'}) - (α, β)∥}_{\infty} . \end{matrix}

Proposition A1.

Let

s = d m

, write

\tilde{x} = (x_{1}, \dots, x_{m}) \in X^{m} \subset R^{s},

and set

θ (\tilde{x}) : = r^{(m)} (φ, \tilde{x}) .

For a positive function

a : X \to (0, \infty)

, define

g_{a} (x) : = a (x) f (x), G_{a} (\tilde{x}) : = \prod_{j = 1}^{m} g_{a} (x_{j}), Q_{a} (\tilde{x}) : = \log G_{a} (\tilde{x}) .

For the smoothing scheme indexed by ℓ, let

Q_{n, ℓ, \tilde{x}}

denote the probability measure on

X^{m}

with product density

{\tilde{K}}_{{\bar{Λ}}_{n, ℓ} (\tilde{x})} (\tilde{t}) = \prod_{j = 1}^{m} K_{Λ_{n, ℓ} (x_{j})} (t_{j}) .

Equivalently, let

{\tilde{T}}_{n, ℓ, \tilde{x}} \sim Q_{n, ℓ, \tilde{x}}, Δ_{n, ℓ} (\tilde{x}) : = {\tilde{T}}_{n, ℓ, \tilde{x}} - \tilde{x} .

Put

μ_{n, ℓ} (\tilde{x}) : = E_{Q_{n, ℓ, \tilde{x}}} \{Δ_{n, ℓ} (\tilde{x})\},

and

M_{n, ℓ} (\tilde{x}) : = E_{Q_{n, ℓ, \tilde{x}}} {Δ_{n, ℓ} (\tilde{x}) Δ_{n, ℓ} {(\tilde{x})}^{⊤}} .

Assume that, uniformly on the compact set

C \subset Int (X^{m})

,

∥ μ_{n, ℓ} (\tilde{x}) ∥ + ∥ M_{n, ℓ} (\tilde{x}) ∥ = O (ρ_{n, ℓ}), ρ_{n, ℓ} ↓ 0,

and

E_{Q_{n, ℓ, \tilde{x}}} {∥ Δ_{n, ℓ} (\tilde{x}) ∥}^{3} = o (ρ_{n, ℓ}) .

Assume also that

θ \in C^{2} (U), G_{a} \in C^{2} (U),

on an open neighbourhood

U

of

C

, and that

0 < inf_{\tilde{x} \in U} G_{a} (\tilde{x}) \leq sup_{\tilde{x} \in U} G_{a} (\tilde{x}) < \infty .

Define the deterministic ratio centering

Θ_{a, n, ℓ} (\tilde{x}) : = \frac{\int_{X^{m}} θ (\tilde{t}) G_{a} (\tilde{t}) d Q_{n, ℓ, \tilde{x}} (\tilde{t})}{\int_{X^{m}} G_{a} (\tilde{t}) d Q_{n, ℓ, \tilde{x}} (\tilde{t})} .

Then, uniformly for

\tilde{x} \in C

,

\begin{matrix} Θ_{a, n, ℓ} (\tilde{x}) - θ (\tilde{x}) & = \nabla θ {(\tilde{x})}^{⊤} μ_{n, ℓ} (\tilde{x}) \\ + \frac{1}{2} tr [\nabla^{2} θ (\tilde{x}) M_{n, ℓ} (\tilde{x})] \\ + \nabla θ {(\tilde{x})}^{⊤} M_{n, ℓ} (\tilde{x}) \nabla Q_{a} (\tilde{x}) + o (ρ_{n, ℓ}) . \end{matrix}

(A2)

In particular, for the complete-case estimator under MAR,

a = p, G_{p} (\tilde{x}) = \prod_{j = 1}^{m} p (x_{j}) f (x_{j}),

whereas for the fully observed or IPW centering,

a \equiv 1, G_{1} (\tilde{x}) = \prod_{j = 1}^{m} f (x_{j}) .

Consequently,

\begin{matrix} Θ_{p, n, ℓ} (\tilde{x}) - Θ_{1, n, ℓ} (\tilde{x}) & = \nabla θ {(\tilde{x})}^{⊤} M_{n, ℓ} (\tilde{x}) \nabla [\sum_{j = 1}^{m} \log p (x_{j})] + o (ρ_{n, ℓ}), \end{matrix}

(A3)

uniformly on

C

. Hence the MAR propensity score cancels from the zeroth-order conditional target, but, in general, it does not cancel from the first non-vanishing deterministic smoothing-bias constant of the complete-case ratio. It disappears from that constant only under the additional condition

\nabla θ {(\tilde{x})}^{⊤} M_{n, ℓ} (\tilde{x}) \nabla [\sum_{j = 1}^{m} \log p (x_{j})] = o (ρ_{n, ℓ}),

(A4)

for example if p is locally constant at the considered point, if

\nabla θ (\tilde{x}) = 0

, or if the kernel scheme makes the above contraction of smaller order than the displayed bias scale.

Proof.

We write, for brevity,

θ = θ (\tilde{x}), G = G_{a} (\tilde{x}), Q = Q_{a} (\tilde{x}), Δ = Δ_{n, ℓ} (\tilde{x}),

and all derivatives are evaluated at

\tilde{x}

. By the uniform Taylor expansion and the moment assumptions,

θ (\tilde{x} + Δ) = θ + \nabla θ^{⊤} Δ + \frac{1}{2} Δ^{⊤} \nabla^{2} θ Δ + o_{Q} (ρ_{n, ℓ}),

and

G (\tilde{x} + Δ) = G + \nabla G^{⊤} Δ + \frac{1}{2} Δ^{⊤} \nabla^{2} G Δ + o_{Q} (ρ_{n, ℓ}),

uniformly on

C

. Taking

Q_{n, ℓ, \tilde{x}}

expectations gives

E_{Q} G (\tilde{x} + Δ) = G + \nabla G^{⊤} μ_{n, ℓ} + \frac{1}{2} tr (\nabla^{2} G M_{n, ℓ}) + o (ρ_{n, ℓ}) .

(A5)

Similarly,

\begin{matrix} E_{Q} {θ (\tilde{x} + Δ) G (\tilde{x} + Δ)} & = θ G + {(G \nabla θ + θ \nabla G)}^{⊤} μ_{n, ℓ} \\ + \frac{1}{2} G tr (\nabla^{2} θ M_{n, ℓ}) \\ + \frac{1}{2} θ tr (\nabla^{2} G M_{n, ℓ}) \\ + \nabla θ^{⊤} M_{n, ℓ} \nabla G + o (ρ_{n, ℓ}) . \end{matrix}

(A6)

Dividing (A6) by (A5), using G bounded away from zero, and expanding the inverse denominator yields

\begin{matrix} Θ_{a, n, ℓ} (\tilde{x}) & = θ + \nabla θ^{⊤} μ_{n, ℓ} + \frac{1}{2} tr (\nabla^{2} θ M_{n, ℓ}) \\ + \nabla θ^{⊤} M_{n, ℓ} \frac{\nabla G}{G} + o (ρ_{n, ℓ}) . \end{matrix}

Since

\frac{\nabla G}{G} = \nabla \log G = \nabla Q,

we obtain (A2). The terms

θ \nabla G^{⊤} μ_{n, ℓ} and \frac{1}{2} θ tr (\nabla^{2} G M_{n, ℓ})

cancel exactly with the corresponding denominator terms. This is the precise ratio cancellation.
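The division step can be written out explicitly. The display below is a sketch in which the shorthand N, D, A, B is introduced here for illustration only (it does not appear in the original): N and D denote the left-hand sides of (A6) and (A5), and A, B collect their respective terms of order ρ_{n, ℓ}.

```latex
% Shorthand (assumed for this sketch): N = θ G + A, D = G + B, with A, B = O(ρ_{n, ℓ}).
\frac{N}{D} = \frac{θ G + A}{G (1 + B / G)}
            = (θ + \frac{A}{G}) (1 - \frac{B}{G} + O (ρ_{n, ℓ}^{2}))
            = θ + \frac{A - θ B}{G} + O (ρ_{n, ℓ}^{2}),
% The terms θ \nabla G^{⊤} μ_{n, ℓ} and \frac{1}{2} θ tr (\nabla^{2} G M_{n, ℓ}) appear in both A and θ B, so they drop out:
A - θ B = G \nabla θ^{⊤} μ_{n, ℓ} + \frac{1}{2} G tr (\nabla^{2} θ M_{n, ℓ}) + \nabla θ^{⊤} M_{n, ℓ} \nabla G + o (ρ_{n, ℓ}) .
```

Since G is bounded away from zero and ρ_{n, ℓ} ↓ 0, the O (ρ_{n, ℓ}^{2}) remainder is o (ρ_{n, ℓ}), and dividing A - θ B by G reproduces the expansion displayed above.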

Under MAR,

E (δ ∣ X, Y) = E (δ ∣ X) = p (X) .

Therefore the deterministic centering of the complete-case estimator is obtained with

G_{p} (\tilde{x}) = \prod_{j = 1}^{m} p (x_{j}) f (x_{j}) .

For the IPW ratio,

E (\frac{δ}{p (X)} | X, Y) = 1,

so its deterministic centering is obtained with

G_{1}

. Subtracting expansion (A2) with

a = p

and with

a \equiv 1

gives (A3), because

Q_{p} (\tilde{x}) - Q_{1} (\tilde{x}) = \sum_{j = 1}^{m} \log p (x_{j}) .

The final assertion follows immediately from (A3). □
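Expansion (A3) can be checked in closed form in a simple special case. The sketch below assumes m = 1, d = 1, θ(t) = t, a uniform design density f ≡ 1 on [0, 1], a linear propensity p(t) = c0 + c1 t, and a beta kernel with shape parameters x / b + 1 and (1 - x) / b + 1; these choices and all function names are illustrative assumptions, not taken from the paper. Because θ and p are polynomials, both ratio centerings reduce to exact beta moments and no numerical integration is needed.

```python
def beta_kernel_moments(x, b):
    """First two raw moments of T ~ Beta(x / b + 1, (1 - x) / b + 1)."""
    a, c = x / b + 1.0, (1.0 - x) / b + 1.0
    s = a + c
    m1 = a / s
    m2 = a * (a + 1.0) / (s * (s + 1.0))
    return m1, m2


def centering_gap(x, b, c0, c1):
    """Exact and predicted values of Theta_p - Theta_1 for theta(t) = t, f = 1, p(t) = c0 + c1 * t."""
    m1, m2 = beta_kernel_moments(x, b)
    theta_p = (c0 * m1 + c1 * m2) / (c0 + c1 * m1)  # ratio centering with G_p = p f
    theta_1 = m1                                    # ratio centering with G_1 = f = 1
    exact = theta_p - theta_1
    second_moment = m2 - 2.0 * x * m1 + x * x       # M = E[(T - x)^2]
    predicted = second_moment * c1 / (c0 + c1 * x)  # theta'(x) * M * (log p)'(x), as in (A3)
    return exact, predicted


if __name__ == "__main__":
    for b in (0.04, 0.02, 0.01):
        exact, predicted = centering_gap(0.3, b, 0.3, 0.5)
        print(b, exact, predicted, abs(exact - predicted) / b)
```

In this example the gap between the complete-case and IPW centerings is itself of order b, while the discrepancy between the exact gap and the prediction of (A3), divided by b, shrinks as b decreases. This is the sense in which the propensity score enters the leading bias constant but not the rate.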

Remark A1.

Proposition A1 is the deterministic device used to audit all bias, MSE, and bandwidth statements for the estimator actually defined in (2.5). The relevant point is that the complete-case estimator is centered with respect to the tilted design density

p f

, while the IPW and fully observed estimators are centered with respect to f. Therefore, in every deterministic bias calculation, the replacement

f ⟶ p f

is mandatory for the complete-case estimator. More precisely, if

Θ_{p, n, ℓ} (\tilde{x}) - θ (\tilde{x}) = B_{p, n, ℓ} (\tilde{x}) + o (ρ_{n, ℓ}),

then the bias constant

B_{p, n, ℓ}

is obtained from

\begin{matrix} B_{p, n, ℓ} (\tilde{x}) & = \nabla θ {(\tilde{x})}^{⊤} μ_{n, ℓ} (\tilde{x}) \\ + \frac{1}{2} tr [\nabla^{2} θ (\tilde{x}) M_{n, ℓ} (\tilde{x})] \\ + \nabla θ {(\tilde{x})}^{⊤} M_{n, ℓ} (\tilde{x}) \nabla [\sum_{j = 1}^{m} \log \{p (x_{j}) f (x_{j})\}] . \end{matrix}

(A7)

Thus, at the level of rates, the MAR mechanism does not alter the deterministic bias order, provided p is bounded away from zero and has the same smoothness order as f. However, at the level of constants, the propensity score enters the complete-case bias through the design-gradient term in (A7). Consequently, any theorem displaying only rates may retain the complete-data order, whereas any theorem displaying exact bias constants must use

p f

, not f. For the four smoothing regimes considered in the paper, the kernel-moment conditions of Proposition A1 are verified as follows.

\begin{matrix} Smoother & μ_{n, ℓ} (\tilde{x}) & M_{n, ℓ} (\tilde{x}) & ρ_{n, ℓ} \\ Dirichlet & O (\overset{˘}{b}) & O (\overset{˘}{b}) & \overset{˘}{b} \\ Bernstein & O (k_{n}^{- 1}) & O (k_{n}^{- 1}) & k_{n}^{- 1} \\ Product beta & O (b_{n}) & O (b_{n}) & b_{n} \\ Mixed continuous–categorical & O (b_{n} + λ_{n}) & O (b_{n} + λ_{n}) & b_{n} + λ_{n} \end{matrix}

Here, the

O (\cdot)

bounds are uniform on the compact interior sets on which the corresponding theorem is stated. More explicitly, for the product beta kernel on

{[0, 1]}^{d}

,

μ_{n, 3} (x) = b_{n} \{1 - 2 x\} + O (b_{n}^{2}),

and

M_{n, 3} (x) = b_{n} diag \{x_{1} (1 - x_{1}), \dots, x_{d} (1 - x_{d})\} + O (b_{n}^{2}) .

For the Dirichlet kernel on the simplex,

μ_{n, 1} (x) = \overset{˘}{b} \{1 - (d + 1) x\} + O ({\overset{˘}{b}}^{2}),

and

M_{n, 1} (x) = \overset{˘}{b} \{diag (x) - x x^{⊤}\} + O ({\overset{˘}{b}}^{2}) .
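The Dirichlet expansions above can be compared with exact moments. The sketch below assumes the kernel at x is the Dirichlet law with shape parameters x_i / b + 1 for each coordinate and (1 - ∥x∥_1) / b + 1 for the slack coordinate; this parameterization is an assumption consistent with the displayed leading terms, and the function names are hypothetical. For d = 1 it reduces to a beta kernel with shapes x / b + 1 and (1 - x) / b + 1.

```python
def dirichlet_kernel_moments(x, b):
    """Exact mu = E[T - x] and M = E[(T - x)(T - x)^T] for T drawn from the kernel at x."""
    d = len(x)
    # Assumed shape parameters: x_i / b + 1, plus (1 - ||x||_1) / b + 1 for the slack.
    shapes = [xi / b + 1.0 for xi in x] + [(1.0 - sum(x)) / b + 1.0]
    total = sum(shapes)
    mu = [shapes[i] / total - x[i] for i in range(d)]
    denom = total * total * (total + 1.0)
    M = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            cov = ((shapes[i] * total if i == j else 0.0) - shapes[i] * shapes[j]) / denom
            M[i][j] = cov + mu[i] * mu[j]  # second moment about x, not about the mean
    return mu, M


def leading_terms(x, b):
    """Leading terms b * (1 - (d + 1) x) and b * (diag(x) - x x^T) from the displays."""
    d = len(x)
    mu0 = [b * (1.0 - (d + 1) * xi) for xi in x]
    M0 = [[b * ((x[i] if i == j else 0.0) - x[i] * x[j]) for j in range(d)] for i in range(d)]
    return mu0, M0


def max_gap(x, b):
    """Largest absolute entrywise difference between exact moments and leading terms."""
    mu, M = dirichlet_kernel_moments(x, b)
    mu0, M0 = leading_terms(x, b)
    d = len(x)
    gaps = [abs(mu[i] - mu0[i]) for i in range(d)]
    gaps += [abs(M[i][j] - M0[i][j]) for i in range(d) for j in range(d)]
    return max(gaps)


if __name__ == "__main__":
    for b in (0.02, 0.01, 0.005):
        print(b, max_gap([0.2, 0.3], b), max_gap([0.2, 0.3], b) / b**2)
```

Under this parameterization the printed ratio of the gap to b squared stays bounded as b decreases, which is what the O remainders in the displays assert.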

For Bernstein-type smoothing, the corresponding moment matrix is the usual multinomial or product-binomial covariance matrix divided by

k_{n}

, depending on the support geometry. The mixed case combines the continuous beta moment terms with the categorical smoothing bias of order

λ_{n}

. Consequently, all stated rates remain valid for the complete-case estimator provided the assumptions are understood with the effective density

p f

. The stochastic rates are likewise unchanged at the level of order because

δ_{i} \leq 1

and p is bounded away from zero. The limiting variance constants, however, must include the inverse-propensity loss of information. In first-order schematic form, the leading variance contribution contains local factors of the type

\frac{1}{p (x_{j}) f (x_{j})}

rather than

1 / f (x_{j})

. For higher-order conditional U-statistics, the same replacement occurs in the variance of the first Hoeffding projection. Thus the audited interpretation of the results is the following.

\begin{matrix} Complete-case estimator: deterministic bias uses p f, \\ IPW or complete-data estimator: deterministic bias uses f, \\ MAR does not change the bias rate under smooth positive p, \\ but MAR may change the bias constant and does change variance constants. \end{matrix}

Therefore, any statement claiming that p cancels from the bias should be read only at the zeroth-order target level, or under the additional condition

\nabla θ {(\tilde{x})}^{⊤} M_{n, ℓ} (\tilde{x}) \nabla [\sum_{j = 1}^{m} \log p (x_{j})] = o (ρ_{n, ℓ}) .

Without this additional condition, the mathematically correct statement is that p does not alter the deterministic bias rate, but it generally contributes to the first non-vanishing complete-case bias constant.
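For the Bernstein row of the moment table in Remark A1, the statement that the moment matrix is a binomial covariance divided by k_n can be illustrated in one dimension. The sketch below assumes the classical Bernstein weights, that is, binomial probabilities with k trials and success probability x placed on the grid points j / k; the exact form of the smoother used in the paper is defined outside this section, so this is an illustrative assumption, and the function name is hypothetical.

```python
import math


def bernstein_moments(x, k):
    """Exact mu = E[T - x] and M = E[(T - x)^2] when T = J / k and J ~ Binomial(k, x)."""
    mu = 0.0
    second = 0.0
    for j in range(k + 1):
        weight = math.comb(k, j) * x**j * (1.0 - x) ** (k - j)  # Bernstein basis value at x
        mu += weight * (j / k - x)
        second += weight * (j / k - x) ** 2
    return mu, second


if __name__ == "__main__":
    for k in (10, 40, 160):
        mu, second = bernstein_moments(0.3, k)
        print(k, mu, second, 0.3 * 0.7 / k)
```

With these weights the first moment about x vanishes and the second moment equals x (1 - x) / k, so both are O (k^{-1}), in line with the rate ρ_{n, ℓ} = k_{n}^{- 1} listed in the table.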

Table A1. Top configurations by IMSE.

Scenario	Missing_Type	Correction	Kernel	Estimator	n	Miss_Rate	IBias	ISd	IMSE	IAE
Scenario1_linear	MAR	complete_case	bernstein	tau2	500	0	0.0176	0.0499	0.0032	0.0440
Scenario1_linear	MAR	complete_case	tricube	tau2	500	0	0.0291	0.0378	0.0036	0.0468
Scenario1_linear	MAR	complete_case	bernstein	tau1	500	0	0.0202	0.0518	0.0037	0.0471
Scenario1_linear	MAR	complete_case	tricube	tau3	500	0	0.0306	0.0387	0.0039	0.0481
Scenario1_linear	MAR	complete_case	tricube	tau1	500	0	0.0302	0.0397	0.0039	0.0480
Scenario1_linear	MAR	complete_case	gaussian	tau1	500	0	0.0332	0.0368	0.0040	0.0499
Scenario1_linear	MAR	complete_case	gaussian	tau2	500	0	0.0325	0.0369	0.0040	0.0498
Scenario1_linear	MAR	complete_case	bernstein	tau3	500	0	0.0190	0.0542	0.0041	0.0493
Scenario1_linear	MAR	complete_case	epanechnikov	tau2	500	0	0.0350	0.0369	0.0041	0.0502
Scenario1_linear	MAR	complete_case	gaussian	tau3	500	0	0.0353	0.0396	0.0045	0.0522
Scenario1_linear	MAR	complete_case	bernstein	tau1	500	0.1	0.0191	0.0490	0.0033	0.0451
Scenario1_linear	MAR	complete_case	bernstein	tau2	500	0.1	0.0239	0.0500	0.0035	0.0464
Scenario1_linear	MAR	complete_case	bernstein	tau3	500	0.1	0.0245	0.0509	0.0040	0.0484
Scenario1_linear	MAR	complete_case	tricube	tau1	500	0.1	0.0321	0.0419	0.0042	0.0516
Scenario1_linear	MAR	complete_case	tricube	tau3	500	0.1	0.0357	0.0415	0.0044	0.0518
Scenario1_linear	MAR	complete_case	tricube	tau2	500	0.1	0.0313	0.0433	0.0044	0.0517
Scenario1_linear	MAR	complete_case	gaussian	tau1	500	0.1	0.0386	0.0402	0.0047	0.0539
Scenario1_linear	MAR	complete_case	epanechnikov	tau1	500	0.1	0.0363	0.0420	0.0047	0.0541
Scenario1_linear	MAR	complete_case	epanechnikov	tau2	500	0.1	0.0377	0.0403	0.0050	0.0547
Scenario1_linear	MAR	complete_case	gaussian	tau2	500	0.1	0.0377	0.0425	0.0050	0.0551
Scenario1_linear	MAR	complete_case	bernstein	tau1	500	0.3	0.0212	0.0568	0.0043	0.0515
Scenario1_linear	MAR	complete_case	bernstein	tau2	500	0.3	0.0299	0.0567	0.0049	0.0538
Scenario1_linear	MAR	complete_case	tricube	tau1	500	0.3	0.0336	0.0472	0.0050	0.0546
Scenario1_linear	MAR	complete_case	bernstein	tau3	500	0.3	0.0356	0.0557	0.0057	0.0568
Scenario1_linear	MAR	complete_case	tricube	tau2	500	0.3	0.0407	0.0498	0.0060	0.0602
Scenario1_linear	MAR	complete_case	epanechnikov	tau2	500	0.3	0.0453	0.0456	0.0061	0.0610
Scenario1_linear	MAR	complete_case	gaussian	tau2	500	0.3	0.0473	0.0445	0.0062	0.0619
Scenario1_linear	MAR	complete_case	gaussian	tau1	500	0.3	0.0454	0.0483	0.0062	0.0618
Scenario1_linear	MAR	complete_case	epanechnikov	tau1	500	0.3	0.0411	0.0490	0.0063	0.0625
Scenario1_linear	MAR	complete_case	tricube	tau3	500	0.3	0.0463	0.0483	0.0066	0.0628
Scenario1_linear	MAR	complete_case	bernstein	tau1	500	0.5	0.0240	0.0693	0.0062	0.0605
Scenario1_linear	MAR	complete_case	tricube	tau1	500	0.5	0.0428	0.0569	0.0069	0.0638
Scenario1_linear	MAR	complete_case	gaussian	tau1	500	0.5	0.0401	0.0590	0.0073	0.0671
Scenario1_linear	MAR	complete_case	bernstein	tau2	500	0.5	0.0439	0.0669	0.0081	0.0659
Scenario1_linear	MAR	complete_case	epanechnikov	tau1	500	0.5	0.0485	0.0631	0.0087	0.0722
Scenario1_linear	MAR	complete_case	tricube	tau2	500	0.5	0.0507	0.0638	0.0091	0.0731
Scenario1_linear	MAR	complete_case	gaussian	tau2	500	0.5	0.0547	0.0580	0.0091	0.0734
Scenario1_linear	MAR	complete_case	epanechnikov	tau2	500	0.5	0.0580	0.0573	0.0095	0.0741
Scenario1_linear	MAR	complete_case	tricube	tau3	500	0.5	0.0602	0.0585	0.0106	0.0773
Scenario1_linear	MAR	complete_case	epanechnikov	tau3	500	0.5	0.0664	0.0578	0.0113	0.0814
Scenario1_linear	MAR	complete_case	bernstein	tau2	1000	0	0.0094	0.0385	0.0018	0.0330
Scenario1_linear	MAR	complete_case	bernstein	tau3	1000	0	0.0115	0.0380	0.0019	0.0332
Scenario1_linear	MAR	complete_case	bernstein	tau1	1000	0	0.0122	0.0396	0.0021	0.0353
Scenario1_linear	MAR	complete_case	tricube	tau3	1000	0	0.0214	0.0282	0.0022	0.0358
Scenario1_linear	MAR	complete_case	tricube	tau2	1000	0	0.0225	0.0300	0.0023	0.0367
Scenario1_linear	MAR	complete_case	tricube	tau1	1000	0	0.0236	0.0309	0.0023	0.0381
Scenario1_linear	MAR	complete_case	gaussian	tau2	1000	0	0.0260	0.0280	0.0024	0.0384
Scenario1_linear	MAR	complete_case	gaussian	tau1	1000	0	0.0262	0.0288	0.0024	0.0386
Scenario1_linear	MAR	complete_case	gaussian	tau3	1000	0	0.0279	0.0274	0.0025	0.0390
Scenario1_linear	MAR	complete_case	epanechnikov	tau2	1000	0	0.0251	0.0283	0.0025	0.0381
Scenario1_linear	MAR	complete_case	bernstein	tau2	1000	0.1	0.0134	0.0415	0.0022	0.0362
Scenario1_linear	MAR	complete_case	bernstein	tau3	1000	0.1	0.0152	0.0397	0.0022	0.0361
Scenario1_linear	MAR	complete_case	bernstein	tau1	1000	0.1	0.0104	0.0427	0.0023	0.0364
Scenario1_linear	MAR	complete_case	tricube	tau1	1000	0.1	0.0236	0.0306	0.0024	0.0379
Scenario1_linear	MAR	complete_case	tricube	tau2	1000	0.1	0.0246	0.0322	0.0026	0.0395
Scenario1_linear	MAR	complete_case	tricube	tau3	1000	0.1	0.0264	0.0320	0.0027	0.0407
Scenario1_linear	MAR	complete_case	gaussian	tau1	1000	0.1	0.0267	0.0302	0.0027	0.0406
Scenario1_linear	MAR	complete_case	epanechnikov	tau1	1000	0.1	0.0285	0.0296	0.0028	0.0408
Scenario1_linear	MAR	complete_case	gaussian	tau3	1000	0.1	0.0298	0.0297	0.0029	0.0417
Scenario1_linear	MAR	complete_case	epanechnikov	tau3	1000	0.1	0.0299	0.0281	0.0029	0.0412
Scenario1_linear	MAR	complete_case	bernstein	tau1	1000	0.3	0.0153	0.0468	0.0028	0.0402
Scenario1_linear	MAR	complete_case	bernstein	tau2	1000	0.3	0.0166	0.0458	0.0028	0.0405
Scenario1_linear	MAR	complete_case	tricube	tau1	1000	0.3	0.0282	0.0370	0.0032	0.0444
Scenario1_linear	MAR	complete_case	tricube	tau2	1000	0.3	0.0285	0.0367	0.0032	0.0442
Scenario1_linear	MAR	complete_case	bernstein	tau3	1000	0.3	0.0224	0.0465	0.0034	0.0443
Scenario1_linear	MAR	complete_case	tricube	tau3	1000	0.3	0.0329	0.0346	0.0035	0.0454
Scenario1_linear	MAR	complete_case	gaussian	tau1	1000	0.3	0.0331	0.0343	0.0036	0.0466
Scenario1_linear	MAR	complete_case	epanechnikov	tau1	1000	0.3	0.0331	0.0360	0.0037	0.0467
Scenario1_linear	MAR	complete_case	gaussian	tau2	1000	0.3	0.0350	0.0342	0.0038	0.0478
Scenario1_linear	MAR	complete_case	epanechnikov	tau2	1000	0.3	0.0334	0.0351	0.0038	0.0474
Scenario1_linear	MAR	complete_case	bernstein	tau1	1000	0.5	0.0123	0.0526	0.0034	0.0452
Scenario1_linear	MAR	complete_case	tricube	tau1	1000	0.5	0.0269	0.0426	0.0038	0.0482
Scenario1_linear	MAR	complete_case	bernstein	tau2	1000	0.5	0.0265	0.0521	0.0041	0.0485
Scenario1_linear	MAR	complete_case	gaussian	tau1	1000	0.5	0.0324	0.0425	0.0042	0.0501
Scenario1_linear	MAR	complete_case	epanechnikov	tau1	1000	0.5	0.0367	0.0431	0.0048	0.0535
Scenario1_linear	MAR	complete_case	epanechnikov	tau2	1000	0.5	0.0405	0.0412	0.0049	0.0541
Scenario1_linear	MAR	complete_case	tricube	tau2	1000	0.5	0.0375	0.0446	0.0049	0.0544
Scenario1_linear	MAR	complete_case	gaussian	tau2	1000	0.5	0.0408	0.0426	0.0052	0.0562
Scenario1_linear	MAR	complete_case	bernstein	tau3	1000	0.5	0.0341	0.0517	0.0055	0.0544
Scenario1_linear	MAR	complete_case	tricube	tau3	1000	0.5	0.0455	0.0426	0.0059	0.0580
Scenario1_linear	MAR	complete_case	bernstein	tau3	2000	0	0.0074	0.0317	0.0012	0.0267
Scenario1_linear	MAR	complete_case	tricube	tau1	2000	0	0.0162	0.0215	0.0013	0.0270
Scenario1_linear	MAR	complete_case	bernstein	tau2	2000	0	0.0075	0.0324	0.0013	0.0270
Scenario1_linear	MAR	complete_case	bernstein	tau1	2000	0	0.0075	0.0321	0.0013	0.0272
Scenario1_linear	MAR	complete_case	tricube	tau2	2000	0	0.0165	0.0227	0.0014	0.0285
Scenario1_linear	MAR	complete_case	gaussian	tau1	2000	0	0.0201	0.0204	0.0014	0.0289
Scenario1_linear	MAR	complete_case	tricube	tau3	2000	0	0.0184	0.0232	0.0014	0.0293
Scenario1_linear	MAR	complete_case	gaussian	tau2	2000	0	0.0194	0.0206	0.0015	0.0288
Scenario1_linear	MAR	complete_case	gaussian	tau3	2000	0	0.0198	0.0220	0.0015	0.0302
Scenario1_linear	MAR	complete_case	epanechnikov	tau2	2000	0	0.0198	0.0211	0.0015	0.0294
Scenario1_linear	MAR	complete_case	bernstein	tau1	2000	0.1	0.0072	0.0325	0.0013	0.0271
Scenario1_linear	MAR	complete_case	bernstein	tau2	2000	0.1	0.0074	0.0331	0.0013	0.0279
Scenario1_linear	MAR	complete_case	tricube	tau2	2000	0.1	0.0178	0.0225	0.0014	0.0288
Scenario1_linear	MAR	complete_case	tricube	tau1	2000	0.1	0.0175	0.0231	0.0014	0.0291
Scenario1_linear	MAR	complete_case	bernstein	tau3	2000	0.1	0.0097	0.0335	0.0015	0.0292
Scenario1_linear	MAR	complete_case	tricube	tau3	2000	0.1	0.0204	0.0224	0.0015	0.0299
Scenario1_linear	MAR	complete_case	gaussian	tau2	2000	0.1	0.0205	0.0222	0.0016	0.0305
Scenario1_linear	MAR	complete_case	epanechnikov	tau1	2000	0.1	0.0206	0.0220	0.0016	0.0304
Scenario1_linear	MAR	complete_case	gaussian	tau3	2000	0.1	0.0212	0.0224	0.0017	0.0314
Scenario1_linear	MAR	complete_case	gaussian	tau1	2000	0.1	0.0213	0.0234	0.0017	0.0324
Scenario1_linear	MAR	complete_case	bernstein	tau1	2000	0.3	0.0086	0.0357	0.0016	0.0300
Scenario1_linear	MAR	complete_case	bernstein	tau2	2000	0.3	0.0111	0.0365	0.0017	0.0316
Scenario1_linear	MAR	complete_case	tricube	tau1	2000	0.3	0.0212	0.0273	0.0018	0.0330
Scenario1_linear	MAR	complete_case	tricube	tau2	2000	0.3	0.0210	0.0264	0.0019	0.0332
Scenario1_linear	MAR	complete_case	tricube	tau3	2000	0.3	0.0249	0.0265	0.0020	0.0344
Scenario1_linear	MAR	complete_case	gaussian	tau1	2000	0.3	0.0218	0.0264	0.0020	0.0348
Scenario1_linear	MAR	complete_case	bernstein	tau3	2000	0.3	0.0150	0.0384	0.0021	0.0345
Scenario1_linear	MAR	complete_case	gaussian	tau2	2000	0.3	0.0254	0.0257	0.0021	0.0355
Scenario1_linear	MAR	complete_case	epanechnikov	tau1	2000	0.3	0.0242	0.0252	0.0021	0.0352
Scenario1_linear	MAR	complete_case	epanechnikov	tau2	2000	0.3	0.0264	0.0260	0.0022	0.0362
Scenario1_linear	MAR	complete_case	bernstein	tau1	2000	0.5	0.0114	0.0423	0.0022	0.0359
Scenario1_linear	MAR	complete_case	bernstein	tau2	2000	0.5	0.0142	0.0409	0.0023	0.0360
Scenario1_linear	MAR	complete_case	tricube	tau1	2000	0.5	0.0240	0.0322	0.0023	0.0374
Scenario1_linear	MAR	complete_case	epanechnikov	tau1	2000	0.5	0.0267	0.0291	0.0025	0.0378
Scenario1_linear	MAR	complete_case	tricube	tau2	2000	0.5	0.0270	0.0317	0.0026	0.0393
Scenario1_linear	MAR	complete_case	gaussian	tau1	2000	0.5	0.0298	0.0328	0.0029	0.0416
Scenario1_linear	MAR	complete_case	tricube	tau3	2000	0.5	0.0327	0.0302	0.0029	0.0414
Scenario1_linear	MAR	complete_case	bernstein	tau3	2000	0.5	0.0195	0.0422	0.0030	0.0403
Scenario1_linear	MAR	complete_case	gaussian	tau2	2000	0.5	0.0322	0.0321	0.0031	0.0430
Scenario1_linear	MAR	complete_case	epanechnikov	tau2	2000	0.5	0.0336	0.0305	0.0031	0.0421
Scenario1_linear	MAR	ipw	bernstein	tau1	500	0	0.0154	0.0482	0.0031	0.0430
Scenario1_linear	MAR	ipw	bernstein	tau2	500	0	0.0174	0.0490	0.0032	0.0433
Scenario1_linear	MAR	ipw	tricube	tau2	500	0	0.0287	0.0396	0.0037	0.0475
Scenario1_linear	MAR	ipw	bernstein	tau3	500	0	0.0204	0.0518	0.0038	0.0473
Scenario1_linear	MAR	ipw	tricube	tau3	500	0	0.0281	0.0391	0.0039	0.0482
Scenario1_linear	MAR	ipw	tricube	tau1	500	0	0.0314	0.0421	0.0042	0.0504
Scenario1_linear	MAR	ipw	gaussian	tau2	500	0	0.0346	0.0387	0.0042	0.0512
Scenario1_linear	MAR	ipw	gaussian	tau1	500	0	0.0332	0.0403	0.0043	0.0521
Scenario1_linear	MAR	ipw	epanechnikov	tau2	500	0	0.0358	0.0400	0.0047	0.0545
Scenario1_linear	MAR	ipw	gaussian	tau3	500	0	0.0385	0.0394	0.0047	0.0546
Scenario1_linear	MAR	ipw	bernstein	tau1	500	0.1	0.0179	0.0479	0.0031	0.0431
Scenario1_linear	MAR	ipw	bernstein	tau2	500	0.1	0.0210	0.0503	0.0034	0.0444
Scenario1_linear	MAR	ipw	bernstein	tau3	500	0.1	0.0209	0.0500	0.0038	0.0474
Scenario1_linear	MAR	ipw	tricube	tau2	500	0.1	0.0303	0.0408	0.0039	0.0490
Scenario1_linear	MAR	ipw	tricube	tau3	500	0.1	0.0310	0.0413	0.0043	0.0508
Scenario1_linear	MAR	ipw	gaussian	tau2	500	0.1	0.0345	0.0398	0.0043	0.0516
Scenario1_linear	MAR	ipw	tricube	tau1	500	0.1	0.0333	0.0428	0.0044	0.0526
Scenario1_linear	MAR	ipw	gaussian	tau1	500	0.1	0.0349	0.0411	0.0045	0.0524
Scenario1_linear	MAR	ipw	gaussian	tau3	500	0.1	0.0360	0.0417	0.0047	0.0534
Scenario1_linear	MAR	ipw	epanechnikov	tau1	500	0.1	0.0358	0.0418	0.0047	0.0534
Scenario1_linear	MAR	ipw	bernstein	tau2	500	0.3	0.0208	0.0573	0.0044	0.0506
Scenario1_linear	MAR	ipw	bernstein	tau1	500	0.3	0.0246	0.0570	0.0045	0.0515
Scenario1_linear	MAR	ipw	tricube	tau1	500	0.3	0.0293	0.0491	0.0048	0.0543
Scenario1_linear	MAR	ipw	tricube	tau2	500	0.3	0.0336	0.0482	0.0052	0.0557
Scenario1_linear	MAR	ipw	gaussian	tau1	500	0.3	0.0364	0.0464	0.0052	0.0574
Scenario1_linear	MAR	ipw	epanechnikov	tau1	500	0.3	0.0362	0.0456	0.0053	0.0568
Scenario1_linear	MAR	ipw	gaussian	tau2	500	0.3	0.0397	0.0455	0.0054	0.0575
Scenario1_linear	MAR	ipw	tricube	tau3	500	0.3	0.0374	0.0518	0.0059	0.0599
Scenario1_linear	MAR	ipw	bernstein	tau3	500	0.3	0.0297	0.0596	0.0060	0.0588
Scenario1_linear	MAR	ipw	epanechnikov	tau2	500	0.3	0.0383	0.0500	0.0062	0.0612
Scenario1_linear	MAR	ipw	bernstein	tau1	500	0.5	0.0282	0.0667	0.0061	0.0617
Scenario1_linear	MAR	ipw	gaussian	tau1	500	0.5	0.0351	0.0559	0.0063	0.0619
Scenario1_linear	MAR	ipw	tricube	tau2	500	0.5	0.0323	0.0562	0.0066	0.0618
Scenario1_linear	MAR	ipw	bernstein	tau2	500	0.5	0.0287	0.0663	0.0067	0.0622
Scenario1_linear	MAR	ipw	tricube	tau1	500	0.5	0.0353	0.0617	0.0069	0.0651
Scenario1_linear	MAR	ipw	epanechnikov	tau1	500	0.5	0.0379	0.0588	0.0072	0.0654
Scenario1_linear	MAR	ipw	epanechnikov	tau2	500	0.5	0.0420	0.0592	0.0080	0.0688
Scenario1_linear	MAR	ipw	gaussian	tau2	500	0.5	0.0467	0.0582	0.0083	0.0693
Scenario1_linear	MAR	ipw	epanechnikov	tau3	500	0.5	0.0539	0.0588	0.0094	0.0741
Scenario1_linear	MAR	ipw	gaussian	tau3	500	0.5	0.0543	0.0579	0.0095	0.0742
Scenario1_linear	MAR	ipw	bernstein	tau2	1000	0	0.0113	0.0378	0.0018	0.0327
Scenario1_linear	MAR	ipw	bernstein	tau1	1000	0	0.0103	0.0405	0.0021	0.0351
Scenario1_linear	MAR	ipw	bernstein	tau3	1000	0	0.0116	0.0406	0.0021	0.0353
Scenario1_linear	MAR	ipw	tricube	tau1	1000	0	0.0218	0.0293	0.0022	0.0363
Scenario1_linear	MAR	ipw	tricube	tau3	1000	0	0.0236	0.0292	0.0023	0.0372
Scenario1_linear	MAR	ipw	tricube	tau2	1000	0	0.0247	0.0301	0.0024	0.0386
Scenario1_linear	MAR	ipw	gaussian	tau2	1000	0	0.0254	0.0294	0.0025	0.0392
Scenario1_linear	MAR	ipw	gaussian	tau3	1000	0	0.0278	0.0280	0.0025	0.0391
Scenario1_linear	MAR	ipw	gaussian	tau1	1000	0	0.0261	0.0288	0.0026	0.0396
Scenario1_linear	MAR	ipw	epanechnikov	tau1	1000	0	0.0260	0.0286	0.0026	0.0391
Scenario1_linear	MAR	ipw	bernstein	tau1	1000	0.1	0.0102	0.0400	0.0021	0.0352
Scenario1_linear	MAR	ipw	bernstein	tau2	1000	0.1	0.0125	0.0402	0.0021	0.0352
Scenario1_linear	MAR	ipw	tricube	tau2	1000	0.1	0.0227	0.0308	0.0023	0.0373
Scenario1_linear	MAR	ipw	tricube	tau1	1000	0.1	0.0233	0.0313	0.0024	0.0385
Scenario1_linear	MAR	ipw	bernstein	tau3	1000	0.1	0.0157	0.0420	0.0024	0.0371
Scenario1_linear	MAR	ipw	tricube	tau3	1000	0.1	0.0261	0.0294	0.0024	0.0377
Scenario1_linear	MAR	ipw	gaussian	tau1	1000	0.1	0.0263	0.0308	0.0027	0.0401
Scenario1_linear	MAR	ipw	epanechnikov	tau1	1000	0.1	0.0236	0.0302	0.0027	0.0402
Scenario1_linear	MAR	ipw	gaussian	tau3	1000	0.1	0.0288	0.0291	0.0027	0.0409
Scenario1_linear	MAR	ipw	gaussian	tau2	1000	0.1	0.0249	0.0318	0.0027	0.0411
Scenario1_linear	MAR	ipw	bernstein	tau2	1000	0.3	0.0184	0.0438	0.0026	0.0397
Scenario1_linear	MAR	ipw	bernstein	tau1	1000	0.3	0.0117	0.0458	0.0026	0.0390
Scenario1_linear	MAR	ipw	tricube	tau1	1000	0.3	0.0244	0.0348	0.0027	0.0402
Scenario1_linear	MAR	ipw	tricube	tau2	1000	0.3	0.0227	0.0362	0.0029	0.0414
Scenario1_linear	MAR	ipw	gaussian	tau1	1000	0.3	0.0269	0.0329	0.0029	0.0418
Scenario1_linear	MAR	ipw	bernstein	tau3	1000	0.3	0.0173	0.0459	0.0031	0.0426
Scenario1_linear	MAR	ipw	epanechnikov	tau1	1000	0.3	0.0253	0.0356	0.0031	0.0433
Scenario1_linear	MAR	ipw	tricube	tau3	1000	0.3	0.0282	0.0363	0.0032	0.0440
Scenario1_linear	MAR	ipw	epanechnikov	tau2	1000	0.3	0.0283	0.0346	0.0033	0.0447
Scenario1_linear	MAR	ipw	gaussian	tau2	1000	0.3	0.0291	0.0380	0.0036	0.0467
Scenario1_linear	MAR	ipw	tricube	tau1	1000	0.5	0.0225	0.0434	0.0035	0.0460
Scenario1_linear	MAR	ipw	bernstein	tau2	1000	0.5	0.0195	0.0498	0.0036	0.0453
Scenario1_linear	MAR	ipw	bernstein	tau1	1000	0.5	0.0158	0.0558	0.0039	0.0485
Scenario1_linear	MAR	ipw	epanechnikov	tau1	1000	0.5	0.0289	0.0435	0.0040	0.0495
Scenario1_linear	MAR	ipw	gaussian	tau1	1000	0.5	0.0296	0.0429	0.0041	0.0496
Scenario1_linear	MAR	ipw	gaussian	tau2	1000	0.5	0.0303	0.0433	0.0042	0.0502
Scenario1_linear	MAR	ipw	tricube	tau2	1000	0.5	0.0286	0.0454	0.0044	0.0506
Scenario1_linear	MAR	ipw	epanechnikov	tau2	1000	0.5	0.0338	0.0417	0.0044	0.0510
Scenario1_linear	MAR	ipw	tricube	tau3	1000	0.5	0.0333	0.0437	0.0047	0.0527
Scenario1_linear	MAR	ipw	gaussian	tau3	1000	0.5	0.0396	0.0414	0.0051	0.0545
Scenario1_linear	MAR	ipw	bernstein	tau2	2000	0	0.0062	0.0310	0.0011	0.0258
Scenario1_linear	MAR	ipw	bernstein	tau1	2000	0	0.0079	0.0321	0.0013	0.0273
Scenario1_linear	MAR	ipw	tricube	tau3	2000	0	0.0173	0.0213	0.0013	0.0274
Scenario1_linear	MAR	ipw	bernstein	tau3	2000	0	0.0085	0.0321	0.0013	0.0276
Scenario1_linear	MAR	ipw	tricube	tau1	2000	0	0.0159	0.0224	0.0013	0.0282
Scenario1_linear	MAR	ipw	tricube	tau2	2000	0	0.0158	0.0230	0.0014	0.0280
Scenario1_linear	MAR	ipw	gaussian	tau3	2000	0	0.0193	0.0213	0.0015	0.0290
Scenario1_linear	MAR	ipw	gaussian	tau2	2000	0	0.0207	0.0212	0.0015	0.0297
Scenario1_linear	MAR	ipw	gaussian	tau1	2000	0	0.0204	0.0217	0.0015	0.0300
Scenario1_linear	MAR	ipw	epanechnikov	tau2	2000	0	0.0188	0.0214	0.0015	0.0294
Scenario1_linear	MAR	ipw	bernstein	tau2	2000	0.1	0.0076	0.0324	0.0013	0.0270
Scenario1_linear	MAR	ipw	tricube	tau1	2000	0.1	0.0186	0.0230	0.0014	0.0286
Scenario1_linear	MAR	ipw	tricube	tau2	2000	0.1	0.0171	0.0231	0.0014	0.0287
Scenario1_linear	MAR	ipw	tricube	tau3	2000	0.1	0.0161	0.0227	0.0014	0.0283
Scenario1_linear	MAR	ipw	bernstein	tau3	2000	0.1	0.0098	0.0326	0.0014	0.0285
Scenario1_linear	MAR	ipw	bernstein	tau1	2000	0.1	0.0088	0.0338	0.0014	0.0286
Scenario1_linear	MAR	ipw	gaussian	tau1	2000	0.1	0.0206	0.0225	0.0016	0.0302
Scenario1_linear	MAR	ipw	gaussian	tau2	2000	0.1	0.0198	0.0233	0.0016	0.0315
Scenario1_linear	MAR	ipw	epanechnikov	tau3	2000	0.1	0.0194	0.0225	0.0016	0.0306
Scenario1_linear	MAR	ipw	gaussian	tau3	2000	0.1	0.0222	0.0227	0.0016	0.0313
Scenario1_linear	MAR	ipw	epanechnikov	tau1	2000	0.3	0.0194	0.0229	0.0016	0.0306
Scenario1_linear	MAR	ipw	bernstein	tau1	2000	0.3	0.0084	0.0366	0.0016	0.0313
Scenario1_linear	MAR	ipw	tricube	tau1	2000	0.3	0.0170	0.0267	0.0017	0.0315
Scenario1_linear	MAR	ipw	bernstein	tau2	2000	0.3	0.0092	0.0366	0.0017	0.0313
Scenario1_linear	MAR	ipw	tricube	tau2	2000	0.3	0.0189	0.0272	0.0018	0.0324
Scenario1_linear	MAR	ipw	tricube	tau3	2000	0.3	0.0219	0.0250	0.0018	0.0320
Scenario1_linear	MAR	ipw	bernstein	tau3	2000	0.3	0.0109	0.0363	0.0018	0.0319
Scenario1_linear	MAR	ipw	gaussian	tau1	2000	0.3	0.0213	0.0265	0.0018	0.0334
Scenario1_linear	MAR	ipw	gaussian	tau2	2000	0.3	0.0226	0.0260	0.0019	0.0332
Scenario1_linear	MAR	ipw	epanechnikov	tau2	2000	0.3	0.0211	0.0260	0.0020	0.0340
Scenario1_linear	MAR	ipw	tricube	tau1	2000	0.5	0.0188	0.0309	0.0020	0.0349
Scenario1_linear	MAR	ipw	tricube	tau2	2000	0.5	0.0181	0.0313	0.0022	0.0351
Scenario1_linear	MAR	ipw	bernstein	tau1	2000	0.5	0.0128	0.0417	0.0022	0.0358
Scenario1_linear	MAR	ipw	gaussian	tau1	2000	0.5	0.0229	0.0309	0.0023	0.0370
Scenario1_linear	MAR	ipw	epanechnikov	tau1	2000	0.5	0.0211	0.0311	0.0023	0.0363
Scenario1_linear	MAR	ipw	bernstein	tau2	2000	0.5	0.0134	0.0421	0.0023	0.0363
Scenario1_linear	MAR	ipw	epanechnikov	tau2	2000	0.5	0.0250	0.0300	0.0025	0.0383
Scenario1_linear	MAR	ipw	tricube	tau3	2000	0.5	0.0248	0.0330	0.0026	0.0389
Scenario1_linear	MAR	ipw	gaussian	tau2	2000	0.5	0.0241	0.0327	0.0026	0.0398
Scenario1_linear	MAR	ipw	gaussian	tau3	2000	0.5	0.0272	0.0311	0.0028	0.0401
Scenario1_linear	MCAR	complete_case	bernstein	tau2	500	0	0.0195	0.0474	0.0030	0.0422
Scenario1_linear	MCAR	complete_case	bernstein	tau1	500	0	0.0221	0.0501	0.0036	0.0464
Scenario1_linear	MCAR	complete_case	tricube	tau3	500	0	0.0302	0.0385	0.0037	0.0472
Scenario1_linear	MCAR	complete_case	bernstein	tau3	500	0	0.0225	0.0507	0.0038	0.0480
Scenario1_linear	MCAR	complete_case	tricube	tau1	500	0	0.0296	0.0412	0.0039	0.0496
Scenario1_linear	MCAR	complete_case	gaussian	tau3	500	0	0.0344	0.0382	0.0042	0.0511
Scenario1_linear	MCAR	complete_case	tricube	tau2	500	0	0.0333	0.0423	0.0043	0.0518
Scenario1_linear	MCAR	complete_case	gaussian	tau1	500	0	0.0336	0.0400	0.0043	0.0521
Scenario1_linear	MCAR	complete_case	gaussian	tau2	500	0	0.0365	0.0398	0.0044	0.0528
Scenario1_linear	MCAR	complete_case	epanechnikov	tau1	500	0	0.0376	0.0386	0.0046	0.0536
Scenario1_linear	MCAR	complete_case	bernstein	tau2	500	0.1	0.0211	0.0522	0.0036	0.0468
Scenario1_linear	MCAR	complete_case	bernstein	tau3	500	0.1	0.0257	0.0489	0.0037	0.0472
Scenario1_linear	MCAR	complete_case	bernstein	tau1	500	0.1	0.0193	0.0534	0.0039	0.0485
Scenario1_linear	MCAR	complete_case	tricube	tau1	500	0.1	0.0303	0.0413	0.0040	0.0495
Scenario1_linear	MCAR	complete_case	tricube	tau3	500	0.1	0.0328	0.0417	0.0042	0.0508
Scenario1_linear	MCAR	complete_case	tricube	tau2	500	0.1	0.0317	0.0426	0.0044	0.0516
Scenario1_linear	MCAR	complete_case	gaussian	tau2	500	0.1	0.0363	0.0394	0.0046	0.0529
Scenario1_linear	MCAR	complete_case	gaussian	tau1	500	0.1	0.0376	0.0413	0.0048	0.0542
Scenario1_linear	MCAR	complete_case	gaussian	tau3	500	0.1	0.0384	0.0424	0.0049	0.0553
Scenario1_linear	MCAR	complete_case	epanechnikov	tau3	500	0.1	0.0368	0.0403	0.0050	0.0551
Scenario1_linear	MCAR	complete_case	bernstein	tau2	500	0.3	0.0255	0.0553	0.0043	0.0511
Scenario1_linear	MCAR	complete_case	bernstein	tau3	500	0.3	0.0281	0.0563	0.0048	0.0541
Scenario1_linear	MCAR	complete_case	bernstein	tau1	500	0.3	0.0268	0.0588	0.0050	0.0548
Scenario1_linear	MCAR	complete_case	tricube	tau1	500	0.3	0.0348	0.0466	0.0053	0.0573
Scenario1_linear	MCAR	complete_case	gaussian	tau2	500	0.3	0.0353	0.0451	0.0054	0.0572
Scenario1_linear	MCAR	complete_case	tricube	tau3	500	0.3	0.0375	0.0470	0.0054	0.0579
Scenario1_linear	MCAR	complete_case	tricube	tau2	500	0.3	0.0381	0.0497	0.0057	0.0604
Scenario1_linear	MCAR	complete_case	gaussian	tau3	500	0.3	0.0398	0.0464	0.0058	0.0603
Scenario1_linear	MCAR	complete_case	epanechnikov	tau1	500	0.3	0.0399	0.0475	0.0060	0.0613
Scenario1_linear	MCAR	complete_case	epanechnikov	tau3	500	0.3	0.0416	0.0454	0.0060	0.0605
Scenario1_linear	MCAR	complete_case	bernstein	tau2	500	0.5	0.0294	0.0612	0.0053	0.0564
Scenario1_linear	MCAR	complete_case	bernstein	tau1	500	0.5	0.0274	0.0636	0.0058	0.0596
Scenario1_linear	MCAR	complete_case	bernstein	tau3	500	0.5	0.0328	0.0616	0.0060	0.0603
Scenario1_linear	MCAR	complete_case	tricube	tau2	500	0.5	0.0406	0.0512	0.0065	0.0634
Scenario1_linear	MCAR	complete_case	tricube	tau1	500	0.5	0.0384	0.0543	0.0066	0.0634
Scenario1_linear	MCAR	complete_case	tricube	tau3	500	0.5	0.0448	0.0547	0.0072	0.0671
Scenario1_linear	MCAR	complete_case	gaussian	tau2	500	0.5	0.0500	0.0530	0.0075	0.0688
Scenario1_linear	MCAR	complete_case	gaussian	tau1	500	0.5	0.0479	0.0533	0.0076	0.0688
Scenario1_linear	MCAR	complete_case	epanechnikov	tau1	500	0.5	0.0476	0.0527	0.0077	0.0687
Scenario1_linear	MCAR	complete_case	epanechnikov	tau2	500	0.5	0.0496	0.0557	0.0082	0.0720
Scenario1_linear	MCAR	complete_case	bernstein	tau3	1000	0	0.0111	0.0397	0.0021	0.0351
Scenario1_linear	MCAR	complete_case	bernstein	tau2	1000	0	0.0104	0.0415	0.0022	0.0354
Scenario1_linear	MCAR	complete_case	bernstein	tau1	1000	0	0.0123	0.0410	0.0022	0.0360
Scenario1_linear	MCAR	complete_case	tricube	tau2	1000	0	0.0222	0.0294	0.0022	0.0365
Scenario1_linear	MCAR	complete_case	tricube	tau1	1000	0	0.0234	0.0295	0.0023	0.0372
Scenario1_linear	MCAR	complete_case	tricube	tau3	1000	0	0.0238	0.0300	0.0024	0.0378
Scenario1_linear	MCAR	complete_case	gaussian	tau1	1000	0	0.0267	0.0279	0.0025	0.0385
Scenario1_linear	MCAR	complete_case	gaussian	tau3	1000	0	0.0262	0.0287	0.0025	0.0392
Scenario1_linear	MCAR	complete_case	epanechnikov	tau2	1000	0	0.0266	0.0276	0.0026	0.0387
Scenario1_linear	MCAR	complete_case	gaussian	tau2	1000	0	0.0270	0.0296	0.0026	0.0403
Scenario1_linear	MCAR	complete_case	bernstein	tau2	1000	0.1	0.0128	0.0411	0.0021	0.0354
Scenario1_linear	MCAR	complete_case	bernstein	tau1	1000	0.1	0.0128	0.0414	0.0023	0.0364
Scenario1_linear	MCAR	complete_case	bernstein	tau3	1000	0.1	0.0140	0.0416	0.0024	0.0373
Scenario1_linear	MCAR	complete_case	tricube	tau1	1000	0.1	0.0227	0.0318	0.0025	0.0386
Scenario1_linear	MCAR	complete_case	tricube	tau2	1000	0.1	0.0226	0.0328	0.0026	0.0392
Scenario1_linear	MCAR	complete_case	tricube	tau3	1000	0.1	0.0262	0.0323	0.0027	0.0396
Scenario1_linear	MCAR	complete_case	gaussian	tau2	1000	0.1	0.0270	0.0304	0.0027	0.0408
Scenario1_linear	MCAR	complete_case	gaussian	tau3	1000	0.1	0.0292	0.0296	0.0027	0.0406
Scenario1_linear	MCAR	complete_case	epanechnikov	tau2	1000	0.1	0.0253	0.0303	0.0028	0.0404
Scenario1_linear	MCAR	complete_case	gaussian	tau1	1000	0.1	0.0270	0.0316	0.0028	0.0420
Scenario1_linear	MCAR	complete_case	bernstein	tau2	1000	0.3	0.0138	0.0450	0.0026	0.0389
Scenario1_linear	MCAR	complete_case	bernstein	tau1	1000	0.3	0.0167	0.0434	0.0026	0.0394
Scenario1_linear	MCAR	complete_case	bernstein	tau3	1000	0.3	0.0166	0.0443	0.0027	0.0398
Scenario1_linear	MCAR	complete_case	tricube	tau1	1000	0.3	0.0249	0.0361	0.0030	0.0430
Scenario1_linear	MCAR	complete_case	tricube	tau3	1000	0.3	0.0290	0.0333	0.0030	0.0431
Scenario1_linear	MCAR	complete_case	tricube	tau2	1000	0.3	0.0271	0.0350	0.0031	0.0437
Scenario1_linear	MCAR	complete_case	epanechnikov	tau3	1000	0.3	0.0296	0.0311	0.0032	0.0431
Scenario1_linear	MCAR	complete_case	gaussian	tau2	1000	0.3	0.0298	0.0340	0.0034	0.0455
Scenario1_linear	MCAR	complete_case	epanechnikov	tau1	1000	0.3	0.0309	0.0319	0.0034	0.0448
Scenario1_linear	MCAR	complete_case	epanechnikov	tau2	1000	0.3	0.0297	0.0323	0.0034	0.0453
Scenario1_linear	MCAR	complete_case	bernstein	tau2	1000	0.5	0.0148	0.0500	0.0032	0.0440
Scenario1_linear	MCAR	complete_case	bernstein	tau1	1000	0.5	0.0176	0.0501	0.0035	0.0463
Scenario1_linear	MCAR	complete_case	bernstein	tau3	1000	0.5	0.0211	0.0502	0.0035	0.0454
Scenario1_linear	MCAR	complete_case	tricube	tau3	1000	0.5	0.0304	0.0385	0.0038	0.0483
Scenario1_linear	MCAR	complete_case	tricube	tau2	1000	0.5	0.0278	0.0415	0.0039	0.0483
Scenario1_linear	MCAR	complete_case	epanechnikov	tau2	1000	0.5	0.0313	0.0359	0.0040	0.0484
Scenario1_linear	MCAR	complete_case	tricube	tau1	1000	0.5	0.0307	0.0420	0.0041	0.0502
Scenario1_linear	MCAR	complete_case	gaussian	tau2	1000	0.5	0.0356	0.0374	0.0042	0.0509
Scenario1_linear	MCAR	complete_case	gaussian	tau1	1000	0.5	0.0321	0.0400	0.0043	0.0515
Scenario1_linear	MCAR	complete_case	gaussian	tau3	1000	0.5	0.0376	0.0380	0.0045	0.0532
Scenario1_linear	MCAR	complete_case	bernstein	tau2	2000	0	0.0066	0.0322	0.0013	0.0267
Scenario1_linear	MCAR	complete_case	tricube	tau1	2000	0	0.0162	0.0213	0.0013	0.0273
Scenario1_linear	MCAR	complete_case	bernstein	tau1	2000	0	0.0080	0.0317	0.0013	0.0270
Scenario1_linear	MCAR	complete_case	tricube	tau2	2000	0	0.0166	0.0213	0.0013	0.0274
Scenario1_linear	MCAR	complete_case	tricube	tau3	2000	0	0.0178	0.0217	0.0013	0.0279
Scenario1_linear	MCAR	complete_case	bernstein	tau3	2000	0	0.0086	0.0331	0.0014	0.0285
Scenario1_linear	MCAR	complete_case	gaussian	tau3	2000	0	0.0196	0.0208	0.0014	0.0290
Scenario1_linear	MCAR	complete_case	gaussian	tau1	2000	0	0.0206	0.0209	0.0015	0.0294
Scenario1_linear	MCAR	complete_case	gaussian	tau2	2000	0	0.0204	0.0207	0.0015	0.0294
Scenario1_linear	MCAR	complete_case	epanechnikov	tau3	2000	0	0.0187	0.0199	0.0015	0.0286
Scenario1_linear	MCAR	complete_case	bernstein	tau1	2000	0.1	0.0092	0.0306	0.0012	0.0262
Scenario1_linear	MCAR	complete_case	bernstein	tau3	2000	0.1	0.0080	0.0321	0.0013	0.0277
Scenario1_linear	MCAR	complete_case	bernstein	tau2	2000	0.1	0.0085	0.0333	0.0014	0.0282
Scenario1_linear	MCAR	complete_case	tricube	tau1	2000	0.1	0.0168	0.0226	0.0014	0.0283
Scenario1_linear	MCAR	complete_case	tricube	tau3	2000	0.1	0.0167	0.0224	0.0014	0.0287
Scenario1_linear	MCAR	complete_case	tricube	tau2	2000	0.1	0.0190	0.0224	0.0014	0.0289
Scenario1_linear	MCAR	complete_case	gaussian	tau1	2000	0.1	0.0194	0.0210	0.0015	0.0295
Scenario1_linear	MCAR	complete_case	gaussian	tau2	2000	0.1	0.0210	0.0213	0.0015	0.0300
Scenario1_linear	MCAR	complete_case	epanechnikov	tau2	2000	0.1	0.0194	0.0223	0.0016	0.0308
Scenario1_linear	MCAR	complete_case	gaussian	tau3	2000	0.1	0.0210	0.0231	0.0016	0.0312
Scenario1_linear	MCAR	complete_case	bernstein	tau2	2000	0.3	0.0084	0.0351	0.0015	0.0296
Scenario1_linear	MCAR	complete_case	tricube	tau1	2000	0.3	0.0185	0.0251	0.0016	0.0312
Scenario1_linear	MCAR	complete_case	bernstein	tau1	2000	0.3	0.0104	0.0358	0.0016	0.0311
Scenario1_linear	MCAR	complete_case	bernstein	tau3	2000	0.3	0.0112	0.0355	0.0017	0.0311
Scenario1_linear	MCAR	complete_case	tricube	tau2	2000	0.3	0.0175	0.0258	0.0017	0.0314
Scenario1_linear	MCAR	complete_case	tricube	tau3	2000	0.3	0.0195	0.0257	0.0017	0.0324
Scenario1_linear	MCAR	complete_case	gaussian	tau2	2000	0.3	0.0231	0.0243	0.0019	0.0338
Scenario1_linear	MCAR	complete_case	gaussian	tau3	2000	0.3	0.0225	0.0249	0.0019	0.0340
Scenario1_linear	MCAR	complete_case	gaussian	tau1	2000	0.3	0.0228	0.0250	0.0019	0.0338
Scenario1_linear	MCAR	complete_case	epanechnikov	tau2	2000	0.3	0.0229	0.0246	0.0020	0.0334
Scenario1_linear	MCAR	complete_case	bernstein	tau2	2000	0.5	0.0121	0.0400	0.0020	0.0348
Scenario1_linear	MCAR	complete_case	bernstein	tau1	2000	0.5	0.0129	0.0392	0.0020	0.0346
Scenario1_linear	MCAR	complete_case	tricube	tau2	2000	0.5	0.0208	0.0290	0.0021	0.0354
Scenario1_linear	MCAR	complete_case	bernstein	tau3	2000	0.5	0.0147	0.0392	0.0022	0.0349
Scenario1_linear	MCAR	complete_case	tricube	tau1	2000	0.5	0.0222	0.0312	0.0023	0.0379
Scenario1_linear	MCAR	complete_case	gaussian	tau1	2000	0.5	0.0241	0.0294	0.0025	0.0388
Scenario1_linear	MCAR	complete_case	tricube	tau3	2000	0.5	0.0227	0.0320	0.0025	0.0393
Scenario1_linear	MCAR	complete_case	gaussian	tau2	2000	0.5	0.0260	0.0281	0.0025	0.0389
Scenario1_linear	MCAR	complete_case	gaussian	tau3	2000	0.5	0.0277	0.0282	0.0026	0.0401
Scenario1_linear	MCAR	complete_case	epanechnikov	tau2	2000	0.5	0.0246	0.0296	0.0027	0.0398
Scenario1_linear	MCAR	ipw	bernstein	tau2	500	0	0.0183	0.0500	0.0033	0.0445
Scenario1_linear	MCAR	ipw	bernstein	tau3	500	0	0.0196	0.0487	0.0033	0.0445
Scenario1_linear	MCAR	ipw	bernstein	tau1	500	0	0.0164	0.0498	0.0033	0.0449
Scenario1_linear	MCAR	ipw	tricube	tau3	500	0	0.0295	0.0391	0.0038	0.0480
Scenario1_linear	MCAR	ipw	tricube	tau2	500	0	0.0326	0.0400	0.0039	0.0494
Scenario1_linear	MCAR	ipw	tricube	tau1	500	0	0.0329	0.0418	0.0041	0.0506
Scenario1_linear	MCAR	ipw	gaussian	tau3	500	0	0.0378	0.0385	0.0044	0.0526
Scenario1_linear	MCAR	ipw	gaussian	tau2	500	0	0.0357	0.0415	0.0046	0.0536
Scenario1_linear	MCAR	ipw	gaussian	tau1	500	0	0.0372	0.0412	0.0046	0.0531
Scenario1_linear	MCAR	ipw	epanechnikov	tau3	500	0	0.0350	0.0375	0.0046	0.0521
Scenario1_linear	MCAR	ipw	bernstein	tau2	500	0.1	0.0188	0.0498	0.0032	0.0435
Scenario1_linear	MCAR	ipw	bernstein	tau1	500	0.1	0.0200	0.0507	0.0036	0.0470
Scenario1_linear	MCAR	ipw	bernstein	tau3	500	0.1	0.0225	0.0534	0.0041	0.0493
Scenario1_linear	MCAR	ipw	tricube	tau3	500	0.1	0.0299	0.0413	0.0042	0.0498
Scenario1_linear	MCAR	ipw	tricube	tau1	500	0.1	0.0331	0.0415	0.0043	0.0517
Scenario1_linear	MCAR	ipw	tricube	tau2	500	0.1	0.0322	0.0433	0.0045	0.0532
Scenario1_linear	MCAR	ipw	gaussian	tau2	500	0.1	0.0365	0.0402	0.0046	0.0525
Scenario1_linear	MCAR	ipw	gaussian	tau1	500	0.1	0.0332	0.0422	0.0046	0.0534
Scenario1_linear	MCAR	ipw	epanechnikov	tau1	500	0.1	0.0380	0.0405	0.0048	0.0545
Scenario1_linear	MCAR	ipw	epanechnikov	tau2	500	0.1	0.0351	0.0424	0.0050	0.0558
Scenario1_linear	MCAR	ipw	bernstein	tau2	500	0.3	0.0229	0.0557	0.0043	0.0508
Scenario1_linear	MCAR	ipw	bernstein	tau3	500	0.3	0.0256	0.0552	0.0045	0.0528
Scenario1_linear	MCAR	ipw	bernstein	tau1	500	0.3	0.0270	0.0566	0.0047	0.0527
Scenario1_linear	MCAR	ipw	tricube	tau1	500	0.3	0.0382	0.0463	0.0054	0.0577
Scenario1_linear	MCAR	ipw	gaussian	tau1	500	0.3	0.0394	0.0437	0.0054	0.0575
Scenario1_linear	MCAR	ipw	gaussian	tau2	500	0.3	0.0399	0.0431	0.0054	0.0577
Scenario1_linear	MCAR	ipw	tricube	tau2	500	0.3	0.0361	0.0500	0.0056	0.0590
Scenario1_linear	MCAR	ipw	tricube	tau3	500	0.3	0.0383	0.0494	0.0058	0.0592
Scenario1_linear	MCAR	ipw	epanechnikov	tau1	500	0.3	0.0396	0.0465	0.0060	0.0605
Scenario1_linear	MCAR	ipw	epanechnikov	tau2	500	0.3	0.0408	0.0456	0.0060	0.0602
Scenario1_linear	MCAR	ipw	bernstein	tau2	500	0.5	0.0335	0.0650	0.0062	0.0617
Scenario1_linear	MCAR	ipw	bernstein	tau3	500	0.5	0.0292	0.0636	0.0062	0.0610
Scenario1_linear	MCAR	ipw	bernstein	tau1	500	0.5	0.0262	0.0657	0.0062	0.0618
Scenario1_linear	MCAR	ipw	tricube	tau1	500	0.5	0.0400	0.0514	0.0065	0.0629
Scenario1_linear	MCAR	ipw	tricube	tau2	500	0.5	0.0402	0.0577	0.0072	0.0665
Scenario1_linear	MCAR	ipw	tricube	tau3	500	0.5	0.0433	0.0582	0.0077	0.0689
Scenario1_linear	MCAR	ipw	epanechnikov	tau2	500	0.5	0.0480	0.0546	0.0079	0.0697
Scenario1_linear	MCAR	ipw	gaussian	tau3	500	0.5	0.0497	0.0538	0.0080	0.0708
Scenario1_linear	MCAR	ipw	epanechnikov	tau1	500	0.5	0.0508	0.0553	0.0082	0.0716
Scenario1_linear	MCAR	ipw	gaussian	tau2	500	0.5	0.0512	0.0560	0.0082	0.0720
Scenario1_linear	MCAR	ipw	bernstein	tau1	1000	0	0.0104	0.0387	0.0019	0.0337
Scenario1_linear	MCAR	ipw	bernstein	tau2	1000	0	0.0112	0.0407	0.0021	0.0346
Scenario1_linear	MCAR	ipw	tricube	tau1	1000	0	0.0219	0.0283	0.0021	0.0360
Scenario1_linear	MCAR	ipw	tricube	tau2	1000	0	0.0215	0.0292	0.0022	0.0366
Scenario1_linear	MCAR	ipw	bernstein	tau3	1000	0	0.0113	0.0414	0.0022	0.0365
Scenario1_linear	MCAR	ipw	tricube	tau3	1000	0	0.0227	0.0295	0.0023	0.0373
Scenario1_linear	MCAR	ipw	gaussian	tau1	1000	0	0.0255	0.0285	0.0024	0.0388
Scenario1_linear	MCAR	ipw	gaussian	tau3	1000	0	0.0271	0.0278	0.0025	0.0391
Scenario1_linear	MCAR	ipw	epanechnikov	tau2	1000	0	0.0252	0.0277	0.0025	0.0384
Scenario1_linear	MCAR	ipw	gaussian	tau2	1000	0	0.0278	0.0295	0.0026	0.0402
Scenario1_linear	MCAR	ipw	bernstein	tau1	1000	0.1	0.0116	0.0400	0.0020	0.0347
Scenario1_linear	MCAR	ipw	bernstein	tau3	1000	0.1	0.0124	0.0419	0.0023	0.0366
Scenario1_linear	MCAR	ipw	bernstein	tau2	1000	0.1	0.0126	0.0429	0.0023	0.0370
Scenario1_linear	MCAR	ipw	tricube	tau1	1000	0.1	0.0214	0.0304	0.0023	0.0373
Scenario1_linear	MCAR	ipw	gaussian	tau1	1000	0.1	0.0264	0.0283	0.0025	0.0387
Scenario1_linear	MCAR	ipw	tricube	tau2	1000	0.1	0.0248	0.0323	0.0026	0.0397
Scenario1_linear	MCAR	ipw	tricube	tau3	1000	0.1	0.0253	0.0320	0.0026	0.0397
Scenario1_linear	MCAR	ipw	gaussian	tau3	1000	0.1	0.0252	0.0301	0.0027	0.0400
Scenario1_linear	MCAR	ipw	gaussian	tau2	1000	0.1	0.0275	0.0306	0.0027	0.0413
Scenario1_linear	MCAR	ipw	epanechnikov	tau1	1000	0.1	0.0285	0.0302	0.0028	0.0409
Scenario1_linear	MCAR	ipw	bernstein	tau2	1000	0.3	0.0141	0.0441	0.0025	0.0393
Scenario1_linear	MCAR	ipw	bernstein	tau3	1000	0.3	0.0169	0.0439	0.0027	0.0398
Scenario1_linear	MCAR	ipw	bernstein	tau1	1000	0.3	0.0172	0.0451	0.0028	0.0403
Scenario1_linear	MCAR	ipw	tricube	tau2	1000	0.3	0.0244	0.0339	0.0028	0.0416
Scenario1_linear	MCAR	ipw	tricube	tau1	1000	0.3	0.0256	0.0349	0.0029	0.0419
Scenario1_linear	MCAR	ipw	tricube	tau3	1000	0.3	0.0268	0.0347	0.0031	0.0438
Scenario1_linear	MCAR	ipw	gaussian	tau2	1000	0.3	0.0289	0.0335	0.0032	0.0436
Scenario1_linear	MCAR	ipw	gaussian	tau1	1000	0.3	0.0308	0.0333	0.0033	0.0449
Scenario1_linear	MCAR	ipw	epanechnikov	tau1	1000	0.3	0.0282	0.0340	0.0034	0.0456
Scenario1_linear	MCAR	ipw	gaussian	tau3	1000	0.3	0.0309	0.0350	0.0034	0.0461
Scenario1_linear	MCAR	ipw	bernstein	tau2	1000	0.5	0.0189	0.0491	0.0032	0.0440
Scenario1_linear	MCAR	ipw	bernstein	tau3	1000	0.5	0.0200	0.0494	0.0034	0.0451
Scenario1_linear	MCAR	ipw	bernstein	tau1	1000	0.5	0.0189	0.0495	0.0034	0.0444
Scenario1_linear	MCAR	ipw	tricube	tau2	1000	0.5	0.0271	0.0400	0.0038	0.0485
Scenario1_linear	MCAR	ipw	tricube	tau1	1000	0.5	0.0324	0.0400	0.0042	0.0510
Scenario1_linear	MCAR	ipw	gaussian	tau1	1000	0.5	0.0320	0.0401	0.0043	0.0512
Scenario1_linear	MCAR	ipw	epanechnikov	tau1	1000	0.5	0.0318	0.0400	0.0043	0.0511
Scenario1_linear	MCAR	ipw	gaussian	tau2	1000	0.5	0.0351	0.0409	0.0044	0.0522
Scenario1_linear	MCAR	ipw	tricube	tau3	1000	0.5	0.0310	0.0432	0.0044	0.0520
Scenario1_linear	MCAR	ipw	gaussian	tau3	1000	0.5	0.0367	0.0399	0.0044	0.0528
Scenario1_linear	MCAR	ipw	bernstein	tau3	2000	0	0.0071	0.0315	0.0012	0.0266
Scenario1_linear	MCAR	ipw	bernstein	tau2	2000	0	0.0065	0.0326	0.0013	0.0271
Scenario1_linear	MCAR	ipw	tricube	tau1	2000	0	0.0163	0.0221	0.0013	0.0280
Scenario1_linear	MCAR	ipw	tricube	tau3	2000	0	0.0168	0.0215	0.0013	0.0278
Scenario1_linear	MCAR	ipw	tricube	tau2	2000	0	0.0180	0.0226	0.0014	0.0288
Scenario1_linear	MCAR	ipw	bernstein	tau1	2000	0	0.0099	0.0331	0.0014	0.0285
Scenario1_linear	MCAR	ipw	gaussian	tau1	2000	0	0.0188	0.0214	0.0015	0.0290
Scenario1_linear	MCAR	ipw	gaussian	tau3	2000	0	0.0192	0.0215	0.0015	0.0298
Scenario1_linear	MCAR	ipw	epanechnikov	tau3	2000	0	0.0188	0.0210	0.0015	0.0288
Scenario1_linear	MCAR	ipw	gaussian	tau2	2000	0	0.0204	0.0216	0.0015	0.0302
Scenario1_linear	MCAR	ipw	bernstein	tau2	2000	0.1	0.0074	0.0325	0.0013	0.0272
Scenario1_linear	MCAR	ipw	bernstein	tau1	2000	0.1	0.0084	0.0330	0.0014	0.0281
Scenario1_linear	MCAR	ipw	tricube	tau1	2000	0.1	0.0168	0.0232	0.0014	0.0281
Scenario1_linear	MCAR	ipw	bernstein	tau3	2000	0.1	0.0094	0.0330	0.0014	0.0283
Scenario1_linear	MCAR	ipw	tricube	tau2	2000	0.1	0.0167	0.0226	0.0014	0.0284
Scenario1_linear	MCAR	ipw	tricube	tau3	2000	0.1	0.0179	0.0228	0.0014	0.0289
Scenario1_linear	MCAR	ipw	gaussian	tau1	2000	0.1	0.0206	0.0220	0.0016	0.0305
Scenario1_linear	MCAR	ipw	gaussian	tau2	2000	0.1	0.0202	0.0225	0.0016	0.0307
Scenario1_linear	MCAR	ipw	epanechnikov	tau3	2000	0.1	0.0219	0.0216	0.0017	0.0311
Scenario1_linear	MCAR	ipw	gaussian	tau3	2000	0.1	0.0215	0.0230	0.0017	0.0316
Scenario1_linear	MCAR	ipw	bernstein	tau2	2000	0.3	0.0076	0.0345	0.0014	0.0286
Scenario1_linear	MCAR	ipw	tricube	tau1	2000	0.3	0.0182	0.0253	0.0016	0.0308
Scenario1_linear	MCAR	ipw	bernstein	tau3	2000	0.3	0.0105	0.0360	0.0017	0.0313
Scenario1_linear	MCAR	ipw	tricube	tau2	2000	0.3	0.0182	0.0257	0.0017	0.0321
Scenario1_linear	MCAR	ipw	tricube	tau3	2000	0.3	0.0211	0.0250	0.0017	0.0319
Scenario1_linear	MCAR	ipw	bernstein	tau1	2000	0.3	0.0114	0.0370	0.0018	0.0322
Scenario1_linear	MCAR	ipw	gaussian	tau1	2000	0.3	0.0235	0.0237	0.0019	0.0333
Scenario1_linear	MCAR	ipw	gaussian	tau3	2000	0.3	0.0234	0.0254	0.0019	0.0339
Scenario1_linear	MCAR	ipw	gaussian	tau2	2000	0.3	0.0216	0.0254	0.0020	0.0340
Scenario1_linear	MCAR	ipw	epanechnikov	tau1	2000	0.3	0.0248	0.0242	0.0020	0.0346
Scenario1_linear	MCAR	ipw	bernstein	tau2	2000	0.5	0.0106	0.0388	0.0019	0.0331
Scenario1_linear	MCAR	ipw	bernstein	tau3	2000	0.5	0.0138	0.0394	0.0020	0.0347
Scenario1_linear	MCAR	ipw	bernstein	tau1	2000	0.5	0.0127	0.0400	0.0021	0.0352
Scenario1_linear	MCAR	ipw	tricube	tau1	2000	0.5	0.0214	0.0291	0.0022	0.0358
Scenario1_linear	MCAR	ipw	tricube	tau3	2000	0.5	0.0233	0.0291	0.0023	0.0363
Scenario1_linear	MCAR	ipw	tricube	tau2	2000	0.5	0.0223	0.0308	0.0024	0.0377
Scenario1_linear	MCAR	ipw	epanechnikov	tau2	2000	0.5	0.0248	0.0289	0.0025	0.0384
Scenario1_linear	MCAR	ipw	gaussian	tau1	2000	0.5	0.0266	0.0285	0.0025	0.0392
Scenario1_linear	MCAR	ipw	gaussian	tau3	2000	0.5	0.0270	0.0283	0.0025	0.0390
Scenario1_linear	MCAR	ipw	gaussian	tau2	2000	0.5	0.0269	0.0305	0.0026	0.0404
Scenario2_nonlinear	MAR	complete_case	epanechnikov	tau3	500	0	0.0373	0.0702	0.0109	0.0686
Scenario2_nonlinear	MAR	complete_case	gaussian	tau3	500	0	0.0355	0.0788	0.0111	0.0689
Scenario2_nonlinear	MAR	complete_case	tricube	tau3	500	0	0.0352	0.0831	0.0139	0.0742
Scenario2_nonlinear	MAR	complete_case	bernstein	tau3	500	0	0.0496	0.0881	0.0163	0.0752
Scenario2_nonlinear	MAR	complete_case	beta	tau3	500	0	0.2354	0.0563	0.0841	0.2363
Scenario2_nonlinear	MAR	complete_case	gaussian	tau2	500	0	0.1487	0.0978	0.0843	0.1608
Scenario2_nonlinear	MAR	complete_case	beta	tau2	500	0	0.2519	0.0525	0.0986	0.2525
Scenario2_nonlinear	MAR	complete_case	bernstein	tau2	500	0	0.1629	0.0842	0.0988	0.1733
Scenario2_nonlinear	MAR	complete_case	beta	tau1	500	0	0.2668	0.0508	0.1147	0.2675
Scenario2_nonlinear	MAR	complete_case	tricube	tau2	500	0	0.1703	0.0936	0.1199	0.1859
Scenario2_nonlinear	MAR	complete_case	tricube	tau3	500	0.1	0.0291	0.0741	0.0111	0.0687
Scenario2_nonlinear	MAR	complete_case	epanechnikov	tau3	500	0.1	0.0333	0.0744	0.0119	0.0715
Scenario2_nonlinear	MAR	complete_case	bernstein	tau3	500	0.1	0.0496	0.0807	0.0135	0.0740
Scenario2_nonlinear	MAR	complete_case	gaussian	tau3	500	0.1	0.0550	0.0883	0.0166	0.0799
Scenario2_nonlinear	MAR	complete_case	beta	tau3	500	0.1	0.2302	0.0568	0.0804	0.2313
Scenario2_nonlinear	MAR	complete_case	gaussian	tau2	500	0.1	0.1525	0.0907	0.0836	0.1643
Scenario2_nonlinear	MAR	complete_case	bernstein	tau2	500	0.1	0.1715	0.0855	0.1016	0.1791
Scenario2_nonlinear	MAR	complete_case	beta	tau2	500	0.1	0.2565	0.0534	0.1018	0.2576
Scenario2_nonlinear	MAR	complete_case	epanechnikov	tau2	500	0.1	0.1646	0.0885	0.1095	0.1798
Scenario2_nonlinear	MAR	complete_case	beta	tau1	500	0.1	0.2731	0.0543	0.1203	0.2741
Scenario2_nonlinear	MAR	complete_case	tricube	tau3	500	0.3	0.0401	0.0742	0.0114	0.0733
Scenario2_nonlinear	MAR	complete_case	bernstein	tau3	500	0.3	0.0551	0.0769	0.0128	0.0772
Scenario2_nonlinear	MAR	complete_case	epanechnikov	tau3	500	0.3	0.0433	0.0796	0.0136	0.0769
Scenario2_nonlinear	MAR	complete_case	gaussian	tau3	500	0.3	0.0526	0.0829	0.0139	0.0794
Scenario2_nonlinear	MAR	complete_case	gaussian	tau2	500	0.3	0.1568	0.0901	0.0794	0.1684
Scenario2_nonlinear	MAR	complete_case	bernstein	tau2	500	0.3	0.1690	0.0844	0.0900	0.1748
Scenario2_nonlinear	MAR	complete_case	beta	tau3	500	0.3	0.2490	0.0554	0.0921	0.2503
Scenario2_nonlinear	MAR	complete_case	beta	tau2	500	0.3	0.2648	0.0611	0.1086	0.2663
Scenario2_nonlinear	MAR	complete_case	epanechnikov	tau2	500	0.3	0.1813	0.0986	0.1140	0.1928
Scenario2_nonlinear	MAR	complete_case	beta	tau1	500	0.3	0.2712	0.0549	0.1165	0.2725
Scenario2_nonlinear	MAR	complete_case	tricube	tau3	500	0.5	0.0428	0.0897	0.0153	0.0835
Scenario2_nonlinear	MAR	complete_case	epanechnikov	tau3	500	0.5	0.0547	0.0923	0.0177	0.0884
Scenario2_nonlinear	MAR	complete_case	gaussian	tau3	500	0.5	0.0619	0.0958	0.0191	0.0908
Scenario2_nonlinear	MAR	complete_case	bernstein	tau3	500	0.5	0.0822	0.0920	0.0215	0.0988
Scenario2_nonlinear	MAR	complete_case	gaussian	tau2	500	0.5	0.1692	0.0946	0.0853	0.1802
Scenario2_nonlinear	MAR	complete_case	beta	tau3	500	0.5	0.2543	0.0668	0.0959	0.2559
Scenario2_nonlinear	MAR	complete_case	bernstein	tau2	500	0.5	0.1838	0.0948	0.0963	0.1903
Scenario2_nonlinear	MAR	complete_case	beta	tau2	500	0.5	0.2711	0.0620	0.1110	0.2724
Scenario2_nonlinear	MAR	complete_case	epanechnikov	tau2	500	0.5	0.1786	0.0981	0.1111	0.1916
Scenario2_nonlinear	MAR	complete_case	tricube	tau2	500	0.5	0.1880	0.1013	0.1218	0.2010
Scenario2_nonlinear	MAR	complete_case	bernstein	tau3	1000	0	0.0238	0.0573	0.0061	0.0506
Scenario2_nonlinear	MAR	complete_case	tricube	tau3	1000	0	0.0265	0.0583	0.0071	0.0546
Scenario2_nonlinear	MAR	complete_case	gaussian	tau3	1000	0	0.0327	0.0600	0.0072	0.0539
Scenario2_nonlinear	MAR	complete_case	epanechnikov	tau3	1000	0	0.0307	0.0588	0.0077	0.0550
Scenario2_nonlinear	MAR	complete_case	gaussian	tau2	1000	0	0.1257	0.0713	0.0588	0.1331
Scenario2_nonlinear	MAR	complete_case	beta	tau3	1000	0	0.2144	0.0373	0.0683	0.2149
Scenario2_nonlinear	MAR	complete_case	beta	tau2	1000	0	0.2253	0.0417	0.0774	0.2258
Scenario2_nonlinear	MAR	complete_case	beta	tau1	1000	0	0.2358	0.0384	0.0861	0.2362
Scenario2_nonlinear	MAR	complete_case	bernstein	tau2	1000	0	0.1507	0.0775	0.1018	0.1608
Scenario2_nonlinear	MAR	complete_case	epanechnikov	tau2	1000	0	0.1564	0.0747	0.1068	0.1656
Scenario2_nonlinear	MAR	complete_case	tricube	tau3	1000	0.1	0.0281	0.0569	0.0075	0.0554
Scenario2_nonlinear	MAR	complete_case	bernstein	tau3	1000	0.1	0.0267	0.0682	0.0089	0.0556
Scenario2_nonlinear	MAR	complete_case	gaussian	tau3	1000	0.1	0.0376	0.0706	0.0106	0.0605
Scenario2_nonlinear	MAR	complete_case	epanechnikov	tau3	1000	0.1	0.0410	0.0782	0.0143	0.0672
Scenario2_nonlinear	MAR	complete_case	beta	tau3	1000	0.1	0.2144	0.0401	0.0678	0.2149
Scenario2_nonlinear	MAR	complete_case	beta	tau2	1000	0.1	0.2282	0.0379	0.0786	0.2287
Scenario2_nonlinear	MAR	complete_case	gaussian	tau2	1000	0.1	0.1445	0.0817	0.0835	0.1531
Scenario2_nonlinear	MAR	complete_case	beta	tau1	1000	0.1	0.2423	0.0429	0.0920	0.2429
Scenario2_nonlinear	MAR	complete_case	epanechnikov	tau2	1000	0.1	0.1494	0.0795	0.0962	0.1594
Scenario2_nonlinear	MAR	complete_case	bernstein	tau2	1000	0.1	0.1488	0.0739	0.0968	0.1592
Scenario2_nonlinear	MAR	complete_case	gaussian	tau3	1000	0.3	0.0403	0.0630	0.0084	0.0600
Scenario2_nonlinear	MAR	complete_case	epanechnikov	tau3	1000	0.3	0.0395	0.0592	0.0090	0.0606
Scenario2_nonlinear	MAR	complete_case	bernstein	tau3	1000	0.3	0.0359	0.0713	0.0102	0.0618
Scenario2_nonlinear	MAR	complete_case	tricube	tau3	1000	0.3	0.0244	0.0722	0.0105	0.0608
Scenario2_nonlinear	MAR	complete_case	gaussian	tau2	1000	0.3	0.1323	0.0752	0.0666	0.1405
Scenario2_nonlinear	MAR	complete_case	beta	tau3	1000	0.3	0.2237	0.0467	0.0743	0.2245
Scenario2_nonlinear	MAR	complete_case	beta	tau2	1000	0.3	0.2394	0.0424	0.0862	0.2402
Scenario2_nonlinear	MAR	complete_case	beta	tau1	1000	0.3	0.2408	0.0419	0.0884	0.2415
Scenario2_nonlinear	MAR	complete_case	bernstein	tau2	1000	0.3	0.1494	0.0792	0.0910	0.1579
Scenario2_nonlinear	MAR	complete_case	tricube	tau2	1000	0.3	0.1524	0.0769	0.0995	0.1628
Scenario2_nonlinear	MAR	complete_case	bernstein	tau3	1000	0.5	0.0440	0.0651	0.0093	0.0647
Scenario2_nonlinear	MAR	complete_case	gaussian	tau3	1000	0.5	0.0459	0.0677	0.0094	0.0663
Scenario2_nonlinear	MAR	complete_case	epanechnikov	tau3	1000	0.5	0.0443	0.0652	0.0099	0.0671
Scenario2_nonlinear	MAR	complete_case	tricube	tau3	1000	0.5	0.0341	0.0700	0.0099	0.0654
Scenario2_nonlinear	MAR	complete_case	gaussian	tau2	1000	0.5	0.1380	0.0793	0.0646	0.1474
Scenario2_nonlinear	MAR	complete_case	beta	tau3	1000	0.5	0.2360	0.0473	0.0812	0.2369
Scenario2_nonlinear	MAR	complete_case	bernstein	tau2	1000	0.5	0.1543	0.0767	0.0852	0.1606
Scenario2_nonlinear	MAR	complete_case	beta	tau2	1000	0.5	0.2409	0.0430	0.0859	0.2419
Scenario2_nonlinear	MAR	complete_case	epanechnikov	tau2	1000	0.5	0.1588	0.0868	0.0984	0.1691
Scenario2_nonlinear	MAR	complete_case	beta	tau1	1000	0.5	0.2557	0.0450	0.1000	0.2564
Scenario2_nonlinear	MAR	complete_case	bernstein	tau3	2000	0	0.0136	0.0516	0.0049	0.0419
Scenario2_nonlinear	MAR	complete_case	epanechnikov	tau3	2000	0	0.0265	0.0474	0.0051	0.0431
Scenario2_nonlinear	MAR	complete_case	gaussian	tau3	2000	0	0.0288	0.0523	0.0059	0.0454
Scenario2_nonlinear	MAR	complete_case	tricube	tau3	2000	0	0.0187	0.0516	0.0060	0.0429
Scenario2_nonlinear	MAR	complete_case	gaussian	tau2	2000	0	0.1044	0.0598	0.0506	0.1113
Scenario2_nonlinear	MAR	complete_case	beta	tau3	2000	0	0.1958	0.0301	0.0562	0.1961
Scenario2_nonlinear	MAR	complete_case	beta	tau2	2000	0	0.2044	0.0274	0.0623	0.2046
Scenario2_nonlinear	MAR	complete_case	beta	tau1	2000	0	0.2154	0.0306	0.0716	0.2157
Scenario2_nonlinear	MAR	complete_case	epanechnikov	tau2	2000	0	0.1283	0.0606	0.0838	0.1352
Scenario2_nonlinear	MAR	complete_case	tricube	tau2	2000	0	0.1281	0.0619	0.0894	0.1385
Scenario2_nonlinear	MAR	complete_case	tricube	tau3	2000	0.1	0.0227	0.0434	0.0043	0.0429
Scenario2_nonlinear	MAR	complete_case	bernstein	tau3	2000	0.1	0.0161	0.0498	0.0045	0.0419
Scenario2_nonlinear	MAR	complete_case	epanechnikov	tau3	2000	0.1	0.0239	0.0474	0.0050	0.0446
Scenario2_nonlinear	MAR	complete_case	gaussian	tau3	2000	0.1	0.0322	0.0507	0.0058	0.0473
Scenario2_nonlinear	MAR	complete_case	beta	tau3	2000	0.1	0.2012	0.0278	0.0593	0.2015
Scenario2_nonlinear	MAR	complete_case	gaussian	tau2	2000	0.1	0.1152	0.0674	0.0593	0.1215
Scenario2_nonlinear	MAR	complete_case	beta	tau2	2000	0.1	0.2098	0.0308	0.0659	0.2102
Scenario2_nonlinear	MAR	complete_case	beta	tau1	2000	0.1	0.2137	0.0297	0.0693	0.2139
Scenario2_nonlinear	MAR	complete_case	epanechnikov	tau2	2000	0.1	0.1275	0.0612	0.0830	0.1347
Scenario2_nonlinear	MAR	complete_case	tricube	tau2	2000	0.1	0.1276	0.0688	0.0858	0.1377
Scenario2_nonlinear	MAR	complete_case	bernstein	tau3	2000	0.3	0.0201	0.0500	0.0045	0.0438
Scenario2_nonlinear	MAR	complete_case	tricube	tau3	2000	0.3	0.0236	0.0462	0.0045	0.0447
Scenario2_nonlinear	MAR	complete_case	epanechnikov	tau3	2000	0.3	0.0307	0.0501	0.0060	0.0490
Scenario2_nonlinear	MAR	complete_case	gaussian	tau3	2000	0.3	0.0360	0.0507	0.0062	0.0500
Scenario2_nonlinear	MAR	complete_case	gaussian	tau2	2000	0.3	0.1115	0.0639	0.0523	0.1175
Scenario2_nonlinear	MAR	complete_case	beta	tau3	2000	0.3	0.2059	0.0332	0.0615	0.2063
Scenario2_nonlinear	MAR	complete_case	beta	tau2	2000	0.3	0.2129	0.0293	0.0671	0.2134
Scenario2_nonlinear	MAR	complete_case	beta	tau1	2000	0.3	0.2222	0.0332	0.0748	0.2226
Scenario2_nonlinear	MAR	complete_case	epanechnikov	tau2	2000	0.3	0.1264	0.0645	0.0798	0.1344
Scenario2_nonlinear	MAR	complete_case	bernstein	tau2	2000	0.3	0.1282	0.0672	0.0834	0.1378
Scenario2_nonlinear	MAR	complete_case	bernstein	tau3	2000	0.5	0.0261	0.0534	0.0053	0.0471
Scenario2_nonlinear	MAR	complete_case	gaussian	tau3	2000	0.5	0.0377	0.0519	0.0060	0.0518
Scenario2_nonlinear	MAR	complete_case	tricube	tau3	2000	0.5	0.0283	0.0531	0.0064	0.0503
Scenario2_nonlinear	MAR	complete_case	epanechnikov	tau3	2000	0.5	0.0363	0.0566	0.0082	0.0537
Scenario2_nonlinear	MAR	complete_case	gaussian	tau2	2000	0.5	0.1110	0.0623	0.0497	0.1179
Scenario2_nonlinear	MAR	complete_case	beta	tau3	2000	0.5	0.2130	0.0351	0.0653	0.2135
Scenario2_nonlinear	MAR	complete_case	beta	tau2	2000	0.5	0.2210	0.0337	0.0712	0.2215
Scenario2_nonlinear	MAR	complete_case	beta	tau1	2000	0.5	0.2263	0.0309	0.0756	0.2268
Scenario2_nonlinear	MAR	complete_case	epanechnikov	tau2	2000	0.5	0.1345	0.0680	0.0811	0.1412
Scenario2_nonlinear	MAR	complete_case	tricube	tau2	2000	0.5	0.1330	0.0660	0.0842	0.1413
Scenario2_nonlinear	MAR	ipw	bernstein	tau3	500	0	0.0407	0.0734	0.0107	0.0672
Scenario2_nonlinear	MAR	ipw	tricube	tau3	500	0	0.0322	0.0719	0.0108	0.0689
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau3	500	0	0.0361	0.0771	0.0124	0.0700
Scenario2_nonlinear	MAR	ipw	gaussian	tau3	500	0	0.0475	0.0914	0.0177	0.0776
Scenario2_nonlinear	MAR	ipw	gaussian	tau2	500	0	0.1541	0.0953	0.0877	0.1653
Scenario2_nonlinear	MAR	ipw	beta	tau3	500	0	0.2402	0.0554	0.0886	0.2410
Scenario2_nonlinear	MAR	ipw	bernstein	tau2	500	0	0.1698	0.0905	0.0999	0.1783
Scenario2_nonlinear	MAR	ipw	beta	tau2	500	0	0.2534	0.0557	0.1007	0.2543
Scenario2_nonlinear	MAR	ipw	tricube	tau2	500	0	0.1693	0.0914	0.1164	0.1837
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau2	500	0	0.1769	0.0895	0.1210	0.1895
Scenario2_nonlinear	MAR	ipw	gaussian	tau3	500	0.1	0.0385	0.0781	0.0111	0.0702
Scenario2_nonlinear	MAR	ipw	tricube	tau3	500	0.1	0.0259	0.0800	0.0125	0.0702
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau3	500	0.1	0.0394	0.0768	0.0131	0.0736
Scenario2_nonlinear	MAR	ipw	bernstein	tau3	500	0.1	0.0466	0.0796	0.0131	0.0715
Scenario2_nonlinear	MAR	ipw	gaussian	tau2	500	0.1	0.1546	0.0915	0.0803	0.1647
Scenario2_nonlinear	MAR	ipw	beta	tau3	500	0.1	0.2467	0.0541	0.0917	0.2475
Scenario2_nonlinear	MAR	ipw	beta	tau2	500	0.1	0.2540	0.0531	0.1010	0.2549
Scenario2_nonlinear	MAR	ipw	bernstein	tau2	500	0.1	0.1708	0.0904	0.1016	0.1783
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau2	500	0.1	0.1765	0.0935	0.1175	0.1878
Scenario2_nonlinear	MAR	ipw	beta	tau1	500	0.1	0.2743	0.0524	0.1219	0.2753
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau3	500	0.3	0.0413	0.0701	0.0101	0.0727
Scenario2_nonlinear	MAR	ipw	bernstein	tau3	500	0.3	0.0605	0.0699	0.0125	0.0794
Scenario2_nonlinear	MAR	ipw	tricube	tau3	500	0.3	0.0328	0.0862	0.0132	0.0758
Scenario2_nonlinear	MAR	ipw	gaussian	tau3	500	0.3	0.0480	0.0858	0.0145	0.0781
Scenario2_nonlinear	MAR	ipw	gaussian	tau2	500	0.3	0.1673	0.1041	0.0938	0.1787
Scenario2_nonlinear	MAR	ipw	beta	tau3	500	0.3	0.2558	0.0530	0.0981	0.2568
Scenario2_nonlinear	MAR	ipw	bernstein	tau2	500	0.3	0.1795	0.0892	0.1007	0.1858
Scenario2_nonlinear	MAR	ipw	beta	tau2	500	0.3	0.2682	0.0537	0.1108	0.2689
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau2	500	0.3	0.1770	0.1082	0.1130	0.1888
Scenario2_nonlinear	MAR	ipw	tricube	tau2	500	0.3	0.1882	0.0917	0.1287	0.1986
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau3	500	0.5	0.0510	0.0848	0.0162	0.0866
Scenario2_nonlinear	MAR	ipw	tricube	tau3	500	0.5	0.0485	0.0888	0.0174	0.0844
Scenario2_nonlinear	MAR	ipw	gaussian	tau3	500	0.5	0.0612	0.0934	0.0184	0.0877
Scenario2_nonlinear	MAR	ipw	bernstein	tau3	500	0.5	0.0814	0.0811	0.0184	0.0947
Scenario2_nonlinear	MAR	ipw	gaussian	tau2	500	0.5	0.1591	0.0959	0.0807	0.1719
Scenario2_nonlinear	MAR	ipw	bernstein	tau2	500	0.5	0.1758	0.0899	0.0862	0.1826
Scenario2_nonlinear	MAR	ipw	beta	tau3	500	0.5	0.2617	0.0652	0.1037	0.2633
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau2	500	0.5	0.1761	0.0953	0.1097	0.1880
Scenario2_nonlinear	MAR	ipw	beta	tau2	500	0.5	0.2795	0.0576	0.1194	0.2807
Scenario2_nonlinear	MAR	ipw	tricube	tau2	500	0.5	0.1929	0.1041	0.1293	0.2066
Scenario2_nonlinear	MAR	ipw	tricube	tau3	1000	0	0.0273	0.0537	0.0061	0.0525
Scenario2_nonlinear	MAR	ipw	bernstein	tau3	1000	0	0.0234	0.0632	0.0075	0.0524
Scenario2_nonlinear	MAR	ipw	gaussian	tau3	1000	0	0.0329	0.0647	0.0083	0.0563
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau3	1000	0	0.0293	0.0674	0.0097	0.0577
Scenario2_nonlinear	MAR	ipw	gaussian	tau2	1000	0	0.1301	0.0756	0.0665	0.1374
Scenario2_nonlinear	MAR	ipw	beta	tau3	1000	0	0.2162	0.0379	0.0692	0.2167
Scenario2_nonlinear	MAR	ipw	beta	tau2	1000	0	0.2279	0.0369	0.0788	0.2283
Scenario2_nonlinear	MAR	ipw	beta	tau1	1000	0	0.2389	0.0433	0.0903	0.2394
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau2	1000	0	0.1454	0.0760	0.0926	0.1547
Scenario2_nonlinear	MAR	ipw	bernstein	tau2	1000	0	0.1514	0.0742	0.0994	0.1610
Scenario2_nonlinear	MAR	ipw	bernstein	tau3	1000	0.1	0.0261	0.0601	0.0066	0.0529
Scenario2_nonlinear	MAR	ipw	tricube	tau3	1000	0.1	0.0260	0.0571	0.0072	0.0550
Scenario2_nonlinear	MAR	ipw	gaussian	tau3	1000	0.1	0.0377	0.0669	0.0093	0.0586
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau3	1000	0.1	0.0322	0.0658	0.0100	0.0592
Scenario2_nonlinear	MAR	ipw	gaussian	tau2	1000	0.1	0.1255	0.0740	0.0599	0.1345
Scenario2_nonlinear	MAR	ipw	beta	tau3	1000	0.1	0.2196	0.0374	0.0712	0.2201
Scenario2_nonlinear	MAR	ipw	beta	tau2	1000	0.1	0.2312	0.0394	0.0814	0.2317
Scenario2_nonlinear	MAR	ipw	beta	tau1	1000	0.1	0.2397	0.0408	0.0893	0.2402
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau2	1000	0.1	0.1456	0.0776	0.0941	0.1547
Scenario2_nonlinear	MAR	ipw	bernstein	tau2	1000	0.1	0.1513	0.0799	0.0991	0.1600
Scenario2_nonlinear	MAR	ipw	tricube	tau3	1000	0.3	0.0253	0.0606	0.0077	0.0557
Scenario2_nonlinear	MAR	ipw	bernstein	tau3	1000	0.3	0.0325	0.0627	0.0078	0.0586
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau3	1000	0.3	0.0385	0.0573	0.0079	0.0604
Scenario2_nonlinear	MAR	ipw	gaussian	tau3	1000	0.3	0.0475	0.0714	0.0115	0.0672
Scenario2_nonlinear	MAR	ipw	gaussian	tau2	1000	0.3	0.1368	0.0784	0.0698	0.1449
Scenario2_nonlinear	MAR	ipw	beta	tau3	1000	0.3	0.2312	0.0412	0.0795	0.2317
Scenario2_nonlinear	MAR	ipw	beta	tau2	1000	0.3	0.2390	0.0401	0.0868	0.2395
Scenario2_nonlinear	MAR	ipw	bernstein	tau2	1000	0.3	0.1485	0.0775	0.0904	0.1579
Scenario2_nonlinear	MAR	ipw	beta	tau1	1000	0.3	0.2448	0.0447	0.0927	0.2454
Scenario2_nonlinear	MAR	ipw	tricube	tau2	1000	0.3	0.1515	0.0856	0.0997	0.1637
Scenario2_nonlinear	MAR	ipw	tricube	tau3	1000	0.5	0.0382	0.0629	0.0086	0.0640
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau3	1000	0.5	0.0416	0.0679	0.0104	0.0654
Scenario2_nonlinear	MAR	ipw	bernstein	tau3	1000	0.5	0.0508	0.0673	0.0110	0.0697
Scenario2_nonlinear	MAR	ipw	gaussian	tau3	1000	0.5	0.0522	0.0737	0.0125	0.0704
Scenario2_nonlinear	MAR	ipw	gaussian	tau2	1000	0.5	0.1374	0.0793	0.0652	0.1460
Scenario2_nonlinear	MAR	ipw	beta	tau3	1000	0.5	0.2443	0.0461	0.0881	0.2450
Scenario2_nonlinear	MAR	ipw	bernstein	tau2	1000	0.5	0.1625	0.0836	0.0924	0.1686
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau2	1000	0.5	0.1568	0.0861	0.0935	0.1645
Scenario2_nonlinear	MAR	ipw	beta	tau2	1000	0.5	0.2542	0.0448	0.0978	0.2549
Scenario2_nonlinear	MAR	ipw	tricube	tau2	1000	0.5	0.1544	0.0809	0.0982	0.1653
Scenario2_nonlinear	MAR	ipw	tricube	tau3	2000	0	0.0233	0.0425	0.0042	0.0422
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau3	2000	0	0.0251	0.0439	0.0046	0.0431
Scenario2_nonlinear	MAR	ipw	bernstein	tau3	2000	0	0.0158	0.0540	0.0056	0.0421
Scenario2_nonlinear	MAR	ipw	gaussian	tau3	2000	0	0.0293	0.0502	0.0057	0.0443
Scenario2_nonlinear	MAR	ipw	gaussian	tau2	2000	0	0.1044	0.0596	0.0502	0.1116
Scenario2_nonlinear	MAR	ipw	beta	tau3	2000	0	0.1943	0.0270	0.0549	0.1946
Scenario2_nonlinear	MAR	ipw	beta	tau2	2000	0	0.2018	0.0294	0.0602	0.2020
Scenario2_nonlinear	MAR	ipw	beta	tau1	2000	0	0.2115	0.0307	0.0685	0.2118
Scenario2_nonlinear	MAR	ipw	tricube	tau2	2000	0	0.1308	0.0632	0.0841	0.1382
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau2	2000	0	0.1323	0.0630	0.0884	0.1393
Scenario2_nonlinear	MAR	ipw	tricube	tau3	2000	0.1	0.0202	0.0469	0.0045	0.0426
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau3	2000	0.1	0.0276	0.0434	0.0047	0.0444
Scenario2_nonlinear	MAR	ipw	bernstein	tau3	2000	0.1	0.0172	0.0532	0.0053	0.0421
Scenario2_nonlinear	MAR	ipw	gaussian	tau3	2000	0.1	0.0288	0.0520	0.0060	0.0461
Scenario2_nonlinear	MAR	ipw	gaussian	tau2	2000	0.1	0.1024	0.0598	0.0474	0.1096
Scenario2_nonlinear	MAR	ipw	beta	tau3	2000	0.1	0.1995	0.0287	0.0582	0.1998
Scenario2_nonlinear	MAR	ipw	beta	tau2	2000	0.1	0.2078	0.0297	0.0645	0.2081
Scenario2_nonlinear	MAR	ipw	beta	tau1	2000	0.1	0.2152	0.0297	0.0709	0.2155
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau2	2000	0.1	0.1287	0.0623	0.0825	0.1354
Scenario2_nonlinear	MAR	ipw	bernstein	tau2	2000	0.1	0.1311	0.0656	0.0890	0.1412
Scenario2_nonlinear	MAR	ipw	gaussian	tau3	2000	0.3	0.0283	0.0477	0.0047	0.0447
Scenario2_nonlinear	MAR	ipw	tricube	tau3	2000	0.3	0.0273	0.0471	0.0052	0.0469
Scenario2_nonlinear	MAR	ipw	bernstein	tau3	2000	0.3	0.0220	0.0564	0.0061	0.0462
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau3	2000	0.3	0.0334	0.0508	0.0067	0.0502
Scenario2_nonlinear	MAR	ipw	gaussian	tau2	2000	0.3	0.1103	0.0644	0.0506	0.1160
Scenario2_nonlinear	MAR	ipw	beta	tau3	2000	0.3	0.2070	0.0292	0.0631	0.2073
Scenario2_nonlinear	MAR	ipw	beta	tau2	2000	0.3	0.2158	0.0295	0.0688	0.2162
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau2	2000	0.3	0.1254	0.0663	0.0727	0.1312
Scenario2_nonlinear	MAR	ipw	beta	tau1	2000	0.3	0.2236	0.0312	0.0755	0.2239
Scenario2_nonlinear	MAR	ipw	bernstein	tau2	2000	0.3	0.1273	0.0648	0.0818	0.1364
Scenario2_nonlinear	MAR	ipw	tricube	tau3	2000	0.5	0.0316	0.0497	0.0055	0.0494
Scenario2_nonlinear	MAR	ipw	gaussian	tau3	2000	0.5	0.0414	0.0535	0.0069	0.0540
Scenario2_nonlinear	MAR	ipw	bernstein	tau3	2000	0.5	0.0303	0.0585	0.0072	0.0512
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau3	2000	0.5	0.0356	0.0583	0.0083	0.0539
Scenario2_nonlinear	MAR	ipw	gaussian	tau2	2000	0.5	0.1171	0.0668	0.0536	0.1229
Scenario2_nonlinear	MAR	ipw	beta	tau3	2000	0.5	0.2239	0.0333	0.0743	0.2243
Scenario2_nonlinear	MAR	ipw	epanechnikov	tau2	2000	0.5	0.1327	0.0634	0.0760	0.1384
Scenario2_nonlinear	MAR	ipw	beta	tau2	2000	0.5	0.2272	0.0319	0.0765	0.2276
Scenario2_nonlinear	MAR	ipw	tricube	tau2	2000	0.5	0.1294	0.0685	0.0798	0.1383
Scenario2_nonlinear	MAR	ipw	beta	tau1	2000	0.5	0.2347	0.0304	0.0834	0.2351
Scenario2_nonlinear	MCAR	complete_case	tricube	tau3	500	0	0.0283	0.0796	0.0125	0.0710
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau3	500	0	0.0338	0.0803	0.0128	0.0715
Scenario2_nonlinear	MCAR	complete_case	bernstein	tau3	500	0	0.0472	0.0831	0.0151	0.0734
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau3	500	0	0.0501	0.0899	0.0165	0.0781
Scenario2_nonlinear	MCAR	complete_case	beta	tau3	500	0	0.2429	0.0546	0.0903	0.2438
Scenario2_nonlinear	MCAR	complete_case	beta	tau2	500	0	0.2457	0.0495	0.0928	0.2463
Scenario2_nonlinear	MCAR	complete_case	bernstein	tau2	500	0	0.1719	0.0882	0.1031	0.1800
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau2	500	0	0.1695	0.0943	0.1038	0.1799
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau2	500	0	0.1734	0.0904	0.1151	0.1843
Scenario2_nonlinear	MCAR	complete_case	beta	tau1	500	0	0.2711	0.0573	0.1208	0.2718
Scenario2_nonlinear	MCAR	complete_case	tricube	tau3	500	0.1	0.0265	0.0786	0.0115	0.0710
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau3	500	0.1	0.0374	0.0766	0.0127	0.0721
Scenario2_nonlinear	MCAR	complete_case	bernstein	tau3	500	0.1	0.0473	0.0839	0.0145	0.0735
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau3	500	0.1	0.0459	0.0881	0.0154	0.0754
Scenario2_nonlinear	MCAR	complete_case	beta	tau3	500	0.1	0.2384	0.0588	0.0862	0.2393
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau2	500	0.1	0.1631	0.0985	0.0938	0.1745
Scenario2_nonlinear	MCAR	complete_case	bernstein	tau2	500	0.1	0.1691	0.0847	0.0958	0.1757
Scenario2_nonlinear	MCAR	complete_case	beta	tau2	500	0.1	0.2583	0.0488	0.1028	0.2589
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau2	500	0.1	0.1761	0.0961	0.1202	0.1883
Scenario2_nonlinear	MCAR	complete_case	beta	tau1	500	0.1	0.2730	0.0564	0.1206	0.2738
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau3	500	0.3	0.0433	0.0819	0.0150	0.0814
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau3	500	0.3	0.0462	0.0899	0.0151	0.0832
Scenario2_nonlinear	MCAR	complete_case	tricube	tau3	500	0.3	0.0340	0.0957	0.0178	0.0839
Scenario2_nonlinear	MCAR	complete_case	bernstein	tau3	500	0.3	0.0625	0.0873	0.0181	0.0839
Scenario2_nonlinear	MCAR	complete_case	beta	tau3	500	0.3	0.2491	0.0671	0.0963	0.2502
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau2	500	0.3	0.1675	0.1021	0.0968	0.1804
Scenario2_nonlinear	MCAR	complete_case	bernstein	tau2	500	0.3	0.1816	0.0958	0.1077	0.1916
Scenario2_nonlinear	MCAR	complete_case	beta	tau2	500	0.3	0.2694	0.0638	0.1159	0.2705
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau2	500	0.3	0.1823	0.1005	0.1217	0.1968
Scenario2_nonlinear	MCAR	complete_case	beta	tau1	500	0.3	0.2862	0.0624	0.1335	0.2874
Scenario2_nonlinear	MCAR	complete_case	tricube	tau3	500	0.5	0.0376	0.0892	0.0164	0.0886
Scenario2_nonlinear	MCAR	complete_case	bernstein	tau3	500	0.5	0.0697	0.0870	0.0178	0.0911
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau3	500	0.5	0.0399	0.1146	0.0228	0.0950
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau3	500	0.5	0.0557	0.1135	0.0243	0.0987
Scenario2_nonlinear	MCAR	complete_case	beta	tau3	500	0.5	0.2525	0.0714	0.0978	0.2540
Scenario2_nonlinear	MCAR	complete_case	bernstein	tau2	500	0.5	0.1970	0.1012	0.1101	0.2055
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau2	500	0.5	0.1907	0.1212	0.1207	0.2059
Scenario2_nonlinear	MCAR	complete_case	beta	tau2	500	0.5	0.2779	0.0712	0.1238	0.2796
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau2	500	0.5	0.2091	0.1172	0.1466	0.2259
Scenario2_nonlinear	MCAR	complete_case	tricube	tau2	500	0.5	0.2052	0.1082	0.1502	0.2253
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau3	1000	0	0.0293	0.0572	0.0062	0.0526
Scenario2_nonlinear	MCAR	complete_case	bernstein	tau3	1000	0	0.0230	0.0623	0.0074	0.0529
Scenario2_nonlinear	MCAR	complete_case	tricube	tau3	1000	0	0.0224	0.0588	0.0075	0.0538
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau3	1000	0	0.0294	0.0580	0.0075	0.0557
Scenario2_nonlinear	MCAR	complete_case	beta	tau3	1000	0	0.2174	0.0356	0.0698	0.2178
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau2	1000	0	0.1328	0.0782	0.0712	0.1415
Scenario2_nonlinear	MCAR	complete_case	beta	tau2	1000	0	0.2280	0.0365	0.0787	0.2285
Scenario2_nonlinear	MCAR	complete_case	beta	tau1	1000	0	0.2362	0.0388	0.0864	0.2367
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau2	1000	0	0.1455	0.0764	0.0976	0.1568
Scenario2_nonlinear	MCAR	complete_case	bernstein	tau2	1000	0	0.1508	0.0779	0.1012	0.1615
Scenario2_nonlinear	MCAR	complete_case	tricube	tau3	1000	0.1	0.0275	0.0567	0.0070	0.0559
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau3	1000	0.1	0.0363	0.0572	0.0082	0.0580
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau3	1000	0.1	0.0374	0.0690	0.0098	0.0606
Scenario2_nonlinear	MCAR	complete_case	bernstein	tau3	1000	0.1	0.0244	0.0714	0.0098	0.0563
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau2	1000	0.1	0.1250	0.0767	0.0626	0.1348
Scenario2_nonlinear	MCAR	complete_case	beta	tau3	1000	0.1	0.2196	0.0451	0.0723	0.2201
Scenario2_nonlinear	MCAR	complete_case	beta	tau2	1000	0.1	0.2285	0.0381	0.0781	0.2290
Scenario2_nonlinear	MCAR	complete_case	beta	tau1	1000	0.1	0.2410	0.0431	0.0921	0.2416
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau2	1000	0.1	0.1472	0.0747	0.0986	0.1571
Scenario2_nonlinear	MCAR	complete_case	bernstein	tau2	1000	0.1	0.1506	0.0814	0.1005	0.1604
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau3	1000	0.3	0.0317	0.0663	0.0094	0.0620
Scenario2_nonlinear	MCAR	complete_case	tricube	tau3	1000	0.3	0.0224	0.0737	0.0102	0.0621
Scenario2_nonlinear	MCAR	complete_case	bernstein	tau3	1000	0.3	0.0325	0.0776	0.0120	0.0615
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau3	1000	0.3	0.0455	0.0797	0.0140	0.0686
Scenario2_nonlinear	MCAR	complete_case	beta	tau3	1000	0.3	0.2236	0.0449	0.0749	0.2244
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau2	1000	0.3	0.1439	0.0859	0.0780	0.1538
Scenario2_nonlinear	MCAR	complete_case	beta	tau2	1000	0.3	0.2311	0.0453	0.0810	0.2317
Scenario2_nonlinear	MCAR	complete_case	beta	tau1	1000	0.3	0.2511	0.0453	0.0997	0.2518
Scenario2_nonlinear	MCAR	complete_case	bernstein	tau2	1000	0.3	0.1542	0.0847	0.1006	0.1638
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau2	1000	0.3	0.1538	0.0894	0.1024	0.1681
Scenario2_nonlinear	MCAR	complete_case	tricube	tau3	1000	0.5	0.0301	0.0716	0.0105	0.0664
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau3	1000	0.5	0.0306	0.0753	0.0121	0.0686
Scenario2_nonlinear	MCAR	complete_case	bernstein	tau3	1000	0.5	0.0470	0.0855	0.0152	0.0737
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau3	1000	0.5	0.0451	0.0889	0.0162	0.0763
Scenario2_nonlinear	MCAR	complete_case	beta	tau3	1000	0.5	0.2360	0.0536	0.0842	0.2368
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau2	1000	0.5	0.1543	0.0942	0.0864	0.1668
Scenario2_nonlinear	MCAR	complete_case	beta	tau2	1000	0.5	0.2482	0.0533	0.0956	0.2491
Scenario2_nonlinear	MCAR	complete_case	bernstein	tau2	1000	0.5	0.1678	0.0890	0.1037	0.1765
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau2	1000	0.5	0.1714	0.0921	0.1193	0.1872
Scenario2_nonlinear	MCAR	complete_case	beta	tau1	1000	0.5	0.2718	0.0552	0.1205	0.2724
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau3	2000	0	0.0264	0.0438	0.0041	0.0419
Scenario2_nonlinear	MCAR	complete_case	bernstein	tau3	2000	0	0.0169	0.0475	0.0044	0.0412
Scenario2_nonlinear	MCAR	complete_case	tricube	tau3	2000	0	0.0204	0.0451	0.0045	0.0417
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau3	2000	0	0.0254	0.0473	0.0052	0.0447
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau2	2000	0	0.1034	0.0632	0.0520	0.1108
Scenario2_nonlinear	MCAR	complete_case	beta	tau3	2000	0	0.1924	0.0267	0.0540	0.1927
Scenario2_nonlinear	MCAR	complete_case	beta	tau2	2000	0	0.2036	0.0290	0.0618	0.2039
Scenario2_nonlinear	MCAR	complete_case	beta	tau1	2000	0	0.2092	0.0280	0.0668	0.2095
Scenario2_nonlinear	MCAR	complete_case	tricube	tau2	2000	0	0.1265	0.0663	0.0837	0.1344
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau2	2000	0	0.1315	0.0603	0.0868	0.1376
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau3	2000	0.1	0.0234	0.0507	0.0060	0.0441
Scenario2_nonlinear	MCAR	complete_case	tricube	tau3	2000	0.1	0.0229	0.0566	0.0064	0.0473
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau3	2000	0.1	0.0329	0.0559	0.0076	0.0474
Scenario2_nonlinear	MCAR	complete_case	bernstein	tau3	2000	0.1	0.0163	0.0672	0.0089	0.0484
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau2	2000	0.1	0.1086	0.0655	0.0565	0.1164
Scenario2_nonlinear	MCAR	complete_case	beta	tau3	2000	0.1	0.1994	0.0293	0.0581	0.1997
Scenario2_nonlinear	MCAR	complete_case	beta	tau2	2000	0.1	0.2044	0.0291	0.0621	0.2047
Scenario2_nonlinear	MCAR	complete_case	beta	tau1	2000	0.1	0.2157	0.0304	0.0714	0.2161
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau2	2000	0.1	0.1241	0.0659	0.0764	0.1317
Scenario2_nonlinear	MCAR	complete_case	tricube	tau2	2000	0.1	0.1296	0.0607	0.0874	0.1380
Scenario2_nonlinear	MCAR	complete_case	bernstein	tau3	2000	0.3	0.0192	0.0514	0.0048	0.0451
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau3	2000	0.3	0.0290	0.0530	0.0064	0.0505
Scenario2_nonlinear	MCAR	complete_case	tricube	tau3	2000	0.3	0.0216	0.0564	0.0067	0.0480
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau3	2000	0.3	0.0325	0.0614	0.0083	0.0526
Scenario2_nonlinear	MCAR	complete_case	beta	tau3	2000	0.3	0.2076	0.0351	0.0637	0.2079
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau2	2000	0.3	0.1220	0.0708	0.0654	0.1288
Scenario2_nonlinear	MCAR	complete_case	beta	tau2	2000	0.3	0.2136	0.0327	0.0680	0.2139
Scenario2_nonlinear	MCAR	complete_case	beta	tau1	2000	0.3	0.2247	0.0343	0.0778	0.2250
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau2	2000	0.3	0.1393	0.0686	0.0931	0.1467
Scenario2_nonlinear	MCAR	complete_case	bernstein	tau2	2000	0.3	0.1417	0.0709	0.0936	0.1511
Scenario2_nonlinear	MCAR	complete_case	bernstein	tau3	2000	0.5	0.0228	0.0559	0.0058	0.0512
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau3	2000	0.5	0.0271	0.0596	0.0069	0.0516
Scenario2_nonlinear	MCAR	complete_case	tricube	tau3	2000	0.5	0.0234	0.0593	0.0071	0.0533
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau3	2000	0.5	0.0316	0.0568	0.0075	0.0545
Scenario2_nonlinear	MCAR	complete_case	gaussian	tau2	2000	0.5	0.1265	0.0735	0.0655	0.1354
Scenario2_nonlinear	MCAR	complete_case	beta	tau3	2000	0.5	0.2194	0.0381	0.0718	0.2199
Scenario2_nonlinear	MCAR	complete_case	beta	tau2	2000	0.5	0.2247	0.0351	0.0762	0.2251
Scenario2_nonlinear	MCAR	complete_case	beta	tau1	2000	0.5	0.2421	0.0409	0.0923	0.2425
Scenario2_nonlinear	MCAR	complete_case	epanechnikov	tau2	2000	0.5	0.1471	0.0806	0.0997	0.1560
Scenario2_nonlinear	MCAR	complete_case	bernstein	tau2	2000	0.5	0.1494	0.0809	0.0997	0.1600
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau3	500	0	0.0347	0.0698	0.0099	0.0677
Scenario2_nonlinear	MCAR	ipw	gaussian	tau3	500	0	0.0417	0.0803	0.0128	0.0704
Scenario2_nonlinear	MCAR	ipw	tricube	tau3	500	0	0.0286	0.0840	0.0139	0.0728
Scenario2_nonlinear	MCAR	ipw	bernstein	tau3	500	0	0.0514	0.0858	0.0161	0.0761
Scenario2_nonlinear	MCAR	ipw	gaussian	tau2	500	0	0.1541	0.0916	0.0839	0.1654
Scenario2_nonlinear	MCAR	ipw	beta	tau3	500	0	0.2403	0.0499	0.0873	0.2410
Scenario2_nonlinear	MCAR	ipw	beta	tau2	500	0	0.2529	0.0509	0.0995	0.2536
Scenario2_nonlinear	MCAR	ipw	bernstein	tau2	500	0	0.1706	0.0962	0.1073	0.1797
Scenario2_nonlinear	MCAR	ipw	tricube	tau2	500	0	0.1702	0.0861	0.1164	0.1842
Scenario2_nonlinear	MCAR	ipw	beta	tau1	500	0	0.2685	0.0560	0.1164	0.2692
Scenario2_nonlinear	MCAR	ipw	tricube	tau3	500	0.1	0.0331	0.0764	0.0122	0.0709
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau3	500	0.1	0.0330	0.0777	0.0132	0.0702
Scenario2_nonlinear	MCAR	ipw	bernstein	tau3	500	0.1	0.0436	0.0816	0.0133	0.0730
Scenario2_nonlinear	MCAR	ipw	gaussian	tau3	500	0.1	0.0518	0.0930	0.0176	0.0812
Scenario2_nonlinear	MCAR	ipw	beta	tau3	500	0.1	0.2325	0.0529	0.0812	0.2336
Scenario2_nonlinear	MCAR	ipw	gaussian	tau2	500	0.1	0.1570	0.1010	0.0868	0.1688
Scenario2_nonlinear	MCAR	ipw	beta	tau2	500	0.1	0.2538	0.0579	0.1010	0.2549
Scenario2_nonlinear	MCAR	ipw	bernstein	tau2	500	0.1	0.1758	0.0946	0.1053	0.1835
Scenario2_nonlinear	MCAR	ipw	beta	tau1	500	0.1	0.2663	0.0547	0.1135	0.2670
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau2	500	0.1	0.1782	0.0891	0.1195	0.1897
Scenario2_nonlinear	MCAR	ipw	tricube	tau3	500	0.3	0.0291	0.0828	0.0129	0.0759
Scenario2_nonlinear	MCAR	ipw	bernstein	tau3	500	0.3	0.0546	0.0848	0.0152	0.0819
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau3	500	0.3	0.0356	0.0900	0.0161	0.0796
Scenario2_nonlinear	MCAR	ipw	gaussian	tau3	500	0.3	0.0478	0.0939	0.0163	0.0852
Scenario2_nonlinear	MCAR	ipw	beta	tau3	500	0.3	0.2452	0.0655	0.0931	0.2464
Scenario2_nonlinear	MCAR	ipw	gaussian	tau2	500	0.3	0.1731	0.1106	0.1064	0.1876
Scenario2_nonlinear	MCAR	ipw	bernstein	tau2	500	0.3	0.1868	0.0989	0.1095	0.1941
Scenario2_nonlinear	MCAR	ipw	beta	tau2	500	0.3	0.2709	0.0608	0.1152	0.2722
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau2	500	0.3	0.1909	0.1040	0.1314	0.2041
Scenario2_nonlinear	MCAR	ipw	tricube	tau2	500	0.3	0.1867	0.1016	0.1354	0.2045
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau3	500	0.5	0.0413	0.0895	0.0170	0.0878
Scenario2_nonlinear	MCAR	ipw	tricube	tau3	500	0.5	0.0326	0.0927	0.0173	0.0877
Scenario2_nonlinear	MCAR	ipw	bernstein	tau3	500	0.5	0.0684	0.1005	0.0210	0.0948
Scenario2_nonlinear	MCAR	ipw	gaussian	tau3	500	0.5	0.0547	0.1070	0.0221	0.0930
Scenario2_nonlinear	MCAR	ipw	beta	tau3	500	0.5	0.2549	0.0685	0.0999	0.2565
Scenario2_nonlinear	MCAR	ipw	gaussian	tau2	500	0.5	0.1834	0.1169	0.1113	0.1986
Scenario2_nonlinear	MCAR	ipw	bernstein	tau2	500	0.5	0.2100	0.0995	0.1191	0.2156
Scenario2_nonlinear	MCAR	ipw	beta	tau2	500	0.5	0.2940	0.0721	0.1400	0.2956
Scenario2_nonlinear	MCAR	ipw	tricube	tau2	500	0.5	0.2011	0.1084	0.1422	0.2203
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau2	500	0.5	0.2046	0.1072	0.1444	0.2221
Scenario2_nonlinear	MCAR	ipw	tricube	tau3	1000	0	0.0252	0.0568	0.0067	0.0540
Scenario2_nonlinear	MCAR	ipw	bernstein	tau3	1000	0	0.0255	0.0679	0.0091	0.0550
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau3	1000	0	0.0279	0.0662	0.0102	0.0571
Scenario2_nonlinear	MCAR	ipw	gaussian	tau3	1000	0	0.0398	0.0707	0.0110	0.0606
Scenario2_nonlinear	MCAR	ipw	beta	tau3	1000	0	0.2169	0.0419	0.0706	0.2175
Scenario2_nonlinear	MCAR	ipw	beta	tau2	1000	0	0.2245	0.0375	0.0761	0.2249
Scenario2_nonlinear	MCAR	ipw	gaussian	tau2	1000	0	0.1377	0.0781	0.0778	0.1474
Scenario2_nonlinear	MCAR	ipw	beta	tau1	1000	0	0.2363	0.0394	0.0876	0.2367
Scenario2_nonlinear	MCAR	ipw	bernstein	tau2	1000	0	0.1483	0.0766	0.0968	0.1578
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau2	1000	0	0.1483	0.0776	0.0975	0.1577
Scenario2_nonlinear	MCAR	ipw	bernstein	tau3	1000	0.1	0.0254	0.0625	0.0069	0.0542
Scenario2_nonlinear	MCAR	ipw	gaussian	tau3	1000	0.1	0.0400	0.0706	0.0104	0.0620
Scenario2_nonlinear	MCAR	ipw	tricube	tau3	1000	0.1	0.0210	0.0748	0.0104	0.0595
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau3	1000	0.1	0.0310	0.0745	0.0113	0.0604
Scenario2_nonlinear	MCAR	ipw	beta	tau3	1000	0.1	0.2208	0.0404	0.0729	0.2214
Scenario2_nonlinear	MCAR	ipw	gaussian	tau2	1000	0.1	0.1338	0.0787	0.0748	0.1431
Scenario2_nonlinear	MCAR	ipw	beta	tau2	1000	0.1	0.2281	0.0411	0.0787	0.2287
Scenario2_nonlinear	MCAR	ipw	beta	tau1	1000	0.1	0.2423	0.0459	0.0937	0.2428
Scenario2_nonlinear	MCAR	ipw	bernstein	tau2	1000	0.1	0.1488	0.0794	0.0956	0.1582
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau2	1000	0.1	0.1545	0.0836	0.1028	0.1634
Scenario2_nonlinear	MCAR	ipw	tricube	tau3	1000	0.3	0.0252	0.0646	0.0085	0.0594
Scenario2_nonlinear	MCAR	ipw	bernstein	tau3	1000	0.3	0.0270	0.0726	0.0096	0.0611
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau3	1000	0.3	0.0322	0.0685	0.0101	0.0636
Scenario2_nonlinear	MCAR	ipw	gaussian	tau3	1000	0.3	0.0453	0.0832	0.0150	0.0708
Scenario2_nonlinear	MCAR	ipw	beta	tau3	1000	0.3	0.2253	0.0435	0.0761	0.2258
Scenario2_nonlinear	MCAR	ipw	gaussian	tau2	1000	0.3	0.1447	0.0857	0.0788	0.1541
Scenario2_nonlinear	MCAR	ipw	beta	tau2	1000	0.3	0.2408	0.0441	0.0882	0.2414
Scenario2_nonlinear	MCAR	ipw	beta	tau1	1000	0.3	0.2533	0.0430	0.1013	0.2538
Scenario2_nonlinear	MCAR	ipw	bernstein	tau2	1000	0.3	0.1614	0.0831	0.1043	0.1704
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau2	1000	0.3	0.1656	0.0843	0.1112	0.1756
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau3	1000	0.5	0.0280	0.0676	0.0095	0.0663
Scenario2_nonlinear	MCAR	ipw	tricube	tau3	1000	0.5	0.0316	0.0707	0.0106	0.0680
Scenario2_nonlinear	MCAR	ipw	gaussian	tau3	1000	0.5	0.0448	0.0758	0.0115	0.0711
Scenario2_nonlinear	MCAR	ipw	bernstein	tau3	1000	0.5	0.0429	0.0842	0.0140	0.0718
Scenario2_nonlinear	MCAR	ipw	beta	tau3	1000	0.5	0.2326	0.0534	0.0825	0.2335
Scenario2_nonlinear	MCAR	ipw	gaussian	tau2	1000	0.5	0.1546	0.0918	0.0853	0.1659
Scenario2_nonlinear	MCAR	ipw	beta	tau2	1000	0.5	0.2506	0.0532	0.0968	0.2515
Scenario2_nonlinear	MCAR	ipw	bernstein	tau2	1000	0.5	0.1702	0.0885	0.1026	0.1800
Scenario2_nonlinear	MCAR	ipw	beta	tau1	1000	0.5	0.2613	0.0505	0.1093	0.2621
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau2	1000	0.5	0.1756	0.0953	0.1172	0.1869
Scenario2_nonlinear	MCAR	ipw	tricube	tau3	2000	0	0.0203	0.0415	0.0037	0.0409
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau3	2000	0	0.0265	0.0457	0.0052	0.0442
Scenario2_nonlinear	MCAR	ipw	gaussian	tau3	2000	0	0.0264	0.0508	0.0055	0.0448
Scenario2_nonlinear	MCAR	ipw	bernstein	tau3	2000	0	0.0143	0.0582	0.0064	0.0438
Scenario2_nonlinear	MCAR	ipw	gaussian	tau2	2000	0	0.1058	0.0642	0.0523	0.1124
Scenario2_nonlinear	MCAR	ipw	beta	tau3	2000	0	0.1946	0.0294	0.0555	0.1948
Scenario2_nonlinear	MCAR	ipw	beta	tau2	2000	0	0.2047	0.0304	0.0629	0.2050
Scenario2_nonlinear	MCAR	ipw	beta	tau1	2000	0	0.2118	0.0285	0.0685	0.2121
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau2	2000	0	0.1252	0.0651	0.0804	0.1313
Scenario2_nonlinear	MCAR	ipw	tricube	tau2	2000	0	0.1277	0.0646	0.0847	0.1360
Scenario2_nonlinear	MCAR	ipw	bernstein	tau3	2000	0.1	0.0156	0.0492	0.0044	0.0411
Scenario2_nonlinear	MCAR	ipw	tricube	tau3	2000	0.1	0.0220	0.0506	0.0057	0.0456
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau3	2000	0.1	0.0273	0.0496	0.0063	0.0465
Scenario2_nonlinear	MCAR	ipw	gaussian	tau3	2000	0.1	0.0292	0.0551	0.0067	0.0462
Scenario2_nonlinear	MCAR	ipw	gaussian	tau2	2000	0.1	0.1058	0.0649	0.0524	0.1142
Scenario2_nonlinear	MCAR	ipw	beta	tau3	2000	0.1	0.2015	0.0290	0.0598	0.2018
Scenario2_nonlinear	MCAR	ipw	beta	tau2	2000	0.1	0.2065	0.0296	0.0635	0.2068
Scenario2_nonlinear	MCAR	ipw	beta	tau1	2000	0.1	0.2117	0.0291	0.0676	0.2120
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau2	2000	0.1	0.1313	0.0680	0.0847	0.1380
Scenario2_nonlinear	MCAR	ipw	tricube	tau2	2000	0.1	0.1301	0.0660	0.0870	0.1391
Scenario2_nonlinear	MCAR	ipw	bernstein	tau3	2000	0.3	0.0176	0.0576	0.0061	0.0474
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau3	2000	0.3	0.0266	0.0526	0.0062	0.0492
Scenario2_nonlinear	MCAR	ipw	tricube	tau3	2000	0.3	0.0216	0.0579	0.0073	0.0492
Scenario2_nonlinear	MCAR	ipw	gaussian	tau3	2000	0.3	0.0371	0.0671	0.0109	0.0558
Scenario2_nonlinear	MCAR	ipw	beta	tau3	2000	0.3	0.2055	0.0348	0.0624	0.2059
Scenario2_nonlinear	MCAR	ipw	gaussian	tau2	2000	0.3	0.1243	0.0693	0.0670	0.1316
Scenario2_nonlinear	MCAR	ipw	beta	tau2	2000	0.3	0.2168	0.0343	0.0711	0.2172
Scenario2_nonlinear	MCAR	ipw	beta	tau1	2000	0.3	0.2217	0.0299	0.0753	0.2220
Scenario2_nonlinear	MCAR	ipw	bernstein	tau2	2000	0.3	0.1391	0.0724	0.0928	0.1500
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau2	2000	0.3	0.1408	0.0666	0.0949	0.1482
Scenario2_nonlinear	MCAR	ipw	gaussian	tau3	2000	0.5	0.0311	0.0602	0.0070	0.0537
Scenario2_nonlinear	MCAR	ipw	tricube	tau3	2000	0.5	0.0223	0.0673	0.0091	0.0559
Scenario2_nonlinear	MCAR	ipw	bernstein	tau3	2000	0.5	0.0258	0.0715	0.0100	0.0558
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau3	2000	0.5	0.0306	0.0694	0.0105	0.0567
Scenario2_nonlinear	MCAR	ipw	gaussian	tau2	2000	0.5	0.1234	0.0725	0.0625	0.1325
Scenario2_nonlinear	MCAR	ipw	beta	tau3	2000	0.5	0.2200	0.0430	0.0731	0.2205
Scenario2_nonlinear	MCAR	ipw	beta	tau2	2000	0.5	0.2274	0.0360	0.0777	0.2278
Scenario2_nonlinear	MCAR	ipw	bernstein	tau2	2000	0.5	0.1460	0.0744	0.0881	0.1543
Scenario2_nonlinear	MCAR	ipw	beta	tau1	2000	0.5	0.2383	0.0393	0.0888	0.2386
Scenario2_nonlinear	MCAR	ipw	epanechnikov	tau2	2000	0.5	0.1524	0.0798	0.1011	0.1623
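
For readers who wish to post-process the rows above, the following is a minimal sketch rather than the author's code. It assumes each row holds eleven tab-separated fields. The header row is not part of this excerpt, so the field names (`scenario`, `mechanism`, `estimator`, `kernel`, `functional`, `n`, `level`, and the four numeric columns `m1` to `m4`) are hypothetical placeholders. Within each configuration the rows appear to be ordered by the third numeric column, so the sketch uses it as the default ranking criterion.

```python
import csv
import io

# Three rows copied verbatim from the table above (tab-separated).
SAMPLE = (
    "Scenario2_nonlinear\tMCAR\tipw\ttricube\ttau3\t2000\t0\t0.0203\t0.0415\t0.0037\t0.0409\n"
    "Scenario2_nonlinear\tMCAR\tipw\tbeta\ttau3\t2000\t0\t0.1946\t0.0294\t0.0555\t0.1948\n"
    "Scenario2_nonlinear\tMCAR\tipw\tgaussian\ttau3\t2000\t0\t0.0264\t0.0508\t0.0055\t0.0448\n"
)

# Hypothetical field names: the table header is not shown in this excerpt,
# so the four numeric columns are left unnamed as m1..m4.
FIELDS = ["scenario", "mechanism", "estimator", "kernel", "functional",
          "n", "level", "m1", "m2", "m3", "m4"]
NUMERIC = ("level", "m1", "m2", "m3", "m4")


def parse_rows(text):
    """Parse tab-separated result rows into dictionaries with typed values."""
    rows = []
    for rec in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(rec) != len(FIELDS):
            continue  # skip blank or malformed lines
        row = dict(zip(FIELDS, rec))
        row["n"] = int(row["n"])
        for key in NUMERIC:
            row[key] = float(row[key])
        rows.append(row)
    return rows


def best_kernel(rows, metric="m3"):
    """For each configuration, return the kernel with the smallest metric."""
    best = {}
    for row in rows:
        key = (row["mechanism"], row["estimator"], row["functional"],
               row["n"], row["level"])
        if key not in best or row[metric] < best[key][metric]:
            best[key] = row
    return {key: row["kernel"] for key, row in best.items()}


rows = parse_rows(SAMPLE)
winners = best_kernel(rows)
for key, kernel in winners.items():
    print(key, "->", kernel)
```

Passing a different `metric` (for example `"m2"`) ranks the kernels by another numeric column. The full table can be processed the same way by reading it from a file instead of `SAMPLE`.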

Stute, W. Symmetrized NN-conditional U-statistics. In Research Developments in Probability and Statistics; VSP: Utrecht, The Netherlands, 1996; pp. 231–237. [Google Scholar]
Dony, J.; Mason, D.M. Uniform in bandwidth consistency of conditional U-statistics. Bernoulli 2008, 14, 1108–1133. [Google Scholar] [CrossRef]
Bouzebda, S. On the weak convergence and the uniform-in-bandwidth consistency of the general conditional U-processes based on the copula representation: Multivariate setting. Hacet. J. Math. Stat. 2023, 52, 1303–1348. [Google Scholar] [CrossRef]
Bouzebda, S. Weak convergence of the conditional single index U-statistics for locally stationary functional time series. AIMS Math. 2024, 9, 14807–14898. [Google Scholar] [CrossRef]
Bouzebda, S. Limit Theorems in the Nonparametric Conditional Single-Index U-Processes for Locally Stationary Functional Random Fields under Stochastic Sampling Design. Mathematics 2024, 12, 1996. [Google Scholar] [CrossRef]
Bouzebda, S. Limit theorems for wavelet conditional U-statistics for time series models. Math. Methods Statist. 2025, 34, 181–224. [Google Scholar] [CrossRef]
Arcones, M.A. The Bahadur-Kiefer representation for U-quantiles. Ann. Statist. 1996, 24, 1400–1422. [Google Scholar] [CrossRef]
Arcones, M.A. The asymptotic accuracy of the bootstrap of U-quantiles. Ann. Statist. 1995, 23, 1802–1822. [Google Scholar] [CrossRef]
Helmers, R.; Hušková, M. Bootstrapping multivariate U-quantiles and related statistics. J. Multivariate Anal. 1994, 49, 97–109. [Google Scholar] [CrossRef]
Zhou, W. Generalized Spatial u-Quantiles: Theory and Applications. Ph.D. Thesis, The University of Texas at Dallas, Richardson, TX, USA, 2005. [Google Scholar]
Zhou, W.; Serfling, R. Generalized multivariate rank type test statistics via spatial U-quantiles. Stat. Probab. Lett. 2008, 78, 376–383. [Google Scholar] [CrossRef]
Zhou, W.; Serfling, R. Multivariate spatial U-quantiles: A Bahadur-Kiefer representation, a Theil-Sen estimator for multiple regression, and a robust dispersion estimator. J. Statist. Plann. Inference 2008, 138, 1660–1678. [Google Scholar] [CrossRef]
Dehling, H.; Fried, R. Asymptotic distribution of two-sample empirical U-quantiles with applications to robust tests for shifts in location. J. Multivariate Anal. 2012, 105, 124–140. [Google Scholar] [CrossRef]
Wendler, M. U-processes, U-quantile processes and generalized linear statistics of dependent data. Stochastic Process. Appl. 2012, 122, 787–807. [Google Scholar] [CrossRef]
Vogel, D.; Wendler, M. Studentized U-quantile processes under dependence with applications to change-point analysis. Bernoulli 2017, 23, 3114–3144. [Google Scholar] [CrossRef]
Bouzebda, S.; Limnios, N. On general bootstrap of empirical estimator of a semi-Markov kernel with applications. J. Multivariate Anal. 2013, 116, 52–62. [Google Scholar] [CrossRef]
Bouzebda, S.; Cherfi, M. General bootstrap for dual ϕ-divergence estimates. J. Probab. Stat. 2012, 2012, 834107. [Google Scholar] [CrossRef]
Wertz, W. Statistical Density Estimation: A Survey; Volume 13, Angewandte Statistik und Ökonometrie [Applied Statistics and Econometrics]; With German and French summaries; Vandenhoeck & Ruprecht: Göttingen, Germany, 1978; p. 108. [Google Scholar]
Wand, M.P.; Jones, M.C. Kernel Smoothing; Volume 60, Monographs on Statistics and Applied Probability; Chapman and Hall, Ltd.: London, UK, 1995; p. xii+212. [Google Scholar] [CrossRef]
Tapia, R.A.; Thompson, J.R. Nonparametric Probability Density Estimation; Volume 1, Johns Hopkins Series in the Mathematical Sciences; Johns Hopkins University Press: Baltimore, MD, USA, 1978; p. xi+176. [Google Scholar]
Prakasa Rao, B.L.S. Nonparametric Functional Estimation; Probability and Mathematical Statistics; Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers]: New York, NY, USA, 1983; p. xiv+522. [Google Scholar]
Roussas, G.G. Estimation of transition distribution function and its quantiles in Markov processes: Strong consistency and asymptotic normality. In Nonparametric Functional Estimation and Related Topics (Spetses, 1990); Volume 335, NATO Adv. Sci. Inst. Ser. C: Math. Phys. Sci.; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1991; pp. 443–462. [Google Scholar]
Nadaraya, E.A. Nonparametric Estimation of Probability Densities and Regression Curves; Volume 20, Mathematics and Its Applications (Soviet Series); Translated from the Russian by Samuel Kotz; Kluwer Academic Publishers Group: Dordrecht, The Netherlands, 1989; p. x+213. [Google Scholar] [CrossRef]
Müller, H.G. Nonparametric Regression Analysis of Longitudinal Data; Volume 46, Lecture Notes in Statistics; Springer: New York, NY, USA, 1988; p. vi+199. [Google Scholar] [CrossRef]
Härdle, W. Applied Nonparametric Regression; Volume 19, Econometric Society Monographs; Cambridge University Press: Cambridge, UK, 1990; p. xvi+333. [Google Scholar] [CrossRef]
Eggermont, P.P.B.; LaRiccia, V.N. Maximum Penalized Likelihood Estimation. Vol. I; Density Estimation; Springer Series in Statistics; Springer: New York, NY, USA, 2001; p. xviii+510. [Google Scholar]
Devroye, L. A Course in Density Estimation; Volume 14, Progress in Probability and Statistics; Birkhäuser Boston, Inc.: Boston, MA, USA, 1987; p. xx+183. [Google Scholar]
Scott, D.W. Multivariate Density Estimation; Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics; Theory, Practice, and Visualization, A Wiley-Interscience Publication; John Wiley & Sons, Inc.: New York, NY, USA, 1992; p. xiv+317. [Google Scholar] [CrossRef]
Silverman, B.W. Density Estimation for Statistics and Data Analysis; Monographs on Statistics and Applied Probability; Chapman & Hall: London, UK, 1986; p. x+175. [Google Scholar] [CrossRef]
Müller, H.G. Smooth optimum kernel estimators near endpoints. Biometrika 1991, 78, 521–530. [Google Scholar] [CrossRef]
Jones, M.C. Corrigendum: “Variable kernel density estimates and variable kernel density estimates” [Austral. J. Statist. 32 (1990), no. 3, 361–371]. Austral. J. Statist. 1991, 33, 119. [Google Scholar] [CrossRef]
Funke, B.; Hirukawa, M. Density derivative estimation using asymmetric kernels. J. Nonparametr. Stat. 2024, 36, 994–1017. [Google Scholar] [CrossRef]
Chen, S.X. Beta kernel estimators for density functions. Comput. Statist. Data Anal. 1999, 31, 131–145. [Google Scholar] [CrossRef]
Chen, S.X. Beta kernel smoothers for regression curves. Statist. Sinica 2000, 10, 73–91. [Google Scholar]
Bouezmarni, T.; Rolin, J.M. Consistency of the beta kernel density function estimator. Canad. J. Statist. 2003, 31, 89–98. [Google Scholar] [CrossRef]
Zhang, S.; Karunamuni, R.J. Boundary performance of the beta kernel estimators. J. Nonparametr. Stat. 2010, 22, 81–104. [Google Scholar] [CrossRef]
Bertin, K.; Klutchnikoff, N. Minimax properties of beta kernel estimators. J. Statist. Plann. Inference 2011, 141, 2287–2297. [Google Scholar] [CrossRef]
Bertin, K.; Genest, C.; Klutchnikoff, N.; Ouimet, F. Minimax properties of Dirichlet kernel density estimators. J. Multivariate Anal. 2023, 195, 105158. [Google Scholar] [CrossRef]
Igarashi, G. Bias reductions for beta kernel estimation. J. Nonparametr. Stat. 2016, 28, 1–30. [Google Scholar] [CrossRef]
Hirukawa, M. Asymmetric Kernel Smoothing; Theory and Applications in Economics and Finance, JSS Research Series in Statistics; SpringerBriefs in Statistics; Springer: Singapore, 2018; p. xii+110. [Google Scholar] [CrossRef]
Kristensen, D. Uniform convergence rates of kernel estimators with heterogeneous dependent data. Econom. Theory 2009, 25, 1433–1445. [Google Scholar] [CrossRef]
Yin, X.F.; Hao, Z.F. Adaptive Kernel Density Estimation using Beta Kernel. In Proceedings of the 2007 International Conference on Machine Learning and Cybernetics, Hong Kong, China, 19–22 August 2007; Volume 6, pp. 3293–3297. [Google Scholar] [CrossRef]
Igarashi, G.; Kakizawa, Y. Limiting bias-reduced Amoroso kernel density estimators for non-negative data. Comm. Statist. Theory Methods 2018, 47, 4905–4937. [Google Scholar] [CrossRef]
Charpentier, A.; Oulidi, A. Beta kernel quantile estimators of heavy-tailed loss distributions. Stat. Comput. 2010, 20, 35–55. [Google Scholar] [CrossRef]
Brown, B.M.; Chen, S.X. Beta-Bernstein smoothing for regression curves with compact support. Scand. J. Statist. 1999, 26, 47–59. [Google Scholar] [CrossRef]
Chen, S.X. Probability density function estimation using gamma kernels. Ann. Inst. Statist. Math. 2000, 52, 471–480. [Google Scholar] [CrossRef]
Bouezmarni, T.; Rombouts, J.V.K. Nonparametric density estimation for multivariate bounded data. J. Statist. Plann. Inference 2010, 140, 139–152. [Google Scholar] [CrossRef]
Aitchison, J.; Lauder, I.J. Kernel density estimation for compositional data. J. R. Stat. Soc. Ser. C 1985, 34, 129–137. [Google Scholar] [CrossRef]
Vitale, R.A. Bernstein polynomial approach to density function estimation. In Proceedings of the Statistical Inference and Related Topics (Proc. Summer Res. Inst. Statist. Inference for Stochastic Processes, Indiana Univ., Bloomington, Ind., 1974, Vol. 2; Dedicated to Z. W. Birnbaum); Academic Press: New York, NY, USA; London, UK, 1975; pp. 87–99. [Google Scholar]
Stadtmüller, U. Asymptotic distributions of smoothed histograms. Metrika 1983, 30, 145–158. [Google Scholar] [CrossRef]
Gawronski, W. Strong laws for density estimators of Bernstein type. Period. Math. Hungar. 1985, 16, 23–43. [Google Scholar] [CrossRef]
Gawronski, W.; Stadtmüller, U. On density estimation by means of Poisson’s distribution. Scand. J. Statist. 1980, 7, 90–94. [Google Scholar]
Tenbusch, A. Two-dimensional Bernstein polynomial density estimators. Metrika 1994, 41, 233–253. [Google Scholar] [CrossRef]
Tenbusch, A. Nonparametric curve estimation with Bernstein estimates. Metrika 1997, 45, 1–30. [Google Scholar] [CrossRef]
Babu, G.J.; Canty, A.J.; Chaubey, Y.P. Application of Bernstein polynomials for smooth estimation of a distribution and density function. J. Statist. Plann. Inference 2002, 105, 377–392. [Google Scholar] [CrossRef]
Kakizawa, Y. Bernstein polynomial probability density estimation. J. Nonparametr. Stat. 2004, 16, 709–729. [Google Scholar] [CrossRef]
Prakasa Rao, B.L.S. Estimation of distribution and density functions by generalized Bernstein polynomials. Indian J. Pure Appl. Math. 2005, 36, 63–88. [Google Scholar]
Babu, G.J.; Chaubey, Y.P. Smooth estimation of a distribution and density function on a hypercube using Bernstein polynomials for dependent random vectors. Stat. Probab. Lett. 2006, 76, 959–969. [Google Scholar] [CrossRef]
Leblanc, A. On estimating distribution functions using Bernstein polynomials. Ann. Inst. Statist. Math. 2012, 64, 919–943. [Google Scholar] [CrossRef]
Belalia, M.; Bouezmarni, T.; Lemyre, F.C.; Taamouti, A. Testing independence based on Bernstein empirical copula and copula density. J. Nonparametr. Stat. 2017, 29, 346–380. [Google Scholar] [CrossRef]
Wang, L.; Lu, D. Application of Bernstein polynomials on estimating a distribution and density function in a triangular array. Methodol. Comput. Appl. Probab. 2023, 25, 56. [Google Scholar] [CrossRef]
Sancetta, A.; Satchell, S. The Bernstein copula and its applications to modeling and approximations of multivariate distributions. Econom. Theory 2004, 20, 535–562. [Google Scholar] [CrossRef]
Abrams, S.; Janssen, P.; Swanepoel, J.; Veraverbeke, N. Nonparametric estimation of risk ratios for bivariate data. J. Nonparametr. Stat. 2022, 34, 940–963. [Google Scholar] [CrossRef]
Ouimet, F. Asymptotic properties of Bernstein estimators on the simplex. J. Multivariate Anal. 2021, 185, 104784. [Google Scholar] [CrossRef]
Leblanc, A. On the boundary properties of Bernstein polynomial estimators of density and distribution functions. J. Statist. Plann. Inference 2012, 142, 2762–2778. [Google Scholar] [CrossRef]
Ouimet, F. On the boundary properties of Bernstein estimators on the simplex. Open Stat. 2022, 3, 48–62. [Google Scholar] [CrossRef]
Bouezmarni, T.; Rolin, J.M. Bernstein estimator for unbounded density function. J. Nonparametr. Stat. 2007, 19, 145–161. [Google Scholar] [CrossRef]
Leblanc, A. A bias-reduced approach to density estimation using Bernstein polynomials. J. Nonparametr. Stat. 2010, 22, 459–475. [Google Scholar] [CrossRef]
Belalia, M. On the asymptotic properties of the Bernstein estimator of the multivariate distribution function. Stat. Probab. Lett. 2016, 110, 249–256. [Google Scholar] [CrossRef]
Liu, B.; Ghosh, S.K. On empirical estimation of mode based on weakly dependent samples. Comput. Statist. Data Anal. 2020, 152, 107046. [Google Scholar] [CrossRef]
Lu, D.; Wang, L.; Yang, J. The stochastic convergence of Bernstein polynomial estimators in a triangular array. J. Nonparametr. Stat. 2022, 34, 987–1014. [Google Scholar] [CrossRef]
Kakizawa, Y. Recursive asymmetric kernel density estimation for nonnegative data. J. Nonparametr. Stat. 2021, 33, 197–224. [Google Scholar] [CrossRef]
Rattihalli, R.N.; Patil, S.B. Data dependent asymmetric kernels for estimating the density function. Sankhya A 2021, 83, 155–186. [Google Scholar] [CrossRef]
Funke, B.; Hirukawa, M. Bias correction for local linear regression estimation using asymmetric kernels via the skewing method. Econom. Stat. 2021, 20, 109–130. [Google Scholar] [CrossRef]
Igarashi, G.; Kakizawa, Y. Multiplicative bias correction for asymmetric kernel density estimators revisited. Comput. Statist. Data Anal. 2020, 141, 40–61. [Google Scholar] [CrossRef]
Hirukawa, M.; Sakudo, M. Another bias correction for asymmetric kernel density estimation with a parametric start. Stat. Probab. Lett. 2019, 145, 158–165. [Google Scholar] [CrossRef]
Lu, L. On the uniform consistency of the Bernstein density estimator. Stat. Probab. Lett. 2015, 107, 52–61. [Google Scholar] [CrossRef]
Guan, Z. Efficient and robust density estimation using Bernstein type polynomials. J. Nonparametr. Stat. 2016, 28, 250–271. [Google Scholar] [CrossRef]
Wang, L.; Lu, D. On the rates of asymptotic normality for Bernstein density estimators in a triangular array. J. Math. Anal. Appl. 2022, 511, 126063. [Google Scholar] [CrossRef]
Rubin, D.B. Inference and missing data. Biometrika 1976, 63, 581–592. [Google Scholar] [CrossRef]
Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data, 3rd ed.; Wiley Series in Probability and Statistics; John Wiley & Sons: Hoboken, NJ, USA, 2019. [Google Scholar] [CrossRef]
Josse, J.; Reiter, J.P. Introduction to the special section on missing data. Statist. Sci. 2018, 33, 139–141. [Google Scholar] [CrossRef]
Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data, 2nd ed.; Wiley Series in Probability and Statistics; Wiley-Interscience [John Wiley & Sons]: Hoboken, NJ, USA, 2002; p. xviii+381. [Google Scholar] [CrossRef]
Claeskens, G.; Hjort, N.L. Model Selection and Model Averaging; Volume 27, Cambridge Series in Statistical and Probabilistic Mathematics; Cambridge University Press: Cambridge, UK, 2008; p. xviii+312. [Google Scholar] [CrossRef]
Seaman, S.; Galati, J.; Jackson, D.; Carlin, J. What is meant by “missing at random”? Statist. Sci. 2013, 28, 257–268. [Google Scholar] [CrossRef]
Mealli, F.; Rubin, D.B. Clarifying missing at random and related definitions, and implications when coupled with exchangeability. Biometrika 2015, 102, 995–1000. [Google Scholar] [CrossRef]
Lu, G.; Copas, J.B. Missing at random, likelihood ignorability and model completeness. Ann. Statist. 2004, 32, 754–765. [Google Scholar] [CrossRef]
Farewell, D.M.; Daniel, R.M.; Seaman, S.R. Missing at random: A stochastic process perspective. Biometrika 2022, 109, 227–241. [Google Scholar] [CrossRef]
Yatchew, A. An elementary estimator of the partial linear model. Econom. Lett. 1997, 57, 135–143. [Google Scholar] [CrossRef]
Abadie, A.; Imbens, G.W. Large sample properties of matching estimators for average treatment effects. Econometrica 2006, 74, 235–267. [Google Scholar] [CrossRef]
Guerre, E.; Perrigne, I.; Vuong, Q. Optimal nonparametric estimation of first-price auctions. Econometrica 2000, 68, 525–574. [Google Scholar] [CrossRef]
Bouzebda, S. Asymptotic Learning Theory for Conditional U–Statistics Based on Delta Sequences Under Missing at Random Mechanisms. Mathematics 2026, 14, 1899. [Google Scholar] [CrossRef]
Belhas, H.; Mohammedi, M.; Bouzebda, S. Asymptotic Theory for Multivariate Nonparametric Quantile Regression with Stationary Ergodic Functional Covariates and Missing-at-Random Responses. Symmetry 2026, 18, 445. [Google Scholar] [CrossRef]
Bouzebda, S.; Nezzal, A.; Elhattab, I. Limit theorems for nonparametric conditional U-statistics smoothed by asymmetric kernels. AIMS Math. 2024, 9, 26195–26282. [Google Scholar] [CrossRef]
Hansen, B.E. Uniform convergence rates for kernel estimation with dependent data. Econom. Theory 2008, 24, 726–748. [Google Scholar] [CrossRef]
Deheuvels, P.; Mason, D.M. General asymptotic confidence bands based on kernel-type function estimators. Stat. Inference Stoch. Process. 2004, 7, 225–277. [Google Scholar] [CrossRef]
Kotz, S.; Balakrishnan, N.; Johnson, N.L. Continuous Multivariate Distributions. Vol. 1: Models and Applications, 2nd ed.; Wiley: New York, NY, USA, 2000. [Google Scholar]
Ng, K.W.; Tian, G.L.; Tang, M.L. Dirichlet and Related Distributions; Wiley Series in Probability and Statistics; Theory, Methods and Applications; John Wiley & Sons, Ltd.: Chichester, UK, 2011; p. xxvi+310. [Google Scholar] [CrossRef]
Ouimet, F.; Tolosana-Delgado, R. Asymptotic properties of Dirichlet kernel density estimators. J. Multivariate Anal. 2022, 187, 104832. [Google Scholar] [CrossRef]
Hirukawa, M.; Murtazashvili, I.; Prokhorov, A. Uniform convergence rates for nonparametric estimators smoothed by the beta kernel. Scand. J. Stat. 2022, 49, 1353–1382. [Google Scholar] [CrossRef]
Cheng, T.C.; Biswas, A. Maximum trimmed likelihood estimator for multivariate mixed continuous and categorical data. Comput. Statist. Data Anal. 2008, 52, 2042–2065. [Google Scholar] [CrossRef]
Spiess, M. Estimation of a two-equation panel model with mixed continuous and ordered categorical outcomes and missing data. J. R. Statist. Soc. Ser. C 2006, 55, 525–538. [Google Scholar] [CrossRef]
Leung, C.Y. The effect of across-location heteroscedasticity on the classification of mixed categorical and continuous data. J. Multivariate Anal. 2003, 84, 369–386. [Google Scholar] [CrossRef]
Liu, C.; Rubin, D.B. Ellipsoidally symmetric extensions of the general location model for mixed categorical and continuous data. Biometrika 1998, 85, 673–688. [Google Scholar] [CrossRef]
Little, R.J.A.; Schluchter, M.D. Maximum likelihood estimation for mixed continuous and categorical data with missing values. Biometrika 1985, 72, 497–512. [Google Scholar] [CrossRef]
Stute, W. Universally consistent conditional U-statistics. Ann. Statist. 1994, 22, 460–473. [Google Scholar] [CrossRef]
Stute, W. L^p-convergence of conditional U-statistics. J. Multivariate Anal. 1994, 51, 71–82. [Google Scholar] [CrossRef]
Devroye, L.; Györfi, L.; Lugosi, G. A Probabilistic Theory of Pattern Recognition; Volume 31, Applications of Mathematics (New York); Springer: New York, NY, USA, 1996; p. xvi+636. [Google Scholar] [CrossRef]
Lehmann, E.L. A general concept of unbiasedness. Ann. Math. Stat. 1951, 22, 587–592. [Google Scholar] [CrossRef]
Dwass, M. The large-sample power of rank order tests in the two-sample problem. Ann. Math. Statist. 1956, 27, 352–374. [Google Scholar] [CrossRef]
Kendall, M.G. A New Measure of Rank Correlation. Biometrika 1938, 30, 81–93. [Google Scholar] [CrossRef]
Serfling, R.J. Approximation Theorems of Mathematical Statistics; Wiley Series in Probability and Mathematical Statistics; John Wiley & Sons, Inc.: New York, NY, USA, 1980; p. xiv+371. [Google Scholar]
Bouzebda, S.; Didi, S. Some asymptotic properties of kernel regression estimators of the mode for stationary and ergodic continuous time processes. Rev. Mat. Complut. 2021, 34, 811–852. [Google Scholar] [CrossRef]
Bouzebda, S.; Didi, S. Additive regression model for stationary and ergodic continuous time processes. Comm. Statist. Theory Methods 2017, 46, 2454–2493. [Google Scholar] [CrossRef]
Bouzebda, S.; Didi, S. Multivariate wavelet density and regression estimators for stationary and ergodic discrete time processes: Asymptotic results. Comm. Statist. Theory Methods 2017, 46, 1367–1406. [Google Scholar] [CrossRef]
Bouzebda, S.; Didi, S.; El Hajj, L. Multivariate wavelet density and regression estimators for stationary and ergodic continuous time processes: Asymptotic results. Math. Methods Statist. 2015, 24, 163–199. [Google Scholar] [CrossRef]
Bouzebda, S.; Ferfache, A.A. Asymptotic properties of M-estimators based on estimating equations and censored data in semi-parametric models with multiple change points. J. Math. Anal. Appl. 2021, 497, 124883. [Google Scholar] [CrossRef]
Bouzebda, S. Asymptotic properties of pseudo maximum likelihood estimators and test in semi-parametric copula models with multiple change points. Math. Methods Statist. 2014, 23, 38–65. [Google Scholar] [CrossRef]
Bouzebda, S.; Keziou, A. A new test procedure of independence in copula models via χ²-divergence. Comm. Statist. Theory Methods 2010, 39, 1–20. [Google Scholar] [CrossRef]
Bouzebda, S.; Keziou, A. A semiparametric maximum likelihood ratio test for the change point in copula models. Stat. Methodol. 2013, 14, 39–61. [Google Scholar] [CrossRef]
Bouezmarni, T.; Lemyre, F.C.; El Ghouch, A. Estimation of a bivariate conditional copula when a variable is subject to random right censoring. Electron. J. Stat. 2019, 13, 5044–5087. [Google Scholar] [CrossRef]
Arcones, M.A. A Bernstein-type inequality for U-statistics and U-processes. Stat. Probab. Lett. 1995, 22, 239–247. [Google Scholar] [CrossRef]
Billingsley, P. Probability and Measure, 3rd ed.; Wiley Series in Probability and Mathematical Statistics; John Wiley & Sons, Inc.: New York, NY, USA, 1995; p. xiv+593. [Google Scholar]
van der Vaart, A.W. Asymptotic Statistics; Volume 3, Cambridge Series in Statistical and Probabilistic Mathematics; Cambridge University Press: Cambridge, UK, 1998; p. xvi+443. [Google Scholar] [CrossRef]
van der Vaart, A.W.; Wellner, J.A. Weak Convergence and Empirical Processes—With Applications to Statistics; Springer Series in Statistics; Springer: Cham, Switzerland, 2023; p. xvii+679. [Google Scholar] [CrossRef]

Figure 1. Integrated mean squared error heatmap for Scenario 1. Each tile corresponds to a fixed combination of kernel, target missingness rate, empirical Kendall representation, sample size, and missingness mechanism. The color scale is displayed on the square-root scale used in the simulation code, which improves visual discrimination among low-risk procedures while preserving the ordering induced by IMSE. Because Scenario 1 combines a uniform design with a linear target curve, this figure should be read as the baseline benchmark for comparing the five smoothing strategies in the least irregular regime.

Figure 2. Integrated mean squared error heatmap for Scenario 2. Relative to Scenario 1, this design is more demanding because the covariate density is asymmetric and concentrated near the left boundary, while the conditional Kendall function is nonlinear. The figure is therefore especially informative for assessing whether support-adapted smoothers, such as the beta and Bernstein procedures, gain a finite-sample advantage when both boundary effects and design inhomogeneity are present.

Figure 3. Distribution of IMSE across kernels for Scenario 1. Each boxplot aggregates IMSE values over the sample sizes $n \in \{500, 1000, 2000\}$ and missingness levels $π \in \{0, 0.10, 0.30, 0.50\}$, separately by missingness mechanism, correction method, and empirical Kendall representation. In contrast to the heatmaps, which are cellwise diagnostics, these boxplots provide a robustness-oriented summary of how each smoother behaves across a family of homogeneous design cells.

Figure 4. Distribution of IMSE across kernels for Scenario 2. Because the design is concentrated near the boundary and the target is nonlinear, this figure provides a stringent aggregated comparison of the procedures. In particular, it reveals not only central tendency but also the stability of each smoothing method across heterogeneous finite-sample regimes within Scenario 2.

Figure 5. Complete-case versus IPW comparison for Scenario 1 using the Gaussian smoother as benchmark. The horizontal axis reports the target missingness rate and the vertical axis reports IMSE. Panels are stratified by empirical Kendall representation, missingness mechanism, and sample size. Under MCAR, the two methods are theoretically and numerically identical in the present implementation because normalized IPW reduces exactly to normalized complete-case weighting; therefore, any visible discrepancy can only arise under covariate-dependent MAR and should be interpreted as the consequence of local design reweighting rather than correction of a different target.

Figure 6. Complete-case versus IPW comparison for Scenario 2 using the Gaussian smoother. This is the more informative of the two benchmark comparisons, because nonlinear dependence and asymmetric design density amplify the local consequences of MAR-driven covariate distortion. The figure therefore visualizes the central finite-sample trade-off induced by inverse-probability reweighting: possible recentering gains in distorted regions versus variance inflation due to heterogeneous weights.
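The MCAR equivalence described in the caption of Figure 5 can be checked numerically. The following is a minimal sketch, not the simulation code of the paper: it takes the regression case $m = 1$ with a Gaussian kernel, and the function name `nw_weights`, the bandwidth `h = 0.1`, the linear data-generating curve, and the logistic MAR propensity are all hypothetical choices made only for illustration.

```python
import numpy as np

def nw_weights(x0, X, delta, h, p=None):
    """Normalized local weights at x0.

    p=None gives complete-case weights; otherwise each observed
    point is additionally weighted by 1 / p_i (normalized IPW).
    """
    k = np.exp(-0.5 * ((X - x0) / h) ** 2)  # Gaussian kernel, unnormalized
    w = k * delta if p is None else k * delta / p
    return w / w.sum()

rng = np.random.default_rng(0)
n, x0, h = 500, 0.5, 0.1                     # hypothetical settings
X = rng.uniform(size=n)
Y = 2.0 * X + rng.normal(scale=0.1, size=n)  # hypothetical linear target

# MCAR: constant propensity, so 1/p cancels in the normalized ratio.
p_mcar = np.full(n, 0.7)
d_mcar = rng.binomial(1, p_mcar)
w_cc = nw_weights(x0, X, d_mcar, h)
w_ipw = nw_weights(x0, X, d_mcar, h, p=p_mcar)
est_cc, est_ipw = np.sum(w_cc * Y), np.sum(w_ipw * Y)

# Covariate-dependent MAR (assumed logistic form): weights no longer coincide.
p_mar = 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * (X - 0.5))))
d_mar = rng.binomial(1, p_mar)
w_cc_mar = nw_weights(x0, X, d_mar, h)
w_ipw_mar = nw_weights(x0, X, d_mar, h, p=p_mar)
```

Under the constant propensity the factor $1/p$ appears in both the numerator and the normalizing sum and cancels, which is why the two weight vectors agree up to floating-point error. Under the assumed MAR propensity the factor varies with $X_i$ inside the kernel window, so the local design is reweighted.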

Figure 7. Observed-rate diagnostic for Scenario 1. The density curves display, over Monte Carlo replications, the realized observed proportion for each target missingness level, missingness mechanism, and sample size. Their concentration around the prescribed levels confirms that the calibration step in the missingness generator performs as intended. In particular, under the covariate-dependent MAR design, the replication-specific numerical choice of the intercept $a$ successfully stabilizes the overall observation rate before Bernoulli thinning is applied.

Figure 8. Observed-rate diagnostic for Scenario 2. Despite the irregular Beta design for the covariate, the realized observation fractions remain tightly concentrated around their target values. This confirms that the asymmetry of the design density does not compromise the numerical calibration of the MCAR and MAR selection modules, and it supports interpreting subsequent performance differences as genuine estimation effects rather than artifacts of poor missingness control.
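The calibration step referred to in the captions of Figures 7 and 8 can be illustrated as follows. This is a hedged sketch under stated assumptions rather than the generator used in the paper: the propensity is assumed to be logistic, $p(x) = \operatorname{expit}(a + s x)$, with a hypothetical slope $s = 1.5$, the Beta(2, 5) design is a hypothetical stand-in for the irregular design, and the intercept $a$ is found by plain bisection. The target observation rate must lie strictly between 0 and 1.

```python
import numpy as np

def expit(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def calibrate_intercept(X, target_obs_rate, slope, lo=-20.0, hi=20.0, tol=1e-10):
    """Bisection on a so that mean(expit(a + slope * X)) matches the target.

    The mean propensity is increasing in a, so bisection is valid.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if np.mean(expit(mid + slope * X)) < target_obs_rate:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

rng = np.random.default_rng(1)
X = rng.beta(2.0, 5.0, size=2000)  # hypothetical asymmetric design
pi_target = 0.30                   # target missingness rate
slope = 1.5                        # hypothetical MAR slope

a = calibrate_intercept(X, 1.0 - pi_target, slope)
p = expit(a + slope * X)           # calibrated propensities
delta = rng.binomial(1, p)         # Bernoulli thinning
print(a, p.mean(), delta.mean())
```

Because the intercept is recomputed from the realized covariates in each replication, the mean propensity matches the target essentially exactly, and the realized observed fraction fluctuates around it only through the Bernoulli draws.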

Table 1. Main notation used in the paper.

Notation	Meaning
$X_{i}$	Covariate vector, taking values in $\mathcal{X} \subseteq [0, 1]^{d}$
$Y_{i}$	Response vector, taking values in $\mathbb{R}^{q}$
$δ_{i}$	Response-observation indicator; $δ_{i} = 1$ if $Y_{i}$ is observed
$p(x)$	Propensity score, $p(x) = \mathbb{P}(δ = 1 \mid X = x)$
$f (x)$	Density of the covariate $X$
m	Order of the conditional U-statistic
$φ$	Measurable kernel/function of m response variables
$\tilde{x}$	$m$-tuple $(x_{1}, \dots, x_{m}) \in \mathcal{X}^{m}$
$I(m, n)$	Set of $m$-tuples of distinct indices from $\{1, \dots, n\}$
$r^{(m)} (φ, \tilde{x})$	Target conditional U-functional
$K_{Λ_{n, ℓ} (x)}$	Asymmetric kernel centered at, and adapted to, the point $x$
$ℓ = 1, 2, 3$	Kernel type: Dirichlet, Bernstein, or beta/mixed kernel
${\hat{r}}_{n, ℓ}^{(m)}$	Complete-case asymmetric-kernel conditional U-statistic estimator
$u_{n, ℓ} (φ, \tilde{x})$	Localized numerator U-statistic under MAR
$u_{n, ℓ} (1, \tilde{x})$	Localized denominator U-statistic under MAR
$r_{p, n, ℓ}^{(m)}$	Deterministic complete-case smoothed centering based on $p f$
$π_{j, m}$	j-th Hoeffding projection of an m-variate kernel

Table 2. Summary of the main asymptotic results. The notation $B_{n, ℓ}$ denotes the deterministic smoothing bias, $V_{n, ℓ}$ the leading stochastic variance scale, and $g = p f$ the effective complete-case design density under MAR. The symbol $ρ_{n, ℓ}$ denotes the kernel-specific deterministic bias scale.

Smoother/Support	Estimator	Main Result	Bias Scale and Centering	Stochastic Scale Under MAR	Novelty/Role in the Paper
Dirichlet kernels on the simplex $S_{d, 1}$	Complete-case conditional U-statistic ${\hat{r}}_{n, 1}^{(m)}$; regression case $m = 1$ treated separately	Uniform consistency for $m = 1$; uniform strong consistency for general $m$; asymptotic normality	$ρ_{n, 1} = \breve{b}$. The deterministic centering is obtained by replacing $f$ with $p f$. Thus the complete-case bias constant is computed with $g = p f$.	Leading stochastic term given by the first Hoeffding projection. The variance contains the complete-case information factor $\{p(x) f(x)\}^{-1}$, or its $m$-variate analogue.	Provides the simplex-adapted MAR theory. The Dirichlet local drift, covariance, boundary behavior, and $L^{2}$-norm are kernel-specific and not supplied by the abstract delta-sequence theory.
Bernstein polynomial smoothers on compact supports	Complete-case Bernstein-type conditional U-statistic; Nadaraya–Watson case $m = 1$	Weak and strong uniform convergence; higher-order conditional U-statistic extension	$ρ_{n, 2} = k_{n}^{- 1}$. The deterministic bias is the Bernstein approximation bias applied to the complete-case target density $p f$.	The stochastic order is inherited from the Bernstein localization scale and the complete-case first projection. MAR changes constants through inverse propensity terms, but not the convergence rate under $p \geq c > 0$.	Shows that discrete polynomial smoothers fit the MAR conditional U-statistic framework. The verification is nontrivial because the smoothing operator is discrete rather than an ordinary continuous kernel.
Product beta kernels on ${[0, 1]}^{d}$	Complete-case beta-kernel conditional U-statistic	Weak uniform convergence on fixed compact regions; strong uniform convergence; expanding-domain results approaching the boundary	$ρ_{n, 3} = b_{n}$. The complete-case deterministic bias is governed by the beta-kernel moment expansion with $f$ replaced by $p f$. On interior regions, $μ_{n, 3} (x) = b_{n} (1 - 2 x) + O (b_{n}^{2})$.	The variance scale depends on the beta-kernel $L^{2}$-norm and is inflated by the MAR observation mechanism through factors involving $1 / p$.	Captures boundary-sensitive beta-kernel behavior under MAR. The point-dependent shape of the beta kernel makes the local $L^{2}$-norm and bias constants support-dependent.
Mixed continuous–categorical regressors	Complete-case conditional U-statistic with continuous beta smoothing and categorical smoothing	Uniform convergence for heterogeneous covariates; mixed-data extension of the beta-kernel theory	$ρ_{n, mix} = b_{n} + λ_{n}$, where $b_{n}$ is the continuous bandwidth and $λ_{n}$ the categorical smoothing parameter.	The stochastic scale combines the continuous beta contribution, the categorical smoothing contribution, and the complete-case inverse-propensity inflation.	Extends the theory beyond purely continuous supports. The deterministic bias splits into continuous and categorical components, a feature absent from the complete-data and abstract delta-sequence settings.
Applications: conditional dependence, discrimination, multisample functionals, and conditional Kendall-type coefficients	Special choices of the U-statistic kernel $φ$	Consistency and, where applicable, asymptotic normality obtained by applying the preceding general theory	Bias and centering inherited from the corresponding smoothing family: Dirichlet, Bernstein, beta, or mixed.	Variance obtained from the conditional Hoeffding projection of the chosen kernel $φ$, with MAR inflation through $1 / p$.	Demonstrates that the framework estimates genuinely nonlinear conditional functionals, not only ordinary conditional means.
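
The interior expansion $μ_{n, 3} (x) = b_{n} (1 - 2 x) + O (b_{n}^{2})$ reported in the beta-kernel row can be checked numerically. The sketch below is illustrative only. It assumes that the univariate beta kernel at the point $x$ is the Beta$(x / b + 1, (1 - x) / b + 1)$ density, a common parametrization that the table does not restate, and the function names `beta_kernel_pdf`, `kernel_drift`, and `leading_drift` are hypothetical. Under that assumption the kernel mean is $(x + b) / (1 + 2 b)$, so the drift equals $b (1 - 2 x) / (1 + 2 b)$, which agrees with $b (1 - 2 x)$ up to a remainder bounded by $2 b^{2}$.

```python
import math


def beta_kernel_pdf(u, x, b):
    """Density at u in (0, 1) of Beta(x/b + 1, (1 - x)/b + 1).

    Assumed parametrization of the beta kernel centred near x with bandwidth b.
    """
    a1 = x / b + 1.0
    a2 = (1.0 - x) / b + 1.0
    # Log of the normalizing constant 1 / B(a1, a2), computed stably.
    log_norm = math.lgamma(a1 + a2) - math.lgamma(a1) - math.lgamma(a2)
    log_pdf = log_norm + (a1 - 1.0) * math.log(u) + (a2 - 1.0) * math.log(1.0 - u)
    return math.exp(log_pdf)


def kernel_drift(x, b, grid=20000):
    """Midpoint-rule approximation of E[U] - x, where U follows the beta kernel at x."""
    h = 1.0 / grid
    mean = 0.0
    for i in range(grid):
        u = (i + 0.5) * h  # midpoint of the i-th cell, strictly inside (0, 1)
        mean += u * beta_kernel_pdf(u, x, b) * h
    return mean - x


def leading_drift(x, b):
    """Leading term b * (1 - 2x) of the drift expansion quoted in Table 2."""
    return b * (1.0 - 2.0 * x)


if __name__ == "__main__":
    b = 0.02
    for x in (0.1, 0.3, 0.5, 0.8):
        numeric = kernel_drift(x, b)
        leading = leading_drift(x, b)
        # The gap between the two columns should be of order b**2.
        print(f"x={x:.1f}  numeric={numeric:+.6f}  leading={leading:+.6f}")
```

The drift is positive for $x < 1 / 2$ and negative for $x > 1 / 2$, which is the point-dependent asymmetry that the bias and centering column accounts for. The midpoint rule is used only to keep the sketch dependency-free; any quadrature routine would serve.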

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

