Article

Advanced Statistical Learning: Limit Theorems for Nonparametric Conditional U-Statistics Smoothed by Asymmetric Kernels Under Missing-at-Random Sampling

Laboratory of Applied Mathematics of Compiègne (LMAC), Université de Technologie de Compiègne, Alliance Sorbonne Université, 60203 Compiègne, France
Mathematics 2026, 14(12), 2110; https://doi.org/10.3390/math14122110 (registering DOI)
Submission received: 21 April 2026 / Revised: 3 June 2026 / Accepted: 4 June 2026 / Published: 12 June 2026
(This article belongs to the Section D1: Probability and Statistics)

Abstract

This paper develops a boundary-sensitive asymptotic theory for nonparametric conditional U-statistics smoothed by support-adapted asymmetric kernels when the response variable is subject to Missing-at-Random observation. The problem lies at the intersection of three well-established but traditionally separate lines of research: conditional U-statistics, asymmetric smoothing on constrained supports, and incomplete-data inference under MAR sampling. The contribution of the paper is not a novelty claim concerning any of these components in isolation. Rather, it consists in deriving a kernel-specific and MAR-aware limit theory for their simultaneous occurrence, where the estimators are nonlinear complete-case ratios of localized U-statistics and the localization devices are point-dependent approximate identities adapted to the geometry of the covariate support. The analysis covers three principal classes of support-respecting smoothers: Dirichlet kernels on the simplex, Bernstein polynomial smoothers, and multivariate beta kernels on hypercubes, with an additional extension to mixed continuous–categorical regressors. These smoothing schemes are not translation-invariant, and their local moments, effective support, normalizing constants and $L^2$-masses vary with the evaluation point, especially near the boundary. Consequently, their incorporation into conditional U-statistics requires more than a direct transfer of ordinary asymmetric-kernel regression theory. The numerator and denominator of the estimators are localized U-statistics whose stochastic expansions are governed by Hoeffding projections, including canonical components that must be controlled uniformly over the conditioning domain. Under regularity, smoothness and positivity assumptions adapted to the MAR setting, we establish uniform consistency, weak and strong uniform convergence rates, stochastic expansions and asymptotic normality.
The results are obtained both on fixed compact subsets and on interior regions approaching the boundary, thereby identifying how support geometry enters the bias and stochastic normalizations. A central feature of the theory is the separation between the deterministic effect of complete-case sampling and its stochastic effect. For the complete-case estimator, the natural deterministic equivalent is obtained by replacing the design density $f$ with the effective complete-case density $pf$, where $p$ is the propensity score. Thus, the MAR mechanism may enter higher-order deterministic bias constants through the local design tilt, whereas the leading stochastic dispersion reflects the loss of effective information through propensity score factors. The precise variance constants and normalizing rates remain kernel-specific, depending on the local $L^2$-structure of the Dirichlet, Bernstein or beta smoothing device. The paper should therefore be viewed as a MAR extension and refinement of the complete-data asymmetric-kernel conditional U-statistic theory. It provides a common probabilistic architecture for several boundary-adapted smoothing schemes while retaining the kernel-dependent bias operators, variance constants, boundary regimes and Hoeffding-projection structures required for sharp asymptotic interpretation. Numerical experiments illustrate the finite-sample behavior predicted by the theory and highlight the interaction between support-adapted smoothing, boundary effects and incomplete response observation.

1. Introduction and Motivations

Since the foundational works of [1,2], and, in a broader functional-analytic sense [3], the theory of U-statistics has occupied a central position in asymptotic statistics. At the most basic level, a U-statistic replaces the ordinary empirical mean by a symmetrized average of a kernel over all distinct m-tuples of observations, thereby producing an unbiased estimator of a distributional functional of order m. Yet this elementary description severely understates the depth of the subject. The probabilistic structure of U-statistics is exceptionally rich: their fluctuation theory involves orthogonal projection methods, canonical degeneracy, nonlinear symmetrization phenomena, and subtle higher-order dependence patterns that distinguish them sharply from linear empirical averages. It is precisely this combination of generality, unbiasedness, and nontrivial asymptotic behavior that has made U-statistics one of the most influential and enduring constructions in modern statistical theory.
The classical i.i.d. theory was laid down in a sequence of landmark contributions [1,2,3,4,5,6,7], where the basic laws of large numbers, projection formulas, and asymptotic normality results were established. These results not only clarify the asymptotic structure of unbiased nonlinear statistics but also reveal a general methodological principle: once a statistical functional can be represented, or approximated, by a symmetrized kernel, the asymptotic behavior of its empirical counterpart may often be analyzed by decomposing it into Hoeffding projections of different orders. This insight led to a vast subsequent literature. Extensions to dependent settings, including mixing and weak dependence structures, were developed in [8,9,10,11]. Standard monographic treatments are given in [12,13,14,15,16,17,18,19], while recent developments and an updated bibliography are documented in [20,21,22,23]. Collectively, this body of work has elevated U-statistics from a special-purpose technical device to a genuinely universal language for higher-order statistical inference.
Their ubiquity is by now well established. In nonparametric and semiparametric statistics, U-statistics arise in density estimation, regression functionals, cross-validation, goodness-of-fit methodology, robustness, rank-based inference, and the study of asymptotic distributions of complex M-estimators. For example, Stute [24] used almost sure uniform bounds for $P$-canonical U-processes in the analysis of the product-limit estimator for truncated observations. Arcones and Wang [25] proposed new normality tests built from U-processes, while [26], drawing on the local U-statistic techniques of [27,28], developed weighted $L^1$ tests based on standardized data. In robust multivariate inference, Joly and Lugosi [29] advocated median-of-means procedures rooted in U-statistic constructions for heavy-tailed functional estimation. More generally, the modern theory of U-processes now permeates inference on qualitative properties of functions in nonparametric statistics [30,31,32], the asymptotic analysis of nonlinear estimators [13,33,34], and recent developments in functional and robust models [35].
Their range of application extends far beyond traditional nonparametric theory. In random graph asymptotics, counts of fixed subgraphs, such as triangles, are canonical examples of U-statistics [36]. In machine learning, pairwise and higher-order empirical risks expressed through U-statistics now appear routinely in ranking, metric learning, clustering, image analysis, graph inference, and supervised comparison problems [37]. The ranking problem in particular may be formulated as a pairwise classification problem whose empirical criterion is a U-statistic of order two [38,39]. Further instances include entropy estimation [40], goodness-of-fit testing [41], model-free clustering and classification of genetic data [42], non-asymptotic analysis of random compressed sensing matrices [43], multiple-group clustering in high dimension [44], dimension-agnostic inference [45], empirical risk minimization by U-process methods [46,47], asymmetric U-statistics for stationary m-dependent sequences [48], testing under left truncation and right censoring [49], quadruplet statistics in network analysis [50], distributed two-sample inference [51], and monitoring or structural detection in complex stochastic systems [52,53]. The field has also moved toward increasingly intricate regimes, including random kernels of diverging order [23,54,55,56] and even infinite-order U-statistics motivated by uncertainty quantification in ensemble procedures [57]. At a still more general level, Randles [58] proposed a route to asymptotic distribution theory for U-statistics involving estimated parameters, a program that has recently been revisited in [59].
Among the most important local incarnations of this theory are the conditional U-statistics introduced by [60]. These estimators extend the Nadaraya–Watson paradigm [61,62] from first-order conditional means to nonlinear conditional functionals of arbitrary order. More precisely, let $\{(X_i, Y_i), i \in \mathbb{N}^*\}$ be i.i.d. random vectors with $X_i \in \mathbb{R}^d$ and $Y_i \in \mathbb{R}^q$, and let $\varphi : \mathbb{R}^{qm} \to \mathbb{R}$ be measurable. The object of interest is the conditional functional
\[
r^{(m)}(\varphi, \mathbf{t}) = \mathbb{E}\big[\varphi(Y_1, \ldots, Y_m) \mid (X_1, \ldots, X_m) = (t_1, \ldots, t_m)\big], \qquad \mathbf{t} \in \mathbb{R}^{dm}, \tag{1.1}
\]
whenever a regular version exists. Given a kernel $K$ and bandwidth $h_n \to 0$, the estimator proposed in [60] is
\[
\widehat{\widehat{r}}_n^{(m)}(\varphi, \mathbf{t}; h_n) = \frac{\displaystyle\sum_{(i_1, \ldots, i_m) \in I(m,n)} \varphi(Y_{i_1}, \ldots, Y_{i_m}) \prod_{j=1}^{m} K\!\left(\frac{t_j - X_{i_j}}{h_n}\right)}{\displaystyle\sum_{(i_1, \ldots, i_m) \in I(m,n)} \prod_{j=1}^{m} K\!\left(\frac{t_j - X_{i_j}}{h_n}\right)}, \tag{1.2}
\]
where $I(m,n)$ is the set of $m$-tuples of distinct indices from $\{1, \ldots, n\}$. In the special case $m = 1$, (1.2) reduces exactly to the classical Nadaraya–Watson estimator. The importance of conditional U-statistics is conceptual as much as technical. They allow one to estimate local nonlinear characteristics of the conditional distribution of the response, and not merely conditional expectations. In other words, they provide a nonparametric mechanism for accessing quantities that depend simultaneously on several responses observed under nearby covariate values. This includes, among many others, local dependence measures, conditional covariance-type functionals, rank-based association coefficients, discrimination criteria, conditional variability measures, multisample comparison functionals, and a broad class of local nonlinear features inaccessible to ordinary regression smoothers. One may therefore regard conditional U-statistics as the natural bridge between first-order smoothing methods and the much richer world of higher-order conditional inference.
The asymptotic theory initiated in [60] has subsequently been developed along several important directions. Sen [63] obtained rates of uniform convergence in the conditioning argument. Prakasa Rao and Sen [64] investigated the corresponding limiting distributions and clarified their relation to Stute’s original results. Harel and Puri [65] extended the theory to weakly dependent observations under mixing assumptions and connected the resulting estimators to Bayes-risk consistency in discrimination problems. Stute [66] introduced symmetrized nearest-neighbor versions of conditional U-statistics as alternatives to ordinary kernel smoothers. A decisive methodological advance was then achieved by [67], who established a much stronger type of consistency: uniformity not only in the location parameter, but also in the bandwidth over shrinking intervals, and simultaneously over classes of kernels $\mathcal{F}$.
Their argument relied crucially on the local conditional U-process framework developed in [27]. This program has since been extended in several directions; see [20,21,68,69,70]. Closely related higher-order quantile problems, including Bahadur–Kiefer representations and bootstrap properties for U-quantiles, were studied in [71,72,73,74,75,76,77,78,79,80,81,82].
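For small samples, the ratio (1.2) can be evaluated by brute force, which may help fix ideas. The following Python sketch is an illustration only and is not taken from [60]: the names `gaussian_kernel` and `conditional_u_statistic`, the restriction to scalar covariates ($d = 1$) and the Gaussian choice of $K$ are assumptions made for the example. With $m = 1$ it returns the Nadaraya–Watson estimate; with $m = 2$, $\varphi(y_1, y_2) = (y_1 - y_2)^2/2$ and $\mathbf{t} = (t, t)$ it returns a local variance-type functional.

```python
import math
from itertools import permutations


def gaussian_kernel(u):
    """Standard Gaussian density, used here as an example of a symmetric kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def conditional_u_statistic(phi, xs, ys, t, h, kernel=gaussian_kernel):
    """Brute-force evaluation of the ratio (1.2) for scalar covariates.

    phi : function of m responses
    xs, ys : observed covariates and responses (same length n)
    t : tuple (t_1, ..., t_m) of conditioning points; m = len(t)
    h : bandwidth h_n > 0
    """
    m = len(t)
    num = 0.0
    den = 0.0
    # I(m, n): all ordered m-tuples of pairwise distinct indices.
    for idx in permutations(range(len(xs)), m):
        w = 1.0
        for j, i in enumerate(idx):
            w *= kernel((t[j] - xs[i]) / h)
        num += phi(*(ys[i] for i in idx)) * w
        den += w
    return num / den


xs = [0.1, 0.4, 0.5, 0.9]
ys = [1.0, 2.0, 4.0, 3.0]
# m = 1: Nadaraya-Watson estimate of E[Y | X = 0.5].
nw = conditional_u_statistic(lambda y: y, xs, ys, t=(0.5,), h=0.2)
# m = 2: local variance-type functional at (0.5, 0.5).
local_var = conditional_u_statistic(
    lambda a, b: 0.5 * (a - b) ** 2, xs, ys, t=(0.5, 0.5), h=0.2
)
```

The enumeration of $I(m,n)$ costs on the order of $n^m$ kernel evaluations, so this direct form is only practical for small $n$ and $m$.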
Notwithstanding this impressive progress, one serious limitation persists throughout most of the existing literature: the near-exclusive use of symmetric kernels. While such kernels are natural and convenient on unconstrained Euclidean domains, they are intrinsically mismatched to compact or otherwise constrained supports. Near the boundary, symmetric kernels place nonnegligible mass outside the support of the covariates, generating the well-known boundary bias phenomenon. This issue is classical and pervasive; see, among many others [83,84,85,86,87,88,89,90,91,92,93,94]. A substantial literature has therefore developed around boundary-correction strategies, beginning with early important contributions such as [95,96]. Among the available remedies, support-adapted asymmetric kernels have progressively emerged as one of the most coherent and geometrically faithful solutions [97]. Their key feature is that the kernel support automatically respects the support of the target distribution, while the shape of the kernel varies with the evaluation point. This location-dependent geometry permits an intrinsically adaptive smoothing mechanism and avoids the artificial mass leakage responsible for boundary distortions under symmetric smoothing.
This perspective has generated a large and fertile literature. In the univariate compact-support setting, Chen [98] introduced the beta-kernel estimator for density estimation, and Chen [99] studied the corresponding regression problem. The asymptotic theory of beta kernels was subsequently developed in [100,101,102,103,104,105,106,107,108,109]. Related regression results with fixed design appear in [110,111], and multivariate product beta constructions were considered in [112]. On simplex-constrained supports, Aitchison and Lauder [113] introduced the Dirichlet-kernel estimator for compositional data. Closely connected to these developments is the Bernstein polynomial methodology, whose roots go back to [114]. Its asymptotic analysis was studied by [115,116,117] and later extended to the multivariate setting in [118,119,120,121,122,123,124,125,126,127,128,129]. Further refinements may be found in [130,131,132,133,134,135,136,137,138,139,140,141,142,143,144]. A recurrent conclusion across these works is that asymmetric kernels and Bernstein-type smoothers exhibit markedly improved boundary behavior and can substantially outperform conventional symmetric procedures when the support is compact or geometrically constrained.
Yet, and this is one of the starting points of the present paper, the theory remains largely incomplete at the level of conditional nonlinear functionals. Although asymmetric kernels are now rather well understood for density estimation and, to a lesser extent, for ordinary nonparametric regression, essentially no general asymptotic theory appears to be available for conditional U-statistics smoothed by support-adapted asymmetric kernels. This gap is not a mere technical oversight. The transition from linear local averages to locally weighted U-functionals is conceptually and analytically substantial. One must now analyze a nonlinear ratio of localized U-statistics, control kernels whose shape depends on the evaluation point, handle Hoeffding projections of point-dependent kernels, and deal with canonical terms whose behavior is substantially more intricate than in the linear case. In particular, the higher-order dependence structure intrinsic to conditional U-statistics cannot be treated by a naive transfer of techniques from classical Nadaraya–Watson smoothing or asymmetric density estimation. The nonlinear ratio structure, the localized U-process nature of the fluctuations, and the point-dependent asymmetry of the smoother interact in a genuinely nontrivial manner.
A second, independent, and practically unavoidable layer of difficulty is introduced by missing data. In modern statistical applications, incomplete responses are not exceptional but routine. Missingness arises through nonresponse, sensor failure, fusion of data sources, longitudinal attrition, intermittent measurement, privacy constraints, truncation, corruption, or recording limitations. This issue is ubiquitous across applications; see [145,146,147,148]. The classical taxonomy of [148] distinguishes Missing Completely at Random (MCAR), Missing At Random (MAR), and Not Missing At Random (NMAR). The MCAR assumption is often implausibly strong, whereas NMAR mechanisms are notoriously difficult to analyze without stringent structural assumptions. The MAR framework occupies the most useful intermediate position: missingness may depend on observed covariates, but not on the unobserved response itself once those covariates are conditioned upon. This assumption is simultaneously realistic enough for a broad range of applications and mathematically tractable enough to support rigorous asymptotic analysis. Moreover, as emphasized in [149], procedures developed under MAR can perform remarkably well in practice, often more reliably than misspecified NMAR models. At the same time, the formal content of the MAR assumption is subtler than the usual informal slogan suggests, and important conceptual clarifications may be found in [150,151,152,153].
The conjunction of support-adapted smoothing and incomplete responses is especially compelling when the covariate space is itself bounded, compositional, or otherwise geometrically structured. Compact supports arise naturally in economics and finance, where covariates may be proportions, rates, shares, recovery fractions, or budget allocations. They also occur in compositional models, nonparametric copula methodology [127], the nonparametric component of partial linear regressions [154], matching procedures [155], and structural auction models [156]. In such settings, the support geometry is not a peripheral technicality; it is an intrinsic feature of the inferential problem. A faithful nonparametric theory for higher-order conditional functionals must therefore account, simultaneously and coherently, for nonlinear conditioning, geometric support constraints, boundary bias, and incomplete sampling.
We now clarify more precisely the sense in which the present contribution is new. The individual components of the problem have, of course, substantial precedents. Conditional U-statistics go back to [60] and their uniform asymptotic theory has been developed in several directions. Asymmetric kernels, including beta, Dirichlet and Bernstein-type smoothing devices, are well established in density estimation and in ordinary nonparametric regression on bounded supports. Likewise, complete-case and inverse-probability ideas under Missing-at-Random sampling are classical in missing-data analysis. The novelty of the present paper is not located in any one of these ingredients taken separately, but in the asymptotic analysis of their simultaneous occurrence in a nonlinear conditional U-statistic problem.
More specifically, the estimators considered here combine four features that are usually handled separately: localized higher-order U-statistics, point-dependent asymmetric kernels, boundary-sensitive support geometry, and complete-case MAR sampling. This combination is analytically nontrivial. Unlike ordinary Nadaraya–Watson regression, the numerator and denominator are localized U-statistics and their stochastic behavior is governed by Hoeffding projections, including canonical components whose degeneracy has to be controlled uniformly in the localization point. Unlike symmetric kernel smoothing, the kernels used here are not translation invariant: their supports, moments, normalizing constants and $L^2$-norms depend on the evaluation point and may change regime as the point approaches the boundary of the simplex, cube or mixed support. Finally, unlike the complete-data case, MAR sampling changes the effective local design measure from $f(x)\,dx$ to $p(x) f(x)\,dx$ for complete-case estimators, and modifies the leading stochastic variance through the propensity score. Accordingly, the present paper develops a kernel-specific limit theory rather than a merely formal extension of existing results. For Dirichlet, Bernstein and beta smoothers we derive the deterministic centering, uniform stochastic bounds, consistency statements, asymptotic normalizations and variance expressions in forms that explicitly reflect the geometry of the corresponding support. In particular, the bias is governed by the local moments of the asymmetric smoothing device and by the effective complete-case design density $pf$, whereas the leading fluctuation is governed by the first Hoeffding projection and contains the usual inverse-propensity loss of information. This separation between smoothing geometry, higher-order U-statistic structure and MAR-induced information loss is one of the main conceptual outcomes of the paper.
Thus, the phrase “unified framework” is used here in a restricted and technical sense: the paper provides a common asymptotic treatment of several support-adapted asymmetric smoothing schemes for conditional U-statistics under MAR sampling, while preserving the kernel-specific bias, variance and boundary behavior of each smoother. We do not claim novelty for U-statistics, asymmetric kernels or MAR methods in isolation. The contribution is the boundary-adapted, MAR-aware and U-process-based asymptotic theory obtained for their combination. The novelty of the paper should be understood in a precise sense. Conditional U-statistics, asymmetric kernels and MAR sampling are all established topics. What appears not to have been developed before is a kernel-specific asymptotic theory for conditional U-statistics smoothed by support-adapted asymmetric kernels when the responses are observed under a Missing-at-Random mechanism. This setting is not a direct corollary of ordinary asymmetric-kernel regression or of complete-data conditional U-statistic theory. The numerator and denominator are localized U-statistics; their fluctuations involve Hoeffding projections and canonical U-process terms; the smoothing kernels are point-dependent and boundary-sensitive; and complete-case MAR sampling replaces the design density by the effective density $pf$ in the deterministic centering while inflating the stochastic variance through the propensity score. The paper therefore contributes a genuinely joint analysis of higher-order conditional U-statistics, asymmetric boundary-adapted smoothing and incomplete-response sampling. For Dirichlet, Bernstein and beta smoothers, we obtain explicit consistency results, uniform stochastic bounds, bias descriptions, asymptotic normalizations and variance formulae. The resulting theory identifies separately the roles of boundary geometry, localized U-statistic dependence and MAR-induced information loss.
This is the sense in which the proposed framework is unified: it provides one asymptotic architecture for several support-respecting smoothers, while retaining the kernel-specific constants and boundary regimes needed for statistical interpretation.
The relevance of the proposed estimators is not purely abstract. They arise naturally in discrimination and classification, in multisample nonlinear conditional inference, and in the estimation of local rank-based association measures such as conditional Kendall-type coefficients. They also provide a flexible mechanism for estimating conditional functionals that genuinely depend on several responses and cannot be reduced to first-order regression objects. From a practical perspective, their support-adapted nature makes them particularly attractive in problems where the covariates are inherently bounded, while the MAR formulation makes them immediately relevant to realistic data-analytic environments in which incomplete responses are unavoidable. The present manuscript is closely connected to, but technically distinct from, the general delta-sequence theory of [157]. That work provides an abstract probabilistic framework for complete-case conditional U-statistics under MAR, in which the localization device is treated as a positive approximate identity. The present paper develops a sharper and more geometric theory for a specific class of approximate identities, namely support-adapted asymmetric kernels. This specialization is analytically nontrivial because Dirichlet, beta, and Bernstein smoothers are point-dependent, boundary-sensitive, and support-constrained. Their local moments, normalizing constants, and $L^2$ norms vary across the domain and may change regime near boundary strata. Consequently, the resulting stochastic expansions, uniform rates, and asymptotic variance formulae require arguments beyond those available in the abstract delta-sequence setting.
The remainder of the paper is organized as follows. In Section 3, we introduce the conditional U-statistic estimator based on Dirichlet kernels. We begin with the regression case $m = 1$ in Section 3.1, where we prove a uniform convergence result of independent interest; see Theorem 1. The extension to higher-order conditional U-statistics is developed in Section 3.2, culminating in Corollary 1. Their limiting distribution is established in Section 3.3; see Theorem 5 and Corollary 2. Section 4 is devoted to Bernstein polynomial smoothing. The case $m = 1$ is treated in Section 4.2; see Theorem 6 and Corollary 3. Higher-order conditional U-statistics based on Bernstein smoothers are analyzed in Section 4.3, with main results in Theorem 8 and Corollary 4. Section 5 addresses beta-kernel smoothing. Weak uniform convergence is established in Section 10, while strong uniform convergence is derived in Section 5.3; see Corollaries 5 and 6. Section 5.4 extends the analysis to mixed categorical and continuous regressors, leading to Corollary 7. Numerical experiments illustrating the finite-sample behavior of the proposed estimators are presented in Section 9. Finally, concluding remarks and possible extensions are gathered in Section 10. For readability, all proofs are deferred to Section 11, and supplementary technical arguments are collected in Appendix A.

Notation

For the reader’s convenience, we collect in Table 1 the main notation used throughout the paper.

2. Preliminaries and Estimation Procedure

Let us consider a sequence of independent and identically distributed random vectors $\{(X_i, Y_i), i \in \mathbb{N}^*\}$ defined on a common probability space $(\Omega, \mathcal{F}, \mathbb{P})$, where $X_i \in \mathcal{X} \subseteq [0,1]^d$ and $Y_i \in \mathcal{Y} := \mathbb{R}^q$. Let $\varphi : \mathcal{Y}^m \to \mathbb{R}$ be a measurable function with respect to the product Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R}^q)^{\otimes m}$, and assume that $\mathbb{E}|\varphi(Y_1, \ldots, Y_m)| < \infty$. We are interested in the conditional functional
\[
r^{(m)}(\varphi, \tilde{x}) = \mathbb{E}\big[\varphi(Y_1, \ldots, Y_m) \mid (X_1, \ldots, X_m) = \tilde{x}\big], \qquad \tilde{x} \in \mathcal{X}^m, \tag{2.1}
\]
defined as a regular version of the conditional expectation. Indeed, since $\mathcal{X}^m$ is a Borel subset of the Polish space $([0,1]^d)^m$, it is a standard Borel space. Therefore, a regular conditional distribution of $(Y_1, \ldots, Y_m)$ given $(X_1, \ldots, X_m)$ exists, and consequently there is a Borel measurable version of the map
\[
\tilde{x} \mapsto \mathbb{E}\big[\varphi(Y_1, \ldots, Y_m) \mid (X_1, \ldots, X_m) = \tilde{x}\big].
\]
Thus, the quantity in (2.1) is well defined whenever
\[
\mathbb{E}|\varphi(Y_1, \ldots, Y_m)| < \infty.
\]
Stute [60] presented a class of estimators for $r^{(m)}(\varphi, \tilde{x})$, called conditional U-statistics, which is defined for each $\tilde{x} \in \mathcal{X}^m$ and $\ell \in \{1, 2, 3\}$ by
\[
\widetilde{\widetilde{r}}_{n,\ell}^{(m)}\big(\varphi, \tilde{x}; \bar{\Lambda}_{n,\ell}(\tilde{x})\big) = \frac{\displaystyle\sum_{(i_1, \ldots, i_m) \in I(m,n)} \varphi(Y_{i_1}, \ldots, Y_{i_m})\, K_{\Lambda_{n,\ell}(x_1)}(X_{i_1}) \cdots K_{\Lambda_{n,\ell}(x_m)}(X_{i_m})}{\displaystyle\sum_{(i_1, \ldots, i_m) \in I(m,n)} K_{\Lambda_{n,\ell}(x_1)}(X_{i_1}) \cdots K_{\Lambda_{n,\ell}(x_m)}(X_{i_m})}, \tag{2.2}
\]
where $\bar{\Lambda}_{n,\ell}(\tilde{x}) = (\Lambda_{n,\ell}(x_1), \ldots, \Lambda_{n,\ell}(x_m))$ will be specified later in the sections below, and $I(m,n) := \{(i_1, \ldots, i_m) \in \{1, \ldots, n\}^m : i_1, \ldots, i_m \text{ distinct}\}$ denotes the set of $m$-tuples of distinct indices. In the particular case $m = 1$, $r^{(m)}(\varphi, \tilde{x})$ reduces to $r^{(1)}(\varphi, x) = \mathbb{E}(\varphi(Y) \mid X = x)$ and Stute’s estimator becomes the Nadaraya–Watson estimator of $r^{(1)}(\varphi, x)$ given by:
\[
\widetilde{\widetilde{r}}_{n,\ell}^{(1)}(\varphi, x) := \frac{\sum_{i=1}^{n} \varphi(Y_i)\, K_{\Lambda_{n,\ell}(x)}(X_i)}{\sum_{i=1}^{n} K_{\Lambda_{n,\ell}(x)}(X_i)}. \tag{2.3}
\]
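To make the point-dependent weights in (2.3) concrete, the following Python sketch evaluates the ratio for $d = 1$ with a beta kernel. The parametrization used, the Beta$(x/b + 1, (1 - x)/b + 1)$ density with smoothing parameter $b > 0$, is the form commonly used in the beta-kernel literature and is an assumption here, since the paper's own smoothing parameters are specified only in later sections. The names `beta_kernel` and `nadaraya_watson_beta` are hypothetical.

```python
import math


def beta_kernel(u, x, b):
    """Beta(x/b + 1, (1 - x)/b + 1) density evaluated at u (assumed kernel form).

    Returns 0 outside the open interval (0, 1); data points lying exactly
    on the boundary are ignored in this sketch.
    """
    if u <= 0.0 or u >= 1.0:
        return 0.0
    a1 = x / b + 1.0
    a2 = (1.0 - x) / b + 1.0
    # Log of the normalizing constant 1 / B(a1, a2).
    log_norm = math.lgamma(a1 + a2) - math.lgamma(a1) - math.lgamma(a2)
    log_dens = log_norm + (a1 - 1.0) * math.log(u) + (a2 - 1.0) * math.log(1.0 - u)
    return math.exp(log_dens)


def nadaraya_watson_beta(phi, xs, ys, x, b):
    """Ratio of the form (2.3) with beta-kernel weights, for covariates in (0, 1)."""
    weights = [beta_kernel(xi, x, b) for xi in xs]
    return sum(w * phi(yi) for w, yi in zip(weights, ys)) / sum(weights)


# The kernel remains a probability density on [0, 1] even at the boundary
# point x = 0: a midpoint Riemann sum of the density is close to 1.
grid = 20000
mass_at_boundary = sum(beta_kernel((k + 0.5) / grid, 0.0, 0.1) for k in range(grid)) / grid

estimate = nadaraya_watson_beta(lambda y: y, [0.05, 0.2, 0.5, 0.8], [1.0, 1.5, 2.0, 4.0], x=0.1, b=0.1)
```

Because the evaluation point enters through the shape parameters rather than through a translation, no kernel mass is placed outside $[0, 1]$, which is the mechanism behind the improved boundary behavior discussed in the Introduction.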
A distinctive contribution of the present work is its explicit incorporation of incomplete response observations into the foregoing nonparametric framework. Throughout, the covariate sequence $\{X_i\}_{i=1}^{n}$ is assumed to be fully observed, whereas the corresponding responses may be subject to missingness. To formalize this mechanism, let $\delta_i$ denote the response-observation indicator, defined by $\delta_i = 1$ when $Y_i$ is observed and $\delta_i = 0$ otherwise. In accordance with the foundational missing-data taxonomies introduced in [147,148,158], we work under the Missing At Random (MAR) paradigm. This assumption stipulates that, conditionally on the observed covariates, the probability of observing the response is independent of the possibly unobserved response value itself. More precisely, we assume that
\[
\mathbb{P}(\delta_i = 1 \mid X_i, Y_i) = \mathbb{P}(\delta_i = 1 \mid X_i) =: p(X_i) \qquad \mathbb{P}\text{-almost surely}, \tag{2.4}
\]
where $p : \mathcal{X} \to [0,1]$ denotes the conditional probability of response observation, commonly referred to as the propensity score. Unless otherwise stated, this function is assumed to be continuous on its domain.
Condition (2.4) is equivalent to the conditional independence of $\delta_i$ and $Y_i$ given $X_i$. This assumption provides a mathematically tractable yet substantively meaningful framework for a broad range of applications, including environmental monitoring systems, biomedical follow-up studies, and longitudinal epidemiological investigations. Moreover, as emphasized in [149], inference procedures constructed under a correctly specified MAR mechanism may yield substantially more reliable prediction and imputation performance than approaches based on misspecified nonignorable missingness mechanisms in the absence of adequate auxiliary information. For the asymptotic validity of the resulting complete-case procedure, we impose the standard positivity condition $\inf_{x \in S} p(x) \geq c > 0$ on the relevant compact or effective domain $S \subseteq \mathcal{X}$. This condition prevents the local effective sample size from degenerating and ensures that the denominator of the estimator remains asymptotically well behaved. The natural extension of (2.2) to the incomplete-response setting is then given by
\[
\widehat{r}_{n,\ell}^{(m)}\big(\varphi, \tilde{x}; \bar{\Lambda}_{n,\ell}(\tilde{x})\big) = \frac{\displaystyle\sum_{(i_1, \ldots, i_m) \in I(m,n)} \varphi(Y_{i_1}, \ldots, Y_{i_m})\, \delta_{i_1} \cdots \delta_{i_m}\, K_{\Lambda_{n,\ell}(x_1)}(X_{i_1}) \cdots K_{\Lambda_{n,\ell}(x_m)}(X_{i_m})}{\displaystyle\sum_{(i_1, \ldots, i_m) \in I(m,n)} \delta_{i_1} \cdots \delta_{i_m}\, K_{\Lambda_{n,\ell}(x_1)}(X_{i_1}) \cdots K_{\Lambda_{n,\ell}(x_m)}(X_{i_m})}. \tag{2.5}
\]
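The role of the indicators in (2.5) can be illustrated with a short, kernel-agnostic Python sketch. It assumes that the kernel weights $K_{\Lambda_{n,\ell}(x_j)}(X_i)$ have already been computed and are passed in as a nested list `weights[j][i]`; this layout and the function name `complete_case_conditional_u` are hypothetical choices for the example, not part of the paper.

```python
from itertools import permutations


def complete_case_conditional_u(phi, ys, deltas, weights):
    """Brute-force complete-case ratio of the form (2.5).

    phi     : function of m responses
    ys      : responses; ys[i] may be None when deltas[i] == 0
    deltas  : observation indicators (1 = response observed, 0 = missing)
    weights : weights[j][i] = kernel weight of X_i at the j-th conditioning
              point, so that m = len(weights)
    """
    m = len(weights)
    num = 0.0
    den = 0.0
    for idx in permutations(range(len(ys)), m):
        # The product delta_{i_1} ... delta_{i_m} vanishes as soon as one
        # response in the tuple is missing, so such tuples contribute
        # neither to the numerator nor to the denominator.
        if any(deltas[i] == 0 for i in idx):
            continue
        w = 1.0
        for j, i in enumerate(idx):
            w *= weights[j][i]
        num += phi(*(ys[i] for i in idx)) * w
        den += w
    return num / den


ys = [1.0, None, 3.0, 2.0]          # second response is missing
deltas = [1, 0, 1, 1]
w = [0.5, 0.9, 0.2, 0.3]            # illustrative kernel weights
value = complete_case_conditional_u(
    lambda a, b: 0.5 * (a - b) ** 2, ys, deltas, [w, w]
)
```

In this toy example the large weight 0.9 attached to the unobserved response is simply discarded, which is how complete-case sampling reduces the local effective sample size; the positivity condition on $p$ is what keeps the denominator from degenerating.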
Remark 1.
We stress that the estimator defined in (2.5) is a complete-case local ratio estimator. It is not an inverse-probability weighted estimator unless the factors $\delta_i$ are explicitly replaced by $\delta_i / p(\mathbf{X}_i)$. This distinction is not merely terminological: the two procedures have different deterministic equivalent smoothing measures and therefore different bias constants, although both estimate the same conditional target under MAR. Let
\[
g_a(\mathbf{x}) := a(\mathbf{x}) f(\mathbf{x}), \qquad a(\mathbf{x}) > 0,
\]
and define the deterministic smoothed functional
\[
r^{(m)}_{a,n,\ell}(\varphi, \tilde{\mathbf{x}})
= \frac{\int_{\mathcal{X}^m} r^{(m)}(\varphi, \tilde{\mathbf{t}}) \prod_{j=1}^{m} g_a(\mathbf{t}_j)\, K_{\Lambda_{n,\ell}(\mathbf{x}_j)}(\mathbf{t}_j)\, d\tilde{\mathbf{t}}}
{\int_{\mathcal{X}^m} \prod_{j=1}^{m} g_a(\mathbf{t}_j)\, K_{\Lambda_{n,\ell}(\mathbf{x}_j)}(\mathbf{t}_j)\, d\tilde{\mathbf{t}}}.
\]
Then the deterministic centering associated with the complete-case estimator is obtained by taking
\[
a = p, \qquad g_p = p f,
\]
whereas the deterministic centering associated with the IPW ratio estimator is obtained by taking
\[
a \equiv 1, \qquad g_1 = f.
\]
Indeed, under MAR,
\[
\mathbb{E}(\delta \mid \mathbf{X}, \mathbf{Y}) = \mathbb{E}(\delta \mid \mathbf{X}) = p(\mathbf{X}),
\]
so complete-case smoothing changes the local design measure from $f(\mathbf{x})\,d\mathbf{x}$ to $p(\mathbf{x}) f(\mathbf{x})\,d\mathbf{x}$. In contrast, for the IPW factor,
\[
\mathbb{E}\left\{ \frac{\delta}{p(\mathbf{X})} \,\Big|\, \mathbf{X}, \mathbf{Y} \right\} = 1,
\]
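and hence the deterministic IPW smoothing measure is again $f(\mathbf{x})\,d\mathbf{x}$. Consequently, any formula involving the density $f$ alone corresponds either to the complete-data/IPW centering or to the special case where $p$ is locally constant; for the complete-case estimator, the corresponding density is $p f$. This distinction is also visible at the level of the bias expansion. Let $\mathbf{T}_{n,\ell,\tilde{\mathbf{x}}}$ denote a random vector on $\mathcal{X}^m$ with product density
\[
\prod_{j=1}^{m} K_{\Lambda_{n,\ell}(\mathbf{x}_j)}(\mathbf{t}_j),
\]
and put
\[
\Delta_{n,\ell} = \mathbf{T}_{n,\ell,\tilde{\mathbf{x}}} - \tilde{\mathbf{x}}, \qquad
\mu_{n,\ell}(\tilde{\mathbf{x}}) = \mathbb{E}(\Delta_{n,\ell}), \qquad
\Sigma_{n,\ell}(\tilde{\mathbf{x}}) = \mathbb{E}(\Delta_{n,\ell} \Delta_{n,\ell}^{\top}).
\]
Writing
\[
q_a(\tilde{\mathbf{x}}) = \log \prod_{j=1}^{m} g_a(\mathbf{x}_j) = \sum_{j=1}^{m} \log g_a(\mathbf{x}_j),
\]
a standard second-order ratio expansion gives, on interior compact sets,
\[
r^{(m)}_{a,n,\ell}(\varphi, \tilde{\mathbf{x}}) - r^{(m)}(\varphi, \tilde{\mathbf{x}})
= \nabla r^{(m)}(\varphi, \tilde{\mathbf{x}})^{\top} \mu_{n,\ell}(\tilde{\mathbf{x}})
+ \frac{1}{2} \operatorname{tr}\big\{ \nabla^2 r^{(m)}(\varphi, \tilde{\mathbf{x}})\, \Sigma_{n,\ell}(\tilde{\mathbf{x}}) \big\}
+ \nabla r^{(m)}(\varphi, \tilde{\mathbf{x}})^{\top} \Sigma_{n,\ell}(\tilde{\mathbf{x}})\, \nabla q_a(\tilde{\mathbf{x}})
+ o\big( \|\mu_{n,\ell}\| + \|\Sigma_{n,\ell}\| \big).
\]
Thus, for the complete-case estimator,
\[
q_p(\tilde{\mathbf{x}}) = \sum_{j=1}^{m} \log\{ p(\mathbf{x}_j) f(\mathbf{x}_j) \},
\]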
whereas for the IPW estimator, $q_1(\tilde{\mathbf{x}}) = \sum_{j=1}^{m} \log f(\mathbf{x}_j)$. Therefore, the propensity score $p$ cancels from the zeroth-order target under MAR, but in general it does not disappear from the higher-order smoothing bias of the complete-case ratio. It enters through the local design tilt $\log(p f) = \log f + \log p$. Only when $p$ is locally constant, or when the corresponding design-gradient term is of smaller order, does the complete-case bias constant reduce to the complete-data/IPW one. The variance has a different interpretation. In both complete-case and IPW formulations, missingness reduces the effective local information. At the level of the leading Hoeffding projection, the variance contains the usual inverse-propensity inflation. In the simplest first-order case $m = 1$, this takes the familiar local form
\[
\operatorname{Var}\big\{ \widehat{r}^{(1)}_{n,\ell}(\varphi, \mathbf{x}) \big\}
= \frac{\sigma_{\varphi}^{2}(\mathbf{x})\, \| K_{\Lambda_{n,\ell}(\mathbf{x})} \|_2^2}{n\, p(\mathbf{x}) f(\mathbf{x})}\, \{ 1 + o(1) \},
\]
where
\[
\sigma_{\varphi}^{2}(\mathbf{x}) = \operatorname{Var}\{ \varphi(\mathbf{Y}) \mid \mathbf{X} = \mathbf{x} \}.
\]
For conditional U-statistics of order $m$, the analogous expression is obtained by replacing $\sigma_{\varphi}^{2}$ by the appropriate conditional Hoeffding-projection variance. In schematic form,
\[
V^{(m)}_{n,\ell}(\tilde{\mathbf{x}})
= \frac{1}{n} \sum_{j=1}^{m} \frac{v_j(\tilde{\mathbf{x}})\, \| K_{\Lambda_{n,\ell}(\mathbf{x}_j)} \|_2^2}{p(\mathbf{x}_j) f(\mathbf{x}_j)}\, \{ 1 + o(1) \},
\]
where $v_j(\tilde{\mathbf{x}})$ denotes the variance of the $j$-th conditional first Hoeffding projection of $\varphi(\mathbf{Y}_1,\dots,\mathbf{Y}_m)$ at $(\mathbf{X}_1,\dots,\mathbf{X}_m) = \tilde{\mathbf{x}}$. This formula should be read as identifying the missingness contribution to the stochastic scale; the precise kernel-dependent constant is the one displayed in each specific Dirichlet, Bernstein, or beta-kernel theorem. Consequently, the MSE expansion must be written with estimator-specific bias constants:
\[
\operatorname{MSE}_{\mathrm{cc}}\big\{ \widehat{r}^{(m)}_{n,\ell}(\varphi, \tilde{\mathbf{x}}) \big\}
= \big\{ r^{(m)}_{p,n,\ell}(\varphi, \tilde{\mathbf{x}}) - r^{(m)}(\varphi, \tilde{\mathbf{x}}) \big\}^2 + V^{(m)}_{n,\ell}(\tilde{\mathbf{x}}) + o(\cdot),
\]
whereas
\[
\operatorname{MSE}_{\mathrm{ipw}}\big\{ \widehat{r}^{(m),\mathrm{IPW}}_{n,\ell}(\varphi, \tilde{\mathbf{x}}) \big\}
= \big\{ r^{(m)}_{1,n,\ell}(\varphi, \tilde{\mathbf{x}}) - r^{(m)}(\varphi, \tilde{\mathbf{x}}) \big\}^2 + V^{(m)}_{n,\ell}(\tilde{\mathbf{x}}) + o(\cdot).
\]
If, for a given asymmetric smoothing scheme, the squared bias and variance have the generic local orders
\[
\big\{ r^{(m)}_{a,n,\ell}(\varphi, \tilde{\mathbf{x}}) - r^{(m)}(\varphi, \tilde{\mathbf{x}}) \big\}^2 \sim C_{B,a,\ell}(\tilde{\mathbf{x}})\, b_n^{2\beta}, \qquad
V^{(m)}_{n,\ell}(\tilde{\mathbf{x}}) \sim \frac{C_{V,\ell}(\tilde{\mathbf{x}})}{n\, b_n^{\kappa}},
\]
then the corresponding pointwise optimal bandwidth is
\[
b_{n,a}(\tilde{\mathbf{x}}) = \left\{ \frac{\kappa\, C_{V,\ell}(\tilde{\mathbf{x}})}{2\beta\, C_{B,a,\ell}(\tilde{\mathbf{x}})\, n} \right\}^{1/(2\beta+\kappa)}.
\]
Thus the complete-case optimal bandwidth uses the bias constant $C_{B,p,\ell}$, which involves the effective density $p f$, whereas the IPW optimal bandwidth uses $C_{B,1,\ell}$, which involves $f$. This is the precise sense in which the complete-case and IPW formulations must be kept separate throughout the density, bias, variance, MSE, and bandwidth calculations.
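The pointwise optimal bandwidth stated above follows from a short calculus argument. As a sketch only, write $B$ and $V$ as shorthand (introduced here, not in the original) for the bias and variance constants $C_{B,a,\ell}(\tilde{\mathbf{x}})$ and $C_{V,\ell}(\tilde{\mathbf{x}})$, treat the two leading terms as exact, and minimize their sum $M(b)$ over $b > 0$:

```latex
\begin{align*}
M(b) &= B\, b^{2\beta} + \frac{V}{n\, b^{\kappa}}, \\
M'(b) &= 2\beta B\, b^{2\beta-1} - \frac{\kappa V}{n\, b^{\kappa+1}} = 0
\;\Longleftrightarrow\;
b^{2\beta+\kappa} = \frac{\kappa V}{2\beta B\, n}, \\
b^{*} &= \left( \frac{\kappa V}{2\beta B\, n} \right)^{1/(2\beta+\kappa)},
\qquad
M(b^{*}) = \left( 1 + \frac{2\beta}{\kappa} \right) B\, (b^{*})^{2\beta}
\asymp n^{-2\beta/(2\beta+\kappa)} .
\end{align*}
```

Under these assumptions, the propensity score affects $b^{*}$ only through the constants: through $B$ for the complete-case estimator, and through $V$ for both formulations.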
The statistic in (2.5) is therefore a complete-case local ratio estimator, obtained by retaining only those $m$-tuples for which all response components are observed. Owing to the MAR mechanism, however, its exact finite-sample centering is not, in general, the target functional $r^{(m)}(\varphi, \tilde{\mathbf{x}})$ itself. Rather, the estimator is naturally centered around the propensity-weighted smoothed functional
\[
r^{(m)}_{p,n,\ell}(\varphi, \tilde{\mathbf{x}})
:= \frac{\int_{\mathcal{X}^m} r^{(m)}(\varphi, \tilde{\mathbf{t}}) \prod_{j=1}^{m} p(\mathbf{t}_j) f(\mathbf{t}_j)\, K_{\Lambda_{n,\ell}(\mathbf{x}_j)}(\mathbf{t}_j)\, d\tilde{\mathbf{t}}}
{\int_{\mathcal{X}^m} \prod_{j=1}^{m} p(\mathbf{t}_j) f(\mathbf{t}_j)\, K_{\Lambda_{n,\ell}(\mathbf{x}_j)}(\mathbf{t}_j)\, d\tilde{\mathbf{t}}}.
\]
Thus, the missingness mechanism modifies the local smoothing measure through the multiplicative factor $p(\cdot)$, thereby replacing the design density $f$ by the effective complete-case density $p f$. Nevertheless, under the continuity and positivity assumptions imposed on the propensity score, this modification is asymptotically negligible at the target point. Consequently,
\[
r^{(m)}_{p,n,\ell}(\varphi, \tilde{\mathbf{x}}) = r^{(m)}(\varphi, \tilde{\mathbf{x}}) + o(1),
\]
uniformly over the domain of interest. This observation shows that, although the complete-case estimator is centered around a finite-sample propensity-adjusted smoothing functional, it remains asymptotically centered at the same regression-type target as in the fully observed case. Throughout this paper, any multivariate point will be written in bold. To avoid confusion, we write $\mathbf{x} = (x_1,\dots,x_d)$ for $\mathbf{x} \in \mathcal{X}$, and we denote by $\tilde{\mathbf{x}} := (\mathbf{x}_1,\dots,\mathbf{x}_m)$ an $m$-tuple of multivariate points $\mathbf{x}_i \in \mathcal{X}$, $1 \le i \le m$. Accordingly, $\mathbf{1} = (1,\dots,1)$ denotes the $d$-dimensional vector whose components are all equal to 1, and $\tilde{\mathbf{1}} = (\mathbf{1},\dots,\mathbf{1})$ an $m$-tuple of points $\mathbf{1}$. From now on, we shall use the following notation:
\[
\begin{aligned}
\tilde{\mathbf{X}} &:= (\mathbf{X}_1,\dots,\mathbf{X}_m) \in \mathcal{X}^m, &\text{and}\quad \tilde{\mathbf{X}}_{\mathbf{i}} &:= (\mathbf{X}_{i_1},\dots,\mathbf{X}_{i_m}) \in \mathcal{X}^m, & \mathbf{i} &\in I(m,n), \\
\tilde{\mathbf{Y}} &:= (\mathbf{Y}_1,\dots,\mathbf{Y}_m) \in \mathbb{R}^{qm}, &\text{and}\quad \tilde{\mathbf{Y}}_{\mathbf{i}} &:= (\mathbf{Y}_{i_1},\dots,\mathbf{Y}_{i_m}) \in \mathbb{R}^{qm}, & \mathbf{i} &\in I(m,n), \\
\tilde{\delta} &:= (\delta_1,\dots,\delta_m) \in \{0,1\}^m, &\text{and}\quad \tilde{\delta}_{\mathbf{i}} &:= (\delta_{i_1},\dots,\delta_{i_m}) \in \{0,1\}^m, & \mathbf{i} &\in I(m,n).
\end{aligned}
\]
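We now define, for all $\tilde{\mathbf{x}} = (\mathbf{x}_1,\dots,\mathbf{x}_m) \in \mathcal{X}^m$ and $\ell \in \{1,2,3\}$, the augmented kernel that incorporates both the kernel weights and the missingness indicators:
\[
G^{(\mathrm{miss})}_{\varphi,\tilde{\mathbf{x}},\ell}(\tilde{\mathbf{t}}, \tilde{\mathbf{y}}, \tilde{\delta})
= \varphi(\tilde{\mathbf{y}}) \prod_{j=1}^{m} \delta_j\, \tilde{K}_{\bar{\Lambda}_{n,\ell}(\tilde{\mathbf{x}})}(\tilde{\mathbf{t}}),
\qquad (\tilde{\mathbf{t}}, \tilde{\mathbf{y}}, \tilde{\delta}) \in \mathcal{X}^m \times \mathbb{R}^{qm} \times \{0,1\}^m,
\tag{2.6}
\]
where
\[
\tilde{K}_{\bar{\Lambda}_{n,\ell}(\tilde{\mathbf{x}})}(\tilde{\mathbf{t}}) := \prod_{i=1}^{m} K_{\Lambda_{n,\ell}(\mathbf{x}_i)}(\mathbf{t}_i).
\]
For $\ell \in \{1,2,3\}$, we now define the U-statistic based on the extended random vectors $\mathbf{Z}_i := (\mathbf{X}_i, \mathbf{Y}_i, \delta_i)$:
\[
u_{n,\ell}(\varphi, \tilde{\mathbf{x}}) := u^{(m)}_{n,\ell}\big( G^{(\mathrm{miss})}_{\varphi,\tilde{\mathbf{x}},\ell} \big)
= \frac{(n-m)!}{n!} \sum_{\mathbf{i} \in I(m,n)} G^{(\mathrm{miss})}_{\varphi,\tilde{\mathbf{x}},\ell}\big( \tilde{\mathbf{X}}_{\mathbf{i}}, \tilde{\mathbf{Y}}_{\mathbf{i}}, \tilde{\delta}_{\mathbf{i}} \big).
\]
We can see that
\[
\widehat{r}^{(m)}_{n,\ell}\big(\varphi, \tilde{\mathbf{x}}, \bar{\Lambda}_{n,\ell}(\tilde{\mathbf{x}})\big) = \frac{u_{n,\ell}(\varphi, \tilde{\mathbf{x}})}{u_{n,\ell}(1, \tilde{\mathbf{x}})},
\tag{2.8}
\]
where $u_{n,\ell}(1, \tilde{\mathbf{x}})$ corresponds to the constant function $\varphi \equiv 1$. In establishing the uniform consistency of $\widehat{r}^{(m)}_{n,\ell}(\varphi, \tilde{\mathbf{x}}, \bar{\Lambda}_{n,\ell}(\tilde{\mathbf{x}}))$ with respect to $r^{(m)}(\varphi, \tilde{\mathbf{x}})$, an alternative and more suitable centering factor will be considered instead of the expectation $\mathbb{E}\,\widehat{r}^{(m)}_{n,\ell}(\varphi, \tilde{\mathbf{x}}, \bar{\Lambda}_{n,\ell}(\tilde{\mathbf{x}}))$, which may either be non-existent or computationally challenging to determine. This alternative centering is defined as follows:
\[
\widehat{\mathbb{E}}\,\widehat{r}^{(m)}_{n,\ell}\big(\varphi, \tilde{\mathbf{x}}, \bar{\Lambda}_{n,\ell}(\tilde{\mathbf{x}})\big) = \frac{\mathbb{E}\, u_{n,\ell}(\varphi, \tilde{\mathbf{x}})}{\mathbb{E}\, u_{n,\ell}(1, \tilde{\mathbf{x}})}.
\tag{2.9}
\]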

2.1. Hoeffding Decomposition for Symmetric Kernels

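We record here the notation and facts used in the remainder of the paper. For a kernel $L$ of $k \ge 1$ variables we define
\[
U^{(k)}_n(L) = \frac{(n-k)!}{n!} \sum_{\mathbf{i} \in I(k,n)} L\big( \mathbf{X}_{i_1},\dots,\mathbf{X}_{i_k} \big).
\]
Suppose that $L$ is a function of $k \ge 1$ variables, symmetric in its entries. Then, the Hoeffding projections (see [2,19]) with respect to $\mathbb{P}$, for $1 \le j \le k$, are defined as
\[
\pi_{j,k} L(\mathbf{x}_1,\dots,\mathbf{x}_j) = (\Delta_{\mathbf{x}_1} - \mathbb{P}) \times \cdots \times (\Delta_{\mathbf{x}_j} - \mathbb{P}) \times \mathbb{P}^{k-j}(L),
\]
and
\[
\pi_{0,k} L = \mathbb{E}\, L(\mathbf{X}_1,\dots,\mathbf{X}_k),
\]
where for measures $Q_i$ on $\mathcal{X}$ we denote
\[
Q_1 \times \cdots \times Q_m L = \int_{\mathcal{X}^m} L(\mathbf{x}_1,\dots,\mathbf{x}_m)\, dQ_1(\mathbf{x}_1) \cdots dQ_m(\mathbf{x}_m),
\]
and $\Delta_{\mathbf{x}}$ denotes the Dirac measure at the point $\mathbf{x} \in \mathcal{X}$. Then, the decomposition of [2] gives the following orthogonal expansion:
\[
U^{(k)}_n(L) - \mathbb{E} L = \sum_{j=1}^{k} \binom{k}{j} U^{(j)}_n(\pi_{j,k} L).
\]
For $L \in L_2(\mathbb{P}^k)$ this is an orthogonal decomposition and $\mathbb{E}(\pi_{j,k} L \mid \mathbf{X}_2,\dots,\mathbf{X}_j) = 0$ almost surely for $j \ge 1$; that is, the kernels $\pi_{j,k} L$ are canonical (degenerate) for $\mathbb{P}$. Also, $\pi_{j,k}$, $j \ge 1$, are nested projections, i.e., $\pi_{j,k} \circ \pi_{j',k} = \pi_{j,k}$ if $j \le j'$, and
\[
\mathbb{E}(\pi_{j,k} L)^2 \le \mathbb{E}(L - \mathbb{E} L)^2 \le \mathbb{E} L^2.
\]
For example, for $k \ge 1$,
\[
\pi_{1,k} L(\mathbf{x}) = \mathbb{E}\big( L(\mathbf{X}_1,\dots,\mathbf{X}_k) \mid \mathbf{X}_1 = \mathbf{x} \big) - \mathbb{E}\, L(\mathbf{X}_1,\dots,\mathbf{X}_k).
\]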
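The decomposition can be checked exactly on a toy example. The sketch below is illustrative only: the three-point distribution, the kernel `L`, and the sample are hypothetical choices, not objects from the paper, and $I(k,n)$ is assumed to be the set of ordered $k$-tuples of distinct indices, in line with the normalization $(n-k)!/n!$. With $k = 2$ the expansion reads $U_n^{(2)}(L) - \mathbb{E}L = 2\,U_n^{(1)}(\pi_{1,2}L) + U_n^{(2)}(\pi_{2,2}L)$.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical toy law P on three points, with exact rational weights.
support = [0, 1, 3]
prob = {0: Fraction(1, 2), 1: Fraction(1, 3), 3: Fraction(1, 6)}

def L(x, y):
    """Symmetric kernel of k = 2 variables (illustrative choice)."""
    return Fraction((x - y) ** 2, 2)

# Population mean of L under P x P.
EL = sum(prob[a] * prob[b] * L(a, b) for a in support for b in support)

def pi1(x):
    """First Hoeffding projection: E[L(x, X2)] - E[L]."""
    return sum(prob[b] * L(x, b) for b in support) - EL

def pi2(x, y):
    """Second (canonical) projection: L(x, y) - pi1(x) - pi1(y) - E[L]."""
    return L(x, y) - pi1(x) - pi1(y) - EL

def u_stat(kernel, sample, k):
    """Average of the kernel over ordered k-tuples of distinct indices."""
    tuples = list(permutations(range(len(sample)), k))
    return sum(kernel(*(sample[i] for i in idx)) for idx in tuples) / len(tuples)

sample = [0, 3, 1, 1, 0, 3, 0]  # any sample taking values in the support works

lhs = u_stat(L, sample, 2) - EL
rhs = 2 * u_stat(pi1, sample, 1) + u_stat(pi2, sample, 2)

# Degeneracy of pi2: integrating out one argument under P gives zero.
degenerate = all(sum(prob[b] * pi2(a, b) for b in support) == 0 for a in support)
```

Because the arithmetic uses exact fractions, `lhs == rhs` holds as an identity for every sample, not only on average, and `degenerate` confirms that the second projection is canonical.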
Remark 2.
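The functions $G^{(\mathrm{miss})}_{\varphi,\tilde{\mathbf{x}},\ell}$ defined in (2.6) are not necessarily symmetric in their $m$ arguments because the product kernel $\prod_{i=1}^{m} K_{\Lambda_{n,\ell}(\mathbf{x}_i)}(\mathbf{t}_i)$ depends on the ordered $m$-tuple $(\mathbf{x}_1,\dots,\mathbf{x}_m)$. When we need to symmetrize them, we define the averaged kernel:
\[
\bar{G}^{(\mathrm{miss})}_{\varphi,\tilde{\mathbf{x}},\ell}(\tilde{\mathbf{t}}, \tilde{\mathbf{y}}, \tilde{\delta})
:= \frac{1}{m!} \sum_{\sigma \in S_m} G^{(\mathrm{miss})}_{\varphi,\tilde{\mathbf{x}},\ell}(\tilde{\mathbf{t}}_{\sigma}, \tilde{\mathbf{y}}_{\sigma}, \tilde{\delta}_{\sigma})
= \frac{1}{m!} \sum_{\sigma \in S_m} \varphi(\tilde{\mathbf{y}}_{\sigma}) \prod_{j=1}^{m} \delta_{\sigma_j}\, \tilde{K}_{\bar{\Lambda}_{n,\ell}(\tilde{\mathbf{x}}_{\sigma})}(\tilde{\mathbf{t}}_{\sigma}),
\]
where $S_m$ denotes the symmetric group on $\{1,\dots,m\}$, $\tilde{\mathbf{t}}_{\sigma} = (\mathbf{t}_{\sigma_1},\dots,\mathbf{t}_{\sigma_m})$, $\tilde{\mathbf{y}}_{\sigma} = (\mathbf{y}_{\sigma_1},\dots,\mathbf{y}_{\sigma_m})$, $\tilde{\delta}_{\sigma} = (\delta_{\sigma_1},\dots,\delta_{\sigma_m})$, and $\tilde{\mathbf{x}}_{\sigma} = (\mathbf{x}_{\sigma_1},\dots,\mathbf{x}_{\sigma_m})$. After symmetrization, the expectation
\[
\mathbb{E}\, \bar{G}^{(\mathrm{miss})}_{\varphi,\tilde{\mathbf{x}},\ell}(\tilde{\mathbf{t}}, \tilde{\mathbf{y}}, \tilde{\delta}) = \mathbb{E}\, G^{(\mathrm{miss})}_{\varphi,\tilde{\mathbf{x}},\ell}(\tilde{\mathbf{t}}, \tilde{\mathbf{y}}, \tilde{\delta}),
\]
and the U-statistic
\[
u^{(m)}_{n,\ell}\big( G^{(\mathrm{miss})}_{\varphi,\tilde{\mathbf{x}},\ell} \big) = u^{(m)}_{n,\ell}\big( \bar{G}^{(\mathrm{miss})}_{\varphi,\tilde{\mathbf{x}},\ell} \big) =: u_{n,\ell}(\varphi, \tilde{\mathbf{x}})
\]
remain unchanged. Consequently, we may assume without loss of generality that the kernel is symmetric when applying the Hoeffding decomposition.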
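The invariance of the U-statistic under symmetrization of its kernel arguments can be illustrated numerically. The sketch below covers only this generic fact: the kernel `G`, the helper names, and the data are hypothetical, and the permutation of the target points $\tilde{\mathbf{x}}_{\sigma}$ that appears in the display above is not modelled.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def u_stat(kernel, data, m):
    """Average of the kernel over all ordered m-tuples of distinct observations."""
    tuples = list(permutations(data, m))  # tuples are formed by position
    return sum(kernel(*t) for t in tuples) / len(tuples)

def symmetrize(kernel, m):
    """Return the kernel averaged over all m! permutations of its arguments."""
    def averaged(*args):
        total = sum(kernel(*(args[s] for s in sigma))
                    for sigma in permutations(range(m)))
        return total / factorial(m)
    return averaged

def G(a, b, c):
    """Illustrative non-symmetric kernel of m = 3 arguments (not from the paper)."""
    return a * b ** 2 + 3 * c

data = [Fraction(v) for v in (1, 4, 2, 7, 5)]
G_bar = symmetrize(G, 3)

# The sum over ordered tuples is invariant under relabelling of the arguments,
# so both U-statistics coincide exactly.
same_u = u_stat(G, data, 3) == u_stat(G_bar, data, 3)
```

The equality is exact because summing over all ordered tuples already visits every permutation of each index set.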
Before presenting the conditions and primary results, we introduce notation that distinguishes between the complete-data design density and the effective complete-case design density induced by the MAR mechanism. For $a > 0$,
\[
\Gamma(a) = \int_0^{\infty} t^{a-1} e^{-t}\, dt
\]
denotes the gamma function. For a differentiable function $h : \mathbb{R}^d \to \mathbb{R}$, $\nabla h(\mathbf{z})$ denotes the $d$-dimensional column vector of first-order partial derivatives at $\mathbf{z}$. Let $f = f_{\mathbf{X}}$ denote the marginal density of $\mathbf{X}$ with respect to Lebesgue measure on $\mathcal{X}$. For $\tilde{\mathbf{x}} = (\mathbf{x}_1,\dots,\mathbf{x}_m) \in \mathcal{X}^m$, set
\[
\tilde{f}(\tilde{\mathbf{x}}) := \prod_{j=1}^{m} f(\mathbf{x}_j).
\]
In the complete-data case, the natural density-weighted regression functional is
\[
R(\varphi, \tilde{\mathbf{x}}) := \tilde{f}(\tilde{\mathbf{x}})\, r^{(m)}(\varphi, \tilde{\mathbf{x}}).
\]
Under the MAR mechanism, however, the complete-case estimator is centered with respect to the effective complete-case density
\[
g(\mathbf{x}) := p(\mathbf{x}) f(\mathbf{x}),
\]
where $p(\mathbf{x}) = \mathbb{P}(\delta = 1 \mid \mathbf{X} = \mathbf{x})$ is the propensity score. We therefore define
\[
\tilde{g}(\tilde{\mathbf{x}}) := \prod_{j=1}^{m} g(\mathbf{x}_j) = \prod_{j=1}^{m} p(\mathbf{x}_j) f(\mathbf{x}_j),
\]
and
\[
R_p(\varphi, \tilde{\mathbf{x}}) := \tilde{g}(\tilde{\mathbf{x}})\, r^{(m)}(\varphi, \tilde{\mathbf{x}}).
\]
Equivalently,
\[
R_p(\varphi, \tilde{\mathbf{x}}) = \Big\{ \prod_{j=1}^{m} p(\mathbf{x}_j) \Big\} R(\varphi, \tilde{\mathbf{x}}).
\]
Thus, in all deterministic centering, bias, MSE, and bandwidth calculations for the complete-case MAR estimator, the complete-data density $\tilde{f}$ must be replaced by the effective density $\tilde{g}$. Formulae involving $\tilde{f}$ alone correspond either to the fully observed case, to the IPW formulation, or to the special case in which $p$ is constant at the relevant order. The expression "$X \overset{\mathcal{D}}{=} Y$" denotes that the random variable $X$ has the same distribution as $Y$, while "a.s." stands for "almost surely" with respect to $\mathbb{P}$. Moreover, $\|A\|$ represents the Frobenius norm of the matrix $A$, defined as $\|A\| = \{\operatorname{tr}(A^{\top} A)\}^{1/2}$.

2.2. Algorithmic Summary of the Estimator Under MAR

For convenience, we summarize the computation of the complete-case conditional U-statistic estimator under the MAR mechanism. The procedure below applies to the three smoothing schemes considered in this paper, namely Dirichlet ($\ell = 1$), Bernstein ($\ell = 2$), and Beta or mixed kernels ($\ell = 3$).
Algorithm 1 makes explicit the mapping from the inputs $\{(\mathbf{X}_i, \mathbf{Y}_i, \delta_i)\}_{i=1}^{n}$, $\varphi$, $\tilde{\mathbf{x}}$, $\ell$, $\bar{\Lambda}_{n,\ell}(\tilde{\mathbf{x}})$ to the output $\widehat{r}^{(m)}_{n,\ell}(\varphi, \tilde{\mathbf{x}}; \bar{\Lambda}_{n,\ell}(\tilde{\mathbf{x}}))$. The factor $\prod_{j=1}^{m} \delta_{i_j}$ ensures that only complete $m$-tuples contribute to the estimator, in accordance with the complete-case approach under the MAR assumption.
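Algorithm 1 Complete-case conditional U-statistic estimator under MAR
Require: Observations $\{(\mathbf{X}_i, \mathbf{Y}_i, \delta_i)\}_{i=1}^{n}$, order $m \ge 1$, $\tilde{\mathbf{x}} = (\mathbf{x}_1,\dots,\mathbf{x}_m)$
Ensure: $\widehat{r}^{(m)}_{n,\ell}(\varphi, \tilde{\mathbf{x}}; \bar{\Lambda}_{n,\ell}(\tilde{\mathbf{x}}))$
 1: for $j = 1,\dots,m$ do
 2:     Construct the kernel $K_{\Lambda_{n,\ell}(\mathbf{x}_j)}(\cdot)$ associated with $\mathbf{x}_j$
 3: end for
 4: Set $\mathrm{Num} \leftarrow 0$ and $\mathrm{Den} \leftarrow 0$
 5: for each $\mathbf{i} = (i_1,\dots,i_m) \in I(m,n)$ do
 6:     Compute the weight $W_{\mathbf{i}}(\tilde{\mathbf{x}}) := \prod_{j=1}^{m} \delta_{i_j} \prod_{j=1}^{m} K_{\Lambda_{n,\ell}(\mathbf{x}_j)}(\mathbf{X}_{i_j})$
 7:     $\mathrm{Num} \leftarrow \mathrm{Num} + \varphi(\mathbf{Y}_{i_1},\dots,\mathbf{Y}_{i_m})\, W_{\mathbf{i}}(\tilde{\mathbf{x}})$
 8:     $\mathrm{Den} \leftarrow \mathrm{Den} + W_{\mathbf{i}}(\tilde{\mathbf{x}})$
 9: end for
10: if $\mathrm{Den} > 0$ then return $\mathrm{Num} / \mathrm{Den}$
11: else return a default value, or leave the estimator undefined
12: end if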
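Algorithm 1 translates almost line by line into code. The following is a minimal sketch rather than the authors' implementation: the function name `complete_case_cond_u`, the convention of passing pre-built kernels as callables, the choice to return `None` when the denominator vanishes, and the triangular weight in the usage lines are all assumptions made for illustration. It also assumes that $I(m,n)$ is the set of ordered $m$-tuples of distinct indices.

```python
from itertools import permutations
from math import prod

def complete_case_cond_u(X, Y, delta, phi, kernels):
    """Sketch of Algorithm 1.

    X, Y, delta : sequences of length n (covariates, responses, 0/1 indicators)
    phi         : function of m responses
    kernels     : list of m callables; kernels[j](t) stands for the kernel
                  already constructed for the target point x_j (steps 1-3)
    Returns Num / Den, or None when Den == 0 (estimator left undefined).
    """
    n, m = len(X), len(kernels)
    num = den = 0.0
    for idx in permutations(range(n), m):       # ordered tuples of distinct indices
        if any(delta[i] == 0 for i in idx):
            continue                            # indicator product is 0: skip tuple
        w = prod(kernels[j](X[i]) for j, i in enumerate(idx))
        num += phi(*(Y[i] for i in idx)) * w
        den += w
    return num / den if den > 0 else None

# Toy usage with a hypothetical triangular weight on [0, 1]
# (a stand-in, not one of the paper's asymmetric kernels).
def tri(x0, h=0.5):
    return lambda t: max(0.0, 1.0 - abs(t - x0) / h)

X = [0.1, 0.4, 0.5, 0.9]
Y = [1.0, None, 3.0, 2.0]        # None marks a missing response
delta = [1, 0, 1, 1]
est = complete_case_cond_u(X, Y, delta,
                           lambda a, b: 0.5 * (a - b) ** 2,
                           [tri(0.3), tri(0.6)])
```

Incomplete tuples are skipped before `phi` is evaluated, so missing responses are never touched. The loop costs on the order of $n^m$ kernel evaluations, which is only practical for small $m$ or moderate $n$.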
Remark 3.
While Bouzebda et al. [159] addressed the complete-data case, the present paper goes considerably further by treating the Missing-at-Random framework, which introduces a genuinely nontrivial layer of analytical complexity. Indeed, once missingness is allowed, the asymptotic study can no longer be transferred mechanically from the complete-data setting: the complete-case structure, the presence of the propensity score, and the nonlinear nature of conditional U-statistics require new arguments at both the probabilistic and statistical levels. In this sense, the current contribution is not simply an extension of [159], but a substantial strengthening and broadening of that earlier work. Moreover, the paper is accompanied by a markedly richer and more complete numerical study, offering a deeper understanding of the finite-sample performance of the proposed procedures.

2.3. Conditions and Comments

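(C.0) 
The propensity score
\[
p(\mathbf{x}) = \mathbb{P}(\delta = 1 \mid \mathbf{X} = \mathbf{x})
\]
satisfies:
  • $p : \mathcal{X} \to (0,1]$ is continuous on $\mathcal{X}$;
  • there exists a constant $c_p > 0$ such that
    $\inf_{\mathbf{x} \in \mathcal{X}} p(\mathbf{x}) \ge c_p > 0$;
  • $p$ is twice continuously differentiable on $\operatorname{Int}(\mathcal{X})$.
Moreover, the missingness mechanism satisfies the MAR condition
\[
\mathbb{P}(\delta = 1 \mid \mathbf{X} = \mathbf{x}, \mathbf{Y}) = \mathbb{P}(\delta = 1 \mid \mathbf{X} = \mathbf{x}) = p(\mathbf{x}).
\]
(C.1) 
Let
\[
g(\mathbf{x}) := p(\mathbf{x}) f(\mathbf{x}), \qquad
\tilde{g}(\tilde{\mathbf{x}}) := \prod_{j=1}^{m} g(\mathbf{x}_j) = \prod_{j=1}^{m} p(\mathbf{x}_j) f(\mathbf{x}_j),
\]
and define the MAR complete-case density-weighted functional
\[
R_p(\varphi, \tilde{\mathbf{x}}) := \tilde{g}(\tilde{\mathbf{x}})\, r^{(m)}(\varphi, \tilde{\mathbf{x}}).
\]
The function $R_p(\varphi, \cdot)$ is Lipschitz continuous on $\mathcal{X}^m$; that is, there exists a constant $L_{R,p} > 0$ such that, for all $\tilde{\mathbf{x}}, \tilde{\mathbf{x}}' \in \mathcal{X}^m$,
\[
\big| R_p(\varphi, \tilde{\mathbf{x}}) - R_p(\varphi, \tilde{\mathbf{x}}') \big| \le L_{R,p}\, \| \tilde{\mathbf{x}} - \tilde{\mathbf{x}}' \|_2.
\]
Equivalently,
\[
R_p(\varphi, \tilde{\mathbf{x}}) = \Big\{ \prod_{j=1}^{m} p(\mathbf{x}_j) \Big\} R(\varphi, \tilde{\mathbf{x}}).
\]
Thus, when $p \equiv 1$, condition (C.1) reduces to the corresponding complete-data Lipschitz condition on $R(\varphi, \cdot)$.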
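(C.2) 
The effective complete-case density $\tilde{g}$ and the MAR density-weighted functional $R_p(\varphi, \cdot)$ admit continuous second-order partial derivatives on $\operatorname{Int}(\mathcal{X}^m)$; that is,
\[
\tilde{g} \in C^2\big( \operatorname{Int}(\mathcal{X}^m) \big), \qquad R_p(\varphi, \cdot) \in C^2\big( \operatorname{Int}(\mathcal{X}^m) \big).
\]
Equivalently, since $g = p f$, condition (C.2) requires second-order regularity of the effective complete-case design density rather than merely of the original design density. Under (C.0), it is implied by the corresponding $C^2$-regularity of $\tilde{f}$, $R(\varphi, \cdot)$, and $p$.
(C.3) 
There exist constants $\gamma > 0$ and $C_{1,p} \in [1, \infty)$ such that
\[
\mathbb{E}\, |\varphi(\tilde{\mathbf{Y}})|^{2+\gamma} < \infty
\]
and
\[
\sup_{\tilde{\mathbf{x}} \in \operatorname{Int}(\mathcal{X}^m)} \mathbb{E}\big[ |\varphi(\tilde{\mathbf{Y}})|^{2+\gamma} \,\big|\, \tilde{\mathbf{X}} = \tilde{\mathbf{x}} \big]\, \tilde{g}(\tilde{\mathbf{x}}) \le C_{1,p}.
\]
Equivalently,
\[
\sup_{\tilde{\mathbf{x}} \in \operatorname{Int}(\mathcal{X}^m)} \mathbb{E}\big[ |\varphi(\tilde{\mathbf{Y}})|^{2+\gamma} \,\big|\, \tilde{\mathbf{X}} = \tilde{\mathbf{x}} \big] \prod_{j=1}^{m} p(\mathbf{x}_j) f(\mathbf{x}_j) \le C_{1,p}.
\]
Because $0 < p \le 1$, the complete-data condition formulated with $\tilde{f}$ implies (C.3). Conversely, since $p \ge c_p > 0$, the condition with $\tilde{g}$ is equivalent to the corresponding condition with $\tilde{f}$, up to multiplicative constants. The formulation above is the natural one for the complete-case estimator, because its augmented kernel contains the factor $\prod_{j=1}^{m} \delta_{i_j}$.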

2.4. Comments

We first record a few consequences and interpretations of the preceding assumptions. Since $\tilde{g} \in C^2(\operatorname{Int}(\mathcal{X}^m))$ by (C.2), and since the analysis is carried out on compact subsets of the interior, there exists a constant $C_0 \in [1, \infty)$ such that
\[
\sup_{\tilde{\mathbf{x}} \in \mathcal{C}} \tilde{g}(\tilde{\mathbf{x}}) \le C_0, \qquad \mathcal{C} \subset \operatorname{Int}(\mathcal{X}^m).
\]
If $\mathcal{X}^m$ is compact and $\tilde{g}$ admits a continuous extension to $\mathcal{X}^m$, then the same bound holds with $\mathcal{C} = \mathcal{X}^m$. In view of the positivity condition in (C.0), this is equivalent, up to multiplicative constants, to the corresponding boundedness of $\tilde{f}$. Nevertheless, for the complete-case MAR estimator, the natural quantity is $\tilde{g}$, not $\tilde{f}$, because the deterministic centering is taken with respect to the effective complete-case design density $g = p f$. The uniform conditional moment condition (2.10) in (C.3) should be understood in the same complete-case sense. Namely,
\[
\mathbb{E}\big[ |\varphi(\tilde{\mathbf{Y}})|^{2+\gamma} \,\big|\, \tilde{\mathbf{X}} = \tilde{\mathbf{x}} \big]
\]
is allowed to grow near boundary regions or low-density regions, but only at a rate controlled by the inverse of the effective complete-case density
\[
\tilde{g}(\tilde{\mathbf{x}}) = \prod_{j=1}^{m} p(\mathbf{x}_j) f(\mathbf{x}_j).
\]
Equivalently, the admissible growth is no faster than
\[
\Big\{ \prod_{j=1}^{m} p(\mathbf{x}_j) f(\mathbf{x}_j) \Big\}^{-1}.
\]
Because $p$ is bounded away from zero by (C.0), this condition is equivalent to the corresponding complete-data formulation involving $\tilde{f}$, up to fixed constants. The formulation with $\tilde{g}$, however, is the appropriate one for the estimator actually studied in this paper, since its augmented kernel contains the complete-case factor $\prod_{j=1}^{m} \delta_{i_j}$. Conditions of this type are standard in nonparametric smoothing with possibly unbounded responses; see, for instance, Assumption 2 of [160] and Assumption A3 of [106].
Condition (C.3) is used in the truncation step of the proof. More precisely, it provides a uniform envelope control for the localized U-process after weighting by the effective complete-case density. This is needed to separate the contribution of large values of $\varphi(\tilde{\mathbf{Y}})$ from the main stochastic term and to obtain the stated uniform rates. As in [68,161], the polynomial moment condition in (C.3) may be replaced by a more general Orlicz-type integrability assumption.
(C.3)″ 
Let $M : [0, \infty) \to [0, \infty)$ be a continuous nondecreasing function such that, for some $s > 2$, as $x \uparrow \infty$,
\[
(i)\;\; x^{-s} M(x) \uparrow, \qquad (ii)\;\; x^{-1} \log M(x) \downarrow.
\]
For $t \ge M(0)$, let $M^{\mathrm{inv}}(t) \ge 0$ be defined by
\[
M\big( M^{\mathrm{inv}}(t) \big) = t.
\]
Assume that
\[
\mathbb{E}\, M\big( |\varphi(\tilde{\mathbf{Y}})| \big) < \infty.
\]
When a localized uniform version is required, this assumption is strengthened to
\[
\sup_{\tilde{\mathbf{x}} \in \operatorname{Int}(\mathcal{X}^m)} \mathbb{E}\big[ M\big( |\varphi(\tilde{\mathbf{Y}})| \big) \,\big|\, \tilde{\mathbf{X}} = \tilde{\mathbf{x}} \big]\, \tilde{g}(\tilde{\mathbf{x}}) < \infty.
\]
Two particularly useful choices are $M(x) = x^{p}$, $p > 2$, which recovers a polynomial moment assumption, and $M(x) = \exp(s x)$, $s > 0$, which corresponds to exponential-type integrability. These alternatives lead to the same truncation strategy, with the truncation level calibrated through $M^{\mathrm{inv}}$.

3. Conditional U -Statistic Estimators Based on Dirichlet Kernels

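In this section, we take $\mathcal{X} = S_{d,1}$, where
\[
S_{d,1} := \big\{ \mathbf{x} \in [0,1]^d : \|\mathbf{x}\|_1 \le 1 \big\},
\]
and
\[
\operatorname{Int}(S_{d,1}) = \big\{ \mathbf{x} \in (0,1)^d : \|\mathbf{x}\|_1 < 1 \big\},
\]
where $\|\mathbf{x}\|_1 := \sum_{i=1}^{d} |x_i|$ and $d \in \mathbb{N}^{*}$. Accordingly,
\[
\operatorname{Int}(S_{d,1}^m) = \big\{ \tilde{\mathbf{x}} = (\mathbf{x}_1,\dots,\mathbf{x}_m) \in (S_{d,1})^m : \mathbf{x}_j \in \operatorname{Int}(S_{d,1}),\ 1 \le j \le m \big\}.
\]
For $\alpha_1,\dots,\alpha_d, \beta > 0$, the density of the Dirichlet$(\boldsymbol{\alpha}, \beta)$ distribution with respect to the Lebesgue measure on $\mathbb{R}^d$ restricted to $S_{d,1}$ is
\[
K_{\boldsymbol{\alpha},\beta}(\mathbf{x}) := \frac{\Gamma(\|\boldsymbol{\alpha}\|_1 + \beta)}{\Gamma(\beta) \prod_{i=1}^{d} \Gamma(\alpha_i)} \cdot (1 - \|\mathbf{x}\|_1)^{\beta - 1} \prod_{i=1}^{d} x_i^{\alpha_i - 1}, \qquad \mathbf{x} \in S_{d,1}.
\]
We refer to Chapter 49 of [162,163]. A key feature of the Aitchison-Lauder proposal is that the shape of the kernel $K_{\boldsymbol{\alpha},\beta}(\cdot)$ changes with the position $\mathbf{x}$ within the simplex. This adaptation mitigates the boundary bias that affects conventional estimators, whose kernel has the same shape at every point. Throughout this section, for each $j = 1,\dots,m$, we set
\[
\Lambda_{n,1}(\mathbf{x}_j) = (\boldsymbol{\alpha}_j, \beta_j) := \left( \frac{\mathbf{x}_j}{\breve{b}} + \mathbf{1},\ \frac{1 - \|\mathbf{x}_j\|_1}{\breve{b}} + 1 \right), \qquad \text{for } \mathbf{x}_j \in S_{d,1},\ \breve{b} > 0.
\tag{3.1}
\]
The bandwidth parameter $\breve{b} = \breve{b}(n)$ depends on the sample size $n$. We can now introduce a new conditional U-statistic regression estimator based on the Dirichlet kernel by substituting (3.1) into (2.2), which gives
\[
\tilde{r}^{(m)}_{n,1}\big(\varphi, \tilde{\mathbf{x}}; \bar{\Lambda}_{n,1}(\tilde{\mathbf{x}})\big)
= \frac{\sum_{(i_1,\dots,i_m) \in I(m,n)} \varphi(\mathbf{Y}_{i_1},\dots,\mathbf{Y}_{i_m}) \prod_{j=1}^{m} K_{(\boldsymbol{\alpha}_j,\beta_j)}(\mathbf{X}_{i_j})}
{\sum_{(i_1,\dots,i_m) \in I(m,n)} \prod_{j=1}^{m} K_{(\boldsymbol{\alpha}_j,\beta_j)}(\mathbf{X}_{i_j})}.
\tag{3.2}
\]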
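Under the MAR assumption (2.4), the complete-case estimator extending (3.2) to missing responses is given by
\[
\widehat{r}^{(m)}_{n,1}\big(\varphi, \tilde{\mathbf{x}}; \bar{\Lambda}_{n,1}(\tilde{\mathbf{x}})\big)
= \frac{\sum_{(i_1,\dots,i_m) \in I(m,n)} \varphi(\mathbf{Y}_{i_1},\dots,\mathbf{Y}_{i_m}) \prod_{j=1}^{m} \delta_{i_j} \prod_{j=1}^{m} K_{(\boldsymbol{\alpha}_j,\beta_j)}(\mathbf{X}_{i_j})}
{\sum_{(i_1,\dots,i_m) \in I(m,n)} \prod_{j=1}^{m} \delta_{i_j} \prod_{j=1}^{m} K_{(\boldsymbol{\alpha}_j,\beta_j)}(\mathbf{X}_{i_j})}.
\tag{3.3}
\]
For this specific kernel, the augmented kernel defined in (2.6) becomes
\[
G^{(\mathrm{Dir})}_{\varphi,\tilde{\mathbf{x}},1}(\tilde{\mathbf{t}}, \tilde{\mathbf{y}}, \tilde{\delta})
= \varphi(\tilde{\mathbf{y}}) \prod_{j=1}^{m} \delta_j \prod_{j=1}^{m} K_{(\boldsymbol{\alpha}_j,\beta_j)}(\mathbf{t}_j),
\qquad (\tilde{\mathbf{t}}, \tilde{\mathbf{y}}, \tilde{\delta}) \in S_{d,1}^m \times \mathbb{R}^{qm} \times \{0,1\}^m,
\tag{3.4}
\]
where $(\boldsymbol{\alpha}_j, \beta_j)$ are given by (3.1) with $\mathbf{x}_j$ being the $j$-th component of $\tilde{\mathbf{x}}$. The corresponding U-statistic is
\[
u^{(\mathrm{Dir})}_{n,1}(\varphi, \tilde{\mathbf{x}}) := \frac{(n-m)!}{n!} \sum_{\mathbf{i} \in I(m,n)} G^{(\mathrm{Dir})}_{\varphi,\tilde{\mathbf{x}},1}\big( \tilde{\mathbf{X}}_{\mathbf{i}}, \tilde{\mathbf{Y}}_{\mathbf{i}}, \tilde{\delta}_{\mathbf{i}} \big),
\tag{3.5}
\]
and the ratio representation (2.8) holds analogously:
\[
\widehat{r}^{(m)}_{n,1}\big(\varphi, \tilde{\mathbf{x}}, \bar{\Lambda}_{n,1}(\tilde{\mathbf{x}})\big) = \frac{u^{(\mathrm{Dir})}_{n,1}(\varphi, \tilde{\mathbf{x}})}{u^{(\mathrm{Dir})}_{n,1}(1, \tilde{\mathbf{x}})}.
\]
We first consider the regression case $m = 1$, which serves as a building block for the higher-order analysis. These findings are essential for examining the estimators outlined in (3.2).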
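To make the point-dependent parameterization (3.1) concrete, the following sketch evaluates the Dirichlet kernel at a point of the simplex using only the standard library. The function names `dirichlet_params` and `dirichlet_kernel` are hypothetical, and returning 0 on the boundary of the simplex is a simplification adopted here, not a convention taken from the paper.

```python
from math import lgamma, log, exp

def dirichlet_params(x, b):
    """Map a target point x in the simplex and a bandwidth b > 0 to (alpha, beta)."""
    alpha = [xi / b + 1.0 for xi in x]
    beta = (1.0 - sum(x)) / b + 1.0
    return alpha, beta

def dirichlet_kernel(t, alpha, beta):
    """Dirichlet(alpha, beta) density at t, computed on the log scale for stability."""
    rest = 1.0 - sum(t)
    if rest <= 0.0 or any(ti <= 0.0 for ti in t):
        return 0.0  # simplification: boundary and exterior points are set to 0
    log_norm = (lgamma(sum(alpha) + beta) - lgamma(beta)
                - sum(lgamma(a) for a in alpha))
    log_body = ((beta - 1.0) * log(rest)
                + sum((a - 1.0) * log(ti) for a, ti in zip(alpha, t)))
    return exp(log_norm + log_body)

# d = 1 example: x = 0.3 and b = 0.1 give alpha close to 4 and beta close to 8,
# so the kernel is (up to rounding) the Beta(4, 8) density.
alpha, beta = dirichlet_params([0.3], 0.1)
value = dirichlet_kernel([0.3], alpha, beta)
```

Since the parameters change with the target point, the kernel must be rebuilt for every $\mathbf{x}_j$ (step 2 of Algorithm 1). Smaller values of the bandwidth give larger parameters and hence a kernel that is more concentrated near $\mathbf{x}_j$.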

3.1. Nonparametric Regression Estimation

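Let us consider the following quantities:
\[
\widehat{g}_n(\varphi, \mathbf{x}, \Lambda_{n,1}) := \frac{1}{n} \sum_{i=1}^{n} \varphi(\mathbf{Y}_i)\, K_{(\boldsymbol{\alpha},\beta)}(\mathbf{X}_i),
\]
and
\[
\widehat{f}_n(\mathbf{x}, \Lambda_{n,1}) := \frac{1}{n} \sum_{i=1}^{n} K_{(\boldsymbol{\alpha},\beta)}(\mathbf{X}_i).
\]
In this section, we establish the uniform strong consistency of the regression estimator defined by
\[
\widehat{r}^{(1)}_{n,1}(\varphi, \mathbf{x}) = \frac{\widehat{g}_n(\varphi, \mathbf{x}, \Lambda_{n,1})}{\widehat{f}_n(\mathbf{x}, \Lambda_{n,1})}.
\]
Under the MAR assumption (2.4), the complete-case versions of $\widehat{g}_n$ and $\widehat{f}_n$ incorporating the missingness indicators are defined as
\[
\widehat{g}^{(\mathrm{miss})}_n(\varphi, \mathbf{x}, \Lambda_{n,1}) := \frac{1}{n} \sum_{i=1}^{n} \varphi(\mathbf{Y}_i)\, \delta_i\, K_{(\boldsymbol{\alpha},\beta)}(\mathbf{X}_i),
\]
and
\[
\widehat{f}^{(\mathrm{miss})}_n(\mathbf{x}, \Lambda_{n,1}) := \frac{1}{n} \sum_{i=1}^{n} \delta_i\, K_{(\boldsymbol{\alpha},\beta)}(\mathbf{X}_i).
\]
The corresponding regression estimator for missing responses is then given by
\[
\widehat{r}^{(1),(\mathrm{miss})}_{n,1}(\varphi, \mathbf{x}) = \frac{\widehat{g}^{(\mathrm{miss})}_n(\varphi, \mathbf{x}, \Lambda_{n,1})}{\widehat{f}^{(\mathrm{miss})}_n(\mathbf{x}, \Lambda_{n,1})}.
\]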
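Finally, we represent the expectation of $\widehat{g}^{(\mathrm{miss})}_n(\varphi, \mathbf{x}, \Lambda_{n,1})$ as:
\[
\mathbb{E}\, \widehat{g}^{(\mathrm{miss})}_n(\varphi, \mathbf{x}, \Lambda_{n,1})
= \mathbb{E}\big[ \varphi(\mathbf{Y})\, \delta\, K_{(\boldsymbol{\alpha},\beta)}(\mathbf{X}) \big]
= \int_{S_{d,1}} r^{(1)}(\varphi, \mathbf{u})\, f(\mathbf{u})\, p(\mathbf{u})\, K_{(\boldsymbol{\alpha},\beta)}(\mathbf{u})\, d\mathbf{u},
\]
where $p(\mathbf{u})$ is the propensity score defined in (2.4). Alternatively, notice that if $\boldsymbol{\xi}_{\mathbf{x}} \sim \mathrm{Dirichlet}(\boldsymbol{\alpha}, \beta)$, then we also have the representation
\[
\mathbb{E}\, \widehat{g}^{(\mathrm{miss})}_n(\varphi, \mathbf{x}, \Lambda_{n,1}) = \mathbb{E}\big[ R^{(1)}(\varphi, \boldsymbol{\xi}_{\mathbf{x}})\, p(\boldsymbol{\xi}_{\mathbf{x}}) \big],
\]
where $R^{(1)}(\varphi, \mathbf{x}) = f(\mathbf{x})\, r^{(1)}(\varphi, \mathbf{x})$. To derive uniform consistency results, we use the decomposition
\[
\begin{aligned}
\widehat{r}^{(1),(\mathrm{miss})}_{n,1}(\varphi, \mathbf{x}) - r^{(1)}(\varphi, \mathbf{x})
&= \frac{1}{\widehat{f}^{(\mathrm{miss})}_n(\mathbf{x}, \Lambda_{n,1})} \Big\{ \widehat{g}^{(\mathrm{miss})}_n(\varphi, \mathbf{x}, \Lambda_{n,1}) - \mathbb{E}\big[ \widehat{g}^{(\mathrm{miss})}_n(\varphi, \mathbf{x}, \Lambda_{n,1}) \big] \Big\} \\
&\quad - \frac{\mathbb{E}\big[ \widehat{g}^{(\mathrm{miss})}_n(\varphi, \mathbf{x}, \Lambda_{n,1}) \big]}{\widehat{f}^{(\mathrm{miss})}_n(\mathbf{x}, \Lambda_{n,1})\, \mathbb{E}\big[ \widehat{f}^{(\mathrm{miss})}_n(\mathbf{x}, \Lambda_{n,1}) \big]} \Big\{ \widehat{f}^{(\mathrm{miss})}_n(\mathbf{x}, \Lambda_{n,1}) - \mathbb{E}\big[ \widehat{f}^{(\mathrm{miss})}_n(\mathbf{x}, \Lambda_{n,1}) \big] \Big\} \\
&\quad - \Big\{ \mathbb{E}\big( \varphi(\mathbf{Y}) \mid \mathbf{X} = \mathbf{x} \big) - \frac{\mathbb{E}\big[ \widehat{g}^{(\mathrm{miss})}_n(\varphi, \mathbf{x}, \Lambda_{n,1}) \big]}{\mathbb{E}\big[ \widehat{f}^{(\mathrm{miss})}_n(\mathbf{x}, \Lambda_{n,1}) \big]} \Big\}.
\end{aligned}
\]
For $\delta > 0$, define
\[
S_{d,1}(\delta) := \big\{ \mathbf{x} \in S_{d,1} : 1 - \|\mathbf{x}\|_1 \ge \delta \text{ and } x_i \ge \delta,\ i = 1,\dots,d \big\}.
\]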
To the best of our knowledge, the following result has not been established for Dirichlet-kernel estimators in the presence of missing responses under MAR.
Theorem 1.
Assume that the conditions (C.0), (C.1) and (C.3) hold. Under the MAR assumption (2.4) and the positivity condition inf x S d , 1 ( b ˘ d ) p ( x ) c > 0 , if, in addition, b ˘ d n as n , we have, as n ,
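\[
\sup_{\mathbf{x} \in S_{d,1}(\breve{b} d)} \Big| \widehat{r}^{(1),(\mathrm{miss})}_{n,1}(\varphi, \mathbf{x}) - r^{(1)}(\varphi, \mathbf{x}) \Big|
= O\big( \breve{b}^{1/2} \big) + O\!\left( \frac{|\log \breve{b}|\, (\log n)^{3/2}}{\breve{b}^{\,d+1/2} \sqrt{n}} \right), \quad \text{a.s.},
\]
where $\widehat{r}^{(1),(\mathrm{miss})}_{n,1}(\varphi, \mathbf{x})$ is defined in (3.10). In particular, if $|\log \breve{b}|^2\, \breve{b}^{-2d-1} = o\big( n / (\log n)^3 \big)$ as $n \to \infty$, then
\[
\sup_{\mathbf{x} \in S_{d,1}(\breve{b} d)} \Big| \widehat{r}^{(1),(\mathrm{miss})}_{n,1}(\varphi, \mathbf{x}) - r^{(1)}(\varphi, \mathbf{x}) \Big| \to 0, \quad \text{a.s.}
\]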

3.2. Uniform Convergence of Conditional U-Statistics Under Missing Data

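In this section, we establish uniform almost sure consistency results for conditional U-statistics in the presence of missing responses. Below, we state the uniform consistency of conditional U-statistics when the function $\varphi(\cdot)$ is not necessarily bounded, under the MAR assumption.
Theorem 2.
If Assumptions (C.2) and (C.3) hold, and under the MAR assumption (2.4) with $\inf_{\mathbf{x} \in S_{d,1}^m} p(\mathbf{x}) \ge c > 0$, then, as $n \to \infty$,
\[
\sup_{\tilde{\mathbf{x}} \in S_{d,1}^m} \Big| u^{(\mathrm{Dir})}_{n,1}(\varphi, \tilde{\mathbf{x}}) - \mathbb{E}\, u^{(\mathrm{Dir})}_{n,1}(\varphi, \tilde{\mathbf{x}}) \Big|
= O\!\left( \frac{|\log \breve{b}|^{m}\, (\log n)^{3/2}}{\breve{b}^{\,m(d+1/2)} \sqrt{n}} \right), \quad \text{a.s.},
\]
where $u^{(\mathrm{Dir})}_{n,1}(\varphi, \tilde{\mathbf{x}})$ is defined in (3.5).
Theorem 3.
If Assumptions (C.2) and (C.3) hold, and under the MAR assumption (2.4) with $\inf_{\mathbf{x} \in S_{d,1}^m} p(\mathbf{x}) \ge c > 0$, then, as $n \to \infty$,
\[
\sup_{\tilde{\mathbf{x}} \in S_{d,1}^m} \Big| \widehat{r}^{(m),(\mathrm{miss})}_{n,1}\big(\varphi, \tilde{\mathbf{x}}; \bar{\Lambda}_{n,1}(\tilde{\mathbf{x}})\big) - \widehat{\mathbb{E}}\, \widehat{r}^{(m),(\mathrm{miss})}_{n,1}\big(\varphi, \tilde{\mathbf{x}}; \bar{\Lambda}_{n,1}(\tilde{\mathbf{x}})\big) \Big|
= O\!\left( \frac{|\log \breve{b}|^{m}\, (\log n)^{3/2}}{\breve{b}^{\,m(d+1/2)} \sqrt{n}} \right), \quad \text{a.s.},
\]
where $\widehat{r}^{(m),(\mathrm{miss})}_{n,1}(\cdot)$ is defined in (3.3) and $\widehat{\mathbb{E}}[\cdot]$ is the centering operator defined in (2.9).
Theorem 4.
Assume that (C.0) and (C.2) hold. Under the MAR assumption (2.4), we have
\[
\sup_{\tilde{\mathbf{x}} \in S_{d,1}^m} \Big| r^{(m)}(\varphi, \tilde{\mathbf{x}}) - \widehat{\mathbb{E}}\, \widehat{r}^{(m),(\mathrm{miss})}_{n,1}\big(\varphi, \tilde{\mathbf{x}}; \bar{\Lambda}_{n,1}(\tilde{\mathbf{x}})\big) \Big| = O\big( \breve{b}^{1/2} \big).
\]
Corollary 1.
Under the assumptions of Theorems 3 and 4, together with the MAR assumption (2.4) and the positivity condition, we have, as $n \to \infty$,
\[
\sup_{\tilde{\mathbf{x}} \in S_{d,1}^m} \Big| \widehat{r}^{(m),(\mathrm{miss})}_{n,1}\big(\varphi, \tilde{\mathbf{x}}; \bar{\Lambda}_{n,1}(\tilde{\mathbf{x}})\big) - r^{(m)}(\varphi, \tilde{\mathbf{x}}) \Big|
= O\big( \breve{b}^{1/2} \big) + O\!\left( \frac{|\log \breve{b}|^{m}\, (\log n)^{3/2}}{\breve{b}^{\,m(d+1/2)} \sqrt{n}} \right), \quad \text{a.s.}
\]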

3.3. Limiting Distribution Under Missing Data

Within this section, we establish the central limit theorem for the estimator defined in (3.3) under the MAR assumption. To achieve this, we rely on the following set of assumptions:
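(A.1) 
Let $\tilde{\mathbf{x}} = (\mathbf{x}_1,\dots,\mathbf{x}_m)$ be a point of continuity for each
\[
r_{jl}(\tilde{\mathbf{x}}) =
\begin{cases}
0 & \text{if } \mathbf{x}_j \ne \mathbf{x}_l, \\
E_{j,l}(\tilde{\mathbf{x}}) & \text{if } \mathbf{x}_j = \mathbf{x}_l,
\end{cases}
\]
where
\[
\begin{aligned}
E_{j,l}(\tilde{\mathbf{x}}) = \mathbb{E}\big[ & \varphi(\mathbf{Y}_1,\dots,\mathbf{Y}_{j-1}, \mathbf{Y}, \mathbf{Y}_{j+1},\dots,\mathbf{Y}_m)\, \varphi(\mathbf{Y}_{m+1},\dots,\mathbf{Y}_{m+j-1}, \mathbf{Y}, \mathbf{Y}_{m+j+1},\dots,\mathbf{Y}_{2m}) \\
& \big|\ \mathbf{X}_i = \mathbf{x}_i \text{ for } i \ne j,\ \mathbf{X}_{m+r} = \mathbf{x}_r \text{ for } r \ne l, \text{ and } \mathbf{X} = \mathbf{x}_j = \mathbf{x}_l \big];
\end{aligned}
\]
(A.2) 
The density function $f(\cdot)$ is continuous at each $\mathbf{x}_j$, $1 \le j \le m$, with $f(\mathbf{x}_j) > 0$;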
(A.3) 
r j , l , s ( · , · , · ) is bounded in a neighborhood of ( x ˜ , x ˜ , x ˜ ) S d , 1 3 m , for all 1 j , l , s m , where:
r j , l , s z ˜ m , z ˜ 2 m , z ˜ 3 m = E [ φ Y 1 , , Y j 1 , Y , Y j + 1 , Y m × φ Y m + 1 , , Y m + j 1 , Y , Y m + j + 1 , Y 2 m × φ Y 2 m + 1 , , Y 2 m + j 1 , Y , Y 2 m + j + 1 , Y 3 m X i = z i ; 1 i 3 m , i j , m + 1 , 2 m + s , X = z ] ,
and for 1 s 3 , z ˜ s m = ( z ( s 1 ) m + 1 , , z ( s 1 ) m + j 1 , z , z ( s 1 ) m + j + 1 , , z s m ) ;
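(A.4) 
$r^{(m)}_{1,2}(\cdot, \cdot)$ is bounded in a neighborhood of $(\tilde{\mathbf{x}}, \tilde{\mathbf{x}})$, where
\[
r^{(m)}_{1,2}(\tilde{\mathbf{x}}_1, \tilde{\mathbf{x}}_2) = \mathbb{E}\big[ \varphi(\mathbf{Y}_{i_1},\dots,\mathbf{Y}_{i_m})\, \varphi(\mathbf{Y}_{j_1},\dots,\mathbf{Y}_{j_m}) \,\big|\, (\mathbf{X}_{i_1},\dots,\mathbf{X}_{i_m}) = \tilde{\mathbf{x}}_1,\ (\mathbf{X}_{j_1},\dots,\mathbf{X}_{j_m}) = \tilde{\mathbf{x}}_2 \big];
\]
(A.5) 
Let $r^{(m)}(\varphi, \cdot)$ admit an expansion
\[
r^{(m)}(\varphi, \mathbf{t} + \Delta) = r^{(m)}(\varphi, \mathbf{t}) + \left\{ \frac{\partial}{\partial \mathbf{t}} r^{(m)}(\varphi, \mathbf{t}) \right\}^{\top} \Delta + \frac{1}{2} \Delta^{\top} \left\{ \frac{\partial^2}{\partial \mathbf{t}^2} r^{(m)}(\varphi, \mathbf{t}) \right\} \Delta + o\big( \Delta^{\top} \Delta \big),
\]
as $\Delta \to 0$, for all $\mathbf{t}$ in a neighborhood of $\tilde{\mathbf{x}}$.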
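Below, we write $Z \overset{\mathcal{D}}{=} N(\mu, \sigma^2)$ whenever the random variable $Z$ is Gaussian with mean $\mu$ and variance $\sigma^2$, and $\overset{\mathcal{D}}{\to}$ denotes convergence in distribution. We also denote
\[
U^{(\mathrm{miss})}_{n,1}(\varphi, \tilde{\mathbf{x}}) = \frac{u^{(\mathrm{Dir})}_{n,1}(\varphi, \tilde{\mathbf{x}})}{N^{(\mathrm{miss})}},
\]
where
\[
N^{(\mathrm{miss})} := \prod_{j=1}^{m} \mathbb{E}\big[ \delta\, K_{(\boldsymbol{\alpha},\beta)}(\mathbf{X}) \big] = \prod_{j=1}^{m} \mathbb{E}\big[ p(\mathbf{X})\, K_{(\boldsymbol{\alpha},\beta)}(\mathbf{X}) \big],
\]
by the MAR assumption (2.4). Note that $N^{(\mathrm{miss})}$ incorporates the propensity score $p(\cdot)$. For fixed $\tilde{\mathbf{x}} \in \operatorname{Int}(S_{d,1}^m)$, let
\[
\Sigma = \Sigma(\tilde{\mathbf{x}}) := \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}
\]
denote the asymptotic covariance matrix of the vector
\[
\sqrt{n\, \breve{b}^{d/2}} \begin{pmatrix} U^{(\mathrm{miss})}_{n,1}(\varphi, \tilde{\mathbf{x}}) - \mathbb{E}\, U^{(\mathrm{miss})}_{n,1}(\varphi, \tilde{\mathbf{x}}) \\ U^{(\mathrm{miss})}_{n,1}(1, \tilde{\mathbf{x}}) - \mathbb{E}\, U^{(\mathrm{miss})}_{n,1}(1, \tilde{\mathbf{x}}) \end{pmatrix},
\]
that is,
\[
\begin{aligned}
\sigma_{11} &:= \lim_{n \to \infty} n\, \breve{b}^{d/2}\, \operatorname{Var}\big\{ U^{(\mathrm{miss})}_{n,1}(\varphi, \tilde{\mathbf{x}}) \big\}, \\
\sigma_{12} &:= \lim_{n \to \infty} n\, \breve{b}^{d/2}\, \operatorname{Cov}\big\{ U^{(\mathrm{miss})}_{n,1}(\varphi, \tilde{\mathbf{x}}),\ U^{(\mathrm{miss})}_{n,1}(1, \tilde{\mathbf{x}}) \big\}, \\
\sigma_{22} &:= \lim_{n \to \infty} n\, \breve{b}^{d/2}\, \operatorname{Var}\big\{ U^{(\mathrm{miss})}_{n,1}(1, \tilde{\mathbf{x}}) \big\}.
\end{aligned}
\]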
Theorem 5.
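Under assumptions (A.1)–(A.4), (C.0), and (C.2), the MAR assumption (2.4), the positivity condition
\[
\inf_{\mathbf{x} \in S_{d,1}} p(\mathbf{x}) \ge c > 0,
\]
and if $r^{(m)}(\varphi, \cdot)$ is continuous at $\tilde{\mathbf{x}} \in \operatorname{Int}(S_{d,1}^m)$, then
\[
\sqrt{n\, \breve{b}^{d/2}}\, \Big\{ \widehat{r}^{(m),(\mathrm{miss})}_{n,1}\big(\varphi, \tilde{\mathbf{x}}; \bar{\Lambda}_{n,1}(\tilde{\mathbf{x}})\big) - \widehat{\mathbb{E}}\, \widehat{r}^{(m),(\mathrm{miss})}_{n,1}\big(\varphi, \tilde{\mathbf{x}}; \bar{\Lambda}_{n,1}(\tilde{\mathbf{x}})\big) \Big\} \overset{\mathcal{D}}{\to} N\big( 0, \rho^2_{\mathrm{miss}} \big),
\]
where
\[
\rho^2_{\mathrm{miss}} = \frac{\sigma_{11}}{\big\{ \mathbb{E}\, U^{(\mathrm{miss})}_{n,1}(1, \tilde{\mathbf{x}}) \big\}^2}
- \frac{2\, \mathbb{E}\, U^{(\mathrm{miss})}_{n,1}(\varphi, \tilde{\mathbf{x}})}{\big\{ \mathbb{E}\, U^{(\mathrm{miss})}_{n,1}(1, \tilde{\mathbf{x}}) \big\}^3}\, \sigma_{12}
+ \frac{\big\{ \mathbb{E}\, U^{(\mathrm{miss})}_{n,1}(\varphi, \tilde{\mathbf{x}}) \big\}^2}{\big\{ \mathbb{E}\, U^{(\mathrm{miss})}_{n,1}(1, \tilde{\mathbf{x}}) \big\}^4}\, \sigma_{22}.
\tag{3.19}
\]
Moreover, if
\[
\sqrt{n\, \breve{b}^{d/2}}\, \big\{ U^{(\mathrm{miss})}_{n,1}(1, \tilde{\mathbf{x}}) - 1 \big\} \overset{\mathbb{P}}{\to} 0,
\]
then $\sigma_{12} = 0$ and $\sigma_{22} = 0$, and hence (3.19) reduces to
\[
\rho^2_{\mathrm{miss}} = \sum_{i=1}^{m} \sum_{j=1}^{m} \mathbb{1}\{ \mathbf{x}_i = \mathbf{x}_j \}\, \frac{r_{ij}(\tilde{\mathbf{x}})}{p(\mathbf{x}_i) f(\mathbf{x}_i)} \int K^2_{(\boldsymbol{\alpha}_i,\beta_i)}(\mathbf{u})\, d\mathbf{u}.
\tag{3.20}
\]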
The proof of Theorem 5 is postponed until Section 11.
Remark 4.
The simplified expression (3.20) is valid when the denominator $U^{(\mathrm{miss})}_{n,1}(1, \tilde{\mathbf{x}})$ converges to a constant (i.e., when its asymptotic variance $\sigma_{22} = 0$). This holds under the same regularity conditions because $u^{(\mathrm{Dir})}_{n,1}(1, \tilde{\mathbf{x}})$ converges to a deterministic limit. In the general case where $\sigma_{22} \ne 0$, the full delta-method formula must be used.
The following corollary is more or less straightforward, given Theorem 5.
Corollary 2.
If, in addition to the assumptions of Theorem 5, (A.5) holds, then under the MAR assumption (2.4) and the positivity condition, we have the following bias expansion:
b ˘ d / 2 E U n , 1 ( miss ) ( φ , x ˜ ) r ( m ) ( φ , x ˜ )               = j = 1 m K α , β ( t j ) R ( m ) ( φ , t ˜ ) t d t / f ˜ ( x ˜ )           j = 1 m K α , β ( t j ) t f ˜ ( x ˜ ) t d t r ( m ) ( φ , x ˜ ) f ˜ ( x ˜ )             + b ˘ d / 2 2 j = 1 m K α , β ( t j ) t R ( m ) ( φ , t ˜ ) t d t / f ˜ ( x ˜ )           j = 1 m K α , β ( t j ) t f ˜ ( x ˜ ) t d t r ( m ) ( φ , x ˜ ) f ˜ ( x ˜ ) + o ( 1 ) .
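In particular, if
\[
\sqrt{n}\, \breve{b}^{(d+2)/4} \to 0,
\]
then
\[
\sqrt{n\, \breve{b}^{d/2}}\, \Big\{ \widehat{r}^{(m),(\mathrm{miss})}_{n,1}\big(\varphi, \tilde{\mathbf{x}}; \bar{\Lambda}_{n,1}(\tilde{\mathbf{x}})\big) - r^{(m)}(\varphi, \tilde{\mathbf{x}}) \Big\} \overset{\mathcal{D}}{\to} N\big( 0, \rho^2_{\mathrm{miss}} \big),
\]
where $\rho^2_{\mathrm{miss}}$ is defined in (3.19).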
Remark 5.
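The invariance of the first-order bias expansion with respect to the missingness mechanism under MAR deserves careful scrutiny. Observe that
\[
\mathbb{E}\, U^{(\mathrm{miss})}_{n,1}(\varphi, \tilde{\mathbf{x}}) = \frac{\mathbb{E}\, u^{(\mathrm{Dir})}_{n,1}(\varphi, \tilde{\mathbf{x}})}{\prod_{j=1}^{m} \mathbb{E}\big[ \delta\, K_{(\boldsymbol{\alpha},\beta)}(\mathbf{X}) \big]},
\]
where, by the MAR assumption (2.4) and the law of total expectation,
\[
\mathbb{E}\big[ \delta\, K_{(\boldsymbol{\alpha},\beta)}(\mathbf{X}) \big] = \mathbb{E}\big[ \mathbb{E}[\delta \mid \mathbf{X}]\, K_{(\boldsymbol{\alpha},\beta)}(\mathbf{X}) \big] = \mathbb{E}\big[ p(\mathbf{X})\, K_{(\boldsymbol{\alpha},\beta)}(\mathbf{X}) \big].
\]
Similarly,
\[
\mathbb{E}\, u^{(\mathrm{Dir})}_{n,1}(\varphi, \tilde{\mathbf{x}}) = \mathbb{E}\Big[ \varphi(\tilde{\mathbf{Y}}) \prod_{j=1}^{m} \delta_j\, K_{(\boldsymbol{\alpha},\beta)}(\mathbf{X}_j) \Big] = \mathbb{E}\Big[ \varphi(\tilde{\mathbf{Y}}) \prod_{j=1}^{m} p(\mathbf{X}_j)\, K_{(\boldsymbol{\alpha},\beta)}(\mathbf{X}_j) \Big],
\]
using the conditional independence $\delta_j \perp \mathbf{Y}_j \mid \mathbf{X}_j$ and the product structure of the joint distribution under i.i.d. sampling. Consequently,
\[
\mathbb{E}\, U^{(\mathrm{miss})}_{n,1}(\varphi, \tilde{\mathbf{x}}) = \frac{\mathbb{E}\Big[ \varphi(\tilde{\mathbf{Y}}) \prod_{j=1}^{m} p(\mathbf{X}_j)\, K_{(\boldsymbol{\alpha},\beta)}(\mathbf{X}_j) \Big]}{\prod_{j=1}^{m} \mathbb{E}\big[ p(\mathbf{X})\, K_{(\boldsymbol{\alpha},\beta)}(\mathbf{X}) \big]}.
\]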
At first glance, this expression depends nontrivially on the propensity score p ( · ) . However, a Taylor expansion of the kernel K ( α , β ) ( · ) around x ˜ reveals that, to leading order as b ˘ 0 ,
K ( α , β ) ( t ) = b ˘ d / 2 1 i = 1 d x i 1 / 2 κ t x b ˘ 1 / 2 + o ( b ˘ d / 2 ) ,
where $\kappa$ is a bounded kernel function on $\mathbb{R}^d$ (see [164] for the precise form). Substituting this expansion, the factors $p(\mathbf{X}_j)$ in the numerator and denominator cancel to first order because they are evaluated at the same points $\mathbf{X}_j$ and the kernel concentrates around $\mathbf{x}_j$. More formally, under the smoothness condition (C.2) and the continuity of $p(\cdot)$, we have
\[
\mathbb{E}\big[ p(\mathbf{X}_j)\, K_{(\boldsymbol{\alpha},\beta)}(\mathbf{X}_j) \big] = p(\mathbf{x}_j)\, \mathbb{E}\big[ K_{(\boldsymbol{\alpha},\beta)}(\mathbf{X}_j) \big] + O\big( \breve{b}^{1/2} \big),
\]
and similarly for the numerator. The leading-order terms involving $p(\mathbf{x}_j)$ cancel exactly in the ratio, yielding a bias expansion that coincides with the complete-data case. The residual terms are of order $O(\breve{b}^{1/2})$ and are absorbed into the $o(1)$ term in the expansion. Therefore, the first-order bias is unaffected by the MAR mechanism. However, this cancellation is not exact at finite sample sizes; higher-order terms involving derivatives of $p(\cdot)$ may appear at order $O(\breve{b})$, which are asymptotically negligible relative to the leading bias term $O(\breve{b}^{1/2})$ under the bandwidth condition $\sqrt{n}\, \breve{b}^{(d+2)/4} \to 0$. In contrast, the asymptotic variance is irreducibly inflated by the factor $1/p(\mathbf{x}_i)$, as evidenced in (3.19), reflecting the fundamental loss of information due to missingness. This phenomenon is characteristic of complete-case estimators under MAR: the first-order bias is unchanged, but the variance increases proportionally to the inverse of the propensity score.
Remark 6
([164]). According to Theorem 3.1.15 in [86], the convergence rate for the conventional $d$-dimensional kernel density estimator with independent and identically distributed (i.i.d.) data and bandwidth $h$ is $O(n^{-1/2} h^{-d/2})$. In contrast, under the MAR assumption with the positivity condition, the estimator $\hat{f}_n^{(\mathrm{miss})}(x, \Lambda_{n,1})$ achieves a convergence rate of $O(n^{-1/2} \breve{b}^{-d/4})$. Consequently, the bandwidths of $\hat{f}_n^{(\mathrm{miss})}(x, \Lambda_{n,1})$ and of the traditional multivariate kernel density estimator are related by $\breve{b} \approx h^2$.
Remark 7.
Ouimet and Tolosana-Delgado [164] showed that, for all $x \in \mathrm{Int}(S_{d,1})$ and as $n \to \infty$, the Mean Squared Error (MSE) of the estimator $\hat{f}_n^{(\mathrm{miss})}(x, \Lambda_{n,1})$ with respect to the density function $f(\cdot)$ under the MAR assumption can be expressed as:
\[
\begin{aligned}
\mathrm{MSE}\big[\hat{f}_n^{(\mathrm{miss})}(x, \Lambda_{n,1})\big] &:= \mathbb{E}\Big[\big|\hat{f}_n^{(\mathrm{miss})}(x, \Lambda_{n,1}) - f(x)\big|^2\Big] = \mathrm{Var}\big(\hat{f}_n^{(\mathrm{miss})}(x, \Lambda_{n,1})\big) + \mathrm{Bias}\big[\hat{f}_n^{(\mathrm{miss})}(x, \Lambda_{n,1})\big]^2 \\
&= n^{-1} \breve{b}^{-d/2}\, \frac{\psi(x) f(x)}{p(x)} + \breve{b}^{2} g^{2}(x) + O_x\big(n^{-1} \breve{b}^{-d/2+1/2}\big) + o\big(\breve{b}^{2}\big),
\end{aligned}
\]
where $\psi(\cdot)$ is defined in Equation (A1) in Lemma A4, $p(x)$ is the propensity score defined in (2.4), and
\[
g(x) := \sum_{i=1}^{d} \big(1 - (d+1) x_i\big) \frac{\partial}{\partial x_i} f(x) + \frac{1}{2} \sum_{i,j=1}^{d} x_i \big(\mathbf{1}_{\{i=j\}} - x_j\big) \frac{\partial^2}{\partial x_i \partial x_j} f(x).
\]
In particular, if $f(x) \cdot g(x) \neq 0$, the asymptotically optimal choice of $\breve{b}$ with respect to the MSE is given by:
\[
\breve{b}_{\mathrm{opt}}(x) = n^{-2/(d+4)} \left[\frac{d}{4} \cdot \frac{\psi(x) f(x)}{p(x)\, g^{2}(x)}\right]^{2/(d+4)},
\]
with
\[
\mathrm{MSE}\big[\hat{f}_n^{(\mathrm{miss})}(x, \Lambda_{n,1}); \breve{b}_{\mathrm{opt}}\big] = n^{-4/(d+4)} \left[\frac{1 + \frac{d}{4}}{\big(\frac{d}{4}\big)^{d/(d+4)}}\right] \frac{\big(\psi(x) f(x)/p(x)\big)^{4/(d+4)}}{\big(g^{2}(x)\big)^{-d/(d+4)}} + o_x\big(n^{-4/(d+4)}\big).
\]
Furthermore, if $n^{2/(d+4)} \breve{b} \to \lambda$ for some $\lambda > 0$ as $n \to \infty$, then
\[
\mathrm{MSE}\big[\hat{f}_n^{(\mathrm{miss})}(x, \Lambda_{n,1})\big] = n^{-4/(d+4)} \left[\lambda^{-d/2}\, \frac{\psi(x) f(x)}{p(x)} + \lambda^{2} g^{2}(x)\right] + o_x\big(n^{-4/(d+4)}\big).
\]
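The bandwidth trade-off in Remark 7 can be checked numerically. The following is a minimal sketch, not part of the original analysis: it keeps only the two leading MSE terms (variance constant $\psi(x) f(x)/p(x)$ and squared bias constant $g^2(x)$), ignores all remainder terms, and uses hypothetical function names and hypothetical numerical values for these constants.

```python
import math


def mse_leading(b, n, d, var_const, bias_sq):
    """Leading MSE terms n^{-1} b^{-d/2} V + b^2 g^2 (remainders ignored)."""
    return var_const * b ** (-d / 2) / n + bias_sq * b ** 2


def optimal_bandwidth(n, d, var_const, bias_sq):
    """Closed-form minimizer of mse_leading with respect to b."""
    return n ** (-2 / (d + 4)) * ((d / 4) * var_const / bias_sq) ** (2 / (d + 4))


def optimal_mse(n, d, var_const, bias_sq):
    """Leading MSE evaluated at the optimal bandwidth."""
    const = (1 + d / 4) / (d / 4) ** (d / (d + 4))
    return (n ** (-4 / (d + 4)) * const
            * var_const ** (4 / (d + 4)) * bias_sq ** (d / (d + 4)))


# Hypothetical constants: psi(x) f(x) = 0.8 and g(x)^2 = 1.5.
n, d, psi_f, g_sq = 5000, 2, 0.8, 1.5
for p in (1.0, 0.5):  # p = 1 corresponds to complete data
    V = psi_f / p     # variance constant inflated by 1 / p(x)
    b_opt = optimal_bandwidth(n, d, V, g_sq)
    print(p, b_opt, mse_leading(b_opt, n, d, V, g_sq), optimal_mse(n, d, V, g_sq))
```

In this sketch a smaller propensity score enlarges both the optimal bandwidth and the attainable MSE, which is the qualitative message of the factor $1/p(x)$ in the variance term.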

4. Conditional U-Statistics Using Bernstein Polynomials Under Missing Data

This section studies the asymptotic properties of the conditional U-statistics estimator derived by inverting the Bernstein polynomial estimator of the distribution function in the presence of missing responses under the MAR assumption (2.4). Let $F(\cdot)$ represent any joint cumulative distribution function on $S_{d,1}$, where values outside $S_{d,1}$ are either 0 or 1. Following [129,131], we define the Bernstein polynomial of order $\vartheta$ for $F(\cdot)$ as follows:
\[
F_{\vartheta}(x) = \sum_{k \in \mathbb{N}_0^d \cap \vartheta S_{d,1}} F(k/\vartheta)\, P_{k,\vartheta}(x), \qquad x \in S_{d,1},\ \vartheta \in \mathbb{N},
\]
where the weights are probabilities from the $\mathrm{Multinomial}(\vartheta, x)$ distribution:
\[
P_{k,\vartheta}(x) = \frac{\vartheta!}{(\vartheta - \|k\|_1)!\, \prod_{i=1}^{d} k_i!} \cdot (1 - \|x\|_1)^{\vartheta - \|k\|_1} \prod_{i=1}^{d} x_i^{k_i}, \qquad k \in \mathbb{N}_0^d \cap \vartheta S_{d,1}.
\]
The Bernstein estimator of $F(\cdot)$, denoted by $F_{n,\vartheta}(\cdot)$, is the Bernstein polynomial of order $\vartheta$ for the empirical cumulative distribution function:
\[
F_n(x) := n^{-1} \sum_{i=1}^{n} \mathbf{1}_{(-\infty, x]}(X_i),
\]
where $X_1, \dots, X_n$ are independent and identically distributed according to $F(\cdot)$ and $\mathbf{1}\{A\}$ denotes, as usual, the indicator function of the set $A$. Precisely, we define:
\[
F_{n,\vartheta}(x) = \sum_{k \in \mathbb{N}_0^d \cap \vartheta S_{d,1}} F_n(k/\vartheta)\, P_{k,\vartheta}(x), \qquad x \in S_{d,1},\ \vartheta, n \in \mathbb{N}.
\]
Under the MAR assumption (2.4), the observed data consist of the triplets $\{(X_i, Y_i, \delta_i)\}_{i=1}^{n}$, where $\delta_i$ is the missingness indicator. The complete-case empirical cumulative distribution function, which discards observations with missing responses, is defined as:
\[
F_n^{(\mathrm{miss})}(x) := \frac{1}{n} \sum_{i=1}^{n} \delta_i\, \mathbf{1}_{(-\infty, x]}(X_i),
\]
where the normalization is by $n$ (not by the number of observed cases) to maintain the proper stochastic order. The corresponding Bernstein estimator under missing data is then:
\[
F_{n,\vartheta}^{(\mathrm{miss})}(x) = \sum_{k \in \mathbb{N}_0^d \cap \vartheta S_{d,1}} F_n^{(\mathrm{miss})}(k/\vartheta)\, P_{k,\vartheta}(x), \qquad x \in S_{d,1},\ \vartheta, n \in \mathbb{N}.
\]
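For a density $f(\cdot)$ supported on $S_{d,1}$, define the Bernstein kernel
\[
K_{x,\vartheta}(X_i) := \sum_{k \in \mathbb{N}_0^d \cap (\vartheta - 1) S_{d,1}} \frac{(\vartheta - 1 + d)!}{(\vartheta - 1)!}\, \mathbf{1}_{\left(\frac{k}{\vartheta}, \frac{k+1}{\vartheta}\right]}(X_i)\, P_{k,\vartheta-1}(x).
\]
Then the complete-case Bernstein density estimator under MAR is
\[
\hat{f}_{n,\vartheta}^{(\mathrm{miss})}(x) = \frac{1}{n} \sum_{i=1}^{n} \delta_i\, K_{x,\vartheta}(X_i), \qquad x \in S_{d,1}.
\]
Equivalently,
\[
\hat{f}_{n,\vartheta}^{(\mathrm{miss})}(x) = \sum_{k \in \mathbb{N}_0^d \cap (\vartheta - 1) S_{d,1}} \frac{(\vartheta - 1 + d)!}{(\vartheta - 1)!} \left\{\frac{1}{n} \sum_{i=1}^{n} \delta_i\, \mathbf{1}_{\left(\frac{k}{\vartheta}, \frac{k+1}{\vartheta}\right]}(X_i)\right\} P_{k,\vartheta-1}(x).
\]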
Remark 8
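([129]). The complete-case Bernstein density estimator (4.1) under MAR can also be written as a specific finite mixture of Dirichlet densities:
\[
\hat{f}_{n,\vartheta}^{(\mathrm{miss})}(x) = \sum_{k \in \mathbb{N}_0^d \cap (\vartheta - 1) S_{d,1}} \left\{\frac{1}{n} \sum_{i=1}^{n} \delta_i\, \mathbf{1}_{\left(\frac{k}{\vartheta}, \frac{k+1}{\vartheta}\right]}(X_i)\right\} D_{(k+1,\, \vartheta - \|k\|_1)}(x),
\]
where the density value of the $\mathrm{Dirichlet}(\alpha, \beta)$ distribution at $x \in S_{d,1}$ is given by
\[
D_{(\alpha,\beta)}(x) = \frac{(\beta + \|\alpha\|_1 - 1)!}{(\beta - 1)!\, \prod_{i=1}^{d} (\alpha_i - 1)!} \cdot (1 - \|x\|_1)^{\beta - 1} \prod_{i=1}^{d} x_i^{\alpha_i - 1}, \qquad \alpha_i, \beta > 0.
\]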
For further details, see [129].
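The mixture representation in Remark 8 rests on the identity $\frac{(\vartheta-1+d)!}{(\vartheta-1)!} P_{k,\vartheta-1}(x) = D_{(k+1,\vartheta-\|k\|_1)}(x)$. The sketch below checks this identity numerically in the simplest case $d = 1$, where the multinomial weight is a binomial probability and the Dirichlet density is a beta density. The function names are hypothetical, and the restriction to $d = 1$ is an assumption made only to keep the example short.

```python
import math


def multinomial_weight_1d(k, order, x):
    """P_{k,order}(x) for d = 1: the Binomial(order, x) probability of k."""
    return math.comb(order, k) * x ** k * (1 - x) ** (order - k)


def dirichlet_density_1d(alpha, beta, x):
    """D_{(alpha,beta)}(x) for d = 1 with integer alpha, beta >= 1."""
    const = math.factorial(alpha + beta - 1) / (
        math.factorial(beta - 1) * math.factorial(alpha - 1))
    return const * (1 - x) ** (beta - 1) * x ** (alpha - 1)


def max_identity_gap(theta, grid):
    """Largest absolute difference between both sides of the identity."""
    gap = 0.0
    for k in range(theta):  # k ranges over {0, ..., theta - 1} when d = 1
        for x in grid:
            # For d = 1 the factor (theta - 1 + d)! / (theta - 1)! equals theta.
            lhs = theta * multinomial_weight_1d(k, theta - 1, x)
            rhs = dirichlet_density_1d(k + 1, theta - k, x)
            gap = max(gap, abs(lhs - rhs))
    return gap


print(max_identity_gap(12, [0.1, 0.35, 0.5, 0.9]))
```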
The conditional U-statistic smoothed by Bernstein polynomials under missing data is defined, for each $\tilde{x} \in S_{d,1}^{m}$, by
\[
\hat{r}_{n,2}^{(m),(\mathrm{miss})}\big(\varphi, \tilde{x}; \bar{\Lambda}_{n,2}(\tilde{x})\big) = \frac{\displaystyle\sum_{(i_1,\dots,i_m) \in I(m,n)} \varphi(Y_{i_1}, \dots, Y_{i_m}) \prod_{j=1}^{m} \delta_{i_j} \prod_{j=1}^{m} K_{x_j,\vartheta}(X_{i_j})}{\displaystyle\sum_{(i_1,\dots,i_m) \in I(m,n)} \prod_{j=1}^{m} \delta_{i_j} \prod_{j=1}^{m} K_{x_j,\vartheta}(X_{i_j})}.
\]
In the particular case $m = 1$, $r^{(m)}(\varphi, \tilde{x})$ reduces to $r^{(1)}(\varphi, x) = \mathbb{E}(\varphi(Y) \mid X = x)$, and the complete-case Nadaraya–Watson estimator under MAR becomes
\[
\hat{r}_{n,2}^{(1),(\mathrm{miss})}(\varphi, x) := \frac{\sum_{i=1}^{n} \varphi(Y_i)\, \delta_i\, K_{x,\vartheta}(X_i)}{\sum_{i=1}^{n} \delta_i\, K_{x,\vartheta}(X_i)} = \frac{\hat{g}_{n,\vartheta}^{(\mathrm{miss})}(\varphi, x)}{\hat{f}_{n,\vartheta}^{(\mathrm{miss})}(x)}.
\]
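To make the $m = 1$ estimator concrete, the following minimal sketch implements the Bernstein kernel and the complete-case Nadaraya–Watson ratio for $d = 1$, so that $S_{d,1} = [0,1]$. It is an illustration under stated assumptions rather than the implementation used in the paper: the function names, the simulated regression model, and the propensity score $p(x) = 0.5 + 0.4x$ are all hypothetical, and missing responses are assumed to be stored as NaN.

```python
import math

import numpy as np


def bernstein_kernel_1d(x, theta, X):
    """K_{x,theta}(X_i) for d = 1, evaluated at every entry of X.

    Each X_i in (0, 1] lies in exactly one cell (k/theta, (k+1)/theta], so the
    sum over k collapses to theta times one Binomial(theta - 1, x) weight.
    """
    X = np.asarray(X, dtype=float)
    # Cell index of each X_i; the value X_i = 0 is assigned to the first cell.
    k = np.clip(np.ceil(theta * X).astype(int) - 1, 0, theta - 1)
    weights = np.array([math.comb(theta - 1, j) * x ** j * (1 - x) ** (theta - 1 - j)
                        for j in range(theta)])
    return theta * weights[k]


def complete_case_nw(x, theta, X, Y, delta, phi=lambda y: y):
    """Ratio of sum(phi(Y_i) delta_i K) to sum(delta_i K) over all i."""
    K = bernstein_kernel_1d(x, theta, X)
    delta = np.asarray(delta, dtype=float)
    # Mask missing responses before multiplying, because 0 * NaN is NaN.
    phi_y = np.where(delta == 1, phi(np.asarray(Y, dtype=float)), 0.0)
    return np.sum(phi_y * delta * K) / np.sum(delta * K)


# Hypothetical MAR experiment: missingness depends on X only.
rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(size=n)
Y = np.sin(2 * np.pi * X) + 0.2 * rng.standard_normal(n)
delta = (rng.uniform(size=n) < 0.5 + 0.4 * X).astype(int)
Y_obs = np.where(delta == 1, Y, np.nan)
print(complete_case_nw(0.3, 30, X, Y_obs, delta))
```

The normalization by $n$ in numerator and denominator cancels in the ratio, which is why the sketch works with plain sums.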

4.1. Centering and U-Statistic Representation Under MAR

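For the Bernstein polynomial estimator under missing data, define
\[
G_{\varphi,\tilde{x},2}^{(\mathrm{Bern\text{-}miss})}(\tilde{t}, \tilde{y}, \tilde{\delta}) = \varphi(\tilde{y}) \prod_{j=1}^{m} \delta_j \prod_{j=1}^{m} K_{x_j,\vartheta}(t_j), \qquad (\tilde{t}, \tilde{y}, \tilde{\delta}) \in S_{d,1}^{m} \times \mathbb{R}^{qm} \times \{0,1\}^{m}.
\]
The corresponding U-statistic is
\[
u_{n,2}^{(\mathrm{Bern\text{-}miss})}(\varphi, \tilde{x}) := \frac{(n-m)!}{n!} \sum_{i \in I(m,n)} G_{\varphi,\tilde{x},2}^{(\mathrm{Bern\text{-}miss})}(\tilde{X}_i, \tilde{Y}_i, \tilde{\delta}_i),
\]
and the ratio representation (2.8) holds analogously:
\[
\hat{r}_{n,2}^{(m),(\mathrm{miss})}\big(\varphi, \tilde{x}; \bar{\Lambda}_{n,2}(\tilde{x})\big) = \frac{u_{n,2}^{(\mathrm{Bern\text{-}miss})}(\varphi, \tilde{x})}{u_{n,2}^{(\mathrm{Bern\text{-}miss})}(1, \tilde{x})}.
\]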
The centering operator E ^ [ · ] defined in (2.9) is adapted accordingly.
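The ratio representation can be sketched in code for a general order $m$. The example below is a hedged illustration: the function names and the signature `kernel(x_j, X_i)` are hypothetical, the kernel is the $d = 1$ Bernstein kernel with an arbitrarily chosen degree, and $\varphi(y_1, y_2) = \mathbf{1}\{y_1 < y_2\}$ is used as a Kendall-type example. Because the product of the $\delta_{i_j}$ vanishes for any tuple containing a missing response, the sums are taken over tuples of complete cases only, and the factor $(n-m)!/n!$ cancels in the ratio.

```python
import math
from itertools import permutations


def bernstein_kernel(x, X_i, theta=20):
    """Bernstein kernel K_{x,theta}(X_i) for d = 1 (theta = 20 is arbitrary)."""
    k = min(max(math.ceil(theta * X_i) - 1, 0), theta - 1)
    return theta * math.comb(theta - 1, k) * x ** k * (1 - x) ** (theta - 1 - k)


def conditional_u_statistic(phi, kernel, x_tuple, X, Y, delta):
    """Complete-case ratio u_n(phi, x) / u_n(1, x) for order m = len(x_tuple)."""
    m = len(x_tuple)
    observed = [i for i in range(len(X)) if delta[i] == 1]
    # Kernel weights are only needed for the complete cases.
    W = {(j, i): kernel(x_tuple[j], X[i]) for j in range(m) for i in observed}
    num = den = 0.0
    for idx in permutations(observed, m):  # ordered tuples of distinct indices
        w = 1.0
        for j, i in enumerate(idx):
            w *= W[(j, i)]
        num += phi(*(Y[i] for i in idx)) * w
        den += w
    return num / den


# Hypothetical data; None marks a missing response.
X = [0.12, 0.33, 0.48, 0.61, 0.79, 0.90]
Y = [0.5, None, 1.2, 0.7, None, 2.0]
delta = [1, 0, 1, 1, 0, 1]
kendall = lambda y1, y2: float(y1 < y2)
print(conditional_u_statistic(kendall, bernstein_kernel, (0.4, 0.6), X, Y, delta))
```

The loop visits every ordered $m$-tuple of complete cases, so the cost grows like $n_{\mathrm{obs}}^m$; this brute-force form is meant for small samples only.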

4.2. Nonparametric Regression Estimation Under Missing Data

In this section, we prove the uniform strong consistency of the complete-case regression estimator for m = 1 under the MAR assumption.
Theorem 6.
Assume that the conditions (C.1), (C.2) and (C.3) hold, together with the MAR assumption (2.4) and the positivity condition $\inf_{x \in S_{d,1}} p(x) \ge c > 0$. If $2 \le \vartheta \le n/\log n$, as $n \to \infty$, then
\[
\sup_{x \in S_{d,1}} \Big|\mathbb{E}\big[\hat{r}_{n,2}^{(1),(\mathrm{miss})}(\varphi, x)\big] - r^{(1)}(\varphi, x)\Big| = O\big(\vartheta^{-1/2}\big),
\]
and
\[
\sup_{x \in S_{d,1}} \Big|\hat{r}_{n,2}^{(1),(\mathrm{miss})}(\varphi, x) - \mathbb{E}\big[\hat{r}_{n,2}^{(1),(\mathrm{miss})}(\varphi, x)\big]\Big| = O\big(\vartheta^{d-1/2} (n^{-1} \log n)^{1/2}\big), \quad \text{a.s.}
\]
The bias bound remains identical to the complete-data case, while the stochastic bound is unaffected because the missingness indicators δ i are absorbed into the kernel and the uniform convergence rate is preserved under the positivity condition.
Corollary 3.
Assume that the conditions of Theorem 6 hold. Then, as $n \to \infty$, we have
\[
\sup_{x \in S_{d,1}} \Big|\hat{r}_{n,2}^{(1),(\mathrm{miss})}(\varphi, x) - r^{(1)}(\varphi, x)\Big| = O\big(\vartheta^{d-1/2} (n^{-1} \log n)^{1/2}\big) + O\big(\vartheta^{-1/2}\big), \quad \text{a.s.}
\]
In particular, if $\vartheta^{2d-1} = o(n/\log n)$, then
\[
\sup_{x \in S_{d,1}} \Big|\hat{r}_{n,2}^{(1),(\mathrm{miss})}(\varphi, x) - r^{(1)}(\varphi, x)\Big| \to 0, \quad \text{a.s.}
\]

4.3. Conditional U-Statistics Under Missing Data

In this section, we study the uniform strong consistency of the conditional U-statistic estimators using Bernstein polynomials in the presence of missing responses under MAR.
Theorem 7.
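Assume that the conditions (C.2) and (C.3) hold, together with the MAR assumption (2.4) and the positivity condition $\inf_{x \in S_{d,1}^{m}} p(x) \ge c > 0$. If $2 \le \vartheta \le n/\log n$, as $n \to \infty$, then
\[
\sup_{\tilde{x} \in S_{d,1}^{m}} \Big|u_{n,2}^{(\mathrm{Bern\text{-}miss})}(\varphi, \tilde{x}) - \mathbb{E}\big[u_{n,2}^{(\mathrm{Bern\text{-}miss})}(\varphi, \tilde{x})\big]\Big| = O\big(\vartheta^{m(d-1/2)} (n^{-1} \log n)^{1/2}\big), \quad \text{a.s.}
\]
Theorem 8.
Assume that the conditions (C.2) and (C.3) hold, together with the MAR assumption (2.4) and the positivity condition $\inf_{x \in S_{d,1}^{m}} p(x) \ge c > 0$. If $2 \le \vartheta \le n/\log n$, as $n \to \infty$, then
\[
\sup_{\tilde{x} \in S_{d,1}^{m}} \Big|\hat{r}_{n,2}^{(m),(\mathrm{miss})}\big(\varphi, \tilde{x}; \bar{\Lambda}_{n,2}(\tilde{x})\big) - \hat{\mathbb{E}}\Big[\hat{r}_{n,2}^{(m),(\mathrm{miss})}\big(\varphi, \tilde{x}; \bar{\Lambda}_{n,2}(\tilde{x})\big)\Big]\Big| = O\big(\vartheta^{m(d-1/2)} (n^{-1} \log n)^{1/2}\big), \quad \text{a.s.}
\]
Theorem 9.
Assume that (C.0), (C.1) and (C.2) hold, together with the MAR assumption (2.4). If $2 \le \vartheta \le n/\log n$, then, as $n \to \infty$,
\[
\sup_{\tilde{x} \in S_{d,1}^{m}} \Big|r^{(m)}(\varphi, \tilde{x}) - \hat{\mathbb{E}}\Big[\hat{r}_{n,2}^{(m),(\mathrm{miss})}\big(\varphi, \tilde{x}; \bar{\Lambda}_{n,2}(\tilde{x})\big)\Big]\Big| = O\big(\vartheta^{-m/2}\big).
\]
Corollary 4.
Under the assumptions of Theorems 8 and 9, together with the MAR assumption (2.4) and the positivity condition, as $n \to \infty$, we have
\[
\sup_{\tilde{x} \in S_{d,1}^{m}} \Big|\hat{r}_{n,2}^{(m),(\mathrm{miss})}\big(\varphi, \tilde{x}; \bar{\Lambda}_{n,2}(\tilde{x})\big) - r^{(m)}(\varphi, \tilde{x})\Big| = O\big(\vartheta^{-m/2}\big) + O\big(\vartheta^{m(d-1/2)} (n^{-1} \log n)^{1/2}\big), \quad \text{a.s.}
\]
In particular, if $\vartheta^{2d-1} = o(n/\log n)$, then
\[
\sup_{\tilde{x} \in S_{d,1}^{m}} \Big|\hat{r}_{n,2}^{(m),(\mathrm{miss})}\big(\varphi, \tilde{x}; \bar{\Lambda}_{n,2}(\tilde{x})\big) - r^{(m)}(\varphi, \tilde{x})\Big| \to 0, \quad \text{a.s.}
\]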
Remark 9
([129]). As in Remark 6, the convergence rate for the conventional $d$-dimensional kernel density estimator with independent and identically distributed (i.i.d.) data and bandwidth $h$ is $O(n^{-1/2} h^{-d/2})$. Under the MAR assumption with the positivity condition, the complete-case estimator $\hat{f}_n^{(\mathrm{miss})}(x, \vartheta)$ achieves a convergence rate of $O(n^{-1/2} \vartheta^{d/4})$. Consequently, the smoothing parameters of $\hat{f}_n^{(\mathrm{miss})}(x, \vartheta)$ and of the traditional multivariate kernel density estimator are related by $\vartheta \approx h^{-2}$.
Remark 10.
Ouimet [129] showed that the Mean Squared Error (MSE) of the complete-case density estimator $\hat{f}_n^{(\mathrm{miss})}(x, \vartheta)$ under MAR satisfies, for all $x \in \mathrm{Int}(S_{d,1})$ and as $n \to \infty$:
\[
\mathrm{MSE}\big[\hat{f}_n^{(\mathrm{miss})}(x, \vartheta)\big] = n^{-1} \vartheta^{d/2}\, \frac{\psi(x) f(x)}{p(x)} + \vartheta^{-2} b^{2}(x) + o_x\big(n^{-1} \vartheta^{d/2}\big) + o\big(\vartheta^{-2}\big),
\]
where
\[
b(x) := \frac{d(d-1)}{2} f(x) + \sum_{i=1}^{d} \Big(\frac{1}{2} - x_i\Big) \frac{\partial}{\partial x_i} f(x) + \frac{1}{2} \sum_{i,j=1}^{d} \big(x_i \mathbf{1}_{\{i=j\}} - x_i x_j\big) \frac{\partial^2}{\partial x_i \partial x_j} f(x),
\]
and
\[
\psi(x) := \Big[(4\pi)^{d} (1 - \|x\|_1) \prod_{i=1}^{d} x_i\Big]^{-1/2}.
\]
The factor $1/p(x)$ in the variance term reflects the inflation due to missingness, analogous to the Dirichlet kernel case. In particular, when $f(x) \cdot b(x) \neq 0$, the asymptotically optimal choice for $\vartheta$, minimizing the MSE, is:
\[
\vartheta_{\mathrm{opt}}^{(\mathrm{miss})}(x) = n^{2/(d+4)} \left[\frac{4}{d} \cdot \frac{b^{2}(x)\, p(x)}{\psi(x) f(x)}\right]^{2/(d+4)},
\]
with corresponding MSE:
\[
\mathrm{MSE}\big[\hat{f}_n^{(\mathrm{miss})}(x, \vartheta); \vartheta_{\mathrm{opt}}^{(\mathrm{miss})}\big] = n^{-4/(d+4)} \left[\frac{\frac{4}{d} + 1}{\big(\frac{4}{d}\big)^{4/(d+4)}}\right] \frac{\big(\psi(x) f(x)/p(x)\big)^{4/(d+4)}}{\big(b^{2}(x)\big)^{-d/(d+4)}} + o_x\big(n^{-4/(d+4)}\big).
\]
Moreover, in the more general case where $n^{2/(d+4)} \vartheta^{-1} \to \lambda > 0$ as $n \to \infty$, the MSE becomes:
\[
\mathrm{MSE}\big[\hat{f}_n^{(\mathrm{miss})}(x, \vartheta)\big] = n^{-4/(d+4)} \left[\lambda^{-d/2}\, \frac{\psi(x) f(x)}{p(x)} + \lambda^{2} b^{2}(x)\right] + o_x\big(n^{-4/(d+4)}\big).
\]
The presence of $p(x)$ in the denominator of the variance term indicates that missing data increase the MSE, and the optimal smoothing parameter $\vartheta_{\mathrm{opt}}^{(\mathrm{miss})}(x)$ depends on the propensity score, unlike in the complete-data case.

5. Conditional U-Statistics Estimators Using Beta Kernels Under Missing Data

Throughout this section, it is assumed, as in [165] and without loss of generality, that the compact set is the $d$-dimensional unit hypercube $[0,1]^d$. Among all asymmetric kernels, we focus on the beta kernel of [98]. The kernel takes the form
\[
K_{\breve{\alpha},\breve{\beta}}(u) = \frac{u^{x/b} (1-u)^{(1-x)/b}}{B\{x/b + 1, (1-x)/b + 1\}}\, \mathbf{1}_{[0,1]}(u),
\]
where
\[
\breve{\alpha} := \frac{x}{b} + 1 \quad \text{and} \quad \breve{\beta} := \frac{1-x}{b} + 1, \qquad x \in [0,1],\ b > 0,
\]
and $B(\breve{\alpha}, \breve{\beta}) = \int_0^1 y^{\breve{\alpha}-1} (1-y)^{\breve{\beta}-1}\, dy$ for $\breve{\alpha}, \breve{\beta} > 0$ is the beta function. To cope with multivariate problems, we construct a tensor-product kernel for $\breve{\boldsymbol{\alpha}} = (\breve{\alpha}_1, \dots, \breve{\alpha}_d)$ and $\breve{\boldsymbol{\beta}} = (\breve{\beta}_1, \dots, \breve{\beta}_d)$:
\[
K_{\breve{\boldsymbol{\alpha}},\breve{\boldsymbol{\beta}}}(u) = \prod_{i=1}^{d} K_{\breve{\alpha}_i,\breve{\beta}_i}(u_i) = \prod_{i=1}^{d} \frac{u_i^{x_i/b_i} (1-u_i)^{(1-x_i)/b_i}}{B\{x_i/b_i + 1, (1-x_i)/b_i + 1\}}\, \mathbf{1}\{u_i \in [0,1]\},
\]
where $u := (u_1, \dots, u_d) \in [0,1]^d$, $x := (x_1, \dots, x_d) \in [0,1]^d$ and $b := (b_1, \dots, b_d) \in \mathbb{R}_+^d$ are the $d$-dimensional vectors of data points, design points, and smoothing parameters, respectively. Under the MAR assumption (2.4) and the positivity condition $\inf_{x \in S_X} p(x) \ge c > 0$, the complete-case estimator for missing responses is given by
\[
\hat{r}_{n,3}^{(m),(\mathrm{miss})}\big(\varphi, \tilde{x}; \bar{\Lambda}_{n,3}(\tilde{x})\big) = \frac{\displaystyle\sum_{(i_1,\dots,i_m) \in I(m,n)} \varphi(Y_{i_1}, \dots, Y_{i_m}) \prod_{j=1}^{m} \delta_{i_j}\, K_{\breve{\alpha}_1,\breve{\beta}_1}(X_{i_1}) \cdots K_{\breve{\alpha}_m,\breve{\beta}_m}(X_{i_m})}{\displaystyle\sum_{(i_1,\dots,i_m) \in I(m,n)} \prod_{j=1}^{m} \delta_{i_j}\, K_{\breve{\alpha}_1,\breve{\beta}_1}(X_{i_1}) \cdots K_{\breve{\alpha}_m,\breve{\beta}_m}(X_{i_m})},
\]
where
\[
\breve{\alpha}_j := \frac{x_j}{\breve{b}_j} + 1 \quad \text{and} \quad \breve{\beta}_j := \frac{1 - x_j}{\breve{b}_j} + 1.
\]
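For this specific kernel, the augmented kernel defined in (2.6) becomes
\[
G_{\varphi,\tilde{x},3}^{(\mathrm{Beta})}(\tilde{t}, \tilde{y}, \tilde{\delta}) = \varphi(\tilde{y}) \prod_{j=1}^{m} \delta_j \prod_{j=1}^{m} K_{\breve{\alpha}_j,\breve{\beta}_j}(t_j), \qquad (\tilde{t}, \tilde{y}, \tilde{\delta}) \in [0,1]^{dm} \times \mathbb{R}^{qm} \times \{0,1\}^{m},
\]
where $(\breve{\alpha}_j, \breve{\beta}_j)$ are defined above. The corresponding U-statistic is
\[
u_{n,3}^{(\mathrm{Beta})}(\varphi, \tilde{x}) := \frac{(n-m)!}{n!} \sum_{i \in I(m,n)} G_{\varphi,\tilde{x},3}^{(\mathrm{Beta})}(\tilde{X}_i, \tilde{Y}_i, \tilde{\delta}_i),
\]
and the ratio representation (2.8) holds analogously:
\[
\hat{r}_{n,3}^{(m),(\mathrm{miss})}\big(\varphi, \tilde{x}; \bar{\Lambda}_{n,3}(\tilde{x})\big) = \frac{u_{n,3}^{(\mathrm{Beta})}(\varphi, \tilde{x})}{u_{n,3}^{(\mathrm{Beta})}(1, \tilde{x})}.
\]
In the particular case $m = 1$, the Nadaraya–Watson estimator of $r^{(1)}(\varphi, \tilde{x})$ of [165] under missing data is given by:
\[
\hat{r}_{n}^{(1),(\mathrm{miss})}(\varphi, x) := \frac{\sum_{i=1}^{n} \varphi(Y_i)\, \delta_i\, K_{\breve{\alpha},\breve{\beta}}(X_i)}{\sum_{i=1}^{n} \delta_i\, K_{\breve{\alpha},\breve{\beta}}(X_i)}.
\]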
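The beta-kernel construction for $m = 1$ can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names are hypothetical, covariates are assumed to be stored as an array of shape $(n, d)$, missing responses are assumed to be stored as NaN, and the kernel is evaluated on the log scale (with endpoints clipped slightly into the interior) purely for numerical stability at small bandwidths.

```python
import math

import numpy as np


def beta_kernel(u, x, b):
    """Univariate beta kernel with parameters x/b + 1 and (1 - x)/b + 1."""
    u = np.asarray(u, dtype=float)
    a, c = x / b + 1.0, (1.0 - x) / b + 1.0
    log_B = math.lgamma(a) + math.lgamma(c) - math.lgamma(a + c)  # log beta function
    inside = (u >= 0.0) & (u <= 1.0)
    uc = np.clip(u, 1e-12, 1.0 - 1e-12)  # keeps the logarithms finite at 0 and 1
    log_k = (a - 1.0) * np.log(uc) + (c - 1.0) * np.log1p(-uc) - log_B
    return np.where(inside, np.exp(log_k), 0.0)


def product_beta_kernel(U, x, b):
    """Tensor-product kernel; U has shape (n, d), x and b have length d."""
    U = np.asarray(U, dtype=float)
    U = U.reshape(U.shape[0], -1)
    out = np.ones(U.shape[0])
    for i in range(U.shape[1]):
        out *= beta_kernel(U[:, i], x[i], b[i])
    return out


def beta_nw_complete_case(x, b, X, Y, delta, phi=lambda y: y):
    """Complete-case Nadaraya-Watson ratio smoothed by the product beta kernel."""
    K = product_beta_kernel(X, x, b)
    delta = np.asarray(delta, dtype=float)
    # Mask missing responses before multiplying, because 0 * NaN is NaN.
    phi_y = np.where(delta == 1, phi(np.asarray(Y, dtype=float)), 0.0)
    return np.sum(phi_y * delta * K) / np.sum(delta * K)


# Hypothetical data with d = 2; the second response is missing.
X = np.array([[0.2, 0.3], [0.5, 0.5], [0.8, 0.6]])
Y = np.array([1.0, np.nan, 3.0])
delta = np.array([1, 0, 1])
print(beta_nw_complete_case([0.4, 0.4], [0.1, 0.1], X, Y, delta))
```

Because the kernel shape depends on the design point $x$ through both parameters, the weights must be recomputed for every evaluation point, in contrast with a translation-invariant kernel.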

5.1. Conditions and Comments Under MAR

Our analysis begins by establishing weak uniform consistency, with rates, of the sample average estimator (5.1) for (2.1) on a $dm$-dimensional hyperrectangle $S_X^m$, where
\[
S_X = S_X(\eta) := \prod_{j=1}^{d} [\eta_j, 1 - \eta_j] \subset [0,1]^d,
\]
and the boundary parameters $\eta := (\eta_1, \dots, \eta_d)$ are either fixed or shrink to zero at a suitable rate. To deliver the results under the MAR assumption, we impose the following conditions.
(C.4) 
For b j : = b j ( n ) = b j 1 , , b j d > 0 and η j : = η j ( n ) = η j 1 , , η j d > 0 , j = 1 , , m , satisfying for i = 1 , , d , b j i , η j i 0 , b j i η j i 0 and
log n n j = 1 m i = 1 d b j i η j i 0 a s n .
(C.5) 
(Positivity under MAR) The propensity score p ( · ) satisfies inf x S X p ( x ) c > 0 for some constant c, ensuring that the denominator of (5.1) does not degenerate asymptotically.
The conditions on $\eta_{ji}$ in Assumption (C.4) are intended for the case of an expanding set. In particular, the condition $b_{ji}/\eta_{ji} \to 0$ means that the boundary parameter $\eta_{ji}$ must shrink to zero at a slower rate than $b_{ji}$; this is crucial for Stirling’s approximation to the gamma function. This condition was used by [165] in the proof of the convergence results that we extend to the missing-data setting. Condition (C.5) is standard in the missing data literature and guarantees that the complete-case estimator is well-defined asymptotically.

5.2. Weak Uniform Convergence of Conditional U-Statistics Under MAR

In the following theorem, we state the weak uniform convergence of conditional U-statistics under the MAR assumption. In the particular case of m = 1 , this reduces to the results obtained in [165] extended to missing data.
Theorem 10.
If Assumptions (C.2)–(C.5) hold, then, as $n \to \infty$, we have
sup x ˜ S X m u n , 3 ( Beta ) ( φ , x ˜ ) E u n , 3 ( Beta ) ( φ , x ˜ ) = O P ( log n / n ) j = 1 m i = 1 d b j i η j i .
Theorem 11.
If Assumptions (C.2)–(C.5) hold, then, as $n \to \infty$, we have
sup x ˜ S X m r ^ n , 3 ( m ) , ( miss ) ( φ , x ˜ ; Λ ¯ n , 3 ( x ˜ ) ) E ^ r ^ n , 3 ( m ) , ( miss ) ( φ , x ˜ ; Λ ¯ n , 3 ( x ˜ ) ) = O P ( log n / n ) j = 1 m i = 1 d b j i η j i ,
where E ^ [ · ] is the centering operator defined in (2.9).
Theorem 12.
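Assume that (C.0) and (C.2) hold. Under the MAR assumption (2.4) and the positivity condition $\inf_{x \in S_X} p(x) \ge c > 0$, we have, as $n \to \infty$,
\[
\sup_{\tilde{x} \in S_X^m} \Big|r^{(m)}(\varphi, \tilde{x}) - \hat{\mathbb{E}}\Big[\hat{r}_{n,3}^{(m),(\mathrm{miss})}\big(\varphi, \tilde{x}; \bar{\Lambda}_{n,3}(\tilde{x})\big)\Big]\Big| = O\Big(\sum_{j=1}^{m} \sum_{i=1}^{d} b_{ji}\Big).
\]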
Remark 11.
Under MAR, the complete-case ratio estimator is locally weighted by the product p ( x ) f ( x ) . Hence the first-order bias analysis is not purely a matter of smoothness of f and R ( φ , · ) ; one also needs regularity of the propensity score p ( · ) . In particular, the cancellation of the propensity score in the ratio holds only asymptotically and relies on continuity (and, for higher-order expansions, differentiability) of p ( · ) .
Corollary 5.
Under the assumptions of Theorems 11 and 12, as n , we have
sup x ˜ S X m r ^ n , 3 ( m ) , ( miss ) ( φ , x ˜ ; Λ ¯ n , 3 ( x ˜ ) ) r ( m ) ( φ , x ˜ ) = O P j = 1 m i = 1 d b j i + ( log n / n ) j = 1 m i = 1 d b j i η j i .

5.3. Strong Uniform Convergence of Conditional U-Statistics Under MAR

In this section, we establish the strong uniform consistency, together with explicit convergence rates, of $\hat{r}_{n,3}^{(m),(\mathrm{miss})}\big(\varphi, \tilde{x}; \bar{\Lambda}_{n,3}(\tilde{x})\big)$ under the missing-at-random (MAR) mechanism. To this end, we first strengthen the assumptions imposed on the smoothing parameters.
(C.4’.) 
For b j : = b j ( n ) = b j 1 , , b j d > 0 and η j : = η j ( n ) = η j 1 , , η j d > 0 , j = 1 , , m , satisfying for i = 1 , , d , b j i , η j i 0 , b j i η j i 0 and
log n n j = 1 m i = 1 d b j i η j i j = 1 m i = 1 d 1 b j i 2 1 κ = O ( 1 ) ,
for some constant κ [ 0 , 1 ) , as n .
(C.5’.) 
(Strong positivity under MAR) The propensity score p ( · ) satisfies inf x S X p ( x ) c > 0 and is continuous on S X , ensuring almost sure convergence of the denominator.
The condition (5.8) is stronger than log n / n j = 1 m i = 1 d b j i η j i 0 in Assumption (C.4) in that the former implies the latter. Under this condition, the statement in Corollary 5 can be strengthened to almost sure convergence. The following theorems generalize the results of [165] that are given for m = 1 to the missing data setting.
Theorem 13.
If Assumptions (C.2)–(C.3) and (C.4’.) and (C.5’.) hold, then, as n , we have
sup x ˜ S X m u n , 3 ( Beta ) ( φ , x ˜ ) E u n , 3 ( Beta ) ( φ , x ˜ ) = O ( log n / n ) j = 1 m i = 1 d b j i η j i , a . s .
Theorem 14.
If Assumptions (C.2)–(C.3) and (C.4’.) and (C.5’.) hold, then, as n , we have
sup x ˜ S X m r ^ n , 3 ( m ) , ( miss ) ( φ , x ˜ ; Λ ¯ n , 3 ( x ˜ ) ) E ^ r ^ n , 3 ( m ) , ( miss ) ( φ , x ˜ ; Λ ¯ n , 3 ( x ˜ ) ) = O ( log n / n ) j = 1 m i = 1 d b j i η j i , a . s .
Theorem 15.
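If Assumption (C.2) holds, and under the MAR assumption (2.4) with condition (C.5’.), then we have
\[
\sup_{\tilde{x} \in S_X^m} \Big|r^{(m)}(\varphi, \tilde{x}) - \hat{\mathbb{E}}\Big[\hat{r}_{n,3}^{(m),(\mathrm{miss})}\big(\varphi, \tilde{x}; \bar{\Lambda}_{n,3}(\tilde{x})\big)\Big]\Big| = O\Big(\sum_{j=1}^{m} \sum_{i=1}^{d} b_{ji}\Big).
\]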
Corollary 6.
Under the assumptions of Theorem 14 and (5.11), as n , we have
sup x ˜ S X m r ^ n , 3 ( m ) , ( miss ) ( φ , x ˜ ; Λ ¯ n , 3 ( x ˜ ) ) r ( m ) ( φ , x ˜ ) = O j = 1 m i = 1 d b j i + ( log n / n ) j = 1 m i = 1 d b j i η j i , a . s .
Remark 12.
The convergence rates established in Corollaries 5 and 6 are identical to those in the complete-data case, provided the positivity condition (C.5) holds. This is because the leading terms in the asymptotic expansion are governed by the kernel and the bandwidth parameters, while the propensity score $p(\cdot)$ affects only the constants in the asymptotic variance (through the factor $1/p(x_i)$) but not the rates. However, in finite samples, missingness inflates the variance, and the effective sample size for the complete-case estimator is approximately $n \cdot \inf_x p(x)$, which must be accounted for in practical implementations.

5.4. Conditional U-Statistics Estimators Using Mixed Categorical and Continuous Data Under MAR

We now describe the treatment of a discrete random variable $Z$ taking $c \ge 2$ distinct values $\{0, 1, \dots, c-1\}$; see [166,167,168,169,170]. The variable is classified as either unordered or ordered, because the kernels used for the two types differ slightly. For an unordered variable, the univariate discrete kernel takes the form:
\[
l(v; z, \lambda) = \begin{cases} 1 - \lambda, & \text{if } v = z, \\ \lambda/(c-1), & \text{if } v \neq z. \end{cases}
\]
Here, $v$ represents the data point, $z$ denotes the design point, and $\lambda \in (0,1)$ denotes the bandwidth. For an ordered variable, the univariate discrete kernel is given by:
\[
\ell(v; z, \lambda) = \binom{c}{|v - z|} (1 - \lambda)^{c - |v - z|} \lambda^{|v - z|}.
\]
When $q_1$ ($\le q$) out of $q$ discrete variables are unordered, the product discrete kernel becomes:
\[
L(v; z, \lambda) = \prod_{k=1}^{q_1} l(v_k; z_k, \lambda_k) \prod_{k=q_1+1}^{q} \ell(v_k; z_k, \lambda_k).
\]
Here, $v := (v_1, \dots, v_q)$, $z := (z_1, \dots, z_q)$, and $\lambda := (\lambda_1, \dots, \lambda_q)$. Combining this with the product beta kernel $K_{\breve{\alpha},\breve{\beta}}(u)$ yields the product kernel for mixed categorical and continuous data:
\[
W(u, v; x, z, b, \lambda) = K_{\breve{\alpha},\breve{\beta}}(u)\, L(v; z, \lambda).
\]
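The discrete kernels above translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and the number of categories is passed per coordinate as a sequence `c`, which is an assumption consistent with the coordinate-wise cardinalities $c_k$ used for the support $S_Z$.

```python
import math


def unordered_kernel(v, z, lam, c):
    """l(v; z, lambda): weight 1 - lambda on a match, lambda / (c - 1) otherwise."""
    return 1.0 - lam if v == z else lam / (c - 1)


def ordered_kernel(v, z, lam, c):
    """Ordered kernel: binomial-type weight decaying in the distance |v - z|."""
    dist = abs(v - z)
    return math.comb(c, dist) * (1.0 - lam) ** (c - dist) * lam ** dist


def product_discrete_kernel(v, z, lam, c, q1):
    """L(v; z, lambda): the first q1 coordinates are unordered, the rest ordered."""
    out = 1.0
    for k in range(len(v)):
        kern = unordered_kernel if k < q1 else ordered_kernel
        out *= kern(v[k], z[k], lam[k], c[k])
    return out


# One unordered variable with 3 categories and one ordered variable with 4.
print(product_discrete_kernel((1, 2), (1, 2), (0.1, 0.2), (3, 4), q1=1))
print(product_discrete_kernel((0, 3), (1, 2), (0.1, 0.2), (3, 4), q1=1))
```

As $\lambda_k \to 0$ both kernels concentrate on the cell $v_k = z_k$, which is the discrete counterpart of the bandwidth shrinking in the continuous part of $W$.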
We now incorporate the missing data mechanism. Let $\delta_i$ be the missingness indicator defined in (2.4), with $\delta_i = 1$ if $Y_i$ is observed and $\delta_i = 0$ otherwise. Under the MAR assumption, we have $\mathbb{P}(\delta_i = 1 \mid X_i, Z_i, Y_i) = \mathbb{P}(\delta_i = 1 \mid X_i, Z_i) =: p(X_i, Z_i)$, where $p : \mathcal{X} \times S_Z \to [0,1]$ is the propensity score, assumed continuous and bounded away from zero on the support. For consistency, we require the positivity condition $\inf_{(x,z) \in S} p(x,z) \ge c > 0$ for some compact $S \subset \mathcal{X} \times S_Z$. Given this kernel and $n$ i.i.d. observations $\{(Y_i, X_i, Z_i, \delta_i)\}_{i=1}^{n} \in \mathbb{R} \times [0,1]^d \times S_Z \times \{0,1\}$, where $S_Z := \prod_{k=1}^{q} \{0, 1, \dots, c_k - 1\}$, we turn to a regression estimator of the conditional mean:
\[
r^{(m)}(\varphi, \tilde{x}, \tilde{z}) = \mathbb{E}\big[\varphi(Y_1, \dots, Y_m) \mid (X_1, \dots, X_m) = \tilde{x},\ (Z_1, \dots, Z_m) = \tilde{z}\big].
\]
The complete-case estimator under MAR, denoted by $\hat{r}_{n,3}^{(m),(\mathrm{miss})}\big(\varphi, \tilde{x}, \tilde{z}; \bar{\Lambda}_{n,3}(\tilde{x})\big)$, is expressed as:
\[
\hat{r}_{n,3}^{(m),(\mathrm{miss})}\big(\varphi, \tilde{x}, \tilde{z}; \bar{\Lambda}_{n,3}(\tilde{x})\big) = \frac{\displaystyle\sum_{(i_1,\dots,i_m) \in I(m,n)} \varphi(Y_{i_1}, \dots, Y_{i_m}) \prod_{j=1}^{m} \delta_{i_j} \prod_{j=1}^{m} W\big(X_{i_j}, Z_{i_j}; x_j, z_j, b, \lambda\big)}{\displaystyle\sum_{(i_1,\dots,i_m) \in I(m,n)} \prod_{j=1}^{m} \delta_{i_j} \prod_{j=1}^{m} W\big(X_{i_j}, Z_{i_j}; x_j, z_j, b, \lambda\big)},
\]
where $\bar{\Lambda}_{n,3}(\tilde{x})$ denotes the collection of all bandwidth parameters $(b, \lambda)$. For notational convenience, define the extended random vector $Z_i^{(\mathrm{mixed})} := (X_i, Z_i, Y_i, \delta_i)$. The corresponding U-statistic representation is given by
\[
u_{n,3}^{(\mathrm{miss})}(\varphi, \tilde{x}, \tilde{z}) := \frac{(n-m)!}{n!} \sum_{i \in I(m,n)} \varphi(Y_{i_1}, \dots, Y_{i_m}) \prod_{j=1}^{m} \delta_{i_j} \prod_{j=1}^{m} W\big(X_{i_j}, Z_{i_j}; x_j, z_j, b, \lambda\big),
\]
and the estimator can be written as the ratio
\[
\hat{r}_{n,3}^{(m),(\mathrm{miss})}\big(\varphi, \tilde{x}, \tilde{z}; \bar{\Lambda}_{n,3}(\tilde{x})\big) = \frac{u_{n,3}^{(\mathrm{miss})}(\varphi, \tilde{x}, \tilde{z})}{u_{n,3}^{(\mathrm{miss})}(1, \tilde{x}, \tilde{z})}.
\]

Weak Uniform Convergence Under MAR

Before stating the uniform convergence results for the estimator under missing data, we adapt the previous conditions to this setting as follows:
(C.1’) 
$\{(Y_i, X_i, Z_i, \delta_i)\}_{i=1}^{n} \in \mathbb{R} \times [0,1]^d \times S_Z \times \{0,1\}$ are independent and identically distributed random variables under the MAR assumption (2.4);
(C.2’) 
Let $\tilde{f}(\tilde{x}, \tilde{z})$ be the joint probability density function (with respect to the product of Lebesgue measure on $[0,1]^{dm}$ and counting measure on $S_Z^m$) of $(\tilde{X}, \tilde{Z})$. Then the second-order derivatives of $\tilde{f}(\tilde{x}, \tilde{z})$ and $\tilde{g}(\tilde{x}, \tilde{z}) := r^{(m)}(\varphi, \tilde{x}, \tilde{z})\, \tilde{f}(\tilde{x}, \tilde{z})$ with respect to $\tilde{x}$ are continuous on $\tilde{x} \in (0,1)^{dm}$ for each fixed $\tilde{z} \in S_Z^m$;
(C.3’) 
There exist constants $\gamma > 0$ and $C_1 \in [1, \infty)$ such that $\mathbb{E}|\varphi(Y)|^{2+\gamma} < \infty$ and
\[
\sup_{(\tilde{x}, \tilde{z}) \in (0,1)^{dm} \times S_Z^m} \mathbb{E}\big[|\varphi(Y)|^{2+\gamma} \mid \tilde{X} = \tilde{x}, \tilde{Z} = \tilde{z}\big]\, \tilde{f}(\tilde{x}, \tilde{z}) \le C_1;
\]
(C.4”) 
For b j : = b j ( n ) = b j 1 , , b j d > 0 , η j : = η j ( n ) = η j 1 , , η j d > 0 , j = 1 , , m , and λ k : = λ k ( n ) ( 0 , 1 ) , k = 1 , , q satisfying for i = 1 , , d , b j i , η j i 0 , b j i η j i 0 , λ k 0 and
log n n j = 1 m i = 1 d b j i η j i 0 a s n ;
(C.4”’) 
For b j : = b j ( n ) = b j 1 , , b j d > 0 , η j : = η j ( n ) = η j 1 , , η j d > 0 , j = 1 , , m , and λ k : = λ k ( n ) ( 0 , 1 ) , k = 1 , , q satisfying for i = 1 , , d , b j i , η j i 0 , b j i η j i 0 , λ k 0 and
log n n j = 1 m i = 1 d b j i η j i j = 1 m i = 1 d 1 b j i 2 1 κ = O ( 1 ) ,
for some constant κ [ 0 , 1 ) , as n ;
(C.5) 
Let
\[
f_{nm} := \inf_{(\tilde{x}, \tilde{z}) \in \mathcal{X}^m \times S_Z^m} \tilde{f}(\tilde{x}, \tilde{z}) > 0,
\]
and assume
\[
\inf_{(\tilde{x}, \tilde{z}) \in \mathcal{X}^m \times S_Z^m} p(\tilde{x}, \tilde{z}) \ge c > 0.
\]
Moreover, suppose that
1 f n m j = 1 m i = 1 d b j i + k = 1 q λ k + ( log n / n ) j = 1 m i = 1 d b j i η j i 0 .
Corollary 7.
Under the assumptions (C.1’)–(C.3’), (C.4”), (C.5), and the MAR assumption (2.4) with the positivity condition (5.18), we have, as $n \to \infty$,
sup ( x ˜ , z ˜ ) X m × S Z m r ^ n , 3 ( m ) , ( miss ) ( φ , x ˜ , z ˜ ; Λ ¯ n , 3 ( x ˜ ) ) r ( m ) ( φ , x ˜ , z ˜ )       = O P j = 1 m i = 1 d b j i + k = 1 q λ k + ( log n / n ) j = 1 m i = 1 d b j i η j i .
Remark 13.
Several important observations are in order regarding the adaptation to missing data:
1. 
The complete-case estimator (5.13) incorporates the product $\prod_{j=1}^{m} \delta_{i_j}$, which discards any $m$-tuple containing at least one missing response. Under the MAR assumption and the positivity condition, this estimator remains consistent, albeit with a larger asymptotic variance due to the reduced effective sample size.
2. 
The convergence rate in Corollary 7 remains unchanged from the complete-data case because the missingness indicators $\delta_i$ do not affect the first-order bias. However, the constant in the $O_P$ term may depend on the propensity score through the variance of the U-statistic.
3. 
The bandwidth conditions (C.4”) and (C.4”’) are unaffected by the missingness mechanism, as they pertain to the kernel and the design density. The positivity condition (5.18) ensures that the denominator of the estimator does not degenerate asymptotically.
4. 
When the propensity score $p(x, z)$ is unknown, it must be estimated from the data. Under MAR, a nonparametric estimator of $p(\cdot)$ (e.g., a kernel regression of the indicators $\delta_i$ on the covariates $(X_i, Z_i)$, which are observed for every unit) can be employed, leading to an inverse probability weighted (IPW) or augmented IPW (AIPW) estimator that may achieve semiparametric efficiency. This extension is beyond the scope of the present work but represents a promising direction for future research.
Remark 14.
The relationship between the present manuscript and the general delta-sequence theory of [157] is structural rather than merely taxonomic. The latter work develops an abstract asymptotic theory for complete-case conditional U-statistics under MAR by treating the localization device as a positive approximate identity. At that level of generality, the principal objects are the localized complete-case U-statistic, its MAR-weighted expectation, the associated ratio normalization, and the Hoeffding–projection structure governing stochastic fluctuations. The resulting theory isolates the probabilistic skeleton of the problem.
The present paper, by contrast, studies what happens when the approximate identity is no longer an abstract regularizing sequence but a support-adapted asymmetric family whose analytic behavior is inseparable from the geometry of the covariate space. This passage from abstract localization to asymmetric localization is not innocuous. Dirichlet kernels on the simplex, beta kernels on the hypercube, and Bernstein polynomial smoothers possess evaluation-point-dependent shapes, boundary stratification, non-Euclidean local covariance structures, and normalizing constants whose asymptotics change across the support. Their mass concentration is governed not only by a bandwidth or degree parameter, but also by the position of the target point relative to the boundary. Hence the bias, variance, entropy, and stochastic equicontinuity calculations require a substantially finer analysis than that needed for a generic delta sequence.
This distinction is especially pronounced for conditional U-statistics. In ordinary first-order regression, asymmetric kernels already introduce nonstandard boundary behavior. In the present higher-order setting, that behavior is amplified by the nonlinear ratio structure and by the Hoeffding decomposition of localized U-statistics. The first projection carries the leading Gaussian fluctuation, whereas the higher canonical projections must be shown to be uniformly negligible over support-dependent classes of kernels. Since the localization kernel itself depends on the ordered target tuple
\[
\tilde{x} = (x_1, \dots, x_m),
\]
the relevant kernel class is not a simple translation family. It is a point-dependent, support-constrained, and generally non-symmetric family. This is one of the reasons why the proofs cannot be obtained by a direct invocation of the abstract results in [157]. A second essential distinction concerns the MAR mechanism. Under complete-case sampling, the expectation of the localized numerator involves $\prod_{j=1}^{m} p(t_j) f(t_j)$ rather than $\prod_{j=1}^{m} f(t_j)$. Thus the finite-sample centering is a propensity-weighted smoothing functional. In a purely abstract delta-sequence framework, this observation identifies the correct effective design measure. In the present asymmetric-kernel setting, however, one must further determine how the factor $p(\cdot)$ interacts with the local geometry of the kernel. The paper shows that the leading-order contribution of the propensity score cancels in the ratio bias under continuity and positivity assumptions, while the stochastic dispersion retains the factor $1/p(x)$, or its higher-order analogue, through the variance of the complete-case first projection. This yields a precise asymptotic separation:
bias is governed by support-adapted smoothing geometry,
whereas
variance is inflated by the MAR observation mechanism .
Such a conclusion requires kernel-specific expansions and cannot be reduced to a formal “replace f by p f ” principle.
The genuinely new content of the present manuscript relative to [157] may therefore be summarized as follows. First, it develops explicit Dirichlet-kernel conditional U-statistic estimators on the simplex under MAR and proves strong uniform consistency and asymptotic normality with rates reflecting the local L 2 geometry of the Dirichlet family. Second, it establishes Bernstein-polynomial analogues, including uniform stochastic bounds and bias estimates in a nonlinear conditional U-statistic setting. Third, it treats multivariate beta kernels on hyperrectangles, both on fixed compact regions and on expanding interior domains approaching the boundary, thereby making visible the interaction between bandwidths, boundary parameters, and complete-case sampling. Fourth, it extends the construction to mixed continuous–categorical regressors, where continuous beta smoothing and discrete kernels contribute distinct bias components. Fifth, it provides complete-case computational formulations, bandwidth selection criteria, and simulation evidence for conditional Kendall-type functionals under MCAR and MAR mechanisms.
Thus, the present paper is not simply an application of the delta-sequence MAR theory. It is a boundary-sensitive, geometry-dependent, and kernel-specific refinement of that theory. The general framework of [157] supplies the abstract probabilistic paradigm; the present manuscript supplies the detailed asymptotic analysis needed to make that paradigm operational for asymmetric smoothing on constrained supports. In particular, the rates, bias expansions, variance inflation factors, and mixed-data extensions derived here are new features generated by the interaction of four mechanisms: higher-order conditional U-statistic structure, point-dependent asymmetric smoothing, boundary geometry, and MAR complete-case sampling.
Remark 15.
We clarify the logical relation between the present paper, the earlier complete-data analysis [159], and the abstract MAR delta-sequence framework [157]. The present manuscript should not be read as reproving, in kernel-specific notation, every abstract consequence already contained in [157]. Rather, the role of the present work is to identify which parts of that abstract theory are applicable to asymmetric support-adapted kernels and to derive the additional kernel-specific information that is not available from the abstract framework alone. The complete-data paper [159] treats conditional U-statistics without missing responses. Hence it contains neither the complete-case U-statistic structure, nor the MAR propensity score, nor the effective complete-case design density p f , nor the inverse-propensity variance inflation. It also does not address the distinction between complete-case and IPW centering. Therefore, the present paper is new relative to [159] at the level of the statistical experiment: the observations are ( X i , δ i Y i , δ i ) , the estimator is a complete-case local ratio, and the first Hoeffding projection, deterministic centering, variance, and MSE constants are all modified by the MAR mechanism. The MAR delta-sequence work [157], by contrast, provides an abstract probabilistic theorem for complete-case conditional U-statistics under MAR, where the localization device is treated as a positive approximate identity. Results in the present paper that follow by checking the hypotheses of [157] are therefore presented only as corollaries or short verification statements. In particular, whenever the conclusion is only that a generic complete-case estimator is consistent, or satisfies the abstract delta-sequence stochastic bound under the hypotheses of [157], we do not claim a new theorem. We simply verify the required assumptions for the relevant kernel family. 
The genuinely new contributions of the present paper are the following kernel-specific and geometry-dependent ingredients.
1. For Dirichlet kernels on the simplex, we derive the exact local moment structure, including the boundary-sensitive drift, covariance matrix, and $L^2$-norm behavior. These quantities are not supplied by the abstract delta-sequence theory and are essential for obtaining explicit rates and normalizing constants.
2. For Bernstein smoothers, we identify the discrete polynomial smoothing operator as an admissible MAR localization scheme and compute its bias and variance scales in the nonlinear conditional U-statistic setting. The resulting verification is not a formal substitution, because the polynomial operator is discrete and its stochastic normalization differs from ordinary continuous kernels.
3. For product beta kernels on hyperrectangles, we give the interior and near-boundary moment expansions, the $L^2$-norm regimes, and the corresponding uniform stochastic rates. These regimes depend on the evaluation point and cannot be recovered from [157] without substantial kernel-specific analysis.
4. For mixed continuous–categorical regressors, we combine continuous beta smoothing with categorical smoothing. This yields a two-component deterministic bias and a mixed stochastic scale. This construction is not present in the complete-data paper and is not an immediate consequence of the abstract delta-sequence result.
5. For the MAR mechanism, we make explicit how the complete-case density $pf$ enters deterministic centering and bias constants, while the stochastic dispersion contains the inverse-propensity loss of information. The abstract framework gives the general MAR architecture, but the present paper computes the kernel-specific constants and rates for the asymmetric smoothers under consideration.

5.5. Computational Complexity

We briefly discuss the computational cost of the proposed estimators. Let $n_{\mathrm{obs}} := \sum_{i=1}^{n} \delta_i$ denote the number of complete cases. Under the MAR assumption, all estimators in (2.5), (3.3), (4.2), (5.1) and (5.13) can be implemented after discarding the incomplete observations, so the effective sample size in the algorithmic complexity is $n_{\mathrm{obs}}$ rather than $n$. For fixed order $m$, the cardinality of $I(m, n_{\mathrm{obs}}) = \{(i_1, \dots, i_m) : 1 \le i_j \le n_{\mathrm{obs}},\ i_j \ne i_r \text{ for } j \ne r\}$ is
\[
|I(m, n_{\mathrm{obs}})| = \frac{n_{\mathrm{obs}}!}{(n_{\mathrm{obs}} - m)!} = O(n_{\mathrm{obs}}^{m}).
\]
For a fixed evaluation point $\tilde{x} = (x_1, \dots, x_m)$, the numerator and denominator of the conditional U-statistic estimator require summation over all distinct $m$-tuples. Hence, the naive computational cost is
\[
O\big(|I(m, n_{\mathrm{obs}})|\,[C_\varphi + m C_K]\big) = O\big(n_{\mathrm{obs}}^{m}\,[C_\varphi + m C_K]\big),
\]
where $C_\varphi$ is the cost of evaluating $\varphi(Y_{i_1}, \dots, Y_{i_m})$ and $C_K$ is the cost of one kernel evaluation.
More specifically:
  • Dirichlet kernel estimator. For (3.3), after precomputing the normalization constants depending on $(\alpha_j, \beta_j)$, one kernel evaluation costs $C_K = O(d)$. Therefore, for one target point,
    \[ \text{Time} = O\big(n_{\mathrm{obs}}^{m}(C_\varphi + m d)\big), \qquad \text{Memory} = O(n_{\mathrm{obs}}\, d). \]
  • Bernstein polynomial estimator. For (4.2), if implemented directly from the multinomial sum, one evaluation may cost as much as
    \[ O\left(\binom{\vartheta + d - 1}{d}\right), \]
    but using the cell representation of the Bernstein density estimator reduces the cost of one kernel evaluation to $C_K = O(d)$ (or $O(1)$ in the univariate case). Thus, with an efficient implementation,
    \[ \text{Time} = O\big(n_{\mathrm{obs}}^{m}(C_\varphi + m d)\big). \]
  • Beta kernel estimator. For (5.1), the product beta kernel requires $O(d)$ operations per observation, so
    \[ \text{Time} = O\big(n_{\mathrm{obs}}^{m}(C_\varphi + m d)\big), \qquad \text{Memory} = O(n_{\mathrm{obs}}\, d). \]
  • Mixed continuous/categorical estimator. For (5.13), the continuous beta part contributes $O(d)$ and the discrete kernel contributes $O(q)$, hence
    \[ C_K = O(d + q), \]
    and therefore
    \[ \text{Time} = O\big(n_{\mathrm{obs}}^{m}(C_\varphi + m(d + q))\big), \qquad \text{Memory} = O\big(n_{\mathrm{obs}}(d + q)\big). \]
In the important special case $m = 1$ (Nadaraya–Watson-type regression), the complexity becomes linear in the number of complete cases:
\[ \text{Time} = O(n_{\mathrm{obs}}\, C_K), \]
that is, $O(n_{\mathrm{obs}}\, d)$ for the Dirichlet, Bernstein, and beta kernels, and $O(n_{\mathrm{obs}}(d + q))$ in the mixed-data case. If the estimator is evaluated on a grid of $G$ target points, the overall cost scales as
\[ O\big(G\, n_{\mathrm{obs}}^{m}\,[C_\varphi + m C_K]\big). \]
Bandwidth selection. The complete-case leave-tuple-out cross-validation criterion in (8.2) is substantially more expensive. A naive implementation recomputes an estimator of cost $O(n_{\mathrm{obs}}^{m})$ for each of the $O(n_{\mathrm{obs}}^{m})$ validation tuples, yielding
\[ O(n_{\mathrm{obs}}^{2m}) \]
operations per candidate bandwidth. If $H$ bandwidth values are tested, the total cost is
\[ O(H\, n_{\mathrm{obs}}^{2m}). \]
For $m = 1$, this reduces to the familiar $O(H\, n_{\mathrm{obs}}^{2})$ complexity.
  • Comment. Therefore, the proposed methodology is computationally tractable for $m = 1$ and $m = 2$, but the exact computation rapidly becomes expensive for larger $m$ because of the combinatorial growth of $|I(m, n_{\mathrm{obs}})|$. In practice, this motivates the use of precomputed kernel weights, parallel computation over target points, and, for large samples or higher orders, incomplete U-statistics or subsampling strategies.
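The cost count above can be made concrete with a minimal sketch of the naive complete-case estimator. It is only an illustration under stated assumptions: the covariate is scalar with values in $(0,1)$, the kernel is a Chen-type beta density with shape parameters $(x/b + 1, (1-x)/b + 1)$, which may differ from the parametrization used in (5.1), and the names `beta_kernel` and `complete_case_u` are hypothetical.

```python
import math
from itertools import permutations

def beta_kernel(x, b, t):
    """Beta density with shapes (x/b + 1, (1 - x)/b + 1), evaluated at t.
    Assumed Chen-type form; the paper's kernel (5.1) may be parametrized differently."""
    if not 0.0 < t < 1.0:
        return 0.0
    a1, a2 = x / b + 1.0, (1.0 - x) / b + 1.0
    log_norm = math.lgamma(a1 + a2) - math.lgamma(a1) - math.lgamma(a2)
    return math.exp(log_norm + (a1 - 1.0) * math.log(t) + (a2 - 1.0) * math.log(1.0 - t))

def complete_case_u(phi, x_tilde, X, Y, delta, b):
    """Naive complete-case conditional U-statistic of order m = len(x_tilde)."""
    m = len(x_tilde)
    # Discard incomplete cases first, so only n_obs observations are enumerated.
    obs = [i for i, d in enumerate(delta) if d == 1]
    # Precompute the m * n_obs kernel weights once.
    w = [[beta_kernel(x_tilde[r], b, X[i]) for i in obs] for r in range(m)]
    num = den = 0.0
    # Enumerate all n_obs! / (n_obs - m)! ordered tuples of distinct indices.
    for idx in permutations(range(len(obs)), m):
        k = 1.0
        for r, j in enumerate(idx):
            k *= w[r][j]
        num += phi(*(Y[obs[j]] for j in idx)) * k
        den += k
    return num / den if den > 0.0 else float("nan")
```

With `phi = lambda y1, y2: 0.5 * (y1 - y2) ** 2` and `x_tilde = (x, x)`, the call targets the conditional variance at `x`. Precomputing the weights separates the $m\, n_{\mathrm{obs}}$ kernel evaluations from the $O(n_{\mathrm{obs}}^{m})$ tuple enumeration, which dominates the running time.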
Remark 16.
The exact evaluation of the conditional U-statistic estimator in (2.5) involves a summation over $I(m, n)$, and hence has worst-case computational complexity of order $O(n^{m})$ for a fixed evaluation point. This cost is intrinsic to the fully enumerated U-statistic representation and may become prohibitive when $m \ge 2$ and $n$ is large. The main purpose of the present paper is asymptotic and probabilistic rather than algorithmic; nevertheless, several standard computational reductions are available. First, in the most common applications of conditional U-statistics, the order $m$ is small, typically $m = 2$ for pairwise functionals such as local covariance, conditional Kendall-type coefficients, discrimination criteria, ranking losses, or two-sample comparison functionals. In that case the exact implementation has quadratic cost, which remains feasible for moderate sample sizes, especially after discarding incomplete tuples through the factor $\prod_{j=1}^{m} \delta_{i_j}$. Second, the complete U-statistic may be replaced by an incomplete or sampled U-statistic, obtained by averaging only over a random subset $B_n \subset I(m, n)$. For instance, one may use
\[
\hat{r}_{n,\ell,B}^{(m)}(\varphi, \tilde{x}) = \frac{\sum_{\mathbf{i} \in B_n} \varphi(Y_{i_1}, \dots, Y_{i_m}) \prod_{j=1}^{m} \delta_{i_j} K_{\Lambda_{n,\ell}(x_j)}(X_{i_j})}{\sum_{\mathbf{i} \in B_n} \prod_{j=1}^{m} \delta_{i_j} K_{\Lambda_{n,\ell}(x_j)}(X_{i_j})}.
\]
If $B_n$ is sampled uniformly from $I(m, n)$, or generated by an appropriate randomized block scheme, the computational cost is reduced from $O(n^{m})$ to $O(|B_n|)$. The additional Monte Carlo error can then be made negligible relative to the statistical error by choosing $|B_n|$ sufficiently large. A detailed asymptotic theory for such incomplete asymmetric-kernel conditional U-statistics under MAR is beyond the scope of the present paper, but it is a natural direction for future work. Third, the localized nature of the kernel weights can be exploited computationally. For compactly supported or effectively localized asymmetric kernels, only observations lying in the effective neighborhood of the evaluation point contribute substantially to the numerator and denominator. Thus, nearest-neighbor screening, binning, kd-tree search, or sparse evaluation of negligible weights may substantially reduce the practical cost. These reductions do not change the theoretical estimator studied here when exact weights are retained, but they provide efficient numerical implementations in moderate and large samples. Consequently, the $O(n^{m})$ complexity should be understood as the cost of the exact theoretical version of the estimator. The asymptotic results of the paper are established for this exact complete-case conditional U-statistic, while incomplete-U, randomized, and localized approximations provide scalable computational variants whose rigorous treatment is left for future investigation.
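The sampled variant described in this remark can be sketched as follows. This is a hedged illustration rather than a procedure taken from the paper: tuples are drawn independently and uniformly (so the same tuple may be drawn more than once), the weights `w[r][i]` are assumed to hold the precomputed products $\delta_i K_{\Lambda_{n,\ell}(x_r)}(X_i)$, and the function name `sampled_u` is hypothetical.

```python
import random

def sampled_u(phi, Y, w, n_tuples, rng):
    """Incomplete conditional U-statistic averaged over randomly drawn tuples.

    w[r][i] is the precomputed weight delta_i * K_{x_r}(X_i); it is 0 whenever
    the response Y[i] is missing, so Y[i] is never read for incomplete cases.
    """
    m, n = len(w), len(Y)
    num = den = 0.0
    for _ in range(n_tuples):
        idx = rng.sample(range(n), m)      # m distinct indices in random order
        k = 1.0
        for r, i in enumerate(idx):
            k *= w[r][i]
        if k == 0.0:                       # incomplete or zero-weight tuple
            continue
        num += phi(*(Y[i] for i in idx)) * k
        den += k
    return num / den if den > 0.0 else float("nan")
```

The cost is $O(|B_n|\,[C_\varphi + m])$ once the weights are available, compared with $O(n^{m})$ for the full enumeration. How large `n_tuples` must be for the Monte Carlo error to be negligible is not addressed by this sketch.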
Remark 17.
We clarify why the propensity-weighted deterministic centering associated with the complete-case estimator converges to the same conditional functional $r^{(m)}(\varphi, \tilde{x})$ as in the fully observed case. Let
\[
g_p(x) := p(x) f(x), \qquad G_p(\tilde{x}) := \prod_{j=1}^{m} g_p(x_j),
\]
and write
\[
\theta(\tilde{x}) := r^{(m)}(\varphi, \tilde{x}).
\]
The complete-case deterministic centering may be written as
\[
r_{p,n,\ell}^{(m)}(\varphi, \tilde{x}) = \frac{\int_{\mathcal{X}^m} \theta(\tilde{t})\, G_p(\tilde{t})\, \tilde{K}_{\bar{\Lambda}_{n,\ell}(\tilde{x})}(\tilde{t})\, d\tilde{t}}{\int_{\mathcal{X}^m} G_p(\tilde{t})\, \tilde{K}_{\bar{\Lambda}_{n,\ell}(\tilde{x})}(\tilde{t})\, d\tilde{t}}.
\]
Equivalently, if $\tilde{T}_{n,\ell,\tilde{x}}$ has density $\tilde{K}_{\bar{\Lambda}_{n,\ell}(\tilde{x})}$, then
\[
r_{p,n,\ell}^{(m)}(\varphi, \tilde{x}) = \frac{\mathbb{E}\big[\theta(\tilde{T}_{n,\ell,\tilde{x}})\, G_p(\tilde{T}_{n,\ell,\tilde{x}})\big]}{\mathbb{E}\big[G_p(\tilde{T}_{n,\ell,\tilde{x}})\big]}.
\]
Hence
\[
r_{p,n,\ell}^{(m)}(\varphi, \tilde{x}) - \theta(\tilde{x}) = \frac{\mathbb{E}\big[\{\theta(\tilde{T}_{n,\ell,\tilde{x}}) - \theta(\tilde{x})\}\, G_p(\tilde{T}_{n,\ell,\tilde{x}})\big]}{\mathbb{E}\big[G_p(\tilde{T}_{n,\ell,\tilde{x}})\big]}.
\]
This identity is the key point. The propensity score does not have to vanish from the integrand. Rather, it appears both in the numerator and in the denominator through the same local design weight $G_p$. Since the kernels form an approximate identity at $\tilde{x}$, the random vector $\tilde{T}_{n,\ell,\tilde{x}}$ concentrates around $\tilde{x}$. Therefore, if $\theta$ and $G_p$ are continuous, and if $G_p(\tilde{x}) > 0$, then
\[
\mathbb{E}\big[G_p(\tilde{T}_{n,\ell,\tilde{x}})\big] \to G_p(\tilde{x}) > 0,
\]
and
\[
\mathbb{E}\big[\theta(\tilde{T}_{n,\ell,\tilde{x}})\, G_p(\tilde{T}_{n,\ell,\tilde{x}})\big] \to \theta(\tilde{x})\, G_p(\tilde{x}).
\]
Consequently,
\[
r_{p,n,\ell}^{(m)}(\varphi, \tilde{x}) \to \theta(\tilde{x}) = r^{(m)}(\varphi, \tilde{x}).
\]
More explicitly, for any $\eta > 0$, identity (1) gives
\[
\big| r_{p,n,\ell}^{(m)}(\varphi, \tilde{x}) - \theta(\tilde{x}) \big| \le \frac{\omega_\theta(\eta)\, \mathbb{E}\big[G_p(\tilde{T}_{n,\ell,\tilde{x}})\big] + 2 \|\theta\|_\infty \|G_p\|_\infty\, \mathbb{P}\big(\|\tilde{T}_{n,\ell,\tilde{x}} - \tilde{x}\| > \eta\big)}{\mathbb{E}\big[G_p(\tilde{T}_{n,\ell,\tilde{x}})\big]},
\]
where $\omega_\theta(\eta)$ is the modulus of continuity of $\theta$. The first term is small because $\theta$ is continuous, and the second term is small because the kernel mass concentrates around $\tilde{x}$. The denominator is bounded away from zero for all large $n$ because $p$ is positive and $f$ is positive on the region under consideration. This proves the convergence. If the convergence is required uniformly on a compact set $\mathcal{C} \subset \mathrm{Int}(\mathcal{X}^m)$, the same argument applies provided the approximate-identity property holds uniformly on $\mathcal{C}$, and provided
\[
\inf_{\tilde{x} \in \mathcal{C}} G_p(\tilde{x}) > 0.
\]
Thus
\[
\sup_{\tilde{x} \in \mathcal{C}} \big| r_{p,n,\ell}^{(m)}(\varphi, \tilde{x}) - r^{(m)}(\varphi, \tilde{x}) \big| \to 0.
\]
Under second-order smoothness, this qualitative convergence can be sharpened to the bias expansion
\[
r_{p,n,\ell}^{(m)}(\varphi, \tilde{x}) - \theta(\tilde{x}) = \nabla\theta(\tilde{x})^{\top} \mu_{n,\ell}(\tilde{x}) + \frac{1}{2} \operatorname{tr}\big\{\nabla^2\theta(\tilde{x})\, M_{n,\ell}(\tilde{x})\big\} + \nabla\theta(\tilde{x})^{\top} M_{n,\ell}(\tilde{x})\, \nabla \log G_p(\tilde{x}) + o(\rho_{n,\ell}),
\]
where
\[
\mu_{n,\ell}(\tilde{x}) = \mathbb{E}\big[\tilde{T}_{n,\ell,\tilde{x}} - \tilde{x}\big], \qquad M_{n,\ell}(\tilde{x}) = \mathbb{E}\big[(\tilde{T}_{n,\ell,\tilde{x}} - \tilde{x})(\tilde{T}_{n,\ell,\tilde{x}} - \tilde{x})^{\top}\big].
\]
This expansion shows precisely what is meant by the cancellation of the MAR propensity score. The factor $p$ cancels from the zeroth-order target because the ratio converges to
\[
\frac{\theta(\tilde{x})\, G_p(\tilde{x})}{G_p(\tilde{x})} = \theta(\tilde{x}).
\]
However, unless $p$ is locally constant, it may still enter the higher-order bias constant through
\[
\log G_p(\tilde{x}) = \sum_{j=1}^{m} \log\{p(x_j) f(x_j)\}.
\]
Therefore, the correct statement is that the propensity-weighted centering converges to the target conditional functional, and that the MAR mechanism does not change the zeroth-order target or the bias rate under smooth positive $p$. Exact higher-order bias constants, however, must be computed with the effective complete-case density $pf$.
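The convergence of the propensity-weighted centering can be checked numerically in the simplest case $m = 1$ on $[0,1]$. The sketch below is illustrative only: the choices $\theta(t) = t^2$, $p(t) = 0.4 + 0.5t$, $f \equiv 1$, the Chen-type beta kernel with shapes $(x/b+1, (1-x)/b+1)$, and the midpoint quadrature are all assumptions made for the example, not quantities taken from the paper.

```python
import math

def centering(x, b, theta, p, f, n_grid=20000):
    """Ratio  int theta*p*f*K / int p*f*K  by midpoint quadrature on (0, 1),
    with K the beta density of shapes (x/b + 1, (1 - x)/b + 1) (assumed form)."""
    a1, a2 = x / b + 1.0, (1.0 - x) / b + 1.0
    log_norm = math.lgamma(a1 + a2) - math.lgamma(a1) - math.lgamma(a2)
    num = den = 0.0
    for k in range(n_grid):
        t = (k + 0.5) / n_grid
        kern = math.exp(log_norm + (a1 - 1.0) * math.log(t) + (a2 - 1.0) * math.log(1.0 - t))
        w = p(t) * f(t) * kern             # local design weight g_p(t) * K(t)
        num += theta(t) * w
        den += w
    return num / den

theta = lambda t: t * t                     # hypothetical target functional
p = lambda t: 0.4 + 0.5 * t                 # hypothetical propensity score
f = lambda t: 1.0                           # uniform design density
errors = [abs(centering(0.5, b, theta, p, f) - theta(0.5)) for b in (0.1, 0.01, 0.001)]
```

In this example the error shrinks roughly in proportion to $b$, in line with the expansion above: the propensity cancels from the limit $\theta(x)$ but still contributes to the first-order constant through the derivative of $\log\{p(x) f(x)\}$.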

6. Applications

6.1. Discrimination Problems with Missing Responses

We now apply the theoretical framework developed in the preceding sections to the problem of discrimination described in Section 3 of [171] (see also [172]), under the additional complication that response variables may be missing according to the MAR mechanism (2.4). We adopt the same notation and setting as [171], with modifications to accommodate missingness.
Let $\varphi : \mathcal{Y}^k \to \{1, \dots, M\}$ be a measurable function taking finitely many values. The sets
\[
A_j = \big\{(y_1, \dots, y_k) \in \mathcal{Y}^k : \varphi(y_1, \dots, y_k) = j\big\}, \qquad 1 \le j \le M,
\]
form a measurable partition of the feature space $\mathcal{Y}^k$. Predicting the value of $\varphi(Y_1, \dots, Y_k)$ is equivalent to predicting the cell $A_j$ to which the $k$-tuple $(Y_1, \dots, Y_k)$ belongs. For any measurable discrimination rule $g : \mathcal{X}^k \to \{1, \dots, M\}$, the probability of correct classification satisfies
\[
\mathbb{P}\big(g(\tilde{X}) = \varphi(\tilde{Y})\big) \le \sum_{j=1}^{M} \int_{\{\tilde{x} \in \mathcal{X}^k : g(\tilde{x}) = j\}} \max_{1 \le \ell \le M} M_\ell(\tilde{x})\, dP_{\tilde{X}}(\tilde{x}),
\]
where $P_{\tilde{X}}$ denotes the distribution of $\tilde{X} = (X_1, \dots, X_k)$, and for each $j \in \{1, \dots, M\}$,
\[
M_j(\tilde{x}) = \mathbb{P}\big(\varphi(\tilde{Y}) = j \mid \tilde{X} = \tilde{x}\big), \qquad \tilde{x} \in \mathcal{X}^k,
\]
is the posterior probability of class $j$ given the covariates. The inequality becomes an equality for the Bayes rule
\[
g_0(\tilde{x}) = \arg\max_{1 \le j \le M} M_j(\tilde{x}), \qquad \tilde{x} \in \mathcal{X}^k,
\]
where ties are broken arbitrarily (e.g., by selecting the smallest index). The associated minimal probability of error, or Bayes risk, is
\[
L^{*} = 1 - \mathbb{P}\big(g_0(\tilde{X}) = \varphi(\tilde{Y})\big) = 1 - \mathbb{E}_{\tilde{X}}\Big[\max_{1 \le j \le M} M_j(\tilde{X})\Big].
\]
Under the MAR assumption (2.4) and the positivity condition $\inf_{x \in \mathcal{X}} p(x) \ge c > 0$, each posterior probability $M_j$ can be consistently estimated using the complete-case conditional U-statistic methodology. For $1 \le j \le M$ and $\ell \in \{1, 2, 3\}$, define
\[
M_{n,j,\ell}^{(\mathrm{miss})}(\tilde{x}) = \frac{\sum_{(i_1, \dots, i_k) \in I(k,n)} \mathbb{1}\{\varphi(Y_{i_1}, \dots, Y_{i_k}) = j\} \prod_{r=1}^{k} \delta_{i_r} \prod_{r=1}^{k} K_{\Lambda_{n,\ell}(x_r)}(X_{i_r})}{\sum_{(i_1, \dots, i_k) \in I(k,n)} \prod_{r=1}^{k} \delta_{i_r} \prod_{r=1}^{k} K_{\Lambda_{n,\ell}(x_r)}(X_{i_r})},
\]
with the convention that the ratio is taken to be $1/M$ (or any arbitrary value) when the denominator vanishes, an event that occurs with probability tending to zero under the positivity condition as $n \to \infty$. The product $\prod_{r=1}^{k} \delta_{i_r}$ ensures that only $k$-tuples for which all responses are observed contribute to the estimation, which is the essence of the complete-case approach. Define the estimated discrimination rule
\[
g_{0,n,\ell}^{(\mathrm{miss})}(\tilde{x}) = \arg\max_{1 \le j \le M} M_{n,j,\ell}^{(\mathrm{miss})}(\tilde{x}), \qquad \tilde{x} \in \mathcal{X}^k,
\]
and the associated empirical Bayes risk
\[
L_{n,\ell}^{*,(\mathrm{miss})} = \mathbb{P}\big(g_{0,n,\ell}^{(\mathrm{miss})}(\tilde{X}) \ne \varphi(\tilde{Y})\big).
\]
Theorem 16
(Consistency of the empirical Bayes rule under MAR). Under the MAR assumption (2.4), the positivity condition $\inf_{x \in \mathcal{X}} p(x) \ge c > 0$, and the assumptions of Corollary 1 (adapted to the missing data setting), we have
\[
\lim_{n \to \infty} L_{n,\ell}^{*,(\mathrm{miss})} = L^{*}.
\]
Proof. 
For any $\tilde{x} \in \mathcal{X}^k$, observe that
\[
\Big| \max_{1 \le j \le M} M_j(\tilde{x}) - \max_{1 \le j \le M} M_{n,j,\ell}^{(\mathrm{miss})}(\tilde{x}) \Big| \le \max_{1 \le j \le M} \big| M_j(\tilde{x}) - M_{n,j,\ell}^{(\mathrm{miss})}(\tilde{x}) \big|.
\]
Therefore, using the identity $\mathbb{P}(g(\tilde{X}) \ne \varphi(\tilde{Y})) = 1 - \mathbb{E}[\max_j M_j(\tilde{X})]$ for the Bayes rule and the analogous representation for the empirical rule, we obtain
\[
\big| L^{*} - L_{n,\ell}^{*,(\mathrm{miss})} \big| \le 2\, \mathbb{E}_{\tilde{X}}\Big[\max_{1 \le j \le M} \big| M_j(\tilde{X}) - M_{n,j,\ell}^{(\mathrm{miss})}(\tilde{X}) \big|\Big]
\le 2\, \mathbb{E}_{\tilde{X}}\Big[\sum_{j=1}^{M} \big| M_j(\tilde{X}) - M_{n,j,\ell}^{(\mathrm{miss})}(\tilde{X}) \big|\Big]
= 2 \sum_{j=1}^{M} \mathbb{E}_{\tilde{X}}\Big[\big| M_j(\tilde{X}) - M_{n,j,\ell}^{(\mathrm{miss})}(\tilde{X}) \big|\Big].
\]
The factor 2 arises from the triangle inequality and the fact that the probability of misclassification is bounded above by twice the total variation distance between the estimated and true conditional probability vectors (see Lemma 2.1 of [173]). By Corollary 1 applied to the indicator kernels $\mathbb{1}\{\varphi(\cdot) = j\}$, each term $\mathbb{E}_{\tilde{X}}\big[|M_j(\tilde{X}) - M_{n,j,\ell}^{(\mathrm{miss})}(\tilde{X})|\big]$ converges to zero as $n \to \infty$ under the stated assumptions. The conclusion follows by the dominated convergence theorem, noting that the integrand is uniformly bounded by 2. □
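The plug-in rule used in this subsection, including the $1/M$ convention for a vanishing denominator and the smallest-index tie-break, can be sketched as follows. The function name `plug_in_rule` and its inputs are hypothetical: `tuple_labels[t]` stands for $\varphi(Y_{i_1}, \dots, Y_{i_k})$ on a tuple, and `tuple_weights[t]` for the corresponding product of missingness indicators and kernel weights at the evaluation point.

```python
def plug_in_rule(tuple_labels, tuple_weights, M):
    """Return (estimated class, estimated posterior vector) at one evaluation point.

    tuple_labels[t]  : class in {1, ..., M} of tuple t (unused when its weight is 0)
    tuple_weights[t] : prod_r delta_{i_r} * K_{x_r}(X_{i_r}) for tuple t
    """
    den = sum(tuple_weights)
    if den <= 0.0:
        post = [1.0 / M] * M                   # convention when the denominator vanishes
    else:
        post = [0.0] * M
        for lab, w in zip(tuple_labels, tuple_weights):
            if w > 0.0:
                post[lab - 1] += w / den       # weighted relative frequency of class lab
    # argmax with ties broken in favor of the smallest class index
    g = 1 + max(range(M), key=lambda j: (post[j], -j))
    return g, post
```

All $M$ posterior estimates share the same denominator, so estimating the whole vector costs one pass over the tuples rather than $M$ passes.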

6.2. Generalized U-Statistics with Missing Data

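The extension of the complete-case conditional U-statistic methodology to the setting of multiple independent samples is conceptually straightforward but requires careful bookkeeping of missingness indicators across samples. Consider $\tilde{\ell} \in \mathbb{N}^{*}$ independent collections of i.i.d. observations
\[
\big(X_1^{(1)}, Y_1^{(1)}, \delta_1^{(1)}\big), \big(X_2^{(1)}, Y_2^{(1)}, \delta_2^{(1)}\big), \dots, \quad \dots, \quad \big(X_1^{(\tilde{\ell})}, Y_1^{(\tilde{\ell})}, \delta_1^{(\tilde{\ell})}\big), \big(X_2^{(\tilde{\ell})}, Y_2^{(\tilde{\ell})}, \delta_2^{(\tilde{\ell})}\big), \dots,
\]
where for each $j \in \{1, \dots, \tilde{\ell}\}$, the missingness indicators $\delta_i^{(j)}$ satisfy the MAR assumption (2.4) with (possibly sample-specific) propensity score $p_j(\cdot) = \mathbb{P}(\delta_1^{(j)} = 1 \mid X_1^{(j)} = \cdot)$, and the positivity condition $\inf_{x \in \mathcal{X}} p_j(x) \ge c_j > 0$ holds uniformly across samples. The collections are assumed to be mutually independent, and the missingness mechanisms are independent across samples.
Let $k_1, \dots, k_{\tilde{\ell}} \in \mathbb{N}^{*}$ be fixed integers, and let
\[
\varphi : \prod_{j=1}^{\tilde{\ell}} \mathcal{Y}^{k_j} \to \mathbb{R}
\]
be a measurable function that is symmetric within each block of arguments (i.e., for each $j$, $\varphi$ is invariant under permutations of the $k_j$ arguments from the $j$-th sample). For $\mathbf{t} = (\mathbf{t}_1, \dots, \mathbf{t}_{\tilde{\ell}}) \in \prod_{j=1}^{\tilde{\ell}} \mathcal{X}^{k_j}$, define the conditional expectation
\[
r^{(k,\tilde{\ell})}(\varphi, \mathbf{t}) := \mathbb{E}\Big[\varphi\big(Y_1^{(1)}, \dots, Y_{k_1}^{(1)}; \dots; Y_1^{(\tilde{\ell})}, \dots, Y_{k_{\tilde{\ell}}}^{(\tilde{\ell})}\big) \,\Big|\, \big(X_1^{(j)}, \dots, X_{k_j}^{(j)}\big) = \mathbf{t}_j,\ j = 1, \dots, \tilde{\ell}\Big],
\]
whenever the expectation exists. Corresponding to the kernel $\varphi$ and assuming $n_j \ge k_j$ for all $j = 1, \dots, \tilde{\ell}$, the complete-case conditional U-statistic for estimating $r^{(k,\tilde{\ell})}(\varphi, \mathbf{t})$ under MAR is defined, for $\ell \in \{1, 2, 3\}$, by
\[
\hat{r}_{n,\ell}^{(k,\tilde{\ell}),(\mathrm{miss})}(\varphi, \mathbf{t}) := \frac{\sum_{\mathbf{i} \in \mathcal{I}} \varphi\big(Y_{i_{11}}^{(1)}, \dots, Y_{i_{1k_1}}^{(1)}; \dots; Y_{i_{\tilde{\ell}1}}^{(\tilde{\ell})}, \dots, Y_{i_{\tilde{\ell}k_{\tilde{\ell}}}}^{(\tilde{\ell})}\big) \Big\{\prod_{j=1}^{\tilde{\ell}} \prod_{r=1}^{k_j} \delta_{i_{jr}}^{(j)}\Big\}\, \mathcal{K}\Big(\big\{X_{i_{jr}}^{(j)}\big\}_{j = 1, \dots, \tilde{\ell};\ r = 1, \dots, k_j}\Big)}{\sum_{\mathbf{i} \in \mathcal{I}} \Big\{\prod_{j=1}^{\tilde{\ell}} \prod_{r=1}^{k_j} \delta_{i_{jr}}^{(j)}\Big\}\, \mathcal{K}\Big(\big\{X_{i_{jr}}^{(j)}\big\}_{j = 1, \dots, \tilde{\ell};\ r = 1, \dots, k_j}\Big)},
\]
where
\[
\mathcal{K}\Big(\big\{X_{i_{jr}}^{(j)}\big\}_{j = 1, \dots, \tilde{\ell};\ r = 1, \dots, k_j}\Big) := \prod_{j=1}^{\tilde{\ell}} \prod_{r=1}^{k_j} K_{\Lambda_{n,\ell}(t_{jr})}\big(X_{i_{jr}}^{(j)}\big),
\]
with $\mathbf{t}_j = (t_{j1}, \dots, t_{jk_j}) \in \mathcal{X}^{k_j}$, and the summation index set is
\[
\mathcal{I} := \prod_{j=1}^{\tilde{\ell}} I(k_j, n_j),
\]
where $I(k_j, n_j) := \{(i_{j1}, \dots, i_{jk_j}) \in \{1, \dots, n_j\}^{k_j} : i_{j1}, \dots, i_{jk_j} \text{ distinct}\}$. The extension of the treatment of one-sample U-statistics in [2] to the $\tilde{\ell}$-sample case is due to [174,175]. Under the MAR assumption and the positivity condition, the uniform consistency results established in Corollary 1 (or Corollary 4 or 6) extend directly to this multisample setting. Specifically, we have
\[
\Big| \hat{r}_{n,\ell}^{(k,\tilde{\ell}),(\mathrm{miss})}(\varphi, \mathbf{t}) - r^{(k,\tilde{\ell})}(\varphi, \mathbf{t}) \Big| \to 0, \qquad \text{a.s.},
\]
as $\min(n_1, \dots, n_{\tilde{\ell}}) \to \infty$, provided the bandwidth parameters satisfy appropriate conditions and the propensity scores are uniformly bounded away from zero.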

6.3. Kendall Rank Correlation Coefficient Under Conditional Independence Testing with Missing Responses

To test the independence of two one-dimensional random variables $Y_1$ and $Y_2$, Kendall [176] proposed a nonparametric procedure based on the U-statistic $K_n$ with kernel
\[
\varphi\big((s_1, t_1), (s_2, t_2)\big) = \mathbb{1}\{(s_2 - s_1)(t_2 - t_1) > 0\} - \mathbb{1}\{(s_2 - s_1)(t_2 - t_1) \le 0\}.
\]
The rejection region for testing independence is of the form $\{\sqrt{n}\, K_n > \gamma\}$, where $\gamma$ is a critical value determined by the asymptotic distribution of $K_n$ under the null hypothesis. We now extend this framework to test conditional independence in a multivariate setting with potentially missing responses. Let $\xi \in \mathbb{R}^{d_1}$ and $\eta \in \mathbb{R}^{d_2}$ be random vectors with $d_1 + d_2 = d$, and set $Y = (\xi, \eta)$. Suppose we observe $n$ i.i.d. copies $\{(X_i, Y_i, \delta_i)\}_{i=1}^{n}$ satisfying the MAR assumption (2.4) with propensity score $p(\cdot)$ and positivity condition $\inf_{x \in \mathcal{X}} p(x) \ge c > 0$. We are interested in testing the conditional independence hypothesis
\[
H_0 : \xi \perp\!\!\!\perp \eta \mid X \qquad \text{versus} \qquad H_a : H_0 \text{ is false}.
\]
For $\ell \in \{1, 2, 3\}$ and $\mathbf{t} = (t_1, t_2) \in \mathcal{X}^2$, define the complete-case conditional Kendall’s tau estimator
\[
\hat{\tau}_{n,\ell}^{(\mathrm{miss})}(\mathbf{t}) := \frac{\sum_{i \ne j}^{n} \varphi(Y_i, Y_j)\, \delta_i \delta_j\, K_{\Lambda_{n,\ell}(t_1)}(X_i)\, K_{\Lambda_{n,\ell}(t_2)}(X_j)}{\sum_{i \ne j}^{n} \delta_i \delta_j\, K_{\Lambda_{n,\ell}(t_1)}(X_i)\, K_{\Lambda_{n,\ell}(t_2)}(X_j)},
\]
where $\varphi$ is Kendall’s kernel (6.10) applied to the projected one-dimensional variables. The product $\delta_i \delta_j$ ensures that only pairs with both responses fully observed contribute to the estimator. To handle multivariate responses, we employ a projection approach. For any unit vector $\mathbf{a} = (a_1, a_2) \in \mathbb{R}^{d_1} \times \mathbb{R}^{d_2}$ with $\|\mathbf{a}\| = 1$, define the projected kernel
\[
\varphi_{\mathbf{a}}\big((\xi^{(1)}, \eta^{(1)}), (\xi^{(2)}, \eta^{(2)})\big) := \varphi\big((a_1^{\top} \xi^{(1)}, a_2^{\top} \eta^{(1)}), (a_1^{\top} \xi^{(2)}, a_2^{\top} \eta^{(2)})\big).
\]
Let $F_{a_1}$ and $G_{a_2}$ denote the distribution functions of $a_1^{\top} \xi$ and $a_2^{\top} \eta$, respectively, assumed continuous for all unit vectors $\mathbf{a}$. Under the null hypothesis $H_0$ of conditional independence, we have $\mathbb{E}[\varphi_{\mathbf{a}}(Y_1, Y_2) \mid X_1 = t_1, X_2 = t_2] = 0$ for almost every $(t_1, t_2)$ and all $\mathbf{a}$. Consequently, the conditional Kendall’s tau is zero, and the estimator $\hat{\tau}_{n,\ell}^{(\mathrm{miss})}(\mathbf{t})$ should be close to zero for large $n$.
Theorem 17
(Consistency of conditional Kendall’s tau under MAR). Under the MAR assumption (2.4), the positivity condition $\inf_{x \in \mathcal{X}} p(x) \ge c > 0$, and the assumptions of Corollary 1 (adapted to the missing data setting), we have, for any fixed $\mathbf{t} \in \mathcal{X}^2$ and any unit vector $\mathbf{a}$,
\[
\big| \hat{\tau}_{n,\ell}^{(\mathrm{miss})}(\mathbf{t}) - \tau_{\mathrm{cond}}(\mathbf{t}) \big| \to 0 \quad \text{a.s.},
\]
where $\tau_{\mathrm{cond}}(\mathbf{t}) = \mathbb{E}[\varphi(Y_1, Y_2) \mid X_1 = t_1, X_2 = t_2]$ is the true conditional Kendall’s tau. In particular, under $H_0$, $\tau_{\mathrm{cond}}(\mathbf{t}) = 0$ almost everywhere, and thus $\hat{\tau}_{n,\ell}^{(\mathrm{miss})}(\mathbf{t}) \to 0$ almost surely.
Proof. 
The result follows directly from Corollary 1 applied to the kernel $\varphi_{\mathbf{a}}$, noting that the missingness indicators $\delta_i \delta_j$ are correctly accounted for in the complete-case estimator (6.12). The almost sure convergence is uniform over compact subsets of $\mathcal{X}^2$ under the additional smoothness conditions of Section 3.2. □
The asymptotic distribution of $\hat{\tau}_{n,\ell}^{(\mathrm{miss})}(\mathbf{t})$ under the null hypothesis can be derived using the central limit theorem established in Section 3.3. Specifically, under $H_0$ and appropriate bandwidth conditions, we have
\[
\sqrt{n}\, \breve{b}^{d/2} \big(\hat{\tau}_{n,\ell}^{(\mathrm{miss})}(\mathbf{t}) - 0\big) \xrightarrow{\ \mathcal{D}\ } N\big(0, \sigma_{\mathrm{miss}}^{2}(\mathbf{t})\big),
\]
where $\sigma_{\mathrm{miss}}^{2}(\mathbf{t})$ is given by (3.19) with $\rho^2$ replaced by the appropriate variance expression for Kendall’s kernel, incorporating the propensity score inflation factor $1/p(x_i)$. This result provides the theoretical foundation for constructing asymptotic level-$\alpha$ tests of conditional independence in the presence of missing responses under MAR.

7. Examples

The flexibility of the proposed conditional U-statistic framework is illustrated through several concrete examples. In general, any kernel function $h : \mathcal{Y}^m \to \mathbb{R}$ that has proven useful in the unconditional U-statistic literature (see, e.g., [177]) can be adapted to the conditional setting via the methodology developed in the preceding sections. Recall that the case $m = 1$ yields the Nadaraya–Watson estimator when the kernel is chosen as the identity function $\varphi(y) = y$. Furthermore, setting $\varphi(y) = \mathbb{1}_{(-\infty, x]}(y)$ produces a consistent estimator of the conditional distribution function $\mathbb{P}(Y \le x \mid X = \mathbf{x})$, i.e., the conditional empirical distribution function evaluated at $x$. This observation underscores the generality of the proposed approach, as it encompasses both regression and distributional estimation within a unified framework. We now examine several nontrivial examples for the case $m = 2$, which highlight the ability of conditional U-statistics to capture second-order conditional structure such as conditional variance and conditional covariance.
Example 1
(Conditional variance estimation). Consider the kernel function $h : \mathbb{R}^2 \to \mathbb{R}$ defined by
\[
h(y_1, y_2) = \frac{1}{2} (y_1 - y_2)^2.
\]
This kernel is symmetric and unbiased for the variance in the unconditional setting. In the conditional framework, for $\tilde{x} = (x_1, x_1)$ (i.e., both covariate arguments coincide), the corresponding conditional U-statistic target becomes
\[
r^{(2)}(h, x_1, x_1) = \mathbb{E}\Big[\frac{1}{2} (Y_1 - Y_2)^2 \,\Big|\, X_1 = x_1, X_2 = x_2\Big]\Big|_{x_2 = x_1}.
\]
Under the conditional i.i.d. assumption given the covariates, the right-hand side simplifies to the conditional variance of $Y$ given $X = x_1$. Indeed, using the identity $\mathbb{E}[(Y_1 - Y_2)^2 \mid X_1 = X_2 = x] = 2 \operatorname{Var}(Y \mid X = x)$, we obtain
\[
r^{(2)}(h, x_1, x_1) = \operatorname{Var}(Y \mid X = x_1).
\]
Thus, the estimator $\hat{r}_{n,\ell}^{(2)}(h, \tilde{x}; \bar{\Lambda}_{n,\ell}(\tilde{x}))$ with $\tilde{x} = (x_1, x_1)$ provides a consistent estimator of the conditional variance function. For this kernel, under the MAR assumption (2.4) and the positivity condition, the asymptotic variance appearing in the central limit theorem (Theorem 5) takes the form
\[
\rho^2 = \Big\{ \mathbb{E}\big[(Y - Y_2)^2 (Y - Y_3)^2 \mid X = X_2 = X_3 = x_1\big] - 4\, r^{(2)}(h, x_1, x_1)^2 \Big\}\, \frac{1}{p(x_1)} \int K_{\alpha,\beta}^2(u)\, du \Big/ f(x_1),
\]
where the factor $1/p(x_1)$ reflects the variance inflation due to missing responses (see (3.19)). This expression is analogous to the unconditional variance component $\zeta_1$ presented on page 182 of [177], with the crucial modification that the expectation is taken conditionally on the covariates and the kernel is replaced by its conditional counterpart.
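For completeness, the identity used in Example 1 follows in one line. Writing $\mathbb{E}_x[\cdot]$ as shorthand (introduced only for this display) for the conditional expectation given $X_1 = X_2 = x$, and using that $Y_1$ and $Y_2$ are conditionally independent with the same conditional law as $Y$ given $X = x$:

\[
\mathbb{E}_x\Big[\tfrac{1}{2}(Y_1 - Y_2)^2\Big]
= \tfrac{1}{2}\Big\{\mathbb{E}_x[Y_1^2] - 2\, \mathbb{E}_x[Y_1]\, \mathbb{E}_x[Y_2] + \mathbb{E}_x[Y_2^2]\Big\}
= \mathbb{E}[Y^2 \mid X = x] - \big(\mathbb{E}[Y \mid X = x]\big)^2
= \operatorname{Var}(Y \mid X = x).
\]

The cross term factorizes by conditional independence, and the two second moments coincide because the two responses share the same conditional distribution.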
Example 2
(Conditional covariance estimation). Let $Y_i = (Y_{i1}, Y_{i2}) \in \mathbb{R}^2$ be bivariate response vectors. Define the kernel $h : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ by
\[
h(y_1, y_2) = \frac{1}{2} (y_{11} - y_{21})(y_{12} - y_{22}) = \frac{1}{2} \big(y_{11} y_{12} + y_{21} y_{22} - y_{11} y_{22} - y_{12} y_{21}\big).
\]
This kernel is symmetric in its two arguments.
For $\tilde{x} = (x_1, x_2)$, the corresponding conditional U-statistic target is
\[
r^{(2)}(h, x_1, x_2) = \mathbb{E}\Big[\frac{1}{2} (Y_{11} - Y_{21})(Y_{12} - Y_{22}) \,\Big|\, X_1 = x_1, X_2 = x_2\Big].
\]
When the covariate arguments coincide, that is, when $x_1 = x_2 = x$, we obtain
\[
r^{(2)}(h, x, x) = \frac{1}{2}\, \mathbb{E}\big[(Y_{11} - Y_{21})(Y_{12} - Y_{22}) \mid X_1 = x, X_2 = x\big].
\]
Since $(Y_{11}, Y_{12})$ and $(Y_{21}, Y_{22})$ are conditionally i.i.d. given $X = x$, we have
\begin{align*}
\mathbb{E}(Y_{11} Y_{12} \mid X_1 = x, X_2 = x) &= \mathbb{E}(Y_{11} Y_{12} \mid X = x), \\
\mathbb{E}(Y_{21} Y_{22} \mid X_1 = x, X_2 = x) &= \mathbb{E}(Y_{21} Y_{22} \mid X = x), \\
\mathbb{E}(Y_{11} Y_{22} \mid X_1 = x, X_2 = x) &= \mathbb{E}(Y_{11} \mid X = x)\, \mathbb{E}(Y_{22} \mid X = x), \\
\mathbb{E}(Y_{12} Y_{21} \mid X_1 = x, X_2 = x) &= \mathbb{E}(Y_{12} \mid X = x)\, \mathbb{E}(Y_{21} \mid X = x).
\end{align*}
Therefore,
\[
r^{(2)}(h, x, x) = \frac{1}{2} \Big[ \mathbb{E}(Y_{11} Y_{12} \mid X = x) + \mathbb{E}(Y_{21} Y_{22} \mid X = x) - \mathbb{E}(Y_{11} \mid X = x)\, \mathbb{E}(Y_{22} \mid X = x) - \mathbb{E}(Y_{12} \mid X = x)\, \mathbb{E}(Y_{21} \mid X = x) \Big].
\]
Since $(Y_{11}, Y_{12})$ and $(Y_{21}, Y_{22})$ have the same conditional distribution given $X = x$, it follows that
\[
\mathbb{E}(Y_{11} Y_{12} \mid X = x) = \mathbb{E}(Y_{21} Y_{22} \mid X = x)
\]
and
\[
\mathbb{E}(Y_{11} \mid X = x) = \mathbb{E}(Y_{21} \mid X = x), \qquad \mathbb{E}(Y_{12} \mid X = x) = \mathbb{E}(Y_{22} \mid X = x).
\]
Hence
\[
r^{(2)}(h, x, x) = \mathbb{E}(Y_{11} Y_{12} \mid X = x) - \mathbb{E}(Y_{11} \mid X = x)\, \mathbb{E}(Y_{12} \mid X = x),
\]
which is precisely the conditional covariance
\[
\operatorname{Cov}(Y_1, Y_2 \mid X = x).
\]
Therefore, the proposed conditional U-statistic with kernel $h$ provides a consistent estimator of the conditional covariance function under the MAR assumption, provided the positivity condition holds. This example illustrates that the methodology naturally accommodates second-order conditional functionals beyond conditional means.
Remark 18.
Both examples illustrate that the conditional U-statistic framework naturally accommodates kernels that are not necessarily products of functions of individual observations, thereby enabling the estimation of complex conditional functionals such as conditional variance and conditional covariance. The extension to missing responses under MAR is seamlessly integrated through the inclusion of the product of missingness indicators $\prod_{j=1}^{m} \delta_{i_j}$ in the estimator definition, as shown in (2.5). The asymptotic properties derived in Section 3.2 and Section 3.3 guarantee the consistency and asymptotic normality of these estimators under the same regularity conditions, with the asymptotic variance inflated by the factor $1/p(x_i)$ to account for the reduced effective sample size due to missingness.
Remark 19.
In the unconditional setting (i.e., without covariates), the U-statistic $U_n(h) = \binom{n}{2}^{-1} \sum_{i < j} h(Y_i, Y_j)$ provides an unbiased estimator of $\mathbb{E}[h(Y_1, Y_2)]$. The conditional U-statistic proposed in this work generalizes this concept by allowing the target parameter to depend on covariates $X_1, \dots, X_m$, while also accommodating missing responses through the MAR mechanism. The price paid for this added flexibility is the introduction of a bandwidth parameter $\breve{b}$ and the resulting bias-variance trade-off, as well as the variance inflation factor $1/p(x_i)$ due to missingness. In the limiting case where the covariate space reduces to a single point (i.e., no conditioning), our estimator recovers the standard unconditional U-statistic, provided the bandwidth is chosen appropriately (e.g., $\breve{b} \to \infty$ in an appropriate sense).

8. Bandwidth Selection Under Missing Responses

In the presence of missing responses, the bandwidth-selection rule must be adapted so that the validation criterion only involves observable response tuples. In the present MAR framework, a natural strategy is therefore to employ a complete-case leave-tuple-out cross-validation criterion, obtained by restricting the empirical risk to those $m$-tuples for which all responses are observed and by removing from the training sample every tuple that shares at least one observation with the validation tuple. To simplify notation, for any $\mathbf{i} = (i_1, \dots, i_m) \in I(m, n)$, define $\Delta_{\mathbf{i}} := \prod_{r=1}^{m} \delta_{i_r}$, so that $\Delta_{\mathbf{i}} = 1$ if and only if all responses in the tuple $\tilde{Y}_{\mathbf{i}} = (Y_{i_1}, \dots, Y_{i_m})$ are observed, and $\Delta_{\mathbf{i}} = 0$ otherwise. For each fixed $\mathbf{i} = (i_1, \dots, i_m) \in I(m, n)$, define
\[
I_{\mathbf{i}}(m, n) := \big\{ (j_1, \dots, j_m) \in I(m, n) : \{j_1, \dots, j_m\} \cap \{i_1, \dots, i_m\} = \emptyset \big\}.
\]
Thus, $I_{\mathbf{i}}(m, n)$ consists of all $m$-tuples that do not use any observation appearing in the validation tuple $\mathbf{i}$. Throughout this section, the symbol $h$ denotes generically the smoothing parameter associated with the estimator under consideration; depending on the context, $h$ may stand for a bandwidth, a vector of bandwidths, or the Bernstein order parameter. For any fixed $\mathbf{i} = (i_1, \dots, i_m) \in I(m, n)$ and any $\ell \in \{1, 2, 3\}$, we define the leave-tuple-out complete-case estimator by
\[
\hat{r}_{n,\ell,\mathbf{i}}^{(m),(\mathrm{miss})}\big(\varphi, \tilde{x}; \bar{\Lambda}_{n,\ell}(\tilde{x})\big) := \frac{\sum_{\mathbf{j} \in I_{\mathbf{i}}(m,n)} \varphi(\tilde{Y}_{\mathbf{j}}) \prod_{r=1}^{m} \delta_{j_r}\, K_{\Lambda_{n,\ell}(x_1)}(X_{j_1}) \cdots K_{\Lambda_{n,\ell}(x_m)}(X_{j_m})}{\sum_{\mathbf{j} \in I_{\mathbf{i}}(m,n)} \prod_{r=1}^{m} \delta_{j_r}\, K_{\Lambda_{n,\ell}(x_1)}(X_{j_1}) \cdots K_{\Lambda_{n,\ell}(x_m)}(X_{j_m})}.
\]
Evaluating this estimator at $\tilde{x} = \tilde{X}_{\mathbf{i}}$ yields a predictor of $\varphi(\tilde{Y}_{\mathbf{i}})$ based exclusively on complete tuples that are disjoint from the validation tuple.
To estimate the quadratic prediction risk, let $W(\cdot)$ be a known nonnegative weight function and define, as before,
\[
\tilde{W}(\tilde{x}) := \prod_{r=1}^{m} W(x_r), \qquad \tilde{x} = (x_1, \dots, x_m) \in \mathcal{X}^m.
\]
The complete-case cross-validation criterion is then given, for $\ell \in \{1, 2, 3\}$, by
\[
CV_{\ell}^{(\mathrm{miss})}(\varphi, h) := \frac{(n - m)!}{n!} \sum_{\mathbf{i} \in I(m,n)} \Delta_{\mathbf{i}} \Big\{ \varphi(\tilde{Y}_{\mathbf{i}}) - \hat{r}_{n,\ell,\mathbf{i}}^{(m),(\mathrm{miss})}(\varphi, \tilde{X}_{\mathbf{i}}; h) \Big\}^2\, \tilde{W}(\tilde{X}_{\mathbf{i}}).
\]
The factor $\Delta_{\mathbf{i}}$ ensures that only fully observed tuples contribute to the criterion. This is essential, since $\varphi(\tilde{Y}_{\mathbf{i}})$ is not available whenever at least one component of the response tuple is missing. A natural data-driven choice of the smoothing parameter is therefore
\[
\hat{h}_{n,\ell}^{(\mathrm{miss})} \in \arg\min_{h \in \mathcal{H}_n} CV_{\ell}^{(\mathrm{miss})}(\varphi, h),
\]
where $\mathcal{H}_n$ denotes a prescribed set of admissible smoothing parameters. Since the number of complete tuples may vary substantially with the missingness rate, one may also consider the normalized version
\[
\widetilde{CV}_{\ell}^{(\mathrm{miss})}(\varphi, h) := \Big( \sum_{\mathbf{i} \in I(m,n)} \Delta_{\mathbf{i}} \Big)^{-1} \sum_{\mathbf{i} \in I(m,n)} \Delta_{\mathbf{i}} \Big\{ \varphi(\tilde{Y}_{\mathbf{i}}) - \hat{r}_{n,\ell,\mathbf{i}}^{(m),(\mathrm{miss})}(\varphi, \tilde{X}_{\mathbf{i}}; h) \Big\}^2\, \tilde{W}(\tilde{X}_{\mathbf{i}}),
\]
whenever $\sum_{\mathbf{i} \in I(m,n)} \Delta_{\mathbf{i}} > 0$. Since the normalizing factor does not depend on $h$, both criteria lead to the same minimizer. In some applications, a local version of the criterion may be preferable. To this end, define
\[
CV_{\ell}^{(\mathrm{miss})}\big(\varphi, \hat{h}_{n,\ell}^{(\mathrm{miss})}\big) := \frac{(n - m)!}{n!} \sum_{\mathbf{i} \in I(m,n)} \Delta_{\mathbf{i}} \Big\{ \varphi(\tilde{Y}_{\mathbf{i}}) - \hat{r}_{n,\ell,\mathbf{i}}^{(m),(\mathrm{miss})}\big(\varphi, \tilde{X}_{\mathbf{i}}; \hat{h}_{n,\ell}^{(\mathrm{miss})}\big) \Big\}^2\, \hat{W}(\tilde{X}_{\mathbf{i}}, \tilde{x}),
\]
where
\[
\hat{W}(\tilde{s}, \tilde{x}) := \prod_{r=1}^{m} \hat{W}(s_r, x_r).
\]
In practice, one often considers either the global choice
\[
\tilde{W}(\tilde{X}_{\mathbf{i}}) = 1, \qquad \mathbf{i} \in I(m, n),
\]
or the local weights
\[
\hat{W}(s, x) = \begin{cases} 1, & \text{if } \|s - x\| \le h, \\ 0, & \text{otherwise}. \end{cases}
\]
Accordingly,
\[
\hat{W}(\tilde{X}_{\mathbf{i}}, \tilde{x}) = \prod_{r=1}^{m} \mathbb{1}\{\|X_{i_r} - x_r\| \le h\}.
\]
Remark 20.
The criterion (8.2) is the natural complete-case analog of leave-one-out cross-validation for conditional U-statistics under MAR. In the present higher-order setting, removing only the tuple $\mathbf{i}$ itself is not sufficient, because tuples sharing one or more observations with $\mathbf{i}$ would still introduce leakage between training and validation. The definition of $I_{\mathbf{i}}(m, n)$ avoids this issue by excluding every tuple that intersects the validation tuple.
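The complete-case leave-tuple-out criterion can be sketched for $m = 2$ with a scalar covariate in $(0,1)$ and the global weight $W \equiv 1$. The sketch is illustrative and rests on assumptions not fixed by the text: a Chen-type beta kernel with shapes $(x/h + 1, (1-x)/h + 1)$, validation tuples with a vanishing training denominator are skipped, and the names `cv_complete_case` and `select_bandwidth` are hypothetical.

```python
import math
from itertools import permutations

def beta_kernel(x, h, t):
    """Beta density with shapes (x/h + 1, (1 - x)/h + 1) at t in (0, 1) (assumed form)."""
    a1, a2 = x / h + 1.0, (1.0 - x) / h + 1.0
    log_norm = math.lgamma(a1 + a2) - math.lgamma(a1) - math.lgamma(a2)
    return math.exp(log_norm + (a1 - 1.0) * math.log(t) + (a2 - 1.0) * math.log(1.0 - t))

def cv_complete_case(phi, X, Y, delta, h):
    """Leave-tuple-out criterion for m = 2, restricted to fully observed tuples."""
    n = len(X)
    obs = [i for i in range(n) if delta[i] == 1]
    total = 0.0
    for i1, i2 in permutations(obs, 2):            # validation tuples with Delta_i = 1
        num = den = 0.0
        for j1, j2 in permutations(obs, 2):        # complete training tuples
            if j1 in (i1, i2) or j2 in (i1, i2):   # must be disjoint from (i1, i2)
                continue
            k = beta_kernel(X[i1], h, X[j1]) * beta_kernel(X[i2], h, X[j2])
            num += phi(Y[j1], Y[j2]) * k
            den += k
        if den > 0.0:                              # skip tuples with no usable neighbors
            total += (phi(Y[i1], Y[i2]) - num / den) ** 2
    return total / (n * (n - 1))                   # factor (n - m)!/n! with m = 2

def select_bandwidth(phi, X, Y, delta, grid):
    """Grid minimizer of the criterion over candidate smoothing parameters."""
    return min(grid, key=lambda h: cv_complete_case(phi, X, Y, delta, h))
```

The nested enumeration makes the $O(n_{\mathrm{obs}}^{2m})$ cost per candidate bandwidth explicit, so this direct form is only practical for small samples.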

9. Simulation Study

This section reports a finite-sample investigation of kernel-based estimators of the conditional Kendall coefficient when the response pair is only partially observed. The numerical design is chosen to satisfy two requirements simultaneously. First, it remains exactly faithful to the implementation used in the simulation code, so that every reported quantity has a direct computational meaning. Second, it is sufficiently structured to probe the main theoretical issues raised in Sections 3, 4 and 5.1–5.4, namely: local estimation of a conditional concordance functional, smoothing on a compact support, boundary sensitivity, and the effect of incomplete responses under both MCAR and covariate-dependent MAR mechanisms. The study also compares complete-case local smoothing with inverse-probability reweighting, but this comparison must be interpreted carefully because, in the present design, the missingness mechanism depends only on the conditioning covariate.

9.1. Target Functional and Interpretation

For $u \in (0,1)$, the target object is the conditional Kendall coefficient
$$\tau(u) \equiv \tau_{1,2 \mid X = u} = \mathbb{E}\Big[ \operatorname{sign}\big\{ \big(Y_1^{(1)} - Y_1^{(2)}\big)\big(Y_2^{(1)} - Y_2^{(2)}\big) \big\} \,\Big|\, X^{(1)} = X^{(2)} = u \Big],$$
where $(X^{(1)}, Y_1^{(1)}, Y_2^{(1)})$ and $(X^{(2)}, Y_1^{(2)}, Y_2^{(2)})$ are independent copies of $(X, Y_1, Y_2)$. Since the conditioning event $\{X^{(1)} = X^{(2)} = u\}$ has probability zero under a continuous design, the above display should be read in the usual regular-conditional sense: if $(Y_1^{[u]}, Y_2^{[u]})$ and $(\tilde{Y}_1^{[u]}, \tilde{Y}_2^{[u]})$ are two independent draws from the conditional distribution of $(Y_1, Y_2)$ given $X = u$, then
$$\tau(u) = \mathbb{E}\Big[ \operatorname{sign}\big\{ \big(Y_1^{[u]} - \tilde{Y}_1^{[u]}\big)\big(Y_2^{[u]} - \tilde{Y}_2^{[u]}\big) \big\} \Big].$$
Under continuity of the conditional law of $(Y_1, Y_2) \mid X = u$, ties occur with probability zero, and one has the equivalent identities
$$\tau(u) = 4\, P\big( Y_1^{[u]} < \tilde{Y}_1^{[u]},\; Y_2^{[u]} < \tilde{Y}_2^{[u]} \big) - 1,$$
and
$$\tau(u) = 1 - 4\, P\big( Y_1^{[u]} < \tilde{Y}_1^{[u]},\; Y_2^{[u]} > \tilde{Y}_2^{[u]} \big).$$
These equivalent population representations justify the three empirical Kendall-type estimators used below.
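For completeness, the equivalence of the three representations can be checked by a short symmetry argument. The following is only a sketch under the no-ties assumption stated above; the shorthand $D_k = Y_k^{[u]} - \tilde{Y}_k^{[u]}$, $k = 1, 2$, is introduced here for convenience and is not notation from the paper.

```latex
\begin{align*}
\tau(u) &= P(D_1 D_2 > 0) - P(D_1 D_2 < 0)
         = 2\,P(D_1 D_2 > 0) - 1
         = 1 - 2\,P(D_1 D_2 < 0), \\
P(D_1 D_2 > 0) &= P(D_1 < 0,\, D_2 < 0) + P(D_1 > 0,\, D_2 > 0)
         = 2\,P(D_1 < 0,\, D_2 < 0), \\
P(D_1 D_2 < 0) &= P(D_1 < 0,\, D_2 > 0) + P(D_1 > 0,\, D_2 < 0)
         = 2\,P(D_1 < 0,\, D_2 > 0).
\end{align*}
```

The last equality in each of the second and third lines uses the fact that exchanging the two independent draws maps $(D_1, D_2)$ to $(-D_1, -D_2)$ without changing the joint law. Substituting into the first line gives the concordance form $4P(D_1 < 0, D_2 < 0) - 1$ and the discordance form $1 - 4P(D_1 < 0, D_2 > 0)$.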

9.2. Data-Generating Mechanisms

For each Monte Carlo replication, and for each sample size
$$n \in \{500, 1000, 2000\},$$
we generate i.i.d. observations
$$(X_i, Y_{1i}, Y_{2i}, \delta_i), \qquad i = 1, \ldots, n,$$
where $X_i \in [0,1]$ is a scalar covariate, $(Y_{1i}, Y_{2i})$ is a continuous response pair, and $\delta_i \in \{0,1\}$ indicates whether the response pair is observed. Conditionally on $X_i = u$, the response pair is produced through the Gaussian location–scale construction
$$Y_{1i} = m_1(X_i) + s_1(X_i)\, Z_{1i}, \qquad Y_{2i} = m_2(X_i) + s_2(X_i) \Big\{ \rho(X_i)\, Z_{1i} + \sqrt{1 - \rho(X_i)^2}\; Z_{2i} \Big\},$$
where $Z_{1i}, Z_{2i} \overset{\text{i.i.d.}}{\sim} N(0,1)$ and
$$\rho(u) = \sin\Big( \frac{\pi\, \tau(u)}{2} \Big).$$
Consequently, conditionally on $X = u$, the pair $(Y_1, Y_2)$ is Gaussian with correlation $\rho(u)$, and therefore
$$\tau(u) = \frac{2}{\pi} \arcsin\{\rho(u)\}.$$
This construction is particularly convenient here because it allows the conditional Kendall function to be prescribed exactly while permitting the conditional margins to vary through $m_1, m_2, s_1, s_2$. In other words, the dependence structure is controlled through the conditional copula parameter, whereas location and scale heterogeneity are introduced without changing the target value of $\tau(u)$. Two scenarios are considered.
  • Scenario 1: linear conditional association under uniform design.
The covariate is uniformly distributed on the unit interval:
$$X \sim \mathrm{Unif}(0,1).$$
The conditional Kendall function is linear,
$$\tau(u) = 2u - 1,$$
so that the conditional association ranges from negative dependence near $u = 0$ to positive dependence near $u = 1$, crossing zero at $u = 1/2$. The conditional location and scale functions are
$$m_1(u) = u, \qquad m_2(u) = u, \qquad s_1(u) = 1, \qquad s_2(u) = 1.$$
This first scenario is intentionally regular: the design density is flat, the target is affine, and there is no intrinsic boundary concentration in the covariate distribution. It therefore serves as the baseline regime for evaluating the procedures in a relatively favorable setting.
  • Scenario 2: nonlinear conditional association under asymmetric Beta design.
The covariate follows a Beta distribution concentrated near the left boundary,
$$X \sim \mathrm{Beta}(2,5).$$
The conditional Kendall function is
$$\tau(u) = \sin\{ \pi (u - 1/2) \},$$
which is smooth and nonlinear on $[0,1]$, with $\tau(1/2) = 0$. Its nonlinearity is essential for the simulation, because it makes bias behavior more sensitive to the local smoothing rule than in Scenario 1. The conditional location and scale functions are
$$m_1(u) = 2u, \qquad m_2(u) = 1 - u, \qquad s_1(u) = 0.5 + 0.5u, \qquad s_2(u) = 1.$$
This second scenario is deliberately more demanding: the design density is strongly inhomogeneous, with substantial mass near the boundary, and the target is no longer affine. It is precisely in such settings that support-adapted smoothers should be expected to reveal any practical advantage.
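The two data-generating mechanisms can be sketched in a few lines. The block below is a minimal illustration written from the formulas of this subsection, not the authors' simulation script; the function names `tau_scenario` and `draw_observation` and the seed are hypothetical choices made for this example.

```python
import math
import random

def tau_scenario(scenario, u):
    """Prescribed conditional Kendall function tau(u) of each scenario."""
    if scenario == 1:
        return 2.0 * u - 1.0
    return math.sin(math.pi * (u - 0.5))

def draw_observation(scenario, rng):
    """Draw one (X, Y1, Y2) from the Gaussian location-scale construction."""
    if scenario == 1:
        x = rng.random()                     # X ~ Unif(0, 1)
        m1, m2, s1, s2 = x, x, 1.0, 1.0
    else:
        x = rng.betavariate(2.0, 5.0)        # X ~ Beta(2, 5)
        m1, m2, s1, s2 = 2.0 * x, 1.0 - x, 0.5 + 0.5 * x, 1.0
    # Gaussian correlation matching the prescribed Kendall coefficient.
    rho = math.sin(math.pi * tau_scenario(scenario, x) / 2.0)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    y1 = m1 + s1 * z1
    y2 = m2 + s2 * (rho * z1 + math.sqrt(max(0.0, 1.0 - rho * rho)) * z2)
    return x, y1, y2

rng = random.Random(2026)                    # arbitrary seed for this sketch
sample = [draw_observation(2, rng) for _ in range(1000)]
```

Because the location and scale functions enter only through strictly increasing transformations of each margin, they leave the conditional Kendall coefficient unchanged, which is why $\tau(u)$ can be prescribed independently of $m_1, m_2, s_1, s_2$.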

9.3. Missing-Response Mechanism

The variable $\delta_i$ specifies whether the pair $(Y_{1i}, Y_{2i})$ is observed. Only units with $\delta_i = 1$ contribute to the estimator. The simulations are run at target missing proportions
$$\pi \in \{0, 0.10, 0.30, 0.50\}, \qquad q = 1 - \pi,$$
where $q$ denotes the target observation rate.
Two missingness labels are implemented.
  • MCAR.
Under missing completely at random,
$$P(\delta_i = 1 \mid X_i) = q.$$
Thus, the observation probability is constant, and the observed sample is, conditionally on its size, a simple thinning of the original design.
  • Covariate-dependent MAR.
Under the MAR mechanism used in the code,
$$P(\delta_i = 1 \mid X_i = x) = p(x) = \operatorname{logit}^{-1}\{ a + b\,(x - 1/2) \}, \qquad b = 3.$$
The intercept $a$ is not fixed once and for all. Instead, for each Monte Carlo replication, it is calibrated numerically from the realized covariates $X_1, \ldots, X_n$ so that
$$\frac{1}{n} \sum_{i=1}^{n} \operatorname{logit}^{-1}\{ a + b\,(X_i - 1/2) \} = q.$$
This calibration is a subtle but important design choice. It ensures that the sample average of the selection probabilities equals the target observation proportion before Bernoulli thinning is applied. As a consequence, the realized observed rate fluctuates around its nominal target essentially because of the Bernoulli draws, not because of uncontrolled replication-to-replication drift in the average propensity score. Since the logistic link maps $\mathbb{R}$ into $(0,1)$, the positivity condition
$$0 < p(x) < 1, \qquad x \in [0,1],$$
holds automatically. A crucial conceptual point is that the missingness mechanism depends on X only. Since the estimand itself is conditional on X = u , complete-case smoothing remains targeted to the same conditional functional τ ( u ) . Therefore, in the present design, the comparison between complete-case and IPW estimators should not be framed as a generic correction of selection bias in the target parameter. Rather, it should be read as a comparison of local design reweighting strategies and of the corresponding finite-sample bias–variance trade-off.
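The intercept calibration can be illustrated as follows. The text only states that $a$ is "calibrated numerically"; the bisection used here is an assumption made for this sketch (any one-dimensional root finder would serve), and the names `expit`, `mean_propensity` and `calibrate_intercept`, as well as the bracket $[-30, 30]$, are hypothetical.

```python
import math
import random

def expit(t):
    """Inverse logit, arranged to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def mean_propensity(a, xs, b=3.0):
    """Sample average of logit^{-1}(a + b (x - 1/2))."""
    return sum(expit(a + b * (x - 0.5)) for x in xs) / len(xs)

def calibrate_intercept(xs, q, b=3.0, lo=-30.0, hi=30.0, tol=1e-10):
    """Bisection on a; the mean propensity is strictly increasing in a."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_propensity(mid, xs, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = random.Random(1)                        # arbitrary seed for this sketch
xs = [rng.betavariate(2.0, 5.0) for _ in range(500)]
a = calibrate_intercept(xs, q=0.7)            # target missing proportion 0.30
# Bernoulli thinning with the calibrated covariate-dependent propensities.
delta = [1 if rng.random() < expit(a + 3.0 * (x - 0.5)) else 0 for x in xs]
```

The calibration applies to target rates $q$ strictly between 0 and 1; the case $\pi = 0$ (no missingness) needs no intercept and would be handled separately.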

9.4. Observed Sample, Local Weights, and Empirical Kendall Representations

Let $O_n = \{ i \in \{1, \ldots, n\} : \delta_i = 1 \}$ and $n_{\mathrm{cc}} = |O_n|$ denote the observed set and the number of complete cases. Estimation is performed on the regular grid
$$\mathcal{U} = \{u_1, \ldots, u_{97}\} = \texttt{seq}(0.02,\, 0.98,\, \texttt{length.out} = 97).$$
The use of the truncated grid $[0.02, 0.98]$ instead of the full interval $[0,1]$ is deliberate. It avoids reporting performance summaries at the extreme endpoints, where ordinary symmetric smoothers may suffer their most severe support truncation and where numerical comparisons may be dominated by endpoint artifacts rather than by the intrinsic behavior of the procedure.
For each $u \in \mathcal{U}$, the code computes normalized local weights over the observed sample.
  • Complete-case weights.
For a generic local score $L_{K,u}(X_i)$ associated with kernel or Bernstein smoothing rule $K$,
$$w_{i,K}^{\mathrm{cc}}(u) = \frac{L_{K,u}(X_i)}{\sum_{j \in O_n} L_{K,u}(X_j)}, \qquad i \in O_n.$$
  • IPW weights.
If $p_i = P(\delta_i = 1 \mid X_i)$ denotes the observation probability used to generate the missingness indicator, the IPW version is
$$w_{i,K}^{\mathrm{ipw}}(u) = \frac{L_{K,u}(X_i) / p_i}{\sum_{j \in O_n} L_{K,u}(X_j) / p_j}, \qquad i \in O_n.$$
Since the simulation is fully controlled, the true probabilities $p_i$ are used directly. Hence the numerical comparison isolates the second-stage effect of inverse-probability reweighting and does not involve any first-stage propensity estimation error. This has an immediate and nontrivial consequence under MCAR. If $p_i \equiv q$ for all $i$, then
$$w_{i,K}^{\mathrm{ipw}}(u) = \frac{L_{K,u}(X_i) / q}{\sum_{j \in O_n} L_{K,u}(X_j) / q} = \frac{L_{K,u}(X_i)}{\sum_{j \in O_n} L_{K,u}(X_j)} = w_{i,K}^{\mathrm{cc}}(u).$$
Therefore, in the present implementation, complete-case and IPW estimators are exactly identical under MCAR. This point is easily overlooked, but it is essential for the correct interpretation of the Gaussian benchmark comparisons: any difference between CC and IPW can only arise under covariate-dependent MAR. Using either $w_i^{\mathrm{cc}}(u)$ or $w_i^{\mathrm{ipw}}(u)$, the code computes the following three weighted Kendall-type statistics:
$$\hat{\tau}_n^{(1)}(u) = 4 \sum_{i \in O_n} \sum_{j \in O_n} w_i(u)\, w_j(u)\, \mathbb{1}\{ Y_{1i} < Y_{1j},\; Y_{2i} < Y_{2j} \} - 1,$$
$$\hat{\tau}_n^{(2)}(u) = \sum_{i \in O_n} \sum_{j \in O_n} w_i(u)\, w_j(u)\, \operatorname{sign}\{ (Y_{1i} - Y_{1j})(Y_{2i} - Y_{2j}) \},$$
and
$$\hat{\tau}_n^{(3)}(u) = 1 - 4 \sum_{i \in O_n} \sum_{j \in O_n} w_i(u)\, w_j(u)\, \mathbb{1}\{ Y_{1i} < Y_{1j},\; Y_{2i} > Y_{2j} \}.$$
At the population level, these three forms are equivalent under continuity. In finite samples, however, they need not coincide numerically. The reason is twofold. First, they are different empirical algebraizations of the same concordance functional. Second, the code uses fully normalized local weights and sums over all ordered pairs ( i , j ) , including diagonal terms. Thus the implemented quantities are weighted V-statistic analogs rather than leave-two-out U-statistics. The diagonal contribution is negligible asymptotically under standard smoothing conditions, but it can matter nonnegligibly in finite samples when the local weight distribution is uneven, especially near boundaries or under severe missingness. This is precisely why retaining the three empirical representations in the simulation is methodologically preferable to collapsing them a priori into a single nominal estimator.
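The effect of the diagonal terms is easy to see on a toy example. The sketch below implements the normalized weights and the three weighted V-statistic forms directly from the displayed formulas; it is an illustration rather than the paper's code, and the names `normalized_weights` and `kendall_three_forms` are hypothetical.

```python
def normalized_weights(scores, probs=None):
    """Complete-case weights (probs=None) or IPW weights on the observed set."""
    if probs is None:
        raw = list(scores)
    else:
        raw = [s / p for s, p in zip(scores, probs)]
    total = sum(raw)
    return [r / total for r in raw]

def sign(t):
    return (t > 0) - (t < 0)

def kendall_three_forms(w, y1, y2):
    """Weighted V-statistics over all ordered pairs, diagonal included."""
    n = len(w)
    conc = disc = sgn = 0.0
    for i in range(n):
        for j in range(n):
            ww = w[i] * w[j]
            if y1[i] < y1[j] and y2[i] < y2[j]:
                conc += ww                   # concordant ordered pair
            if y1[i] < y1[j] and y2[i] > y2[j]:
                disc += ww                   # discordant ordered pair
            sgn += ww * sign((y1[i] - y1[j]) * (y2[i] - y2[j]))
    return 4.0 * conc - 1.0, sgn, 1.0 - 4.0 * disc

scores = [1.0, 1.0, 1.0, 1.0]
w_cc = normalized_weights(scores)
w_ipw = normalized_weights(scores, probs=[0.7] * 4)   # MCAR: constant p_i = q
t1, t2, t3 = kendall_three_forms(w_cc, [1, 2, 3, 4], [10, 20, 30, 40])
print(t1, t2, t3)
```

For four perfectly concordant points with equal weights, the three forms return 0.5, 0.75 and 1.0 respectively, whereas a U-statistic restricted to distinct pairs would return 1 in each case. The gap comes entirely from the diagonal mass $\sum_i w_i(u)^2 = 1/4$, which shrinks as the local weights spread over more observations. Under the constant propensity, `w_ipw` agrees with `w_cc` up to floating-point rounding.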

9.5. Smoothers and Tuning Rules

The implementation compares the five smoothing devices
gaussian, epanechnikov, tricube, beta, bernstein.
No Dirichlet kernel is included in the current code, even though Dirichlet-type constructions appear in the broader theoretical discussion. For the Gaussian, Epanechnikov, tricube, and beta smoothers, the tuning parameter is selected through a rule of the form
$$h_K = \alpha_K\, \hat{\sigma}_X\, n_{\mathrm{cc}}^{-1/5},$$
with constants
$$\alpha_{\mathrm{gaussian}} = 1.06, \qquad \alpha_{\mathrm{epanechnikov}} = 2.34, \qquad \alpha_{\mathrm{tricube}} = 2.50, \qquad \alpha_{\beta} = 1.50,$$
where $\hat{\sigma}_X$ is the empirical standard deviation of the observed covariates $\{X_i : i \in O_n\}$. For the Bernstein smoother, the polynomial degree is chosen as
ϑ = n cc log n cc .
These tuning choices deserve explicit comment. First, all smoothing parameters depend on the observed sample rather than on the nominal size $n$. Missingness therefore affects the estimator not only by reducing the number of usable observations but also by altering the amount of smoothing through $n_{\mathrm{cc}}$ and, under MAR, through the observed empirical spread $\hat{\sigma}_X$. Second, the comparison between kernels is not a comparison of shapes alone, because each family comes with its own scaling constant. Strictly speaking, the study compares fully implemented smoothing procedures, not isolated kernels under a common bandwidth benchmark. This is exactly the relevant comparison from an applied numerical standpoint.
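As an illustration of the bandwidth rule, the sketch below computes $h_K$ from the observed covariates and evaluates unnormalized local scores for the three symmetric kernels. Several points are assumptions of this sketch and not statements about the authors' code: the sample standard deviation with denominator $n_{\mathrm{cc}} - 1$, the scaling $t = (x - u)/h$, and the omission of normalizing constants (which cancel once the weights are normalized). The beta and Bernstein scores are not sketched because their exact parameterization is not restated in this section.

```python
import math
import statistics

ALPHA = {"gaussian": 1.06, "epanechnikov": 2.34, "tricube": 2.50, "beta": 1.50}

def bandwidth(kernel, x_observed):
    """h_K = alpha_K * sd(observed X) * n_cc^(-1/5)."""
    n_cc = len(x_observed)
    sigma = statistics.stdev(x_observed)     # n_cc - 1 denominator (assumption)
    return ALPHA[kernel] * sigma * n_cc ** (-0.2)

def local_score(kernel, x, u, h):
    """Unnormalized symmetric-kernel score at scaled distance t = (x - u) / h."""
    t = (x - u) / h
    if kernel == "gaussian":
        return math.exp(-0.5 * t * t)
    if kernel == "epanechnikov":
        return max(0.0, 1.0 - t * t)
    if kernel == "tricube":
        return max(0.0, 1.0 - abs(t) ** 3) ** 3
    raise ValueError("only the symmetric kernels are sketched here")

x_obs = [0.1, 0.2, 0.4, 0.5, 0.9]            # toy observed covariates
h = bandwidth("gaussian", x_obs)
scores = [local_score("gaussian", x, 0.5, h) for x in x_obs]
```

Since `bandwidth` takes only the observed covariates as input, a higher missingness rate changes $h_K$ through both $n_{\mathrm{cc}}$ and $\hat{\sigma}_X$, which is the first point made in the paragraph above.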

9.6. Monte Carlo Protocol and Risk Criteria

For each fixed configuration
$$(\text{scenario},\; n,\; \pi,\; \text{missingness label},\; \text{correction},\; \text{kernel},\; \text{estimator}),$$
the code performs $B = \texttt{NSIM}$ independent Monte Carlo replications. In the testing version of the script, $\texttt{NSIM} = 100$, whereas the intended production run uses $\texttt{NSIM} = 1000$. Let $\hat{\tau}_b(u)$ denote the estimator obtained at replication $b$ and evaluation point $u \in \mathcal{U}$. The pointwise summaries are
$$\mathrm{Bias}(u) = \frac{1}{B} \sum_{b=1}^{B} \hat{\tau}_b(u) - \tau(u),$$
$$\mathrm{SD}(u) = \bigg\{ \frac{1}{B-1} \sum_{b=1}^{B} \big( \hat{\tau}_b(u) - \bar{\tau}(u) \big)^2 \bigg\}^{1/2}, \qquad \bar{\tau}(u) = \frac{1}{B} \sum_{b=1}^{B} \hat{\tau}_b(u),$$
$$\mathrm{MSE}(u) = \frac{1}{B} \sum_{b=1}^{B} \big( \hat{\tau}_b(u) - \tau(u) \big)^2,$$
and
$$\mathrm{MAE}(u) = \frac{1}{B} \sum_{b=1}^{B} \big| \hat{\tau}_b(u) - \tau(u) \big|.$$
To summarize overall performance on the grid, the code computes integrated criteria by trapezoidal quadrature:
$$\mathrm{IBias} = \int_{\mathcal{U}} |\mathrm{Bias}(u)| \, du, \qquad \mathrm{ISd} = \int_{\mathcal{U}} \mathrm{SD}(u) \, du,$$
$$\mathrm{IMSE} = \int_{\mathcal{U}} \mathrm{MSE}(u) \, du, \qquad \mathrm{IAE} = \int_{\mathcal{U}} \mathrm{MAE}(u) \, du.$$
In the implementation, these quantities are numerical integrals over $[0.02, 0.98]$, not over the full interval $[0,1]$. This should be kept in mind when interpreting the results: the integrated risk summarizes interior and near-boundary performance, but it does not include the extreme endpoints.
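The pointwise summaries and the trapezoidal integration can be sketched as follows. This is a minimal illustration of the formulas above on toy numbers, not the authors' script; the function names `pointwise_summaries` and `trapezoid` are hypothetical.

```python
import math

def pointwise_summaries(estimates, truth):
    """Bias, SD, MSE and MAE from B replications at one grid point u."""
    B = len(estimates)
    mean = sum(estimates) / B
    bias = mean - truth
    sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (B - 1))
    mse = sum((e - truth) ** 2 for e in estimates) / B
    mae = sum(abs(e - truth) for e in estimates) / B
    return bias, sd, mse, mae

def trapezoid(values, grid):
    """Trapezoidal quadrature of values tabulated on an increasing grid."""
    total = 0.0
    for k in range(1, len(grid)):
        total += 0.5 * (values[k] + values[k - 1]) * (grid[k] - grid[k - 1])
    return total

grid = [0.02 + 0.01 * k for k in range(97)]  # 97 points from 0.02 to 0.98
bias, sd, mse, mae = pointwise_summaries([0.1, 0.3, 0.2], truth=0.2)
length = trapezoid([1.0] * 97, grid)         # integral of 1 over the grid range
```

Integrating the constant function 1 returns the length of the evaluation range, 0.96, which makes explicit that the integrated criteria are not normalized to the unit interval.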

9.7. Reported Numerical Summaries

The numerical output is summarized selectively. For readability, we do not reproduce the full collection of raw simulation tables generated by the code. Instead, we report the quantities most directly connected with the theoretical questions studied in the paper. First, we present integrated criteria (IBias, ISd, IMSE, IAE), computed over the evaluation grid by trapezoidal quadrature. These criteria summarize the global finite-sample behavior of each procedure on the interval used in the simulations. Second, we report conditional IMSE rankings within fixed design cells (scenario, missingness label, correction, $n$, $\pi$). This conditional ranking is preferable to a global ranking, since different scenarios and missingness levels correspond to intrinsically different estimation difficulties. Third, we include representative heatmaps and boxplots of IMSE in order to visualize the effect of the smoothing rule, the sample size, the missingness rate, and the missingness mechanism. These figures are intended to complement the integrated criteria rather than to replace them. Finally, we include observed-rate diagnostics to verify that the implemented missingness mechanisms produce the intended levels of incomplete response observation. The complete-case versus IPW comparison is reported only where it is informative, namely under covariate-dependent MAR; under MCAR, normalized complete-case and normalized IPW weights coincide.

9.8. Interpretation of the Comparisons

The numerical comparisons should be read in light of several structural facts. First, because missingness depends at most on X, both complete-case and IPW procedures target the same conditional functional τ ( u ) . Under MAR, IPW should therefore be understood as a device that attempts to reconstruct the local covariate design that would have been available without missingness. Its role is not to change the target parameter, but to modify the local weighting scheme used to estimate it. Second, under MCAR, normalized IPW and normalized complete-case weights coincide exactly. Accordingly, any CC–IPW contrast is meaningful only under covariate-dependent MAR. This exact equivalence is a consequence of the normalization, not an asymptotic approximation. Third, boundary behavior is central. The Gaussian, Epanechnikov, and tricube procedures are used on a compact support without explicit boundary renormalization, whereas the beta and Bernstein constructions are intrinsically support-adapted. Scenario 2 is therefore particularly informative, because it combines nonlinear dependence with a covariate distribution concentrated near the left boundary. Fourth, the three empirical Kendall representations are asymptotically linked, but their finite-sample ranking may differ because the simulation compares distinct weighted V-statistic implementations. Such differences should be interpreted as part of the actual numerical behavior of the estimators, not as irrelevant computational noise. Fifth, because the smoothing rules depend on n cc and on the observed empirical variability of X, missingness perturbs performance through several channels simultaneously: loss of effective sample size, alteration of the observed design density, and modification of the smoothing scale. The resulting impact is therefore richer than a simple “smaller n” effect.

9.9. Results

The numerical findings are summarized in Table A1 and in Figures 1–8. Since the ranking is performed within fixed design cells, all substantive comparisons should be read conditionally on
$$(\text{scenario},\; \text{missingness label},\; \text{correction},\; n,\; \pi).$$
This conditional reading is essential in order not to confound procedures operating under different levels of intrinsic difficulty. From a methodological standpoint, the most informative contrasts are the following.
Effect of the smoothing rule. The heatmaps and boxplots quantify how the choice of smoother interacts with support geometry and covariate density. In Scenario 1, where the design is uniform and the target is linear, conventional symmetric kernels may remain competitive. In Scenario 2, where the design is concentrated near the boundary and the target is nonlinear, support-adapted procedures should be given particular attention. Nevertheless, any final judgment must be drawn from the IMSE summaries themselves, since the comparison concerns fully implemented procedures, not only asymptotic kernel classes.
Effect of missingness. The observed-rate diagnostics first verify that the missingness module is correctly calibrated. Once this is established, the degradation in performance as π increases reflects a combination of reduced effective sample size, design distortion under MAR, and changes in the data-driven smoothing scale.
Complete-case versus IPW under MAR. Because the target is conditional on X, the CC–IPW comparison is fundamentally local. In regions where the MAR mechanism significantly distorts the observed covariate distribution, IPW may improve centering by compensating for that local distortion, but at the cost of increased dispersion due to heterogeneous weights. In more homogeneous regions, complete-case smoothing may remain competitive or even preferable. The comparison is therefore best understood as a local bias–variance trade-off rather than as a universal superiority statement.
Effect of sample size. The progression $n = 500, 1000, 2000$ allows one to assess how rapidly the integrated criteria decrease as information accumulates. Since the smoothing parameters are determined from the observed sample, the effective asymptotic regime is governed jointly by $n$, the missingness level, and the realized structure of the observed covariates.

9.10. Practical Implications

The simulation supports the following methodological reading. First, when the covariate support is compact and boundary behavior is substantively important, support-adapted smoothers such as the beta and Bernstein constructions deserve explicit consideration rather than being treated as merely optional refinements. Second, when missingness depends only on the conditioning covariate, complete-case analysis is not automatically inconsistent for the conditional Kendall target. In that case, the empirical relevance of IPW is primarily a question of local reweighting efficiency and variance control. Third, because the ranking is carried out within fixed design cells, practical recommendations should be made conditionally on the sampling regime and missingness level, rather than through a single unconditional hierarchy. Fourth, the current IPW analysis is intentionally idealized: the true observation probabilities are used, so the study isolates the second-stage impact of reweighting. In applications, a fully data-adaptive implementation would involve an additional first-stage estimation problem, and the present results should therefore be read as a benchmark rather than as a complete end-to-end performance assessment.
Overall, the simulations support the qualitative conclusions of the asymptotic theory. Missingness mainly affects performance through the loss of effective information and through changes in the observed local design, whereas the relative behavior of the smoothing rules is strongly influenced by support geometry and boundary effects. The numerical study should therefore be interpreted as a finite-sample illustration of the theoretical mechanisms developed in the paper, not as an exhaustive benchmark over all possible bandwidth choices, kernels, missingness mechanisms or conditional dependence models.

10. Concluding Remarks

This paper develops a comprehensive asymptotic framework for nonparametric conditional U-statistics smoothed by asymmetric kernels in the presence of Missing-at-Random (MAR) responses. The analysis brings together three layers of difficulty that, to the best of our knowledge, had not previously been treated within a unified theory: the intrinsic nonlinearity of conditional U-functionals, the boundary-sensitive nature of smoothing on constrained supports, and the additional stochastic distortion induced by incomplete responses. In this sense, the present work substantially extends the classical conditional U-statistics literature initiated by [60] and subsequently refined in [63,64,65,66,67], by embedding it into a support-adapted and missing-data-aware nonparametric framework.
Our first contribution is the introduction and analysis of Dirichlet-kernel conditional U-statistics on the d m -dimensional simplex under MAR sampling. This appears to be the first rigorous study of such estimators in the literature. For these procedures, we establish both uniform strong consistency and asymptotic normality, thereby providing a theoretical foundation for local nonlinear smoothing on simplex-constrained domains. Since Dirichlet kernels are intrinsically adapted to the geometry of the simplex, the resulting estimators overcome the severe boundary distortions that inevitably affect conventional symmetric smoothers on constrained supports. Beyond the higher-order conditional U-statistic setting, our analysis also yields new results for the associated Nadaraya–Watson-type regression estimators, which are of independent methodological interest.
A second major contribution concerns Bernstein-polynomial smoothing. We show that Bernstein-type conditional U-statistics also admit a rich asymptotic theory under MAR sampling, including weak and strong uniform convergence. These results reinforce the view that Bernstein smoothers constitute not merely an approximation-theoretic device, but a genuine support-respecting inferential tool for nonlinear conditional estimation. In parallel, we establish analogous convergence results for beta-kernel conditional U-statistics on hyperrectangles, including rates over expanding compact sets and under general sequences of smoothing parameters. This part of the paper clarifies how support-adapted asymmetric kernels can be systematically incorporated into the conditional U-statistics paradigm on bounded Euclidean domains.
A further originality of the paper lies in the treatment of mixed continuous and categorical regressors. This extension is particularly relevant for modern applications, where the explanatory structure is seldom purely continuous. The mixed-design setting considerably enlarges the practical scope of the theory and shows that the conditional U-statistics methodology remains viable in heterogeneous covariate environments. In addition, the inferential examples discussed in the paper—notably discrimination problems, multisample conditional U-statistics, and conditional versions of Kendall’s rank-based dependence measures—illustrate that the proposed framework is sufficiently flexible to handle a wide family of nonlinear conditional targets beyond ordinary regression means.
From a technical standpoint, the paper also contributes a methodological strategy for analyzing asymmetric nonlinear smoothers that is, in itself, of independent value. The proofs rely on a delicate combination of Hoeffding-type decompositions, truncation arguments, and exponential inequalities for canonical U-statistics, coupled with empirical-process techniques adapted to location-dependent kernels. The point here is that the present asymptotic theory cannot be obtained by a routine transfer of arguments from either standard Nadaraya–Watson smoothing or ordinary asymmetric density estimation. The nonlinear ratio structure of conditional U-statistics, together with the dependence of the kernel shape on the evaluation point and the effective sample-size reduction caused by missingness, creates a genuinely more intricate probabilistic problem. One of the broader messages of the paper is therefore that support-adapted smoothing and higher-order nonlinear inference can indeed be reconciled, but only after a careful reworking of the classical tools.
Although the present results are already fairly general, they open several promising directions for further research. A first natural extension concerns data-driven smoothing-parameter selection. While we briefly discuss cross-validation-type criteria, a full asymptotic theory for optimal bandwidth or polynomial-order selection in the conditional U-statistics context remains open. Such a theory is likely to be substantially more delicate than in ordinary regression, since the optimal balance between bias and stochastic fluctuation depends not only on the geometry of the support and the smoothness of the target, but also on the order m of the conditional functional and the local effective sample size induced by the missingness mechanism. In particular, deriving asymptotically optimal selectors under global or local risk criteria, and understanding their interaction with boundary adaptation, would constitute a significant advance.
A second important perspective concerns estimated missingness mechanisms. In the present work, the MAR framework is incorporated through complete-case arguments and positivity conditions. A natural next step would be to allow the propensity score to be unknown and nonparametrically estimated, thereby leading to inverse-probability-weighted, augmented, or doubly robust versions of conditional U-statistics. Such developments would connect the present theory to semiparametric efficiency, debiasing, and modern missing-data methodology. In particular, it would be of considerable interest to determine whether one can construct asymptotically linear and efficiency-enhanced versions of support-adapted conditional U-statistics in the MAR setting, and to quantify the price paid, in higher-order nonlinear problems, for replacing the true observation probability by an estimated one.
A third perspective concerns dependence structures. The literature on asymmetric kernels under dependence is still comparatively sparse, and extending the present results beyond the i.i.d. framework would require genuinely new probabilistic tools. This includes time series, mixing arrays, spatial random fields, network-dependent samples, and other structured dependence schemes; see, for example, [178,179,180,181]. In such contexts, both the Hoeffding decomposition and the local empirical-process machinery must be revisited, and the interaction between dependence and support-adapted smoothing may generate new boundary phenomena. A comparable challenge arises for longitudinal data, panel data, and functional observations, where local nonlinear conditional functionals are often of direct inferential interest.
Another highly promising direction is the extension to other geometric supports. The present article focuses on simplices and hyperrectangles, which already cover a broad class of practical situations. Nevertheless, many modern datasets live on more general constrained spaces: spheres, manifolds, compositional submanifolds, cones, positive semidefinite matrices, or other structured domains. Developing conditional U-statistics with support-respecting asymmetric kernels on such spaces would enlarge the theory in a conceptually important way, especially in applications involving directional data, shape analysis, diffusion tensors, and manifold-valued learning problems. In these settings, the interplay between geometry, kernel construction, and nonlinear functional estimation is likely to raise new questions of both probabilistic and statistical significance.
A further line of investigation concerns high-dimensional and regularized regimes. As the ambient dimension grows, the effective sparsity of the local neighborhoods deteriorates, and the curse of dimensionality becomes especially severe for higher-order conditional functionals. This suggests the need for structure-exploiting extensions based on dimension reduction, additive representations, sparsity constraints, or localized projections. In parallel, one may ask whether conditional U-statistics smoothed by asymmetric kernels can be combined with modern regularization devices in order to produce feasible estimators in moderately or genuinely high-dimensional settings. Such questions are particularly relevant for contemporary applications in genomics, finance, image analysis, and network data, where the response functional may be nonlinear and the covariate support intrinsically constrained.
The paper also points toward applications in change-point analysis, survival and censored-data models, and robust local inference. Change-point methodology has become increasingly important in stochastic systems subject to structural breaks, yet its interaction with conditional U-statistics remains almost completely unexplored; see [35,182,183,184,185]. Likewise, extending the present support-adapted framework to accommodate right censoring, truncation, interval observation, or informative missingness would considerably broaden its scope; compare, for instance, with [186]. From a robustness viewpoint, it would also be worthwhile to investigate conditional U-statistics based on bounded or redescending kernels, especially in conjunction with asymmetric smoothing, in order to better cope with contamination and heavy tails.
Finally, beyond the asymptotic theory, large-scale computational aspects merit dedicated attention. Because conditional U-statistics involve summation over distinct tuples, computational complexity becomes a nontrivial issue even in moderate samples, especially when repeated evaluations are required for bandwidth selection or resampling. This raises interesting algorithmic questions concerning incomplete U-statistics, randomized approximations, divide-and-conquer strategies, online updating, and distributed implementations. Such directions are not merely computational conveniences; they are essential if the present methodology is to become operational in data-rich environments.
In summary, the present paper establishes a new theoretical bridge between three mature but hitherto insufficiently connected areas: conditional U-statistics, asymmetric-kernel smoothing on constrained supports, and nonparametric inference under MAR missingness. The results show that one can recover strong uniform convergence properties and asymptotic normality for a broad class of support-adapted nonlinear estimators, while simultaneously accounting for incomplete responses and mixed covariate structures. We hope that this work will stimulate further developments at the interface of higher-order nonparametric inference, geometric statistics, and incomplete-data analysis, and that it will serve as a foundation for future advances in both theory and applications.
To make the structure of the results transparent, Table 2 summarizes the main asymptotic conclusions. The table distinguishes between the deterministic smoothing scale, the stochastic complete-case scale, and the specific role of the MAR propensity score. This is important because, for the complete-case estimators studied here, the deterministic centering is obtained with the effective density $p f$, whereas the stochastic dispersion contains the usual inverse-propensity loss of information. Thus, the MAR mechanism does not merely reduce the sample size; it also changes the constants appearing in the bias, variance, MSE, and bandwidth formulae.

11. Mathematical Development

This section is dedicated to proving our results. We will continue to use the previously established notation, now with the understanding that all quantities incorporate the missingness indicators $\delta_i$ under the MAR assumption (2.4). A crucial element in our proofs involves the truncation of the U-statistics. Specifically, we represent the U-statistics $u_{n,\ell}(\varphi, \tilde{x})$ for $\ell \in \{1, 2, 3\}$ as follows:
$$u_{n,\ell}(\varphi, \tilde{x}) = u_{n,\ell}^{(m)}\Big( G_{\varphi,\tilde{x},\ell}^{(\mathrm{miss}),(T)} \Big) + u_{n,\ell}^{(m)}\Big( G_{\varphi,\tilde{x},\ell}^{(\mathrm{miss}),(R)} \Big) =: u_{n,\ell}^{(T)}(\varphi, \tilde{x}) + u_{n,\ell}^{(R)}(\varphi, \tilde{x}),$$
where for $\ell \in \{1, 2, 3\}$ and some $\omega_{n,\ell}$ (to be specified later in the proof of each section), we have:
$$G_{\varphi,\tilde{x},\ell}^{(\mathrm{miss})}(t, y, \delta) = G_{\varphi,\tilde{x},\ell}^{(\mathrm{miss}),(T)}(t, y, \delta) + G_{\varphi,\tilde{x},\ell}^{(\mathrm{miss}),(R)}(t, y, \delta) = G_{\varphi,\tilde{x},\ell}^{(\mathrm{miss})}(t, y, \delta)\, \mathbb{1}\{ |\varphi(y)| \le \omega_{n,\ell} \} + G_{\varphi,\tilde{x},\ell}^{(\mathrm{miss})}(t, y, \delta)\, \mathbb{1}\{ |\varphi(y)| > \omega_{n,\ell} \}.$$
Here, $u_{n,\ell}^{(T)}(\varphi,\tilde{x})$ is the truncated part, and $u_{n,\ell}^{(R)}(\varphi,\tilde{x})$ is the remainder part. Note that the product of missingness indicators $\prod_{j=1}^{m}\delta_j$ is preserved in both the truncated and remainder components, as it does not depend on the truncation threshold. We establish the uniform convergence rates of $u_{n,\ell}(\varphi,\tilde{x})$ to $\mathbb{E}[u_{n,\ell}(\varphi,\tilde{x})]$ based on the convergence rates of $u_{n,\ell}^{(m)}\big(G_{\varphi,\tilde{x},\ell}^{(\mathrm{miss}),(T)}\big)$ to $\mathbb{E}\big[u_{n,\ell}^{(m)}\big(G_{\varphi,\tilde{x},\ell}^{(\mathrm{miss}),(T)}\big)\big]$, while demonstrating that the remainder part is asymptotically negligible under the moment condition (C.3) (or its generalization (C.3)″) and the MAR assumption. Next, we use these results to deduce the convergence rates of the stochastic part of the estimators $\widehat{r}_{n,\ell}^{(m)}(\varphi,\tilde{x};\bar{\Lambda}_{n,\ell}(\tilde{x}))$. Indeed, by the classical decomposition, for $\ell\in\{1,2,3\}$:
$$\begin{aligned}
\Big|\widehat{r}_{n,\ell}^{(m)}(\varphi,\tilde{x};\bar{\Lambda}_{n,\ell}(\tilde{x})) - \widehat{\mathbb{E}}\big[\widehat{r}_{n,\ell}^{(m)}(\varphi,\tilde{x};\bar{\Lambda}_{n,\ell}(\tilde{x}))\big]\Big|
&= \left|\frac{u_{n,\ell}(\varphi,\tilde{x})}{u_{n,\ell}(1,\tilde{x})} - \frac{\mathbb{E}[u_{n,\ell}(\varphi,\tilde{x})]}{\mathbb{E}[u_{n,\ell}(1,\tilde{x})]}\right| \\
&\le \frac{\big|u_{n,\ell}(\varphi,\tilde{x}) - \mathbb{E}[u_{n,\ell}(\varphi,\tilde{x})]\big|}{\big|u_{n,\ell}(1,\tilde{x})\big|} + \frac{\big|\mathbb{E}[u_{n,\ell}(\varphi,\tilde{x})]\big|\cdot\big|u_{n,\ell}(1,\tilde{x}) - \mathbb{E}[u_{n,\ell}(1,\tilde{x})]\big|}{\big|u_{n,\ell}(1,\tilde{x})\big|\cdot\big|\mathbb{E}[u_{n,\ell}(1,\tilde{x})]\big|} =: I_{\ell,1} + I_{\ell,2}.
\end{aligned}$$
Later, based on the regularity conditions imposed in each section, we control the terms $u_{n,\ell}(\varphi,\tilde{x})$ and $\mathbb{E}[u_{n,\ell}(\varphi,\tilde{x})]$ (including the particular case $\varphi\equiv 1$) uniformly in $\tilde{x}$ to obtain the desired rates of convergence. Under the MAR assumption and the positivity condition $\inf_{x\in\mathcal{X}}p(x)\ge c>0$, we have the uniform lower bounds:
$$\inf_{\tilde{x}\in\mathcal{X}^m}\mathbb{E}[u_{n,\ell}(1,\tilde{x})]\ge C^{*}>0,\qquad \inf_{\tilde{x}\in\mathcal{X}^m}u_{n,\ell}(1,\tilde{x})\ge\frac{C^{*}}{2}\quad\text{a.s. for sufficiently large } n,$$
which ensure that the denominators in $I_{\ell,1}$ and $I_{\ell,2}$ are bounded away from zero. Lastly, we need to study the bias term of each estimator. For the three estimators proposed in this paper, the treatment of the bias is based on the following decomposition: for $\ell\in\{1,2,3\}$,
$$\widehat{\mathbb{E}}\big[\widehat{r}_{n,\ell}^{(m)}(\varphi,\tilde{x};\bar{\Lambda}_{n,\ell}(\tilde{x}))\big] - r^{(m)}(\varphi,\tilde{x}) = \frac{\mathbb{E}[u_{n,\ell}(\varphi,\tilde{x})]}{\mathbb{E}[u_{n,\ell}(1,\tilde{x})]} - r^{(m)}(\varphi,\tilde{x}) = \frac{1}{\mathbb{E}[u_{n,\ell}(1,\tilde{x})]}\Big(\mathbb{E}[u_{n,\ell}(\varphi,\tilde{x})] - r^{(m)}(\varphi,\tilde{x})\,\mathbb{E}[u_{n,\ell}(1,\tilde{x})]\Big).$$
In fact, (11.3) implies that it suffices to control the term $\mathbb{E}[u_{n,\ell}(\varphi,\tilde{x})] - R(\varphi,\tilde{x})$ uniformly in $\tilde{x}$ to establish the desired results, as we shall see in the sequel. Under the MAR assumption, we emphasize that the expectations incorporate the propensity score via $\mathbb{E}[\delta K_{(\alpha,\beta)}(X)] = \mathbb{E}[p(X)K_{(\alpha,\beta)}(X)]$. However, as demonstrated in Remark 5, the leading-order bias expansion remains unaffected by the missingness mechanism, with the propensity score $p(\cdot)$ canceling out in the ratio due to the common factor appearing in both the numerator and denominator. The residual higher-order terms involving derivatives of $p(\cdot)$ are asymptotically negligible under the smoothness condition (C.2) and the bandwidth condition $\breve{b}\to 0$.
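The ratio decomposition used in this section rests on an elementary algebraic identity, which can be checked numerically. The following minimal Python sketch is only an illustration: the function name `ratio_decomposition` and the numerical inputs are hypothetical placeholders standing in for $u_{n,\ell}(\varphi,\tilde{x})$, $u_{n,\ell}(1,\tilde{x})$ and their expectations; they are not quantities taken from the paper.

```python
def ratio_decomposition(u_phi, u_one, e_phi, e_one):
    """Split u_phi/u_one - e_phi/e_one into a numerator and a denominator term.

    u_phi, u_one : stand-ins for the U-statistics u(phi, x) and u(1, x)
    e_phi, e_one : stand-ins for their expectations (assumed nonzero denominators)
    """
    lhs = u_phi / u_one - e_phi / e_one
    # Fluctuation of the numerator, scaled by the random denominator.
    i1 = (u_phi - e_phi) / u_one
    # Fluctuation of the denominator, weighted by the expected numerator.
    i2 = e_phi * (u_one - e_one) / (u_one * e_one)
    return lhs, i1, i2


# Hypothetical values, chosen only to exercise the identity.
lhs, i1, i2 = ratio_decomposition(1.3, 0.8, 1.1, 0.9)
print(lhs, i1 - i2)              # signed identity: lhs == i1 - i2
print(abs(lhs), abs(i1) + abs(i2))  # triangle bound used in the proofs
```

The signed identity is `lhs = i1 - i2`; the proofs only need the bound `|lhs| <= |i1| + |i2|`, which is why lower bounds on both denominators are required.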

12. Proofs of Section 3: Dirichlet Kernels

12.1. Proofs of Section 3.2

Proof of Theorem 2.
Let x S d , 1 ( b ˘ ( d + 1 ) ) , n 1 , 0 < b ˘ < e 16 2 d 1 , 0 < a e 1 f | log b ˘ | / b ˘ d + 1 / 2 , and take the unique
$\delta\in(0,e^{-1}]$ that satisfies $\delta|\log\delta| = \dfrac{\breve{b}^{\,d+1/2}\,a}{2\|f\|_{\infty}|\log\breve{b}|}$.
Define $\tilde{x}' = (x_1',\ldots,x_m')$ such that $\tilde{x}'\in\tilde{x}+[-\breve{b},\breve{b}]^m$, where $\tilde{x} = (x_1,\ldots,x_m)\in S_{d,1}^m(\breve{b}(d+1))$ and $\breve{b} := (b,\ldots,b)$ is a $d$-dimensional vector. Then we have:
$$\big|u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x}) - \mathbb{E}[u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x})]\big| \le \big|u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x}) - u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x}')\big| + \big|\mathbb{E}[u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x}')] - \mathbb{E}[u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x})]\big| + \big|u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x}') - \mathbb{E}[u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x}')]\big|.$$
As explained before, to establish uniform convergence rates, we study the truncated part and the remainder part of $u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x})$ separately. Under the MAR assumption (2.4), the missingness indicators $\delta_i$ are incorporated into the kernel $G_{\varphi,\tilde{x},1}^{(\mathrm{miss})}$ as per (2.6), and the positivity condition ensures that the denominators are bounded away from zero.
  • Truncated Part
Notice that
$$\begin{aligned}
u_{n,1}^{(T),(\mathrm{miss})}(\varphi,\tilde{x}) - \mathbb{E}\big[u_{n,1}^{(T),(\mathrm{miss})}(\varphi,\tilde{x})\big]
&= \frac{(n-m)!}{n!}\sum_{i\in I(m,n)}\Big\{\varphi^{(T)}(\tilde{Y}_i)\prod_{j=1}^{m}\delta_{i_j}\,\tilde{K}_{\bar{\Lambda}_{n,1}(\tilde{x})}(\tilde{X}_i) - \mathbb{E}\Big[\varphi^{(T)}(\tilde{Y}_i)\prod_{j=1}^{m}\delta_{i_j}\,\tilde{K}_{\bar{\Lambda}_{n,1}(\tilde{x})}(\tilde{X}_i)\Big]\Big\} \\
&= \frac{(n-m)!}{n!}\sum_{i\in I(m,n)}\Big\{G_{\varphi,\tilde{x},1}^{(\mathrm{miss}),(T)}(\tilde{X}_i,\tilde{Y}_i,\tilde{\delta}_i) - \mathbb{E}\big[G_{\varphi,\tilde{x},1}^{(\mathrm{miss}),(T)}(\tilde{X}_i,\tilde{Y}_i,\tilde{\delta}_i)\big]\Big\} \\
&= \frac{(n-m)!}{n!}\sum_{i\in I(m,n)}W^{(T),(\mathrm{miss})}(\tilde{X}_i,\tilde{Y}_i,\tilde{\delta}_i),
\end{aligned}$$
where
$$W^{(T),(\mathrm{miss})}(\tilde{X},\tilde{Y},\tilde{\delta}) := G_{\varphi,\tilde{x},1}^{(\mathrm{miss}),(T)}(\tilde{X},\tilde{Y},\tilde{\delta}) - \mathbb{E}\big[G_{\varphi,\tilde{x},1}^{(\mathrm{miss}),(T)}(\tilde{X},\tilde{Y},\tilde{\delta})\big].$$
Mirroring the approach used in the proof of Theorem 1, we begin by establishing continuity estimates for the random fields $\tilde{z}\mapsto W^{(T),(\mathrm{miss})}(\tilde{z})$, so that we can control the probability that $W^{(T),(\mathrm{miss})}(\tilde{z})$ and $W^{(T),(\mathrm{miss})}(\tilde{z}')$ are too far apart when $\tilde{z} = (\tilde{x},\tilde{y},\tilde{\delta})$ and $\tilde{z}' = (\tilde{x}',\tilde{y},\tilde{\delta})$ are close. Note that the missingness indicators $\tilde{\delta}$ remain unchanged in this comparison, as they are not affected by perturbations of $\tilde{x}$. Building on the framework established in Proposition 1 of [164], we present the following proposition to determine the behavior of $u_{n,1}^{(m)}\big(W^{(T),(\mathrm{miss})}(\tilde{x}',\tilde{y},\tilde{\delta})\big) - u_{n,1}^{(m)}\big(W^{(T),(\mathrm{miss})}(\tilde{x},\tilde{y},\tilde{\delta})\big)$.
Proposition 1.
Let x S d , 1 ( b ˘ ( d + 1 ) ) , n 1 , 0 < b ˘ < e 16 2 d 1 , 0 < a e 1 f | log b ˘ | / b ˘ d + 1 / 2 , and take the unique
$\delta\in(0,e^{-1}]$ that satisfies $\delta|\log\delta| = \dfrac{\breve{b}^{\,d+1/2}\,a}{2\|f\|_{\infty}|\log\breve{b}|}$.
Then, under the MAR assumption (2.4) and the positivity condition, for all $h\in\mathbb{R}$, we have
$$\mathbb{P}\bigg(\sup_{\tilde{x}'\in\tilde{x}+[-\breve{b},\breve{b}]^m}u_{n,1}^{(m)}\big(W^{(T),(\mathrm{miss})}(\tilde{x}',\tilde{y},\tilde{\delta})\big)\ge h+2a^m,\ u_{n,1}^{(m)}\big(W^{(T),(\mathrm{miss})}(\tilde{x},\tilde{y},\tilde{\delta})\big)\le h\bigg)\le C_{\varphi,d}^{m}\exp\bigg(-\frac{1}{100^{2}d^{4}\|f\|_{\infty}^{2}}\cdot\bigg(\frac{n^{1/2}\,\breve{b}^{\,d+1/2}\,a^m}{|\log\delta|\,|\log\breve{b}|}\bigg)^{2}\bigg),$$
where $C_{\varphi,d}^{m}>0$ is a constant that depends only on the function $\varphi(\cdot)$, the dimension $d$, the degree $m$, and the bounds on the propensity score $p(\cdot)$.
Proof of Proposition 1.
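Similar to the proof of Proposition 4, by a union bound the probability in (12.3) is at most
$$\begin{aligned}
&\mathbb{P}\bigg(\sup_{\tilde{x}'\in\tilde{x}+[-\breve{b},\breve{b}]^m}\Big|u_{n,1}^{(m)}\big(W^{(T),(\mathrm{miss})}(\tilde{x}',\tilde{y},\tilde{\delta})\big) - u_{n,1}^{(m)}\big(W^{(T),(\mathrm{miss})}(\tilde{x},\tilde{y},\tilde{\delta})\big)\Big|\,\mathbb{1}_{\{\tilde{x}_i\in S_{d,1}^m\setminus S_{d,1}^m(\delta)\}}\ge a^m, \\
&\qquad\quad \sum_{i\in I(m,n)}\mathbb{1}_{\{\tilde{x}_i\in S_{d,1}^m\setminus S_{d,1}^m(\delta)\}}\le 2\cdot 2^m C_n^m\|f\|_{\infty}^m\delta^m\bigg) \\
&+\mathbb{P}\bigg(\sum_{i\in I(m,n)}\mathbb{1}_{\{\tilde{x}_i\in S_{d,1}^m\setminus S_{d,1}^m(\delta)\}}\ge 2\cdot 2^m C_n^m\|f\|_{\infty}^m\delta^m\bigg) \\
&+\mathbb{P}\bigg(\sup_{\tilde{x}'\in\tilde{x}+[-\breve{b},\breve{b}]^m}\Big|u_{n,1}^{(m)}\big(W^{(T),(\mathrm{miss})}(\tilde{x}',\tilde{y},\tilde{\delta})\big) - u_{n,1}^{(m)}\big(W^{(T),(\mathrm{miss})}(\tilde{x},\tilde{y},\tilde{\delta})\big)\Big|\,\mathbb{1}_{\{\tilde{x}_i\in S_{d,1}^m(\delta)\}}\ge a^m\bigg).
\end{aligned}$$
We refer to these three probabilities, in order, as (12.4), (12.5) and (12.6).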
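Let us begin with (12.4). Following the same reasoning as in Proposition 4, on the event
$$(C_n^m)^{-1}\sum_{i\in I(m,n)}\mathbb{1}_{\{\tilde{x}_i\in S_{d,1}^m\setminus S_{d,1}^m(\delta)\}}\le 2\cdot 2^m\|f\|_{\infty}^m\delta^m,$$
we have
$$\begin{aligned}
&\Big|u_{n,1}^{(m)}\big(W^{(T),(\mathrm{miss})}(\tilde{x}',\tilde{y},\tilde{\delta})\big) - u_{n,1}^{(m)}\big(W^{(T),(\mathrm{miss})}(\tilde{x},\tilde{y},\tilde{\delta})\big)\Big|\,\mathbb{1}_{\{\tilde{x}_i\in S_{d,1}^m\setminus S_{d,1}^m(\delta)\}} \\
&\quad\le \sup_{(\tilde{x},\tilde{x}')\in S_{d,1}^{2m}(b)}\Big|W^{(T),(\mathrm{miss})}(\tilde{x}',\tilde{y},\tilde{\delta}) - W^{(T),(\mathrm{miss})}(\tilde{x},\tilde{y},\tilde{\delta})\Big|\,(C_n^m)^{-1}\sum_{i\in I(m,n)}\mathbb{1}_{\{\tilde{x}_i\in S_{d,1}^m\setminus S_{d,1}^m(\delta)\}} \\
&\quad\le 4\cdot 2\cdot 2^m\,\omega_{n,1}\sup_{\tilde{x}\in S_{d,1}^m}\Big|\tilde{K}_{\bar{\Lambda}_{n,1}(\tilde{x})}(\tilde{x},\tilde{X})\Big|\cdot\|f\|_{\infty}^m\delta^m \\
&\quad\le 4\cdot 2\cdot 2^m\,\omega_{n,1}\cdot\breve{b}^{-dm}\big(\breve{b}^{-1}+d\big)^{m/2}\,\|f\|_{\infty}^m\delta^m.
\end{aligned}$$
The last inequality follows from (15.11), (15.12) and Lemma 2 of [164]. Therefore, using (12.2), we have
$$\Big|u_{n,1}^{(m)}\big(W^{(T),(\mathrm{miss})}(\tilde{x}',\tilde{y},\tilde{\delta})\big) - u_{n,1}^{(m)}\big(W^{(T),(\mathrm{miss})}(\tilde{x},\tilde{y},\tilde{\delta})\big)\Big|\,\mathbb{1}_{\{\tilde{x}_i\in S_{d,1}^m\setminus S_{d,1}^m(\delta)\}}\le\frac{8\,\omega_{n,1}\,(1+\breve{b}d)^{m/2}}{|\log(\delta)|^m\,|\log(\breve{b})|^m}\,a^m.$$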
Since 0 < δ e 1 and 0 < b ˘ < e 8 2 d 1 by assumption, the above is < a m , which means that ( 12.7 ) < a m , this implies that the probability in (12.4) equals zero. Next, we apply Hoeffding’s inequality to control the probability in (12.5). Since 0 1 x ˜ S d , 1 m S d , 1 m ( δ ) 1 , and
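$$\mu = \mathbb{E}\big[\mathbb{1}_{\{\tilde{x}\in S_{d,1}^m\setminus S_{d,1}^m(\delta)\}}\big] = \prod_{j=1}^{m}\mathbb{E}\big[\mathbb{1}_{\{x_j\in S_{d,1}\setminus S_{d,1}(\delta)\}}\big]\le\frac{2^m\|f\|_{\infty}^m\delta^m}{((d-1)!)^m}.$$
Then, for $t = 2\cdot 2^m\|f\|_{\infty}^m\delta^m - \mu$, we have
$$\mathbb{P}\bigg(\sum_{i\in I(m,n)}\mathbb{1}_{\{\tilde{x}_i\in S_{d,1}^m\setminus S_{d,1}^m(\delta)\}} - \mu\ge t\bigg)\le\exp\bigg(-2\,[n/m]\,\bigg(\frac{\big(2((d-1)!)^m-1\big)\cdot 2^m\|f\|_{\infty}^m}{((d-1)!)^m}\,\delta^m\bigg)^{2}\bigg),$$
and, taking into account (12.2), we obtain
$$\mathbb{P}\bigg(\sum_{i\in I(m,n)}\mathbb{1}_{\{\tilde{x}_i\in S_{d,1}^m\setminus S_{d,1}^m(\delta)\}} - \mu\ge t\bigg)\le\exp\bigg(-2\,[n/m]\,\bigg(\frac{\breve{b}^{\,d+1/2}\,a}{|\log\delta|\,|\log\breve{b}|}\bigg)^{2m}\bigg).$$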
Moving on to (12.6), then
u n , 1 ( m ) W ( T ) , ( miss ) ( x ˜ , y ˜ , δ ˜ ) W ( T ) , ( miss ) ( x ˜ , y ˜ , δ ˜ ) 1 x i ˜ S d , 1 m S d , 1 m ( δ ) = m u n , 1 ( 1 ) π 1 , m G φ , x ˜ , 1 ( miss ) , ( T ) G φ , x ˜ , 1 ( miss ) , ( T ) 1 x i ˜ S d , 1 m S d , 1 m ( δ ) + q = 2 m m ! ( m q ) ! u n , 1 ( q ) π q , m G φ , x ˜ , 1 ( miss ) , ( T ) G φ , x ˜ , 1 ( miss ) , ( T ) 1 x i ˜ S d , 1 m S d , 1 m ( δ ) ,
where the linear term:
m u n , 1 ( 1 ) π 1 , m G φ , x ˜ , 1 ( miss ) , ( T ) G φ , x ˜ , 1 ( miss ) , ( T ) 1 x i ˜ S d , 1 m S d , 1 m ( δ ) = m n i = 1 n π 1 , m G φ , x ˜ , 1 ( miss ) , ( T ) G φ , x ˜ , 1 ( miss ) , ( T ) 1 x i ˜ S d , 1 m S d , 1 m ( δ ) ( X ˜ i , Y ˜ i , δ ˜ i ) ,
can be treated similarly to the proof of Proposition 4. The presence of the missingness indicators $\tilde{\delta}_i$ does not affect the argument, as they are bounded by 1 and independent of the truncation threshold. Now, for the nonlinear term, let us first introduce the following class of functions:
$$\mathcal{F} := \Big\{G^{(\mathrm{miss})}(\varphi,\tilde{x}') - G^{(\mathrm{miss})}(\varphi,\tilde{x}) : \tilde{x}\in S_{d,1}^m(\breve{b}(d+1))\ \text{and}\ \tilde{x}'\in\tilde{x}+[-\breve{b},\breve{b}]^m\Big\},$$
then we have, for all $\varepsilon>0$,
P sup x ˜ x ˜ + [ b ˘ , b ˘ ] m q = 2 m m ! ( m q ) ! u n , 1 ( q ) π q , m G φ , x ˜ , 1 ( miss ) , ( T ) G φ , x ˜ , 1 ( miss ) , ( T ) 1 x i ˜ S d , 1 m S d , 1 m ( δ ) ε P q = 2 m m ! ( m q ) ! u n , 1 ( q ) π q , m G φ , x ˜ , 1 ( miss ) , ( T ) G φ , x ˜ , 1 ( miss ) , ( T ) 1 x i ˜ S d , 1 m S d , 1 m ( δ ) F ε .
We have
E n 1 m I m n ε i 1 ( 1 ) ε i 2 ( 2 ) G φ , x ˜ , 1 ( miss ) , ( T ) G φ , x ˜ , 1 ( miss ) , ( T ) 1 x i ˜ S d , 1 m S d , 1 m ( δ ) F 2 C E n 1 m I m n ε i 1 ( 1 ) ε i 2 ( 2 ) φ ( Y i 1 , , Y i m ) j = 1 m δ i j .
Using the same reasoning as in [187], one can find a positive constant c 0 > 0 such that
E n 1 m I m n ε i 1 ( 1 ) ε i 2 ( 2 ) φ ( Y i 1 , , Y i m ) j = 1 m δ i j < c 0 .
Now, an application of Proposition 4 of [187] gives us for ε = a m n 1 / 2 while taking into consideration (12.2)
P q = 2 m m ! ( m q ) ! u n , 1 ( q ) π q , m G φ , x ˜ , 1 ( miss ) , ( T ) G φ , x ˜ , 1 ( miss ) , ( T ) 1 x i ˜ S d , 1 m S d , 1 m ( δ ) F a m n 1 / 2 2 exp a m n 1 / 2 2 m + 5 m m + 1 2 ω n , 1 b ˘ d m ( b ˘ 1 + d ) m / 2 c 0 2 exp ( 2 δ log δ f log b ˘ ) m n 1 / 2 2 m + 5 m m + 1 2 ω n , 1 b ˘ m ( d + 1 / 2 ) b ˘ d m ( b ˘ 1 + d ) m / 2 c 0 2 exp ( δ log δ f log b ˘ ) m n 1 / 2 2 6 m m + 1 ω n , 1 ( 1 + b ˘ d ) m / 2 c 0 .
We can find a constant C 1 > 0 , such that
( δ log δ f ) m 2 6 m m + 1 ( 1 + b ˘ d ) m / 2 c 0 C 1 ,
which implies
exp ( δ log δ f log b ˘ ) m n 1 / 2 2 6 m m + 1 ω n , 1 ( 1 + b ˘ d ) m / 2 c 0 exp C 1 log b ˘ m n 1 / 2 1 / p exp C 1 m log b ˘ n 1 / 2 1 / p .
Therefore, we readily infer that
n = 1 P q = 2 m m ! ( m q ) ! u n , 1 ( q ) π q , m G φ , x ˜ , 1 ( miss ) , ( T ) G φ , x ˜ , 1 ( miss ) , ( T ) 1 x i ˜ S d , 1 m S d , 1 m ( δ ) F a m n 1 / 2 < .
Hence, the proof of the proposition is completed by an application of the Borel–Cantelli lemma. □
  • Remainder Part under MAR
We now consider the remainder part. Recall that the U-statistic $u_{n,1}^{(R),(\mathrm{miss})}(\varphi,\tilde{x})$ is based on the unbounded kernel given by
G φ , x ˜ , 1 ( miss ) , ( R ) ( x ˜ , y ˜ , δ ˜ ) = G φ , x ˜ , 1 ( miss ) ( x ˜ , y ˜ , δ ˜ ) 1 { φ ( y ) > λ ξ n 1 / ( 1 + γ ) ) } .
We have to establish that it is negligible, meaning that
$$\sup_{\tilde{x}\in S_{d,1}^m}\frac{\sqrt{n}\,\breve{b}^{\,m(d+1/2)}\Big|u_{n,1}^{(m)}\big(G_{\varphi,\tilde{x},1}^{(\mathrm{miss}),(R)}\big) - \mathbb{E}\big[u_{n,1}^{(m)}\big(G_{\varphi,\tilde{x},1}^{(\mathrm{miss}),(R)}\big)\big]\Big|}{|\log\breve{b}|^m(\log n)^{3/2}} = o_{\mathrm{a.s.}}(1).$$
For $\tilde{x},\tilde{y}\in S_{d,1}^m$ and $\tilde{\delta}\in\{0,1\}^m$, observe that
$$\big|G_{\varphi,\tilde{x},1}^{(\mathrm{miss})}(\tilde{x},\tilde{y},\tilde{\delta})\big|\le\breve{b}^{-dm}\big(\breve{b}^{-1}+d\big)^{m/2}\,|\varphi(\tilde{y})| =: \tilde{F}(\tilde{y}),$$
where we used the fact that $\big|\prod_{j=1}^{m}\delta_j\big|\le 1$. Taking into account that $\tilde{F}$ is symmetric and does not depend on $\tilde{\delta}$ (since the missingness indicators are bounded by 1), we have
u n , 1 ( m ) G φ , x ˜ , 1 ( miss ) , ( R ) u n , 1 ( m ) F ˜ 1 { F ˜ > λ ξ n 1 / ( 1 + γ ) } ,
where u n , 1 ( m ) F ˜ ( y ) 1 { φ ( y ) > λ ξ n 1 / ( 1 + γ ) } is a U-statistic based on the U-kernel F ˜ 1 { φ > λ ξ n 1 / ( 1 + γ ) } . Under the MAR assumption, the missingness indicators are independent of the truncation event and are bounded, so the same inequality holds almost surely. Consequently,
sup x ˜ S d , 1 m n b ˘ m ( d + 1 / 2 ) u n , 1 ( m ) ( G φ , x ˜ , 1 ( miss ) , ( R ) ) log b ˘ m ( log n ) 3 / 2 n ( 1 + b ˘ d ) m / 2 log b ˘ m ( log n ) 3 / 2 u n , 1 ( m ) F ˜ 1 { F ˜ > λ ξ n 1 / ( 1 + γ ) } C 7 ξ n u n , 1 ( m ) F ˜ 1 { F ˜ > λ ξ n 1 / ( 1 + γ ) } ,
and
sup x ˜ S d , 1 m n b ˘ m ( d + 1 / 2 ) u n , 1 ( m ) ( G φ , x ˜ , 1 ( miss ) , ( R ) ) log b ˘ m ( log n ) 3 / 2 C 7 ξ n E u n , 1 ( m ) F ˜ 1 { φ ( Y ) > λ ξ n 1 / ( 1 + γ ) } C 7 E F ˜ 2 + γ 1 { φ ( Y ) > λ ξ n 1 / ( 1 + γ ) } .
Therefore, as n , we have
sup x ˜ S d , 1 m n b ˘ m ( d + 1 / 2 ) u n , 1 ( m ) ( G φ , x ˜ , 1 ( miss ) , ( R ) ) log b ˘ m ( log n ) 3 / 2 = o ( 1 ) .
Hence, to achieve the proof, it remains to establish that
u n , 1 ( m ) F ˜ 1 { φ ( y ) > λ ξ n 1 / ( 1 + γ ) } = o a . s s m 1 ξ n 1 / 2 .
An application of Chebyshev’s inequality, for any η > 0 , gives
P u n , 1 ( m ) F ˜ 1 { φ ( Y ) > λ ξ n 1 / ( 1 + γ ) } E u n , 1 ( m ) F ˜ 1 { φ ( Y ) > λ ξ n 1 / ( 1 + γ ) } η ( s m 1 ξ n ) 1 / 2 η 2 ( s m 1 ξ n ) V a r u n , 1 ( m ) F ˜ 1 { φ ( Y ) > λ ξ n 1 / ( 1 + γ ) } m η 2 ξ n E F ˜ 2 1 { φ ( Y ) > λ ξ n 1 / ( 1 + γ ) } m n 2 η 2 ( ξ n ) 1 + γ E F ˜ 2 1 { φ ( Y ) > λ ξ n 1 / ( 1 + γ ) } η E F ˜ 3 1 { φ ( Y ) > λ ξ n 1 / ( 1 + γ ) } 1 n 2 ,
so by using the fact that
η E F ˜ 3 1 { φ ( y ) > λ ξ n 1 / ( 1 + γ ) } n 1 1 n 2 < ,
we deduce that
n 1 P u n , 1 ( m ) F ˜ 1 { φ ( y ) > λ ξ n 1 / ( 1 + γ ) } E u n , 1 ( m ) F ˜ 1 { φ ( y ) > λ ξ n 1 / ( 1 + γ ) } η ( m ξ n ) 1 / 2 < .
Finally, note that (12.11) implies
E u n , 1 ( m ) F ˜ 1 { φ ( y ) > λ ξ n 1 / ( 1 + γ ) } = o s m 1 ξ n 1 / 2 .
Since $\lambda>0$ was arbitrary, the preceding results show that (12.13) holds, which, combined with (12.12) and (12.11), completes the proof of (12.10). We finally obtain
$$\sup_{\tilde{x}\in S_{d,1}^m}\Big|u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x}) - \mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x})\big]\Big| = O\bigg(\frac{|\log\breve{b}|^m(\log n)^{3/2}}{\breve{b}^{\,m(d+1/2)}\sqrt{n}}\bigg)\quad\text{a.s.}$$
Hence, the proof is complete. □
Proof of Theorem 3.
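Recall the classical decomposition established in (11.2). Under the MAR assumption (2.4) and the positivity condition $\inf_{x\in\mathcal{X}}p(x)\ge c>0$, we have, for any $\tilde{x}\in S_{d,1}^m$:
$$\Big|\widehat{r}_{n,1}^{(m),(\mathrm{miss})}(\varphi,\tilde{x};\bar{\Lambda}_{n,1}(\tilde{x})) - \widehat{\mathbb{E}}\big[\widehat{r}_{n,1}^{(m),(\mathrm{miss})}(\varphi,\tilde{x};\bar{\Lambda}_{n,1}(\tilde{x}))\big]\Big|\le|I_{n,1}(\tilde{x})| + |I_{n,2}(\tilde{x})|,$$
where the stochastic components are defined as
$$I_{n,1}(\tilde{x}) := \frac{u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x}) - \mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x})\big]}{u_{n,1}^{(\mathrm{miss})}(1,\tilde{x})},\qquad I_{n,2}(\tilde{x}) := \frac{\mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x})\big]\cdot\Big(u_{n,1}^{(\mathrm{miss})}(1,\tilde{x}) - \mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(1,\tilde{x})\big]\Big)}{u_{n,1}^{(\mathrm{miss})}(1,\tilde{x})\cdot\mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(1,\tilde{x})\big]}.$$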
Given the regularity conditions imposed in Section 3.2 and the positivity condition on the propensity score $p(\cdot)$, there exist deterministic constants $c_1,c_2>0$ (independent of $\tilde{x}$ and $n$, for sufficiently large $n$) such that:
$$\inf_{\tilde{x}\in S_{d,1}^m}u_{n,1}^{(\mathrm{miss})}(1,\tilde{x})\ge c_1>0\quad\text{almost surely},$$
$$\inf_{\tilde{x}\in S_{d,1}^m}\mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(1,\tilde{x})\big]\ge c_2>0.$$
These bounds follow from the uniform convergence of $u_{n,1}^{(\mathrm{miss})}(1,\tilde{x})$ to $\mathbb{E}[u_{n,1}^{(\mathrm{miss})}(1,\tilde{x})]$ (established in Theorem 2) and the fact that $\mathbb{E}[u_{n,1}^{(\mathrm{miss})}(1,\tilde{x})]$ converges uniformly to $\prod_{j=1}^{m}p(x_j)\,\tilde{f}(\tilde{x})$ (see the bias expansion in Theorem 4), which is bounded away from zero on $S_{d,1}^m$ under the positivity condition and the assumption that $f$ is bounded below on the compact set $S_{d,1}^m(\delta)$ (and extended appropriately). Moreover, the boundedness of the numerator term is guaranteed by the moment condition (C.3):
$$\sup_{\tilde{x}\in S_{d,1}^m}\Big|\mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x})\big]\Big| = O(1)\quad\text{as } n\to\infty.$$
From Theorem 2, we have the following almost sure uniform convergence rate for the U-statistics:
$$\sup_{\tilde{x}\in S_{d,1}^m}\Big|u_{n,1}^{(\mathrm{miss})}(\psi,\tilde{x}) - \mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(\psi,\tilde{x})\big]\Big| = O\bigg(\frac{|\log\breve{b}|^m(\log n)^{3/2}}{\breve{b}^{\,m(d+1/2)}\sqrt{n}}\bigg)\quad\text{a.s.},$$
for $\psi=\varphi$ and $\psi\equiv 1$. This result is derived from the truncation argument and the exponential inequalities for degenerate U-statistics, as detailed in the proof of Theorem 2. Using the decomposition $\big|\widehat{r}_{n,1}^{(m),(\mathrm{miss})} - \widehat{\mathbb{E}}[\widehat{r}_{n,1}^{(m),(\mathrm{miss})}]\big|\le|I_{n,1}|+|I_{n,2}|$ and the uniform bounds above, we obtain almost surely:
$$\begin{aligned}
&\sup_{\tilde{x}\in S_{d,1}^m}\frac{\breve{b}^{\,m(d+1/2)}\sqrt{n}\,\Big|\widehat{r}_{n,1}^{(m),(\mathrm{miss})}(\varphi,\tilde{x};\bar{\Lambda}_{n,1}(\tilde{x})) - \widehat{\mathbb{E}}\big[\widehat{r}_{n,1}^{(m),(\mathrm{miss})}(\varphi,\tilde{x};\bar{\Lambda}_{n,1}(\tilde{x}))\big]\Big|}{|\log\breve{b}|^m(\log n)^{3/2}} \\
&\quad\le\sup_{\tilde{x}\in S_{d,1}^m}\frac{\breve{b}^{\,m(d+1/2)}\sqrt{n}\,|I_{n,1}(\tilde{x})|}{|\log\breve{b}|^m(\log n)^{3/2}} + \sup_{\tilde{x}\in S_{d,1}^m}\frac{\breve{b}^{\,m(d+1/2)}\sqrt{n}\,|I_{n,2}(\tilde{x})|}{|\log\breve{b}|^m(\log n)^{3/2}}.
\end{aligned}$$
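For the first term, using (12.15) and (12.18) with $\psi=\varphi$, we have:
$$\sup_{\tilde{x}\in S_{d,1}^m}\frac{\breve{b}^{\,m(d+1/2)}\sqrt{n}\,|I_{n,1}(\tilde{x})|}{|\log\breve{b}|^m(\log n)^{3/2}}\le\frac{1}{c_1}\sup_{\tilde{x}\in S_{d,1}^m}\frac{\breve{b}^{\,m(d+1/2)}\sqrt{n}\,\Big|u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x}) - \mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x})\big]\Big|}{|\log\breve{b}|^m(\log n)^{3/2}} = O(1)\quad\text{a.s.}$$
For the second term, we apply (12.15)–(12.18) with $\psi\equiv 1$:
$$\sup_{\tilde{x}\in S_{d,1}^m}\frac{\breve{b}^{\,m(d+1/2)}\sqrt{n}\,|I_{n,2}(\tilde{x})|}{|\log\breve{b}|^m(\log n)^{3/2}}\le\frac{\sup_{\tilde{x}}\big|\mathbb{E}[u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x})]\big|}{c_1c_2}\sup_{\tilde{x}\in S_{d,1}^m}\frac{\breve{b}^{\,m(d+1/2)}\sqrt{n}\,\Big|u_{n,1}^{(\mathrm{miss})}(1,\tilde{x}) - \mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(1,\tilde{x})\big]\Big|}{|\log\breve{b}|^m(\log n)^{3/2}} = O(1)\quad\text{a.s.}$$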
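Combining (12.20) and (12.22) into (12.19), we deduce the existence of a finite constant $C^{*}>0$ such that, almost surely,
$$\limsup_{n\to\infty}\sup_{\tilde{x}\in S_{d,1}^m}\frac{\breve{b}^{\,m(d+1/2)}\sqrt{n}\,\Big|\widehat{r}_{n,1}^{(m),(\mathrm{miss})}(\varphi,\tilde{x};\bar{\Lambda}_{n,1}(\tilde{x})) - \widehat{\mathbb{E}}\big[\widehat{r}_{n,1}^{(m),(\mathrm{miss})}(\varphi,\tilde{x};\bar{\Lambda}_{n,1}(\tilde{x}))\big]\Big|}{|\log\breve{b}|^m(\log n)^{3/2}}\le C^{*}.$$
Equivalently,
$$\sup_{\tilde{x}\in S_{d,1}^m}\Big|\widehat{r}_{n,1}^{(m),(\mathrm{miss})}(\varphi,\tilde{x};\bar{\Lambda}_{n,1}(\tilde{x})) - \widehat{\mathbb{E}}\big[\widehat{r}_{n,1}^{(m),(\mathrm{miss})}(\varphi,\tilde{x};\bar{\Lambda}_{n,1}(\tilde{x}))\big]\Big| = O\bigg(\frac{|\log\breve{b}|^m(\log n)^{3/2}}{\breve{b}^{\,m(d+1/2)}\sqrt{n}}\bigg)\quad\text{a.s.},$$
which completes the proof of Theorem 3. □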
Proof of Theorem 4.
Throughout this proof, we operate under the MAR assumption (2.4) with the positivity condition $\inf_{x\in\mathcal{X}}p(x)\ge c>0$ and the smoothness condition (C.2). The goal is to establish the uniform bias expansion
$$\sup_{\tilde{x}\in S_{d,1}^m}\Big|\widehat{\mathbb{E}}\big[\widehat{r}_{n,1}^{(m),(\mathrm{miss})}(\varphi,\tilde{x};\bar{\Lambda}_{n,1}(\tilde{x}))\big] - r^{(m)}(\varphi,\tilde{x})\Big| = O(\breve{b}^{1/2}).$$
From the bias decomposition (11.3), we have
$$\widehat{\mathbb{E}}\big[\widehat{r}_{n,1}^{(m),(\mathrm{miss})}(\varphi,\tilde{x})\big] - r^{(m)}(\varphi,\tilde{x}) = \frac{\mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x})\big] - r^{(m)}(\varphi,\tilde{x})\,\mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(1,\tilde{x})\big]}{\mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(1,\tilde{x})\big]}.$$
Therefore, it suffices to establish the two uniform estimates
$$\sup_{\tilde{x}\in S_{d,1}^m}\Big|\mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x})\big] - R(\varphi,\tilde{x})\Big| = O(\breve{b}^{1/2}),$$
$$\sup_{\tilde{x}\in S_{d,1}^m}\Big|\mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(1,\tilde{x})\big] - \tilde{f}(\tilde{x})\Big| = O(\breve{b}^{1/2}),$$
where $R(\varphi,\tilde{x}) := \tilde{f}(\tilde{x})\,r^{(m)}(\varphi,\tilde{x})$. Indeed, once these are proved, a standard algebraic manipulation yields the desired result, noting that $\mathbb{E}[u_{n,1}^{(\mathrm{miss})}(1,\tilde{x})]$ is uniformly bounded away from zero under the positivity condition. For any measurable function $\psi:\mathcal{Y}^m\to\mathbb{R}$ (with $\psi=\varphi$ or $\psi\equiv 1$), the MAR assumption gives
$$\mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(\psi,\tilde{x})\big] = \mathbb{E}\Big[\psi(\tilde{Y})\prod_{j=1}^{m}\delta_j\,\tilde{K}_{\bar{\Lambda}_{n,1}(\tilde{x})}(\tilde{X})\Big].$$
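By the law of total expectation and the conditional independence $\delta_j\perp Y_j\mid X_j$ implied by MAR,
$$\mathbb{E}\Big[\psi(\tilde{Y})\prod_{j=1}^{m}\delta_j\,\tilde{K}_{\bar{\Lambda}_{n,1}(\tilde{x})}(\tilde{X})\Big] = \mathbb{E}\Big[\prod_{j=1}^{m}p(X_j)\,\tilde{K}_{\bar{\Lambda}_{n,1}(\tilde{x})}(\tilde{X})\,\mathbb{E}\big[\psi(\tilde{Y})\mid\tilde{X}\big]\Big].$$
Consequently,
$$\mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(\psi,\tilde{x})\big] = \int_{S_{d,1}^m}\mathbb{E}\big[\psi(\tilde{Y})\mid\tilde{X}=\tilde{u}\big]\prod_{j=1}^{m}p(u_j)\,\tilde{f}(\tilde{u})\,\tilde{K}_{\bar{\Lambda}_{n,1}(\tilde{x})}(\tilde{u})\,d\tilde{u} = \int_{S_{d,1}^m}\psi_{\mathrm{cond}}(\tilde{u})\prod_{j=1}^{m}p(u_j)\,\tilde{f}(\tilde{u})\,\tilde{K}_{\bar{\Lambda}_{n,1}(\tilde{x})}(\tilde{u})\,d\tilde{u},$$
where $\psi_{\mathrm{cond}}(\tilde{u}) := \mathbb{E}[\psi(\tilde{Y})\mid\tilde{X}=\tilde{u}]$; in particular, for $\psi=\varphi$, $\varphi_{\mathrm{cond}}(\tilde{u}) = r^{(m)}(\varphi,\tilde{u})$, and for $\psi\equiv 1$, $1_{\mathrm{cond}}(\tilde{u}) = 1$. Recall that for each $j=1,\ldots,m$, the scaled kernel $K_{(\alpha_j,\beta_j)}(\cdot)$ is the probability density function of a Dirichlet random vector $\xi_{x_j}\sim\mathrm{Dirichlet}(\alpha_j,\beta_j)$ with parameters given by (3.1). Moreover, under the i.i.d. assumption, the vectors $\xi_{x_1},\ldots,\xi_{x_m}$ are independent. Therefore, we can rewrite (12.27) as
$$\mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(\psi,\tilde{x})\big] = \mathbb{E}\Big[\psi_{\mathrm{cond}}(\tilde{\xi}_{\tilde{x}})\prod_{j=1}^{m}p(\xi_{x_j})\,\tilde{f}(\tilde{\xi}_{\tilde{x}})\Big],$$
where $\tilde{\xi}_{\tilde{x}} := (\xi_{x_1},\ldots,\xi_{x_m})$. For $\psi\equiv 1$, this simplifies to
$$\mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(1,\tilde{x})\big] = \mathbb{E}\Big[\prod_{j=1}^{m}p(\xi_{x_j})\,\tilde{f}(\tilde{\xi}_{\tilde{x}})\Big].$$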
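The MAR factorization used above, in which the missingness indicator is replaced by the propensity score inside the expectation, can be verified exactly on a small discrete model. The sketch below is a toy illustration for the case $m=1$: the distributions `P_X`, `PROPENSITY`, `P_Y1_GIVEN_X` and the functions `weight` and `psi` are hypothetical choices made for this example only (the weight merely stands in for the kernel), and $\delta$ is generated independently of $Y$ given $X$, as MAR requires.

```python
# Hypothetical discrete toy model (all numbers are illustrative assumptions).
P_X = {0.2: 0.3, 0.5: 0.4, 0.8: 0.3}            # P(X = x)
PROPENSITY = {0.2: 0.9, 0.5: 0.6, 0.8: 0.75}    # p(x) = P(delta = 1 | X = x)
P_Y1_GIVEN_X = {0.2: 0.5, 0.5: 0.8, 0.8: 0.3}   # P(Y = 1 | X = x), with Y in {0, 1}


def weight(x):
    """Arbitrary positive weight standing in for the kernel K(x)."""
    return 1.0 + x * x


def psi(y):
    """Arbitrary response functional."""
    return 2.0 * y + 1.0


def lhs_complete_case():
    """E[psi(Y) * delta * K(X)], by enumerating (x, y, delta) under MAR."""
    total = 0.0
    for x, px in P_X.items():
        for y in (0, 1):
            py = P_Y1_GIVEN_X[x] if y == 1 else 1.0 - P_Y1_GIVEN_X[x]
            for delta in (0, 1):
                pd = PROPENSITY[x] if delta == 1 else 1.0 - PROPENSITY[x]
                total += psi(y) * delta * weight(x) * px * py * pd
    return total


def rhs_propensity_weighted():
    """E[p(X) * K(X) * E[psi(Y) | X]], by enumerating x only."""
    total = 0.0
    for x, px in P_X.items():
        cond_mean = psi(1) * P_Y1_GIVEN_X[x] + psi(0) * (1.0 - P_Y1_GIVEN_X[x])
        total += PROPENSITY[x] * weight(x) * cond_mean * px
    return total


print(lhs_complete_case(), rhs_propensity_weighted())
```

The two printed values agree up to floating-point rounding. If $\delta$ were allowed to depend on $Y$ given $X$, the inner factorization in `lhs_complete_case` would no longer hold and the identity would generally fail.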
Since $p(\cdot)$ is smooth under condition (C.2), we perform a first-order Taylor expansion of $p(\xi_{x_j})$ around $x_j$. Using the fact that $\mathbb{E}[\xi_{x_j}] = x_j + O(\breve{b})$, we obtain
$$p(\xi_{x_j}) = p(x_j) + \nabla p(x_j)^{\top}(\xi_{x_j}-x_j) + O\big(\|\xi_{x_j}-x_j\|^{2}\big).$$
Taking expectations and using the moment estimates $\mathbb{E}[\|\xi_{x_j}-x_j\|] = O(\breve{b}^{1/2})$ and $\mathbb{E}[\|\xi_{x_j}-x_j\|^{2}] = O(\breve{b})$, we get
$$\mathbb{E}[p(\xi_{x_j})] = p(x_j) + O(\breve{b}^{1/2}).$$
By independence of the $\xi_{x_j}$'s, we have
$$\mathbb{E}\Big[\prod_{j=1}^{m}p(\xi_{x_j})\Big] = \prod_{j=1}^{m}p(x_j) + O(\breve{b}^{1/2}).$$
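More generally, for the product with $\tilde{f}(\tilde{\xi}_{\tilde{x}})$, a similar expansion yields
$$\mathbb{E}\Big[\psi_{\mathrm{cond}}(\tilde{\xi}_{\tilde{x}})\prod_{j=1}^{m}p(\xi_{x_j})\,\tilde{f}(\tilde{\xi}_{\tilde{x}})\Big] = \prod_{j=1}^{m}p(x_j)\,\mathbb{E}\big[\psi_{\mathrm{cond}}(\tilde{\xi}_{\tilde{x}})\,\tilde{f}(\tilde{\xi}_{\tilde{x}})\big] + O(\breve{b}^{1/2}).$$
Thus,
$$\begin{aligned}
\mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x})\big] &= \prod_{j=1}^{m}p(x_j)\,\mathbb{E}\big[r^{(m)}(\varphi,\tilde{\xi}_{\tilde{x}})\,\tilde{f}(\tilde{\xi}_{\tilde{x}})\big] + O(\breve{b}^{1/2}) \\
&= \prod_{j=1}^{m}p(x_j)\,\mathbb{E}\big[R(\varphi,\tilde{\xi}_{\tilde{x}})\big] + O(\breve{b}^{1/2}),
\end{aligned}$$
$$\mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(1,\tilde{x})\big] = \prod_{j=1}^{m}p(x_j)\,\mathbb{E}\big[\tilde{f}(\tilde{\xi}_{\tilde{x}})\big] + O(\breve{b}^{1/2}).$$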
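We now analyze the quantity $\mathbb{E}[R(\varphi,\tilde{\xi}_{\tilde{x}})]$. A second-order Taylor expansion of $R(\varphi,\cdot)$ around $\tilde{x}$ gives
$$\begin{aligned}
R(\varphi,\tilde{\xi}_{\tilde{x}}) = R(\varphi,\tilde{x}) &+ \sum_{i=1}^{m}\sum_{\ell=1}^{d}\frac{\partial R(\varphi,\tilde{x})}{\partial x_{i\ell}}\,(\xi_{x_i,\ell}-x_{i\ell}) + \frac{1}{2}\sum_{i=1}^{m}\sum_{\ell=1}^{d}\frac{\partial^{2}R(\varphi,\tilde{x})}{\partial x_{i\ell}^{2}}\,(\xi_{x_i,\ell}-x_{i\ell})^{2} \\
&+ \sum_{i\neq j,\ \ell\neq r}\frac{\partial^{2}R(\varphi,\tilde{x})}{\partial x_{i\ell}\,\partial x_{jr}}\,(\xi_{x_i,\ell}-x_{i\ell})(\xi_{x_j,r}-x_{jr}) + R_n(\tilde{\xi}_{\tilde{x}}),
\end{aligned}$$
where the remainder $R_n(\cdot)$ satisfies $|R_n(\tilde{\xi}_{\tilde{x}})| = O(\|\tilde{\xi}_{\tilde{x}}-\tilde{x}\|^{3})$ under the smoothness condition (C.2). For $\xi_x\sim\mathrm{Dirichlet}(\alpha,\beta)$ with $\alpha = x/\breve{b}+1$ and $\beta = (1-\|x\|_1)/\breve{b}+1$, the following moment properties hold uniformly for $x\in S_{d,1}^m(\delta)$ (see [164] and references therein):
$$\mathbb{E}[\xi_{x,\ell}] = x_\ell + \breve{b}\,\big(1-(d+1)x_\ell\big) + O(\breve{b}^{2}),$$
$$\mathrm{Var}(\xi_{x,\ell}) = \breve{b}\,x_\ell(1-x_\ell) + O(\breve{b}^{2}),$$
$$\mathrm{Cov}(\xi_{x,\ell},\xi_{x,k}) = -\breve{b}\,x_\ell x_k + O(\breve{b}^{2}),\qquad \ell\neq k,$$
$$\mathbb{E}\big[(\xi_{x,\ell}-x_\ell)^{2}\big] = \breve{b}\,x_\ell(1-x_\ell) + O(\breve{b}^{2}),$$
$$\mathbb{E}\big[|\xi_{x,\ell}-x_\ell|^{3}\big] = O(\breve{b}^{3/2}).$$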
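The first- and second-order moment expansions listed above can be compared with the exact Dirichlet moments. The following Python sketch is a numerical sanity check only: the helper name `dirichlet_kernel_moments`, the evaluation point and the bandwidth value are hypothetical choices for illustration, and the exact formulas used are the standard mean, variance and covariance of a Dirichlet distribution with parameters $x/\breve{b}+1$ and $(1-\|x\|_1)/\breve{b}+1$.

```python
def dirichlet_kernel_moments(x, b):
    """Exact moments of xi ~ Dirichlet(x/b + 1, (1 - sum(x))/b + 1).

    Returns the coordinate means, the coordinate variances, and the
    covariance between the first two coordinates (requires len(x) >= 2).
    """
    alpha = [xi / b + 1.0 for xi in x]
    beta = (1.0 - sum(x)) / b + 1.0
    a0 = sum(alpha) + beta  # total concentration: 1/b + d + 1
    mean = [a / a0 for a in alpha]
    var = [a * (a0 - a) / (a0 ** 2 * (a0 + 1.0)) for a in alpha]
    cov01 = -alpha[0] * alpha[1] / (a0 ** 2 * (a0 + 1.0))
    return mean, var, cov01


# Hypothetical interior point of the simplex and a small bandwidth.
x, b = [0.2, 0.3], 1e-3
d = len(x)
mean, var, cov01 = dirichlet_kernel_moments(x, b)

# Gaps between exact moments and the leading-order expansions.
mean_gap = [abs(m - (xi + b * (1.0 - (d + 1) * xi))) for m, xi in zip(mean, x)]
var_gap = [abs(v - b * xi * (1.0 - xi)) for v, xi in zip(var, x)]
cov_gap = abs(cov01 - (-b * x[0] * x[1]))

print(mean_gap, var_gap, cov_gap)  # each gap should be of order b**2
```

All gaps are of order $\breve{b}^{2}$ at this interior point, in line with the stated remainders; the check says nothing about uniformity near the boundary of the simplex.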
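Consequently, for the $m$-tuple $\tilde{\xi}_{\tilde{x}}$, we have
$$\mathbb{E}[\xi_{x_i,\ell}-x_{i\ell}] = O(\breve{b}),\qquad\mathbb{E}\big[(\xi_{x_i,\ell}-x_{i\ell})^{2}\big] = O(\breve{b}),\qquad\mathbb{E}\big[|\xi_{x_i,\ell}-x_{i\ell}|^{3}\big] = O(\breve{b}^{3/2}),$$
and for $i\neq j$ or $\ell\neq r$,
$$\mathbb{E}\big[(\xi_{x_i,\ell}-x_{i\ell})(\xi_{x_j,r}-x_{jr})\big] = O(\breve{b}).$$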
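Taking expectations in (12.30) and using the moment estimates above, we obtain
$$\mathbb{E}\big[R(\varphi,\tilde{\xi}_{\tilde{x}})\big] - R(\varphi,\tilde{x}) = \sum_{i=1}^{m}\sum_{\ell=1}^{d}\frac{\partial R(\varphi,\tilde{x})}{\partial x_{i\ell}}\,O(\breve{b}) + \frac{1}{2}\sum_{i=1}^{m}\sum_{\ell=1}^{d}\frac{\partial^{2}R(\varphi,\tilde{x})}{\partial x_{i\ell}^{2}}\,O(\breve{b}) + \sum_{i\neq j,\ \ell\neq r}\frac{\partial^{2}R(\varphi,\tilde{x})}{\partial x_{i\ell}\,\partial x_{jr}}\,O(\breve{b}) + O(\breve{b}^{3/2}).$$
Under condition (C.2), all partial derivatives of $R(\varphi,\cdot)$ are uniformly bounded on $S_{d,1}^m$. Therefore,
$$\mathbb{E}\big[R(\varphi,\tilde{\xi}_{\tilde{x}})\big] - R(\varphi,\tilde{x}) = O(\breve{b}).$$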
Since $\breve{b}\to 0$, this $O(\breve{b})$ bound is stronger than the rate needed here. Recall that the linear term satisfies $\mathbb{E}[\xi_{x_i,\ell}-x_{i\ell}] = \breve{b}\,(1-(d+1)x_{i\ell}) + O(\breve{b}^{2})$; if only second moments are used, the Cauchy–Schwarz inequality gives the coarser bound
$$\big|\mathbb{E}[\xi_{x_i,\ell}-x_{i\ell}]\big|\le\sqrt{\mathbb{E}\big[(\xi_{x_i,\ell}-x_{i\ell})^{2}\big]} = O(\breve{b}^{1/2}),$$
and similarly for the cross-terms. Consequently, in either case,
$$\sup_{\tilde{x}\in S_{d,1}^m}\Big|\mathbb{E}\big[R(\varphi,\tilde{\xi}_{\tilde{x}})\big] - R(\varphi,\tilde{x})\Big| = O(\breve{b}^{1/2}).$$
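Substituting (12.37) into (12.28) and (12.29), we obtain
$$\mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x})\big] = \prod_{j=1}^{m}p(x_j)\,R(\varphi,\tilde{x}) + O(\breve{b}^{1/2}),$$
$$\mathbb{E}\big[u_{n,1}^{(\mathrm{miss})}(1,\tilde{x})\big] = \prod_{j=1}^{m}p(x_j)\,\tilde{f}(\tilde{x}) + O(\breve{b}^{1/2}).$$
Therefore, the bias criteria (12.25) and (12.26) are satisfied up to the common factor $\prod_{j=1}^{m}p(x_j)$, which cancels in the ratio (11.3). Finally, using (11.3) and the fact that $\mathbb{E}[u_{n,1}^{(\mathrm{miss})}(1,\tilde{x})]$ is uniformly bounded away from zero (by the positivity condition and the uniform convergence to $\prod_{j=1}^{m}p(x_j)\,\tilde{f}(\tilde{x})$ on $S_{d,1}^m$), we deduce
$$\sup_{\tilde{x}\in S_{d,1}^m}\Big|\widehat{\mathbb{E}}\big[\widehat{r}_{n,1}^{(m),(\mathrm{miss})}(\varphi,\tilde{x};\bar{\Lambda}_{n,1}(\tilde{x}))\big] - r^{(m)}(\varphi,\tilde{x})\Big| = O(\breve{b}^{1/2}).$$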
This completes the proof of Theorem 4. □
Remark 21.
The cancellation of the propensity score product $\prod_{j=1}^{m}p(x_j)$ in the ratio is exact at the leading order, as demonstrated in the proof of Theorem 4. This cancellation is a fundamental consequence of the MAR assumption and the fact that the same missingness indicators appear in both the numerator and denominator of the estimator. The residual terms involving derivatives of $p(\cdot)$ are of order $O(\breve{b}^{1/2})$ and are asymptotically negligible under the bandwidth condition $\sqrt{n}\,\breve{b}^{(d+2)/4}\to 0$. This justifies the claim made in Remark 5.

12.2. Proofs of the Results of Section 3.3

Before we start the proofs of this section, we will state some lemmas that are necessary to obtain the desired results. It is worth mentioning that we will follow the steps of [60] while making the appropriate changes to fit our general setting, including the incorporation of missing responses under the MAR assumption (2.4) and the use of Dirichlet kernels.
Lemma 1.
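Under assumptions (A.1)–(A.4), the MAR assumption (2.4) with positivity condition $\inf_{x\in\mathcal{X}}p(x)\ge c>0$, and if $\mathbb{E}[\varphi^{2}]<\infty$, the Hájek projection $\widehat{U}_{n,1}$ of $U_{n,1}^{(\mathrm{miss})}$ satisfies, as $n\to\infty$:
(i)
$$\lim_{n\to\infty}\mathbb{E}\Big[n\,\breve{b}^{d/2}\big(\widehat{U}_{n,1}-\theta_n\big)^{2}\Big] = \sigma^{2}(\varphi),$$
where
$$\sigma^{2}(\varphi) := \sum_{i=1}^{m}\sum_{j=1}^{m}\mathbb{1}_{\{x_i=x_j\}}\,r_{ij}(\tilde{x})\,\frac{1}{p(x_i)^{2}f(x_i)}\int K_{(\alpha,\beta)}^{2}(x,t)\,dt>0.$$
(ii)
If, in addition, assumption (A.5) holds, we have
$$\sqrt{n\,\breve{b}^{d/2}}\,\big(\widehat{U}_{n,1}-\theta_n\big)\xrightarrow{\ \mathcal{D}\ }N\big(0,\sigma^{2}(\varphi)\big).$$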
Remark 22.
In the presence of missing responses under MAR, the Hájek projection is defined conditionally on the observed data, with the missingness indicators properly accounted for in the kernel definitions. The propensity score $p(\cdot)$ cancels out in the centering through the mechanism detailed in Remark 5, because both the numerator and denominator of the estimator contain the same product of missingness indicators. The asymptotic variance $\sigma^{2}(\varphi)$, by contrast, keeps the structure of the complete-data case but carries the inverse-propensity factor $1/p(x_i)^{2}$ displayed in Lemma 1, which reflects the loss of information due to missingness.
In the following lemma, we show that U n , 1 ( miss ) has the same asymptotic distribution as U ^ n , 1 .
Lemma 2.
Under assumptions (A.1)–(A.6) and the MAR assumption (2.4) with positivity condition, we have, as $n\to\infty$,
$$\sqrt{n\,\breve{b}^{d/2}}\,\big(U_{n,1}^{(\mathrm{miss})}-\theta_n\big)\xrightarrow{\ \mathcal{D}\ }N\big(0,\sigma^{2}(\varphi)\big).$$
Specification of σ 2 ( φ ) leads to the following lemma.
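Lemma 3 ([60]).
Under assumptions (A.1)–(A.6) and the MAR assumption (2.4) with positivity condition, we have, as $n\to\infty$,
$$\big(n\,\breve{b}^{d/2}\big)^{1/2}\Big(U_{n,1}^{(\mathrm{miss})}(\varphi_1,\tilde{x})-\theta_n^{\varphi_1},\ U_{n,1}^{(\mathrm{miss})}(\varphi_2,\tilde{x})-\theta_n^{\varphi_2}\Big)\to N(0,\Sigma),$$
in distribution, with
$$\Sigma = \begin{pmatrix}\sigma^{2}(\varphi_1,\varphi_1) & \sigma^{2}(\varphi_1,\varphi_2)\\ \sigma^{2}(\varphi_1,\varphi_2) & \sigma^{2}(\varphi_2,\varphi_2)\end{pmatrix},$$
and where, for two functions $g_1(\cdot)$ and $g_2(\cdot)$,
$$\sigma^{2}(g_1,g_2) = \sum_{j=1}^{m}\sum_{l=1}^{m}\mathbb{1}_{\{x_j=x_l\}}\,r_{jl}^{g_1g_2}(\tilde{x})\int K_{\alpha,\beta}^{2}(x,t)\,dt\Big/f_X(x_j),$$
and
$$r_{jl}^{g_1g_2}(\tilde{x}) = \mathbb{E}\big[g_1(Y_1,\ldots,Y,\ldots,Y_m)\,g_2(Y_{m+1},\ldots,Y,\ldots,Y_{2m})\big],$$
with $Y$ entering in the $j$th and $l$th positions.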
Proof of Lemma 1.
Throughout this proof, we work under the MAR assumption (2.4) with the positivity condition $\inf_{x\in\mathcal{X}}p(x)\ge c>0$. All expectations are taken with respect to the joint distribution of $(X,Y,\delta)$, with the understanding that the missingness indicators are incorporated into the kernels as per (3.5). For notational simplicity, we denote $U_{n,1} = U_{n,1}^{(\mathrm{miss})}$. Recall that
$$U_{n,1}(\varphi,\tilde{x}) = \frac{u_{n,1}^{(\mathrm{miss})}(\varphi,\tilde{x})}{N^{(\mathrm{miss})}},$$
where $N^{(\mathrm{miss})} = \prod_{j=1}^{m}\mathbb{E}[\delta K_{(\alpha_j,\beta_j)}(X)] = \prod_{j=1}^{m}\mathbb{E}[p(X)K_{(\alpha_j,\beta_j)}(X)]$. The centering term $\theta_n$ is defined as
$$\theta_n := \mathbb{E}[U_{n,1}(\varphi,\tilde{x})] = \big(N^{(\mathrm{miss})}\big)^{-1}\int_{S_{d,1}^m}r^{(m)}(\varphi,\tilde{t})\prod_{j=1}^{m}p(t_j)\,\tilde{f}(\tilde{t})\,\tilde{K}_{\bar{\Lambda}_{n,1}(\tilde{x})}(\tilde{t})\,d\tilde{t} = \big(N^{(\mathrm{miss})}\big)^{-1}\int_{S_{d,1}^m}R(\varphi,\tilde{t})\prod_{j=1}^{m}p(t_j)\prod_{i=1}^{m}K_{(\alpha_i,\beta_i)}(t_i)\,d\tilde{t}.$$
Under the smoothness condition (C.2) and the continuity of $p(\cdot)$, a Taylor expansion yields $\theta_n = r^{(m)}(\varphi,\tilde{x}) + O(\breve{b}^{1/2})$, but this refinement is not needed for the asymptotic variance calculation. The Hájek projection $\widehat{U}_{n,1}(\varphi,\tilde{x})$ of $U_{n,1}(\varphi,\tilde{x})$ is the best (in the mean square sense) linear approximation based on the individual observations. It satisfies
$$\widehat{U}_{n,1}-\theta_n = \frac{1}{n}\sum_{i=1}^{n}\bar{\varphi}_n(X_i,Y_i,\delta_i),$$
where the projection kernel $\bar{\varphi}_n$ is defined by
$$\bar{\varphi}_n(x,y,\delta) = \sum_{j=1}^{m}\big(\varphi_{n,j}(x,y,\delta)-\theta_n\big),$$
and, for each $j\in\{1,\ldots,m\}$,
$$\varphi_{n,j}(x,y,\delta) := \big(N^{(\mathrm{miss})}\big)^{-1}\int_{S_{d,1}^{m-1}\times\mathbb{R}^{q(m-1)}}\varphi(y_1,\ldots,y_{j-1},y,y_{j+1},\ldots,y_m)\times\prod_{\substack{r=1\\ r\neq j}}^{m}\delta_r K_{(\alpha_r,\beta_r)}(x_r)\,K_{(\alpha_j,\beta_j)}(x)\,d\mathbb{P}^{(m-1)},$$
where the integration is with respect to the product measure of the remaining $m-1$ independent copies $(X_r,Y_r,\delta_r)$, $r\neq j$, and $\mathbb{P}$ denotes the underlying probability distribution of $(X,Y,\delta)$. Note that the missingness indicators $\delta_r$ for $r\neq j$ are integrated out, while the indicator $\delta$ corresponding to the argument $(x,y)$ remains explicit. By independence of the observations,
$$n\,\mathbb{E}\big[(\widehat{U}_{n,1}-\theta_n)^{2}\big] = \mathbb{E}\big[\bar{\varphi}_n^{2}(X,Y,\delta)\big].$$
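Expanding the square and using the linearity of expectation,
$$\mathbb{E}\big[\bar{\varphi}_n^{2}\big] = \sum_{j=1}^{m}\sum_{l=1}^{m}\mathbb{E}\big[(\varphi_{n,j}-\theta_n)(\varphi_{n,l}-\theta_n)\big].$$
Since $\theta_n$ is deterministic and bounded (by the boundedness of $r^{(m)}(\varphi,\cdot)$ and the normalization), we have
$$\mathbb{E}\big[(\varphi_{n,j}-\theta_n)(\varphi_{n,l}-\theta_n)\big] = \mathbb{E}[\varphi_{n,j}\varphi_{n,l}]-\theta_n^{2}.$$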
Moreover, $\theta_n^{2} = O(1)$, and we will show that $\breve{b}^{d/2}\theta_n^{2}\to 0$ as $n\to\infty$ (since $\breve{b}\to 0$), so the contribution of $\theta_n^{2}$ is asymptotically negligible in the scaled variance. Consider two indices $j\neq l$ with $x_j\neq x_l$. Then,
$$\begin{aligned}
\mathbb{E}[\varphi_{n,j}\varphi_{n,l}] = \big(N^{(\mathrm{miss})}\big)^{-2}\int_{S_{d,1}^{2m}\times\mathbb{R}^{2qm}}&\varphi(y_1,\ldots,y_{j-1},y,y_{j+1},\ldots,y_m)\times\varphi(y_{m+1},\ldots,y_{m+l-1},y,y_{m+l+1},\ldots,y_{2m}) \\
&\times\prod_{\substack{r=1\\ r\neq j}}^{m}K_{(\alpha_r,\beta_r)}(x_r)\prod_{\substack{s=1\\ s\neq l}}^{m}K_{(\alpha_s,\beta_s)}(x_{m+s})\times K_{(\alpha_j,\beta_j)}(x)\,K_{(\alpha_l,\beta_l)}(x)\,d\mathbb{P}^{(2m)}.
\end{aligned}$$
The key observation is that the product $K_{(\alpha_j,\beta_j)}(x)K_{(\alpha_l,\beta_l)}(x)$ involves two kernels centered at different points $x_j$ and $x_l$. As $\breve{b}\to 0$, each kernel concentrates around its respective center. Since $x_j\neq x_l$, the effective supports of these kernels become asymptotically disjoint, and their product converges to zero in the sense of distributions. More precisely, under assumption (A.2) and the continuity of $f$,
$$\int_{S_{d,1}}K_{(\alpha_j,\beta_j)}(x)\,K_{(\alpha_l,\beta_l)}(x)\,f(x)\,dx\to 0\quad\text{as }\breve{b}\to 0.$$
Consequently,
$$\mathbb{E}[\varphi_{n,j}\varphi_{n,l}] = o(1),\qquad x_j\neq x_l.$$
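Now, suppose $x_j = x_l = x_0$. Then, both kernels are centered at the same point. Using a change of variables and the properties of the Dirichlet kernel, we have
$$\begin{aligned}
\mathbb{E}[\varphi_{n,j}\varphi_{n,l}] = \big(N^{(\mathrm{miss})}\big)^{-2}\int_{S_{d,1}^{2m}\times\mathbb{R}^{2qm}}&\varphi(\cdots)\,\varphi(\cdots)\times\prod_{\substack{r=1\\ r\neq j}}^{m}K_{(\alpha_r,\beta_r)}(x_r)\prod_{\substack{s=1\\ s\neq l}}^{m}K_{(\alpha_s,\beta_s)}(x_{m+s}) \\
&\times K_{(\alpha_j,\beta_j)}(x)\,K_{(\alpha_l,\beta_l)}(x)\,d\mathbb{P}^{(2m)}.
\end{aligned}$$
By independence and the law of large numbers for the kernel integrals,
$$\mathbb{E}[\varphi_{n,j}\varphi_{n,l}]\sim\big(N^{(\mathrm{miss})}\big)^{-2}\cdot\mathbb{E}\big[K_{(\alpha_j,\beta_j)}(X)\,K_{(\alpha_l,\beta_l)}(X)\big]\cdot r_{jl}(\tilde{x})\cdot\prod_{\substack{r=1\\ r\neq j}}^{m}\Big(\mathbb{E}\big[K_{(\alpha_r,\beta_r)}(X)\big]\Big)^{2},$$
where the higher-order terms are negligible. Using the fact that
$$N^{(\mathrm{miss})} = \prod_{r=1}^{m}\mathbb{E}\big[p(X)\,K_{(\alpha_r,\beta_r)}(X)\big],$$
we obtain
$$\mathbb{E}[\varphi_{n,j}\varphi_{n,l}]\sim r_{jl}(\tilde{x})\cdot\frac{\mathbb{E}\big[K_{(\alpha_j,\beta_j)}(X)\,K_{(\alpha_l,\beta_l)}(X)\big]}{\Big(\mathbb{E}\big[p(X)\,K_{(\alpha_j,\beta_j)}(X)\big]\Big)^{2}}.$$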
For a fixed $x_0 \in S_{d,1}$, we analyze the ratio
\[ R_n := \frac{\mathbb{E}\bigl[K_{(\alpha,\beta)}(X)^{2}\bigr]}{\bigl(\mathbb{E}[p(X)K_{(\alpha,\beta)}(X)]\bigr)^{2}}. \]
Using the probabilistic representation with Dirichlet random vectors, we have
\[ \mathbb{E}[p(X)K_{(\alpha,\beta)}(X)] = \mathbb{E}[p(\xi_{x_0})f(\xi_{x_0})], \]
where $\xi_{x_0} \sim \mathrm{Dirichlet}(\alpha, \beta)$. Similarly,
\[ \mathbb{E}\bigl[K_{(\alpha,\beta)}(X)^{2}\bigr] = A_b(x_0)\,\mathbb{E}[f(\gamma_{x_0})], \]
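with $\gamma_{x_0} \sim \mathrm{Dirichlet}(2\alpha, 2\beta)$ and
\[ A_b(x_0) = \frac{\Gamma\bigl(2(1-\|x_0\|_1)/\breve{b} + 1\bigr)\prod_{i=1}^{d}\Gamma\bigl(2x_{0i}/\breve{b} + 1\bigr)}{\Gamma^{2}\bigl((1-\|x_0\|_1)/\breve{b} + 1\bigr)\prod_{i=1}^{d}\Gamma^{2}\bigl(x_{0i}/\breve{b} + 1\bigr)} \cdot \frac{\Gamma^{2}\bigl(1/\breve{b} + d + 1\bigr)}{\Gamma\bigl(2/\breve{b} + d + 1\bigr)}. \]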
By the continuity of $f$ and $p$, and using the concentration properties of the Dirichlet distribution, we have
\[ \mathbb{E}[p(\xi_{x_0})f(\xi_{x_0})] = p(x_0)f(x_0) + O(\breve{b}^{1/2}), \]
and
\[ \mathbb{E}[f(\gamma_{x_0})] = f(x_0) + O(\breve{b}^{1/2}). \]
Moreover, using Stirling’s approximation for the gamma function, one can show that
A b ( x 0 ) 1 b ˘ d / 2 · 1 ( 2 π ) d / 2 i = 1 d x 0 i ( 1 x 0 1 ) as b ˘ 0 .
Consequently,
R n A b ( x 0 ) f ( x 0 ) p ( x 0 ) 2 f ( x 0 ) 2 1 b ˘ d / 2 · 1 p ( x 0 ) 2 f ( x 0 ) · 1 ( 2 π ) d / 2 i = 1 d x 0 i ( 1 x 0 1 ) .
However, note that the integral of the squared kernel satisfies
S d , 1 K ( α , β ) 2 ( x , t ) d t 1 b ˘ d / 2 · 1 ( 2 π ) d / 2 i = 1 d x 0 i ( 1 x 0 1 ) as b ˘ 0 .
Therefore,
R n 1 p ( x 0 ) 2 f ( x 0 ) S d , 1 K ( α , β ) 2 ( x , t ) d t .
Returning to the expression for E [ φ n , j φ n , l ] , we obtain
E [ φ n , j φ n , l ] r j l ( x ˜ ) · 1 p ( x 0 ) 2 f ( x 0 ) S d , 1 K ( α , β ) 2 ( x , t ) d t .
Since θ n 2 = O ( 1 ) and b ˘ d / 2 0 , we have b ˘ d / 2 θ n 2 0 . Therefore,
lim n E n b ˘ d / 2 ( U ^ n , 1 θ n ) 2 = lim n b ˘ d / 2 E [ φ ¯ n 2 ] = lim n b ˘ d / 2 j = 1 m l = 1 m E [ φ n , j φ n , l ] = j = 1 m l = 1 m 1 { x j = x l } r j l ( x ˜ ) 1 p ( x j ) 2 f ( x j ) K ( α , β ) 2 ( x , t ) d t · lim n b ˘ d / 2 · 1 b ˘ d / 2 = i = 1 m j = 1 m 1 { x i = x j } r i j ( x ˜ ) 1 p ( x i ) 2 f ( x i ) K ( α , β ) 2 ( x , t ) d t .
This matches the expression in (12.40) after noting that the factor 1 / p ( x i ) 2 in the variance of the projection will later combine with the factor p ( x i ) 2 from the normalization to yield the final variance σ 2 ( φ ) as defined. The cancellation occurs because U n , 1 includes the factor ( N ( miss ) ) 1 , which contains p ( x j ) in its denominator. This completes the proof of part (i).
To establish part (ii), we verify Lyapunov's condition for the triangular array $\{n^{-1/2}\bar{\varphi}_n(X_i, Y_i, \delta_i)\}_{i=1}^{n}$. It suffices to show that
\[ \frac{1}{n^{1/2}(\breve{b}^{-d/2})^{3/2}}\,\mathbb{E}\bigl|\bar{\varphi}_n(X, Y, \delta)\bigr|^{3} \to 0 \quad \text{as } n \to \infty. \]
Using the inequality $|a - b|^{3} \le 3\bigl(|a|^{3} + |a|^{2}|b| + |a||b|^{2} + |b|^{3}\bigr)$, we obtain
\[ \mathbb{E}\bigl[|\bar{\varphi}_n|^{3}\bigr] \le C \sum_{i,j,l=1}^{m} \mathbb{E}\bigl|\varphi_{n,i}\varphi_{n,j}\varphi_{n,l}\bigr| + O(1), \]
where $C$ is an absolute constant. By symmetry and the boundedness of $\theta_n$, the dominant contributions come from triples $(i, j, l)$ where $x_i = x_j = x_l$. Under assumption (A.5), we have
\[ \mathbb{E}\bigl|\varphi_{n,i}\varphi_{n,j}\varphi_{n,l}\bigr| = O\bigl(\breve{b}^{-d}\bigr). \]
Therefore,
\[ \frac{1}{n^{1/2}(\breve{b}^{-d/2})^{3/2}}\,\mathbb{E}\bigl[|\bar{\varphi}_n|^{3}\bigr] = O\Bigl(\frac{1}{n^{1/2}\breve{b}^{-3d/4}} \cdot \breve{b}^{-d}\Bigr) = O\Bigl(\frac{1}{n^{1/2}\breve{b}^{d/4}}\Bigr). \]
Under the bandwidth condition $\breve{b}^{-d} = o(n)$ (equivalently, $n\breve{b}^{d} \to \infty$), we have $n^{-1/2}\breve{b}^{-d/4} = (n^{2}\breve{b}^{d})^{-1/4} \to 0$. Hence, Lyapunov's condition is satisfied. By the Lyapunov central limit theorem for triangular arrays, we conclude that
\[ \sqrt{n\breve{b}^{d/2}}\,\bigl(\widehat{U}_{n,1} - \theta_n\bigr) \xrightarrow{\;\mathcal{D}\;} N\bigl(0, \sigma^{2}(\varphi)\bigr), \]
which proves part (ii). □
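The decay of the Lyapunov ratio $n^{-1/2}\breve{b}^{-d/4}$ can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical polynomial bandwidth $\breve{b} = n^{-a}$ (the proof does not prescribe this form); the function name and the choice $a = 1/(d+1)$ are illustrative only.

```python
import math


def lyapunov_ratio(n: int, d: int, a: float) -> float:
    """Return 1 / (n^{1/2} * b^{d/4}) for the hypothetical bandwidth b = n^{-a}."""
    b = n ** (-a)
    return 1.0 / (math.sqrt(n) * b ** (d / 4.0))


if __name__ == "__main__":
    d = 3
    a = 1.0 / (d + 1)  # illustrative: n * b^d = n^{1/(d+1)} diverges
    for n in (10**2, 10**4, 10**6, 10**8):
        print(n, lyapunov_ratio(n, d, a))
```

With this bandwidth the ratio equals $n^{-1/2 + ad/4}$, which tends to zero whenever $ad < 2$, in particular whenever $n\breve{b}^{d} \to \infty$.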
Proof of Lemma 2.
To study the asymptotic distribution of $U_n$, we need to bound the variance of $U_n - \widehat{U}_{n,1}$. To do that, it is sufficient to show that
\[ \bigl(n\breve{b}^{d/2}\bigr)^{1/2}\bigl(U_{n,1} - \widehat{U}_{n,1}\bigr) \to 0 \quad \text{in } L^{2}. \]
As in [60], we use the variance formula for a centered (zero-mean) U-statistic of degree $m$. For i.i.d. $Z_i$, $i \ge 1$, let
\[ V_n = \frac{(n-m)!}{n!} \sum_{i \in I(m,n)} \frac{\tilde{G}(Z_{i_1}, \ldots, Z_{i_m})}{N}, \]
with a not necessarily symmetric, square-integrable U-kernel $\tilde{G}(\cdot)$. This gives
\[ \operatorname{Var}(V_n) = \Bigl(\frac{(n-m)!}{n!}\Bigr)^{2} \sum_{r=1}^{m} \frac{(n-r)!}{(n-2m+r)!} \sum\nolimits_{(r)} \frac{I(\Delta_1, \Delta_2)}{N^{2}}, \]
where $\Delta_1$ and $\Delta_2$ represent sets of positions of common length $1 \le r \le m$, and
\[ I(\tilde{\Delta}_1, \tilde{\Delta}_2) = \int \tilde{G}(z_1, \ldots, z_m)\,\tilde{G}(y_1, \ldots, y_m)\,F(dz_1)\cdots F(dz_{2m-r}), \]
where the $y$'s in the positions $\tilde{\Delta}_2$ coincide with the $z$'s in the positions $\tilde{\Delta}_1$ and are taken from $z_{m+1}, \ldots, z_{2m-r}$ otherwise. Moreover, $\Sigma_{(r)}$ denotes the summation over all pairs of positions $(\tilde{\Delta}_1, \tilde{\Delta}_2)$ of cardinality $r$, and $F(\cdot)$ denotes the common distribution function of the $Z$'s. Taking $V_n = U_n - \widehat{U}_n$, and recalling $\tilde{G}$ from [177] (in the symmetric case), we obtain
\[ \sum\nolimits_{(1)} I(\tilde{\Delta}_1, \tilde{\Delta}_2) = 0. \]
Furthermore, by (A.6), we infer that
\[ N^{-2}\,I(\tilde{\Delta}_1, \tilde{\Delta}_2) = O\bigl(\breve{b}^{-dr/2}\bigr) \quad \text{for each } 2 \le r \le m. \]
In conclusion, we have
\[ n\breve{b}^{d/2}\operatorname{Var}\bigl(U_n - \widehat{U}_n\bigr) = O\Bigl(n\breve{b}^{d/2} \sum_{r=2}^{m} \binom{n}{m}^{-1}\binom{m}{r}\binom{n-m}{m-r}\bigl(\breve{b}^{d/2}\bigr)^{-r}\Bigr) = O\Bigl(\sum_{r=2}^{m} \bigl(n\breve{b}^{d/2}\bigr)^{1-r}\Bigr) = O\Bigl(\bigl(n\breve{b}^{d/2}\bigr)^{-1}\Bigr) = o(1). \]
Hence, the proof is complete. □
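The order of the remainder in the proof of Lemma 2 can be checked numerically. The sketch below assumes that the combinatorial weight attached to overlaps of size $r$ is the usual probability $\binom{n}{m}^{-1}\binom{m}{r}\binom{n-m}{m-r}$ that two $m$-subsets of $\{1, \ldots, n\}$ share exactly $r$ indices, which is of order $n^{-r}$; the function names and numerical values are hypothetical.

```python
import math


def overlap_weight(n: int, m: int, r: int) -> float:
    """Probability that two m-subsets of {1..n} share exactly r indices."""
    return math.comb(m, r) * math.comb(n - m, m - r) / math.comb(n, m)


def scaled_variance_bound(n: int, b: float, d: int, m: int) -> float:
    """Sum over r = 2..m of (n * b^{d/2})^{1-r}, the order of the scaled variance."""
    x = n * b ** (d / 2.0)
    return sum(x ** (1 - r) for r in range(2, m + 1))


if __name__ == "__main__":
    n, m = 1000, 3
    for r in range(1, m + 1):
        # n^r * weight stays bounded, i.e. the weight is O(n^{-r})
        print(r, n**r * overlap_weight(n, m, r))
    print(scaled_variance_bound(10**6, 0.01, 2, 4))
```

The sum is dominated by its $r = 2$ term, so it behaves like $(n\breve{b}^{d/2})^{-1}$ once $n\breve{b}^{d/2}$ is large.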
Proof of Theorem 5.
Throughout this proof, we operate under the MAR assumption (2.4) with the positivity condition inf x X p ( x ) c > 0 , and the regularity conditions (A.1)–(A.4) and (C.2) as specified in Theorem 5. All quantities U n , 1 ( miss ) ( · , x ˜ ) are understood to incorporate the missingness indicators via the kernel definitions in (3.5) and (3.6). For notational simplicity, we drop the superscript (miss) throughout the proof. The proof proceeds in three main stages. First, we establish the joint asymptotic normality of the vector
V n : = U n , 1 ( φ 1 , x ˜ ) θ n ( φ 1 ) U n , 1 ( φ 2 , x ˜ ) θ n ( φ 2 )
using the Cramér–Wold device together with Lemma 2. Second, we apply the delta method to the differentiable transformation g ( x 1 , x 2 ) = x 1 / x 2 to obtain the asymptotic distribution of the ratio U n , 1 ( φ , x ˜ ) / U n , 1 ( 1 , x ˜ ) . Third, we identify the asymptotic variance ρ 2 and conclude the proof. Let c 1 , c 2 R be arbitrary constants. Consider the linear combination
c 1 U n , 1 ( φ 1 , x ˜ ) + c 2 U n , 1 ( φ 2 , x ˜ ) .
By the linearity of the U-statistic in the kernel function, we have
c 1 U n , 1 ( φ 1 , x ˜ ) + c 2 U n , 1 ( φ 2 , x ˜ ) = U n , 1 c 1 φ 1 + c 2 φ 2 , x ˜ = : U n , 1 ( φ , x ˜ ) ,
where φ : = c 1 φ 1 + c 2 φ 2 . This identity holds because the normalization factor N ( miss ) is common to all U n , 1 and the kernel is linear in φ . Indeed, from the definition
U n , 1 ( φ , x ˜ ) = u n , 1 ( miss ) ( φ , x ˜ ) N ( miss ) ,
and u n , 1 ( miss ) ( · , x ˜ ) is linear in its argument, we obtain
u n , 1 ( miss ) ( c 1 φ 1 + c 2 φ 2 , x ˜ ) = c 1 u n , 1 ( miss ) ( φ 1 , x ˜ ) + c 2 u n , 1 ( miss ) ( φ 2 , x ˜ ) .
Dividing by N ( miss ) yields the claimed linearity. Since φ = c 1 φ 1 + c 2 φ 2 satisfies the same regularity conditions as φ 1 and φ 2 (by linearity of the assumptions), Lemma 2 applies. Consequently, as n ,
n b ˘ d / 2 U n , 1 ( φ , x ˜ ) θ n ( φ ) D N 0 , σ 2 ( φ ) ,
where σ 2 ( φ ) is defined in (12.40). By the linearity of θ n ( · ) (which follows from the linearity of the expectation and the definition of θ n ), we have
θ n ( φ ) = c 1 θ n ( φ 1 ) + c 2 θ n ( φ 2 ) .
Moreover, the quadratic form σ 2 ( φ ) expands as
σ 2 ( φ ) = c 1 2 σ 2 ( φ 1 ) + 2 c 1 c 2 σ 2 ( φ 1 , φ 2 ) + c 2 2 σ 2 ( φ 2 ) ,
where σ 2 ( φ 1 , φ 2 ) denotes the asymptotic covariance between U n , 1 ( φ 1 , x ˜ ) and U n , 1 ( φ 2 , x ˜ ) , given explicitly in Lemma 3. The Cramér–Wold device (see, e.g., [188]) states that the joint convergence of the vector V n follows from the convergence of all linear combinations. Since we have shown that for any ( c 1 , c 2 ) R 2 ,
n b ˘ d / 2 c 1 ( U n , 1 ( φ 1 , x ˜ ) θ n ( φ 1 ) ) + c 2 ( U n , 1 ( φ 2 , x ˜ ) θ n ( φ 2 ) ) D N 0 , c Σ c ,
where c = ( c 1 , c 2 ) and Σ is the covariance matrix defined in Lemma 3, we conclude that
n b ˘ d / 2 V n D N ( 0 , Σ ) .
Recall from (3.6) that the estimator of interest admits the representation
r ^ n , 1 ( m ) ( φ , x ˜ ; Λ ¯ n , 1 ( x ˜ ) ) = U n , 1 ( φ , x ˜ ) U n , 1 ( 1 , x ˜ ) .
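Define the function $g : \mathbb{R}^{2} \setminus \{x_2 = 0\} \to \mathbb{R}$ by $g(x_1, x_2) = x_1 / x_2$. This function is continuously differentiable on its domain with gradient
\[ \nabla g(x_1, x_2) = \Bigl(\frac{\partial g}{\partial x_1}, \frac{\partial g}{\partial x_2}\Bigr) = \Bigl(\frac{1}{x_2}, -\frac{x_1}{x_2^{2}}\Bigr). \]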
Under the MAR assumption and the positivity condition, together with the bias expansion established in Theorem 4, we have
E U n , 1 ( 1 , x ˜ ) = 1 + O ( b ˘ 1 / 2 ) ,
and by the uniform consistency result of Theorem 3,
U n , 1 ( 1 , x ˜ ) P 1 as n .
In particular, U n , 1 ( 1 , x ˜ ) is bounded away from zero with probability tending to one. Therefore, the delta method is applicable. Let μ n : = ( E [ U n , 1 ( φ , x ˜ ) ] , E [ U n , 1 ( 1 , x ˜ ) ] ) . By Lemma 3 and the consistency of U n , 1 ( 1 , x ˜ ) , we have
n b ˘ d / 2 U n , 1 ( φ , x ˜ ) E [ U n , 1 ( φ , x ˜ ) ] U n , 1 ( 1 , x ˜ ) E [ U n , 1 ( 1 , x ˜ ) ] D N ( 0 , Σ ) ,
where Σ is the asymptotic covariance matrix from Lemma 3 with φ 1 = φ and φ 2 1 . Applying the delta method (see, e.g., [189]), we obtain
n b ˘ d / 2 g U n , 1 ( φ , x ˜ ) , U n , 1 ( 1 , x ˜ ) g E [ U n , 1 ( φ , x ˜ ) ] , E [ U n , 1 ( 1 , x ˜ ) ] D N 0 , g ( μ ) Σ g ( μ ) ,
where μ = lim n μ n = ( r ( m ) ( φ , x ˜ ) , 1 ) , provided this limit exists. The convergence of μ n follows from the bias expansions in Theorem 4:
E [ U n , 1 ( φ , x ˜ ) ] r ( m ) ( φ , x ˜ ) ,             E [ U n , 1 ( 1 , x ˜ ) ] 1 .
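Evaluating the gradient at the limit point $\mu = (r, 1)$ with $r := r^{(m)}(\varphi, \tilde{x})$, we have
\[ \nabla g(r, 1) = \Bigl(\frac{1}{1}, -\frac{r}{1^{2}}\Bigr) = (1, -r). \]
Therefore, the asymptotic variance is given by
\[ \rho^{2} := \nabla g(\mu)^{\top}\,\Sigma\,\nabla g(\mu) = (1, -r) \begin{pmatrix} \sigma^{2}(\varphi) & \sigma^{2}(\varphi, 1) \\ \sigma^{2}(\varphi, 1) & \sigma^{2}(1) \end{pmatrix} \begin{pmatrix} 1 \\ -r \end{pmatrix}. \]
Expanding this quadratic form yields
\[ \rho^{2} = \sigma^{2}(\varphi) - 2r\,\sigma^{2}(\varphi, 1) + r^{2}\sigma^{2}(1). \]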
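The expansion of the delta-method quadratic form can be verified numerically. This is a minimal sketch with hypothetical function names and arbitrary illustrative entries for the covariance matrix; it only checks the algebra of $(1, -r)\,\Sigma\,(1, -r)^{\top}$, not the values of the variance components themselves.

```python
def ratio_variance(r: float, s11: float, s12: float, s22: float) -> float:
    """Quadratic form grad^T Sigma grad with grad = (1, -r), the gradient of x1/x2 at (r, 1)."""
    grad = (1.0, -r)
    sigma = ((s11, s12), (s12, s22))
    return sum(grad[i] * sigma[i][j] * grad[j] for i in range(2) for j in range(2))


def ratio_variance_expanded(r: float, s11: float, s12: float, s22: float) -> float:
    """Expanded form s11 - 2 r s12 + r^2 s22."""
    return s11 - 2.0 * r * s12 + r * r * s22


if __name__ == "__main__":
    print(ratio_variance(0.7, 2.0, 0.3, 1.5))
    print(ratio_variance_expanded(0.7, 2.0, 0.3, 1.5))
```

When the cross term and the variance of the constant kernel both vanish, the quadratic form reduces to its first entry.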
From Lemma 3 and the expression for σ 2 ( · , · ) given therein, we have:
  • σ 2 ( φ ) is given by (12.40) with r i j ( x ˜ ) replaced by E [ φ ( ) φ ( ) ] ;
  • σ 2 ( 1 ) corresponds to the variance of the constant kernel, which is zero because U n , 1 ( 1 , x ˜ ) converges to a constant (in fact, σ 2 ( 1 ) = 0 );
  • σ 2 ( φ , 1 ) represents the covariance between U n , 1 ( φ , x ˜ ) and U n , 1 ( 1 , x ˜ ) , which vanishes asymptotically because U n , 1 ( 1 , x ˜ ) converges to a non-random constant.
More rigorously, using the expression from Lemma 3 with g 1 = φ and g 2 1 , we have for i j or when the cross-terms vanish, the contributions to σ 2 ( φ , 1 ) are zero because the kernel for g 2 1 integrates to a constant. Consequently,
σ 2 ( φ , 1 ) = 0 ,             σ 2 ( 1 ) = 0 .
Thus,
ρ 2 = σ 2 ( φ ) .
Putting everything together, we have established that
n b ˘ d / 2 r ^ n , 1 ( m ) ( φ , x ˜ ; Λ ¯ n , 1 ( x ˜ ) ) E [ U n , 1 ( φ , x ˜ ) ] D N 0 , ρ 2 ,
where ρ 2 = σ 2 ( φ ) is given explicitly by (3.19). This completes the proof of Theorem 5. □
Remark 23.
Under the MAR assumption, the asymptotic variance ρ 2 incorporates the propensity score p ( · ) through the factor 1 / p ( x i ) as shown in (3.19). This reflects the increased variability due to missing responses. However, in the statement of Theorem 5, we have presented ρ 2 in the simplified form (3.19) for readability, with the understanding that the missing data adaptation is as given in (3.19). The proof remains valid with this substitution, as the delta method and the Cramér–Wold device are unaffected by the specific form of the variance, provided the joint asymptotic normality holds.

13. Proof of the Results of Section 4: Bernstein Polynomials

Proof of Theorem 6.
Throughout this proof, we operate under the MAR assumption (2.4) with the positivity condition
\[ \inf_{x \in S_{d,1}} p(x) \ge c > 0, \]
and the regularity conditions specified in Theorem 6. All estimators are understood to be the complete-case versions incorporating the missingness indicators $\delta_i$ as defined in (3.8) and (3.9). For notational brevity, we write $\hat{g}_{n,\vartheta}(\varphi, x)$ and $\hat{f}_{n,\vartheta}(x)$ in place of $\hat{g}^{(\mathrm{miss})}_{n,\vartheta}(\varphi, x)$ and $\hat{f}^{(\mathrm{miss})}_{n,\vartheta}(x)$, respectively, with the implicit understanding that the missingness indicators are included. To establish Theorem 6, we must demonstrate the following two fundamental estimates:
\[ \sup_{x \in S_{d,1}} \bigl|\mathbb{E}\,\hat{r}^{(1)}_{n,2}(\varphi, x) - r^{(1)}(\varphi, x)\bigr| = O(\vartheta^{-1/2}), \tag{13.1} \]
\[ \sup_{x \in S_{d,1}} \bigl|\hat{r}^{(1)}_{n,2}(\varphi, x) - \mathbb{E}\,\hat{r}^{(1)}_{n,2}(\varphi, x)\bigr| = O\bigl(\vartheta^{d-1/2}(n^{-1}\log n)^{1/2}\bigr) \quad \text{a.s.} \tag{13.2} \]
The following proposition provides the bias expansion for the complete-case numerator g ^ n , ϑ ( φ , x ) under the MAR assumption.
Proposition 2.
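Assume that condition (C.2) holds and that the MAR assumption (2.4) is satisfied with $\inf_{x \in S_{d,1}} p(x) \ge c > 0$. Then, uniformly for $x \in S_{d,1}$,
\[ \mathbb{E}\,\hat{g}_{n,\vartheta}(\varphi, x) = R(\varphi, x)p(x) + \vartheta^{-1}L(x) + o(\vartheta^{-1}), \qquad \vartheta \to \infty, \]
where
\[ L(x) := \frac{d(d-1)}{2}\,R(\varphi, x)p(x) + \sum_{i=1}^{d}\Bigl(\frac{1}{2} - x_i\Bigr)\frac{\partial}{\partial x_i}\bigl[R(\varphi, x)p(x)\bigr] + \frac{1}{2}\sum_{i,j=1}^{d}\bigl(x_i\mathbb{1}_{\{i=j\}} - x_i x_j\bigr)\frac{\partial^{2}}{\partial x_i\,\partial x_j}\bigl[R(\varphi, x)p(x)\bigr]. \]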
Proof of Proposition 2.
By definition of the complete-case estimator under MAR,
\[ \mathbb{E}\,\hat{g}_{n,\vartheta}(\varphi, x) = \mathbb{E}\bigl[\varphi(Y_1)\,\delta_1\,K_{x,\vartheta}(X_1)\bigr]. \]
Applying the tower property and the MAR assumption (2.4) yields
\[ \mathbb{E}\bigl[\varphi(Y_1)\,\delta_1\,K_{x,\vartheta}(X_1)\bigr] = \mathbb{E}\bigl[p(X_1)\,\varphi(Y_1)\,K_{x,\vartheta}(X_1)\bigr] = \int_{S_{d,1}} r^{(1)}(\varphi, u)\,p(u)\,K_{x,\vartheta}(u)\,f(u)\,du = \int_{S_{d,1}} R(\varphi, u)\,p(u)\,K_{x,\vartheta}(u)\,du. \]
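The Bernstein kernel admits the representation (see [129])
\[ K_{x,\vartheta}(u) = \frac{(\vartheta - 1 + d)!}{(\vartheta - 1)!} \sum_{k \in \mathbb{N}_0^{d} \cap (\vartheta - 1)S_{d,1}} \mathbb{1}_{\left(\frac{k}{\vartheta}, \frac{k+1}{\vartheta}\right]}(u)\,P_{k,\vartheta-1}(x), \]
where $P_{k,\vartheta-1}(x)$ are multinomial probabilities. Substituting this representation gives
\[ \mathbb{E}\,\hat{g}_{n,\vartheta}(\varphi, x) = \frac{(\vartheta - 1 + d)!}{(\vartheta - 1)!} \sum_{k \in \mathbb{N}_0^{d} \cap (\vartheta - 1)S_{d,1}} P_{k,\vartheta-1}(x) \int_{\left(\frac{k}{\vartheta}, \frac{k+1}{\vartheta}\right]} R(\varphi, u)\,p(u)\,du. \]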
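A second-order Taylor expansion of $R(\varphi, u)p(u)$ around $u = k/\vartheta$ yields, for any $k$ such that $\|k/\vartheta - x\|_1 = o(1)$,
\[
\begin{aligned}
\vartheta^{d} \int_{\left(\frac{k}{\vartheta}, \frac{k+1}{\vartheta}\right]} R(\varphi, u)\,p(u)\,du
&= R(\varphi, k/\vartheta)\,p(k/\vartheta) + \frac{1}{2\vartheta}\sum_{i=1}^{d}\frac{\partial}{\partial u_i}\bigl[R(\varphi, \cdot)p(\cdot)\bigr](k/\vartheta) + O(\vartheta^{-2}) \\
&= R(\varphi, x)p(x) + \frac{1}{\vartheta}\sum_{i=1}^{d}(k_i - \vartheta x_i)\frac{\partial}{\partial x_i}\bigl[R(\varphi, x)p(x)\bigr] + \frac{1}{2\vartheta}\sum_{i=1}^{d}\frac{\partial}{\partial x_i}\bigl[R(\varphi, x)p(x)\bigr] \\
&\quad + \frac{1}{2\vartheta^{2}}\sum_{i,j=1}^{d}(k_i - \vartheta x_i)(k_j - \vartheta x_j)\frac{\partial^{2}}{\partial x_i\,\partial x_j}\bigl[R(\varphi, x)p(x)\bigr](1 + o(1)) + o(\vartheta^{-1}).
\end{aligned}
\]
Multiplying by $\vartheta^{-d}\cdot\frac{(\vartheta - 1 + d)!}{(\vartheta - 1)!}\,P_{k,\vartheta-1}(x)$ and summing over all $k$, we invoke the well-known identities for multinomial distributions:
\[ \sum_{k \in \mathbb{N}_0^{d} \cap \vartheta S_{d,1}} \Bigl(\frac{k_i}{\vartheta} - x_i\Bigr)P_{k,\vartheta}(x) = 0, \tag{13.3} \]
\[ \sum_{k \in \mathbb{N}_0^{d} \cap \vartheta S_{d,1}} \Bigl(\frac{k_i}{\vartheta} - x_i\Bigr)\Bigl(\frac{k_j}{\vartheta} - x_j\Bigr)P_{k,\vartheta}(x) = \frac{1}{\vartheta}\bigl(x_i\mathbb{1}_{\{i=j\}} - x_i x_j\bigr). \tag{13.4} \]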
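The two multinomial identities can be confirmed by direct enumeration over the lattice points of the dilated simplex. The sketch below assumes the standard form $P_{k,\vartheta}(x) = \frac{\vartheta!}{(\vartheta - \|k\|_1)!\,\prod_i k_i!}\,(1 - \|x\|_1)^{\vartheta - \|k\|_1}\prod_i x_i^{k_i}$ for the multinomial probabilities, which is not restated in this section; all function names are hypothetical.

```python
import itertools
import math


def multinomial_pmf(k, theta, x):
    """Multinomial probability P_{k,theta}(x) on the simplex (assumed standard form)."""
    rest = theta - sum(k)
    denom = math.factorial(rest) * math.prod(math.factorial(ki) for ki in k)
    coef = math.factorial(theta) // denom  # exact: multinomial coefficients are integers
    prob = float(coef) * (1.0 - sum(x)) ** rest
    for ki, xi in zip(k, x):
        prob *= xi ** ki
    return prob


def centered_moments(theta, x):
    """First and second centered moments of k/theta under P_{k,theta}(x)."""
    d = len(x)
    mean = [0.0] * d
    cov = [[0.0] * d for _ in range(d)]
    for k in itertools.product(range(theta + 1), repeat=d):
        if sum(k) > theta:
            continue  # keep only lattice points in the dilated simplex
        p = multinomial_pmf(k, theta, x)
        dev = [ki / theta - xi for ki, xi in zip(k, x)]
        for i in range(d):
            mean[i] += dev[i] * p
            for j in range(d):
                cov[i][j] += dev[i] * dev[j] * p
    return mean, cov


if __name__ == "__main__":
    mean, cov = centered_moments(12, (0.2, 0.5))
    print(mean)  # numerically zero
    print(cov)   # close to (x_i 1{i=j} - x_i x_j) / theta
```

Brute-force enumeration is only practical for small $\vartheta$ and $d$, which is enough for a sanity check of the identities.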
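Using (13.3) and (13.4), we obtain
\[ \mathbb{E}\,\hat{g}_{n,\vartheta}(\varphi, x) = \Bigl(1 + \frac{d(d-1)}{2\vartheta}\Bigr)R(\varphi, x)p(x) + \frac{1}{\vartheta}\sum_{i=1}^{d}\Bigl(\frac{1}{2} - x_i\Bigr)\frac{\partial}{\partial x_i}\bigl[R(\varphi, x)p(x)\bigr] + \frac{1}{2\vartheta}\sum_{i,j=1}^{d}\bigl(x_i\mathbb{1}_{\{i=j\}} - x_i x_j\bigr)\frac{\partial^{2}}{\partial x_i\,\partial x_j}\bigl[R(\varphi, x)p(x)\bigr] + o(\vartheta^{-1}), \]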
which completes the proof of Proposition 2. □
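The leading factor $1 + d(d-1)/(2\vartheta)$ in the expansion comes from the normalizing constant of the Bernstein kernel, since $\vartheta^{-d}(\vartheta - 1 + d)!/(\vartheta - 1)! = \prod_{i=0}^{d-1}(1 + i/\vartheta)$. The following sketch checks this first-order approximation numerically; the function names are hypothetical.

```python
import math


def normalizing_factor(theta: int, d: int) -> float:
    """theta^{-d} * (theta - 1 + d)! / (theta - 1)!, written as a product of d terms."""
    return math.prod(1.0 + i / theta for i in range(d))


def normalizing_factor_exact(theta: int, d: int) -> float:
    """Same quantity computed from exact integer factorials."""
    ratio = math.factorial(theta - 1 + d) // math.factorial(theta - 1)
    return ratio / theta**d


def first_order(theta: int, d: int) -> float:
    """First-order approximation 1 + d(d-1)/(2 theta)."""
    return 1.0 + d * (d - 1) / (2.0 * theta)


if __name__ == "__main__":
    for theta in (10, 100, 1000):
        print(theta, normalizing_factor(theta, 4), first_order(theta, 4))
```

The gap between the exact factor and its first-order approximation is of order $\vartheta^{-2}$, consistent with the $o(\vartheta^{-1})$ remainder.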
From Proposition 2, we have
\[ \mathbb{E}\,\hat{g}_{n,\vartheta}(\varphi, x) = R(\varphi, x)p(x) + O(\vartheta^{-1}). \]
Using only the Lipschitz continuity of $R(\varphi, \cdot)p(\cdot)$ (which follows from (C.2) and the continuity of $p(\cdot)$), we obtain the cruder but uniform estimate
\[ \sup_{x \in S_{d,1}} \bigl|\mathbb{E}\,\hat{g}_{n,\vartheta}(\varphi, x) - R(\varphi, x)p(x)\bigr| = O(\vartheta^{-1/2}). \]
For the denominator, applying (13) with φ 1 yields
sup x S d , 1 E f ^ n , ϑ ( x ) f ( x ) p ( x ) = O ( ϑ 1 / 2 ) .
Under the positivity condition, f ( x ) p ( x ) is uniformly bounded away from zero on S d , 1 . Consequently,
inf x S d , 1 E f ^ n , ϑ ( x ) c 0 2 > 0 for sufficiently large ϑ ,
for some constant c 0 > 0 . Now, consider the decomposition
E [ g ^ n , ϑ ( φ , x ) ] E [ f ^ n , ϑ ( x ) ] r ( 1 ) ( φ , x ) = E [ g ^ n , ϑ ( φ , x ) ] r ( 1 ) ( φ , x ) E [ f ^ n , ϑ ( x ) ] E [ f ^ n , ϑ ( x ) ] .
Using (13) and the analogous estimate for f ^ n , ϑ , we obtain
sup x S d , 1 E r ^ n , 2 ( 1 ) ( φ , x ) r ( 1 ) ( φ , x ) = O ( ϑ 1 / 2 ) ,
which establishes (13.1). Observe that
g ^ n , ϑ ( φ , x ) E g ^ n , ϑ ( φ , x ) = ( ϑ 1 + d ) ! ( ϑ 1 ) ! 1 n i = 1 n Z i , ϑ ( x ) ,
where
Z i , ϑ ( x ) : = k N 0 d ( ϑ 1 ) S d , 1 φ ( Y i ) δ i 1 k ϑ , k + 1 ϑ ( X i ) k ϑ , k + 1 ϑ R ( φ , u ) p ( u ) d u P k , ϑ 1 ( x ) .
Let ω n , 2 : = ξ n 1 / ( 1 + γ ) where ξ n will be chosen later. Define the truncated and remainder parts of φ :
φ ( T ) ( y ) : = φ ( y ) 1 { | φ ( y ) | ω n , 2 } ,             φ ( R ) ( y ) : = φ ( y ) 1 { | φ ( y ) | > ω n , 2 } .
Correspondingly, Z i , ϑ ( x ) = Z i , ϑ ( T ) ( x ) + Z i , ϑ ( R ) ( x ) , where Z i , ϑ ( T ) ( x ) and Z i , ϑ ( R ) ( x ) are defined with φ ( T ) and φ ( R ) , respectively. Hence,
g ^ n , ϑ ( φ , x ) E g ^ n , ϑ ( φ , x ) = g ^ n , ϑ ( φ ( T ) , x ) E [ g ^ n , ϑ ( φ ( T ) , x ) ] = : T 1 ( x ) + g ^ n , ϑ ( φ ( R ) , x ) E [ g ^ n , ϑ ( φ ( R ) , x ) ] = : T 2 ( x ) .
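Under condition (C.3) with exponent $\gamma > 0$, we have
\[ \mathbb{E}\bigl|\varphi^{(R)}(Y)\bigr| = \mathbb{E}\bigl[|\varphi(Y)|\,\mathbb{1}_{\{|\varphi(Y)| > \omega_{n,2}\}}\bigr] \le \omega_{n,2}^{-\gamma}\,\mathbb{E}|\varphi(Y)|^{1+\gamma} = O(\omega_{n,2}^{-\gamma}). \]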
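The tail bound rests on the pointwise inequality $|y|\,\mathbb{1}_{\{|y| > \omega\}} \le \omega^{-\gamma}|y|^{1+\gamma}$, so it also holds exactly for any empirical distribution. The sketch below illustrates this on a simulated sample; the Gaussian sample and the function names are illustrative assumptions, not part of the proof.

```python
import random


def tail_mean(values, omega):
    """Empirical mean of |v| * 1{|v| > omega}."""
    return sum(abs(v) for v in values if abs(v) > omega) / len(values)


def moment_bound(values, omega, gamma):
    """Empirical version of omega^{-gamma} * E|v|^{1+gamma}."""
    return omega ** (-gamma) * sum(abs(v) ** (1.0 + gamma) for v in values) / len(values)


if __name__ == "__main__":
    random.seed(0)
    sample = [random.gauss(0.0, 3.0) for _ in range(5000)]
    for omega in (0.5, 1.0, 2.0, 5.0):
        print(omega, tail_mean(sample, omega), moment_bound(sample, omega, 1.0))
```

Raising the threshold shrinks the truncation bias at the polynomial rate $\omega^{-\gamma}$, which is what the choice of $\omega_{n,2}$ trades off against the variance of the truncated part.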
A standard application of Chebyshev’s inequality and the Borel–Cantelli lemma (see, e.g., the treatment of the remainder part in the proof of Theorem 2) yields
sup x S d , 1 | T 2 ( x ) | = o ( 1 ) almost surely .
Since | φ ( T ) | ω n , 2 , we compute
Var g ^ n , ϑ ( φ ( T ) , x ) = n 1 ( ϑ 1 + d ) ! ( ϑ 1 ) ! 2 E ( Z 1 , ϑ ( T ) ( x ) ) 2
n 1 ( ϑ 1 + d ) ! ( ϑ 1 ) ! 2 ω n , 2 2 k P k , ϑ 1 2 ( x ) k ϑ , k + 1 ϑ f ( u ) p ( u ) d u .
Using the bound k P k , ϑ 1 2 ( x ) = O ( ϑ d / 2 ) (see [129]) and the fact that ( ϑ 1 + d ) ! ( ϑ 1 ) ! = O ( ϑ d ) , we obtain
Var g ^ n , ϑ ( φ ( T ) , x ) = O n 1 ϑ 2 d · ω n , 2 2 · ϑ d / 2 · ϑ d = O n 1 ϑ d / 2 ω n , 2 2 .
Define
L n , ϑ : = max k N 0 d ( ϑ 1 ) S d , 1 1 n i = 1 n φ ( T ) ( Y i ) δ i 1 k ϑ , k + 1 ϑ ( X i ) k ϑ , k + 1 ϑ R ( φ ( T ) , u ) p ( u ) d u .
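For each fixed $k$, the summands are independent, zero-mean, bounded random variables (by $2\omega_{n,2}$ in absolute value). Bernstein's inequality (see Lemma 2.2.11 of [190]) gives, for any $\rho > 0$,
\[ \mathbb{P}\Bigl(\Bigl|\frac{1}{n}\sum_{i=1}^{n}(\cdots)\Bigr| > \rho\,\vartheta^{-1/2}\varsigma_n\Bigr) \le 2\exp\Bigl(-\frac{n^{2}\rho^{2}\vartheta^{-1}\varsigma_n^{2}/2}{nC\omega_{n,2}^{2}\vartheta^{-1} + \frac{1}{3}\,\omega_{n,2}\,n\rho\,\vartheta^{-1/2}\varsigma_n}\Bigr), \]
where $(\cdots)$ abbreviates the centered summand in the definition of $L_{n,\vartheta}$, $\varsigma_n := \sqrt{(\log n)/n}$, and $C$ is a constant bounding $f(\cdot)p(\cdot)$ (which exists by continuity on the compact $S_{d,1}$). Under the condition $\vartheta \le n/\log n$ (which is equivalent to $\varsigma_n \le \vartheta^{-1/2}$), the denominator is bounded above by $n\omega_{n,2}^{2}\vartheta^{-1}(C + \rho/3)$. Consequently,
\[ \mathbb{P}\bigl(|L_{n,\vartheta}| > \rho\,\vartheta^{-1/2}\varsigma_n\bigr) \le \vartheta^{d}\cdot 2\exp\Bigl(-\frac{n\rho^{2}\vartheta^{-1}\varsigma_n^{2}}{2(C + \rho/3)\,\omega_{n,2}^{2}\vartheta^{-1}}\Bigr) = \vartheta^{d}\cdot 2\exp\Bigl(-\frac{\rho^{2}\varsigma_n^{2}\,n}{2(C + \rho/3)\,\omega_{n,2}^{2}}\Bigr). \]
Choosing $\omega_{n,2}^{2} = \varsigma_n^{-1} = \sqrt{n/\log n}$ (so that $\omega_{n,2}^{2} \to \infty$ sufficiently slowly), we obtain
\[ \frac{\varsigma_n^{2}\,n}{\omega_{n,2}^{2}} = \frac{\bigl((\log n)/n\bigr)\cdot n}{\sqrt{n/\log n}} = \frac{\log n}{\sqrt{n/\log n}} = \frac{(\log n)^{3/2}}{n^{1/2}} \to 0. \]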
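The arithmetic in the last display can be checked numerically. The sketch assumes $\varsigma_n = \sqrt{(\log n)/n}$, which is the reading consistent with the displayed computation, and uses hypothetical function names.

```python
import math


def exponent_scale(n: int) -> float:
    """varsigma_n^2 * n / omega^2 with omega^2 = 1 / varsigma_n (assumed definitions)."""
    sigma_sq = math.log(n) / n               # varsigma_n^2
    omega_sq = math.sqrt(n / math.log(n))    # omega_{n,2}^2 = varsigma_n^{-1}
    return sigma_sq * n / omega_sq


def closed_form(n: int) -> float:
    """(log n)^{3/2} / n^{1/2}."""
    return math.log(n) ** 1.5 / math.sqrt(n)


if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6, 10**8):
        print(n, exponent_scale(n), closed_form(n))
```

Because this quantity tends to zero, the exponent in the Bernstein bound does not grow with $n$ under this threshold, which is why a different choice of $\omega_{n,2}$ is needed for summability.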
This choice is not sufficient; instead, we set ω n , 2 = ξ n 1 / ( 1 + γ ) with ξ n = n α for some α > 0 chosen so that the exponential bound becomes summable. A standard argument (see [60]) yields that for an appropriate choice of ρ (depending on d and C),
n = 1 P | L n , ϑ | > ρ ϑ 1 / 2 ς n < .
By the Borel–Cantelli lemma, we conclude
| L n , ϑ | = O ϑ 1 / 2 ς n almost surely .
Since
g ^ n , ϑ ( φ ( T ) , x ) E [ g ^ n , ϑ ( φ ( T ) , x ) ] = ( ϑ 1 + d ) ! ( ϑ 1 ) ! k P k , ϑ 1 ( x ) · terms bounded by L n , ϑ ,
and ( ϑ 1 + d ) ! ( ϑ 1 ) ! = O ( ϑ d ) , we obtain
sup x S d , 1 | T 1 ( x ) | = O ϑ d · ϑ 1 / 2 ς n = O ϑ d 1 / 2 log n n almost surely .
Combining the almost sure bounds for $T_1(x)$ and $T_2(x)$ obtained above, we have
sup x S d , 1 g ^ n , ϑ ( φ , x ) E [ g ^ n , ϑ ( φ , x ) ] = O ϑ d 1 / 2 log n n almost surely .
Applying the same stochastic bound to the denominator f ^ n , ϑ ( x ) (with φ 1 ) yields
sup x S d , 1 f ^ n , ϑ ( x ) E [ f ^ n , ϑ ( x ) ] = O ϑ d 1 / 2 log n n almost surely .
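Using the identity
\[ \frac{\hat{g}_{n,\vartheta}(\varphi, x)}{\hat{f}_{n,\vartheta}(x)} - \frac{\mathbb{E}[\hat{g}_{n,\vartheta}(\varphi, x)]}{\mathbb{E}[\hat{f}_{n,\vartheta}(x)]} = \frac{\hat{g}_{n,\vartheta}(\varphi, x) - \mathbb{E}[\hat{g}_{n,\vartheta}(\varphi, x)]}{\hat{f}_{n,\vartheta}(x)} - \frac{\mathbb{E}[\hat{g}_{n,\vartheta}(\varphi, x)]}{\mathbb{E}[\hat{f}_{n,\vartheta}(x)]}\cdot\frac{\hat{f}_{n,\vartheta}(x) - \mathbb{E}[\hat{f}_{n,\vartheta}(x)]}{\hat{f}_{n,\vartheta}(x)}, \]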
and the fact that f ^ n , ϑ ( x ) converges uniformly to f ( x ) p ( x ) , which is bounded away from zero, we conclude that
sup x S d , 1 r ^ n , 2 ( 1 ) ( φ , x ) E [ r ^ n , 2 ( 1 ) ( φ , x ) ] = O ϑ d 1 / 2 log n n almost surely .
This establishes (13.2) and completes the proof of Theorem 6. □
Remark 24.
The key insight of the proof is the cancellation of the propensity score p ( · ) in the ratio estimator. Although p ( · ) appears explicitly in the bias expansion of the numerator and denominator, it cancels out when forming the ratio, leaving the same asymptotic bias and convergence rates as in the complete-data case. The variance, however, is inflated by a factor of 1 / p ( x ) , which is reflected in the constants but does not affect the rates. This phenomenon is characteristic of complete-case estimators under MAR and is rigorously justified by the decomposition above.
Proof of Theorem 7.
Throughout this proof, we operate under the MAR assumption (2.4) with the positivity condition inf x S d , 1 p ( x ) c > 0 , and the regularity conditions specified in Theorem 7. All U-statistics are understood to be the complete-case versions incorporating the missingness indicators δ i as defined in (3.5). For notational simplicity, we write u n , 2 ( φ , x ˜ ) in place of u n , 2 ( miss ) ( φ , x ˜ ) , with the implicit understanding that the product j = 1 m δ i j is included in the kernel. Recall the truncation decomposition introduced in (11.1):
u n , 2 ( φ , x ˜ ) = u n , 2 ( T ) ( φ , x ˜ ) + u n , 2 ( R ) ( φ , x ˜ ) ,
where the truncated kernel is defined with φ ( T ) = φ 1 { | φ | ω n , 2 } and the remainder kernel with φ ( R ) = φ 1 { | φ | > ω n , 2 } . The truncation threshold ω n , 2 will be chosen optimally in the sequel. We begin by analyzing the truncated component u n , 2 ( T ) ( φ , x ˜ ) .
Under the MAR assumption, the truncated kernel is defined as
G φ , x ˜ , 2 ( T ) ( X ˜ , Y ˜ , δ ˜ ) : = φ ( T ) ( Y ˜ ) j = 1 m δ j K ˜ x ˜ , ϑ ( X ˜ ) ,
where K ˜ x ˜ , ϑ ( X ˜ ) : = j = 1 m K x j , ϑ ( X j ) denotes the product Bernstein kernel. By construction, | φ ( T ) | ω n , 2 and 0 j = 1 m δ j 1 . The truncated U-statistic then satisfies
u n , 2 ( T ) ( φ , x ˜ ) E u n , 2 ( T ) ( φ , x ˜ )
= ( n m ) ! n ! | i I ( m , n ) { φ ( T ) ( Y ˜ i ) j = 1 m δ i j K ˜ x ˜ , ϑ ( X ˜ i )
                                    E φ ( T ) ( Y ˜ i ) j = 1 m δ i j K ˜ x ˜ , ϑ ( X ˜ i ) } | .
Recall that the Bernstein kernel admits the representation (see [129])
K x , ϑ ( u ) = ( ϑ 1 + d ) ! ( ϑ 1 ) ! k N 0 d ( ϑ 1 ) S d , 1 1 k ϑ , k + 1 ϑ ( u ) P k , ϑ 1 ( x ) ,
where P k , ϑ 1 ( x ) are multinomial probabilities satisfying k P k , ϑ 1 ( x ) = 1 and P k , ϑ 1 ( x ) 0 . Substituting this representation into the kernel product, we obtain
K ˜ x ˜ , ϑ ( X ˜ i ) = ( ϑ 1 + d ) ! ( ϑ 1 ) ! m ( k 1 , , k m ) ( N 0 d ( ϑ 1 ) S d , 1 ) m j = 1 m 1 k j ϑ , k j + 1 ϑ ( X i j ) j = 1 m P k j , ϑ 1 ( x j ) .
Consequently,
H ϑ ( T ) ( X ˜ i , Y ˜ i , δ ˜ i ) : = G φ , x ˜ , 2 ( T ) ( X ˜ i , Y ˜ i , δ ˜ i ) E G φ , x ˜ , 2 ( T ) ( X ˜ i , Y ˜ i , δ ˜ i ) = ( ϑ 1 + d ) ! ( ϑ 1 ) ! m k ˜ K ϑ m { φ ( T ) ( Y ˜ i ) j = 1 m δ i j j = 1 m 1 k i j ϑ , k i j + 1 ϑ ( X i j )             k 1 ϑ , k 1 + 1 ϑ k m ϑ , k m + 1 ϑ φ ( T ) ( y ˜ ) j = 1 m p ( u j ) f ˜ ( u ˜ ) d u ˜ d y ˜ } × j = 1 m P k j , ϑ 1 ( x j ) ,
where K ϑ : = N 0 d ( ϑ 1 ) S d , 1 and we have used the MAR assumption to write
E j = 1 m δ i j X ˜ i = u ˜ = j = 1 m p ( u j ) .
Define the centered and scaled process
L ϑ , n : = max k ˜ K ϑ m ( n m ) ! n ! i I ( m , n ) ω n , 2 j = 1 m 1 k i j ϑ , k i j + 1 ϑ ( X i j ) k 1 ϑ , k 1 + 1 ϑ k m ϑ , k m + 1 ϑ f ˜ ( u ˜ ) d u ˜ .
For each fixed k ˜ , the summands are independent and centered. Moreover, using the boundedness of φ ( T ) (with | φ ( T ) | ω n , 2 ) and the fact that 0 j = 1 m p ( u j ) 1 , we have the uniform bound
u n , 2 ( T ) ( φ , x ˜ ) E [ u n , 2 ( T ) ( φ , x ˜ ) ] ( ϑ 1 + d ) ! ( ϑ 1 ) ! m max k ˜ K ϑ m j = 1 m P k j , ϑ 1 ( x j ) · L ϑ , n .
Since j = 1 m P k j , ϑ 1 ( x j ) = O ( ϑ d m / 2 ) uniformly (by the properties of multinomial probabilities; see Lemma 3 in [129]), we obtain
u n , 2 ( T ) ( φ , x ˜ ) E [ u n , 2 ( T ) ( φ , x ˜ ) ] = O ϑ d m · ϑ d m / 2 · L ϑ , n = O ϑ d m / 2 L ϑ , n .
Consider the random variable
Z i ( k ˜ ) : = ω n , 2 j = 1 m 1 k i j ϑ , k i j + 1 ϑ ( X i j ) k 1 ϑ , k 1 + 1 ϑ k m ϑ , k m + 1 ϑ f ˜ ( u ˜ ) d u ˜ .
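Under condition (C.2), the density $f$ is bounded above by some constant $C_0 < \infty$. Consequently,
\[
\begin{aligned}
\operatorname{Var}\bigl(Z_i(\tilde{k})\bigr) = \mathbb{E}\bigl[Z_i(\tilde{k})^{2}\bigr]
&= \omega_{n,2}^{2}\Bigl[\int_{\left(\frac{k_1}{\vartheta}, \frac{k_1+1}{\vartheta}\right]}\!\cdots\!\int_{\left(\frac{k_m}{\vartheta}, \frac{k_m+1}{\vartheta}\right]} \tilde{f}(\tilde{u})\,d\tilde{u} - \Bigl(\int_{\left(\frac{k_1}{\vartheta}, \frac{k_1+1}{\vartheta}\right]}\!\cdots\!\int_{\left(\frac{k_m}{\vartheta}, \frac{k_m+1}{\vartheta}\right]} \tilde{f}(\tilde{u})\,d\tilde{u}\Bigr)^{2}\Bigr] \\
&\le \omega_{n,2}^{2}\,C_0 \int_{\left(\frac{k_1}{\vartheta}, \frac{k_1+1}{\vartheta}\right]}\!\cdots\!\int_{\left(\frac{k_m}{\vartheta}, \frac{k_m+1}{\vartheta}\right]} 1\,d\tilde{u} \quad (\text{since } \tilde{f} \le C_0 \text{ and the square term is nonnegative}) \\
&= \omega_{n,2}^{2}\,C_0\,\vartheta^{-dm}.
\end{aligned}
\]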
Moreover, the random variables Z i ( k ˜ ) are uniformly bounded:
Z i ( k ˜ ) ω n , 2 max 1 , f ˜ ( u ˜ ) d u ˜ ω n , 2 ( 1 + C 0 ϑ d m ) 2 ω n , 2
for sufficiently large ϑ (since ϑ d m 0 ). A standard technique for U-statistics (see Section 5.1.2 of [177]) allows us to replace the sum over distinct indices I ( m , n ) by a sum over [ n / m ] independent blocks. Specifically, let B = n / m . Define disjoint index sets J 1 , , J B , each of size m, and consider only the U-statistic based on these blocks. The contribution from the remaining indices is asymptotically negligible. Then,
( n m ) ! n ! i I ( m , n ) Z i ( k ˜ ) = 1 B b = 1 B Z J b ( k ˜ ) + o ( 1 ) ,
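where $Z_{J_b}(\tilde{k})$ are independent and identically distributed. Let $\varsigma_n := \sqrt{(\log n)/n}$. For any $\varepsilon_0 > 0$, apply Bernstein's inequality (see Lemma 2.2.11 of [190]) to the average of the $Z_{J_b}(\tilde{k})$'s. Using the bounds $\operatorname{Var}(Z_{J_b}) \le \omega_{n,2}^{2}C_0\vartheta^{-dm}$ and $|Z_{J_b}| \le 2\omega_{n,2}$, we obtain, for each fixed $\tilde{k}$,
\[ \mathbb{P}\Bigl(\Bigl|\frac{1}{B}\sum_{b=1}^{B} Z_{J_b}(\tilde{k})\Bigr| > \varepsilon_0\,\vartheta^{-m/2}\varsigma_n\Bigr) \le 2\exp\Bigl(-\frac{B\,\varepsilon_0^{2}\,\vartheta^{-m}\varsigma_n^{2}}{2\omega_{n,2}^{2}C_0\vartheta^{-dm} + \frac{2}{3}\,\omega_{n,2}\,\varepsilon_0\,\vartheta^{-m/2}\varsigma_n}\Bigr). \]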
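The shape of this exponential bound, and the effect of the subsequent union bound over lattice points, can be explored with a small helper. The sketch below implements the generic two-sided Bernstein bound $2\exp\{-Bt^{2}/(2\sigma^{2} + \tfrac{2}{3}Mt)\}$ for the mean of $B$ independent centered variables with variance at most $\sigma^{2}$ and absolute bound $M$; the function names and numerical inputs are hypothetical.

```python
import math


def bernstein_tail(B: int, t: float, variance: float, bound: float) -> float:
    """Two-sided Bernstein bound for the mean of B independent centered variables."""
    return 2.0 * math.exp(-B * t * t / (2.0 * variance + (2.0 / 3.0) * bound * t))


def union_bound(cardinality: int, B: int, t: float, variance: float, bound: float) -> float:
    """Union bound over `cardinality` events, capped at one."""
    return min(1.0, cardinality * bernstein_tail(B, t, variance, bound))


if __name__ == "__main__":
    for B in (100, 1000, 10000):
        print(B, bernstein_tail(B, 0.1, 0.25, 1.0), union_bound(10**3, B, 0.1, 0.25, 1.0))
```

The union bound multiplies the single-cell bound by the number of cells, so the exponent must grow at least logarithmically in that number for the product to remain small.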
Since B = n / m n / m , we have B ς n 2 = B · ( log n ) / n ( log n ) / m . Now, take a union bound over all k ˜ K ϑ m . The cardinality of K ϑ is at most ϑ d (the number of integer lattice points in the dilated simplex), so | K ϑ m | ϑ d m . Hence,
P max k ˜ K ϑ m 1 B b = 1 B Z J b ( k ˜ ) > ε 0 ϑ m / 2 ς n
            2 ϑ d m exp B ε 0 2 ϑ m ς n 2 2 ω n , 2 2 C 0 ϑ d m + 2 3 ω n , 2 ε 0 ϑ m / 2 ς n .
Set the truncation threshold as
ω n , 2 : = ϑ d m / 2 ς n .
This choice balances the bias from the remainder term and the variance of the truncated term. Under the bandwidth condition ϑ n / log n , we have ς n ϑ 1 / 2 , which implies ω n , 2 ϑ ( d m 1 ) / 2 as n (since d m 1 ). Substituting this choice into the exponential bound yields
2 ω n , 2 2 C 0 ϑ d m = 2 ( ϑ d m ς n 2 ) C 0 ϑ d m = 2 C 0 ς n 2 ,
and
2 3 ω n , 2 ε 0 ϑ m / 2 ς n = 2 3 ( ϑ d m / 2 ς n ) ε 0 ϑ m / 2 ς n = 2 3 ε 0 ϑ ( d m m ) / 2 ς n 2 .
Since d 1 , we have d m m = m ( d 1 ) 0 , so ϑ ( d m m ) / 2 1 . Consequently,
2 ω n , 2 2 C 0 ϑ d m + 2 3 ω n , 2 ε 0 ϑ m / 2 ς n 2 C 0 ς n 2 + 2 3 ε 0 ϑ ( d m m ) / 2 ς n 2 2 C 0 + 2 3 ε 0 ϑ ( d m m ) / 2 ς n 2 .
However, this bound still depends on ϑ . A sharper analysis uses the fact that ϑ ( d m m ) / 2 ς n 2 = ϑ ( d m m ) / 2 ( log n ) / n . Under the condition ϑ n 1 / ( d m m + 2 ) (which is milder than ϑ n / log n for d m m 0 ), we have ϑ ( d m m ) / 2 ς n 2 0 . For simplicity, we proceed with the conservative bound
2 ω n , 2 2 C 0 ϑ d m + 2 3 ω n , 2 ε 0 ϑ m / 2 ς n C ς n 2 ,
where C = 2 C 0 + 2 3 ε 0 for sufficiently large n (since ϑ ( d m m ) / 2 ς n 2 is bounded). Then,
B ε 0 2 ϑ m ς n 2 2 ω n , 2 2 C 0 ϑ d m + 2 3 ω n , 2 ε 0 ϑ m / 2 ς n ( n / m ) ε 0 2 ϑ m ς n 2 C ς n 2 = ε 0 2 C m n ϑ m .
Thus,
P 2 ϑ d m exp ε 0 2 C m n ϑ m .
Now, under the condition n ϑ m log n (which is equivalent to ϑ ( n / log n ) 1 / m ), we obtain
exp ε 0 2 C m n ϑ m exp ε 0 2 C m log n = n ε 0 2 / ( C m ) .
Choosing ε 0 sufficiently large so that ε 0 2 / ( C m ) > 1 + κ for some κ > 0 , we obtain
P 2 ϑ d m n 1 κ .
Since ϑ d m n d m (as ϑ n ), the right-hand side is summable in n (as n 1 κ + o ( 1 ) with κ > 0 ). By the Borel–Cantelli lemma, we conclude that, almost surely,
max k ˜ K ϑ m 1 B b = 1 B Z J b ( k ˜ ) = O ϑ m / 2 ς n .
Consequently,
L ϑ , n = O ϑ m / 2 ς n almost surely .
Therefore,
sup x ˜ S d , 1 m u n , 2 ( T ) ( φ , x ˜ ) E [ u n , 2 ( T ) ( φ , x ˜ ) ] = O ϑ d m / 2 · ϑ m / 2 ς n = O ϑ ( d 1 ) m / 2 ς n .
For the case $d = 1$ (univariate covariate), $\vartheta^{(d-1)m/2} = \vartheta^{0} = 1$, giving the rate $O(\varsigma_n) = O\bigl(\sqrt{(\log n)/n}\bigr)$. For $d \ge 2$, the factor $\vartheta^{(d-1)m/2}$ grows with $\vartheta$, so the bound is inflated relative to the univariate case. Recall that the remainder kernel is defined with $\varphi^{(R)} = \varphi\,\mathbb{1}_{\{|\varphi| > \omega_{n,2}\}}$. We first bound its expectation. Using the MAR assumption and condition (C.3) with exponent $\gamma > 0$,
E [ u n , 2 ( R ) ( φ , x ˜ ) ] E | φ ( Y ˜ i ) | 1 { | φ ( Y ˜ i ) | > ω n , 2 } j = 1 m δ i j K ˜ x ˜ , ϑ ( X ˜ i ) = E j = 1 m p ( X i j ) K ˜ x ˜ , ϑ ( X ˜ i ) E | φ ( Y ˜ i ) | 1 { | φ ( Y ˜ i ) | > ω n , 2 } X ˜ i ω n , 2 ( 1 + γ ) E j = 1 m p ( X i j ) K ˜ x ˜ , ϑ ( X ˜ i ) E | φ ( Y ˜ i ) | 2 + γ X ˜ i ω n , 2 ( 1 + γ ) sup u ˜ S d , 1 m E | φ ( Y ˜ ) | 2 + γ X ˜ = u ˜ S d , 1 m j = 1 m p ( u j ) f ˜ ( u ˜ ) K ˜ x ˜ , ϑ ( u ˜ ) d u ˜ C 1 ω n , 2 ( 1 + γ ) ,
where C 1 is the constant from condition (C.3) and we have used the fact that j = 1 m p ( u j ) 1 and K ˜ = 1 . Now, choose the truncation threshold as
ω n , 2 : = ϑ m / 2 ς n 1 / ( 1 + γ ) = ϑ m / 2 ς n 1 1 / ( 1 + γ ) .
With this choice,
E [ u n , 2 ( R ) ( φ , x ˜ ) ] = O ϑ m / 2 ς n .
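The effect of this threshold choice can be verified directly: with $\omega_{n,2} = (\vartheta^{-m/2}\varsigma_n)^{-1/(1+\gamma)}$, the factor $\omega_{n,2}^{-(1+\gamma)}$ equals $\vartheta^{-m/2}\varsigma_n$. The sketch assumes $\varsigma_n = \sqrt{(\log n)/n}$ and uses hypothetical function names and parameter values.

```python
import math


def truncation_threshold(theta: float, n: int, m: int, gamma: float) -> float:
    """omega = (theta^{-m/2} * varsigma_n)^{-1/(1+gamma)} with varsigma_n = sqrt(log n / n)."""
    sigma = math.sqrt(math.log(n) / n)
    return (theta ** (-m / 2.0) * sigma) ** (-1.0 / (1.0 + gamma))


def remainder_bias_order(theta: float, n: int, m: int, gamma: float) -> float:
    """omega^{-(1+gamma)}, the order of the expected remainder term."""
    return truncation_threshold(theta, n, m, gamma) ** (-(1.0 + gamma))


if __name__ == "__main__":
    theta, n, m, gamma = 50, 10**5, 2, 0.5
    print(truncation_threshold(theta, n, m, gamma))
    print(remainder_bias_order(theta, n, m, gamma))
    print(theta ** (-m / 2.0) * math.sqrt(math.log(n) / n))
```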
By Markov’s inequality, for any η > 0 ,
P sup x ˜ S d , 1 m u n , 2 ( R ) ( φ , x ˜ ) E [ u n , 2 ( R ) ( φ , x ˜ ) ] > η ϑ m / 2 ς n E sup x ˜ | u n , 2 ( R ) ( φ , x ˜ ) E [ u n , 2 ( R ) ( φ , x ˜ ) ] | η ϑ m / 2 ς n .
Using the bound on $\mathbb{E}[u^{(R)}_{n,2}(\varphi, \tilde{x})]$ obtained above and the fact that the kernel integrates to one, we obtain
E | u n , 2 ( R ) ( φ , x ˜ ) E [ u n , 2 ( R ) ( φ , x ˜ ) ] | 2 E | u n , 2 ( R ) ( φ , x ˜ ) | = O ϑ m / 2 ς n .
Therefore,
P sup x ˜ u n , 2 ( R ) ( φ , x ˜ ) E [ u n , 2 ( R ) ( φ , x ˜ ) ] > η ϑ m / 2 ς n = O ( 1 / η ) .
Since η can be chosen arbitrarily large, we conclude that
sup x ˜ S d , 1 m u n , 2 ( R ) ( φ , x ˜ ) E [ u n , 2 ( R ) ( φ , x ˜ ) ] = O P ϑ m / 2 ς n .
A more refined argument using the Borel–Cantelli lemma (as in the proof of Theorem 2) actually yields almost sure convergence at the same rate, provided the truncation threshold is chosen appropriately and the moment condition (C.3) holds. Combining the bounds for the truncated part and the remainder part, we obtain
\[ \sup_{\tilde{x} \in S_{d,1}^{m}} \bigl|u_{n,2}(\varphi, \tilde{x}) - \mathbb{E}[u_{n,2}(\varphi, \tilde{x})]\bigr| = O\Bigl(\vartheta^{(d-1)m/2}\sqrt{\frac{\log n}{n}}\Bigr) \quad \text{almost surely}. \]
For the case $d = 1$ (univariate covariate), this simplifies to $O\bigl(\sqrt{(\log n)/n}\bigr)$. For $d \ge 2$, the bound carries the additional factor $\vartheta^{(d-1)m/2}$, which diverges as $\vartheta \to \infty$, so the rate tends to zero only if $\vartheta^{(d-1)m/2}\sqrt{(\log n)/n} \to 0$ under the bandwidth condition.
This completes the proof of Theorem 7. □
Remark 25.
Several technical points deserve emphasis. First, the block decomposition of the U-statistic is essential to obtain independent summands, allowing the application of Bernstein's inequality. Second, the union bound over $\vartheta^{dm}$ terms is compensated by the decay $n^{-\varepsilon_0^{2}/(Cm)}$ delivered by Bernstein's inequality, which can be made as fast as needed by choosing $\varepsilon_0$ sufficiently large. Third, the truncation threshold $\omega_{n,2}$ is chosen to balance the bias from the remainder term (which decays as $\omega_{n,2}^{-(1+\gamma)}$) and the variance of the truncated term (which involves $\omega_{n,2}^{2}$). The balance is achieved at $\omega_{n,2} = (\vartheta^{-m/2}\varsigma_n)^{-1/(1+\gamma)}$, yielding the rate $\vartheta^{-m/2}\varsigma_n$. Finally, under the MAR assumption, the propensity score $p(\cdot)$ appears in the expectations but cancels out in the final rates due to the product structure and the boundedness $0 < p(\cdot) \le 1$. The positivity condition $p(\cdot) \ge c > 0$ ensures that the denominator of the ratio estimator does not degenerate, which is necessary for the application of the delta method in the main theorem.
Proof of Theorem 8.
The demonstration of Theorem 8 proceeds by a meticulous application of the classical decomposition (11.2) in conjunction with the uniform convergence rates established in Theorem 7 for the underlying conditional U-statistics, together with the uniform lower bounds for the denominator that follow from the MAR assumption (2.4) and the positivity condition inf x S d , 1 p ( x ) c > 0 . Throughout this proof, all U-statistics are understood to be the complete-case versions incorporating the missingness indicators δ i as defined in (3.5); for notational brevity, we write u n , 2 ( φ , x ˜ ) in place of u n , 2 ( miss ) ( φ , x ˜ ) , with the implicit understanding that the product j = 1 m δ i j is included in the kernel. Recall that the estimator of interest admits the representation r ^ n , 2 ( m ) ( φ , x ˜ ; Λ ¯ n , 2 ( x ˜ ) ) = u n , 2 ( φ , x ˜ ) / u n , 2 ( 1 , x ˜ ) , and its Hájek-type centering is defined as E ^ [ · ] = E [ u n , 2 ( φ , x ˜ ) ] / E [ u n , 2 ( 1 , x ˜ ) ] . From the elementary identity
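$$\frac{a}{b} - \frac{\alpha}{\beta} = \frac{a - \alpha}{b} - \frac{\alpha\,(b - \beta)}{b\,\beta},$$
valid for $b, \beta \neq 0$, we obtain the fundamental decomposition
$$\left| \hat{r}_{n,2}^{(m)}(\varphi, \tilde{x}; \bar{\Lambda}_{n,2}(\tilde{x})) - \widehat{\mathbb{E}}\big[\hat{r}_{n,2}^{(m)}(\varphi, \tilde{x}; \bar{\Lambda}_{n,2}(\tilde{x}))\big] \right| \le \left| I_{2,1}(\tilde{x}) \right| + \left| I_{2,2}(\tilde{x}) \right|,$$
where the stochastic components are defined as
$$I_{2,1}(\tilde{x}) := \frac{u_{n,2}(\varphi, \tilde{x}) - \mathbb{E}[u_{n,2}(\varphi, \tilde{x})]}{u_{n,2}(1, \tilde{x})}, \qquad I_{2,2}(\tilde{x}) := \frac{\mathbb{E}[u_{n,2}(\varphi, \tilde{x})] \cdot \big( u_{n,2}(1, \tilde{x}) - \mathbb{E}[u_{n,2}(1, \tilde{x})] \big)}{u_{n,2}(1, \tilde{x}) \cdot \mathbb{E}[u_{n,2}(1, \tilde{x})]}.$$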
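The ratio identity and the triangle-inequality bound it produces can be checked numerically. The following Python sketch does so for arbitrary numbers; the function name `ratio_terms` and the sample values are illustrative assumptions and do not come from the article.

```python
def ratio_terms(a, b, alpha, beta):
    """Return (I1, I2) such that a/b - alpha/beta == I1 - I2 (b, beta nonzero)."""
    i1 = (a - alpha) / b                   # numerator fluctuation over the denominator
    i2 = alpha * (b - beta) / (b * beta)   # denominator fluctuation, rescaled
    return i1, i2


# Illustrative values: a and b play the role of the two U-statistics,
# alpha and beta the role of their expectations.
a, b, alpha, beta = 1.30, 0.80, 1.10, 0.90
i1, i2 = ratio_terms(a, b, alpha, beta)
lhs = a / b - alpha / beta
bound = abs(i1) + abs(i2)   # |lhs| <= |I1| + |I2| by the triangle inequality
```

Because the identity is purely algebraic, the check holds for any nonzero denominators; the probabilistic content of the proof lies entirely in the uniform control of the two terms.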
The crux of the proof lies in establishing uniform (in $\tilde{x}$) almost sure bounds for both $I_{2,1}$ and $I_{2,2}$, which then translate directly into the desired rate for the ratio estimator. To this end, we first note that Theorem 7 furnishes the following uniform convergence rates for the numerator and denominator U-statistics: almost surely,
$$\sup_{\tilde{x} \in S_{d,1}^{m}} \left| u_{n,2}(\psi, \tilde{x}) - \mathbb{E}[u_{n,2}(\psi, \tilde{x})] \right| = O\!\left( \vartheta^{m(d - 1/2)} \sqrt{\frac{\log n}{n}} \right),$$
for $\psi = \varphi$ and for $\psi \equiv 1$. Denote the rate normalization by
$$\varsigma_{n,2} := \vartheta^{m(d - 1/2)} \sqrt{\frac{\log n}{n}},$$
which represents the optimal balance between the bias from the truncation threshold and the variance of the truncated U-statistic, as derived in the proof of Theorem 7. Turning to the denominator, we require uniform lower bounds that ensure the ratios $I_{2,1}$ and $I_{2,2}$ are well defined and controllable. Under the MAR assumption, the bias expansion for the constant kernel (a direct corollary of Proposition 2 generalized to $m$ dimensions) yields
$$\mathbb{E}[u_{n,2}(1, \tilde{x})] = \prod_{j=1}^{m} p(x_j)\, \tilde{f}(\tilde{x}) + O(\vartheta^{-1/2}),$$
uniformly in $\tilde{x} \in S_{d,1}^{m}$. Since $\tilde{f}$ is continuous and strictly positive on the compact set $S_{d,1}^{m}$ (by condition (C.2) and the compactness of the simplex), and since $p(x_j) \ge c > 0$ by the positivity condition, there exists a constant $c_2 > 0$ such that
$$\inf_{\tilde{x} \in S_{d,1}^{m}} \mathbb{E}[u_{n,2}(1, \tilde{x})] \ge c_2 > 0 \quad \text{for all sufficiently large } \vartheta.$$
Furthermore, the uniform convergence of $u_{n,2}(1, \tilde{x})$ to its expectation, guaranteed by Theorem 7, implies that for sufficiently large $n$,
$$\inf_{\tilde{x} \in S_{d,1}^{m}} u_{n,2}(1, \tilde{x}) \ge \frac{c_2}{2} > 0 \quad \text{almost surely},$$
since the deviation $| u_{n,2}(1, \tilde{x}) - \mathbb{E}[u_{n,2}(1, \tilde{x})] |$ is of order $o(1)$ uniformly. Define $c_1 := c_2 / 2$; then, almost surely,
$$\inf_{\tilde{x} \in S_{d,1}^{m}} u_{n,2}(1, \tilde{x}) \ge c_1 > 0.$$
For the numerator expectation, the bias expansion together with condition (C.3) gives the uniform boundedness
$$\sup_{\tilde{x} \in S_{d,1}^{m}} \left| \mathbb{E}[u_{n,2}(\varphi, \tilde{x})] \right| = O(1),$$
so there exists a constant $C_{\varphi} < \infty$ such that $\sup_{\tilde{x}} | \mathbb{E}[u_{n,2}(\varphi, \tilde{x})] | \le C_{\varphi}$. With these preparatory estimates in hand, we now bound each term. For $I_{2,1}$, using the lower bound $| u_{n,2}(1, \tilde{x}) | \ge c_1$ almost surely, we obtain
$$\sup_{\tilde{x} \in S_{d,1}^{m}} \frac{\left| I_{2,1}(\tilde{x}) \right|}{\varsigma_{n,2}} \le \frac{1}{c_1} \sup_{\tilde{x} \in S_{d,1}^{m}} \frac{\left| u_{n,2}(\varphi, \tilde{x}) - \mathbb{E}[u_{n,2}(\varphi, \tilde{x})] \right|}{\varsigma_{n,2}} = O(1) \quad \text{a.s.},$$
where the final equality follows directly from the uniform rate provided by Theorem 7.
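For $I_{2,2}$, we similarly apply the lower bounds $| u_{n,2}(1, \tilde{x}) | \ge c_1$ and $| \mathbb{E}[u_{n,2}(1, \tilde{x})] | \ge c_2$, together with the boundedness of $\mathbb{E}[u_{n,2}(\varphi, \tilde{x})]$, to obtain
$$\sup_{\tilde{x} \in S_{d,1}^{m}} \frac{\left| I_{2,2}(\tilde{x}) \right|}{\varsigma_{n,2}} \le \frac{\sup_{\tilde{x}} \left| \mathbb{E}[u_{n,2}(\varphi, \tilde{x})] \right|}{c_1 c_2} \cdot \sup_{\tilde{x} \in S_{d,1}^{m}} \frac{\left| u_{n,2}(1, \tilde{x}) - \mathbb{E}[u_{n,2}(1, \tilde{x})] \right|}{\varsigma_{n,2}} \le \frac{C_{\varphi}}{c_1 c_2} \cdot O(1) = O(1) \quad \text{a.s.},$$
where the rate for $u_{n,2}(1, \tilde{x})$ again follows from Theorem 7 with $\varphi \equiv 1$. Summing the two contributions, we conclude that, almost surely,
$$\sup_{\tilde{x} \in S_{d,1}^{m}} \frac{\left| \hat{r}_{n,2}^{(m)}(\varphi, \tilde{x}; \bar{\Lambda}_{n,2}(\tilde{x})) - \widehat{\mathbb{E}}\big[\hat{r}_{n,2}^{(m)}(\varphi, \tilde{x}; \bar{\Lambda}_{n,2}(\tilde{x}))\big] \right|}{\varsigma_{n,2}} = O(1),$$
which is equivalent to the statement
$$\sup_{\tilde{x} \in S_{d,1}^{m}} \left| \hat{r}_{n,2}^{(m)}(\varphi, \tilde{x}; \bar{\Lambda}_{n,2}(\tilde{x})) - \widehat{\mathbb{E}}\big[\hat{r}_{n,2}^{(m)}(\varphi, \tilde{x}; \bar{\Lambda}_{n,2}(\tilde{x}))\big] \right| = O\!\left( \vartheta^{m(d - 1/2)} \sqrt{\frac{\log n}{n}} \right) \quad \text{a.s.}$$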
This completes the proof of Theorem 8. □
Remark 26.
Several subtle points merit explicit mention. First, the uniform almost sure bounds for $u_{n,2}(\varphi, \tilde{x})$ and $u_{n,2}(1, \tilde{x})$ are not merely of order $O_{\mathbb{P}}(\varsigma_{n,2})$ but hold with probability one; this strengthening is essential for the subsequent manipulation of the ratio, as it allows us to treat the denominator as uniformly bounded away from zero in an almost sure sense. Second, the constants $c_1$ and $c_2$ depend implicitly on the propensity score $p(\cdot)$ and the density $f(\cdot)$, but crucially not on the sample size $n$ or the bandwidth parameter $\vartheta$; this uniformity is guaranteed by the compactness of $S_{d,1}^{m}$ and the continuity of $p$ and $f$. Third, while the MAR assumption introduces the factor $\prod_{j=1}^{m} p(x_j)$ into the expectation $\mathbb{E}[u_{n,2}(1, \tilde{x})]$, this factor cancels in the ratio $\widehat{\mathbb{E}}[\hat{r}_{n,2}^{(m)}] = \mathbb{E}[u_{n,2}(\varphi, \tilde{x})] / \mathbb{E}[u_{n,2}(1, \tilde{x})]$, as the same factor appears in both numerator and denominator. This cancellation reflects the well-known property that complete-case estimators under MAR retain the same bias expansion as their complete-data counterparts, up to higher-order terms that are asymptotically negligible. Consequently, the rate $\varsigma_{n,2}$ remains unchanged from the complete-data case, although the asymptotic variance is inflated by the factor $1 / p(x_j)$ (a phenomenon that affects the constant in the central limit theorem but not the rate of convergence).
Proof of Theorem 9.
The proof of Theorem 9 rests on the bias decomposition (11.3) under the MAR assumption (2.4) with the positivity condition $\inf_{x \in S_{d,1}} p(x) \ge c > 0$. Throughout this proof, all U-statistics are understood to be the complete-case versions incorporating the missingness indicators $\delta_i$ as defined in (3.5); for notational brevity, we write $u_{n,2}(\varphi, \tilde{x})$ in place of $u_{n,2}^{(\mathrm{miss})}(\varphi, \tilde{x})$, with the implicit understanding that the product $\prod_{j=1}^{m} \delta_{i_j}$ is included in the kernel. From the bias decomposition (11.3), we have
$$\widehat{\mathbb{E}}\big[\hat{r}_{n,2}^{(m)}(\varphi, \tilde{x}; \bar{\Lambda}_{n,2}(\tilde{x}))\big] - r^{(m)}(\varphi, \tilde{x}) = \frac{1}{\mathbb{E}[u_{n,2}(1, \tilde{x})]} \Big\{ \mathbb{E}[u_{n,2}(\varphi, \tilde{x})] - r^{(m)}(\varphi, \tilde{x})\, \mathbb{E}[u_{n,2}(1, \tilde{x})] \Big\}.$$
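Consequently, the proof reduces to establishing two fundamental uniform bias estimates:
$$\sup_{\tilde{x} \in S_{d,1}^{m}} \left| \mathbb{E}[u_{n,2}(\varphi, \tilde{x})] - p(\tilde{x})\, R(\varphi, \tilde{x}) \right| = O\big( \vartheta^{-md/2} \big),$$
and
$$\sup_{\tilde{x} \in S_{d,1}^{m}} \left| \mathbb{E}[u_{n,2}(1, \tilde{x})] - p(\tilde{x})\, \tilde{f}(\tilde{x}) \right| = O\big( \vartheta^{-md/2} \big),$$
where $R(\varphi, \tilde{x}) = \tilde{f}(\tilde{x})\, r^{(m)}(\varphi, \tilde{x})$ and $p(\tilde{x}) := \prod_{j=1}^{m} p(x_j)$ is the propensity product generated by the missingness indicators under MAR. Once these are established, the desired result follows immediately from the positivity conditions $\inf_{\tilde{x} \in S_{d,1}^{m}} \tilde{f}(\tilde{x}) > 0$ (which is a consequence of condition (C.2) and the compactness of the simplex) and $p(\cdot) \ge c > 0$, and from the fact that $\mathbb{E}[u_{n,2}(1, \tilde{x})]$ converges uniformly to $p(\tilde{x})\, \tilde{f}(\tilde{x})$, ensuring that the denominator is uniformly bounded away from zero.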
The key technical tool for establishing (13) is the following proposition, which provides a sharp bias expansion for the conditional U-statistic under the MAR assumption.
Proposition 3.
Assume that condition (C.2) holds and that the MAR assumption (2.4) is satisfied with inf x S d , 1 p ( x ) c > 0 . Then, uniformly for x ˜ S d , 1 m ,
E u n , 2 ( φ , x ˜ ) = j = 1 m p ( x j ) R ( φ , x ˜ ) + ϑ m L m ( x ˜ ) + o ( ϑ m ) , ϑ ,
where
L m ( x ˜ ) : = d ( d 1 ) 2 ϑ m j = 1 m p ( x j ) R ( φ , x ˜ )
+ i = 1 m = 1 d 1 2 x i x i R ( φ , x ˜ ) p ( x i )
+ 1 2 i , j = 1 m , r = 1 d x i 1 { i = j r } x i x j r 2 x i x j r R ( φ , x ˜ ) p ( x i ) p ( x j ) .
Proof of Proposition 3.
Under the MAR assumption, the expectation of the complete-case U-statistic takes the form
$$\mathbb{E}[u_{n,2}(\varphi, \tilde{x})] = \int_{S_{d,1}^{m}} R(\varphi, \tilde{u}) \prod_{j=1}^{m} p(u_j)\, \tilde{K}_{\tilde{x}, \vartheta}(\tilde{u})\, d\tilde{u},$$
where $\tilde{K}_{\tilde{x}, \vartheta}(\tilde{u}) = \prod_{j=1}^{m} K_{x_j, \vartheta}(u_j)$ is the product Bernstein kernel. This representation follows from the tower property and the MAR condition $\mathbb{E}[\delta_j \mid X_j = u_j] = p(u_j)$. The Bernstein kernel admits the explicit representation
K x , ϑ ( u ) = ( ϑ 1 + d ) ! ( ϑ 1 ) ! k N 0 d ( ϑ 1 ) S d , 1 1 k ϑ , k + 1 ϑ ( u ) P k , ϑ 1 ( x ) ,
where P k , ϑ 1 ( x ) are multinomial probabilities. Substituting this representation yields
E u n , 2 ( φ , x ˜ ) = ( ϑ 1 + d ) ! ( ϑ 1 ) ! m ( k 1 , , k m ) K ϑ m k 1 ϑ , k 1 + 1 ϑ k m ϑ , k m + 1 ϑ R ( φ , u ˜ ) j = 1 m p ( u j ) d u ˜ × j = 1 m P k j , ϑ 1 ( x j ) ,
where K ϑ : = N 0 d ( ϑ 1 ) S d , 1 . For any multi-index k ˜ = ( k 1 , , k m ) such that k ˜ / ϑ x ˜ 1 = o ( 1 ) , we perform a second-order Taylor expansion of the function R ( φ , u ˜ ) j = 1 m p ( u j ) around u ˜ = k ˜ / ϑ . A careful calculation gives
ϑ d m k 1 ϑ , k 1 + 1 ϑ k m ϑ , k m + 1 ϑ R ( φ , u ˜ ) j = 1 m p ( u j ) d u ˜ R ( φ , x ˜ ) j = 1 m p ( x j ) = 1 ϑ d i = 1 m = 1 d k i ( ϑ 1 ) x i x i R ( φ , x ˜ ) p ( x i ) + 1 ϑ d i = 1 m = 1 d 1 2 x i x i R ( φ , x ˜ ) p ( x i ) + 1 2 i , j = 1 m , r = 1 d k i ϑ x i k j r ϑ x j r 2 x i x j r R ( φ , x ˜ ) p ( x i ) p ( x j ) ( 1 + o ( 1 ) ) + o ( ϑ d ) .
Multiplying this expansion by ϑ d m ( ϑ 1 + d ) ! ( ϑ 1 ) ! m j = 1 m P k j , ϑ 1 ( x j ) and summing over all k ˜ K ϑ m , we invoke the fundamental identities for multinomial distributions:
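$$\sum_{k \in \mathcal{K}_{\vartheta}} \left( \frac{k_i}{\vartheta} - x_i \right) P_{k, \vartheta - 1}(x) = 0,$$
$$\sum_{k \in \mathcal{K}_{\vartheta}} \left( \frac{k_i}{\vartheta} - x_i \right) \left( \frac{k_j}{\vartheta} - x_j \right) P_{k, \vartheta - 1}(x) = \frac{1}{\vartheta} \left( x_i \mathbb{1}_{\{i = j\}} - x_i x_j \right).$$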
The terms involving the first-order moments vanish identically, while the second-order moments contribute at order ϑ 1 . Summation over the product structure introduces combinatorial factors, ultimately yielding the expansion stated in the proposition. The remainder terms are of order o ( ϑ m ) under the smoothness condition (C.2). This completes the proof of Proposition 3. □
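The multinomial moment identities used in this step can be verified by exhaustive enumeration in low dimension. The sketch below is an illustration only: it assumes a multinomial vector with $N$ trials on the $d$-dimensional simplex (so $d+1$ categories) and centers $k/N$ at $x$, in which case the first centered moment vanishes exactly and the second equals $(x_i \mathbb{1}_{\{i=j\}} - x_i x_j)/N$. With the article's normalization ($k/\vartheta$ and $\vartheta - 1$ trials) the same identities hold up to terms of order $\vartheta^{-1}$ in the first moment. The function names are hypothetical.

```python
import math
from itertools import product


def multinomial_pmf(k, n, x):
    """Multinomial probability of counts k (first d categories) in n trials."""
    k0 = n - sum(k)              # count of the remaining category
    x0 = 1.0 - sum(x)            # probability of the remaining category
    coef = math.factorial(n) // (
        math.prod(math.factorial(j) for j in k) * math.factorial(k0)
    )
    return coef * math.prod(xi ** ki for xi, ki in zip(x, k)) * x0 ** k0


def centered_moments(n, x):
    """First and second moments of k/n - x under the multinomial law."""
    d = len(x)
    first = [0.0] * d
    second = [[0.0] * d for _ in range(d)]
    for k in product(range(n + 1), repeat=d):
        if sum(k) > n:
            continue             # outside the discrete simplex
        prob = multinomial_pmf(k, n, x)
        dev = [k[i] / n - x[i] for i in range(d)]
        for i in range(d):
            first[i] += dev[i] * prob
            for j in range(d):
                second[i][j] += dev[i] * dev[j] * prob
    return first, second


n_trials, x_point = 30, (0.2, 0.5)   # illustrative values
first, second = centered_moments(n_trials, x_point)
```

The vanishing first moment is what removes the gradient term from the bias expansion, while the second moment supplies the $\vartheta^{-1}$ scaling of the curvature term.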
With Proposition 3 established, we now derive the uniform bias estimate (13). From the expansion, we have
$$\mathbb{E}[u_{n,2}(\varphi, \tilde{x})] = \prod_{j=1}^{m} p(x_j)\, R(\varphi, \tilde{x}) + O(\vartheta^{-m}).$$
However, a more refined analysis using the Lipschitz continuity of R ( φ , · ) p ( · ) (which follows from condition (C.2) and the smoothness of p ( · ) ) yields a sharper rate. Indeed, from the representation
E u n , 2 ( φ , x ˜ ) = ( ϑ 1 + d ) ! ( ϑ 1 ) ! m k ˜ K ϑ m R ( φ , k ˜ / ϑ ) j = 1 m p ( k j / ϑ ) × j = 1 m P k j , ϑ 1 ( x j ) + O ( ϑ d ( m + 1 ) ) ,
and the fact that R ( φ , · ) p ( · ) is Lipschitz with constant L, we obtain
E [ u n , 2 ( φ , x ˜ ) ] R ( φ , x ˜ ) p ( x ˜ ) L ( ϑ 1 + d ) ! ( ϑ 1 ) ! m k ˜ K ϑ m k ˜ ϑ x ˜ 1 j = 1 m P k j , ϑ 1 ( x j ) + O ( ϑ d ( m + 1 ) ) L j = 1 m = 1 d E | ξ j x j | + O ( ϑ d ( m + 1 ) ) ,
where ξ j denotes the -th component of a Dirichlet random vector with parameters ( α j , β j ) . Using the Cauchy–Schwarz inequality and the fact that E [ ( ξ j x j ) 2 ] = O ( ϑ 1 ) , we obtain E [ | ξ j x j | ] = O ( ϑ 1 / 2 ) . However, a more careful analysis using the explicit form of the Bernstein kernel reveals that the summation over k ˜ introduces an additional factor of ϑ ( d 1 ) m / 2 , leading to the rate ϑ d m / 2 . Indeed, the number of terms in the sum is of order ϑ d m , while the typical deviation k ˜ / ϑ x ˜ 1 is of order ϑ 1 / 2 , and the multinomial probabilities P k , ϑ 1 ( x ) concentrate on a set of size ϑ ( d 1 ) m / 2 . Consequently,
$$\left| \mathbb{E}[u_{n,2}(\varphi, \tilde{x})] - R(\varphi, \tilde{x})\, p(\tilde{x}) \right| = O(\vartheta^{-dm/2}).$$
A similar argument applied to the constant kernel $\varphi \equiv 1$ gives
$$\left| \mathbb{E}[u_{n,2}(1, \tilde{x})] - \tilde{f}(\tilde{x})\, p(\tilde{x}) \right| = O(\vartheta^{-dm/2}).$$
Now, returning to the bias decomposition, we compute
$$\begin{aligned} \mathbb{E}[u_{n,2}(\varphi, \tilde{x})] - r^{(m)}(\varphi, \tilde{x})\, \mathbb{E}[u_{n,2}(1, \tilde{x})] &= \big[ R(\varphi, \tilde{x})\, p(\tilde{x}) + O(\vartheta^{-dm/2}) \big] - r^{(m)}(\varphi, \tilde{x}) \big[ \tilde{f}(\tilde{x})\, p(\tilde{x}) + O(\vartheta^{-dm/2}) \big] \\ &= p(\tilde{x}) \big[ R(\varphi, \tilde{x}) - r^{(m)}(\varphi, \tilde{x})\, \tilde{f}(\tilde{x}) \big] + O(\vartheta^{-dm/2}) = O(\vartheta^{-dm/2}), \end{aligned}$$
since $R(\varphi, \tilde{x}) = r^{(m)}(\varphi, \tilde{x})\, \tilde{f}(\tilde{x})$ by definition. The factor $p(\tilde{x}) = \prod_{j=1}^{m} p(x_j)$ cancels exactly, a crucial consequence of the MAR assumption that ensures the same propensity score product appears in both the numerator and denominator expectations. Finally, under the positivity condition $\inf_{\tilde{x} \in S_{d,1}^{m}} \tilde{f}(\tilde{x}) > 0$, we have $\inf_{\tilde{x}} \mathbb{E}[u_{n,2}(1, \tilde{x})] \ge \tfrac{1}{2} \inf \tilde{f}(\tilde{x}) \cdot c^{m} > 0$ for sufficiently large $\vartheta$. Hence,
$$\sup_{\tilde{x} \in S_{d,1}^{m}} \left| \widehat{\mathbb{E}}\big[\hat{r}_{n,2}^{(m)}(\varphi, \tilde{x}; \bar{\Lambda}_{n,2}(\tilde{x}))\big] - r^{(m)}(\varphi, \tilde{x}) \right| = O(\vartheta^{-dm/2}).$$
This completes the proof of Theorem 9. □
Remark 27.
Several subtle points in the proof deserve explicit commentary. First, the cancellation of the propensity score product $\prod_{j=1}^{m} p(x_j)$ is not accidental but rather a direct consequence of the MAR assumption and the fact that the same missingness indicators appear in both the numerator and denominator of the estimator. This cancellation is exact in the leading terms of the two expectations, with the remaining terms of order $O(\vartheta^{-dm/2})$, which explains why the bias rate $\vartheta^{-dm/2}$ coincides with the complete-data case. Second, the rate $\vartheta^{-dm/2}$ is slower than the rate $\vartheta^{-m}$ obtained from the Taylor expansion of $R(\varphi, \cdot)\, p(\cdot)$; this is because the Lipschitz argument captures the leading stochastic fluctuation, while the higher-order Taylor terms contribute at smaller orders. Third, the condition $\inf \tilde{f}(\tilde{x}) > 0$ is essential to ensure that the denominator $\mathbb{E}[u_{n,2}(1, \tilde{x})]$ does not approach zero, which would otherwise invalidate the bias expansion. This condition is automatically satisfied under (C.2) since $\tilde{f}$ is continuous and strictly positive on the compact simplex. Finally, while the proposition includes the factor $\prod_{j=1}^{m} p(x_j)$ in the leading term, this factor cancels in the final bias expression, demonstrating the robustness of the complete-case estimator under MAR.
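To illustrate the cancellation of the propensity factor in the Hájek-type centering, the following Python sketch evaluates the ratio of the two smoothed expectations by numerical quadrature in the simplest case $m = 1$, $d = 1$, where the Bernstein kernel reduces to binomial weights on the bins $(k/\vartheta, (k+1)/\vartheta]$. The regression function, density and propensity score below are hypothetical choices made only for this illustration, as are all function names.

```python
import math
import numpy as np


def binom_pmf(n, x):
    """Binomial(n, x) probabilities for k = 0..n, computed in log space."""
    k = np.arange(n + 1)
    logc = np.array([
        math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        for j in k
    ])
    return np.exp(logc + k * math.log(x) + (n - k) * math.log(1.0 - x))


def smoothed(g, x, theta, sub=16):
    """Approximate the integral of g(u) K_{x,theta}(u) du for the 1-D Bernstein kernel."""
    weights = binom_pmf(theta - 1, x)               # one weight per bin
    offsets = (np.arange(sub) + 0.5) / sub          # midpoint rule inside each bin
    u = (np.arange(theta)[:, None] + offsets[None, :]) / theta
    return float(np.dot(weights, g(u).mean(axis=1)))


def r(u):            # hypothetical regression function
    return 1.0 + u ** 2


def f(u):            # hypothetical covariate density (uniform)
    return np.ones_like(u)


def p_full(u):       # no missingness
    return np.ones_like(u)


def p_mar(u):        # hypothetical propensity score, bounded below by 0.3
    return 0.3 + 0.6 * u


def centering(p, x, theta):
    """Ratio E[u(phi)] / E[u(1)] for the complete-case statistics."""
    num = smoothed(lambda u: r(u) * f(u) * p(u), x, theta)
    den = smoothed(lambda u: f(u) * p(u), x, theta)
    return num / den


x0, theta0 = 0.4, 400
ratio_full = centering(p_full, x0, theta0)
ratio_mar = centering(p_mar, x0, theta0)
num_mar = smoothed(lambda u: r(u) * f(u) * p_mar(u), x0, theta0)
```

In this toy setting both ratios are close to $r(0.4) = 1.16$, whereas the numerator alone is close to $p(0.4)\, r(0.4)\, f(0.4)$: the propensity enters each expectation but drops out of the ratio in the leading term.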

14. Proof of the Results of Section 5: Beta Kernels

Let $A_h$, $h = 1, \ldots, N_n^{d}$, be the $h$-th sub-hyper-rectangle, and let $x_h$ be the point of $A_h$ most distant from the origin, that is, $x_h := \arg\max_{x \in A_h} \| x \|$. Suppose that the design point $x$ falls into $A_h$. Then, for all $\tilde{x} = (x_1, \ldots, x_m)$ we denote $\tilde{x}_h = (x_{1,h}, \ldots, x_{m,h})$ such that $\tilde{x}_h := \arg\max_{\tilde{x} \in A_h^{m}} \| \tilde{x} \|$. We work with $\tilde{x} = (x_1, \ldots, x_m) \in S_{\tilde{x}} := \prod_{i=1}^{m} S_{X_i}$, where
$$S_{X_i} = S_{X_i}(\eta_i) := \prod_{j=1}^{d} \big[ \eta_{ij},\, 1 - \eta_{ij} \big] \subseteq [0, 1]^{d},$$
and the boundary parameters $\eta_i := (\eta_{i1}, \ldots, \eta_{id})$ are either fixed or shrink to zero at a suitable rate. For each $1 \le i \le m$, we divide every edge of the $d$-hyper-rectangle $S_{X_i}$ into $N_n$ evenly spaced grids, resulting in $N_n^{d}$ identical sub-hyper-rectangles. For any $\tilde{x} = (x_1, \ldots, x_m) \in S_X^{m}$, there exists $\ell(\tilde{x}) = (\ell(x_1), \ldots, \ell(x_m))$ such that for all $1 \le i \le m$, $1 \le \ell(x_i) \le N_n^{d}$, and
$$\tilde{x} \in \prod_{i=1}^{m} A_{\ell(x_i)} \quad \text{such that} \quad x_{\ell(x_i)} := \arg\max_{x \in A_{\ell(x_i)}} \| x \|.$$
Under the MAR assumption (2.4) with the positivity condition $\inf_{x \in S_{d,1}} p(x) \ge c > 0$, we consider the complete-case U-statistic $u_{n,3}^{(\mathrm{miss})}(\varphi, \tilde{x})$ incorporating the missingness indicators $\prod_{j=1}^{m} \delta_{i_j}$. For notational brevity, we write $u_{n,3}(\varphi, \tilde{x})$ with the implicit understanding that the missingness indicators are included. For each $\tilde{x} \in S_X^{m}$, with $\tilde{x}_{\ell(\tilde{x})}$ denoting the grid point associated with $\tilde{x}$, we consider the decomposition
$$\left| u_{n,3}(\varphi, \tilde{x}) - \mathbb{E}[u_{n,3}(\varphi, \tilde{x})] \right| \le \left| u_{n,3}(\varphi, \tilde{x}) - u_{n,3}(\varphi, \tilde{x}_{\ell(\tilde{x})}) \right| + \left| \mathbb{E}[u_{n,3}(\varphi, \tilde{x}_{\ell(\tilde{x})})] - \mathbb{E}[u_{n,3}(\varphi, \tilde{x})] \right| + \left| u_{n,3}(\varphi, \tilde{x}_{\ell(\tilde{x})}) - \mathbb{E}[u_{n,3}(\varphi, \tilde{x}_{\ell(\tilde{x})})] \right|.$$
Before proceeding, we borrow a few lemmas from [165], all of which are key building blocks for the technical proofs below. Under the MAR assumption, these lemmas remain valid as they concern only the kernel structure, which is unaffected by the missingness mechanism. Throughout, $\theta_{x_j}$ denotes a beta random variable such that
$$\theta_{x_j} \overset{D}{=} \mathrm{Beta}\big( x_j / b_j + 1,\; (1 - x_j) / b_j + 1 \big).$$
Lemma 4.
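Let $\theta_{x_j}$ and $\theta_{x_k}$ be independent for $j \neq k$. Then, as $n \to \infty$, we have
$$\sup_{x_j \in (0,1)} \left| \mathbb{E}\big[ \theta_{x_j} - x_j \big] \right| = O(b_j), \quad \text{and} \quad \sup_{x_j, x_k \in (0,1)} \left| \mathbb{E}\big[ (\theta_{x_j} - x_j)(\theta_{x_k} - x_k) \big] \right| = \begin{cases} O(b_j), & \text{for } j = k, \\ O(b_j b_k), & \text{for } j \neq k. \end{cases}$$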
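The orders in Lemma 4 can be read off the closed-form moments of the beta distribution. The sketch below is an illustration under the parametrization $\mathrm{Beta}(x/b + 1, (1 - x)/b + 1)$ stated above; the function name and the grid of test values are assumptions made for this example. For this parametrization the bias equals $b(1 - 2x)/(1 + 2b)$, which is bounded by $b$ in absolute value.

```python
def beta_kernel_moments(x, b):
    """Bias E[theta] - x and mean squared deviation E[(theta - x)^2]
    for theta ~ Beta(x/b + 1, (1 - x)/b + 1)."""
    alpha = x / b + 1.0
    beta = (1.0 - x) / b + 1.0
    mean = alpha / (alpha + beta)
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1.0))
    bias = mean - x
    return bias, var + bias ** 2


# Evaluate on an illustrative grid of design points and smoothing parameters.
results = {
    (x, b): beta_kernel_moments(x, b)
    for x in (0.05, 0.25, 0.5, 0.75, 0.95)
    for b in (0.1, 0.01, 0.001)
}
```

Independence of the coordinates then gives the $O(b_j b_k)$ bound for $j \neq k$, since the cross moment factorizes into a product of two biases.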
Lemma 5.
Suppose that $b\,(= b(n) > 0)$ and $\eta\,(= \eta(n) > 0)$ satisfy $b, \eta \to 0$ and $b / \eta \to 0$ as $n \to \infty$. Then, as $n \to \infty$, we have
$$\sup_{(x, u) \in [\eta, 1 - \eta] \times [0, 1]} K_{B(x, b)}(u) \le \frac{9}{4 \sqrt{\pi}}\, b^{-1/2} \eta^{-1/2}.$$
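The scaling $b^{-1/2} \eta^{-1/2}$ in Lemma 5 can be observed numerically. The sketch below assumes the usual beta kernel $K_{B(x,b)}(u) = u^{x/b} (1 - u)^{(1 - x)/b} / B(x/b + 1, (1 - x)/b + 1)$, whose maximum over $u$ is attained at the mode $u = x$, and reports $\sqrt{b \eta}\, \sup K$ over a grid of $x \in [\eta, 1 - \eta]$. The comparison constant $9/(4\sqrt{\pi})$ is our reading of the lemma's constant and should be treated as an assumption; the function names are hypothetical.

```python
import math


def beta_kernel(u, x, b):
    """Beta kernel density with shape parameters x/b + 1 and (1 - x)/b + 1."""
    a = x / b + 1.0
    c = (1.0 - x) / b + 1.0
    log_beta_fn = math.lgamma(a) + math.lgamma(c) - math.lgamma(a + c)
    return math.exp(
        (a - 1.0) * math.log(u) + (c - 1.0) * math.log(1.0 - u) - log_beta_fn
    )


def scaled_sup(b, eta, grid=200):
    """sqrt(b * eta) times the kernel peak, maximized over x in [eta, 1 - eta]."""
    xs = [eta + (1.0 - 2.0 * eta) * i / grid for i in range(grid + 1)]
    # For each x the supremum over u is attained at the mode u = x.
    return math.sqrt(b * eta) * max(beta_kernel(x, x, b) for x in xs)


eta0 = 0.1                                     # illustrative boundary parameter
scaled = {b: scaled_sup(b, eta0) for b in (1e-2, 1e-3, 1e-4)}
lemma_constant = 9.0 / (4.0 * math.sqrt(math.pi))
```

In this experiment the scaled supremum stays essentially constant as $b$ decreases, which is the behaviour the lemma encodes.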
Lemma 6.
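Under the same condition as in Lemma 5, as $n \to \infty$, we have
$$\sup_{(x, u) \in [\eta, 1 - \eta] \times [0, 1]} \left| \frac{\partial K_{B(x, b)}(u)}{\partial x} \right| \le \frac{9}{4 \sqrt{\pi}} \left( \gamma + \frac{\pi^{2}}{6} + 1 \right) b^{-(2 + 1/2)} \eta^{-1/2},$$
where $\gamma = 0.5772\ldots$ is Euler’s constant.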
Proof of Theorem 10.
To establish this theorem under the MAR assumption, we employ a truncation argument for the conditional U-statistic, carefully accounting for the missingness indicators. First, let us introduce the following notation:
ϕ n = ( log n / n ) j = 1 m i = 1 d b j i η j i , ω n , 3 = ϕ n 1 / ( 1 + γ ) , N n = ϕ n ( 1 + 1 1 + γ ) j = 1 m i = 1 d b j i η j i 1 2 j = 1 m i = 1 d 1 b j i 2 .
From the truncation decomposition (11.1), adapted to the MAR setting with the kernel $G_{\varphi, \tilde{x}, 3}^{(\mathrm{miss})}$ defined in (2.6), we can write
$$u_{n,3}(\varphi, \tilde{x}) = u_{n,3}^{(m)}\big( G_{\varphi, \tilde{x}, 3}^{(\mathrm{miss}),(T)} \big) + u_{n,3}^{(m)}\big( G_{\varphi, \tilde{x}, 3}^{(\mathrm{miss}),(R)} \big) = u_{n,3}^{(T)}(\varphi, \tilde{x}) + u_{n,3}^{(R)}(\varphi, \tilde{x}).$$
Using the same truncation technique we employed in the previous sections’ proofs, we establish the results for the truncated and remainder parts separately. The remainder part is handled analogously to the proof of Theorem 2, utilizing condition (C.3) and the MAR assumption to ensure its asymptotic negligibility. We focus here on the truncated part, where the missingness indicators play a crucial role.
  • Truncated Part under MAR
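Let us remark that under the MAR assumption, the truncated complete-case U-statistic satisfies
$$u_{n,3}^{(T)}(\varphi, \tilde{x}) - \mathbb{E}\big[ u_{n,3}^{(T)}(\varphi, \tilde{x}) \big] = \frac{(n - m)!}{n!} \sum_{\mathbf{i} \in I(m, n)} \Big\{ G_{\varphi, \tilde{x}, 3}^{(\mathrm{miss}),(T)}(\tilde{X}_{\mathbf{i}}, \tilde{Y}_{\mathbf{i}}, \tilde{\delta}_{\mathbf{i}}) - \mathbb{E}\big[ G_{\varphi, \tilde{x}, 3}^{(\mathrm{miss}),(T)}(\tilde{X}_{\mathbf{i}}, \tilde{Y}_{\mathbf{i}}, \tilde{\delta}_{\mathbf{i}}) \big] \Big\} = \frac{(n - m)!}{n!} \sum_{\mathbf{i} \in I(m, n)} H^{(T)}(\tilde{X}_{\mathbf{i}}, \tilde{Y}_{\mathbf{i}}, \tilde{\delta}_{\mathbf{i}}),$$
where
$$H^{(T)}(\tilde{X}, \tilde{Y}, \tilde{\delta}) = G_{\varphi, \tilde{x}, 3}^{(\mathrm{miss}),(T)}(\tilde{X}, \tilde{Y}, \tilde{\delta}) - \mathbb{E}\big[ G_{\varphi, \tilde{x}, 3}^{(\mathrm{miss}),(T)}(\tilde{X}, \tilde{Y}, \tilde{\delta}) \big].$$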
We apply Lemma A2 (the exponential inequality for U-statistics) to the function H ( T ) ( · , · , · ) . Throughout the remainder of the proof, we assume, without loss of generality, that the kernel G φ , x ˜ , 3 ( miss ) , ( T ) is symmetric (if not, we replace it by its symmetrization as in Remark 2, which does not affect the U-statistic value). Moreover, by Lemma 5, for a sufficiently large n, we readily infer
H ( T ) ( X ˜ , Y ˜ , δ ˜ ) 2 ω n , 3 9 4 π d m j = 1 m i = 1 d b j i η j i 1 2 2 9 4 π d m ϕ n 2 1 / ( 1 + γ ) log ( n ) : = C H ,
where we have used the fact that | j = 1 m δ i j | 1 and the boundedness of φ ( T ) by ω n , 3 . The term ϕ n 2 1 / ( 1 + γ ) arises from the product of the kernel bounds and the truncation threshold. We also note that θ = E [ H ( T ) ( X ˜ , Y ˜ , δ ˜ ) ] = 0 by construction. One can easily derive
σ 2 = Var ( H ( T ) ( X ˜ , Y ˜ , δ ˜ ) ) E [ H ( T ) ( X ˜ , Y ˜ , δ ˜ ) 2 ] [ 0 , 1 ] d m E φ ( T ) ( Y ˜ ) 2 X ˜ = u ˜ j = 1 m p ( u j ) f ˜ ( u ˜ ) K ˜ Λ ¯ n , 3 ( x ˜ ) 2 ( u ˜ ) d u ˜ .
The factor j = 1 m p ( u j ) appears from the MAR assumption when taking the expectation over the missingness indicators: E [ j = 1 m δ i j X ˜ = u ˜ ] = j = 1 m p ( u j ) . Using Lyapunov’s inequality, and condition (C.3), for C 0 , C 1 1 , we have
E φ ( T ) ( Y ˜ ) 2 X ˜ = u ˜ j = 1 m p ( u j ) f ˜ ( u ˜ ) E φ ( T ) ( Y ˜ ) 2 + γ X ˜ = u ˜ f ˜ ( u ˜ ) 2 / ( 2 + γ ) f ˜ ( u ˜ ) γ / ( 2 + γ ) C 1 2 / ( 2 + γ ) C 0 γ / ( 2 + γ ) C 0 C 1 ,
where we have used that 0 < p ( u j ) 1 and j = 1 m p ( u j ) 1 . In addition, recall that the squared Dirichlet kernel satisfies
K α , β 2 ( u ) = B { 2 x / b + 1 , 2 ( 1 x ) / b + 1 } b 2 { x / b + 1 , ( 1 x ) / b + 1 } u 2 x / b ( 1 u ) 2 ( 1 x ) / b B { 2 x / b + 1 , 2 ( 1 x ) / b + 1 } 1 { u [ 0 , 1 ] } .
By a lemma from [98], the first term is bounded by b 1 / 2 ( 1 + b ) 3 / 2 / { 2 π x ( 1 x ) } for sufficiently large n. The second term is the probability density function of a Beta distribution with parameters { 2 x / b + 1 , 2 ( 1 x ) / b + 1 } . Therefore, we derive
σ 2 C 0 C 1 j = 1 m i = 1 d b j i 1 / 2 1 + b j i 3 / 2 2 π x j i 1 x j i C 0 C 1 j = 1 m i = 1 d b j i 1 / 2 1 + b j i 3 / 2 2 π η j i 1 η j i .
For sufficiently large n, the parameters b j 1 , , b j d and η j 1 , , η j d ( 1 j m ) are no greater than 1 / 2 , and thus
σ 2 j = 1 m j = 1 d b j i η j i C 0 C 1 3 4 3 π d m n j = 1 m j = 1 d b j i η j i C 0 C 1 3 4 3 π d m ϕ n 2 log ( n ) C 0 C 1 3 4 3 π d m ϕ n 2 log ( n ) ρ 2 ,
where ρ 2 : = C 0 C 1 3 4 3 π d m . For any ε > 0 and n sufficiently large, applying Bernstein’s inequality for U-statistics (see Lemma A2) yields
P u n , 3 ( T ) ( φ , x ˜ ) E u n , 3 ( T ) ( φ , x ˜ ) > ε ρ ϕ n 2 exp [ n / m ] ρ 2 ϕ n 2 ε 2 2 σ 2 + 2 3 C H ρ ε ϕ n 2 exp ε 2 log ( n ) 2 1 + 2 3 9 4 π d m ε ϕ n 1 1 / ( 1 + γ ) ρ .
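Taking into account that $\phi_n = o(1)$ and $\frac{2}{3} \big( \frac{9}{4 \sqrt{\pi}} \big)^{dm} \varepsilon\, \phi_n^{1 - 1/(1 + \gamma)} / \rho \le 1$ for sufficiently large $n$, it follows that
$$\mathbb{P}\Big( \big| u_{n,3}^{(T)}(\varphi, \tilde{x}) - \mathbb{E}\big[ u_{n,3}^{(T)}(\varphi, \tilde{x}) \big] \big| > \varepsilon \rho \phi_n \Big) \le 2 \exp\left( - \frac{\varepsilon^{2} \log(n)}{2 (1 + 1)} \right) = 2 n^{-\varepsilon^{2}/4}.$$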
On the other hand, we have
P sup x ˜ S X m u n , 3 ( T ) ( φ , x ˜ ) E u n , 3 ( T ) ( φ , x ˜ ) > 2 ε ρ ϕ n P sup x ˜ S X m u n , 3 ( T ) ( φ , x ˜ ) u n , 3 ( T ) ( φ , x ˜ ( x ) ) + E [ u n , 3 ( T ) ( φ , x ˜ ( x ˜ ) ) ] E [ u n , 3 ( T ) ( φ , x ˜ ) ] > ε ρ ϕ n + P sup x ˜ S X m u n , 3 ( T ) ( φ , x ˜ ( x ˜ ) ) E [ u n , 3 ( T ) ( φ , x ˜ ( x ˜ ) ) ] > ε ρ ϕ n .
We highlight that under the MAR assumption, the difference between the U-statistics evaluated at x ˜ and x ˜ ( x ˜ ) is bounded by
u n , 3 ( T ) ( φ , x ˜ ) u n , 3 ( T ) ( φ , x ˜ ( x ˜ ) ) ( n m ) ! n ! i I ( m , n ) G φ , x ˜ , 3 ( miss ) , ( T ) ( X i ˜ , Y i ˜ , δ ˜ i ) G φ , x ˜ ( x ˜ ) , 3 ( miss ) , ( T ) ( X i ˜ , Y i ˜ , δ ˜ i ) ( n m ) ! n ! i I ( m , n ) φ ( T ) ( Y i ˜ ) j = 1 m δ i j K ˜ Λ ¯ n , 3 ( x ˜ ) ( X ˜ i ) K ˜ Λ ¯ n , 3 ( x ˜ ( x ˜ ) ) ( X ˜ i ) .
The product of missingness indicators j = 1 m δ i j is bounded by 1 and does not affect the rate. Hence, the rate of
sup x ˜ A h m u n , 3 ( T ) ( φ , x ˜ ) E u n , 3 ( T ) ( φ , x ˜ ( x ˜ ) )
is determined by φ ( T ) ( Y i ˜ ) K ˜ Λ ¯ n , 3 ( x ˜ ) ( X ˜ i ) K ˜ Λ ¯ n , 3 ( x ˜ ( x ˜ ) ) ( X ˜ i ) . By the mean-value theorem, we have
K ˜ Λ ¯ n , 3 ( x ˜ ) ( X ˜ i ) K ˜ Λ ¯ n , 3 ( x ˜ ( x ˜ ) ) ( X ˜ i ) sup ( x ˜ , u ˜ ) A h m × [ 0 , 1 ] d m K ˜ Λ ¯ n , 3 ( x ˜ ) ( u ) sup x ˜ A h m x ˜ x ˜ ( x ˜ ) ,
for some x ̲ ˜ joining x ˜ and x ˜ ( x ˜ ) . For k = 1 , , m , observe that
K ˜ Λ ¯ n , 3 ( x ˜ ) ( u ) x k j = 1 , j k m K Λ n , 3 ( x j ) ( u j ) K Λ n , 3 ( x k ) ( u k ) x k j = 1 , j k m i = 1 d K α ˘ j i , β ˘ j i u j i K Λ n , 3 ( x k ) ( u k ) x k j = 1 , j k m O i = 1 d b j i η j i 1 2 K Λ n , 3 ( x k ) ( u k ) x k ,
where, by Lemmas 5 and 6, for = 1 , , d , we have
K Λ n , 3 ( x k ) ( u k ) x k i = 1 , i d K α ˘ k i , β ˘ k i u k i K α ˘ k , β ˘ k u k x k = O i = 1 d b k i η k i 1 2 1 b k 2 ,
uniformly on ( x ˜ , u ˜ ) A h m × [ 0 , 1 ] d m and
j = 1 , j k m K Λ n , 3 ( x j ) ( u j ) = j = 1 , j k m i = 1 d K α ˘ j i , β ˘ j i u j i j = 1 , j k m O i = 1 d b j i η j i 1 2 ,
which implies
sup ( x ˜ , u ˜ ) A h m × [ 0 , 1 ] d m K ˜ Λ ¯ n , 3 ( x ˜ ) ( u ˜ ) = O j = 1 m i = 1 d b j i η j i 1 2 j = 1 m i = 1 d 1 b j i 2 .
Using the fact that sup x ˜ A h m x ˜ x ˜ ( x ˜ ) = O ( N n m ) , it follows that
φ ( T ) ( Y i ˜ ) K ˜ Λ ¯ n , 3 ( x ˜ ) ( X ˜ i ) K ˜ Λ ¯ n , 3 ( x ˜ ( x ˜ ) ) ( X ˜ i ) O ω n , 3 N n m j = 1 m i = 1 d b j i η j i 1 2 j = 1 m i = 1 d 1 b j i 2 = O ( ϕ n ) ,
uniformly on ( x ˜ , u ˜ ) A h m × [ 0 , 1 ] d m . Next, making use of (14.4), we have
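$$\begin{aligned} \left| \mathbb{E}\big[ u_{n,3}^{(T)}(\varphi, \tilde{x}_{\ell(\tilde{x})}) \big] - \mathbb{E}\big[ u_{n,3}^{(T)}(\varphi, \tilde{x}) \big] \right| &= \left| \mathbb{E}\big[ u_{n,3}^{(T)}(\varphi, \tilde{x}_{\ell(\tilde{x})}) - u_{n,3}^{(T)}(\varphi, \tilde{x}) \big] \right| \\ &\le \mathbb{E}\left| u_{n,3}^{(T)}(\varphi, \tilde{x}_{\ell(\tilde{x})}) - u_{n,3}^{(T)}(\varphi, \tilde{x}) \right|. \end{aligned}$$
Just as in the bounded scenario, the progression from (14.5) to (14.6) arises from Jensen’s inequality and the properties of the absolute value function. We can deduce that
$$\sup_{\tilde{x} \in S_X^{m}} \left| \mathbb{E}\big[ u_{n,3}^{(T)}(\varphi, \tilde{x}_{\ell(\tilde{x})}) \big] - \mathbb{E}\big[ u_{n,3}^{(T)}(\varphi, \tilde{x}) \big] \right| = O(\phi_n).$$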
For sufficiently large n and each m 2 , for some ε > 0 , we infer that
P sup x ˜ S X m u n , 3 ( T ) ( φ , x ˜ ) u n , 3 ( T ) ( φ , x ˜ ( x ) ) + E [ u n , 3 ( T ) ( φ , x ˜ ( x ˜ ) ) ] E [ u n , 3 ( T ) ( φ , x ˜ ) ] > ε ρ ϕ n = 0 .
Continuing now with (14.2), by imposing that the kernel function G φ , x ˜ , 3 ( miss ) , ( T ) is symmetric, the U-statistic is decomposed according to the Hoeffding decomposition [2]:
u n , 3 ( T ) ( φ , x ˜ ( x ˜ ) ) E [ u n , 3 ( T ) ( φ , x ˜ ( x ˜ ) ) ] = q = 1 m m ! ( m q ) ! u n , 3 ( q ) π q , m ( G φ , x ˜ , 3 ( miss ) , ( T ) ) = m u n , 3 ( 1 ) π 1 , m ( G φ , x ˜ , 3 ( miss ) , ( T ) ) + q = 2 m m ! ( m q ) ! u n , 3 ( q ) π q , m ( G φ , x ˜ , 3 ( miss ) , ( T ) ) .
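The mechanics of the Hoeffding decomposition are easiest to see for $m = 2$. The following Python sketch, an illustration rather than the article's construction, takes a symmetric kernel on a three-point distribution and checks the textbook identity $U_n(h) - \theta = 2\, U_n^{(1)}(\pi_1 h) + U_n^{(2)}(\pi_2 h)$, together with the degeneracy of $\pi_2 h$. The kernel, the distribution and all names are hypothetical, and the weights $2$ and $1$ correspond to the normalization in which each $U_n^{(q)}$ is an average over ordered $q$-tuples of distinct indices.

```python
import itertools
import numpy as np

support = np.array([0.0, 1.0, 2.0])      # hypothetical three-point distribution
probs = np.array([0.5, 0.3, 0.2])


def h(x, y):
    """A symmetric kernel of order m = 2 (illustrative choice)."""
    return x * y + abs(x - y)


theta = sum(
    p * q * h(x, y)
    for x, p in zip(support, probs)
    for y, q in zip(support, probs)
)


def h1(x):
    """First Hoeffding projection: E[h(x, X2)] - theta."""
    return sum(q * h(x, y) for y, q in zip(support, probs)) - theta


def pi2(x, y):
    """Second (canonical, degenerate) projection."""
    return h(x, y) - h1(x) - h1(y) - theta


def u_stat(kernel, sample):
    """Average of the kernel over ordered pairs of distinct indices."""
    pairs = itertools.permutations(range(len(sample)), 2)
    vals = [kernel(sample[i], sample[j]) for i, j in pairs]
    return sum(vals) / len(vals)


rng = np.random.default_rng(0)
sample = rng.choice(support, size=12, p=probs)
lhs = u_stat(h, sample) - theta
rhs = 2 * np.mean([h1(x) for x in sample]) + u_stat(pi2, sample)
```

The linear term is a sum of i.i.d. centered variables, which is why Bernstein's inequality applies to it directly, whereas the degenerate term requires the separate exponential inequality invoked in the proof.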
Let us first consider the linear term. We have
$$m\, u_{n,3}^{(1)}\big( \pi_{1,m}( G_{\varphi, \tilde{x}, 3}^{(\mathrm{miss}),(T)} ) \big) = \frac{m}{n} \sum_{i=1}^{n} \pi_{1,m}\big( G_{\varphi, \tilde{x}, 3}^{(\mathrm{miss}),(T)} \big)(X_i, Y_i, \delta_i).$$
From Hoeffding’s projection (14.7), we have
π 1 , m ( G φ , x ˜ , 3 ( miss ) , ( T ) ) ( x , y , δ ) = E G φ , x ˜ , 3 ( miss ) , ( T ) ( x , X 2 , , X m ) , ( y , Y 2 , , Y m ) , ( δ , δ 2 , , δ m ) E [ G φ , x ˜ , 3 ( miss ) , ( T ) X ˜ , Y ˜ , δ ˜ ] = E [ G φ , x ˜ , 3 ( miss ) , ( T ) X ˜ , Y ˜ , δ ˜ ( X 1 , Y 1 , δ 1 ) = ( x , y , δ ) ] E [ G φ , x ˜ , 3 ( miss ) , ( T ) X ˜ , Y ˜ , δ ˜ ] .
Set
Z i ( T ) = π 1 , m ( G φ , x ˜ , 3 ( miss ) , ( T ) ) ( X ˜ i , Y ˜ i , δ ˜ i ) .
It is evident that Z i ( T ) are independent and identically distributed random variables with mean zero, and
σ 2 ϕ n 2 log ( n ) ρ 2 .
Making use of (14.1) and an application of Bernstein’s inequality, for some ε > 0 , yields
P sup x ˜ S X m u n , 3 ( 1 ) π 1 , m ( G φ , x ˜ , 3 ( miss ) , ( T ) ) > ε ρ ϕ n i = 1 N n d P max 1 i N n d u n , 3 ( 1 ) π 1 , m ( G φ , x ˜ , 3 ( miss ) , ( T ) ) > ε ρ ϕ n N n d max 1 i N n d P u n , 3 ( 1 ) π 1 , m ( G φ , x ˜ , 3 ( miss ) , ( T ) ) > ε ρ ϕ n = O N n d n ε 2 4 .
Turning to the nonlinear term, we will prove that for 2 q m :
sup x ˜ S X m m q u n , 3 ( q ) π q , m G φ , x ˜ , 3 ( miss ) , ( T ) ϕ n = o P ( 1 ) ,
which implies that, for 1 i m and = ( 1 , , m ) :
max 1 i N n d m q u n ( q ) π q , m G φ , x ˜ , 3 ( miss ) , ( T ) ϕ n = o P ( 1 ) .
To prove the above-mentioned equation, we need to apply Proposition 1 of [187] (see Lemma A3). We can see that G φ , x ˜ , 3 ( miss ) , ( T ) is bounded by 9 4 π d m ϕ n 2 1 / ( 1 + γ ) log ( n ) , hence for ε > 0 , we have
P n 1 / 2 q = 2 m m ! ( m q ) ! u n , 3 ( q ) π q , m ( G φ , x ˜ , 3 ( miss ) , ( T ) ) > ε ρ ϕ n = P q = 2 m m ! ( m q ) ! u n , 3 ( q ) π q , m ( G φ , x ˜ , 3 ( miss ) , ( T ) ) > n 1 / 2 ε ρ ϕ n = P q = 2 m m ! ( m q ) ! u n , 3 ( q ) π q , m ( G φ , x ˜ , 3 ( miss ) , ( T ) ) > ε 0 ρ ϕ n ,
where ε 0 = ε n . Now for t = ε ρ ϕ n , Lemma A3 gives
P q = 2 m m ! ( m q ) ! u n , 3 ( q ) π q , m ( G φ , x ˜ , 3 ( miss ) , ( T ) ) > ε 0 ρ ϕ n 2 exp t ( n 1 ) 1 / 2 2 m + 2 m m + 1 1 2 C H 2 exp ε ρ ϕ n ( n 1 ) 1 / 2 2 m + 2 m m + 1 1 2 C H 2 exp ε ( n 1 ) 1 / 2 log ( n ) 2 m + 2 m m + 1 ( 1 3 ) d m ϕ n 1 1 / ( 1 + γ ) .
By the last result, it follows that there exists ε > 0 such that
P q = 2 m m q u n , 3 ( q ) π q , m ( G φ , x ˜ , 3 ( miss ) , ( T ) ) > ε 0 ρ ϕ n n ε 0 / 2 C 6 ,
where
C 6 = 2 m + 2 m m + 1 1 3 d m ϕ n 1 1 / ( 1 + γ ) .
Therefore, for each ε 0 > 0 , 1 i m and = ( 1 , , m ) , we infer that
P sup x ˜ S X m q = 2 m m q u n , 3 ( q ) π q , m ( G φ , x ˜ , 3 ( miss ) , ( T ) ) > ε 0 ρ ϕ n N n d max 1 i N n d P q = 2 m m q u n , 3 ( q ) π q , m ( G φ , x ˜ , 3 ( miss ) , ( T ) ) > ε 0 ρ ϕ n N n d n m ( ε 0 / 2 C 6 ) .
By combining (14.1) and (14.9), for some ε > 0 , it follows that
P sup x ˜ S X m u n , 3 ( T ) ( φ , x ˜ ( x ˜ ) ) E [ u n , 3 ( T ) ( φ , x ˜ ( x ˜ ) ) ] > ε ρ ϕ n = O ( N n d n m ε 2 / 4 ) ,
which implies for ε = 2 5 d , as n ,
N n d n m ε 2 / 4 = ϕ n p ( 1 + 1 1 + γ ) j = 1 m i = 1 d b j i η j i d 2 j = 1 m i = 1 d 1 b j i 2 d n 5 m d = ( log n ) 5 m ϕ n 10 m ( 1 + 1 1 + γ ) j = 1 m i = 1 d η j i 1 2 j = 1 m i = 1 d k = 1 , k j m = 1 , i d b j i 5 m 1 2 b j i 5 ( m 1 ) / 2 d 0 .
Thus, the remainder part u n , 3 ( R ) ( φ , x ˜ ) is treated similarly using condition (C.3) and the MAR assumption, yielding a negligible contribution. Consequently, we have established the desired uniform convergence rate for u n , 3 ( φ , x ˜ ) under the MAR assumption. This completes the proof of Theorem 10. □
Remark 28.
The adaptation to the missing data setting required careful incorporation of the missingness indicators $\prod_{j=1}^{m} \delta_{i_j}$ into the kernel $G_{\varphi, \tilde{x}, 3}^{(\mathrm{miss}),(T)}$. The MAR assumption enters critically when taking expectations, introducing the factor $\prod_{j=1}^{m} p(u_j)$ in the variance bound. However, since $0 < p(u_j) \le 1$ under the positivity condition, this factor does not affect the asymptotic rates: it merely reduces the effective sample size by a constant factor, which is absorbed into the constants $\rho^{2}$ and $C_H$. The truncation threshold $\omega_{n,3}$ and the rate $\phi_n$ remain unchanged from the complete-data case, demonstrating the robustness of the complete-case U-statistic methodology under MAR.
  • Remainder Part
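Notice that under the MAR assumption (2.4) with the positivity condition $\inf_{x \in S_{d,1}} p(x) \ge c > 0$, the complete-case remainder kernel incorporates the missingness indicators as $G_{\varphi, \tilde{x}, 3}^{(\mathrm{miss}),(R)}(\tilde{X}, \tilde{Y}, \tilde{\delta}) = \varphi^{(R)}(\tilde{Y}) \prod_{j=1}^{m} \delta_j\, \tilde{K}_{\bar{\Lambda}_{n,3}(\tilde{x})}(\tilde{X})$. For notational brevity, we write $G_{\varphi, \tilde{x}, 3}^{(R)}$ with the implicit understanding that the missingness indicators are included. Then,
$$u_{n,3}^{(R)}(\varphi, \tilde{x}) - \mathbb{E}\big[ u_{n,3}^{(R)}(\varphi, \tilde{x}) \big] = \frac{(n - m)!}{n!} \sum_{\mathbf{i} \in I(m, n)} \Big\{ G_{\varphi, \tilde{x}, 3}^{(R)}(\tilde{X}_{\mathbf{i}}, \tilde{Y}_{\mathbf{i}}) - \mathbb{E}\big[ G_{\varphi, \tilde{x}, 3}^{(R)}(\tilde{X}_{\mathbf{i}}, \tilde{Y}_{\mathbf{i}}) \big] \Big\}.$$
Now, for $| \varphi(\tilde{Y}_{\mathbf{i}}) | > \omega_{n,3}$ we have $( | \varphi(\tilde{Y}_{\mathbf{i}}) | / \omega_{n,3} )^{1 + \gamma} > 1$, which implies that
$$\begin{aligned} \left| \mathbb{E}\big[ u_{n,3}^{(R)}(\varphi, \tilde{x}) \big] \right| &\le \mathbb{E}\Big[ | \varphi(\tilde{Y}_{\mathbf{i}}) |\, \mathbb{1}_{\{ | \varphi(\tilde{Y}_{\mathbf{i}}) | > \omega_{n,3} \}} \prod_{j=1}^{m} \delta_{i_j}\, \tilde{K}_{\bar{\Lambda}_{n,3}(\tilde{x})}(\tilde{X}_{\mathbf{i}}) \Big] \\ &\le \mathbb{E}\Big[ | \varphi(\tilde{Y}_{\mathbf{i}}) | \left( \frac{| \varphi(\tilde{Y}_{\mathbf{i}}) |}{\omega_{n,3}} \right)^{1 + \gamma} \mathbb{1}_{\{ | \varphi(\tilde{Y}_{\mathbf{i}}) | > \omega_{n,3} \}} \prod_{j=1}^{m} \delta_{i_j}\, \tilde{K}_{\bar{\Lambda}_{n,3}(\tilde{x})}(\tilde{X}_{\mathbf{i}}) \Big] \\ &\le \omega_{n,3}^{-(1 + \gamma)}\, \mathbb{E}\Big[ | \varphi(\tilde{Y}_{\mathbf{i}}) |^{2 + \gamma} \prod_{j=1}^{m} \delta_{i_j}\, \tilde{K}_{\bar{\Lambda}_{n,3}(\tilde{x})}(\tilde{X}_{\mathbf{i}}) \Big], \end{aligned}$$
where, by the MAR assumption, $\mathbb{E}\big[ \prod_{j=1}^{m} \delta_{i_j} \mid \tilde{X}_{\mathbf{i}} = \tilde{u} \big] = \prod_{j=1}^{m} p(u_j)$. Moreover, $\tilde{K}_{\bar{\Lambda}_{n,3}(\tilde{x})}(\cdot)$ is the joint density of $dm$ independent beta random variables, collected in the vector $(\theta_{x_1}, \ldots, \theta_{x_m}) \in [0, 1]^{dm}$. Consequently,
$$\begin{aligned} \mathbb{E}\Big[ | \varphi(\tilde{Y}_{\mathbf{i}}) |^{2 + \gamma} \prod_{j=1}^{m} \delta_{i_j}\, \tilde{K}_{\bar{\Lambda}_{n,3}(\tilde{x})}(\tilde{X}_{\mathbf{i}}) \Big] &= \mathbb{E}\Big[ \mathbb{E}\big( | \varphi(\tilde{Y}_{\mathbf{i}}) |^{2 + \gamma} \mid \tilde{X}_{\mathbf{i}} \big) \prod_{j=1}^{m} p(X_{i_j})\, \tilde{K}_{\bar{\Lambda}_{n,3}(\tilde{x})}(\tilde{X}_{\mathbf{i}}) \Big] \\ &= \int_{[0,1]^{dm}} \mathbb{E}\big( | \varphi(\tilde{Y}) |^{2 + \gamma} \mid \tilde{X} = \tilde{u} \big) \prod_{j=1}^{m} p(u_j)\, \tilde{f}(\tilde{u})\, \tilde{K}_{\bar{\Lambda}_{n,3}(\tilde{x})}(\tilde{u})\, d\tilde{u} \le C_1, \end{aligned}$$
where the final inequality follows from condition (C.3) (which gives $\mathbb{E}( | \varphi(\tilde{Y}) |^{2 + \gamma} \mid \tilde{X} = \tilde{u} )\, \tilde{f}(\tilde{u}) \le C_1$), the fact that $\prod_{j=1}^{m} p(u_j) \le 1$ since each $p(u_j) \le 1$, and the property that $\int \tilde{K} = 1$. Hence, by the definition $\omega_{n,3} = \phi_n^{-1/(1 + \gamma)}$, we have $| \mathbb{E}[ u_{n,3}^{(R)}(\varphi, \tilde{x}) ] | = O(\phi_n)$ uniformly in $\tilde{x} \in S_X^{m}$. Consequently, Markov’s inequality gives us
$$\sup_{\tilde{x} \in S_X^{m}} \left| u_{n,3}^{(R)}(\varphi, \tilde{x}) - \mathbb{E}\big[ u_{n,3}^{(R)}(\varphi, \tilde{x}) \big] \right| = O_{\mathbb{P}}(\phi_n).$$
A more refined argument employing the Borel–Cantelli lemma (see the proof of Theorem 2) actually yields the stronger almost sure convergence at the same rate, given the exponential bounds available for the truncated part and the moment condition for the remainder. Hence, the proof is complete.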
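The truncation step rests on a pointwise inequality, $|y|\, \mathbb{1}\{|y| > \omega\} \le \omega^{-(1+\gamma)} |y|^{2+\gamma}$, which therefore also holds for empirical averages. The Python sketch below checks it on a simulated heavy-tailed sample; the Student-$t$ distribution, the threshold and the value of $\gamma$ are illustrative assumptions, and the helper names are hypothetical.

```python
import numpy as np


def truncation_split(y, omega):
    """Split y into a truncated part (|y| <= omega) and a remainder part."""
    truncated = np.where(np.abs(y) <= omega, y, 0.0)
    remainder = y - truncated
    return truncated, remainder


def remainder_bound(y, omega, gamma):
    """Empirical version of omega^{-(1+gamma)} * E|y|^{2+gamma}."""
    return omega ** (-(1.0 + gamma)) * np.mean(np.abs(y) ** (2.0 + gamma))


rng = np.random.default_rng(1)
y = rng.standard_t(df=5, size=10_000)    # finite moments of order < 5
omega, gamma = 3.0, 1.0                  # illustrative threshold and exponent
truncated, remainder = truncation_split(y, omega)
mean_abs_remainder = np.mean(np.abs(remainder))
bound = remainder_bound(y, omega, gamma)
```

Raising the threshold shrinks the remainder at the polynomial rate $\omega^{-(1+\gamma)}$ while enlarging the envelope of the truncated kernel, which is the trade-off that the choice of $\omega_{n,3}$ resolves.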
Proof of Theorem 11.
The proof of Theorem 11 combines the classical decomposition (11.2) with the uniform convergence rates established in Theorem 10 for the complete-case conditional U-statistics $u_{n,3}(\varphi,\tilde{x})$ under the MAR assumption (2.4) and the positivity condition $\inf_{x\in S_{d,1}}p(x)\ge c>0$. Throughout this proof, all U-statistics are the complete-case versions incorporating the missingness indicators $\delta_i$ as defined in (3.5); for brevity, we write $u_{n,3}(\varphi,\tilde{x})$ in place of $u^{(\mathrm{miss})}_{n,3}(\varphi,\tilde{x})$, with the understanding that the product $\prod_{j=1}^{m}\delta_{i_j}$ is included in the kernel.
Recall that the estimator of interest admits the representation $\hat{r}^{(m)}_{n,3}(\varphi,\tilde{x};\bar\Lambda_{n,3}(\tilde{x}))=u_{n,3}(\varphi,\tilde{x})/u_{n,3}(1,\tilde{x})$, and its Hájek-type centering is defined as
$$
\hat{E}\big[\hat{r}^{(m)}_{n,3}(\varphi,\tilde{x};\bar\Lambda_{n,3}(\tilde{x}))\big]=\frac{E[u_{n,3}(\varphi,\tilde{x})]}{E[u_{n,3}(1,\tilde{x})]}.
$$
From the elementary identity
$$
\frac{a}{b}-\frac{\alpha}{\beta}=\frac{a-\alpha}{b}-\frac{\alpha\,(b-\beta)}{b\,\beta},
$$
valid for $b,\beta\neq 0$, we obtain the fundamental decomposition
$$
\Big|\hat{r}^{(m)}_{n,3}(\varphi,\tilde{x};\bar\Lambda_{n,3}(\tilde{x}))-\hat{E}\big[\hat{r}^{(m)}_{n,3}(\varphi,\tilde{x};\bar\Lambda_{n,3}(\tilde{x}))\big]\Big|\le\big|I_{3,1}(\tilde{x})\big|+\big|I_{3,2}(\tilde{x})\big|,
$$
where the stochastic components are defined as
$$
I_{3,1}(\tilde{x}):=\frac{u_{n,3}(\varphi,\tilde{x})-E[u_{n,3}(\varphi,\tilde{x})]}{u_{n,3}(1,\tilde{x})},
$$
$$
I_{3,2}(\tilde{x}):=\frac{E[u_{n,3}(\varphi,\tilde{x})]\cdot\big(u_{n,3}(1,\tilde{x})-E[u_{n,3}(1,\tilde{x})]\big)}{u_{n,3}(1,\tilde{x})\cdot E[u_{n,3}(1,\tilde{x})]}.
$$
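As a quick sanity check of the algebra behind this decomposition, the following Python sketch verifies the elementary ratio identity in exact rational arithmetic. It is only an illustration: the function name `ratio_gap` and the numerical values are hypothetical and are not part of the original argument.

```python
from fractions import Fraction


def ratio_gap(a, b, alpha, beta):
    """Return a/b - alpha/beta together with the two pieces of the identity.

    first  = (a - alpha) / b                   (plays the role of I_{3,1})
    second = alpha * (b - beta) / (b * beta)   (plays the role of I_{3,2})
    """
    lhs = a / b - alpha / beta
    first = (a - alpha) / b
    second = alpha * (b - beta) / (b * beta)
    return lhs, first, second


# Hypothetical values standing in for u(phi), u(1), E[u(phi)], E[u(1)].
a, b = Fraction(7, 3), Fraction(5, 4)
alpha, beta = Fraction(2), Fraction(9, 8)

lhs, first, second = ratio_gap(a, b, alpha, beta)
print(lhs == first - second)                   # the identity holds exactly
print(abs(lhs) <= abs(first) + abs(second))    # triangle-inequality form
```

Exact fractions are used so that the equality is checked without floating-point tolerance.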
The crux of the proof lies in establishing uniform (in x ˜ ) almost sure bounds for both I 3 , 1 and I 3 , 2 , which then translate directly into the desired rate for the ratio estimator. To this end, we first establish uniform lower bounds for the denominator u n , 3 ( 1 , x ˜ ) and its expectation under the MAR assumption. Under the MAR assumption and the positivity condition, together with the bias expansion established in Theorem 4 (adapted to the current kernel setting), we have the following uniform convergence results. For the denominator u n , 3 ( 1 , x ˜ ) , Theorem 10 with φ 1 furnishes the uniform almost sure convergence rate
sup x ˜ S X m u n , 3 ( 1 , x ˜ ) E [ u n , 3 ( 1 , x ˜ ) ] = O ( ϕ n ) a . s . ,
where ϕ n = ( log n ) / n j = 1 m ( i = 1 d b j i η j i ) 1 / 2 is the rate defined in Theorem 10. Moreover, from the bias expansion (see Proposition 3 and its extension to the current kernel), we have
E [ u n , 3 ( 1 , x ˜ ) ] = j = 1 m p ( x j ) f ˜ ( x ˜ ) + O ( ϑ m / 2 ) ,
uniformly in x ˜ S X m . Since f ˜ ( x ˜ ) is continuous and strictly positive on the compact set S X m (by condition (C.2) and the compactness of the domain), and since p ( x j ) c > 0 by the positivity condition, there exists a constant c 2 > 0 such that
inf x ˜ S X m E [ u n , 3 ( 1 , x ˜ ) ] c 2 > 0 for all sufficiently large n .
Furthermore, the uniform convergence of u n , 3 ( 1 , x ˜ ) to its expectation, guaranteed by Theorem 10, implies that for sufficiently large n,
inf x ˜ S X m u n , 3 ( 1 , x ˜ ) c 2 2 > 0 almost surely ,
since the deviation | u n , 3 ( 1 , x ˜ ) E [ u n , 3 ( 1 , x ˜ ) ] | is of order o ( 1 ) uniformly. Define c 1 : = c 2 / 2 ; then, almost surely,
inf x ˜ S X m u n , 3 ( 1 , x ˜ ) c 1 > 0 .
For the numerator expectation, the bias expansion together with condition (C.3) gives the uniform boundedness
sup x ˜ S X m E [ u n , 3 ( φ , x ˜ ) ] = O ( 1 ) ,
so there exists a constant $C_\varphi<\infty$ such that $\sup_{\tilde{x}}|E[u_{n,3}(\varphi,\tilde{x})]|\le C_\varphi$. With these preparatory estimates in hand, we now bound each term. For $I_{3,1}(\tilde{x})$, using the lower bound $|u_{n,3}(1,\tilde{x})|\ge c_1$ almost surely, we obtain
$$
\sup_{\tilde{x}\in S_X^m}\frac{|I_{3,1}(\tilde{x})|}{\phi_n}\le\frac{1}{c_1}\sup_{\tilde{x}\in S_X^m}\frac{\big|u_{n,3}(\varphi,\tilde{x})-E[u_{n,3}(\varphi,\tilde{x})]\big|}{\phi_n}=O(1)\quad\text{a.s.},
$$
where the final equality follows directly from the uniform rate provided by Theorem 10 applied to the kernel $\varphi$. For $I_{3,2}(\tilde{x})$, we similarly apply the lower bounds $|u_{n,3}(1,\tilde{x})|\ge c_1$ and $|E[u_{n,3}(1,\tilde{x})]|\ge c_2$, together with the boundedness of $E[u_{n,3}(\varphi,\tilde{x})]$, to obtain
$$
\sup_{\tilde{x}\in S_X^m}\frac{|I_{3,2}(\tilde{x})|}{\phi_n}\le\frac{\sup_{\tilde{x}}|E[u_{n,3}(\varphi,\tilde{x})]|}{c_1c_2}\cdot\sup_{\tilde{x}\in S_X^m}\frac{\big|u_{n,3}(1,\tilde{x})-E[u_{n,3}(1,\tilde{x})]\big|}{\phi_n}
\le\frac{C_\varphi}{c_1c_2}\cdot O(1)=O(1)\quad\text{a.s.},
$$
where the rate for $u_{n,3}(1,\tilde{x})$ again follows from Theorem 10 with $\varphi\equiv 1$.
Summing the two contributions, we conclude that, almost surely,
$$
\sup_{\tilde{x}\in S_X^m}\frac{\Big|\hat{r}^{(m)}_{n,3}(\varphi,\tilde{x};\bar\Lambda_{n,3}(\tilde{x}))-\hat{E}\big[\hat{r}^{(m)}_{n,3}(\varphi,\tilde{x};\bar\Lambda_{n,3}(\tilde{x}))\big]\Big|}{\phi_n}=O(1),
$$
which is equivalent to the statement
$$
\sup_{\tilde{x}\in S_X^m}\Big|\hat{r}^{(m)}_{n,3}(\varphi,\tilde{x};\bar\Lambda_{n,3}(\tilde{x}))-\hat{E}\big[\hat{r}^{(m)}_{n,3}(\varphi,\tilde{x};\bar\Lambda_{n,3}(\tilde{x}))\big]\Big|=O(\phi_n)\quad\text{a.s.}
$$
This completes the proof of Theorem 11. □
Remark 29.
Several subtle points in the proof merit explicit commentary. First, the uniform almost sure bounds for u n , 3 ( φ , x ˜ ) and u n , 3 ( 1 , x ˜ ) are not merely of order O P ( ϕ n ) but hold with probability one; this strengthening is essential for the subsequent manipulation of the ratio, as it allows us to treat the denominator as uniformly bounded away from zero in an almost sure sense. Second, the constants c 1 and c 2 depend implicitly on the propensity score p ( · ) and the density f ( · ) through the bias expansion, but crucially not on the sample size n or the bandwidth parameter; this uniformity is guaranteed by the compactness of S X m and the continuity of p and f. Third, while the MAR assumption introduces the factor j = 1 m p ( x j ) into the expectation E [ u n , 3 ( 1 , x ˜ ) ] , this factor cancels completely in the ratio E ^ [ r ^ n , 3 ( m ) ] = E [ u n , 3 ( φ , x ˜ ) ] / E [ u n , 3 ( 1 , x ˜ ) ] , as the same factor appears in both numerator and denominator. This cancellation is the mathematical manifestation of the well-known property that complete-case estimators under MAR retain the same bias expansion as their complete-data counterparts, up to higher-order terms that are asymptotically negligible. Consequently, the rate ϕ n remains unchanged from the complete-data case, although the asymptotic variance is inflated by the factor 1 / p ( x j ) (a phenomenon that affects the constant in the central limit theorem but not the rate of convergence). Finally, the application of Theorem 10 is justified under the same regularity conditions as in the previous sections, with the bandwidth parameters b j i and η j i satisfying the appropriate decay conditions.
Proof of Theorem 12.
The demonstration of Theorem 12 proceeds under the MAR assumption (2.4) with the positivity condition inf x S d , 1 p ( x ) c > 0 . Throughout this proof, all U-statistics are understood to be the complete-case versions incorporating the missingness indicators δ i as defined in (3.5); for notational brevity, we write u n , 3 ( φ , x ˜ ) in place of u n , 3 ( miss ) ( φ , x ˜ ) , with the implicit understanding that the product j = 1 m δ i j is included in the kernel. To obtain the desired results, we need to prove that:
$$
\sup_{\tilde{x}\in S_X^m}\Big|E\,u_{n,3}(\varphi,\tilde{x})-R(\varphi,\tilde{x})\prod_{j=1}^{m}p(x_j)\Big|=O\Big(\sum_{j=1}^{m}\sum_{i=1}^{d}b_{ji}\Big).
$$
We first remark that under the MAR assumption, the expectation of the complete-case U-statistic takes the form
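$$
\begin{aligned}
E\,u_{n,3}(\varphi,\tilde{x})
&=\frac{(n-m)!}{n!}\sum_{i\in I(m,n)}E\big[G^{(\mathrm{miss})}_{\varphi,\tilde{x},3}(\tilde{X}_i,\tilde{Y}_i,\tilde{\delta}_i)\big]
=E\Big[\varphi(\tilde{Y}_i)\prod_{j=1}^{m}\delta_{i_j}\,\tilde{K}_{\bar\Lambda_{n,3}(\tilde{x})}(\tilde{X}_i)\Big]\\
&=E\Big[E\Big(\varphi(\tilde{Y}_i)\prod_{j=1}^{m}\delta_{i_j}\,\Big|\,\tilde{X}_i\Big)\tilde{K}_{\bar\Lambda_{n,3}(\tilde{x})}(\tilde{X}_i)\Big]\\
&=E\Big[\prod_{j=1}^{m}p(X_{i_j})\,E\big(\varphi(\tilde{Y}_i)\mid\tilde{X}_i\big)\tilde{K}_{\bar\Lambda_{n,3}(\tilde{x})}(\tilde{X}_i)\Big]\qquad(\text{by MAR})\\
&=\int_{[0,1]^{dm}}r^{(m)}(\varphi,\tilde{u})\prod_{j=1}^{m}p(u_j)\,\tilde{f}(\tilde{u})\,\tilde{K}_{\bar\Lambda_{n,3}(\tilde{x})}(\tilde{u})\,d\tilde{u}
=E\Big[R(\varphi,\theta_{\tilde{x}})\prod_{j=1}^{m}p(\theta_{x_j})\Big],
\end{aligned}
$$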
where θ x ˜ = ( θ x 1 , , θ x m ) [ 0 , 1 ] d m is a random vector whose components are independent beta random variables with θ x j Beta ( x j / b j + 1 , ( 1 x j ) / b j + 1 ) , and K ˜ Λ ¯ n , 3 ( x ˜ ) ( · ) denotes the product of the corresponding beta densities. The final equality follows from the fact that K ˜ Λ ¯ n , 3 ( x ˜ ) ( · ) is precisely the joint density of θ x ˜ . Following the same reasoning as the proof of Theorem 9 (adapted to the current kernel setting), we perform a second-order Taylor expansion of the function R ( φ , · ) j = 1 m p ( · ) around θ x ˜ = x ˜ . This yields
E R ( φ , θ x ˜ ) j = 1 m p ( θ x j ) = R ( φ , x ˜ ) j = 1 m p ( x j ) + i = 1 m = 1 d x i R ( φ , x ˜ ) p ( x i ) E ( θ x i x i ) + 1 2 i = 1 m = 1 d 2 x i 2 R ( φ , x ̲ ˜ ) p ( x ̲ i ) E ( θ x i x i ) 2 + i , j = 1 , i j m , r = 1 , r d 2 x i x j r R ( φ , x ̲ ˜ ) p ( x ̲ i ) p ( x ̲ j ) E ( θ x i x i ) ( θ x j r x j r ) ,
for some $\underline{\tilde{x}}$ lying on the line segment joining $\theta_{\tilde{x}}$ and $\tilde{x}$. The existence of such a point is guaranteed by Taylor's theorem with Lagrange remainder, and the smoothness condition (C.2) ensures that the second-order derivatives are continuous and bounded on the compact domain. For a Beta distribution with parameters $\alpha=x/b+1$ and $\beta=(1-x)/b+1$, we have
$$
E[\theta]=\frac{\alpha}{\alpha+\beta}=\frac{x/b+1}{1/b+2}=\frac{x+b}{1+2b}=x+b(1-2x)+O(b^2).
$$
Hence, $E(\theta-x)=O(b)$. Since $\mathrm{Var}(\theta)=\alpha\beta/\{(\alpha+\beta)^2(\alpha+\beta+1)\}=O(b)$, the second moment satisfies $E[(\theta-x)^2]=\mathrm{Var}(\theta)+(E[\theta]-x)^2=O(b)+O(b^2)=O(b)$. No appeal to the Cauchy–Schwarz inequality is needed here, because the first-order term is already of order $b$, not $b^{1/2}$. Consequently, we have the estimate:
$$
E\Big[R(\varphi,\theta_{\tilde{x}})\prod_{j=1}^{m}p(\theta_{x_j})\Big]=R(\varphi,\tilde{x})\prod_{j=1}^{m}p(x_j)+O\Big(\sum_{j=1}^{m}\sum_{i=1}^{d}b_{ji}\Big).
$$
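The moment orders used in this step can be checked numerically from the closed-form mean and variance of the Beta distribution. The sketch below is an illustration only; the function name `beta_kernel_moments`, the evaluation point $x=0.3$ and the bandwidth grid are assumptions made for the example.

```python
def beta_kernel_moments(x, b):
    """Bias, variance and second moment about x of theta ~ Beta(x/b + 1, (1 - x)/b + 1)."""
    alpha = x / b + 1.0
    beta = (1.0 - x) / b + 1.0
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1.0))
    bias = mean - x                      # equals b * (1 - 2x) / (1 + 2b)
    second_moment = var + bias ** 2      # E[(theta - x)^2]
    return bias, var, second_moment


x = 0.3  # hypothetical interior evaluation point
for b in (0.1, 0.01, 0.001):
    bias, var, m2 = beta_kernel_moments(x, b)
    # Both ratios stay bounded as b decreases, consistent with O(b) orders.
    print(b, bias / b, m2 / b)
```

The ratio `bias / b` approaches $1-2x$ and `m2 / b` approaches $x(1-x)$ as $b\to 0$, which is the behaviour the expansion relies on.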
Thus,
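$$
\sup_{\tilde{x}\in S_X^m}\Big|E\,u_{n,3}(\varphi,\tilde{x})-R(\varphi,\tilde{x})\prod_{j=1}^{m}p(x_j)\Big|=O\Big(\sum_{j=1}^{m}\sum_{i=1}^{d}b_{ji}\Big).
$$
Taking $\varphi\equiv 1$ in the above equation gives us
$$
\sup_{\tilde{x}\in S_X^m}\Big|E\,u_{n,3}(1,\tilde{x})-\tilde{f}(\tilde{x})\prod_{j=1}^{m}p(x_j)\Big|=O\Big(\sum_{j=1}^{m}\sum_{i=1}^{d}b_{ji}\Big).
$$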
Now, recall the bias decomposition (11.3):
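$$
\hat{E}\big[\hat{r}^{(m)}_{n,3}(\varphi,\tilde{x};\bar\Lambda_{n,3}(\tilde{x}))\big]-r^{(m)}(\varphi,\tilde{x})=\frac{E[u_{n,3}(\varphi,\tilde{x})]-r^{(m)}(\varphi,\tilde{x})\,E[u_{n,3}(1,\tilde{x})]}{E[u_{n,3}(1,\tilde{x})]}.
$$
Combining (14.19) and (14.20), we compute the numerator:
$$
\begin{aligned}
E[u_{n,3}(\varphi,\tilde{x})]-r^{(m)}(\varphi,\tilde{x})\,E[u_{n,3}(1,\tilde{x})]
&=\Big\{R(\varphi,\tilde{x})\prod_{j=1}^{m}p(x_j)+O\Big(\sum_{j,i}b_{ji}\Big)\Big\}-r^{(m)}(\varphi,\tilde{x})\Big\{\tilde{f}(\tilde{x})\prod_{j=1}^{m}p(x_j)+O\Big(\sum_{j,i}b_{ji}\Big)\Big\}\\
&=\prod_{j=1}^{m}p(x_j)\Big[R(\varphi,\tilde{x})-r^{(m)}(\varphi,\tilde{x})\tilde{f}(\tilde{x})\Big]+O\Big(\sum_{j,i}b_{ji}\Big)
=O\Big(\sum_{j=1}^{m}\sum_{i=1}^{d}b_{ji}\Big),
\end{aligned}
$$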
since $R(\varphi,\tilde{x})=r^{(m)}(\varphi,\tilde{x})\tilde{f}(\tilde{x})$ by definition. The factor $\prod_{j=1}^{m}p(x_j)$ cancels exactly, a crucial consequence of the MAR assumption ensuring that the same propensity score product appears in both the numerator and denominator expectations. Under the positivity condition $\inf_{\tilde{x}\in S_X^m}\tilde{f}(\tilde{x})>0$ (which follows from condition (C.2) and the compactness of $S_X^m$), together with the fact that $\prod_{j=1}^{m}p(x_j)\ge c^m>0$, we have
$$
\inf_{\tilde{x}\in S_X^m}E[u_{n,3}(1,\tilde{x})]\ge\frac{1}{2}\,\inf_{\tilde{x}\in S_X^m}\tilde{f}(\tilde{x})\cdot c^m>0
$$
for sufficiently large $n$. Therefore, the denominator in the bias decomposition is uniformly bounded away from zero. Consequently,
$$
\sup_{\tilde{x}\in S_X^m}\Big|\hat{E}\big[\hat{r}^{(m)}_{n,3}(\varphi,\tilde{x};\bar\Lambda_{n,3}(\tilde{x}))\big]-r^{(m)}(\varphi,\tilde{x})\Big|=O\Big(\sum_{j=1}^{m}\sum_{i=1}^{d}b_{ji}\Big).
$$
Hence, the proof of Theorem 12 is complete. □
Proof of Theorem 13.
The proof of Theorem 13 establishes the almost sure uniform convergence rate of the complete-case conditional U-statistic $u_{n,3}(\varphi,\tilde{x})$ under the MAR assumption (2.4) with the positivity condition $\inf_{x\in S_{d,1}}p(x)\ge c>0$. Throughout this proof, all U-statistics are the complete-case versions incorporating the missingness indicators $\delta_i$ as defined in (3.5); for brevity, we write $u_{n,3}(\varphi,\tilde{x})$ in place of $u^{(\mathrm{miss})}_{n,3}(\varphi,\tilde{x})$, with the understanding that the product $\prod_{j=1}^{m}\delta_{i_j}$ is included in the kernel. Using the notation of the proof of Theorem 10 and reasoning as in [165], we redefine the truncation threshold $\omega_{n,3}$ and the grid size parameter $N_n$ as follows:
ω n , 3 : = n 1 + ε 2 + γ ,             N n : = n 1 + ε j = 1 m i = 1 d b j i η j i 1 2 j = 1 m i = 1 d 1 b j i 2 ,
for an arbitrarily small ε > 0 . These choices are designed to ensure the summability of the tail probabilities via the Borel–Cantelli lemma, thereby yielding almost sure convergence. The rate of convergence is given by
ϕ n : = ( log n / n ) j = 1 m i = 1 d b j i η j i .
In order to prove Theorem 13, we need to demonstrate that
sup x ˜ S X m u n , 3 ( φ , x ˜ ) E u n , 3 ( φ , x ˜ ) = O ϕ n , a . s .
Recall the truncation decomposition u n , 3 ( φ , x ˜ ) = u n , 3 ( T ) ( φ , x ˜ ) + u n , 3 ( R ) ( φ , x ˜ ) from (11.1), where the truncated part u n , 3 ( T ) uses φ ( T ) = φ 1 { | φ | ω n , 3 } and the remainder part u n , 3 ( R ) uses φ ( R ) = φ 1 { | φ | > ω n , 3 } . Under the MAR assumption, the remainder kernel incorporates the missingness indicators as G φ , x ˜ , 3 ( miss ) , ( R ) ( X ˜ , Y ˜ , δ ˜ ) = φ ( R ) ( Y ˜ ) j = 1 m δ j K ˜ Λ ¯ n , 3 ( x ˜ ) ( X ˜ ) . We first analyze the remainder term. From (14.11) and (14.12), which under the MAR assumption incorporate the factor j = 1 m p ( u j ) 1 , we obtain the bound
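$$
E\big|u^{(R)}_{n,3}(\varphi,\tilde{x})\big|\le\omega_{n,3}^{-(1+\gamma)}C_1=n^{-(1+\varepsilon)\frac{1+\gamma}{2+\gamma}}C_1=O(\phi_n).
$$
This follows from the definition $\omega_{n,3}=n^{(1+\varepsilon)/(2+\gamma)}$, which implies
$$
\omega_{n,3}^{-(1+\gamma)}=n^{-(1+\varepsilon)(1+\gamma)/(2+\gamma)}=n^{-(1+\gamma)/(2+\gamma)}\cdot n^{-\varepsilon(1+\gamma)/(2+\gamma)}=o(\phi_n),
$$
since $(1+\gamma)/(2+\gamma)>1/2$ for $\gamma>0$, whereas $\phi_n$ decays only polynomially, no faster than $n^{-1/2}$ up to logarithmic and bandwidth factors. Moreover, by condition (C.3) (or its generalization (C.3)″) and Markov’s inequality, we infer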
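$$
\sum_{n=1}^{\infty}P\big(|\varphi(\tilde{Y}_n)|>\omega_{n,3}\big)\le\sum_{n=1}^{\infty}\frac{E|\varphi(\tilde{Y}_n)|^{2+\gamma}}{\omega_{n,3}^{2+\gamma}}=E|\varphi(\tilde{Y}_1)|^{2+\gamma}\sum_{n=1}^{\infty}\frac{1}{n^{1+\varepsilon}}<\infty,
$$
since $\omega_{n,3}^{2+\gamma}=n^{1+\varepsilon}$, the $\tilde{Y}_n$ are identically distributed, and the series $\sum_n n^{-(1+\varepsilon)}$ converges for any $\varepsilon>0$. The finiteness of $E(|\varphi(\tilde{Y}_1)|^{2+\gamma})$ is guaranteed by condition (C.3). Applying the Borel–Cantelli lemma, we conclude that, with probability one, $|\varphi(\tilde{Y}_n)|\le\omega_{n,3}$ for all sufficiently large $n$. Since $\omega_{n,3}$ is increasing in $n$ and tends to infinity, this implies that, almost surely, $|\varphi(\tilde{Y}_i)|\le\omega_{n,3}$ for every $i\le n$ once $n$ is sufficiently large. Consequently, the indicator $\mathbb{1}\{|\varphi(\tilde{Y}_i)|>\omega_{n,3}\}$ vanishes almost surely for all $m$-tuples $i\in I(m,n)$. It follows that $u^{(R)}_{n,3}(\varphi,\tilde{x})=0$ almost surely for sufficiently large $n$, and therefore
$$
\big|u^{(R)}_{n,3}(\varphi,\tilde{x})-E\,u^{(R)}_{n,3}(\varphi,\tilde{x})\big|=O(\phi_n)\quad\text{a.s.},
$$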
uniformly in x ˜ S X m . We now examine the discretization bias arising from replacing x ˜ by its grid approximation x ˜ ( x ˜ ) . Observe that
ω n , 3 N n m j = 1 m i = 1 d b j i η j i 1 2 j = 1 m i = 1 d 1 b j i 2 = n m ( 1 + ε ) 1 + γ 2 + γ · ( polynomial factors ) = O ( ϕ n ) ,
since the remaining bandwidth factors are dominated by the polynomial decay of $n^{-m(1+\varepsilon)}$ for sufficiently large $n$. Using the same reasoning as in the proof of Theorem 10 (see (14.4) and the subsequent estimates), we obtain
sup x ˜ S X m E [ u n , 3 ( T ) ( φ , x ˜ ( x ˜ ) ) ] E [ u n , 3 ( T ) ( φ , x ˜ ) ] = O ω n , 3 N n m j = 1 m i = 1 d b j i η j i 1 2 j = 1 m i = 1 d 1 b j i 2 = O ( ϕ n ) .
This bound involves expectations only, so it is deterministic and uniform in $\tilde{x}$. The exponential bounds (14.1) and (14.9) from the proof of Theorem 10 remain valid under the MAR assumption with the modified definitions of $\omega_{n,3}$ and $N_n$. Specifically, for any $\varepsilon>0$,
P u n , 3 ( T ) ( φ , x ˜ ) E u n , 3 ( T ) ( φ , x ˜ ) > ε ρ ϕ n 2 n ε 2 / 4 ,
and for the nonlinear terms,
P sup x ˜ S X m q = 2 m m q u n , 3 ( q ) π q , m ( G φ , x ˜ , 3 ( miss ) , ( T ) ) > ε 0 ρ ϕ n N n d n m ( ε 0 / 2 C 6 ) .
Condition (C.4’) (or an appropriate growth condition on the bandwidth parameters) implies that
j = 1 m i = 1 d b j i η j i 1 2 j = 1 m i = 1 d 1 b j i 2 = O n 1 1 κ j = 1 m i = 1 d b j i η j i 1 2 κ 2 log ( n ) 1 1 κ O n 1 1 κ ,
where the last inequality holds because j = 1 m { ( i = 1 d b j i η j i ) 1 / 2 } κ / 2 / log ( n ) is bounded. This condition ensures that N n grows at most polynomially in n. Now choose
$$
K:=2\sqrt{(d+1)(1+\varepsilon)+\frac{d}{1-\kappa}}.
$$
With this choice, we have $N_n^d=O(n^{d/(1-\kappa)})$ up to logarithmic factors, and consequently
$$
N_n^d\,n^{-K^2/4}=O\Big(n^{\frac{d}{1-\kappa}-\frac{K^2}{4}}\Big)=O\big(n^{-(1+\varepsilon)}\big),
$$
since $K^2/4=(d+1)(1+\varepsilon)+d/(1-\kappa)>d/(1-\kappa)+(1+\varepsilon)$. Therefore,
n = 1 P sup x ˜ S X m u n , 3 ( T ) ( φ , x ˜ ) E u n , 3 ( T ) ( φ , x ˜ ) > ε ρ ϕ n n = 1 O 1 n 1 + ε < .
By the Borel–Cantelli lemma, the summability of the tail probabilities implies that, almost surely,
sup x ˜ S X m u n , 3 ( T ) ( φ , x ˜ ) E u n , 3 ( T ) ( φ , x ˜ ) = O ϕ n .
Noting that the truncated part dominates the asymptotic behavior (the remainder part is identically zero almost surely for sufficiently large n), we obtain
sup x ˜ S X m u n , 3 ( φ , x ˜ ) E u n , 3 ( φ , x ˜ ) = O ϕ n a . s .
This completes the proof of Theorem 13. □
Remark 30.
Several subtle points in the proof deserve explicit commentary. First, the choice $\omega_{n,3}=n^{(1+\varepsilon)/(2+\gamma)}$ is critical for the Borel–Cantelli argument: it ensures that the series $\sum_n P(|\varphi(\tilde{Y}_n)|>\omega_{n,3})$ converges, while $\omega_{n,3}$ grows fast enough that the expectation of the remainder part, of order $\omega_{n,3}^{-(1+\gamma)}$, is asymptotically negligible. Second, condition (C.4’) (or an analogous growth condition on $b_{ji}$ and $\eta_{ji}$) guarantees that $N_n$ does not grow too fast, specifically $N_n^d=O(n^{d/(1-\kappa)})$, which is essential for the summability of the tail probabilities after the union bound over the $N_n^d$ grid points. Third, the choice $K=2\sqrt{(d+1)(1+\varepsilon)+d/(1-\kappa)}$ is calibrated so that $K^2/4>d/(1-\kappa)+(1+\varepsilon)$, yielding $N_n^d\,n^{-K^2/4}=O(n^{-(1+\varepsilon)})$, so that the series converges. Fourth, under the MAR assumption, the missingness indicators $\delta_i$ do not affect the almost sure convergence rates because they are bounded by 1 and the positivity condition ensures that the effective sample size remains proportional to $n$ almost surely. The factor $\prod_{j=1}^{m}p(u_j)$ that appears in the expectations is bounded by 1, so it does not alter the rates. Finally, the almost sure convergence established here is stronger than convergence in probability, providing uniform control over the entire domain $S_X^m$ with probability one.
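The calibration described in this remark can be illustrated with a short numerical sketch. Everything below is a hypothetical illustration: the function names and the values of $\varepsilon$, $\gamma$, $d$ and $\kappa$ are assumptions chosen for the example, not quantities prescribed by the theorems.

```python
import math


def truncation_threshold(n, eps, gamma):
    """omega_{n,3} = n^{(1 + eps) / (2 + gamma)}."""
    return n ** ((1.0 + eps) / (2.0 + gamma))


def calibrated_K(d, eps, kappa):
    """K = 2 * sqrt((d + 1)(1 + eps) + d / (1 - kappa))."""
    return 2.0 * math.sqrt((d + 1) * (1.0 + eps) + d / (1.0 - kappa))


def tail_series_partial_sum(N, eps):
    """Partial sum of sum_{k>=1} k^{-(1 + eps)}, bounded by 1 + 1/eps."""
    return sum(k ** -(1.0 + eps) for k in range(1, N + 1))


eps, gamma, d, kappa = 0.1, 1.0, 2, 0.5  # illustrative values only
n = 1000

omega = truncation_threshold(n, eps, gamma)
markov_term = omega ** -(2.0 + gamma)            # equals n^{-(1 + eps)}
partial = tail_series_partial_sum(100_000, eps)  # stays below 1 + 1/eps
K = calibrated_K(d, eps, kappa)
margin = K ** 2 / 4.0 - (d / (1.0 - kappa) + (1.0 + eps))  # equals d * (1 + eps)

print(markov_term, partial, margin)
```

The first quantity shows why Markov's inequality produces a summable series, and the positive `margin` is what absorbs the union bound over the grid.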
Proof of Theorem 14.
The proof of Theorem 14 follows the same lines as the proof of Theorem 11, combining (11.2) with the results of Theorem 13. □
Proof of Theorem 15.
The proof of Theorem 15 is the same as the proof of Theorem 12. □

15. Proofs of Section 3.1

For $m=1$, the proof for the regression function closely follows the one given in [164], extended here to accommodate missing responses under the MAR assumption (2.4) with the positivity condition $\inf_{x\in S_{d,1}}p(x)\ge c>0$. We give it in full so that the argument is self-contained. Moreover, the regression function smoothed by the Dirichlet kernel in the presence of missing data has not been addressed in the literature, which is the primary motivation for presenting the result in this paper.
Proof of Theorem 1.
Throughout this proof, all estimators are understood to be the complete-case versions incorporating the missingness indicators δ i as defined in (3.8) and (3.9). For notational brevity, we write g ^ n ( φ , x , Λ n , 1 ) in place of g ^ n ( miss ) ( φ , x , Λ n , 1 ) , with the implicit understanding that the factor δ i is included in the summand. Specifically,
$$
\hat{g}_n(\varphi,x,\Lambda_{n,1})=\frac{1}{n}\sum_{i=1}^{n}\varphi(Y_i)\,\delta_i\,K_{(\alpha,\beta)}(X_i).
$$
Observe that
$$
\sup_{x\in S_{d,1}(\breve{b}d)}\big|\hat{g}_n(\varphi,x,\Lambda_{n,1})-R(\varphi,x)\big|\le\sup_{x\in S_{d,1}(\breve{b}d)}\big|\hat{g}_n(\varphi,x,\Lambda_{n,1})-E\,\hat{g}_n(\varphi,x,\Lambda_{n,1})\big|+\sup_{x\in S_{d,1}(\breve{b}d)}\big|E\,\hat{g}_n(\varphi,x,\Lambda_{n,1})-R(\varphi,x)\big|.
$$
Keep in mind the definition of the set $S_{d,1}(\delta)$ given in (3.11). First, we need to prove the following result under the MAR assumption
$$
\sup_{x\in S_{d,1}(\breve{b}d)}\big|\hat{g}_n(\varphi,x,\Lambda_{n,1})-E\,\hat{g}_n(\varphi,x,\Lambda_{n,1})\big|=O\bigg(\frac{|\log\breve{b}|\,(\log n)^{3/2}}{\breve{b}^{\,d+1/2}\sqrt{n}}\bigg),\quad\text{a.s.}
$$
The proof of (15.1) follows the argument of [164], with the changes needed in our context, including the incorporation of the missingness indicators. Under the MAR assumption, we have
$$
\hat{g}_n(\varphi,x,\Lambda_{n,1})-E\,\hat{g}_n(\varphi,x,\Lambda_{n,1})=\frac{1}{n}\sum_{i=1}^{n}\Big\{\varphi(Y_i)\delta_iK_{(\alpha,\beta)}(X_i)-E\big[\varphi(Y_i)\delta_iK_{(\alpha,\beta)}(X_i)\big]\Big\}=\frac{1}{n}\sum_{i=1}^{n}Z_{i,b}(x),
$$
where, for $i=1,\ldots,n$,
$$
Z_{i,b}(x):=\varphi(Y_i)\delta_iK_{(\alpha,\beta)}(X_i)-E\big[\varphi(Y_i)\delta_iK_{(\alpha,\beta)}(X_i)\big].
$$
Note that the expectation incorporates the propensity score via $E[\delta_i\mid X_i=u]=p(u)$. For some sequence $\omega_{n,1}$ tending to infinity (to be specified later), we also consider the following truncation notation
$$
\varphi^{(T)}(y):=\varphi(y)\,\mathbb{1}\{|\varphi(y)|\le\omega_{n,1}\},\qquad\varphi^{(R)}(y):=\varphi(y)\,\mathbb{1}\{|\varphi(y)|>\omega_{n,1}\}.
$$
This allows us to write
$$
\hat{g}_n(\varphi,x,\Lambda_{n,1})-E\,\hat{g}_n(\varphi,x,\Lambda_{n,1})=\frac{1}{n}\sum_{i=1}^{n}\Big\{Z^{(T)}_{i,b}(x)+Z^{(R)}_{i,b}(x)\Big\},
$$
with
$$
Z^{(T)}_{i,b}(x):=\varphi^{(T)}(Y_i)\delta_iK_{(\alpha,\beta)}(X_i)-E\big[\varphi^{(T)}(Y_i)\delta_iK_{(\alpha,\beta)}(X_i)\big],
$$
and
$$
Z^{(R)}_{i,b}(x):=\varphi^{(R)}(Y_i)\delta_iK_{(\alpha,\beta)}(X_i)-E\big[\varphi^{(R)}(Y_i)\delta_iK_{(\alpha,\beta)}(X_i)\big].
$$
We also denote the centered kernel for the constant function
$$
W_{i,b}(x):=\delta_iK_{(\alpha,\beta)}(X_i)-E\big[\delta_iK_{(\alpha,\beta)}(X_i)\big].
$$
The following proposition, which is an adaptation of Proposition 1 of [164] to the MAR setting, will play an instrumental role in the sequel.
Proposition  4.
Let x S d , 1 ( b ˘ ( d + 1 ) ) , n 1 , 0 < b ˘ < e 16 2 d 1 , 0 < a e 1 f | log b ˘ | / b ˘ d + 1 / 2 , and take the unique
$$
\delta\in(0,e^{-1}]\quad\text{that satisfies}\quad\delta\,|\log\delta|=\frac{\breve{b}^{\,d+1/2}\,a}{\|f\|_\infty\,|\log\breve{b}|}.
$$
Then, under the MAR assumption (2.4) with the positivity condition inf x S d , 1 p ( x ) c > 0 , for all h R ,
P sup x x + [ b ˘ , b ˘ ] d 1 n i = 1 n Z i , b ( T ) x h + 2 a ω n , 1 , 1 n i = 1 n Z i , b ( T ) ( x ) h C φ , d exp 1 100 2 d 4 f 2 · n 1 / 2 b ˘ d + 1 / 2 a | log δ | | log b ˘ | 2 ,
where C φ , d > 0 is a constant that depends only on the function φ ( · ) , the dimension d, and the bounds on the propensity score p ( · ) .
Sketch of the proof of Proposition 4.
The proof follows the same lines as Proposition 1 in [164], with the key modification that the random variables $Z^{(T)}_{i,b}(x)$ now include the missingness indicator $\delta_i$. However, under the MAR assumption, $\delta_i$ is independent of $Y_i$ conditionally on $X_i$, and $0\le\delta_i\le 1$. Moreover, $E[\delta_i\mid X_i=u]=p(u)$, and under the positivity condition, $p(u)$ is bounded away from zero and bounded above by 1. Consequently, all moment bounds that hold for the complete-data case remain valid up to constants that depend on $p(\cdot)$. Specifically, the bound $|\varphi^{(T)}|\le\omega_{n,1}$ and the kernel bounds from Lemma 5 and Lemma 6 (adapted to the Dirichlet kernel) ensure that the increments of the process are controlled. The exponential inequality then follows from a chaining argument and Bernstein’s inequality for bounded random variables, with constants adjusted to account for the presence of $\delta_i$. The factor $\omega_{n,1}$ appears due to the truncation of $\varphi$. The detailed calculations, which mirror those in [164], are given in the proof of Proposition 4 below. □
Using Proposition 4 and a covering argument over the compact set $S_{d,1}(\breve{b}d)$ (which can be covered by $O(\breve{b}^{-d})$ balls of radius $\breve{b}$), we obtain, for an appropriate choice of $a$ (specifically $a=C|\log\breve{b}|$ for a sufficiently large constant $C$), that
$$
\sup_{x\in S_{d,1}(\breve{b}d)}\bigg|\frac{1}{n}\sum_{i=1}^{n}Z^{(T)}_{i,b}(x)\bigg|=O\bigg(\frac{|\log\breve{b}|\,(\log n)^{3/2}}{\breve{b}^{\,d+1/2}\sqrt{n}}\bigg)\quad\text{a.s.}
$$
The logarithmic factors arise from the union bound over the covering number and the choice of $a$. For the remainder term $Z^{(R)}_{i,b}(x)$, we use condition (C.3). Choose the truncation threshold
$$
\omega_{n,1}:=\bigg(\frac{\breve{b}^{\,d+1/2}\sqrt{n}}{|\log\breve{b}|\,(\log n)^{3/2}}\bigg)^{1/(1+\gamma)}.
$$
Then, by Markov’s inequality and the Borel–Cantelli lemma, we obtain
$$
\sup_{x\in S_{d,1}(\breve{b}d)}\bigg|\frac{1}{n}\sum_{i=1}^{n}Z^{(R)}_{i,b}(x)\bigg|=o\bigg(\frac{|\log\breve{b}|\,(\log n)^{3/2}}{\breve{b}^{\,d+1/2}\sqrt{n}}\bigg)\quad\text{a.s.}
$$
The bias term is handled using a Taylor expansion of $R(\varphi,\cdot)$ around $x$, together with the properties of the Dirichlet kernel. Under condition (C.2) and the MAR assumption, we have
$$
\sup_{x\in S_{d,1}(\breve{b}d)}\big|E\,\hat{g}_n(\varphi,x,\Lambda_{n,1})-R(\varphi,x)\big|=O(\breve{b}^{1/2}).
$$
This rate follows from the fact that $E[\xi_x]=x+O(\breve{b})$ and $\mathrm{Var}(\xi_x)=O(\breve{b})$, where $\xi_x\sim\mathrm{Dirichlet}(\alpha,\beta)$. The propensity factor $p(x)$ that appears in the expectation cancels in the ratio estimator, but for the numerator alone it remains bounded. Putting together the preceding results, we obtain
$$
\sup_{x\in S_{d,1}(\breve{b}d)}\big|\hat{g}_n(\varphi,x,\Lambda_{n,1})-R(\varphi,x)\big|=O\bigg(\frac{|\log\breve{b}|\,(\log n)^{3/2}}{\breve{b}^{\,d+1/2}\sqrt{n}}\bigg)+O(\breve{b}^{1/2})\quad\text{a.s.}
$$
Recall that $\hat{r}^{(1)}_{n,1}(\varphi,x)=\hat{g}_n(\varphi,x,\Lambda_{n,1})/\hat{f}_n(x,\Lambda_{n,1})$, where $\hat{f}_n$ is the complete-case density estimator. Using the same techniques as above, we also have
$$
\sup_{x\in S_{d,1}(\breve{b}d)}\big|\hat{f}_n(x,\Lambda_{n,1})-f(x)\big|=O\bigg(\frac{|\log\breve{b}|\,(\log n)^{3/2}}{\breve{b}^{\,d+1/2}\sqrt{n}}\bigg)+O(\breve{b}^{1/2})\quad\text{a.s.}
$$
Under the positivity condition, $f(x)$ is bounded away from zero on $S_{d,1}(\breve{b}d)$. Therefore, using the identity
$$
\frac{\hat{g}_n}{\hat{f}_n}-\frac{R}{f}=\frac{1}{\hat{f}_n}\big(\hat{g}_n-R\big)-\frac{R}{f\,\hat{f}_n}\big(\hat{f}_n-f\big),
$$
we obtain
$$
\sup_{x\in S_{d,1}(\breve{b}d)}\big|\hat{r}^{(1)}_{n,1}(\varphi,x)-r^{(1)}(\varphi,x)\big|=O\bigg(\frac{|\log\breve{b}|\,(\log n)^{3/2}}{\breve{b}^{\,d+1/2}\sqrt{n}}\bigg)+O(\breve{b}^{1/2})\quad\text{a.s.}
$$
This completes the proof of Theorem 1. □
Remark 31.
The adaptation to the missing data setting under the MAR assumption required careful incorporation of the missingness indicators δ i into the definitions of Z i , b ( x ) , Z i , b ( T ) ( x ) , Z i , b ( R ) ( x ) , and W i , b ( x ) . The key observation is that 0 δ i 1 and, under the positivity condition, E [ δ i X i ] = p ( X i ) is bounded away from zero. This ensures that the effective sample size remains proportional to n and that all exponential inequalities remain valid with constants adjusted by the bounds on p ( · ) . The truncation threshold ω n , 1 is chosen to balance the bias from the remainder term and the variance of the truncated part, yielding the same rates as in the complete-data case. The bias term O ( b ˘ 1 / 2 ) arises from the first-order Taylor expansion of R ( φ , · ) and the fact that E [ ξ x x ] = O ( b ˘ ) , while E [ ( ξ x x ) 2 ] = O ( b ˘ ) , leading to a square-root rate after applying the Cauchy–Schwarz inequality. The logarithmic factors | log b ˘ | and ( log n ) 3 / 2 are standard in nonparametric kernel estimation and arise from the chaining argument and the union bound over the covering of the compact set S d , 1 ( b ˘ d ) .
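The moment orders quoted in this remark can be checked from the closed-form mean and variance of the Dirichlet distribution. The sketch below assumes the parametrization $\alpha_i=x_i/\breve{b}+1$, $\beta=(1-\|x\|_1)/\breve{b}+1$ used in the proof of Proposition 4; the function name `dirichlet_kernel_moments` and the evaluation point are hypothetical choices for illustration.

```python
import math


def dirichlet_kernel_moments(x, b):
    """Per-coordinate bias, variance and root second moment about x.

    Assumes xi ~ Dirichlet(alpha_1, ..., alpha_d, beta) with
    alpha_i = x_i / b + 1 and beta = (1 - sum(x)) / b + 1.
    """
    alphas = [xi / b + 1.0 for xi in x]
    beta = (1.0 - sum(x)) / b + 1.0
    total = sum(alphas) + beta            # equals 1/b + d + 1
    moments = []
    for xi, ai in zip(x, alphas):
        mean = ai / total
        var = ai * (total - ai) / (total ** 2 * (total + 1.0))
        bias = mean - xi                  # O(b)
        root = math.sqrt(var + bias ** 2) # bounds E|xi_i - x_i| by Cauchy-Schwarz
        moments.append((bias, var, root))
    return moments


x = (0.2, 0.3)  # hypothetical interior point of the simplex (d = 2)
for b in (0.1, 0.01, 0.001):
    for bias, var, root in dirichlet_kernel_moments(x, b):
        # bias / b and root / sqrt(b) stay bounded as b decreases.
        print(b, bias / b, root / math.sqrt(b))
```

The bounded ratio `root / sqrt(b)` is the numerical counterpart of the square-root bias rate obtained from a first-order expansion.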
Proof of Proposition 4.
Throughout this proof, we operate under the MAR assumption (2.4) with the positivity condition inf x S d , 1 p ( x ) c > 0 . The random variables Z i , b ( T ) ( x ) are defined in (15.2) and include the missingness indicator δ i . Note that 0 δ i 1 and E [ δ i X i = u ] = p ( u ) , with p ( u ) bounded between c and 1. Consequently, all moment bounds that hold for the complete-data case remain valid up to constants that depend on c. Following a similar approach to the proof in [164], we apply a union bound to show that the probability in (15.6) can be bounded as follows:
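$$
\begin{aligned}
&P\bigg(\sup_{x'\in x+[-\breve{b},\breve{b}]^d}\bigg|\frac{1}{n}\sum_{i=1}^{n}\big(Z^{(T)}_{i,b}(x')-Z^{(T)}_{i,b}(x)\big)\mathbb{1}\{X_i\in S_{d,1}\setminus S_{d,1}(\delta)\}\bigg|\ge a\,\omega_{n,1},\ \sum_{i=1}^{n}\mathbb{1}\{X_i\in S_{d,1}\setminus S_{d,1}(\delta)\}\le n\cdot 4\|f\|_\infty\delta\bigg)\\
&\quad+P\bigg(\sum_{i=1}^{n}\mathbb{1}\{X_i\in S_{d,1}\setminus S_{d,1}(\delta)\}\ge n\cdot 4\|f\|_\infty\delta\bigg)\\
&\quad+P\bigg(\sup_{x'\in x+[-\breve{b},\breve{b}]^d}\bigg|\frac{1}{n}\sum_{i=1}^{n}\big(Z^{(T)}_{i,b}(x')-Z^{(T)}_{i,b}(x)\big)\mathbb{1}\{X_i\in S_{d,1}(\delta)\}\bigg|\ge a\,\omega_{n,1}\bigg)\\
&=:(A)+(B)+(C).
\end{aligned}
$$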
To clarify our notation, for any subset A R d and any point x R d , we define x + A = { x + y : y A } .
  • Bounding term (A).
Consider $x\in S_{d,1}(\breve{b}(d+1))$ and $x'\in x+[-\breve{b},\breve{b}]^d$. This implies that both $x$ and $x'$ are in $S_{d,1}(\breve{b})$, leading to the following relations for the Dirichlet kernel parameters:
$$
\alpha_1=\frac{x_1}{\breve{b}}+1,\ \ldots,\ \alpha_d=\frac{x_d}{\breve{b}}+1,\qquad\beta=\frac{1-\|x\|_1}{\breve{b}}+1\ge 2,
$$
and for $x'$,
$$
\alpha'_1=\frac{x'_1}{\breve{b}}+1,\ \ldots,\ \alpha'_d=\frac{x'_d}{\breve{b}}+1,\qquad\beta'=\frac{1-\|x'\|_1}{\breve{b}}+1\ge 2.
$$
Consequently, we have the following bounds (see Lemma 2 in [164]):
α 1 + β 1 ( β 1 ) i = 1 d ( α i 1 ) α 1 + β 1 = b ˘ 1 + d ,
α 1 + β 1 ( β 1 ) i = 1 d ( α i 1 ) α 1 + β 1 = b ˘ 1 + d .
Under the MAR assumption, the missingness indicator δ i satisfies 0 δ i 1 , so it does not affect the kernel bounds. Combining these results with our assumption in (15.5) and the upper bound on the Dirichlet density from Lemma 2 in [164], we obtain on the event
i = 1 n 1 X i S d , 1 S d , 1 ( δ ) 4 n f δ
the following estimate:
1 n i = 1 n Z i , b ( T ) ( x ) Z i , b ( T ) ( x ) 1 X i S d , 1 S d , 1 ( δ ) 4 · 4 ω n , 1 f δ · b ˘ d b ˘ 1 + d 16 1 + b ˘ d | log δ | | log b ˘ | a ω n , 1 .
Given the assumptions 0 < δ e 1 and 0 < b ˘ < e 16 2 d 1 , we have
16 1 + b ˘ d | log δ | | log b ˘ | < a ,
since a is chosen sufficiently large. Consequently, the event in (15.7) cannot occur, and we conclude
( A ) = 0 .
  • Bounding term (B).
Term (15.8) represents the probability of encountering “too many bad observations,” meaning too many $X_i$’s near the boundary of the simplex, where the partial derivatives of the Dirichlet density with respect to $\alpha_1,\ldots,\alpha_d$ and $\beta$ diverge. We can control this term using a concentration bound. First, note that the volume of $S_{d,1}\setminus S_{d,1}(\delta)$ is at most $2d\delta/d!$. Specifically, $S_{d,1}(\delta)$ forms a simplex of side-length $1-2\delta$ within $S_{d,1}$, so:
$$
d!\cdot\mathrm{Volume}\big(S_{d,1}\setminus S_{d,1}(\delta)\big)=1-(1-2\delta)^d\le 1-\big(1+d\cdot(-2\delta)\big)=2d\delta,
$$
where we used Bernoulli's inequality $(1+x)^n\ge 1+nx$, which holds for all $n\in\mathbb{N}$ and $x\ge -1$. From (15.15) and knowing that $\|f\|_\infty$ is finite (since $f(\cdot)$ is continuous by condition (C.2) and $S_{d,1}$ is compact), we obtain:
$$
E\big[\mathbb{1}\{X_i\in S_{d,1}\setminus S_{d,1}(\delta)\}\big]=\int_{S_{d,1}\setminus S_{d,1}(\delta)}f(u)\,du\le\|f\|_\infty\cdot\mathrm{Volume}\big(S_{d,1}\setminus S_{d,1}(\delta)\big)\le\frac{2\|f\|_\infty}{(d-1)!}\,\delta.
$$
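The only inequality used in this volume bound is Bernoulli's, and it can be checked numerically. The sketch below simply evaluates the two displayed quantities, taking the stated side-length $1-2\delta$ as given; the function name `strip_volume_terms` and the test grid are hypothetical.

```python
import math


def strip_volume_terms(d, delta):
    """Evaluate both sides of the displayed bound for the boundary strip.

    displayed = (1 - (1 - 2*delta)^d) / d!   (the expression in the text)
    bound     = 2 * delta / (d - 1)!         (after Bernoulli's inequality)
    """
    displayed = (1.0 - (1.0 - 2.0 * delta) ** d) / math.factorial(d)
    bound = 2.0 * delta / math.factorial(d - 1)
    return displayed, bound


for d in (1, 2, 3, 5):
    for delta in (0.3, 0.1, 0.01):
        displayed, bound = strip_volume_terms(d, delta)
        # Bernoulli: (1 - 2*delta)^d >= 1 - 2*d*delta whenever 2*delta <= 1.
        print(d, delta, displayed <= bound + 1e-15)
```

For $d=1$ the two sides coincide, which is why a small tolerance is added to the comparison.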
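Applying Hoeffding’s inequality to the average of the independent Bernoulli random variables $\mathbb{1}\{X_i\in S_{d,1}\setminus S_{d,1}(\delta)\}$ (the missingness indicators $\delta_i$ do not enter this term, which depends only on the $X_i$), we have for $t=4\|f\|_\infty\delta-E[\mathbb{1}\{\cdot\}]$:
$$
(B)=P\bigg(\frac{1}{n}\sum_{i=1}^{n}\mathbb{1}\{X_i\in S_{d,1}\setminus S_{d,1}(\delta)\}-E[\mathbb{1}\{\cdot\}]\ge t\bigg)\le\exp\big(-2nt^2\big).
$$
Since $E[\mathbb{1}\{\cdot\}]\le 2\|f\|_\infty\delta$, we have $t\ge 2\|f\|_\infty\delta$. Using condition (15.5), we obtain:
$$
(B)\le\exp\bigg(-2\bigg(\frac{n^{1/2}\,\breve{b}^{\,d+1/2}\,a}{|\log\delta|\,|\log\breve{b}|}\bigg)^{2}\bigg).
$$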
  • Bounding term (C) via chaining and conditioning.
To bound the third probability in (15.10), we use a chaining argument combined with a conditional concentration inequality. Let $\mathcal{H}_k:=2^{-k}\cdot\breve{b}\,\mathbb{Z}^d$ be the sequence of lattices, and for $x\in S_{d,1}(\breve{b}(d+1))$ and $x'\in x+[-\breve{b},\breve{b}]^d$ fixed, let $(x'_k)_{k\in\mathbb{N}_0}$ be a sequence such that
$$
x'_0=x,\qquad x'_k-x\in\mathcal{H}_k\cap[-\breve{b},\breve{b}]^d,\qquad\lim_{k\to\infty}\|x'_k-x'\|_\infty=0,
$$
and $(x'_{k+1})_i=(x'_k)_i\pm 2^{-k-1}\breve{b}$ for all $i=1,\ldots,d$. By continuity, we have the telescoping bound
$$
\bigg|\frac{1}{n}\sum_{i=1}^{n}\big(Z^{(T)}_{i,b}(x')-Z^{(T)}_{i,b}(x)\big)\mathbb{1}\{X_i\in S_{d,1}(\delta)\}\bigg|\le\sum_{k=0}^{\infty}\bigg|\frac{1}{n}\sum_{i=1}^{n}\big(Z^{(T)}_{i,b}(x'_{k+1})-Z^{(T)}_{i,b}(x'_k)\big)\mathbb{1}\{X_i\in S_{d,1}(\delta)\}\bigg|.
$$
For a fixed pair $(x'_k,x'_{k+1})$, define
$$
D^{(k)}_{i,b}:=Z^{(T)}_{i,b}(x'_{k+1})-Z^{(T)}_{i,b}(x'_k).
$$
Observe that D i , b ( k ) is centered. Moreover, conditionally on the σ -algebra F Y , δ generated by { Y i , δ i } i = 1 n , the variables { D i , b ( k ) } i = 1 n are independent (since the X i are independent) and satisfy
| D i , b ( k ) | 2 ω n , 1 sup u S d , 1 K ( α , β ) ( u ) = : M n ,
and
Var D i , b ( k ) F Y , δ C 0 ω n , 1 2 x k + 1 x k 2 sup u S d , 1 ( δ ) K ( α , β ) ( u ) 2 ,
where C 0 depends only on φ and the propensity score p ( · ) (via the MAR assumption). Applying Bernstein’s inequality (Lemma A1) conditionally on F Y , δ yields, for any ε > 0 ,
P 1 n i = 1 n D i , b ( k ) 1 { X i S d , 1 ( δ ) } ε | F Y , δ 2 exp n ε 2 2 σ n , k 2 + 2 3 M n ε ,
where $\sigma_{n,k}^2$ is a uniform bound for the conditional variance. Using the bounds from Lemmas 5 and 6, we have
$$M_n=O\big(\breve{b}^{-d/2}\big),\qquad\sigma_{n,k}^2=O\big(\omega_{n,1}^2\,\breve{b}^{-2d-1}\,2^{-2k}\,|\log\delta|^2\,|\log\breve{b}|^2\big).$$
Choosing $\varepsilon=a\,\omega_{n,1}/(2(k+1)^2)$ and taking expectations over $\mathcal{F}_{Y,\delta}$, we obtain
P 1 n i = 1 n D i , b ( k ) 1 { X i S d , 1 ( δ ) } a ω n , 1 2 ( k + 1 ) 2 2 exp c 0 n a 2 2 2 k ( k + 1 ) 4 · | log δ | 2 | log b ˘ | 2 b ˘ 2 d + 1 ω n , 1 2 ,
for some constant c 0 > 0 . Substituting ω n , 1 = b ˘ d + 1 / 2 n / ( | log b ˘ | ( log n ) 3 / 2 ) 1 / ( 1 + γ ) and simplifying gives
$$\mathbb{P}(\cdots)\le2\exp\Big(-C_1\,\frac{2^{2k}}{(k+1)^4}\cdot\Big(\frac{n^{1/2}\,\breve{b}^{d+1/2}\,a}{|\log\delta|\,|\log\breve{b}|}\Big)^{2}\Big),$$
where $C_1>0$ is an absolute constant.
Now, taking a union bound over the chaining levels and the $2^{(k+2)d}$ lattice points at level $k$, we obtain
$$(C)\le\sum_{k=0}^{\infty}2^{(k+2)d}\,2^{d}\cdot2\exp\Big(-C_1\,\frac{2^{2k}}{(k+1)^4}\cdot\Big(\frac{n^{1/2}\,\breve{b}^{d+1/2}\,a}{|\log\delta|\,|\log\breve{b}|}\Big)^{2}\Big).$$
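Since $2^{2k}/(k+1)^4\ge c_1^{2}\,2^{k}$ for all $k\ge0$ (with $c_1=1/16$), the exponential terms decay fast enough in $k$ to dominate the factor $2^{(k+2)d}$, and we have
$$(C)\le C_{f,d}\exp\Big(-\frac{1}{100^2d^4\|f\|_\infty^2}\cdot\Big(\frac{n^{1/2}\,\breve{b}^{d+1/2}\,a}{|\log\delta|\,|\log\breve{b}|}\Big)^{2}\Big),$$
for some constant $C_{f,d}>0$ depending only on $f$ and $d$. Putting (15.14), (15.16), and (15) together in (15.10) concludes the proof of Proposition 4. □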
The following corollary is very similar to Corollary 2 of [164], now adapted to the MAR setting.
Corollary 8
(Large deviation estimates). Recall Z i , b ( T ) ( x ) defined in (15.2), which includes the missingness indicator δ i under the MAR assumption. Let x S d , 1 ( b ˘ ( d + 1 ) ) , n 100 6 d 6 , n 1 / d b ˘ e 16 2 d 1 , 0 < a e 1 f | log b ˘ | / b ˘ d + 1 / 2 , and take the unique
$\delta\in(0,e^{-1}]$ that satisfies
$$\delta\,|\log\delta|=\frac{\breve{b}^{d+1/2}\,a}{\|f\|_\infty\,|\log\breve{b}|}.$$
Then, under the MAR assumption (2.4) with the positivity condition $\inf_{x\in\mathcal{S}_{d,1}}p(x)\ge c>0$, we have
$$\mathbb{P}\Big(\sup_{x'\in x+[-\breve{b},\breve{b}]^d}\Big|\frac1n\sum_{i=1}^nZ_{i,b}^{(T)}(x')\Big|\ge3a\,\omega_{n,1}\Big)\le C_{f,d}\exp\Big(-\frac{1}{100^2d^4\|f\|_\infty^2}\cdot\Big(\frac{n^{1/2}\,\breve{b}^{d+1/2}\,a}{|\log\delta|\,|\log\breve{b}|}\Big)^{2}\Big),$$
where $C_{f,d}>0$ is a constant that depends only on the density $f(\cdot)$ and the dimension $d$, and not on the missingness mechanism.
Proof of Corollary 8.
Throughout this proof, we operate under the MAR assumption (2.4) with the positivity condition $\inf_{x\in\mathcal{S}_{d,1}}p(x)\ge c>0$. The random variables $Z_{i,b}^{(T)}(x)$ are defined in (15.2) and include the missingness indicator $\delta_i$, which satisfies $0\le\delta_i\le1$ and $\mathbb{E}[\delta_i\mid X_i]=p(X_i)$. By a union bound, the probability in (15.17) is at most
$$\mathbb{P}\Big(\sup_{x'\in x+[-\breve{b},\breve{b}]^d}\Big|\frac1n\sum_{i=1}^nZ_{i,b}^{(T)}(x')\Big|\ge3a\,\omega_{n,1},\ \Big|\frac1n\sum_{i=1}^nZ_{i,b}^{(T)}(x)\Big|\le a\,\omega_{n,1}\Big)+\mathbb{P}\Big(\Big|\frac1n\sum_{i=1}^nZ_{i,b}^{(T)}(x)\Big|\ge a\,\omega_{n,1}\Big).$$
The first probability can be bounded using Proposition 4 (with $h=a\,\omega_{n,1}$), which directly yields an exponential bound of the form
$$\mathbb{P}(\cdots)\le C_{f,d}\exp\Big(-\frac{1}{100^2d^4\|f\|_\infty^2}\cdot\Big(\frac{n^{1/2}\,\breve{b}^{d+1/2}\,a}{|\log\delta|\,|\log\breve{b}|}\Big)^{2}\Big).$$
The second probability can be bounded similarly by applying Bernstein's inequality (or Azuma's inequality) to the sum of the independent, centered, bounded random variables $Z_{i,b}^{(T)}(x)$. Indeed, $|Z_{i,b}^{(T)}(x)|\le2\,\omega_{n,1}\cdot\sup K_{(\alpha,\beta)}$, and the variance is bounded by $C\,\omega_{n,1}^2\,\breve{b}^{-d-1/2}$ (up to constants). A standard application of Bernstein's inequality gives
$$\mathbb{P}\Big(\Big|\frac1n\sum_{i=1}^nZ_{i,b}^{(T)}(x)\Big|\ge a\,\omega_{n,1}\Big)\le2\exp\Big(-\frac{n\,a^2\,\omega_{n,1}^2}{2\sigma^2+\frac23M\,a\,\omega_{n,1}}\Big)=O\Big(\exp\Big(-\frac{n^{1/2}\,\breve{b}^{d+1/2}\,a}{|\log\breve{b}|}\Big)\Big),$$
which is of the same order as the bound from Proposition 4 (up to constants). Summing the two probabilities and adjusting the constant $C_{f,d}$ yields the desired result. □
Now, to prove equation (15.1), we start by noting that, under the MAR assumption,
$$\widehat{g}_n(\varphi,x,\Lambda_{n,1})=\frac1n\sum_{i=1}^n\varphi(Y_i)\,\delta_i\,K_{(\alpha,\beta)}(X_i)=\frac1n\sum_{i=1}^n\varphi^{(T)}(Y_i)\,\delta_i\,K_{(\alpha,\beta)}(X_i)+\frac1n\sum_{i=1}^n\varphi^{(R)}(Y_i)\,\delta_i\,K_{(\alpha,\beta)}(X_i)=\widehat{g}_n(\varphi^{(T)},x,\Lambda_{n,1})+\widehat{g}_n(\varphi^{(R)},x,\Lambda_{n,1}).$$
To prove equation (15.1), we need to show that the remainder term is asymptotically negligible, specifically:
$$\sup_{x\in\mathcal{S}_{d,1}(\breve{b}d)}\Big|\widehat{g}_n(\varphi^{(R)},x,\Lambda_{n,1})-\mathbb{E}\,\widehat{g}_n(\varphi^{(R)},x,\Lambda_{n,1})\Big|=o(1),\quad\text{a.s.}$$
This follows directly from the proof of the remainder term for the U-statistics developed subsequently (see the analysis of $u_{n,3}^{(R)}$), using condition (C.3) and the Borel–Cantelli lemma. The presence of the missingness indicator $\delta_i$ does not affect the argument since $0\le\delta_i\le1$. Additionally, we need to prove:
$$\sup_{x\in\mathcal{S}_{d,1}(\breve{b}d)}\Big|\widehat{g}_n(\varphi^{(T)},x,\Lambda_{n,1})-\mathbb{E}\,\widehat{g}_n(\varphi^{(T)},x,\Lambda_{n,1})\Big|=O\Big(\frac{|\log\breve{b}|\,(\log n)^{3/2}}{\breve{b}^{d+1/2}\sqrt{n}}\Big),\quad\text{a.s.}$$
This bound is obtained by a union bound over the suprema on hypercubes of width $2\breve{b}$ centered at each $x\in2\breve{b}\,\mathbb{Z}^d\cap\mathcal{S}_{d,1}(\breve{b}(d+1))$, using the large deviation estimates in Corollary 8, and choosing
$$a=\frac{100\,d^2(\log n)^{3/2}}{\sqrt{n}}\cdot\frac{\|f\|_\infty\,|\log\breve{b}|}{\breve{b}^{d+1/2}}.$$
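The upper bound condition on $a$ is satisfied as long as $100\,d^2(\log n)^{3/2}/\sqrt{n}\le e^{-1}$, which is valid if $n\ge100^6d^6$. For the unique $\delta\in(0,e^{-1}]$ that satisfies
$$\delta\,|\log\delta|=\frac{\breve{b}^{d+1/2}\,a}{\|f\|_\infty\,|\log\breve{b}|}\overset{(15.19)}{=}\frac{100\,d^2(\log n)^{3/2}}{\sqrt{n}},$$
we obtain:
$$\begin{aligned}\mathbb{P}\Big(\sup_{x\in\mathcal{S}_{d,1}(\breve{b}d)}\Big|\widehat{g}_n(\varphi^{(T)},x,\Lambda_{n,1})-\mathbb{E}\,\widehat{g}_n(\varphi^{(T)},x,\Lambda_{n,1})\Big|\ge3a\,\omega_{n,1}\Big)&\le\sum_{x\in2\breve{b}\mathbb{Z}^d\cap\mathcal{S}_{d,1}(\breve{b}(d+1))}\mathbb{P}\Big(\sup_{x'\in x+[-\breve{b},\breve{b}]^d}\Big|\frac1n\sum_{i=1}^nZ_{i,b}^{(T)}(x')\Big|\ge3a\,\omega_{n,1}\Big)\\&\le\breve{b}^{-d}\cdot C_{f,d}\exp\Big(-\frac{1}{100^2d^4\|f\|_\infty^2}\Big(\frac{n^{1/2}\,\breve{b}^{d+1/2}\,a}{|\log\delta|\,|\log\breve{b}|}\Big)^{2}\Big)\\&=\breve{b}^{-d}\cdot C_{f,d}\exp\Big(-\frac{(\log n)^3}{|\log\delta|^2}\Big).\end{aligned}$$
The condition on $\delta$ in (15.20) implies:
$$n^{-1/2}\le\delta\le e^{-1}\qquad(\text{thus }|\log\delta|\le\tfrac12\log n),$$
since the function $x\mapsto x|\log x|$ is increasing on $(0,e^{-1}]$ and $\delta|\log\delta|=100\,d^2(\log n)^{3/2}/\sqrt{n}$ is of order $(\log n)^{3/2}/\sqrt{n}$. Using (15.21) in (15.20), we get:
$$\mathbb{P}\Big(\sup_{x\in\mathcal{S}_{d,1}(\breve{b}d)}\Big|\widehat{g}_n(\varphi^{(T)},x,\Lambda_{n,1})-\mathbb{E}\,\widehat{g}_n(\varphi^{(T)},x,\Lambda_{n,1})\Big|\ge3a\,\omega_{n,1}\Big)\le C_{f,d}\exp\big(d\,|\log\breve{b}|-4\log n\big).$$
Since we assumed that $\breve{b}\ge n^{-1/d}$, we have $|\log\breve{b}|\le\frac1d\log n$, so the above is at most $C_{f,d}\exp(\log n-4\log n)=C_{f,d}\,n^{-3}$, which is summable. By our choice of $a$ in (15.19) and the Borel–Cantelli lemma, we obtain
$$\sup_{x\in\mathcal{S}_{d,1}(\breve{b}d)}\Big|\widehat{g}_n(\varphi^{(T)},x,\Lambda_{n,1})-\mathbb{E}\,\widehat{g}_n(\varphi^{(T)},x,\Lambda_{n,1})\Big|=O\Big(\frac{|\log\breve{b}|\,(\log n)^{3/2}}{\breve{b}^{d+1/2}\sqrt{n}}\Big),\quad\text{a.s.}$$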
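The role of $\delta$ in this step can be checked numerically. The sketch below (Python, standard library only) solves $\delta|\log\delta|=c$ by bisection on $(0,e^{-1}]$, relying on the monotonicity of $x\mapsto x|\log x|$ there, and then compares $\delta$ with $n^{-1/2}$. The function name `solve_delta` and the sample values of `n` and `d` are illustrative assumptions, not quantities taken from the paper.

```python
import math

def solve_delta(c, tol=1e-15):
    """Return the unique delta in (0, 1/e] with delta*|log(delta)| = c, for 0 < c <= 1/e."""
    if not (0.0 < c <= math.exp(-1)):
        raise ValueError("c must lie in (0, 1/e]")
    lo, hi = 0.0, math.exp(-1)
    # x -> x*|log x| is increasing on (0, 1/e], so bisection converges to the unique root.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * abs(math.log(mid)) < c:
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical sample size and dimension chosen so that n >= 100^6 d^6.
n, d = 10**14, 2
c = 100 * d**2 * math.log(n) ** 1.5 / math.sqrt(n)
delta = solve_delta(c)
```

For these illustrative values, `delta` lies between `n ** -0.5` and `math.exp(-1)`, so `abs(math.log(delta))` is at most `0.5 * math.log(n)`, which is the inequality used to reach the summable bound.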
  • Bias term analysis under MAR.
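Now, we only need to study the bias term. Under the MAR assumption, we have
$$\mathbb{E}\,\widehat{g}_n(\varphi^{(T)},x,\Lambda_{n,1})-R(\varphi,x)\,p(x)=O(\breve{b}^{1/2}).$$
Using the same reasoning as [165], but incorporating the MAR assumption, we have
$$\mathbb{E}\,\widehat{g}_n(\varphi^{(T)},x,\Lambda_{n,1})=\int_{\mathcal{S}_{d,1}}r^{(1)}(\varphi,u)\,p(u)\,f(u)\,K_{\alpha,\beta}(u)\,\mathrm{d}u=\mathbb{E}\big[R(\varphi,\zeta_x)\,p(\zeta_x)\big],$$
where $\zeta_x=(\zeta_{x_1},\dots,\zeta_{x_d})\sim\mathrm{Dirichlet}(\alpha,\beta)$. The factor $p(\zeta_x)$ appears because $\mathbb{E}[\delta_i\mid X_i=u]=p(u)$. By a second-order Taylor expansion of $R(\varphi,u)\,p(u)$ around $u=x$, we have
$$\begin{aligned}\mathbb{E}\big[R(\varphi,\zeta_x)\,p(\zeta_x)\big]={}&R(\varphi,x)\,p(x)+\sum_{j=1}^d\frac{\partial}{\partial x_j}\big\{R(\varphi,x)\,p(x)\big\}\,\mathbb{E}\big[\zeta_{x_j}-x_j\big]+\frac12\sum_{j=1}^d\frac{\partial^2}{\partial x_j^2}\big\{R(\varphi,\bar{x})\,p(\bar{x})\big\}\,\mathbb{E}\big[(\zeta_{x_j}-x_j)^2\big]\\&+\frac12\sum_{j=1}^d\sum_{k=1,\,k\ne j}^d\frac{\partial^2}{\partial x_j\partial x_k}\big\{R(\varphi,\bar{x})\,p(\bar{x})\big\}\,\mathbb{E}\big[(\zeta_{x_j}-x_j)(\zeta_{x_k}-x_k)\big],\end{aligned}$$
for some $\bar{x}$ on the line segment joining $\zeta_x$ and $x$.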
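For all $j,k\in\{1,\dots,d\}$, straightforward calculations (see [164]) yield:
$$\mathbb{E}[\zeta_j]=\frac{\frac{x_j}{\breve{b}}+1}{\frac{1}{\breve{b}}+d+1}=\frac{x_j+\breve{b}}{1+\breve{b}(d+1)}=x_j+\breve{b}\big(1-(d+1)x_j\big)+O(\breve{b}^2),$$
$$\begin{aligned}\mathrm{Cov}(\zeta_j,\zeta_k)&=\frac{\big(\frac{x_j}{\breve{b}}+1\big)\Big(\big(\frac{1}{\breve{b}}+d+1\big)\mathbb{1}_{\{j=k\}}-\big(\frac{x_k}{\breve{b}}+1\big)\Big)}{\big(\frac{1}{\breve{b}}+d+1\big)^2\big(\frac{1}{\breve{b}}+d+2\big)}=\frac{\breve{b}\,(x_j+\breve{b})\big(\mathbb{1}_{\{j=k\}}-x_k+\breve{b}(d+1)\mathbb{1}_{\{j=k\}}-\breve{b}\big)}{(1+\breve{b}(d+1))^2(1+\breve{b}(d+2))}\\&=\breve{b}\,x_j\big(\mathbb{1}_{\{j=k\}}-x_k\big)+O(\breve{b}^2),\end{aligned}$$
$$\begin{aligned}\mathbb{E}\big[(\zeta_j-x_j)(\zeta_k-x_k)\big]&=\mathrm{Cov}(\zeta_j,\zeta_k)+\mathbb{E}[\zeta_j-x_j]\,\mathbb{E}[\zeta_k-x_k]\\&=\breve{b}\,x_j\big(\mathbb{1}_{\{j=k\}}-x_k\big)+O(\breve{b}^2).\end{aligned}$$
Then, by the Cauchy–Schwarz inequality, together with (15.23) and (15.24), we obtain:
$$\begin{aligned}\mathbb{E}\big[R(\varphi,\zeta_x)\,p(\zeta_x)\big]-R(\varphi,x)\,p(x)&=\sum_{j=1}^dO\big(\mathbb{E}|\zeta_{x_j}-x_j|\big)+\frac12\sum_{j=1}^dO\big(\mathbb{E}|\zeta_{x_j}-x_j|^2\big)+\sum_{j=1}^d\sum_{k=1,\,k\ne j}^dO\big(\mathbb{E}|\zeta_{x_j}-x_j|\,|\zeta_{x_k}-x_k|\big)\\&\le\sum_{j=1}^dO\Big(\sqrt{\mathbb{E}|\zeta_{x_j}-x_j|^2}\Big)+O(\breve{b})+O(\breve{b}^2)\\&\le O(\breve{b}^{1/2})+O(\breve{b})+O(\breve{b}^2)=O(\breve{b}^{1/2})\,(1+o(1)).\end{aligned}$$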
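The first-order moment formulas can be verified directly from the exact Dirichlet moments. The sketch below (Python, standard library only) assumes the parametrization $\alpha_j=x_j/\breve{b}+1$, $\beta=(1-\|x\|_1)/\breve{b}+1$, which is what the displayed expression for $\mathbb{E}[\zeta_j]$ implies. The function name `dirichlet_moments` and the evaluation point are hypothetical choices made for illustration.

```python
def dirichlet_moments(x, b):
    """Exact mean and covariance of Dirichlet(alpha, beta) with
    alpha_j = x_j/b + 1 and beta = (1 - sum(x))/b + 1 (assumed parametrization)."""
    alpha = [xj / b + 1.0 for xj in x]
    beta = (1.0 - sum(x)) / b + 1.0
    a0 = sum(alpha) + beta  # equals 1/b + d + 1
    d = len(x)
    mean = [aj / a0 for aj in alpha]
    cov = [[alpha[j] * (a0 * (j == k) - alpha[k]) / (a0 ** 2 * (a0 + 1.0))
            for k in range(d)] for j in range(d)]
    return mean, cov

# Hypothetical interior point of the simplex and a small bandwidth.
x, b = [0.2, 0.3], 1e-3
d = len(x)
mean, cov = dirichlet_moments(x, b)

# First-order approximations from the displayed formulas.
bias_first_order = [b * (1.0 - (d + 1) * xj) for xj in x]
cov_first_order = [[b * x[j] * ((j == k) - x[k]) for k in range(d)] for j in range(d)]

max_mean_err = max(abs(mean[j] - x[j] - bias_first_order[j]) for j in range(d))
max_cov_err = max(abs(cov[j][k] - cov_first_order[j][k])
                  for j in range(d) for k in range(d))
```

Both `max_mean_err` and `max_cov_err` are of order $\breve{b}^2$ at this point, in line with the $O(\breve{b}^2)$ remainders in the display.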
Thus, we have established
$$\mathbb{E}\,\widehat{g}_n(\varphi^{(T)},x,\Lambda_{n,1})=R(\varphi,x)\,p(x)+O(\breve{b}^{1/2}).$$
For the denominator, a similar calculation gives
$$\mathbb{E}\,\widehat{f}_n(x,\Lambda_{n,1})=f(x)\,p(x)+O(\breve{b}^{1/2}).$$
  • Final combination for the regression estimator.
Finally, we obtain
$$\sup_{x\in\mathcal{S}_{d,1}(\breve{b}d)}\Big|\frac{\widehat{g}_n(\varphi,x,\Lambda_{n,1})}{f(x)\,p(x)}-r^{(1)}(\varphi,x)\Big|\le\frac{\sup_{x\in\mathcal{S}_{d,1}(\breve{b}d)}\big|\widehat{g}_n(\varphi,x,\Lambda_{n,1})-R(\varphi,x)\,p(x)\big|}{\inf_{x\in\mathcal{S}_{d,1}(\breve{b}d)}f(x)\,p(x)},$$
and
$$\sup_{x\in\mathcal{S}_{d,1}(\breve{b}d)}\Big|\frac{\widehat{f}_n(x,\Lambda_{n,1})}{f(x)\,p(x)}-1\Big|\le\frac{\sup_{x\in\mathcal{S}_{d,1}(\breve{b}d)}\big|\widehat{f}_n(x,\Lambda_{n,1})-f(x)\,p(x)\big|}{\inf_{x\in\mathcal{S}_{d,1}(\breve{b}d)}f(x)\,p(x)}.$$
Combining these bounds with the identity
$$\widehat{r}_{n,1}^{(1)}(\varphi,x)=\frac{\widehat{g}_n(\varphi,x,\Lambda_{n,1})}{f(x)\,p(x)}\cdot\frac{f(x)\,p(x)}{\widehat{f}_n(x,\Lambda_{n,1})},$$
and noting that the factor $p(x)$ cancels exactly in the ratio, we obtain the desired result. This completes the proof.
Remark 32.
The adaptation to the missing data setting required careful incorporation of the propensity score $p(\cdot)$ in the bias expansion. Crucially, the factor $p(x)$ appears in both the numerator and denominator expectations, leading to exact cancellation when forming the ratio estimator $\widehat{r}_{n,1}^{(1)}(\varphi,x)=\widehat{g}_n/\widehat{f}_n$. This cancellation is a direct consequence of the MAR assumption and ensures that the bias of the regression estimator is of the same order $O(\breve{b}^{1/2})$ as in the complete-data case. The variance, however, is inflated by a factor of $1/p(x)$, which does not affect the convergence rate but appears in the asymptotic normality result.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to thank the Editor-in-Chief, an Associate Editor, and three referees for their extremely helpful remarks, which resulted in a substantial improvement of the original form of the work and a presentation that was more sharply focused.

Conflicts of Interest

The author declares that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A

This appendix contains supplementary information that is an essential part of providing a more comprehensive understanding of the paper.
Lemma A1
(Lemma 2.2.9, [190]). Let $X_1,\dots,X_n$ be independent random variables with bounded ranges $[-M,M]$ and zero means. Then,
$$\mathbb{P}\Big(\Big|\sum_{i=1}^nX_i\Big|>t\Big)\le2\exp\Big(-\frac{t^2}{2(v+Mt/3)}\Big),$$
for all $t$ and $v\ge\mathrm{Var}\big(\sum_{i=1}^nX_i\big)$.
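As a quick illustration of Lemma A1, the sketch below (Python, standard library only) evaluates the right-hand side of the inequality and compares it with a simulated tail probability. The helper name `bernstein_bound`, the uniform distribution, and the values of `n`, `M`, `t` and `reps` are assumptions made for this example only.

```python
import math
import random

def bernstein_bound(t, v, M):
    """Right-hand side of Lemma A1: 2*exp(-t^2 / (2*(v + M*t/3)))."""
    return 2.0 * math.exp(-t * t / (2.0 * (v + M * t / 3.0)))

# Hypothetical example: n independent Uniform[-M, M] variables
# (zero mean, variance M^2/3 each).
random.seed(0)
n, M, t, reps = 200, 1.0, 20.0, 2000
v = n * M * M / 3.0  # exact variance of the sum

exceed = 0
for _ in range(reps):
    s = sum(random.uniform(-M, M) for _ in range(n))
    exceed += abs(s) > t  # count replications where |sum| exceeds t

empirical_tail = exceed / reps
bound = bernstein_bound(t, v, M)
```

In this setting the simulated tail frequency stays well below `bound`, as the lemma requires; the bound is not tight, which is typical for exponential inequalities of this kind.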
Lemma A2
(Theorem A. page 201, [177]). Let f be a symmetric function taking its variables from S d , 1 satisfying f c ,
E f X 1 , , X m = θ ,
and
σ 2 = V a r f X 1 , , X m ,
then for t > 0 and n m , we have:
P | u n , ( m ) ( f ) θ | t exp [ n / m ] t 2 2 σ 2 2 3 c t .
Lemma A3
(Proposition 1, [187]). If G : S m R is a measurable symmetric function with G = b then
P n 1 / 2 j = 2 m m j u n ( j ) π j , m G t 2 exp t ( n 1 ) 1 / 2 2 m + 2 m m + 1 b .
Lemma A4
(Lemma 1, [164]). We have, as b 0 and uniformly for x S d , 1 ,
0 < A b ( x ) b ( d + 1 ) / 2 ( 1 / b + d ) d + 1 / 2 ( 4 π ) d / 2 1 x 1 i = 1 d x i ( 1 + O ( b ) ) .
Furthermore, for any subset J [ d ] , and any κ ( 0 , ) d ,
A b ( x ) = b ˘ d / 2 ψ ( x ) 1 + O s ( b ) , if x i / b , i [ d ] and 1 x 1 / b , b ( d + | J | ) / 2 ψ J ( x ) i J Γ 2 κ i + 1 2 2 κ i + 1 Γ 2 κ i + 1 · 1 + O κ , x ( b ) , if x i / b κ i , i J and x i / b , i [ d ] J , and 1 x 1 / b ,
where ψ ( · ) and ψ J ( · ) are defined for every subset of indices J [ d ] , by
ψ ( x ) : = ψ ( x ) and ψ J ( x ) : = ( 4 π ) d | J | · 1 x 1 i [ d ] J x i 1 / 2 .
Lemma A5
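(Lemma 2, [164]). If $\alpha_1,\dots,\alpha_d,\beta\ge2$, then
$$\sup_{x\in\mathcal{S}_{d,1}}K_{\alpha,\beta}(x)\le\sqrt{\frac{\|\alpha\|_1+\beta-1}{(\beta-1)\prod_{i\in[d]}(\alpha_i-1)}}\;\big(\|\alpha\|_1+\beta-d-1\big)^{d}.$$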
Lemma A6
(Lemma 3, [164]). If α 1 , , α d , β 2 , then for all x Int S d , 1 ,
α j K α , β ( x ) log α 1 + β + log α j + log x j · α 1 + β 1 ( β 1 ) i = 1 d α i 1 α 1 + β d 1 d , β K α , β ( x ) log α 1 + β + | log ( β ) | + log 1 x 1 · α 1 + β 1 ( β 1 ) i = 1 d α i 1 α 1 + β d 1 d .
Lemma A7
(Lemma 4, [164]). If α 1 , , α d , β , α 1 , , α d , β 2 , and X is F distributed with a bounded density f supported on S d , 1 , then
E K α , β ( X ) K α , β ( X ) 3 ( d + 1 ) f α α 1 + β β 1 β β 1 i [ d ] α i α i 1 · α α 1 + β β d 1 d · log α α 1 + β β · α , β ( α , β ) ,
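where $\alpha\vee\alpha':=(\max\{\alpha_i,\alpha_i'\})_{i\in[d]}$, $\beta\vee\beta':=\max\{\beta,\beta'\}$, and $\beta\wedge\beta':=\min\{\beta,\beta'\}$. Furthermore, let
$$\mathcal{S}_{d,1}(\delta):=\big\{x\in\mathcal{S}_{d,1}:1-\|x\|_1\ge\delta\ \text{and}\ x_i\ge\delta\ \ \forall i\in[d]\big\},\qquad\delta>0.$$
Then, for $0<\delta\le e^{-1}$, we have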
max x S d , 1 ( δ ) K α , β ( x ) K α , β ( x ) 3 ( d + 1 ) f | log δ | · α α 1 + β β 1 β β 1 i [ d ] α i α i 1 · α α 1 + β β d 1 d · log α α 1 + β β · α , β ( α , β ) .
Proposition A1.
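Let $s=dm$, write $\tilde{x}=(x_1,\dots,x_m)\in\mathcal{X}^m\subset\mathbb{R}^s$, and set $\theta(\tilde{x}):=r^{(m)}(\varphi,\tilde{x})$. For a positive function $a:\mathcal{X}\to(0,\infty)$, define
$$g_a(x):=a(x)\,f(x),\qquad G_a(\tilde{x}):=\prod_{j=1}^mg_a(x_j),\qquad Q_a(\tilde{x}):=\nabla\log G_a(\tilde{x}).$$
For the smoothing scheme indexed by $\ell$, let $\mathcal{Q}_{n,\ell,\tilde{x}}$ denote the probability measure on $\mathcal{X}^m$ with product density
$$\tilde{K}_{\bar{\Lambda}_{n,\ell}(\tilde{x})}(\tilde{t})=\prod_{j=1}^mK_{\Lambda_{n,\ell}(x_j)}(t_j).$$
Equivalently, let
$$\tilde{T}_{n,\ell,\tilde{x}}\sim\mathcal{Q}_{n,\ell,\tilde{x}},\qquad\Delta_{n,\ell}(\tilde{x}):=\tilde{T}_{n,\ell,\tilde{x}}-\tilde{x}.$$
Put
$$\mu_{n,\ell}(\tilde{x}):=\mathbb{E}_{\mathcal{Q}_{n,\ell,\tilde{x}}}\{\Delta_{n,\ell}(\tilde{x})\},$$
and
$$M_{n,\ell}(\tilde{x}):=\mathbb{E}_{\mathcal{Q}_{n,\ell,\tilde{x}}}\{\Delta_{n,\ell}(\tilde{x})\,\Delta_{n,\ell}(\tilde{x})^{\top}\}.$$
Assume that, uniformly on the compact set $\mathcal{C}\subset\mathrm{Int}(\mathcal{X}^m)$,
$$\|\mu_{n,\ell}(\tilde{x})\|+\|M_{n,\ell}(\tilde{x})\|=O(\rho_{n,\ell}),\qquad\rho_{n,\ell}\to0,$$
and
$$\mathbb{E}_{\mathcal{Q}_{n,\ell,\tilde{x}}}\|\Delta_{n,\ell}(\tilde{x})\|^3=o(\rho_{n,\ell}).$$
Assume also that
$$\theta\in C^2(\mathcal{U}),\qquad G_a\in C^2(\mathcal{U}),$$
on an open neighbourhood $\mathcal{U}$ of $\mathcal{C}$, and that
$$0<\inf_{\tilde{x}\in\mathcal{U}}G_a(\tilde{x})\le\sup_{\tilde{x}\in\mathcal{U}}G_a(\tilde{x})<\infty.$$
Define the deterministic ratio centering
$$\Theta_{a,n,\ell}(\tilde{x}):=\frac{\int_{\mathcal{X}^m}\theta(\tilde{t})\,G_a(\tilde{t})\,\mathrm{d}\mathcal{Q}_{n,\ell,\tilde{x}}(\tilde{t})}{\int_{\mathcal{X}^m}G_a(\tilde{t})\,\mathrm{d}\mathcal{Q}_{n,\ell,\tilde{x}}(\tilde{t})}.$$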
Then, uniformly for $\tilde{x}\in\mathcal{C}$,
$$\Theta_{a,n,\ell}(\tilde{x})-\theta(\tilde{x})=\nabla\theta(\tilde{x})^{\top}\mu_{n,\ell}(\tilde{x})+\frac12\mathrm{tr}\big\{\nabla^2\theta(\tilde{x})\,M_{n,\ell}(\tilde{x})\big\}+\nabla\theta(\tilde{x})^{\top}M_{n,\ell}(\tilde{x})\,Q_a(\tilde{x})+o(\rho_{n,\ell}).\tag{A2}$$
In particular, for the complete-case estimator under MAR,
$$a=p,\qquad G_p(\tilde{x})=\prod_{j=1}^mp(x_j)\,f(x_j),$$
whereas for the fully observed or IPW centering,
$$a\equiv1,\qquad G_1(\tilde{x})=\prod_{j=1}^mf(x_j).$$
Consequently,
$$\Theta_{p,n,\ell}(\tilde{x})-\Theta_{1,n,\ell}(\tilde{x})=\nabla\theta(\tilde{x})^{\top}M_{n,\ell}(\tilde{x})\,\nabla\sum_{j=1}^m\log p(x_j)+o(\rho_{n,\ell}),\tag{A3}$$
uniformly on $\mathcal{C}$. Hence the MAR propensity score cancels from the zeroth-order conditional target, but, in general, it does not cancel from the first non-vanishing deterministic smoothing-bias constant of the complete-case ratio. It disappears from that constant only under the additional condition
$$\nabla\theta(\tilde{x})^{\top}M_{n,\ell}(\tilde{x})\,\nabla\sum_{j=1}^m\log p(x_j)=o(\rho_{n,\ell}),$$
for example if $p$ is locally constant at the considered point, if $\nabla\theta(\tilde{x})=0$, or if the kernel scheme makes the above contraction of smaller order than the displayed bias scale.
Proof. 
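We write, for brevity,
$$\theta=\theta(\tilde{x}),\qquad G=G_a(\tilde{x}),\qquad Q=Q_a(\tilde{x}),\qquad\Delta=\Delta_{n,\ell}(\tilde{x}),$$
and all derivatives are evaluated at $\tilde{x}$. By the uniform Taylor expansion and the moment assumptions,
$$\theta(\tilde{x}+\Delta)=\theta+\nabla\theta^{\top}\Delta+\frac12\Delta^{\top}\nabla^2\theta\,\Delta+o_{\mathcal{Q}}(\rho_{n,\ell}),$$
and
$$G(\tilde{x}+\Delta)=G+\nabla G^{\top}\Delta+\frac12\Delta^{\top}\nabla^2G\,\Delta+o_{\mathcal{Q}}(\rho_{n,\ell}),$$
uniformly on $\mathcal{C}$. Taking $\mathcal{Q}_{n,\ell,\tilde{x}}$ expectations gives
$$\mathbb{E}_{\mathcal{Q}}\,G(\tilde{x}+\Delta)=G+\nabla G^{\top}\mu_{n,\ell}+\frac12\mathrm{tr}\big(\nabla^2G\,M_{n,\ell}\big)+o(\rho_{n,\ell}).\tag{A5}$$
Similarly,
$$\mathbb{E}_{\mathcal{Q}}\{\theta(\tilde{x}+\Delta)\,G(\tilde{x}+\Delta)\}=\theta G+(G\nabla\theta+\theta\nabla G)^{\top}\mu_{n,\ell}+\frac12G\,\mathrm{tr}\big(\nabla^2\theta\,M_{n,\ell}\big)+\frac12\theta\,\mathrm{tr}\big(\nabla^2G\,M_{n,\ell}\big)+\nabla\theta^{\top}M_{n,\ell}\nabla G+o(\rho_{n,\ell}).\tag{A6}$$
Dividing (A6) by (A5), using $G$ bounded away from zero, and expanding the inverse denominator yields
$$\Theta_{a,n,\ell}(\tilde{x})=\theta+\nabla\theta^{\top}\mu_{n,\ell}+\frac12\mathrm{tr}\big(\nabla^2\theta\,M_{n,\ell}\big)+\nabla\theta^{\top}M_{n,\ell}\frac{\nabla G}{G}+o(\rho_{n,\ell}).$$
Since
$$\frac{\nabla G}{G}=\nabla\log G=Q,$$
we obtain (A2). The terms
$$\theta\,\nabla G^{\top}\mu_{n,\ell}\quad\text{and}\quad\frac12\theta\,\mathrm{tr}\big(\nabla^2G\,M_{n,\ell}\big)$$
cancel exactly with the corresponding denominator terms. This is the precise ratio cancellation.
Under MAR,
$$\mathbb{E}(\delta\mid X,Y)=\mathbb{E}(\delta\mid X)=p(X).$$
Therefore the deterministic centering of the complete-case estimator is obtained with
$$G_p(\tilde{x})=\prod_{j=1}^mp(x_j)\,f(x_j).$$
For the IPW ratio,
$$\mathbb{E}\Big\{\frac{\delta}{p(X)}\,\Big|\,X,Y\Big\}=1,$$
so its deterministic centering is obtained with $G_1$. Subtracting expansion (A2) with $a=p$ and with $a\equiv1$ gives (A3), because
$$Q_p(\tilde{x})-Q_1(\tilde{x})=\nabla\sum_{j=1}^m\log p(x_j).$$
The final assertion follows immediately from (A3). □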
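The difference between the complete-case and IPW centerings can be checked numerically in the simplest case $m=d=1$. The sketch below (Python, standard library only) assumes a Beta$(x/b+1,(1-x)/b+1)$ kernel on $[0,1]$, a uniform design density, and hypothetical choices of $\theta$ and $p$; none of these specific functions come from the paper. It compares $\Theta_{p}-\Theta_{1}$, computed by midpoint quadrature, with the first-order prediction $\theta'(x)\,M\,(\log p)'(x)$.

```python
import math

def beta_pdf(t, a, c):
    """Beta(a, c) density at t in (0, 1), computed via log-gamma for stability."""
    log_norm = math.lgamma(a + c) - math.lgamma(a) - math.lgamma(c)
    return math.exp(log_norm + (a - 1) * math.log(t) + (c - 1) * math.log(1 - t))

def ratio_centering(theta, weight, x, b, grid=20000):
    """Ratio of integrals of theta*weight and weight against Beta(x/b+1, (1-x)/b+1),
    approximated by the midpoint rule (design density taken as uniform)."""
    a, c = x / b + 1.0, (1.0 - x) / b + 1.0
    num = den = 0.0
    for i in range(grid):
        t = (i + 0.5) / grid
        w = weight(t) * beta_pdf(t, a, c)
        num += theta(t) * w
        den += w
    return num / den

theta = lambda t: t * t        # hypothetical regression target
p = lambda t: 0.5 + 0.4 * t    # hypothetical propensity score, bounded away from 0
x, b = 0.3, 0.01               # hypothetical evaluation point and bandwidth

gap = ratio_centering(theta, p, x, b) - ratio_centering(theta, lambda t: 1.0, x, b)

# First-order prediction theta'(x) * M * (log p)'(x), with M = E[(T - x)^2].
a, c = x / b + 1.0, (1.0 - x) / b + 1.0
mean = a / (a + c)
M = a * c / ((a + c) ** 2 * (a + c + 1.0)) + (mean - x) ** 2
predicted = 2 * x * M * (0.4 / p(x))
```

With these choices `gap` and `predicted` agree to within a few percent, and both are of order $b$. This illustrates why the propensity score drops out of the zeroth-order target but can remain in the first-order bias constant when $p$ is not locally constant.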
Remark A1.
Proposition A1 is the deterministic device used to audit all bias, MSE, and bandwidth statements for the estimator actually defined in (2.5). The relevant point is that the complete-case estimator is centered with respect to the tilted design density $pf$, while the IPW and fully observed estimators are centered with respect to $f$. Therefore, in every deterministic bias calculation, the replacement
$$f\longmapsto pf$$
is mandatory for the complete-case estimator. More precisely, if
$$\Theta_{p,n,\ell}(\tilde{x})-\theta(\tilde{x})=B_{p,n,\ell}(\tilde{x})+o(\rho_{n,\ell}),$$
then the bias constant $B_{p,n,\ell}$ is obtained from
$$B_{p,n,\ell}(\tilde{x})=\nabla\theta(\tilde{x})^{\top}\mu_{n,\ell}(\tilde{x})+\frac12\mathrm{tr}\big\{\nabla^2\theta(\tilde{x})\,M_{n,\ell}(\tilde{x})\big\}+\nabla\theta(\tilde{x})^{\top}M_{n,\ell}(\tilde{x})\,\nabla\sum_{j=1}^m\log\{p(x_j)\,f(x_j)\}.\tag{A7}$$
Thus, at the level of rates, the MAR mechanism does not alter the deterministic bias order, provided $p$ is bounded away from zero and has the same smoothness order as $f$. However, at the level of constants, the propensity score enters the complete-case bias through the design-gradient term in (A7). Consequently, any theorem displaying only rates may retain the complete-data order, whereas any theorem displaying exact bias constants must use $pf$, not $f$. For the four smoothing regimes considered in the paper, the kernel-moment verification of Proposition A1 is as follows.

| Smoother | $\mu_{n,\ell}(\tilde{x})$ | $M_{n,\ell}(\tilde{x})$ | $\rho_{n,\ell}$ |
|---|---|---|---|
| Dirichlet | $O(\breve{b})$ | $O(\breve{b})$ | $\breve{b}$ |
| Bernstein | $O(k_n^{-1})$ | $O(k_n^{-1})$ | $k_n^{-1}$ |
| Product beta | $O(b_n)$ | $O(b_n)$ | $b_n$ |
| Mixed continuous–categorical | $O(b_n+\lambda_n)$ | $O(b_n+\lambda_n)$ | $b_n+\lambda_n$ |
Here, the $O(\cdot)$ bounds are uniform on the compact interior sets on which the corresponding theorem is stated. More explicitly, for the product beta kernel on $[0,1]^d$,
$$\mu_{n,3}(x)=b_n\{\mathbf{1}-2x\}+O(b_n^2),$$
and
$$M_{n,3}(x)=b_n\,\mathrm{diag}\{x_1(1-x_1),\dots,x_d(1-x_d)\}+O(b_n^2).$$
For the Dirichlet kernel on the simplex,
$$\mu_{n,1}(x)=\breve{b}\,\{\mathbf{1}-(d+1)x\}+O(\breve{b}^2),$$
and
$$M_{n,1}(x)=\breve{b}\,\{\mathrm{diag}(x)-xx^{\top}\}+O(\breve{b}^2).$$
For Bernstein-type smoothing, the corresponding moment matrix is the usual multinomial or product-binomial covariance matrix divided by $k_n$, depending on the support geometry. The mixed case combines the continuous beta moment terms with the categorical smoothing bias of order $\lambda_n$. Consequently, all stated rates remain valid for the complete-case estimator provided the assumptions are understood with the effective density $pf$. The stochastic rates are likewise unchanged at the level of order because $\delta_i\le1$ and $p$ is bounded away from zero. The limiting variance constants, however, must include the inverse-propensity loss of information. In first-order schematic form, the leading variance contribution contains local factors of the type
$$\frac{1}{p(x_j)\,f(x_j)}$$
rather than $1/f(x_j)$. For higher-order conditional U-statistics, the same replacement occurs in the variance of the first Hoeffding projection. Thus the audited interpretation of the results is the following.
  • Complete-case estimator: deterministic bias uses $pf$.
  • IPW or complete-data estimator: deterministic bias uses $f$.
  • MAR does not change the bias rate under smooth positive $p$.
  • MAR may change the bias constant and does change variance constants.
Therefore, any statement claiming that $p$ cancels from the bias should be read only at the zeroth-order target level, or under the additional condition
$$\nabla\theta(\tilde{x})^{\top}M_{n,\ell}(\tilde{x})\,\nabla\sum_{j=1}^m\log p(x_j)=o(\rho_{n,\ell}).$$
Without this additional condition, the mathematically correct statement is that $p$ does not alter the deterministic bias rate, but it generally contributes to the first non-vanishing complete-case bias constant.
Table A1. Top configurations by IMSE.
ScenarioMissing_TypeCorrectionKernelEstimatornMiss_RateIBiasISdIMSEIAE
Scenario1_linearMARcomplete_casebernsteintau250000.01760.04990.00320.0440
Scenario1_linearMARcomplete_casetricubetau250000.02910.03780.00360.0468
Scenario1_linearMARcomplete_casebernsteintau150000.02020.05180.00370.0471
Scenario1_linearMARcomplete_casetricubetau350000.03060.03870.00390.0481
Scenario1_linearMARcomplete_casetricubetau150000.03020.03970.00390.0480
Scenario1_linearMARcomplete_casegaussiantau150000.03320.03680.00400.0499
Scenario1_linearMARcomplete_casegaussiantau250000.03250.03690.00400.0498
Scenario1_linearMARcomplete_casebernsteintau350000.01900.05420.00410.0493
Scenario1_linearMARcomplete_caseepanechnikovtau250000.03500.03690.00410.0502
Scenario1_linearMARcomplete_casegaussiantau350000.03530.03960.00450.0522
Scenario1_linearMARcomplete_casebernsteintau15000.10.01910.04900.00330.0451
Scenario1_linearMARcomplete_casebernsteintau25000.10.02390.05000.00350.0464
Scenario1_linearMARcomplete_casebernsteintau35000.10.02450.05090.00400.0484
Scenario1_linearMARcomplete_casetricubetau15000.10.03210.04190.00420.0516
Scenario1_linearMARcomplete_casetricubetau35000.10.03570.04150.00440.0518
Scenario1_linearMARcomplete_casetricubetau25000.10.03130.04330.00440.0517
Scenario1_linearMARcomplete_casegaussiantau15000.10.03860.04020.00470.0539
Scenario1_linearMARcomplete_caseepanechnikovtau15000.10.03630.04200.00470.0541
Scenario1_linearMARcomplete_caseepanechnikovtau25000.10.03770.04030.00500.0547
Scenario1_linearMARcomplete_casegaussiantau25000.10.03770.04250.00500.0551
Scenario1_linearMARcomplete_casebernsteintau15000.30.02120.05680.00430.0515
Scenario1_linearMARcomplete_casebernsteintau25000.30.02990.05670.00490.0538
Scenario1_linearMARcomplete_casetricubetau15000.30.03360.04720.00500.0546
Scenario1_linearMARcomplete_casebernsteintau35000.30.03560.05570.00570.0568
Scenario1_linearMARcomplete_casetricubetau25000.30.04070.04980.00600.0602
Scenario1_linearMARcomplete_caseepanechnikovtau25000.30.04530.04560.00610.0610
Scenario1_linearMARcomplete_casegaussiantau25000.30.04730.04450.00620.0619
Scenario1_linearMARcomplete_casegaussiantau15000.30.04540.04830.00620.0618
Scenario1_linearMARcomplete_caseepanechnikovtau15000.30.04110.04900.00630.0625
Scenario1_linearMARcomplete_casetricubetau35000.30.04630.04830.00660.0628
Scenario1_linearMARcomplete_casebernsteintau15000.50.02400.06930.00620.0605
Scenario1_linearMARcomplete_casetricubetau15000.50.04280.05690.00690.0638
Scenario1_linearMARcomplete_casegaussiantau15000.50.04010.05900.00730.0671
Scenario1_linearMARcomplete_casebernsteintau25000.50.04390.06690.00810.0659
Scenario1_linearMARcomplete_caseepanechnikovtau15000.50.04850.06310.00870.0722
Scenario1_linearMARcomplete_casetricubetau25000.50.05070.06380.00910.0731
Scenario1_linearMARcomplete_casegaussiantau25000.50.05470.05800.00910.0734
Scenario1_linearMARcomplete_caseepanechnikovtau25000.50.05800.05730.00950.0741
Scenario1_linearMARcomplete_casetricubetau35000.50.06020.05850.01060.0773
Scenario1_linearMARcomplete_caseepanechnikovtau35000.50.06640.05780.01130.0814
Scenario1_linearMARcomplete_casebernsteintau2100000.00940.03850.00180.0330
Scenario1_linearMARcomplete_casebernsteintau3100000.01150.03800.00190.0332
Scenario1_linearMARcomplete_casebernsteintau1100000.01220.03960.00210.0353
Scenario1_linearMARcomplete_casetricubetau3100000.02140.02820.00220.0358
Scenario1_linearMARcomplete_casetricubetau2100000.02250.03000.00230.0367
Scenario1_linearMARcomplete_casetricubetau1100000.02360.03090.00230.0381
Scenario1_linearMARcomplete_casegaussiantau2100000.02600.02800.00240.0384
Scenario1_linearMARcomplete_casegaussiantau1100000.02620.02880.00240.0386
Scenario1_linearMARcomplete_casegaussiantau3100000.02790.02740.00250.0390
Scenario1_linearMARcomplete_caseepanechnikovtau2100000.02510.02830.00250.0381
Scenario1_linearMARcomplete_casebernsteintau210000.10.01340.04150.00220.0362
Scenario1_linearMARcomplete_casebernsteintau310000.10.01520.03970.00220.0361
Scenario1_linearMARcomplete_casebernsteintau110000.10.01040.04270.00230.0364
Scenario1_linearMARcomplete_casetricubetau110000.10.02360.03060.00240.0379
Scenario1_linearMARcomplete_casetricubetau210000.10.02460.03220.00260.0395
Scenario1_linearMARcomplete_casetricubetau310000.10.02640.03200.00270.0407
Scenario1_linearMARcomplete_casegaussiantau110000.10.02670.03020.00270.0406
Scenario1_linearMARcomplete_caseepanechnikovtau110000.10.02850.02960.00280.0408
Scenario1_linearMARcomplete_casegaussiantau310000.10.02980.02970.00290.0417
Scenario1_linearMARcomplete_caseepanechnikovtau310000.10.02990.02810.00290.0412
Scenario1_linearMARcomplete_casebernsteintau110000.30.01530.04680.00280.0402
Scenario1_linearMARcomplete_casebernsteintau210000.30.01660.04580.00280.0405
Scenario1_linearMARcomplete_casetricubetau110000.30.02820.03700.00320.0444
Scenario1_linearMARcomplete_casetricubetau210000.30.02850.03670.00320.0442
Scenario1_linearMARcomplete_casebernsteintau310000.30.02240.04650.00340.0443
Scenario1_linearMARcomplete_casetricubetau310000.30.03290.03460.00350.0454
Scenario1_linearMARcomplete_casegaussiantau110000.30.03310.03430.00360.0466
Scenario1_linearMARcomplete_caseepanechnikovtau110000.30.03310.03600.00370.0467
Scenario1_linearMARcomplete_casegaussiantau210000.30.03500.03420.00380.0478
Scenario1_linear | MAR | complete_case | epanechnikov | tau2 | 1000 | 0.3 | 0.0334 | 0.0351 | 0.0038 | 0.0474
Scenario1_linear | MAR | complete_case | bernstein | tau1 | 1000 | 0.5 | 0.0123 | 0.0526 | 0.0034 | 0.0452
Scenario1_linear | MAR | complete_case | tricube | tau1 | 1000 | 0.5 | 0.0269 | 0.0426 | 0.0038 | 0.0482
Scenario1_linear | MAR | complete_case | bernstein | tau2 | 1000 | 0.5 | 0.0265 | 0.0521 | 0.0041 | 0.0485
Scenario1_linear | MAR | complete_case | gaussian | tau1 | 1000 | 0.5 | 0.0324 | 0.0425 | 0.0042 | 0.0501
Scenario1_linear | MAR | complete_case | epanechnikov | tau1 | 1000 | 0.5 | 0.0367 | 0.0431 | 0.0048 | 0.0535
Scenario1_linear | MAR | complete_case | epanechnikov | tau2 | 1000 | 0.5 | 0.0405 | 0.0412 | 0.0049 | 0.0541
Scenario1_linear | MAR | complete_case | tricube | tau2 | 1000 | 0.5 | 0.0375 | 0.0446 | 0.0049 | 0.0544
Scenario1_linear | MAR | complete_case | gaussian | tau2 | 1000 | 0.5 | 0.0408 | 0.0426 | 0.0052 | 0.0562
Scenario1_linear | MAR | complete_case | bernstein | tau3 | 1000 | 0.5 | 0.0341 | 0.0517 | 0.0055 | 0.0544
Scenario1_linear | MAR | complete_case | tricube | tau3 | 1000 | 0.5 | 0.0455 | 0.0426 | 0.0059 | 0.0580
Scenario1_linear | MAR | complete_case | bernstein | tau3 | 2000 | 0 | 0.0074 | 0.0317 | 0.0012 | 0.0267
Scenario1_linear | MAR | complete_case | tricube | tau1 | 2000 | 0 | 0.0162 | 0.0215 | 0.0013 | 0.0270
Scenario1_linear | MAR | complete_case | bernstein | tau2 | 2000 | 0 | 0.0075 | 0.0324 | 0.0013 | 0.0270
Scenario1_linear | MAR | complete_case | bernstein | tau1 | 2000 | 0 | 0.0075 | 0.0321 | 0.0013 | 0.0272
Scenario1_linear | MAR | complete_case | tricube | tau2 | 2000 | 0 | 0.0165 | 0.0227 | 0.0014 | 0.0285
Scenario1_linear | MAR | complete_case | gaussian | tau1 | 2000 | 0 | 0.0201 | 0.0204 | 0.0014 | 0.0289
Scenario1_linear | MAR | complete_case | tricube | tau3 | 2000 | 0 | 0.0184 | 0.0232 | 0.0014 | 0.0293
Scenario1_linear | MAR | complete_case | gaussian | tau2 | 2000 | 0 | 0.0194 | 0.0206 | 0.0015 | 0.0288
Scenario1_linear | MAR | complete_case | gaussian | tau3 | 2000 | 0 | 0.0198 | 0.0220 | 0.0015 | 0.0302
Scenario1_linear | MAR | complete_case | epanechnikov | tau2 | 2000 | 0 | 0.0198 | 0.0211 | 0.0015 | 0.0294
Scenario1_linear | MAR | complete_case | bernstein | tau1 | 2000 | 0.1 | 0.0072 | 0.0325 | 0.0013 | 0.0271
Scenario1_linear | MAR | complete_case | bernstein | tau2 | 2000 | 0.1 | 0.0074 | 0.0331 | 0.0013 | 0.0279
Scenario1_linear | MAR | complete_case | tricube | tau2 | 2000 | 0.1 | 0.0178 | 0.0225 | 0.0014 | 0.0288
Scenario1_linear | MAR | complete_case | tricube | tau1 | 2000 | 0.1 | 0.0175 | 0.0231 | 0.0014 | 0.0291
Scenario1_linear | MAR | complete_case | bernstein | tau3 | 2000 | 0.1 | 0.0097 | 0.0335 | 0.0015 | 0.0292
Scenario1_linear | MAR | complete_case | tricube | tau3 | 2000 | 0.1 | 0.0204 | 0.0224 | 0.0015 | 0.0299
Scenario1_linear | MAR | complete_case | gaussian | tau2 | 2000 | 0.1 | 0.0205 | 0.0222 | 0.0016 | 0.0305
Scenario1_linear | MAR | complete_case | epanechnikov | tau1 | 2000 | 0.1 | 0.0206 | 0.0220 | 0.0016 | 0.0304
Scenario1_linear | MAR | complete_case | gaussian | tau3 | 2000 | 0.1 | 0.0212 | 0.0224 | 0.0017 | 0.0314
Scenario1_linear | MAR | complete_case | gaussian | tau1 | 2000 | 0.1 | 0.0213 | 0.0234 | 0.0017 | 0.0324
Scenario1_linear | MAR | complete_case | bernstein | tau1 | 2000 | 0.3 | 0.0086 | 0.0357 | 0.0016 | 0.0300
Scenario1_linear | MAR | complete_case | bernstein | tau2 | 2000 | 0.3 | 0.0111 | 0.0365 | 0.0017 | 0.0316
Scenario1_linear | MAR | complete_case | tricube | tau1 | 2000 | 0.3 | 0.0212 | 0.0273 | 0.0018 | 0.0330
Scenario1_linear | MAR | complete_case | tricube | tau2 | 2000 | 0.3 | 0.0210 | 0.0264 | 0.0019 | 0.0332
Scenario1_linear | MAR | complete_case | tricube | tau3 | 2000 | 0.3 | 0.0249 | 0.0265 | 0.0020 | 0.0344
Scenario1_linear | MAR | complete_case | gaussian | tau1 | 2000 | 0.3 | 0.0218 | 0.0264 | 0.0020 | 0.0348
Scenario1_linear | MAR | complete_case | bernstein | tau3 | 2000 | 0.3 | 0.0150 | 0.0384 | 0.0021 | 0.0345
Scenario1_linear | MAR | complete_case | gaussian | tau2 | 2000 | 0.3 | 0.0254 | 0.0257 | 0.0021 | 0.0355
Scenario1_linear | MAR | complete_case | epanechnikov | tau1 | 2000 | 0.3 | 0.0242 | 0.0252 | 0.0021 | 0.0352
Scenario1_linear | MAR | complete_case | epanechnikov | tau2 | 2000 | 0.3 | 0.0264 | 0.0260 | 0.0022 | 0.0362
Scenario1_linear | MAR | complete_case | bernstein | tau1 | 2000 | 0.5 | 0.0114 | 0.0423 | 0.0022 | 0.0359
Scenario1_linear | MAR | complete_case | bernstein | tau2 | 2000 | 0.5 | 0.0142 | 0.0409 | 0.0023 | 0.0360
Scenario1_linear | MAR | complete_case | tricube | tau1 | 2000 | 0.5 | 0.0240 | 0.0322 | 0.0023 | 0.0374
Scenario1_linear | MAR | complete_case | epanechnikov | tau1 | 2000 | 0.5 | 0.0267 | 0.0291 | 0.0025 | 0.0378
Scenario1_linear | MAR | complete_case | tricube | tau2 | 2000 | 0.5 | 0.0270 | 0.0317 | 0.0026 | 0.0393
Scenario1_linear | MAR | complete_case | gaussian | tau1 | 2000 | 0.5 | 0.0298 | 0.0328 | 0.0029 | 0.0416
Scenario1_linear | MAR | complete_case | tricube | tau3 | 2000 | 0.5 | 0.0327 | 0.0302 | 0.0029 | 0.0414
Scenario1_linear | MAR | complete_case | bernstein | tau3 | 2000 | 0.5 | 0.0195 | 0.0422 | 0.0030 | 0.0403
Scenario1_linear | MAR | complete_case | gaussian | tau2 | 2000 | 0.5 | 0.0322 | 0.0321 | 0.0031 | 0.0430
Scenario1_linear | MAR | complete_case | epanechnikov | tau2 | 2000 | 0.5 | 0.0336 | 0.0305 | 0.0031 | 0.0421
Scenario1_linear | MAR | ipw | bernstein | tau1 | 500 | 0 | 0.0154 | 0.0482 | 0.0031 | 0.0430
Scenario1_linear | MAR | ipw | bernstein | tau2 | 500 | 0 | 0.0174 | 0.0490 | 0.0032 | 0.0433
Scenario1_linear | MAR | ipw | tricube | tau2 | 500 | 0 | 0.0287 | 0.0396 | 0.0037 | 0.0475
Scenario1_linear | MAR | ipw | bernstein | tau3 | 500 | 0 | 0.0204 | 0.0518 | 0.0038 | 0.0473
Scenario1_linear | MAR | ipw | tricube | tau3 | 500 | 0 | 0.0281 | 0.0391 | 0.0039 | 0.0482
Scenario1_linear | MAR | ipw | tricube | tau1 | 500 | 0 | 0.0314 | 0.0421 | 0.0042 | 0.0504
Scenario1_linear | MAR | ipw | gaussian | tau2 | 500 | 0 | 0.0346 | 0.0387 | 0.0042 | 0.0512
Scenario1_linear | MAR | ipw | gaussian | tau1 | 500 | 0 | 0.0332 | 0.0403 | 0.0043 | 0.0521
Scenario1_linear | MAR | ipw | epanechnikov | tau2 | 500 | 0 | 0.0358 | 0.0400 | 0.0047 | 0.0545
Scenario1_linear | MAR | ipw | gaussian | tau3 | 500 | 0 | 0.0385 | 0.0394 | 0.0047 | 0.0546
Scenario1_linear | MAR | ipw | bernstein | tau1 | 500 | 0.1 | 0.0179 | 0.0479 | 0.0031 | 0.0431
Scenario1_linear | MAR | ipw | bernstein | tau2 | 500 | 0.1 | 0.0210 | 0.0503 | 0.0034 | 0.0444
Scenario1_linear | MAR | ipw | bernstein | tau3 | 500 | 0.1 | 0.0209 | 0.0500 | 0.0038 | 0.0474
Scenario1_linear | MAR | ipw | tricube | tau2 | 500 | 0.1 | 0.0303 | 0.0408 | 0.0039 | 0.0490
Scenario1_linear | MAR | ipw | tricube | tau3 | 500 | 0.1 | 0.0310 | 0.0413 | 0.0043 | 0.0508
Scenario1_linear | MAR | ipw | gaussian | tau2 | 500 | 0.1 | 0.0345 | 0.0398 | 0.0043 | 0.0516
Scenario1_linear | MAR | ipw | tricube | tau1 | 500 | 0.1 | 0.0333 | 0.0428 | 0.0044 | 0.0526
Scenario1_linear | MAR | ipw | gaussian | tau1 | 500 | 0.1 | 0.0349 | 0.0411 | 0.0045 | 0.0524
Scenario1_linear | MAR | ipw | gaussian | tau3 | 500 | 0.1 | 0.0360 | 0.0417 | 0.0047 | 0.0534
Scenario1_linear | MAR | ipw | epanechnikov | tau1 | 500 | 0.1 | 0.0358 | 0.0418 | 0.0047 | 0.0534
Scenario1_linear | MAR | ipw | bernstein | tau2 | 500 | 0.3 | 0.0208 | 0.0573 | 0.0044 | 0.0506
Scenario1_linear | MAR | ipw | bernstein | tau1 | 500 | 0.3 | 0.0246 | 0.0570 | 0.0045 | 0.0515
Scenario1_linear | MAR | ipw | tricube | tau1 | 500 | 0.3 | 0.0293 | 0.0491 | 0.0048 | 0.0543
Scenario1_linear | MAR | ipw | tricube | tau2 | 500 | 0.3 | 0.0336 | 0.0482 | 0.0052 | 0.0557
Scenario1_linear | MAR | ipw | gaussian | tau1 | 500 | 0.3 | 0.0364 | 0.0464 | 0.0052 | 0.0574
Scenario1_linear | MAR | ipw | epanechnikov | tau1 | 500 | 0.3 | 0.0362 | 0.0456 | 0.0053 | 0.0568
Scenario1_linear | MAR | ipw | gaussian | tau2 | 500 | 0.3 | 0.0397 | 0.0455 | 0.0054 | 0.0575
Scenario1_linear | MAR | ipw | tricube | tau3 | 500 | 0.3 | 0.0374 | 0.0518 | 0.0059 | 0.0599
Scenario1_linear | MAR | ipw | bernstein | tau3 | 500 | 0.3 | 0.0297 | 0.0596 | 0.0060 | 0.0588
Scenario1_linear | MAR | ipw | epanechnikov | tau2 | 500 | 0.3 | 0.0383 | 0.0500 | 0.0062 | 0.0612
Scenario1_linear | MAR | ipw | bernstein | tau1 | 500 | 0.5 | 0.0282 | 0.0667 | 0.0061 | 0.0617
Scenario1_linear | MAR | ipw | gaussian | tau1 | 500 | 0.5 | 0.0351 | 0.0559 | 0.0063 | 0.0619
Scenario1_linear | MAR | ipw | tricube | tau2 | 500 | 0.5 | 0.0323 | 0.0562 | 0.0066 | 0.0618
Scenario1_linear | MAR | ipw | bernstein | tau2 | 500 | 0.5 | 0.0287 | 0.0663 | 0.0067 | 0.0622
Scenario1_linear | MAR | ipw | tricube | tau1 | 500 | 0.5 | 0.0353 | 0.0617 | 0.0069 | 0.0651
Scenario1_linear | MAR | ipw | epanechnikov | tau1 | 500 | 0.5 | 0.0379 | 0.0588 | 0.0072 | 0.0654
Scenario1_linear | MAR | ipw | epanechnikov | tau2 | 500 | 0.5 | 0.0420 | 0.0592 | 0.0080 | 0.0688
Scenario1_linear | MAR | ipw | gaussian | tau2 | 500 | 0.5 | 0.0467 | 0.0582 | 0.0083 | 0.0693
Scenario1_linear | MAR | ipw | epanechnikov | tau3 | 500 | 0.5 | 0.0539 | 0.0588 | 0.0094 | 0.0741
Scenario1_linear | MAR | ipw | gaussian | tau3 | 500 | 0.5 | 0.0543 | 0.0579 | 0.0095 | 0.0742
Scenario1_linear | MAR | ipw | bernstein | tau2 | 1000 | 0 | 0.0113 | 0.0378 | 0.0018 | 0.0327
Scenario1_linear | MAR | ipw | bernstein | tau1 | 1000 | 0 | 0.0103 | 0.0405 | 0.0021 | 0.0351
Scenario1_linear | MAR | ipw | bernstein | tau3 | 1000 | 0 | 0.0116 | 0.0406 | 0.0021 | 0.0353
Scenario1_linear | MAR | ipw | tricube | tau1 | 1000 | 0 | 0.0218 | 0.0293 | 0.0022 | 0.0363
Scenario1_linear | MAR | ipw | tricube | tau3 | 1000 | 0 | 0.0236 | 0.0292 | 0.0023 | 0.0372
Scenario1_linear | MAR | ipw | tricube | tau2 | 1000 | 0 | 0.0247 | 0.0301 | 0.0024 | 0.0386
Scenario1_linear | MAR | ipw | gaussian | tau2 | 1000 | 0 | 0.0254 | 0.0294 | 0.0025 | 0.0392
Scenario1_linear | MAR | ipw | gaussian | tau3 | 1000 | 0 | 0.0278 | 0.0280 | 0.0025 | 0.0391
Scenario1_linear | MAR | ipw | gaussian | tau1 | 1000 | 0 | 0.0261 | 0.0288 | 0.0026 | 0.0396
Scenario1_linear | MAR | ipw | epanechnikov | tau1 | 1000 | 0 | 0.0260 | 0.0286 | 0.0026 | 0.0391
Scenario1_linear | MAR | ipw | bernstein | tau1 | 1000 | 0.1 | 0.0102 | 0.0400 | 0.0021 | 0.0352
Scenario1_linear | MAR | ipw | bernstein | tau2 | 1000 | 0.1 | 0.0125 | 0.0402 | 0.0021 | 0.0352
Scenario1_linear | MAR | ipw | tricube | tau2 | 1000 | 0.1 | 0.0227 | 0.0308 | 0.0023 | 0.0373
Scenario1_linear | MAR | ipw | tricube | tau1 | 1000 | 0.1 | 0.0233 | 0.0313 | 0.0024 | 0.0385
Scenario1_linear | MAR | ipw | bernstein | tau3 | 1000 | 0.1 | 0.0157 | 0.0420 | 0.0024 | 0.0371
Scenario1_linear | MAR | ipw | tricube | tau3 | 1000 | 0.1 | 0.0261 | 0.0294 | 0.0024 | 0.0377
Scenario1_linear | MAR | ipw | gaussian | tau1 | 1000 | 0.1 | 0.0263 | 0.0308 | 0.0027 | 0.0401
Scenario1_linear | MAR | ipw | epanechnikov | tau1 | 1000 | 0.1 | 0.0236 | 0.0302 | 0.0027 | 0.0402
Scenario1_linear | MAR | ipw | gaussian | tau3 | 1000 | 0.1 | 0.0288 | 0.0291 | 0.0027 | 0.0409
Scenario1_linear | MAR | ipw | gaussian | tau2 | 1000 | 0.1 | 0.0249 | 0.0318 | 0.0027 | 0.0411
Scenario1_linear | MAR | ipw | bernstein | tau2 | 1000 | 0.3 | 0.0184 | 0.0438 | 0.0026 | 0.0397
Scenario1_linear | MAR | ipw | bernstein | tau1 | 1000 | 0.3 | 0.0117 | 0.0458 | 0.0026 | 0.0390
Scenario1_linear | MAR | ipw | tricube | tau1 | 1000 | 0.3 | 0.0244 | 0.0348 | 0.0027 | 0.0402
Scenario1_linear | MAR | ipw | tricube | tau2 | 1000 | 0.3 | 0.0227 | 0.0362 | 0.0029 | 0.0414
Scenario1_linear | MAR | ipw | gaussian | tau1 | 1000 | 0.3 | 0.0269 | 0.0329 | 0.0029 | 0.0418
Scenario1_linear | MAR | ipw | bernstein | tau3 | 1000 | 0.3 | 0.0173 | 0.0459 | 0.0031 | 0.0426
Scenario1_linear | MAR | ipw | epanechnikov | tau1 | 1000 | 0.3 | 0.0253 | 0.0356 | 0.0031 | 0.0433
Scenario1_linear | MAR | ipw | tricube | tau3 | 1000 | 0.3 | 0.0282 | 0.0363 | 0.0032 | 0.0440
Scenario1_linear | MAR | ipw | epanechnikov | tau2 | 1000 | 0.3 | 0.0283 | 0.0346 | 0.0033 | 0.0447
Scenario1_linear | MAR | ipw | gaussian | tau2 | 1000 | 0.3 | 0.0291 | 0.0380 | 0.0036 | 0.0467
Scenario1_linear | MAR | ipw | tricube | tau1 | 1000 | 0.5 | 0.0225 | 0.0434 | 0.0035 | 0.0460
Scenario1_linear | MAR | ipw | bernstein | tau2 | 1000 | 0.5 | 0.0195 | 0.0498 | 0.0036 | 0.0453
Scenario1_linear | MAR | ipw | bernstein | tau1 | 1000 | 0.5 | 0.0158 | 0.0558 | 0.0039 | 0.0485
Scenario1_linear | MAR | ipw | epanechnikov | tau1 | 1000 | 0.5 | 0.0289 | 0.0435 | 0.0040 | 0.0495
Scenario1_linear | MAR | ipw | gaussian | tau1 | 1000 | 0.5 | 0.0296 | 0.0429 | 0.0041 | 0.0496
Scenario1_linear | MAR | ipw | gaussian | tau2 | 1000 | 0.5 | 0.0303 | 0.0433 | 0.0042 | 0.0502
Scenario1_linear | MAR | ipw | tricube | tau2 | 1000 | 0.5 | 0.0286 | 0.0454 | 0.0044 | 0.0506
Scenario1_linear | MAR | ipw | epanechnikov | tau2 | 1000 | 0.5 | 0.0338 | 0.0417 | 0.0044 | 0.0510
Scenario1_linear | MAR | ipw | tricube | tau3 | 1000 | 0.5 | 0.0333 | 0.0437 | 0.0047 | 0.0527
Scenario1_linear | MAR | ipw | gaussian | tau3 | 1000 | 0.5 | 0.0396 | 0.0414 | 0.0051 | 0.0545
Scenario1_linear | MAR | ipw | bernstein | tau2 | 2000 | 0 | 0.0062 | 0.0310 | 0.0011 | 0.0258
Scenario1_linear | MAR | ipw | bernstein | tau1 | 2000 | 0 | 0.0079 | 0.0321 | 0.0013 | 0.0273
Scenario1_linear | MAR | ipw | tricube | tau3 | 2000 | 0 | 0.0173 | 0.0213 | 0.0013 | 0.0274
Scenario1_linear | MAR | ipw | bernstein | tau3 | 2000 | 0 | 0.0085 | 0.0321 | 0.0013 | 0.0276
Scenario1_linear | MAR | ipw | tricube | tau1 | 2000 | 0 | 0.0159 | 0.0224 | 0.0013 | 0.0282
Scenario1_linear | MAR | ipw | tricube | tau2 | 2000 | 0 | 0.0158 | 0.0230 | 0.0014 | 0.0280
Scenario1_linear | MAR | ipw | gaussian | tau3 | 2000 | 0 | 0.0193 | 0.0213 | 0.0015 | 0.0290
Scenario1_linear | MAR | ipw | gaussian | tau2 | 2000 | 0 | 0.0207 | 0.0212 | 0.0015 | 0.0297
Scenario1_linear | MAR | ipw | gaussian | tau1 | 2000 | 0 | 0.0204 | 0.0217 | 0.0015 | 0.0300
Scenario1_linear | MAR | ipw | epanechnikov | tau2 | 2000 | 0 | 0.0188 | 0.0214 | 0.0015 | 0.0294
Scenario1_linear | MAR | ipw | bernstein | tau2 | 2000 | 0.1 | 0.0076 | 0.0324 | 0.0013 | 0.0270
Scenario1_linear | MAR | ipw | tricube | tau1 | 2000 | 0.1 | 0.0186 | 0.0230 | 0.0014 | 0.0286
Scenario1_linear | MAR | ipw | tricube | tau2 | 2000 | 0.1 | 0.0171 | 0.0231 | 0.0014 | 0.0287
Scenario1_linear | MAR | ipw | tricube | tau3 | 2000 | 0.1 | 0.0161 | 0.0227 | 0.0014 | 0.0283
Scenario1_linear | MAR | ipw | bernstein | tau3 | 2000 | 0.1 | 0.0098 | 0.0326 | 0.0014 | 0.0285
Scenario1_linear | MAR | ipw | bernstein | tau1 | 2000 | 0.1 | 0.0088 | 0.0338 | 0.0014 | 0.0286
Scenario1_linear | MAR | ipw | gaussian | tau1 | 2000 | 0.1 | 0.0206 | 0.0225 | 0.0016 | 0.0302
Scenario1_linear | MAR | ipw | gaussian | tau2 | 2000 | 0.1 | 0.0198 | 0.0233 | 0.0016 | 0.0315
Scenario1_linear | MAR | ipw | epanechnikov | tau3 | 2000 | 0.1 | 0.0194 | 0.0225 | 0.0016 | 0.0306
Scenario1_linear | MAR | ipw | gaussian | tau3 | 2000 | 0.1 | 0.0222 | 0.0227 | 0.0016 | 0.0313
Scenario1_linear | MAR | ipw | epanechnikov | tau1 | 2000 | 0.3 | 0.0194 | 0.0229 | 0.0016 | 0.0306
Scenario1_linear | MAR | ipw | bernstein | tau1 | 2000 | 0.3 | 0.0084 | 0.0366 | 0.0016 | 0.0313
Scenario1_linear | MAR | ipw | tricube | tau1 | 2000 | 0.3 | 0.0170 | 0.0267 | 0.0017 | 0.0315
Scenario1_linear | MAR | ipw | bernstein | tau2 | 2000 | 0.3 | 0.0092 | 0.0366 | 0.0017 | 0.0313
Scenario1_linear | MAR | ipw | tricube | tau2 | 2000 | 0.3 | 0.0189 | 0.0272 | 0.0018 | 0.0324
Scenario1_linear | MAR | ipw | tricube | tau3 | 2000 | 0.3 | 0.0219 | 0.0250 | 0.0018 | 0.0320
Scenario1_linear | MAR | ipw | bernstein | tau3 | 2000 | 0.3 | 0.0109 | 0.0363 | 0.0018 | 0.0319
Scenario1_linear | MAR | ipw | gaussian | tau1 | 2000 | 0.3 | 0.0213 | 0.0265 | 0.0018 | 0.0334
Scenario1_linear | MAR | ipw | gaussian | tau2 | 2000 | 0.3 | 0.0226 | 0.0260 | 0.0019 | 0.0332
Scenario1_linear | MAR | ipw | epanechnikov | tau2 | 2000 | 0.3 | 0.0211 | 0.0260 | 0.0020 | 0.0340
Scenario1_linear | MAR | ipw | tricube | tau1 | 2000 | 0.5 | 0.0188 | 0.0309 | 0.0020 | 0.0349
Scenario1_linear | MAR | ipw | tricube | tau2 | 2000 | 0.5 | 0.0181 | 0.0313 | 0.0022 | 0.0351
Scenario1_linear | MAR | ipw | bernstein | tau1 | 2000 | 0.5 | 0.0128 | 0.0417 | 0.0022 | 0.0358
Scenario1_linear | MAR | ipw | gaussian | tau1 | 2000 | 0.5 | 0.0229 | 0.0309 | 0.0023 | 0.0370
Scenario1_linear | MAR | ipw | epanechnikov | tau1 | 2000 | 0.5 | 0.0211 | 0.0311 | 0.0023 | 0.0363
Scenario1_linear | MAR | ipw | bernstein | tau2 | 2000 | 0.5 | 0.0134 | 0.0421 | 0.0023 | 0.0363
Scenario1_linear | MAR | ipw | epanechnikov | tau2 | 2000 | 0.5 | 0.0250 | 0.0300 | 0.0025 | 0.0383
Scenario1_linear | MAR | ipw | tricube | tau3 | 2000 | 0.5 | 0.0248 | 0.0330 | 0.0026 | 0.0389
Scenario1_linear | MAR | ipw | gaussian | tau2 | 2000 | 0.5 | 0.0241 | 0.0327 | 0.0026 | 0.0398
Scenario1_linear | MAR | ipw | gaussian | tau3 | 2000 | 0.5 | 0.0272 | 0.0311 | 0.0028 | 0.0401
Scenario1_linear | MCAR | complete_case | bernstein | tau2 | 500 | 0 | 0.0195 | 0.0474 | 0.0030 | 0.0422
Scenario1_linear | MCAR | complete_case | bernstein | tau1 | 500 | 0 | 0.0221 | 0.0501 | 0.0036 | 0.0464
Scenario1_linear | MCAR | complete_case | tricube | tau3 | 500 | 0 | 0.0302 | 0.0385 | 0.0037 | 0.0472
Scenario1_linear | MCAR | complete_case | bernstein | tau3 | 500 | 0 | 0.0225 | 0.0507 | 0.0038 | 0.0480
Scenario1_linear | MCAR | complete_case | tricube | tau1 | 500 | 0 | 0.0296 | 0.0412 | 0.0039 | 0.0496
Scenario1_linear | MCAR | complete_case | gaussian | tau3 | 500 | 0 | 0.0344 | 0.0382 | 0.0042 | 0.0511
Scenario1_linear | MCAR | complete_case | tricube | tau2 | 500 | 0 | 0.0333 | 0.0423 | 0.0043 | 0.0518
Scenario1_linear | MCAR | complete_case | gaussian | tau1 | 500 | 0 | 0.0336 | 0.0400 | 0.0043 | 0.0521
Scenario1_linear | MCAR | complete_case | gaussian | tau2 | 500 | 0 | 0.0365 | 0.0398 | 0.0044 | 0.0528
Scenario1_linear | MCAR | complete_case | epanechnikov | tau1 | 500 | 0 | 0.0376 | 0.0386 | 0.0046 | 0.0536
Scenario1_linear | MCAR | complete_case | bernstein | tau2 | 500 | 0.1 | 0.0211 | 0.0522 | 0.0036 | 0.0468
Scenario1_linear | MCAR | complete_case | bernstein | tau3 | 500 | 0.1 | 0.0257 | 0.0489 | 0.0037 | 0.0472
Scenario1_linear | MCAR | complete_case | bernstein | tau1 | 500 | 0.1 | 0.0193 | 0.0534 | 0.0039 | 0.0485
Scenario1_linear | MCAR | complete_case | tricube | tau1 | 500 | 0.1 | 0.0303 | 0.0413 | 0.0040 | 0.0495
Scenario1_linear | MCAR | complete_case | tricube | tau3 | 500 | 0.1 | 0.0328 | 0.0417 | 0.0042 | 0.0508
Scenario1_linear | MCAR | complete_case | tricube | tau2 | 500 | 0.1 | 0.0317 | 0.0426 | 0.0044 | 0.0516
Scenario1_linear | MCAR | complete_case | gaussian | tau2 | 500 | 0.1 | 0.0363 | 0.0394 | 0.0046 | 0.0529
Scenario1_linear | MCAR | complete_case | gaussian | tau1 | 500 | 0.1 | 0.0376 | 0.0413 | 0.0048 | 0.0542
Scenario1_linear | MCAR | complete_case | gaussian | tau3 | 500 | 0.1 | 0.0384 | 0.0424 | 0.0049 | 0.0553
Scenario1_linear | MCAR | complete_case | epanechnikov | tau3 | 500 | 0.1 | 0.0368 | 0.0403 | 0.0050 | 0.0551
Scenario1_linear | MCAR | complete_case | bernstein | tau2 | 500 | 0.3 | 0.0255 | 0.0553 | 0.0043 | 0.0511
Scenario1_linear | MCAR | complete_case | bernstein | tau3 | 500 | 0.3 | 0.0281 | 0.0563 | 0.0048 | 0.0541
Scenario1_linear | MCAR | complete_case | bernstein | tau1 | 500 | 0.3 | 0.0268 | 0.0588 | 0.0050 | 0.0548
Scenario1_linear | MCAR | complete_case | tricube | tau1 | 500 | 0.3 | 0.0348 | 0.0466 | 0.0053 | 0.0573
Scenario1_linear | MCAR | complete_case | gaussian | tau2 | 500 | 0.3 | 0.0353 | 0.0451 | 0.0054 | 0.0572
Scenario1_linear | MCAR | complete_case | tricube | tau3 | 500 | 0.3 | 0.0375 | 0.0470 | 0.0054 | 0.0579
Scenario1_linear | MCAR | complete_case | tricube | tau2 | 500 | 0.3 | 0.0381 | 0.0497 | 0.0057 | 0.0604
Scenario1_linear | MCAR | complete_case | gaussian | tau3 | 500 | 0.3 | 0.0398 | 0.0464 | 0.0058 | 0.0603
Scenario1_linear | MCAR | complete_case | epanechnikov | tau1 | 500 | 0.3 | 0.0399 | 0.0475 | 0.0060 | 0.0613
Scenario1_linear | MCAR | complete_case | epanechnikov | tau3 | 500 | 0.3 | 0.0416 | 0.0454 | 0.0060 | 0.0605
Scenario1_linear | MCAR | complete_case | bernstein | tau2 | 500 | 0.5 | 0.0294 | 0.0612 | 0.0053 | 0.0564
Scenario1_linear | MCAR | complete_case | bernstein | tau1 | 500 | 0.5 | 0.0274 | 0.0636 | 0.0058 | 0.0596
Scenario1_linear | MCAR | complete_case | bernstein | tau3 | 500 | 0.5 | 0.0328 | 0.0616 | 0.0060 | 0.0603
Scenario1_linear | MCAR | complete_case | tricube | tau2 | 500 | 0.5 | 0.0406 | 0.0512 | 0.0065 | 0.0634
Scenario1_linear | MCAR | complete_case | tricube | tau1 | 500 | 0.5 | 0.0384 | 0.0543 | 0.0066 | 0.0634
Scenario1_linear | MCAR | complete_case | tricube | tau3 | 500 | 0.5 | 0.0448 | 0.0547 | 0.0072 | 0.0671
Scenario1_linear | MCAR | complete_case | gaussian | tau2 | 500 | 0.5 | 0.0500 | 0.0530 | 0.0075 | 0.0688
Scenario1_linear | MCAR | complete_case | gaussian | tau1 | 500 | 0.5 | 0.0479 | 0.0533 | 0.0076 | 0.0688
Scenario1_linear | MCAR | complete_case | epanechnikov | tau1 | 500 | 0.5 | 0.0476 | 0.0527 | 0.0077 | 0.0687
Scenario1_linear | MCAR | complete_case | epanechnikov | tau2 | 500 | 0.5 | 0.0496 | 0.0557 | 0.0082 | 0.0720
Scenario1_linear | MCAR | complete_case | bernstein | tau3 | 1000 | 0 | 0.0111 | 0.0397 | 0.0021 | 0.0351
Scenario1_linear | MCAR | complete_case | bernstein | tau2 | 1000 | 0 | 0.0104 | 0.0415 | 0.0022 | 0.0354
Scenario1_linear | MCAR | complete_case | bernstein | tau1 | 1000 | 0 | 0.0123 | 0.0410 | 0.0022 | 0.0360
Scenario1_linear | MCAR | complete_case | tricube | tau2 | 1000 | 0 | 0.0222 | 0.0294 | 0.0022 | 0.0365
Scenario1_linear | MCAR | complete_case | tricube | tau1 | 1000 | 0 | 0.0234 | 0.0295 | 0.0023 | 0.0372
Scenario1_linear | MCAR | complete_case | tricube | tau3 | 1000 | 0 | 0.0238 | 0.0300 | 0.0024 | 0.0378
Scenario1_linear | MCAR | complete_case | gaussian | tau1 | 1000 | 0 | 0.0267 | 0.0279 | 0.0025 | 0.0385
Scenario1_linear | MCAR | complete_case | gaussian | tau3 | 1000 | 0 | 0.0262 | 0.0287 | 0.0025 | 0.0392
Scenario1_linear | MCAR | complete_case | epanechnikov | tau2 | 1000 | 0 | 0.0266 | 0.0276 | 0.0026 | 0.0387
Scenario1_linear | MCAR | complete_case | gaussian | tau2 | 1000 | 0 | 0.0270 | 0.0296 | 0.0026 | 0.0403
Scenario1_linear | MCAR | complete_case | bernstein | tau2 | 1000 | 0.1 | 0.0128 | 0.0411 | 0.0021 | 0.0354
Scenario1_linear | MCAR | complete_case | bernstein | tau1 | 1000 | 0.1 | 0.0128 | 0.0414 | 0.0023 | 0.0364
Scenario1_linear | MCAR | complete_case | bernstein | tau3 | 1000 | 0.1 | 0.0140 | 0.0416 | 0.0024 | 0.0373
Scenario1_linear | MCAR | complete_case | tricube | tau1 | 1000 | 0.1 | 0.0227 | 0.0318 | 0.0025 | 0.0386
Scenario1_linear | MCAR | complete_case | tricube | tau2 | 1000 | 0.1 | 0.0226 | 0.0328 | 0.0026 | 0.0392
Scenario1_linear | MCAR | complete_case | tricube | tau3 | 1000 | 0.1 | 0.0262 | 0.0323 | 0.0027 | 0.0396
Scenario1_linear | MCAR | complete_case | gaussian | tau2 | 1000 | 0.1 | 0.0270 | 0.0304 | 0.0027 | 0.0408
Scenario1_linear | MCAR | complete_case | gaussian | tau3 | 1000 | 0.1 | 0.0292 | 0.0296 | 0.0027 | 0.0406
Scenario1_linear | MCAR | complete_case | epanechnikov | tau2 | 1000 | 0.1 | 0.0253 | 0.0303 | 0.0028 | 0.0404
Scenario1_linear | MCAR | complete_case | gaussian | tau1 | 1000 | 0.1 | 0.0270 | 0.0316 | 0.0028 | 0.0420
Scenario1_linear | MCAR | complete_case | bernstein | tau2 | 1000 | 0.3 | 0.0138 | 0.0450 | 0.0026 | 0.0389
Scenario1_linear | MCAR | complete_case | bernstein | tau1 | 1000 | 0.3 | 0.0167 | 0.0434 | 0.0026 | 0.0394
Scenario1_linear | MCAR | complete_case | bernstein | tau3 | 1000 | 0.3 | 0.0166 | 0.0443 | 0.0027 | 0.0398
Scenario1_linear | MCAR | complete_case | tricube | tau1 | 1000 | 0.3 | 0.0249 | 0.0361 | 0.0030 | 0.0430
Scenario1_linear | MCAR | complete_case | tricube | tau3 | 1000 | 0.3 | 0.0290 | 0.0333 | 0.0030 | 0.0431
Scenario1_linear | MCAR | complete_case | tricube | tau2 | 1000 | 0.3 | 0.0271 | 0.0350 | 0.0031 | 0.0437
Scenario1_linear | MCAR | complete_case | epanechnikov | tau3 | 1000 | 0.3 | 0.0296 | 0.0311 | 0.0032 | 0.0431
Scenario1_linear | MCAR | complete_case | gaussian | tau2 | 1000 | 0.3 | 0.0298 | 0.0340 | 0.0034 | 0.0455
Scenario1_linear | MCAR | complete_case | epanechnikov | tau1 | 1000 | 0.3 | 0.0309 | 0.0319 | 0.0034 | 0.0448
Scenario1_linear | MCAR | complete_case | epanechnikov | tau2 | 1000 | 0.3 | 0.0297 | 0.0323 | 0.0034 | 0.0453
Scenario1_linear | MCAR | complete_case | bernstein | tau2 | 1000 | 0.5 | 0.0148 | 0.0500 | 0.0032 | 0.0440
Scenario1_linear | MCAR | complete_case | bernstein | tau1 | 1000 | 0.5 | 0.0176 | 0.0501 | 0.0035 | 0.0463
Scenario1_linear | MCAR | complete_case | bernstein | tau3 | 1000 | 0.5 | 0.0211 | 0.0502 | 0.0035 | 0.0454
Scenario1_linear | MCAR | complete_case | tricube | tau3 | 1000 | 0.5 | 0.0304 | 0.0385 | 0.0038 | 0.0483
Scenario1_linear | MCAR | complete_case | tricube | tau2 | 1000 | 0.5 | 0.0278 | 0.0415 | 0.0039 | 0.0483
Scenario1_linear | MCAR | complete_case | epanechnikov | tau2 | 1000 | 0.5 | 0.0313 | 0.0359 | 0.0040 | 0.0484
Scenario1_linear | MCAR | complete_case | tricube | tau1 | 1000 | 0.5 | 0.0307 | 0.0420 | 0.0041 | 0.0502
Scenario1_linear | MCAR | complete_case | gaussian | tau2 | 1000 | 0.5 | 0.0356 | 0.0374 | 0.0042 | 0.0509
Scenario1_linear | MCAR | complete_case | gaussian | tau1 | 1000 | 0.5 | 0.0321 | 0.0400 | 0.0043 | 0.0515
Scenario1_linear | MCAR | complete_case | gaussian | tau3 | 1000 | 0.5 | 0.0376 | 0.0380 | 0.0045 | 0.0532
Scenario1_linear | MCAR | complete_case | bernstein | tau2 | 2000 | 0 | 0.0066 | 0.0322 | 0.0013 | 0.0267
Scenario1_linear | MCAR | complete_case | tricube | tau1 | 2000 | 0 | 0.0162 | 0.0213 | 0.0013 | 0.0273
Scenario1_linear | MCAR | complete_case | bernstein | tau1 | 2000 | 0 | 0.0080 | 0.0317 | 0.0013 | 0.0270
Scenario1_linear | MCAR | complete_case | tricube | tau2 | 2000 | 0 | 0.0166 | 0.0213 | 0.0013 | 0.0274
Scenario1_linear | MCAR | complete_case | tricube | tau3 | 2000 | 0 | 0.0178 | 0.0217 | 0.0013 | 0.0279
Scenario1_linear | MCAR | complete_case | bernstein | tau3 | 2000 | 0 | 0.0086 | 0.0331 | 0.0014 | 0.0285
Scenario1_linear | MCAR | complete_case | gaussian | tau3 | 2000 | 0 | 0.0196 | 0.0208 | 0.0014 | 0.0290
Scenario1_linear | MCAR | complete_case | gaussian | tau1 | 2000 | 0 | 0.0206 | 0.0209 | 0.0015 | 0.0294
Scenario1_linear | MCAR | complete_case | gaussian | tau2 | 2000 | 0 | 0.0204 | 0.0207 | 0.0015 | 0.0294
Scenario1_linear | MCAR | complete_case | epanechnikov | tau3 | 2000 | 0 | 0.0187 | 0.0199 | 0.0015 | 0.0286
Scenario1_linear | MCAR | complete_case | bernstein | tau1 | 2000 | 0.1 | 0.0092 | 0.0306 | 0.0012 | 0.0262
Scenario1_linear | MCAR | complete_case | bernstein | tau3 | 2000 | 0.1 | 0.0080 | 0.0321 | 0.0013 | 0.0277
Scenario1_linear | MCAR | complete_case | bernstein | tau2 | 2000 | 0.1 | 0.0085 | 0.0333 | 0.0014 | 0.0282
Scenario1_linear | MCAR | complete_case | tricube | tau1 | 2000 | 0.1 | 0.0168 | 0.0226 | 0.0014 | 0.0283
Scenario1_linear | MCAR | complete_case | tricube | tau3 | 2000 | 0.1 | 0.0167 | 0.0224 | 0.0014 | 0.0287
Scenario1_linear | MCAR | complete_case | tricube | tau2 | 2000 | 0.1 | 0.0190 | 0.0224 | 0.0014 | 0.0289
Scenario1_linear | MCAR | complete_case | gaussian | tau1 | 2000 | 0.1 | 0.0194 | 0.0210 | 0.0015 | 0.0295
Scenario1_linear | MCAR | complete_case | gaussian | tau2 | 2000 | 0.1 | 0.0210 | 0.0213 | 0.0015 | 0.0300
Scenario1_linear | MCAR | complete_case | epanechnikov | tau2 | 2000 | 0.1 | 0.0194 | 0.0223 | 0.0016 | 0.0308
Scenario1_linear | MCAR | complete_case | gaussian | tau3 | 2000 | 0.1 | 0.0210 | 0.0231 | 0.0016 | 0.0312
Scenario1_linear | MCAR | complete_case | bernstein | tau2 | 2000 | 0.3 | 0.0084 | 0.0351 | 0.0015 | 0.0296
Scenario1_linear | MCAR | complete_case | tricube | tau1 | 2000 | 0.3 | 0.0185 | 0.0251 | 0.0016 | 0.0312
Scenario1_linear | MCAR | complete_case | bernstein | tau1 | 2000 | 0.3 | 0.0104 | 0.0358 | 0.0016 | 0.0311
Scenario1_linear | MCAR | complete_case | bernstein | tau3 | 2000 | 0.3 | 0.0112 | 0.0355 | 0.0017 | 0.0311
Scenario1_linear | MCAR | complete_case | tricube | tau2 | 2000 | 0.3 | 0.0175 | 0.0258 | 0.0017 | 0.0314
Scenario1_linear | MCAR | complete_case | tricube | tau3 | 2000 | 0.3 | 0.0195 | 0.0257 | 0.0017 | 0.0324
Scenario1_linear | MCAR | complete_case | gaussian | tau2 | 2000 | 0.3 | 0.0231 | 0.0243 | 0.0019 | 0.0338
Scenario1_linear | MCAR | complete_case | gaussian | tau3 | 2000 | 0.3 | 0.0225 | 0.0249 | 0.0019 | 0.0340
Scenario1_linear | MCAR | complete_case | gaussian | tau1 | 2000 | 0.3 | 0.0228 | 0.0250 | 0.0019 | 0.0338
Scenario1_linear | MCAR | complete_case | epanechnikov | tau2 | 2000 | 0.3 | 0.0229 | 0.0246 | 0.0020 | 0.0334
Scenario1_linear | MCAR | complete_case | bernstein | tau2 | 2000 | 0.5 | 0.0121 | 0.0400 | 0.0020 | 0.0348
Scenario1_linear | MCAR | complete_case | bernstein | tau1 | 2000 | 0.5 | 0.0129 | 0.0392 | 0.0020 | 0.0346
Scenario1_linear | MCAR | complete_case | tricube | tau2 | 2000 | 0.5 | 0.0208 | 0.0290 | 0.0021 | 0.0354
Scenario1_linear | MCAR | complete_case | bernstein | tau3 | 2000 | 0.5 | 0.0147 | 0.0392 | 0.0022 | 0.0349
Scenario1_linear | MCAR | complete_case | tricube | tau1 | 2000 | 0.5 | 0.0222 | 0.0312 | 0.0023 | 0.0379
Scenario1_linear | MCAR | complete_case | gaussian | tau1 | 2000 | 0.5 | 0.0241 | 0.0294 | 0.0025 | 0.0388
Scenario1_linear | MCAR | complete_case | tricube | tau3 | 2000 | 0.5 | 0.0227 | 0.0320 | 0.0025 | 0.0393
Scenario1_linear | MCAR | complete_case | gaussian | tau2 | 2000 | 0.5 | 0.0260 | 0.0281 | 0.0025 | 0.0389
Scenario1_linear | MCAR | complete_case | gaussian | tau3 | 2000 | 0.5 | 0.0277 | 0.0282 | 0.0026 | 0.0401
Scenario1_linear | MCAR | complete_case | epanechnikov | tau2 | 2000 | 0.5 | 0.0246 | 0.0296 | 0.0027 | 0.0398
Scenario1_linear | MCAR | ipw | bernstein | tau2 | 500 | 0 | 0.0183 | 0.0500 | 0.0033 | 0.0445
Scenario1_linear | MCAR | ipw | bernstein | tau3 | 500 | 0 | 0.0196 | 0.0487 | 0.0033 | 0.0445
Scenario1_linear | MCAR | ipw | bernstein | tau1 | 500 | 0 | 0.0164 | 0.0498 | 0.0033 | 0.0449
Scenario1_linear | MCAR | ipw | tricube | tau3 | 500 | 0 | 0.0295 | 0.0391 | 0.0038 | 0.0480
Scenario1_linear | MCAR | ipw | tricube | tau2 | 500 | 0 | 0.0326 | 0.0400 | 0.0039 | 0.0494
Scenario1_linear | MCAR | ipw | tricube | tau1 | 500 | 0 | 0.0329 | 0.0418 | 0.0041 | 0.0506
Scenario1_linear | MCAR | ipw | gaussian | tau3 | 500 | 0 | 0.0378 | 0.0385 | 0.0044 | 0.0526
Scenario1_linear | MCAR | ipw | gaussian | tau2 | 500 | 0 | 0.0357 | 0.0415 | 0.0046 | 0.0536
Scenario1_linear | MCAR | ipw | gaussian | tau1 | 500 | 0 | 0.0372 | 0.0412 | 0.0046 | 0.0531
Scenario1_linear | MCAR | ipw | epanechnikov | tau3 | 500 | 0 | 0.0350 | 0.0375 | 0.0046 | 0.0521
Scenario1_linear | MCAR | ipw | bernstein | tau2 | 500 | 0.1 | 0.0188 | 0.0498 | 0.0032 | 0.0435
Scenario1_linear | MCAR | ipw | bernstein | tau1 | 500 | 0.1 | 0.0200 | 0.0507 | 0.0036 | 0.0470
Scenario1_linear | MCAR | ipw | bernstein | tau3 | 500 | 0.1 | 0.0225 | 0.0534 | 0.0041 | 0.0493
Scenario1_linear | MCAR | ipw | tricube | tau3 | 500 | 0.1 | 0.0299 | 0.0413 | 0.0042 | 0.0498
Scenario1_linear | MCAR | ipw | tricube | tau1 | 500 | 0.1 | 0.0331 | 0.0415 | 0.0043 | 0.0517
Scenario1_linear | MCAR | ipw | tricube | tau2 | 500 | 0.1 | 0.0322 | 0.0433 | 0.0045 | 0.0532
Scenario1_linear | MCAR | ipw | gaussian | tau2 | 500 | 0.1 | 0.0365 | 0.0402 | 0.0046 | 0.0525
Scenario1_linear | MCAR | ipw | gaussian | tau1 | 500 | 0.1 | 0.0332 | 0.0422 | 0.0046 | 0.0534
Scenario1_linear | MCAR | ipw | epanechnikov | tau1 | 500 | 0.1 | 0.0380 | 0.0405 | 0.0048 | 0.0545
Scenario1_linear | MCAR | ipw | epanechnikov | tau2 | 500 | 0.1 | 0.0351 | 0.0424 | 0.0050 | 0.0558
Scenario1_linear | MCAR | ipw | bernstein | tau2 | 500 | 0.3 | 0.0229 | 0.0557 | 0.0043 | 0.0508
Scenario1_linear | MCAR | ipw | bernstein | tau3 | 500 | 0.3 | 0.0256 | 0.0552 | 0.0045 | 0.0528
Scenario1_linear | MCAR | ipw | bernstein | tau1 | 500 | 0.3 | 0.0270 | 0.0566 | 0.0047 | 0.0527
Scenario1_linear | MCAR | ipw | tricube | tau1 | 500 | 0.3 | 0.0382 | 0.0463 | 0.0054 | 0.0577
Scenario1_linear | MCAR | ipw | gaussian | tau1 | 500 | 0.3 | 0.0394 | 0.0437 | 0.0054 | 0.0575
Scenario1_linear | MCAR | ipw | gaussian | tau2 | 500 | 0.3 | 0.0399 | 0.0431 | 0.0054 | 0.0577
Scenario1_linear | MCAR | ipw | tricube | tau2 | 500 | 0.3 | 0.0361 | 0.0500 | 0.0056 | 0.0590
Scenario1_linear | MCAR | ipw | tricube | tau3 | 500 | 0.3 | 0.0383 | 0.0494 | 0.0058 | 0.0592
Scenario1_linear | MCAR | ipw | epanechnikov | tau1 | 500 | 0.3 | 0.0396 | 0.0465 | 0.0060 | 0.0605
Scenario1_linear | MCAR | ipw | epanechnikov | tau2 | 500 | 0.3 | 0.0408 | 0.0456 | 0.0060 | 0.0602
Scenario1_linear | MCAR | ipw | bernstein | tau2 | 500 | 0.5 | 0.0335 | 0.0650 | 0.0062 | 0.0617
Scenario1_linear | MCAR | ipw | bernstein | tau3 | 500 | 0.5 | 0.0292 | 0.0636 | 0.0062 | 0.0610
Scenario1_linear | MCAR | ipw | bernstein | tau1 | 500 | 0.5 | 0.0262 | 0.0657 | 0.0062 | 0.0618
Scenario1_linear | MCAR | ipw | tricube | tau1 | 500 | 0.5 | 0.0400 | 0.0514 | 0.0065 | 0.0629
Scenario1_linear | MCAR | ipw | tricube | tau2 | 500 | 0.5 | 0.0402 | 0.0577 | 0.0072 | 0.0665
Scenario1_linear | MCAR | ipw | tricube | tau3 | 500 | 0.5 | 0.0433 | 0.0582 | 0.0077 | 0.0689
Scenario1_linear | MCAR | ipw | epanechnikov | tau2 | 500 | 0.5 | 0.0480 | 0.0546 | 0.0079 | 0.0697
Scenario1_linear | MCAR | ipw | gaussian | tau3 | 500 | 0.5 | 0.0497 | 0.0538 | 0.0080 | 0.0708
Scenario1_linear | MCAR | ipw | epanechnikov | tau1 | 500 | 0.5 | 0.0508 | 0.0553 | 0.0082 | 0.0716
Scenario1_linear | MCAR | ipw | gaussian | tau2 | 500 | 0.5 | 0.0512 | 0.0560 | 0.0082 | 0.0720
Scenario1_linear | MCAR | ipw | bernstein | tau1 | 1000 | 0 | 0.0104 | 0.0387 | 0.0019 | 0.0337
Scenario1_linear | MCAR | ipw | bernstein | tau2 | 1000 | 0 | 0.0112 | 0.0407 | 0.0021 | 0.0346
Scenario1_linear | MCAR | ipw | tricube | tau1 | 1000 | 0 | 0.0219 | 0.0283 | 0.0021 | 0.0360
Scenario1_linear | MCAR | ipw | tricube | tau2 | 1000 | 0 | 0.0215 | 0.0292 | 0.0022 | 0.0366
Scenario1_linear | MCAR | ipw | bernstein | tau3 | 1000 | 0 | 0.0113 | 0.0414 | 0.0022 | 0.0365
Scenario1_linear | MCAR | ipw | tricube | tau3 | 1000 | 0 | 0.0227 | 0.0295 | 0.0023 | 0.0373
Scenario1_linear | MCAR | ipw | gaussian | tau1 | 1000 | 0 | 0.0255 | 0.0285 | 0.0024 | 0.0388
Scenario1_linear | MCAR | ipw | gaussian | tau3 | 1000 | 0 | 0.0271 | 0.0278 | 0.0025 | 0.0391
Scenario1_linear | MCAR | ipw | epanechnikov | tau2 | 1000 | 0 | 0.0252 | 0.0277 | 0.0025 | 0.0384
Scenario1_linear | MCAR | ipw | gaussian | tau2 | 1000 | 0 | 0.0278 | 0.0295 | 0.0026 | 0.0402
Scenario1_linear | MCAR | ipw | bernstein | tau1 | 1000 | 0.1 | 0.0116 | 0.0400 | 0.0020 | 0.0347
Scenario1_linear | MCAR | ipw | bernstein | tau3 | 1000 | 0.1 | 0.0124 | 0.0419 | 0.0023 | 0.0366
Scenario1_linear | MCAR | ipw | bernstein | tau2 | 1000 | 0.1 | 0.0126 | 0.0429 | 0.0023 | 0.0370
Scenario1_linear | MCAR | ipw | tricube | tau1 | 1000 | 0.1 | 0.0214 | 0.0304 | 0.0023 | 0.0373
Scenario1_linear | MCAR | ipw | gaussian | tau1 | 1000 | 0.1 | 0.0264 | 0.0283 | 0.0025 | 0.0387
Scenario1_linear | MCAR | ipw | tricube | tau2 | 1000 | 0.1 | 0.0248 | 0.0323 | 0.0026 | 0.0397
Scenario1_linear | MCAR | ipw | tricube | tau3 | 1000 | 0.1 | 0.0253 | 0.0320 | 0.0026 | 0.0397
Scenario1_linear | MCAR | ipw | gaussian | tau3 | 1000 | 0.1 | 0.0252 | 0.0301 | 0.0027 | 0.0400
Scenario1_linear | MCAR | ipw | gaussian | tau2 | 1000 | 0.1 | 0.0275 | 0.0306 | 0.0027 | 0.0413
Scenario1_linear | MCAR | ipw | epanechnikov | tau1 | 1000 | 0.1 | 0.0285 | 0.0302 | 0.0028 | 0.0409
Scenario1_linear | MCAR | ipw | bernstein | tau2 | 1000 | 0.3 | 0.0141 | 0.0441 | 0.0025 | 0.0393
Scenario1_linear | MCAR | ipw | bernstein | tau3 | 1000 | 0.3 | 0.0169 | 0.0439 | 0.0027 | 0.0398
Scenario1_linear | MCAR | ipw | bernstein | tau1 | 1000 | 0.3 | 0.0172 | 0.0451 | 0.0028 | 0.0403
Scenario1_linear | MCAR | ipw | tricube | tau2 | 1000 | 0.3 | 0.0244 | 0.0339 | 0.0028 | 0.0416
Scenario1_linear | MCAR | ipw | tricube | tau1 | 1000 | 0.3 | 0.0256 | 0.0349 | 0.0029 | 0.0419
Scenario1_linear | MCAR | ipw | tricube | tau3 | 1000 | 0.3 | 0.0268 | 0.0347 | 0.0031 | 0.0438
Scenario1_linear | MCAR | ipw | gaussian | tau2 | 1000 | 0.3 | 0.0289 | 0.0335 | 0.0032 | 0.0436
Scenario1_linear | MCAR | ipw | gaussian | tau1 | 1000 | 0.3 | 0.0308 | 0.0333 | 0.0033 | 0.0449
Scenario1_linear | MCAR | ipw | epanechnikov | tau1 | 1000 | 0.3 | 0.0282 | 0.0340 | 0.0034 | 0.0456
Scenario1_linear | MCAR | ipw | gaussian | tau3 | 1000 | 0.3 | 0.0309 | 0.0350 | 0.0034 | 0.0461
Scenario1_linear | MCAR | ipw | bernstein | tau2 | 1000 | 0.5 | 0.0189 | 0.0491 | 0.0032 | 0.0440
Scenario1_linear | MCAR | ipw | bernstein | tau3 | 1000 | 0.5 | 0.0200 | 0.0494 | 0.0034 | 0.0451
Scenario1_linear | MCAR | ipw | bernstein | tau1 | 1000 | 0.5 | 0.0189 | 0.0495 | 0.0034 | 0.0444
Scenario1_linear | MCAR | ipw | tricube | tau2 | 1000 | 0.5 | 0.0271 | 0.0400 | 0.0038 | 0.0485
Scenario1_linear | MCAR | ipw | tricube | tau1 | 1000 | 0.5 | 0.0324 | 0.0400 | 0.0042 | 0.0510
Scenario1_linear | MCAR | ipw | gaussian | tau1 | 1000 | 0.5 | 0.0320 | 0.0401 | 0.0043 | 0.0512
Scenario1_linear | MCAR | ipw | epanechnikov | tau1 | 1000 | 0.5 | 0.0318 | 0.0400 | 0.0043 | 0.0511
Scenario1_linear | MCAR | ipw | gaussian | tau2 | 1000 | 0.5 | 0.0351 | 0.0409 | 0.0044 | 0.0522
Scenario1_linear | MCAR | ipw | tricube | tau3 | 1000 | 0.5 | 0.0310 | 0.0432 | 0.0044 | 0.0520
Scenario1_linear | MCAR | ipw | gaussian | tau3 | 1000 | 0.5 | 0.0367 | 0.0399 | 0.0044 | 0.0528
Scenario1_linear | MCAR | ipw | bernstein | tau3 | 2000 | 0 | 0.0071 | 0.0315 | 0.0012 | 0.0266
Scenario1_linear | MCAR | ipw | bernstein | tau2 | 2000 | 0 | 0.0065 | 0.0326 | 0.0013 | 0.0271
Scenario1_linear | MCAR | ipw | tricube | tau1 | 2000 | 0 | 0.0163 | 0.0221 | 0.0013 | 0.0280
Scenario1_linear | MCAR | ipw | tricube | tau3 | 2000 | 0 | 0.0168 | 0.0215 | 0.0013 | 0.0278
Scenario1_linear | MCAR | ipw | tricube | tau2 | 2000 | 0 | 0.0180 | 0.0226 | 0.0014 | 0.0288
Scenario1_linear | MCAR | ipw | bernstein | tau1 | 2000 | 0 | 0.0099 | 0.0331 | 0.0014 | 0.0285
Scenario1_linear | MCAR | ipw | gaussian | tau1 | 2000 | 0 | 0.0188 | 0.0214 | 0.0015 | 0.0290
Scenario1_linear | MCAR | ipw | gaussian | tau3 | 2000 | 0 | 0.0192 | 0.0215 | 0.0015 | 0.0298
Scenario1_linear | MCAR | ipw | epanechnikov | tau3 | 2000 | 0 | 0.0188 | 0.0210 | 0.0015 | 0.0288
Scenario1_linear | MCAR | ipw | gaussian | tau2 | 2000 | 0 | 0.0204 | 0.0216 | 0.0015 | 0.0302
Scenario1_linear | MCAR | ipw | bernstein | tau2 | 2000 | 0.1 | 0.0074 | 0.0325 | 0.0013 | 0.0272
Scenario1_linear | MCAR | ipw | bernstein | tau1 | 2000 | 0.1 | 0.0084 | 0.0330 | 0.0014 | 0.0281
Scenario1_linear | MCAR | ipw | tricube | tau1 | 2000 | 0.1 | 0.0168 | 0.0232 | 0.0014 | 0.0281
Scenario1_linear | MCAR | ipw | bernstein | tau3 | 2000 | 0.1 | 0.0094 | 0.0330 | 0.0014 | 0.0283
Scenario1_linear | MCAR | ipw | tricube | tau2 | 2000 | 0.1 | 0.0167 | 0.0226 | 0.0014 | 0.0284
Scenario1_linear | MCAR | ipw | tricube | tau3 | 2000 | 0.1 | 0.0179 | 0.0228 | 0.0014 | 0.0289
Scenario1_linear | MCAR | ipw | gaussian | tau1 | 2000 | 0.1 | 0.0206 | 0.0220 | 0.0016 | 0.0305
Scenario1_linear | MCAR | ipw | gaussian | tau2 | 2000 | 0.1 | 0.0202 | 0.0225 | 0.0016 | 0.0307
Scenario1_linear | MCAR | ipw | epanechnikov | tau3 | 2000 | 0.1 | 0.0219 | 0.0216 | 0.0017 | 0.0311
Scenario1_linear | MCAR | ipw | gaussian | tau3 | 2000 | 0.1 | 0.0215 | 0.0230 | 0.0017 | 0.0316
Scenario1_linear | MCAR | ipw | bernstein | tau2 | 2000 | 0.3 | 0.0076 | 0.0345 | 0.0014 | 0.0286
Scenario1_linear | MCAR | ipw | tricube | tau1 | 2000 | 0.3 | 0.0182 | 0.0253 | 0.0016 | 0.0308
Scenario1_linear | MCAR | ipw | bernstein | tau3 | 2000 | 0.3 | 0.0105 | 0.0360 | 0.0017 | 0.0313
Scenario1_linear | MCAR | ipw | tricube | tau2 | 2000 | 0.3 | 0.0182 | 0.0257 | 0.0017 | 0.0321
Scenario1_linear | MCAR | ipw | tricube | tau3 | 2000 | 0.3 | 0.0211 | 0.0250 | 0.0017 | 0.0319
Scenario1_linear | MCAR | ipw | bernstein | tau1 | 2000 | 0.3 | 0.0114 | 0.0370 | 0.0018 | 0.0322
Scenario1_linear | MCAR | ipw | gaussian | tau1 | 2000 | 0.3 | 0.0235 | 0.0237 | 0.0019 | 0.0333
Scenario1_linear | MCAR | ipw | gaussian | tau3 | 2000 | 0.3 | 0.0234 | 0.0254 | 0.0019 | 0.0339
Scenario1_linear | MCAR | ipw | gaussian | tau2 | 2000 | 0.3 | 0.0216 | 0.0254 | 0.0020 | 0.0340
Scenario1_linear | MCAR | ipw | epanechnikov | tau1 | 2000 | 0.3 | 0.0248 | 0.0242 | 0.0020 | 0.0346
Scenario1_linear | MCAR | ipw | bernstein | tau2 | 2000 | 0.5 | 0.0106 | 0.0388 | 0.0019 | 0.0331
Scenario1_linear | MCAR | ipw | bernstein | tau3 | 2000 | 0.5 | 0.0138 | 0.0394 | 0.0020 | 0.0347
Scenario1_linear | MCAR | ipw | bernstein | tau1 | 2000 | 0.5 | 0.0127 | 0.0400 | 0.0021 | 0.0352
Scenario1_linear | MCAR | ipw | tricube | tau1 | 2000 | 0.5 | 0.0214 | 0.0291 | 0.0022 | 0.0358
Scenario1_linear | MCAR | ipw | tricube | tau3 | 2000 | 0.5 | 0.0233 | 0.0291 | 0.0023 | 0.0363
Scenario1_linear | MCAR | ipw | tricube | tau2 | 2000 | 0.5 | 0.0223 | 0.0308 | 0.0024 | 0.0377
Scenario1_linear | MCAR | ipw | epanechnikov | tau2 | 2000 | 0.5 | 0.0248 | 0.0289 | 0.0025 | 0.0384
Scenario1_linear | MCAR | ipw | gaussian | tau1 | 2000 | 0.5 | 0.0266 | 0.0285 | 0.0025 | 0.0392
Scenario1_linear | MCAR | ipw | gaussian | tau3 | 2000 | 0.5 | 0.0270 | 0.0283 | 0.0025 | 0.0390
Scenario1_linear | MCAR | ipw | gaussian | tau2 | 2000 | 0.5 | 0.0269 | 0.0305 | 0.0026 | 0.0404
Scenario2_nonlinearMARcomplete_caseepanechnikovtau350000.03730.07020.01090.0686
Scenario2_nonlinearMARcomplete_casegaussiantau350000.03550.07880.01110.0689
Scenario2_nonlinearMARcomplete_casetricubetau350000.03520.08310.01390.0742
Scenario2_nonlinearMARcomplete_casebernsteintau350000.04960.08810.01630.0752
Scenario2_nonlinearMARcomplete_casebetatau350000.23540.05630.08410.2363
Scenario2_nonlinearMARcomplete_casegaussiantau250000.14870.09780.08430.1608
Scenario2_nonlinearMARcomplete_casebetatau250000.25190.05250.09860.2525
Scenario2_nonlinearMARcomplete_casebernsteintau250000.16290.08420.09880.1733
Scenario2_nonlinearMARcomplete_casebetatau150000.26680.05080.11470.2675
Scenario2_nonlinearMARcomplete_casetricubetau250000.17030.09360.11990.1859
Scenario2_nonlinearMARcomplete_casetricubetau35000.10.02910.07410.01110.0687
Scenario2_nonlinearMARcomplete_caseepanechnikovtau35000.10.03330.07440.01190.0715
Scenario2_nonlinearMARcomplete_casebernsteintau35000.10.04960.08070.01350.0740
Scenario2_nonlinearMARcomplete_casegaussiantau35000.10.05500.08830.01660.0799
Scenario2_nonlinearMARcomplete_casebetatau35000.10.23020.05680.08040.2313
Scenario2_nonlinearMARcomplete_casegaussiantau25000.10.15250.09070.08360.1643
Scenario2_nonlinearMARcomplete_casebernsteintau25000.10.17150.08550.10160.1791
Scenario2_nonlinearMARcomplete_casebetatau25000.10.25650.05340.10180.2576
Scenario2_nonlinearMARcomplete_caseepanechnikovtau25000.10.16460.08850.10950.1798
Scenario2_nonlinearMARcomplete_casebetatau15000.10.27310.05430.12030.2741
Scenario2_nonlinearMARcomplete_casetricubetau35000.30.04010.07420.01140.0733
Scenario2_nonlinearMARcomplete_casebernsteintau35000.30.05510.07690.01280.0772
Scenario2_nonlinearMARcomplete_caseepanechnikovtau35000.30.04330.07960.01360.0769
Scenario2_nonlinearMARcomplete_casegaussiantau35000.30.05260.08290.01390.0794
Scenario2_nonlinearMARcomplete_casegaussiantau25000.30.15680.09010.07940.1684
Scenario2_nonlinearMARcomplete_casebernsteintau25000.30.16900.08440.09000.1748
Scenario2_nonlinearMARcomplete_casebetatau35000.30.24900.05540.09210.2503
Scenario2_nonlinearMARcomplete_casebetatau25000.30.26480.06110.10860.2663
Scenario2_nonlinearMARcomplete_caseepanechnikovtau25000.30.18130.09860.11400.1928
Scenario2_nonlinearMARcomplete_casebetatau15000.30.27120.05490.11650.2725
Scenario2_nonlinearMARcomplete_casetricubetau35000.50.04280.08970.01530.0835
Scenario2_nonlinearMARcomplete_caseepanechnikovtau35000.50.05470.09230.01770.0884
Scenario2_nonlinearMARcomplete_casegaussiantau35000.50.06190.09580.01910.0908
Scenario2_nonlinearMARcomplete_casebernsteintau35000.50.08220.09200.02150.0988
Scenario2_nonlinearMARcomplete_casegaussiantau25000.50.16920.09460.08530.1802
Scenario2_nonlinearMARcomplete_casebetatau35000.50.25430.06680.09590.2559
Scenario2_nonlinearMARcomplete_casebernsteintau25000.50.18380.09480.09630.1903
Scenario2_nonlinearMARcomplete_casebetatau25000.50.27110.06200.11100.2724
Scenario2_nonlinearMARcomplete_caseepanechnikovtau25000.50.17860.09810.11110.1916
Scenario2_nonlinearMARcomplete_casetricubetau25000.50.18800.10130.12180.2010
Scenario2_nonlinear MAR complete_case bernstein tau3 1000 0 0.0238 0.0573 0.0061 0.0506
Scenario2_nonlinear MAR complete_case tricube tau3 1000 0 0.0265 0.0583 0.0071 0.0546
Scenario2_nonlinear MAR complete_case gaussian tau3 1000 0 0.0327 0.0600 0.0072 0.0539
Scenario2_nonlinear MAR complete_case epanechnikov tau3 1000 0 0.0307 0.0588 0.0077 0.0550
Scenario2_nonlinear MAR complete_case gaussian tau2 1000 0 0.1257 0.0713 0.0588 0.1331
Scenario2_nonlinear MAR complete_case beta tau3 1000 0 0.2144 0.0373 0.0683 0.2149
Scenario2_nonlinear MAR complete_case beta tau2 1000 0 0.2253 0.0417 0.0774 0.2258
Scenario2_nonlinear MAR complete_case beta tau1 1000 0 0.2358 0.0384 0.0861 0.2362
Scenario2_nonlinear MAR complete_case bernstein tau2 1000 0 0.1507 0.0775 0.1018 0.1608
Scenario2_nonlinear MAR complete_case epanechnikov tau2 1000 0 0.1564 0.0747 0.1068 0.1656
Scenario2_nonlinear MAR complete_case tricube tau3 1000 0.1 0.0281 0.0569 0.0075 0.0554
Scenario2_nonlinear MAR complete_case bernstein tau3 1000 0.1 0.0267 0.0682 0.0089 0.0556
Scenario2_nonlinear MAR complete_case gaussian tau3 1000 0.1 0.0376 0.0706 0.0106 0.0605
Scenario2_nonlinear MAR complete_case epanechnikov tau3 1000 0.1 0.0410 0.0782 0.0143 0.0672
Scenario2_nonlinear MAR complete_case beta tau3 1000 0.1 0.2144 0.0401 0.0678 0.2149
Scenario2_nonlinear MAR complete_case beta tau2 1000 0.1 0.2282 0.0379 0.0786 0.2287
Scenario2_nonlinear MAR complete_case gaussian tau2 1000 0.1 0.1445 0.0817 0.0835 0.1531
Scenario2_nonlinear MAR complete_case beta tau1 1000 0.1 0.2423 0.0429 0.0920 0.2429
Scenario2_nonlinear MAR complete_case epanechnikov tau2 1000 0.1 0.1494 0.0795 0.0962 0.1594
Scenario2_nonlinear MAR complete_case bernstein tau2 1000 0.1 0.1488 0.0739 0.0968 0.1592
Scenario2_nonlinear MAR complete_case gaussian tau3 1000 0.3 0.0403 0.0630 0.0084 0.0600
Scenario2_nonlinear MAR complete_case epanechnikov tau3 1000 0.3 0.0395 0.0592 0.0090 0.0606
Scenario2_nonlinear MAR complete_case bernstein tau3 1000 0.3 0.0359 0.0713 0.0102 0.0618
Scenario2_nonlinear MAR complete_case tricube tau3 1000 0.3 0.0244 0.0722 0.0105 0.0608
Scenario2_nonlinear MAR complete_case gaussian tau2 1000 0.3 0.1323 0.0752 0.0666 0.1405
Scenario2_nonlinear MAR complete_case beta tau3 1000 0.3 0.2237 0.0467 0.0743 0.2245
Scenario2_nonlinear MAR complete_case beta tau2 1000 0.3 0.2394 0.0424 0.0862 0.2402
Scenario2_nonlinear MAR complete_case beta tau1 1000 0.3 0.2408 0.0419 0.0884 0.2415
Scenario2_nonlinear MAR complete_case bernstein tau2 1000 0.3 0.1494 0.0792 0.0910 0.1579
Scenario2_nonlinear MAR complete_case tricube tau2 1000 0.3 0.1524 0.0769 0.0995 0.1628
Scenario2_nonlinear MAR complete_case bernstein tau3 1000 0.5 0.0440 0.0651 0.0093 0.0647
Scenario2_nonlinear MAR complete_case gaussian tau3 1000 0.5 0.0459 0.0677 0.0094 0.0663
Scenario2_nonlinear MAR complete_case epanechnikov tau3 1000 0.5 0.0443 0.0652 0.0099 0.0671
Scenario2_nonlinear MAR complete_case tricube tau3 1000 0.5 0.0341 0.0700 0.0099 0.0654
Scenario2_nonlinear MAR complete_case gaussian tau2 1000 0.5 0.1380 0.0793 0.0646 0.1474
Scenario2_nonlinear MAR complete_case beta tau3 1000 0.5 0.2360 0.0473 0.0812 0.2369
Scenario2_nonlinear MAR complete_case bernstein tau2 1000 0.5 0.1543 0.0767 0.0852 0.1606
Scenario2_nonlinear MAR complete_case beta tau2 1000 0.5 0.2409 0.0430 0.0859 0.2419
Scenario2_nonlinear MAR complete_case epanechnikov tau2 1000 0.5 0.1588 0.0868 0.0984 0.1691
Scenario2_nonlinear MAR complete_case beta tau1 1000 0.5 0.2557 0.0450 0.1000 0.2564
Scenario2_nonlinear MAR complete_case bernstein tau3 2000 0 0.0136 0.0516 0.0049 0.0419
Scenario2_nonlinear MAR complete_case epanechnikov tau3 2000 0 0.0265 0.0474 0.0051 0.0431
Scenario2_nonlinear MAR complete_case gaussian tau3 2000 0 0.0288 0.0523 0.0059 0.0454
Scenario2_nonlinear MAR complete_case tricube tau3 2000 0 0.0187 0.0516 0.0060 0.0429
Scenario2_nonlinear MAR complete_case gaussian tau2 2000 0 0.1044 0.0598 0.0506 0.1113
Scenario2_nonlinear MAR complete_case beta tau3 2000 0 0.1958 0.0301 0.0562 0.1961
Scenario2_nonlinear MAR complete_case beta tau2 2000 0 0.2044 0.0274 0.0623 0.2046
Scenario2_nonlinear MAR complete_case beta tau1 2000 0 0.2154 0.0306 0.0716 0.2157
Scenario2_nonlinear MAR complete_case epanechnikov tau2 2000 0 0.1283 0.0606 0.0838 0.1352
Scenario2_nonlinear MAR complete_case tricube tau2 2000 0 0.1281 0.0619 0.0894 0.1385
Scenario2_nonlinear MAR complete_case tricube tau3 2000 0.1 0.0227 0.0434 0.0043 0.0429
Scenario2_nonlinear MAR complete_case bernstein tau3 2000 0.1 0.0161 0.0498 0.0045 0.0419
Scenario2_nonlinear MAR complete_case epanechnikov tau3 2000 0.1 0.0239 0.0474 0.0050 0.0446
Scenario2_nonlinear MAR complete_case gaussian tau3 2000 0.1 0.0322 0.0507 0.0058 0.0473
Scenario2_nonlinear MAR complete_case beta tau3 2000 0.1 0.2012 0.0278 0.0593 0.2015
Scenario2_nonlinear MAR complete_case gaussian tau2 2000 0.1 0.1152 0.0674 0.0593 0.1215
Scenario2_nonlinear MAR complete_case beta tau2 2000 0.1 0.2098 0.0308 0.0659 0.2102
Scenario2_nonlinear MAR complete_case beta tau1 2000 0.1 0.2137 0.0297 0.0693 0.2139
Scenario2_nonlinear MAR complete_case epanechnikov tau2 2000 0.1 0.1275 0.0612 0.0830 0.1347
Scenario2_nonlinear MAR complete_case tricube tau2 2000 0.1 0.1276 0.0688 0.0858 0.1377
Scenario2_nonlinear MAR complete_case bernstein tau3 2000 0.3 0.0201 0.0500 0.0045 0.0438
Scenario2_nonlinear MAR complete_case tricube tau3 2000 0.3 0.0236 0.0462 0.0045 0.0447
Scenario2_nonlinear MAR complete_case epanechnikov tau3 2000 0.3 0.0307 0.0501 0.0060 0.0490
Scenario2_nonlinear MAR complete_case gaussian tau3 2000 0.3 0.0360 0.0507 0.0062 0.0500
Scenario2_nonlinear MAR complete_case gaussian tau2 2000 0.3 0.1115 0.0639 0.0523 0.1175
Scenario2_nonlinear MAR complete_case beta tau3 2000 0.3 0.2059 0.0332 0.0615 0.2063
Scenario2_nonlinear MAR complete_case beta tau2 2000 0.3 0.2129 0.0293 0.0671 0.2134
Scenario2_nonlinear MAR complete_case beta tau1 2000 0.3 0.2222 0.0332 0.0748 0.2226
Scenario2_nonlinear MAR complete_case epanechnikov tau2 2000 0.3 0.1264 0.0645 0.0798 0.1344
Scenario2_nonlinear MAR complete_case bernstein tau2 2000 0.3 0.1282 0.0672 0.0834 0.1378
Scenario2_nonlinear MAR complete_case bernstein tau3 2000 0.5 0.0261 0.0534 0.0053 0.0471
Scenario2_nonlinear MAR complete_case gaussian tau3 2000 0.5 0.0377 0.0519 0.0060 0.0518
Scenario2_nonlinear MAR complete_case tricube tau3 2000 0.5 0.0283 0.0531 0.0064 0.0503
Scenario2_nonlinear MAR complete_case epanechnikov tau3 2000 0.5 0.0363 0.0566 0.0082 0.0537
Scenario2_nonlinear MAR complete_case gaussian tau2 2000 0.5 0.1110 0.0623 0.0497 0.1179
Scenario2_nonlinear MAR complete_case beta tau3 2000 0.5 0.2130 0.0351 0.0653 0.2135
Scenario2_nonlinear MAR complete_case beta tau2 2000 0.5 0.2210 0.0337 0.0712 0.2215
Scenario2_nonlinear MAR complete_case beta tau1 2000 0.5 0.2263 0.0309 0.0756 0.2268
Scenario2_nonlinear MAR complete_case epanechnikov tau2 2000 0.5 0.1345 0.0680 0.0811 0.1412
Scenario2_nonlinear MAR complete_case tricube tau2 2000 0.5 0.1330 0.0660 0.0842 0.1413
Scenario2_nonlinear MAR ipw bernstein tau3 500 0 0.0407 0.0734 0.0107 0.0672
Scenario2_nonlinear MAR ipw tricube tau3 500 0 0.0322 0.0719 0.0108 0.0689
Scenario2_nonlinear MAR ipw epanechnikov tau3 500 0 0.0361 0.0771 0.0124 0.0700
Scenario2_nonlinear MAR ipw gaussian tau3 500 0 0.0475 0.0914 0.0177 0.0776
Scenario2_nonlinear MAR ipw gaussian tau2 500 0 0.1541 0.0953 0.0877 0.1653
Scenario2_nonlinear MAR ipw beta tau3 500 0 0.2402 0.0554 0.0886 0.2410
Scenario2_nonlinear MAR ipw bernstein tau2 500 0 0.1698 0.0905 0.0999 0.1783
Scenario2_nonlinear MAR ipw beta tau2 500 0 0.2534 0.0557 0.1007 0.2543
Scenario2_nonlinear MAR ipw tricube tau2 500 0 0.1693 0.0914 0.1164 0.1837
Scenario2_nonlinear MAR ipw epanechnikov tau2 500 0 0.1769 0.0895 0.1210 0.1895
Scenario2_nonlinear MAR ipw gaussian tau3 500 0.1 0.0385 0.0781 0.0111 0.0702
Scenario2_nonlinear MAR ipw tricube tau3 500 0.1 0.0259 0.0800 0.0125 0.0702
Scenario2_nonlinear MAR ipw epanechnikov tau3 500 0.1 0.0394 0.0768 0.0131 0.0736
Scenario2_nonlinear MAR ipw bernstein tau3 500 0.1 0.0466 0.0796 0.0131 0.0715
Scenario2_nonlinear MAR ipw gaussian tau2 500 0.1 0.1546 0.0915 0.0803 0.1647
Scenario2_nonlinear MAR ipw beta tau3 500 0.1 0.2467 0.0541 0.0917 0.2475
Scenario2_nonlinear MAR ipw beta tau2 500 0.1 0.2540 0.0531 0.1010 0.2549
Scenario2_nonlinear MAR ipw bernstein tau2 500 0.1 0.1708 0.0904 0.1016 0.1783
Scenario2_nonlinear MAR ipw epanechnikov tau2 500 0.1 0.1765 0.0935 0.1175 0.1878
Scenario2_nonlinear MAR ipw beta tau1 500 0.1 0.2743 0.0524 0.1219 0.2753
Scenario2_nonlinear MAR ipw epanechnikov tau3 500 0.3 0.0413 0.0701 0.0101 0.0727
Scenario2_nonlinear MAR ipw bernstein tau3 500 0.3 0.0605 0.0699 0.0125 0.0794
Scenario2_nonlinear MAR ipw tricube tau3 500 0.3 0.0328 0.0862 0.0132 0.0758
Scenario2_nonlinear MAR ipw gaussian tau3 500 0.3 0.0480 0.0858 0.0145 0.0781
Scenario2_nonlinear MAR ipw gaussian tau2 500 0.3 0.1673 0.1041 0.0938 0.1787
Scenario2_nonlinear MAR ipw beta tau3 500 0.3 0.2558 0.0530 0.0981 0.2568
Scenario2_nonlinear MAR ipw bernstein tau2 500 0.3 0.1795 0.0892 0.1007 0.1858
Scenario2_nonlinear MAR ipw beta tau2 500 0.3 0.2682 0.0537 0.1108 0.2689
Scenario2_nonlinear MAR ipw epanechnikov tau2 500 0.3 0.1770 0.1082 0.1130 0.1888
Scenario2_nonlinear MAR ipw tricube tau2 500 0.3 0.1882 0.0917 0.1287 0.1986
Scenario2_nonlinear MAR ipw epanechnikov tau3 500 0.5 0.0510 0.0848 0.0162 0.0866
Scenario2_nonlinear MAR ipw tricube tau3 500 0.5 0.0485 0.0888 0.0174 0.0844
Scenario2_nonlinear MAR ipw gaussian tau3 500 0.5 0.0612 0.0934 0.0184 0.0877
Scenario2_nonlinear MAR ipw bernstein tau3 500 0.5 0.0814 0.0811 0.0184 0.0947
Scenario2_nonlinear MAR ipw gaussian tau2 500 0.5 0.1591 0.0959 0.0807 0.1719
Scenario2_nonlinear MAR ipw bernstein tau2 500 0.5 0.1758 0.0899 0.0862 0.1826
Scenario2_nonlinear MAR ipw beta tau3 500 0.5 0.2617 0.0652 0.1037 0.2633
Scenario2_nonlinear MAR ipw epanechnikov tau2 500 0.5 0.1761 0.0953 0.1097 0.1880
Scenario2_nonlinear MAR ipw beta tau2 500 0.5 0.2795 0.0576 0.1194 0.2807
Scenario2_nonlinear MAR ipw tricube tau2 500 0.5 0.1929 0.1041 0.1293 0.2066
Scenario2_nonlinear MAR ipw tricube tau3 1000 0 0.0273 0.0537 0.0061 0.0525
Scenario2_nonlinear MAR ipw bernstein tau3 1000 0 0.0234 0.0632 0.0075 0.0524
Scenario2_nonlinear MAR ipw gaussian tau3 1000 0 0.0329 0.0647 0.0083 0.0563
Scenario2_nonlinear MAR ipw epanechnikov tau3 1000 0 0.0293 0.0674 0.0097 0.0577
Scenario2_nonlinear MAR ipw gaussian tau2 1000 0 0.1301 0.0756 0.0665 0.1374
Scenario2_nonlinear MAR ipw beta tau3 1000 0 0.2162 0.0379 0.0692 0.2167
Scenario2_nonlinear MAR ipw beta tau2 1000 0 0.2279 0.0369 0.0788 0.2283
Scenario2_nonlinear MAR ipw beta tau1 1000 0 0.2389 0.0433 0.0903 0.2394
Scenario2_nonlinear MAR ipw epanechnikov tau2 1000 0 0.1454 0.0760 0.0926 0.1547
Scenario2_nonlinear MAR ipw bernstein tau2 1000 0 0.1514 0.0742 0.0994 0.1610
Scenario2_nonlinear MAR ipw bernstein tau3 1000 0.1 0.0261 0.0601 0.0066 0.0529
Scenario2_nonlinear MAR ipw tricube tau3 1000 0.1 0.0260 0.0571 0.0072 0.0550
Scenario2_nonlinear MAR ipw gaussian tau3 1000 0.1 0.0377 0.0669 0.0093 0.0586
Scenario2_nonlinear MAR ipw epanechnikov tau3 1000 0.1 0.0322 0.0658 0.0100 0.0592
Scenario2_nonlinear MAR ipw gaussian tau2 1000 0.1 0.1255 0.0740 0.0599 0.1345
Scenario2_nonlinear MAR ipw beta tau3 1000 0.1 0.2196 0.0374 0.0712 0.2201
Scenario2_nonlinear MAR ipw beta tau2 1000 0.1 0.2312 0.0394 0.0814 0.2317
Scenario2_nonlinear MAR ipw beta tau1 1000 0.1 0.2397 0.0408 0.0893 0.2402
Scenario2_nonlinear MAR ipw epanechnikov tau2 1000 0.1 0.1456 0.0776 0.0941 0.1547
Scenario2_nonlinear MAR ipw bernstein tau2 1000 0.1 0.1513 0.0799 0.0991 0.1600
Scenario2_nonlinear MAR ipw tricube tau3 1000 0.3 0.0253 0.0606 0.0077 0.0557
Scenario2_nonlinear MAR ipw bernstein tau3 1000 0.3 0.0325 0.0627 0.0078 0.0586
Scenario2_nonlinear MAR ipw epanechnikov tau3 1000 0.3 0.0385 0.0573 0.0079 0.0604
Scenario2_nonlinear MAR ipw gaussian tau3 1000 0.3 0.0475 0.0714 0.0115 0.0672
Scenario2_nonlinear MAR ipw gaussian tau2 1000 0.3 0.1368 0.0784 0.0698 0.1449
Scenario2_nonlinear MAR ipw beta tau3 1000 0.3 0.2312 0.0412 0.0795 0.2317
Scenario2_nonlinear MAR ipw beta tau2 1000 0.3 0.2390 0.0401 0.0868 0.2395
Scenario2_nonlinear MAR ipw bernstein tau2 1000 0.3 0.1485 0.0775 0.0904 0.1579
Scenario2_nonlinear MAR ipw beta tau1 1000 0.3 0.2448 0.0447 0.0927 0.2454
Scenario2_nonlinear MAR ipw tricube tau2 1000 0.3 0.1515 0.0856 0.0997 0.1637
Scenario2_nonlinear MAR ipw tricube tau3 1000 0.5 0.0382 0.0629 0.0086 0.0640
Scenario2_nonlinear MAR ipw epanechnikov tau3 1000 0.5 0.0416 0.0679 0.0104 0.0654
Scenario2_nonlinear MAR ipw bernstein tau3 1000 0.5 0.0508 0.0673 0.0110 0.0697
Scenario2_nonlinear MAR ipw gaussian tau3 1000 0.5 0.0522 0.0737 0.0125 0.0704
Scenario2_nonlinear MAR ipw gaussian tau2 1000 0.5 0.1374 0.0793 0.0652 0.1460
Scenario2_nonlinear MAR ipw beta tau3 1000 0.5 0.2443 0.0461 0.0881 0.2450
Scenario2_nonlinear MAR ipw bernstein tau2 1000 0.5 0.1625 0.0836 0.0924 0.1686
Scenario2_nonlinear MAR ipw epanechnikov tau2 1000 0.5 0.1568 0.0861 0.0935 0.1645
Scenario2_nonlinear MAR ipw beta tau2 1000 0.5 0.2542 0.0448 0.0978 0.2549
Scenario2_nonlinear MAR ipw tricube tau2 1000 0.5 0.1544 0.0809 0.0982 0.1653
Scenario2_nonlinear MAR ipw tricube tau3 2000 0 0.0233 0.0425 0.0042 0.0422
Scenario2_nonlinear MAR ipw epanechnikov tau3 2000 0 0.0251 0.0439 0.0046 0.0431
Scenario2_nonlinear MAR ipw bernstein tau3 2000 0 0.0158 0.0540 0.0056 0.0421
Scenario2_nonlinear MAR ipw gaussian tau3 2000 0 0.0293 0.0502 0.0057 0.0443
Scenario2_nonlinear MAR ipw gaussian tau2 2000 0 0.1044 0.0596 0.0502 0.1116
Scenario2_nonlinear MAR ipw beta tau3 2000 0 0.1943 0.0270 0.0549 0.1946
Scenario2_nonlinear MAR ipw beta tau2 2000 0 0.2018 0.0294 0.0602 0.2020
Scenario2_nonlinear MAR ipw beta tau1 2000 0 0.2115 0.0307 0.0685 0.2118
Scenario2_nonlinear MAR ipw tricube tau2 2000 0 0.1308 0.0632 0.0841 0.1382
Scenario2_nonlinear MAR ipw epanechnikov tau2 2000 0 0.1323 0.0630 0.0884 0.1393
Scenario2_nonlinear MAR ipw tricube tau3 2000 0.1 0.0202 0.0469 0.0045 0.0426
Scenario2_nonlinear MAR ipw epanechnikov tau3 2000 0.1 0.0276 0.0434 0.0047 0.0444
Scenario2_nonlinear MAR ipw bernstein tau3 2000 0.1 0.0172 0.0532 0.0053 0.0421
Scenario2_nonlinear MAR ipw gaussian tau3 2000 0.1 0.0288 0.0520 0.0060 0.0461
Scenario2_nonlinear MAR ipw gaussian tau2 2000 0.1 0.1024 0.0598 0.0474 0.1096
Scenario2_nonlinear MAR ipw beta tau3 2000 0.1 0.1995 0.0287 0.0582 0.1998
Scenario2_nonlinear MAR ipw beta tau2 2000 0.1 0.2078 0.0297 0.0645 0.2081
Scenario2_nonlinear MAR ipw beta tau1 2000 0.1 0.2152 0.0297 0.0709 0.2155
Scenario2_nonlinear MAR ipw epanechnikov tau2 2000 0.1 0.1287 0.0623 0.0825 0.1354
Scenario2_nonlinear MAR ipw bernstein tau2 2000 0.1 0.1311 0.0656 0.0890 0.1412
Scenario2_nonlinear MAR ipw gaussian tau3 2000 0.3 0.0283 0.0477 0.0047 0.0447
Scenario2_nonlinear MAR ipw tricube tau3 2000 0.3 0.0273 0.0471 0.0052 0.0469
Scenario2_nonlinear MAR ipw bernstein tau3 2000 0.3 0.0220 0.0564 0.0061 0.0462
Scenario2_nonlinear MAR ipw epanechnikov tau3 2000 0.3 0.0334 0.0508 0.0067 0.0502
Scenario2_nonlinear MAR ipw gaussian tau2 2000 0.3 0.1103 0.0644 0.0506 0.1160
Scenario2_nonlinear MAR ipw beta tau3 2000 0.3 0.2070 0.0292 0.0631 0.2073
Scenario2_nonlinear MAR ipw beta tau2 2000 0.3 0.2158 0.0295 0.0688 0.2162
Scenario2_nonlinear MAR ipw epanechnikov tau2 2000 0.3 0.1254 0.0663 0.0727 0.1312
Scenario2_nonlinear MAR ipw beta tau1 2000 0.3 0.2236 0.0312 0.0755 0.2239
Scenario2_nonlinear MAR ipw bernstein tau2 2000 0.3 0.1273 0.0648 0.0818 0.1364
Scenario2_nonlinear MAR ipw tricube tau3 2000 0.5 0.0316 0.0497 0.0055 0.0494
Scenario2_nonlinear MAR ipw gaussian tau3 2000 0.5 0.0414 0.0535 0.0069 0.0540
Scenario2_nonlinear MAR ipw bernstein tau3 2000 0.5 0.0303 0.0585 0.0072 0.0512
Scenario2_nonlinear MAR ipw epanechnikov tau3 2000 0.5 0.0356 0.0583 0.0083 0.0539
Scenario2_nonlinear MAR ipw gaussian tau2 2000 0.5 0.1171 0.0668 0.0536 0.1229
Scenario2_nonlinear MAR ipw beta tau3 2000 0.5 0.2239 0.0333 0.0743 0.2243
Scenario2_nonlinear MAR ipw epanechnikov tau2 2000 0.5 0.1327 0.0634 0.0760 0.1384
Scenario2_nonlinear MAR ipw beta tau2 2000 0.5 0.2272 0.0319 0.0765 0.2276
Scenario2_nonlinear MAR ipw tricube tau2 2000 0.5 0.1294 0.0685 0.0798 0.1383
Scenario2_nonlinear MAR ipw beta tau1 2000 0.5 0.2347 0.0304 0.0834 0.2351
Scenario2_nonlinear MCAR complete_case tricube tau3 500 0 0.0283 0.0796 0.0125 0.0710
Scenario2_nonlinear MCAR complete_case epanechnikov tau3 500 0 0.0338 0.0803 0.0128 0.0715
Scenario2_nonlinear MCAR complete_case bernstein tau3 500 0 0.0472 0.0831 0.0151 0.0734
Scenario2_nonlinear MCAR complete_case gaussian tau3 500 0 0.0501 0.0899 0.0165 0.0781
Scenario2_nonlinear MCAR complete_case beta tau3 500 0 0.2429 0.0546 0.0903 0.2438
Scenario2_nonlinear MCAR complete_case beta tau2 500 0 0.2457 0.0495 0.0928 0.2463
Scenario2_nonlinear MCAR complete_case bernstein tau2 500 0 0.1719 0.0882 0.1031 0.1800
Scenario2_nonlinear MCAR complete_case gaussian tau2 500 0 0.1695 0.0943 0.1038 0.1799
Scenario2_nonlinear MCAR complete_case epanechnikov tau2 500 0 0.1734 0.0904 0.1151 0.1843
Scenario2_nonlinear MCAR complete_case beta tau1 500 0 0.2711 0.0573 0.1208 0.2718
Scenario2_nonlinear MCAR complete_case tricube tau3 500 0.1 0.0265 0.0786 0.0115 0.0710
Scenario2_nonlinear MCAR complete_case epanechnikov tau3 500 0.1 0.0374 0.0766 0.0127 0.0721
Scenario2_nonlinear MCAR complete_case bernstein tau3 500 0.1 0.0473 0.0839 0.0145 0.0735
Scenario2_nonlinear MCAR complete_case gaussian tau3 500 0.1 0.0459 0.0881 0.0154 0.0754
Scenario2_nonlinear MCAR complete_case beta tau3 500 0.1 0.2384 0.0588 0.0862 0.2393
Scenario2_nonlinear MCAR complete_case gaussian tau2 500 0.1 0.1631 0.0985 0.0938 0.1745
Scenario2_nonlinear MCAR complete_case bernstein tau2 500 0.1 0.1691 0.0847 0.0958 0.1757
Scenario2_nonlinear MCAR complete_case beta tau2 500 0.1 0.2583 0.0488 0.1028 0.2589
Scenario2_nonlinear MCAR complete_case epanechnikov tau2 500 0.1 0.1761 0.0961 0.1202 0.1883
Scenario2_nonlinear MCAR complete_case beta tau1 500 0.1 0.2730 0.0564 0.1206 0.2738
Scenario2_nonlinear MCAR complete_case epanechnikov tau3 500 0.3 0.0433 0.0819 0.0150 0.0814
Scenario2_nonlinear MCAR complete_case gaussian tau3 500 0.3 0.0462 0.0899 0.0151 0.0832
Scenario2_nonlinear MCAR complete_case tricube tau3 500 0.3 0.0340 0.0957 0.0178 0.0839
Scenario2_nonlinear MCAR complete_case bernstein tau3 500 0.3 0.0625 0.0873 0.0181 0.0839
Scenario2_nonlinear MCAR complete_case beta tau3 500 0.3 0.2491 0.0671 0.0963 0.2502
Scenario2_nonlinear MCAR complete_case gaussian tau2 500 0.3 0.1675 0.1021 0.0968 0.1804
Scenario2_nonlinear MCAR complete_case bernstein tau2 500 0.3 0.1816 0.0958 0.1077 0.1916
Scenario2_nonlinear MCAR complete_case beta tau2 500 0.3 0.2694 0.0638 0.1159 0.2705
Scenario2_nonlinear MCAR complete_case epanechnikov tau2 500 0.3 0.1823 0.1005 0.1217 0.1968
Scenario2_nonlinear MCAR complete_case beta tau1 500 0.3 0.2862 0.0624 0.1335 0.2874
Scenario2_nonlinear MCAR complete_case tricube tau3 500 0.5 0.0376 0.0892 0.0164 0.0886
Scenario2_nonlinear MCAR complete_case bernstein tau3 500 0.5 0.0697 0.0870 0.0178 0.0911
Scenario2_nonlinear MCAR complete_case epanechnikov tau3 500 0.5 0.0399 0.1146 0.0228 0.0950
Scenario2_nonlinear MCAR complete_case gaussian tau3 500 0.5 0.0557 0.1135 0.0243 0.0987
Scenario2_nonlinear MCAR complete_case beta tau3 500 0.5 0.2525 0.0714 0.0978 0.2540
Scenario2_nonlinear MCAR complete_case bernstein tau2 500 0.5 0.1970 0.1012 0.1101 0.2055
Scenario2_nonlinear MCAR complete_case gaussian tau2 500 0.5 0.1907 0.1212 0.1207 0.2059
Scenario2_nonlinear MCAR complete_case beta tau2 500 0.5 0.2779 0.0712 0.1238 0.2796
Scenario2_nonlinear MCAR complete_case epanechnikov tau2 500 0.5 0.2091 0.1172 0.1466 0.2259
Scenario2_nonlinear MCAR complete_case tricube tau2 500 0.5 0.2052 0.1082 0.1502 0.2253
Scenario2_nonlinear MCAR complete_case gaussian tau3 1000 0 0.0293 0.0572 0.0062 0.0526
Scenario2_nonlinear MCAR complete_case bernstein tau3 1000 0 0.0230 0.0623 0.0074 0.0529
Scenario2_nonlinear MCAR complete_case tricube tau3 1000 0 0.0224 0.0588 0.0075 0.0538
Scenario2_nonlinear MCAR complete_case epanechnikov tau3 1000 0 0.0294 0.0580 0.0075 0.0557
Scenario2_nonlinear MCAR complete_case beta tau3 1000 0 0.2174 0.0356 0.0698 0.2178
Scenario2_nonlinear MCAR complete_case gaussian tau2 1000 0 0.1328 0.0782 0.0712 0.1415
Scenario2_nonlinear MCAR complete_case beta tau2 1000 0 0.2280 0.0365 0.0787 0.2285
Scenario2_nonlinear MCAR complete_case beta tau1 1000 0 0.2362 0.0388 0.0864 0.2367
Scenario2_nonlinear MCAR complete_case epanechnikov tau2 1000 0 0.1455 0.0764 0.0976 0.1568
Scenario2_nonlinear MCAR complete_case bernstein tau2 1000 0 0.1508 0.0779 0.1012 0.1615
Scenario2_nonlinear MCAR complete_case tricube tau3 1000 0.1 0.0275 0.0567 0.0070 0.0559
Scenario2_nonlinear MCAR complete_case epanechnikov tau3 1000 0.1 0.0363 0.0572 0.0082 0.0580
Scenario2_nonlinear MCAR complete_case gaussian tau3 1000 0.1 0.0374 0.0690 0.0098 0.0606
Scenario2_nonlinear MCAR complete_case bernstein tau3 1000 0.1 0.0244 0.0714 0.0098 0.0563
Scenario2_nonlinear MCAR complete_case gaussian tau2 1000 0.1 0.1250 0.0767 0.0626 0.1348
Scenario2_nonlinear MCAR complete_case beta tau3 1000 0.1 0.2196 0.0451 0.0723 0.2201
Scenario2_nonlinear MCAR complete_case beta tau2 1000 0.1 0.2285 0.0381 0.0781 0.2290
Scenario2_nonlinear MCAR complete_case beta tau1 1000 0.1 0.2410 0.0431 0.0921 0.2416
Scenario2_nonlinear MCAR complete_case epanechnikov tau2 1000 0.1 0.1472 0.0747 0.0986 0.1571
Scenario2_nonlinear MCAR complete_case bernstein tau2 1000 0.1 0.1506 0.0814 0.1005 0.1604
Scenario2_nonlinear MCAR complete_case epanechnikov tau3 1000 0.3 0.0317 0.0663 0.0094 0.0620
Scenario2_nonlinear MCAR complete_case tricube tau3 1000 0.3 0.0224 0.0737 0.0102 0.0621
Scenario2_nonlinear MCAR complete_case bernstein tau3 1000 0.3 0.0325 0.0776 0.0120 0.0615
Scenario2_nonlinear MCAR complete_case gaussian tau3 1000 0.3 0.0455 0.0797 0.0140 0.0686
Scenario2_nonlinear MCAR complete_case beta tau3 1000 0.3 0.2236 0.0449 0.0749 0.2244
Scenario2_nonlinear MCAR complete_case gaussian tau2 1000 0.3 0.1439 0.0859 0.0780 0.1538
Scenario2_nonlinear MCAR complete_case beta tau2 1000 0.3 0.2311 0.0453 0.0810 0.2317
Scenario2_nonlinear MCAR complete_case beta tau1 1000 0.3 0.2511 0.0453 0.0997 0.2518
Scenario2_nonlinear MCAR complete_case bernstein tau2 1000 0.3 0.1542 0.0847 0.1006 0.1638
Scenario2_nonlinear MCAR complete_case epanechnikov tau2 1000 0.3 0.1538 0.0894 0.1024 0.1681
Scenario2_nonlinear MCAR complete_case tricube tau3 1000 0.5 0.0301 0.0716 0.0105 0.0664
Scenario2_nonlinear MCAR complete_case epanechnikov tau3 1000 0.5 0.0306 0.0753 0.0121 0.0686
Scenario2_nonlinear MCAR complete_case bernstein tau3 1000 0.5 0.0470 0.0855 0.0152 0.0737
Scenario2_nonlinear MCAR complete_case gaussian tau3 1000 0.5 0.0451 0.0889 0.0162 0.0763
Scenario2_nonlinear MCAR complete_case beta tau3 1000 0.5 0.2360 0.0536 0.0842 0.2368
Scenario2_nonlinear MCAR complete_case gaussian tau2 1000 0.5 0.1543 0.0942 0.0864 0.1668
Scenario2_nonlinear MCAR complete_case beta tau2 1000 0.5 0.2482 0.0533 0.0956 0.2491
Scenario2_nonlinear MCAR complete_case bernstein tau2 1000 0.5 0.1678 0.0890 0.1037 0.1765
Scenario2_nonlinear MCAR complete_case epanechnikov tau2 1000 0.5 0.1714 0.0921 0.1193 0.1872
Scenario2_nonlinear MCAR complete_case beta tau1 1000 0.5 0.2718 0.0552 0.1205 0.2724
Scenario2_nonlinear MCAR complete_case gaussian tau3 2000 0 0.0264 0.0438 0.0041 0.0419
Scenario2_nonlinear MCAR complete_case bernstein tau3 2000 0 0.0169 0.0475 0.0044 0.0412
Scenario2_nonlinear MCAR complete_case tricube tau3 2000 0 0.0204 0.0451 0.0045 0.0417
Scenario2_nonlinear MCAR complete_case epanechnikov tau3 2000 0 0.0254 0.0473 0.0052 0.0447
Scenario2_nonlinear MCAR complete_case gaussian tau2 2000 0 0.1034 0.0632 0.0520 0.1108
Scenario2_nonlinear MCAR complete_case beta tau3 2000 0 0.1924 0.0267 0.0540 0.1927
Scenario2_nonlinear MCAR complete_case beta tau2 2000 0 0.2036 0.0290 0.0618 0.2039
Scenario2_nonlinear MCAR complete_case beta tau1 2000 0 0.2092 0.0280 0.0668 0.2095
Scenario2_nonlinear MCAR complete_case tricube tau2 2000 0 0.1265 0.0663 0.0837 0.1344
Scenario2_nonlinear MCAR complete_case epanechnikov tau2 2000 0 0.1315 0.0603 0.0868 0.1376
Scenario2_nonlinear MCAR complete_case epanechnikov tau3 2000 0.1 0.0234 0.0507 0.0060 0.0441
Scenario2_nonlinear MCAR complete_case tricube tau3 2000 0.1 0.0229 0.0566 0.0064 0.0473
Scenario2_nonlinear MCAR complete_case gaussian tau3 2000 0.1 0.0329 0.0559 0.0076 0.0474
Scenario2_nonlinear MCAR complete_case bernstein tau3 2000 0.1 0.0163 0.0672 0.0089 0.0484
Scenario2_nonlinear MCAR complete_case gaussian tau2 2000 0.1 0.1086 0.0655 0.0565 0.1164
Scenario2_nonlinear MCAR complete_case beta tau3 2000 0.1 0.1994 0.0293 0.0581 0.1997
Scenario2_nonlinear MCAR complete_case beta tau2 2000 0.1 0.2044 0.0291 0.0621 0.2047
Scenario2_nonlinear MCAR complete_case beta tau1 2000 0.1 0.2157 0.0304 0.0714 0.2161
Scenario2_nonlinear MCAR complete_case epanechnikov tau2 2000 0.1 0.1241 0.0659 0.0764 0.1317
Scenario2_nonlinear MCAR complete_case tricube tau2 2000 0.1 0.1296 0.0607 0.0874 0.1380
Scenario2_nonlinear MCAR complete_case bernstein tau3 2000 0.3 0.0192 0.0514 0.0048 0.0451
Scenario2_nonlinear MCAR complete_case epanechnikov tau3 2000 0.3 0.0290 0.0530 0.0064 0.0505
Scenario2_nonlinear MCAR complete_case tricube tau3 2000 0.3 0.0216 0.0564 0.0067 0.0480
Scenario2_nonlinear MCAR complete_case gaussian tau3 2000 0.3 0.0325 0.0614 0.0083 0.0526
Scenario2_nonlinear MCAR complete_case beta tau3 2000 0.3 0.2076 0.0351 0.0637 0.2079
Scenario2_nonlinear MCAR complete_case gaussian tau2 2000 0.3 0.1220 0.0708 0.0654 0.1288
Scenario2_nonlinear MCAR complete_case beta tau2 2000 0.3 0.2136 0.0327 0.0680 0.2139
Scenario2_nonlinear MCAR complete_case beta tau1 2000 0.3 0.2247 0.0343 0.0778 0.2250
Scenario2_nonlinear MCAR complete_case epanechnikov tau2 2000 0.3 0.1393 0.0686 0.0931 0.1467
Scenario2_nonlinear MCAR complete_case bernstein tau2 2000 0.3 0.1417 0.0709 0.0936 0.1511
Scenario2_nonlinear MCAR complete_case bernstein tau3 2000 0.5 0.0228 0.0559 0.0058 0.0512
Scenario2_nonlinear MCAR complete_case gaussian tau3 2000 0.5 0.0271 0.0596 0.0069 0.0516
Scenario2_nonlinear MCAR complete_case tricube tau3 2000 0.5 0.0234 0.0593 0.0071 0.0533
Scenario2_nonlinear MCAR complete_case epanechnikov tau3 2000 0.5 0.0316 0.0568 0.0075 0.0545
Scenario2_nonlinear MCAR complete_case gaussian tau2 2000 0.5 0.1265 0.0735 0.0655 0.1354
Scenario2_nonlinear MCAR complete_case beta tau3 2000 0.5 0.2194 0.0381 0.0718 0.2199
Scenario2_nonlinear MCAR complete_case beta tau2 2000 0.5 0.2247 0.0351 0.0762 0.2251
Scenario2_nonlinear MCAR complete_case beta tau1 2000 0.5 0.2421 0.0409 0.0923 0.2425
Scenario2_nonlinear MCAR complete_case epanechnikov tau2 2000 0.5 0.1471 0.0806 0.0997 0.1560
Scenario2_nonlinear MCAR complete_case bernstein tau2 2000 0.5 0.1494 0.0809 0.0997 0.1600
Scenario2_nonlinear MCAR ipw epanechnikov tau3 500 0 0.0347 0.0698 0.0099 0.0677
Scenario2_nonlinear MCAR ipw gaussian tau3 500 0 0.0417 0.0803 0.0128 0.0704
Scenario2_nonlinear MCAR ipw tricube tau3 500 0 0.0286 0.0840 0.0139 0.0728
Scenario2_nonlinear MCAR ipw bernstein tau3 500 0 0.0514 0.0858 0.0161 0.0761
Scenario2_nonlinear MCAR ipw gaussian tau2 500 0 0.1541 0.0916 0.0839 0.1654
Scenario2_nonlinear MCAR ipw beta tau3 500 0 0.2403 0.0499 0.0873 0.2410
Scenario2_nonlinear MCAR ipw beta tau2 500 0 0.2529 0.0509 0.0995 0.2536
Scenario2_nonlinear MCAR ipw bernstein tau2 500 0 0.1706 0.0962 0.1073 0.1797
Scenario2_nonlinear MCAR ipw tricube tau2 500 0 0.1702 0.0861 0.1164 0.1842
Scenario2_nonlinear MCAR ipw beta tau1 500 0 0.2685 0.0560 0.1164 0.2692
Scenario2_nonlinear MCAR ipw tricube tau3 500 0.1 0.0331 0.0764 0.0122 0.0709
Scenario2_nonlinear MCAR ipw epanechnikov tau3 500 0.1 0.0330 0.0777 0.0132 0.0702
Scenario2_nonlinear MCAR ipw bernstein tau3 500 0.1 0.0436 0.0816 0.0133 0.0730
Scenario2_nonlinear MCAR ipw gaussian tau3 500 0.1 0.0518 0.0930 0.0176 0.0812
Scenario2_nonlinear MCAR ipw beta tau3 500 0.1 0.2325 0.0529 0.0812 0.2336
Scenario2_nonlinear MCAR ipw gaussian tau2 500 0.1 0.1570 0.1010 0.0868 0.1688
Scenario2_nonlinear MCAR ipw beta tau2 500 0.1 0.2538 0.0579 0.1010 0.2549
Scenario2_nonlinear MCAR ipw bernstein tau2 500 0.1 0.1758 0.0946 0.1053 0.1835
Scenario2_nonlinear MCAR ipw beta tau1 500 0.1 0.2663 0.0547 0.1135 0.2670
Scenario2_nonlinear MCAR ipw epanechnikov tau2 500 0.1 0.1782 0.0891 0.1195 0.1897
Scenario2_nonlinear MCAR ipw tricube tau3 500 0.3 0.0291 0.0828 0.0129 0.0759
Scenario2_nonlinear MCAR ipw bernstein tau3 500 0.3 0.0546 0.0848 0.0152 0.0819
Scenario2_nonlinear MCAR ipw epanechnikov tau3 500 0.3 0.0356 0.0900 0.0161 0.0796
Scenario2_nonlinear MCAR ipw gaussian tau3 500 0.3 0.0478 0.0939 0.0163 0.0852
Scenario2_nonlinear MCAR ipw beta tau3 500 0.3 0.2452 0.0655 0.0931 0.2464
Scenario2_nonlinear MCAR ipw gaussian tau2 500 0.3 0.1731 0.1106 0.1064 0.1876
Scenario2_nonlinear MCAR ipw bernstein tau2 500 0.3 0.1868 0.0989 0.1095 0.1941
Scenario2_nonlinear MCAR ipw beta tau2 500 0.3 0.2709 0.0608 0.1152 0.2722
Scenario2_nonlinear MCAR ipw epanechnikov tau2 500 0.3 0.1909 0.1040 0.1314 0.2041
Scenario2_nonlinear MCAR ipw tricube tau2 500 0.3 0.1867 0.1016 0.1354 0.2045
Scenario2_nonlinear MCAR ipw epanechnikov tau3 500 0.5 0.0413 0.0895 0.0170 0.0878
Scenario2_nonlinear MCAR ipw tricube tau3 500 0.5 0.0326 0.0927 0.0173 0.0877
Scenario2_nonlinear MCAR ipw bernstein tau3 500 0.5 0.0684 0.1005 0.0210 0.0948
Scenario2_nonlinear MCAR ipw gaussian tau3 500 0.5 0.0547 0.1070 0.0221 0.0930
Scenario2_nonlinear MCAR ipw beta tau3 500 0.5 0.2549 0.0685 0.0999 0.2565
Scenario2_nonlinear MCAR ipw gaussian tau2 500 0.5 0.1834 0.1169 0.1113 0.1986
Scenario2_nonlinear MCAR ipw bernstein tau2 500 0.5 0.2100 0.0995 0.1191 0.2156
Scenario2_nonlinear MCAR ipw beta tau2 500 0.5 0.2940 0.0721 0.1400 0.2956
Scenario2_nonlinear MCAR ipw tricube tau2 500 0.5 0.2011 0.1084 0.1422 0.2203
Scenario2_nonlinear MCAR ipw epanechnikov tau2 500 0.5 0.2046 0.1072 0.1444 0.2221
Scenario2_nonlinear MCAR ipw tricube tau3 1000 0 0.0252 0.0568 0.0067 0.0540
Scenario2_nonlinear MCAR ipw bernstein tau3 1000 0 0.0255 0.0679 0.0091 0.0550
Scenario2_nonlinear MCAR ipw epanechnikov tau3 1000 0 0.0279 0.0662 0.0102 0.0571
Scenario2_nonlinear MCAR ipw gaussian tau3 1000 0 0.0398 0.0707 0.0110 0.0606
Scenario2_nonlinear MCAR ipw beta tau3 1000 0 0.2169 0.0419 0.0706 0.2175
Scenario2_nonlinear MCAR ipw beta tau2 1000 0 0.2245 0.0375 0.0761 0.2249
Scenario2_nonlinear MCAR ipw gaussian tau2 1000 0 0.1377 0.0781 0.0778 0.1474
Scenario2_nonlinear MCAR ipw beta tau1 1000 0 0.2363 0.0394 0.0876 0.2367
Scenario2_nonlinear MCAR ipw bernstein tau2 1000 0 0.1483 0.0766 0.0968 0.1578
Scenario2_nonlinear MCAR ipw epanechnikov tau2 1000 0 0.1483 0.0776 0.0975 0.1577
Scenario2_nonlinear MCAR ipw bernstein tau3 1000 0.1 0.0254 0.0625 0.0069 0.0542
Scenario2_nonlinear MCAR ipw gaussian tau3 1000 0.1 0.0400 0.0706 0.0104 0.0620
Scenario2_nonlinear MCAR ipw tricube tau3 1000 0.1 0.0210 0.0748 0.0104 0.0595
Scenario2_nonlinear MCAR ipw epanechnikov tau3 1000 0.1 0.0310 0.0745 0.0113 0.0604
Scenario2_nonlinear MCAR ipw beta tau3 1000 0.1 0.2208 0.0404 0.0729 0.2214
Scenario2_nonlinear MCAR ipw gaussian tau2 1000 0.1 0.1338 0.0787 0.0748 0.1431
Scenario2_nonlinear MCAR ipw beta tau2 1000 0.1 0.2281 0.0411 0.0787 0.2287
Scenario2_nonlinear MCAR ipw beta tau1 1000 0.1 0.2423 0.0459 0.0937 0.2428
Scenario2_nonlinear MCAR ipw bernstein tau2 1000 0.1 0.1488 0.0794 0.0956 0.1582
Scenario2_nonlinear MCAR ipw epanechnikov tau2 1000 0.1 0.1545 0.0836 0.1028 0.1634
Scenario2_nonlinear MCAR ipw tricube tau3 1000 0.3 0.0252 0.0646 0.0085 0.0594
Scenario2_nonlinear MCAR ipw bernstein tau3 1000 0.3 0.0270 0.0726 0.0096 0.0611
Scenario2_nonlinear MCAR ipw epanechnikov tau3 1000 0.3 0.0322 0.0685 0.0101 0.0636
Scenario2_nonlinear MCAR ipw gaussian tau3 1000 0.3 0.0453 0.0832 0.0150 0.0708
Scenario2_nonlinear MCAR ipw beta tau3 1000 0.3 0.2253 0.0435 0.0761 0.2258
Scenario2_nonlinear MCAR ipw gaussian tau2 1000 0.3 0.1447 0.0857 0.0788 0.1541
Scenario2_nonlinear MCAR ipw beta tau2 1000 0.3 0.2408 0.0441 0.0882 0.2414
Scenario2_nonlinear MCAR ipw beta tau1 1000 0.3 0.2533 0.0430 0.1013 0.2538
Scenario2_nonlinear MCAR ipw bernstein tau2 1000 0.3 0.1614 0.0831 0.1043 0.1704
Scenario2_nonlinear MCAR ipw epanechnikov tau2 1000 0.3 0.1656 0.0843 0.1112 0.1756
Scenario2_nonlinear MCAR ipw epanechnikov tau3 1000 0.5 0.0280 0.0676 0.0095 0.0663
Scenario2_nonlinear MCAR ipw tricube tau3 1000 0.5 0.0316 0.0707 0.0106 0.0680
Scenario2_nonlinear MCAR ipw gaussian tau3 1000 0.5 0.0448 0.0758 0.0115 0.0711
Scenario2_nonlinear MCAR ipw bernstein tau3 1000 0.5 0.0429 0.0842 0.0140 0.0718
Scenario2_nonlinear MCAR ipw beta tau3 1000 0.5 0.2326 0.0534 0.0825 0.2335
Scenario2_nonlinear MCAR ipw gaussian tau2 1000 0.5 0.1546 0.0918 0.0853 0.1659
Scenario2_nonlinear MCAR ipw beta tau2 1000 0.5 0.2506 0.0532 0.0968 0.2515
Scenario2_nonlinear MCAR ipw bernstein tau2 1000 0.5 0.1702 0.0885 0.1026 0.1800
Scenario2_nonlinear MCAR ipw beta tau1 1000 0.5 0.2613 0.0505 0.1093 0.2621
Scenario2_nonlinear MCAR ipw epanechnikov tau2 1000 0.5 0.1756 0.0953 0.1172 0.1869
Scenario2_nonlinear MCAR ipw tricube tau3 2000 0 0.0203 0.0415 0.0037 0.0409
Scenario2_nonlinear MCAR ipw epanechnikov tau3 2000 0 0.0265 0.0457 0.0052 0.0442
Scenario2_nonlinear MCAR ipw gaussian tau3 2000 0 0.0264 0.0508 0.0055 0.0448
Scenario2_nonlinear MCAR ipw bernstein tau3 2000 0 0.0143 0.0582 0.0064 0.0438
Scenario2_nonlinear MCAR ipw gaussian tau2 2000 0 0.1058 0.0642 0.0523 0.1124
Scenario2_nonlinear MCAR ipw beta tau3 2000 0 0.1946 0.0294 0.0555 0.1948
Scenario2_nonlinear MCAR ipw beta tau2 2000 0 0.2047 0.0304 0.0629 0.2050
Scenario2_nonlinear MCAR ipw beta tau1 2000 0 0.2118 0.0285 0.0685 0.2121
Scenario2_nonlinear MCAR ipw epanechnikov tau2 2000 0 0.1252 0.0651 0.0804 0.1313
Scenario2_nonlinear MCAR ipw tricube tau2 2000 0 0.1277 0.0646 0.0847 0.1360
Scenario2_nonlinear MCAR ipw bernstein tau3 2000 0.1 0.0156 0.0492 0.0044 0.0411
Scenario2_nonlinear MCAR ipw tricube tau3 2000 0.1 0.0220 0.0506 0.0057 0.0456
Scenario2_nonlinear MCAR ipw epanechnikov tau3 2000 0.1 0.0273 0.0496 0.0063 0.0465
Scenario2_nonlinear MCAR ipw gaussian tau3 2000 0.1 0.0292 0.0551 0.0067 0.0462
Scenario2_nonlinear MCAR ipw gaussian tau2 2000 0.1 0.1058 0.0649 0.0524 0.1142
Scenario2_nonlinear MCAR ipw beta tau3 2000 0.1 0.2015 0.0290 0.0598 0.2018
Scenario2_nonlinear MCAR ipw beta tau2 2000 0.1 0.2065 0.0296 0.0635 0.2068
Scenario2_nonlinear MCAR ipw beta tau1 2000 0.1 0.2117 0.0291 0.0676 0.2120
Scenario2_nonlinear MCAR ipw epanechnikov tau2 2000 0.1 0.1313 0.0680 0.0847 0.1380
Scenario2_nonlinear MCAR ipw tricube tau2 2000 0.1 0.1301 0.0660 0.0870 0.1391
Scenario2_nonlinear MCAR ipw bernstein tau3 2000 0.3 0.0176 0.0576 0.0061 0.0474
Scenario2_nonlinear MCAR ipw epanechnikov tau3 2000 0.3 0.0266 0.0526 0.0062 0.0492
Scenario2_nonlinear MCAR ipw tricube tau3 2000 0.3 0.0216 0.0579 0.0073 0.0492
Scenario2_nonlinear MCAR ipw gaussian tau3 2000 0.3 0.0371 0.0671 0.0109 0.0558
Scenario2_nonlinear MCAR ipw beta tau3 2000 0.3 0.2055 0.0348 0.0624 0.2059
Scenario2_nonlinear MCAR ipw gaussian tau2 2000 0.3 0.1243 0.0693 0.0670 0.1316
Scenario2_nonlinear MCAR ipw beta tau2 2000 0.3 0.2168 0.0343 0.0711 0.2172
Scenario2_nonlinear MCAR ipw beta tau1 2000 0.3 0.2217 0.0299 0.0753 0.2220
Scenario2_nonlinear MCAR ipw bernstein tau2 2000 0.3 0.1391 0.0724 0.0928 0.1500
Scenario2_nonlinear MCAR ipw epanechnikov tau2 2000 0.3 0.1408 0.0666 0.0949 0.1482
Scenario2_nonlinear MCAR ipw gaussian tau3 2000 0.5 0.0311 0.0602 0.0070 0.0537
Scenario2_nonlinear MCAR ipw tricube tau3 2000 0.5 0.0223 0.0673 0.0091 0.0559
Scenario2_nonlinear MCAR ipw bernstein tau3 2000 0.5 0.0258 0.0715 0.0100 0.0558
Scenario2_nonlinear MCAR ipw epanechnikov tau3 2000 0.5 0.0306 0.0694 0.0105 0.0567
Scenario2_nonlinear MCAR ipw gaussian tau2 2000 0.5 0.1234 0.0725 0.0625 0.1325
Scenario2_nonlinear MCAR ipw beta tau3 2000 0.5 0.2200 0.0430 0.0731 0.2205
Scenario2_nonlinear MCAR ipw beta tau2 2000 0.5 0.2274 0.0360 0.0777 0.2278
Scenario2_nonlinear MCAR ipw bernstein tau2 2000 0.5 0.1460 0.0744 0.0881 0.1543
Scenario2_nonlinear MCAR ipw beta tau1 2000 0.5 0.2383 0.0393 0.0888 0.2386
Scenario2_nonlinear MCAR ipw epanechnikov tau2 2000 0.5 0.1524 0.0798 0.1011 0.1623

References

  1. Halmos, P.R. The theory of unbiased estimation. Ann. Math. Stat. 1946, 17, 34–43. [Google Scholar] [CrossRef]
  2. Hoeffding, W. A class of statistics with asymptotically normal distribution. Ann. Math. Stat. 1948, 19, 293–325. [Google Scholar] [CrossRef]
  3. von Mises, R. On the asymptotic distribution of differentiable statistical functions. Ann. Math. Stat. 1947, 18, 309–348. [Google Scholar] [CrossRef]
  4. Dynkin, E.B.; Mandelbaum, A. Symmetric statistics, Poisson point processes, and multiple Wiener integrals. Ann. Statist. 1983, 11, 739–745. [Google Scholar] [CrossRef]
  5. Bretagnolle, J. Lois limites du bootstrap de certaines fonctionnelles. Ann. Inst. H. Poincaré Sect. B (N.S.) 1983, 19, 281–296. [Google Scholar]
  6. Rubin, H.; Vitale, R.A. Asymptotic distribution of symmetric statistics. Ann. Statist. 1980, 8, 165–170. [Google Scholar] [CrossRef]
  7. Filippova, A.A. Mises theorem on the limit behaviour of functionals derived from empirical distribution functions. Dokl. Akad. Nauk SSSR 1959, 129, 44–47. [Google Scholar]
  8. Denker, M.; Keller, G. On U-statistics and v. Mises’ statistics for weakly dependent processes. Z. Wahrsch. Verw. Gebiete 1983, 64, 505–522. [Google Scholar] [CrossRef]
  9. Borovkova, S.; Burton, R.; Dehling, H. Limit theorems for functionals of mixing processes with applications to U-statistics and dimension estimation. Trans. Amer. Math. Soc. 2001, 353, 4261–4318. [Google Scholar] [CrossRef]
  10. Leucht, A. Degenerate U- and V-statistics under weak dependence: Asymptotic theory and bootstrap consistency. Bernoulli 2012, 18, 552–585. [Google Scholar] [CrossRef]
  11. Leucht, A.; Neumann, M.H. Degenerate U- and V-statistics under ergodicity: Asymptotics, bootstrap and applications in statistics. Ann. Inst. Statist. Math. 2013, 65, 349–386. [Google Scholar] [CrossRef]
  12. Lee, A.J. U-Statistics; Volume 110, Statistics: Textbooks and Monographs; Theory and Practice; Marcel Dekker, Inc.: New York, NY, USA, 1990; p. xii+302. [Google Scholar]
  13. Arcones, M.A.; Giné, E. Limit theorems for U-processes. Ann. Probab. 1993, 21, 1494–1542. [Google Scholar] [CrossRef]
  14. Koroljuk, V.S.; Borovskich, Y.V. Theory of U-Statistics; Volume 273, Mathematics and Its Applications; Translated from the 1989 Russian Original by P. V. Malyshev and D. V. Malyshev and Revised by the Authors; Kluwer Academic Publishers Group: Dordrecht, The Netherlands, 1994; p. x+552. [Google Scholar] [CrossRef]
  15. Arcones, M.A.; Chen, Z.; Giné, E. Estimators related to U-processes with applications to multivariate medians: Asymptotic normality. Ann. Statist. 1994, 22, 1460–1477. [Google Scholar] [CrossRef]
  16. Arcones, M.A.; Giné, E. On the law of the iterated logarithm for canonical U-statistics and processes. Stochastic Process. Appl. 1995, 58, 217–245. [Google Scholar] [CrossRef]
  17. Borovskikh, Y.V. U-Statistics in Banach Spaces; VSP: Utrecht, The Netherlands, 1996; p. xii+420. [Google Scholar]
  18. Lehmann, E.L. Elements of Large-Sample Theory; Springer Texts in Statistics; Springer: New York, NY, USA, 1999; p. xii+631. [Google Scholar] [CrossRef]
  19. de la Peña, V.H.; Giné, E. Decoupling; From Dependence to Independence, Randomly Stopped Processes. U-Statistics and Processes. Martingales and Beyond; Probability and Its Applications (New York); Springer: New York, NY, USA, 1999; p. xvi+392. [Google Scholar] [CrossRef]
  20. Bouzebda, S.; Nezzal, A. Asymptotic properties of conditional U-statistics using delta sequences. Comm. Statist. Theory Methods 2024, 53, 4602–4657. [Google Scholar] [CrossRef]
  21. Bouzebda, S.; Nezzal, A. Uniform in number of neighbors consistency and weak convergence of kNN empirical conditional processes and kNN conditional U-processes involving functional mixing data. AIMS Math. 2024, 9, 4427–4550. [Google Scholar] [CrossRef]
  22. Bouzebda, S.; Soukarieh, I. Limit theorems for a class of processes generalizing the U-empirical process. Stochastics 2024, 96, 799–845. [Google Scholar] [CrossRef]
  23. Soukarieh, I.; Bouzebda, S. Renewal type bootstrap for increasing degree U-process of a Markov chain. J. Multivariate Anal. 2023, 195, 105143. [Google Scholar] [CrossRef]
  24. Stute, W. Almost sure representations of the product-limit estimator for truncated data. Ann. Statist. 1993, 21, 146–156. [Google Scholar] [CrossRef]
  25. Arcones, M.A.; Wang, Y. Some new tests for normality based on U-processes. Stat. Probab. Lett. 2006, 76, 69–82. [Google Scholar] [CrossRef]
  26. Schick, A.; Wang, Y.; Wefelmeyer, W. Tests for normality based on density estimators of convolutions. Stat. Probab. Lett. 2011, 81, 337–343. [Google Scholar] [CrossRef]
  27. Giné, E.; Mason, D.M. Laws of the iterated logarithm for the local U-statistic process. J. Theoret. Probab. 2007, 20, 457–485. [Google Scholar] [CrossRef]
  28. Giné, E.; Mason, D.M. On local U-statistic processes and the estimation of densities of functions of several sample variables. Ann. Statist. 2007, 35, 1105–1145. [Google Scholar] [CrossRef]
  29. Joly, E.; Lugosi, G. Robust estimation of U-statistics. Stochastic Process. Appl. 2016, 126, 3760–3773. [Google Scholar] [CrossRef]
  30. Lee, S.; Linton, O.; Whang, Y.J. Testing for stochastic monotonicity. Econometrica 2009, 77, 585–602. [Google Scholar] [CrossRef]
  31. Ghosal, S.; Sen, A.; van der Vaart, A.W. Testing monotonicity of regression. Ann. Statist. 2000, 28, 1054–1082. [Google Scholar] [CrossRef]
  32. Abrevaya, J.; Jiang, W. A nonparametric approach to measuring and testing curvature. J. Bus. Econom. Statist. 2005, 23, 1–19. [Google Scholar] [CrossRef]
  33. Sherman, R.P. The limiting distribution of the maximum rank correlation estimator. Econometrica 1993, 61, 123–137. [Google Scholar] [CrossRef]
  34. Sherman, R.P. Maximal inequalities for degenerate U-processes with applications to optimization estimators. Ann. Statist. 1994, 22, 439–459. [Google Scholar] [CrossRef]
  35. Bouzebda, S.; Ferfache, A.A. Asymptotic properties of semiparametric M-estimators with multiple change points. Physica A 2023, 609, 128363. [Google Scholar] [CrossRef]
  36. Janson, S. A functional limit theorem for random graphs with applications to subgraph count statistics. Random Struct. Algorithms 1990, 1, 15–37. [Google Scholar] [CrossRef]
  37. Clémençon, S.; Colin, I.; Bellet, A. Scaling-up empirical risk minimization: Optimization of incomplete U-statistics. J. Mach. Learn. Res. 2016, 17, 76. [Google Scholar]
  38. Clémençon, S.; Lugosi, G.; Vayatis, N. Ranking and empirical minimization of U-statistics. Ann. Statist. 2008, 36, 844–874. [Google Scholar] [CrossRef]
  39. Cao, Q.; Guo, Z.C.; Ying, Y. Generalization bounds for metric and similarity learning. Mach. Learn. 2016, 102, 115–132. [Google Scholar] [CrossRef]
  40. Faivishevsky, L.; Goldberger, J. ICA based on a Smooth Estimation of the Differential Entropy. In Proceedings of the Advances in Neural Information Processing Systems; Koller, D., Schuurmans, D., Bengio, Y., Bottou, L., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2009; Volume 21. [Google Scholar]
  41. Liu, Q.; Lee, J.; Jordan, M. A Kernelized Stein Discrepancy for Goodness-of-fit Tests. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; Balcan, M.F., Weinberger, K.Q., Eds.; Proceedings of Machine Learning Research: New York, NY, USA, 2016; Volume 48, pp. 276–284. [Google Scholar]
  42. Cybis, G.B.; Valk, M.; Lopes, S.R.C. Clustering and classification problems in genetics through U-statistics. J. Stat. Comput. Simul. 2018, 88, 1882–1902. [Google Scholar] [CrossRef]
  43. Lim, F.; Stojanovic, V.M. On U-statistics and compressed sensing I: Non-asymptotic average-case analysis. IEEE Trans. Signal Process. 2013, 61, 2473–2485. [Google Scholar] [CrossRef]
  44. Bello, D.Z.; Valk, M.; Cybis, G.B. Towards U-statistics clustering inference for multiple groups. J. Stat. Comput. Simul. 2024, 94, 204–222. [Google Scholar] [CrossRef]
  45. Kim, I.; Ramdas, A. Dimension-agnostic inference using cross U-statistics. Bernoulli 2024, 30, 683–711. [Google Scholar] [CrossRef]
  46. Chen, L.; Wan, A.T.K.; Zhang, S.; Zhou, Y. Distributed algorithms for U-statistics-based empirical risk minimization. J. Mach. Learn. Res. 2023, 24, 263. [Google Scholar]
  47. Li, H.; Ren, C.; Li, L. U-processes and preference learning. Neural Comput. 2014, 26, 2896–2924. [Google Scholar] [CrossRef]
  48. Janson, S. Asymptotic normality for m-dependent and constrained U-statistics, with applications to pattern matching in random strings and permutations. Adv. Appl. Probab. 2023, 55, 841–894. [Google Scholar] [CrossRef]
  49. Sudheesh, K.K.; Anjana, S.; Xie, M. U-statistics for left truncated and right censored data. Statistics 2023, 57, 900–917. [Google Scholar] [CrossRef]
  50. Le Minh, T. U-statistics on bipartite exchangeable networks. ESAIM Probab. Stat. 2023, 27, 576–620. [Google Scholar] [CrossRef]
  51. Huang, B.; Liu, Y.; Peng, L. Distributed inference for two-sample U-statistics in massive data analysis. Scand. J. Stat. 2023, 50, 1090–1115. [Google Scholar] [CrossRef]
  52. Ghannadpour, S.S.; Kalkhoran, S.E.; Jalili, H.; Behifar, M. Delineation of mineral potential zone using U-statistic method in processing satellite remote sensing images. Int. J. Min. Geo-Eng. 2023, 57, 445–453. [Google Scholar]
  53. Cintra, R.F.; Valk, M.; Marcondes Filho, D. A model-free-based control chart for batch process using U-statistics. J. Process Control 2023, 132, 103097. [Google Scholar] [CrossRef]
  54. Frees, E.W. Infinite order U-statistics. Scand. J. Statist. 1989, 16, 29–45. [Google Scholar]
  55. Heilig, C.; Nolan, D. Limit theorems for the infinite-degree U-process. Statist. Sinica 2001, 11, 289–302. [Google Scholar]
  56. Song, Y.; Chen, X.; Kato, K. Approximating high-dimensional infinite-order U-statistics: Statistical and computational guarantees. Electron. J. Stat. 2019, 13, 4794–4848. [Google Scholar] [CrossRef]
  57. Peng, W.; Coleman, T.; Mentch, L. Rates of convergence for random forests via generalized U-statistics. Electron. J. Stat. 2022, 16, 232–292. [Google Scholar] [CrossRef]
  58. Randles, R.H. On the asymptotic normality of statistics with estimated parameters. Ann. Statist. 1982, 10, 462–474. [Google Scholar] [CrossRef]
  59. Desgagné, A.; Genest, C.; Ouimet, F. Asymptotics for non-degenerate multivariate U-statistics with estimated nuisance parameters under the null and local alternative hypotheses. arXiv 2024, arXiv:2401.11272. [Google Scholar] [CrossRef]
  60. Stute, W. Conditional U-statistics. Ann. Probab. 1991, 19, 812–825. [Google Scholar] [CrossRef]
  61. Nadaraja, E.A. On a regression estimate. Teor. Verojatnost. Primenen. 1964, 9, 157–159. [Google Scholar]
  62. Watson, G.S. Smooth regression analysis. Sankhyā Ser. A 1964, 26, 359–372. [Google Scholar]
  63. Sen, A. Uniform strong consistency rates for conditional U-statistics. Sankhyā Ser. A 1994, 56, 179–194. [Google Scholar]
  64. Prakasa Rao, B.L.S.; Sen, A. Limit distributions of conditional U-statistics. J. Theoret. Probab. 1995, 8, 261–301. [Google Scholar] [CrossRef]
  65. Harel, M.; Puri, M.L. Conditional U-statistics for dependent random variables. J. Multivariate Anal. 1996, 57, 84–100. [Google Scholar] [CrossRef]
  66. Stute, W. Symmetrized NN-conditional U-statistics. In Research Developments in Probability and Statistics; VSP: Utrecht, The Netherlands, 1996; pp. 231–237. [Google Scholar]
  67. Dony, J.; Mason, D.M. Uniform in bandwidth consistency of conditional U-statistics. Bernoulli 2008, 14, 1108–1133. [Google Scholar] [CrossRef]
  68. Bouzebda, S. On the weak convergence and the uniform-in-bandwidth consistency of the general conditional U-processes based on the copula representation: Multivariate setting. Hacet. J. Math. Stat. 2023, 52, 1303–1348. [Google Scholar] [CrossRef]
  69. Bouzebda, S. Weak convergence of the conditional single index U-statistics for locally stationary functional time series. AIMS Math. 2024, 9, 14807–14898. [Google Scholar] [CrossRef]
  70. Bouzebda, S. Limit Theorems in the Nonparametric Conditional Single-Index U-Processes for Locally Stationary Functional Random Fields under Stochastic Sampling Design. Mathematics 2024, 12, 1996. [Google Scholar] [CrossRef]
  71. Bouzebda, S. Limit theorems for wavelet conditional U-statistics for time series models. Math. Methods Statist. 2025, 34, 181–224. [Google Scholar] [CrossRef]
  72. Arcones, M.A. The Bahadur-Kiefer representation for U-quantiles. Ann. Statist. 1996, 24, 1400–1422. [Google Scholar] [CrossRef]
  73. Arcones, M.A. The asymptotic accuracy of the bootstrap of U-quantiles. Ann. Statist. 1995, 23, 1802–1822. [Google Scholar] [CrossRef]
  74. Helmers, R.; Hušková, M. Bootstrapping multivariate U-quantiles and related statistics. J. Multivariate Anal. 1994, 49, 97–109. [Google Scholar] [CrossRef]
  75. Zhou, W. Generalized Spatial u-Quantiles: Theory and Applications. Ph.D. Thesis, The University of Texas at Dallas, Richardson, TX, USA, 2005. [Google Scholar]
  76. Zhou, W.; Serfling, R. Generalized multivariate rank type test statistics via spatial U-quantiles. Stat. Probab. Lett. 2008, 78, 376–383. [Google Scholar] [CrossRef]
  77. Zhou, W.; Serfling, R. Multivariate spatial U-quantiles: A Bahadur-Kiefer representation, a Theil-Sen estimator for multiple regression, and a robust dispersion estimator. J. Statist. Plann. Inference 2008, 138, 1660–1678. [Google Scholar] [CrossRef]
  78. Dehling, H.; Fried, R. Asymptotic distribution of two-sample empirical U-quantiles with applications to robust tests for shifts in location. J. Multivariate Anal. 2012, 105, 124–140. [Google Scholar] [CrossRef]
  79. Wendler, M. U-processes, U-quantile processes and generalized linear statistics of dependent data. Stochastic Process. Appl. 2012, 122, 787–807. [Google Scholar] [CrossRef]
  80. Vogel, D.; Wendler, M. Studentized U-quantile processes under dependence with applications to change-point analysis. Bernoulli 2017, 23, 3114–3144. [Google Scholar] [CrossRef]
  81. Bouzebda, S.; Limnios, N. On general bootstrap of empirical estimator of a semi-Markov kernel with applications. J. Multivariate Anal. 2013, 116, 52–62. [Google Scholar] [CrossRef]
  82. Bouzebda, S.; Cherfi, M. General bootstrap for dual ϕ-divergence estimates. J. Probab. Stat. 2012, 2012, 834107. [Google Scholar] [CrossRef]
  83. Wertz, W. Statistical Density Estimation: A Survey; Volume 13, Angewandte Statistik und Ökonometrie [Applied Statistics and Econometrics]; With German and French summaries; Vandenhoeck & Ruprecht: Göttingen, Germany, 1978; p. 108. [Google Scholar]
  84. Wand, M.P.; Jones, M.C. Kernel Smoothing; Volume 60, Monographs on Statistics and Applied Probability; Chapman and Hall, Ltd.: London, UK, 1995; p. xii+212. [Google Scholar] [CrossRef]
  85. Tapia, R.A.; Thompson, J.R. Nonparametric Probability Density Estimation; Volume 1, Johns Hopkins Series in the Mathematical Sciences; Johns Hopkins University Press: Baltimore, MD, USA, 1978; p. xi+176. [Google Scholar]
  86. Prakasa Rao, B.L.S. Nonparametric Functional Estimation; Probability and Mathematical Statistics; Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers]: New York, NY, USA, 1983; p. xiv+522. [Google Scholar]
  87. Roussas, G.G. Estimation of transition distribution function and its quantiles in Markov processes: Strong consistency and asymptotic normality. In Nonparametric Functional Estimation and Related Topics (Spetses, 1990); Volume 335, NATO Adv. Sci. Inst. Ser. C: Math. Phys. Sci.; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1991; pp. 443–462. [Google Scholar]
  88. Nadaraya, E.A. Nonparametric Estimation of Probability Densities and Regression Curves; Volume 20, Mathematics and Its Applications (Soviet Series); Translated from the Russian by Samuel Kotz; Kluwer Academic Publishers Group: Dordrecht, The Netherlands, 1989; p. x+213. [Google Scholar] [CrossRef]
  89. Müller, H.G. Nonparametric Regression Analysis of Longitudinal Data; Volume 46, Lecture Notes in Statistics; Springer: New York, NY, USA, 1988; p. vi+199. [Google Scholar] [CrossRef]
  90. Härdle, W. Applied Nonparametric Regression; Volume 19, Econometric Society Monographs; Cambridge University Press: Cambridge, UK, 1990; p. xvi+333. [Google Scholar] [CrossRef]
  91. Eggermont, P.P.B.; LaRiccia, V.N. Maximum Penalized Likelihood Estimation. Vol. I; Density Estimation; Springer Series in Statistics; Springer: New York, NY, USA, 2001; p. xviii+510. [Google Scholar]
  92. Devroye, L. A Course in Density Estimation; Volume 14, Progress in Probability and Statistics; Birkhäuser Boston, Inc.: Boston, MA, USA, 1987; p. xx+183. [Google Scholar]
  93. Scott, D.W. Multivariate Density Estimation; Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics; Theory, Practice, and Visualization, A Wiley-Interscience Publication; John Wiley & Sons, Inc.: New York, NY, USA, 1992; p. xiv+317. [Google Scholar] [CrossRef]
  94. Silverman, B.W. Density Estimation for Statistics and Data Analysis; Monographs on Statistics and Applied Probability; Chapman & Hall: London, UK, 1986; p. x+175. [Google Scholar] [CrossRef]
  95. Müller, H.G. Smooth optimum kernel estimators near endpoints. Biometrika 1991, 78, 521–530. [Google Scholar] [CrossRef]
  96. Jones, M.C. Corrigendum: “Variable kernel density estimates and variable kernel density estimates” [Austral. J. Statist. 32 (1990), no. 3, 361–371]. Austral. J. Statist. 1991, 33, 119. [Google Scholar] [CrossRef]
  97. Funke, B.; Hirukawa, M. Density derivative estimation using asymmetric kernels. J. Nonparametr. Stat. 2024, 36, 994–1017. [Google Scholar] [CrossRef]
  98. Chen, S.X. Beta kernel estimators for density functions. Comput. Statist. Data Anal. 1999, 31, 131–145. [Google Scholar] [CrossRef]
  99. Chen, S.X. Beta kernel smoothers for regression curves. Statist. Sinica 2000, 10, 73–91. [Google Scholar]
  100. Bouezmarni, T.; Rolin, J.M. Consistency of the beta kernel density function estimator. Canad. J. Statist. 2003, 31, 89–98. [Google Scholar] [CrossRef]
  101. Zhang, S.; Karunamuni, R.J. Boundary performance of the beta kernel estimators. J. Nonparametr. Stat. 2010, 22, 81–104. [Google Scholar] [CrossRef]
  102. Bertin, K.; Klutchnikoff, N. Minimax properties of beta kernel estimators. J. Statist. Plann. Inference 2011, 141, 2287–2297. [Google Scholar] [CrossRef]
  103. Bertin, K.; Genest, C.; Klutchnikoff, N.; Ouimet, F. Minimax properties of Dirichlet kernel density estimators. J. Multivariate Anal. 2023, 195, 105158. [Google Scholar] [CrossRef]
  104. Igarashi, G. Bias reductions for beta kernel estimation. J. Nonparametr. Stat. 2016, 28, 1–30. [Google Scholar] [CrossRef]
  105. Hirukawa, M. Asymmetric Kernel Smoothing; Theory and Applications in Economics and Finance, JSS Research Series in Statistics; SpringerBriefs in Statistics; Springer: Singapore, 2018; p. xii+110. [Google Scholar] [CrossRef]
  106. Kristensen, D. Uniform convergence rates of kernel estimators with heterogeneous dependent data. Econom. Theory 2009, 25, 1433–1445. [Google Scholar] [CrossRef]
  107. Yin, X.F.; Hao, Z.F. Adaptive Kernel Density Estimation using Beta Kernel. In Proceedings of the 2007 International Conference on Machine Learning and Cybernetics, Hong Kong, China, 19–22 August 2007; Volume 6, pp. 3293–3297. [Google Scholar] [CrossRef]
  108. Igarashi, G.; Kakizawa, Y. Limiting bias-reduced Amoroso kernel density estimators for non-negative data. Comm. Statist. Theory Methods 2018, 47, 4905–4937. [Google Scholar] [CrossRef]
  109. Charpentier, A.; Oulidi, A. Beta kernel quantile estimators of heavy-tailed loss distributions. Stat. Comput. 2010, 20, 35–55. [Google Scholar] [CrossRef]
  110. Brown, B.M.; Chen, S.X. Beta-Bernstein smoothing for regression curves with compact support. Scand. J. Statist. 1999, 26, 47–59. [Google Scholar] [CrossRef]
  111. Chen, S.X. Probability density function estimation using gamma kernels. Ann. Inst. Statist. Math. 2000, 52, 471–480. [Google Scholar] [CrossRef]
  112. Bouezmarni, T.; Rombouts, J.V.K. Nonparametric density estimation for multivariate bounded data. J. Statist. Plann. Inference 2010, 140, 139–152. [Google Scholar] [CrossRef]
  113. Aitchison, J.; Lauder, I.J. Kernel density estimation for compositional data. J. R. Stat. Soc. Ser. C 1985, 34, 129–137. [Google Scholar] [CrossRef]
  114. Vitale, R.A. Bernstein polynomial approach to density function estimation. In Proceedings of the Statistical Inference and Related Topics (Proc. Summer Res. Inst. Statist. Inference for Stochastic Processes, Indiana Univ., Bloomington, Ind., 1974, Vol. 2; Dedicated to Z. W. Birnbaum); Academic Press: New York, NY, USA; London, UK, 1975; pp. 87–99. [Google Scholar]
  115. Stadmüller, U. Asymptotic distributions of smoothed histograms. Metrika 1983, 30, 145–158. [Google Scholar] [CrossRef]
  116. Gawronski, W. Strong laws for density estimators of Bernstein type. Period. Math. Hungar. 1985, 16, 23–43. [Google Scholar] [CrossRef]
  117. Gawronski, W.; Stadtmüller, U. On density estimation by means of Poisson’s distribution. Scand. J. Statist. 1980, 7, 90–94. [Google Scholar]
  118. Tenbusch, A. Two-dimensional Bernstein polynomial density estimators. Metrika 1994, 41, 233–253. [Google Scholar] [CrossRef]
  119. Tenbusch, A. Nonparametric curve estimation with Bernstein estimates. Metrika 1997, 45, 1–30. [Google Scholar] [CrossRef]
  120. Babu, G.J.; Canty, A.J.; Chaubey, Y.P. Application of Bernstein polynomials for smooth estimation of a distribution and density function. J. Statist. Plann. Inference 2002, 105, 377–392. [Google Scholar] [CrossRef]
  121. Kakizawa, Y. Bernstein polynomial probability density estimation. J. Nonparametr. Stat. 2004, 16, 709–729. [Google Scholar] [CrossRef]
  122. Prakasa Rao, B.L.S. Estimation of distribution and density functions by generalized Bernstein polynomials. Indian J. Pure Appl. Math. 2005, 36, 63–88. [Google Scholar]
  123. Babu, G.J.; Chaubey, Y.P. Smooth estimation of a distribution and density function on a hypercube using Bernstein polynomials for dependent random vectors. Stat. Probab. Lett. 2006, 76, 959–969. [Google Scholar] [CrossRef]
  124. Leblanc, A. On estimating distribution functions using Bernstein polynomials. Ann. Inst. Statist. Math. 2012, 64, 919–943. [Google Scholar] [CrossRef]
  125. Belalia, M.; Bouezmarni, T.; Lemyre, F.C.; Taamouti, A. Testing independence based on Bernstein empirical copula and copula density. J. Nonparametr. Stat. 2017, 29, 346–380. [Google Scholar] [CrossRef]
  126. Wang, L.; Lu, D. Application of Bernstein polynomials on estimating a distribution and density function in a triangular array. Methodol. Comput. Appl. Probab. 2023, 25, 56. [Google Scholar] [CrossRef]
  127. Sancetta, A.; Satchell, S. The Bernstein copula and its applications to modeling and approximations of multivariate distributions. Econom. Theory 2004, 20, 535–562. [Google Scholar] [CrossRef]
  128. Abrams, S.; Janssen, P.; Swanepoel, J.; Veraverbeke, N. Nonparametric estimation of risk ratios for bivariate data. J. Nonparametr. Stat. 2022, 34, 940–963. [Google Scholar] [CrossRef]
  129. Ouimet, F. Asymptotic properties of Bernstein estimators on the simplex. J. Multivariate Anal. 2021, 185, 104784. [Google Scholar] [CrossRef]
  130. Leblanc, A. On the boundary properties of Bernstein polynomial estimators of density and distribution functions. J. Statist. Plann. Inference 2012, 142, 2762–2778. [Google Scholar] [CrossRef]
  131. Ouimet, F. On the boundary properties of Bernstein estimators on the simplex. Open Stat. 2022, 3, 48–62. [Google Scholar] [CrossRef]
  132. Bouezmarni, T.; Rolin, J.M. Bernstein estimator for unbounded density function. J. Nonparametr. Stat. 2007, 19, 145–161. [Google Scholar] [CrossRef]
  133. Leblanc, A. A bias-reduced approach to density estimation using Bernstein polynomials. J. Nonparametr. Stat. 2010, 22, 459–475. [Google Scholar] [CrossRef]
  134. Belalia, M. On the asymptotic properties of the Bernstein estimator of the multivariate distribution function. Stat. Probab. Lett. 2016, 110, 249–256. [Google Scholar] [CrossRef]
  135. Liu, B.; Ghosh, S.K. On empirical estimation of mode based on weakly dependent samples. Comput. Statist. Data Anal. 2020, 152, 107046. [Google Scholar] [CrossRef]
  136. Lu, D.; Wang, L.; Yang, J. The stochastic convergence of Bernstein polynomial estimators in a triangular array. J. Nonparametr. Stat. 2022, 34, 987–1014. [Google Scholar] [CrossRef]
  137. Kakizawa, Y. Recursive asymmetric kernel density estimation for nonnegative data. J. Nonparametr. Stat. 2021, 33, 197–224. [Google Scholar] [CrossRef]
  138. Rattihalli, R.N.; Patil, S.B. Data dependent asymmetric kernels for estimating the density function. Sankhya A 2021, 83, 155–186. [Google Scholar] [CrossRef]
  139. Funke, B.; Hirukawa, M. Bias correction for local linear regression estimation using asymmetric kernels via the skewing method. Econom. Stat. 2021, 20, 109–130. [Google Scholar] [CrossRef]
  140. Igarashi, G.; Kakizawa, Y. Multiplicative bias correction for asymmetric kernel density estimators revisited. Comput. Statist. Data Anal. 2020, 141, 40–61. [Google Scholar] [CrossRef]
  141. Hirukawa, M.; Sakudo, M. Another bias correction for asymmetric kernel density estimation with a parametric start. Stat. Probab. Lett. 2019, 145, 158–165. [Google Scholar] [CrossRef]
  142. Lu, L. On the uniform consistency of the Bernstein density estimator. Stat. Probab. Lett. 2015, 107, 52–61. [Google Scholar] [CrossRef]
  143. Guan, Z. Efficient and robust density estimation using Bernstein type polynomials. J. Nonparametr. Stat. 2016, 28, 250–271. [Google Scholar] [CrossRef]
  144. Wang, L.; Lu, D. On the rates of asymptotic normality for Bernstein density estimators in a triangular array. J. Math. Anal. Appl. 2022, 511, 126063. [Google Scholar] [CrossRef]
  145. Rubin, D.B. Inference and missing data. Biometrika 1976, 63, 581–592. [Google Scholar] [CrossRef]
  146. Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data, 3rd ed.; Wiley Series in Probability and Statistics; John Wiley & Sons: Hoboken, NJ, USA, 2019. [Google Scholar] [CrossRef]
  147. Josse, J.; Reiter, J.P. Introduction to the special section on missing data. Statist. Sci. 2018, 33, 139–141. [Google Scholar] [CrossRef]
  148. Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data, 2nd ed.; Wiley Series in Probability and Statistics; Wiley-Interscience [John Wiley & Sons]: Hoboken, NJ, USA, 2002; p. xviii+381. [Google Scholar] [CrossRef]
  149. Claeskens, G.; Hjort, N.L. Model Selection and Model Averaging; Volume 27, Cambridge Series in Statistical and Probabilistic Mathematics; Cambridge University Press: Cambridge, UK, 2008; p. xviii+312. [Google Scholar] [CrossRef]
  150. Seaman, S.; Galati, J.; Jackson, D.; Carlin, J. What is meant by “missing at random”? Statist. Sci. 2013, 28, 257–268. [Google Scholar] [CrossRef]
  151. Mealli, F.; Rubin, D.B. Clarifying missing at random and related definitions, and implications when coupled with exchangeability. Biometrika 2015, 102, 995–1000. [Google Scholar] [CrossRef]
  152. Lu, G.; Copas, J.B. Missing at random, likelihood ignorability and model completeness. Ann. Statist. 2004, 32, 754–765. [Google Scholar] [CrossRef]
  153. Farewell, D.M.; Daniel, R.M.; Seaman, S.R. Missing at random: A stochastic process perspective. Biometrika 2022, 109, 227–241. [Google Scholar] [CrossRef]
  154. Yatchew, A. An elementary estimator of the partial linear model. Econom. Lett. 1997, 57, 135–143. [Google Scholar] [CrossRef]
  155. Abadie, A.; Imbens, G.W. Large sample properties of matching estimators for average treatment effects. Econometrica 2006, 74, 235–267. [Google Scholar] [CrossRef]
  156. Guerre, E.; Perrigne, I.; Vuong, Q. Optimal nonparametric estimation of first-price auctions. Econometrica 2000, 68, 525–574. [Google Scholar] [CrossRef]
  157. Bouzebda, S. Asymptotic Learning Theory for Conditional U–Statistics Based on Delta Sequences Under Missing at Random Mechanisms. Mathematics 2026, 14, 1899. [Google Scholar] [CrossRef]
  158. Belhas, H.; Mohammedi, M.; Bouzebda, S. Asymptotic Theory for Multivariate Nonparametric Quantile Regression with Stationary Ergodic Functional Covariates and Missing-at-Random Responses. Symmetry 2026, 18, 445. [Google Scholar] [CrossRef]
  159. Bouzebda, S.; Nezzal, A.; Elhattab, I. Limit theorems for nonparametric conditional U-statistics smoothed by asymmetric kernels. AIMS Math. 2024, 9, 26195–26282. [Google Scholar] [CrossRef]
  160. Hansen, B.E. Uniform convergence rates for kernel estimation with dependent data. Econom. Theory 2008, 24, 726–748. [Google Scholar] [CrossRef]
  161. Deheuvels, P.; Mason, D.M. General asymptotic confidence bands based on kernel-type function estimators. Stat. Inference Stoch. Process. 2004, 7, 225–277. [Google Scholar] [CrossRef]
  162. Kotz, S.; Balakrishnan, N.; Johnson, N.L. Continuous Multivariate Distributions. Vol. 1: Models and Applications, 2nd ed.; Wiley: New York, NY, USA, 2000. [Google Scholar]
  163. Ng, K.W.; Tian, G.L.; Tang, M.L. Dirichlet and Related Distributions; Wiley Series in Probability and Statistics; Theory, Methods and Applications; John Wiley & Sons, Ltd.: Chichester, UK, 2011; p. xxvi+310. [Google Scholar] [CrossRef]
  164. Ouimet, F.; Tolosana-Delgado, R. Asymptotic properties of Dirichlet kernel density estimators. J. Multivariate Anal. 2022, 187, 104832. [Google Scholar] [CrossRef]
  165. Hirukawa, M.; Murtazashvili, I.; Prokhorov, A. Uniform convergence rates for nonparametric estimators smoothed by the beta kernel. Scand. J. Stat. 2022, 49, 1353–1382. [Google Scholar] [CrossRef]
  166. Cheng, T.C.; Biswas, A. Maximum trimmed likelihood estimator for multivariate mixed continuous and categorical data. Comput. Statist. Data Anal. 2008, 52, 2042–2065. [Google Scholar] [CrossRef]
  167. Spiess, M. Estimation of a two-equation panel model with mixed continuous and ordered categorical outcomes and missing data. J. R. Statist. Soc. Ser. C 2006, 55, 525–538. [Google Scholar] [CrossRef]
  168. Leung, C.Y. The effect of across-location heteroscedasticity on the classification of mixed categorical and continuous data. J. Multivariate Anal. 2003, 84, 369–386. [Google Scholar] [CrossRef]
  169. Liu, C.; Rubin, D.B. Ellipsoidally symmetric extensions of the general location model for mixed categorical and continuous data. Biometrika 1998, 85, 673–688. [Google Scholar] [CrossRef]
  170. Little, R.J.A.; Schluchter, M.D. Maximum likelihood estimation for mixed continuous and categorical data with missing values. Biometrika 1985, 72, 497–512. [Google Scholar] [CrossRef]
  171. Stute, W. Universally consistent conditional U-statistics. Ann. Statist. 1994, 22, 460–473. [Google Scholar] [CrossRef]
  172. Stute, W. Lp-convergence of conditional U-statistics. J. Multivariate Anal. 1994, 51, 71–82. [Google Scholar] [CrossRef]
  173. Devroye, L.; Györfi, L.; Lugosi, G. A Probabilistic Theory of Pattern Recognition; Volume 31, Applications of Mathematics (New York); Springer: New York, NY, USA, 1996; p. xvi+636. [Google Scholar] [CrossRef]
  174. Lehmann, E.L. A general concept of unbiasedness. Ann. Math. Stat. 1951, 22, 587–592. [Google Scholar] [CrossRef]
  175. Dwass, M. The large-sample power of rank order tests in the two-sample problem. Ann. Math. Statist. 1956, 27, 352–374. [Google Scholar] [CrossRef]
  176. Kendall, M.G. A New Measure of Rank Correlation. Biometrika 1938, 30, 81–93. [Google Scholar] [CrossRef]
  177. Serfling, R.J. Approximation Theorems of Mathematical Statistics; Wiley Series in Probability and Mathematical Statistics; John Wiley & Sons, Inc.: New York, NY, USA, 1980; p. xiv+371. [Google Scholar]
  178. Bouzebda, S.; Didi, S. Some asymptotic properties of kernel regression estimators of the mode for stationary and ergodic continuous time processes. Rev. Mat. Complut. 2021, 34, 811–852. [Google Scholar] [CrossRef]
  179. Bouzebda, S.; Didi, S. Additive regression model for stationary and ergodic continuous time processes. Comm. Statist. Theory Methods 2017, 46, 2454–2493.
  180. Bouzebda, S.; Didi, S. Multivariate wavelet density and regression estimators for stationary and ergodic discrete time processes: Asymptotic results. Comm. Statist. Theory Methods 2017, 46, 1367–1406.
  181. Bouzebda, S.; Didi, S.; El Hajj, L. Multivariate wavelet density and regression estimators for stationary and ergodic continuous time processes: Asymptotic results. Math. Methods Statist. 2015, 24, 163–199.
  182. Bouzebda, S.; Ferfache, A.A. Asymptotic properties of M-estimators based on estimating equations and censored data in semi-parametric models with multiple change points. J. Math. Anal. Appl. 2021, 497, 124883.
  183. Bouzebda, S. Asymptotic properties of pseudo maximum likelihood estimators and test in semi-parametric copula models with multiple change points. Math. Methods Statist. 2014, 23, 38–65.
  184. Bouzebda, S.; Keziou, A. A new test procedure of independence in copula models via $\chi^2$-divergence. Comm. Statist. Theory Methods 2010, 39, 1–20.
  185. Bouzebda, S.; Keziou, A. A semiparametric maximum likelihood ratio test for the change point in copula models. Stat. Methodol. 2013, 14, 39–61.
  186. Bouezmarni, T.; Lemyre, F.C.; El Ghouch, A. Estimation of a bivariate conditional copula when a variable is subject to random right censoring. Electron. J. Stat. 2019, 13, 5044–5087.
  187. Arcones, M.A. A Bernstein-type inequality for U-statistics and U-processes. Stat. Probab. Lett. 1995, 22, 239–247.
  188. Billingsley, P. Probability and Measure, 3rd ed.; Wiley Series in Probability and Mathematical Statistics; John Wiley & Sons, Inc.: New York, NY, USA, 1995; p. xiv+593.
  189. van der Vaart, A.W. Asymptotic Statistics; Volume 3, Cambridge Series in Statistical and Probabilistic Mathematics; Cambridge University Press: Cambridge, UK, 1998; p. xvi+443.
  190. van der Vaart, A.W.; Wellner, J.A. Weak Convergence and Empirical Processes—With Applications to Statistics; Springer Series in Statistics; Springer: Cham, Switzerland, 2023; p. xvii+679.
Figure 1. Integrated mean squared error heatmap for Scenario 1. Each tile corresponds to a fixed combination of kernel, target missingness rate, empirical Kendall representation, sample size, and missingness mechanism. The color scale is displayed on the square-root scale used in the simulation code, which improves visual discrimination among low-risk procedures while preserving the ordering induced by IMSE. Because Scenario 1 combines a uniform design with a linear target curve, this figure should be read as the baseline benchmark for comparing the five smoothing strategies in the least irregular regime.
Figure 2. Integrated mean squared error heatmap for Scenario 2. Relative to Scenario 1, this design is more demanding because the covariate density is asymmetric and concentrated near the left boundary, while the conditional Kendall function is nonlinear. The figure is therefore especially informative for assessing whether support-adapted smoothers, such as the beta and Bernstein procedures, gain a finite-sample advantage when both boundary effects and design inhomogeneity are present.
Figure 3. Distribution of IMSE across kernels for Scenario 1. Each boxplot aggregates IMSE values over the sample sizes $n \in \{500, 1000, 2000\}$ and missingness levels $\pi \in \{0, 0.10, 0.30, 0.50\}$, separately by missingness mechanism, correction method, and empirical Kendall representation. In contrast to the heatmaps, which are cellwise diagnostics, these boxplots provide a robustness-oriented summary of how each smoother behaves across a family of homogeneous design cells.
Figure 4. Distribution of IMSE across kernels for Scenario 2. Because the design is concentrated near the boundary and the target is nonlinear, this figure provides a stringent aggregated comparison of the procedures. In particular, it reveals not only central tendency but also the stability of each smoothing method across heterogeneous finite-sample regimes within Scenario 2.
Figure 5. Complete-case versus IPW comparison for Scenario 1 using the Gaussian smoother as benchmark. The horizontal axis reports the target missingness rate and the vertical axis reports IMSE. Panels are stratified by empirical Kendall representation, missingness mechanism, and sample size. Under MCAR, the two methods are theoretically and numerically identical in the present implementation because normalized IPW reduces exactly to normalized complete-case weighting; therefore, any visible discrepancy can only arise under covariate-dependent MAR and should be interpreted as the consequence of local design reweighting rather than correction of a different target.
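The MCAR identity stated in the Figure 5 caption can be checked numerically. The following is a minimal sketch, not the simulation code of the paper: it uses the regression case $m = 1$ rather than the Kendall functional, a Gaussian smoother with a hypothetical bandwidth `h`, and a constant propensity `p_obs` that is assumed known. With constant weights $\delta_i / p$, the factor $1/p$ cancels between numerator and denominator of the normalized ratio.

```python
import math
import random


def gaussian_kernel(u):
    """Standard Gaussian density, used here as the benchmark smoother."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def normalized_local_mean(x0, xs, ys, weights, h):
    """Normalized kernel-weighted mean at x0 with per-observation weights."""
    num = 0.0
    den = 0.0
    for x, y, w in zip(xs, ys, weights):
        k = w * gaussian_kernel((x - x0) / h)
        num += k * y
        den += k
    return num / den


# Toy data (hypothetical): uniform design, linear target, MCAR with constant propensity.
rng = random.Random(0)
n, p_obs, h = 500, 0.7, 0.1
xs = [rng.random() for _ in range(n)]
ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]
delta = [1 if rng.random() < p_obs else 0 for _ in range(n)]

cc_weights = [float(d) for d in delta]      # complete-case weights: delta_i
ipw_weights = [d / p_obs for d in delta]    # IPW weights: delta_i / p, with p constant

cc = normalized_local_mean(0.5, xs, ys, cc_weights, h)
ipw = normalized_local_mean(0.5, xs, ys, ipw_weights, h)
print(abs(cc - ipw))  # difference is at floating-point rounding level
```

Under a covariate-dependent propensity the weights $\delta_i / p(X_i)$ are no longer constant, so the cancellation fails and the two estimators can differ.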
Figure 6. Complete-case versus IPW comparison for Scenario 2 using the Gaussian smoother. This is the more informative of the two benchmark comparisons, because nonlinear dependence and asymmetric design density amplify the local consequences of MAR-driven covariate distortion. The figure therefore visualizes the central finite-sample trade-off induced by inverse-probability reweighting: possible recentering gains in distorted regions versus variance inflation due to heterogeneous weights.
Figure 7. Observed-rate diagnostic for Scenario 1. The density curves display, over Monte Carlo replications, the realized observed proportion for each target missingness level, missingness mechanism, and sample size. Their concentration around the prescribed levels confirms that the calibration step in the missingness generator performs as intended. In particular, under the covariate-dependent MAR design, the replication-specific numerical choice of the intercept $a$ successfully stabilizes the overall observation rate before Bernoulli thinning is applied.
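The calibration step described in the Figure 7 caption can be illustrated as follows. This is a sketch under stated assumptions, not the generator used in the paper: the logistic form of the propensity, the slope value, and the function name `calibrate_intercept` are all hypothetical. The intercept $a$ is found by bisection so that the average propensity over the replication's covariates matches the target observation rate, and the indicators are then drawn by Bernoulli thinning.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def calibrate_intercept(xs, slope, target_rate, lo=-30.0, hi=30.0, iters=80):
    """Bisection on a so that the mean of sigmoid(a + slope * x) equals target_rate.

    The logistic propensity is an assumption made for this illustration.
    """
    def mean_rate(a):
        return sum(sigmoid(a + slope * x) for x in xs) / len(xs)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_rate(mid) < target_rate:  # mean_rate is increasing in a
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = random.Random(1)
xs = [rng.random() for _ in range(2000)]   # one replication of a uniform design
target_rate = 0.70                         # i.e., 30% target missingness
a = calibrate_intercept(xs, slope=2.0, target_rate=target_rate)

probs = [sigmoid(a + 2.0 * x) for x in xs]            # calibrated propensities
delta = [1 if rng.random() < p else 0 for p in probs]  # Bernoulli thinning
print(a, sum(probs) / len(probs), sum(delta) / len(delta))
```

Because the intercept is recomputed for each replication, the mean propensity matches the target up to numerical tolerance, and the realized observed proportion fluctuates around it only through the Bernoulli draws.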
Figure 8. Observed-rate diagnostic for Scenario 2. Despite the irregular Beta design for the covariate, the realized observation fractions remain tightly concentrated around their target values. This confirms that the asymmetry of the design density does not compromise the numerical calibration of the MCAR and MAR selection modules, and it supports interpreting subsequent performance differences as genuine estimation effects rather than artifacts of poor missingness control.
Table 1. Main notation used in the paper.

| Notation | Meaning |
|---|---|
| $X_i$ | Covariate vector, taking values in $\mathcal{X} \subseteq [0,1]^d$ |
| $Y_i$ | Response vector, taking values in $\mathbb{R}^q$ |
| $\delta_i$ | Response-observation indicator; $\delta_i = 1$ if $Y_i$ is observed |
| $p(x)$ | Propensity score, $p(x) = \mathbb{P}(\delta = 1 \mid X = x)$ |
| $f(x)$ | Density of the covariate $X$ |
| $m$ | Order of the conditional U-statistic |
| $\varphi$ | Measurable kernel/function of $m$ response variables |
| $\tilde{x}$ | $m$-tuple $(x_1, \ldots, x_m) \in \mathcal{X}^m$ |
| $I(m,n)$ | Set of $m$-tuples of distinct indices from $\{1, \ldots, n\}$ |
| $r^{(m)}(\varphi, \tilde{x})$ | Target conditional U-functional |
| $K_{\Lambda_n,\ell}(x)$ | Asymmetric kernel centered/adapted at $x$ |
| $\ell = 1, 2, 3$ | Kernel type: Dirichlet, Bernstein, or beta/mixed kernel |
| $\widehat{r}_{n,\ell}^{(m)}$ | Complete-case asymmetric-kernel conditional U-statistic estimator |
| $u_{n,\ell}(\varphi, \tilde{x})$ | Localized numerator U-statistic under MAR |
| $u_{n,\ell}(1, \tilde{x})$ | Localized denominator U-statistic under MAR |
| $r_{p,n,\ell}^{(m)}$ | Deterministic complete-case smoothed centering based on $pf$ |
| $\pi_{j,m}$ | $j$-th Hoeffding projection of an $m$-variate kernel |
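To make the notation of Table 1 concrete, the following sketch computes a complete-case ratio of localized U-statistics, $u_n(\varphi, \tilde{x}) / u_n(1, \tilde{x})$, for $m = 2$ and a univariate covariate. It is an illustration under assumptions, not the estimator as implemented in the paper: the beta kernel is taken in the form $\mathrm{Beta}(x/b + 1, (1-x)/b + 1)$, and the function names, the toy data model and the bandwidth value are hypothetical.

```python
import math
import random
from itertools import permutations


def beta_kernel(u, x, b):
    """Density of Beta(x/b + 1, (1 - x)/b + 1) at u (assumed parameterization)."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    a1 = x / b + 1.0
    a2 = (1.0 - x) / b + 1.0
    log_norm = math.lgamma(a1 + a2) - math.lgamma(a1) - math.lgamma(a2)
    return math.exp(log_norm + (a1 - 1.0) * math.log(u) + (a2 - 1.0) * math.log(1.0 - u))


def cc_conditional_u(phi, x_tuple, xs, ys, delta, b):
    """Ratio u_n(phi, x_tuple) / u_n(1, x_tuple) over observed, distinct index tuples."""
    m = len(x_tuple)
    observed = [i for i, d in enumerate(delta) if d == 1]
    # weights[j][k]: kernel localized at x_tuple[j], evaluated at the k-th observed covariate
    weights = [[beta_kernel(xs[i], xj, b) for i in observed] for xj in x_tuple]
    num = 0.0
    den = 0.0
    for idx in permutations(range(len(observed)), m):  # distinct indices, as in I(m, n)
        w = 1.0
        for j, k in enumerate(idx):
            w *= weights[j][k]
        den += w
        num += w * phi(*(ys[observed[k]] for k in idx))
    return num / den


# Toy data (hypothetical): Y = 1 + X + noise, response observed with p(x) = 0.5 + 0.4 x.
rng = random.Random(0)
n = 300
xs = [rng.random() for _ in range(n)]
delta = [1 if rng.random() < 0.5 + 0.4 * x else 0 for x in xs]
ys = [1.0 + x + rng.gauss(0.0, 0.5) if d == 1 else None for x, d in zip(xs, delta)]

# phi(y1, y2) = (y1 - y2)^2 / 2 targets the conditional variance (0.25 in this toy model).
cond_var = cc_conditional_u(lambda y1, y2: 0.5 * (y1 - y2) ** 2, (0.5, 0.5), xs, ys, delta, b=0.05)
print(cond_var)
```

Missing responses are stored as `None` and are never passed to `phi`, since only tuples with $\delta_{i_1} = \cdots = \delta_{i_m} = 1$ enter either sum. The brute-force enumeration costs $O(n^m)$ and is meant only to mirror the definition.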
Table 2. Summary of the main asymptotic results. The notation $B_{n,\ell}$ denotes the deterministic smoothing bias, $V_{n,\ell}$ the leading stochastic variance scale, and $g = pf$ the effective complete-case design density under MAR. The symbol $\rho_{n,\ell}$ denotes the kernel-specific deterministic bias scale.

| Smoother/Support | Estimator | Main Result | Bias Scale and Centering | Stochastic Scale Under MAR | Novelty/Role in the Paper |
|---|---|---|---|---|---|
| Dirichlet kernels on the simplex $\mathcal{S}_{d,1}$ | Complete-case conditional U-statistic $\widehat{r}_{n,1}^{(m)}$; regression case $m = 1$ treated separately | Uniform consistency for $m = 1$; uniform strong consistency for general $m$; asymptotic normality | $\rho_{n,1} = \breve{b}$. The deterministic centering is obtained by replacing $f$ with $pf$. Thus the complete-case bias constant is computed with $g = pf$. | Leading stochastic term given by the first Hoeffding projection. The variance contains the complete-case information factor $\{p(x) f(x)\}^{-1}$, or its $m$-variate analogue. | Provides the simplex-adapted MAR theory. The Dirichlet local drift, covariance, boundary behavior, and $L_2$-norm are kernel-specific and not supplied by the abstract delta-sequence theory. |
| Bernstein polynomial smoothers on compact supports | Complete-case Bernstein-type conditional U-statistic; Nadaraya–Watson case $m = 1$ | Weak and strong uniform convergence; higher-order conditional U-statistic extension | $\rho_{n,2} = k_n^{-1}$. The deterministic bias is the Bernstein approximation bias applied to the complete-case target density $pf$. | The stochastic order is inherited from the Bernstein localization scale and the complete-case first projection. MAR changes constants through inverse propensity terms, but not the convergence rate under $p \geq c > 0$. | Shows that discrete polynomial smoothers fit the MAR conditional U-statistic framework. The verification is nontrivial because the smoothing operator is discrete rather than an ordinary continuous kernel. |
| Product beta kernels on $[0,1]^d$ | Complete-case beta-kernel conditional U-statistic | Weak uniform convergence on fixed compact regions; strong uniform convergence; expanding-domain results approaching the boundary | $\rho_{n,3} = b_n$. The complete-case deterministic bias is governed by the beta-kernel moment expansion with $f$ replaced by $pf$. On interior regions, $\mu_{n,3}(x) = b_n(1 - 2x) + O(b_n^2)$. | The variance scale depends on the beta-kernel $L_2$-norm and is inflated by the MAR observation mechanism through factors involving $1/p$. | Captures boundary-sensitive beta-kernel behavior under MAR. The point-dependent shape of the beta kernel makes the local $L_2$-norm and bias constants support-dependent. |
| Mixed continuous–categorical regressors | Complete-case conditional U-statistic with continuous beta smoothing and categorical smoothing | Uniform convergence for heterogeneous covariates; mixed-data extension of the beta-kernel theory | $\rho_{n,\mathrm{mix}} = b_n + \lambda_n$, where $b_n$ is the continuous bandwidth and $\lambda_n$ the categorical smoothing parameter. | The stochastic scale combines the continuous beta contribution, the categorical smoothing contribution, and the complete-case inverse-propensity inflation. | Extends the theory beyond purely continuous supports. The deterministic bias splits into continuous and categorical components, a feature absent from the complete-data and abstract delta-sequence settings. |
| Applications: conditional dependence, discrimination, multisample functionals, and conditional Kendall-type coefficients | Special choices of the U-statistic kernel $\varphi$ | Consistency and, where applicable, asymptotic normality obtained by applying the preceding general theory | Bias and centering inherited from the corresponding smoothing family: Dirichlet, Bernstein, beta, or mixed. | Variance obtained from the conditional Hoeffding projection of the chosen kernel $\varphi$, with MAR inflation through $1/p$. | Demonstrates that the framework estimates genuinely nonlinear conditional functionals, not only ordinary conditional means. |
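The interior expansion $\mu_{n,3}(x) = b_n(1 - 2x) + O(b_n^2)$ quoted in Table 2 can be recovered by a short calculation. The derivation below is a sketch for the univariate case and rests on two assumptions not restated in the table: that the beta kernel at $x$ is the $\mathrm{Beta}(x/b_n + 1, (1 - x)/b_n + 1)$ density, and that $\mu_{n,3}(x)$ denotes the local drift, that is, the kernel mean minus $x$. The auxiliary variable $\xi_x$ is introduced only for this illustration.

```latex
% Assumption: \xi_x has the beta-kernel density localized at x.
\[
  \xi_x \sim \mathrm{Beta}\!\left(\frac{x}{b_n} + 1,\; \frac{1 - x}{b_n} + 1\right),
  \qquad
  \mathbb{E}[\xi_x]
  = \frac{x/b_n + 1}{1/b_n + 2}
  = \frac{x + b_n}{1 + 2 b_n}.
\]
% Local drift: kernel mean minus the evaluation point.
\[
  \mu_{n,3}(x)
  = \frac{x + b_n}{1 + 2 b_n} - x
  = \frac{b_n (1 - 2x)}{1 + 2 b_n}
  = b_n (1 - 2x) + O(b_n^2).
\]
```

The drift vanishes at $x = 1/2$ and changes sign across it, which is one way to see why the bias constants in the beta row are point-dependent.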
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Bouzebda, S. Advanced Statistical Learning: Limit Theorems for Nonparametric Conditional U-Statistics Smoothed by Asymmetric Kernels Under Missing-at-Random Sampling. Mathematics 2026, 14, 2110. https://doi.org/10.3390/math14122110
