1. Introduction
The problem of understanding how an s-dimensional covariate modulates the conditional distribution of a d-dimensional response, based on an i.i.d. sample, lies at the core of contemporary statistics, econometrics, and statistical learning. While conditional means and conditional covariance structures provide useful summaries of location and dispersion, they are insufficient whenever the scientific objective concerns heterogeneity across the conditional distribution, directional extremality, tail behavior, or robust local features. In such settings, conditional quantiles furnish a far more informative statistical description. They encode distributional asymmetry, remain meaningful under heavy-tailed regimes, and are naturally aligned with decision-theoretic questions involving risk, stress, and distribution-sensitive prediction. In the scalar-response case, the theory of conditional quantiles is by now highly developed, with kernel, local polynomial, nearest-neighbor, and spline-based estimators enjoying a mature asymptotic theory. By contrast, the extension to multivariate responses remains mathematically delicate, primarily because there is no canonical total order on $\mathbb{R}^d$ when $d \geq 2$.
A fruitful resolution of this difficulty is provided by the geometric viewpoint on multivariate quantiles. In the univariate case, the p-th quantile is characterized as a minimizer of a convex asymmetric absolute loss; see [1]. This variational formulation extends naturally to regression-type problems [2,3]. In higher dimension, however, the absence of an order structure requires a geometric replacement for scalar ranking. Two major lines of development have emerged in this direction: multivariate norm-based quantiles [4,5], and the theory of spatial, or geometric, quantiles introduced by Chaudhuri [6,7] and subsequently developed in several directions, including recent conditional and Bahadur-type formulations [8,9,10,11,12]. The spatial quantile framework is especially compelling because it indexes multivariate quantiles by a direction vector $u$ in the open unit ball $B^d = \{u \in \mathbb{R}^d : \|u\| < 1\}$, thereby encoding, within a single object, both the magnitude of outlyingness and its geometric orientation. This directional parametrization confers a direct geometric interpretation upon conditional multivariate quantiles and makes the associated estimation problem amenable to convex analysis.
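More precisely, for $u$ in the open unit ball of $\mathbb{R}^d$, define
$$\phi(u, t) = \|t\| + \langle u, t \rangle, \qquad t \in \mathbb{R}^d. \tag{1}$$
The $u$-th geometric conditional quantile of the response $Y$ given the covariate value $X = x$ is then defined as any minimizer of the conditional convex functional
$$\theta \mapsto \mathbb{E}\big[\phi(u, Y - \theta) \mid X = x\big], \qquad \theta \in \mathbb{R}^d.$$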
This formulation, which goes back to the geometric quantile paradigm of [6,7], offers a coherent multivariate analogue of ordinary quantiles, while retaining convexity, directional interpretability, and robustness. Under absolute continuity assumptions, existence and uniqueness follow from strict convexity arguments; see, e.g., [4] (Remark 2.3). The conditional version is particularly attractive for multivariate response analysis, portfolio allocation, environmental monitoring, compositional systems, and spatio-temporal modelling, where one seeks a directional description of the entire conditional distribution rather than a single central tendency summary.
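The link with ordinary quantiles can be checked numerically in dimension one. The sketch below assumes the loss has the standard Chaudhuri form $|t| + u\,t$ (an assumption about notation; the helper names are illustrative only). With $u = 2p - 1$ this loss equals twice the usual check loss, so minimizing it over a sample returns an ordinary p-th sample quantile.

```python
import numpy as np


def geometric_loss_1d(u, t):
    """One-dimensional geometric loss |t| + u * t, for u in (-1, 1)."""
    return np.abs(t) + u * t


def check_loss(p, t):
    """Usual asymmetric absolute (check) loss: t * (p - 1{t < 0})."""
    return t * (p - (t < 0))


def quantile_via_geometric_loss(y, p):
    """Minimize sum_i (|y_i - theta| + u (y_i - theta)) with u = 2p - 1.

    The criterion is convex and piecewise linear in theta, so a minimizer
    is attained at one of the data points; we simply scan them.
    """
    u = 2.0 * p - 1.0
    y = np.asarray(y, dtype=float)
    values = [geometric_loss_1d(u, y - theta).sum() for theta in y]
    return y[int(np.argmin(values))]
```

The scan over data points is quadratic in the sample size and is meant only as an illustration of the identity, not as an efficient quantile routine.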
Yet the nonparametric estimation of this conditional quantile becomes substantially more delicate when the support of the covariate is bounded. This is not a marginal technical complication but a structural issue. Classical symmetric kernel methods inevitably allocate mass outside the support, thereby generating boundary bias and distorting local smoothing near edges, faces, and corners. This challenge has been widely documented in the nonparametric literature on density estimation, regression, and conditional functionals over bounded supports [13,14,15,16,17,18,19,20,21,22,23,24]. A broad range of correction strategies has been proposed, including boundary kernels and local polynomial corrections [25,26]. Nevertheless, a conceptually cleaner alternative has emerged in the form of asymmetric kernels whose support coincides exactly with the underlying domain [27]. Such kernels are support-respecting by construction and exhibit location-dependent shape adaptation, thereby yielding an intrinsic form of boundary correction.
This philosophy is well established on the unit interval through beta kernels [28,29,30,31,32,33,34,35,36,37,38] and on bounded Euclidean domains using Bernstein-type techniques [39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56]. However, the simplex setting is fundamentally richer. When the covariate represents compositional proportions or relative allocations, it is naturally constrained to the simplex, whose geometry is qualitatively different from that of a Cartesian product domain. In that setting, not only must the smoothing device respect each coordinate boundary, but it must also incorporate the global $\ell_1$-constraint. Dirichlet kernels constitute the natural support-adapted analogue of beta kernels on the simplex; see [57,58,59]. Their shape may be made to depend deterministically on the evaluation point, and this produces a genuinely geometry-aware smoothing scheme, with local dispersion and asymmetry automatically modulated by the position of the target point relative to the simplex boundary.
The present work is motivated precisely by this conjunction of two difficulties: the lack of a scalar order in multivariate conditional quantile analysis, and the geometric complexity induced by simplex-supported covariates. Our objective is to construct and analyze a nonparametric estimator of geometric conditional quantiles that is simultaneously: (i) faithful to the spatial-quantile paradigm; (ii) intrinsically compatible with the simplex support; (iii) asymptotically tractable under a full inferential theory; and (iv) computationally implementable in realistic multivariate settings. To this end, we combine the geometric quantile loss of [6,7] with Dirichlet kernel weighting on the simplex, using evaluation-point-dependent shape parameters. This yields a location-adaptive smoothing device that exactly respects the simplex geometry and produces a weighted convex M-estimator for the conditional geometric quantile.
The contribution of the paper is threefold. First, we formulate a Dirichlet kernel estimator of geometric conditional quantiles on the simplex and show that it provides a principled support-respecting analogue of the Nadaraya–Watson idea for multivariate directional quantile functionals. Second, we establish a full asymptotic theory, including a Bahadur-type linear expansion with explicit stochastic remainder, asymptotic bias and covariance expansions, mean squared error analysis, optimal bandwidth scaling, and asymptotic normality with covariance estimation. Third, our analysis uncovers a distinctive boundary phenomenon: the asymptotic variance is no longer governed solely by the ambient simplex dimension s, but also by the codimension of the face approached by the evaluation point. More precisely, each coordinate direction becoming asymptotically boundary-active contributes a multiplicative inflation to the variance, leading to a distinct variance regime near a face of positive codimension. This reveals, in explicit form, how simplex geometry governs stochastic uncertainty in conditional quantile estimation.
These results position the present work at the intersection of several active literatures. Relative to the foundational theory of spatial quantiles [4,6,7], we address the genuinely conditional problem with nonparametric, covariate-dependent weighting. Relative to the extensive literature on asymmetric kernel smoothing and Bernstein-type approximation [29,30,54,59], we move beyond density and distribution estimation to a substantially more intricate geometric functional. Relative to recent work on conditional multivariate quantiles and Bahadur expansions [10,11,60], we introduce a support-adapted simplex-based smoothing mechanism and derive the resulting nonstandard inferential scaling. When $u = 0$, the estimator reduces to the conditional geometric median, thereby recovering, and substantially extending, the framework of [61]. At the same time, the full directional formulation permits inference not merely at the center of the conditional distribution, but across a continuum of oriented and increasingly extreme directions.
From a methodological standpoint, the framework is especially relevant for compositional and constrained-covariate applications. In environmental geochemistry, soil composition profiles are naturally represented by proportions or normalized concentrations. In econometrics and finance, portfolio weights and budget shares live on simplices. In spatial and biological applications, relative abundance vectors obey the same structural constraint. In all such settings, conventional kernel smoothing can be severely distorted near the boundary, whereas Dirichlet kernels yield a support-faithful alternative whose asymptotic behavior can be analyzed sharply. This is the principal conceptual message of the paper: boundary adaptivity is not merely a technical refinement, but an essential structural ingredient in multivariate conditional quantile estimation on constrained domains.
Organization
The remainder of the paper is organized as follows.
Section 2 introduces the simplex setting, Dirichlet kernels, and the geometric conditional quantile functional, and it clarifies the weighted M-estimation principle underlying the estimator. Section 3 states the regularity conditions and develops the main asymptotic results, including the Bahadur representation, bias and variance expansions, mean squared error, optimal bandwidth, mean integrated absolute error, and asymptotic normality. Section 4 reports simulation experiments illustrating the finite-sample behavior of the estimator, the geometry of the confidence ellipsoids, and the directional quantile contours. Section 5 presents a real-data application to the GEMAS dataset [62], thereby illustrating the practical relevance of the methodology for compositional environmental covariates. Section 6 concludes with limitations and possible extensions, while Section 7 collects the technical proofs.
2. Setup and Definitions
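We now formalize the geometric and probabilistic framework underlying the proposed estimator. We first recall the simplex geometry and the Dirichlet kernel construction, and then explain how these ingredients combine with the spatial-quantile loss to produce a weighted empirical M-estimator of the geometric conditional quantile. Let $\mathcal{S}_s$ denote the s-dimensional simplex defined by
$$\mathcal{S}_s = \big\{x \in [0,1]^s : \|x\|_1 \leq 1\big\},$$
with interior
$$\mathrm{Int}(\mathcal{S}_s) = \big\{x \in (0,1)^s : \|x\|_1 < 1\big\},$$
where $x = (x_1, \dots, x_s)$ and $\|x\|_1 = \sum_{i=1}^{s} |x_i|$. For parameters $\alpha = (\alpha_1, \dots, \alpha_s) \in (0,\infty)^s$ and $\beta > 0$, the density function of the Dirichlet$(\alpha, \beta)$ distribution is given by
$$K_{\alpha,\beta}(x) = \frac{\Gamma(\|\alpha\|_1 + \beta)}{\Gamma(\beta) \prod_{i=1}^{s} \Gamma(\alpha_i)} \, (1 - \|x\|_1)^{\beta - 1} \prod_{i=1}^{s} x_i^{\alpha_i - 1}, \qquad x \in \mathcal{S}_s.$$
For a detailed account, we refer the reader to [57] (Chapter 49) and [58].

We now clarify the theoretical basis of the estimator introduced below. For a fixed evaluation point $x$ in the simplex and a fixed directional index $u$ in the open unit ball, the target of inference is the population geometric conditional quantile, that is, the minimizer over $\theta \in \mathbb{R}^d$ of the conditional expectation, given $X = x$, of the geometric loss (1) evaluated at $Y - \theta$. Accordingly, the only unknown parameter to be estimated in what follows is this quantile vector, whereas $x$ and $u$ are regarded as fixed indexing arguments, and the bandwidth $b$ is a smoothing parameter. The vector $\theta$ appearing in the minimization problem is therefore not an additional model parameter; it is the optimization variable whose minimizing value defines the estimator of the conditional quantile.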
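The estimator is obtained by a plug-in principle. Since the conditional distribution of $Y$ given $X = x$ is unknown, we replace it by a Dirichlet kernel weighted empirical conditional measure concentrated on the observations $Y_1, \dots, Y_n$. Specifically, for the fixed evaluation point $x$, we assign to each observation $(X_i, Y_i)$ the weight
$$w_i(x) = \frac{K_{x,b}(X_i)}{\sum_{j=1}^{n} K_{x,b}(X_j)}, \qquad i = 1, \dots, n,$$
where $K_{x,b}$ is the Dirichlet kernel on the simplex with shape parameters determined by the evaluation point $x$ and the bandwidth $b$. These weights are nonnegative and sum to one, so they define the empirical conditional probability measure
$$\widehat{P}_n(\cdot \mid x) = \sum_{i=1}^{n} w_i(x)\, \delta_{Y_i},$$
where $\delta_{y}$ denotes the Dirac mass at $y$. Equivalently, its associated conditional distribution function is
$$\widehat{F}_n(y \mid x) = \sum_{i=1}^{n} w_i(x)\, \mathbf{1}\{Y_i \leq y\},$$
with the inequality understood componentwise.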
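The weighting step can be sketched as follows. The shape parameters used here, $\alpha = x/b + 1$ and $\beta = (1 - \|x\|_1)/b + 1$, are the usual choice for Dirichlet kernels and are an assumption of this sketch rather than a formula quoted from the text; with this choice the mode of the kernel sits exactly at the evaluation point. Function names are illustrative.

```python
import numpy as np
from scipy.special import gammaln


def dirichlet_kernel_log_density(X, alpha, beta):
    """Log-density of Dirichlet(alpha, beta) at the rows of X.

    X has shape (n, s) with rows in the interior of the simplex
    (all coordinates > 0 and row sums < 1).
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    alpha = np.asarray(alpha, dtype=float)
    rest = 1.0 - X.sum(axis=1)  # implicit last coordinate 1 - ||x||_1
    log_norm = gammaln(alpha.sum() + beta) - gammaln(beta) - gammaln(alpha).sum()
    return (log_norm
            + (beta - 1.0) * np.log(rest)
            + ((alpha - 1.0) * np.log(X)).sum(axis=1))


def dirichlet_kernel_weights(x, X, b):
    """Normalized kernel weights at evaluation point x with bandwidth b.

    ASSUMPTION: alpha = x / b + 1 and beta = (1 - ||x||_1) / b + 1.
    """
    x = np.asarray(x, dtype=float)
    alpha = x / b + 1.0
    beta = (1.0 - x.sum()) / b + 1.0
    logk = dirichlet_kernel_log_density(X, alpha, beta)
    logk -= logk.max()  # stabilize before exponentiating
    w = np.exp(logk)
    return w / w.sum()
```

Working on the log scale avoids overflow of the gamma-function ratio when the bandwidth is small; the normalization then makes the weights sum to one regardless of the constant.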
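Replacing the unknown conditional distribution in the population variational characterization by this weighted empirical measure, with normalized Dirichlet kernel weights $w_i(x)$, yields the sample criterion
$$\theta \mapsto \sum_{i=1}^{n} w_i(x)\, \phi(u, Y_i - \theta), \qquad \phi(u, t) = \|t\| + \langle u, t \rangle,$$
and the estimator is defined as its minimizer. Hence the estimator is a weighted convex M-estimator, namely the empirical analogue of the population geometric conditional quantile functional. This motivates the definition
$$\widehat{Q}_n(u \mid x) \in \operatorname*{arg\,min}_{\theta \in \mathbb{R}^d} \sum_{i=1}^{n} w_i(x)\, \phi(u, Y_i - \theta). \tag{2}$$
In particular, the operator $\operatorname{arg\,min}$ denotes the set of minimizers of the sample objective, and under the regularity conditions imposed later, convexity and absolute continuity ensure existence and, with probability tending to one, uniqueness of the minimizer. Thus, Equation (2) is not an ad hoc definition, but the natural Dirichlet kernel plug-in estimator of the population conditional geometric quantile.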
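A minimal sketch of the minimization step is given below, assuming the loss $\|t\| + \langle u, t \rangle$ and taking the weights as a given input (for instance, normalized Dirichlet kernel weights). The function name and the choice of a derivative-free solver are illustrative; any convex optimization routine could be substituted.

```python
import numpy as np
from scipy.optimize import minimize


def weighted_geometric_quantile(Y, w, u):
    """Minimize sum_i w_i * (||Y_i - theta|| + <u, Y_i - theta>) over theta.

    Y: (n, d) responses; w: (n,) nonnegative weights summing to one;
    u: (d,) direction with Euclidean norm strictly less than one.
    """
    Y = np.asarray(Y, dtype=float)
    w = np.asarray(w, dtype=float)
    u = np.asarray(u, dtype=float)
    if np.linalg.norm(u) >= 1.0:
        raise ValueError("u must lie in the open unit ball")

    def criterion(theta):
        R = Y - theta
        return float(w @ (np.linalg.norm(R, axis=1) + R @ u))

    # The criterion is convex but not differentiable at the data points,
    # so a derivative-free search started at the weighted mean is used.
    res = minimize(criterion, x0=w @ Y, method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-10})
    return res.x
```

Nelder-Mead is adequate for a low-dimensional response such as the bivariate case; for larger d a dedicated convex solver would be a more natural choice.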
Notation
Throughout the paper, let $f_{X,Y}$ and $f_X$ denote the joint and marginal density functions of the random variables $(X, Y)$ and $X$, respectively. The conditional density of $Y$ given $X = x$ is denoted by $f(\cdot \mid x)$. The notation $\xrightarrow{d}$ indicates convergence in distribution. For any matrix $A$, $A^{\top}$ denotes its transpose. For
, define
where
denotes the
identity matrix, and
with
denoting the Euclidean norm of
. Unless otherwise stated, all limits in this paper are taken as $n \to \infty$.
Remark 1. Equation (2) follows from a standard plug-in principle, but with a localization device tailored to the geometry of the covariate space. The target functional is defined as the minimizer of a conditional convex criterion, and the statistical task is therefore to approximate the conditional law of the response at the design point. The weights arise from this approximation step. The specificity of the present framework lies in the fact that the covariate takes values in the simplex. In such a setting, a symmetric kernel is not geometrically appropriate, since it ignores the support constraint and produces boundary distortion. The Dirichlet kernel, by contrast, is intrinsically supported on the simplex and possesses an evaluation-point-dependent shape, thereby inducing an automatic boundary adaptation. Hence the estimator is best interpreted as a geometric conditional quantile estimator obtained by localized empirical risk minimization under a support-respecting Dirichlet smoothing scheme.
4. Numerical Results
This section provides a finite-sample illustration of the estimator introduced in Section 2 in the special case $s = 1$. In that setting, the simplex is one-dimensional (the unit interval), and the Dirichlet kernel reduces to the Beta kernel. The purpose of this section is not to establish an additional theoretical result, but rather to examine, in a controlled setting, the numerical behavior of the estimator, the geometry of the associated Gaussian ellipsoids, and the shape of the estimated directional quantile contours.
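We observe i.i.d. pairs $(X_i, Y_i)$, $i = 1, \dots, n$, where $X_i$ is scalar and $Y_i$ is bivariate. For a fixed evaluation point $x$ and a fixed direction $u$, the estimator is computed exactly as in (2), with the one-dimensional Dirichlet kernel replaced by the corresponding Beta kernel. More precisely, letting $b$ denote the bandwidth, the Beta kernel with evaluation-point-dependent shape parameters is evaluated at each $X_i$, and the resulting values are normalized to sum to one, which gives the weights $w_i(x)$. The estimator of the $u$-th geometric conditional quantile at $x$ is then defined as a minimizer over $\theta$ of the weighted criterion $\sum_{i=1}^{n} w_i(x)\, \phi(u, Y_i - \theta)$, where $\phi$ is the geometric loss introduced in (1). Since the objective is convex, the minimizer is computed numerically by direct convex optimization.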
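The covariate and conditional response are generated according to model (16), in which the response is conditionally Gaussian given the covariate. The conditional covariance matrix is positive definite for all covariate values in the unit interval, so the model is well defined throughout the support.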
Unless otherwise stated, the numerical illustrations are carried out at a fixed interior point and for two directions, both of which lie in the open unit disk. For point estimation, the bandwidth is taken at the usual mean-squared-error optimal rate in the one-dimensional interior case $s = 1$; compare Corollary 1.
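Specializing Theorem 5 to $s = 1$, one obtains, under assumptions A.1–A.7 and for a fixed evaluation point and direction, a central limit theorem for the estimator, whose centering and covariance terms are expressed with the notation of Section 3. At leading order, the limiting covariance reduces to a product of direction-dependent local matrices.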
It is important to note, however, that in the present illustrations the bandwidth choice is adopted for point estimation accuracy. In the one-dimensional case, this implies that the centering bias is of the same asymptotic order as the stochastic fluctuation, since the normalized bias does not vanish under this scaling. Consequently, the ellipsoids displayed below should be interpreted as nominal Gaussian uncertainty ellipsoids around the estimator, calibrated from the leading-order covariance structure and the bootstrap, rather than as fully bias-corrected asymptotic confidence sets. A formally centered asymptotic confidence set under the theory of Section 3 would require either undersmoothing or explicit bias correction, neither of which is pursued in the present finite-sample illustration.
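The order calculation behind this remark can be sketched as follows, assuming the interior-point orders that are standard for Beta-kernel smoothing, namely a bias of order $b$ and a variance of order $(n b^{1/2})^{-1}$. The exact constants belong to Section 3 and are not reproduced here.

```latex
\[
\mathrm{MSE}(b) \asymp b^{2} + \frac{1}{n\, b^{1/2}}
\quad \Longrightarrow \quad
b^{\star} \asymp n^{-2/5},
\qquad
\mathrm{MSE}(b^{\star}) \asymp n^{-4/5},
\]
\[
\sqrt{n\, b^{1/2}} \cdot b \;=\; n^{1/2}\, b^{5/4} \;=\; n^{1/2}\, n^{-1/2} \;=\; 1
\quad \text{when } b = n^{-2/5}.
\]
```

Under these assumed orders the normalized bias stays bounded away from zero at the MSE-optimal rate, whereas any undersmoothing choice $b = o(n^{-2/5})$ makes it vanish.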
In order to visualize the local covariance structure predicted by (17), we use leading-order plug-in estimators of the local matrices entering the covariance formula. Using the resulting covariance estimate, we display the nominal Gaussian ellipsoid centered at the estimator. In finite samples, the scale of this approximation is additionally monitored by a nonparametric bootstrap. Three sample sizes are considered.
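For a bivariate response, the boundary of such a nominal Gaussian ellipsoid can be traced from a covariance estimate as sketched below. The sketch assumes that `cov` is already the estimated covariance of the estimator itself (all rate factors included) and uses 0.95 only as a placeholder level; both are assumptions of this illustration, and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def gaussian_ellipse(center, cov, level=0.95, num=200):
    """Points on {q : (q - c)^T cov^{-1} (q - c) = chi2 quantile (df=2)}.

    center: (2,) estimated quantile; cov: (2, 2) positive definite
    covariance estimate of the estimator. Returns an array of shape (num, 2).
    """
    center = np.asarray(center, dtype=float)
    cov = np.asarray(cov, dtype=float)
    radius = np.sqrt(chi2.ppf(level, df=2))
    L = np.linalg.cholesky(cov)  # cov = L @ L.T
    angles = np.linspace(0.0, 2.0 * np.pi, num)
    circle = np.column_stack([np.cos(angles), np.sin(angles)])  # unit circle
    # Map the unit circle through L so the Mahalanobis radius is constant.
    return center + radius * circle @ L.T
```

The Cholesky factor maps the unit circle to the ellipse, so the orientation and eccentricity of the plotted curve come directly from the eigenstructure of the covariance estimate.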
Figure 1 displays the resulting nominal Gaussian ellipsoids for the two directions. In each panel, the red point represents a numerical approximation of the target quantile obtained from the known conditional Gaussian law, while the black symbols indicate the corresponding sample estimates for the three sample sizes.
For the first direction, the ellipsoids contract substantially as n increases, and the centers stabilize around the target point. The ellipse for the smallest sample size is comparatively wide and elongated, indicating a pronounced anisotropy in the estimated local covariance structure. The two larger sample sizes show the expected reduction in dispersion, with the ellipse concentrated more tightly around the target.
The same qualitative behavior is observed for the second direction. The contraction of the ellipsoids with increasing sample size is again clear, although the orientation and eccentricity differ from those in the first panel. This difference reflects the directional nature of the geometric conditional quantile, since the local matrices, and hence the covariance structure in (18), depend on the chosen direction $u$.
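To visualize the directional geometry of the estimator over a grid of directions, write $u = r(\cos\vartheta, \sin\vartheta)$ with radial level $r \in [0, 1)$ and angle $\vartheta \in [0, 2\pi)$. For each pair $(r, \vartheta)$ on the grid, we compute the estimator at the evaluation point under the same model (16). For fixed r, the resulting polygonal curve constitutes a numerical approximation to the directional quantile contour at radial level r.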
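The angular sweep can be sketched as follows. The argument `estimator` is a hypothetical callable that maps a direction `u` to the estimated bivariate quantile at the fixed evaluation point (for example, a weighted geometric quantile routine with the kernel weights held fixed); the number of angles is an illustrative choice.

```python
import numpy as np


def direction_grid(r, num_angles):
    """Directions u = r * (cos a, sin a) on an equally spaced angular grid."""
    if not 0.0 <= r < 1.0:
        raise ValueError("radial level r must satisfy 0 <= r < 1")
    angles = np.linspace(0.0, 2.0 * np.pi, num_angles, endpoint=False)
    return r * np.column_stack([np.cos(angles), np.sin(angles)])


def quantile_contour(estimator, r, num_angles=72):
    """Closed polygonal approximation of the contour at radial level r.

    estimator: callable taking a direction u of shape (2,) and returning
    the estimated quantile of shape (2,).
    """
    points = np.array([estimator(u) for u in direction_grid(r, num_angles)])
    return np.vstack([points, points[:1]])  # repeat first vertex to close
```

Passing the estimator as a callable keeps the sweep independent of how the weights and the minimization are implemented, so the same routine serves for any evaluation point or bandwidth.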
The estimated contours are displayed in Figure 2 for two sample sizes. As expected, the contours are nested, and their size increases with r, reflecting the passage from central to more extremal directions. Moreover, the curves become visibly smoother and more regular as the sample size increases. The remaining irregularities for the smaller sample size are consistent with finite-sample variability in the weighted convex minimization problem, whereas the contours for the larger sample size already exhibit a more stable directional structure.
Discussion of Simulation Results
The numerical evidence displayed in Figure 1 and Figure 2 is consistent with the asymptotic picture developed in Section 3, while remaining within the limited scope of the present experiment.
The bandwidth used throughout this section is natural from the viewpoint of pointwise estimation in the one-dimensional interior setting. Under this choice, the empirical behavior of the estimator is stable and the ellipsoids contract as n increases. At the same time, because the deterministic bias is not asymptotically negligible under this scaling, the displayed ellipsoids should be interpreted as descriptive Gaussian uncertainty regions rather than as bias-corrected confidence sets in the strict asymptotic sense. This distinction is important for a correct reading of the figures.
The two panels of Figure 1 show that the orientation and eccentricity of the ellipsoids depend on the direction $u$. This is entirely consistent with the geometric nature of the target functional. Indeed, even at a fixed evaluation point, the local matrices entering the covariance Formula (18) depend on the direction of the quantile, so different directions induce different local covariance geometries.
The directional contour plots provide a complementary view of the same phenomenon. For each fixed radial level r, the contour is obtained by sweeping the angle over the unit circle. The observed nesting of the curves is the expected geometric analogue of moving from central to more extremal quantiles. The increased regularity from the smaller to the larger sample size indicates that the estimator captures this directional geometry with improving numerical stability as the sample size grows.
Because the present illustration is restricted to the case $s = 1$ and to an interior evaluation point, it should be viewed as an interior-point validation of the finite-sample behavior of the estimator, rather than as a direct numerical verification of the boundary-inflation phenomenon described in Theorem 3. That phenomenon is a genuinely simplex-boundary effect and would require simulations at points approaching 0 or 1, or, more generally, experiments with $s \geq 2$ on higher-dimensional simplices. Such extensions lie beyond the scope of the present section.
Within the present experimental design, the estimator exhibits the behavior predicted by the theory: the dispersion decreases with n, the Gaussian ellipsoids reflect the direction-dependent anisotropy of the local covariance, and the estimated quantile contours become progressively smoother as the sample size increases. These observations support the practical implementability of the proposed method and confirm that the weighted geometric M-estimation procedure behaves stably in finite samples in the one-dimensional simplex setting.
5. Empirical Validation: Simplex-Constrained Inference for Geochemical Data
The methodological framework developed in Section 2 and Section 3 is now subjected to empirical scrutiny using the GEMAS (Geochemical Mapping of Agricultural and Grazing Land Soils) dataset [62]. This dataset provides an ideal testing ground for the proposed estimator, as it embodies precisely the confluence of challenges that motivated this work: the covariate space is compositional and thus naturally constrained to a simplex, while the response is multivariate and requires a directional, non-Gaussian description. The objective is not merely to demonstrate computational feasibility, but to assess the theoretical claims on real data, specifically the boundary-adaptive behavior of the Dirichlet kernel and the capacity of geometric conditional quantiles to reveal heterogeneity in the joint response distribution that is inaccessible to mean-based or marginal methods. The analysis proceeds in two stages: first, a univariate covariate scenario ($s = 1$) to establish baseline performance and facilitate comparison with conventional techniques; second, a bivariate covariate scenario ($s = 2$) to demonstrate the full power of the multivariate simplex-adaptive methodology.
5.1. Univariate Compositional Covariate: Sand-Normalized Texture
We begin by considering a simplified, yet scientifically relevant, setting where the covariate is one-dimensional. This allows for a transparent exposition of the estimator’s properties before confronting the full complexity of a bivariate simplex. The covariate is the Sand-norm, a normalized measure of sand content in the soil, which by construction lies in the unit interval and therefore constitutes a one-dimensional simplex. The bivariate response vector collects the concentrations (in mg/kg) of zinc (Zn) and copper (Cu), measured by X-ray fluorescence.
The estimation procedure follows the protocol delineated in Section 2. For a fixed evaluation point $x$, the Dirichlet kernel reduces to the Beta kernel with evaluation-point-dependent shape parameters and bandwidth $b$. The geometric conditional quantile is computed for three directional indices, where $u = 0$ corresponds to the conditional geometric median, and the two nonzero directions represent opposing directional excursions into the upper and lower quadrants of the response space, respectively.
5.1.1. Point Estimates and Asymptotic Standard Errors
The results are reported in Table 1, with asymptotic standard errors obtained from the plug-in covariance estimator derived in Theorem 5. The bandwidth is selected at the optimal MSE rate derived in Corollary 1 for interior points.
5.1.2. Interpretation and Theoretical Validation
The results provide empirical support for several key theoretical predictions:
Consistency and monotonic trends: As x increases, indicating a shift toward sandier soils, the estimated geometric median for both metals decreases monotonically (Zn from 70.671 to 55.628; Cu from 17.561 to 11.815). This aligns with the geochemical intuition that sandier soils, characterized by lower clay and organic matter content, have a diminished capacity to retain trace metals. The monotonic decrease is consistent with a conditional median function that is smooth and decreasing, in line with the smoothness assumed in the bias expansion of Theorem 2.
Directional asymmetry and distributional heterogeneity: The two directional quantiles systematically lie, respectively, above and below the median across all covariate values. This is not a mere location shift: the gap between these directional estimates widens as x increases (Table 1), indicating a subtle yet detectable increase in the conditional distribution’s directional dispersion with sand content. This widening gap is an empirical manifestation of the directional heterogeneity captured by the geometric quantile framework, which would be entirely absent in a median-only analysis.
Variance inflation and inferential significance: The standard errors, while slightly larger for the directional estimates, remain an order of magnitude smaller than the directional spreads, indicating that the observed asymmetries are statistically significant. Notably, as x moves from 0.333 to 0.606, the ASE of one reported directional quantile component increases from 0.445 to 0.523, while that of another decreases from 0.178 to 0.115 (Table 1). This non-uniform variance behavior is consistent with the variance analysis in Theorem 3: the larger asymptotic variance of quantiles with nonzero direction appears here as larger standard errors, and the variance structure changes as the covariate moves relative to the boundary of the unit interval.
5.2. Bivariate Compositional Covariate: Sand and Silt Interplay
We now extend the analysis to its full multivariate form, with a bivariate covariate whose components are the sand and silt fractions. This vector lives on the two-dimensional simplex, a domain whose geometry is fundamentally different from a Cartesian product. The response is bivariate, with components pH (soil acidity) and TOC (total organic carbon, %). The bandwidth is set at a rate that respects the condition required for the asymptotic theory in dimension $s = 2$.
5.2.1. Point Estimates and Asymptotic Standard Errors
The estimation is performed at a set of design points spanning the interior of the simplex, with coordinates chosen to represent distinct textural classes (sandy, loamy, and silty compositions). The results are presented in Table 2, where the geometric median ($u = 0$) and the two opposing directional quantiles are reported.
5.2.2. Visualizing the Conditional Structure: Directional Quantile Surfaces
To complement the tabular results, Figure 3 provides a visual representation of the fitted conditional quantile surfaces. Panel (a) displays the scatter plot of the first response component versus the bivariate covariate, overlaid with the estimated geometric median surface ($u = 0$) and a directional quantile surface for a nonzero direction. Panel (b) presents the analogous visualization for the second response component. The surfaces are constructed by evaluating the estimator on a fine grid of points within the simplex.
5.2.3. Interpretation and Theoretical Synthesis
This bivariate analysis yields a richer and more nuanced picture than the univariate case, directly illustrating the value of the multivariate geometric framework. Several distinct phenomena are discernible, each connecting directly to the theoretical results established in Section 3.
Nonlinear and non-monotonic conditional structures: The behavior of pH is monotone in the sand-silt composition: as the evaluation point moves from the silt-rich region to the sand-rich region, the conditional median declines steadily from 6.364 to 5.437, corroborating the known acidifying effect of sandy soils. The TOC response, however, exhibits a non-monotone, and thus more complex, pattern. Its median first increases from 1.699 to 2.040 as the composition shifts to a more loamy balance, before decreasing to 1.820 at the most sand-dominated point. This behavior, which suggests a hump-shaped relationship between carbon storage and soil texture, is a prime example of the kind of structure that a mean-based regression would likely oversmooth. The ability of our estimator to capture this non-monotonicity is a direct consequence of the local weighting scheme, which, as shown in the bias expansion (Theorem 2), allows for flexible adaptation to the underlying smooth conditional quantile function.
Directional asymmetry and the geometry of the joint distribution: The directional quantiles reveal a striking asymmetry. For every design point, the quantile in the upper direction is significantly larger than the median, while the quantile in the opposite direction is significantly smaller, for both pH and TOC simultaneously. This is consistent with a joint conditional distribution that is not centrally symmetric. More importantly, the magnitude of this asymmetry evolves with the covariate. The directional spread, measured by the gap between the two opposing directional quantiles, increases as the composition becomes dominated by sand (Table 2). This widening gap is consistent with the theoretical prediction in Theorem 3: as the evaluation point approaches the boundary of the simplex, the asymptotic variance inflates by a factor that depends on the number of coordinates approaching zero. Here, the relevant boundary is the face where the silt fraction is small and the clay fraction is also small, which increases the codimension of the face. This leads to a larger effective variance, which in our empirical results is manifested as a greater sensitivity of the directional quantile estimates to compositional changes near the boundary, as reflected in the widening spread and the non-uniform standard errors.
Inferential significance and the role of the covariance matrix: The standard errors reported in Table 2 are not uniform across directions. They are systematically larger for one of the two directional quantiles, especially for the most variable component of the response. This aligns with the asymptotic variance formula of Section 3, in which one matrix factor captures the conditional second moment of the directional score. A larger conditional variance in the corresponding direction naturally translates into greater estimation uncertainty for that quantile. Crucially, despite this increased uncertainty, the estimated gaps between the two opposing directional quantiles are multiples of their respective standard errors, indicating that the observed directional heterogeneity is statistically significant and not an artifact of sampling variation. This provides empirical support for the asymptotic normality result (Theorem 5) and the validity of the plug-in covariance estimator used to construct the standard errors.
The boundary-adaptive advantage: A critical, though subtle, advantage of the Dirichlet kernel becomes apparent when considering the design points in the bivariate analysis. The most sand-dominated design point lies close to the boundary of the simplex, where both the silt fraction and the clay fraction are small. A conventional symmetric kernel would assign substantial weight to points outside the simplex, leading to boundary bias and potentially distorting the estimates. The Dirichlet kernel, by construction, respects the support, as its density is zero outside the simplex. The fact that the estimates at this near-boundary point are stable and exhibit the expected geochemical trends (e.g., lower pH, lower TOC) provides indirect validation of the boundary-adaptive property asserted in the introduction.
5.3. Discussion: Synthesis of Empirical Findings and Theoretical Implications
The GEMAS analysis substantiates the methodological and theoretical contributions of this paper in several key respects, moving beyond mere illustration to a rigorous empirical validation.
1. Structural fidelity and support preservation: The use of the Dirichlet kernel ensures that the estimator respects the compositional geometry of the covariate space. This is not a marginal technical point; it is a prerequisite for meaningful inference, as conventional kernel methods would assign mass to infeasible regions near the simplex boundary, introducing an uncontrolled source of bias that is particularly severe in the bivariate case. The stability of our estimates near the boundary, as seen in the bivariate analysis, demonstrates the practical efficacy of this support-adaptive smoothing.
2. Detection of complex conditional structure: The bivariate analysis reveals that the conditional dependence of on soil texture cannot be reduced to a simple monotone or linear relationship. The non-monotonic median of and the directional asymmetry that evolves across the simplex demonstrate that the joint conditional distribution is subject to both shape and location variation. Such phenomena are beyond the scope of mean regression or marginal quantile analysis, confirming the necessity of the geometric quantile framework.
3. Empirical validation of boundary asymptotics: The observed increase in directional spread and the non-uniformity of standard errors as the covariate approaches the sand-dominated corner of the simplex provide empirical corroboration for the theoretical variance regime established in Theorem 3. The estimator’s behavior near the boundary is not a deficiency but a correctly calibrated reflection of the intrinsic difficulty of local inference in that region, a difficulty that is accurately captured by the inflation factor. This is a novel empirical contribution, linking the asymptotic theory directly to observable finite-sample behavior.
4. Actionable scientific insight: The directional asymmetry revealed by the quantile surfaces offers geochemically interpretable information. The fact that the quantile, which weights the upper tails of both and , is more responsive to changes in soil texture than its counterpart, suggests that the mechanism governing the upper joint distribution of acidity and organic carbon is more sensitive to textural composition than the mechanism governing the lower joint distribution. This differential sensitivity, which would remain hidden in a univariate analysis, provides a refined hypothesis about the underlying pedological processes. Specifically, it suggests that the factors that simultaneously drive high pH and high carbon storage (e.g., calcium-rich parent material, stable soil aggregates) are more strongly modulated by soil texture than the factors that drive low pH and low carbon.
5. Inferential framework in practice: The construction of standard errors via the plug-in covariance estimator , grounded in Theorem 5, provides a practical tool for uncertainty quantification. The reported ASEs allow for formal hypothesis testing (e.g., comparing to ) and the construction of confidence ellipsoids, which, as shown in the simulation study, achieve near-nominal coverage in moderate sample sizes. This transforms the estimator from a point estimate into a fully inferential tool.
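The confidence ellipsoids mentioned in the last item can be sketched for a bivariate response as follows. This is a generic Wald-type construction under asymptotic normality, not the paper's implementation: the function name is hypothetical, and the covariance argument is assumed to be the plug-in covariance of the estimate already divided by the appropriate normalizing rate. For two degrees of freedom the chi-square quantile has the closed form -2 log(1 - level), so no external library is needed.

```python
import math


def in_confidence_ellipse(theta, theta_hat, cov, level=0.95):
    """Return True if the 2-vector theta lies in the Wald ellipse around theta_hat.

    cov is the estimated 2x2 covariance matrix of theta_hat, supplied as
    ((a, b), (b, c)) and assumed to be already scaled by the sample size.
    """
    (a, b), (_, c) = cov
    det = a * c - b * b
    if a <= 0.0 or det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    dx = theta[0] - theta_hat[0]
    dy = theta[1] - theta_hat[1]
    # Quadratic form (theta - theta_hat)' cov^{-1} (theta - theta_hat),
    # using the explicit inverse of a symmetric 2x2 matrix.
    stat = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    # Chi-square quantile with 2 degrees of freedom, in closed form.
    cutoff = -2.0 * math.log(1.0 - level)
    return stat <= cutoff


identity = ((1.0, 0.0), (0.0, 1.0))
inside = in_confidence_ellipse((1.0, 1.0), (0.0, 0.0), identity)
outside = in_confidence_ellipse((3.0, 0.0), (0.0, 0.0), identity)
```

A hypothesis test that a directional quantile equals a given vector can be read off the same quadratic form: the null value is rejected at the chosen level when it falls outside the ellipse.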
In conclusion, the empirical investigation of the GEMAS dataset serves as a rigorous proof-of-concept and an integral part of the paper’s contribution. It demonstrates that the boundary-adaptive geometric conditional quantile estimator, grounded in the theoretical framework of Section 2 and Section 3, is not only computationally implementable but, more importantly, capable of extracting scientifically meaningful and statistically reliable information from complex, constrained multivariate data. The analysis validates the core theoretical claims (the boundary-adaptive bias reduction, the directional variance inflation, and the asymptotic normality) and illustrates the unique inferential advantages conferred by the synergistic combination of Dirichlet kernel smoothing and geometric quantile regression. The empirical findings thus reinforce the paper’s central thesis: that respecting the geometry of the covariate space and the directional nature of the response is essential for reliable and insightful conditional inference in modern statistical applications.
6. Conclusions and Perspectives
6.1. Synthesis of Contributions
This paper has introduced and systematically analyzed a novel nonparametric estimator for geometric conditional quantiles when the covariate lies on the simplex. The methodological core resides in the synergistic combination of two conceptually powerful yet hitherto separate ideas: the geometric quantile framework, which confers directional interpretability and convexity properties, and the boundary-adaptive Dirichlet kernel construction, which generalizes the beta kernel paradigm to multivariate compositional covariates. By letting the kernel shape parameters depend deterministically on the evaluation point, we achieve intrinsic support alignment and eliminate boundary bias without recourse to ad hoc corrections. The resulting estimator inherits both the geometric coherence of its population counterpart and the boundary-respecting properties of the Dirichlet weighting scheme, rendering it particularly well-suited for compositional data analysis, spatial econometrics, and multivariate risk assessment.
From a theoretical perspective, the paper delivers a comprehensive asymptotic characterization that goes considerably beyond existing results for either univariate conditional quantiles or unconditional geometric quantiles. The Bahadur-type linear expansion established herein represents the first such result for a conditional geometric quantile estimator with boundary-adaptive weighting. This expansion identifies the pivotal matrix encoding the local geometric structure and establishes a sharp remainder rate that reflects the interplay between the kernel’s effective support and the smoothness of the conditional distribution. Crucially, the expansion provides the foundational tool from which all subsequent asymptotic properties are derived.
The bias analysis reveals the precise manner in which simplex geometry influences estimation error, incorporating boundary-adaptation terms with coefficients that depend explicitly on the simplex coordinates. This structure originates from the evaluation-point-dependent kernel moments and represents a significant departure from conventional kernel smoothing theory, where bias expansions typically involve only the marginal density and regression function derivatives. Perhaps most striking is the variance behavior, which exhibits a fundamental phase transition as the evaluation point approaches the boundary: each coordinate direction approaching the boundary inflates the asymptotic variance by a factor that diverges at a specific rate, a phenomenon rigorously derived from the asymptotic behavior of Gamma function ratios in the Dirichlet kernel. This boundary-induced variance inflation is not a defect of the estimator but an intrinsic feature of estimation near the support boundary, and its explicit characterization enables proper uncertainty quantification in such regions.
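The role of Gamma function ratios can be made concrete with a small numerical sketch. The integral of a squared Dirichlet density has the closed form B(2α - 1) / B(α)², where B is the multivariate beta function, and a quantity of this type typically drives the variance of kernel-weighted estimators. The parametrization α = s / b + 1 and the function names below are assumptions made for illustration; the sketch does not reproduce the paper's exact inflation factor.

```python
import math


def log_multivariate_beta(alpha):
    """Logarithm of the multivariate beta function B(alpha)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def dirichlet_sq_norm(alpha):
    """Integral of the squared Dirichlet(alpha) density over the simplex.

    Equals B(2 * alpha - 1) / B(alpha) ** 2, valid when every alpha_i > 1/2.
    """
    doubled = [2.0 * a - 1.0 for a in alpha]
    return math.exp(log_multivariate_beta(doubled) - 2.0 * log_multivariate_beta(alpha))


def kernel_sq_norm(s, b):
    """Squared L2 norm of the kernel at evaluation point s with bandwidth b.

    Assumed parametrization (hypothetical, for illustration): alpha = s / b + 1.
    """
    return dirichlet_sq_norm([si / b + 1.0 for si in s])


interior = kernel_sq_norm((1 / 3, 1 / 3, 1 / 3), b=0.01)
near_corner = kernel_sq_norm((0.98, 0.01, 0.01), b=0.01)
```

With the same bandwidth, the value near the corner is markedly larger than the value at the barycenter, which mirrors the qualitative message of the paragraph: the stochastic error grows as coordinates of the evaluation point approach zero.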
The mean squared error analysis synthesizes these bias and variance contributions to establish the optimal bandwidth rate, which depends on both the dimensionality and the proximity to the boundary. The asymptotic normality result, obtained under a nonstandard rescaling that accounts for the boundary-dependent convergence rates, provides the theoretical foundation for inference. By constructing plug-in covariance estimators and asymptotic confidence ellipsoids, we enable directional inference across the entire spectrum of quantile indices, from central tendency to extremal directions.
Beyond these theoretical contributions, the real-data application in Section 5 provides a concrete empirical validation of the proposed methodology. The analysis of the GEMAS dataset shows that the estimator is capable of revealing meaningful directional heterogeneity in the joint conditional behavior of and as functions of normalized soil-composition covariates. In particular, for the Sand-norm covariate, the estimated geometric conditional median and directional quantiles display a clear decreasing trend as the covariate level increases, indicating that larger sand proportions are associated with lower conditional levels of both zinc and copper. By contrast, for the Silt-norm covariate, the fitted quantiles exhibit a generally increasing pattern, thereby suggesting an opposite conditional effect. These empirical findings illustrate that the proposed estimator is not merely a theoretical construct, but an effective tool for uncovering structured multivariate conditional relationships in constrained compositional settings.
The GEMAS study also highlights an important substantive feature of the method: its ability to detect directional asymmetry in the conditional response distribution. The noticeable separation between the geometric conditional median and the directional quantiles associated with and demonstrates that the conditional distribution of the response cannot be adequately summarized by a single central regression surface. Rather, the multivariate response cloud exhibits directional variation whose magnitude and orientation depend on the soil composition. In this respect, the real-data analysis confirms one of the main motivations of the paper: geometric conditional quantiles furnish a substantially richer description of conditional structure than mean-based or median-only approaches.
Furthermore, the graphical analysis reported in Section 5 shows that the estimated conditional quantile curves capture nonlinear patterns and local heterogeneity more effectively than unconditional summaries. In particular, the visible deviations between the directional quantile curves and the conditional median in regions of increased spread support the practical relevance of the proposed directional framework for assessing conditional dispersion and asymmetry. Hence, the empirical results do not merely accompany the theory; they substantively reinforce the central claim of the paper that boundary-adaptive geometric quantile estimation on the simplex yields scientifically interpretable and practically useful information in applications involving compositional covariates.
6.2. Methodological Import and Positioning
Relative to the existing literature, this work occupies a distinctive position. Compared to norm-minimization quantiles, the spatial formulation yields a direct geometric interpretation and a tractable convex objective whose gradient structure meshes naturally with Dirichlet weights. Relative to beta and Bernstein estimators on the unit interval, our simplex-based Dirichlet kernels generalize boundary adaptivity to multivariate covariates with compositional constraints and reveal a clean asymptotic scaling law near faces and edges of varying codimension. When the directional parameter vanishes, the estimator reduces to the conditional geometric median, and our framework recovers and extends existing results for this special case under weaker conditions. More broadly, the analysis clarifies how proximity to the boundary governs stochastic error, thereby informing bandwidth selection and the interpretation of confidence regions in practice.
The real-data illustration further clarifies this positioning. In the GEMAS application, the covariates are normalized soil texture components and therefore naturally lie in a constrained domain where support respect is not optional but structurally required. In such a setting, the use of Dirichlet kernels is particularly well justified: the method respects the geometry of the covariate space, avoids artificial leakage outside the admissible domain, and remains interpretable near the edges of the support. The observed stability of the estimated conditional quantiles in that application provides empirical support for the claim that boundary adaptivity is not merely a formal refinement, but a practically consequential feature of the proposed methodology.
The estimator is computationally tractable, amenable to iteratively reweighted least squares and Weiszfeld-type algorithms, and readily implementable with weight reuse across grids of covariate values. Numerical experiments corroborate the theoretical predictions: the Dirichlet weighting yields stable estimation in the interior and demonstrably mitigates boundary distortion, while plug-in covariance coupled with moderate bootstrap calibration delivers reliable finite-sample confidence regions. The contour analyses across directional indices reveal the estimator’s ability to capture the local anisotropic structure of the conditional distribution, with ellipsoid orientation and eccentricity reflecting the underlying covariance geometry.
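A Weiszfeld-type iteration of the kind alluded to above can be sketched as follows. The sketch assumes the usual spatial-quantile loss, the weighted sum of |y_i - θ| + ⟨u, y_i - θ⟩ over observations with a direction u inside the unit ball, and takes the kernel weights as given. The function names, the starting value, and the small constant guarding against division by zero are illustrative choices, not the authors' implementation.

```python
import math


def objective(theta, ys, weights, u):
    """Weighted geometric-quantile loss: sum_i w_i (|y_i - theta| + <u, y_i - theta>)."""
    total = 0.0
    for y, w in zip(ys, weights):
        inner = sum(uj * (yj - tj) for uj, yj, tj in zip(u, y, theta))
        total += w * (math.dist(y, theta) + inner)
    return total


def weiszfeld_quantile(ys, weights, u, tol=1e-10, max_iter=1000, eps=1e-12):
    """Fixed-point iteration for the weighted geometric u-quantile.

    Each step solves the first-order condition with the distances frozen:
    theta = (sum_i w_i y_i / r_i + u * sum_i w_i) / (sum_i w_i / r_i).
    """
    dim = len(u)
    w_sum = sum(weights)
    # Start from the weighted mean of the responses.
    theta = [sum(w * y[j] for y, w in zip(ys, weights)) / w_sum for j in range(dim)]
    for _ in range(max_iter):
        numerator = [w_sum * uj for uj in u]
        denominator = 0.0
        for y, w in zip(ys, weights):
            r = max(math.dist(y, theta), eps)  # guard against a zero distance
            denominator += w / r
            for j in range(dim):
                numerator[j] += w * y[j] / r
        new_theta = [nj / denominator for nj in numerator]
        shift = math.dist(new_theta, theta)
        theta = new_theta
        if shift < tol:
            break
    return theta


ys = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
weights = [0.25, 0.25, 0.25, 0.25]
median = weiszfeld_quantile(ys, weights, (0.0, 0.0))
upper = weiszfeld_quantile(ys, weights, (0.3, 0.0))
```

Because the weights enter only through the sums above, a weight vector computed once for a covariate value can be reused across all directions u at that value; only the term involving u changes between runs.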
6.3. Limitations and Avenues for Extension
Notwithstanding these contributions, several limitations merit acknowledgment and suggest directions for future investigation. First, the asymptotic results are pointwise in the covariate and directional parameter, assuming independent and identically distributed observations with smooth conditional structure. Uniform inference over regions of the covariate space or over grids of directional indices remains an open challenge, yet such results are essential for constructing simultaneous confidence bands for quantile contours and for controlling family-wise error in exploratory analyses.
Second, the bandwidth selection problem, while theoretically characterized through the mean squared error optimal rate, lacks a fully data-driven implementation tailored to the geometric quantile loss; see [64, 65, 66]. Plug-in rules, cross-validation procedures that target the quantile risk, and Lepski-type adaptive methods warrant systematic development, with particular attention to their behavior near the boundary where the effective sample size exhibits spatial heterogeneity.
Third, the extension to higher-dimensional covariates raises important questions about dimension reduction and structural assumptions. For compositional covariates on the simplex, the Dirichlet construction remains valid for arbitrary dimension, but the effective local sample size decays rapidly, necessitating either sparsity-inducing penalties, low-dimensional index models, or dimension reduction techniques that respect the compositional geometry. The interplay between the simplex dimension and the boundary codimension in determining optimal rates merits further investigation.
Fourth, the framework currently assumes independence across observations, yet many potential applications involve temporal dependence, spatial correlation, or network-structured data. Extending the Bahadur representation and central limit theorem to weakly dependent processes under appropriate mixing conditions would substantially broaden the estimator’s applicability to time series econometrics and spatial statistics.
Fifth, while the geometric quantile formulation inherently provides robustness through the use of Euclidean norms, extreme directions approaching the boundary of the unit ball remain sensitive to outliers and heavy-tailed phenomena. Robustification through Huberized losses or redescending influence functions, while preserving first-order asymptotic properties, could enhance stability in extremal inference.
Sixth, the compositional nature of the covariate space suggests deeper connections with Aitchison geometry. Integrating log-ratio transformations with the Dirichlet weighting scheme could yield estimators that are invariant under the natural operations of compositional data analysis, enhancing interpretability in geochemical, ecological, and economic applications where compositions arise naturally. This direction appears especially promising in light of the GEMAS application, where the covariates arise precisely from normalized soil-composition variables and where relative rather than absolute dominance between components may carry the primary scientific signal.
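As a reminder of what a log-ratio transformation does, the following minimal sketch implements the centered log-ratio (clr) map, one standard choice in Aitchison geometry. It is included only to illustrate the scale invariance referred to above; how such a map would be combined with Dirichlet weighting is left open in the text, and the function name is an illustrative choice.

```python
import math


def clr(composition):
    """Centered log-ratio transform of a composition with strictly positive parts."""
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


closed = clr([0.6, 0.3, 0.1])   # parts summing to one
rescaled = clr([6.0, 3.0, 1.0])  # same composition in different units
```

The two outputs coincide up to rounding, and each sums to zero: the transform depends only on ratios between parts, which is the sense in which relative rather than absolute dominance between components is retained.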
Seventh, incomplete data mechanisms—missing at random, missing not at random, and censoring—are pervasive in practice. Adapting the estimator to such settings under appropriate identification conditions, and establishing semiparametric efficiency bounds, would extend its utility to survival analysis and longitudinal studies with attrition.
Finally, nonasymptotic analysis in the form of concentration inequalities and finite-sample coverage guarantees for the confidence ellipsoids would complement the asymptotic theory and provide guidance for practice in moderate sample sizes. Such results typically require stronger tail conditions but yield valuable insights into the estimator’s behavior beyond the first-order asymptotics.
6.4. Concluding Remarks
Boundary-respecting Dirichlet kernels provide a principled and effective vehicle for multivariate conditional quantile estimation on the simplex. The resulting estimators admit precise asymptotics, are computationally viable, and exhibit robust boundary performance, positioning the approach as a sound default for geometric conditional inference. By unifying the geometric quantile paradigm with location-adaptive smoothing, this work bridges two previously disparate literatures and opens new perspectives for robust conditional analysis in econometric, financial, environmental, and stochastic modeling applications.
The real-data analysis presented in Section 5 gives this conclusion a concrete empirical foundation. It shows that the proposed methodology is capable of extracting meaningful and nontrivial scientific information from compositional environmental data: it identifies opposite conditional trends for sand- and silt-dominated soils, reveals directional asymmetry in the joint conditional behavior of and , and captures nonlinear variation that is obscured by more classical summaries. These findings illustrate that the theoretical developments of the paper translate into genuine inferential gains in practice. In that sense, the GEMAS application does not merely serve as an illustration; it confirms the operational relevance of the proposed framework and demonstrates that boundary-adaptive geometric conditional quantiles can provide refined and interpretable insight in realistic multivariate regression problems.
The theoretical foundations established herein—the Bahadur expansion, the boundary-dependent variance regimes, the optimal bandwidth characterization, and the asymptotic normality with ellipsoidal inference—therefore acquire additional significance when viewed through the lens of the real-data evidence. The proposed estimator is not only mathematically well founded, but also practically informative in settings where the covariates are compositional and the response is multivariate. As such, this paper contributes both a rigorous asymptotic theory and an applied statistical methodology for recovering directional conditional structure on constrained domains, and it is our hope that it will stimulate further developments at the interface of geometric quantile theory, compositional data analysis, and boundary-adaptive nonparametric inference.