A Note on the Nonparametric Estimation of the Conditional Mode by Wavelet Methods

Abstract: The purpose of this note is to introduce and investigate the nonparametric estimation of the conditional mode using wavelet methods. We propose a new linear wavelet estimator for this problem. The estimator is constructed by combining a specific ratio technique and an established wavelet estimation method. We obtain rates of almost sure convergence over compact subsets of $\mathbb{R}^d$. A general estimator beyond the wavelet methodology is also proposed, with a discussion of adaptivity within this statistical framework.


Introduction
Let us start by describing the mathematical context of the study. Consider a sequence of independent $\mathbb{R}^d \times \mathbb{R}$-valued random vectors $(X_i, Y_i)_{i \ge 1}$ defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The distribution and probability density functions of $(X_1, Y_1)$ are denoted by $F(\cdot)$ and $f(\cdot)$, respectively. The conditional density function of $Y_1$ given $X_1 = x$ is denoted by $f(\cdot \mid x)$. Assuming that $f(\cdot \mid x)$ is unimodal, the conditional mode of $Y_1$ given $X_1 = x$, denoted by $\Theta(x)$, is defined by the following equation:

$$\Theta(x) = \arg\max_{y \in \mathbb{R}} f(y \mid x).$$

A kernel estimator of the conditional mode $\Theta(x)$ is defined as the random variable $\widehat{\Theta}(x)$ that maximizes the kernel estimator $\widehat{f}(\cdot \mid x)$ of $f(\cdot \mid x)$. Kernel-type estimators have been studied in depth in different dependency contexts. See [1][2][3][4][5][6][7], to name a few. The authors of [4] motivated the use of the conditional mode by pointing out that the prediction of the $Y$-values given the $X$-values is usually done by regression function estimation. However, such an approach is not reasonable in certain circumstances, including the case where the conditional density of $Y$ given $X$ is far from being a symmetric function, or even worse when it is not unimodal. Prediction via conditional mode estimation is in these cases much more appropriate. In this regard, we may refer to [7].

Most kernels in use are symmetric and, once chosen, remain fixed. This can be effective for estimating curves with unbounded supports, but not for those with compact support or with support on a subset of the real line. The same holds for curves having discontinuities at boundary points. An interesting feature of wavelet methods in statistics is that they provide efficient procedures under mild assumptions on the regularity of the unknown function. In addition, procedures that adapt to the regularity of the object to be estimated can be developed. These procedures are called adaptive. The complete background on wavelets can be found in [8][9][10][11]. The use of wavelets in various curve estimation problems is studied in [12]. We also refer to the articles [13][14][15][16][17][18][19].
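To make the kernel approach recalled above concrete, here is a minimal numerical sketch for $d = 1$. It is an illustration under our own assumptions, not the estimator of any cited reference: the function name `kernel_conditional_mode`, the Gaussian kernels, the bandwidths and the simulated gamma model are all hypothetical choices. The conditional law is deliberately skewed, so that the conditional mode ($x + 0.5$) differs from the conditional mean ($x + 1$).

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kernel_conditional_mode(x0, X, Y, h_x, h_y, y_grid):
    """Maximize a kernel estimate of f(y | x0) over y_grid (case d = 1)."""
    w = gaussian_kernel((x0 - X) / h_x)  # kernel weights in the x direction
    # Kernel estimate of f(y | x0) on the grid: weighted average of kernels in y.
    k_y = gaussian_kernel((y_grid[:, None] - Y[None, :]) / h_y)
    dens = (k_y @ w) / (h_y * w.sum())
    return y_grid[np.argmax(dens)]

rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(0.0, 1.0, n)
# Assumed toy model: Y | X = x is x + Gamma(shape=2, scale=0.5),
# whose mode is x + 0.5 while its mean is x + 1.0.
Y = X + rng.gamma(shape=2.0, scale=0.5, size=n)

y_grid = np.linspace(-1.0, 5.0, 601)
mode_hat = kernel_conditional_mode(0.5, X, Y, h_x=0.1, h_y=0.2, y_grid=y_grid)
print(mode_hat)  # should be near the true mode 1.0, not the mean 1.5
```

With a skewed conditional density such as this one, a regression estimate would target the mean, which is why the mode can be the more natural predictor.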
In the present work, we propose a wavelet-based estimator of the conditional mode constructed from the methodology of [4] and an efficient linear wavelet estimator introduced by [20]. We establish its rates of almost sure convergence over compact subsets of $\mathbb{R}^d$ under mild assumptions on the unknown functions involved; Besov spaces are considered. A general method is also discussed, beyond the wavelet estimation setting. To the best of our knowledge, the developed methodology and results address a problem that has not been studied systematically so far, which is the basic motivation of this work.
The note is organized as follows. In Section 2, we present the estimator that we consider throughout this note. In Section 3, we state the main assumptions and give the main theoretical result, with discussions. Section 4 is about a general method allowing more flexibility in the choice of the estimation method. Section 5 concludes the note.

Estimation Procedures
The mathematical context of the multiresolution analysis as well as the considered estimator are presented in this section.

Multiresolution Analysis
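We start by giving some notation and definitions that are needed for later use. For an accessible introduction to the theory of wavelets, we refer to [8][9][10][11]. According to [8], a multiresolution analysis on the Euclidean space $\mathbb{R}^d$ can be seen as a decomposition of the space $L_2(\mathbb{R}^d)$ into an increasing sequence of closed subspaces $\{V_j : j \in \mathbb{Z}\}$ in such a way that

$$\bigcap_{j \in \mathbb{Z}} V_j = \{0\}, \qquad \overline{\bigcup_{j \in \mathbb{Z}} V_j} = L_2(\mathbb{R}^d), \qquad f(\cdot) \in V_j \iff f(2\,\cdot) \in V_{j+1},$$

where $V_0$ is closed under integer translations. Assume that there exists a scale function $\phi(\cdot) \in L_2(\mathbb{R}^d)$ such that

$$\left\{\phi_{j,k}(x) = 2^{jd/2}\phi(2^j x - k) : k \in \mathbb{Z}^d\right\}$$

is an orthonormal basis for the subspace $V_j$. We call the multiresolution analysis $r$-regular if $\phi(\cdot) \in C^r$ and all its partial derivatives up to total order $r$ are rapidly decreasing, i.e., for every integer $i > 0$, there exists a constant $A_i > 0$ such that, for all $|\beta| \le r$,

$$\left|(\partial^{\beta}\phi)(x)\right| \le \frac{A_i}{(1 + \|x\|)^{i}},$$

where $\partial^{\beta}\phi = \partial^{|\beta|}\phi / (\partial x_1^{\beta_1} \cdots \partial x_d^{\beta_d})$, $\beta = (\beta_1, \ldots, \beta_d)$ and $|\beta| = \beta_1 + \cdots + \beta_d$. The multiresolution analysis is assumed to be $r$-regular in our framework. Let us denote by $W_j$ the orthogonal complement of $V_j$ in $V_{j+1}$, in the sense that $V_j \oplus W_j = V_{j+1}$. This implies that the space $L_2(\mathbb{R}^d)$ can be decomposed as

$$L_2(\mathbb{R}^d) = \bigoplus_{j \in \mathbb{Z}} W_j = V_{j_0} \oplus \bigoplus_{j \ge j_0} W_j, \qquad j_0 \in \mathbb{Z}.$$

Moreover, there exist $N = 2^d - 1$ wavelet functions $\{\psi_i(\cdot),\ i = 1, \ldots, N\}$ satisfying the following:

(i) $\{\psi_i(\cdot - k) : k \in \mathbb{Z}^d,\ i = 1, \ldots, N\}$ is an orthonormal basis of $W_0$;
(ii) $\{\psi_{i,j,k}(x) = 2^{jd/2}\psi_i(2^j x - k) : k \in \mathbb{Z}^d,\ j \in \mathbb{Z},\ i = 1, \ldots, N\}$ is an orthonormal basis of $L_2(\mathbb{R}^d)$;
(iii) $\psi_i(\cdot)$ has the same regularity as $\phi(\cdot)$ and both functions are compactly supported on $[-L, L]^d$ for some $L > 0$.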
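For any function $f(\cdot)$ assumed to be in $L_2(\mathbb{R}^d)$ and for any integer $m$, we have the orthonormal representation

$$f(x) = \sum_{k \in \mathbb{Z}^d} a_{m,k}\,\phi_{m,k}(x) + \sum_{j \ge m} \sum_{i=1}^{N} \sum_{k \in \mathbb{Z}^d} b_{i,j,k}\,\psi_{i,j,k}(x),$$

where

$$a_{m,k} = \int_{\mathbb{R}^d} f(u)\,\phi_{m,k}(u)\,du, \qquad b_{i,j,k} = \int_{\mathbb{R}^d} f(u)\,\psi_{i,j,k}(u)\,du.$$

One can remark that the orthogonal projection of $f(\cdot)$ on $V_\ell$ may be written in two equivalent ways: for any $m \le \ell$,

$$P_{V_\ell} f(x) = \sum_{k \in \mathbb{Z}^d} a_{\ell,k}\,\phi_{\ell,k}(x) = \sum_{k \in \mathbb{Z}^d} a_{m,k}\,\phi_{m,k}(x) + \sum_{j=m}^{\ell-1} \sum_{i=1}^{N} \sum_{k \in \mathbb{Z}^d} b_{i,j,k}\,\psi_{i,j,k}(x).$$

Recall that a function $f(\cdot) \in L_2(\mathbb{R}^d)$ belongs to the Besov ball $B_{s,p,q}$ if and only if there exists a constant $C > 0$ such that $J_{s,p,q}(f) \le C$, where $J_{s,p,q}(f)$ denotes the wavelet-based Besov norm of $f$ recalled in Appendix A, with the usual modifications for the cases $p = \infty$ or $q = \infty$. For a more general definition of Besov spaces, we refer to Appendix A.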
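The equivalence between the two ways of writing the projection on $V_\ell$ can be checked numerically. The sketch below is an illustration under simplifying assumptions that are ours: $d = 1$, the Haar basis on $[0, 1)$ (which is not $r$-regular for $r \ge 1$, so it only illustrates the algebra), and a function that is piecewise constant on a dyadic grid of $2^J$ cells. The helper names `haar_step`, `phi_expand` and `psi_expand` are hypothetical.

```python
import numpy as np

def haar_step(a):
    """One Haar analysis step: level j+1 scaling coefficients -> level j (a, b)."""
    even, odd = a[0::2], a[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def phi_expand(a, j, J):
    """Evaluate sum_k a[k] * phi_{j,k} on the dyadic grid of 2**J cells of [0, 1)."""
    return np.repeat(2.0 ** (j / 2.0) * a, 2 ** (J - j))

def psi_expand(b, j, J):
    """Evaluate sum_k b[k] * psi_{j,k} on the same grid (requires j < J)."""
    half = 2 ** (J - j - 1)
    pattern = np.concatenate([np.ones(half), -np.ones(half)])  # Haar wavelet shape
    return np.kron(2.0 ** (j / 2.0) * b, pattern)

J, ell, m = 8, 5, 2
grid = (np.arange(2 ** J) + 0.5) / 2 ** J
f_vals = np.sin(2 * np.pi * grid) + grid ** 2  # piecewise-constant stand-in for f

# Scaling coefficients at the finest level J, then down to every coarser level.
a = {J: f_vals * 2.0 ** (-J / 2.0)}
b = {}
for j in range(J - 1, m - 1, -1):
    a[j], b[j] = haar_step(a[j + 1])

# Way 1: scaling functions at level ell only.
proj_scaling = phi_expand(a[ell], ell, J)
# Way 2: scaling functions at level m plus wavelet details for m <= j < ell.
proj_mixed = phi_expand(a[m], m, J) + sum(psi_expand(b[j], j, J) for j in range(m, ell))

print(np.max(np.abs(proj_scaling - proj_mixed)))  # zero up to rounding error
```

The two expansions agree because each pair $(a_{j,k}, b_{j,k})$ carries exactly the information of the two level-$(j+1)$ scaling coefficients it was computed from.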

Main Estimator
We consider the estimator $\widehat{\partial_y f}(y \mid x)$ of $\partial_y f(y \mid x)$ defined by the ratio

$$\widehat{\partial_y f}(y \mid x) = \frac{\widehat{\partial_y f}(y, x)}{\widehat{f}(x)},$$

where $\widehat{\partial_y f}(y, x)$ and $\widehat{f}(x)$ are linear wavelet estimators of $\partial_y f(y, x)$ and $f(x)$, respectively. Here, the resolution levels $\ell = \ell(n)$ and $m = m(n)$ tend to infinity at rates specified below. Since $\phi(\cdot)$ and $\psi_i(\cdot)$ are assumed to have compact support, the summations defining these estimators are finite for each fixed $x$ (note that in this case the support of $\phi(\cdot)$ and $\psi_i(\cdot)$ is a monotonically increasing function of their degree of differentiability [9]). We focus our attention on multivariate linear estimators, which will be shown to have uniform almost sure convergence rates over compact sets, in a similar way as in [21,22] in the setting of strongly mixing processes. The considered wavelet estimators $\widehat{\partial_y f}(y, x)$ and $\widehat{f}(x)$ are derived from those proposed in [20]. See also [23,24] for the use of this estimator in other statistical contexts.

Now, assume that there is some compact subset $C_x$ such that $f(y \mid x)$ has a unique mode $\Theta(x)$ on $C_x$. Then, our estimator of the mode is defined by assuming that there is some unique $\widehat{\Theta}(x)$ in $C_x$ such that

$$\widehat{\partial_y f}(\widehat{\Theta}(x) \mid x) = 0.$$

The idea of such a construction was proposed by [4] for kernel estimators. We extend it to the wavelet methodology, which is a nontrivial extension.
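As an illustration of what a linear wavelet estimator looks like in practice, the following sketch computes a linear wavelet density estimate $\sum_k \widehat{a}_{\ell,k}\,\phi_{\ell,k}(x)$ with empirical coefficients $\widehat{a}_{\ell,k} = n^{-1}\sum_{i} \phi_{\ell,k}(X_i)$. It relies on assumptions that are ours and not those of the note: $d = 1$, the Haar scaling function (chosen only because it has a closed form; it is not smooth enough for the derivative estimator), and a hypothetical function name `haar_linear_density`. The derivative estimator itself is not sketched here.

```python
import numpy as np

def haar_linear_density(x, sample, level):
    """Linear wavelet density estimate at points x with the Haar scaling function.

    a_hat[k] = (1/n) * sum_i phi_{level,k}(X_i), where phi is the indicator of [0, 1).
    """
    scale = 2.0 ** level
    k_sample = np.floor(scale * sample).astype(int)  # index k with phi_{level,k}(X_i) != 0
    k_x = np.floor(scale * np.asarray(x)).astype(int)
    k_min = min(k_sample.min(), k_x.min())
    counts = np.bincount(k_sample - k_min, minlength=k_x.max() - k_min + 1)
    a_hat = np.sqrt(scale) * counts / sample.size    # empirical scaling coefficients
    return np.sqrt(scale) * a_hat[k_x - k_min]       # sum_k a_hat[k] * phi_{level,k}(x)

rng = np.random.default_rng(1)
sample = rng.uniform(0.0, 1.0, 5000)                 # true density is 1 on [0, 1)
centers = (np.arange(8) + 0.5) / 8
est = haar_linear_density(centers, sample, level=3)
print(est)
```

Because the Haar scaling functions at a given level have disjoint supports, only one term of the sum is nonzero at each $x$, which is a simple instance of the finiteness of the summations noted above.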

Assumptions and Main Results
This section is devoted to our main result.

Assumptions
We formulate the following assumptions. Let $D$ be a compact set in $\mathbb{R}^d$ and $x \in D$.
These assumptions are technical and are needed to prove our main theorem.

Main Result
Theorem 1 below presents uniform rates of almost sure convergence over compact subsets of $\mathbb{R}^d$.

Proof.
We follow the lines of [4], Proof of Theorem 2. By (A.1), a Taylor expansion of the function $\partial_y f(\cdot \mid x)$ around $\Theta(x)$ holds, with $\Theta^*(x)$ being a point between $\widehat{\Theta}(x)$ and $\Theta(x)$. Making use of condition (A.2) and the definition of $C_x$, we obtain Moreover, again by (A.2), we readily infer that An appropriate decomposition and the triangle inequality yield Combining these inequalities with condition (A.5) and, for large enough $n$, This completes the proof of Theorem 1.
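The first step of this argument can be sketched as follows. This is a reconstruction under stated assumptions, not the authors' exact display: we assume that $\Theta(x)$ and $\widehat{\Theta}(x)$ are zeros of $\partial_y f(\cdot \mid x)$ and $\widehat{\partial_y f}(\cdot \mid x)$ on $C_x$, respectively, and that $|\partial_y^2 f(y \mid x)| \ge \kappa > 0$ for all $y \in C_x$, where the symbol $\kappa$ is hypothetical and stands in for the constant provided by the assumptions.

```latex
\begin{align*}
\partial_y f(\widehat{\Theta}(x) \mid x)
  &= \underbrace{\partial_y f(\Theta(x) \mid x)}_{=\,0}
     + \bigl(\widehat{\Theta}(x) - \Theta(x)\bigr)\,
       \partial_y^2 f(\Theta^*(x) \mid x), \\
\bigl|\widehat{\Theta}(x) - \Theta(x)\bigr|
  &= \frac{\bigl|\partial_y f(\widehat{\Theta}(x) \mid x)
       - \widehat{\partial_y f}(\widehat{\Theta}(x) \mid x)\bigr|}
          {\bigl|\partial_y^2 f(\Theta^*(x) \mid x)\bigr|}
   \le \frac{1}{\kappa}\,
       \sup_{y \in C_x}
       \bigl|\widehat{\partial_y f}(y \mid x) - \partial_y f(y \mid x)\bigr|.
\end{align*}
```

Under these assumptions, the error on the mode is controlled by the uniform error of the derivative estimator, which is why uniform almost sure rates for the wavelet estimators transfer to $\widehat{\Theta}(x)$.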

On a General Estimation Method: One Step to the Adaptivity
In this section, a more general estimator of the conditional mode is considered; we adopt the same notation and assumptions as above, but go beyond the wavelet methodology. Specifically, we now consider an estimator $\widehat{\partial_y f}(y \mid x)$ of $\partial_y f(y \mid x)$ defined under the following ratio form: where $I(\cdot)$ is the indicator function, $c$ refers to (A.5), $\widehat{\partial_y f}(y, x)$ is a generic estimator of $\partial_y f(y, x)$ (wavelets, kernels, splines, etc.) and $\widehat{f}(x)$ is a generic estimator of $f(x)$ (wavelets, kernels, splines, etc.). Both may be adaptive or not.
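A ratio estimator protected by an indicator can be sketched as below. This is a minimal illustration under an explicit assumption: the cutoff is taken to be $|\widehat{f}(x)| \ge c/2$, which is a common choice for such truncated ratios but is not confirmed by the text above. The function name `truncated_ratio` and the numerical values are hypothetical.

```python
import numpy as np

def truncated_ratio(num, den, c):
    """Return num / den where |den| >= c / 2 (assumed cutoff), and 0 elsewhere."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    keep = np.abs(den) >= c / 2.0                 # indicator I(|den| >= c / 2)
    out = np.zeros(np.broadcast(num, den).shape)
    np.divide(num, den, out=out, where=keep)      # divide only where the indicator is 1
    return out

# Hypothetical values of the numerator and denominator estimates at two points.
ratio = truncated_ratio([1.0, 2.0], [0.5, 0.01], c=0.2)
print(ratio)
```

The indicator prevents the ratio from blowing up where the denominator estimate is close to zero, at the price of requiring knowledge of the constant $c$.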
Under this general setting, the result below presents a general upper bound for $|\widehat{\Theta}(x) - \Theta(x)|$, with $\widehat{\Theta}(x)$ characterized by (6). This result has the feature of holding for any $n \in \mathbb{N}^*$, and to involve  (6) with the general estimator given in (10). Then, for any $n \in \mathbb{N}^*$, we have where

Proof. By putting (7) and (8) together, we have Now, based on (10), we have Owing to (A.5), the last term can be bounded as By virtue of (9) and The desired result follows by combining the above inequalities.

Remark 3.
Based on Proposition 1, the following inequalities hold. There exists a constant $C > 0$ such that the uniform risk of $\widehat{\Theta}(x)$ satisfies In particular, the choice of adaptive estimators for $\partial_y f(y, x)$ and $f(x)$ in (10) that are efficient in the uniform risk yields an adaptive and efficient estimator $\widehat{\Theta}(x)$ of $\Theta(x)$ in the uniform risk.
An adaptive wavelet estimator of $f(x)$ based on hard thresholding and efficient in the uniform risk has been developed by [25]. However, to our knowledge, an adaptive wavelet estimator of $\partial_y f(y, x)$ proved to be efficient in the uniform risk has not received particular attention, and needs further non-trivial developments beyond the scope of this note. In addition, the ratio estimator (10) has the drawback of depending on a constant $c$, which is related to $f(x)$ in the lower bound sense. Consequently, one can say that such a ratio estimator is semi-adaptive. In this regard, the "fully" adaptive nonparametric estimation of the conditional mode needs further investigation, which is left for the future.

Conclusions
This note proposed some results on the estimation of the conditional mode by wavelet methods. A ratio wavelet estimator was proposed, and we showed that it achieves fast rates of almost sure convergence over compact subsets of $\mathbb{R}^d$. A general approach was also discussed, taking a step in the direction of adaptivity. A possible perspective of this work is to study the asymptotic normality of the proposed estimator. In addition, a fully adaptive extension through wavelet thresholding or other adaptive techniques remains an open area of research. These two main directions need deeper developments, which we leave for future investigations.

Appendix A

We refer to [26,27] and the Appendix of [22] for more details. Notice that any function $f \in B_{s,p,q}$ is in $L_p(\mathbb{R}^d)$, where $s > 0$ denotes the real-valued smoothness parameter of the function $f(\cdot)$. An important characterization of $B_{s,p,q}$ based on wavelet coefficients is provided by [8]. The multiresolution analysis is assumed to be $r$-regular with $s < r$. Hence, we have $f \in B_{s,p,q}$ if and only if

$$J_{s,p,q}(f) = \|P_{V_0} f\|_{L_p} + \Bigl(\sum_{j > 0} \bigl(2^{js}\,\|P_{W_j} f\|_{L_p}\bigr)^{q}\Bigr)^{1/q} < \infty,$$

with the classical modification for the sup-norm when $q = \infty$. The condition $f \in B_{s,p,q}$ is equivalent to the following:

$$J_{s,p,q}(f) = \|a_{0\cdot}\|_{L_p} + \Bigl(\sum_{j > 0} \bigl(2^{j(s + d(1/2 - 1/p))}\,\|b_{j\cdot}\|_{l_p}\bigr)^{q}\Bigr)^{1/q} < \infty.$$