Smooth kNN Local Linear Estimation of the Conditional Distribution Function

Previous works were dedicated to the functional k-Nearest Neighbors (kNN) and local linear method estimation of a regression operator. In this paper, a sequence (X_i, Y_i), i = 1, ..., n, of functional mixing observations is considered. We treat the local linear estimation of the conditional cumulative distribution function of Y_i given the functional input variable X_i. Precisely, we combine the kNN method with the local linear algorithm to construct a new, fast, and efficient estimator of the conditional distribution function. The main purpose of this paper is to prove the strong convergence of the constructed estimator under mixing conditions. An application to functional time series prediction is used to compare the proposed estimator with existing competitive estimators and to show its efficiency and superiority.


Introduction
In the last decade, local linear method (LLM) estimation has attracted growing interest in nonparametric functional data modeling (NPFDM). The motivation is the superiority of LLM-estimation over the classical kernel method (CKM). In particular, the CKM has a larger bias than the LLM-estimation (see Fan and Gijbels [1] for the one-dimensional framework, and Baíllo and Grané [2] for the NPFDM setup). Baíllo and Grané [2] used the LLM-algorithm to estimate the Hilbertian conditional expectation operator. A generalized LLM-estimation of this nonparametric operator was obtained by Barrientos et al. [3], who treated the case of a Banach explanatory variable. Berlinet et al. [4] built an alternative LLM-estimator of the functional conditional expectation by inverting the local variance-covariance matrix of the functional variable. The asymptotic distribution of the LLM-estimator proposed by Barrientos et al. [3] was obtained by Zhou and Lin [5].
Furthermore, the LLM-estimation of the conditional cumulative distribution function (CCDF) was investigated by Laksaci et al. [6], who established the almost complete consistency rate for an LLM-estimator of the CCDF-model when the observations have a spatio-functional structure. All these previous studies used the kernel local linear method; in contrast, this paper focuses on CCDF-estimation with a new weighting approach obtained by combining local linear fitting with the kNN method.
This kNN weighting is well adapted to the functional nature of the underlying data (see Burba et al. [7] for more motivation on this approach). Notice also that the kNN method in nonparametric functional statistics has been studied by many researchers (see, for instance, Laloë [8] and Kudraszow and Vieu [9] for previous works, and Kara et al. [10] for uniform consistency in the number of neighbors). On the other hand, kNN estimation under the local linear approach was recently developed by Chikr-Elmezouar et al. [11]. They constructed an estimator of the conditional density by combining the local linear method with the kNN weighting technique, and proved the almost complete consistency of the constructed estimator when the observations are independent and identically distributed. In the same setting of independent and identically distributed functional observations, Bachir et al. [12] studied the estimation of the M-regression function. Their estimator was obtained by combining the kNN approach with the Nadaraya-Watson method, and they established its convergence rate uniformly in the number of neighbors. As an alternative model to robust regression, Laksaci et al. [13] treated the kNN estimation of the quantile regression and established the properties of the built estimator under independence. We refer to Rachedi et al. [14] for functional regression when the response variable is missing at random. More recently, Almanjahie et al. [15] studied the computational aspects of the kNN estimation of some nonparametric functional models, including the conditional density, the regression operator, and the conditional cumulative distribution function. They examined the feasibility of some selection algorithms for choosing the best bandwidth parameter in nonparametric functional data analysis.

Contribution
While the previous works are dedicated to the functional kNN-estimation of the regression operator using the CKM-method, we consider, in this contribution, the kNN estimation of the CCDF using LLM-smoothing. Precisely, we benefit from the attractive features of both the kNN weighting and the LLM-fitting by combining the two algorithms to provide a fast and efficient estimator of the CCDF. On the one hand, it is well known that the main reason for using the kNN method is the way it selects the smoothing parameter. Specifically, the kNN method selects a bandwidth parameter adapted to the local structure of the data. Moreover, this estimator can be updated as new observations arrive. Such a consideration is essential in functional statistics, where the asymptotic properties depend strongly on the behavior of the local structure. For this reason, the kNN method is better than the classical kernel method. However, the difficulty in kNN smoothing is that the bandwidth parameter is a random variable, unlike the kernel method, in which the smoothing parameter is a deterministic scalar. So, the study of the asymptotic properties of this estimator is more complicated and requires additional tools and techniques.
On the other hand, it is well documented that the local linear approach is an alternative to the usual Nadaraya-Watson technique. As discussed in the first paragraph, local linear estimation improves the asymptotic properties of the kernel estimator by reducing the bias term. Thus, with this combined approach, we construct an estimator of the CCDF and state its consistency by establishing the almost complete convergence rate. The second novelty of this contribution is that the asymptotic results are established when the observations are correlated, as in a mixing functional time series.

Organization
This paper is structured as follows. Our methodology, describing the kNN-LLM estimator as well as the functional time series framework, is presented in Section 2. The main asymptotic results and their proofs are presented in Section 3. Section 4 is devoted to some comments that highlight the merits of the proposed approach. The performance of the constructed estimator in temperature prediction, compared with existing estimators on real data, is examined in Section 5. Our conclusion is presented in Section 6.

CCDF-Model and Its kNN-LLM Estimator
Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a stationary sequence of random vectors distributed as $(X, Y)$ and valued in $F \times \mathbb{R}$, where $F$ is a separable metric space with metric $d$. Let $N_x$ be a neighborhood of a fixed curve $x \in F$, on which we suppose that the conditional cumulative distribution function (CCDF) $F(\cdot|x)$ has a continuous conditional density $f(\cdot, x)$. Usually, the LLM-estimator of the CCDF is built by treating the function $F(\cdot|x)$ as a conditional expectation, i.e., where H is the cumulative distribution function, and ( n = ) is a positive real sequence. In fact, this idea was first proposed by Fan and Gijbels [1] in the nonfunctional setup. In our functional context, we consider an alternative estimator to that proposed by Almanjahie et al. [16]. It is obtained by approximating $F(y|x)$ locally in $N_x$ by So, the kNN-LLM estimator is constructed by estimating the operators $a_{yx}$ and $b_{yx}$ of the formula in (1) as where Ker(·) is a kernel function, l = min{ ∈ IR + , satisfies ∑ n i=1 1 I (y− , y+ ) (Y i ) = l}, and $\mathbb{H}_k = \min\{h \in \mathbb{R}^+ : \sum_{i=1}^{n} 1_{B(x,h)}(X_i) = k\}$. Then, we prove later that the smooth kNN-LLM estimator of the CCDF $F(y|x)$ is given explicitly by
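To make the construction concrete, the following Python sketch fits a local linear model by weighted least squares with kNN bandwidths in both the curve and the response directions. It is only an illustration under stated assumptions, not the authors' implementation: the locating function `mean_shift`, the root-mean-square distance `rms_distance`, the integrated quadratic kernel `smooth_cdf`, the name `ell_y` for the response bandwidth, and the final clipping to [0, 1] are all hypothetical choices introduced here.

```python
import math

def quad_kernel(u):
    """Quadratic kernel with support [0, 1]."""
    return 1.0 - u * u if 0.0 <= u <= 1.0 else 0.0

def smooth_cdf(u):
    """Smooth CDF: integral of the density 0.75 * (1 - t^2) over [-1, u]."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + 0.75 * (u - u ** 3 / 3.0)

def rms_distance(x1, x2):
    """Assumed metric d: root-mean-square gap between curves on a common grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)) / len(x1))

def mean_shift(xi, x):
    """Assumed locating function beta(X_i, x): average of X_i - x."""
    return sum(a - b for a, b in zip(xi, x)) / len(x)

def knn_radius(distances, k):
    """Distance to the (k+1)-th nearest point, so that k points lie strictly
    inside the open ball when there are no ties."""
    return sorted(distances)[k]

def knn_llm_ccdf(y, x, X, Y, k, l, dist=rms_distance, beta=mean_shift):
    """Sketch of a kNN local linear estimate of F(y | x)."""
    if not (1 <= k < len(X) and 1 <= l < len(Y)):
        raise ValueError("k and l must lie between 1 and n - 1")
    d = [dist(xi, x) for xi in X]
    h = knn_radius(d, k)                               # random bandwidth in x
    ell_y = knn_radius([abs(yi - y) for yi in Y], l)   # random bandwidth in y
    if h <= 0.0 or ell_y <= 0.0:
        raise ValueError("degenerate kNN bandwidth (tied observations)")
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi, di in zip(X, Y, d):
        w = quad_kernel(di / h)
        if w == 0.0:
            continue
        b = beta(xi, x)
        r = smooth_cdf((y - yi) / ell_y)
        s0 += w
        s1 += w * b
        s2 += w * b * b
        t0 += w * r
        t1 += w * b * r
    det = s0 * s2 - s1 * s1
    if det <= 1e-12 * s0 * s2:
        a_hat = t0 / s0            # fall back to the local constant fit
    else:
        a_hat = (s2 * t0 - s1 * t1) / det   # intercept of the weighted LS fit
    return min(1.0, max(0.0, a_hat))        # clipping is a practical safeguard
```

The intercept of the weighted least squares fit plays the role of the estimate of F(y|x), while the slope is a nuisance term; passing a different `dist` or `beta` changes the topology without touching the rest of the routine.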

Functional Time Series Framework
We study the asymptotic properties of the LLM-estimator of the CCDF F(·|x) when the data are observed as a functional time series (FTS). Functional time series analysis has been widely developed in functional statistics; see Ferraty and Vieu [17] for an important discussion. In this paper, we set up our functional time series framework by assuming that the sequence $(Z_i = (X_i, Y_i))_i$ is algebraically α-mixing, with mixing coefficient α(n) → 0 such that there exists a > 2 with $\sum_n n^{a} \alpha(n) < \infty$. As for all asymptotic results in nonparametric functional statistics, we need to control the local concentration of both the marginal and joint distributions of the functional observations. Indeed, for the marginal distribution, we assume that such that For the joint distribution, we assume that where Ba(x, h) := {z ∈ F : d(z, x) < h} refers to the ball with center x and radius h. The challenging aim of the paper is to establish the convergence rate of the kNN-LLM estimator without the independence assumption. Of course, this general setting requires different tools from those used in the independent situation. In the rest of this section, we give the necessary background to handle this situation.
Lemma 1 (see Ferraty and Vieu [17]). Let $(Z_i)_{i \in \mathbb{N}}$ be an algebraic α-mixing process which is identically distributed.

1. If there exist p > 2 and M > 0 such that, for all t > M,

Results: The Asymptotic Properties of the kNN-LLM Estimator of CCDF
Now, we prove the almost complete (a.co.) convergence of the estimator $\widehat{F}(y|x)$ toward $F(y|x)$. First, let us point out that condition (4) ensures the existence of (α, β) ∈ (0, 1)^2 that satisfies So, in the remainder of this paper, we put IH r x (βk/n), r = l/αn and l = αl/n. Next, to establish the convergence rate of our estimator, we set the following conditions.
The kernel Ker is a differentiable function with support [0, 1]. Moreover, its first derivative Ker′ exists and ∃ C and C′ satisfy and for h = IH l k or IH r k , The conditional distribution function satisfies, for all (x 1 , The sequence (6) and (8) can also be found in Ferraty and Vieu [17]. A deeper and clearer discussion of the generality of our framework is given in the next section.
Proof of Theorem 1. The details of the proof are given in the Supplementary File. It is based on the following lemmas.

Lemma 2. Under the conditions of Theorem 1, we have

Lemma 3. Under the conditions of Theorem 1, we have where, ; ℓ = l or r, and j = 0, 1.
In the following, we give the proofs of the above intermediate results. When no ambiguity is possible, C and C′ will be used to denote some strictly positive generic constants with
Proof of Lemma 2. The proof is based on the Fuk-Nagaev exponential inequality (see Lemma 1) on it follows that, for all ε, r > 0, we obtain where Set r = C(log n)^2 to conclude that and use (9) to obtain For a suitable choice of ε and by (8), we get S j,n − IE S j,n = O a.co. τ n log n n 2 .
Proof of Lemma 3. Once again, as in the proof of Lemma 2, we apply the Fuk-Nagaev exponential inequality to other random variables as Since which allows us to write e j,n − IE e j,n = O a.co. τ n log n n 2 .
This completes the proof of the lemma.

The kNN Method in Functional Statistics
Motivated by its flexibility in practice, the kNN method is becoming a popular tool in nonparametric data analysis. It was introduced in functional statistics by Burba et al. [7]. The implementation of this approach in functional data analysis is promising. Indeed, as with all nonparametric smoothing approaches, the kNN method has some drawbacks in multivariate analysis, such as its high sensitivity to the feature vectors, slow execution when the data volume is large, and excessive memory use. It is well known that all these drawbacks are due to the curse of dimensionality in vectorial statistics. In functional statistics, this problem is handled by using the small ball probability function φ_x to evaluate the asymptotic properties of the estimator. Indeed, as discussed in Burba et al. [7], the small ball probability function φ_x(h) = IP(X ∈ Ba(x, h)) quantifies the concentration of the probability measure of the functional variable, and it has an inherent role in functional data analysis, in the sense that a lower concentration of the probability measure of the functional variable implies a slower rate of convergence of the estimator. The best way of resolving the above drawbacks is therefore to increase the concentration of the functional variable in the neighborhood of the location point x. To do that, we use the fact that the small ball probability function depends crucially on the metric d(., .). Hence, from a statistical point of view, we can increase the concentration by choosing a suitable metric d. Thus, the implementation of the kNN method in functional data analysis is very beneficial in practice, and its multivariate drawbacks can be overcome by using an appropriate topological structure.
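The dependence of the small ball probability on the metric can be checked empirically. The sketch below, written under illustrative assumptions only, estimates φ_x(h) by the proportion of sample curves falling in the ball Ba(x, h), once for a root-mean-square distance and once for a coarser, hypothetical semi-metric that compares only the curve means. Neither choice is taken from the paper; the helper names and the simulated Gaussian curves are introduced purely for illustration.

```python
import math
import random

def rms_distance(x1, x2):
    """Root-mean-square gap between two curves on a common grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)) / len(x1))

def mean_semimetric(x1, x2):
    """Hypothetical coarse semi-metric: absolute gap between curve means."""
    return abs(sum(x1) / len(x1) - sum(x2) / len(x2))

def small_ball_probability(x, curves, h, dist):
    """Empirical version of phi_x(h) = P(X in Ba(x, h)) for the given dist."""
    inside = sum(1 for xi in curves if dist(xi, x) < h)
    return inside / len(curves)

def simulate_curves(n, grid_size, seed=0):
    """Rough Gaussian curves, used only to have something to measure."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(grid_size)] for _ in range(n)]

if __name__ == "__main__":
    curves = simulate_curves(500, 20)
    x = [0.0] * 20
    for h in (0.5, 1.0, 1.5):
        p_rms = small_ball_probability(x, curves, h, rms_distance)
        p_mean = small_ball_probability(x, curves, h, mean_semimetric)
        print(h, p_rms, p_mean)
```

Because the absolute mean gap never exceeds the root-mean-square gap, the ball of the coarse semi-metric always contains the ball of the finer one, so its empirical small ball probability is at least as large for every radius h.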

On the Impact of This Contribution
It is well known that the conditional distribution function has a pivotal role in nonparametric statistical modeling. Indeed, the nonparametric estimation of this model is an essential step for various nonparametric models, such as the conditional density, the conditional hazard, and the conditional quantile functions. In the prediction setting, the conditional cumulative distribution function allows us to construct many predictive intervals or, more generally, predictive regions. We mention, for instance, the conditional percentile interval, the shortest conditional modal interval (SCMI), and the maximum conditional density region (MCDR) (see Yao and Tong [18] for their definitions). Of course, the diversity of applications of the conditional distribution function highlights the importance of this conditional model, which has the advantage of completely characterizing the conditional law of the considered random variables. As mentioned in the bibliographical discussion of the introduction, this model has been widely studied in nonparametric functional statistics. However, the novelty of the present contribution is mainly the estimation of this model by combining two important approaches: the local linear method and the k-Nearest Neighbors procedure. This combination yields a new, attractive estimator that inherits the advantages of both methods. Indeed, it is well known that the local linear method improves the bias of the kernel method, while the kNN weighting offers a sophisticated procedure for selecting the smoothing parameter. It is selected locally, with respect to the vicinity of the conditioning point, which yields an estimator adapted to the local structure of the data.
Such a consideration is very important in nonparametric functional data analysis, where the performance of estimators is closely linked to the local structure of the data through the concentration properties of the probability measure (see Ferraty and Vieu [17]). Nevertheless, establishing the asymptotic properties of this estimator is more difficult than in the classical case studied by Laksaci et al. [6] because, here, the bandwidth parameter is a random variable, unlike the standard case where the bandwidth parameter is a scalar. In the dependent case, which is a more general and more realistic situation, this difficulty is even greater. We can say that the principal axes of this contribution are (1) the conditional distribution function as a pivotal model for various nonparametric conditional models, (2) the estimation method as a new procedure even in the nonfunctional case (as far as we know, there is no work on CCDF estimation combining the LLM with the kNN method), and (3) the functional time series case as a generalization of the independent case. To emphasize the usefulness of the present contribution for prediction, we discuss in the following section how we can predict future real characteristics of a continuous-time process given its past.

Some Particular Cases
One of the main features of the present work is the treatment of kNN local linear estimation in the dependent case. The latter covers several usual situations. To highlight the importance and the generality of the present contribution, we now specialize our study to these situations. In particular, we consider the independent case, the strong local dependency case, and the local constant method. Let us note that, for the sake of brevity, the detailed proofs of the corollaries are given in the Supplementary File.

• The independent case: The independent case has been widely studied in the past for some alternative models. However, this case can be treated as a particular case of this contribution. It corresponds to the case of α(n) = 0. In this situation, the condition (2) is automatically satisfied, and Theorem 1 leads straightforwardly to the following corollary.

Corollary 1. Under the conditions (6) and (7) and if the derivative of the function H exists and is an increasing function satisfies

We point out that this result is also new for the kNN-LLM estimator of the CCDF in the i.i.d. case.

• The strong local dependency: The second particular case is the case when the local dependency, measured by Ψ x (r), is of order Then, in this situation, Theorem 1 is reformulated as follows.

Corollary 2. Under the conditions (6) and (7) and if the derivative of the function H exists and is an increasing function satisfies

Obviously, the convergence rate in this particular case is faster than the general form given in Theorem 1.
• The local constant method: It is well known that the Nadaraya-Watson estimator can be viewed as a particular case of the local linear approach, obtained by taking b = 0. This case is the so-called local constant method (LCM), and its kNN estimator is defined by . This estimator has been studied by Kara et al. [10], who established its asymptotic properties when the observations are independent and identically distributed, whereas here we treat the dependent case. Once again, the consistency of the kNN-LCM estimator is also new in this context of nonparametric functional data analysis. It is given in the following corollary.
co. τ n log n n 2 .
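For comparison with the local linear fit, a local constant (Nadaraya-Watson type) kNN estimate of the CCDF can be sketched as a kernel-weighted average of smoothed indicators. The code below is a minimal illustration under assumptions: the root-mean-square distance, the integrated quadratic kernel used as smooth CDF, and the kNN bandwidth `ell_y` in the response direction are hypothetical choices made here, since the exact definition is not recoverable from the text above.

```python
import math

def quad_kernel(u):
    """Quadratic kernel with support [0, 1]."""
    return 1.0 - u * u if 0.0 <= u <= 1.0 else 0.0

def smooth_cdf(u):
    """Smooth CDF: integral of the density 0.75 * (1 - t^2) over [-1, u]."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + 0.75 * (u - u ** 3 / 3.0)

def rms_distance(x1, x2):
    """Assumed metric: root-mean-square gap between curves on a common grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)) / len(x1))

def knn_lcm_ccdf(y, x, X, Y, k, l, dist=rms_distance):
    """Sketch of a kNN local constant estimate of F(y | x)."""
    if not (1 <= k < len(X) and 1 <= l < len(Y)):
        raise ValueError("k and l must lie between 1 and n - 1")
    d = [dist(xi, x) for xi in X]
    h = sorted(d)[k]                              # (k+1)-th nearest curve
    ell_y = sorted(abs(yi - y) for yi in Y)[l]    # (l+1)-th nearest response
    if h <= 0.0 or ell_y <= 0.0:
        raise ValueError("degenerate kNN bandwidth (tied observations)")
    weights = [quad_kernel(di / h) for di in d]
    num = sum(w * smooth_cdf((y - yi) / ell_y) for w, yi in zip(weights, Y))
    return num / sum(weights)
```

Being a convex combination of values in [0, 1], this estimate never leaves the unit interval, which is not guaranteed for the local linear fit.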

Application to Functional Time Series Prediction
One of the main features of the CCDF is the possibility to construct several predictive regions I ζ . Of course, the efficiency of each prediction interval is assessed by the length of the set I ζ and the presence of the true value in I ζ . It is well documented that the SCMI is the narrowest of all predictive regions with the same coverage probability. It was introduced by Yao and Tong [18] and obtained by The Leb(·) refers to the Lebesgue measure. Using the CCDF estimators, we approximate the SCMI by
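In practice, a shortest interval with coverage at least 1 − ζ can be searched on a grid of y-values once the estimated CCDF has been evaluated there. The following sketch uses a two-pointer scan. It is an illustrative approximation rather than the authors' procedure, and it assumes a grid-based search and a running maximum to enforce monotonicity of the estimated CCDF, neither of which is stated in the text.

```python
import math

def shortest_interval(grid, cdf_values, zeta):
    """Shortest grid interval [a, b] with F(b) - F(a) >= 1 - zeta.

    grid must be increasing; cdf_values are the (estimated) CCDF on the grid.
    Returns None when no interval reaches the target coverage.
    """
    # Assumption: enforce monotonicity, since an estimate need not be monotone.
    mono, running = [], float("-inf")
    for v in cdf_values:
        running = max(running, v)
        mono.append(running)
    target = 1.0 - zeta
    best = None
    lo = 0
    for hi in range(len(grid)):
        # Move the left end right while the coverage target is still met.
        while lo + 1 <= hi and mono[hi] - mono[lo + 1] >= target:
            lo += 1
        if mono[hi] - mono[lo] >= target:
            if best is None or grid[hi] - grid[lo] < best[1] - best[0]:
                best = (grid[lo], grid[hi])
    return best

def normal_cdf(u):
    """Standard normal CDF, used here only as a stand-in for an estimate."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

if __name__ == "__main__":
    grid = [-4.0 + 0.01 * i for i in range(801)]
    a, b = shortest_interval(grid, [normal_cdf(u) for u in grid], zeta=0.1)
    print(a, b)  # close to the symmetric 90% interval of the standard normal
```

For a symmetric unimodal law such as the standard normal, the shortest interval coincides with the central one, which gives a convenient sanity check before plugging in an estimated CCDF.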

Example 1: Application to Climatological Time Series Data
In this first example, we show the applicability of the proposed estimators to climatological data. We predict the monthly average temperature one year ahead at the Debrecen station in Hungary. The link to the data is provided in the "Data Availability Statement" section. Let us note that the studied data were recently collected and contain only a few missing values, which are replaced, before the analysis, by the average of the preceding and following values. From the observed data, we construct n + 1 = 100 curves (X_i(t)), i = 1, ..., n + 1, where X_i denotes the average temperature curve observed during the 12 months of the i-th year. The observed data are plotted in Figure 1, which shows the monthly average temperature. In Figure 2, we plot the yearly temperature curves (X_i)_i. The efficiency of the SCMI predictive interval depends on the choice of the parameters in the estimator $\widehat{F}$. For this computation, we use the quadratic kernel K, and the metric d is determined by the PCA algorithm. In this application, we compare the predictive interval (SCMI with ζ = 0.1) obtained with the kNN-LLM estimator to the one obtained with the CKM-estimator studied by Laksaci et al. [6]. For both estimators, we choose k and h by the same cross-validation method used by De Gooijer and Gannoun [19], which is based on the following criterion: This criterion is optimized over the same subsets of k and h proposed by Rachdi and Vieu [17]. Next, we obtain the SCMI predictive interval of the whole curve of the last year (i = 100) of this sample, given the functional covariate X_99, by computing $\widehat{F}(\cdot|X_{99}^{*})$, where X_99* is the nearest curve to X_99 in the learning sample (Y_i^j, X_i), i = 1, ..., 99, with Y_i^j = X_{i+1}(j) for each fixed month j = 1, ..., 12. Figure 3 displays the results: the true curve is drawn as a dashed line, and the two extremities of the SCMI interval are drawn as solid lines.
The observed data are displayed as the dashed curve, and the solid curves represent the estimated values of the two extremities of the SCMI predictive interval. It is clear that the interval based on the kNN-LLM estimator is significantly better than the one based on the CKM-estimator. This gain is confirmed by the average SCMI length (see Table 1).

                                     kNN-LLM Estimator    CKM Estimator
Average of the length of the SCMI    1.23                 2.07
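The construction of the learning sample in this example, yearly curves as covariates and the value of month j in the following year as response, can be sketched as follows. The function names and the toy series are hypothetical, and the sketch assumes that the monthly series starts in January and contains only complete years.

```python
def yearly_curves(monthly_values, months_per_year=12):
    """Cut a monthly series into consecutive yearly curves X_1, X_2, ...

    Trailing months that do not fill a whole year are dropped.
    """
    n_years = len(monthly_values) // months_per_year
    return [
        monthly_values[i * months_per_year:(i + 1) * months_per_year]
        for i in range(n_years)
    ]

def learning_pairs(curves, month):
    """Pairs (X_i, Y_i^j) with Y_i^j = X_{i+1}(j), for month j in 1..12."""
    return [
        (curves[i], curves[i + 1][month - 1])
        for i in range(len(curves) - 1)
    ]

if __name__ == "__main__":
    series = list(range(36))            # three toy "years" of monthly values
    curves = yearly_curves(series)
    for covariate, response in learning_pairs(curves, month=1):
        print(covariate[0], "->", response)
```

Repeating `learning_pairs` for each month j = 1, ..., 12 gives twelve scalar-on-function samples, one per component of the curve to be predicted.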

Example 2: Application to Air Quality Time Series Data
In addition to the first example, which highlights the importance of the kNN approach over the classical kernel method, we emphasize in this second example the superiority of the local linear approach over the local constant approach. For this purpose, we consider a time series of air quality data. The importance of this kind of data is motivated by the fact that air quality has a potential environmental impact on the quality of life of humans and the health of animals. In particular, it is well documented that exposure to ground-level ozone for a period of 1-8 h reduces various pulmonary functions and affects the tissues of the respiratory tract. Thus, the approximation of excessive levels of ozone concentration is an important subject in environmental sciences. In this example, we focus on the air quality in Westminster city in London. The time series data were collected at the Marylebone road site by real-time monitoring. Let us point out that our study can be used to model the distribution of the ozone given the daily curves of the other polluting gases (carbon monoxide, carbon dioxide, sulphur dioxide, nitrogen dioxide, nitric oxide, etc.). However, for the sake of brevity, we focus on two important indices of air quality: the sulphur dioxide (SO 2 ) and the ozone concentration (O 3 ). It is well known that SO 2 increases the stratospheric ozone concentrations when it reacts with ultraviolet rays. The data of this example are more complicated than those of the first example because the time series is observed on a finer grid. The link to this data is also provided in the "Data Availability Statement" section. Precisely, unlike the yearly curves of the first example, here, the ozone concentration is observed hourly, and the sulphur dioxide is observed on a time grid of 15 min. Thus, this kind of time series data is better suited to our functional approach.
It is worth noting that the use of functional statistical models in this type of environmental study has been considered by many authors (see, for example, Quintela-del-Río and Francisco-Fernández (2011), to cite a few). In this example, we aim to analyze the relationship between the SO 2 and the O 3 in Westminster city using the SCMI predictive region. Specifically, we wish to predict the total ozone one day ahead using the daily curve of SO 2 . Formally, we observe 364 days of air quality data (X i , Y i ) at the Marylebone road station, where X i (.) is the daily curve of SO 2 on day i and Y i is the total ozone of day i + 1. A sample of the functional regressors is shown in Figure 4. It shows the daily curves of the sulphur dioxide observed on a time grid of 15 min, with the value of SO 2 on the vertical axis measured in micrograms per cubic meter (µg/m 3 ). We highlight the importance of the local linear approach over the classical local constant one for this real data set. In particular, the local linear approach reduces the bias term of the local constant method, and we shall quantify this gain in practice. To do this, we compare the SCMI of both estimators: the kNN-LLM and the kNN-CKM. In addition, we keep the same strategies as those used in the first example to select the parameters involved in the estimator. More precisely, we use the quadratic kernel on (0, 1), the PCA metric, and the criterion CV to choose the number of neighbors k.
For this comparison, we put ζ = 0.05 and split the data sample randomly into two parts: the learning sample (260 observations) and the test sample (104 observations). Finally, we examine the efficiency of both estimators through the Probability Coverage (PC), which is the main criterion for evaluating predictive regions. In particular, we draw in Figure 5 the PC of the testing sample obtained by the two estimation methods. We see that the local linear estimation performs better than the estimation based on the local constant method. This conclusion is not surprising, since it reflects the superiority of the local linear approach in the bias term. We can say that the kNN-LLM keeps its advantages over the local constant method in the functional time series case.
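The evaluation protocol described above, a random split followed by the empirical coverage of the predictive intervals on the test sample, can be sketched in a few lines. The helper names below are hypothetical, and the intervals are assumed to have been computed beforehand by whichever estimator is being assessed.

```python
import random

def split_indices(n, n_train, seed=0):
    """Random split of range(n) into a learning part and a test part."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])

def coverage_probability(intervals, responses):
    """Proportion of test responses that fall inside their predictive interval."""
    hits = sum(1 for (a, b), y in zip(intervals, responses) if a <= y <= b)
    return hits / len(responses)

def average_length(intervals):
    """Mean length of the predictive intervals."""
    return sum(b - a for a, b in intervals) / len(intervals)

if __name__ == "__main__":
    learn, test = split_indices(364, 260)
    print(len(learn), len(test))
    toy_intervals = [(0.0, 1.0), (0.0, 1.0), (2.0, 3.0)]
    toy_responses = [0.5, 1.5, 2.5]
    print(coverage_probability(toy_intervals, toy_responses))
    print(average_length(toy_intervals))
```

Reporting the average length alongside the coverage is useful because a region can reach high coverage simply by being wide.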

Conclusions and Perspectives
The present contribution investigates the local linear estimation of the distribution function of a real random variable conditional on a functional covariate. The main novelty of this paper is the construction of an estimator using the double-kernel kNN estimation procedure. The main feature of the built estimator is its smoothing property. The latter improves the estimator's flexibility and broadens the scope of application of conditional distribution estimation. From a theoretical point of view, the estimator's asymptotic properties are established in a more general setting, namely the functional time series case. Specifically, the dependence is modeled through the strong mixing condition. It is well known that this kind of dependency covers an extensive class of usual processes, including AR, ARMA, Gaussian, Markov, linear, and m-dependent processes, among others. From a practical point of view, we have illustrated the feasibility of the constructed estimator using real data. The computational study shows that the proposed estimator behaves well as a prediction model. It improves on the classical kernel method both in single-point prediction and in prediction by predictive regions. This confirms the superiority of the kNN local linear estimation over the standard kernel one. Moreover, in addition to this development of nonparametric functional conditional models, the present contribution opens numerous future research tracks in nonparametric functional data analysis. For instance, it would be very interesting to establish the asymptotic normality of the proposed estimator, or to consider the weak functional time series case or the incomplete functional time series data case. On the other hand, the robustness of the predictors is a crucial issue in functional data analysis.
At this stage, studying the consistency of the kNN-LLM estimator of the robust regression in functional time series is an important prospect of the present contribution. It would reduce the sensitivity of the kNN approach to noisy data, missing values, and the presence of outliers.

Data Availability Statement: The data used in the first example are available at https://www.met.hu/en/eghajlat/magyarorszag_eghajlata/eghajlati_adatsorok/Debrecen/adatok/napi_adatok/index.php (accessed on 30 March 2021). The data used in the second example are available at https://www.airqualityengland.co.uk/site/data?site_id=MY1 (accessed on 30 March 2021).