k-Nearest Neighbors Estimator for Functional Asymmetry Shortfall Regression

Abstract: This paper deals with the problem of financial risk management using a new expected shortfall regression. The latter is based on the expectile model for the financial risk threshold. Unlike the VaR model, the expectile threshold is constructed from an asymmetric least squares loss function. We construct an estimator of this new model using the k-nearest neighbors (kNN) smoothing approach. The mathematical properties of the constructed estimator are stated by establishing its pointwise complete convergence. Additionally, we prove that the constructed estimator is uniformly consistent over the number of nearest neighbors (UCNN). These asymptotic results provide sound mathematical support for the proposed financial risk process. We then examine the ease of implementation of this process on artificial and real data. Our empirical analysis confirms the superiority of the kNN approach over the kernel method, as well as the superiority of the expectile over the quantile in financial risk analysis.


Introduction
Defining an accurate financial risk metric is a challenging issue for financial institutions. The value at risk (VaR) is the standard risk metric for financial risk management. The VaR model was approved by the Basel committee in 1996 and 2006. However, financial operators have recognized the limitations and weaknesses of the VaR model through the successive financial crises of the last decade. The primary weakness of the VaR model in financial risk management is its insensitivity to extreme values. Consequently, in 2014 the Basel committee proposed enhancing financial risk surveillance with the expected shortfall (ES) function. This function examines the expected loss when a specific threshold is exceeded. Generally, the threshold is defined through the VaR level. The novelty of this paper is to define the ES function using an alternative risk threshold, namely the expectile regression.
The shortfall risk model was investigated by [1]. Motivated by its coherency feature, the ES function has been widely developed in the last decade. A comparison study between the VaR and ES models was carried out by [1]. They stated that the VaR is inaccurate when the profit or the loss is not Gaussian. In this context, the ES model is a more accurate financial risk metric than the VaR function. From a statistical point of view, the ES model can be handled in different ways, such as parametric, semi-parametric, or distribution-free approaches. For an overview of the parametric approach, we refer to [2][3][4]. The present paper considers the nonparametric strategy. At this stage, we point out that the first study in nonparametric modeling was introduced by [5], where the ES model was estimated by the kernel method. The same estimator was considered by [6], who stated its asymptotic normality. Alternatively, another estimator using the Bahadur representation was constructed by [7]. The literature concerning the nonparametric estimation of the ES is limited when the data are functional. To the best of our knowledge, only two works have treated the functional ES model using the nonparametric regression structure. The first results were developed by [8] when the financial time series is modeled under the strong mixing assumption. The authors of [9] used a weak correlation assumption to model the financial time series. They proved the complete consistency of the functional kernel estimator of the ES function under quasi-associated auto-correlation.
The second component of our contribution concerns the expectile model. It was introduced by [10] and can be considered an alternative to the VaR function. Moreover, it corrects the main drawback of the quantile, namely its insensitivity to outliers: the expectile metric is very sensitive to outliers. In financial risk analysis, the expectile has been developed by [11][12][13][14]. It should be noted that the expectile function has been used for other statistical problems, including outlier analysis (see [15]) and heteroscedasticity detection (see [16,17]). The expectile regression for vectorial statistics was studied by [18]. The authors of this last paper developed a semi-parametric estimation of the expectile. Concerning the functional expectile model, we point out that the first result was stated by [19]. They established the asymptotic convergence rate of the kernel estimator of the functional expectile regression. We refer to [20] for the parametric version of the functional expectile regression. They established the asymptotic convergence rate of an estimator constructed from the reproducing kernel Hilbert space structure. For more recent advances and results in functional regression data analysis, we may refer to [21][22][23][24][25].
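The contrast between the two thresholds can be checked numerically. The following minimal sketch is not part of the original study: it computes a sample quantile and a sample expectile before and after the largest observation is pushed far into the tail. The helper name `sample_expectile`, the simulated Gaussian sample, the level 0.9, and the convention that the weight p is attached to observations above the threshold are all illustrative assumptions.

```python
import numpy as np


def sample_expectile(y, p, tol=1e-12, max_iter=1000):
    """Sample p-expectile: minimizer in t of
    sum_i |p - 1{y_i <= t}| * (y_i - t)^2 (asymmetric least squares).

    The first-order condition says t is a weighted mean with weight p above t
    and 1 - p below t, so we iterate that fixed-point equation.
    """
    y = np.asarray(y, dtype=float)
    t = y.mean()  # the 0.5-expectile is the mean, a natural starting point
    for _ in range(max_iter):
        w = np.where(y > t, p, 1.0 - p)
        t_new = np.sum(w * y) / np.sum(w)
        converged = abs(t_new - t) < tol
        t = t_new
        if converged:
            break
    return t


rng = np.random.default_rng(0)        # illustrative simulated sample
clean = rng.normal(size=1000)
shocked = clean.copy()
shocked[np.argmax(shocked)] = 50.0    # replace the largest value by an extreme one

q_clean, q_shocked = np.quantile(clean, 0.9), np.quantile(shocked, 0.9)
e_clean, e_shocked = sample_expectile(clean, 0.9), sample_expectile(shocked, 0.9)
print(f"quantile : {q_clean:.4f} -> {q_shocked:.4f}")
print(f"expectile: {e_clean:.4f} -> {e_shocked:.4f}")
```

In this toy setting the 0.9-quantile does not move, because it depends only on the ranks of the observations, whereas the 0.9-expectile shifts upward, because it depends on their magnitudes.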
The third component of this paper is the kNN smoothing approach. It is an attractive approach for many problems in applied statistics, such as classification, clustering, or prediction. The kNN estimation approach was popularized by the contribution of [26], which can be considered a pioneering work in nonparametric estimation by the kNN method. Driven by its diverse applications, the kNN estimation algorithm was introduced in functional data analysis by [27]. They proved the almost complete pointwise convergence of the kNN estimator of the functional regression. This result was stated under the independence condition. We refer to [28] for the uniform convergence of the kNN estimator of the functional regression. They established the convergence rate using the entropy property. For more recent advances and references on the functional kNN method, we may cite [21,29,30].
In this paper, we aim to estimate the expectile shortfall regression using the kNN smoothing approach. The principal motivations for using this estimation methodology are as follows: (1) financial data are usually not Gaussian, and the parametric approach fails to fit their random movement; (2) the functional approach exploits the high frequency of financial data by treating them as continuous curves; (3) the kNN approach exploits the functional structure of the data by considering a locally varying bandwidth adapted to the functional curves, a feature that allows one to update the estimator and identify the financial risk systematically; (4) the last motivation is the possibility of remedying the insensitivity to outliers by using the expectile instead of the VaR. The mathematical support of this contribution is provided by establishing the almost complete convergence of the constructed estimator. Additionally, we provide the convergence rate of the UCNN consistency of the constructed estimator. It should be noted that this last result is of great importance in practice. In particular, it can be used to address the problem of choosing the best number of neighbors. We therefore complement our theoretical development by examining the applicability as well as the efficiency of the kNN estimator of the expectile shortfall regression. More precisely, we examine the implementation of the estimator using artificial and real financial data.
This paper is structured as follows: In Section 2, we present the risk metric function and its kNN estimator. In Section 3, we state the pointwise convergence of the constructed estimator. The UCNN consistency is stated in Section 4. Section 5 is dedicated to discussing the computability of the estimator on simulated and real data applications. Finally, the proofs of the auxiliary results are given in Section 7.

kNN Estimator of Expectile Shortfall Regression
Let $(A_1, B_1), \ldots, (A_n, B_n)$ be $n$ independent random pairs in $\mathcal{F} \times \mathbb{R}$ with the same distribution as $(A, B)$. The functional space $\mathcal{F}$ is a semi-metric space with semi-metric $d$. For our expected shortfall regression analysis, we assume that $A$ is the functional explanatory variable and $B$ is the real response variable. Often, the conventional ES-regression is defined for $a \in \mathcal{F}$, where $CVaR_p(\cdot)$ is the conditional value at risk. In this paper, instead of $CVaR_p$, we define the ES-regression using the conditional expectile of $B$ given $A = a$. The latter is denoted by $CEA_p(\cdot)$ and is defined by where $1_A$ is the indicator function of the set $A$. Of course, this replacement of $CVaR_p$ by $EXR_p$ enables one to overcome the insensitivity of the quantile to extreme values. This characteristic is very important in practice because catastrophic losses occur in the tails.

The second feature of our contribution is the use of the kNN estimation approach. This approach is based on the determination of the smoothing parameter as where $B(a, h_n)$ is the ball of center $a$ and radius $h_n > 0$, defined as follows So, the kNN estimator of the EXES-regression is where $F(\cdot)$ is a known measurable function and $\widehat{EXR}_p$ is the kNN estimator of $EXR_p$. The latter is defined as the solution of
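The construction described above can be sketched in code. The following is a minimal illustration under stated assumptions, not the authors' implementation: the curves are assumed to be discretized on a common grid, the semi-metric $d$ is taken to be a discretized $L_2$ distance, the kernel is the quadratic kernel $1 - u^2$ on $[0, 1]$, the weight $p$ is attached to responses above the threshold, and the ES estimate is taken to be the kernel-weighted average of $B_i 1_{B_i > t}$ evaluated at the estimated expectile, which mirrors the numerator and denominator sums used in the proofs. All function names are hypothetical.

```python
import numpy as np


def knn_kernel_weights(a, curves, k):
    """Weights F(d(a, A_i) / A_{n,k}) for the kNN bandwidth A_{n,k}."""
    if k < 2:
        raise ValueError("k must be at least 2 with this kernel.")
    # Assumed semi-metric: discretized L2 distance on a common grid.
    d = np.sqrt(np.mean((curves - a) ** 2, axis=1))
    h = np.sort(d)[k - 1]  # smallest radius whose ball around a holds k curves
    if h <= 0:
        raise ValueError("kNN bandwidth is zero (duplicate curves).")
    u = d / h
    # Assumed kernel on [0, 1]; the k-th neighbor itself receives weight 0.
    return np.where(u <= 1.0, 1.0 - u ** 2, 0.0)


def weighted_expectile(y, w, p, tol=1e-10, max_iter=1000):
    """Kernel-weighted p-expectile via the asymmetric least squares fixed point."""
    t = np.sum(w * y) / np.sum(w)
    for _ in range(max_iter):
        v = w * np.where(y > t, p, 1.0 - p)
        t_new = np.sum(v * y) / np.sum(v)
        converged = abs(t_new - t) < tol
        t = t_new
        if converged:
            break
    return t


def knn_es_expectile(a, curves, responses, k, p):
    """Return (expectile threshold, ES-expectile estimate) at the curve a."""
    w = knn_kernel_weights(a, curves, k)
    t = weighted_expectile(responses, w, p)
    es = np.sum(w * responses * (responses > t)) / np.sum(w)
    return t, es


# Illustrative data only: sine curves with random amplitudes.
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 25)
amp = rng.uniform(0.5, 2.0, size=80)
curves = amp[:, None] * np.sin(2 * np.pi * grid)[None, :]
responses = amp ** 2 + 0.1 * rng.normal(size=80)
a = 1.2 * np.sin(2 * np.pi * grid)
print(knn_es_expectile(a, curves, responses, k=15, p=0.9))
```

Because the bandwidth is the distance to the k-th nearest curve, it adapts automatically to how densely the sample curves are scattered around $a$, which is the locally varying behavior that motivates the kNN approach.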

Pointwise Convergence
Before establishing the asymptotic properties of the estimator $\widehat{CEA}_p$, we introduce some notation and assumptions. We denote by $C_a$ or $C'_a$ some strictly positive generic constants, and $N_a$ is a given neighborhood of $a$. Furthermore, for all $t \in \mathbb{R}$, we define . Now, to formulate our main result, we will use the hypotheses listed below:

There exists an invertible non-negative function $\phi(\cdot)$, a bounded and positive function $L(\cdot)$, and a function $\zeta_0(u)$ such that (i) $\phi(\epsilon)$ tends to zero as $\epsilon$ goes to zero and,

(P5) The kernel function $F(\cdot)$ is supported on $(0, 1)$ such that

(P6) The number of neighbors $k$ is such that

Comments on the hypotheses. All the considered assumptions are classical in functional data analysis, in particular for the kNN smoothing approach. They have been used in similar studies (see, for instance, [30]). Assumptions (P1) and (P2) relate the functional variable to the probability structure. As discussed in the last paragraph of the introduction, the nonparametric path is motivated by the fact that the distribution of the financial movement is unknown in practice. Assumption (P4) concerns the conditional moment integrability of the variable of interest $B$. Such a condition is commonly used in regression analysis. Observe that the upper bound in (P4) is not uniform, but strongly depends on the order of the moment $m$ and the location point $a$. This assumption is used to apply the Bernstein inequality, where the constant $C'_{m,a}$ should be less than The assumption on the kernel function is given in condition (P5). Such a technical assumption is used to make the convergence rate of the estimator precise. Now, we obtain the following result.

Theorem 1. Under assumptions (P1)-(P6), we have

Proof of Theorem 1. For $t \in \mathbb{R}$, we define Then, , and $CE(EXR_p(a), a) = CEA_p(a)$. Therefore, It suffices to prove the following lemmas.
The proofs of both required results are based on the kNN smoothing technique summarized in the following lemmas.
Lemma 3 (see [28]). Let $(Z_i, Z_i)_{i=1,\ldots,n}$ be a sequence of independent random variables identically distributed as $(Z, Z)$, which are valued in $\mathcal{F} \times \mathbb{R}$. Let where and Then, we have

UCNN Convergence
We aim to establish the almost complete consistency of $\widehat{CEA}_p(a)$ uniformly in the number of neighbors $k \in (k_{1,n}, k_{2,n})$. To this end, we denote by $C$ and $C'$ some strictly positive generic constants. In order to state the next theorem, we need the following assumptions.

(U1) The class of functions
where the maximum is taken over all probability measures $Q$ on the space $\mathcal{F}$ with $Q(G^2) < \infty$, $G$ being the envelope function of the class $\mathcal{F}$. $N(\epsilon, \mathcal{F})$ is the number of open balls of radius $\epsilon$ needed to cover the class of functions $\mathcal{F}$. The balls are constructed using the $L_2(Q)$-metric.
(U2) The kernel $F$ is supported within $(-1/2, 1/2)$ and has a continuous first derivative, such that: 0 and where $1I_A$ is the indicator function of the set $A$.
(U3) The sequences $(k_{1,n})$ and $(k_{2,n})$ verify: Then, the following theorem gives the UCNN consistency of $\widehat{CEA}_p$.

Empirical Analysis
In this section, we discuss the practical use of the risk metric studied in the present work. This section is divided into three parts. In the first part, we propose an approach to choose the best number of neighbors. The selection of this number is essential for the practical use of this financial model. The second part is devoted to evaluating the behavior of the estimator on artificial data. In the last part, we examine the constructed model on real financial data from the Dow-Jones stock market.

Smoothing Parameter Selection: Cross-Validation
Generally, the optimal number $k$ is obtained by optimizing some criterion as where $L$ is a given loss function which is fixed according to the selection algorithm employed. In particular, for the expectile regression, many selection approaches exist; for instance, different cross-validation rules have been used (see [19]). For instance, we can use or more generally where $\rho$ is the scoring function defining $EXR_p$. In practice, these selectors provide an efficient estimator. In this context, the UCNN convergence allows one to ensure the consistency of $\widehat{CEA}_p^{opt}(a)$, which is associated with $k_{opt}$. Therefore, we deduce the following corollary.
Corollary 1. If $k_{opt} \in (k_{1,n}, k_{2,n})$ and if the conditions of Theorem 2 hold, then we have
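The selection of $k$ by cross-validation can be sketched as follows. This is an illustration under assumptions rather than the exact rule of the paper: it uses a leave-one-out scheme, scores each candidate $k$ by the asymmetric squared loss between $B_i$ and the expectile predicted from the other curves (which reduces to the squared error at $p = 0.5$), and relies on a discretized $L_2$ distance and the quadratic kernel. The function names and the simulated data are hypothetical, and $k \geq 2$ is required because the kernel vanishes at the k-th neighbor.

```python
import numpy as np


def weighted_expectile(y, w, p, tol=1e-10, max_iter=500):
    """Kernel-weighted p-expectile via the asymmetric least squares fixed point."""
    t = np.sum(w * y) / np.sum(w)
    for _ in range(max_iter):
        v = w * np.where(y > t, p, 1.0 - p)
        t_new = np.sum(v * y) / np.sum(v)
        converged = abs(t_new - t) < tol
        t = t_new
        if converged:
            break
    return t


def loo_knn_expectile(dist, y, k, p):
    """Leave-one-out kNN expectile predictions at every sample curve."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        d = np.delete(dist[i], i)   # distances from A_i to the other curves
        y_others = np.delete(y, i)
        h = np.sort(d)[k - 1]       # kNN bandwidth at A_i
        w = np.where(d <= h, 1.0 - (d / h) ** 2, 0.0)
        preds[i] = weighted_expectile(y_others, w, p)
    return preds


def select_k(curves, y, k_grid, p=0.5):
    """Return the k minimizing the leave-one-out asymmetric squared loss."""
    diff = curves[:, None, :] - curves[None, :, :]
    dist = np.sqrt(np.mean(diff ** 2, axis=2))  # assumed discretized L2 distance
    scores = {}
    for k in k_grid:
        r = y - loo_knn_expectile(dist, y, k, p)
        scores[k] = float(np.sum(np.where(r > 0, p, 1.0 - p) * r ** 2))
    return min(scores, key=scores.get), scores


# Illustrative data only: sine curves with random amplitudes.
rng = np.random.default_rng(2)
grid = np.linspace(0.0, 1.0, 30)
amp = rng.uniform(0.5, 2.0, size=60)
curves = amp[:, None] * np.sin(2 * np.pi * grid)[None, :]
y = amp ** 2 + 0.1 * rng.normal(size=60)
k_grid = range(2, 31)
k_opt, scores = select_k(curves, y, k_grid, p=0.5)
print("selected k:", k_opt)
```

Restricting the search to a grid between two bounds plays the role of the interval $(k_{1,n}, k_{2,n})$ over which the uniform consistency is stated.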

Simulated Data
This part is devoted to the examination of the performance of the ES-expectile function using artificial observations. We compute this estimator for independent functional data. More precisely, we compare the proposed model to the same estimator obtained with the standard smoothing parameter. Additionally, we compare our estimator to the standard expected shortfall based on the percentile regression. For this empirical study, we generate the functional variable $A_i(t)$ defined, for any $t \in [0, 1]$, by: where $W_i$ and $\eta_i$ are two real random variables. In order to cover more general cases, we consider two examples of $(W_i, \eta_i)$. In the first example, we assume that $W_i \sim N(0, 0.5)$ and $\eta_i \sim N(0, 1)$. In the second example, we generate $W_i$ from Lognormal(0, 0.5) and $\eta_i$ from Lognormal(0, 1). In both cases, the obtained functional variables are relatively smooth, allowing one to choose the spline $L_2$-metric (see [31]). A sample of the covariate curves from the first example is plotted in Figure 1. We assume that the functional regressor represents a continuous trajectory of a financial asset. The variable of interest $B$ represents a future characteristic of this trajectory. More precisely, for all $i$, we assume that $B_i = A_{i+1}(0)$. Recall that the principal aim of this computational part is to conduct a comparison study between the kNN ES-expectile regression $\widehat{CEA}_p$ and the standard expected shortfall based on the $VaR_p$ regression associated with the percentile regression $VaR_p(a) = \inf\{z \in \mathbb{R} : F(z|a) \geq p\}$, where $F$ is the conditional cumulative distribution function of $B$ given $A$. The latter is estimated using the kNN estimator of the function $F$ as .
Thereafter, we estimate the $VaR_p$ function by Recall that, in this case, the expected shortfall regression is expressed by So, we aim to compare three estimators: the kNN ES-expectile estimator (see Equation (2)), the kNN ES-quantile estimator (see Equation (8)), and the standard ES-expectile estimator (obtained by replacing $A_{n,k}$ in Equation (2) by a standard bandwidth $h_n$). All these estimators are calculated using the β-kernel and the spline $L_2$ metric, and the smoothing parameter ($h$ or $k$) is selected by the cross-validation rule (6). In the kNN estimation, we select the best $k$ by Such an error is evaluated for various values of $p$: 0.9, 0.5, 0.1, 0.05, and 0.01. The obtained results are given in Tables 1 and 2. Clearly, the behavior of the three estimators is strongly impacted by the choice of the smoothing parameter. However, we observe that the kNN approach is more appropriate than the standard one. Moreover, the expected shortfall based on the expectile is more accurate than the expected shortfall based on the VaR threshold. Unsurprisingly, the behavior of the estimators is also affected by the definition of the regressors with respect to the distribution of $(W_i, \eta_i)$ (normal or lognormal). In particular, the estimators CEA p and CEA p are more sensitive to this aspect than the estimator CEA p . The variability of the Mse and Msp is more important in CEA p and CEA p than CEA p . This sensitivity confirms the importance of the expectile regression as a financial risk model.
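The quantile-based competitor can be sketched in the same spirit. The snippet below is a hedged illustration, not the authors' code: it assumes that the conditional distribution function is estimated by a kernel-weighted empirical distribution function, that $VaR_p$ is its generalized inverse as in the definition above, and that the corresponding ES is the weighted average of $B_i 1_{B_i > VaR_p}$. The weights `w` stand for the kernel weights at the kNN bandwidth; uniform weights are used in the toy example only to keep the arithmetic transparent. All function names are hypothetical.

```python
import numpy as np


def conditional_cdf(z, w, y):
    """Weighted estimate of F(z | a): sum_i w_i 1{B_i <= z} / sum_i w_i."""
    return np.sum(w * (y <= z)) / np.sum(w)


def conditional_var(w, y, p):
    """Generalized inverse inf{z : F_hat(z | a) >= p} of the weighted cdf."""
    order = np.argsort(y)
    cum = np.cumsum(w[order]) / np.sum(w)
    # First sorted response whose cumulative weight reaches p.
    idx = min(np.searchsorted(cum, p, side="left"), len(y) - 1)
    return y[order][idx]


def es_quantile(w, y, p):
    """ES with the VaR threshold: sum_i w_i B_i 1{B_i > VaR_p} / sum_i w_i."""
    var_p = conditional_var(w, y, p)
    return var_p, np.sum(w * y * (y > var_p)) / np.sum(w)


# Toy example: ten responses with equal weights (illustrative assumption).
y = np.arange(1.0, 11.0)
w = np.ones_like(y)
var_p, es_p = es_quantile(w, y, 0.85)
print("VaR:", var_p, "ES:", es_p)
```

With these ten equally weighted responses, the smallest value whose cumulative weight reaches 0.85 is 9, so only the response 10 exceeds the threshold and contributes to the shortfall.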

Real Data Application
This last part is devoted to the applicability of our model to real data. More precisely, we examine the efficiency of the ES-expectile model on financial data associated with real-time stock prices of the Liberty Energy company, one of the major energy industry service providers in North America. Using these data, we compare our financial metric to its competitors. In this financial data analysis, we study the high price $(P(t))_t$ of this company during October 2024, observed at five-minute intervals. The parent data contain more than 2700 values. The process $(P(t))_t$ is displayed in Figure 2. The data are available at https://stooq.com/db/l (accessed on 24 April 2024). To ensure stability, we work with logarithmic differences: we construct the functional data from the process $R(t) = \log(P(t + 1)) - \log(P(t))$. The transformed data are given in Figure 3. We explore the functional path of the considered data by cutting the process $R(t)$ into pieces of 30 points. These pieces represent the functional regressors $A_i$. Furthermore, we use the same strategy as for the simulated data. Indeed, we choose $A_i$ as the curve $(R(t))_{t \in [s-30, s[}$ and $B_i = R(s)$. Now, to ensure the independence structure assumed in our work, we select well-separated observations. Specifically, from 2790 observations, we choose 90 equidistant independent values from which we construct our learning sample $(A_i, B_i)_{i=1,\ldots,90}$. Thus, we compare the three estimators (kNN ES-expectile, standard ES-expectile, and kNN ES-quantile) using the real data $(A_i, B_i)_{i=1,\ldots,90}$. These estimators are computed using the same algorithm as for the simulated data. We use the same kernel and select the smoothing parameters $k$ and $h$ by the rule (6). We use the $L_2$ metric obtained from the PCA-metric. We refer to Ferraty and Vieu [31] for more details on the mathematical formulation of this metric. The comparison results are given in Figures 4-6, where we plot the true values of 670 testing observations $(B_i)_{i=1,\ldots,670}$ (black line) versus the estimators CEX p $(A_i)$ and CEA p $(A_i)$ (red line) for $p = 0.1$. Once again, the comparison confirms the superiority of the kNN ES-expectile regression over the standard ES-expectile and the kNN ES-quantile model. This superiority is confirmed by computing the Mse error (9) of the three models. We obtain, respectively, 0.0274 for the kNN ES-expectile, 0.108 for the standard ES-expectile, and 0.201 for the kNN ES-quantile. For a deeper examination of the behavior of the three estimators as financial risk models, we use the backtesting measure based on the cover test developed by Bayer and Dimitriadis [32]. Specifically, we apply the so-called one-sided intercept expected shortfall regression backtest. It is obtained using the routine esr_backtest from the R package esback with α = 0.05. We compute the p-values for 70 observations, randomly chosen from the above 670 testing observations. The average of the obtained values confirms the first statement, namely that the kNN ES-expectile regression is more adequate than the standard ES-expectile and the kNN ES-quantile model. Specifically, the average of the p-values of the kNN ES-expectile is 0.035, against 0.067 for the standard ES-expectile and 0.058 for the kNN ES-quantile.
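The data preparation described above can be sketched as follows. This is a minimal illustration under assumptions: the price series is simulated rather than downloaded, "equidistant" is interpreted as equally spaced starting indices, and the function names `log_returns` and `build_functional_sample` are hypothetical.

```python
import numpy as np


def log_returns(prices):
    """R(t) = log P(t + 1) - log P(t) for a one-dimensional price series."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))


def build_functional_sample(returns, window=30, n_pairs=90):
    """Build (A_i, B_i): A_i = (R(s - window), ..., R(s - 1)) and B_i = R(s),
    for n_pairs equally spaced values of s (an assumed reading of 'equidistant')."""
    returns = np.asarray(returns, dtype=float)
    last_start = len(returns) - window - 1
    starts = np.linspace(0, last_start, n_pairs).astype(int)
    curves = np.stack([returns[s:s + window] for s in starts])
    responses = returns[starts + window]
    return curves, responses


# Simulated stand-in for the five-minute high prices (illustrative only).
rng = np.random.default_rng(3)
increments = 0.001 * rng.normal(size=2791)
prices = 20.0 * np.exp(np.cumsum(increments))

r = log_returns(prices)  # 2790 log-returns
curves, responses = build_functional_sample(r, window=30, n_pairs=90)
print(curves.shape, responses.shape)
```

With 2790 returns, a window of 30 points, and 90 pairs, the starting indices are 31 steps apart, so the selected curves do not overlap.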

Conclusions and Prospects
In the present work, we developed a parameter-free estimation of the ES-expectile regression. We constructed an estimator using kNN smoothing. This study covers two principal aspects of financial data analysis. In the theoretical part, we establish the almost complete convergence of the constructed estimator; moreover, to ensure the applicability of the constructed estimator, we also determine the convergence rate of the UCNN consistency. This theoretical analysis provides sound mathematical support for the use of the newly developed risk metric in practice. We point out that the obtained asymptotic results are established under standard conditions and with a precise convergence rate. In particular, all the assumed conditions are related to the functional structure of the regressors and the nonparametric path of the model. On the other hand, we observe that the estimator is very easy to apply and gives better results than the other financial risk metrics. In addition, our contribution leaves many open questions. For instance, the first natural prospect is the treatment of the dependent case, which allows the control of the movement of the stock exchange in its natural path, that is, the functional time series case. The second prospect is the establishment of the asymptotic distribution of our new estimator in both the independent and dependent cases. Moreover, the third prospect concerns the determination of the single structure case, which would enable one to improve the convergence rate of the estimator. Furthermore, we can also treat the partial model case or the parametric case.

The Demonstration of the Intermediate Results
The proofs of the intermediate results are gathered in this section.

Proof of Lemma 1. It suffices to apply Lemma 3 for
n and L are the same as in [27], conditions (C1) and (C2) are satisfied. So, all that remains is to check condition (C3). Indeed, by a simple decomposition, for $t \in \mathbb{R}$, where Because the proofs of the three required results are based on similar analytical arguments in FDA, we focus only on the second result, namely, sup , a.co.
To do that, we write with Now, by (P2), we obtain sup As For this, we write for any $\eta > 0$ We write we can apply the inequality of Bernstein with $a_n^2 = (\phi(h))^{-1}$, to obtain, for all $\tau > 0$,

Proof of Lemma 2. Let Similarly to [19], we have where Similarly to the previous lemma, we write where We prove and The rest of the proof is based on the same arguments as in Lemma 1, where $CE(t, a)$ is replaced by $G_1(t, a)$ or $G_2(t, a)$.
It is shown in [33] that So, all that is left to be proven is , for all $z = t_j \mp l_n$, $1 \leq j \leq d_n$.
The proof of the latter is based on Bernstein's inequality for empirical processes, as in [33]. The empirical processes are where $F_i = F(h^{-1} d(a, A_i))$. Thereafter, we obtain IP sup Consequently, an adequate choice of $\eta_0$ enables us to deduce (18).

Figure 1. Some explanatory curves of the sample $A_i$.

$\widehat{EXR}_{0.5}(A_i)$ and, for the standard bandwidth, we select $h$ by
$$h^{CV}_{opt}(a) = \arg\min_{h \in H_n(a)} \sum_{i=1}^{n} \left(B_i - \widehat{EXR}_{0.5}(A_i)\right)^2,$$
where $H_n(a)$ is the set of positive real numbers $h(a)$ such that the ball centered at $a$ with radius $h(a)$ contains exactly $k$ neighbors of $a$. We compare this selection procedure with an arbitrary choice. Specifically, we compute the three optimal estimators CEA p opt , CEA p opt and CEA p opt , and the arbitrary ones CEA p arb , CEA p arb and CEA p arb . The efficiency of the estimation approaches is examined using the backtesting measure defined by Mse

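where $\widehat{CE}(t, a)$ is obtained by replacing $h$ by $A_{n,k}$ in $\widetilde{CE}(t, a)$. For this aim, we use the following decomposition
$$\widetilde{CE}(t, a) - CE(t, a) = B(t, a) + \frac{D(t, a)}{\widetilde{CE}_D(a)} + \frac{Q(t, a)}{\widetilde{CE}_D(a)},$$
where
$$Q(t, a) = \left(\widetilde{CE}_N(t, a) - \mathbb{E}[\widetilde{CE}_N(t, a)]\right) - CE(t, a)\left(\widetilde{CE}_D(a) - \mathbb{E}[\widetilde{CE}_D(a)]\right),$$
$$B(t, a) = \frac{\mathbb{E}[\widetilde{CE}_N(t, a)]}{\mathbb{E}[\widetilde{CE}_D(a)]} - CE(t, a) \quad \text{and} \quad D(t, a) = -B(t, a)\left(\widetilde{CE}_D(a) - \mathbb{E}[\widetilde{CE}_D(a)]\right),$$
with $\widetilde{CE}_N(t, a)$ and $\widetilde{CE}_D(a)$ the normalized versions of $\sum_{i=1}^{n} F(h^{-1} d(a, A_i)) B_i 1_{B_i > t}$ and $\sum_{i=1}^{n} F(h^{-1} d(a, A_i))$, respectively. Thus, we split the proof of Lemma 4 into
$$\sup_{a_n \leq h \leq b_n} \left|\widetilde{CE}_D(a) - \mathbb{E}[\widetilde{CE}_D(a)]\right| = O_{a.co.}\left(\sqrt{\frac{\ln n}{n\phi(a_n)}}\right)$$
and
$$\sup_{a_n \leq h \leq b_n} \ \sup_{t \in [EXR_p(a) - \delta,\, EXR_p(a) + \delta]} \left|\widetilde{CE}_N(t, a) - \mathbb{E}[\widetilde{CE}_N(t, a)]\right| = O_{a.co.}\left(\sqrt{\frac{\ln n}{n\phi(a_n)}}\right),$$
where $a_n = \phi^{-1}\left(\frac{\alpha k_{1,n}}{n}\right)$ and $b_n = \phi^{-1}\left(\frac{k_{2,n}}{n\alpha}\right)$. We concentrate on the second convergence. The first one can be deduced by the same tools. Indeed, we write that
$$[EXR_p(a) - \delta,\, EXR_p(a) + \delta] \subset \bigcup_{j=1}^{d_n} \left(t_j - l_n,\, t_j + l_n\right), \quad \text{with } l_n = n^{-1/2} \text{ and } d_n = O\left(n^{1/2}\right).$$

a sequence of real random variables and $(V_n)_{n \in \mathbb{N}}$ a decreasing positive sequence (with lim $: \mathcal{F} \to \mathbb{R}$ be a non-random function. If, for all increasing sequence $\zeta_n = \zeta \in (0, 1)$ with limit 1 ($\zeta - 1 = O(V_n)$), there exist two sequences of real random variables (A −

Next, since the functions $\mathbb{E}[\widetilde{CE}_N(\cdot, a)]$ and $\widetilde{CE}_N(\cdot, a)$ are monotone, we have, for $1 \leq j \leq d_n$,
$$\mathbb{E}\left[\widetilde{CE}_N(t_j - l_n, a)\right] \leq \sup_{t \in (t_j - l_n,\, t_j + l_n)} \mathbb{E}\left[\widetilde{CE}_N(t, a)\right] \leq \mathbb{E}\left[\widetilde{CE}_N(t_j + l_n, a)\right],$$
$$\widetilde{CE}_N(t_j - l_n, a) \leq \sup_{t \in (t_j - l_n,\, t_j + l_n)} \widetilde{CE}_N(t, a) \leq \widetilde{CE}_N(t_j + l_n, a).$$
$$\sup_{t \in (t_j - l_n,\, t_j + l_n)} \left|\widetilde{CE}_N(t, a) - \mathbb{E}\left[\widetilde{CE}_N(t, a)\right]\right| \leq \max_{z \in \{t_j - l_n,\, t_j + l_n\}} \left|\widetilde{CE}_N(z, a) - \mathbb{E}\left[\widetilde{CE}_N(z, a)\right]\right| + 2C l_n.$$
$$\max_{z \in \{t_j - l_n,\, t_j + l_n\}} \left|\widetilde{CE}_N(z, a) - \mathbb{E}\left[\widetilde{CE}_N(z, a)\right]\right| = O\left(\sqrt{\frac{\ln n}{n\phi(a_n)}}\right)$$
$$\mathbb{P}\left(\sup_{a_n \leq h \leq b_n} \max_{1 \leq j \leq d_n} \max_{z \in \{t_j - l_n,\, t_j + l_n\}} \left|\widetilde{CE}_N(z, a) - \mathbb{E}\left[\widetilde{CE}_N(z, a)\right]\right| > \eta\sqrt{\frac{\ln n}{n\phi(a_n)}}\right) \leq 2 d_n \max_{1 \leq j \leq d_n} \max_{z \in \{t_j - l_n,\, t_j + l_n\}} \mathbb{P}\left(\sup_{a_n \leq h \leq b_n} \left|\widetilde{CE}_N(z, a) - \mathbb{E}\left[\widetilde{CE}_N(z, a)\right]\right| > \eta\sqrt{\frac{\ln n}{n\phi(a_n)}}\right).$$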