Computing Expectiles Using k-Nearest Neighbours Approach

Abstract: Expectiles have gained considerable attention in recent years due to wide applications in many areas. In this study, the k-nearest neighbours approach, together with the asymmetric least squares loss function, called ex-kNN, is proposed for computing expectiles. Firstly, the effect of various distance measures on ex-kNN in terms of test error and computational time is evaluated. It is found that the Canberra, Lorentzian, and Soergel distance measures lead to the minimum test error, whereas Euclidean, Canberra, and Average of (L1, L∞) lead to a low computational cost. Secondly, the performance of ex-kNN is compared with the existing packages er-boost and ex-svm for computing expectiles on nine real-life examples. Depending on the nature of the data, ex-kNN showed two to ten times better performance than er-boost and comparable performance with ex-svm regarding test error. Computationally, ex-kNN is found to be two to five times faster than ex-svm and much faster than er-boost, particularly in the case of high dimensional data.


Introduction
Given independent data D_n := ((x_1, y_1), . . . , (x_n, y_n)) drawn from an unknown probability distribution P on X × Y, where X ⊂ R^d and Y ⊂ R, symmetric loss functions, such as the least absolute deviation loss or the least squares loss, lead to studying the center of the conditional distribution P(Y|X = x) by estimating the conditional median med(Y|X = x) or the conditional mean E(Y|X = x), respectively. To investigate P(·|x) beyond the center, one well-known approach is computing quantiles, proposed by Koenker and Bassett Jr. [1]. If P(·|x) has a strictly positive Lebesgue density, the conditional τ-quantile q_τ, τ ∈ (0, 1), of Y given x ∈ X is the solution of

P(Y ≤ q_τ | X = x) = τ.

Another approach is computing expectiles, proposed by Newey and Powell [2], which has gained considerable attention recently. Assume that Q := P(Y|x) is such that |Q|_1 := ∫_Y |y| dQ(y) < ∞; then the conditional τ-expectile µ_τ for each τ ∈ (0, 1) is the unique solution of

τ ∫_{µ_τ}^{∞} (y − µ_τ) dQ(y) = (1 − τ) ∫_{−∞}^{µ_τ} (µ_τ − y) dQ(y).

It is well known that quantiles and expectiles can be computed algorithmically, see, for example, Efron [3] and Abdous and Remillard [4]. To be more precise, for all t ∈ R and a fixed τ ∈ (0, 1), one needs to solve the optimization problem

η_τ = arg min_{t ∈ R} ∫_Y L_τ(y, t) dQ(y),      (1)

where L_τ is the asymmetric loss function, which, for p ≥ 1, is defined by

L_τ(y, t) = (1 − τ) |y − t|^p if y < t,   and   L_τ(y, t) = τ |y − t|^p if y ≥ t.      (2)

For p = 1, we obtain the asymmetric least absolute deviation (ALAD) loss function from (2) and, consequently, conditional quantiles q_τ = η_τ from (1). Analogously, for p = 2, L_τ from (2) is the asymmetric least squares (ALS) loss and, thus, (1) produces conditional expectiles µ_τ = η_τ.
In general, for fixed τ, expectiles do not coincide with quantiles. The decision to compute either expectiles or quantiles depends on the application at hand. For example, if one is interested in the threshold below which a τ-fraction of the observations lies, then the τ-quantile is the right choice. On the other hand, to compute the threshold such that the ratio of gain (average deviations above the threshold) and loss (average deviations below the threshold) equals k := (1 − τ)/τ, the τ-expectile is the right choice. Broadly speaking, quantiles are tail probabilities, while expectiles are considered tail expectations. This distinction makes expectiles applicable in fields such as demography [5], education [6], and, extensively, finance (see, e.g., Wang et al. [7] and Kim and Lee [8]). In fact, expectiles are the only risk measures that satisfy the well-known properties of coherence and elicitability (see, e.g., Bellini et al. [9]), and they have proved to be a better alternative to the quantile-based value-at-risk (VaR). Furthermore, one can immediately realize the well-known performance measure in portfolio management known as the gain-loss ratio or Ω-ratio via the τ-expectile for any τ ∈ (0, 1), see Keating and Shadwick [10] for more details.
Different semiparametric and non-parametric approaches have been proposed in the literature for estimating expectiles. For example, Schnabel and Eilers [11] proposed an algorithm using P-splines, which is difficult to implement on problems involving multiple predictors. Other algorithms based on gradient boosting were proposed by [12] and Yang and Zou [13]. Farooq and Steinwart [14] observed through experiments that boosting-based algorithms become computationally expensive as the dimension of the input space increases. Recently, Farooq and Steinwart [14] developed an SVM-like solver based on sequential minimal optimization. Although this solver is efficient compared with the R package er-boost developed by Yang and Zou [13], its run time is sensitive to the training set size.
It is important to note that the aforementioned algorithms require the selection of an appropriate nonlinear function. To estimate conditional expectiles directly, Yao and Tong [15] used a kernelized iteratively reweighted approach, where the kernel assigns higher weights to the points closer to the query point. However, choosing the right kernel for the problem at hand is tricky. Moreover, this method suffers from the curse of dimensionality. Another simple, yet very popular, approach for estimating conditional quantities is the k-nearest neighbours (kNN), which has competed with advanced and complex machine learning approaches on difficult problems. Since its introduction in 1967, kNN has been applied in many classification, regression, and missing value imputation tasks, such as pattern recognition, economic forecasting, data compression, outlier detection, and genetics, see [16]. One key advantage of the kNN approach is that it does not require a smoothness assumption on the underlying function, which is, in general, necessary for advanced classification and regression techniques. With these advantages, we use kNN together with the ALS loss function to compute expectiles and name the algorithm ex-kNN. We then compare ex-kNN with an R package called er-boost, proposed by Yang and Zou [13], and show numerically that ex-kNN not only performs better in terms of accuracy and computational cost, but is also less sensitive to the dimension of the input space than er-boost. In addition, we show that the performance of ex-kNN is comparable with the SVM-like solver ex-svm proposed by [14]. Moreover, ex-kNN is found to be less sensitive to the training set size than ex-svm.
This paper is organized as follows: Section 2 gives a brief overview of the kNN approach in the context of computing expectiles. This includes a concise introduction of different distance measures (Section 2.1) and the procedure for choosing the best value of k (Section 2.2), two aspects of kNN that greatly influence its performance. Section 3 covers the experiments on nine real-life datasets conducted to evaluate the effect of various distance measures on the performance of ex-kNN, as well as to compare the results with existing packages for computing expectiles, namely er-boost and ex-svm. Section 4 provides the concluding remarks.

K-Nearest Neighbours Expectile Regression
The kNN approach to computing expectiles is explained in this section. Because the performance of kNN depends on the selection of a suitable distance measure and of the best value of k neighbours, a detailed discussion of these two aspects is given in Sections 2.1 and 2.2, respectively. To this end, let x_q ∈ X be a query point and R := {x_1, . . . , x_r} ⊂ X be a set of reference points. The kNN approach then searches for the k nearest neighbours of the query point x_q in the reference set R based on a specified distance measure, and uses the set {y_1, . . . , y_k} corresponding to the k nearest neighbours for classification or regression.
Once the set {y_1, . . . , y_k} corresponding to the k nearest neighbours is obtained, expectiles can be computed by solving the empirical version of (1); that is, for p = 2, we solve the optimization problem

µ̂_τ = arg min_{t ∈ R} (1/k) Σ_{i=1}^{k} L_τ(y_i, t).      (3)

Because the loss function used in (3) is quadratic in nature, problem (3) can be solved with the iteratively reweighted least squares (IRLS) algorithm. To be more precise, we assume an initial estimate e_τ of the expectile and generate weights w_i = 1 − τ (when y_i < e_τ) and w_i = τ (when y_i ≥ e_τ) for all i = 1, 2, . . . , k. Subsequently, we update the estimate of e_τ by

e_τ = Σ_{i=1}^{k} w_i y_i / Σ_{i=1}^{k} w_i,      (4)

and, hence, the corresponding weights, repeatedly, until convergence is achieved. Note that one may initialize e_τ with the average value of {y_1, . . . , y_k}. Following [11], Procedure 1 presents the pseudo code for computing expectiles using IRLS.

Procedure 1 Computing Expectiles
Input: response values {y_1, . . . , y_k}, τ ∈ (0, 1), stopping criterion T
Initialize e_τ* with the average of {y_1, . . . , y_k}
repeat
    w_i ← 1 − τ if y_i < e_τ*, and w_i ← τ otherwise, for i = 1, . . . , k
    e_new ← (Σ_{i=1}^{k} w_i y_i) / (Σ_{i=1}^{k} w_i)
    g ← |e_new − e_τ*|
    e_τ* ← e_new
until g < T
return e_τ*

Here, g is the margin of error achieved in an iteration and T is the stopping criterion. Fast convergence can be achieved by a good choice of T and of the initialization of e_τ*.
Moreover, in the case when τ = 0 or τ = 1, the τ-expectile can be considered to be the minimum or maximum value of the set {y 1 , . . . , y k }, respectively.
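The IRLS iteration of Procedure 1 can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function name `expectile` and the tolerance defaults are ours, and it assumes the ALS weighting w_i = 1 − τ below the current estimate and τ at or above it, with the boundary cases τ = 0 and τ = 1 returning the minimum and maximum as noted above.

```python
import numpy as np

def expectile(y, tau, tol=1e-8, max_iter=200):
    """Compute the tau-expectile of a sample y by iteratively
    reweighted least squares (IRLS), following Procedure 1."""
    y = np.asarray(y, dtype=float)
    if tau == 0.0:                    # boundary case: minimum
        return float(y.min())
    if tau == 1.0:                    # boundary case: maximum
        return float(y.max())
    e = y.mean()                      # initialize with the sample mean
    for _ in range(max_iter):
        # ALS weights: 1 - tau below the current estimate, tau at or above
        w = np.where(y < e, 1.0 - tau, tau)
        e_new = np.sum(w * y) / np.sum(w)
        if abs(e_new - e) < tol:      # margin of error g below threshold T
            return float(e_new)
        e = e_new
    return float(e)
```

For τ = 0.5 the weights are symmetric and the iteration returns the sample mean in one step; for τ < 0.5 the estimate is pulled below the mean, and for τ > 0.5 above it.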

Distance Measures
As mentioned above, one of the aspects that influences the performance of the kNN algorithm is the distance measure used to identify the closest neighbours in the training data. The Euclidean distance is the one most commonly used when implementing kNN. However, to our knowledge, there is no study indicating that it is the most suitable choice for kNN in all cases. Therefore, several studies have been dedicated to exploring the suitable distance measure for kNN for a given problem at hand. For instance, Mulak and Talhar [17] evaluated the performance of kNN with four distance measures on the KDD dataset and found the Manhattan distance measure to be the best in terms of classification accuracy. Lopes and Ribeiro [18] investigated the impact of five distance measures, namely Euclidean, Manhattan, Canberra, Chebychev, and Minkowsky, on various small datasets and found that Euclidean and Manhattan perform better on most of them. Extending the investigation, Kittipong et al. [19] studied the performance of kNN with eleven distance measures and determined that Manhattan, Minkowsky, and Chebychev lead to better performance. Analogously, Todeschini et al. [20] considered eighteen distance measures and singled out the five leading to the best performance.
A more detailed investigation in this regard has been done by [21], who considered 54 different distance measures from seven families of distances and showed that no single distance metric can be considered good enough for all types of datasets. In other words, the choice of the distance measure for kNN depends on many factors, such as the nature of the input variables, the number of dimensions, and the size of the dataset. This raises the need to also consider different distance measures in our study for computing expectiles using kNN and to identify the one that leads to the best performance in our case for a specified dataset. For this purpose, we consider a set of distance measures that have been found to be the best in the aforementioned studies for various datasets. This set includes the Euclidean distance, Manhattan distance, Chebychev distance, Canberra distance, Soergel distance, Lorentzian distance, Cosine distance, Jaccard distance, Clark distance, Squared Chi-Squared distance, Average (L1, L∞) distance, Divergence distance, Hassanat distance, and Whittaker's index of association. A brief description of these distance measures is given in the following. We refer to [21] for more details on these and other distance measures. To this end, we assume that x_i = (x_i1, . . . , x_ip) and x_j = (x_j1, . . . , x_jp) are the i-th and j-th p-dimensional data points.

• Euclidean Distance (ED) is also called the L2 norm or Ruler distance and is defined by

ED(x_i, x_j) = ( Σ_{l=1}^{p} (x_il − x_jl)² )^{1/2}.      (5)

• Manhattan Distance (MD) is also known as the L1 distance and is defined as the sum of the absolute differences of the elements of x_i and x_j, that is,

MD(x_i, x_j) = Σ_{l=1}^{p} |x_il − x_jl|.      (6)

• Chebychev Distance (CbD) is the maximum value distance and is specified by

CbD(x_i, x_j) = max_{l=1,...,p} |x_il − x_jl|.      (7)

• Canberra Distance (CD) is a weighted version of the Manhattan distance measure and is defined by

CD(x_i, x_j) = Σ_{l=1}^{p} |x_il − x_jl| / (|x_il| + |x_jl|).      (8)

Note that (8) is sensitive to small changes when both x_il and x_jl are close to zero.

• Soergel Distance (SoD) is widely used for calculating the evolutionary distance and obeys all four properties of a valid distance measure. It is given by

SoD(x_i, x_j) = Σ_{l=1}^{p} |x_il − x_jl| / Σ_{l=1}^{p} max(x_il, x_jl).      (9)

• Lorentzian Distance (LD) is defined via the natural log of the absolute differences between the vectors x_i and x_j, that is,

LD(x_i, x_j) = Σ_{l=1}^{p} ln(1 + |x_il − x_jl|),      (10)

where one is added to avoid the log of zero and to ensure the non-negativity property of a distance metric.

• Cosine Distance (CosD) is derived from the cosine similarity, which measures the angle between two vectors. It is specified by

CosD(x_i, x_j) = 1 − ( Σ_{l=1}^{p} x_il x_jl ) / ( (Σ_{l=1}^{p} x_il²)^{1/2} (Σ_{l=1}^{p} x_jl²)^{1/2} ).      (11)

• Jaccard Distance (JacD) measures the dissimilarity between two vectors. It is defined by

JacD(x_i, x_j) = Σ_{l=1}^{p} (x_il − x_jl)² / ( Σ_{l=1}^{p} x_il² + Σ_{l=1}^{p} x_jl² − Σ_{l=1}^{p} x_il x_jl ).      (12)

• Clark Distance (ClaD) is also called the coefficient of divergence; it is the square root of half of the divergence distance. It is defined by

ClaD(x_i, x_j) = ( Σ_{l=1}^{p} ( (x_il − x_jl) / (|x_il| + |x_jl|) )² )^{1/2}.      (13)

• Squared Chi-Squared Distance (SCSD) belongs to the family of L2 distances and is defined by

SCSD(x_i, x_j) = Σ_{l=1}^{p} (x_il − x_jl)² / (|x_il| + |x_jl|).      (14)

• Average (L1, L∞) Distance (AvD) is the average of the Manhattan and Chebychev distances. It is defined by

AvD(x_i, x_j) = ( Σ_{l=1}^{p} |x_il − x_jl| + max_{l=1,...,p} |x_il − x_jl| ) / 2.      (15)

• Divergence Distance (DivD) is defined by

DivD(x_i, x_j) = 2 Σ_{l=1}^{p} (x_il − x_jl)² / (x_il + x_jl)².      (16)

• Hassanat Distance (HasD) is defined by

HasD(x_i, x_j) = Σ_{l=1}^{p} D(x_il, x_jl),      (17)

where, for l = 1, 2, . . . , p,

D(x_il, x_jl) = 1 − (1 + min(x_il, x_jl)) / (1 + max(x_il, x_jl))   if min(x_il, x_jl) ≥ 0,
D(x_il, x_jl) = 1 − (1 + min(x_il, x_jl) + |min(x_il, x_jl)|) / (1 + max(x_il, x_jl) + |min(x_il, x_jl)|)   otherwise.

It is important to note that (17) is bounded, with each dimension contributing a value in [0, 1).
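To make the formulas concrete, here is a sketch of three of the listed measures, Canberra, Lorentzian, and Hassanat, in Python with NumPy. The function names are ours, and the Canberra implementation adopts the common convention that coordinates with a zero denominator contribute zero.

```python
import numpy as np

def canberra(a, b):
    # Weighted Manhattan distance; zero-denominator terms contribute 0.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    num = np.abs(a - b)
    den = np.abs(a) + np.abs(b)
    return float(np.sum(np.divide(num, den, out=np.zeros_like(num), where=den != 0)))

def lorentzian(a, b):
    # Sum of log(1 + |difference|) per coordinate; the +1 avoids log(0).
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum(np.log1p(np.abs(a - b))))

def hassanat(a, b):
    # Each coordinate contributes a value in [0, 1); negative values are
    # handled by shifting both terms by |min| as in the definition above.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    lo, hi = np.minimum(a, b), np.maximum(a, b)
    shift = np.where(lo >= 0, 0.0, -lo)
    return float(np.sum(1.0 - (1.0 + lo + shift) / (1.0 + hi + shift)))
```

Identical vectors yield distance zero under all three measures, and each function accepts plain lists or NumPy arrays of equal length.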

Selecting Best k
The selection of an appropriate value for k plays a key role in determining the performance of kNN. A lot of work has been done in the context of classification, regression, and missing data imputation to deal with this issue. For example, Lall and Sharma [22] suggested the fixed choice k = √n when the sample size n > 100. However, this approach has been criticised for its lack of theoretical guarantees. Several more advanced approaches to determining the k-value have been proposed, for example, the kTree method, which learns a different k-value for each test sample [23], a sparse-based kNN method [24], and a method using a reconstruction framework [25]. For more details, we refer the readers to [23] and the references therein.
Cross-validation is one of the approaches that has gained popularity in machine learning for tuning hyperparameters and has also been considered for kNN. Recall that cross-validation splits the data into two folds, where one fold is used to train the model by learning suitable values for the hyperparameters and the other fold is used to validate the model. The m-fold cross-validation method extends this approach by randomly dividing the data into m equal (or nearly equal) folds. In other words, the process of cross-validation is repeated m times, such that in each iteration a different fold is held out for model validation and the remaining m − 1 folds are used to learn the hyperparameters.

Procedure 2 Computing Best k
Input: τ ∈ (0, 1), k_max, data
Split the data into m folds
for i = 1 to k_max do
    for j = 1 to m do
        Obtain k = i neighbours for each query point of the j-th fold using the distance measure d(·, ·)
        Compute expectiles from the k = i neighbours using Procedure 1
        Compute the error for the j-th fold using L_τ
    end for
    For each k = i, choose the minimum error over all folds
end for
Choose the k with the minimum cross-validation error
return Best k

Note that the algorithms er-boost and ex-svm use the cross-validation approach to tune their hyperparameters. We use the same approach in this study to select the best value of k, despite the existence of more advanced strategies, so that a fair comparison of ex-kNN with these algorithms can be made in terms of training time. Procedure 2 provides the pseudo code for selecting the best k-value by cross-validation.
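Procedure 2 can be sketched as follows. This is an illustration under our own assumptions, not the paper's code: it uses the Euclidean distance, averages the ALS validation error over the folds (a common variant of the fold-wise selection described above), and restates the IRLS step of Procedure 1 to stay self-contained; all function names are ours.

```python
import numpy as np

def irls_expectile(y, tau, tol=1e-8, max_iter=200):
    # IRLS expectile (Procedure 1), restated here for self-containment.
    e = y.mean()
    for _ in range(max_iter):
        w = np.where(y < e, 1.0 - tau, tau)
        e_new = np.sum(w * y) / np.sum(w)
        if abs(e_new - e) < tol:
            break
        e = e_new
    return e_new

def als_error(y_true, y_pred, tau):
    # Asymmetric least squares loss averaged over a validation fold.
    w = np.where(y_true < y_pred, 1.0 - tau, tau)
    return np.mean(w * (y_true - y_pred) ** 2)

def best_k(X, y, tau, k_max, m=5, seed=0):
    """Select k by m-fold cross-validation, a sketch of Procedure 2."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), m)
    cv_err = np.zeros(k_max)
    for j in range(m):
        valid = folds[j]
        train = np.concatenate([folds[i] for i in range(m) if i != j])
        # Euclidean distances from every validation point to every training point.
        D = np.linalg.norm(X[valid, None, :] - X[None, train, :], axis=2)
        order = np.argsort(D, axis=1)
        for k in range(1, k_max + 1):
            preds = np.array([irls_expectile(y[train][order[q, :k]], tau)
                              for q in range(len(valid))])
            cv_err[k - 1] += als_error(y[valid], preds, tau) / m
    return int(np.argmin(cv_err)) + 1
```

Any distance measure from Section 2.1 could replace the Euclidean distance in the computation of D without changing the rest of the procedure.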

The ex-kNN Algorithm
After choosing a distance measure from the list given in Section 2.1, the best k-value for the training part of the data is obtained by the cross-validation approach following Procedure 2. The selected best k-value is then used on the testing part of the data to compute the test error. The whole procedure of implementing ex-kNN is described in the following Algorithm 1.
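A compact sketch of the prediction step of ex-kNN might look as follows. The Euclidean distance and the function name are our assumptions, and any measure from Section 2.1 could be substituted; the inner loop is the IRLS update of Procedure 1.

```python
import numpy as np

def ex_knn_predict(X_train, y_train, X_query, k, tau, tol=1e-8, max_iter=200):
    """Predict conditional tau-expectiles at the query points: find the
    k nearest training points and run IRLS on their response values."""
    preds = np.empty(len(X_query))
    for q, xq in enumerate(X_query):
        d = np.linalg.norm(X_train - xq, axis=1)   # Euclidean distances
        nbrs = y_train[np.argsort(d)[:k]]          # k nearest responses
        e = nbrs.mean()                            # initialize with the mean
        for _ in range(max_iter):
            w = np.where(nbrs < e, 1.0 - tau, tau)
            e_new = np.sum(w * nbrs) / np.sum(w)
            if abs(e_new - e) < tol:
                break
            e = e_new
        preds[q] = e_new
    return preds
```

For τ = 0.5 the prediction reduces to the average of the k nearest responses, while smaller or larger τ shifts the prediction towards the lower or upper tail of the neighbourhood.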

Experimental Results
In this section, we conduct experiments on different datasets to compare the performance of ex-kNN with the existing algorithms for computing expectiles, namely er-boost and ex-svm, in terms of training time and test error. All of the experiments have been performed on an INTEL CORE i3-4010U (1.70 GHz) system with 4 GB RAM under the 64-bit version of WINDOWS 8. The run time during the experiments is measured using a single core, with the running of other processes minimized.
Recall that [14] considered nine datasets in their study to compare the performance of ex-svm with er-boost. To make a fair comparison of ex-kNN with ex-svm and er-boost, we have downloaded the same datasets following the details given by [14]. That is, the datasets CONCRETE-COMP, UPDRS-MOTOR, CYCLE-PP, AIRFOIL-NOISE, and HOUR have been downloaded from the UCI repository. Three datasets, NC-CRIME, HEAD-CIRCUM, and CAL-HOUSING, were extracted from the R packages Ecdat and AGD and from the StatLib repository, respectively. Finally, one dataset, MUNICH-RENT, was downloaded from the data archive of the Institute of Statistics, Ludwig-Maximilians-University. These datasets were scaled componentwise, such that all of the variables, including the response variable, lie in [−1, 1]^{d+1}, where d denotes the dimension of the data. Table 1 describes the characteristics of the considered datasets. All of the datasets were randomly divided into training and testing samples comprising 70% and 30%, respectively. The training sample is further divided into m randomly generated folds to implement the cross-validation approach for determining the best k-value. It is important to note that the algorithms er-boost and ex-svm are implemented in C++ to gain computational advantages. For a fair comparison regarding computational cost, we have implemented ex-kNN in R 3.6.1 using the libraries Rcpp [26] and ArmadilloRcpp [27]. The library Rcpp provides seamless integration of R and C++, whereas ArmadilloRcpp is a templated C++ linear algebra library. The use of these libraries makes the implementation of the ex-kNN algorithm close to the implementations of the er-boost and ex-svm algorithms.
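The componentwise scaling into [−1, 1] described above can be sketched as a simple min-max transform; the guard for constant columns and the function name are our additions.

```python
import numpy as np

def scale_to_unit(A):
    # Componentwise min-max scaling of each column of A into [-1, 1].
    lo, hi = A.min(axis=0), A.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    return 2.0 * (A - lo) / span - 1.0
```

Applied to the full data matrix (predictors and response), each column then spans exactly [−1, 1], which keeps the distance computations comparable across variables.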
Firstly, we evaluate the effect of the different distance measures introduced in Section 2.1 on the performance of ex-kNN for computing expectiles. Note that the performance is measured in terms of test error and computational time, and the distance measure that attains the minimum on these two evaluation factors is considered the best. In this context, for τ = 0.25, 0.50, 0.75, the test error and computational time of ex-kNN are computed and presented in Tables A1-A6 in Appendix A. By ranking the test error and computation time of ex-kNN for each distance measure on each dataset, we observe that no single distance measure performs well on all datasets in terms of test error and computational cost. For instance, looking at Table A1 (test error for τ = 0.25), we see that, for the dataset HOUR, the Euclidean distance provides the minimum test error, whereas the same distance measure behaves in entirely the opposite way for the dataset MUNICH-RENT. This leads us to conclude that the distance measure, which plays a vital role in the performance of ex-kNN, depends on the characteristics of the dataset. This observation was also made by [21] in their investigation. Therefore, one needs to consider the nature of the dataset when choosing a distance measure for kNN-type methods.
In order to determine the overall performance of a distance measure across all datasets, we have computed the average of the ranks assigned to the individual distance measures on the different datasets, see Tables A1-A6 in Appendix A. Clearly, the Canberra, Lorentzian, and Soergel distance measures can be labelled the best three when the goal is high accuracy of the results. On the other hand, when the objective is a low computational cost of ex-kNN, the Euclidean, Canberra, and Average (L1, L∞) distance measures rank as the top three. Furthermore, these best three distance measures regarding test error and computational cost do not hold the same order for τ = 0.25, 0.5, and 0.75, which indicates that a distance measure behaves differently on the same dataset for different τ-levels. To elaborate, we see in Tables A2 and A3 that the Canberra distance, which holds the top position in providing the minimum test error on most of the datasets when computing expectiles for τ = 0.50 and 0.75, attains the third position for τ = 0.25. The situation is similar for the other distance measures. It is interesting to note from the results of our experiments on the considered datasets that no single distance measure leads ex-kNN to both goals of high accuracy and low computational cost at the same time. In other words, the choice of a distance measure for ex-kNN depends not only on the dataset, but also on the objective. It is also important to note that the Euclidean distance, which has been used most often with kNN in the literature, shows generally poor performance when the goal is highly accurate predictions.
Finally, we compare the performance of ex-kNN with the existing packages for computing expectiles, er-boost and ex-svm, with respect to test error and computational cost. To perform the experiments with er-boost, we set the default value of boosting steps (M = 10) and use 5-fold cross-validation to choose the best value of the interaction (L) between variables. For more details regarding the experimental setting of er-boost, we refer the interested readers to [28]. To run ex-svm, which is part of the package liquidSVM, we downloaded the terminal version of liquidSVM for Windows 64-bit. The default setting of a 10 by 10 grid search of hyperparameters together with 5-fold cross-validation is used to tune these hyperparameters. For more details on ex-svm and liquidSVM, we refer the readers to [14,29], respectively. Finally, to compute the test error and computational cost of ex-kNN, we consider the distance measures that attain the top average rank for the τ-expectile, that is, the Canberra distance for test error and the Euclidean distance for computational time.
Furthermore, we also use five-fold cross-validation to determine the best k-value. Based on the aforementioned settings, the results for test error and computational cost of ex-kNN, er-boost, and ex-svm on the different datasets are presented in Tables 2 and 3. Comparing ex-kNN with er-boost regarding test error at τ = 0.25, 0.50, and 0.75, we see that ex-kNN, depending on the nature of the data, shows two to eight times better performance, see Figure 1. On the other hand, the performance of ex-kNN in terms of test error is comparable with ex-svm on some examples. Regarding computational cost, similar to the findings of [14] for ex-svm, we observe that ex-kNN is also sensitive to the training set size and less sensitive to the data dimension. However, it is interesting to note that ex-kNN, depending on the dataset, is up to five times more efficient than ex-svm on some examples. Moreover, ex-kNN is found to be considerably time-efficient, particularly when the datasets are high dimensional, as seen in Figure 1.

Conclusions
In this study, an algorithm called ex-kNN is proposed, combining the k-nearest neighbours approach with the asymmetric least squares loss function in order to compute expectiles. Because the performance of ex-kNN depends on the distance measure used to determine the neighbourhood of the query point, various distance measures are considered and their impact is evaluated in terms of test error and computational time. It is observed that no single distance measure can be associated with ex-kNN to achieve high performance on all kinds of datasets. To be more precise, it is found from the list of considered distance measures that Canberra, Lorentzian, and Soergel lead to the minimum test error, whereas Euclidean, Canberra, and Average of (L1, L∞) provide a low computational cost for ex-kNN. Furthermore, using nine real-world datasets, the performance of ex-kNN is compared with existing packages for computing expectiles, namely er-boost and ex-svm. The results showed that ex-kNN, depending on the nature of the data, performs two to eight times better than er-boost in terms of test error and shows comparable performance with ex-svm on some datasets. Regarding computational cost, it is found that ex-kNN is up to five times more efficient than ex-svm and much more efficient than er-boost.
To make a fair comparison of ex-kNN with the existing packages, this study is limited to the cross-validation approach for selecting the best value of k-neighbours. However, more advanced and efficient approaches for this purpose could be used with ex-kNN to further reduce the computational cost. Moreover, other loss functions for computing expectiles could be investigated and the results compared with the ones obtained in this study.
Author Contributions: Each author's contribution is equal. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: Data available on request due to restrictions, e.g., privacy or ethical. The data presented in this study are available on request from the corresponding author.

Acknowledgments:
The authors would like to thank the three referees for their constructive comments on the article.