Article

An Improved Variable Kernel Density Estimator Based on L2 Regularization

1 Department of Trace Inspection Technology, Criminal Investigation Police University of China, Shenyang 110854, China
2 Key Laboratory of Impression Evidence Examination and Identification Technology, The Ministry of Public Security of the People's Republic of China, Shenyang 110854, China
3 Big Data Institute, College of Computer Science & Software Engineering, Shenzhen University, Shenzhen 518060, China
4 National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen 518060, China
* Author to whom correspondence should be addressed.
Mathematics 2021, 9(16), 2004; https://doi.org/10.3390/math9162004
Submission received: 25 June 2021 / Revised: 12 August 2021 / Accepted: 17 August 2021 / Published: 21 August 2021

Abstract

The nature of the kernel density estimator (KDE) is to find the underlying probability density function (p.d.f.) for a given dataset. The key to training the KDE is to determine the optimal bandwidth or Parzen window. In the fixed KDE (FKDE), all data points share a fixed bandwidth (a scalar for univariate KDE and a vector for multivariate KDE). In this paper, we propose an improved variable KDE (IVKDE), which determines the optimal bandwidth for each data point in the given dataset based on the integrated squared error (ISE) criterion with an L2 regularization term. An effective optimization algorithm is developed to solve the improved objective function. We compare the estimation performance of IVKDE with those of FKDE and of a VKDE based on the ISE criterion without L2 regularization on four univariate and four multivariate probability distributions. The experimental results show that IVKDE obtains lower estimation errors, demonstrating its effectiveness.

1. Introduction

It is very important for many machine learning algorithms to estimate the unknown probability density functions (p.d.f.s) of given datasets, e.g., Bayesian classifiers [1,2], density-based clustering algorithms [3,4], and mutual information-based feature selection algorithms [5,6]. To obtain the unknown p.d.f., an effective kernel density estimator (KDE) must first be constructed. The classical KDE training method is the Parzen window method [7], which fits the unknown p.d.f. with a superposition of kernel functions sharing a fixed Parzen window (i.e., bandwidth). The most commonly used kernels [8] include the uniform, triangular, Epanechnikov, biweight, triweight, cosine, and Gaussian kernels. The bandwidth plays a more important role in p.d.f. estimation than the choice of kernel: a large bandwidth results in an over-smoothed estimate, while a small bandwidth leads to an under-smoothed one.
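To make the bandwidth effect concrete, the following minimal sketch (our illustration, not code from any cited work; the sample data and bandwidth values are arbitrary) evaluates a univariate Gaussian-kernel Parzen window estimate at two bandwidths:

    import numpy as np

    def fkde(x_eval, data, h):
        """Fixed-bandwidth Parzen window estimate with a Gaussian kernel."""
        u = (x_eval[:, None] - data[None, :]) / h      # pairwise scaled distances
        return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

    rng = np.random.default_rng(0)
    data = rng.normal(0.0, 1.0, size=200)              # samples from N(0, 1)
    grid = np.linspace(-4.0, 4.0, 81)

    f_small = fkde(grid, data, h=0.05)   # under-smoothed: spiky, chases the samples
    f_large = fkde(grid, data, h=2.0)    # over-smoothed: flattens the true shape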
How to determine an optimal bandwidth is the key point in training a KDE. To select an appropriate bandwidth, an effective error criterion must first be designed [9]. Commonly used error criteria include the integrated squared error (ISE) and the mean integrated squared error (MISE). Currently, there are two main ways to design a KDE: the classical Parzen window method with a fixed bandwidth parameter, named the fixed kernel density estimator (FKDE), and the modified Parzen window method with variable bandwidth parameters, named the variable kernel density estimator (VKDE). The representative studies on FKDE and VKDE are summarized as follows.
  • Fixed kernel density estimator. The rule-of-thumb-based KDE (RoT-KDE) [10] was designed based on the asymptotic MISE (AMISE) criterion by assuming the unknown p.d.f. to be a normal p.d.f. Because of this inappropriate assumption about the true p.d.f., RoT-KDE is a naive KDE that tends to select an over-smoothed bandwidth [8]. Apart from the simple and direct RoT-KDE, there are three more sophisticated KDEs: the bootstrap-based KDE (BS-KDE) [11], the biased cross-validation-based KDE (BCV-KDE) [12], and the unbiased cross-validation-based KDE (UCV-KDE) [13]. BS-KDE determines the optimal bandwidth based on the MISE criterion by using the bootstrap technique to estimate the true p.d.f. BCV-KDE is also designed based on the MISE criterion and calculates the optimal bandwidth by establishing a relationship between the true p.d.f. and the derivative of the estimated p.d.f. UCV-KDE uses the ISE criterion to optimize the bandwidth by representing the true p.d.f. with the estimated leave-one-out p.d.f. In RoT-KDE, BS-KDE, BCV-KDE, and UCV-KDE, all samples in the given dataset share a fixed bandwidth, so the bandwidth cannot be used to adjust the roles of individual data points in p.d.f. estimation.
  • Variable kernel density estimator. The VKDE model was first proposed by Breiman et al. [14], who introduced a variable bandwidth for each data point in the given dataset and represented it by the distance from the data point to its k-th nearest neighbor. Jones [15] clarified the difference between a VKDE employing a different bandwidth for each data point and a VKDE whose bandwidth is a function of the estimation location. Terrell and Scott [16] derived an optimization rule for variable bandwidths based on the asymptotic mean squared error (AMSE) criterion. Hall et al. [17] improved the VKDE proposed in [16] by further analyzing its convergence rates. Wu et al. [18] proposed a strategy that expresses the variable bandwidth in VKDE as the product of a local bandwidth factor and a global smoothing parameter. Suaray [19] proposed a VKDE for p.d.f. estimation on censored data. Klebanov [20] proposed an axiomatic approach to constructing a VKDE which guarantees that the density estimate is invariant under linear transformations of the original density as well as under splitting of the density into several well-separated parts.
Compared with FKDEs, the main merit of VKDEs is that the variable bandwidths can flexibly adjust the importance of data points during p.d.f. estimation. This paper focuses on improving the VKDE. Jones [21] discussed the roles of the ISE and MISE criteria in p.d.f. estimation. We consider using the ISE criterion to calculate the optimal bandwidths for the VKDE; however, mathematical analysis indicates that the ISE criterion alone usually leads to an over-smoothed p.d.f. estimate. Inspired by the combination of empirical and structural risks, we propose an improved variable KDE (IVKDE) which determines the optimal bandwidth for each data point based on the ISE criterion with an L2 regularization term. The ISE and L2 regularization terms represent the empirical and structural risks of constructing the VKDE, respectively. To obtain the optimal variable bandwidths, an effective optimization scheme is developed to solve the improved objective function. We conduct exhaustive experiments to validate the rationality, feasibility, and effectiveness of IVKDE. The experimental results show that IVKDE is convergent and obtains desirable p.d.f. estimates. In comparison with FKDE and with a VKDE based on the ISE criterion without L2 regularization on four univariate and four multivariate probability distributions, IVKDE obtains lower estimation errors, demonstrating its effectiveness.
The remainder of this paper is organized as follows. In Section 2, we describe the basic principles of the variable kernel density estimator. In Section 3, we introduce the improved variable kernel density estimator. In Section 4, we provide experimental results and analysis. Finally, in Section 5, we conclude this paper and discuss future works.

2. Basic Principle of VKDE

For the given dataset $X = \{\tilde{x}_n \mid \tilde{x}_n = (x_{n1}, x_{n2}, \ldots, x_{nD}),\ x_{nd} \in \mathbb{R},\ n = 1, 2, \ldots, N,\ d = 1, 2, \ldots, D\}$, the classical fixed KDE (FKDE), i.e., the Parzen window method [7], is constructed as

$$\hat{f}_{\mathrm{FKDE}}(\tilde{x}) = \hat{f}_{\mathrm{FKDE}}(x_1, x_2, \ldots, x_D) = \frac{1}{N \prod_{d=1}^{D} h_d} \sum_{n=1}^{N} \kappa\left(\frac{x_1 - x_{n1}}{h_1}, \frac{x_2 - x_{n2}}{h_2}, \ldots, \frac{x_D - x_{nD}}{h_D}\right), \quad (1)$$

where

$$\kappa(\tilde{u}) = \frac{1}{(\sqrt{2\pi})^{D}} \exp\left(-\frac{1}{2}\tilde{u}^{\mathrm{T}}\tilde{u}\right) = \frac{1}{(\sqrt{2\pi})^{D}} \exp\left(-\frac{1}{2}\sum_{d=1}^{D} u_d^2\right), \quad (2)$$

$\tilde{u} = (u_1, u_2, \ldots, u_D) \in \mathbb{R}^{D}$, is the $D$-variate Gaussian kernel and $\tilde{h} = (h_1, h_2, \ldots, h_D)$, $h_d > 0$, $d = 1, 2, \ldots, D$, is the bandwidth vector. Substituting Equation (2) into Equation (1) yields the estimated p.d.f. of dataset $X$ as
$$\hat{f}_{\mathrm{FKDE}}(\tilde{x}) = \frac{1}{N} \sum_{n=1}^{N} \mathcal{N}(\tilde{x}_n, \Sigma), \quad (3)$$

where

$$\mathcal{N}(\tilde{x}_n, \Sigma) = \frac{1}{(\sqrt{2\pi})^{D} |\Sigma|^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(\tilde{x} - \tilde{x}_n)\Sigma^{-1}(\tilde{x} - \tilde{x}_n)^{\mathrm{T}}\right) \quad (4)$$

is the $D$-dimensional Gaussian distribution with mean vector $\tilde{x}_n = (x_{n1}, x_{n2}, \ldots, x_{nD})$ and covariance matrix $\Sigma = \mathrm{diag}(h_1^2, h_2^2, \ldots, h_D^2)$. Equation (3) shows that the estimated p.d.f. is a superposition of $N$ Gaussian p.d.f.s.
The p.d.f. of dataset $X$ estimated by VKDE is

$$\hat{f}_{\mathrm{VKDE}}(\tilde{x}) = \frac{1}{N} \sum_{n=1}^{N} \mathcal{N}(\tilde{x}_n, \Sigma_n), \quad (5)$$

where the covariance matrix of $\mathcal{N}(\tilde{x}_n, \Sigma_n)$ is $\Sigma_n = \mathrm{diag}(h_{n1}^2, h_{n2}^2, \ldots, h_{nD}^2)$, $h_{nd} > 0$, $n = 1, 2, \ldots, N$, $d = 1, 2, \ldots, D$. Equation (5) can be further expanded as

$$\hat{f}_{\mathrm{VKDE}}(\tilde{x}) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{(\sqrt{2\pi})^{D} |\Sigma_n|^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(\tilde{x} - \tilde{x}_n)\Sigma_n^{-1}(\tilde{x} - \tilde{x}_n)^{\mathrm{T}}\right) = \frac{1}{N} \sum_{n=1}^{N} \prod_{d=1}^{D} \frac{1}{\sqrt{2\pi}\, h_{nd}} \exp\left(-\frac{1}{2}\left(\frac{x_d - x_{nd}}{h_{nd}}\right)^2\right), \quad (6)$$

where $\tilde{h}_n = (h_{n1}, h_{n2}, \ldots, h_{nD})$, $n = 1, 2, \ldots, N$, is the variable bandwidth vector corresponding to the $n$-th data point. Thus, $N \times D$ bandwidth parameters need to be determined in VKDE.
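As a concrete reading of Equation (6), the sketch below (a minimal illustration under our own variable naming, not the authors' released code) evaluates the VKDE with a separate bandwidth vector per data point; setting all rows of H equal recovers the FKDE of Equation (3):

    import numpy as np

    def vkde(x_eval, data, H):
        """Variable-bandwidth KDE of Equation (6).

        x_eval: (M, D) evaluation points; data: (N, D) samples;
        H: (N, D) positive bandwidths, one vector per data point.
        """
        # u[m, n, d] = (x_eval[m, d] - data[n, d]) / H[n, d]
        u = (x_eval[:, None, :] - data[None, :, :]) / H[None, :, :]
        kernels = np.exp(-0.5 * (u**2).sum(axis=2))      # (M, N) Gaussian factors
        norms = (np.sqrt(2.0 * np.pi) * H).prod(axis=1)  # per-point normalizers
        return (kernels / norms[None, :]).mean(axis=1)   # average of N Gaussians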

3. Proposed IVKDE

In this section, we first present an improved VKDE which uses an objective function with an L2 regularization term to evaluate the quality of the variable bandwidths. Then, a bandwidth optimization algorithm is developed to compute the optimal variable bandwidths from this objective function.
The purpose of VKDE training is to make the estimated p.d.f. $\hat{f}_{\mathrm{VKDE}}(\tilde{x})$ as close to the true p.d.f. $f(\tilde{x})$ as possible. From Equation (6), we can see that the performance of VKDE depends only on the selection of the bandwidth vectors. We want to select the bandwidth vectors that minimize the error between $\hat{f}_{\mathrm{VKDE}}(\tilde{x})$ and $f(\tilde{x})$. To measure this estimation error, an effective error criterion must first be designed. The integrated squared error (ISE),

$$\mathrm{ISE}(\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_N) = \int_{-\infty}^{+\infty} \left[\hat{f}_{\mathrm{VKDE}}(\tilde{x}) - f(\tilde{x})\right]^2 d\tilde{x} = \int_{-\infty}^{+\infty} \hat{f}_{\mathrm{VKDE}}^2(\tilde{x})\, d\tilde{x} - 2\int_{-\infty}^{+\infty} \hat{f}_{\mathrm{VKDE}}(\tilde{x}) f(\tilde{x})\, d\tilde{x} + \int_{-\infty}^{+\infty} f^2(\tilde{x})\, d\tilde{x}, \quad (7)$$

is used in our proposed IVKDE to measure the estimation error.
In Equation (7), the third term $\int_{-\infty}^{+\infty} f^2(\tilde{x})\, d\tilde{x}$ does not depend on the unknown bandwidth vectors. Thus, the optimal variable bandwidth vectors can be obtained by minimizing the simplified ISE criterion:

$$\mathrm{ISE}^{*}(\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_N) = \int_{-\infty}^{+\infty} \hat{f}_{\mathrm{VKDE}}^2(\tilde{x})\, d\tilde{x} - 2\int_{-\infty}^{+\infty} \hat{f}_{\mathrm{VKDE}}(\tilde{x}) f(\tilde{x})\, d\tilde{x}. \quad (8)$$
Equation (8) is a data-driven error measure which easily leads to a data-adaptive KDE and makes the estimated p.d.f. inclined to fit the given dataset $X$ too closely. To guarantee the good generalization capability of the KDE, we use the following objective function to select the bandwidth vectors for our proposed IVKDE:

$$L(\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_N) = \mathrm{ISE}^{*}(\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_N) + \frac{\xi}{N} \sum_{n=1}^{N} \left\|\tilde{h}_n\right\|_2^2, \quad (9)$$

where the second term is the L2 regularization term, $\|\tilde{h}_n\|_2$ is the L2 norm of bandwidth vector $\tilde{h}_n$, $n = 1, 2, \ldots, N$, and $\xi > 0$ is the regularization factor.
Substituting Equation (6) into the terms $\int_{-\infty}^{+\infty} \hat{f}_{\mathrm{VKDE}}^2(\tilde{x})\, d\tilde{x}$ and $\int_{-\infty}^{+\infty} \hat{f}_{\mathrm{VKDE}}(\tilde{x}) f(\tilde{x})\, d\tilde{x}$ yields

$$\int_{-\infty}^{+\infty} \hat{f}_{\mathrm{VKDE}}^2(\tilde{x})\, d\tilde{x} = \int_{-\infty}^{+\infty} \left[\frac{1}{N} \sum_{n=1}^{N} \prod_{d=1}^{D} \frac{1}{\sqrt{2\pi}\, h_{nd}} \exp\left(-\frac{1}{2}\left(\frac{x_d - x_{nd}}{h_{nd}}\right)^2\right)\right]^2 d\tilde{x} = \frac{1}{(2\sqrt{\pi})^{D} N^2} \sum_{n=1}^{N} \frac{1}{h_{n1} h_{n2} \cdots h_{nD}} + \frac{1}{(\sqrt{2\pi})^{D} N^2} \sum_{n=1}^{N} \sum_{\substack{m=1 \\ m \neq n}}^{N} \frac{1}{\prod_{d=1}^{D} \sqrt{h_{nd}^2 + h_{md}^2}} \exp\left(-\frac{1}{2} \sum_{d=1}^{D} \left(\frac{x_{nd} - x_{md}}{\sqrt{h_{nd}^2 + h_{md}^2}}\right)^2\right) \quad (10)$$

and

$$\int_{-\infty}^{+\infty} \hat{f}_{\mathrm{VKDE}}(\tilde{x}) f(\tilde{x})\, d\tilde{x} = E\left[\hat{f}_{\mathrm{VKDE}}^{(-n)}(\tilde{x}_n)\right] \approx \frac{1}{N} \sum_{n=1}^{N} \hat{f}_{\mathrm{VKDE}}^{(-n)}(\tilde{x}_n) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{N-1} \sum_{\substack{m=1 \\ m \neq n}}^{N} \prod_{d=1}^{D} \frac{1}{\sqrt{2\pi}\, h_{md}} \exp\left(-\frac{1}{2}\left(\frac{x_{nd} - x_{md}}{h_{md}}\right)^2\right) = \frac{1}{(\sqrt{2\pi})^{D} N (N-1)} \sum_{n=1}^{N} \sum_{\substack{m=1 \\ m \neq n}}^{N} \frac{1}{h_{m1} h_{m2} \cdots h_{mD}} \exp\left(-\frac{1}{2} \sum_{d=1}^{D} \left(\frac{x_{nd} - x_{md}}{h_{md}}\right)^2\right), \quad (11)$$

respectively, where $\hat{f}_{\mathrm{VKDE}}^{(-n)}(\tilde{x}_n)$, $n = 1, 2, \ldots, N$, is the leave-one-out estimator obtained through the unbiased cross-validation (UCV) method.
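Because Equations (10) and (11) are closed forms, the objective in Equation (9) can be evaluated without numerical integration. The following sketch (our own vectorized rendering of the formulas above; the function and variable names are ours, not the authors') computes L for a bandwidth matrix H:

    import numpy as np

    def ivkde_objective(data, H, xi):
        """L of Equation (9): closed-form ISE* (Equations (10)-(11)) plus the L2 term.

        data: (N, D) samples; H: (N, D) bandwidths; xi: regularization factor.
        """
        N, D = data.shape
        diff = data[:, None, :] - data[None, :, :]      # pairwise differences
        s = H[:, None, :]**2 + H[None, :, :]**2         # h_nd^2 + h_md^2
        off = ~np.eye(N, dtype=bool)                    # mask for m != n

        # Equation (10): integral of the squared estimate
        cross = np.exp(-0.5 * (diff**2 / s).sum(axis=2)) / np.sqrt(s.prod(axis=2))
        int_f2 = ((1.0 / H.prod(axis=1)).sum() / (2.0 * np.sqrt(np.pi))**D
                  + cross[off].sum() / np.sqrt(2.0 * np.pi)**D) / N**2

        # Equation (11): leave-one-out (UCV) estimate of the cross term
        u2 = (diff**2 / H[None, :, :]**2).sum(axis=2)   # scaled by h_md (row m)
        loo = np.exp(-0.5 * u2) / H.prod(axis=1)[None, :]
        int_ff = loo[off].sum() / (np.sqrt(2.0 * np.pi)**D * N * (N - 1))

        return int_f2 - 2.0 * int_ff + xi * (H**2).sum() / N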
IVKDE uses the optimal bandwidth vectors that minimize the objective function with the L2 regularization term. To solve for the optimal bandwidths, we first calculate the partial derivative of $L(\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_N)$ with respect to $h_{nd}$, $n = 1, 2, \ldots, N$, $d = 1, 2, \ldots, D$. Let

$$\Delta_{h_{nd}} = \frac{\partial L(\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_N)}{\partial h_{nd}} = -\frac{1}{(2\sqrt{\pi})^{D} N^2\, h_{nd} \prod_{k=1}^{D} h_{nk}} + \frac{2 h_{nd}\, \Delta_1}{(\sqrt{2\pi})^{D} N^2} - \frac{2\, \Delta_2}{(\sqrt{2\pi})^{D} N (N-1)\, h_{nd} \prod_{k=1}^{D} h_{nk}} + \frac{2 \xi h_{nd}}{N}, \quad (12)$$

where

$$\Delta_1 = \sum_{\substack{m=1 \\ m \neq n}}^{N} \left[\left(\frac{x_{md} - x_{nd}}{h_{nd}^2 + h_{md}^2}\right)^2 - \frac{1}{h_{nd}^2 + h_{md}^2}\right] \frac{1}{\prod_{k=1}^{D} \sqrt{h_{nk}^2 + h_{mk}^2}} \exp\left(-\frac{1}{2} \sum_{k=1}^{D} \left(\frac{x_{mk} - x_{nk}}{\sqrt{h_{nk}^2 + h_{mk}^2}}\right)^2\right)$$

and

$$\Delta_2 = \sum_{\substack{m=1 \\ m \neq n}}^{N} \left[\frac{(x_{nd} - x_{md})^2}{h_{nd}^2} - 1\right] \exp\left(-\frac{1}{2} \sum_{k=1}^{D} \left(\frac{x_{mk} - x_{nk}}{h_{nk}}\right)^2\right).$$
It is very difficult to obtain an analytic solution for $h_{nd}$, $n = 1, 2, \ldots, N$, $d = 1, 2, \ldots, D$, from $\Delta_{h_{nd}} = 0$. We therefore design Algorithm 1, which uses gradient descent to compute the optimal bandwidths for IVKDE based on the objective function in Equation (9). Algorithm 1 iteratively updates the bandwidths with a decaying learning rate. Because $L(\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_N)$ is to be minimized, the update moves along the negative gradient.
Algorithm 1 Solving the optimal bandwidths for IVKDE.
Input: The given dataset $X$, the regularization factor $\xi > 0$, the maximum learning rate $\alpha_{\mathrm{Max}}$, the minimum learning rate $\alpha_{\mathrm{Min}}$, the maximum number of iterations $T_{\mathrm{Max}}$, the stopping threshold $\delta > 0$, and the initial bandwidths $h_{nd}^{(0)}$, $n = 1, 2, \ldots, N$, $d = 1, 2, \ldots, D$.
Output: The optimal bandwidths $h_{nd}^{*}$, $n = 1, 2, \ldots, N$, $d = 1, 2, \ldots, D$.
1: $t = 1$; // $t$ is the iteration counter
2: repeat
3:   for $n = 1$ to $N$ do
4:     for $d = 1$ to $D$ do
5:       $h_{nd}^{(t)} = h_{nd}^{(t-1)} - \left[\alpha_{\mathrm{Max}} - \frac{\alpha_{\mathrm{Max}} - \alpha_{\mathrm{Min}}}{T_{\mathrm{Max}}}\, t\right] \Delta_{h_{nd}}^{(t-1)}$;
6:     end for
7:   end for
8:   $t = t + 1$;
9: until $\left|L(\tilde{h}_1^{(t)}, \ldots, \tilde{h}_N^{(t)}) - L(\tilde{h}_1^{(t-1)}, \ldots, \tilde{h}_N^{(t-1)})\right| < \delta$ or $t > T_{\mathrm{Max}}$
10: $h_{nd}^{*} = h_{nd}^{(t)}$, $n = 1, 2, \ldots, N$, $d = 1, 2, \ldots, D$.
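A compact Python rendering of Algorithm 1 follows (a sketch under our own naming that reuses the ivkde_objective function above; for brevity it takes the gradient of Equation (9) by central finite differences instead of the closed form in Equation (12), and it clamps the bandwidths to stay positive, a practical safeguard the pseudocode leaves implicit):

    import numpy as np

    def ivkde_bandwidths(data, xi=0.3, a_max=0.4, a_min=0.01,
                         t_max=1500, delta=1e-8, h0=0.3, eps=1e-6):
        """Gradient descent with a linearly decaying learning rate (Algorithm 1)."""
        N, D = data.shape
        H = np.full((N, D), h0)                         # h_nd^(0)
        prev = ivkde_objective(data, H, xi)
        for t in range(1, t_max + 1):
            lr = a_max - (a_max - a_min) * t / t_max    # decaying learning rate
            grad = np.zeros_like(H)
            for n in range(N):                          # finite-difference gradient
                for d in range(D):
                    Hp, Hm = H.copy(), H.copy()
                    Hp[n, d] += eps
                    Hm[n, d] -= eps
                    grad[n, d] = (ivkde_objective(data, Hp, xi)
                                  - ivkde_objective(data, Hm, xi)) / (2.0 * eps)
            H = np.maximum(H - lr * grad, 1e-4)         # step; keep h_nd > 0
            cur = ivkde_objective(data, H, xi)
            if abs(cur - prev) < delta:                 # stopping criterion
                break
            prev = cur
        return H

Note that the finite-difference inner loop costs O(ND) objective evaluations per iteration; the closed-form gradient of Equation (12) is preferable when N is large.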

4. Experimental Results and Analysis

We conduct three experiments based on eight different probability distributions as shown in Table 1 to validate the rationality, feasibility, and effectiveness of the proposed IVKDE. The graphics of these eight p.d.f.s for the given parameters are presented in Figure 1.

4.1. Experimental Setup

The rationality experiment checks the convergence of Algorithm 1, the feasibility experiment shows the capability of IVKDE to estimate the given p.d.f.s, and the effectiveness is demonstrated by comparing the estimation performance of IVKDE with those of FKDE and VKDE. For FKDE and VKDE, the optimal bandwidths are also determined with the gradient descent method. The synthetic datasets drawn from the above-mentioned distributions are accessible via our BaiduPan (https://pan.baidu.com/s/1YhkkrckQA_e2GNd8haLE1g, accessed on 25 June 2021) with extraction code vn6j. All the estimators are implemented in the Python programming language and run on a PC with an Intel(R) quad-core 3.00 GHz i5-7400 CPU and 16 GB of memory.

4.2. Rationality of IVKDE

We test the convergence of Algorithm 1 on random data points drawn from the F, normal, two-dimensional normal, and bimodal normal distributions with the following parameters:
  • F: $N = 1000$ and $n_1 = n_2 = 20$;
  • Normal: $N = 1000$, $\mu = 0$, and $\sigma = 1$;
  • Two-dimensional normal: $N = 1000$, $\tilde{\mu} = (0, 3)$, and $\Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$;
  • Bimodal normal: $N = 1000$, $\tilde{\mu}_1 = (0, 0)$, $\tilde{\mu}_2 = (3, 3)$, $\Sigma_1 = \Sigma_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, and $\varepsilon_1 = \varepsilon_2 = \frac{1}{2}$.
For each distribution, we run Algorithm 1 ten times with the following parameters: $T_{\mathrm{Max}} = 2500$, $\alpha_{\mathrm{Max}} = 1$, $\alpha_{\mathrm{Min}} = 0.001$, $\delta = 0$, and $h_{nd}^{(0)} = 0.5$. We track the variation of the bandwidth sum as the number of iterations increases, where the bandwidth sum is calculated as

$$\mathrm{sum}^{(t)}(\tilde{h}_1, \tilde{h}_2, \ldots, \tilde{h}_N) = \frac{1}{ND} \sum_{n=1}^{N} \sum_{d=1}^{D} h_{nd}^{(t)}, \quad t = 1, 2, \ldots, T_{\mathrm{Max}}. \quad (13)$$
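In implementation terms, Equation (13) is simply the mean of all N × D current bandwidths; with the (N, D) bandwidth matrix H from the sketches above, it reduces to a one-liner (our hypothetical helper name):

    def bandwidth_sum(H):
        # Equation (13): average of all N*D bandwidths at the current iteration
        return H.mean()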
Figure 2 shows that Algorithm 1 converges for different regularization factors $\xi$ on the given p.d.f.s. The curves of the bandwidth sums first decrease and then stabilize as the number of iterations increases. This indicates that Algorithm 1 is convergent and can find the optimal bandwidths for IVKDE.

4.3. Feasibility of IVKDE

We check the p.d.f. estimation capability of IVKDE on the F and two-dimensional normal distributions with the following parameters:
  • F: $N = 1000$ and $n_1 = n_2 = 20$;
  • Two-dimensional normal: $N = 1000$, $\tilde{\mu} = (0, 3)$, and $\Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$.
We use Algorithm 1 to determine the optimal bandwidths for each distribution based on the random data points, with the parameters of Algorithm 1 set as $T_{\mathrm{Max}} = 1500$, $\alpha_{\mathrm{Max}} = 0.4$, $\alpha_{\mathrm{Min}} = 0.01$, $\delta = 10^{-8}$, $h_{nd}^{(0)} = 0.3$, and $\xi = 0.3$ for the F distribution, and $T_{\mathrm{Max}} = 500$, $\alpha_{\mathrm{Max}} = 0.2$, $\alpha_{\mathrm{Min}} = 0.1$, $\delta = 10^{-8}$, $h_{nd}^{(0)} = 0.36$, and $\xi = 0.3$ for the two-dimensional normal distribution. The estimated p.d.f.s are presented in Figure 3 and Figure 4. These two figures show intuitively that IVKDE can estimate the underlying p.d.f.s from the given data points: the estimated p.d.f.s are very close to the true p.d.f.s. These experimental results show that IVKDE is feasible for estimating an unknown p.d.f.

4.4. Effectiveness of IVKDE

On the eight probability distributions shown in Table 1, we compare the p.d.f. estimation performance of IVKDE with those of FKDE and VKDE. The parameters of the three kernel density estimators are summarized in Table 2, and the comparative results are listed in Table 3. We use the mean absolute error (MAE) to evaluate the training and testing performance of the three estimators. Assume the true and estimated p.d.f. values for the given dataset $X$ are $y_1, y_2, \ldots, y_N$ and $\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_N$, respectively. Then, the MAE on dataset $X$ is calculated as

$$\mathrm{MAE}(X) = \frac{1}{N} \sum_{n=1}^{N} \left|y_n - \hat{y}_n\right|. \quad (14)$$
Table 3 shows that IVKDE obtains better p.d.f. estimation performance on the training and testing datasets than FKDE and VKDE. We carry out a statistical test on the comparative results using the sign test method [22]. For a pairwise comparison between methods A and B, A is significantly better than B at the given significance level if the number of A's wins reaches the critical number. Eight different probability distributions are used to compare the estimation performance of FKDE, VKDE, and IVKDE, so the critical win number at the significance level 0.05 is $\lceil \frac{8}{2} + 1.96 \times \frac{\sqrt{8}}{2} \rceil = 7$. The win numbers of IVKDE vs. FKDE and VKDE on the training datasets are 7 and 8, respectively, which indicates that IVKDE performs significantly better than both FKDE and VKDE on the training datasets. The win numbers of IVKDE vs. FKDE and VKDE on the testing datasets are 6 and 8, respectively, which indicates that IVKDE performs significantly better than VKDE on the testing datasets. The experimental and statistical results show that IVKDE improves the p.d.f. estimation performance of VKDE, demonstrating its effectiveness.
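As a quick arithmetic check (ours, not from the paper), the critical win number follows directly from the normal approximation to the sign test:

    import math

    n_datasets = 8  # number of compared probability distributions
    critical = math.ceil(n_datasets / 2 + 1.96 * math.sqrt(n_datasets) / 2)
    print(critical)  # 7, since 4 + 1.96 * sqrt(8) / 2 = 6.77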

5. Conclusions and Future Works

This paper presented an improved variable kernel density estimator (IVKDE) that uses both the integrated squared error (ISE) criterion and L2 regularization to determine the optimal bandwidths. The L2 regularization effectively avoids over-smoothed bandwidth selection. The experimental results demonstrated the rationality, feasibility, and effectiveness of the proposed IVKDE. Future work will proceed along the following research directions: (1) using IVKDE to estimate the unknown p.d.f. of large-scale datasets [23] and (2) finding practical applications for IVKDE in data mining and machine learning.

Author Contributions

Methodology, Y.J.; Writing—Original Draft Preparation, Writing—Review and Editing, Y.H.; Validation, D.H. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was supported by the Basic Research Foundation of Strengthening Police with Science and Technology of the Ministry of Public Security (2017GABJC09), Open Foundation of Key Laboratory of Impression Evidence Examination and Identification Technology, The Ministry of Public Security of the People’s Republic of China (HJKF201901), Basic Research Foundation of Shenzhen (20210312191246002), and the Scientific Research Foundation of Shenzhen University for Newly-Introduced Teachers (2018060).

Data Availability Statement

The data presented in this study are available in BaiduPan https://pan.baidu.com/s/1YhkkrckQA_e2GNd8haLE1g (accessed on 25 June 2021) with extraction code vn6j.

Acknowledgments

We would like to thank the editors and two anonymous reviewers whose meticulous readings and valuable suggestions helped us to improve this paper significantly after two rounds of review.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

p.d.f.   Probability Density Function
KDE   Kernel Density Estimator
ISE   Integrated Squared Error
MISE   Mean Integrated Squared Error
FKDE   Fixed Kernel Density Estimator
VKDE   Variable Kernel Density Estimator
RoT   Rule of Thumb
BS   Bootstrap
BCV   Biased Cross-Validation
UCV   Unbiased Cross-Validation
IVKDE   Improved Variable Kernel Density Estimator
MAE   Mean Absolute Error

References

  1. John, G.H.; Langley, P. Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, San Francisco, CA, USA, 18–20 August 1995; pp. 338–345. [Google Scholar]
  2. Wang, X.Z.; He, Y.L.; Wang, D.D. Non-naive Bayesian classifiers for classification problems with continuous attributes. IEEE Trans. Cybern. 2014, 44, 21–39. [Google Scholar] [CrossRef] [PubMed]
  3. Azzalini, A.; Menardi, G. Clustering via nonparametric density estimation: The R package pdfCluster. J. Stat. Softw. 2014, 57, 1–26. [Google Scholar] [CrossRef] [Green Version]
  4. Cuevas, A.; Febrero, M.; Fraiman, R. Cluster analysis: A further approach based on density estimation. Comput. Stat. Data Anal. 2001, 36, 441–459. [Google Scholar] [CrossRef]
  5. Kwak, N.; Choi, C.H. Input feature selection by mutual information based on Parzen window. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 1667–1671. [Google Scholar] [CrossRef] [Green Version]
  6. Peng, H.C.; Long, F.H.; Ding, C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238. [Google Scholar] [CrossRef] [PubMed]
  7. Parzen, E. On estimation of a probability density function and mode. Ann. Math. Stat. 1962, 33, 1065–1076. [Google Scholar] [CrossRef]
  8. Wand, M.P.; Jones, M.C. Kernel Smoothing; Chapman and Hall: London, UK, 1994. [Google Scholar]
  9. Silverman, B.W. Density Estimation for Statistics and Data Analysis; Routledge: Abingdon, UK, 2018. [Google Scholar]
  10. Chen, S. Optimal bandwidth selection for kernel density functionals estimation. J. Probab. Stat. 2015, 2015, 242683. [Google Scholar] [CrossRef] [Green Version]
  11. Taylor, C.C. Bootstrap choice of the smoothing parameter in kernel density estimation. Biometrika 1989, 76, 705–712. [Google Scholar] [CrossRef]
  12. Scott, D.W.; Terrell, G.R. Biased and unbiased cross-validation in density estimation. J. Am. Stat. Assoc. 1987, 82, 1131–1146. [Google Scholar] [CrossRef]
  13. Bowman, A.W. An alternative method of cross-validation for the smoothing of density estimates. Biometrika 1984, 71, 353–360. [Google Scholar] [CrossRef]
  14. Breiman, L.; Meisel, W.; Purcell, E. Variable kernel estimates of multivariate densities. Technometrics 1977, 19, 135–144. [Google Scholar] [CrossRef]
  15. Jones, M.C. Variable kernel density estimates and variable kernel density estimates. Aust. J. Stat. 1990, 32, 361–371. [Google Scholar] [CrossRef]
  16. Terrell, G.R.; Scott, D.W. Variable kernel density estimation. Ann. Stat. 1992, 20, 1236–1265. [Google Scholar] [CrossRef]
  17. Hall, P.; Hu, T.C.; Marron, J.S. Improved variable window kernel estimates of probability densities. Ann. Stat. 1995, 23, 1–10. [Google Scholar] [CrossRef]
  18. Wu, T.J.; Chen, C.F.; Chen, H.Y. A variable bandwidth selector in multivariate kernel density estimation. Stat. Probab. Lett. 2007, 77, 462–467. [Google Scholar] [CrossRef]
  19. Suaray, K. Variable bandwidth kernel density estimation for censored data. J. Stat. Theory Pract. 2011, 5, 221–229. [Google Scholar] [CrossRef]
  20. Klebanov, I. Axiomatic Approach to Variable Kernel Density Estimation. arXiv 2018, arXiv:1805.01729. Available online: https://arxiv.org/abs/1805.01729 (accessed on 4 May 2018).
  21. Jones, M.C. The roles of ISE and MISE in density estimation. Stat. Probab. Lett. 1991, 12, 51–56. [Google Scholar] [CrossRef]
  22. Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30. [Google Scholar]
  23. Ur Rehman, M.H.; Liew, C.S.; Abbas, A.; Jayaraman, P.P.; Wah, T.Y.; Khan, S.U. Big data reduction methods: A survey. Data Sci. Eng. 2016, 4, 265–284. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Graphics of the eight p.d.f.s.
Figure 2. Convergence of Algorithm 1 on the four given p.d.f.s.
Figure 3. Estimation capability of IVKDE on the F distribution.
Figure 4. Estimation capability of IVKDE on the two-dimensional normal distribution.
Table 1. Four univariate and four multivariate probability distributions ($f_i(\tilde{x})$ in the bimodal, trimodal, and quadrimodal normal distributions is the two-dimensional normal distribution with mean vector $\tilde{\mu}_i$ and covariance matrix $\Sigma_i$).

Univariate:
  F: $f(x) = \frac{n_1^{n_1/2}\, n_2^{n_2/2}\, x^{n_1/2 - 1}}{B\left(\frac{n_1}{2}, \frac{n_2}{2}\right) (n_1 x + n_2)^{(n_1 + n_2)/2}}$, where $B(a, b) = \int_0^1 t^{a-1}(1-t)^{b-1}\, dt$; $n_1, n_2 = 1, 2, 3, \ldots$; $x \in [0, +\infty)$
  Normal: $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$, $\mu \in (-\infty, +\infty)$, $\sigma > 0$; $x \in (-\infty, +\infty)$
  Rayleigh: $f(x) = \frac{x}{\sigma^2} e^{-\frac{1}{2}\left(\frac{x}{\sigma}\right)^2}$, $\sigma > 0$; $x \in [0, +\infty)$
  Student's T: $f(x) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{\pi v}\,\Gamma\left(\frac{v}{2}\right)} \left(1 + \frac{x^2}{v}\right)^{-\frac{v+1}{2}}$, where $\Gamma(z) = \int_0^{+\infty} t^{z-1} e^{-t}\, dt$; $v > 0$; $x \in (-\infty, +\infty)$
Multivariate:
  Two-dimensional normal: $f(\tilde{x}) = (2\pi)^{-\frac{M}{2}} |\Sigma|^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(\tilde{x} - \tilde{\mu})^{\mathrm{T}} \Sigma^{-1} (\tilde{x} - \tilde{\mu})\right)$, $\tilde{x} = (x_1, x_2)$, $M = 2$, where $\tilde{\mu}$ is the mean vector and $\Sigma$ is the covariance matrix
  Bimodal normal: $f(\tilde{x}) = \sum_{i=1}^{2} \varepsilon_i f_i(\tilde{x})$, $\tilde{x} = (x_1, x_2)$, $\sum_{i=1}^{2} \varepsilon_i = 1$, $\varepsilon_i \geq 0$, $i = 1, 2$
  Trimodal normal: $f(\tilde{x}) = \sum_{i=1}^{3} \varepsilon_i f_i(\tilde{x})$, $\tilde{x} = (x_1, x_2)$, $\sum_{i=1}^{3} \varepsilon_i = 1$, $\varepsilon_i \geq 0$, $i = 1, 2, 3$
  Quadrimodal normal: $f(\tilde{x}) = \sum_{i=1}^{4} \varepsilon_i f_i(\tilde{x})$, $\tilde{x} = (x_1, x_2)$, $\sum_{i=1}^{4} \varepsilon_i = 1$, $\varepsilon_i \geq 0$, $i = 1, 2, 3, 4$
Table 2. Parameter settings of FKDE, VKDE, and IVKDE. Each entry lists $(T_{\mathrm{Max}}, \alpha_{\mathrm{Max}}, \alpha_{\mathrm{Min}}, \delta, h_{nd}^{(0)})$; the regularization factor $\xi$ applies to IVKDE only.

No. | Probability Distribution | FKDE | VKDE | IVKDE | $\xi$
1 | F | 1000, 1, $10^{-5}$, $10^{-8}$, 1 | 1500, 0.4, 0.01, $10^{-8}$, 0.3 | 1500, 0.4, 0.01, $10^{-8}$, 0.3 | 0.3
2 | Normal | 1200, 0.5, 0.1, $10^{-8}$, 0.34 | 1200, 0.5, 0.1, $10^{-8}$, 0.34 | 1200, 0.5, 0.1, $10^{-8}$, 0.34 | 0.13
3 | Rayleigh | 1000, 1, 0.1, $10^{-8}$, 0.5 | 1000, 1, 0.1, $10^{-8}$, 0.5 | 1000, 1, 0.1, $10^{-8}$, 0.5 | 0.45
4 | Student's T | 1500, 1, 0.5, $10^{-8}$, 0.5 | 1500, 1, 0.5, $10^{-8}$, 0.5 | 1500, 1, 0.5, $10^{-8}$, 0.5 | 0.15
5 | Two-dimensional normal | 500, 0.2, 0.1, $10^{-8}$, 0.36 | 500, 0.2, 0.1, $10^{-8}$, 0.36 | 500, 0.2, 0.1, $10^{-8}$, 0.36 | 0.3
6 | Bimodal normal | 500, 0.2, 0.1, $10^{-8}$, 0.36 | 500, 0.2, 0.1, $10^{-8}$, 0.36 | 500, 0.2, 0.1, $10^{-8}$, 0.36 | 0.3
7 | Trimodal normal | 500, 0.2, 0.01, $10^{-8}$, 0.38 | 500, 0.2, 0.01, $10^{-8}$, 0.38 | 500, 0.2, 0.01, $10^{-8}$, 0.38 | 0.01
8 | Quadrimodal normal | 500, 0.5, 0.1, $10^{-8}$, 0.5 | 500, 0.5, 0.1, $10^{-8}$, 0.5 | 500, 0.5, 0.1, $10^{-8}$, 0.5 | 0.3

$\xi$ is the regularization factor; $T_{\mathrm{Max}}$ is the maximum number of iterations; $\alpha_{\mathrm{Max}}$ is the maximum value of the learning rate; $\alpha_{\mathrm{Min}}$ is the minimum value of the learning rate; $\delta$ is the stopping threshold; $h_{nd}^{(0)}$, $n = 1, 2, \ldots, N$, $d = 1, 2, \ldots, D$, are the initial bandwidths.
Table 3. Comparative results among FKDE, VKDE, and IVKDE on the 8 different probability distributions.

No. | Probability Distribution | MAE on Training Set (FKDE / VKDE / IVKDE) | MAE on Testing Set (FKDE / VKDE / IVKDE)
1 | F | 0.02921 / 0.03965 / 0.02891 | 0.02964 / 0.04111 / 0.03171
2 | Normal | 0.01416 / 0.01511 / 0.01389 | 0.01376 / 0.01489 / 0.01370
3 | Rayleigh | 0.02259 / 0.05127 / 0.02797 | 0.02222 / 0.05137 / 0.02859
4 | Student's T | 0.00999 / 0.01607 / 0.00959 | 0.00980 / 0.01583 / 0.00970
5 | Two-dimensional normal | 0.00486 / 0.00502 / 0.00485 | 0.00500 / 0.00509 / 0.00498
6 | Bimodal normal | 0.00518 / 0.00463 / 0.00456 | 0.00530 / 0.00465 / 0.00455
7 | Trimodal normal | 0.00364 / 0.00363 / 0.00363 | 0.00359 / 0.00359 / 0.00359
8 | Quadrimodal normal | 0.00232 / 0.00235 / 0.00228 | 0.00247 / 0.00248 / 0.00241
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
