Article

Adaptive Nonparametric Density Estimation with B-Spline Bases

1 School of Mathematics, Hefei University of Technology, Hefei 230009, China
2 School of Physical and Mathematical Sciences, Nanjing Tech University, Nanjing 211816, China
3 Department of Mathematics, Hohai University, Nanjing 211100, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(2), 291; https://doi.org/10.3390/math11020291
Submission received: 4 December 2022 / Revised: 28 December 2022 / Accepted: 30 December 2022 / Published: 5 January 2023
(This article belongs to the Section Probability and Statistics)

Abstract

Learning density estimation is important in probabilistic modeling and reasoning with uncertainty. Since B-spline basis functions are piecewise polynomials with local support, density estimation with B-splines shows its advantages when intensive numerical computations are involved in the subsequent applications. To obtain an optimal local density estimation with B-splines, we need to select the bandwidth (i.e., the distance between two adjacent knots) for uniform B-splines. However, the selection of the bandwidth is challenging, and the computation is costly. On the other hand, nonuniform B-splines can improve on the approximation capability of uniform B-splines. Based on this observation, we perform density estimation with nonuniform B-splines. By introducing an error indicator attached to each interval, we propose an adaptive strategy to generate the nonuniform knot vector. The error indicator is a local approximation of the information entropy, which is closely related to the number of kernels when we construct the nonuniform estimator. The numerical experiments show that, compared with the uniform B-spline, the local density estimation with nonuniform B-splines not only achieves better estimation results but also effectively alleviates the overfitting phenomenon caused by the uniform B-splines. The comparison with the existing estimation procedures, including the state-of-the-art kernel estimators, demonstrates the accuracy of our new method.

1. Introduction

Nonparametric density estimation avoids the parametric assumptions in probabilistic modeling and reasoning, which achieves flexibility in data modeling while reducing the risk of model misspecification [1,2]. Hence, nonparametric density estimation is an important research area in statistics. In this paper, we focus on the estimation of univariate density functions, which is a classic problem in nonparametric statistics. We perform density estimation with nonuniform B-splines. By introducing the error indicator attached to each interval for density estimation, we propose an adaptive strategy to generate a nonuniform knot vector.
There are four main techniques for nonparametric estimation, i.e., histograms, orthogonal series, kernels, and splines. Histograms transform continuous data into discrete data, and important information may be lost during the discretization process [3]. Kernel density estimation is one of the most famous methods for density estimation and remains an active research area (see [4] and the references therein); for example, Terrell and Scott investigated possibilities for improving univariate and multivariate kernel density estimates by varying the window over the domain of estimation, both pointwise and globally [9]. In addition to kernel density estimators, orthogonal series estimators are also widely used, e.g., [5,6,7,8]. However, the global nature of orthogonal series estimators limits their applications.
Since B-spline basis functions possess the property of local support, local density estimators based on uniform B-splines have been discussed in [10,11,12,13]. In addition to their local support, B-spline basis functions are piecewise polynomials, which is advantageous when intensive numerical computations are conducted after estimation [13,14,15]. Local density estimators based on other splines have also been studied, e.g., logsplines [16,17], smoothing splines [18], penalized B-splines [19], and shape-constrained splines [20]. Recently, a Galerkin method was introduced to compute a B-spline estimator [21].
An important part of any basis estimation procedure is the bandwidth selection method. The existing literature on bandwidth selection is quite rich (see [21,22,23,24,25] and the references therein). It should be noted that a closed-form least squares cross-validation (LSCV) formula was proposed in [21], which can be used to determine the bandwidth efficiently.
In this paper, we use nonuniform B-splines as density estimators. By introducing a local error indicator attached to each interval, we design an adaptive refinement strategy, which increases the approximation capability of the local density estimator. The numerical experiments show that our adaptive local density estimation produces a smaller approximation error than the estimation with uniform B-splines. Comparison with the state-of-the-art density estimation methods shows that our adaptive method can approximate data with a squared error comparable to, and an absolute error significantly smaller than, those of other kernel density estimators.
The remainder of the paper is organized as follows. The next section reviews the definition of B-spline basis functions and their corresponding piecewise polynomial space. In Section 3, we detail the proposed method for density estimation based on B-splines. In Section 4, we propose an adaptive knot refinement strategy. Numerical experiments are provided in Section 5. Finally, Section 6 ends with conclusions.

2. B-Splines

Given a knot vector $U = [u_1, \ldots, u_{n+k+1}]$, where $u_1 \le u_2 \le \cdots \le u_{n+k+1}$ and $u_i < u_{i+k+1}$, $i = 1, \ldots, n$, the B-spline basis functions of degree k (order k + 1) are defined in a recursive fashion [26]:
$$N_{i,0}(x) = \begin{cases} 1, & u_i \le x < u_{i+1}, \\ 0, & \text{otherwise}, \end{cases} \qquad N_{i,j}(x) = \frac{x - u_i}{u_{i+j} - u_i}\, N_{i,j-1}(x) + \frac{u_{i+j+1} - x}{u_{i+j+1} - u_{i+1}}\, N_{i+1,j-1}(x), \quad j = 1, \ldots, k. \qquad (1)$$
The B-spline basis functions $N_{i,k}(x)$ are nonnegative and have local support (i.e., $N_{i,k}(x)$ is a nonzero polynomial only on $[u_i, u_{i+k+1})$). In addition, the B-spline basis functions form a partition of unity for $x \in [u_k, u_{n+1}]$ [27]. Figure 1 shows the cubic and quadratic B-spline basis functions defined on $[0, 10]$.
A B-spline of degree k is defined [28] as
$$p(x) = \sum_{i=1}^{n} d_i\, N_{i,k}(x), \qquad (2)$$
where $d_i \in \mathbb{R}$ is the i-th control coefficient, and $N_{i,k}(x)$ is the i-th B-spline basis function defined on the knot vector $U = [u_1, \ldots, u_{n+k+1}]$. A B-spline whose knot vector is evenly spaced is known as a uniform B-spline; otherwise, the B-spline is called a nonuniform B-spline.
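The recursion (1) and the evaluation of (2) translate directly into code. The following is a minimal NumPy sketch; the function names and the example knot vector of Figure 1a are illustrative choices, and zero-length knot intervals are handled by the usual convention that 0/0 terms are treated as zero.

import numpy as np

def bspline_basis(i, k, U, x):
    # Cox-de Boor recursion (1); i is a 0-based basis index, U a sorted knot array.
    if k == 0:
        return np.where((U[i] <= x) & (x < U[i + 1]), 1.0, 0.0)
    left_den = U[i + k] - U[i]
    right_den = U[i + k + 1] - U[i + 1]
    left = 0.0 if left_den == 0 else (x - U[i]) / left_den * bspline_basis(i, k - 1, U, x)
    right = 0.0 if right_den == 0 else (U[i + k + 1] - x) / right_den * bspline_basis(i + 1, k - 1, U, x)
    return left + right

def bspline(x, U, d, k):
    # Evaluate p(x) = sum_i d_i N_{i,k}(x) as in (2).
    return sum(d[i] * bspline_basis(i, k, U, x) for i in range(len(d)))

# Example: quadratic basis on the clamped knot vector of Figure 1a.
U = np.array([0, 0, 0, 1, 2, 3, 5, 8, 9, 10, 10, 10], dtype=float)
x = np.linspace(0, 10, 201)
vals = np.array([bspline_basis(i, 2, U, x) for i in range(len(U) - 3)])
assert np.allclose(vals.sum(axis=0)[:-1], 1.0)   # partition of unity on [0, 10)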
Let Δ be a sequence of distinct real numbers
$$\Delta = \{\eta_0 < \eta_1 < \cdots < \eta_{l+1}\},$$
where $\eta_0 = u_k$, $\eta_{l+1} = u_{n+1}$, and the interior knots satisfy
$$[u_{k+1}, \ldots, u_{n}] := [\underbrace{\eta_1, \ldots, \eta_1}_{k - r_1}, \ldots, \underbrace{\eta_l, \ldots, \eta_l}_{k - r_l}],$$
so that Δ forms a partition of the interval $[u_k, u_{n+1}]$. The space $S_k^{\mathbf{r}}(\Delta)$ of piecewise polynomials of degree k with smoothness $\mathbf{r} = (r_1, \ldots, r_l)$ over the partition Δ is defined by
$$S_k^{\mathbf{r}}(\Delta) = \{\, p : p|_{[\eta_i, \eta_{i+1})} \in \mathbb{P}_k,\ i = 0, \ldots, l;\ p \in C^{r_i}(\eta_i),\ i = 1, \ldots, l \,\},$$
where $\mathbb{P}_k$ is the space of all polynomials of degree k, and $C^{r_i}(\eta_i)$ is the space of all functions that are continuous of order $r_i$ at $\eta_i$.
Lemma 1
([29]). Given the knot vector U, the set of B-spline basis functions defined in (1) forms an alternative basis for the piecewise polynomial space $S_k^{\mathbf{r}}(\Delta)$.
Moreover, the approximation capabilities of $S_k^{\mathbf{r}}(\Delta)$ for a sufficiently smooth function defined over $[u_k, u_{n+1}]$ were described in [30]. It was shown that a sufficiently smooth function can be approximated with good accuracy by B-splines.

3. Density Estimation with B-Splines

Let $\{x_i\}_{i=1}^{N}$ be an independent and identically distributed random sample from a continuous probability density function f, i.e., $X \sim f(x)$. Zong and Lam proposed a method to find B-spline estimates of one-dimensional and two-dimensional probability density functions from a sample [31].
We define an estimate $\hat{f}(x\,|\,\alpha)$ of $f(x)$ of the form
$$f(x) \approx \hat{f}(x\,|\,\alpha) := \sum_{j=1}^{n} \alpha_j\, N_{j,k}(x), \qquad (3)$$
where $\alpha = (\alpha_1, \ldots, \alpha_n) \in \mathbb{R}^{n}$ is the vector of coefficients. To fix the estimate $\hat{f}(x\,|\,\alpha)$ in the form of B-splines, we need to specify the degree k, the basis functions $N_{j,k}(x)$, and the coefficients $\alpha_j$, $j = 1, \ldots, n$.

3.1. Selecting the Degree and the Knot Vector

The degree k and the knot vector U need to be specified a priori to determine the basis functions. Based on the approximation ability and flexibility of quadratic and cubic B-splines, quadratic or cubic B-splines are usually chosen for local density estimation [21,31,32]. The selection of the knot vector is challenging and time consuming. Even when we restrict the knot vector to the uniform case, we still need to specify:
(1)
$\eta_0$ and $\eta_{l+1}$ (i.e., $u_k$ and $u_{n+1}$ in the knot vector), which determine the endpoints of the interval of the piecewise polynomial space $S_k^{\mathbf{r}}(\Delta)$;
(2)
the bandwidth $h = u_{i+1} - u_i$.
To ensure that all the sample values lie in the interval $[\eta_0, \eta_{l+1}]$, we can set
$$\eta_0 = a_0 - \gamma\,(b_0 - a_0), \qquad \eta_{l+1} = b_0 + \gamma\,(b_0 - a_0),$$
where $a_0 = \min\{x_1, x_2, \ldots, x_N\}$, $b_0 = \max\{x_1, x_2, \ldots, x_N\}$, and $\gamma$ is a parameter that controls the length of the interval. In the numerical experiments, $\gamma$ is set to 0.01 in general. Note that the values $a_0$, $b_0$ can be obtained in one pass through the data at a cost of $O(N)$.
The selection of the optimal bandwidth is generally based on a score of the estimated model. A penalized likelihood score is chosen to perform the selection in a principled way; e.g., the Bayesian information criterion (BIC) and measured entropy (ME) scores are adopted to select the bandwidth with the best score [31,32], where
$$\mathrm{BIC} := -2 \sum_{s=1}^{N} \log \hat{f}(x_s\,|\,\alpha) + n \log(N), \qquad (4)$$
$$\mathrm{ME} := -\int \hat{f}(x\,|\,\alpha) \log \hat{f}(x\,|\,\alpha)\, dx + \frac{3n - 1}{2N}. \qquad (5)$$
Note that ME is an asymptotically unbiased estimate of the information entropy. The information entropy of the real model f measured by the estimate $\hat{f}$ is defined as
$$H(f, \hat{f}) = -\int f(x) \log \hat{f}(x\,|\,\alpha)\, dx.$$
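For concreteness, a minimal sketch of the two scores is given below, assuming the signs of (4) and (5) as reconstructed above (so that both scores are to be made as small as possible over candidate bandwidths); here f_hat is any vectorized density estimate, and the quadrature grid size is an arbitrary choice rather than a value from the paper.

import numpy as np

def bic_score(f_hat, sample, n_coef):
    # BIC := -2 * sum_s log f_hat(x_s | alpha) + n * log(N), as in (4).
    N = len(sample)
    return -2.0 * np.sum(np.log(f_hat(sample))) + n_coef * np.log(N)

def me_score(f_hat, support, n_coef, N, grid=2000):
    # ME := -int f_hat log f_hat dx + (3n - 1)/(2N), as in (5),
    # with the integral approximated by a simple trapezoidal rule.
    x = np.linspace(support[0], support[1], grid)
    fx = np.clip(f_hat(x), 1e-300, None)          # avoid log(0) where the estimate vanishes
    integrand = fx * np.log(fx)
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x))
    return -integral + (3 * n_coef - 1) / (2 * N)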

3.2. Computing the Coefficients

When the degree and the knot vector are specified, we need to compute the coefficients $\alpha_1, \ldots, \alpha_n$ to obtain the B-spline estimate. In addition, two constraints need to be imposed to ensure that the resulting $\hat{f}(x\,|\,\alpha) := \sum_{j=1}^{n} \alpha_j N_{j,k}(x)$ is a valid probability density function, i.e.,
(1)
$\alpha_j \ge 0$, $j = 1, \ldots, n$, so that $\hat{f}(x\,|\,\alpha)$ is nonnegative over the distribution range;
(2)
$\int \hat{f}(x\,|\,\alpha)\, dx = 1$, which can be simplified to
$$\int \hat{f}(x\,|\,\alpha)\, dx = \sum_{j=1}^{n} \alpha_j \int N_{j,k}(x)\, dx = \sum_{j=1}^{n} \alpha_j\, \frac{u_{j+k+1} - u_j}{k+1} = 1. \qquad (6)$$
The coefficients $\alpha$ can be calculated by the maximum likelihood method, which can be formulated as a constrained optimization problem:
$$\max_{\alpha \in \mathbb{R}^{n}}\ \sum_{s=1}^{N} \log \hat{f}(x_s\,|\,\alpha) \quad \text{such that} \quad \alpha_1, \ldots, \alpha_n \ge 0, \quad \sum_{j=1}^{n} \alpha_j\, \frac{u_{j+k+1} - u_j}{k+1} = 1. \qquad (7)$$
The constrained optimization problem (7) can be solved efficiently by the iterative procedure [31]
$$\alpha_j^{(q)} = \frac{k+1}{N\,(u_{j+k+1} - u_j)} \sum_{s=1}^{N} \frac{\alpha_j^{(q-1)}\, N_{j,k}(x_s)}{\hat{f}(x_s\,|\,\alpha^{(q-1)})}, \qquad j = 1, \ldots, n, \qquad (8)$$
where q denotes the iteration number in the optimization process. The initial values are set to $\alpha_j^{(0)} = (k+1) / \sum_{i=1}^{n} (u_{i+k+1} - u_i)$, which satisfies the constraints in (7).
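A compact NumPy sketch of iteration (8) is given below; it reuses the bspline_basis helper sketched in Section 2, and the function name, tolerance, and iteration cap are illustrative choices rather than values from the paper. The multiplicative update keeps the coefficients nonnegative and preserves the normalization constraint at every step.

import numpy as np

def fit_coefficients(sample, U, k, max_iter=500, tol=1e-8):
    # Maximum-likelihood coefficients for (7) via the multiplicative iteration (8).
    U = np.asarray(U, dtype=float)
    N = len(sample)
    n = len(U) - k - 1                                   # number of basis functions
    w = (U[k + 1:] - U[:n]) / (k + 1)                    # w_j = integral of N_{j,k}
    B = np.array([[bspline_basis(j, k, U, x) for j in range(n)] for x in sample])
    alpha = np.full(n, (k + 1) / np.sum(U[k + 1:] - U[:n]))   # feasible start alpha^(0)
    for _ in range(max_iter):
        dens = B @ alpha                                 # f_hat(x_s | alpha) at the samples
        alpha_new = alpha / (N * w) * (B / dens[:, None]).sum(axis=0)
        if np.max(np.abs(alpha_new - alpha)) < tol:
            return alpha_new
        alpha = alpha_new
    return alpha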

4. Knot Refinement

Uniform B-splines may fail to capture the details of the input dataset; it has been shown that a suitable placement of knots can dramatically improve the approximation capability of B-splines [33]. Hence, we focused on the adaptive generation of the knot vector for density estimation in this paper.
We started with a coarse uniform knot vector; the adaptive procedure consisted of successive loops of the form
Compute coefficients → Estimate error → Mark and refine.
The computation of the coefficients can be accomplished by (8). The essential part of the loop is the error estimation step. Error estimation methods with a posteriori error control are well developed in numerical analysis (see [34,35] and the references therein for examples). We followed the ideas presented in [35,36] to derive an a posteriori error estimator based on B-splines.

4.1. A Residual-Based A Posteriori Error Estimator with B-Splines

We aimed to refine only those intervals $I_i = [u_i, u_{i+1}]$ that contributed significantly to the error $f(x) - \hat{f}(x\,|\,\alpha)$. However, since the true density function $f(x)$ was unknown, we defined a local error indicator attached to the interval $I_i$ as follows:
$$\tau_i = -\frac{1}{N_i} \sum_{j=1}^{N_i} \log \hat{f}(x_{i_j}\,|\,\alpha), \qquad (9)$$
where $x_{i_1}, \ldots, x_{i_{N_i}}$ are the sampling points of $\{x_j\}_{j=1}^{N}$ located in the interval $I_i$. Note that $\tau_i$ is an estimate of the information entropy restricted to the interval $I_i$:
$$H(f, \hat{f})\big|_{I_i} = -\int_{I_i} f(x) \log \hat{f}(x\,|\,\alpha)\, dx.$$
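A small sketch of evaluating the indicators (9) over the knot intervals is shown below; breakpoints denotes the distinct knots between $\eta_0$ and $\eta_{l+1}$, and the names are illustrative.

import numpy as np

def error_indicators(sample, breakpoints, f_hat):
    # tau_i = -(1/N_i) * sum of log f_hat over the samples in I_i = [eta_i, eta_{i+1}).
    tau = np.zeros(len(breakpoints) - 1)
    idx = np.searchsorted(breakpoints, sample, side='right') - 1   # interval index of each sample
    idx = np.clip(idx, 0, len(tau) - 1)
    logf = np.log(f_hat(sample))
    for i in range(len(tau)):
        mask = (idx == i)
        if mask.any():                     # intervals without samples keep tau_i = 0
            tau[i] = -logf[mask].mean()
    return tau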

4.2. Adaptive Refinement Strategy

Inspired by adaptive refinement strategies in numerical analysis, we introduced an adaptive refinement procedure to compute a sequence of estimates that converges to the true probability density function. As the error indicator for each interval was available, we marked for refinement each interval $I_i$ with a large error. In order to find the intervals with a large error efficiently, we adopted the refinement strategy given in [37], with a slight modification, as Algorithm 1.
Algorithm 1: Refinement algorithm
[The pseudocode of Algorithm 1 is presented as a figure in the original article.]
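Since the body of Algorithm 1 appears only as a figure, the sketch below is merely a plausible reading based on the bulk (Dörfler-type) marking used in [37]: a minimal set of intervals carrying a fixed fraction theta of the total error indicator is marked, and each marked interval is bisected at its midpoint. Both the parameter theta and the midpoint-insertion rule are assumptions for illustration, not taken from the paper.

import numpy as np

def refine_knots(breakpoints, tau, theta=0.5):
    # Mark intervals whose indicators sum to a theta-fraction of the total,
    # then insert the midpoint of every marked interval.
    order = np.argsort(tau)[::-1]                  # intervals by decreasing error indicator
    total, acc, marked = tau.sum(), 0.0, []
    for i in order:
        if acc >= theta * total or tau[i] <= 0:
            break
        marked.append(i)
        acc += tau[i]
    midpoints = [(breakpoints[i] + breakpoints[i + 1]) / 2.0 for i in marked]
    if not midpoints:
        return np.asarray(breakpoints)
    return np.sort(np.concatenate([breakpoints, midpoints]))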
We implemented the adaptive strategy in this paper as in Algorithm 2.
Algorithm 2: Adaptive probability density function estimation
Input: An observed sample $\{x_i\}_{i=1}^{N}$; the degree k of the B-spline; the number m of initial interior knots.
Output: A probability density function $\hat{f}(x\,|\,\alpha) = \sum_{j=1}^{n} \alpha_j N_{j,k}(x)$.
Steps:
1. Find $a_0 = \min\{x_i\}_{i=1}^{N}$ and $b_0 = \max\{x_i\}_{i=1}^{N}$. Set $\eta_0 = a_0 - \gamma\,(b_0 - a_0)$ and $\eta_{m+1} = b_0 + \gamma\,(b_0 - a_0)$ with $\gamma = 0.01$.
2. Initialize $U = [\eta_0, \ldots, \eta_0, \eta_1, \eta_2, \ldots, \eta_m, \eta_{m+1}, \ldots, \eta_{m+1}]$, where $\eta_0$ and $\eta_{m+1}$ are each repeated $k+1$ times and $\eta_i = \eta_0 + i\,(\eta_{m+1} - \eta_0)/(m+1)$, $i = 1, \ldots, m$.
3. Apply (8) to compute the coefficients $\alpha$.
4. Evaluate the error indicators $\tau_i$ as in (9).
5. Apply Algorithm 1 to obtain a refined knot vector $U'$.
6. If $U' \ne U$, update $U = U'$ and return to Step 3.
7. Set $\hat{f}(x\,|\,\alpha)$ as in (3) and output $\hat{f}(x\,|\,\alpha)$ as an estimate of $f(x)$.
Remark 1.
Compared with the case of a uniform B-spline, where the knot vector is selected by an exhaustive search [31], our adaptive refinement strategy generates the knot vector automatically.
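Putting the pieces together, the following end-to-end sketch mirrors Steps 1 to 7 of Algorithm 2 using the helpers sketched earlier (bspline_basis, bspline, fit_coefficients, error_indicators, refine_knots); all of these names, as well as the cap on the number of refinement rounds, are illustrative rather than part of the paper.

import numpy as np

def adaptive_density_estimate(sample, k=3, m=8, gamma=0.01, max_rounds=10):
    sample = np.asarray(sample, dtype=float)
    a0, b0 = sample.min(), sample.max()                           # Step 1
    eta0, eta_end = a0 - gamma * (b0 - a0), b0 + gamma * (b0 - a0)
    breaks = np.linspace(eta0, eta_end, m + 2)                    # Step 2: uniform interior knots
    for _ in range(max_rounds):
        U = np.concatenate([[eta0] * k, breaks, [eta_end] * k])   # clamped knot vector
        alpha = fit_coefficients(sample, U, k)                    # Step 3: iteration (8)
        f_hat = lambda x, U=U, a=alpha: bspline(x, U, a, k)
        tau = error_indicators(sample, breaks, f_hat)             # Step 4: indicators (9)
        new_breaks = refine_knots(breaks, tau)                    # Step 5: Algorithm 1
        if len(new_breaks) == len(breaks):                        # Step 6: knots unchanged, stop
            break
        breaks = new_breaks
    return f_hat                                                  # Step 7: the estimate (3)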

5. Numerical Experiments

In this section, we report the results of several numerical experiments. We start by introducing the different comparison measures in Section 5.1. Section 5.2 shows a comparison of the accuracy of nonuniform B-spline density estimators versus uniform B-spline density estimators. In Section 5.3, we compare the nonuniform B-spline density estimator to the existing kernel density estimators and orthogonal sequence estimators.

5.1. Comparison Measures

We used different measures to evaluate the quality of the estimators computed based on the samples.
  • The measured entropy (ME) of the samples given by the estimator, which is defined as (5).
  • The BIC score of the samples given by the estimator, which is defined as (4).
In addition, we also used the MAE and the root-MSE to measure how close the estimate $\hat{f}(x\,|\,\alpha)$ was to the true density $f(x)$, where $x_i$ denotes a sample point (a small code sketch of both measures follows the list):
  • The root mean square error (root-MSE) between the estimate $\hat{f}(x\,|\,\alpha)$ and the true density $f(x)$:
$$\text{root-MSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \big(\hat{f}(x_i\,|\,\alpha) - f(x_i)\big)^2}\,;$$
  • The mean absolute error (MAE) between the estimate $\hat{f}(x\,|\,\alpha)$ and the true density $f(x)$:
$$\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} \big|\hat{f}(x_i\,|\,\alpha) - f(x_i)\big|.$$
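As referenced above, a tiny sketch of the two measures (the function names are illustrative):

import numpy as np

def root_mse(f_hat_vals, f_vals):
    # Root mean square error between estimated and true density values at the sample points.
    return np.sqrt(np.mean((np.asarray(f_hat_vals) - np.asarray(f_vals)) ** 2))

def mae(f_hat_vals, f_vals):
    # Mean absolute error between estimated and true density values at the sample points.
    return np.mean(np.abs(np.asarray(f_hat_vals) - np.asarray(f_vals)))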

5.2. Uniform B-Spline Estimators vs. Nonuniform B-Spline Estimators

First, we compared the uniform B-spline probability density estimator with the adaptive nonuniform B-spline probability density estimator; the generation of the nonuniform knot vector was described in Section 4.
Table 1 shows the name of the datasets, the probability distribution, and the approximation domain. The comparative experimental results of the uniform B-spline and the nonuniform B-spline are shown in Table 2, and the fitting results are shown in Figure 2.
When the sample size was fixed at 1000, the errors of the uniform B-spline and the nonuniform B-spline are compared in Table 2. The measured entropy (ME), the information entropy (H), the mean absolute error (MAE), the root mean square error (root-MSE), and the Bayesian information criterion (BIC) scores are listed, and they demonstrate that the adaptive nonuniform B-spline estimators usually outperformed the uniform B-spline estimators. In addition, compared to the uniform B-spline estimators, the adaptive nonuniform B-spline estimators were usually closer to the true density functions, as shown in Figure 2.
When the sample size varied as N = 50, 100, 500, 1000, 5000, the root-MSE and MAE results for the uniform and the nonuniform B-spline estimators are listed in Table 3. From the ratios of the MAE and the root-MSE, we can see that the fitting results of the adaptive nonuniform B-spline outperformed those of the uniform B-spline.

5.3. Comparison with Orthogonal Sequence and Kernel Estimators

Table 4 and Table 5 compare our method with the previously mentioned probability density function estimation methods: the orthogonal sequence estimators of [7,13] and kernel estimators with three bandwidth-selection strategies, namely, the rule-of-thumb method based on the asymptotic mean integrated squared error (ROT) [38], the least squares cross-validation method (LCV) [38], and the method proposed by Hall et al., which plugs estimates into the usual asymptotic representation of the optimal bandwidth with two important modifications (HALL) [39]. The experimental results are reported in terms of the MAE and the root-MSE.
In Table 4, we observe that, for the same sample size, the root-MSE and MAE of the nonuniform B-spline estimator were smaller than those of the other methods, which shows that the estimation effect of the adaptive nonuniform B-splines was better than that of the listed methods. In addition, the fitting results obtained by the nonuniform B-splines overcame the overfitting phenomenon of the uniform method.
In Table 5, the error analysis for different sample sizes of the nonuniform B-spline, orthogonal sequence, and kernel methods is listed. The experimental results show that the errors of the nonuniform B-spline were smaller and decreased as the sample size increased. This also shows that the B-spline fitting with the nonuniform knots generated by our adaptive strategy achieved a better fit.

6. Conclusions

In this work, we introduced a novel density estimation with nonuniform B-splines. By introducing the error indicator attached to each interval for density estimation, we proposed an adaptive strategy to generate the nonuniform knot vector. The numerical experiments showed that, compared with the uniform B-spline, the local density estimation with nonuniform B-splines not only achieved better estimation results but also effectively alleviated the overfitting phenomenon caused by the uniform B-splines. The comparison with the existing estimation procedures, including the state-of-the-art kernel estimators, demonstrated the accuracy of our new method.
In the future, it would be interesting to extend the method considered in the paper to multivariate density cases. Another natural direction to pursue further is the fast automatic knot placement method via feature characterization from the samples, which can generate the nonuniform knot vector directly. We leave these topics for future research.

Author Contributions

Y.Z., M.Z., Q.N. and X.W. had equal contributions. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China No.122011292 and No.61772167.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Siegel, S. Nonparametric statistics. Am. Stat. 1957, 11, 13–19. [Google Scholar]
  2. Gibbons, J.D.; Chakraborti, S. Nonparametric Statistical Inference; CRC Press: Boca Raton, FL, USA, 2014. [Google Scholar]
  3. García, S.; Luengo, J.; Sáez, J.A.; López, V.; Herrera, F. A survey of discretization techniques: Taxonomy and empirical analysis in supervised learning. IEEE Trans. Knowl. Data Eng. 2013, 25, 734–750. [Google Scholar] [CrossRef]
  4. Bhattacharya, A.; Dunson, D.B. Nonparametric Bayesian density estimation on manifolds with applications to planar shapes. Biometrika 2010, 97, 851–865. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  5. Hall, P. Cross-validation and the smoothing of orthogonal series density estimators. J. Multivar. Anal. 1987, 21, 189–206. [Google Scholar] [CrossRef] [Green Version]
  6. Dai, X.; Müller, H.-G.; Yao, F. Optimal Bayes classifiers for functional data and density ratios. Biometrika 2017, 104, 545–560. [Google Scholar]
  7. Leitao, Á.; Oosterlee, C.W.; Ortiz-Gracia, L.; Bohte, S.M. On the data-driven COS method. Appl. Math. Comput. 2018, 317, 68–84. [Google Scholar] [CrossRef] [Green Version]
  8. Ait-Hennani, L.; Kaid, Z.; Laksaci, A.; Rachdi, M. Nonparametric estimation of the expected shortfall regression for quasi-associated functional data. Mathematics 2022, 10, 4508. [Google Scholar] [CrossRef]
  9. Terrell, G.R.; Scott, D.W. Variable kernel density estimation. Ann. Stat. 1992, 20, 1236–1265. [Google Scholar] [CrossRef]
  10. Lamnii, A.; Nour, M.Y.; Zidna, A. A reverse non-stationary generalized b-splines subdivision scheme. Mathematics 2021, 9, 2628. [Google Scholar] [CrossRef]
  11. Fan, J.; Gijbels, I. Local Polynomial Modelling and Its Applications; Routledge: London, UK, 2018. [Google Scholar]
  12. Redner, R.A. Convergence rates for uniform B-spline density estimators, Part I: One dimension. SIAM J. Sci. Comput. 1999, 20, 1929–1953. [Google Scholar] [CrossRef]
  13. Cui, Z.; Kirkby, J.L.; Nguyen, D. Nonparametric density estimation by B-spline duality. Econom. Theory 2020, 36, 250–291. [Google Scholar] [CrossRef]
  14. Cui, Z.; Kirkby, J.L.; Nguyen, D. A data-driven framework for consistent financial valuation and risk measurement. Eur. J. Oper. Res. 2021, 289, 381–398. [Google Scholar] [CrossRef]
  15. Cui, Z.; Kirkby, J.L.; Nguyen, D. Efficient simulation of generalized SABR and stochastic local volatility models based on Markov chain approximations. Eur. J. Oper. Res. 2021, 290, 1046–1062. [Google Scholar] [CrossRef]
  16. Kooperberg, C.; Stone, C.J. Comparison of parametric and bootstrap approaches to obtaining confidence intervals for logspline density estimation. J. Comput. Graph. Stat. 2004, 13, 106–122. [Google Scholar] [CrossRef]
  17. Koo, J.-Y. Bivariate B-splines for tensor logspline density estimation. Comput. Stat. Data Anal. 1996, 21, 31–42. [Google Scholar] [CrossRef]
  18. Gu, C. Smoothing spline density estimation: A dimensionless automatic algorithm. J. Am. Stat. Assoc. 1993, 88, 495–504. [Google Scholar] [CrossRef]
  19. Eilers, P.H.; Marx, B.D. Flexible smoothing with B-splines and penalties. Stat. Sci. 1996, 11, 89–121. [Google Scholar] [CrossRef]
  20. Papp, D.; Alizadeh, F. Shape-constrained estimation using nonnegative splines. J. Comput. Graph. Stat. 2014, 23, 211–231. [Google Scholar] [CrossRef]
  21. Kirkby, J.L.; Leitao, Á.; Nguyen, D. Nonparametric density estimation and bandwidth selection with B-spline bases: A novel Galerkin method. Comput. Stat. Data Anal. 2021, 159, 107202. [Google Scholar] [CrossRef]
  22. Oliveira, M.; Crujeiras, R.M.; Rodríguez-Casal, A. A plug-in rule for bandwidth selection in circular density estimation. Comput. Stat. Data Anal. 2012, 56, 3898–3908. [Google Scholar] [CrossRef] [Green Version]
  23. Boente, G.; Rodriguez, D. Robust bandwidth selection in semiparametric partly linear regression models: Monte carlo study and influential analysis. Comput. Stat. Data Anal. 2008, 52, 2808–2828. [Google Scholar] [CrossRef]
  24. Hall, P.; Kang, K.-H. Bandwidth choice for nonparametric classification. Ann. Stat. 2005, 33, 284–306. [Google Scholar] [CrossRef]
  25. Loader, C. Bandwidth selection: Classical or plug-in? Ann. Stat. 1999, 27, 415–438. [Google Scholar] [CrossRef]
  26. de Boor, C. A Practical Guide to Splines; Springer: New York, NY, USA, 1978; Volume 27. [Google Scholar]
  27. Farin, G. Curves and Surfaces for Computer-Aided Geometric Design: A Practical Guide; Elsevier: Amsterdam, The Netherlands, 2014. [Google Scholar]
  28. Ezhov, N.; Neitzel, F.; Petrovic, S. Spline approximation, part 2: From polynomials in the monomial basis to b-splines—A derivation. Mathematics 2021, 9, 2198. [Google Scholar] [CrossRef]
  29. Curry, H.B.; Schoenberg, I.J. On spline distributions and their limits: The Pólya distribution functions. Bull. Am. Math. Soc. 1947, 53, 1114. [Google Scholar]
  30. Lyche, T.; Manni, C.; Speleers, H. Foundations of spline theory: B-splines, spline approximation, and hierarchical refinement. In Splines and PDEs: From Approximation Theory to Numerical Linear Algebra; Springer: Berlin/Heidelberg, Germany, 2018; pp. 1–76. [Google Scholar]
  31. Zong, Z.; Lam, K. Estimation of complicated distributions using B-spline functions. Struct. Saf. 1998, 20, 341–355. [Google Scholar] [CrossRef]
  32. López-Cruz, P.L.; Bielza, C.; Larrañaga, P. Learning mixtures of polynomials of multidimensional probability densities from data using B-spline interpolation. Int. J. Approx. Reason. 2014, 55, 989–1010. [Google Scholar] [CrossRef]
  33. Loock, W.V.; Pipeleers, G.; Schutter, J.D.; Swevers, J. A convex optimization approach to curve fitting with B-splines. IFAC Proc. Vol. 2011, 44, 2290–2295. [Google Scholar] [CrossRef] [Green Version]
  34. Bastian, P.; Wittum, G. Adaptive multigrid methods: The UG concept. In Adaptive Methods—Algorithms, Theory and Applications; Springer: Berlin/Heidelberg, Germany, 1994; pp. 17–37. [Google Scholar]
  35. Verfürth, R. A Review of a Posteriori Error Estimation and Adaptive Mesh-Refinement Techniques; BG Teubner: Leipzig, Germany, 1996. [Google Scholar]
  36. Dörfel, M.R.; Jüttler, B.; Simeon, B. Adaptive isogeometric analysis by local h-refinement with T-splines. Comput. Methods Appl. Mech. Eng. 2010, 199, 264–275. [Google Scholar] [CrossRef] [Green Version]
  37. Morin, P.; Nochetto, R.H.; Siebert, K.G. Data oscillation and convergence of adaptive FEM. SIAM J. Numer. Anal. 2000, 38, 466–488. [Google Scholar] [CrossRef]
  38. Węglarczyk, S. Kernel density estimation and its application. In ITM Web of Conferences; EDP Sciences: Les Ulis, France, 2018; p. 00037. [Google Scholar]
  39. Troudi, M.; Alimi, A.M.; Saoudi, S. Analytical plug-in method for kernel density estimator applied to genetic neutrality study. EURASIP J. Adv. Signal Process. 2008, 2008, 1–8. [Google Scholar] [CrossRef]
Figure 1. B-spline basis functions defined on the interval [0, 10]. (a) Knot vector U = [0, 0, 0, 1, 2, 3, 5, 8, 9, 10, 10, 10]. (b) Knot vector U = [0, 0, 0, 0, 1, 2, 3, 5, 8, 9, 10, 10, 10, 10].
Figure 2. Density estimates using uniform and nonuniform B-splines. The blue rectangles represent the histogram, the blue solid line the true density function, the yellow dashed line the uniform B-spline estimate, and the red dotted line the adaptive nonuniform B-spline estimate. The knots of the nonuniform B-splines are marked as asterisks along the horizontal axis.
Table 1. Probability density functions.
Name        | Distribution                                                    | Domain
Gauss       | N(0, 1)                                                         | [-3, 3]
Exp         | Exp(1)                                                          | [0, 3]
Chisq       | χ²(7)                                                           | [0, 25]
MixGauss    | 0.3 N(0.25, 0.33) + 0.5 N(3.25, 1.0)                            | [-1.0, 7.0]
Mix1d       | 0.8 χ²(3.00) + 0.2 N(7.00, 1.00)                                | [0.00, 10.00]
MixGauss2   | 0.33 N(3.0, 1.0²) + 0.33 N(8.0, 0.33²) + 0.33 N(10, 0.11²)     | [0.0, 10.0]
Table 2. The goodness of fit for the data using the uniform B-spline and nonuniform B-spline methods (The sample size is 1000).
                     Gauss     Exp       Chisq     MixGauss  Mix1d     MixGauss2
Uniform B-spline
ME                   1.4152    0.7975    2.6337    1.6083    2.1494    1.8671
H                    1.3993    0.7391    2.6151    0.8857    2.1060    0.4424
root-MSE             0.0632    0.0360    0.0032    0.1245    0.0044    0.1766
MAE                  0.0163    0.0321    0.0027    0.1116    0.0048    0.1506
BIC (x 10^3)         2.8749    1.6417    5.3196    3.2501    4.3413    3.7524
Nonuniform B-spline
ME                   1.4202    0.7991    2.6352    1.5978    2.1500    1.8855
H                    1.3962    0.7381    2.6144    0.8854    2.1039    0.4615
root-MSE             0.0032    0.0360    0.0026    0.1216    0.0032    0.1780
MAE                  0.0046    0.0313    0.0022    0.1109    0.0032    0.1506
BIC (x 10^3)         2.8836    1.6443    5.3206    3.2346    4.3438    3.7578
Table 3. The goodness of fit for data using the uniform B-spline and the nonuniform B-spline with different sample sizes, where Ratio-root-MSE = root-MSE(nonuniform) / root-MSE(uniform) and Ratio-MAE is defined analogously.
N      Case        Uniform B-spline        Nonuniform B-spline     Ratio-root-MSE   Ratio-MAE
                   root-MSE    MAE         root-MSE    MAE
50     Gauss       0.1308      0.0737      0.1140      0.0804      0.8719           1.0909
100                0.0794      0.0701      0.0714      0.0592      0.8997           0.8445
500                0.0412      0.0340      0.0387      0.0310      0.9393           0.9118
1000               0.0200      0.0163      0.0071      0.0046      0.3550           0.2822
5000               0.0100      0.0063      0.0071      0.0039      0.7143           0.6190
50     Exp         0.1612      0.1339      0.1411      0.1127      0.8749           0.8417
100                0.1625      0.1077      0.1507      0.1142      0.9273           1.0604
500                0.0458      0.0393      0.0412      0.0375      0.8997           0.9542
1000               0.0361      0.0321      0.0361      0.0313      1.0000           0.9751
5000               0.0224      0.0205      0.0245      0.0216      1.0954           1.0537
50     Chisq       0.0305      0.0266      0.0879      0.0231      2.8820           0.8684
100                0.0208      0.0178      0.0643      0.0172      3.0913           0.9663
500                0.0106      0.0091      0.0106      0.0091      1.0000           1.0000
1000               0.0032      0.0027      0.0088      0.0075      2.7500           2.7778
5000               0.0115      0.0098      0.0115      0.0098      1.0000           1.0000
50     MixGauss    0.1288      0.1074      0.1327      0.1078      1.0303           1.0037
100                0.1187      0.1091      0.1170      0.1085      0.9857           0.9945
500                0.1200      0.1097      0.1196      0.1100      0.9967           1.0027
1000               0.1245      0.1116      0.1217      0.1109      0.9775           0.9937
5000               0.1175      0.1059      0.1158      0.1045      0.9855           0.9868
50     Mix1d       0.0906      0.0724      0.1039      0.0747      1.1476           1.0318
100                0.0866      0.0526      0.0663      0.0515      0.7659           0.9791
500                0.0173      0.0131      0.0173      0.0136      1.0000           1.0382
1000               0.0053      0.0048      0.0044      0.0032      0.8148           0.6667
5000               0.0077      0.0028      0.0063      0.0028      0.8182           1.0000
50     MixGauss2   0.1780      0.1588      0.1766      0.1578      0.9921           0.9937
100                0.1929      0.1735      0.1881      0.1688      0.9951           0.9729
500                0.1709      0.1537      0.1685      0.1498      0.9860           0.9746
1000               0.1766      0.1506      0.1780      0.1506      1.0079           1.0000
5000               0.1744      0.1540      0.1735      0.1514      0.9948           0.9831
Table 4. The goodness of fit for data using the nonuniform B-spline methods, orthogonal sequence, and kernel estimators (The number of sample points is 1000).
                     Gauss     Exp       Chisq     MixGauss  Mix1d     MixGauss2
Nonuniform B-spline
root-MSE             0.0566    0.0361    0.0026    0.1217    0.0032    0.1780
MAE                  0.0046    0.0313    0.0022    0.1109    0.0032    0.1506
Orthogonal sequence
root-MSE             0.1442    0.1860    0.0097    0.1204    0.0332    0.1020
MAE                  0.1329    0.1508    0.0087    0.1035    0.0280    0.0958
Kernel ROT
root-MSE             0.2848    0.5742    0.0217    0.1273    0.0843    0.0309
MAE                  0.2655    0.5084    0.0198    0.1163    0.0779    0.0280
Kernel LCV
root-MSE             0.2871    0.5896    0.0574    0.1364    0.1292    0.0539
MAE                  0.2676    0.5229    0.0528    0.1247    0.1214    0.0479
Kernel HALL
root-MSE             0.3003    0.5887    0.0831    0.1442    0.1285    0.0539
MAE                  0.2802    0.5222    0.0769    0.1317    0.1204    0.0481
Table 5. The goodness of fit using the nonuniform B-spline, orthogonal sequence, and kernel estimators with different sample sizes.
N                  Gauss     Exp       Chisq     MixGauss  Mix1d     MixGauss2
Nonuniform B-spline
50      root-MSE   0.1140    0.1411    0.0879    0.1327    0.1039    0.1766
        MAE        0.0804    0.1127    0.0231    0.1078    0.0747    0.1578
100     root-MSE   0.0714    0.1507    0.0643    0.1170    0.0663    0.1881
        MAE        0.0592    0.1142    0.0172    0.1085    0.0515    0.1688
500     root-MSE   0.0387    0.0412    0.0106    0.1196    0.0173    0.1685
        MAE        0.0310    0.0375    0.0091    0.1100    0.0136    0.1498
1000    root-MSE   0.0043    0.0361    0.0026    0.1217    0.0032    0.1780
        MAE        0.0046    0.0313    0.0022    0.1109    0.0032    0.1506
5000    root-MSE   0.0063    0.0245    0.0115    0.1158    0.0023    0.1735
        MAE        0.0039    0.0216    0.0098    0.1045    0.0028    0.1514
Orthogonal sequence
50      root-MSE   0.1149    0.2006    0.0159    0.0837    0.0265    0.0640
        MAE        0.1097    0.1618    0.0136    0.0696    0.0204    0.0585
100     root-MSE   0.1175    0.2102    0.0216    0.1095    0.0265    0.0207
        MAE        0.1087    0.1602    0.0165    0.0952    0.0220    0.0169
500     root-MSE   0.1530    0.1817    0.0056    0.0608    0.0436    0.0943
        MAE        0.1412    0.1484    0.0049    0.0137    0.0391    0.0844
1000    root-MSE   0.1442    0.1860    0.0097    0.1204    0.0332    0.1020
        MAE        0.1329    0.1508    0.0087    0.1035    0.0280    0.0958
5000    root-MSE   0.1490    0.1819    0.0132    0.1253    0.0346    0.1109
        MAE        0.1369    0.1470    0.0117    0.1083    0.0302    0.0990
Kernel ROT
50      root-MSE   0.2577    0.5544    0.1162    0.0707    0.0548    0.0469
        MAE        0.2362    0.4932    0.1111    0.0595    0.0496    0.0438
100     root-MSE   0.2390    0.5840    0.0748    0.1005    0.0436    0.0235
        MAE        0.2163    0.5223    0.0719    0.0908    0.0346    0.0202
500     root-MSE   0.2835    0.5700    0.0096    0.1187    0.0721    0.0216
        MAE        0.2663    0.5059    0.0078    0.1086    0.0673    0.0193
1000    root-MSE   0.2848    0.5742    0.0217    0.1273    0.0843    0.0309
        MAE        0.2655    0.5084    0.0198    0.1163    0.0779    0.0280
5000    root-MSE   0.2926    0.5778    0.0548    0.1338    0.1058    0.0424
        MAE        0.2714    0.5095    0.0505    0.1221    0.0987    0.0385
Kernel LCV
50      root-MSE   0.2366    0.5916    0.1916    0.0566    0.0548    0.0632
        MAE        0.2161    0.5209    0.1842    0.0471    0.0465    0.0595
100     root-MSE   0.2332    0.6084    0.0200    0.1127    0.0964    0.0480
        MAE        0.2107    0.5459    0.0174    0.1024    0.0888    0.0431
500     root-MSE   0.2751    0.5902    0.0244    0.1356    0.1265    0.0500
        MAE        0.2584    0.5251    0.0207    0.1242    0.1191    0.0453
1000    root-MSE   0.2871    0.5896    0.0574    0.1364    0.1292    0.0539
        MAE        0.2676    0.5229    0.0528    0.1247    0.1214    0.0479
5000    root-MSE   0.2888    0.5868    0.0787    0.1393    0.1330    0.0548
        MAE        0.2679    0.5178    0.0730    0.1268    0.1244    0.0496
Kernel HALL
50      root-MSE   0.2848    0.5873    0.0332    0.1158    0.0964    0.0314
        MAE        0.2619    0.5264    0.0239    0.1020    0.0867    0.0273
100     root-MSE   0.2729    0.6083    0.0539    0.1338    0.1068    0.0436
        MAE        0.2498    0.5458    0.0485    0.1216    0.0992    0.0397
500     root-MSE   0.3030    0.5886    0.0762    0.1421    0.1277    0.0520
        MAE        0.2846    0.5236    0.0701    0.1300    0.1200    0.0471
1000    root-MSE   0.3003    0.5887    0.0831    0.1442    0.1285    0.0539
        MAE        0.2802    0.5222    0.0769    0.1317    0.1204    0.0481
5000    root-MSE   0.3013    0.5863    0.0883    0.1439    0.1319    0.0557
        MAE        0.2796    0.5174    0.0816    0.1313    0.1231    0.0505
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhao, Y.; Zhang, M.; Ni, Q.; Wang, X. Adaptive Nonparametric Density Estimation with B-Spline Bases. Mathematics 2023, 11, 291. https://doi.org/10.3390/math11020291

AMA Style

Zhao Y, Zhang M, Ni Q, Wang X. Adaptive Nonparametric Density Estimation with B-Spline Bases. Mathematics. 2023; 11(2):291. https://doi.org/10.3390/math11020291

Chicago/Turabian Style

Zhao, Yanchun, Mengzhu Zhang, Qian Ni, and Xuhui Wang. 2023. "Adaptive Nonparametric Density Estimation with B-Spline Bases" Mathematics 11, no. 2: 291. https://doi.org/10.3390/math11020291
