Adaptive Nonparametric Density Estimation with B-Spline Bases

: Learning density estimation is important in probabilistic modeling and reasoning with uncertainty. Since B-spline basis functions are piecewise polynomials with local support, density estimation with B-splines shows its advantages when intensive numerical computations are involved in the subsequent applications. To obtain an optimal local density estimation with B-splines, we need to select the bandwidth (i.e., the distance of two adjacent knots) for uniform B-splines. However, the selection of bandwidth is challenging, and the computation is costly. On the other hand, nonuniform B-splines can improve on the approximation capability of uniform B-splines. Based on this observation, we perform density estimation with nonuniform B-splines. By introducing the error indicator attached to each interval, we propose an adaptive strategy to generate the nonuniform knot vector. The error indicator is an approximation of the information entropy locally, which is closely related to the number of kernels when we construct the nonuniform estimator. The numerical experiments show that, compared with the uniform B-spline, the local density estimation with nonuniform B-splines not only achieves better estimation results but also effectively alleviates the overﬁtting phenomenon caused by the uniform B-splines. The comparison with the existing estimation procedures, including the state-of-the-art kernel estimators, demonstrates the accuracy of our new method.


Introduction
Nonparametric density estimation avoids the parametric assumptions in probabilistic modeling and reasoning, which achieves flexibility in data modeling while reducing the risk of model misspecification [1,2].Hence, nonparametric density estimation is an important research area in statistics.In this paper, we focus on the estimation of univariate density functions, which is a classic problem in nonparametric statistics.We perform density estimation with nonuniform B-splines.By introducing the error indicator attached to each interval for density estimation, we propose an adaptive strategy to generate a nonuniform knot vector.
There are four main techniques for nonparametric estimation, i.e., histograms, orthogonal series, kernels, and splines.Histograms transform the continuous data into discrete data, while important information may be lost during the discretization process [3].Kernel density estimation is one of the most famous methods for density estimation, which still remains an active research area (see [4] and references therein).In addition to kernel density estimators, orthogonal series estimators are also widely used, e.g., [5][6][7][8].Terrell and Scott investigated some of the possibilities for the improvement of univariate and multivariate kernel density estimates by varying the window over the domain of estimation, pointwise and globally [9].However, the global nature of orthogonal series estimators limits their applications.
The important part of any basis estimation procedure is the bandwidth selection method.The existing literature on bandwidth is quite rich (see [21][22][23][24][25] and references therein).It should be noted that the least squares cross-validation (LSCV) formula in closed form was proposed in [21], which can be used to determine the bandwidth efficiently.
In this paper, we use nonuniform B-splines as density estimators.By introducing the local error indicator attached to the interval, we design an adaptive refinement strategy, which increases the approximation capability of the local density estimator.The numerical experiments show that our adaptive local density estimation produces a smaller approximation error than that with uniform B-splines.Comparison with the state-of-the-art density estimation methods shows that our adaptive method can approximate data with a comparable squared error and a significantly smaller absolute error than other kernel density estimators.
The remainder of the paper is organized as follows.The next section reviews the definition of B-spline basis functions and their corresponding piecewise polynomial space.In Section 3, we detail the proposed method for density estimation based on B-splines.In Section 4, we propose an adaptive knot refinement strategy.Numerical experiments are provided in Section 5. Finally, Section 6 ends with conclusions.

Given a knot vector
. ., n, the B-spline basis functions of degree k (order k + 1) are defined in a recursive fashion [26]: The B-spline basis functions N i,k (x) are nonnegative and have local support (i.e., N i,k (x) is a nonzero polynomial on [u i , u i+k+1 )).In addition, the B-spline basis functions form a partition of unity for x ∈ [u k , u n+1 ] [27].Figure 1 shows the cubic and quadratic B-spline basis functions defined in [0,10].
A B-spline of degree k is defined [28] as where d i ∈ R is the i-th control coefficient, and N i,k (x) is the i-th B-spline basis function, which is defined based on the knot vector U = [u 1 , • • • , u n+k+1 ].A B-spline whose knot vector is evenly spaced is known as a uniform B-spline.Otherwise, the B-spline is called a nonuniform B-spline.
Let ∆ be a sequence of distinct real numbers , which forms a partition of the interval [u k , u n+1 ].The space S r k (∆) of piecewise polynomials of degree k with smoothness r = (r 1 , . . ., r l ) over the partition ∆ is defined by where P k is the space of all the polynomials of degree k, and C r i is the space consisting of all the functions that are continuous at η i with order r i .
Lemma 1 ([29]).Given the knot vector U, the set of B-spline basis functions defined as (1) forms an alternative basis for the piecewise polynomial space S r k (∆) .
Moreover, the approximation capabilities of S r k (∆) to a sufficiently smooth function defined over [u k , u n+1 ] were described in [30].It was shown that a sufficiently smooth function can be approximated with good accuracy by B-splines.

Density Estimation with B-Splines
Let {x i } N i=1 be an independent identically distributed random sample from a continuous probability density function f , X ∼ f (x).Zong proposed a method to find B-spline estimates of a one-dimensional and two-dimensional probability density function from a sample [31].
We define an estimate f (x|α) of f (x) in the form: where α = (α 1 , . . ., α n ) ∈ R n+1 are coefficients.To fix the estimate f (x|α) in the form of B-splines, we need to specify the degree k, the basis functions N j,k (x), and the coefficients α j , j = 1, . . ., n.

Selecting the Degree and the Knot Vector
The degree k and the knot vector U need to be specified a priori to determine the basis functions.Based on the approximation ability and flexibility of the quadratic and cubic B-splines, quadratic or cubic B-splines are usually chosen for local density estimation [21,31,32].The selection of the knot vector is challenging and time consuming.Even when we restrict the knot vector to a uniform case, we still need to specify: (1) η 0 and η l+1 (i.e., u k , u n+1 in the knot vector), which determines the endpoints of the interval of the piecewise polynomial space S r k (∆); To ensure that all the sample values are in the interval [η 0 , η l+1 ], we can set and γ is a parameter to control the length of the interval.In the numerical experiments, γ is set to be 0.01 in general.Note that the values a 0 , b 0 can be obtained by passing through data at a cost O(N).
The selection of the optimal bandwidth is generally based on the score of the estimated model.A penalized likelihood score is chosen to perform selection in a principled way, e.g., the Bayesian information criterion (BIC) and measured entropy (ME) scores are adopted to select the bandwidth with the highest score [31,32], where Note that ME is an asymptotically unbiased estimate of the information entropy.The information entropy of the real model f measured by the estimation f is defined as

Computing the Coefficients
When the degree and the knot vector are specified, we need to compute the coefficients α 1 , . . ., α n to obtain the B-spline estimation.In addition, two constraints need to be considered to ensure the resulting f (x|α) := ∑ n j=1 α j N j,k (x) is a valid probability density function, i.e., (1) α j ≥ 0, j = 1, . . ., n, so that fX (x|α) is always positive in the distribution range. (2) The coefficients α can be calculated based on the maximum likelihood method, which can be formulated as a constrained optimization problem: The constrained optimization problem (7) can be calculated efficiently by an iterative procedure [31]: where q represents the iteration number in the optimization process.The initial values α 0 j are set to (k + 1)/ ∑ n j=1 (u j − u j−k−1 ).

Knot Refinement
Uniform B-splines may fail to capture the details of the input dataset; it has been shown that the suitable placement of knots can improve the approximation capability of B-splines dramatically [33].Hence, we focused on the adaptive generation of the knot vectors for density estimation in this paper.
We started with a coarse uniform knot vector; the adaptive procedure consisted of successive loops of the form: Compute coefficients → Estimate error → Mark and refine.
The computation for the coefficients can be accomplished by (8).The essential part of the loops is the error estimate step.The error estimate methods with a posteriori error control are well developed in numerical analysis (see [34,35] and references therein for examples).We followed up on the ideas presented by [35,36] to derive a posteriori-based error estimator based on B-splines.

A Residual-Based Posteriori Error Estimator Based on B-Splines
We aimed to refine only those intervals I i = [u i , u i+1 ], which contributed significantly to the error f (x) − f (x|α).However, since the true density function f (x) was unknown, we defined a local error indicator attached with the interval I i as follows: where i are all the sampling points in {x j } N j=1 located in the interval I i .Note that τ i is an estimate of the information entropy restricted on the interval I i :

Adaptive Refinement Strategy
Inspired by the adaptive refinement strategy, in the numerical analysis, we introduced the adaptive refinement strategy to compute a sequence of estimates that converged to the true probability density function.As the error indicator for each interval was available, we marked each interval I i to be refined that had a large error.In order to find the intervals with a large error efficiently, we adopted the refinement strategy given in [37], with a slight modification as Algorithm 1.  1.

5.
Apply Algorithm 1 to obtain a refined knot vector U . 6.
If U = U, update U = U and return to Step 3.
Remark 1.Compared with the case of a uniform B-spline, where the knot vectors are selected by an exhaustive search [31], our adaptive refinement strategy generated the knot vector automatically.

Numerical Experiments
In this section, we report the results of several numerical experiments.We start by introducing the different comparison measures in Section 5.1.Section 5.2 shows a comparison of the accuracy of nonuniform B-spline density estimators versus uniform B-spline density estimators.In Section 5.3, we compare the nonuniform B-spline density estimator to the existing kernel density estimators and orthogonal sequence estimators.

Comparison Measures
We used different measures to evaluate the quality of the estimators computed based on the samples.

•
The measured entropy (ME) of the samples given by the estimator, which is defined as (5).

•
The BIC score of the samples given by the estimator, which is defined as (4).
In addition, we also used the MAE and root-MSE to measure how close the estimation fX (x|α) was to the true density f (x), where the x i is the sample point:

•
The root mean square error (root-MSE) between the estimation f (x|α) and the true density f (x):

•
The mean absolute error (MAE) between the estimation f (x|α) and the true density f (x):

Uniform B-Spline Estimators vs. Nonuniform B-Spline Estimators
First, we compared the uniform B-spline probability density estimator with the adaptive nonuniform B-spline probability density estimator; the generation of the nonuniform knot vector was described in Section 4.
Table 1 shows the name of the datasets, the probability distribution, and the approximation domain.The comparative experimental results of the uniform B-spline and the nonuniform B-spline are shown in Table 2, and the fitting results are shown in Figure 2.      When the sample size was fixed, the errors of the uniform B-spline and the nonuniform B-spline are compared in Table 2.The measured entropy (ME), the information entropy (H), the mean absolute error (MAE), the mean square error (root-MSE), and the Bayesian information criterion (BIC) scores are listed, which demonstrated that the adaptive nonuniform B-spline estimators usually outperformed the uniform B-spline estimators.In addition, compared to the uniform B-spline estimators, the adaptive nonuniform B-spline estimators were usually closer to the true density functions, which is shown in Figure 2.
When the sample size varied as N = 50, 100, 500, 1000, 5000, the root-MSE and MAE results are listed for the uniform B-spline estimators and the nonuniform estimators in Table 3. From the ratio of the MAE and root-MSE, we can see that the fitting results of the adaptive nonuniform B-spline outperformed those of the uniform B-spline.

Comparison with Orthogonal Sequence and Kernel Estimators
Tables 4 and 5 show the results compared with the previously mentioned probability density function estimation methods, including the orthogonal sequence of [7,13], the kernel estimators using three strategies to the bandwidth selected, that is, the rule-ofthumb method, which is based on the asymptotic mean integrated square error (ROT) [38], the least squares cross-validation method (LCV) [38], and a method proposed by Hall et al. based on the straightforward idea of plugging estimates into the usual asymptotic representation for the optimal bandwidth but with two important modifications (HALL) [39].The experimental results of the B-spline function estimation used the values of the MAE and root-MSE.In Table 4, we observe that with the same sample size, the errors of the nonuniform B-spline were smaller.The errors of the root-MSE and MAE of the nonuniform B-spline were smaller than those of the other methods, which showed that the estimation effect of the adaptive nonuniform B-splines was better than that of the listed methods.In addition, the fitting results obtained by the nonuniform B-splines overcame the overfitting phenomenon of the uniform method.
In Table 5, the error analysis of different sample sizes for the nonuniform B-spline, orthogonal sequence, and kernel methods are listed.The experimental results showed that the errors of the nonuniform B-spline were smaller, and the errors became smaller with the increase in the sample size.It was also shown that the B-spline fitting method of the nonuniform knots generated by our adaptive strategy had a better fitting effect.

Conclusions
In this work, we introduced a novel density estimation with nonuniform B-splines.By introducing the error indicator attached to each interval for density estimation, we proposed an adaptive strategy to generate the nonuniform knot vector.The numerical experiments showed that, compared with the uniform B-spline, the local density estimation with nonuniform B-splines not only achieved better estimation results but also effectively alleviated the overfitting phenomenon caused by the uniform B-splines.The comparison with the existing estimation procedures, including the state-of-the-art kernel estimators, demonstrated the accuracy of our new method.
In the future, it would be interesting to extend the method considered in the paper to multivariate density cases.Another natural direction to pursue further is the fast automatic knot placement method via feature characterization from the samples, which can generate the nonuniform knot vector directly.We leave these topics for future research.
Quadratic B-spline basis functions.
Cubic B-spline basis functions.

4 for all interval I i do 5 if I i is not marked then 6 if 10 endAlgorithm 2 :
τ i > µ • τ max then 7 mark the interval I i , refine I i by inserting a new knot (η i + η i+1 )/2; sum := sum +τ i ; We implemented the adaptive strategy in this paper as in Algorithm 2. Adaptive probability density function estimation Input: An observed sample {x i } N i=1 ; the degree of B-spline k; the number of initial interior knots m.Output: A probability density function: fX (x|α) = n ∑ j=1 α j N j,k (x).

Figure 2 .
Figure 2. Density estimates use uniform and nonuniform B-splines.The blue rectangle represents the histogram, the blue solid line represents the true density function, the yellow dashed line represents the uniform B-spline function, and the red dotted line represents the adaptive nonuniform B-spline function.The knots of the nonuniform B-splines are marked as asterisks along the horizontal axis.

Table 2 .
The goodness of fit for the data using the uniform B-spline and nonuniform B-spline methods (The sample size is 1000).

Table 3 .
Uniform vs. non-uniform Ratio-root-MSE = root-MSE non-uniform /root-MSE uniform .The goodness of fit for data using the uniform B-spline and nonuniform B-spline with the different sample sizes.

Table 4 .
The goodness of fit for data using the nonuniform B-spline methods, orthogonal sequence, and kernel estimators (The number of sample points is 1000).

Table 5 .
The goodness of fit using the nonuniform B-spline, orthogonal sequence, and kernal estimators with the different sample sizes.