An Improved Model for Kernel Density Estimation Based on Quadtree and Quasi-Interpolation

Abstract: Classical kernel density estimation faces three main problems in applications: the boundary problem, the over-smoothing problem in high (low)-density regions and the low-efficiency problem for large samples. A new improved model of multivariate adaptive binned quasi-interpolation density estimation, based on a quadtree algorithm and quasi-interpolation, is proposed; it avoids these deficiencies of the classical kernel density estimation model and improves the precision of the model. The model is constructed in three steps. Firstly, the binned threshold is set from the three dimensions of sample number, width of bins and kurtosis, and the bounded domain is adaptively partitioned into several non-intersecting bins (intervals) by using the iteration idea from the quadtree algorithm. Then, based on the good properties of quasi-interpolation, the kernel functions of the density estimation model are constructed by introducing the theory of quasi-interpolation. Finally, the binned coefficients of the density estimation model are constructed by using the idea of frequency replacing probability. Monte Carlo simulation shows that the proposed non-parametric model can effectively solve the three shortcomings of the classical kernel density estimation model and significantly improve the prediction accuracy and calculation efficiency of the density function for large samples.


Introduction
Density estimation is a common technique in modern data analysis. It is usually used to analyze statistical characteristics of samples, such as skewness and multimodality, and to quantify uncertainties. It has been widely used in engineering, economics, medicine, geography and other fields. Density estimation methods are either parametric or nonparametric. The parametric method requires strong assumptions on the prior model, restricting the probability density function to a given parametric family of distributions, and then calculates the corresponding parameter estimates from the samples. The main problem of the parametric method is that an inaccurate choice of the prior parametric model may lead to wrong conclusions. Moreover, when testing the posterior model, it is common that multiple prior model assumptions pass the posterior test, which greatly affects the accuracy and efficiency of data analysis. Therefore, to avoid the defects of the parametric method, Fix and Hodges [1] first eliminated its strong assumptions by introducing the idea of discriminant analysis, which is also the fundamental source of the nonparametric method. The simplest histogram method is an intuitive embodiment of this idea. Nonparametric methods do not require any prior assumptions and can predict [...] density estimator method based on the resampling strategy of a multi-point grid. Harel [17] discussed the asymptotic normality of a binned kernel density estimator for non-stationary random variables. Peherstorfer [18] proposed a density estimation based on a sparse grid, which can be viewed as an improved binned rule: it uses a sparse grid instead of a full grid to reduce the number of bins. Although the binned kernel density estimator improves the processing efficiency for large sample data through the binned strategy, it still faces the boundary problem of the kernel density estimator.
In addition, there are other methodologies for applying kernel density estimation to large datasets. Cheng [19] proposed a quick multivariate kernel density estimation for massive datasets by viewing the estimator as a two-step procedure: first, a kernel density estimator on each sub-interval, and then a function approximation based on pseudo data via the Nadaraya-Watson estimator. However, the research of Gao [20] demonstrated that estimators of the generalized rational form provide a low convergence rate. Moreover, computing the pseudo data with a kernel density estimator requires more computation than the above binned rule and does not consider the boundary problem of the kernel density estimator. Zheng [21] focused on how to choose samples from large data to produce a proxy for the true data with a prescribed accuracy, which is more complex than the direct binned rule. Moreover, that research does not pay much attention to the kernel density estimator itself. By comparison, the binned method is very simple and clear. Recently, we proposed a kernel density estimator based on quasi-interpolation and proved its theoretical statistical properties, but that research does not provide a solution for the over-smoothing phenomenon [22].
Another problem of kernel density estimators, the over-smoothing phenomenon, is caused by an improper selection of bandwidth, and different scholars have adopted different methods to reduce its occurrence. The most classical method to choose the bandwidth is the rule of thumb, which calculates the optimal bandwidth from the standard deviation and dimension of the samples. Due to its simplicity, it is regarded as a common tool in most application studies of kernel density estimators. However, actual samples are usually random and uneven, whereas the optimal bandwidth obtained by the rule of thumb is fixed. It only provides a criterion for a bandwidth that is optimal in a certain sense and has a very limited effect on the over-smoothing phenomenon. An adaptive bandwidth approach, viewed as a correction to the rule of thumb, is used to ameliorate this phenomenon. It consists of two steps. Firstly, the density function is estimated with a fixed bandwidth, and the quantitative relationship between the pointwise estimated value at each sample and the geometric mean of these values is established. Then, according to this relationship, a pointwise correction coefficient is determined to modify the bandwidth. The final kernel density estimator is obtained from these modified bandwidths. The adaptive bandwidth method improves the accuracy of kernel density estimators over a fixed bandwidth, but it is difficult to apply to large samples because each sample affects the determination of the correction coefficient, so the computational efficiency is low. Barreiro Ures [23] proposed a bandwidth selection method for large samples using subagging. Subagging can be viewed as an improvement on the cross-validation method. Therefore, it is difficult to capture local changes in samples. Moreover, the research does not consider the boundary problem.
In conclusion, the classical kernel density estimator is a convenient tool that is widely used in many branches of science and technology. However, the majority of research has not considered the constraints of the kernel density estimator model itself. These limitations and deficiencies need to be considered further. In addition, previous kernel density estimation methods do not jointly address the boundary problem, the over-smoothing problem and the computational efficiency for large samples. Therefore, in view of the insufficiency of the classical kernel density estimator, this paper proposes a new modeling process for multivariate adaptive binned kernel density estimators based on the quadtree algorithm and quasi-interpolation, which significantly improves the prediction accuracy of the estimated density function. The contributions of this paper are summarized as follows:
(1) Aiming at the boundary problem of the classical kernel density estimator defined over a bounded region, a new set of asymmetric kernel functions is introduced based on the quasi-interpolation theory to avoid the boundary problem.
(2) To improve the computational efficiency of the classical kernel density estimator for large samples, the idea of binned kernel density estimation is introduced. An explicit expression for the coefficients of the density estimator under the binned rule is derived, which greatly reduces the computation and improves the computational efficiency of the model.
(3) To alleviate the over-smoothing phenomenon of classical kernel density estimators, this paper proposes an adaptive strategy based on the segmentation idea of the quadtree algorithm. We set the segmentation thresholds from the sample size, bin width and kurtosis to adaptively compute the number of bins and the bin widths. This effectively avoids the over-smoothing phenomenon in high (low)-density areas, increases the local adaptability of the model to the samples and further improves its accuracy.
(4) We extend the univariate model based on the quadtree algorithm to the multivariate model. Numerical simulation based on the Monte Carlo method shows that the constructed models perform well with respect to the boundary problem, large samples and the over-smoothing phenomenon, and are significantly better than the kernel density estimation methods in current widespread use.

Univariate Quasi-Interpolation Density Estimator
Let $X_1, X_2, \cdots, X_n$ be a set of random samples drawn from a probability distribution with an unknown probability density function $f(x)$. The classical non-parametric kernel density estimator is defined as:

$$\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right), \tag{1}$$

where $h$ denotes the bandwidth and $K(x)$ denotes the kernel function or weight function. Some common symmetric kernel functions are shown in Table 1.

Table 1. Common kernel functions.

Type of Kernel Function    Expression of Kernel Function
Gaussian kernel            $K(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$

According to Equation (1), the classical kernel density estimator must calculate the distance between the predicted point and every sampling point to allot the weight function. This means that the computation increases rapidly with the sample size. Note that a prediction point is mainly influenced by the samples inside the bandwidth domain, while the samples outside it have very little influence. The pointwise calculation for the large number of samples outside the bandwidth domain greatly reduces the computational efficiency. Therefore, the binned kernel density estimator was proposed:

$$\tilde{f}_h(x) = \frac{1}{nh}\sum_{j} n_j K\left(\frac{x - t_j}{h}\right), \tag{2}$$

where $t_j$ denotes the center of the $j$-th bin and $n_j$ denotes the number of samples falling in the $j$-th bin, satisfying $\sum_j n_j = n$. For clarity, we remind readers that $X_i$ denotes a random sample and $t_j$ represents the center of a bin. According to Equation (2), the binned kernel density estimator transforms the pointwise calculation of the classical kernel density estimator into a calculation over bin centers. Its essential idea is to treat the samples in a small region as a whole and the central point of each region as the core sample. It therefore ignores the distance between each individual sample and the central point of its region. In this way, unnecessary detailed calculation in the classical kernel density estimator is reduced and the computational efficiency is improved while accuracy is maintained. However, since actual samples are usually drawn from a bounded domain, the above two classes of kernel density estimator face the same problem: the boundary problem occurs when a fixed symmetric kernel function is used to predict a true density function defined over a bounded domain. The main reason for the boundary problem is that weight is allotted outside the density support when smoothing near a boundary point with a fixed symmetric kernel function.
A natural strategy is to use a kernel that has no weight allotted outside the support. Therefore, under the framework of numerical approximation, combining with the theory of quasi-interpolation and the binned idea to improve the above models, this paper proposes a new binned quasi-interpolation density estimator, which can not only improve the computation efficiency of large samples, but also eliminate the boundary problem.
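The difference between Equations (1) and (2) can be illustrated with a minimal NumPy sketch. The function names (`gaussian_kernel`, `kde`, `binned_kde`), the equal-width bins and the synthetic data are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel K(u)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(x, samples, h):
    """Classical estimator, Eq. (1): one kernel per sample, cost O(len(x) * n)."""
    u = (x[:, None] - samples[None, :]) / h
    return gaussian_kernel(u).sum(axis=1) / (samples.size * h)

def binned_kde(x, samples, h, a, b, num_bins):
    """Binned estimator, Eq. (2): one kernel per bin center, weighted by n_j."""
    counts, edges = np.histogram(samples, bins=num_bins, range=(a, b))
    centers = 0.5 * (edges[:-1] + edges[1:])
    u = (x[:, None] - centers[None, :]) / h
    return (gaussian_kernel(u) * counts[None, :]).sum(axis=1) / (samples.size * h)

# Synthetic data on [0, 1] (illustrative only).
rng = np.random.default_rng(0)
samples = rng.normal(0.5, 0.1, size=5000)
samples = samples[(samples >= 0.0) & (samples <= 1.0)]

x = np.linspace(0.0, 1.0, 201)
h = 1.06 * samples.std() * samples.size ** (-0.2)   # univariate rule of thumb
f_kde = kde(x, samples, h)
f_bin = binned_kde(x, samples, h, 0.0, 1.0, num_bins=200)
max_gap = float(np.max(np.abs(f_kde - f_bin)))
print(f"largest pointwise gap between KDE and binned KDE: {max_gap:.4f}")
```

The binned version evaluates 200 kernels per prediction point instead of one per sample, which is where the saving for large samples comes from. Both versions still use a symmetric kernel, so neither addresses the boundary problem.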
Let us start with some definitions and lemmas.

Definition 1. Let $[a, b]$ be a bounded interval with $a$ and $b$ known, let $a = t_0 < t_1 < \cdots < t_n = b$ be a set of scattered centers on $[a, b]$, and let $\{f(t_j)\}_{j=0}^{n}$ be the discrete function values at these centers. Let $c$ be a positive shape parameter and $\phi_j(x) = \sqrt{c^2 + (x - t_j)^2}$ be the multiquadric (MQ) function first constructed by Hardy [24]. Then we have the quasi-interpolation ($L_D$ operator)

$$L_D f(x) = \sum_{j=0}^{n} f(t_j)\,\psi_j(x),$$

where $\{\psi_j\}_{j=0}^{n}$ are the asymmetric MQ kernels. These kernels satisfy $0 < \psi_j(x) < 1$. In addition, we obtain the following error estimates, which can be found in Wu and Schaback [25].
Lemma 1. Let $h$ denote the maximal spacing of the centers $\{t_j\}_{j=0}^{n}$ and let $c$ be a positive shape parameter. Then there exist constants $K_1, K_2, K_3$ independent of $h$ and $c$ such that the following inequality holds:

$$\|f - L_D f\|_\infty \le K_1 h^2 + K_2 c h + K_3 c^2 |\log h|.$$

According to Lemma 1, for any shape parameter $c$ satisfying $0 \le c \le O(h/|\log h|)$, the quasi-interpolation $L_D$ provides the convergence rate $O(h^2)$ on the whole bounded interval. Furthermore, the research of Ling [26] shows that the multivariate $L_D$ operator built by the tensor product technique (dimension-splitting) provides the same convergence rate as the univariate case. Inspired by the convergence characteristics of quasi-interpolation and the idea of the binned kernel density estimator, we construct a univariate adaptive quasi-interpolation density estimator based on the quadtree algorithm, which consists of three steps.

Suppose that $X$ is a random variable with an unknown density function $f(x)$ on the bounded interval $[a, b]$, and $\{X_k\}_{k=1}^{n}$ are $n$ independent samples of $X$. The first step is to divide the interval $[a, b]$ into $N$ bins $\{[t_j, t_{j+1}]\}_{j=0}^{N-1}$. Let $n_j$ denote the number of samples $\{X_k\}_{k=1}^{n}$ falling into the bin $[t_j, t_{j+1}]$. In the second step, we construct a new univariate binned density estimator as follows:

$$Q f(x) = \sum_{j=0}^{N} \alpha_j(f)\,\psi_j(x). \tag{4}$$

Here, $\{\psi_j\}_{j=0}^{N}$ denote the asymmetric MQ kernels defined by Equation (3), and the coefficients $\{\alpha_j(f)\}_{j=0}^{N}$ are defined by Equation (5) as linear combinations of the bin frequencies. According to Equation (4) and Lemma 1, the introduction of asymmetric MQ kernels avoids the boundary problem caused by the weight allotted outside the support when a traditional kernel function smooths near the boundary points. Moreover, Equation (5) shows that $n_j/n$ represents the frequency of samples falling into the bin $[t_j, t_{j+1}]$. Through the linear combination of the frequencies of adjacent bins, an explicit expression for the coefficients of the estimator under the binned rule is given, which effectively improves the calculation efficiency of the model.
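The first two steps can be sketched in NumPy as follows. Two assumptions are made, since this section does not reproduce the explicit formulas: the kernels follow the usual form of the Wu and Schaback $L_D$ operator, in which the first and last MQ functions are replaced by $x - t_0$ and $t_n - x$; and the coefficients average the frequencies of the two bins adjacent to each node, which is an illustrative stand-in and not necessarily the paper's Equation (5). The names `mq_kernels` and `binned_qi_density` are hypothetical.

```python
import numpy as np

def mq_kernels(x, t, c):
    """Assumed L_D-type asymmetric MQ kernels psi_0..psi_n at points x in [t[0], t[-1]]."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    phi = np.sqrt(c**2 + (x[:, None] - t[None, :]) ** 2)  # MQ functions phi_j(x)
    phi[:, 0] = x - t[0]      # assumed boundary modification: phi_0 -> x - t_0
    phi[:, -1] = t[-1] - x    # assumed boundary modification: phi_n -> t_n - x
    slope = (phi[:, 1:] - phi[:, :-1]) / (2.0 * np.diff(t))[None, :]
    psi = np.empty_like(phi)
    psi[:, 0] = 0.5 + slope[:, 0]
    psi[:, 1:-1] = slope[:, 1:] - slope[:, :-1]
    psi[:, -1] = 0.5 - slope[:, -1]
    return psi

def binned_qi_density(x, samples, edges, c):
    """Binned quasi-interpolation estimate with ASSUMED coefficients alpha_j."""
    counts, _ = np.histogram(samples, bins=edges)
    freq = counts / samples.size              # n_j / n
    width = np.diff(edges)
    alpha = np.empty(edges.size)
    alpha[0] = freq[0] / width[0]
    alpha[-1] = freq[-1] / width[-1]
    alpha[1:-1] = (freq[:-1] + freq[1:]) / (width[:-1] + width[1:])
    return mq_kernels(x, edges, c) @ alpha

# Kernel checks: partition of unity, linear reproduction, smooth-function error.
t = np.linspace(0.0, 1.0, 101)
x = np.linspace(0.0, 1.0, 501)
psi = mq_kernels(x, t, c=0.002)
unity_err = float(np.max(np.abs(psi.sum(axis=1) - 1.0)))
linear_err = float(np.max(np.abs(psi @ t - x)))
interp_err = float(np.max(np.abs(psi @ np.sin(2 * np.pi * t) - np.sin(2 * np.pi * x))))

# Density estimate for Beta(2, 5) samples, whose support is the bounded interval [0, 1].
rng = np.random.default_rng(0)
samples = rng.beta(2.0, 5.0, size=20000)
edges = np.linspace(0.0, 1.0, 41)
estimate = binned_qi_density(x, samples, edges, c=0.01)
true_pdf = 30.0 * x * (1.0 - x) ** 4
mean_abs_err = float(np.mean(np.abs(estimate - true_pdf)))
total_mass = float(np.sum(0.5 * (estimate[:-1] + estimate[1:]) * np.diff(x)))
```

Because the kernels are built only from the centers inside $[a, b]$, no weight is placed outside the support. The cost of one evaluation depends on the number of bins rather than on the number of samples.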
Thirdly, the over-smoothing phenomenon of the kernel density estimator is considered. In the above two steps, we built a univariate binned quasi-interpolation density estimator: based on the known samples and interval, the interval is divided into a certain number of bins, and the estimated density function is then calculated from the endpoints of the bins and the number of samples in each bin. If there are too few bins, the predicted result is over-smoothed and differs greatly from the actual scenario. If there are too many bins, the calculation efficiency is greatly reduced. How to determine the number and width of bins is therefore the key to both model accuracy and calculation efficiency. The most common method is the rule of thumb, which uses a fixed bandwidth calculated as follows:

$$h = \left(\frac{4}{d+2}\right)^{\frac{1}{d+4}} \sigma\, n^{-\frac{1}{d+4}}. \tag{6}$$

Here, $d$ denotes the dimension and $\sigma$ denotes the standard deviation of the samples. In particular, to maintain notational clarity, we remind readers that $d$ denotes the dimension while $D$ is a label of the $L_D$ operator. The number of bins is calculated as $\lceil (b - a)/h \rceil$. This method uses equal bandwidth; similar equal-bandwidth methods include the unbiased cross-validation method and the plug-in method. However, due to the strong randomness and uneven distribution of actual samples, an equal-bandwidth method generally describes the details of high-density areas insufficiently, which causes the over-smoothing phenomenon. Therefore, the bandwidth should adjust adaptively to the density of samples. It should be smaller in high-density areas to enhance local characterization and improve accuracy, and larger in gentle areas to avoid excessive calculation and improve calculation efficiency. A common adaptive method determines the number of bins according to the rule of thumb and obtains the estimated values at the bin centers. Then, the ratio between each estimated value and the geometric mean of all estimated values is taken as the correction coefficient of the bandwidth, so that a smaller bandwidth is used in dense areas and a larger bandwidth in sparse areas. This adaptive method is simple and easy to operate, but it has three disadvantages. First, it is based on the rule-of-thumb estimate, and the adaptive process does not change the number of bins, so in essence it only redistributes the bandwidth. Second, the degree of adaptive refinement is insufficient and the bandwidth correction coefficient is determined too roughly, which makes it susceptible to extreme values; moreover, it is difficult to distinguish sharp peaks from wide peaks. Third, the adaptive effect for multi-peak distributions is poor. In addition, the density near the boundary is usually small, and increasing the bin width there easily aggravates the boundary problem. Therefore, this paper proposes a new adaptive binned method.
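The fixed-bandwidth rule of Equation (6) and the geometric-mean correction described above can be sketched as follows. The function names, the `sensitivity` exponent of 0.5 and the pilot values are assumptions for illustration; the text does not specify how the ratio is turned into a correction coefficient.

```python
import numpy as np

def rule_of_thumb_bandwidth(samples, d=1):
    """Eq. (6): h = (4 / (d + 2))^(1 / (d + 4)) * sigma * n^(-1 / (d + 4))."""
    n = samples.shape[0]
    sigma = float(np.std(samples, ddof=1))
    return (4.0 / (d + 2.0)) ** (1.0 / (d + 4.0)) * sigma * n ** (-1.0 / (d + 4.0))

def number_of_bins(a, b, h):
    """Number of equal-width bins on [a, b], i.e. ceil((b - a) / h)."""
    return int(np.ceil((b - a) / h))

def local_bandwidths(pilot_values, h, sensitivity=0.5):
    """Scale h by (pilot / geometric mean)^(-sensitivity); the exponent is assumed."""
    pilot = np.maximum(np.asarray(pilot_values, dtype=float), 1e-12)
    geo_mean = np.exp(np.mean(np.log(pilot)))
    return h * (pilot / geo_mean) ** (-sensitivity)

rng = np.random.default_rng(0)
samples = rng.normal(0.5, 0.1, size=10000)

h = rule_of_thumb_bandwidth(samples, d=1)
bins = number_of_bins(0.0, 1.0, h)
factor = h / (np.std(samples, ddof=1) * samples.size ** (-0.2))  # about 1.06 for d = 1

# Hypothetical pilot density values at three bin centers, from sparse to dense.
local = local_bandwidths([0.5, 2.0, 4.0], h)
print(bins, round(float(factor), 4), local)
```

For $d = 1$ the leading constant is $(4/3)^{1/5} \approx 1.06$, which is the familiar univariate form of the rule. The correction shrinks the bandwidth where the pilot estimate is high, but the number of bins stays fixed, which is the first disadvantage noted above.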

Adaptive Binned Method Based on Quadtree Algorithm
The quadtree algorithm, as a space partition index technology, is widely used in the image processing field [27]. Its key idea is an iterated segmentation of the data space. The number of iterations depends on the number of samples in each bin and on the bin-width threshold, so the density of samples can be characterized by the number and width of bins: an area with dense samples undergoes more iterated segmentation and an area with sparse samples undergoes less. Therefore, following the idea of quadtree segmentation, we can adaptively adjust the bin number and bin width in the quasi-interpolation density estimator via a data-driven method. The high-density area is divided into more bins to obtain a smaller bin width, which captures the distribution details of the area more keenly, while the gentle area is divided into fewer bins to save computation, so as to achieve a reasonable distribution of bins and improve the accuracy of the model. The adaptive binned method based on the quadtree algorithm is shown in Figure 1. First of all, the sample space is divided into four bins, and the number of samples in each bin and the bin widths $\{L_i\}_{i=1}^{4}$ are recorded. Secondly, we set the thresholds of sample number $n_{\max}$ and bin width $L_{\max}$. The sample number threshold $n_{\max}$ captures distribution details in the high-density area with more bins and improves computing efficiency in the gentle area with fewer bins. It not only addresses the over-smoothing problem but also takes computing efficiency into account. The bin-width threshold ensures a minimum segmentation level over the whole domain and avoids an insufficient number of bins, which would lead to a large estimation error or to the boundary problem. Following the rule of thumb, we set the bin-width threshold to $1.06\sigma n^{-1/5}$. In addition, we set a kurtosis threshold to identify the peak distribution of samples and improve the accuracy. Finally, the number of samples and the bin width of each bin are compared with the sample number threshold $n_{\max}$, the bin-width threshold $L_{\max}$ and the kurtosis threshold. The segmentation is finished when all of these conditions are met.
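The segmentation loop described above can be sketched as a recursive halving of intervals. This is a minimal sketch under stated assumptions: a bin is split whenever any one of the three thresholds is exceeded, the recursion starts from the whole interval rather than from four bins, and the `min_width` guard and the four-sample minimum for the kurtosis are hypothetical additions that guarantee termination. The names `sample_kurtosis` and `adaptive_edges` are not from the paper.

```python
import numpy as np

def sample_kurtosis(values):
    """Moment-based kurtosis (3 for a normal distribution); 0 if too few samples."""
    if values.size < 4:
        return 0.0
    centered = values - values.mean()
    m2 = np.mean(centered**2)
    if m2 == 0.0:
        return 0.0
    return float(np.mean(centered**4) / m2**2)

def adaptive_edges(samples, a, b, n_max, l_max, k_max=3.0, min_width=1e-3):
    """Return increasing bin edges on [a, b] obtained by recursive halving."""
    def split(lo, hi):
        inside = samples[(samples >= lo) & (samples < hi)]
        width = hi - lo
        too_wide = width > l_max                       # bin-width threshold
        too_full = inside.size > n_max                 # sample-number threshold
        too_peaked = sample_kurtosis(inside) > k_max   # kurtosis threshold
        if width / 2.0 < min_width or not (too_wide or too_full or too_peaked):
            return [lo]                                # all conditions met: stop
        mid = 0.5 * (lo + hi)
        return split(lo, mid) + split(mid, hi)
    return np.array(split(a, b) + [b])

# Illustrative data: a sharp peak near 0.7 on top of a flat background.
rng = np.random.default_rng(1)
samples = np.concatenate([rng.normal(0.7, 0.02, 4000), rng.uniform(0.0, 1.0, 1000)])
samples = samples[(samples >= 0.0) & (samples <= 1.0)]

edges = adaptive_edges(samples, 0.0, 1.0, n_max=200, l_max=0.1)
widths = np.diff(edges)

def width_at(p):
    """Width of the bin that contains the point p."""
    return float(widths[np.searchsorted(edges, p, side="right") - 1])

print(len(widths), width_at(0.2), width_at(0.7))
```

In this example the bins around the peak at 0.7 come out much narrower than those in the flat region, which is the behaviour the thresholds are meant to produce. The resulting edges can serve as the centers $t_j$ of the binned estimator.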

Figure 1. Univariate adaptive binned method based on quadtree algorithm.

Multivariate Adaptive Binned Quasi-Interpolation Density Estimator
Based on the idea of the above univariate adaptive binned quasi-interpolation density estimator, we extend it to the multivariate model. Following the above process, we first construct the multivariate binned density estimator. The classical multivariate kernel density estimator and the multivariate binned density estimator are extended from the univariate model via the tensor product technique. They are defined as sums of products of univariate kernels, one factor per coordinate; in the binned version, the sum over samples is replaced by a sum over bin centers weighted by the bin counts $n_{j_1, j_2, \cdots, j_d}$, where $\sum \cdots \sum n_{j_1, j_2, \cdots, j_d} = n$. In the same way, we extend the above univariate binned quasi-interpolation density estimator to the multivariate case via the tensor product technique. Let $X$ be a $d$-dimensional random variable with an unknown density function $f$ defined on a bounded hyperrectangle. The multivariate binned quasi-interpolation density estimator is a linear combination of the products $\prod_{i=1}^{d} \psi_{j_i}(x_i)$ of the univariate asymmetric MQ kernels, with coefficients given by Equation (9). In Equation (9), for $i = 1, 2, \cdots, d$, the conventions $N_i + 1 := N_i - 1$ and $N_i + 2 := N_i - 2$ are used. To avoid the over-smoothing phenomenon, we use the advantage of the tensor product to transform the multivariate adaptive binned problem into a univariate problem, and the adaptive process is shown in Figure 2.

First, we divide the domain into two bins in each dimension and record the number of samples and the bin width of each bin along that dimension. Secondly, these are compared with the thresholds of sample number, bin width and kurtosis to achieve iterative segmentation. Finally, the bins of each dimension are combined into two-dimensional bins via the tensor product technique, and the number of samples falling in each two-dimensional bin is recorded.
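The last step, combining per-dimension bins into two-dimensional bins and counting samples, can be sketched with `numpy.histogramdd`. The edge arrays below are hypothetical placeholders for the output of a univariate adaptive segmentation in each dimension, and `tensor_bin_counts` is an illustrative name.

```python
import numpy as np

def tensor_bin_counts(samples, edges_per_dim):
    """Count samples in the tensor-product bins spanned by per-dimension edges."""
    counts, _ = np.histogramdd(samples, bins=edges_per_dim)
    return counts

rng = np.random.default_rng(2)
samples = rng.random((5000, 2))                          # stand-in data on [0, 1]^2

# Hypothetical edges, finer where one dimension is assumed to be denser.
edges_x = np.array([0.0, 0.25, 0.5, 0.625, 0.75, 1.0])
edges_y = np.array([0.0, 0.5, 0.75, 1.0])

counts = tensor_bin_counts(samples, [edges_x, edges_y])  # n_{j1, j2}
areas = np.outer(np.diff(edges_x), np.diff(edges_y))     # area of each 2-D bin
frequencies = counts / samples.shape[0]                  # n_{j1, j2} / n
heights = frequencies / areas                            # piecewise-constant density
print(counts.shape, int(counts.sum()))
```

Only the one-dimensional edge lists need to be adapted; the two-dimensional grid follows from them, which is the sense in which the tensor product reduces the multivariate adaptive problem to univariate ones.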

Numerical Simulation
In order to verify the performance of the model proposed in this paper, the Monte Carlo method is used for numerical simulation in this section. The Maximal Mean Squared Error (MMSE) and the Mean Integrated Squared Error (MISE) are used to quantify the difference between the estimated density function and the true density function:

$$\mathrm{MMSE} = \max_{x} E\left[\left(Q^{(*)} f(x) - f(x)\right)^2\right], \qquad \mathrm{MISE} = E\int \left(Q^{(*)} f(x) - f(x)\right)^2 dx.$$

Here, $E$ denotes the expectation, $Q^{(*)} f(x)$ denotes the estimated density function and $f(x)$ denotes the true density function. The MMSE and MISE are used to measure the local and overall accuracy of the model, respectively.
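A Monte Carlo approximation of the two error measures can be sketched as follows: the expectation is replaced by an average over repeated simulations, the maximum is taken over a grid, and the integral is approximated by the trapezoidal rule. The function names, the simple histogram estimator and the Beta(2, 5) test density are assumptions chosen to keep the example self-contained; they are not the models or test functions of this paper.

```python
import numpy as np

def monte_carlo_errors(estimator, sampler, true_pdf, grid, repetitions):
    """Approximate MMSE and MISE of `estimator` over repeated simulations."""
    truth = true_pdf(grid)
    sq_err = np.empty((repetitions, grid.size))
    for r in range(repetitions):
        sq_err[r] = (estimator(sampler(), grid) - truth) ** 2
    mse = sq_err.mean(axis=0)                  # pointwise mean squared error
    mmse = float(mse.max())                    # worst grid point (local accuracy)
    mise = float(np.sum(0.5 * (mse[:-1] + mse[1:]) * np.diff(grid)))  # overall
    return mmse, mise

def histogram_estimator(samples, grid, bins=20):
    """Piecewise-constant density estimate on [0, 1], used only as a stand-in."""
    heights, edges = np.histogram(samples, bins=bins, range=(0.0, 1.0), density=True)
    idx = np.clip(np.searchsorted(edges, grid, side="right") - 1, 0, bins - 1)
    return heights[idx]

rng = np.random.default_rng(3)
grid = np.linspace(0.0, 1.0, 201)

mmse, mise = monte_carlo_errors(
    estimator=histogram_estimator,
    sampler=lambda: rng.beta(2.0, 5.0, size=2000),
    true_pdf=lambda x: 30.0 * x * (1.0 - x) ** 4,   # Beta(2, 5) density
    grid=grid,
    repetitions=20,
)
print(f"MMSE ~ {mmse:.4f}, MISE ~ {mise:.4f}")
```

On an interval of length one the MISE cannot exceed the MMSE, which gives a quick sanity check when comparing values of the two measures.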

Univariate Test
As the first example, we test the prediction accuracy of the univariate model by using a test function defined as a mixture of normal distributions. Here, N(µ, σ) denotes a normal distribution with expectation µ and variance σ. The test function (called the asymmetric claw distribution) is a combination of five normal distributions with different parameters, and it has five peaks and troughs of different heights on the considered interval [0, 1]. Next, the comparison of the quasi-interpolation density estimator (QIDE), the univariate adaptive binned quasi-interpolation density estimator (AQIDE) based on the quadtree algorithm, the classical kernel density estimator (KDE) and the binned kernel density estimator (BKDE) is shown in Figure 3.

Figure 3 shows the sketches of the different density estimators when the sample number is $n = 12{,}400$ and the number of simulation experiments is 100. Furthermore, we provide a comparison sketch of KDE under the larger sample number $n = 12{,}400 \times 50 = 620{,}000$. In these simulation experiments, the bandwidth of the KDE model and the bin width of the BKDE model both adopt the rule of thumb of Equation (6). The bin number and bin width in the AQIDE model proposed in this paper are obtained adaptively by the univariate quadtree segmentation algorithm designed in Section 3. The shape parameter is selected as $c = \min(L_i)$. The bin-width threshold $L_{\max}$ is determined by the rule of thumb, $L_{\max} = 1.06\sigma n^{-0.2}$, from Equation (6). The sample number threshold $n_{\max}$ is determined by $n_{\max} = nL_{\max}$ based on the rule of thumb, and the kurtosis threshold is set to 3.
In addition, to compare the performance of the quasi-interpolation model after an adaptive processing proposed in this paper, the same bin number and shape parameters are selected for the QIDE model and AQIDE model.
In Figure 3, the blue dashed line denotes the true density function, while the turquoise, black, red and green lines represent the results by the KDE, BKDE, QIDE and AQIDE models, respectively. The black dashed line denotes the result of KDE for larger samples. We can note that the ability of classical KDE to catch the last two high peaks is poor. It performs nearly as well as our QIDE only when the sample number is increased to 620,000. The MMSE error and MISE error corresponding to each model are shown in Table 2. According to Figure 3 and Table 2, the binned technique does not affect the fitting accuracy. Moreover, the KDE and BKDE models both have a serious over-smoothing phenomenon, and the prediction effect of peaks and troughs is poor. The QIDE and AQIDE models in this paper can alleviate the problem. The fitting effect of peaks and troughs performs significantly better than the KDE and BKDE models. In addition, according to the adaptive algorithm proposed in this paper, we calculate the bin number, and then we provide the results of the equidistant QIDE and AQIDE model under the same bin number. These results show that the AQIDE model performs better than the QIDE model when the bin number is the same. It means that the proposed adaptive method based on the quadtree algorithm can better capture the distribution details than the case of equidistance bin width and improve the fitting accuracy of the model by increasing or reducing adaptive bins in the high-density or gentle area.

Bivariate Test
In order to further test the performance of the multivariate model proposed in this paper, we choose as the test function a modified bivariate density function whose terms include $e^{-((9x_1+1)^2/49-(9x_2+1)/10)}$. The function originates from the classic Franke function, which is difficult to approximate due to two Gaussian peaks of different heights and a small dip. Therefore, it is widely used as a test function in numerical analysis. In the test function, a constant $G$ is introduced to ensure that the final test function $f$ is a density function defined over the domain $[0, 1]^2$. A comparison of the adaptive multivariate binned quasi-interpolation density estimator (AMQIDE), the multivariate binned quasi-interpolation density estimator (MQIDE), the classical multivariate kernel density estimator (MKDE) and the multivariate binned kernel density estimator (MBKDE) is shown in Figure 4. Figure 4 shows the sketches of the different multivariate density estimators for $N = 300{,}000$ samples, with 50 simulation experiments. In these simulation experiments, the bandwidth of the MKDE model and the bin width of the MBKDE model both adopt the rule of thumb from Equation (6). The bin number and bin width in the AMQIDE model are calculated by the multivariate adaptive quadtree algorithm. The shape parameter is chosen as $c = h$, and the bin-width threshold $L_{\max}$ is given by the rule of thumb, $L_{\max} = \sigma n^{-1/6}$, from Equation (6). The sample number threshold $n_{\max}$ is determined by $n_{\max} = nL_{\max}$ based on the rule of thumb, and the kurtosis threshold is set to 3. In addition, the MQIDE model uses the same bin number and shape parameter as the AMQIDE model.
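The role of the normalizing constant $G$ can be illustrated with the classic Franke function. The exact modification used for the test function is not recoverable from this section, so the sketch below uses the standard Franke function as a stand-in, and the 400 by 400 midpoint grid is an arbitrary choice.

```python
import numpy as np

def franke(x1, x2):
    """Classic Franke function (a stand-in, not the paper's modified version)."""
    return (0.75 * np.exp(-((9 * x1 - 2) ** 2) / 4 - ((9 * x2 - 2) ** 2) / 4)
            + 0.75 * np.exp(-((9 * x1 + 1) ** 2) / 49 - (9 * x2 + 1) / 10)
            + 0.5 * np.exp(-((9 * x1 - 7) ** 2) / 4 - ((9 * x2 - 3) ** 2) / 4)
            - 0.2 * np.exp(-((9 * x1 - 4) ** 2) - (9 * x2 - 7) ** 2))

# Midpoint rule on [0, 1]^2: with equal cells, the mean of the values is the integral.
m = 400
mid = (np.arange(m) + 0.5) / m
g1, g2 = np.meshgrid(mid, mid, indexing="ij")
G = float(franke(g1, g2).mean())

def franke_density(x1, x2):
    """Franke function rescaled by G so that it integrates to one on [0, 1]^2."""
    return franke(x1, x2) / G

total_mass = float(franke_density(g1, g2).mean())
print(f"G ~ {G:.4f}, integral of the rescaled function ~ {total_mass:.6f}")
```

Dividing by $G$ only fixes the total mass. Non-negativity must hold separately for the result to be a density, which may be one reason the test function is a modified version of the Franke function.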
Figure 4a shows the true density function. Figure 4b,c show the estimated density functions obtained by the AMQIDE model and the MQIDE model, while Figure 4d,e show those obtained by the MKDE model and the MBKDE model. In addition, the corresponding MMSE and MISE errors of the four models are provided in Table 3. From Figure 4 and Table 3, it can be noted that the kurtosis of the Franke density function is small and that the estimated results of the MQIDE model and the AMQIDE model are consistent, meaning that our adaptive method can effectively identify high-density areas. The results of the MKDE model and the MBKDE model are similar to the univariate situation: both perform poorly, with a serious boundary problem. Their performance is much lower than that of the MQIDE and AMQIDE models proposed in this paper.

Conclusions
This paper proposes a multivariate adaptive quasi-interpolation density estimation model based on the quadtree algorithm. The key goal is to achieve an adaptive segmentation of the samples via the quadtree algorithm and to obtain a proper bin number and bin width. The method adjusts adaptively to the distribution of the samples. It not only identifies the details of the distribution in the high-density area but also avoids the inefficiency of large bins, which effectively avoids the over-smoothing phenomenon. Moreover, based on the good properties of quasi-interpolation, the theory of quasi-interpolation is introduced to construct the kernel functions of the density estimator, which avoids the boundary problem of the classical kernel density estimator. Finally, the idea of approximating probability by frequency is used to construct the coefficients of the binned density estimator, which allows the model to handle large samples and improves computational efficiency. Monte Carlo simulation shows that the proposed nonparametric model is robust and estimates the density function with high accuracy.