Abstract
Representative points (rep-points) are a set of points that are optimally chosen to represent a large original data set or a target distribution with respect to a statistical criterion, such as mean squared error or discrepancy. Most existing criteria can only guarantee representativeness in the full variable space. In this paper, a new kernel discrepancy, named power exponential kernel discrepancy (PEKD), is proposed to measure the representativeness of a point set with respect to a general multivariate distribution. Different from the commonly used criteria, PEKD can improve the projection properties of the point set, which is important in high-dimensional settings. Some theoretical results are presented for better understanding the new discrepancy and guiding the hyperparameter setting. An efficient algorithm for searching for rep-points under the PEKD criterion is presented, and its convergence is proven. Examples are given to illustrate its potential applications in numerical integration, uncertainty propagation, and the reduction of Markov chain Monte Carlo chains.
Keywords:
representative points; kernel discrepancy; parallel successive convex approximation; projection; uncertainty propagation
MSC:
62K99; 65D30; 68W10
1. Introduction
Rep-points, also called principal points [] or support points [], can be viewed as a data reduction method or a statistical simulation technique, and they have been widely applied in many areas. Early work studied how to find the optimal rep-points for representing univariate or bivariate normal distributions [,]. Refs. [,] then extended rep-points to elliptical distributions, and [,] used rep-points as a refined Monte Carlo technique for approximating integrals or expectations. More applications of rep-points can be found in uncertainty quantification [,,] and Bayesian analysis [,,].
Many statistical criteria, such as the mean squared error [,,,,], discrepancy [], divergence [], and statistical potential [,], have been proposed to measure the representativeness of a point set with respect to a target distribution. In this paper, we mainly discuss the kernel discrepancy, which is also known as the maximum mean discrepancy in deep learning [] and transfer learning []. The properties of a kernel discrepancy are determined by the corresponding kernel function. Analytic expressions of the kernel discrepancy are available for particular distributions and particular kernel functions; see [,,]. For obtaining rep-points from more general distributions, Ref. [] proposed the kernel herding method based on common kernel functions, such as the Gaussian and Laplacian kernels, and generated rep-points one by one with a greedy stochastic optimization algorithm. The support points (SP) method proposed by [] is another kind of rep-points, based on the negative Euclidean distance kernel discrepancy.
Note that the kernels in the kernel herding and support points methods are isotropic, which means all variables are considered active and effects of all orders are equally important. However, when the dimension of the problem is relatively high, the active variables are usually sparse in practice, and more attention should be paid to the representativeness of the projected distributions of the rep-points. Some generalized discrepancies proposed by [,] can assure low-dimensional space-filling properties by directly summing all local projection discrepancies. These discrepancies have concise expressions obtained by using separable kernels and the binomial theorem, but they are limited to the uniform distribution on the hypercube. Ref. [] presented the projected support points (PSP) method by constructing a sparsity-inducing kernel, which places a prior on the hyperparameters of the Gaussian kernel. However, compared with the SP method, the algorithm for generating PSP is computationally expensive, since it is based on the block Majorization-Minimization algorithm framework [] and includes sampling steps for the hyperparameters.
There is thus a need for an effective kernel discrepancy that encourages the preservation of low-dimensional representativeness and can be computed efficiently. In this paper, the new discrepancy is developed from the power exponential kernel function [,,], so we call it PEKD. Different from the averaged kernels in the generalized discrepancies and the PSP method, we make use of the norm in the power exponential kernel to regulate the representativeness of rep-points in subspaces of different projection dimensions. The contribution of this work is threefold. First, some theoretical analyses of the effect of the hyperparameters on the low-dimensional structure of rep-points are presented. In particular, we demonstrate that, for a suitable choice of hyperparameters, the rep-points under PEKD form a Latin hypercube design for the uniform distribution on the hypercube. Second, we introduce the successive convex approximation algorithm framework [] to construct an efficiently parallelized algorithm for generating rep-points under PEKD, and we prove its convergence. Third, we illustrate the effectiveness of the new method with simulation studies on numerical integration and uncertainty propagation problems and a real-world problem on MCMC reduction.
This paper is organized as follows. Section 2 recalls the kernel discrepancies in the existing literature related to rep-points and introduces the proposed PEKD. Section 3 constructs an algorithm for generating rep-points under PEKD. Section 4 demonstrates the effectiveness of the new method with several examples. Section 5 concludes with thoughts on further work. For brevity, all proofs are deferred to Appendix A.
2. Power Exponential Kernel Discrepancy
In this section, we first briefly introduce the kernel discrepancy [] and the existing kernel functions used to generate rep-points. Then, we propose PEKD and analyze its theoretical properties.
2.1. Kernel Discrepancy
A binary function defined on the product of the domain with itself is called a symmetric positive definite kernel [] if it satisfies two properties: (i) symmetry, i.e., its value is unchanged when its two arguments are exchanged, and (ii) nonnegative definiteness, i.e., every quadratic form built from its values on a finite set of points is nonnegative.
Definition 1.
Let F be a distribution function on the domain of interest, and consider the empirical distribution function of an n-point set. For a symmetric positive definite kernel γ, the kernel discrepancy between F and this empirical distribution is defined as:
Further, a point set is called a set of rep-points [] of the distribution F if it minimizes this kernel discrepancy among all point sets of the same size.
Lemma 1
(Koksma-Hlawka inequality; []). Let γ be a symmetric positive definite kernel, and consider the reproducing kernel Hilbert space associated with γ. Let F and the point set be as defined in Definition 1. For any function in this space, the integration error, i.e., the difference between the integral of the function under F and its average over the point set, is uniformly bounded by the product of the function's norm in the reproducing kernel Hilbert space and the kernel discrepancy between F and the empirical distribution of the point set.
2.2. Kernels in Existing Rep-Points Methods
2.2.1. Isotropic Kernel
Definition 2.
A kernel function γ is an isotropic kernel if it can be expressed as a function of the Euclidean distance between its two arguments, i.e., it depends on the two points only through the Euclidean norm of their difference.
The Gaussian kernel and the Laplacian kernel are two well-known isotropic kernels, widely used in nonlinear classification and regression problems. Based on these kernels, Ref. [] generates rep-points in a point-by-point greedy optimization fashion. Another popular class of kernels is the distance-induced kernel [,], which is conditionally strictly positive definite when its exponent lies in the admissible range. In particular, when the kernel reduces to the negative Euclidean distance kernel, the corresponding kernel discrepancy,
is called the energy distance. Ref. [] proposed the SP method by optimizing a Monte Carlo approximation of (5) with the difference-of-convex programming technique.
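Since the energy distance underlies the SP method, a small numeric sketch may be helpful. The sample-versus-point-set form below is an assumed Monte Carlo version of (5), with all variable names our own; it compares a candidate point set with a large reference sample through pairwise Euclidean distances.

```python
import numpy as np

def energy_distance(points, sample):
    """Assumed Monte Carlo energy distance between a candidate point set and a
    large reference sample: 2/(n*N) * sum ||x_i - y_m|| - 1/n^2 * sum ||x_i - x_j||."""
    cross = np.linalg.norm(points[:, None, :] - sample[None, :, :], axis=-1)
    within = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return 2.0 * cross.mean() - within.mean()

# toy usage: a 10-point set representing a large 2-D standard normal sample
rng = np.random.default_rng(0)
big_sample = rng.standard_normal((5000, 2))
candidate = rng.standard_normal((10, 2))
print(energy_distance(candidate, big_sample))
```

A smaller value indicates that the candidate set represents the reference sample better; the term involving only the reference sample is constant in the candidate points and is therefore omitted.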
Obviously, isotropic kernels are invariant to translations and rotations [], which means that the distribution characteristics in all directions are treated as equally important.
2.2.2. Separable Kernel
Definition 3.
A kernel function γ defined on a p-dimensional product domain is a separable kernel [] if it can be expressed as the following product form over the coordinates:
A separable kernel is sensitive to whether two points are close in an individual coordinate. This attractive property is useful for generating rep-points with good representativeness in the projection spaces [].
Two types of kernels incorporate projection metrics; both are averaged forms of separable kernels. The first type is the kernel of the generalized discrepancies in uniform design []. Closed forms of the integrals in (1) are available for these kernels when F is the uniform distribution on the hypercube, and the optimization method is then usually a discrete stochastic optimization algorithm based on Latin hypercube (or U-type) designs. The second type is the sparsity-inducing kernel defined in the PSP method []. The sparsity-inducing kernel gives a general form for constructing kernels containing sparse structures; for example, the kernel used in uniform design can be obtained by choosing a special prior distribution. Ref. [] chose a separable kernel, the so-called general Gaussian kernel, and then generated rep-points by sampling hyperparameters to approximate the kernel and optimizing the corresponding kernel discrepancy with the block Majorization-Minimization algorithm [,].
2.3. Power Exponential Kernel
2.3.1. Definition
Definition 4.
A univariate function is said to be a power exponential correlation function provided its scale parameter is positive and its power parameter lies in (0, 2]. Then, the p-dimensional separable power exponential (PE) kernel has the product form
It is obvious that when the power parameter equals 2 and all scale parameters are equal, the PE kernel in (6) reduces to the isotropic Gaussian kernel.
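To fix ideas, the following sketch evaluates a separable PE kernel of the assumed form exp(−Σ_k θ_k |x_k − y_k|^s) with θ_k > 0 and 0 < s ≤ 2; the symbols θ and s are our own labels for the scale and power hyperparameters and need not match the paper's notation.

```python
import numpy as np

def pe_kernel(x, y, theta, s):
    """Separable power exponential kernel (assumed form):
    gamma(x, y) = prod_k exp(-theta_k * |x_k - y_k|**s)
                = exp(-sum_k theta_k * |x_k - y_k|**s), with 0 < s <= 2."""
    x, y, theta = map(np.asarray, (x, y, theta))
    return np.exp(-np.sum(theta * np.abs(x - y) ** s))

# with s = 2 and equal theta_k, the kernel coincides with the isotropic Gaussian kernel
x, y = np.array([0.2, 0.5]), np.array([0.6, 0.1])
print(pe_kernel(x, y, theta=[1.0, 1.0], s=2.0))
print(np.exp(-np.sum((x - y) ** 2)))  # same value
```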
2.3.2. Visualization of Kernels
Following the analysis in [], the contours of six kernels are given in Figure 1. A kernel can be regarded as a measure of similarity between points: the larger its value, the more similar the two points are.
Figure 1.
Contours of different kernels. (a) Negative Euclidean distance kernel in the support points (SP) method; (b) Gaussian kernel; (c) sparsity-inducing kernel in the projected support points (PSP) method; (d–f) power exponential (PE) kernel with three different power parameters, respectively. Points A and B in all panels have the same coordinate in the second dimension.
Consider points A, B, and C in Figure 1. On the one hand, points B and C lie on a circle centered at point A, which means that the two pairs of points, (A, B) and (A, C), have the same similarity in the 2-dimensional space. On the other hand, the coordinates of (A, C) differ in every dimension, while (A, B) share the same coordinate in the second dimension. Hence, it is more reasonable for the kernel to assign a larger value to (A, B) if the similarity of point pairs in both the 1- and 2-dimensional spaces is considered. From the contour plots, we can see that the isotropic kernels in Figure 1a,b cannot distinguish the similarity of the two pairs of points, while the other kernels in Figure 1c–f can.
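The comparison of the pairs (A, B) and (A, C) can be checked numerically. In the sketch below, the coordinates of A, B, and C are illustrative choices of our own (not taken from the paper): B and C lie on the same circle around A, B shares its second coordinate with A, and the PE kernel with power below 2 indeed assigns a larger similarity to (A, B), while the Gaussian case (power 2) cannot distinguish the two pairs.

```python
import numpy as np

def pe_kernel(x, y, theta=1.0, s=1.5):
    # separable PE kernel, assumed form exp(-theta * sum_k |x_k - y_k|**s)
    return np.exp(-theta * np.sum(np.abs(np.asarray(x) - np.asarray(y)) ** s))

A = np.array([0.0, 0.0])               # illustrative coordinates
B = np.array([1.0, 0.0])               # same second coordinate as A
C = np.array([1.0, 1.0]) / np.sqrt(2)  # same Euclidean distance from A as B

for s in (1.0, 1.5, 2.0):
    print(s, pe_kernel(A, B, s=s), pe_kernel(A, C, s=s))
# for s < 2 the (A, B) value is larger; for s = 2 the two values coincide
```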
2.3.3. The Influence of Hyperparameters in PE Kernel on Rep-Points
The kernel determines what characteristics of the distribution F should be imitated by the point set . In order to capture the low-dimensional structure of the target distribution, a larger weight should be assigned to the low-dimensional similarity measure.
Proposition 1.
Let , and is defined in (6) with . Then, is the solution set of the following optimization problem
and the minimum value is .
The main idea of this paper is to use the norm in (6) to control the decay speed of the kernel value across different projection dimensions. Without loss of generality, let the reference point in Proposition 1 be the origin. Denote the k-dimensional coordinate hyperplanes as the subspaces spanned by the coordinates indexed by a subset S of size k. The solution set in Proposition 1 contains those points on the d-dimensional unit sphere that are farthest from the coordinate hyperplanes, and the PE kernel attains its minimum value at these points; point C in Figure 1 is one such point in the two-dimensional case. From the minimum value, it can be seen that both hyperparameters affect how the similarity between points varies, and the power parameter is directly related to the low-dimensional structure of the rep-points. When the power parameter is less than 2, the minimum value decreases as the projection dimension d increases from 1 to p. In addition, the smaller the power parameter, the more attention is paid to the low-dimensional distribution similarity measures.
2.3.4. PEKDs with and
According to (1) and (6), the expression of PEKD can be derived. Here, we consider PEKDs for two particular hyperparameter settings, and some interesting conclusions are as follows.
Theorem 1.
Let be the rep-points on the bounded region under . Let be the k-th dimension marginal distribution of F and . If , then is the rep-points of generated by minimizing (5).
Theorem 1 shows that when the scale parameter in this kernel is sufficiently small, PEKD focuses on the one-dimensional structure of the rep-points. Restricting F in Theorem 1 to the uniform distribution on the hypercube yields a more intuitive conclusion, which is related to the Latin hypercube design.
Corollary 1.
If the target distribution F in Theorem 1 is the uniform distribution on the hypercube and the stated condition on the hyperparameters holds, then the rep-points form a central Latin hypercube design.
A toy example for Corollary 1 is given below.
Example 1.
Let F be the uniform distribution on the unit square and fix the number of points. We first generate rep-points using the SP method. Under the assumptions of Corollary 1, we then take these points as the initial point set, use the corresponding PE kernel, and generate new rep-points with the algorithm proposed in Section 3. Figure 2 shows the scatter plots of these two rep-point sets.
Figure 2.
Scatterplots of rep-points for the uniform distribution generated by the SP method (▲) and the PEKD method (●).
From Figure 2, the rep-points (●) based on the PE kernel indeed form a central Latin hypercube design, which has excellent one-dimensional projections. On closer inspection, the circular points can be viewed as the result of moving the triangular points to the centers of the grid cells while keeping the ranks of the triangular points in each dimension unchanged. This rank-preserving sampling technique is known as Latin hypercube sampling with dependence [,,].
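The rank-preserving move described above can be sketched in a few lines: each coordinate is replaced by the center of the cell determined by its rank, so every dimension becomes a permutation of the n equally spaced levels. This is a minimal illustration under the assumption that the points lie in the unit hypercube; the function name is ours.

```python
import numpy as np
from scipy.stats import rankdata

def to_central_lhd(points):
    """Move each coordinate to the center of its rank-determined cell:
    a coordinate with rank r (1..n) in a dimension becomes (r - 0.5) / n."""
    n = len(points)
    ranks = np.apply_along_axis(rankdata, 0, points)  # ranks within each dimension
    return (ranks - 0.5) / n

rng = np.random.default_rng(1)
sp_like = rng.uniform(size=(11, 2))   # stand-in for SP rep-points on the unit square
lhd = to_central_lhd(sp_like)
print(sorted(lhd[:, 0]))              # the n equally spaced cell centers
```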
Theorem 2.
Let F be a distribution function on the bounded region with finite means, and . If and , then can be minimized by point set whenever .
Theorem 2 means that the scale parameter should not be too small in this kernel; otherwise, the resulting point set would match the target distribution only in its first moment. We found that the default hyperparameter setting works well for the numerical examples, provided the large training data set is scaled to zero mean and unit variance in each variable. A small power parameter is suitable for cases where the important variables are sparse. A more precise selection of the hyperparameters would require incorporating prior information in a Bayesian form or sequentially identifying important variables; we defer this to future work.
3. Optimization Algorithm
In this section, we introduce the successive convex approximation [,] framework to construct a parallel optimization algorithm to generate rep-points based on PEKD.
3.1. Successive Convex Approximation
Consider a presumably difficult optimization problem in which the feasible set is convex and the objective function is continuous. The basic idea of successive convex approximation (SCA) is to solve the difficult problem via a sequence of simpler problems,
where a surrogate of the original objective is minimized at each step and the step sizes form a prescribed sequence.
Definition 5
(SCA surrogate function; []). A function is an SCA surrogate function of the objective at a given reference point if it satisfies the following two conditions:
- 1. it is continuous and strongly convex in its first argument for every feasible reference point;
- 2. it is differentiable in its first argument, and its gradient at the reference point coincides with the gradient of the original objective there.
Similar to gradient methods, there are three possible choices for the step size: a bounded step size, backtracking line search, and a diminishing step size. Compared with the other two, the diminishing step size is the most convenient in practice, so it is used in this paper. Two examples of diminishing step size rules suggested in [] are:
- , where and ;
- where .
3.2. Algorithm for Generating Rep-Points under PEKD
3.2.1. Algorithm Statement
Our optimization problem is to minimize the discrepancy. Since a closed form of the objective function is usually not available for a general distribution F, we optimize a Monte Carlo approximation of it. Specifically, we ignore the first term in the second equation of (1) and approximate the second integral with a large sample from the distribution F; the optimization problem then becomes
The objective function in (8) is denoted by G. We construct an appropriate surrogate function for G in Theorem 3 below.
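To make (8) concrete, the sketch below computes a Monte Carlo version of the squared kernel discrepancy with the constant term dropped, assuming the form G(D) = (1/n²) Σ_{i,j} γ(x_i, x_j) − (2/(nN)) Σ_{i,m} γ(x_i, y_m) for a point set D = {x_i} and a large training sample {y_m} from F; the notation and the PE kernel parameterization here are our own assumptions.

```python
import numpy as np

def pe_gram(X, Y, theta, s):
    """Pairwise PE kernel matrix, assumed form exp(-sum_k theta_k * |x_k - y_k|**s)."""
    diffs = np.abs(X[:, None, :] - Y[None, :, :]) ** s           # shape (|X|, |Y|, p)
    return np.exp(-np.tensordot(diffs, theta, axes=([2], [0])))  # weighted sum over k

def pekd_objective(D, sample, theta, s):
    """Monte Carlo objective: within-set similarity minus twice the cross
    similarity with the large training sample (constant term dropped)."""
    n, N = len(D), len(sample)
    within = pe_gram(D, D, theta, s).sum() / n**2
    cross = pe_gram(D, sample, theta, s).sum() / (n * N)
    return within - 2.0 * cross

rng = np.random.default_rng(2)
sample = rng.standard_normal((2000, 3))                # large sample from F (illustrative)
D = sample[rng.choice(2000, size=20, replace=False)]   # initial candidate point set
print(pekd_objective(D, sample, theta=np.ones(3), s=1.5))
```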
Theorem 3
(Closed-form iterations). Under the stated assumptions on the current point set and on the kernel hyperparameters, define the surrogate function h as:
Then, h is an SCA surrogate function of the objective in (8) at the current point set. Moreover, the global minimizer of h is available in closed form.
In order to ensure that the assumptions in Theorem 3 are satisfied and that the calculations remain numerically stable, in practice we add a small perturbation to the absolute value terms.
On the basis of the SCA algorithm framework, the construction of rep-points under PEKD is described in Algorithm 1. If the training sample size N is very large, we can resample a mini-batch of it at each iteration, as in mini-batch stochastic gradient descent in machine learning.
| Algorithm 1: Rep-points construction algorithm under PEKD |
| 1 Set step size ; |
| 2 Initialize the step size and the point set with the SP method; |
| 3 repeat |
| 4 for each point in the set, in parallel, do |
| 5 with defined in (9); |
| 6 . |
| 7 end |
| 8 Update ; |
| 9 until convergence; |
| 10 return the convergent point set . |
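A rough sketch of the loop structure of Algorithm 1 is given below. The candidate points are obtained here by a simple numerical gradient step on the Monte Carlo objective, which is only a stand-in for the closed-form surrogate minimizer of Theorem 3; the parallel per-point update, the convex-combination step with a diminishing step size, and the SP-style warm start follow the algorithm as stated, while all names and constants are illustrative.

```python
import numpy as np

def pe_gram(X, Y, theta, s):
    # separable PE kernel matrix, assumed form exp(-sum_k theta_k * |x_k - y_k|**s)
    d = np.abs(X[:, None, :] - Y[None, :, :]) ** s
    return np.exp(-np.tensordot(d, theta, axes=([2], [0])))

def objective(D, sample, theta, s):
    # Monte Carlo PEKD objective with the constant term dropped (see the earlier sketch)
    n, N = len(D), len(sample)
    return (pe_gram(D, D, theta, s).sum() / n**2
            - 2.0 * pe_gram(D, sample, theta, s).sum() / (n * N))

def sca_like_update(D, sample, theta, s, iters=50, lam=0.9, eps=1e-3, step=0.05, h=1e-5):
    """Loop structure mimicking Algorithm 1; the numerical gradient step is only
    a stand-in for the closed-form surrogate minimizer of Theorem 3."""
    D = D.copy()
    for _ in range(iters):
        base = objective(D, sample, theta, s)
        grad = np.zeros_like(D)
        for i in range(len(D)):             # in Algorithm 1 this loop runs in parallel
            for k in range(D.shape[1]):     # finite-difference gradient per coordinate
                Dp = D.copy()
                Dp[i, k] += h
                grad[i, k] = (objective(Dp, sample, theta, s) - base) / h
        D_hat = D - step * grad             # candidate points (surrogate-minimizer stand-in)
        D = D + lam * (D_hat - D)           # convex-combination SCA update
        lam = lam * (1.0 - eps * lam)       # one possible diminishing step size rule
    return D

rng = np.random.default_rng(3)
sample = rng.standard_normal((1000, 2))                 # large sample from F
D0 = sample[rng.choice(1000, size=15, replace=False)]   # warm start (SP-like subset)
D1 = sca_like_update(D0, sample, theta=np.ones(2), s=1.5)
print(objective(D0, sample, np.ones(2), 1.5), objective(D1, sample, np.ones(2), 1.5))
```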
3.2.2. Complexity and Convergence of the Algorithm
As we can observe in Theorem 3, the surrogate function in (9) has a closed-form minimizer, and the optimization variables can be updated in parallel. The running time of Algorithm 1 for one loop iteration is therefore of the same order as that of the SP method, where P is the total number of computation cores available. As for the PSP method, assuming that a sample of size R is drawn to approximate the sparsity-inducing kernel, the one-shot algorithm in [] requires correspondingly more computation. When the dimension p grows, R should be relatively large so that the sparsity-inducing kernel can be approximated well.
The following theorem gives a convergence guarantee for Algorithm 1.
Theorem 4
(Convergence of Algorithm 1). Suppose the feasible set is convex and compact and the assumptions in Theorem 3 hold. Then every limit point set of the sequence generated by Algorithm 1 (at least one such point set exists) is a stationary solution of (8).
4. Applications
4.1. Numerical Simulations
In this section, we compare the performance of the PEKD method (with several choices of the power parameter) with the Monte Carlo (MC), inverse-transform randomized quasi Monte Carlo (RQMC), SP, and PSP methods. Following the hyperparameter settings of the PSP method in [], we generate projected support points with both a small and a large approximation sample size (PSPs and PSPl). The PEKD and PSP methods take the point set generated by the SP method as a warm start.
4.1.1. Visualization
Example 2.
Let F be the 5-dimensional i.i.d. Beta(2,4) distribution; we generate point sets with several sampling methods.
Figure 3 shows scatterplots and marginal histograms of the projections of the point sets onto the first two dimensions. It is obvious that the sample generated by PEKD represents all 1-dimensional marginal distributions better and is not clustered in the 2-dimensional projection, unlike the samples obtained by the SP and PSP methods.
Figure 3.
Scatterplots and marginal histograms of points for the 5-dimensional i.i.d. Beta(2,4) distribution with six samplers. Red lines represent the true marginal densities. (a) MC; (b) RQMC; (c) SP; (d) PSPs; (e) PSPl; (f) PEKD1.5.
We calculate the Kolmogorov-Smirnov (K-S) test statistic between the sample and the target distribution for each dimension, as well as the average k-dimensional projected energy distance, where k denotes the projection dimension and the energy distance is computed between the corresponding k-dimensional projections of the sample and the target. Figure 4 shows the results of the two measures (the smaller, the better). The PEKD method performs better than the other methods on one- and two-dimensional projections, while the PSP method is not stable. In addition, the PSP and PEKD methods sometimes perform better than the SP method even in the full-dimensional space; one possible reason is that they start from the rep-points generated by the SP method, which helps the SP solution escape a local minimum.
Figure 4.
Box plots of the K-S test statistic and the relative average projected energy distance (with the SP method as the benchmark).
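The two evaluation measures used in this example can be computed as follows. In the sketch below, the per-dimension K-S statistic is taken against the Beta(2, 4) marginal of Example 2, and the projected energy distance is averaged over all two-dimensional coordinate projections with an unweighted mean; the exact weighting used in the paper's formula is not reproduced here, so this is an assumed form.

```python
import numpy as np
from itertools import combinations
from scipy.stats import kstest, beta

def ks_per_dimension(points):
    # K-S statistic between each 1-D marginal of the point set and the Beta(2, 4) cdf
    return [kstest(points[:, k], beta(2, 4).cdf).statistic
            for k in range(points.shape[1])]

def avg_projected_energy_distance(points, sample, k=2):
    # unweighted average energy distance over all k-dimensional coordinate projections
    def ed(X, Y):
        cross = np.linalg.norm(X[:, None] - Y[None, :], axis=-1).mean()
        xx = np.linalg.norm(X[:, None] - X[None, :], axis=-1).mean()
        yy = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1).mean()
        return 2.0 * cross - xx - yy
    subsets = combinations(range(points.shape[1]), k)
    return np.mean([ed(points[:, list(S)], sample[:, list(S)]) for S in subsets])

rng = np.random.default_rng(4)
target_sample = rng.beta(2, 4, size=(1000, 5))   # large reference sample from F
points = rng.beta(2, 4, size=(50, 5))            # stand-in for a generated point set
print(ks_per_dimension(points))
print(avg_projected_energy_distance(points, target_sample, k=2))
```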
4.1.2. Numerical Integration
Example 3.
Consider the approximation of integrals as in []. We test three choices of the distribution F, each with i.i.d. coordinates, with dimension p ranging from 5 to 20, and two well-known integrand functions:
- (1) Gaussian peak function,
- (2) additive Gaussian function,
where the center parameter in each dimension is the marginal mean of the corresponding variable. To incorporate a low-dimensional structure, a fraction q of the p variables is set as active, with nonzero coefficients for the active variables and 0 otherwise. These two functions serve as the test integrands; a sketch of the resulting estimator is given below.
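The integral estimates compared below are simple averages of the integrand over each point set. The function gp in this sketch is only a hypothetical Gaussian-peak-style stand-in, since the exact forms and constants of the paper's test functions are not reproduced here; the active-variable structure follows the description above.

```python
import numpy as np

def gp(x, a, u):
    # hypothetical Gaussian-peak-style integrand with coefficients a and centers u;
    # inactive variables have a_k = 0 and therefore do not affect the value
    return np.exp(-np.sum(a**2 * (x - u) ** 2, axis=-1))

rng = np.random.default_rng(5)
p, q = 10, 0.2
a = np.zeros(p); a[: int(q * p)] = 1.0   # a fraction q of the variables is active
u = np.full(p, 0.5)                       # centered at the marginal means of U(0, 1)

def estimate(points):
    # rep-points (or MC) estimate of E_F[gp(X)] is simply the sample mean
    return gp(points, a, u).mean()

mc_points = rng.uniform(size=(100, p))    # plain Monte Carlo point set for comparison
print(estimate(mc_points))
```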
Some results of the integral estimation error are shown in Figure 5. In Figure 5a, there is only one important variable. The PSP method with large R obtains a lower average error than the RQMC method at the expense of more complicated calculations. Interestingly, the PEKD method achieves better performance with almost the same running time as the SP method. In Figure 5b,c, the number of important variables is small, and PEKD with a small power parameter performs best. In Figure 5d, the ratio q is large, and the PEKD method with a large power parameter is better. In addition, the errors of the RQMC method are not always lower than those of the SP method; one possible reason is that the inverse transformation may reduce the representativeness of the low-discrepancy sequence.
Figure 5.
Box plots of the log-error for the two test functions under the p-dimensional i.i.d. distributions; panels (a–d) correspond to different settings of the dimension p, the active fraction q, and the distribution.
4.1.3. Uncertainty Propagation
A computer model is treated as a mathematical function g that takes varying values of input parameters, denoted by a vector, and returns an output. Uncertainty propagation methods are used to estimate the distribution of the model output resulting from a set of uncertain model inputs. In other words, if the input uncertainty is described by a distribution, then the distribution of the corresponding outputs can be viewed as the resulting uncertainty in the system output.
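Operationally, uncertainty propagation with rep-points simply pushes the point set through the computer model and treats the resulting outputs as a sample from the output distribution. The short sketch below illustrates this with a hypothetical model g of our own; any density or quantile estimate can then be formed from the propagated outputs.

```python
import numpy as np

def g(x):
    # hypothetical computer model: a cheap nonlinear function of two inputs
    return np.sin(x[:, 0]) + 0.5 * x[:, 1] ** 2

rng = np.random.default_rng(6)
rep_points = rng.standard_normal((50, 2))   # stand-in for rep-points of the input law
outputs = g(rep_points)                      # propagate the input uncertainty

# summarize the induced uncertainty in the system output
print(outputs.mean(), outputs.std(), np.quantile(outputs, [0.05, 0.95]))
```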
Example 4.
Three test (modified) functions are taken from [,]:
- (1) , where ;
- (2) , where ;
- (3) , where , .
We generate point sets with the different methods to estimate the output distributions of the three test functions. Figure 6 shows the K-S test statistic values (repeated 100 times) between the estimated density and the true density for each test function. SP and PEKD2 perform better than the other methods on the first test function, while PSP, PEKD1, and PEKD1.5 are more suitable for the latter two. Two possible reasons are: (1) the latter two test functions are more wiggly in each dimension, which means the low-dimensional structure should be given more attention; (2) one of them has many inactive variables, which makes SP and PEKD2 even worse than MC. In addition, the PSP method has a large variance on that function, since its approximate sparsity-inducing kernel becomes unstable as the dimension increases.
Figure 6.
Box plots of the K-S test statistic for the three test functions in the uncertainty propagation examples (the smaller, the better); panels (a–c) correspond to the three test functions.
4.2. Reduction of MCMC Chain
Markov chain Monte Carlo (MCMC) is a family of techniques for sampling from probability distributions, which allows us to make statistical inferences about complex Bayesian models. Many practitioners use thinning (discarding all but every k-th sample point) to reduce the high autocorrelation in an MCMC chain, save storage space, and reduce the processing time for computing derived posterior quantities. However, thinning is inefficient in most cases, since valuable posterior samples are carelessly thrown away, and greater precision is available by working with the unthinned chain. In practice, the models of interest are often high-dimensional, but the desired posterior quantities involve only a handful of parameters.
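For contrast with rep-points compression, plain thinning of a chain is a one-line slice. The sketch below shows thinning and the corresponding posterior-mean estimate; the chain itself is only a simulated autocorrelated series used for illustration, not real MCMC output.

```python
import numpy as np

rng = np.random.default_rng(7)
# illustrative autocorrelated "chain" (AR(1) process), standing in for MCMC output
chain = np.empty(75_000)
chain[0] = 0.0
for t in range(1, len(chain)):
    chain[t] = 0.95 * chain[t - 1] + rng.standard_normal()

k = 250
thinned = chain[::k]                  # thinning: keep only every k-th draw
print(len(thinned), thinned.mean(), chain.mean())
# a rep-points compression of the same size would instead choose the retained points
# by minimizing a kernel discrepancy to the full (unthinned) chain
```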
Consider the orange tree growth model in []. The Orange data set records the trunk circumference measurements of five trees at seven different times and can be found in the R datasets package. The following hierarchical (multilevel) model is assumed in their paper:
The parameter set of interest is given below. As suggested in [], we generate the chain with 150,000 iterations using the R package rstan, and the first half of the sample is discarded as burn-in. Then, the full chain (N = 75,000) is compressed to a small sample with the thinning, SP, PSPs, and PEKD methods. We compare these methods on how well they estimate (a) the marginal posterior means and variances of each parameter and (b) the averaged instantaneous growth rate at three future times (t = 1600, 1625, 1650). The true posterior quantities are estimated by running a longer MCMC chain with 600,000 iterations. Table 1 reports the error ratios of the thinning method over the SP, PSPs, and PEKD methods in estimating the posterior quantities of interest; the larger the ratio, the more accurate the estimate. From Table 1, all of these methods estimate the parameters more accurately than thinning. The PEKD1.5 method is stable and performs best for most parameter estimates, while the PEKD2 method performs well only in estimating the means.
Table 1.
The error ratios of the thinning method over the SP, PSP, and PEKD methods in estimating the posterior quantities. The larger, the better.
5. Conclusions and Discussion
In this work, a new rep-points criterion named PEKD is introduced. Its most attractive property is that the low-dimensional representativeness of the rep-points can be regulated by tuning the power hyperparameter: the smaller the power, the more strongly the low-dimensional representativeness is assured. In fact, in the setting of Corollary 1, the rep-points under PEKD form a Latin hypercube design for the uniform distribution on the hypercube, which means the 1-dimensional representativeness achieves the best possible performance. Moreover, a parallelized optimization algorithm is constructed for generating rep-points under the PEKD criterion. Simulation studies and a real-data example demonstrate that PEKD with a small power parameter is suitable for situations where the important variables are sparse and the function fluctuates greatly, and that PEKD with an intermediate power parameter is a robust choice in most cases.
As a general distribution similarity measure, PEKD can be used to test independence and goodness-of-fit [,,,]. For the experimental design community, PEKD can serve as a criterion for constructing complex designs, such as space-filling designs and sliced designs [,,,] on irregular regions. In addition, the algorithm proposed in this paper may be helpful for data splitting [] and model-free subsampling [] problems in the machine learning community.
Author Contributions
Conceptualization, J.N. and H.Q.; methodology, Z.X.; software, Y.X.; validation, Z.X., J.N. and H.Q.; investigation, Z.X.; data curation, Y.X.; writing—original draft preparation, Z.X.; writing—review and editing, Y.X. and J.N.; supervision, J.N.; project administration, H.Q.; funding acquisition, J.N. and H.Q. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Natural Science Foundation of China, grant number 11871237; by the Fundamental Research Funds for the Central Universities, grant number CCNU22JC023, 2022YBZZ036; and by the Discipline Coordination Construction Project of Zhongnan University of Economics and Law, grant number XKHJ202125.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
Abbreviations
| PEKD | power exponential kernel discrepancy |
| rep-points | representative points |
| MCMC | Markov chain Monte Carlo |
| SP | support points |
| PSP | projected support points (PSPs, PSPl: with small and large approximation sample sizes) |
| PE | power exponential |
| SCA | successive convex approximation |
| MC | Monte Carlo |
| RQMC | randomized quasi Monte Carlo |
| x | scalar variable x |
| vector variable x | |
| point set | |
| expectation of the random variable from the distribution F | |
| kernel of energy distance | |
| power exponential kernel with hyperparameters and |
Appendix A
Proof of Proposition 1.
After the stated change of variables, the optimization problem can be described as the following optimization problem:
The standard Lagrange multiplier method can be used to solve this problem, and the minimum is attained at the stated points, from which all conclusions follow directly. □
Proof of Theorem 1.
According to Definition 4,
Using the Taylor formula
then,
Since , (A1) can be written as
Because the first term in (A2) is constant, then
To prove Corollary 1, we require a lemma:
Lemma A1.
Let F be the uniform distribution on the unit interval and consider the e.d.f. of the point set; then, the energy distance in (5) can be expressed as
Proof of the Lemma A1.
For random variables distributed according to F, the one-dimensional energy distance kernel is the negative absolute difference. Then,
Proof of Corollary 1.
In light of Theorem 1, each one-dimensional projection of the point set consists of the energy distance rep-points of the corresponding marginal distribution. Without loss of generality, the subscript k is omitted below. By Lemma A1, the point set minimizes the kernel discrepancy in (A4). Next, consider the t-th order statistic of the sample; then
Therefore, each dimension of the rep-points is a permutation of the n equally spaced levels. In other words, the rep-points form a central Latin hypercube design. □
To prove Theorem 2, we require the following lemma:
Lemma A2.
Proof of Lemma A2.
Similar to (5), when kernel , we can obtain
The proof of this lemma is finished. □
Proof of Theorem 2.
Since ,
Then, according to Lemma A2, we can obtain
The last problem achieves the optimal value when . □
Proof of Theorem 3.
Obviously, h is a quadratic function of the optimization variables and the coefficients of the quadratic terms are all greater than 0; therefore, h is continuous and strongly convex. Under the stated assumptions, both h and the objective are differentiable. Through tedious algebraic calculations, the gradient consistency condition
can be verified. Then, h is an SCA surrogate function of the objective in (8) according to Definition 5. Moreover, the closed-form minimizer can be obtained by setting the gradient of h to zero and solving. □
Proof of Theorem 4.
Based on Theorem 3 and the diminishing step size rule, this theorem can be proven by Theorem 3 in [] under some regularity conditions. □
References
- Flury, B.A. Principal Points. Biometrika 1990, 77, 33–41. [Google Scholar] [CrossRef]
- Mak, S.; Joseph, V.R. Support points. Ann. Stat. 2018, 46, 2562–2592. [Google Scholar] [CrossRef]
- Anderberg, M.R. Cluster Analysis for Applications; Academic Press: San Diego, CA, USA, 1973. [Google Scholar] [CrossRef]
- Fang, K.T.; He, S.D. The Problem of Selecting a Given Number of Representative Points in a Normal Population and a Generalized Mills’ Ratio; Technical Report; Stanford University, Department of Statistics: Stanford, CA, USA, 1982. [Google Scholar] [CrossRef]
- Flury, B.D. Estimation of principal points. J. R. Stat. Soc. Ser. C Appl. Stat. 1993, 42, 139–151. [Google Scholar] [CrossRef]
- Fang, K.; Zhou, M.; Wang, W. Applications of the representative points in statistical simulations. Sci. China Math. 2014, 57, 2609–2620. [Google Scholar] [CrossRef]
- Lemaire, V.; Montes, T.; Pagès, G. New weak error bounds and expansions for optimal quantization. J. Comput. Appl. Math. 2020, 371, 112670. [Google Scholar] [CrossRef]
- Mezic, I.; Runolfsson, T. Uncertainty propagation in dynamical systems. Automatica 2008, 44, 3003–3013. [Google Scholar] [CrossRef]
- Mohammadi, S.; Cremaschi, S. Efficiency of Uncertainty Propagation Methods for Estimating Output Moments. In Proceedings of the 9th International Conference on Foundations of Computer-Aided Process Design, 14–18 July 2019, Copper Mountain, CO, USA; Muñoz, S.G., Laird, C.D., Realff, M.J., Eds.; Elsevier: Amsterdam, The Netherlands, 2019; Volume 47, pp. 487–492. [Google Scholar] [CrossRef]
- Owen, A.B. Statistically Efficient Thinning of a Markov Chain Sampler. J. Comput. Graph. Stat. 2017, 26, 738–744. [Google Scholar] [CrossRef]
- Riabiz, M.; Chen, W.Y.; Cockayne, J.; Swietach, P.; Niederer, S.A.; Mackey, L.; Oates, C.J. Optimal thinning of MCMC output. J. R. Stat. Soc. Ser. B 2022, 84, 1059–1081. [Google Scholar] [CrossRef]
- South, L.F.; Riabiz, M.; Teymur, O.; Oates, C.J. Postprocessing of MCMC. Annu. Rev. Stat. Its Appl. 2022, 9, 529–555. [Google Scholar] [CrossRef]
- Xu, L.H.; Fang, K.T.; Pan, J. Limiting behavior of the gap between the largest two representative points of statistical distributions. Commun. Stat.-Theory Methods 2021, 1–24. [Google Scholar] [CrossRef]
- Li, Y.; Fang, K.T.; He, P.; Peng, H. Representative Points from a Mixture of Two Normal Distributions. Mathematics 2022, 10, 3952. [Google Scholar] [CrossRef]
- Xu, L.H.; Fang, K.T.; He, P. Properties and generation of representative points of the exponential distribution. Stat. Pap. 2022, 63, 197–223. [Google Scholar] [CrossRef]
- Fang, K.T.; Liu, M.Q.; Qin, H.; Zhou, Y.D. Theory and Application of Uniform Experimental Designs; Springer: Singapore, 2018. [Google Scholar] [CrossRef]
- Pronzato, L.; Zhigljavsky, A. Bayesian quadrature, energy minimization and space-filling design. SIAM/ASA J. Uncertain. Quantif. 2020, 8, 959–1011. [Google Scholar] [CrossRef]
- Borodachov, S.; Hardin, D.; Saff, E. Low Complexity Methods for Discretizing Manifolds via Riesz Energy Minimization. Found. Comput. Math. 2014, 14, 1173–1208. [Google Scholar] [CrossRef]
- Joseph, V.R.; Dasgupta, T.; Tuo, R.; Wu, C.F.J. Sequential Exploration of Complex Surfaces Using Minimum Energy Designs. Technometrics 2015, 57, 64–74. [Google Scholar] [CrossRef]
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
- Yang, Q.; Zhang, Y.; Dai, W.; Pan, S.J. Transfer Learning; Cambridge University Press: Cambridge, UK, 2020. [Google Scholar] [CrossRef]
- Fang, K.T.; Wang, Y. Number-Theoretic Methods in Statistics; Chapman and Hall: London, UK, 1994. [Google Scholar]
- Briol, F.X.; Oates, C.J.; Girolami, M.; Osborne, M.A.; Sejdinovic, D. Probabilistic Integration: A Role in Statistical Computation? Stat. Sci. 2019, 34, 1–22. [Google Scholar] [CrossRef]
- Chen, Y.; Welling, M.; Smola, A.J. Super-Samples from Kernel Herding. arXiv 2012, arXiv:1203.3472. [Google Scholar] [CrossRef]
- Hickernell, F.J. A generalized discrepancy and quadrature error bound. Math. Comput. 1998, 67, 299–322. [Google Scholar] [CrossRef]
- Zhou, Y.D.; Fang, K.T.; Ning, J.H. Mixture discrepancy for quasi-random point sets. J. Complex. 2013, 29, 283–301. [Google Scholar] [CrossRef]
- Mak, S.; Joseph, V.R. Projected support points: A new method for high-dimensional data reduction. arXiv 2018, arXiv:1708.06897. [Google Scholar] [CrossRef]
- Scutari, G.; Sun, Y. Parallel and Distributed Successive Convex Approximation Methods for Big-Data Optimization. In Multi-Agent Optimization: Cetraro, Italy 2014; Facchinei, F., Pang, J.S., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 141–308. [Google Scholar] [CrossRef]
- Santner, T.J.; Williams, B.J.; Notz, W.I. The Design and Analysis of Computer Experiments; Springer: New York, NY, USA, 2018. [Google Scholar] [CrossRef]
- N’Gbo, N.; Tang, J. On the Bounds of Lyapunov Exponents for Fractional Differential Systems with an Exponential Kernel. Int. J. Bifurc. Chaos 2022, 32, 2250188. [Google Scholar] [CrossRef]
- Székely, G.J.; Rizzo, M.L. Energy statistics: A class of statistics based on distances. J. Stat. Plan. Inference 2013, 143, 1249–1272. [Google Scholar] [CrossRef]
- Fang, K.T.; Hickernell, F.J. Uniform Experimental Designs; Springer: New York, NY, USA, 2007. [Google Scholar] [CrossRef]
- Lange, K. MM Optimization Algorithms; SIAM: Philadelphia, PA, USA, 2016. [Google Scholar] [CrossRef]
- Stein, M.L. Large sample properties of simulations using latin hypercube sampling. Technometrics 1987, 29, 143–151. [Google Scholar] [CrossRef]
- Packham, N.; Schmidt, W.M. Latin hypercube sampling with dependence and applications in finance. J. Comput. Financ. 2010, 13, 81–111. [Google Scholar] [CrossRef]
- Aistleitner, C.; Hofer, M.; Tichy, R.F. A central limit theorem for Latin hypercube sampling with dependence and application to exotic basket option pricing. Int. J. Theor. Appl. Financ. 2012, 15, 1–20. [Google Scholar] [CrossRef]
- Scutari, G.; Facchinei, F.; Song, P.; Palomar, D.P.; Pang, J.S. Decomposition by Partial Linearization: Parallel Optimization of Multi-Agent Systems. IEEE Trans. Signal Process. 2014, 62, 641–656. [Google Scholar] [CrossRef]
- Oakley, J.E.; O’Hagan, A. Bayesian inference for the uncertainty distribution of computer model outputs. Biometrika 2002, 89, 769–784. [Google Scholar] [CrossRef]
- Marrel, A.; Iooss, B.; Laurent, B.; Roustant, O. Calculations of sobol indices for the gaussian process metamodel. Reliab. Eng. Syst. Saf. 2009, 94, 742–751. [Google Scholar] [CrossRef]
- Székely, G.J.; Rizzo, M.L.; Bakirov, N.K. Measuring and testing dependence by correlation of distances. Ann. Stat. 2007, 35, 2769–2794. [Google Scholar] [CrossRef]
- Wang, S.; Liang, J.; Zhou, M.; Ye, H. Testing Multivariate Normality Based on F-Representative Points. Mathematics 2022, 10, 4300. [Google Scholar] [CrossRef]
- Liang, J.; He, P.; Yang, J. Testing Multivariate Normality Based on t-Representative Points. Axioms 2022, 11, 587. [Google Scholar] [CrossRef]
- Xiong, Z.K.; Liu, W.J.; Ning, J.H.; Qin, H. Sequential support points. Stat. Pap. 2022, 63, 1757–1775. [Google Scholar] [CrossRef]
- Xiao, Y.; Ning, J.H.; Xiong, Z.K.; Qin, H. Batch sequential adaptive designs for global optimization. J. Korean Stat. Soc. 2022, 51, 780–802. [Google Scholar] [CrossRef]
- Kong, X.; Zheng, W.; Ai, M. Representative points for distribution recovering. J. Stat. Plan. Inference 2023, 224, 69–83. [Google Scholar] [CrossRef]
- Joseph, V.R.; Vakayil, A. Split: An optimal method for data splitting. Technometrics 2022, 64, 166–176. [Google Scholar] [CrossRef]
- Zhang, M.; Zhou, Y.; Zhou, Z.; Zhang, A. Model-free Subsampling Method Based on Uniform Designs. arXiv 2022, arXiv:2209.03617. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).