
Representative Points Based on Power Exponential Kernel Discrepancy

Zikang Xiong, Yao Xiao, Jianhui Ning and Hong Qin
1 School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430073, China
2 School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China
* Author to whom correspondence should be addressed.
Axioms 2022, 11(12), 711; https://doi.org/10.3390/axioms11120711
Submission received: 4 November 2022 / Revised: 3 December 2022 / Accepted: 5 December 2022 / Published: 9 December 2022
(This article belongs to the Special Issue Computational Statistics & Data Analysis)

Abstract: Representative points (rep-points) are a set of points that are optimally chosen to represent a large original data set or a target distribution in terms of a statistical criterion, such as mean square error and discrepancy. Most of the existing criteria can only assure the representing properties in the whole variable space. In this paper, a new kernel discrepancy, named power exponential kernel discrepancy (PEKD), is proposed to measure the representativeness of a point set with respect to a general multivariate distribution. Different from the commonly used criteria, PEKD can improve the projection properties of the point set, which is important in high-dimensional circumstances. Some theoretical results are presented for understanding the new discrepancy better and guiding the hyperparameter setting. An efficient algorithm for searching for rep-points under the PEKD criterion is presented, and its convergence is also proven. Examples are given to illustrate its potential applications in numerical integration, uncertainty propagation, and the reduction of Markov chain Monte Carlo chains.

1. Introduction

Rep-points, also called principal points [1] or support points [2], can be viewed as a data reduction method or a statistical simulation technique, and they have been widely applied in many areas. Early on, many authors studied how to find the optimal rep-points for representing the univariate or bivariate normal distribution [3,4]. Refs. [1,5] then extended rep-points to elliptical distributions, and Refs. [6,7] used rep-points as a refined Monte Carlo technique for approximating integrals or expectations. More applications of rep-points can be found in uncertainty quantification [2,8,9] and Bayesian analysis [10,11,12].
Many statistical criteria, such as the mean square error [1,6,13,14,15], discrepancy [16], divergence [17], and statistical potential [18,19], have been proposed to measure the representativeness of a point set with respect to the target distribution. In this paper, we mainly discuss the kernel discrepancy, which is also known as the maximum mean discrepancy in deep learning [20] and transfer learning [21]. The properties of a kernel discrepancy are determined by the corresponding kernel function. Analytic expressions of the kernel discrepancy are available for particular distributions and particular kernel functions; see [16,22,23]. For obtaining rep-points from more general distributions, Ref. [24] proposed the kernel herding method based on some common kernel functions, such as the Gaussian and Laplacian kernels, and generated rep-points one by one with a greedy stochastic optimization algorithm. The support points (SP) method proposed by [2] is another kind of rep-points, based on the negative Euclidean distance kernel discrepancy.
Note that the kernels in the kernel herding and support points methods are isotropic, which means all the variables are considered active and the effects of all orders are equally important. However, when the dimension of the problem is relatively high, the active variables are usually sparse in practice. More attention should be paid to the representativeness of the projection distributions of rep-points. Some generalized $L_2$ discrepancies proposed by [25,26] can assure low-dimensional space-filling properties by directly summing all local projection discrepancies. These discrepancies have concise expressions obtained by using separable kernels and the binomial theorem, but they are limited to the uniform distribution on the hypercube. Ref. [27] presented the projected support points (PSP) method by constructing a sparsity-inducing kernel, which places a prior on the hyperparameters of a Gaussian kernel. However, compared with the SP method, the algorithm for generating PSP is computationally expensive, since it is based on the block Majorization-Minimization algorithm framework [28] and includes sampling steps for the hyperparameters.
There is a clear need for an effective kernel discrepancy that encourages the preservation of low-dimensional representativeness and can be optimized efficiently. In this paper, the new discrepancy is developed from the power exponential kernel function [16,29,30], so we call it PEKD. Different from the averaged kernels in the generalized $L_2$ discrepancies and the PSP method, we make use of the $L_\alpha$ norm in the power exponential kernel to regulate the representativeness of rep-points in subspaces of different projection dimensions. The contribution of this work is threefold. First, some theoretical analyses of the effect of the hyperparameter $\alpha$ on the low-dimensional structure of rep-points are presented. In particular, we demonstrate that, given a suitable choice of hyperparameters, the rep-points under PEKD form a Latin hypercube design for the uniform distribution on the hypercube. Second, we introduce the successive convex approximation algorithm framework [28] to construct an efficiently parallelized algorithm for generating rep-points under PEKD, and its convergence is also proven. Third, we illustrate the effectiveness of the new method with simulation studies for numerical integration and uncertainty propagation problems, and with a real-world problem of MCMC reduction.
This paper is organized as follows. Section 2 recalls kernel discrepancies in the existing literature related to rep-points and introduces the proposed PEKD. Section 3 constructs an algorithm to generate rep-points under PEKD. Section 4 demonstrates the effectiveness of the new method with several examples. Section 5 concludes with thoughts on further work. For brevity, all proofs are postponed to Appendix A.

2. Power Exponential Kernel Discrepancy

In this section, we first briefly introduce the kernel discrepancy [25] and the existing kernel functions used to generate rep-points. Then, we propose PEKD and analyze its theoretical properties.

2.1. Kernel Discrepancy

Let $X \subseteq \mathbb{R}^p$. A binary function $\gamma: X \times X \to \mathbb{R}$ is called a symmetric positive kernel [16] if it satisfies two properties: (i) symmetry, $\gamma(x, y) = \gamma(y, x)$ for all $x, y \in X$, and (ii) nonnegative definiteness, $\sum_{i=1}^n \sum_{j=1}^n c_i \gamma(x_i, x_j) c_j \ge 0$ for all $c_1, \dots, c_n \in \mathbb{R}$ and $x_1, \dots, x_n \in X$.
Definition 1.
Let F be a distribution function on $X \subseteq \mathbb{R}^p$, and let $F_P$ be the empirical distribution function of a point set $P = \{x_i\}_{i=1}^n \subset X$. For a symmetric positive definite kernel $\gamma$, the kernel discrepancy between F and $F_P$ is defined as:
$$D_\gamma^2(F, F_P) := \int_X \int_X \gamma(x, y)\, d[F - F_P](x)\, d[F - F_P](y) = \int_X \int_X \gamma(x, y)\, dF(x)\, dF(y) - \frac{2}{n} \sum_{i=1}^n \int_X \gamma(x_i, y)\, dF(y) + \frac{1}{n^2} \sum_{i,j=1}^n \gamma(x_i, x_j).$$
Further, $P^* = \{x_i^*\}_{i=1}^n$ is called a set of rep-points [22] of the distribution F if
$$D_\gamma^2(F, F_{P^*}) = \min_{P \subset X} D_\gamma^2(F, F_P).$$
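As a concrete illustration of Definition 1 (not part of the paper), the discrepancy can be estimated by replacing the integrals over F with averages over a large sample drawn from F. The kernel argument below is any user-supplied symmetric positive definite kernel, and the Gaussian kernel shown in the comment is only an illustrative choice.

```python
import numpy as np

def kernel_discrepancy_mc(P, Y, kernel):
    """Monte Carlo estimate of D_gamma^2(F, F_P).

    P      : (n, p) array, the candidate point set.
    Y      : (N, p) array, a large sample from the target distribution F.
    kernel : vectorized function kernel(A, B) -> Gram matrix of shape (len(A), len(B)).
    """
    term_ff = kernel(Y, Y).mean()   # approximates the double integral of gamma over F x F
    term_fp = kernel(P, Y).mean()   # approximates (1/n) sum_i int gamma(x_i, y) dF(y)
    term_pp = kernel(P, P).mean()   # equals (1/n^2) sum_{i,j} gamma(x_i, x_j)
    return term_ff - 2.0 * term_fp + term_pp

# Example kernel (illustrative bandwidth): Gaussian kernel exp{-||x - y||^2}.
# gauss = lambda A, B: np.exp(-((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
```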
Lemma 1
(Koksma-Hlawka inequality; [25]). Let $\gamma$ be a symmetric positive definite kernel on X, and let $H_\gamma$ be the reproducing kernel Hilbert space for the kernel $\gamma$. F and $F_P$ are as defined in Definition 1. The integration error of $g \in H_\gamma$, defined as
$$I(g; F, F_P) := \int_X g(x)\, dF(x) - \frac{1}{n} \sum_{i=1}^n g(x_i),$$
can be uniformly bounded as:
$$|I(g; F, F_P)| \le \|g\|_{H_\gamma}\, D_\gamma(F, F_P).$$

2.2. Kernels in Existing Rep-Points Methods

2.2.1. Isotropic Kernel

Definition 2.
A kernel function $\gamma$ is an isotropic kernel if it can be expressed as a function of the Euclidean distance between points, i.e., $\gamma(x, y) = h(\|x - y\|_2)$, where $\|\cdot\|_2$ is the Euclidean norm.
The Gaussian kernel and the Laplacian kernel, $\gamma_G(x, y) = \exp\{-\theta \|x - y\|_2^2\}$ and $\gamma_L(x, y) = \exp\{-\theta \|x - y\|_2\}$, are two well-known isotropic kernels, which are widely used in nonlinear classification and regression problems. Based on these kernels, Ref. [24] generates rep-points with a point-by-point greedy optimization scheme. Another popular class of kernels is the distance-induced kernel $\gamma_s(x, y) = -\|x - y\|_2^s$ [17,31]. It is conditionally strictly positive definite if $s \in (0, 2)$. In particular, when $s = 1$ and $Y, Y' \overset{i.i.d.}{\sim} F$, the corresponding kernel discrepancy,
$$D_{\gamma_{ED}}^2(F, F_P) = \frac{2}{n} \sum_{i=1}^n E\|x_i - Y\|_2 - \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \|x_i - x_j\|_2 - E\|Y - Y'\|_2,$$
is called the energy distance. Ref. [2] proposed the SP method by optimizing the Monte Carlo approximation version of (5) based on the difference-of-convex programming technique.
Obviously, the isotropic kernel is invariant to translation and rotation transformations [29], which means that the distribution characteristics in all directions are equally important.
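For concreteness, the energy distance (5) can be estimated from a large sample of F as in the sketch below (an illustration, not the SP implementation); the last expectation does not depend on P and is usually dropped during optimization, which is what the SP method exploits.

```python
import numpy as np

def energy_distance_mc(P, Y):
    """Monte Carlo estimate of the energy distance in (5) between F_P and F."""
    d_py = np.linalg.norm(P[:, None, :] - Y[None, :, :], axis=-1)  # ||x_i - y_m||_2
    d_pp = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)  # ||x_i - x_j||_2
    d_yy = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)  # ||y_m - y_l||_2
    return 2.0 * d_py.mean() - d_pp.mean() - d_yy.mean()
```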

2.2.2. Separable Kernel

Definition 3.
A kernel function $\gamma$ defined on $X \times X$ is a separable kernel [32] if it can be expressed in the following product form:
$$\gamma(x, y) = \prod_{k=1}^p \gamma_k(x_k, y_k), \quad \text{for any } x, y \in X.$$
A separable kernel function $\gamma$ is sensitive to x and y being close in some coordinate. This attractive property is useful for generating rep-points with good representativeness in the projection spaces [17].
There are two types of kernels in the literature that incorporate projection metrics; both are averaged forms of separable kernels. The first type is the kernel of the generalized $L_2$ discrepancies in uniform design [16], which can be expressed as $\gamma_{UD}(x, y) = \sum_{u \subseteq \{1:p\}} \gamma_u(x_u, y_u) = \prod_{k=1}^p [1 + \gamma_k(x_k, y_k)]$. Closed forms of the integrals in (1) are available for those kernels when F is the uniform distribution on $[0,1]^p$; in that case, the optimization method is usually a discrete random optimization algorithm based on a Latin hypercube design (or U-type design). The second type is the sparsity-inducing kernel of the PSP method [27], defined as $\gamma_{\theta \sim \pi}(x, y) := E_{\theta \sim \pi}[\gamma_\theta(x, y)]$. The sparsity-inducing kernel gives a general form for constructing kernels containing sparse structures. For example, $\gamma_{UD}$ in uniform design can be obtained by choosing a special distribution $\pi$. Ref. [27] chose a separable kernel, the so-called general Gaussian kernel, as $\gamma_\theta(x, y)$, then generated rep-points by sampling $\theta$ from $\pi$ to approximate the kernel $\gamma_{\theta \sim \pi}$ and optimizing the corresponding kernel discrepancy with the block Majorization-Minimization algorithm [28,33].

2.3. Power Exponential Kernel

2.3.1. Definition

Definition 4.
The function $R(h \mid \theta) = \exp\{-\theta |h|^\alpha\}$, $h \in \mathbb{R}$, is said to be a power exponential correlation function provided $\theta > 0$ and $0 < \alpha \le 2$. Then, the p-dimensional separable power exponential (PE) kernel has the form
$$\gamma_{\theta,\alpha}(x, y) = \exp\Big\{ -\sum_{k=1}^p \theta |x_k - y_k|^\alpha \Big\}.$$
It is obvious that when α = 2 , the PE kernel in (6) is the isotropic Gaussian kernel.
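As a quick reference, the PE kernel (6) can be evaluated for two sets of points as in the sketch below; the default hyperparameters are illustrative only, and setting alpha=2 recovers the isotropic Gaussian kernel.

```python
import numpy as np

def pe_kernel(A, B, theta=1e-4, alpha=1.5):
    """Power exponential kernel gamma_{theta,alpha}: Gram matrix between rows of A and B."""
    diff = np.abs(A[:, None, :] - B[None, :, :])           # |x_k - y_k| for every pair of points
    return np.exp(-theta * (diff ** alpha).sum(axis=-1))   # exp{-sum_k theta |x_k - y_k|^alpha}
```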

2.3.2. Visualization of Kernels

Following the analysis in [27], the contours of six kernels are given in Figure 1. Kernel γ ( x , y ) can be regarded as a metric of similarity between points. The larger the value of γ ( x , y ) , the more similar x and y are.
Consider the points A, B, C in Figure 1, whose positions are denoted by $x_A$, $x_B$, $x_C$, respectively. On the one hand, points B and C lie on a circle centered at point A, i.e., $\|x_A - x_B\|_2 = \|x_A - x_C\|_2$, which means the two pairs of points, denoted by (A,B) and (A,C), have the same similarity in the 2-dimensional space. On the other hand, the coordinates of (A,C) are totally different in all dimensions, while (A,B) has the same coordinate in the second dimension. Hence, it is more reasonable to assign a larger kernel value to (A,B) if the similarity of point pairs in both the 1- and 2-dimensional spaces is considered. From the contour plots, we can find that the isotropic kernels in Figure 1a,b cannot tell the difference between the similarities of the two pairs of points, while the other kernels in Figure 1c–f can.

2.3.3. The Influence of Hyperparameters in PE Kernel on Rep-Points

The kernel γ ( x , y ) determines what characteristics of the distribution F should be imitated by the point set { x i } i = 1 n . In order to capture the low-dimensional structure of the target distribution, a larger weight should be assigned to the low-dimensional similarity measure.
Proposition 1.
Let $B = \{z \in \mathbb{R}^d \mid \|z\|_2 = 1\}$, and let $\gamma_{\theta,\alpha}(x, x_0)$ be defined as in (6) with $\alpha \in (0, 2)$. Then, $\{x_0 + u/\sqrt{d} \mid u_i = \pm 1, i = 1, \dots, d\}$ is the solution set of the following optimization problem
$$\underset{x \in \mathbb{R}^d}{\text{minimize}} \;\; \gamma_{\theta,\alpha}(x, x_0) \quad \text{subject to} \quad x - x_0 \in B,$$
and the minimum value is $\exp\{-\theta d^{1-\alpha/2}\}$.
The main idea of this paper is to use the $L_\alpha$ norm in (6) to control the decay speed of the kernel value in different projection dimensions. Without loss of generality, let $x_0$ be the origin $\mathbf{0}$ in Proposition 1. Denote the k-dimensional ($1 \le k < d$) coordinate hyperplane by $H_S = \{x \in \mathbb{R}^d \mid x_j = 0, j \in S\}$, where S is a subset of $\{1, \dots, d\}$ with $d-k$ elements. The point set $\{u/\sqrt{d} \mid u_i = \pm 1, i = 1, \dots, d\}$ contains those points on the d-dimensional unit sphere that are farthest from $H_S$, and the PE kernel attains its minimum value at these points. Point C in Figure 1 is one such point when $d = 2$. From the minimum value $\exp\{-\theta d^{1-\alpha/2}\}$, it can be seen that both parameters $\theta$ and $\alpha$ affect the variation of the similarity between points, and $\alpha$ is directly related to the low-dimensional structure of the rep-points. When $\alpha \in (0, 2)$, the minimum value $\exp\{-\theta d^{1-\alpha/2}\}$ decreases as the projection dimension d increases from 1 to p. In addition, the smaller the $\alpha$, the more attention is paid to low-dimensional distribution similarity measures.
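To see the effect numerically, the following snippet tabulates the minimum kernel value $\exp\{-\theta d^{1-\alpha/2}\}$ from Proposition 1 for several $\alpha$ (the value $\theta = 0.5$ is an arbitrary illustrative choice): for $\alpha = 2$ the value is constant in d, while smaller $\alpha$ penalizes separation spread across many dimensions more strongly.

```python
import numpy as np

theta = 0.5
for alpha in (0.5, 1.0, 1.5, 2.0):
    mins = [np.exp(-theta * d ** (1 - alpha / 2)) for d in range(1, 6)]
    print(f"alpha={alpha}: " + ", ".join(f"d={d}: {m:.3f}" for d, m in zip(range(1, 6), mins)))
```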

2.3.4. PEKDs with α = 1 and α = 2

According to (1) and (6), the expression of PEKD, denoted by $D^2_{\gamma_{\theta,\alpha}}(F, F_P)$, can be derived. Here, we consider PEKDs with $\alpha = 1$ and $\alpha = 2$; some interesting conclusions are as follows.
Theorem 1.
Let $P = \{x_i\}_{i=1}^n$ be the rep-points on the bounded region $X = X_1 \times \cdots \times X_p$ under $D^2_{\gamma_{\theta,1}}(F, F_P)$. Let $F_k$ be the k-th marginal distribution of F and $M = \sup_{x, y \in X} \sum_{k=1}^p |x_k - y_k|$. If $\theta = o(1/M)$, then $\{x_{ik}\}_{i=1}^n$ is the set of rep-points of $F_k$ generated by minimizing (5).
Theorem 1 shows that when α = 1 and θ is sufficiently small, PEKD focuses on the one-dimensional structure of the rep-points. Restricting F in Theorem 1 to the uniform distribution on the hypercube, a more intuitive conclusion can be obtained, which is related to the Latin hypercube design.
Corollary 1.
If the target distribution F in Theorem 1 is the uniform distribution on the hypercube $[0,1]^p$ and $\theta = o(1/p)$, then the rep-points $\{x_i\}_{i=1}^n$ form a central Latin hypercube design.
A toy example for Corollary 1 is given below.
Example 1.
Let F be the uniform distribution on $[0,1]^2$ and let the number of points be $n = 10$. We first generate rep-points $P_{SP}$ using the SP method. Under the assumptions of Corollary 1, we take $P_{SP}$ as the initial point set and $\gamma_{10^{-4},1}$ as the kernel, and generate new rep-points $P_{PEKD}$ with the algorithm proposed in Section 3. Figure 2 shows the scatter plot of these two rep-points sets.
From Figure 2, the rep-points based on the kernel $\gamma_{10^{-4},1}$ (circles) indeed form a central Latin hypercube design, which has excellent one-dimensional projections. Observing carefully, the circular points can be obtained by moving the triangular points to the centers of the grid cells while keeping the rank of the triangular points in each dimension unchanged. This rank-preserving sampling technique is known as Latin hypercube sampling with dependence in [34,35,36].
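The rank-preserving move described above is easy to state in code: replace each coordinate by the grid center $(2r-1)/(2n)$ corresponding to its rank r. The sketch below illustrates the construction observed in Example 1, not the PEKD optimizer itself; the random initial points stand in for the SP warm start.

```python
import numpy as np

def recentre_to_lhd(P):
    """Move each point to the grid centres while preserving the per-dimension ranks."""
    n, _ = P.shape
    ranks = P.argsort(axis=0).argsort(axis=0)     # rank (0..n-1) of each coordinate
    return (2 * (ranks + 1) - 1) / (2 * n)        # centred levels (2r-1)/(2n)

rng = np.random.default_rng(0)
P_init = rng.uniform(size=(10, 2))                # stand-in for the SP initial points
P_lhd = recentre_to_lhd(P_init)                   # a central Latin hypercube design
```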
Theorem 2.
Let F be a distribution function on the bounded region $X = X_1 \times \cdots \times X_p$ with finite means, and let $M = \sup_{x,y \in X} \sum_{k=1}^p (x_k - y_k)^2$. If $\theta = o(1/M)$ and $Y \sim F$, then $D^2_{\gamma_{\theta,2}}(F, F_P)$ can be minimized by any point set $\{x_i\}_{i=1}^n$ satisfying $\frac{1}{n} \sum_{i=1}^n x_i = E[Y]$.
Theorem 2 means that $\theta$ should not be too small in the kernel $\gamma_{\theta,2}$; otherwise, the resulting point set would resemble the target distribution only in its first moment. We found that the hyperparameter setting $\theta = 10^{-4}$ works well for the numerical examples, given that the big training data $\{y_m\}_{m=1}^N$ is scaled to zero mean and unit variance for each variable. A small $\alpha$ is suitable for cases where the important variables are sparse. A more precise selection of the parameters requires consideration of how to incorporate prior information in a Bayesian form or how to sequentially identify important variables. We defer this to future work.

3. Optimization Algorithm

In this section, we introduce the successive convex approximation [28,37] framework to construct a parallel optimization algorithm to generate rep-points based on PEKD.

3.1. Successive Convex Approximation

Consider the following presumably difficult optimization problem: $\min_{x \in X} G(x)$, where the feasible set X is convex and $G(x)$ is continuous. The basic idea of successive convex approximation (SCA) is to solve a difficult problem via a sequence of simpler problems
$$\hat{x}(x^t) = \arg\min_{x \in X} \tilde{G}(x \mid x^t), \qquad x^{t+1} = x^t + \eta^t \big( \hat{x}(x^t) - x^t \big),$$
where $\tilde{G}(x \mid x^t)$ is a surrogate of the original function $G(x)$ and $\{\eta^t\}$ is the sequence of step sizes.
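Schematically, the SCA iteration can be written as the short loop below; the surrogate minimizer and the step-size rule are supplied by the user, and the sketch is generic rather than specific to our problem.

```python
def sca(x0, surrogate_argmin, step_size, n_iter=200):
    """Generic SCA loop: x^{t+1} = x^t + eta^t (xhat(x^t) - x^t)."""
    x = x0
    for t in range(n_iter):
        x_hat = surrogate_argmin(x)   # solve the simpler (convex) problem built at x^t
        eta = step_size(t)            # diminishing step size eta^t
        x = x + eta * (x_hat - x)
    return x
```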
Definition 5
(SCA surrogate function; [28]). A function $\tilde{G}(x \mid y)$ is an SCA surrogate function of $G(x)$ at $x = y$ if it satisfies:
1. $\tilde{G}(x \mid y)$ is continuous and strongly convex in x for all $y \in X$;
2. $\tilde{G}(x \mid y)$ is differentiable in x and $\nabla_x \tilde{G}(x \mid y)\big|_{x=y} = \nabla_x G(x)\big|_{x=y}$.
Similar to gradient methods, there are three possible choices for the step size: a bounded step size, backtracking line search, and a diminishing step size. Compared with the other two choices, the diminishing step size is more convenient in practice, so it is used in this paper. Two examples of diminishing step size rules suggested in [28] are listed below (a short code sketch follows the list):
  • $\eta^{t+1} = \eta^t (1 - \epsilon \eta^t)$, $t = 0, 1, \dots$, where $\eta^0 < 1/\epsilon$ and $\epsilon \in (0, 1)$;
  • $\eta^{t+1} = (\eta^t + a)/(1 + b t)$, $t = 0, 1, \dots$, where $0 < a \le b < 1$.
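Both rules are straightforward to generate; the parameter values below are arbitrary examples, and the indexing follows the statements above.

```python
def diminishing_steps_rule1(eta0=0.9, eps=0.5, T=10):
    """eta^{t+1} = eta^t * (1 - eps * eta^t), with eta^0 < 1/eps and eps in (0, 1)."""
    etas = [eta0]
    for _ in range(T - 1):
        etas.append(etas[-1] * (1 - eps * etas[-1]))
    return etas

def diminishing_steps_rule2(eta0=0.9, a=0.1, b=0.5, T=10):
    """eta^{t+1} = (eta^t + a) / (1 + b * t), with 0 < a <= b < 1 (as stated in the text)."""
    etas = [eta0]
    for t in range(T - 1):
        etas.append((etas[-1] + a) / (1 + b * t))
    return etas
```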

3.2. Algorithm for Generating Rep-Points under PEKD

3.2.1. Algorithm Statement

Our optimization problem is to minimize the discrepancy $D^2_{\gamma_{\theta,\alpha}}(F, F_P)$. Since a closed form of the objective function is usually not available for a general distribution F, we optimize a Monte Carlo approximation of it. Specifically, we ignore the first term and approximate the integral in the second term of (1) with a large sample $\{y_m\}_{m=1}^N$ from the distribution F; the optimization problem then becomes
$$\mathop{\mathrm{argmin}}_{P \subset X} \; -\frac{2}{nN} \sum_{i=1}^n \sum_{m=1}^N \exp\Big\{ -\sum_{k=1}^p \theta |x_{ik} - y_{mk}|^\alpha \Big\} + \frac{1}{n^2} \sum_{i,j=1}^n \exp\Big\{ -\sum_{k=1}^p \theta |x_{ik} - x_{jk}|^\alpha \Big\}.$$
The objective function in (8) is denoted by $G(\{x_i\}_{i=1}^n; \{y_m\}_{m=1}^N)$. We construct an appropriate surrogate function for G in the following Theorem 3.
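For reference, the Monte Carlo objective G in (8) is cheap to evaluate. Below is a minimal numpy sketch (the PE kernel is repeated for self-containment and the hyperparameter defaults are illustrative only).

```python
import numpy as np

def pe_kernel(A, B, theta, alpha):
    """gamma_{theta,alpha} Gram matrix between the rows of A and B."""
    return np.exp(-theta * (np.abs(A[:, None, :] - B[None, :, :]) ** alpha).sum(-1))

def pekd_objective(P, Y, theta=1e-4, alpha=1.5):
    """Objective (8): the constant double-integral term of (1) is dropped."""
    return -2.0 * pe_kernel(P, Y, theta, alpha).mean() + pe_kernel(P, P, theta, alpha).mean()
```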
Theorem 3
(Closed-form iterations). Let $\{x_i\}_{i=1}^n, \{x_i^{(t)}\}_{i=1}^n, \{y_m\}_{m=1}^N \subset X$. Assume $x_{ik}^{(t)} \ne x_{jk}^{(t)}$ and $x_{ik}^{(t)} \ne y_{mk}$ for all $1 \le i, j \le n$, $j \ne i$, $1 \le m \le N$ and $1 \le k \le p$. Define the function $\tilde{G}$ as:
$$\tilde{G}(\{x_i\}_{i=1}^n; \{x_i^{(t)}\}_{i=1}^n, \{y_m\}_{m=1}^N) = \frac{1}{nN} \sum_{i=1}^n \sum_{m=1}^N \exp\Big\{ -\sum_{k=1}^p \theta |x_{ik}^{(t)} - y_{mk}|^\alpha \Big\} \sum_{k=1}^p \alpha \theta |x_{ik}^{(t)} - y_{mk}|^{\alpha-2} (x_{ik} - y_{mk})^2 - \frac{2}{n^2} \sum_{i=1}^n \sum_{\substack{j=1 \\ j \ne i}}^n \exp\Big\{ -\sum_{k=1}^p \theta |x_{ik}^{(t)} - x_{jk}^{(t)}|^\alpha \Big\} \sum_{k=1}^p \alpha \theta |x_{ik}^{(t)} - x_{jk}^{(t)}|^{\alpha-2} (x_{ik}^{(t)} - x_{jk}^{(t)}) (x_{ik} - x_{ik}^{(t)}).$$
Then, $\tilde{G}$ is an SCA surrogate function of $G(\{x_i\}_{i=1}^n; \{y_m\}_{m=1}^N)$ in (8) at $\{x_i^{(t)}\}_{i=1}^n$. Moreover, the global minimizer of $\tilde{G}$ is given by:
$$x_i = M_i\big(\{x_i^{(t)}\}_{i=1}^n; \{y_m\}_{m=1}^N\big) = \Big[ \sum_{m=1}^N \gamma_{\theta,\alpha}(x_i^{(t)}, y_m)\, \Omega_\theta \Big]^{-1} \Big[ \sum_{m=1}^N \gamma_{\theta,\alpha}(x_i^{(t)}, y_m)\, \Omega_\theta\, y_m + \frac{N}{n} \sum_{\substack{j=1 \\ j \ne i}}^n \gamma_{\theta,\alpha}(x_i^{(t)}, x_j^{(t)})\, q(x_i^{(t)}, x_j^{(t)}) \Big],$$
where $\gamma_{\theta,\alpha}(x, y) = \exp\{-\sum_{k=1}^p \theta |x_k - y_k|^\alpha\}$, $\Omega_\theta = \mathrm{diag}\{|x_{ik}^{(t)} - y_{mk}|^{\alpha-2}\}_{k=1}^p$, and
$$q(x_i^{(t)}, x_j^{(t)}) = \big( |x_{ik}^{(t)} - x_{jk}^{(t)}|^{\alpha-2} (x_{ik}^{(t)} - x_{jk}^{(t)}) \big)_{k=1}^p.$$
In order to ensure that the assumptions in Theorem 3 are satisfied and the actual calculations remain numerically stable, we add a small perturbation δ to the absolute value items in Ω θ and q x i ( t ) , x j ( t ) in practice.
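The following is a minimal sketch (not the authors' code) of the closed-form update (9) for a single point, with the small perturbation δ added to the absolute differences as just described; the array shapes and variable names are my own assumptions.

```python
import numpy as np

def pe_update_one(i, P_t, Y, theta, alpha, delta=1e-10):
    """Closed-form SCA update (9) for the i-th point, given the current set P_t and the sample Y."""
    n, _ = P_t.shape
    N = Y.shape[0]
    xi = P_t[i]

    dy = np.abs(xi - Y) + delta                         # |x_ik^(t) - y_mk| + delta, shape (N, p)
    w_y = np.exp(-theta * (dy ** alpha).sum(axis=1))    # gamma_{theta,alpha}(x_i^(t), y_m)
    omega = dy ** (alpha - 2.0)                         # diagonal entries of Omega_theta

    dx = np.abs(xi - P_t) + delta                       # |x_ik^(t) - x_jk^(t)| + delta, shape (n, p)
    w_x = np.exp(-theta * (dx ** alpha).sum(axis=1))    # gamma_{theta,alpha}(x_i^(t), x_j^(t))
    q = (dx ** (alpha - 2.0)) * (xi - P_t)              # q(x_i^(t), x_j^(t)), shape (n, p)
    w_x[i] = 0.0                                        # drop the j = i term
    q[i] = 0.0

    denom = (w_y[:, None] * omega).sum(axis=0)          # sum_m gamma * Omega, per coordinate
    numer = (w_y[:, None] * omega * Y).sum(axis=0) + (N / n) * (w_x[:, None] * q).sum(axis=0)
    return numer / denom
```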
On the basis of the SCA algorithm framework, the construction process of rep-points under PEKD is described in Algorithm 1. If the training sample size N is too large, we can resample a mini-batch of it at each iteration, as in mini-batch stochastic gradient descent in machine learning.
Algorithm 1: Rep-points construction algorithm under PEKD
  1  Set the step sizes $\{\eta^t\} \subset (0, 1]$;
  2  Initialize $t = 0$ and the point set $P^{(0)} = \{x_i^{(0)}\}_{i=1}^n$ with the SP method;
  3  repeat
  4    for $i = 1, \dots, n$ in parallel do
  5      $\hat{x}_i = M_i\big(P^{(t)}; \{y_m\}_{m=1}^N\big)$, with $M_i$ defined in (9);
  6      $x_i^{(t+1)} = x_i^{(t)} + \eta^t (\hat{x}_i - x_i^{(t)})$;
  7    end
  8    Update $P^{(t+1)} = \{x_i^{(t+1)}\}_{i=1}^n$, and set $t \leftarrow t + 1$;
  9  until $P^{(t)}$ converges;
 10  return the convergent point set $P^{(\infty)}$.
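Putting the pieces together, a compact (non-parallel) rendering of Algorithm 1 might look as follows; it reuses pe_update_one from the sketch after Theorem 3, replaces the SP warm start with a random subsample, uses the first diminishing step-size rule, and stops on a simple movement threshold — all of these choices are mine, not the paper's.

```python
import numpy as np

def pekd_rep_points(Y, n, theta=1e-4, alpha=1.5, eta0=0.9, eps=0.5, max_iter=500, tol=1e-6):
    """Sketch of Algorithm 1: SCA iterations with a diminishing step size."""
    rng = np.random.default_rng(0)
    P = Y[rng.choice(len(Y), size=n, replace=False)].copy()    # stand-in for the SP warm start
    eta = eta0
    for _ in range(max_iter):
        P_hat = np.array([pe_update_one(i, P, Y, theta, alpha) for i in range(n)])
        P_new = P + eta * (P_hat - P)                          # x_i^{t+1} = x_i^t + eta^t (xhat_i - x_i^t)
        eta = eta * (1 - eps * eta)                            # diminishing step size (first rule)
        if np.max(np.abs(P_new - P)) < tol:
            return P_new
        P = P_new
    return P
```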

3.2.2. Complexity and Convergence of the Algorithm

As we can observe in Theorem 3, the surrogate function has a closed-form minimizer and the optimization variables can be updated in parallel. The running time of Algorithm 1 for one loop iteration is $O(nNp/P)$, as for the SP method, where P here denotes the total number of computation cores available. As for the PSP method, assuming that a sample $\{\theta_r\}_{r=1}^R$ is obtained from $\pi$ to approximate the sparsity-inducing kernel, the one-shot algorithm in [27] takes $O(RnNp)$. When the dimension p rises, R should be relatively large so that the sparsity-inducing kernel can be approximated well.
The following theorem gives a convergence guarantee for Algorithm 1.
Theorem 4
(Convergence of Algorithm 1). Suppose $X \subseteq \mathbb{R}^p$ is convex and compact and the assumptions in Theorem 3 hold; then every limit point of the sequence $(P^{(t)})_{t \ge 1}$ (at least one such limit point exists) is a stationary solution of (8).

4. Applications

4.1. Numerical Simulations

In this section, we compare the performance of the PEKD ($\alpha = 1, 1.5, 2$) method with the Monte Carlo (MC), randomized quasi-Monte Carlo with inverse transformation (RQMC), SP and PSP methods. According to the hyperparameter setting of the PSP method in [27], we generate $\theta_l \overset{i.i.d.}{\sim} \mathrm{Gamma}(0.1, 0.01)$ with small ($R = 50$, PSPs) and large ($R = 1000$, PSPl) sample sizes and set $\Gamma_{|u|}(\theta) = \exp\{-|u|\}$. The PEKD and PSP methods take the point set generated by the SP method as a warm start.

4.1.1. Visualization

Example 2.
Let F be the 5-dimensional i.i.d. Beta ( 2 , 4 ) distribution; we generate n = 25 points with several sampling methods.
Figure 3 shows scatterplots and marginal histograms of the projections of the point sets onto the first two dimensions. It is obvious that the sample generated by PEKD ($\alpha = 1.5$) represents all 1-dimensional marginal distributions better and is not clustered in the 2-dimensional projection in the way the samples obtained by the SP and PSP methods are.
We calculate the Kolmogorov-Smirnov (K-S) test statistic between the sample and the Beta(2,4) distribution for each dimension, and the average k-dimensional projected energy distance
$$\mathrm{APED}_k = \frac{1}{\binom{5}{k}} \sum_{\substack{u \subseteq \{1:5\} \\ |u| = k}} D^2_{\gamma_{ED}}(F_u, F_{P_u}), \quad k = 1, 2, 3, 4, 5,$$
where u indexes the projection dimensions and $F_{P_u}$ denotes the e.d.f. of $\{x_{iu}\}_{i=1}^n$. Figure 4 shows the results of the two measurements (the smaller, the better). The PEKD method performs better on one- and two-dimensional projections than the other methods, while the PSP method is not stable. In addition, the PSP and PEKD methods sometimes perform better than the SP method even in the full-dimensional space; one possible reason is that they start from the rep-points generated by the SP method, which helps the SP method leave a local minimum.
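For completeness, the APED metric can be computed by looping over all size-k coordinate subsets; the sketch below uses a Monte Carlo estimate of the projected energy distance and is an illustration rather than the authors' code.

```python
import itertools
import numpy as np

def energy_distance_mc(P, Y):
    """Monte Carlo estimate of the energy distance between the e.d.f. of P and the sample Y."""
    d = lambda A, B: np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return 2 * d(P, Y).mean() - d(P, P).mean() - d(Y, Y).mean()

def aped(P, Y, k, p=5):
    """Average k-dimensional projected energy distance APED_k over all size-k subsets u."""
    subsets = itertools.combinations(range(p), k)
    return np.mean([energy_distance_mc(P[:, list(u)], Y[:, list(u)]) for u in subsets])
```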

4.1.2. Numerical Integration

Example 3.
Consider the approximation of the integral $I = \int_X g(x)\, dF(x)$ as in [27]. We test the i.i.d. N(0,1) and the i.i.d. Exp(1) distributions for F, with dimension p from 5 to 20, and two well-known integrand functions:
(1) 
the Gaussian peak function (GAPK): $g_{GAPK}(x) = \exp\{-\sum_{l=1}^p a_l^2 (x_l - u_l)^2\}$,
(2) 
the additive Gaussian function (ADD): $g_{ADD}(x) = \exp\{\sum_{l=1}^p b_l x_l\}$,
where $u_l$ is the marginal mean of $F_l$. To incorporate a low-dimensional structure, a fraction q of the p variables are set as active, with $a_l = b_l = 0.25/(qp)$ for active variables and 0 otherwise. These functions are denoted as GAPK[q] and ADD[q] (see the code sketch below).
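The two integrands are easy to code. In the sketch below the Gaussian-peak exponent carries a negative sign (the usual Genz form), and the active variables are taken to be the first ⌈qp⌉ coordinates; both conventions should be read as reconstructions rather than the paper's exact setup.

```python
import numpy as np

def make_coeffs(p, q):
    """Coefficients a_l = b_l = 0.25 / (q * p) for the first ceil(q*p) variables, 0 otherwise."""
    n_active = int(np.ceil(q * p))
    c = np.zeros(p)
    c[:n_active] = 0.25 / (q * p)
    return c

def gapk(x, a, u):
    """Gaussian peak integrand: exp{-sum_l a_l^2 (x_l - u_l)^2}."""
    return np.exp(-np.sum(a ** 2 * (x - u) ** 2, axis=-1))

def add_fun(x, b):
    """Additive integrand: exp{sum_l b_l x_l} (sign as written in the text)."""
    return np.exp(np.sum(b * x, axis=-1))
```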
Some results for the integral estimation error ($\log |\hat{I} - I|$) are shown in Figure 5. In Figure 5a, there is just $pq = 1$ important variable. The PSP method with large R obtains a lower average error than the RQMC method, at the expense of heavier computation. It is interesting that the PEKD method ($\alpha < 2$) performs better with almost the same running time as the SP method. In Figure 5b,c, the number of important variables is small, and PEKD ($\alpha = 1.5$) performs best. In Figure 5d, the ratio q is large, and the PEKD method with large $\alpha$ is better. In addition, the errors of the RQMC method are not always lower than those of the SP method; one possible reason is that the inverse transformation may cause a loss of the representativeness of the low-discrepancy sequence.

4.1.3. Uncertainty Propagation

A computer model is treated as a mathematical function g that takes varying values of input parameters, denoted by a vector $x \in \mathbb{R}^p$, and returns an output $g(x)$. Uncertainty propagation methods are used to estimate the distribution of model outputs resulting from a set of uncertain model inputs. In other words, let $X \sim F$ denote the input uncertainty; the distribution of $g(X)$ can then be viewed as the resulting uncertainty in the system output.
Example 4.
Three (modified) test functions are taken from [38,39] (a code sketch follows the list):
(1) 
$g_1(x) = 5 + e^{x_1^2/2} + e^{x_2^2/2} + 2\cos(x_1) + 2\sin(x_2)$, where $x_1, x_2 \overset{i.i.d.}{\sim} U[-1, 1]$;
(2) 
$g_2(x) = \sin(x_1) + 7\sin^2(x_2) + 0.1\, x_3^4 \sin(x_1)$, where $x_1, x_2, x_3 \overset{i.i.d.}{\sim} U[-\pi, \pi]$;
(3) 
$g_3(x) = 5 + x_1 + 0.5 x_2 + 0.05 x_3 + 2\cos(20 x_1) + 0.2\sin(20 x_2) + 2\sin(x_3)$, where $x_i \overset{i.i.d.}{\sim} U[-1, 1]$, $i = 1, 2, \dots, 6$.
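The three test functions can be transcribed directly ($g_2$ is the well-known Ishigami function); the exponent in $g_1$ follows the reconstruction above and should be treated as such.

```python
import numpy as np

def g1(x):
    # g1(x) = 5 + e^{x1^2/2} + e^{x2^2/2} + 2 cos(x1) + 2 sin(x2),  x1, x2 ~ U[-1, 1]
    x1, x2 = x[..., 0], x[..., 1]
    return 5 + np.exp(x1 ** 2 / 2) + np.exp(x2 ** 2 / 2) + 2 * np.cos(x1) + 2 * np.sin(x2)

def g2(x):
    # Ishigami function: sin(x1) + 7 sin^2(x2) + 0.1 x3^4 sin(x1),  x_i ~ U[-pi, pi]
    x1, x2, x3 = x[..., 0], x[..., 1], x[..., 2]
    return np.sin(x1) + 7 * np.sin(x2) ** 2 + 0.1 * x3 ** 4 * np.sin(x1)

def g3(x):
    # Only x1, x2, x3 of the six U[-1, 1] inputs are active.
    x1, x2, x3 = x[..., 0], x[..., 1], x[..., 2]
    return 5 + x1 + 0.5 * x2 + 0.05 * x3 + 2 * np.cos(20 * x1) + 0.2 * np.sin(20 * x2) + 2 * np.sin(x3)
```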
We generate $n = 60$ points with the different methods to estimate the output distributions of the three test functions. Figure 6 shows the K-S test statistic values (over 100 repetitions) between the estimated output distribution and the true distribution for each test function. SP and PEKD2 perform better on $g_1(x)$ than the other methods, while PSP, PEKD1 and PEKD1.5 are more suitable for $g_2(x)$ and $g_3(x)$. Two possible reasons are: (1) the latter two test functions are more wiggly in each dimension, which means the low-dimensional structure should be given more attention; (2) $g_3(x)$ has many inactive variables, which makes SP and PEKD2 even worse than MC. In addition, the PSP method has a large variance on $g_3(x)$, since its approximate sparsity-inducing kernel becomes unstable as the dimension increases.

4.2. Reduction of MCMC Chain

Markov chain Monte Carlo (MCMC) is a family of techniques for sampling probability distributions, which allows us to make statistical inferences about complex Bayesian models. When necessary, many practitioners use the thinning method (discarding all but every k-th sample point after burn-in) to reduce high autocorrelation in the MCMC chain, save computer storage space, and reduce the processing time for computing derived posterior quantities. However, the thinning method is inefficient in most cases, since valuable posterior samples are carelessly thrown away; greater precision is available by working with unthinned chains. In practice, the models of interest are often high-dimensional, but the desired posterior quantities involve only a handful of parameters.
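To make the contrast concrete, thinning simply keeps every k-th draw of the chain, whereas the rep-points approach compresses the whole chain; the snippet below uses a synthetic stand-in chain, and the commented call to the Algorithm 1 sketch is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
chain = rng.normal(size=(75_000, 16))      # stand-in for the post burn-in MCMC sample
thinned = chain[::200]                     # classical thinning: keep every 200th draw (n = 375)
# reduced = pekd_rep_points(chain, n=375)  # alternative: compress with the Algorithm 1 sketch
```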
Consider the orange tree growth model in [2]. The Orange data record the trunk circumference measurements $\{Y_i(t_j)\}_{i=1,\dots,5;\, j=1,\dots,7}$ of five trees (indexed by i) at seven different times ($t_j$); the data can be found in the R datasets package. The following hierarchical (multilevel) model is assumed in their paper:
$$Y_i(t_j) \overset{indep.}{\sim} N\big(\eta_i(t_j), \sigma_C^2\big), \quad \eta_i(t_j) = \phi_{i1} / \big(1 + \phi_{i2} \exp\{\phi_{i3} t_j\}\big); \quad (\text{likelihood})$$
$$\log \phi_{i1} \overset{indep.}{\sim} N(\mu_1, \sigma_1^2), \quad \log(\phi_{i2} + 1) \overset{indep.}{\sim} N(\mu_2, \sigma_2^2), \quad \log(-\phi_{i3}) \overset{indep.}{\sim} N(\mu_3, \sigma_3^2), \quad \sigma_C^2 \sim \text{Inv-Gamma}(0.001, 0.001); \quad (\text{priors})$$
$$\mu_k \overset{i.i.d.}{\sim} N(0, 100), \quad \sigma_k^2 \overset{i.i.d.}{\sim} \text{Inv-Gamma}(0.01, 0.01); \quad (\text{hyperpriors})$$
$$i = 1, \dots, 5, \quad j = 1, \dots, 7, \quad k = 1, 2, 3.$$
The parameter set of interest is $\Theta = (\phi_{11}, \phi_{12}, \dots, \phi_{53}, \sigma_C^2)$. As suggested in [2], we generate the chain with 150,000 iterations using the R package rstan and discard the first half of the sample as burn-in. Then, the full chain (N = 75,000) is compressed to a small sample ($n = 375$) with the thinning, SP, PSPs and PEKD ($\alpha = 1.5, 2$) methods. We compare these methods on how well they estimate (a) the marginal posterior means and variances of each parameter, and (b) the averaged instantaneous growth rate $r(t) = \frac{1}{5} \sum_{i=1}^5 \frac{\partial \eta_i(s)}{\partial s}\big|_{s=t}$ at three future times (t = 1600, 1625, 1650). The true posterior quantities are estimated by running a longer MCMC chain with 600,000 iterations. Table 1 reports the error ratios of the thinning method over the SP, PSPs and PEKD methods in estimating the posterior quantities of interest; the larger the ratio, the more accurate the estimate. From Table 1, compared with the thinning method, the other methods estimate the parameters more accurately. The PEKD1.5 method is stable and performs best for most parameter estimates, while the PEKD2 method performs well only in the estimation of the means.

5. Conclusions and Discussion

In this work, a new rep-points criterion named PEKD is introduced. The most attractive property of PEKD is that the low-dimensional representativeness of the rep-points can be regulated by tuning the hyperparameter $\alpha$: the smaller the $\alpha$, the more the low-dimensional representativeness is prioritized. In particular, when $\alpha = 1$ and $\theta$ is sufficiently small, the rep-points under PEKD form a Latin hypercube design for the uniform distribution on $[0,1]^p$, which means the 1-dimensional representativeness is the best achievable. Furthermore, a parallelized optimization algorithm is constructed for generating rep-points under the PEKD criterion. Simulation studies and a real-data example demonstrate that PEKD with small $\alpha$ is suitable for situations where the important variables are sparse and the function fluctuates greatly, and that $\alpha = 1.5$ is a robust choice in most cases.
As a general measure of similarity between distributions, PEKD can be used to test independence and goodness-of-fit [31,40,41,42]. For the experimental design community, PEKD can serve as a criterion for constructing complex designs, such as space-filling designs and sliced designs [16,43,44,45] on irregular regions. In addition, the algorithm proposed in this paper would be helpful for data splitting [46] and model-free subsampling [47] problems in the machine learning community.

Author Contributions

Conceptualization, J.N. and H.Q.; methodology, Z.X.; software, Y.X.; validation, Z.X., J.N. and H.Q.; investigation, Z.X.; data curation, Y.X.; writing—original draft preparation, Z.X.; writing—review and editing, Y.X. and J.N.; supervision, J.N.; project administration, H.Q.; funding acquisition, J.N. and H.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 11871237; by the Fundamental Research Funds for the Central Universities, grant number CCNU22JC023, 2022YBZZ036; and by the Discipline Coordination Construction Project of Zhongnan University of Economics and Law, grant number XKHJ202125.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

PEKD — power exponential kernel discrepancy
rep-points — representative points
MCMC — Markov chain Monte Carlo
SP — support points
PSP — projected support points (PSPs, PSPl: with small, large R)
PE — power exponential
SCA — successive convex approximation
MC — Monte Carlo
RQMC — randomized quasi-Monte Carlo
$x$ — scalar variable x
$\mathbf{x}$ — vector variable x
$\{x_i\}_{i=1}^n$ — point set $\{x_1, \dots, x_n\}$
$E_{X \sim F}[X]$ — expectation of the random variable X from the distribution F
$\gamma_{ED}$ — kernel of the energy distance
$\gamma_{\theta,\alpha}$ — power exponential kernel with hyperparameters $\theta$ and $\alpha$

Appendix A

Proof of Proposition 1.
Let $y = x - x_0$; then the problem can be described as the following optimization problem:
$$\min_{y \in \mathbb{R}^d} \exp\Big\{ -\sum_{i=1}^d \theta |y_i|^\alpha \Big\} \quad \text{subject to} \quad \|y\|_2^2 = 1.$$
The standard Lagrange multiplier method can be used to solve this problem, which attains the minimum value $\exp\{-\theta d^{1-\alpha/2}\}$ at $y \in \{u/\sqrt{d} \mid u_i = \pm 1, i = 1, \dots, d\}$. Since $x = y + x_0$, all conclusions can be obtained directly. □
Proof of Theorem 1.
According to Definition 4,
$$\gamma_{\theta,1}(x, y) = \exp\Big\{ -\sum_{k=1}^p \theta |x_k - y_k| \Big\}, \quad \theta > 0.$$
Using the Taylor formula
$$e^z = 1 + \frac{z}{1!} + \frac{z^2}{2!} + \cdots + \frac{z^n}{n!} + o(z^n),$$
then,
$$\exp\Big\{ -\theta \sum_{k=1}^p |x_k - y_k| \Big\} = 1 - \theta \sum_{k=1}^p |x_k - y_k| + \frac{\theta^2}{2!} \Big( \sum_{k=1}^p |x_k - y_k| \Big)^2 - \cdots + \frac{(-\theta)^n}{n!} \Big( \sum_{k=1}^p |x_k - y_k| \Big)^n + o\Big( \Big( \theta \sum_{k=1}^p |x_k - y_k| \Big)^n \Big). \quad \text{(A1)}$$
Since $0 \le \theta \sum_{k=1}^p |x_k - y_k| \le M\theta = o(1)$, (A1) can be written as
$$\exp\Big\{ -\theta \sum_{k=1}^p |x_k - y_k| \Big\} = 1 - \theta \sum_{k=1}^p |x_k - y_k| + o\Big( \theta \sum_{k=1}^p |x_k - y_k| \Big). \quad \text{(A2)}$$
Because the first term in (A2) is a constant that does not depend on the point set, we have
$$\mathop{\mathrm{Argmin}}_{x_1, \dots, x_n \in X} D^2_{\gamma_{\theta,1}}(F, F_P) \;\Leftrightarrow\; \mathop{\mathrm{Argmin}}_{x_1, \dots, x_n \in X} \frac{2}{n} \sum_{i=1}^n \int_X \sum_{k=1}^p |x_{ik} - y_k|\, dF(y) - \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^p |x_{ik} - x_{jk}|$$
$$\Leftrightarrow\; \mathop{\mathrm{Argmin}}_{x_{1k}, \dots, x_{nk} \in X_k} \frac{2}{n} \sum_{i=1}^n \int_{X_k} |x_{ik} - y_k|\, dF_k(y_k) - \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n |x_{ik} - x_{jk}|, \quad k = 1, \dots, p,$$
$$\Leftrightarrow\; \mathop{\mathrm{Argmin}}_{x_{1k}, \dots, x_{nk} \in X_k} \frac{2}{n} \sum_{i=1}^n E_{Y_k \sim F_k} |x_{ik} - Y_k| - \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n |x_{ik} - x_{jk}| - E_{Y_k, Y_k' \sim F_k} |Y_k - Y_k'|, \quad k = 1, \dots, p. \quad \text{(A3)}$$
The last problem in (A3) is exactly the minimization of (5) for $F_k$; hence $\{x_{ik}\}_{i=1}^n$ is the set of rep-points of $F_k$ generated by minimizing (5). □
To prove Corollary 1, we require a lemma:
Lemma A1.
Let F be the uniform distribution on [0,1] and let $F_P$ be the e.d.f. of $\{x_i\}_{i=1}^n \subset [0,1]$; then, the energy distance in (5) can be expressed as
$$D^2_{\gamma_{ED}}(F, F_P) = \frac{2}{n} \sum_{i=1}^n \Big( x_i^2 - x_i + \frac{1}{2} \Big) - \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n |x_i - x_j| - \frac{1}{3}. \quad \text{(A4)}$$
Proof of Lemma A1.
Let the random variables $Y, Y' \overset{i.i.d.}{\sim} U[0,1]$; the one-dimensional energy distance kernel is $\gamma_{ED}(x, y) = -|x - y|$. Then,
$$E|x - Y| = \int_0^1 |x - t|\, dt = x^2 - x + \frac{1}{2}, \qquad E|Y - Y'| = \int_0^1 \Big( t^2 - t + \frac{1}{2} \Big) dt = \frac{1}{3}.$$
According to Equation (5), the result in (A4) holds. □
Proof of Corollary 1.
In light of Theorem 1, $M = p$ and $\{x_{ik}\}_{i=1}^n$ is the set of energy-distance rep-points of $F_k = U[0,1]$, $k = 1, \dots, p$. Without loss of generality, the subscript k is omitted below. By Lemma A1, $\{x_i\}_{i=1}^n$ minimizes the kernel discrepancy $D^2_{\gamma_{ED}}(F, F_P)$ in (A4). Next, let $x_{(t)}$ denote the t-th order statistic of the sample $\{x_i\}_{i=1}^n$; then
$$D^2_{\gamma_{ED}}(F, F_P) = \frac{2}{n} \sum_{i=1}^n \Big( x_i^2 - x_i + \frac{1}{2} \Big) - \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n |x_i - x_j| - \frac{1}{3} = \frac{2}{n} \sum_{i=1}^n x_i^2 + \frac{2}{3} - \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n |x_i - x_j| - \frac{2}{n} \sum_{i=1}^n x_i = \frac{2}{n} \sum_{t=1}^n x_{(t)}^2 + \frac{2}{3} - \frac{2}{n^2} \sum_{t=1}^n (2t - 1 - n) x_{(t)} - \frac{2}{n^2} \sum_{t=1}^n n\, x_{(t)} = 2 \Big[ \frac{1}{3} + \frac{1}{n} \sum_{t=1}^n x_{(t)}^2 - \frac{1}{n^2} \sum_{t=1}^n (2t - 1) x_{(t)} \Big] = \frac{2}{n} \sum_{t=1}^n \Big( x_{(t)} - \frac{2t - 1}{2n} \Big)^2 + \frac{1}{6 n^2} \ge \frac{1}{6 n^2}.$$
Therefore, each dimension of the rep-points $\{x_i\}_{i=1}^n$ is a permutation of the n levels $\frac{2t-1}{2n}$, $t = 1, \dots, n$. In other words, the rep-points $\{x_i\}_{i=1}^n$ form a central Latin hypercube design. □
To prove Theorem 2, we require the following lemma:
Lemma A2.
Let $Y, Y' \overset{i.i.d.}{\sim} F$ with $E\|Y\|_2^2 < \infty$. If the kernel in (1) is $\gamma_{s=2}(x, y) = -\|x - y\|_2^2$, then the corresponding kernel discrepancy is
$$D^2_{\gamma_{s=2}}(F, F_P) = 2 \Big\| \frac{1}{n} \sum_{i=1}^n x_i - E[Y] \Big\|_2^2.$$
Proof of Lemma A2.
Similar to (5), when the kernel is $\gamma(x, y) = -\|x - y\|_2^2$, we obtain
$$D^2_{\gamma_{s=2}}(F, F_P) = \frac{2}{n} \sum_{i=1}^n E\|x_i - Y\|_2^2 - \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \|x_i - x_j\|_2^2 - E\|Y - Y'\|_2^2 = -\frac{4}{n} \sum_{i=1}^n x_i^T E[Y] + \frac{2}{n^2} \sum_{i=1}^n \sum_{j=1}^n x_i^T x_j + 2 \|E[Y]\|_2^2 = 2 \Big( \Big\| \frac{1}{n} \sum_{i=1}^n x_i \Big\|_2^2 - \frac{2}{n} \sum_{i=1}^n x_i^T E[Y] + \|E[Y]\|_2^2 \Big) = 2 \Big\| \frac{1}{n} \sum_{i=1}^n x_i - E[Y] \Big\|_2^2.$$
The proof of this lemma is finished. □
Proof of Theorem 2.
Since $0 \le \theta \sum_{k=1}^p (x_k - y_k)^2 \le M\theta = o(1)$,
$$\gamma_{\theta,2}(x, y) = \exp\Big\{ -\sum_{k=1}^p \theta (x_k - y_k)^2 \Big\} = 1 - \theta \|x - y\|_2^2 + o\big( \theta \|x - y\|_2^2 \big).$$
Then, according to Lemma A2, we obtain
$$\mathop{\mathrm{Argmin}}_{x_1, \dots, x_n \in X} D^2_{\gamma_{\theta,2}}(F, F_P) \;\Leftrightarrow\; \mathop{\mathrm{Argmin}}_{x_1, \dots, x_n \in X} D^2_{\gamma_{s=2}}(F, F_P) \;\Leftrightarrow\; \mathop{\mathrm{Argmin}}_{x_1, \dots, x_n \in X} \Big\| \frac{1}{n} \sum_{i=1}^n x_i - E[Y] \Big\|_2^2.$$
The last problem attains its optimal value whenever $\frac{1}{n} \sum_{i=1}^n x_i = E[Y]$. □
Proof of Theorem 3.
Obviously, $\tilde{G}(\{x_i\}_{i=1}^n; \{x_i^{(t)}\}_{i=1}^n, \{y_m\}_{m=1}^N)$ is a quadratic function of the variables and the coefficients of the quadratic terms are all positive; therefore, $\tilde{G}$ is continuous and strongly convex. When $x_{ik}^{(t)} \ne x_{jk}^{(t)}$ and $x_{ik}^{(t)} \ne y_{mk}$ for all $1 \le i < j \le n$, $1 \le m \le N$, $1 \le k \le p$, both $G(\{x_i\}_{i=1}^n; \{y_m\}_{m=1}^N)$ and $\tilde{G}(\{x_i\}_{i=1}^n; \{x_i^{(t)}\}_{i=1}^n, \{y_m\}_{m=1}^N)$ are differentiable with respect to $\{x_i\}_{i=1}^n$. Through tedious algebraic calculations,
$$\nabla_{\{x_i\}_{i=1}^n} \tilde{G}(\{x_i\}_{i=1}^n; \{x_i^{(t)}\}_{i=1}^n, \{y_m\}_{m=1}^N) \Big|_{\{x_i = x_i^{(t)}\}_{i=1}^n} = \nabla_{\{x_i\}_{i=1}^n} G(\{x_i\}_{i=1}^n; \{y_m\}_{m=1}^N) \Big|_{\{x_i = x_i^{(t)}\}_{i=1}^n}$$
can be verified. Then, $\tilde{G}$ is an SCA surrogate function of $G(\{x_i\}_{i=1}^n; \{y_m\}_{m=1}^N)$ in (8) at $\{x_i^{(t)}\}_{i=1}^n$ according to Definition 5. Moreover, the closed-form minimizer can be obtained by setting the gradient of $\tilde{G}$ to zero and solving for $\{x_i\}_{i=1}^n$. □
Proof of Theorem 4.
Based on Theorem 3 and the diminishing step size rule, this theorem can be proven by Theorem 3 in [37] under some regularity conditions. □

References

  1. Flury, B.A. Principal Points. Biometrika 1990, 77, 33–41.
  2. Mak, S.; Joseph, V.R. Support points. Ann. Stat. 2018, 46, 2562–2592.
  3. Anderberg, M.R. Cluster Analysis for Applications; Academic Press: San Diego, CA, USA, 1973.
  4. Fang, K.T.; He, S.D. The Problem of Selecting a Given Number of Representative Points in a Normal Population and a Generalized Mills’ Ratio; Technical Report; Stanford University, Department of Statistics: Stanford, CA, USA, 1982.
  5. Flury, B.D. Estimation of principal points. J. R. Stat. Soc. Ser. C Appl. Stat. 1993, 42, 139–151.
  6. Fang, K.; Zhou, M.; Wang, W. Applications of the representative points in statistical simulations. Sci. China Math. 2014, 57, 2609–2620.
  7. Lemaire, V.; Montes, T.; Pagès, G. New weak error bounds and expansions for optimal quantization. J. Comput. Appl. Math. 2020, 371, 112670.
  8. Mezic, I.; Runolfsson, T. Uncertainty propagation in dynamical systems. Automatica 2008, 44, 3003–3013.
  9. Mohammadi, S.; Cremaschi, S. Efficiency of Uncertainty Propagation Methods for Estimating Output Moments. In Proceedings of the 9th International Conference on Foundations of Computer-Aided Process Design, 14–18 July 2019, Copper Mountain, CO, USA; Muñoz, S.G., Laird, C.D., Realff, M.J., Eds.; Elsevier: Amsterdam, The Netherlands, 2019; Volume 47, pp. 487–492.
  10. Owen, A.B. Statistically Efficient Thinning of a Markov Chain Sampler. J. Comput. Graph. Stat. 2017, 26, 738–744.
  11. Riabiz, M.; Chen, W.Y.; Cockayne, J.; Swietach, P.; Niederer, S.A.; Mackey, L.; Oates, C.J. Optimal thinning of MCMC output. J. R. Stat. Soc. Ser. B 2022, 84, 1059–1081.
  12. South, L.F.; Riabiz, M.; Teymur, O.; Oates, C.J. Postprocessing of MCMC. Annu. Rev. Stat. Its Appl. 2022, 9, 529–555.
  13. Xu, L.H.; Fang, K.T.; Pan, J. Limiting behavior of the gap between the largest two representative points of statistical distributions. Commun. Stat.-Theory Methods 2021, 1–24.
  14. Li, Y.; Fang, K.T.; He, P.; Peng, H. Representative Points from a Mixture of Two Normal Distributions. Mathematics 2022, 10, 3952.
  15. Xu, L.H.; Fang, K.T.; He, P. Properties and generation of representative points of the exponential distribution. Stat. Pap. 2022, 63, 197–223.
  16. Fang, K.T.; Liu, M.Q.; Qin, H.; Zhou, Y.D. Theory and Application of Uniform Experimental Designs; Springer: Singapore, 2018.
  17. Pronzato, L.; Zhigljavsky, A. Bayesian quadrature, energy minimization and space-filling design. SIAM/ASA J. Uncertain. Quantif. 2020, 8, 959–1011.
  18. Borodachov, S.; Hardin, D.; Saff, E. Low Complexity Methods for Discretizing Manifolds via Riesz Energy Minimization. Found. Comput. Math. 2014, 14, 1173–1208.
  19. Joseph, V.R.; Dasgupta, T.; Tuo, R.; Wu, C.F.J. Sequential Exploration of Complex Surfaces Using Minimum Energy Designs. Technometrics 2015, 57, 64–74.
  20. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
  21. Yang, Q.; Zhang, Y.; Dai, W.; Pan, S.J. Transfer Learning; Cambridge University Press: Cambridge, UK, 2020.
  22. Fang, K.T.; Wang, Y. Number-Theoretic Methods in Statistics; Chapman and Hall: London, UK, 1994.
  23. Briol, F.X.; Oates, C.J.; Girolami, M.; Osborne, M.A.; Sejdinovic, D. Probabilistic Integration: A Role in Statistical Computation? Stat. Sci. 2019, 34, 1–22.
  24. Chen, Y.; Welling, M.; Smola, A.J. Super-Samples from Kernel Herding. arXiv 2012, arXiv:1203.3472.
  25. Hickernell, F.J. A generalized discrepancy and quadrature error bound. Math. Comput. 1998, 67, 299–322.
  26. Zhou, Y.D.; Fang, K.T.; Ning, J.H. Mixture discrepancy for quasi-random point sets. J. Complex. 2013, 29, 283–301.
  27. Mak, S.; Joseph, V.R. Projected support points: A new method for high-dimensional data reduction. arXiv 2018, arXiv:1708.06897.
  28. Scutari, G.; Sun, Y. Parallel and Distributed Successive Convex Approximation Methods for Big-Data Optimization. In Multi-Agent Optimization: Cetraro, Italy 2014; Facchinei, F., Pang, J.S., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 141–308.
  29. Santner, T.J.; Williams, B.J.; Notz, W.I. The Design and Analysis of Computer Experiments; Springer: New York, NY, USA, 2018.
  30. N’Gbo, N.; Tang, J. On the Bounds of Lyapunov Exponents for Fractional Differential Systems with an Exponential Kernel. Int. J. Bifurc. Chaos 2022, 32, 2250188.
  31. Székely, G.J.; Rizzo, M.L. Energy statistics: A class of statistics based on distances. J. Stat. Plan. Inference 2013, 143, 1249–1272.
  32. Fang, K.T.; Hickernell, F.J. Uniform Experimental Designs; Springer: New York, NY, USA, 2007.
  33. Lange, K. MM Optimization Algorithms; SIAM: Philadelphia, PA, USA, 2016.
  34. Stein, M.L. Large sample properties of simulations using latin hypercube sampling. Technometrics 1987, 29, 143–151.
  35. Packham, N.; Schmidt, W.M. Latin hypercube sampling with dependence and applications in finance. J. Comput. Financ. 2010, 13, 81–111.
  36. Aistleitner, C.; Hofer, M.; Tichy, R.F. A central limit theorem for Latin hypercube sampling with dependence and application to exotic basket option pricing. Int. J. Theor. Appl. Financ. 2012, 15, 1–20.
  37. Scutari, G.; Facchinei, F.; Song, P.; Palomar, D.P.; Pang, J.S. Decomposition by Partial Linearization: Parallel Optimization of Multi-Agent Systems. IEEE Trans. Signal Process. 2014, 62, 641–656.
  38. Oakley, J.E.; O’Hagan, A. Bayesian inference for the uncertainty distribution of computer model outputs. Biometrika 2002, 89, 769–784.
  39. Marrel, A.; Iooss, B.; Laurent, B.; Roustant, O. Calculations of sobol indices for the gaussian process metamodel. Reliab. Eng. Syst. Saf. 2009, 94, 742–751.
  40. Székely, G.J.; Rizzo, M.L.; Bakirov, N.K. Measuring and testing dependence by correlation of distances. Ann. Stat. 2007, 35, 2769–2794.
  41. Wang, S.; Liang, J.; Zhou, M.; Ye, H. Testing Multivariate Normality Based on F-Representative Points. Mathematics 2022, 10, 4300.
  42. Liang, J.; He, P.; Yang, J. Testing Multivariate Normality Based on t-Representative Points. Axioms 2022, 11, 587.
  43. Xiong, Z.K.; Liu, W.J.; Ning, J.H.; Qin, H. Sequential support points. Stat. Pap. 2022, 63, 1757–1775.
  44. Xiao, Y.; Ning, J.H.; Xiong, Z.K.; Qin, H. Batch sequential adaptive designs for global optimization. J. Korean Stat. Soc. 2022, 51, 780–802.
  45. Kong, X.; Zheng, W.; Ai, M. Representative points for distribution recovering. J. Stat. Plan. Inference 2023, 224, 69–83.
  46. Joseph, V.R.; Vakayil, A. Split: An optimal method for data splitting. Technometrics 2022, 64, 166–176.
  47. Zhang, M.; Zhou, Y.; Zhou, Z.; Zhang, A. Model-free Subsampling Method Based on Uniform Designs. arXiv 2022, arXiv:2209.03617.
Figure 1. Contours of different kernels. (a) Negative Euclidean distance kernel in the support points (SP) method; (b) Gaussian kernel; (c) sparsity-inducing kernel in the projected support points (PSP) method; (d–f) power exponential (PE) kernel with $\alpha = 0.5, 1, 1.5$, respectively. Point A and point B in all panels have the same coordinate in the second dimension, and $\|x_A - x_B\|_2 = \|x_A - x_C\|_2$.
Figure 2. Scatterplots of rep-points for the uniform distribution on $[0,1]^2$ generated by the SP method (triangles) and the PEKD method (circles).
Figure 3. Scatterplots and marginal histograms of $n = 25$ points for the 5-dimensional i.i.d. Beta(2,4) distribution with six samplers. Red lines represent the true marginal densities. (a) MC; (b) RQMC; (c) SP; (d) PSPs; (e) PSPl; (f) PEKD1.5.
Figure 4. Box plots of the K-S test statistic and the relative average projected energy distance (setting the SP method as a benchmark by calculating $\mathrm{sign}(\mathrm{APED}^k_{SP} - \mathrm{APED}^k_{method}) \log_{10}(|\mathrm{APED}^k_{SP} - \mathrm{APED}^k_{method}|)$).
Figure 5. Box plots of the log-error for GAPK and ADD under p-dimensional i.i.d. N(0,1) and Exp(1). (a) Normal (GAPK[0.2], $n = 50$, $p = 5$); (b) Normal (GAPK[0.2], $n = 50$, $p = 20$); (c) Exp (ADD[0.1], $n = 100$, $p = 20$); (d) Exp (ADD[0.4], $n = 100$, $p = 20$).
Figure 6. Box plots of the K-S test statistic for the three test functions in the uncertainty propagation of computer experiments. The smaller, the better. (a) $g_1(x)$; (b) $g_2(x)$; (c) $g_3(x)$.
Table 1. The error ratios of the thinning method over the SP, PSPs and PEKD methods in estimating the posterior quantities. The larger, the better.

| Parameter | SP (Mean) | PSPs (Mean) | PEKD1.5 (Mean) | PEKD2 (Mean) | SP (Var.) | PSPs (Var.) | PEKD1.5 (Var.) | PEKD2 (Var.) |
|---|---|---|---|---|---|---|---|---|
| $\phi_{i1}$ | 21.00 | 21.43 | 21.49 | 21.60 | 2.72 | 2.61 | 4.33 | 3.21 |
| $\phi_{i2}$ | 8.78 | 8.74 | 10.18 | 9.17 | 3.79 | 3.43 | 4.03 | 3.30 |
| $\phi_{i3}$ | 7.00 | 7.33 | 8.71 | 7.17 | 4.27 | 4.12 | 5.31 | 3.63 |
| $\sigma_C^2$ | 24.17 | 27.48 | 42.25 | 44.40 | 5.04 | 5.68 | 11.64 | 5.82 |
| r(1600) | 14.00 | 15.79 | 26.72 | 30.27 | - | - | - | - |
| r(1625) | 12.85 | 14.29 | 24.04 | 25.16 | - | - | - | - |
| r(1650) | 11.90 | 13.09 | 21.90 | 21.61 | - | - | - | - |
