1. Introduction
Statistics is the art and science of data collection and data analysis, in which statistical modeling relies on various types of statistical distributions. Such distributions are either discrete or continuous, and either univariate or multivariate. For an unknown continuous distribution $F$ in $\mathbb{R}^d$, the conventional approach is to approximate $F$ using the empirical distribution of a random sample. The empirical distribution is discrete, consisting of support points from the random sample, with each point contributing equally to the approximation. Because of the limited accuracy of the empirical distribution, we want to construct a discrete distribution $F_k$ that approximates the distribution $F$ while preserving the distribution information as much as possible. Consider a random vector $X$ following a continuous distribution $F$, characterized by a probability density function (pdf) $p(x)$. In contrast, a discrete random vector $Y$ is characterized by a probability mass function (pmf)

$$P(Y = y_i) = p_i, \quad i = 1, \dots, k,$$
      
where $y_1, \dots, y_k$ are support points of $Y$ and $p_i > 0$, $\sum_{i=1}^{k} p_i = 1$. An approximation distribution $F_k$ to $F$ should satisfy:
- (i) $F_k$ is a distribution function;
- (ii) a pre-decided distance between $F$ and $F_k$ is small;
- (iii) $F_k \to F$ in distribution as $k \to \infty$, where $k$ in $F_k$ is the number of support points of $Y$.

In this case, the support points $\{y_1, \dots, y_k\}$ are called representative points (RPs). There are several ways to choose an approximation distribution $F_k$.
  1.1. Monte Carlo—RPs
Let $X \sim F(x; \theta)$ be a random vector, where $\theta$ represents the parameters. For instance, for the normal distribution $N(\mu, \sigma^2)$, the parameters are denoted as $\theta = (\mu, \sigma^2)$. In traditional statistics, random samples are utilized to make inferences about the population. Specifically, a collection of independently and identically distributed (iid) random samples, denoted as $\{x_1, \dots, x_n\}$, is drawn from the population distribution $F$. The empirical distribution of the random sample is defined as follows:

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I_{\{x_i \le x\}},$$

where $I_A$ is the indicator function of $A$, and the inequality $x_i \le x$ means that $x_{ij} \le x_j$ ($j = 1, \dots, d$), where $x_i = (x_{i1}, \dots, x_{id})'$ and $x = (x_1, \dots, x_d)'$. Many statistical inferences rely on the empirical distribution $F_n$, including various methods such as:
- (1)
 Parameter estimation (point estimation and confidence interval estimation);
- (2)
 Density estimation;
- (3)
 Hypothesis testing, and so on.
The empirical distribution is a discrete distribution with support points $\{x_1, \dots, x_n\}$, each having the sampling probability $1/n$, and it can be considered as an approximation of $F$ in the sense of consistency, i.e., $F_n \to F$ in distribution as $n \to \infty$. In statistical simulation, a set of random samples can be generated by computer software under the Monte Carlo (MC) method; the corresponding support points are therefore called MC-RPs in this paper. MC methods have been commonly used. For instance, in the case of a normal population $N(\mu, \sigma^2)$ with unknown parameters $\mu$ and $\sigma^2$, one can utilize the sample mean $\bar{x}$ and the sample variance $s^2$ to estimate $\mu$ and $\sigma^2$, respectively.
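As a small illustration, the following Python sketch estimates the normal parameters from an iid Monte Carlo sample; the sample size, seed, and true parameter values are illustrative assumptions, not taken from the text.

```python
# A minimal sketch of MC estimation for N(mu, sigma^2); the sample size,
# seed, and true parameters are illustrative choices.
import numpy as np

rng = np.random.default_rng(seed=0)
mu, sigma = 2.0, 1.5                   # true parameters of N(mu, sigma^2)
x = rng.normal(mu, sigma, size=1000)   # iid sample x_1, ..., x_n

mu_hat = x.mean()                      # sample mean estimates mu
var_hat = x.var(ddof=1)                # sample variance estimates sigma^2
print(mu_hat, var_hat)
```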
As the empirical distribution can be regarded as an approximation distribution to $F$, one can therefore take a set of random samples from $F_n$ instead of from $F$, as suggested by Efron [1]; this is called the bootstrap method. The bootstrap method is a resampling technique, where the random sample is taken from an approximation distribution $F_n$. Later, Efron gave a comprehensive study on the theory and application of the bootstrap method.
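A minimal sketch of the bootstrap idea follows: resampling with replacement from the empirical distribution $F_n$ to approximate the sampling distribution of a statistic. The data-generating model, sample size, and number of replicates are illustrative assumptions.

```python
# A minimal bootstrap sketch: resample from F_n and form a percentile
# confidence interval for the population mean. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(seed=1)
x = rng.exponential(scale=2.0, size=200)       # observed sample from some F

B = 2000                                       # number of bootstrap replicates
boot_means = np.array([
    rng.choice(x, size=x.size, replace=True).mean() for _ in range(B)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])  # percentile 95% CI for E(X)
print(lo, hi)
```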
The MC method has proven to be useful in statistical theory and applications. However, its efficiency is not always good due to the convergence rate of $F_n \to F$ in distribution, which is $O_p(n^{-1/2})$ as $n \to \infty$. The slow convergence leads to unsatisfactory approximations when performing numerical integration using the MC method. While the empirical distribution serves as one approximation to the true distribution $F$, alternative approaches have been proposed in the literature to address this issue.
  1.2. Number-Theoretic RPs or Quasi-Monte Carlo RPs
Let us consider the numerical calculation of high-dimensional integration in the canonical form

$$I(f) = \int_{C^d} f(x)\, dx, \qquad C^d = [0, 1]^d,$$

where $f$ is a continuous function on $C^d$. Let $\mathcal{P}_k = \{x_1, \dots, x_k\}$ be a set of $k$ points uniformly scattered on $C^d$. One can use the mean of $\{f(x_1), \dots, f(x_k)\}$, denoted by $\bar{f}(\mathcal{P}_k)$, to approximate $I(f)$. By the MC method, we can employ a random sample from the uniform distribution $U(C^d)$. The rate of convergence of $\bar{f}(\mathcal{P}_k) \to I(f)$ is then $O_p(k^{-1/2})$, which is relatively slow but does not depend on the dimensionality $d$. How to increase the convergence rate is an important subject in applications. The number-theoretic methods (NTM) or quasi-Monte Carlo (QMC) methods provide many constructions of $\mathcal{P}_k$ such that its points are uniformly scattered on $C^d$, by which the rate of convergence can be increased to $O(k^{-1}(\log k)^d)$. For the theory and methodology of NTM/QMC, one can refer to Hua and Wang [2] and Niederreiter [3]. In earlier studies on NTM, many authors employed the star discrepancy as a measure of the uniformity of $\mathcal{P}_k$ in $C^d$. The star discrepancy is defined by

$$D^*(\mathcal{P}_k) = \sup_{x \in C^d} \left| F_k(x) - F(x) \right|,$$

where $F$ is the cdf of $U(C^d)$ and $F_k$ is the empirical distribution of $\mathcal{P}_k$.
An optimal $\mathcal{P}_k$ has the minimum star discrepancy $D^*(\mathcal{P}_k)$. In this case the points in $\mathcal{P}_k$ are called QMC-RPs; they are support points of $F_k$, each having the equal probability $1/k$. Another popular measure is the $L_p$-distance between $F_k$ and $F$. When $F$ is the uniform distribution on $C^d$, the $L_p$-distance is called the $L_p$-discrepancy. The star discrepancy is the $L_p$-discrepancy as $p \to \infty$. In the literature, a set $\mathcal{P}_k$ under a certain structure is regarded as a set of quasirandom F-numbers if its discrepancy has an order $O(k^{-1}(\log k)^d)$ under the given discrepancy. When $F$ is the uniform distribution on $C^d$, the quasirandom F-numbers are called quasirandom numbers. The reader can refer to Fang and Wang [4] for details. While the $L_p$-discrepancy is in general computationally demanding, the $L_2$-discrepancy has a simple computational formula. There are more uniformity measures in experimental design, such as the centered $L_2$-discrepancy, wrap-around $L_2$-discrepancy, and mixture $L_2$-discrepancy (refer to Fang et al. [5]). Fang and Wang [4] and Fang et al. [6] gave comprehensive studies on NTM and its applications in statistical inference, experimental design, geometric probability, and optimization. Pagès [7] gave a detailed study on applications of QMC to financial mathematics. Section 6 will introduce some algorithms for the generation of QMC-RPs.
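The following sketch compares MC and QMC numerical integration, using Sobol' points from scipy.stats.qmc as one convenient quasirandom construction (an assumption; the NTM constructions discussed in [2,4] differ). The test function is our own choice, with exact integral 1 over $[0,1]^5$.

```python
# MC vs QMC integration of f(x) = prod_j (pi/2) sin(pi x_j) over [0,1]^5,
# whose true integral is 1. Sobol' points stand in for quasirandom numbers.
import numpy as np
from scipy.stats import qmc

d, m = 5, 12
n = 2**m                                        # number of points

def f(x):
    return np.prod(0.5 * np.pi * np.sin(np.pi * x), axis=1)

rng = np.random.default_rng(seed=2)
x_mc = rng.random((n, d))                       # Monte Carlo points
x_qmc = qmc.Sobol(d, scramble=True, seed=2).random_base2(m)   # Sobol' points

print(abs(f(x_mc).mean() - 1.0))    # MC error, roughly O(n^{-1/2})
print(abs(f(x_qmc).mean() - 1.0))   # QMC error, roughly O(n^{-1} (log n)^d)
```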
  1.3. Mean Square Error—RPs
Another measure for choosing a discrete approximation distribution to a given continuous distribution is the mean square error (MSE), and the corresponding support points are called MSE-RPs.
Definition 1. Suppose that a random vector $X$ in $\mathbb{R}^d$ has a density function $p(x)$ with finite mean vector and covariance matrix. A set of points $\{y_1, \dots, y_k\}$ of $\mathbb{R}^d$ is called MSE-RPs if it minimizes the mean square error (MSE)

$$\mathrm{MSE}(y_1, \dots, y_k) = E\left(\min_{1 \le j \le k} \|X - y_j\|^2\right), \tag{5}$$

where $\|\cdot\|$ denotes the $L_2$-norm on $\mathbb{R}^d$.

Given a set of $k$ points $\{y_1, \dots, y_k\}$ in $\mathbb{R}^d$, a set of regions is defined by

$$V_i = \{x \in \mathbb{R}^d : \|x - y_i\| \le \|x - y_j\|,\ j \ne i\}, \quad i = 1, \dots, k,$$

that are called Voronoi regions, where $V_i$ is the attraction domain of $y_i$.
For a univariate distribution $F$ with pdf $p(x)$, mean $\mu$ and variance $\sigma^2$, its MSE-RPs can be sorted as $y_1 < y_2 < \dots < y_k$, and the MSE can be expressed as

$$\mathrm{MSE}(y_1, \dots, y_k) = \sum_{i=1}^{k} \int_{a_{i-1}}^{a_i} (x - y_i)^2 p(x)\, dx,$$

where

$$a_0 = -\infty, \qquad a_i = \frac{y_i + y_{i+1}}{2}, \ i = 1, \dots, k-1, \qquad a_k = \infty.$$

The corresponding $F_k$ has support points $\{y_1, \dots, y_k\}$ with probabilities $\{p_1, \dots, p_k\}$, where

$$p_i = \int_{a_{i-1}}^{a_i} p(x)\, dx = F(a_i) - F(a_{i-1}), \quad i = 1, \dots, k.$$

Its loss function (LF) is defined by

$$\mathrm{LF} = \frac{\mathrm{MSE}(y_1, \dots, y_k)}{\sigma^2}.$$

It is known that $\sigma^2 = \mathrm{Var}(Y) + \mathrm{MSE}(y_1, \dots, y_k)$. The loss function shows what percentage of $\sigma^2$ is lost by using $Y$ to replace $X$.
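The sketch below evaluates these formulas for $N(0,1)$. The candidate points are approximate $k = 5$ MSE-RPs of the standard normal taken (to four decimals) from published quantizer tables; treat them as illustrative values.

```python
# Evaluate p_i, MSE, and LF = MSE / sigma^2 for candidate points of N(0,1).
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

y = np.array([-1.7241, -0.7646, 0.0, 0.7646, 1.7241])   # approximate RPs
a = np.concatenate(([-np.inf], (y[:-1] + y[1:]) / 2, [np.inf]))  # a_0..a_k

p = norm.cdf(a[1:]) - norm.cdf(a[:-1])          # p_i = F(a_i) - F(a_{i-1})
mse = sum(
    quad(lambda x, yi=yi: (x - yi)**2 * norm.pdf(x), lo, hi)[0]
    for yi, lo, hi in zip(y, a[:-1], a[1:])
)
var_y = np.sum(p * y**2)      # Var(Y); E(Y) = 0 here by symmetry
print(p)                      # probabilities of the Voronoi cells
print(mse)                    # MSE = LF, since sigma^2 = 1 for N(0,1)
print(var_y + mse)            # close to 1: the variance decomposition
```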
The concept of MSE-RPs has been motivated by various problems. In the context of the grouping problem, Cox [
8] considered the task of condensing observations of a variate into a limited number of groups, where the grouping intervals are selected to retain maximum information. He introduced the concept of mean squared error (MSE) and provided several sets of MSE-RPs for the standard normal distribution. The concept of MSE-RPs is also relevant in data transmission systems, where analog input signals are converted to digital form, transmitted, and then reconstituted as analog signals at the receiver. The problem of optimal quantization of a continuous random variable with a fixed number of levels was precisely defined by Max [
9]. In IEEE journals, MSE-RPs are also referred to as “quantizers”.
Fang and He [
10] formulated the mathematical problem based on the Chinese national garment standard (refer to Fang [
11]). Iyengar and Solomon [
12] considered the mathematical problems that arise in the theory of representing a distribution by a few optimally chosen points. Flury [13,14] studied a project of the Swiss army to replace existing protection masks with newly designed ones. He used the term “principal points” for MSE-RPs due to a link between principal components and MSE-RPs. MSE-RPs have also been applied to select a few “representative” curves from a large collection of curves, which is useful for kernel density estimation (see Flury and Tarpey [
15]) and for psychiatric studies by Tarpey and Petkova [
16]. Furthermore, MSE-RPs can be applied to problems related to the numerical computation of conditional expectations, stochastic differential equations, and stochastic partial differential equations. These applications are often motivated by challenges encountered in the field of finance [
7]. There was a special issue of “IEEE Transactions on Information Theory” on vector quantization in 1982, and a very detailed review on quantization was given by Gray and Neuhoff [
17]. There are several monographs on the theory and applications of RPs, for example, Graf and Luschgy [
18] “Foundations of Quantization for Probability Distributions” and Pagès [7] “Numerical Probability, An Introduction with Applications to Finance”.
The use of different types of representative points (RPs) allows for the construction of diverse approximation distributions to represent the underlying population distribution. By utilizing these approximation distributions, researchers can make more reliable and precise statistical inferences. The objective of this paper is to provide a comprehensive review of various types of RPs and their associated theory, algorithms, and applications. The focus of this review extends to the examination of recent advancements in the field, highlighting the latest developments and emerging trends. This paper aims to offer valuable insights into the current state of the art and provide researchers and practitioners with a deeper understanding of the potential applications and implications of RPs in statistical science. In 
Section 2, we present a comprehensive list of properties associated with MSE-RPs for univariate distributions. 
Section 3 focuses on reviewing various algorithms used for generating MSE-RPs for univariate distributions. In 
Section 4, we compare various types of RPs in terms of their performance in stochastic simulation and resampling. Additionally, we show the consistency of resampling when MSE-RPs are used. Properties of MSE-RPs for multivariate distributions are reviewed in 
Section 5, and algorithms for generating QMC-RPs and MSE-RPs for multivariate distributions are introduced in 
Section 6. QMC-RPs and MSE-RPs have found numerous applications across various domains. In this paper, we focus on selected applications in statistical inference and geometric probability due to space limitations.
  2. Properties of MSE-RPs for Univariate Distributions
We collect in this section some properties of MSE-RPs from the literature. These properties can be grouped into different issues: some hold only for univariate distributions, while others hold for multivariate ones. The following results are from many articles, including Fei [19], under the notation of the previous section.
Theorem 1. Let X be a continuous random variable with pdf $p(x)$, finite mean μ and variance $\sigma^2$, and let Y be the discrete random variable supported on a set of MSE-RPs $\{y_1, \dots, y_k\}$ with probabilities $\{p_1, \dots, p_k\}$. Then we have
(A) $E(Y) = E(X) = \mu$;
(B) $\mathrm{Var}(Y) \le \mathrm{Var}(X)$;
(C) $\mathrm{Var}(X) = \mathrm{Var}(Y) + \mathrm{MSE}(y_1, \dots, y_k)$.

The property (A) can be regarded as “unbiased mean”. The property (C) gives a decomposition of the variance of X as

$$\sigma^2 = \mathrm{Var}(Y) + \mathrm{MSE}(y_1, \dots, y_k).$$
The concept of self-consistency has been used in clustering analysis and has a close relation with MSE-RPs.

Definition 2. The set of k points $\{y_1, \dots, y_k\}$ in $\mathbb{R}^d$ is called self-consistent with respect to the d-variate random vector X and the partition $\{V_1, \dots, V_k\}$ of $\mathbb{R}^d$ if

$$E(X \mid X \in V_i) = y_i, \quad i = 1, \dots, k,$$

where the region $V_i$ is the domain of attraction of $y_i$.

Tarpey and Flury [20] gave a comprehensive study on self-consistency, and they pointed out that
- (1) MSE-RPs are self-consistent with respect to X;
- (2) MSE-RPs have the minimum mean square error among all sets of points that are self-consistent to X.
  2.1. Existence and Uniqueness of MSE-RPs
The existence of MSE-RPs is not a problem for any continuous distribution with finite first and second moments. For the case of $k = 1$, the MSE-RP $y_1$ is the mean of X. This fact indicates that MSE-RPs can be regarded as an extension of the mean. There is no analytic formula for the MSE-RPs in most cases with $k > 1$, but there are some discoveries for symmetric distributions. In this paper, the notation $X \stackrel{d}{=} Y$ means that the two random vectors X and Y have the same distribution.
Definition 3. A random vector $X$ is symmetric about $a \in \mathbb{R}^d$ if $X - a \stackrel{d}{=} a - X$, and X is symmetric about its mean vector $\mu$ if $X - \mu \stackrel{d}{=} \mu - X$.

Theorem 2. Let $\{y_1, \dots, y_k\}$ be a set of MSE-RPs for a distribution $F$ symmetric about $\mu$; then the set $\{2\mu - y_1, \dots, 2\mu - y_k\}$ is also a set of MSE-RPs. Furthermore, if the set of MSE-RPs for $F$ is unique, and its MSE-RPs are sorted as $y_1 < \dots < y_k$, then

$$y_i + y_{k+1-i} = 2\mu, \quad i = 1, \dots, \lfloor k/2 \rfloor,$$

where $\lfloor a \rfloor$ is the largest integer not exceeding a.

The following review is for a univariate distribution $F$ with mean $\mu$ and variance $\sigma^2$. Sharma [21] pointed out that the MSE-RPs of a distribution symmetric about zero need not be symmetric if the set of MSE-RPs is not unique.
Theorem 3. Let X be a continuous random variable with pdf $p(x)$, finite mean μ and variance $\sigma^2$, and let the distribution of X be symmetric about μ. Let $\delta = E|X - \mu|$. The two MSE-RPs of X are

$$y_1 = \mu - \delta, \qquad y_2 = \mu + \delta,$$

and the corresponding MSE is $\sigma^2 - \delta^2$, with the related $\mathrm{LF} = 1 - \delta^2/\sigma^2$, if and only if

$$2\,\delta\, p(\mu) < 1. \tag{12}$$

This theorem was presented in Flury [13]. If the condition (12) does not hold, Gu and Mathew [22] gave a detailed study of some characterizations of symmetric two MSE-RPs. Their results are listed below.
Let X be a random variable with density $p(x)$ symmetric about the mean $\mu$ and continuous at $\mu$; then
- (a) If $2\delta p(\mu) < 1$, then $\mu - \delta$ and $\mu + \delta$ are 2 MSE-RPs of X;
- (b) If $2\delta p(\mu) > 1$, the above points do not provide a local minimum of the MSE.

They pointed out that Theorem 3 needs to be modified and gave a counterexample, the standard symmetric exponential distribution with pdf $p(x) = \frac{1}{2}e^{-|x|}$ and mean $\mu = 0$. It is easy to find that $2\delta p(0) = 1$, so condition (12) fails, yet $\pm\delta = \pm 1$ are MSE-RPs.
More examples are discussed in their article. If two random variables $Z$ and $X$ have the relationship $Z = a + bX$ and the MSE-RPs of $X$ are known, then the MSE-RPs of $Z$ can be easily obtained (Fang and He [10] and Zoppè [23]).

Theorem 4. Let $\{y_1, \dots, y_k\}$ be MSE-RPs of X; then $Z = a + bX$ has MSE-RPs $\{a + by_1, \dots, a + by_k\}$, with MSE equal to $b^2\,\mathrm{MSE}(y_1, \dots, y_k)$.

There are three special families that satisfy the above relationship: the location-scale family ($Z = a + bX$), the location family ($Z = a + X$), and the scale family ($Z = bX$).
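A small illustration of Theorem 4 follows; the standard-normal points and MSE below are approximate values for $k = 3$, used only for illustration.

```python
# Theorem 4 sketch: MSE-RPs of Z = a + bX from those of X ~ N(0,1).
import numpy as np

y_std = np.array([-1.224, 0.0, 1.224])   # approximate k = 3 MSE-RPs of N(0,1)
a, b = 2.0, 3.0                          # Z = a + bX, so Z ~ N(2, 9)
y_z = a + b * y_std                      # MSE-RPs of Z
mse_std = 0.1902                         # approximate MSE for k = 3, N(0,1)
print(y_z, b**2 * mse_std)               # the MSE scales by b^2
```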
The study of the uniqueness of MSE-RPs is a challenging problem. Fleischer [24] gave a sufficient condition, log-concavity of the density, for the uniqueness of the MSE-RPs, and Trushkin [25] proved that a log-concave probability density function has a unique set of MSE-RPs.
Definition 4. A continuous random variable X is said to have a log-concave density $p(x)$ if it satisfies

$$p(\lambda x_1 + (1 - \lambda) x_2) \ge p(x_1)^{\lambda}\, p(x_2)^{1 - \lambda}$$

for all $\lambda \in (0, 1)$ and all $x_1, x_2$ in the support of X.

Log-concavity of the density is a well-known property, satisfied by a large number of remarkable distributions including the normal distribution. Table 1 lists some log-concave densities, where the kernel of $p(x)$ ignores constants not depending on $x$, so that the condition for log-concavity of $p(x)$ remains the same. The exponential distribution is the special case of the gamma distribution with shape parameter $\alpha = 1$, and the uniform distribution $U(0, 1)$ is the special case of the beta distribution with $\alpha = \beta = 1$.
Example 1. A finite mixture of distributions allows for great flexibility in capturing a variety of density shapes. Research into mixture models has a long history. The most cited early publication is Pearson [26], as he used a two-component normal mixture model for a biometric data set. The density of a mixture of two normal distributions, denoted by $\alpha N(\mu_1, \sigma_1^2) + (1 - \alpha) N(\mu_2, \sigma_2^2)$ with $0 < \alpha < 1$, is

$$p(x) = \frac{\alpha}{\sigma_1}\, \phi\!\left(\frac{x - \mu_1}{\sigma_1}\right) + \frac{1 - \alpha}{\sigma_2}\, \phi\!\left(\frac{x - \mu_2}{\sigma_2}\right),$$

where $\phi$ is the standard normal density. Li et al. [27] gave a detailed study on several aspects of this distribution: “unimodal or bimodal”, “measure of disparity of two normals”, and “uniqueness of MSE-RPs”. Generally, the uniqueness of MSE-RPs is not always true, but it holds under some conditions. For example, for a location mixture of two normal densities with common variance, Li et al. [27] give a condition on the parameters under which the set of MSE-RPs is unique.

  2.2. Asymptotic Behavior of MSE-RPs
There are a lot of studies on the asymptotic behavior of MSE-RPs; for example, see Zador [
28], Su [
29], Graf and Luschgy [
18], and Pagès [
7]. It is well known that the distribution tail strongly influences statistical inference. According to different standards, there are many classification methods for statistical distributions. Embrechts et al. [30] classified distributions based on the convergence rate of the pdf $p(x)$ as $x \to \infty$, and they defined the so-called heavy-tailed and light-tailed distributions, in which the exponential distribution is used as the standard for comparison. The following formal definitions are from Foss et al. [31].
Definition 5. The univariate random variable X with distribution function F is said to have a heavy tail if

$$\int_{-\infty}^{\infty} e^{\lambda x}\, dF(x) = \infty \quad \text{for all } \lambda > 0.$$

Otherwise, F is said to have a light tail.

Obviously, any univariate random variable supported on a bounded interval is light-tailed. In fact, this definition intuitively reflects that the tail of a heavy-tailed distribution is heavier than the tail of any exponential distribution. Moreover, the long-tailed distributions form an important subclass of the heavy-tailed distributions and are commonly used in applications. The formal definition of a long-tailed distribution was given by Foss et al. [31] as follows.
Definition 6. The univariate random variable X with distribution function F is said to be long-tailed if

$$\lim_{x \to \infty} \frac{\overline{F}(x + y)}{\overline{F}(x)} = 1 \quad \text{for all } y > 0,$$

or equivalently for $y = 1$, where $\overline{F}(x) = 1 - F(x)$.

Xu et al. [32] studied the limiting behavior of the gap between the largest two representative points of a statistical distribution and obtained another kind of classification for the most useful univariate distributions. They illustrate the relationship between RPs and the concepts of doubly truncated mean residual life (DMRL) and mean residual life (MRL), which are widely used in survival analysis. They consider three kinds of distributions according to the domain of the distribution, i.e., $(-\infty, \infty)$, $(0, \infty)$, and a finite interval.
Table 2 shows the limiting value of the gap $y_k - y_{k-1}$ for the normal, t, and logistic distributions. Their density functions are

$$\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad p_{\nu}(x) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \qquad p(x) = \frac{e^{-x}}{(1 + e^{-x})^2},$$

respectively. It is surprising that the normal distribution and the t distribution have such different behavior, although the normal distribution is the limiting distribution of Student's t distribution as the degrees of freedom $\nu \to \infty$.
Table 3 presents the limiting value of $y_k - y_{k-1}$ for many useful distributions on $(0, \infty)$. These distributions include the Weibull distribution with density

$$p(x) = \frac{\beta}{\alpha}\left(\frac{x}{\alpha}\right)^{\beta - 1} e^{-(x/\alpha)^{\beta}}, \quad x > 0,$$

the gamma and exponential distributions with respective densities

$$p(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x} \quad \text{and} \quad p(x) = \lambda e^{-\lambda x}, \quad x > 0,$$

the density of the F-distribution with degrees of freedom $m$ and $n$,

$$p(x) = \frac{\Gamma\!\left(\frac{m+n}{2}\right)}{\Gamma\!\left(\frac{m}{2}\right)\Gamma\!\left(\frac{n}{2}\right)} \left(\frac{m}{n}\right)^{m/2} x^{m/2 - 1} \left(1 + \frac{m}{n}\, x\right)^{-\frac{m+n}{2}}, \quad x > 0,$$

the beta prime distribution with density

$$p(x) = \frac{x^{\alpha - 1}(1 + x)^{-\alpha - \beta}}{B(\alpha, \beta)}, \quad x > 0,$$

the lognormal distribution with density

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma x} \exp\!\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0,$$

and the inverse Gaussian distribution with density

$$p(x) = \sqrt{\frac{\lambda}{2\pi x^3}} \exp\!\left(-\frac{\lambda (x - \mu)^2}{2\mu^2 x}\right), \quad x > 0.$$

Based on these results, Xu et al. [32] gave Theorem 5.
Theorem 5. If the univariate random variable X supported on $(0, \infty)$ is long-tailed, then the gap $y_k - y_{k-1}$ between the largest two MSE-RPs tends to infinity as $k \to \infty$.
, Xu et al. [
32] gave a systematic study including the following result.
Theorem 6. Suppose that a random variable X has continuous probability density function  on  and . Let  be the k MSE-RPs of X. If  converges uniformly to , , thenprovided that the above limit exists.    3. Algorithms for Generation of MSE-RPs of Univariate Distributions
Generation of MSE-RPs is very important for applications. This section reviews algorithms for the generation of MSE-RPs of univariate distributions. Minimizing the mean square error (5) is an optimization problem that involves several difficulties:
The objective function is a multivariate function on the simplex $\{y_1 < y_2 < \dots < y_k\}$;
The objective function might not be differentiable on the whole domain;
The minimum of the objective function is not unique, and the objective function may have multiple local minima on the domain.
Problems of this kind cannot be directly solved by classical optimization methods (such as the downhill simplex method, quasi-Newton methods, and conjugate gradient methods) for most distributions.
There are three main approaches for the generation of RPs:
- (a) The theoretic approach, or a combination of the theoretic approach and computational calculation;
- (b) Applying the k-means method to find approximate RPs; this approach can be applied to all univariate and multivariate distributions;
- (c) Solving a system of nonlinear equations.

Approach (a) can be used for very few distributions, such as the uniform distribution on a finite interval. The authors of [33] proposed a method for finding MSE-RPs of the exponential and Laplace distributions by combining the theoretic approach and computational calculation.
Approach (b) applies the k-means method to any continuous univariate or multivariate distribution. The traditional k-means algorithm needs a set of n observations from the underlying distribution $F$, and the user needs to cluster those observations into k groups under a loss function. The k-means algorithm begins with k arbitrary centers. Each observation is then assigned to the nearest center, and each center is recomputed as the center of mass of all points assigned to it. These steps (assignment and center calculation) are repeated until the process stabilizes. One can check that the total error is monotonically decreasing, which ensures that no clustering is repeated during the course of the algorithm. The mean square error (MSE; see Definition 1) has been popularly used as the error criterion.
It seems to us that Pollard [34] was the first to propose this approach. Along this line, Lloyd [35] proposed two trial-and-error methods. This approach is easy to implement, but it needs a good initial set and a large number of training samples. There are two kinds of k-means algorithms: nonparametric and parametric. If the population distribution is known, the training samples are drawn from the known population distribution, and the corresponding k-means algorithm is parametric; otherwise, the underlying distribution is unknown, and the corresponding k-means algorithm is nonparametric. Usually, the parametric k-means algorithm is more accurate for most univariate distributions.
The parametric k-means algorithm
- (1) For a given pdf $p(x)$, the number of RPs $k$, and a tolerance $\varepsilon > 0$, input a set of initial points $y_1^{(0)} < \dots < y_k^{(0)}$ and set $t = 0$. Determine a partition of $\mathbb{R}$ as

$$V_i^{(t)} = \left(a_{i-1}^{(t)}, a_i^{(t)}\right], \quad i = 1, \dots, k,$$

where

$$a_0^{(t)} = -\infty, \qquad a_i^{(t)} = \frac{y_i^{(t)} + y_{i+1}^{(t)}}{2}, \ i = 1, \dots, k-1, \qquad a_k^{(t)} = \infty.$$

- (2) Calculate the probabilities

$$p_i^{(t)} = \int_{V_i^{(t)}} p(x)\, dx, \quad i = 1, \dots, k,$$

and the conditional means

$$y_i^{(t+1)} = E\left(X \mid X \in V_i^{(t)}\right) = \frac{1}{p_i^{(t)}} \int_{V_i^{(t)}} x\, p(x)\, dx, \quad i = 1, \dots, k.$$

- (3) If the two sets $\{y_i^{(t)}\}$ and $\{y_i^{(t+1)}\}$ are identical (within the tolerance $\varepsilon$), the process stops and delivers $\{y_i^{(t+1)}\}$ as MSE-RPs of the distribution with probabilities $\{p_i^{(t)}\}$; otherwise, let $t = t + 1$ and go back to Step (1).
Stampfer and Stadlober [36] called this algorithm the self-consistency algorithm, as the output set of RPs is self-consistent (though not necessarily a set of MSE-RPs).
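Below is a minimal sketch of this self-consistency iteration for the standard normal distribution, where the cell probabilities and conditional means have closed forms; the initial points, tolerance, and helper name are our own choices.

```python
# Parametric k-means (self-consistency) sketch for N(0,1).
import numpy as np
from scipy.stats import norm

def normal_mse_rps(k, tol=1e-10, max_iter=10_000):
    """Iterate partition -> conditional means until the points stabilize."""
    y = np.linspace(-2.0, 2.0, k)          # initial points, sorted
    for _ in range(max_iter):
        a = np.concatenate(([-np.inf], (y[:-1] + y[1:]) / 2, [np.inf]))
        p = norm.cdf(a[1:]) - norm.cdf(a[:-1])   # cell probabilities
        # For N(0,1): E(X | a < X <= b) = (phi(a) - phi(b)) / (Phi(b) - Phi(a))
        y_new = (norm.pdf(a[:-1]) - norm.pdf(a[1:])) / p
        if np.max(np.abs(y_new - y)) < tol:
            y = y_new
            break
        y = y_new
    mse = 1.0 - np.sum(p * y**2)   # sigma^2 - Var(Y), using E(Y) = 0 here
    return y, p, mse

points, probs, mse = normal_mse_rps(5)
print(points, probs, mse)          # points approximately symmetric about 0
```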
Approach (c) was proposed by Max [9] and Fang and He [10], based on traditional optimization for minimizing the mean square error as a function of $(y_1, \dots, y_k)$: where the objective function (5) is differentiable, one takes its partial derivatives with respect to the $y_i$ and constructs a system of equations. The solutions of this system might attain the global minimum, i.e., be MSE-RPs. For the normal distribution there are three kinds of equations, given in (13), (14), and (15), respectively. Fang and He [10] gave the conditions for the solution to be unique under the normal distribution.
Theorem 7. Taking partial derivatives of the MSE with respect to $y_1, \dots, y_k$, we obtain three kinds of equations:
- 1. For the first point, equation (13) has a solution if and only if a condition given in [10] holds.
- 2. For a given previous point, equation (14) has a solution under a condition involving the corresponding representative point in the set of MSE-RPs.
- 3. For the last point, equation (15) always has a solution.
The Fang–He algorithm has been applied to many univariate distributions. Max [9] and Fang and He [10] obtained sets of MSE-RPs of the standard normal distribution for a range of values of k. Fu [37] applied the Fang–He algorithm to the gamma distribution and obtained its MSE-RPs. Ke et al. [38] gave a more advanced study on MSE-RPs of the gamma distribution. Zhou and Wang [39] studied the t distribution with 10 degrees of freedom and gave its MSE-RPs. Fei [40] proposed an algorithm for generating MSE-RPs by the Newton optimization algorithm. Li et al. [27] gave a detailed study on MSE-RPs of mixtures of normal distributions. Fei [41] studied the class of Pearson distributions, where the pdf of $X$ has the form

$$p(x) = c \exp\left\{ \int \frac{a - x}{b_0 + b_1 x + b_2 x^2}\, dx \right\},$$

where $c$ is the normalizing constant and the parameters $(a, b_0, b_1, b_2)$ satisfy the differential equation

$$\frac{p'(x)}{p(x)} = \frac{a - x}{b_0 + b_1 x + b_2 x^2}.$$

The class of Pearson distributions includes many useful distributions. For example, type I is the beta distribution; type II is the symmetrical U-shaped curve; type III is the shifted gamma distribution; type V is the shifted inverse gamma distribution; type VI is the inverse beta distribution; type VII is the t distribution; type VIII is the power function distribution; type X is the exponential distribution; and type XI is the normal distribution. Fei [41] gave some sufficient conditions for the uniqueness of the solution.
Comparing the three approaches for generating MSE-RPs: approach (a) is obviously the best, but it applies only to a few distributions. Approach (b) can be applied to any continuous univariate or multivariate distribution; for the generation of univariate MSE-RPs, the parametric k-means algorithm does not need a training sample, and many authors have used this algorithm with a good initial set of points. Approach (c) can find the most accurate MSE-RPs of univariate distributions, but it needs heavy computation when k is large.
  5. Properties of MSE-RPs of Multivariate Distributions
Let $X = (X_1, \dots, X_d)'$ be a random vector with cdf $F(x)$ and pdf $p(x)$. Assume that X has a finite mean vector $\mu$ and covariance matrix $\Sigma$. A set of MSE-RPs of X is denoted by $\{y_1, \dots, y_k\}$; it minimizes the mean square error (MSE), and the corresponding discrete random vector is $Y$, with Voronoi regions $\{V_1, \dots, V_k\}$ and probabilities $\{p_1, \dots, p_k\}$ (refer to Definition 1). The following results are from Flury [13,14].
Theorem 11. Under the above assumptions on X, we have:
- 1. When $k = 1$, the MSE-RP is given by $y_1 = \mu$;
- 2. $\mu = \sum_{i=1}^{k} p_i y_i$, i.e., $\mu$ is in the convex hull of $\{y_1, \dots, y_k\}$;
- 3. MSE-RPs are self-consistent and

$$\Sigma = \Sigma_Y + E\left[\mathrm{Cov}(X \mid Y)\right],$$

where $\Sigma_Y$ is the covariance matrix of Y;
- 4. The rank of $\Sigma_Y$ is at most $k - 1$.
Theorems 2 and 4 can be easily extended to the multivariate case, but Theorem 4 needs some change in the linear relation for the extension below.

Theorem 12. Let $X$ and $Z$ be two random vectors in $\mathbb{R}^d$ with the relation $Z = \lambda \Gamma X + b$, where $\lambda > 0$, $b \in \mathbb{R}^d$, and $\Gamma$ is an orthogonal matrix of order d. We have:
- (a) If $\{y_1, \dots, y_k\}$ is a set of self-consistent points of $X$, then $\{\lambda \Gamma y_1 + b, \dots, \lambda \Gamma y_k + b\}$ is a set of self-consistent points of $Z$;
- (b) If $\{y_1, \dots, y_k\}$ is a set of MSE-RPs of $X$, then $\{\lambda \Gamma y_1 + b, \dots, \lambda \Gamma y_k + b\}$ is a set of MSE-RPs of $Z$.
There are various kinds of symmetry in multivariate distributions, among which the class of elliptically symmetric distributions is an extension of the multivariate normal distribution and includes many useful distributions. For a comprehensive study, refer to Fang et al. [46].
Definition 9. Spherically and elliptically symmetric distributions. A d-dimensional random vector X is said to have an elliptically symmetric distribution (ESD), or elliptical distribution for short, if X has the following stochastic representation (SR):

$$X \stackrel{d}{=} \mu + R\, \Psi^{1/2} U^{(d)}, \tag{24}$$

where the random variable $R \ge 0$ is independent of $U^{(d)}$, which is uniformly distributed on the unit sphere in $\mathbb{R}^d$, $\mu \in \mathbb{R}^d$, $\Psi$ is a positive definite matrix of order d (not necessarily the covariance matrix $\Sigma$), and $\Psi^{1/2}$ is the positive definite square root of $\Psi$. We write $X \sim ED_d(\mu, \Psi, g)$ if X has a density of the form

$$p(x) = c_d\, |\Psi|^{-1/2}\, g\!\left((x - \mu)' \Psi^{-1} (x - \mu)\right),$$

where g is called the density generator. When $\mu = 0$ and $\Psi = I_d$, X has a spherical distribution with the stochastic representation

$$X \stackrel{d}{=} R\, U^{(d)}, \tag{25}$$

and we write $X \sim SD_d(g)$, where $c_d\, g(x'x)$ is the density of X.

In general, an elliptical/spherical distribution does not necessarily have a density. For example, $U^{(d)}$ does not have a density in $\mathbb{R}^d$. If the distribution of X is spherical and $P(X = 0) = 0$, then $R \stackrel{d}{=} \|X\|$ and $U^{(d)} \stackrel{d}{=} X/\|X\|$ are independent. It is known that X defined in (24) has a density if and only if R has a density $p_R(r)$. The relationship between $p_R$ and the density generator g is given by

$$p_R(r) = \frac{2\pi^{d/2}}{\Gamma(d/2)}\, c_d\, r^{d-1} g(r^2), \quad r \ge 0.$$

Table 5 lists some useful subclasses of the elliptical distributions.
Flury [13] was the first to find relationships between the principal components and the MSE-RPs of elliptical distributions, given in the following theorems.

Theorem 13. Suppose $X \sim ED_d(\mu, \Psi, g)$ with mean vector $\mu$, covariance matrix $\Sigma$ that is proportional to $\Psi$, and density generator g. Then the two MSE-RPs of X are of the form

$$y_j = \mu + \theta_j v_1, \quad j = 1, 2,$$

where $v_1$ is the normalized characteristic vector associated with the largest eigenvalue of $\Sigma$, and $\theta_1, \theta_2$ are the two MSE-RPs of the univariate random variable $v_1'(X - \mu)$. If the MSE-RPs are not unique, they can be chosen in the given form.

Tarpey et al. [47] established a theorem, called the principal subspace theorem, which shows that the k principal points of an elliptically symmetric distribution lie in the linear subspace spanned by the first several principal components.

Theorem 14. Let $X \sim ED_d(\mu, \Psi, g)$. If a set of k MSE-RPs of X spans a subspace $\mathcal{L}$ of dimension $q < d$, then $\Sigma$ has a set of eigenvectors $v_1, \dots, v_q$ with associated ordered eigenvalues $\lambda_1 \ge \dots \ge \lambda_q$ such that $\mathcal{L}$ is spanned by $v_1, \dots, v_q$.
The principal subspace theorem shows that the set of MSE-RPs of an elliptical distribution has a close relationship with its principal components; this is why Flury [13] called MSE-RPs principal points. Tarpey [48] and Yang et al. [43] proposed ways to generate sets of MSE-RPs in several subclasses of elliptical distributions and explored further relationships between the principal components and MSE-RPs. Their studies need algorithms for producing MSE-RPs.
Yang et al. [43] considered numerical simulation for the estimation of the mean vector and covariance matrix of elliptical distributions and showed that both QMC-RPs and MSE-RPs have better performance than MC-RPs. They also studied the distribution of the MSE of MC-RPs for univariate distributions and elliptical distributions and pointed out that the MSE of MC-RPs can be fitted by an extreme value distribution. For a random sample with a poor MSE value, one cannot expect good results based on that set of random samples.
  6. Algorithms for the Generation of RPs of Multivariate Distributions
There are many methods for generating a random sample from a given multivariate distribution $F$. Johnson [49] gave a good introduction to various methods. Two useful methods are conditional decomposition and stochastic representation.
  6.1. Conditional Decomposition
The conditional distribution method turns generation from a multivariate distribution into generation from several univariate conditional distributions. Suppose the random vector $X = (X_1, \dots, X_d)$ has cdf $F(x_1, \dots, x_d)$. Let $F_1(x_1)$ be the cdf of $X_1$ and let $F_j(x_j \mid x_1, \dots, x_{j-1})$ be the conditional distribution of $X_j$ given $X_1 = x_1, \dots, X_{j-1} = x_{j-1}$, $j = 2, \dots, d$. It is known from probability theory that

$$F(x_1, \dots, x_d) = F_1(x_1)\, F_2(x_2 \mid x_1) \cdots F_d(x_d \mid x_1, \dots, x_{d-1}).$$

Note that each of $F_1$ and $F_j(\cdot \mid \cdots)$ is a univariate (conditional) distribution. We can apply methods including the inverse transformation method to generate a random sample from these distributions. Denote a set of random samples from these univariate (conditional) distributions by $x_1, \dots, x_d$; then $x = (x_1, \dots, x_d)$ is a random sample from X. In particular, when $X_1, \dots, X_d$ are independent, $F(x_1, \dots, x_d) = F_1(x_1) \cdots F_d(x_d)$, where $F_j$ is the cdf of $X_j$.
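As a sketch of the conditional decomposition method, consider a bivariate normal with standard margins and correlation $\rho$ (our illustrative example): $X_1 \sim N(0,1)$ and $X_2 \mid X_1 = x_1 \sim N(\rho x_1, 1 - \rho^2)$; uniforms are pushed through the inverse cdfs.

```python
# Conditional decomposition sampling of a bivariate normal via the
# inverse transformation method. rho, n, and seed are illustrative.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=3)
rho, n = 0.6, 10_000
u = rng.random((n, 2))                         # uniforms on [0,1]^2

x1 = norm.ppf(u[:, 0])                         # sample from F_1
x2 = rho * x1 + np.sqrt(1 - rho**2) * norm.ppf(u[:, 1])   # from F_2(.|x1)
print(np.corrcoef(x1, x2)[0, 1])               # approximately rho
```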
  6.2. Stochastic Representation
Let $X \sim F$. Suppose that X has a stochastic representation

$$X \stackrel{d}{=} h(Y),$$

where $h = (h_1, \dots, h_d)$ is a set of continuous functions on $[0, 1]^m$ and Y follows the uniform distribution on $[0, 1]^m$. Monte Carlo simulation can produce a random sample $\{y_1, \dots, y_n\}$ from $U([0, 1]^m)$. Then $\{h(y_1), \dots, h(y_n)\}$ is a random sample from $F$.

The SR method can be extended to generate a set of QMC-RPs and MSE-RPs. The QMC method employs a set of quasirandom numbers on $[0, 1]^m$, denoted by $\{c_1, \dots, c_k\}$. Set $x_i = h(c_i)$, $i = 1, \dots, k$. Then the set $\{x_1, \dots, x_k\}$ is called a set of quasirandom F-numbers, which can be regarded as another kind of RPs of $F$, i.e., NTM-RPs or QMC-RPs.
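A sketch under simplifying assumptions: we take $h$ to be the componentwise inverse cdf of independent standard normal margins, and scrambled Sobol' points (so that no coordinate is exactly 0 or 1) as the quasirandom numbers.

```python
# Quasirandom F-numbers via a stochastic representation: apply the inverse
# cdfs componentwise to quasirandom points on [0,1]^2.
import numpy as np
from scipy.stats import norm, qmc

c = qmc.Sobol(d=2, scramble=True, seed=4).random_base2(7)   # 128 points
x = norm.ppf(c)            # h(c) = (Phi^{-1}(c_1), Phi^{-1}(c_2))
# x is a set of 128 quasirandom F-numbers (QMC-RPs) of N_2(0, I)
print(x.mean(axis=0))      # close to (0, 0)
```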
Generating a set of exact MSE-RPs is not possible for most multivariate distributions. If we focus on classes of multivariate distributions that are easily generated by MC or QMC, the generation of MSE-RPs becomes much easier. One method is the LBG algorithm based on the k-means method, proposed by Linde et al. [50]. The LBG algorithm requires a training sequence $\{x_1, \dots, x_N\}$ from the given distribution $F$, generated by a Monte Carlo method, where N is much larger than k, and k is the number of RPs for $F$. The next step chooses a set of initial vectors using the same Monte Carlo method and finds the associated Voronoi partition by assigning each $x_i$ to the nearest region of the partition. Then one follows the procedure of the k-means algorithm and iterates until the stopping rule is reached.
Although the LBG algorithm can reach a locally optimal output with non-increasing MSE, Fang et al. [51] pointed out two problems when applying this algorithm:
- (a) The algorithm gives a local optimum, and the results depend on the initial points;
- (b) The generation of the samples of $F$ and the calculation of the MSE are based on the Monte Carlo method, which is less efficient, with convergence rate $O_p(N^{-1/2})$.
Fang et al. [51] revised the LBG algorithm by using quasirandom F-numbers to produce the set of training samples and the set of initial vectors. They proposed the so-called NTLBG algorithm for the generation of QMC-RPs of an elliptical distribution.
Recall that a spherical distribution $X \sim SD_d(g)$ has the SR $X \stackrel{d}{=} R\, U^{(d)}$ in (25). If we can find a set of quasirandom numbers of the uniform distribution on the unit sphere and a set of quasirandom numbers of $R$, their product can produce a set of QMC-RPs of X. An effective algorithm for generating a set of QMC-RPs on the unit sphere is given by Fang and Wang [4], who call it the TFWW algorithm. It is easy to see that if $X \sim SD_d(g)$, then $\Gamma X \stackrel{d}{=} X$ for any orthogonal matrix $\Gamma$ of order d. Therefore, if $\{y_1, \dots, y_k\}$ is a set of MSE-RPs of X, then $\{\Gamma y_1, \dots, \Gamma y_k\}$ is also a set of MSE-RPs of X. This means that the set of MSE-RPs of a spherical distribution is far from unique.
  6.3. The NTSR Algorithm for the Generation of a Spherical Distribution
- Generate a set of quasirandom numbers $\{c_i = (c_{i1}, \dots, c_{id}),\ i = 1, \dots, k\}$ on $[0, 1]^d$.
- Denote the cdf of R by $F_R(r)$ and let $F_R^{-1}$ be its inverse function. Compute $r_i = F_R^{-1}(c_{id})$, $i = 1, \dots, k$.
- Generate a set of quasirandom numbers $\{u_1, \dots, u_k\}$ of the uniform distribution on the unit sphere in $\mathbb{R}^d$ from the first $(d - 1)$ components of the $c_i$'s via the TFWW algorithm.
- Then $\{x_i = r_i u_i,\ i = 1, \dots, k\}$ is a set of quasirandom F-numbers, or QMC-RPs, of the given spherical distribution $SD_d(g)$.
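A sketch of the NTSR steps for $d = 2$ and the bivariate standard normal, for which $R \sim \chi(2)$; note that for $d = 2$ the uniform direction is obtained from a single angle, so the general TFWW algorithm is bypassed (a simplification).

```python
# NTSR sketch for d = 2: quasirandom points drive the radius through
# F_R^{-1} and the direction through an angle on the unit circle.
import numpy as np
from scipy.stats import chi, qmc

c = qmc.Sobol(d=2, scramble=True, seed=5).random_base2(7)   # k = 128 points

theta = 2 * np.pi * c[:, 0]                  # direction on the unit circle
u = np.column_stack((np.cos(theta), np.sin(theta)))
r = chi.ppf(c[:, 1], df=2)                   # r_i = F_R^{-1}(c_{i2}), R ~ chi(2)
x = r[:, None] * u                           # QMC-RPs of N_2(0, I)
print(x.mean(axis=0), np.cov(x.T))           # near zero mean, near identity
```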
This algorithm can be easily extended to the generation of quasirandom F-numbers or QMC-RPs for elliptical distributions. The NTLBG algorithm has the following steps:
Step 1.  For a given $F$, generate a set of quasirandom F-numbers $\{x_1, \dots, x_N\}$ as a training sequence by the NTSR algorithm with a large N.
Step 2.  Set $t = 0$. For a given k, generate a set of quasirandom F-numbers $\{y_1^{(0)}, \dots, y_k^{(0)}\}$ of $F$ as an initial set of output vectors.
Step 3.  Form a partition $\{V_1^{(t)}, \dots, V_k^{(t)}\}$ of the training sequence such that each $x_i$ is assigned to the nearest output vector, i.e., $x_i \in V_j^{(t)}$ if $\|x_i - y_j^{(t)}\| \le \|x_i - y_l^{(t)}\|$ for all $l \ne j$.
Step 4.  Calculate the sample conditional means and form a new set of output vectors $\{y_1^{(t+1)}, \dots, y_k^{(t+1)}\}$, where

$$y_j^{(t+1)} = \frac{1}{N_j} \sum_{x_i \in V_j^{(t)}} x_i, \quad j = 1, \dots, k,$$

and $N_j$ is the number of $x_i$ falling in $V_j^{(t)}$. If $\{y_j^{(t+1)}\} = \{y_j^{(t)}\}$, deliver $\{y_j^{(t+1)}\}$ as MSE-RPs with $N_j/N$ as the estimated probability of $V_j$, and go to Step 6; otherwise go to the next step.
Step 5.  Let $t = t + 1$ and go to Step 3.
Step 6.  Calculate and deliver the MSE

$$\mathrm{MSE} = E\left(\min_{1 \le j \le k} \|X - y_j\|^2\right),$$

or its estimate

$$\widehat{\mathrm{MSE}} = \frac{1}{N} \sum_{i=1}^{N} \min_{1 \le j \le k} \|x_i - y_j\|^2,$$

based on the training sequence.
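A minimal sketch of the NTLBG iteration follows, under the same simplifying assumptions as the NTSR sketch above: training sequence and initial output vectors are quasirandom points of $N_2(0, I)$; the sketch also assumes no Voronoi region becomes empty.

```python
# NTLBG sketch: the k-means (LBG) iteration on a quasirandom training sequence.
import numpy as np
from scipy.stats import chi, qmc

def qmc_normal2(m, seed):
    """Quasirandom points of N_2(0, I), as in the NTSR sketch (d = 2)."""
    c = qmc.Sobol(d=2, scramble=True, seed=seed).random_base2(m)
    theta = 2 * np.pi * c[:, 0]
    u = np.column_stack((np.cos(theta), np.sin(theta)))
    return chi.ppf(c[:, 1], df=2)[:, None] * u

train = qmc_normal2(m=13, seed=6)       # N = 8192 training points (Step 1)
y = qmc_normal2(m=3, seed=7)            # k = 8 initial output vectors (Step 2)

for _ in range(200):
    # Step 3: assign each training point to its nearest output vector
    labels = np.argmin(((train[:, None, :] - y[None, :, :])**2).sum(-1), axis=1)
    # Step 4: recompute each output vector as the mean of its region
    # (assumes every region is non-empty)
    y_new = np.array([train[labels == j].mean(axis=0) for j in range(len(y))])
    if np.allclose(y_new, y):           # stopping rule (Steps 4-5)
        break
    y = y_new

probs = np.bincount(labels, minlength=len(y)) / len(train)
mse_hat = ((train - y[labels])**2).sum(-1).mean()   # Step 6: estimated MSE
print(y, probs, mse_hat)
```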
 
The NTLBG algorithm has been used to generate QMC-RPs and MSE-RPs for elliptical distributions ([43,52,53]) and for the skew-normal distribution in Yang et al. [43].
  8. Concluding Remarks
The bootstrap method, originally proposed by Efron [
1], has found wide applications in statistical theory and practice. This method involves drawing random samples from the empirical distribution, which serves as an approximation to the population distribution $F$. However, due to the inherent randomness of these samples, the bootstrap method has certain limitations. To overcome this, a natural solution is to construct support points, called RPs, that offer a more representative characterization of the distribution $F$ compared to random samples.
This paper discusses three types of RPs: MC-RPs, QMC-RPs, and MSE-RPs, along with their respective approximations. Theoretical foundations and practical applications demonstrate that all of these RPs can be effectively and efficiently utilized for statistical inferences, including estimation and hypothesis testing. In many case studies, MSE-RPs and/or QMC-RPs have shown better performance compared to MC-RPs. QMC-RPs have been widely applied in various fields, including numerical integration in high dimensions, financial mathematics, experimental design, and geometric probability. This paper provides a comprehensive review of the theory and applications of MSE-RPs, with particular emphasis on recent developments. MSE-RPs exhibit significant potential for applications in statistics, financial mathematics, and big data analysis.
However, in the theory of MSE- and QMC-RPs, several open questions remain. For instance, although several new RP construction methods have been proposed, these methods still lack solid theoretical justifications and practical applications. Further research is needed to address these gaps and advance the field.
We are creating a website (
https://fst.uic.edu.cn/isci_en/index.htm, accessed on 19 June 2023) where readers can access fundamental knowledge about RPs and MSE-RPs for various univariate distributions in the near future. Additionally, we are in the process of incorporating R software that generates MSE-RPs into the website, which will be available soon. While there are existing monographs such as “Foundations of Quantization for Probability Distributions” by Graf and Luschgy [
18] and “Numerical Probability, An Introduction with Applications to Finance” by Pagès [
7], these works do not specifically focus on applications in statistical inference. Therefore, there is a need for a new monograph that covers recent advancements in both theory and applications. This review article can serve as a valuable resource, providing relevant content and establishing connections for a potential new book in this area.