1. Introduction
Statistics is the art and science of data collection and data analysis, in which statistical modeling relies on various types of statistical distributions. Such distributions are either discrete or continuous, and either univariate or multivariate. For an unknown continuous distribution $F$ in $\mathbb{R}^d$, the conventional approach is to approximate $F$ using the empirical distribution of a random sample. The empirical distribution is discrete, consisting of support points from the random sample, with each point contributing equally to the approximation. Because of the limited accuracy of the empirical distribution, we want to construct a discrete distribution $F_k$ that approximates the distribution $F$ while preserving the distribution information as much as possible. Consider a random vector $X$ following a continuous distribution $F$, characterized by a probability density function (pdf) $p(x)$. In contrast, a discrete random vector $Y$ is characterized by a probability mass function (pmf)

$$P(Y = y_i) = p_i, \quad i = 1, \dots, k,$$
      
where $y_1, \dots, y_k$ are support points of $Y$ and $p_i > 0$, $\sum_{i=1}^{k} p_i = 1$. An approximation distribution $F_k$ to $F$ should satisfy:
- (i) $F_k$ is a distribution function;
- (ii) a pre-decided distance between $F$ and $F_k$ is small;
- (iii) $F_k \to F$ in distribution as $k \to \infty$, where $k$ in $F_k$ is the number of support points of $Y$.

In this case, the support points $\{y_1, \dots, y_k\}$ are called representative points (RPs). There are several ways to choose an approximation distribution $F_k$.
  1.1. Monte Carlo—RPs
Let $X \sim F(x; \theta)$ be a random vector, where $\theta$ represents the parameters. For instance, for the normal distribution $N(\mu, \sigma^2)$, the parameters are denoted as $\theta = (\mu, \sigma^2)$. In traditional statistics, random samples are utilized to make inferences about the population. Specifically, a collection of independently and identically distributed (iid) random samples, denoted as $\{x_1, \dots, x_n\}$, is drawn from the population distribution $F$. The empirical distribution of the random sample is defined as follows:

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I_{\{x_i \le x\}},$$

where $I_A$ is the indicator function of $A$, and the inequality $x_i \le x$ means that $x_{ij} \le x_j$ ($j = 1, \dots, d$), where $x_i = (x_{i1}, \dots, x_{id})'$ and $x = (x_1, \dots, x_d)'$. Many statistical inferences rely on the empirical distribution $F_n$, including various methods such as:
- (1)
 Parameter estimation (point estimation and confidence interval estimation);
- (2)
 Density estimation;
- (3)
 Hypothesis testing, and so on.
The empirical distribution is a discrete distribution with support points $\{x_1, \dots, x_n\}$, each having the sampling probability $1/n$, and it can be considered as an approximation of $F$ in the sense of consistency, i.e., $F_n \to F$ in distribution as $n \to \infty$. In statistical simulation, a set of random samples can be generated by computer software under the Monte Carlo (MC) method; the corresponding support points are therefore called MC-RPs in this paper. MC methods have been commonly used. For instance, in the case of a normal population $N(\mu, \sigma^2)$ with unknown parameters $\mu$ and $\sigma^2$, one can utilize the sample mean $\bar{x}$ and the sample variance $s^2$ to estimate $\mu$ and $\sigma^2$, respectively.
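As a small illustration, the following Python sketch estimates the normal parameters from an iid Monte Carlo sample; the sample size, seed, and true parameter values are illustrative assumptions, not taken from the text.

```python
# A minimal sketch of MC estimation for N(mu, sigma^2); the sample size,
# seed, and true parameters are illustrative choices.
import numpy as np

rng = np.random.default_rng(seed=0)
mu, sigma = 2.0, 1.5                   # true parameters of N(mu, sigma^2)
x = rng.normal(mu, sigma, size=1000)   # iid sample x_1, ..., x_n

mu_hat = x.mean()                      # sample mean estimates mu
var_hat = x.var(ddof=1)                # sample variance estimates sigma^2
print(mu_hat, var_hat)
```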
As the empirical distribution can be regarded as an approximation distribution to $F$, one can therefore take a set of random samples from $F_n$ instead of from $F$, as suggested by Efron [1]; this is called the bootstrap method. The bootstrap method is a resampling technique, where the random sample is taken from an approximation distribution $F_n$. Later, Efron gave a comprehensive study on the theory and application of the bootstrap method.
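A minimal sketch of the bootstrap idea follows: resampling with replacement from the empirical distribution $F_n$ to approximate the sampling distribution of a statistic. The data-generating model, sample size, and number of replicates are illustrative assumptions.

```python
# A minimal bootstrap sketch: resample from F_n and form a percentile
# confidence interval for the population mean. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(seed=1)
x = rng.exponential(scale=2.0, size=200)       # observed sample from some F

B = 2000                                       # number of bootstrap replicates
boot_means = np.array([
    rng.choice(x, size=x.size, replace=True).mean() for _ in range(B)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])  # percentile 95% CI for E(X)
print(lo, hi)
```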
The MC method has proven to be useful in statistical theory and applications. However, its efficiency is not always good due to the convergence rate of $F_n \to F$ in distribution, which is $O_p(n^{-1/2})$ as $n \to \infty$. The slow convergence leads to unsatisfactory approximations when performing numerical integration using the MC method. While the empirical distribution serves as one approximation to the true distribution $F$, alternative approaches have been proposed in the literature to address this issue.
  1.2. Number-Theoretic RPs or Quasi-Monte Carlo RPs
Let us consider the numerical calculation of high-dimensional integration in the canonical form

$$I(f) = \int_{C^d} f(x)\, dx, \qquad C^d = [0, 1]^d,$$

where $f$ is a continuous function on $C^d$. Let $\mathcal{P}_k = \{x_1, \dots, x_k\}$ be a set of $k$ points uniformly scattered on $C^d$. One can use the mean of $\{f(x_1), \dots, f(x_k)\}$, denoted by $\bar{f}(\mathcal{P}_k)$, to approximate $I(f)$. By the MC method, we can employ a random sample from the uniform distribution $U(C^d)$. The rate of convergence of $\bar{f}(\mathcal{P}_k) \to I(f)$ is then $O_p(k^{-1/2})$, which is relatively slow but does not depend on the dimensionality $d$. How to increase the convergence rate is an important subject in applications. The number-theoretic methods (NTM) or quasi-Monte Carlo (QMC) methods provide many constructions of $\mathcal{P}_k$ such that its points are uniformly scattered on $C^d$, by which the rate of convergence can be increased to $O(k^{-1}(\log k)^d)$. For the theory and methodology of NTM/QMC, one can refer to Hua and Wang [2] and Niederreiter [3]. In earlier studies on NTM, many authors employed the star discrepancy as a measure of the uniformity of $\mathcal{P}_k$ in $C^d$. The star discrepancy is defined by

$$D^*(\mathcal{P}_k) = \sup_{x \in C^d} \left| F_k(x) - F(x) \right|,$$

where $F$ is the cdf of $U(C^d)$ and $F_k$ is the empirical distribution of $\mathcal{P}_k$.
An optimal $\mathcal{P}_k$ has the minimum star discrepancy $D^*(\mathcal{P}_k)$. In this case the points in $\mathcal{P}_k$ are called QMC-RPs; they are support points of $F_k$, each having the equal probability $1/k$. Another popular measure is the $L_p$-distance between $F_k$ and $F$. When $F$ is the uniform distribution on $C^d$, the $L_p$-distance is called the $L_p$-discrepancy. The star discrepancy is the $L_p$-discrepancy as $p \to \infty$. In the literature, a set $\mathcal{P}_k$ under a certain structure is regarded as a set of quasirandom F-numbers if its discrepancy has an order $O(k^{-1}(\log k)^d)$ under the given discrepancy. When $F$ is the uniform distribution on $C^d$, the quasirandom F-numbers are called quasirandom numbers. The reader can refer to Fang and Wang [4] for details. While the $L_p$-discrepancy is in general computationally demanding, the $L_2$-discrepancy has a simple computational formula. There are more uniformity measures in experimental design, such as the centered $L_2$-discrepancy, wrap-around $L_2$-discrepancy, and mixture $L_2$-discrepancy (refer to Fang et al. [5]). Fang and Wang [4] and Fang et al. [6] gave comprehensive studies on NTM and its applications in statistical inference, experimental design, geometric probability, and optimization. Pagès [7] gave a detailed study on applications of QMC to financial mathematics. Section 6 will introduce some algorithms for the generation of QMC-RPs.
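The following sketch compares MC and QMC numerical integration, using Sobol' points from scipy.stats.qmc as one convenient quasirandom construction (an assumption; the NTM constructions discussed in [2,4] differ). The test function is our own choice, with exact integral 1 over $[0,1]^5$.

```python
# MC vs QMC integration of f(x) = prod_j (pi/2) sin(pi x_j) over [0,1]^5,
# whose true integral is 1. Sobol' points stand in for quasirandom numbers.
import numpy as np
from scipy.stats import qmc

d, m = 5, 12
n = 2**m                                        # number of points

def f(x):
    return np.prod(0.5 * np.pi * np.sin(np.pi * x), axis=1)

rng = np.random.default_rng(seed=2)
x_mc = rng.random((n, d))                       # Monte Carlo points
x_qmc = qmc.Sobol(d, scramble=True, seed=2).random_base2(m)   # Sobol' points

print(abs(f(x_mc).mean() - 1.0))    # MC error, roughly O(n^{-1/2})
print(abs(f(x_qmc).mean() - 1.0))   # QMC error, roughly O(n^{-1} (log n)^d)
```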
  1.3. Mean Square Error—RPs
Another measure for choosing a discrete approximation distribution to a given continuous distribution is the mean square error (MSE), and the corresponding support points are called MSE-RPs.
Definition 1. Suppose that a random vector $X$ in $\mathbb{R}^d$ has a density function $p(x)$ with finite mean vector and covariance matrix. A set of points $\{y_1, \dots, y_k\}$ of $\mathbb{R}^d$ is called MSE-RPs if it minimizes the mean square error (MSE)

$$\mathrm{MSE}(y_1, \dots, y_k) = E\left(\min_{1 \le j \le k} \|X - y_j\|^2\right), \tag{5}$$

where $\|\cdot\|$ denotes the $L_2$-norm on $\mathbb{R}^d$.

Given a set of $k$ points $\{y_1, \dots, y_k\}$ in $\mathbb{R}^d$, a set of regions is defined by

$$V_i = \{x \in \mathbb{R}^d : \|x - y_i\| \le \|x - y_j\|,\ j \ne i\}, \quad i = 1, \dots, k,$$

that are called Voronoi regions, where $V_i$ is the attraction domain of $y_i$.
For a univariate distribution $F$ with pdf $p(x)$, mean $\mu$ and variance $\sigma^2$, its MSE-RPs can be sorted as $y_1 < y_2 < \dots < y_k$, and the MSE can be expressed as

$$\mathrm{MSE}(y_1, \dots, y_k) = \sum_{i=1}^{k} \int_{a_{i-1}}^{a_i} (x - y_i)^2 p(x)\, dx,$$

where

$$a_0 = -\infty, \qquad a_i = \frac{y_i + y_{i+1}}{2}, \ i = 1, \dots, k-1, \qquad a_k = \infty.$$

The corresponding $F_k$ has support points $\{y_1, \dots, y_k\}$ with probabilities $\{p_1, \dots, p_k\}$, where

$$p_i = \int_{a_{i-1}}^{a_i} p(x)\, dx = F(a_i) - F(a_{i-1}), \quad i = 1, \dots, k.$$

Its loss function (LF) is defined by

$$\mathrm{LF} = \frac{\mathrm{MSE}(y_1, \dots, y_k)}{\sigma^2}.$$

It is known that $\sigma^2 = \mathrm{Var}(Y) + \mathrm{MSE}(y_1, \dots, y_k)$. The loss function shows what percentage of $\sigma^2$ is lost by using $Y$ to replace $X$.
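The sketch below evaluates these formulas for $N(0,1)$. The candidate points are approximate $k = 5$ MSE-RPs of the standard normal taken (to four decimals) from published quantizer tables; treat them as illustrative values.

```python
# Evaluate p_i, MSE, and LF = MSE / sigma^2 for candidate points of N(0,1).
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

y = np.array([-1.7241, -0.7646, 0.0, 0.7646, 1.7241])   # approximate RPs
a = np.concatenate(([-np.inf], (y[:-1] + y[1:]) / 2, [np.inf]))  # a_0..a_k

p = norm.cdf(a[1:]) - norm.cdf(a[:-1])          # p_i = F(a_i) - F(a_{i-1})
mse = sum(
    quad(lambda x, yi=yi: (x - yi)**2 * norm.pdf(x), lo, hi)[0]
    for yi, lo, hi in zip(y, a[:-1], a[1:])
)
var_y = np.sum(p * y**2)      # Var(Y); E(Y) = 0 here by symmetry
print(p)                      # probabilities of the Voronoi cells
print(mse)                    # MSE = LF, since sigma^2 = 1 for N(0,1)
print(var_y + mse)            # close to 1: the variance decomposition
```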
The concept of MSE-RPs has been motivated by various problems. In the context of the grouping problem, Cox [
8] considered the task of condensing observations of a variate into a limited number of groups, where the grouping intervals are selected to retain maximum information. He introduced the concept of mean squared error (MSE) and provided several sets of MSE-RPs for the standard normal distribution. The concept of MSE-RPs is also relevant in data transmission systems, where analog input signals are converted to digital form, transmitted, and then reconstituted as analog signals at the receiver. The problem of optimal quantization of a continuous random variable with a fixed number of levels was precisely defined by Max [
9]. In IEEE journals, MSE-RPs are also referred to as “quantizers”.
Fang and He [
10] formulated the mathematical problem based on the Chinese national garment standard (refer to Fang [
11]). Iyengar and Solomon [
12] considered the mathematical problems that arise in the theory of representing a distribution by a few optimally chosen points. Flury [13,14] studied a project of the Swiss army to replace existing protection masks with newly designed ones. He used the term “principal points” for MSE-RPs due to a link between principal components and MSE-RPs. MSE-RPs have also been applied to select a few “representative” curves from a large collection of curves, which is useful for kernel density estimation (see Flury and Tarpey [
15]) and for psychiatric studies by Tarpey and Petkova [
16]. Furthermore, MSE-RPs can be applied to problems related to the numerical computation of conditional expectations, stochastic differential equations, and stochastic partial differential equations. These applications are often motivated by challenges encountered in the field of finance [
7]. There was a special issue of “IEEE Transactions on Information Theory” on vector quantization in 1982, and a very detailed review on quantization was given by Gray and Neuhoff [
17]. There are several monographs on the theory and applications of RPs, for example, Graf and Luschgy [
18] “Foundations of Quantization for Probability Distributions” and Pagès [7] “Numerical Probability, An Introduction with Applications to Finance”.
The use of different types of representative points (RPs) allows for the construction of diverse approximation distributions to represent the underlying population distribution. By utilizing these approximation distributions, researchers can make more reliable and precise statistical inferences. The objective of this paper is to provide a comprehensive review of various types of RPs and their associated theory, algorithms, and applications. The focus of this review extends to the examination of recent advancements in the field, highlighting the latest developments and emerging trends. This paper aims to offer valuable insights into the current state of the art and provide researchers and practitioners with a deeper understanding of the potential applications and implications of RPs in statistical science. In 
Section 2, we present a comprehensive list of properties associated with MSE-RPs for univariate distributions. 
Section 3 focuses on reviewing various algorithms used for generating MSE-RPs for univariate distributions. In 
Section 4, we compare various types of RPs in terms of their performance in stochastic simulation and resampling. Additionally, we show the consistency of resampling when MSE-RPs are used. Properties of MSE-RPs for multivariate distributions are reviewed in 
Section 5, and algorithms for generating QMC-RPs and MSE-RPs for multivariate distributions are introduced in 
Section 6. QMC-RPs and MSE-RPs have found numerous applications across various domains. In this paper, we focus on selected applications in statistical inference and geometric probability due to space limitations.
  2. Properties of MSE-RPs for Univariate Distributions
We collect in this section some properties of MSE-RPs from the literature. These properties can be grouped into different issues: some hold only for univariate distributions, while others hold for multivariate ones. The following results are from many articles, including Fei [19], under the notation of the previous section.
Theorem 1. Let X be a continuous random variable with pdf $p(x)$, finite mean μ and variance $\sigma^2$, and let Y be the discrete random variable supported on a set of MSE-RPs $\{y_1, \dots, y_k\}$ with probabilities $\{p_1, \dots, p_k\}$. Then we have
(A) $E(Y) = E(X) = \mu$;
(B) $\mathrm{Var}(Y) \le \mathrm{Var}(X)$;
(C) $\mathrm{Var}(X) = \mathrm{Var}(Y) + \mathrm{MSE}(y_1, \dots, y_k)$.

The property (A) can be regarded as “unbiased mean”. The property (C) gives a decomposition of the variance of X as

$$\sigma^2 = \mathrm{Var}(Y) + \mathrm{MSE}(y_1, \dots, y_k).$$
The concept of self-consistency has been used in clustering analysis and has a close relation with MSE-RPs.

Definition 2. The set of k points $\{y_1, \dots, y_k\}$ in $\mathbb{R}^d$ is called self-consistent with respect to the d-variate random vector X and the partition $\{V_1, \dots, V_k\}$ of $\mathbb{R}^d$ if

$$E(X \mid X \in V_i) = y_i, \quad i = 1, \dots, k,$$

where the region $V_i$ is the domain of attraction of $y_i$.

Tarpey and Flury [20] gave a comprehensive study on self-consistency, and they pointed out that
- (1) MSE-RPs are self-consistent with respect to X;
- (2) MSE-RPs have the minimum mean square error among all sets of points that are self-consistent to X.
  2.1. Existence and Uniqueness of MSE-RPs
The existence of MSE-RPs is not a problem for any continuous distribution with finite first and second moments. For the case of $k = 1$, the MSE-RP $y_1$ is the mean of X. This fact indicates that MSE-RPs can be regarded as an extension of the mean. There is no analytic formula for the MSE-RPs in most cases with $k > 1$, but there are some discoveries for symmetric distributions. In this paper, the notation $X \stackrel{d}{=} Y$ means that the two random vectors X and Y have the same distribution.
Definition 3. A random vector $X$ is symmetric about $a \in \mathbb{R}^d$ if $X - a \stackrel{d}{=} a - X$, and X is symmetric about its mean vector $\mu$ if $X - \mu \stackrel{d}{=} \mu - X$.

Theorem 2. Let $\{y_1, \dots, y_k\}$ be a set of MSE-RPs for a distribution $F$ symmetric about $\mu$; then the set $\{2\mu - y_1, \dots, 2\mu - y_k\}$ is also a set of MSE-RPs. Furthermore, if the set of MSE-RPs for $F$ is unique, and its MSE-RPs are sorted as $y_1 < \dots < y_k$, then

$$y_i + y_{k+1-i} = 2\mu, \quad i = 1, \dots, \lfloor k/2 \rfloor,$$

where $\lfloor a \rfloor$ is the largest integer not exceeding a.

The following review is for a univariate distribution $F$ with mean $\mu$ and variance $\sigma^2$. Sharma [21] pointed out that the MSE-RPs of a distribution symmetric about zero need not be symmetric if the set of MSE-RPs is not unique.
Theorem 3. Let X be a continuous random variable with pdf $p(x)$, finite mean μ and variance $\sigma^2$, and let the distribution of X be symmetric about μ. Let $\delta = E|X - \mu|$. The two MSE-RPs of X are

$$y_1 = \mu - \delta, \qquad y_2 = \mu + \delta,$$

and the corresponding MSE is $\sigma^2 - \delta^2$, with the related $\mathrm{LF} = 1 - \delta^2/\sigma^2$, if and only if

$$2\,\delta\, p(\mu) < 1. \tag{12}$$

This theorem was presented in Flury [13]. If the condition (12) does not hold, Gu and Mathew [22] gave a detailed study of some characterizations of symmetric two MSE-RPs. Their results are listed below.
Let X be a random variable with density $p(x)$ symmetric about the mean $\mu$ and continuous at $\mu$; then
- (a) If $2\delta p(\mu) < 1$, then $\mu - \delta$ and $\mu + \delta$ are 2 MSE-RPs of X;
- (b) If $2\delta p(\mu) > 1$, the above points do not provide a local minimum of the MSE.

They pointed out that Theorem 3 needs to be modified and gave a counterexample, the standard symmetric exponential distribution with pdf $p(x) = \frac{1}{2}e^{-|x|}$ and mean $\mu = 0$. It is easy to find that $2\delta p(0) = 1$, so condition (12) fails, yet $\pm\delta = \pm 1$ are MSE-RPs.
More examples are discussed in their article. If two random variables $Z$ and $X$ have the relationship $Z = a + bX$ and the MSE-RPs of $X$ are known, then the MSE-RPs of $Z$ can be easily obtained (Fang and He [10] and Zoppè [23]).

Theorem 4. Let $\{y_1, \dots, y_k\}$ be MSE-RPs of X; then $Z = a + bX$ has MSE-RPs $\{a + by_1, \dots, a + by_k\}$, with MSE equal to $b^2\,\mathrm{MSE}(y_1, \dots, y_k)$.

There are three special families that satisfy the above relationship: the location-scale family ($Z = a + bX$), the location family ($Z = a + X$), and the scale family ($Z = bX$).
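A small illustration of Theorem 4 follows; the standard-normal points and MSE below are approximate values for $k = 3$, used only for illustration.

```python
# Theorem 4 sketch: MSE-RPs of Z = a + bX from those of X ~ N(0,1).
import numpy as np

y_std = np.array([-1.224, 0.0, 1.224])   # approximate k = 3 MSE-RPs of N(0,1)
a, b = 2.0, 3.0                          # Z = a + bX, so Z ~ N(2, 9)
y_z = a + b * y_std                      # MSE-RPs of Z
mse_std = 0.1902                         # approximate MSE for k = 3, N(0,1)
print(y_z, b**2 * mse_std)               # the MSE scales by b^2
```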
The study of the uniqueness of MSE-RPs is a challenging problem. Fleischer [24] gave a sufficient condition, log-concavity of the density, for the uniqueness of the MSE-RPs, and Trushkin [25] proved that a log-concave probability density function has a unique set of MSE-RPs.
Definition 4. A continuous random variable X is said to have a log-concave density $p(x)$ if it satisfies

$$p(\lambda x_1 + (1 - \lambda) x_2) \ge p(x_1)^{\lambda}\, p(x_2)^{1 - \lambda}$$

for all $\lambda \in (0, 1)$ and all $x_1, x_2$ in the support of X.

Log-concavity of the density is a well-known property, satisfied by a large number of remarkable distributions including the normal distribution. Table 1 lists some log-concave densities, where the kernel of $p(x)$ ignores constants not depending on $x$, so that the condition for log-concavity of $p(x)$ remains the same. The exponential distribution is the special case of the gamma distribution with shape parameter $\alpha = 1$, and the uniform distribution $U(0, 1)$ is the special case of the beta distribution with $\alpha = \beta = 1$.
Example 1. A finite mixture of distributions allows for great flexibility in capturing a variety of density shapes. Research into mixture models has a long history. The most cited early publication is Pearson [26], as he used a two-component normal mixture model for a biometric data set. The density of a mixture of two normal distributions, denoted by $\alpha N(\mu_1, \sigma_1^2) + (1 - \alpha) N(\mu_2, \sigma_2^2)$ with $0 < \alpha < 1$, is

$$p(x) = \frac{\alpha}{\sigma_1}\, \phi\!\left(\frac{x - \mu_1}{\sigma_1}\right) + \frac{1 - \alpha}{\sigma_2}\, \phi\!\left(\frac{x - \mu_2}{\sigma_2}\right),$$

where $\phi$ is the standard normal density. Li et al. [27] gave a detailed study on several aspects of this distribution: “unimodal or bimodal”, “measure of disparity of two normals”, and “uniqueness of MSE-RPs”. Generally, the uniqueness of MSE-RPs is not always true, but it holds under some conditions. For example, for a location mixture of two normal densities with common variance, Li et al. [27] give a condition on the parameters under which the set of MSE-RPs is unique.

  2.2. Asymptotic Behavior of MSE-RPs
There are a lot of studies on the asymptotic behavior of MSE-RPs; for example, see Zador [
28], Su [
29], Graf and Luschgy [
18], and Pagès [
7]. It is well known that the distribution tail strongly influences statistical inference. According to different standards, there are many classification methods for statistical distributions. Embrechts et al. [30] classified distributions based on the convergence rate of the pdf $p(x)$ as $x \to \infty$, and they defined the so-called heavy-tailed and light-tailed distributions, in which the exponential distribution is used as the standard for comparison. The following formal definitions are from Foss et al. [31].
Definition 5. The univariate random variable X with distribution function F is said to have a heavy tail if

$$\int_{-\infty}^{\infty} e^{\lambda x}\, dF(x) = \infty \quad \text{for all } \lambda > 0.$$

Otherwise, F is said to have a light tail.

Obviously, any univariate random variable supported on a bounded interval is light-tailed. In fact, this definition intuitively reflects that the tail of a heavy-tailed distribution is heavier than the tail of any exponential distribution. Moreover, the long-tailed distributions form an important subclass of the heavy-tailed distributions and are commonly used in applications. The formal definition of a long-tailed distribution was given by Foss et al. [31] as follows.
Definition 6. The univariate random variable X with distribution function F is said to be long-tailed if

$$\lim_{x \to \infty} \frac{\overline{F}(x + y)}{\overline{F}(x)} = 1 \quad \text{for all } y > 0,$$

or equivalently for $y = 1$, where $\overline{F}(x) = 1 - F(x)$.

Xu et al. [32] studied the limiting behavior of the gap between the largest two representative points of a statistical distribution and obtained another kind of classification for the most useful univariate distributions. They illustrate the relationship between RPs and the concepts of doubly truncated mean residual life (DMRL) and mean residual life (MRL), which are widely used in survival analysis. They consider three kinds of distributions according to the domain of the distribution, i.e., $(-\infty, \infty)$, $(0, \infty)$, and a finite interval.
Table 2 shows the limiting value of the gap $y_k - y_{k-1}$ for the normal, t, and logistic distributions. Their density functions are

$$\phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad p_{\nu}(x) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \qquad p(x) = \frac{e^{-x}}{(1 + e^{-x})^2},$$

respectively. It is surprising that the normal distribution and the t distribution have such different behavior, although the normal distribution is the limiting distribution of Student's t distribution as the degrees of freedom $\nu \to \infty$.
Table 3 presents the limiting value of $y_k - y_{k-1}$ for many useful distributions on $(0, \infty)$. These distributions include the Weibull distribution with density

$$p(x) = \frac{\beta}{\alpha}\left(\frac{x}{\alpha}\right)^{\beta - 1} e^{-(x/\alpha)^{\beta}}, \quad x > 0,$$

the gamma and exponential distributions with respective densities

$$p(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x} \quad \text{and} \quad p(x) = \lambda e^{-\lambda x}, \quad x > 0,$$

the density of the F-distribution with degrees of freedom $m$ and $n$,

$$p(x) = \frac{\Gamma\!\left(\frac{m+n}{2}\right)}{\Gamma\!\left(\frac{m}{2}\right)\Gamma\!\left(\frac{n}{2}\right)} \left(\frac{m}{n}\right)^{m/2} x^{m/2 - 1} \left(1 + \frac{m}{n}\, x\right)^{-\frac{m+n}{2}}, \quad x > 0,$$

the beta prime distribution with density

$$p(x) = \frac{x^{\alpha - 1}(1 + x)^{-\alpha - \beta}}{B(\alpha, \beta)}, \quad x > 0,$$

the lognormal distribution with density

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma x} \exp\!\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0,$$

and the inverse Gaussian distribution with density

$$p(x) = \sqrt{\frac{\lambda}{2\pi x^3}} \exp\!\left(-\frac{\lambda (x - \mu)^2}{2\mu^2 x}\right), \quad x > 0.$$

Based on these results, Xu et al. [32] gave Theorem 5.
Theorem 5. If the univariate random variable X supported on $(0, \infty)$ is long-tailed, then the gap $y_k - y_{k-1}$ between the largest two MSE-RPs tends to infinity as $k \to \infty$.
, Xu et al. [
32] gave a systematic study including the following result.
Theorem 6. Suppose that a random variable X has continuous probability density function  on  and . Let  be the k MSE-RPs of X. If  converges uniformly to , , thenprovided that the above limit exists.    3. Algorithms for Generation of MSE-RPs of Univariate Distributions
Generation of MSE-RPs is very important for applications. This section reviews algorithms for the generation of MSE-RPs of univariate distributions. Minimizing the mean square error (5) is an optimization problem that involves several difficulties:
The objective function is a multivariate function on the simplex $\{y_1 < y_2 < \dots < y_k\}$;
The objective function might not be differentiable on the whole domain;
The minimum of the objective function is not unique, and the objective function may have multiple local minima on the domain.
Problems of this kind cannot be directly solved by classical optimization methods (such as the downhill simplex method, quasi-Newton methods, and conjugate gradient methods) for most distributions.
There are three main approaches for the generation of RPs:
- (a) The theoretic approach, or a combination of the theoretic approach and computational calculation;
- (b) Applying the k-means method to find approximate RPs; this approach can be applied to all univariate and multivariate distributions;
- (c) Solving a system of nonlinear equations.

Approach (a) can be used for very few distributions, such as the uniform distribution on a finite interval. The authors of [33] proposed a method for finding MSE-RPs of the exponential and Laplace distributions by combining the theoretic approach and computational calculation.
Approach (b) applies the k-means method to any continuous univariate or multivariate distribution. The traditional k-means algorithm needs a set of n observations from the underlying distribution $F$, and the user needs to cluster those observations into k groups under a loss function. The k-means algorithm begins with k arbitrary centers. Each observation is then assigned to the nearest center, and each center is recomputed as the center of mass of all points assigned to it. These steps (assignment and center calculation) are repeated until the process stabilizes. One can check that the total error is monotonically decreasing, which ensures that no clustering is repeated during the course of the algorithm. The mean square error (MSE; see Definition 1) has been popularly used as the error criterion.
It seems to us that Pollard [34] was the first to propose this approach. Along this line, Lloyd [35] proposed two trial-and-error methods. This approach is easy to implement, but it needs a good initial set and a large number of training samples. There are two kinds of k-means algorithms: nonparametric and parametric. If the population distribution is known, the training samples are drawn from the known population distribution, and the corresponding k-means algorithm is parametric; otherwise, the underlying distribution is unknown, and the corresponding k-means algorithm is nonparametric. Usually, the parametric k-means algorithm is more accurate for most univariate distributions.
The parametric k-means algorithm
- (1) For a given pdf $p(x)$, the number of RPs $k$, and a tolerance $\varepsilon > 0$, input a set of initial points $y_1^{(0)} < \dots < y_k^{(0)}$ and set $t = 0$. Determine a partition of $\mathbb{R}$ as

$$V_i^{(t)} = \left(a_{i-1}^{(t)}, a_i^{(t)}\right], \quad i = 1, \dots, k,$$

where

$$a_0^{(t)} = -\infty, \qquad a_i^{(t)} = \frac{y_i^{(t)} + y_{i+1}^{(t)}}{2}, \ i = 1, \dots, k-1, \qquad a_k^{(t)} = \infty.$$

- (2) Calculate the probabilities

$$p_i^{(t)} = \int_{V_i^{(t)}} p(x)\, dx, \quad i = 1, \dots, k,$$

and the conditional means

$$y_i^{(t+1)} = E\left(X \mid X \in V_i^{(t)}\right) = \frac{1}{p_i^{(t)}} \int_{V_i^{(t)}} x\, p(x)\, dx, \quad i = 1, \dots, k.$$

- (3) If the two sets $\{y_i^{(t)}\}$ and $\{y_i^{(t+1)}\}$ are identical (within the tolerance $\varepsilon$), the process stops and delivers $\{y_i^{(t+1)}\}$ as MSE-RPs of the distribution with probabilities $\{p_i^{(t)}\}$; otherwise, let $t = t + 1$ and go back to Step (1).
Stampfer and Stadlober [36] called this algorithm the self-consistency algorithm, as the output set of RPs is self-consistent (though not necessarily a set of MSE-RPs).
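Below is a minimal sketch of this self-consistency iteration for the standard normal distribution, where the cell probabilities and conditional means have closed forms; the initial points, tolerance, and helper name are our own choices.

```python
# Parametric k-means (self-consistency) sketch for N(0,1).
import numpy as np
from scipy.stats import norm

def normal_mse_rps(k, tol=1e-10, max_iter=10_000):
    """Iterate partition -> conditional means until the points stabilize."""
    y = np.linspace(-2.0, 2.0, k)          # initial points, sorted
    for _ in range(max_iter):
        a = np.concatenate(([-np.inf], (y[:-1] + y[1:]) / 2, [np.inf]))
        p = norm.cdf(a[1:]) - norm.cdf(a[:-1])   # cell probabilities
        # For N(0,1): E(X | a < X <= b) = (phi(a) - phi(b)) / (Phi(b) - Phi(a))
        y_new = (norm.pdf(a[:-1]) - norm.pdf(a[1:])) / p
        if np.max(np.abs(y_new - y)) < tol:
            y = y_new
            break
        y = y_new
    mse = 1.0 - np.sum(p * y**2)   # sigma^2 - Var(Y), using E(Y) = 0 here
    return y, p, mse

points, probs, mse = normal_mse_rps(5)
print(points, probs, mse)          # points approximately symmetric about 0
```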
Approach (c) was proposed by Max [9] and Fang and He [10], based on traditional optimization for minimizing the mean square error as a function of $(y_1, \dots, y_k)$: where the objective function (5) is differentiable, one takes its partial derivatives with respect to the $y_i$ and constructs a system of equations. The solutions of this system might attain the global minimum, i.e., be MSE-RPs. For the normal distribution there are three kinds of equations, given in (13), (14), and (15), respectively. Fang and He [10] gave the conditions for the solution to be unique under the normal distribution.
Theorem 7. Taking partial derivatives of the MSE with respect to $y_1, \dots, y_k$, we obtain three kinds of equations:
- 1. For the first point, equation (13) has a solution if and only if a condition given in [10] holds.
- 2. For a given previous point, equation (14) has a solution under a condition involving the corresponding representative point in the set of MSE-RPs.
- 3. For the last point, equation (15) always has a solution.
The Fang–He algorithm has been applied to many univariate distributions. Max [9] and Fang and He [10] obtained sets of MSE-RPs of the standard normal distribution for a range of values of k. Fu [37] applied the Fang–He algorithm to the gamma distribution and obtained its MSE-RPs. Ke et al. [38] gave a more advanced study on MSE-RPs of the gamma distribution. Zhou and Wang [39] studied the t distribution with 10 degrees of freedom and gave its MSE-RPs. Fei [40] proposed an algorithm for generating MSE-RPs by the Newton optimization algorithm. Li et al. [27] gave a detailed study on MSE-RPs of mixtures of normal distributions. Fei [41] studied the class of Pearson distributions, where the pdf of $X$ has the form

$$p(x) = c \exp\left\{ \int \frac{a - x}{b_0 + b_1 x + b_2 x^2}\, dx \right\},$$

where $c$ is the normalizing constant and the parameters $(a, b_0, b_1, b_2)$ satisfy the differential equation

$$\frac{p'(x)}{p(x)} = \frac{a - x}{b_0 + b_1 x + b_2 x^2}.$$

The class of Pearson distributions includes many useful distributions. For example, type I is the beta distribution; type II is the symmetrical U-shaped curve; type III is the shifted gamma distribution; type V is the shifted inverse gamma distribution; type VI is the inverse beta distribution; type VII is the t distribution; type VIII is the power function distribution; type X is the exponential distribution; and type XI is the normal distribution. Fei [41] gave some sufficient conditions for the uniqueness of the solution.
Comparing the three approaches for generating MSE-RPs: approach (a) is obviously the best, but it applies only to a few distributions. Approach (b) can be applied to any continuous univariate or multivariate distribution; for the generation of univariate MSE-RPs, the parametric k-means algorithm does not need a training sample, and many authors have used this algorithm with a good initial set of points. Approach (c) can find the most accurate MSE-RPs of univariate distributions, but it needs heavy computation when k is large.
  5. Properties of MSE-RPs of Multivariate Distributions
Let $X = (X_1, \dots, X_d)'$ be a random vector with cdf $F(x)$ and pdf $p(x)$. Assume that X has a finite mean vector $\mu$ and covariance matrix $\Sigma$. A set of MSE-RPs of X is denoted by $\{y_1, \dots, y_k\}$; it minimizes the mean square error (MSE), and the corresponding discrete random vector is $Y$, with Voronoi regions $\{V_1, \dots, V_k\}$ and probabilities $\{p_1, \dots, p_k\}$ (refer to Definition 1). The following results are from Flury [13,14].
Theorem 11. Under the above assumptions on X, we have:
- 1. When $k = 1$, the MSE-RP is given by $y_1 = \mu$;
- 2. $\mu = \sum_{i=1}^{k} p_i y_i$, i.e., $\mu$ is in the convex hull of $\{y_1, \dots, y_k\}$;
- 3. MSE-RPs are self-consistent and

$$\Sigma = \Sigma_Y + E\left[\mathrm{Cov}(X \mid Y)\right],$$

where $\Sigma_Y$ is the covariance matrix of Y;
- 4. The rank of $\Sigma_Y$ is at most $k - 1$.
Theorems 2 and 4 can be easily extended to the multivariate case, but Theorem 4 needs some change in the linear relation for the extension below.

Theorem 12. Let $X$ and $Z$ be two random vectors in $\mathbb{R}^d$ with the relation $Z = \lambda \Gamma X + b$, where $\lambda > 0$, $b \in \mathbb{R}^d$, and $\Gamma$ is an orthogonal matrix of order d. We have:
- (a) If $\{y_1, \dots, y_k\}$ is a set of self-consistent points of $X$, then $\{\lambda \Gamma y_1 + b, \dots, \lambda \Gamma y_k + b\}$ is a set of self-consistent points of $Z$;
- (b) If $\{y_1, \dots, y_k\}$ is a set of MSE-RPs of $X$, then $\{\lambda \Gamma y_1 + b, \dots, \lambda \Gamma y_k + b\}$ is a set of MSE-RPs of $Z$.
There are various kinds of symmetry in multivariate distributions, among which the class of elliptically symmetric distributions is an extension of the multivariate normal distribution and includes many useful distributions. For a comprehensive study, refer to Fang et al. [46].
Definition 9. Spherically and elliptically symmetric distributions. A d-dimensional random vector X is said to have an elliptically symmetric distribution (ESD), or elliptical distribution for short, if X has the following stochastic representation (SR):

$$X \stackrel{d}{=} \mu + R\, \Psi^{1/2} U^{(d)}, \tag{24}$$

where the random variable $R \ge 0$ is independent of $U^{(d)}$, which is uniformly distributed on the unit sphere in $\mathbb{R}^d$, $\mu \in \mathbb{R}^d$, $\Psi$ is a positive definite matrix of order d (not necessarily the covariance matrix $\Sigma$), and $\Psi^{1/2}$ is the positive definite square root of $\Psi$. We write $X \sim ED_d(\mu, \Psi, g)$ if X has a density of the form

$$p(x) = c_d\, |\Psi|^{-1/2}\, g\!\left((x - \mu)' \Psi^{-1} (x - \mu)\right),$$

where g is called the density generator. When $\mu = 0$ and $\Psi = I_d$, X has a spherical distribution with the stochastic representation

$$X \stackrel{d}{=} R\, U^{(d)}, \tag{25}$$

and we write $X \sim SD_d(g)$, where $c_d\, g(x'x)$ is the density of X.

In general, an elliptical/spherical distribution does not necessarily have a density. For example, $U^{(d)}$ does not have a density in $\mathbb{R}^d$. If the distribution of X is spherical and $P(X = 0) = 0$, then $R \stackrel{d}{=} \|X\|$ and $U^{(d)} \stackrel{d}{=} X/\|X\|$ are independent. It is known that X defined in (24) has a density if and only if R has a density $p_R(r)$. The relationship between $p_R$ and the density generator g is given by

$$p_R(r) = \frac{2\pi^{d/2}}{\Gamma(d/2)}\, c_d\, r^{d-1} g(r^2), \quad r \ge 0.$$

Table 5 lists some useful subclasses of the elliptical distributions.
Flury [13] was the first to find relationships between the principal components and the MSE-RPs of elliptical distributions, given in the following theorems.

Theorem 13. Suppose $X \sim ED_d(\mu, \Psi, g)$ with mean vector $\mu$, covariance matrix $\Sigma$ that is proportional to $\Psi$, and density generator g. Then the two MSE-RPs of X are of the form

$$y_j = \mu + \theta_j v_1, \quad j = 1, 2,$$

where $v_1$ is the normalized characteristic vector associated with the largest eigenvalue of $\Sigma$, and $\theta_1, \theta_2$ are the two MSE-RPs of the univariate random variable $v_1'(X - \mu)$. If the MSE-RPs are not unique, they can be chosen in the given form.

Tarpey et al. [47] established a theorem, called the principal subspace theorem, which shows that the k principal points of an elliptically symmetric distribution lie in the linear subspace spanned by the first several principal components.

Theorem 14. Let $X \sim ED_d(\mu, \Psi, g)$. If a set of k MSE-RPs of X spans a subspace $\mathcal{L}$ of dimension $q < d$, then $\Sigma$ has a set of eigenvectors $v_1, \dots, v_q$ with associated ordered eigenvalues $\lambda_1 \ge \dots \ge \lambda_q$ such that $\mathcal{L}$ is spanned by $v_1, \dots, v_q$.
The principal subspace theorem shows that the set of MSE-RPs of an elliptical distribution has a close relationship with its principal components; this is why Flury [13] called MSE-RPs principal points. Tarpey [48] and Yang et al. [43] proposed ways to generate sets of MSE-RPs in several subclasses of elliptical distributions and explored further relationships between the principal components and MSE-RPs. Their studies need algorithms for producing MSE-RPs.
Yang et al. [43] considered numerical simulation for the estimation of the mean vector and covariance matrix of elliptical distributions and showed that both QMC-RPs and MSE-RPs have better performance than MC-RPs. They also studied the distribution of the MSE of MC-RPs for univariate distributions and elliptical distributions and pointed out that the MSE of MC-RPs can be fitted by an extreme value distribution. For a random sample with a poor MSE value, one cannot expect good results based on that set of random samples.
  6. Algorithms for the Generation of RPs of Multivariate Distributions
There are many methods for generating a random sample from a given multivariate distribution $F$. Johnson [49] gave a good introduction to various methods. Two useful methods are conditional decomposition and stochastic representation.
  6.1. Conditional Decomposition
The conditional distribution method turns generation from a multivariate distribution into generation from several univariate conditional distributions. Suppose the random vector $X = (X_1, \dots, X_d)$ has cdf $F(x_1, \dots, x_d)$. Let $F_1(x_1)$ be the cdf of $X_1$ and let $F_j(x_j \mid x_1, \dots, x_{j-1})$ be the conditional distribution of $X_j$ given $X_1 = x_1, \dots, X_{j-1} = x_{j-1}$, $j = 2, \dots, d$. It is known from probability theory that

$$F(x_1, \dots, x_d) = F_1(x_1)\, F_2(x_2 \mid x_1) \cdots F_d(x_d \mid x_1, \dots, x_{d-1}).$$

Note that each of $F_1$ and $F_j(\cdot \mid \cdots)$ is a univariate (conditional) distribution. We can apply methods including the inverse transformation method to generate a random sample from these distributions. Denote a set of random samples from these univariate (conditional) distributions by $x_1, \dots, x_d$; then $x = (x_1, \dots, x_d)$ is a random sample from X. In particular, when $X_1, \dots, X_d$ are independent, $F(x_1, \dots, x_d) = F_1(x_1) \cdots F_d(x_d)$, where $F_j$ is the cdf of $X_j$.
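As a sketch of the conditional decomposition method, consider a bivariate normal with standard margins and correlation $\rho$ (our illustrative example): $X_1 \sim N(0,1)$ and $X_2 \mid X_1 = x_1 \sim N(\rho x_1, 1 - \rho^2)$; uniforms are pushed through the inverse cdfs.

```python
# Conditional decomposition sampling of a bivariate normal via the
# inverse transformation method. rho, n, and seed are illustrative.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=3)
rho, n = 0.6, 10_000
u = rng.random((n, 2))                         # uniforms on [0,1]^2

x1 = norm.ppf(u[:, 0])                         # sample from F_1
x2 = rho * x1 + np.sqrt(1 - rho**2) * norm.ppf(u[:, 1])   # from F_2(.|x1)
print(np.corrcoef(x1, x2)[0, 1])               # approximately rho
```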
  6.2. Stochastic Representation
Let $X \sim F$. Suppose that X has a stochastic representation

$$X \stackrel{d}{=} h(Y),$$

where $h = (h_1, \dots, h_d)$ is a set of continuous functions on $[0, 1]^m$ and Y follows the uniform distribution on $[0, 1]^m$. Monte Carlo simulation can produce a random sample $\{y_1, \dots, y_n\}$ from $U([0, 1]^m)$. Then $\{h(y_1), \dots, h(y_n)\}$ is a random sample from $F$.

The SR method can be extended to generate a set of QMC-RPs and MSE-RPs. The QMC method employs a set of quasirandom numbers on $[0, 1]^m$, denoted by $\{c_1, \dots, c_k\}$. Set $x_i = h(c_i)$, $i = 1, \dots, k$. Then the set $\{x_1, \dots, x_k\}$ is called a set of quasirandom F-numbers, which can be regarded as another kind of RPs of $F$, i.e., NTM-RPs or QMC-RPs.
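A sketch under simplifying assumptions: we take $h$ to be the componentwise inverse cdf of independent standard normal margins, and scrambled Sobol' points (so that no coordinate is exactly 0 or 1) as the quasirandom numbers.

```python
# Quasirandom F-numbers via a stochastic representation: apply the inverse
# cdfs componentwise to quasirandom points on [0,1]^2.
import numpy as np
from scipy.stats import norm, qmc

c = qmc.Sobol(d=2, scramble=True, seed=4).random_base2(7)   # 128 points
x = norm.ppf(c)            # h(c) = (Phi^{-1}(c_1), Phi^{-1}(c_2))
# x is a set of 128 quasirandom F-numbers (QMC-RPs) of N_2(0, I)
print(x.mean(axis=0))      # close to (0, 0)
```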
Generating a set of exact MSE-RPs is not possible for most multivariate distributions. If we focus on classes of multivariate distributions that are easily generated by MC or QMC, the generation of MSE-RPs becomes much easier. One method is the LBG algorithm based on the k-means method, proposed by Linde et al. [50]. The LBG algorithm requires a training sequence $\{x_1, \dots, x_N\}$ from the given distribution $F$, generated by a Monte Carlo method, where N is much larger than k, and k is the number of RPs for $F$. The next step chooses a set of initial vectors using the same Monte Carlo method and finds the associated Voronoi partition by assigning each $x_i$ to the nearest region of the partition. Then one follows the procedure of the k-means algorithm and iterates until the stopping rule is reached.
Although the LBG algorithm can reach a locally optimal output with non-increasing MSE, Fang et al. [51] pointed out two problems when applying this algorithm:
- (a) The algorithm gives a local optimum, and the results depend on the initial points;
- (b) The generation of the samples of $F$ and the calculation of the MSE are based on the Monte Carlo method, which is less efficient, with convergence rate $O_p(N^{-1/2})$.
Fang et al. [51] revised the LBG algorithm by using quasirandom F-numbers to produce the set of training samples and the set of initial vectors. They proposed the so-called NTLBG algorithm for the generation of QMC-RPs of an elliptical distribution.
Recall that a spherical distribution $X \sim SD_d(g)$ has the SR $X \stackrel{d}{=} R\, U^{(d)}$ in (25). If we can find a set of quasirandom numbers of the uniform distribution on the unit sphere and a set of quasirandom numbers of $R$, their product can produce a set of QMC-RPs of X. An effective algorithm for generating a set of QMC-RPs on the unit sphere is given by Fang and Wang [4], who call it the TFWW algorithm. It is easy to see that if $X \sim SD_d(g)$, then $\Gamma X \stackrel{d}{=} X$ for any orthogonal matrix $\Gamma$ of order d. Therefore, if $\{y_1, \dots, y_k\}$ is a set of MSE-RPs of X, then $\{\Gamma y_1, \dots, \Gamma y_k\}$ is also a set of MSE-RPs of X. This means that the set of MSE-RPs of a spherical distribution is far from unique.
  6.3. The NTSR Algorithm for the Generation of a Spherical Distribution
- Generate a set of quasirandom numbers $\{c_i = (c_{i1}, \dots, c_{id}),\ i = 1, \dots, k\}$ on $[0, 1]^d$.
- Denote the cdf of R by $F_R(r)$ and let $F_R^{-1}$ be its inverse function. Compute $r_i = F_R^{-1}(c_{id})$, $i = 1, \dots, k$.
- Generate a set of quasirandom numbers $\{u_1, \dots, u_k\}$ of the uniform distribution on the unit sphere in $\mathbb{R}^d$ from the first $(d - 1)$ components of the $c_i$'s via the TFWW algorithm.
- Then $\{x_i = r_i u_i,\ i = 1, \dots, k\}$ is a set of quasirandom F-numbers, or QMC-RPs, of the given spherical distribution $SD_d(g)$.
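A sketch of the NTSR steps for $d = 2$ and the bivariate standard normal, for which $R \sim \chi(2)$; note that for $d = 2$ the uniform direction is obtained from a single angle, so the general TFWW algorithm is bypassed (a simplification).

```python
# NTSR sketch for d = 2: quasirandom points drive the radius through
# F_R^{-1} and the direction through an angle on the unit circle.
import numpy as np
from scipy.stats import chi, qmc

c = qmc.Sobol(d=2, scramble=True, seed=5).random_base2(7)   # k = 128 points

theta = 2 * np.pi * c[:, 0]                  # direction on the unit circle
u = np.column_stack((np.cos(theta), np.sin(theta)))
r = chi.ppf(c[:, 1], df=2)                   # r_i = F_R^{-1}(c_{i2}), R ~ chi(2)
x = r[:, None] * u                           # QMC-RPs of N_2(0, I)
print(x.mean(axis=0), np.cov(x.T))           # near zero mean, near identity
```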
This algorithm can be easily extended to the generation of quasirandom F-numbers or QMC-RPs for elliptical distributions. The NTLBG algorithm has the following steps:
Step 1.  For a given $F$, generate a set of quasirandom F-numbers $\{x_1, \dots, x_N\}$ as a training sequence by the NTSR algorithm with a large N.
Step 2.  Set $t = 0$. For a given k, generate a set of quasirandom F-numbers $\{y_1^{(0)}, \dots, y_k^{(0)}\}$ of $F$ as an initial set of output vectors.
Step 3.  Form a partition $\{V_1^{(t)}, \dots, V_k^{(t)}\}$ of the training sequence such that each $x_i$ is assigned to the nearest output vector, i.e., $x_i \in V_j^{(t)}$ if $\|x_i - y_j^{(t)}\| \le \|x_i - y_l^{(t)}\|$ for all $l \ne j$.
Step 4.  Calculate the sample conditional means and form a new set of output vectors $\{y_1^{(t+1)}, \dots, y_k^{(t+1)}\}$, where

$$y_j^{(t+1)} = \frac{1}{N_j} \sum_{x_i \in V_j^{(t)}} x_i, \quad j = 1, \dots, k,$$

and $N_j$ is the number of $x_i$ falling in $V_j^{(t)}$. If $\{y_j^{(t+1)}\} = \{y_j^{(t)}\}$, deliver $\{y_j^{(t+1)}\}$ as MSE-RPs with $N_j/N$ as the estimated probability of $V_j$, and go to Step 6; otherwise go to the next step.
Step 5.  Let $t = t + 1$ and go to Step 3.
Step 6.  Calculate and deliver the MSE

$$\mathrm{MSE} = E\left(\min_{1 \le j \le k} \|X - y_j\|^2\right),$$

or its estimate

$$\widehat{\mathrm{MSE}} = \frac{1}{N} \sum_{i=1}^{N} \min_{1 \le j \le k} \|x_i - y_j\|^2,$$

based on the training sequence.
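A minimal sketch of the NTLBG iteration follows, under the same simplifying assumptions as the NTSR sketch above: training sequence and initial output vectors are quasirandom points of $N_2(0, I)$; the sketch also assumes no Voronoi region becomes empty.

```python
# NTLBG sketch: the k-means (LBG) iteration on a quasirandom training sequence.
import numpy as np
from scipy.stats import chi, qmc

def qmc_normal2(m, seed):
    """Quasirandom points of N_2(0, I), as in the NTSR sketch (d = 2)."""
    c = qmc.Sobol(d=2, scramble=True, seed=seed).random_base2(m)
    theta = 2 * np.pi * c[:, 0]
    u = np.column_stack((np.cos(theta), np.sin(theta)))
    return chi.ppf(c[:, 1], df=2)[:, None] * u

train = qmc_normal2(m=13, seed=6)       # N = 8192 training points (Step 1)
y = qmc_normal2(m=3, seed=7)            # k = 8 initial output vectors (Step 2)

for _ in range(200):
    # Step 3: assign each training point to its nearest output vector
    labels = np.argmin(((train[:, None, :] - y[None, :, :])**2).sum(-1), axis=1)
    # Step 4: recompute each output vector as the mean of its region
    # (assumes every region is non-empty)
    y_new = np.array([train[labels == j].mean(axis=0) for j in range(len(y))])
    if np.allclose(y_new, y):           # stopping rule (Steps 4-5)
        break
    y = y_new

probs = np.bincount(labels, minlength=len(y)) / len(train)
mse_hat = ((train - y[labels])**2).sum(-1).mean()   # Step 6: estimated MSE
print(y, probs, mse_hat)
```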
 
The NTLBG algorithm has been used to generate QMC-RPs and MSE-RPs for elliptical distributions ([43,52,53]) and for the skew-normal distribution in Yang et al. [43].
  8. Concluding Remarks
The bootstrap method, originally proposed by Efron [
1], has found wide applications in statistical theory and practice. This method involves drawing random samples from the empirical distribution, which serves as an approximation to the population distribution $F$. However, due to the inherent randomness of these samples, the bootstrap method has certain limitations. To overcome this, a natural solution is to construct support points, called RPs, that offer a more representative characterization of the distribution $F$ compared to random samples.
This paper discusses three types of RPs: MC-RPs, QMC-RPs, and MSE-RPs, along with their respective approximations. Theoretical foundations and practical applications demonstrate that all of these RPs can be effectively and efficiently utilized for statistical inferences, including estimation and hypothesis testing. In many case studies, MSE-RPs and/or QMC-RPs have shown better performance compared to MC-RPs. QMC-RPs have been widely applied in various fields, including numerical integration in high dimensions, financial mathematics, experimental design, and geometric probability. This paper provides a comprehensive review of the theory and applications of MSE-RPs, with particular emphasis on recent developments. MSE-RPs exhibit significant potential for applications in statistics, financial mathematics, and big data analysis.
However, in the theory of MSE- and QMC-RPs, several open questions remain. For instance, although several new RP construction methods have been proposed, these methods still lack solid theoretical justifications and practical applications. Further research is needed to address these gaps and advance the field.
We are creating a website (
https://fst.uic.edu.cn/isci_en/index.htm, accessed on 19 June 2023) where readers can access fundamental knowledge about RPs and MSE-RPs for various univariate distributions in the near future. Additionally, we are in the process of incorporating R software that generates MSE-RPs into the website, which will be available soon. While there are existing monographs such as “Foundations of Quantization for Probability Distributions” by Graf and Luschgy [
18] and “Numerical Probability, An Introduction with Applications to Finance” by Pagès [
7], these works do not specifically focus on applications in statistical inference. Therefore, there is a need for a new monograph that covers recent advancements in both theory and applications. This review article can serve as a valuable resource, providing relevant content and establishing connections for a potential new book in this area.