Second Order Expansions for High-Dimension Low-Sample-Size Data Statistics in Random Setting

Christoph, Gerd; Ulyanov, Vladimir V.

doi:10.3390/math8071151

by Gerd Christoph ^1,2,† and Vladimir V. Ulyanov ^2,3,*,†

¹ Department of Mathematics, Otto-von-Guericke University Magdeburg, 39016 Magdeburg, Germany

² Moscow Center for Fundamental and Applied Mathematics, Lomonosov Moscow State University, 119991 Moscow, Russia

³ Faculty of Computer Science, National Research University Higher School of Economics, 167005 Moscow, Russia

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Mathematics 2020, 8(7), 1151; https://doi.org/10.3390/math8071151

Submission received: 12 May 2020 / Revised: 6 July 2020 / Accepted: 9 July 2020 / Published: 14 July 2020

(This article belongs to the Special Issue Stability Problems for Stochastic Models: Theory and Applications)

Abstract

We consider high-dimension low-sample-size data taken from the standard multivariate normal distribution under the assumption that the dimension is a random variable. Second order Chebyshev–Edgeworth expansions for the distributions of the angle between two sample observations and of the corresponding sample correlation coefficient are constructed with error bounds. Depending on the type of normalization, we get three different limit distributions: Normal, Student’s t-, or Laplace distributions. The paper continues studies of the authors on approximation of statistics for random size samples.

Keywords:

second order expansions; high-dimensional; low sample size; random sample size; Laplace distribution; Student’s t-distribution

MSC:

62E17 (Primary) 62H10; 60E05 (Secondary)

1. Introduction

Let

{\vec{X}}_{1} = {(X_{11}, \dots, X_{1 m})}^{T}, \dots, {\vec{X}}_{k} = {(X_{k 1}, \dots, X_{k m})}^{T}

be a random sample from an m-dimensional population. The data set can be regarded as k vectors or points in m-dimensional space. Recently, there has been significant interest in high-dimensional datasets, where the dimension is large. In a high-dimensional setting, it is assumed that either (i) m tends to infinity and k is fixed, or (ii) both m and k tend to infinity. Case (i) is related to high-dimensional low sample size (HDLSS) data. One of the first results for HDLSS data appeared in Hall et al. [1]. It became the basis of research in mathematical statistics for the analysis of high-dimensional data, see, e.g., Fujikoshi et al. [2], which is an important part of the currently popular area of data analysis called Big Data. Scientific areas where these settings have proven to be very useful include genetics and other types of cancer research, neuroscience, and also image and shape analysis. See a recent survey on HDLSS asymptotics and its applications in Aoshima et al. [3].

For examining the features of the data set, it is necessary to study the asymptotic behavior of three functions: the length

∥ {\vec{X}}_{i} ∥

of an m-dimensional observation vector, the distance

∥ {\vec{X}}_{i} - {\vec{X}}_{j} ∥

between any two independent observation vectors, and the angle

ang ({\vec{X}}_{i}, {\vec{X}}_{j})

between these vectors at the population mean. Assuming that

{\vec{X}}_{i}

’s are a sample from

N (0, I_{m})

, it was shown in Hall et al. [1] that for HDLSS data the three geometric statistics satisfy the following relations:

\begin{matrix} ∥ {\vec{X}}_{i} ∥ & = \sqrt{m} + O_{p} (1), i = 1, \dots, k, \end{matrix}

(1)

\begin{matrix} ∥ {\vec{X}}_{i} - {\vec{X}}_{j} ∥ & = \sqrt{2 m} + O_{p} (1), i, j = 1, \dots, k, i \neq j, \end{matrix}

(2)

\begin{matrix} ang ({\vec{X}}_{i}, {\vec{X}}_{j}) & = \frac{1}{2} π + O_{p} (m^{- 1 / 2}), i, j = 1, \dots, k, i \neq j, \end{matrix}

(3)

where

∥ \cdot ∥

is the Euclidean norm and

O_{p}

denotes the stochastic order. These interesting results imply that the data converge to the vertices of a deterministic regular simplex. These properties were extended to non-normal samples under some assumptions (see Hall et al. [1] and Aoshima et al. [3]). In Kawaguchi et al. [4], the relations (1)–(3) were refined by constructing second order asymptotic expansions for the distributions of all three basic statistics. The refinements of (1) and (2) were achieved by using the idea of Ulyanov et al. [5], who obtained computable error bounds of order

O (m^{- 1})

for the chi-squared approximation of transformed chi-squared random variables with m degrees of freedom.
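The first-order relations (1)–(3) can be illustrated by a small simulation. The sketch below is only an illustration under assumptions that are not part of the paper: the function name `hdlss_geometry`, the choice of k = 4 points, the dimension m = 20000 and the seed are all hypothetical. It draws points from N(0, I_m) and reports lengths divided by \sqrt{m}, pairwise distances divided by \sqrt{2m}, and pairwise angles divided by π/2, all of which should be close to 1 for large m.

```python
import math
import random

def hdlss_geometry(m, k=4, seed=0):
    """Draw k points from N(0, I_m); return scaled lengths, distances, angles.

    Each returned value is divided by its first-order limit from (1)-(3),
    so values near 1.0 indicate agreement with those relations.
    """
    rng = random.Random(seed)
    pts = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(k)]
    norms = [math.sqrt(sum(v * v for v in p)) for p in pts]
    lengths = [nrm / math.sqrt(m) for nrm in norms]
    dists, angles = [], []
    for i in range(k):
        for j in range(i + 1, k):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(pts[i], pts[j])))
            dot = sum(a * b for a, b in zip(pts[i], pts[j]))
            dists.append(d / math.sqrt(2 * m))
            angles.append(math.acos(dot / (norms[i] * norms[j])) / (math.pi / 2))
    return lengths, dists, angles

lengths, dists, angles = hdlss_geometry(m=20000)
print("length / sqrt(m):     ", [round(v, 3) for v in lengths])
print("distance / sqrt(2m):  ", [round(v, 3) for v in dists])
print("angle / (pi/2):       ", [round(v, 3) for v in angles])
```

All pairwise distances being nearly equal is what the text means by the data approaching the vertices of a regular simplex.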

The aim of the present paper is to study approximation for the third statistic

ang ({\vec{X}}_{1}, {\vec{X}}_{2})

under the more general assumption that m is a realization of a random variable, say

N_{n}

, which represents the sample dimension and is independent of

{\vec{X}}_{1}

and

{\vec{X}}_{2}

. This problem is closely related to approximations of statistics constructed from the random size samples, in particular, to this kind of problem for the sample correlation coefficient

R_{m}

.

The use of samples with random sample sizes has been steadily growing over the years. For an overview of statistical inferences with a random number of observations and some applications, see Esquível et al. [6] and the references cited therein. Gnedenko [7] considered the asymptotic properties of the distributions of sample quantiles for samples of random size. In Nunes et al. [8] and Nunes et al. [9], unknown sample sizes are assumed in medical research for the analysis of one-way and higher-way fixed effects ANOVA models to avoid false rejections, obtained when using the classical fixed size F-tests. Esquível et al. [6] considered inference for the mean with known and unknown variance and inference for the variance in the normal model. Prediction intervals for future observations for generalized order statistics and confidence intervals for quantiles based on samples of random sizes are studied in Barakat et al. [10] and Al-Mutairi and Raqab [11], respectively. They illustrated their results with a real biometric data set, the duration of remission of leukemia patients treated by one drug. The present paper continues studies of the authors on non-asymptotic analysis of approximations for statistics based on random size samples. In Christoph et al. [12], second order expansions for the normalized random sample sizes are proved, see below Propositions 1 and 2. These results allow for proving second order asymptotic expansions of random sample mean in Christoph et al. [12] and random sample median in Christoph et al. [13]. See also Chapters 1 and 9 in Fujikoshi and Ulyanov [14].

The structure of the paper is the following. In Section 2, we describe the relation between

ang ({\vec{X}}_{1}, {\vec{X}}_{2})

and

R_{m}

. We recall also previous approximation results proved for distributions of

ang ({\vec{X}}_{1}, {\vec{X}}_{2})

and

R_{m}

. Section 3 is on general transfer theorems, which allow us to construct asymptotic expansions for distributions of randomly normalized statistics on the basis of approximation results for non-randomly normalized statistics and for the random size of the underlying sample, see Theorems 1 and 2. Section 4 contains the auxiliary lemmas. Some of them are of independent interest, for example, Lemma 3 on upper bounds for the negative order moments of a random variable having a negative binomial distribution. We formulate and discuss the main results in Section 5 and Section 6. In Theorems 3–8, we construct the second order Chebyshev–Edgeworth expansions for distributions of

ang ({\vec{X}}_{1}, {\vec{X}}_{2})

and

R_{m}

in the random setting. Depending on the type of normalization, we get three different limit distributions: Normal, Laplace, or Student’s t-distributions. All proofs are given in Appendix A.

2. Sample Correlation Coefficient, Angle between Vectors and Their Normal Approximations

We slightly simplify notation. Let

{\vec{X}}_{m} = {(X_{1}, \dots, X_{m})}^{T}

and

{\vec{Y}}_{m} = {(Y_{1}, \dots, Y_{m})}^{T}

be two vectors from an m-dimensional normal distribution

N (0, I_{m})

with zero mean, identity covariance matrix

I_{m}

and the sample correlation coefficient

R_{m} = R_{m} ({\vec{X}}_{m}, {\vec{Y}}_{m}) = \frac{\sum_{k = 1}^{m} X_{k} Y_{k}}{\sqrt{\sum_{k = 1}^{m} X_{k}^{2} \sum_{k = 1}^{m} Y_{k}^{2}}} .

(4)

Under the null hypothesis

H_{0}

: {

{\vec{X}}_{m}

and

{\vec{Y}}_{m}

are uncorrelated}, the so-called null density

p_{R_{m}} (y; m)

of

R_{m}

is given in Johnson, Kotz and Balakrishnan [15], Chapter 32, Formula (32.7):

p_{R_{m}} (y; m) = \frac{Γ ((m - 1) / 2)}{\sqrt{π} Γ ((m - 2) / 2)} {(1 - y^{2})}^{(m - 4) / 2} I_{(- 1, 1)} (y)

for

m \geq 3

, where

I_{A} (.)

denotes indicator function of a set A.

Note $μ = E R_{m} = 0$ and $σ^{2} = Var (R_{m}) = 1 / (m - 1)$ for $m \geq 2$ ,
$R_{2}$ is two point distributed with $P (R_{2} = - 1) = P (R_{2} = 1) = 1 / 2$ ,
$R_{3}$ is U-shaped with $p_{R_{3}} (y; 3) = (1 / π) {(1 - y^{2})}^{- 1 / 2} I_{(- 1, 1)} (y)$ and
$R_{4}$ is uniform with density $p_{R_{4}} (y; 4) = 1 / 2 I_{(- 1, 1)} (y)$ .
Moreover, for $m \geq 5$ , the density function $p_{R_{m}} (y; m)$ is unimodal.
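The stated properties of the null density can be checked numerically. The following sketch takes the density exactly as quoted above from Formula (32.7) and estimates its total mass, mean and variance by a midpoint rule; the function names `null_density` and `moments` and the number of quadrature steps are illustrative assumptions. The case m = 3 is left out because the density is unbounded at ±1 there and a plain midpoint rule is inaccurate.

```python
import math

def null_density(y, m):
    """Null density of R_m as quoted from Formula (32.7); needs m >= 3."""
    if abs(y) >= 1.0:
        return 0.0
    log_c = (math.lgamma((m - 1) / 2) - math.lgamma((m - 2) / 2)
             - 0.5 * math.log(math.pi))
    return math.exp(log_c) * (1.0 - y * y) ** ((m - 4) / 2)

def moments(m, steps=20000):
    """Midpoint-rule estimates of total mass, mean and variance on (-1, 1)."""
    h = 2.0 / steps
    mass = mean = second = 0.0
    for i in range(steps):
        y = -1.0 + (i + 0.5) * h
        w = null_density(y, m) * h
        mass += w
        mean += y * w
        second += y * y * w
    return mass, mean, second - mean ** 2

for m in (4, 5, 10, 30):
    mass, mean, var = moments(m)
    # last column is the claimed variance 1 / (m - 1)
    print(m, round(mass, 5), round(var, 5), round(1.0 / (m - 1), 5))
```

For m = 4 the normalizing constant reduces to 1/2, which reproduces the uniform density on (-1, 1) mentioned above.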

Consider now the standardized correlation coefficient

\begin{matrix} {\bar{R}}_{m} = \sqrt{m - c} R_{m} \end{matrix}

(5)

with some correcting real constant

c < m

having density

\begin{matrix} p_{{\bar{R}}_{m}} (y; m, c) & = & \frac{Γ ((m - 1) / 2)}{\sqrt{m - c} \sqrt{π} Γ ((m - 2) / 2)} {(1 - \frac{y^{2}}{m - c})}^{(m - 4) / 2} I_{\{| y | < \sqrt{m - c}\}} (y), \end{matrix}

(6)

which converges with

c = O (1)

as

m \to \infty

to the standard normal density

φ (y) = \frac{1}{\sqrt{2 π}} e^{- y^{2} / 2}, y \in (- \infty, \infty)

and by Konishi [16], Section 4, Formula (4.1) as

m \to \infty

:

\begin{matrix} F_{m}^{*} (x, c) : = P (\sqrt{m - c} R_{m} \leq x) = Φ (x) + \frac{x^{3} + (2 (c - 1) - 3) x}{4 (m - c)} φ (x) + O (m^{- 3 / 2}), \end{matrix}

(7)

where

Φ (x) = \int_{- \infty}^{x} φ (y) d y

is the standard normal distribution function. Note that in Konishi [16] the sample size (in our case the dimension of vectors) is

m + 1

and

c = 1 + 2 Δ

with Konishi’s correcting constant

Δ

. Moreover, (7) follows from the more general Theorem 2.2 in the mentioned paper for independent components in the pairs

(X_{k}, Y_{k})

,

k = 1, \dots, m

.

In Christoph et al. [17], computable error bounds of approximations in (7) with

c = 2

and

c = 2.5

of order

O (m^{- 2})

for all

m \geq 7

are proved:

{sup}_{x} |P (\sqrt{m - 2.5} R_{m} \leq x) - Φ (x) - \frac{x^{3} φ (x)}{4 (m - 2.5)}| \leq \frac{B_{m}}{{(m - 2.5)}^{2}} \leq \frac{B}{m^{2}}

(8)

and

{sup}_{x} |P (\sqrt{m - 2} R_{m} \leq x) - Φ (x) - \frac{(x^{3} - x) φ (x)}{4 (m - 2)}| \leq \frac{B_{m}^{*}}{{(m - 2)}^{2}} \leq \frac{B^{*}}{m^{2}}

(9)

where for some

m \geq 7

constants

B_{m}

and

B_{m}^{*}

are calculated and presented in Table 1 in Christoph et al. [17], e.g.,

B_{7} = 1.875

,

B_{7}^{*} = 2.083

and

B_{50} = 0.720

,

B_{50}^{*} = 0.982

.

Usually, the asymptotic expansion used for

{\bar{R}}_{m}

is (9), where

c = 2

since it is related to the t-distributed statistic

\sqrt{m - 2} R_{m} / \sqrt{1 - R_{m}^{2}}

. With the correcting constant

c = 2.5

, one term in the asymptotic expansion in (8) vanishes.

In order to use a transfer theorem from non-random to random dimension of the vectors, we prefer (7) with

c = 0

. In a similar manner as proving (8) and (9) in Christoph et al. [17], one can verify the following inequalities for

m \geq 3

:

\begin{matrix} {sup}_{x} |P (\sqrt{m} R_{m} \leq x) - Φ (x) - \frac{(x^{3} - 5 x)}{4 m} φ (x)| \leq C_{1} m^{- 2} . \end{matrix}

(10)
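To get a feeling for inequality (10), one can compare the normal approximation and the second order expansion with a distribution function computed directly from the null density. The sketch below assumes that the density quoted above from Formula (32.7) is the exact density of R_m and integrates it numerically; the helper names (`cdf_scaled_r`, `sup_errors`), the grid on [-4, 4] and the step counts are illustrative choices, not part of the paper.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cdf_scaled_r(xs, m, steps=20000):
    """P(sqrt(m) R_m <= x) for each x in xs, from the quoted null density."""
    c = math.exp(math.lgamma((m - 1) / 2) - math.lgamma((m - 2) / 2)
                 - 0.5 * math.log(math.pi))
    h = 2.0 / steps
    cum = [0.0]  # cumulative midpoint sums of the density on (-1, 1)
    for i in range(steps):
        y = -1.0 + (i + 0.5) * h
        cum.append(cum[-1] + c * (1.0 - y * y) ** ((m - 4) / 2) * h)
    out = []
    for x in xs:
        y = max(-1.0, min(1.0, x / math.sqrt(m)))
        pos = (y + 1.0) / h
        i = min(int(pos), steps - 1)
        out.append(cum[i] + (pos - i) * (cum[i + 1] - cum[i]))  # linear interpolation
    return out

def sup_errors(m, grid=401):
    """Grid maxima of |F - Phi| and |F - Phi - (x^3 - 5x) phi / (4m)|."""
    xs = [-4.0 + 8.0 * i / (grid - 1) for i in range(grid)]
    exact = cdf_scaled_r(xs, m)
    e1 = max(abs(F - Phi(x)) for x, F in zip(xs, exact))
    e2 = max(abs(F - Phi(x) - (x ** 3 - 5 * x) * phi(x) / (4 * m))
             for x, F in zip(xs, exact))
    return e1, e2

for m in (10, 20, 40):
    e1, e2 = sup_errors(m)
    print(m, round(e1, 5), round(e2, 5), round(e2 * m * m, 3))
```

If (10) holds, the last printed column (the second order error multiplied by m²) should stay bounded as m grows, while the first order error decays only like 1/m.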

Let us consider now the connection between the correlation coefficient

R_{m}

and the angle

θ_{m}

of the involved vectors

{\vec{X}}_{m}, {\vec{Y}}_{m}

:

θ_{m} = ang ({\vec{X}}_{m}, {\vec{Y}}_{m}) .

(11)

Hall et al. [1] showed that under the given conditions

θ_{m} = \frac{1}{2} π + O_{p} (m^{- 1 / 2}) as m \to \infty,

where

O_{p}

denotes the stochastic order. Since

cos θ_{m} = \frac{∥ {\vec{X}}_{m} ∥^{2} + ∥ {\vec{Y}}_{m} ∥^{2} - {∥ {\vec{X}}_{m} - {\vec{Y}}_{m} ∥}^{2}}{2 ∥ {\vec{X}}_{m} ∥ ∥ {\vec{Y}}_{m} ∥} = R_{m} ({\vec{X}}_{m}, {\vec{Y}}_{m}) = R_{m},

the computable error bounds for

θ_{m}

follow from the computable error bounds for

R_{m}

.

For any fixed constant

c < m,

and arbitrary x with

| x | < \sqrt{m - c} π / 2

, we obtain for the angle

θ_{m} : 0 < θ_{m} < π

:

\begin{matrix} P (\sqrt{m - c} (θ_{m} - π / 2) \leq x) & = P (θ_{m} \leq π / 2 + x / \sqrt{m - c}) \\ & = P (\cos θ_{m} \geq \cos (π / 2 + x / \sqrt{m - c})) \\ & = P (R_{m} \geq - \sin (x / \sqrt{m - c})) \\ & = P (\sqrt{m - c} R_{m} \leq \sqrt{m - c} \sin (x / \sqrt{m - c})) \end{matrix}

(12)

because

R_{m}

is symmetric and

P (R_{m} \leq x) = P (- R_{m} \leq x)

.

Equation (12) shows the connection between the correlation coefficient

R_{m}

and the angle

θ_{m}

among the vectors involved. In Christoph et al. [17], the computable error bound of the approximation in (8) is used to obtain a similar bound for the approximation of the angle between two vectors, defined in (11). Here, the approximations (10) and (12) with

c = 0

lead for any

m \geq 3

and for

| x | \leq π \sqrt{m} / 2

to

{sup}_{x} | P (\sqrt{m} (θ_{m} - \frac{π}{2}) \leq x) - Φ (x) - \frac{(1 / 3) x^{3} - 5 x}{4 m} φ (x) | \leq C_{1} m^{- 2} .

(13)
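The coefficient 1/3 in (13) can be traced back to (10) and (12) with c = 0 by a formal Taylor expansion. The lines below are only a heuristic sketch for fixed x, with remainder terms indicated by O(m^{-2}); they do not replace the uniform bound proved in the paper.

```latex
\sqrt{m}\,\sin\!\left(\frac{x}{\sqrt{m}}\right) = x - \frac{x^{3}}{6 m} + O(m^{-2}),
\qquad
\Phi\!\left(x - \frac{x^{3}}{6 m}\right) = \Phi(x) - \frac{x^{3}}{6 m}\,\varphi(x) + O(m^{-2}),

P\!\left(\sqrt{m}\,\bigl(\theta_{m} - \tfrac{\pi}{2}\bigr) \leq x\right)
 = \Phi(x) + \left(\frac{x^{3} - 5 x}{4 m} - \frac{x^{3}}{6 m}\right)\varphi(x) + O(m^{-2})
 = \Phi(x) + \frac{(1/3)\,x^{3} - 5 x}{4 m}\,\varphi(x) + O(m^{-2}).
```

The last step uses x^{3}/4 - x^{3}/6 = x^{3}/12 = (1/3) x^{3}/4.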

Many authors investigated limit theorems for the sums of random vectors when their dimension tends to infinity, see, e.g., Prokhorov [18]. In (6) and (7), the dimension m of the vectors

{\vec{X}}_{m}

and

{\vec{Y}}_{m}

tends to infinity.

Now, we consider the correlation coefficient of vectors

{\vec{X}}_{m}

and

{\vec{Y}}_{m}

, where the non-random dimension m is replaced by a random dimension

N_{n} \in N_{+} = \{1, 2, \dots\}

depending on some natural parameter

n \in N_{+}

and

N_{n}

is independent of

{\vec{X}}_{m}

and

{\vec{Y}}_{m}

for any

m, n \in N_{+}

. Define

R_{N_{n}} = \frac{\sum_{k = 1}^{N_{n}} X_{k} Y_{k}}{\sqrt{\sum_{k = 1}^{N_{n}} X_{k}^{2} \sum_{k = 1}^{N_{n}} Y_{k}^{2}}} .

3. Statistical Models with a Random Number of Observations

Let

X_{1}, X_{2}, \dots \in R = (- \infty, \infty)

and

N_{1}, N_{2}, \dots \in N_{+} = \{1, 2, \dots\}

be random variables on the same probability space

(Ω, A, P)

. Let

N_{n}

be a random size of the underlying sample, i.e., the random number of observations, which depends on parameter

n \in N_{+}

. We suppose for each

n \in N_{+}

that

N_{n} \in N_{+}

is independent of random variables

X_{1}, X_{2}, \dots

and

N_{n} \to \infty

in probability as

n \to \infty

. Let

T_{m} : = T_{m} (X_{1}, \dots, X_{m})

be some statistic of a sample with non-random sample size

m \in N_{+}

. Define the random variable

T_{N_{n}}

for every

n \in N_{+}

:

T_{N_{n}} (ω) : = T_{N_{n} (ω)} (X_{1} (ω), \dots, X_{N_{n} (ω)} (ω)), ω \in Ω,

i.e.,

T_{N_{n}}

is some statistic obtained from a random sample

X_{1}, X_{2}, \dots, X_{N_{n}}

.

The randomness of the sample size may crucially change asymptotic properties of

T_{N_{n}}

, see, e.g., Gnedenko [7] or Gnedenko and Korolev [19].

3.1. Random Sums

Many models lead to random sums and random means

S_{N_{n}} = \sum_{k = 1}^{N_{n}} X_{k} \quad \text{and} \quad M_{N_{n}} = \frac{1}{N_{n}} \sum_{k = 1}^{N_{n}} X_{k} .

(14)

A fundamental introduction to asymptotic distributions of random sums is given in Döbler [20].

It is worth mentioning that the choice of the scaling factor for

S_{N_{n}}

affects the type of limit distribution. In fact, consider the random sum

S_{N_{n}}

given in (14). For the sake of convenience, let

X_{1}, X_{2}, \dots

be independent standard normal random variables and

N_{n} \in N_{+}

be geometrically distributed with

E (N_{n}) = n

and independent of

X_{1}, X_{2}, \dots

. Then, one has

\begin{matrix} P (\frac{1}{\sqrt{N_{n}}} S_{N_{n}} \leq x) = \int_{- \infty}^{x} \frac{1}{\sqrt{2 π}} e^{- u^{2} / 2} d u \quad \text{for all } n \in N, \end{matrix}

(15)

\begin{matrix} P (\frac{1}{\sqrt{E (N_{n})}} S_{N_{n}} \leq x) \to \int_{- \infty}^{x} \frac{1}{\sqrt{2}} e^{- \sqrt{2} | u |} d u \quad \text{as } n \to \infty, \end{matrix}

(16)

\begin{matrix} P (\frac{\sqrt{E (N_{n})}}{N_{n}} S_{N_{n}} \leq x) \to \int_{- \infty}^{x} {(2 + u^{2})}^{- 3 / 2} d u \quad \text{as } n \to \infty . \end{matrix}

(17)

We have three different limit distributions. The suitably scaled geometric sum

S_{N_{n}}

is standard normal distributed or tends to the Laplace distribution with variance 1 depending on whether we take the random scaling factor

1 / \sqrt{N_{n}}

or the non-random scaling factor

1 / \sqrt{E N_{n}}

, respectively. Moreover, we get the Student distribution with two degrees of freedom as the limit distribution if we use scaling with the mixed factor

\sqrt{E (N_{n})} / N_{n}

. Similar results also hold for the normalized random mean

M_{N_{n}} = \frac{1}{N_{n}} S_{N_{n}}

.
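The three scalings in (15)–(17) can be compared in a short Monte Carlo experiment. The sketch below makes several assumptions for illustration only: N_n is sampled by inversion as a geometric variable on {1, 2, ...} with success probability 1/n, the sum of N standard normals is generated as \sqrt{N} times one standard normal (which has the same conditional distribution), and the function names, n = 200, the number of replications and the seed are hypothetical choices.

```python
import math
import random

def empirical_cdfs_at(x, n, reps=100000, seed=1):
    """Empirical P(. <= x) for the three scalings of a geometric random sum."""
    rng = random.Random(seed)
    log_q = math.log1p(-1.0 / n)          # requires n >= 2
    hits = [0, 0, 0]
    for _ in range(reps):
        # Geometric N on {1, 2, ...} with E(N) = n, sampled by inversion.
        u = 1.0 - rng.random()            # u lies in (0, 1]
        big_n = int(math.log(u) / log_q) + 1
        # Given N, the sum of N independent standard normals is N(0, N).
        s = math.sqrt(big_n) * rng.gauss(0.0, 1.0)
        hits[0] += s / math.sqrt(big_n) <= x       # random scaling, as in (15)
        hits[1] += s / math.sqrt(n) <= x           # non-random scaling, as in (16)
        hits[2] += math.sqrt(n) * s / big_n <= x   # mixed scaling, as in (17)
    return tuple(h / reps for h in hits)

def limit_cdfs_at(x):
    """Standard normal, variance-one Laplace and Student t_2 CDFs at x."""
    normal = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    if x >= 0:
        laplace = 1.0 - 0.5 * math.exp(-math.sqrt(2.0) * x)
    else:
        laplace = 0.5 * math.exp(math.sqrt(2.0) * x)
    student2 = 0.5 + x / (2.0 * math.sqrt(2.0 + x * x))
    return normal, laplace, student2

print("empirical:", empirical_cdfs_at(1.0, n=200))
print("limits:   ", limit_cdfs_at(1.0))
```

The closed form used for the t_2 distribution function follows from integrating the density (2 + u²)^{-3/2} appearing in (17).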

Assertion (15) is obtained by conditioning and the stability of the normal law. Moreover, using Stein’s method, quantitative Berry–Esseen bounds in (15) and (16) for arbitrary centered random variables

X_{1}

with

E (| X_{1} |^{3}) < \infty

were proved in (Chen et al. [21], Theorem 10.6), (Döbler [20] Theorems 2.5 and 2.7) and (Pike and Ren [22] Theorem 3), respectively. Statement (17) follows from (Bening and Korolev [23] Theorem 2.1).

First order asymptotic expansions are obtained for the distribution function of random sample mean and random sample median constructed from a sample with two different random sizes in Bening et al. [24] and in the conference paper Bening et al. [25]. The authors make use of the rate of convergence of

P (N_{n} \leq g_{n} x)

to the limit distribution

H (x)

with some

g_{n} ↑ \infty

. In Christoph et al. [12], second order expansions for the normalized random sample sizes are proved, see below Propositions 1 and 2. These results allow for proving second order asymptotic expansions of random sample mean in Christoph et al. [12] and random sample median in Christoph et al. [13].

3.2. Transfer Proposition from Non-Random to Random Sample Sizes

Consider now the statistic

T_{N_{n}} = T_{N_{n}} ({\vec{X}}_{N_{n}}, {\vec{Y}}_{N_{n}})

, where the dimension of the vectors

{\vec{X}}_{N_{n}}, {\vec{Y}}_{N_{n}}

is a random number

N_{n} \in N_{+}

.

In order to avoid too long expressions and at the same time to preserve a necessary accuracy, we limit ourselves to obtaining limit distributions and terms of order

m^{- 1}

in the following non-asymptotic approximations with bounds of order

m^{- a}

for some

a > 1

.

We suppose that the following condition on the statistic

T_{m} = T_{m} ({\vec{X}}_{m}, {\vec{Y}}_{m})

with

E T_{m} = 0

is met for a non-random sample size

m \in N_{+}

:

Condition 1.

There exist a differentiable bounded function

f_{2} (x)

with

{sup}_{x} | x f_{2}^{'} (x) | < c_{0}

and real numbers

a > 1

,

C_{1} > 0

such that for all integers

m \geq 1

{sup}_{x} | P (m^{γ} T_{m} \leq x) - Φ (x) - m^{- 1} f_{2} (x) | \leq C_{1} m^{- a},

(18)

where

γ \in \{- 1 / 2, 0, 1 / 2\}

.

Remark 1.

Relations (10) and (13) give examples of statistics for which Condition 1 is met. For other examples of multivariate statistics of this kind, see Chapters 14–16 in Fujikoshi et al. [2].

Suppose that the limiting behavior of distribution functions of the normalized random size

N_{n} \in N_{+}

is described by the following condition.

Condition 2.

There exist a distribution function

H (y)

with

H (0 +) = 0

, a function of bounded variation

h_{2} (y)

, a sequence

0 < g_{n} ↑ \infty

and real numbers

b > 0

and

C_{2} > 0

such that for all integers

n \geq 1

\left. \begin{matrix} \sup_{y \geq 0} |P (g_{n}^{- 1} N_{n} \leq y) - H (y)| \leq C_{2} n^{- b}, & 0 < b \leq 1, \\ \sup_{y \geq 0} |P (g_{n}^{- 1} N_{n} \leq y) - H (y) - n^{- 1} h_{2} (y)| \leq C_{2} n^{- b}, & b > 1 \end{matrix} \right\}

(19)

Remark 2.

In Propositions 1 and 2 below, we give examples of discrete random variables

N_{n}

such that Condition 2 is met.

Conditions 1 and 2 allow us to construct asymptotic expansions for distributions of randomly normalized statistics on the basis of approximation results for normalized fixed-size statistics (see relation (18)) and for the random size of the underlying sample (see relation (19)). As a result, we obtain the following transfer theorem.

Theorem 1.

Let

| γ | \leq K < \infty

and both Conditions 1 and 2 be satisfied. Then, the following inequality holds for all

n \in N_{+}

:

{sup}_{x \in R} | P (g_{n}^{γ} T_{N_{n}} \leq x) - G_{n} (x, 1 / g_{n}) | \leq C_{1} E (N_{n}^{- a}) + (C_{3} D_{n} + C_{4}) n^{- b},

(20)

G_{n} (x, 1 / g_{n}) = \int_{1 / g_{n}}^{\infty} (Φ (x y^{γ}) + \frac{f_{2} (x y^{γ})}{g_{n} y}) d (H (y) + \frac{h_{2} (y)}{n}),

(21)

D_{n} = \sup_{x} \int_{1 / g_{n}}^{\infty} |\frac{\partial}{\partial y} (Φ (x y^{γ}) + \frac{f_{2} (x y^{γ})}{y g_{n}})| d y,

(22)

where

a > 1, b > 0, f_{2} (z), h_{2} (y)

are given in (18) and (19). The constants

C_{1}, C_{3}, C_{4}

do not depend on n.

Remark 3.

Later, we use only the cases

γ \in \{- 1 / 2, 0, 1 / 2\}

.

Remark 4.

The domain

[1 / g_{n}, \infty)

of integration in (21) depends on

g_{n}

. Thus, it is not clear how

G_{n} (x, 1 / g_{n})

is represented as a polynomial in

g_{n}^{- 1}

and

n^{- 1}

. To overcome this problem (see (26)), we prove the following theorem.

Theorem 2.

Under the conditions of Theorem 1 and the additional conditions on functions

H (.)

and

h_{2} (.)

, depending on the convergence rate b in (19):

H (1 / g_{n}) \leq c_{1} g_{n}^{- b}, b > 0,

(23)

\left. \begin{matrix} \text{i:} & \int_{0}^{1 / g_{n}} y^{- 1} d H (y) \leq c_{2} g_{n}^{- b + 1}, \\ \text{ii:} & h_{2} (0) = 0 \text{ and } | h_{2} (1 / g_{n}) | \leq c_{3} n g_{n}^{- b}, \\ \text{iii:} & \int_{0}^{1 / g_{n}} y^{- 1} | h_{2} (y) | d y \leq c_{4} n g_{n}^{- b}, \end{matrix} \right\} \text{ for } b > 1,

(24)

we obtain for the function

G_{n} (x, 1 / g_{n})

defined in (21):

\sup_{x} | G_{n} (x, 1 / g_{n}) - G_{n, 2} (x) | \leq C g_{n}^{- b} + \sup_{x} (| I_{1} (x, n) | I_{\{b < 1\}} (b) + | I_{2} (x, n) |)

(25)

with

G_{n, 2} (x) = \begin{cases} \int_{0}^{\infty} Φ (x y^{γ}) d H (y), & 0 < b < 1, \\ \int_{0}^{\infty} Φ (x y^{γ}) d H (y) + \frac{1}{g_{n}} \int_{0}^{\infty} \frac{f_{2} (x y^{γ})}{y} d H (y), & b = 1, \\ \int_{0}^{\infty} Φ (x y^{γ}) d H (y) + \frac{1}{g_{n}} \int_{0}^{\infty} \frac{f_{2} (x y^{γ})}{y} d H (y) I_{\{γ = 0\}} (γ) + \frac{1}{n} \int_{0}^{\infty} Φ (x y^{γ}) d h_{2} (y), & b > 1 . \end{cases}

(26)

I_{1} (x, n) = \int_{1 / g_{n}}^{\infty} \frac{f_{2} (x y^{γ})}{g_{n} y} d H (y) \text{ for } b \leq 1 \quad \text{and} \quad I_{2} (x, n) = \int_{1 / g_{n}}^{\infty} \frac{f_{2} (x y^{γ})}{g_{n} n y} d h_{2} (y) \text{ for } b > 1 .

Remark 5.

The additional conditions (23) and (24) make it possible to extend the integration range of the integrals in (26) from

[1 / g_{n}, \infty)

to

(0, \infty)

.

Theorems 1 and 2 are proved in Appendix A.1.

4. Auxiliary Propositions and Lemmas

Consider the standardized correlation coefficient (5) having density (6) with correcting real constant

c = 0

and standardized angle

\sqrt{m} (θ_{m} - π / 2)

, see (12). By (10) and (13) for

m \geq 3

, we have

\begin{matrix} {sup}_{x} |P (\sqrt{m} R_{m} \leq x) - Φ (x) - \frac{(x^{3} - 5 x)}{4 m} φ (x)| \leq C_{1} m^{- 2}, m \in N_{+}, \end{matrix}

(27)

and for the angle

θ_{m}

between the vectors for

| x | \leq π \sqrt{m} / 2

{sup}_{x} | P (\sqrt{m} (θ_{m} - \frac{π}{2}) \leq x) - Φ (x) - \frac{(1 / 3) x^{3} - 5 x}{4 m} φ (x) | \leq C_{1} m^{- 2}, m \in N_{+},

(28)

where (27) and (28) for

m = 1

and

m = 2

are trivial and

C_{1}

does not depend on m.

Suppose

f_{2} (x; a) = (a x^{3} - 5 x) φ (x) / 4

with

a = 1

or

a = 1 / 3

when (27) or (28) are considered. Since a product of polynomials in x with

φ (x)

is always bounded, numerical calculus leads to

\begin{matrix} {sup}_{x} | x f_{2}^{'} (x; a) | = {sup}_{x} | x (a x^{4} - (3 a + 5) x^{2} + 5) | φ (x) / 4 \leq 0.4 . \end{matrix}

Hence Condition 1 of the transfer Theorem 1 is satisfied for the statistics

R_{m}

and

θ_{m}

with

c_{0} = 0.4

and rate exponent

a = 2

.
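The numerical bound 0.4 for \sup_{x} | x f_{2}^{'} (x; a) | can be reproduced with a simple grid search. In the sketch below, a denotes the coefficient in f_2(x; a) (a = 1 or a = 1/3), not the rate exponent of Condition 1; the function names, the search interval [-10, 10] and the grid size are illustrative assumptions.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def f2(x, a):
    """f_2(x; a) = (a x^3 - 5 x) phi(x) / 4."""
    return (a * x ** 3 - 5.0 * x) * phi(x) / 4.0

def x_times_f2_prime(x, a):
    """x f_2'(x; a), using f_2' = -(a x^4 - (3a + 5) x^2 + 5) phi(x) / 4."""
    return -x * (a * x ** 4 - (3.0 * a + 5.0) * x ** 2 + 5.0) * phi(x) / 4.0

def sup_abs(a, lo=-10.0, hi=10.0, steps=200001):
    """Grid maximum of |x f_2'(x; a)| on [lo, hi]."""
    best = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        best = max(best, abs(x_times_f2_prime(x, a)))
    return best

for a in (1.0, 1.0 / 3.0):
    print("a =", round(a, 4), " sup |x f2'(x; a)| ~", round(sup_abs(a), 4))
```

Outside the search interval the Gaussian factor makes the function negligible, so the grid maximum should be a reasonable stand-in for the supremum.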

Next, we estimate

D_{n}

defined in (22).

Lemma 1.

Let

g_{n}

be a sequence with

0 < g_{n} ↑ \infty

as

n \to \infty

. Then, with some

0 < c (γ, a) < \infty

, we obtain with

a = 1

and

a = 1 / 3

:

D_{n} = \sup_{x} \int_{1 / g_{n}}^{\infty} |\frac{\partial}{\partial y} (Φ (x y^{γ}) + \frac{f_{2} (x y^{γ}; a)}{y g_{n}})| d y \leq \frac{1}{2} + \frac{c (γ, a)}{4} .

In the next subsection, we consider the cases when the random dimension

N_{n}

is negative binomial distributed with success probability

1 / n

.

4.1. Negative Binomial Distribution as Random Dimension of the Normal Vectors

Let the random dimension

N_{n} (r)

of the underlying normal vectors be negative binomial distributed (shifted by 1) with parameters

1 / n

and

r > 0

, having probability mass function

P (N_{n} (r) = j) = \frac{Γ (j + r - 1)}{Γ (j) Γ (r)} {(\frac{1}{n})}^{r} {(1 - \frac{1}{n})}^{j - 1}, j = 1, 2, \dots

(29)

with

E (N_{n} (r)) = r (n - 1) + 1

. Then,

P (N_{n} (r) / g_{n} \leq x)

tends to the Gamma distribution function

G_{r, r} (x)

with the shape and rate parameters

r > 0

, having density

g_{r, r} (x) = \frac{r^{r}}{Γ (r)} x^{r - 1} e^{- r x} I_{(0, \infty)} (x), x \in R .

(30)

If the statistic

T_{m}

is asymptotically normal, the limit distribution of the standardized statistic

T_{N_{n} (r)}

with random size

N_{n} (r)

is Student’s t-distribution

S_{2 r} (x)

having density

s_{ν} (x) = \frac{Γ ((ν + 1) / 2)}{\sqrt{ν π} Γ (ν / 2)} {(1 + \frac{x^{2}}{ν})}^{- (ν + 1) / 2}, ν > 0, x \in R,

(31)

with

ν = 2 r

, see Bening and Korolev [23] or Schluter and Trede [26].

Proposition 1.

Let

r > 0

, discrete random variable

N_{n} (r)

have probability mass function (29) and

g_{n} : = E N_{n} (r) = r (n - 1) + 1

. For

x > 0

and all

n \in N

there exists a real number

C_{2} (r) > 0

such that

\sup_{x \geq 0} |P (\frac{N_{n} (r)}{g_{n}} \leq x) - G_{r, r} (x) - \frac{h_{2; r} (x)}{n}| \leq C_{2} (r) n^{- \min \{r, 2\}},

(32)

where

h_{2; r} (x) = \begin{cases} 0, & \text{for } r \leq 1, \\ \frac{g_{r, r} (x) ((x - 1) (2 - r) + 2 Q_{1} (g_{n} x))}{2 r}, & \text{for } r > 1, \end{cases}

Q_{1} (y) = 1 / 2 - (y - [y]) \text{ and } [ \cdot ] \text{ denotes the integer part of a number.}

(33)

Figure 1 shows the approximation of

P (N_{n} (r) \leq (r (n - 1) + 1) x)

by

G_{2, 2} (x)

and

G_{2, 2} (x) + h_{2} (x) / n

.
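The comparison described for Figure 1 can be reproduced numerically for r = 2. The sketch below sums the probability mass function (29) directly and uses the closed forms G_{2,2}(x) = 1 - (1 + 2x) e^{-2x} and g_{2,2}(x) = 4x e^{-2x}, which follow from (30) with r = 2; the function names, the grid and the range [0, 5] are illustrative assumptions, and the code is restricted to r = 2 because the general Gamma distribution function is not available in the Python standard library.

```python
import math

def nb_cdf_table(n, r, jmax):
    """P(N_n(r) <= j) for j = 0..jmax, summed from the pmf (29); needs n >= 2."""
    p = 1.0 / n
    cdf = [0.0]
    for j in range(1, jmax + 1):
        log_pmf = (math.lgamma(j + r - 1) - math.lgamma(j) - math.lgamma(r)
                   + r * math.log(p) + (j - 1) * math.log1p(-p))
        cdf.append(cdf[-1] + math.exp(log_pmf))
    return cdf

def q1(y):
    """Q_1(y) = 1/2 - (y - [y]) as in (33)."""
    return 0.5 - (y - math.floor(y))

def nb_expansion_errors(n, grid=2000, xmax=5.0):
    """Grid maxima of the first and second order errors in (32) for r = 2."""
    r = 2
    g_n = r * (n - 1) + 1
    cdf = nb_cdf_table(n, r, int(g_n * xmax) + 1)
    e1 = e2 = 0.0
    for i in range(1, grid + 1):
        x = xmax * i / grid
        t = g_n * x
        exact = cdf[int(math.floor(t))]
        big_g = 1.0 - (1.0 + 2.0 * x) * math.exp(-2.0 * x)   # G_{2,2}(x)
        small_g = 4.0 * x * math.exp(-2.0 * x)               # g_{2,2}(x)
        h2 = small_g * ((x - 1.0) * (2 - r) + 2.0 * q1(t)) / (2 * r)
        e1 = max(e1, abs(exact - big_g))
        e2 = max(e2, abs(exact - big_g - h2 / n))
    return e1, e2

for n in (10, 50, 250):
    e1, e2 = nb_expansion_errors(n)
    print(n, round(e1, 5), round(e2, 6))
```

The term with Q_1 acts like a continuity correction for the jumps of the discrete distribution function, which is why the second order error is expected to be markedly smaller than the first order one.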

Remark 6.

The convergence rate for

r \leq 1

is given in Bening et al. [24] or Gavrilenko et al. [27]. The Edgeworth expansion for

r > 1

is proved in Christoph et al. [12], Theorem 1. The jumps of the sample size

N_{n} (r)

have an effect only on the function

Q_{1} (.)

in the term

h_{2; r} (.)

.

The negative binomial random variable

N_{n}

satisfies Condition 2 of the transfer Theorem 1 with

H (x) = G_{r, r} (x)

,

h_{2} (x) = h_{2; r} (x)

,

g_{n} = E N_{n} (r) = r (n - 1) + 1

and

b = \min \{r, 2\}

.

Lemma 2.

In Theorem 2 the additional conditions (23) and (24) are satisfied with

H (x) = G_{r, r} (x)

,

h_{2} (x) = h_{2; r} (x)

,

g_{n} = E N_{n} (r) = r (n - 1) + 1

and

b = \min \{r, 2\}

. Moreover, one has for

γ \in \{- 1 / 2, 0, 1 / 2\}

and

f_{2} (z; a) = (a z^{3} - 5 z) φ (z) / 4

, with

a = 1

or

a = 1 / 3

:

| I_{1} (x, n) | = \begin{cases} | \int_{1 / g_{n}}^{\infty} \frac{f_{2} (x y^{γ}; a)}{g_{n} y} d G_{r, r} (y) | \leq c_{5} g_{n}^{- r}, & r < 1, \\ | \int_{1 / n}^{\infty} \frac{f_{2} (x y^{γ}; a)}{n y} d G_{1, 1} (y) - n^{- 1} f_{2} (x; a) \ln n \, I_{\{γ = 0\}} (γ) | \leq c_{6} n^{- 1}, & r = 1, \end{cases}

(34)

| I_{2} (x, n) | = |\int_{1 / g_{n}}^{\infty} \frac{f_{2} (x y^{γ}; a)}{g_{n} n y} d h_{2; r} (y)| \leq \begin{cases} c_{7} g_{n}^{- r}, & r > 1, r \neq 2, \\ (c_{7} + c_{8} \ln n \, I_{\{γ = 0\}} (γ)) g_{n}^{- 2}, & r = 2 . \end{cases}

(35)

Furthermore, we have

0 \leq g_{n}^{- 1} - {(r n)}^{- 1} \leq (r - 1) {(r n)}^{- 2} e^{- 1 / 2} \quad \text{for } r > 1, n \geq 2 .

(36)

In addition to the expansion of

N_{n} (r)

a bound of

E {(N_{n} (r))}^{- a}

is required, where

m^{- a}

is the rate of convergence of the Edgeworth expansion for

T_{m}

, see (18).

Lemma 3.

Let

r > 0

,

α > 0

and let the random variable

N_{n} (r)

be defined by (29). Then,

E {(N_{n} (r))}^{- α} \leq C (r) \begin{cases} n^{- \min \{r, α\}}, & r \neq α, \\ \ln (n) \, n^{- α}, & r = α, \end{cases}

(37)

and the convergence rate in case

r = α

cannot be improved.
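The rates in (37) can be explored by summing the probability mass function (29) directly. The sketch below is illustrative only: the function name `neg_moment` and the truncation point of the series are assumptions. Two special cases have elementary closed forms that follow from (29) and serve as a check: for r = 2 and α = 1 the moment equals 1/n, and for r = α = 1 it equals ln(n)/(n - 1), which exhibits the logarithmic factor.

```python
import math

def neg_moment(n, r, alpha):
    """E (N_n(r))^(-alpha) by truncated summation of the pmf (29); needs n >= 2."""
    p = 1.0 / n
    jmax = int(60 * n * max(r, 1.0)) + 100   # the neglected tail is negligible here
    total = 0.0
    for j in range(1, jmax + 1):
        log_pmf = (math.lgamma(j + r - 1) - math.lgamma(j) - math.lgamma(r)
                   + r * math.log(p) + (j - 1) * math.log1p(-p))
        total += math.exp(log_pmf - alpha * math.log(j))
    return total

for r, alpha in ((2.0, 1.0), (1.0, 1.0), (0.5, 2.0)):
    for n in (10, 100, 1000):
        value = neg_moment(n, r, alpha)
        if r == alpha:
            normalized = value * n ** alpha / math.log(n)
        else:
            normalized = value * n ** min(r, alpha)
        # if (37) holds, the normalized value stays bounded as n grows
        print(r, alpha, n, round(normalized, 4))
```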

4.2. Maximum of n Independent Discrete Pareto Random Variables Is the Dimension of the Normal Vectors

Let

Y (s) \in N

be discrete Pareto II distributed with parameter

s > 0

, having probability mass and distribution functions

P (Y (s) = k) = \frac{s}{s + k - 1} - \frac{s}{s + k} \quad \text{and} \quad P (Y (s) \leq k) = \frac{k}{s + k}, \quad k \in N,

(38)

which is a particular class of a general model of discrete Pareto distributions, obtained by discretizing continuous Pareto II (Lomax) distributions on the integers, see Buddana and Kozubowski [28].

Now, let

Y_{1} (s), Y_{2} (s), \dots

, be independent random variables with the same distribution (38). Define for

n \in N

and

s > 0

the random variable

N_{n} (s) = \max_{1 \leq j \leq n} Y_{j} (s) \quad \text{with} \quad P (N_{n} (s) \leq k) = {(\frac{k}{s + k})}^{n}, \quad n \in N .

(39)

It should be noted that the distribution of

N_{n} (s)

is extremely spread out on the positive integers.

In Christoph et al. [12], the following Edgeworth expansion was proved:

Proposition 2.

Let the discrete random variable

N_{n} (s)

have distribution function (39). For

x > 0

, fixed

s > 0

and all

n \in N

, there exists a real number

C_{3} (s) > 0

such that

\sup_{y > 0} |P (\frac{N_{n} (s)}{n} \leq y) - H_{s} (y) - \frac{h_{2; s} (y)}{n}| \leq \frac{C_{3} (s)}{n^{2}},

H_{s} (y) = e^{- s / y} \quad \text{and} \quad h_{2; s} (y) = s e^{- s / y} (s - 1 + 2 Q_{1} (n y)) / (2 y^{2}), \quad y > 0,

(40)

where

Q_{1} (y)

is defined in (33).
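Since the distribution function (39) is explicit, the expansion (40) is easy to examine numerically. The sketch below evaluates the first and second order errors on a grid; the function name `pareto_max_errors`, the grid and the range (0, 20] are illustrative assumptions, and a finite grid can only indicate the behaviour of the supremum in (40).

```python
import math

def q1(y):
    """Q_1(y) = 1/2 - (y - [y]) as in (33)."""
    return 0.5 - (y - math.floor(y))

def pareto_max_errors(n, s, grid=4000, ymax=20.0):
    """Grid maxima of |P - H_s| and |P - H_s - h_{2;s}/n| from (39) and (40)."""
    e1 = e2 = 0.0
    for i in range(1, grid + 1):
        y = ymax * i / grid
        t = n * y
        k = math.floor(t)
        exact = (k / (s + k)) ** n          # P(N_n(s) <= n y); zero when k = 0
        big_h = math.exp(-s / y)            # H_s(y)
        h2 = s * big_h * (s - 1.0 + 2.0 * q1(t)) / (2.0 * y * y)
        e1 = max(e1, abs(exact - big_h))
        e2 = max(e2, abs(exact - big_h - h2 / n))
    return e1, e2

for s in (1.0, 2.0):
    for n in (20, 80):
        e1, e2 = pareto_max_errors(n, s)
        print(s, n, round(e1, 5), round(e2, 6), round(e2 * n * n, 3))
```

If (40) holds, the last column (the second order error multiplied by n²) should remain bounded as n increases.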

Remark 7.

The continuous function

H_{s} (y) = e^{- s / y} I_{(0, \infty)} (y)

with parameter

s > 0

is the distribution function of the inverse exponential random variable

W (s) = 1 / V (s)

, where

V (s)

is exponentially distributed with rate parameter

s > 0

. Both

H_{s} (y)

and

P (N_{n} (s) \leq y)

are heavy tailed with shape parameter 1.

Remark 8.

Therefore,

E (N_{n} (s)) = \infty

for all

n \in N

and

E (W (s)) = \infty

. Moreover:

First absolute pseudo moment $ν_{1} = \int_{0}^{\infty} x | d (P (N_{n} (s) \leq n x) - e^{- s / x}) | = \infty$ ,
Absolute difference moment $χ_{u} = \int_{0}^{\infty} x^{u - 1} | P (N_{n} (s) \leq n x) - e^{- s / x} | d x < \infty$
for $1 \leq u < 2$ , see Lemma 2 in Christoph et al. [12].

On pseudo moments and some of their generalizations, see Chapter 2 in Christoph and Wolf [29].

Lemma 4.

In Transfer Theorem 2, the additional conditions (23) and (24) are satisfied with

H (y) = H_{s} (y) = e^{- s / y}

,

h_{2} (y) = h_{2; s} (y) = s e^{- s / y} (s - 1 + 2 Q_{1} (n y)) / (2 y^{2}), y > 0

,

g_{n} = n

and

b = 2

. Moreover, one has for

| γ | \leq K < \infty

and

f_{2} (z; a) = (a z^{3} - 5 z) φ (z) / 4

, with

a = 1

or

a = 1 / 3

:

| I_{2} (x, n) | = |\int_{1 / n}^{\infty} \frac{f_{2} (x y^{γ}; a)}{n^{2} y} d h_{2; s} (y)| \leq C (s) n^{- 2} .

Lemma 5.

For random size

N_{n} (s)

with probabilities (39) with reals

s \geq s_{0} > 0

and arbitrarily small

s_{0} > 0

and

n \geq 1

, we have

E {(N_{n} (s))}^{- α} \leq C (s) n^{- α} .

(41)

The lemmas are proved in Appendix A.2.

5. Main Results

Consider the sample correlation coefficient

R_{m} = R_{m} ({\vec{X}}_{m}, {\vec{Y}}_{m})

, given in (4) and the two statistics

R_{m}^{*} = \sqrt{m} R_{m}

and

R_{m}^{* *} = m R_{m}

which differ from

R_{m}

by scaling factors. Hence, by (10),

P (\sqrt{m} R_{m} \leq x) = P (R_{m}^{*} \leq x) = P (\frac{1}{\sqrt{m}} R_{m}^{* *} \leq x) = Φ (x) + \frac{(x^{3} - 5 x)}{4 m} φ (x) + r (m)

(42)

with

| r (m) | \leq C m^{- 2}

.

Let

θ_{m}

be the angle between the vectors

{\vec{X}}_{m}

and

{\vec{Y}}_{m}

. Consider the statistics

Θ_{m} = θ_{m} - π / 2

,

Θ_{m}^{*} = \sqrt{m} (θ_{m} - π / 2)

and

Θ_{m}^{* *} = m (θ_{m} - π / 2)

which differ only in scaling. Then, by (13),

P (\sqrt{m} Θ_{m} \leq x) = P (Θ_{m}^{*} \leq x) = P (\frac{1}{\sqrt{m}} Θ_{m}^{* *} \leq x) = Φ (x) + \frac{((1 / 3) x^{3} - 5 x)}{4 m} φ (x) + r^{*} (m)

with

| r^{*} (m) | \leq C m^{- 2}

.
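To illustrate the expansion (42), the sketch below compares it with an exact distribution function. It assumes that R_m is the Pearson correlation coefficient with sample means subtracted, computed from m independent pairs of standard normal variables, so that its density is (1 - r^2)^{(m - 4) / 2} / B(1/2, (m - 2)/2) on (-1, 1). Definition (4) is not reproduced in this section, so this is an assumption, and the function names are hypothetical.

```python
import math


def phi(x: float) -> float:
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def Phi(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def edgeworth(x: float, m: int, a: float = 1.0) -> float:
    """Second order expansion: a = 1 for the coefficient, a = 1/3 for the angle."""
    return Phi(x) + (a * x ** 3 - 5 * x) * phi(x) / (4 * m)


def exact_cdf_scaled_r(x: float, m: int, steps: int = 2000) -> float:
    """P(sqrt(m) R_m <= x) by Simpson's rule under the assumed density, m >= 4."""
    upper = max(-1.0, min(1.0, x / math.sqrt(m)))
    norm = math.exp(math.lgamma((m - 1) / 2) - math.lgamma(0.5)
                    - math.lgamma((m - 2) / 2))

    def density(r: float) -> float:
        return norm * (1 - r * r) ** ((m - 4) / 2)

    h = upper / steps  # steps is even; h is negative for x < 0
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return 0.5 + total * h / 3  # the density is symmetric about 0


if __name__ == "__main__":
    m = 20
    for x in (-2.0, -1.0, 1.0, 2.0):
        print(x, exact_cdf_scaled_r(x, m), Phi(x), edgeworth(x, m))
```

Under this assumption the second order term removes most of the error of the normal approximation already for moderate m.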

Consider now the statistics

R_{N_{n}}

,

R_{N_{n}}^{*}

and

R_{N_{n}}^{* *}

as well as

Θ_{N_{n}}

,

Θ_{N_{n}}^{*}

and

Θ_{N_{n}}^{* *}

when the vectors have random dimension

N_{n}

. The normalized statistics have different limit distributions as

n \to \infty

.

5.1. The Random Dimension $N_{n} = N_{n} (r)$ Is Negative Binomial Distributed

Let the random dimension

N_{n} (r)

be negative binomial distributed with probability mass function (29) and

g_{n} = E N_{n} (r) = r (n - 1) + 1

. “The negative binomial distribution is one of the two leading cases for count models, it accommodates the overdispersion typically observed in count data (which the Poisson model cannot)”, see Schluter and Trede [26].
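The mean g_n = r (n - 1) + 1 and the overdispersion can be checked numerically. The sketch below assumes the shifted negative binomial probabilities P(N_n(r) = k) = Γ(k + r - 1) / (Γ(r) Γ(k)) n^{-r} (1 - 1/n)^{k - 1}, k = 1, 2, ..., as they appear in the proof of Lemma 3; the function names are illustrative.

```python
import math


def nb_pmf(k: int, r: float, n: int) -> float:
    """P(N_n(r) = k) for k = 1, 2, ... (assumed form of (29), requires n >= 2)."""
    log_p = (math.lgamma(k + r - 1) - math.lgamma(r) - math.lgamma(k)
             - r * math.log(n) + (k - 1) * math.log1p(-1.0 / n))
    return math.exp(log_p)


def nb_moments(r: float, n: int, k_max: int = 20_000):
    """Return (total mass, mean, variance) from the sum truncated at k_max."""
    mass = mean = second = 0.0
    for k in range(1, k_max + 1):
        p = nb_pmf(k, r, n)
        mass += p
        mean += k * p
        second += k * k * p
    return mass, mean, second - mean * mean


if __name__ == "__main__":
    r, n = 2.5, 10
    mass, mean, var = nb_moments(r, n)
    print(mass, mean, r * (n - 1) + 1, var)
```

For these parameters the variance is much larger than the mean, which is the overdispersion mentioned in the quotation.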

It follows from Theorems 1 and 2 and Proposition 1 that if limit distributions for

P (g_{n}^{γ} N_{n} {(r)}^{1 / 2 - γ} R_{N_{n} (r)} \leq x)

for

γ \in {1 / 2, 0, - 1 / 2}

exist they are

\int_{0}^{\infty} Φ (x y^{γ}) d G_{r, r} (y)

with densities given below in the proof of the corresponding theorems:

\frac{r^{r}}{\sqrt{2 π} Γ (r)} \int_{0}^{\infty} y^{r + γ - 1} e^{- (x^{2} y^{2 γ} / 2 + r y)} d y = \{\begin{matrix} s_{2 r} (x) = \frac{Γ (r + 1 / 2)}{\sqrt{2 r π} Γ (r)} {(1 + \frac{x^{2}}{2 r})}^{- (r + 1 / 2)}, & γ = 1 / 2, \\ φ (x) = \frac{1}{\sqrt{2 π}} e^{- x^{2} / 2}, & γ = 0, \\ l_{1} (x) = \frac{1}{\sqrt{2}} e^{- \sqrt{2} | x |}, for r = 1, & γ = - 1 / 2, \end{matrix}

(43)

where in case

γ = - 1 / 2

for

r \neq 1

generalized Laplace distributions occur.
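The three limit densities can be reproduced by integrating the normal scale mixture numerically. The sketch below differentiates \int Φ (x y^{γ}) d G_{r, r} (y) with respect to x, which gives the integrand y^{γ} φ (x y^{γ}) g_{r, r} (y), and uses a plain midpoint rule; the truncation point, step count, and function names are assumptions made for illustration.

```python
import math


def phi(z: float) -> float:
    """Standard normal density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def gamma_density(y: float, r: float) -> float:
    """g_{r,r}(y) = r^r y^(r-1) exp(-r y) / Gamma(r) for y > 0."""
    return math.exp(r * math.log(r) + (r - 1) * math.log(y) - r * y
                    - math.lgamma(r))


def mixture_density(x: float, r: float, gam: float,
                    upper: float = 60.0, steps: int = 100_000) -> float:
    """Midpoint rule for int_0^upper y^gam phi(x y^gam) g_{r,r}(y) dy."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        scale = y ** gam
        total += scale * phi(x * scale) * gamma_density(y, r)
    return total * h


def student_density(x: float, r: float) -> float:
    """s_{2r}(x), Student's t density with 2r degrees of freedom."""
    c = math.exp(math.lgamma(r + 0.5) - math.lgamma(r)) / math.sqrt(2 * r * math.pi)
    return c * (1 + x * x / (2 * r)) ** (-(r + 0.5))


def laplace_density(x: float) -> float:
    """l_1(x), standard Laplace density with variance 1."""
    return math.exp(-math.sqrt(2) * abs(x)) / math.sqrt(2)


if __name__ == "__main__":
    x = 0.8
    print(mixture_density(x, 1.5, 0.5), student_density(x, 1.5))
    print(mixture_density(x, 1.5, 0.0), phi(x))
    print(mixture_density(x, 1.0, -0.5), laplace_density(x))
```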

5.1.1. Student’s t-Distribution

We start with the case

γ = 1 / 2

in Theorems 1 and 2. Consider the statistic

{\bar{R}}_{N_{n} (r)} = \sqrt{g_{n}} R_{N_{n} (r)}

. The limit distribution is the Student’s t-distribution

S_{2 r} (x)

with

2 r

degrees of freedom with density (31).

Theorem 3.

Let

r > 0

and (29) be the probability mass function of the random dimension

N_{n} = N_{n} (r)

of the vectors under consideration. If the representation (42) for the statistic

R_{m}

and the inequality (32) with

g_{n} = E N_{n} (r) = r (n - 1) + 1

hold, then there exists a constant

C_{r}

such that for all

n \in N_{+}

sup_{x} |P (\sqrt{g_{n}} R_{N_{n} (r)} \leq x) - S_{2 r; n} (x; 1)| \leq C_{r} \{\begin{matrix} n^{- min {r, 2}}, & r \neq 2, \\ ln (n) n^{- 2}, & r = 2, \end{matrix}

(44)

where

S_{2 r; n} (x; a) = S_{2 r} (x) + \frac{s_{2 r} (x)}{r n} (a x^{3} - \frac{10 r x + 5 x^{3}}{2 r - 1} + \frac{(2 - r) (x^{3} + x)}{4 (2 r - 1)}) .

(45)

Moreover, the scaled angle

θ_{N_{n} (r)}

between the vectors

{\vec{X}}_{N_{n} (r)}

and

{\vec{Y}}_{N_{n} (r)}

allows the estimate

{sup}_{x} |P (\sqrt{g_{n}} (θ_{N_{n} (r)} - π / 2) \leq x) - S_{2 r; n} (x; 1 / 3)| \leq C_{r} \{\begin{matrix} n^{- min {r, 2}}, & r \neq 2, \\ ln (n) n^{- 2}, & r = 2, \end{matrix}

where

S_{2 r; n} (x; 1 / 3)

is given in (45) with

a = 1 / 3

.

Figure 2 shows the advantage of the Chebyshev–Edgeworth expansion versus the limit law in approximating the empirical distribution function.

Remark 9.

The limit Student’s t-distribution

S_{2 r} (x)

is symmetric and a generalized hyperbolic distribution which can be written as a regularized incomplete beta function

I_{z} (a, b)

. For

x > 0

:

S_{2 r} (x) = \int_{- \infty}^{x} s_{2 r} (u) d u = \frac{1}{2} (1 + I_{x^{2} / (x^{2} + 2 r)} (1 / 2, r)) and I_{z} (a, b) = \frac{Γ (a + b)}{Γ (a) Γ (b)} \int_{0}^{z} t^{a - 1} {(1 - t)}^{b - 1} d t .

Remark 10.

For integer values

ν = 2 r \in {1, 2, \dots}

the Student’s t-distribution

S_{2 r} (x)

is computable in closed form:

the Cauchy law S_{1} (x) = \frac{1}{2} + \frac{1}{π} arctan (x), S_{2} (x) = \frac{1}{2} + \frac{x}{2 \sqrt{2 + x^{2}}},

S_{3} (x) = \frac{1}{2} + \frac{1}{π} (\frac{x}{\sqrt{3} (1 + x^{2} / 3)} + arctan (x / \sqrt{3})) and S_{4} (x) = \frac{1}{2} + \frac{x (x^{2} + 6)}{2 {(x^{2} + 4)}^{3 / 2}} .
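Closed forms of this kind are easy to check against a numerical integral of the Student density. The sketch below does this for S_1, S_2 and S_3 with Simpson's rule; the helper names and the step count are illustrative assumptions.

```python
import math


def t_density(u: float, nu: float) -> float:
    """Student's t density with nu degrees of freedom."""
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2))
    return c / math.sqrt(nu * math.pi) * (1 + u * u / nu) ** (-(nu + 1) / 2)


def t_cdf_numeric(x: float, nu: float, steps: int = 2000) -> float:
    """1/2 + int_0^x s_nu(u) du by Simpson's rule (steps must be even)."""
    h = x / steps
    total = t_density(0.0, nu) + t_density(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, nu)
    return 0.5 + total * h / 3


def S1(x: float) -> float:
    return 0.5 + math.atan(x) / math.pi


def S2(x: float) -> float:
    return 0.5 + x / (2 * math.sqrt(2 + x * x))


def S3(x: float) -> float:
    inner = x / (math.sqrt(3) * (1 + x * x / 3)) + math.atan(x / math.sqrt(3))
    return 0.5 + inner / math.pi


if __name__ == "__main__":
    for x in (-2.0, 0.5, 3.0):
        for nu, closed in ((1, S1), (2, S2), (3, S3)):
            print(nu, x, closed(x), t_cdf_numeric(x, nu))
```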

Remark 11.

If the dimension of the vectors has the geometric distribution

N_{n} (1)

, then the asymptotic distribution of the sample correlation coefficient is the Student law

S_{2} (x)

with two degrees of freedom.

Remark 12.

The Cauchy limit distribution occurs when the dimension of the vectors has distribution

N_{n} (1 / 2)

.

Remark 13.

The Student’s t-distributions

S_{2 r} (x)

are heavy tailed and their moments of orders

α \geq 2 r

do not exist.

5.1.2. Standard Normal Distribution

Let

γ = 0

in Theorems 1 and 2, examining the statistics

R_{N_{n} (r)}^{*}

and

Θ_{N_{n} (r)}^{*} = \sqrt{N_{n} (r)} (θ_{N_{n} (r)} - π / 2)

.

Theorem 4.

Let

r > 0

and

N_{n} = N_{n} (r)

be the random vector dimension having probability mass function (29). If the representation (42) for the statistic

R_{m}

and the inequality (32) with

g_{n} = E N_{n} (r) = r (n - 1) + 1

hold, then there exists a constant

C_{r}

such that for all

n \in N_{+}

sup_{x} |P (\sqrt{N_{n} (r)} R_{N_{n} (r)} \leq x) - Φ_{n; 2} (x; 1)| \leq C_{r} \{\begin{matrix} n^{- min {r, 2}}, & r \neq 2, \\ ln (n) n^{- 2}, & r = 2, \end{matrix}

(46)

where

Φ_{n; 2} (x; a) = Φ (x) + \frac{φ (x)}{n} (\frac{(a x^{3} - 5 x) ln n}{4} I_{{r = 1}} (r) + \frac{Γ (r - 1) (a x^{3} - 5 x)}{4 Γ (r)} I_{{r > 1}} (r)) .

(47)

Moreover, the scaled angle

θ_{N_{n} (r)}^{*}

between the vectors

{\vec{X}}_{N_{n} (r)}

and

{\vec{Y}}_{N_{n} (r)}

allows the estimate

{sup}_{x} |P (\sqrt{N_{n} (r)} (θ_{N_{n} (r)} - π / 2) \leq x) - Φ_{n; 2} (x; 1 / 3)| \leq C_{r} \{\begin{matrix} n^{- min {r, 2}}, & r \neq 2, \\ ln (n) n^{- 2}, & r = 2, \end{matrix}

where

Φ_{n; 2} (x; 1 / 3)

is given in (47) with

a = 1 / 3

.

Figure 3 shows that the second order Chebyshev–Edgeworth expansion approximates the empirical distribution function better than the limit normal distribution.

Remark 14.

When the distribution function of a statistic

T_{m}

without standardization tends to the standard normal distribution

Φ (x)

, i.e.,

P (T_{m} \leq x) \to Φ (x)

, then the limit law for

P (T_{N_{n}} \leq x)

remains the standard normal distribution

Φ (x)

.

5.1.3. Generalized Laplace Distribution

Finally, we use

γ = - 1 / 2

in Theorems 1 and 2 examining the statistic

g_{n}^{- 1 / 2} R_{N_{n} (r)}^{* *}

. Theorems 1 and 2 state that if there exists a limit distribution of

P (g_{n}^{- 1 / 2} R_{N_{n}}^{* *} \leq x)

as

n \to \infty

then it has to be a scale mixture of normal distributions with zero mean and gamma distribution:

L_{r} (x) = \int_{0}^{\infty} Φ (x y^{- 1 / 2}) d G_{r, r} (y)

having density, see formula (A9) in the proof of Theorem 5:

l_{r} (x) = \frac{r^{r}}{Γ (r)} \int_{0}^{\infty} φ (x y^{- 1 / 2}) y^{r - 3 / 2} e^{- r y} d y = \frac{2 r^{r}}{Γ (r) \sqrt{2 π}} {(\frac{| x |}{\sqrt{2 r}})}^{r - 1 / 2} K_{r - 1 / 2} (\sqrt{2 r} | x |) .

(48)

where

K_{α} (u)

is the α-order Macdonald function or

α

-order modified Bessel function of the third kind. See, e.g., Oldham et al. [30], Chapter 51, or Kotz et al. [31], Appendix, for properties of these functions.

For integer

r = 1, 2, 3, \dots

these densities

l_{r} (x)

, so-called Sargan densities, and their distribution functions are computable in closed forms:

\begin{matrix} l_{1} (x) = \frac{1}{\sqrt{2}} e^{- \sqrt{2} | x |} & and & L_{1} (x) = 1 - \frac{1}{2} e^{- \sqrt{2} | x |}, x > 0 \\ l_{2} (x) = (\frac{1}{2} + | x |) e^{- 2 | x |} & and & L_{2} (x) = 1 - \frac{1}{2} (1 + x) e^{- 2 | x |}, x > 0 \\ l_{3} (x) = \frac{3 \sqrt{6}}{16} (1 + \sqrt{6} | x | + 2 x^{2}) e^{- \sqrt{6} | x |} & and & L_{3} (x) = 1 - (\frac{1}{2} + \frac{5 \sqrt{6} x}{16} + \frac{3 x^{2}}{8}) e^{- \sqrt{6} | x |}, x > 0 \end{matrix}\}

(49)

where

L_{r} (- x) = 1 - L_{r} (x)

for

x \geq 0

.
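The Sargan densities for r = 1, 2, 3 can be sanity-checked numerically: each should integrate to one, have variance one (the mixing gamma distribution G_{r, r} has mean one), and be consistent with the stated distribution function on the positive half-line. The following sketch uses Simpson's rule; the truncation at 40 and the helper names are illustrative assumptions.

```python
import math

SQ2, SQ6 = math.sqrt(2), math.sqrt(6)

DENSITIES = {
    1: lambda x: math.exp(-SQ2 * abs(x)) / SQ2,
    2: lambda x: (0.5 + abs(x)) * math.exp(-2 * abs(x)),
    3: lambda x: 3 * SQ6 / 16 * (1 + SQ6 * abs(x) + 2 * x * x)
    * math.exp(-SQ6 * abs(x)),
}

# Distribution functions for x >= 0; use L_r(-x) = 1 - L_r(x) otherwise.
CDFS = {
    1: lambda x: 1 - 0.5 * math.exp(-SQ2 * x),
    2: lambda x: 1 - 0.5 * (1 + x) * math.exp(-2 * x),
    3: lambda x: 1 - (0.5 + 5 * SQ6 * x / 16 + 3 * x * x / 8) * math.exp(-SQ6 * x),
}


def simpson(f, a: float, b: float, steps: int = 40_000) -> float:
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def total_mass(r: int) -> float:
    return 2 * simpson(DENSITIES[r], 0.0, 40.0)


def variance(r: int) -> float:
    return 2 * simpson(lambda x: x * x * DENSITIES[r](x), 0.0, 40.0)


if __name__ == "__main__":
    for r in (1, 2, 3):
        print(r, total_mass(r), variance(r))
```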

The standard Laplace distribution is

L_{1} (x)

with variance 1 and density

l_{1} (x)

given in (49). Therefore, the Sargan distributions are generalizations of the standard Laplace distribution.

Theorem 5.

Let

r = 1, 2, 3

and (29) be probability mass function of the random dimension

N_{n} = N_{n} (r)

of the vectors under consideration. If the representation (42) for the statistic

R_{m}

and the inequality (32) for

N_{n} (r)

with

g_{n} = E N_{n} (r) = r (n - 1) + 1

hold, then there exists a constant

C_{r}

such that for all

n \in N_{+}

sup_{x} |P (g_{n}^{- 1 / 2} N_{n} (r) R_{N_{n} (r)} \leq x) - L_{n; 2} (x; 1)| \leq C_{r} \{\begin{matrix} n^{- min {r, 2}}, & r \neq 2, \\ ln (n) n^{- 2}, & r = 2, \end{matrix}

(50)

where

L_{n; 2} (x; a) = \{\begin{matrix} L_{1} (x), & r = 1, \\ L_{2} (x) + \frac{a x | x | - 5 x \sqrt{2}}{2 (n - 1) + 1} e^{- 2 | x |}, & r = 2, \\ L_{3} (x) + \frac{27}{24 (n - 1) + 8} (\frac{a x^{3}}{\sqrt{2}} - \frac{5 x | x |}{6} - \frac{5 x}{6 \sqrt{6}}) e^{- \sqrt{6} | x |} \\ + \frac{9 x}{2 n} (\frac{1}{12 \sqrt{6}} + \frac{| x |}{12} - \frac{x^{2}}{6 \sqrt{6}}) e^{- \sqrt{6} | x |}, & r = 3 . \end{matrix}

(51)

For arbitrary

r > 0

, the approximation rate is given by:

sup_{x} |P (g_{n}^{- 1 / 2} N_{n} (r) R_{N_{n} (r)} \leq x) - L_{r} (x)| \leq C_{r} n^{- min {r, 1}} .

Moreover, the scaled angle

N_{n} (r) θ_{N_{n} (r)}

between the vectors

{\vec{X}}_{N_{n} (r)}

and

{\vec{Y}}_{N_{n} (r)}

allows the estimate

{sup}_{x} |P (g_{n}^{- 1 / 2} N_{n} (r) (θ_{N_{n} (r)} - π / 2) \leq x) - L_{n; 2} (x; 1 / 3)| \leq C_{r} \{\begin{matrix} n^{- min {r, 2}}, & r \neq 2, \\ ln (n) n^{- 2}, & r = 2, \end{matrix}

where

L_{n; 2} (x; 1 / 3)

is given in (51) with

a = 1 / 3

.

Figure 4 shows that the Chebyshev–Edgeworth expansion approximates the empirical distribution function better than the limit Laplace law.

Remark 15.

One can find the distribution functions

L_{r} (x)

for arbitrary

r > 0

with formula 1.12.1.3 in Prudnikov et al. [32]:

\begin{matrix} L_{r} (x) & = \frac{1}{2} + \frac{2 r^{r}}{\sqrt{2 π} Γ (r)} \int_{0}^{x} {(\frac{u}{\sqrt{2 r}})}^{r - 1 / 2} K_{r - 1 / 2} (\sqrt{2 r} u) d u \\ = \frac{1}{2} + x \sqrt{\frac{r}{2}} (K_{r - 1 / 2} (\sqrt{2 r} x) L_{r - 3 / 2} (\sqrt{2 r} x) + K_{r - 3 / 2} (\sqrt{2 r} x) L_{r - 1 / 2} (\sqrt{2 r} x)), x > 0, \end{matrix}

where

L_{α} (x)

are the modified Struve functions of order α, for properties of modified Struve functions see, e.g., Oldham et al. [30], Section 57:13.

Remark 16.

The function (48) as density of a mixture of normal distributions with zero mean and random variance

W_{r}

having gamma distribution

P (W_{r} \leq y) = G_{r, r} (y)

is given also in Kotz et al. [31], Formula (4.1.32) with

τ = r

,

σ = 1 / \sqrt{r}

, using Formula (A.0.4) with

λ = - r + 3 / 2

and the order-reflection formula

K_{- α} (x) = K_{α} (x)

. Such a variance gamma model is studied in Madan and Seneta [33] for share market returns.

Remark 17.

A systematic exposition of the Laplace distribution, its numerous generalizations, and its diverse applications can be found in the monograph by Kotz et al. [31]. Here, the generalized Laplace distributions

L_{1} (x)

,

L_{2} (x)

and

L_{3} (x)

are the leading terms in the approximations of the sample correlation coefficient

R_{N_{n} (r)}^{* *}

of two Gaussian vectors with negative binomial distributed random dimension

N_{n} (r)

and the angle

θ_{N_{n} (r)}^{* *}

between these vectors.

Remark 18.

In Goldfeld and Quandt [34] and Missiakoulis [35], the Sargan densities

l_{r} (x)

and distribution functions

L_{r} (x)

for arbitrary integer

r = 1, 2, 3, \dots

have been studied as an alternative to normal law in econometric models because they are computable in closed form, see also Kotz et al. [31], Section 4.4.3 and the references therein.

5.2. The Random Dimension $N_{n} = N_{n} (s)$ Is the Maximum of n Independent Discrete Pareto Random Variables

The random dimension

N_{n} (s)

has probability mass function (39). Since

E N_{n} (s) = \infty

we choose

g_{n} = n

and consider again the cases

γ = 1 / 2

,

γ = 0

and

γ = - 1 / 2

.

It follows from Theorems 1 and 2 and Proposition 2 that if limit distributions for

P (g_{n}^{γ} N_{n} {(s)}^{1 / 2 - γ} R_{N_{n} (s)} \leq x)

for

γ \in {1 / 2, 0, - 1 / 2}

exist, they are

\int_{0}^{\infty} Φ (x y^{γ}) d H_{s} (y)

with densities given below in the proof of the corresponding theorems

\frac{s}{\sqrt{2 π}} \int_{0}^{\infty} y^{γ - 2} e^{- (x^{2} y^{2 γ} / 2 + s / y)} d y = \{\begin{matrix} l_{1 / \sqrt{s}} (x) = \frac{\sqrt{2 s}}{2} e^{- \sqrt{2 s} | x |}, & γ = 1 / 2, \\ φ (x) = \frac{1}{\sqrt{2 π}} e^{- x^{2} / 2}, & γ = 0, \\ s_{2}^{*} (x; \sqrt{s}) = \frac{1}{2 \sqrt{2 s}} {(1 + \frac{x^{2}}{2 s})}^{- 3 / 2}, & γ = - 1 / 2, \end{matrix}

(52)

where

s_{2}^{*} (x; \sqrt{s})

is the density of the scaled Student’s t-distribution

S_{2}^{*} (x; \sqrt{s})

with 2 degrees of freedom, see Definition B37 in Jackman [36], p.507. If Z has density

s_{2}^{*} (x; \sqrt{s})

then

Z / \sqrt{s}

has a classic Student’s t-distribution with two degrees of freedom.

5.2.1. Laplace Distribution

We start with the case

γ = 1 / 2

in Theorems 1 and 2. Consider the statistics

\sqrt{n} R_{N_{n} (s)}

and

\sqrt{n} (θ_{N_{n} (s)} - π / 2)

. The limit distribution is now the Laplace distribution

L_{1 / \sqrt{s}} (x) = \frac{1}{2} + \frac{1}{2} sign (x) (1 - e^{- \sqrt{2 s} | x |}) with density l_{1 / \sqrt{s}} (x) = \frac{\sqrt{2 s}}{2} e^{- \sqrt{2 s} | x |} .

Theorem 6.

Let

s > 0

and (39) be the probability mass function of the random dimension

N_{n} = N_{n} (s)

of the vectors under consideration. If the representation (42) for the statistic

R_{m}

and the inequality (32) with

g_{n} = n

hold, then there exists a constant

C_{s}

such that for all

n \in N_{+}

sup_{x} |P (\sqrt{n} R_{N_{n} (s)} \leq x) - L_{1 / \sqrt{s}; n} (x; 1)| \leq C_{s} n^{- 2},

where

L_{1 / \sqrt{s}; n} (x; a) = L_{1 / \sqrt{s}} (x) + \frac{l_{1 / \sqrt{s}} (x)}{8 s n} (a 2 s x^{3} - (4 - s) x (1 + \sqrt{2 s} | x |)) .

(53)

Moreover, the scaled angle

θ_{N_{n} (s)}

between the vectors

{\vec{X}}_{N_{n} (s)}

and

{\vec{Y}}_{N_{n} (s)}

allows the estimate

{sup}_{x} |P (\sqrt{n} (θ_{N_{n} (s)} - π / 2) \leq x) - L_{1 / \sqrt{s}; n} (x; 1 / 3)| \leq C_{s} n^{- 2},

where

L_{1 / \sqrt{s}; n} (x; 1 / 3)

is given in (53) with

a = 1 / 3

.

5.2.2. Standard Normal Distribution

Let

γ = 0

in Theorems 1 and 2, examining the statistics

R_{N_{n} (s)}^{*}

and

Θ_{N_{n} (s)}^{*} = \sqrt{N_{n} (s)} (θ_{N_{n} (s)} - π / 2)

.

Theorem 7.

Let

s > 0

and

N_{n} = N_{n} (s)

be the random vector dimension having probability mass function (39). If the representation (42) for the statistic

R_{m}

and the inequality (32) with

g_{n} = n

hold, then there exists a constant

C_{s}

such that, for all

n \in N_{+}

sup_{x} |P (\sqrt{N_{n} (s)} R_{N_{n} (s)} \leq x) - Φ (x) - \frac{1}{4 n} φ (x) s^{2} (x^{3} - 5 x)| \leq C_{s} n^{- 2},

Moreover, the scaled angle

θ_{N_{n} (s)}^{*}

between the vectors

{\vec{X}}_{N_{n} (s)}

and

{\vec{Y}}_{N_{n} (s)}

allows the estimate

{sup}_{x} |P (\sqrt{N_{n} (s)} (θ_{N_{n} (s)} - π / 2) \leq x) - Φ (x) - \frac{1}{4 n} φ (x) s^{2} (\frac{1}{3} x^{3} - 5 x)| \leq C_{s} n^{- 2} .

5.2.3. Scaled Student’s t-Distribution

Finally, we use

γ = - 1 / 2

in Theorems 1 and 2 examining the statistics

n^{- 1 / 2} N_{n} (s) R_{N_{n} (s)}

and

n^{- 1 / 2} N_{n} (s) (θ_{N_{n} (s)} - π / 2)

. The limit scaled Student’s t-distribution

S_{2}^{*} (x; \sqrt{s})

with two degrees of freedom is a scale mixture of the normal distribution with zero mean and mixing exponential distribution

1 - e^{- s y}, y \geq 0

, and it is representable in a closed form, see (A15) below in the proof of Theorem 8:

\begin{matrix} \int_{0}^{\infty} Φ (x / \sqrt{y}) d e^{- s / y} & = \int_{0}^{\infty} Φ (x / \sqrt{y}) s y^{- 2} e^{- s / y} d y = \int_{0}^{\infty} Φ (x \sqrt{z}) s e^{- s z} d z \\ = \int_{0}^{\infty} Φ (x \sqrt{z}) d (1 - e^{- s z}) = \frac{1}{2} + \frac{x / \sqrt{s}}{2 \sqrt{2} \sqrt{1 + x^{2} / (2 s)}} = S_{2}^{*} (x; \sqrt{s}) . \end{matrix}
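The closed form of this exponential scale mixture can be confirmed numerically. The sketch below substitutes z = w^2 to remove the square-root behaviour at the origin and then applies Simpson's rule; the truncation point, step count, and function names are assumptions chosen for illustration.

```python
import math


def Phi(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def mixture_cdf(x: float, s: float, steps: int = 4000) -> float:
    """int_0^inf Phi(x sqrt(z)) s exp(-s z) dz, computed after z = w^2."""
    w_max = math.sqrt(40.0 / s)  # exp(-40) is negligible beyond this point

    def integrand(w: float) -> float:
        return Phi(x * w) * 2 * s * w * math.exp(-s * w * w)

    h = w_max / steps  # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(w_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


def scaled_student_cdf(x: float, s: float) -> float:
    """Closed form of the scaled Student's t-distribution with 2 degrees of freedom."""
    t = x / math.sqrt(s)
    return 0.5 + t / (2 * math.sqrt(2) * math.sqrt(1 + x * x / (2 * s)))


if __name__ == "__main__":
    for x in (-0.4, 0.0, 1.2):
        print(x, mixture_cdf(x, 0.7), scaled_student_cdf(x, 0.7))
```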

Theorem 8.

Let

s > 0

and

N_{n} = N_{n} (s)

be the random vector dimension having probability mass function (39). If the representation (42) for the statistic

R_{m}

and the inequality (32) with

g_{n} = n

hold, then there exists a constant

C_{s}

such that for all

n \in N_{+}

sup_{x} |P (n^{- 1 / 2} N_{n} (s) R_{N_{n} (s)} \leq x) - S_{n; 2}^{*} (x; \sqrt{s}; 1)| \leq C_{s} n^{- 2},

(54)

where

S_{n; 2}^{*} (x; \sqrt{s}; a) = S_{2}^{*} (x; \sqrt{s}) + \frac{(15 a + 3 s - 18) x^{3} - 6 x s (6 - s)}{4 n {(x^{2} + 2 s)}^{2}} s_{2}^{*} (x; \sqrt{s})

(55)

Moreover, the scaled angle

θ_{N_{n} (s)}^{*}

between the vectors

{\vec{X}}_{N_{n} (s)}

and

{\vec{Y}}_{N_{n} (s)}

allows the estimate

{sup}_{x} |P (n^{- 1 / 2} N_{n} (s) (θ_{N_{n} (s)} - π / 2) \leq x) - S_{n; 2}^{*} (x; \sqrt{s}; 1 / 3)| \leq C_{s} n^{- 2},

where

S_{n; 2}^{*} (x; \sqrt{s}; 1 / 3)

is given in (55) with

a = 1 / 3

.

Theorems 3 to 8 are proved in Appendix A.3.

6. Conclusions

The asymptotic distributions of the sample correlation coefficient of vectors with random dimensions are normal scale mixtures. From (43) and (52), one can conclude that the random dimension and the corresponding scaling have a significant influence on the limit distributions. A scale mixture of normal distributions changes the tail behavior of the distribution. Student’s t-distributions have polynomial tails; as one class of heavy-tailed distributions, they can be used to model heavy-tailed returns data in finance. The Laplace distributions have heavier tails than normal distributions.

Author Contributions

Conceptualization, G.C. and V.V.U.; methodology, V.V.U. and G.C.; writing–original draft, G.C. and V.V.U.; writing–review and editing, V.V.U. and G.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding. It was done within the framework of the Moscow Center for Fundamental and Applied Mathematics, Lomonosov Moscow State University, and HSE University Basic Research Programs.

Acknowledgments

The authors would like to thank the Managing Editor and the Reviewers for the careful reading of the manuscript and pertinent comments. Their constructive feedback helped to improve the quality of this work and shape its final form.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs of the Theorems and Lemmas

Appendix A.1. Proofs of Theorems 1 and 2

Proof of Theorem 1.

The proof follows arguments similar to those of the more general transfer Theorem 3.1 in Bening et al. [24]. Since in Theorem 3.1 in Bening et al. [24] the constant

γ

has to be non-negative, whereas in our Theorem 1 we also need

γ = - 1 / 2

, we repeat the proof. Conditioning on

N_{n}

, we have

\begin{matrix} P (g_{n}^{γ} T_{N_{n}} \leq x) = P (N_{n}^{γ} T_{N_{n}} \leq x {(N_{n} / g_{n})}^{γ}) = \sum_{m = 1}^{\infty} P (m^{γ} T_{m} \leq x {(m / g_{n})}^{γ}) P (N_{n} = m) . \end{matrix}

Using now (18) with

Φ_{m} (z) : = Φ (z) + m^{- 1} f_{2} (z)

, we find

\begin{matrix} \sum_{m = 1}^{\infty} & |P (m^{γ} T_{m} \leq x {(m / g_{n})}^{γ}) - Φ_{m} (x {(m / g_{n})}^{γ})| P (N_{n} = m) \\ \overset{(18)}{\leq} C_{1} \sum_{m = 1}^{\infty} m^{- a} P (N_{n} = m) = C_{1} E (N_{n}^{- a}) . \end{matrix}

(A1)

Taking into account

P (N_{n} / g_{n} < 1 / g_{n}) = P (N_{n} < 1) = 0

, we obtain

\begin{matrix} \sum_{m = 1}^{\infty} Φ_{m} (x {(m / g_{n})}^{γ}) P (N_{n} = m) & = E (Φ_{N_{n}} (x {(N_{n} / g_{n})}^{γ})) \\ = \int_{1 / g_{n}}^{\infty} Δ_{n} (x, y) d P (\frac{N_{n}}{g_{n}} \leq y) = G_{n} (x, 1 / g_{n}) + I_{1}, \end{matrix}

where

Δ_{n} (x, y) : = Φ (x y^{γ}) + f_{2} (x y^{γ}) / (g_{n} y),

G_{n} (x, 1 / g_{n})

is defined in (21) and

I_{1} = \int_{1 / g_{n}}^{\infty} Δ_{n} (x, y) d (P (\frac{N_{n}}{g_{n}} \leq y) - H (y) - \frac{h_{2} (y) I_{{b > 1}} (b)}{n}) .

Estimating integral

I_{1}

, we use the integration by parts for Lebesgue–Stieltjes integrals, the boundedness of

f_{2} (z)

, say

{sup}_{z} | f_{2} (z) | \leq c_{1}^{*}

, and estimates (19)

\begin{matrix} | I_{1} | & \leq {sup}_{x} {lim_{L \to \infty} | Δ_{n} (x, y) | | P (N_{n} / g_{n} \leq y) - H (y) - n^{- 1} h_{2} (y) I_{{b > 1}} (b) ||}_{y = 1 / g_{n}}^{y = L} \\ + {sup}_{x} \int_{1 / g_{n}}^{\infty} | \frac{\partial}{\partial y} Δ_{n} (x, y) | |P (N_{n} / g_{n} \leq y) - H (y) - n^{- 1} h_{2} (y) I_{{b > 1}} (b)| d y \\ \leq (1 + c_{1}^{*}) C_{2} n^{- b} + C_{2} D_{n} n^{- b}, \end{matrix}

where

D_{n}

is defined in (22). Together with (A1), we obtain (20) and Theorem 1 is proved. □

Proof of Theorem 2.

Using (23), we find for

b > 0

\int_{0}^{1 / g_{n}} Φ (x y^{γ}) d H (y) \leq \int_{0}^{1 / g_{n}} d H (y) = H (1 / g_{n}) - H (0) \overset{(23)}{\leq} c_{1} g_{n}^{- b} .

Let now

b > 1

. Since

f_{2} (z)

is supposed to be bounded, it follows from

| f_{2} (z) | \leq c_{1}^{*} < \infty

and (24i) that

\int_{0}^{1 / g_{n}} | f_{2} (x y^{γ}) | y^{- 1} d H (y) \leq c_{1}^{*} \int_{0}^{1 / g_{n}} y^{- 1} d H (y) \overset{(24 i)}{\leq} c_{1}^{*} c_{2} g_{n}^{- b + 1} .

Integration by parts,

| z | φ (z) \leq c^{*} = {(2 π e)}^{- 1 / 2}

, (24ii) and (24iii) lead to

|\int_{0}^{1 / g_{n}} Φ (x y^{γ}) d h_{2} (y)| \leq | h_{2} (1 / g_{n}) | + γ c^{*} \int_{0}^{1 / g_{n}} y^{- 1} | h_{2} (y) | d y \leq (c_{3} + γ c^{*} c_{4}) n g_{n}^{- b} .

Theorem 2 is proved. □

Appendix A.2. Proofs of Lemmas 1 to 5

Proof of Lemma 1.

To estimate

D_{n}

, we consider three cases:

D_{n} = {sup}_{x} | D_{n} (x) | = max {{sup}_{x > 0} | D_{n} (x) |, {sup}_{x < 0} | D_{n} (x) |, | D_{n} (0) |} .

Let

x > 0

. Since

\frac{\partial}{\partial y} Φ (x y^{γ}) = γ x y^{γ - 1} φ (x y^{γ}) \geq 0

, we find

\int_{1 / g_{n}}^{\infty} |\frac{\partial}{\partial y} Φ (x y^{γ})| d y = \int_{1 / g_{n}}^{\infty} γ x y^{γ - 1} φ (x y^{γ}) d y = \int_{x g_{n}^{- γ}}^{\infty} φ (u) d u = Φ (\infty) - Φ (x g_{n}^{- γ}) \leq 1 / 2 .

Consider now

f_{2} (x y^{γ}; a) = (a {(x y^{γ})}^{3} - 5 x y^{γ}) φ (x y^{γ}) / 4

with

a = 1

or

a = 1 / 3

. Then,

\frac{\partial}{\partial y} (\frac{f_{2} (x y^{γ}; a)}{y}) = \frac{Q_{5} (x y^{γ}; a)}{4 y^{2}}, Q_{5} (z; a) = - (γ a z^{5} - ((3 a + 5) γ - a) z^{3} + 5 (γ - 1) z) φ (z) .

(A2)

Since

{sup}_{z} | Q_{5} (z; a) | \leq c (γ; a) < \infty

and

g_{n}^{- 1} \int_{1 / g_{n}}^{\infty} y^{- 2} d y = 1

, inequality (29) holds for

x > 0

. Taking into account

| D_{n} (x) | = | D_{n} (- x) |

and

D_{n} (0) = 0

, Lemma 1 is proved. □

Proof of Lemma 2.

Using (30), we find

G_{r, r} (1 / g_{n}) \leq c_{1} g_{n}^{- r}

with

c_{1} = r^{r - 1} / Γ (r)

. For

r > 1

, then

\int_{0}^{1 / g_{n}} y^{- 1} d G_{r, r} (y) \leq c_{2} g_{n}^{- r + 1}

with

c_{2} = r^{r} / ((r - 1) Γ (r))

. Since

g_{r, r} (0) = 0

,

h_{2; r} (0) = 0

and

g_{n} \leq r n

for

r > 1

, then (24ii) and (24iii) hold with

c_{3} = c_{r}^{*}

and

c_{4} = c_{r}^{*} / (r - 1)

, where

c_{r}^{*} = \frac{r^{r}}{2 r Γ (r)} {sup}_{y} {e^{- r y} (| y - 1 | | 2 - r | + 1)} < \infty

.

It remains to prove the bounds in (34) and (35). Let first

r < 1

. With

c_{1}^{*} = {sup}_{z} | f_{2} (z; a) |

, we find

| I_{1} (x, n) | \leq \frac{c_{1}^{*} r^{r}}{g_{n} Γ (r)} \int_{1 / g_{n}}^{\infty} y^{r - 2} d y \leq \frac{c_{1}^{*} r^{r}}{(1 - r) Γ (r)} g_{n}^{- r} with c_{5} = \frac{c_{1}^{*} r^{r}}{(1 - r) Γ (r)} .

If

r = 1

with

c_{1}^{* *} = {sup}_{z} {| a z^{2} - 5 | φ (z / \sqrt{2})}

, we find

| f_{2} (z; a) | \leq c_{1}^{* *} | z | φ (z / \sqrt{2})

and

| I_{1} (x, n) | \leq \frac{c_{1}^{* *} | x |}{\sqrt{2 π} n} \int_{1 / n}^{\infty} y^{γ - 1} e^{- (y + x^{2} y^{2 γ} / 4)} d y with γ \in {- 1 / 2, 0, 1 / 2} .

For

γ = 1 / 2

using

| x | {(1 + x^{2} / 4)}^{- 1 / 2} \leq 2

, we obtain

| I_{1} (x, n) | \leq \frac{c_{1}^{* *} | x |}{\sqrt{2 π} n} \int_{1 / n}^{\infty} y^{1 / 2 - 1} e^{- (1 + x^{2} / 4) y} d y \leq \frac{c_{1}^{* *} | x | Γ (1 / 2)}{\sqrt{2 π} {(1 + x^{2} / 4)}^{1 / 2}} n^{- 1} \leq c_{6} n^{- 1}, c_{6} = \sqrt{2} c_{1}^{* *} .

If

γ = - 1 / 2

, then Prudnikov et al. [37], formula 2.3.16.3, for

x \neq 0

leads to

I_{1} (x, n) \leq \frac{c_{1}^{* *} | x |}{\sqrt{2 π} n} \int_{1 / n}^{\infty} y^{- 1 - 1 / 2} e^{- (2 y + x^{2} / (4 y))} d y \leq \frac{c_{1}^{* *} | x |}{\sqrt{2 π} n} \frac{2 \sqrt{π}}{| x |} e^{- \sqrt{2} | x |} \leq \frac{\sqrt{2} c_{1}^{* *}}{n}, c_{6} = \sqrt{2} c_{1}^{* *} .

Finally, if

γ = 0

, then

f_{2} (x y^{γ}; a) = f_{2} (x; a)

does not depend on y. Using now

0 \leq ln n - \int_{1 / n}^{1} y^{- 1} d G_{1, 1} (y) = \int_{1 / n}^{1} \frac{1 - e^{- y}}{y} d y \leq 1 and \int_{1}^{\infty} y^{- 1} d G_{1, 1} (y) \leq e^{- 1},

then (34) for

r = 1

holds with

c_{6} = c_{1}^{*} (1 + e^{- 1})

.

Let

r > 1

. Integration by parts for Lebesgue–Stieltjes integrals in

I_{2} (x, n)

in (35) and (A2) lead to

I_{2} (x, n) \leq \frac{1}{n g_{n}} (c_{1}^{*} g_{n} | h_{2; r} (1 / g_{n}) | + \int_{1 / g_{n}}^{\infty} \frac{| Q_{5} (x y^{γ}; a) |}{4 y^{2}} | h_{2; r} (y) | d y) .

(A3)

Since

c (γ; a) = {sup}_{z} | Q_{5} (z; a) | < \infty

and with above defined

c_{r}^{*}

, we find

\int_{1 / g_{n}}^{\infty} \frac{| h_{2; r} (y) |}{y^{2}} d y \leq c_{r}^{*} \int_{1 / g_{n}}^{\infty} y^{r - 3} d y = \frac{c_{r}^{*}}{(2 - r)} g_{n}^{- r + 2} for 1 < r < 2

and with

c_{r}^{* *} = \frac{r^{r - 1}}{2 Γ (r)} {sup}_{y} {e^{- r y / 2} (| y - 1 | | 2 - r | + 1)} < \infty

, we obtain

\int_{1 / g_{n}}^{\infty} \frac{| h_{2; r} (y) |}{y^{2}} d y \leq c_{r}^{* *} \int_{1 / g_{n}}^{\infty} y^{r - 3} e^{- r y / 2} d y \leq \frac{c_{r}^{* *} Γ (r - 2)}{{(r / 2)}^{r - 2}} for r > 2 .

Hence, we obtain (35) for

r > 1

,

r \neq 2

with some constant

0 < c_{7} < \infty

.

For

r = 2

, the second integral in the line above is an exponential integral. Therefore, we estimate the integral

I_{2} (x, n)

in (35) more precisely like in estimating

I_{1} (x, n)

above, taking into account the given function

f_{2} (z; a)

.

Using

| h_{2; 2} (y) | \leq 4 y e^{- 2 y}

and considering (A2), we define

P_{4} (z; a)

by

Q_{5} (z; a) = - z P_{4} (z; a) φ (z / \sqrt{2})

. With

c_{2}^{*} = {sup}_{z} | P_{4} (z; a) | φ (z / \sqrt{2}) < \infty

, we obtain in (A3)

\int_{1 / g_{n}}^{\infty} \frac{| Q_{5} (x y^{γ}) |}{4 y^{2}} | h_{2; 2} (y) | d y \leq \frac{c_{2}^{*} | x |}{\sqrt{2 π}} \int_{1 / g_{n}}^{\infty} y^{γ - 1} e^{- (2 y + x^{2} y^{2 γ} / 4)} d y .

We estimate the latter integral in the same way as

I_{1} (x, n)

for the two cases

γ = 1 / 2

and

γ = - 1 / 2

and find (35) for

r = 2

with some constants

0 < c_{7} < \infty

.

In order to prove (35) for

r = 2

and

γ = 0

, we consider for

α > 0

the following inequalities:

\int_{1 / g_{n}}^{\infty} y^{- 1} e^{- α y} d y \{\begin{matrix} \leq \int_{1 / g_{n}}^{1} y^{- 1} d y + \int_{1}^{\infty} e^{- α y} d y \leq ln g_{n} + α^{- 1} e^{- α}, \\ \geq \int_{1 / g_{n}}^{1} y^{- 1} e^{- α y} d y \geq e^{- α} \int_{1 / g_{n}}^{1} y^{- 1} d y \geq e^{- α} ln g_{n} \end{matrix} .

(A4)

The upper bound in (A4) leads to (35) for

r = 2, γ = 0

, too. The lower bound in (A4) shows that the

ln n

-term cannot be improved.
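The two-sided bound (A4) can be illustrated numerically. The sketch below evaluates the integral after the substitution y = e^t, which turns the integrand y^{-1} e^{-α y} d y into e^{-α e^t} d t, and then applies Simpson's rule; the upper cut-off and the function names are illustrative assumptions.

```python
import math


def tail_integral(g: float, alpha: float, t_max: float = 6.0,
                  steps: int = 4000) -> float:
    """int_{1/g}^inf y^(-1) exp(-alpha y) dy via y = exp(t), t in [-ln g, t_max]."""
    a = -math.log(g)

    def integrand(t: float) -> float:
        return math.exp(-alpha * math.exp(t))

    h = (t_max - a) / steps  # steps must be even for Simpson's rule
    total = integrand(a) + integrand(t_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    return total * h / 3


def a4_bounds(g: float, alpha: float):
    """Lower and upper bounds from (A4)."""
    lower = math.exp(-alpha) * math.log(g)
    upper = math.log(g) + math.exp(-alpha) / alpha
    return lower, upper


if __name__ == "__main__":
    for g in (10.0, 1000.0):
        for alpha in (0.5, 2.0):
            print(g, alpha, a4_bounds(g, alpha), tail_integral(g, alpha))
```

Both bounds grow like ln g_n, so the logarithmic factor is genuinely present in the integral.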

Bound (36) for

n \geq 2, r > 1

results from

0 \leq \frac{1}{g_{n}} - \frac{1}{r n} = \frac{r - 1}{r^{2} n^{2} (1 - (r - 1) / (r n))} \leq \frac{2 (r - 1)}{r^{2} n^{2}}

. □

Proof of Lemma 3.

Let

r > 0

. If

n = 1

, then

P (N_{1} (r) = 1) = 1

and (37) holds with

C (r) = 1

. Let

n \geq 2

and

α > 0

E {(N_{n} (r))}^{- α} = \frac{1}{n^{r}} (1 + \sum_{k = 2}^{\infty} \frac{Γ (k + r - 1)}{k^{α} Γ (r) Γ (k)} {(1 - \frac{1}{n})}^{k - 1}) .

It follows from the relations (49) and (50) with their corresponding bounds in the proof of Theorem 1 in Christoph et al. [12] that

\frac{Γ (k + r - 1)}{Γ (r) Γ (k)} = \frac{1}{(k + r - 1) B (r, k)} = \frac{k^{r - 1}}{Γ (r)} (1 + R_{1} (k)), | R_{1} (k) | \leq \frac{c_{1} (r)}{k} .

(A5)

For

x \geq k \geq 2

using

{(1 - 1 / n)}^{x} \leq e^{- x / n}

, we find

\frac{k^{r - 1} {(1 - 1 / n)}^{k - 1}}{k^{α}} \leq \int_{k}^{k + 1} \frac{x^{r} {(1 - 1 / n)}^{x - 2}}{{(x - 1)}^{1 + α}} d x \leq 2^{3 + α} \int_{k}^{k + 1} x^{r - 3} e^{- x / n} d x .

Then, with

c_{2} (r) = 2^{3 + α} (1 + c_{1} (r)) / Γ (r)

, we obtain

E {(N_{n} (r))}^{- α} \leq c_{2} (r) n^{- r} J_{r} (n), where J_{r} (n) = \int_{1}^{\infty} x^{r - α - 1} e^{- x / n} d x = n^{r - α} \int_{1 / n}^{\infty} y^{r - α - 1} e^{- y} d y .

Since

J_{r} (n) \leq {(α - r)}^{- 1}

for

0 < r < α

,

J_{r} (n) \leq n^{r - α} Γ (r - α)

for

r > α

and using (A4) with

r = α

J_{r} (n) \leq ln n + e^{- 1}

, the upper bound (37) is proved.

Let

r = α > 0

. Considering the formula (A5),

0 \leq \sum_{k = 2}^{\infty} k^{- 1} | R_{1} (k) | \leq c_{1} (r) π^{2} / (6 Γ (r)) < \infty

,

\sum_{(n)} : = \sum_{k = 2}^{n - 1} k^{- 1} \geq ln n - ln 2

and

\sum_{k = 2}^{n - 1} \frac{1 - {(1 - 1 / n)}^{k - 1}}{k} \leq \sum_{k = 2}^{n - 1} \frac{k - 1}{k n} \leq 1

, we find:

E {(N_{n} (r))}^{- r} \geq \frac{1}{n^{r} Γ (r)} (\sum_{k = 2}^{n - 1} \frac{1}{k} {(1 - \frac{1}{n})}^{k - 1} - c_{3}) \geq \frac{1}{n^{r} Γ (r)} (\sum_{k = 2}^{n - 1} \frac{1}{k} - c_{4}) \geq \frac{1}{n^{r} Γ (r)} (ln n - c_{5}),

where

c_{3} = c_{1} (r) π^{2} / 6

,

c_{4} = 1 + c_{3}

and

c_{5} = c_{4} - ln 2

. Hence, the

ln n

-term cannot be dropped. □

Proof of Lemma 4.

The upper bounds in the estimates (23) and (24) with

H_{s} (y)

,

h_{2; s} (y)

and

I_{2} (x, n)

given in (40) are

C (s) e^{- s n / 2}

. For example, (24ii):

\int_{0}^{1 / n} y^{- 1} | h_{2; s} (y) | d y \leq s (s + 1) / 2 \int_{0}^{1 / n} y^{- 3} e^{- s / y} d y \leq (s + 1) / (2 s) \int_{s n}^{\infty} z e^{- z} d z \leq (s + 1) / (2 s) e^{- s n / 2} .

□

Proof of Lemma 5.

Proceeding as in Bening et al. [24] using

P (N_{n} (s) = k) = {(\frac{k}{s + k})}^{n} - {(\frac{k - 1}{s + k - 1})}^{n} = s n \int_{k - 1}^{k} \frac{x^{n - 1}}{{(s + x)}^{n + 1}} d x

and Formula 2.2.4.24 in Prudnikov et al. [37], p. 298, then

E (N_{n}^{- α}) = s n \sum_{k = 1}^{\infty} \frac{1}{k^{α}} \int_{k - 1}^{k} \frac{x^{n - 1}}{{(s + x)}^{n + 1}} d x \leq s n \int_{0}^{\infty} \frac{x^{n - 1 - α}}{{(s + x)}^{n + 1}} d x = s n B (n - α, 1 + α) .

Using

B (n - α, 1 + α) = Γ (1 + α) {(n + 1)}^{- (1 + α)} (1 + R_{1} / n)

with

| R_{1} | \leq c < \infty

, we obtain (41). □
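The beta-function bound in this proof can be compared with a direct evaluation of the negative moment. The sketch below sums k^{-α} P(N_n(s) = k) up to a large cut-off, which gives a lower approximation of the moment, and evaluates s n B(n - α, 1 + α) through log-gamma functions; it requires n > α, and the cut-off and function names are assumptions made for illustration.

```python
import math


def pmf_max_pareto(k: int, n: int, s: float) -> float:
    """P(N_n(s) = k) = (k/(s+k))^n - ((k-1)/(s+k-1))^n for k = 1, 2, ..."""
    return (k / (s + k)) ** n - ((k - 1) / (s + k - 1)) ** n


def negative_moment(alpha: float, n: int, s: float, k_max: int = 100_000) -> float:
    """Truncated sum for E (N_n(s))^(-alpha); the neglected tail is positive."""
    return sum(k ** (-alpha) * pmf_max_pareto(k, n, s) for k in range(1, k_max + 1))


def beta_bound(alpha: float, n: int, s: float) -> float:
    """s n B(n - alpha, 1 + alpha), valid for n > alpha."""
    log_beta = (math.lgamma(n - alpha) + math.lgamma(1 + alpha)
                - math.lgamma(n + 1))
    return s * n * math.exp(log_beta)


if __name__ == "__main__":
    alpha, s = 2.0, 1.5
    for n in (5, 10, 100):
        moment = negative_moment(alpha, n, s)
        print(n, moment, beta_bound(alpha, n, s), moment * n ** alpha)
```

The last printed column stays bounded in n, which is the content of (41) for these parameter values.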

Appendix A.3. Proofs of Theorems 3 to 8

Proof of Theorem 3.

Since the additional assumptions (23) and (24) in the transfer Theorem 2 for the limit Gamma distribution

H (x) = G_{r, r} (x)

of the normalized sample size

N_{n} (r)

are satisfied by Lemma 2 with

b = r > 0

and by Lemma 3 for

α = 2

, it remains to calculate the integrals in (26). Define

\begin{matrix} J_{1}^{*} (x) & = \int_{0}^{\infty} Φ (x \sqrt{y}) d G_{r, r} (y), J_{2}^{*} (x) = \int_{0}^{\infty} \frac{(a {(x \sqrt{y})}^{3} - 5 x \sqrt{y}) φ (x \sqrt{y})}{4 y} d G_{r, r} (y), and \\ J_{3}^{*} (x) & = \int_{0}^{\infty} Φ (x \sqrt{y}) d h_{2; r} (y) with h_{2; r} (y) = ((y - 1) (2 - r) + 2 Q_{1} ((r (n - 1) + 1) y)) \frac{g_{r, r} (y)}{2 r}, \end{matrix}

and

Q_{1} (y) = 1 / 2 - (y - [y])

. Then,

G_{2; n} (x; 0) = J_{1}^{*} (x) + \frac{J_{2}^{*} (x)}{g_{n}} + \frac{J_{3}^{*} (x)}{n} with g_{n} = E N_{n} (r) = r (n - 1) + 1 .

(A6)

Using formula 2.3.3.1 in Prudnikov et al. [37], p. 322, with

α = r - 1 / 2, r + 1 / 2

,

p = 1 + x^{2} / (2 r)

and

q = 1

:

M_{α} (x) = \frac{r^{r}}{Γ (r) \sqrt{2 π}} \int_{0}^{\infty} y^{α - 1} e^{- (r + x^{2} / 2) y} d y = \frac{Γ (α) r^{r - α}}{Γ (r) \sqrt{2 π}} {(1 + x^{2} / (2 r))}^{- α}

(A7)

we calculate the integrals occurring in (A6). Consider

\begin{matrix} \frac{\partial}{\partial x} J_{1}^{*} (x) & = \int_{0}^{\infty} y^{1 / 2} φ (x \sqrt{y}) g_{r, r} (y) d y = \frac{r^{r}}{Γ (r) \sqrt{2 π}} \int_{0}^{\infty} y^{r - 1 / 2} e^{- (r + x^{2} / 2) y} d y \\ = M_{r + 1 / 2} (x) = s_{2 r} (x) and J_{1}^{*} (x) = S_{2 r} (x) . \end{matrix}

The integral

J_{2}^{*} (x)

in (A6) we calculate again with (A7) using

M_{r - 1 / 2} (x) = s_{2 r} (x) (2 r + x^{2}) / (2 r - 1)

and

M_{r + 1 / 2} (x) = s_{2 r} (x)

\begin{matrix} J_{2}^{*} (x) & = \frac{r^{r}}{4 \sqrt{2 π} Γ (r)} \int_{0}^{\infty} \frac{1}{y} (a x^{3} y^{3 / 2} - 5 x y^{1 / 2}) y^{r - 1} e^{- (r + x^{2} / 2) y} d y \\ & = \frac{1}{4} (a x^{3} M_{r + 1 / 2} (x) - 5 x M_{r - 1 / 2} (x)) = \frac{1}{4} (a x^{3} - \frac{10 r x + 5 x^{3}}{2 r - 1}) s_{2 r} (x) . \end{matrix}

The integral

J_{3}^{*} (x)

in (A6) is the same as the integral

J_{4} (x)

in the proof of Theorem 2 in Christoph et al. [12] with the estimate

sup_{x} |J_{3}^{*} (x) - \frac{(2 - r) x (x^{2} + 1)}{4 r (2 r - 1)} s_{2 r} (x)| \leq c (r) n^{- r + 1} .

With (36), we proved (44). □

Proof of Theorem 4.

By Lemma 2, the additional assumptions (23) and (24) in the transfer Theorem 2 are satisfied with the limit Gamma distribution

H (x) = G_{r, r} (x)

of the normalized sample size

N_{n} (r)

with

b = r > 0

. In Transfer Theorem 1 for

T_{N_{n}} = \sqrt{N_{n}} R_{N_{n}}

, the right-hand side of (20) is estimated by Lemma 1 and Lemma 3 for

α = 2

for the case

γ = 0

. Then, we have by (21) with (35)

G_{n} (x, 1 / g_{n}) = Φ (x) (1 - G_{r, r} (1 / g_{n}) - n^{- 1} h_{2; r} (1 / g_{n}) I_{{r > 1}} (r)) + \frac{f_{2} (x; a)}{g_{n}} \int_{1 / g_{n}}^{\infty} \frac{1}{y} d G_{r, r} (y) I_{{r > 1}} (r) .

The estimates (23), (24i), (24ii), (34) and

\int_{0}^{\infty} y^{- 1} d G_{r, r} (y) = r Γ (r - 1) / Γ (r)

for

r > 1

lead to (46) with

Φ_{n; 2} (x; 1)

defined in (47). Thus, Theorem 4 is proved. □

Proof of Theorem 5.

By Lemma 2, the additional assumptions (23) and (24) in the transfer Theorem 2 are satisfied with the limit Gamma distribution

H (x) = G_{r, r} (x)

of the normalized sample size

N_{n} (r)

with

b = r > 0

. In Transfer Theorem 1 for

T_{N_{n}} = g_{n}^{- 1 / 2} N_{n} R_{N_{n}}

, the right-hand side of (20) is estimated by Lemma 1 and Lemma 3 for

α = 2

for the case

γ = - 1 / 2

. Then, we have in (25)

G_{2; n} (x; 0) = J_{1; r}^{*} (x) + (\frac{J_{2; r}^{*} (x)}{g_{n}} + \frac{J_{3; r}^{*} (x)}{n}) I_{{r > 1}} (r) with g_{n} = E N_{n} (r) = r (n - 1) + 1

(A8)

\begin{matrix} J_{1; r}^{*} (x) & = \int_{0}^{\infty} Φ (x / \sqrt{y}) d G_{r, r} (y), J_{2; r}^{*} (x) = \int_{0}^{\infty} \frac{(a x^{3} y^{- 3 / 2} - 5 x y^{- 1 / 2}) φ (x / \sqrt{y})}{4 y} d G_{r, r} (y), and \\ J_{3; r}^{*} (x) & = \int_{0}^{\infty} Φ (x / \sqrt{y}) d h_{2; r} (y) with h_{2; r} (y) = ((y - 1) (2 - r) + 2 Q_{1} ((r (n - 1) + 1) y)) \frac{g_{r, r} (y)}{2 r} . \end{matrix}

Consider formula 2.3.16.1 in Prudnikov et al. [37], p. 344:

I_{α} : = \int_{0}^{\infty} y^{α - 1} e^{- p y - q / y} d y = 2 {(\frac{q}{p})}^{α / 2} K_{α} (2 \sqrt{p q}), \quad p > 0, q > 0,

where

K_{α} (u)

is the

α

-order Macdonald function (or

α

-order modified Bessel function of the second kind), see, e.g., Oldham et al. [30], Chapter 51, for properties of these functions.

Let us calculate the integral

J_{1; r}^{*} (x)

occurring in (A8). Consider

\begin{matrix} \frac{d}{d x} J_{1; r}^{*} (x) & = \frac{r^{r}}{\sqrt{2 π} Γ (r)} \int_{0}^{\infty} y^{r - 3 / 2} e^{- r y - x^{2} / (2 y)} d y \\ & = \frac{2 r^{r}}{\sqrt{2 π} Γ (r)} {(\frac{| x |}{\sqrt{2 r}})}^{r - 1 / 2} K_{r - 1 / 2} (\sqrt{2 r} | x |) = : l_{r} (x) . \end{matrix}

(A9)

If

α = \pm 1 / 2, \pm 3 / 2, \pm 5 / 2, \dots

the integral

I_{α}

and consequently

K_{α} (x)

are computable in closed-form expressions with formula 2.3.16.2 in Prudnikov et al. [37], p. 344:

I_{m}^{*} = \int_{0}^{\infty} y^{m - 1 / 2} e^{- p y - q / y} d y = {(- 1)}^{m} \sqrt{π} \frac{\partial^{m}}{\partial p^{m}} (p^{- 1 / 2} e^{- 2 \sqrt{p q}}), p > 0, q > 0, m = 0, 1, 2, \dots

(A10)

and with formula 2.3.16.3 in Prudnikov et al. [37], p. 344:

I_{- m}^{*} = \int_{0}^{\infty} y^{- m - 1 / 2} e^{- p y - q / y} d y = {(- 1)}^{m} \sqrt{\frac{π}{p}} \frac{\partial^{m}}{\partial q^{m}} e^{- 2 \sqrt{p q}}, p > 0, q > 0, m = 0, 1, 2, \dots

For

r = 1, 2, 3

using (A10) with

m = r - 1

, we find

l_{r} (x) = \frac{d}{d x} J_{1; r}^{*} (x) = \frac{r^{r}}{Γ (r) \sqrt{2 π}} \int_{0}^{\infty} y^{r - 3 / 2} e^{- r y - x^{2} / (2 y)} d y = \frac{r^{r}}{Γ (r) \sqrt{2 π}} I_{r - 1}^{*}

and we obtain the densities

l_{r} (x)

in (49) with

I_{m}^{*} (x) = \begin{cases} \sqrt{2 π} \frac{1}{| x |} e^{- \sqrt{2 r} | x |}, \quad x \neq 0, & m = - 1, \\ \sqrt{π} e^{- \sqrt{2 r} | x |}, & m = 0, \\ \sqrt{π} (\frac{1}{2 r^{3 / 2}} + \frac{| x | \sqrt{2}}{2 r}) e^{- \sqrt{2 r} | x |}, & m = 1, \\ \sqrt{π} (\frac{3}{4 r^{5 / 2}} + \frac{3 | x | \sqrt{2}}{4 r^{2}} + \frac{{| x |}^{2}}{2 r^{3 / 2}}) e^{- \sqrt{2 r} | x |}, & m = 2 . \end{cases}
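Closed forms of this type can be cross-checked against a direct quadrature of the integral in (A10) and its companion formula. The sketch below carries out the derivatives for general p > 0 and q > 0 and compares them with a trapezoid rule after the substitution y = e^t. For m = 0 the general value is \sqrt{π / p} e^{- 2 \sqrt{p q}}, which reduces to \sqrt{π} e^{- \sqrt{2} | x |} when p = r = 1 and q = x^2 / 2. Function names, integration limits and step counts are assumptions made for this illustration.

```python
import math


def exp_integral(m: int, p: float, q: float, t_min: float = -40.0,
                 t_max: float = 8.0, steps: int = 48_000) -> float:
    """Trapezoid value of int_0^inf y^(m - 1/2) exp(-p*y - q/y) dy, using y = exp(t)."""
    h = (t_max - t_min) / steps
    total = 0.0
    for i in range(steps + 1):
        y = math.exp(t_min + i * h)
        weight = 0.5 if i in (0, steps) else 1.0
        # dy = y dt, hence the exponent m - 1/2 + 1 = m + 1/2
        total += weight * y ** (m + 0.5) * math.exp(-p * y - q / y)
    return total * h


def closed_form(m: int, p: float, q: float) -> float:
    """Derivative formulas evaluated by hand for m = -1, 0, 1, 2."""
    decay = math.exp(-2.0 * math.sqrt(p * q))
    root_pi = math.sqrt(math.pi)
    if m == -1:
        return root_pi / math.sqrt(q) * decay
    if m == 0:
        return root_pi / math.sqrt(p) * decay
    if m == 1:
        return root_pi * (0.5 / p ** 1.5 + math.sqrt(q) / p) * decay
    if m == 2:
        return root_pi * (0.75 / p ** 2.5 + 1.5 * math.sqrt(q) / p ** 2 + q / p ** 1.5) * decay
    raise ValueError("only m = -1, 0, 1, 2 are implemented in this sketch")


if __name__ == "__main__":
    r, x = 3.0, 1.0  # illustrative values; p = r and q = x^2 / 2 as in the proof
    for m in (-1, 0, 1, 2):
        print(m, exp_integral(m, r, 0.5 * x * x), closed_form(m, r, 0.5 * x * x))
```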

Consider now

J_{2; r}^{*} (x)

for

r = 2

and

r = 3

:

J_{2; r}^{*} (x) = \int_{0}^{\infty} \frac{(a x^{3} y^{- 3 / 2} - 5 x y^{- 1 / 2}) r^{r} y^{r - 1} e^{- r y - x^{2} / (2 y)}}{4 y \sqrt{2 π} Γ (r)} d y = \frac{r^{r}}{4 \sqrt{2 π} Γ (r)} (a x^{3} I_{r - 3}^{*} (x) - 5 x I_{r - 2}^{*} (x)) .

Hence,

J_{2; 2}^{*} (x) = (a x | x | - 5 x / \sqrt{2}) e^{- 2 | x |} and J_{2; 3}^{*} (x) = \frac{27}{8} (\frac{a x^{3}}{\sqrt{2}} - 5 x (\frac{1}{6 \sqrt{6}} + \frac{| x |}{6})) e^{- \sqrt{6} | x |} .

Integration by parts in the integral

J_{3; r}^{*}

in (A8) leads to

\begin{matrix} J_{3; r}^{*} (x) & : = \int_{0}^{\infty} Φ (x y^{- 1 / 2}) d (h_{2; r} (y)) = \frac{x}{2} \int_{0}^{\infty} y^{- 3 / 2} φ (x y^{- 1 / 2}) h_{2; r} (y) d y \\ = \frac{r^{r} x}{2 r Γ (r) \sqrt{2 π}} \int_{0}^{\infty} y^{r - 5 / 2} e^{- r y - x^{2} / (2 y)} ((y - 1) (2 - r) + 2 Q_{1} (g_{n} y)) d y, \\ = \frac{r^{r - 1} x}{2 Γ (r) \sqrt{2 π}} \int_{0}^{\infty} y^{r - 5 / 2} (y - 1) (2 - r) e^{- r y - x^{2} / (2 y)} d y \\ + \frac{r^{r - 1} x}{Γ (r) \sqrt{2 π}} \int_{0}^{\infty} y^{r - 5 / 2} Q_{1} (g_{n} y) e^{- r y - x^{2} / (2 y)} d y = J_{3; r, 1} (x) + J_{3; r, 2} (x) . \end{matrix}

Since

J_{3; 2, 1} (x)

vanishes, we calculate

J_{3; 3, 1} (x)

:

J_{3; 3, 1} (x) = \frac{9 x}{2 \sqrt{2 π}} (I_{1}^{*} (x) - I_{2}^{*} (x)) = \frac{9 x}{2} (\frac{1}{12 \sqrt{6}} + \frac{| x |}{12} - \frac{{| x |}^{2}}{6 \sqrt{6}}) e^{- \sqrt{6} | x |} .

It remains to estimate

J_{3; 2, 2} (x)

and

J_{3; 3, 2} (x)

. The function

Q_{1} (y)

is periodic with period 1:

Q_{1} (y) = Q_{1} (y + 1) for all y \in R and Q_{1} (y) : = 1 / 2 - y for 0 \leq y < 1

It is right-continuous and has the jump 1 at every integer point y. The Fourier series expansion of

Q_{1} (y)

at all non-integer points y is

Q_{1} (y) = 1 / 2 - (y - [y]) = \sum_{k = 1}^{\infty} \frac{sin (2 π k y)}{k π} y \neq [y],

(A11)

see formula 5.4.2.9 in Prudnikov et al. [37], p. 726, with

a = 0

.
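The expansion (A11) converges slowly but can be illustrated with partial sums. The following sketch compares a truncated series with the sawtooth function at a few non-integer points; the number of terms and the function names are arbitrary choices for this illustration.

```python
import math


def q1(y: float) -> float:
    """Q_1(y) = 1/2 - (y - [y]), with [y] the integer part (floor) of y."""
    return 0.5 - (y - math.floor(y))


def q1_fourier(y: float, terms: int = 20_000) -> float:
    """Partial sum of sin(2 pi k y) / (k pi) for k = 1, ..., terms."""
    return sum(math.sin(2.0 * math.pi * k * y) / (k * math.pi)
               for k in range(1, terms + 1))


if __name__ == "__main__":
    for y in (0.3, 2.75, -0.4):  # non-integer points only
        print(y, q1(y), q1_fourier(y))
```

At integer points every term of the series vanishes, whereas Q_1 equals 1/2 there, which is why (A11) is stated for non-integer y only.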

Using the Fourier series expansion (A11) of the periodic function

Q_{1} (y)

and interchanging the sum and the integral, we find

J_{3; r, 2}^{*} = \frac{x}{\sqrt{2 π}} \sum_{k = 1}^{\infty} \frac{1}{k} \int_{0}^{\infty} y^{r - 5 / 2} e^{- r y - x^{2} / (2 y)} sin (2 π k g_{n} y) d y .

(A12)

First, we consider

r = 2

. Let

p > 0

,

q > 0

and

b > 0

be some real constants. Formula 2.5.37.4 in Prudnikov et al. [37], p. 453 reads

\int_{0}^{\infty} y^{- 1 / 2} e^{- p y - q / y} sin (b y) d y = \sqrt{\frac{π}{p^{2} + b^{2}}} e^{- 2 \sqrt{q} z_{+}} (z_{+} sin (2 \sqrt{q} z_{-}) + z_{-} cos (2 \sqrt{q} z_{-})) .

(A13)

with

2 z_{\pm}^{2} = \sqrt{p^{2} + b^{2}} \pm p

. Consider

z_{\pm}

with

p = r

,

q = x^{2} / 2

,

b = 2 π k g_{n}

,

k \geq 1

and

n \geq 1

. Then,

\sqrt{\frac{π}{p^{2} + b^{2}}} = \sqrt{\frac{π}{r^{2} + {(2 π k g_{n})}^{2}}} \leq \frac{\sqrt{π}}{2 π k g_{n}}, \sqrt{2} | x | z_{+} e^{- \sqrt{2} | x | z_{+}} \leq e^{- 1} and 0 < z_{-} \leq z_{+}

leads to

\begin{matrix} | J_{3; 2, 2}^{*} (x) | & \leq \frac{2 | x |}{\sqrt{2 π}} \sum_{k = 1}^{\infty} \frac{1}{k} \sqrt{\frac{π}{p^{2} + b^{2}}} e^{- 2 \sqrt{q} z_{+}} (z_{+} sin (2 \sqrt{q} z_{-}) + z_{-} cos (2 \sqrt{q} z_{-})) \\ \leq \frac{2}{\sqrt{2 π}} \sum_{k = 1}^{\infty} \frac{1}{k} \frac{\sqrt{π} \sqrt{2} e^{- 1}}{2 π k g_{n}} = \frac{1}{2 π e g_{n}} \sum_{k = 1}^{\infty} \frac{1}{k^{2}} = \frac{π}{12 e g_{n}} . \end{matrix}

Together with

g_{n} \geq n

, we find

n^{- 1} | J_{3; 2, 2}^{*} (x) | \leq C n^{- 2}

.
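Formula (A13) can be cross-checked numerically for moderate values of b. The sketch below integrates the left-hand side with a trapezoid rule after the substitution y = e^t and compares it with the right-hand side. The parameter values, integration limits and function names are assumptions chosen for illustration, with b = 2 π k g_n for k = 1 and g_n = 3 as one test case.

```python
import math


def sine_integral_numeric(p: float, q: float, b: float, t_min: float = -30.0,
                          t_max: float = 6.0, steps: int = 360_000) -> float:
    """Trapezoid value of int_0^inf y^(-1/2) exp(-p*y - q/y) sin(b*y) dy, with y = exp(t)."""
    h = (t_max - t_min) / steps
    total = 0.0
    for i in range(steps + 1):
        y = math.exp(t_min + i * h)
        weight = 0.5 if i in (0, steps) else 1.0
        # y^(-1/2) dy = y^(1/2) dt
        total += weight * math.sqrt(y) * math.exp(-p * y - q / y) * math.sin(b * y)
    return total * h


def sine_integral_closed(p: float, q: float, b: float) -> float:
    """Right-hand side of (A13) with 2 z_pm^2 = sqrt(p^2 + b^2) +- p."""
    modulus = math.sqrt(p * p + b * b)
    z_plus = math.sqrt(0.5 * (modulus + p))
    z_minus = math.sqrt(0.5 * (modulus - p))
    phase = 2.0 * math.sqrt(q) * z_minus
    envelope = math.sqrt(math.pi) / modulus * math.exp(-2.0 * math.sqrt(q) * z_plus)
    return envelope * (z_plus * math.sin(phase) + z_minus * math.cos(phase))


if __name__ == "__main__":
    p, q = 2.0, 0.5  # p = r = 2 and q = x^2 / 2 with x = 1
    for b in (3.0, 2.0 * math.pi * 3):
        print(b, sine_integral_numeric(p, q, b), sine_integral_closed(p, q, b))
```

The step size is kept small because the factor sin(b e^t) oscillates quickly for larger t; much larger values of b would need a finer grid.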

Consider finally

J_{3; 3, 2}^{*}

given in (A12). In order to estimate

J_{3; 3, 2}^{*} (x)

, we apply Leibniz’s rule for differentiation under the integral sign with respect to p in (A13) and obtain

\begin{matrix} \int_{0}^{\infty} y^{1 / 2} e^{- p y - q / y} sin (b y) d y & = \frac{\partial}{\partial p} \{\sqrt{\frac{π}{p^{2} + b^{2}}} e^{- 2 \sqrt{q} z_{+}} (z_{+} sin (2 \sqrt{q} z_{-}) + z_{-} cos (2 \sqrt{q} z_{-}))\} . \end{matrix}

Simple calculation considering

\sqrt{q} = | x | / \sqrt{2}

and

{| x |}^{m} e^{- \sqrt{2} | x | z_{+}} \leq \frac{m}{2^{m / 2} z_{+}} \leq \frac{m}{2^{m / 2} b^{m}}

for

m = 1, 2

, leads to

| x | \frac{\partial}{\partial p} \{\sqrt{\frac{π}{p^{2} + b^{2}}} e^{- 2 \sqrt{q} z_{+}} (z_{+} sin (2 \sqrt{q} z_{-}) + z_{-} cos (2 \sqrt{q} z_{-}))\} \leq \frac{C}{b^{2}} = \frac{C}{{(2 π k g_{n})}^{2}}

and we find from (A12) with

r = 3

that

n^{- 1} | J_{3; 3, 2}^{*} | \leq C n^{- 3}

and (50) is proved. The approximation (52) holds since Lemmas 1, 2, and 3 are valid for arbitrary

r > 0

. Theorem 5 is proved. □

Proof of Theorem 6.

By Lemma 4, the additional assumptions (23) and (24) in the transfer Theorem 2 are satisfied with the limit inverse exponential distribution

H_{s} (y)

and

h_{2; s} (y)

given in (40),

g_{n} = n

and

b = 2

. In Transfer Theorem 1, the right-hand side of (20) is estimated by Lemma 1 and Lemma 5 for

α = 2

for the case

γ = 1 / 2

. Then, we have in (25) with (35)

G_{2; n} (x; 0) = J_{1; s}^{*} (x) + n^{- 1} J_{2; s}^{*} (x) + n^{- 1} J_{3; s}^{*} (x),

\begin{matrix} where J_{1; s}^{*} (x) = \int_{0}^{\infty} Φ (x \sqrt{y}) d e^{- s / y}, & J_{2; s}^{*} (x) = \int_{0}^{\infty} \frac{(a x^{3} y^{3 / 2} - 5 x \sqrt{y}) φ (x \sqrt{y})}{4 y} d e^{- s / y}, \\ and J_{3; s}^{*} (x) = \int_{0}^{\infty} Φ (x \sqrt{y}) d h_{2; s} (y) & with h_{2; s} (y) = s e^{- s / y} (s - 1 + 2 Q_{1} (n y)) / (2 y^{2}) . \end{matrix}

To obtain (53), we calculate the above integrals as in the proof of Theorem 5 in Christoph et al. [12]. Here, we use Formula 2.3.16.3 in Prudnikov et al. [37], p. 344 with

p = x^{2} / 2 > 0

,

s > 0

,

m = 1, 2

:

\int_{0}^{\infty} \frac{e^{- x^{2} y / 2}}{\sqrt{2 π} y^{m - 3 / 2}} d H_{s} (y) = \int_{0}^{\infty} \frac{s e^{- x^{2} y / 2 - s / y}}{\sqrt{2 π} y^{m + 1 / 2}} d y = {(- 1)}^{m} \frac{s}{| x |} \frac{\partial^{m}}{\partial s^{m}} e^{- \sqrt{2 s} | x |} .

(A14)

In the mentioned proof we obtained with (A14) for

m = 1

\int_{0}^{\infty} Φ (x \sqrt{y}) d H_{s} (y) = L_{1 / \sqrt{s}} (x)

and with (A14) for

m = 2

n^{- 1} sup_{x} |J_{3; s}^{*} (x) - \frac{(1 - s) x (1 + \sqrt{2 s} | x |)}{8 s} l_{1 / \sqrt{s}} (x)| \leq n^{- 1} c (s) e^{- \sqrt{π s n} / 2} \leq C (s) n^{- 2} .

Again using (A14) with

p = x^{2} / 2 > 0

,

s > 0

,

m = 1, 2

we find

\begin{matrix} J_{2; s}^{*} (x) & = \frac{s}{4 \sqrt{2 π}} \int_{0}^{\infty} (a x^{3} y^{- 1 - 1 / 2} - 5 x y^{- 2 - 1 / 2}) e^{- (x^{2} y / 2 + s / y)} d y \\ & = \frac{2 s a x^{3} - 5 x (\sqrt{2 s} | x | + 1)}{8 s} l_{1 / \sqrt{s}} (x) . \end{matrix}

□
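The two closed forms used in the proof of Theorem 6 can be checked by quadrature. The sketch below assumes that l_{1/\sqrt{s}}(x) denotes the Laplace density \sqrt{s/2} e^{- \sqrt{2 s} | x |}, keeps the factor 1/4 from the definition of J_{2; s}^{*}(x), and treats a as a free constant. Function names, integration limits and parameter values are hypothetical choices for this illustration.

```python
import math


def exp_kernel_integral(power: float, p: float, q: float, t_min: float = -40.0,
                        t_max: float = 8.0, steps: int = 48_000) -> float:
    """Trapezoid value of int_0^inf y^power * exp(-p*y - q/y) dy, using y = exp(t)."""
    h = (t_max - t_min) / steps
    total = 0.0
    for i in range(steps + 1):
        y = math.exp(t_min + i * h)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * y ** (power + 1.0) * math.exp(-p * y - q / y)  # dy = y dt
    return total * h


def laplace_density(x: float, s: float) -> float:
    """Assumed form of l_{1/sqrt(s)}(x): sqrt(s/2) * exp(-sqrt(2 s) |x|)."""
    return math.sqrt(0.5 * s) * math.exp(-math.sqrt(2.0 * s) * abs(x))


def mixture_density(x: float, s: float) -> float:
    """d/dx of the integral of Phi(x sqrt(y)) dH_s(y), with dH_s(y) = s y^-2 e^(-s/y) dy."""
    return s / math.sqrt(2.0 * math.pi) * exp_kernel_integral(-1.5, 0.5 * x * x, s)


def j2_numeric(x: float, s: float, a: float) -> float:
    """Quadrature value of J*_{2;s}(x), including the factor 1/4."""
    p = 0.5 * x * x
    inner = a * x ** 3 * exp_kernel_integral(-1.5, p, s) - 5.0 * x * exp_kernel_integral(-2.5, p, s)
    return s / (4.0 * math.sqrt(2.0 * math.pi)) * inner


def j2_closed(x: float, s: float, a: float) -> float:
    """Closed form (2 s a x^3 - 5 x (sqrt(2 s)|x| + 1)) / (8 s) * l_{1/sqrt(s)}(x)."""
    numerator = 2.0 * s * a * x ** 3 - 5.0 * x * (math.sqrt(2.0 * s) * abs(x) + 1.0)
    return numerator / (8.0 * s) * laplace_density(x, s)


if __name__ == "__main__":
    for x, s, a in ((1.0, 2.0, 1.0), (-0.7, 0.5, 3.0)):  # x must be non-zero here
        print(mixture_density(x, s), laplace_density(x, s))
        print(j2_numeric(x, s, a), j2_closed(x, s, a))
```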

Proof of Theorem 7.

By Lemma 4, the additional assumptions (23) and (24) in Transfer Theorem 2 are satisfied with the limit inverse exponential distribution

H_{s} (y)

and

h_{2; s} (y)

given in (40),

g_{n} = n

and

b = 2

. In Transfer Theorem 1, the right-hand side of (20) is estimated by Lemma 1 and Lemma 5 for

α = 2

in the case

γ = 0

. Then, we have in (21) with (35)

G_{n} (x, 1 / n) = Φ (x) (1 - e^{- s n} - n^{- 1} h_{2; s} (1 / n)) + \frac{f_{2} (x; a)}{n} \int_{1 / n}^{\infty} \frac{1}{y} d e^{- s / y} .

The estimates (24i), (24ii) for

b = 2

and

\int_{0}^{\infty} y^{- 1} d e^{- s / y} = s \int_{0}^{\infty} y^{- 3} e^{- s / y} d y = s^{- 1} \int_{0}^{\infty} z e^{- z} d z = s^{- 1}

lead to

|G_{n} (x, 1 / g_{n}) - Φ (x) - n^{- 1} s^{- 1} f_{2} (x; a)| \leq C_{s} n^{- 2}

and Theorem 7 is proved. □

Proof of Theorem 8.

By Lemma 4, the additional assumptions (23) and (24) in Transfer Theorem 2 are satisfied with the limit inverse exponential distribution

H_{s} (y)

and

h_{2; s} (y)

given in (40),

g_{n} = n

and

b = 2

. In Transfer Theorem 1, the right-hand side of (20) is estimated by Lemma 1 and Lemma 5 for

α = 2

in the case

γ = - 1 / 2

. Then, we have in (21) with (35)

G_{2; n} (x; 0) = J_{1; s}^{*} (x) + n^{- 1} J_{2; s}^{*} (x) + n^{- 1} J_{3; s}^{*} (x),

\begin{matrix} where J_{1; s}^{*} (x) = \int_{0}^{\infty} Φ (x y^{- 1 / 2}) d e^{- s / y}, & J_{2; s}^{*} (x) = \int_{0}^{\infty} \frac{(a x^{3} y^{- 3 / 2} - 5 x y^{- 1 / 2}) φ (x y^{- 1 / 2})}{4 y} d e^{- s / y}, \\ and J_{3; s}^{*} (x) = \int_{0}^{\infty} Φ (x y^{- 1 / 2}) d h_{2; s} (y) & with h_{2; s} (y) = s e^{- s / y} (s - 1 + 2 Q_{1} (n y)) / (2 y^{2}) . \end{matrix}

To obtain (54), we calculate the above integrals:

\begin{matrix} \frac{\partial}{\partial x} \int_{0}^{\infty} Φ (x / \sqrt{y}) d e^{- s / y} & = \frac{s}{\sqrt{2 π}} \int_{0}^{\infty} y^{- 5 / 2} e^{- (x^{2} / 2 + s) / y} d y = \frac{s}{\sqrt{2 π}} \int_{0}^{\infty} z^{3 / 2 - 1} e^{- (x^{2} / 2 + s) z} d z \\ & = \frac{1}{2 \sqrt{2 s}} {(1 + \frac{x^{2}}{2 s})}^{- 3 / 2} = s_{2}^{*} (x; \sqrt{s}), \quad \text{and} \quad \int_{0}^{\infty} Φ (x / \sqrt{y}) d e^{- s / y} = S_{2}^{*} (x; \sqrt{s}) . \end{matrix}

(A15)
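The density in (A15) can be verified by quadrature. Differentiating Φ(x / \sqrt{y}) in x gives y^{-1/2} φ(x / \sqrt{y}), so together with d e^{-s/y} = s y^{-2} e^{-s/y} dy the integrand carries the power y^{-5/2}. The sketch below also checks the Gamma-type integral with power 7/2 that appears in the remaining terms of this proof. Function names, integration limits and parameter values are assumptions made for illustration.

```python
import math


def inverse_power_integral(power: float, big_k: float, t_min: float = -12.0,
                           t_max: float = 40.0, steps: int = 52_000) -> float:
    """Trapezoid value of int_0^inf y^(-power) exp(-K/y) dy with y = exp(t); needs power > 1."""
    h = (t_max - t_min) / steps
    total = 0.0
    for i in range(steps + 1):
        t = t_min + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        # y^(-power) dy = exp((1 - power) t) dt
        total += weight * math.exp((1.0 - power) * t - big_k * math.exp(-t))
    return total * h


def mixture_density(x: float, s: float) -> float:
    """d/dx of the integral of Phi(x / sqrt(y)) d exp(-s/y), evaluated numerically."""
    big_k = s + 0.5 * x * x
    return s / math.sqrt(2.0 * math.pi) * inverse_power_integral(2.5, big_k)


def scaled_student_density(x: float, s: float) -> float:
    """s*_2(x; sqrt(s)) = (1 + x^2 / (2 s))^(-3/2) / (2 sqrt(2 s))."""
    return (1.0 + x * x / (2.0 * s)) ** (-1.5) / (2.0 * math.sqrt(2.0 * s))


if __name__ == "__main__":
    for x, s in ((0.0, 1.0), (1.2, 2.0), (-0.8, 0.5)):
        print(x, s, mixture_density(x, s), scaled_student_density(x, s))
```

The same substitution gives \int_{0}^{\infty} y^{- c} e^{- K / y} d y = K^{1 - c} Γ (c - 1) for c > 1, which covers the integrals with powers 7/2 and 9/2 in the remaining steps.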

Define

K = (s + x^{2} / 2)

. With

z = K / y

and

Γ (α) = \int_{0}^{\infty} z^{α - 1} e^{- z} d z

,

α > 0

, we obtain

\begin{matrix} J_{2; s}^{*} (x) & = \frac{s}{4 \sqrt{2 π}} \int_{0}^{\infty} (a x^{3} y^{- 9 / 2} - 5 x y^{- 7 / 2}) e^{- K / y} d y = \frac{s K^{- 7 / 2}}{4 \sqrt{2 π}} \int_{0}^{\infty} (a x^{3} z^{5 / 2} - 5 x z^{3 / 2} K) e^{- z} d z \\ = \frac{s K^{- 7 / 2}}{4 \sqrt{2 π}} (a x^{3} Γ (7 / 2) - 5 x K Γ (5 / 2)) = \frac{1}{4 {(x^{2} + 2 s)}^{2}} (15 (a - 1) x^{3} - 30 x s) s_{2}^{*} (x; \sqrt{s}) . \end{matrix}

Integration by parts in

J_{3; s}^{*} (x)

leads to

J_{3; s}^{*} (x) = \frac{x}{2 \sqrt{2 π}} \int_{0}^{\infty} y^{- 3 / 2} e^{- x^{2} / (2 y)} s y^{- 2} e^{- s / y} ((s - 1) / 2 + Q_{1} (n y)) d y = J_{4; s}^{*} (x) + J_{5; s}^{*} (x),

where

J_{4; s}^{*} (x) = \frac{s (s - 1) x}{4 \sqrt{2 π}} \int_{0}^{\infty} y^{- 7 / 2} e^{- K / y} d y = \frac{s (s - 1) x Γ (5 / 2)}{4 \sqrt{2 π} K^{5 / 2}} = \frac{3 (s - 1) x}{4 (x^{2} + 2 s)} s_{2}^{*} (x; \sqrt{s})

and using the Fourier series expansion (A11) of the periodic function

Q_{1} (y)

and interchanging the sum and the integral, we find

\begin{matrix} J_{5; s}^{*} (x) & = \frac{s x}{2 \sqrt{2 π}} \int_{0}^{\infty} y^{- 7 / 2} e^{- K / y} Q_{1} (n y) d y \\ & = \frac{s x}{{(2 π)}^{3 / 2}} \sum_{k = 1}^{\infty} \frac{1}{k} \int_{0}^{\infty} y^{- 7 / 2} e^{- K / y} sin (2 π k n y) d y . \end{matrix}

Integration by parts in the latter integral and

| x | / \sqrt{K} \leq \sqrt{2}

leads now to

{sup}_{x} | J_{5; s}^{*} (x) | \leq {sup}_{x} \frac{s | x |}{{(2 π)}^{3 / 2} n} \sum_{k = 1}^{\infty} \frac{1}{k^{2}} \int_{0}^{\infty} (\frac{7}{2} y^{- 9 / 2} + K y^{- 11 / 2}) e^{- K / y} d y \leq c_{s} n^{- 1}

with

c_{s} = \frac{s \sqrt{2}}{{(2 π)}^{3 / 2}} (\frac{7 Γ (11 / 2)}{2 s^{3}} + \frac{Γ (13 / 2)}{s^{4}}) \frac{π^{2}}{6}

and Theorem 8 is proved. □

References

1. Hall, P.; Marron, J.S.; Neeman, A. Geometric representation of high dimension, low sample size data. J. R. Stat. Soc. Ser. B 2005, 67, 427–444.
2. Fujikoshi, Y.; Ulyanov, V.V.; Shimizu, R. Multivariate Statistics. High-Dimensional and Large-Sample Approximations; Wiley Series in Probability and Statistics; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2010.
3. Aoshima, M.; Shen, D.; Shen, H.; Yata, K.; Zhou, Y.-H.; Marron, J.S. A survey of high dimension low sample size asymptotics. Aust. N. Z. J. Stat. 2018, 60, 4–19.
4. Kawaguchi, Y.; Ulyanov, V.V.; Fujikoshi, Y. Asymptotic distributions of basic statistics in geometric representation for high-dimensional data and their error bounds (Russian). Inform. Appl. 2010, 4, 12–17.
5. Ulyanov, V.V.; Christoph, G.; Fujikoshi, Y. On approximations of transformed chi-squared distributions in statistical applications. Sib. Math. J. 2006, 47, 1154–1166.
6. Esquível, M.L.; Mota, P.P.; Mexia, J.T. On some statistical models with a random number of observations. J. Stat. Theory Pract. 2016, 10, 805–823.
7. Gnedenko, B.V. Estimating the unknown parameters of a distribution with a random number of independent observations. (Probability theory and mathematical statistics (Russian)). Tr. Tbil. Mat. Instituta 1989, 92, 146–150.
8. Nunes, C.; Capristrano, G.; Ferreira, D.; Ferreira, S.S.; Mexia, J.T. Fixed effects ANOVA: An extension to samples with random size. J. Stat. Comput. Simul. 2014, 84, 2316–2328.
9. Nunes, C.; Capristrano, G.; Ferreira, D.; Ferreira, S.S.; Mexia, J.T. Exact critical values for one-way fixed effects models with random sample sizes. J. Comput. Appl. Math. 2019, 354, 112–122.
10. Barakat, H.M.; Nigm, E.M.; El-Adll, M.E.; Yusuf, M. Prediction of future generalized order statistics based on exponential distribution with random sample size. Stat. Pap. 2018, 59, 605–631.
11. Al-Mutairi, J.S.; Raqab, M.Z. Confidence intervals for quantiles based on samples of random sizes. Stat. Pap. 2020, 61, 261–277.
12. Christoph, G.; Monakhov, M.M.; Ulyanov, V.V. Second-order Chebyshev–Edgeworth and Cornish-Fisher expansions for distributions of statistics constructed with respect to samples of random size. J. Math. Sci. 2020, 244, 811–839.
13. Christoph, G.; Ulyanov, V.V.; Bening, V.E. Second Order Expansions for Sample Median with Random Sample Size. arXiv 2019, arXiv:1905.07765v2.
14. Fujikoshi, Y.; Ulyanov, V.V. Non-Asymptotic Analysis of Approximations for Multivariate Statistics; Springer: Singapore, 2020.
15. Johnson, N.L.; Kotz, S.; Balakrishnan, N. Continuous Univariate Distributions, 2nd ed.; Wiley: New York, NY, USA, 1995; Volume 2.
16. Konishi, S. Asymptotic expansions for the distributions of functions of a correlation matrix. J. Multivar. Anal. 1979, 9, 259–266.
17. Christoph, G.; Ulyanov, V.V.; Fujikoshi, Y. Accurate approximation of correlation coefficients by short Edgeworth-Chebyshev expansion and its statistical applications. In Prokhorov and Contemporary Probability Theory; Proceedings in Mathematics & Statistics 33; Shiryaev, A.N., Varadhan, S.R.S., Presman, E.L., Eds.; Springer: Heidelberg, Germany, 2013; pp. 239–260.
18. Prokhorov, Y.V. Limit theorems for the sums of random vectors whose dimension tends to infinity. Teor. Veroyatnostei Primen. 1990, 35, 751–753, English translation: Theory Probab. Appl. 1991, 35, 755–757.
19. Gnedenko, B.V.; Korolev, V.Y. Random Summation. Limit Theorems and Applications; CRC Press: Boca Raton, FL, USA, 1996.
20. Döbler, C. New Berry-Esseen and Wasserstein bounds in the CLT for non-randomly centered random sums by probabilistic methods. ALEA Lat. Am. J. Probab. Math. Stat. 2015, 12, 863–902.
21. Chen, L.H.Y.; Goldstein, L.; Shao, Q.-M. Normal Approximation by Stein’s Method. Probability and its Applications; Springer: Heidelberg, Germany, 2011.
22. Pike, J.; Ren, H. Stein’s method and the Laplace distribution. ALEA Lat. Am. J. Probab. Math. Stat. 2014, 11, 571–587.
23. Bening, V.E.; Korolev, V.Y. On the use of Student’s distribution in problems of probability theory and mathematical statistics. Theory Probab. Appl. 2005, 49, 377–391.
24. Bening, V.E.; Galieva, N.K.; Korolev, V.Y. Asymptotic expansions for the distribution functions of statistics constructed from samples with random sizes (Russian). Inform. Appl. 2013, 7, 75–83.
25. Bening, V.E.; Korolev, V.Y.; Zeifman, A.I. Asymptotic expansions for the distribution function of the sample median constructed from a sample with random size. In Proceedings 30th ECMS 2016 Regensburg; Claus, T., Herrmann, F., Manitz, M., Rose, O., Eds.; European Council for Modeling and Simulation: Regensburg, Germany, 2016; pp. 669–675.
26. Schluter, C.; Trede, M. Weak convergence to the Student and Laplace distributions. J. Appl. Probab. 2016, 53, 121–129.
27. Gavrilenko, S.V.; Zubov, V.N.; Korolev, V.Y. The rate of convergence of the distributions of regular statistics constructed from samples with negatively binomially distributed random sizes to the Student distribution. J. Math. Sci. 2017, 220, 701–713.
28. Buddana, A.; Kozubowski, T.J. Discrete Pareto distributions. Econ. Qual. Control. 2014, 29, 143–156.
29. Christoph, G.; Wolf, W. Convergence Theorems with a Stable Limit Law, Mathematical Research 70; Akademie-Verlag: Berlin, Germany, 1992.
30. Oldham, K.B.; Myland, J.C.; Spanier, J. An Atlas of Functions, 2nd ed.; Springer Science + Business Media: New York, NY, USA, 2009.
31. Kotz, S.; Kozubowski, T.J.; Podgórski, K. The Laplace distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance; Birkhäuser: Boston, MA, USA, 2001.
32. Prudnikov, A.P.; Brychkov, Y.A.; Marichev, O.I. Integrals and Series, Volume 2: Special Functions, 3rd ed.; Gordon & Breach Science Publishers: New York, NY, USA, 1992.
33. Madan, D.B.; Seneta, E. The variance gamma (V.G.) model for share markets returns. J. Bus. 1990, 63, 511–524.
34. Goldfeld, S.M.; Quandt, R.E. Econometric modelling with non-normal disturbances. J. Econom. 1981, 17, 141–155.
35. Missiakeles, S. Sargan densities which one? J. Econom. 1983, 23, 223–233.
36. Jackman, S. Bayesian Analysis for the Social Sciences; Wiley Series in Probability and Statistics; John Wiley & Sons, Ltd.: Chichester, UK, 2009.
37. Prudnikov, A.P.; Brychkov, Y.A.; Marichev, O.I. Integrals and Series, Volume 1: Elementary Functions, 3rd ed.; Gordon & Breach Science Publishers: New York, NY, USA, 1992.

Figure 1. Distribution function

P (N_{n} (r) \leq (r (n - 1) + 1) x)

(black line, almost covered by the red line), the limit law

G_{2, 2} (x)

(blue line) and the second approximation

G_{2, 2} (x) + h_{2} (x) / n

(red line) with

n = 25

and

r = 2

.

Figure 2. Empirical version of

P (\sqrt{g_{n}} R_{N_{n} (r)} \leq x)

(blue line), limit Student law

S_{2 r} (x)

(orange line) and second approximation

S_{2 r; n} (x; 1)

(green line) for the correlation coefficient for pairs of normal vectors with random dimension

N_{25} (2)

. Here,

x > 0, n = 25

and

r = 2

.

Figure 3. Empirical version of

P (\sqrt{N_{n} (r)} R_{N_{n} (r)} \leq x)

(blue line), limit normal law

Φ (x)

(orange line) and second approximation

Φ_{n; 2} (x; 1)

(green line) for the correlation coefficient for pairs of normal vectors with random dimension

N_{25} (2)

. Here,

x > 0, n = 25

and

r = 2

.

Figure 4. Empirical version of

P (g_{n}^{- 1 / 2} N_{n} (r) R_{N_{n} (r)} \leq x)

(blue line), limit Laplace law

L_{r} (x)

(orange line) and second approximation

L_{n; 2} (x; 1)

(green line) for the correlation coefficient for pairs of normal vectors with random dimension

N_{25} (2)

. Here,

x > 0, n = 25

and

r = 2

.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
