Review

A Review: Construction of Statistical Distributions

1
Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science (IRADS), Beijing Normal-Hong Kong Baptist University, 2000 Jintong Road, Tangjiawan, Zhuhai 519087, China
2
The Key Lab of Random Complex Structures and Data Analysis, The Chinese Academy of Sciences, Beijing 100045, China
3
Faculty of Science and Technology, Beijing Normal-Hong Kong Baptist University, 2000 Jintong Road, Tangjiawan, Zhuhai 519087, China
*
Author to whom correspondence should be addressed.
Entropy 2025, 27(12), 1188; https://doi.org/10.3390/e27121188
Submission received: 7 October 2025 / Revised: 6 November 2025 / Accepted: 20 November 2025 / Published: 23 November 2025
(This article belongs to the Special Issue Number Theoretic Methods in Statistics: Theory and Applications)

Abstract

Statistical modeling is fundamentally based on probability distributions, which can be discrete or continuous and univariate or multivariate. This review focuses on the methods used to construct these distributions, covering both traditional and newly developed approaches. We first examine classic distributions such as the normal, exponential, gamma, and beta for univariate data, and the multivariate normal, elliptical, and Dirichlet for multidimensional data. We then address how, in recent decades, the demand for more flexible modeling tools has led to the creation of complex meta-distributions built using copula theory.

1. Introduction

Statistical distributions form the foundation of probability theory, statistical inference, and predictive modeling. They provide a mathematical framework for describing the randomness, variability, and uncertainty inherent in data. Different types of data—univariate or multivariate, discrete or continuous—require different distributions, including mixture models for complex patterns.
Numerous books offer comprehensive introductions to useful distributions. The seminal four-volume work by Johnson et al. [1,2,3,4] provides an exceptionally detailed encyclopedia of distributions and their interrelations, though its sheer scope can make it challenging for readers to locate specific information efficiently. Other key texts include Fang and Xu [5] (in Chinese, which limits its accessibility), Fang et al. [6], a focused treatment of multivariate symmetric distributions, Johnson [7], a review of statistical simulation for multivariate distributions, and the recent work by Fang et al. [8], a comprehensive study of representative points of distributions and their applications.
The purpose of this review is to synthesize various methods for constructing statistical distributions. By presenting a unified overview of both traditional and modern techniques, we aim to equip readers with the knowledge to understand existing distributions and develop new ones to meet evolving data modeling needs.
This paper is structured as follows. Section 2 introduces construction methods for univariate distributions, encompassing both traditional and newer models. Section 3 briefly reviews methods for constructing discrete distributions, including both univariate and multivariate frameworks. Section 4 then details construction techniques for multivariate statistical distributions, specifically the families of elliptical distributions and meta-distributions created via copula methods. The paper concludes with final remarks in Section 5.

2. Construction of Univariate Statistical Distributions

This section introduces several approaches for constructing univariate continuous distributions:
(1)
Distribution families, including location-scale families, the Pearson system, and exponential families;
(2)
Functions of random variables via stochastic representations;
(3)
Parameter expansion, which generalizes existing families by adding parameters; for example, normal → skew-normal and exponential → Weibull;
(4)
Generalized families via transformation of the cdf. A powerful and prolific approach for generating new families of distributions is through the functional transformation of an existing baseline cdf, F(x). These transformations introduce new parameters, thereby creating more flexible distributions capable of modeling complex data behaviors.
(5)
Combination methods, such as mixture distributions, compound distributions, convolutions, and copula-based techniques.

2.1. Distribution Families

A distribution family is defined by a specific structural property. Among many such families, the following three are particularly well established in the literature.

2.1.1. Location-Scale Distribution Family

A distribution F(x; μ, σ) belongs to the location-scale family if its probability density function (pdf) can be expressed as
f(x) = \frac{1}{\sigma}\, f_0\!\left(\frac{x-\mu}{\sigma}\right),
where μ is the location parameter, σ > 0 is the scale parameter, and f0 is the standard density of the distribution Z ∼ F(x; 0, 1). This implies that for any X ∼ F(x; μ, σ), the standardized variable Z = (X − μ)/σ follows the standard distribution F(x; 0, 1).
A key property of this family is its invariance under affine transformations: if X ∼ F(x; μ, σ), then a + bX ∼ F(x; a + bμ, |b|σ). The normal distribution is a classic example of a location-scale distribution. Other prominent examples include the Cauchy and logistic distributions.
  • Cauchy distribution: Its pdf is given by
    f(x; \theta, \lambda) = \frac{1}{\pi\lambda\left[1+\left(\frac{x-\theta}{\lambda}\right)^{2}\right]},
    where θ is the location parameter and λ is the scale parameter. Note that the Cauchy distribution does not have a defined mean or variance.
  • Logistic distribution: Named for its relationship to the logistic (sigmoid) function, the logistic distribution resembles the normal distribution in shape but possesses heavier tails. It finds applications in logistic regression, growth models, and economics. Its cumulative distribution function (cdf) is given by
    F(x; \alpha, \beta) = \frac{1}{1+\exp\left(-(x-\alpha)/\beta\right)}, \quad x, \alpha \in \mathbb{R}, \ \beta > 0.
There are many other distributions in this category. For further reading, see [9,10].

2.1.2. Pearson Distribution Family

The Pearson family is a system of continuous probability distributions, introduced by Karl Pearson [11], designed to model a wide range of empirical data characterized by properties such as skewness and kurtosis that deviate from the normal distribution. The system is defined by the Pearson differential equation:
\frac{df(x)}{dx} = \frac{(x-\lambda)\, f(x)}{ax^{2}+bx+c},
where f(x) is a probability density function and the parameters a, b, c, λ determine the specific distribution type.
The roots of the denominator (ax2 + bx + c) classify the distributions into eight or more types, listed in Table 1: type 0 is the normal distribution; type I is the beta distribution; type II is a symmetrical U-shaped curve; type III is the gamma distribution; type IV is skewed and heavy-tailed; type V is the shifted inverse gamma distribution; type VI is an inverse beta distribution; and type VII is the t-distribution. Some references consider additional types, such as types VIII and IX, which are special cases of the power function distribution, and type X, which is the exponential distribution. More details can be found in Chapter 12 of [1,12].

2.1.3. The Exponential Distribution Family

The term “family” can be understood here almost in the biological sense: although its members, such as the normal, exponential, gamma, beta, and Poisson distributions, may appear quite distinct, they share a common underlying mathematical structure. This shared “DNA” makes them exceptionally powerful and convenient in statistical modeling and machine learning.
Formally, the exponential family comprises distributions whose probability density (or mass) functions can be expressed in the following canonical form:
f(x; \theta) = h(x)\exp\left(\theta^{\top} T(x) - B(\theta)\right),
where h(x) is a base measure depending only on the data x, not on the parameters θ = (θ1, …, θk); T(x) = (T1(x), …, Tk(x)) is a vector of sufficient statistics; and B(θ) is the log-partition function, a normalization term ensuring that the density integrates (or sums) to 1. B(θ) also generates the moments (mean, variance) of the distribution.
This common structure leads to several powerful properties:
(1)
All members have a low-dimensional sufficient statistic T(x).
(2)
In Bayesian statistics, the exponential family always has a natural conjugate prior. This makes Bayesian updating mathematically neat and computationally tractable.
(3)
Distributions in the exponential family are the maximum entropy distributions given constraints on the expected values of the sufficient statistics. For example, the normal distribution has the maximum entropy for a given mean and variance.
Further details can be found in references such as [1,9]. Many additional families of univariate distributions have also been cataloged in [1].
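As a quick numerical illustration of the canonical form (our own sketch, assuming NumPy and SciPy are available; the function name expfam_normal_pdf is ours), the N(μ, 1) density can be written with θ = μ, T(x) = x, and B(θ) = θ²/2, and checked against a reference implementation:

```python
import numpy as np
from scipy.stats import norm

# N(mu, 1) in canonical exponential-family form:
#   f(x; theta) = h(x) * exp(theta * T(x) - B(theta))
# with theta = mu, T(x) = x, B(theta) = theta**2 / 2,
# and base measure h(x) = exp(-x**2 / 2) / sqrt(2 * pi).

def expfam_normal_pdf(x, theta):
    h = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # base measure h(x)
    B = theta**2 / 2                              # log-partition function
    return h * np.exp(theta * x - B)

x = np.linspace(-4, 6, 101)
mu = 1.3
assert np.allclose(expfam_normal_pdf(x, mu), norm.pdf(x, loc=mu, scale=1))
```

Consistent with property (3), B′(θ) = θ recovers the mean and B″(θ) = 1 the variance of this subfamily.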

2.2. Distributions Produced by a Function of Random Variables

When the distribution of a random variable X or a random vector X = (X1, …, Xk) is known, applying a continuous function Y = g(X) or Y = h(X) to obtain a new distribution is a prevalent construction approach. In this context, the original distribution is termed the initial distribution, while the resulting distribution is called the target distribution. The inverse transformation method [13] is a widely used technique for statistical simulation, based on the following theorem.
Theorem 1.
Let X have a continuous cdf F(x) and a uniform random variable U ∼ U(0, 1). Then we have
(a) 
F−1(U) ∼ F(·);
(b) 
F(X) ∼ U(0, 1).
Following Theorem 1, we can leverage this connection between any continuous distribution and the uniform distribution to derive the distribution of a transformed random variable Y = g(X). The key insight is to use the probability integral transform from part (b) of the theorem. We know that FX(X) ∼ U(0, 1). Therefore, the cdf of the transformation Y = g(X) can be found by relating the event {Yy} back to this uniform distribution. Specifically, the cdf of Y is
F_Y(y) = P(Y \le y) = P\left(g(X) \le y\right).
The solution to this probability depends on the nature of the function g. For monotonic and differentiable transformations, this approach leads directly to the well-known change of variables formula for the pdf of Y:
f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right| = f_X\!\left(g^{-1}(y)\right)\left|\frac{d}{dy}\, g^{-1}(y)\right|,
where x = g−1(y) is the inverse function. This formula can be derived by differentiating the cdf FY(y), which itself is computed using the distribution of X that Theorem 1 helps us to understand and manipulate through the uniform distribution.
A systematic study and generalization of the inverse transformation method can be found in Li et al. [14]. Numerous distributions have been constructed using this approach. The following are some functions commonly employed in the literature for this purpose.
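The inverse transformation method of Theorem 1(a) can be sketched in a few lines of Python (an illustrative example of ours, using the Exp(β) baseline, for which F(x) = 1 − e^{−x/β} and F⁻¹(u) = −β log(1 − u)):

```python
import numpy as np

rng = np.random.default_rng(0)

# Inverse transformation method (Theorem 1(a)): if U ~ U(0, 1), then
# F^{-1}(U) ~ F.  For Exp(beta) with mean beta, F(x) = 1 - exp(-x/beta),
# so F^{-1}(u) = -beta * log(1 - u).

def exp_inverse_cdf(u, beta):
    return -beta * np.log(1 - u)

beta = 2.0
u = rng.uniform(size=100_000)
x = exp_inverse_cdf(u, beta)

# The sample mean and variance should approximate beta and beta**2.
assert abs(x.mean() - beta) < 0.05
assert abs(x.var() - beta**2) < 0.2
```

The same recipe yields samples from any target distribution whose quantile function is available in closed form.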

2.2.1. Transformation of Random Variables

Some common transformations are listed below.
  • Linear transformation Y = a + bX, for example, used in the location-scale distributions;
  • Power transformation Y = Xr, for example, from the normal to the χ2 distribution by Y = X2;
  • Exponential transformation Y = eX, for example, from the normal to the lognormal distribution by Y = eX;
  • Inverse transformation Y = 1/X, for example, the inverse gamma distribution.
  • Transformation Y = g ( X 1 , , X m ) , for example, the F-distribution and beta distribution are produced by functions of two χ2 distribution variables.

2.2.2. Order Statistics

Order statistics are fundamental objects in statistics; examples include the sample minimum, maximum, and median. They concern the properties and applications of sorted random samples. Let X1, …, Xn be a random sample from a population with cdf F and pdf f. The order statistics, denoted by X(1) ≤ X(2) ≤ ⋯ ≤ X(n), represent the sorted values of the sample. The cdf of the kth order statistic, X(k), is given by
P\left(X_{(k)} \le x\right) = \sum_{j=k}^{n} \binom{n}{j}\, [F(x)]^{j}\, [1-F(x)]^{n-j}.
The density functions for specific order statistics are provided in Table 2. Order statistics have wide-ranging applications, including robust statistics, sample range and interquartile range, quantiles and Q–Q plots, reliability engineering, and non-parametric statistical tests (e.g., the Wilcoxon rank-sum test). A comprehensive introduction to order statistics can be found in [15].
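The cdf above is the binomial survival function evaluated at F(x): it is the probability that at least k of the n observations fall below x. A minimal Python sketch (ours; names are illustrative) checks it by Monte Carlo for the median of five uniforms:

```python
import numpy as np
from scipy.stats import binom, uniform

rng = np.random.default_rng(1)

# P(X_(k) <= x) = sum_{j=k}^{n} C(n, j) F(x)^j (1 - F(x))^(n-j),
# i.e. the probability that at least k of the n observations fall below x.

def order_stat_cdf(x, k, n, F):
    p = F(x)
    return binom.sf(k - 1, n, p)   # sum over j = k, ..., n

# Monte Carlo check with the sample median of n = 5 uniforms (k = 3).
n, k, x0 = 5, 3, 0.4
samples = rng.uniform(size=(200_000, n))
medians = np.sort(samples, axis=1)[:, k - 1]
empirical = (medians <= x0).mean()
exact = order_stat_cdf(x0, k, n, uniform.cdf)
assert abs(empirical - exact) < 0.01
```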

2.2.3. Extreme Value Theory (EVT)

EVT is a statistical framework for assessing the probabilities of extreme events. The fundamental result in EVT is the extreme value theorem, stating that the distribution of properly normalized maxima (or minima) converges to one of three types of distributions, regardless of the underlying distribution of the original data. These three types are collectively known as the generalized extreme value (GEV) distribution. The cdf for the GEV is given by
F(x) = \exp\left\{-\left[1+\xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}, \quad \text{if } 1+\xi\left(\frac{x-\mu}{\sigma}\right) > 0,
where parameters are μ for location, σ > 0 for scale, and ξ for shape. The three special cases of the GEV distribution are
  • Type I (Gumbel distribution) Gu(μ, σ) light-tailed, with ξ = 0. Its pdf is
    \frac{1}{\sigma}\exp\left(-\frac{x-\mu}{\sigma}-\exp\left(-\frac{x-\mu}{\sigma}\right)\right), \quad \sigma > 0.
  • Type II (Fréchet distribution) heavy-tailed, with ξ > 0.
  • Type III (Weibull distribution) short-tailed and bounded with ξ < 0.
The extreme value distribution has been used in finance, environmental science, engineering, and reliability testing. For a rigorous treatment of the GEV distribution and extreme value theory, see [16].
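As an illustrative sketch of the extreme value theorem (ours, not from the text), block maxima of Exp(1) samples, centered by log n, approach the Gumbel (Type I) law with cdf exp(−e^{−x}):

```python
import numpy as np

rng = np.random.default_rng(2)

# Block maxima of light-tailed data converge to the Gumbel (Type I) law.
# For n i.i.d. Exp(1) variables, max - log(n) -> Gumbel(0, 1), whose cdf
# is F(x) = exp(-exp(-x)).

n_block, n_rep = 200, 20_000
maxima = rng.exponential(size=(n_rep, n_block)).max(axis=1) - np.log(n_block)

def gumbel_cdf(x):
    return np.exp(-np.exp(-x))

# Compare empirical and limiting cdf at a few points.
for x0 in (-1.0, 0.0, 1.0, 2.0):
    emp = (maxima <= x0).mean()
    assert abs(emp - gumbel_cdf(x0)) < 0.02
```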

2.3. Generalized Families via Transformation of the cdf

Given a baseline cdf F(x), the exponentiated (or power-transformed) family is defined by
G(x) = [F(x)]^{\alpha}, \quad \alpha > 0.
This construction, detailed in [17], generates a vast range of new distributions. If X1, X2, …, Xn is a random sample from F(x), then G(x) is the cdf of the sample maximum, max(X1, …, Xn), when α = n is an integer. Notable members include the exponentiated Weibull and exponentiated exponential distributions.
A dual transformation yields the following complementary exponentiated family:
G(x) = 1 - [1-F(x)]^{\beta}, \quad \beta > 0.
This is the cdf of the sample minimum, min(X1, …, Xn), when β = n. It is also known as the proportional odds model in survival analysis. Together, the exponentiated and complementary exponentiated families form the foundation of the broader class of maximum–minimum stable distributions.
Further generalizations have been proposed to introduce additional flexibility and skewness. The alpha power family, defined by
G(x) = \frac{\alpha^{F(x)}-1}{\alpha-1}, \quad \alpha > 0, \ \alpha \ne 1,
effectively introduces skewness to the baseline distribution [18]. A highly flexible generalization is the Kumaraswamy-G family [19], given by
G(x) = 1-\left[1-F(x)^{a}\right]^{b}, \quad a > 0, \ b > 0.
This family, which contains the exponentiated family as a special case (b = 1), enjoys the advantage of having a quantile function in a simple closed form.
These methods, along with other systematic frameworks like the T-X family [20], demonstrate that simple transformations of a baseline cdf can serve as an engine for generating numerous new and useful distributions, greatly expanding the toolbox for statistical modeling.
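The closed-form quantile function of the Kumaraswamy-G family makes inverse-transform sampling immediate; the following is a sketch of ours using Exp(1) as the baseline F:

```python
import numpy as np

rng = np.random.default_rng(3)

# Kumaraswamy-G family: G(x) = 1 - (1 - F(x)**a)**b.  Its quantile
# function is available in closed form whenever F's is:
#   G^{-1}(u) = F^{-1}((1 - (1 - u)**(1/b))**(1/a)).
# Here the baseline F is Exp(1): F(x) = 1 - exp(-x), F^{-1}(p) = -log(1-p).

def kwg_cdf(x, a, b):
    F = 1 - np.exp(-x)
    return 1 - (1 - F**a)**b

def kwg_quantile(u, a, b):
    p = (1 - (1 - u)**(1 / b))**(1 / a)
    return -np.log1p(-p)

a, b = 2.0, 3.0
u = rng.uniform(size=100_000)
x = kwg_quantile(u, a, b)           # inverse-transform sample from Kw-Exp

# The empirical cdf of the sample should match G at any point.
x0 = 0.5
assert abs((x <= x0).mean() - kwg_cdf(x0, a, b)) < 0.01
```

Setting b = 1 recovers the exponentiated family G(x) = F(x)^a, as stated above.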

2.4. Combination Methods

There are several approaches to constructing a distribution through combining the existing distributions.

2.4.1. Types of Mixtures

  • Finite mixture f(x) = \sum_{i=1}^{k} w_i f_i(x), which has various applications in statistical inference with flexibility in density fitting [21], for example, a mixture of two normals [22] or two Weibull distributions [23].
  • Continuous mixture f(x) = \int f(x; \theta)\, g(\theta)\, d\theta. One example is the scale mixture. A primary method to create such continuous mixtures is through compounding, which we detail in the next section.
The mixing distribution g(θ) in an infinite mixture is not required to be continuous. When g(θ) is a discrete distribution, the resulting mixture, often called a countable mixture or discrete mixture, integrates over a countable set of parameters. This approach is technically distinct from the continuous mixture defined above but remains a powerful tool for deriving flexible distributions. Prominent examples of distributions derived from a discrete mixing mechanism include the following:
  • The negative binomial distribution, which can be derived as a Poisson-gamma mixture where the Poisson rate parameter is governed by a gamma distribution. While the gamma is continuous, a discrete analogue exists (e.g., a Poisson mixture with a discrete log-gamma distribution would yield a similar over-dispersed count distribution).
  • The Student’s t-distribution, which is a scale mixture of normal distributions where the precision (inverse variance) parameter follows a gamma distribution.
  • The beta-binomial distribution, which arises when the success probability p of a binomial distribution follows a beta distribution.
  • The compound Poisson distribution and related models, where the intensity or other parameters are driven by a latent discrete process.
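The finite-mixture construction can be sketched as follows (an illustrative two-component normal example of ours; weights and component parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

# Sampling from a finite mixture f(x) = sum_i w_i f_i(x): first draw the
# component index with probabilities w_i, then draw from that component.
# Illustrated with a two-component normal mixture.

weights = np.array([0.3, 0.7])
means = np.array([-2.0, 1.0])
sds = np.array([0.5, 1.0])

n = 200_000
comp = rng.choice(len(weights), size=n, p=weights)     # latent component
x = rng.normal(means[comp], sds[comp])

# The mixture mean is sum_i w_i * mu_i.
assert abs(x.mean() - weights @ means) < 0.02
```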

2.4.2. Compounding

Compounding is a fundamental concept in probability theory in which a parameter of one distribution is itself a random variable following another distribution (see [1]). This approach generates a more flexible distribution that incorporates uncertainty in the parameter. It is important to note that the marginal distribution resulting from compounding is mathematically equivalent to a continuous mixture distribution, as defined in Section 2.4.1. The compounding framework provides a hierarchical data-generation perspective for the same model. Formally, if X | θ ∼ F1(θ) and θ ∼ F2(λ), then the distribution of X is the marginal distribution of a compound distribution. Examples include the beta-binomial, gamma-Poisson (negative binomial), normal–normal, Dirichlet–multinomial, and compound Poisson process models. The normal–normal model is as follows:
  • First stage (conditional distribution): X | μ ∼ N(μ, σ²). The observation X is normally distributed with mean μ and a known variance σ²;
  • Second stage (parameter distribution): μ ∼ N(μ0, τ²). The mean μ itself is assumed to be normally distributed with prior mean μ0 and variance τ²;
  • Compound distribution: The unconditional distribution of X is N(μ0, σ² + τ²).
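The normal–normal compound model above can be verified by simulation (a sketch of ours; parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

# Normal-normal compounding: mu ~ N(mu0, tau^2), then X | mu ~ N(mu, sigma^2).
# The unconditional (compound) distribution is N(mu0, sigma^2 + tau^2).

mu0, tau, sigma = 1.0, 0.8, 0.6
n = 300_000
mu = rng.normal(mu0, tau, size=n)          # second stage: parameter draw
x = rng.normal(mu, sigma)                  # first stage: data given parameter

assert abs(x.mean() - mu0) < 0.01
assert abs(x.var() - (sigma**2 + tau**2)) < 0.02
```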

2.4.3. Convolution

The random variable Z = X + Y is termed the convolution of X and Y, and it has a density function given by
f_{X+Y}(z) = \int_{-\infty}^{+\infty} f_{X,Y}(x, z-x)\, dx,
where fX,Y(·, ·) is the joint density of (X, Y). If X and Y are independent, this simplifies to f_{X+Y}(z) = (f_X * f_Y)(z) = \int_{-\infty}^{+\infty} f_X(x)\, f_Y(z-x)\, dx.
A distribution of significant practical importance arises when X follows a normal distribution N(μ, σ2) and Y follows an exponential distribution Exp(λ), with X and Y being independent. This specific convolution forms the cornerstone of the normal–exponential stochastic frontier model [24], widely used in econometrics to measure the efficiency of firms and other decision-making units. In this context, the normal random variable represents the random noise (e.g., measurement error, luck), while the exponential random variable captures one-sided inefficiency.
Then the density function of Z = X + Y is given by
f_Z(z; \mu, \sigma, \lambda) = \frac{\lambda}{2}\, e^{\frac{\lambda}{2}\left(2\mu + \lambda\sigma^{2} - 2z\right)} \cdot \operatorname{erfc}\!\left(\frac{\mu + \lambda\sigma^{2} - z}{\sqrt{2}\,\sigma}\right),
where erfc(z) represents the complementary error function defined as
\operatorname{erfc}(z) = \frac{2}{\sqrt{\pi}}\int_{z}^{\infty} e^{-t^{2}}\, dt.
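The normal–exponential convolution density can be checked against simulation (a sketch of ours, assuming SciPy's erfc; parameter values are illustrative):

```python
import numpy as np
from scipy.special import erfc

rng = np.random.default_rng(6)

# Density of Z = X + Y with X ~ N(mu, sigma^2) and Y ~ Exp(rate lam),
# as used in the normal-exponential stochastic frontier model:
#   f_Z(z) = (lam/2) * exp((lam/2)*(2*mu + lam*sigma**2 - 2*z))
#            * erfc((mu + lam*sigma**2 - z) / (sqrt(2)*sigma))

def normal_exp_pdf(z, mu, sigma, lam):
    return (lam / 2) * np.exp((lam / 2) * (2 * mu + lam * sigma**2 - 2 * z)) \
        * erfc((mu + lam * sigma**2 - z) / (np.sqrt(2) * sigma))

# Monte Carlo check: histogram of simulated Z against the closed form.
mu, sigma, lam = 0.0, 1.0, 1.5
z = rng.normal(mu, sigma, 400_000) + rng.exponential(1 / lam, 400_000)
hist, edges = np.histogram(z, bins=60, range=(-3, 5), density=True)
mids = (edges[:-1] + edges[1:]) / 2
assert np.max(np.abs(hist - normal_exp_pdf(mids, mu, sigma, lam))) < 0.02
```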

2.5. Some Key Univariate Distributions and Related Distributions

The first continuous statistical distribution to be formally studied was the uniform distribution, dating back to Huygens [25]. However, the most historically significant early continuous distribution is the normal distribution, also referred to as the Gaussian distribution. Although the uniform distribution provided the simplest continuous model, it was the normal distribution that first underwent extensive analytical study. The development of the normal distribution emerged later through the work of de Moivre [26] and Gauss [27] during the 18th and 19th centuries. Its fundamental role in error theory and the central limit theorem established it as the cornerstone of continuous probability theory.

2.5.1. Distributions Related to the Normal Distribution

The probability density function of a normal distribution X ∼ N(μ, σ²) is given by
f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty,
where the parameters μ and σ² are the mean and variance of X, respectively. When μ = 0 and σ² = 1, the distribution is referred to as the standard normal distribution, denoted by Z ∼ N(0, 1). Several distributions derived from the normal distribution are listed in Table 3 and can be constructed as follows:
(1)
Lognormal distribution: X ∼ LN(μ, σ²) if ln(X) ∼ N(μ, σ²);
(2)
Skew-normal distribution: proposed by Azzalini [28] with the pdf
f(z; \alpha) = 2\phi(z)\Phi(\alpha z), \quad z \in \mathbb{R}, \ \alpha \in \mathbb{R},
where φ(z) is the standard normal pdf, Φ(αz) is the standard normal cdf evaluated at αz, and α is a shape parameter. For more general cases, refer to Table 3. The skew-normal distribution is one of the skew-elliptical distributions; others include the skew-t, skew-Cauchy, skew-logistic, and skew-Laplace distributions; see [29,30]. This topic is discussed further in Section 4.6.
(3)
Chi-square distribution: If X1, …, Xν are independent and identically distributed (i.i.d.) N(0, 1) random variables, then X = X1² + ⋯ + Xν² ∼ χ²(ν), the χ²-distribution with ν degrees of freedom;
(4)
Chi distribution: If X1, …, Xν are i.i.d. N(0, 1), then X = \sqrt{X_1^2 + \cdots + X_\nu^2} ∼ χ(ν), the χ-distribution with ν degrees of freedom;
(5)
Beta distribution (special case): If Y ∼ χ²(2m) and Z ∼ χ²(2n) are independent, then X = Y/(Y + Z) ∼ Be(m, n); see Table 3;
(6)
Beta prime distribution: If X ∼ Be(α, β), then X/(1 − X) ∼ BP(α, β);
(7)
Arcsine distribution: It is defined as Be(1/2, 1/2);
(8)
F-distribution: If Y ∼ χ²(m) and Z ∼ χ²(n) are independent, then X = (nY)/(mZ) ∼ F(m, n);
(9)
t-distribution: If Y ∼ χ²(ν) and Z ∼ N(0, 1) are independent, then T = \sqrt{\nu}\, Z/\sqrt{Y} ∼ tν.
Table 3. Some continuous distributions constructed by or closely related to N(μ, σ2).
Distribution | Notation | Density f(x)
Normal | N(μ, σ²) | \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), x, μ ∈ R, σ > 0
Lognormal | LN(μ, σ²) | \frac{1}{x\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln x-\mu)^2}{2\sigma^2}\right), x > 0, μ ∈ R, σ > 0
Skew-normal | SN(μ, σ, α) | \frac{2}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)\Phi\!\left(\alpha\,\frac{x-\mu}{\sigma}\right), σ > 0, x, μ, α ∈ R
χ²(ν) | χ²(ν) | \left[2^{\nu/2}\,\Gamma(\tfrac{\nu}{2})\right]^{-1} x^{\nu/2-1} e^{-x/2}, x > 0, ν > 0
χ(ν) | χν | \left[2^{\nu/2-1}\,\Gamma(\tfrac{\nu}{2})\right]^{-1} x^{\nu-1} e^{-x^2/2}, x > 0, ν > 0
Beta | Be(a, b) | \frac{1}{B(a,b)}\, x^{a-1}(1-x)^{b-1}, 0 ≤ x ≤ 1, a, b > 0
Beta prime | BP(α, β) | \frac{1}{B(\alpha,\beta)}\, x^{\alpha-1}(1+x)^{-(\alpha+\beta)}, x > 0, α, β > 0
Arcsine | arcsine | \frac{1}{\pi\sqrt{x(1-x)}}, 0 < x < 1
F | F(m, n) | \frac{1}{B(\tfrac{m}{2},\tfrac{n}{2})}\left(\frac{m}{n}\right)^{m/2} x^{m/2-1}\left(1+\frac{m}{n}x\right)^{-(m+n)/2}, x > 0, m, n > 0
Student-t | tν | \frac{\Gamma(\tfrac{\nu+1}{2})}{\sqrt{\nu\pi}\,\Gamma(\tfrac{\nu}{2})}\left(1+\frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}, ν > 0, x ∈ R
Note: The title “constructed by” implies a direct derivation from the normal distribution. For distributions like χ2, χ, F, and t, this direct construction (e.g., as a sum of squares of independent standard normal variables) is most straightforward when their shape/degrees-of-freedom parameters (ν, m, n) are positive integers. The density functions listed here represent the analytic continuations of these distributions to positive real-valued parameters, which are their standard definitions. The beta and arcsine distributions are included due to their relationship with normal random variables via transformations of ratios or functions thereof.
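Constructions (3), (8), and (9) above can be verified by simulating from standard normals and checking the first two moments against known values (an illustrative sketch of ours):

```python
import numpy as np

rng = np.random.default_rng(7)

# Build chi-square, F, and t variables from independent standard normals
# and check their moments: E[chi2(nu)] = nu, E[F(m, n)] = n/(n-2) for
# n > 2, and Var[t_nu] = nu/(nu-2) for nu > 2.

n, m, nu = 100_000, 4, 6

chi2 = (rng.standard_normal((n, nu))**2).sum(axis=1)      # chi^2(nu)
assert abs(chi2.mean() - nu) < 0.1
assert abs(chi2.var() - 2 * nu) < 0.5

Y = (rng.standard_normal((n, m))**2).sum(axis=1)          # chi^2(m)
W = (rng.standard_normal((n, nu))**2).sum(axis=1)         # chi^2(nu)
F = (Y / m) / (W / nu)                                    # F(m, nu)
assert abs(F.mean() - nu / (nu - 2)) < 0.1

T = rng.standard_normal(n) / np.sqrt(W / nu)              # t_nu
assert abs(T.mean()) < 0.05
```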

2.5.2. Distributions Related to the Exponential Distribution

The exponential distribution, defined on the positive real number space, was extensively studied in the early 19th century due to its valuable properties, such as memorylessness and its connection to the Poisson process for modeling rare events (see [1]). Several distributions derived from the exponential distribution are listed in Table 4.
(1)
Laplace distribution: Also known as the double exponential distribution, the Laplace distribution extends the exponential distribution symmetrically to the entire real number space, R .
(2)
Weibull distribution: The Weibull distribution is formed by introducing location δ and shape α parameters into the exponential density. It constitutes a flexible family of distributions widely used for modeling time-to-failure, lifespan, and other positive-valued measurements.
(3)
Rayleigh distribution: Proposed by Rayleigh [31] for modeling wave amplitudes, the Rayleigh distribution is a special case of the Weibull distribution, Wei(2, β, 0). It also arises from the normal distribution N(0, σ²) via X = \sqrt{Y^2 + Z^2} ∼ Ray(σ), where Y and Z are i.i.d. N(0, σ²) and β = 2σ².
(4)
Topp–Leone exponential distribution (TLED): Introduced by [32], the TLED adds a shape parameter α to the exponential distribution, yielding the cumulative distribution function F(x) = (1 − e^{−2x/β})^α. Its density is provided in Table 4. The TLED generalizes certain distributions; for instance, it includes the Burr type as TLED(2, β) and the Fréchet distribution as TLED(1/2, β).
Table 4. Some continuous distributions constructed by the exponential distribution.
Distribution | Notation | Density p(x)
Exponential | Exp(β) | \frac{1}{\beta}\, e^{-x/\beta}, x ≥ 0, β > 0
Laplace | Laplace(μ, β) | \frac{1}{2\beta}\, e^{-|x-\mu|/\beta}, x ∈ R, μ ∈ R, β > 0
Weibull | Wei(α, β, δ) | \frac{\alpha}{\beta}\left(\frac{x-\delta}{\beta}\right)^{\alpha-1} e^{-\left(\frac{x-\delta}{\beta}\right)^{\alpha}}, x ≥ δ, δ ∈ R, α, β > 0
Rayleigh | Ray(β) or Ray(σ) | \frac{2x}{\beta}\, e^{-x^2/\beta}, x ≥ 0, β = 2σ²
TLED | TLED(α, β) | \frac{2\alpha}{\beta}\, e^{-2x/\beta}\left(1-e^{-2x/\beta}\right)^{\alpha-1}, x > 0, α, β > 0
It is worth mentioning that the exponential distribution and the Poisson distribution share a fundamental symbiotic relationship within the context of a Poisson process. If the number of events occurring in a fixed time interval follows a Poisson distribution with rate parameter λ, then the waiting time between consecutive events follows an exponential distribution with the same rate parameter λ = 1/β. Thus, the former models event counts, and the latter models inter-arrival times.
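The Poisson–exponential duality can be illustrated by simulation (a sketch of ours; parameter values are illustrative): exponential inter-arrival times with rate λ produce counts in a window of length t that behave like Poisson(λt).

```python
import numpy as np

rng = np.random.default_rng(8)

# Poisson process duality: exponential inter-arrival times with rate lam
# imply Poisson(lam * t) counts in a window of length t.

lam, t, n_rep = 3.0, 1.0, 100_000

# Cumulative arrival times; 20 inter-arrivals per replicate is ample for
# a window of length 1 at rate 3.
arrivals = rng.exponential(1 / lam, size=(n_rep, 20)).cumsum(axis=1)
counts = (arrivals < t).sum(axis=1)        # events in [0, t)

# Count mean and variance should both be close to lam * t.
assert abs(counts.mean() - lam * t) < 0.05
assert abs(counts.var() - lam * t) < 0.1
```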

2.5.3. Distributions Related to the Gamma Distribution

The gamma distribution, Gamma(α, β), was formally defined in the 20th century and can be considered an extension of the exponential distribution. It retains several desirable properties of the exponential distribution, such as additivity: the sum of independent variables Xi ∼ Gamma(αi, β), i = 1, …, m, is distributed as Gamma(\sum_{i=1}^{m} \alpha_i, β). The gamma distribution is foundational, as the chi-square distribution χ²(ν) is a special case, Gamma(ν/2, 1/2), and the beta distribution Be(a, b) can be defined by Y/(Y + Z), where Y ∼ Gamma(a, 1) and Z ∼ Gamma(b, 1) are independent. Table 5 lists several important extensions of the gamma distribution.
(1)
Location-shift gamma: Adding a location parameter δ forms the three-parameter distribution Gamma(α, β, δ).
(2)
Generalized gamma: This distribution exists in several forms. The version shown, with shape a, scale d, and power p parameters, was introduced by Stacy [33]. It includes several common distributions as special cases: the standard gamma (p = 1), the Weibull (a = p), the Rayleigh (a = p = 2), and the exponential (a = p = 1).
(3)
Inverse gamma: Defined as Y = 1/X for X ∼ Gamma(α, β), this two-parameter distribution is positively skewed. It is primarily used in Bayesian statistics as the conjugate prior for the variance of a normal distribution.
(4)
Beta prime distribution: The ratio X/Y of two independent variables, X ∼ Gamma(α, θ) and Y ∼ Gamma(β, θ), follows a beta prime distribution BP(α, β).
(5)
General beta prime distribution: This is a further extension of the beta prime distribution, incorporating two additional parameters.
Table 5. Some continuous distributions constructed by the gamma distribution.
Distribution | Notation | Density f(x)
Gamma | Gamma(α, β) | \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}, x > 0, α, β > 0
Location-shift gamma | Gamma(α, β, δ) | \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, (x-\delta)^{\alpha-1} e^{-\beta(x-\delta)}, x ≥ δ, δ ∈ R, α, β > 0
Generalized gamma | Ggamma(a, d, p) | \frac{p}{d^{a}\,\Gamma(a/p)}\, x^{a-1} e^{-(x/d)^{p}}, x > 0, a, d, p > 0
Inverse gamma | InvGamma(α, β) | \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{-(\alpha+1)} e^{-\beta/x}, x > 0, α, β > 0
Beta prime | BP(α, β) | \frac{1}{B(\alpha,\beta)}\, x^{\alpha-1}(1+x)^{-(\alpha+\beta)}, x > 0, α, β > 0
General beta prime | GBP(α, β, p, q) | \frac{|\alpha|\,(x/\beta)^{p\alpha-1}\left(1+(x/\beta)^{\alpha}\right)^{-p-q}}{\beta\, B(p,q)}, α ∈ R, α ≠ 0, p, q, β > 0, x > 0
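The construction of Be(a, b) as Y/(Y + Z) from independent gammas, together with gamma additivity, can be checked by simulation (a sketch of ours; parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)

# Construction of Be(a, b) as Y / (Y + Z) with independent
# Y ~ Gamma(a, 1) and Z ~ Gamma(b, 1); also checks gamma additivity:
# Y + Z ~ Gamma(a + b, 1).

a, b, n = 2.5, 4.0, 100_000
Y = rng.gamma(a, 1.0, size=n)
Z = rng.gamma(b, 1.0, size=n)
X = Y / (Y + Z)

# Beta mean a/(a+b); gamma sum mean and variance both a + b.
assert abs(X.mean() - a / (a + b)) < 0.005
assert abs((Y + Z).mean() - (a + b)) < 0.05
assert abs((Y + Z).var() - (a + b)) < 0.2
```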

2.6. The Generalized Hyperbolic Distribution Family

Schrödinger [34] described the probability distribution of the first passage time in Brownian motion. Thirty years later, Tweedie [35] derived the inverse relationship between the cumulant generating function of the first passage time distribution and that of the normal distribution, and named the result the inverse Gaussian (IG) distribution. Notably, this distribution was proposed in Nature, underscoring its significance. The IG distribution, denoted IG(μ, λ), is defined for x > 0, μ > 0, and λ > 0 by the probability density function:
f(x; \mu, \lambda) = \sqrt{\frac{\lambda}{2\pi x^{3}}}\, \exp\left(-\frac{\lambda(x-\mu)^{2}}{2\mu^{2}x}\right).
The IG distribution models the first passage time of a Brownian motion, making it highly useful for analyzing lifetime data and the frequency of event occurrences in various fields. For a comprehensive reference, see [36]. The IG distribution has many extensions, among which the generalized hyperbolic distribution (GH) is a more general one.
The GH distribution is a continuous probability distribution that represents a highly flexible family of models for describing data with skewness and heavy tails. Its key strength lies in its ability to nest several other important distributions as special or limiting cases, providing a unified framework for statistical modeling. It was introduced by [37] to model the grain size distributions of wind-blown sand. However, its remarkable fit to financial data catapulted it to prominence in financial econometrics and risk management (see [38]).
A random variable X is said to follow a GH distribution GH(λ, α, β, δ, μ) if its pdf is
f(x) = C(\lambda, \alpha, \beta, \delta)\left(\delta^{2}+(x-\mu)^{2}\right)^{(\lambda-0.5)/2} K_{\lambda-0.5}\!\left(\alpha\sqrt{\delta^{2}+(x-\mu)^{2}}\right)\exp\left(\beta(x-\mu)\right),
where x ∈ R and
C(\lambda, \alpha, \beta, \delta) = \frac{\left(\alpha^{2}-\beta^{2}\right)^{\lambda/2}}{\sqrt{2\pi}\,\alpha^{\lambda-0.5}\,\delta^{\lambda}\, K_{\lambda}\!\left(\delta\sqrt{\alpha^{2}-\beta^{2}}\right)}, \quad \alpha > 0, \ |\beta| < \alpha, \ \delta > 0, \ \mu, \lambda \in \mathbb{R}.
The parameter λ influences the tail behavior and the subfamily classification; α controls the heaviness of the tails and the steepness; β controls the asymmetry; δ is the scale parameter related to the dispersion; and μ is the location parameter. Kλ(·) is the modified Bessel function of the second kind of order λ defined by
K_{\lambda}(x) = \frac{1}{2}\int_{0}^{\infty} t^{\lambda-1}\exp\left(-\frac{x}{2}\left(t+\frac{1}{t}\right)\right) dt.
The GH distribution involves many distributions as its special cases: hyperbolic distribution (λ = 1), normal inverse Gaussian distribution (λ = −0.5), variance gamma distribution (λ > 0, δ = 0), etc. A normal variance-mean mixture distribution, termed the normal inverse Gaussian distribution proposed by [39], is used to construct stochastic processes that appear of interest for statistical modeling purposes, particularly in turbulence and finance. Its pdf is defined as
f(x; \alpha, \beta, \delta, \mu) = C(\alpha, \beta, \delta)\, q\!\left(\frac{x-\mu}{\delta}\right)^{-1} K_1\!\left(\delta\alpha\, q\!\left(\frac{x-\mu}{\delta}\right)\right) \exp\left(\beta(x-\mu)\right), \quad x \in \mathbb{R},
where
C(\alpha, \beta, \delta) = \frac{\alpha}{\pi}\exp\left(\delta\sqrt{\alpha^2 - \beta^2}\right), \qquad q(t) = \sqrt{1 + t^2},
and K1 is the modified Bessel function of the second kind of order 1. The parameters have the following roles: μ ∈ ℝ (location) and δ > 0 (scale); α > 0 (tail heaviness) controls the steepness of the distribution, with a larger α leading to lighter tails, and α > |β| is required. The parameter β (asymmetry/skewness) controls the skewness: β = 0 gives a symmetric distribution, β > 0 implies positive skewness, and β < 0 implies negative skewness.
The GH distribution is a normal variance-mean mixture of the generalized inverse Gaussian (GIG) distributions. This construction is the source of its flexibility and mathematical tractability. The GIG distribution is a continuous probability distribution that generalizes several important distributions and plays a critical role as a mixing distribution in normal variance-mean mixtures. The GIG distribution denoted by GIG(λ, χ, ψ) is defined by the given pdf
f(x; \lambda, \chi, \psi) = \frac{(\psi/\chi)^{\lambda/2}}{2 K_\lambda(\sqrt{\psi\chi})}\, x^{\lambda-1} \exp\left(-\frac{1}{2}\left(\frac{\chi}{x} + \psi x\right)\right), \quad x > 0,\ \chi, \psi \ge 0,
where the parameter λ determines the shape of the distribution and is deeply connected to the index of the Bessel function; χ is a scale parameter related to the behavior near the origin, with the constraint λ > 0 whenever χ = 0; and ψ is another scale parameter related to the right-hand tail, with the constraint λ < 0 whenever ψ = 0 (ψ = χ = 0 is not allowed). The GIG family includes the inverse Gaussian distribution (λ = −0.5), the reciprocal inverse Gaussian distribution (λ = 0.5), and the gamma distribution (χ = 0, λ > 0).
A random variable Y follows a GH distribution GH(λ, α, β, δ, μ) if it has the following stochastic representation:
Y = \mu + \beta X + \sigma\sqrt{X}\, Z,
where ZN(0, 1), and XGIG(λ, χ, ψ) is independent of Z. The parameters of the resulting GH(λ, α, β, δ, μ) distribution are identified as follows: α = ψ σ 2 + β 2 , δ = χ . By choosing different GIG subfamilies as the mixer, we obtain the different GH subfamilies:
(1)
Mixing with the inverse Gaussian (λ = −0.5) distribution results in the normal inverse Gaussian distribution.
(2)
Mixing with the gamma (χ = 0, λ > 0) distribution results in the variance gamma distribution.
(3)
Mixing with the GIG (λ = 1) distribution results in the hyperbolic distribution.
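The mixture construction can be checked numerically. The sketch below is illustrative (the parameter values are made up, σ is set to 1, and it relies on scipy's `geninvgauss`, whose two-argument GIG parameterization requires a rescaling to obtain GIG(λ, χ, ψ)): it draws Y = μ + βX + √X Z and compares the sample mean with μ + βE[X], where E[X] = √(χ/ψ) K_{λ+1}(√(χψ))/K_λ(√(χψ)).

```python
import numpy as np
from scipy.stats import geninvgauss
from scipy.special import kv

# Illustrative parameters (not from the text); lam = -0.5 gives the NIG case.
lam, chi, psi, beta_, mu = -0.5, 1.0, 2.0, 0.5, 0.0

rng = np.random.default_rng(0)
n = 200_000

# GIG(lam, chi, psi): scipy's geninvgauss(p, b) has density proportional to
# x^(p-1) exp(-b(x + 1/x)/2), so rescale the sample by sqrt(chi/psi).
b = np.sqrt(chi * psi)
X = np.sqrt(chi / psi) * geninvgauss.rvs(lam, b, size=n, random_state=rng)

# Normal variance-mean mixture with sigma = 1: Y = mu + beta*X + sqrt(X)*Z
Y = mu + beta_ * X + np.sqrt(X) * rng.standard_normal(n)

# E[X] = sqrt(chi/psi) * K_{lam+1}(b) / K_lam(b), hence E[Y] = mu + beta*E[X]
EX = np.sqrt(chi / psi) * kv(lam + 1, b) / kv(lam, b)
print(Y.mean(), mu + beta_ * EX)
```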

2.7. Beta-Generated Distributions

A significant research area is the creation of flexible univariate distribution classes that extend classical models. The beta-generated (BG) distributions by Arnold et al. [40,41] achieve this by merging the beta distribution with a chosen baseline distribution.
Definition 1.
For a baseline cdf F(x) with pdf f(x) on a domain D, the beta-generated (BG) distributions have the following pdf:
g_F(x; a, b) = \frac{1}{B(a,b)}\, F(x)^{a-1}\,[1 - F(x)]^{b-1} f(x), \quad x \in D,\ a > 0,\ b > 0.
We denote this by X ∼ BG(a, b; F). The corresponding cdf is derived via the substitution y = F(t):
G_F(x; a, b) = \frac{1}{B(a,b)} \int_0^{F(x)} y^{a-1}(1-y)^{b-1}\, dy = I_{F(x)}(a, b),
which is the incomplete beta function ratio.
A key property of the BG(a, b; F) distribution is its stochastic representation:
X \stackrel{d}{=} F^{-1}(B), \quad \text{for } B \sim Be(a, b).
This representation provides a foundation for applying the generalized inverse transformation method in statistical simulation using representative points. It offers a direct method for simulating from the density (11) and for constructing multivariate extensions. The moments of X are consequently given by
E(X^r) = E\left(\left[F^{-1}(B)\right]^r\right), \quad r > 0.
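As a quick sketch of the stochastic representation (the baseline choice and parameter values are illustrative assumptions, not from the text), one can simulate X = F^{-1}(B) with B ∼ Be(a, b) and check the cdf identity G_F(x) = I_{F(x)}(a, b):

```python
import numpy as np
from scipy.stats import beta, norm

# Inverse-transform simulation from BG(a, b; F) via X = F^{-1}(B), B ~ Be(a, b),
# illustrated with the standard normal baseline (the beta-normal case).
a, b = 2.0, 3.0
rng = np.random.default_rng(1)
B = beta.rvs(a, b, size=100_000, random_state=rng)
X = norm.ppf(B)

# The cdf of X is I_{Phi(x)}(a, b), the incomplete beta function ratio.
x0 = 0.3
print((X <= x0).mean())
print(beta.cdf(norm.cdf(x0), a, b))
```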
The beta-generated family encompasses a wide range of distributions depending on the choice of the baseline F. To demonstrate the flexibility and utility of this class, we select two specific examples. For the generalization of this generating mechanism, see [42].

2.7.1. The Beta-Normal Distribution

If F(x) is the cdf of N(μ, σ2), the corresponding BG distribution is the beta-normal distribution, denoted as XBN(a, b; μ, σ). As studied by Eugene et al. [43], its cdf is
G(x) = \frac{1}{B(a,b)} \int_0^{\Phi\left(\frac{x-\mu}{\sigma}\right)} t^{a-1}(1-t)^{b-1}\, dt,
where Φ(·) is the standard normal cdf. For the standard case BN(a, b; 0, 1), the pdf is
g(x) = \frac{1}{B(a,b)}\, \Phi(x)^{a-1}\left[1 - \Phi(x)\right]^{b-1} \phi(x), \quad x \in \mathbb{R}.
Its moments are given by the formula in (13) (Gupta and Nadarajah [44]).
This four-parameter distribution has location, scale, and shape parameters and can be unimodal or bimodal. Its fitting performance has been compared to that of normal mixture models in simulation studies. Note that the beta distribution is a special case of the Dirichlet distribution (cf. Section 4.4.2).

2.7.2. The Beta-Weibull Distribution

The beta-Weibull distribution, denoted by BW(a, b; c, γ), has the following pdf:
f_{BW}(x) = \frac{c}{\gamma\, B(a,b)} \left(\frac{x}{\gamma}\right)^{c-1} \left[1 - \exp\left(-\left(\frac{x}{\gamma}\right)^{c}\right)\right]^{a-1} \exp\left(-b\left(\frac{x}{\gamma}\right)^{c}\right), \quad x > 0.
A systematic study of this distribution can be found in Famoye et al. [45] and Mdziniso [46]. The beta-Weibull distribution includes many useful distributions as special cases, including the Weibull distribution (a = b = 1), the beta-exponential distribution (c = 1), the Rayleigh distribution (a = b = 1, c = 2), the two-parameter Burr-type distribution (b = 1, c = 2), etc.
The beta-Weibull distribution has many good properties.
(1)
The limit of beta-Weibull density (16) is
\lim_{x \to 0^+} f_{BW}(x) = \begin{cases} \infty, & ac < 1, \\[4pt] \dfrac{c\,\Gamma(a+b)}{\gamma\,\Gamma(a)\Gamma(b)}, & ac = 1, \\[4pt] 0, & ac > 1. \end{cases}
(2)
Let YBe(a, b), then the random variable X = γ log ( 1 Y ) 1 / c follows BW(a, b, c, γ), where c, γ > 0. It provides a way to generate a random sample from the BW distribution.
(3)
Let Y follow a beta-exponential distribution with parameters (a, b, θ); then the random variable X = θ(Y/θ)^{1/c} follows a beta-Weibull distribution.
(4)
The beta-Weibull distribution is unimodal. The mode is at x0 = 0 whenever ac < 1, or ac = 1 and b ≥ (c − 1)/(2c). For other cases, x0 is the solution of the equation
b c - (c-1)\left(\frac{x_0}{\gamma}\right)^{-c} = \frac{c(a-1)}{\exp\left\{(x_0/\gamma)^{c}\right\} - 1}.
(5)
The rth moment of BW(a, b, c, γ) is given by
E\left[\left(\frac{X}{\gamma}\right)^{r}\right] = \frac{1}{B(a,b)} \int_0^{\infty} t^{r/c}\left(1 - e^{-t}\right)^{a-1} e^{-bt}\, dt,
obtained by the transformation t = (x/γ)^c.
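Property (2) above can be verified by simulation; the following sketch (with illustrative parameter values) compares the empirical cdf of X = γ[−log(1 − Y)]^{1/c} with the beta-Weibull cdf, which equals I_{1−exp(−(x/γ)^c)}(a, b):

```python
import numpy as np
from scipy.stats import beta

# Property (2): X = gamma_ * (-log(1 - Y))**(1/c) with Y ~ Be(a, b) follows
# BW(a, b; c, gamma_); its cdf is the incomplete beta function ratio
# evaluated at the Weibull cdf 1 - exp(-(x/gamma_)^c).
a, b, c, gamma_ = 2.0, 1.5, 1.8, 1.0
rng = np.random.default_rng(2)
Y = beta.rvs(a, b, size=100_000, random_state=rng)
X = gamma_ * (-np.log1p(-Y)) ** (1.0 / c)

x0 = 1.2
print((X <= x0).mean())                                # empirical cdf at x0
print(beta.cdf(-np.expm1(-(x0 / gamma_) ** c), a, b))  # theoretical cdf at x0
```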

3. Construction of Univariate Discrete Distributions and an Approximation

A discrete distribution is characterized by its support points and their probabilities:
X \sim \begin{pmatrix} x_1 & x_2 & \cdots & x_k \\ p_1 & p_2 & \cdots & p_k \end{pmatrix},
where the random variable X has support {x_1, …, x_k} and P(X = x_i) = p_i, i = 1, …, k.

3.1. Classical Univariate Discrete Distributions

Standard discrete distributions model outcomes from fundamental random processes/trials:
  • Bernoulli(p)/Binomial(n, p): The Bernoulli distribution, Bernoulli(p), models a single trial, and Binomial(n, p) counts successes in n independent trials with the same success probability p.
  • Geometric(p): It models the number of Bernoulli trials needed to obtain the first success, representing the waiting time for a success. It is a special case of the negative binomial (with r = 1).
  • Negative Binomial(r, p): As a generalization of the geometric distribution, it models the number of Bernoulli trials needed to achieve r successes. The negative binomial distribution presented here in its classic form (modeling the number of trials) is equivalent to the parameterization modeling the number of failures before the r-th success (with support y = 0 , 1 , 2 , ), which is prevalent in many modern software packages and generalized linear models.
  • Poisson(λ): It models the number of events occurring in a fixed interval of time or space, given a constant average rate (λ) and independence between events. It can be derived as a limiting case of the binomial distribution when n → ∞ and p → 0 such that npλ.
  • Hypergeometric(s, n, N, M): It represents the number s of successes in n draws without replacement from a population of size N containing M success items, where max(0, n − (N − M)) ≤ s ≤ min(M, n). This differs from the binomial distribution, where trials are independent (i.e., with replacement).
See Johnson and Kotz [47] for an extensive catalog. A guide to selecting common discrete probability distributions is illustrated in Figure 1, with probability mass functions summarized in Table 6.
Consider a random variable X following the discrete distribution defined in (17) and a continuous function g(·). The transformed variable Y = g(X) follows a distribution with the following structure:
Y \sim \begin{pmatrix} g(x_1) & g(x_2) & \cdots & g(x_k) \\ p_1 & p_2 & \cdots & p_k \end{pmatrix}.
In other words, the law of Y is determined by mapping the support points of X via g while preserving the associated probabilities. This elegant property facilitates the analysis of transformed discrete variables. Li et al. [14] utilized this property to compute nearly representative points for the distribution of g(X).
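This mapping can be implemented in a few lines; the sketch below (with a hypothetical pmf and g(t) = t², both illustrative) merges support points that g sends to the same value:

```python
import numpy as np

# Law of Y = g(X): transform the support, keep the probabilities, and merge
# support points that g maps to the same value (here g(t) = t^2).
x = np.array([-2, -1, 0, 1, 2])
p = np.array([0.1, 0.2, 0.4, 0.2, 0.1])

support, inverse = np.unique(x ** 2, return_inverse=True)
probs = np.bincount(inverse, weights=p)
print(dict(zip(support.tolist(), probs.tolist())))  # {0: 0.4, 1: 0.4, 4: 0.2}
```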

3.2. Distributions Generated by Mixtures

A powerful method for constructing flexible probability models is through mixture distributions, where the parameter of one distribution is itself governed by another distribution. This approach naturally incorporates heterogeneity and over-dispersion, commonly observed in real-world data.

3.2.1. Poisson Mixtures

When the rate parameter λ of a Poisson distribution is not fixed but follows a continuous distribution G(λ) with support on R + , the resulting marginal distribution is a Poisson mixture. Its probability mass function is given by
P(X = k) = \int_0^{\infty} \frac{e^{-\lambda}\lambda^{k}}{k!}\, dG(\lambda), \quad k = 0, 1, 2, \ldots
The most prominent example is the negative binomial distribution, which arises as a Poisson-gamma mixture. Specifically, if
X \mid \lambda \sim \mathrm{Poisson}(\lambda) \quad \text{and} \quad \lambda \sim \mathrm{Gamma}(r,\ \beta = (1-p)/p),
then the marginal distribution of X is negative binomial with pmf:
P(X = k) = \frac{\Gamma(r+k)}{k!\,\Gamma(r)}\, p^{r}(1-p)^{k}, \quad k = 0, 1, 2, \ldots
This model is widely used for count data where the variance exceeds the mean.
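The Poisson-gamma mixture can be checked by simulation; the sketch below (with illustrative r and p) compares empirical frequencies with scipy's `nbinom` pmf, which uses the same number-of-failures parameterization as the formula above:

```python
import numpy as np
from scipy.stats import gamma, nbinom

# Poisson-gamma mixture: lambda ~ Gamma(r, scale=(1-p)/p), X | lambda ~ Poisson.
r, p = 3, 0.4
rng = np.random.default_rng(3)
lam = gamma.rvs(r, scale=(1 - p) / p, size=200_000, random_state=rng)
X = rng.poisson(lam)

# Marginally, X should follow the negative binomial pmf; scipy's nbinom(r, p)
# counts failures before the r-th success, matching the formula above.
for k in range(4):
    print(k, (X == k).mean(), nbinom.pmf(k, r, p))
```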

3.2.2. Binomial Mixtures

Similarly, allowing the success probability p of a binomial distribution to vary according to a continuous distribution F(p) on [0,1] leads to a binomial mixture. The marginal pmf is
P(X = k) = \int_0^{1} \binom{n}{k} p^{k}(1-p)^{n-k}\, dF(p), \quad k = 0, 1, \ldots, n.
The beta-binomial distribution is the canonical example, where p ∼ Beta(α, β). Its pmf is
P(X = k) = \binom{n}{k} \frac{B(k+\alpha,\ n-k+\beta)}{B(\alpha, \beta)},
where B(·, ·) is the beta function. This distribution effectively models counts with extra-binomial variation.
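The beta-binomial pmf is easy to assemble directly from the formula above; the following sketch (with illustrative parameters) checks that it sums to one and has mean nα/(α + β), a known moment of this distribution:

```python
import math

# Beta-binomial pmf built from the mixture formula, using log-beta values
# for numerical stability (parameter values are illustrative).
def log_beta(x, y):
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def betabinom_pmf(k, n, a, b):
    return math.comb(n, k) * math.exp(log_beta(k + a, n - k + b) - log_beta(a, b))

n, a, b = 10, 2.0, 3.0
probs = [betabinom_pmf(k, n, a, b) for k in range(n + 1)]
print(sum(probs))                               # total mass: 1
print(sum(k * q for k, q in enumerate(probs)))  # mean: n*a/(a+b) = 4
```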
Other notable mixtures include the Poisson-inverse Gaussian and the negative binomial-beta distributions [48].

3.3. Distributions Generated by Random Sums

Distributions generated by random sums, also known as compound distributions, provide a framework for modeling the aggregate outcome of a random number of events. Let N be a non-negative integer-valued random variable with pmf pN(n), and let {Xi} be a sequence of i.i.d. random variables, independent of N. The random sum is defined as
S = X_1 + X_2 + \cdots + X_N \quad (\text{with the convention that } S = 0 \text{ if } N = 0).
The distribution of S is called the compound distribution of N and Xi. Its probability generating function (pgf) is given by GS(t) = GN(GX(t)), from which moments and probabilities can be derived.
Prominent examples include the following:
  • Compound Poisson distribution: If N ∼ Poisson(λ), then S is compound Poisson. This is a highly flexible family used in insurance (to model total claim amounts) and queueing theory. Special cases include the Poisson-exponential and Poisson-gamma (which is equivalent to the negative binomial under a specific parameterization) distributions.
  • Geometric sum: If N ∼ Geometric(p), and Xi ∼ Exp(θ), then S ∼ Exp(θp). This result is foundational in renewal theory.
The family of phase-type distributions can also be constructed using random sums of exponential random variables, demonstrating the versatility of this method for generating distributions with specific tail behaviors [49].
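The geometric-sum result can be illustrated by simulation (the parameter values below are arbitrary); given N, the sum of N i.i.d. Exp(θ) variables is Gamma(N, 1/θ), which makes the sampler a one-liner:

```python
import numpy as np

# Geometric sum of exponentials: N ~ Geometric(p) on {1, 2, ...} and
# X_i ~ Exp(theta) give S = X_1 + ... + X_N ~ Exp(theta * p).
p, theta = 0.25, 2.0
rng = np.random.default_rng(4)
N = rng.geometric(p, size=100_000)
# Given N, the sum of N i.i.d. Exp(theta) variables is Gamma(N, scale=1/theta).
S = rng.gamma(shape=N, scale=1.0 / theta)
print(S.mean(), 1.0 / (theta * p))  # sample mean vs. the Exp(theta*p) mean
```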

3.4. Approximation to Univariate Continuous Distributions by Representative Points

Consider a random variable XF(x), where F(x) is a univariate cumulative distribution function. The empirical distribution function Fk(x) is defined as
F_k(x) = \frac{1}{k}\sum_{i=1}^{k} I_{\{x_i \le x\}},
where IA is the indicator function of A. The empirical distribution Fk(x) is a discrete distribution with support points x 1 , , x k , each assigned an equal probability mass of 1/k. It serves as a consistent approximation of F(x), meaning Fk(x) → F(x) in distribution as k → ∞. This distribution can be represented as
Y_{MC} \sim \begin{pmatrix} x_1 & x_2 & \cdots & x_k \\ \frac{1}{k} & \frac{1}{k} & \cdots & \frac{1}{k} \end{pmatrix}.
Alternatively, one can construct other approximations using the so-called representative points (RPs) as support points. For a given distribution F(x) in R , several types of RPs exist, including
  • Monte Carlo RPs (MC-RPs): the empirical sample points.
  • Quasi-Monte Carlo RPs (QMC-RPs): deterministic points based on low-discrepancy sequences.
  • Mean square error RPs (MSE-RPs): points minimizing a mean squared error criterion.
For a univariate continuous distribution F(x), the QMC-RP approximation is given by the following distribution:
Y_{QMC} \sim \begin{pmatrix} F^{-1}\!\left(\frac{1}{2k}\right) & F^{-1}\!\left(\frac{3}{2k}\right) & \cdots & F^{-1}\!\left(\frac{2k-1}{2k}\right) \\ \frac{1}{k} & \frac{1}{k} & \cdots & \frac{1}{k} \end{pmatrix}.
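For a concrete instance (standard normal baseline and a k chosen only for illustration), the QMC-RPs are the quantiles at the points (2i − 1)/(2k):

```python
import numpy as np
from scipy.stats import norm

# QMC-RPs of N(0, 1): the k quantiles F^{-1}((2i-1)/(2k)), each with mass 1/k.
k = 8
u = (2 * np.arange(1, k + 1) - 1) / (2 * k)
qmc_rps = norm.ppf(u)
print(qmc_rps)
print(qmc_rps.mean())  # 0 by symmetry of the point set
```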
The MSE-RPs and the related approximation distribution are given by
Y_{MSE} \sim \begin{pmatrix} \xi_1 & \xi_2 & \cdots & \xi_k \\ p_1 & p_2 & \cdots & p_k \end{pmatrix},
where the support points {ξ_1, …, ξ_k} are determined by
\mathrm{MSE}(Y) = \int_{-\infty}^{+\infty} \min_{i}\left(x - \xi_i^{(k)}\right)^{2} p(x)\, dx = \sum_{j=1}^{k} \int_{I_j} \left(x - \xi_j^{(k)}\right)^{2} p(x)\, dx,
where
I_1 = (a_1, a_2),\ I_2 = (a_2, a_3),\ \ldots,\ I_k = (a_k, a_{k+1}), \qquad a_1 = -\infty,\quad a_i = \left(\xi_{i-1}^{(k)} + \xi_i^{(k)}\right)/2,\ i = 2, \ldots, k,\quad a_{k+1} = +\infty.
Fang and Pan [50] and Fang et al. [8] provided a comprehensive review of the theory, generating algorithms, and applications of representative points in statistical inference.
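For the standard normal, MSE-RPs can be computed by a Lloyd-type fixed-point iteration, sketched below; this is a standard approach rather than the specific algorithms of the cited references, and it uses the closed-form conditional means of N(0, 1) over each cell. For k = 2 the known solution is ±√(2/π).

```python
import numpy as np
from scipy.stats import norm

# Lloyd-type iteration: alternate midpoint boundaries a_i = (xi_{i-1}+xi_i)/2
# with cell-conditional means of N(0, 1),
#   xi_j = (phi(a_j) - phi(a_{j+1})) / (Phi(a_{j+1}) - Phi(a_j)).
def mse_rps_normal(k, n_iter=200):
    xi = np.linspace(-1.0, 1.0, k)  # initial support points
    for _ in range(n_iter):
        a = np.concatenate(([-np.inf], (xi[:-1] + xi[1:]) / 2, [np.inf]))
        p = norm.cdf(a[1:]) - norm.cdf(a[:-1])          # cell probabilities
        xi = (norm.pdf(a[:-1]) - norm.pdf(a[1:])) / p   # cell-conditional means
    return xi, p

xi, p = mse_rps_normal(2)
print(xi, p)  # known k = 2 solution: +/- sqrt(2/pi), probabilities 1/2 each
```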

3.4.1. Difference Between Two Distributions

The Kullback–Leibler (KL) divergence, or relative entropy, of two probability distributions with probability density functions p(x) and q(x) on 𝒳 ⊆ ℝ^p is defined as the expected logarithmic ratio of densities with respect to p(x):
D_{KL}(p(x)\,\|\,q(x)) = \int_{\mathcal{X}} \log\frac{p(x)}{q(x)}\, p(x)\, dx = E_{p(x)}\left[\log\frac{p(x)}{q(x)}\right].
The KL divergence is non-negative and is equal to 0 if and only if the two distributions are identical.
Now, consider the univariate case. For any discrete distribution, we can find a corresponding density estimator, denoted by p̂(x). Let F(x) be a cdf with pdf p(x), and let F̂(x), with pmf p̂(x), be an approximating distribution. Several criteria in the literature measure the distance between F and F̂:
\text{KL divergence (KL):}\quad \mathrm{KL}(p\,\|\,\hat{p}) = \sum_x \log\frac{p(x)}{\hat{p}(x)}\, p(x); \qquad
L_2\text{-norm of pmfs }(L_2\mathrm{pdf}):\quad \|p - \hat{p}\| = \Big[\sum_x \big(p(x) - \hat{p}(x)\big)^{2}\Big]^{1/2}; \qquad
L_2\text{-norm of cdfs }(L_2\mathrm{cdf}):\quad \|F - \hat{F}\| = \Big[\int_{\mathbb{R}} \big(F(x) - \hat{F}(x)\big)^{2}\, dx\Big]^{1/2}.
Another metric for the distance between F(x) and F ^ ( x ) is the absolute bias index (ABI), which evaluates the overall bias in parameter estimation. For a population distribution F(x; θ) with parameter vector θ = ( θ 1 , , θ m ) and its estimate θ ^ = ( θ ^ 1 , , θ ^ m ) under F ^ ( x ) , the ABI is defined as follows:
\mathrm{ABI} = \frac{1}{m}\sum_{i=1}^{m} \left|\frac{\theta_i - \hat{\theta}_i}{\theta_i}\right|.

3.4.2. Discrete Analogues

Discrete analogs aim to create new discrete distributions that preserve one or more defining characteristics of an original continuous distribution. An approximation F ^ ( x ) is considered successful if it maintains as much information from F(x) as possible, whether measured by the distance between the distributions, the similarity of their moments, or the properties of simulated samples.
As a continuous random variable X can be characterized by its pdf, moments, generating functions, or other features, the construction of a discrete analog Y (with pmf) from X (with density f(x)) focuses on matching these properties. The primary methods for this construction, summarized by [51], are classified as follows.
(1)
Discrete analogs of the Pearson differential equation. This approach constructs discrete probability distributions by employing difference equations that mirror the structure of the Pearson differential equation, which defines the continuous Pearson family [52,53].
(2)
The probability mass function (pmf) of Y retains the form of the pdf of X, and the support of Y is determined from the full range of X. The pmf is defined as
P(Y = k) = \frac{f(k)}{\sum_{j=-\infty}^{\infty} f(j)}, \quad k = 0, \pm 1, \pm 2, \ldots
This approach has been used to create discrete analogs of various continuous distributions, including the gamma, general Dirichlet, normal, lognormal, exponential, Laplace, and generalized exponential distributions [54,55,56].
(3)
The survival function (SF) of Y retains the form of the survival function of X, and the support of Y is determined from the full range of X. The discrete survival function is defined as S_Y(k) = P(Y ≥ k), and its corresponding cdf is F_Y(k) = P(Y ≤ k), related by S_Y(k) = 1 − F_Y(k − 1). This technique has generated numerous discrete distributions, such as the discrete exponential, Weibull, geometric Weibull, normal, Rayleigh, Maxwell, extended exponential, and Lindley distributions [57,58].
(4)
The hazard (failure) rate function of Y retains the form of the hazard rate function of X. For a continuous random variable X with survival function S_X(x) = P(X ≥ x) and hazard rate function λ_X(x) = f_X(x)/S_X(x), the survival function of the discrete analog Y is given by
P(Y \ge k) = \prod_{i=1}^{k-1} \left[1 - \lambda_X(i)\right], \quad k = 1, \ldots, m,
and its pmf is
P(Y = k) = \lambda_X(k) \prod_{i=1}^{k-1} \left[1 - \lambda_X(i)\right].
This approach has been used to propose new discrete distributions, such as the discrete Lomax, Weibull, and Rayleigh [59].
(5)
The moments of Y and X coincide up to a certain order. This method requires that Y and X share the same finite rth moment for r = 0 , 1 , , 2 N 1 , and that their cdfs coincide at least at M + 1 points. The support of the discrete random variable Y is { y 1 , , y M N } . It has been used to define new discrete uniform, normal, gamma, and beta distributions [60].
There are many other construction methods; for instance, an extensive review of discrete distributions defined for integer data is provided in [61].
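Method (2) above is easy to realize numerically; the sketch below (with an illustrative σ and a truncated support) builds a discrete analog of a centered normal by normalizing its density at the integers:

```python
import numpy as np
from scipy.stats import norm

# Discrete analog of N(0, sigma^2) by pmf retention: probabilities proportional
# to the continuous density at integer support (truncated where mass is
# negligible; the truncation range is an illustrative choice).
sigma = 2.0
k = np.arange(-40, 41)
w = norm.pdf(k, scale=sigma)
pmf = w / w.sum()
print(pmf.sum())              # total mass 1
print((k * pmf).sum())        # mean ~ 0 by symmetry
print((k**2 * pmf).sum())     # second moment ~ sigma^2
```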

4. Multivariate Distributions

The commonly used discrete distributions can be extended to multivariate cases, including multinomial and multivariate negative binomial, hypergeometric, and Poisson distributions.

4.1. Multinomial Distribution

An extension of the binomial distribution is the so-called multinomial distribution. It is the fundamental model for counting events that fall into multiple categories, making it essential in fields like statistics, genetics, machine learning (e.g., topic modeling with naive Bayes), and survey analysis. Consider an experiment consisting of n identical, independent trials, where each trial results in one of k mutually exclusive and exhaustive categories with probabilities p_i, i = 1, …, k, p_1 + ⋯ + p_k = 1. Let the random vector X = (X_1, …, X_k) follow a multinomial distribution Multinomial(n; p_1, …, p_k). Its probability mass function is
P(X_1 = n_1, \ldots, X_k = n_k) = \frac{n!}{n_1! \cdots n_k!}\, p_1^{n_1} \cdots p_k^{n_k}, \quad n_1, \ldots, n_k \in \mathbb{N}_0,\ n_1 + \cdots + n_k = n.
The multinomial distribution has many nice properties. One relates it to the Poisson distribution: let X_1, …, X_k be independent with X_i ∼ Poisson(λ_i), i = 1, …, k. Then the conditional distribution of (X_1, …, X_k) given X_1 + ⋯ + X_k = n is Multinomial(n; p_1, …, p_k), where p_i = λ_i/(λ_1 + ⋯ + λ_k), i = 1, …, k. The multinomial distribution has been used in many statistical applications. One is Pearson’s chi-square test for goodness-of-fit, which uses the fact that the test statistic, calculated from observed multinomial counts and expected counts (np_i), asymptotically follows a chi-square distribution (as n grows large).
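The Poisson-multinomial conditioning property can be checked by simulation (the rates below are illustrative): conditionally on the total, the expected category proportions are λ_i/Σλ.

```python
import numpy as np

# Conditioning independent Poissons on their total yields a multinomial:
# given X1 + ... + Xk = n, E[X_i | total = n] = n * lambda_i / sum(lambda).
lam = np.array([1.0, 2.0, 3.0])
rng = np.random.default_rng(5)
X = rng.poisson(lam, size=(400_000, 3))
total = X.sum(axis=1)
n = 6
cond = X[total == n]          # draws whose total equals 6
emp = cond.mean(axis=0) / n   # empirical category proportions
print(emp, lam / lam.sum())
```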

4.2. Multivariate Hypergeometric Distribution

The multivariate hypergeometric distribution is the generalization of the hypergeometric distribution to more than two categories. It describes the probability of drawing specific numbers of items from each category without replacement. Consider a population of size N, in which M_1, …, M_k are the numbers of items in the k mutually exclusive and exhaustive categories, with Σ_{i=1}^{k} M_i = N. Suppose that n items are sampled without replacement, containing n_1, …, n_k items of the corresponding categories. The joint distribution of X = (X_1, …, X_k) has pmf
P(X_1 = n_1, \ldots, X_k = n_k) = \frac{\prod_{i=1}^{k} \binom{M_i}{n_i}}{\binom{N}{n}}, \quad \sum_{i=1}^{k} n_i = n,\ 0 \le n_i \le M_i,\ i = 1, \ldots, k.
The distribution denoted as M u l t . H y p g . ( n ; M 1 , , M k ) is the so-called multivariate hypergeometric distribution. Without requiring the parameters in (28) to be non-negative, multivariate inverse hypergeometric, negative hypergeometric, and negative inverse hypergeometric distributions are then defined; see [62]. One property is that if X and Y are independent and each distributed as M u l t i n o m i a l ( n ; p 1 , , p k ) , then X|(X + Y = N) follows Mult. Hypg. (n, N).

4.3. Multivariate Poisson Distribution

If the parameters p_1, …, p_{k−1} in the multinomial distribution are allowed to tend to 0 (so that p_k tends to 1) as n → ∞, in such a way that np_i → λ_i, i = 1, 2, …, k − 1, then the probability of the event ⋂_{i=1}^{k−1}{X_i = n_i} tends to
P(X_1 = n_1, \ldots, X_{k-1} = n_{k-1}) = \prod_{i=1}^{k-1} \frac{e^{-\lambda_i}\lambda_i^{n_i}}{n_i!}.
This is the joint distribution of k − 1 mutually independent Poisson random variables with means λ 1 , , λ k 1 . The distribution can also be obtained as a limiting form of negative multinomial distributions.
The simplest case, as derived from the multinomial limit, is one of independence, but more flexible distributions allowing for correlation are of greater practical interest. A more general and widely used approach to constructing a multivariate Poisson distribution with a covariance structure is to use a common mixture or latent variable model. A standard construction introduces a set of independent latent Poisson variables and defines the observed variables as their sums.
A classical bivariate Poisson distribution for random variables (X, Y) can be defined as follows:
X = Z_1 + Z_0, \qquad Y = Z_2 + Z_0,
where Z0, Z1, Z2 are independent Poisson random variables with rates λ0, λ1, λ2 ≥ 0, respectively. The joint probability mass function of (X, Y) is given by
P(X = x, Y = y) = \exp\left(-(\lambda_1 + \lambda_2 + \lambda_0)\right) \sum_{k=0}^{\min(x,y)} \frac{\lambda_1^{x-k}}{(x-k)!}\, \frac{\lambda_2^{y-k}}{(y-k)!}\, \frac{\lambda_0^{k}}{k!}.
The marginal distributions are X ∼ Poisson(λ1 + λ0) and Y ∼ Poisson(λ2 + λ0), and the covariance between X and Y is Cov(X, Y) = λ0 ≥ 0. This model can only account for non-negative correlation, which is often sufficient for many applications, such as modeling the number of insurance claims across different policy types or goals scored by two teams in a match.
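The common-shock construction is straightforward to simulate (the rates below are illustrative); the check verifies the marginal mean and the covariance Cov(X, Y) = λ₀:

```python
import numpy as np

# Common-shock bivariate Poisson: X = Z1 + Z0, Y = Z2 + Z0 with independent
# Poisson components Z0, Z1, Z2 (illustrative rates).
l0, l1, l2 = 1.5, 2.0, 3.0
rng = np.random.default_rng(6)
n = 500_000
Z0, Z1, Z2 = (rng.poisson(l, n) for l in (l0, l1, l2))
X, Y = Z1 + Z0, Z2 + Z0
print(X.mean(), l1 + l0)       # marginal mean lambda1 + lambda0
print(np.cov(X, Y)[0, 1], l0)  # Cov(X, Y) = lambda0
```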
This construction can be generalized to higher dimensions by introducing more common components, leading to a richer covariance structure [3,63,64]. These correlated multivariate Poisson models are essential tools in modern multivariate count data analysis.

4.4. Continuous Multivariate Distributions

Johnson et al. [4] provided a comprehensive compilation of many multivariate distributions. Johnson [7] offered an in-depth introduction to the generation of multivariate statistical simulations. Palmitesta and Provasi [65] reviewed a range of multivariate distributions and their computer generation methods. Their survey encompassed the Dirichlet distribution, the multivariate Johnson distribution system, multivariate beta and gamma distributions, the multivariate extreme value distribution, the inverse Gaussian multivariate distribution, the Cook–Johnson family of multivariate uniform distributions, and elliptically contoured distributions. They also examined techniques for generating random vectors with specified marginal distributions and correlation matrices. Fang, Kotz, and Ng [6] conducted a comprehensive study on symmetric multivariate distributions. Due to space limitations, this section focuses on selected multivariate distributions from the literature and meta-distributions constructed via copulas, see [66]. For more construction methods for multivariate distribution, see [67].

4.4.1. Stochastic Representation

The traditional definition of the multivariate normal distribution relies on its density function. This approach, however, has several weaknesses:
(i)
It assumes that the explicit form of the density function is known.
(ii)
It requires the analytical evaluation of high-dimensional integrals to compute probabilities, moments, and marginal distributions.
The stochastic representation operator, denoted by = d , is a powerful tool for constructing new distributions. If two random vectors X and Y have the same distribution, we write X = d Y .
Examples
U \stackrel{d}{=} 1 - U, where U ∼ U(0, 1);
X \stackrel{d}{=} -X, where X ∼ N(0, σ²);
X \stackrel{d}{=} μ + σY, where X ∼ N(μ, σ²) and Y ∼ N(0, 1).
Properties
(1)
X \stackrel{d}{=} Y \Rightarrow (f_1(X), \ldots, f_m(X)) \stackrel{d}{=} (f_1(Y), \ldots, f_m(Y)), where the f_j are measurable functions.
Examples are X′X \stackrel{d}{=} Y′Y and X/‖X‖ \stackrel{d}{=} Y/‖Y‖.
(2)
Let X, Y, and Z be random variables. Then X \stackrel{d}{=} Y implies X + Z \stackrel{d}{=} Y + Z if Z is independent of (X, Y).
(3)
If X + Z \stackrel{d}{=} Y + Z and Z is independent of (X, Y), it does not necessarily follow that X \stackrel{d}{=} Y.
(4)
If Z is independent of (X, Y), then
  • X \stackrel{d}{=} Y implies XZ \stackrel{d}{=} YZ;
  • XZ \stackrel{d}{=} YZ implies X \stackrel{d}{=} Y under some conditions.
(5)
X \stackrel{d}{=} Y if and only if X^{+} \stackrel{d}{=} Y^{+} and X^{-} \stackrel{d}{=} Y^{-}.
Example 1.
The multivariate normal distribution, denoted by X ∼ Np(μ, Σ), has the following stochastic representation:
X \stackrel{d}{=} \mu + A'Z,
where μ = (μ_1, …, μ_p)′, A′A = Σ, and Z = (Z_1, …, Z_p)′ with i.i.d. elements Z_i ∼ N(0, 1). It is easy to find the density function, moments, and marginal distributions of Z and then transfer these to X. For example, from the density function of Z, (2\pi)^{-p/2} e^{-\frac{1}{2} z'z}, it is straightforward to find the density of X as
f(x) = (2\pi)^{-p/2}\, |\Sigma|^{-1/2} \exp\left(-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right).
As the mean and covariance of Z are 0 and I_p, respectively, one can immediately find E(X) = μ and Cov(X) = Σ. It is also easy to see that all marginals and affine transformations of X are normally distributed. For more detailed discussion and applications, see [68].
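The stochastic representation translates directly into a sampler; the sketch below uses the lower-triangular Cholesky factor as an (illustrative) choice of A′ satisfying A′A = Σ:

```python
import numpy as np

# Multivariate normal via X = mu + A'Z with A'A = Sigma; here A' is taken to
# be the lower-triangular Cholesky factor L (L L' = Sigma).
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])
L = np.linalg.cholesky(Sigma)
rng = np.random.default_rng(7)
Z = rng.standard_normal((200_000, 2))
X = mu + Z @ L.T  # each row is mu + L z
print(X.mean(axis=0))  # ~ mu
print(np.cov(X.T))     # ~ Sigma
```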
Many efforts to extend constructing methods from univariate to multivariate, for example, Pearson system, exponential family, and many others, can be found in [4].

4.4.2. Dirichlet Distribution

Let X_1, …, X_m be independent random variables with X_i ∼ Gamma(α_i), i = 1, …, m. Set
Y = X_1 + \cdots + X_m, \qquad Y_j = \frac{X_j}{Y}, \quad j = 1, \ldots, m.
The distribution of Y = (Y1, …, Ym) is called the Dirichlet distribution with parameters α 1 , , α m and is written as (Y1, …, Ym) ∼ Dm(α1, …, αm).
The beta distribution Be(α, β) is a special case of Dirichlet distribution. As Y 1 + + Y m = 1 , the random vector ( Y 1 , , Y m ) does not have a density on R m . However, the density of the first m − 1 components (with respect to Lebesgue measure on the simplex) is given by
f(y_1, \ldots, y_{m-1}) = \frac{1}{B_m(\alpha)} \prod_{i=1}^{m} y_i^{\alpha_i - 1}, \qquad \sum_{i=1}^{m} y_i = 1,\ 0 \le y_i \le 1,\ i = 1, \ldots, m,
where α = (α1, …, αm) and
B_m(\alpha_1, \ldots, \alpha_m) = \frac{\Gamma(\alpha_1)\cdots\Gamma(\alpha_m)}{\Gamma(\alpha_1 + \cdots + \alpha_m)}.
Moreover, Y is independent of { Y j , j = 1 , , m 1 } .
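The gamma-ratio construction gives an immediate sampler (the α values below are illustrative); the marginal means α_j/(α_1 + ⋯ + α_m), a standard property of the Dirichlet distribution, serve as the check:

```python
import numpy as np

# Dirichlet via independent gammas: Y_j = X_j / (X_1 + ... + X_m),
# with X_j ~ Gamma(alpha_j).
alpha = np.array([2.0, 3.0, 5.0])
rng = np.random.default_rng(8)
X = rng.gamma(alpha, size=(200_000, len(alpha)))
Y = X / X.sum(axis=1, keepdims=True)
print(Y.mean(axis=0), alpha / alpha.sum())  # marginal means alpha_j / sum(alpha)
```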
There is a fundamental relationship between the Dirichlet distribution and the uniform distribution on the unit sphere S_p = {x : x′x = 1} [68].
Theorem 2.
Suppose that U U(Sp) in R p . Then U has a stochastic representation
U = \begin{pmatrix} U_1 \\ \vdots \\ U_m \end{pmatrix} \stackrel{d}{=} \begin{pmatrix} D_1 U^{(1)} \\ \vdots \\ D_m U^{(m)} \end{pmatrix},
where U^{(j)} ∈ ℝ^{t_j}, t_1 + ⋯ + t_m = p, and
(i)
U^{(j)} ∼ U(S_{t_j}), the uniform distribution on S_{t_j}, j = 1, …, m;
(ii)
U^{(1)}, …, U^{(m)} and (D_1, …, D_m) are mutually independent;
(iii)
D_j > 0, j = 1, …, m, and (D_1^2, …, D_m^2) ∼ D_m(t_1/2, …, t_m/2).
Based on the Dirichlet distribution, the multivariate Liouville distribution is defined on R + m by the stochastic representation X = d R Y , where R > 0 is a random variable independent of Y and Y D m ( α 1 , , α m ) . For a comprehensive study of the Liouville distribution, see Chapter 6 of [6].

4.4.3. Spherical and Elliptical Distributions

There are a number of equivalent ways to define the spherical distribution. Let X be a p × 1 random vector. The following statements are equivalent [6]:
(i)
X \stackrel{d}{=} PX for each P ∈ O(p), where O(p) is the set of orthogonal matrices of order p;
(ii)
The characteristic function (c.f.) of X is of the form ψ(t′t) for some ψ ∈ Ψ_p, where
\Psi_p = \left\{ \psi(\cdot) : \psi(t_1^2 + \cdots + t_p^2)\ \text{is a } p\text{-dimensional c.f.} \right\};
(iii)
X has a stochastic representation X \stackrel{d}{=} R U^{(p)}, where the random variable R ≥ 0 is independent of U^{(p)}, and U^{(p)} is uniformly distributed on the unit sphere S_p (and does not have a density in ℝ^p);
(iv)
For any a ∈ S_p and b ∈ S_p, we have a′X \stackrel{d}{=} b′X.
Theorem 3.
Let X \stackrel{d}{=} R U^{(p)} and P(X = 0) = 0. Then
\|X\| \stackrel{d}{=} R, \qquad \frac{X}{\|X\|} \stackrel{d}{=} U^{(p)},
and they are independent.
Example 2.
The theorem has two immediate applications. First, it provides a method to generate a random sample from the uniform distribution on the sphere, U(S_p): simply sample X from N_p(0, I_p) and normalize it to obtain X/‖X‖. Second, it allows the mixed moments of U^{(p)} to be easily calculated from the known mixed moments of the standard normal vector X and the moments of its radius R = ‖X‖. A detailed discussion can be found in Section 3.1.2 of [6].
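The first application in Example 2 is a one-line sampler; the sketch below also checks, as a weak proxy for the independence asserted in Theorem 3, that the radius and a direction coordinate are uncorrelated:

```python
import numpy as np

# Generate U(S_p) by normalizing standard normal vectors (Example 2).
p, n = 3, 100_000
rng = np.random.default_rng(9)
X = rng.standard_normal((n, p))
R = np.linalg.norm(X, axis=1, keepdims=True)
U = X / R  # direction, uniform on the unit sphere; R is the radius
print(U.mean(axis=0))                         # ~ 0 by symmetry
print(np.corrcoef(R.ravel(), U[:, 0])[0, 1])  # ~ 0: radius vs. direction
```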
Theorem 4.
Let X Sp(ψ) denote a spherically symmetric distribution in R p , characterized by its characteristic function ψ(tt) as (33). It is not necessary that X has a density. The random vector X = d R U ( p ) has a density which must be of the form g(xx), if and only if the radial variable R has a density, denoted by f(r). The relationship between g and f is given by
f(r) = \frac{2\pi^{p/2}}{\Gamma(p/2)}\, r^{p-1} g(r^2),
where g(·) is called the density generator. We write X Sp(g).
The necessary condition that X Sp(ψ) has a density is
\int_0^{\infty} y^{p/2 - 1} g(y)\, dy < \infty,
as
\int g(x'x)\, dx = \frac{\pi^{p/2}}{\Gamma(p/2)} \int_0^{\infty} y^{p/2-1} g(y)\, dy = 1.
Table 7 gives many useful spherical distributions from [6].
Elliptical Distribution
A random vector X has an elliptical distribution, denoted as X ∼ EC_p(μ, Σ), with parameters μ and Σ = A′A, if it can be expressed as
X \stackrel{d}{=} \mu + A'Y \stackrel{d}{=} \mu + R\, A' U^{(p)},
where YSp(ψ) is spherical. This relationship generalizes the way a normal variable XN(μ, σ2) is constructed from a standard normal ZN(0, 1). Consequently, any subclass of spherical distributions in Table 7 defines a corresponding subclass of elliptical distributions.

4.4.4. Approximation to Multivariate Continuous Distributions by Representative Points

For multivariate continuous distributions, the QMC-RPs and MSE-RPs can be defined in a similar way to the univariate case. Suppose that a random vector X ∼ F(·) in ℝ^n has a density function f(·) with a finite mean vector and covariance matrix. A set of points Ξ = {ξ_j, j = 1, …, k} is called a set of mean squared error representative points if it minimizes the mean squared error
\mathrm{MSE}(\Xi) = \int_{\mathbb{R}^n} \min_{j=1,\ldots,k} \|x - \xi_j\|^2 f(x)\, dx,
where ‖·‖ denotes the L2-norm. For the construction of approximation distributions of a continuous multivariate distribution by QMC-RPs, one can refer to [8,69,70]; for MSE-RPs, one can refer to [8,70,71,72,73]. Owing to the computational complexity of high-dimensional multivariate distributions, research on their MSE-RPs remains limited. Representative points also have many applications for discrete data, especially for massive discrete data. Chapter 9 of [8] considers cases where the sample size is hugely larger than the number of variables and includes model-based and model-free subsampling methods.

4.5. Basic Statistics Under the Multivariate Population

Let X follow a multivariate continuous distribution F(x; θ) where θ represents parameters. A random sample x 1 , , x n is drawn from this population. The distributions of basic statistics, such as the mean vector, the covariance matrix, and various test statistics, are fundamental to statistical inference and its applications. For example, when the population distribution is multivariate normal, Np(μ, Σ), the standard unbiased estimators for μ and Σ are sample mean and the sample covariance, given, respectively, by
μ̂ = x̄ = (1/n) ∑_{i=1}^{n} x_i,   S = (1/(n − 1)) ∑_{i=1}^{n} (x_i − x̄)(x_i − x̄)′.
It is known that x̄ ∼ Np(μ, (1/n)Σ). Other key related distributions are described below.
Wishart distribution: The Wishart distribution, denoted by Wp(n, Σ), is the multivariate generalization of the chi-square distribution. It is defined by W = ∑_{i=1}^{n} Z_i Z_i′, where Z₁, …, Z_n ∼ Np(0, Σ) are i.i.d. The sample covariance matrix S satisfies (n − 1)S ∼ Wp(n − 1, Σ). This distribution was developed by John Wishart [74], who, in his seminal 1928 paper “The Generalised Product Moment Distribution in samples from a normal multivariate population”, published in Biometrika, derived the distribution of the matrix of sums of squares and cross-products. A crucial result for normally distributed data is that the sample mean x̄ and the sample covariance matrix S are independent, generalizing the corresponding univariate property.
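The Wishart law of the sample covariance matrix can be illustrated by a short simulation; the sketch below (with an arbitrary Σ) checks the implied mean identity E[(n − 1)S] = (n − 1)Σ by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 3, 10, 8000
Sigma = np.array([[2.0, 0.6, 0.0],
                  [0.6, 1.0, 0.3],
                  [0.0, 0.3, 0.5]])

# reps independent samples of size n from N_p(0, Sigma)
x = rng.multivariate_normal(np.zeros(p), Sigma, size=(reps, n))

# (n-1) S = sum of outer products of the centered observations,
# which is distributed as Wishart W_p(n-1, Sigma)
d = x - x.mean(axis=1, keepdims=True)
W = np.einsum('rni,rnj->rij', d, d)

# The Monte Carlo average of (n-1) S should be close to (n-1) Sigma
W_mean = W.mean(axis=0)
```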
Distributions of the eigenvalues of S: Unlike the sample covariance matrix itself, the distributions of its individual eigenvalues are complex and do not follow simple, universal forms like the normal or Wishart distributions.
Hotelling’s T-squared distribution: It is the multivariate generalization of Student’s t-distribution, used primarily for hypothesis testing on the mean vector. For a random vector X ∼ Np(μ, Σ) independent of S ∼ Wp(n, Σ), Hotelling’s T² statistic is defined as
T² = n X′ S⁻¹ X.
The resulting T² statistic, despite being derived from multivariate data, is a scalar. Therefore, its sampling distribution, denoted by T²(p, n, μ), is a univariate distribution. When μ = 0, the distribution is central and denoted by T²(p, n). The T² statistic is related to the F-distribution through the identity ((n − p + 1)/(np)) T²(p, n) ∼ F(p, n − p + 1).
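The T²–F identity is what makes the statistic usable in practice. The sketch below (a hypothetical one-sample test with illustrative parameters) applies the sample version of the identity, (n − p)/(p(n − 1)) T² ∼ F(p, n − p) when S carries n − 1 degrees of freedom, and checks by simulation that a level-0.05 test has the right size under the null.

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(2)
p, n, reps = 3, 30, 2000
Sigma = np.array([[1.0, 0.4, 0.2],
                  [0.4, 1.0, 0.1],
                  [0.2, 0.1, 1.0]])

def hotelling_pvalue(sample, mu0):
    """One-sample Hotelling T^2 test of H0: mu = mu0, using
    (n-p)/(p(n-1)) T^2 ~ F(p, n-p) since S has n-1 df."""
    m, q = sample.shape
    diff = sample.mean(axis=0) - mu0
    S = np.cov(sample, rowvar=False)
    T2 = m * diff @ np.linalg.solve(S, diff)
    return f.sf((m - q) / (q * (m - 1)) * T2, q, m - q)

# Under H0 the p-values are uniform, so a level-0.05 test rejects ~5%
x = rng.multivariate_normal(np.zeros(p), Sigma, size=(reps, n))
rate = np.mean([hotelling_pvalue(x[r], np.zeros(p)) < 0.05 for r in range(reps)])
```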
Wilks’ lambda distribution: This distribution generalizes the beta distribution and is used in multivariate analysis of variance and likelihood ratio tests. Let A₁ ∼ Wp(m, Σ) and A₂ ∼ Wp(n, Σ) be independent. The Wilks’ lambda statistic, defined as Λ = |A₂| / |A₁ + A₂|, is a scalar. Thus, its sampling distribution, denoted by Λ(p, m, n), is a univariate distribution. Wilks’ lambda has close relationships with the T² distribution and simplifies to an F or beta distribution when p = 1.
When the population X follows an elliptical distribution (a broader class than the normal), deriving the distributions of the aforementioned statistics becomes significantly more challenging. A comprehensive discussion of this topic can be found in [68].

Multivariate L1-Norm Symmetric Distribution

The elliptical distribution can be regarded as an extension of the multivariate normal distribution. Similarly, the exponential distribution can be generalized to a multivariate setting. Let X₁, …, X_p be i.i.d. random variables from the standard exponential distribution, Exp(1) (see Table 4). Consider the random vector X = (X₁, …, X_p)′ and its L1-norm ∥X∥₁ = |X₁| + ⋯ + |X_p|. Let U = X/∥X∥₁. Fang and Fang [75] defined a class of multivariate distributions via the following stochastic representation:
X =ᵈ R U,
where the radial variable R is independent of the directional vector U. This class of distributions was studied systematically in [75,76] and in Chapter 5 of [6].
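This representation suggests a direct sampling scheme. The sketch below relies on two standard facts about i.i.d. Exp(1) components: U = X/∥X∥₁ is uniform on the simplex (Dirichlet(1, …, 1)) and is independent of ∥X∥₁ ∼ Gamma(p, 1). Taking R ∼ Gamma(p, 1) should therefore recover i.i.d. Exp(1) margins; other radial laws give other members of the class. Dimension and sample size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
p, m = 3, 200_000

# Directional part: uniform on the simplex, i.e. Dirichlet(1, ..., 1)
U = rng.dirichlet(np.ones(p), size=m)

# Radial part, independent of U: R ~ Gamma(p, 1) recovers i.i.d. Exp(1)
R = rng.gamma(shape=p, scale=1.0, size=m)
X = R[:, None] * U

# Each margin of R * U should then be standard exponential (mean 1, var 1)
```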

4.6. Definition by Distribution, Density or Survival Functions

4.6.1. Meta-Skew Symmetric Distributions

A traditional approach to defining a probability distribution is through its distribution function, density function, or survival function. For instance, the multivariate normal distribution in Anderson’s book [77] is defined by its density function. A general univariate form of an asymmetric distribution, known as the skew-symmetric family, can be represented as a function that combines a cumulative distribution function and a probability density function; see [78,79].
Definition 2.
A random variable X, whose density function has the form
p ( x ) = 2 f ( x ) G ( w ( x ) ) ,
is called skew symmetric, where f(x) is a density function centrally symmetric about 0, w(x) is an odd real-valued function (i.e., w(−x) = −w(x)), and G is a cumulative distribution function on ℝ whose density g = G′ is an even function, provided G is differentiable.
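A minimal numerical check of Definition 2, taking f = φ (the standard normal density), G = Φ, and two illustrative odd choices of w (w(x) = αx yields Azzalini's skew-normal); any such p(x) must integrate to 1.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# p(x) = 2 f(x) G(w(x)) with f = phi, G = Phi; alpha is an illustrative value
alpha = 4.0

# w(x) = alpha * x: Azzalini's skew-normal density
sn_total, _ = quad(lambda x: 2 * norm.pdf(x) * norm.cdf(alpha * x),
                   -np.inf, np.inf)

# w(x) = x^3: another valid odd skewing function
cubic_total, _ = quad(lambda x: 2 * norm.pdf(x) * norm.cdf(x ** 3),
                      -np.inf, np.inf)
```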

4.6.2. Skew-Normal Distribution

The multivariate skew-normal (MSN) distribution, introduced in Table 3, is a fundamental and powerful generalization of both the multivariate normal and the univariate skew-normal distributions. It provides a flexible framework for modeling multivariate data that exhibits asymmetry (skewness).
A p-dimensional random vector Y follows a multivariate skew-normal distribution with location vector μ, positive definite scale matrix Σ, and skewness vector α if its probability density function is given by
f(y; μ, Σ, α) = 2 φ_p(y − μ; Σ) Φ(α′ ω⁻¹(y − μ)),
where φ_p(z; Σ) is the pdf of the p-variate normal distribution with mean 0 and covariance matrix Σ; Φ(·) is the cdf of N(0, 1); ω is the diagonal matrix formed from the standard deviations of Σ (the square roots of its diagonal elements); and α is the shape or skewness parameter vector. We write Y ∼ SNp(μ, Σ, α).
The MSN distribution admits an intuitive stochastic representation. Consider a (p + 1)-dimensional normal random vector (X₀, X₁′)′ with
(X₀, X₁′)′ ∼ N_{1+p}( 0, Ω ),  Ω = ( 1  δ′ ; δ  Σ ) in block form.
Here, X0 is a scalar and X1 is a p-vector. The vector δ controls the correlation between X0 and each component of X1. Then
Y =ᵈ μ + ω (X₁ | X₀ > 0) ∼ SNp(μ, Σ, α),
where the skewness parameter α relates to δ through the following equation:
α = Σ⁻¹δ / √(1 − δ′ Σ⁻¹ δ).
For comprehensive details on the theory and applications of the multivariate skew-normal distribution, see [80,81].
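The conditioning representation can be verified by simulation. The sketch below (illustrative δ and Σ, with μ = 0 and ω = I) samples X₁ | X₀ > 0 and compares the empirical mean with a known skew-normal moment: since E[X₁ | X₀] = δX₀ and E[X₀ | X₀ > 0] = √(2/π), the mean of the resulting vector is √(2/π) δ.

```python
import numpy as np

rng = np.random.default_rng(4)
m = 400_000

# Illustrative bivariate case: Cov(X0, X1) = delta, scale matrix Sigma
delta = np.array([0.7, 0.3])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
C = np.block([[np.ones((1, 1)), delta[None, :]],
              [delta[:, None], Sigma]])

Z = rng.multivariate_normal(np.zeros(3), C, size=m)
Y = Z[Z[:, 0] > 0, 1:]          # X1 | X0 > 0, i.e. samples from SN_2(0, Sigma, alpha)

# Known moment of the standardized skew-normal: E[Y] = sqrt(2/pi) * delta
target_mean = np.sqrt(2 / np.pi) * delta
```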

4.6.3. Skew-Elliptical Distributions

Skew-elliptical distributions form a broad family that generalizes symmetric elliptical distributions by introducing skewness (asymmetry). This extension moves beyond the familiar world of symmetric, bell-shaped curves into the asymmetric realm, providing a principled and mathematically sound framework for modeling a wider range of real-world data patterns. Among these, the skew-t distribution is often a particularly useful starting point for applied work due to its flexibility in handling both skewness and heavy tails.
The general form of the pdf for a skew-elliptical distribution is
f ( z ) = 2 f 0 ( z ) Π ( z ) ,
where f0(z) is the pdf of a symmetric elliptical distribution (the “parent”), and Π(z) is a skewing function that maps into the interval [0, 1] and satisfies Π(z) + Π(−z) = 1. This function controls the direction and amount of skewness. The skew-elliptical family is vast, but several members are particularly important: the skew-normal, skew-t, skew-Cauchy, and skew-Laplace distributions. Skew-elliptical distributions are used wherever data deviate from symmetry and normality, such as in finance, environmental science, biology, psychometrics, and the social sciences. For a comprehensive study, refer to [82].

4.6.4. Definition by Survival Function

The survival function of a random variable X ∼ F(x) is defined as F̄(x) = P(X > x). This function is particularly useful in the analysis of biometrical data. Within this domain, the exponential and Weibull distributions play an important role.
A foundational model for dependent failures is the Marshall–Olkin multivariate exponential distribution [83]. Its strength lies in an intuitive shock-based construction and analytical tractability. The model assumes that shocks to the system components arrive according to three independent Poisson processes with rates λ1, λ2, and λ12, corresponding to shocks that fatally affect only component 1, only component 2, or both components simultaneously. The arrival times of these shocks, Z1, Z2, and Z12, are, therefore, independent exponential random variables with parameters λ1, λ2, and λ12, respectively.
We set X = min{Z1, Z12} and Y = min{Z2, Z12}. The joint survival function of the random vector (X, Y) is defined as H ¯ ( x , y ) = P ( X > x , Y > y ) . For any x, y ≥ 0 the survival function is given by
H̄(x, y) = P(Z₁ > x) P(Z₂ > y) P(Z₁₂ > max(x, y)) = exp[−λ₁x − λ₂y − λ₁₂ max(x, y)].
Its marginal survival functions are
F̄(x) = exp(−(λ₁ + λ₁₂)x),  and  Ḡ(y) = exp(−(λ₂ + λ₁₂)y).
Hence, X and Y are exponential random variables with parameters λ1 + λ12 and λ2 + λ12, respectively. This elegant construction can be extended to higher dimensions.
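The shock construction translates directly into a simulation scheme; the rates below are illustrative. The sketch checks both the marginal rates and the joint survival function at one point.

```python
import numpy as np

rng = np.random.default_rng(5)
m = 500_000
lam1, lam2, lam12 = 1.0, 0.5, 2.0   # illustrative shock rates

# Arrival times of the three independent Poisson shock streams
Z1 = rng.exponential(1 / lam1, m)
Z2 = rng.exponential(1 / lam2, m)
Z12 = rng.exponential(1 / lam12, m)

X = np.minimum(Z1, Z12)   # component 1 fails at its first relevant shock
Y = np.minimum(Z2, Z12)

# Marginals: X ~ Exp(lam1 + lam12), Y ~ Exp(lam2 + lam12)
# Joint survival at (x0, y0): exp(-lam1*x0 - lam2*y0 - lam12*max(x0, y0))
x0, y0 = 0.3, 0.5
emp_surv = np.mean((X > x0) & (Y > y0))
theo_surv = np.exp(-lam1 * x0 - lam2 * y0 - lam12 * max(x0, y0))
```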

4.7. Meta-Distributions via Copula Techniques

Various statistical distributions play a key role in modeling real-world data. The rapid development of data science has increased the demand for a wider variety of multivariate distributions. A common challenge is to construct a multivariate distribution from given one-dimensional marginal distributions while incorporating a known dependence structure. Such constructions arise not only from statistical theory but also from practical problems, particularly in risk and decision analysis for complex multivariate systems. The copula is a powerful technique designed for this purpose.
Copulas are mathematical functions that fully capture the dependence structure among random variables, offering great flexibility in building multivariate stochastic models. Since their introduction in the 1950s, copulas have gained significant popularity in several applied fields such as finance, insurance, and reliability theory. Nowadays, they are a well-recognized tool for market and credit modeling, risk aggregation, portfolio selection, and more.
The following definition is based on probability theory. For a more general treatment, see [66,84]. Since most research and applications concern copulas of continuous distributions, and the discrete case is considerably more complicated, this section covers only the continuous case.
Definition 3.
If C(z₁, …, z_p) is a cumulative distribution function on the unit hypercube [0, 1]^p with uniform U(0, 1) margins, then C is called a copula.
The following Lemma demonstrates the existence of the copula function.
Lemma 1.
Let H ( x 1 , , x p ) be a continuous distribution with p continuous margins Fi on the domain Di for i = 1 , , p . Then there exists a unique copula C such that
C(z₁, …, z_p) = H(F₁⁻¹(z₁), …, F_p⁻¹(z_p)).
It is well known that for a given set of p marginal distributions {F_i(x_i)}_{i=1}^p, there exist infinitely many p-dimensional distributions with these margins. If the dependence structure of the joint distribution is known, one may find a unique joint distribution, or a small family of joint distributions, H(x₁, …, x_p). The copula technique is a fundamental tool for this purpose.
Suppose we have p random variables X_i ∼ F_i(x_i) for i = 1, …, p, and we wish to construct a multivariate distribution H with these margins and a given dependence structure defined by a copula C(z₁, …, z_p). This method is called the copula technique. The resulting distribution is termed a meta-distribution and is denoted by F(C; F₁, …, F_p).
Theorem 5.
Sklar’s Theorem (Multivariate Case) Let X be a continuous random vector with joint distribution H ( x 1 , , x p )  and continuous marginal distributions Fi on domains Di for i = 1 , , p . Then there exists a unique copula C such that
H(x₁, …, x_p) = C(F₁(x₁), …, F_p(x_p)).
Conversely, if C is a copula and F 1 , , F p  are univariate distribution functions, then the function H defined by (42) is a p-dimensional distribution function with margins F 1 , , F p .
If the margins Fi are continuous, then C is unique. The joint density function of X derived from (42) is given by
h(x₁, …, x_p) = c(F₁(x₁), …, F_p(x_p)) ∏_{i=1}^{p} f_i(x_i),
where fi is the probability density function of Xi, and c(·) is the copula density, i.e., the p-th mixed partial derivative of C.

4.7.1. Some Examples of Copula

Many copula examples exist for lower dimensions. This subsection presents selected examples from Nelsen [66] as a tutorial to help beginners grasp the ideas of copulas. Examples 3 and 4 have different joint distributions but share the same copula.
Example 3.
Adapted from Nelsen [66]
Consider the joint distribution
H(x, y) = (x + 1)(e^y − 1) / (x + 2e^y − 1),  (x, y) ∈ [−1, 1] × [0, ∞).
Its marginal distributions are X ∼ U(−1, 1) with cdf F(x) = (x + 1)/2 and Y ∼ Exp(1), the standard exponential distribution with cdf G(y) = 1 − e^{−y}, and its copula is
C(u, v) = uv / (u + v − uv).
We now demonstrate how to derive the copula C from H, and, conversely, how to reconstruct H from C, F, G.
Deriving C from H:
Let u = F(x) and v = G(y), which implies u = (x + 1)/2 and v = 1 − e^{−y}, i.e., e^{y} = 1/(1 − v).
C(u, v) = H(F⁻¹(u), G⁻¹(v))
  = (F⁻¹(u) + 1)(e^{G⁻¹(v)} − 1) / (F⁻¹(u) + 2e^{G⁻¹(v)} − 1)
  = [(2u − 1) + 1][1/(1 − v) − 1] / [(2u − 1) + 2/(1 − v) − 1]
  = [2u · v/(1 − v)] / [2(u + v − uv)/(1 − v)]
  = uv / (u + v − uv).
Deriving H from {C, F, G}:
Note that e^{y} = 1/(1 − v); we have
H(x, y) = C(u, v)
  = [((x + 1)/2)(1 − e^{−y})] / [(x + 1)/2 + (1 − e^{−y}) − ((x + 1)/2)(1 − e^{−y})]
  = (x + 1)(1 − e^{−y}) / [(x + 1) + 2(1 − e^{−y}) − (x + 1)(1 − e^{−y})]
  = (x + 1)(e^{y} − 1) / [(x + 1) + 2e^{y} − 2]
  = (x + 1)(e^{y} − 1) / (x + 2e^{y} − 1).
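Both directions of this derivation can be double-checked numerically; the evaluation grids below are arbitrary.

```python
import numpy as np

# Example 3: joint cdf, margins, and the claimed common copula
H = lambda x, y: (x + 1) * (np.exp(y) - 1) / (x + 2 * np.exp(y) - 1)
F = lambda x: (x + 1) / 2                 # U(-1, 1) cdf
G = lambda y: 1 - np.exp(-y)              # Exp(1) cdf
C = lambda u, v: u * v / (u + v - u * v)

# Sklar's identity H(x, y) = C(F(x), G(y)) on a grid
xs = np.linspace(-0.95, 0.95, 20)
ys = np.linspace(0.05, 5.0, 20)
Xg, Yg = np.meshgrid(xs, ys)
err_sklar = np.max(np.abs(H(Xg, Yg) - C(F(Xg), G(Yg))))

# Inverse direction: C(u, v) = H(F^{-1}(u), G^{-1}(v))
us = np.linspace(0.05, 0.95, 19)
vs = np.linspace(0.05, 0.95, 19)
Ug, Vg = np.meshgrid(us, vs)
err_inv = np.max(np.abs(C(Ug, Vg) - H(2 * Ug - 1, -np.log(1 - Vg))))
```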
Example 4.
Gumbel’s bivariate logistic distribution [85].
H(x, y) = (1 + e^{−x} + e^{−y})⁻¹,  −∞ < x, y < ∞.
Its two margins are
F(x) = (1 + e^{−x})⁻¹,  G(y) = (1 + e^{−y})⁻¹.
Its copula is
C(u, v) = uv / (u + v − uv),
which is identical to the copula in Example 3, despite the marginal distributions being quite different.
For multivariate copulas, higher-dimensional margins are defined as per the following example.
Example 5.
Consider the following three-dimensional distribution H:
H(x, y, z) = (x + 1)(e^y − 1) sin(z) / (x + 2e^y − 1),  on [−1, 1] × [0, ∞) × [0, π/2].
Its three one-dimensional marginal distributions are
H₁(x) = (x + 1)/2,  H₂(y) = 1 − e^{−y},  H₃(z) = sin(z).
The three two-dimensional marginal distributions are
H₁₂(x, y) = (x + 1)(e^y − 1) / (x + 2e^y − 1),  H₂₃(y, z) = (1 − e^{−y}) sin(z),  H₁₃(x, z) = (1/2)(x + 1) sin(z).
Example 6.
Fang et al. [86] provided a comprehensive study of the copula defined for α > 0 and β ∈ [0, 1]:
C(u, v; α, β) = uv [1 − β(1 − u^{1/α})(1 − v^{1/α})]^{−α}.
For different values of α and β, we have the following:
1. β = 0: U and V are independent.
2. β = 1: the copula reduces to the bivariate Clayton copula [87].
3. α = 1: the copula reduces to the Ali–Mikhail–Haq copula [88].
Its multivariate version is given by
C(u₁, …, u_n; α, β) = ∏_{i=1}^{n} u_i [1 − β ∏_{i=1}^{n} (1 − u_i^{1/α})]^{−α}.
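These special cases can be checked numerically; the sketch below reads the exponent of the bracket as −α, which is the reading under which the stated reductions (Clayton with θ = 1/α at β = 1, Ali–Mikhail–Haq at α = 1) hold. Evaluation points are arbitrary.

```python
import numpy as np

def fang_copula(u, v, alpha, beta):
    """C(u, v; alpha, beta) = u v [1 - beta (1 - u**(1/alpha)) (1 - v**(1/alpha))]**(-alpha)."""
    return u * v * (1 - beta * (1 - u**(1 / alpha)) * (1 - v**(1 / alpha)))**(-alpha)

def clayton(u, v, theta):
    # Clayton copula with parameter theta > 0
    return (u**(-theta) + v**(-theta) - 1)**(-1 / theta)

def amh(u, v, beta):
    # Ali-Mikhail-Haq copula
    return u * v / (1 - beta * (1 - u) * (1 - v))

u, v = 0.3, 0.7
c_indep = fang_copula(u, v, alpha=2.0, beta=0.0)    # beta = 0: independence, C = u v
c_clayton = fang_copula(u, v, alpha=2.0, beta=1.0)  # beta = 1: Clayton with theta = 1/alpha
c_amh = fang_copula(u, v, alpha=1.0, beta=0.5)      # alpha = 1: Ali-Mikhail-Haq
```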
These examples demonstrate that the copula technique provides a powerful method for constructing numerous new distributions useful for statistical modeling.

4.7.2. Gaussian Copula and Meta-Gaussian Distribution

When H(·) is a multivariate normal distribution, the associated copula is called a Gaussian copula. When H(·) is an elliptical distribution, the corresponding copula is known as an elliptical copula. This family includes the Gaussian copula, the t-copula, and the Kotz-type copula, among others; for a comprehensive overview, see [89].
From the definition of a copula, the Gaussian copula for Np(μ, Σ) is identical to that of Z = (Z₁, …, Z_p)′ ∼ Np(0, R), where R is the correlation matrix of Z. Clearly, all marginal distributions of Z are standard normal N(0, 1), yet its components have correlation matrix R = (r_ij). Denote the cdf of Z by G_normal(z₁, …, z_p | R) and define a function C as follows:
C_normal(u₁, …, u_p) = G_normal(Φ⁻¹(u₁), …, Φ⁻¹(u_p)),  (u₁, …, u_p) ∈ [0, 1]^p,
where Φ(·) is the cdf of N(0, 1). This function C is the Gaussian copula. Kelly and Krzysztofowicz [90] studied the meta-Gaussian (or meta-normal) distribution. A more detailed analysis can be found in Section 2.6, “Multivariate Gaussian/Normal”, of [91].
Definition 4.
The meta-Gaussian or meta-normal distribution with a set of given marginal cdfs F 1 , , F p is defined by its cdf
F_{X₁,…,X_p}(x₁, …, x_p) = G_{Z₁,…,Z_p}(Φ⁻¹(F₁(x₁)), …, Φ⁻¹(F_p(x_p))).
We denote this distribution as X ∼ MNp(0, R; F₁, …, F_p).
The pdf of M N p ( 0 , R ; F 1 , , F p ) is
h(x₁, …, x_p) = c_normal(F₁(x₁), …, F_p(x_p)) ∏_{i=1}^{p} f_i(x_i),
where c_normal is the density of the Gaussian copula and f_i is the pdf corresponding to F_i. For the case where the correlation matrix has all off-diagonal elements equal to ρ, Bouyé [92] provided illustrative plots for various values of ρ.
Let X ∼ MN₂(0, R; F₁, F₂), where the correlation matrix R has Corr(X₁, X₂) = ρ. Then the normal copula is given by
C_normal(u, v) = (1 / (2π√(1 − ρ²))) ∫_{−∞}^{Φ⁻¹(u)} ∫_{−∞}^{Φ⁻¹(v)} exp( −(s² − 2ρst + t²) / (2(1 − ρ²)) ) ds dt,
where 0 ≤ u, v ≤ 1.
The correlation coefficient has long been used in multivariate analysis. However, a copula and its related meta-distributions may have different correlation coefficients because of their different marginal distributions. Therefore, so-called “measures of concordance” are considered in the literature: such a measure takes the same value for a copula and all of its meta-distributions. Kendall’s τ and Spearman’s ρ are the most popular ones in applications; their definitions can be found in [66]. Spearman’s ρ for the binormal copula is given by (6/π) arcsin(ρ/2) (see [93]). This result does not extend to other elliptical distributions, which are introduced in the next section.
The meta-normal distribution X ∼ MN₂(0, R; F₁, F₂) is given by
F(x₁, x₂) = C_normal(F₁(x₁), F₂(x₂)) = (1 / (2π√(1 − ρ²))) ∫_{−∞}^{Φ⁻¹(F₁(x₁))} ∫_{−∞}^{Φ⁻¹(F₂(x₂))} exp( −(s² − 2ρst + t²) / (2(1 − ρ²)) ) ds dt.
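The definition gives an immediate sampling recipe for meta-Gaussian distributions: transform normal scores to uniform scores, then push them through arbitrary marginal quantile functions. The margins below are illustrative choices, and the sketch also checks the Spearman's ρ formula (6/π) arcsin(ρ/2) quoted above, which is invariant under the monotone marginal transforms.

```python
import numpy as np
from scipy.stats import norm, expon, spearmanr

rng = np.random.default_rng(6)
m = 200_000
rho = 0.6
R = np.array([[1.0, rho], [rho, 1.0]])

# Meta-Gaussian sampling: normal scores -> uniform scores -> target margins
Z = rng.multivariate_normal([0.0, 0.0], R, size=m)
U = norm.cdf(Z)                            # each column is U(0, 1)
X1 = expon.ppf(U[:, 0])                    # margin F1: Exp(1)
X2 = norm.ppf(U[:, 1], loc=5, scale=2)     # margin F2: N(5, 4)

# Spearman's rho of the Gaussian copula: (6/pi) * arcsin(rho/2)
rho_s_theory = 6 / np.pi * np.arcsin(rho / 2)
rho_s_emp = spearmanr(X1, X2).correlation
```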
Mai and Scherer [94], in Example 1.7, discussed the Gaussian copula and meta-Gaussian distributions for the general case XNp(μ, Σ). In this case, the marginal distributions of Xi ( i = 1 , , p ) are not necessarily identical if the variances Var(Xi) differ. Now, consider the two-dimensional normal distribution N 2 ( 0 , R ) with correlation matrix R and correlation coefficient ρ. Let ξi = Φ−1(ui) for i = 1, 2. The density of the Gaussian copula, as given in Chapter 7 of [95], is
c_normal(ξ) = c_normal(ξ₁, ξ₂) = |R|^{−1/2} exp( −(1/2) ξ′(R⁻¹ − I)ξ ).
This expression can be factorized into two components:
c_normal(ξ) = [ (1/(2π√(1 − ρ²))) exp( −(1/2) ξ′R⁻¹ξ ) ] · [ 2π exp( (1/2)(ξ₁² + ξ₂²) ) ].
The first factor is the density of N₂(0, R), while the second factor is the reciprocal of the product of two standard normal densities,
1/φ(ξ₁) = √(2π) exp( (1/2)ξ₁² )  and  1/φ(ξ₂) = √(2π) exp( (1/2)ξ₂² ).
Recently, He et al. [96] derived the covariance matrix of the Gaussian copula and related properties.

4.7.3. Marshall–Olkin Copula

The Marshall–Olkin multivariate exponential distribution was introduced in Section 4.6.4. We now derive its corresponding survival copula, denoted by C ^ M O ( u , v ) . We begin by expressing the joint survival function H ¯ ( x , y ) in terms of the marginal survival functions u = F ¯ ( x ) and v = G ¯ ( y ) , recalling the relationship H ¯ ( x , y ) = C ^ ( F ¯ ( x ) , G ¯ ( y ) ) , where C ^ is the survival copula. Using the identity max(x, y) = x + y − min(x, y), we can rewrite the survival function as follows:
H̄(x, y) = Ĉ(F̄(x), Ḡ(y)) = exp[ −(λ₁ + λ₁₂)x − (λ₂ + λ₁₂)y + λ₁₂ min(x, y) ] = F̄(x) Ḡ(y) min{ exp(λ₁₂x), exp(λ₁₂y) }.
As u^{λ₁₂/(λ₁+λ₁₂)} = exp(−λ₁₂x) and v^{λ₁₂/(λ₂+λ₁₂)} = exp(−λ₁₂y), let α = λ₁₂/(λ₁ + λ₁₂) and β = λ₁₂/(λ₂ + λ₁₂). Obviously, 0 < α, β < 1, and
u = F̄(x) = e^{−(λ₁+λ₁₂)x} = e^{−λ₁₂x/α},  u^α = e^{−λ₁₂x},
v = Ḡ(y) = e^{−(λ₂+λ₁₂)y} = e^{−λ₁₂y/β},  v^β = e^{−λ₁₂y}.
The survival copula for the Marshall–Olkin distribution, known as the Marshall–Olkin copula, is given by
Ĉ_MO(u, v) = uv min( u^{−λ₁₂/(λ₁+λ₁₂)}, v^{−λ₁₂/(λ₂+λ₁₂)} ) = uv min(u^{−α}, v^{−β}) = min(u^{1−α}v, uv^{1−β}).
Therefore, the survival copula for the Marshall–Olkin bivariate exponential distribution forms a two-parameter family of copulas given by
C_{α,β}(u, v) = u^{1−α} v,  if u^α ≥ v^β;  C_{α,β}(u, v) = u v^{1−β},  if u^α ≤ v^β.
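The derivation can be verified numerically: with illustrative shock rates, Ĉ_{α,β}(F̄(x), Ḡ(y)) should reproduce the joint survival function H̄(x, y) exactly.

```python
import numpy as np

lam1, lam2, lam12 = 1.0, 0.5, 2.0          # illustrative shock rates
a = lam12 / (lam1 + lam12)                  # alpha
b = lam12 / (lam2 + lam12)                  # beta

def mo_copula(u, v):
    """Marshall-Olkin survival copula: min(u**(1-a) * v, u * v**(1-b))."""
    return np.minimum(u**(1 - a) * v, u * v**(1 - b))

def joint_survival(x, y):
    return np.exp(-lam1 * x - lam2 * y - lam12 * max(x, y))

def Fbar(x):
    return np.exp(-(lam1 + lam12) * x)

def Gbar(y):
    return np.exp(-(lam2 + lam12) * y)

# Hbar(x, y) = C_hat(Fbar(x), Gbar(y)) should hold at every point
pts = [(0.1, 0.4), (0.7, 0.2), (1.3, 1.3)]
errs = [abs(joint_survival(x, y) - mo_copula(Fbar(x), Gbar(y))) for x, y in pts]
```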
The application of copulas and meta-distributions is a cornerstone of modern quantitative finance, risk management, and many other fields that deal with multivariate dependence. They provide a powerful toolkit for modeling complex relationships beyond simple correlation.

5. Conclusions

The development of statistical distributions, driven by the need to model real data, is a cornerstone of statistical science. With a vast and well-established body of literature, a complete review of construction methods is beyond our scope. This article, instead, focuses on recent advancements in the field, particularly in simulation techniques and computational algorithms. Our aim is to offer readers an expanded perspective on the evolving landscape of statistical distributions.

Author Contributions

Conceptualization, investigation, writing—original draft preparation, resources, supervision, project administration: K.-T.F.; writing—review and editing, visualization: Y.-X.L.; writing—review and editing, funding acquisition: Y.-H.D. All authors have read and agreed to the published version of the manuscript.

Funding

Our work was supported in part by the Guangdong Provincial Key Laboratory of IRADS (grant number 2022B1212010006), in part by National Natural Science Foundation of China (Key Program) (grant number 12231004), and in part by BNBU Research (grant number UICR0600048) at Beijing Normal-Hong Kong Baptist University, Zhuhai, PR China.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Johnson, N.L.; Kotz, S.; Balakrishnan, N. Continuous Univariate Distributions, 2nd ed.; John Wiley & Sons: New York, NY, USA, 1994; Volume 1. [Google Scholar]
  2. Johnson, N.L.; Kotz, S.; Balakrishnan, N. Continuous Univariate Distributions, 2nd ed.; John Wiley & Sons: New York, NY, USA, 1995; Volume 2. [Google Scholar]
  3. Johnson, N.L.; Kotz, S.; Balakrishnan, N. Discrete Multivariate Distributions; John Wiley & Sons: New York, NY, USA, 1997. [Google Scholar]
  4. Kotz, S.; Balakrishnan, N.; Johnson, N.L. Continuous Multivariate Distributions, Volume 1: Models and Applications, 2nd ed.; John Wiley & Sons: New York, NY, USA, 2000. [Google Scholar]
  5. Fang, K.T.; Xu, J.L. Statistical Distribution; China Higher Education Press: Beijing, China, 2016. [Google Scholar]
  6. Fang, K.T.; Kotz, S.; Ng, K.W. Symmetric Multivariate and Related Distributions; Chapman and Hall: London, UK, 1990. [Google Scholar]
  7. Johnson, M.E. Multivariate Statistical Simulation; John Wiley & Sons: New York, NY, USA, 1987. [Google Scholar]
  8. Fang, K.T.; Ye, H.; Zhou, Y. Representative Points of Statistical Distributions and Their Applications in Statistical Inference; CRC Press: Boca Raton, FL, USA, 2025. [Google Scholar]
  9. Lehmann, E.L.; Casella, G. Theory of Point Estimation, 2nd ed.; Springer: New York, NY, USA, 1998. [Google Scholar]
  10. Bickel, P.J.; Doksum, K.A. Mathematical Statistics: Basic Ideas and Selected Topics, 2nd ed.; Chapman and Hall/CRC: New York, NY, USA, 2015; Volume 1. [Google Scholar]
  11. Pearson, K. Contributions to the mathematical theory of evolution—II. Skew variation in homogeneous material. Philos. Trans. R. Soc. Lond. A 1895, 186, 343–414. [Google Scholar]
  12. Elderton, W.P.; Johnson, N.L. Systems of Frequency Curves; Cambridge University Press: Cambridge, UK, 1969. [Google Scholar]
  13. Hogg, R.V.; McKean, J.W.; Craig, A.T. Introduction to Mathematical Statistics, 6th ed.; Prentice Hall: Upper Saddle River, NJ, USA, 2005. [Google Scholar]
  14. Li, Y.; Sun, Z.H.; Fang, K.T. Generalized inverse transformation method via representative points in statistical simulation. Commun. Stat.-Simul. Comput. 2024, 54, 4519–4545. [Google Scholar] [CrossRef]
  15. David, H.A.; Nagaraja, H.N. Order Statistics, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2004. [Google Scholar]
  16. De Haan, L.; Ferreira, A. Extreme Value Theory: An Introduction; Springer: New York, NY, USA, 2006. [Google Scholar]
  17. Al-Hussaini, E.K.; Ahsanullah, M. Exponentiated distributions. Atlantis Studies in Probability and Statistics; Atlantis Press: Paris, France, 2015; Volume 21. [Google Scholar]
  18. Mahdavi, A.; Kundu, D. A new method for generating distributions with an application to exponential distribution. Commun. Stat.-Theory Methods 2017, 46, 6543–6557. [Google Scholar] [CrossRef]
  19. Cordeiro, G.M.; De Castro, M. A new family of generalized distributions. J. Stat. Comput. Simul. 2011, 81, 883–898. [Google Scholar] [CrossRef]
  20. Alzaatreh, A.; Lee, C.; Famoye, F. A new method for generating families of continuous distributions using the T-X family. Metron 2013, 71, 63–79. [Google Scholar] [CrossRef]
  21. Everitt, B.S.; Hand, D.J. Finite Mixture Distributions; Springer: Dordrecht, The Netherlands, 1981. [Google Scholar]
  22. Li, Y.; Fang, K.T.; He, P.; Peng, H. Representative points from a mixture of two normal distributions. Mathematics 2022, 10, 3952. [Google Scholar] [CrossRef]
  23. Yan, T.; Fang, K.T.; Yin, H. A novel approach for parameter estimation of mixture of two Weibull distributions in failure data modeling. Stat. Comput. 2024, 34, 221. [Google Scholar] [CrossRef]
  24. Aigner, D.; Lovell, C.K.; Schmidt, P. Formulation and estimation of stochastic frontier production function models. J. Econom. 1977, 6, 21–37. [Google Scholar] [CrossRef]
  25. Huygens, C. De Ratiociniis in Ludo Aleae. In Exercitationum Mathematicarum; Schooten, F.V., Ed.; Elsevier: Leiden, The Netherlands, 1657; pp. 521–534. [Google Scholar]
  26. de Moivre, A. Approximatio ad Summam Terminorum Binomii (a + b)n in Seriem Expansi. Misc. Anal. 1733, II, 1–7. [Google Scholar]
  27. Gauss, C.F. Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientum; Perthes et Besser: Hamburg, Germany, 1809. [Google Scholar]
  28. Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12, 171–178. [Google Scholar]
  29. Martínez-Flórez, G.; Tovar-Falón, R.; Gómez, H. Bivariate Power-Skew-Elliptical Distribution. Symmetry 2020, 12, 1327. [Google Scholar] [CrossRef]
  30. Zhou, Y.F.; Lin, Y.X.; Fang, K.T.; Yin, H. The Representative Points of Generalized Alpha Skew-t Distribution and Applications. Entropy 2024, 26, 889. [Google Scholar] [CrossRef]
  31. Rayleigh, L. On the resultant of a large number of vibrations of the same pitch and of arbitrary phase. Philos. Mag. 1880, 10, 73–78. [Google Scholar] [CrossRef]
  32. Nadarajah, S.; Kotz, S. Moments of some J-shaped distributions. J. Appl. Stat. 2003, 30, 311–317. [Google Scholar] [CrossRef]
  33. Stacy, E.W. A generalization of the gamma distribution. Ann. Math. Stat. 1962, 33, 1187–1192. [Google Scholar] [CrossRef]
  34. Schrödinger, E. Zur Theorie der Fall- und Steigversuche an Teilchen mit Brownscher Bewegung. Phys. Z. 1915, 16, 289–295. [Google Scholar]
  35. Tweedie, K.C.M. Inverse statistical variates. Nature 1945, 155, 453. [Google Scholar] [CrossRef]
  36. Seshadri, V. The Inverse Gaussian Distribution: Statistical Theory and Applications; Springer: New York, NY, USA, 1999. [Google Scholar]
  37. Barndorff-Nielsen, O.E. Exponentially decreasing distributions for the logarithm of particle size. Proc. R. Soc. Lond. A Math. Phys. Sci. 1977, 353, 401–419. [Google Scholar]
  38. Eberlein, E.; Keller, U. Hyperbolic distributions in finance. Bernoulli 1995, 1, 281–299. [Google Scholar] [CrossRef]
  39. Barndorff-Nielsen, O.E. Normal inverse Gaussian distributions and stochastic volatility modelling. Scand. J. Stat. 1997, 24, 1–13. [Google Scholar] [CrossRef]
  40. Arnold, B.C.; Castillo, E.; Sarabia, J.M. Conditional Specification of Statistical Models; Springer Series in Statistics; Springer: New York, NY, USA, 1999. [Google Scholar]
  41. Arnold, B.C.; Castillo, E.; Sarabia, J.M. Conditionally specified distributions: An introduction (with comments and a rejoinder by the authors). Stat. Sci. 2001, 16, 249–274. [Google Scholar] [CrossRef]
  42. Zografos, K.; Balakrishnan, N. On families of beta-and generalized gamma-generated distributions and associated inference. Stat. Methodol. 2009, 6, 344–362. [Google Scholar] [CrossRef]
  43. Eugene, N.; Lee, C.; Famoye, F. Beta-normal distribution and its applications. Commun. Stat.-Theory Methods 2002, 31, 497–512. [Google Scholar] [CrossRef]
  44. Gupta, A.K.; Nadarajah, S. On the moments of the beta normal distribution. Commun. Stat.-Theory Methods 2005, 33, 1–13. [Google Scholar] [CrossRef]
  45. Famoye, F.; Lee, C.; Olumolade, O. The beta-Weibull distribution. J. Stat. Theory Appl. 2005, 4, 121–136. [Google Scholar]
  46. Mdziniso, N.C. The Quotient of the Beta-Weibull Distribution. Master’s Thesis, Marshall University, Huntington, WV, USA, 2012. [Google Scholar]
  47. Johnson, N.L.; Kotz, S. Urn Models and Their Applications; John Wiley & Sons: New York, NY, USA, 1977. [Google Scholar]
  48. Johnson, N.L.; Kemp, A.W.; Kotz, S. Univariate Discrete Distributions, 3rd ed.; John Wiley & Sons: New York, NY, USA, 2005. [Google Scholar]
  49. Ross, S.M. Introduction to Probability Models, 11th ed.; Academic Press: Cambridge, MA, USA, 2014. [Google Scholar]
  50. Fang, K.T.; Pan, J. A review of representative points of statistical distributions and their applications. Mathematics 2023, 11, 2930. [Google Scholar] [CrossRef]
  51. Chakraborty, S. Generating discrete analogues of continuous probability distributions—A survey of methods and constructions. J. Stat. Distrib. Appl. 2015, 2, 6. [Google Scholar] [CrossRef]
  52. Kemp, A.W. A wide class of discrete distributions and the associated differential equations. Sankhyā Indian J. Stat. Ser. A 1968, 30, 401–410. [Google Scholar]
  53. Sundt, B.; Jewell, W.S. Further results on recursive evaluation of compound distributions. ASTIN Bull. J. IAA 1981, 12, 27–39. [Google Scholar] [CrossRef]
  54. Lisman, J.H.C.; Van Zuylen, M.C.A. Note on the generation of most probable frequency distributions. Stat. Neerl. 1972, 26, 19–23. [Google Scholar] [CrossRef]
  55. Inusah, S.; Kozubowski, T.J. A discrete analogue of the Laplace distribution. J. Stat. Plan. Inference 2006, 136, 1090–1102. [Google Scholar] [CrossRef]
  56. Kozubowski, T.J.; Inusah, S. A skew Laplace distribution on integers. Ann. Inst. Stat. Math. 2006, 58, 555–571. [Google Scholar] [CrossRef]
  57. Roy, D. The discrete normal distribution. Commun. Stat.-Theory Methods 2003, 32, 1871–1883. [Google Scholar] [CrossRef]
  58. Roy, D. Discrete Rayleigh distribution. IEEE Trans. Reliab. 2004, 53, 255–260. [Google Scholar] [CrossRef]
  59. Krishna, H.; Pundir, P.S. Discrete Burr and discrete Pareto distributions. Stat. Methodol. 2009, 6, 177–188. [Google Scholar] [CrossRef]
  60. Chakraborty, S.; Chakravarty, D. Discrete gamma distributions: Properties and parameter estimations. Commun. Stat.-Theory Methods 2012, 41, 3301–3324. [Google Scholar] [CrossRef]
  61. Karlis, D.; Mamode Khan, N. Models for integer data. Annu. Rev. Stat. Its Appl. 2023, 10, 297–323. [Google Scholar] [CrossRef]
  62. Janardan, K.G.; Patil, G.P. A unified approach for a class of multivariate hypergeometric models. Sankhyā Indian J. Stat. Ser. A 1972, 34, 363–376. [Google Scholar]
  63. Kocherlakota, S.; Kocherlakota, K. Bivariate Discrete Distributions; Marcel Dekker: New York, NY, USA, 1992. [Google Scholar]
  64. Karlis, D. An EM algorithm for multivariate Poisson distribution and related models. J. Appl. Stat. 2003, 30, 63–77. [Google Scholar] [CrossRef]
  65. Palmitesta, P.; Provasi, C. Computer Generation of Random Vectors from Continuous Multivariate Distributions; Dipartimento di Metodi Quantitativi, Università degli Studi di Siena: Siena, Italy, 2001. [Google Scholar]
  66. Nelsen, R.B. An Introduction to Copulas, 2nd ed.; Springer: New York, NY, USA, 2006. [Google Scholar]
  67. Balakrishna, N.; Lai, C.D. Construction of bivariate distributions. In Continuous Bivariate Distributions, 2nd ed.; Springer: New York, NY, USA, 2009; pp. 179–228. [Google Scholar]
  68. Fang, K.T.; Zhang, Y.T. Generalized Multivariate Analysis; Science Press: Beijing, China; Springer: Berlin/Heidelberg, Germany, 1990. [Google Scholar]
  69. Fang, K.T.; Wang, Y. Number-Theoretic Methods in Statistics; Chapman & Hall: London, UK, 1994. [Google Scholar]
  70. Pagès, G. Numerical Probability: An Introduction with Applications to Finance; Universitext Series; Springer: Berlin/Heidelberg, Germany, 2018. [Google Scholar]
  71. Fang, K.T.; Yuan, K.H.; Benlter, P.M. Applications of number-theoretic methods to quantizers of elliptically contoured distributions, Multivariate Analysis and Its Applications. IMS Lect. Notes-Monogr. Ser. 1994, 24, 211–225. [Google Scholar]
  72. Tarpey, T.; Flury, B. Self-consistency: A fundamental concept in statistics. Stat. Sci. 1996, 11, 229–243. [Google Scholar] [CrossRef]
  73. Graf, S.; Luschgy, H. Foundations of Quantization for Probability Distributions; Springer: Berlin/Heidelberg, Germany, 2000. [Google Scholar]
  74. Wishart, J. The generalised product moment distribution in samples from a normal multivariate population. Biometrika 1928, 20, 32–52. [Google Scholar] [CrossRef]
  75. Fang, K.T.; Fang, B.Q. Some families of multivariate symmetric distributions related to exponential distribution. J. Multivar. Anal. 1988, 24, 109–122. [Google Scholar] [CrossRef]
  76. Fang, K.T.; Fang, B.Q. A characterization of multivariate l1-norm symmetric distributions. Stat. Probab. Lett. 1989, 7, 297–299. [Google Scholar] [CrossRef]
  77. Anderson, T.W. An Introduction to Multivariate Statistical Analysis; John Wiley & Sons: New York, NY, USA, 1984. [Google Scholar]
  78. Wang, J.; Boyer, J.; Genton, M.G. A skew-symmetric representation of multivariate distributions. Stat. Sin. 2004, 14, 1259–1270. [Google Scholar]
  79. Arad, F.; Sheikhi, A. Asymmetric distributions based on the t-copula. In Proceedings of the 2022 9th Iranian Joint Congress on Fuzzy and Intelligent Systems (CFIS), Bam, Iran, 2–4 March 2022; pp. 1–3. [Google Scholar]
  80. Azzalini, A.; Valle, A.D. The multivariate skew-normal distribution. Biometrika 1996, 83, 715–726. [Google Scholar] [CrossRef]
  81. Azzalini, A.; Capitanio, A. Statistical applications of the multivariate skew-normal distribution. J. R. Stat. Soc. Ser. B Stat. Methodol. 1999, 61, 579–602. [Google Scholar] [CrossRef]
  82. Genton, M.G. (Ed.) Skew-Elliptical Distributions and Their Applications: A Journey Beyond Normality; CRC Press: Boca Raton, FL, USA, 2004. [Google Scholar]
  83. Marshall, A.W.; Olkin, I. A multivariate exponential distribution. J. Am. Stat. Assoc. 1967, 62, 30–44. [Google Scholar] [CrossRef]
  84. Durante, F.; Sempi, C. Chapter 1: Copula theory: An introduction. In Copula Theory and Its Applications, Proceedings of the Workshop Held in Warsaw, Poland, 25–26 September 2009; Lecture Notes in Statistics—Proceedings, 198; Springer: Berlin/Heidelberg, Germany, 2010; pp. 3–32. [Google Scholar]
  85. Gumbel, E.J. Multivariate logistic distributions. Bull. Int. Stat. Inst. 1961, 38, 1–6. [Google Scholar]
  86. Fang, K.T.; Fang, H.B.; von Rosen, D. A family of bivariate distributions with non-elliptical contours. Commun. Stat. Theory Methods 2000, 29, 1885–1898. [Google Scholar] [CrossRef]
  87. Clayton, U.G. A model for association in bivariate iife tables and its applications in epideniological studies of familial tendency in chronic disease incidence. Biometrika 1978, 65, 141–151. [Google Scholar] [CrossRef]
  88. Ali, M.M.; Mikhail, N.N.; Haq, M.S. A class of bivariate distributions including the bivariate logistic. J. Multivar. Anal. 1978, 8, 405–412. [Google Scholar] [CrossRef]
  89. Fang, H.B.; Fang, K.T.; Kotz, S. The meta-elliptical distributions with given marginalis. J. Multivar. Anal. 2002, 82, 1–16. [Google Scholar] [CrossRef]
  90. Kelly, K.S.; Krzysztofowicz, R. Bivariate meta-gaussian density for use in hydrology. Stoch. Hydrol. Hydraul. 1997, 11, 17–31. [Google Scholar] [CrossRef]
  91. Joe, H. Dependence Modeling with Copulas; CRC Press: London, UK, 2015. [Google Scholar]
  92. Bouyé, E.; Durrleman, V.; Nikeghbali, A.; Riboulet, G.; Roncalli, T. Copulas for Finance—A Reading Guide and Some Applications. Available online: https://ssrn.com/abstract=1032533 (accessed on 1 October 2025).
  93. Kruskal, W.H. Ordinal measures of association. J. Am. Stat. Assoc. 1958, 53, 814–861. [Google Scholar] [CrossRef]
  94. Mai, J.F.; Scherer, M. Simulating Copulas: Stochastic Models, Sampling Algorithms and Applications, 2nd ed.; World Scientific: Hackensack, NJ, USA, 2017. [Google Scholar]
  95. Zhang, L.; Singh, V.P. Copulas and Their Applications in Water Resources Engineering; Cambridge University Press: Cambridge, UK, 2019. [Google Scholar]
  96. He, P.A.; Fang, K.T.; Ye, H.J. Gaussian copulas and meta Gaussian distributions: A theoretical study. Stat. Comput. 2025; in press. [Google Scholar]
Figure 1. A guide to selecting common discrete probability distributions.
Table 1. Classification of Pearson distributions.

| Type | Condition | Distribution | PDF Example |
| --- | --- | --- | --- |
| 0 | a = 0, b = 0 | Normal distribution | f(x) ∝ exp(−(x − μ)²/(2σ²)) |
| I | b² − 4ac > 0, roots of opposite signs | Beta distribution | f(x) ∝ (1 + x)^r₁ (1 − x)^r₂, −1 ≤ x ≤ 1 |
| II | b² − 4ac = 0, λ = −b/(2a) | Symmetric beta | f(x) ∝ (1 − x²)^r, −1 ≤ x ≤ 1 |
| III | a = 0 | Gamma distribution | f(x) ∝ x^(k−1) e^(−x/θ), x > 0 |
| IV | b² − 4ac < 0 | Skewed heavy-tailed | Complex form |
| V | c = 0 | Inverse gamma | f(x) ∝ x^(−k−1) e^(−θ/x), x > 0 |
| VI | b² − 4ac > 0, roots of the same sign | Beta prime, F-distribution | f(x) ∝ x^r₁ (1 + x)^(−(r₁+r₂)) |
| VII | b = 0 | Student's t-distribution | f(x) ∝ (1 + x²/ν)^(−(ν+1)/2) |
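The conditions in Table 1 translate directly into a small classifier. The sketch below is illustrative, not from the paper: it assumes the denominator quadratic of the Pearson differential equation f′(x)/f(x) = −(x − m)/(a x² + b x + c) is written with the table's coefficients (sign conventions vary across texts), and it separates Types I and VI, which share b² − 4ac > 0, by whether the two real roots have opposite or equal signs, following the usual Pearson convention.

```python
def pearson_type(a, b, c):
    """Classify a Pearson family member from the quadratic q(x) = a*x**2 + b*x + c
    in the denominator of f'(x)/f(x) = -(x - m)/q(x), per Table 1."""
    if a == 0 and b == 0:
        return "0"      # constant denominator: normal distribution
    if a == 0:
        return "III"    # linear denominator: gamma
    if c == 0:
        return "V"      # inverse gamma
    if b == 0:
        return "VII"    # symmetric heavy tails: Student's t
    disc = b * b - 4 * a * c
    if disc == 0:
        return "II"     # repeated root: symmetric beta
    if disc < 0:
        return "IV"     # complex roots: skewed heavy-tailed
    # disc > 0: real roots; opposite signs (c/a < 0) give a beta on a bounded
    # interval (Type I), equal signs give the beta prime / F family (Type VI)
    return "I" if c / a < 0 else "VI"
```

The check order matters: the degenerate conditions (a = 0, c = 0, b = 0) are tested before the discriminant, matching the order in which the table's rows specialize the quadratic.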
Table 2. Distributions of the order statistics ¹.

| Distribution | Density f(x) |
| --- | --- |
| General k-th order statistic X_(k) | [n!/((k − 1)!(n − k)!)] [F(x)]^(k−1) [1 − F(x)]^(n−k) f(x) |
| Minimum X_(1) | n [1 − F(x)]^(n−1) f(x) |
| Maximum X_(n) | n [F(x)]^(n−1) f(x) |

¹ Note: the pdf of the median exists in closed form only when n is odd, n = 2m + 1, in which case p_median(x) = [n!/(m!)²] [F(x)]^m [1 − F(x)]^m f(x). For even n, the median is defined as (X_(n/2) + X_(n/2+1))/2, and its distribution is derived from the joint distribution of these two order statistics.
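The minimum and maximum rows of Table 2 are easy to check by simulation. The sketch below (function name and sample sizes are illustrative) draws U(0, 1) samples, for which the maximum has density n x^(n−1) and hence mean n/(n + 1), while the minimum has mean 1/(n + 1) by symmetry.

```python
import random

def empirical_minmax_means(n, trials, seed=0):
    """Monte Carlo estimates of E[X_(1)] and E[X_(n)] for samples of size n
    from U(0,1). Per Table 2, X_(n) has density n*F(x)^(n-1)*f(x) = n*x^(n-1),
    so E[X_(n)] = n/(n+1); likewise E[X_(1)] = 1/(n+1)."""
    rng = random.Random(seed)
    sum_min = sum_max = 0.0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        sum_min += min(xs)
        sum_max += max(xs)
    return sum_min / trials, sum_max / trials
```

For n = 5 the theoretical means are 1/6 and 5/6, and a run with a few hundred thousand trials lands within a few thousandths of both.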
Table 6. Famous discrete distributions.

| Distribution | Support | Probability Mass Function |
| --- | --- | --- |
| Bernoulli(p) | 0, 1 | p_0 = 1 − p, p_1 = p |
| Binomial(n, p) | 0, 1, …, n | p_i = C(n, i) p^i (1 − p)^(n−i), i = 0, 1, …, n |
| Geometric(p) | 1, 2, … | p_i = p(1 − p)^(i−1), i = 1, 2, … (number of trials until the first success) |
| Negative Binomial(r, p) | r, r + 1, … | p_i = C(i − 1, r − 1) p^r (1 − p)^(i−r), i = r, r + 1, … (number of trials until the r-th success) |
| Negative Binomial(r, p), alternative form | 0, 1, 2, … | p_i = C(i + r − 1, i) p^r (1 − p)^i, i = 0, 1, 2, … (number of failures before the r-th success) |
| Poisson(λ) | 0, 1, 2, … | p_i = e^(−λ) λ^i / i!, i = 0, 1, 2, … |
| Hypergeometric(n, N, M) | max(0, n − (N − M)), …, min(M, n) | p_i = C(M, i) C(N − M, n − i) / C(N, n) |
Note on alternative derivations and parameterizations: The distributions listed above can often be derived through multiple probabilistic mechanisms, which leads to different (but equivalent) parameterizations and interpretations. These alternative derivations highlight the interconnectedness of discrete probability distributions and provide flexibility in choosing the most appropriate model for a given data-generating process.
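As an illustration of the note above, the two negative binomial rows of Table 6 can be coded side by side; the sketch below (helper names are ours) verifies that the trial-counting pmf at i equals the failure-counting pmf at k = i − r, which follows from the identity C(i − 1, r − 1) = C(i − 1, i − r).

```python
from math import comb

def nb_pmf_trials(i, r, p):
    """Table 6, first parameterization: P(exactly i trials to reach the
    r-th success), supported on i = r, r + 1, ..."""
    return comb(i - 1, r - 1) * p**r * (1 - p) ** (i - r)

def nb_pmf_failures(k, r, p):
    """Table 6, second parameterization: P(exactly k failures before the
    r-th success), supported on k = 0, 1, 2, ..."""
    return comb(k + r - 1, k) * p**r * (1 - p) ** k
```

Both functions describe the same random experiment counted differently, so their pmfs match after shifting the support by r, and each sums to 1 over its support.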
Table 7. Some subclasses of p-dimensional spherical distributions.

| Type | Density Function g(x′x) or c.f. ψ(t′t) |
| --- | --- |
| Multinormal | g(x′x) = c exp(−x′x/2) |
| Kotz-type | g(x′x) = c (x′x)^(N−1) exp[−r(x′x)^s], r, s > 0, 2N + p > 2 |
| Pearson Type VII | g(x′x) = c (1 + x′x/s)^(−N), N > p/2, s > 0 |
| Multivariate t | g(x′x) = c (1 + x′x/ν)^(−(p+ν)/2), ν > 0 |
| Multivariate Cauchy | g(x′x) = c (1 + x′x)^(−(p+1)/2) |
| Pearson Type II | g(x′x) = c (1 − x′x)^q, q > 0, for x′x < 1 |
| Logistic | g(x′x) = c exp(−x′x)/[1 + exp(−x′x)]² |
| Scale mixture | g(x′x) = c ∫₀^∞ t^(−p/2) exp(−x′x/(2t)) dG(t), with G(t) a cdf |
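Every spherical distribution in Table 7 admits the stochastic representation x =d R·u, with u uniform on the unit sphere in R^p and R an independent nonnegative radius, which yields a generic sampler. The sketch below (function names are ours) builds u by normalizing i.i.d. Gaussians; taking R to be the norm of an independent standard normal vector (a χ_p radius) recovers the multinormal row of the table.

```python
import math
import random

def uniform_on_sphere(p, rng):
    """Uniform direction on the unit sphere S^(p-1): normalize i.i.d. N(0,1)s."""
    z = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z]

def spherical_sample(p, radial, rng):
    """Stochastic representation x = R * u of a spherical law, where
    radial(rng) returns a draw of the nonnegative radius R."""
    r = radial(rng)
    return [r * v for v in uniform_on_sphere(p, rng)]

def chi_radius(p):
    """Radius R = ||z|| with z ~ N_p(0, I); with a uniform direction this
    reproduces the multinormal entry of Table 7."""
    return lambda rng: math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(p)))
```

Swapping in a different radial law changes only the first factor: for the multivariate t one would draw the χ_p radius divided by an independent scaled chi variable, and for a scale mixture one would first draw t from G and then a Gaussian radius scaled by √t.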
Fang, K.-T.; Lin, Y.-X.; Deng, Y.-H. A Review: Construction of Statistical Distributions. Entropy 2025, 27, 1188. https://doi.org/10.3390/e27121188