A Short Note on Generating a Random Sample from Finite Mixture Distributions

Abstract: Computational statistics is a critical skill for professionals in fields such as data science, statistics, and related disciplines. One essential aspect of computational statistics is the ability to simulate random variables from specified probability distributions. Commonly employed techniques for sampling random variables include the inverse transform method, acceptance-rejection method, and Box-Muller transformation, all of which rely on sampling from the uniform (0, 1) distribution. A significant concept in statistics is the finite mixture model, characterized by a convex combination of multiple probability density functions. In this paper, we introduce a modified version of the composition method, a standard approach for sampling finite mixture models. Our modification offers the advantage of relying on sampling from the uniform (0, 1) distribution, aligning with prevalent methods in computational statistics. This alignment simplifies the teaching of computational statistics courses and offers other benefits besides. We provide several examples to illustrate the approach.


Introduction
Computational statistics has gained significant importance in recent years due to the exponential growth of data and the increasing complexity of data-driven problems. Within computational statistics, the ability to simulate or generate random samples from a probability distribution is fundamental. These generated samples are used to estimate probabilities and expectations and to test hypotheses. The inverse transform method and the acceptance-rejection method are two of the most fundamental techniques for generating random samples, and they can be found in well-known computational statistics textbooks such as Statistical Computing with R by [1]. These methods rely on generating numbers from the uniform (0, 1) distribution. The choice of method depends on the specific distribution being generated and the desired properties of the generated sample, such as efficiency or accuracy.
In certain cases, the data may not conform to commonly known distributions such as the normal or exponential distributions. Instead, they can be represented by a finite mixture model, which combines multiple probability density functions in a convex manner. These models find applications in various scientific domains. For instance, normal mixture distributions are used as parametric density estimators [2], whereas finite mixture models are employed in medical studies [3] and financial analyses [4]. Finite mixture models have also been used by [5] in the analysis of wind speeds, and Ref. [6] has demonstrated their usefulness in Bayesian density estimation. Furthermore, Ref. [7] provides a comprehensive overview of the different applications of mixture models.
Sampling from finite mixture models is a standard topic covered in many computational statistics textbooks, including works by [1,8], among others. In these texts, the primary approach for sampling from finite mixture models is typically the composition method. However, although the composition method is effective, it does not directly use the uniform distribution.
The goal of this paper is to modify the standard composition algorithm by incorporating sampling from the uniform (0, 1) distribution to ensure consistency with primary sampling algorithms such as the inverse transform method and the acceptance-rejection method. This aspect could prove beneficial in teaching computational statistics courses, as sampling from the uniform (0, 1) distribution becomes a standard step in various sampling algorithms.
The remainder of this paper is organized as follows. Section 2 provides relevant background on finite mixture models and discusses the proposed modification. Section 3 presents several examples demonstrating the effectiveness of the proposed method. Finally, Section 4 offers concluding remarks.

Finite Mixture Models and Simulation Theorem
In this section, we define a finite mixture model and introduce a theorem for sampling this model via an adaptation of the composition method. The proof of this theorem is also included.
A finite mixture model is a statistical model that represents a probability distribution as a mixture of several component distributions. Mathematically, given k component densities f_1(x), . . ., f_k(x), each with associated mixing probabilities (also known as mixing weights) π_1, . . ., π_k, a finite mixture model f(x) is defined as

f(x) = ∑_{i=1}^{k} π_i f_i(x),    (1)

where 0 ≤ π_i ≤ 1 and ∑_{i=1}^{k} π_i = 1. Further insights into Equation (1) can be found in studies by [9,10].
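As a concrete illustration of Equation (1), the sketch below evaluates the density of a two-component normal mixture in Python. The weights and component parameters are illustrative assumptions only, not taken from the paper (whose own code, in R, is in the Supplementary Materials).

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, weights, components):
    """Equation (1): f(x) = sum_i pi_i * f_i(x)."""
    return sum(w * f(x) for w, f in zip(weights, components))

# Hypothetical two-component mixture: 0.3*N(-1, 1) + 0.7*N(2, 0.25)
weights = [0.3, 0.7]
components = [lambda x: normal_pdf(x, -1.0, 1.0),
              lambda x: normal_pdf(x, 2.0, 0.5)]

# Sanity check: a Riemann sum of the mixture density over [-10, 10] is close to 1,
# as the convexity constraint on the pi_i guarantees.
dx = 0.001
total = sum(mixture_pdf(j * dx, weights, components) * dx for j in range(-10000, 10001))
```

Because the weights are non-negative and sum to one, f is itself a valid density; the Riemann-sum check above makes that concrete.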
In the literature, simulating a variable from a finite k-mixture distribution is typically carried out by the composition method [1,11]:

1. Generate an integer I ∈ {1, . . ., k} such that P(I = i) = π_i, i = 1, . . ., k;
2. Deliver X with cumulative distribution function F_I.
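The two steps of the composition method can be sketched as follows. This is a Python illustration with hypothetical components (the paper's supplementary code is in R).

```python
import random

def sample_mixture_composition(weights, samplers, rng):
    """Composition method: draw the component index I with P(I = i) = pi_i,
    then deliver a draw from component I."""
    i = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return samplers[i]()

# Hypothetical mixture: 0.5*N(0, 1) + 0.5*Exponential(rate 1)
rng = random.Random(42)
samplers = [lambda: rng.gauss(0.0, 1.0), lambda: rng.expovariate(1.0)]
xs = [sample_mixture_composition([0.5, 0.5], samplers, rng) for _ in range(100_000)]
mean = sum(xs) / len(xs)  # should be near 0.5*0 + 0.5*1 = 0.5
```

Note that drawing the index I here delegates to the library's weighted sampler rather than to an explicit uniform (0, 1) draw; the modification below removes that indirection.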
The following theorem introduces an algorithm for generating a sample from (1). This theorem presents a modified version of the composition method, utilizing the uniform distribution. Aligning with well-established algorithms such as the inverse transform and acceptance-rejection method enhances accessibility for learners.
Theorem 1. Consider F(x) as defined in (1). The following algorithm generates a random variate from X with the cumulative distribution function F(x):

1. Generate a random u from the uniform (0, 1) distribution;
2. If ∑_{i=1}^{l−1} π_i < u ≤ ∑_{i=1}^{l} π_i, generate a random x from F_l(x), where l = 1, . . ., k, with the convention that ∑_{i=1}^{0} π_i = 0.

Proof. We show that the generated sample has the same distribution as X. By the law of total probability, we have

P(X ≤ x) = ∑_{l=1}^{k} P(X ≤ x | ∑_{i=1}^{l−1} π_i < u ≤ ∑_{i=1}^{l} π_i) · P(∑_{i=1}^{l−1} π_i < u ≤ ∑_{i=1}^{l} π_i) = ∑_{l=1}^{k} F_l(x) π_l = F(x).

The proof of Theorem 1 reveals that the approach is quite general, encompassing not only mixtures of continuous distributions but also other scenarios. These include mixtures involving continuous and discrete distributions, as well as mixtures comprising only discrete distributions. Additionally, the framework can be extended to sample mixtures of multivariate distributions. In the following section, we explore specific examples that illustrate these various cases.
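A minimal Python sketch of the algorithm in Theorem 1 follows, applied to a hypothetical three-component normal mixture (the weights and parameters are illustrative, not taken from the paper; the paper's own examples are coded in R).

```python
import random
from itertools import accumulate

def sample_mixture_uniform(weights, samplers, rng):
    """Theorem 1: draw u ~ uniform(0, 1) and deliver a draw from component l
    when sum_{i=1}^{l-1} pi_i < u <= sum_{i=1}^{l} pi_i."""
    u = rng.random()
    for l, cum in enumerate(accumulate(weights)):
        if u <= cum:
            return samplers[l]()
    return samplers[-1]()  # guard against floating-point round-off in the cumsum

# Hypothetical three-normal mixture in the spirit of Example 1.
rng = random.Random(1)
weights = [0.2, 0.5, 0.3]
samplers = [lambda: rng.gauss(-3.0, 1.0),
            lambda: rng.gauss(0.0, 1.0),
            lambda: rng.gauss(4.0, 1.0)]
xs = [sample_mixture_uniform(weights, samplers, rng) for _ in range(200_000)]
mean = sum(xs) / len(xs)  # population mean: 0.2*(-3) + 0.5*0 + 0.3*4 = 0.6
```

The only source of randomness in the component-selection step is the single uniform (0, 1) draw u, matching the structure of the inverse transform method.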

Examples
In this section, we demonstrate the proposed algorithm outlined in Theorem 1 with six illustrative examples. The R code is provided in the Supplementary Materials.
Example 1. Mixture of three normal distributions.

Using Theorem 1, we generated a sample of size 10^6 from F(x). Figure 1 shows the histogram of the generated sample with the true density superimposed. It is evident from Figure 1 that the proposed method performs exceptionally well in this example.
Example 2. Mixture of five gamma distributions: different shapes with same scale parameters [1].
Let F(x) = ∑_{i=1}^{5} π_i F_i(x), where X_i ∼ gamma(r = 3, β_i = i) are independent and the mixing probabilities are π_i = i/15, i = 1, . . ., 5. Using Theorem 1, we generated a sample of size 10^6 from F(x). Figure 2 displays the histogram of the generated sample with the true density superimposed. The proposed procedure also performs well in this example.

Example 3. Mixture of five gamma distributions: different scale with same shape parameters.

Let F(x) be as described in Example 2, but with X_i ∼ gamma(r_i = i, β_i = 3). Employing Theorem 1, we generated a sample of size 10^6 from F(x). Figure 3 presents the histogram of the generated sample with the true density superimposed. The proposed procedure demonstrates effective performance in this example as well.
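The gamma mixtures of Examples 2 and 3 share the same structure; the sketch below follows Example 2's setup in Python, under the assumption that r plays the role of the gamma shape parameter and β_i the scale (the paper summarizes its parameterization, so this role assignment is an assumption of the sketch).

```python
import random

rng = random.Random(7)
weights = [i / 15 for i in range(1, 6)]  # pi_i = i/15, which sum to 1

def sample_example2():
    """Theorem 1 applied to the five-component gamma mixture."""
    u = rng.random()
    cum = 0.0
    for i in range(1, 6):
        cum += weights[i - 1]
        if u <= cum:
            return rng.gammavariate(3.0, float(i))  # shape r = 3, scale beta_i = i
    return rng.gammavariate(3.0, 5.0)  # floating-point guard

xs = [sample_example2() for _ in range(100_000)]
mean = sum(xs) / len(xs)  # under these roles, E[X] = sum_i (i/15)*(3i) = 11
```

Swapping the shape/scale arguments in `gammavariate` reproduces the Example 3 variant.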
Example 4. Mixture of normal distributions.

In all three cases, we let π_1 = 9/20, π_2 = 9/20, and π_3 = 1/10. As a measure of proximity, we use the Cramér-von Mises distance, defined as

D = ∫ (F_n(x) − F(x))^2 dF(x),

where F_n is the empirical distribution function. We examine various sample sizes n ∈ {20, 50, 100, 1000}. For each generated sample X_1, . . ., X_n, we estimate D using

D̂ = 1/(12n^2) + (1/n) ∑_{i=1}^{n} ((2i − 1)/(2n) − F(X_(i)))^2,

where X_(1) ≤ . . . ≤ X_(n) are the order statistics. For each sample size, we compute 10^4 values of D̂ and report the mean D̄ and standard deviation sd(D̂) of those 10^4 values. Additionally, for comparison, we include results obtained using samples generated by the composition method described in Section 2. The results are reported in Table 1. It is clear that both simulation algorithms work well, as both D̄ and sd(D̂) approach zero, especially as the sample size increases.

Example 5. Mixture of four binomial distributions [12].

Conclusions
This paper introduces a modified version of the composition method for sampling finite mixture distributions. By incorporating sampling from the uniform (0, 1) distribution, our modification aligns with prevalent methods in computational statistics, such as the inverse transform and acceptance-rejection methods. This modification not only enhances the consistency of sampling procedures but also simplifies the teaching of computational statistics courses, where sampling from the uniform (0, 1) distribution is a common step in various algorithms.

The effectiveness of the proposed modification is demonstrated through several illustrative examples, showcasing its robust performance across different scenarios. From mixtures of normal and gamma distributions to binomial and Poisson mixtures, the proposed algorithm consistently generates samples that closely match the theoretical distributions. Moreover, comparison metrics such as the Cramér-von Mises distance provide quantitative evidence of the algorithm's accuracy, especially as sample sizes increase.

Overall, the modified composition method presented in this paper offers a valuable addition to the toolkit of computational statisticians and educators alike. Its simplicity, consistency, and performance make it a practical choice for sampling finite mixture distributions in various applications.

Figure 1. Mixture of three normal distributions in Example 1.

Figure 2. Mixture of five gamma distributions with different shapes and same scale parameters in Example 2.

Figure 3. Mixture of five gamma distributions with different scale parameters and same shape parameters in Example 3.

Example 6. Mixture of normal and Poisson distributions.
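As the remark after Theorem 1 notes, the algorithm also covers mixtures of continuous and discrete components. A Python sketch in the spirit of Example 6 follows; the mixing weight and component parameters are hypothetical, and since the Python standard library has no Poisson sampler, Knuth's multiplication algorithm is used.

```python
import math
import random

def poisson_sample(lam, rng):
    """Poisson(lam) draw via Knuth's multiplication algorithm."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def sample_normal_poisson(pi1, mu, sigma, lam, rng):
    """Theorem 1 with k = 2: a single uniform (0, 1) draw selects the
    N(mu, sigma^2) component with probability pi1, else Poisson(lam)."""
    u = rng.random()
    return rng.gauss(mu, sigma) if u <= pi1 else poisson_sample(lam, rng)

# Hypothetical parameters: 0.4*N(0, 1) + 0.6*Poisson(2)
rng = random.Random(3)
xs = [sample_normal_poisson(0.4, 0.0, 1.0, 2.0, rng) for _ in range(100_000)]
mean = sum(xs) / len(xs)  # population mean: 0.4*0 + 0.6*2 = 1.2
```

The same two-branch structure works for any continuous-discrete pair, since the proof of Theorem 1 never uses continuity of the components.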

Supplementary Materials:
The following supporting information can be downloaded at: www.mdpi.com/xxx/s1. R Code: A Short Note on Generating a Random Sample from Finite Mixture Distributions.

Author Contributions: Methodology, L.A.-L.; Software, A.L.; Writing-original draft, A.L. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Table 1. Comparison of proposed and composition methods.
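The Cramér-von Mises comparison summarized in Table 1 can be sketched in Python as follows. The computing formula used here is the standard one-sample Cramér-von Mises statistic; since the paper's estimator expression is only summarized in the text, treating it as this standard form is an assumption of the sketch.

```python
import random

def cramer_von_mises(sample, cdf):
    """One-sample Cramer-von Mises statistic via the standard computing
    formula (an assumption of this sketch):
    D = 1/(12 n^2) + (1/n) * sum_i ((2i - 1)/(2n) - F(x_(i)))^2."""
    xs = sorted(sample)
    n = len(xs)
    s = sum(((2 * i - 1) / (2 * n) - cdf(x)) ** 2 for i, x in enumerate(xs, start=1))
    return 1.0 / (12 * n ** 2) + s / n

# If the sample really comes from F, D shrinks toward zero as n grows,
# mirroring the behavior reported in Table 1.
rng = random.Random(11)
d = cramer_von_mises([rng.random() for _ in range(1000)], lambda x: x)
```

Averaging `d` over many replicated samples gives the mean and standard deviation reported per sample size.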