Article

A Priori Sample Size Determination for Estimating a Location Parameter Under a Unified Skew-Normal Distribution

1 Department of Mathematical and Statistical Sciences, University of Nebraska Omaha, Omaha, NE 68182, USA
2 College of Big Data and Internet, Shenzhen Technology University, Shenzhen 518118, China
3 Department of Applied Mathematics, Xi’an University of Technology, Xi’an 710054, China
* Author to whom correspondence should be addressed.
Symmetry 2025, 17(8), 1228; https://doi.org/10.3390/sym17081228
Submission received: 13 June 2025 / Revised: 12 July 2025 / Accepted: 24 July 2025 / Published: 4 August 2025

Abstract

The a priori procedure (APP) is concerned with determining appropriate sample sizes to ensure that sample statistics to be obtained are likely to be good estimators of corresponding population parameters. Previous researchers have shown how to compute a priori confidence interval means or locations for normal and skew-normal distributions. However, two critical limitations persist in the literature: (1) While numerous skewed models have been proposed, the APP equations for location parameters have only been formally established for the basic skew-normal distributions. (2) Even within this fundamental framework, the APPs for sample size determinations in estimating locations are constructed on samples of specifically dependent observations having multivariate skew-normal distributions jointly. Our work addresses these limitations by extending a priori reasoning to the more comprehensive unified skew-normal (SUN) distribution. The SUN family not only encompasses multiple existing skew-normal models as special cases but also enables broader practical applications through its capacity to model mixed skewness patterns and diverse tail behaviors. In this paper, we establish APP equations for determining the required sample sizes and set up confidence intervals for the location parameter in the one-sample case, as well as for the difference in locations in matched pairs and two independent samples, assuming independent observations from the SUN family. This extension addresses a critical gap in the literature and offers a valuable contribution to the field. Simulation studies support the equations presented, and two applications involve real data sets for illustrations of our main results.

1. Introduction

Given the limitations and periodicity of data collection in certain fields, the study of a priori confidence intervals, which do not rely on data and were first introduced by Trafimow [1], has become increasingly important. The idea is that researchers seek a sample size n such that the sample statistic falls within a specified distance of the population parameter with a specified probability. In practical applications, the calculation of a priori confidence intervals can assist researchers in planning sample sizes before data collection, thereby ensuring the reliability and efficiency of their research. By combining a priori knowledge and theoretical assumptions, researchers can make reasonable inferences about possible statistical outcomes even without the support of actual data. This is particularly significant for fields where data collection is difficult or costly, such as research on rare diseases, personalized medicine, extreme weather and climate change, and human behaviors. Although researchers have long been interested in parameter estimation, a recent advance, termed the a priori procedure (APP), focuses specifically on the sample size required to meet specified criteria for precision and confidence. It was introduced by Trafimow [1] and expanded and explained in detail by Trafimow and MacDonald [2], based on the assumption that researchers have an important stake in obtaining sample statistics that are good estimates of corresponding population parameters. The goal of the procedure is to obtain sample statistics that are close to the corresponding population parameters, and to be confident in the accuracy of the estimation. The procedure commences with the researcher specifying, prior to data collection, how close they wish the sample statistic to be to the population parameter, and how confident they wish to be of being that close. The researcher then uses an appropriate equation to obtain the sample size needed to meet the specifications.
With the continuous development of statistics and the demands of practical applications, researchers have recognized the limitations of normality assumptions for real-world data and extended the APP framework to skewed distributions. Trafimow et al. [3] provided the equations needed to determine the required sample size for estimating the population location when observations jointly follow the multivariate skew-normal (MSN) distribution introduced by Azzalini and Dalla Valle [4], which includes the normal distribution as a special case. Wang et al. [5] extended this framework to compare location parameters between two independent skew-normal populations, deriving a closed-form expression for the minimum sample size required to achieve a specified level of confidence and precision. Wang et al. [6] further generalized the approach to matched samples, deriving the required sample size for estimating the difference in locations based on paired data that jointly follow a multivariate skew-normal distribution. In the multivariate case, Ma et al. [7] applied the APP to estimate the location vector, assuming a joint MSN distribution among observations. Beyond location estimation, Tong et al. [8] introduced an APP framework for estimating regression coefficients in linear models, enabling researchers to determine the sample size necessary to achieve specified levels of precision and confidence for regression analysis under normality assumptions. Most recently, Cao et al. [9] investigated APP-based estimation of population proportions using skew-normal approximations and the Beta–Bernoulli process. Together, these studies illustrate the growing body of research applying APPs in skew-normal contexts across a range of statistical problems.
However, current APP developments have been limited to the basic skew-normal family, even though skewed distributions have been studied intensively and numerous extensions exist. Moreover, existing APPs for estimating location parameters or their differences typically assume that observations, while identically distributed, exhibit a specific and limited form of dependence in their joint distribution. Therefore, it is imperative to establish an APP for samples from a much larger skewed family that includes multiple existing skew-normal models as special cases.
To address this gap, we extend a priori thinking to the unified skew-normal (SUN) distribution, which has properties not captured by the basic skew-normal distributions. Since it was proposed by Arellano-Valle and Azzalini [10], the SUN distribution has attracted the attention of many scholars. For example, Gupta et al. [11] showed that the joint distribution of independent SUN random vectors is again a SUN-distributed random vector. Amiri et al. [12] discussed the truncated version of the SUN distributions and determined some measures in reliability theory. Durante [13] studied conjugate Bayes for probit regression via the SUN distributions. Minozzo and Bagnato [14] proposed a latent spatial factor model in which all finite-dimensional marginal distributions are multivariate SUN distributions. Recently, Arellano-Valle and Azzalini [15] summarized and discussed some other useful properties of the SUN distribution. More recent advances in Bayesian inference and computation related to the SUN distributions can be found in Anceschi et al. [16].
In this paper, we consider the APP for estimating the location parameter for one sample and the difference in locations for two samples under the SUN distributions. The rest of this article is organized as follows. Section 2 studies some properties of the SUN distribution and the APPs for estimating the location parameter in one SUN population and the difference in locations from two independent samples and from matched samples. Section 3 presents simulation studies that investigate and compare the performance of the proposed methods. Section 4 analyzes two real data sets to illustrate the usefulness of the proposed method. Section 5 presents some conclusions.

2. Properties of the SUN Distribution

In this section, we introduce properties of univariate and multivariate SUN distributions that will be useful when showing our main results. The moment generating functions (MGFs), means, and variances will also be obtained.
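Lemma 1.
Consider $U_0 \in \mathbb{R}$, $U_1 \in \mathbb{R}^n$ and
$$\begin{pmatrix} U_0 \\ U_1 \end{pmatrix} \sim N_{n+1}(\mathbf{0}, \Omega),$$
where $\Omega = \begin{pmatrix} \sigma^2 & \Gamma^\top \\ \Gamma & \Sigma \end{pmatrix}$, with $\Gamma \in \mathbb{R}^n$, $\sigma \in \mathbb{R}^+$ and $\Sigma \in M_{n \times n}$, is a covariance matrix; then, $Z = (U_1 \mid U_0 + \gamma > 0)$ follows the n-dimensional SUN distribution with the probability density function (pdf)
$$f_Z(z) = \phi_n(z; \Sigma)\, \frac{\Phi\left(\gamma + \Gamma^\top \Sigma^{-1} z;\ \sigma^2 - \Gamma^\top \Sigma^{-1} \Gamma\right)}{\Phi(\gamma; \sigma^2)},$$
and it is denoted by
$$Z \sim SUN_{n,1}(\mathbf{0}, \gamma, I_n, \Omega),$$
where $\gamma \in \mathbb{R}$, $\phi_n(\cdot\,; \xi, \Sigma)$ is the pdf of an n-dimensional normal random vector with mean vector $\xi$ and covariance matrix $\Sigma$, and $\Phi_n(\cdot\,; \Sigma)$ is the cumulative distribution function (cdf) of the n-dimensional normal random vector with mean $\mathbf{0}$ and covariance matrix $\Sigma$.
In general, let $X = \xi + W Z$, where $W \in M_{n \times n}$ is of full rank and $\xi \in \mathbb{R}^n$; thus, we have $X \sim SUN_{n,1}(\xi, \gamma, W, \Omega)$, and the pdf of $X$ is
$$f_X(x) = \phi_n(x; \xi, W \Sigma W^\top)\, \frac{\Phi\left(\gamma + \Gamma^\top \Sigma^{-1} W^{-1}(x - \xi);\ \sigma^2 - \Gamma^\top \Sigma^{-1} \Gamma\right)}{\Phi(\gamma; \sigma^2)}.$$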
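Lemma 2.
Let $X \sim SUN_{n,1}(\xi, \gamma, W, \Omega)$. Then, the MGF of $X$ is
$$M_X(t) = \exp\left(t^\top \xi + \frac{1}{2} t^\top W \Sigma W^\top t\right) \frac{\Phi\left(\gamma + \Gamma^\top W^\top t;\ \sigma^2\right)}{\Phi(\gamma; \sigma^2)}.$$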
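The source states Lemma 2 without proof. The following is a sketch of one way to obtain it from the conditioning representation in Lemma 1, assuming only standard properties of the multivariate normal distribution (the exponential-tilting identity in the second line is that assumption); it is not taken from the paper.

```latex
\begin{align*}
M_Z(t) &= E\left[e^{t^\top U_1} \,\middle|\, U_0 + \gamma > 0\right]
        = \frac{E\left[e^{t^\top U_1}\, \mathbf{1}\{U_0 + \gamma > 0\}\right]}{P(U_0 + \gamma > 0)} \\
       &= \frac{e^{\frac{1}{2} t^\top \Sigma t}\; P\left(U_0 + \Gamma^\top t + \gamma > 0\right)}{\Phi(\gamma; \sigma^2)}
        \qquad \text{since } \operatorname{Cov}(U_0, U_1) = \Gamma^\top \\
       &= e^{\frac{1}{2} t^\top \Sigma t}\, \frac{\Phi\left(\gamma + \Gamma^\top t;\ \sigma^2\right)}{\Phi(\gamma; \sigma^2)}, \\[4pt]
M_X(t) &= E\left[e^{t^\top (\xi + W Z)}\right] = e^{t^\top \xi}\, M_Z(W^\top t)
        = \exp\left(t^\top \xi + \tfrac{1}{2} t^\top W \Sigma W^\top t\right)
          \frac{\Phi\left(\gamma + \Gamma^\top W^\top t;\ \sigma^2\right)}{\Phi(\gamma; \sigma^2)}.
\end{align*}
```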
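Lemma 3.
Consider $U_0 \in \mathbb{R}^n$, $U_1 \in \mathbb{R}$ and
$$\begin{pmatrix} U_0 \\ U_1 \end{pmatrix} \sim N_{n+1}(\mathbf{0}, \tilde{\Omega}),$$
where the covariance matrix $\tilde{\Omega} = \begin{pmatrix} \Sigma & \Gamma \\ \Gamma^\top & \sigma^2 \end{pmatrix}$ is positive definite, and $\Sigma$, $\Gamma$ and $\sigma$ are the same as defined in Lemma 1; then, $Z = (U_1 \mid U_0 + \gamma > 0)$, where $\gamma \in \mathbb{R}^n$, follows the SUN distribution such that the pdf is
$$f_Z(z) = \phi(z; 0, \sigma^2)\, \frac{\Phi_n\left(\gamma + \Gamma \sigma^{-2} z;\ \Sigma - \sigma^{-2} \Gamma \Gamma^\top\right)}{\Phi_n(\gamma; \Sigma)},$$
and it is denoted by
$$Z \sim SUN_{1,n}(0, \gamma, 1, \tilde{\Omega}).$$
Similarly, let $X = \xi + \omega Z$, where $\xi \in \mathbb{R}$ and $\omega \in \mathbb{R}^+$; then, $X \sim SUN_{1,n}(\xi, \gamma, \omega, \tilde{\Omega})$ and
$$f_X(x) = \phi(x; \xi, \omega^2 \sigma^2)\, \frac{\Phi_n\left(\gamma + \Gamma \sigma^{-2} \omega^{-1}(x - \xi);\ \Sigma - \sigma^{-2} \Gamma \Gamma^\top\right)}{\Phi_n(\gamma; \Sigma)}.$$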
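Proposition 1.
If $X \sim SUN_{1,n}(\xi, \gamma, \omega, \tilde{\Omega})$ is given in Lemma 3, then the MGF of $X$ is
$$M_X(t) = \exp\left(\xi t + \frac{1}{2} \sigma^2 \omega^2 t^2\right) \frac{\Phi_n\left(\gamma + \Gamma \omega t;\ \Sigma\right)}{\Phi_n(\gamma; \Sigma)}.$$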
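Corollary 1.
Let $Y \sim SUN_{1,1}(\xi, \gamma, \omega, \Omega)$, where $\Omega = \begin{pmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix}$. Then, we have the following results:
(1)
The MGF of $Y$ is
$$M_Y(t) = \exp\left(\xi t + \frac{1}{2} t^2 \omega^2 \sigma_2^2\right) \frac{\Phi\left(\gamma + \rho \sigma_1 \sigma_2 \omega t;\ \sigma_1^2\right)}{\Phi(\gamma; \sigma_1^2)}. \tag{5}$$
(2)
The mean of $Y$ is
$$E(Y) = \xi + \rho \sigma_1 \sigma_2 \omega c_\gamma,$$
and the variance of $Y$ is
$$Var(Y) = \omega^2 \sigma_2^2 - \rho^2 \sigma_1^2 \sigma_2^2 \omega^2 \left(\gamma \sigma_1^{-2} c_\gamma + c_\gamma^2\right),$$
where $c_\gamma = \dfrac{\phi(\gamma; \sigma_1^2)}{\Phi(\gamma; \sigma_1^2)}$.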
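As a numerical check on Corollary 1, the sketch below evaluates the stated mean and variance and compares them with draws generated from the conditioning representation of Lemma 3 with n = 1. The function names (`sun11_moments`, `sample_sun11`) are illustrative and do not come from the paper, and the sampler is a plain rejection scheme chosen for simplicity rather than efficiency.

```python
import math
import random


def normal_pdf(x, var):
    """Density at x of a N(0, var) variable."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def normal_cdf(x, var):
    """CDF at x of a N(0, var) variable."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0 * var)))


def sun11_moments(xi, gamma, omega, sigma1, sigma2, rho):
    """Mean and variance of SUN_{1,1}(xi, gamma, omega, Omega) as in Corollary 1."""
    c_gamma = normal_pdf(gamma, sigma1 ** 2) / normal_cdf(gamma, sigma1 ** 2)
    mean = xi + rho * sigma1 * sigma2 * omega * c_gamma
    var = (omega ** 2) * (sigma2 ** 2) - (rho ** 2) * (sigma1 ** 2) * (sigma2 ** 2) * (
        omega ** 2
    ) * (gamma * c_gamma / sigma1 ** 2 + c_gamma ** 2)
    return mean, var


def sample_sun11(xi, gamma, omega, sigma1, sigma2, rho, rng):
    """One draw of Y = xi + omega * (U1 | U0 + gamma > 0)."""
    # Rejection step: keep the latent U0 ~ N(0, sigma1^2) only if U0 + gamma > 0.
    u0 = rng.gauss(0.0, sigma1)
    while u0 + gamma <= 0:
        u0 = rng.gauss(0.0, sigma1)
    # U1 | U0 is normal with mean rho*(sigma2/sigma1)*U0 and variance sigma2^2*(1 - rho^2).
    u1 = rng.gauss(rho * sigma2 / sigma1 * u0, sigma2 * math.sqrt(1.0 - rho ** 2))
    return xi + omega * u1


if __name__ == "__main__":
    rng = random.Random(2025)
    params = dict(xi=0.0, gamma=1.0, omega=1.0, sigma1=1.0, sigma2=1.0, rho=0.5)
    draws = [sample_sun11(rng=rng, **params) for _ in range(20000)]
    sample_mean = sum(draws) / len(draws)
    print("theoretical (mean, var):", sun11_moments(**params))
    print("simulated mean:", sample_mean)
```

With ρ = 0 the skewness term vanishes and the moments reduce to ξ and ω²σ₂², which is a quick sanity check on the formula.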
We next derive the distribution of the sample mean of a random sample from the SUN population.
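Proposition 2.
Let $Y_1, Y_2, \ldots, Y_n$ be independent random variables from $SUN_{1,1}(\xi, \gamma, \omega, \Omega)$, where $\Omega$ is given in Corollary 1. Then,
$$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i \sim SUN_{1,n}\left(\xi, \gamma \mathbf{1}_n, \frac{\omega}{\sqrt{n}}, \bar{\Omega}\right),$$
where $\bar{\Omega} = \begin{pmatrix} \sigma_1^2 I_n & \frac{\rho \sigma_1 \sigma_2}{\sqrt{n}} \mathbf{1}_n \\ \frac{\rho \sigma_1 \sigma_2}{\sqrt{n}} \mathbf{1}_n^\top & \sigma_2^2 \end{pmatrix}$, and the pdf of $\bar{Y}$ is
$$f_{\bar{Y}}(y) = \phi\left(y; \xi, \frac{\omega^2 \sigma_2^2}{n}\right) \frac{\Phi_n\left(\left(\gamma + \frac{\rho \sigma_1}{\sigma_2} \cdot \frac{y - \xi}{\omega}\right) \mathbf{1}_n;\ \sigma_1^2 I_n - \frac{\rho^2 \sigma_1^2}{n} \mathbf{1}_n \mathbf{1}_n^\top\right)}{\Phi_n\left(\gamma \mathbf{1}_n; \sigma_1^2 I_n\right)}, \tag{6}$$
where $\mathbf{1}_n \in \mathbb{R}^n$ is the vector of all ones and $I_n \in M_{n \times n}$ is the identity matrix.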
Proof.
By Equation (5), the MGF of $S = \sum_{i=1}^{n} Y_i$ is
$$M_S(s) = \prod_{i=1}^{n} M_{Y_i}(s) = \exp\left(n \xi s + \frac{n}{2} \omega^2 \sigma_2^2 s^2\right) \frac{\Phi_n\left(\gamma \mathbf{1}_n + \rho \sigma_1 \sigma_2 \omega s \mathbf{1}_n;\ \sigma_1^2 I_n\right)}{\Phi_n\left(\gamma \mathbf{1}_n; \sigma_1^2 I_n\right)}.$$
Since $\bar{Y} = S / n$, the MGF of $\bar{Y}$ is obtained by letting $s = t / n$. Then, by Proposition 1 and Lemma 3, the density function of $\bar{Y}$ given in (6) is obtained. □
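The substitution step in the proof can be written out as follows. This is a sketch of the matching argument, using only the MGF of S above and the form of the MGF in Proposition 1; the identification of parameters in the last line is the reading assumed here.

```latex
\begin{align*}
M_{\bar{Y}}(t) &= M_S\left(\tfrac{t}{n}\right)
 = \exp\left(\xi t + \frac{\omega^2 \sigma_2^2}{2n} t^2\right)
   \frac{\Phi_n\left(\gamma \mathbf{1}_n + \frac{\rho \sigma_1 \sigma_2 \omega}{n}\, t\, \mathbf{1}_n;\ \sigma_1^2 I_n\right)}
        {\Phi_n\left(\gamma \mathbf{1}_n;\ \sigma_1^2 I_n\right)}, \\[4pt]
\text{Proposition 1 with }\;
 &\omega \to \frac{\omega}{\sqrt{n}},\quad \sigma^2 \to \sigma_2^2,\quad
  \Gamma \to \frac{\rho \sigma_1 \sigma_2}{\sqrt{n}} \mathbf{1}_n,\quad \Sigma \to \sigma_1^2 I_n: \\
 &\tfrac{1}{2} \sigma^2 \omega^2 t^2 = \frac{\omega^2 \sigma_2^2}{2n} t^2,
 \qquad
 \Gamma \omega t = \frac{\rho \sigma_1 \sigma_2 \omega}{n}\, t\, \mathbf{1}_n .
\end{align*}
```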
To better characterize the relationship between the density curves of $\bar{Y}$ and the underlying parameters, we draw some curves in Figure 1 and Figure 2. In Figure 1, we fix $\rho = 0.95$, $n = 25$, $\sigma_1 = \sigma_2 = \gamma = 1$ and choose different values of $\xi$ and $\omega$. We can see that the shapes of the density curves are affected by the location parameter $\xi$ and the scale $\omega$ when the other parameters are fixed. Figure 2 shows how $\rho$ affects the density curve of $\bar{Y}$ when $\xi = 0$, $n = 25$, and $\omega = \sigma_1 = \sigma_2 = \gamma = 1$ are fixed.
We now turn to estimating the difference in locations for two random samples from SUN populations. Assume $X \sim SUN_{1,1}(\xi_1, \gamma_1, \omega_1, \Omega_1)$ and $Y \sim SUN_{1,1}(\xi_2, \gamma_2, \omega_2, \Omega_2)$ to be independent, where
$$\Omega_i = \begin{pmatrix} \sigma_{i1}^2 & \rho_i \sigma_{i1} \sigma_{i2} \\ \rho_i \sigma_{i1} \sigma_{i2} & \sigma_{i2}^2 \end{pmatrix},$$
for $i = 1, 2$. Let $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_m$ be two random samples from $X$ and $Y$, respectively. Proposition 2 forms the basis for deriving the distribution of the difference in sample means, as presented below.
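Proposition 3.
Let $T = \bar{X} - \bar{Y}$, where $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ and $\bar{Y} = \frac{1}{m} \sum_{j=1}^{m} Y_j$. Then,
$$T \sim SUN_{1,m+n}(\xi_d, \gamma^*, 1, \Omega^*),$$
where $\xi_d = \xi_1 - \xi_2$, $\gamma^* = (\gamma_1 \mathbf{1}_n^\top, \gamma_2 \mathbf{1}_m^\top)^\top$, and $\Omega^* = \begin{pmatrix} \Sigma^* & \Gamma^* \\ \Gamma^{*\top} & \tau^2 \end{pmatrix}$ with $\tau^2 = \dfrac{\sigma_{12}^2 \omega_1^2}{n} + \dfrac{\sigma_{22}^2 \omega_2^2}{m}$, $\Sigma^* = \mathrm{diag}(\sigma_{11}^2 I_n, \sigma_{21}^2 I_m)$, and $\Gamma^* = \left(\dfrac{\rho_1 \sigma_{11} \sigma_{12}}{n} \mathbf{1}_n^\top, \dfrac{\rho_2 \sigma_{21} \sigma_{22}}{m} \mathbf{1}_m^\top\right)^\top$.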
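Oftentimes, there is interest in comparing the means of two populations that are not independent. Suppose that $Y_i = (Y_{i1}, Y_{i2})^\top$ for $i = 1, 2, \ldots, n$ is a random sample from $SUN_{2,1}(\xi, \gamma, W, \Omega)$, where $\Omega = \begin{pmatrix} \Sigma & \Gamma \\ \Gamma^\top & \sigma^2 \end{pmatrix}$. The following proposition provides the sampling distribution of the difference in locations in this situation.
Proposition 4.
Let $D_i = Y_{i1} - Y_{i2}$, where $Y_i = (Y_{i1}, Y_{i2})^\top$ for $i = 1, 2, \ldots, n$ are defined above. Then, we have the following results:
(i)
The $D_i$'s are independently and identically SUN-distributed, and
$$D_i \sim SUN_{1,1}(\xi_d, \gamma, 1, \Omega_d);$$
(ii)
$\bar{D} = \frac{1}{n} \sum_{i=1}^{n} D_i$ is SUN-distributed, and
$$\bar{D} \sim SUN_{1,n}\left(\xi_d, \gamma \mathbf{1}_n, \frac{1}{\sqrt{n}}, \bar{\Omega}_d\right),$$
where $\Omega_d = \begin{pmatrix} \sigma^2 & d \\ d & \delta^2 \end{pmatrix}$, $\bar{\Omega}_d = \begin{pmatrix} \sigma^2 I_n & \frac{d}{\sqrt{n}} \mathbf{1}_n \\ \frac{d}{\sqrt{n}} \mathbf{1}_n^\top & \delta^2 \end{pmatrix}$, $d = \Gamma^\top W^\top A$, and $\delta^2 = A^\top W \Sigma W^\top A$ with $A = (1, -1)^\top$.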
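Proof.
Let $A = (1, -1)^\top$; thus, the MGF of $D_i$ for $i = 1, \ldots, n$ is
$$M_{D_i}(t) = \exp\left(\xi_d t + \frac{1}{2} A^\top W \Sigma W^\top A\, t^2\right) \frac{\Phi\left(\gamma + \Gamma^\top W^\top A\, t;\ \sigma^2\right)}{\Phi(\gamma; \sigma^2)},$$
which implies
$$D_i \sim SUN_{1,1}(\xi_d, \gamma, 1, \Omega_d),$$
where $\Omega_d = \begin{pmatrix} \sigma^2 & d \\ d & \delta^2 \end{pmatrix}$. Here, $d = \Gamma^\top W^\top A$ and $\delta^2 = A^\top W \Sigma W^\top A$. Part (ii) follows from the same arguments together with Proposition 2. □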
Example 1.
Let $Y_i = (Y_{i1}, Y_{i2})^\top \sim SUN_{2,1}(\xi, \gamma, I_2, \Omega)$ for $i = 1, 2, \ldots, n$, where $I_2 \in M_{2 \times 2}$ is an identity matrix, $\Gamma = (0.5, 0.5\rho)^\top$ and $\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$. According to Proposition 4, we obtain that
$$\bar{\Omega}_d = \begin{pmatrix} \sigma^2 I_n & \frac{0.5(1 - \rho)}{\sqrt{n}} \mathbf{1}_n \\ \frac{0.5(1 - \rho)}{\sqrt{n}} \mathbf{1}_n^\top & 2(1 - \rho) \end{pmatrix}$$
and
$$\bar{D} \sim SUN_{1,n}\left(\xi_d, \gamma \mathbf{1}_n, \frac{1}{\sqrt{n}}, \bar{\Omega}_d\right).$$
Moreover, the density curves of $\bar{D}$ for different parameter values are shown in Figure 3 and Figure 4. In Figure 3, we set $\rho = 0.5$, $\sigma = 1$, $\gamma = 1$, and $n = 5$. Figure 4 fixes $\xi_d = 0$, $\sigma = 2$, $\gamma = 1$, and $n = 24$. Figure 3 shows that the shapes of the density curves are not affected by changes in the location parameter $\xi_d$, whereas Figure 4 shows that the shapes change considerably as $\rho$ varies from 0 to 0.95.
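For reference, the two entries of the matrix in Example 1 follow directly from the definitions of d and δ² in Proposition 4 with W = I₂ and A = (1, −1)ᵀ. The short computation below is a worked step added for illustration.

```latex
\begin{align*}
d &= \Gamma^\top W^\top A
   = \begin{pmatrix} 0.5 & 0.5\rho \end{pmatrix} \begin{pmatrix} 1 \\ -1 \end{pmatrix}
   = 0.5(1 - \rho), \\
\delta^2 &= A^\top W \Sigma W^\top A
   = \begin{pmatrix} 1 & -1 \end{pmatrix}
     \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}
     \begin{pmatrix} 1 \\ -1 \end{pmatrix}
   = 2(1 - \rho).
\end{align*}
```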

2.1. The APP for Estimating the Location Parameter ξ in One Sample

In this part, we focus on interval estimation of the location parameter $\xi$ using the APP, assuming the scale parameter $\omega$ is known. Based on the distribution of the sample mean given in Proposition 2, we derive the following results to address the question of how large a sample is needed to ensure, with a specified level of confidence, that the sample statistic is sufficiently close to the population parameter.
Theorem 1.
Let $Y_1, Y_2, \ldots, Y_n$ be independent random variables from $SUN_{1,1}(\xi, \gamma, \omega, \Omega)$, where $\Omega = \begin{pmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix}$. Let $c$ be the confidence level and $f$ be the precision, which are specified such that the error associated with the estimator $\bar{Y}$ of $\xi$ is $E = f \sigma_Y$. More specifically,
$$P\left(f_1 \sigma_Y \le \bar{Y} - \mu_Y \le f_2 \sigma_Y\right) = c, \tag{7}$$
where $\mu_Y$ and $\sigma_Y$ are the population mean and standard deviation, respectively. Then, the required sample size $n$ is obtained by
$$\int_L^U g_Z(z)\, dz = c \tag{8}$$
such that $U - L$ is the shortest, where $g_Z(z)$ is the pdf of $Z = \dfrac{\bar{Y} - \xi}{\omega / \sqrt{n}}$,
$$L = \sqrt{n} \left[ f_1 \sigma_2 \sqrt{1 - \rho^2 \left(\gamma c_\gamma + \sigma_1^2 c_\gamma^2\right)} + \rho \sigma_1 \sigma_2 c_\gamma \right],$$
and
$$U = \sqrt{n} \left[ f_2 \sigma_2 \sqrt{1 - \rho^2 \left(\gamma c_\gamma + \sigma_1^2 c_\gamma^2\right)} + \rho \sigma_1 \sigma_2 c_\gamma \right].$$
Here, $c_\gamma = \dfrac{\phi(\gamma; \sigma_1^2)}{\Phi(\gamma; \sigma_1^2)}$ is a skewness factor. The quantities $f_1$ and $f_2$ are called the left and right precisions, respectively; they act as error bounds and are determined by the value of $n$ such that $U - L$ is the shortest and $f - \max\{|f_1|, f_2\} \ge 0$ is minimized.
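Proof.
From Proposition 2, we have
$$\bar{Y} \sim SUN_{1,n}\left(\xi, \gamma \mathbf{1}_n, \frac{\omega}{\sqrt{n}}, \bar{\Omega}\right),$$
with MGF
$$M_{\bar{Y}}(t) = \exp\left(\xi t + \frac{1}{2n} \omega^2 \sigma_2^2 t^2\right) \frac{\Phi_n\left(\gamma \mathbf{1}_n + \frac{\rho \sigma_1 \sigma_2 \omega}{n} t \mathbf{1}_n;\ \sigma_1^2 I_n\right)}{\Phi_n\left(\gamma \mathbf{1}_n; \sigma_1^2 I_n\right)}.$$
Let $Z = \dfrac{\bar{Y} - \xi}{\omega / \sqrt{n}}$. Then, the MGF of $Z$ is
$$M_Z(s) = E[\exp(Z s)] = E\left[\exp\left(\frac{\bar{Y} - \xi}{\omega / \sqrt{n}}\, s\right)\right] = \exp\left(-\frac{\xi}{\omega / \sqrt{n}}\, s\right) E\left[\exp\left(\bar{Y} \cdot \frac{\sqrt{n}}{\omega}\, s\right)\right] = \exp\left(\frac{\sigma_2^2}{2} s^2\right) \frac{\Phi_n\left(\gamma \mathbf{1}_n + \frac{\rho \sigma_1 \sigma_2}{\sqrt{n}} s \mathbf{1}_n;\ \sigma_1^2 I_n\right)}{\Phi_n\left(\gamma \mathbf{1}_n; \sigma_1^2 I_n\right)},$$
implying that
$$Z \sim SUN_{1,n}(0, \gamma \mathbf{1}_n, 1, \bar{\Omega}),$$
where $\bar{\Omega}$ is given in Proposition 2. Therefore, by the population mean $\mu_Y$ given in Corollary 1, (7) can be rewritten as
$$P\left(\frac{f_1 \sigma_Y + \rho \sigma_1 \sigma_2 \omega c_\gamma}{\omega / \sqrt{n}} \le Z \le \frac{f_2 \sigma_Y + \rho \sigma_1 \sigma_2 \omega c_\gamma}{\omega / \sqrt{n}}\right) = c. \tag{9}$$
Noticing that the population standard deviation $\sigma_Y$ from Corollary 1 is
$$\sigma_Y = \omega \sigma_2 \left[1 - \rho^2 \left(\gamma c_\gamma + \sigma_1^2 c_\gamma^2\right)\right]^{1/2},$$
(9) becomes
$$P(L \le Z \le U) = c,$$
by denoting
$$L = \sqrt{n}\, f_1 \sigma_2 \left[1 - \rho^2 \left(\gamma c_\gamma + \sigma_1^2 c_\gamma^2\right)\right]^{1/2} + \sqrt{n}\, \rho \sigma_1 \sigma_2 c_\gamma$$
and
$$U = \sqrt{n}\, f_2 \sigma_2 \left[1 - \rho^2 \left(\gamma c_\gamma + \sigma_1^2 c_\gamma^2\right)\right]^{1/2} + \sqrt{n}\, \rho \sigma_1 \sigma_2 c_\gamma.$$
Equivalently, the probability equation can be rewritten as
$$\int_L^U g_Z(z)\, dz = c.$$
Then, for given precision $f$ and confidence level $c$, the minimum required sample size $n$ can be obtained by minimizing $U - L$ such that the corresponding $f_1$ and $f_2$ satisfy the side condition that $f - \max\{|f_1|, f_2\} > 0$ is minimized. □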
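The search described in the proof can be approximated by simulation. The sketch below is a Monte Carlo stand-in, not the authors' R implementation: instead of integrating the density of Z numerically, it simulates Z from the conditioning representation, takes the shortest window covering a fraction c of the draws as (L, U), converts these to f₁ and f₂ through the expressions for L and U in Theorem 1, and returns the first n for which max{|f₁|, f₂} ≤ f. All function names and the defaults (`draws`, `n_max`, `seed`) are assumptions made for illustration, and the result carries simulation noise.

```python
import math
import random


def c_gamma(gamma, sigma1):
    """Skewness factor phi(gamma; sigma1^2) / Phi(gamma; sigma1^2)."""
    pdf = math.exp(-gamma ** 2 / (2 * sigma1 ** 2)) / (sigma1 * math.sqrt(2 * math.pi))
    cdf = 0.5 * (1 + math.erf(gamma / (sigma1 * math.sqrt(2))))
    return pdf / cdf


def draw_z(n, gamma, sigma1, sigma2, rho, rng):
    """One draw of Z = (Ybar - xi) / (omega / sqrt(n)), which is free of xi and omega."""
    total = 0.0
    for _ in range(n):
        u0 = rng.gauss(0.0, sigma1)
        while u0 + gamma <= 0:  # keep only latent draws with U0 + gamma > 0
            u0 = rng.gauss(0.0, sigma1)
        total += rng.gauss(rho * sigma2 / sigma1 * u0, sigma2 * math.sqrt(1 - rho ** 2))
    return total / math.sqrt(n)


def shortest_interval(sorted_draws, c):
    """Shortest window that contains a fraction c of the sorted draws."""
    m = len(sorted_draws)
    k = math.ceil(c * m)  # number of draws the window must cover
    start = min(range(m - k + 1), key=lambda i: sorted_draws[i + k - 1] - sorted_draws[i])
    return sorted_draws[start], sorted_draws[start + k - 1]


def required_n(f, c, gamma, sigma1, sigma2, rho, draws=4000, n_max=200, seed=0):
    """Smallest n (by simulation) whose shortest interval meets the precision f."""
    rng = random.Random(seed)
    cg = c_gamma(gamma, sigma1)
    # sigma_Y / omega from Corollary 1
    scale = sigma2 * math.sqrt(1 - rho ** 2 * (gamma * cg + sigma1 ** 2 * cg ** 2))
    for n in range(2, n_max + 1):
        zs = sorted(draw_z(n, gamma, sigma1, sigma2, rho, rng) for _ in range(draws))
        lower, upper = shortest_interval(zs, c)
        shift = math.sqrt(n) * rho * sigma1 * sigma2 * cg  # mean of Z
        f1 = (lower - shift) / (math.sqrt(n) * scale)
        f2 = (upper - shift) / (math.sqrt(n) * scale)
        if max(abs(f1), f2) <= f:
            return n, lower, upper
    raise ValueError("no n <= n_max meets the specification")


if __name__ == "__main__":
    print(required_n(f=0.4, c=0.9, gamma=1.0, sigma1=1.0, sigma2=1.0, rho=0.5))
```

Because the length of the window is nearly flat around its minimum, the simulated position of (L, U) is noisier than its length, so the returned n can differ by a few units from a value obtained by numerical integration; increasing `draws` reduces this effect.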
Corollary 2.
The $c \times 100\%$ confidence interval for $\xi$ is
$$\left[\bar{Y} - \frac{\omega}{\sqrt{n}} U,\ \bar{Y} - \frac{\omega}{\sqrt{n}} L\right].$$
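Once L and U are available, the interval in Corollary 2 is a direct computation. The helper below is a minimal sketch; the function name and the numbers in the usage lines are hypothetical and are not taken from the paper's tables.

```python
import math


def app_confidence_interval(ybar, omega, n, lower_bound, upper_bound):
    """Interval [ybar - omega*U/sqrt(n), ybar - omega*L/sqrt(n)] from Corollary 2."""
    step = omega / math.sqrt(n)
    return ybar - step * upper_bound, ybar - step * lower_bound


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    left, right = app_confidence_interval(ybar=5.0, omega=2.0, n=16, lower_bound=-1.0, upper_bound=2.0)
    print(left, right)  # 4.0 5.5
```

The interval length is ω(U − L)/√n, so it depends on the data only through n and scales linearly in ω.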

2.2. The APP to Estimate the Difference in Locations for Two Independent Samples

In this section, we aim to derive the minimum sample size necessary to estimate the difference in location parameters from two independent SUN-distributed samples, under the assumption of known scale parameters. Using the same notation as in Section 2, let $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_m$ be two independent samples from the SUN distributions $SUN_{1,1}(\xi_1, \gamma_1, \omega_1, \Omega_1)$ and $SUN_{1,1}(\xi_2, \gamma_2, \omega_2, \Omega_2)$, respectively. Because the minimum required $n$ for the case $m = n$ also ensures a $c \times 100\%$ confidence level for achieving the specified sampling precision in estimating $\xi_d$ when $n > m$, we restrict our attention to the APP under the assumption $n = m$, in which the sampling distribution of $T = \bar{X} - \bar{Y}$ is used, assuming known values of $\omega_1$ and $\omega_2$. For the case $n < m$, further details can be found in Wang et al. [5].
Theorem 2.
Let $c$ be the confidence level and $f$ be the precision, which are specified such that the error associated with the estimator $T$ of $\xi_d = \xi_1 - \xi_2$ is $E = f \sigma_{X-Y}$. More specifically,
$$P\left(f_1 \sigma_{X-Y} \le T - \mu_{X-Y} \le f_2 \sigma_{X-Y}\right) = c, \tag{10}$$
where $\mu_{X-Y}$ and $\sigma_{X-Y}$ are the mean and standard deviation of the population difference $X - Y$, respectively. Then, the required sample size $n$ is obtained by
$$\int_L^U h_Z(z)\, dz = c, \tag{11}$$
such that $U - L$ is the shortest, where $h_Z(z)$ is the pdf of $Z = \dfrac{T - \xi_d}{\sigma_{X-Y}}$, $\sigma_{X-Y}$ is the standard deviation of $X - Y$ in Proposition 3,
$$L = \frac{f_1 \sigma_{X-Y} + \rho_1 \sigma_{11} \sigma_{12} \omega_1 c_{\gamma_1} + \rho_2 \sigma_{21} \sigma_{22} \omega_2 c_{\gamma_2}}{\sigma_T},$$
and
$$U = \frac{f_2 \sigma_{X-Y} + \rho_1 \sigma_{11} \sigma_{12} \omega_1 c_{\gamma_1} + \rho_2 \sigma_{21} \sigma_{22} \omega_2 c_{\gamma_2}}{\sigma_T}.$$
Here, $c_{\gamma_i} = \dfrac{\phi(\gamma_i; \sigma_{i1}^2)}{\Phi(\gamma_i; \sigma_{i1}^2)}$ works as the skewness factor for $i = 1, 2$. The left and right error bounds $f_1$ and $f_2$ are called the left and right precisions, respectively; they are determined by the value of $n$ satisfying (11) such that $U - L$ is the shortest and $f - \max\{|f_1|, f_2\} \ge 0$ is minimized.
Proof.
The proof is similar to that of Theorem 1, using Proposition 3. □
Corollary 3.
The $c \times 100\%$ confidence interval for $\xi_d$ is
$$\left[T - \sigma_{X-Y} U,\ T - \sigma_{X-Y} L\right].$$

2.3. The APP on Estimating the Difference in Locations for Matched Pairs

This section extends a priori thinking to an important case not previously addressed, where the researcher is interested in estimating the difference in location parameters across two matched samples. We now formulate the APP for estimating the location difference $\xi_d$ based on matched pairs drawn from a bivariate SUN distribution, as described in Proposition 4.
Theorem 3.
Suppose that $Y_i = (Y_{i1}, Y_{i2})^\top$ for $i = 1, 2, \ldots, n$ is a random sample from $SUN_{2,1}(\xi, \gamma, W, \Omega)$ and $D_i = Y_{i1} - Y_{i2}$ is the sample difference for $i = 1, 2, \ldots, n$, so that $\bar{D}$ is the sample mean of the $D_i$'s as defined in Proposition 4. Let $c$ be the confidence level and $f$ be the precision, which are specified such that the error associated with the estimator $\bar{D}$ of $\xi_d$ is $E = f \sigma_{D_i}$ under the assumptions in Proposition 4. More specifically,
$$P\left(f_1 \sigma_{D_i} \le \bar{D} - \mu_{D_i} \le f_2 \sigma_{D_i}\right) = c, \tag{12}$$
where $\mu_{D_i}$ and $\sigma_{D_i}$ are the mean and standard deviation of the $D_i$'s, respectively. Then, the required sample size $n$ is obtained by
$$\int_L^U h_Z(z)\, dz = c, \tag{13}$$
such that $U - L$ is the shortest, where $h_Z(z)$ is the pdf of $Z = \sqrt{n}\,(\bar{D} - \xi_d)$,
$$L = \sqrt{n} \left(f_1 \sigma_{D_i} + d\, c_{\gamma_0}\right), \quad \text{and} \quad U = \sqrt{n} \left(f_2 \sigma_{D_i} + d\, c_{\gamma_0}\right).$$
Here, $c_{\gamma_0} = \dfrac{\phi(\gamma; \sigma^2)}{\Phi(\gamma; \sigma^2)}$ is a skewness factor, and $d$ and $\delta^2$ are the same as in Proposition 4. The left and right precisions $f_1$ and $f_2$, as error bounds, are determined by the value of $n$ obtained from (13) such that $U - L$ is the shortest and $f - \max\{|f_1|, f_2\} \ge 0$ is minimized.
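Proof.
According to the distribution of $D_i$ for $i = 1, 2, \ldots, n$ given in Proposition 4 and Corollary 1, the mean and variance of $D_i$ are, respectively,
$$E(D_i) = \xi_d + d\, c_{\gamma_0}, \quad \text{and} \quad Var(D_i) = \delta^2 - d^2 \sigma^{-2} \left(\gamma c_{\gamma_0} + \sigma^2 c_{\gamma_0}^2\right)$$
by denoting $c_{\gamma_0} = \dfrac{\phi(\gamma; \sigma^2)}{\Phi(\gamma; \sigma^2)}$.
Note that
$$\bar{D} \sim SUN_{1,n}\left(\xi_d, \gamma \mathbf{1}_n, \frac{1}{\sqrt{n}}, \bar{\Omega}_d\right),$$
where $\bar{\Omega}_d$ is given in Proposition 4.
Let $Z = \dfrac{\bar{D} - \xi_d}{1 / \sqrt{n}}$. Then, $Z$ is SUN-distributed, symbolically, as
$$Z \sim SUN_{1,n}(0, \gamma \mathbf{1}_n, 1, \bar{\Omega}_d).$$
Therefore, (12) can be written as
$$P(L \le Z \le U) = c,$$
where
$$L = \sqrt{n} \left(f_1 \sigma_{D_i} + d\, c_{\gamma_0}\right) = \sqrt{n} \left\{ f_1 \left[\delta^2 - d^2 \left(c_{\gamma_0}^2 + \gamma c_{\gamma_0} / \sigma^2\right)\right]^{1/2} + d\, c_{\gamma_0} \right\},$$
and
$$U = \sqrt{n} \left(f_2 \sigma_{D_i} + d\, c_{\gamma_0}\right) = \sqrt{n} \left\{ f_2 \left[\delta^2 - d^2 \left(c_{\gamma_0}^2 + \gamma c_{\gamma_0} / \sigma^2\right)\right]^{1/2} + d\, c_{\gamma_0} \right\}.$$
Therefore, (13) is obtained by rewriting the probability equation as a definite integral, and it is used to solve for the minimum required sample size $n$, for given precision $f$ and confidence level $c$, by minimizing $U - L$ in (13) such that the corresponding $f_1$ and $f_2$ satisfy the side condition that $f - \max\{|f_1|, f_2\} > 0$ is minimized. □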
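To make the bounds in the proof concrete, the sketch below evaluates σ_D, L, and U for given n, f₁, f₂, and the quantities d and δ² of Proposition 4, reading Var(D_i) as δ² − d²(c²_{γ₀} + γc_{γ₀}/σ²). The function name and the numerical inputs are illustrative assumptions, not values from the paper's tables.

```python
import math


def matched_pair_bounds(n, f1, f2, gamma, sigma, d, delta2):
    """Return (L, U, sd_D) for the matched-pairs APP with Z = sqrt(n) * (Dbar - xi_d)."""
    # Skewness factor c_{gamma_0} = phi(gamma; sigma^2) / Phi(gamma; sigma^2).
    pdf = math.exp(-gamma ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    cdf = 0.5 * (1 + math.erf(gamma / (sigma * math.sqrt(2))))
    c0 = pdf / cdf
    # Standard deviation of D_i.
    sd_d = math.sqrt(delta2 - d ** 2 * (c0 ** 2 + gamma * c0 / sigma ** 2))
    lower = math.sqrt(n) * (f1 * sd_d + d * c0)
    upper = math.sqrt(n) * (f2 * sd_d + d * c0)
    return lower, upper, sd_d


if __name__ == "__main__":
    # Example 1 with rho = 0.5 gives d = 0.25 and delta^2 = 1; n, f1, f2 are hypothetical.
    print(matched_pair_bounds(n=16, f1=-0.4, f2=0.4, gamma=1.0, sigma=1.0, d=0.25, delta2=1.0))
```

When d = 0 the skewness term drops out and the bounds reduce to the symmetric values ±√n·f·δ.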
Corollary 4.
The $c \times 100\%$ confidence interval for $\xi_d$ is
$$\left[\bar{D} - \frac{\delta}{\sqrt{n}} U,\ \bar{D} - \frac{\delta}{\sqrt{n}} L\right].$$

3. Simulation Studies

In this section, we use R statistical software to (i) present the required sample size $n$ and the corresponding values of $L$ and $U$ that satisfy the precision and confidence specifications based on Section 2, and (ii) evaluate the coverage probabilities along with the average lengths of the confidence intervals. See Appendix A for the R implementation. Without loss of generality, we assume that $\Sigma$ and $\gamma$ are known, and we consider $c = 0.95, 0.9$; $f = 0.2, 0.4, 0.6, 0.8$; and $\rho = 0.5, 0.98$. The case of two independent samples is a straightforward extension of the one-sample case. Therefore, we only provide simulations based on (8) and (13) in this section and consider the following cases:
  • Case 1: (One sample) Set $\sigma_1 = \sigma_2 = \gamma = 1$.
  • Case 2: (Dependent samples) Set $\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$, $\Gamma = (0.5, 0.5\rho)^\top$, $W = I_2$ and $\sigma = 1$.

3.1. Sample Sizes and Bounds

Table 1 and Table 2 provide the required sample size $n$ and the corresponding values of $L$ and $U$ for each combination of $c$ and $f$.
Table 1 and Table 2 show that lowering the desired confidence level from 95% to 90% reduces the required sample sizes. More importantly, the reduction is larger when more precision is desired than when less precision is desired, and more precision necessitates more participants. However, as precision and confidence decrease, the required $n$ values decline more slowly, approaching 4 for $f = 0.8$ and $c = 0.9$, indicating a limitation when using larger $f$ or smaller $c$ values (see Figure 5). As for the effect of $\rho$ on the required sample sizes, it affects both $n$ and $U - L$ in the one-sample case (see Table 1), whereas its effect is not significant for the dependent samples as $\rho$ increases from 0.5 to 0.98 (see Table 2).
Moreover, Figure 6 shows the values of $L$ and the corresponding length $U - L$ (distance) for the required sample size $n = 18$ when $\rho = 0.5$, $f = 0.4$ and $c = 0.9$ in the one-sample case. The shortest distance between $U$ and $L$ occurs at around $L = -0.955$, which matches the result given in Table 1. The values of $L$ and the corresponding distances between $U$ and $L$ for dependent samples when $\rho = 0.5$, $f = 0.6$ and $c = 0.9$ are shown in Figure 7. The shortest distance occurs when $L$ is around −1.423, which matches the result in Table 2.

3.2. Coverage Probability and Average Length

In the following, we evaluate the performance of the confidence intervals constructed by Corollaries 2 and 4, respectively, using the values of $L$ and $U$ in the previous tables. Monte Carlo simulations are used for the case where $\rho = 0.5$. For the one-sample case, we report the coverage probabilities (CPs), computed as relative frequencies, and the average lengths (ALs) for different values of $\xi$ and $\omega$ in Table 3, corresponding to the values of precision and confidence shown in Table 1. For dependent samples, the CPs and the ALs are listed in Table 4. All results are based on $M = 1000$ simulation runs.
Note that for a nominal confidence level of $c$, good performance means a coverage probability close to or greater than $c$ together with the shortest average length. The coverage rate for each required sample size in each table shows that our method works well. Notice that in Table 3, the average lengths increase for less precision or more confidence. More importantly, the average lengths are proportional to the scale $\omega$ and stay the same for different values of $\xi$ when $\omega = 3$ is fixed. Similarly, in Table 4, the average length is free of $\xi_d$, which reflects the results in Corollary 4.

4. Applications

The following two examples illustrate our process of estimating the location parameter in one sample and the difference in locations for matched pairs, respectively.

4.1. Leaf Area Index

For the one-sample location case, we use the data set on the leaf area index (LAI) of Robinia pseudoacacia in the Huaiping Forest Farm of Shaanxi, China, from June to October 2010, provided by Ye and Wang [17], with a size of 96. Table 5 compares the normal, skew-normal, and SUN models by their parameter estimates and the values of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), from which we can see that the SUN model has the smallest values under both criteria.
The Q-Q plot in Figure 8 shows the data to be highly non-normal. The histogram of the data overlaid with the fitted normal, skew-normal and SUN density curves is given in Figure 9, showing that the SUN distribution fits the data set well.
Given the known population parameters $\gamma = 0.1$, $\sigma_1 = \sigma_2 = 1$, $\rho = 0.9373$, and $\omega = 1.8224$, we determine the required sample size. For a precision of $f = 0.6$ and a confidence level of $c = 0.95$, the SUN model yields a required sample size of $n = 11$. In contrast, under the i.i.d. normal assumption, which ignores the asymmetry evident in the normal Q-Q plot (Figure 8) and histogram (Figure 9), the required sample size is also 11. However, under the skew-normal model incorporating the specific dependence structure proposed by Trafimow et al. [3], the required sample size decreases significantly to 5. This reduction occurs because the structured dependence reduces variability. In summary, the APP based on the SUN model accounts for both independence and asymmetry, offering a more comprehensive approach. To construct the 95% confidence interval for $\xi$ under the SUN assumption, we randomly selected a sample of size $n = 11$ from the population, obtaining a sample mean $\bar{x} = 2.3291$. The resulting 95% confidence interval for $\xi$ is $[0.2403, 1.6913]$, with a length of $1.4510$. The true value $\xi = 1.2729$ lies within this interval.
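The normal-theory figure quoted above can be reproduced with the classical closed form n = ⌈(z_{(1+c)/2}/f)²⌉, which follows from requiring P(|X̄ − μ| ≤ fσ) = c for i.i.d. normal data. The paper does not print this formula, so its use here is an assumption of the sketch, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def normal_app_sample_size(f, c):
    """Smallest n with P(|Xbar - mu| <= f * sigma) >= c for i.i.d. normal data."""
    z = NormalDist().inv_cdf((1 + c) / 2)  # two-sided standard normal quantile
    return math.ceil((z / f) ** 2)


if __name__ == "__main__":
    print(normal_app_sample_size(f=0.6, c=0.95))  # 11
```

For f = 0.6 and c = 0.95 this gives (1.96/0.6)² ≈ 10.67, which rounds up to the 11 reported for the normal model.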

4.2. Sale Price Market Values

In this application, we set up the APP and illustrate the use of the SUN distribution to fit the data from the example given by Wang et al. [6], which compares the market values (M.V.) and the selling prices (S.P.) (in units of USD 10,000) of 45 homes. The family $SUN_{d,m}$ is similar to the family CSN-2, which provides a way to estimate the parameters of $SUN_{2,1}$ for describing our data; more details can be found in Sahu et al. [18] and Arellano-Valle and Azzalini [15]. Figure 10 displays the fitted contour plot of $SUN_{2,1}(\xi, \gamma, I_2, \Omega)$ with $\xi = (14.68, 15.82)^\top$, $\gamma = 0.2$,
$$\Omega = \begin{pmatrix} \Sigma & \Gamma \\ \Gamma^\top & 1 \end{pmatrix}, \quad \text{where} \quad \Sigma = \begin{pmatrix} 81.18 & 87.82 \\ 87.82 & 105.27 \end{pmatrix}, \quad \Gamma = (8.84, 10.06)^\top.$$
Let $D$ denote the paired differences of the S.P. and M.V. data; then, $D$ follows a SUN distribution, $SUN_{1,1}(\xi_d, \gamma, \omega, \Omega_d)$, where $\xi_d = -1.14$, $\gamma = 0.2$, $\omega = 1$, and $\Omega_d = \begin{pmatrix} 1 & -1.22 \\ -1.22 & 10.81 \end{pmatrix}$ are obtained from Proposition 4. Let $c = 0.9$ and $f = 0.4$. The skew-normal fit incorporating the specific dependence structure proposed by Wang et al. [6] yields a required sample size of $n = 10$. In contrast, when the data are modeled under the i.i.d. normality assumption, which ignores the evident asymmetry shown in Figure 10, the required sample size increases substantially to $n = 18$, reflecting the inefficiency introduced by neglecting this skewness. The SUN model, under the i.i.d. assumption via Theorem 3, requires a sample size of $n = 16$. This approach retains the realistic independent sampling assumption while effectively capturing data skewness, providing a more balanced and broadly applicable solution. We randomly selected a sample of size 16 and calculated the sample mean $\bar{D} = -2.23$. The resulting 90% confidence interval for $\xi_d$ is $[-1.43, -0.76]$, which includes the method-of-moments estimate $\xi_d = -1.14$.
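The parameters of the difference distribution can be checked directly from the fitted Σ, Γ, and ξ using d = ΓᵀWᵀA and δ² = AᵀWΣWᵀA from Proposition 4 with W = I₂ and A = (1, −1)ᵀ. The sketch below is a plain-Python version of that arithmetic; the function name is an illustrative assumption.

```python
def difference_parameters(gamma_vec, sigma_mat, w_mat=((1.0, 0.0), (0.0, 1.0)), a_vec=(1.0, -1.0)):
    """Return (d, delta2) with d = Gamma' W' A and delta2 = A' W Sigma W' A (2x2 case)."""
    # v = W' A
    v = [sum(w_mat[i][j] * a_vec[i] for i in range(2)) for j in range(2)]
    d = sum(gamma_vec[j] * v[j] for j in range(2))
    delta2 = sum(v[i] * sigma_mat[i][j] * v[j] for i in range(2) for j in range(2))
    return d, delta2


if __name__ == "__main__":
    xi = (14.68, 15.82)
    d, delta2 = difference_parameters(
        gamma_vec=(8.84, 10.06),
        sigma_mat=((81.18, 87.82), (87.82, 105.27)),
    )
    print(round(xi[0] - xi[1], 2), round(d, 2), round(delta2, 2))  # -1.14 -1.22 10.81
```

The off-diagonal entry d is negative here because the second component of Γ is the larger one.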

5. Discussion

Researchers have primarily focused on developing a priori confidence intervals for the normal distribution, later extending these methods to the more general family of skew-normal distributions, which includes the normal distribution as a special case. However, because the skew-normal family is not closed under addition, the current literature on the a priori procedure (APP) for estimating location parameters assumes that samples jointly follow a multivariate skew-normal distribution. In this framework, individual observations share identical univariate skew-normal marginal distributions but exhibit specific dependencies. This paper addresses a critical gap by providing APP methods to determine the required sample sizes for constructing a priori confidence intervals in three scenarios under the SUN distribution family. The SUN family encompasses the skew-normal distribution as a subset and possesses the crucial property of closure under linear transformations of independent random variables. This extension offers a significant contribution to the field. While our study assumes known scale parameters, related work enhances the broader context. Gupta and Aziz [19] introduced weighted moments estimators for SUN populations, which could improve the validity of our findings when scales are unknown. Furthermore, Wang et al. [20] developed a Bayesian framework for mixed-type multivariate regression using continuous shrinkage priors. Their approach suggests potential interfaces between Bayesian shrinkage/variable selection methods and our SUN-based APP. To advance this line of research, future work will focus on developing a priori confidence intervals for location parameters when the scale is unknown and for differences in $k$ ($k \ge 3$) locations under the SUN setting when the samples are independent.

Author Contributions

W.T. and C.W.: conceptualization, methodology, validation, investigation, resources, supervision, project administration, visualization, and writing—review and editing; J.Y.: software, formal analysis, and data curation. All authors have read and agreed to the published version of the manuscript.

Funding

The research was supported by the Natural Science Foundation of Top Talent of SZTU (GDRC202214).

Data Availability Statement

Acknowledgments

The authors thank the Academic Editor and three reviewers for their valuable suggestions and modifications, which have improved the quality of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

R Code for Calculating the Sample Size Needed with Given Precision and Confidence Level.
Listing 1. R code for required sample size.
[Listing rendered as images in the original: Symmetry 17 01228 i001, i002]
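
For orientation, the classical APP sample size for the mean of i.i.d. normal observations with known standard deviation is $n = \lceil (z_{(1+c)/2} / f)^2 \rceil$, where $f$ is the precision in standard-deviation units and $c$ is the confidence level. The following is a minimal Python sketch of that normal baseline only. It is not the authors' R listing and does not implement the SUN equations of this paper; the function name `app_sample_size_normal` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def app_sample_size_normal(f: float, c: float) -> int:
    """Smallest n with P(|sample mean - mu| <= f * sigma) >= c.

    Assumptions (not from the paper's SUN theorems): i.i.d. normal data,
    known sigma, two-sided symmetric interval.
    """
    if not (f > 0 and 0 < c < 1):
        raise ValueError("need f > 0 and 0 < c < 1")
    # Two-sided standard normal quantile for confidence level c.
    z = NormalDist().inv_cdf((1 + c) / 2)
    return ceil((z / f) ** 2)


if __name__ == "__main__":
    # Same (f, c) grid as the simulation tables, normal baseline only.
    for f in (0.2, 0.4, 0.6, 0.8):
        for c in (0.95, 0.9):
            print(f"f={f}, c={c}: n={app_sample_size_normal(f, c)}")
```

Because this baseline ignores skewness and any dependence structure, its output should not be expected to match the SUN-based sample sizes reported in the tables.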

References

  1. Trafimow, D. Using the coefficient of confidence to make the philosophical switch from a posteriori to a priori inferential statistics. Educ. Psychol. Meas. 2017, 77, 831–854.
  2. Trafimow, D.; MacDonald, J.A. Performing inferential statistics prior to data collection. Educ. Psychol. Meas. 2017, 77, 204–219.
  3. Trafimow, D.; Wang, T.; Wang, C. From a sampling precision perspective, skewness is a friend and not an enemy. Educ. Psychol. Meas. 2019, 79, 129–150.
  4. Azzalini, A.; Dalla Valle, A. The multivariate skew-normal distribution. Biometrika 1996, 83, 715–726.
  5. Wang, C.; Wang, T.; Trafimow, D.; Chen, J. Extending a priori procedure to two independent samples under skew normal settings. Asian J. Econ. Bank. 2019, 3, 29–40.
  6. Wang, C.; Wang, T.; Trafimow, D.; Myüz, H.A. Necessary sample sizes for specified closeness and confidence of matched data under the skew normal setting. Commun. Stat.-Simul. Comput. 2022, 51, 2083–2094.
  7. Ma, Z.; Wang, T.; Choy, S.T.B.; Wei, Z.; Zhu, X. Extending the A Priori Procedure for Estimating Location Parameter Under Multivariate Skew Normal Settings. In Optimal Transport Statistics for Economics and Related Topics; Studies in Systems, Decision and Control; Ngoc Thach, N., Kreinovich, V., Ha, D.T., Trung, N.D., Eds.; Springer: Cham, Switzerland, 2024; Volume 483.
  8. Tong, T.; Trafimow, D.; Wang, T.; Wang, C.; Hu, L.; Chen, X. The a priori procedure (APP) for estimating regression coefficients in linear models. Methodology 2022, 18, 203–220.
  9. Cao, L.; Wang, C.; Wang, T.; Trafimow, D. The APP for estimating population proportion based on skew normal approximations and the Beta-Bernoulli process. Commun. Stat.-Simul. Comput. 2024, 53, 167–177.
  10. Arellano-Valle, R.B.; Azzalini, A. On the unification of families of skew-normal distributions. Scand. J. Stat. 2006, 33, 561–574.
  11. Gupta, A.K.; Aziz, M.A.; Ning, W. On some properties of the unified skew normal distribution. J. Stat. Theory Pract. 2013, 7, 480–495.
  12. Amiri, M.; Jamalizadeh, A.; Towhidi, M. Some multivariate singular unified skew-normal distributions and their application. Commun. Stat.-Theory Methods 2016, 45, 2159–2171.
  13. Durante, D. Conjugate Bayes for probit regression via unified skew-normal distributions. Biometrika 2019, 106, 765–779.
  14. Minozzo, M.; Bagnato, L. A unified skew-normal geostatistical factor model. Environmetrics 2021, 32, e2672.
  15. Arellano-Valle, R.B.; Azzalini, A. Some properties of the unified skew-normal distribution. Stat. Pap. 2022, 63, 461–487.
  16. Anceschi, N.; Fasano, A.; Durante, D.; Zanella, G. Bayesian conjugacy in probit, tobit, multinomial probit and extensions: A review and new results. J. Am. Stat. Assoc. 2023, 118, 1451–1469.
  17. Ye, R.D.; Wang, T.H. Inferences in linear mixed models with skew-normal random effects. Acta Math. Sin. Engl. Ser. 2015, 31, 576–594.
  18. Sahu, S.K.; Dey, D.K.; Branco, M.D. A new class of multivariate skew distributions with applications to Bayesian regression models. Can. J. Stat. 2003, 31, 129–150.
  19. Gupta, A.K.; Aziz, M.A. Estimation of Parameters of the Unified Skew Normal Distribution Using the Method of Weighted Moments. J. Stat. Theory Pract. 2012, 6, 402–416.
  20. Wang, S.-H.; Bai, R.; Huang, H.-H. Two-step mixed-type multivariate Bayesian sparse variable selection with shrinkage priors. Electron. J. Statist. 2025, 19, 397–457.
Figure 1. The density curves of $\bar{Y}$ with different values of $\xi$ and $\omega$.
Figure 2. The density curves of $\bar{Y}$ with different values of $\rho$.
Figure 3. The density curves of $\bar{D}$ for different values of $\xi_d$.
Figure 4. The density curves of $\bar{D}$ for different values of $\rho$.
Figure 5. The sample size $n$ necessary to meet precision $f = 0.2, 0.4, 0.6, 0.8$ and confidence $c = 0.95, 0.9$ for $\rho = 0.98, 0.5$ for the one-sample case.
Figure 6. Distances between the lower and upper cutoffs as a function of the left boundaries in one sample.
Figure 7. Distances between the lower and upper cutoffs as a function of the left boundaries in dependent samples.
Figure 8. The Q-Q plot for the LAI data.
Figure 9. The histogram of the data of LAI and its SUN-fitted curve.
Figure 10. A contour plot fit of the estimated $SUN_{2,1}$ distribution.
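Table 1. The minimum required sample size $n$ and the corresponding $L$, $U$, $f_1$, and $f_2$ for one sample.

| f | c | n (ρ = 0.5) | L (ρ = 0.5) | U (ρ = 0.5) | f1 (ρ = 0.5) | f2 (ρ = 0.5) | n (ρ = 0.98) | L (ρ = 0.98) | U (ρ = 0.98) | f1 (ρ = 0.98) | f2 (ρ = 0.98) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.2 | 0.95 | 102 | 0.455 | 3.259 | −0.198 | 0.200 | 109 | 1.298 | 4.553 | −0.196 | 0.192 |
| 0.2 | 0.9 | 68 | −0.385 | 2.757 | −0.199 | 0.200 | 76 | 1.091 | 3.818 | −0.195 | 0.194 |
| 0.4 | 0.95 | 25 | −1.149 | 2.585 | −0.392 | 0.392 | 27 | −0.152 | 3.103 | −0.388 | 0.393 |
| 0.4 | 0.9 | 18 | −0.954 | 2.176 | −0.387 | 0.388 | 20 | −0.109 | 2.618 | −0.382 | 0.378 |
| 0.6 | 0.95 | 11 | −1.387 | 2.347 | −0.590 | 0.592 | 12 | −0.652 | 2.603 | −0.586 | 0.585 |
| 0.6 | 0.9 | 8 | −1.161 | 1.972 | −0.582 | 0.581 | 9 | −0.509 | 2.218 | −0.563 | 0.570 |
| 0.8 | 0.95 | 7 | −1.484 | 2.250 | −0.740 | 0.742 | 7 | −0.852 | 2.403 | −0.752 | 0.780 |
| 0.8 | 0.9 | 5 | −1.233 | 1.901 | −0.730 | 0.742 | 5 | −0.709 | 2.018 | −0.746 | 0.773 |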
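Table 2. The minimum required sample size $n$ and the corresponding $L$, $U$, $f_1$, and $f_2$ for two dependent samples.

| f | c | n (ρ = 0.5) | L (ρ = 0.5) | U (ρ = 0.5) | f1 (ρ = 0.5) | f2 (ρ = 0.5) | n (ρ = 0.98) | L (ρ = 0.98) | U (ρ = 0.98) | f1 (ρ = 0.98) | f2 (ρ = 0.98) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.2 | 0.95 | 96 | −1.232 | 2.641 | −0.199 | 0.200 | 96 | −1.818 | 2.100 | −0.199 | 0.196 |
| 0.2 | 0.9 | 68 | −1.037 | 2.223 | −0.200 | 0.200 | 68 | −1.530 | 1.767 | −0.194 | 0.191 |
| 0.4 | 0.95 | 24 | −1.585 | 2.289 | −0.396 | 0.393 | 24 | −1.888 | 2.029 | −0.391 | 0.388 |
| 0.4 | 0.9 | 17 | −1.334 | 1.927 | −0.389 | 0.385 | 17 | −1.589 | 1.708 | −0.393 | 0.391 |
| 0.6 | 0.95 | 11 | −1.728 | 2.205 | −0.591 | 0.589 | 11 | −1.941 | 2.037 | −0.589 | 0.587 |
| 0.6 | 0.9 | 8 | −1.424 | 1.815 | −0.587 | 0.585 | 8 | −1.656 | 1.737 | −0.576 | 0.579 |
| 0.8 | 0.95 | 6 | −1.761 | 2.113 | −0.772 | 0.776 | 6 | −1.924 | 1.994 | −0.781 | 0.779 |
| 0.8 | 0.9 | 4 | −1.438 | 1.725 | −0.783 | 0.781 | 4 | −1.571 | 1.628 | −0.781 | 0.782 |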
Table 3. The coverage rates of the confidence intervals of $\xi$ for one sample. Each cell reports CP (AL).

| f | c | n | ξ = 1, ω = 1 | ξ = 1, ω = 3 | ξ = 2, ω = 3 | ξ = 5, ω = 3 |
|---|---|---|---|---|---|---|
| 0.2 | 0.95 | 102 | 0.952 (0.3810) | 0.947 (1.1431) | 0.951 (1.1431) | 0.949 (1.1431) |
| 0.2 | 0.9 | 68 | 0.899 (0.3810) | 0.893 (1.1431) | 0.901 (1.1431) | 0.904 (1.1431) |
| 0.4 | 0.95 | 25 | 0.952 (0.7465) | 0.947 (2.2395) | 0.950 (2.2395) | 0.953 (2.2395) |
| 0.4 | 0.9 | 18 | 0.902 (0.7386) | 0.897 (2.2159) | 0.903 (2.2159) | 0.901 (2.2159) |
| 0.6 | 0.95 | 11 | 0.951 (1.1255) | 0.954 (3.3765) | 0.949 (3.3765) | 0.947 (3.3765) |
| 0.6 | 0.9 | 8 | 0.896 (1.1079) | 0.901 (3.3238) | 0.905 (3.3238) | 0.902 (3.3238) |
| 0.8 | 0.95 | 7 | 0.952 (1.4113) | 0.954 (4.2339) | 0.949 (4.2339) | 0.947 (4.2339) |
| 0.8 | 0.9 | 5 | 0.902 (1.4015) | 0.899 (4.2044) | 0.897 (4.2044) | 0.901 (4.2044) |
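
Coverage figures of this kind are typically obtained by Monte Carlo simulation: draw many samples of size $n$, build the interval for each, and record how often it contains the true location. The following is a minimal Python sketch of such a check, assuming CP denotes the empirical coverage proportion and AL the average interval length. It uses i.i.d. normal data and the interval $\bar{Y} \pm f\sigma$ as a stand-in, since a SUN sampler and the SUN cutoffs are not reproduced here; the function name `simulate_coverage` and its defaults are hypothetical.

```python
import random


def simulate_coverage(n, f, mu=0.0, sigma=1.0, reps=20000, seed=1):
    """Monte Carlo estimate of coverage probability (CP) and average length (AL).

    Assumption: i.i.d. N(mu, sigma^2) data with known sigma and the symmetric
    interval (sample mean) +/- f * sigma. This is a normal stand-in, not the
    SUN-based interval of the paper.
    """
    rng = random.Random(seed)
    half_width = f * sigma
    hits = 0
    for _ in range(reps):
        # Sample mean of one simulated data set of size n.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(xbar - mu) <= half_width:
            hits += 1
    cp = hits / reps
    al = 2 * half_width  # constant here because sigma is treated as known
    return cp, al


if __name__ == "__main__":
    cp, al = simulate_coverage(n=17, f=0.4)
    print(f"CP={cp:.3f}, AL={al:.4f}")
```

With $n$ chosen by an a priori rule for confidence $c$, the estimated CP should sit close to $c$, which is the pattern the table reports for the SUN intervals.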
Table 4. The coverage rates of the confidence intervals for $\xi_d$ in dependent samples. Each cell reports CP (AL).

| f | c | n | ξ_d = −1 | ξ_d = 0 | ξ_d = 1 | ξ_d = 5 |
|---|---|---|---|---|---|---|
| 0.2 | 0.95 | 96 | 0.951 (0.3953) | 0.951 (0.3953) | 0.949 (0.3953) | 0.950 (0.3953) |
| 0.2 | 0.9 | 68 | 0.902 (0.3953) | 0.899 (0.3953) | 0.901 (0.3953) | 0.898 (0.3953) |
| 0.4 | 0.95 | 24 | 0.950 (0.7907) | 0.949 (0.7907) | 0.951 (0.7907) | 0.953 (0.7907) |
| 0.4 | 0.9 | 17 | 0.901 (0.7906) | 0.904 (0.7906) | 0.899 (0.7906) | 0.896 (0.7906) |
| 0.6 | 0.95 | 11 | 0.947 (1.1860) | 0.949 (1.1860) | 0.955 (1.1860) | 0.946 (1.1860) |
| 0.6 | 0.9 | 8 | 0.902 (1.1454) | 0.896 (1.1454) | 0.899 (1.1454) | 0.897 (1.1454) |
| 0.8 | 0.95 | 6 | 0.952 (1.5814) | 0.947 (1.5814) | 0.948 (1.5814) | 0.946 (1.5814) |
| 0.8 | 0.9 | 4 | 0.901 (1.5813) | 0.898 (1.5813) | 0.895 (1.5813) | 0.896 (1.5813) |
Table 5. Estimates of the parameters of the normal, skew-normal, and the SUN distributions based on the data set and the corresponding AIC and BIC values.

| Parameter | Normal | Skew-Normal | SUN |
|---|---|---|---|
| ξ | 2.6358 | 1.2729 | 1.2729 |
| ω | 1.2099 | 1.8224 | 1.8224 |
| λ | - | 2.6888 | - |
| ρ | - | - | 0.9373 |
| σ1 | - | - | 1 |
| σ2 | - | - | 1 |
| γ | - | - | 0.1 |
| AIC | 595.862 | 567.2963 | 547.1149 |
| BIC | 600.9907 | 574.9893 | 554.808 |
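
The AIC and BIC columns can be cross-checked against each other. Under the standard definitions $\mathrm{AIC} = 2k - 2\log\hat{L}$ and $\mathrm{BIC} = k\log n - 2\log\hat{L}$, the difference is $\mathrm{BIC} - \mathrm{AIC} = k(\log n - 2)$, so a parameter count $k$ implies a sample size $n$. The Python sketch below applies this, assuming $k = 2$ free parameters for the normal fit ($\xi$, $\omega$) and $k = 3$ for the skew-normal fit ($\xi$, $\omega$, $\lambda$). These counts are read off the table rather than stated by the authors, and the function names are hypothetical.

```python
from math import exp


def implied_sample_size(aic: float, bic: float, k: int) -> float:
    """Solve BIC - AIC = k * (log(n) - 2) for n."""
    return exp(2 + (bic - aic) / k)


def max_log_likelihood(aic: float, k: int) -> float:
    """Recover the maximized log-likelihood from AIC = 2k - 2 log L."""
    return k - aic / 2


if __name__ == "__main__":
    # (AIC, BIC, assumed number of free parameters k) taken from the table.
    fits = {
        "normal": (595.862, 600.9907, 2),
        "skew-normal": (567.2963, 574.9893, 3),
    }
    for name, (aic, bic, k) in fits.items():
        n = implied_sample_size(aic, bic, k)
        ll = max_log_likelihood(aic, k)
        print(f"{name}: implied n = {n:.2f}, log-likelihood = {ll:.3f}")
```

Both fits imply a sample size that rounds to 96, so the two criteria are mutually consistent under the assumed parameter counts.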
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, C.; Tian, W.; Yang, J. A Priori Sample Size Determination for Estimating a Location Parameter Under a Unified Skew-Normal Distribution. Symmetry 2025, 17, 1228. https://doi.org/10.3390/sym17081228

AMA Style

Wang C, Tian W, Yang J. A Priori Sample Size Determination for Estimating a Location Parameter Under a Unified Skew-Normal Distribution. Symmetry. 2025; 17(8):1228. https://doi.org/10.3390/sym17081228

Chicago/Turabian Style

Wang, Cong, Weizhong Tian, and Jingjing Yang. 2025. "A Priori Sample Size Determination for Estimating a Location Parameter Under a Unified Skew-Normal Distribution" Symmetry 17, no. 8: 1228. https://doi.org/10.3390/sym17081228

APA Style

Wang, C., Tian, W., & Yang, J. (2025). A Priori Sample Size Determination for Estimating a Location Parameter Under a Unified Skew-Normal Distribution. Symmetry, 17(8), 1228. https://doi.org/10.3390/sym17081228
