Optimal Sample Size for the Birnbaum–Saunders Distribution under Decision Theory with Symmetric and Asymmetric Loss Functions

Costa, Eliardo; Santos-Neto, Manoel; Leiva, Víctor

doi:10.3390/sym13060926

by Eliardo Costa ¹, Manoel Santos-Neto ² and Víctor Leiva ³,*

¹ Department of Statistics, Universidade Federal do Rio Grande do Norte, Natal 59078-970, Brazil

² Department of Statistics, Universidade Federal de Campina Grande, Campina Grande 58429-900, Brazil

³ School of Industrial Engineering, Pontificia Universidad Católica de Valparaíso, Valparaíso 2362807, Chile

* Author to whom correspondence should be addressed.

Symmetry 2021, 13(6), 926; https://doi.org/10.3390/sym13060926

Submission received: 22 April 2021 / Revised: 19 May 2021 / Accepted: 20 May 2021 / Published: 23 May 2021

(This article belongs to the Special Issue Symmetric and Asymmetric Distributions: Theoretical Developments and Applications II)

Abstract

The fatigue-life or Birnbaum–Saunders distribution is an asymmetrical model that has been widely applied in several areas of science and mainly in reliability. Although diverse methodologies related to this distribution have been proposed, the problem of determining the optimal sample size when estimating its mean has not yet been studied. In this paper, we derive a methodology to determine the optimal sample size under a decision-theoretic approach. In this approach, we consider symmetric and asymmetric loss functions for point and interval inference. Computational tools in the R language were implemented to use this methodology in practice. An illustrative example with real data is also provided to show potential applications.

Keywords: Bayes risk; inverse gamma distribution; LINEX loss function; Metropolis–Hastings algorithm; R language; sampling cost

1. Introduction

The determination of the sample size is a relevant topic in all studies when statistical methods are applied. For example, in clinical trials, this determination was adequately discussed in ch. 6 of [1] and the references therein, giving an overview on this topic. The optimal sample size in the classical statistical setting depends crucially on the alternative hypothesis. However, this is not the case in a Bayesian framework where there is no need to state a specific alternative hypothesis.

In order to determine the sample size in any knowledge area, prior information must be available. Introducing uncertainty into this information is essentially a Bayesian approach. Then, the use of Bayesian methods for determining an optimal sample size should be explored within the distributional framework that is relevant according to the problem under study. In general, there are empirical limitations that require sample sizes to be determined in advance. Therefore, we can determine an optimal sample size that satisfies a criterion based on the Bayes risk. Given specific loss and sampling cost functions, a full Bayesian analysis may be performed for determining an optimal sample size. In the absence of precise information on costs and losses, the loss functions can be approximate, and a Bayesian approach might be employed to provide reasonable estimates of the optimal sample size. For more details about this methodology, see [2,3,4,5,6,7,8] and the references therein.

Birnbaum and Saunders [9] introduced a family of distributions to model failure times for metals subject to periodic stress. The authors provided a natural physical justification for this family, which is known as the fatigue-life or Birnbaum–Saunders (BS) distribution. In the last few decades, this distribution has received considerable attention in the literature, and many methodologies have been proposed for parameter inference. Such attention is justified by its wide applicability, and its variations have been applied in several areas [10,11,12,13,14,15,16], but mainly in reliability [17,18,19]. A detailed review of the BS distribution including methodologies under the classical and Bayesian approaches was presented in [20]. Recently, guidelines about the minimum sample size for monitoring the BS median in quality control under a classical statistical approach were presented in [21]. Although diverse methodologies related to the BS distribution have been proposed, the problem of determining an optimal sample size when estimating the BS mean has not yet been studied.

The main objective of this paper is to derive a methodology for determining the optimal sample size when estimating the mean of the BS distribution under decision theory. We develop a methodology via a Bayesian decision-theoretic approach based on a criterion that minimizes the Bayes risk and sampling cost. The proposed approach depends on ad-hoc loss functions (symmetric or asymmetric) defined to accommodate the implications of a decision. We consider three loss functions for point inference and two for interval inference. Computational tools in the R language were developed to use this methodology in practice.

The paper unfolds as follows. In Section 2, we provide background on the BS distribution, the inference of its parameters, and the Bayesian approach. Section 3 presents the methodology to obtain the optimal sample size under a decision-theoretic approach. In Section 4, we show the use of the main functions and methods of an R package implemented by the authors for the present work [22]. An illustrative example is also provided in this section. Finally, we conclude with a discussion of the results in Section 5, including ideas for future research.

2. The Birnbaum–Saunders Model

In this section, we present the properties of the BS distribution, the inference of its parameters, and discuss the Bayesian approach. Much of the background information about the BS distribution presented in this section has been gathered from other works [9,20,23,24].

2.1. Properties

Let $X$ be a BS distributed random variable with a shape parameter $α$ and a scale parameter $β$, which we denote by $X \sim BS (α, β)$. Then, the probability density function of $X$ is given by:

$f_{X} (x | α, β) = \frac{1}{\sqrt{2 π}} \exp \left(- \frac{1}{2 α^{2}} \left(\frac{x}{β} + \frac{β}{x} - 2\right)\right) \frac{x + β}{2 α \sqrt{β x^{3}}},$

where $x, α, β \in R_{> 0}$.

Besides being a scale parameter, $β$ is also the median of the BS distribution. Furthermore, the mean and variance of the BS distribution are stated as:

$μ = E [X] = β \left(1 + \frac{α^{2}}{2}\right), \quad Var [X] = {(α β)}^{2} \left(1 + \frac{5 α^{2}}{4}\right) .$

(1)

Moreover, if $X$ is BS distributed, then:

$X = \frac{β}{4} {\left(α Z + \sqrt{{(α Z)}^{2} + 4}\right)}^{2},$

(2)

where $Z$ follows a standard normal distribution, which is useful to generate random values from the $BS (α, β)$ distribution. It is possible to show that:

$Z = \frac{1}{α} \left(\sqrt{\frac{X}{β}} - \sqrt{\frac{β}{X}}\right) \sim N (0, 1) .$

Some useful properties of the BS distribution are:

P1: If $X \sim BS (α, β)$, then $c X \sim BS (α, c β)$, that is, the BS distribution is a homogeneous family;
P2: If $X \sim BS (α, β)$, then $1 / X \sim BS (α, 1 / β)$, that is, the BS distribution is invariant under the reciprocal transformation. This property can be important in financial applications.
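The representation in (2) translates directly into a random number generator. The following is a minimal Python sketch (standard library only, not part of the samplesizeBS package); the function names `rbs` and `bs_mean_var`, the seed, and the values $α = 0.5$ and $β = 2$ are illustrative assumptions.

```python
import math
import random
import statistics


def rbs(n, alpha, beta, rng=random):
    """Draw n values from BS(alpha, beta) through the normal representation (2)."""
    draws = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # standard normal variate Z
        az = alpha * z
        draws.append(beta / 4.0 * (az + math.sqrt(az * az + 4.0)) ** 2)
    return draws


def bs_mean_var(alpha, beta):
    """Theoretical mean and variance of BS(alpha, beta), as stated in (1)."""
    mean = beta * (1.0 + alpha ** 2 / 2.0)
    var = (alpha * beta) ** 2 * (1.0 + 5.0 * alpha ** 2 / 4.0)
    return mean, var


rng = random.Random(2021)  # arbitrary seed, only for reproducibility
x = rbs(200_000, alpha=0.5, beta=2.0, rng=rng)
print(statistics.fmean(x), statistics.variance(x))  # empirical moments
print(bs_mean_var(0.5, 2.0))  # (2.25, 1.3125)
```

With a large number of draws, the empirical mean and variance should be close to the theoretical values given by (1).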

2.2. Inference

Given a sample $X = (X_{1}, \dots, X_{n})$ and their observed values $x = (x_{1}, \dots, x_{n})$, modified moment estimates [25] for $α$ and $β$ are expressed, respectively, as:

$\tilde{α} = \sqrt{2 \left(\sqrt{\frac{{\bar{x}}_{a}}{{\bar{x}}_{h}}} - 1\right)}, \quad \tilde{β} = \sqrt{{\bar{x}}_{a} {\bar{x}}_{h}},$

(3)

where ${\bar{x}}_{a}$ is the sample arithmetic mean and ${\bar{x}}_{h}$ is the sample harmonic mean, that is,

${\bar{x}}_{a} = \frac{1}{n} \sum_{i = 1}^{n} x_{i}, \quad {\bar{x}}_{h} = {\left(\frac{1}{n} \sum_{i = 1}^{n} \frac{1}{x_{i}}\right)}^{- 1} .$

In addition, the estimates $\tilde{α}$ and $\tilde{β}$ are well defined because ${\bar{x}}_{a} \geq {\bar{x}}_{h} \geq 0$, that is, these estimates are always positive and mathematically well determined or unique. We may use $\tilde{α}$ and $\tilde{β}$ as initial values in a sampling algorithm. Furthermore, the likelihood function obtained from the $BS (α, β)$ probability density function satisfies:

$L (α, β; x_{n}) \propto \frac{1}{{(α β)}^{n}} \prod_{i = 1}^{n} \left({\left(\frac{β}{x_{i}}\right)}^{1 / 2} + {\left(\frac{β}{x_{i}}\right)}^{3 / 2}\right) \exp \left(- \frac{1}{2 α^{2}} \sum_{i = 1}^{n} \left(\frac{x_{i}}{β} + \frac{β}{x_{i}} - 2\right)\right) .$
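The estimates in (3) and the likelihood kernel above are simple to compute. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, and the three data values are toy numbers, not data from the paper.

```python
import math


def modified_moment_estimates(x):
    """Return (alpha_tilde, beta_tilde) from (3) for positive observations x."""
    n = len(x)
    mean_a = sum(x) / n                      # arithmetic mean
    mean_h = n / sum(1.0 / xi for xi in x)   # harmonic mean
    # max(..., 0.0) guards against a tiny negative value caused by rounding
    alpha = math.sqrt(2.0 * max(math.sqrt(mean_a / mean_h) - 1.0, 0.0))
    beta = math.sqrt(mean_a * mean_h)
    return alpha, beta


def bs_loglik(alpha, beta, x):
    """Log of the BS likelihood kernel, up to an additive constant."""
    n = len(x)
    s = sum(xi / beta + beta / xi - 2.0 for xi in x)
    t = sum(math.log((beta / xi) ** 0.5 + (beta / xi) ** 1.5) for xi in x)
    return -n * math.log(alpha * beta) + t - s / (2.0 * alpha ** 2)


x = [1.0, 2.0, 4.0]  # toy values chosen for easy arithmetic
a0, b0 = modified_moment_estimates(x)
print(a0, b0)                # modified moment estimates
print(bs_loglik(a0, b0, x))  # log-likelihood kernel at these estimates
```

For these toy values, the arithmetic mean is $7/3$ and the harmonic mean is $12/7$, so $\tilde{β} = \sqrt{4} = 2$ and $\tilde{α} = \sqrt{2 (7/6 - 1)} = \sqrt{1/3}$.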

For the parameters $α$ and $β$ of the model, we consider proper prior distributions because the use of non-informative prior distributions yields an improper posterior distribution and continuous conjugate priors do not exist [26]. A possible choice of a prior distribution for $β$ is the inverse gamma (IG) distribution whose probability density function satisfies:

$π (β) \propto β^{- (a_{1} + 1)} \exp \left(- \frac{b_{1}}{β}\right), \quad β \in R_{> 0},$

where $a_{1}$ and $b_{1}$ are positive and known constants (hyperparameters) of the IG distribution. We denote this as $β \sim IG (a_{1}, b_{1})$, that is, the IG distribution of parameters $a_{1}$ and $b_{1}$. We also assume an IG prior distribution for $α^{2}$ with hyperparameters $a_{2}$ and $b_{2}$. Thus, the model can be written hierarchically as:

$\begin{matrix} X_{i} | α, β \overset{IID}{\sim} BS (α, β), \quad i = 1, \dots, n; \\ β \sim IG (a_{1}, b_{1}), \quad α^{2} \sim IG (a_{2}, b_{2}), \end{matrix}$

where “IID” stands for independent and identically distributed. In this context, the conditional posterior distribution of $α^{2}$ given $β$ and $x_{n}$ is stated as:

$α^{2} | β, x_{n} \sim IG \left(\frac{n + 1}{2} + a_{2}, \frac{1}{2} \sum_{i = 1}^{n} \left(\frac{x_{i}}{β} + \frac{β}{x_{i}} - 2\right) + b_{2}\right),$

(4)

whereas the marginal posterior distribution of $β$ given $x_{n}$ can be obtained from:

$π (β | x_{n}) \propto β^{- (n + a_{1} + 1)} \exp \left(- \frac{b_{1}}{β}\right) \prod_{i = 1}^{n} \left({\left(\frac{β}{x_{i}}\right)}^{1 / 2} + {\left(\frac{β}{x_{i}}\right)}^{3 / 2}\right) {\left(\frac{1}{2} \sum_{i = 1}^{n} \left(\frac{x_{i}}{β} + \frac{β}{x_{i}} - 2\right) + b_{2}\right)}^{- \frac{(n + 1)}{2} - a_{2}},$

(5)

which is not a known distribution [26]. In this way, we use the random walk Metropolis–Hastings algorithm [27] to generate samples from the marginal posterior distribution of $β$ given $x_{n}$. Using this sampling algorithm and the posterior distribution defined in (4), we may generate values from the joint posterior distribution of $α^{2}$ and $β$. For a given $x_{n}$, first, we generate values of $β$ from (5), and with them, we draw values of $α^{2}$ using (4). Note that the parameter of interest $μ$ is the mean of the BS distribution, which is a function of $α^{2}$ and $β$. In order to obtain a random sample of the posterior distribution of $μ$ given $x_{n}$, we may draw values from the joint posterior of $α^{2}$ and $β$. Then, we apply the expression defined in (1) for each sampled pair of values.
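The sampling scheme described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation in the samplesizeBS package: the function names, the normal proposal with standard deviation `step`, the starting value $\tilde{β}$ from (3), and the toy data are all choices made for this example. The prior term of $β$ enters the log kernel as $- b_{1} / β$, following the IG prior density.

```python
import math
import random


def log_post_beta(beta, x, a1, b1, a2, b2):
    """Log of the marginal posterior kernel of beta, as in (5)."""
    if beta <= 0.0:
        return -math.inf  # outside the support
    n = len(x)
    s = sum(xi / beta + beta / xi - 2.0 for xi in x)
    t = sum(math.log((beta / xi) ** 0.5 + (beta / xi) ** 1.5) for xi in x)
    return (-(n + a1 + 1.0) * math.log(beta) - b1 / beta + t
            - ((n + 1.0) / 2.0 + a2) * math.log(s / 2.0 + b2))


def sample_posterior_mu(x, a1, b1, a2, b2, n_keep=500, burn_in=500, thin=20,
                        step=0.5, rng=random):
    """Random walk Metropolis-Hastings for beta, alpha^2 from (4), mu from (1)."""
    n = len(x)
    # Start at the modified moment estimate of beta given in (3)
    beta = math.sqrt((sum(x) / n) * (n / sum(1.0 / xi for xi in x)))
    lp = log_post_beta(beta, x, a1, b1, a2, b2)
    mus, accepted, total = [], 0, burn_in + n_keep * thin
    for it in range(total):
        prop = beta + rng.gauss(0.0, step)  # symmetric proposal (tuning assumption)
        lp_prop = log_post_beta(prop, x, a1, b1, a2, b2)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            beta, lp = prop, lp_prop
            accepted += 1
        if it >= burn_in and (it - burn_in) % thin == thin - 1:
            s = sum(xi / beta + beta / xi - 2.0 for xi in x)
            shape = (n + 1.0) / 2.0 + a2
            scale = s / 2.0 + b2
            alpha2 = scale / rng.gammavariate(shape, 1.0)  # IG(shape, scale) draw
            mus.append(beta * (1.0 + alpha2 / 2.0))        # mean of BS, from (1)
    return mus, accepted / total


x = [1.2, 0.8, 2.5, 3.1, 1.9, 0.6, 4.2, 2.2]  # toy data, not from the paper
mus, acc = sample_posterior_mu(x, 8, 50, 8, 50, rng=random.Random(7))
print(len(mus), acc)
```

The defaults for the burn-in, thinning, and number of retained iterations mirror the values reported for Step 4 of Algorithm 1. The value of `step` would normally be tuned by inspecting the acceptance rate.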

3. Optimal Sample Size

In this section, we introduce the methodology to obtain the optimal sample size for estimating the mean $μ$ of the BS distribution under a decision-theoretic approach. Furthermore, we define the different loss functions to be considered.

3.1. Determining the Optimal Sample Size

We may approach the problem of determining the optimal sample size as a decision problem [4,28]. Given that $μ$ is the parameter of interest, we specify a loss function $L (μ, d_{n})$ based on a sample $X_{n} = (X_{1}, \dots, X_{n})$ and a decision function $d_{n} \equiv d_{n} (X_{n})$. For a given $n$ and depending on the adopted loss function, the action $d_{n} (x_{n})$ consists of specifying a quantity (point inference case) representing an estimate for $μ$ or two quantities (interval inference case) representing the lower and upper limits of a credible interval for $μ$.

Let $π$ be associated with a prior distribution for the unknown parameter $μ$ and $d_{n}$ be a decision function. Then, the Bayes risk [4] is defined as:

$r (π, d_{n}) = \int_{M} \int_{\mathcal{X}^{n}} L (μ, d_{n}) g (x_{n} | μ) π (μ) d x_{n} d μ,$

(6)

where $g$ is related to the sampling distribution for $X_{n}$ given $μ$, $M$ is the parameter space, and $\mathcal{X}^{n}$ is the sample space. The decision $d_{n}^{*}$ that minimizes $r (π, d_{n})$ among all the possible decisions $d_{n}$ is called the Bayes rule. In this context, we define the optimal sample size as the one that minimizes the total cost (TC) stated as:

$TC (n) = r (π, d_{n}^{*}) + C (n),$

where $C (n)$ is a function representing the cost of sampling $n$ observations. Here, we take $C (n) = c n$, where $c$ is the per-unit cost for observing a unit in the population. Since it is not possible to compute $r (π, d_{n}^{*})$ analytically, we use Monte Carlo simulations as an alternative to estimate $TC (n)$ for each $n$.

Suppose that the order of the integration may be reversed in (6). Note that this reversal is possible whenever the conditions for the Fubini theorem are satisfied. In this case, as is known, minimizing the Bayes risk is equivalent to minimizing the posterior expected loss. Then, we have:

$r (π, d_{n}^{*}) = \int_{\mathcal{X}^{n}} E [L (μ, d_{n}^{*}) | x_{n}] g (x_{n}) d x_{n},$

so that we may estimate the minimized Bayes risk through the posterior expected value of the loss function applied to the Bayes rule $d_{n}^{*}$. This may be done as summarized in Algorithm 1.
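The reversal of the order of integration can be written out step by step. The sketch below assumes that the Fubini theorem applies and uses two symbols that are introduced here for illustration: $π (μ | x_{n})$ for the posterior density of $μ$ and $g (x_{n})$ for the marginal density of the data, so that $g (x_{n} | μ) π (μ) = π (μ | x_{n}) g (x_{n})$.

```latex
\begin{aligned}
r(\pi, d_n)
  &= \int_{M} \int_{\mathcal{X}^n} L(\mu, d_n)\, g(x_n \mid \mu)\, \pi(\mu)\, dx_n\, d\mu \\
  &= \int_{\mathcal{X}^n} \left[ \int_{M} L(\mu, d_n)\, \pi(\mu \mid x_n)\, d\mu \right] g(x_n)\, dx_n \\
  &= \int_{\mathcal{X}^n} E[L(\mu, d_n) \mid x_n]\, g(x_n)\, dx_n .
\end{aligned}
```

Since $g (x_{n}) \geq 0$, choosing $d_{n} (x_{n})$ to minimize the inner expectation for every $x_{n}$ also minimizes the whole integral, which is why the Bayes rule can be found from the posterior expected loss alone.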

After obtaining an estimate of $r (π, d_{n}^{*})$, we add the respective sampling cost $c n$, which provides an estimate of $TC (n)$ for a given $n$. We apply this procedure for a grid of plausible values of $n$. For example, if we set this grid of values as $n = 2, 12, \dots, 82, 92$, then we estimate $TC (2), TC (12), \dots, TC (82), TC (92)$, respectively. The choice of the grid values is arbitrary, but the shorter the distance between consecutive elements, the better the behavior of the TC can be visualized. However, as this distance decreases, the required computer processing power and the time to compute all these estimates increase. Thus, the choice of this grid must balance these considerations.

Algorithm 1: Estimation of the minimized Bayes risk.

1: Set values for the hyperparameters, which reflect the prior knowledge about $α^{2}$ and $β$ .
2: Draw one value of $α^{2}$ and one value of $β$ from the respective prior distributions, and compute the square root of $α^{2}$ .
3: Given $α$ and $β$ , generate a value of $X_{i}$ from the $BS (α, β)$ distribution using (2), for $i = 1, \dots, n$ , obtaining a sample $x_{n} = (x_{1}, \dots, x_{n})$ .
4: Given $x_{n}$ , collect a sample of size N (as large as possible) from the joint posterior distribution of $α^{2}$ and $β$ as explained in Section 2, generating values $(α_{j}^{2}, β_{j})$ , for $j = 1, \dots, N$ .
5: For $j = 1, \dots, N$ , compute the posterior values $μ_{j}$ using the generated values in Step 4 and the expression stated in (1).
6: Obtain the corresponding Bayes rule $d_{n}^{*}$ using the sample of the posterior distribution of $μ$ generated in Step 5.
7: Use the values computed in Step 5 to estimate $E [L (μ, d_{n}^{*}) | x_{n}]$ .
8: Repeat Steps 1–7 K times (as large as possible), generating K estimates of $E [L (μ, d_{n}^{*}) | x_{n}]$ .
9: Take the average of the K estimates generated in Step 8 as an estimate of $r (π, d_{n}^{*})$ .
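The outer loop of Algorithm 1 can be sketched generically in Python. In the sketch below, the four callables are hypothetical plug-ins supplied by the user, and the usage example is a deliberately simple stand-in (a normal mean with a normal prior and unit variances, not the BS model), chosen only because its minimized Bayes risk under quadratic loss is known to be $1 / (n + 1)$.

```python
import random
import statistics


def estimate_bayes_risk(n, K, draw_prior, simulate_data, sample_posterior_mu,
                        posterior_expected_loss):
    """Monte Carlo estimate of the minimized Bayes risk for sample size n.

    Hypothetical plug-ins:
      draw_prior() -> parameters                 (Step 2)
      simulate_data(n, parameters) -> sample     (Step 3)
      sample_posterior_mu(sample) -> list of mu  (Steps 4 and 5)
      posterior_expected_loss(mus) -> float      (Steps 6 and 7)
    """
    estimates = []
    for _ in range(K):  # Step 8: K replicates
        params = draw_prior()
        x = simulate_data(n, params)
        mus = sample_posterior_mu(x)
        estimates.append(posterior_expected_loss(mus))
    return statistics.fmean(estimates)  # Step 9: average of the K estimates


def total_cost(n, c, risk):
    """TC(n): minimized Bayes risk plus the linear sampling cost c * n."""
    return risk + c * n


# Stand-in model (NOT the BS model): theta ~ N(0, 1), X_i | theta ~ N(theta, 1),
# so that theta | x ~ N(sum(x) / (n + 1), 1 / (n + 1)).
rng = random.Random(1)


def draw_prior():
    return rng.gauss(0.0, 1.0)


def simulate_data(n, theta):
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def sample_posterior_mu(x, N=2000):
    n = len(x)
    m, sd = sum(x) / (n + 1), (1.0 / (n + 1)) ** 0.5
    return [rng.gauss(m, sd) for _ in range(N)]


risk = estimate_bayes_risk(9, 50, draw_prior, simulate_data,
                           sample_posterior_mu, statistics.variance)
print(risk, total_cost(9, 0.01, risk))  # risk should be near 1 / (9 + 1)
```

For the BS model, the plug-ins would draw $(α, β)$ from the IG priors, generate data through (2), and sample the posterior of $μ$ with the Metropolis–Hastings scheme of Section 2.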

In Step 4 of Algorithm 1, when sampling from the marginal posterior distribution stated in (5), we consider a burn-in of 500 iterations and a thinning of 20, with a final number of 500 iterations. We use these 500 iterations to estimate the Bayes risk. For each value of n in the grid, we estimate the Bayes risk ten times. To set the burn-in, thinning, and final number of iterations, we inspected the trace plots, autocorrelation plots, and the acceptance rate for a low value of n in the grid, expecting the same or better behavior as n increases. All the trace plots showed a random behavior around a value, and in all the autocorrelation plots, the autocorrelations for almost all lags were approximately zero. These inspection tools showed that such settings provide good results. If we increase these values, we may have the same or better behavior. However, we must consider the computational cost, which increases notably as these values increase. Taking all this into account, we set 500 iterations. Due to the computational burden, other values were not tested, but we obtain the results in triplicate to show their stability.

Following [8], we fit the total cost curve:

$tc (n) = \frac{E}{{(1 + n)}^{G}} + c n,$

to the grid of values of $n$ and the respective estimates of $TC (n)$, denoted by $tc (n)$, where $E$ and $G$ are parameters to be estimated. This curve may be linearized by means of:

$\log (tc (n) - c n) = \log (E) - G \log (1 + n),$

whereas the estimates of $E$ and $G$ can be computed by the least-squares method. In this setting, the optimal sample size ($n_{o}$) is the integer closest to:

${\left(\frac{\hat{E} \hat{G}}{c}\right)}^{1 / (\hat{G} + 1)} - 1,$

where $\hat{E}$ and $\hat{G}$ are, respectively, the least-squares estimates of $E$ and $G$.
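The curve fitting and the closed-form optimum can be sketched in Python as follows. The function name `fit_total_cost` is hypothetical, and the total cost values in the usage example are generated without noise from $E = 2$, $G = 0.5$, and $c = 0.01$ purely for illustration. The sketch also assumes that the fitted $G$ is positive and truncates the result at zero.

```python
import math


def fit_total_cost(ns, tcs, c):
    """Least-squares fit of log(tc - c n) = log(E) - G log(1 + n).

    Returns (E_hat, G_hat, n_o), with n_o the closest non-negative integer
    to (E_hat * G_hat / c) ** (1 / (G_hat + 1)) - 1.
    """
    # The logarithm requires a positive estimated Bayes risk tc - c * n
    pts = [(math.log(1.0 + n), math.log(tc - c * n))
           for n, tc in zip(ns, tcs) if tc - c * n > 0.0]
    m = len(pts)
    ubar = sum(u for u, _ in pts) / m
    vbar = sum(v for _, v in pts) / m
    suu = sum((u - ubar) ** 2 for u, _ in pts)
    suv = sum((u - ubar) * (v - vbar) for u, v in pts)
    slope = suv / suu
    G = -slope
    E = math.exp(vbar - slope * ubar)
    if G <= 0.0:
        raise ValueError("fitted G must be positive for the closed-form optimum")
    n_opt = (E * G / c) ** (1.0 / (G + 1.0)) - 1.0
    return E, G, max(0, round(n_opt))


c = 0.01
ns = list(range(2, 93, 10))                        # grid 2, 12, ..., 92
tcs = [2.0 / (1.0 + n) ** 0.5 + c * n for n in ns]  # noise-free illustration
E_hat, G_hat, n_o = fit_total_cost(ns, tcs, c)
print(E_hat, G_hat, n_o)  # n_o = 21
```

The expression for $n_{o}$ follows from setting the derivative of $tc (n)$ to zero, that is, $- E G {(1 + n)}^{- (G + 1)} + c = 0$. In the illustration, ${(2 \times 0.5 / 0.01)}^{1 / 1.5} - 1 \approx 20.54$, whose closest integer is 21.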

3.2. Loss Functions

We consider five loss functions to determine the optimal sample size. The loss functions 1 and 2 may be used for point inference and are symmetric. The loss function 3 also may be used for point inference, but this loss function is asymmetric. The loss functions 4 and 5 may be used for interval inference. Such loss functions are defined below.

L1: Loss Function 1. The first loss function is defined as:

$L (μ, d_{n}) = | μ - d_{n} |,$

which is known as the absolute loss function. For this loss function, the Bayes rule $d_{n}^{*}$ is the median of the posterior distribution of $μ$ . Given a sample $μ_{j}$ , for $j = 1, \dots, N$ , of the posterior distribution of $μ$ , an estimate of $E [L (μ, d_{n}^{*}) | x_{n}]$ may be obtained from $\sum_{j = 1}^{N} | μ_{j} - \hat{d_{n}^{*}} | / N$ , where $\hat{d_{n}^{*}}$ is the median of the sample $μ_{j}$ , for $j = 1, \dots, N$ .
L2: Loss Function 2. Second, we consider the well-known quadratic loss function stated as:

$L (μ, d_{n}) = {(μ - d_{n})}^{2} .$

For this loss function, the Bayes rule $d_{n}^{*}$ corresponds to the posterior expected value of $μ$, and in this case, $E [L (μ, d_{n}^{*}) | x_{n}] = Var (μ | x_{n})$. Given a sample $μ_{j}$, for $j = 1, \dots, N$, of the posterior distribution of $μ$, an estimate of $E [L (μ, d_{n}^{*}) | x_{n}]$ may be obtained from the respective sample variance.

L3: Loss Function 3. The loss functions L1 and L2 suffer from two disadvantages in practical applications: both are symmetric and unbounded. In the list of bounded loss functions that might be considered, we may include those suggested in [29,30]. However, in our case, these loss functions are not simple to deal with. Nevertheless, there is a simple well-known asymmetric loss function that we may consider. This is the linear exponential (known as LINEX) loss function given by:

$L (μ, d_{n}) = \exp (ℓ (d_{n} - μ)) - ℓ (d_{n} - μ) - 1,$

where $ℓ \neq 0$. As ℓ increases positively, the overestimation is more costly than the underestimation. As ℓ increases negatively, the situation is reversed [31]. From p. 447 in [31], the Bayes rule for this loss function is established as:

$d_{n}^{*} = - \frac{1}{ℓ} \log (E [\exp (- ℓ μ) | x_{n}]) .$

Given a sample $μ_{j}$, for $j = 1, \dots, N$, of the posterior distribution of $μ$, we may compute an estimate of $d_{n}^{*}$ through:

$\hat{d_{n}^{*}} = - \frac{1}{ℓ} \log \left(\frac{1}{N} \sum_{j = 1}^{N} \exp (- ℓ μ_{j})\right) .$

For the LINEX function, we have that [5]:

$E [L (μ, d_{n}^{*}) | x_{n}] = ℓ (E [μ | x_{n}] - d_{n}^{*}) .$

An estimate of $E [L (μ, d_{n}^{*}) | x_{n}]$ may be obtained from:

$\hat{E} [L (μ, d_{n}^{*}) | x_{n}] = ℓ \left(\frac{1}{N} \sum_{j = 1}^{N} μ_{j} - \hat{d_{n}^{*}}\right) .$

L4: Loss Function 4. The fourth function is defined as:

$L (μ, d_{n}) = ρ τ + {(a - μ)}^{+} + {(μ - b)}^{+},$

(7)

where $0 < ρ < 1$ is a weight, $τ = (b - a) / 2$ is the half-length of the desired interval, and the function $x^{+}$ is equal to x if $x > 0$ and equal to zero, otherwise. Note that as $τ$ decreases, the interval is narrower. The terms ${(a - μ)}^{+}$ and ${(μ - b)}^{+}$ are included to penalize intervals that do not contain the parameter of interest $μ$. These terms are equal to zero if $μ \in [a, b]$ and increase as $μ$ moves away from the interval. Note that the loss function given in (7) is a weighted sum of two terms, $τ$ and ${(a - μ)}^{+} + {(μ - b)}^{+}$, where the weights are $ρ$ and one, respectively. The Bayes rule $d_{n}^{*}$ corresponds to taking a and b as the 100 $(ρ / 2)$th and 100 $(1 - ρ / 2)$th quantiles of the posterior distribution of $μ$ [8,32]. If we consider this loss function applied to the Bayes rule, we have that:

$E [L (μ, d_{n}^{*}) | x_{n}] = E [μ δ_{μ} (A_{b^{*}}) | x_{n}] - E [μ δ_{μ} (A_{a^{*}}) | x_{n}],$

where $A_{b^{*}} = [b^{*}, \infty)$ , $A_{a^{*}} = (0, a^{*}]$ , $a^{*}$ and $b^{*}$ are the corresponding bounds of the Bayes rule $d_{n}^{*}$ , whereas $δ_{μ}$ is the indicator function. Given a sample $μ_{j}$ , for $j = 1, \dots, N$ , of the posterior distribution of $μ$ , an estimate of $E [L (μ, d_{n}^{*}) | x_{n}]$ may be obtained from:

$\hat{E} [L (μ, d_{n}^{*}) | x_{n}] = \frac{1}{N} \sum_{j = 1}^{N} (μ_{j} δ_{μ_{j}} (A_{b^{*}}) - μ_{j} δ_{μ_{j}} (A_{a^{*}})) .$
L5: Loss Function 5. The fifth and last loss function considered here is expressed as:

$L (μ, d_{n}) = γ τ + \frac{{(μ - m)}^{2}}{τ},$

(8)

where $γ > 0$ is a fixed constant and $m = (a + b) / 2$ is the center of the credible interval. The first term defined in (8) involves the half-width of the interval, and the second term is the square of the distance between the parameter of interest $μ$ and the center of the interval, which is divided by the half-width to maintain the same measurement unit of the first term. The weights attributed to each term stated in (8) are $γ$ and one, respectively. If $γ < 1$ , we attribute the largest weight to the second term; if $γ > 1$ , the situation is reversed; and if $γ = 1$ , the two terms have the same weight. For this loss function, the Bayes rule $d_{n}^{*}$ corresponds to the quantities that define the interval $[a^{*}, b^{*}] = [m^{*} - {S D}_{γ}, m^{*} + {S D}_{γ}]$ , where $m^{*} = E [μ | x_{n}]$ and ${S D}_{γ} = γ^{- 1 / 2} {(Var (μ | x_{n}))}^{1 / 2}$ , that is, the corresponding standard deviation [4,8,32]. In this case, we have that:

$E [L (μ, d_{n}^{*}) | x_{n}] = 2 γ^{1 / 2} \sqrt{Var (μ | x_{n})} .$

(9)

Given a sample $μ_{j}$, for $j = 1, \dots, N$, of the posterior distribution of $μ$, an estimate of $E [L (μ, d_{n}^{*}) | x_{n}]$ may be obtained from the sample variance and the expression (9).
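The sample-based estimates for the five loss functions can be collected in a few short Python functions. This is a sketch under stated assumptions: the function names are hypothetical, the empirical quantile convention used for L4 is one of several possible choices, and the posterior sample in the usage example consists of toy values rather than output from the BS model. Each function returns the estimated Bayes rule and the estimated posterior expected loss.

```python
import math
import statistics


def loss1_absolute(mus):
    """L1: the Bayes rule is the posterior median."""
    d = statistics.median(mus)
    return d, statistics.fmean(abs(m - d) for m in mus)


def loss2_quadratic(mus):
    """L2: the Bayes rule is the posterior mean; the loss is the variance."""
    return statistics.fmean(mus), statistics.variance(mus)


def loss3_linex(mus, ell):
    """L3 (LINEX) with asymmetry parameter ell != 0."""
    d = -math.log(statistics.fmean(math.exp(-ell * m) for m in mus)) / ell
    return d, ell * (statistics.fmean(mus) - d)


def loss4_interval(mus, rho):
    """L4: bounds at the rho/2 and 1 - rho/2 empirical quantiles."""
    s = sorted(mus)
    N = len(s)
    a = s[max(0, math.ceil(N * rho / 2.0) - 1)]
    b = s[min(N - 1, math.ceil(N * (1.0 - rho / 2.0)) - 1)]
    # Mean of mu_j over [b, inf) minus mean of mu_j over (0, a]
    est = sum(m for m in s if m >= b) / N - sum(m for m in s if m <= a) / N
    return (a, b), est


def loss5_interval(mus, gamma):
    """L5: interval m* -/+ gamma^(-1/2) * sd, with the expected loss in (9)."""
    m = statistics.fmean(mus)
    sd = statistics.stdev(mus)
    half = sd / math.sqrt(gamma)
    return (m - half, m + half), 2.0 * math.sqrt(gamma) * sd


mus = [2.1, 2.6, 2.9, 3.0, 3.2, 3.5, 3.8, 4.4]  # toy posterior sample
print(loss1_absolute(mus))
print(loss2_quadratic(mus))
print(loss3_linex(mus, ell=1.0))
print(loss4_interval(mus, rho=0.10))
print(loss5_interval(mus, gamma=0.25))
```

Any of these functions can play the role of Steps 6 and 7 of Algorithm 1 once a posterior sample of $μ$ is available.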

4. Computational Aspects and Empirical Applications

In this section, we provide the characteristics of the computer that was used in our study. Also, we show the capabilities and features of a new R package which is named samplesizeBS [22] and is available from GitHub at https://github.com/santosneto/samplesizeBS (accessed on 21 May 2021). The capabilities of the samplesizeBS package allow us to calculate an optimal sample size when estimating the BS mean and generate random numbers of the joint posterior BS/IG distribution. This section finishes with an empirical application based on real data.

4.1. Computer Characteristics

The characteristics of the Cluster Euler, used when calculating the optimal sample size in Section 4.3, are available at http://www.cemeai.icmc.usp.br/Euler/index.html (accessed on 22 May 2021). The other numerical results presented in the simple example of Section 4.2, and the illustrative example with real data of Section 4.4, were obtained by a computer with the following characteristics: (i) OS: Linux Mint 19.3 Cinnamon; (ii) RAM: 7.7 GiB; and (iii) processor: Intel Core i5-7200U CPU@2.50GHz x 2. In addition, the following tools and programming languages were used: (i) development tool –IDE–: RStudio Version (1.3.1093); and (ii) statistical software: R. Note that, in the simple example regarding the use of the function bss.dt.bs(), the elapsed time was: 19 m 48 s.

4.2. The `samplesizeBS` Functions

In Table 1, we present the details of the functions contained in the samplesizeBS package.

In order to determine the BS optimal sample size, we use the function bss.dt.bs(). In the example below, we calculate the optimal sample size considering the loss function L1, with $a_{1} = a_{2} = 8$, $b_{1} = b_{2} = 50$, and $c = 0.01$. The function also returns a graph of the sample size $n$ versus the total cost $TC (n)$, which takes into account both the cost of sampling $n$ observations and the cost of inference through the loss function. In this way, the optimal sample size is the value that minimizes the total cost (see Figure 1).

4.3. Optimal Sample Sizes

Next, we calculate the optimal sample size assuming different scenarios. For the hyperparameters of the prior distribution of $β$, we take $b_{1} = 50$ and $a_{1} = 8, 10, 13, 15$. With these values, we have different degrees of prior information (see Figure 2). For the prior distribution of $α^{2}$, we set $a_{2} = a_{1}$ and $b_{2} = b_{1}$. We consider $c = 0.001, 0.01, 0.1$ for the per-unit cost. For the loss function L3, we take $ℓ = 0.50, 1.00, 2.00$; for L4, $ρ = 0.01, 0.05, 0.10$; and for L5, $γ = 0.25, 0.50, 1.00$. For each combination of these values, we compute the optimal sample size $n_{o}$ for estimating $μ$. The average acceptance rate for the Metropolis–Hastings algorithm in all these combinations was $\approx 70 \%$. Since the proposed methodology is based on simulation methods, we obtain $n_{o}$ in triplicate and observe the difference for the three values. Table 2 reports the optimal sample sizes computed with these settings. In addition, $n_{o}$ may be reached via the following link, which also presents a graph with the fitted curve: https://santosneto.shinyapps.io/samplesizeBSapp (accessed on 21 May 2021).

Note that values of the hyperparameters are fixed according to the practitioner’s knowledge for the parameters $α^{2}$ and $β$ (see Step 1 of Algorithm 1). For example, if we set the hyperparameters as $a_{1} = 8$ and $b_{1} = 50$, it means that our prior knowledge indicates the most likely values for $β$ in the interval between the numbers 3 and 5 (see Figure 2) and similarly for the parameter $α^{2}$. Thus, for instance, if the practitioner’s prior knowledge is that the possible values for $β$ (and/or $α^{2}$) are in the interval between the numbers 6 and 9, the practitioner must set the values of the hyperparameters (parameters of the IG distribution) such that the greatest mass of probability of the IG distribution is in the interval between the numbers 6 and 9.

4.4. Illustrative Example

A practical application of the methodology to compute the optimal sample size when estimating the mean of the BS distribution is illustrated here with one example. In this example, we consider a data set composed of 46 observations given in [33] and available in the samplesizeBS package [22]. These observations correspond to maintenance data on active repair times (in hours) for an airborne communications transceiver. Let $x_{i}$ be the observation $i$ of this data set associated with the random variable $X_{i} \sim BS (α, β)$.

First, we estimate $α$ and $β$ using the modified moment method defined in (3), which results in $\tilde{α} = 1.25$ and $\tilde{β} = 2.02$, respectively. In this way, we can set IG prior distributions for $α^{2}$ and $β$ with $a_{1} = a_{2} = 15$ and $b_{1} = b_{2} = 50$ (see Figure 2). Hence, if our interest is in obtaining a credible interval for $μ$ considering the loss function L5, with $γ = 0.25$ and $c = 0.01$, the optimal sample size may be obtained from Table 2, which is $n_{o} = 46$.

Now, we generate 1000 observations from the posterior distribution of $μ$ and estimate the mean and variance of the respective posterior distribution through the sample mean and variance, respectively. With these estimates, we obtain a credible interval for $μ$ given by [2.108; 3.939].
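As a rough consistency check, the reported interval can be read backwards through the Bayes rule of the loss function L5. The computation below assumes that the interval was built as $[m^{*} - {S D}_{γ}, m^{*} + {S D}_{γ}]$ with $γ = 0.25$; the recovered quantities are therefore implied values (subject to rounding of the reported bounds), not figures reported in the paper.

```latex
\begin{aligned}
m^{*} &\approx \frac{2.108 + 3.939}{2} = 3.0235, \\
SD_{\gamma} &\approx \frac{3.939 - 2.108}{2} = 0.9155, \\
\sqrt{Var(\mu \mid x_n)} &= \gamma^{1/2}\, SD_{\gamma} \approx 0.5 \times 0.9155 \approx 0.458, \\
E[L(\mu, d_n^{*}) \mid x_n] &= 2 \gamma^{1/2} \sqrt{Var(\mu \mid x_n)} \approx 2 \times 0.5 \times 0.458 = 0.458 .
\end{aligned}
```

Under this reading, the posterior mean of $μ$ is about 3.02 hours and the estimated posterior expected loss in (9) is about 0.458.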

5. Discussion, Conclusions and Future Research

We proposed a methodology to compute the optimal sample size for estimating the mean of the Birnbaum–Saunders distribution, a widely applied and studied distribution in several areas of science. We considered five different loss functions, which allowed us to perform both point and interval inference for the parameter of interest.

An advantage of the proposed methodology is that the per-unit cost, represented by $c$, is explicitly taken into account. When the cost $c$ is fixed and $b_{1} = 50$, the optimal sample size $n_{o}$ decreases as $a_{1}$ increases (or as the prior variance decreases), as expected, since in such a case, the prior knowledge increases as $a_{1}$ increases. This occurred with all loss functions considered in our study. For $b_{1} = 50$ and $a_{1}$ fixed, $n_{o}$ also decreases as $c$ increases, but the total sampling cost increases. For example, if we take the loss function L1, $a_{1} = 8$ and $c = 0.001$, the corresponding $n_{o}$ is 651 (see Table 2), which generates a sampling cost of $C (651) = 0.001 \times 651 = 0.651$, whereas if we take $c = 0.1$, the corresponding $n_{o}$ is 23 (see Table 2), which produces a sampling cost of $C (23) = 0.1 \times 23 = 2.3$. For the loss function L3, $n_{o}$ increases as ℓ increases, if we consider $a_{1}$ and $c$ fixed, which makes sense since overestimation is more costly as ℓ increases. For the loss function L4, when $ρ$ increases, $n_{o}$ also increases, if we consider $a_{1}$ and $c$ fixed. This makes sense because $ρ$ is the weight attributed to the term $τ$ in L4, a term related to the length of the credible interval. In addition, when $ρ$ increases, we expect shorter credible intervals, since the Bayes rule takes the $ρ / 2$ and $1 - ρ / 2$ posterior quantiles as bounds, and consequently, the probability of the respective interval decreases. The same is valid for $γ$ in the loss function L5, but in this case, the shortening of the corresponding credible interval is easily noted by the presence of the term $γ^{- 1 / 2}$ in the expression of the respective Bayes rule. When $γ$ increases, this term shrinks the length of the interval.

Since the proposed methodology was based on simulations, we obtained the value of $n_{o}$ in triplicate for each scenario of values of $a_{1}$, $c$, ℓ, $ρ$, and $γ$. We observed that the largest discrepancies in the scenarios occurred for $a_{1} = 8$. However, these discrepancies decreased as $a_{1}$ increased, or when the prior variance decreased. This also occurred when $c = 0.001$ and/or when we considered the loss function L2. In general, the discrepancy was close to zero, but if a large discrepancy occurs, we suggest visually inspecting the graph of the fitted curves and taking the value of $n_{o}$ that corresponds to the best fit. Nevertheless, if all the curves fit visually well, we suggest using the median of the values obtained for $n_{o}$. In our case, we obtained the values of $n_{o}$ in triplicate, as mentioned. For example, in Figure 3, under the loss function L5 with $a_{1} = 8$, $c = 0.001$, and $γ = 0.50$, the values of $n_{o}$ were 1208, 1179, and 1183. Since there was a discrepancy between these values and the fittings of the curves were visually suitable, in this case, we suggest using $n_{o} = 1183$. Note that we have optimal sample sizes equal to zero in Table 2 in some scenarios, which means that it is not worth sampling in these cases because the sampling cost outweighs the decreasing of the minimized Bayes risk. This was also observed in [2,6].

We recall that the objective of the present investigation was to calculate the optimal sample size when estimating the mean of the Birnbaum–Saunders distribution. Simulation experiments should be performed to study the relative bias of the estimator of the mean (or the width of the corresponding interval estimate). A sensitivity analysis considering such simulation experiments would also be helpful. However, as mentioned, our methodology uses very intensive computational resources, which limits the implementation of such experiments. In addition, we plan to consider different priors for the parameters and to examine the robustness and the sensitivity of the results. Moreover, we are interested in studying the possibility of using bounded loss functions and other sampling algorithms.

The relevance of sample size calculation in statistics is undeniable. Another important open challenge is to calculate the sample size when estimating the Birnbaum–Saunders mean (or other parameters) under more complex modeling structures, such as regression, temporal, spatial, functional, partial least squares (PLS), and errors-in-variables settings [34,35,36,37,38]. We hope to report findings associated with these open aspects in future research.

Author Contributions

Data curation, E.C. and M.S.-N.; formal analysis, E.C., M.S.-N., and V.L.; investigation, E.C. and M.S.-N.; methodology, E.C., M.S.-N., and V.L.; writing – original draft, E.C. and M.S.-N.; writing – review and editing, V.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was carried out using the computational resources of the Center for Mathematical Sciences Applied to Industry (CeMEAI) supported by FAPESP (Grant 2013/07375-0) from the São Paulo Research Foundation, a public foundation located in São Paulo, Brazil. The research of V. Leiva was partially supported by FONDECYT (Grant 1200525) from the National Agency for Research and Development (ANID) of the Chilean government under the Ministry of Science.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The real-world data set used in this work is available in the `samplesizeBS` package. To access the data, use the R command `data(repair)`.

Acknowledgments

The authors would like to thank the Editor and three Reviewers for their constructive comments, which led to the improvement of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Spiegelhalter, D.J.; Abrams, K.R.; Myles, J.P. Bayesian Approaches to Clinical Trials and Health-Care Evaluation; Wiley: Chichester, UK, 2004.
2. Etzioni, R.; Kadane, J.B. Optimal experimental design for another’s analysis. J. Am. Stat. Assoc. 1993, 88, 1404–1411.
3. Sahu, S.K.; Smith, T.M.F. A Bayesian method of sample size determination with practical applications. J. R. Stat. Soc. A 2006, 169, 235–253.
4. Parmigiani, G.; Inoue, L. Decision Theory: Principles and Approaches; Wiley: New York, NY, USA, 2009.
5. Islam, A.F.M.S.; Pettit, L.I. Bayesian sample size determination using LINEX loss and linear cost. Commun. Stat. Theory Methods 2012, 41, 223–240.
6. Islam, A.F.M.S.; Pettit, L.I. Bayesian sample size determination for the bounded LINEX loss function. J. Stat. Comput. Simul. 2014, 84, 1644–1653.
7. Santis, F.D.; Gubbiotti, S. A decision-theoretic approach to sample size determination under several priors. Appl. Stoch. Models Bus. Ind. 2017, 33, 282–295.
8. Costa, E.G. Sample Size for Estimating the Organism Concentration in Ballast Water: A Bayesian Approach. Ph.D. Thesis, Department of Statistics, Universidade de São Paulo, São Paulo, Brazil, 2017. (In Portuguese).
9. Birnbaum, Z.W.; Saunders, S.C. A new family of life distributions. J. Appl. Probab. 1969, 6, 319–327.
10. Bourguignon, M.; Leao, J.; Leiva, V.; Santos-Neto, M. The transmuted Birnbaum–Saunders distribution. REVSTAT Stat. J. 2017, 15, 601–628.
11. Mazucheli, M.; Leiva, V.; Alves, B.; Menezes, A.F.B. A new quantile regression for modeling bounded data under a unit Birnbaum–Saunders distribution with applications in medicine and politics. Symmetry 2021, 13, 682.
12. Santos-Neto, M.; Cysneiros, F.J.A.; Leiva, V.; Barros, M. Reparameterized Birnbaum–Saunders regression models with varying precision. Electron. J. Stat. 2016, 10, 2825–2855.
13. Reyes, J.; Barranco-Chamorro, I.; Gallardo, D.I.; Gomez, H.W. Generalized modified slash Birnbaum–Saunders distribution. Symmetry 2018, 10, 724.
14. Gomez-Deniz, E.; Gomez, L. The Rayleigh Birnbaum Saunders distribution: A general fading model. Symmetry 2020, 12, 389.
15. Desousa, M.; Saulo, H.; Leiva, V.; Santos-Neto, M. On a new mixture-based regression model: Simulation and application to data with high censoring. J. Stat. Comput. Simul. 2020, 90, 2861–2877.
16. Sanchez, L.; Leiva, V.; Galea, M.; Saulo, H. Birnbaum–Saunders quantile regression and its diagnostics with application to economic data. Appl. Stoch. Models Bus. Ind. 2021, 37, 53–73.
17. Villegas, C.; Paula, G.A.; Leiva, V. Birnbaum–Saunders mixed models for censored reliability data analysis. IEEE Trans. Reliab. 2011, 60, 748–758.
18. Marchant, C.; Leiva, V.; Cysneiros, F.J.A. A multivariate log-linear model for Birnbaum–Saunders distributions. IEEE Trans. Reliab. 2016, 65, 816–827.
19. Arrue, J.; Arellano, R.; Gomez, H.W.; Leiva, V. On a new type of Birnbaum–Saunders models and its inference and application to fatigue data. J. Appl. Stat. 2020, 47, 2690–2710.
20. Balakrishnan, N.; Kundu, D. Birnbaum–Saunders distribution: A review of models, analysis, and applications. Appl. Stoch. Models Bus. Ind. 2019, 35, 4–49.
21. Bourguignon, M.; Ho, L.L.; Fernandes, F.H. Control charts for monitoring the median parameter of Birnbaum–Saunders distribution. Qual. Reliab. Eng. Int. 2020, 36, 1333–1363.
22. Costa, E.G.; Santos-Neto, M.; Leiva, V. samplesizeBS: Bayesian Sample Size in a Decision-Theoretic Approach for the Birnbaum–Saunders, R Package Version 1.1-1; 2020. Available online: www.github.com/santosneto/samplesizeBS (accessed on 21 May 2021).
23. Aykroyd, R.G.; Leiva, V.; Marchant, C. Multivariate Birnbaum–Saunders distributions: Modelling and applications. Risks 2018, 6, 21.
24. Leiva, V. The Birnbaum–Saunders Distribution; Academic Press: New York, NY, USA, 2016.
25. Ng, H.; Kundu, D.; Balakrishnan, N. Modified moment estimation for the two-parameter Birnbaum–Saunders distribution. Comput. Stat. Data Anal. 2003, 43, 283–298.
26. Wang, M.; Sun, X.; Park, C. Bayesian analysis of Birnbaum–Saunders distribution via the generalized ratio-of-uniforms method. Comput. Stat. 2016, 31, 207–225.
27. Leiva, V.; Ruggeri, F.; Saulo, H.; Vivanco, J.F. A methodology based on the Birnbaum–Saunders distribution for reliability analysis applied to nano-materials. Reliab. Eng. Syst. Saf. 2017, 157, 192–201.
28. Raiffa, H.; Schlaifer, R. Applied Statistical Decision Theory; Harvard University Press: Boston, MA, USA, 1961.
29. Spiring, F.A. The reflected normal loss function. Can. J. Stat. 1993, 21, 321–330.
30. Leung, B.P.K.; Spiring, F.A. The inverted beta loss function: Properties and applications. IIE Trans. 2002, 34, 1101–1109.
31. Zellner, A. Bayesian estimation and prediction using asymmetric loss functions. J. Am. Stat. Assoc. 1986, 81, 446–451.
32. Rice, K.M.; Lumley, T.; Szpiro, A.A. Trading Bias for Precision: Decision Theory for Intervals and Sets. Working Paper 336, UW Biostatistics. 2008. Available online: https://biostats.bepress.com/uwbiostat/paper336/ (accessed on 21 May 2021).
33. Hsieh, H.K. Estimating the critical time of the inverse Gaussian hazard rate. IEEE Trans. Reliab. 1990, 39, 342–345.
34. Puentes, R.; Marchant, C.; Leiva, V.; Figueroa, J.I.; Ruggeri, F. Predicting PM2.5 and PM10 levels during critical episodes management in Santiago, Chile, with a bivariate Birnbaum–Saunders log-linear model. Mathematics 2021, 9, 645.
35. Leiva, V.; Saulo, H.; Souza, R.; Aykroyd, R.G.; Vila, R. A new BISARMA time series model for forecasting mortality using weather and particulate matter data. J. Forecast. 2021, 40, 346–364.
36. Martinez, S.; Giraldo, R.; Leiva, V. Birnbaum–Saunders functional regression models for spatial data. Stoch. Environ. Res. Risk Assess. 2019, 33, 1765–1780.
37. Huerta, M.; Leiva, V.; Liu, S.; Rodriguez, M.; Villegas, D. On a partial least squares regression model for asymmetric data with a chemical application in mining. Chemom. Intell. Lab. Syst. 2019, 190, 55–68.
38. Carrasco, J.M.F.; Finiga, J.I.; Leiva, V.; Riquelme, M.; Aykroyd, R.G. An errors-in-variables model based on the Birnbaum–Saunders and its diagnostics with an application to earthquake data. Stoch. Environ. Res. Risk Assess. 2020, 34, 369–380.

Figure 1. Fitted curve with the respective optimal sample size obtained using the loss function L1.

Figure 2. Probability density functions for different values of the hyperparameter $a_{1}$ ($b_{1} = 50$) of the IG prior distribution for $β$.

Figure 3. Fitted curves with the respective optimal sample sizes obtained via the loss function L5 with $a_{1} = 8$, $c = 0.001$, and $γ = 0.50$.

Table 1. Functions and the respective outputs of the samplesizeBS package.

Function	Output
`rbs()`	A sample of size n from the BS distribution
`logp.beta()`	The logarithm of the marginal posterior distribution of $β$
`rbeta.post()`	A random sample from the marginal posterior distribution of $β$ using the random walk Metropolis–Hastings algorithm
`bss.dt.bs()`	An integer representing the optimal sample size for estimating $μ$ of the BS distribution and the acceptance rate for the random walk Metropolis–Hastings algorithm

Table 2. Optimal sample sizes $n_{o}$ (in triplicate) when estimating the mean of the BS distribution via five different loss functions.

$ρ / γ / ℓ$	$a_{1} = 8$			$a_{1} = 10$			$a_{1} = 13$			$a_{1} = 15$
$ρ / γ / ℓ$	$c = 0.001$	$c = 0.01$	$c = 0.1$	$c = 0.001$	$c = 0.01$	$c = 0.1$	$c = 0.001$	$c = 0.01$	$c = 0.1$	$c = 0.001$	$c = 0.01$	$c = 0.1$
	Loss function L1
	651	117	23	436	80	13	267	47	9	210	36	6
	641	121	22	429	77	14	267	47	8	209	36	6
	627	140	21	429	77	13	268	47	8	209	37	6
	Loss function L2
	2096	641	176	1130	317	108	542	144	33	381	88	21
	2129	697	200	1198	326	97	558	138	42	380	89	23
	2075	622	218	1182	292	81	530	139	32	360	89	21
	Loss function L3
$ℓ = 2.00$	1787	354	61	1111	229	41	631	138	24	467	101	18
	1826	363	62	1115	235	40	640	135	25	468	101	18
	1820	352	61	1112	229	41	616	134	25	454	97	19
$ℓ = 1.00$	929	190	34	553	122	22	310	69	13	226	51	9
	924	197	34	556	122	22	311	69	13	227	51	9
	925	190	34	552	116	22	308	67	13	222	49	9
$ℓ = 0.50$	465	101	19	281	63	12	153	35	7	111	25	5
	463	101	19	275	63	12	155	35	7	111	25	4
	471	99	18	275	59	12	146	33	6	105	23	4
	Loss function L4
$ρ = 0.10$	279	53	9	175	31	7	103	18	3	79	14	2
	271	54	10	171	31	6	106	18	3	79	14	2
	284	53	10	168	33	5	103	18	3	80	14	2
$ρ = 0.05$	187	37	8	118	22	4	70	13	2	54	9	0
	197	37	8	121	22	4	71	12	2	55	9	0
	184	40	7	121	22	4	71	13	2	55	9	0
$ρ = 0.01$	82	18	3	54	9	0	30	5	0	22	4	0
	85	19	3	52	9	0	29	5	0	22	4	0
	83	18	3	49	9	0	30	5	0	22	4	0
	Loss function L5
$γ = 1.00$	1461	271	51	899	171	30	556	103	18	441	78	13
	1472	292	55	942	162	31	561	99	18	438	78	13
	1460	282	56	883	162	30	554	101	18	433	78	13
$γ = 0.50$	1208	203	39	684	132	23	427	78	14	337	59	10
	1179	201	38	690	130	23	434	78	14	335	59	10
	1183	213	42	693	134	24	436	80	14	338	60	10
$γ = 0.25$	796	166	32	538	106	18	333	59	10	259	46	8
	859	171	30	540	99	19	331	62	11	260	47	8
	894	167	32	531	101	18	333	60	10	260	46	8

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Costa, E.; Santos-Neto, M.; Leiva, V. Optimal Sample Size for the Birnbaum–Saunders Distribution under Decision Theory with Symmetric and Asymmetric Loss Functions. Symmetry 2021, 13, 926. https://doi.org/10.3390/sym13060926

