A Note on the Birnbaum–Saunders Conditionals Model

: As an alternative to available bivariate Birnbaum–Saunders (BS) models, a conditionally speciﬁed distribution with BS conditionals is considered. The behavior of conditional or pseudolikelihood parameter estimates of the model parameters is investigated via simulation. A comparison using a mineralogy data set suggests that the conditionally speciﬁed model outperforms competing models (with BS marginals). An analogous comparison using a well-known data set of Australian athletes also suggests the superiority of the conditionally speciﬁed model. Further investigation of its possible general superiority is suggested.


Introduction
The BS distribution, which is asymmetric, was initially introduced as a suitable model for lifetimes in fatigue settings (see Leiva [1] and Balakrishnan and Kundu [2]). Other recent extensions of the BS model are delivered in the works of Reyes et al. [3], Martínez-Flórez et al. [4], Gómez-Déniz and Gómez [5] and Mazucheli et al. [6], among others. However, it has also been shown to function as a suitable model for other non-negative variables. Recently, certain bivariate BS models have been proposed in the literature. It can and has been argued that the shape of appropriate conditional densities, which are suggested by cross sections of bivariate histograms, are easier to visualize and that these conditional features are more informative about appropriate two dimensional models than any modeling suggestion based on marginal insights alone. With this in mind, in many situations, it may be worthwhile to consider bivariate models with BS conditionals, as we will, rather than models with BS marginals. For further discussion of the advantages of conditional specification when compared with marginal specification, see Arnold et al. [7].
The available bivariate Birnbaum-Saunders models, while providing adequate marginal fits, are not always capable of adapting to conditional features of the data. The goal of the current paper is to introduce alternative models that are endowed with a dependence structure that is driven by modeling conditional features of data sets.
In Section 2, we review the definition of the BS distribution and identify the general bivariate model with BS conditionals. Algorithms for simulating draws from this conditionally specified model are then described. Parameter estimation is most easily implemented via conditional (or pseudo) likelihood. In Section 4, the performance of this estimation strategy is investigated via simulation, and in Section 5, the conditionally specified model is compared with two recently proposed models with BS marginals when applied to two data sets, one of a mineralogical nature and the other a subset of the Australian athletes data as reported in Cook and Weisberg [8]. The performance of the conditionally specified BS distribution is encouraging.

The Proposed Conditionally Specified Model
The BS density with unit scale parameter is typically expressed in the following form where γ > 0. We will reparametrize the model, defining θ = 1 2γ 2 , so the density becomes Note that the parametrization in Equation (1) belongs to the natural exponential family of distributions. This guarantees that the parameter space for θ is convex. If a random variable X has (1) as its density, we write X ∼ BS(1, θ); here the 1 refers to the fact that the scale parameter is set equal to 1. If this random variable is multiplied by a positive scale parameter β, we denote the resulting distribution by BS(β, θ). Clearly, with β = 1, this is a one parameter exponential family. The corresponding parameter space is {θ : 0 < θ < ∞}. Using a result in Arnold and Strauss [9], the most general family of bivariate densities with conditionals in this one parameter family (1) (i.e., belonging to the natural exponential family of distributions) is of the form where θ 10 > 0, θ 01 > 0 and θ 11 ≥ 0. The corresponding marginal density of X is which is clearly not of the BS form unless θ 11 = 0, which corresponds to the case in which X and Y are independent. The conditional densities are indeed of the BS form. Thus, specifically X|Y = y ∼ BS(1, θ 10 + θ 11 T(y)), and Y|X = x ∼ BS(1, θ 01 + θ 11 T(x)).

Drawing Values from the Model
One advantage of the model is that both conditional distributions are specified in relatively simple forms. There is a well-known formula expressing a BS(β, θ) variable denoted by U as a function of a standard normal variable Z, thus This can used whenever a BS draw is to be simulated. There are two approaches that can be used to simulate draws from a general BS conditionals distribution. One approach involves simulation of a draw from the X marginal followed by use of the conditional density of Y given X to find the corresponding second coordinate of (X, Y). For this approach, we wish to avoid drawing values directly from the marginal distribution because of possible problems in the computation of the normalizing constant. For this reason, to draw values from the marginal distribution of X, we propose the following steps based on the Metropolis-Hastings algorithm (Algorithm 1): Algorithm 1 Metropolis-Hastings algorithm.

2.
Given a last value, say z k−1 , propose a new value as z * k = x k−1 + , where ∼ N(0, τ). A possible choice for τ can be τ = 1 + θ 10 , which corresponds to the variance of X for the independence case θ 11 = 0.

3.
If Repeat steps 2 and 3 as many times as necessary.
To eliminate the dependence among the drawn values, it is possible to consider a burn-in period and to thin the resulting series. Having drawn a final sample for X, say x * 1 , x * 2 , . . . , x * n , we can simulate corresponding values for Y (using (5)) such that y * i ∼ BS(1, θ 01 + θ 11 T(x * i )). Finally, the scale parameters can be introduced by setting x i = β X x * i and y i = β Y y * i for i = 1, . . . , n. The pairs (x 1 , y 1 ), (x 2 , y 2 ), . . . , (x n , y n ) then constitute a simulated random sample from the model.
An alternative approach is one involving Gibbs sampler simulation utilizing the relative simplicity of the two corresponding conditional distributions (and the ease of drawing from univariate BS distributions). For it, we begin with an arbitrary value for X, say x * 0 . Then, we use the conditional distribution of Y given X = x * 0 ( as in (5)) to simulate a corresponding y * 0 . Next, we generate a simulated value of X, say x * 1 , using the conditional distribution of X given Y = y * 0 ( as in (4)), and continue in this fashion using (5) and (4) alternately. Use of burn-in and thinning is also recommended for this approach.

Estimation
Up to an additive constant, the log-likelihood function for ψ = (β X , β Y , θ 10 , θ 01 , θ 11 ) can be written as where However, the maximization of (7) can be challenging because of the need to repeatedly compute the quantity C(θ 10 , θ 01 , θ 11 ). For this reason, we recommend the implementation of the (conditional or) pseudo-likelihood approach. Besag [10] (see also Arnold and Strauss [11]) defined the pseudo-maximum likelihood estimator as that parameter vector that maximizes the pseudo-likelihood function, which for the bivariate case is given by For our proposed bivariate density, up to an additive constant, the log-pseudo-likelihood function is given by The pseudo-likelihood estimates of the parameters are then obtained by numerically maximizing this expression.

Simulation Study
In this section, we present a brief simulation study in order to assess the recovery of parameter values based on the pseudo-maximum likelihood estimation discussed in Section 3. The data were drawn based on the first method described in Section 2.1, considering a burn-in period and a thinning of 1000 and 10, respectively. We consider ten different combinations for β 1 , β 2 , θ 10 , θ 01 and θ 11 . We also consider three sample sizes: 50, 100 and 200. For each combination of parameters and sample sizes, we consider 10,000 replicates. We present the average bias (AB), the mean of the standard errors (SE) based on the hessian matrix and the root of the mean squared error ( √ MSE) for those replicates. Results are presented in Table 1. It can be observed that accurate estimation of the dependence parameter θ 11 appears to require a quite large sample size. This is not unexpected since dependence parameters are routinely more elusive in conditionally specified models. Simulation and applications codes were written in the R programming language R [12].

Applications
In this section, we present two applications to compare our proposal with two other bivariate BS distributions that have been considered in the literature. The parameter estimation was performed based on the pseudo-maximum likelihood (ML) estimation method for the BVBSC model and the traditional ML estimation method for the other models. Both data sets (Sections 5.1 and 5.2) have been used in many papers, particularly in univariate distributional settings.
Regarding the first application, the variable Neodymium was used in Reyes et al. [3] and Bourguignon and Gallardo [13] in a univariate context. For the second application, Arellano-Valle et al. [14] used the variable weight, also in a univariate context.

Minerals Data Set
This data set corresponds to lanthanum and neodymium measurements collected by the Department of Mines of the University of Atacama, Chile, representing 86 samples of both minerals (see Supplementary Materials). Some descriptive statistics for this data set are presented in Table 2. We fitted the bivariate BS (BVBS) discussed in Kundu et al. [15] and the bivariate BS (BVBS2) proposed by Lemonte et al. [16]. The formulas for the densities corresponding to these models are displayed in the Appendix A. Results are presented in Table 3. In order to compare the models, the AIC (see Akaike [17]) and BIC (see Schwarz [18]) criteria are considered. The conditionally specified model appears to be markedly superior to the other two bivariate BS models. It should be remarked, however, that, since the conditionally specified model fails to have BS marginals, it might be difficult to convince a scientist to use it if they feel that there are valid scientific reasons to believe that BS marginals are appropriate. Figure 2 shows the better fit for the mineral data set of the BVBSC when compared with the BVBS and BVBS2 models.

Australian Athletes Data Set
These data are related to measurements of 202 Australian athletes found in Cook and Weisberg [8]. For each athlete various characteristics of the blood were determined together with data regarding the sport, body size and gender of the athlete. In particular, we consider two measurements, the Body fat percentage (Bfat) and the weight in kg. Some descriptive statistics for this data set are presented in Table 4. We fitted the same three bivariate BS models to these data as were fitted in Section 5.1. Results are presented in Table 5. Again, the conditionally specified model appears to be superior to the other two bivariate BS models. Figure 3 shows the better fit for the athlete data set of the BVBSC when compared with the BVBS and BVBS2 models.

Discussion
The BS density has been utilized in a wide range of scientific and economic settings as a suitable model for non-negative data. As was remarked in the Introduction, its genesis was in the context of fatigue life distributions, but its relative simplicity has led to successful application in many other fields. There are, of course, many ways in which we might seek to develop suitable bivariate BS models. One approach involves the use of copulas to link BS marginals. An example of this approach is the Kundu et al. [15] model, which can be viewed as utilizing a classical bivariate normal copula. The model proposed by Lemonte et al. [16] involves marginal transformations of a well-known bivariate skew normal model. The proposal, in the current paper, of using a conditionally specified model does have the drawback of not having BS marginals, but, as has been demonstrated, it fares well, when compared to the two models described above, in terms of fitting the data sets discussed here.
The field is, of course, open for the development of alternative bivariate BS distributions distinct from those discussed here. It must be kept in mind that we cannot expect to find an overall best bivariate BS model. Some will be good for certain kinds of data, while others will be better for different data configurations.
where a j = a j (α j , β j ) and A j = A j (α j , β j ) are defined as and φ 2 (u, v; ρ) denotes the joint probability density of the bivariate standard normal distribution with correlation parameter equal to ρ, i.e., On the other hand, the BVBS2 density studied by Lemonte et al. [16] is where a j and A j are presented in (A1) and φ(·) and Φ(·) denote the density and cumulative distribution functions for the univariate standard normal distribution.