The Multivariate Skewed Log-Birnbaum–Saunders Distribution and Its Associated Regression Model

Martínez-Flórez, Guillermo; Vergara-Cardozo, Sandra; Tovar-Falón, Roger; Rodriguez-Quevedo, Luisa

doi:10.3390/math11051095

by Guillermo Martínez-Flórez ¹,*,†, Sandra Vergara-Cardozo ², Roger Tovar-Falón ¹,*,† and Luisa Rodriguez-Quevedo ²

¹ Departamento de Matemáticas y Estadística, Facultad de Ciencias Básicas, Universidad de Córdoba, Montería 230002, Colombia

² Departamento de Estadística, Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá 111321, Colombia

* Authors to whom correspondence should be addressed.

† These authors contributed equally to this work.

Mathematics 2023, 11(5), 1095; https://doi.org/10.3390/math11051095

Submission received: 19 October 2022 / Revised: 28 November 2022 / Accepted: 6 December 2022 / Published: 22 February 2023

(This article belongs to the Special Issue Probability, Statistics & Symmetry)

Abstract

In this article, a multivariate extension of the unit-sinh-normal (USHN) distribution is presented. The new distribution, which is obtained from the conditionally specified distributions methodology, is absolutely continuous, and its marginal distributions are univariate USHN. The properties of the multivariate USHN distribution are studied in detail, and statistical inference is carried out from a classical approach using the maximum likelihood method. The new multivariate USHN distribution is suitable for modeling bounded data, especially in the

{(0, 1)}^{p}

region. In addition, the proposed distribution is extended to the case of the regression model and, for the latter, the Fisher information matrix is derived. The numerical results of a small simulation study and two applications with real data sets allow us to conclude that the proposed distribution, as well as its extension to regression models, are potentially useful to analyze the data of proportions, rates, or indices when modeling them jointly considering different degrees of correlation that may exist in the study variables is of interest.

Keywords:

multivariate log-Birnbaum–Saunders distribution; multivariate regression model; unit-sinh-normal distribution; bounded data

MSC:

60E05; 62H05

1. Introduction

Data whose response falls in the interval

(0, 1)

such as indices, proportions, or rates appear very frequently in different fields of knowledge, mainly the areas of social sciences, engineering, economic sciences, and medicine. Some practical examples of these types of data are the proportion of patients who die from a certain disease or virus (SARS-CoV-2, Diabetes, HIV or Cancer) in a country or city; the Human Development Index or the illiteracy rate in a region or country; the proportion of deaths due to exposure to smoking or other exposure factors; the mortality rate from traffic accidents in a city; the percentage of items that do not meet the minimum requirements in an assembly line; and the portion of income that a family spends on entertainment.

For the analysis of data such as those described above, statistical methodologies developed from distributions with support in the interval

(0, 1)

are required. To this end, several probability distributions and regression models have been proposed; see Ferrari and Cribari-Neto [1], Kumaraswamy [2], Martínez-Flórez et al. [3,4], Kieschnick and McCullough [5], and Mazucheli et al. [6,7].

The univariate sinh-normal (SHN) distribution introduced by Rieck and Nedelman [8] has received special attention for modeling material-fatigue-related problems. The probability density function (pdf) of this distribution is given by:

\begin{matrix} φ (y) = \frac{2}{σ α} cosh (\frac{y - γ}{σ}) ϕ (\frac{2}{α} sinh (\frac{y - γ}{σ})), y \in R, \end{matrix}

(1)

where

α > 0

and

γ

are shape and location parameters, respectively;

σ > 0

is a scale parameter, and

ϕ (\cdot)

is the pdf of the standard normal distribution. The pdf in (1) can also be written as:

φ (y) = b_{y}^{'} ϕ (b_{y})

(2)

where

b_{y}^{'} = \frac{2}{σ α} cosh (\frac{y - γ}{σ})

is the derivative, with respect to y, of

b_{y} = \frac{2}{α} sinh (\frac{y - γ}{σ}) .

The distribution with the pdf in (2) is denoted by

SHN (α, γ, σ) .

Several extensions of the SHN distribution have been studied by numerous authors, for example, Martínez-Flórez et al. [9] proposed the extended generalized SHN distribution, which has great applicability in the fit of datasets presenting high skewness and bimodality simultaneously; Lemonte [10] introduced an extension named skewed log-Birnbaum–Saunders (log-BS) regression model which is based on the asymmetric SHN distribution proposed by Leiva et al. [11]. The log-BS regression model is suitable for fitting data with high degrees of skewness. On the other hand, Moreno et al. [12] proposed a generalization of the Birnbaum–Saunders (BS) distribution [13] that affords flexibility for fitting data with greater skewness and kurtosis compared with other distributions.

Multivariate extensions of the SHN distribution have also been considered, for instance, by Martínez-Flórez et al. [14], Díaz-García and Domínguez-Molina [15], Lemonte [16], Marchant et al. [17] and, recently, by Martínez-Flórez et al. [18], among others.

For modeling material fatigue data, the most widely known distribution is the BS whose pdf is given by:

\begin{matrix} f_{T} (t) = ϕ (a_{t}) \frac{t^{- 3 / 2} (t + β)}{2 α \sqrt{β}}, t > 0, \end{matrix}

(3)

where

a_{t} = \frac{1}{α} (\sqrt{t / β} - \sqrt{β / t})

,

α > 0

is a shape parameter, and

β > 0

is a scale parameter and also the median of the distribution. The distribution in (3) is denoted by

T \sim BS (α, β)

. Rieck and Nedelman [8] showed that, if

Y \sim SHN (α, γ, σ = 2)

, then

T = exp (Y)

follows a BS distribution with parameters

α

and

β = exp (γ)

. From this last relationship between the BS and SHN distributions, the SHN regression model can be formulated as follows: if

x_{i} = {(x_{i 1}, \dots, x_{i p})}^{⊤}

is a vector of covariates such that,

Y_{i} = log (T_{i}) = x_{i}^{⊤} θ,

(4)

for

i = 1, \dots, n

, and,

Y_{i} \sim SHN (α, x_{i}^{⊤} θ, σ)

then, the model in (4) is known as the log-linear BS regression model. More details about this regression model can be found in Rieck and Nedelman [8].
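The relationship between the SHN and BS distributions stated above can be checked numerically. The following is a minimal sketch, not the authors' code: the function names `shn_pdf`, `bs_pdf`, and `bs_pdf_via_shn` are hypothetical, and the parameter values are arbitrary. It evaluates the BS pdf in (3) directly and through the change of variables T = exp(Y) applied to the SHN pdf in (1) with σ = 2 and γ = log(β).

```python
import math

def norm_pdf(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def shn_pdf(y, alpha, gamma, sigma):
    """SHN pdf as in (1): b'(y) * phi(b(y))."""
    u = (y - gamma) / sigma
    b = (2.0 / alpha) * math.sinh(u)
    b_prime = (2.0 / (alpha * sigma)) * math.cosh(u)
    return b_prime * norm_pdf(b)

def bs_pdf(t, alpha, beta):
    """BS pdf as in (3)."""
    a_t = (math.sqrt(t / beta) - math.sqrt(beta / t)) / alpha
    return norm_pdf(a_t) * t ** (-1.5) * (t + beta) / (2.0 * alpha * math.sqrt(beta))

def bs_pdf_via_shn(t, alpha, beta):
    """Density of T = exp(Y) with Y ~ SHN(alpha, log(beta), 2): f_Y(log t) / t."""
    return shn_pdf(math.log(t), alpha, math.log(beta), 2.0) / t

# Arbitrary illustrative parameter values (assumptions, not from the paper).
alpha, beta = 0.8, 1.5
checks = [(t, bs_pdf(t, alpha, beta), bs_pdf_via_shn(t, alpha, beta))
          for t in (0.3, 1.0, 2.5, 6.0)]
for t, direct, via_shn in checks:
    print(t, direct, via_shn)
```

The two columns printed for each t should agree up to floating-point rounding, which is the content of the Rieck and Nedelman result quoted above.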

The BS distribution is widely applicable to data analysis in several areas of knowledge, such as biology, medicine, and engineering; however, until recently, no extension of the BS distribution had been proposed for modeling rates and proportions, i.e., random variables in the unit interval

(0, 1)

. To fill this gap, Mazucheli et al. [19] presented an extension of the BS distribution for fitting random variables in the unit interval

(0, 1)

. The pdf of this model is given by:

\begin{matrix} f (x) = \frac{1}{2 x α β \sqrt{2 π}} [{(- \frac{β}{log x})}^{1 / 2} + {(- \frac{β}{log x})}^{3 / 2}] exp \{\frac{1}{2 α^{2}} [\frac{log x}{β} + \frac{β}{log x} + 2]\}, \end{matrix}

(5)

where

x \in (0, 1)

,

α > 0

is the shape parameter and

β > 0

is a scale parameter. Based on the work of Mazucheli et al. [19], Martínez-Flórez et al. [3] studied the unit sinh-normal (USHN) distribution to deal with the problem of bounded observations on the interval

(0, 1)

which has a pdf given by:

\begin{matrix} φ (y) & = \frac{1}{(1 - y) log {(1 - y)}^{- 1}} \frac{2}{σ α} cosh (\frac{log (- log (1 - y)) - γ}{σ}) \\ \times ϕ (\frac{2}{α} sinh (\frac{log (- log (1 - y)) - γ}{σ})), \end{matrix}

(6)

where

y \in (0, 1)

,

α > 0

is a shape parameter,

γ

is a location parameter, and

σ > 0

is a scale parameter. The natural extension of this model to the case with covariates is the USHN linear regression (USHNR) model. The USHNR model is defined by considering a set of p explanatory variables, denoted by

x_{i} = {(x_{i 1}, \dots, x_{i p})}^{⊤}

and, such that,

log (- log (1 - Y_{i})) = x_{i}^{⊤} θ + ε_{i}, for i = 1, \dots, n

(7)

where

θ = {(θ_{1}, \dots, θ_{p})}^{⊤}

is a p dimensional vector of unknown parameters and

ε_{i} \sim SHN (α, 0, σ)

. More details can be found in [3].
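As an illustration of the USHN pdf in (6), the sketch below evaluates the density and compares its integral over a sub-interval with the difference of the cdf, which equals Φ(b(y)) because b(y) = (2/α) sinh((log(−log(1 − y)) − γ)/σ) is increasing in y and maps Y to a standard normal variable. The helper names (`ushn_pdf`, `ushn_cdf`, `simpson`) and the parameter values are assumptions made for this example only.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def ushn_b(y, alpha, gamma, sigma):
    """b(y) = (2/alpha) * sinh((log(-log(1 - y)) - gamma) / sigma)."""
    x = math.log(-math.log1p(-y))
    return (2.0 / alpha) * math.sinh((x - gamma) / sigma)

def ushn_pdf(y, alpha, gamma, sigma):
    """USHN pdf as in (6), for y in (0, 1)."""
    t = -math.log1p(-y)                       # -log(1 - y) > 0
    u = (math.log(t) - gamma) / sigma
    jacobian = 1.0 / ((1.0 - y) * t)          # d/dy of log(-log(1 - y))
    return jacobian * (2.0 / (alpha * sigma)) * math.cosh(u) \
        * norm_pdf((2.0 / alpha) * math.sinh(u))

def ushn_cdf(y, alpha, gamma, sigma):
    """P(Y <= y) = Phi(b(y)), since b is increasing in y."""
    return norm_cdf(ushn_b(y, alpha, gamma, sigma))

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0

# Arbitrary illustrative parameter values.
alpha, gamma, sigma = 1.2, -0.3, 0.9
area = simpson(lambda y: ushn_pdf(y, alpha, gamma, sigma), 0.2, 0.7)
gap = ushn_cdf(0.7, alpha, gamma, sigma) - ushn_cdf(0.2, alpha, gamma, sigma)
print(area, gap)
```

In the regression setting (7), the same functions would apply with γ replaced by the linear predictor of the ith observation.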

The main objective of this work is to introduce a new multivariate probability distribution capable of modeling data in the region

{(0, 1)}^{p}

. The new distribution is obtained from the extension of the univariate skewed unit-sinh-normal distribution and to do so, we rely on the conditionally specified distributions methodology introduced by Arnold. In addition, from the new distribution, we propose the multivariate unit-sinh-normal skewed regression model, which allows modeling data in the region

{(0, 1)}^{p}

through linear predictors. The new proposals are useful for the analysis of data on proportions, rates, or indices that arise in different fields of knowledge, such as those described at the beginning of this section. The results of a simulation presented in this work also show that these methodologies are viable alternatives to those existing in the current statistical literature.

This paper is organized as follows. In Section 2, the multivariate skew-normal distribution is reviewed, and its main properties are discussed. In Section 3, the new multivariate skewed unit-sinh-normal distribution is introduced. Some properties are also derived, and the value of the correlation coefficient for the bivariate case is presented for some selected values of the distribution parameters. Section 4 presents the extension of the USHN to the case of the multivariate regression model and its respective statistical inference. Finally, two applications with real data that illustrate the applicability of the proposed methodologies and a small simulation study are presented in Section 5.

2. Multivariate Skew-Normal Distribution

The multivariate skew-normal (SN) distribution was studied by Arnold et al. [20] by using the theory of conditionally specified distributions; see [21]. The construction of the multivariate SN distribution is as follows: for each

j = 1, 2, \dots, p,

define the vector

Z_{(j)}

to be the

(p - 1)

dimensional random vector obtained from

Z

by deleting

Z_{j} .

In parallel, for a real vector

z = {(z_{1}, z_{2}, \dots, z_{p})}^{⊤}

,

z_{(j)}

is obtained by deleting the jth coordinate

z_{j}

of

z

. Now, suppose that, for each

j = 1, 2, \dots, p,

the conditional distribution of

Z_{j}

given

Z_{(j)} = z_{(j)}

is a SN distribution with a parameter which is a function of

z_{(j)} .

Thus, it is assumed for each j that

Z_{j} ∣ Z_{(j)} = z_{(j)} \sim SN (λ \prod_{j^{'} \neq j} z_{j^{'}}) .

The joint pdf of

Z = (Z_{1}, Z_{2}, \dots, Z_{p})

is given by

f_{Z} (z) = 2 (\prod_{j = 1}^{p} ϕ (z_{j})) Φ (λ \prod_{j = 1}^{p} z_{j}) .

(8)

In the distribution (8), the marginal distributions are standard normal, that is,

Z_{j} \sim N (0, 1)

for

j = 1, 2, \dots, p

, and the conditional distribution of each coordinate given the others is an SN distribution (see Azzalini [22]) with parameter

λ \prod_{j^{'} \neq j} z_{j^{'}}

, with the pdf given by:

f (z_{j} ∣ Z_{(j)} = z_{(j)}) = 2 ϕ (z_{j}) Φ ((λ \prod_{j^{'} \neq j} z_{j^{'}}) z_{j}) .
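The statement that the marginals of (8) are standard normal can be illustrated numerically in the bivariate case. The following sketch (hypothetical function names, arbitrary values of z1 and λ) integrates the joint pdf over the second coordinate with a Simpson rule and compares the result with the standard normal pdf.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def bvsn_pdf(z1, z2, lam):
    """Joint pdf (8) for p = 2: 2 * phi(z1) * phi(z2) * Phi(lam * z1 * z2)."""
    return 2.0 * norm_pdf(z1) * norm_pdf(z2) * norm_cdf(lam * z1 * z2)

def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0

def marginal_z1(z1, lam):
    """Integrate z2 out of the joint pdf; the tails beyond |z2| = 10 are negligible."""
    return simpson(lambda z2: bvsn_pdf(z1, z2, lam), -10.0, 10.0)

# Arbitrary illustrative values.
z1, lam = 0.7, 3.0
numeric = marginal_z1(z1, lam)
exact = norm_pdf(z1)
print(numeric, exact)
```

The agreement does not depend on λ, because Φ(c z2) + Φ(−c z2) = 1 for any constant c, so the skewing factor integrates out against the symmetric normal density.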

From this distribution, Lemonte et al. [23] presented the multivariate Birnbaum–Saunders distribution, whose joint pdf is given by:

f_{T_{1}, \dots, T_{p}} (t_{1}, \dots, t_{p}) = 2 \prod_{j = 1}^{p} ϕ (a_{t_{j}}) Φ (λ \prod_{j = 1}^{p} a_{t_{j}}) \prod_{j = 1}^{p} \frac{t_{j}^{- 3 / 2} (t_{j} + η_{j})}{2 α_{j} \sqrt{η_{j}}}, t_{1}, \dots, t_{p} > 0,

(9)

where

a_{t_{j}} (α_{j}, η_{j}) = a_{t_{j}} = \frac{1}{α_{j}} (\sqrt{\frac{t_{j}}{η_{j}}} - \sqrt{\frac{η_{j}}{t_{j}}}),

(10)

for

j = 1, 2, \dots, p

, with

α_{j} > 0

and

η_{j} > 0

being the shape and scale parameters, respectively. The distribution (9) is denoted by

MVBS (α, η, λ) .

Another extension based on the multivariate SN model of Arnold et al. [20] is the multivariate asymmetric SHN distribution, studied by Martínez-Flórez et al. [24], whose joint pdf is given by:

f_{Y_{1}, \dots, Y_{p}} (y_{1}, \dots, y_{p}) = 2 (\prod_{j = 1}^{p} b_{j}^{'}) (\prod_{j = 1}^{p} ϕ (b_{j})) Φ (λ \prod_{j = 1}^{p} b_{j}), y_{1}, \dots, y_{p} \in R,

(11)

where

b_{j} = \frac{2}{α_{j}} sinh (\frac{y_{j} - γ_{j}}{σ_{j}})

, and

b_{j}^{'} = \frac{2}{α_{j} σ_{j}} cosh (\frac{y_{j} - γ_{j}}{σ_{j}})

for

j = 1, 2, \dots, p

is the derivative of

b_{j}

with respect to

y_{j}

;

α_{j} > 0

and

σ_{j} > 0

are shape and scale parameters, respectively, and

γ_{j}, λ \in R

are location and asymmetry parameters, respectively. The distribution in (11) is denoted by

MVSHN (α, γ, σ, λ),

with

α = {(α_{1}, \dots, α_{p})}^{⊤},

γ = {(γ_{1}, \dots, γ_{p})}^{⊤}

and

σ = {(σ_{1}, \dots, σ_{p})}^{⊤} .

Although MVBS and MVSHN distributions, which are defined in

R^{2 p + 1}

and

R^{3 p + 1},

respectively, can be used to fit sets of random variables whose domain is the unit interval

(0, 1)

, they are not appropriate because their support does not match that of a bounded random vector. In the statistical literature, few distributions have been studied for fitting sets of variables in the unit interval, that is, distributions whose domain of definition is

{(0, 1)}^{p}

, which can be useful to fit rates and proportions. Interest in this type of distribution has been limited; we highlight the works of Cepeda et al. [25], Souza and Moura [26] and Lemonte and Moreno-Arenas [27], among others.

3. Multivariate Skewed Unit-Sinh-Normal Distribution

Following the idea of Arnold et al. [20], Lemonte et al. [23] and Martínez-Flórez et al. [24], in this section, a multivariate extension of the USHN distribution to fit vectors of rates and proportions is proposed, named the multivariate skewed USHN (MVSUSHN) distribution. The construction of the MVSUSHN distribution is as follows: for

j = 1, 2, \dots, p

, let

Y_{j} = 1 - exp (- exp (γ_{j} + σ_{j} {sinh}^{- 1} (\frac{α_{j} Z_{j}}{2}))),

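where

Z = {(Z_{1}, Z_{2}, \dots, Z_{p})}^{⊤}

follows the multivariate SN distribution in (8), so that marginally

Z_{j} \sim N (0, 1)

for

j = 1, 2, \dots, p .

Then, the joint pdf of a random vector with the MVSUSHN distribution is given by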

f_{Y_{1}, \dots, Y_{p}} (y_{1}, \dots, y_{p}) = 2 (\prod_{j = 1}^{p} b_{j}^{'}) (\prod_{j = 1}^{p} ϕ (b_{j})) Φ (λ \prod_{j = 1}^{p} b_{j}), y_{1}, \dots, y_{p} \in (0, 1),

(12)

where

b_{j} = \frac{2}{α_{j}} sinh (\frac{log (- log (1 - y_{j})) - γ_{j}}{σ_{j}})

and

b_{j}^{'} = \frac{2}{α_{j} σ_{j} (1 - y_{j}) (- log (1 - y_{j}))} cosh (\frac{log (- log (1 - y_{j})) - γ_{j}}{σ_{j}})

for

j = 1, 2, \dots, p

is the derivative of

b_{j}

with respect to

y_{j}

;

α_{j} > 0

and

σ_{j} > 0

are shape and scale parameters, and

γ_{j}

and

λ \in R

are location and asymmetry parameters, respectively. The MVSUSHN is denoted as

MVSUSHN (α, γ, σ, λ),

with

α = {(α_{1}, \dots, α_{p})}^{⊤},

γ = {(γ_{1}, \dots, γ_{p})}^{⊤}

and

σ = {(σ_{1}, \dots, σ_{p})}^{⊤}

. For

λ = 0

, the case of independence is obtained, that is, the joint pdf is the product of the pdfs of the USHN random variables studied by Martínez-Flórez et al. [3]. It follows that the parameter

λ

is directly associated with the correlation between the components. Figure 1 shows the contours of the bivariate skewed USHN (BVSUSHN) distribution for some selected values of the parameters, while Figure 2 presents the shape of the density function for particular values of the parameters of the BVSUSHN distribution.
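The construction of the MVSUSHN distribution suggests a simple way to simulate from it. The sketch below is one possible approach under stated assumptions, not the authors' procedure: it draws Z from the multivariate SN density (8) by rejection sampling (propose independent standard normals, accept with probability Φ(λ ∏ z_j), which is valid because the target density is at most twice the proposal density) and then applies the transformation defining Y_j. All function names and parameter values are hypothetical.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def draw_sn_vector(p, lam, rng):
    """Rejection sampler for the density 2 * prod(phi(z_j)) * Phi(lam * prod(z_j))."""
    while True:
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        if rng.random() < norm_cdf(lam * math.prod(z)):
            return z

def z_to_y(z, alpha, gamma, sigma):
    """Y = 1 - exp(-exp(gamma + sigma * asinh(alpha * z / 2)))."""
    x = gamma + sigma * math.asinh(alpha * z / 2.0)
    return -math.expm1(-math.exp(x))

def y_to_b(y, alpha, gamma, sigma):
    """Inverse map: b(y) = (2/alpha) * sinh((log(-log(1 - y)) - gamma) / sigma)."""
    x = math.log(-math.log1p(-y))
    return (2.0 / alpha) * math.sinh((x - gamma) / sigma)

def draw_mvsushn(alpha, gamma, sigma, lam, rng):
    """Return one draw y in (0, 1)^p together with the underlying z."""
    z = draw_sn_vector(len(alpha), lam, rng)
    y = [z_to_y(zj, a, g, s) for zj, a, g, s in zip(z, alpha, gamma, sigma)]
    return y, z

# Arbitrary illustrative parameter values for p = 2.
rng = random.Random(11)
alpha, gamma, sigma, lam = (0.5, 1.5), (0.0, 0.3), (1.0, 0.8), 2.0
draws = [draw_mvsushn(alpha, gamma, sigma, lam, rng) for _ in range(200)]
print(draws[0])
```

Each simulated vector lies in the unit square, and applying `y_to_b` to a draw recovers the underlying z, which is the quantity b_j appearing in (12).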

The following theorem provides the marginal and conditional distributions of the MVSUSHN distribution.

Theorem 1.

If

(Y_{1}, Y_{2}, \dots, Y_{p}) \sim MVSUSHN (α, γ, σ, λ)

then,

(1): $Y_{j} \sim USHN (α_{j}, γ_{j}, σ_{j})$ for $j = 1, 2, \dots, p .$
(2): The conditional pdf of $Y_{j} ∣ Y_{(j)} = y_{(j)}$ is given by

$f_{Y_{j} ∣ Y_{(j)}} (y_{j} ∣ Y_{(j)} = y_{(j)}) = 2 b_{j}^{'} ϕ (b_{j}) Φ (λ \prod_{k = 1}^{p} b_{k}) .$

(13)
(3): The cumulative distribution function (cdf) of $Y_{j} ∣ Y_{(j)} = y_{(j)}$ is given by

$P (Y_{j} \leq y_{j} ∣ Y_{(j)} = y_{(j)}) = Φ (b_{j}) - 2 T (b_{j}, λ \prod_{j^{'} \neq j} b_{j^{'}}),$

(14)

where $T (\cdot, \cdot)$ is Owen's T function; see [28].

Proof.

(1): Integrating the joint pdf (12) over all coordinates $y_{j^{'}}$ with $j^{'} \neq j$, we obtain

$\begin{matrix} f_{Y_{j}} (y_{j}) & = & \int_{(0, 1)} \overset{j^{'} \neq j}{\dots} \int_{(0, 1)} 2 (\prod_{k = 1}^{p} b_{k}^{'} ϕ (b_{k})) Φ (λ \prod_{k = 1}^{p} b_{k}) \prod_{j^{'} \neq j} d y_{j^{'}} \\ & = & b_{j}^{'} ϕ (b_{j}) \int_{(0, 1)} \overset{j^{'} \neq j}{\dots} \int_{(0, 1)} 2 (\prod_{j^{'} \neq j} b_{j^{'}}^{'} ϕ (b_{j^{'}})) Φ ((λ b_{j}) \prod_{j^{'} \neq j} b_{j^{'}}) \prod_{j^{'} \neq j} d y_{j^{'}} . \end{matrix}$

Now, using the transformation $z_{j^{'}} = \frac{2}{α_{j^{'}}} sinh (\frac{log (- log (1 - y_{j^{'}})) - γ_{j^{'}}}{σ_{j^{'}}})$ for all $j^{'} \neq j$,

$\begin{matrix} f_{Y_{j}} (y_{j}) & = & b_{j}^{'} ϕ (b_{j}) \int_{R} \overset{j^{'} \neq j}{\dots} \int_{R} 2 (\prod_{j^{'} \neq j} ϕ (z_{j^{'}})) Φ ((λ b_{j}) \prod_{j^{'} \neq j} z_{j^{'}}) \prod_{j^{'} \neq j} d z_{j^{'}} \\ & = & b_{j}^{'} ϕ (b_{j}) (1) \\ & = & b_{j}^{'} ϕ (b_{j}), \end{matrix}$

where the second-to-last equality follows from Arnold et al. [20], since the integrand is a density of the form (8) in dimension $p - 1$ with asymmetry parameter $λ b_{j}$.
(2): Let

$f_{Z_{j} ∣ Z_{(j)}} (z_{j} ∣ Z_{(j)} = z_{(j)}) = 2 ϕ (z_{j}) Φ ((λ \prod_{j^{'} \neq j} z_{j^{'}}) z_{j}),$

then, with the transformation $Y_{j} = 1 - exp (- exp (γ_{j} + σ_{j} {sinh}^{- 1} (\frac{α_{j} Z_{j}}{2})))$ , it is found that $Z_{j} = \frac{2}{α_{j}} sinh (\frac{log (- log (1 - Y_{j})) - γ_{j}}{σ_{j}}) = b_{j}$ and $\frac{d Z_{j}}{d Y_{j}} = b_{j}^{'}$ and, by the change-of-variables theorem, it follows that

$f_{Y_{j} ∣ Y_{(j)}} (y_{j} ∣ Y_{(j)} = y_{(j)}) = 2 b_{j}^{'} ϕ (b_{j}) Φ (λ \prod_{k = 1}^{p} b_{k}) .$
(3): We have

$P (Y_{j} \leq y_{j} ∣ Y_{(j)} = y_{(j)}) = \int_{0}^{y_{j}} f_{Y_{j} ∣ Y_{(j)}} (t_{j} ∣ Y_{(j)} = y_{(j)}) d t_{j} .$

Through the transformation $z_{j} = \frac{2}{α_{j}} sinh (\frac{log (- log (1 - t_{j})) - γ_{j}}{σ_{j}})$ , which maps $(0, y_{j})$ onto $(- \infty, b_{j})$ , it follows that

$\begin{matrix} P (Y_{j} \leq y_{j} ∣ Y_{(j)} = y_{(j)}) & = \int_{- \infty}^{b_{j}} 2 ϕ (z_{j}) Φ ((λ \prod_{j^{'} \neq j} b_{j^{'}}) z_{j}) d z_{j} \\ & = Φ (b_{j}) - 2 T (b_{j}, λ \prod_{j^{'} \neq j} b_{j^{'}}), \end{matrix}$

where the last equality follows from the well-known expression for the cdf of the SN distribution.

□
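The conditional cdf in (14) can be checked numerically. The sketch below assumes the standard integral representation of Owen's function, T(h, a) = (1/(2π)) ∫_0^a exp(−h²(1 + x²)/2)/(1 + x²) dx, and compares Φ(b) − 2T(b, c) with a direct quadrature of the SN density 2ϕ(z)Φ(cz) over (−∞, b], where c plays the role of λ ∏_{j' ≠ j} b_{j'}. The function names and the test values of b and c are hypothetical.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def simpson(f, lo, hi, n=6000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0

def owen_t(h, a):
    """Owen's T(h, a) by quadrature of its integral representation on [0, a]."""
    integrand = lambda x: math.exp(-0.5 * h * h * (1.0 + x * x)) / (1.0 + x * x)
    return simpson(integrand, 0.0, a) / (2.0 * math.pi)

def cond_cdf_owen(b, c):
    """Right-hand side of (14): Phi(b) - 2 T(b, c)."""
    return norm_cdf(b) - 2.0 * owen_t(b, c)

def cond_cdf_quadrature(b, c):
    """Direct integral of 2 phi(z) Phi(c z) over (-12, b]; the left tail is negligible."""
    return simpson(lambda z: 2.0 * norm_pdf(z) * norm_cdf(c * z), -12.0, b)

# Arbitrary illustrative values of (b, c).
pairs = [(0.4, 1.7), (-0.8, -2.5), (1.5, 0.0)]
results = [(cond_cdf_owen(b, c), cond_cdf_quadrature(b, c)) for b, c in pairs]
for row in results:
    print(row)
```

For c = 0 both expressions reduce to Φ(b), which corresponds to the independence case λ = 0.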

Martínez-Flórez et al. [3] showed that, when

α \to 0

, the random variable

\frac{log (- log (1 - Y)) - γ}{α σ / 2},

converges to a standard normal distribution. From here, if

α_{j} \to 0

with

j = 1, 2, \dots, p

, then

Y \sim MVSN (γ, σ, λ) .

Furthermore, if

X_{j} = log (- log (1 - Y_{j}))

for

j = 1, 2, \dots, p

, then

X \sim MVSHN (α, γ, σ, λ) .

If

Y \sim MVSUSHN (α, γ, σ, λ)

then,

Z_{j} = \frac{2}{α_{j}} sinh (\frac{log (- log (1 - Y_{j})) - γ_{j}}{σ_{j}}) \sim N (0, 1),

for all

j = 1, 2, \dots, p

, then,

Z = {(Z_{1}, Z_{2}, \dots, Z_{p})}^{⊤} \sim MVSN (0, I, λ) .

(see [3]). It can be seen that, if

σ_{1} = σ_{2} = \dots = σ_{p} = 2

, the multivariate standard USHN distribution follows, which is denoted by

MVSUSHN (α, γ, 2 1_{p}, λ) .

In this case, the marginals are standard USHN, and the variables

X_{j} = log (- log (1 - Y_{j}))

for

j = 1, 2, \dots, p

follow the SHN distribution of Rieck and Nedelman [8].

To study the unimodality of the distribution, let

p = 2

, and suppose that

α_{1} = α_{2},

γ_{1} = γ_{2} = 0

, and

σ_{1} = σ_{2} = 1

. Differentiating the logarithm of the conditional densities gives

\begin{matrix} \frac{\partial log f (y_{2} ∣ y_{1})}{\partial y_{2}} & = (1 + log (1 - y_{2})) + \frac{b_{y_{2}}}{b_{y_{2}}^{'}} - b_{y_{2}}^{'} b_{y_{2}} + λ b_{y_{2}}^{'} b_{y_{1}} \frac{ϕ (λ b_{y_{1}} b_{y_{2}})}{Φ (λ b_{y_{1}} b_{y_{2}})}, \\ \frac{\partial log f (y_{1} ∣ y_{2})}{\partial y_{1}} & = (1 + log (1 - y_{1})) + \frac{b_{y_{1}}}{b_{y_{1}}^{'}} - b_{y_{1}}^{'} b_{y_{1}} + λ b_{y_{1}}^{'} b_{y_{2}} \frac{ϕ (λ b_{y_{1}} b_{y_{2}})}{Φ (λ b_{y_{1}} b_{y_{2}})}, \end{matrix}

and, setting these derivatives equal to zero, the following equations are obtained:

\frac{b_{y_{2}}}{b_{y_{2}}^{'}} - b_{y_{2}}^{'} b_{y_{2}} + λ b_{y_{2}}^{'} b_{y_{1}} \frac{ϕ (λ b_{y_{1}} b_{y_{2}})}{Φ (λ b_{y_{1}} b_{y_{2}})} = - (1 + log (1 - y_{2}))

(15)

\frac{b_{y_{1}}}{b_{y_{1}}^{'}} - b_{y_{1}}^{'} b_{y_{1}} + λ b_{y_{1}}^{'} b_{y_{2}} \frac{ϕ (λ b_{y_{1}} b_{y_{2}})}{Φ (λ b_{y_{1}} b_{y_{2}})} = - (1 + log (1 - y_{1}))

(16)

Multiplying the equation in (15) by

b_{y_{2}} b_{y_{1}}^{^{'} 2}

and the equation in (16) by

b_{y_{1}} b_{y_{2}}^{^{'} 2}

, and subtracting these two results, it follows that

\begin{matrix} b_{y_{2}}^{2} b_{y_{1}}^{^{'} 2} - b_{y_{2}}^{2} b_{y_{2}}^{^{'} 2} b_{y_{1}}^{^{'} 2} - b_{y_{2}}^{^{'} 2} b_{y_{1}}^{2} + b_{y_{2}}^{^{'} 2} b_{y_{1}}^{^{'} 2} b_{y_{1}}^{2} \\ = (1 + log (1 - y_{1})) b_{y_{2}}^{^{'} 2} b_{y_{1}}^{^{'}} b_{y_{1}} - (1 + log (1 - y_{2})) b_{y_{2}} b_{y_{2}}^{^{'}} b_{y_{1}}^{^{'} 2} . \end{matrix}

(17)

Note that by letting

y_{2} = y_{1}

and substituting in (17), it follows:

\begin{matrix} b_{y_{1}}^{2} b_{y_{1}}^{^{'} 2} - b_{y_{1}}^{2} b_{y_{1}}^{^{'} 4} - b_{y_{1}}^{^{'} 2} b_{y_{1}}^{2} + b_{y_{1}}^{^{'} 4} b_{y_{1}}^{2} & = (1 + log (1 - y_{1})) b_{y_{1}}^{^{'} 3} b_{y_{1}} - (1 + log (1 - y_{1})) b_{y_{1}} b_{y_{1}}^{^{'} 3} \\ 0 & = 0 \end{matrix}

Therefore,

y_{1} = y_{2}

is a trivial solution of Equation (17). Then, by replacing

y_{1} = y_{2}

in (15), we obtain

b_{y_{2}} (1 - b_{y_{2}}^{^{'}}) Φ (λ b_{y_{2}}) + (1 + log (1 - y_{2})) b_{y_{2}}^{^{'}} Φ (λ b_{y_{2}}) + λ b_{y_{2}} b_{y_{2}}^{^{'}} ϕ (λ b_{y_{2}}) = 0

which defines the function

g (y_{2}; λ) = b_{y_{2}} (1 - b_{y_{2}}^{^{'}}) Φ (λ b_{y_{2}}) + (1 + log (1 - y_{2})) b_{y_{2}}^{^{'}} Φ (λ b_{y_{2}}) + λ b_{y_{2}} b_{y_{2}}^{^{'}} ϕ (λ b_{y_{2}}) .

Then, by applying the transformation

Y_{j} = 1 - exp (- exp (σ_{j} arcsinh (α_{j} Z_{j} / 2) + γ_{j}))

, the bivariate distribution with conditional SN distributions is obtained.

The bivariate distribution with conditional asymmetric USHN distributions is a one-to-one transformation of it. The

λ

values for which the BVSUSHN distribution is unimodal are the same as those for which the bivariate SN distribution is unimodal and, according to Arnold et al. [20], the bivariate SN distribution is unimodal for

| λ | \leq \sqrt{π / 2}

. One can note that the function

g (y_{2}; λ)

is similar to the one found by Arnold et al. [20] for the BVSN distribution. The modes of the BVSUSHN distribution can be obtained by solving the equations

g (y_{2}; λ) = 0

and

y_{1} - y_{2} = 0 .

Moments and Correlation

The covariance for the random variables

Y_{j}

and

Y_{j^{'}}

is given by:

cov (Y_{j}, Y_{j^{'}}; λ) = E (Y_{j} Y_{j^{'}}) - E (Y_{j}) E (Y_{j^{'}})

where the product moment

E (Y_{j} Y_{j^{'}})

for two random variables is given by

E (Y_{j} Y_{j^{'}}) = 2 \int_{(0, 1)} \int_{(0, 1)} y_{j} y_{j^{'}} b_{y_{j}}^{'} b_{y_{j^{'}}}^{'} ϕ (b_{y_{j}}) ϕ (b_{y_{j^{'}}}) Φ (λ b_{y_{j}} b_{y_{j^{'}}}) d y_{j} d y_{j^{'}}

and

E (Y^{r}) = \sum_{j = 0}^{r} \sum_{l = 0}^{\infty} (\binom{r}{j}) \frac{{(- 1)}^{j + l} {(j e^{γ})}^{l}}{l!} [\frac{k_{a_{1}} (α^{- 2}) + k_{b_{1}} (α^{- 2})}{k_{1 / 2} (α^{- 2})}]

(18)

with

a = \frac{r σ + 1}{2}

,

b = \frac{r σ - 1}{2}

, and

k_{λ} (\cdot)

being the modified Bessel function of the third kind, defined by

k_{λ} (v) = \frac{1}{2} {(\frac{v}{2})}^{λ} \int_{0}^{\infty} u^{- λ - 1} e^{- u - \frac{v^{2}}{4 u}} d u .

(19)

A proof of the result in (18) is given in Appendix A. Using (18), the variances

var (Y_{i}; λ) = E (Y_{i}^{2}) - E^{2} (Y_{i})

are obtained for

i = j, j^{'}

; thus, the correlation coefficient is obtained from:

cor (Y_{j}, Y_{j^{'}}; λ) = \frac{cov (Y_{j}, Y_{j^{'}}; λ)}{\sqrt{var (Y_{j}; λ) var (Y_{j^{'}}; λ)}} .

To compute

cor (Y_{j}, Y_{j^{'}}; λ)

, it is necessary to use numerical methods to determine the simple moments and the product moments. It can be shown that for this distribution

cov (Y_{j}, Y_{j^{'}}; - λ) = - cov (Y_{j}, Y_{j^{'}}; λ)

, whereby

cor (Y_{j}, Y_{j^{'}}; - λ) = - cor (Y_{j}, Y_{j^{'}}; λ)

. To study the range of the correlation coefficient,

cor (Y_{j}, Y_{j^{'}}; λ)

was evaluated for some parameter values, taking

σ_{1} = σ_{2} = 1

and

γ_{1} = γ_{2} = 0

, and varying

α_{1}

,

α_{2}

, and

λ

. Table 1 shows the values of the parameters

α_{1},

α_{2}

,

λ

, and the values of

cor (Y_{j}, Y_{j^{'}}; λ)

for a pair of variables

Y_{j}

and

Y_{j^{'}} .

The values were obtained for the case of

λ > 0

; the case of

λ < 0

follows by symmetry, given that

cor (Y_{j}, Y_{j^{'}}; - λ) = - cor (Y_{j}, Y_{j^{'}}; λ) .

As can be seen, the case of independence occurs for

λ = 0

, i.e.,

cor (Y_{j}, Y_{j^{'}}; 0) = 0

. According to Table 1, the computed values satisfy

| cor (Y_{j}, Y_{j^{'}}; λ) | \leq 0.9938

, which leads us to the conclusion that for this model

| cor (Y_{j}, Y_{j^{'}}; λ) | \leq 1.0

.
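The numerical evaluation of the correlation coefficient described in this subsection can be sketched as follows. This is an illustrative implementation under stated assumptions, not the procedure used to produce Table 1: the expectations are computed in the z-scale, where Y_j is a known function of Z_j and the joint density is 2ϕ(z1)ϕ(z2)Φ(λ z1 z2), with a Simpson grid on [−8, 8]. Function names and parameter values are hypothetical.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def z_to_y(z, alpha, gamma, sigma):
    """Y = 1 - exp(-exp(gamma + sigma * asinh(alpha * z / 2)))."""
    return -math.expm1(-math.exp(gamma + sigma * math.asinh(alpha * z / 2.0)))

def simpson_grid(lo, hi, n):
    """Nodes and weights of the composite Simpson rule (n even)."""
    h = (hi - lo) / n
    nodes = [lo + k * h for k in range(n + 1)]
    weights = [(1 if k in (0, n) else 4 if k % 2 else 2) * h / 3.0
               for k in range(n + 1)]
    return nodes, weights

def bvsushn_cor(alpha, gamma, sigma, lam, n=200):
    """Correlation of (Y1, Y2) under the BVSUSHN model, by quadrature."""
    nodes, weights = simpson_grid(-8.0, 8.0, n)
    dens = [w * norm_pdf(z) for z, w in zip(nodes, weights)]
    y1 = [z_to_y(z, alpha[0], gamma[0], sigma[0]) for z in nodes]
    y2 = [z_to_y(z, alpha[1], gamma[1], sigma[1]) for z in nodes]
    m1 = sum(d * y for d, y in zip(dens, y1))
    m2 = sum(d * y for d, y in zip(dens, y2))
    v1 = sum(d * y * y for d, y in zip(dens, y1)) - m1 * m1
    v2 = sum(d * y * y for d, y in zip(dens, y2)) - m2 * m2
    e12 = 0.0
    for z1, d1, a in zip(nodes, dens, y1):
        for z2, d2, b in zip(nodes, dens, y2):
            e12 += 2.0 * d1 * d2 * a * b * norm_cdf(lam * z1 * z2)
    return (e12 - m1 * m2) / math.sqrt(v1 * v2)

# sigma and gamma as in the text; alpha and lambda are arbitrary choices.
alpha, gamma, sigma = (0.5, 1.0), (0.0, 0.0), (1.0, 1.0)
cor_pos = bvsushn_cor(alpha, gamma, sigma, 2.0)
cor_neg = bvsushn_cor(alpha, gamma, sigma, -2.0)
cor_zero = bvsushn_cor(alpha, gamma, sigma, 0.0)
print(cor_pos, cor_neg, cor_zero)
```

The output illustrates the two properties stated above: the correlation vanishes at λ = 0, and changing the sign of λ changes the sign of the correlation.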

4. Multivariate Skewed USHN Regression Model

This section presents an extension of the USHN regression model for the case of multiple bounded response variables (rates and proportions). Suppose that we have q variables measuring rates or proportions in a sample of size

n,

i.e., for

i = 1, 2, \dots, n

, we have the vector of dimension

q \times 1

y_{i} = {(y_{i 1}, y_{i 2}, \dots, y_{i q})}^{⊤} .

Assume also that there are p explanatory variables

X_{1}, X_{2}, \dots, X_{p}

and that, for

i = 1, 2, \dots, n

,

X_{i} = {(x_{i 1}, x_{i 2}, \dots, x_{i q})}^{⊤}

is the

q \times p

matrix associated with the ith observed response

y_{i}

, where

x_{i j} = {(x_{i j 1}, x_{i j 2}, \dots, x_{i j p})}^{⊤},

for

j = 1, 2, \dots, q

, is a p dimensional vector of values of the explanatory variables. For the vector of response variables, we use the operator

vec (\cdot)

, which transforms matrices into a column vector from the columns of the matrix. Therefore,

y = vec (y_{1}, y_{2}, \dots, y_{n}) .

Thus, the MVSUSHN regression model is given by

Z_{i} = X_{i} β + ε_{i}, i = 1, 2, \dots, n,

(20)

where

z_{i j} = log (- log (1 - y_{i j})),

β

is a vector of unknown parameters of dimension p, and the vectors

ε_{i}

for

i = 1, 2, \dots, n

are vectors of independent and identically distributed random variables such that

ε_{i} \sim MVSHN (α, 0_{q}, Σ, λ)

where

Σ = diag (σ_{1}, σ_{2}, \dots, σ_{q}),

and

0_{q}

, a vector of zeros of dimension

q .

It follows that,

z_{i} \sim MVSHN (α, X_{i} β, Σ, λ) .

From this result, it follows from Theorem 1 that

y_{i j} \sim USHN (α_{j}, x_{i j}^{⊤} β_{j}, σ_{j})

for

j = 1, 2, \dots, q

i.e., each marginal follows a USHN regression model. Thus, defining

X = diag (X_{1}, X_{2}, \dots, X_{q})

, where

X_{j}

for

j = 1, 2, \dots, q

is a matrix of size

n \times p_{j}

and

X

of dimension

n q \times (p_{1} + p_{2} + \dots + p_{q})

, then the MVSUSHN regression model can be represented by

Z = X β + ε,

(21)

where

Z_{i j} = X_{i j} β_{j} + ε_{i j}

,

i = 1, 2, \dots, n

,

j = 1, 2, \dots, q

taking values

z_{i j} = log (- log (1 - y_{i j})),

with

β = vec (β_{1}, β_{2}, \dots, β_{q})

with

β_{j}

a vector of dimension

p_{j} \times 1

, i.e.,

β

is a vector of dimension

p \times 1

being

p = p_{1} + p_{2} + \dots + p_{q},

ε = vec (ε_{1}, ε_{2}, \dots, ε_{q})

is an error vector with

ε_{j} \sim MVSHN (α, 0_{n q}, Σ, λ)

where

ε_{j} = (ε_{1 j}, ε_{2 j}, \dots, ε_{n j})

with

ε_{i j} \sim SHN (α_{j}, 0, σ_{j})

it follows that

z_{i j} \sim SHN (α_{j}, X_{i j} β_{j}, σ_{j})

for

j = 1, 2, \dots, q .

Statistical Inference

For

i = 1, 2, \dots, n

and

j = 1, 2, \dots, q

, define:

δ_{1} = (δ_{11}, δ_{21}, \dots, δ_{n 1}),

δ_{2} = (δ_{12}, δ_{22}, \dots, δ_{n 2})

and

δ_{3} = (δ_{13}, δ_{23}, \dots, δ_{n 3})

; with

δ_{i 1} = (δ_{i 11}, δ_{i 12}, \dots, δ_{i 1 q}),

δ_{i 2} = (δ_{i 21}, δ_{i 22}, \dots, δ_{i 2 q})

and

δ_{i 3} = (δ_{i 31}, δ_{i 32}, \dots, δ_{i 3 q})

where

δ_{i 1 j} = \frac{2}{α_{j}} cosh (\frac{z_{i j} - x_{i j}^{⊤} β_{j}}{σ_{j}}), δ_{i 2 j} = \frac{2}{α_{j}} sinh (\frac{z_{i j} - x_{i j}^{⊤} β_{j}}{σ_{j}})

and

δ_{i 3 j} = Φ (λ \prod_{j^{'} = 1}^{q} δ_{i 2 j^{'}}),

for

j = 1, 2, \dots, q

. The log-likelihood function for the parameter vector

θ = {(α, β, Σ, λ)}^{⊤}

is

ℓ (θ) = \sum_{i = 1}^{n} ℓ_{i} (θ),

where

\begin{matrix} ℓ_{i} (θ) & = log (2) - \frac{q}{2} log (2 π) - \sum_{j = 1}^{q} log (σ_{j}) - \sum_{j = 1}^{q} log (- (1 - y_{i j}) log (1 - y_{i j})) \\ & + \sum_{j = 1}^{q} log (δ_{i 1 j}) - \frac{1}{2} \sum_{j = 1}^{q} δ_{i 2 j}^{2} + log Φ (λ \prod_{j = 1}^{q} δ_{i 2 j}) . \end{matrix}

To obtain the score function, denoted

U (θ),

we took the derivative of the log-likelihood function with respect to each of the parameters, so the elements of the score function are given by

\begin{matrix} U (β_{j k}) & = \frac{1}{σ_{j}} \sum_{i = 1}^{n} x_{i j k} (δ_{i 1 j} δ_{i 2 j} - \frac{δ_{i 2 j}}{δ_{i 1 j}}) - \frac{λ}{σ_{j}} \sum_{i = 1}^{n} x_{i j k} δ_{i 1 j} ω_{i} (\prod_{j^{'} \neq j} δ_{i 2 j^{'}}), k = 1, \dots, p_{j}, \end{matrix}

(22)

\begin{matrix} U (α_{j}) & = - \frac{n}{α_{j}} + \frac{1}{α_{j}} \sum_{i = 1}^{n} δ_{i 2 j}^{2} - \frac{λ}{α_{j}} \sum_{i = 1}^{n} ω_{i} (\prod_{j = 1}^{q} δ_{i 2 j}), \end{matrix}

(23)

\begin{matrix} U (σ_{j}) & = - \frac{n}{σ_{j}} - \frac{1}{σ_{j}} \sum_{i = 1}^{n} v_{i j} tanh (v_{i j}) + \frac{1}{σ_{j}} \sum_{i = 1}^{n} v_{i j} δ_{i 1 j} δ_{i 2 j} - \frac{λ}{σ_{j}} \sum_{i = 1}^{n} v_{i j} δ_{i 1 j} ω_{i} (\prod_{j^{'} \neq j} δ_{i 2 j^{'}}), \end{matrix}

(24)

\begin{matrix} U (λ) & = \sum_{i = 1}^{n} ω_{i} (\prod_{j = 1}^{q} δ_{i 2 j}), \end{matrix}

(25)

where

v_{i j} = (log (- log (1 - y_{i j})) - x_{i j}^{⊤} β_{j}) / σ_{j}

, for

i = 1, \dots, n

, and

j = 1, 2, \dots, q

, and

ω_{i} = ϕ (λ \prod_{j = 1}^{q} δ_{i 2 j}) / Φ (λ \prod_{j = 1}^{q} δ_{i 2 j})

.
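A practical way to guard against algebra slips in score functions is to compare them with finite differences of the log-likelihood. The sketch below does this for the α_j and λ components in (23) and (25), under simplifying assumptions that are not in the paper: q = 2, an intercept-only predictor (so each β_j is a scalar), a small made-up data set, and a log-likelihood written directly from the joint pdf (12). All names are hypothetical.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def v_and_delta2(yi, alpha, beta, sigma):
    """Per component: v = (log(-log(1-y)) - beta)/sigma and delta2 = (2/alpha) sinh(v)."""
    out = []
    for y, a, b, s in zip(yi, alpha, beta, sigma):
        v = (math.log(-math.log1p(-y)) - b) / s
        out.append((v, (2.0 / a) * math.sinh(v)))
    return out

def loglik(data, alpha, beta, sigma, lam):
    """Log of the joint pdf (12), summed over observations."""
    total = 0.0
    for yi in data:
        prod = 1.0
        for (v, d2), y, a, s in zip(v_and_delta2(yi, alpha, beta, sigma), yi, alpha, sigma):
            jac = (1.0 - y) * (-math.log1p(-y))
            total += math.log(2.0 * math.cosh(v) / (a * s * jac))    # log b'_j
            total += -0.5 * math.log(2.0 * math.pi) - 0.5 * d2 * d2  # log phi(b_j)
            prod *= d2
        total += math.log(2.0 * norm_cdf(lam * prod))
    return total

def omega_times_prod(yi, alpha, beta, sigma, lam):
    prod = math.prod(d2 for _, d2 in v_and_delta2(yi, alpha, beta, sigma))
    return prod * norm_pdf(lam * prod) / norm_cdf(lam * prod)

def score_lambda(data, alpha, beta, sigma, lam):
    """U(lambda) as in (25)."""
    return sum(omega_times_prod(yi, alpha, beta, sigma, lam) for yi in data)

def score_alpha(j, data, alpha, beta, sigma, lam):
    """U(alpha_j) as in (23)."""
    sum_sq = sum(v_and_delta2(yi, alpha, beta, sigma)[j][1] ** 2 for yi in data)
    return (-len(data) + sum_sq - lam * score_lambda(data, alpha, beta, sigma, lam)) / alpha[j]

# Made-up data and parameter values, for illustration only.
data = [(0.12, 0.35), (0.48, 0.27), (0.73, 0.81), (0.55, 0.19), (0.31, 0.64), (0.90, 0.58)]
alpha, beta, sigma, lam = (0.9, 1.3), (-0.2, 0.1), (1.1, 0.8), 0.7
h = 1e-5
num_lam = (loglik(data, alpha, beta, sigma, lam + h)
           - loglik(data, alpha, beta, sigma, lam - h)) / (2.0 * h)
num_alpha = (loglik(data, (alpha[0] + h, alpha[1]), beta, sigma, lam)
             - loglik(data, (alpha[0] - h, alpha[1]), beta, sigma, lam)) / (2.0 * h)
print(num_lam, score_lambda(data, alpha, beta, sigma, lam))
print(num_alpha, score_alpha(0, data, alpha, beta, sigma, lam))
```

The same finite-difference comparison can be applied to the β and σ components before they are passed to a numerical optimizer.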

The maximum likelihood estimators (MLEs) of

β_{j 1}, \dots, β_{j p_{j}}, α_{j}

and

σ_{j}

are the solutions to the equations

U (β_{j k}) = 0

,

U (α_{j}) = 0

and

U (σ_{j}) = 0

for

j = 1, 2, \dots, q, k = 1, 2, \dots, p_{j}

, which require numerical procedures. To start the iterative process, the least squares estimates can be used for the

β_{j}

, i.e.,

{\tilde{β}}_{j 0} = {(X_{j}^{⊤} X_{j})}^{- 1} X_{j}^{⊤} z_{j}

, from where

{\hat{σ}}_{j 0}^{2} = \frac{1}{n - p_{j}} \sum_{i = 1}^{n} {(z_{i j} - x_{i j}^{⊤} {\tilde{β}}_{j 0})}^{2}

, while for the

α_{j}

, the initial values can be taken as

{\hat{α}}_{j 0} = \sqrt{{\tilde{α}}_{j}}

, where

{\tilde{α}}_{j} = \frac{4}{n} \sum_{i = 1}^{n} {sinh}^{2} (\frac{z_{i j} - x_{i j}^{⊤} {\tilde{β}}_{j 0}}{{\hat{σ}}_{j 0}})

. With these initial values, and assuming them to be the true values of the parameters, a one-dimensional function for the parameter

λ

can be obtained, whose root can be found by a numerical method such as the uniroot function in R [29].
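The one-dimensional step for λ can be sketched as a root search on the score U(λ) in (25), with the remaining parameters held fixed. The example below uses plain bisection as a stand-in for a root finder such as uniroot; it is an illustration under stated assumptions, not the authors' implementation. The data are simulated by rejection sampling from the bivariate SN density (8), so the quantities δ_{i2j} are taken to be the simulated z_{ij} themselves; the helper names, the seed, and the bracket [−10, 10] are arbitrary choices.

```python
import math
import random

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def zeta(x):
    """phi(x) / Phi(x); for very negative x the tail approximation -x - 1/x is used."""
    if x < -30.0:
        return -x - 1.0 / x
    return norm_pdf(x) / norm_cdf(x)

def score_lambda(lam, prods):
    """U(lambda) = sum_i omega_i * t_i, with t_i the product of the delta_{i2j}."""
    return sum(t * zeta(lam * t) for t in prods)

def solve_lambda(prods, lo=-10.0, hi=10.0, tol=1e-10):
    """Bisection on the score, which is decreasing in lambda."""
    if score_lambda(lo, prods) <= 0.0 or score_lambda(hi, prods) >= 0.0:
        raise ValueError("root of the score is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score_lambda(mid, prods) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def draw_pair(lam, rng):
    """Rejection sampler for 2 * phi(z1) * phi(z2) * Phi(lam * z1 * z2)."""
    while True:
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        if rng.random() < norm_cdf(lam * z1 * z2):
            return z1, z2

rng = random.Random(2023)
true_lam = 1.0
prods = []
for _ in range(2000):
    z1, z2 = draw_pair(true_lam, rng)
    prods.append(z1 * z2)
lam_hat = solve_lambda(prods)
print(lam_hat)
```

Bisection is used here because the score is monotone in λ, so a sign change over the bracket identifies the unique root; any bracketing root finder could be substituted.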

The elements of the observed information matrix, which are calculated as minus the second derivatives of the log-likelihood function with respect to the parameters and denoted by

I_{θ_{j} θ_{k}}

, are given by

I_{θ_{j} θ_{k}} = - \frac{\partial^{2} ℓ (θ)}{\partial θ_{j} \partial θ_{k}}

(26)

The explicit expressions of these elements are presented in the Appendix B. The elements of the Fisher information matrix (denoted

κ_{θ_{j} θ_{k}}

) are given by the expectation of the elements of the observed information matrix, that is

κ_{θ_{j} θ_{k}} = E (I_{θ_{j} θ_{k}}),

these are calculated numerically. The information matrix is therefore expressed by

κ_{θ θ} = - E [\frac{\partial^{2} ℓ (θ)}{\partial θ_{j} \partial θ_{k}}] = (\begin{matrix} κ_{α α} & κ_{α β} & κ_{α Σ} & κ_{α λ} \\ κ_{α β}^{T} & κ_{β β} & κ_{β Σ} & κ_{β λ} \\ κ_{α Σ}^{T} & κ_{β Σ}^{T} & κ_{Σ Σ} & κ_{Σ λ} \\ κ_{α λ}^{T} & κ_{β λ}^{T} & κ_{Σ λ}^{T} & κ_{λ λ} \end{matrix})

When

λ = 0

, the case of the independence of univariate USHN distribution is obtained; thus, it follows that:

κ_{β β} = block diag (c (α_{1}) x_{1}^{⊤} x_{1} / 4, c (α_{2}) x_{2}^{⊤} x_{2} / 4, \dots, c (α_{q}) x_{q}^{⊤} x_{q} / 4),

which is a block-diagonal matrix, where

c (α_{j}) = 1 + \frac{4}{α_{j}^{2}} - \sqrt{2 π / α_{j}^{2}} (1 - \erf [{(2 / α_{j}^{2})}^{1 / 2}] exp (2 / α_{j}^{2}))

and

\erf (x)

is the error function given by:

\erf (x) = \frac{2}{\sqrt{π}} \int_{0}^{x} e^{- t^{2}} d t,

see Rieck and Nedelman [8];

κ_{α α} = diag (2 / α_{1}^{2}, 2 / α_{2}^{2}, \dots, 2 / α_{q}^{2}),

κ_{Σ Σ} = diag (κ_{σ_{1} σ_{1}}, κ_{σ_{2} σ_{2}}, \dots, κ_{σ_{q} σ_{q}}),

where

κ_{σ_{j} σ_{j}} = \frac{a_{2} (α_{j}, σ_{j})}{σ_{j}^{2}} + 2 \frac{b (α_{j}, σ_{j}) - d (α_{j}, σ_{j})}{σ_{j}^{2}}

with

a_{l} (α_{j}, σ_{j}) = E (v_{j}^{l} [2 δ_{2 j}^{2} + \frac{4}{α_{j}^{2}} - 1 + \frac{δ_{2 j}^{2}}{δ_{2 j}^{2} + 4 / α_{j}^{2}}])

,

b (α_{j}, σ_{j}) = E (v_{j} δ_{1 j} δ_{2 j})

and

d (α_{j}, σ_{j}) = E (v_{j} \frac{δ_{1 j}}{δ_{2 j}})

. The expectations in the above expressions must be calculated numerically. In addition,

κ_{α β} = 0

is a matrix of zeros,

κ_{α Σ} = diag (κ_{α_{1} σ_{1}}, κ_{α_{2} σ_{2}}, \dots, κ_{α_{q} σ_{q}})

with

κ_{α_{j} σ_{j}} = \frac{2}{α_{j} σ_{j}} b (α_{j}, σ_{j})

,

κ_{β Σ} = diag (κ_{β_{1} σ_{1}}, κ_{β_{2} σ_{2}}, \dots, κ_{β_{q} σ_{q}})

with

κ_{β_{j} σ_{j}} = \frac{a_{1} (α_{j}, σ_{j})}{2 σ_{j}} x_{j}

,

κ_{α λ} = 0

is a vector of size

q,

κ_{β λ} = 0

is a vector of size

p_{1} + p_{2} + \dots + p_{q},

κ_{Σ λ} = 0

is a vector of size q and

κ_{λ λ} = \frac{2}{π} .

The rows (or columns) of the matrix

κ_{θ θ}

are linearly independent, so its determinant is nonzero; this guarantees the existence of the inverse of

κ_{θ θ}

. Hence, for large samples, the MLE

\hat{θ}

of

θ

is asymptotically normal, that is,

\hat{θ} \overset{D}{⟶} N_{p + 2 q + 1} (θ, κ_{θ θ}^{- 1}),

so that the asymptotic variance of the MLE

\hat{θ}

is the inverse of

I (\hat{θ})

. The normal approximation

N_{p + 2 q + 1} (θ, κ_{θ θ}^{- 1})

can be used to construct confidence intervals for

α_{j},

β_{j k_{j}},

σ_{j}

and

λ,

these are given by

{\hat{α}}_{j} \mp z_{1 - ξ / 2} \sqrt{\hat{κ} ({\hat{α}}_{j})},

{\hat{β}}_{j k_{j}} \mp z_{1 - ξ / 2} \sqrt{\hat{κ} ({\hat{β}}_{j k_{j}})}

and

\hat{λ} \mp z_{1 - ξ / 2} \sqrt{\hat{κ} (\hat{λ})},

where

\hat{κ} (\cdot)

is the corresponding diagonal element of the matrix

κ_{θ θ}^{- 1}

for each parameter, and

z_{1 - ξ / 2}

is the

100 (1 - ξ / 2) %

quantile of the standard normal distribution.
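As a small illustration of these intervals, the sketch below computes the limits from an estimate and its standard error, where the standard error is assumed to be the square root of the corresponding diagonal element of the inverse information matrix. The function name wald_interval and the numbers in the usage line are hypothetical.

```python
from statistics import NormalDist

def wald_interval(estimate, std_error, xi=0.05):
    """Return (lower, upper) limits: estimate -/+ z_{1 - xi/2} * std_error."""
    z = NormalDist().inv_cdf(1.0 - xi / 2.0)   # standard normal quantile
    half_width = z * std_error
    return estimate - half_width, estimate + half_width

# Hypothetical estimate 1.0 with standard error 0.5 at the 95% level.
lower, upper = wald_interval(1.0, 0.5)
```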

The hypothesis of interest

H_{0} : λ = 0 versus λ \neq 0

can be tested using the likelihood ratio statistic (for large n)

2 [\hat{ℓ} ({\hat{θ}}^{*}, \hat{λ}) - \hat{ℓ} ({\hat{θ}}^{*}, 0)] \sim χ_{1}^{2}

where

θ^{*}

is the vector

θ

without the parameter

λ,

and

{\hat{θ}}^{*}

is the MLE of

θ

under

H_{0} .
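The test can be sketched as follows, assuming the two maximized log-likelihoods (with λ free and with λ fixed at zero) have already been obtained from a numerical optimizer. The statistic is taken as twice the difference between the unrestricted and restricted maximized log-likelihoods, and the p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)). The function name lr_test_lambda and the input values are hypothetical.

```python
import math

def lr_test_lambda(loglik_full, loglik_null):
    """Likelihood ratio test of H0: lambda = 0 against lambda != 0.

    loglik_full: maximized log-likelihood with lambda estimated.
    loglik_null: maximized log-likelihood with lambda fixed at 0.
    """
    stat = 2.0 * (loglik_full - loglik_null)
    # Upper tail of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value

# Made-up log-likelihood values, for illustration only.
stat, p_value = lr_test_lambda(-100.0, -103.0)
```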

5. Numerical Results

In this section, the results of the simulation study and two applications to illustrate the applicability of the proposed models are presented.

5.1. Simulation Study

In order to study the behavior of the MLE of the parameter vector of the MVSUSHN regression model, a small Monte Carlo simulation study with covariates in the model was carried out. We considered

X_{1 i} \sim N (0, 1)

and

X_{2 i} \sim U (0, 1)

for

i = 1, 2, \dots, n

. For

λ = 0,

the case of independence; the results for each model can be seen in the studies conducted by Martínez-Flórez et al. [3].

We used the bivariate

MVSUSHN ((α_{1}, α_{2}), (β_{10}, β_{11}, β_{20}, β_{21}), (σ_{1}, σ_{2}), λ)

regression model, and we took the values:

α_{1} = 1.5,

α_{2} = 0.75,

β_{10} = - 0.75,

β_{11} = 0.50,

β_{20} = 0.50

,

β_{21} = 1.5

and

σ_{1} = σ_{2} = 2

, while

λ = 1.75, 3.5,

and

5.25 .

The sample sizes were

n = 40, 80, 120,

and 200, and the number of Monte Carlo replications was 5000. We studied the absolute value of the relative bias (RB), root mean square error (RMSE), length of the confidence interval (LCI), and coverage probability (CP). To generate the samples, we used the following algorithm:

(1): Generate a uniform random number $U_{1} \sim U (0, 1)$ and a random number $x_{1}$ with distribution $N (0, 1)$.
(2): Compute $ε_{1} = 2 arcsinh (α_{1} Φ^{- 1} (U_{1}) / 2)$, with $Φ^{- 1} (\cdot)$ the inverse of the standard normal cdf.
(3): Let $y_{1} = 1 - exp (- exp (β_{10} + β_{11} x_{1} + ε_{1}))$.
(4): Compute $b = (2 / α_{1}) sinh ((log (- log (1 - y_{1})) - (β_{10} + β_{11} x_{1})) / 2)$.
(5): Generate another uniform random number $U_{2} \sim U (0, 1)$, independent of $U_{1}$, and a random number $x_{2}$ with distribution $U (0, 1)$.
(6): Compute the error $ε_{2} = 2 arcsinh (α_{2} Φ_{SN}^{- 1} (U_{2}, 0, 1, λ b) / 2)$, where $Φ_{SN}^{- 1} (\cdot, 0, 1, \cdot)$ is the inverse cdf of the standard skew-normal distribution and $arcsinh (\cdot)$ is the inverse of the hyperbolic sine function.
(7): Let $y_{2} = 1 - exp (- exp (β_{20} + β_{21} x_{2} + ε_{2}))$.

These steps are repeated n times to obtain the bivariate USHN random sample.
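The generation steps can be sketched in Python using only the standard library. Two deviations from the listed steps are assumptions of this sketch and are labeled in the code: the skew-normal draw in step (6) uses the stochastic representation δ|U₀| + √(1 − δ²) U₁ with δ = λb/√(1 + (λb)²) instead of inverting the skew-normal cdf, and b in step (4) is computed directly from ε₁, which is algebraically the same quantity but avoids taking log(0) when y₁ rounds to 1. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def open_uniform(rng):
    """Uniform draw on the open interval (0, 1)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

def skew_normal_draw(shape, rng):
    """Standard skew-normal draw via delta*|U0| + sqrt(1 - delta^2)*U1
    (assumption: replaces the inverse-cdf step of the algorithm)."""
    delta = shape / math.sqrt(1.0 + shape * shape)
    u0 = rng.gauss(0.0, 1.0)
    u1 = rng.gauss(0.0, 1.0)
    return delta * abs(u0) + math.sqrt(1.0 - delta * delta) * u1

def to_unit(eta):
    """y = 1 - exp(-exp(eta)), computed with expm1 for accuracy."""
    return -math.expm1(-math.exp(eta))

def simulate_pair(alpha1, alpha2, beta1, beta2, lam, rng):
    """One draw (x1, x2, y1, y2) following steps (1)-(7), sigma_1 = sigma_2 = 2."""
    x1 = rng.gauss(0.0, 1.0)                          # step (1)
    w1 = STD_NORMAL.inv_cdf(open_uniform(rng))
    eps1 = 2.0 * math.asinh(alpha1 * w1 / 2.0)        # step (2)
    y1 = to_unit(beta1[0] + beta1[1] * x1 + eps1)     # step (3)
    b = (2.0 / alpha1) * math.sinh(eps1 / 2.0)        # step (4), from eps1
    x2 = rng.random()                                 # step (5)
    w2 = skew_normal_draw(lam * b, rng)               # step (6)
    eps2 = 2.0 * math.asinh(alpha2 * w2 / 2.0)
    y2 = to_unit(beta2[0] + beta2[1] * x2 + eps2)     # step (7)
    return x1, x2, y1, y2

def simulate_sample(n, alpha1, alpha2, beta1, beta2, lam, seed=0):
    """Repeat the steps n times with a seeded generator."""
    rng = random.Random(seed)
    return [simulate_pair(alpha1, alpha2, beta1, beta2, lam, rng)
            for _ in range(n)]

# Parameter values of the simulation study with lambda = 1.75 and n = 40.
sample = simulate_sample(40, 1.5, 0.75, (-0.75, 0.50), (0.50, 1.5), 1.75)
```

Seeding the generator makes each replication reproducible, which is convenient when the same samples are reused to compare estimation settings.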

The results of the simulations can be seen in Table 2. In general, it can be seen that the RB, RMSE, and LCI of the model parameters decrease as n increases; this decrease is slower for some parameters. It can also be seen that the CP increases as n increases.

5.2. Illustration 1

To show the relevance of the MVSUSHN distribution, a real data set from a study conducted by Freeman [30] on drunk driving legislation and traffic fatalities in 48 states of the United States of America (USA) during the period from 1980 to 2004 is considered. The database is available in the wooldridge package by Shea and Brown [31] for the R software [29] under the name driving, and it contains information on current legislation, accident records, and some demographic characteristics. For this illustration, the variables unemployment rate (

y_{1}

) and the percent of the population aged 14 through 24 (

y_{2}

) were used. The bivariate beta (BVBeta) model of Cepeda et al. [25], the bivariate Johnson SB (BVJSB) model of Lemonte and Moreno-Arenas [27], and the BVSUSHN model were fitted. The BVBeta distribution of Cepeda et al. [25] is based on the Farlie–Gumbel–Morgenstern copula (see Nelsen [32]), and its joint pdf is given by

f_{X_{1} X_{2}} (x_{1}, x_{2}) = f_{X_{1}} (x_{1}) f_{X_{2}} (x_{2}) [1 + θ (1 - 2 F_{X_{1}} (x_{1})) (1 - 2 F_{X_{2}} (x_{2}))],

where

f_{X_{j}} (x_{j})

and

F_{X_{j}} (x_{j})

correspond, respectively, to the pdf and cdf of the beta distributions of parameters

α_{j}

and

β_{j}

for

j = 1, 2

and

θ \in (- 1, 1)

. To compare models, we used the Akaike information criterion (AIC) [33] and the Bayesian information criterion (BIC) [34], defined, respectively, by

AIC = - 2 ℓ (\hat{θ}) + 2 p and BIC = - 2 ℓ (\hat{θ}) + p log (n),

where p is the number of parameters, n is the sample size, and

ℓ (\cdot)

is the log-likelihood function evaluated at the MLEs of parameters. The best model is the one with the smallest AIC or BIC.
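Both criteria are simple functions of the maximized log-likelihood, as the following sketch shows. It uses the usual Schwarz penalty p log(n) for the BIC; the function names and the numbers in the usage lines are hypothetical and do not correspond to any fitted model in this paper.

```python
import math

def aic(loglik, p):
    """Akaike information criterion: -2*loglik + 2*p."""
    return -2.0 * loglik + 2.0 * p

def bic(loglik, p, n):
    """Bayesian information criterion: -2*loglik + p*log(n)."""
    return -2.0 * loglik + p * math.log(n)

# Made-up values: log-likelihood -100 with 7 parameters and 195 observations.
example_aic = aic(-100.0, 7)
example_bic = bic(-100.0, 7, 195)
```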

For fitting the bivariate models, we used the optim function in R [29]. The parameter estimates of these models, obtained using the maximum likelihood method and accompanied by their standard errors in parentheses, are given in Table 3. According to the AIC and BIC criteria, the BVSUSHN model presents the best fit.

The graphs in Figure 3 show the contours of the fitted models. For the BVSUSHN distribution, if

X_{j} = log (- log (1 - Y_{j})) \sim USHN (α_{j}, γ_{j}, σ_{j})

for

j = 1, 2

and

(X_{1}, X_{2}) \sim BVSUSHN (α_{1}, α_{2}, γ_{1}, γ_{2}, σ_{1}, σ_{2}, λ)

, then it follows that

W_{j} = \frac{2}{α_{j}} sinh (\frac{X_{j} - γ_{j}}{σ_{j}}) \sim N (0, 1), j = 1, 2

and

(W_{1}, W_{2}) \sim SBSN (0_{2}, I_{2}, λ)

where

0_{2}

and

I_{2}

are a column vector of size 2 and an identity matrix of size

2 \times 2

, respectively. The Kolmogorov–Smirnov (KS) goodness-of-fit statistics and the respective

p

-values for the marginal distributions of the BVJSB, BVBeta, and BVSUSHN models are presented in Table 3. From these results, it can be seen that the BVSUSHN distribution shows a good fit compared with the BVJSB and BVBeta models.

5.3. Illustration 2

In the second illustration, the USHN bivariate regression model is fitted. The real data were taken from http://www.pe.undp.org (accessed on 31 August 2022), and they correspond to measurements made on 195 districts in Peru. In this illustration, the interest is to model the human development index (

Y_{1}

) and illiteracy rate (

Y_{2}

) as functions of the proportion of people with high poverty level (HPL) by using the MVSUSHN regression model. Regarding poverty, the poor are identified as those unable to obtain the minimum daily calorie intake required for subsistence. The model is given by

log (- log (1 - Y_{1 i})) = β_{10} + β_{11} HPL_{i} + ε_{1 i}

and

log (- log (1 - Y_{2 i})) = β_{20} + β_{21} HPL_{i} + ε_{2 i}

with

(ε_{1 i}, ε_{2 i}) \sim BVSHN ((α_{1}, α_{2}), (0, 0), diag (σ_{1}, σ_{2}), λ) .

The MLEs of the vector of model parameters, with standard errors in parentheses, are given by

{\hat{α}}_{1} = 0.3938 (0.1539),

{\hat{β}}_{10} = - 0.0269 (0.0075),

{\hat{β}}_{11} = - 0.5734 (0.0277),

{\hat{σ}}_{1} = 0.3765 (0.1384),

{\hat{α}}_{2} = 0.1303 (0.0654),

{\hat{β}}_{20} = - 2.8999 (0.0500),

{\hat{β}}_{21} = 3.5010 (0.1832),

{\hat{σ}}_{2} = 7.8736 (3.9272)

, and

\hat{λ} = - 3.1520 (0.6167) .

As in the first illustration, under the fitted model it follows that

W_{j} = \frac{2}{α_{j}} sinh (\frac{log (- log (1 - Y_{j i})) - β_{j 0} - β_{j 1} HPL_{i}}{σ_{j}}) \sim N (0, 1), j = 1, 2,

and

(W_{1}, W_{2}) \sim SBSN (0_{2}, I_{2}, λ),

where

0_{2}

and

I_{2}

are a column vector of size 2 and an identity matrix of size

2 \times 2

. To study the model fit, we performed the Kolmogorov–Smirnov test for the bivariate vector

(ε_{1}, ε_{2})

.

For the multivariate Kolmogorov–Smirnov goodness-of-fit test proposed by Justel et al. [35], specialized to the bivariate case and denoted here by BKS (bivariate Kolmogorov–Smirnov), the statistic is given by

d_{n} = sup_{(x_{1}, x_{2}) \in R^{2}} |F_{n} (x_{1}, x_{2}) - F (x_{1}, x_{2})|

where

F_{n}

is the empirical distribution function of the sample, and

F

is some specified distribution function. When the

F

distribution is unknown, the Kolmogorov–Smirnov statistic is defined by

d_{n} (F) = max \{D^{1}, D^{2}\},

where

D^{1} = sup_{(x_{1}, x_{2}) \in R^{2}} |G_{n} (y_{1}, y_{2}) - y_{1} \times y_{2}|

by using the transformations

y_{1} = F_{X_{1}} (x_{1})

,

y_{2} = F_{X_{2} ∣ X_{1}} (x_{2} ∣ x_{1})

, and

D^{2} = sup_{(x_{1}, x_{2}) \in R^{2}} |G_{n} (y_{2}, y_{1}) - y_{2} \times y_{1}|

by using the transformations

y_{2} = F_{X_{2}} (x_{2})

and

y_{1} = F_{X_{1} ∣ X_{2}} (x_{1} ∣ x_{2}),

where

G_{n}

is the empirical distribution function of the transformed sample. For the special case of the BVSUSHN model,

d_{n} (BVSUSHN) = max \{0.08622004, 0.0750111\} = 0.08622004,

which is less than

0.1265

(for

n = 200

), which is the critical value given by Justel et al. [35] at the 5% significance level. Therefore, it is concluded that the MVSUSHN model fits the data set well.
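A rough sketch of how a statistic of this form could be evaluated is given below. It assumes that the two sets of transformed pairs (marginal cdf of one coordinate, conditional cdf of the other) have already been computed, and it evaluates the absolute difference only on the grid formed by the observed coordinates. This grid search yields a lower bound on each supremum and is a simplification, not the exact algorithm of Justel et al. [35]. The function names are hypothetical.

```python
def bks_grid(u, v):
    """Maximum of |G_n(a, b) - a*b| over grid points (a, b) with a in u, b in v.

    u, v: sequences of transformed values in [0, 1], paired by observation.
    """
    n = len(u)
    best = 0.0
    for a in u:
        for b in v:
            # Empirical distribution function of the transformed pairs at (a, b).
            g = sum(1 for ui, vi in zip(u, v) if ui <= a and vi <= b) / n
            best = max(best, abs(g - a * b))
    return best

def bks_statistic(pairs_12, pairs_21):
    """max{D1, D2} from the two orderings of the transformation."""
    d1 = bks_grid(*zip(*pairs_12))
    d2 = bks_grid(*zip(*pairs_21))
    return max(d1, d2)

# Made-up transformed pairs, for illustration only.
d_n = bks_statistic([(0.2, 0.7), (0.6, 0.3), (0.9, 0.8)],
                    [(0.7, 0.1), (0.3, 0.5), (0.8, 0.9)])
```

The triple loop costs on the order of n³ operations, which is manageable for a few hundred observations.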

We also performed univariate Kolmogorov–Smirnov goodness-of-fit tests for

W_{j}

with

j = 1, 2

yielding the test statistics of

D_{1} = 0.5282

with

p-value = 0.949

and

D_{2} = 0.5076

with

p-value = 0.9898

, indicating that the marginals show a good fit.
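The marginal check can be sketched as follows: each response is mapped to W_j through the transformation given above, and the classical one-sample statistic sup|F_n(w) − Φ(w)| is computed against the standard normal cdf. This is a generic sketch with hypothetical function names; it is not claimed to reproduce the values reported here, since the exact scaling used for those statistics is not stated.

```python
import math
from statistics import NormalDist

def to_w(y, eta, alpha, sigma):
    """W = (2/alpha) * sinh((log(-log(1 - y)) - eta) / sigma), for y in (0, 1).

    eta is the linear predictor of the observation (assumed already computed).
    """
    x = math.log(-math.log1p(-y))
    return (2.0 / alpha) * math.sinh((x - eta) / sigma)

def ks_statistic(w):
    """Classical one-sample KS distance between the sample w and N(0, 1)."""
    cdf = NormalDist().cdf
    ws = sorted(w)
    n = len(ws)
    d = 0.0
    for i, wi in enumerate(ws, start=1):
        f = cdf(wi)
        # Compare the cdf with the empirical cdf just after and just before wi.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Made-up responses and parameter values, for illustration only.
ws = [to_w(y, eta=0.0, alpha=0.5, sigma=1.0) for y in (0.55, 0.60, 0.65, 0.70)]
d_marginal = ks_statistic(ws)
```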

Figure 4 shows the envelope plots for the marginals and the contour plot for the residuals of the fitted model. For the envelope plots, we used the martingale residual transformation,

rMT_{i}

, proposed by Barros et al. [36]. These residuals are defined by

rMT_{i} = sgn (rM_{i}) \sqrt{- 2 [rM_{i} + δ_{i} log (δ_{i} - rM_{i})]}; i = 1, 2, \dots, n

where

rM_{i} = δ_{i} + log (S (e_{i}; \hat{θ}))

is the martingale residual proposed by Ortega et al. [37], where

δ_{i} = 0, 1

indicates whether the i-th observation is censored or not, respectively,

sgn (rM_{i})

denotes the sign of

rM_{i}

, and

S (e_{i}; \hat{θ})

represents the survival function evaluated at

e_{i}

, where

\hat{θ}

is the MLE of

θ

.
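A minimal sketch of this residual is given below, assuming the fitted survival probability S(e_i; θ̂) lies strictly between 0 and 1 and using δ_i as the multiplier of the logarithmic term, as in the usual martingale-type residual. The function name is hypothetical.

```python
import math

def martingale_type_residual(surv, delta=1):
    """Transformed martingale residual for one observation.

    surv: fitted survival probability S(e_i), assumed to lie in (0, 1).
    delta: 1 for an uncensored observation, 0 for a censored one.
    """
    r_m = delta + math.log(surv)                 # martingale residual
    log_term = delta * math.log(delta - r_m) if delta else 0.0
    inner = r_m + log_term
    # Guard against tiny negative values caused by rounding.
    magnitude = math.sqrt(max(-2.0 * inner, 0.0))
    return math.copysign(magnitude, r_m)

# With uncensored data (delta = 1), as in a complete sample.
residuals = [martingale_type_residual(s) for s in (0.9, 0.5, 0.1)]
```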

6. Concluding Remarks

Diverse distributions have been proposed to deal with bounded data on the interval

(0, 1)

, with great applicability in many fields of knowledge, especially the social sciences, humanities, medicine, and engineering. However, few proposals have been developed in the statistical literature to jointly model two or more such variables, and fewer still incorporate covariates to explain the variability of the variables of interest.

In this paper, a new multivariate distribution was introduced from the conditionally specified distributions methodology useful for modeling responses in the

{(0, 1)}^{p}

region jointly. The new distribution, which is absolutely continuous, is called the skewed log-Birnbaum–Saunders distribution and is also extended to the case of regression models. For the multivariate distribution, the marginal densities and conditional distributions were presented, as was the Fisher information matrix of the multivariate regression model. The parameters of the models were estimated from a classical approach using the maximum likelihood method. A small Monte Carlo simulation study was carried out to study the benefits and limitations of the new methodologies, which allows us to conclude that the parameter estimators behave well asymptotically. Two applications with real data illustrated the usefulness of the introduced methodologies and showed great flexibility to model data in

{(0, 1)}^{p}

for the particular case

p = 2

, which makes them excellent alternatives to existing methodologies in the statistical literature.

Author Contributions

Conceptualization, G.M.-F., S.V.-C., and R.T.-F.; data curation, G.M.-F., S.V.-C., and R.T.-F.; formal analysis, G.M.-F., S.V.-C., R.T.-F., and L.R.-Q.; funding acquisition, G.M.-F., and R.T.-F.; investigation, G.M.-F., S.V.-C., R.T.-F., and L.R.-Q.; resources, G.M.-F., and R.T.-F.; software, G.M.-F., and R.T.-F.; supervision, G.M.-F., and R.T.-F.; validation, G.M.-F., S.V.-C., R.T.-F., and L.R.-Q.; visualization, G.M.-F., S.V.-C., R.T.-F., and L.R.-Q.; writing—original draft, G.M.-F., S.V.-C., and R.T.-F.; writing—review and editing, G.M.-F., S.V.-C., and R.T.-F. All authors have read and agreed to the published version of the manuscript.

Funding

The research of G. Martínez-Flórez and R. Tovar-Falón was supported by project: Resolución de Problemas de Situaciones Reales Usando Análisis Estadístico a través del Modelamiento Multidimensional de Tasas y Proporciones; Esquemas de Monitoreamiento para Datos Asimétricos no Normales y una Estrategia Didáctica para el Desarrollo del Pensamiento Lógico-Matemático. Universidad de Córdoba, Colombia, Acta de Compromiso Número FCB-05-19.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Details about available data are given in Section 5.

Acknowledgments

G. Martínez-Flórez and R. Tovar-Falón acknowledge the support given by Universidad de Córdoba, Montería, Colombia. S. Vergara-Cardozo recognizes the support given by Universidad Nacional de Colombia, Sede Bogotá.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Expected Value of the LSHN Distribution

This Appendix presents the derivation of the expected value of a random variable with LSHN distribution, which is used to obtain Equation (18).

Let

Z \sim SHN (α, γ, σ)

, then

X = exp (Z) \sim LSHN (α, γ, σ)

, where LSHN denotes the pdf of a non-negative SHN distribution; see Martínez-Flórez et al. [3].

Thus, if

X \sim LSHN

, then

Z = - log (X) \sim SHN

. Following Rieck and Nedelman [8], we have that

\begin{matrix} X = exp (- Z) & ⟹ X^{r} = exp (- r Z) \\ ⟹ E (X^{r}) = E (e^{- r Z}) \\ ⟹ E (X^{r}) = \sum_{k = 0}^{\infty} \frac{{(- r)}^{k}}{k!} E (Z^{k}) \\ ⟹ E (X^{r}) = \sum_{k = 0}^{\infty} \frac{{(- r)}^{k}}{k!} e^{k r} \frac{k_{a} (α^{- 2}) + k_{b} (α^{- 2})}{k_{1 / 2} (α^{- 2})} \end{matrix}

where

a = (r σ + 1) / 2

,

b = (r σ - 1) / 2

and

k_{Λ} (\cdot)

is the modified Bessel function of the third kind.

Now, if

X \sim LSHN (α, γ, σ)

, then

Y = 1 - exp (- X) \sim USHN (α, γ, σ)

.

Hence,

\begin{matrix} E (Y^{n}) & = E [{(1 - e^{- X})}^{n}] \\ = \sum_{j = 0}^{n} \binom{n}{j} {(- 1)}^{j} E (e^{- j X}) \\ = \sum_{j = 0}^{n} \sum_{l = 0}^{\infty} \binom{n}{j} \frac{{(- 1)}^{j + 1} {(- j e^{γ})}^{l}}{l!} \frac{k_{a_{1}} (α^{- 2}) + k_{b_{1}} (α^{- 2})}{k_{1 / 2} (α^{- 2})} \end{matrix}

The last term is obtained by using the Taylor expansion of

e^{- j X}

for the jth moment of the

LSHN (α, γ, σ)

.

Appendix B. Elements of the Observed Information for the SMVSHN Regression Model

This Appendix presents the elements of the observed information, which are calculated from Equation (26).

\begin{matrix} I_{α_{j} α_{j}} & = & - \frac{n}{α_{j}^{2}} + \frac{3}{α_{j}^{2}} \sum_{i = 1}^{n} δ_{i 2 j}^{2} \\ + \frac{λ}{α_{j}^{2}} \sum_{i = 1}^{n} (\prod_{j = 1}^{q} δ_{i 2 j}) ω_{i} [- \frac{2}{α_{j}} + λ (\prod_{j = 1}^{q} δ_{i 2 j}) (λ (\prod_{j = 1}^{q} δ_{i 2 j}) + ω_{i})], \end{matrix}

\begin{matrix} I_{α_{j} α_{j^{'}}} & = & \frac{λ}{α_{j} α_{j^{'}}} \sum_{i = 1}^{n} (\prod_{j = 1}^{q} δ_{i 2 j}) ω_{i} [- 1 + λ (\prod_{j = 1}^{q} δ_{i 2 j}) (λ (\prod_{j = 1}^{q} δ_{i 2 j}) + ω_{i})], \end{matrix}

\begin{matrix} I_{α_{j} β_{j k}} & = & \frac{2}{α_{j} σ_{j}} \sum_{i = 1}^{n} x_{i j k} δ_{i 1 j} δ_{i 2 j} \\ + \frac{λ}{α_{j} σ_{j}} \sum_{i = 1}^{n} x_{i j k} δ_{i 1 j} (\prod_{j^{'} \neq j} δ_{i 2 j^{'}}) ω_{i} [- 1 + λ (\prod_{j = 1}^{q} δ_{i 2 j}) (λ δ_{i 2 j} (\prod_{j^{'} \neq j} δ_{i 2 j^{'}}) + ω_{i})], \end{matrix}

\begin{matrix} I_{α_{j} β_{j^{'} k}} & = & \frac{λ}{α_{j} σ_{j^{'}}} \sum_{i = 1}^{n} x_{i j^{'} k} δ_{i 1 j^{'}} (\prod_{j \neq j^{'}} δ_{i 2 j}) ω_{i} [- 1 + λ (\prod_{j = 1}^{q} δ_{i 2 j}) (λ δ_{i 2 j^{'}} (\prod_{j \neq j^{'}} δ_{i 2 j}) + ω_{i})], \end{matrix}

\begin{matrix} I_{α_{j} σ_{j}} & = & \frac{2}{α_{j} σ_{j}} \sum_{i = 1}^{n} v_{i j} δ_{i 1 j} δ_{i 2 j} \\ + \frac{λ}{α_{j} σ_{j}} \sum_{i = 1}^{n} v_{i j} δ_{i 1 j} (\prod_{j^{'} \neq j} δ_{i 2 j^{'}}) ω_{i} [- 1 + λ (λ δ_{i 2 j} (\prod_{j^{'} \neq j} δ_{i 2 j^{'}}) + ω_{i})], \end{matrix}

\begin{matrix} I_{α_{j} σ_{j^{'}}} & = & \frac{λ}{α_{j} σ_{j^{'}}} \sum_{i = 1}^{n} v_{i j^{'}} δ_{i 1 j^{'}} (\prod_{j \neq j^{'}} δ_{i 2 j}) ω_{i} [- 1 + λ (λ δ_{i 2 j^{'}} (\prod_{j \neq j^{'}} δ_{i 2 j}) + ω_{i})], \end{matrix}

\begin{matrix} I_{α_{j} λ} & = & \frac{1}{α_{j}} \sum_{i = 1}^{n} (\prod_{j = 1}^{q} δ_{i 2 j}) ω_{i} [1 - λ (\prod_{j = 1}^{q} δ_{i 2 j}) (λ (\prod_{j = 1}^{q} δ_{i 2 j}) + ω_{i})], \end{matrix}

\begin{matrix} I_{β_{j k} β_{j k^{'}}} & = & \frac{1}{σ_{j}^{2}} \sum_{i = 1}^{n} x_{i j k} x_{i j k^{'}} \{2 δ_{i 2 j}^{2} + \frac{4}{α_{j}^{2}} - 1 + \frac{δ_{i 2 j}^{2}}{δ_{i 2 j}^{2} + 4 / α_{j}^{2}}\} \\ + \frac{λ}{σ_{j}^{2}} \sum_{i = 1}^{n} x_{i j k} x_{i j k^{'}} (\prod_{j^{'} \neq j} δ_{i 2 j^{'}}) ω_{i} [- δ_{i 2 j^{'}} + λ δ_{i 1 j}^{2} (\prod_{j^{'} \neq j} δ_{i 2 j^{'}}) (λ δ_{i 2 j} (\prod_{j^{'} \neq j} δ_{i 2 j^{'}}) + ω_{i})], \end{matrix}

\begin{matrix} I_{β_{j k} β_{j^{'} k^{'}}} & = & \frac{λ}{σ_{j} σ_{j^{'}}} \sum_{i = 1}^{n} x_{i j k} x_{i j^{'} k^{'}} δ_{i 1 j} δ_{i 1 j^{'}} ω_{i} [- (\prod_{l \neq j, j^{'}} δ_{i 2 l}) + λ (\prod_{j^{'} \neq j} δ_{i 2 j^{'}}) (\prod_{j \neq j^{'}} δ_{i 2 j}) (λ δ_{i 2 j^{'}} (\prod_{j \neq j^{'}} δ_{i 2 j}) + ω_{i})] \end{matrix}

\begin{matrix} I_{β_{j k} σ_{j}} & = & \frac{1}{2 σ_{j}} \sum_{i = 1}^{n} x_{i j k} v_{i j} [δ_{i 1 j}^{2} + δ_{i 2 j}^{2} - sech^{2} v_{i j}] - \frac{λ}{σ_{j}^{2}} \sum_{i = 1}^{n} x_{i j k} ω_{i} (δ_{i 1 j} + δ_{i 2 j} v_{i j}) (\prod_{j^{'} \neq j} δ_{i 2 j^{'}}) \\ + \frac{λ^{2}}{σ_{j}^{2}} \sum_{i = 1}^{n} x_{i j k} v_{i j} δ_{i 2 j}^{2} ω_{i} {(\prod_{j^{'} \neq j} δ_{i 2 j^{'}})}^{2} (λ δ_{i 2 j} (\prod_{j^{'} \neq j} δ_{i 2 j^{'}}) + ω_{i}) \end{matrix}

\begin{matrix} I_{β_{j k} σ_{j^{'}}} & = & \frac{λ}{σ_{j} σ_{j^{'}}} \sum_{i = 1}^{n} x_{i j k} v_{i j^{'}} δ_{i 1 j} δ_{i 1 j^{'}} ω_{i} [- (\prod_{l \neq j, j^{'}} δ_{i 2 l}) + λ (\prod_{j^{'} \neq j} δ_{i 2 j^{'}}) (\prod_{j \neq j^{'}} δ_{i 2 j}) (λ δ_{i 2 j^{'}} (\prod_{j \neq j^{'}} δ_{i 2 j}) + ω_{i})] \end{matrix}

\begin{matrix} I_{β_{j k} λ} & = & \frac{1}{σ_{j}} \sum_{i = 1}^{n} x_{i j k} δ_{i 1 j} (\prod_{j^{'} \neq j} δ_{i 2 j^{'}}) ω_{i} [1 - λ (\prod_{j = 1}^{q} δ_{i 2 j}) (λ (\prod_{j = 1}^{q} δ_{i 2 j}) + ω_{i})], \end{matrix}

\begin{matrix} I_{σ_{j} σ_{j}} & = & \frac{2}{σ_{j}^{2}} \sum_{i = 1}^{n} v_{i j} (δ_{i 1 j} δ_{i 2 j} - \frac{δ_{i 2 j}}{δ_{i 1 j}}) + \frac{1}{σ_{j}^{2}} \sum_{i = 1}^{n} v_{i j}^{2} \{2 δ_{i 2 j}^{2} + \frac{4}{α_{j}^{2}} - 1 + \frac{δ_{i 2 j}^{2}}{δ_{i 2 j}^{2} + 4 / α_{j}^{2}}\} \\ + \frac{λ}{σ_{j}^{2}} \sum_{i = 1}^{n} (\prod_{j^{'} \neq j} δ_{i 2 j^{'}}) ω_{i} [- 2 v_{i j} δ_{i 1 j} - v_{i j}^{2} δ_{i 2 j} + λ v_{i j}^{2} δ_{i 1 j}^{2} (\prod_{j^{'} \neq j} δ_{i 2 j^{'}}) (λ δ_{i 2 j} (\prod_{j^{'} \neq j} δ_{i 2 j^{'}}) + ω_{i})] \end{matrix}

\begin{matrix} I_{σ_{j} σ_{j^{'}}} & = & \frac{λ}{σ_{j} σ_{j^{'}}} \sum_{i = 1}^{n} v_{i j} v_{i j^{'}} δ_{i 1 j} δ_{i 1 j^{'}} ω_{i} [- (\prod_{l \neq j, j^{'}} δ_{i 2 l}) + λ (\prod_{j \neq j^{'}} δ_{i 2 j}) (\prod_{j^{'} \neq j} δ_{i 2 j^{'}}) (λ δ_{i 2 j^{'}} (\prod_{j \neq j^{'}} δ_{i 2 j}) + ω_{i})] \end{matrix}

\begin{matrix} I_{σ_{j} λ} & = & \frac{1}{σ_{j}} \sum_{i = 1}^{n} v_{i j} δ_{i 1 j} (\prod_{j^{'} \neq j} δ_{i 2 j^{'}}) ω_{i} [1 - λ (\prod_{j = 1}^{q} δ_{i 2 j}) (λ (\prod_{j = 1}^{q} δ_{i 2 j}) + ω_{i})], \end{matrix}

\begin{matrix} I_{λ λ} & = & \sum_{i = 1}^{n} {(\prod_{j = 1}^{q} δ_{i 2 j})}^{2} ω_{i} (λ (\prod_{j = 1}^{q} δ_{i 2 j}) + ω_{i}), \end{matrix}

References

Ferrari, S.; Cribari-Neto, F. Beta regression for modelling rates and proportions. J. Appl. Stat. 2004, 31, 799–815.
Kumaraswamy, P. A generalized probability density function for double-bounded random processes. J. Hydrol. 1980, 46, 79–88.
Martínez-Flórez, G.; Tovar-Falón, R. Regression Models Based on the Unit Sinh-Normal Distribution. Mathematics 2021, 9, 1231.
Martínez-Flórez, G.; Azevedo-Farias, R.B.; Tovar-Falón, R. New Class of Unit-Power-Skew-Normal Distribution and Its Associated Regression Model for Bounded Responses. Mathematics 2022, 10, 3035.
Kieschnick, R.; Mccullough, B.D. Regression analysis of variates observed on (0,1). Stat. Model. 2003, 3, 193–213.
Mazucheli, J.; Menezes, A.F.B.; Ghitany, M.E. The unit-Weibull distribution and associated inference. J. Appl. Probab. Stat. 2018, 13, 1–22.
Menezes, A.F.B.; Mazucheli, J.; Dey, S. The unit-logistic distribution: Different methods of estimation. Pesqui. Oper. 2018, 38, 555–578.
Rieck, J.R.; Nedelman, J.R. A log-linear model for the Birnbaum-Saunders distribution. Technometrics 1991, 33, 51–60.
Martínez-Flórez, G.; Elal-Olivero, D.; Barrera-Causil, C. Extended Generalized Sinh-Normal Distribution. Mathematics 2021, 9, 2793.
Lemonte, A.J. A log-Birnbaum-Saunders regression model with asymmetric errors. J. Stat. Comput. Simul. 2011, 82, 1775–1787.
Leiva, V.; Vilca-Labra, F.; Balakrishnan, N.; Sanhueza, A. A skewed sinh-normal distribution and its properties and application to air pollution. Commun. Stat. Theory Methods 2010, 39, 426–443.
Moreno-Arenas, G.; Martínez-Flórez, G.; Barrera-Causil, C. Proportional Hazard Birnbaum-Saunders Distribution with Application to the Survival Data Analysis. Rev. Colomb. Estad. 2016, 39, 129–147.
Birnbaum, Z.W.; Saunders, S.C. A new family of life distributions. J. Appl. Probab. 1969, 6, 319–327.
Martínez-Flórez, G.; Bolfarine, H.; Gómez, H.W. The Log-Linear Birnbaum-Saunders Power Model. Methodol. Comput. Appl. Probab. 2017, 19, 913–933.
Díaz–García, J.A.; Domínguez–Molina, J.R. Some generalisations of Birnbaum–Saunders and sinh–normal distributions. Int. Math. Forum 2014, 1, 1709–1727.
Lemonte, A. A Multivariate Birnbaum-Saunders regression model. J. Stat. Comput. Simul. 2013, 12, 2244–2257.
Marchant, C.; Leiva, V.; Cysneiros, F. A multivariate log-linear model for Birnbaum-Saunders distributions. IEEE Trans. Reliab. 2016, 65, 816–864.
Martínez-Flórez, G.; Azevedo-Farias, R.B.; Tovar-Falón, R. An exponentiated multivariate extension for the Birnbaum-Saunders log-linear model. Mathematics 2022, 10, 1299.
Mazucheli, J.; Menezes, A.; Dey, S. The unit-Birnbaum-Saunders distribution with applications. Chil. J. Stat. 2018, 9, 47–57.
Arnold, B.C.; Castillo, E.; Sarabia, J.M. Conditionally specified multivariate skewed distributions. Sankhya A 2002, 64, 206–226.
Arnold, B.C.; Castillo, E.; Sarabia, J.M. Conditionally specified distributions. In Lecture Notes in Statistics; Berger, J., Fienberg, J., Gani, J., Krickeberg, I., Singer, B., Eds.; Springer: New York, NY, USA, 1992; Volume 73.
Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12, 171–178.
Lemonte, A.J.; Martínez-Flórez, G.; Moreno-Arenas, G. Multivariate Birnbaum-Saunders distribution: Properties and associated inference. J. Stat. Comput. Simul. 2015, 85, 374–392.
Martínez-Flórez, G.; Azevedo-Farias, R.; Moreno-Arenas, G. Multivariate log-Birnbaum-Saunders regression models. Commun. Stat. Theory Methods 2017, 46, 10166–10178.
Cepeda-Cuervo, E.; Achcar, J.A.; Garrido-Lopera, L. Bivariate beta regression models: Joint modeling of the mean, dispersion and association parameters. J. Appl. Stat. 2014, 41, 677–687.
Souza, D.F.; Moura, F.A.S. Multivariate Beta Regression with Application in Small Area Estimation. J. Off. Stat. 2016, 32, 747–768.
Lemonte, A.J.; Moreno-Arenas, G. On a multivariate regression model for rates and proportions. J. Appl. Stat. 2019, 46, 1084–1106.
Tables for computing bi-variate normal probabilities. Ann. Math. Stat. 1976, 27, 1075–1090.
R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021; Available online: http://www.R-project.org (accessed on 31 August 2022).
Freeman, D.G. Drunk Driving Legislation and Traffic Fatalities: New Evidence on BAC 08 Laws. Contemp. Econ. Policy 2007, 25, 293–308.
Shea, J.M.; Brown, K.J. Wooldridge: 115 Data Sets. In Introductory Econometrics: A Modern Approach, 7e; Wooldridge, J.M., Ed.; R package version 1.4-2; Cengage Learning: Boston, MA, USA, 2022.
Nelsen, R.B. An Introduction to Copulas; Springer Series in Statistics; Springer: New York, NY, USA, 2006; p. 272.
Akaike, H. A new look at statistical model identification. IEEE Trans. Autom. Control. 1974, AU-19, 716–722.
Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
Justel, A.; Peña, D.; Zamar, Z. A multivariate Kolmogorov-Smirnov test of goodness of fit. Stat. Probab. Lett. 1997, 35, 251–259.
Barros, M.; Galea, M.; Gonzalez, M.; Leiva, V. Influence diagnostics in the tobit censored response model. Stat. Methods Appl. 2010, 19, 379–397.
Ortega, E.M.; Bolfarine, H.; Paula, G.A. Influence diagnostics in generalized log-gamma regression models. Comput. Stat. Data Anal. 2003, 42, 165–186.

Figure 1. Graphs of contours for the BVSUSHN distribution for: (a)

BVSUSHN (0.5, 1.5, 0, 0, 1, 1, 1.5)

, (b)

BVSUSHN (0.25, 0.75, 0, 0, 1, 1, 0.75)

, and (c)

BVSUSHN (1.25, 1.75, 0, 0, 1, 1, - 0.75)

.

Figure 2. Graphs of density function of the BVSUSHN distribution for: (a)

BVSUSHN (0.5, 1.5, 0.0, 0.0, 1.0, 1.0, 1.5)

, (b)

BVSUSHN (0.25, 0.75, 0.0, 0.0, 1.5, 1.5, 0.75)

, (c)

BVSUSHN (1.25, 1.75, 0.0, 0.0, 1.0, 1.0, - 0.75)

, and (d)

BVSUSHN (2.5, 2.5, 0.0, 0.0, 1.0, 1.0, - 2.5)

.

Figure 3. Contour plots for the fitted models. (a) BVJSB, (b) BVBeta, and (c) BVSUSHN.

Figure 4. Envelope plots for the marginals and contour for the fitted model. (a)

W_{1}

, (b)

W_{2}

, and (c) MVSUSHN.

Table 1. Correlation coefficient for BVUSHN distribution.

$λ$	$α_{1} / α_{2}$	0.085	0.5	0.75	1.5	2.25	3.0	5.0	7.5	10.0
	0.095	0.5124	0.3401	0.3410	0.3410	0.3388	0.3347	0.3077	0.2846	0.2759
	0.45	0.3513	0.3160	0.3169	0.3173	0.3155	0.3117	0.2716	0.2239	0.1966
	0.75	0.3524	0.3171	0.3181	0.3183	0.3165	0.3126	0.2722	0.2243	0.1969
0.5	1.0	0.3528	0.3177	0.3186	0.3188	0.3168	0.3128	0.2722	0.2243	0.1969
	2.0	0.3507	0.3163	0.3171	0.3169	0.3148	0.3106	0.2697	0.2220	0.1949
	3.0	0.3456	0.3119	0.3126	0.3122	0.3099	0.3057	0.2650	0.2180	0.1915
	5.0	0.3261	0.2718	0.2722	0.2713	0.2690	0.2650	0.2318	0.1964	0.1775
	7.5	0.3170	0.2240	0.2243	0.2234	0.2213	0.2180	0.1964	0.1774	0.1693
	10.0	0.3188	0.1967	0.1969	0.1962	0.1943	0.1915	0.1775	0.1693	0.1684
	0.095	0.8258	0.5950	0.6003	0.6123	0.6177	0.6183	0.5809	0.5252	0.4904
	0.45	0.6192	0.5269	0.5321	0.5442	0.5504	0.5517	0.5070	0.4343	0.3845
	0.75	0.6260	0.5331	0.5383	0.5504	0.5565	0.5576	0.5120	0.4381	0.3877
	1.0	0.6311	0.5379	0.5431	0.5551	0.5611	0.5621	0.5156	0.4408	0.3900
1.5	2.0	0.6425	0.5497	0.5548	0.5662	0.5717	0.5722	0.5232	0.4460	0.3941
	3.0	0.6446	0.5527	0.5576	0.5685	0.5735	0.5736	0.5232	0.4451	0.3932
	5.0	0.6140	0.5079	0.5120	0.5207	0.5241	0.5232	0.4774	0.4096	0.3661
	7.5	0.5695	0.4349	0.4381	0.4444	0.4464	0.4451	0.4096	0.3610	0.3315
	10.0	0.5434	0.3850	0.3877	0.3930	0.3945	0.3932	0.3661	0.3315	0.3117
	0.095	0.9317	0.6733	0.6812	0.7013	0.7131	0.7184	0.6878	0.6282	0.5857
	0.45	0.6975	0.5875	0.5945	0.6128	0.6242	0.6294	0.5900	0.5163	0.4623
	0.75	0.7073	0.5958	0.6030	0.6214	0.6327	0.6379	0.5976	0.5224	0.4677
	1.0	0.7154	0.6028	0.6099	0.6284	0.6397	0.6448	0.6036	0.5273	0.4718
2.5	2.0	0.7375	0.6222	0.6294	0.6478	0.6588	0.6635	0.6194	0.5396	0.4820
	3.0	0.7469	0.6307	0.6379	0.6559	0.6665	0.6708	0.6250	0.5432	0.4847
	5.0	0.7243	0.5912	0.5976	0.6132	0.6219	0.6250	0.5823	0.5092	0.4579
	7.5	0.6772	0.5172	0.5224	0.5348	0.5414	0.5432	0.5092	0.4534	0.4157
	10.0	0.6430	0.4632	0.4677	0.4781	0.4834	0.4847	0.4579	0.4157	0.3882
	0.095	0.9765	0.7038	0.7128	0.7374	0.7531	0.7616	0.7367	0.6801	0.6363
	0.45	0.7267	0.6135	0.6214	0.6428	0.6570	0.6645	0.6290	0.5567	0.5025
	0.75	0.7379	0.6227	0.6307	0.6523	0.6666	0.6742	0.6379	0.5642	0.5090
	1.0	0.7474	0.6307	0.6387	0.6605	0.6749	0.6824	0.6453	0.5705	0.5144
3.5	2.0	0.7755	0.6542	0.6624	0.6845	0.6988	0.7061	0.6662	0.5874	0.5288
	3.0	0.7895	0.6659	0.6742	0.6962	0.7102	0.7172	0.6755	0.5945	0.5345
	5.0	0.7730	0.6303	0.6379	0.6576	0.6698	0.6755	0.6367	0.5633	0.5098
	7.5	0.7297	0.5579	0.5642	0.5805	0.5902	0.5945	0.5633	0.5064	0.4656
	10.0	0.6951	0.5035	0.5090	0.5230	0.5311	0.5345	0.5098	0.4656	0.4348
	0.095	0.9938	0.7228	0.7324	0.7601	0.7790	0.7903	0.7707	0.7182	0.6763
	0.45	0.7446	0.6323	0.6405	0.6643	0.6808	0.6903	0.6583	0.5881	0.5346
	0.75	0.7563	0.6418	0.6502	0.6743	0.6910	0.7006	0.6681	0.5967	0.5421
	1.0	0.7669	0.6505	0.6590	0.6834	0.7002	0.7099	0.6767	0.6040	0.5486
5.0	2.0	0.7992	0.6772	0.6860	0.7110	0.7281	0.7377	0.7021	0.6254	0.5672
	3.0	0.8170	0.6917	0.7006	0.7258	0.7428	0.7523	0.7149	0.6358	0.5760
	5.0	0.8061	0.6597	0.6681	0.6913	0.7066	0.7149	0.6803	0.6084	0.5544
	7.5	0.7672	0.5894	0.5967	0.6164	0.6291	0.6358	0.6084	0.5521	0.5102
	10.0	0.7350	0.5357	0.5421	0.5594	0.5703	0.5760	0.5544	0.5102	0.4776

Table 2. Relative bias (RB), root mean square error (RMSE), length of the confidence interval (LCI), and coverage probability (CP) for the MVSUSHN regression model.

		$λ = 1.75$				$λ = 3.50$				$λ = 5.25$
$n$	$θ_{i}$	RB	RMSE	LCI	CP	RB	RMSE	LCI	CP	RB	RMSE	LCI	CP
40	$α_{1}$	0.2279	0.6529	4.3944	0.9992	0.1858	0.5632	4.2347	1.0000	0.1588	0.4897	4.1603	1.0000
	$α_{2}$	0.5797	0.3882	3.9557	1.0000	0.4186	0.2454	3.7913	1.0000	0.2729	0.1752	3.7730	1.0000
	$β_{10}$	0.1648	0.0446	0.6471	0.7258	0.1059	0.0301	0.5543	0.7723	0.0844	0.0226	0.5062	0.8373
	$β_{11}$	0.0631	0.0295	0.6283	0.9223	0.0414	0.0218	0.5472	0.9272	0.0314	0.0189	0.5046	0.9346
	$β_{20}$	0.0225	0.0313	0.6037	0.8949	0.0210	0.0228	0.5251	0.9103	0.0162	0.0166	0.4833	0.9279
	$β_{21}$	0.1725	0.1681	1.1539	0.6816	0.1047	0.0956	0.9631	0.7742	0.0838	0.0685	0.8842	0.7535
	$σ_{1}$	0.0546	0.4598	4.5055	0.8934	0.0308	0.4418	4.6131	0.9272	0.0284	0.3935	4.4996	0.9519
	$σ_{2}$	0.3359	0.6907	9.3760	0.7098	0.2693	1.0002	6.5228	0.8036	0.3444	1.5870	6.2103	0.9096
	$λ$	0.1705	0.7552	3.4734	0.8446	0.2084	4.6158	8.5285	0.7551	0.2087	6.2440	10.3860	0.7217
80	$α_{1}$	0.0422	0.1841	2.6664	1.0000	0.0424	0.1732	2.6472	1.0000	0.0396	0.1109	2.7276	1.0000
	$α_{2}$	0.1296	0.0798	2.6892	1.0000	0.2238	0.0657	2.8055	1.0000	0.2974	0.0727	2.7864	1.0000
	$β_{10}$	0.1580	0.0270	0.4598	0.8239	0.1048	0.0167	0.3987	0.8683	0.0784	0.0099	0.3591	0.8871
	$β_{11}$	0.0549	0.0141	0.4374	0.9282	0.0403	0.0091	0.3740	0.9424	0.0306	0.0079	0.3440	0.9392
	$β_{20}$	0.0173	0.0118	0.4293	0.9276	0.0152	0.0092	0.3683	0.9380	0.0051	0.0068	0.3269	0.9286
	$β_{21}$	0.1698	0.1033	0.7948	0.7658	0.1040	0.0541	0.6637	0.8320	0.0797	0.0336	0.5786	0.8199
	$σ_{1}$	0.0301	0.2265	3.0365	0.9973	0.0289	0.1868	2.9427	0.9993	0.0223	0.1192	2.7508	1.0000
	$σ_{2}$	0.1044	0.4867	7.9724	0.9606	0.2363	0.8463	5.8032	1.0000	0.2265	1.2797	5.8860	1.0000
	$λ$	0.1038	0.1931	1.8814	0.9043	0.1782	0.7432	4.0199	0.9170	0.1674	1.4969	6.6811	0.8465
120	$α_{1}$	0.0363	0.0903	2.2934	1.0000	0.0313	0.0793	2.1980	1.0000	0.0208	0.1048	2.0728	1.0000
	$α_{2}$	0.1172	0.0413	2.6855	1.0000	0.1760	0.0594	2.3182	1.0000	0.1853	0.0613	2.4098	1.0000
	$β_{10}$	0.1522	0.0196	0.3802	0.8248	0.1037	0.0112	0.3217	0.8755	0.0776	0.0090	0.2899	0.9029
	$β_{11}$	0.0537	0.0078	0.3533	0.9294	0.0375	0.0062	0.3008	0.9431	0.0306	0.0051	0.2762	0.9444
	$β_{20}$	0.0043	0.0092	0.3635	0.9408	0.0116	0.0056	0.2964	0.9448	0.0020	0.0052	0.2672	0.9410
	$β_{21}$	0.1675	0.0790	0.6467	0.8606	0.1042	0.0357	0.5260	0.8499	0.0794	0.0304	0.4765	0.8885
	$σ_{1}$	0.0103	0.0793	2.2790	1.0000	0.0034	0.0690	2.1409	1.0000	0.0104	0.1054	2.2788	1.0000
	$σ_{2}$	0.0599	0.4395	6.8628	1.0000	0.0990	0.5798	3.1258	1.0000	0.2055	0.5280	4.7802	1.0000
	$λ$	0.0909	0.1370	1.5647	0.9151	0.0893	0.5541	3.2542	0.9274	0.1199	1.3591	4.5893	0.9413
200	$α_{1}$	0.0055	0.0611	1.5785	1.0000	0.0073	0.0573	1.5689	1.0000	0.0012	0.0603	1.5590	1.0000
	$α_{2}$	0.1022	0.0350	1.7030	1.0000	0.0316	0.0316	1.8231	1.0000	0.1446	0.0280	1.9242	1.0000
	$β_{10}$	0.1299	0.0186	0.2872	0.8766	0.0905	0.0100	0.2490	0.8795	0.0656	0.0064	0.2220	0.9357
	$β_{11}$	0.0360	0.0055	0.2693	0.9552	0.0314	0.0037	0.2323	0.9481	0.0260	0.0029	0.2103	0.9487
	$β_{20}$	0.0014	0.0053	0.2691	0.9464	0.0051	0.0035	0.2280	0.9438	0.0011	0.0028	0.2041	0.9455
	$β_{21}$	0.1407	0.0701	0.4915	0.9187	0.0859	0.0344	0.4086	0.9020	0.0724	0.0239	0.3628	0.8919
	$σ_{1}$	0.0063	0.0605	1.6167	1.0000	0.0028	0.0645	1.6578	1.0000	0.0095	0.0673	1.6935	1.0000
	$σ_{2}$	0.0527	0.2503	5.3491	1.0000	0.0508	0.1961	2.2176	1.0000	0.0672	0.1795	2.5655	1.0000
	$λ$	0.0864	0.1210	1.0579	0.9497	0.0822	0.4112	2.1510	0.9580	0.0734	0.8740	3.2651	0.9598
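
The four summaries reported in Table 2 can be computed from Monte Carlo output along the following lines. This is a minimal sketch, not the authors' code: the function name `simulation_summary`, the definition of relative bias as |mean estimate − true value| / |true value|, and the use of 95% Wald intervals (estimate ± 1.96 × standard error) are assumptions made for illustration, since the exact definitions are not restated next to the table.

```python
import numpy as np

def simulation_summary(estimates, std_errors, true_value, z=1.959963984540054):
    """Summarise Monte Carlo replicates for a single parameter.

    estimates  : estimates of the parameter, one per replicate
    std_errors : estimated standard errors, one per replicate
    true_value : value of the parameter used to generate the data
    z          : normal quantile for the Wald interval (assumed 95% level)
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)

    # Relative bias (assumed definition): |mean estimate - truth| / |truth|
    rb = abs(estimates.mean() - true_value) / abs(true_value)

    # Root mean square error of the estimates around the true value
    rmse = np.sqrt(np.mean((estimates - true_value) ** 2))

    # Wald confidence limits for every replicate
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors

    # Average interval length and empirical coverage probability
    lci = np.mean(upper - lower)
    cp = np.mean((lower <= true_value) & (true_value <= upper))

    return {"RB": float(rb), "RMSE": float(rmse), "LCI": float(lci), "CP": float(cp)}

# Toy replicates (hypothetical numbers, not taken from the study)
toy = simulation_summary([0.9, 1.1, 1.0, 1.2], [0.1, 0.1, 0.1, 0.1], true_value=1.0)
print(toy)
```

Calling the function once per parameter and per combination of $n$ and $λ$ would fill one cell block of a table with the layout of Table 2.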

Table 3. Maximum likelihood estimates (standard errors in parentheses) for the BVSJB, BVBeta, and BVSUSHN models.

Parameters	BVSJB	BVBeta	BVSUSHN
$α_{1}$	$0.1554 (0.0010)$	$57.0023 (2.2745)$	$1.5520 (0.1731)$
$α_{2}$	$0.0624 (0.0018)$	$312.5664 (12.5410)$	$-1.7849 (0.0035)$
$β_{1}$	$5.0188 (0.9010)$	$8.2716 (0.3220)$	$0.2001 (0.0163)$
$β_{2}$	$2.7763 (0.1020)$	$128.3064 (5.1770)$	$0.1729 (0.1590)$
$σ_{1}$	$1.7204 (0.6972)$		$-2.8302 (0.0097)$
$σ_{2}$	$8.0839 (0.2439)$		$4.0036 (3.6538)$
$λ$	$3.2156 (0.5746)$	$0.9995 (0.0001)$	$0.6239 (0.0547)$
KS test ($p$-value)	$D_{1} = 0.14 (6 \times 10^{-10})$	$D_{1} = 0.12 (5 \times 10^{-8})$	$D_{1} = 0.5725 (0.8987)$
	$D_{2} = 0.07 (0.00234)$	$D_{2} = 0.08 (0.00234)$	$D_{2} = 0.5175 (0.9518)$
AIC	$-10{,}980.07$	$-12{,}446.15$	$-12{,}559.71$
BIC	$-10{,}944.44$	$-12{,}440.69$	$-12{,}524.08$
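
For readers who want to reproduce model-comparison criteria such as those in the last two rows of Table 3, the sketch below uses the standard definitions AIC = 2k − 2ℓ and BIC = k ln(n) − 2ℓ, where ℓ is the maximized log-likelihood, k the number of estimated parameters, and n the sample size. The function names and the numeric inputs are hypothetical and are not taken from the article; smaller values indicate a better trade-off between fit and complexity.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: 2k - 2*loglik."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k*ln(n) - 2*loglik."""
    return k * math.log(n) - 2 * loglik

# Hypothetical inputs for illustration only (not the values behind Table 3)
loglik_hat = 100.0   # maximized log-likelihood
n_params = 7         # number of estimated parameters
n_obs = 500          # sample size

print("AIC:", aic(loglik_hat, n_params))
print("BIC:", bic(loglik_hat, n_params, n_obs))

# For a single fitted model the two criteria differ by k * (ln(n) - 2)
print("BIC - AIC:", bic(loglik_hat, n_params, n_obs) - aic(loglik_hat, n_params))
```

Because both criteria share the term −2ℓ, their difference depends only on k and n, which gives a quick consistency check on reported AIC and BIC pairs.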

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Martínez-Flórez, G.; Vergara-Cardozo, S.; Tovar-Falón, R.; Rodriguez-Quevedo, L. The Multivariate Skewed Log-Birnbaum–Saunders Distribution and Its Associated Regression Model. Mathematics 2023, 11, 1095. https://doi.org/10.3390/math11051095
