A Generator of Bivariate Distributions: Properties, Estimation, and Applications

In 2020, El-Morshedy et al. introduced a bivariate extension of the Burr type X generator (BBX-G) of distributions, and Muhammed presented a bivariate generalized inverted Kumaraswamy (BGIK) distribution. In this paper, we propose a more flexible generator of bivariate distributions based on the maximization process from an arbitrary three-dimensional baseline distribution vector, which is of interest for maintenance and stress models, and extends the BBX-G and BGIK distributions, among others. This proposed generator allows one to generate new bivariate distributions by combining non-identically distributed baseline components. The bivariate distributions belonging to the proposed family have a singular part due to the latent component, which makes them suitable for modeling two-dimensional data sets with ties. Several distributional and stochastic properties are studied for such bivariate models, as well as for their marginals, conditional distributions, and order statistics. Furthermore, we analyze their copula representation and some related association measures. The EM algorithm is proposed to compute the maximum likelihood estimates of the unknown parameters, which is illustrated by using two particular distributions of this bivariate family for modeling two real data sets.

Theorem 1. Let (X_1, X_2) be a GBD model with baseline distribution vector (F_{U_1}, F_{U_2}, F_{U_3}). Then its joint cdf is given by

F(x_1, x_2) = F_{U_1}(x_1) F_{U_2}(x_2) F_{U_3}(z), (1)

where z = min(x_1, x_2), for all x_1, x_2 ∈ R.
Proof. It is immediate since {X_1 ≤ x_1, X_2 ≤ x_2} = {U_1 ≤ x_1, U_2 ≤ x_2, U_3 ≤ z}, and the U_i's are independent. □

For instance, a stress model may lead to the GBD family, as in Kundu and Gupta [13]. Suppose a two-component system where each component is subject to an individual independent stress, say U_1 and U_2, respectively. The system has an overall stress U_3 which is transmitted equally to both components, independent of their individual stresses. Then, the observed stress for each component is the maximum of its individual stress and the overall stress, i.e., X_1 = max(U_1, U_3) and X_2 = max(U_2, U_3), and (X_1, X_2) is a GBD model.
Analogously, a GBD model is also plausible for a maintenance model. Suppose a system has two components, and it is assumed that each component has been maintained independently and there is also an overall maintenance. Due to component maintenance, the lifetime of the individual component is increased by a random time, say U 1 and U 2 respectively, and, because of the overall maintenance, the lifetime of each component is increased by another random time U 3 . Then, the increased lifetime of each component is the maximum of both individual and overall maintenances, X 1 = max(U 1 , U 3 ) and X 2 = max(U 2 , U 3 ), respectively.
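The maximization construction above is straightforward to simulate. The following sketch (with hypothetical exponential baseline stresses, chosen only for illustration) draws a GBD sample via X_1 = max(U_1, U_3), X_2 = max(U_2, U_3) and checks the product form of the joint cdf from Theorem 1, F(x_1, x_2) = F_{U_1}(x_1) F_{U_2}(x_2) F_{U_3}(min(x_1, x_2)), by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical baseline stresses: U_i ~ Exp(rate_i), mutually independent.
rates = (1.0, 1.5, 0.7)
n = 200_000
u1, u2, u3 = (rng.exponential(1.0 / r, n) for r in rates)

# GBD construction: each component sees the max of its own and the shared stress.
x1 = np.maximum(u1, u3)
x2 = np.maximum(u2, u3)

def F(rate, x):            # exponential cdf
    return 1.0 - np.exp(-rate * x)

def joint_cdf(t1, t2):     # Theorem 1: F(t1,t2) = F_U1(t1) F_U2(t2) F_U3(min(t1,t2))
    return F(rates[0], t1) * F(rates[1], t2) * F(rates[2], min(t1, t2))

# Monte Carlo check at a few points; the shared U3 also produces ties X1 == X2.
for t1, t2 in [(0.5, 1.0), (1.5, 0.8), (2.0, 2.0)]:
    emp = np.mean((x1 <= t1) & (x2 <= t2))
    assert abs(emp - joint_cdf(t1, t2)) < 0.01
print("tie proportion P(X1 = X2):", np.mean(x1 == x2))
```

The positive tie proportion is the singular mass on the diagonal discussed next.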
As mentioned before, a bivariate model belonging to the GBD family does not have an absolutely continuous cdf. Let us see now the decomposition of a GBD model as a mixture of bivariate absolutely continuous and singular cdfs; the proof is provided in Appendix A.

Theorem 2. Let (X_1, X_2) be a GBD model with baseline distribution vector (F_{U_1}, F_{U_2}, F_{U_3}). Then,

F(x_1, x_2) = (1 − α) F_{ac}(x_1, x_2) + α F_s(x_1, x_2), (2)

where

F_s(x_1, x_2) = (1/α) ∫_{−∞}^{z} F_{U_1}(u) F_{U_2}(u) dF_{U_3}(u), (3)

and F_{ac}(x_1, x_2) = [F(x_1, x_2) − α F_s(x_1, x_2)]/(1 − α), with z = min(x_1, x_2), are the singular and absolutely continuous parts, respectively, and α = P(X_1 = X_2) = ∫_{−∞}^{∞} F_{U_1}(u) F_{U_2}(u) dF_{U_3}(u).

In addition, due to the singular part F_s in (2), the GBD family does not have a pdf with respect to the two-dimensional Lebesgue measure even when the distribution functions F_{U_1}, F_{U_2}, and F_{U_3} are absolutely continuous. However, it is possible to construct a joint pdf for (X_1, X_2) through a mixture between a pdf with respect to the two-dimensional Lebesgue measure and a pdf with respect to the one-dimensional Lebesgue measure (the proof is provided in Appendix A).

Theorem 3. If (X_1, X_2) is a GBD model with joint cdf given by (1), then the joint pdf with respect to µ, the measure associated with F, is

f(x_1, x_2) = [f_{U_1}(x_1) F_{U_3}(x_1) + F_{U_1}(x_1) f_{U_3}(x_1)] f_{U_2}(x_2), if x_1 < x_2,
f(x_1, x_2) = f_{U_1}(x_1) [f_{U_2}(x_2) F_{U_3}(x_2) + F_{U_2}(x_2) f_{U_3}(x_2)], if x_1 > x_2,
f(x_1, x_2) = F_{U_1}(x) F_{U_2}(x) f_{U_3}(x), if x_1 = x_2 = x, (4)

when the pdf f_{U_i} of U_i exists, i = 1, 2, 3.

Special Cases
In this section, we derive new bivariate models from Theorem 1, taking into account particular baseline distribution vectors (F U 1 , F U 2 , F U 3 ).
Note that, if the baseline components U i s belong to the same distribution family, say F U , then the proposed generator provides novel extended bivariate versions of that distribution F U . Furthermore, under certain restrictions on the underlying parameters of each U i , bivariate distributions given in the literature are obtained. From now on, it is assumed that all parameters of each F U i are positive unless otherwise mentioned.
Extended bivariate Gumbel-G model. Alzaatreh et al. [32] proposed a transformed-transformer method for generating families of continuous distributions. From such a method, it is said that a random variable U follows a Gumbel-G model, U ∼ Gu-G(θ, α, λ), if its cdf can be expressed in terms of the transformer distribution G with parameter vector λ.

Extended bivariate Burr type X-G model.
From the transformed-transformer method of Alzaatreh et al. [32], it is said that a random variable U follows a Burr X-G model, U ∼ BX-G(θ, λ), if its cdf can be expressed in terms of the transformer distribution G with parameter vector λ.
GBD models from different baseline components. In addition, a GBD model can be derived from baseline components U i s belonging to different distribution families, which allows one to generate new bivariate distributions.
For illustrative purposes, Figure 1a-d display 3D surfaces of different joint pdfs given by Theorem 3, along with their contour plots. Here, U 1 and U 2 are taken identically distributed GE(θ, λ) with different shape and scale parameter values, and U 3 having a Weibull distribution with scale parameter λ 3 and shape parameter α = 6, W(λ 3 , 6). Figure 1 shows that some of these GBD models are multi-modal bivariate models. It indicates a variety of shapes for the GBD family depending on the different baseline distribution components and for different parameter values.

Distributional Properties
Here, we derive the marginal and conditional distributions of the GBD family, and the order statistics. Furthermore, some properties for particular baseline distribution vectors are provided.

Marginal and Conditional Distributions
From Theorem 1, it is easy to obtain the marginal cdfs of the components X_i, which can be written as

F_{X_i}(x) = F_{U_i}(x) F_{U_3}(x), i = 1, 2, (5)

and, when the pdf f_{U_i} of U_i exists, i = 1, 2, 3, the corresponding pdfs are given by

f_{X_i}(x) = f_{U_i}(x) F_{U_3}(x) + F_{U_i}(x) f_{U_3}(x), i = 1, 2. (6)

For instance, we shall now suppose that the U_i's have PRH distributions, in order to provide some preservation results of the PRH property on the marginals, and its closure under exponentiation of the underlying distributions.
Proposition 1. Let (X_1, X_2) be a GBD model whose baseline components are PRH distributed, U_i ∼ PRH(θ_i) with base distribution F_{B_i} (i = 1, 2, 3). Then the marginal cdfs are F_{X_i} = F_{B_i}^{θ_i} F_{B_3}^{θ_3} (i = 1, 2). Moreover, when the base distribution is common, F_B = F_{B_i}, then X_i ∼ PRH(θ_i + θ_3), and the X_i's also have the same base distribution F_B.

Proof. It immediately follows from (5) and the PRH model, since F_{X_i}(x) = F_{B_i}^{θ_i}(x) F_{B_3}^{θ_3}(x), which reduces to F_B^{θ_i + θ_3}(x) under a common base, so the X_i's also have the same base distribution F_B. □
In addition, Figure 2 displays the plots of the marginal pdfs of the GBD models depicted in Figure 1a-d.
Note that Figure 2a-d show some bimodal shapes for the marginal pdfs given by (6) of the GBD models represented in Figure 1a-d, which also exhibit some multi-modal shapes of the joint pdfs. In this setting, Proposition 1 might be used to generate bimodal distributions from the marginals of the GBD family by mixing different baseline distribution components as in Figure 1.

Furthermore, we provide some results about the conditional distributions of a GBD model, whose proof can be found in Appendix A.

Theorem 4. If (X_1, X_2) has a GBD model with baseline distribution vector (F_{U_1}, F_{U_2}, F_{U_3}), then:
1. The conditional distribution of X_i given X_j ≤ x_j (i ≠ j), say F_{i|X_j ≤ x_j}, is an absolutely continuous cdf given by F_{i|X_j ≤ x_j}(x_i) = F(x_1, x_2)/F_{X_j}(x_j).
2. The conditional pdf of X_i given X_j = x_j (i ≠ j), say f_{i|X_j = x_j}, is a convex combination of an absolutely continuous pdf and a degenerate component at x_j, where I_{x_j} is the indicator function of the given point x_j, and f_{i|x_j,ac} is the absolutely continuous part.
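Returning to the marginal distributions, a quick numerical illustration (a sketch with assumed exponential baselines, not tied to the figures): the marginal cdf F_{X_i} = F_{U_i} F_{U_3} can be recovered by integrating the product-rule pdf f_{X_i} = f_{U_i} F_{U_3} + F_{U_i} f_{U_3}.

```python
import numpy as np

# Assumed baselines: U_i ~ Exp(a), U_3 ~ Exp(b) (rates chosen arbitrarily).
a, b = 1.2, 0.8
F = lambda r, x: 1.0 - np.exp(-r * x)      # exponential cdf
f = lambda r, x: r * np.exp(-r * x)        # exponential pdf

# Marginal cdf and pdf of X_i = max(U_i, U_3): product rule on F_Ui * F_U3.
F_X = lambda x: F(a, x) * F(b, x)
f_X = lambda x: f(a, x) * F(b, x) + F(a, x) * f(b, x)

# Numerically integrate f_X (trapezoid rule) and compare with F_X on a grid.
xs = np.linspace(0.0, 10.0, 100_001)
cdf_num = np.concatenate(
    ([0.0], np.cumsum((f_X(xs[1:]) + f_X(xs[:-1])) / 2 * np.diff(xs)))
)
assert np.max(np.abs(cdf_num - F_X(xs))) < 1e-6
```

The same check works for any absolutely continuous baseline pair, since only the product rule is used.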

Minimum and Maximum Order Statistics
Now, we provide the cdfs of the maximum and minimum order statistics of a GBD model, which may be interpreted as the lifetimes of parallel and series systems based on the components of (X 1 , X 2 ).
Theorem 5. Let T_1 = min(X_1, X_2) and T_2 = max(X_1, X_2) be the minimum and maximum order statistics of a GBD model (X_1, X_2). Then their cdfs are given by

F_{T_1}(x) = F_{U_{1:2}}(x) F_{U_3}(x) and F_{T_2}(x) = F_{U_{3:3}}(x) = F_{U_1}(x) F_{U_2}(x) F_{U_3}(x), (7)

where U_{1:2} = min(U_1, U_2) and U_{3:3} = max(U_1, U_2, U_3).

Proof. It is trivial from (1) and (5), taking into account that F_{T_2}(x) = F(x, x) and F_{T_1}(x) = F_{X_1}(x) + F_{X_2}(x) − F(x, x). □

The pdfs f_{T_1} and f_{T_2} of the minimum and maximum statistics can be readily obtained by differentiation of (7).
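A small sanity check of (7) (a Python sketch with exponential baselines chosen arbitrarily): the maximum T_2 has cdf equal to the product of the three baseline cdfs, and the minimum T_1 has cdf F_{U_{1:2}} F_{U_3}.

```python
import numpy as np

rng = np.random.default_rng(1)
r1, r2, r3 = 0.9, 1.4, 0.6          # arbitrary illustrative rates
n = 200_000
u1, u2, u3 = (rng.exponential(1.0 / r, n) for r in (r1, r2, r3))
x1, x2 = np.maximum(u1, u3), np.maximum(u2, u3)
t1, t2 = np.minimum(x1, x2), np.maximum(x1, x2)

F = lambda r, x: 1.0 - np.exp(-r * x)

for x in (0.5, 1.0, 2.0):
    # F_{T2}(x) = F_{U1}(x) F_{U2}(x) F_{U3}(x): T2 = max(U1, U2, U3)
    assert abs(np.mean(t2 <= x) - F(r1, x) * F(r2, x) * F(r3, x)) < 0.01
    # F_{T1}(x) = F_{U_{1:2}}(x) F_{U3}(x), with F_{U_{1:2}} = 1 - (1-F_{U1})(1-F_{U2})
    F12 = 1.0 - (1.0 - F(r1, x)) * (1.0 - F(r2, x))
    assert abs(np.mean(t1 <= x) - F12 * F(r3, x)) < 0.01
```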
Furthermore, the PRH property is preserved by the maximum order statistic of a GBD model, which is immediately derived from Theorem 5.
Corollary 2. If U_i ∼ PRH(θ_i) with a common base distribution F_B (i = 1, 2, 3), then T_2 ∼ PRH(θ_1 + θ_2 + θ_3), and T_2 also has the same base distribution F_B.

Dependence and Stochastic Properties
In this section, we study various dependence and stochastic properties on the GBD family, its marginals and order statistics, and its copula representation. Notions of dependence and ageing for bivariate distributions can be found in Lai and Xie [34] and Balakrishnan and Lai [4]; see also Shaked and Shantikumar [35] for univariate and multivariate stochastic orders.

Proposition 2. Every random vector (X_1, X_2) having a GBD model is positively quadrant dependent (PQD).

Proof. From (1) and (5), it is readily obtainable that F(x_1, x_2) = F_{U_1}(x_1) F_{U_2}(x_2) F_{U_3}(z) ≥ F_{U_1}(x_1) F_{U_3}(x_1) F_{U_2}(x_2) F_{U_3}(x_2) = F_{X_1}(x_1) F_{X_2}(x_2), since F_{U_3}(z) = min(F_{U_3}(x_1), F_{U_3}(x_2)) ≥ F_{U_3}(x_1) F_{U_3}(x_2), which is equivalent to saying that every random vector (X_1, X_2) having a GBD model is PQD. □
An immediate consequence of the PQD property is that Cov(X_1, X_2) > 0. Other important bivariate dependence properties are the following, whose proofs are provided in Appendix A.

Proposition 3. Let (X_1, X_2) be a random vector having a GBD model. Then:
1. (X_1, X_2) is left tail decreasing (LTD).
2. (X_1, X_2) is left corner set decreasing (LCSD).
3. Its joint cdf F is totally positive of order 2 (TP_2).
Proof. Note that F being TP_2 is equivalent to (X_1, X_2) being LCSD, which in turn implies LTD (e.g., see Balakrishnan and Lai [4]). Thereby, we only have to prove (3). From the definition of the TP_2 property, it is equivalent to check that the following inequality holds:

F(x ∨ x′) F(x ∧ x′) ≥ F(x) F(x′), (8)

for all x and x′, where x ∨ x′ = (max(x_1, x′_1), max(x_2, x′_2)) and x ∧ x′ = (min(x_1, x′_1), min(x_2, x′_2)). Hence, from (1), and cancelling the common factors in F_{U_1} and F_{U_2} on both sides, the inequality (8) can be simplified as follows:

F_{U_3}(min(x_1 ∨ x′_1, x_2 ∨ x′_2)) F_{U_3}(min(x_1 ∧ x′_1, x_2 ∧ x′_2)) ≥ F_{U_3}(u) F_{U_3}(v),

where u = min(x_1, x_2) and v = min(x′_1, x′_2). If u ≤ v, then min(x_1 ∧ x′_1, x_2 ∧ x′_2) = u and min(x_1 ∨ x′_1, x_2 ∨ x′_2) ≥ v, so the inequality is trivial, since F_{U_3} is a cdf. An analogous development follows for u > v, which completes the proof. □
Let us see now some results related to the reversed hazard gradient of a random vector from the GBD family, which is defined as an extension of the univariate case (see Domma [36]) by

r(x) = (r_1(x), r_2(x)) = (∂ log F(x_1, x_2)/∂x_1, ∂ log F(x_1, x_2)/∂x_2),

where each r_i(x) represents the reversed hazard function of (X_i | X_j ≤ x_j), i ≠ j = 1, 2, and assuming that F is differentiable. In addition, it is said that (X_1, X_2) has a bivariate decreasing (increasing) reversed hazard gradient, BDRHG (BIRHG), if all components r_i are decreasing (increasing) functions in the corresponding variables.

Proposition 4.
If (X_1, X_2) has a GBD model with baseline distribution vector (F_{U_1}, F_{U_2}, F_{U_3}), then its reversed hazard gradient r(x) is given by

r_i(x) = r_{U_i}(x_i) + r_{U_3}(x_i) if x_i < x_j, and r_i(x) = r_{U_i}(x_i) if x_i > x_j,

for i ≠ j = 1, 2, when the pdfs f_{U_i} exist.

Proof. The proof is straightforward from the definition of the reversed hazard rate function corresponding to the conditional cdf F_{i|X_j ≤ x_j} given in part 1 of Theorem 4. □

Theorem 6.
Let (X_1, X_2) be a random vector having a GBD model. If the U_i's have decreasing reversed hazard functions (DRH), then (X_1, X_2) is BDRHG.

Proof. It is straightforward from Proposition 4. □
Note that Theorem 6 provides the closure of the DRH property under the formation of a GBD model. Thus, the bivariate extension of a DRH distribution F U generated by the GBD family is BDRHG.
Nevertheless, it does not hold for the increasing reversed hazard (IRH) property, since both r i (x) given in Proposition 4 have a negative jump discontinuity at x i = x j for i = j = 1, 2. Therefore, if U i ∈ IRH, then (X 1 , X 2 ) cannot be BIRHG.
Finally, we present some interesting stochastic ordering results between bivariate random vectors of GBD type.
Theorem 7. Let (X_1, X_2) and (X*_1, X*_2) be two GBD models with baseline distribution vectors (F_{U_1}, F_{U_2}, F_{U_3}) and (F_{U*_1}, F_{U*_2}, F_{U*_3}), respectively. If U_i ≤_st U*_i (i = 1, 2, 3), then (X_1, X_2) precedes (X*_1, X*_2) in the lower orthant order.

Proof. The result immediately follows from the stochastic ordering between the baseline components and (1), since F(x_1, x_2) = F_{U_1}(x_1) F_{U_2}(x_2) F_{U_3}(z) ≥ F_{U*_1}(x_1) F_{U*_2}(x_2) F_{U*_3}(z) = F*(x_1, x_2), and the lower orthant ordering is defined by the inequality F(x_1, x_2) ≥ F*(x_1, x_2) for all (x_1, x_2). In particular, for BPRH models with a common base, it is only necessary that θ_i ≤ θ*_i (i = 1, 2, 3) for the lower orthant ordering to hold. □

Marginals and Order Statistics
Now, we study some stochastic properties of the marginals and the minimum and maximum order statistics of the GBD model.
Firstly, from (5) and (6), the reversed hazard function of the marginals X_i can be expressed as

r_{X_i}(x) = r_{U_i}(x) + r_{U_3}(x), i = 1, 2. (9)

Therefore, the DRH (IRH) property of the baseline components is preserved by the marginals.

Remark 2.
Note that the IRH distributions have upper bounded support [37]. Thus, if any U_i is not upper bounded, its reversed hazard function is eventually decreasing, and then the marginal cannot be IRH. Therefore, it is necessary that U_i ∈ IRH (i = 1, 2, 3) and that they have the same upper bounds in order that X_i ∈ IRH (i = 1, 2).

Example 1.
Suppose the U_i's have extreme value distributions of type 3 with a common support, U_i ∼ EV3(β, λ_i, k_i), whose cdf is defined by F_{U_i}(u) = exp(−λ_i (β − u)^{k_i}) for u < β, and F_{U_i}(u) = 1 otherwise. Its reversed hazard function is given by r_{U_i}(u) = λ_i k_i (β − u)^{k_i − 1}, which is increasing (decreasing) in u when k_i ≤ (≥) 1, and, consequently, from (9), X_i ∈ IRH (DRH) (i = 1, 2) when every k_i ≤ (≥) 1.

Example 2.
If (X_1, X_2) has an EBGE model, then its marginals are DRH, since r_{X_i} given by (9) is the sum of two decreasing functions, the reversed hazard function of a generalized exponential distribution being evidently decreasing. Here, Exp(λ) denotes an exponential random variable with mean 1/λ.

Remark 3. From (9), when the U_i's have a common distribution F_U, the marginals X_i ∼ PRH(2) with base distribution F_U. Therefore, r_{X_i}(x) = 2 r_U(x) has the same monotonicity. In particular, if F_U ∈ DRH (IRH), then X_i ∈ DRH (IRH).

Remark 4. Analogously, when the U_i's are PRH(θ_i) with a common base distribution F_B, Proposition 1 yields X_i ∼ PRH(θ_i + θ_3). Thus, Remark 3 also holds by using F_B instead of F_U.
Secondly, the mean inactivity time (MIT), also called mean waiting time [37], of a random variable X is defined as

m_X(x) = E[x − X | X ≤ x] = (1/F_X(x)) ∫_{−∞}^{x} F_X(u) du,

for x such that F_X(x) > 0. Thus, from (5), the MIT of the marginals X_i of a GBD model can be derived by

m_{X_i}(x) = (1/(F_{U_i}(x) F_{U_3}(x))) ∫_{−∞}^{x} F_{U_i}(u) F_{U_3}(u) du. (10)

Here, we shall focus on two particular cases of GBD models, having baseline components with monotone MIT, which is preserved by the marginals.
Example 3. Suppose the U_i's have a common exponential distribution, U_i ∼ Exp(λ), whose MIT is m_{U_i}(x) = x/(1 − e^{−λx}) − 1/λ, which is an increasing MIT function (IMIT), i.e., U_i ∈ IMIT. From (10), we obtain the MIT function of the marginals X_i for the bivariate exponential version of GBD type, m_{X_i}(x) = (1 − e^{−λx})^{−2} ∫_0^x (1 − e^{−λu})² du. Then, upon differentiation, m′_{X_i}(x) has the same sign as the expression 1 − e^{−2λx} − 2λx e^{−λx}, which is positive, and therefore X_i ∈ IMIT (i = 1, 2).
Example 4. Suppose U_i ∼ EV3(β, λ_i, 2) (i = 1, 2, 3); then its MIT can be expressed in terms of Φ(u; µ, σ_i) and φ(u; µ, σ_i), the cdf and pdf of a normal model with µ = β and σ_i = 1/√(2λ_i), respectively. Moreover, taking into account that a random variable and its standardized version have reversed hazard functions with the same monotonicity, that the standard normal distribution has the DRH property [38], and that DRH implies IMIT, we obtain that U_i ∈ IMIT.
On the other hand, the following stochastic orderings among the three baseline components of two GBD models are preserved by their corresponding marginals. The proof immediately follows from the definitions of the stochastic orderings.

Theorem 9. Let (X_1, X_2) and (Y_1, Y_2) have GBD models with baseline distribution vectors (F_{U_1}, F_{U_2}, F_{U_3}) and (F_{V_1}, F_{V_2}, F_{V_3}), respectively. If U_i ≤_st (≤_rh) V_i (i = 1, 2, 3), then X_i ≤_st (≤_rh) Y_i (i = 1, 2).

Finally, we discuss some stochastic properties of the minimum and maximum order statistics of the GBD family. In this setting, from (7), the reversed hazard function of the maximum statistic T_2 of (X_1, X_2) of GBD type is determined by the sum of the reversed hazard rates of the baseline distribution vector:

r_{T_2}(x) = r_{U_1}(x) + r_{U_2}(x) + r_{U_3}(x), (11)

when the pdf f_{U_i} of U_i exists, i = 1, 2, 3. Hence, the following result is immediate.

Theorem 10. Suppose U_i ∼ EV3(β, λ_i, k_i) (i = 1, 2, 3). Then, the reversed hazard function of T_2 is given by r_{T_2}(x) = Σ_{i=1}^{3} λ_i k_i (β − x)^{k_i − 1}, and, therefore, if every k_i ≤ (≥) 1, i = 1, 2, 3, then r_{T_2} is increasing (decreasing) in x, i.e., T_2 ∈ IRH (DRH).

Example 6.
If U i ∼ GE(θ i , λ i ), then the maximum statistic of the EBGE model is DRH, T 2 ∈ DRH, since (11) is the sum of three decreasing functions.

Remark 5.
When the U_i's have a common distribution F_U, the GBD model has a maximum statistic whose cdf is F_U³, and (11) can be written as r_{T_2}(x) = 3 r_U(x). In particular, if F_U ∈ DRH (IRH), then T_2 ∈ DRH (IRH).
Furthermore, the MIT of the maximum statistic of a GBD model (X_1, X_2) can be derived by

m_{T_2}(x) = (1/(F_{U_1}(x) F_{U_2}(x) F_{U_3}(x))) ∫_{−∞}^{x} F_{U_1}(u) F_{U_2}(u) F_{U_3}(u) du,

for each specific baseline distribution vector (F_{U_1}, F_{U_2}, F_{U_3}), when the integral exists. For instance, we will consider a particular case, similar to the one used in Example 4.

Example 7. Suppose U_i ∼ PRH(θ_i) with base distribution EV3(β, λ_i, 2), i.e., U_i ∼ EV3(β, θ_i λ_i, 2), and consequently, U_i ∈ IMIT for i = 1, 2, 3. Moreover, from Corollary 2, the maximum statistic T_2 ∼ EV3(β, θ*, 2) with θ* = θ_1 λ_1 + θ_2 λ_2 + θ_3 λ_3. Thus, T_2 ∈ IMIT, which is obtained along the same lines as Example 4.

Regarding the minimum statistic T_1 of (X_1, X_2) of GBD type, some preservation results are also obtained based on its reversed hazard rate r_{T_1}, whose proofs are given in Appendix A. From (7), r_{T_1} can be written as

r_{T_1}(x) = r_{U_{1:2}}(x) + r_{U_3}(x). (12)

Theorem 11. If U_i ∈ DRH (i = 1, 2, 3) and U_{1:2} ≤_rh U_i (i = 1, 2), then T_1 ∈ DRH.

Remark 7.
Note that, when U i s have a common distribution F U , (12) can be expressed as r T 1 (x) = r U (x)(3 − 2/(2 − F U (x))), and from Corollary 4, it is immediate to have that, if F U ∈ DRH, then T 1 ∈ DRH.
Theorem 12. If (X_1, X_2) has a GBD model with baseline distribution vector (F_{U_1}, F_{U_2}, F_{U_3}), then T_1 ≤_rh T_2.

Proof. From (11) and (12), the statement is equivalent to r_{U_{1:2}}(x) ≤ r_{U_{2:2}}(x), which readily follows from Theorem 1.B.56 of Shaked and Shanthikumar [35], since the baseline components U_i's are independent. □

Copula and Related Association Measures
Let us see now the copula representation of the GBD family and some related dependence measures of interest in the analysis of two-dimensional data.
It is well known that the dependence between the random variables X_1 and X_2 is completely described by the joint cdf F(x_1, x_2), and it is often represented by a copula, which describes the dependence structure separately from the marginal behaviour. In this setting, from Sklar's theorem (e.g., see [39]), if the marginal cdfs F_{X_i} are absolutely continuous, then the joint cdf has the unique copula representation F(x_1, x_2) = C(F_{X_1}(x_1), F_{X_2}(x_2)). Now, we can derive the copula representation for the joint cdf of the GBD family as a function of its baseline distribution vector (F_{U_1}, F_{U_2}, F_{U_3}). In order to do this, by using (5), the joint cdf (1) can be expressed as

F(x_1, x_2) = F_{X_1}(x_1) F_{X_2}(x_2) / max{F_{U_3}(x_1), F_{U_3}(x_2)},

and taking u_i = F_{X_i}(x_i), the associated copula for an arbitrary baseline distribution vector (F_{U_1}, F_{U_2}, F_{U_3}) can be written as

C(u_1, u_2) = u_1 u_2 / max{A_1(u_1), A_2(u_2)}, (13)

where A_i(u) = F_{U_3}(F_{X_i}^{−1}(u)), i = 1, 2, which allows us to give an additional result.
Theorem 13. Let X = (X 1 , X 2 ) and Y = (Y 1 , Y 2 ) be two GBD models with baseline distribution vectors , respectively. If X and Y have the same associated copula and U i ≤ st V i , then X ≤ st Y.
Proof. It is immediate by using Theorem 6.B.14 of Shaked and Shanthikumar [35] and (5), since U_i ≤_st V_i and U_3 ≤_st V_3 imply F_{X_i}(x) = F_{U_i}(x) F_{U_3}(x) ≥ F_{V_i}(x) F_{V_3}(x) = F_{Y_i}(x), i.e., X_i ≤_st Y_i (i = 1, 2). □

Corollary 5. Let X = (X_1, X_2) and Y = (Y_1, Y_2) be two GBD models with common baseline distributions, F_U and F_V, respectively. If U ≤_st V, then X ≤_st Y.
Note that (13) provides a general formula to establish the specific copula upon considering two particular continuous and increasing bijective functions A 1 and A 2 from [0, 1] onto [0, 1]. Fang and Li [40] analyzed some stochastic orderings for an equivalent copula representation to (13) with interesting applications in network security and insurance. In the last section, we shall use the bivariate copula representation (13) to discuss the multivariate extension of the GBD family.
Furthermore, (13) may be considered a generalization of the Marshall-Olkin copula, as displayed in the following results whose proofs are omitted. Corollary 6. If (X 1 , X 2 ) has a GBD model with a common base distribution F U , then the copula representation of its joint cdf is C(u 1 , u 2 ) = min(u 1 u 1/2 2 , u 1/2 1 u 2 ).
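The copula of Corollary 6 can be verified directly: taking F_U(x) = x on [0, 1] (so F_{X_i}(x) = x²), the identity F(x_1, x_2) = C(F_{X_1}(x_1), F_{X_2}(x_2)) reduces to x_1 x_2 min(x_1, x_2) on both sides. A minimal Python check:

```python
import numpy as np

def C(u1, u2):
    # Corollary 6 copula (a Marshall-Olkin / Cuadras-Auge type copula)
    return np.minimum(u1 * np.sqrt(u2), np.sqrt(u1) * u2)

xs = np.linspace(0.0, 1.0, 101)
X1, X2 = np.meshgrid(xs, xs)

joint = X1 * X2 * np.minimum(X1, X2)        # Theorem 1 with F_U(x) = x on [0,1]
via_copula = C(X1**2, X2**2)                # Sklar: F = C(F_X1, F_X2), F_Xi = F_U^2
assert np.allclose(joint, via_copula)

# Copula boundary condition: C(u, 1) = u
u = np.linspace(0.0, 1.0, 11)
assert np.allclose(C(u, np.ones_like(u)), u)
```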

Corollary 7.
If (X_1, X_2) has a GBD model with a PRH baseline distribution vector of the same base F_B, i.e., (X_1, X_2) ∼ BPRH(θ_1, θ_2, θ_3), then the copula representation of its joint cdf is

C(u_1, u_2) = min(u_1^{θ_1/(θ_1+θ_3)} u_2, u_1 u_2^{θ_2/(θ_2+θ_3)}).

Some association measures for a bivariate random vector (X_1, X_2) of GBD type can be derived from the dependence structure described by the general expression (13) for each particular pair of continuous and increasing bijective functions A_1 and A_2 determined by the specific baseline distribution vector. For instance, for the special GBD models given in Corollaries 6 and 7, the measures of dependence, namely Kendall's tau, Spearman's rho, Blomqvist's beta, and the tail dependence coefficients (see Nelsen [39], among others), can be calculated as follows.
Kendall's tau. The Kendall's τ is defined as the probability of concordance minus the probability of discordance between two independent and identically distributed random vectors, (X_1, X_2) and (Y_1, Y_2), as follows:

τ = P((X_1 − Y_1)(X_2 − Y_2) > 0) − P((X_1 − Y_1)(X_2 − Y_2) < 0),

and it can be calculated through its copula representation C(u_1, u_2) by

τ = 4 E[C(U_1, U_2)] − 1, (14)

with U_1, U_2 uniform [0, 1] random variables whose joint cdf is C. For example, if (X_1, X_2) has a GBD model with a common baseline F_U, upon substituting the copula of Corollary 6 in (14), it is easy to check that Kendall's τ = 1/3.
Analogously, from the copula given in Corollary 7 of the GBD model for PRH(θ_i) components with a common base F_B, the Kendall's τ coefficient (14) can be written as τ = θ_3/(θ_1 + θ_2 + θ_3).

Spearman's rho. The Spearman's ρ coefficient measures the dependence through three independent and identically distributed random vectors, (X_1, X_2), (Y_1, Y_2) and (Z_1, Z_2). It is defined as

ρ = 3[P((X_1 − Y_1)(X_2 − Z_2) > 0) − P((X_1 − Y_1)(X_2 − Z_2) < 0)],

which can be computed by its copula representation C(u_1, u_2) by

ρ = 12 ∫_0^1 ∫_0^1 C(u_1, u_2) du_1 du_2 − 3. (15)

Thus, if there is a common base distribution as in Corollary 6, the Spearman's ρ coefficient between X_1 and X_2 is ρ = 3/7.
Blomqvist's Beta. The Blomqvist's β coefficient, also called the medial correlation coefficient, is defined as the probability of concordance minus the probability of discordance between (X_1, X_2) and its median point, say (m_1, m_2), taking the following form:

β = P((X_1 − m_1)(X_2 − m_2) > 0) − P((X_1 − m_1)(X_2 − m_2) < 0),

and from the copula of its joint cdf F, it can be expressed as

β = 4 C(1/2, 1/2) − 1. (16)

In the case of Corollary 6, it is immediate that the medial correlation coefficient between X_1 and X_2 is β = √2 − 1 when it follows a GBD model with a common baseline distribution. In the other case, from Corollary 7, the Blomqvist's β coefficient (16) is also readily obtainable between the marginals of a BPRH model:

β = 2^{θ_3/(θ_3 + max(θ_1, θ_2))} − 1,

which takes values between 0 and 1 as θ_3 varies from 0 to ∞.
Tail Dependence. The tail dependence measures the association of extreme events in both directions; the upper (lower) tail dependence λ_U (λ_L) provides an asymptotic association measure in the upper (lower) quadrant tail of a bivariate random vector, given (when the limit exists) by

λ_U = lim_{u→1⁻} P(X_2 > F_{X_2}^{−1}(u) | X_1 > F_{X_1}^{−1}(u)) and λ_L = lim_{u→0⁺} P(X_2 ≤ F_{X_2}^{−1}(u) | X_1 ≤ F_{X_1}^{−1}(u)).

Similar to the above association coefficients, the tail dependence indexes can be calculated from the copula representation C(u_1, u_2) of the joint cdf of (X_1, X_2), as follows:

λ_U = lim_{u→1⁻} (1 − 2u + C(u, u))/(1 − u) and λ_L = lim_{u→0⁺} C(u, u)/u. (17)

In particular, if (X_1, X_2) follows a GBD model with a common baseline distribution, upon substituting the copula of Corollary 6 in (17), it is easy to check that λ_L = 0 and λ_U = 1/2.
In the case of U_i ∼ PRH(θ_i) with the same base, from (17) and Corollary 7, it is clear that the tail dependence indexes of the BPRH model are λ_L = 0 and λ_U = θ_3/(θ_3 + max(θ_1, θ_2)), which takes values between 0 and 1 as θ_3 varies from 0 to ∞.
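The association measures above can be checked numerically from the copula of Corollary 6 (common baseline). A short Python sketch computes Blomqvist's beta from (16) exactly and approximates the tail indexes in (17) by evaluating the limits near the endpoints:

```python
import numpy as np

def C(u1, u2):                      # copula of Corollary 6
    return min(u1 * u2**0.5, u1**0.5 * u2)

# Blomqvist's beta: beta = 4 C(1/2, 1/2) - 1 = sqrt(2) - 1
beta = 4.0 * C(0.5, 0.5) - 1.0
assert abs(beta - (np.sqrt(2.0) - 1.0)) < 1e-12

# Tail dependence: lambda_L = lim C(u,u)/u = 0, lambda_U = lim (1-2u+C(u,u))/(1-u) = 1/2
eps = 1e-6
lam_L = C(eps, eps) / eps                                        # -> sqrt(eps), i.e. 0
lam_U = (1.0 - 2.0 * (1.0 - eps) + C(1.0 - eps, 1.0 - eps)) / eps
assert lam_L < 1e-2 and abs(lam_U - 0.5) < 1e-3
```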

Maximum Likelihood Estimation
In this section, we address the problem of computing the maximum likelihood estimates (MLEs) of the unknown parameters based on a random sample. The problem can be formulated as follows. Suppose {(x_1i, x_2i); i = 1, ..., n} is a random sample of size n from a GBD model, where it is assumed that, for j = 1, 2, 3, U_j has the pdf f_{U_j}(u; θ_j) and θ_j is of dimension p_j. The objective is to estimate the unknown parameter vector θ = (θ_1, θ_2, θ_3). We use the following partition of the sample:

I_0 = {i : x_1i = x_2i}, I_1 = {i : x_1i < x_2i}, I_2 = {i : x_1i > x_2i}.

Based on the above observations, the log-likelihood function becomes the sum over I_0, I_1 and I_2 of the logarithms of the corresponding pieces of the joint pdf of Theorem 3. Here, it is difficult to compute the MLEs of the unknown parameter vector θ by solving a (p_1 + p_2 + p_3)-dimensional optimization problem. To avoid that, we suggest using the EM algorithm; the basic idea is to work with a pseudo-complete random sample of size n from (U_1, U_2, U_3), instead of the random sample of size n from (X_1, X_2). From the observed sample {(x_1i, x_2i)}, the sample {(u_1i, u_2i, u_3i); i = 1, ..., n} has missing values, as shown in Table 1. It is immediate that the MLEs of θ_1, θ_2 and θ_3 can then be obtained by solving three optimization problems of dimensions p_1, p_2 and p_3, respectively, which are computationally more tractable. Table 1. Relation between (x_1i, x_2i) and (u_1i, u_2i, u_3i).
From Table 1, if i ∈ I_0, then u_3i is known, and u_1i and u_2i are unknown. Similarly, if i ∈ I_1 (i ∈ I_2), then u_2i (u_1i) and max{u_1i, u_3i} (max{u_2i, u_3i}) are known. Hence, in the E-step of the EM algorithm, the 'pseudo' log-likelihood function is formed by replacing each missing u_ji by its expected value, u_jim(θ), for i = 1, ..., n and j = 1, 2, 3:
1. If i ∈ I_0 and j ∈ {1, 2}, then u_jim(θ) = E[U_j | U_j < u_3i].
2. If i ∈ I_1 and j, k ∈ {1, 3}, j ≠ k, then u_jim(θ) = a_j x_1i + (1 − a_j) E[U_j | U_j < x_1i], where a_j = f_{U_j}(x_1i) F_{U_k}(x_1i) / [f_{U_1}(x_1i) F_{U_3}(x_1i) + F_{U_1}(x_1i) f_{U_3}(x_1i)] is the conditional probability that the observed maximum x_1i is attained by U_j.
3. If i ∈ I_2 and j, k ∈ {2, 3}, j ≠ k, then u_jim(θ) is obtained analogously, with x_2i in place of x_1i.
Therefore, we propose the following EM algorithm to compute the MLEs of θ. Suppose that, at the k-th iteration of the EM algorithm, the value of θ is θ^(k) = (θ_1^(k), θ_2^(k), θ_3^(k)); then the following steps can be used to compute θ^(k+1):
• E-step: For i ∈ I_0, obtain the missing u_1i and u_2i as u_1im(θ^(k)) and u_2im(θ^(k)), respectively. For i ∈ I_1, obtain the missing u_1i and u_3i as u_1im(θ^(k)) and u_3im(θ^(k)), respectively. Similarly, for i ∈ I_2, obtain the missing u_2i and u_3i as u_2im(θ^(k)) and u_3im(θ^(k)), respectively.
• M-step: Form the 'pseudo' log-likelihood function from the completed sample; then θ^(k+1) can be obtained by maximizing it, which separates into the three lower-dimensional problems mentioned above.
Mainly for illustrative purposes, two particular GBD models will be applied in the next section to show the usefulness of the above EM algorithm. Firstly, we shall consider a GBD model with baseline components having the same distribution type and different underlying parameters. Secondly, we shall use a GBD model with baseline components from different distribution families. The technical details of both of them can be found in Appendix B.
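To make the E- and M-steps concrete, here is a compact Python sketch of the pseudo EM for the exponential-baseline model (Model I of the next section, U_j ∼ Exp(λ_j)). The truncated-mean formula and the attribution probabilities below are our own hedged rendering of the expected values u_jim(θ) described above; the parameter values and the sample are simulated purely for illustration. For exponentials, the three M-steps have closed forms, since the exponential MLE only needs the sum of the completed values.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated data from the exponential model with illustrative (not real-data) rates.
lam_true = np.array([0.5, 0.8, 0.6])
n = 4000
u = np.column_stack([rng.exponential(1.0 / l, n) for l in lam_true])
x1 = np.maximum(u[:, 0], u[:, 2])
x2 = np.maximum(u[:, 1], u[:, 2])

F = lambda l, x: 1.0 - np.exp(-l * x)          # Exp cdf
f = lambda l, x: l * np.exp(-l * x)            # Exp pdf

def e_trunc(l, x):
    """E[U | U <= x] for U ~ Exp(l)."""
    return 1.0 / l - x * np.exp(-l * x) / (1.0 - np.exp(-l * x))

def em_step(lam):
    l1, l2, l3 = lam
    U = np.empty((n, 3))
    I0, I1, I2 = x1 == x2, x1 < x2, x1 > x2
    # I0: u3 = x1 = x2 is observed; U1 and U2 are right-truncated at u3.
    z = x1[I0]
    U[I0, 2] = z
    U[I0, 0] = e_trunc(l1, z)
    U[I0, 1] = e_trunc(l2, z)
    # I1: x1 < x2, so u2 = x2 and max(u1, u3) = x1; split x1 between U1 and U3.
    z = x1[I1]
    p = f(l1, z) * F(l3, z) / (f(l1, z) * F(l3, z) + F(l1, z) * f(l3, z))
    U[I1, 1] = x2[I1]
    U[I1, 0] = p * z + (1 - p) * e_trunc(l1, z)
    U[I1, 2] = (1 - p) * z + p * e_trunc(l3, z)
    # I2: symmetric to I1, with the roles of components 1 and 2 swapped.
    z = x2[I2]
    q = f(l2, z) * F(l3, z) / (f(l2, z) * F(l3, z) + F(l2, z) * f(l3, z))
    U[I2, 0] = x1[I2]
    U[I2, 1] = q * z + (1 - q) * e_trunc(l2, z)
    U[I2, 2] = (1 - q) * z + q * e_trunc(l3, z)
    # M-step: exponential MLE from the completed sample, rate = n / sum.
    return n / U.sum(axis=0)

lam = np.ones(3)                               # initial guess
for _ in range(200):
    lam = em_step(lam)
print("EM estimates:", lam)
```

For other baseline families, the M-step becomes a p_j-dimensional numerical maximization for each θ_j, but the E-step has the same structure.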

Data Analysis
In this section, we present the analysis of two-dimensional data sets in order to show how the proposed EM algorithm can be applied to fit particular GBD models. For that, we shall suppose the following two models described in Appendix B: Model I is the GBD model with the exponential baseline distributions and different underlying parameters, U j ∼ Exp(λ j ) (j = 1, 2, 3). Model II is the GBD model with baseline components from Weibull and generalized exponential distributions, U 1 ∼ W(λ 1 , α 1 ), U 2 ∼ W(λ 2 , α 2 ) and U 3 ∼ GE(α 3 , λ 3 ).

Soccer Data
We have analyzed a UEFA Champions League data set [41] from the seasons 2004-2005 and 2005-2006. It contains the matches in which at least one goal was scored by a kick goal (penalty kick, foul kick or any other direct kick) by any team and at least one goal was scored by the home team. Here, in the bivariate data (X_1, X_2), X_1 represents the time in minutes of the first kick goal and X_2 represents the time in minutes of the first goal scored by the home team. Clearly, all possibilities exist in the data set, namely X_1 < X_2, X_1 > X_2 and X_1 = X_2.
Meintanis [41] analyzed this data set using the Marshall-Olkin bivariate exponential model. The marginals of the Marshall-Olkin bivariate exponential distribution are exponential, and thus they have constant hazard functions. A preliminary data analysis indicated that the empirical hazard functions of both marginals are increasing and their reversed hazard functions are decreasing. Hence, it may not be proper to use the Marshall-Olkin bivariate exponential model to analyze these data.
Example 9. In order to use Model I, we started the EM algorithm from an initial guess for (λ_1, λ_2, λ_3). The algorithm stops after eight iterations; the final estimates and the associated 95% confidence intervals are λ̂_1 = 0.03126 (±0.01121), λ̂_2 = 0.04630 (±0.01563) and λ̂_3 = 0.04269 (±0.01875), with −257.8871 being the pseudo log-likelihood value. To check whether it has converged to the maximum or not, the performance of the EM algorithm may be compared with the experimental results obtained by using a quasi-Newton method for solving the constrained nonlinear optimization problem, which are summarized in Appendix C, together with the corresponding ones for the subsequent examples.
One natural question is whether Model I fits the bivariate data or not. We have computed the Kolmogorov-Smirnov (KS) distances, with the corresponding p-values, between the empirical and fitted cdfs for the marginals and the maximum order statistic. The results are reported in Table 2, and, from them, we cannot reject the null hypothesis that these data come from the GBD model with exponential baseline distributions.
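The KS distance used here can be computed directly from the fitted marginal cdf: under Model I, by (5), the fitted marginal is F_{X_i}(x) = (1 − e^{−λ_i x})(1 − e^{−λ_3 x}). A self-contained sketch (simulated data and arbitrary rates, not the soccer data):

```python
import numpy as np

rng = np.random.default_rng(7)
lam1, lam3 = 0.4, 0.9                   # illustrative rates, not fitted values

# Simulate the first marginal of Model I: X1 = max(U1, U3).
n = 2000
x = np.maximum(rng.exponential(1 / lam1, n), rng.exponential(1 / lam3, n))

def fitted_cdf(t):
    # marginal cdf (5) with exponential baselines
    return (1 - np.exp(-lam1 * t)) * (1 - np.exp(-lam3 * t))

def ks_distance(sample, cdf):
    s = np.sort(sample)
    k = np.arange(1, len(s) + 1)
    Fv = cdf(s)
    # one-sided deviations of the empirical cdf above and below the fitted cdf
    return max(np.max(k / len(s) - Fv), np.max(Fv - (k - 1) / len(s)))

D = ks_distance(x, fitted_cdf)
assert D < 0.06        # the 5% critical value 1.36/sqrt(n) is about 0.03 here
```

The p-values in Table 2 follow from comparing sqrt(n)·D with the Kolmogorov distribution.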

Example 10.
Let us consider now Model II. We have started the EM algorithm with initial guesses for the underlying parameters. The KS distances with the corresponding p-values for the marginals and the maximum statistic are reported in Table 2. Thus, based on the p-values, we can say that the GBD model with two baseline Weibull distributions and a third GE one fits the data reasonably well.
Summarizing, it is clear that both GBD models provide a good fit to the given data set and the EM algorithm works quite effectively in both cases. Now, to compare Models I and II of Examples 9 and 10 in terms of which provides a better fit, we compute the Akaike information criterion (AIC) and Bayesian information criterion (BIC) values, which are also presented in Table 2. Based on the AIC and BIC values, it is clear that Model I provides a better fit than Model II to the UEFA Champions League data set. Table 2. Goodness-of-fit results for UEFA Champions League data.

Diabetic Retinopathy Data
Let us consider now the diabetic retinopathy data set [42], available in the R package "SurvCor" [43]. Such data were investigated by the National Eye Institute to assess the effect of laser photocoagulation in delaying the onset of severe visual loss such as blindness in 197 patients with diabetic retinopathy. For each patient, one eye was randomly selected for laser photocoagulation and the other was given no treatment, being used as the control. The times to blindness in both eyes were recorded in months and the censoring was caused by death, dropout, or the end of the study.
For illustrative purposes, we have considered those patients for which complete data are available. Here, X 1 denotes the time to the blindness of the untreated or control eye and X 2 denotes the time to blindness of the treated eye. Out of 197 patients, we have complete information of X 1 and X 2 for 38 patients.
Example 11. As in Example 9, we have used Model I to analyze the data set, starting the EM algorithm from the same type of initial guess for (λ_1, λ_2, λ_3). Using the proposed EM algorithm, the iteration stops after 14 steps, and the estimates of the unknown parameters and the corresponding 95% confidence intervals are λ̂_1 = 0.0653 (±0.0175), λ̂_2 = 0.0737 (±0.0210) and λ̂_3 = 0.1345 (±0.3879), with −172.2314 being the associated pseudo log-likelihood value.
The KS distances with the corresponding p-values between the empirical and fitted cdfs for the marginals and the maximum statistic are presented in Table 3.

Example 12.
As in Example 10, we have analyzed the data set by using Model II, starting the EM algorithm with initial guesses for the underlying parameters. The KS distances with the corresponding p-values for the marginals and the maximum order statistic are presented in Table 3.
From Table 3, we can also say that the estimated GBD models fit the diabetic retinopathy data reasonably well in both cases. Moreover, we also present the AIC and BIC values of the two models in Table 3. Based on these values, it is clear that Model I provides a better fit than Model II for the diabetic retinopathy data.

Discussion and Conclusions
In this paper, we have presented the generalized bivariate distribution family by a generator system based on the maximization process from any three-dimensional baseline continuous distribution vector with independent components, providing bivariate models with dependence structure.
For the proposed GBD family, several distributional and stochastic properties have been established. The preservation of the PRH property for the marginals and the maximum order statistic has been obtained. The positive dependence between both marginals of the GBD models has been shown, together with some results about stochastic orders and the preservation of the monotonicity of the reversed hazard function and of the mean inactivity time. Furthermore, the copula representation of the GBD model has been discussed, providing a general formula, and some related dependence measures have also been calculated for specific copulas of particular bivariate distributions of the GBD family. In addition, new bivariate distributions can be generated by combining independent baseline components from different distribution families, and several bivariate distributions given in the literature are derived as particular cases of the GBD family.
Note that, even in the simple case, the MLEs cannot be obtained in explicit form, and a multidimensional nonlinear optimization problem must be solved. We have proposed an EM algorithm to compute the MLEs of the unknown parameters, and the proposed EM algorithm performs quite satisfactorily in the two data analyses carried out with two different models of the GBD family. The experimental results summarized in Table A1 show the efficiency of the EM algorithm with respect to a conventional numerical iterative procedure of Newton type. In more detail, Table A1 presents the experimental results obtained with the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm for maximizing the log-likelihood function, as available in the R package "maxLik" [44].
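The Newton-type baseline used for comparison maximizes the log-likelihood directly. A minimal Python analogue of that procedure, using scipy's BFGS on a simple one-parameter exponential log-likelihood as a stand-in for the GBD likelihood (the sample and model are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=500)   # stand-in sample

def neg_loglik(theta):
    # log-parameterization keeps the rate lam > 0 during unconstrained search
    lam = np.exp(theta[0])
    return -(len(data) * np.log(lam) - lam * data.sum())

res = minimize(neg_loglik, x0=[0.0], method="BFGS")
lam_hat = float(np.exp(res.x[0]))             # MLE; equals 1/mean(data) analytically
```

For the GBD models, `neg_loglik` would instead evaluate the joint pdf of Theorem 3, including the singular term for tied observations.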
It is worth mentioning that the bivariate copula representation (13) allows us to discuss its multivariate extension. Let U_i, i = 1, ..., q + 1, be a set of q + 1 mutually independent random variables with arbitrary continuous distribution functions, denoting by F_{U_i} the cdf of each U_i. Similarly to (1), the joint cdf of the q-dimensional random vector (X_1, ..., X_q) with X_i = max(U_i, U_{q+1}) is given by
$$F(x_1, \dots, x_q) = F_{U_{q+1}}(z) \prod_{i=1}^{q} F_{U_i}(x_i), \qquad z = \min(x_1, \dots, x_q),$$
which can be considered as a generator of q-dimensional distribution models, called the generalized multivariate distribution (GMD) family with baseline distribution vector (F_{U_1}, ..., F_{U_{q+1}}). Hence, since the marginal cdf of X_i is F_{X_i} = F_{U_i} F_{U_{q+1}}, the q-dimensional copula representation of this GMD family can be expressed as
$$C(u_1, \dots, u_q) = \prod_{i=1}^{q} F_{U_i}\big(F_{X_i}^{-1}(u_i)\big) \, \min_{1 \le i \le q} F_{U_{q+1}}\big(F_{X_i}^{-1}(u_i)\big).$$
From these q-dimensional joint cdf and copula, many distributional and stochastic properties established for the GBD family are extensible to the GMD family. Furthermore, by using this generator of multivariate distributions, the special bivariate models given in Section 3 can be easily extended to the multivariate case, which contains multivariate versions of bivariate distributions given in the literature.
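The maximization construction also makes simulation from the GMD family immediate: draw the q + 1 independent baselines and take componentwise maxima against the shared component. A sketch, assuming exponential baselines purely for illustration:

```python
import numpy as np

def sample_gmd(n, q, rvs):
    """Draw n samples of (X_1, ..., X_q) with X_i = max(U_i, U_{q+1}).

    rvs: list of q+1 callables; rvs[i](n) draws n values from baseline U_{i+1}.
    """
    u = np.column_stack([r(n) for r in rvs])    # shape (n, q+1)
    # broadcast the shared latent component U_{q+1} against U_1, ..., U_q
    return np.maximum(u[:, :q], u[:, [q]])

rng = np.random.default_rng(1)
# bivariate case q = 2 with assumed exponential scales 1.0, 2.0, 1.5
rvs = [lambda n, s=s: rng.exponential(s, n) for s in (1.0, 2.0, 1.5)]
x = sample_gmd(10_000, 2, rvs)
ties = np.mean(x[:, 0] == x[:, 1])              # singular mass: P(X_1 = X_2) > 0
```

The exact ties produced whenever the shared component is the overall maximum are precisely the singular part of the distribution.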
Hence, it is immediate that F_s(x_1, x_2) given by (3) is a singular cdf, since its mixed second partial derivatives are zero whenever x_1 ≠ x_2.
Thus, α = P(A) may be established as follows: and, consequently, the bivariate cdf F(x_1, x_2) can be rewritten as in (2), where the absolutely continuous part F_ac(x_1, x_2) is obtained by subtraction, which completes the proof of the theorem.
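Here A is the tie event {X_1 = X_2} = {U_3 ≥ max(U_1, U_2)}, so α can be checked numerically. A sketch under assumed exponential baselines (an illustrative choice, not the paper's model), comparing the closed-form value of α with a Monte Carlo estimate:

```python
import numpy as np

# assumed exponential rates for U_1, U_2, U_3 (illustrative)
l1, l2, l3 = 1.0, 2.0, 1.5

# alpha = P(U_3 >= max(U_1, U_2)) via E[(1 - e^{-l1 U3})(1 - e^{-l2 U3})]
alpha = 1 - l3 / (l3 + l1) - l3 / (l3 + l2) + l3 / (l3 + l1 + l2)

rng = np.random.default_rng(2)
n = 200_000
u1 = rng.exponential(1 / l1, n)   # numpy uses scale = 1/rate
u2 = rng.exponential(1 / l2, n)
u3 = rng.exponential(1 / l3, n)
alpha_mc = np.mean(u3 >= np.maximum(u1, u2))   # fraction of ties X_1 == X_2
```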
Proof of Theorem 3. Let µ, µ_s and µ_ac be the measures associated with F, F_s and F_ac, respectively. Obviously, µ_ac is an absolutely continuous measure with respect to the two-dimensional Lebesgue measure, where the pdf associated with F_ac in (4) is f_ac(u, v) = ∂²F_ac(u, v)/∂u∂v. On the other hand, µ_s is given in terms of z = min(x_1, x_2), and so it can be expressed as an absolutely continuous measure µ*_s with respect to the one-dimensional Lebesgue measure on the projection onto the line R of the intersection between (−∞, x_1] × (−∞, x_2] and the line x_1 = x_2, with density which can also be written as f*_s(u) = (1/α) f_0(u). Furthermore, it is trivial that the line x_1 = x_2 is a null set under the two-dimensional Lebesgue measure, and hence with respect to µ_ac. In addition, its complement {(x_1, x_2) ∈ R² | x_1 ≠ x_2} is a null set with respect to µ_s, since its projection onto the line R is the empty set, and, consequently, the measures µ_s and µ_ac are mutually singular. Therefore, the measure µ associated with F allows us to obtain the pdf of a GBD model with respect to µ, where I_(x_1 = x_2) is the indicator function of the event x_1 = x_2. Hence, the stated decomposition is easy to check.

Proof of Theorem 4. From (1) and (5), the proof of part (1) of Theorem 4 is straightforward.
In order to prove part (2) of Theorem 4, from the joint pdf of a GBD model given in Theorem 3 and its marginal pdf (6), the conditional pdf f_{i|X_j = x_j} can be expressed piecewise for x_i ≠ x_j and for the tie x_i = x_j; by using the notation α_j = f_{i|X_j = x_j}(x_j), this conditional pdf can be readily rewritten as in the statement of Theorem 4.
Proof of Theorem 11. The reversed hazard function (12) of the minimum statistic can be rewritten as a sum in which each g_i (i = 1, 2) is a positive function. Here, observe that U_{1:2} ≤_rh U_i implies that g_i(x) is decreasing, and therefore r_{T_1} is a sum of three decreasing functions, which completes the proof.
Proof of Corollary 4. The proof readily follows along the same lines as that of Theorem 11, taking into account that (12) can be simplified by using

Appendix B
For the practical implementation of the EM algorithm in the data analysis applications, we give the technical details of the EM algorithm for two particular GBD models: first with a baseline vector whose components share the same distribution family (Model I), and then with components from different baseline distributions (Model II).
For the implementation of the EM algorithm, we need the following expected values:
1. If i ∈ I_0, then
2. If i ∈ I_1, then
3. If i ∈ I_2, then
Hence, the 'pseudo' log-likelihood function in this case becomes 3im + ∑_{i∈I_0} x_i, and the u_{jim}^{(k)}'s are obtained from u_{jim}(θ), j = 1, 2, 3, by replacing θ = (λ_1, λ_2, λ_3) with θ^{(k)} = (λ_1^{(k)}, λ_2^{(k)}, λ_3^{(k)}). Note that, in this case, the maximization can be performed analytically at each M-step.
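The E-step/M-step structure described above can be illustrated end to end on a toy GBD model. The sketch below assumes power-function baselines F_{U_j}(x) = x^{λ_j} on (0, 1) — an illustrative stand-in chosen because, as for Model I, the M-step is analytic; it is not the paper's exact model, and all names (`em_gbd_power`, the weights `w1`, `w2`) are our own:

```python
import numpy as np

def em_gbd_power(x1, x2, n_iter=200):
    """EM for a GBD model with baselines F_{U_j}(x) = x**lam[j] on (0, 1),
    where X1 = max(U1, U3) and X2 = max(U2, U3)."""
    x1, x2 = np.asarray(x1), np.asarray(x2)
    I0, I1, I2 = x1 < x2, x1 > x2, x1 == x2        # the index sets I_0, I_1, I_2
    b1, b2 = np.log(x1).sum(), np.log(x2).sum()    # sufficient statistics (< 0)
    b3 = np.log(np.minimum(x1, x2)).sum()
    lam = np.ones(3)
    for _ in range(n_iter):
        l1, l2, l3 = lam
        w1 = l1 / (l1 + l3)   # E-step: P(U1 achieved the max | x1 < x2)
        w2 = l2 / (l2 + l3)   # E-step: P(U2 achieved the max | x1 > x2)
        a1 = I0.sum() * w1 + I1.sum()
        a2 = I1.sum() * w2 + I0.sum()
        a3 = I0.sum() * (1 - w1) + I1.sum() * (1 - w2) + I2.sum()
        lam = np.array([a1 / -b1, a2 / -b2, a3 / -b3])   # analytic M-step
    return lam

# usage: simulate by the maximization construction, then fit
rng = np.random.default_rng(3)
true = np.array([2.0, 1.0, 1.5])
u = rng.uniform(size=(5000, 3)) ** (1 / true)   # inverse-cdf draws, U_j ~ x**lam_j
x1, x2 = np.maximum(u[:, 0], u[:, 2]), np.maximum(u[:, 1], u[:, 2])
lam_hat = em_gbd_power(x1, x2)
```

As in the appendix, tied pairs (i ∈ I_2) identify the shared component U_3 exactly, while untied pairs contribute fractional weights to λ_1 or λ_2 versus λ_3.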