Bivariate Proportional Hazard Models: Structure and Inference

Barry C. Arnold; Guillermo Martínez-Flórez; Héctor W. Gómez

doi:10.3390/sym14102073

¹ Statistics Department, University of California Riverside, Riverside, CA 92521, USA

² Departamento de Matemáticas y Estadística, Facultad de Ciencias, Universidad de Córdoba, Córdoba 2300, Colombia

³ Departamento de Matemática, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta 1240000, Chile

* Author to whom correspondence should be addressed.

Symmetry 2022, 14(10), 2073; https://doi.org/10.3390/sym14102073

This article belongs to the Special Issue Mathematical Models and Methods in Various Sciences


Abstract

We focus on a variety of bivariate models with proportional hazard components. Models with proportional hazard marginals are described together with a selection of models with proportional hazard conditional distributions. The bivariate distributions with marginal proportional hazards distributions are shown to be closely related to certain known bivariate exponential models. Two distinct kinds of conditional specification are investigated. Discussion is provided of cases with hazard function components that are (1) completely unknown, (2) known to belong to given parametric families and (3) completely known. Since the models are designed for use with survival data, it is inevitable that the marginal and conditional distributions will be asymmetric. However, logarithmic transformations in some cases will result in symmetric component distributions.

Keywords:

bivariate exponential models; conditional specification; Gumbel distribution; dependent lifetime data; functional equation

1. Introduction

Survival models involving families of densities with proportional hazard functions have proved to be useful for analyzing many lifetime data sets. Not infrequently bivariate survival data (involving related lifetimes) need to be analyzed. In this paper, we review several methods for generating suitable bivariate models for such situations. The key observation in the development is that proportional hazard models can be viewed as ones obtained via monotone transformations applied to exponential models. In the latter sections of the paper, related statistical inference issues are discussed. Since the models are designed for use with survival data, it is inevitable that the marginal and conditional distributions will be asymmetric. However, logarithmic transformations in some cases will result in symmetric component distributions.

2. Bivariate Distributions with Proportional Hazard Marginals

Let $F_1$ and $F_2$ be two absolutely continuous distribution functions with support $(0, \infty)$, with corresponding densities $f_1$ and $f_2$ and hazard functions $h_1$ and $h_2$ (where $h_i = f_i/(1 - F_i)$, $i = 1, 2$).

We will say that $(X_1, X_2)$ has a bivariate marginal proportional hazard distribution associated with $F_1$ and $F_2$ and with parameters $\alpha_1, \alpha_2 > 0$ if, for $i = 1, 2$,

$$f_{X_i}(x_i; F_i, \alpha_i) = \alpha_i [1 - F_i(x_i)]^{\alpha_i - 1} f_i(x_i)\, I(x_i > 0), \tag{1}$$

and we will write $(X_1, X_2) \sim PHM(F_1, \alpha_1; F_2, \alpha_2)$ and also $X_i \sim PH(F_i, \alpha_i)$, $i = 1, 2$ (here PHM is an acronym for proportional hazard marginals). Note that the name is appropriate since, for $i = 1, 2$,

$$h_{X_i}(x_i; F_i, \alpha_i) = \alpha_i h_i(x_i). \tag{2}$$

Observe that if $(X_1, X_2) \sim PHM(F_1, \alpha_1; F_2, \alpha_2)$, then the $X_i$'s admit a representation of the form

$$X_i = F_i^{-1}(1 - e^{-Y_i}), \quad i = 1, 2, \tag{3}$$

where, for each $i$, $Y_i \sim \exp(\alpha_i)$ (i.e., $f_{Y_i}(y_i) = \alpha_i e^{-\alpha_i y_i} I(y_i > 0)$).

Note that the transformations in (3) are monotone increasing.
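Representation (3) also gives a direct way to simulate from $PH(F_i, \alpha_i)$. The minimal Python sketch below assumes NumPy and, purely for illustration, the power-function base $F(x) = x^{\gamma}$ on $(0, 1)$; the function name `sample_ph_power` is hypothetical. It draws $Y \sim \exp(\alpha)$, applies $F^{-1}(1 - e^{-Y})$, and compares the empirical survival probability at one point with $[1 - F(x)]^{\alpha}$.

```python
import numpy as np

def sample_ph_power(alpha, gamma, size, rng):
    """Draw from PH(F, alpha) with base F(x) = x**gamma on (0, 1), using X = F^{-1}(1 - e^{-Y})."""
    y = rng.exponential(scale=1.0 / alpha, size=size)  # Y ~ exp(alpha), i.e. rate alpha
    u = -np.expm1(-y)                                  # 1 - exp(-Y), computed stably
    return u ** (1.0 / gamma)                          # F^{-1}(u) = u**(1/gamma)

rng = np.random.default_rng(2022)
alpha, gamma = 2.0, 1.5
x = sample_ph_power(alpha, gamma, 200_000, rng)

x0 = 0.5
empirical = np.mean(x > x0)
theoretical = (1.0 - x0 ** gamma) ** alpha  # survival function of PH(F, alpha) at x0
print(f"empirical {empirical:.4f}, theoretical {theoretical:.4f}")
```

Because NumPy parameterizes the exponential by its scale, the rate $\alpha$ enters as `scale=1/alpha`.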

In fact, it is not necessary to build the model with reference to two distribution functions $F_1$ and $F_2$. Instead, we can begin with $Y_i \sim \exp(\alpha_i)$, $i = 1, 2$, and use two monotone increasing functions $g_i: (0, \infty) \to (0, \infty)$ to define $X_i = g_i(Y_i)$, $i = 1, 2$. If we denote the corresponding inverse functions by $H_i(x) = g_i^{-1}(x)$, then $P(X_i > x) = P(Y_i > H_i(x)) = e^{-\alpha_i H_i(x)}$, and it is readily verified that each $X_i$ has a proportional hazard distribution, i.e., that for $i = 1, 2$,

$$h_{X_i}(x_i; g_i, \alpha_i) = \alpha_i H_i'(x_i), \quad x_i > 0. \tag{4}$$

(With $g_i(y) = F_i^{-1}(1 - e^{-y})$ as in (3), $H_i(x) = -\log[1 - F_i(x)]$ is the cumulative hazard of $F_i$, so $H_i' = h_i$ and (4) reduces to (2).) It is, however, customary to use the representation (3) involving the two distribution functions $F_1$ and $F_2$, and we will adhere to this convention.

Of course, in the representation (3), $(Y_1, Y_2)$ can have any bivariate exponential distribution that we wish to utilize. Popular choices of bivariate exponential distributions involving few additional parameters include:

(i) Gumbel Type I distribution, with $P(Y_1 > y_1, Y_2 > y_2) = \exp[-\alpha_1 y_1 - \alpha_2 y_2 - \delta\alpha_1\alpha_2 y_1 y_2]$, $0 \le \delta \le 1$;
(ii) Gumbel Type II distribution, with $P(Y_1 \le y_1, Y_2 \le y_2) = [1 - e^{-\alpha_1 y_1}][1 - e^{-\alpha_2 y_2}][1 + \delta e^{-\alpha_1 y_1 - \alpha_2 y_2}]$, $\delta \in [-1, 1]$;
(iii) Marshall–Olkin distribution (see Marshall and Olkin [1]), with $P(Y_1 > y_1, Y_2 > y_2) = \exp[-\alpha_1 y_1 - \alpha_2 y_2 - \delta\alpha_1\alpha_2 \max(y_1, y_2)]$, $\delta \in [0, \infty)$.

Many other choices are possible; see, for example, Kotz et al. ([2], pp. 350–385).
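For choice (i), one convenient simulation route (our own derivation, not taken from the text, and easily checked by differentiating the survival function) works on the standardized scale $Z_i = \alpha_i Y_i$: given $Z_1 = z_1$, the density of $Z_2$ is $[a(1 + \delta z_2) - \delta]e^{-a z_2}$ with $a = 1 + \delta z_1$, which is a mixture of an exponential with rate $a$ (weight $(a - \delta)/a$) and a gamma with shape 2 and rate $a$ (weight $\delta/a$). A minimal NumPy sketch, with a hypothetical function name:

```python
import numpy as np

def sample_gumbel_type1(alpha1, alpha2, delta, size, rng):
    """Sample (Y1, Y2) with P(Y1 > y1, Y2 > y2) = exp(-a1 y1 - a2 y2 - delta a1 a2 y1 y2)."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    z1 = rng.exponential(size=size)             # Z1 = alpha1 * Y1 ~ exp(1)
    a = 1.0 + delta * z1                        # conditional rate for Z2 given Z1
    use_gamma = rng.random(size) < delta / a    # mixture component indicator
    shape = np.where(use_gamma, 2.0, 1.0)       # gamma(2, a) or gamma(1, a) = exp(a)
    z2 = rng.gamma(shape, 1.0 / a)              # NumPy's gamma takes a scale, not a rate
    return z1 / alpha1, z2 / alpha2             # back to the original scale

rng = np.random.default_rng(1)
a1, a2, delta = 1.5, 0.8, 0.7
y1, y2 = sample_gumbel_type1(a1, a2, delta, 200_000, rng)

t1, t2 = 0.3, 0.5
empirical = np.mean((y1 > t1) & (y2 > t2))
theoretical = np.exp(-a1 * t1 - a2 * t2 - delta * a1 * a2 * t1 * t2)
print(f"empirical {empirical:.4f}, theoretical {theoretical:.4f}")
```

Composing these draws with $F_i^{-1}(1 - e^{-y})$ as in (3) then yields PHM samples with a Gumbel Type I copula.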

For a specific example, if we choose $(Y_1, Y_2)$ to have a Gumbel Type I density, i.e.,

$$f_{Y_1, Y_2}(y_1, y_2) = \alpha_1\alpha_2[(1 + \alpha_1\delta y_1)(1 + \alpha_2\delta y_2) - \delta]\exp\{-\alpha_1 y_1 - \alpha_2 y_2 - \alpha_1\alpha_2\delta y_1 y_2\}, \tag{5}$$

and use $F_1(x_1) = x_1^{\gamma_1}$, $0 < x_1 < 1$, and $F_2(x_2) = x_2^{\gamma_2}$, $0 < x_2 < 1$, then the resulting PHM density will be of the form

$$\begin{aligned} f_{X_1, X_2}(x_1, x_2) = {} & \alpha_1\alpha_2\gamma_1\gamma_2\big[(1 - \alpha_1\delta\log(1 - x_1^{\gamma_1}))(1 - \alpha_2\delta\log(1 - x_2^{\gamma_2})) - \delta\big]\, x_1^{\gamma_1 - 1}x_2^{\gamma_2 - 1} \\ & \times (1 - x_1^{\gamma_1})^{\alpha_1 - 1}(1 - x_2^{\gamma_2})^{\alpha_2 - 1}\exp\{-\alpha_1\alpha_2\delta\log(1 - x_1^{\gamma_1})\log(1 - x_2^{\gamma_2})\}, \quad 0 < x_1, x_2 < 1. \end{aligned}$$
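A quick numerical sanity check on this example: the sketch below evaluates the PHM density obtained from the Gumbel Type I density by the change of variables $y_i = -\log(1 - x_i^{\gamma_i})$ (Jacobian $\gamma_i x_i^{\gamma_i - 1}/(1 - x_i^{\gamma_i})$) and confirms with a midpoint rule that it integrates to 1 over the unit square. It assumes NumPy; the function name and parameter values are arbitrary illustrations.

```python
import numpy as np

def phm_gumbel1_power_density(x1, x2, a1, a2, g1, g2, delta):
    """PHM density with a Gumbel Type I exponential core and bases F_i(x) = x**g_i on (0, 1)."""
    L1 = np.log1p(-x1 ** g1)            # log(1 - F1(x1)) = -y1
    L2 = np.log1p(-x2 ** g2)            # log(1 - F2(x2)) = -y2
    bracket = (1.0 - a1 * delta * L1) * (1.0 - a2 * delta * L2) - delta
    return (a1 * a2 * g1 * g2 * bracket
            * x1 ** (g1 - 1.0) * x2 ** (g2 - 1.0)
            # (1 - x_i**g_i)**(a_i - 1) written as exp((a_i - 1) L_i)
            * np.exp((a1 - 1.0) * L1 + (a2 - 1.0) * L2 - a1 * a2 * delta * L1 * L2))

m = 1000
mid = (np.arange(m) + 0.5) / m               # midpoints of a uniform grid on (0, 1)
X1, X2 = np.meshgrid(mid, mid, indexing="ij")
dens = phm_gumbel1_power_density(X1, X2, a1=1.5, a2=1.5, g1=2.0, g2=1.5, delta=0.5)
total = dens.sum() / m ** 2                  # midpoint-rule approximation of the integral
print(total)
```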

In applications of such models, it is frequently desirable to postulate that each $F_i$ is a member of some parametric family of distributions, to add flexibility to the model. The dependence structure will, however, be completely determined by the copula of the particular bivariate exponential distribution used in the construction.

Alternatively, one could "let the data tell us which $F_i$'s to use in the model". That is, we would seek monotone marginal transformations that make the transformed marginal sample distributions look as much like exponential distributions as possible. We return to this semi-parametric approach in Section 6.

3. Bivariate Distributions with Proportional Hazard Conditionals

We will consider two types of conditioning. The first kind is quite traditional in that we consider the distribution of one variable given that a second variable takes on a particular value. The second kind involves conditioning on the event that the second variable is larger than a particular value.

3.1. The First Kind

Consider two proportional hazard families of densities as in (1). Recall that we write $X_i \sim PH(F_i, \alpha_i)$ if the corresponding densities are given by (1).

For this conditional proportional hazards paradigm, we seek to identify joint distributions for $(X_1, X_2)$ with all conditional densities of the forms in (1). Thus, for each $x_2 > 0$ we wish to have

$$X_1 \mid X_2 = x_2 \sim PH(F_1, \alpha_1(x_2)), \tag{6}$$

and for each $x_1 > 0$,

$$X_2 \mid X_1 = x_1 \sim PH(F_2, \alpha_2(x_1)), \tag{7}$$

for some functions $\alpha_1(x_2)$ and $\alpha_2(x_1)$.

It is not difficult to verify that this will be the case if and only if

$$X_i = F_i^{-1}(1 - e^{-Y_i}), \quad i = 1, 2, \tag{8}$$

where $(Y_1, Y_2)$ has a joint distribution with exponential conditionals, i.e., such that for each $y_2 > 0$

$$Y_1 \mid Y_2 = y_2 \sim \exp(\alpha_1(y_2)), \tag{9}$$

and for each $y_1 > 0$

$$Y_2 \mid Y_1 = y_1 \sim \exp(\alpha_2(y_1)). \tag{10}$$

The class of all densities with such exponential conditionals is identified in Arnold and Strauss [3] and is of the form

$$f_{Y_1, Y_2}(y_1, y_2) = k(\delta)\,\alpha_1\alpha_2\exp[-\alpha_1 y_1 - \alpha_2 y_2 - \alpha_1\alpha_2\delta y_1 y_2]\, I(y_1 > 0, y_2 > 0), \tag{11}$$

where $\alpha_1 > 0$, $\alpha_2 > 0$ and $\delta \ge 0$.
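Holding $y_2$ fixed in (11), the density is proportional to $\exp[-\alpha_1(1 + \alpha_2\delta y_2)y_1]$, and similarly in $y_2$, which makes the conditional rates in (9) and (10) explicit:

$$Y_1 \mid Y_2 = y_2 \sim \exp\big(\alpha_1(1 + \alpha_2\delta y_2)\big), \qquad Y_2 \mid Y_1 = y_1 \sim \exp\big(\alpha_2(1 + \alpha_1\delta y_1)\big).$$

Substituting $y_2 = -\log[1 - F_2(x_2)]$ from (8) gives $\alpha_1(x_2) = \alpha_1\{1 - \alpha_2\delta\log[1 - F_2(x_2)]\}$ in (6), and symmetrically for $\alpha_2(x_1)$ in (7).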

The normalizing constant $k(\delta)$ in (11) can be expressed in terms of the exponential integral function. Thus

$$k(\delta) = \frac{\delta\exp\{-1/\delta\}}{\int_{1/\delta}^{\infty}\frac{\exp\{-w\}}{w}\,dw}. \tag{12}$$
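Numerically, (12) can be awkward for very small $\delta$, because both the numerator and the exponential integral in the denominator underflow. Rescaling $z_i = \alpha_i y_i$ in (11) (equivalently, substituting $w = 1/\delta + z$ in (12)) gives the equivalent form $1/k(\delta) = \int_0^{\infty} e^{-z}/(1 + \delta z)\,dz$, which is easy to integrate directly. A NumPy sketch using composite Simpson's rule; the function name, truncation point and grid size are our own choices:

```python
import numpy as np

def k_delta(delta, z_max=50.0, n_intervals=20_000):
    """Normalizing constant k(delta) of the exponential-conditionals density (11).

    Uses 1/k(delta) = integral over (0, inf) of exp(-z) / (1 + delta z) dz,
    truncated at z_max (the neglected tail is below exp(-z_max)) and evaluated
    with composite Simpson's rule on an even number of subintervals.
    """
    if delta < 0:
        raise ValueError("delta must be non-negative")
    if delta == 0:
        return 1.0                                    # independence case
    z = np.linspace(0.0, z_max, n_intervals + 1)
    g = np.exp(-z) / (1.0 + delta * z)
    h = z_max / n_intervals
    integral = h / 3.0 * (g[0] + 4.0 * g[1:-1:2].sum() + 2.0 * g[2:-1:2].sum() + g[-1])
    return 1.0 / integral

for d in (0.1, 0.5, 1.0):
    print(d, k_delta(d))
```

For $\delta = 1$ the integral equals $e\,E_1(1) \approx 0.596347$, so $k(1) \approx 1.6769$, a convenient check value.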

Consequently, bivariate densities with conditionals of the proportional hazard form will be given by

$$\begin{aligned} f_{X_1, X_2}(x_1, x_2) = {} & \alpha_1\alpha_2 k(\delta) f_1(x_1)[1 - F_1(x_1)]^{\alpha_1 - 1} f_2(x_2)[1 - F_2(x_2)]^{\alpha_2 - 1} \\ & \times \exp\{-\alpha_1\alpha_2\delta\log[1 - F_1(x_1)]\log[1 - F_2(x_2)]\}. \end{aligned} \tag{13}$$

This model is discussed in Arnold and Kim [4], and we will follow their nomenclature and call it a proportional hazard conditionals model of the first kind. If $(X_1, X_2)$ has a density of the form (13), we will write

$$(X_1, X_2) \sim PHC(I)(F_1, \alpha_1; F_2, \alpha_2; \delta).$$
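Because both conditionals of (11) are exponential, with rates $\alpha_1(1 + \alpha_2\delta y_2)$ and $\alpha_2(1 + \alpha_1\delta y_1)$, a Gibbs sampler is a natural way to simulate from the PHC(I) model without computing $k(\delta)$; the draws are then mapped through (8). The sketch below is our illustration, not a method from the text: it runs many short chains in parallel with NumPy, keeps the final state of each, and uses the Weibull base $F(x) = 1 - e^{-x^{\theta}}$, for which (8) reduces to $X = Y^{1/\theta}$. Function names and tuning constants (numbers of chains and sweeps) are hypothetical.

```python
import numpy as np

def gibbs_exp_conditionals(alpha1, alpha2, delta, n_chains, n_sweeps, rng):
    """Approximate draws from density (11) by Gibbs sampling.

    Runs n_chains independent chains for n_sweeps full sweeps each and returns
    the final state of every chain.
    """
    y1 = rng.exponential(1.0 / alpha1, n_chains)   # arbitrary positive starting values
    y2 = rng.exponential(1.0 / alpha2, n_chains)
    for _ in range(n_sweeps):
        # Y1 | Y2 = y2 ~ exp(alpha1 (1 + alpha2 delta y2)); NumPy uses scale = 1 / rate
        y1 = rng.exponential(1.0 / (alpha1 * (1.0 + alpha2 * delta * y2)))
        # Y2 | Y1 = y1 ~ exp(alpha2 (1 + alpha1 delta y1))
        y2 = rng.exponential(1.0 / (alpha2 * (1.0 + alpha1 * delta * y1)))
    return y1, y2

rng = np.random.default_rng(3)
y1, y2 = gibbs_exp_conditionals(1.0, 1.0, 1.0, n_chains=50_000, n_sweeps=100, rng=rng)

theta, tau = 1.25, 0.75
x1, x2 = y1 ** (1.0 / theta), y2 ** (1.0 / tau)   # PHC(I) draws with Weibull bases, via (8)
print(y1.mean())
```

For $\alpha_1 = \alpha_2 = \delta = 1$ the marginal density of $Y_1$ is $k(1)e^{-y_1}/(1 + y_1)$, whose mean is $k(1) - 1 \approx 0.677$; this gives a quick check on the sampler.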

3.2. The Second Kind

Again consider two proportional hazard families of densities as in (1). Recall that we write $X_i \sim PH(F_i, \alpha_i)$ if the corresponding densities are given by (1). The second kind of conditional model (also introduced in Arnold and Kim [4]) involves conditioning on events of the form $\{X_1 > x_1\}$ and $\{X_2 > x_2\}$. Thus, we seek to identify joint survival functions for $(X_1, X_2)$ such that for each $x_2 > 0$,

$$X_1 \mid \{X_2 > x_2\} \sim PH(F_1, \alpha_1(x_2)), \tag{14}$$

and for each $x_1 > 0$,

$$X_2 \mid \{X_1 > x_1\} \sim PH(F_2, \alpha_2(x_1)), \tag{15}$$

for some functions $\alpha_1(x_2)$ and $\alpha_2(x_1)$.

To analyze this situation (since $F_1$ and $F_2$ are known), it is again convenient to write

$$X_i = F_i^{-1}(1 - e^{-Y_i}), \quad i = 1, 2, \tag{16}$$

where the $Y_i$'s are exponential random variables.

The conditions (14) and (15) are then equivalent to the statements

$$Y_1 \mid \{Y_2 > y_2\} \sim \exp(\alpha_1(y_2)), \tag{17}$$

and

$$Y_2 \mid \{Y_1 > y_1\} \sim \exp(\alpha_2(y_1)). \tag{18}$$

Denote the survival functions of $Y_1$ and $Y_2$ by $\psi_1(y_1) = P(Y_1 > y_1)$ and $\psi_2(y_2) = P(Y_2 > y_2)$. It then follows that

$$\psi_2(y_2)e^{-\alpha_1(y_2)y_1} = P(Y_1 > y_1, Y_2 > y_2) = \psi_1(y_1)e^{-\alpha_2(y_1)y_2}. \tag{19}$$

Taking logarithms, we have

$$\log\psi_2(y_2) - \alpha_1(y_2)y_1 = \log\psi_1(y_1) - \alpha_2(y_1)y_2. \tag{20}$$

This is a Stephanos–Levi-Civita–Suto functional equation (see Arnold et al. [5], p. 13), which is readily solved to yield the following expression for the joint survival function of $(Y_1, Y_2)$:

$$P(Y_1 > y_1, Y_2 > y_2) = \exp[-\alpha_1 y_1 - \alpha_2 y_2 - \alpha_1\alpha_2\delta y_1 y_2] \tag{21}$$

for $y_1 > 0$, $y_2 > 0$, where $\alpha_1 > 0$, $\alpha_2 > 0$ and $0 \le \delta \le 1$.
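A sketch of the solution step: since $\psi_i(0) = 1$, setting $y_1 = 0$ in (20) gives $\log\psi_2(y_2) = -\alpha_2(0)\,y_2$, and setting $y_2 = 0$ gives $\log\psi_1(y_1) = -\alpha_1(0)\,y_1$. Substituting these back into (20) and dividing by $y_1 y_2$,

$$\frac{\alpha_1(y_2) - \alpha_1(0)}{y_2} = \frac{\alpha_2(y_1) - \alpha_2(0)}{y_1} \quad \text{for all } y_1, y_2 > 0.$$

The left side depends only on $y_2$ and the right side only on $y_1$, so both equal a constant $c$; hence $\alpha_1(y_2) = \alpha_1(0) + c\,y_2$ and $\alpha_2(y_1) = \alpha_2(0) + c\,y_1$. Writing $\alpha_i = \alpha_i(0)$ and $c = \alpha_1\alpha_2\delta$, (19) becomes (21). The joint density implied by (21) is

$$\frac{\partial^2}{\partial y_1\,\partial y_2}P(Y_1 > y_1, Y_2 > y_2) = \alpha_1\alpha_2[(1 + \alpha_1\delta y_1)(1 + \alpha_2\delta y_2) - \delta]\,P(Y_1 > y_1, Y_2 > y_2),$$

and requiring it to be non-negative at $y_1 = y_2 = 0$ gives $\delta \le 1$, while letting $y_1 \to \infty$ with $y_2 = 0$ rules out $\delta < 0$.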

This is recognizable as Gumbel's Type I bivariate exponential distribution (with exponential marginals). From Equation (21), we obtain the joint survival function of $(X_1, X_2)$ in the form

$$\begin{aligned} P(X_1 > x_1, X_2 > x_2) = {} & [1 - F_1(x_1)]^{\alpha_1}[1 - F_2(x_2)]^{\alpha_2} \\ & \times \exp\{-\alpha_1\alpha_2\delta\log[1 - F_1(x_1)]\log[1 - F_2(x_2)]\}. \end{aligned} \tag{22}$$

Then, the joint cumulative distribution function is

$$\begin{aligned} F(x_1, x_2) = {} & [1 - F_1(x_1)]^{\alpha_1}[1 - F_2(x_2)]^{\alpha_2}\exp\{-\alpha_1\alpha_2\delta\log[1 - F_1(x_1)]\log[1 - F_2(x_2)]\} \\ & + [1 - (1 - F_1(x_1))^{\alpha_1}] + [1 - (1 - F_2(x_2))^{\alpha_2}] - 1, \end{aligned} \tag{23}$$

and the joint density function is

$$\begin{aligned} f_{X_1, X_2}(x_1, x_2) = {} & \alpha_1\alpha_2 f_1(x_1)[1 - F_1(x_1)]^{\alpha_1 - 1} f_2(x_2)[1 - F_2(x_2)]^{\alpha_2 - 1} \\ & \times \exp\{-\alpha_1\alpha_2\delta\log[1 - F_1(x_1)]\log[1 - F_2(x_2)]\} \\ & \times \big[(1 - \alpha_1\delta\log[1 - F_1(x_1)])(1 - \alpha_2\delta\log[1 - F_2(x_2)]) - \delta\big]. \end{aligned} \tag{24}$$

The vector $(Y_1, Y_2)$ with survival function (21) has exponential marginals, i.e., $Y_i \sim \exp(\alpha_i)$ for $i = 1, 2$, and thus $X_i \sim PH(F_i, \alpha_i)$ for $i = 1, 2$.

Consequently, for $j, j' \in \{1, 2\}$ with $j \ne j'$, the conditional densities are given by

$$\begin{aligned} f_{X_j \mid X_{j'}}(x_j \mid x_{j'}) = {} & \alpha_j f_j(x_j)[1 - F_j(x_j)]^{\alpha_j - 1}\exp\{-\alpha_j\alpha_{j'}\delta\log[1 - F_j(x_j)]\log[1 - F_{j'}(x_{j'})]\} \\ & \times \big[(1 - \alpha_j\delta\log[1 - F_j(x_j)])(1 - \alpha_{j'}\delta\log[1 - F_{j'}(x_{j'})]) - \delta\big]. \end{aligned} \tag{25}$$

If $(X_1, X_2)$ has a survival function of the form (22), we write $(X_1, X_2) \sim PHC(II)(F_1, \alpha_1; F_2, \alpha_2; \delta)$. Note that $X_1$ and $X_2$ in (22) will be independent if and only if $\delta = 0$.
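Formulas (22) and (24) can be cross-checked numerically, since for a bivariate survival function $\partial^2 P(X_1 > x_1, X_2 > x_2)/\partial x_1\,\partial x_2$ is the joint density. The NumPy sketch below does this for Weibull bases $F_1(x) = 1 - e^{-x^{\theta}}$ and $F_2(x) = 1 - e^{-x^{\tau}}$ (so that $\log[1 - F_1(x_1)] = -x_1^{\theta}$) using a central finite difference; the parameter values and function names are illustrative only.

```python
import numpy as np

def phc2_weibull_survival(x1, x2, a1, a2, theta, tau, delta):
    """Joint survival (22) with Weibull bases; log[1 - F1(x1)] = -x1**theta, etc."""
    L1, L2 = -x1 ** theta, -x2 ** tau
    return np.exp(a1 * L1 + a2 * L2 - a1 * a2 * delta * L1 * L2)

def phc2_weibull_density(x1, x2, a1, a2, theta, tau, delta):
    """Joint density (24) with the same Weibull bases."""
    L1, L2 = -x1 ** theta, -x2 ** tau
    f1 = theta * x1 ** (theta - 1) * np.exp(L1)     # base density f1
    f2 = tau * x2 ** (tau - 1) * np.exp(L2)         # base density f2
    bracket = (1 - a1 * delta * L1) * (1 - a2 * delta * L2) - delta
    return (a1 * a2 * f1 * np.exp((a1 - 1) * L1) * f2 * np.exp((a2 - 1) * L2)
            * np.exp(-a1 * a2 * delta * L1 * L2) * bracket)

def mixed_partial(S, x1, x2, h=1e-4):
    """Central-difference approximation of d^2 S / (dx1 dx2)."""
    return (S(x1 + h, x2 + h) - S(x1 + h, x2 - h)
            - S(x1 - h, x2 + h) + S(x1 - h, x2 - h)) / (4 * h * h)

params = dict(a1=1.2, a2=0.8, theta=1.25, tau=0.75, delta=0.45)

def survival(u, v):
    return phc2_weibull_survival(u, v, **params)

fd = mixed_partial(survival, 0.9, 1.3)
exact = phc2_weibull_density(0.9, 1.3, **params)
print(fd, exact)
```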

4. If $F_{1}$ and $F_{2}$ Are Known

Suppose that we have available a sample of size $n$, $(X_{1,j}, X_{2,j})$, $j = 1, 2, \ldots, n$, from one of the bivariate proportional hazard models discussed in this paper. Since $F_1$ and $F_2$ are known, it is appropriate to transform the data to obtain

$$Y_{i,j} = -\log(1 - F_i(X_{i,j})), \quad i = 1, 2, \quad j = 1, 2, \ldots, n,$$

and thus to have a sample $(Y_{1,j}, Y_{2,j})$, $j = 1, 2, \ldots, n$, from the corresponding well-known bivariate exponential distribution. See the following references for appropriate estimation strategies for these bivariate exponential data sets:

Gumbel [6],
Besag [7],
Arnold and Strauss ([3,8]),
Castillo and Hadi [9],
Arnold et al. [5].
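The transformation itself is a one-liner once the base survival function $1 - F_i$ is available; passing $1 - F_i$ directly, rather than forming $1 - F_i(x)$ from $F_i(x)$, avoids a loss of precision in the upper tail. The hypothetical NumPy sketch below uses the Weibull base (26) as the known $F_1$ and also computes $n/\sum_j Y_{1,j}$, the maximum likelihood estimate of the exponential rate $\alpha_1$ from the first margin alone; the data are simulated only for illustration.

```python
import numpy as np

def to_exponential_scale(x, base_survival):
    """Y = -log(1 - F(X)), given the base survival function 1 - F."""
    return -np.log(base_survival(x))

theta = 1.25                                   # Weibull base (26), assumed known here

def weibull_survival(x):
    return np.exp(-x ** theta)                 # 1 - F1(x) = exp(-x**theta)

rng = np.random.default_rng(5)
alpha1 = 2.0
y_true = rng.exponential(1.0 / alpha1, 10_000)
x = y_true ** (1.0 / theta)                    # X ~ PH(F1, alpha1), via representation (3)

y = to_exponential_scale(x, weibull_survival)
alpha1_hat = len(y) / y.sum()                  # exponential-rate MLE from this margin
print(alpha1_hat)
```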

5. If $F_{1}$ and $F_{2}$ Are Known to Belong to Some Given Parametric Families

We will illustrate this with a particular example. Other examples may be treated in an analogous fashion. Suppose that in the PHC(II) model (23) we replace $F_1(x_1)$ by $F_1(x_1; \theta)$ and $F_2(x_2)$ by $F_2(x_2; \tau)$, where the parameters $\theta$ and $\tau$ are unknown. In this case the model becomes more complicated, but we can still hope to estimate all the parameters in the model. As a specific example, consider the following distributions of the Weibull form:

$$F_1(x_1; \theta) = 1 - e^{-x_1^{\theta}}, \quad x_1 > 0, \tag{26}$$

and

$$F_2(x_2; \tau) = 1 - e^{-x_2^{\tau}}, \quad x_2 > 0. \tag{27}$$

The corresponding log-likelihood function for a sample $(x_{1i}, x_{2i})$, $i = 1, \ldots, n$, is of the form

$$\begin{aligned} \ell(\boldsymbol{\theta}; \mathbf{x}_1, \mathbf{x}_2) = {} & n\log(\alpha_1\alpha_2\theta\tau) + (\theta - 1)\sum_{i=1}^{n}\log x_{1i} + (\tau - 1)\sum_{i=1}^{n}\log x_{2i} - \alpha_1\sum_{i=1}^{n} x_{1i}^{\theta} - \alpha_2\sum_{i=1}^{n} x_{2i}^{\tau} \\ & - \alpha_1\alpha_2\delta\sum_{i=1}^{n} x_{1i}^{\theta}x_{2i}^{\tau} + \sum_{i=1}^{n}\log[(1 + \alpha_1\delta x_{1i}^{\theta})(1 + \alpha_2\delta x_{2i}^{\tau}) - \delta]. \end{aligned}$$

The score function $U(\boldsymbol{\theta}) = (U(\theta), U(\tau), U(\delta), U(\alpha_1), U(\alpha_2))$ has as its elements the derivatives of the log-likelihood function with respect to the parameters. Writing $D_i = (1 + \alpha_1\delta x_{1i}^{\theta})(1 + \alpha_2\delta x_{2i}^{\tau}) - \delta$, these are given by

$$\begin{aligned} U(\alpha_1) &= \frac{\partial\ell(\boldsymbol{\theta})}{\partial\alpha_1} = \frac{n}{\alpha_1} - \sum_{i=1}^{n} x_{1i}^{\theta} - \alpha_2\delta\sum_{i=1}^{n} x_{1i}^{\theta}x_{2i}^{\tau} + \delta\sum_{i=1}^{n}\frac{(1 + \alpha_2\delta x_{2i}^{\tau})x_{1i}^{\theta}}{D_i}, \\ U(\alpha_2) &= \frac{\partial\ell(\boldsymbol{\theta})}{\partial\alpha_2} = \frac{n}{\alpha_2} - \sum_{i=1}^{n} x_{2i}^{\tau} - \alpha_1\delta\sum_{i=1}^{n} x_{1i}^{\theta}x_{2i}^{\tau} + \delta\sum_{i=1}^{n}\frac{(1 + \alpha_1\delta x_{1i}^{\theta})x_{2i}^{\tau}}{D_i}, \\ U(\delta) &= \frac{\partial\ell(\boldsymbol{\theta})}{\partial\delta} = -\alpha_1\alpha_2\sum_{i=1}^{n} x_{1i}^{\theta}x_{2i}^{\tau} + \sum_{i=1}^{n}\frac{\alpha_1 x_{1i}^{\theta} + \alpha_2 x_{2i}^{\tau} - 1 + 2\alpha_1\alpha_2\delta x_{1i}^{\theta}x_{2i}^{\tau}}{D_i}, \\ U(\theta) &= \frac{\partial\ell(\boldsymbol{\theta})}{\partial\theta} = \frac{n}{\theta} + \sum_{i=1}^{n}\log x_{1i} - \alpha_1\sum_{i=1}^{n} x_{1i}^{\theta}\log x_{1i} - \alpha_1\alpha_2\delta\sum_{i=1}^{n} x_{1i}^{\theta}x_{2i}^{\tau}\log x_{1i} + \alpha_1\delta\sum_{i=1}^{n}\frac{(1 + \alpha_2\delta x_{2i}^{\tau})x_{1i}^{\theta}\log x_{1i}}{D_i}, \\ U(\tau) &= \frac{\partial\ell(\boldsymbol{\theta})}{\partial\tau} = \frac{n}{\tau} + \sum_{i=1}^{n}\log x_{2i} - \alpha_2\sum_{i=1}^{n} x_{2i}^{\tau}\log x_{2i} - \alpha_1\alpha_2\delta\sum_{i=1}^{n} x_{1i}^{\theta}x_{2i}^{\tau}\log x_{2i} + \alpha_2\delta\sum_{i=1}^{n}\frac{(1 + \alpha_1\delta x_{1i}^{\theta})x_{2i}^{\tau}\log x_{2i}}{D_i}. \end{aligned}$$
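Before handing these expressions to an optimizer, it is worth checking a direct transcription of the log-likelihood and score against a finite-difference gradient. The NumPy sketch below orders the parameters as $\boldsymbol{\theta} = (\theta, \tau, \delta, \alpha_1, \alpha_2)$; the function names are our own, and the data are arbitrary Weibull draws used only for the check.

```python
import numpy as np

def loglik_phc2_weibull(params, x1, x2):
    """PHC(II)-Weibull log-likelihood; params = (theta, tau, delta, a1, a2)."""
    theta, tau, delta, a1, a2 = params
    u, v = x1 ** theta, x2 ** tau
    D = (1 + a1 * delta * u) * (1 + a2 * delta * v) - delta
    return (len(x1) * np.log(a1 * a2 * theta * tau)
            + (theta - 1) * np.log(x1).sum() + (tau - 1) * np.log(x2).sum()
            - a1 * u.sum() - a2 * v.sum() - a1 * a2 * delta * (u * v).sum()
            + np.log(D).sum())

def score_phc2_weibull(params, x1, x2):
    """Analytic score (U(theta), U(tau), U(delta), U(a1), U(a2))."""
    theta, tau, delta, a1, a2 = params
    n = len(x1)
    l1, l2 = np.log(x1), np.log(x2)
    u, v = x1 ** theta, x2 ** tau
    uv = u * v
    D = (1 + a1 * delta * u) * (1 + a2 * delta * v) - delta
    U_th = (n / theta + l1.sum() - a1 * (u * l1).sum() - a1 * a2 * delta * (uv * l1).sum()
            + a1 * delta * ((1 + a2 * delta * v) * u * l1 / D).sum())
    U_ta = (n / tau + l2.sum() - a2 * (v * l2).sum() - a1 * a2 * delta * (uv * l2).sum()
            + a2 * delta * ((1 + a1 * delta * u) * v * l2 / D).sum())
    U_d = (-a1 * a2 * uv.sum()
           + ((a1 * u + a2 * v - 1 + 2 * a1 * a2 * delta * uv) / D).sum())
    U_a1 = (n / a1 - u.sum() - a2 * delta * uv.sum()
            + delta * ((1 + a2 * delta * v) * u / D).sum())
    U_a2 = (n / a2 - v.sum() - a1 * delta * uv.sum()
            + delta * ((1 + a1 * delta * u) * v / D).sum())
    return np.array([U_th, U_ta, U_d, U_a1, U_a2])

# Central finite-difference check on arbitrary positive data.
rng = np.random.default_rng(0)
x1, x2 = rng.weibull(1.25, 60), rng.weibull(0.75, 60)
params = np.array([1.25, 0.75, 0.3, 1.0, 1.0])
eps = 1e-5
fd = np.array([(loglik_phc2_weibull(params + eps * e, x1, x2)
                - loglik_phc2_weibull(params - eps * e, x1, x2)) / (2 * eps)
               for e in np.eye(5)])
print(np.max(np.abs(fd - score_phc2_weibull(params, x1, x2))))
```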

Equating the scores to zero gives the score equations, $U(\boldsymbol{\theta}) = \mathbf{0}$. These equations are typically solved by means of Newton–Raphson or quasi-Newton numerical methods to obtain the maximum likelihood estimator $\hat{\boldsymbol{\theta}} = (\hat{\theta}, \hat{\tau}, \hat{\delta}, \hat{\alpha}_1, \hat{\alpha}_2)$ of the parameter vector $\boldsymbol{\theta} = (\theta, \tau, \delta, \alpha_1, \alpha_2)$. The observed information matrix of $\boldsymbol{\theta}$ is given by $K(\boldsymbol{\theta}) = -\frac{\partial}{\partial\boldsymbol{\theta}'}U(\boldsymbol{\theta}) = (K_{\theta_j\theta_{j'}})$, i.e., its elements are minus the second derivatives of the log-likelihood function with respect to the parameters. The Fisher information matrix of $\boldsymbol{\theta}$ is given by $I(\boldsymbol{\theta}) = E(K(\boldsymbol{\theta}))$ and should be calculated numerically.
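When $I(\boldsymbol{\theta})$ is hard to compute, a common practical substitute is the observed information $K(\hat{\boldsymbol{\theta}})$, with standard errors taken from the diagonal of its inverse. A generic finite-difference sketch (NumPy; the function names are hypothetical), checked here on a quadratic function whose Hessian is known exactly:

```python
import numpy as np

def numerical_hessian(f, x, eps=1e-4):
    """Central-difference Hessian of a scalar function f at the point x."""
    x = np.asarray(x, dtype=float)
    p = x.size
    steps = np.eye(p) * eps
    H = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            ei, ej = steps[i], steps[j]
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * eps ** 2)
    return H

def observed_information_se(loglik, estimate):
    """Observed information K = -Hessian at the estimate, and SEs from diag(K^{-1})."""
    K = -numerical_hessian(loglik, estimate)
    return K, np.sqrt(np.diag(np.linalg.inv(K)))

A = np.array([[4.0, 1.0], [1.0, 3.0]])

def quadratic_loglik(b):
    return -0.5 * b @ A @ b        # exact Hessian is -A

K, se = observed_information_se(quadratic_loglik, np.array([0.5, -0.2]))
print(K)
print(se)
```

In practice, `loglik` would be the log-likelihood viewed as a function of the parameter vector only, evaluated at the maximum likelihood estimate.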

When we use the base distributions (26) and (27) in the PHC(I) model, a technique known as pseudo-likelihood estimation (see Arnold and Strauss [10]) will provide estimates of all five parameters in the model without requiring the normalizing constant $k(\delta)$. Besag [7] defined the pseudo-likelihood estimator of a parameter vector $\boldsymbol{\beta}$ as the value of $\boldsymbol{\beta}$ that maximizes the pseudo-likelihood function, which in the present bivariate situation is based on the conditional PH densities and is given by

$$L_P(\boldsymbol{\beta}; \mathbf{x}_1, \mathbf{x}_2) = \prod_{i=1}^{n} f_{X_1 \mid X_2}(x_{1i} \mid x_{2i})\, f_{X_2 \mid X_1}(x_{2i} \mid x_{1i}). \tag{28}$$

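Thus, for the example with distributions $F_1(x_1; \gamma_1) = x_1^{\gamma_1}$, $0 < x_1 < 1$, and $F_2(x_2; \gamma_2) = x_2^{\gamma_2}$, $0 < x_2 < 1$, where by (13) $X_1 \mid X_2 = x_2 \sim PH(F_1, \alpha_1\{1 - \alpha_2\delta\log[1 - F_2(x_2)]\})$ and symmetrically for $X_2 \mid X_1 = x_1$, we have the following log-pseudo-likelihood:

$$\begin{aligned} \ell_P(\boldsymbol{\beta}; \mathbf{x}_1, \mathbf{x}_2) = {} & n\log(\alpha_1\alpha_2\gamma_1\gamma_2) + (\gamma_1 - 1)\sum_{i=1}^{n}\log x_{1i} + (\gamma_2 - 1)\sum_{i=1}^{n}\log x_{2i} \\ & + (\alpha_1 - 1)\sum_{i=1}^{n}\log(1 - x_{1i}^{\gamma_1}) + (\alpha_2 - 1)\sum_{i=1}^{n}\log(1 - x_{2i}^{\gamma_2}) \\ & + \sum_{i=1}^{n}\log[1 - \alpha_1\delta\log(1 - x_{1i}^{\gamma_1})] + \sum_{i=1}^{n}\log[1 - \alpha_2\delta\log(1 - x_{2i}^{\gamma_2})] \\ & - 2\alpha_1\alpha_2\delta\sum_{i=1}^{n}\log(1 - x_{1i}^{\gamma_1})\log(1 - x_{2i}^{\gamma_2}). \end{aligned}$$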

Parallel to the definition of the score function, the pseudo-score function is defined to be the vector whose coordinates are the partial derivatives of the log-pseudo-likelihood function with respect to each of the parameters in the model. It is denoted by

$$U_p(\boldsymbol{\beta}) = (U_p(\gamma_1), U_p(\gamma_2), U_p(\delta), U_p(\alpha_1), U_p(\alpha_2))'.$$

The estimating equations are constructed by setting the elements of the pseudo-score vector equal to zero. Solutions of these equations are the pseudo-likelihood estimates of the parameters of the model; typically, they are obtained numerically using iterative methods such as Newton–Raphson or quasi-Newton.

The pseudo-likelihood estimator $\hat{\boldsymbol{\beta}}$ of $\boldsymbol{\beta}$ obtained in this fashion can be shown to be consistent and asymptotically normally distributed with covariance matrix given by

$$\Sigma_p = J^{-1}(\boldsymbol{\beta})\,K(\boldsymbol{\beta})\,J^{-1}(\boldsymbol{\beta})$$

(see Arnold and Strauss [10]), where, for $l, m = 1, \ldots, 5$,

$$K_{lm}(\boldsymbol{\beta}) = E\left[\frac{\partial\ell_p(\boldsymbol{\beta})}{\partial\beta_l}\,\frac{\partial\ell_p(\boldsymbol{\beta})}{\partial\beta_m}\right], \qquad J_{lm}(\boldsymbol{\beta}) = -E\left[\frac{\partial^2\ell_p(\boldsymbol{\beta})}{\partial\beta_l\,\partial\beta_m}\right].$$

As a consistent estimate of the asymptotic variance-covariance matrix of the pseudo-likelihood estimator, we will use the sandwich estimator proposed by Cheng and Riu [11]. This estimator is developed as follows.

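Let $U_{pi}(\boldsymbol{\beta}) = \partial\ell_{pi}(\boldsymbol{\beta})/\partial\boldsymbol{\beta}$ be the vector of pseudo-scores for the $i$-th observation, where $\ell_{pi}$ is the contribution of the $i$-th observation to $\ell_p$. Then define

$$\hat{J}_n(\boldsymbol{\beta}) = -\frac{1}{n}\sum_{i=1}^{n}\left.\frac{\partial U_{pi}(\boldsymbol{\beta})}{\partial\boldsymbol{\beta}'}\right|_{\tilde{\boldsymbol{\beta}}},$$

which is minus the average, over all the observations, of the matrices of second derivatives of the $\ell_{pi}(\boldsymbol{\beta})$, evaluated at the pseudo-likelihood estimator $\tilde{\boldsymbol{\beta}}$. In addition, define

$$\hat{K}_n(\boldsymbol{\beta}) = \frac{1}{n}\sum_{i=1}^{n}\left.U_{pi}(\boldsymbol{\beta})\,U_{pi}(\boldsymbol{\beta})'\right|_{\tilde{\boldsymbol{\beta}}}.$$

Using these, we construct a consistent sandwich estimator of the asymptotic variance-covariance matrix in the form

$$\hat{\Sigma}(\tilde{\boldsymbol{\beta}}) = \frac{1}{n}\hat{J}_n^{-1}(\tilde{\boldsymbol{\beta}})\,\hat{K}_n(\tilde{\boldsymbol{\beta}})\,[\hat{J}_n^{-1}(\tilde{\boldsymbol{\beta}})]'.$$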
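Given the per-observation pseudo-scores and the summed second-derivative matrix, both evaluated at $\tilde{\boldsymbol{\beta}}$, the sandwich estimator is a few lines of linear algebra. The NumPy sketch below (hypothetical names) is checked on a toy exponential-rate model, $\ell_i(\lambda) = \log\lambda - \lambda x_i$, for which the sandwich variance at $\hat{\lambda} = 1/\bar{x}$ reduces to $\hat{\lambda}^4\hat{\sigma}^2/n$, with $\hat{\sigma}^2$ the sample variance computed with divisor $n$.

```python
import numpy as np

def sandwich_covariance(scores, hessian_sum):
    """Sandwich estimate (1/n) J^{-1} K J^{-1}' from per-observation scores.

    scores: (n, p) array whose i-th row is U_pi evaluated at the estimate.
    hessian_sum: (p, p) sum over observations of the second-derivative matrices.
    """
    n = scores.shape[0]
    J = -hessian_sum / n               # \hat{J}_n
    K = scores.T @ scores / n          # \hat{K}_n
    J_inv = np.linalg.inv(J)
    return J_inv @ K @ J_inv.T / n

rng = np.random.default_rng(7)
x = rng.exponential(2.0, 500)
lam = 1.0 / x.mean()                             # MLE of the rate
scores = (1.0 / lam - x).reshape(-1, 1)          # U_i(lam) = 1/lam - x_i
hessian_sum = np.array([[-len(x) / lam ** 2]])   # sum of d^2 l_i / d lam^2
cov = sandwich_covariance(scores, hessian_sum)
print(cov[0, 0], lam ** 4 * x.var() / len(x))
```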

A detailed discussion and analysis of such a model, but with power Lindley base distributions, utilizing pseudo-likelihood estimation, may be found in Martínez-Flórez et al. [12].

6. If $F_{1}$ and $F_{2}$ Are Unknown

All but one of the bivariate proportional hazard models described in this paper have marginals of the proportional hazard form. The exception is the PHC(I) model, which, for unknown $F_1$ and $F_2$, we discuss in Section 7. For the other models, we know that if $F_1$ and $F_2$ were known, we could transform the data to obtain a sample from a well-known bivariate exponential model. Consequently, if we consider an estimate $\tilde{F}_{1,n}(x)$ of $F_1$ based on $X_{1,1}, X_{1,2}, \ldots, X_{1,n}$ and an estimate $\tilde{F}_{2,n}(x)$ of $F_2$ based on $X_{2,1}, X_{2,2}, \ldots, X_{2,n}$, we can transform the data using

$$Z_{1,j} = -\log[1 - \tilde{F}_{1,n}(X_{1,j})] \quad \text{and} \quad Z_{2,j} = -\log[1 - \tilde{F}_{2,n}(X_{2,j})],$$

and then we will have approximately a sample from a bivariate distribution with standard exponential marginals, so that we can estimate the parameters of this exponential model. Note that, for identifiability in models with unknown $F_1$ and $F_2$, we have to fix $\alpha_1$ and $\alpha_2$ to be equal to 1 (since $[1 - F_i]^{\alpha_i}$ is itself a survival function, $\alpha_i$ cannot be separated from an unknown $F_i$).
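A minimal sketch of this semi-parametric route for the BG(I) case, under our own choices rather than the authors': $\tilde{F}_{i,n}$ is taken to be the empirical distribution function rescaled by $n/(n+1)$ (so that no transformed value is infinite), and $\delta$ is estimated by maximizing the standard ($\alpha_1 = \alpha_2 = 1$) Gumbel Type I log-likelihood over a grid. The data are simulated with NumPy from the Gumbel Type I model (drawing $Z_2$ given $Z_1$ from an exponential/gamma mixture) and then pushed through arbitrary increasing transformations, which the rank-based transform undoes. All names are hypothetical.

```python
import numpy as np

def exponential_scores(x):
    """Z = -log(1 - F_n(X)), with F_n(x) = rank / (n + 1) (no ties assumed)."""
    n = len(x)
    ranks = np.argsort(np.argsort(x)) + 1
    return -np.log1p(-ranks / (n + 1.0))

def estimate_delta_gumbel1(z1, z2, grid=None):
    """Grid-search MLE of delta in the standard Gumbel Type I model."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 201)
    d = grid[:, None]
    # log-density up to the terms -z1 - z2, which do not involve delta
    ll = (np.log((1 + d * z1) * (1 + d * z2) - d) - d * z1 * z2).sum(axis=1)
    return grid[np.argmax(ll)]

# Simulated Gumbel Type I data on the standardized scale.
rng = np.random.default_rng(11)
n, delta_true = 4000, 0.5
z1 = rng.exponential(size=n)
a = 1 + delta_true * z1
z2 = rng.gamma(np.where(rng.random(n) < delta_true / a, 2.0, 1.0), 1 / a)

# "Unknown" increasing marginal transformations (here, Weibull-type powers).
x1, x2 = z1 ** (1 / 1.25), z2 ** (1 / 0.75)

delta_hat = estimate_delta_gumbel1(exponential_scores(x1), exponential_scores(x2))
print(delta_hat)
```

Because only the ranks of each margin are used, the estimate does not depend on which increasing transformations produced the observed data.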

For the PHC(II) model with Weibull component distributions given in (26) and (27), a small simulation study of the performance of the maximum likelihood estimates has been carried out for a variety of sample sizes and several parameter configurations. With minimal loss of generality, we set $\alpha_1 = \alpha_2 = 1$ throughout the simulation study. Three values of the dependence parameter $\delta$ were used, namely 0.15, 0.30 and 0.45, together with four sample sizes, $n = 30, 50, 70, 90$. The table presents results for three representative parameter configurations. As measures of performance, the relative bias (RB) and the square root of the mean squared error ($\sqrt{MSE}$) are given.
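For reference, the two performance measures can be computed from Monte Carlo replicates as below. The text does not spell out its definition of RB; this NumPy sketch assumes the common form $|\overline{\hat{\vartheta}} - \vartheta|/|\vartheta|$ for a generic parameter $\vartheta$, which is an assumption, and the input numbers are arbitrary.

```python
import numpy as np

def relative_bias_and_rmse(estimates, true_value):
    """Relative bias and root mean squared error of Monte Carlo replicate estimates.

    RB is taken here as |mean(estimates) - true| / |true| (an assumed definition).
    """
    est = np.asarray(estimates, dtype=float)
    rb = abs(est.mean() - true_value) / abs(true_value)
    rmse = np.sqrt(np.mean((est - true_value) ** 2))
    return rb, rmse

rb, rmse = relative_bias_and_rmse([1.1, 0.9, 1.3], 1.0)
print(rb, rmse)
```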

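The results in Table 1 indicate that the relative bias and the root mean squared error of the estimates generally decrease as the sample size increases, although not uniformly (for example, $\sqrt{MSE}$ for $\hat{\theta}$ in the first configuration is essentially flat for $n \ge 50$).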

Table 1. RB and $\sqrt{MSE}$ for the PHC(II)-Weibull model.

7. If $F_{1}$ and $F_{2}$ Are Unknown in the PHC(I) Model

Our model is of the $PHC(I)(F_1, 1; F_2, 1; \delta)$ form, i.e.,

$$f_{X_1, X_2}(x_1, x_2) = k(\delta)\, f_1(x_1) f_2(x_2)\exp\{-\delta\log[1 - F_1(x_1)]\log[1 - F_2(x_2)]\}, \tag{29}$$

where $\delta$, $F_1$ and $F_2$ are unknown. Although it would be easy to estimate $\delta$ via pseudo-likelihood if $F_1$ and $F_2$ were known, it is not apparent how to estimate $F_1$ and $F_2$ assuming that $\delta$ is known. So it is not clear how to implement an iterative strategy for estimating $F_1$, $F_2$ and $\delta$ simultaneously. Perhaps our only choice is to assume that the $F_i$'s belong to some parametric families of distributions, once more utilizing pseudo-likelihood to avoid dealing with $k(\delta)$.

8. Application

The data analyzed in this example consist of the maximum water levels registered at two stations on the Fox River in Wisconsin during the period 1918–1950. Measurements were made at an upstream location (Berlin, $X_1$) and a downstream location (Wrightstown, $X_2$). This data set was previously analyzed by Gumbel and Mustafi [13] using a bivariate extreme-value model.

In our analysis of this data set we will fit four models, namely:

The Arnold and Strauss [3] bivariate exponential conditionals distribution, denoted by BEC.
Gumbel's [6] first bivariate exponential distribution, denoted by BG(I).
The proportional hazard conditionals Weibull extension of the BEC distribution, denoted by PHC(I)-W.
The proportional hazard conditionals Weibull extension of the BG(I) distribution, denoted by PHC(II)-W.

In both of the Weibull proportional hazard conditionals extensions mentioned above, i.e., PHC(I)-W and PHC(II)-W, as described in Section 3, we use the following choices for the component distributions $F_1$ and $F_2$:

$$F_1(x_1; \theta) = 1 - e^{-x_1^{\theta}}, \quad x_1 > 0, \qquad F_2(x_2; \tau) = 1 - e^{-x_2^{\tau}}, \quad x_2 > 0.$$

Using the Arnold and Strauss [3] density given in Equation (11), the density of the PHC(I)-W model is given by

$$f_{X_1, X_2}(x_1, x_2) = \alpha_1\alpha_2\theta\tau\, k(\delta)\, x_1^{\theta - 1}x_2^{\tau - 1}\exp(-\alpha_1 x_1^{\theta} - \alpha_2 x_2^{\tau} - \alpha_1\alpha_2\delta x_1^{\theta}x_2^{\tau}).$$

The corresponding log-pseudo-likelihood function for a sample of size $n$ takes the form

$$\begin{aligned} \ell_P(\boldsymbol{\theta}; \mathbf{x}_1, \mathbf{x}_2) = {} & n\log(\alpha_1\alpha_2\theta\tau) + (\theta - 1)\sum_{i=1}^{n}\log x_{1i} + (\tau - 1)\sum_{i=1}^{n}\log x_{2i} + \sum_{i=1}^{n}\log[(1 + \alpha_1\delta x_{1i}^{\theta})(1 + \alpha_2\delta x_{2i}^{\tau})] \\ & - \alpha_1\sum_{i=1}^{n}(1 + \alpha_2\delta x_{2i}^{\tau})x_{1i}^{\theta} - \alpha_2\sum_{i=1}^{n}(1 + \alpha_1\delta x_{1i}^{\theta})x_{2i}^{\tau}. \end{aligned}$$
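The log-pseudo-likelihood above is the sum, over observations, of the log-densities of the two PH conditionals, $X_1 \mid X_2 = x_2 \sim PH(F_1, \alpha_1(1 + \alpha_2\delta x_2^{\tau}))$ and symmetrically for $X_2$. The NumPy sketch below (hypothetical names) transcribes the closed form and checks it against that conditional construction on arbitrary Weibull draws.

```python
import numpy as np

def log_pseudolik_phc1_weibull(params, x1, x2):
    """Log-pseudo-likelihood of PHC(I)-W; params = (theta, tau, delta, a1, a2)."""
    theta, tau, delta, a1, a2 = params
    u, v = x1 ** theta, x2 ** tau              # -log[1 - F_i(x_i)] for the Weibull bases
    n = len(x1)
    return (n * np.log(a1 * a2 * theta * tau)
            + (theta - 1) * np.log(x1).sum() + (tau - 1) * np.log(x2).sum()
            + np.log((1 + a1 * delta * u) * (1 + a2 * delta * v)).sum()
            - a1 * ((1 + a2 * delta * v) * u).sum()
            - a2 * ((1 + a1 * delta * u) * v).sum())

def log_ph_weibull(x, shape, rate):
    """Log-density of PH(F, rate) with Weibull base F(x) = 1 - exp(-x**shape)."""
    return np.log(rate * shape) + (shape - 1) * np.log(x) - rate * x ** shape

rng = np.random.default_rng(4)
x1, x2 = rng.weibull(1.25, 40), rng.weibull(0.75, 40)
params = (1.25, 0.75, 0.4, 1.1, 0.9)
theta, tau, delta, a1, a2 = params

# Same quantity built directly from the two conditional PH densities.
direct = (log_ph_weibull(x1, theta, a1 * (1 + a2 * delta * x2 ** tau)).sum()
          + log_ph_weibull(x2, tau, a2 * (1 + a1 * delta * x1 ** theta)).sum())
print(log_pseudolik_phc1_weibull(params, x1, x2), direct)
```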

The log-pseudo-likelihood for the BEC model is obtained from the expression for the PHC(I)-W model by setting $\theta = 1$ and $\tau = 1$.

The log-likelihood function for a sample of size $n$ from the PHC(II)-W model is the log-likelihood given in Section 5 for the PHC(II) model with base distributions (26) and (27). The corresponding log-likelihood function for the BG(I) model is again obtained by setting $\theta = 1$ and $\tau = 1$.

Using the Fox River data, maximizing the log-likelihood for the models BG(I) and PHC(II)-W and the log-pseudo-likelihood for the models BEC and PHC(I)-W, we obtain the estimates of the parameters of the four models given in Table 2 (with standard errors in parentheses).

Table 2. Estimates (standard errors) for the fitted models.

To compare model fits, we use the AIC (Akaike [14]) criterion, namely $AIC = -2\hat{\ell}(\cdot) + 2p$. We also consider the BIC (Schwarz [15]) criterion, namely $BIC = -2\hat{\ell}(\cdot) + p\log(n)$, where $p$ is the number of parameters of the model being considered. The best model is the one with the smallest AIC or BIC.
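Both criteria are simple functions of the maximized log-likelihood; the small standard-library Python sketch below just makes the penalty terms explicit. BIC charges $\log n$ per parameter rather than 2, so for $n \ge 8$ it is the stricter of the two. The numbers in the example are purely illustrative and are not values from Table 2.

```python
import math

def aic(max_loglik, p):
    """AIC = -2 * maximized log-likelihood + 2p."""
    return -2.0 * max_loglik + 2.0 * p

def bic(max_loglik, p, n):
    """BIC = -2 * maximized log-likelihood + p log(n)."""
    return -2.0 * max_loglik + p * math.log(n)

# Illustrative numbers only: a 5-parameter model with n = 50 observations.
print(aic(-100.0, 5), round(bic(-100.0, 5, 50), 3))
```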

According to the values of the AIC and BIC criteria for the Fox River data, the best model is the PHC(I)-W model, followed by the PHC(II)-W model.

Since the BEC and BG(I) models are special cases of the PHC(I)-W and PHC(II)-W models, respectively, obtained by setting $\theta = \tau = 1$, we may test the hypotheses

$$H_0: (\theta, \tau) = (1, 1) \quad \text{versus} \quad H_1: (\theta, \tau) \ne (1, 1)$$

for comparing the PHC(II)-W and PHC(I)-W models with the BG(I) and BEC models, respectively.

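Using the likelihood ratio statistic

$$\Lambda = \frac{L_f(\hat{\boldsymbol{\beta}})}{L_{f\text{-}W}(\hat{\boldsymbol{\beta}})}, \tag{30}$$

where $L_f$ and $L_{f\text{-}W}$ denote the maximized likelihoods of the restricted model and of its Weibull extension, respectively, we obtain, asymptotically under $H_0$,

$$-2\log(\Lambda) \sim \chi^2_2. \tag{31}$$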

The corresponding values of $-2\log(\Lambda)$ in each case are provided in Table 3 (note that in the BEC versus PHC(I)-W comparison the log-pseudo-likelihoods have been used instead of log-likelihoods). They are greater than the value $\chi^2_{2, 99\%} = 9.210$, indicating that the PHC(II)-W and PHC(I)-W models fit significantly better at the 1% level. Thus, the PHC(II)-W and PHC(I)-W models appear to be good alternatives for fitting this data set. The choice between the PHC(I)-W and PHC(II)-W models is not so clear-cut, but perhaps the PHC(I)-W might be considered to be marginally better.
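For two degrees of freedom the $\chi^2$ survival function is $e^{-x/2}$, so both the p-value of $-2\log(\Lambda)$ and the critical value quoted above have closed forms. A standard-library Python sketch with hypothetical names; for the BEC versus PHC(I)-W comparison, which uses pseudo-likelihoods, the $\chi^2_2$ reference distribution should be regarded as approximate at best.

```python
import math

def lr_test_two_df(max_loglik_restricted, max_loglik_full):
    """Return -2 log(Lambda) and its chi-square(2) p-value, exp(-stat / 2)."""
    stat = 2.0 * (max_loglik_full - max_loglik_restricted)
    return stat, math.exp(-stat / 2.0)

critical_1pct = -2.0 * math.log(0.01)   # upper 1% point of chi-square(2)
print(round(critical_1pct, 3))          # 9.21, matching the 9.210 used in the text
```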

Table 3. Comparison of likelihood ratio statistics.

The graphs in Figure 1a,b and Figure 2a,b show the contours of the densities BG(I) and BEC and of the fitted models for PHC(II)-W and PHC(I)-W, respectively.

Figure 1. Contours for (a) BG(I) model and (b) BEC model.

Figure 2. Contours for (a) PHC(II)-W model and (b) PHC(I)-W model.

Under the assumption that the forms of the $F_i$'s are unknown, we use the transformations

$$Z_{1,j} = -\log[1 - \tilde{F}_{1,n}(X_{1,j})] \quad \text{and} \quad Z_{2,j} = -\log[1 - \tilde{F}_{2,n}(X_{2,j})]$$

to arrive at a BG(I) model with joint survival function

$$S(z_1, z_2) = \exp(-z_1 - z_2 - \delta z_1 z_2).$$

Then, using the expression for the maximum likelihood estimate of $\delta$ provided by Kotz et al. ([2], p. 352), we obtain $\hat{\delta} = 0.2986$, a much smaller value than the estimate of $\delta$ obtained assuming a known form for the $F_i$'s. Perhaps this indicates that the Weibull choices for the $F_i$'s are not optimal.

9. Discussion

The bivariate models discussed in this paper utilize quite different approaches to their construction and thus can be expected to exhibit significantly different distributional properties, especially with regard to dependence. Future research on such models should put some focus on the problem of selecting the appropriate one of these models for a particular data set. Of course, the old stand-by of fitting via maximum likelihood and comparing models via AIC and BIC is always available.

Author Contributions

Conceptualization, B.C.A., G.M.-F. and H.W.G.; Formal analysis, B.C.A., G.M.-F. and H.W.G.; Investigation, B.C.A., G.M.-F. and H.W.G.; Methodology, B.C.A. and G.M.-F.; Software, G.M.-F.; Supervision, B.C.A. and H.W.G.; Validation, B.C.A., G.M.-F. and H.W.G.; Writing—original draft preparation, B.C.A., G.M.-F. and H.W.G.; Funding acquisition, H.W.G. All of the authors contributed significantly to this research article. All authors have read and agreed to the published version of the manuscript.

Funding

The research of H.W. Gómez was supported by SEMILLERO UA-2022 project, Chile. The research of G. Martínez-Flórez was supported by Universidad de Córdoba, Montería, Colombia.

Data Availability Statement

The data can be found in Gumbel and Mustafi (1967).

Acknowledgments

The authors thank the Editor and three anonymous referees for their constructive comments and suggestions, which have greatly helped them to improve the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

[1] Marshall, A.W.; Olkin, I. A Multivariate Exponential Distribution. J. Am. Statist. Assoc. 1967, 62, 30–44.
[2] Kotz, S.; Balakrishnan, N.; Johnson, N.L. Continuous Multivariate Distributions; John Wiley and Sons: Hoboken, NJ, USA, 2000.
[3] Arnold, B.C.; Strauss, D.J. Bivariate distributions with exponential conditionals. J. Am. Statist. Assoc. 1988, 83, 522–527.
[4] Arnold, B.C.; Kim, Y.H. Conditional proportional hazards models. In Lifetime Data: Models in Reliability and Survival Analysis; Jewell, N.P., Kimber, A.C., Lee, M.L.T., Whitmore, G.A., Eds.; Springer: Boston, MA, USA, 1996; pp. 21–28.
[5] Arnold, B.C.; Castillo, E.; Sarabia, J.M. Conditional Specification of Statistical Models; Springer Series in Statistics; Springer: Berlin/Heidelberg, Germany, 1999.
[6] Gumbel, E.J. Bivariate Exponential Distributions. J. Am. Statist. Assoc. 1960, 55, 698–707.
[7] Besag, J. Statistical Analysis of Non-Lattice Data. J. R. Stat. Soc. Ser. D 1975, 24, 179–195.
[8] Arnold, B.C.; Strauss, D.J. Bivariate distributions with conditionals in prescribed exponential families. J. R. Stat. Soc. Ser. B 1991, 53, 365–375.
[9] Castillo, E.; Hadi, A.S. Modeling Lifetime Data with Application to Fatigue Models. J. Am. Statist. Assoc. 1995, 90, 1041–1054.
[10] Arnold, B.C.; Strauss, D.J. Pseudolikelihood estimation: Some examples. Sankhya Ser. B 1991, 53, 233–243.
[11] Cheng, C.; Riu, J. On estimating linear relationships when both variables are subject to heteroscedastic measurement errors. Technometrics 2006, 48, 511–519.
[12] Martínez-Flórez, G.; Arnold, B.C.; Gómez, H.W. A bivariate power Lindley survival distribution. 2022; Unpublished Work.
[13] Gumbel, E.; Mustafi, C.K. Some Analytical Properties of Bivariate Extremal Distributions. J. Am. Statist. Assoc. 1967, 62, 569–588.
[14] Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
[15] Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.

Figure 1. Contours for (a) BG(I) model and (b) BEC model.

Figure 2. Contours for (a) PHC(II)-W model and (b) PHC(I)-W model.

Table 1. RB (relative bias) and $\sqrt{MSE}$ for the PHC(II)-Weibull model.

True $(θ, τ, δ, α_{1}, α_{2})$	n	RB($\hat{θ}$)	$\sqrt{MSE}$($\hat{θ}$)	RB($\hat{τ}$)	$\sqrt{MSE}$($\hat{τ}$)	RB($\hat{δ}$)	$\sqrt{MSE}$($\hat{δ}$)	RB(${\hat{α}}_{1}$)	$\sqrt{MSE}$(${\hat{α}}_{1}$)	RB(${\hat{α}}_{2}$)	$\sqrt{MSE}$(${\hat{α}}_{2}$)
	30	0.1964	0.2630	0.4915	0.4029	1.6753	0.3424	0.2011	0.2591	0.5361	0.5931
	50	0.1911	0.2583	0.4864	0.3853	1.4262	0.2828	0.1983	0.2342	0.5339	0.5705
(1.25, 0.75, 0.15, 1, 1)	70	0.1842	0.2590	0.4819	0.3782	1.3480	0.2549	0.1974	0.2264	0.5330	0.5633
	90	0.1717	0.2597	0.4751	0.3686	1.3117	0.2376	0.1933	0.2165	0.5307	0.5541
	30	0.1948	0.2675	0.4966	0.4078	1.6798	0.3393	0.1957	0.2542	0.5358	0.596
	50	0.1934	0.2620	0.4830	0.3823	1.4519	0.2841	0.1957	0.2310	0.5356	0.5731
(1.25, 0.75, 0.30, 1, 1)	70	0.1887	0.2611	0.4827	0.3771	1.3260	0.2514	0.1935	0.2238	0.5332	0.5653
	90	0.1756	0.2579	0.4762	0.3688	1.2861	0.2351	0.1916	0.2178	0.5424	0.5629
	30	0.1894	0.2610	0.5316	0.4336	0.4177	0.3002	0.1918	0.2463	0.6127	0.6808
	50	0.1827	0.2512	0.5149	0.4090	0.4022	0.2688	0.1904	0.2297	0.6018	0.6428
(1.25, 0.75, 0.45, 1, 1)	70	0.1799	0.2483	0.4966	0.3884	0.4009	0.2604	0.1900	0.2210	0.5904	0.6193
	90	0.1679	0.2512	0.4905	0.3799	0.3958	0.2376	0.1779	0.2126	0.5878	0.6136
	30	0.4228	0.7457	1.2543	0.6503	1.6778	0.3378	0.4287	0.4473	1.4193	1.4930
	50	0.4189	0.7427	1.2332	0.6305	1.5481	0.2941	0.4270	0.4353	1.4191	1.4435
(1.75, 0.5, 0.15, 1, 1)	70	0.4193	0.7422	1.2311	0.6262	1.3103	0.2455	0.4245	0.4347	1.4048	1.4378
	90	0.4093	0.7326	1.2276	0.6215	1.2531	0.2294	0.4227	0.4346	1.3939	1.4363
	30	0.4218	0.7455	1.2862	0.6652	0.8103	0.3439	0.4204	0.4352	1.4968	1.5718
	50	0.4207	0.7428	1.2554	0.6408	0.7520	0.2984	0.4152	0.4288	1.4907	1.5178
(1.75, 0.5, 0.30, 1, 1)	70	0.4171	0.7395	1.2544	0.6373	0.6981	0.2686	0.4154	0.4281	1.4804	1.5167
	90	0.4098	0.7334	1.2421	0.6300	0.6730	0.2539	0.4143	0.4250	1.4733	1.5102
	30	0.4179	0.7376	1.2780	0.6602	0.4190	0.2985	0.4149	0.4270	1.5069	1.5796
	50	0.4169	0.7361	1.2606	0.6456	0.4115	0.2725	0.4148	0.4250	1.4762	1.5240
(1.75, 0.5, 0.45, 1, 1)	70	0.4126	0.7326	1.2354	0.6268	0.4056	0.2597	0.4122	0.4239	1.4655	1.4988
	90	0.4034	0.7225	1.2303	0.6238	0.3916	0.2455	0.4062	0.4227	1.4550	1.4831
	30	0.3903	0.3270	0.2506	0.3899	1.3177	0.3091	0.3590	0.4395	0.2778	0.3212
	50	0.3661	0.2980	0.2434	0.3887	1.1349	0.2508	0.3479	0.4006	0.2776	0.3044
(0.75, 1.5, 0.15, 1, 1)	70	0.3516	0.2811	0.2493	0.3877	1.1461	0.2343	0.3471	0.3816	0.2761	0.2962
	90	0.3452	0.2721	0.2305	0.3871	1.0392	0.2106	0.3449	0.3755	0.2748	0.2929
	30	0.3877	0.3280	0.2439	0.3831	0.6461	0.3136	0.3794	0.4639	0.2677	0.3001
	50	0.3699	0.3006	0.2424	0.3817	0.6403	0.2784	0.3539	0.4050	0.2660	0.2871
(0.75, 1.5, 0.30, 1, 1)	70	0.3568	0.2861	0.2365	0.3785	0.6328	0.2611	0.3527	0.3896	0.2600	0.2884
	90	0.3532	0.2784	0.2273	0.3767	0.6066	0.2373	0.3513	0.3819	0.2492	0.2823
	30	0.3933	0.3324	0.2529	0.3914	0.3934	0.2906	0.3774	0.4611	0.2664	0.3023
	50	0.3685	0.2996	0.2439	0.3900	0.3893	0.2662	0.3653	0.4186	0.2639	0.2872
(0.75, 1.5, 0.45, 1, 1)	70	0.3557	0.2834	0.2495	0.3883	0.3649	0.2523	0.3595	0.3955	0.2594	0.2856
	90	0.3482	0.2747	0.2286	0.3861	0.3486	0.2379	0.3551	0.3833	0.2526	0.2829
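Table 1 reports Monte Carlo summaries of the estimators. The sketch below shows how such summaries are typically computed, assuming RB is the absolute relative bias $|\overline{\hat{θ}} - θ|/θ$ and $\sqrt{MSE}$ is the root mean squared error about the true value (the exact definitions used in the study are not restated here). The helper `simulate_exponential_rate` is a hypothetical stand-in: it uses the closed-form maximum likelihood estimator of an exponential rate instead of a PHC(II)-Weibull fit, purely to illustrate the replicate-and-summarize loop over the sample sizes n = 30, 50, 70, 90.

```python
import math
import random


def relative_bias(estimates, true_value):
    # Assumed definition: |average estimate - true value| / |true value|.
    mean_est = sum(estimates) / len(estimates)
    return abs(mean_est - true_value) / abs(true_value)


def root_mse(estimates, true_value):
    # Square root of the average squared deviation from the true value.
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse)


def simulate_exponential_rate(true_rate, n, reps, seed=0):
    # Hypothetical stand-in estimator (NOT the PHC(II)-Weibull fit):
    # the MLE n / sum(sample) of an exponential rate, repeated reps times.
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.expovariate(true_rate) for _ in range(n)]
        estimates.append(n / sum(sample))
    return relative_bias(estimates, true_rate), root_mse(estimates, true_rate)


if __name__ == "__main__":
    for n in (30, 50, 70, 90):  # sample sizes used in Table 1
        rb, rmse = simulate_exponential_rate(1.0, n, reps=2000)
        print(f"n={n}: RB={rb:.4f}, sqrt(MSE)={rmse:.4f}")
```

Replacing the stand-in with a routine that simulates pairs from the PHC(II)-Weibull model and refits it by maximum likelihood would produce entries of the kind shown in Table 1; that routine is not reproduced here.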

Table 2. Estimates (standard errors) for the fitted models.

Estimates	BG(I)	PHC(II)-W	BEC	PHC(I)-W
${\hat{α}}_{1}$	0.2198	0.0775	0.0940	0.0277
	(0.0185)	(0.0389)	(0.0081)	(0.0077)
${\hat{α}}_{2}$	0.0656	0.0055	0.0283	0.0046
	(0.0056)	(0.0029)	(0.0089)	(0.0008)
$\hat{δ}$	0.8344	0.9053	3.8624	0.8753
	(0.3041)	(0.3341)	(0.3254)	(0.0006)
$\hat{θ}$		1.6396		2.0337
		(0.2979)		(0.0007)
$\hat{τ}$		1.8895		1.8384
		(0.1995)		(0.0006)
AIC	400.6863	364.4660	415.2929	353.4773
BIC	405.1758	371.9485	419.7824	360.9598
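The AIC and BIC rows of Table 2 can be cross-checked with a few lines of arithmetic. The sketch assumes the standard definitions $\mathrm{AIC} = -2\hat{\ell} + 2k$ and $\mathrm{BIC} = -2\hat{\ell} + k\ln n$, with $k$ taken as the number of parameters listed in each column (3 for BG(I) and BEC, 5 for the PHC-W models). Under those assumptions $-2\hat{\ell} = \mathrm{AIC} - 2k$ and $\mathrm{BIC} - \mathrm{AIC} = k(\ln n - 2)$, so the implied sample size can be recovered from each column.

```python
import math

# (AIC, BIC, k) copied from Table 2; k is assumed to be the number of
# estimated parameters reported in each column.
TABLE2 = {
    "BG(I)": (400.6863, 405.1758, 3),
    "PHC(II)-W": (364.4660, 371.9485, 5),
    "BEC": (415.2929, 419.7824, 3),
    "PHC(I)-W": (353.4773, 360.9598, 5),
}


def minus_two_loglik(aic, k):
    # AIC = -2*loglik + 2k  =>  -2*loglik = AIC - 2k
    return aic - 2 * k


def implied_sample_size(aic, bic, k):
    # BIC - AIC = k*(ln n - 2)  =>  n = exp((BIC - AIC)/k + 2)
    return math.exp((bic - aic) / k + 2)


if __name__ == "__main__":
    for name, (aic, bic, k) in TABLE2.items():
        print(f"{name}: -2 loglik = {minus_two_loglik(aic, k):.4f}, "
              f"implied n = {implied_sample_size(aic, bic, k):.2f}")
```

With these definitions all four columns imply the same n (about 33), which suggests the four criteria were computed consistently on the same data.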

Table 3. Comparison of likelihood ratio statistics.

	PHC(II)-W vs. BG(I)	PHC(I)-W vs. BEC
$-2\log(Λ)$	40.2203	65.8156
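The statistics in Table 3 can be reproduced from Table 2. Assuming $\mathrm{AIC} = -2\hat{\ell} + 2k$, each model's $-2\hat{\ell}$ is $\mathrm{AIC} - 2k$, and $-2\log(Λ)$ is the restricted model's $-2\hat{\ell}$ minus the full model's. The sketch further assumes that BG(I) and BEC are the special cases of PHC(II)-W and PHC(I)-W obtained by fixing the two extra parameters $θ$ and $τ$, so the usual asymptotic reference distribution is $\chi^{2}_{2}$, whose upper tail is $P(X > x) = e^{-x/2}$. That reference would need rechecking if the restricted values lie on the boundary of the parameter space.

```python
import math

# -2*loglik recovered from Table 2 as AIC - 2k (assumes AIC = -2*loglik + 2k).
NEG2LL = {
    "BG(I)": 400.6863 - 2 * 3,
    "PHC(II)-W": 364.4660 - 2 * 5,
    "BEC": 415.2929 - 2 * 3,
    "PHC(I)-W": 353.4773 - 2 * 5,
}


def lr_statistic(restricted, full):
    # -2 log(Lambda) = (-2 loglik, restricted model) - (-2 loglik, full model)
    return NEG2LL[restricted] - NEG2LL[full]


def chi2_df2_pvalue(stat):
    # Upper tail of a chi-square with 2 degrees of freedom: exp(-x/2).
    # The df = 2 choice assumes exactly two parameters are fixed under the null.
    return math.exp(-stat / 2)


if __name__ == "__main__":
    for full, restricted in (("PHC(II)-W", "BG(I)"), ("PHC(I)-W", "BEC")):
        stat = lr_statistic(restricted, full)
        print(f"{full} vs. {restricted}: -2 log(Lambda) = {stat:.4f}, "
              f"p = {chi2_df2_pvalue(stat):.2e}")
```

The differences match the Table 3 values, and both exceed the 5% critical value of $\chi^{2}_{2}$ (about 5.99) by a wide margin.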

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
