Article

Evaluation of Surrogate Endpoints Using Information-Theoretic Measure of Association Based on Havrda and Charvat Entropy

1   Department of Statistics and O.R., Complutense University of Madrid, 28040 Madrid, Spain
2   Department of Epidemiology and Health Statistics, School of Public Health, Guangzhou Medical University, Guangzhou 510260, China
3   Department of Probability and Statistics, School of Mathematics, South China Normal University, Guangzhou 510631, China
4   Department of Biomedical Data Science, Stanford University, San Francisco, CA 94305, USA
*   Author to whom correspondence should be addressed.
†   These authors contributed equally to this work.
Mathematics 2022, 10(3), 465; https://doi.org/10.3390/math10030465
Submission received: 18 December 2021 / Revised: 21 January 2022 / Accepted: 29 January 2022 / Published: 31 January 2022

Abstract

Surrogate endpoints have been used to assess the efficacy of a treatment and can potentially reduce the duration and/or the number of patients required for clinical trials. Using information theory, Alonso et al. (2007) proposed a unified framework based on Shannon entropy, a new definition of surrogacy that departed from the hypothesis-testing framework. In this paper, a new family of surrogacy measures under Havrda and Charvat (H-C) entropy is derived which contains Alonso's definition as a particular case. Furthermore, we extend our approach to a new model based on the information-theoretic measure of association for a longitudinally collected continuous surrogate endpoint for a binary clinical endpoint of a clinical trial using H-C entropy. The new model is illustrated through the analysis of data from a completed clinical trial. It demonstrates the advantages of H-C entropy-based surrogacy measures in the evaluation of scheduling longitudinal biomarker visits for a phase 2 randomized controlled clinical trial for the treatment of multiple sclerosis.

1. Introduction

Surrogate endpoints, which can be observed earlier, more easily, possibly repeatedly, or at lower cost, have been used to replace clinical endpoints in clinical trials. For example, total tumor response rate and progression-free survival have been used in phase II and phase III cancer clinical trials as surrogate endpoints for overall survival, which often requires a longer trial duration to achieve adequate statistical power. The United States Food and Drug Administration (USFDA) has accepted the use of surrogate endpoints in regulatory reviews of new drug applications [1]. Most cancer drug approvals by the USFDA between 2009 and 2014 (55 of 83, or 66%) used at least one surrogate endpoint [2].
A motivating example is the use of biomarkers in phase II cancer trials. Non-randomized single-arm or randomized parallel clinical trials are used to evaluate the efficacy signal of a new drug. A binary response status, such as the total response based on the RECIST criterion [3], or a continuous response such as the change in tumor size [4], are common primary endpoints. For molecular-targeted drugs or immune oncology therapies, various serum, tissue, or imaging biomarkers are being developed to assess whether the targeted pathways have been activated. These biomarkers are usually continuous, can be measured repeatedly, and their changes should precede a clinical response. However, the activation of a targeted pathway does not necessarily imply a response to the treatment. Various questions have been raised about the utility of such biomarkers in phase II trials [5,6].
Surrogate endpoints offer the convenience of speeding up a clinical trial [7], but may not represent the actual outcomes well with regard to the benefit of therapy [8]. For instance, bevacizumab was approved for metastatic breast cancer based on a surrogate outcome and was later withdrawn after failing to confirm a survival benefit [5]. How to evaluate the therapy benefit conveyed by a surrogate, denoted as $S$, relative to the true clinical endpoint outcome, denoted as $T$, remains a scientific challenge.
Many statistical methods and measures have been developed to assess surrogate endpoints. One method to validate surrogate endpoints is to evaluate their correlation with clinically meaningful endpoints through meta-analyses [9]. Only 11 of 89 (12%) studies found a high correlation (r ≥ 0.85), and nine (10%) showed only a moderate correlation (0.7 < r < 0.85) between surrogates and endpoints [7], suggesting that the strength of surrogates in clinical practice is often unknown or weak.
The landmark paper by Prentice [11] proposed operational criteria for the identification of valid surrogate endpoints. A sufficient condition for an endpoint $S$ to be a valid surrogate of a primary clinical endpoint $T$ in the evaluation of a treatment, denoted as $Z$, is that the random vector $(Z, S, T)$ forms a Markov chain $Z \rightarrow S \rightarrow T$, i.e., conditional on $S$, $Z$ and $T$ are independent of each other. This condition led to parametric and non-parametric approaches to quantify the proportion of the treatment effect on $T$ that is explained by the treatment effect on $S$ [9,10,11,12,13]. Other proposed quantities to assess the utility of a surrogate endpoint include dissociative effects, associative effects, average causal necessity, average causal sufficiency, the causal effect predictiveness surface, and principal surrogates, among others [9,14,15,16,17,18,19,20,21,22].
Buyse and Molenberghs [23] suggested two quantities to validate a surrogate endpoint: the relative effect, which relates the treatment effect on the primary outcome to that on the surrogate at the population level, and the adjusted association, which quantifies the association between the primary outcome and the surrogate marker after adjusting for treatment at the individual level. These methods assume that information regarding the surrogate and true endpoints is available from a single trial. The relative effect addresses only the validity of a surrogate, whereas the adjusted association relates to its efficiency. Using an information-theoretic approach, Alonso and Molenberghs [24] and Pryseley et al. [25] redefined surrogacy in terms of the information content that $S$ provides with respect to $T$. Using the notation of Pryseley et al. [25], let $S$ be a continuous surrogate random variable and $T$ be the continuous targeted clinical endpoint of interest. We use $f$ to denote the density function. The Shannon entropy functions for $T$ and the conditional variable $T\mid S$ are denoted as $h(T)$ and $h(T\mid S)$, respectively, where $h(T) = -E[\log f(T)]$ and $h(T\mid S) = -E[\log f(T\mid S)]$. The corresponding entropy power functions are $EP(T) = e^{\frac{2}{n}h(T)}/(2\pi e)$ and $EP(T\mid S) = e^{\frac{2}{n}h(T\mid S)}/(2\pi e)$, where $n$ is the dimension of $T$. An information-theoretic measure of association (ITMA) is defined by Alonso and colleagues as the proportion of uncertainty reduction measured by the entropy power function for $T\mid S$ in reference to $T$:
$$R_h^2 = \frac{EP(T) - EP(T\mid S)}{EP(T)} = 1 - e^{-2I(T,S)} \tag{1}$$
Here $I(T,S) = h(T) - h(T\mid S) = E[\log f(T\mid S)] - E[\log f(T)]$ is the mutual information. If $S$ is a good surrogate for $T$, uncertainty about the effect on $T$ is reduced by knowledge of the effect on $S$. Some useful properties, as described by Alonso and Molenberghs [24], are:
(1) $0 \le R_h^2 \le 1$.
(2) $R_h^2 = 0$ if and only if $T$ and $S$ are independent.
(3) $R_h^2$ is symmetric in $(T, S)$.
(4) $R_h^2$ is invariant under bijective transformations of $T$ and $S$.
(5) When $R_h^2 \to 1$ for continuous models, there is usually a deterministic relationship in the distribution of $(T, S)$, that is, often $T = \varphi(S)$.
(6) When $T$ is a discrete random variable, $R_h^2 \le 1 - e^{-2h(T)}$, so they propose to use $R_{h,\max}^2 = R_h^2/\left(1 - e^{-2h(T)}\right)$ as a modified ITMA.
A good surrogate should have a high $R_h^2$. The beauty of the information-theoretic framework is that it moves away from hypothesis testing and provides a quantitative measure of surrogacy. Thus, if there are two surrogate endpoints, $S_1$ and $S_2$, we can compare their utility as surrogate endpoints based on their $R_h^2$ values.
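As a concrete numerical illustration (our own sketch, not part of the original development; the variable names are hypothetical), the following R code estimates $I(T,S)$ and $R_h^2$ for simulated jointly normal data, for which $R_h^2 = \rho^2$:
# Sketch: ITMA under Shannon entropy for a simulated bivariate normal (T, S).
# For jointly normal (T, S) with correlation rho, I(T,S) = -0.5*log(1-rho^2) and R_h^2 = rho^2.
set.seed(1)
n <- 5000
rho <- 0.7
s_val <- rnorm(n)
t_val <- rho*s_val + sqrt(1-rho^2)*rnorm(n)   # cor(t_val, s_val) is approximately rho
rho_hat <- cor(t_val, s_val)
I_TS <- -0.5*log(1-rho_hat^2)                 # estimated mutual information (in nats)
R_h2 <- 1-exp(-2*I_TS)                        # ITMA; equals rho_hat^2 here
c(I_TS, R_h2)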
Multiple authors have provided examples of this approach and demonstrated applications in which $S$ and $T$ are binary, continuous, longitudinal, or time-to-event random variables, as well as ordinal outcomes [24,25,26,27,28].
In this paper, we present two new results on the topic of surrogate endpoints based on information-theoretic measure of association (ITMA). First, we extend the ITMA construction based on Shannon entropy to a construction based on Havrda and Charvat (H-C) entropy [29]. The extension is motivated by the general existence of the H-C entropy. Explicit expressions as well as the properties of ITMA in different situations based on H-C entropy are presented. Second, we extend the H-C ITMA model to a longitudinally collected continuous surrogate endpoint for a binary clinical endpoint of a clinical trial. Then, the benefit of S is evaluated with the ITMA [24,25]. The current work focuses on a single trial surrogacy and its extension to a meta-analytic framework will need further development.
The paper is organized as follows. In Section 2, we give an example in which the Shannon entropy is not defined, so that the surrogacy measure of Alonso and Molenberghs under the information-theoretic framework does not apply [24]. We then prove the existence of the H-C entropy under general conditions and define a family of surrogacy measures based on the ITMA of H-C entropy. Explicit formulas are obtained for the following situations: binary-binary, continuous-continuous, and binary-continuous. In Section 3, we combine a linear random-effects model for the longitudinally collected surrogate marker with a probit regression model for a binary primary endpoint in clinical trials. An application of H-C entropy to selecting the times at which to collect surrogate measures is presented using data from a completed clinical trial. Finally, Section 4 presents a discussion and conclusions.

2. Extension of ITMA Surrogacy from Shannon Entropy to Havrda-Charvat Entropy

Why should we consider the extension? While Shannon entropy is adequate in most applications, there are cases in which the Shannon entropy does not exist, and thus the ITMA cannot be properly calculated. We give an example in the following.
Example 1.
Let $X$ be a random variable with heavy tails whose density function is
$$f(x) = \begin{cases} \dfrac{1}{\sqrt{2\pi}}e^{-x^2/2}, & |x| \le c_1, \\[4pt] \dfrac{c_2}{|x|\left(\log|x|\right)^2}, & |x| > c_1, \end{cases}$$
where $c_1 \approx 1.44$ and $c_2 \approx 0.027$ are chosen so that $f$ is a continuous density function with a heavy tail. The Shannon entropy $h(X)$ of $X$ is infinite, because the tail contribution $\int_{|x|>c_1} f(x)\log\frac{1}{f(x)}\,dx$ diverges.
One way to make the ITMA work is to use the Havrda-Charvat entropy [29], a generalization of the entropy function that contains the Shannon entropy as a special case. Mathematically, the Havrda-Charvat entropy is defined as follows:
$$HC_\alpha(X) = \int \varphi_\alpha\big(f(x)\big)\,f(x)\,dx,$$
where
$$\varphi_\alpha(x) = \begin{cases} \dfrac{x^{\alpha-1}-1}{1-\alpha}, & \alpha \ne 1, \\[4pt] -\log x, & \alpha = 1. \end{cases}$$
It is easy to see that $HC_1(X) = h(X)$.
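A small numerical check (our own sketch, not part of the paper; the function name and parameter values are hypothetical) computes $HC_\alpha(X)$ for a normal density by numerical integration and verifies that it approaches the Shannon entropy $\frac{1}{2}\log(2\pi e\sigma^2)$ as $\alpha \to 1$:
# Havrda-Charvat entropy of a continuous density by numerical integration.
# HC_alpha(X) = (integral of f(x)^alpha dx - 1)/(1 - alpha) for alpha != 1, written here
# with the equivalent integrand (f^alpha - f)/(1 - alpha) since the integral of f is 1.
hc_entropy <- function(f, alpha, lower = -Inf, upper = Inf) {
  if (alpha == 1) {
    g <- function(x) { fx <- f(x); ifelse(fx > 0, -fx*log(fx), 0) }   # Shannon limit
  } else {
    g <- function(x) (f(x)^alpha - f(x))/(1 - alpha)
  }
  integrate(g, lower, upper)$value
}
sigma <- 2
f_norm <- function(x) dnorm(x, sd = sigma)
hc_entropy(f_norm, alpha = 2)       # closed form for alpha = 2: 1 - 1/(2*sigma*sqrt(pi))
hc_entropy(f_norm, alpha = 1.001)   # approaches the Shannon entropy as alpha -> 1
0.5*log(2*pi*exp(1)*sigma^2)        # Shannon entropy of N(0, sigma^2)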
Proposition 1.
Under the mild regularity condition that the density function of $X$ is bounded, $HC_\alpha(X)$ exists for any $\alpha > 1$.
Proof. 
We only need to prove that if the density function $f(x)$ is bounded by a constant $K > 0$, then $\int f(x)^\alpha\,dx < +\infty$.
Let $W = \{x : f(x) > 1\}$. It follows from
$$1 = \int f(x)\,dx = \int_W f(x)\,dx + \int_{W^C} f(x)\,dx \ge m(W) + \int_{W^C} f(x)\,dx$$
that $m(W) \le 1$ and $\int_{W^C} f(x)\,dx \le 1$, where $m(W)$ is the probability of $W$. Therefore, since $f(x) \le K$ on $W$ and $f(x)^\alpha \le f(x)$ on $W^C$ for $\alpha > 1$,
$$\int f(x)^\alpha\,dx = \int_W f(x)^\alpha\,dx + \int_{W^C} f(x)^\alpha\,dx \le K^\alpha\,m(W) + \int_{W^C} f(x)\,dx < +\infty. \qquad \square$$
Because Shannon’s entropy is a special case of H-C entropy and H-C entropy always exists with proper choice of α , ITMA of H-C entropy should be a more flexible way as a surrogacy measure for more distribution families. Different from Shannon’s entropy, H-C entropy satisfies a non-additive property such that H C α T , S   = H C α T   +   H C α S   +   1 α H C α T H C α S , when T and S are independent. In general, the non-additive measures of entropy find justifications in many biological and chemical phenomena [30]. While H-C entropy has been used in quantum physics [31] and medical imaging research [32], it has not yet been used to describe the endpoint surrogacy for clinical trials.
To extend H-C entropy to measure endpoint surrogacy for trials, we define the ITMA under H-C entropy as follows:
$$R_\alpha^2 = 1 - e^{-2I_\alpha(T,S)}$$
where $I_\alpha(T,S)$ is the mutual information between $T$ and $S$ under H-C entropy [32]. Specifically, for $\alpha \ne 1$,
$$I_\alpha(T,S) = HC_\alpha(T) + HC_\alpha(S) + (1-\alpha)HC_\alpha(T)HC_\alpha(S) - HC_\alpha(T,S).$$
When $\alpha = 1$, $I_\alpha(T,S) = I(T,S)$ and $R_\alpha^2 = R_h^2$ in Equation (1).
It is important to notice that $\frac{EP_\alpha(T) - EP_\alpha(T\mid S)}{EP_\alpha(T)} \ne 1 - e^{-2I_\alpha(T,S)}$ for $\alpha \ne 1$, where $EP_\alpha(X) = e^{\frac{2}{n}HC_\alpha(X)}/(2\pi e)$.
Some basic properties are the following:
  • When $T$ and $S$ are independent, $I_\alpha(T,S) = 0$ and thus $R_\alpha^2 = 0$.
  • When $T$ and $S$ are deterministically related, the value of $I_\alpha(T,S)$ depends on whether $\alpha > 1$ or $\alpha < 1$, as seen in the following propositions.
Proposition 2.
Let $T$ and $S$ be two jointly normally distributed continuous random variables with
$$(T, S)' \sim N\!\left(\begin{pmatrix}\mu_T\\ \mu_S\end{pmatrix}, \begin{pmatrix}\sigma_T^2 & \rho\sigma_T\sigma_S\\ \rho\sigma_T\sigma_S & \sigma_S^2\end{pmatrix}\right),$$
so that the conditional distribution is $T\mid S \sim N\!\left(\mu_T + \rho\frac{\sigma_T}{\sigma_S}(S-\mu_S),\ \sigma_T^2(1-\rho^2)\right)$ and the marginal is $T \sim N(\mu_T, \sigma_T^2)$, where $'$ denotes vector transpose. Then we have the following results:
2.1
The mutual information for H-C entropy depends not only on the correlation between $T$ and $S$, but also on their standard deviations, for $\alpha \ne 1$:
$$I_\alpha(T,S) = \frac{(2\pi\sigma_T\sigma_S)^{1-\alpha}}{\alpha(1-\alpha)}\left[1 - (1-\rho^2)^{\frac{1-\alpha}{2}}\right], \qquad \alpha \ne 1,$$
$$I_\alpha(T,S) = I(T,S) = -\tfrac{1}{2}\log\left(1-\rho^2\right), \qquad \text{for } \alpha = 1.$$
2.2
When $\alpha > 1$: as $\rho \to \pm 1$, $I_\alpha \to \infty$; as $\rho \to 0$, $I_\alpha \to 0$; and $I_\alpha$ is an increasing function of $|\rho|$.
2.3
When $\alpha < 1$: as $\rho \to \pm 1$, $I_\alpha \to \frac{(2\pi\sigma_T\sigma_S)^{1-\alpha}}{\alpha(1-\alpha)}$; as $\rho \to 0$, $I_\alpha \to 0$; and $I_\alpha$ is an increasing function of $|\rho|$.
Therefore, for $\alpha < 1$, the maximum of $R_\alpha^2$ is $1 - e^{-2I_{\alpha,\max}}$, where $I_{\alpha,\max} = \frac{(2\pi\sigma_T\sigma_S)^{1-\alpha}}{\alpha(1-\alpha)}$. We can normalize $R_\alpha^2$ by dividing by its maximum value so that the normalized $\tilde{R}_\alpha^2$ lies between 0 and 1:
$$\tilde{R}_{\alpha,\max}^2 = \frac{R_\alpha^2}{1 - e^{-2I_{\alpha,\max}}}, \qquad \text{for } \alpha < 1.$$
Proof. 
$$HC_\alpha(T,S) = \frac{1}{1-\alpha}\left[\iint f_{T,S}(t,s)^{\alpha}\,dt\,ds - 1\right] = \frac{1}{1-\alpha}\left[\frac{\left(2\pi\sigma_T\sigma_S\sqrt{1-\rho^2}\right)^{1-\alpha}}{\alpha} - 1\right].$$
Similarly,
$$HC_\alpha(T) = \frac{1}{1-\alpha}\left[\frac{(2\pi\sigma_T^2)^{\frac{1-\alpha}{2}}}{\sqrt{\alpha}} - 1\right], \qquad HC_\alpha(S) = \frac{1}{1-\alpha}\left[\frac{(2\pi\sigma_S^2)^{\frac{1-\alpha}{2}}}{\sqrt{\alpha}} - 1\right],$$
and
$$(1-\alpha)HC_\alpha(T)HC_\alpha(S) = \frac{1}{1-\alpha}\left[\frac{(2\pi\sigma_T^2)^{\frac{1-\alpha}{2}}}{\sqrt{\alpha}} - 1\right]\left[\frac{(2\pi\sigma_S^2)^{\frac{1-\alpha}{2}}}{\sqrt{\alpha}} - 1\right] = \frac{1}{1-\alpha}\left[\frac{(2\pi\sigma_T\sigma_S)^{1-\alpha}}{\alpha} - \frac{(2\pi\sigma_T^2)^{\frac{1-\alpha}{2}}}{\sqrt{\alpha}} - \frac{(2\pi\sigma_S^2)^{\frac{1-\alpha}{2}}}{\sqrt{\alpha}} + 1\right].$$
As such,
$$I_\alpha(T,S) = HC_\alpha(T) + HC_\alpha(S) + (1-\alpha)HC_\alpha(T)HC_\alpha(S) - HC_\alpha(T,S) = \frac{1}{1-\alpha}\left[\frac{(2\pi\sigma_T\sigma_S)^{1-\alpha}}{\alpha} - 1\right] - \frac{1}{1-\alpha}\left[\frac{\left(2\pi\sigma_T\sigma_S\sqrt{1-\rho^2}\right)^{1-\alpha}}{\alpha} - 1\right] = \frac{(2\pi\sigma_T\sigma_S)^{1-\alpha}}{\alpha(1-\alpha)}\left[1 - (1-\rho^2)^{\frac{1-\alpha}{2}}\right].$$
Finally, the results in 2.2 and 2.3 follow from this expression of $I_\alpha(T,S)$. □
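To sanity-check the closed form in Proposition 2, the following sketch (our own illustration; the parameter values are hypothetical) compares it with a brute-force grid evaluation of the H-C entropies of a bivariate normal:
# Numerical check of Proposition 2 for a bivariate normal (T, S).
library(mvtnorm)
alpha <- 2; sT <- 1.3; sS <- 0.8; rho <- 0.6
Sigma <- matrix(c(sT^2, rho*sT*sS, rho*sT*sS, sS^2), 2, 2)
hc1 <- function(sd, alpha, h = 0.01) {            # HC_alpha of a univariate normal, Riemann sum
  x <- seq(-10*sd, 10*sd, by = h)
  (sum(dnorm(x, sd = sd)^alpha)*h - 1)/(1 - alpha)
}
hc2 <- function(Sigma, alpha, h = 0.02) {         # HC_alpha of a bivariate normal, Riemann sum
  x <- seq(-8*sqrt(Sigma[1, 1]), 8*sqrt(Sigma[1, 1]), by = h)
  y <- seq(-8*sqrt(Sigma[2, 2]), 8*sqrt(Sigma[2, 2]), by = h)
  g <- as.matrix(expand.grid(x, y))
  (sum(dmvnorm(g, sigma = Sigma)^alpha)*h^2 - 1)/(1 - alpha)
}
I_numeric <- hc1(sT, alpha) + hc1(sS, alpha) +
  (1 - alpha)*hc1(sT, alpha)*hc1(sS, alpha) - hc2(Sigma, alpha)
I_closed <- (2*pi*sT*sS)^(1 - alpha)/(alpha*(1 - alpha))*(1 - (1 - rho^2)^((1 - alpha)/2))
c(I_numeric, I_closed)                            # the two values should agree closely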
Proposition 3.
Let $T$ and $S$ be two binary outcome variables, with 1 for a success and 0 for a failure, such that the joint distribution of $(T, S)$ is multinomial with cell probabilities $\left(p_{0,0}, p_{1,0}, p_{0,1}, p_{1,1}\right)$, where $p_{t,s} = P(T=t, S=s)$ and $p_{t,+}$, $p_{+,s}$ denote the marginal probabilities of $T$ and $S$. We have the following results:
$$I_\alpha(T,S) = \frac{1}{1-\alpha}\sum_{t=0}^{1}\sum_{s=0}^{1}\left(p_{t,+}^{\alpha}\,p_{+,s}^{\alpha} - p_{t,s}^{\alpha}\right). \tag{9}$$
3.1
When $p_{t,s} = p_{t,+}\,p_{+,s}$, $T$ and $S$ are independent and $I_\alpha(T,S) = 0$.
3.2
Let $\rho = \frac{p_{1,1} - p_{1,+}p_{+,1}}{\sqrt{p_{1,+}p_{0,+}p_{+,1}p_{+,0}}}$ be the correlation between $T$ and $S$. If $p_{0,0}^{\alpha-1} + p_{1,1}^{\alpha-1} > p_{1,0}^{\alpha-1} + p_{0,1}^{\alpha-1}$, then $I_\alpha(T,S)$ is an increasing function of $\rho$ for $\alpha > 1$. For $\alpha < 1$, $I_\alpha(T,S)$ is an increasing function of $\rho$ if $p_{0,0}^{\alpha-1} + p_{1,1}^{\alpha-1} < p_{1,0}^{\alpha-1} + p_{0,1}^{\alpha-1}$.
3.3
For $\alpha > 1$, $I_\alpha(T,S) \le \min\{HC_\alpha(T), HC_\alpha(S)\}$.
3.4
For $\alpha < 1$, $I_\alpha(T,S) \le \left[1 + (1-\alpha)\min\{HC_\alpha(T), HC_\alpha(S)\}\right]\max\{HC_\alpha(T), HC_\alpha(S)\}$.
3.5
For a given marginal distribution of $T$ and $S$, there is a maximum value of the mutual information:
$$I_\alpha(T,S) \le HC_\alpha(T) + HC_\alpha(S) + (1-\alpha)HC_\alpha(T)HC_\alpha(S) - \min HC_\alpha(T,S) = I_{\alpha,\max},$$
where the minimum is taken over joint distributions with the given margins. Thus, we can normalize the ITMA as $\tilde{R}^2_{\alpha,\max}(T,S) = \frac{R^2_\alpha(T,S)}{1 - e^{-2I_{\alpha,\max}}}$.
Proof. 
 
Because
$$HC_\alpha(T,S) = \frac{1}{1-\alpha}\sum_{t=0}^{1}\sum_{s=0}^{1}p_{t,s}\left(p_{t,s}^{\alpha-1} - 1\right) = \frac{1}{1-\alpha}\left[\sum_{t=0}^{1}\sum_{s=0}^{1}p_{t,s}^{\alpha} - 1\right],$$
$$HC_\alpha(T) = \frac{1}{1-\alpha}\sum_{t=0}^{1}p_{t,+}\left(p_{t,+}^{\alpha-1} - 1\right) = \frac{1}{1-\alpha}\left[\sum_{t=0}^{1}p_{t,+}^{\alpha} - 1\right],$$
$$HC_\alpha(S) = \frac{1}{1-\alpha}\sum_{s=0}^{1}p_{+,s}\left(p_{+,s}^{\alpha-1} - 1\right) = \frac{1}{1-\alpha}\left[\sum_{s=0}^{1}p_{+,s}^{\alpha} - 1\right],$$
and
$$(1-\alpha)HC_\alpha(T)HC_\alpha(S) = \frac{1}{1-\alpha}\left[\sum_{t=0}^{1}p_{t,+}^{\alpha} - 1\right]\left[\sum_{s=0}^{1}p_{+,s}^{\alpha} - 1\right],$$
we have
$$HC_\alpha(T) + HC_\alpha(S) + (1-\alpha)HC_\alpha(T)HC_\alpha(S) = \frac{1}{1-\alpha}\left[\sum_{t=0}^{1}p_{t,+}^{\alpha}\sum_{s=0}^{1}p_{+,s}^{\alpha} - 1\right].$$
Therefore,
$$I_\alpha(T,S) = \frac{1}{1-\alpha}\left[\sum_{t=0}^{1}p_{t,+}^{\alpha}\sum_{s=0}^{1}p_{+,s}^{\alpha} - 1\right] - \frac{1}{1-\alpha}\left[\sum_{t=0}^{1}\sum_{s=0}^{1}p_{t,s}^{\alpha} - 1\right] = \frac{1}{1-\alpha}\sum_{t=0}^{1}\sum_{s=0}^{1}\left(p_{t,+}^{\alpha}p_{+,s}^{\alpha} - p_{t,s}^{\alpha}\right),$$
which is the expression given in Equation (9).
Result 3.1 follows directly from Equation (9), since $p_{t,s} = p_{t,+}p_{+,s}$ implies $p_{t,+}^{\alpha}p_{+,s}^{\alpha} - p_{t,s}^{\alpha} = 0$ for every cell.
Result 3.2 can be derived through the following relationships:
$$p_{1,1} = p_{1,+}p_{+,1} + \rho\sqrt{p_{1,+}p_{0,+}p_{+,1}p_{+,0}},$$
$$p_{1,0} = p_{1,+}p_{+,0} - \rho\sqrt{p_{1,+}p_{0,+}p_{+,1}p_{+,0}},$$
$$p_{0,1} = p_{0,+}p_{+,1} - \rho\sqrt{p_{1,+}p_{0,+}p_{+,1}p_{+,0}},$$
$$p_{0,0} = p_{0,+}p_{+,0} + \rho\sqrt{p_{1,+}p_{0,+}p_{+,1}p_{+,0}},$$
where
$$\frac{\max\left\{-p_{1,+}p_{+,1},\ -p_{0,+}p_{+,0},\ p_{1,+}p_{+,0}-1,\ p_{0,+}p_{+,1}-1\right\}}{\sqrt{p_{1,+}p_{0,+}p_{+,1}p_{+,0}}} \le \rho \le \frac{\min\left\{1-p_{1,+}p_{+,1},\ 1-p_{0,+}p_{+,0},\ p_{1,+}p_{+,0},\ p_{0,+}p_{+,1}\right\}}{\sqrt{p_{1,+}p_{0,+}p_{+,1}p_{+,0}}}. \tag{11}$$
Then,
$$I_\alpha(T,S) = \frac{1}{1-\alpha}\sum_{t=0}^{1}\sum_{s=0}^{1}\left[p_{t,+}^{\alpha}p_{+,s}^{\alpha} - \left(p_{t,+}p_{+,s} + (-1)^{t+s}\rho\sqrt{p_{1,+}p_{0,+}p_{+,1}p_{+,0}}\right)^{\alpha}\right].$$
Taking the derivative of $I_\alpha(T,S)$ with respect to $\rho$, we have
$$\frac{dI_\alpha(T,S)}{d\rho} = -\frac{\alpha}{1-\alpha}\sqrt{p_{1,+}p_{0,+}p_{+,1}p_{+,0}}\sum_{t=0}^{1}\sum_{s=0}^{1}(-1)^{t+s}p_{t,s}^{\alpha-1} = \frac{\alpha}{1-\alpha}\sqrt{p_{1,+}p_{0,+}p_{+,1}p_{+,0}}\left[p_{1,0}^{\alpha-1} + p_{0,1}^{\alpha-1} - p_{0,0}^{\alpha-1} - p_{1,1}^{\alpha-1}\right].$$
Under the condition that $p_{0,0}^{\alpha-1} + p_{1,1}^{\alpha-1} > p_{1,0}^{\alpha-1} + p_{0,1}^{\alpha-1}$, $\frac{dI_\alpha(T,S)}{d\rho} > 0$ for $\alpha > 1$. Similarly, under the condition that $p_{0,0}^{\alpha-1} + p_{1,1}^{\alpha-1} < p_{1,0}^{\alpha-1} + p_{0,1}^{\alpha-1}$, $\frac{dI_\alpha(T,S)}{d\rho} > 0$ for $\alpha < 1$.
For 3.3 and 3.4, write $A = \sum_{t=0}^{1}p_{t,+}^{\alpha}$, $B = \sum_{s=0}^{1}p_{+,s}^{\alpha}$ and $P = \sum_{t=0}^{1}\sum_{s=0}^{1}p_{t,s}^{\alpha}$.
When $\alpha > 1$,
$$HC_\alpha(T) - I_\alpha(T,S) = \frac{1}{\alpha-1}\left[1 - A - (P - AB)\right] = \frac{1}{\alpha-1}\left[(1-A)(1-B) + (B - P)\right] \ge 0,$$
because for $\alpha > 1$ we have $p^{\alpha} \le p$ for $p \in [0,1]$, so $A \le 1$ and $B \le 1$, and $p_{+,s}^{\alpha} \ge p_{0,s}^{\alpha} + p_{1,s}^{\alpha}$ (super-additivity of $x^{\alpha}$), so $B \ge P$. Hence $I_\alpha(T,S) \le HC_\alpha(T)$ and, by the symmetric relationship between $T$ and $S$, $I_\alpha(T,S) \le HC_\alpha(S)$, which proves 3.3.
However, when $\alpha < 1$, sub-additivity of $x^{\alpha}$ gives $P \ge \max\{A, B\}$, so $HC_\alpha(T,S) \ge \max\{HC_\alpha(T), HC_\alpha(S)\}$ and therefore
$$I_\alpha(T,S) \le HC_\alpha(T) + HC_\alpha(S) + (1-\alpha)HC_\alpha(T)HC_\alpha(S) - \max\{HC_\alpha(T), HC_\alpha(S)\} \le \left[1 + (1-\alpha)\min\{HC_\alpha(T), HC_\alpha(S)\}\right]\max\{HC_\alpha(T), HC_\alpha(S)\},$$
which proves 3.4.
For 3.5, for fixed marginal probabilities, $I_\alpha(T,S)$ depends on $\rho$ only through $HC_\alpha(T,S)$. As in 3.2,
$$\frac{dHC_\alpha(T,S)}{d\rho} = \frac{\alpha}{1-\alpha}\sqrt{p_{1,+}p_{0,+}p_{+,1}p_{+,0}}\left[p_{0,0}^{\alpha-1} + p_{1,1}^{\alpha-1} - p_{1,0}^{\alpha-1} - p_{0,1}^{\alpha-1}\right].$$
When $\frac{dHC_\alpha(T,S)}{d\rho} > 0$, taking the lower boundary of $\rho$ in inequality (11) yields the minimum value of $HC_\alpha(T,S)$; when $\frac{dHC_\alpha(T,S)}{d\rho} < 0$, taking the upper boundary of $\rho$ in inequality (11) yields the minimum value of $HC_\alpha(T,S)$. □
Remark 1.
When concordant pairs are more likely than discordant pairs for the two binary endpoints, $p_{0,0}^{\alpha-1} + p_{1,1}^{\alpha-1}$ is more likely to be greater than $p_{1,0}^{\alpha-1} + p_{0,1}^{\alpha-1}$ for $\alpha > 1$, and more likely to be less than $p_{1,0}^{\alpha-1} + p_{0,1}^{\alpha-1}$ for $\alpha < 1$. Thus, when the two binary endpoints have a greater chance of being concordant, the mutual information is an increasing function of the correlation coefficient $\rho$, as shown in Result 3.2 of Proposition 3.
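The binary-binary case is simple enough to compute directly. The following sketch (our own illustration with hypothetical cell probabilities) evaluates $I_\alpha(T,S)$ and the ITMA from a 2×2 table of cell probabilities using Equation (9):
# H-C mutual information and ITMA for two binary endpoints (Proposition 3).
# p is a 2x2 matrix of joint probabilities P(T = t, S = s); all cells assumed > 0.
hc_mi_binary <- function(p, alpha) {
  pt <- rowSums(p); ps <- colSums(p)                  # marginal probabilities of T and S
  if (alpha == 1) return(sum(p*log(p/outer(pt, ps)))) # Shannon mutual information
  sum(outer(pt^alpha, ps^alpha) - p^alpha)/(1 - alpha)
}
p <- matrix(c(0.40, 0.10,
              0.10, 0.40), 2, 2, byrow = TRUE)        # rows: T = 0, 1; columns: S = 0, 1
I_a <- hc_mi_binary(p, alpha = 2)
c(I_a, 1 - exp(-2*I_a))                               # mutual information and ITMA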
Now we define a model for a binary T and a continuous normally distributed surrogate variable S .
Proposition 4.
Let $T$ be a binary outcome variable and $S$ a continuous normally distributed surrogate variable, where $T$ takes the values 1 and 0 with probabilities $p_1$ and $p_0 = 1 - p_1$, and $S \sim N(\mu_S, \sigma_S^2)$. We assume that there is a latent variable $U$ such that $T = 1_{\{U \ge 0\}}$, i.e., a probit model with $U \sim N(\mu_T, 1)$ and $\mu_T = \Phi^{-1}(p_1)$. Assume the correlation coefficient between $U$ and $S$ is $\rho$; then the conditional binary endpoint $T\mid S$ follows a Bernoulli distribution with
$$p_1(s) = P(T=1\mid S=s) = \Phi\!\left(\frac{\Phi^{-1}(p_1) + \rho\frac{s-\mu_S}{\sigma_S}}{\sqrt{1-\rho^2}}\right) \quad\text{and}\quad p_0(s) = P(T=0\mid S=s) = 1 - p_1(s) = \Phi\!\left(\frac{\Phi^{-1}(p_0) - \rho\frac{s-\mu_S}{\sigma_S}}{\sqrt{1-\rho^2}}\right).$$
We have the following results:
4.1
The mutual information for H-C entropy is
$$I_\alpha(T,S) = \frac{1}{1-\alpha}\left[\frac{(2\pi\sigma_S^2)^{\frac{1-\alpha}{2}}}{\sqrt{\alpha}}\sum_{t=0}^{1}p_t^{\alpha} - \sum_{t=0}^{1}\int p_t(s)^{\alpha}\,\phi^{\alpha}\!\left(\frac{s-\mu_S}{\sigma_S}\right)\sigma_S^{-\alpha}\,ds\right] = \frac{(2\pi\sigma_S^2)^{\frac{1-\alpha}{2}}}{(1-\alpha)\sqrt{\alpha}}\sum_{t=0}^{1}\left[p_t^{\alpha} - \Psi_\alpha\!\left(\frac{\Phi^{-1}(p_t)}{\sqrt{1-\rho^2}},\ \frac{(-1)^{1-t}\rho}{\sqrt{\alpha(1-\rho^2)}}\right)\right],$$
where $\Psi_\alpha(a,b) = \int \Phi^{\alpha}(a + by)\,\phi(y)\,dy$.
4.2
When $\rho = 0$, $I_\alpha(T,S) = 0$.
4.3
When $\rho \to \pm 1$, $I_\alpha(T,S) \to \frac{1}{1-\alpha}\frac{(2\pi\sigma_S^2)^{\frac{1-\alpha}{2}}}{\sqrt{\alpha}}\left[\sum_{t=0}^{1}p_t^{\alpha} - \Phi\!\left(\sqrt{\alpha}\,\Phi^{-1}(p_1)\right) - \Phi\!\left(\sqrt{\alpha}\,\Phi^{-1}(p_0)\right)\right]$.
4.4
For $\alpha \to 1$, $I_\alpha(T,S) \to \sum_{t=0}^{1}\left[\int \tilde{p}_t(y)\log \tilde{p}_t(y)\,\phi(y)\,dy - p_t\log p_t\right]$, where $\tilde{p}_t(y) = \Phi\!\left(\frac{\Phi^{-1}(p_t) + (-1)^{1-t}\rho y}{\sqrt{1-\rho^2}}\right) = p_t(\mu_S + \sigma_S y)$.
Proof. 
The joint distribution function of $(T, S) = (t, s)$ is
$$f(t,s) = \left[\int_0^{\infty} g(u,s)\,du\right]^{t}\left[\int_{-\infty}^{0} g(u,s)\,du\right]^{1-t},$$
where $g(u,s)$ is the joint density of $(U, S)$, a bivariate normal with means $(\mu_T, \mu_S)$, variances $(1, \sigma_S^2)$ and correlation $\rho$. Completing the square in $u$,
$$g(u,s) = \frac{1}{2\pi\sigma_S\sqrt{1-\rho^2}}\exp\!\left\{-\frac{1}{2(1-\rho^2)}\left[\left(u - \mu_T - \rho\frac{s-\mu_S}{\sigma_S}\right)^2 + (1-\rho^2)\left(\frac{s-\mu_S}{\sigma_S}\right)^2\right]\right\},$$
so that
$$f(t,s) = \left[1 - \Phi\!\left(\frac{-\mu_T - \rho\frac{s-\mu_S}{\sigma_S}}{\sqrt{1-\rho^2}}\right)\right]^{t}\left[\Phi\!\left(\frac{-\mu_T - \rho\frac{s-\mu_S}{\sigma_S}}{\sqrt{1-\rho^2}}\right)\right]^{1-t}\frac{\phi\!\left(\frac{s-\mu_S}{\sigma_S}\right)}{\sigma_S} = p_1(s)^{t}\,p_0(s)^{1-t}\,\frac{\phi\!\left(\frac{s-\mu_S}{\sigma_S}\right)}{\sigma_S}.$$
Therefore,
$$HC_\alpha(T) = \frac{1}{1-\alpha}\left[\sum_{t=0}^{1}p_t^{\alpha} - 1\right],$$
$$HC_\alpha(S) = \frac{1}{1-\alpha}\left[\frac{(2\pi\sigma_S^2)^{\frac{1-\alpha}{2}}}{\sqrt{\alpha}} - 1\right],$$
$$HC_\alpha(T,S) = \frac{1}{1-\alpha}\left[\int \left(p_1(s)^{\alpha} + p_0(s)^{\alpha}\right)\phi^{\alpha}\!\left(\frac{s-\mu_S}{\sigma_S}\right)\sigma_S^{-\alpha}\,ds - 1\right],$$
$$HC_\alpha(T) + HC_\alpha(S) + (1-\alpha)HC_\alpha(T)HC_\alpha(S) = \frac{1}{1-\alpha}\left[\frac{(2\pi\sigma_S^2)^{\frac{1-\alpha}{2}}}{\sqrt{\alpha}}\sum_{t=0}^{1}p_t^{\alpha} - 1\right],$$
and
$$I_\alpha(T,S) = HC_\alpha(T) + HC_\alpha(S) + (1-\alpha)HC_\alpha(T)HC_\alpha(S) - HC_\alpha(T,S) = \frac{1}{1-\alpha}\left[\frac{(2\pi\sigma_S^2)^{\frac{1-\alpha}{2}}}{\sqrt{\alpha}}\sum_{t=0}^{1}p_t^{\alpha} - \sum_{t=0}^{1}\int p_t(s)^{\alpha}\,\phi^{\alpha}\!\left(\frac{s-\mu_S}{\sigma_S}\right)\sigma_S^{-\alpha}\,ds\right].$$
Using $\Psi_\alpha(a,b) = \int \Phi^{\alpha}(a+by)\,\phi(y)\,dy$ and the change of variable $y = \sqrt{\alpha}\,\frac{s-\mu_S}{\sigma_S}$, we can derive an alternative formulation:
$$\sum_{t=0}^{1}\int \Phi^{\alpha}\!\left(\frac{\Phi^{-1}(p_t) + (-1)^{1-t}\rho\frac{s-\mu_S}{\sigma_S}}{\sqrt{1-\rho^2}}\right)\phi^{\alpha}\!\left(\frac{s-\mu_S}{\sigma_S}\right)\sigma_S^{-\alpha}\,ds = \frac{(2\pi\sigma_S^2)^{\frac{1-\alpha}{2}}}{\sqrt{\alpha}}\sum_{t=0}^{1}\int \Phi^{\alpha}\!\left(\frac{\Phi^{-1}(p_t) + (-1)^{1-t}\frac{\rho y}{\sqrt{\alpha}}}{\sqrt{1-\rho^2}}\right)\phi(y)\,dy.$$
Hence $I_\alpha(T,S)$ can be simplified as
$$I_\alpha(T,S) = \frac{(2\pi\sigma_S^2)^{\frac{1-\alpha}{2}}}{(1-\alpha)\sqrt{\alpha}}\sum_{t=0}^{1}\left[p_t^{\alpha} - \Psi_\alpha\!\left(\frac{\Phi^{-1}(p_t)}{\sqrt{1-\rho^2}},\ \frac{(-1)^{1-t}\rho}{\sqrt{\alpha(1-\rho^2)}}\right)\right].$$
This completes the proof of 4.1.
For 4.2, when $\rho = 0$, $p_t(s) = p_t$ does not depend on $s$, so
$$I_\alpha(T,S) = \frac{1}{1-\alpha}\left[\frac{(2\pi\sigma_S^2)^{\frac{1-\alpha}{2}}}{\sqrt{\alpha}}\sum_{t=0}^{1}p_t^{\alpha} - \sum_{t=0}^{1}p_t^{\alpha}\int \phi^{\alpha}\!\left(\frac{s-\mu_S}{\sigma_S}\right)\sigma_S^{-\alpha}\,ds\right] = 0.$$
For 4.3, as $\rho \to 1$, $p_1(s) \to 1_{\{s > \mu_S - \sigma_S\Phi^{-1}(p_1)\}}$ and $p_0(s) \to 1_{\{s < \mu_S - \sigma_S\Phi^{-1}(p_1)\}}$, so
$$\int \left(p_1(s)^{\alpha} + p_0(s)^{\alpha}\right)\phi^{\alpha}\!\left(\frac{s-\mu_S}{\sigma_S}\right)\sigma_S^{-\alpha}\,ds \to \int_{\mu_S - \sigma_S\Phi^{-1}(p_1)}^{\infty}\phi^{\alpha}\!\left(\frac{s-\mu_S}{\sigma_S}\right)\sigma_S^{-\alpha}\,ds + \int_{-\infty}^{\mu_S - \sigma_S\Phi^{-1}(p_1)}\phi^{\alpha}\!\left(\frac{s-\mu_S}{\sigma_S}\right)\sigma_S^{-\alpha}\,ds = \frac{(2\pi\sigma_S^2)^{\frac{1-\alpha}{2}}}{\sqrt{\alpha}}\left[\Phi\!\left(\sqrt{\alpha}\,\Phi^{-1}(p_1)\right) + \Phi\!\left(\sqrt{\alpha}\,\Phi^{-1}(p_0)\right)\right].$$
Similarly, as $\rho \to -1$, $\int \left(p_1(s)^{\alpha} + p_0(s)^{\alpha}\right)\phi^{\alpha}\!\left(\frac{s-\mu_S}{\sigma_S}\right)\sigma_S^{-\alpha}\,ds \to \frac{(2\pi\sigma_S^2)^{\frac{1-\alpha}{2}}}{\sqrt{\alpha}}\left[\Phi\!\left(\sqrt{\alpha}\,\Phi^{-1}(p_1)\right) + \Phi\!\left(\sqrt{\alpha}\,\Phi^{-1}(p_0)\right)\right]$.
So, as $\rho \to \pm 1$, $I_\alpha(T,S) \to \frac{1}{1-\alpha}\frac{(2\pi\sigma_S^2)^{\frac{1-\alpha}{2}}}{\sqrt{\alpha}}\sum_{t=0}^{1}\left[p_t^{\alpha} - \Phi\!\left(\sqrt{\alpha}\,\Phi^{-1}(p_t)\right)\right]$.
For $\alpha = 1$, H-C entropy reduces to Shannon entropy. Thus, by taking the limit of $\alpha$ to 1, we can derive Shannon's mutual information for the probit model in 4.4. Both the numerator $\sum_{t=0}^{1}\left[p_t^{\alpha} - \int \Phi^{\alpha}\!\left(\frac{\Phi^{-1}(p_t) + (-1)^{1-t}\rho y/\sqrt{\alpha}}{\sqrt{1-\rho^2}}\right)\phi(y)\,dy\right]$ and the denominator $(1-\alpha)$ tend to 0 as $\alpha \to 1$, so by L'Hôpital's rule,
$$\lim_{\alpha\to 1} I_\alpha(T,S) = \sum_{t=0}^{1}\left[\int \Phi\!\left(\frac{\Phi^{-1}(p_t) + (-1)^{1-t}\rho y}{\sqrt{1-\rho^2}}\right)\log \Phi\!\left(\frac{\Phi^{-1}(p_t) + (-1)^{1-t}\rho y}{\sqrt{1-\rho^2}}\right)\phi(y)\,dy - p_t\log p_t\right] = \sum_{t=0}^{1}\left[\int \tilde{p}_t(y)\log \tilde{p}_t(y)\,\phi(y)\,dy - p_t\log p_t\right]. \qquad \square$$
Remark 2.
Taking into account that $\Psi_2(a,b) = \Phi\!\left(\frac{a}{\sqrt{1+b^2}}\right) - 2\,T\!\left(\frac{a}{\sqrt{1+b^2}},\ \frac{1}{\sqrt{1+2b^2}}\right)$, where $T(h,k) = \phi(h)\int_0^{k}\frac{\phi(hx)}{1+x^2}\,dx$ is Owen's function [33], and the property $T(-h,k) = T(h,k)$, we can derive an explicit formula for $\alpha = 2$:
$$I_2(T,S) = \frac{1}{2\sqrt{\pi}\,\sigma_S}\sum_{t=0}^{1}\left[\Phi\!\left(\frac{\Phi^{-1}(p_t)}{\sqrt{1-\rho^2/2}}\right) - 2\,T\!\left(\frac{\Phi^{-1}(p_t)}{\sqrt{1-\rho^2/2}},\ \sqrt{1-\rho^2}\right) - p_t^{2}\right].$$
Since $\Phi^{-1}(p_t) = \Phi^{-1}(1 - p_{1-t}) = -\Phi^{-1}(p_{1-t})$ and $\Phi\!\left(\frac{\Phi^{-1}(p_1)}{\sqrt{1-\rho^2/2}}\right) + \Phi\!\left(\frac{\Phi^{-1}(p_0)}{\sqrt{1-\rho^2/2}}\right) = 1$, we can simplify the expression as
$$I_2(T,S) = \frac{1}{2\sqrt{\pi}\,\sigma_S}\left[1 - 4\,T\!\left(\frac{\Phi^{-1}(p_0)}{\sqrt{1-\rho^2/2}},\ \sqrt{1-\rho^2}\right) - p_1^{2} - p_0^{2}\right].$$
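The following sketch (our own numerical check with hypothetical parameter values) evaluates $I_2(T,S)$ from Remark 2, computing Owen's T function by numerical integration of its defining formula, and compares the result with a direct numerical evaluation of the general expression in Result 4.1 for $\alpha = 2$:
# I_2(T, S) for a binary T and a normal S (Remark 2), using Owen's T function.
owen_t <- function(h, k) dnorm(h)*integrate(function(x) dnorm(h*x)/(1 + x^2), 0, k)$value
p1 <- 0.6; p0 <- 1 - p1; rho <- 0.5; sigmaS <- 1.2
h0 <- qnorm(p0)/sqrt(1 - rho^2/2)
I2_owen <- (1 - 4*owen_t(h0, sqrt(1 - rho^2)) - p1^2 - p0^2)/(2*sqrt(pi)*sigmaS)
# Direct numerical evaluation of Result 4.1 with alpha = 2.
psi2 <- function(a, b) integrate(function(y) pnorm(a + b*y)^2*dnorm(y), -Inf, Inf)$value
a_t <- qnorm(c(p0, p1))/sqrt(1 - rho^2)
b_t <- c(-1, 1)*rho/sqrt(2*(1 - rho^2))
I2_num <- (2*pi*sigmaS^2)^(-1/2)/((1 - 2)*sqrt(2))*
  (p0^2 - psi2(a_t[1], b_t[1]) + p1^2 - psi2(a_t[2], b_t[2]))
c(I2_owen, I2_num)   # the two values should agree closely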

3. Surrogacy of a Longitudinal Biomarker for a Binary Clinical Endpoint

3.1. Model for Longitudinal Continuous Surrogate Biomarkers in Phase II Trials

In many phase II trials, the clinical endpoint of interest ($T$) is often a binary endpoint summarized as a proportion or a continuous variable summarized by its mean. For example, in oncology phase II trials, a common clinical endpoint is the total response rate. The surrogate biomarkers, on the other hand, are usually laboratory tests from serum or urine, or imaging measures, that can be obtained repeatedly during the study. In this section, we focus on a binary one-time clinical endpoint $T$ and a continuous repeated surrogate variable $S$.
In the remainder of the paper, we will use $t_j$ to denote the time of the $j$th measurement since baseline in a longitudinal trial. For simplicity, consider the difference model from baseline $t_0 = 0$.
The general model is:
$$S_{i,j}\mid Z_i, T_i = \mu_S + \alpha_1 Z_i + \alpha_2 t_j + \alpha_3 Z_i t_j + \beta_j T_i + \epsilon_{S,ij}, \qquad j = 1, \ldots, K.$$
Thus, $S_i\mid Z_i, T_i = \left(S_{i,1}\mid Z_i, T_i,\ \ldots,\ S_{i,K}\mid Z_i, T_i\right)' \sim MVN\left(\mu_{S_i}, \Sigma_{SS}\right)$, where
$$\mu_{S_i} = \left(\mu_S + \alpha_1 Z_i + (\alpha_2 + \alpha_3 Z_i)t_1 + \beta_1 T_i,\ \ldots,\ \mu_S + \alpha_1 Z_i + (\alpha_2 + \alpha_3 Z_i)t_K + \beta_K T_i\right)'.$$
Using a bivariate probit model [34] for the joint distribution of $(Z_i, T_i)$, we can derive a probit model for the conditional joint distribution of $(Z_i, T_i)$ given $S_i$:
$$\begin{pmatrix}\Phi^{-1}\left(P(T_i = 1)\right)\\ \Phi^{-1}\left(P(Z_i = 1)\right)\end{pmatrix}\Bigg|\ S_i \sim MVN_2\!\left(\begin{pmatrix}\mu_{0,1} + \gamma_1' S_i\\ \mu_{0,2} + \gamma_2' S_i\end{pmatrix},\ \begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}\right),$$
where $\mu_{0,k}$ and $\gamma_k$ are the intercept and the vector of regression coefficients of the probit regression for $T$ given the longitudinal $S$ ($k = 1$) and for $Z$ given the longitudinal $S$ ($k = 2$), respectively, and $\rho$ is the correlation coefficient of the two underlying latent normal variables for $T$ and $Z$.
Because a linear combination of multivariate normal variables is still a normal random variable, we can use Proposition 4 to calculate the ITMA under H-C entropy and evaluate the surrogacy of the longitudinal biomarker within each arm, conditioning on $Z$. We can also average over the treatment arms to obtain the mean trial-level mutual information under H-C entropy, denoted as
$$I_\alpha(T, S\mid Z) = E_Z\left[I_\alpha(T, S\mid Z)\right] = I_\alpha(T, S\mid Z = 1)P(Z = 1) + I_\alpha(T, S\mid Z = 0)P(Z = 0).$$
Furthermore, we can use the mutual information $I_\alpha(T, Z\mid S)$ to verify Prentice's criterion as suggested by [24]: conditioning on the surrogate $S$, the clinical endpoint $T$ and the treatment assignment $Z$ are independent, so a good surrogate should lead to $I_\alpha(T, Z\mid S) \approx 0$. We have
$$I_\alpha(T, Z\mid S) = E_S\left[HC_\alpha(T\mid S) + HC_\alpha(Z\mid S) + (1-\alpha)HC_\alpha(T\mid S)HC_\alpha(Z\mid S) - HC_\alpha(T, Z\mid S)\right],$$
where
$$HC_\alpha(T\mid S) = \frac{1}{1-\alpha}\left[P(T=1\mid S)^{\alpha} + P(T=0\mid S)^{\alpha} - 1\right],$$
$$HC_\alpha(Z\mid S) = \frac{1}{1-\alpha}\left[P(Z=1\mid S)^{\alpha} + P(Z=0\mid S)^{\alpha} - 1\right],$$
$$HC_\alpha(T, Z\mid S) = \frac{1}{1-\alpha}\left[\sum_{t=0}^{1}\sum_{z=0}^{1}P(T=t, Z=z\mid S)^{\alpha} - 1\right].$$
So, for $\alpha \ne 1$,
$$I_\alpha(T, Z\mid S) = \frac{1}{1-\alpha}E_S\left[\sum_{t=0}^{1}\sum_{z=0}^{1}\left(P(T=t\mid S)^{\alpha}P(Z=z\mid S)^{\alpha} - P(T=t, Z=z\mid S)^{\alpha}\right)\right].$$
When $\alpha = 1$,
$$I_1(T, Z\mid S) = E_S\left[-\sum_{t=0}^{1}P(T=t\mid S)\log P(T=t\mid S) - \sum_{z=0}^{1}P(Z=z\mid S)\log P(Z=z\mid S) + \sum_{t=0}^{1}\sum_{z=0}^{1}P(T=t, Z=z\mid S)\log P(T=t, Z=z\mid S)\right].$$
For real data, we can use a bivariate probit model to estimate $P(T=t\mid S)$, $P(Z=z\mid S)$, and $P(T=t, Z=z\mid S)$, and then use Equation (9), with numerical integration over the observed values of $S$, to derive $I_\alpha(T, Z\mid S)$. One way to perform this analysis is to use the R package mvProbit from CRAN (https://cran.r-project.org/web/packages/mvProbit/mvProbit.pdf, accessed on 17 December 2021).
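As a schematic illustration (a sketch with hypothetical input names; the full analysis with mvProbit and bootstrap resampling over the observed $S$ is given in Appendix A.3), once fitted conditional probabilities are available for each subject, $I_\alpha(T, Z\mid S)$ can be estimated by averaging over subjects:
# Estimate I_alpha(T, Z | S) by averaging the per-subject mutual information over S.
# pT, pZ: vectors of fitted P(T = 1 | S_i) and P(Z = 1 | S_i);
# pTZ: n x 4 matrix of fitted P(T = 1, Z = 1 | S_i), P(T = 1, Z = 0 | S_i),
#      P(T = 0, Z = 1 | S_i), P(T = 0, Z = 0 | S_i).
hc_mi_conditional <- function(pT, pZ, pTZ, alpha) {
  marg <- cbind(pT*pZ, pT*(1 - pZ), (1 - pT)*pZ, (1 - pT)*(1 - pZ))
  if (alpha == 1) {
    return(mean(rowSums(pTZ*log(pTZ)) - rowSums(marg*log(marg))))
  }
  mean(rowSums(marg^alpha) - rowSums(pTZ^alpha))/(1 - alpha)
}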

3.2. A Data Example

“Safety, Tolerability and Activity Study of Ibudilast in Subjects with Progressive Multiple Sclerosis” (NCT01982942) is a US National Institutes of Health (NIH)-sponsored multicenter, randomized, double-blind, placebo-controlled, parallel-group phase II study that ran from November 2013 to December 2017. The main study results were published by Fox et al. [35]. The trial data are available upon request to the NIH. We use these data for the numerical illustration of the H-C ITMA.
More specifically, patients with primary or secondary progressive multiple sclerosis were enrolled in this phase II randomized trial of oral ibudilast (≤100 mg daily) or placebo for 96 weeks. The primary efficacy endpoint was the rate of brain atrophy, as measured by the brain parenchymal fraction (brain size relative to the volume of the outer surface contour of the brain). Major secondary endpoints included the change in the pyramidal tracts on diffusion tensor imaging and cortical atrophy, all measures of tissue damage in multiple sclerosis.
We requested and received data from the study team containing 104 placebo patients and 99 treated patients, with longitudinal observations of brain parenchymal fraction (BPF) and thinning of the cortical gray matter (cortical thickness) measured by magnetic resonance imaging at weeks 0, 24, 48, 72, and 96. For illustration purposes, we altered the primary and secondary endpoints of the trial: we created a binary clinical endpoint defined as cortical thickness (CTH) greater than 3 mm, an outcome indicating less cortical gray matter atrophy, and used BPF as the continuous longitudinal marker. Table 1 provides a summary of the data used for this illustration.
From Table 1, we can see that 104 patients were randomized to the control arm and 99 patients to the treatment arm. The treatment significantly reduced cortical atrophy: 71% of patients in the treatment arm maintained more than 3 mm of cortical thickness (CTH) at 96 weeks post baseline, compared with 48% in the placebo arm. While the differences in BPF between treatment arms had p-values above 0.38 at each follow-up MRI, the aggregated changes over time, measured by the slopes of a mixed random-effects regression model, were highly statistically significant, with a p-value of 0.0056.
The importance of evaluating the surrogacy of the longitudinal BPF measurements for the binary CTH endpoint in MS trials is to understand the strength of surrogacy and whether it can be used to shorten trial duration. More importantly for future trial design, we need to understand how often and when the longitudinal measurements should be performed.
Using the formulas derived in Proposition 4, we derived the mean mutual information and ITMA of longitudinal BPF as a surrogate for the clinical endpoint of maintaining more than 3 mm of cortical thickness at the end of 96 weeks. We explore three choices of α (0.5, 1, and 2) to show the difference between H-C and Shannon entropies. The value α = 1 is included because it corresponds to Shannon entropy; the other two values have been considered in other papers, such as [32]. The columns of Table 2 are organized according to the values of α. Each row in Table 2 represents a design that uses BPF at baseline (week 0) and different follow-up visits to construct a longitudinal surrogate endpoint. For example, the first row uses the baseline and week 24 data, while the last row uses the data from baseline and weeks 48 and 72.
From Table 2, we can see that the longitudinal BPF measures at baseline with at least one follow-up visit were all reasonable surrogates for the binary endpoint of CTH > 3.0 mm. H-C entropy with α = 0.5 was not sensitive enough to differentiate subtle differences in the surrogacy utility of the different designs for collecting the surrogate endpoint. When α = 1, H-C entropy is Shannon entropy and was able to discriminate among the designs to the third decimal place. H-C entropy with α = 2 was more sensitive and showed differences among all designs. As the table demonstrates, using longitudinal BPF data could shorten the trial duration to 72 weeks. For a trial ending at 72 weeks, additional BPF measures at weeks 24 and 48 did not add more utility to surrogacy than a single measure at week 24. Overall, the p-values from the linear mixed random-effects model reflected the directions of the ITMAs, but not in complete concordance, perhaps due to random variation in fitting the mixed random-effects and probit models.
Table 3 examines the longitudinal surrogacy based on Prentice's criterion. Here we want to determine whether $I_\alpha(T, Z\mid S)$ is close to 0. The results in Table 3 confirm the observations of Table 2 that longitudinal BPF is a good surrogate variable for the binary endpoint CTH > 3.0 mm at 96 weeks. Because Table 3 uses the same models as Table 2, the p-values for the longitudinal models are omitted. Once again, $I_\alpha(T, Z\mid S)$ decreases with α.

4. Conclusions

Alonso et al. [24] proposed to assess the validity of a surrogate endpoint in terms of uncertainty reduction, with the main proposals for measures of uncertainty coming from information theory. These authors based their proposal on the well-known Shannon entropy. There has been extensive work on generalized entropies [30,31,32,36,37,38,39]. We focus on the Havrda-Charvat entropy, which reduces to the Shannon entropy when its parameter is set to one, to extend that surrogacy measure. Based on the generalized entropy, we consider a generalized mutual information, since members of this family have been shown to perform well in other contexts [30,31,32]. In this paper, the theoretical development of these measures has been completed. The advantage of our proposal is that it contains the existing surrogacy measure as a particular case and makes it easy to explore other measures that may have performance advantages for specific questions. We have seen the advantage of using α = 2 instead of α = 1 in our example for evaluating the scheduling of longitudinal visits.
Some additional issues are pending. On the one hand, we are working to carry out a more extensive numerical study to assess the performance of these measures in the endpoint surrogacy context. In this paper, we compared the performance of the ITMA in a real trial with three choices of α (0.5, 1, and 2), chosen for illustration purposes; the optimal choice of α remains a research question. Additional research can consider other ITMAs, such as those based on divergence measures [36], taking into account that the mutual information equals the Kullback divergence between the joint distribution and the product of the marginals, or measures of unilateral dependency such as the one defined by Andonie et al. [37] based on the informational energy [39], or surrogacy for testing of variances [38].

Author Contributions

Conceptualization, M.d.C.P. and Y.L.; Data curation, Q.Z. and Y.L.; Funding acquisition, M.d.C.P.; Investigation, M.d.C.P., Q.Z., H.J. and Y.L.; Methodology, M.d.C.P., Q.Z., H.J. and Y.L.; Supervision, Y.L.; Validation, Q.Z.; Writing—original draft, M.d.C.P. and Y.L.; Writing—review—editing, M.d.C.P., Q.Z., H.J. and Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

PID2019-104681RB-I00 (M.d.C.P.) of the Spanish Ministry of Science and Innovation and grants from the US National Institute of Health 1UL1TR003142, 4P30CA124435, and R01HL089778 (Y.L.).

Data Availability Statement

Data can be requested through the US National Institute of Neurological Disorders and Stroke and Fox, the Principal Investigator of NCT01982942, for access to the de-identified data from the “Safety, Tolerability and Activity Study of Ibudilast in Subjects with Progressive Multiple Sclerosis”.

Acknowledgments

This work was partially supported by research grant PID2019-104681RB-I00 (Pardo M.D.C.) of the Spanish Ministry of Science and Innovation and grants from the US National Institute of Health 1UL1TR003142, 4P30CA124435, and R01HL089778 (Lu Y.). We want to thank the US National Institute of Neurological Disorders and Stroke and Fox, the Principal Investigator of NCT01982942 for sharing the de-identified data from the “Safety, Tolerability and Activity Study of Ibudilast in Subjects with Progressive Multiple Sclerosis”. We want to thank the reviewers for their constructive comments that substantially improved the paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. R-Program

Appendix A.1. R-Program for Table 1

### Row 1 ###
fisher.test(table(nihexample$CTh96YesNo,nihexample$trt.group))
### Row 3–5 ###
t.test(nihexample$bpf0~nihexample$trt.group)
t.test(nihexample$bpf96~nihexample$trt.group)
### Row 6 ###
library(lme4)
summary(lmer(bpf~trt.group+week+trt.group*week+(1|ID),data=nihexamplelong))

Appendix A.2. R-Program for Table 2

# hcentr computes the arm-averaged H-C mutual information I_alpha(T, S | Z) and the ITMA.
# Arguments: pt = response rates P(T = 1 | Z = 0) and P(T = 1 | Z = 1) (length-2 vector),
# preds = linear predictors from the probit regression of T on the longitudinal S,
# pz = P(Z = 1), trt = treatment indicator, alpha = H-C entropy parameter.
hcentr=function(pt,preds,pz,trt,alpha){
s2=var(preds)
if(alpha!=1){
mtrinf=pz/(1-alpha)*((2*pi*s2)^((1-alpha)/2)/sqrt(alpha)*((pt[2])^alpha+(1-pt[2])^alpha)-mean((pnorm(preds[trt==1]))^alpha+(1-pnorm(preds[trt==1]))^alpha))+(1-pz)/(1-alpha)*((2*pi*s2)^((1-alpha)/2)/sqrt(alpha)*((pt[1])^alpha+(1-pt[1])^alpha)-mean((pnorm(preds[trt==0]))^alpha+(1-pnorm(preds[trt==0]))^alpha))}
if(alpha==1){
# Shannon case (Result 4.4): averaged p*log(p) terms minus the marginal entropy terms.
mtrinf=pz*(mean(pnorm(preds[trt==1])*log(pnorm(preds[trt==1])))+mean((1-pnorm(preds[trt==1]))*log(1-pnorm(preds[trt==1])))-pt[2]*log(pt[2])-(1-pt[2])*log(1-pt[2]))+(1-pz)*(mean(pnorm(preds[trt==0])*log(pnorm(preds[trt==0])))+mean((1-pnorm(preds[trt==0]))*log(1-pnorm(preds[trt==0])))-pt[1]*log(pt[1])-(1-pt[1])*log(1-pt[1]))
}
itma=1-exp(-2*mtrinf)
c(mtrinf,itma)
}
### Row 1 ###
pt=table(nihexample$CTh96YesNo,nihexample$trt.group)[2,]/table(nihexample$trt.group)
myprobit1=glm(CTh96YesNo~trt.group+bpf0+bpf24,family=binomial(link="probit"), data=nihexample)
row1=round(c(
hcentr(pt,myprobit1$linear.predictors,sum(nihexample$trt.group)/length(nihexample$trt.group),nihexample$trt.group,0.5),
hcentr(pt,myprobit1$linear.predictors,sum(nihexample$trt.group)/length(nihexample$trt.group),nihexample$trt.group,1),
hcentr(pt,myprobit1$linear.predictors,sum(nihexample$trt.group)/length(nihexample$trt.group),nihexample$trt.group,2),
summary(lmer(bpf~trt.group+week+trt.group*week+(1|ID),data=nihexamplelong[nihexamplelong$week==0 |nihexamplelong$week==24, ]))$coefficients[4,5]),4)
### similar codes for other rows ####

Appendix A.3. R-Program for Table 3

library(mvtnorm)
library(mvProbit)
####(T, Z)####
table(nihexample$CTh96YesNo,nihexample$trt.group)
##### choose variables: "bpf0", "bpf24", "bpf48", "bpf72", "bpf96" (1-5)
fullmodel1 = mvProbit(cbind(CTh96YesNo,trt.group)~bpf0+bpf24+bpf48+bpf72+bpf96,data=nihexample) #####model
summary(fullmodel1)
sigma=symMatrix(c(1,fullmodel1$estimate[length(fullmodel1$estimate)],1))
################################################
#mu1=fullmodel1$estimate[1]+fullmodel1$estimate[2]*nihexample$bpf0+fullmodel1$estimate[3]*nihexample$bpf24
#mu2=fullmodel1$estimate[4]+fullmodel1$estimate[5]*nihexample$bpf0+fullmodel1$estimate[6]*nihexample$bpf24
#mu1=fullmodel1$estimate[1]+fullmodel1$estimate[2]*nihexample$bpf0+fullmodel1$estimate[3]*nihexample$bpf24+fullmodel1$estimate[4]*nihexample$bpf48
#mu2=fullmodel1$estimate[5]+fullmodel1$estimate[6]*nihexample$bpf0+fullmodel1$estimate[7]*nihexample$bpf24+fullmodel1$estimate[8]*nihexample$bpf48
#mu1=fullmodel1$estimate[1]+fullmodel1$estimate[2]*nihexample$bpf0+fullmodel1$estimate[3]*nihexample$bpf24+fullmodel1$estimate[4]*nihexample$bpf48+fullmodel1$estimate[5]*nihexample$bpf72
#mu2=fullmodel1$estimate[6]+fullmodel1$estimate[7]*nihexample$bpf0+fullmodel1$estimate[8]*nihexample$bpf24+fullmodel1$estimate[9]*nihexample$bpf48+fullmodel1$estimate[10]*nihexample$bpf72
mu1=fullmodel1$estimate[1]+fullmodel1$estimate[2]*nihexample$bpf0+fullmodel1$estimate[3]*nihexample$bpf24+fullmodel1$estimate[4]*nihexample$bpf48+fullmodel1$estimate[5]*nihexample$bpf72+fullmodel1$estimate[6]*nihexample$bpf96
mu2=fullmodel1$estimate[7]+fullmodel1$estimate[8]*nihexample$bpf0+fullmodel1$estimate[9]*nihexample$bpf24+fullmodel1$estimate[10]*nihexample$bpf48+fullmodel1$estimate[11]*nihexample$bpf72+fullmodel1$estimate[12]*nihexample$bpf96
fullmodel1$estimate
sigma
##################### T|S Z|S############
bs=10000
set.seed(873465)
BT=sample(c(1:length(nihexample$CTh96YesNo)), size = bs, replace = TRUE) ### bootstrap IDs ###
BT_U1=matrix(0,bs,2) #### 1 D normal probability
BT_U2=matrix(0,bs,4) ####2 D normal probability
for (i in 1:bs)
{ c=BT[i]
BT_U1[i,1]=1-pnorm(0, mu1[c], 1) ####P(T=1|S)
BT_U1[i,2]=1-pnorm( 0,mu2[c], 1)####P(Z=1|S)
BT_U2[i,1]=pmvnorm(lower=c(0,0),upper=Inf,mean=c(mu1[c],mu2[c]),sigma)####P(T=1,Z=1|S]
BT_U2[i,2]=pmvnorm(lower=c(0,-Inf),upper=c(Inf,0),mean=c(mu1[c],mu2[c]),sigma)####P(T=1,Z=0|S]
BT_U2[i,3]=pmvnorm(lower=c(-Inf,0),upper=c(0,Inf),mean=c(mu1[c],mu2[c]),sigma)####P(T=0,Z=1|S]
BT_U2[i,4]=pmvnorm(lower=-Inf,upper=c(0,0),mean=c(mu1[c],mu2[c]),sigma)####P(T=0,Z=0|S]
}
################################
alfa=0.5
p1=mean((BT_U1[,1]*BT_U1[,2])^alfa+((1-BT_U1[,1])*BT_U1[,2])^alfa+(BT_U1[,1]*(1-BT_U1[,2]))^alfa+((1-BT_U1[,1])*(1-BT_U1[,2]))^alfa)
p2=mean(BT_U2[,1]^alfa+BT_U2[,2]^alfa+BT_U2[,3]^alfa+BT_U2[,4]^alfa)
I_alfa=1/(1-alfa)*(p1-p2)
IM_alfa=1-exp(-2*I_alfa)
I_alfa
IM_alfa
###################################
alfa=2
p1=mean((BT_U1[,1]*BT_U1[,2])^alfa+((1-BT_U1[,1])*BT_U1[,2])^alfa+(BT_U1[,1]*(1-BT_U1[,2]))^alfa+((1-BT_U1[,1])*(1-BT_U1[,2]))^alfa)
p2=mean(BT_U2[,1]^alfa+BT_U2[,2]^alfa+BT_U2[,3]^alfa+BT_U2[,4]^alfa)
I_alfa=1/(1-alfa)*(p1-p2)
IM_alfa=1-exp(-2*I_alfa)
I_alfa
IM_alfa
######## alfa=1#####################
p1=-mean(BT_U1[,1]*log(BT_U1[,1])+(1-BT_U1[,1])*log(1-BT_U1[,1]))
p2=-mean(BT_U1[,2]*log(BT_U1[,2])+(1-BT_U1[,2])*log(1-BT_U1[,2]))
p3=mean(BT_U2[,1]*log(BT_U2[,1])+BT_U2[,2]*log(BT_U2[,2])+BT_U2[,3]*log(BT_U2[,3])+BT_U2[,4]*log(BT_U2[,4]))
I=p1+p2+p3
IM=1-exp(-2*I)

References

  1. Clinical Trial Endpoints for the Approval of Cancer Drugs and Biologics Guidance for Industry. U.S. Department of Health and Human Services. Available online: https://www.fda.gov/media/71195/download (accessed on 17 December 2021).
  2. Kim, C.; Prasad, V. Cancer drugs approved on the basis of a surrogate end point and subsequent overall survival: An analysis of 5 years of US Food and Drug Administration approvals. JAMA Intern. Med. 2015, 175, 1992–1994.
  3. Schwartz, L.H.; Litière, S.; de Vries, E.; Ford, R.; Gwyther, S.; Mandrekar, S.; Shankar, L.; Bogaerts, J.; Chen, A.; Dancey, J.; et al. RECIST 1.1-update and clarification: From the RECIST committee. Eur. J. Cancer 2016, 62, 132–137.
  4. Karrison, T.G.; Maitland, M.L.; Stadler, W.M.; Ratain, M.J. Design of phase II cancer trials using a continuous endpoint of change in tumor size: Application to a study of sorafenib and erlotinib in non-small-cell lung cancer. J. Natl. Cancer Inst. 2007, 99, 1455–1461, Erratum in J. Natl. Cancer Inst. 2007, 99, 1819.
  5. Burzykowski, T.; Coart, E.; Saad, E.D.; Shi, Q.; Sommeijer, D.W.; Bokemeyer, C.; Díaz-Rubio, E.; Douillard, J.Y.; Falcone, A.; Fuchs, C.S.; et al. Evaluation of continuous tumor-size–based end points as surrogates for overall survival in randomized clinical trials in metastatic colorectal cancer. JAMA Netw. Open 2019, 2, e1911750.
  6. Lu, Y. Statistical considerations for quantitative imaging measures in clinical trials. In Biopharmaceutical Applied Statistics Symposium: Volume 3 Pharmaceutical Applications; Peace, K.E., Chen, D.-G., Menon, S., Eds.; ICSA Book Series in Statistics; Springer: Singapore, 2018; pp. 219–240.
  7. Chen, E.Y.; Joshi, S.K.; Tran, A.; Prasad, V. Estimation of study time reduction using surrogate end points rather than overall survival in oncology clinical trials. JAMA Intern. Med. 2019, 179, 642–647.
  8. Kok, P.S.; Yoon, W.H.; Lord, S.; Marschner, I.; Friedlander, M.; Lee, C.K. Tumor response end points as surrogates for overall survival in immune checkpoint inhibitor trials: A systematic review and meta-analysis. JCO Precis. Oncol. 2021, 5, 1151–1159.
  9. Shameer, K.; Zhang, Y.; Jackson, D.; Rhodes, K.; Neelufer, I.K.A.; Nampally, S.; Prokop, A.; Hutchison, E.; Ye, J.; Malkov, V.A.; et al. Correlation between early endpoints and overall survival in non-small-cell lung cancer: A trial-level meta-analysis. Front. Oncol. 2021, 11, 672916.
  10. Haslam, A.; Hey, S.P.; Gill, J.; Prasad, V. A systematic review of trial-level meta-analyses measuring the strength of association between surrogate end-points and overall survival in oncology. Eur. J. Cancer 2019, 106, 196–211.
  11. Prentice, R.L. Surrogate endpoints in clinical trials: Definitions and operational criteria. Stat. Med. 1989, 8, 431–440.
  12. Freedman, L.S.; Graubard, B.I.; Schatzkin, A. Statistical validation of intermediate endpoints for chronic diseases. Stat. Med. 1992, 11, 167–178.
  13. Wang, Y.; Taylor, J.M. A measure of the proportion of treatment effect explained by a surrogate marker. Biometrics 2002, 58, 803–812.
  14. Taylor, J.M.; Wang, Y.; Thiébaut, R. Counterfactual links to the proportion of treatment effect explained by a surrogate marker. Biometrics 2005, 61, 1102–1111.
  15. Parast, L.; Tian, L.; Cai, T. Landmark estimation of survival and treatment effect in a randomized clinical trial. J. Am. Stat. Assoc. 2014, 109, 384–394.
  16. Parast, L.; McDermott, M.M.; Tian, L. Robust estimation of the proportion of treatment effect explained by surrogate marker information. Stat. Med. 2016, 35, 1637–1653.
  17. Frangakis, C.E.; Rubin, D.B. Principal stratification in causal inference. Biometrics 2002, 58, 21–29.
  18. Conlon, A.S.; Taylor, J.M.; Elliott, M.R. Surrogacy assessment using principal stratification when surrogate and outcome measures are multivariate normal. Biostatistics 2014, 15, 266–283.
  19. Huang, Y.; Gilbert, P.B. Comparing biomarkers as principal surrogate endpoints. Biometrics 2011, 67, 1442–1451.
  20. Gabriel, E.E.; Gilbert, P.B. Evaluating principal surrogate endpoints with time-to-event data accounting for time-varying treatment efficacy. Biostatistics 2014, 15, 251–265.
  21. Gabriel, E.E.; Sachs, M.C.; Gilbert, P.B. Comparing and combining biomarkers as principle surrogates for time-to-event clinical endpoints. Stat. Med. 2015, 34, 381–395.
  22. Gilbert, P.B.; Hudgens, M.G. Evaluating candidate principal surrogate endpoints. Biometrics 2008, 64, 1146–1154.
  23. Buyse, M.; Molenberghs, G. Criteria for the validation of surrogate endpoints in randomized experiments. Biometrics 1998, 54, 1014–1029.
  24. Alonso, A.; Molenberghs, G. Surrogate marker evaluation from an information theoretic perspective. Biometrics 2007, 63, 180–186.
  25. Pryseley, A.; Tilahun, A.; Alonso, A.; Molenberghs, G. Information-theory based surrogate marker evaluation from several randomized clinical trials with continuous true and binary surrogate endpoints. Clin. Trials 2007, 4, 587–597.
  26. Alonso, A.; Molenberghs, G. Evaluating time to cancer recurrence as a surrogate marker for survival from an information theory perspective. Stat. Methods Med. Res. 2008, 17, 497–504.
  27. Alonso, A.; Bigirumurame, T.; Burzykowski, T.; Buyse, M.; Molenberghs, G.; Muchene, L.; Perualila, N.J.; Shkedy, Z.; Van der Elst, W. Applied Surrogate Endpoint Evaluation Methods with SAS and R; Chapman and Hall/CRC: London, UK, 2017.
  28. Ensor, H.; Weir, C.J. Evaluation of surrogacy in the multi-trial setting based on information theory: An extension to ordinal outcomes. J. Biopharm. Stat. 2020, 30, 364–376.
  29. Havrda, J.; Charvát, F. Quantification method of classification processes. Concept of structural α-entropy. Kybernetika 1967, 3, 30–35.
  30. Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
  31. Amigó, J.M.; Balogh, S.G.; Hernández, S. A brief review of generalized entropies. Entropy 2018, 20, 813.
  32. Wachowiak, M.P.; Smolíková, R.; Tourassi, G.D.; Elmaghraby, A.S. Similarity metrics based on nonadditive entropies for 2D-3D multimodal biomedical image registration. In Medical Imaging 2003: Image Processing; International Society for Optics and Photonics: Bellingham, WA, USA, 2003; Volume 5032, pp. 1090–1100.
  33. Owen, D. A table of normal integrals. Commun. Stat. Simul. Comput. 1980, 9, 389–419.
  34. Chib, S.; Greenberg, E. Analysis of multivariate probit models. Biometrika 1998, 85, 347–361.
  35. Fox, R.J.; Coffey, C.S.; Conwit, R.; Cudkowicz, M.E.; Gleason, T.; Goodman, A.; Klawiter, E.C.; Matsuda, K.; McGovern, M.; Naismith, R.T.; et al.; NN102/SPRINT-MS Trial Investigators. Phase 2 trial of ibudilast in progressive multiple sclerosis. N. Engl. J. Med. 2018, 379, 846–855.
  36. Biswas, A.; Pardo, M.C.; Guha, A. Auto-association measures for stationary time series of categorical data. TEST 2004, 23, 487–514.
  37. Andonie, R.; Petrescu, F. Interacting systems and informational energy. Found. Control Eng. 1986, 11, 53–59.
  38. Pardo, J.A.; Pardo, M.C.; Vicente, M.L.; Esteban, M.D. A statistical information theory approach to compare the homogeneity of several variances. Comput. Stat. Data Anal. 1997, 24, 411–416.
  39. Menéndez, M.L.; Pardo, J.A.; Pardo, M.C. Estimators based on sample quantiles using (h,φ)-entropy measures. Appl. Math. Lett. 1998, 11, 99–104.
Table 1. Summary Statistics for the Real Data Example.

Variable              | Control (N = 104) | Treatment (N = 99) | p-Value *
CTH > 3 mm: N (%)     | 50 (48%)          | 70 (71%)           | 0.0016
BPF: Mean (SD)        |                   |                    |
  Week 0              | 0.8023 (0.0301)   | 0.8040 (0.0281)    | 0.6823
  Week 24             | 0.8012 (0.0301)   | 0.8039 (0.0277)    | 0.5001
  Week 48             | 0.8009 (0.0311)   | 0.8036 (0.0282)    | 0.5115
  Week 72             | 0.8001 (0.0303)   | 0.8032 (0.0283)    | 0.4433
  Week 96             | 0.7989 (0.0306)   | 0.8026 (0.0293)    | 0.3813
  Change/24 weeks **  | −0.0008 (0.0001)  | −0.0004 (0.0001)   | 0.0056
* The p-value for CTH > 3 mm was calculated using Fisher's exact test; p-values for mean differences at follow-up visits were calculated using t-tests; the p-value for the change per 24 weeks (slope) was calculated with the mixed random-effects model. ** The change per 24 weeks was estimated using a mixed random-effects linear regression model with the lmer function from the R package lme4 (see Appendix A.1).
Table 2. H-C Mutual Information and ITMA by Different Longitudinal Designs.

BPF Data Used (weeks) | I_α(T,S|Z), α = 0.5 | ITMA, α = 0.5 | I_α(T,S|Z), α = 1 | ITMA, α = 1 | I_α(T,S|Z), α = 2 | ITMA, α = 2 | p-Value *
0, 24                 | 4.6042 | 0.9999 | 2.6300 | 0.9948 | 0.6063 | 0.7026 | 0.0797
0, 24, 48             | 4.6117 | 0.9999 | 2.6307 | 0.9948 | 0.6066 | 0.7028 | 0.1025
0, 24, 48, 72         | 4.6209 | 0.9999 | 2.6352 | 0.9949 | 0.6071 | 0.7031 | 0.0390
0, 24, 48, 72, 96     | 4.6103 | 0.9999 | 2.6361 | 0.9949 | 0.6069 | 0.7029 | 0.0056
0, 48                 | 4.4683 | 0.9999 | 2.6012 | 0.9945 | 0.5980 | 0.6976 | 0.1586
0, 72                 | 4.4522 | 0.9999 | 2.5912 | 0.9944 | 0.5962 | 0.6965 | 0.0675
0, 24, 72             | 4.6223 | 0.9999 | 2.6348 | 0.9949 | 0.6072 | 0.7031 | 0.0485
0, 48, 72             | 4.4696 | 0.9999 | 2.6022 | 0.9945 | 0.5980 | 0.6976 | 0.0382
* p-value for treatment and visit interactions in a linear mixed random effects model using the R-lmer function (see Appendix A.2).
Table 3. Prentice Criteria for Surrogate Endpoint (see Appendix A.3).

BPF Data Used (weeks) | I_α(T,Z|S), α = 0.5 | ITMA, α = 0.5 | I_α(T,Z|S), α = 1 | ITMA, α = 1 | I_α(T,Z|S), α = 2 | ITMA, α = 2
0, 24                 | 0.0390 | 0.0751 | 0.0271 | 0.0528 | 0.0108 | 0.0213
0, 24, 48             | 0.0388 | 0.0747 | 0.0270 | 0.0526 | 0.0108 | 0.0213
0, 24, 48, 72         | 0.0407 | 0.0782 | 0.0280 | 0.0545 | 0.0110 | 0.0218
0, 24, 48, 72, 96     | 0.0395 | 0.0760 | 0.0274 | 0.0533 | 0.0110 | 0.0218
0, 48                 | 0.0428 | 0.0820 | 0.0297 | 0.0578 | 0.0117 | 0.0231
0, 72                 | 0.0416 | 0.0798 | 0.0287 | 0.0558 | 0.0111 | 0.0219
0, 24, 72             | 0.0403 | 0.0775 | 0.0278 | 0.0541 | 0.0110 | 0.0217
0, 48, 72             | 0.0434 | 0.0832 | 0.0299 | 0.0580 | 0.0116 | 0.0229
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
