Evaluation of Surrogate Endpoints Using Information-Theoretic Measure of Association Based on Havrda and Charvat Entropy

Surrogate endpoints have been used to assess the efficacy of a treatment and can potentially reduce the duration of clinical trials and/or the number of patients they require. Using information theory, Alonso et al. (2007) proposed a unified framework based on Shannon entropy, giving a new definition of surrogacy that departed from the hypothesis-testing framework. In this paper, we derive a new family of surrogacy measures based on Havrda and Charvat (H-C) entropy that contains Alonso's definition as a particular case. Furthermore, using H-C entropy, we extend our approach to a new model, based on the information-theoretic measure of association, for a longitudinally collected continuous surrogate endpoint of a binary clinical endpoint in a clinical trial. The new model is illustrated through the analysis of data from a completed clinical trial, where it demonstrates the advantages of H-C entropy-based surrogacy measures in evaluating the scheduling of longitudinal biomarker visits for a phase 2 randomized controlled clinical trial for the treatment of multiple sclerosis.


Introduction
Surrogate endpoints, which can be observed earlier or more easily, can possibly be measured repeatedly, or are cost-saving, have been used to replace clinical endpoints in clinical trials. For example, total tumor response rate and progression-free survival have been used in phase II and phase III cancer clinical trials as surrogate endpoints for overall survival, which often requires a longer trial duration to achieve adequate statistical power. The United States Food and Drug Administration (USFDA) has accepted the use of surrogate endpoints in regulatory reviews of new drug applications [1]. Most cancer drug approvals by the USFDA between 2009 and 2014 (55 of 83, 66%) used at least one surrogate endpoint [2].
A motivating example is the use of biomarkers in phase II cancer trials. Non-randomized single-arm or randomized parallel-group clinical trials are used to evaluate signals of efficacy for a new drug. A binary response status, such as the total response based on the RECIST criterion [3], or a continuous response, such as the change in tumor size [4], is a common primary endpoint. For molecular-targeted drugs or immuno-oncology therapies, various serum, tissue, or imaging biomarkers are being developed to assess whether the targeted pathways have been activated. These biomarkers are usually continuous, can be measured repeatedly, and their changes should precede a clinical response. However, activation of a targeted pathway does not necessarily imply response to the treatment. Various questions have been raised about the utility of such biomarkers in phase II trials [5,6].
Surrogate endpoints offer a convenient way to speed up clinical trials [7], but they may not represent the actual outcomes well with regard to the benefit of therapy [8]. For instance, bevacizumab was approved for metastatic breast cancer on the basis of a surrogate outcome and was later withdrawn after failing to confirm a survival benefit [5]. How to evaluate whether the therapy benefit on a surrogate, denoted as S, reflects the benefit on the true clinical endpoint, denoted as T, remains a scientific challenge.
Many statistical methods and measures have been developed to assess surrogate endpoints. One way to validate a surrogate endpoint is to evaluate its correlation with a clinically meaningful endpoint through meta-analyses [9]. Only 11 of 89 (12%) studies found a high correlation (r ≥ 0.85), and nine (10%) found only a moderate correlation (0.7 < r < 0.85) between surrogates and endpoints [7], suggesting that the strength of surrogates in clinical practice is often unknown or weak.
Buyse and Molenberghs [23] suggested two quantities to validate a surrogate endpoint: the relative effect, which relates the treatment effect on the primary outcome to that on the surrogate at the population level, and the adjusted association, which quantifies the association between the primary outcome and the surrogate marker after adjusting for treatment at the individual level. These are single-trial surrogate evaluation methods, which assume that information on the surrogate and true endpoints is available from a single trial. They focused only on validity, whereas the general association captured by the latter quantity is related to the efficiency of a surrogate marker.
Using an information-theoretic approach, Alonso and Molenberghs [24] and Pryseley et al. [25] redefined surrogacy in terms of the information content that S provides about T. Following the notation of Pryseley et al. [25], let S be a continuous surrogate random variable and T the continuous clinical endpoint of interest, and let f(·) denote a density function. The Shannon entropies of T and of the conditional variable T|S are denoted by h(T) and h(T|S), respectively, where $h(T) = E[-\log f(T)]$ and $h(T\mid S) = E[-\log f(T\mid S)]$. The corresponding entropy power functions are $EP_h(T) = \frac{1}{2\pi e}\, e^{\frac{2}{n}h(T)}$ and $EP_h(T\mid S) = \frac{1}{2\pi e}\, e^{\frac{2}{n}h(T\mid S)}$, where n is the dimension of T (n = 1 for a scalar endpoint). Alonso and colleagues defined an information-theoretic measure of association (ITMA) as the proportional reduction in uncertainty about T, measured by the entropy power, when S is known:
$$R_h^2 = 1 - \frac{EP_h(T\mid S)}{EP_h(T)} = 1 - e^{-\frac{2}{n} I(T,S)},$$
where $I(T,S) = h(T) - h(T\mid S)$ is the mutual information. If S is a good surrogate for T, knowing the effect on S reduces the uncertainty about the effect on T.
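As a quick numerical check of the ITMA definition, the sketch below assumes that (T, S) is bivariate normal with correlation ρ, for which the Shannon mutual information has the standard closed form $I(T,S) = -\tfrac{1}{2}\log(1-\rho^2)$; with n = 1 the ITMA then reduces to $\rho^2$. The function names are ours and purely illustrative.

```python
import math


def gaussian_mutual_information(rho: float) -> float:
    """Shannon mutual information I(T, S), in nats, for a bivariate normal
    pair with correlation rho (|rho| < 1)."""
    return -0.5 * math.log(1.0 - rho ** 2)


def itma(mutual_info: float, n: int = 1) -> float:
    """ITMA R_h^2 = 1 - EP_h(T|S) / EP_h(T) = 1 - exp(-2 I(T, S) / n)."""
    return 1.0 - math.exp(-2.0 * mutual_info / n)


for rho in (0.0, 0.5, 0.9):
    r2 = itma(gaussian_mutual_information(rho))
    print(f"rho = {rho:.1f}: R_h^2 = {r2:.4f}, rho^2 = {rho ** 2:.4f}")
```

Each printed line shows matching values of $R_h^2$ and $\rho^2$, which is why ITMA is often read as a generalized squared correlation.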
Alonso and Molenberghs [24] describe several useful properties of this measure. Multiple authors have provided examples of this approach and demonstrated applications in situations where S and T are both binary, continuous, longitudinal, or time-to-event random variables, as well as ordinal outcomes [24][25][26][27][28].
In this paper, we present two new results on surrogate endpoints based on the information-theoretic measure of association (ITMA). First, we extend the ITMA construction from Shannon entropy to Havrda and Charvat (H-C) entropy [29]. The extension is motivated by the fact that H-C entropy exists under more general conditions than Shannon entropy. Explicit expressions for ITMA based on H-C entropy, together with its properties, are presented for different situations. Second, we extend the H-C ITMA model to a longitudinally collected continuous surrogate endpoint for a binary clinical endpoint of a clinical trial, and evaluate the benefit of S with the ITMA [24,25]. The current work focuses on single-trial surrogacy; its extension to a meta-analytic framework will need further development.
The paper is organized as follows. In Section 2, we give an example in which the Shannon entropy cannot be defined, so that the surrogacy measure of Alonso and Molenberghs under the information-theoretic framework does not work [24]. We then prove the existence of H-C entropy under general conditions and use it to define a family of surrogacy measures based on the ITMA of H-C entropy. Explicit formulas are obtained for the binary-binary, continuous-continuous, and binary-continuous situations. In Section 3, we extend the approach to a longitudinal linear random effects model for the longitudinally collected surrogate marker combined with a probit regression model for a binary primary endpoint in clinical trials. An application of H-C entropy to selecting the times at which to collect surrogate measures is presented using data from a completed clinical trial. Finally, Section 4 presents a discussion and conclusions. The R programs that generated the results for the tables in Section 3 are presented in Appendix A.

Extension of ITMA Surrogacy from Shannon Entropy to Havrda-Charvat Entropy
Why should we consider this extension? While Shannon entropy is adequate in most applications, there are cases in which the Shannon entropy does not exist, and the ITMA then cannot be properly calculated. We give an example below.

Example 1.
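Let X be a random variable with heavy tails, whose Shannon entropy h(X) does not exist. One way to make ITMA work in such cases is to use the Havrda-Charvat entropy [29], a generalization of the entropy function that contains the Shannon entropy as a special case. Mathematically, the Havrda-Charvat entropy of X with density f is defined as
$$HC_\alpha(X) = \frac{1}{\alpha - 1}\left(1 - \int f^{\alpha}(x)\,dx\right), \qquad \alpha > 0,\ \alpha \neq 1,$$
with $HC_1(X)$ defined as the limit as α → 1. It is easy to see that $HC_1(X) = h(X)$.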
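Because the density of Example 1 is not reproduced here, the following sketch uses a different, hypothetical density chosen only for illustration: $f(x) = 1/\{x(\ln x)^2\}$ for x > e. It is bounded by 1/e, yet its Shannon entropy diverges (the value truncated at $x_{\max}$ grows like $\ln\ln x_{\max}$), while $HC_2$ is finite, as anticipated by Proposition 1 below. The integrals are evaluated with a plain composite Simpson rule after a change of variables; this is a numerical illustration, not the paper's computation.

```python
import math

# Hypothetical bounded, heavy-tailed density (our choice, not necessarily the
# density of Example 1): f(x) = 1 / (x * (ln x)^2) for x > e, 0 otherwise.
# With u = ln x, f(x) dx = du / u^2 on u > 1, so f integrates to 1; f is
# decreasing on x > e, hence bounded by K = f(e) = 1/e.


def simpson(g, a, b, n=20000):
    """Composite Simpson rule for the integral of g over [a, b]."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def truncated_shannon(u_max):
    """-integral of f log f over e < x < exp(u_max), i.e. u_max = ln(x_max).

    In v = ln(ln x) the integrand becomes 1 + 2 v exp(-v), so the value is
    ln(u_max) + 2 * (1 - (ln(u_max) + 1) / u_max): it grows without bound.
    """
    return simpson(lambda v: 1.0 + 2.0 * v * math.exp(-v), 0.0, math.log(u_max))


def hc_entropy(alpha, u_max=60.0):
    """HC_alpha(X) = (1 - integral of f^alpha) / (alpha - 1) for alpha > 1.

    In u = ln x the integral is int_1^inf exp((1 - alpha) u) u^(-2 alpha) du;
    truncating at u_max is adequate only when alpha is well above 1.
    """
    integral = simpson(
        lambda u: math.exp((1.0 - alpha) * u) * u ** (-2.0 * alpha), 1.0, u_max
    )
    return (1.0 - integral) / (alpha - 1.0)


for u_max in (1e3, 1e6, 1e9):
    print(f"ln(x_max) = {u_max:.0e}: truncated Shannon entropy = {truncated_shannon(u_max):.3f}")
print(f"HC_2(X) = {hc_entropy(2.0):.6f}")
```

The truncated Shannon entropy keeps increasing as the cut-off grows, whereas $HC_2(X)$ settles at a finite value; the integral of $f^2$ also respects the bound $K^{\alpha-1} = 1/e$ used in Proposition 1.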

Proposition 1.
Under the mild regularity condition that the density function of X is bounded, $HC_\alpha(X)$ exists for every α > 1.

Proof.
We only need to show that, for a density function f(x) bounded by a constant K > 0, $\int f^{\alpha}(x)\,dx < \infty$ for α > 1. Let $W = \{x : f(x) > 1\}$. Since $m(W) \le \int_W f(x)\,dx \le 1$, it follows that $m(W) \le 1$, where m(W) is the Lebesgue measure of W; moreover, $\int_{W^C} f(x)\,dx \le 1$. Because $f^{\alpha} \le K^{\alpha}$ on W and $f^{\alpha} \le f$ on $W^C$ (where $f \le 1$), we have
$$\int f^{\alpha}(x)\,dx = \int_W f^{\alpha}(x)\,dx + \int_{W^C} f^{\alpha}(x)\,dx \le K^{\alpha} m(W) + \int_{W^C} f(x)\,dx \le K^{\alpha} + 1 < \infty.$$
Because Shannon entropy is a special case of H-C entropy, and H-C entropy exists for a proper choice of α whenever the density is bounded (Proposition 1), ITMA based on H-C entropy is a more flexible surrogacy measure that applies to more distribution families. Unlike Shannon entropy, H-C entropy is non-additive: when T and S are independent,
$$HC_\alpha(T,S) = HC_\alpha(T) + HC_\alpha(S) + (1-\alpha)\,HC_\alpha(T)\,HC_\alpha(S).$$
In general, non-additive entropy measures find justification in many biological and chemical phenomena [30]. While H-C entropy has been used in quantum physics [31] and medical imaging research [32], it has not yet been used to describe endpoint surrogacy in clinical trials.
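The non-additivity identity can be checked numerically on a small discrete example. The sketch below takes the H-C entropy of a discrete distribution as $(1 - \sum_i p_i^{\alpha})/(\alpha - 1)$, the normalization under which the identity holds, and uses hypothetical marginals for independent T and S; the joint entropy matches the identity and differs from the plain sum $HC_\alpha(T) + HC_\alpha(S)$ when α ≠ 1.

```python
def hc_discrete(probs, alpha):
    """Havrda-Charvat entropy (1 - sum p^alpha) / (alpha - 1), alpha != 1."""
    return (1.0 - sum(p ** alpha for p in probs)) / (alpha - 1.0)


p_t = [0.3, 0.7]             # hypothetical marginal of T
p_s = [0.2, 0.5, 0.3]        # hypothetical marginal of S
joint = [a * b for a in p_t for b in p_s]  # independence: p(t, s) = p(t) p(s)

for alpha in (0.5, 2.0, 3.0):
    h_t, h_s = hc_discrete(p_t, alpha), hc_discrete(p_s, alpha)
    lhs = hc_discrete(joint, alpha)
    rhs = h_t + h_s + (1.0 - alpha) * h_t * h_s
    print(f"alpha = {alpha}: HC(T,S) = {lhs:.6f}, identity = {rhs:.6f}, "
          f"HC(T) + HC(S) = {h_t + h_s:.6f}")
```

For α = 2, for instance, $HC_2(T) = 0.42$ and $HC_2(S) = 0.62$, while the joint entropy is 0.7796 rather than their sum 1.04.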
Some basic properties are listed below.

2. When T and S are deterministic, the value of $I_\alpha(T, S)$ depends on whether α > 1 or α < 1, as seen in the following propositions.

Proposition 2.
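Let T and S be two normally distributed continuous random variables such that the joint distribution is
$$(T,S)' \sim N\left(\begin{pmatrix}\mu_T\\ \mu_S\end{pmatrix}, \begin{pmatrix}\sigma_T^2 & \rho\sigma_T\sigma_S\\ \rho\sigma_T\sigma_S & \sigma_S^2\end{pmatrix}\right),$$
the conditional distribution is $T \mid S \sim N\left(\mu_T + \rho\frac{\sigma_T}{\sigma_S}(S - \mu_S),\ \sigma_T^2(1-\rho^2)\right)$, and $T \sim N(\mu_T, \sigma_T^2)$, where "′" denotes the vector transpose.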
Then, we have the following results:

2.1
The mutual information for H-C entropy depends not only on the correlation between T and S but also on their standard deviations when α ≠ 1.
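To see why Result 2.1 differs from the Shannon case, the sketch below uses the H-C entropy $HC_\alpha = (1 - \int f^{\alpha})/(\alpha - 1)$ together with the normal-density identity $\int \phi_\sigma^{\alpha}(x)\,dx = (2\pi\sigma^2)^{(1-\alpha)/2}/\sqrt{\alpha}$. Purely as an illustrative assumption, and not necessarily the paper's definition of $I_\alpha$, it looks at the simple difference $HC_\alpha(T) - HC_\alpha(T \mid S = s)$. At α = 1 this difference is $-\tfrac{1}{2}\log(1-\rho^2)$ whatever $\sigma_T$ is, whereas for α ≠ 1 it changes with $\sigma_T$.

```python
import math


def hc_normal(sigma, alpha):
    """Havrda-Charvat entropy of N(mu, sigma^2); alpha = 1 gives Shannon's h.

    Uses int phi_sigma(x)^alpha dx = (2 pi sigma^2)^((1 - alpha) / 2) / sqrt(alpha).
    """
    if alpha == 1.0:
        return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)
    integral = (2.0 * math.pi * sigma ** 2) ** ((1.0 - alpha) / 2.0) / math.sqrt(alpha)
    return (1.0 - integral) / (alpha - 1.0)


def entropy_reduction(sigma_t, rho, alpha):
    """HC_alpha(T) - HC_alpha(T | S = s) under Proposition 2; the conditional
    standard deviation sigma_t * sqrt(1 - rho^2) does not depend on s."""
    return hc_normal(sigma_t, alpha) - hc_normal(sigma_t * math.sqrt(1.0 - rho ** 2), alpha)


for alpha in (0.5, 1.0, 2.0):
    values = [entropy_reduction(sigma_t, 0.6, alpha) for sigma_t in (0.5, 1.0, 2.0)]
    print(f"alpha = {alpha}: " + ", ".join(f"{v:.4f}" for v in values))
```

The α = 1 row is constant across $\sigma_T$, while the other rows vary with it, mirroring the qualitative statement of Result 2.1.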

Proposition 3.
Let T and S be two binary outcome variables, coded 1 for a success and 0 for a failure, such that the joint distribution is $(T,S)' \sim \text{Multinomial}(p_{0,0}, p_{1,0}, p_{0,1}, p_{1,1})$. We have the following results:

3.1
When $p_{t,s} = p_{t,+}\,p_{+,s}$, i.e., when T and S are independent, $I_\alpha(T, S) = 0$.
Result 3.1 follows directly from Equation (9), since $p_{t,s} = p_{t,+}\,p_{+,s}$.
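For the binary-binary case, the sketch below covers only the α = 1 (Shannon) member of the family, where the mutual information is $\sum_{t,s} p_{t,s}\log\{p_{t,s}/(p_{t,+}\,p_{+,s})\}$. It illustrates Result 3.1 numerically for a hypothetical independent table and gives a positive value for a hypothetical dependent one; the general-α expression of Equation (9) is not reproduced.

```python
import math


def mutual_information_2x2(p):
    """Shannon mutual information (nats) of a 2x2 table p[t][s] = P(T=t, S=s)."""
    p_t = [p[t][0] + p[t][1] for t in (0, 1)]   # p_{t,+}
    p_s = [p[0][s] + p[1][s] for s in (0, 1)]   # p_{+,s}
    return sum(
        p[t][s] * math.log(p[t][s] / (p_t[t] * p_s[s]))
        for t in (0, 1)
        for s in (0, 1)
        if p[t][s] > 0
    )


# Hypothetical tables; the first satisfies p_{t,s} = p_{t,+} p_{+,s}.
independent = [[0.6 * 0.3, 0.6 * 0.7], [0.4 * 0.3, 0.4 * 0.7]]
dependent = [[0.35, 0.05], [0.10, 0.50]]
print(mutual_information_2x2(independent))  # zero up to rounding error
print(mutual_information_2x2(dependent))    # strictly positive
```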

Proposition 4.
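Let T be a binary outcome variable and S a continuous, normally distributed surrogate variable, where $T \sim B(p_0, p_1)$, i.e., $P(T = 0) = p_0$ and $P(T = 1) = p_1$, and $S \sim N(\mu_S, \sigma_S^2)$. We assume that there is a latent variable U such that T = 1 ⇔ U ≥ 0, i.e., a probit model with $U \sim N(\mu_T, 1)$ and $\mu_T = \Phi^{-1}(p_1)$. Assuming that the correlation coefficient between U and S is ρ, the conditional binary endpoint follows the probit model
$$P(T = 1 \mid S = s) = \Phi\left(\frac{\mu_T + \rho\,(s - \mu_S)/\sigma_S}{\sqrt{1-\rho^2}}\right).$$
We have the following results: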

4.1
The mutual information for H-C entropy is
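The joint distribution function of (T, S) at (t, s) is:
$$f(t,s) = \Phi\left((2t-1)\,\frac{\mu_T + \rho\,(s - \mu_S)/\sigma_S}{\sqrt{1-\rho^2}}\right)\frac{1}{\sigma_S}\,\phi\left(\frac{s - \mu_S}{\sigma_S}\right), \qquad t \in \{0, 1\}.$$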

Author Manuscript
Using $J^{(\alpha)}(a,b) = \int_{-\infty}^{\infty} \Phi^{\alpha}(a + by)\,\phi(y)\,dy$, we can derive an alternative formulation. For α = 1, H-C entropy reduces to Shannon entropy; thus, by taking the limit as α → 1, we can derive Shannon's mutual information for the probit model in 4.4.
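The quantities in Proposition 4 can be evaluated numerically. The sketch below standardizes S (so $z = (S - \mu_S)/\sigma_S$), writes $P(T = 1 \mid S) = \Phi(a + bz)$ with $a = \mu_T/\sqrt{1-\rho^2}$ and $b = \rho/\sqrt{1-\rho^2}$, and approximates $J^{(\alpha)}(a,b)$ with Simpson's rule; because $1 - \Phi(x) = \Phi(-x)$, $E[p(T=0\mid S)^{\alpha}] = J^{(\alpha)}(-a,-b)$. The values of $p_1$ and ρ are hypothetical, and a useful sanity check is $J^{(1)}(a,b) = \Phi(a/\sqrt{1+b^2}) = \Phi(\mu_T) = p_1$.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: Phi = STD.cdf, phi = STD.pdf


def probit_coefficients(p1, rho):
    """(a, b) such that P(T = 1 | S) = Phi(a + b z), z = (S - mu_S) / sigma_S.

    Follows from U | S ~ N(mu_T + rho z, 1 - rho^2) with mu_T = Phi^{-1}(p1).
    """
    mu_t = STD.inv_cdf(p1)
    scale = math.sqrt(1.0 - rho ** 2)
    return mu_t / scale, rho / scale


def j_alpha(a, b, alpha, lo=-10.0, hi=10.0, n=4000):
    """J^(alpha)(a, b) = int Phi(a + b y)^alpha phi(y) dy via Simpson's rule."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        y = lo + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * STD.cdf(a + b * y) ** alpha * STD.pdf(y)
    return total * h / 3.0


p1, rho = 0.6, 0.5  # hypothetical P(T = 1) and corr(U, S)
a, b = probit_coefficients(p1, rho)
for alpha in (0.5, 1.0, 2.0):
    # E[p(T=1|S)^alpha] = J(a, b) and E[p(T=0|S)^alpha] = J(-a, -b).
    print(alpha, j_alpha(a, b, alpha), j_alpha(-a, -b, alpha))
```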

Model for Longitudinal Continuous Surrogate Biomarkers in Phase II Trials
In many phase II trials, the clinical endpoint of interest (T) is often the proportion of a binary outcome or the mean of a continuous variable. For example, in oncology phase II trials, a common clinical endpoint is the total response rate. Surrogate biomarkers, on the other hand, are usually laboratory tests on serum or urine, or imaging measurements, that can be taken repeatedly during the study. In this section, we focus on a binary one-time clinical endpoint T and a continuous repeated surrogate variable S.
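In the remainder of the paper, we use $t_j$ to denote the time of the jth measurement since baseline in a longitudinal trial. For simplicity, consider the model for differences from baseline, $t_0 = 0$. Let $Z_i$ denote the treatment assignment of patient i, and let the general model be
$$S_{ij} \mid Z_i, T_i = \mu_{S_i} + \alpha_1 Z_i + \alpha_2 t_j + \alpha_3 Z_i t_j + \beta_j T_i + \epsilon_{S,ij}, \qquad j = 1, \ldots, K,$$
where $\alpha_1$, $\alpha_2$, and $\alpha_3$ are regression coefficients, not the H-C order α.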
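The longitudinal model can be simulated directly, which is handy for checking downstream surrogacy computations. The sketch below assumes, beyond what the text specifies, a normal random intercept $\mu_{S_i} \sim N(\mu_0, \tau^2)$ and independent normal errors $\epsilon_{S,ij}$; the function name and all parameter values are hypothetical, and only the visit weeks come from the data example.

```python
import random


def simulate_patient(z, t_outcome, times, alphas, betas, mu0, tau, sigma_eps, rng):
    """Change-from-baseline surrogate values (S_i1, ..., S_iK) for one patient.

    Hypothetical random-intercept version of the model in the text:
    mu_{S_i} ~ N(mu0, tau^2), eps_{S,ij} ~ N(0, sigma_eps^2) independently, and
    S_ij = mu_{S_i} + a1 z + a2 t_j + a3 z t_j + beta_j T_i + eps_{S,ij}.
    """
    a1, a2, a3 = alphas
    mu_si = rng.gauss(mu0, tau)  # subject-specific intercept mu_{S_i}
    return [
        mu_si + a1 * z + a2 * t_j + a3 * z * t_j + beta_j * t_outcome
        + rng.gauss(0.0, sigma_eps)
        for t_j, beta_j in zip(times, betas)
    ]


rng = random.Random(2022)
times = [24, 48, 72, 96]               # follow-up weeks, as in the data example
alphas = (0.0, -0.0002, 0.0001)        # hypothetical (alpha_1, alpha_2, alpha_3)
betas = [0.002, 0.003, 0.004, 0.005]   # hypothetical beta_j
for z in (0, 1):
    for t_outcome in (0, 1):
        s = simulate_patient(z, t_outcome, times, alphas, betas, 0.0, 0.002, 0.001, rng)
        print(z, t_outcome, [round(v, 4) for v in s])
```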
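Using a bivariate probit model [34] for the joint distribution of $(Z_i, T_i)'$, we can derive a probit model for the conditional joint distribution of $(Z_i, T_i)' \mid S$, with $S_i = (S_{i1}, \ldots, S_{iK})'$:
$$P(T_i = t, Z_i = z \mid S_i) = \Phi_2\left((2t-1)(\mu_{0,1} + \gamma_1' S_i),\ (2z-1)(\mu_{0,2} + \gamma_2' S_i);\ (2t-1)(2z-1)\rho\right),$$
where $\Phi_2(\cdot,\cdot;\rho)$ is the standard bivariate normal distribution function with correlation ρ, $\mu_{0,k}$ and $\gamma_k$ are the intercept and regression coefficient vector of the probit regression of T on the longitudinal S (k = 1) and of Z on the longitudinal S (k = 2), respectively, and ρ is the correlation coefficient of the two underlying latent normal variables for T and Z.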
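A sketch of the bivariate probit cell probabilities for one patient, assuming the linear predictors $\eta_T = \mu_{0,1} + \gamma_1' S_i$ and $\eta_Z = \mu_{0,2} + \gamma_2' S_i$ have already been computed (the values below are hypothetical). Each cell uses the usual parametrization $\Phi_2(q_t\eta_T, q_z\eta_Z; q_t q_z\rho)$ with $q = 2\cdot\text{value} - 1$. The bivariate normal distribution function is evaluated by one-dimensional Simpson integration of $\phi(x)\,\Phi((k - \rho x)/\sqrt{1-\rho^2})$, a simple self-contained stand-in for a library routine.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def bvn_cdf(h, k, rho, lo=-10.0, n=2000):
    """Phi_2(h, k; rho) = int_{-inf}^{h} phi(x) Phi((k - rho x) / sqrt(1 - rho^2)) dx,
    approximated with Simpson's rule on [lo, h]; requires |rho| < 1."""
    if h <= lo:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)
    step = (h - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * step
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * STD.pdf(x) * STD.cdf((k - rho * x) / scale)
    return total * step / 3.0


def conditional_joint(eta_t, eta_z, rho):
    """P(T = t, Z = z | S) for t, z in {0, 1}, given the linear predictors
    eta_t = mu_{0,1} + gamma_1' S and eta_z = mu_{0,2} + gamma_2' S."""
    return {
        (t, z): bvn_cdf((2 * t - 1) * eta_t, (2 * z - 1) * eta_z,
                        (2 * t - 1) * (2 * z - 1) * rho)
        for t in (0, 1)
        for z in (0, 1)
    }


probs = conditional_joint(eta_t=0.4, eta_z=-0.1, rho=0.3)  # hypothetical values
print({cell: round(p, 4) for cell, p in probs.items()}, sum(probs.values()))
```

The four cell probabilities sum to one, and summing over z recovers the univariate probit $P(T = 1 \mid S) = \Phi(\eta_T)$.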
Because a linear combination of multivariate normal variables is still normally distributed, we can use Proposition 4 to calculate the ITMA under H-C entropy power and evaluate the surrogacy of the longitudinal biomarker in each arm, conditioning on Z. We can also average over the treatment arms to obtain the mean trial-level ITMA under H-C entropy. Furthermore, as suggested by [24], we can use the mutual information $I_\alpha(T, Z \mid S)$ to verify Prentice's criterion: conditional on the surrogate S, the clinical endpoint T and the treatment assignment Z should be independent, so a good surrogate should lead to $I_\alpha(T, Z \mid S) \approx 0$.
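Prentice's criterion can be checked pointwise in S. The sketch below computes the α = 1 (Shannon) conditional mutual information of T and Z at a single value of S under a hypothetical bivariate probit with linear predictors $\eta_T$ and $\eta_Z$; averaging it over the distribution of S gives $I_1(T, Z \mid S)$. With latent correlation ρ = 0 the value is numerically zero, whereas ρ ≠ 0 leaves residual dependence. Only the α = 1 member is shown.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def bvn_cdf(h, k, rho, lo=-10.0, n=2000):
    """Phi_2(h, k; rho) by Simpson's rule on [lo, h] (|rho| < 1)."""
    if h <= lo:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)
    step = (h - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * step
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * STD.pdf(x) * STD.cdf((k - rho * x) / scale)
    return total * step / 3.0


def conditional_mi(eta_t, eta_z, rho):
    """Shannon (alpha = 1) mutual information of T and Z given S = s, in nats,
    under the bivariate probit with linear predictors eta_t and eta_z at s."""
    mi = 0.0
    for t in (0, 1):
        for z in (0, 1):
            qt, qz = 2 * t - 1, 2 * z - 1
            p_tz = bvn_cdf(qt * eta_t, qz * eta_z, qt * qz * rho)
            p_t, p_z = STD.cdf(qt * eta_t), STD.cdf(qz * eta_z)  # probit marginals
            if p_tz > 0:
                mi += p_tz * math.log(p_tz / (p_t * p_z))
    return mi


# Hypothetical linear predictors at one value of S.
print(conditional_mi(0.3, -0.2, 0.0))  # rho = 0: T and Z independent given S
print(conditional_mi(0.3, -0.2, 0.6))  # residual dependence given S
```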
For real data, we can use a bivariate probit model to estimate $[p(T = t \mid S)]^{\alpha}$, $[p(Z = z \mid S)]^{\alpha}$, and $[p(T = t, Z = z \mid S)]^{\alpha}$, and then use Equation (9) to compute $I_\alpha(T, Z \mid S)$ by numerical integration. One way to perform this analysis is with the R package mvProbit from CRAN (https://cran.r-project.org/web/packages/mvProbit/mvProbit.pdf, accessed on January 29, 2022).
del Carmen Pardo et al. Page 17

A Data Example
"Safety, Tolerability and Activity Study of Ibudilast in Subjects with Progressive Multiple Sclerosis" (NCT01982942) was a US National Institutes of Health (NIH)-sponsored multicenter, randomized, double-blind, placebo-controlled, parallel-group phase II study conducted from November 2013 to December 2017. The main study results were published by Fox et al. [35]. The trial data are publicly available upon request to the NIH. We use these data for a numerical illustration of the H-C ITMA.
More specifically, patients with primary or secondary progressive multiple sclerosis were enrolled in this phase II randomized trial of oral ibudilast (≤100 mg daily) or placebo for 96 weeks. The primary efficacy end point was the rate of brain atrophy, as measured by the brain parenchymal fraction (brain size relative to the volume of the outer surface contour of the brain). Major secondary end points included the change in the pyramidal tracts on diffusion tensor imaging and cortical atrophy, all of which are measures of tissue damage in multiple sclerosis.
We requested and received data from the study team on 104 placebo patients and 99 treated patients, with longitudinal observations of brain parenchymal fraction (BPF) and of thinning of the cortical gray matter (cortical thickness), measured by magnetic resonance imaging at weeks 0, 24, 48, 72, and 96. For illustration purposes, we altered the primary and secondary endpoints of the trial: we created a binary clinical endpoint, cortical thickness (CTH) greater than 3 mm, as a clinical outcome indicating less cortical gray matter atrophy, and used BPF as the continuous longitudinal marker. Table 1 provides a summary of the data used for this illustration.
From Table 1, we can see that 104 patients were randomized to the control arm and 99 patients to the treatment arm. The treatment significantly reduced cortical atrophy: 71% of patients in the treatment arm maintained a cortical thickness (CTH) of more than 3 mm at 96 weeks post baseline, compared with 48% in the placebo arm. While the differences in BPF between treatment arms had p-values above 0.38 at each follow-up MRI, the aggregated change over time, measured by the slopes of a mixed random effects regression model, was highly statistically significant (p = 0.0056).
Evaluating the surrogacy of the longitudinal BPF measurements for the binary CTH endpoint in MS trials is important for understanding the strength of surrogacy and whether BPF can be used to shorten trial duration. More importantly for future trial design, we need to understand how often and when the longitudinal measurements should be performed.
Using the formulas derived in Proposition 4, we computed the mean mutual information and ITMA of longitudinal BPF as a surrogate for the clinical endpoint of maintaining more than 3 mm cortical thickness at the end of 96 weeks. We explore three choices of α (0.5, 1, and 2) to show the difference between H-C and Shannon entropies. The value α = 1 is included because it corresponds to Shannon entropy; the other two values have been considered in other papers such as [32]. The columns of Table 2 are organized by the value of α. Each row of Table 2 represents a design that uses BPF at baseline (week 0) and at different follow-up visits to construct a longitudinal surrogate endpoint. For example, the first row used the baseline and week 24 data, while the last row used the data from baseline and weeks 48 and 72.
From Table 2, we can see that the longitudinal BPF measures at baseline with at least one follow-up visit were all reasonable surrogates for the binary endpoint CTH > 3.0 mm. H-C entropy with α = 0.5 was not sensitive enough to differentiate subtle differences in the surrogacy utility of different designs for collecting surrogate endpoints. When α = 1, H-C entropy is Shannon entropy, and it discriminated among the designs at the third decimal place. H-C entropy with α = 2 was more sensitive and showed differences among all designs. As the table shows, using longitudinal BPF data could shorten the trial duration to 72 weeks. For a trial ending at 72 weeks, measuring BPF at both weeks 24 and 48 added no surrogacy utility beyond a single measure at week 24. Overall, the p-values from the linear mixed random effects model reflected the directions of the ITMAs, but not in complete concordance, perhaps because of random variation in fitting the mixed random effects and probit models. Table 3 confirms the observations in Table 2 that longitudinal BPF is a good surrogate variable for binary CTH > 3.0 mm at 96 weeks. Because Table 3 uses the same model as Table 2, the p-values for the longitudinal models are omitted. Once again, $I_\alpha(T, Z \mid S)$ decreases with α.

Conclusions
Alonso et al. [24] proposed assessing the validity of a surrogate endpoint in terms of uncertainty reduction. The main measures of uncertainty come from information theory, and these authors based their proposal on the well-known Shannon entropy. There has been extensive past work on generalized entropies [30][31][32][36][37][38][39]. To extend that surrogacy measure, we focus on the Havrda-Charvat entropy, which reduces to the Shannon case when its parameter is set to one. Based on this generalized entropy, we consider a generalized mutual information, since some members of this family have been shown to perform better in other contexts [30][31][32]. In this paper, we have completed the theoretical development of these measures. The advantage of our proposal is that it contains a useful surrogacy measure as a particular case while making it easy to explore other measures that may perform better for specific questions. In our example, using α = 2 instead of α = 1 was advantageous for evaluating the scheduling of longitudinal visits.
Some additional issues remain open. First, we are working on a more extensive numerical study to assess the performance of these measures in the endpoint surrogacy context. In this paper, we compared the performance of ITMA in a real trial with three choices of α (0.5, 1, and 2), which were chosen for illustration purposes. The optimal choice of α remains a research question. Additional research could consider other ITMAs, such as divergence measures [36], taking into account that the mutual information equals the Kullback-Leibler divergence between the joint distribution and the product of the marginals, or measures of unilateral dependency such as that defined by Andonie et al. [37] based on informational energy [39], or surrogacy for testing of variances [38].