Definition and Estimation of Covariate Effect Types in the Context of Treatment Effectiveness

Yasutaka Chiba

doi:10.3390/math8101657

Clinical Research Center, Kindai University Hospital, 377-2, Ohno-higashi, Osaka-sayama, Osaka 589–8511, Japan

Mathematics 2020, 8(10), 1657; https://doi.org/10.3390/math8101657

This article belongs to the Special Issue Applied Medical Statistics: Theory, Computation, Applicability

Abstract

In some clinical studies, assessing covariate effect types indicating whether a covariate is predictive and/or prognostic is of interest, in addition to the study endpoint. Recently, for a case with a binary outcome, Chiba (Clinical Trials, 2019; 16: 237–245) proposed the new concept of covariate effect type, which is assessed in terms of four response types, and showed that standard subgroup or regression analysis is applicable only in certain cases. Although this concept could be useful for supplementing conventional standard analysis, its application is limited to cases with a binary outcome. In this article, we aim to generalize Chiba’s concept to continuous and time-to-event outcomes. We define covariate effect types based on four response types. It is difficult to estimate the response types from the observed data without making certain assumptions, so we propose a simple method to estimate them under the assumption of independent potential outcomes. Our approach is illustrated using data from a clinical study with a time-to-event outcome.

Keywords: potential outcome; prediction; prognosis; response type; restricted mean survival time

1. Introduction

In certain clinical studies, it is important to assess covariate effect types to indicate whether a covariate is predictive and/or prognostic in the context of treatment effectiveness. Clark et al. [1] defined a predictive factor as a factor associated with the response or lack of response to a particular therapy, while a prognostic factor is a factor associated with the clinical outcome in the absence of therapy, or with the application of a standard therapy. To assess whether a covariate is predictive and/or prognostic, subgroup or regression analysis is often used [2]. For a case with a binary outcome, on the difference scale, a binary covariate $Z$ is predictive if

δ \equiv \{\Pr (Y = 1 | X = 1, Z = 1) - \Pr (Y = 1 | X = 0, Z = 1)\} - \{\Pr (Y = 1 | X = 1, Z = 0) - \Pr (Y = 1 | X = 0, Z = 0)\}

(1)

is not zero, where $X$ is the treatment arm ($X = 1$ for the experimental treatment and $X = 0$ for the control treatment) and $Y$ is the outcome ($Y = 1$ for response and $Y = 0$ for no response). If $δ$ is close to zero, $Z$ is prognostic if

γ \equiv \Pr (Y = 1 | X = 0, Z = 1) - \Pr (Y = 1 | X = 0, Z = 0)

(2)

is not zero.

Recently, for cases with a binary outcome, Chiba [3] proposed assessing covariate effect types based on four response types [4] rather than on $δ$ and $γ$ as above. The response types are defined in terms of $Y (x)$, the potential outcome of $Y$ if $X$ is set to $x$. The four response types are defined as follows:

Activated subjects, who would show a response regardless of the treatment they received; that is, $\{Y (1), Y (0)\} = (1, 1)$;
Causative subjects, who would show a response only if they received the experimental treatment; that is, $\{Y (1), Y (0)\} = (1, 0)$;
Preventive subjects, who would show a response only if they received the control treatment; that is, $\{Y (1), Y (0)\} = (0, 1)$;
Inert subjects, who would not show a response regardless of the treatment they received; that is, $\{Y (1), Y (0)\} = (0, 0)$.

Based on the four response types, measures of the covariate effect types are defined as follows [3]:

η_{k l} \equiv \Pr \{Y (1) = k, Y (0) = l | Z = 1\} - \Pr \{Y (1) = k, Y (0) = l | Z = 0\}

(3)

for $k, l = 0, 1$. Then, $η_{10}$ can be used instead of the current common measure for predicting the effectiveness of the experimental treatment, i.e., $δ$ in (1). $η_{10} > 0$ indicates that the proportion of subjects who would show a response only if they received the experimental treatment is higher in the subgroup where $Z = 1$ than in the subgroup where $Z = 0$. In addition, we can assess predictability for the effectiveness of the control treatment (or the harm caused by the experimental treatment) by $η_{01}$. Instead of the current common measure for a prognostic factor, i.e., $γ$ in (2), $η_{11}$ can be used. $η_{11} > 0$ indicates that the proportion of subjects who would show a response regardless of the treatment they received is higher in the subgroup where $Z = 1$ than in the subgroup where $Z = 0$.

Applying the potential outcome $Y (x)$ to (1) under the assumption of $Y (x) ⊥ X | Z$, which indicates that $Y (x)$ is independent of $X$ given $Z$, the following relationship between (1) and (3) is obtained:

\begin{aligned} δ &= [\Pr \{Y (1) = 1 | Z = 1\} - \Pr \{Y (0) = 1 | Z = 1\}] - [\Pr \{Y (1) = 1 | Z = 0\} - \Pr \{Y (0) = 1 | Z = 0\}] \\ &= η_{10} - η_{01} . \end{aligned}

(4)

Similarly, the following relationship holds between (2) and (3):

\begin{aligned} γ &= \Pr \{Y (0) = 1 | Z = 1\} - \Pr \{Y (0) = 1 | Z = 0\} \\ &= η_{11} + η_{01} . \end{aligned}

(5)

Obviously, if $η_{01} \neq 0$, the currently used definition of covariate effect types does not correspond to the new definition in (3). Except for the special case of $η_{01} = 0$, the current definition cannot separate the predictive and prognostic characteristics of a covariate from each other. In actual clinical studies, the aim is to assess covariate effect type on the basis of (3) rather than (1) and (2). However, (3) can be applied only when the outcome is binary, not when it is continuous or time-to-event. In this article, we aim to generalize (3) to continuous and time-to-event outcomes. We provide a definition of measures of covariate effect types in Section 2 and propose a simple method to estimate those measures in Section 3. In Section 4, our approach is illustrated using data from a clinical study with a time-to-event outcome. Finally, in Section 5, the performance of the proposed approach and directions for future work in this area are discussed.
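Relationships (4) and (5) can be checked numerically. The following is a minimal sketch in Python; the response-type proportions in each subgroup are hypothetical values chosen for illustration, not data from any study.

```python
# Hypothetical response-type proportions {(k, l): Pr{Y(1)=k, Y(0)=l | Z=z}};
# these numbers are illustrative assumptions only.
p_z1 = {(1, 1): 0.30, (1, 0): 0.35, (0, 1): 0.10, (0, 0): 0.25}
p_z0 = {(1, 1): 0.25, (1, 0): 0.20, (0, 1): 0.05, (0, 0): 0.50}


def marginal(p, x):
    """Pr{Y(x) = 1 | Z}: sum over response types whose x-th potential outcome is 1."""
    return sum(v for (k, l), v in p.items() if (k if x == 1 else l) == 1)


# eta_kl in (3): difference of response-type proportions between subgroups.
eta = {kl: p_z1[kl] - p_z0[kl] for kl in p_z1}

# delta in (1) and gamma in (2), written with potential outcomes as in (4) and (5).
delta = (marginal(p_z1, 1) - marginal(p_z1, 0)) - (marginal(p_z0, 1) - marginal(p_z0, 0))
gamma = marginal(p_z1, 0) - marginal(p_z0, 0)

print(delta, eta[(1, 0)] - eta[(0, 1)])   # both sides of (4)
print(gamma, eta[(1, 1)] + eta[(0, 1)])   # both sides of (5)
```

In this example $η_{01} = 0.05 \neq 0$, so $δ$ understates $η_{10}$ and $γ$ overstates $η_{11}$ by that amount.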

2. Definition of Measures of Covariate Effect Types

In the following, we mainly discuss a case with a time-to-event outcome $T$. However, this discussion can also be applied to a case with a continuous outcome as a special case without censored data in which a larger value represents a better response. $X$ and $Z$ are the same as in Section 1; i.e., $X$ is a binary treatment arm and $Z$ is a binary covariate. $T (x)$ is the potential outcome of $T$ if $X$ had been set to $x$. Unfortunately, it is not possible to observe the values of both $T (1)$ and $T (0)$; i.e., $T = T (1) X + T (0) (1 - X)$ [5]. We use the stable unit treatment value assumption, under which a single version of each treatment is available and there is no interference among subjects [6,7].

When an outcome is time-to-event, the response type $\{T (1), T (0)\}$ cannot be classified into four types, unlike in the case of a binary outcome described in Section 1. However, if we set a cut-off point $t_{c}$, the time-to-event outcome can be regarded as a binary variable; then, we can consider the following four response types.

Definition 1 (Response type). Using a cut-off point $t_{c}$, the response type can be classified into the following four types:

Activated subjects (type 11 subjects), whose outcomes would be larger than or equal to $t_{c}$, regardless of the treatment they received; that is, $\{T (1) \geq t_{c}, T (0) \geq t_{c}\}$;
Causative subjects (type 10 subjects), whose outcomes would be larger than or equal to $t_{c}$ if they received the experimental treatment, but would be smaller than $t_{c}$ if they received the control treatment; that is, $\{T (1) \geq t_{c}, T (0) < t_{c}\}$;
Preventive subjects (type 01 subjects), whose outcomes would be smaller than $t_{c}$ if they received the experimental treatment, but would be larger than or equal to $t_{c}$ if they received the control treatment; that is, $\{T (1) < t_{c}, T (0) \geq t_{c}\}$;
Inert subjects (type 00 subjects), whose outcomes would be smaller than $t_{c}$, regardless of the treatment they received; that is, $\{T (1) < t_{c}, T (0) < t_{c}\}$.

All subjects belong to one of these four types, although we cannot know which type a subject belongs to. We denote the proportion of type $k l$ subjects among all subjects as $P_{k l} (t_{c})$; i.e.,

P_{k l} (t_{c}) \equiv [\Pr \{T (1) \geq t_{c}, T (0) \geq t_{c}\}]^{k l} [\Pr \{T (1) \geq t_{c}, T (0) < t_{c}\}]^{k (1 - l)} \times [\Pr \{T (1) < t_{c}, T (0) \geq t_{c}\}]^{(1 - k) l} [\Pr \{T (1) < t_{c}, T (0) < t_{c}\}]^{(1 - k) (1 - l)},

(6)

and the proportion in the stratum with $Z = z$ as $P_{k l, z} (t_{c})$. Then, $P_{11} (t) + P_{10} (t) = \Pr \{T (1) \geq t\}$ corresponds to the survival probability under the experimental condition and $P_{11} (t) + P_{01} (t) = \Pr \{T (0) \geq t\}$ is that under the control condition. For a continuous outcome, they correspond to one minus the cumulative distribution function.
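Definition 1 and (6) can be illustrated with a small Python sketch. It assumes, purely for illustration, that both potential outcomes of each subject are known; in real data only one of them is observed, which is why the response types cannot be read off directly. The pairs below are hypothetical.

```python
def response_type(t1, t0, tc):
    """Return (k, l) with k = 1 if T(1) >= tc and l = 1 if T(0) >= tc."""
    return (int(t1 >= tc), int(t0 >= tc))


def type_proportions(pairs, tc):
    """Proportion of each response type at cut-off tc, as in (6)."""
    counts = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for t1, t0 in pairs:
        counts[response_type(t1, t0, tc)] += 1
    return {kl: c / len(pairs) for kl, c in counts.items()}


# Hypothetical (T(1), T(0)) pairs; never jointly observable in practice.
pairs = [(12.0, 8.0), (5.0, 9.0), (3.0, 2.0), (15.0, 14.0), (7.0, 4.0)]
p = type_proportions(pairs, tc=6.0)

# P_11 + P_10 is the proportion with T(1) >= tc, i.e. survival under treatment.
surv_treated = sum(t1 >= 6.0 for t1, _ in pairs) / len(pairs)
print(p, surv_treated)
```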

Using $P_{k l, z} (t_{c})$, we can simply define measures of covariate effect types at the cut-off point $t_{c}$ by

θ_{k l} (t_{c}) \equiv P_{k l, 1} (t_{c}) - P_{k l, 0} (t_{c}) .

(7)

This is essentially the same as $η_{k l}$ in (3). Unfortunately, $θ_{k l} (t_{c})$ with a fixed $t_{c}$ cannot serve as a general definition of measures of covariate effect types. If there is a medically meaningful cut-off point, the outcome is effectively binary rather than time-to-event (or continuous). If the cut-off point $t_{c}$ has no medical significance, there is little value in fixing it. For the same reason, $P_{k l} (t_{c})$ and $P_{k l, z} (t_{c})$ with a fixed cut-off point $t_{c}$ may not be useful. Instead, we let $t_{c}$ vary over an interval and define the restricted mean probability (RMP) as follows.

Definition 2 (Restricted mean probability). The restricted mean probability for response type $k l$ on the interval $[t_{a}, t_{b}]$, ${\bar{P}}_{k l}$, is defined by the following formula:

{\bar{P}}_{k l} \equiv \frac{1}{t_{b} - t_{a}} \int_{t_{a}}^{t_{b}} P_{k l} (t_{c}) d t_{c}

for $k, l = 0, 1$.

In this definition, $t_{a}$ and $t_{b}$ will usually be set to $t_{a} = 0$ and $t_{b} = t_{τ}$, respectively, where $t_{τ}$ is the truncation time; for a continuous outcome, these values will be set to the minimum and maximum values of the observed outcome, respectively. ${\bar{P}}_{k l, z}$ in the stratum with $Z = z$ is defined in the same manner. ${\bar{P}}_{k l}$ is the restricted expectation of $P_{k l} (t_{c})$ on the interval $[t_{a}, t_{b}]$. Then, ${\bar{P}}_{k l}$ is interpreted as the mean proportion of type $k l$ subjects when the cut-off point is varied from $t_{a}$ to $t_{b}$.

It is important to note that ${\bar{P}}_{k l}$ is related to $E [\min \{\max \{T (x), t_{a}\}, t_{b}\}]$, which is the restricted expectation of the potential outcome $T (x)$ on the interval $[t_{a}, t_{b}]$.

Lemma 1. We have the following relationships between ${\bar{P}}_{k l}$ and $E [\min \{\max \{T (x), t_{a}\}, t_{b}\}]$:

E [\min \{\max \{T (1), t_{a}\}, t_{b}\}] = (t_{b} - t_{a}) ({\bar{P}}_{11} + {\bar{P}}_{10}) + t_{a},

E [\min \{\max \{T (0), t_{a}\}, t_{b}\}] = (t_{b} - t_{a}) ({\bar{P}}_{11} + {\bar{P}}_{01}) + t_{a} .

The proof is given in Appendix A. The difference between these two equations indicates that the restricted average causal effect on the interval $[t_{a}, t_{b}]$ can be expressed as $(t_{b} - t_{a}) ({\bar{P}}_{10} - {\bar{P}}_{01})$. Notably, when $t_{a} = 0$ and $t_{b} = t_{τ}$, $t_{τ} ({\bar{P}}_{11} + {\bar{P}}_{10})$ and $t_{τ} ({\bar{P}}_{11} + {\bar{P}}_{01})$ correspond to the restricted mean survival times (RMSTs) [8,9,10], which are equal to the areas under the survival curves on the interval $[0, t_{τ}]$ under the experimental and control conditions, respectively. Similarly, $t_{τ} ({\bar{P}}_{01} + {\bar{P}}_{00})$ and $t_{τ} ({\bar{P}}_{10} + {\bar{P}}_{00})$ correspond to the restricted mean time lost [9] under the experimental and control conditions, respectively.
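Lemma 1 can be verified on a finite population. The sketch below is an illustration under the assumption that both potential outcomes are known for a handful of hypothetical subjects, so that $P_{k l} (t_{c})$ is a step function that can be integrated exactly; the helper names `clip` and `rmp` are ours.

```python
def clip(t, ta, tb):
    """min{max{t, ta}, tb}: restrict t to the interval [ta, tb]."""
    return min(max(t, ta), tb)


def rmp(pairs, ta, tb):
    """Restricted mean probability of each response type on [ta, tb]."""
    # P_kl(t) can only jump at (clipped) outcome values, so integrate piecewise.
    cuts = sorted({ta, tb} | {clip(t, ta, tb) for pair in pairs for t in pair})
    totals = {(1, 1): 0.0, (1, 0): 0.0, (0, 1): 0.0, (0, 0): 0.0}
    for left, right in zip(cuts, cuts[1:]):
        for t1, t0 in pairs:
            # On (left, right], T >= t holds exactly when T >= right.
            totals[(int(t1 >= right), int(t0 >= right))] += right - left
    return {kl: s / (len(pairs) * (tb - ta)) for kl, s in totals.items()}


# Hypothetical (T(1), T(0)) pairs, assumed jointly known for illustration only.
pairs = [(12.0, 8.0), (5.0, 9.0), (3.0, 2.0), (15.0, 14.0), (7.0, 4.0)]
ta, tb = 0.0, 10.0
r = rmp(pairs, ta, tb)

lhs1 = sum(clip(t1, ta, tb) for t1, _ in pairs) / len(pairs)  # E[min{max{T(1), ta}, tb}]
rhs1 = (tb - ta) * (r[(1, 1)] + r[(1, 0)]) + ta
lhs0 = sum(clip(t0, ta, tb) for _, t0 in pairs) / len(pairs)  # E[min{max{T(0), ta}, tb}]
rhs0 = (tb - ta) * (r[(1, 1)] + r[(0, 1)]) + ta
print(lhs1, rhs1, lhs0, rhs0)
```

The difference `rhs1 - rhs0` equals $(t_{b} - t_{a}) ({\bar{P}}_{10} - {\bar{P}}_{01})$, the restricted average causal effect described above.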

We give a general definition of measures of covariate effect types using ${\bar{P}}_{k l, z}$, which is the RMP of a given response type in the stratum with $Z = z$.

Proposition 1. We define a measure of covariate effect type as

{\bar{θ}}_{k l} \equiv {\bar{P}}_{k l, 1} - {\bar{P}}_{k l, 0}

for $k, l = 0, 1$.

The interpretation of ${\bar{θ}}_{k l}$ is similar to that of $η_{k l}$ in (3) for a case with a binary outcome. As prediction measures, we use ${\bar{θ}}_{10}$ and ${\bar{θ}}_{01}$. ${\bar{θ}}_{10} > 0$ implies that the subgroup where $Z = 1$ contains a higher proportion of subjects who would survive longer by receiving the experimental treatment than the subgroup where $Z = 0$. Then, $Z$ predicts the effectiveness of the experimental treatment, and we say that $Z$ is “augmented-causative”. If ${\bar{θ}}_{10} < 0$, we say that $Z$ is “depleted-causative”. In a similar sense, ${\bar{θ}}_{01}$ is a prediction measure of the effectiveness of the control treatment (or of the harm associated with proceeding with the experimental treatment).

We note that for a case with a time-to-event (continuous) outcome, we can obtain results similar to (4) and (5) for a binary outcome. Using Proposition 1 and Lemma 1, the prediction and prognosis measures based on conventional standard analysis, $δ^{'}$ and $γ^{'}$, can be expressed as functions of ${\bar{θ}}_{k l}$ on the difference scale.

Corollary 1. We have the following equations:

δ^{'} \equiv (E_{1, 1} - E_{0, 1}) - (E_{1, 0} - E_{0, 0}) = (t_{b} - t_{a}) ({\bar{θ}}_{10} - {\bar{θ}}_{01}),

γ^{'} \equiv E_{0, 1} - E_{0, 0} = (t_{b} - t_{a}) ({\bar{θ}}_{11} + {\bar{θ}}_{01}),

where $E_{x, z} \equiv E [\min \{\max \{T (x), t_{a}\}, t_{b}\} | Z = z]$.
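A short derivation sketch may help here. It assumes that Lemma 1 is applied separately within each stratum $Z = z$, with ${\bar{P}}_{k l, z}$ in place of ${\bar{P}}_{k l}$; this stratum-wise reading is our interpretation of the text rather than a proof given in the article.

```latex
\begin{aligned}
E_{1,z} - E_{0,z}
  &= (t_b - t_a)(\bar{P}_{11,z} + \bar{P}_{10,z}) - (t_b - t_a)(\bar{P}_{11,z} + \bar{P}_{01,z}) \\
  &= (t_b - t_a)(\bar{P}_{10,z} - \bar{P}_{01,z}), \\[4pt]
\delta'
  &= (E_{1,1} - E_{0,1}) - (E_{1,0} - E_{0,0}) \\
  &= (t_b - t_a)\{(\bar{P}_{10,1} - \bar{P}_{10,0}) - (\bar{P}_{01,1} - \bar{P}_{01,0})\}
   = (t_b - t_a)(\bar{\theta}_{10} - \bar{\theta}_{01}), \\[4pt]
\gamma'
  &= E_{0,1} - E_{0,0} \\
  &= (t_b - t_a)\{(\bar{P}_{11,1} - \bar{P}_{11,0}) + (\bar{P}_{01,1} - \bar{P}_{01,0})\}
   = (t_b - t_a)(\bar{\theta}_{11} + \bar{\theta}_{01}).
\end{aligned}
```

The constant $t_{a}$ cancels in every difference, which is why it does not appear in either expression.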

On the basis of this corollary, ${\bar{θ}}_{11}$ in Proposition 1 can be related to the prognosis under the current standard analysis. Similar to the currently used standard analysis, if ${\bar{θ}}_{10}$ is close to ${\bar{θ}}_{01}$, which implies that $δ^{'}$ is close to zero, $Z$ is prognostic if ${\bar{θ}}_{11}$ is not zero. ${\bar{θ}}_{11} > 0$ implies that the mean survival time is longer for subjects in the subgroup where $Z = 1$ than in the subgroup where $Z = 0$, regardless of the treatment they received. In this case, we say that $Z$ is “augmented-activated”. In contrast, when ${\bar{θ}}_{11} < 0$, we say that $Z$ is “depleted-activated”.

Finally, we emphasize that Corollary 1 indicates that the results of the current standard analysis can be properly interpreted when ${\bar{θ}}_{01} = 0$; however, this may not be the case when ${\bar{θ}}_{01} \neq 0$. On the other hand, our measure of covariate effect type can assess whether a covariate is predictive and/or prognostic, even when ${\bar{θ}}_{01} \neq 0$.

3. Estimation of Measures of Covariate Effect Types

Unfortunately, ${\bar{θ}}_{k l}$ in Proposition 1 cannot be identified from the observed data without making certain assumptions, because the joint probability of $T (1)$ and $T (0)$ cannot be estimated [11,12]. Therefore, we use two assumptions that are often applied in the context of the pairwise comparison based on $\Pr \{T (x) > T (1 - x)\}$ [13,14,15].

Assumption 1 (Ignorable treatment assignment). The potential outcome $T (x)$ is independent of the treatment arm $X$; i.e., $T (x) ⊥ X$. It is also assumed that $T (x) ⊥ X | Z$.

Assumption 2 (Independent potential outcome). Two potential outcomes are independent of each other; i.e., $T (1) ⊥ T (0)$.

Assumption 1 is commonly made in randomized trials, where it is justified by the randomization. Although some authors have discussed methods to infer $\Pr \{T (x) > T (1 - x)\}$ without Assumption 2 [16,17], these methods tend to be complex and/or require considerable computational effort, which limits their practical use.

Corollary 2. $P_{k l} (t_{c})$ in (6) is identified under Assumptions 1 and 2 as follows:

P_{k l} (t_{c}) = [\Pr \{T \geq t_{c} | X = 1\}]^{k} [\Pr \{T < t_{c} | X = 1\}]^{1 - k} \times [\Pr \{T \geq t_{c} | X = 0\}]^{l} [\Pr \{T < t_{c} | X = 0\}]^{1 - l} .

$P_{k l, z} (t_{c})$ in the stratum with $Z = z$ is also identified under Assumptions 1 and 2, thus allowing ${\bar{P}}_{k l, z}$ in Definition 2 and ${\bar{θ}}_{k l}$ in Proposition 1 to be identified.
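The identification formula in Corollary 2 is a product of marginal probabilities. A minimal Python sketch follows; the function name `response_type_probs` and the input values are hypothetical, and the product form relies on Assumption 2.

```python
def response_type_probs(s1, s0):
    """P_kl(t_c) from s1 = Pr{T >= t_c | X = 1} and s0 = Pr{T >= t_c | X = 0}.

    Valid only under independent potential outcomes (Assumption 2).
    """
    return {
        (k, l): (s1 if k else 1 - s1) * (s0 if l else 1 - s0)
        for k in (1, 0)
        for l in (1, 0)
    }


# Hypothetical survival probabilities at some cut-off t_c.
p = response_type_probs(s1=0.6, s0=0.4)
print(p)  # the four proportions sum to 1; p[(1,1)] + p[(1,0)] recovers s1
```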

We propose a method to estimate ${\bar{P}}_{k l}$ in Definition 2 based on the formula in Corollary 2. Let us denote the observed event occurrence or censoring times as $t_{1} < \dots < t_{j} < \dots < t_{D}$, in which the time points in both arms are included. Because the estimated survival curves change only at these time points, it is sufficient to consider $t_{1}, \dots, t_{D}$ as the cut-off points; that is, the cut-off point $t_{c}$ can be treated as a discrete value taking only the values $t_{j}$. We also set the interval $[t_{a}, t_{b}]$ to $[0, t_{τ}]$, where $t_{τ} (\leq t_{D})$ is the truncation time. These settings yield the following formula to estimate ${\bar{P}}_{k l}$.

Proposition 2. For a time-to-event outcome, we estimate $P_{k l} (t_{c})$ with truncation time $t_{τ}$ under Assumptions 1 and 2 as follows:

{\hat{P}}_{k l} (t_{c}) = \{{\hat{S}}_{1} (t_{c - 1})\}^{k} \{1 - {\hat{S}}_{1} (t_{c - 1})\}^{1 - k} \{{\hat{S}}_{0} (t_{c - 1})\}^{l} \{1 - {\hat{S}}_{0} (t_{c - 1})\}^{1 - l},

where $t_{0} = 0$ and ${\hat{S}}_{x} (t_{c - 1})$ is the Kaplan–Meier estimate of the survival probability $S_{x} (t_{c - 1}) \equiv \Pr \{T \geq t_{c} | X = x\}$. Then, we estimate ${\bar{P}}_{k l}$ by

{\hat{\bar{P}}}_{k l} = \frac{1}{t_{τ}} \sum_{c = 1}^{τ} (t_{c} - t_{c - 1}) {\hat{P}}_{k l} (t_{c}) .

Censoring is accounted for when estimating $S_{x} (t_{c - 1})$ by the Kaplan–Meier method. In a similar manner, $P_{k l, z} (t_{c})$ and ${\bar{P}}_{k l, z}$ are estimated by applying Proposition 2 to the stratum with $Z = z$. Thus, $θ_{k l} (t_{c})$ in (7) and ${\bar{θ}}_{k l}$ in Proposition 1 can be estimated based on the observed data. It follows from Proposition 2 that $t_{τ} ({\hat{\bar{P}}}_{11} + {\hat{\bar{P}}}_{10})$ and $t_{τ} ({\hat{\bar{P}}}_{11} + {\hat{\bar{P}}}_{01})$ agree with the estimators of the RMST in the experimental ($X = 1$) and control ($X = 0$) arms, respectively.
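The estimator in Proposition 2 can be sketched in plain Python. This is an illustrative implementation under Assumptions 1 and 2, not the author's code: `km_survival` is a hand-written Kaplan–Meier estimator, `rmp_estimate` is a hypothetical helper name, and the toy data are invented.

```python
def km_survival(times, events):
    """Return the Kaplan-Meier step function t -> S(t) (right-continuous)."""
    data = sorted(zip(times, events))
    steps, s, at_risk, i = [], 1.0, len(data), 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]      # event indicator: 1 = event, 0 = censored
            removed += 1
            i += 1
        if deaths:
            s *= 1 - deaths / at_risk
            steps.append((t, s))
        at_risk -= removed

    def surv(t):
        value = 1.0
        for u, su in steps:
            if u <= t:
                value = su
        return value

    return surv


def rmp_estimate(arm1, arm0, t_tau):
    """Estimate the four RMPs on [0, t_tau]; each arm is (times, events)."""
    s1, s0 = km_survival(*arm1), km_survival(*arm0)
    pooled = list(arm1[0]) + list(arm0[0])
    grid = sorted({0.0, t_tau} | {t for t in pooled if 0 < t < t_tau})
    totals = {(k, l): 0.0 for k in (1, 0) for l in (1, 0)}
    for prev, cur in zip(grid, grid[1:]):
        a, b = s1(prev), s0(prev)     # lagged values estimate Pr{T >= cur | X = x}
        for k, l in totals:
            totals[(k, l)] += (cur - prev) * (a if k else 1 - a) * (b if l else 1 - b)
    return {kl: v / t_tau for kl, v in totals.items()}


# Hypothetical toy data (not from the article): (times, event indicators).
arm1 = ([2, 4, 6], [1, 0, 1])
arm0 = ([1, 3, 5], [1, 1, 1])
rmp = rmp_estimate(arm1, arm0, t_tau=6)
rmst1 = 6 * (rmp[(1, 1)] + rmp[(1, 0)])   # area under the arm-1 Kaplan-Meier curve
rmst0 = 6 * (rmp[(1, 1)] + rmp[(0, 1)])   # area under the arm-0 Kaplan-Meier curve
print(rmp, rmst1, rmst0)
```

Applying `rmp_estimate` separately to the strata $Z = 1$ and $Z = 0$ and differencing the results would give estimates of ${\bar{θ}}_{k l}$.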

To derive the formulas to estimate $P_{k l} (t_{c})$ and ${\bar{P}}_{k l}$ for a continuous outcome without censoring, let us suppose that $t_{1} < \dots < t_{j} < \dots < t_{D}$ are the observed values of the continuous outcome. The interval $[t_{a}, t_{b}]$ is set to $[t_{1}, t_{D}]$. Here, we denote the number of subjects taking the value $t_{j}$ in the experimental arm as $n_{j}$ and that in the control arm as $m_{j}$. Then, $P_{k l} (t_{j})$ and ${\bar{P}}_{k l}$ are estimated by the following formulas.

Proposition 3. For a continuous outcome, we estimate $P_{k l} (t_{c})$ and ${\bar{P}}_{k l}$ under Assumptions 1 and 2 by

{\hat{P}}_{k l} (t_{c}) = {(\frac{\sum_{j \geq c} n_{j}}{n})}^{k} {(\frac{\sum_{j \leq c - 1} n_{j}}{n})}^{1 - k} {(\frac{\sum_{j \geq c} m_{j}}{m})}^{l} {(\frac{\sum_{j \leq c - 1} m_{j}}{m})}^{1 - l},

{\hat{\bar{P}}}_{k l} = \frac{1}{t_{D} - t_{1}} \sum_{c = 2}^{D} (t_{c} - t_{c - 1}) {\hat{P}}_{k l} (t_{c}),

where $n = \sum_{j = 1}^{D} n_{j}$ and $m = \sum_{j = 1}^{D} m_{j}$.

Let us denote the outcome values as $t_{j_{1}}^{1}$ ($j_{1} = 1, \dots, D_{1}$) for subjects in the experimental arm and $t_{j_{0}}^{0}$ ($j_{0} = 1, \dots, D_{0}$) for those in the control arm. Then, in relation to Lemma 1, we have Lemma 2 below.

Lemma 2. When ${\hat{P}}_{k l} (t_{c})$ and ${\hat{\bar{P}}}_{k l}$ in Proposition 3 are used, the following equations hold:

(t_{D} - t_{1}) ({\hat{\bar{P}}}_{11} + {\hat{\bar{P}}}_{10}) + t_{1} = \frac{1}{n} \sum_{j_{1} = 1}^{D_{1}} n_{j_{1}} t_{j_{1}}^{1},

(t_{D} - t_{1}) ({\hat{\bar{P}}}_{11} + {\hat{\bar{P}}}_{01}) + t_{1} = \frac{1}{m} \sum_{j_{0} = 1}^{D_{0}} m_{j_{0}} t_{j_{0}}^{0} .

The proof is given in Appendix B. In this lemma, the right sides are the arithmetic means of the outcome in the experimental ($X = 1$) and control ($X = 0$) arms, respectively. As the arithmetic mean is a plausible estimate of $E \{T | X = x\}$, the left sides are plausible estimates of the unrestricted, rather than restricted, expectation of $T (x)$, $E \{T (x)\}$, under Assumption 1.
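Proposition 3 and Lemma 2 can be checked together with a short Python sketch. The function name `rmp_continuous` and the outcome values are hypothetical; the product form again relies on Assumptions 1 and 2.

```python
def rmp_continuous(y1, y0):
    """Proposition 3 for uncensored continuous outcomes.

    y1, y0: observed outcomes in the experimental and control arms.
    Returns the four estimated RMPs and the interval end points (t_1, t_D).
    """
    grid = sorted(set(y1) | set(y0))           # t_1 < ... < t_D, both arms pooled
    n, m = len(y1), len(y0)
    totals = {(k, l): 0.0 for k in (1, 0) for l in (1, 0)}
    for prev, cur in zip(grid, grid[1:]):      # c = 2, ..., D
        a = sum(y >= cur for y in y1) / n      # sum_{j >= c} n_j / n
        b = sum(y >= cur for y in y0) / m      # sum_{j >= c} m_j / m
        for k, l in totals:
            totals[(k, l)] += (cur - prev) * (a if k else 1 - a) * (b if l else 1 - b)
    width = grid[-1] - grid[0]
    return {kl: v / width for kl, v in totals.items()}, grid[0], grid[-1]


# Hypothetical outcome values in each arm.
y1 = [3, 5, 5, 9]
y0 = [1, 4, 6]
rmp, t1, tD = rmp_continuous(y1, y0)

# Lemma 2: the left sides reproduce the arithmetic means of each arm.
left1 = (tD - t1) * (rmp[(1, 1)] + rmp[(1, 0)]) + t1
left0 = (tD - t1) * (rmp[(1, 1)] + rmp[(0, 1)]) + t1
print(left1, sum(y1) / len(y1))
print(left0, sum(y0) / len(y0))
```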

4. Illustration

We illustrate our measure of covariate effect type using the clinical data set of Ohashi and Hamada [18]. The purpose of that study was to determine whether the survival time is longer with use of radiation during surgery ($X = 1$) than without it ($X = 0$) in subjects with pancreatic cancer. As it was an observational study, Assumption 1 would not hold. Nevertheless, we analyzed the data under this assumption, so the following analyses are for illustrative purposes only.

We explored the covariate effect type of the pancreatic cancer site in the context of radiation effectiveness. The site was subclassified as pancreatic head ($Z = 1$) or “other” ($Z = 0$). In the pancreatic head subgroup, 51 subjects received radiation during surgery and 14 subjects did not. In the other subgroup, nine subjects (including one censored subject) received radiation and nine did not. The survival curves in both subgroups are shown in Figure 1. Table 1 summarizes the RMST estimates and differences between arms in both subgroups, with 95% confidence intervals (CIs). For the estimation, the truncation time was set to the maximum observed time, 21.6 (months), because the final survival probability was zero in both arms in both subgroups.

Figure 1. Survival curves for the arms with and without radiation during surgery: (a) pancreatic head subgroup; (b) “other” subgroup.

Table 1. Restricted mean survival times (RMSTs) for the arms with and without radiation, for the pancreatic head and “other” subgroups, and the RMST differences between the two arms.

Before applying our approach, we first applied the current standard analysis given by $δ^{'}$ and $γ^{'}$ in Corollary 1. Using the RMST estimates in Table 1, ${\hat{δ}}^{'} = -4.349$ (95% CI: −8.687, −0.011) and ${\hat{γ}}^{'} = -0.106$ (95% CI: −2.635, 2.422). As ${\hat{δ}}^{'} < 0$, it was concluded that the cancer site would be a predictive factor, i.e., if the cancer site is not the pancreatic head, surgery is expected to be more effective with radiation than without it.

Next, we applied the approach proposed in this article. Figure 2 shows ${\hat{P}}_{k l, z} (t_{c})$ in each subgroup when the cut-off point $t_{c}$ was changed from 0 to 21.6 (months). The proportions of activated and inert subjects decreased and increased monotonically over time, respectively, as shown in Figure 2a,d; this is to be expected given Proposition 2. Figure 2b shows that the proportion of causative subjects was lower in the pancreatic head subgroup than in the other subgroup. Figure 2c shows that the proportion of preventive subjects was small, especially in the other subgroup. Figure 3 shows the differences in the estimated proportions between the two subgroups; i.e., ${\hat{θ}}_{k l} (t_{c})$ in (7). In Figure 3, we can clearly see that the cancer site is “depleted-activated” until approximately six months. Then, over the period from 6 to 12 months, the cancer site is “depleted-causative”.

Figure 2. Response type proportions in the pancreatic head and “other” subgroups: (a) activated (type 11) subjects; (b) causative (type 10) subjects; (c) preventive (type 01) subjects; (d) inert (type 00) subjects.

Figure 3. Differences in the proportions of type $k l$ subjects between the pancreatic head and “other” subgroups.

The estimates of ${\bar{P}}_{k l}$ in Definition 2 and ${\bar{θ}}_{k l}$ in Proposition 1 are summarized in Table 2. In the table, ${\hat{\bar{θ}}}_{10} = -0.145$ implies that the mean proportion of causative subjects in the pancreatic head subgroup on the interval [0, 21.6] (12.9%) was 14.5 percentage points lower than that in the other subgroup (27.4%). This indicates that the cancer site is “depleted-causative”. However, for some subjects, the cancer site might be “augmented-preventive” because ${\hat{\bar{θ}}}_{01} = 0.057$, although the absolute value is smaller than that of ${\hat{\bar{θ}}}_{10}$ (−0.145). The cancer site might also be considered “depleted-activated” because ${\hat{\bar{θ}}}_{11} = -0.062$. In the context of prediction and prognosis, as $| {\hat{\bar{θ}}}_{10} | > | {\hat{\bar{θ}}}_{01} | > 0$, it is concluded that the cancer site would be a predictive factor of the effectiveness of radiation, and possibly also of the harm caused by radiation, albeit to a lesser extent.

Table 2. Restricted mean probabilities (RMPs) of the four response types by subgroups of cancer site and the RMP differences between the two subgroups.

The results generated using our approach were similar in direction to those obtained via the current standard analysis; however, the magnitude was different. The difference is attributed to ${\hat{\bar{θ}}}_{01} = 0.057 \neq 0$; that is, based on the data, our approach could not rule out a harmful effect of radiation in some subjects, especially in the pancreatic head subgroup. The value of ${\hat{δ}}^{'}$ estimated by the current standard analysis was −4.349, which is converted to −0.201 ($-4.349 / 21.6$) on the RMP scale; this is approximately 0.057 less than the value of ${\hat{\bar{θ}}}_{10}$ (−0.145) derived by our approach, as seen in Corollary 1. Similarly, ${\hat{γ}}^{'} = -0.106$ is converted to −0.005 ($-0.106 / 21.6$); this is 0.057 larger than the value of ${\hat{\bar{θ}}}_{11}$ (−0.062). As a result, ${\hat{γ}}^{'}$ is closer to zero than ${\hat{\bar{θ}}}_{11}$.
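The conversion between the two scales is simple arithmetic based on Corollary 1 with $t_{a} = 0$ and $t_{b} = 21.6$. The Python sketch below only re-computes the figures reported in the text (Tables 1 and 2 themselves are not reproduced here); small discrepancies in the third decimal place are due to rounding of the reported values.

```python
t_tau = 21.6                                        # truncation time (months), from the text
delta_hat, gamma_hat = -4.349, -0.106               # standard-analysis estimates, from the text
theta = {"10": -0.145, "01": 0.057, "11": -0.062}   # RMP differences, from the text

delta_scaled = delta_hat / t_tau                    # delta' on the RMP scale
gamma_scaled = gamma_hat / t_tau                    # gamma' on the RMP scale

# Corollary 1: delta'/t_tau = theta_10 - theta_01, gamma'/t_tau = theta_11 + theta_01.
print(round(delta_scaled, 3), round(theta["10"] - theta["01"], 3))
print(round(gamma_scaled, 3), round(theta["11"] + theta["01"], 3))
```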

5. Conclusion

In this article, we extended the concept of the covariate effect type proposed by Chiba [3] from the case with a binary outcome to cases with time-to-event and continuous outcomes. This was achieved by using all cut-off points on the interval $[t_{a}, t_{b}]$, where $t_{a}$ and $t_{b}$ are zero and the truncation time, respectively, in a case with a time-to-event outcome, and the minimum and maximum observed values, respectively, in a case with a continuous outcome. The proportion of type $k l$ subjects for each cut-off point is $P_{k l} (t_{c})$ in (6), and the mean of $P_{k l} (t_{c})$ is the RMP, ${\bar{P}}_{k l}$, in Definition 2. As discussed in Section 2 and Section 3, the RMP is related to the arithmetic mean, which corresponds to the RMST for a case with a time-to-event outcome.

In some studies, a covariate of interest may be a continuous variable, rather than a binary variable discussed in this article. Unfortunately, our approach cannot be applied for a continuous covariate. However, a continuous covariate can be partitioned into two subgroups by finding the optimal threshold value to split it using a popular method such as receiver operating characteristic (ROC) curve analysis [19]. Recently, partitioning methods based on a combination of multivariate covariates have also been discussed [20]. These methods will facilitate the use of our approach.

Our measures of covariate effect types can easily be calculated by merging two data sets, one for the experimental arm and one for the control arm, each containing rows of times and survival probabilities. Such data sets are generated automatically by statistical software such as SAS (SAS Institute, Cary, NC, USA) and R (R Foundation for Statistical Computing, Vienna, Austria). For example, to derive ${\hat{P}}_{11} (t_{c}) = {\hat{S}}_{1} (t_{c - 1}) {\hat{S}}_{0} (t_{c - 1})$, we can use a procedure that generates a Kaplan–Meier plot. Then, we obtain respective data sets including $\{t_{j_{1}}^{1}, {\hat{S}}_{1} (t_{c - 1})\}$ and $\{t_{j_{0}}^{0}, {\hat{S}}_{0} (t_{c - 1})\}$. ${\hat{P}}_{11} (t_{c})$ is derived by merging these two data sets. The RMP estimate, ${\hat{\bar{P}}}_{11}$, is calculated by summing the areas of rectangles, which are obtained by applying a lag function. This process to estimate ${\bar{P}}_{k l}$ can be applied to a case with a continuous outcome by setting the minimum value of the outcome to zero, i.e., by using $t_{j} - t_{1}$. It is also easy to extend the method to observational studies by using a weighted survival curve [21] derived based on the propensity score [22].
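The merge-and-lag procedure described above can be sketched in Python. This is one possible reading of that description, not the author's SAS or R code: the function `merge_and_lag`, the table layout, and the numbers are all hypothetical, and each table is assumed to list (time, survival probability) rows as exported from a Kaplan–Meier procedure.

```python
def merge_and_lag(table1, table0, t_tau):
    """Merge two Kaplan-Meier tables on time and compute the RMP of type 11.

    table_x: list of (time, S_x(time)) rows sorted by time, one row per step.
    Returns the merged rows (t_c, t_c - t_{c-1}, P11_hat(t_c)) and the RMP.
    """
    def lookup(table, t):
        # Last observation carried forward: S(t) is the latest value at or before t.
        s = 1.0
        for u, su in table:
            if u <= t:
                s = su
        return s

    times = sorted({0.0, t_tau} | {u for u, _ in table1 + table0 if 0 < u < t_tau})
    rows = []
    for prev, cur in zip(times, times[1:]):
        # Lagged survival values: S_x(t_{c-1}) estimates Pr{T >= t_c | X = x}.
        p11 = lookup(table1, prev) * lookup(table0, prev)
        rows.append((cur, cur - prev, p11))
    p11_bar = sum(width * p for _, width, p in rows) / t_tau   # sum of rectangles
    return rows, p11_bar


# Hypothetical Kaplan-Meier output for each arm.
table1 = [(2, 2 / 3), (6, 0.0)]
table0 = [(1, 2 / 3), (3, 1 / 3), (5, 0.0)]
rows, p11_bar = merge_and_lag(table1, table0, t_tau=6)
print(rows, p11_bar)
```

The other three response types follow by replacing either survival factor with one minus its value, as in Proposition 2.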

A limitation of our approach is the requirement that the assumption of independent potential outcomes (Assumption 2) is used when estimating the measures of covariate effect types. Unfortunately, we cannot verify whether this assumption holds in actual studies based on the observed data. Additional work is necessary to develop a simple method to estimate our measures of covariate effect types without using the assumption of independent potential outcomes.

Current standard analysis is suitable for assessing prediction and prognosis of a covariate when ${\bar{θ}}_{01} = 0$, which implies that the covariate does not predict the effectiveness of the control treatment (or the harm caused by the experimental treatment). However, when ${\bar{θ}}_{01} \neq 0$, the appropriateness of the current approach is questionable, whereas our approach remains applicable. Thus, our approach can supplement the current standard analysis, despite the limitation of requiring the assumption of independent potential outcomes.

Funding

This research was funded by Grant-in-Aid for Scientific Research from Japan Society for the Promotion of Science, grant number 19K11871.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Proof of Lemma 1

Let us denote the probability density function of $T (1)$ at $t_{c}$ as $f (t_{c})$, and the cumulative distribution function as $F (t_{c})$. Then, the left side of the first equation in Lemma 1 can be expressed as follows:

\begin{aligned} E [\min \{\max \{T (1), t_{a}\}, t_{b}\}] &= \int_{- \infty}^{\infty} \min \{\max \{t_{c}, t_{a}\}, t_{b}\} f (t_{c}) d t_{c} \\ &= \int_{- \infty}^{t_{a}} t_{a} f (t_{c}) d t_{c} + \int_{t_{a}}^{t_{b}} t_{c} f (t_{c}) d t_{c} + \int_{t_{b}}^{\infty} t_{b} f (t_{c}) d t_{c} \\ &= t_{a} {[F (t_{c})]}_{- \infty}^{t_{a}} + \int_{t_{a}}^{t_{b}} [\{t_{c} F (t_{c})\}^{'} - F (t_{c})] d t_{c} + t_{b} {[F (t_{c})]}_{t_{b}}^{\infty} \\ &= - \int_{t_{a}}^{t_{b}} [1 - \Pr \{T (1) \geq t_{c}\}] d t_{c} + t_{b} \\ &= \int_{t_{a}}^{t_{b}} \{P_{11} (t_{c}) + P_{10} (t_{c})\} d t_{c} + t_{a} \\ &= (t_{b} - t_{a}) ({\bar{P}}_{11} + {\bar{P}}_{10}) + t_{a} . \end{aligned}

The second equation in Lemma 1 is derived in a similar manner.

Appendix B. Proof of Lemma 2

Using Proposition 3, ${\hat{P}}_{11} (t_{c}) + {\hat{P}}_{10} (t_{c}) = \sum_{j \geq c} n_{j} / n$ holds. This implies that the value of ${\hat{P}}_{11} (t_{c}) + {\hat{P}}_{10} (t_{c})$ does not change when the cut-off point passes an outcome value observed only in the control group. Therefore, for ${\hat{\bar{P}}}_{11} + {\hat{\bar{P}}}_{10}$, it is sufficient to consider only the observed outcome values in the experimental arm. Then, the left side of the first equation in Lemma 2 can be expressed as follows:

\begin{aligned} (t_{D} - t_{1}) ({\hat{\bar{P}}}_{11} + {\hat{\bar{P}}}_{10}) + t_{1} &= \sum_{j_{1} = 2}^{D_{1}} \{(t_{j_{1}}^{1} - t_{j_{1} - 1}^{1}) \frac{\sum_{j \geq j_{1}} n_{j}}{n}\} + \frac{t_{1}^{1} \sum_{j \geq 1} n_{j}}{n} \\ &= \frac{1}{n} \{\sum_{j_{1} = 1}^{D_{1}} (t_{j_{1}}^{1} \sum_{j = j_{1}}^{D_{1}} n_{j}) - \sum_{j_{1} = 2}^{D_{1}} (t_{j_{1} - 1}^{1} \sum_{j = j_{1}}^{D_{1}} n_{j})\} \\ &= \frac{1}{n} \{\sum_{j_{1} = 1}^{D_{1} - 1} (t_{j_{1}}^{1} \sum_{j = j_{1}}^{D_{1}} n_{j}) + n_{D_{1}} t_{D_{1}}^{1} - \sum_{j_{1} = 1}^{D_{1} - 1} (t_{j_{1}}^{1} \sum_{j = j_{1} + 1}^{D_{1}} n_{j})\} \\ &= \frac{1}{n} \sum_{j_{1} = 1}^{D_{1}} n_{j_{1}} t_{j_{1}}^{1} . \end{aligned}

The second equation in Lemma 2 is derived in a similar manner.

References

1. Clark, G.M.; Zborowski, D.M.; Culbertson, J.L.; Whitehead, M.; Savoie, M.; Seymour, L.; Shepherd, F.A. Clinical utility of epidermal growth factor receptor expression for selecting patients with advanced non-small cell lung cancer for treatment with erlotinib. J. Thorac. Oncol. 2006, 1, 837–846.
2. Ballman, K.V. Biomarker: Predictive or prognostic? J. Clin. Oncol. 2015, 33, 3968–3972.
3. Chiba, Y. Inference on covariate effect types for treatment effectiveness in a randomized trial with a binary outcome. Clin. Trials 2019, 16, 237–245.
4. Hernán, M.A.; Robins, J.M. Causal Inference: What If; Chapman & Hall/CRC: Boca Raton, FL, USA, 2020; pp. 58–65.
5. Holland, P.W. Statistics and causal inference. J. Am. Stat. Assoc. 1986, 81, 945–960.
6. Rubin, D.B. Bayesian inference for causal effects: The role of randomization. Ann. Stat. 1978, 6, 34–58.
7. Rubin, D.B. Randomization analysis of experimental data: The Fisher randomization test comment. J. Am. Stat. Assoc. 1980, 75, 591–593.
8. Royston, P.; Parmar, M.K. Restricted mean survival time: An alternative to the hazard ratio for the design and analysis of randomized trials with a time-to-event outcome. BMC Med. Res. Methodol. 2013, 13, 152.
9. Uno, H.; Claggett, B.; Tian, L.; Inoue, E.; Gallo, P.; Miyata, T.; Schrag, D.; Takeuchi, M.; Uyama, Y.; Zhao, L.; et al. Moving beyond the hazard ratio in quantifying the between-group difference in survival analysis. J. Clin. Oncol. 2014, 32, 2380–2385.
10. A’Hern, R.P. Restricted mean survival time: An obligatory end point for time-to-event analysis in cancer trials? J. Clin. Oncol. 2016, 34, 3474–3476.
11. Ding, P.; Dasgupta, T. A potential tale of two-by-two tables from completely randomized experiments. J. Am. Stat. Assoc. 2016, 111, 157–168.
12. Lu, J.; Ding, P.; Dasgupta, T. Treatment effects on ordinal outcomes: Causal estimands and sharp bounds. J. Educ. Behav. Stat. 2018, 43, 511–539.
13. Buyse, M. Generalized pairwise comparisons of prioritized outcomes in the two-sample problem. Stat. Med. 2010, 29, 3245–3257.
14. Péron, J.; Roy, P.; Ozenne, B.; Roche, L.; Buyse, M. The net chance of a longer survival as a patient-oriented measure of treatment benefit in randomized clinical trials. JAMA Oncol. 2016, 2, 901–905.
15. Chiba, Y. Sharp nonparametric bounds and randomization inference for treatment effects on an ordinal outcome. Stat. Med. 2017, 36, 3966–3975.
16. Huang, E.J.; Fang, E.X.; Hanley, D.F.; Rosenblum, M. Inequality in treatment benefits: Can we determine if a new treatment benefits the many or the few? Biostatistics 2017, 18, 308–324.
17. Chiba, Y. Bayesian inference of causal effects for an ordinal outcome in randomized trials. J. Causal Infer. 2018, 6, 20170019.
18. Ohashi, Y.; Hamada, C. Survival Analysis: Biostatistics with SAS; University of Tokyo Press: Tokyo, Japan, 1995; pp. 247–250. (In Japanese)
19. Heagerty, P.J.; Lumley, T.; Pepe, M.S. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 2000, 56, 337–344.
20. Wang, J.; Li, J.; Li, Y.; Wong, W.K. A model-based multithreshold method for subgroup identification. Stat. Med. 2019, 38, 2605–2631.
21. Cole, S.R.; Hernán, M.A. Adjusted survival curves with inverse probability weights. Comput. Methods Programs Biomed. 2004, 75, 45–49.
22. Rosenbaum, P.R.; Rubin, D.B. The central role of the propensity score in observational studies for causal effects. Biometrika 1983, 70, 41–55.

Figure 1. Survival curves for the arms with and without radiation during surgery: (a) pancreatic head subgroup; (b) “other” subgroup.

Figure 2. Response type proportions in the pancreatic head and “other” subgroups: (a) activated (type 11) subjects; (b) causative (type 10) subjects; (c) preventive (type 01) subjects; (d) inert (type 00) subjects.

Figure 3. Differences in the proportions of type $kl$ subjects between the pancreatic head and “other” subgroups.

Table 1. Restricted mean survival times (RMSTs) for the arms with and without radiation, for the pancreatic head and “other” subgroups, and the RMST differences between the two arms.

Subgroup	Arm	RMST	RMST Difference (95% CI ¹)
Pancreatic head	Radiation	5.508	1.236 (−1.127, 3.600)
Pancreatic head	No radiation	4.271	1.236 (−1.127, 3.600)
Other	Radiation	9.963	5.585 (1.947, 9.223)
Other	No radiation	4.378	5.585 (1.947, 9.223)

¹ Confidence intervals (CIs) were derived based on a normal approximation.

Table 2. Restricted mean probabilities (RMPs) of the four response types by subgroups of cancer site and the RMP differences between the two subgroups.

Response Type	Subgroup	RMP ($\bar{P}_{kl}$)	RMP Difference ($\bar{\theta}_{kl}$) (95% CI ¹)
Activated (Type 11)	Pancreatic head	0.126	−0.062 (−0.134, 0.017)
Activated (Type 11)	Other	0.188	−0.062 (−0.134, 0.017)
Causative (Type 10)	Pancreatic head	0.129	−0.145 (−0.267, −0.026)
Causative (Type 10)	Other	0.274	−0.145 (−0.267, −0.026)
Preventive (Type 01)	Pancreatic head	0.072	0.057 (0.010, 0.119)
Preventive (Type 01)	Other	0.015	0.057 (0.010, 0.119)
Inert (Type 00)	Pancreatic head	0.673	0.149 (−0.007, 0.307)
Inert (Type 00)	Other	0.524	0.149 (−0.007, 0.307)

¹ Confidence intervals (CIs) were obtained as percentiles of the bootstrap distribution based on 2000 samples.
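The percentile bootstrap mentioned in the footnote to Table 2 can be sketched generically as follows. This is an illustration under stated assumptions, not the author's implementation: the function name `percentile_bootstrap_ci` is hypothetical, the data are synthetic, and a simple difference in subgroup means stands in for the RMP difference, whose actual estimation from survival data is described in Section 3.

```python
import numpy as np


def percentile_bootstrap_ci(group_a, group_b, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for mean(group_a) - mean(group_b).

    Each subgroup is resampled with replacement, independently of the
    other, and the CI limits are percentiles of the resampled differences.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        a_star = rng.choice(a, size=a.size, replace=True)
        b_star = rng.choice(b, size=b.size, replace=True)
        diffs[i] = a_star.mean() - b_star.mean()
    alpha = 1.0 - level
    lower, upper = np.quantile(diffs, [alpha / 2, 1.0 - alpha / 2])
    return float(lower), float(upper)


# Synthetic data for illustration only (not the study data).
demo_rng = np.random.default_rng(1)
demo_a = demo_rng.normal(loc=0.3, scale=0.1, size=200)
demo_b = demo_rng.normal(loc=0.1, scale=0.1, size=200)
demo_lower, demo_upper = percentile_bootstrap_ci(demo_a, demo_b)
print(demo_lower, demo_upper)
```

Because the limits are read directly from the bootstrap distribution, the interval need not be symmetric around the point estimate, which matches the asymmetric intervals reported in Table 2.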

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
