1. Introduction
In survival analysis, a fundamental objective is to compare the distribution of event times between two or more groups. The standard tool for this purpose is the survival function $S(t) = P(T > t)$, and the most widely used hypothesis test for its comparison is the Log-Rank test; see [1]. This test is fundamentally based on the null hypothesis of strict equality ($H_0\colon S_1 = S_2$) and is optimal under the crucial assumption of proportional hazards (PH).
However, in many clinical and reliability settings, the PH assumption is frequently violated. The most challenging scenario is when the survival curves cross. This phenomenon indicates that the advantage of one treatment over another changes over time (e.g., one treatment is superior initially but inferior in the long term). In this situation, classic tests like the Log-Rank test present a critical limitation:
The test statistic, which accumulates the observed and expected differences in events over time, suffers from a severe cancellation effect; see [2,3]. The positive contributions of one group are nullified by the negative contributions of the other group after the crossing point. This results in a drastic loss of statistical power and a non-significant p-value, often leading to the erroneous conclusion of 'no difference' when a significant, but time-varying, difference actually exists. The test is simply inadequate for diagnosing whether the observed difference is due to consistent dominance or to a crossing effect.
To reach a robust and clinically relevant conclusion, researchers must evaluate stochastic dominance. A survival function $S_1$ dominates another survival function $S_2$ if $S_1(t) \ge S_2(t)$ for all $t$. The crucial question for the investigator is, therefore, whether a consistent dominance exists or whether, on the contrary, the curves cross.
This paper addresses this gap by proposing a new statistical test specifically designed to clearly distinguish the case of dominance in the presence of right-censored data. Our test is based on the supremum of the difference between Kaplan–Meier estimators, focusing on the maximum deviation between the curves. As demonstrated through asymptotic properties and simulation studies, this approach provides superior sensitivity for detecting dominance and allows researchers to directly assess the question of stochastic dominance. Furthermore, the proposed methodology can also be utilized to detect the presence of crossing survival curves. In this way, our test complements the results provided by traditional equality tests by offering a more robust interpretation of survival patterns.
The remainder of the paper is structured as follows. In Section 2, we introduce a new test for assessing dominance between two survival functions in the presence of censored data and examine its asymptotic properties and consistency. In Section 3, we describe the implementation of the new test and apply it to real datasets. In Section 4, we conduct a simulation study to evaluate its performance under different scenarios. Throughout the paper, we assume that the samples are independent. Finally, in Section 5, we discuss advantages and limitations.
2. Test for Comparing Survival Functions: The Case of Independent Samples
As mentioned in the introduction, in this paper we consider right-censored data; that is, in some cases we know the time at which some event occurs (e.g., failure, death), but in others we know that the event has not yet happened by the end of the observation period, but we do not know the exact time at which it will occur.
Let us consider the lung dataset included in the survival package in R, version 4.5.2; see [4]. The lung dataset originates from the North Central Cancer Treatment Group study and includes 228 patients, 90 female and 138 male, with advanced lung cancer. One of the variables of interest is overall survival, measured in days from the initial diagnosis to death or loss to follow-up.
If we want to compare the survival curves for females and males, a first approach is to plot the Kaplan–Meier estimators of the survival functions for both groups.
Formally, let $T_1, \dots, T_n$ be independent and identically distributed death times with common survival function $S_T$, and let $C_1, \dots, C_n$ be independent and identically distributed censoring times with common survival function $S_C$. It is assumed that failure and censoring times are independent. The dataset consists of bivariate random vectors $(Z_i, \delta_i)$, $i = 1, \dots, n$, where $Z_i = T_i \wedge C_i$, where $\wedge$ denotes the minimum, with a common distribution function $F$, and $\delta_i = I(T_i \le C_i)$ indicates whether $Z_i$ corresponds to an observed death ($\delta_i = 1$) or a censored observation ($\delta_i = 0$), with $I$ being the indicator function. One of the main issues in this context is to provide information about $S_T$ from $(Z_i, \delta_i)$, $i = 1, \dots, n$. Denoting by $t_{(1)} < t_{(2)} < \cdots$ the observed death times (non-censored observations), the Kaplan–Meier or product limit estimator of the survival function $S_T$ is
$$\widehat{S}_T(t) = \prod_{t_{(i)} \le t} \left(1 - \frac{d_i}{n_i}\right),$$
where $d_i$ is the number of deaths at time $t_{(i)}$ and $n_i$ is the number of survivors (individuals at risk) just before time $t_{(i)}$.
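The paper's computations are carried out in R with the survival package; purely as a language-neutral illustration of the product-limit formula just given, the estimator can be sketched in a few lines of Python (the function name and toy data below are ours, not the paper's):

```python
# Minimal sketch of the Kaplan-Meier product-limit estimator.
# times: observed times Z_i; events: 1 if the death was observed, 0 if censored.
def km_estimator(times, events):
    # Distinct observed death times t_(1) < t_(2) < ...
    death_times = sorted({t for t, d in zip(times, events) if d == 1})
    surv = 1.0
    curve = []  # list of (t_(i), S_hat(t_(i)))
    for t in death_times:
        n_i = sum(1 for z in times if z >= t)  # at risk just before t
        d_i = sum(1 for z, d in zip(times, events) if z == t and d == 1)  # deaths at t
        surv *= 1.0 - d_i / n_i  # product-limit update
        curve.append((t, surv))
    return curve

# Toy example: 6 subjects; events = 0 marks a censored observation.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
print(km_estimator(times, events))
```

Each factor $1 - d_i/n_i$ is the estimated conditional probability of surviving past $t_{(i)}$ given survival up to it; censored subjects leave the risk set without contributing a factor.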
A key result for inferential purposes is the weak convergence of the Kaplan–Meier estimator. Under the previous notation, [5] (see also [6]) proves that, for $\tau$ such that $P(Z_1 > \tau) > 0$, the empirical process $\sqrt{n}\,(\widehat{S}_T - S_T)$ converges weakly on $[0, \tau]$ to a Gaussian process $G_T$ with mean 0 and covariance function given by
$$\Gamma_T(s, t) = S_T(s)\, S_T(t) \int_0^{s \wedge t} \frac{dF_T(u)}{S_T(u)^2\, S_C(u)},$$
where $F_T = 1 - S_T$, under the condition of $F_T$ and $F_C = 1 - S_C$ being continuous.
We now return to the lung cancer dataset introduced earlier.
Figure 1 displays the Kaplan–Meier estimators for both groups, suggesting that female patients exhibit a higher survival probability across all time points. Traditional methods only provide information on equality or difference of the survival functions; therefore, it would be more informative to test whether one survival function dominates the other against the alternative hypothesis that there is at least one crossing point between the two survival functions.
When one survival function lies below another, we use the concept of stochastic ordering. Formally, we say that $T$ is smaller than $U$ in the stochastic, or first stochastic dominance, order, denoted as $T \le_{st} U$, if $S_T(t) \le S_U(t)$ for all $t$, meaning that the survival probability of $T$ is always lower than that of $U$. Determining whether one distribution consistently exhibits better survival characteristics than another is essential. While this problem has been addressed by [7] in the context of stochastic dominance or ordering for non-censored data, there are limited statistical tools available in the case of censored data.
Following the approach of [7], the main purpose of this paper is to provide statistical methods for testing the ordering of two survival functions, considering a Kolmogorov–Smirnov-type test in the presence of right-censored data based on the supremum of the difference between the two Kaplan–Meier estimators.
To fix the notation, we consider another set $U_1, \dots, U_m$ of independent and identically distributed death times with common survival function $S_U$; let $C'_1, \dots, C'_m$ be independent and identically distributed censoring times with common survival function $S_{C'}$. Again, it is assumed that failure and censoring times are independent. The second dataset consists of bivariate random vectors $(Z'_j, \delta'_j)$, where $Z'_j = U_j \wedge C'_j$ and $\delta'_j = I(U_j \le C'_j)$ indicates whether $Z'_j$ is censored or not. Let us denote by $\widehat{S}_U$ the corresponding Kaplan–Meier estimator of the survival curve $S_U$. We assume that these observations are independent of the previous observations.
Under the previous notation, our main objective is to test the null hypothesis
$$H_0\colon S_T(t) \le S_U(t) \quad \text{for all } t \in [0, \tau],$$
against the alternative hypothesis
$$H_1\colon S_T(t) > S_U(t) \quad \text{for some } t \in [0, \tau],$$
using a test statistic based on the supremum of the difference of the Kaplan–Meier estimators. More precisely, we consider the test statistic
$$D_{n,m} = \sqrt{\frac{nm}{n+m}}\, \sup_{t \in [0, \tau]} \left(\widehat{S}_T(t) - \widehat{S}_U(t)\right).$$
It is important to note that $D_{n,m}$ is a one-sided statistic designed to detect departures from the null hypothesis $H_0$. To formally distinguish between strict dominance (where one curve is always above the other) and a crossing scenario, the test should be performed symmetrically by also considering $\sup_{t \in [0,\tau]} \left(\widehat{S}_U(t) - \widehat{S}_T(t)\right)$ (the maximum difference in the opposite direction). A crossing is statistically suggested when the null hypothesis of dominance is rejected in at least one direction, while secondary testing evidence shows a reversal of roles in another time interval.
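As a sketch of how such a supremum-type statistic can be computed in practice (the helper names, and the $\sqrt{nm/(n+m)}$ scaling, are our reconstruction of the notation above), one evaluates both Kaplan–Meier step functions on the pooled grid of jump points and takes the maximum scaled difference:

```python
import math

def km_curve(times, events):
    # Kaplan-Meier estimator as a list of (death time, S_hat) steps.
    death_times = sorted({t for t, d in zip(times, events) if d == 1})
    surv, curve = 1.0, []
    for t in death_times:
        n_i = sum(1 for z in times if z >= t)
        d_i = sum(1 for z, d in zip(times, events) if z == t and d == 1)
        surv *= 1.0 - d_i / n_i
        curve.append((t, surv))
    return curve

def km_eval(curve, t):
    # Right-continuous step-function evaluation of the KM curve.
    s = 1.0
    for ti, si in curve:
        if ti <= t:
            s = si
        else:
            break
    return s

def sup_statistic(times1, events1, times2, events2, tau):
    # sqrt(nm/(n+m)) * sup_{t in [0, tau]} (S1_hat(t) - S2_hat(t)),
    # with the sup taken over the pooled jump points in [0, tau].
    c1, c2 = km_curve(times1, events1), km_curve(times2, events2)
    grid = sorted({0.0, tau} | {t for t, _ in c1 + c2 if t <= tau})
    n, m = len(times1), len(times2)
    scale = math.sqrt(n * m / (n + m))
    return scale * max(km_eval(c1, t) - km_eval(c2, t) for t in grid)
```

Since the difference of two Kaplan–Meier curves is a step function, its supremum over $[0,\tau]$ is attained at a jump point (or at 0), so scanning the pooled grid is exact rather than an approximation.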
According to this, the null hypothesis is rejected if $D_{n,m} > c_\alpha$, where the critical value $c_\alpha$ would be determined in terms of the distribution of $D_{n,m}$. However, it is not feasible to obtain the exact distribution of such a statistic, and we rather use the asymptotic distribution. To derive the asymptotic properties, we make the following assumptions:
A1: The value $\tau$ satisfies $P(Z_1 > \tau) > 0$ and $P(Z'_1 > \tau) > 0$.
A2: When $n, m \to \infty$, then $n/(n+m) \to \lambda$, with $0 < \lambda < 1$.
Next, we provide an asymptotic upper bound, under $H_0$, for the tail probability of $D_{n,m}$ in terms of its asymptotic distribution.
Theorem 1. Following the previous notation and assumptions A1 and A2, we get, under $H_0$,
$$\limsup_{n,m \to \infty} P\left(D_{n,m} > c\right) \le P\left(\sup_{t \in [0,\tau]} G(t) > c\right), \quad (1)$$
where $G$ is a Gaussian process with 0 mean and covariance function given by
$$\Gamma(s,t) = (1-\lambda)\,\Gamma_T(s,t) + \lambda\,\Gamma_U(s,t).$$
Proof. As noticed previously, the empirical processes $\sqrt{n}\,(\widehat{S}_T - S_T)$ and $\sqrt{m}\,(\widehat{S}_U - S_U)$ converge weakly to Gaussian processes $G_T$ and $G_U$, respectively. The independence of the samples implies that the pair $\left(\sqrt{n}\,(\widehat{S}_T - S_T),\, \sqrt{m}\,(\widehat{S}_U - S_U)\right)$ converges weakly to $(G_T, G_U)$. Now, by the continuous mapping theorem,
$$\sqrt{\frac{nm}{n+m}}\left[(\widehat{S}_T - S_T) - (\widehat{S}_U - S_U)\right]$$
converges weakly to $G = \sqrt{1-\lambda}\,G_T - \sqrt{\lambda}\,G_U$, where $G$ is a Gaussian process with a 0 mean and covariance function given by $\Gamma(s,t) = (1-\lambda)\,\Gamma_T(s,t) + \lambda\,\Gamma_U(s,t)$. Now, under $H_0$ we get $S_T(t) - S_U(t) \le 0$ for all $t \in [0,\tau]$, and therefore
$$D_{n,m} \le \sqrt{\frac{nm}{n+m}}\, \sup_{t \in [0,\tau]} \left[\left(\widehat{S}_T(t) - S_T(t)\right) - \left(\widehat{S}_U(t) - S_U(t)\right)\right];$$
see Theorem 1.A.1 in [8]. As a consequence, we get
$$\limsup_{n,m \to \infty} P\left(D_{n,m} > c\right) \le P\left(\sup_{t \in [0,\tau]} G(t) > c\right). \qquad \square$$
This test is consistent, as can be seen next.
Proposition 1. Under the conditions of the previous theorem and under $H_1$, it holds that
$$\lim_{n,m \to \infty} P\left(D_{n,m} > c\right) = 1$$
for any $c > 0$. Proof. Under $H_1$, there exists a $t_0 \in [0, \tau]$ such that $S_T(t_0) - S_U(t_0) = \epsilon > 0$, and therefore
$$D_{n,m} \ge \sqrt{\frac{nm}{n+m}}\left(\widehat{S}_T(t_0) - \widehat{S}_U(t_0)\right).$$
It is not difficult to see that $\widehat{S}_T(t_0) - \widehat{S}_U(t_0) \to \epsilon$ almost surely, and clearly $\sqrt{\frac{nm}{n+m}} \to \infty$. Now the result follows, observing that
$$P\left(D_{n,m} > c\right) \ge P\left(\sqrt{\frac{nm}{n+m}}\left(\widehat{S}_T(t_0) - \widehat{S}_U(t_0)\right) > c\right) \to 1$$
for any $c > 0$. $\square$
We have introduced the theoretical framework of the proposed test and its motivation compared to traditional methods. Next, we will apply this methodology to real-world datasets and simulations to assess its performance in practical scenarios.
3. Implementation and Application to Some Datasets
To provide the upper bound (1) for the p-value, we need to compute the probability for the supremum of a Gaussian process. A common computational method is to approximate the probability via Monte Carlo simulation. The idea is to generate N samples of the Gaussian process over a discretized grid, then compute the supremum for each sample and estimate the probability using the empirical frequency.
However, following [9], we propose a more efficient method using the mvtnorm package in R; see [10]. Instead of relying exclusively on Monte Carlo simulations, this approach reduces the computational burden associated with the covariance matrix calculation, making the process faster.
Given a discretized grid $t_1 < t_2 < \cdots < t_k$ on $[0, \tau]$, the Monte Carlo method provides an approximation of $P\left(\sup_{t \in [0,\tau]} G(t) > d\right)$, where $d$ is the value of $D_{n,m}$ at a given sample, by an estimation of $P\left(\max_{1 \le i \le k} G(t_i) > d\right)$ based on the simulations. This probability can be computed as
$$P\left(\max_{1 \le i \le k} G(t_i) > d\right) = 1 - P\left(G(t_1) \le d, \dots, G(t_k) \le d\right).$$
Given that $(G(t_1), \dots, G(t_k))$ follows a multivariate normal distribution, this expression can be efficiently computed in R using the multivariate normal distribution function from the mvtnorm package.
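The Monte Carlo idea can be illustrated with a toy Gaussian process whose supremum distribution is known in closed form. The sketch below (our own; the actual test uses the covariance of Theorem 1 estimated from the data) simulates standard Brownian motion on a grid via independent Gaussian increments and checks the empirical exceedance frequency against the reflection-principle formula $P(\sup_{t \le 1} W(t) > d) = 2\,P(N(0,1) > d)$:

```python
import math
import random

def mc_sup_probability(path_sampler, d, n_sims=20000, seed=1):
    # Monte Carlo estimate of P(max_i G(t_i) > d): draw N paths of the
    # Gaussian vector (G(t_1), ..., G(t_k)) and count how often the
    # grid maximum exceeds d.
    random.seed(seed)
    hits = 0
    for _ in range(n_sims):
        if max(path_sampler()) > d:
            hits += 1
    return hits / n_sims

def brownian_on_grid(k=50):
    # Toy stand-in for G: standard Brownian motion on a grid of [0, 1],
    # simulated through independent Gaussian increments.
    dt = 1.0 / k
    path, w = [], 0.0
    for _ in range(k):
        w += random.gauss(0.0, math.sqrt(dt))
        path.append(w)
    return path

est = mc_sup_probability(brownian_on_grid, d=1.0)
# Reflection principle: P(sup W > 1) = 2 * (1 - Phi(1)).
exact = 2 * (1 - 0.5 * (1 + math.erf(1.0 / math.sqrt(2))))
print(est, exact)
```

The grid maximum slightly underestimates the continuous supremum (discretization bias), which is one reason the paper's grid-based multivariate normal computation with pmvnorm is attractive: it evaluates the joint probability on the grid exactly, removing the simulation noise.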
Additionally, to construct the covariance matrix, we require the theoretical values of the survival functions $S_T$ and $S_U$ and their corresponding densities, as well as the censoring survival functions $S_C$ and $S_{C'}$. Since these values are unknown, we use empirical values. The survival functions are replaced by the corresponding Kaplan–Meier estimators, and the distribution functions are replaced by their empirical counterparts. For density estimation, we apply the Foldes–Rejtó–Winter smoothed density estimator, which incorporates plug-in bandwidth selection, as proposed by [11,12]. This estimation is implemented in R using the survPresmooth package; see [13].
Finally, to assess assumption A1, we will take .
Next, we provide applications of the previous test to some datasets, starting with the lung cancer dataset introduced in Section 2.
Lung cancer: Here, we aim to assess whether the survival function of male patients lies consistently below that of female patients, as Figure 1 suggests.
Applying our test, we obtain an upper bound for the p-value of 0.9955, indicating no statistical evidence to reject the null hypothesis that female survival dominates male survival.
To determine whether this dominance is strict or the two survival functions are equal, we apply some classical tests to detect differences between the two survival functions. The results are presented in Table 1. Since all tests yield extremely low p-values, we conclude that female survival strictly dominates male survival.
Gastric cancer: In this second application, we illustrate the performance of the proposed test using the gastric cancer dataset. These data originate from a clinical trial conducted by the Gastrointestinal Tumor Study Group [14], which compared the survival of patients with locally advanced gastric carcinoma under two treatment arms: chemotherapy alone (5-fluorouracil and semustine), which we consider the control group, versus chemotherapy combined with radiation therapy. This dataset is particularly relevant because the survival curves of the two groups exhibit a visual intersection at approximately 900 days (see Figure 2). Before this point, the chemotherapy group appears to show higher survival probabilities, but the trend reverses thereafter.
We applied our proposed test to the null hypothesis that the survival function of the chemotherapy group dominates (is "above") that of the chemotherapy plus radiation group. Interestingly, despite the visual crossing, the test does not provide sufficient evidence to reject this dominance hypothesis: we obtain an upper bound for the p-value of 0.5864. This result indicates that the observed intersection in the sample is statistically compatible with the hypothesis of ordering. From a methodological perspective, this suggests that the divergence observed in the right tail of the distributions after 900 days may be attributed to sampling variability rather than to a significant structural reversal in treatment efficacy. Our test thus provides a robust interpretation by not over-interpreting visual crossings that lack enough statistical strength to invalidate the dominance model.
On the other hand, when applying traditional tests for comparing survival functions, we obtain the results presented in Table 2.
The results presented in Table 2 highlight the limitations of traditional tests when dealing with crossing survival functions. While the Log-Rank and Tarone–Ware tests fail to reach statistical significance, other methods like Gehan or Peto–Peto yield conflicting results with p-values below 0.05. This discrepancy arises from the cancellation effect inherent in rank-based tests, where differences before and after the crossing point (at approximately 900 days) neutralize each other. In contrast, our proposed test, based on the supremum of the difference between Kaplan–Meier estimators, provides a more robust and interpretable framework. With an upper bound of 0.5864 for the p-value of the dominance hypothesis, we can conclude that the observed intersection in the sample does not provide sufficient evidence to reject the model of stochastic ordering. This confirms that our approach effectively avoids over-interpreting visual crossings that lack statistical strength, offering a clear advantage for clinical decision-making over conventional methods.
4. Simulation Studies
To show the performance of our test in different scenarios, we carry out Monte Carlo experiments for small and large samples. The simulation studies are performed in several scenarios where dominance between the two survival functions either holds or does not hold, for different sample sizes and different rates of censoring.
First, we have considered gamma-distributed survival times. Let us describe the different cases that we have considered.
Case 1: In this case, T follows a gamma distribution with shape parameter 2 and scale parameter 1, Gamma(2, 1), and U follows a Gamma(3, 1) distribution. It is known that in this case the survival function of U dominates the survival function of T.
Case 2: In this case, T follows a Gamma(2, 1) distribution and U a Gamma(2.2, 1) distribution. It is known that in this case the survival function of U dominates the survival function of T, but they are very close.
Case 3: In this case, T follows a Gamma(3, 5) distribution and U a Gamma(6, 2) distribution. It is known that in this case the survival functions cross at one point.
Case 4: In this case, T follows a Gamma(2, 2) distribution and U a Gamma(3, 1) distribution. It is known that in this case the survival functions cross at one point, but it is difficult to detect the crossing point.
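For concreteness, one right-censored sample from a scenario like Case 1 can be generated in a few lines; the sketch below uses Python's standard generators as stand-ins for the R ones, and the censoring rate lam is purely illustrative (the values actually used in the study are determined numerically later in this section):

```python
import random

def draw_censored_sample(n, shape, scale, lam, seed=7):
    # Draw n right-censored observations: T ~ Gamma(shape, scale),
    # C ~ Exponential(rate = lam); observe Z = min(T, C) and
    # delta = 1 if the death was observed (T <= C), else 0.
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        t = rng.gammavariate(shape, scale)
        c = rng.expovariate(lam)
        sample.append((min(t, c), 1 if t <= c else 0))
    return sample

# Case 1, group T: Gamma(2, 1) death times; lam = 0.12 is illustrative.
data = draw_censored_sample(100, 2.0, 1.0, 0.12)
censor_rate = sum(1 for _, d in data if d == 0) / len(data)
```

Each replication of the simulation study draws such a sample for each group, computes the supremum statistic, and records whether the null hypothesis of dominance is rejected.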
In Figure 3 we plot the different cases, where we observe ordered survival functions in cases 1 and 2 and a crossing point in cases 3 and 4.
In addition to the gamma-based scenarios, we have considered four cases to broaden the scope of the simulation study and assess the performance of the proposed test under more general survival models. Specifically, cases 5 and 6 consider Log-Normal distributions, where the survival functions cross at a single point, with the intersection occurring very early in case 6, making it more difficult to detect. Similarly, cases 7 and 8 involve Weibull distributions, both exhibiting a single crossing point, although in case 8, the crossing appears near the origin, posing an additional challenge for detection. We want to highlight that, except for trivial cases or strict equality of parameters, comparisons between survival functions generated by two Weibull or Log-Normal distributions with different parameters always result in a crossing point.
In particular, we have considered the following cases.
Case 5: In this case, T follows a Log-Normal distribution with parameters 2 and 1, Log-Normal(2, 1), and U follows a Log-Normal(1.7, 0.5) distribution. In this setting, the survival functions of T and U intersect at a single point.
Case 6: In this case, T follows a Log-Normal(1, 1.5) distribution and U a Log-Normal(0.5, 1) distribution. Again, the survival functions cross at one point, but the intersection occurs very early, making it more difficult to detect.
Case 7: In this case, T follows a Weibull distribution with shape parameter 1 and scale parameter 5, Weibull(1, 5), and U follows a Weibull(2, 3.5) distribution. The survival functions cross at a single point.
Case 8: In this case, T follows a Weibull(1.5, 4) distribution and U a Weibull(2, 3) distribution. As in previous cases, the survival functions intersect at one point; however, the crossing occurs very early and is therefore more difficult to detect.
In Figure 4, we plot the different cases, where we observe crossing survival functions in cases 5–8.
For each case, we have considered exponentially distributed censoring times with rate parameter $\lambda$. In each case, we have selected two different values of $\lambda$ to obtain rates of approximately 20% and 50% of censored observations. Given a survival time T and a censoring time C with an exponential distribution, we need to solve the probability equation $P(C < T) = p$ to determine the appropriate $\lambda$. Because this equation is analytically complex, we used a numerical approach to solve for $\lambda$. We set up an equation where the calculated probability of censoring must equal our target (e.g., $p = 0.20$). We used R's numerical integration function (integrate) to precisely calculate $P(C < T)$ for any given $\lambda$. We then employed a root-finding algorithm (specifically, R's uniroot) to iteratively search for the single value of $\lambda$ that makes the calculated probability match the target percentage.
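This integrate-then-root-find procedure can be sketched as follows; the example takes Case 1's T ~ Gamma(2, 1), whose survival function has the closed form $S_T(t) = (1+t)e^{-t}$, uses a simple trapezoidal rule in place of R's integrate and bisection in place of uniroot, and the function names are ours (the resulting lam is illustrative, not necessarily the value used in the study):

```python
import math

def censoring_probability(lam, surv, upper=100.0, steps=10000):
    # P(C < T) = integral_0^inf S_T(c) * lam * exp(-lam * c) dc,
    # approximated with a trapezoidal rule on [0, upper].
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        c = i * h
        f = surv(c) * lam * math.exp(-lam * c)
        total += f * (0.5 if i in (0, steps) else 1.0)
    return total * h

def find_lambda(target, surv, lo=1e-6, hi=10.0):
    # Bisection stand-in for R's uniroot: P(C < T) is increasing in lam
    # (a larger rate censors earlier), so bisection applies directly.
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if censoring_probability(mid, surv) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Case 1, group T: Gamma(2, 1), so S_T(t) = (1 + t) * exp(-t).
surv_T = lambda t: (1.0 + t) * math.exp(-t)
lam20 = find_lambda(0.20, surv_T)
print(lam20)
```

For this particular case the integral is available in closed form, $P(C < T) = \lambda(2+\lambda)/(1+\lambda)^2$, so the numerical solution can be checked against the root of $4\lambda^2 + 8\lambda - 1 = 0$.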
Following the numerical procedure described, we obtained the specific values that achieve the desired censoring proportions in each case. The values of the exponential parameters used to generate the censoring times for each scenario are the following:
Case 1: Gamma(2, 1) vs. Gamma(3, 1)
- 20% censoring:
- 50% censoring:
Case 2: Gamma(2, 1) vs. Gamma(2.2, 1)
- 20% censoring:
- 50% censoring:
Case 3: Gamma(3, 5) vs. Gamma(6, 2)
- 20% censoring:
- 50% censoring:
Case 4: Gamma(2, 2) vs. Gamma(3, 1)
- 20% censoring:
- 50% censoring:
Case 5: Log-Normal(2, 1) vs. Log-Normal(1.7, 0.5)
- 20% censoring:
- 50% censoring:
Case 6: Log-Normal(1, 1.5) vs. Log-Normal(0.5, 1)
- 20% censoring:
- 50% censoring:
Case 7: Weibull(1, 5) vs. Weibull(2, 3.5)
- 20% censoring:
- 50% censoring:
Case 8: Weibull(1.5, 4) vs. Weibull(2, 3)
- 20% censoring:
- 50% censoring:
This systematic determination of $\lambda$ ensures comparability across all settings and provides a sound foundation for the simulation design.
We performed 1000 Monte Carlo replications for each case with different sample sizes up to $n = 500$, in which the rejection rates of the null hypothesis have been computed for the two conventional significance levels. The number of grid points is kept fixed in every replication.
Table 3 and Table 4 summarize the performance of the proposed test under the eight scenarios, cases 1–8, with varying sample sizes and censoring levels (20% vs. 50%), for both significance levels.
Some key observations are the following:
When one survival function truly dominates the other (cases 1 and 2), the test almost never rejects the null hypothesis. Even with small samples, the rejection rate is essentially 0 at both significance levels, indicating no false alarms when the survival functions are ordered.
In scenarios where the survival functions cross (cases 3–8), the test's power to reject the null hypothesis increases sharply with larger sample sizes. For small samples, the rejection rates are modest, but with larger samples they increase, exceeding 90% in most cases and approaching 100% at n = 500. This shows the test is very sensitive to crossings given sufficient data.
Heavier censoring reduces the test's power to detect differences. At a given sample size, a 50% censoring rate yields lower rejection rates than 20% censoring. For example, in case 3 with n = 100, the rejection rate is 85.1% with 20% censoring versus 65.4% with 50% censoring. Nonetheless, as sample size grows, even with 50% censoring, the power eventually becomes high, reaching 99% at n = 500 in crossing cases.
These results demonstrate that the proposed test is reliable for confirming the order of survival functions and highly effective at flagging crossings, provided the data are sufficient. The test maintains a low false-positive rate when survival curves are truly ordered, giving confidence that a non-significant result indeed suggests no crossing. Conversely, a significant result from this test is strong evidence of at least one crossing point between survival curves. A sufficient sample size is crucial for the test to detect crossings, especially if censoring is heavy. In practice, researchers should plan for larger samples and/or seek to reduce censoring to ensure the test has high power to uncover crossing survival patterns. This sensitivity to crossings is a key advantage over traditional survival comparison tests (like Log-Rank), which often fail to detect any difference when survival curves intersect. Such improved detection can lead to better insights in studies where survival functions may cross, ensuring that meaningful survival differences are not overlooked.
5. Discussion
This paper introduces a novel test, based on the supremum of the difference between Kaplan–Meier estimators, specifically designed to evaluate stochastic dominance and to distinguish this condition from scenarios where survival curves cross. The asymptotic results presented, coupled with the simulation studies and applications to real data, confirm that the proposed test is a robust tool that is more sensitive than classic methods (such as the Log-Rank test) for detecting curve crossings. To ensure the generality of our findings, we included simulations using Gamma, Weibull and Log-Normal distributions.
Our proposed test, based on the supremum of the difference of Kaplan–Meier estimators, offers several advantages over traditional methods such as the Log-Rank test:
A significant contribution of this work, as evidenced by the analysis of the gastric cancer dataset in Section 3, is the ability of the proposed test to handle crossing survival curves with statistical rigor. A common criticism in survival analysis is whether a visual intersection of Kaplan–Meier curves automatically invalidates a dominance model. While the definition of our statistic focuses on the maximum distance between survival functions, this does not imply that the crossing is ignored. On the contrary, our methodology provides a formal decision rule to distinguish between a structural reversal of survival advantage and a crossing that is statistically compatible with the null hypothesis of ordering due to sampling variability. In the case of the gastric cancer study, the visual crossing observed at approximately 900 days does not lead to a rejection of the dominance hypothesis by our test. This suggests that the divergence after the intersection point lacks sufficient statistical strength to confirm a violation of the ordering. While the traditional Log-Rank test often results in non-significant p-values in such scenarios due to the cancellation of early and late differences, our approach avoids this 'blindness' by focusing on the maximum evidence of deviation. Therefore, the test proves to be a robust tool that prevents the over-interpretation of visual patterns in the tails of the distribution, where high censoring often increases uncertainty. This reinforces the practical utility of our test as a complement to visual inspection in clinical and biological research.
Improved detection of stochastic dominance: Our test is specifically designed to assess whether one survival function consistently lies above another, providing a more informative alternative to conventional hypothesis testing frameworks.
Computational efficiency and implementation in R: The test is easy to compute using standard statistical software. The estimation of p-values can be efficiently performed using multivariate normal approximations, making it practical for large datasets.
Robustness to outliers: The test inherits the robustness of the Kaplan–Meier estimator. As a non-parametric statistic based on the ranks of event times, the Kaplan–Meier estimator is less susceptible to the influence of a single extreme value than estimators based on the mean or variance. Nevertheless, caution is advised when analyzing very small samples with clear outliers, as an extreme event or censoring value can affect the calculation of the supremum and influence the asymptotic convergence properties.
Despite its strengths, the proposed method has certain limitations that suggest promising avenues for future research:
Performance in small sample sizes: While our test has strong asymptotic properties, its performance in small samples could be further refined by improving the estimation of the asymptotic distribution. Future research could explore bootstrap-based refinements or finite-sample adjustments to enhance the test’s accuracy in small-sample scenarios.
Adaptation for paired samples: A relevant extension would be the development of a similar test for paired survival data, where observations are correlated (e.g., matched case-control studies or twin studies).
Handling time-dependent covariates: In real-world applications, survival probabilities often depend on time-varying covariates. Extending our approach to incorporate time-dependent covariates could provide additional insights into how survival dominance evolves over time.
Finally, it is crucial to define the role our test plays in the general hypothesis testing process. The proposed test is not intended to replace classic tests like the Log-Rank but rather acts as a crucial diagnostic complement.
We propose the following workflow: If the test fails to reject the dominance hypothesis (high p-value), the data are consistent with one survival function being globally less than or equal to the other. In this scenario, the next step is to perform a classical test (e.g., Log-Rank) to determine whether there is strict dominance or statistical equality. Conversely, if the test rejects the dominance hypothesis (low p-value), it provides evidence that the supposedly dominated group is significantly better in at least one time interval. When combined with the initial clinical hypothesis of one group's superiority (or by performing the contrast in both directions), this rejection serves as a diagnostic for a crossing point. This result yields critical information unattainable by the Log-Rank test alone, allowing researchers to shift their focus to interval-specific comparisons. This allows for a more nuanced interpretation of treatment effects, particularly when early risks are offset by long-term benefits.