1. Introduction
Persistent racial and ethnic disparities in cancer survival outcomes represent one of the most pressing challenges in contemporary public health research [1,2,3]. Black patients demonstrate significantly higher death rates than other racial and ethnic groups across multiple cancer types, with disparities particularly pronounced in breast, prostate, colorectal, and lung cancers. Recent population-based analyses reveal that while overall cancer-specific survival improved from 2004 to 2018, substantial racial disparities persist across multiple domains of care [4,5].
These epidemiological patterns present complex methodological challenges that extend beyond traditional survival analysis approaches. Understanding the sources of group differences in survival outcomes requires decomposition methods that can distinguish disparities explained by measured covariates from those attributable to unmeasured factors or differential treatment effects [6,7]. The Peters–Belson method provides a counterfactual framework for such decomposition, estimating what survival outcomes would be for minority group members if they experienced the same covariate effects as the majority group [8,9].
However, successful extension of Peters–Belson methods to survival analysis faces several fundamental challenges. First, traditional survival models may inadequately capture complex relationships between covariates and outcomes, particularly when these relationships involve high-order interactions or nonlinear effects [10,11]. Second, machine learning methods applied to Peters–Belson decomposition without proper validation can produce mathematically invalid counterfactual estimates that violate basic logical bounds [12,13]. Third, ensemble approaches may inherit failures from poorly performing individual models, contaminating the entire decomposition analysis [14,15]. Fourth, unmeasured confounding factors that operate at the group level may not be adequately addressed by conventional approaches [16,17].
This manuscript addresses four core methodological problems in survival-based health disparity research. We develop validated ensemble machine learning methods that combine multiple survival modeling approaches within the Peters–Belson framework while ensuring logical validity of counterfactual estimates through comprehensive validation procedures. Our approach incorporates transfer learning methods to discover and incorporate shared latent factors between groups that may capture unmeasured determinants of health disparities. We implement cross-validation and regularization approaches that prevent overfitting to majority group patterns while ensuring generalizability to minority group counterfactual estimation. Finally, we provide diagnostic tools and validation procedures that ensure Peters–Belson logical bounds compliance and guide method selection based on observable data characteristics.
Terminology Clarification: We use “transfer learning” in a statistical sense distinct from its common usage in machine learning. In ML, transfer learning typically refers to adapting a model pre-trained on one task/domain to a different but related task/domain. In our context, “transfer learning” refers to the transfer of covariate structure information from the pooled population (majority + minority groups) to identify latent factors that capture group-level variation not directly measured. This is accomplished through PCA on the combined covariate space, with the resulting components used to augment the Peters–Belson decomposition. This usage aligns with the broader statistical concept of learning transferable structure across populations, though it does not involve pre-trained models or domain adaptation in the ML sense.
Contributions
Our main contributions are: (1) a validated ensemble framework that extends Peters–Belson decomposition to survival analysis with comprehensive model validation ensuring logical bounds compliance; (2) development of transfer learning methods that capture group-level latent factors through principal component analysis, providing meaningful improvements when unmeasured confounding exists; (3) implementation of Peters–Belson-specific validation procedures that assess logical bounds compliance and counterfactual validity; (4) systematic evaluation across realistic health disparity scenarios demonstrating when ensemble methods versus individual approaches excel; and (5) development of diagnostic tools that characterize data complexity and provide guidance for method selection based on observable characteristics and validation results.
Ensemble machine learning methods are well-established in the statistical and machine learning literature [14,15]. Our contribution lies in the specific adaptation of these methods to the Peters–Belson decomposition framework for survival analysis, with particular emphasis on the validation procedures that ensure logical bounds compliance—a challenge unique to counterfactual estimation in disparity decomposition.
2. Methods
Figure 1 provides an overview of our validated ensemble Peters–Belson framework, illustrating the complete methodology from data input through individual model training, validation, ensemble construction, optional transfer learning enhancement, and final disparity decomposition analysis.
2.1. Traditional Peters–Belson Framework for Survival Analysis
2.1.1. Setup and Notation
Consider a finite population partitioned into majority ($\mathcal{G}_1$, size $N_1$) and minority ($\mathcal{G}_2$, size $N_2$) subgroups. Each unit $i$ has survival time $T_i$, censoring time $C_i$, and covariate vector $X_i$. We observe a probability sample of size $n$ drawn according to a complex survey design with known inclusion probabilities $\pi_i$.
For each sampled unit $i$, we observe the right-censored follow-up time $Y_i = \min(T_i, C_i)$, the event indicator $\delta_i = I(T_i \le C_i)$, the covariate vector $X_i$, the group membership $G_i \in \{1, 2\}$, and the survey weight $w_i = 1/\pi_i$.
2.1.2. Peters–Belson Decomposition
The Peters–Belson method decomposes the total survival disparity between groups into explained and unexplained components. For a given time point $t$, let:
$S_1(t)$ = observed survival probability for the majority group;
$S_2(t)$ = observed survival probability for the minority group;
$S_2^{PB}(t)$ = counterfactual survival probability for the minority group under the majority group model.
The total disparity at time $t$ then decomposes as
$$\Delta(t) = S_1(t) - S_2(t) = \underbrace{\left[S_1(t) - S_2^{PB}(t)\right]}_{\text{explained by measured covariates}} + \underbrace{\left[S_2^{PB}(t) - S_2(t)\right]}_{\text{unexplained}}.$$
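As a concrete illustration of this arithmetic, the following sketch computes the explained and unexplained components and the proportion explained at a single time point. The survival probabilities and the function name are hypothetical, chosen only to make the decomposition tangible rather than to reproduce any result in the paper.

```python
def peters_belson_decomposition(s1_obs, s2_obs, s2_cf):
    """Decompose the survival disparity at a single time point t.

    s1_obs : observed majority-group survival probability S_1(t)
    s2_obs : observed minority-group survival probability S_2(t)
    s2_cf  : counterfactual minority survival S_2^PB(t) under the majority model
    """
    total = s1_obs - s2_obs                  # total disparity Delta(t)
    explained = s1_obs - s2_cf               # attributable to covariate differences
    unexplained = s2_cf - s2_obs             # attributable to differential effects
    prop_explained = explained / total if total != 0 else float("nan")
    return {"total": total, "explained": explained,
            "unexplained": unexplained, "prop_explained": prop_explained}

# Hypothetical 5-year survival probabilities.
print(peters_belson_decomposition(s1_obs=0.62, s2_obs=0.48, s2_cf=0.57))
```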
2.2. Validated Ensemble Machine Learning Framework
Our validated ensemble framework represents a fundamental advancement in applying machine learning methods to Peters–Belson decomposition while ensuring mathematical validity of counterfactual estimates. The framework integrates multiple survival modeling approaches through a comprehensive validation system that prevents contamination from poorly performing models.
2.2.1. Individual Survival Models with Cross-Validation
The ensemble framework incorporates four complementary survival modeling approaches, each selected for their distinct advantages in capturing different aspects of the survival process. Cox proportional hazards models provide interpretable baseline estimates with established theoretical foundations, while machine learning approaches offer enhanced flexibility for complex data patterns.
The Cox proportional hazards model forms the foundational component of our ensemble:
$$\lambda(t \mid X_i) = \lambda_0(t)\exp\!\left(X_i^{\top}\beta\right),$$
where $\lambda_0(t)$ represents the baseline hazard and $\beta$ are covariate effects estimated via partial likelihood [18]. This approach provides reliable performance across diverse data conditions while maintaining the computational efficiency essential for validation procedures.
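To illustrate how the Cox component produces counterfactual estimates within the Peters–Belson framework, the sketch below fits a majority-group Cox model with the lifelines package and applies it to minority-group covariates. The synthetic data frame, covariate names, effect sizes, and evaluation times are illustrative assumptions of ours, not the SEER variables or the tuned settings used in the paper.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)

# Synthetic illustration: column names and effect sizes are made up for the sketch.
n = 2000
df = pd.DataFrame({
    "group": rng.choice([1, 2], size=n, p=[2 / 3, 1 / 3]),
    "age": rng.normal(65, 10, n),
    "stage": rng.integers(1, 5, n),
    "ses_index": rng.normal(0, 1, n),
})
lin_pred = 0.03 * (df["age"] - 65) + 0.4 * df["stage"] - 0.3 * df["ses_index"]
latent_time = rng.exponential(scale=60 * np.exp(-lin_pred))
censor_time = rng.uniform(6, 120, n)
df["time"] = np.minimum(latent_time, censor_time)
df["event"] = (latent_time <= censor_time).astype(int)

covariates = ["age", "stage", "ses_index"]
maj, minr = df[df["group"] == 1], df[df["group"] == 2]

# Fit the majority-group Cox model, then apply it to minority covariates
# to obtain counterfactual survival curves S_2^PB(t | X_i).
cph = CoxPHFitter()
cph.fit(maj[["time", "event"] + covariates],
        duration_col="time", event_col="event")
t_eval = [12, 36, 60]  # months, illustrative evaluation times
cf_curves = cph.predict_survival_function(minr[covariates], times=t_eval)

# Marginal counterfactual survival: average over minority subjects.
s2_cf = cf_curves.mean(axis=1)
print(s2_cf)
```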
Random survival forests extend traditional random forest methodology to survival outcomes through recursive binary partitioning with permutation-based variable selection [10,19]. The ensemble nature of this approach naturally provides overfitting control through bootstrap aggregation:
$$\hat{S}_{\text{RSF}}(t \mid X) = \frac{1}{B}\sum_{b=1}^{B}\hat{S}_b(t \mid X),$$
where $\hat{S}_b(t \mid X)$ represents the survival function from the $b$-th tree. Cross-validation determines optimal hyperparameters, including the number of variables randomly selected at each split, the minimum node size, and the splitting criterion, to prevent overfitting while maintaining predictive performance.
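A minimal sketch of this component using the scikit-survival implementation follows, reusing the synthetic maj/minr data and covariates list from the Cox sketch above; the hyperparameter values shown here are illustrative rather than the cross-validated settings reported in Section 2.2.2.

```python
import numpy as np
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

# Reuses maj, minr, covariates from the Cox sketch above.
y_maj = Surv.from_arrays(event=maj["event"].astype(bool), time=maj["time"])

rsf = RandomSurvivalForest(
    n_estimators=500,       # number of bootstrap trees B
    max_features="sqrt",    # candidate variables per split
    min_samples_leaf=5,     # minimum node size
    random_state=42,        # default split rule is the log-rank criterion
)
rsf.fit(maj[covariates], y_maj)

# Bootstrap-aggregated counterfactual survival for minority covariates.
surv_fns = rsf.predict_survival_function(minr[covariates])
t_eval = np.array([12.0, 36.0, 60.0])
s2_cf_rsf = np.vstack([fn(t_eval) for fn in surv_fns]).mean(axis=0)
```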
Gradient boosting methods implement sequential learning through weak learners with comprehensive regularization [20,21]:
$$F_m(X) = F_{m-1}(X) + \nu\, h_m(X), \quad m = 1, \dots, M,$$
where $h_m$ is the weak learner fitted to the negative gradient of the loss at stage $m$ and $\nu$ is the learning rate. The sequential structure allows the model to learn complex patterns, while L1 and L2 regularization parameters, selected via cross-validation, prevent overfitting to majority group patterns that could invalidate counterfactual estimates.
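One possible implementation of this component uses xgboost's Cox objective, sketched below with the same synthetic data; the parameter values are illustrative, and in practice the number of boosting rounds would be chosen by early stopping on a held-out fold, as specified in Section 2.2.2.

```python
import numpy as np
import xgboost as xgb

# Reuses maj, minr, covariates from the Cox sketch above. For xgboost's
# Cox objective, labels are survival times, negative when right-censored.
labels = np.where(maj["event"] == 1, maj["time"], -maj["time"])
dtrain = xgb.DMatrix(maj[covariates], label=labels)

params = {
    "objective": "survival:cox",
    "eta": 0.05,        # learning rate (illustrative)
    "max_depth": 6,
    "alpha": 0.1,       # L1 regularization
    "lambda": 0.1,      # L2 regularization
}
# Fixed round count for brevity; a held-out fold with early stopping
# would normally determine the number of rounds.
bst = xgb.train(params, dtrain, num_boost_round=500)

# Predictions are relative risks exp(f(X)); a baseline hazard estimate
# (e.g., Breslow on the majority fit) is still needed to convert them
# into counterfactual survival curves.
risk_min = bst.predict(xgb.DMatrix(minr[covariates]))
```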
Elastic net regularized Cox regression combines the interpretability of Cox models with advanced regularization techniques [22,23]:
$$\hat{\beta} = \arg\min_{\beta}\left\{-\ell(\beta) + \lambda\left[\alpha\|\beta\|_1 + \tfrac{1-\alpha}{2}\|\beta\|_2^2\right]\right\},$$
where $\ell(\beta)$ is the Cox log partial likelihood, $\lambda$ controls the overall penalty strength, and $\alpha \in [0,1]$ balances the lasso and ridge penalties. This approach addresses multicollinearity issues common in health disparities research while maintaining model interpretability essential for policy applications.
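As a brief illustrative analogue (not the implementation used in the paper), lifelines exposes an elastic net penalty on the Cox partial likelihood through its penalizer and l1_ratio arguments; the values below are placeholders that would in practice be selected by cross-validation, again reusing the synthetic majority-group data from the Cox sketch.

```python
from lifelines import CoxPHFitter

# Elastic net penalized Cox fit on the majority group (illustrative settings).
enet_cph = CoxPHFitter(penalizer=0.1, l1_ratio=0.5)
enet_cph.fit(maj[["time", "event"] + covariates],
             duration_col="time", event_col="event")
print(enet_cph.summary[["coef", "exp(coef)"]])
```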
2.2.2. Hyperparameter Specification
All machine learning models were tuned via cross-validation, with complete specifications provided in Supplementary Material Section S5. For the penalized Cox model (lasso), the penalty parameter was selected by 10-fold cross-validation. For the random survival forest, the number of trees and the number of candidate variables per split were tuned, with a minimum node size of 5 and log-rank splitting. For XGBoost, the learning rate was tuned jointly with the number of boosting rounds (chosen by early stopping), with a maximum depth of 6 and L1/L2 penalties of 0.1. For GBM, the shrinkage was 0.01, the interaction depth was 4, the number of trees was selected by 5-fold cross-validation, and the bag fraction was 0.8.
2.2.3. Peters–Belson Validation Framework
The validation framework represents our most critical methodological innovation, addressing fundamental limitations in applying machine learning methods to counterfactual estimation. Traditional approaches often produce mathematically invalid results where counterfactual estimates violate basic logical constraints inherent to the Peters–Belson framework.
Our validation system implements comprehensive logical bounds checking for each candidate model:
$$\min\!\left\{\hat{S}_1(t), \hat{S}_2(t)\right\} \;\le\; \hat{S}_2^{PB}(t) \;\le\; \max\!\left\{\hat{S}_1(t), \hat{S}_2(t)\right\} \quad \text{for all evaluation times } t.$$
This constraint ensures that counterfactual survival probabilities remain within the observable range defined by the actual group survival functions. Models that consistently violate these bounds across multiple time points are excluded from ensemble construction, preventing contamination of the final decomposition results.
The validation process extends beyond simple bounds checking to assess the stability and reliability of counterfactual estimates. We evaluate the proportion of explained disparity at multiple time points, flagging models that produce explanations exceeding 100% or suggesting negative contributions from measured covariates. This multi-dimensional validation approach ensures both mathematical validity and substantive interpretability of results.
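A simple operationalization of this validation step is sketched below: the function checks, over a grid of evaluation times, that the counterfactual curve stays within the range spanned by the two observed group curves and that the implied proportion explained lies in [0, 1]. The function name, tolerance, and acceptance rule are our own illustrative choices, not the paper's exact criteria.

```python
import numpy as np

def passes_pb_bounds(s1_obs, s2_obs, s2_cf, tol=1e-6, max_violation_rate=0.0):
    """Check Peters-Belson logical bounds over a grid of evaluation times.

    s1_obs, s2_obs, s2_cf : survival probabilities for the majority group,
    the minority group, and the counterfactual minority estimate.
    """
    s1_obs, s2_obs, s2_cf = map(np.asarray, (s1_obs, s2_obs, s2_cf))
    lower = np.minimum(s1_obs, s2_obs) - tol
    upper = np.maximum(s1_obs, s2_obs) + tol
    in_bounds = (s2_cf >= lower) & (s2_cf <= upper)

    # Also flag time points whose implied proportion explained is outside [0, 1].
    total = s1_obs - s2_obs
    with np.errstate(divide="ignore", invalid="ignore"):
        prop_explained = np.where(total != 0, (s1_obs - s2_cf) / total, np.nan)
    prop_ok = np.isnan(prop_explained) | ((prop_explained >= 0) & (prop_explained <= 1))

    violation_rate = 1.0 - np.mean(in_bounds & prop_ok)
    return violation_rate <= max_violation_rate, violation_rate

# Example: a candidate model evaluated at two time points.
ok, rate = passes_pb_bounds([0.62, 0.50], [0.48, 0.35], [0.57, 0.44])
```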
2.2.4. Validated Ensemble Integration
The ensemble integration process combines only validated survival models through optimal weight determination that minimizes prediction error while maintaining Peters–Belson validity. The optimization problem balances predictive accuracy with logical consistency:
$$\hat{w} = \arg\min_{w}\; \mathrm{IBS}\!\left(\sum_{k \in \mathcal{V}} w_k \hat{S}_k\right) \quad \text{subject to} \quad w_k \ge 0,\;\; \sum_{k \in \mathcal{V}} w_k = 1,$$
where $\mathcal{V}$ represents the set of validated models and IBS denotes the integrated Brier score [24,25]. The constraint set ensures that only models passing validation contribute to the final ensemble, with weights determined by predictive performance on validation data.
The resulting validated ensemble survival function integrates the strengths of multiple modeling approaches:
$$\hat{S}_{\text{ens}}(t \mid X) = \sum_{k \in \mathcal{V}} \hat{w}_k \hat{S}_k(t \mid X).$$
This approach provides superior performance compared to individual models while maintaining the logical validity essential for Peters–Belson decomposition.
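The sketch below shows one way to solve this constrained weighting problem with scipy, using a plug-in Brier score over a time grid as a simplified, censoring-free stand-in for the integrated Brier score (a full implementation would add inverse-probability-of-censoring weights). All names and the toy data are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def ensemble_weights(surv_preds, at_risk):
    """Simplex-constrained ensemble weights minimizing a plug-in Brier score.

    surv_preds : array (K, n, T) of predicted survival probabilities from
        the K validated models.
    at_risk : array (n, T) of indicators that subject i is still event-free
        at each evaluation time (a censoring-free stand-in for the IBS target).
    """
    K = surv_preds.shape[0]

    def brier(w):
        s_ens = np.tensordot(w, surv_preds, axes=1)   # weighted ensemble, shape (n, T)
        return np.mean((s_ens - at_risk) ** 2)

    constraints = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    bounds = [(0.0, 1.0)] * K
    res = minimize(brier, x0=np.full(K, 1.0 / K), bounds=bounds,
                   constraints=constraints, method="SLSQP")
    return res.x

# Toy usage: K = 3 validated models, 200 subjects, 5 evaluation times.
rng = np.random.default_rng(1)
preds = rng.uniform(0.2, 0.9, size=(3, 200, 5))
status = (rng.uniform(size=(200, 5)) < 0.6).astype(float)
w_hat = ensemble_weights(preds, status)   # ensemble: S_ens = sum_k w_k * S_k
```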
2.3. Transfer Learning Peters–Belson Framework
Traditional Peters–Belson decomposition methods may fail to capture group-level factors that operate through unmeasured pathways, potentially underestimating the complexity of health disparities. Our transfer learning framework addresses this limitation by discovering latent factors that represent shared structural characteristics between groups while identifying meaningful differences that contribute to observed disparities.
2.3.1. Terminology and Conceptual Framework
In standard machine learning contexts, “transfer learning” typically refers to pre-training a model on a large source dataset (e.g., ImageNet for image classification), then adapting or fine-tuning the model to a target task with limited data. Common examples include BERT language models adapted for domain-specific NLP tasks and ImageNet-pretrained convolutional neural networks applied to medical imaging.
In contrast, our usage of “transfer learning” refers to a statistical approach wherein covariate data from majority and minority groups are pooled, principal component analysis is applied to identify linear combinations of observed covariates that capture shared structure across groups, this structural information is then “transferred” to augment the Peters–Belson decomposition, and latent factors exhibiting meaningful group differences are identified through Cohen’s d effect size statistics.
This usage aligns with the broader statistical concept of leveraging information across populations to improve inference, but does not involve pre-trained models, neural networks, or domain adaptation in the machine learning sense. An alternative term could be “latent factor augmentation via PCA,” but we retain “transfer learning” to emphasize the conceptual transfer of structural information from the pooled to group-specific analyses.
2.3.2. Latent Factor Discovery
The transfer learning approach builds on the recognition that health disparities often arise from complex interactions between measured covariates and unmeasured structural factors such as discrimination, healthcare system characteristics, and social determinants [26,27]. Our method systematically discovers these latent factors through enhanced principal component analysis applied to the combined covariate space.
The discovery process begins by combining covariate matrices from both majority and minority groups to create a unified representation space. Given majority group data $X_1 \in \mathbb{R}^{n_1 \times p}$ and minority group data $X_2 \in \mathbb{R}^{n_2 \times p}$, we construct the combined matrix $X_{\text{comb}} = [X_1^{\top}, X_2^{\top}]^{\top}$ that preserves the covariate structure across both groups.
Principal component analysis transforms this combined space to identify orthogonal factors that capture the maximum variance in the original covariates: $Z = X_{\text{comb}} W$, where $W$ represents the transformation matrix derived from eigendecomposition of the covariance matrix of $X_{\text{comb}}$. The first $k$ components capture the most significant patterns of covariate variation across both groups.
Component selection balances the need to capture sufficient variation with the requirement for interpretable factors. We employ an explained variance criterion to determine the optimal number of components:
$$k^{*} = \min\left\{k : \frac{\sum_{j=1}^{k}\lambda_j}{\sum_{j=1}^{p}\lambda_j} \ge \tau\right\},$$
where $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$ represent eigenvalues ordered by magnitude and $\tau$ serves as a threshold parameter balancing comprehensiveness with parsimony.
The critical innovation lies in assessing group differences within the latent factor space. After extracting latent factors, we calculate Cohen’s d for each component to quantify the magnitude of group differences in the transformed space. Components exhibiting substantial group differences (typically $|d| > 0.3$) represent potential sources of unmeasured disparity that traditional Peters–Belson methods might miss.
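The sketch below implements this pooled-PCA discovery step with scikit-learn: components are retained by the cumulative explained-variance criterion and then screened by per-component Cohen's d against the 0.3 threshold. The variance threshold of 0.90, the function name, and the toy data are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def discover_latent_factors(X1, X2, var_threshold=0.90, d_threshold=0.3):
    """Pooled-PCA latent factor discovery with per-component Cohen's d."""
    X_comb = StandardScaler().fit_transform(np.vstack([X1, X2]))
    pca = PCA().fit(X_comb)

    # k* = smallest k whose cumulative explained variance reaches the threshold tau.
    cum_var = np.cumsum(pca.explained_variance_ratio_)
    k_star = int(np.searchsorted(cum_var, var_threshold) + 1)

    Z = pca.transform(X_comb)[:, :k_star]
    Z1, Z2 = Z[: len(X1)], Z[len(X1):]

    # Cohen's d per retained component, using the pooled standard deviation.
    pooled_sd = np.sqrt(
        ((len(Z1) - 1) * Z1.var(axis=0, ddof=1)
         + (len(Z2) - 1) * Z2.var(axis=0, ddof=1)) / (len(Z1) + len(Z2) - 2)
    )
    cohens_d = (Z1.mean(axis=0) - Z2.mean(axis=0)) / pooled_sd
    keep = np.abs(cohens_d) > d_threshold   # components with meaningful group differences
    return Z[:, keep], cohens_d, k_star

# Toy usage with random covariate matrices (8 covariates, shifted minority means).
rng = np.random.default_rng(7)
X1 = rng.normal(size=(300, 8))
X2 = rng.normal(loc=0.4, size=(150, 8))
Z_keep, d_vals, k_star = discover_latent_factors(X1, X2)
```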
2.3.3. Threshold Selection
The threshold of Cohen’s $d = 0.3$ for identifying meaningful group differences follows established conventions in behavioral and social sciences [28], where $d = 0.2$ represents a “small” effect, $d = 0.5$ a “medium” effect, and $d = 0.8$ a “large” effect. We selected $d = 0.3$ as a threshold between small and medium effects to balance sensitivity (detecting meaningful differences) with specificity (avoiding noise). Sensitivity analyses across thresholds from 0.2 to 0.5 demonstrate that results are robust to this choice, with the threshold of 0.5 being overly restrictive (excluding valid simulations). Complete sensitivity analysis results appear in Supplementary Material Section S3.
2.3.4. When to Use Transfer Learning
Transfer learning via latent factor augmentation is recommended under specific conditions. The approach is most appropriate when standard Peters–Belson decomposition leaves substantial unexplained disparity, typically exceeding 50% of the total observed difference between groups. Additionally, transfer learning is indicated when there exists theoretical justification to suspect unmeasured group-level confounding, such as structural racism, differential healthcare access patterns, or other systemic factors not captured by measured covariates. The method should be applied when principal component analysis identifies components exhibiting meaningful group differences as indicated by Cohen’s d effect sizes exceeding 0.3, and when the identified latent factors admit substantive interpretation relevant to the health disparity under investigation.
Conversely, transfer learning should be omitted in several circumstances. When measured covariates adequately explain observed disparities with minimal residual unexplained difference, the additional complexity of latent factor augmentation provides limited benefit. Similarly, when no principal components demonstrate meaningful group differences (that is, when all effect sizes satisfy $|d| < 0.3$), the transfer learning component contributes negligible additional explanatory power. The approach is also inappropriate when sample sizes are insufficient for stable principal component estimation, and when the “same-X” scenario applies wherein covariate distributions are identical between majority and minority groups.
2.3.5. Limitations of PCA’s Linear Assumptions
Our use of principal component analysis assumes that latent factors can be expressed as linear combinations of observed covariates. This assumption may be violated in several contexts. First, when true latent structures are inherently nonlinear in nature, linear combinations may fail to capture the underlying relationships adequately. Second, when important interactions exist among covariates that linear combinations cannot represent, the PCA-based approach may miss substantial variation relevant to health disparities. Third, when the relationship between covariates and survival outcomes involves threshold effects or other discontinuities, linear factor extraction may provide misleading results.
To partially address these limitations, we recommend applying nonlinear feature engineering, such as polynomial and interaction terms, to the covariate matrix prior to principal component analysis. This preprocessing strategy allows the linear PCA procedure to capture nonlinear patterns within the augmented feature space. Future methodological extensions could explore nonlinear dimensionality reduction techniques such as kernel PCA or uniform manifold approximation and projection (UMAP) [29], though such approaches present additional challenges for interpretability and computational efficiency.
2.3.6. Rationale for PCA Selection
We selected principal component analysis over alternative dimensionality reduction methods based on several considerations. First, interpretability is essential for policy-relevant health disparity research, and PCA loadings provide clear interpretation of what each latent factor represents in terms of the original measured covariates. Second, computational efficiency is critical given the iterative validation procedures central to our framework, and PCA scales efficiently to high-dimensional settings common in health disparity research involving numerous demographic, clinical, and socioeconomic variables. Third, the statistical properties of PCA are well-established, with known asymptotic behavior that facilitates theoretical justification and inference procedures. Fourth, reproducibility is enhanced by the deterministic nature of PCA, in contrast to stochastic methods such as t-distributed stochastic neighbor embedding that may yield different results across runs. Fifth, there exists substantial precedent for PCA in epidemiology and health services research for dimension reduction, facilitating comparison with existing literature and acceptance by the research community.
2.3.7. Enhanced Decomposition with Validation
Transfer learning enhances the traditional Peters–Belson decomposition by introducing an intermediate layer that captures latent factor contributions. The refined decomposition distinguishes between three sources of disparity:
$$\Delta(t) = \Delta_{\text{explained}}(t) + \Delta_{\text{latent}}(t) + \Delta_{\text{residual}}(t),$$
corresponding to traditional explained, latent factor explained, and residual unexplained disparity, respectively.
Each component undergoes validation to ensure logical bounds compliance and substantive interpretability. Traditional explained disparity captures differences attributable to measured covariates through conventional modeling approaches. Latent factor explained disparity represents additional explanation gained through transfer learning, typically reflecting structural or systemic factors not directly measured. Residual unexplained disparity encompasses remaining differences that neither measured covariates nor discovered latent factors can explain.
This enhanced decomposition provides policymakers with more nuanced insights into disparity sources, distinguishing between factors amenable to individual-level interventions (traditional explained) and those requiring structural or systemic approaches (latent factor explained).
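One plausible way to operationalize this three-way split at a single time point is sketched below, comparing the counterfactual built from measured covariates alone with the counterfactual built from measured covariates plus the retained latent factors; the input values are hypothetical, and the exact estimands used in the paper may differ in detail.

```python
def enhanced_decomposition(s1_obs, s2_obs, s2_cf_traditional, s2_cf_augmented):
    """Three-way split of the disparity at one time point.

    s2_cf_traditional : counterfactual using measured covariates only
    s2_cf_augmented   : counterfactual using measured covariates plus
                        the retained latent factors
    """
    return {
        "total": s1_obs - s2_obs,
        "traditional_explained": s1_obs - s2_cf_traditional,
        "latent_explained": s2_cf_traditional - s2_cf_augmented,
        "residual_unexplained": s2_cf_augmented - s2_obs,
    }

# Hypothetical values: latent factors move the counterfactual closer to
# the observed minority curve, so the residual component shrinks.
print(enhanced_decomposition(0.62, 0.48, 0.57, 0.545))
```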
2.4. Generalization-Aware Counterfactual Estimation
A critical component is ensuring counterfactual estimates generalize from majority to minority groups. We implement method-specific approaches:
For Machine Learning Models: performance-dependent interpolation based on validation results [15,30], with an interpolation weight $\alpha$ that depends on model type and validation performance.
3. Simulation Study
Our simulation study evaluates method performance across diverse scenarios representing the complexity and variability observed in real-world health disparity research. The design emphasizes realistic data-generating mechanisms that mirror the challenges encountered when analyzing survival differences between population subgroups using complex survey data.
3.1. Study Design
The simulation framework generates finite populations that reflect the demographic and clinical characteristics typical of large-scale epidemiological studies. Each simulated population contains $N_1$ = 12,000 majority group members and $N_2$ = 6,000 minority group members, establishing a realistic 2:1 ratio commonly observed in population health research. This population structure enables assessment of method performance under varying degrees of group size imbalance while maintaining sufficient statistical power for reliable inference.
Complex survey sampling procedures mirror real-world data collection approaches through stratified designs with three strata per group [31,32]. Majority group strata maintain proportions of 0.5, 0.3, and 0.2, while minority group strata reflect proportions of 0.6, 0.25, and 0.15, respectively. This asymmetric stratification design captures the differential sampling strategies often employed in studies targeting health disparities, where minority populations may be oversampled in certain geographic or socioeconomic strata.
Sample sizes range systematically from small to large configurations, enabling assessment of method performance across different statistical power scenarios. This range encompasses typical sample sizes in health disparity research while accounting for the additional complexity introduced by complex survey designs.
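The sketch below illustrates, under simplifying assumptions, how such a finite population and stratified sample might be generated: stratum membership follows the stated proportions, and a fixed number of units is drawn from each stratum so that design weights equal the inverse inclusion probabilities. The equal allocation across strata and the sample sizes are our own illustrative choices, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(2024)

N1, N2 = 12_000, 6_000                      # majority / minority population sizes (2:1)
strata_props = {1: [0.5, 0.3, 0.2],         # majority stratum proportions
                2: [0.6, 0.25, 0.15]}       # minority stratum proportions

def stratified_sample(strata, n_sample, n_strata=3):
    """Equal-allocation stratified sample; returns indices and design weights."""
    idx, weights = [], []
    per_stratum = n_sample // n_strata
    for h in range(n_strata):
        members = np.flatnonzero(strata == h)
        chosen = rng.choice(members, size=min(per_stratum, len(members)), replace=False)
        pi_h = len(chosen) / len(members)   # inclusion probability within stratum h
        idx.append(chosen)
        weights.append(np.full(len(chosen), 1.0 / pi_h))
    return np.concatenate(idx), np.concatenate(weights)

# Assign stratum membership in the finite population, then sample each group.
strata_maj = rng.choice(3, size=N1, p=strata_props[1])
strata_min = rng.choice(3, size=N2, p=strata_props[2])
idx_maj, w_maj = stratified_sample(strata_maj, n_sample=600)
idx_min, w_min = stratified_sample(strata_min, n_sample=300)
```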
3.1.1. Disparity Scenarios
The simulation encompasses five scenarios representing different sources and complexities of health disparities, each designed to reflect realistic mechanisms observed in epidemiological research. These scenarios progress systematically from simple linear relationships to complex multifactorial patterns involving unmeasured confounding.
The simple linear scenario establishes baseline performance under ideal conditions where disparities arise solely from different covariate distributions between groups while maintaining identical effect parameters. Covariate effects follow linear relationships with survival outcomes, and group differences manifest through shifted distributions of measured risk factors rather than differential treatment effects. This scenario provides a reference point for evaluating method performance under optimal conditions.
Complex interaction scenarios introduce substantial two-way and three-way interactions among covariates, coupled with nonlinear transformations that challenge traditional modeling approaches. These patterns reflect real-world situations where disparity sources involve complex relationships among socioeconomic status, clinical factors, and treatment variables. The interaction effects vary in magnitude across different covariate combinations, creating challenging prediction landscapes that test the ability of ensemble methods to capture multifaceted relationships.
Nonlinear relationship scenarios incorporate complex nonlinear covariate effects through polynomial terms, logarithmic transformations, and threshold effects. These patterns capture situations where traditional linear models face fundamental misspecification, such as when the relationship between age and survival exhibits threshold effects or when socioeconomic gradients follow nonlinear patterns. The nonlinearity introduces additional complexity that may favor flexible machine learning approaches over traditional Cox regression.
Unmeasured confounding scenarios represent perhaps the most challenging and realistic conditions, incorporating strong latent factors that influence both covariates and outcomes differently between groups. These latent factors represent structural determinants such as discrimination, healthcare system characteristics, and neighborhood effects that operate through unmeasured pathways. The confounding structure reflects the reality that observed covariates in health disparity research often serve as proxies for underlying structural factors rather than direct causal mechanisms.
Mixed complexity scenarios combine elements from all previous scenarios, creating the most challenging realistic conditions that ensemble methods and transfer learning approaches must handle. These scenarios integrate interactions, nonlinearity, and unmeasured confounding simultaneously, reflecting the multifaceted nature of real-world health disparities where multiple mechanisms operate concurrently.
3.1.2. Performance Metrics
Method evaluation employs metrics specifically tailored to Peters–Belson decomposition validity and interpretability. The proportion explained serves as the primary outcome measure, quantifying the fraction of total disparity attributable to measured covariates. This metric directly relates to policy relevance by indicating the potential impact of interventions targeting measured risk factors.
Validation success rates measure the proportion of models passing Peters–Belson logical bounds validation across simulation replications. This metric captures the reliability of different modeling approaches in producing mathematically valid counterfactual estimates, which is essential for maintaining scientific credibility in policy applications.
Logical violations track the frequency of bounds violations per method across time points, providing detailed assessment of validation framework effectiveness. This metric identifies methods prone to producing impossible results, such as negative explanations exceeding 100% or counterfactual estimates outside observable bounds.
Transfer learning benefit quantifies additional explanation provided by latent factors beyond traditional decomposition, measured as the percentage increase in explained disparity when transfer learning enhancements are applied. This metric assesses the practical value of discovering unmeasured group-level factors while accounting for computational costs.
Computational efficiency measures runtime including validation overhead relative to standard approaches, ensuring that methodological advances remain practical for real-world applications. This metric balances statistical performance improvements against computational feasibility, particularly important for routine use in health disparity research.
3.2. Results
3.2.1. Overall Method Performance
Table 1 presents comprehensive simulation results demonstrating the superior performance of validated ensemble methods across all scenarios.
The enhanced ensemble method demonstrates consistent improvements over individual Cox models, with gains ranging from 3.4 percentage points (simple linear) to 4.3 percentage points (mixed complexity). Transfer learning provides additional benefits particularly pronounced in scenarios with unmeasured confounding (12.5 percentage point improvement) and mixed complexity (8.3 percentage points).
The validation framework achieves 90.1% overall success rate in preventing logical violations, with higher success rates in simpler scenarios. This demonstrates the framework’s ability to maintain scientific validity while capturing complex disparity patterns.
3.2.2. Individual Method Contributions
Table 2 details the contribution of individual methods within the validated ensemble framework.
Cox proportional hazards models receive the highest ensemble weight (0.42) due to their high validation success rate and computational efficiency. Random survival forests and gradient boosting contribute substantially to ensemble performance, while neural networks are frequently excluded due to validation failures in complex scenarios.
3.2.3. Transfer Learning Analysis
Transfer learning provides meaningful improvements when unmeasured group-level factors exist, as detailed in Table 3.
Transfer learning proves most beneficial in scenarios with unmeasured confounding (21.1% additional explanation) and mixed complexity (20.7%), where latent factors capture group-level differences not reflected in measured covariates. The method automatically adapts to scenario complexity, providing minimal additional explanation when measured covariates adequately capture group differences.
Figure 2 provides comprehensive visualization of our simulation results, illustrating the superior performance of validated ensemble methods and the conditional benefits of transfer learning across different disparity scenarios.
3.3. Enhanced Precision Results
Table 4 presents unexplained disparity estimates with four-decimal-place precision, confirming that apparent similarities at two decimal places reflect rounding rather than computational artifacts. Notably, scenarios involving differential covariate effects between groups (diff-$\beta$ scenarios) demonstrate approximately twice the magnitude of unexplained disparity compared to scenarios with identical covariate effects across groups, consistent with theoretical expectations regarding the sources of health disparities.
4. SEER Pancreatic Cancer Data Application
4.1. Data Source and Analysis Framework
We demonstrate our validated ensemble Peters–Belson methodology using pancreatic cancer data extracted from the Surveillance, Epidemiology, and End Results (SEER) program database. The SEER program, maintained by the National Cancer Institute, collects data on cancer incidence, treatment, and survival from population-based cancer registries covering approximately 48% of the U.S. population. We extracted all pancreatic cancer cases (ICD-O-3 site codes C25.0–C25.9) diagnosed between 2004 and 2018, with follow-up through 2019. The final analytic sample included patients with complete covariate information on age at diagnosis, sex, race/ethnicity, tumor stage, grade, and treatment modality. The analysis framework applies our validated ensemble approach to decompose observed survival disparities between demographic groups.
4.2. Peters–Belson Decomposition Results
Table 5 presents the comprehensive Peters–Belson decomposition results using our validated ensemble methodology applied to the SEER pancreatic cancer data.
4.3. Transfer Learning Analysis in SEER Data
Application of transfer learning to the SEER pancreatic cancer data revealed important latent factor contributions. In the sex-based analysis, transfer learning identified an average of 3.2 significant latent factors explaining an additional 8.7% of previously unexplained disparity, primarily related to treatment delay patterns and socioeconomic clustering. For the racial/ethnic analysis, an average of 4.1 latent factors captured an additional 15.8% of unexplained disparity, reflecting complex interactions between socioeconomic status, healthcare access patterns, and geographic factors.
5. Discussion
The findings from our comprehensive simulation study and SEER data application reveal important insights into the performance and utility of validated ensemble Peters–Belson methods for health disparity decomposition. Our results demonstrate that methodological advances in ensemble machine learning, when properly validated and combined with transfer learning approaches, can substantially improve both the accuracy and interpretability of disparity analyses in survival settings.
The superior performance of validated ensemble methods compared to individual models represents more than incremental improvement. The consistent 3–5 percentage point gains in disparity explanation across all scenarios suggest that ensemble approaches capture complementary aspects of the survival process that individual models miss. This is particularly evident in complex scenarios where traditional Cox models struggle with nonlinear relationships and high-order interactions. The ensemble framework effectively leverages the strengths of different modeling approaches while mitigating their individual weaknesses through the validation process.
The validation framework emerges as a critical methodological innovation that addresses fundamental problems in counterfactual estimation for Peters–Belson decomposition. The dramatic reduction in logical violations from 34.7% to 2.1% of cases represents a qualitative improvement in scientific validity. Previous applications of machine learning to Peters–Belson methods often produced mathematically impossible results where counterfactual estimates exceeded logical bounds or implied negative explanations greater than 100%. Our validation framework prevents these contaminating effects while preserving the flexibility benefits of machine learning approaches.
Transfer learning provides conditional but substantial benefits that depend critically on the presence of unmeasured group-level confounding. The 16.1% average additional explanation represents meaningful progress in understanding disparity sources, particularly given the challenges of capturing structural factors through measured covariates alone. The method’s ability to automatically adapt its contribution based on data characteristics prevents unnecessary complexity in simple scenarios while providing substantial gains where unmeasured factors operate. This adaptive behavior suggests that transfer learning captures genuine signal rather than overfitting to noise.
The computational efficiency of our approach, with only 31% overhead for comprehensive validation, makes it practical for real-world applications. This efficiency reflects careful optimization of the validation procedures and ensemble weight calculation. The modest computational cost is more than justified by the substantial improvements in result validity and interpretability.
Our SEER analysis findings align with the broader health disparities literature while providing new insights into the sources of observed differences. The finding that measured covariates explain approximately 30% of survival disparities is consistent with previous research suggesting substantial roles for unmeasured factors including discrimination, treatment quality differences, and biological variations [33,34,35]. The temporal stability of explanation proportions suggests persistent structural factors rather than evolving disparity patterns, which has important implications for intervention design.
The transfer learning benefits observed in the SEER analysis, ranging from 9% to 16% additional explanation, demonstrate the practical value of capturing unmeasured group-level differences. These improvements likely reflect complex interactions between socioeconomic status, geographic factors, and healthcare system characteristics that are difficult to measure directly but influence survival outcomes through multiple pathways [36,37].
Several methodological limitations warrant consideration. While our Peters–Belson framework provides counterfactual interpretation, causal inference requires additional assumptions about exchangeability and no unmeasured confounding that we cannot fully verify [38]. Future work should explore integration with formal causal inference frameworks that explicitly address these assumptions. The computational requirements, while reasonable for current applications, may require enhanced dimension reduction procedures for very high-dimensional data common in genomic applications.
Our reliance on linear principal component analysis for latent factor discovery represents a conservative choice that may miss nonlinear latent structures. Future extensions could explore nonlinear dimensionality reduction techniques such as UMAP or variational autoencoders for more flexible latent factor discovery [29]. Additionally, the current framework assumes that the set of covariates needed for valid decomposition is the same across groups, which may not hold when different confounding structures operate in majority versus minority populations.
The validation framework, while comprehensive, focuses on logical bounds compliance and may not detect all forms of model misspecification. Future work could incorporate additional validation criteria including calibration assessment across different subgroups and stability analysis under covariate perturbations. The current approach also does not fully address missing data patterns common in complex survey settings, which could be addressed through specialized imputation methods designed for Peters–Belson applications.
Despite these limitations, our findings have important implications for health disparity research and policy. The demonstration that realistic health disparity patterns show 25–35% of differences explained by measured factors provides actionable targets for intervention design. The substantial unexplained portions highlight the continued importance of addressing structural and systemic factors that operate beyond individual-level characteristics. The validated ensemble framework provides researchers and policymakers with principled tools for decomposing health disparities that balance methodological sophistication with practical usability.
The modular design of our approach facilitates extension to additional disease contexts and incorporation of emerging data types. The validation procedures ensure that extensions maintain scientific validity while the ensemble framework can accommodate new modeling approaches as they are developed. This flexibility positions the methodology to evolve with advancing machine learning techniques while preserving the core Peters–Belson logic that enables disparity decomposition.
Future applications should prioritize extension to longitudinal settings with time-varying covariates, development of enhanced causal inference frameworks that explicitly address unmeasured confounding assumptions, and incorporation of complex survey design features beyond stratification. The framework’s emphasis on validation and logical consistency provides a foundation for these extensions while maintaining interpretability for policy applications.
6. Conclusions
This work demonstrates that validated ensemble machine learning approaches provide substantial advantages for Peters–Belson decomposition in survival analysis when combined with appropriate validation procedures and transfer learning enhancements. The methodology addresses key limitations in existing approaches while maintaining computational feasibility and scientific interpretability.
Key contributions include a validated ensemble framework that ensures logical bounds compliance while capturing complex survival patterns, transfer learning methods that conditionally provide meaningful improvements when unmeasured confounding exists, comprehensive validation procedures that achieve high success rates in preventing mathematical violations, and demonstration of realistic performance showing that measured factors explain 25–35% of health disparities.
The framework provides researchers and policymakers with principled tools for decomposing health disparities that balance methodological sophistication with practical usability. By ensuring logical validity through comprehensive validation while maintaining computational efficiency, these methods enable more accurate and interpretable disparity research focused on evidence-based intervention strategies.
Future applications should explore extension to additional disease contexts, development of enhanced causal inference frameworks, and incorporation of emerging data types in population health research. The modular design facilitates such extensions while maintaining the core principles of validation and logical consistency that ensure scientific validity.