Article

Integrating Copula-Based Random Forest and Deep Learning Approaches for Analyzing Heterogeneous Treatment Effects in Survival Analysis

by
Jong-Min Kim
Statistics Discipline, Division of Science and Mathematics, University of Minnesota-Morris, Morris, MN 56267, USA
Mathematics 2025, 13(10), 1659; https://doi.org/10.3390/math13101659
Submission received: 16 April 2025 / Revised: 13 May 2025 / Accepted: 17 May 2025 / Published: 19 May 2025
(This article belongs to the Special Issue Computational Methods and Machine Learning for Causal Inference)

Abstract

This paper integrates deep learning models, specifically Long Short-Term Memory (LSTM) networks and hybrid Convolutional Neural Network–LSTM (CNN-LSTM) networks, with a Copula-Based Random Forest (CBRF) model to estimate Heterogeneous Treatment Effects (HTEs) in survival analysis. The proposed method is designed to capture non-linear relationships and temporal dependencies in clinical and genomic data, with a particular focus on exploring how treatment effects vary by race as a moderating factor. Using breast cancer data from the TCGA-BRCA dataset, which includes both clinical variables and gene expression profiles, we filter the data to focus on two racial groups: Black or African American and White. Dimensionality reduction is performed using Principal Component Analysis (PCA). We compare the CNN-LSTM, LSTM, and CBRF models under three weighting strategies—no weights, Horvitz–Thompson (HT) weights, and Inverse Probability of Treatment Weighting (IPTW)—for predicting treatment effects. Model performance is evaluated using Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Concordance statistic (C-statistic), Average Treatment Effect (ATE), and Conditional Average Treatment Effect (CATE) by race. The CNN-LSTM model consistently outperforms the others, achieving the lowest prediction errors and highest discrimination, particularly under IPTW. Among the weighting strategies, IPTW yields the most substantial improvements in model performance and bias reduction. Importantly, race-specific treatment effects exhibit notable variation: CNN-LSTM estimates a slightly higher CATE for Black individuals under IPTW. Overall, CNN-LSTM with IPTW is recommended for robust and equitable causal inference, especially in racially stratified settings.

1. Introduction

Over the past five years, researchers have significantly advanced the estimation and application of heterogeneous treatment effects (HTEs). These advancements result from both methodological innovations and an increasing number of applications across diverse fields, particularly in clinical and healthcare domains. Recent studies integrate advanced machine learning techniques, data fusion strategies, and statistical models to improve the precision of treatment effect estimation [1]. These developments have important implications for clinical practice, especially in enabling personalized treatment strategies. Looking ahead, the continued integration of machine learning, Bayesian modeling, and meta-analytic frameworks promises to enhance our understanding and application of HTE across various healthcare settings [2,3]. For example, [2] apply flexible causal machine learning estimators in a comprehensive case study that explores the heterogeneous survival effects of two radiotherapy strategies for localized high-risk prostate cancer. Understanding how treatments affect distinct subgroups, especially those defined by race, gender, or other demographic variables, remains a major challenge in medical research. Analyzing HTE across such subpopulations deepens our understanding of treatment efficacy and supports the design of more personalized and equitable healthcare interventions [4,5,6].
HTE plays a critical role in survival analysis because treatment effects rarely remain uniform across all individuals [7]. Patients differ in baseline characteristics, risk factors, comorbidities, and biological responses to interventions. Overlooking this heterogeneity may lead to inaccurate conclusions about treatment efficacy, resulting in suboptimal or potentially harmful decisions for specific subgroups [8]. Accurately modeling HTE enables the development of more personalized treatment strategies, better allocation of healthcare resources, and improved patient outcomes [9]. Nevertheless, existing methods for estimating HTE in survival contexts exhibit several limitations. Traditional models like the Cox proportional hazards model often assume constant treatment effects across populations, potentially masking significant subgroup differences [10]. Parametric and semiparametric models frequently impose rigid assumptions, such as proportional hazards, which may not hold in complex, real-world data settings [11]. While machine learning methods adapted for HTE, including causal forests, have been extended to handle survival data [12], they still struggle to integrate censoring appropriately without strong assumptions. Ref. [13] introduce DeepSurv, a deep neural network variant of the Cox model that captures interactions between patient covariates and treatment effects to generate personalized recommendations. To evaluate recent survival machine learning methods, such as DeepSurv,  [2] conducted a simulation study across varied settings involving confounded HTE and covariate overlap. Ref. [14] propose a deep learning architecture for dynamic survival modeling using longitudinal data.
Copula-based models have recently emerged as a promising approach to address these challenges [15]. By capturing the dependence structure between random variables, copulas enable more flexible modeling of relationships between covariates and treatment effects [16]. These methods capture nonlinear dependencies, manage missing data more effectively, and provide a robust framework for accounting for unobserved confounders. Traditional survival models like the Cox model [10] and survival random forests [17] often assume simple covariate independence or require the model to implicitly learn complex dependencies. In practice, however, covariates often exhibit nonlinear and high-order dependencies. Neglecting these dependencies can bias treatment effect estimates and degrade predictive performance. Although recent HTE methods [9,12] offer greater flexibility through tree-based techniques, they typically rely on the assumption of covariate independence in their partitioning schemes. In survival analysis,  [17] develop random survival forests for right censored data, while [12] extend causal forests for HTE estimation. Alternatively,  [18] propose DeepHit, a neural network that directly models survival time distributions without assuming an underlying stochastic process. These models often use evaluation metrics, such as the concordance index [19], Brier scores, and calibration plots to assess treatment effect estimation performance.
To address these limitations, we propose incorporating copula-based transformations into survival models to explicitly model and exploit complex feature dependencies, thereby improving the estimation of HTE under censoring. Our main contribution lies in the development of a Copula-Based Random Forest (CBRF) framework for survival analysis, where copulas model feature dependencies before constructing the random forest [16]. By explicitly addressing these dependencies, our approach improves both the interpretability and predictive performance of survival models in HTE settings. Investigating racial disparities in healthcare outcomes is especially important in survival contexts. While survival models can predict patient outcomes, their effectiveness often depends on how well they account for race, treatment status, and their interaction. Individuals from different racial groups may respond differently to treatment due to genetic, environmental, or socio-economic factors [20,21]. Incorporating such heterogeneity into survival models is thus essential for equitable predictive modeling.
In this study, we develop an integrated framework that combines deep learning models, specifically Long Short-Term Memory (LSTM) networks and hybrid Convolutional Neural Network (CNN)–LSTM models with a CBRF model. The LSTM model captures temporal dependencies in the data, while the CNN-LSTM hybrid extracts complex spatial and temporal features simultaneously. Meanwhile, the CBRF model captures nonlinear interactions, especially those involving race and treatment. We apply our methodology to breast cancer data from the TCGA-BRCA dataset, focusing on clinical features and gene expression data from Black or African American and White patients. We reduce the dimensionality of gene expression data using Principal Component Analysis (PCA) before model training. We evaluate all models using performance metrics including Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the concordance index, along with treatment effect measures such as Average Treatment Effect (ATE) and Conditional Average Treatment Effect (CATE).
This research underscores the importance of incorporating HTE in survival models and demonstrates how race influences treatment response. Our findings suggest that machine learning models can support more equitable and personalized treatment recommendations, particularly in cancer treatment and survivorship planning [22].
The rest of the paper is structured as follows: Section 2 describes the dataset and the data collection process. Section 3 outlines the methodology, including preprocessing steps, model development, HT weight, IPTW, and evaluation criteria. Section 4 presents results, model performance metrics, and treatment effect estimates. Section 5 discusses the implications of our findings and suggests future research directions.

2. Differential Gene Expression Analysis of TCGA-BRCA Data

This study employs the DESeq2 R package to perform differential gene expression analysis on breast cancer data from the TCGA-BRCA dataset [23]. The analysis workflow begins with querying, downloading, and preparing gene expression data from the Genomic Data Commons (GDC), followed by constructing a dataset compatible with DESeq2 to identify Differentially Expressed Genes (DEGs). This workflow provides a systematic and reproducible framework for assessing gene expression changes in breast cancer samples, thereby offering valuable insights into the molecular mechanisms underlying tumor progression and treatment response.
Gene expression analyses of cancer datasets, such as those available through The Cancer Genome Atlas (TCGA), consistently yield significant insights into the molecular alterations associated with tumorigenesis [24,25,26]. In particular, RNA Seq-based differential gene expression analysis facilitates the identification of genes that are significantly upregulated or downregulated in cancerous tissues relative to normal tissues. DESeq2, a widely adopted tool for RNA Seq data analysis, normalizes raw counts and enables the identification of DEGs between conditions, such as tumor versus normal tissue [23].
To access the TCGA-BRCA dataset, we utilize the TCGAbiolinks package in R [24]. We construct a query targeting gene expression quantification data under the “Transcriptome Profiling” category for the TCGA-BRCA project. The raw RNA Seq count data are subsequently downloaded using the GDCdownload() function and structured using GDCprepare(), yielding a SummarizedExperiment object that integrates both the gene expression matrix and corresponding clinical metadata.
To maintain consistency and comparability in the downstream analysis, we filter the dataset based on the race metadata field, retaining only patients who self-identify as either Black or African American or White. We extract vital status information and encode it as a binary outcome (1 = deceased, 0 = alive). We remove genes with zero variance across samples to eliminate non-informative predictors.
Given the high dimensionality of transcriptomic data, we apply Principal Component Analysis (PCA) to reduce dimensionality while preserving the majority of variance. We transpose the gene expression matrix so that rows represent samples, and columns represent genes. Following variance filtering, we perform PCA using the prcomp() function in R, enabling centering and scaling. We retain the first six Principal Components (PCs), which capture the largest proportion of variance in gene expression, for use in subsequent predictive modeling.
We process and align clinical covariates such as race and vital_status with the gene expression data. After filtering, we subset the gene expression matrix to include only those samples for which clinical metadata are available. This alignment ensures that the rows of the reduced PCA matrix correspond exactly to the samples included in the clinical dataset. The final dataset comprises the following:
  • A reduced gene expression matrix (samples × 6 PCs),
  • A categorical variable indicating race (Black or African American vs. White),
  • A binary outcome variable representing vital status.
We conduct all preprocessing steps in R using the following packages: TCGAbiolinks, DESeq2, dplyr, ggplot2, keras, randomForest, copula, Hmisc, and tidyr. These tools collectively support querying TCGA data, reducing dimensionality, handling clinical covariates, and preparing the dataset for advanced statistical modeling and machine learning applications.
Figure 1 presents the PCA screeplot for the breast cancer dataset. Based on this plot, we select six principal components for further analysis. The analysis begins by loading the gene expression and clinical data. The clinical dataset includes patient attributes such as race and vital status (alive or deceased). We categorize race into two groups: Black or African American and White, and filter the dataset to retain only these groups. We apply PCA to the gene expression data to reduce dimensionality and improve model training and interpretability.
Let $X \in \mathbb{R}^{n \times m}$ denote the gene expression matrix, where $n$ is the number of samples and $m$ is the number of genes. After performing PCA, we project the data onto the first $k$ principal components:
$$X_{\text{reduced}} = X \cdot V_k$$
Here, $V_k$ contains the top $k$ principal components, and $X_{\text{reduced}} \in \mathbb{R}^{n \times k}$ represents the reduced dataset, with $\cdot$ denoting matrix multiplication.
The preprocessing pipeline consists of the following steps:
  • Convert the variable race into a factor to ensure appropriate handling in modeling.
  • Encode the vital_status variable as a binary outcome (0 = alive, 1 = deceased) to facilitate survival analysis.
  • Normalize and transform the gene expression matrix to a format compatible with statistical models.
This structured and filtered dataset serves as the foundation for downstream survival analysis and model development.
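For concreteness, a minimal R sketch of this preprocessing pipeline, condensed from Appendix A; the explicit recoding of vital status to 0/1 assumes the metadata labels are "Alive" and "Dead":

# Preprocessing sketch (condensed from Appendix A): filter by race, encode the
# outcome, and reduce the expression data to six principal components.
col_data$race <- as.factor(col_data$race)
col_data <- col_data[col_data$race %in% c("black or african american", "white"), ]
col_data$vital_status <- ifelse(col_data$vital_status == "Dead", 1, 0)  # 1 = deceased, 0 = alive (assumed labels)
expression_matrix <- expression_matrix[, rownames(col_data)]

expression_data <- t(expression_matrix)                      # rows = samples, columns = genes
keep <- apply(expression_data, 2, var) > 0                   # drop zero-variance genes
pca_result <- prcomp(expression_data[, keep], center = TRUE, scale. = TRUE)
reduced_data_matrix <- pca_result$x[, 1:6]                   # first six principal components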

3. LSTM, CNN-LSTM, and Copula-Based Random Forest

Researchers widely adopt Long Short-Term Memory (LSTM) networks and CNN-LSTM models for time-series analysis, particularly in contexts that involve sequential dependencies. The Random Forest (RF) algorithm serves as an ensemble learning method based on decision trees. It trains multiple decision trees on bootstrapped samples and aggregates predictions through majority voting (for classification) or averaging (for regression). RF naturally handles nonlinearity, feature selection, and variable interactions. However, it assumes independence among input features, which often limits its effectiveness in scenarios with strong inter-feature dependencies.
Real-world datasets frequently contain complex, nonlinear dependencies that standard RF models fail to capture. To address this limitation, we enhance RF using copula transformations. This section introduces how we apply copula transformations to construct the Copula-Based Random Forest (CBRF) model, enabling structured dependency learning across features.

3.1. Gaussian Copula Transformation

The Gaussian copula transformation is a popular method for modeling multivariate dependence structures, especially when marginal distributions are non-normal. This approach decouples the modeling of marginal distributions from the joint dependence structure, making it a flexible tool for multivariate analysis. According to Sklar’s Theorem, any multivariate Cumulative Distribution Function (CDF) $F(x_1, x_2, \ldots, x_d)$ with continuous marginals $F_1, F_2, \ldots, F_d$ can be expressed in terms of a copula $C$ as:
$$F(x_1, x_2, \ldots, x_d) = C\bigl(F_1(x_1), F_2(x_2), \ldots, F_d(x_d)\bigr),$$
where the copula $C : [0,1]^d \to [0,1]$ captures the dependence structure between variables, independent of their marginal distributions [27].
Given a $d$-dimensional random vector $X = (X_1, X_2, \ldots, X_d)$, each marginal $X_j$ is transformed to the unit interval using its CDF:
$$U_j = F_j(X_j), \quad j = 1, \ldots, d.$$
In practice, when the true marginal distributions are unknown, empirical cumulative distribution functions $\hat{F}_j$ are used, resulting in pseudo-observations $\hat{U}_j = \hat{F}_j(X_j)$. The transformed vector $U = (U_1, \ldots, U_d)$ has uniform marginals on $[0,1]$.
To apply a Gaussian copula, the uniformly distributed variables are then transformed to standard normal marginals via the probit function, i.e., the inverse standard normal CDF:
$$Z_j = \Phi^{-1}(U_j), \quad j = 1, \ldots, d,$$
where $\Phi$ denotes the standard normal cumulative distribution function. The resulting vector $Z = (Z_1, \ldots, Z_d)$ has standard normal marginals and retains the dependence structure of the original data under the Gaussian copula assumption.
The dependence structure of the transformed data $Z$ is characterized by estimating the empirical correlation matrix $\hat{\Sigma}$. The Gaussian copula $C_\Sigma$ is defined as:
$$C_\Sigma(u_1, \ldots, u_d) = \Phi_\Sigma\bigl(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_d)\bigr),$$
where $\Phi_\Sigma$ is the joint CDF of a multivariate normal distribution with zero mean and correlation matrix $\Sigma$. This copula captures the full joint dependence while preserving the marginal distributions of the original data.
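A minimal R sketch of this transformation, assuming a numeric feature matrix X (in our application, the PCA score matrix from Section 2) and using pobs() from the copula package for the pseudo-observations:

library(copula)

# Gaussian copula transformation sketch (X: numeric feature matrix, samples in rows)
U <- pobs(X)              # pseudo-observations U_j = F_hat_j(X_j) in (0, 1)
Z <- qnorm(U)             # probit step: Z_j = Phi^{-1}(U_j), standard normal marginals
Sigma_hat <- cor(Z)       # empirical correlation matrix characterizing the dependence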
The Gaussian copula transformation is useful in applications where complex marginal distributions need to be modeled while assuming a multivariate normal-like dependence structure. It has been widely applied in fields, such as financial econometrics, survival analysis, and risk management, and serves as a foundation for copula-based regression and deep learning models that capture non-linear and non-Gaussian dependencies [28].
In survival analysis, censored data (e.g., when subjects drop out or the study ends before the event of interest occurs) are common. Copulas are effective tools for modeling such data, as they handle both censored and uncensored data by capturing the dependence structure between them. This allows for more accurate estimation of treatment effects, even when some data are censored [29].
Moreover, in treatment effect estimation, copulas provide a powerful method for modeling non-linear, dependent, and higher-order relationships among variables, especially in HTE settings. By using copulas, we can obtain more reliable treatment effect estimates, particularly when other models struggle to account for complex dependencies [30].

3.2. Copula-Based Random Forest Model

Common copula families include the Gaussian copula (symmetric dependence), the Clayton copula (asymmetric lower-tail dependence), the Gumbel copula (upper-tail dependence), the Frank copula (both positive and negative dependence), and the empirical copula (a nonparametric, data-driven approach). We propose CBRF, a hybrid model that integrates copula functions with the traditional random forest algorithm to capture nonlinear dependencies and complex interactions between features in high-dimensional data. The key idea in this paper is to use the Gaussian copula function to model dependence structures before feeding the transformed features into a Random Forest (RF) model [31].
This approach enhances the flexibility and robustness of random forests, especially in scenarios where traditional independence assumptions are not valid. It is particularly useful for HTE estimation, financial risk modeling, survival analysis, and spatial data modeling.
The CBRF modifies standard RF by transforming the feature space using copulas before feeding it into the RF model. In this research, we apply the Gaussian copula to RF for data analysis. CBRF combines the concepts of copula theory and random forests to model complex dependencies between variables in multivariate data. It is particularly useful for dealing with high-dimensional datasets where the relationship between variables is intricate and nonlinear.
Random forests are ensembles of decision trees trained using random subsets of data and features. For regression tasks, each tree in the forest provides a prediction, and the final prediction is typically the average of all tree predictions:
$$f_{\text{RF}}(X) = \frac{1}{T} \sum_{t=1}^{T} h_t(X)$$
where $h_t(X)$ is the prediction from the $t$-th tree and $T$ is the total number of trees in the forest [31].
The CBRF model integrates Gaussian copula transformations with random forests to effectively model complex, non-linear dependencies among features, particularly in high-dimensional and heterogeneous datasets. This hybrid approach benefits from the flexibility of the copula framework and the robustness of ensemble learning.
The Gaussian copula enables the modeling of multivariate dependencies while preserving the marginal distributions of the original variables. To implement the CBRF, pseudo-observations are first obtained from the empirical marginal distributions. These are then transformed to a standard normal scale using the inverse of the standard normal CDF. The transformed features are used to train a random forest model.
The training process involves building decision trees using splitting criteria based on the transformed feature space. The final prediction is obtained by averaging the predictions across all trees:
$$\hat{y} = \frac{1}{N_T} \sum_{t=1}^{N_T} T_t(x)$$
where $T_t(x)$ is the prediction from the $t$-th tree for input $x$, and $N_T$ is the total number of trees.
This methodology captures both marginal behavior and inter-variable dependencies while maintaining the non-parametric, data-driven strengths of random forests. The randomForest() function is employed to train the model using bootstrapped samples of the copula-transformed data. The ensemble of trees captures non-linear patterns without requiring parametric assumptions, thus enhancing predictive performance.
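A condensed R sketch of this training step, assuming the PCA score matrix reduced_data_matrix and the outcome vector from Section 2 (the full routine appears in Appendix A; here predictions are made on the same copula-transformed features used for training):

library(copula)
library(randomForest)

# CBRF sketch: Gaussian copula transform of the features, then a random forest
u <- pobs(reduced_data_matrix)       # empirical pseudo-observations
z <- qnorm(u)                        # map to standard normal marginals
cbrf_fit <- randomForest(outcome ~ ., data = as.data.frame(z), ntree = 100)
cbrf_pred <- predict(cbrf_fit, newdata = as.data.frame(z))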

3.3. LSTM and CNN-LSTM

LSTM networks use memory cells to retain long-term dependencies in time-series data. The main equations governing LSTM cell states are the following:
$$\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f), \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i), \\
\tilde{C}_t &= \tanh(W_C [h_{t-1}, x_t] + b_C), \\
C_t &= f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t, \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o), \\
h_t &= o_t \cdot \tanh(C_t).
\end{aligned}$$
Here, $f_t$ is the forget gate, $i_t$ is the input gate, $o_t$ is the output gate, $C_t$ is the cell state, and $h_t$ is the hidden state.
The LSTM is a type of Recurrent Neural Network (RNN) designed to handle sequential data. LSTM cells are known for their ability to retain long-term dependencies, making them suitable for time-series forecasting, gene expression data, and other tasks involving sequential data. The LSTM model takes a 3D input of shape (samples, time_steps, features). In this case, the reshaped expression data (copula_transformed_reshaped) have dimensions (n_samples, 1, n_features), where n_samples is the number of data points (patients), 1 is the time step (the data are not a temporal sequence in the classical sense; each sample is treated as a sequence of length 1), and n_features is the number of principal components. The core of the model is an LSTM layer with 64 units, which processes the input sequence and learns patterns in the data; the argument return_sequences = FALSE indicates that only the hidden state at the final time step is passed to the next layer. This is followed by a dense layer with 32 units and ReLU activation, which introduces non-linearity and learns more complex relationships between the features. The final layer is a dense layer with a single neuron that outputs the predicted outcome (the vital status). The model is compiled with the MSE loss function, which is appropriate for regression tasks with a continuous output, and the Adam optimizer, an adaptive method that works well for a variety of tasks, is used to adjust the model weights during training. The model is trained with the fit() method, using the copula-transformed data as input and the outcome (vital status) as the target variable, for 20 epochs with a batch size of 32 and 20% of the data reserved for validation.
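In R keras syntax, this architecture reduces to the following sketch (matching the full code in Appendix A):

library(keras)

# LSTM sketch: input shape (time_steps = 1, n_features), 64 LSTM units,
# a 32-unit ReLU dense layer, and a single linear output; MSE loss with Adam.
lstm_model <- keras_model_sequential() %>%
  layer_lstm(units = 64, input_shape = c(1, n_features), return_sequences = FALSE) %>%
  layer_dense(units = 32, activation = "relu") %>%
  layer_dense(units = 1)

lstm_model %>% compile(loss = "mean_squared_error", optimizer = optimizer_adam())
lstm_model %>% fit(copula_transformed_reshaped, outcome,
                   epochs = 20, batch_size = 32, validation_split = 0.2)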
The CNN-LSTM model is a hybrid deep learning architecture that combines Convolutional Neural Networks (CNNs) and LSTM networks. This architecture leverages the strengths of both CNNs (for spatial feature extraction) and LSTMs (for temporal sequence modeling), making it highly suitable for tasks involving spatiotemporal data such as time series forecasting, video classification, and functional data analysis. The CNN component is designed to extract high-level local features from the input sequence. For time series data, 1D convolutions are typically employed to identify short-term dependencies and localized temporal patterns across fixed-size windows. The LSTM network models long-term dependencies and sequential dynamics by utilizing memory cells regulated by input, forget, and output gates. This allows the model to retain important temporal information over extended time horizons. The CNN-LSTM architecture proceeds as follows: the raw sequential input $X = \{x_1, x_2, \ldots, x_T\} \in \mathbb{R}^{T \times d}$ is segmented into fixed-size (possibly overlapping) windows. These windows are processed through one or more 1D convolutional layers with kernel size $k$, resulting in feature maps that capture localized temporal structure. The resulting feature maps are reshaped as needed and passed through LSTM layers, which model the temporal dynamics of the CNN-extracted features. The LSTM output (either the last hidden state or the entire sequence, depending on the task) is fed into one or more fully connected (dense) layers to produce the final output.
Let Conv ( X ) denote the CNN output and LSTM ( · ) the LSTM operation. The CNN-LSTM model can be expressed as follows:
$$H = \text{LSTM}(\text{Conv}(X))$$
$$\hat{y} = \sigma(W H + b)$$
where $H$ is the hidden representation from the LSTM, $\hat{y}$ is the predicted output, $\sigma$ is an activation function (e.g., softmax or identity), and $W$ and $b$ are learnable weight and bias parameters.
CNN layers capture local patterns, while the LSTM models global temporal dependencies; the convolutional layers also reduce input dimensionality before temporal modeling, enabling efficient handling of high-dimensional and long-sequence data. Like the LSTM model, the CNN-LSTM model takes a 3D input tensor of shape (samples, time_steps, features) with dimensions (n_samples, 1, n_features). A 1D convolutional layer with 64 filters and a kernel size of 1 is applied to the input data to extract local patterns; the kernel size of 1 allows the model to learn dependencies among the features within each sample, and ReLU activation introduces non-linearity. A max pooling layer with a pool size of 1 follows, which leaves the data size unchanged. The LSTM layer with 64 units then processes the sequential dependencies of the data; as in the LSTM model, return_sequences = FALSE ensures that only the final hidden state is passed to the next layer. A dense layer with 32 units and ReLU activation further processes the features learned by the CNN and LSTM layers, and the final dense layer with a single neuron outputs the predicted value (vital status). The model is compiled with MSE as the loss function and the Adam optimizer, and is trained like the LSTM model using the fit() method, with 20 epochs, a batch size of 32, and 20% of the data used for validation.
In the CNN-LSTM architecture, we employ a convolutional kernel of size 1 to perform pointwise transformations. This design choice allows the model to reweight features at each time step without aggregating information across adjacent steps, thereby preserving the temporal resolution of the input sequence. The pooling size is also set to 1, which maintains the sequence length across layers. Although such configurations do not introduce spatial abstraction, they are beneficial in scenarios where per-time-step feature transformation is prioritized over downsampling. Future work may consider alternative kernel and pooling sizes to explore their effects on feature summarization and model performance.
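A matching R keras sketch of this configuration (see Appendix A for the full training code):

library(keras)

# CNN-LSTM sketch: pointwise 1D convolution (kernel_size = 1) and pooling of size 1
# preserve the sequence length, followed by the same LSTM/dense stack as above.
cnn_lstm_model <- keras_model_sequential() %>%
  layer_conv_1d(filters = 64, kernel_size = 1, activation = "relu",
                input_shape = c(1, n_features)) %>%
  layer_max_pooling_1d(pool_size = 1) %>%
  layer_lstm(units = 64, return_sequences = FALSE) %>%
  layer_dense(units = 32, activation = "relu") %>%
  layer_dense(units = 1)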

3.4. HT and IPTW Weights in Causal Inference

In causal inference, HT weights are used to correct for selection bias and estimate causal effects when treatment assignment is not randomized. These weights originate from survey sampling and are adapted for use under the potential outcomes framework.
The HT weight for each unit is the inverse of the probability of receiving the treatment actually received:
$$w_i =
\begin{cases}
\dfrac{1}{e(X_i)} & \text{if } T_i = 1, \\[6pt]
\dfrac{1}{1 - e(X_i)} & \text{if } T_i = 0,
\end{cases}$$
where $w_i$ is the HT weight for unit $i$, $T_i$ is the treatment indicator (1 for treated, 0 for control), and $e(X_i) = P(T_i = 1 \mid X_i)$ is the propensity score, i.e., the probability of receiving the treatment given covariates $X_i$.
HT weights allow for reweighting the sample to approximate a randomized experiment, balancing the covariate distribution across treatment groups.
HT weights are commonly used to estimate the Average Treatment Effect (ATE):
$$\widehat{\text{ATE}}_{\text{HT}} = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{T_i Y_i}{e(X_i)} - \frac{(1 - T_i) Y_i}{1 - e(X_i)} \right]$$
This estimator is unbiased under two assumptions: there are no unmeasured confounders, and each unit has a nonzero probability of receiving either treatment.
HT weights can be unstable when propensity scores are close to 0 or 1. In practice, stabilized weights or trimming of extreme weights is often used.
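As a concrete illustration, a minimal R sketch of this construction; here the propensity score is estimated with a logistic regression on the covariates (an illustrative choice; in Section 4 the weights are instead derived from marginal group probabilities), with treat denoting the 0/1 treatment indicator, y the outcome, and X the covariate matrix:

# HT weight sketch (illustrative): estimate e(X) by logistic regression, then
# weight each unit by the inverse probability of the treatment actually received.
df     <- data.frame(treat = treat, X)
ps_fit <- glm(treat ~ ., data = df, family = binomial())
e_hat  <- fitted(ps_fit)                                    # estimated e(X_i)
ht_w   <- ifelse(treat == 1, 1 / e_hat, 1 / (1 - e_hat))    # HT weights
ate_ht <- mean(treat * y / e_hat - (1 - treat) * y / (1 - e_hat))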
Inverse Probability of Treatment Weighting (IPTW) is a method for estimating causal effects from observational data by constructing a weighted pseudo-population in which treatment assignment is independent of measured covariates. This method relies on estimating the probability of treatment conditional on observed covariates (the propensity score) and using its inverse as a weight.
Let $T_i \in \{0, 1\}$ denote the treatment indicator for unit $i$, where $T_i = 1$ indicates treatment and $T_i = 0$ indicates control. Let $Y_i$ denote the observed outcome, and let $X_i \in \mathbb{R}^p$ represent a vector of observed pre-treatment covariates. The propensity score is defined as:
$$e(X_i) = P(T_i = 1 \mid X_i),$$
i.e., the probability that unit $i$ receives the treatment, given covariates $X_i$. Assuming that $e(X_i)$ is known or consistently estimated from the data, the IPTW weight for individual $i$ is given by:
$$w_i^{\text{IPTW}} = \frac{T_i}{e(X_i)} + \frac{1 - T_i}{1 - e(X_i)}.$$
In practice, these weights correspond to:
$$w_i^{\text{IPTW}} =
\begin{cases}
\dfrac{1}{e(X_i)} & \text{if } T_i = 1, \\[6pt]
\dfrac{1}{1 - e(X_i)} & \text{if } T_i = 0.
\end{cases}$$
These weights are used to create a weighted sample in which the distribution of observed covariates is balanced between treatment groups, thereby mimicking a randomized controlled trial.
Using these weights, the IPTW estimator of the Average Treatment Effect (ATE) is defined as:
$$\widehat{\text{ATE}}_{\text{IPTW}} = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{T_i Y_i}{e(X_i)} - \frac{(1 - T_i) Y_i}{1 - e(X_i)} \right].$$
This estimator is consistent for the true ATE under the following assumptions: (i) unconfoundedness, $(Y(0), Y(1)) \perp T \mid X$, i.e., there are no unmeasured confounders given $X$; (ii) positivity, $0 < e(X_i) < 1$ for all $i$, ensuring that each unit has a positive probability of receiving either treatment; and (iii) the model used to estimate the propensity scores $e(X_i)$ is correctly specified.
In applications where extreme weights can introduce high variance, stabilized or truncated versions of IPTW are commonly used to enhance the robustness and efficiency of the estimator.
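Continuing the sketch above (treat, y, and e_hat as before), stabilized and truncated variants can be formed as follows:

# IPTW sketch with stabilized and truncated variants (illustrative).
w_iptw  <- ifelse(treat == 1, 1 / e_hat, 1 / (1 - e_hat))
p_treat <- mean(treat)
w_stab  <- ifelse(treat == 1, p_treat / e_hat, (1 - p_treat) / (1 - e_hat))   # stabilized weights
w_trunc <- pmin(pmax(w_iptw, quantile(w_iptw, 0.01)), quantile(w_iptw, 0.99)) # truncate at 1st/99th percentiles
ate_iptw <- mean(treat * y / e_hat - (1 - treat) * y / (1 - e_hat))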

3.5. Model Training and Evaluation

All models are trained to predict the binary vital_status outcome using the same training dataset. Each model is compiled with the Adam optimizer and mean squared error loss function. Training includes a validation split to monitor generalization and prevent overfitting and runs for 20 epochs.
Model performance is assessed using Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the Concordance statistic (C-index). RMSE and MAE evaluate prediction accuracy by measuring the average magnitude of errors:
$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}, \qquad \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
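In R, these reduce to two one-line helpers (the HT- and IPTW-weighted analogues in Appendix A replace the plain means with weighted means):

# Prediction-error metrics (y: observed outcome, y_hat: model predictions)
rmse <- function(y, y_hat) sqrt(mean((y - y_hat)^2))
mae  <- function(y, y_hat) mean(abs(y - y_hat))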
To assess the model’s discriminatory power for survival outcomes, we use the C-statistic, also known as the Concordance index (C-index), a standard metric for evaluating predictive models in survival analysis. It measures how well the model distinguishes between subjects with different survival outcomes based on their predicted risk scores, i.e., how well the model ranks subjects according to their likelihood of experiencing the event of interest (e.g., death or failure). The C-statistic is calculated as the proportion of all comparable pairs of subjects for which the predicted risk scores agree with the observed event times.
For two individuals $i$ and $j$ with survival times $T_i, T_j$ and predicted risk scores $\hat{r}_i, \hat{r}_j$, the C-statistic is calculated as:
$$C = \frac{1}{|\mathcal{C}|} \sum_{(i,j) \in \mathcal{C}}
\begin{cases}
1 & \text{if } \hat{r}_i > \hat{r}_j \text{ and } T_i < T_j, \\
0.5 & \text{if } \hat{r}_i = \hat{r}_j \text{ and } T_i \neq T_j, \\
0 & \text{otherwise},
\end{cases}$$
where $\mathcal{C}$ is the set of comparable pairs. A C-statistic of 0.5 suggests random predictions, while 1.0 indicates perfect concordance [19].
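In our implementation the C-statistic is obtained from the Hmisc package (see Appendix A); a one-line sketch:

library(Hmisc)

# Concordance index: rcorr.cens() returns a named vector whose "C Index" element
# is the C-statistic of the predicted risk scores against the observed outcome.
c_stat <- as.numeric(rcorr.cens(predictions, outcome)["C Index"])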
To capture treatment heterogeneity, we compute both the Average Treatment Effect (ATE) and the Conditional Average Treatment Effect (CATE). ATE is defined as the difference in expected outcomes between treated and control groups:
$$\text{ATE} = E[Y(1)] - E[Y(0)]$$
In our context, race is treated as a moderator, with ATE computed separately for Black or African American and White populations.
CATE is the expected treatment effect conditional on covariates X, represented here by the first principal component score from a PCA decomposition:
$$\text{CATE}(X) = E[Y(1) - Y(0) \mid X]$$
This allows us to evaluate how treatment effects vary within racial groups based on individual-level covariate patterns.
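A condensed R sketch of the race-specific computation used in Appendix A, where pred holds a model's predictions, race the group labels, and pcs the PCA score matrix (CATE conditions on a positive first principal component score):

# ATE/CATE by race sketch: group means of predictions, with CATE restricted to PC1 > 0
treatment_effects_by_race <- function(pred, race, pcs) {
  grp_b <- which(race == "black or african american")
  grp_w <- which(race == "white")
  data.frame(
    Race = c("Black or African American", "White"),
    ATE  = c(mean(pred[grp_b]), mean(pred[grp_w])),
    CATE = c(mean(pred[grp_b][pcs[grp_b, 1] > 0]),
             mean(pred[grp_w][pcs[grp_w, 1] > 0]))
  )
}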

4. Results

To begin our analysis, we examine the racial composition and survival outcomes within the dataset. Table 1 summarizes the counts of patients who are alive or deceased across two racial groups: Black or African American and White.
Based on the data in Table 1, we calculate the marginal probabilities of race group membership, which are later used in the computation of Horvitz–Thompson (HT) weights:
$$P(T = 1) = \frac{191}{1071} \approx 0.1784, \qquad P(T = 0) = \frac{880}{1071} \approx 0.8216$$
Using these probabilities, the HT weights for each group are computed as:
$$w = \frac{1}{P(T)}$$
  • For Black or African American ($T = 1$): $w = 1 / 0.1784 \approx 5.605$
  • For White ($T = 0$): $w = 1 / 0.8216 \approx 1.217$
These results are presented in Table 2.
Next, we compute the conditional probabilities of treatment (race) given survival outcomes. These are necessary for estimating the Inverse Probability of Treatment Weighting (IPTW):
$$P(T = 1 \mid \text{Alive}) = \frac{159}{880} \approx 0.1807, \qquad P(T = 1 \mid \text{Dead}) = \frac{32}{191} \approx 0.1675$$
$$P(T = 0 \mid \text{Alive}) = \frac{721}{880} \approx 0.8193, \qquad P(T = 0 \mid \text{Dead}) = \frac{159}{191} \approx 0.8325$$
Accordingly, the IPTW values are calculated as follows:
$$\text{IPTW} = \frac{1}{P(T \mid Y)}$$
The computed IPTW values are summarized in Table 3.
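These weights follow directly from the Table 1 counts; a short R check:

# Reproducing the HT and IPTW weights from the Table 1 counts
n_black_alive <- 159; n_black_dead <- 32
n_white_alive <- 721; n_white_dead <- 159

ht_black <- 1 / ((n_black_alive + n_black_dead) / 1071)    # 1 / 0.1784, about 5.605
ht_white <- 1 / ((n_white_alive + n_white_dead) / 1071)    # 1 / 0.8216, about 1.217

iptw_black_alive <- 1 / (n_black_alive / (n_black_alive + n_white_alive))  # 1 / P(T = 1 | Alive)
iptw_black_dead  <- 1 / (n_black_dead  / (n_black_dead  + n_white_dead))   # 1 / P(T = 1 | Dead)
iptw_white_alive <- 1 / (n_white_alive / (n_black_alive + n_white_alive))  # 1 / P(T = 0 | Alive)
iptw_white_dead  <- 1 / (n_white_dead  / (n_black_dead  + n_white_dead))   # 1 / P(T = 0 | Dead)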
The application of HT and IPTW weights plays a critical role in causal inference for observational studies. HT weights adjust for heterogeneity in treatment effects across subgroups, allowing for individualized treatment effect estimates. IPTW, in contrast, corrects for selection bias by reweighting individuals based on the likelihood of receiving their observed treatment, given covariates. This process emulates random treatment assignment and helps balance the covariate distribution between treatment groups.
Using both weights in tandem allows for a more accurate and equitable estimation of treatment effects—HT weights enhance precision by addressing within-group variation, while IPTW improves validity by mitigating confounding bias.
To evaluate model performance and the effectiveness of the weighting strategies, we compile results from CNN-LSTM, LSTM, and CBRF models into a comprehensive summary table. This table includes performance metrics, such as RMSE, MAE, and the C-statistic, as well as estimated ATE and race-specific CATE.
To aid interpretability, we visualize these outcomes using bar plots that disaggregate metrics by model and racial group. These visualizations clearly illustrate the comparative effectiveness of each model and highlight disparities in treatment effects across racial subgroups.
Table 4 presents a detailed comparison of model performance and treatment effect estimates across three machine learning models—CNN-LSTM, LSTM, and CBRF—evaluated under three weighting strategies: no weighting (unadjusted), HT weights, and IPTW. The performance metrics include RMSE, MAE, and C-statistic, which assess prediction accuracy and discrimination. Additionally, ATE and CATE are reported for two racial subgroups: Black or African American and White.
CNN-LSTM demonstrates superior predictive capability across all weighting methods. For the unweighted model, CNN-LSTM achieves an RMSE of 0.2991, MAE of 0.2147, and a C-statistic of 0.8890 across both racial groups. These values are consistently lower than those of LSTM (RMSE = 0.3112, MAE = 0.2330, C-statistic = 0.8853) and substantially better than CBRF (RMSE = 0.5037, MAE = 0.4913, C-statistic = 0.5042). When incorporating HT weights, CNN-LSTM shows marginal improvement in both error metrics (RMSE = 0.2916, MAE = 0.2047), while maintaining a stable C-statistic (0.8893). The most notable gains appear under IPTW weighting, where CNN-LSTM achieves its best overall performance (RMSE = 0.2604, MAE = 0.1745, C-statistic = 0.8944). LSTM also benefits from IPTW, improving to RMSE = 0.2751 and MAE = 0.1894, though still trailing CNN-LSTM.
CBRF, by contrast, maintains relatively poor performance across all weighting schemes. Despite slight improvements under HT (RMSE = 0.4829) and IPTW (RMSE = 0.4972), its C-statistic remains close to 0.50, suggesting limited discriminatory ability, and indicating performance near chance level.
ATE and CATE estimates vary both across models and racial subgroups. Without weighting, CNN-LSTM estimates ATEs of 1.1980 (Black or African American) and 1.2073 (White), with CATEs closely aligned. LSTM produces slightly higher ATEs (1.2204 for Black or African American; 1.2109 for White), suggesting consistent treatment effects across groups.
Under HT weighting, estimates shift moderately. CNN-LSTM reports an ATE of 1.2035 for Black or African American and 1.1889 for White, while CATEs exhibit greater variability (1.1624 for Black or African American; 1.2116 for White), indicating some heterogeneity. IPTW weighting further sharpens subgroup differences. CNN-LSTM yields an ATE of 1.0971 for Black or African American and 1.1485 for White, with CATEs of 1.1325 and 1.1079, respectively. These results suggest the potential presence of race-specific treatment response heterogeneity, with a marginally higher CATE for Black or African American individuals under the best-performing model configuration.
CBRF consistently produces higher ATE and CATE estimates across all settings. For instance, under IPTW, CBRF estimates an ATE of 1.4978 and a CATE of 1.4855 for White individuals—values notably higher than those produced by the deep learning models. However, given the model’s poor predictive performance, these estimates should be interpreted with caution and may reflect overfitting or bias rather than true effect magnitudes.
The choice of weighting strategy has a measurable impact on both model performance and effect estimation. Unweighted models yield higher variance and poorer calibration. HT weights provide moderate improvements, particularly in LSTM and CNN-LSTM, suggesting partial adjustment for covariate imbalance. IPTW consistently delivers the best results in terms of predictive accuracy and discrimination, particularly for CNN-LSTM. This supports the theoretical advantages of IPTW in mitigating confounding by balancing covariates across treatment groups.
These findings demonstrate that CNN-LSTM, particularly when combined with IPTW, offers the most reliable framework for treatment effect estimation in racially stratified cohorts. Its low error rates and high C-statistics across subgroups suggest both strong predictive validity and equitable performance. CBRF, while flexible in handling complex interactions, shows substantial limitations in generalizability due to poor prediction accuracy and model calibration.
Overall, Table 4 highlights the critical role of model selection and appropriate weighting in achieving accurate, robust, and equitable treatment effect estimates. A visualization of these results is provided in Figure 2, facilitating intuitive comparison across models, weighting strategies, and racial subgroups. The related R code can be found in Appendix A.

5. Conclusions

This study evaluated the comparative performance of advanced machine learning models—CNN-LSTM, LSTM, and CBRF—in estimating treatment effects under varying weighting strategies in a racially heterogeneous population. The results underscore the superior predictive and discriminatory performance of the CNN-LSTM model, particularly when integrated with IPTW. This model consistently demonstrated the lowest prediction errors and the highest C-statistics across racial subgroups, indicating its suitability for both outcome prediction and causal inference.
The use of IPTW substantially enhanced model calibration and reduced error variance, reinforcing its utility in addressing confounding in observational data. While HT weights offered modest gains over unweighted models, IPTW delivered the most reliable and equitable estimates of both ATE and CATE. These improvements were especially evident in the CNN-LSTM and LSTM models, highlighting the value of integrating modern weighting strategies with deep learning architectures.
Although CBRF estimated consistently higher treatment effects, its inferior predictive performance and poor discrimination capacity raise concerns about the validity and stability of these estimates. This suggests that, in contexts requiring both prediction and causal inference, model interpretability should not come at the expense of statistical rigor and generalization capability.
Importantly, the results revealed subtle yet meaningful heterogeneity in treatment effects across racial groups, which were most reliably captured by the CNN-LSTM model under IPTW. These findings advocate for the use of deep learning frameworks—especially CNN-LSTM paired with IPTW—in producing accurate and fair treatment effect estimates in racially diverse settings.
In sum, this study contributes to the growing literature on machine learning for causal inference by demonstrating the importance of model selection, weighting strategy, and subgroup-specific evaluation in health and policy research. Future work should extend these findings by validating performance on additional datasets, incorporating longitudinal and clustered data structures, and evaluating fairness metrics more explicitly.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study.

Acknowledgments

We thank the two respected referees, the Associate Editor, and the Editor for their constructive and helpful suggestions, which led to substantial improvements in the revised version. For the sake of transparency and reproducibility, the code and data used for this study can be found in the following GitHub repository: R code GitHub site (https://github.com/kjonomi/Rcode/blob/main/gene_causal_R, accessed on 15 May 2025). We also acknowledge the use of data from the TCGA-BRCA dataset, which was made publicly available by the Genomic Data Commons (GDC).

Conflicts of Interest

The author declares no conflicts of interest.

Appendix A. R Code

if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("TCGAbiolinks")
BiocManager::install("DESeq2")
  
library(TCGAbiolinks)
library(DESeq2)
  
# Step 1: Query the GDC for TCGA-BRCA data
query <- GDCquery(
  project = "TCGA-BRCA",                  # Breast cancer project
  data.category = "Transcriptome Profiling",       # Data category
  data.type = "Gene Expression Quantification"     # Data type
)
  
# Step 2: Download the data from GDC
GDCdownload(query)
  
# Step 3: Prepare the Data
data <- GDCprepare(query)
  
# Step 4: Preview the Data
head(data)
  
# Step 5: Extract Expression Matrix and Clinical Data
expression_matrix <- assay(data)  # Extracts the gene expression matrix
col_data <- colData(data)         # Extracts the clinical metadata
  
# No Weights
  
library(keras)
library(randomForest)
library(dplyr)
library(ggplot2)
library(copula)
library(Hmisc)
library(tidyr)
  
# Step 1: Prepare the data
colnames(col_data)
str(col_data)
head(col_data)
  
col_data$race <- as.factor(col_data$race)
  
col_data$vital_status <- as.numeric(col_data$vital_status)
  
col_data <- col_data[col_data$race %in% c("black or african american", "white"), ]
  
expression_matrix <- expression_matrix[, rownames(col_data)]
  
race <- col_data$race
outcome <- col_data$vital_status
  
# PCA on filtered expression data
  
expression_data <- as.matrix(t(expression_matrix))
  
zero_variance_cols <- apply(expression_data, 2, function(x) var(x) == 0)
  
expression_data_filtered <- expression_data[, !zero_variance_cols]
  
pca_result <- prcomp(expression_data_filtered, center = TRUE, scale. = TRUE)
  
reduced_data_matrix <- as.matrix(pca_result$x[, 1:6])
  
# Reshape for LSTM/CNN-LSTM
n_samples <- nrow(reduced_data_matrix)
n_features <- ncol(reduced_data_matrix)
copula_transformed_reshaped <- array(reduced_data_matrix, dim = c(n_samples, 1, n_features))
  
# --- LSTM Model ---
create_lstm_model <- function() {
  model <- keras_model_sequential() %>%
    layer_lstm(units = 64, input_shape = c(1, n_features),
    return_sequences = FALSE) %>%
    layer_dense(units = 32, activation = "relu") %>%
    layer_dense(units = 1)
  return(model)
}
  
# --- CNN-LSTM Model ---
create_cnn_lstm_model <- function() {
  model <- keras_model_sequential() %>%
    layer_conv_1d(filters = 64, kernel_size = 1, activation = "relu",
    input_shape = c(1, n_features)) %>%
    layer_max_pooling_1d(pool_size = 1) %>%
    layer_lstm(units = 64, return_sequences = FALSE) %>%
    layer_dense(units = 32, activation = "relu") %>%
    layer_dense(units = 1)
  return(model)
}
  
# --- Copula-Based RF Model ---
create_copula_rf_model <-
function(data, outcome) {
  fitted_copula <- fitCopula(normalCopula(dim = ncol(data)), pobs(data),
  method = "ml")
  copula_transformed_data <- qnorm(pobs(data), mean = 0, sd = 1)
  model <- randomForest(outcome ~ .,
  data = as.data.frame(copula_transformed_data), ntree = 100)
  return(model)
}
  
# --- Model Training ---
lstm_model <- create_lstm_model()
lstm_model %>% compile(loss = "mean_squared_error",
optimizer = optimizer_adam())
history_lstm <- lstm_model %>% fit(copula_transformed_reshaped, outcome,
epochs = 20, batch_size = 32, validation_split = 0.2)
  
cnn_lstm_model <- create_cnn_lstm_model()
cnn_lstm_model %>% compile(loss = "mean_squared_error",
optimizer = optimizer_adam())
history_cnn_lstm <- cnn_lstm_model %>% fit(copula_transformed_reshaped,
outcome, epochs = 20, batch_size = 32, validation_split = 0.2)
  
copula_rf_model <- create_copula_rf_model(reduced_data_matrix,
outcome)
  
# --- Predictions ---
predictions_lstm <- lstm_model %>% predict(copula_transformed_reshaped)
predictions_cnn_lstm <- cnn_lstm_model %>%
predict(copula_transformed_reshaped)
copula_rf_predictions <- predict(copula_rf_model,
as.data.frame(reduced_data_matrix))
  
# --- RMSE and MAE ---
rmse_lstm <- sqrt(mean((predictions_lstm -
outcome)^2))
mae_lstm <- mean(abs(predictions_lstm - outcome))
  
rmse_cnn_lstm <- sqrt(mean((predictions_cnn_lstm - outcome)^2))
mae_cnn_lstm <- mean(abs(predictions_cnn_lstm - outcome))
  
rmse_copula_rf <- sqrt(mean((copula_rf_predictions - outcome)^2))
mae_copula_rf <- mean(abs(copula_rf_predictions - outcome))
  
# --- C-statistics (Concordance Index) ---
  
c_stat_lstm <- as.numeric(rcorr.cens(predictions_lstm, outcome)["C Index"])
  
c_stat_cnn_lstm <- as.numeric(rcorr.cens(predictions_cnn_lstm, outcome)["C Index"])
  
c_stat_copula_rf <- as.numeric(rcorr.cens(copula_rf_predictions, outcome)["C Index"])
  
# --- HTE (ATE and CATE) ---
  
calculate_treatment_effects_by_race <- function(predictions, race,
reduced_data_matrix) {
  black_indices <- which(race == "black or african american")
  white_indices <- which(race == "white")
  ate_black <- mean(predictions[black_indices])
  ate_white <- mean(predictions[white_indices])
  # CATE: restrict each group to samples with a positive first principal component score
  cate_black <- mean(predictions[black_indices[reduced_data_matrix[black_indices, 1] > 0]])
  cate_white <- mean(predictions[white_indices[reduced_data_matrix[white_indices, 1] > 0]])
  return(data.frame(
    Race = c("Black or African American", "White"),
    ATE = c(ate_black, ate_white),
    CATE = c(cate_black, cate_white)
  ))
}
  
hte_lstm_race <-
calculate_treatment_effects_by_race(predictions_lstm, race,
reduced_data_matrix)
  
hte_cnn_lstm_race <-
calculate_treatment_effects_by_race(predictions_cnn_lstm, race,
reduced_data_matrix)
  
hte_copula_rf_race <-
calculate_treatment_effects_by_race(copula_rf_predictions, race,
reduced_data_matrix)
hte_results_all_models <- rbind(
  cbind(hte_lstm_race, Model = "LSTM"),
  cbind(hte_cnn_lstm_race, Model = "CNN-LSTM"),
  cbind(hte_copula_rf_race, Model = "Copula-RF")
)
  
# --- Combine All Results ---
  
model_names <- c("LSTM", "CNN-LSTM", "Copula-RF")
  
rmse_values <- c(rmse_lstm, rmse_cnn_lstm, rmse_copula_rf)
  
mae_values <- c(mae_lstm, mae_cnn_lstm, mae_copula_rf)
  
cstat_values <- c(c_stat_lstm, c_stat_cnn_lstm, c_stat_copula_rf)
  
results_table <- data.frame(
  Model = model_names,
  RMSE = rmse_values,
  MAE = mae_values,
  C_Statistic = cstat_values
)
  
final_results_table <- merge(results_table, hte_results_all_models,
by = "Model", all = TRUE)
  
# --- Print Results ---
print(final_results_table)
  
# --- Plotting ---
results_long <- gather(final_results_table, key =
"Metric", value = "Value", -Model, -Race)
  
ggplot(results_long, aes(x = Model, y = Value, fill = Metric)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.8),
  width = 0.7) + facet_wrap(~Race) +
  labs(title = "Model Evaluation: RMSE, MAE, C-Statistic, ATE,
  and CATE by Race", x = "Model", y = "Value") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
  
## HT Weights ########
  
library(keras)
library(randomForest)
library(dplyr)
library(ggplot2)
library(copula)
library(Hmisc)
library(tidyr)
  
# Step 1: Prepare the data
  
colnames(col_data)
str(col_data)
head(col_data)
  
# Convert race to factor and ensure outcome is numeric
  
col_data$race <- as.factor(col_data$race)
  
col_data$vital_status <- as.numeric(col_data$vital_status)
  
# Filter for ’Black or African American’ and ’White’ only
  
col_data <- col_data[col_data$race %in% c("black or african american", "white"),]
  
# Update the expression matrix
  
expression_matrix <- expression_matrix[, rownames(col_data)]
  
# Assign HT Weights based on Race
  
ht_weights <- ifelse(col_data$race == "black or african american",
5.605, 1.217)
  
# PCA for dimensionality reduction
  
expression_data <- as.matrix(t(expression_matrix))
  
zero_variance_cols <- apply(expression_data, 2, function(x) var(x) == 0)
  
expression_data_filtered <- expression_data[, !zero_variance_cols]
  
pca_result <- prcomp(expression_data_filtered, center = TRUE, scale. = TRUE)
  
reduced_data_matrix <- as.matrix(pca_result$x[, 1:6])
  
# Reshape for deep learning models
  
n_samples <- nrow(reduced_data_matrix)
  
n_features <- ncol(reduced_data_matrix)
  
copula_transformed_reshaped <- array(reduced_data_matrix, dim =
c(n_samples, 1, n_features))
  
# --- LSTM Model ---
  
create_lstm_model <- function() {
  model <- keras_model_sequential() %>%
    layer_lstm(units = 64, input_shape = c(1, n_features),
    return_sequences = FALSE) %>%
    layer_dense(units = 32, activation = "relu") %>%
    layer_dense(units = 1)
  return(model)
}
  
# --- CNN-LSTM Model ---
  
create_cnn_lstm_model <- function() {
  model <- keras_model_sequential() %>%
    layer_conv_1d(filters = 64, kernel_size = 1, activation = "relu",
    input_shape = c(1, n_features)) %>%
    layer_max_pooling_1d(pool_size = 1) %>%
    layer_lstm(units = 64, return_sequences = FALSE) %>%
    layer_dense(units = 32, activation = "relu") %>%
    layer_dense(units = 1)
  return(model)
}
  
# --- Copula-Based Random Forest Model ---
  
create_copula_rf_model <- function(data, outcome) {
  fitted_copula <- fitCopula(normalCopula(dim = ncol(data)), pobs(data),
  method = "ml")
  copula_transformed_data <- qnorm(pobs(data), mean = 0, sd = 1)
  model <- randomForest(outcome ~ .,
  data = as.data.frame(copula_transformed_data), ntree = 100)
  return(model)
}
  
# --- Train Models ---
  
lstm_model <- create_lstm_model()
lstm_model %>% compile(loss = "mean_squared_error",
optimizer = optimizer_adam())
history_lstm <- lstm_model %>% fit(copula_transformed_reshaped, outcome,
epochs = 20, batch_size = 32, validation_split = 0.2)
  
cnn_lstm_model <- create_cnn_lstm_model()
cnn_lstm_model %>% compile(loss = "mean_squared_error",
optimizer = optimizer_adam())
history_cnn_lstm <- cnn_lstm_model %>% fit(copula_transformed_reshaped,
outcome, epochs = 20, batch_size = 32, validation_split = 0.2)
  
copula_rf_model <- create_copula_rf_model(reduced_data_matrix,
outcome)
  
# --- Model Predictions ---
predictions_lstm <- lstm_model %>%
predict(copula_transformed_reshaped)
predictions_cnn_lstm <- cnn_lstm_model %>%
predict(copula_transformed_reshaped)
copula_rf_predictions <- predict(copula_rf_model,
as.data.frame(reduced_data_matrix))
  
# --- Weighted RMSE and MAE ---
  
weighted_rmse <- function(predictions) {
  sqrt(sum(ht_weights * (predictions - outcome)^2) / sum(ht_weights))
}
  
weighted_mae <- function(predictions) {
  sum(ht_weights * abs(predictions - outcome)) / sum(ht_weights)
}
  
rmse_lstm <- weighted_rmse(predictions_lstm)
  
mae_lstm <-weighted_mae(predictions_lstm)
  
rmse_cnn_lstm <- weighted_rmse(predictions_cnn_lstm)
  
mae_cnn_lstm <- weighted_mae(predictions_cnn_lstm)
  
rmse_copula_rf <- weighted_rmse(copula_rf_predictions)
  
mae_copula_rf <- weighted_mae(copula_rf_predictions)
  
# --- C-Statistics (Concordance Index) ---
  
calculate_c_statistic <- function(predictions, outcome) {
  rcorr.cens(predictions, outcome)[["C Index"]]
}
  
c_stat_lstm <- calculate_c_statistic(predictions_lstm, outcome)
  
c_stat_cnn_lstm <- calculate_c_statistic(predictions_cnn_lstm,
outcome)
  
c_stat_copula_rf <- calculate_c_statistic(copula_rf_predictions,
outcome)
  
# --- HTE (ATE & CATE) with HT Weights ---
  
calculate_weighted_ht_effects <- function(predictions, race,
reduced_data_matrix, weights) {
  black_indices <- which(race == "black or african american")
  white_indices <- which(race == "white")
  weighted_mean <- function(pred, indices) {
    sum(weights[indices] * pred[indices]) / sum(weights[indices])
  }
  
  cate_filter <- reduced_data_matrix[, 1] > 0
  
  ate_black <- weighted_mean(predictions, black_indices)
  ate_white <- weighted_mean(predictions, white_indices)
  cate_black <- weighted_mean(predictions,
  black_indices[cate_filter[black_indices]])
  cate_white <- weighted_mean(predictions,
  white_indices[cate_filter[white_indices]])
  
  return(data.frame(
    Race = c("Black or African American", "White"),
    ATE = c(ate_black, ate_white),
    CATE = c(cate_black, cate_white)
  ))
}
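# Here, ATE is computed as the weighted mean prediction within each racial group,
# while CATE restricts that mean to the subgroup with a positive first principal
# component (cate_filter), i.e., a covariate-conditional average.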
  
hte_lstm_race <- calculate_weighted_ht_effects(predictions_lstm,
col_data$race, reduced_data_matrix, ht_weights)
  
hte_cnn_lstm_race <- calculate_weighted_ht_effects(predictions_cnn_lstm,
col_data$race, reduced_data_matrix, ht_weights)
  
hte_copula_rf_race <-
calculate_weighted_ht_effects(copula_rf_predictions, col_data$race,
reduced_data_matrix, ht_weights)
  
# --- Combine Results ---
  
hte_results_all_models <- rbind(
  cbind(hte_lstm_race, Model = "LSTM"),
  cbind(hte_cnn_lstm_race, Model = "CNN-LSTM"),
  cbind(hte_copula_rf_race, Model = "Copula-RF")
)
  
model_names <- c("LSTM", "CNN-LSTM", "Copula-RF")
  
rmse_values <- c(rmse_lstm, rmse_cnn_lstm, rmse_copula_rf)
  
mae_values <- c(mae_lstm, mae_cnn_lstm, mae_copula_rf)
  
c_stat_values <- c(c_stat_lstm, c_stat_cnn_lstm, c_stat_copula_rf)
  
results_table <- data.frame(
  Model = model_names,
  RMSE = rmse_values,
  MAE = mae_values,
  C_statistic = c_stat_values
)
  
final_results_table <- merge(results_table, hte_results_all_models,
by = "Model", all = TRUE)
print(final_results_table)
  
# --- Plotting Weighted Results ---
  
results_long <- gather(final_results_table, key = "Metric", value =
"Value", -Model, -Race)
  
ggplot(results_long, aes(x = Model, y = Value, fill = Metric)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.8),
  width = 0.7) +
  facet_wrap(~Race, scales = "free_y") +
  labs(title = "HT-Weighted Evaluation Metrics by Model and Race",
       x = "Model",
       y = "Value") +
  theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
  
######################## ## IPTW ########################
  
library(keras)
library(randomForest)
library(dplyr)
library(ggplot2)
library(copula)
library(Hmisc)
library(tidyr)
  
# Step 1: Prepare the data
  
col_data$race <- as.factor(col_data$race)
  
col_data$vital_status <- as.numeric(col_data$vital_status)
col_data <- col_data[col_data$race %in% c("black or african american", "white"), ]
expression_matrix <- expression_matrix[, rownames(col_data)]
  
# IPTW Weights
  
n_black_alive <- 159
n_black_dead <- 32
n_white_alive <- 721
n_white_dead <- 159
  
p_black_alive <- n_black_alive / (n_black_alive + n_white_alive)
p_black_dead <- n_black_dead / (n_black_dead + n_white_dead)
p_white_alive <- n_white_alive / (n_black_alive + n_white_alive)
p_white_dead <- n_white_dead / (n_black_dead + n_white_dead)
  
iptw_black_alive <- 1 / p_black_alive
  
iptw_black_dead <- 1 / p_black_dead
  
iptw_white_alive <- 1 / p_white_alive
  
iptw_white_dead <- 1 / p_white_dead
  
iptw_weights <- ifelse(
  col_data$race == "black or african american" & col_data$vital_status == 1,
  iptw_black_alive,
  ifelse(col_data$race == "black or african american" & col_data$vital_status == 0,
  iptw_black_dead,
    ifelse(col_data$race == "white" & col_data$vital_status == 1, iptw_white_alive,
    iptw_white_dead)
  )
)
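# The hard-coded counts above correspond to Table 1 (race by vital status); the
# resulting probabilities and weights match Table 3, e.g.
# p_black_alive = 159 / (159 + 721) = 0.1807 and 1 / 0.1807 = 5.53.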
  
# PCA Dimensionality Reduction
  
expression_data <- as.matrix(t(expression_matrix))
  
zero_variance_cols <- apply(expression_data, 2, function(x) var(x)
== 0)
  
expression_data_filtered <- expression_data[, !zero_variance_cols]
  
pca_result <- prcomp(expression_data_filtered, center = TRUE, scale. = TRUE)
  
reduced_data_matrix <-as.matrix(pca_result$x[, 1:6])
  
n_samples <- nrow(reduced_data_matrix)
  
n_features <- ncol(reduced_data_matrix)
  
copula_transformed_reshaped <- array(reduced_data_matrix, dim =
c(n_samples, 1, n_features))
  
# LSTM Model
create_lstm_model <- function() {
  keras_model_sequential() %>%
    layer_lstm(units = 64, input_shape = c(1, n_features),
    return_sequences = FALSE) %>%
    layer_dense(units = 32, activation = "relu") %>%
    layer_dense(units = 1)
}
  
# CNN-LSTM Model
create_cnn_lstm_model <- function() {
  keras_model_sequential() %>%
    layer_conv_1d(filters = 64, kernel_size = 1, activation = "relu",
    input_shape = c(1, n_features)) %>%
    layer_max_pooling_1d(pool_size = 1) %>%
    layer_lstm(units = 64, return_sequences = FALSE) %>%
    layer_dense(units = 32, activation = "relu") %>%
    layer_dense(units = 1)
}
  
# Copula-Based Random Forest Model
  
create_copula_rf_model <- function(data, outcome) {
  fitted_copula <- fitCopula(normalCopula(dim = ncol(data)),
  pobs(data), method = "ml")
  copula_transformed_data <- qnorm(pobs(data), mean = 0, sd = 1)
  randomForest(outcome ~ ., data = as.data.frame(copula_transformed_data),
  ntree = 100)
}
  
# Train Models
  
lstm_model <- create_lstm_model()
lstm_model %>% compile(loss = "mean_squared_error",
optimizer = optimizer_adam())
history_lstm <- lstm_model %>% fit(copula_transformed_reshaped,
col_data$vital_status, epochs = 20, batch_size = 32,
validation_split = 0.2)
  
cnn_lstm_model <- create_cnn_lstm_model()
cnn_lstm_model %>% compile(loss = "mean_squared_error",
optimizer = optimizer_adam())
history_cnn_lstm <- cnn_lstm_model %>% fit(copula_transformed_reshaped,
col_data$vital_status, epochs = 20, batch_size = 32,
validation_split = 0.2)
  
copula_rf_model <- create_copula_rf_model(reduced_data_matrix,
col_data$vital_status)
  
# Model Predictions
predictions_lstm <- lstm_model %>% predict(copula_transformed_reshaped)
predictions_cnn_lstm <- cnn_lstm_model %>% predict(copula_transformed_reshaped)
copula_rf_predictions <- predict(copula_rf_model,
as.data.frame(reduced_data_matrix))
  
# Weighted RMSE and MAE
weighted_rmse <- function(predictions) {
  sqrt(sum(iptw_weights * (predictions - col_data$vital_status)^2)/
  sum(iptw_weights))
}
  
weighted_mae <- function(predictions) {
  sum(iptw_weights * abs(predictions - col_data$vital_status))/sum(iptw_weights)
}
  
rmse_lstm <- weighted_rmse(predictions_lstm)
mae_lstm <- weighted_mae(predictions_lstm)

rmse_cnn_lstm <- weighted_rmse(predictions_cnn_lstm)
mae_cnn_lstm <- weighted_mae(predictions_cnn_lstm)

rmse_copula_rf <- weighted_rmse(copula_rf_predictions)
mae_copula_rf <- weighted_mae(copula_rf_predictions)
  
# Compute C-statistics
cstat_lstm <- rcorr.cens(predictions_lstm, col_data$vital_status)["C Index"]
cstat_cnn_lstm <- rcorr.cens(predictions_cnn_lstm, col_data$vital_status)["C Index"]
cstat_copula_rf <- rcorr.cens(copula_rf_predictions, col_data$vital_status)["C Index"]
  
# Weighted ATE & CATE
calculate_weighted_iptw_effects <- function(predictions, race, reduced_data_matrix, weights) {
  black_indices <- which(race == "black or african american")
  white_indices <- which(race == "white")
  
  weighted_mean <- function(pred, indices) {
    sum(weights[indices] * pred[indices]) / sum(weights[indices])
  }
  
  ate_black <- weighted_mean(predictions, black_indices)
  ate_white <- weighted_mean(predictions, white_indices)
  # Restrict CATE to the PC1 > 0 subgroup, mirroring the HT-weighted version above
  cate_black <- weighted_mean(predictions,
    black_indices[reduced_data_matrix[black_indices, 1] > 0])
  cate_white <- weighted_mean(predictions,
    white_indices[reduced_data_matrix[white_indices, 1] > 0])
  
  data.frame(
    Race = c("Black or African American", "White"),
    ATE = c(ate_black, ate_white),
    CATE = c(cate_black, cate_white)
  )
}
  
hte_lstm_race <- calculate_weighted_iptw_effects(predictions_lstm,
col_data$race, reduced_data_matrix, iptw_weights)
  
hte_cnn_lstm_race <-
calculate_weighted_iptw_effects(predictions_cnn_lstm, col_data$race,
reduced_data_matrix, iptw_weights)
  
hte_copula_rf_race <-
calculate_weighted_iptw_effects(copula_rf_predictions,
col_data$race, reduced_data_matrix, iptw_weights)
  
# Final Combined Table
final_results_table <- data.frame(
  Model = rep(c("LSTM", "CNN-LSTM", "Copula-RF"), each = 2),
  Race = rep(c("Black or African American", "White"), times = 3),
  RMSE = c(rmse_lstm, rmse_lstm, rmse_cnn_lstm, rmse_cnn_lstm,
  rmse_copula_rf, rmse_copula_rf),
  MAE = c(mae_lstm, mae_lstm, mae_cnn_lstm, mae_cnn_lstm,
  mae_copula_rf, mae_copula_rf),
  ATE = c(hte_lstm_race$ATE, hte_cnn_lstm_race$ATE, hte_copula_rf_race$ATE),
  CATE = c(hte_lstm_race$CATE, hte_cnn_lstm_race$CATE, hte_copula_rf_race$CATE),
  C_statistic = c(cstat_lstm, cstat_lstm, cstat_cnn_lstm, cstat_cnn_lstm,
  cstat_copula_rf, cstat_copula_rf)
)
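# RMSE, MAE, and the C-statistic are overall (not race-specific) metrics,
# so each value is repeated across the two race rows of a given model.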
  
print(final_results_table)
  
# Plot Results
results_long <- pivot_longer(final_results_table,
cols = c(RMSE, MAE, ATE, CATE, C_statistic), names_to = "Metric",
values_to = "Value")
  
ggplot(results_long, aes(x = Model, y = Value, fill = Metric)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.8),
  width = 0.7) + facet_wrap(~Race) +
  labs(title = " IPTW-Weighted Evaluation Metrics by Model and Race ",
       x = "Model", y = "Value") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

References

  1. Ling, Y.; Upadhyaya, P.; Chen, L.; Jiang, X.; Kim, Y. Emulate randomized clinical trials using heterogeneous treatment effect estimation for personalized treatments: Methodology review and benchmark. J. Biomed. Inform. 2023, 137, 104256.
  2. Hu, L.; Ji, J.; Li, F. Estimating heterogeneous survival treatment effect in observational data using machine learning. Stat. Med. 2021, 40, 4691–4713.
  3. Dimitriou, E.; Fong, E.; Diaz-Ordaz, K.; Lehmann, B. Data Fusion for Heterogeneous Treatment Effect Estimation with Multi-Task Gaussian Processes. arXiv 2024.
  4. Imbens, G.W.; Rubin, D.B. Causal Inference in Statistics, Social, and Biomedical Sciences; Cambridge University Press: Cambridge, UK, 2015.
  5. Rubin, D.B. Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. 1974, 66, 688–701.
  6. Rosenbaum, P.R.; Rubin, D.B. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 1983, 70, 41–55.
  7. Lu, M.; Sadiq, S.; Feaster, D.J.; Ishwaran, H. Estimating individualized treatment rules with censored data. Stat. Med. 2021, 40, 2477–2493.
  8. Van der Laan, M.J.; Rose, S. Targeted Learning: Causal Inference for Observational and Experimental Data; Springer: New York, NY, USA, 2011.
  9. Athey, S.; Imbens, G.W. Recursive partitioning for heterogeneous causal effects. Proc. Natl. Acad. Sci. USA 2016, 113, 7353–7360.
  10. Therneau, T.M.; Grambsch, P.M. Modeling Survival Data: Extending the Cox Model; Springer: New York, NY, USA, 2000.
  11. Zhao, Q.; Hastie, T.; Tibshirani, R. Efficient computation of regularization paths for generalized additive models. J. Comput. Graph. Stat. 2019, 28, 727–744.
  12. Wager, S.; Athey, S. Estimation and inference of heterogeneous treatment effects using random forests. J. Am. Stat. Assoc. 2018, 113, 1228–1242.
  13. Katzman, J.L.; Shaham, U.; Cloninger, A.; Bates, J.; Jiang, T.; Kluger, Y. DeepSurv: Personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med. Res. Methodol. 2018, 18, 24.
  14. Wang, Y.; Xie, J.; Zhao, X. DeepSurv landmarking: A deep learning approach for dynamic survival analysis with longitudinal data. J. Stat. Comput. Simul. 2024, 95, 186–207.
  15. Joe, H. Dependence Modeling with Copulas; CRC Press: Boca Raton, FL, USA, 2014.
  16. Nelsen, R.B. An Introduction to Copulas, 2nd ed.; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013.
  17. Ishwaran, H.; Kogalur, U.B.; Blackstone, E.H.; Lauer, M.S. Random survival forests. Ann. Appl. Stat. 2008, 2, 841–860.
  18. Lee, C.; Zame, W.R.; Yoon, J.; van der Schaar, M. DeepHit: A deep learning approach to survival analysis with competing risks. Proc. AAAI Conf. Artif. Intell. 2018, 32, 2314–2321.
  19. Harrell, F.E.; Califf, R.M.; Pryor, D.B.; Lee, K.L.; Rosati, R.A. Evaluating the yield of medical tests. JAMA 1982, 247, 2543–2546.
  20. Imbens, G.W. Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review. Rev. Econ. Stat. 2004, 86, 4–29.
  21. Austin, P.C. An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies. Multivar. Behav. Res. 2011, 46, 399–424.
  22. Anderson, C.; Rutkowski, L. Multinomial logistic regression. In Multinomial Logistic Regression; Osborne, J., Ed.; SAGE Publications, Inc.: Thousand Oaks, CA, USA, 2008; pp. 390–409.
  23. Love, M.I.; Huber, W.; Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014, 15, 550.
  24. Colaprico, A.; Silva, T.C.; Olsen, C.; Garofano, L.; Cava, C.; Garolini, D.; Sabedot, T.; Malta, T.M.; Pagnotta, S.M.; Castiglioni, I.; et al. TCGAbiolinks: An R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res. 2015, 44, e71.
  25. Silva, T.C.; Colaprico, A.; Olsen, C.; D'Angelo, F.; Bontempi, G.; Ceccarelli, M.; Noushmehr, H. TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages. F1000Research 2016, 5, 1542.
  26. Mounir, M.; Lucchetta, M.; Silva, T.C.; Olsen, C.; Bontempi, G.; Chen, X.; Noushmehr, H.; Colaprico, A.; Papaleo, E. New functionalities in the TCGAbiolinks package for the study and integration of cancer data from GDC and GTEx. PLoS Comput. Biol. 2019, 15, e1006701.
  27. Sklar, A. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris 1959, 8, 229–231.
  28. Joe, H. Multivariate Models and Dependence Concepts; CRC Press: Boca Raton, FL, USA, 1997.
  29. Van der Laan, M.J.; Robins, J.M. Unified Methods for Censored Longitudinal Data and Causality; Springer: New York, NY, USA, 2003.
  30. Pouliasis, G.; Torres-Alves, G.A.; Morales-Napoles, O. Stochastic Modeling of Hydroclimatic Processes Using Vine Copulas. Water 2021, 13, 2156.
  31. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
Figure 1. PCA screeplot for breast cancer dataset.
Figure 2. Comparison of different weights models with breast cancer data.
Table 1. Counts of race groups and survival outcomes.

Race                         Alive   Dead   Total
Black or African American      159     32     191
White                          721    159     880
Total                          880    191    1071
Table 2. Horvitz–Thompson weights.

Race                         HT Weight
Black or African American        5.605
White                            1.217
Table 3. Inverse Probability of Treatment Weights (IPTW).

Race                         Outcome   Probability P(T | Y)   IPTW 1/P(T | Y)
Black or African American    Alive     0.1807                 5.53
Black or African American    Dead      0.1675                 5.97
White                        Alive     0.8193                 1.22
White                        Dead      0.8325                 1.20
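As a worked check of Table 3 using the counts in Table 1: the probability that an alive patient is Black or African American is 159/(159 + 721) = 0.1807, giving an IPTW weight of 1/0.1807 ≈ 5.53; likewise, 721/880 = 0.8193 and 1/0.8193 ≈ 1.22 for alive White patients.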
Table 4. Model performance comparison across weighting methods.

Model      Race                        RMSE     MAE      C_Statistic   ATE      CATE     Weighting
CNN-LSTM   Black or African American   0.2991   0.2147   0.8890        1.1980   1.2080   No Weights
CNN-LSTM   White                       0.2991   0.2147   0.8890        1.2073   1.2011   No Weights
CBRF       Black or African American   0.5037   0.4913   0.5042        1.4638   1.4883   No Weights
CBRF       White                       0.5037   0.4913   0.5042        1.4924   1.4891   No Weights
LSTM       Black or African American   0.3112   0.2330   0.8853        1.2204   1.2142   No Weights
LSTM       White                       0.3112   0.2330   0.8853        1.2109   1.2014   No Weights
CNN-LSTM   Black or African American   0.2916   0.2047   0.8893        1.2035   1.1624   HT Weights
CNN-LSTM   White                       0.2916   0.2047   0.8893        1.1889   1.2116   HT Weights
CBRF       Black or African American   0.4829   0.4714   0.5140        1.4366   1.4527   HT Weights
CBRF       White                       0.4829   0.4714   0.5140        1.4813   1.5294   HT Weights
LSTM       Black or African American   0.3041   0.2036   0.8706        1.1325   1.1569   HT Weights
LSTM       White                       0.3041   0.2036   0.8706        1.1676   1.2120   HT Weights
CNN-LSTM   Black or African American   0.2604   0.1745   0.8944        1.0971   1.1325   IPTW Weights
CNN-LSTM   White                       0.2604   0.1745   0.8944        1.1485   1.1079   IPTW Weights
CBRF       Black or African American   0.4972   0.4889   0.4885        1.4748   1.4816   IPTW Weights
CBRF       White                       0.4972   0.4889   0.4885        1.4978   1.4855   IPTW Weights
LSTM       Black or African American   0.2751   0.1894   0.8748        1.1214   1.1558   IPTW Weights
LSTM       White                       0.2751   0.1894   0.8748        1.1802   1.1479   IPTW Weights