Article

Comparing Handcrafted Radiomics Versus Latent Deep Learning Features of Admission Head CT for Hemorrhagic Stroke Outcome Prediction

1. Department of Radiology, Columbia University Irving Medical Center, New York, NY 10032, USA
2. Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT 06520, USA
3. Zeenat Qureshi Stroke Institute and Department of Neurology, University of Missouri, Columbia, MO 65212, USA
4. Department of Neurosurgery, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
5. Department of Neurology, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
6. Department of Radiology, Weill Cornell Medicine, New York, NY 10065, USA
7. School of Electrical and Computer Engineering, Cornell Tech, Cornell University, New York, NY 10044, USA
8. Department of Neurology, Columbia University Irving Medical Center, New York, NY 10032, USA
9. Department of Neurology, Yale School of Medicine, New Haven, CT 06520, USA
* Author to whom correspondence should be addressed.
BioTech 2025, 14(4), 87; https://doi.org/10.3390/biotech14040087
Submission received: 15 September 2025 / Revised: 30 October 2025 / Accepted: 31 October 2025 / Published: 2 November 2025
(This article belongs to the Special Issue Advances in Bioimaging Technology)

Abstract

Handcrafted radiomics use predefined formulas to extract quantitative features from medical images, whereas deep neural networks learn de novo features through iterative training. We compared these approaches for predicting 3-month outcomes and hematoma expansion from admission non-contrast head CT in acute intracerebral hemorrhage (ICH). Training and cross-validation were performed using a multicenter trial cohort (n = 866), with external validation on a single-center dataset (n = 645). We trained multiscale U-shaped segmentation models for hematoma segmentation and extracted (i) radiomics from the segmented lesions and (ii) two latent deep feature sets—from the segmentation encoder and from a generative autoencoder trained on dilated lesion patches. Features were reduced with unsupervised Non-Negative Matrix Factorization (NMF) to 128 per set and used—alone or in combination—as inputs to six machine-learning classifiers predicting 3-month clinical outcomes and >3, >6, and >9 mL hematoma expansion thresholds. Adding latent deep features to radiomics numerically increased prediction performance for 3-month outcomes and hematoma expansion with Random Forest, XGBoost, Extra Trees, and Elastic Net classifiers; however, the improvement reached statistical significance only in predicting >3 mL hematoma expansion. Clinically, these consistent but modest increases in prediction performance may improve risk stratification at the individual level. Overall, latent deep features show potential for extracting additional clinically relevant information from admission head CT for prognostication in hemorrhagic stroke.
Key Contribution: We showed that latent deep features from a U-Net segmentation model and a generative autoencoder applied to admission head CT achieved slightly but consistently better prognostic performance for acute ICH than handcrafted radiomics computed by predefined formulas.

1. Introduction

Radiomics refers to the extraction of handcrafted quantitative features from medical images using predefined formulas representing lesion shape, intensity, and texture [1,2,3]. In contrast, deep learning models such as convolutional neural networks (CNNs) automatically learn hierarchical representations from clinical scans through iterative training, without any preset formula [4,5]. Compared to radiomics, deep learning models can capture high-order abstract patterns or subtle context in images [6,7,8]. However, training of CNN models requires large sample sizes, which can be particularly challenging in less common medical conditions. To circumvent this, some groups have used pretrained CNNs to extract features from clinical scans [9,10,11]. This strategy is—at least theoretically—limited by task specificity and by the fact that most pretrained CNNs are optimized for 2D non-medical images, making direct application to 3D volumetric clinical scans imperfect. Alternatively, a U-shaped neural network—trained for a segmentation task or configured as an autoencoder—can extract latent deep features from medical images [12,13,14]. Compared with generic pretrained CNNs, such in-domain models provide modality- and disease-specific features that can support clinical prediction.
Latent deep features extracted from brain MRIs have been widely used for the classification and differentiation of cerebral tumors. Various U-Net–based segmentation models have been applied for the dual purpose of lesion segmentation and prediction of pathological grades or tumor subtypes [15,16,17,18]. Denes-Fazakas et al. developed L-net, which couples a U-Net segmentation unit with a CNN classification unit so that the CNN uses latent deep features extracted by the U-Net rather than the raw image [15]. Rai et al. proposed a “two-headed” UNet-EfficientNet architecture that performs segmentation and classification in parallel, leveraging latent deep feature extraction from the segmentation arm for the classification of brain tumors [16]. In addition, several groups have utilized generative autoencoders to capture texture-based features for brain tumor classification [19,20,21,22,23,24,25,26,27]. For example, Cheng et al. reported on a multimodal disentangled variational autoencoder for glioma grading [19]. Ullah et al. combined CNN architecture with a Stack Encoder–Decoder network to extract and apply multimodal brain MRI features for tumor classification [22]. However, this approach has rarely been extended beyond oncologic neuroimaging. In this study, we apply such strategies and compare them with handcrafted radiomics for outcome prediction in hemorrhagic stroke.
Spontaneous acute intracerebral hemorrhage (ICH) accounts for 49.5% of the stroke burden in terms of disability-adjusted life years (DALYs) lost and 45.6% of stroke-related mortality [28]. Hematoma expansion affects up to 30% of ICH patients within the first 24 h after onset and is a main modifiable risk factor and treatment target in ICH [29]. Prior studies have shown that radiomic features extracted from admission head CT can predict hematoma expansion, long-term outcomes, and mortality in ICH [30,31,32,33,34,35,36,37,38,39,40,41]. For this study, we extracted handcrafted radiomic features from the hematoma lesions on admission head CTs for the prediction of 3-month clinical outcome and >3, >6, and >9 mL hematoma expansion. We compared these radiomic features with latent representations derived from (i) a U-Net–based ICH segmentation network [42] and (ii) a generative adversarial autoencoder trained to reconstruct the hemorrhagic lesion and perihematomal brain tissue [43]. U-Net segmentation models learn hierarchical representative features distinguishing hemorrhage from normal brain, where the encoder concentrates salient information and the decoder produces voxel-wise labels [44,45]. Similarly, U-shaped generative autoencoders reconstruct the input by learning a compact latent representation of cropped head CT patches containing the hemorrhage and perihematomal tissue, thereby capturing lesion morphology and surrounding tissue context [46,47]. We applied unsupervised Non-negative Matrix Factorization (NMF) for feature reduction to both radiomics and latent deep features [48]. Notably, in our study design, both feature extraction—via radiomics, U-Net segmentation, or generative autoencoder—and feature selection are performed independently of outcome prediction (i.e., 3-month clinical outcome or hematoma expansion), thereby avoiding information leakage and yielding outcome-agnostic features that can be reused for other ICH endpoints. We compared these features as input for six different machine learning classifiers.

2. Materials and Methods

2.1. Patients’ Datasets

For training and cross-validation, we used the patient dataset from the Antihypertensive Treatment of Acute Cerebral Hemorrhage II (ATACH-2) clinical trial [49]. ATACH-2 was a large-scale, randomized, multicenter study designed to assess the clinical impact of intensive systolic blood pressure lowering in individuals with acute ICH but found no treatment benefits [49]. For independent validation, we used the Yale Longitudinal Study of Acute Brain Injury dataset [50]. We included adult patients (>18 years old) with acute spontaneous ICH and baseline and follow-up head CT scans. For follow-up CT, we used the scan obtained closest to 24 h after ICH onset. Subjects with missing follow-up scans or poor-quality CTs were excluded. The 3-month modified Rankin Scale (mRS), or the closest available follow-up, was used to determine clinical outcome. Poor outcome was defined as mRS > 3, consistent with the prior literature [49].

2.2. The U-Shaped Hematoma Segmentation Model

We used nnU-Net [42] for the automated segmentation of hematomas on baseline non-contrast head CTs. The nnU-Net is a self-adapting deep learning framework that automatically configures preprocessing, network architecture, and training settings based on the provided dataset. The pipeline of nnU-Net consists of the following key components:
  • Preprocessing: Standardized intensity normalization, resampling to isotropic voxel spacing, and automatic cropping based on region of interest.
  • Network Architecture: A fully convolutional encoder–decoder model with residual blocks and deep supervision for improved gradient flow.
  • Training Strategy: Dice loss and cross-entropy loss are combined to address class imbalance, ensuring accurate segmentation of small hematomas.
The ground truth hematoma masks for training and validation were based on manually delineated hemorrhagic lesions. We adopted a multiscale nnU-Net architecture (Figure 1) for ICH segmentation to improve the network’s capacity to capture a broader and more diverse range of features across multiple spatial resolutions [51]. For the multiscale approach, we preprocessed input head CT scans at two isotropic resolutions: 128 × 128 × 128 and 64 × 64 × 64. Each scale passes through an encoder to extract rich hierarchical features, with latent features of (512, 8, 8, 8) and (512, 4, 4, 4), respectively. By processing both the original and scaled versions of the input images through parallel encoder paths, the network learns to capture high-level contextual features from the down-sampled image while simultaneously preserving fine spatial details from the original resolution. This dual-path encoding strategy allows the final layers of the two encoders to extract features derived from the full-resolution and scaled representations. Incorporating multiscale inputs theoretically allows the network to more accurately differentiate small hematomas from adjacent brain tissue, thus improving segmentation performance. This capability can be crucial for ICH segmentation, where lesions vary widely in size, shape, and anatomical location. We assessed the model’s segmentation performance using the Dice Similarity Coefficient, which measures the overlap between the model’s segmentation predictions and the manually delineated ground truth hematoma masks [52].
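For reference, a minimal sketch of the Dice Similarity Coefficient computation is shown below; the array names are illustrative, and the masks are assumed to be binary volumes of identical shape.

```python
import numpy as np

def dice_coefficient(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary masks."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    if denom == 0:  # both masks empty; treat as perfect agreement
        return 1.0
    return 2.0 * intersection / denom

# Example usage with a model prediction and a manual hematoma mask:
# dsc = dice_coefficient(predicted_volume, ground_truth_volume)
```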

2.3. Extraction of Handcrafted Radiomic Features

Radiomic features are quantitative descriptors extracted from medical images that capture various properties of a target lesion, including its shape, intensity distribution, texture, and spatial heterogeneity [53]. We applied the pyradiomics package [54] to extract n = 1693 radiomic features from hematoma lesions on admission non-contrast head CT scans (detailed list in Supplemental Table S1). These features quantified the following characteristics of hematomas (a minimal extraction sketch follows the list below):
  • Shape-based Features: Quantifying the geometry of the region of interest (ROI), such as volume, surface area, and sphericity.
  • First-order Statistics (Intensity-based): Quantifying the distribution of voxel intensities within the ROI, such as mean, median, variance, skewness, kurtosis, entropy, and energy.
  • Texture Features (second-order and higher): Capturing spatial relationships between lesion voxels, such as the Gray-Level Co-occurrence Matrix (GLCM) and Gray-Level Run Length Matrix (GLRLM).
  • Wavelet/Filter-based Features: Applying transforms such as wavelet or Laplacian of Gaussian to reveal multiscale features.
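As a minimal sketch of this step, the pyradiomics API can be configured to compute the shape, first-order, texture, and filter-based feature classes listed above; the file names are placeholders, and the exact extraction parameters used in the study may differ.

```python
from radiomics import featureextractor  # pip install pyradiomics

# Default settings compute shape, first-order, and texture features
extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.enableAllFeatures()
# Add filter-based feature classes (wavelet and Laplacian of Gaussian)
extractor.enableImageTypeByName('Wavelet')
extractor.enableImageTypeByName('LoG', customArgs={'sigma': [1.0, 3.0]})

# 'ct.nii.gz' and 'hematoma_mask.nii.gz' are placeholder file names
features = extractor.execute('ct.nii.gz', 'hematoma_mask.nii.gz')
numeric = {k: v for k, v in features.items() if not k.startswith('diagnostics')}
print(f'{len(numeric)} radiomic features extracted')
```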

2.4. Extraction of Latent Deep Learning Features from nnU-Net

We extracted high-level features from the final layer of the encoder in the nnU-Net segmentation model (Figure 2). These features tend to capture the most abstract semantic representations of the input image. At this stage of the U-shaped network, the spatial dimensions of the feature map have been significantly down-sampled through pooling or stride convolution operations, resulting in lower spatial resolution but much richer contextual and semantic understanding. These deep features encode global information about the entire image, such as object presence, class identity, and relationships between classes, rather than local textures or edges. In the context of ICH segmentation, these features can capture the overall shape, distribution, and anatomical context of hemorrhagic regions. This is especially important for distinguishing ICH from other tissues or pathologies that may appear similar at the local scale. In the U-shaped model, these features are typically passed through a bottleneck before being fed into the decoder arm, which reconstructs the segmentation map by gradually increasing the spatial resolution through upsampling and convolution.
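A minimal PyTorch sketch of this extraction is shown below, assuming access to the trained model and a handle to its deepest encoder stage; the module handle is illustrative, not nnU-Net’s actual attribute name.

```python
import torch
import torch.nn as nn

def extract_bottleneck_features(model: nn.Module,
                                bottleneck: nn.Module,
                                volume: torch.Tensor) -> torch.Tensor:
    """Capture the deepest encoder activation with a forward hook and
    flatten it into one latent vector per scan."""
    captured = {}

    def hook(_module, _inputs, output):
        captured['z'] = output.detach()

    handle = bottleneck.register_forward_hook(hook)
    with torch.no_grad():
        model(volume)  # full forward pass; only the hooked activation is kept
    handle.remove()
    z = captured['z']  # e.g., shape (1, 512, 4, 4, 4) for the half-scale path
    return z.flatten(start_dim=1).squeeze(0)
```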

2.5. Extraction of Latent Deep Features from a Generative Adversarial Network Autoencoder

Latent features from the U-shaped segmentation network theoretically capture characteristics that distinguish hematoma from surrounding brain parenchyma. To extract additional texture information, we applied an autoencoder reconstruction model using cropped CT images capturing the hematoma and surrounding tissue with a dilated mask (Figure 2). Dilating the ICH region allows the model to capture surrounding tissue changes or edema, which may be clinically relevant. Specifically, we applied binary dilation with a structuring element of size 5 to the predicted ICH mask, expanding the lesion region to capture surrounding contextual features. This dilated mask was then used to crop and extract a 64 × 64 × 64 ROI from the original head CT. This enriched region was then passed through a Variational Autoencoder–Generative Adversarial Network (VAE-GAN) architecture [43] to compress and model its latent features (Figure 2). The VAE-GAN combines the strengths of VAEs (robust probabilistic latent-space encoding) with those of GANs (high-quality image generation) to create compact, expressive feature representations that are both informative and generative.
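A minimal sketch of the dilation-and-cropping step is given below; the exact structuring element shape and the handling of voxels outside the dilated mask are assumptions, since the text specifies only the element size (5) and the 64 × 64 × 64 output.

```python
import numpy as np
from scipy.ndimage import binary_dilation
from skimage.transform import resize

def crop_dilated_roi(ct: np.ndarray, ich_mask: np.ndarray,
                     out_shape=(64, 64, 64)) -> np.ndarray:
    """Dilate the predicted ICH mask, crop its bounding box from the CT,
    and resample the patch to a fixed-size VAE-GAN input."""
    dilated = binary_dilation(ich_mask, structure=np.ones((5, 5, 5)))
    coords = np.argwhere(dilated)
    lo, hi = coords.min(axis=0), coords.max(axis=0) + 1
    # Keep intensities inside the dilated region; the background fill
    # value used here is an assumption
    patch = np.where(dilated, ct, ct.min())[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    return resize(patch, out_shape, preserve_range=True)
```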

2.6. Unsupervised Feature Selection

Unsupervised feature selection can identify a subset of relevant features from high-dimensional data without relying on labeled information [55]. This approach is essential where labeled data is scarce or unavailable, such as for clustering, anomaly detection, and exploratory data analysis. Unlike supervised techniques that use class labels to guide feature selection, unsupervised methods leverage data properties such as variance, correlation, or latent structure. The most notable technique used in this context is Non-negative Matrix Factorization (NMF) [48]. NMF can simplify complex data by breaking it down into smaller parts while ensuring all values remain non-negative. NMF helps highlight important patterns by grouping features that often appear together, making it useful for understanding the structure of the data and selecting key features without supervision. In this study, we applied NMF to reduce the high-dimensional features extracted from (1) radiomics; (2) two different scales of the nnU-Net segmentation model—compressing features of 512 × 8 × 8 × 8 and 512 × 4 × 4 × 4 size; and (3) autoencoder latent features of 512 × 4 × 4 × 4 size, into 128 dimensions each. The three 128-dimensional feature sets were used separately or combined into a single feature vector as inputs for the final machine learning models.
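A minimal sketch of this reduction step with scikit-learn is shown below; because NMF requires non-negative input, the features are min-shifted first (one simple convention, as the exact scaling is not specified in the text).

```python
import numpy as np
from sklearn.decomposition import NMF

def fit_nmf_128(X_train: np.ndarray, X_test: np.ndarray, k: int = 128):
    """Fit NMF on the training matrix only and apply it to the test matrix,
    reducing each feature set to k non-negative components."""
    shift = X_train.min(axis=0)  # min-shift so all values are non-negative
    nmf = NMF(n_components=k, init='nndsvda', max_iter=500, random_state=0)
    W_train = nmf.fit_transform(np.clip(X_train - shift, 0.0, None))
    W_test = nmf.transform(np.clip(X_test - shift, 0.0, None))
    return W_train, W_test
```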

2.7. Machine Learning Prediction Models

Given the diversity of real-world data—ranging from linear to highly nonlinear relationships—it is beneficial to evaluate a variety of models with different underlying assumptions and learning strategies. We implemented six diverse models: Random Forest (RF) with 1000 estimators [56], XGBoost with 1000 estimators and log-loss as the evaluation metric [57], Naive Bayes classifier (GaussianNB) [58], Extra Trees Classifier (ExtraTrees) with 1000 estimators [59], Elastic Net regularization (ElasticNet_LogReg) with a balanced mix of L1 and L2 penalties (l1_ratio = 0.5) and the “saga” solver [60], and Support Vector Machine (SVM) with a radial basis function (RBF) kernel [61]. All models were initialized with default settings and a fixed random seed to ensure reproducibility, enabling a fair comparison across classifiers. We applied stratified 5-fold cross-validation on the ATACH-2 dataset to obtain the hyperparameters. Then, we applied these hyperparameters to the entire ATACH-2 dataset for final training and subsequently evaluated the final model on the independent Yale test set. In each step of cross-validation and final testing, NMF dimensionality reduction was fit on the training fold or training set and then applied to the validation fold or independent test cohort. Because of outcome imbalance, with substantially fewer patients experiencing poor outcomes or hematoma expansion, the cross-validation folds were stratified on the corresponding outcome labels. Stratification ensures that each fold preserves approximately the same proportion of positive and negative cases, preventing training folds from being dominated by the majority class.
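The following sketch assembles the six classifiers with the stated settings and runs the stratified, leakage-free cross-validation loop; X and y are placeholder arrays, and fit_nmf_128 refers to the sketch above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

SEED = 0  # fixed random seed for reproducibility
classifiers = {
    'RF': RandomForestClassifier(n_estimators=1000, random_state=SEED),
    'XGBoost': XGBClassifier(n_estimators=1000, eval_metric='logloss',
                             random_state=SEED),
    'GaussianNB': GaussianNB(),
    'ExtraTrees': ExtraTreesClassifier(n_estimators=1000, random_state=SEED),
    'ElasticNet_LogReg': LogisticRegression(penalty='elasticnet', l1_ratio=0.5,
                                            solver='saga', max_iter=5000,
                                            random_state=SEED),
    'SVM': SVC(kernel='rbf', probability=True, random_state=SEED),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
for train_idx, val_idx in cv.split(X, y):
    # NMF is fit on the training fold only, then applied to the held-out
    # fold, mirroring the leakage-free design described above
    W_tr, W_val = fit_nmf_128(X[train_idx], X[val_idx])
    for name, clf in classifiers.items():
        clf.fit(W_tr, y[train_idx])
        auc = roc_auc_score(y[val_idx], clf.predict_proba(W_val)[:, 1])
        print(f'{name}: fold AUC = {auc:.3f}')
```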
We trained and validated these machine learning classifiers using seven different inputs: (1) all radiomic features, (2) only shape radiomic features, (3) latent features from nnU-Net, (4) latent features from VAE-GAN, (5) the combination of latent features from both nnU-Net and VAE-GAN, (6) the combination of radiomic and latent features from nnU-Net and VAE-GAN, and (7) the combination of radiomic shape and latent features from nnU-Net and VAE-GAN. The models were trained to predict four different binary outcomes: (a) 3-month poor outcomes, and (b) >3 mL, (c) >6 mL, and (d) >9 mL hematoma expansion. Model performance was quantified and compared using the area under the receiver operating characteristic (ROC) curve (AUC) [62].
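A short sketch of how the seven inputs can be assembled is shown below; R_all, R_shape, Z_seg, and Z_vae are placeholder matrices holding the NMF-reduced radiomic, shape-only, nnU-Net, and VAE-GAN feature sets, respectively.

```python
import numpy as np

# Each array is (n_patients, n_components) after NMF reduction
inputs = {
    'radiomics': R_all,
    'radiomics_shape': R_shape,
    'nnunet': Z_seg,
    'vaegan': Z_vae,
    'nnunet+vaegan': np.concatenate([Z_seg, Z_vae], axis=1),
    'radiomics+nnunet+vaegan': np.concatenate([R_all, Z_seg, Z_vae], axis=1),
    'shape+nnunet+vaegan': np.concatenate([R_shape, Z_seg, Z_vae], axis=1),
}
```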

2.8. Statistical Analysis

Continuous variables are reported as mean ± standard deviation and categorical variables are reported as number (percentage). Additional statistical analyses were performed to verify whether observed differences in classifier performance across feature pipelines were statistically significant [63,64]. Because model performance metrics (e.g., AUC) across repeated cross-validation folds and classifiers did not meet normality assumptions (Shapiro–Wilk test, p < 0.05), we used non-parametric tests. A Friedman test was first applied to assess global differences among the seven feature-extraction pipelines across the six classifiers for each outcome label (3-month mRS and hematoma expansion thresholds of 3 mL, 6 mL, and 9 mL) [65]. When the Friedman test indicated a significant global effect (p < 0.05), pairwise Wilcoxon signed-rank tests were performed between pipelines to identify the source of the difference. To control for multiple comparisons, p-values were adjusted using three complementary procedures: Bonferroni, Holm–Bonferroni, and Benjamini–Hochberg (False Discovery Rate, FDR) [66].
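A minimal sketch of this testing procedure with SciPy and statsmodels follows; aucs is a placeholder matrix of per-classifier AUCs (rows) across the seven pipelines (columns) for one outcome label.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon
from statsmodels.stats.multitest import multipletests

# aucs: (n_classifiers, n_pipelines) AUC matrix for one outcome label
stat, p_global = friedmanchisquare(*(aucs[:, j] for j in range(aucs.shape[1])))

if p_global < 0.05:
    pairs, pvals = [], []
    n_pipes = aucs.shape[1]
    for i in range(n_pipes):
        for j in range(i + 1, n_pipes):
            pairs.append((i, j))
            pvals.append(wilcoxon(aucs[:, i], aucs[:, j]).pvalue)
    # Three complementary multiple-comparison corrections, as in the text
    for method in ('bonferroni', 'holm', 'fdr_bh'):
        rejected, p_adj, _, _ = multipletests(pvals, alpha=0.05, method=method)
        print(method, np.round(p_adj, 4))
```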

3. Results

3.1. Patients’ Characteristics

A total of 866 patients from the ATACH-2 trial were included in the training/cross-validation cohort and 645 from the Yale dataset in the independent test cohort. Patients’ characteristics are summarized and compared between the two cohorts in Table 1. Overall, patients in ATACH-2 had smaller initial hematoma volumes, less severe symptoms at presentation, and lower rates of hematoma expansion and poor outcomes. This is likely due to the trial inclusion criteria limiting enrollment to patients with <60 mL ICH at admission.

3.2. Automated Hematoma Segmentation Performance

We used baseline head CTs for training and validation. The multiscale nnU-Net using dual input resolutions achieved an average Dice of 0.88 ± 0.05 in cross-validation and 0.77 ± 0.19 in the independent test. Notably, a single-scale nnU-Net also achieved an average Dice of 0.87 ± 0.08 in cross-validation and 0.77 ± 0.19 in the independent test.

3.3. Comparison of Radiomics and Latent Deep Features in ICH Outcome Prediction

Figure 3 summarizes the accuracy of machine learning models with different inputs in the prediction of outcome and hematoma expansion. Supplementary Table S2 includes the details of AUC, F1-score, sensitivity, specificity, and negative and positive predictive values for all iterations among both validation folds and the independent test cohort.
Overall, the combined latent deep features extracted by the nnU-Net and VAE-GAN had consistently, though only slightly, higher AUCs than radiomics in Random Forest, XGBoost, Extra Trees, and Elastic Net logistic regression models. Supplementary Table S3 shows the detailed AUC differences between radiomics alone versus the combination of radiomics with latent deep features as input for all classifiers. The XGBoost and Elastic Net models using combined input had significantly higher AUCs in predicting >3 mL hematoma expansion (p = 0.027 and 0.040, respectively). Otherwise, the higher AUCs did not reach statistical significance. Notably, the Naive Bayes models predicting hematoma expansion had lower AUCs using combined inputs compared to radiomics alone.
The Friedman test showed a statistically significant global difference in classifier AUCs between pipelines in predicting 3-month outcome (χ2 = 20.429, p = 0.002). No significant global difference was observed in classifier AUCs between pipelines in predicting >3 mL (χ2 = 10.286, p = 0.113), >6 mL (χ2 = 4.143, p = 0.657), or >9 mL (χ2 = 3.786, p = 0.706) hematoma expansion. As detailed in Supplementary Table S5, we then compared the seven pipelines across the six classifiers using pairwise Wilcoxon signed-rank tests with Bonferroni, Holm–Bonferroni, and Benjamini–Hochberg (False Discovery Rate, FDR) corrections. Although none of the comparisons reached statistical significance after correction, pipelines with latent features showed consistent numerical improvement over the baseline. These findings suggest the potential for improved performance by combining latent features with radiomic geometric and shape descriptors, although further validation with larger sample sizes and more classifiers is needed.
The calibration curve analysis for the best-performing model, the Extra Trees classifier trained with a combination of radiomic and latent deep features from both the nnU-Net and VAE-GAN encoders, demonstrated close agreement between the predicted and observed probabilities of poor outcome. The calibration plot showed close alignment across the entire probability range, with minimal overestimation in the highest risk decile, indicating that the model’s predicted risk closely reflected the true event rate (Figure 4). We also compared the models’ performance between male and female patients and found no significant difference (Supplementary Table S4). Model interpretability using SHAP (SHapley Additive exPlanations) revealed consistent feature-group importance patterns across all classifiers in the outcome prediction task. In addition to the importance of radiomic features, the latent deep features obtained from the VAE-GAN encoder contributed the most in terms of predictive importance, capturing subtle contextual and perivascular texture patterns beyond handcrafted radiomic features (Figure 5).
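As an illustration of how the group-level SHAP contributions in Figure 5 can be computed, a hedged sketch follows; extra_trees, W, and feature_groups are placeholders for the fitted classifier, the combined feature matrix, and a mapping from feature-group names to column indices.

```python
import numpy as np
import shap  # pip install shap

explainer = shap.TreeExplainer(extra_trees)
shap_values = explainer.shap_values(W)
# For binary sklearn classifiers, older shap versions return one array per
# class; keep the positive-class attributions in that case
if isinstance(shap_values, list):
    shap_values = shap_values[1]

group_importance = {
    name: float(np.abs(shap_values[:, cols]).mean())
    for name, cols in feature_groups.items()
}
print(group_importance)
```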

4. Discussion

Our proposed models for predicting hematoma expansion and clinical outcomes from admission head CT scans can risk-stratify patients and potentially guide early treatment or intervention immediately after ICH diagnosis, even in the absence of additional clinical information. We found that adding latent deep features to radiomics slightly increased model accuracy, though not to a statistically significant degree. Nevertheless, SHAP explainability analysis indicated that, when combined, latent deep features exert similar or even greater impact on prediction decisions compared with radiomics. Overall, our findings demonstrate promising but not definitive added value of latent deep features for prognostic modeling in hemorrhagic stroke. Furthermore, because the latent features are outcome-independent and do not rely on follow-up data, this framework can be repurposed as a prognostic tool to predict future hematoma expansion or other downstream outcomes.
We trained a multiscale nnU-Net for hematoma segmentation on non-contrast head CT. Dual-scale inputs modestly improved Dice accuracy over single-scale input, and latent features from the dual-scale nnU-Net likewise improved prediction performance. Similar multiscale input fusion [67] and multiscale densely connected U-Net [68] architectures have previously been shown to improve segmentation performance. To design a fully automated pipeline, nnU-Net outputs were used to extract both radiomics and latent deep features. Encoder features from the segmentation nnU-Net capture hemorrhage morphology and its distinction from perihematomal tissue, while the generative autoencoder (VAE-GAN) trained on dilated lesion patches learns latent vectors encoding spatial texture and context from the ICH and surrounding brain tissue. Concatenating these vectors yields a set of composite latent deep features that can substitute—or augment—conventional radiomics. The addition of the 14 shape features provided an incremental gain in prediction accuracy nearly identical to that of the full radiomics set. Using unsupervised NMF, we reduced the feature space to 128 variables per set, which is practical for studies with limited sample sizes. Overall, we demonstrate the feasibility of capturing prognostically relevant shape and texture information in <300 (latent deep) features that can be flexibly applied to any ICH outcome prediction scenario.
Prior studies have shown that radiomics from admission non-contrast head CT and deep learning models can predict hematoma expansion [69,70,71]. In head-to-head comparisons, deep learning models outperform radiomics alone [37]. Similarly, radiomics-based and deep learning models can predict clinical outcome from admission head CT scans [32]. Complementary evidence suggests that machine learning models incorporating clinical data further improve prediction [32]. Although U-Net-based segmentation and generative autoencoders have been used in brain tumor imaging to create composite latent deep features [15,16,17,18,19,20,21,22,23,24,25,26,27], this strategy has rarely been applied to ICH. Lee et al. applied a combination of ICH detection and segmentation deep features to predict hematoma expansion in 572 patients [72]. We also introduce unsupervised NMF for feature selection and systematically compare radiomics and latent deep features, both separately and in combination. Our findings demonstrate the feasibility of an outcome-agnostic latent deep feature extraction and selection pipeline that yields a compact feature set suited to small-sample ICH studies.
While the combination of radiomics and latent deep features from nnU-Net segmentation and the VAE-GAN autoencoder consistently achieved numerically higher AUCs in Random Forest, XGBoost, Extra Trees, and Elastic Net models compared to radiomics alone, the gains were statistically significant only in the prediction of >3 mL hematoma expansion. Clinically, these modest differences may still improve risk stratification at the individual level, and the latent features show potential for extracting additional prognostic information from medical images, given their impact on model prediction decisions in the SHAP analysis. In addition, the combination of latent deep features from the segmentation model and the autoencoder showed a trend towards higher accuracy than radiomics, suggesting that they may serve as substitutes for handcrafted features.
Notably, despite significant demographic, clinical, and radiological differences between the ATACH-2 and Yale datasets, we found no consistent decline in model performance when comparing the validation and independent test cohorts (Supplementary Table S3). Compared with ATACH-2, the Yale cohort was significantly older, had a higher proportion of Hispanic and white patients, and had higher rates of hypertension, diabetes, hyperlipidemia, and atrial fibrillation. Patients in the Yale dataset had more severe neurological symptoms, reflected by lower Glasgow Coma Scale and higher NIH Stroke Scale scores. Initial and follow-up hematoma volumes were also larger in the Yale cohort, and hematoma expansion rates were higher at all thresholds (>3, >6, and >9 mL). These differences represent a clear distributional shift between the training (ATACH-2) and validation (Yale) domains. While this supports the generalizability of the models, it also suggests that slight shifts in calibration may impact the transferability of learned image representations, particularly for latent deep features, which are more sensitive to intensity patterns. However, our models retained predictive performance between cross-validation and the independent test set.
The latent features from the U-shaped model tend to capture characteristics that differentiate the hematoma from the surrounding parenchyma, whereas the VAE-GAN model using the dilated lesion masks for input encodes both the texture within and around the hematoma, as well as broader contextual changes. This enables the combined features to incorporate information related to edema, boundary irregularities, and tissue heterogeneity. However, these features are not directly interpretable by humans. To enhance transparency, saliency-based visualization methods (e.g., 3D Grad-CAM) can be applied to the expanded lesion to identify the brain areas most influential in the latent features.
A main limitation of our study was the clinical differences between the training/cross-validation cohort from ATACH-2 and the independent test set from Yale, as shown in Table 1. However, the heterogeneity of the multicenter training cohort combined with a fully independent test set supports the generalizability of our final models. Given that our models were intentionally image-only, we did not include clinical variables; prior work (including ours [32]) has shown that combining imaging and clinical variables improves prediction accuracy. It should also be noted that latent deep features—especially after NMF selection—are less interpretable than radiomics.

5. Conclusions

Using a large multicenter cohort for training and cross-validation, with an independent test set, we demonstrated that adding latent deep features from a segmentation encoder and a generative autoencoder to handcrafted radiomics yielded slightly more accurate predictions of outcomes and hematoma expansion, although statistical significance was reached in only a few instances. When combined features were used as input, latent deep features contributed equally or more to the machine-learning predictions. Our proposed pipeline—comprising outcome-agnostic feature extraction, unsupervised NMF dimensionality reduction, and automated segmentation—can be readily deployed in ICH studies with limited sample sizes.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/biotech14040087/s1, Supplementary Table S1: List of radiomic features; Supplementary Table S2: Details of machine learning classifier performance; Supplementary Table S3: Comparison of models with radiomics alone versus those with combined radiomics and latent deep features; Supplementary Table S4: Comparison of model performance between male versus female patients; Supplementary Table S5: Statistical comparison of different models.

Author Contributions

Conceptualization and Study Design: A.T.T., J.W. and S.P.; Data collection/curation: G.A.K., D.Z., A.M., G.J.F. and A.I.Q.; Investigation: A.T.T., G.A.K., D.Z., A.I.Q. and A.M., S.M., S.B.M., D.R., G.J.F., K.N.S. and S.P.; Methodology, Formal Analysis, Visualizations (Figures): A.T.T. and S.P.; Data Interpretation: A.T.T., J.W. and S.P.; Manuscript Writing—original draft: A.T.T., J.W., G.A.K., D.Z., A.I.Q., A.M., S.M., N.V., S.B.M., M.R.S., D.R., G.J.F., K.N.S. and S.P.; Manuscript Writing—review and editing: A.T.T., J.W., G.A.K., D.Z., A.I.Q., A.M., S.M., N.V., S.B.M., M.R.S., D.R., G.J.F., K.N.S. and S.P.; Project administration: A.T.T., J.W. and S.P.; Resources: S.P.; Supervision: S.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the NIH (K23NS118056 and R01NS140459).

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Boards at Columbia University Medical Center, New York, NY, USA (AAAV3664), Yale University, New Haven, CT, USA (2000024296), and the participating centers in the ATACH-2 trial (ClinicalTrials.gov identifier: NCT01176565).

Informed Consent Statement

Informed consent was obtained from all subjects in the ATACH-2 trial. Patient consent in the Yale datasets was waived due to the retrospective nature of study.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Materials. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ATACH-2 Antihypertensive Treatment of Acute Cerebral Hemorrhage II
AUC Area under the curve
CNN Convolutional Neural Networks
DALYs Disability-adjusted life years
FDR False Discovery Rate
GLRLM Gray-Level Run Length Matrix
GLCM Gray-Level Co-occurrence Matrix
ICH Intracerebral hemorrhage
NMF Non-negative Matrix Factorization
RBF Radial basis function
ROC Receiver operating characteristics
ROI Region of interest
RF Random Forest
SVM Support Vector Machine
VAE-GAN Variational Autoencoder–Generative Adversarial Network
SHAP SHapley Additive exPlanations

References

  1. Mariotti, F.; Agostini, A.; Borgheresi, A.; Marchegiani, M.; Zannotti, A.; Giacomelli, G.; Pierpaoli, L.; Tola, E.; Galiffa, E.; Giovagnoni, A. Insights into radiomics: A comprehensive review for beginners. Clin. Transl. Oncol. 2025, 27, 4091–4102.
  2. Vrettos, K.; Triantafyllou, M.; Marias, K.; Karantanas, A.H.; Klontzas, M.E. Artificial intelligence-driven radiomics: Developing valuable radiomics signatures with the use of artificial intelligence. BJR Artif. Intell. 2024, 1, ubae011.
  3. Lambin, P.; Rios-Velazquez, E.; Leijenaar, R.; Carvalho, S.; van Stiphout, R.G.; Granton, P.; Zegers, C.M.; Gillies, R.; Boellard, R.; Dekker, A.; et al. Radiomics: Extracting more information from medical images using advanced feature analysis. Eur. J. Cancer 2012, 48, 441–446.
  4. Chen, J.; Ye, Z.; Zhang, R.; Li, H.; Fang, B.; Zhang, L.B.; Wang, W. Medical image translation with deep learning: Advances, datasets and perspectives. Med. Image Anal. 2025, 103, 103605.
  5. Xia, Q.; Zheng, H.; Zou, H.; Luo, D.; Tang, H.; Li, L.; Jiang, B. A comprehensive review of deep learning for medical image segmentation. Neurocomputing 2025, 613, 128740.
  6. He, Z.; McMillan, A.B. Comparative Evaluation of Radiomics and Deep Learning Models for Disease Detection in Chest Radiography. arXiv 2025, arXiv:2504.12249.
  7. Shariaty, F.; Pavlov, V.; Baranov, M. AI-Driven Precision Oncology: Integrating Deep Learning, Radiomics, and Genomic Analysis for Enhanced Lung Cancer Diagnosis and Treatment. Signal Image Video Process. 2025, 19, 693.
  8. Buvat, I.; Dutta, J.; Jha, A.K.; Siegel, E.; Yousefirizi, F.; Rahmim, A.; Bradshaw, T. Should end-to-end deep learning replace handcrafted radiomics? Eur. J. Nucl. Med. Mol. Imaging 2025, 52, 4360–4363.
  9. Jain, A.; Pandey, M.; Sahu, S. A Deep Learning-Based Feature Extraction Model for Classification Brain Tumor. In Lecture Notes on Data Engineering and Communications Technologies, Proceedings of the Data Analytics and Management, Virtual, 25–26 June 2021; Springer Nature: Singapore, 2022; pp. 493–508.
  10. Sage, A.; Badura, P. Intracranial Hemorrhage Detection in Head CT Using Double-Branch Convolutional Neural Network, Support Vector Machine, and Random Forest. Appl. Sci. 2020, 10, 7577.
  11. Ertuğrul, Ö.F.; Akıl, M.F. Detecting hemorrhage types and bounding box of hemorrhage by deep learning. Biomed. Signal Process. Control 2022, 71, 103085.
  12. Bijari, S.; Sayfollahi, S.; Mardokh-Rouhani, S.; Bijari, S.; Moradian, S.; Zahiri, Z.; Rezaeijo, S.M. Radiomics and Deep Features: Robust Classification of Brain Hemorrhages and Reproducibility Analysis Using a 3D Autoencoder Neural Network. Bioengineering 2024, 11, 643.
  13. Sasagasako, T.; Ueda, A.; Mineharu, Y.; Mochizuki, Y.; Doi, S.; Park, S.; Terada, Y.; Sano, N.; Tanji, M.; Arakawa, Y.; et al. Postoperative Karnofsky performance status prediction in patients with IDH wild-type glioblastoma: A multimodal approach integrating clinical and deep imaging features. PLoS ONE 2024, 19, e0303002.
  14. Suero Molina, E.; Azemi, G.; Ozdemir, Z.; Russo, C.; Krahling, H.; Valls Chavarria, A.; Liu, S.; Stummer, W.; Di Ieva, A. Predicting intraoperative 5-ALA-induced tumor fluorescence via MRI and deep learning in gliomas with radiographic lower-grade characteristics. J. Neurooncol. 2025, 171, 589–598.
  15. Denes-Fazakas, L.; Kovacs, L.; Eigner, G.; Szilagyi, L. Enhancing Brain Tumor Diagnosis with L-Net: A Novel Deep Learning Approach for MRI Image Segmentation and Classification. Biomedicines 2024, 12, 2388.
  16. Rai, H.M.; Yoo, J.; Dashkevych, S. Two-headed UNetEfficientNets for parallel execution of segmentation and classification of brain tumors: Incorporating postprocessing techniques with connected component labelling. J. Cancer Res. Clin. Oncol. 2024, 150, 220.
  17. Lv, C.; Shu, X.J.; Liang, Q.; Qiu, J.; Xiong, Z.C.; Ye, J.B.; Li, S.B.; Liu, C.Q.; Niu, J.Z.; Chen, S.B.; et al. BrainTumNet: Multi-task deep learning framework for brain tumor segmentation and classification using adaptive masked transformers. Front. Oncol. 2025, 15, 1585891.
  18. Kihira, S.; Mei, X.; Mahmoudi, K.; Liu, Z.; Dogra, S.; Belani, P.; Tsankova, N.; Hormigo, A.; Fayad, Z.A.; Doshi, A.; et al. U-Net Based Segmentation and Characterization of Gliomas. Cancers 2022, 14, 4457.
  19. Cheng, J.; Gao, M.; Liu, J.; Yue, H.; Kuang, H.; Liu, J.; Wang, J. Multimodal Disentangled Variational Autoencoder with Game Theoretic Interpretability for Glioma Grading. IEEE J. Biomed. Health Inform. 2022, 26, 673–684.
  20. Yathirajam, S.S.; Gutta, S. Efficient glioma grade prediction using learned features extracted from convolutional neural networks. J. Med. Artif. Intell. 2024, 7. Available online: https://jmai.amegroups.org/article/view/8452/html (accessed on 1 October 2025).
  21. Abd El Kader, I.; Xu, G.; Shuai, Z.; Saminu, S.; Javaid, I.; Ahmad, I.S.; Kamhi, S. Brain Tumor Detection and Classification on MR Images by a Deep Wavelet Auto-Encoder Model. Diagnostics 2021, 11, 1589.
  22. Ullah, M.S.; Khan, M.A.; Almujally, N.A.; Alhaisoni, M.; Akram, T.; Shabaz, M. BrainNet: A fusion assisted novel optimal framework of residual blocks and stacked autoencoders for multimodal brain tumor classification. Sci. Rep. 2024, 14, 5895.
  23. Sandeep Waghere, S.; Prashant Shinde, J. A robust classification of brain tumor disease in MRI using twin-attention based dense convolutional auto-encoder. Biomed. Signal Process. Control 2024, 92, 106088.
  24. Cao, Y.; Liang, F.; Zhao, T.; Han, J.; Wang, Y.; Wu, H.; Zhang, K.; Qiu, H.; Ding, Y.; Zhu, H. Brain tumor intelligent diagnosis based on Auto-Encoder and U-Net feature extraction. PLoS ONE 2025, 20, e0315631.
  25. Ahmad, B.; Sun, J.; You, Q.; Palade, V.; Mao, Z. Brain Tumor Classification Using a Combination of Variational Autoencoders and Generative Adversarial Networks. Biomedicines 2022, 10, 223.
  26. Kordnoori, S.; Sabeti, M.; Shakoor, M.H.; Moradi, E. Deep multi-task learning structure for segmentation and classification of supratentorial brain tumors in MR images. Interdiscip. Neurosurg. 2024, 36, 101931.
  27. Li, G.; Hui, X.; Li, W.; Luo, Y. Multitask Learning with Multiscale Residual Attention for Brain Tumor Segmentation and Classification. Mach. Intell. Res. 2023, 20, 897–908.
  28. Parry-Jones, A.R.; Krishnamurthi, R.; Ziai, W.C.; Shoamanesh, A.; Wu, S.; Martins, S.O.; Anderson, C.S. World Stroke Organization (WSO): Global intracerebral hemorrhage factsheet 2025. Int. J. Stroke 2025, 20, 145–150.
  29. Chen, S.; Fan, J.; Abdollahi, A.; Ashrafi, N.; Alaei, K.; Placencia, G.; Pishgar, M. Machine Learning-Based Prediction of ICU Readmissions in Intracerebral Hemorrhage Patients: Insights from the MIMIC Databases. medRxiv 2025.
  30. Yu, F.; Yang, M.; He, C.; Yang, Y.; Peng, Y.; Yang, H.; Lu, H.; Liu, H. CT radiomics combined with clinical and radiological factors predict hematoma expansion in hypertensive intracerebral hemorrhage. Eur. Radiol. 2025, 35, 6–19.
  31. Zeng, W.; Chen, J.; Shen, L.; Xia, G.; Xie, J.; Zheng, S.; He, Z.; Deng, L.; Guo, Y.; Yang, J.; et al. Clinical, radiological, and radiomics feature-based explainable machine learning models for prediction of neurological deterioration and 90-day outcomes in mild intracerebral hemorrhage. BMC Med. Imaging 2025, 25, 184.
  32. Dierksen, F.; Sommer, J.K.; Tran, A.T.; Lin, H.; Haider, S.P.; Maier, I.L.; Aneja, S.; Sanelli, P.C.; Malhotra, A.; Qureshi, A.I.; et al. Machine Learning Models for 3-Month Outcome Prediction Using Radiomics of Intracerebral Hemorrhage and Perihematomal Edema from Admission Head Computed Tomography (CT). Diagnostics 2024, 14, 2827.
  33. Haider, S.P.; Qureshi, A.I.; Jain, A.; Tharmaseelan, H.; Berson, E.R.; Zeevi, T.; Werring, D.J.; Gross, M.; Mak, A.; Malhotra, A.; et al. Radiomic markers of intracerebral hemorrhage expansion on non-contrast CT: Independent validation and comparison with visual markers. Front. Neurosci. 2023, 17, 1225342.
  34. Zaman, S.; Dierksen, F.; Knapp, A.; Haider, S.P.; Abou Karam, G.; Qureshi, A.I.; Falcone, G.J.; Sheth, K.N.; Payabvash, S. Radiomic Features of Acute Cerebral Hemorrhage on Non-Contrast CT Associated with Patient Survival. Diagnostics 2024, 14, 944.
  35. Jain, A.; Malhotra, A.; Payabvash, S. Imaging of Spontaneous Intracerebral Hemorrhage. Neuroimaging Clin. N. Am. 2021, 31, 193–203.
  36. Chen, Q.; Zhu, D.; Liu, J.; Zhang, M.; Xu, H.; Xiang, Y.; Zhan, C.; Zhang, Y.; Huang, S.; Yang, Y. Clinical-radiomics Nomogram for Risk Estimation of Early Hematoma Expansion after Acute Intracerebral Hemorrhage. Acad. Radiol. 2021, 28, 307–317.
  37. Lu, M.; Wang, Y.; Tian, J.; Feng, H. Application of deep learning and radiomics in the prediction of hematoma expansion in intracerebral hemorrhage: A fully automated hybrid approach. Diagn. Interv. Radiol. 2024, 30, 299–312.
  38. Ma, C.; Zhang, Y.; Niyazi, T.; Wei, J.; Guocai, G.; Liu, J.; Liang, S.; Liang, F.; Yan, P.; Wang, K.; et al. Radiomics for predicting hematoma expansion in patients with hypertensive intraparenchymal hematomas. Eur. J. Radiol. 2019, 115, 10–15.
  39. Pszczolkowski, S.; Manzano-Patron, J.P.; Law, Z.K.; Krishnan, K.; Ali, A.; Bath, P.M.; Sprigg, N.; Dineen, R.A. Quantitative CT radiomics-based models for prediction of haematoma expansion and poor functional outcome in primary intracerebral haemorrhage. Eur. Radiol. 2021, 31, 7945–7959.
  40. Xie, H.; Ma, S.; Wang, X.; Zhang, X. Noncontrast computer tomography-based radiomics model for predicting intracerebral hemorrhage expansion: Preliminary findings and comparison with conventional radiological model. Eur. Radiol. 2020, 30, 87–98.
  41. Yu, B.; Melmed, K.R.; Frontera, J.; Zhu, W.; Huang, H.; Qureshi, A.I.; Maggard, A.; Steinhof, M.; Kuohn, L.; Kumar, A.; et al. Predicting hematoma expansion after intracerebral hemorrhage: A comparison of clinician prediction with deep learning radiomics models. Neurocrit. Care 2025, 43, 119–129.
  42. Isensee, F.; Jaeger, P.F.; Kohl, S.A.A.; Petersen, J.; Maier-Hein, K.H. nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 2021, 18, 203–211.
  43. Larsen, A.B.L.; Sønderby, S.K.; Larochelle, H.; Winther, O. Autoencoding beyond pixels using a learned similarity metric. In Proceedings of the 33rd International Conference on Machine Learning (ICML), New York, NY, USA, 19 June 2016; pp. 1558–1566.
  44. Azad, R.; Aghdam, E.K.; Rauland, A.; Jia, Y.; Avval, A.H.; Bozorgpour, A.; Karimijafarbigloo, S.; Cohen, J.P.; Adeli, E.; Merhof, D. Medical Image Segmentation Review: The Success of U-Net. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10076–10095.
  45. Chen, L.; Li, J.; Ge, H. TBUnet: A Pure Convolutional U-Net Capable of Multifaceted Feature Extraction for Medical Image Segmentation. J. Med. Syst. 2023, 47, 122.
  46. Chen, X.; Li, Y.; Yao, L.; Adeli, E.; Zhang, Y.; Wang, X. Generative adversarial U-Net for domain-free few-shot medical diagnosis. Pattern Recognit. Lett. 2022, 157, 112–118.
  47. Skandarani, Y.; Jodoin, P.M.; Lalande, A. GANs for Medical Image Synthesis: An Empirical Study. J. Imaging 2023, 9, 69.
  48. Lee, D.D.; Seung, H.S. Learning the parts of objects by non-negative matrix factorization. Nature 1999, 401, 788–791.
  49. Qureshi, A.I.; Palesch, Y.Y.; Barsan, W.G.; Hanley, D.F.; Hsu, C.Y.; Martin, R.L.; Moy, C.S.; Silbergleit, R.; Steiner, T.; Suarez, J.I.; et al. Intensive Blood-Pressure Lowering in Patients with Acute Cerebral Hemorrhage. N. Engl. J. Med. 2016, 375, 1033–1043.
  50. Torres-Lopez, V.M.; Rovenolt, G.E.; Olcese, A.J.; Garcia, G.E.; Chacko, S.M.; Robinson, A.; Gaiser, E.; Acosta, J.; Herman, A.L.; Kuohn, L.R.; et al. Development and Validation of a Model to Identify Critical Brain Injuries Using Natural Language Processing of Text Computed Tomography Reports. JAMA Netw. Open 2022, 5, e2227109.
  51. Su, R.; Zhang, D.; Liu, J.; Cheng, C. MSU-Net: Multi-Scale U-Net for 2D Medical Image Segmentation. Front. Genet. 2021, 12, 639930.
  52. Dice, L.R. Measures of the amount of ecologic association between species. Ecology 1945, 26, 297–302.
  53. Avery, E.; Sanelli, P.C.; Aboian, M.; Payabvash, S. Radiomics: A Primer on Processing Workflow and Analysis. Semin. Ultrasound CT MR 2022, 43, 142–146.
  54. van Griethuysen, J.J.M.; Fedorov, A.; Parmar, C.; Hosny, A.; Aucoin, N.; Narayan, V.; Beets-Tan, R.G.H.; Fillion-Robin, J.C.; Pieper, S.; Aerts, H. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res. 2017, 77, e104–e107.
  55. Solorio-Fernández, S.; Carrasco-Ochoa, J.A.; Martínez-Trinidad, J.F. A review of unsupervised feature selection methods. Artif. Intell. Rev. 2019, 53, 907–948.
  56. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  57. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  58. McCallum, A.; Nigam, K. A Comparison of Event Models for Naive Bayes Text Classification. In Proceedings of the AAAI-98 Workshop on Learning for Text Categorization, Madison, WI, USA, 26–27 July 1998; pp. 41–48.
  59. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42.
  60. Zou, H.; Hastie, T. Regularization and Variable Selection Via the Elastic Net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320.
  61. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
  62. DeLong, E.R.; DeLong, D.M.; Clarke-Pearson, D.L. Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics 1988, 44, 837–845.
  63. Meng, L.; Jiang, X.Y.; Liu, X.Y.; Fan, J.H.; Ren, H.R.; Guo, Y.; Diao, H.K.; Wang, Z.H.; Chen, C.; Dai, C.Y.; et al. User-Tailored Hand Gesture Recognition System for Wearable Prosthesis and Armband Based on Surface Electromyogram. IEEE Trans. Instrum. Meas. 2022, 71, 1–16.
  64. Zeng, Z.; Tao, L.; Su, R.; Zhu, Y.; Meng, L.; Tuheti, A.; Huang, H.; Shu, F.; Chen, W.; Chen, C. Unsupervised Transfer Learning Approach with Adaptive Reweighting and Resampling Strategy for Inter-subject EOG-based Gaze Angle Estimation. IEEE J. Biomed. Health Inform. 2023, 28, 157–168.
  65. Demšar, J. Statistical Comparisons of Classifiers over Multiple Data Sets. J. Mach. Learn. Res. 2006, 7, 1–30.
  66. Benjamini, Y.; Hochberg, Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J. R. Stat. Soc. Ser. B 1995, 57, 289–300.
  67. Liu, Z.; Hu, J.; Gong, X.; Li, F. Skin lesion segmentation with a multiscale input fusion U-Net incorporating Res2-SE and pyramid dilated convolution. Sci. Rep. 2025, 15, 7975.
  68. Zhang, J.; Zhang, Y.; Jin, Y.; Xu, J.; Xu, X. MDU-Net: Multi-scale densely connected U-Net for biomedical image segmentation. Health Inform. Sci. Syst. 2023, 11, 13.
  69. Ko, D.R.; Na, H.; Jung, S.; Lee, S.; Jeon, J.; Ahn, S.J. Hematoma expansion prediction in patients with intracerebral hemorrhage using a deep learning approach. J. Med. Artif. Intell. 2024, 7, 10.
  70. Teng, L.; Ren, Q.; Zhang, P.; Wu, Z.; Guo, W.; Ren, T. Artificial Intelligence Can Effectively Predict Early Hematoma Expansion of Intracerebral Hemorrhage Analyzing Noncontrast Computed Tomography Image. Front. Aging Neurosci. 2021, 13, 632138.
  71. Zhong, J.W.; Jin, Y.J.; Song, Z.J.; Lin, B.; Lu, X.H.; Chen, F.; Tong, L.S. Deep learning for automatically predicting early haematoma expansion in Chinese patients. Stroke Vasc. Neurol. 2021, 6, 610–614.
  72. Lee, H.; Lee, J.; Jang, J.; Hwang, I.; Choi, K.S.; Park, J.H.; Chung, J.W.; Choi, S.H. Predicting hematoma expansion in acute spontaneous intracerebral hemorrhage: Integrating clinical factors with a multitask deep learning model for non-contrast head CT. Neuroradiology 2024, 66, 577–587.
Figure 1. Integrating a multiscale model into nnU-Net with multiple outputs for the loss function. The final nnU-Net model included both the original and ½-scale head CT images for segmentation of hematomas.
Figure 2. The overall pipeline for extraction of radiomic features and latent deep learning features. The radiomic features were extracted from the hematoma lesion segmented by nnU-Net. A set of latent deep learning features were extracted from the encoder of the U-shaped segmentation neural network at the bottleneck. Then, dilated masks of the hematoma were used as input for a Variational Autoencoder–Generative Adversarial Network (VAE-GAN), and an additional set of latent deep features were extracted from the encoder bottleneck as the model regenerates the CT images within the dilated hematoma mask.
Figure 3. The AUCs of each prediction model with different inputs in independent tests for functional outcome and >3-, >6-, and >9 mL hematoma expansion. Green indicates higher performance, yellow indicates moderate performance, and red indicates lower performance values. Darker green corresponds to the best-performing models within each prediction task.
Figure 4. The example calibration curves and decision curve of the best model for poor outcome prediction (Extra Trees model with radiomics and latent deep features from nnU-Net and VAE-GAN).
Figure 5. The group-level SHAP contribution across models in outcome prediction.
Table 1. Summary of patients’ characteristics in (ATACH-2) training/cross-validation and (Yale) independent test cohorts.
| Characteristic | ATACH-2 (n = 866) | Yale (n = 645) | p Value |
|---|---|---|---|
| 3-month poor outcome | 316 (36.5%) | 301 (46.7%) | <0.001 |
| >3 mL hematoma expansion | 98 (11.3%) | 163 (25.3%) | <0.001 |
| >6 mL hematoma expansion | 79 (9.1%) | 122 (18.9%) | <0.001 |
| >9 mL hematoma expansion | 53 (6.1%) | 97 (15.0%) | <0.001 |
| Sex [male] | 528 (60.9%) | 354 (54.9%) | 0.020 |
| Age [years] | 62.1 ± 12.9 | 69.6 ± 14.4 | <0.001 |
| Hispanic | 69 (8.0%) | 329 (52.2%) | <0.001 |
| Race: White | 241 | 440 | <0.001 |
| Race: Black | 110 | 125 | |
| Race: Asian | 489 | 17 | |
| Race: Other | 26 | 63 | |
| Systolic blood pressure [mmHg] | 175.2 ± 25.1 | 172.9 ± 32.9 | 0.147 |
| History of hypertension | 690 (79.7%) | 548 (85.0%) | 0.010 |
| History of diabetes mellitus | 166 (19.2%) | 173 (26.8%) | <0.001 |
| History of hyperlipidemia | 213 (24.6%) | 346 (53.6%) | <0.001 |
| History of atrial fibrillation | 29 (3.4%) | 139 (21.6%) | <0.001 |
| Baseline Glasgow Coma Scale: 3–11 | 127 (14.7%) | 179 (27.8%) | <0.001 |
| Baseline Glasgow Coma Scale: 12–14 | 242 (27.9%) | 168 (26.1%) | |
| Baseline Glasgow Coma Scale: 15 | 497 (57.4%) | 270 (41.9%) | |
| Baseline Glasgow Coma Scale: unknown | | 28 (4.3%) | |
| Baseline NIH Stroke Scale score: 0–4 | 137 (15.8%) | 181 (28.1%) | <0.001 |
| Baseline NIH Stroke Scale score: 5–9 | 226 (26.1%) | 110 (17.1%) | |
| Baseline NIH Stroke Scale score: 10–14 | 235 (27.1%) | 76 (11.8%) | |
| Baseline NIH Stroke Scale score: 15–19 | 159 (18.4%) | 86 (13.3%) | |
| Baseline NIH Stroke Scale score: 20–25 | 77 (8.9%) | 67 (10.4%) | |
| Baseline NIH Stroke Scale score: >25 | 27 (3.1%) | 38 (5.9%) | |
| Baseline NIH Stroke Scale score: unknown | 5 (0.6%) | 87 (13.5%) | |
| Baseline hematoma volume [mL] | 13.1 ± 12.6 | 18.7 ± 20.7 | <0.001 |
| Follow-up hematoma volume [mL] | 15.8 ± 16.7 | 23.0 ± 25.9 | <0.001 |
| CT scans: Slice thickness [mm] | 5.3 ± 1.8 | 4.8 ± 0.7 | <0.001 |
| CT scans: Min axial image matrix [n × n] | 418 × 418 | 472 × 472 | |
| CT scans: Max axial image matrix [n × n] | 512 × 734 | 1024 × 1024 | |
| CT scans: Number of slices | 31.0 ± 18.0 | 35.1 ± 11.5 | <0.001 |
Continuous variables are reported as mean ± standard deviation and compared between the two cohorts using the t-test; categorical variables are reported as number (percentage) and compared between the two cohorts using the chi-square test.
