Article

A Hybrid Ensemble Machine Learning Framework with Membership-Function Feature Engineering for Non-Invasive Prediction of HER2 Status in Breast Cancer

by Hassan Salarabadi 1, Dariush Salimi 1, Seyed Sahand Mohammadi Ziabari 2,3,* and Mozaffar Aznab 4

1 Department of Computer Engineering, Faculty of Engineering, University of Zanjan, Zanjan 45371-38791, Iran
2 Informatics Institute, University of Amsterdam, 1098 XH Amsterdam, The Netherlands
3 Department of Computer Science and Technology, SUNY Empire State University, Saratoga Springs, NY 12866, USA
4 Medical Department, Kermanshah University of Medical Sciences and Health Services, Kermanshah 67158-47141, Iran
* Author to whom correspondence should be addressed.
Information 2026, 17(3), 296; https://doi.org/10.3390/info17030296
Submission received: 5 February 2026 / Revised: 9 March 2026 / Accepted: 11 March 2026 / Published: 18 March 2026
(This article belongs to the Special Issue Information Management and Decision-Making)

Abstract

Accurate determination of human epidermal growth factor receptor 2 (HER2) status is a critical component of breast cancer prognosis and treatment planning. Conventional diagnostic techniques, such as immunohistochemistry (IHC) and fluorescence in situ hybridization (FISH), are clinically established but remain invasive, time-consuming, costly, and sensitive to pre-analytical and interpretative variability. Motivated by the need for scalable and data-driven decision-support tools, this study proposes a hybrid ensemble machine learning framework for non-invasive HER2 status prediction using routinely available clinical and immunohistochemical features. A retrospective dataset comprising 624 breast cancer patients from Mahdieh Clinic (Kermanshah, Iran) was analyzed using a structured preprocessing pipeline including normalization and class balancing. The proposed framework integrates multiple tree-based classifiers (Random Forest, XGBoost, and LightGBM) through ensemble strategies and enhances predictive robustness using membership-function feature engineering to capture gradual transitions in clinically relevant biomarkers. Decision threshold optimization was further applied to improve classification balance in borderline cases. The proposed ensemble framework achieved an accuracy of 0.816, an F1-score of 0.814, and an area under the receiver operating characteristic curve (AUC) of 0.862 on a held-out test set, demonstrating performance comparable to the best-performing individual classifier. These results indicate that ensemble learning combined with smooth membership-based feature representations can provide a reliable decision-support framework for HER2 status prediction, although further external validation is required before clinical use.

1. Introduction

Accurate determination of human epidermal growth factor receptor 2 (HER2) status is fundamental to modern breast cancer management. HER2 is a transmembrane tyrosine kinase receptor involved in regulating key cellular processes such as proliferation, differentiation, and survival. When overexpressed or amplified, HER2 is associated with more aggressive tumor behavior and poorer prognosis. At the same time, HER2-positive tumors may benefit substantially from targeted therapies such as trastuzumab and related anti-HER2 agents. Therefore, reliable identification of HER2 status has major implications for treatment selection, prognosis assessment, and clinical decision-making. In routine oncology practice, HER2 evaluation is an essential component of the pathological assessment of breast cancer. However, ensuring accurate, timely, and accessible HER2 assessment remains challenging, particularly across different clinical settings. These challenges have increased interest in improved diagnostic support strategies, including computational approaches based on routinely available clinical data.
Breast cancer is one of the most prevalent malignancies worldwide and remains a leading cause of cancer-related mortality among women [1,2]. The disease is characterized by heterogeneous molecular subtypes with distinct prognostic and therapeutic implications. Among these, HER2 plays a pivotal role in regulating cell proliferation and tumor aggressiveness [3]. Approximately 15–20% of breast cancer cases exhibit HER2 overexpression or gene amplification, which is associated with increased recurrence risk and reduced survival in the absence of targeted therapy [4].
Accurate determination of HER2 status is therefore essential for guiding personalized treatment strategies, particularly the administration of HER2-targeted therapies such as trastuzumab, which have significantly improved clinical outcomes [5]. In routine clinical practice, HER2 assessment is primarily performed using immunohistochemistry (IHC), with equivocal cases further evaluated using fluorescence in situ hybridization (FISH) [1]. Despite their widespread adoption, these diagnostic techniques present notable limitations. They require invasive tissue sampling, specialized laboratory infrastructure, and experienced personnel, and they are susceptible to variability arising from tissue fixation, staining protocols, and subjective interpretation [6].
These challenges have motivated increasing interest in computational and data-driven approaches capable of supporting or complementing conventional diagnostic workflows. In recent years, machine learning (ML) methods have demonstrated promising performance in predicting HER2 status using clinical, pathological, imaging, and molecular data [7,8]. Algorithms such as support vector machines, random forests, gradient boosting models, and deep learning architectures have been explored with encouraging results. However, several limitations persist across the existing literature.
First, many studies rely on relatively small or population-specific datasets, which limits generalizability [9]. Second, a substantial number of approaches employ single-model architectures, failing to exploit the complementary strengths of diverse classifiers [10]. Third, conventional feature preprocessing often relies on rigid discretization or sharp thresholds for continuous biomarkers, potentially leading to information loss in borderline or ambiguous cases [11]. These issues are particularly relevant in clinical settings, where biological processes typically evolve along continuous spectra rather than discrete categories.
To address these challenges, this study proposes a hybrid ensemble machine learning framework for HER2 status prediction that integrates multiple tree-based classifiers with membership-function feature engineering. By encoding clinically relevant biomarkers—such as tumor size, Ki67 proliferation index, and hormone receptor status—using smooth membership-based representations, the proposed approach preserves gradual transitions and reduces sensitivity to arbitrary threshold selection. Ensemble learning strategies are employed to enhance robustness and stability, while decision threshold optimization further improves classification balance in clinically critical scenarios.
The proposed framework is evaluated on a real-world clinical dataset comprising 624 breast cancer patients collected from a single medical center. The results demonstrate that the integration of ensemble learning with membership-function feature engineering yields strong predictive performance while maintaining clinical feasibility and methodological transparency. This work contributes to the growing body of research on intelligent decision-support systems for precision oncology and highlights the potential of hybrid ensemble approaches for non-invasive HER2 status prediction.

2. Related Work

Recent advances in machine learning (ML) and data-driven modeling have demonstrated significant potential in improving diagnostic and prognostic decision-making in breast cancer (BC). In particular, the integration of ML with radiomics, multi-omics, and clinical data has enabled the extraction of complex patterns that are difficult to identify using conventional statistical approaches. For instance, Chen et al. conducted a bicentric retrospective study employing ultrasound radiomics and ML techniques to predict pathological prognostic stages in a cohort of 578 BC patients [12]. While promising, the reliance on imaging-derived features and limited external validation may constrain the applicability of such models in routine clinical settings.
Complementary efforts have explored the use of multi-omics data for prognosis estimation. Song et al. proposed a prognostic framework for elderly BC patients by integrating mRNA, miRNA, lncRNA, copy number variations (CNVs), and single nucleotide variants (SNVs), highlighting the prognostic relevance of hypoxia-related pathways and immune microenvironment heterogeneity [13]. Although multi-omics models provide rich biological insights, their clinical deployment is often challenged by high costs, data complexity, and limited accessibility. Interpretability and robust feature selection have also gained attention in clinically oriented ML pipelines. For example, Ahmadian et al. proposed an explainable feature selection approach combining particle swarm optimization with adaptive LASSO for MRI radiogenomics, demonstrating transferable signatures and improved generalizability in a two-center setting [14].
Beyond imaging and omics-based approaches, ML models have also been applied to predict treatment-related outcomes. Lin et al. developed an XGBoost-based model for predicting radiation dermatitis severity in breast cancer patients, incorporating clinical factors, patient-reported outcomes, and cytokine biomarkers [15]. Similarly, Miglietta et al. utilized ML techniques to predict HER2-low phenotype conversion in recurrent breast cancer, emphasizing the role of artificial intelligence in optimizing patient stratification and treatment accessibility [16]. Despite encouraging results, many of these studies are limited by relatively small sample sizes and the use of single-model architectures, which may restrict robustness and generalizability.
The application of ML in oncology extends beyond breast cancer and further underscores its prognostic utility. In bladder cancer, Zhang et al. developed a machine learning-based prognostic signature utilizing proteomics data to predict patient outcomes and treatment response [17]. In prostate cancer, Gao et al. constructed a programmed cell death-related gene signature using a random forest model, demonstrating that higher risk scores were associated with poorer survival outcomes and diminished immunotherapy benefits [18]. Similarly, Maimaitiyiming et al. proposed a mast cell gene signature that stratified prostate cancer patients into distinct immune-risk groups [19]. In gastric cancer, Liu et al. introduced a deep learning-based pathomics model that achieved high predictive performance for survival outcomes [20].
More recently, ML-based prognostic models have also been investigated in specific breast cancer subpopulations. Wu et al. identified senescence-related molecular subtypes in geriatric breast cancer with distinct prognostic significance [21]. In addition, Emily et al. compared Cox proportional hazards and survival random forest models for breast cancer survival prediction, reporting superior performance of the Cox model in their cohort [22].
Despite these notable advances, several limitations persist across the existing literature. Many studies rely on single-modality data sources, lack ensemble or uncertainty-aware decision mechanisms, or are evaluated on relatively small and population-specific datasets. Moreover, the integration of heterogeneous clinical and immunohistochemical features within robust ensemble frameworks remains underexplored, particularly for HER2 status prediction. Addressing these gaps, the present study proposes a hybrid fuzzy-enhanced ensemble approach that combines multiple tree-based classifiers with fuzzy feature engineering and decision calibration, aiming to improve predictive robustness while maintaining clinical feasibility.

3. Methods and Materials

To ensure reproducibility and methodological transparency, all experiments were implemented in Python 3.5 using widely adopted machine learning libraries, including Scikit-learn, XGBoost, and LightGBM. Model development and evaluation were conducted in a Jupyter-based environment with fixed random seeds to guarantee consistent results. The proposed framework follows a hybrid pipeline that integrates clinical and immunohistochemical features with membership-function-based feature engineering and ensemble learning. The workflow begins with data preprocessing and feature engineering, followed by membership-function feature construction to capture gradual transitions in key biomarkers. Tree-based classifiers (Random Forest, XGBoost, and LightGBM) were trained and evaluated both as standalone models and within ensemble strategies. Finally, model performance was evaluated using standard classification metrics to assess discriminative power and clinical reliability.

3.1. Data Description

This study utilized a retrospective clinical dataset consisting of 624 confirmed breast cancer cases collected from Mahdieh Clinic, Kermanshah, Iran. The dataset includes a combination of clinical and pathological variables routinely used in breast cancer assessment. Specifically, the available features comprise estrogen receptor (ER), progesterone receptor (PR), p53 status, perineural invasion, vascular invasion, metastasis, lymph node involvement, number of involved lymph nodes, tumor size, and the Ki67 proliferation index.
The HER2 status (positive or negative) was defined as the target variable, framing the problem as a binary classification task. All cases were extracted from clinical records and validated by domain experts to ensure data consistency and reliability prior to model development.

3.2. Preprocessing Steps

Several preprocessing steps were applied to enhance data quality and ensure robust model performance. First, missing values in numerical features were handled using mean imputation, which was selected due to the relatively low proportion of missing entries and the approximately symmetric distribution of the variables.
Subsequently, all continuous features were normalized using Min–Max scaling to ensure that variables with different measurement ranges contributed equally to the learning process. The normalization was performed according to:
X_{\text{scaled}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}
To address the inherent class imbalance in HER2 status, a Random Over-Sampling strategy was applied to the training set. This approach increases the representation of the minority class by randomly duplicating its samples, thereby mitigating model bias toward the majority class. The class balance can be expressed as:
\text{Ratio} = \frac{N_{\text{minority}}}{N_{\text{majority}}}
By applying oversampling exclusively to the training data, data leakage was avoided, and the integrity of the test set was preserved for unbiased performance evaluation.
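As an illustration, the scaling and balancing steps above can be sketched with scikit-learn and NumPy. The feature values and labels below are hypothetical stand-ins for the clinical variables, and the random over-sampling is implemented directly with NumPy (the study does not name a specific library):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)  # fixed seed for reproducibility

# Toy stand-ins for two clinical features and the HER2 labels (hypothetical).
X_train = np.array([[12.0, 30.0], [25.0, 10.0], [40.0, 55.0], [18.0, 70.0], [33.0, 20.0]])
y_train = np.array([0, 0, 0, 1, 1])  # imbalanced: 3 negative, 2 positive

# Min-Max scaling fitted on the training set only (later applied to the test set).
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_train)

# Random over-sampling: duplicate minority-class rows until classes are balanced.
classes, counts = np.unique(y_train, return_counts=True)
minority = classes[np.argmin(counts)]
n_extra = counts.max() - counts.min()
extra_idx = rng.choice(np.where(y_train == minority)[0], size=n_extra, replace=True)

X_bal = np.vstack([X_scaled, X_scaled[extra_idx]])
y_bal = np.concatenate([y_train, y_train[extra_idx]])
```

Because the scaler is fitted before balancing and only on training rows, the test set remains untouched, mirroring the leakage-avoidance protocol described above.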

3.3. Membership-Function Feature Engineering

Clinical biomarkers in breast cancer often exhibit gradual transitions rather than crisp boundaries. Conventional machine learning models typically rely on sharp thresholds, which may fail to adequately represent the continuous nature of biological processes. To address this limitation, membership-function-based feature engineering was employed to transform selected clinical variables into interpretable soft representations.
In this study, membership functions were constructed for clinically significant variables, including the Ki67 proliferation index, tumor size, and estrogen receptor (ER) status. These variables were selected based on their established relevance to tumor aggressiveness and HER2 expression. Each variable was mapped into multiple overlapping regions using triangular membership functions, enabling the model to capture nonlinear relationships and implicit interactions among biomarkers.
For the Ki67 index, three membership regions—low, medium, and high—were defined to represent varying levels of cellular proliferation. Similarly, tumor size was categorized into small, medium, and large membership regions, reflecting clinically meaningful tumor growth stages. ER status was represented using complementary membership degrees corresponding to positive and negative expression. The triangular membership function used for feature encoding is defined as:
\mu(x) =
\begin{cases}
0, & x \le a \\
\dfrac{x - a}{b - a}, & a < x \le b \\
\dfrac{c - x}{c - b}, & b < x < c \\
0, & x \ge c
\end{cases}
where a, b, and c denote the lower bound, peak point, and upper bound of each membership region, respectively. The parameters of the membership functions were defined based on clinical knowledge and empirical data distribution.
The resulting membership degrees were incorporated as additional input features and concatenated with the original numerical and categorical variables. This approach allowed the model to encode smooth, uncertainty-aware representations while preserving the original feature space.
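A minimal sketch of this encoding follows; the Ki67 breakpoints used here are illustrative placeholders, not the clinically derived parameters of the study. The `min`-and-clip form is algebraically equivalent to the piecewise triangular definition:

```python
import numpy as np

def triangular_membership(x, a, b, c):
    """Triangular membership degree: lower bound a, peak b, upper bound c."""
    x = np.asarray(x, dtype=float)
    left = (x - a) / (b - a)    # rising edge on (a, b]
    right = (c - x) / (c - b)   # falling edge on (b, c)
    return np.clip(np.minimum(left, right), 0.0, 1.0)

# Hypothetical Ki67 values (percent) and illustrative low/medium/high regions.
ki67 = np.array([5.0, 20.0, 45.0, 80.0])
features = np.column_stack([
    triangular_membership(ki67, -1, 0, 20),    # low proliferation
    triangular_membership(ki67, 10, 30, 50),   # medium proliferation
    triangular_membership(ki67, 40, 70, 101),  # high proliferation
])
```

The three resulting columns are then concatenated with the original feature matrix, as described above.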

3.4. Proposed Hybrid Ensemble Model Architecture

This study proposes a hybrid ensemble learning framework that integrates tree-based machine learning models with membership-function-based feature engineering to predict HER2 status in breast cancer patients. The proposed architecture is designed to capture nonlinear relationships among biomarkers, exploit model diversity, and improve decision robustness in clinical prediction tasks. An overview of the proposed model is illustrated in Figure 1.
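The voting and stacking combinations evaluated later can be sketched with scikit-learn's built-in combiners. GradientBoostingClassifier stands in for XGBoost/LightGBM so the sketch runs with scikit-learn alone, and the synthetic data is purely illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the clinical feature matrix.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

base = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
]

# Soft voting averages the base learners' predicted probabilities.
soft_vote = VotingClassifier(estimators=base, voting="soft").fit(X_tr, y_tr)

# Stacking trains a meta-learner on out-of-fold base-model predictions.
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression()).fit(X_tr, y_tr)
```

Hard voting and a weighted ensemble follow the same pattern via `voting="hard"` and the `weights` parameter of `VotingClassifier`.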

3.5. Model Training, Validation, and Evaluation Metrics

The proposed framework was trained and evaluated using a structured and reproducible experimental protocol. The dataset was randomly partitioned into training and testing subsets using an 80/20 split with stratified sampling to preserve the original class distribution. All preprocessing steps, including feature scaling, oversampling, and membership-function feature generation, were applied exclusively to the training set to prevent data leakage.
To enhance decision robustness, threshold optimization was performed using validation predictions obtained within the training data through cross-validation. The probability threshold maximizing the F1-score was selected and then applied to the independent test set.
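One way to implement this selection, using out-of-fold probabilities so the test set is never consulted (synthetic data for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=300, random_state=1)
clf = RandomForestClassifier(random_state=1)

# Out-of-fold probabilities on the training data only (no test-set leakage).
proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]

# F1 at every candidate threshold; thresholds aligns with precision[:-1]/recall[:-1].
precision, recall, thresholds = precision_recall_curve(y, proba)
f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
best_threshold = thresholds[np.argmax(f1[:-1])]
```

The selected `best_threshold` is then fixed and applied once to the held-out test predictions.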
Model performance was assessed using accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC):
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\text{Precision} = \frac{TP}{TP + FP}
\text{Recall} = \frac{TP}{TP + FN}
\text{F1-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
The AUC was computed as:
\text{AUC} = \int_{0}^{1} \text{TPR}(\text{FPR}) \, d(\text{FPR})
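These metrics map directly onto scikit-learn's implementations; the labels and probabilities below are toy values for illustration:

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Illustrative labels, hard predictions, and predicted probabilities.
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]
y_prob = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6]

acc = accuracy_score(y_true, y_pred)   # (TP + TN) / (TP + TN + FP + FN)
f1 = f1_score(y_true, y_pred)          # harmonic mean of precision and recall
auc = roc_auc_score(y_true, y_prob)    # area under the ROC curve
```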

3.6. Experimental Setup

Dataset Overview

All experiments were conducted in Python (version 3.5) using machine learning libraries including Scikit-learn, XGBoost, and LightGBM, within a Kaggle computational environment. The dataset was partitioned into training (80%) and testing (20%) subsets using stratified sampling to preserve the original class distribution. To prevent information leakage, all preprocessing steps, including feature scaling, membership-function-based feature construction, and class balancing, were fitted exclusively on the training data; the fitted scaling and feature transformations were then applied to the test set. Model performance was evaluated on the unseen test set using multiple complementary metrics, namely accuracy, F1-score, and the area under the receiver operating characteristic curve (AUC), providing a balanced assessment of both classification correctness and discriminative capability.

3.7. Statistical Analysis

To ensure statistical rigor and obtain reliable estimates of model performance, a comprehensive evaluation framework was employed, including nonparametric bootstrap confidence intervals, stratified cross-validation, and statistical hypothesis testing. To determine whether the predictive performance of the ensemble models differs significantly from that of the Random Forest classifier, pairwise comparisons were conducted using McNemar’s test. This nonparametric test evaluates whether two classifiers produce significantly different prediction error patterns on the same dataset. Table 1 reports McNemar’s test results for the pairwise model comparisons.
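An exact McNemar test reduces to a binomial test on the discordant prediction pairs. The helper below, `mcnemar_exact`, is an illustrative implementation (the study's exact routine is not specified), built on SciPy's `binomtest`:

```python
import numpy as np
from scipy.stats import binomtest

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact McNemar test: binomial test on the discordant prediction pairs."""
    y_true, pred_a, pred_b = map(np.asarray, (y_true, pred_a, pred_b))
    a_correct = pred_a == y_true
    b_correct = pred_b == y_true
    n01 = int(np.sum(a_correct & ~b_correct))  # A correct, B wrong
    n10 = int(np.sum(~a_correct & b_correct))  # A wrong, B correct
    if n01 + n10 == 0:
        return 1.0  # identical error patterns: no evidence of a difference
    return binomtest(n01, n01 + n10, 0.5).pvalue

# Toy comparison of two classifiers' predictions against the same labels.
p_value = mcnemar_exact([0, 1, 1, 0], [0, 1, 0, 0], [0, 1, 1, 1])
```

Only the discordant counts enter the test, so cases where both classifiers agree carry no information about their difference.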

3.7.1. Bootstrap Confidence Intervals

To quantify uncertainty in performance estimates, 95% confidence intervals (CIs) were computed using nonparametric bootstrap resampling with 1000 iterations. In each iteration, a bootstrap sample was generated from the test set by sampling with replacement while preserving the original sample size. Performance metrics including accuracy, F1-score, and the area under the receiver operating characteristic curve (AUC) were recalculated for each resample. The empirical distributions of the resulting metric values were used to derive 95% confidence intervals based on the 2.5th and 97.5th percentiles. Bootstrap estimation was selected instead of DeLong’s method because it does not rely on asymptotic distributional assumptions and is generally more robust for moderate-sized clinical datasets.
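The percentile bootstrap described here can be sketched as follows; `bootstrap_ci` is a hypothetical helper, and a resample is skipped when it draws only one class because AUC is undefined in that case:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def bootstrap_ci(y_true, y_prob, metric=roc_auc_score, n_boot=1000):
    """Percentile 95% CI for a test-set metric via bootstrap resampling."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    stats = []
    while len(stats) < n_boot:
        # Resample test indices with replacement, preserving sample size.
        idx = rng.integers(0, len(y_true), size=len(y_true))
        if len(np.unique(y_true[idx])) < 2:  # AUC needs both classes present
            continue
        stats.append(metric(y_true[idx], y_prob[idx]))
    return np.percentile(stats, [2.5, 97.5])

# Illustrative test-set labels and predicted probabilities.
y_true = np.array([0] * 10 + [1] * 10)
y_prob = np.linspace(0.05, 0.95, 20)
lo, hi = bootstrap_ci(y_true, y_prob, n_boot=200)
```

Substituting `accuracy_score` or `f1_score` for the metric (with hard predictions) yields the corresponding intervals for accuracy and F1.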

3.7.2. Cross-Validation

To evaluate model generalizability and stability, stratified 5-fold cross-validation was performed. The dataset was partitioned into five mutually exclusive folds while preserving the original class distribution in each fold. In each iteration, models were trained on four folds and evaluated on the remaining fold. Performance was summarized as mean ± standard deviation across folds.
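A minimal sketch of this protocol with scikit-learn (synthetic data; in the real pipeline, preprocessing would be refitted inside each fold):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, random_state=2)

# Stratified folds preserve the class distribution in every partition.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)
scores = cross_val_score(RandomForestClassifier(random_state=2), X, y,
                         cv=cv, scoring="f1")

# Summarized as mean +/- standard deviation across folds.
mean_f1, std_f1 = scores.mean(), scores.std()
```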

3.8. Role of Base Learners in the Ensemble Framework

In the first stage of this study, each base learner was independently employed to predict HER2 status using the same training and testing splits, as well as identical preprocessing procedures. This experiment was conducted to establish baseline predictive performance and to enable a fair comparison with the proposed ensemble framework. Random Forest, XGBoost, and LightGBM models were trained separately on the processed dataset, and their predictive performance was evaluated on the held-out test set. Model performance was assessed using accuracy, F1-score, and area under the ROC curve (AUC), which are commonly adopted metrics in binary medical classification tasks.
Table 2 summarizes the quantitative results obtained by the individual base learners. As shown, all three models achieved competitive performance, with Random Forest yielding the highest AUC, while XGBoost and LightGBM demonstrated comparable accuracy and F1-score values.
These results indicate that each base learner is capable of capturing relevant patterns in the clinical and pathological data, while exhibiting different strengths in terms of discrimination and classification balance. These complementary behaviors motivate their integration within an ensemble framework, which is investigated in the subsequent subsection.

3.9. Performance Evaluation of the Proposed Model

The predictive performance of the proposed hybrid ensemble model was evaluated on the test set using accuracy, F1-score, and area under the ROC curve (AUC). The model achieved an accuracy of 0.816, an F1-score of 0.814, and an AUC of 0.862, indicating strong discriminative capability in distinguishing HER2-positive from HER2-negative cases.
The ROC curves of the evaluated models are shown in Figure 2. The curves demonstrate the discriminative performance of the base learners and ensemble strategies across thresholds. Overall, ensemble-based approaches achieve competitive and more stable performance relative to individual classifiers.

3.10. Cross-Validation Results

To further assess model generalizability, stratified 5-fold cross-validation was conducted. The results are summarized in Table 3 as mean ± standard deviation across folds.
To further investigate the effectiveness of ensemble learning, the predictive performance of individual base learners was compared with various ensemble strategies, including hard voting, soft voting, stacking, and an intelligent weighted ensemble. Figure 3 summarizes the performance metrics obtained for all evaluated models. Among individual classifiers, Random Forest achieved the highest discriminative performance (AUC = 0.873). Ensemble-based approaches exhibited competitive performance, with the intelligent weighted ensemble achieving accuracy and F1-score comparable to the best-performing individual model while maintaining robust AUC values. These findings suggest that ensemble strategies can improve stability and balance by integrating complementary decision patterns from multiple classifiers.

3.11. Statistical Comparison of Models

Pairwise comparisons between classifiers were conducted using McNemar’s test. The results are summarized in Table 1. No statistically significant differences were observed between the evaluated models.

3.12. Effect of Threshold Optimization

To further improve classification reliability, threshold optimization was applied to the ensemble probability outputs. Instead of adopting a fixed threshold of 0.5, an optimal threshold was selected by maximizing the F1-score on the validation data. The impact of this optimization on classification accuracy is visualized in Figure 4. As illustrated, threshold tuning leads to a noticeable improvement in balanced performance, particularly by reducing false negative predictions, which is crucial in clinical decision-making contexts.
The confusion matrix of the proposed model at the optimized threshold is presented in Figure 5. The matrix demonstrates a balanced distribution of true positive and true negative predictions, confirming the effectiveness of the stacking ensemble framework in minimizing misclassification errors. Notably, the model achieves a high true positive rate for HER2-positive cases, which is clinically significant, as misclassifying HER2-positive patients may delay access to targeted therapies.
Overall, the experimental results demonstrate that the proposed hybrid ensemble model achieves strong and balanced predictive performance. The integration of membership-function-based feature encoding and ensemble learning contributes to improved discrimination stability and decision reliability, supporting the potential of the proposed framework as a decision-support tool for HER2 status prediction.

4. Discussion

Accurate and reliable prediction of HER2 status is a cornerstone of personalized breast cancer management, as it directly influences treatment selection and clinical outcomes. In this study, we investigated a hybrid learning framework that integrates membership-function-based feature engineering with multiple ensemble strategies to address key challenges in clinical data, including heterogeneity, uncertainty, and moderate class imbalance.
The experimental results reveal several important insights. First, among individual classifiers, Random Forest demonstrated the strongest overall discriminative performance, achieving an accuracy of 0.816 and an AUC of 0.873, outperforming both XGBoost and LightGBM. This finding suggests that tree-based bagging methods may be particularly well-suited for modeling nonlinear interactions among clinical and pathological biomarkers in HER2 prediction tasks. Second, ensemble strategies did not uniformly outperform the best-performing base learner across all evaluation metrics. Statistical comparison using McNemar’s test confirmed that the differences between Random Forest and the evaluated ensemble models were not statistically significant (p > 0.05), indicating that ensemble integration mainly improves robustness rather than significantly altering predictive accuracy. While the intelligent weighted ensemble and hard voting approaches achieved accuracy and F1-scores comparable to Random Forest, their AUC values were slightly lower. This observation highlights an important methodological point: ensemble integration does not inherently guarantee superior discrimination, particularly when base learners exhibit correlated decision boundaries or similar error patterns. In such cases, ensemble models may primarily improve prediction stability rather than maximizing separability between classes.
However, quantitative analysis of performance variability provides additional insight into the value of the ensemble framework. Cross-validation results show relatively small standard deviations across folds (Table 3), indicating stable performance under different training–validation partitions. Furthermore, the 95% bootstrap confidence intervals of the evaluated models substantially overlap, suggesting that the observed differences in AUC between Random Forest and the ensemble models are not statistically meaningful.
Pairwise McNemar tests also confirm that the prediction error patterns of Random Forest and the ensemble approaches do not differ significantly (p > 0.05). These results indicate that the ensemble framework primarily contributes improved robustness and decision stability rather than maximizing peak performance on a single split.
Notably, the stacking ensemble achieved a competitive AUC (0.866), indicating improved ranking capability compared to individual boosting models, even though its overall accuracy remained comparable. This suggests that the stacking mechanism effectively aggregates complementary probabilistic information from base learners, enhancing robustness across decision thresholds rather than optimizing a single operating point.
The role of membership-function-based feature engineering is particularly relevant in this context. By transforming continuous biomarkers such as tumor size and Ki67 into smooth membership-based representations, the model captures gradual transitions between clinical risk states, thereby reducing information loss associated with rigid thresholds. This feature encoding strategy contributes to more stable probability estimates, which is reflected in the relatively consistent AUC values observed across ensemble variants. Importantly, this effect becomes more apparent when threshold optimization is applied, underscoring the interaction between smooth feature representations and decision calibration.
Threshold optimization further enhanced clinical relevance by improving the balance between sensitivity and specificity. Given that false negative predictions in HER2-positive patients may delay access to targeted therapies, prioritizing recall without severely sacrificing precision is critical. The optimized threshold reduced false negatives, as confirmed by the confusion matrix analysis, demonstrating that performance improvements are not solely numerical but clinically meaningful.
From a clinical decision-support perspective, these findings emphasize that model selection should be guided by the intended use case. While Random Forest achieved the highest AUC, ensemble approaches offered comparable accuracy with improved robustness and interpretability through aggregation. Therefore, the proposed ensemble framework with membership-function-based feature encoding should be viewed not as a replacement for strong individual classifiers such as Random Forest, but as a complementary strategy that enhances robustness and prediction stability through the aggregation of multiple decision patterns.
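A weighted soft-voting aggregation of this kind can be sketched with scikit-learn as follows. GradientBoostingClassifier stands in for XGBoost/LightGBM to keep the example self-contained, the data are synthetic, and the weights are illustrative rather than the tuned values:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic tabular data as a stand-in for the clinical cohort.
X, y = make_classification(n_samples=600, n_features=12,
                           n_informative=6, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=7)

models = [
    RandomForestClassifier(n_estimators=200, random_state=7),
    GradientBoostingClassifier(random_state=7),
    LogisticRegression(max_iter=1000),
]
weights = np.array([0.4, 0.4, 0.2])  # illustrative, not the tuned values

probs = []
for m in models:
    m.fit(X_tr, y_tr)
    probs.append(m.predict_proba(X_te)[:, 1])

# Weighted soft vote: convex combination of base-learner probabilities.
ens_prob = np.average(np.vstack(probs), axis=0, weights=weights)
auc = roc_auc_score(y_te, ens_prob)
```

Because the ensemble output is a convex combination of calibrated-ish probabilities, single-model idiosyncrasies are averaged out, which is the robustness property emphasized above.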
Despite these promising results, several limitations should be acknowledged. Although bootstrap confidence intervals and cross-validation provide strong internal validation, they cannot replace independent external validation. Differences across institutions such as patient demographics, acquisition protocols, and clinical practices may influence model performance. The dataset was obtained from a single clinical center, which may limit generalizability. Additionally, the absence of external validation restricts conclusions regarding real-world deployment. Future work should focus on multi-center validation, incorporation of additional data modalities such as imaging or genomic profiles, and integration of explainability techniques (e.g., SHAP or rule-based decision analysis) to strengthen clinician trust. Alternative balancing strategies such as SMOTE or its variants may also be explored in future studies to assess whether synthetic oversampling provides further improvements in minority-class representation without compromising clinical realism. Table 4 shows the comparative performance analysis of individual classifiers and ensemble strategies, highlighting their effectiveness and suitability for clinical decision-support in HER2 status prediction.
Overall, this study demonstrates that ensemble learning combined with membership-function-based feature encoding provides a flexible and clinically meaningful framework for HER2 status prediction. Rather than maximizing a single performance metric, the proposed approach emphasizes robustness, decision reliability, and stable discrimination—key factors for practical adoption in precision oncology.

5. Conclusions

In this study, a hybrid learning framework combining membership-function-based feature encoding with ensemble-based classification was proposed for predicting HER2 status in breast cancer patients using routinely available clinical and immunohistochemical data. The primary objective was to develop a robust and clinically meaningful decision-support model capable of handling uncertainty, nonlinear feature interactions, and moderate class imbalance inherent in real-world medical datasets.
Comprehensive experimental evaluations demonstrated that tree-based learning models, particularly Random Forest, achieved strong discriminative performance, with an AUC of 0.873 and an accuracy of 81.6%. Ensemble strategies, including stacking and weighted voting, yielded comparable accuracy and F1-scores, while exhibiting slightly lower but stable AUC values. These findings indicate that, rather than maximizing a single performance metric, the proposed ensemble framework improves prediction robustness and decision consistency across varying thresholds. Importantly, statistical analysis demonstrated that the performance differences between Random Forest and the ensemble approaches were not statistically significant, indicating that the ensemble model provides comparable predictive capability while offering improved robustness through model aggregation.
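The pairwise McNemar comparison behind this statistical claim can be sketched as follows. The exact test considers only the discordant cases, where precisely one of the two models errs; the toy predictions below are illustrative, not the study's outputs:

```python
from math import comb
import numpy as np

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact (binomial) McNemar test on discordant pairs, i.e. cases
    where exactly one of the two models is wrong."""
    err_a = pred_a != y_true
    err_b = pred_b != y_true
    b = int(np.sum(err_a & ~err_b))   # A wrong, B right
    c = int(np.sum(~err_a & err_b))   # A right, B wrong
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: models err on identical cases
    pmf = [comb(n, k) * 0.5 ** n for k in range(n + 1)]
    # Two-sided p: total probability of outcomes no more likely than observed.
    return min(1.0, sum(p for p in pmf if p <= pmf[b] + 1e-12))

# Toy example: two classifiers disagreeing symmetrically on four cases.
y  = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
pa = np.array([1, 0, 1, 0, 0, 1, 0, 1, 1, 1])
pb = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 1])
p = mcnemar_exact(y, pa, pb)
```

A large p-value here means the two models' error patterns are statistically indistinguishable, which is the sense in which Random Forest and the ensembles were reported as comparable.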
The incorporation of membership-function-based feature encoding played a key role in stabilizing probabilistic outputs by modeling gradual transitions in continuous biomarkers such as tumor size and Ki67. This representation reduced the information loss caused by hard discretization and contributed to more balanced precision and recall. Furthermore, threshold optimization enhanced clinical relevance by reducing false negative predictions for HER2-positive cases, which is critical for timely access to targeted therapies.
Unlike conventional HER2 assessment methods that rely on invasive procedures or costly molecular assays, the proposed framework offers a cost-effective and non-invasive alternative based solely on routinely collected clinical and pathological features. As such, it has the potential to function as a complementary decision-support tool, assisting clinicians in risk stratification and treatment planning rather than replacing standard diagnostic protocols. Nevertheless, several limitations must be acknowledged. The dataset was derived from a single clinical center, which may limit generalizability, and the absence of external validation precludes direct clinical deployment. Although the reported confidence intervals and cross-validation analyses give a clearer picture of the reliability and stability of the performance estimates, these internal checks cannot substitute for validation on independent cohorts.
Future work will focus on validating the proposed framework on multi-center cohorts, incorporating additional data modalities such as imaging radiomics or genomic profiles, and enhancing interpretability through explainable artificial intelligence techniques, including SHAP or rule-based decision analysis. In particular, extending the framework to whole-slide pathology and multimodal learning paradigms may further strengthen its clinical utility.
In summary, this study demonstrates that ensemble learning combined with membership-function-based feature encoding provides a flexible, reliable, and clinically relevant approach for HER2 status prediction. By prioritizing robustness and decision reliability over isolated performance gains, the proposed framework represents a meaningful step toward intelligent decision-support systems in personalized breast cancer management.

Author Contributions

Conceptualization, H.S., D.S. and M.A.; methodology, H.S. and D.S.; software, H.S.; validation, H.S., D.S. and S.S.M.Z.; formal analysis, H.S.; investigation, H.S. and M.A.; resources, D.S. and M.A.; data curation, H.S.; writing—original draft preparation, H.S.; writing—review and editing, D.S., S.S.M.Z. and M.A.; visualization, H.S.; supervision, D.S., S.S.M.Z. and M.A.; project administration, D.S.; funding acquisition, M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study used retrospective clinical records. All data were de-identified prior to analysis and handled in accordance with relevant institutional guidelines.

Informed Consent Statement

Not applicable. The work relied exclusively on retrospective, de-identified clinical records and did not involve direct interaction with human participants or animals.

Data Availability Statement

The dataset used in this study contains patient-level clinical and immunohistochemical records collected from Mahdieh Clinic (Kermanshah, Iran). Due to privacy and ethical restrictions, the data are not publicly available. De-identified data may be provided by the corresponding author upon reasonable request and subject to institutional approval.

Acknowledgments

The authors gratefully acknowledge the Department of Computer Science and Technology at SUNY Empire State University for providing institutional support.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

HER2: Human Epidermal Growth Factor Receptor 2
ML: Machine Learning
ER: Estrogen Receptor
IHC: Immunohistochemistry
FISH: Fluorescence In Situ Hybridization
BC: Breast Cancer
RF: Random Forest
AUC: Area Under the Curve
PR: Progesterone Receptor
TP: True Positive
TN: True Negative
FP: False Positive
FN: False Negative

Figure 1. Overview of the proposed hybrid ensemble framework for HER2 status prediction.
Figure 2. Receiver operating characteristic (ROC) curves of individual base learners and ensemble models for HER2 status prediction. The figure compares Random Forest, XGBoost, LightGBM, and ensemble strategies, including soft voting, hard voting, stacking, and the proposed intelligent weighted ensemble.
Figure 3. Summary of classification performance metrics for all evaluated base learners and ensemble approaches. The results illustrate the comparative strengths of different modeling strategies and the impact of ensemble integration on predictive accuracy, class balance, and discriminative capability.
Figure 4. Variation of classification accuracy as a function of decision threshold for the proposed ensemble model. The optimal threshold maximizing performance is highlighted, demonstrating the importance of threshold tuning in clinical decision-making.
Figure 5. Confusion matrix of the proposed ensemble model on the test dataset. The matrix summarizes the classification outcomes: 48 true positives, 55 true negatives, 26 false positives, and 21 false negatives.
Table 1. McNemar test results for pairwise model comparisons.
| Comparison | p-Value | Significance |
| --- | --- | --- |
| Random Forest vs. Soft Voting Ensemble | 0.384 | Not significant |
| Random Forest vs. XGBoost | 0.267 | Not significant |
| Random Forest vs. LightGBM | 0.152 | Not significant |
| Soft Voting Ensemble vs. XGBoost | 0.421 | Not significant |
| Soft Voting Ensemble vs. LightGBM | 0.293 | Not significant |
Table 2. Test-set performance of individual models with 95% bootstrap confidence intervals.
| Model | Accuracy (95% CI) | F1-Score (95% CI) | AUC (95% CI) |
| --- | --- | --- | --- |
| Random Forest | 0.816 (0.744–0.880) | 0.812 (0.739–0.883) | 0.873 (0.802–0.928) |
| XGBoost | 0.784 (0.704–0.848) | 0.799 (0.726–0.871) | 0.846 (0.756–0.915) |
| LightGBM | 0.784 (0.704–0.848) | 0.783 (0.706–0.847) | 0.835 (0.768–0.902) |
Table 3. Stratified 5-fold cross-validation results (mean ± standard deviation).
| Model | CV Accuracy | CV F1-Score | CV AUC |
| --- | --- | --- | --- |
| Random Forest | 0.721 ± 0.037 | 0.718 ± 0.036 | 0.791 ± 0.046 |
| XGBoost | 0.758 ± 0.035 | 0.757 ± 0.036 | 0.794 ± 0.046 |
| LightGBM | 0.735 ± 0.021 | 0.735 ± 0.021 | 0.777 ± 0.044 |
| Soft Voting Ensemble | 0.738 ± 0.027 | 0.738 ± 0.027 | 0.794 ± 0.042 |
| Hard Voting Ensemble | 0.746 ± 0.021 | 0.745 ± 0.021 | N/A |
Table 4. Comparative performance analysis of individual classifiers and ensemble strategies, highlighting their effectiveness and suitability for clinical decision-support in HER2 status prediction.
| Model | Accuracy | F1-Score | AUC | Key Strength | Clinical Implication |
| --- | --- | --- | --- | --- | --- |
| Random Forest | 0.816 | 0.812 | 0.873 | Strong discrimination | Reliable baseline model |
| XGBoost | 0.800 | 0.800 | 0.856 | Nonlinear boosting | Sensitive to hyperparameters |
| LightGBM | 0.784 | 0.783 | 0.835 | Computational efficiency | Moderate discrimination |
| Hard Voting | 0.816 | 0.814 | N/A | Stability | Threshold-independent |
| Soft Voting | 0.800 | 0.799 | 0.862 | Probabilistic fusion | Improved calibration |
| Stacking | 0.800 | 0.800 | 0.866 | Robust ranking | Better threshold robustness |
| Intelligent Weighted Ensemble | 0.816 | 0.814 | 0.862 | Balanced aggregation | Clinically robust decisions |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
