Calibrated Global Logit Fusion (CGLF) for Fetal Health Classification Using Cardiotocographic Data
Abstract
1. Introduction
- Calibration-centric fusion for CTG. We introduce a Calibrated Global Logit Fusion (CGLF) that blends TabNet and XGBoost in the log-odds domain with a single global weight chosen on OOF data and then applies per-class vector–temperature calibration, also fitted on OOF. To the best of our knowledge, this calibration-first, logit-level fusion has not previously been reported for CTG tabular classification.
- Leakage-safe path selection using probability-derived meta-features. Beyond the logit blend, we design a lightweight level-2 meta-ENet trained only on probability-derived meta-features (raw class probabilities, entropy, top-2 margin, inter-model agreement). The final path (meta vs. blend) is selected purely on OOF under a BA floor, eliminating raw-feature leakage into stacking, an uncommon safeguard in CTG ensembles.
- Clinically aligned post hoc tuning under a BA floor. We propose constrained post hoc class reweighting after calibration that optimizes performance while enforcing a balanced-accuracy floor tied to the best standalone base on that fold. This yields operating points that maintain recall for minority (pathological) cases, aligning fusion with risk-aware deployment.
- Diversity by design with dual XGBoost streams. We instantiate two complementary XGBoost learners, XGB–A (SMOTE–Tomek resampling) and XGB–B (class weighting), trained in a strictly leakage-controlled manner. This pragmatic, imbalance-aware diversification, coupled with logit-level fusion and OOF-only calibration, provides a principled hedge against bias from any single imbalance remedy.
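The per-class (vector) temperature calibration described above can be sketched as follows. This is an illustrative assumption of how one temperature per class might be fitted on out-of-fold logits by minimizing the multinomial negative log-likelihood; the function names, toy data, and objective are ours, not the paper's implementation:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import softmax

def fit_vector_temperature(logits_oof, y_oof, n_classes=3):
    """Fit one temperature per class on out-of-fold logits by
    minimizing the multinomial negative log-likelihood (hypothetical sketch)."""
    def nll(t):
        probs = softmax(logits_oof / t, axis=1)
        return -np.mean(np.log(probs[np.arange(len(y_oof)), y_oof] + 1e-12))
    res = minimize(nll, x0=np.ones(n_classes), bounds=[(0.05, 20.0)] * n_classes)
    return res.x

# Toy demonstration: synthetic, deliberately overconfident logits
rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=500)
logits = rng.normal(size=(500, 3)) + 4.0 * np.eye(3)[y]
T = fit_vector_temperature(logits, y)
calibrated = softmax(logits / T, axis=1)  # rows sum to 1
```

Because the temperatures are fitted only on OOF predictions, the calibration step never sees test-fold labels, matching the leakage-safe protocol described in the contributions.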
2. Literature Review
2.1. Traditional ML for CTG Interpretation
2.2. Deep Learning and Temporal Dynamics
2.3. Real-Time CTG Processing
2.4. Calibration-Aware Fusion and Ensembles
2.5. Positioning of the Present Work
3. Materials and Methods
3.1. Model Architecture and Design
3.1.1. TabNet Model
3.1.2. XGBoost
3.1.3. Calibrated Global Logit Fusion (CGLF)
Seed Ensembling and OOF Probabilities
XGB Stream Selection
Fusion Candidates
- Global logit blend: TabNet and XGBoost outputs are fused in log-odds space with a single global weight $w$, tuned on OOF data: $z = w\,z_{\mathrm{TabNet}} + (1-w)\,z_{\mathrm{XGB}}$, with final probabilities recovered via softmax.
- Meta-learner: An Elastic-Net multinomial logistic regression is trained on OOF probability-derived features (base probabilities, entropy, top-2 margin, and pairwise agreement indicators) [21].
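The global logit blend can be sketched in a few lines. This is a minimal illustration under assumed names; the grid search and the balanced-accuracy objective for tuning $w$ are plausible assumptions, not necessarily the paper's exact procedure:

```python
import numpy as np
from scipy.special import softmax
from sklearn.metrics import balanced_accuracy_score

def blend_logits(p_a, p_b, w):
    """Fuse two probability matrices in log-odds space with one global weight w."""
    z = w * np.log(p_a + 1e-12) + (1.0 - w) * np.log(p_b + 1e-12)
    return softmax(z, axis=1)

def tune_weight(p_a, p_b, y, grid=np.linspace(0.0, 1.0, 101)):
    """Grid-search the single blend weight on OOF predictions (assumed objective: BA)."""
    scores = [balanced_accuracy_score(y, blend_logits(p_a, p_b, w).argmax(axis=1))
              for w in grid]
    return float(grid[int(np.argmax(scores))])

# Toy OOF predictions from two imperfect base models
rng = np.random.default_rng(0)
y_oof = rng.integers(0, 3, size=300)
noisy = lambda s: softmax(rng.normal(scale=s, size=(300, 3)) + 2.0 * np.eye(3)[y_oof], axis=1)
p_tab, p_xgb = noisy(1.5), noisy(0.8)
w_star = tune_weight(p_tab, p_xgb, y_oof)
p_blend = blend_logits(p_tab, p_xgb, w_star)
```

Blending in log-odds rather than probability space lets a confident model dominate only where its logits are large, which is the usual motivation for logit-level fusion.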
Calibration and Reweighting
Path Selection and Inference
4. Experiment and Results Analysis
4.1. Dataset and Preprocessing
4.2. Class Imbalance Handling
4.3. Feature Engineering
4.4. Evaluation Metrics
- Accuracy. The fraction of correctly classified samples: $\mathrm{Acc} = \frac{1}{N}\sum_{n=1}^{N}\mathbb{1}[\hat{y}_n = y_n]$.
- Precision/Recall/F1 (for each class i).
- Macro-F1. The unweighted average of class-wise F1 scores (treats all classes equally; helpful under imbalance): $\text{Macro-F1} = \frac{1}{C}\sum_{i=1}^{C}\mathrm{F1}_i$.
- Balanced Accuracy. The unweighted average of per-class recalls (balances sensitivity across classes): $\mathrm{BA} = \frac{1}{C}\sum_{i=1}^{C}\mathrm{Recall}_i$.
- Macro AUROC (OvR). For each class $i$, form a one-vs-rest ROC curve using the class-$i$ scores (e.g., predicted probabilities $\hat{p}_i$), and let $\mathrm{AUROC}_i$ denote the area under that curve. The macro-average is $\frac{1}{C}\sum_{i=1}^{C}\mathrm{AUROC}_i$. Equivalently, each OvR AUROC admits the ranking interpretation $\mathrm{AUROC}_i = \Pr(s_i^{+} > s_i^{-})$, i.e., the probability that a randomly chosen class-$i$ sample receives a higher class-$i$ score than a randomly chosen sample from any other class.
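The metrics above map directly onto standard scikit-learn calls; the toy labels and probabilities below are ours, for illustration only:

```python
import numpy as np
from sklearn.metrics import f1_score, balanced_accuracy_score, roc_auc_score

# Toy 3-class example: per-class probabilities (rows sum to 1)
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 0, 1, 2])
y_prob = np.array([
    [0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.3, 0.5, 0.2],
    [0.1, 0.2, 0.7], [0.1, 0.1, 0.8], [0.2, 0.3, 0.5], [0.5, 0.4, 0.1],
    [0.3, 0.6, 0.1], [0.2, 0.2, 0.6],
])
y_pred = y_prob.argmax(axis=1)

macro_f1 = f1_score(y_true, y_pred, average="macro")    # unweighted mean of class F1s
bal_acc = balanced_accuracy_score(y_true, y_pred)       # unweighted mean of class recalls
macro_auroc = roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro")
```

On this toy example every argmax prediction is correct and every class-$i$ positive outscores every negative, so all three metrics evaluate to 1.0.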
4.5. Baseline Models
- Random Forest (RF). Bagged ensemble of decision trees with feature sub-sampling at each split; it is robust to non-linearities and interactions [30]. Settings used: 400 trees, with Gini impurity and bootstrapping enabled.
- Multi-Layer Perceptron (MLP). Feed-forward neural network trained by backpropagation [31]. Settings used: one hidden layer with 50 units, ReLU activation, cross-entropy loss, and early stopping on inner validation.
- Support Vector Machine (SVM, RBF). Maximum-margin classifier with a radial-basis-function kernel to capture non-linear decision boundaries [32]. Settings used: the “scale” kernel coefficient (library default) and one-vs-rest for multiclass.
- K-Nearest Neighbors (KNN). Non-parametric classifier using a majority vote among the k closest samples in the feature space [33]. Settings used: the Minkowski distance metric.
- Logistic Regression (LR). Multinomial logistic (softmax) model optimized by maximum likelihood [34]. Settings used: multinomial loss with regularization and inverse-class-frequency class weights for imbalance.
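The baseline configurations above translate roughly into the following scikit-learn setup. Hyperparameters elided in the text (e.g., k for KNN and the SVM regularization strength) are left at library defaults here as placeholders, so this is a sketch of the comparison suite rather than the authors' exact configuration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

baselines = {
    "RF": RandomForestClassifier(n_estimators=400, criterion="gini", bootstrap=True),
    "MLP": MLPClassifier(hidden_layer_sizes=(50,), activation="relu",
                         early_stopping=True, max_iter=500),
    "SVM": SVC(kernel="rbf", gamma="scale", decision_function_shape="ovr"),
    "KNN": KNeighborsClassifier(metric="minkowski"),  # k left at library default (paper's value not shown)
    "LR": LogisticRegression(class_weight="balanced", max_iter=1000),
}

# Smoke test on synthetic 3-class tabular data
X, y = make_classification(n_samples=300, n_features=10, n_classes=3,
                           n_informative=6, random_state=0)
preds = {name: clf.fit(X, y).predict(X) for name, clf in baselines.items()}
```

In practice each baseline would be evaluated under the same cross-validation folds as CGLF so the metrics in Section 4.6 are directly comparable.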
4.6. Results Analysis
5. Discussion
5.1. Clinical Relevance
5.2. Discrimination vs. Calibration Trade-Offs
5.3. Where the Gains Come from
5.4. Interpretability and Robustness Signals
5.5. Benchmarks in Context
5.6. Clinical Implications
5.7. Limitations and Scope
5.8. Deployment Considerations
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Rahmayanti, N.; Pradani, H.; Pahlawan, M.; Vinarti, R. Comparison of machine learning algorithms to classify fetal health using cardiotocogram data. Procedia Comput. Sci. 2021, 197, 162–171.
- Alfirevic, Z.; Devane, D.; Gyte, G. Continuous cardiotocography (CTG) as a form of electronic fetal monitoring (EFM) for fetal assessment during labor. Cochrane Database Syst. Rev. 2017, 2, CD006066.
- Ranaei-Zamani, N.; David, A.; Siassakos, D.; Dadhwal, V.; Melbourne, A.; Aughwane, R.; Russell-Buckland, J.; Tachtsidis, I.; Hillman, S.; Mitra, S. Saving babies and families from preventable harm: A review of the current state of fetoplacental monitoring and emerging opportunities. npj Womens Health 2024, 2, 10.
- Mendis, L.; Palaniswami, M.; Brownfoot, F.; Keenan, E. Computerised Cardiotocography Analysis for the Automated Detection of Fetal Compromise during Labour: A Review. Bioengineering 2023, 10, 7.
- Woessner, A.; Anjum, U.; Salman, H.; Lear, J.; Turner, J.T.; Campbell, R.; Beaudry, L.; Zhan, J.; Cornett, L.E.; Gauch, S.; et al. Identifying and Training Deep Learning Neural Networks on Biomedical-Related Datasets. J. Biomed. Inform. 2024, 150, 104294.
- Arık, S.; Pfister, T. TabNet: Attentive Interpretable Tabular Learning. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021.
- Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
- Salini, Y.; Mohanty, S.; Ramesh, J.; Yang, M.; Chalapathi, M. Cardiotocography Data Analysis for Fetal Health Classification Using Machine Learning Models. IEEE Access 2024, 12, 26005–26022.
- Mienye, I.; Swart, T. A Comprehensive Review of Deep Learning: Architectures, Recent Advances, and Applications. Information 2024, 15, 755.
- Rao, H.; Karthik, S.; Gupta, R. Automatic classification of fetal heart rate based on a multi-scale LSTM network. Front. Physiol. 2024, 15, 1398735.
- Mushtaq, G.; Veningston, K. AI driven interpretable deep learning based fetal health classification. SLAS Technol. 2024, 229, 100206.
- Lee, K.; Choi, E.; Nam, Y.; Liu, N.W.; Yang, Y.S.; Kim, H.Y.; Ahn, K.H.; Hong, S.C. Real-time Classification of Fetal Status Based on Deep Learning and Cardiotocography Data. J. Med. Syst. 2023, 47, 82.
- Chieregato, M.; Frangiamore, F.; Morassi, M.; Baresi, C.; Nici, S.; Bassetti, C.; Bnà, C.; Galelli, M. A hybrid machine learning/deep learning COVID-19 severity predictive model from CT images and clinical data. Sci. Rep. 2022, 12, 4329.
- Wang, Y.; Pan, Z.; Zheng, J.; Qian, L.; Li, M. A hybrid ensemble method for pulsar candidate classification. Astrophys. Space Sci. 2019, 364, 149.
- Martins, A.; Astudillo, R. From softmax to sparsemax: A sparse model of attention and multi-label classification. In Proceedings of the 33rd International Conference on Machine Learning (ICML), New York, NY, USA, 19–24 June 2016; Volume 48, pp. 1614–1623.
- Bentéjac, C.; Csőrgő, A.; Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 2021, 54, 1937–1967.
- Grinsztajn, L.; Oyallon, E.; Varoquaux, G. Why do tree-based models still outperform deep learning on tabular data? arXiv 2022, arXiv:2207.08815.
- Buddenkotte, T.; Escudero Sanchez, L.; Crispin-Ortuzar, M.; Woitek, R.; McCague, C.; Brenton, J.D.; Öktem, O.; Sala, E.; Rundo, L. Calibrating ensembles for scalable uncertainty quantification in deep learning-based medical image segmentation. Comput. Biol. Med. 2023, 163, 107096.
- Silva Filho, T.; Song, H.; Perello-Nieto, M.; Santos-Rodriguez, R.; Kull, M.; Flach, P. Classifier calibration: A survey on how to assess and improve predicted class probabilities. Mach. Learn. 2023, 112, 3211–3260.
- Ojeda, F.M.; Jansen, M.L.; Thiéry, A.; Blankenberg, S.; Weimar, C.; Schmid, M.; Ziegler, A. Calibrating machine learning approaches for probability estimation: A comprehensive comparison. Stat. Med. 2023, 42, 5451–5478.
- Rivolli, A.; Garcia, L.; Soares, C. Meta-features for Meta-learning. Knowl.-Based Syst. 2022, 240, 108101.
- Balanya, S.A.; Maroñas, J.; Ramos, D. Adaptive temperature scaling for robust calibration of deep neural networks. Neural Comput. Appl. 2024, 36, 8073–8095.
- Jung, S.; Seo, S.; Jeong, Y.; Choi, J. Scaling of Class-wise Training Losses for Post-hoc Calibration. In Proceedings of the 40th International Conference on Machine Learning (ICML), Honolulu, HI, USA, 23–29 July 2023; Volume 202, pp. 15421–15434.
- Or, B. Improving requirements classification with SMOTE-Tomek preprocessing. arXiv 2025, arXiv:2501.06491.
- He, H.; Bai, Y.; Garcia, E.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the IEEE International Joint Conference on Neural Networks, Hong Kong, China, 1–8 June 2008; pp. 1322–1328.
- Datta, D.; Mallick, P.; Reddy, A.; Mohammed, M.A.; Jaber, M.M.; Alghawli, A.S.; Al-qaness, M.A. A hybrid classification of imbalanced hyperspectral images using ADASYN and enhanced deep subsampled multi-grained cascaded forest. Remote Sens. 2022, 14, 4853.
- Mujahid, M.; Kına, E.; Rustam, F.; Villar, M.G.; Alvarado, E.S.; De La Torre Diez, I.; Ashraf, I. Data oversampling and imbalanced datasets: An investigation of performance for machine learning and feature engineering. J. Big Data 2024, 11, 87.
- Kuhn, M.; Johnson, K. Feature Engineering and Selection: A Practical Approach for Predictive Models, 1st ed.; Chapman and Hall/CRC: New York, NY, USA, 2019.
- Brier, G.W. Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 1950, 78, 1–3.
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
- Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
- Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297.
- Cover, T.; Hart, P. Nearest Neighbor Pattern Classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
- Cox, D.R. The Regression Analysis of Binary Sequences. J. R. Stat. Soc. Ser. B Methodol. 1958, 20, 215–242.
- Taha, A.; El-Sharkawy, M. Machine learning on cardiotocography data: A systematic review. Comput. Methods Programs Biomed. 2024, 242, 107656.
| Model | Macro-F1 | Bal. Acc | Acc | AUROC |
|---|---|---|---|---|
| CGLF (TabNet+XGB) | 0.9096 ± 0.0238 | 0.9080 ± 0.0319 | 0.9503 ± 0.0110 | 0.9880 ± 0.0095 |
| Best XGBoost | 0.9077 ± 0.0233 | 0.9136 ± 0.0254 | 0.9483 ± 0.0122 | 0.9885 ± 0.0075 |
| TabNet | 0.8543 ± 0.0459 | 0.8840 ± 0.0445 | 0.9106 ± 0.0226 | 0.9717 ± 0.0131 |
| Welch’s t (CGLF vs. XGB) | , | , | (n.s. at ) | |
| Model | Macro-F1 | Bal. Acc | Acc | AUROC |
|---|---|---|---|---|
| CGLF (TabNet+XGB) | 0.9096 ± 0.0238 | 0.9080 ± 0.0319 | 0.9503 ± 0.0110 | 0.9880 ± 0.0095 |
| Random Forest (200) | 0.9015 ± 0.0438 | 0.8873 ± 0.0595 | 0.9469 ± 0.0176 | 0.9878 ± 0.0078 |
| MLP (50) | 0.8546 ± 0.0440 | 0.8421 ± 0.0610 | 0.9200 ± 0.0168 | 0.9773 ± 0.0101 |
| SVM (RBF, ) | 0.8289 ± 0.0452 | 0.8052 ± 0.0522 | 0.9079 ± 0.0206 | 0.9764 ± 0.0100 |
| KNN ( ) | 0.8152 ± 0.0479 | 0.7831 ± 0.0593 | 0.9039 ± 0.0154 | 0.9606 ± 0.0223 |
| Logistic Regression | 0.8003 ± 0.0439 | 0.7930 ± 0.0420 | 0.8945 ± 0.0241 | 0.9676 ± 0.0119 |
| Model | ECE (%) | Brier |
|---|---|---|
| CGLF (TabNet+XGB) | 3.54 ± 1.07 | 0.0259 ± 0.0052 |
| Best XGBoost | 3.82 ± 0.75 | 0.0271 ± 0.0055 |
| TabNet | 4.83 ± 1.52 | 0.0448 ± 0.0091 |
| Random Forest (200) | 4.94 ± 0.89 | 0.0282 ± 0.0063 |
| MLP (50) | 5.06 ± 1.09 | 0.0391 ± 0.0075 |
| SVM (RBF, ) | 5.15 ± 1.16 | 0.0428 ± 0.0088 |
| KNN ( ) | 3.12 ± 1.46 | 0.0419 ± 0.0079 |
| Logistic Regression | 4.83 ± 1.24 | 0.0471 ± 0.0080 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Abraha, M.E.; Kim, J. Calibrated Global Logit Fusion (CGLF) for Fetal Health Classification Using Cardiotocographic Data. Electronics 2025, 14, 4013. https://doi.org/10.3390/electronics14204013