Strategies for Class-Imbalanced Learning in Multi-Sensor Medical Imaging
Abstract
1. Introduction
2. The Impact of Imbalanced Data in Medical Imaging
2.1. Model Performance Bias
2.2. Overfitting Risk
2.3. Misleading Evaluation Criteria
3. Data-Centric Strategies
3.1. Oversampling
3.1.1. Simple Oversampling
3.1.2. SMOTE Algorithm
3.1.3. ADASYN Algorithm
3.2. Undersampling
3.2.1. Random Undersampling
3.2.2. Clustering-Based Undersampling
3.3. Hybrid Sampling Strategy
3.4. Data Augmentation
3.4.1. Traditional Image Transformation Augmentation
3.4.2. Generative Adversarial Networks (GANs) Augmentation
3.5. Ensemble Methods
3.5.1. EasyEnsemble
3.5.2. BalanceCascade
4. Model-Centric Strategies
4.1. Loss Function Adjustment
4.1.1. Weighted Loss Functions
4.1.2. Focal Loss Functions
4.1.3. Probability Distribution Correction Loss Functions
4.1.4. Hybrid Loss Functions
4.2. Transfer Learning
4.3. Ensemble Learning
4.4. Multi-Task Learning
4.5. Model Calibration
| Category | Specific Method | Core Mechanism | Advantages | Disadvantages/Clinical Risks | Typical Medical Imaging Application | Optimal Imbalance Ratio |
|---|---|---|---|---|---|---|
| Oversampling | SMOTE | Generates synthetic minority samples by interpolating between existing ones in feature space. | Increases diversity of the minority class; mitigates overfitting from simple duplication. | May create unrealistic samples; effectiveness can diminish with high-dimensional image features. | Benign/malignant thyroid nodule classification [8]. | 1:4~1:20 |
| Oversampling | Borderline-SMOTE | Oversamples only the minority samples near the decision boundary. | Strengthens the decision boundary; improves model discrimination for borderline cases. | Sensitive to boundary definition; higher computational complexity. | Tasks requiring fine-grained boundary discrimination (e.g., early-stage lesion identification) [10]. | 1:10~1:30 |
| Oversampling | ADASYN | Adaptively generates synthetic samples based on learning difficulty, focusing on harder samples. | Targets hard-to-classify regions, potentially boosting model performance where needed. | Inaccurate difficulty estimation can introduce noisy samples. | Early Alzheimer’s disease diagnosis; classification of small breast cancer lesions [9,20]. | 1:20~1:50 |
| Undersampling | Cluster-based (e.g., K-means) | Clusters majority-class samples first, then selects representative samples from each cluster. | Preserves the distribution structure of the majority class, minimizing information loss. | Results heavily depend on clustering algorithm and parameter choices. | Diabetes prediction [53]. | 1:4~1:30 |
| Data Augmentation | Traditional (rotation, flip, etc.) | Applies spatial/transform-domain transformations to images. | Simple, efficient, interpretable; simulates different acquisition conditions. | May cause distortion or loss of critical diagnostic features if over-applied. | General-purpose augmentation for X-ray and CT image classification [54]. | 1:4~1:20 (mild imbalance) |
| Data Augmentation | Generative Adversarial Networks (GANs) | Uses a generator to synthesize realistic minority-class medical images. | High sample diversity and fidelity; closely mimics the real data distribution. | Training instability (mode collapse), high computational cost; clinical validity of generated samples needs verification. | Liver tumor MRI data expansion; rare disease image synthesis [3]. | >1:50 (extreme) |
| Ensemble Learning | EasyEnsemble | Divides the majority class into subsets, each combined with all minority samples to train multiple classifiers, then ensembles the results. | Makes full use of majority-class information; improves stability and minority-class recall. | High computational cost and time consumption from training multiple classifiers. | Tumor biomarker prediction; early detection of aortic dissection [37,38]. | 1:10~1:40 |
| Ensemble Learning | BalanceCascade | Uses a cascade structure; each classifier is trained on samples misclassified by the previous one, focusing on remaining hard samples. | Progressively focuses on difficult samples, achieving high sensitivity for the minority class. | Sensitive to label noise; errors can amplify through the cascade; long training time. | Cardiovascular disease diagnosis (where high sensitivity is critical) [38]. | 1:10~1:50 |
| Loss Function | Weighted Cross-Entropy (WCE/BCE) | Assigns higher loss weights to the minority class during training. | Direct and effective; forces the model to pay more attention to the minority class. | Weight setting is empirical and subjective; may degrade model calibration. | Foundational adjustment strategy for various medical image classification tasks [41]. | 1:4~1:50 |
| Loss Function | Focal Loss | Reduces the loss contribution from easy-to-classify samples, focusing the model on hard ones. | Particularly effective for tasks with abundant simple negatives (e.g., background in detection). | Introduces additional hyperparameters, complicating optimization. | Lesion detection and segmentation in medical images [40]. | 1:10~1:50 |
| Transfer Learning | Pre-trained Model Fine-tuning | Leverages features from models pre-trained on large-scale (natural/medical) image datasets, fine-tuned for the imbalanced target task. | Alleviates overfitting on small datasets; utilizes powerful generic feature extractors. | Risk of negative transfer if source/target domains differ significantly; poor interpretability. | Rare disease classification; model adaptation across centers and devices [43]. | 1:10~1:50 |
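To make the first row of the table concrete, the interpolation mechanism of SMOTE can be sketched in a few lines of NumPy. This is a minimal illustration under simplifying assumptions (brute-force neighbour search, no boundary handling); the function name `smote_sketch` and the toy data are ours, and production code would typically use a library such as imbalanced-learn instead.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE-style oversampling: for each new sample, pick a
    minority point, pick one of its k nearest minority neighbours, and
    interpolate at a random position on the segment between them."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise distances within the minority class (brute force)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbours
    out = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(n)                      # base minority sample
        b = nn[a, rng.integers(min(k, n - 1))]   # one of its neighbours
        lam = rng.random()                       # interpolation factor in [0, 1)
        out[i] = X_min[a] + lam * (X_min[b] - X_min[a])
    return out

# Toy example: expand 4 minority samples to 10 synthetic ones.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_syn = smote_sketch(X_min, n_new=10, k=2, rng=0)
print(X_syn.shape)  # (10, 2)
```

Each synthetic point lies on a segment between a real minority sample and one of its nearest minority neighbours, which is exactly why SMOTE can produce unrealistic samples when the minority class occupies a non-convex region of feature space (the "Disadvantages" column above).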
5. Clinical Deployment and Compliance Framework
5.1. Clinical Deployment Workflow
- (1) Multi-Center Data Acquisition and Imbalance-Aware Processing: The pipeline begins by aggregating imaging data from diverse sources (Hospitals A, B, C). This inherently imbalanced dataset then undergoes specialized preprocessing, including synthetic data generation (e.g., GANs for rare cases), intelligent resampling (e.g., SMOTE), and clinically rational data augmentation, to create a more balanced and robust training set.
- (2) Model Development and Validation: Models are then trained using imbalance-optimized architectures and loss functions (e.g., focal loss, transfer learning). Crucially, validation must be multi-center and include bias/fairness audits and model calibration to ensure reliability and generalizability beyond the development dataset.
- (3) Regulatory Submission and Clinical Integration: Following rigorous validation, the model undergoes regulatory review (e.g., FDA clearance, CE marking). Upon approval, it is deployed in a radiologist-in-the-loop setting, providing explainable outputs with confidence scores to support clinical decision-making.
- (4) Continuous Monitoring: Post-deployment, continuous performance monitoring and post-market surveillance create a feedback loop, enabling ongoing refinement and ensuring sustained safety and efficacy in real-world use.
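Step (2) names focal loss as an imbalance-optimized objective. A minimal NumPy sketch of the binary focal loss, following the standard formulation with focusing parameter γ and class weight α (the helper name `focal_loss` and the toy probabilities are ours):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights easy examples by (1 - p_t)^gamma
    so training focuses on hard, often minority-class, cases.
    p: predicted probability of the positive class; y: 0/1 labels."""
    p = np.clip(p, 1e-7, 1 - 1e-7)               # numerical stability
    p_t = np.where(y == 1, p, 1 - p)             # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha) # class-balancing weight
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# An easy positive (p = 0.95) contributes far less loss than a hard one (p = 0.1).
easy = focal_loss(np.array([0.95]), np.array([1]))[0]
hard = focal_loss(np.array([0.10]), np.array([1]))[0]
print(easy < hard)  # True
```

As a sanity check, setting γ = 0 and α = 0.5 reduces the expression to a scaled ordinary cross-entropy, which is why focal loss is usually described as a generalization of weighted cross-entropy.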
5.2. Regulatory Compliance for Medical AI
5.2.1. Standard Requirements for Medical Imaging Datasets
- (1) Sample diversity standard: The dataset must include samples from at least three clinical sites (multi-center), covering different age groups (18–80+ years), genders, and ethnicities, with the proportion of minority groups (e.g., ethnic minorities, pediatric/geriatric patients) no less than 15% to avoid demographic bias.
- (2) Imaging quality standard: All medical images must meet the clinical diagnostic quality requirements of the corresponding modality (e.g., CT spatial resolution ≥0.625 mm, MRI signal-to-noise ratio ≥20), with a 100% quality-control pass rate verified by two senior radiologists.
- (3) Annotation standard: Lesion annotation must be completed by at least two board-certified radiologists with inter-annotator agreement (Cohen’s kappa) ≥0.85; ambiguous annotations must be resolved through a third expert review.
- (4) Data management standard: Datasets must include complete metadata (imaging protocol, patient clinical information, annotation time), and raw data must be stored in DICOM 3.0 standard format with traceable version control.
- (5) Synthetic data standard: For synthetic data generated by GANs or SMOTE, similarity to real clinical data (assessed by Fréchet Inception Distance (FID) for 2D images and Dice Similarity Coefficient (DSC) for 3D images) must be ≥0.90, and clinical plausibility must be verified by a clinical expert panel.
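The inter-annotator agreement threshold in item (3) refers to Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A self-contained sketch (the toy labels are illustrative; scikit-learn's `cohen_kappa_score` provides an equivalent tested implementation):

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' label lists: observed agreement
    corrected for the agreement expected by chance."""
    assert len(a) == len(b)
    n = len(a)
    labels = set(a) | set(b)
    p_o = sum(x == y for x, y in zip(a, b)) / n                      # observed
    p_e = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)   # chance
    return (p_o - p_e) / (1 - p_e)

# Two radiologists labelling 10 lesions as benign (0) / malignant (1):
r1 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
r2 = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]
print(round(cohens_kappa(r1, r2), 3))  # → 0.8
```

Here the raw agreement is 9/10, but kappa is 0.8, below the ≥0.85 threshold in item (3), so under the standard these annotations would be escalated to a third expert review.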
5.2.2. FDA Certification Processes
- (1) 510(k) program: For low- to medium-risk medical AI products substantially equivalent to a marketed “predicate device”. Manufacturers must conduct sufficient product testing and verification, including performance and safety testing, before submitting an application. The application materials should include a product description, intended use, a comparative analysis with the predicate device, and test reports. The FDA usually completes the review within 3–6 months.
- (2) De Novo pathway: For novel low- to medium-risk devices with no existing predicate device on the market. Manufacturers must provide evidence of safety and efficacy and establish classification rules. The FDA review time is typically 150 days.
- (3) PMA program: For high-risk medical AI products. Manufacturers must conduct large-scale, multi-center clinical trials to collect sufficient clinical data demonstrating product safety and efficacy. Application materials include detailed product design, manufacturing processes, clinical study reports, and a risk management plan. The FDA review time is typically 180 days.
5.3. Critical Challenges and Solutions
- (1) Federated Learning and Privacy-Preserving Synthesis: Models trained on data from one hospital network may perform poorly on data from another due to differences in imaging equipment, protocols, and patient demographics, a challenge magnified when using synthetic or augmented data. Combining Federated Learning (FL) with synthetic data generation (e.g., GANs) presents a paradigm shift [55,56]: FL can leverage distributed data across institutions without sharing raw images, while on-device or server-assisted GANs can generate site-specific synthetic minority samples. Future research must address challenges such as cross-site distribution shifts in generated data and the development of efficient algorithms (such as FedSPU [55]) for personalized, robust model training in resource-constrained environments.
- (2) Explainable AI for Imbalanced Learning: As models become more complex, ensuring that their decisions are interpretable and clinically plausible is non-negotiable. Future methods should integrate Explainable AI (XAI) techniques, such as SHAP or attention maps, directly into the training loop of imbalanced classifiers. This will allow clinicians to verify whether a model’s increased sensitivity to a minority class stems from medically relevant features or from spurious correlations in synthesized or augmented data.
- (3) Foundation Models and Few-Shot Learning Adaptation: The emergence of large-scale, pre-trained foundation models for medical imaging offers a promising alternative to traditional transfer learning. There is often a tension between methods that achieve high accuracy (e.g., complex GANs, deep ensembles) and those that meet clinical needs for speed, interpretability, and regulatory simplicity. By leveraging rich, general-purpose visual representations, adaptation to new, imbalanced tasks could be achieved through efficient fine-tuning or prompt-based learning with very few examples, potentially bypassing the need for extensive data augmentation or resampling [57].
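The federated approach in item (1) rests on exchanging model parameters rather than raw images. A minimal sketch of one FedAvg-style aggregation round (the function name and toy parameter vectors are ours; methods like FedSPU refine this with partial/stochastic parameter updates, and production FL adds secure aggregation and communication scheduling):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """One FedAvg aggregation round: average client model parameters,
    weighted by each client's local dataset size. Raw images never leave
    the site; only the parameter vectors are shared."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three hospitals with different data volumes each contribute a parameter vector.
w_a, w_b, w_c = np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])
global_w = fedavg([w_a, w_b, w_c], [100, 300, 600])
print(global_w)  # weighted toward the largest site
```

Note that size-weighted averaging itself can re-introduce imbalance at the cohort level: a site holding most of the rare-disease cases may still be down-weighted if its overall dataset is small, which is one motivation for class-aware federated schemes such as FedIIC [56].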
- (1) Cross-modality feature misalignment: Imbalance-handling methods designed for a single modality may disrupt feature consistency across sensors, degrading fusion performance;
- (2) Real-time fusion computational cost: Complex imbalance-handling methods (e.g., CM-GANs, cascaded ensembles) increase model complexity, making real-time fusion difficult on clinical hardware (e.g., radiology workstations);
- (3) Regulatory validation of fusion models: The FDA/MDR require separate validation of each sensor modality and of the integrated fusion model, with imbalance-handling effects verified for both single-modality and fusion results;
- (4) Interpretability of fusion-imbalance models: Cross-modality feature fusion and imbalance handling together exacerbate the “black-box” problem, making it difficult to trace a model’s minority-class predictions to specific sensor features.

To address these challenges, future research must develop sensor-adaptive imbalance-handling methods with low computational cost and integrate XAI techniques (e.g., cross-modality attention maps) to improve interpretability.
6. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| SMOTE | Synthetic Minority Over-sampling Technique |
| ADASYN | Adaptive Synthetic Sampling Approach |
| GANs | Generative Adversarial Networks |
| FDA | U.S. Food and Drug Administration |
| WCE | Weighted Cross-Entropy |
| BCE | Balanced Cross-Entropy |
| FL | Focal Loss |
| MTL | Multi-Task Learning |
| DANN | Domain-Adversarial Neural Networks |
| ECE | Expected Calibration Error |
| PTC | Papillary Thyroid Cancer |
| FedSPU | Federated Learning with Stochastic Parameter Update |
| ERI | Effort-Reward Imbalance |
References
- Zhang, X.; Xiao, Z.; Ma, J.; Wu, X.; Zhao, J.; Zhang, S.; Li, R.; Pan, Y.; Liu, J. Adaptive Dual-Axis Style-based Recalibration Network with Class-Wise Statistics Loss for Imbalanced Medical Image Classification. IEEE Trans. Image Process. 2025, 34, 2081–2096.
- Chen, Z.; Duan, J.; Kang, L.; Qiu, G. Class-Imbalanced Deep Learning via a Class-Balanced Ensemble. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 5626–5640.
- Devi, M.K.A.A.; Suganthi, K. Review of Medical Image Synthesis using GAN Techniques. In ITM Web of Conferences; EDP Sciences: Les Ulis, France, 2021; Volume 37.
- Buda, M.; Maki, A.; Mazurowski, M.A. A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw. 2018, 106, 249–259.
- Topol, E.J. High-performance medicine: The convergence of human and artificial intelligence. Nat. Med. 2019, 25, 44–56.
- Liao, F.; Adelaine, S.; Afshar, M.; Patterson, B.W. Governance of Clinical AI applications to facilitate safe and equitable deployment in a large health system: Key elements and early successes. Front. Digit. Health 2022, 4, 931439.
- Iqbal, S.; Qureshi, A.N.; Li, J.; Choudhry, I.A.; Mahmood, T. Dynamic learning for imbalanced data in learning chest X-ray and CT images. Heliyon 2023, 9, e16807.
- Yu, C.; Pei, H. Dynamic Weighting Translation Transfer Learning for Imbalanced Medical Image Classification. Entropy 2024, 26, 400.
- Krawczyk, B.; Jeleń, Ł.; Krzyżak, A.; Fevens, T. Oversampling Methods for Classification of Imbalanced Breast Cancer Malignancy Data. In International Conference on Computer Vision and Graphics; Springer: Berlin/Heidelberg, Germany, 2012; pp. 483–490.
- Asaduzzaman, A.; Thompson, C.C.; Uddin, M.J. Machine Learning Approaches for Skin Neoplasm Diagnosis. ACS Omega 2024, 9, 32853–32863.
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
- Chen, J.H.; Zhang, Y.Q.; Zhu, T.T.; Zhang, Q.; Zhao, A.X.; Huang, Y. Applying machine-learning models to differentiate benign and malignant thyroid nodules classified as C-TIRADS 4 based on 2D-ultrasound combined with five contrast-enhanced ultrasound key frames. Front. Endocrinol. 2024, 15, 1299686.
- Guo, Z.; Wang, N.; Zhao, G.; Du, L.; Cui, Z.; Liu, F. Development of preoperative models for predicting positive esophageal margin in proximal gastric cancer based on machine learning. J. Shandong Univ. (Health Sci.) 2024, 62, 78–83.
- Guo, Z.; Wang, N.; Zhao, G.; Du, L.; Cui, Z.; Liu, F. Development and validation of a preoperative model for predicting positive proximal margins in adenocarcinoma of the esophagogastric junction and assessing safe margin distance. Front. Oncol. 2024, 14, 1503728.
- Batista, G.E.; Prati, R.C.; Monard, M.C. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor. Newsl. 2004, 6, 20–29.
- Chen, W.; Yang, K.; Yu, Z.; Shi, Y.; Chen, C.L.P. A survey on imbalanced learning: Latest research, applications and future directions. Artif. Intell. Rev. 2024, 57, 137.
- He, H.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
- Insan, H.; Prasetiyowati, S.S.; Sibaroni, Y. SMOTE-LOF and Borderline-SMOTE Performance to Overcome Imbalanced Data and Outliers on Classification. In Proceedings of the 2023 3rd International Conference on Intelligent Cybernetics Technology & Applications (ICICyTA), Denpasar, Indonesia, 13–15 December 2023; IEEE: New York, NY, USA, 2023; pp. 136–141.
- Ungkawa, U.; Rafi, M.A. Data Balancing Techniques Using the PCA-KMeans and ADASYN for Possible Stroke Disease Cases. J. Online Inform. 2024, 9, 138–147.
- Ahmed, G.; Er, M.J.; Fareed, M.M.S.; Zikria, S.; Mahmood, S.; He, J.; Asad, M.; Jilani, S.F.; Aslam, M. DAD-Net: Classification of Alzheimer’s Disease Using ADASYN Oversampling Technique and Optimized Neural Network. Molecules 2022, 27, 7085.
- Khan, T.M.; Xu, S.; Khan, Z.G.; Chishti, M.U. Implementing Multilabeling, ADASYN, and ReliefF Techniques for Classification of Breast Cancer Diagnostic through Machine Learning: Efficient Computer-Aided Diagnostic System. J. Healthc. Eng. 2021, 2021, 5577636.
- Munshi, R.M. Novel ensemble learning approach with SVM-imputed ADASYN features for enhanced cervical cancer prediction. PLoS ONE 2024, 19, e0296107.
- Ramotra, A.K.; Mansotra, V. Hybrid Type-2 Diabetes Prediction Model Using SMOTE, K-means Clustering, PCA, and Logistic Regression. Asian Pac. J. Health Sci. 2021, 8, 137–140.
- Madsen, M.T.; Park, C.H. Enhancement of SPECT Images by Fourier Filtering the Projection Image Set. J. Nucl. Med. 1985, 26, 395–402.
- Yoshida, H.; Keserci, B. Bayesian wavelet snake for computer-aided diagnosis of lung nodules. Integr. Comput.-Aided Eng. 2000, 7, 253–269.
- Lundervold, A.S.; Lundervold, A. An overview of deep learning in medical imaging focusing on MRI. Z. Med. Phys. 2019, 29, 102–127.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144.
- Greenspan, H.; van Ginneken, B.; Summers, R.M. Guest Editorial Deep Learning in Medical Imaging: Overview and Future Promise of an Exciting New Technique. IEEE Trans. Med. Imaging 2016, 35, 1153–1159.
- Kim, Y.; Choi, W.; Choi, W.; Ko, G.; Han, S.; Kim, H.C.; Kim, D.; Lee, D.G.; Shin, D.W.; Lee, Y. A machine learning approach using conditional normalizing flow to address extreme class imbalance problems in personal health records. BioData Min. 2024, 17, 14.
- Stephanovitch, A.; Aamari, E.; Levrard, C. Wasserstein GANs Are Minimax Optimal Distribution Estimators. arXiv 2023.
- Li, X.; Huang, H.; Yuan, G.T.; Wang, Z.L.; Du, R. Research on fusion neural network intrusion detection based on improved WGAN algorithm. J. Sichuan Univ. Sci. Eng. (Nat. Sci. Ed.) 2024, 37, 57–65.
- Luo, Y.; Yang, Z. DynGAN: Solving Mode Collapse in GANs With Dynamic Clustering. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5493–5503.
- Yin, W.; Huang, C.; Chen, L.; Huang, X.; Wang, Z.; Bian, Y.; Zhou, Y.; Wan, Y.; Han, T.; Yi, M. Facilitate Robust Early Screening of Cerebral Palsy via General Movements Assessment with Multi-Modality Co-Learning. IEEE Trans. Med. Imaging 2025, early access.
- Brock, A.; Donahue, J.; Simonyan, K. Large Scale GAN Training for High Fidelity Natural Image Synthesis. arXiv 2019, arXiv:1809.11096.
- Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and Improving the Image Quality of StyleGAN. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; IEEE: New York, NY, USA, 2020; pp. 8107–8116.
- Xie, Z.H.; Yi, M.; Huang, X. Application progress of multi-instance learning in medical image analysis. J. Integr. Technol. 2025, 14, 24–32.
- Rao, Z.; Wu, C.; Liao, Y.; Ye, C.; Huang, S.; Zhao, D. POCALI: Prediction and Insight on CAncer LncRNAs by Integrating Multi-Omics Data with Machine Learning. Small Methods 2025, 9, e2401987.
- Luo, J.; Zhang, W.; Tan, S.; Liu, L.; Bai, Y.; Zhang, G. Aortic Dissection Auxiliary Diagnosis Model and Applied Research Based on Ensemble Learning. Front. Cardiovasc. Med. 2021, 8, 777757.
- Yang, C.; Xie, J.; Huang, X.; Tan, H.; Li, Q.; Tang, Z.; Ma, X.; Lu, J.; He, Q.; Fu, W.; et al. ECS-Net: Extracellular space segmentation with contrastive and shape-aware loss by using cryo-electron microscopy imaging. Expert Syst. Appl. 2025, 270, 126370.
- Yeung, M.; Sala, E.; Schönlieb, C.B.; Rundo, L. Unified Focal loss: Generalising Dice and cross entropy-based losses to handle class imbalanced medical image segmentation. Comput. Med. Imaging Graph. 2022, 95, 102026.
- Du, J.; Zhang, X.; Liu, P.; Vong, C.M.; Wang, T. An Adaptive Deep Metric Learning Loss Function for Class-Imbalance Learning via Intraclass Diversity and Interclass Distillation. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 15372–15386.
- Li, Z.; Kamnitsas, K.; Glocker, B. Analyzing Overfitting Under Class Imbalance in Neural Networks for Image Segmentation. IEEE Trans. Med. Imaging 2021, 40, 1065–1077.
- Albahli, S.; Masood, M. Efficient attention-based CNN network (EANet) for multi-class maize crop disease classification. Front. Plant Sci. 2022, 13, 1003152.
- Qiu, S.; Zhao, H.; Jiang, N.; Wang, Z.; Liu, L.; An, Y.; Zhao, H.; Miao, X.; Liu, R.; Fortino, G. Multi-sensor information fusion based on machine learning for real applications in human activity recognition: State-of-the-art and research challenges. Inf. Fusion 2022, 80, 241–265.
- Rao, P.M.M.; Singh, S.K.; Khamparia, A.; Bhushan, B.; Podder, P. Multi-Class Breast Cancer Classification Using Ensemble of Pretrained models and Transfer Learning. Curr. Med. Imaging 2022, 18, 409–416.
- Chen, Y.S. An empirical study of a hybrid imbalanced-class DT-RST classification procedure to elucidate therapeutic effects in uremia patients. Med. Biol. Eng. Comput. 2016, 54, 983–1001.
- Yang, J.; El-Bouri, R.; O’Donoghue, O.; Lachapelle, A.S.; Soltan, A.A.S.; Eyre, D.W.; Lu, L.; Clifton, D.A. Deep reinforcement learning for multi-class imbalanced training: Applications in healthcare. Mach. Learn. 2024, 113, 2655–2674.
- Solihah, B.; Azhari, A.; Musdholifah, A. Enhancement of conformational B-cell epitope prediction using CluSMOTE. PeerJ Comput. Sci. 2020, 6, e275.
- Ojeda, F.M.; Jansen, M.L.; Thiéry, A.; Blankenberg, S.; Weimar, C.; Schmid, M.; Ziegler, A. Calibrating machine learning approaches for probability estimation: A comprehensive comparison. Stat. Med. 2023, 42, 5451–5478.
- Hou, F.; Zhu, Y.; Zhao, H.; Cai, H.; Wang, Y.; Peng, X.; Lu, L.; He, R.; Hou, Y.; Li, Z.; et al. Development and validation of an interpretable machine learning model for predicting the risk of distant metastasis in papillary thyroid cancer: A multicenter study. eClinicalMedicine 2024, 77, 102913.
- Dawood, T.; Ruijsink, B.; Razavi, R.; King, A.P.; Puyol-Antón, E. Improving Deep Learning Model Calibration for Cardiac Applications using Deterministic Uncertainty Networks and Uncertainty-aware Training. arXiv 2024.
- Van Calster, B.; McLernon, D.J.; van Smeden, M.; Wynants, L.; Steyerberg, E.W. Calibration: The Achilles heel of predictive analytics. BMC Med. 2019, 17, 230.
- Abu-Shareha, A.A.; Abualhaj, M.; Hussein, A.; Al-Saaidah, A.; Achuthan, A. Investigation of Data Balancing Techniques for Diabetes Prediction. Int. J. Intell. Eng. Syst. 2025, 18, 598–611.
- Zhang, F.; Wu, S.; Zhang, C.; Chen, Q.; Yang, X.; Jiang, K.; Zheng, J. Multi-domain features for reducing false positives in automated detection of clustered microcalcifications in digital breast tomosynthesis. Med. Phys. 2019, 46, 1300–1308.
- Niu, Z.; Dong, H.; Qin, A.K. FedSPU: Personalized Federated Learning for Resource-Constrained Devices with Stochastic Parameter Update. Proc. AAAI Conf. Artif. Intell. 2025, 39, 19721–19729.
- Wu, N.; Yu, L.; Yang, X.; Cheng, K.-T.; Yan, Z. FedIIC: Towards Robust Federated Learning for Class-Imbalanced Medical Image Classification. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Cham, Switzerland, 2023; pp. 692–702.
- Işık, G.; Paçal, İ. Few-shot classification of ultrasound breast cancer images using meta-learning algorithms. Neural Comput. Appl. 2024, 36, 12047–12059.



| Method Category | Key FDA Certification Considerations and Validation Requirements |
|---|---|
| All Methods | 1. Performance Validation: Provide performance reports (Sensitivity, Specificity, AUC, etc.) on multi-center, prospective clinical datasets. 2. Bias Assessment: Must evaluate and demonstrate algorithmic fairness across demographic subgroups (age, gender, ethnicity). 3. Traceability: Complete logs of model development, data preprocessing, and augmentation steps. |
| Synthetic Data Generation (e.g., SMOTE, GANs) | 1. Realism Verification: Must demonstrate clinical plausibility of synthetic samples, typically requiring expert review or Turing-test-like evaluation. 2. Generalizability Testing: The model trained with synthetic data must be tested on an independent, entirely real clinical dataset to prove performance gains are not artifacts of synthesis. 3. Privacy Compliance: If real data is used for generation, ensure compliance with regulations like HIPAA, providing proof of anonymization. |
| Data Augmentation | 1. Transformation Justification: All image transformations must have clear clinical rationale (e.g., simulating different view angles) with safe intensity thresholds to prevent distortion of diagnostic features. 2. Consistency Testing: Diagnostic decisions from models trained on augmented data should show high agreement with those from models trained on original data for critical cases. |
| Ensemble/Cascade Models | 1. Stability Report: Provide performance and stability analysis for each base classifier and the ensemble, demonstrating robustness of the strategy. 2. Real-time Performance Validation: For cascade models, inference time must be tested on target clinical hardware to ensure it meets real-time diagnostic requirements. 3. Error Propagation Analysis: For methods like BalanceCascade, analyze and mitigate the risk of error amplification through the cascade stages. |
| Model Calibration | 1. Calibration Curves: Must provide calibration curves (reliability diagrams) on validation and test sets, and compute metrics like Expected Calibration Error (ECE). 2. Clinical Decision Support: Calibrated probability outputs should align with clinical risk stratification, providing reliable decision thresholds for physicians. |
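The Expected Calibration Error (ECE) named in the Model Calibration row can be computed with a simple equal-width binning procedure. A minimal sketch (binary case; the bin count, function name, and toy predictions are ours):

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Expected Calibration Error: bin predictions by confidence and
    average |accuracy - mean confidence| per bin, weighted by bin size."""
    probs, labels = np.asarray(probs), np.asarray(labels)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (probs > lo) & (probs <= hi)
        if mask.any():
            acc = labels[mask].mean()   # fraction of positives in the bin
            conf = probs[mask].mean()   # mean predicted probability in the bin
            ece += mask.mean() * abs(acc - conf)
    return ece

# Perfectly calibrated toy predictions: 90% of the p=0.9 cases and
# 10% of the p=0.1 cases are actually positive, so ECE is (near) zero.
probs = np.array([0.9] * 10 + [0.1] * 10)
labels = np.array([1] * 9 + [0] + [0] * 9 + [1])
print(round(expected_calibration_error(probs, labels), 3))  # → 0.0
```

A reliability diagram plots the same per-bin (confidence, accuracy) pairs; ECE summarizes the diagram's deviation from the diagonal in a single number suitable for a regulatory submission.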
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Zhou, D.; Gao, S.; Huang, X. Strategies for Class-Imbalanced Learning in Multi-Sensor Medical Imaging. Sensors 2026, 26, 1998. https://doi.org/10.3390/s26061998

