Multi-Scale Fusion MaxViT for Medical Image Classification with Hyperparameter Optimization Using Super Beluga Whale Optimization
Abstract
1. Introduction
- SBWO is proposed: To improve the performance of deep learning models in medical image classification, this study introduces the Super Beluga Whale Optimization (SBWO) algorithm, which builds on quadratic interpolation optimization [15]. SBWO combines adaptive parameter tuning with quadratic interpolation to improve convergence and global search capability. Relative to the traditional Beluga Whale Optimization (BWO), SBWO shows excellent performance in optimizing the hyperparameters of the MaxViT network.
- Parallel Attention Block and Multi-Scale Fusion Attention (MSFA) Block on the MaxViT base model: This paper proposes an improved model, MSF-MaxViT. It introduces a novel attention module, the MSFA Block, into the MaxViT network. The MSFA Block combines edge information enhancement, multi-scale fusion, and coordinate attention [16] to strengthen the generalization and feature representation ability of the network, which further improves medical image classification performance. The Parallel Attention Block, an improvement on the original MaxViT Block, dynamically fuses multiple attention mechanisms: it changes Block Attention and Grid Attention from serial to parallel so that the network can learn features at different scales and levels.
- Optimized hyperparameters of the MSF-MaxViT model: Optimizing the hyperparameters of MSF-MaxViT with SBWO raised the classification accuracy on the HAM10000 dataset to 92.87%, compared with 91.42% for the best configuration without SBWO in the ablation results. This demonstrates the effectiveness of the proposed optimization strategy for medical image classification.
- Innovation and practicality: This work improves both the optimization algorithm and the network modules. The experiments show that the method outperforms the compared models on the HAM10000 medical image classification task and has the potential to be extended to other datasets.
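
The change from serial to parallel attention described in the contribution list can be illustrated with a small Python sketch. It is only a structural illustration under stated assumptions: real self-attention is replaced by a uniform average over each token group, the fusion weights are assumed to be a softmax over two logits (which would be learnable in a real network), and the function names `block_groups`, `grid_groups`, `mix`, `serial`, and `parallel` are hypothetical rather than taken from the MSF-MaxViT code.

```python
import math


def block_groups(h, w, p):
    """Positions that share a non-overlapping p x p local window (block attention)."""
    groups = {}
    for i in range(h):
        for j in range(w):
            groups.setdefault((i // p, j // p), []).append((i, j))
    return list(groups.values())


def grid_groups(h, w, g):
    """Positions that share a sparse g x g grid spread over the map (grid attention)."""
    groups = {}
    for i in range(h):
        for j in range(w):
            groups.setdefault((i % (h // g), j % (w // g)), []).append((i, j))
    return list(groups.values())


def mix(x, groups):
    """Stand-in for self-attention (assumption): uniform average within each group."""
    out = [row[:] for row in x]
    for group in groups:
        mean = sum(x[i][j] for i, j in group) / len(group)
        for i, j in group:
            out[i][j] = mean
    return out


def serial(x, p, g):
    """Original ordering: block attention first, then grid attention on its output."""
    h, w = len(x), len(x[0])
    return mix(mix(x, block_groups(h, w, p)), grid_groups(h, w, g))


def parallel(x, p, g, logits=(0.0, 0.0)):
    """Both branches see the same input; outputs are fused with softmax weights."""
    h, w = len(x), len(x[0])
    exps = [math.exp(v) for v in logits]
    wb, wg = (e / sum(exps) for e in exps)  # assumed form of the dynamic fusion
    xb = mix(x, block_groups(h, w, p))
    xg = mix(x, grid_groups(h, w, g))
    return [[wb * a + wg * b for a, b in zip(ra, rb)] for ra, rb in zip(xb, xg)]


# Toy 4 x 4 single-channel feature map with values 0..15.
x = [[float(4 * i + j) for j in range(4)] for i in range(4)]
print(serial(x, 2, 2)[0][0], parallel(x, 2, 2)[0][0])  # prints 7.5 3.75
```

In the serial form the grid branch only ever sees features that the block branch has already mixed, whereas in the parallel form the local and global branches both read the original features and their relative contribution is controlled by the fusion weights.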
2. SBWO Optimization Algorithm
2.1. BWO Background
2.1.1. Initialization Phase
2.1.2. Global Exploration Phase
2.1.3. Local Exploitation Phase
2.1.4. Whale-Fall Phase
2.2. BWO Improvement Strategy
2.2.1. Adaptive Parameter Tuning
2.2.2. Enhanced Quadratic Interpolation Strategy
2.2.3. Exploration and Exploitation Mechanisms
2.2.4. Mutation Mechanism
2.2.5. Algorithmic Description
| Algorithm 1: SBWO |
|---|
| 1. Input: . |
| 2. Initialize: for all solutions. . . , inertia max, inertia min, mutation rate. |
| 3. Output: . |
| 4. For : when better. |
| 5. End For. |
| 6. While : and inertia weight. . |
| 7. For : |
| Interpolation Strategy: Select two random whales rl, rr. using quadratic interpolation. when better. |
| Exploration/Exploitation Phase: If (exploration): Update position using random peers and a non-linear inertia weight. Else (exploitation using Levy flight): Update position with Levy flight strategy. |
| Mutation: Mutation occurs with probability . |
| Boundary check and fitness update. when improved. |
| 8. Whale Fall Mechanism (EBWO): : Update position using whale fall mechanism. Boundary check and fitness update. when improved. |
| 9. . |
| 10. Record . |
| 11. Increment . |
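
The control flow of Algorithm 1 can be sketched in Python as follows. The symbols and update formulas of the algorithm are not legible in the listing, so every formula here is an assumption: a standard three-point quadratic interpolation, a cosine-shaped non-linear inertia weight, a Lévy step drawn with Mantegna's method, uniform re-sampling as the mutation, and a balance factor and whale-fall step modeled loosely on the published BWO. The names `sbwo_sketch`, `quad_interp`, `levy_step`, `w_max`, `w_min`, and `mutation_rate` are hypothetical. This is an illustration of the step order, not the authors' implementation.

```python
import math
import random


def quad_interp(xa, xb, xc, fa, fb, fc):
    """Per-dimension vertex of the parabola through three (x, f) points."""
    out = []
    for a, b, c in zip(xa, xb, xc):
        num = (b * b - c * c) * fa + (c * c - a * a) * fb + (a * a - b * b) * fc
        den = 2.0 * ((b - c) * fa + (c - a) * fb + (a - b) * fc)
        out.append(num / den if abs(den) > 1e-12 else a)  # degenerate case: keep a
    return out


def levy_step(rng, beta=1.5):
    """One heavy-tailed step via Mantegna's method (assumed Levy generator)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return rng.gauss(0.0, sigma) / (abs(rng.gauss(0.0, 1.0)) ** (1 / beta) + 1e-12)


def sbwo_sketch(f, dim, lo, hi, n=20, iters=100, w_max=0.9, w_min=0.2,
                mutation_rate=0.1, seed=0):
    """Minimize f over [lo, hi]^dim; all update rules are illustrative assumptions."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    fit = [f(x) for x in pop]
    b = min(range(n), key=fit.__getitem__)
    state = {"best": pop[b][:], "best_fit": fit[b]}
    history = []

    def accept(i, cand):
        """Boundary check, then greedy replacement only when the candidate improves."""
        cand = [min(max(v, lo), hi) for v in cand]
        fc = f(cand)
        if fc < fit[i]:
            pop[i], fit[i] = cand, fc
            if fc < state["best_fit"]:
                state["best"], state["best_fit"] = cand[:], fc

    for t in range(iters):
        frac = t / iters
        # Assumed non-linear (cosine) decay of the inertia weight from w_max to w_min.
        w = w_min + (w_max - w_min) * math.cos(0.5 * math.pi * frac)
        for i in range(n):
            rl, rr = rng.sample([k for k in range(n) if k != i], 2)
            # Interpolation strategy: parabola vertex through whale i, rl and rr.
            accept(i, quad_interp(pop[i], pop[rl], pop[rr], fit[i], fit[rl], fit[rr]))
            # Assumed balance factor (BWO-style): explore when it exceeds 0.5.
            if rng.random() * (1 - 0.5 * frac) > 0.5:
                cand = [w * x + rng.random() * (p - q)
                        for x, p, q in zip(pop[i], pop[rl], pop[rr])]
            else:  # exploitation: Levy-flight move toward the best whale so far
                cand = [x + w * levy_step(rng) * (g - x)
                        for x, g in zip(pop[i], state["best"])]
            # Mutation: resample each coordinate with probability mutation_rate.
            cand = [rng.uniform(lo, hi) if rng.random() < mutation_rate else v
                    for v in cand]
            accept(i, cand)
        # Whale fall (assumed form): rare, shrinking jump relative to a random peer.
        p_fall = 0.1 - 0.05 * frac
        step = (hi - lo) * math.exp(-2.0 * p_fall * n * frac)
        for i in range(n):
            if rng.random() < p_fall:
                peer = pop[rng.randrange(n)]
                accept(i, [rng.random() * x - rng.random() * y + rng.random() * step
                           for x, y in zip(pop[i], peer)])
        history.append(state["best_fit"])  # record the best fitness per iteration
    return state["best"], state["best_fit"], history


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best, best_fit, history = sbwo_sketch(sphere, dim=5, lo=-10.0, hi=10.0)
print(best_fit)
```

Because every update is accepted only when it improves the whale's fitness, the recorded best fitness never increases from one iteration to the next. For hyperparameter tuning, `f` would be replaced by a function that trains and validates the network for a given hyperparameter vector, which is far more expensive than the toy objective used here.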
2.3. Hyperparameter Optimization of MSF-MaxViT
3. Multi-Scale Fusion MaxViT Network Model
3.1. MaxViT Background
3.2. MaxViT Optimization Improvements
3.2.1. The Overall MSF-MaxViT Framework
3.2.2. Parallel Attention
3.2.3. MSFA Block
4. Experimental Analysis and Results
4.1. HAM10000 Dataset
Dataset Division and Augmentation
4.2. Testing of Optimization Algorithms
4.3. Experimental Configuration and Results
4.4. Ablation Experiments
4.5. Evaluation Metrics
5. Discussion and Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Tu, Z.; Talebi, H.; Zhang, H.; Yang, F.; Milanfar, P.; Bovik, A.; Li, Y. MaxViT: Multi-Axis Vision Transformer. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2022; pp. 459–479.
2. Pacal, I. Enhancing crop productivity and sustainability through disease identification in maize leaves: Exploiting a large dataset with an advanced vision transformer model. Expert Syst. Appl. 2024, 238, 122099.
3. Jiang, T.; Guo, J.; Xing, W.; Yu, M.; Li, Y.; Zhang, B.; Ta, D. A prior segmentation knowledge enhanced deep learning system for the classification of tumors in ultrasound image. Eng. Appl. Artif. Intell. 2025, 142, 109926.
4. Pacal, I. MaxCerVixT: A novel lightweight vision transformer-based Approach for precise cervical cancer detection. Knowl.-Based Syst. 2024, 289, 111482.
5. Wang, W.C.; Tian, W.C.; Xu, D.M.; Zang, H.F. Arctic Puffin Optimization: A Bio-Inspired Metaheuristic Algorithm for Solving Engineering Design Optimization. Adv. Eng. Softw. 2024, 195, 103694.
6. Hamad, R.K.; Rashid, T.A. GOOSE Algorithm: A Powerful Optimization Tool for Real-World Engineering Challenges and Beyond. Evol. Syst. 2024, 15, 1249–1274.
7. Fu, S.; Li, K.; Huang, H.; Ma, C.; Fan, Q.; Zhu, Y. Red-Billed Blue Magpie Optimizer: A Novel Metaheuristic Algorithm for 2D/3D UAV Path Planning and Engineering Design Problems. Artif. Intell. Rev. 2024, 57, 134.
8. Bouaouda, A.; Hashim, F.A.; Sayouti, Y.; Hussien, A.G. Pied Kingfisher Optimizer: A New Bio-Inspired Algorithm for Solving Numerical Optimization and Industrial Engineering Problems. Neural Comput. Appl. 2024, 36, 15455–15513.
9. Zhong, C.; Li, G.; Meng, Z. Beluga Whale Optimization: A Novel Nature-Inspired Metaheuristic Algorithm. Knowl.-Based Syst. 2022, 251, 109215.
10. Huang, J.; Hu, H. Hybrid Beluga Whale Optimization Algorithm with Multi-Strategy for Functions and Engineering Optimization Problems. J. Big Data 2024, 11, 3.
11. Horng, S.C.; Lin, S.S. Improved Beluga Whale Optimization for Solving the Simulation Optimization Problems with Stochastic Constraints. Mathematics 2023, 11, 1854.
12. Li, J.; Zhou, X.; Zhou, Y.; Han, A. Optimal Configuration of Distributed Generation Based on an Improved Beluga Whale Optimization. IEEE Access 2024, 12, 31000–31013.
13. Yuan, H.; Chen, Q.; Li, H.; Zeng, D.; Wu, T.; Wang, Y.; Zhang, W. Improved Beluga Whale Optimization Algorithm Based Cluster Routing in Wireless Sensor Networks. Math. Biosci. Eng. 2024, 21, 4587–4625.
14. Tschandl, P.; Rosendahl, C.; Kittler, H. The HAM10000 Dataset, a Large Collection of Multi-Source Dermatoscopic Images of Common Pigmented Skin Lesions. Sci. Data 2018, 5, 180161.
15. Qaraad, M.; Amjad, S.; Hussein, N.K.; Farag, M.A.; Mirjalili, S.; Elhosseini, M.A. Quadratic Interpolation and a New Local Search Approach to Improve Particle Swarm Optimization: Solar Photovoltaic Parameter Estimation. Expert Syst. Appl. 2024, 236, 121417.
16. Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722.
17. Paçacı, S. Improvement of Beluga Whale Optimization Algorithm by Distance Balance Selection Method. Yalvaç Akad. Derg. 2023, 8, 125–144.
18. Jia, H.; Wen, Q.; Wu, D.; Wang, Z.; Wang, Y.; Wen, C.; Abualigah, L. Modified Beluga Whale Optimization with Multi-Strategies for Solving Engineering Problems. J. Comput. Des. Eng. 2023, 10, 2065–2093.
19. Chen, X.; Zhang, M.; Yang, M.; Wang, D. NHBBWO: A Novel Hybrid Butterfly-Beluga Whale Optimization Algorithm with the Dynamic Strategy for WSN Coverage Optimization. Peer-to-Peer Netw. Appl. 2025, 18, 80.
20. Yuan, X.; Hu, G.; Zhong, J.; Wei, G. HBWO-JS: Jellyfish Search Boosted Hybrid Beluga Whale Optimization Algorithm for Engineering Applications. J. Comput. Des. Eng. 2023, 10, 1615–1656.
21. Agrawal, A.; Tripathi, S. Particle Swarm Optimization with Adaptive Inertia Weight Based on Cumulative Binomial Probability. Evol. Intell. 2021, 14, 305–313.
22. Chen, H.; Wang, Z.; Wu, D.; Jia, H.; Wen, C.; Rao, H.; Abualigah, L. An Improved Multi-Strategy Beluga Whale Optimization for Global Optimization Problems. Math. Biosci. Eng. 2023, 20, 13267–13317.
23. Li, Y.; Li, X.; Liu, J.; Ruan, X. An Improved Bat Algorithm Based on Lévy Flights and Adjustment Factors. Symmetry 2019, 11, 925.
24. Zhang, L.; Qiao, Z.; Li, L. An Evolutionary Deep Learning Method Based on Improved Heap-Based Optimization for Medical Image Classification and Diagnosis. IEEE Access 2024, 12, 102745–102773.
25. El-Bouzaidi, Y.E.I.; Hibbi, F.Z.; Abdoun, O. Optimizing Convolutional Neural Network Impact of Hyperparameter Tuning and Transfer Learning. In Innovations in Optimization and Machine Learning; IGI Global Scientific Publishing: Hershey, PA, USA, 2025; pp. 301–326.
26. Dosovitskiy, A. An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
27. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
28. Zhang, H.; Zu, K.; Lu, J.; Zou, Y.; Meng, D. EPSANet: An Efficient Pyramid Split Attention Block on Convolutional Neural Network. arXiv 2021, arXiv:2105.14447.
29. Yue, W.; Liu, S.; Li, Y. Eff-PCNet: An Efficient Pure CNN Network for Medical Image Classification. Appl. Sci. 2023, 13, 9226.
30. Maduranga, M.W.P.; Nandasena, D. Mobile-Based Skin Disease Diagnosis System Using Convolutional Neural Networks (CNN). Int. J. Image Graph. Signal Process. 2022, 12, 47.
31. Adebiyi, A.; Abdalnabi, N.; Smith, E.H.; Hirner, J.; Simoes, E.J.; Becevic, M.; Rao, P. Accurate skin lesion classification using multimodal learning on the HAM10000 dataset. medRxiv 2024.
32. Shetty, B.; Fernandes, R.; Rodrigues, A.P.; Chengoden, R.; Bhattacharya, S.; Lakshmanna, K. Skin Lesion Classification of Dermoscopic Images Using Machine Learning and Convolutional Neural Network. Sci. Rep. 2022, 12, 18134.
33. Min, C.; Zhang, M.; Zhang, Q.; Jiang, Z.; Zhou, L. A Two-Stage Adaptive Differential Evolution Algorithm with Accompanying Populations. Mathematics 2025, 13, 440.
34. Wei, M.; Wu, Q.; Ji, H.; Wang, J.; Lyu, T.; Liu, J.; Zhao, L. A Skin Disease Classification Model Based on DenseNet and ConvNext Fusion. Electronics 2023, 12, 438.
35. Sarker, M.M.K.; Moreno-García, C.F.; Ren, J.; Elyan, E. TransSLC: Skin Lesion Classification in Dermatoscopic Images Using Transformers. In Annual Conference on Medical Image Understanding and Analysis; Springer: Cham, Switzerland, 2022; pp. 651–660.
36. Chaturvedi, S.S.; Tembhurne, J.V.; Diwan, T. A Multi-Class Skin Cancer Classification Using Deep Convolutional Neural Networks. Multimed. Tools Appl. 2020, 79, 28477–28498.
37. Heller, N.; Bussmann, E.; Shah, A.; Dean, J.; Papanikolopoulos, N. Computer-Aided Diagnosis of Skin Lesions from Morphological Features. Available online: https://api.semanticscholar.org/CorpusID:52108303 (accessed on 23 February 2025).
38. Haider, K.M.M.; Dhar, M.; Akter, F.; Islam, S.; Shariar, S.R.; Hossain, M.I. An Enhanced CNN Model for Classifying Skin Cancer. In Proceedings of the 2nd International Conference on Computing Advancements, Dhaka, Bangladesh, 10–12 March 2022; pp. 456–459.
39. Khan, M.A.; Javed, M.Y.; Sharif, M.; Saba, T.; Rehman, A. Multi-Model Deep Neural Network-Based Features Extraction and Optimal Selection Approach for Skin Lesion Classification. In Proceedings of the 2019 International Conference on Computer and Information Sciences (ICCIS), Sakaka, Saudi Arabia, 3–4 April 2019; pp. 1–7.
40. Pacal, I.; Alaftekin, M.; Zengul, F.D. Enhancing Skin Cancer Diagnosis Using Swin Transformer with Hybrid Shifted Window-Based Multi-Head Self-Attention and SwiGLU-Based MLP. J. Imaging Inform. Med. 2024, 37, 3174–3192.
41. Saheed, Y.K.; Misra, S. CPS-IoT-PPDNN: A New Explainable Privacy Preserving DNN for Resilient Anomaly Detection in Cyber-Physical Systems-Enabled IoT Networks. Chaos Solitons Fractals 2025, 191, 115939.
42. Laganà, F.; Prattico, D.; De Carlo, D.; Oliva, G.; Pullano, S.A.; Calcagno, S. Engineering Biomedical Problems to Detect Carcinomas: A Tomographic Impedance Approach. Eng 2024, 5, 1594–1614.

| Test Function | BWO | SBWO | APO | GOOSE | RBMO | PKO |
|---|---|---|---|---|---|---|
| F1 | 1.14 × 10^−267 | 0 | 1.10 × 10^−6 | 5.15 × 10^−3 | 2.05 × 10^−4 | 3.30 × 10^−4 |
| F2 | 1.15 × 10^−134 | 0 | 1.50 × 10^−4 | 1.17 × 10^2 | 1.20 × 10^−2 | 1.49 × 10^−2 |
| F3 | 1.97 × 10^−241 | 0 | 2.74 × 10^−2 | 7.88 × 10^3 | 2.21 × 10^2 | 1.47 × 10^3 |
| F4 | 4.05 × 10^−130 | 0 | 1.36 × 10^−1 | 3.77 × 10^1 | 3.67 × 10^0 | 1.01 × 10^0 |
| F9 | 2.20 × 10^−259 | 0 | 1.08 × 10^−7 | 2.26 × 10^−1 | 1.21 × 10^−5 | 6.40 × 10^−8 |
| F10 | 2.80 × 10^−275 | 0 | 2.91 × 10^−16 | 8.53 × 10^−6 | 1.67 × 10^2 | 2.11 × 10^5 |
| F11 | 1.98 × 10^−129 | 0 | 2.02 × 10^−3 | 1.79 × 10^2 | 4.75 × 10^−2 | 3.35 × 10^−2 |
| F12 | 0 | 0 | 4.76 × 10^−24 | 1.77 × 10^−12 | 6.03 × 10^−12 | 3.17 × 10^−11 |
| F24 | 0 | 0 | 5.28 × 10^0 | 1.84 × 10^2 | 2.35 × 10^1 | 2.0019 |

| Test Function | BWO | SBWO | EBWO | NIWBWO | LHNIEBWO | HNIEBWO |
|---|---|---|---|---|---|---|
| F1 | 1.32 × 10^−260 | 0 | 0 | 8.82 × 10^−321 | 0 | 0 |
| F2 | 4.55 × 10^−132 | 0 | 0 | 2.24 × 10^−163 | 0 | 0 |
| F3 | 4.00 × 10^−249 | 0 | 0 | 1.17 × 10^−305 | 0 | 0 |
| F4 | 6.38 × 10^−129 | 0 | 0 | 1.08 × 10^−163 | 0 | 0 |
| F9 | 1.79 × 10^−265 | 0 | 0 | 0 | 0 | 0 |
| F10 | 1.68 × 10^−267 | 0 | 0 | 0 | 0 | 0 |
| F11 | 5.94 × 10^−132 | 0 | 0 | 8.67 × 10^−163 | 0 | 0 |
| F12 | 0 | 0 | 0 | 0 | 0 | 0 |
| F24 | 0 | 0 | 0 | 0 | 0 | 0 |

| Model Name | Year | Accuracy |
|---|---|---|
| ConvNeXt_L [34] | 2022 | 88.40% |
| TransSLC [35] | 2022 | 90.20% |
| An Enhanced CNN Model [38] | 2022 | 90.55% |
| NASNetLarge [36] | 2018 | 91.11% |
| Xception [36] | 2017 | 91.47% |
| InceptionV3 [36] | 2015 | 91.56% |
| DenseNet [37] | 2017 | 88.20% |
| ResNet50, ResNet101 + KcPCA + SVM RBF [39] | 2019 | 89.80% |
| Model Based on DenseNet and ConvNeXt Fusion [34] | 2023 | 90.85% |
| MaxViT | 2021 | 91.02% |
| MSF-MaxViT (Ours) | 2024 | 92.87% |

| Model | Max Accuracy | Last Accuracy |
|---|---|---|
| MaxViT | 91.02% | 90.82% |
| MaxViT + MSFA | 90.07% | 89.88% |
| MaxViT + MSFA + Parallel Attention | 89.03% | 88.98% |
| MaxViT + MSFA (Residual + Dynamic Fusion Weights) + Parallel Attention | 91.37% | 91.17% |
| MaxViT + MSFA (Residual + Edge Information Enhancement + Multiscale Convolution) + Parallel Attention | 91.07% | 90.97% |
| MaxViT + MSFA (SOTA) | 91.42% | 91.32% |
| SOTA + SBWO (Ours) | 92.87% | 91.87% |

| Model Name | Recall | Precision | F1-Score |
|---|---|---|---|
| ConvNeXt_L [34] | 76.47% | 80.04% | 78.24% |
| TransSLC [35] | 85.00% | 87.00% | 85.00% |
| An Enhanced CNN Model [38] | 91.85% | 92.89% | 92.37% |
| NASNetLarge [36] | 86.00% | 86.00% | 86.00% |
| Xception [36] | 88.00% | 89.00% | 88.00% |
| InceptionV3 [36] | 89.00% | 89.00% | 89.00% |
| DenseNet [37] | 84.61% | 78.67% | 81.53% |
| ResNet50, ResNet101 + KcPCA + SVM RBF [39] | 89.71% | 90.14% | 89.92% |
| Model Based on DenseNet and ConvNeXt Fusion [34] | 83.81% | 83.75% | 83.45% |
| MSF-MaxViT (Ours) | 92.00% | 92.00% | 92.00% |
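
For reference, the recall, precision, and F1-score columns above can be derived from a multi-class confusion matrix. The sketch below assumes macro averaging over classes, which the tables do not state, and uses a made-up 3-class matrix rather than HAM10000 results. The function name `macro_metrics` is hypothetical.

```python
def macro_metrics(cm):
    """Accuracy and macro-averaged recall, precision and F1.

    cm[i][j] is the number of samples of true class i predicted as class j.
    Macro averaging (an assumption here) weights every class equally.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[c][c] for c in range(k)) / total
    recalls, precisions, f1s = [], [], []
    for c in range(k):
        tp = cm[c][c]
        actual = sum(cm[c])                          # TP + FN for class c
        predicted = sum(cm[r][c] for r in range(k))  # TP + FP for class c
        rec = tp / actual if actual else 0.0
        prec = tp / predicted if predicted else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        recalls.append(rec)
        precisions.append(prec)
        f1s.append(f1)
    return {
        "accuracy": accuracy,
        "recall": sum(recalls) / k,
        "precision": sum(precisions) / k,
        "f1": sum(f1s) / k,
    }


# Toy confusion matrix (illustrative values only).
toy_cm = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
metrics = macro_metrics(toy_cm)
print({name: round(value, 4) for name, value in metrics.items()})
```

A macro-averaged F1-score is the mean of the per-class F1 values, so it generally differs from the harmonic mean of the macro precision and macro recall.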

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhao, J.; Liu, T.; Sun, L. Multi-Scale Fusion MaxViT for Medical Image Classification with Hyperparameter Optimization Using Super Beluga Whale Optimization. Electronics 2025, 14, 912. https://doi.org/10.3390/electronics14050912