Article

A Hybrid Learnable Fusion of ConvNeXt and Swin Transformer for Optimized Image Classification

by Jaber Qezelbash-Chamak 1,* and Karen Hicklin 1,2
1 Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611, USA
2 Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
* Author to whom correspondence should be addressed.
IoT 2025, 6(2), 30; https://doi.org/10.3390/iot6020030 (registering DOI)
Submission received: 25 March 2025 / Revised: 3 May 2025 / Accepted: 13 May 2025 / Published: 16 May 2025

Abstract

Medical image classification often relies on CNNs to capture local details (e.g., lesions, nodules) or on transformers to model long-range dependencies. However, each paradigm alone is limited in addressing both fine-grained structures and broader anatomical context. We propose ConvTransGFusion, a hybrid model that fuses ConvNeXt (for refined convolutional features) and Swin Transformer (for hierarchical global attention) using a learnable dual-attention gating mechanism. By aligning spatial dimensions, scaling each branch adaptively, and applying both channel and spatial attention, the proposed architecture bridges local and global representations, melding fine-grained lesion details with the broader anatomical context essential for accurate diagnosis. Tested on four diverse medical imaging datasets (including X-ray, ultrasound, and MRI scans), the proposed model consistently achieves superior accuracy, precision, recall, F1, and AUC over state-of-the-art CNNs and transformers. Our findings highlight the benefits of combining convolutional inductive biases and transformer-based global context in a single learnable framework, positioning ConvTransGFusion as a robust and versatile solution for real-world clinical applications.
Keywords: machine learning; deep learning; ConvNeXt; Swin Transformer; feature fusion; biomedical informatics
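
The abstract names three fusion steps: aligning spatial dimensions, scaling each branch adaptively, and applying channel and spatial attention. The following is a minimal, parameter-light sketch of that general idea in plain Python, not the authors' implementation. All function names (`resize_nearest`, `gated_fusion`) and the gate parameters `a_logit` and `b_logit` are hypothetical, both branches are assumed to have the same number of channels, and the attention weights are computed from simple averages rather than from learned layers as a trained model would use.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of a [C][H][W] nested-list feature map."""
    in_h, in_w = len(fmap[0]), len(fmap[0][0])
    return [[[ch[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
             for i in range(out_h)]
            for ch in fmap]

def gated_fusion(conv_feat, swin_feat, a_logit=0.0, b_logit=0.0):
    """Illustrative fusion of two [C][H][W] feature maps (hypothetical sketch).

    a_logit and b_logit stand in for learnable scalars; in a trained model
    they would be updated by gradient descent.
    """
    h, w = len(conv_feat[0]), len(conv_feat[0][0])
    n = len(conv_feat)

    # Step 1: align the transformer branch to the convolutional branch's grid.
    swin_feat = resize_nearest(swin_feat, h, w)

    # Step 2: scale each branch by its own gate in (0, 1) and sum.
    a, b = sigmoid(a_logit), sigmoid(b_logit)
    fused = [[[a * conv_feat[k][i][j] + b * swin_feat[k][i][j]
               for j in range(w)] for i in range(h)] for k in range(n)]

    # Step 3a: channel attention from each channel's global average.
    ch_w = [sigmoid(sum(map(sum, ch)) / (h * w)) for ch in fused]
    fused = [[[v * ch_w[k] for v in row] for row in fused[k]] for k in range(n)]

    # Step 3b: spatial attention from the cross-channel mean at each position.
    sp_w = [[sigmoid(sum(fused[k][i][j] for k in range(n)) / n)
             for j in range(w)] for i in range(h)]
    return [[[fused[k][i][j] * sp_w[i][j] for j in range(w)]
             for i in range(h)] for k in range(n)]

# Usage: a 1-channel 2x2 "CNN" map fused with a 1-channel 1x1 "transformer" map.
conv = [[[1.0, 1.0], [1.0, 1.0]]]
swin = [[[1.0]]]
out = gated_fusion(conv, swin)
```

With both gate logits at zero, each branch contributes with weight 0.5; raising one logit shifts the fused map toward that branch before the attention steps reweight channels and positions.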

