DA OMS-CNN: Dual-Attention OMS-CNN with 3D Swin Transformer for Early-Stage Lung Cancer Detection
Abstract
1. Introduction
- The first contribution of this study is the integration of a dual-attention mechanism into the final layers of the OMS-CNN. Spatial attention emphasizes important regions of the image, while channel attention focuses on relevant feature channels; together they capture both spatial and channel-wise dependencies within the feature maps. By highlighting critical regions and fine-grained details in the input data, the DA OMS-CNN extracts more accurate and robust features and achieves improved sensitivity in detecting small lung nodules.
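The exact layer configuration of the dual-attention module is given later in the paper; as a minimal illustrative sketch only, the NumPy code below shows the general CBAM-style pattern the description suggests. The MLP weights `w1`/`w2` are placeholders, and the spatial branch is simplified to a parameter-free mix of channel-pooled maps rather than the usual learnable k × k convolution:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Re-weight the channels of a (C, H, W) feature map.
    w1: (C//r, C) and w2: (C, C//r) form a small shared 2-layer MLP."""
    avg = x.mean(axis=(1, 2))                      # global average pool, (C,)
    mx = x.max(axis=(1, 2))                        # global max pool, (C,)
    logits = w2 @ np.maximum(0.0, w1 @ avg) + w2 @ np.maximum(0.0, w1 @ mx)
    return x * sigmoid(logits)[:, None, None]

def spatial_attention(x, a=1.0, b=1.0):
    """Re-weight spatial locations; a 1x1 mix of the channel-pooled
    maps stands in for the usual learnable kxk convolution."""
    avg = x.mean(axis=0)                           # (H, W)
    mx = x.max(axis=0)                             # (H, W)
    return x * sigmoid(a * avg + b * mx)[None, :, :]

def dual_attention(x, w1, w2):
    """Channel attention followed by spatial attention (CBAM ordering)."""
    return spatial_attention(channel_attention(x, w1, w2))
```

Applying channel attention before spatial attention follows the ordering commonly reported to work best for modules of this kind; both weightings lie in (0, 1), so the module can only attenuate, never amplify, the input features.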
- The second contribution is the introduction of the dual-attention RoIPooling (DA-RoIPooling) mechanism at the classification stage of the framework. DA-RoIPooling applies spatial and channel-wise attention to the pooled features, enabling the model to focus on the most relevant characteristics within each region of interest (RoI) while suppressing irrelevant background information. By refining the feature representation within the RoIs, DA-RoIPooling improves classification accuracy, particularly in distinguishing true nodules from false positives, and thereby reduces misclassifications in the Faster R-CNN framework while improving sensitivity and precision on challenging cases.
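As a rough illustration of the idea (not the authors' implementation), the sketch below max-pools one RoI of a feature map to a fixed 7 × 7 grid, the standard Fast R-CNN output size, and then re-weights the pooled features channel-wise and spatially. The parameter-free sigmoid weightings stand in for the learned attention modules:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def roi_max_pool(fmap, roi, out_size=7):
    """Max-pool one RoI of a (C, H, W) feature map to (C, out, out).
    roi = (x1, y1, x2, y2) in feature-map coordinates."""
    c = fmap.shape[0]
    x1, y1, x2, y2 = roi
    xs = np.linspace(x1, x2, out_size + 1).astype(int)
    ys = np.linspace(y1, y2, out_size + 1).astype(int)
    out = np.zeros((c, out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            ya, yb = ys[i], max(ys[i + 1], ys[i] + 1)  # never-empty bins
            xa, xb = xs[j], max(xs[j + 1], xs[j] + 1)
            out[:, i, j] = fmap[:, ya:yb, xa:xb].max(axis=(1, 2))
    return out

def da_roi_pool(fmap, roi, out_size=7):
    """RoI pooling followed by channel- and spatial-wise re-weighting."""
    p = roi_max_pool(fmap, roi, out_size)
    ch = sigmoid(p.mean(axis=(1, 2)))              # channel weights, (C,)
    sp = sigmoid(p.mean(axis=0))                   # spatial weights, (out, out)
    return p * ch[:, None, None] * sp[None, :, :]
```

The key design point is that attention is computed per RoI from the pooled features themselves, so each proposal gets its own emphasis map rather than sharing one global weighting.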
- The third contribution involves the utilization of three distinct 3D Swin Transformers for the false-positive reduction stage. This approach leverages the powerful feature representation capabilities of the 3D Swin Transformer, which uses hierarchical feature extraction and self-attention mechanisms across the spatial and depth dimensions. By combining three separate 3D Swin Transformers, the proposed framework processes volumetric data from different perspectives, ensuring a more comprehensive analysis of nodule candidates. This ensemble strategy reduces false positives by capturing subtle variations and dependencies in the 3D CT data, improving the model's ability to differentiate true nodules from irrelevant structures, and it strengthens both the detection accuracy and the robustness of the proposed framework in clinical scenarios.
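The combination rule for the three networks is not spelled out in this summary. A common choice, sketched below under that assumption, is to average the nodule probabilities produced by the individual 3D classifiers (stubbed here as plain callables):

```python
import numpy as np

def ensemble_fp_reduction(candidate, models, threshold=0.5):
    """Average the nodule probabilities predicted by several 3D
    classifiers for one candidate cube of shape (D, H, W); each entry
    of `models` could be a 3D Swin Transformer fed a different crop
    size or view of the same candidate."""
    probs = [float(m(candidate)) for m in models]
    p = float(np.mean(probs))
    return p, p >= threshold   # (ensemble probability, keep-candidate flag)
```

Averaging calibrated probabilities is the simplest ensembling scheme; weighted averaging or majority voting over the three networks would be equally plausible readings of the text.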
2. Related Works
3. Materials and Methods
3.1. Dataset and Preprocessing
3.1.1. LUNA16
3.1.2. PN9
3.1.3. Data Augmentation
3.2. Lung Parenchyma Segmentation
3.3. Lung Nodule Detection
3.3.1. Dual-Attention Optimized Multi-Scale CNN (DA OMS-CNN)
3.3.2. Region Proposal Network (RPN)
- i represents the index of the proposals generated by the region proposal networks.
- j identifies the selected anchor.
- k refers to one of the two region proposal networks.
- p_i is the predicted probability of proposal i being a nodule.
- p_i* is the ground-truth label, where p_i* = 1 if the proposal is positive and p_i* = 0 otherwise.
- t_i and t_i* are the predicted and ground-truth bounding-box regression parameters, respectively.
- L_cls is a binary cross-entropy loss.
- L_reg represents the regression loss.
- λ is a balancing factor between the classification and regression losses.
- N_cls and N_reg are normalization terms for classification and regression, respectively.
- R is the smooth L1 function.
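Reading these definitions back into the standard Faster R-CNN multi-task loss, a minimal sketch looks as follows. Binary cross-entropy serves as L_cls, the smooth-L1 function R is applied inside L_reg, and regression is counted only on positive proposals; the normalization choices here are one common convention, not necessarily the authors':

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Binary cross-entropy L_cls per proposal."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def smooth_l1(d):
    """Elementwise smooth-L1 function R(d) used inside L_reg."""
    d = np.abs(d)
    return np.where(d < 1.0, 0.5 * d * d, d - 0.5)

def rpn_loss(p, y, t, t_star, lam=1.0):
    """Faster R-CNN style multi-task loss.
    p: (N,) predicted nodule probabilities p_i
    y: (N,) ground-truth labels p_i* in {0, 1}
    t, t_star: (N, 4) predicted / ground-truth box parameters t_i, t_i*
    Regression is only accumulated for positive proposals (y == 1)."""
    n_cls = len(p)                       # normalization term N_cls
    n_reg = max(int(y.sum()), 1)         # normalization term N_reg
    l_cls = bce(p, y).sum() / n_cls
    l_reg = (y[:, None] * smooth_l1(t - t_star)).sum() / n_reg
    return l_cls + lam * l_reg           # lam is the balancing factor λ
```

With perfect box predictions the regression term vanishes and only the classification term contributes, which makes the role of λ easy to probe empirically.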
3.3.3. Classification Stage
3.4. False Positive Reduction
3D Swin Transformer
- CT images are represented as volumes of size D × H × W, where D refers to the depth, and H and W denote the image’s height and width, respectively.
- The patch partitioning process in SwinT divides the input into patches, each sized 4 × 4. In contrast, 3D SwinT utilizes 3D cubes of size 4 × 4 × 4, producing (D/4) × (H/4) × (W/4) patches. These patches, with a feature dimension of 64 (the 4 × 4 × 4 = 64 voxels of each cube), are projected into an arbitrary dimension C via a linear embedding layer. Following this, neighboring patches are combined during the patch merging stage, where the spatial and depth resolution decreases progressively by factors of 4, 8, 16, and 32.
- The main distinction between the SwinT and 3D SwinT blocks lies in the multi-head self-attention mechanism. For 3D SwinT, the window-based multi-head self-attention (W-MSA) is extended into a 3D version (3D W-MSA), incorporating the volumetric information. This is achieved using 3D windows sized P × M × M, where P represents the depth dimension, instead of the 2D M × M windows used in SwinT. Additionally, the window shifting mechanism in 3D SwinT introduces shifts of (P/2, M/2, M/2) patches along the depth, height, and width dimensions, enhancing inter-window information interaction.
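The partitioning arithmetic above can be sketched as follows; the window size (2, 7, 7) is an assumption chosen for illustration, not a value taken from the paper:

```python
import numpy as np

def num_patches(d, h, w, cube=4):
    """Number of patch tokens after 4x4x4 patch partitioning; each cube
    contributes cube**3 = 64 raw features before linear embedding to C."""
    return (d // cube) * (h // cube) * (w // cube)

def partition_windows_3d(x, window=(2, 7, 7), shifted=False):
    """Split a (D, H, W, C) token grid into non-overlapping (P, M, M)
    windows for 3D W-MSA; `shifted` applies the cyclic half-window
    shift of (P/2, M/2, M/2) used by the shifted-window step.
    All grid dimensions must be divisible by the window size."""
    p, m, _ = window
    if shifted:
        x = np.roll(x, shift=(-(p // 2), -(m // 2), -(m // 2)), axis=(0, 1, 2))
    d, h, w, c = x.shape
    x = x.reshape(d // p, p, h // m, m, w // m, m, c)
    # Gather the window axes together: (num_windows, P, M, M, C).
    return x.transpose(0, 2, 4, 1, 3, 5, 6).reshape(-1, p, m, m, c)
```

Self-attention is then computed independently within each returned window, which keeps the cost linear in the number of tokens rather than quadratic over the whole volume.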
3.5. Evaluation Metrics
4. Experimental Results and Discussion
4.1. Implementation
4.2. Ablation Study
4.3. Experimental Comparison
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Balyan, A.K.; Ahuja, S.; Lilhore, U.K.; Sharma, S.K.; Manoharan, P.; Algarni, A.D.; Elmannai, H.; Raahemifar, K. A Hybrid Intrusion Detection Model Using EGA-PSO and Improved Random Forest Method. Sensors 2022, 22, 5986. [Google Scholar] [CrossRef]
- Barbouchi, K.; El Hamdi, D.; Elouedi, I.; Ben Aïcha, T.; Echi, A.K.; Slim, I. A transformer-based deep neural network for detection and classification of lung cancer via PET/CT images. Int. J. Imaging Syst. Technol. 2023, 33, 1383–1395. [Google Scholar] [CrossRef]
- Bharati, S.; Mondal, M.R.H.; Podder, P. A Review on Explainable Artificial Intelligence for Healthcare: Why, How, and When? IEEE Trans. Artif. Intell. 2023, 5, 1429–1442. [Google Scholar] [CrossRef]
- World Health Organization. Cancer Today. Iarc.fr. Available online: https://gco.iarc.fr/today/home (accessed on 3 August 2023).
- Dhiman, P.; Kukreja, V.; Manoharan, P.; Kaur, A.; Kamruzzaman, M.M.; Ben Dhaou, I.; Iwendi, C. A Novel Deep Learning Model for Detection of Severity Level of the Disease in Citrus Fruits. Electronics 2022, 11, 495. [Google Scholar] [CrossRef]
- Dai, D.; Sun, Y.; Dong, C.; Yan, Q.; Li, Z.; Xu, S. Effectively fusing clinical knowledge and AI knowledge for reliable lung nodule diagnosis. Expert Syst. Appl. 2023, 230, 120634. [Google Scholar] [CrossRef]
- Zamanidoost, Y.; Ould-Bachir, T.; Martel, S. OMS-CNN: Optimized Multi-Scale CNN for Lung Nodule Detection Based on Faster R-CNN. IEEE J. Biomed. Health Inform. 2024, 29, 2148–2160. [Google Scholar] [CrossRef]
- Jeong, Y.W.; Park, S.M.; Geem, Z.W.; Sim, K.B. Advanced parameter-setting-free harmony search algorithm. Appl. Sci. 2020, 10, 2586. [Google Scholar] [CrossRef]
- Wu, Q.; Ma, Z.; Xu, G.; Li, S.; Chen, D. A novel neural network classifier using beetle antennae search algorithm for pattern classification. IEEE Access 2019, 7, 64686–64696. [Google Scholar] [CrossRef]
- Gao, C.; Wu, L.; Wu, W.; Huang, Y.; Wang, X.; Sun, Z.; Xu, M.; Gao, C. Deep learning in pulmonary nodule detection and segmentation: A systematic review. Eur. Radiol. 2025, 35, 255–266. [Google Scholar] [CrossRef]
- Zamanidoost, Y.; Alami-Chentoufi, N.; Ould-Bachir, T.; Martel, S. Efficient region proposal extraction of small lung nodules using enhanced VGG16 network model. In Proceedings of the 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS), L’Aquila, Italy, 22–24 June 2023; pp. 483–488. [Google Scholar]
- Tan, Y.; Fu, X.; Zhu, J.; Chen, L. A improved detection method for lung nodule based on multi-scale 3D convolutional neural network. Concurr. Comput. Pract. Exp. 2023, 35, e7034. [Google Scholar] [CrossRef]
- Almahasneh, M.; Xie, X.; Paiement, A. AttentNet: Fully Convolutional 3D Attention for Lung Nodule Detection. arXiv 2024, arXiv:2407.14464. [Google Scholar] [CrossRef]
- Wu, R.; Liang, C.; Zhang, J.; Tan, Q.; Huang, H. Multi-kernel driven 3D convolutional neural network for automated detection of lung nodules in chest CT scans. Biomed. Opt. Express 2024, 15, 1195–1218. [Google Scholar] [CrossRef] [PubMed]
- Srivastava, D.; Srivastava, S.K.; Khan, S.B.; Singh, H.R.; Maakar, S.K.; Agarwal, A.K.; Malibari, A.A.; Albalawi, E. Early Detection of Lung Nodules Using a Revolutionized Deep Learning Model. Diagnostics 2023, 13, 3485. [Google Scholar] [CrossRef] [PubMed]
- Ma, L.; Li, G.; Feng, X.; Fan, Q.; Liu, L. TiCNet: Transformer in Convolutional Neural Network for Pulmonary Nodule Detection on CT Images. J. Imaging Inform. Med. 2024, 37, 196–208. [Google Scholar] [CrossRef]
- Sun, R.; Pang, Y.; Li, W. Efficient lung cancer image classification and segmentation algorithm based on an improved swin transformer. Electronics 2023, 12, 1024. [Google Scholar] [CrossRef]
- LUNA16—Grand Challenge. Grand-Challenge.org. Available online: https://luna16.grand-challenge.org/Download/ (accessed on 3 August 2023).
- Mei, J.; Cheng, M.M.; Xu, G.; Wan, L.R.; Zhang, H. SANet: A slice-aware network for pulmonary nodule detection. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 4374–4387. [Google Scholar] [CrossRef]
- Chlap, P.; Min, H.; Vandenberg, N.; Dowling, J.; Holloway, L.; Haworth, A. A review of medical image data augmentation techniques for deep learning applications. J. Med. Imaging Radiat. Oncol. 2021, 65, 545–563. [Google Scholar] [CrossRef]
- Bauckhage, C. Numpy/Scipy Recipes for Image Processing: Binary Images and Morphological Operations; Technical Report; B-IT, University of Bonn, Fraunhofer IAIS: Sankt Augustin, Germany, 2017. [Google Scholar]
- UrRehman, Z.; Qiang, Y.; Wang, L.; Shi, Y.; Yang, Q.; Khattak, S.U.; Aftab, R.; Zhao, J. Effective lung nodule detection using deep CNN with dual attention mechanisms. Sci. Rep. 2024, 14, 3934. [Google Scholar] [CrossRef]
- Zhao, Y.; Wang, Z.; Liu, X.; Chen, Q.; Li, C.; Zhao, H.; Wang, Z. Pulmonary nodule detection based on multiscale feature fusion. Comput. Math. Methods Med. 2022, 2022, 8903037. [Google Scholar] [CrossRef]
- Dou, Q.; Chen, H.; Jin, Y.; Lin, H.; Qin, J.; Heng, P.A. Automated pulmonary nodule detection via 3d convnets with online sample filtering and hybrid-loss residual learning. In Medical Image Computing and Computer Assisted Intervention—MICCAI 2017, Proceedings of the 20th International Conference, Quebec City, QC, Canada, 11–13 September 2017; Proceedings, Part III 20; Springer: Berlin/Heidelberg, Germany, 2017; pp. 630–638. [Google Scholar]
- Gu, Y.; Lu, X.; Yang, L.; Zhang, B.; Yu, D.; Zhao, Y.; Gao, L.; Wu, L.; Zhou, T. Automatic lung nodule detection using a 3D deep convolutional neural network combined with a multi-scale prediction strategy in chest CTs. Comput. Biol. Med. 2018, 103, 220–231. [Google Scholar] [CrossRef]
- Pezeshk, A.; Hamidian, S.; Petrick, N.; Sahiner, B. 3-D convolutional neural networks for automatic detection of pulmonary nodules in chest CT. IEEE J. Biomed. Health Inform. 2018, 23, 2080–2090. [Google Scholar] [CrossRef] [PubMed]
- Xie, H.; Yang, D.; Sun, N.; Chen, Z.; Zhang, Y. Automated pulmonary nodule detection in CT images using deep convolutional neural networks. Pattern Recognit. 2019, 85, 109–119. [Google Scholar] [CrossRef]
- Zuo, W.; Zhou, F.; He, Y. An embedded multi-branch 3D convolution neural network for false positive reduction in lung nodule detection. J. Digit. Imaging 2020, 33, 846–857. [Google Scholar] [CrossRef] [PubMed]
- Sun, L.; Wang, Z.; Pu, H.; Yuan, G.; Guo, L.; Pu, T.; Peng, Z. Attention-embedded complementary-stream CNN for false positive reduction in pulmonary nodule detection. Comput. Biol. Med. 2021, 133, 104357. [Google Scholar] [CrossRef]
- Harsono, I.W.; Liawatimena, S.; Cenggoro, T.W. Lung nodule detection and classification from Thorax CT-scan using RetinaNet with transfer learning. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 567–577. [Google Scholar] [CrossRef]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Computer Vision—ECCV 2016, Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Tang, H.; Zhang, C.; Xie, X. Nodulenet: Decoupled false positive reduction for pulmonary nodule detection and segmentation. In Medical Image Computing and Computer Assisted Intervention—MICCAI 2019, Proceedings of the 22nd International Conference, Shenzhen, China, 13–17 October 2019; Proceedings, Part VI 22; Springer: Berlin/Heidelberg, Germany, 2019; pp. 266–274. [Google Scholar]
Model Configuration | CPM Score | Sensitivity at 1.0 FP/scan |
---|---|---|
OMS-CNN | 0.839 | 0.8521 |
DA OMS-CNN | 0.849 | 0.8967 |
DA OMS-CNN + DA-RoIPooling | 0.860 | 0.9331 |
DA OMS-CNN + DA-RoIPooling + FPR | 0.911 | 0.9601 |
CAD Method | Year | 0.125 FPs/scan | 0.25 FPs/scan | 0.5 FPs/scan | 1.0 FPs/scan | 2.0 FPs/scan | 4.0 FPs/scan | 8.0 FPs/scan | CPM |
---|---|---|---|---|---|---|---|---|---|
Dou et al. [24] | (2017) | 0.6590 | 0.7540 | 0.8190 | 0.8650 | 0.9060 | 0.9330 | 0.9460 | 0.8390 |
Gu et al. [25] | (2018) | 0.4801 | 0.6495 | 0.7920 | 0.8794 | 0.9163 | 0.9293 | 0.9301 | 0.7967 |
Pezeshk et al. [26] | (2018) | 0.6370 | 0.7230 | 0.8040 | 0.8650 | 0.9070 | 0.9380 | 0.9520 | 0.8320 |
Xie et al. [27] | (2019) | 0.4390 | 0.6880 | 0.7960 | 0.8520 | 0.8640 | 0.8640 | 0.8640 | 0.7750 |
OMS-CNN [7] | (2024) | 0.7215 | 0.7357 | 0.7993 | 0.8521 | 0.9162 | 0.9243 | 0.9283 | 0.8396 |
DA OMS-CNN | (Ours) | 0.7285 | 0.7461 | 0.8223 | 0.8967 | 0.9377 | 0.9438 | 0.9458 | 0.8601 |
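The CPM column in these tables is the mean of the seven sensitivity columns, i.e., the average sensitivity over the seven predefined FROC operating points; for example, for the DA OMS-CNN row above:

```python
import numpy as np

def cpm(sensitivities):
    """Competition Performance Metric: the average sensitivity at the
    seven predefined FROC operating points
    (0.125, 0.25, 0.5, 1, 2, 4, and 8 FPs/scan)."""
    s = np.asarray(sensitivities, dtype=float)
    assert s.size == 7, "CPM is defined over exactly seven operating points"
    return float(s.mean())

# DA OMS-CNN row of the table above:
print(round(cpm([0.7285, 0.7461, 0.8223, 0.8967, 0.9377, 0.9438, 0.9458]), 4))
# -> 0.8601
```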
CAD Method | Year | 0.125 FPs/scan | 0.25 FPs/scan | 0.5 FPs/scan | 1.0 FPs/scan | 2.0 FPs/scan | 4.0 FPs/scan | 8.0 FPs/scan | CPM |
---|---|---|---|---|---|---|---|---|---|
Zuo et al. [28] | (2020) | 0.6300 | 0.7530 | 0.8190 | 0.8690 | 0.9030 | 0.9150 | 0.9200 | 0.8300 |
CBAM [29] | (2021) | 0.4670 | 0.6020 | 0.7300 | 0.8120 | 0.8770 | 0.9150 | 0.9310 | 0.7620 |
I3DR-Net [30] | (2022) | 0.6356 | 0.7131 | 0.7984 | 0.8527 | 0.8760 | 0.8992 | 0.9147 | 0.8128 |
MSM-CNN [23] | (2022) | 0.6770 | 0.7410 | 0.8160 | 0.8500 | 0.8900 | 0.9050 | 0.9250 | 0.8290 |
MS-3DCNN [12] | (2023) | 0.7280 | 0.7990 | 0.8600 | 0.8080 | 0.9260 | 0.9410 | 0.9560 | 0.8730 |
AttentNet [13] | (2024) | 0.7520 | 0.8170 | 0.8570 | 0.8850 | 0.9200 | 0.9330 | 0.9330 | 0.8710 |
MK-3DCNN [14] | (2024) | 0.7099 | 0.7723 | 0.8356 | 0.8836 | 0.9174 | 0.9384 | 0.9562 | 0.8591 |
TED [16] | (2024) | 0.7619 | 0.8222 | 0.8736 | 0.9069 | 0.9302 | 0.9443 | 0.9530 | 0.8846 |
OMS-CNN [7] | (2024) | 0.7932 | 0.8421 | 0.8712 | 0.9048 | 0.9387 | 0.9473 | 0.9481 | 0.8922 |
DA OMS-CNN | (Ours) | 0.7973 | 0.8584 | 0.8995 | 0.9331 | 0.9534 | 0.9682 | 0.9689 | 0.9112 |
CAD Method | Year | 0.125 FPs/scan | 0.25 FPs/scan | 0.5 FPs/scan | 1.0 FPs/scan | 2.0 FPs/scan | 4.0 FPs/scan | 8.0 FPs/scan | CPM |
---|---|---|---|---|---|---|---|---|---|
SSD512 [31] | (2016) | 0.0462 | 0.0848 | 0.1476 | 0.2506 | 0.4032 | 0.5727 | 0.7080 | 0.3161 |
RetinaNet [32] | (2017) | 0.0260 | 0.0556 | 0.1095 | 0.1925 | 0.2929 | 0.4049 | 0.5105 | 0.2274 |
NoduleNet [33] | (2019) | 0.2117 | 0.3023 | 0.4038 | 0.5102 | 0.6129 | 0.7070 | 0.7693 | 0.5025 |
SA-Net [19] | (2021) | 0.2672 | 0.3603 | 0.4746 | 0.5699 | 0.6635 | 0.7352 | 0.7832 | 0.5506 |
I3DR-Net [30] | (2022) | 0.1564 | 0.2313 | 0.3700 | 0.5154 | 0.6454 | 0.7291 | 0.7753 | 0.4890 |
OMS-CNN [7] | (2024) | 0.2865 | 0.3841 | 0.4775 | 0.5907 | 0.6974 | 0.7853 | 0.8432 | 0.5807 |
DA OMS-CNN | (Ours) | 0.3015 | 0.3952 | 0.4978 | 0.6221 | 0.7205 | 0.8241 | 0.8629 | 0.6034 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zamanidoost, Y.; Rivron, M.; Ould-Bachir, T.; Martel, S. DA OMS-CNN: Dual-Attention OMS-CNN with 3D Swin Transformer for Early-Stage Lung Cancer Detection. Informatics 2025, 12, 65. https://doi.org/10.3390/informatics12030065