Daisy-Net: Dual-Attention and Inter-Scale-Aware Yield Network for Lung Nodule Object Detection
Abstract
1. Introduction
1.1. Related Work
1.1.1. Traditional Detection Methods
1.1.2. Deep Learning-Based Two-Dimensional Methods
1.1.3. Advances in Attention Mechanisms
1.1.4. Three-Dimensional Deep Learning Approaches
1.1.5. Motivation for Daisy-Net
2. Materials and Methods
2.1. LUNA16 Dataset
2.2. Preprocessing
- Coordinate transformation and metadata preservation. All CT images were resampled to a standardized in-plane size of 512 × 512 pixels per slice, with Z denoting the number of slices. The in-plane size of 512 × 512 was retained because it matches the common matrix size of chest CT slices in LUNA16 and helps preserve fine anatomical boundaries while keeping the network input format uniform across scans. This standardization reduces implementation variability in batch training and avoids introducing an additional resize target that could blur tiny nodules. Both image and annotation coordinates were converted to a relative reference system, and the origin and pixel spacing were stored so that predictions can be mapped back to the original space.
- Slice selection based on nodule location. To reduce redundancy and balance the data distribution, we selected a range of slices centered on each nodule, determined by its diameter and z-axis position, so that training concentrates on slices that contain nodule tissue.
- Mask generation. For each selected slice, a corresponding binary nodule mask was generated. If the slice contains part of a nodule, the region corresponding to its projection is marked as 1; other pixels remain 0. These masks provide fine supervision signals for training.
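
The three preprocessing steps above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the scan is assumed to be axis-aligned (so that origin and spacing alone define the mapping), slices are chosen as those within half a diameter of the nodule center along z, and the nodule footprint on a slice is assumed to be a disc with the annotated diameter.

```python
import math

def world_to_voxel(world_xyz, origin_xyz, spacing_xyz):
    """Map a world-space point (mm) to fractional voxel indices (x, y, z)."""
    return tuple((w - o) / s for w, o, s in zip(world_xyz, origin_xyz, spacing_xyz))

def voxel_to_world(voxel_xyz, origin_xyz, spacing_xyz):
    """Inverse mapping, used to report predictions in the original space."""
    return tuple(v * s + o for v, o, s in zip(voxel_xyz, origin_xyz, spacing_xyz))

def nodule_slice_range(center_z, diameter_mm, spacing_z, num_slices):
    """Inclusive range of slice indices within half a diameter of the center."""
    half_extent = (diameter_mm / 2.0) / spacing_z
    first = max(0, math.ceil(center_z - half_extent))
    last = min(num_slices - 1, math.floor(center_z + half_extent))
    return first, last

def disc_mask(height, width, center_xy, diameter_mm, spacing_xy):
    """Binary mask (list of rows): 1 inside the assumed nodule disc, else 0."""
    cx, cy = center_xy
    sx, sy = spacing_xy
    radius_sq = (diameter_mm / 2.0) ** 2
    return [[1 if ((x - cx) * sx) ** 2 + ((y - cy) * sy) ** 2 <= radius_sq else 0
             for x in range(width)]
            for y in range(height)]
```

Storing `origin_xyz` and `spacing_xyz` alongside each scan is what makes `voxel_to_world` possible at inference time, which corresponds to the metadata preservation described in the first step.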
2.3. Proposed Method
2.3.1. Network Architecture Overview
2.3.2. Omni-Domain Multistage Fusion Module
- Fuse and using CRC and CSP to produce ;
- Fuse and to generate ;
- Combine , , and to compute ;
- Fuse with to obtain ;
- Concatenate , , and along the channel dimension and pass through a convolution to yield the final output .
2.3.3. Parallelized Patch and Spatial Context Aware Module
- Multi-branch patch preparation: The input X is first passed through a channel adjustment layer to produce an intermediate tensor. This tensor is then fed into two parallel patch-aware branches that extract features at two different patch scales.
- Patch processing in each branch: Within each branch, patches are extracted via the Unfold operation, followed by channel-wise average pooling to obtain a pooled descriptor for each patch. A feed-forward network (FFN) computes patch-wise attention weights, which are normalized via Softmax. The weighted feature map is then reassembled and upsampled to the original resolution, yielding the branch output.
- Residual fusion and enhancement: The outputs from both branches are combined with the channel-adjusted tensor through residual connections to form the enhanced feature map, which is passed into the subsequent attention module.
- Spatial context enhancement: The attention module, implemented as SCAM, captures spatial context information across the entire image to reinforce feature representation.
- Final output: A final channel adjustment layer transforms the SCAM-refined feature map into the module output.
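
The patch-aware part of this pipeline can be sketched in plain Python under several stated assumptions. The sketch is illustrative only and is not the authors' PPSCA code: patches are non-overlapping, the FFN is reduced to two small linear layers with a ReLU that output one scalar score per patch, Softmax is taken across patches, and the channel adjustment layers and SCAM are omitted. All function names and the weight arguments `w1` and `w2` are hypothetical.

```python
import math

def channel_mean_patches(x, p):
    """Split a C x H x W map (nested lists) into non-overlapping p x p patches
    and average over channels, giving one length-p*p descriptor per patch."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    descriptors = []
    for top in range(0, height, p):
        for left in range(0, width, p):
            descriptors.append([
                sum(x[c][top + dy][left + dx] for c in range(channels)) / channels
                for dy in range(p) for dx in range(p)
            ])
    return descriptors

def ffn_score(descriptor, w1, w2):
    """Two-layer feed-forward network with ReLU, reduced to a scalar score."""
    hidden = [max(0.0, sum(w * d for w, d in zip(row, descriptor))) for row in w1]
    return sum(w * h for w, h in zip(w2, hidden))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def patch_branch(x, p, w1, w2):
    """One patch-aware branch: score patches, softmax, re-weight the features."""
    height, width = len(x[0]), len(x[0][0])
    cols = width // p
    weights = softmax([ffn_score(d, w1, w2) for d in channel_mean_patches(x, p)])
    # Indexing with (i // p, j // p) acts as nearest-neighbour upsampling of the
    # patch-level weight map back to the H x W resolution.
    return [[[x[c][i][j] * weights[(i // p) * cols + (j // p)]
              for j in range(width)] for i in range(height)]
            for c in range(len(x))]

def fuse_branches(x, branches):
    """Residual fusion: the input plus the output of every patch branch."""
    out = [[row[:] for row in channel] for channel in x]
    for p, w1, w2 in branches:
        branch = patch_branch(x, p, w1, w2)
        for c, channel in enumerate(branch):
            for i, row in enumerate(channel):
                for j, value in enumerate(row):
                    out[c][i][j] += value
    return out
```

The sketch requires H and W to be divisible by each patch size. A call such as `fuse_branches(x, [(2, w1_a, w2_a), (4, w1_b, w2_b)])` runs two branches at patch sizes 2 and 4; these sizes are placeholders, since the scales used in the paper are not recoverable from the text above.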
2.4. Evaluation Metrics and Model Training
3. Results
3.1. Comparison with Other Models
3.2. Ablation Study of Daisy-Net Components
3.3. Impact of Normalization on Nodule Detection Accuracy
3.4. Threshold Analysis
4. Discussion
4.1. Added Value of the Study
- We design two dedicated modules to address the challenge of inadequate feature extraction for small objects and interference from complex backgrounds: the Parallelized Patch and Spatial Context Aware (PPSCA) module and the Omni-domain Multistage Fusion (OMF) module. PPSCA improves the extraction of spatial and channel features of tiny nodules, while OMF ensures effective integration of multi-scale features to improve nodule–background discrimination.
- Extensive experiments on the publicly available LUNA16 dataset validate the performance of Daisy-Net. The proposed method achieves the highest IoU (81.41%), Dice (89.75%), and sensitivity (84.78%) among the compared methods, indicating its potential for computer-aided analysis.
- We formulate pulmonary nodule detection as a 2D image segmentation task and propose Daisy-Net, a network that integrates dual attention mechanisms and multi-scale feature awareness for efficient and accurate detection of pulmonary nodules.
4.2. Comparison with State-of-the-Art
4.3. Innovations and Limitations
4.4. Clinical Relevance and Future Directions
- Integrate 3D convolutional architectures to better exploit volumetric information and spatial continuity.
- Evaluate the model on multi-center datasets to ensure robustness and generalization across patient populations.
- Extend the framework to other small-lesion detection tasks such as liver metastases, cerebral microbleeds, and retinal abnormalities.
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| ACM | Asymmetric Contextual Modulation |
| ANM | Adaptive Nodule Modeling |
| AUC | Area Under the Curve |
| BCE | Binary Cross-Entropy |
| CAD | Computer-Aided Diagnosis |
| CNN | Convolutional Neural Network |
| CRC | Channel Re-weighting Convolution |
| CSP | Cross Stage Partial |
| CT | Computed Tomography |
| FFN | Feed-Forward Network |
| FN | False Negative |
| FP | False Positive |
| GAP | Global Average Pooling |
| GMP | Global Max Pooling |
| HU | Hounsfield Unit |
| IoU | Intersection over Union |
| LIDC/IDRI | Lung Image Database Consortium and Image Database Resource Initiative |
| LUNA16 | LUng Nodule Analysis 2016 |
| MIP | Maximum Intensity Projection |
| OMF | Omni-domain Multistage Fusion |
| PPA | Patch Perception Attention |
| PPSCA | Parallelized Patch and Spatial Context Aware |
| ROC | Receiver Operating Characteristic |
| SCAM | Spatial Context Aware Module |
| Sens | Sensitivity |
| SGD | Stochastic Gradient Descent |
| Spec | Specificity |
| SVM | Support Vector Machine |
| TN | True Negative |
| TP | True Positive |
| U-Net | U-shaped Convolutional Network |







| Item | Setting |
|---|---|
| Input resolution | |
| Optimizer | SGD |
| Learning rate | |
| Momentum | 0.9 |
| Weight decay | |
| Batch size | 6 |
| Training epochs | 800 |
| Dropout | 0.1 |
| Binarization threshold | 0.3 |
| Loss | BCE + with multi-scale weighting |
| GPU configuration | NVIDIA Tesla V100-SXM2-32GB |
| Parameter count | ≈20.5 M |

| Model | IoU (%) | Dice (%) | Precision (%) | Sensitivity (%) | Specificity (%) | Accuracy (%) | Total Loss | Runtime (h) |
|---|---|---|---|---|---|---|---|---|
| Baseline (HCF-Net) | 61.1309 (56.7180–65.4509) | 75.8773 (72.3823–79.1182) | 95.0295 (92.3706–97.1994) | 63.1501 (58.6862–67.4493) | 99.9979 (99.9968–99.9989) | 99.9750 (99.9718–99.9781) | 0.9846 | 7.77 |
| U-Net | 52.2023 (47.3823–56.8784) | 68.5959 (64.2985–72.5127) | 97.3866 (94.6778–99.2788) | 52.9440 (48.1230–57.6247) | 99.9991 (99.9982–99.9998) | 99.9699 (99.9663–99.9732) | 0.6252 | 5.95 |
| ACM | 56.3188 (52.1214–60.3663) | 72.0563 (68.5261–75.2855) | 96.0852 (93.6192–97.7764) | 57.6414 (53.4276–61.6748) | 99.9985 (99.9976–99.9992) | 99.9722 (99.9692–99.9751) | 0.6583 | 1.97 |
| UIUNet | 62.8487 (59.2687–66.1993) | 77.1866 (74.4261–79.6625) | 85.1420 (82.2017–87.7702) | 70.5908 (67.1808–73.7182) | 99.9923 (99.9907–99.9938) | 99.9741 (99.9712–99.9766) | 0.8248 | 8.12 |
| HintUNet | 43.1266 (38.9726–47.1955) | 60.2636 (56.0867–64.1263) | 82.6198 (79.3638–85.4477) | 47.4296 (43.0273–51.7941) | 99.9938 (99.9926–99.9948) | 99.9611 (99.9574–99.9647) | 0.7251 | 10.21 |
| HintHCFNet | 78.5465 (75.0126–81.6417) | 87.9844 (85.7225–89.8931) | 95.4145 (93.4688–96.8643) | 81.6278 (78.3335–84.4658) | 99.9976 (99.9965–99.9983) | 99.9861 (99.9838–99.9882) | 0.2965 | 15.98 |
| Daisy-Net (Ours) | 81.4134 (78.3253–84.2531) | 89.7545 (87.8454–91.4537) | 95.3436 (93.4224–96.7934) | 84.7845 (81.9568–87.3162) | 99.9974 (99.9963–99.9982) | 99.9880 (99.9859–99.9898) | 0.5001 | 11.30 |
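
The pixel-level metrics reported in the table above can be computed from confusion counts, as in the following minimal sketch. It is an illustration rather than the authors' evaluation code: the function names are hypothetical, the default threshold of 0.3 is taken from the training configuration table, and the use of a strict `>` comparison and of counts pooled over all pixels are assumptions.

```python
def binarize(prob_map, threshold=0.3):
    """Threshold a 2D probability map (list of rows) into a binary mask."""
    return [[1 if p > threshold else 0 for p in row] for row in prob_map]

def confusion_counts(pred, target):
    """Count TP, FP, FN, TN between two binary masks of equal shape."""
    tp = fp = fn = tn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def _ratio(numerator, denominator):
    """Safe division that returns 0.0 for an empty denominator."""
    return numerator / denominator if denominator else 0.0

def segmentation_metrics(tp, fp, fn, tn):
    """IoU, Dice, precision, sensitivity, specificity, accuracy as fractions."""
    return {
        "iou": _ratio(tp, tp + fp + fn),
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),
        "precision": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
    }

def dice_from_iou(iou):
    """Identity that holds when both scores share the same counts."""
    return 2.0 * iou / (1.0 + iou)
```

When Dice and IoU are computed from the same pooled counts, they are linked by Dice = 2·IoU / (1 + IoU). For example, `dice_from_iou(0.814134)` returns approximately 0.8975, in line with the Dice value reported for Daisy-Net in the table above. Because background pixels vastly outnumber nodule pixels, specificity and accuracy stay close to 100% for every model, so IoU, Dice, and sensitivity are the more discriminative columns.
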
| HCF-Net | PPSCA | OMF | LUNA16 Results | |||
|---|---|---|---|---|---|---|
| IoU (%) | Dice (%) | Sens (%) | Spec (%) | |||
| ✓ | 55.08 (52.99–57.00) | 60.49 (59.21–61.64) | 47.99 (46.39–49.43) | 99.9991 (99.9980–99.9999) | ||
| ✓ | ✓ | 55.51 (53.41–57.45) | 65.80 (64.40–67.05) | 51.17 (49.47–52.70) | 99.9818 (99.9807–99.9826) | |
| ✓ | ✓ | ✓ | 81.41 (78.33–84.25) | 89.75 (87.85–91.45) | 84.78 (81.96–87.32) | 99.9974 (99.9963–99.9982) |

| Setting | IoU (%) | Dice (%) | Sens (%) | Spec (%) | Total Loss |
|---|---|---|---|---|---|
| Without data normalization | 81.41 (78.33–84.25) | 89.75 (87.85–91.45) | 84.78 (81.96–87.32) | 99.9974 (99.9963–99.9982) | 0.5001 |
| With data normalization | 69.65 (67.01–72.08) | 76.91 (75.28–78.37) | 80.88 (78.19–83.30) | 99.9938 (99.9927–99.9946) | 0.3647 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zhu, Z.; Zhao, Y.; Zhao, X.; Ying, Y.; Gu, H.; Song, G.; Wang, Q. Daisy-Net: Dual-Attention and Inter-Scale-Aware Yield Network for Lung Nodule Object Detection. Mathematics 2026, 14, 1350. https://doi.org/10.3390/math14081350

