LMeRAN: Label Masking-Enhanced Residual Attention Network for Multi-Label Chest X-Ray Disease Aided Diagnosis
Abstract
1. Introduction
- To the best of our knowledge, this work is the first to incorporate a label mask training strategy into CXR image classification, enabling the model to effectively capture inter-disease correlations and thereby improve both the precision and robustness of predictions (a minimal sketch of this strategy follows this list).
- We design a novel label-specific residual attention mechanism that simultaneously emphasizes disease-relevant features and retains crucial global image context. Furthermore, we provide visual explanations by highlighting image regions most influential in the model’s decisions, enhancing interpretability and transparency.
- We perform comprehensive evaluations on the publicly available ChestX-ray14 dataset to demonstrate the superiority of LMeRAN and assess the individual contributions of its components through ablation studies.
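To make the label mask training idea concrete, below is a minimal PyTorch-style sketch, assuming a model whose forward pass accepts image features together with per-label state codes (unknown/known-negative/known-positive). The function name `label_mask_training_step`, the state encoding, and `logits_fn` are illustrative assumptions rather than the paper's exact interface; the default mask ratio of 0.25 is also illustrative (the sensitivity analysis in Section 4.5.3 peaks at R = 0.25, which we read as the masking ratio, though that reading is an assumption).

```python
import torch
import torch.nn.functional as F

def label_mask_training_step(logits_fn, image_feats, labels, mask_ratio=0.25):
    """One label mask training (LMT) step: hide a random subset of
    ground-truth label states and train the model to recover them from
    the image and from the remaining, visible labels."""
    batch, num_labels = labels.shape
    # Randomly select the labels to mask for each sample.
    masked = torch.rand(batch, num_labels, device=labels.device) < mask_ratio
    # State codes: 1 = known-negative, 2 = known-positive, 0 = masked/unknown.
    states = labels.long() + 1
    states[masked] = 0
    # The model predicts every label from image features plus the visible
    # label states, so it can exploit inter-disease correlations.
    logits = logits_fn(image_feats, states)
    # Binary cross-entropy only on the masked positions.
    loss = F.binary_cross_entropy_with_logits(
        logits[masked], labels[masked].float())
    return loss
```

Here `logits_fn` stands in for LMeRAN's forward pass; in the full model the visible label states are embedded (Section 3.2) and fused with image features through the label-specific residual attention of Section 3.3.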
2. Related Work
2.1. Deep Learning-Based CXR Image Classification
2.2. Attention in CXR Image Classification
2.3. Label Dependency in Multi-Label Image Classification
3. Proposed Method
3.1. Image Feature Extraction
3.2. Embedding Disease Label and State
3.3. Label-Specific Residual Attention
3.4. Label Mask Training Loss
4. Experiments
4.1. Dataset
4.2. Evaluation Metrics
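Results in Section 4.5 are reported as per-disease AUC together with their unweighted mean (mAUC). A minimal sketch of this computation with scikit-learn, assuming `y_true` and `y_score` are N × 14 arrays of binary ground truth and sigmoid outputs:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def per_class_auc(y_true: np.ndarray, y_score: np.ndarray):
    """Per-disease ROC-AUC and their unweighted mean (mAUC)."""
    aucs = np.array([roc_auc_score(y_true[:, c], y_score[:, c])
                     for c in range(y_true.shape[1])])
    return aucs, aucs.mean()
```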
4.3. Experimental Setting
4.4. Baselines
- Wang et al. [16] use a pre-trained CNN as a feature extractor and focus on training only the transition and classification layers in the model for weakly supervised classification and localization of common thorax diseases.
- Yao et al. [21] employ a multi-resolution analysis approach combined with weakly supervised learning. The model integrates features extracted at multiple resolutions to improve the accuracy of medical diagnosis and localization tasks.
- Ma et al. [26] design a multi-attention network for thoracic disease classification and localization, utilizing multiple attention mechanisms to better capture relevant features. The model enhances both classification accuracy and localization precision by focusing on critical areas in the images.
- Peng et al. [32] propose the Conformer model, which merges the local feature extraction capabilities of CNNs with the global representation power of vision transformers. The model effectively captures both local details and long-distance feature dependencies by combining convolution operations with self-attention mechanisms, enhancing representation learning.
- Wu et al. [33] introduce CTransCNN, a model that combines CNNs and Transformers for multi-label medical image classification. It includes a multi-head attention feature module, a multi-branch residual module, and an information interaction module, which together improve label correlation exploration, model optimization, and feature transmission.
4.5. Results and Analysis
4.5.1. Performance Comparison with Baselines
4.5.2. Ablation Experiment
- LMeRAN (w/o LMT): LMeRAN model with the label mask training excluded.
- LMeRAN (w/o LSRA): LMeRAN model with the label-specific residual attention excluded.
- LMeRAN (w/o LMT + LSRA): LMeRAN model with both the label mask training and label-specific residual attention excluded.
- LMeRAN (Complete): The complete LMeRAN model, including all components.
- LMeRAN (w/o LMT + LSRA): When both components are excluded, the mAUC is 0.792, serving as the baseline performance.
- LMeRAN (w/o LMT): Including only the LSRA component raises the mAUC to 0.813. This result indicates that LSRA effectively sharpens the model’s focus on discriminative image regions, thereby improving feature representations of disease-relevant areas.
- LMeRAN (w/o LSRA): When only the LMT component is included, the mAUC increases to 0.804. This enhancement suggests that LMT effectively captures the interdependencies among disease labels, allowing the model to leverage label correlations during training to refine its predictions.
- LMeRAN (Complete): When both LMT and LSRA components are included, the mAUC reaches its highest value of 0.825. This confirms that the joint contribution of both components leads to superior classification performance, effectively combining label dependency modeling and enhanced image feature representation to maximize predictive accuracy.
Model | LMT | LSRA | mAUC
---|---|---|---
LMeRAN (w/o LMT + LSRA) | × | × | 0.792
LMeRAN (w/o LMT) | × | √ | 0.813
LMeRAN (w/o LSRA) | √ | × | 0.804
LMeRAN (Complete) | √ | √ | 0.825
4.5.3. Parameter Sensitivity Experiment
4.5.4. Interpretability Analysis
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Weiss, J.; Raghu, V.K.; Bontempi, D.; Christiani, D.C.; Mak, R.H.; Lu, M.T.; Aerts, H.J. Deep Learning to Estimate Lung Disease Mortality from Chest Radiographs. Nat. Commun. 2023, 14, 2797.
- Wei, Y.J.; Pan, N.; Chen, Y.; Lv, P.J.; Gao, J.B. A Study Using Deep Learning-Based Computer-Aided Diagnostic System with Chest Radiographs-Pneumothorax and Pulmonary Nodules Detection. J. Clin. Radiol. 2021, 40, 252–257.
- Guo, H.; Li, M.Y.; Zhu, P.Z.; Wang, X.M.; Zhou, X. An Early Screening System for Lung Lesions in Chest X-Ray Images Based on AI Algorithms. Imaging Sci. Photochem. 2025, 43, 134–144.
- Çallı, E.; Sogancioglu, E.; van Ginneken, B.; van Leeuwen, K.G.; Murphy, K. Deep Learning for Chest X-Ray Analysis: A Survey. Med. Image Anal. 2021, 72, 102125.
- Chen, B.; Zhang, Z.; Lin, J.; Chen, Y.; Lu, G. Two-Stream Collaborative Network for Multi-Label Chest X-Ray Image Classification with Lung Segmentation. Pattern Recognit. Lett. 2020, 135, 221–227.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
- Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
- Guan, Q.; Huang, Y.; Luo, Y.; Liu, P.; Xu, M.; Yang, Y. Discriminative Feature Learning for Thorax Disease Classification in Chest X-Ray Images. IEEE Trans. Image Process. 2021, 30, 2476–2487.
- Sanida, T.; Dasygenis, M. A Novel Lightweight CNN for Chest X-Ray-Based Lung Disease Identification on Heterogeneous Embedded System. Appl. Intell. 2024, 54, 4756–4780.
- Saednia, K.; Jalalifar, A.; Ebrahimi, S.; Sadeghi-Naini, A. An Attention-Guided Deep Neural Network for Annotating Abnormalities in Chest X-Ray Images: Visualization of Network Decision Basis. In Proceedings of the 42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Montreal, QC, Canada (Online), 20–24 July 2020; pp. 1258–1261.
- Guan, Q.; Huang, Y.; Zhong, Z.; Zheng, Z.; Zheng, L.; Yang, Y. Thorax Disease Classification with Attention Guided Convolutional Neural Network. Pattern Recognit. Lett. 2020, 131, 38–45.
- Jiang, X.; Zhu, Y.; Cai, G.; Zheng, B.; Yang, D. MXT: A New Variant of Pyramid Vision Transformer for Multi-Label Chest X-Ray Image Classification. Cogn. Comput. 2022, 14, 1362–1377.
- Khater, O.H.; Shuaib, A.S.; Haq, S.U.; Siddiqui, A.J. AttCDCNet: Attention-Enhanced Chest Disease Classification Using X-Ray Images. In Proceedings of the 22nd IEEE International Multi-Conference on Systems, Signals & Devices, Monastir, Tunisia, 17–20 February 2025; pp. 891–896.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
- Lanchantin, J.; Wang, T.; Ordonez, V.; Qi, Y. General Multi-Label Image Classification with Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 16478–16488.
- Wang, X.; Peng, Y.; Lu, L.; Lu, Z.; Bagheri, M.; Summers, R.M. ChestX-Ray8: Hospital-Scale Chest X-Ray Database and Benchmarks on Weakly Supervised Classification and Localization of Common Thorax Diseases. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2097–2106.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
- Seibold, C.; Reiß, S.; Sarfraz, M.S.; Stiefelhagen, R.; Kleesiek, J. Breaking with Fixed-Set Pathology Recognition through Report-Guided Contrastive Training. In Proceedings of the 25th International Conference on Medical Image Computing and Computer-Assisted Intervention, Singapore, 18–22 September 2022; pp. 690–700.
- Yao, L.; Prosky, J.; Poblenz, E.; Covington, B.; Lyman, K. Weakly Supervised Medical Diagnosis and Localization from Multiple Resolutions. arXiv 2018, arXiv:1803.07703.
- Yang, M.; Tanaka, H.; Ishida, T. Performance Improvement in Multi-Label Thoracic Abnormality Classification of Chest X-Rays with Noisy Labels. Int. J. Comput. Assist. Radiol. Surg. 2023, 18, 181–189.
- Chen, Y.; Wan, Y.; Pan, F. Enhancing Multi-Disease Diagnosis of Chest X-Rays with Advanced Deep-Learning Networks in Real-World Data. J. Digit. Imaging 2023, 36, 1332–1347.
- Ishwerlal, R.D.; Agarwal, R.; Sujatha, K.S. Lung Disease Classification Using Chest X-Ray Image: An Optimal Ensemble of Classification with Hybrid Training. Biomed. Signal Process. Control 2024, 91, 105941.
- Öztürk, Ş.; Turalı, M.Y.; Çukur, T. HydraViT: Adaptive Multi-Branch Transformer for Multi-Label Disease Classification from Chest X-Ray Images. Biomed. Signal Process. Control 2025, 100, 106959.
- Ma, Y.; Zhou, Q.; Chen, X.; Lu, H.; Zhao, Y. Multi-Attention Network for Thoracic Disease Classification and Localization. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, UK, 12–17 May 2019; pp. 1378–1382.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
- Guan, Q.; Huang, Y. Multi-Label Chest X-Ray Image Classification via Category-Wise Residual Attention Learning. Pattern Recognit. Lett. 2020, 130, 259–266.
- Wang, H.; Wang, S.; Qin, Z.; Zhang, Y.; Li, R.; Xia, Y. Triple Attention Learning for Classification of 14 Thoracic Diseases Using Chest Radiography. Med. Image Anal. 2021, 67, 101846.
- Taslimi, S.; Taslimi, S.; Fathi, N.; Salehi, M.; Rohban, M.H. SwinCheX: Multi-Label Classification on Chest X-Ray Images with Transformers. arXiv 2022, arXiv:2206.04246.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
- Peng, Z.; Guo, Z.; Huang, W.; Wang, Y.; Xie, L.; Jiao, J. Conformer: Local Features Coupling Global Representations for Recognition and Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 9454–9468.
- Wu, X.; Feng, Y.; Xu, H.; Lin, Z.; Chen, T.; Li, S.; Qiu, S.; Liu, Q.; Ma, Y.; Zhang, S. CTransCNN: Combining Transformer and CNN in Multilabel Medical Image Classification. Knowl.-Based Syst. 2023, 281, 111030.
- Song, L.; Liu, J.; Qian, B.; Sun, M.; Yang, K.; Sun, M. A Deep Multi-Modal CNN for Multi-Instance Multi-Label Image Classification. IEEE Trans. Image Process. 2018, 27, 6025–6038.
- Wang, J.; Yang, Y.; Mao, J.; Huang, Z.; Huang, C.; Xu, W. CNN-RNN: A Unified Framework for Multi-Label Image Classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2285–2294.
- Lee, Y.W.; Huang, S.K.; Chang, R.F. CheXGAT: A Disease Correlation-Aware Network for Thorax Disease Diagnosis from Chest X-Ray Images. Artif. Intell. Med. 2022, 132, 102382.
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning Transferable Visual Models from Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning, Online, 18–24 July 2021; pp. 8748–8763.
- Ali, M.; Khan, S. CLIP-Decoder: Zeroshot Multilabel Classification Using Multimodal CLIP Aligned Representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 4675–4679.
- Wang, A.; Chen, H.; Lin, Z.; Ding, Z.; Liu, P.; Bao, Y.; Yan, W.; Ding, G. Hierarchical Prompt Learning Using CLIP for Multi-Label Classification with Single Positive Labels. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 5594–5604.
Label | Quantity | Frequency
---|---|---
No Finding | 60,361 | 42.65%
Infiltration | 19,894 | 14.06%
Effusion | 13,317 | 9.41%
Atelectasis | 11,559 | 8.17%
Nodule | 6,331 | 4.47%
Mass | 5,782 | 4.09%
Pneumothorax | 5,302 | 3.75%
Consolidation | 4,667 | 3.30%
Pleural Thickening | 3,385 | 2.39%
Cardiomegaly | 2,776 | 1.96%
Emphysema | 2,516 | 1.78%
Edema | 2,303 | 1.63%
Fibrosis | 1,686 | 1.19%
Pneumonia | 1,431 | 1.01%
Hernia | 227 | 0.16%
Parameter | Value
---|---
optimizer | Adam
learning_rate | 0.00001
dropout_rate | 0.1
batch_size | 64
epochs | 50
number of layers | 4
number of heads | 4
λ | 0.2
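For orientation, a minimal sketch of how the settings in the table above might be wired up in PyTorch; the stand-in encoder, its embedding width, and the role of λ are assumptions, since the exact training script is not reproduced here.

```python
import torch
import torch.nn as nn

# Stand-in transformer encoder mirroring the table: 4 layers, 4 heads,
# dropout 0.1. The real LMeRAN architecture is described in Section 3;
# this only shows where the listed hyperparameters plug in.
layer = nn.TransformerEncoderLayer(d_model=256, nhead=4,  # d_model is assumed
                                   dropout=0.1, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=4)

optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-5)  # lr = 0.00001

EPOCHS, BATCH_SIZE = 50, 64
LAMBDA = 0.2  # λ from the table; assumed to weight a term of the training loss
```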
Disease | Wang et al. [16] | Yao et al. [21] | Ma et al. [26] | Conformer [32] | CTransCNN [33] | LMeRAN (Ours) |
---|---|---|---|---|---|---|
Atelectasis | 0.700 | 0.733 | 0.763 | 0.727 | 0.748 | 0.777 |
Cardiomegaly | 0.810 | 0.865 | 0.884 | 0.907 | 0.900 | 0.886 |
Effusion | 0.759 | 0.806 | 0.816 | 0.806 | 0.837 | 0.836 |
Infiltration | 0.661 | 0.673 | 0.679 | 0.697 | 0.707 | 0.703 |
Mass | 0.693 | 0.718 | 0.801 | 0.761 | 0.781 | 0.830 |
Nodule | 0.669 | 0.777 | 0.729 | 0.728 | 0.742 | 0.783 |
Pneumonia | 0.658 | 0.684 | 0.710 | 0.623 | 0.630 | 0.755 |
Pneumothorax | 0.799 | 0.805 | 0.838 | 0.831 | 0.847 | 0.885 |
Consolidation | 0.703 | 0.711 | 0.744 | 0.700 | 0.731 | 0.759 |
Edema | 0.805 | 0.806 | 0.841 | 0.828 | 0.858 | 0.860 |
Emphysema | 0.833 | 0.842 | 0.884 | 0.815 | 0.856 | 0.931 |
Fibrosis | 0.786 | 0.743 | 0.801 | 0.804 | 0.778 | 0.830 |
Pleural Thickening | 0.684 | 0.724 | 0.754 | 0.681 | 0.690 | 0.794 |
Hernia | 0.872 | 0.775 | 0.876 | 0.786 | 0.881 | 0.918 |
mAUC | 0.745 | 0.761 | 0.794 | 0.764 | 0.785 | 0.825 |
 | H = 1 | H = 2 | H = 4 | H = 6 | H = 8
---|---|---|---|---|---
mAUC | 0.819 | 0.822 | 0.825 | 0.824 | 0.824
 | R = 0 | R = 0.25 | R = 0.50 | R = 0.75
---|---|---|---|---
mAUC | 0.818 | 0.825 | 0.821 | 0.817