Adversarial CAM Guidance for Chest X-Ray Classification: Reducing Framing Sensitivity with Mask Supervision
Abstract
1. Introduction
- Bio-inspired framing sensitivity definition for chest X-rays: We define framing sensitivity as the dependence of the prediction on the surrounding context when the diagnostic evidence is unchanged, motivated by bio-inspired figure–ground segregation and selective attention.
- Discriminator-guided CAM alignment as a training process: We introduce a training strategy in which the CAM heatmaps produced by the classifier are treated as generated evidence maps, and a discriminator learns to distinguish these heatmaps from ground truth masks to provide evidence-centered guidance.
- Joint optimization for accuracy and evidence focus: We optimize the classifier with a combined objective that preserves classification performance while encouraging a mask-like CAM structure, improving interpretability without requiring masks at inference.
- Evaluation metrics for robustness and framing effects: We propose test-phase measures, including augmentation inconsistency (flip rate under angle-based augmentations) and framing sensitivity based on CAM–mask attention outside the evidence region.
- Model-agnostic and practical applicability: The proposed training process can be applied to standard chest X-ray backbones and datasets, requiring masks only during training and leaving the original network structure unchanged.
- No inference overhead and practical deployment: The proposed training process does not modify the structure of existing models, so the inference-time processing speed, parameter count, and memory requirements remain unchanged. The method improves attention concentration and test performance, while introducing additional computation only during training due to the discriminator-guided updates.
- Reproducibility: Our implementation procedure and trained models are available on GitHub [2].
2. Related Work
2.1. Chest X-Ray Classification and Generalization
2.2. Shortcut Learning and Spurious Correlations
2.3. Interpretability and Class Activation Mapping
2.4. Adversarial Learning for Alignment of Generated Outputs
3. Proposed Methodology
3.1. Problem Setting and Goal

- Classifier: A backbone network extracts the feature maps F, followed by global pooling and a linear classifier that produces the class probabilities.
- CAM generator induced by the classifier: Given X and a target class, the classifier produces a CAM heatmap. No separate generator network is introduced; the classifier plays a dual role as both classifier and heatmap generator.
- Discriminator: The discriminator receives a single-channel map and outputs the probability that the map is a real mask rather than a generated CAM map. It is trained using ground truth masks as real samples and CAM maps as fake samples.
- Discriminator update: The discriminator learns to classify the ground truth masks as real and the CAM maps as fake.
- Classifier update: The classifier is optimized with a combined loss consisting of standard classification loss and an adversarial loss that encourages the discriminator to classify the CAM maps as real.
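
The two updates above can be sketched numerically. The minimal example below assumes a binary cross-entropy form for both the discriminator loss and the adversarial term, and a weighting factor `lam` on the adversarial term. The function names, `lam`, and the toy probabilities are illustrative assumptions, not values taken from the paper.

```python
import math

def bce(p, target, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake):
    """Discriminator update: masks should score 1 (real), CAM maps 0 (fake)."""
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def classifier_loss(class_probs, labels, d_fake, lam=0.1):
    """Classifier update: cross-entropy plus an adversarial term that is
    small when the discriminator scores the CAM maps as real.
    lam is a hypothetical weighting factor."""
    cce = -sum(math.log(max(p[y], 1e-7))
               for p, y in zip(class_probs, labels)) / len(labels)
    adv = sum(bce(p, 1) for p in d_fake) / len(d_fake)
    return cce + lam * adv

# Toy discriminator outputs: probability that the input map is a real mask.
d_on_masks = [0.9, 0.8]
d_on_cams = [0.3, 0.2]
# Toy class probabilities for two samples and four classes.
probs = [[0.7, 0.1, 0.1, 0.1], [0.2, 0.6, 0.1, 0.1]]
labels = [0, 1]

loss_d = discriminator_loss(d_on_masks, d_on_cams)
loss_c = classifier_loss(probs, labels, d_on_cams)
```

In this sketch the classifier loss falls as the discriminator's scores on the CAM maps rise, which is the direction of pressure the classifier update describes.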

- (1) Classifier (backbone + GAP + linear head)
- Input: X-ray image for sample i.
- Backbone network (convolutional feature extractor) with learnable parameters.
- Feature value of the feature map F at a given spatial location and channel c.
- h, w: Spatial resolution of the last convolutional feature map.
- d: Number of channels in the last convolutional feature map.
- (2) CAM heatmap generation (for the target class)
- (3) Discriminator and adversarial losses
- Real samples: the ground truth masks;
- Fake samples: the CAM maps.
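
As an illustration of step (2), the sketch below follows the standard CAM construction of Zhou et al. [1]: the heatmap for a target class is the sum of the d channels of the last feature map, weighted by that class's linear-head weights. The ReLU and min-max scaling in `normalize` are an assumed post-processing step, and the function names and toy values are hypothetical.

```python
def cam_heatmap(features, weights, target_class):
    """features: h x w x d nested lists; weights: num_classes x d.
    Returns an h x w map: the class-weighted sum over the d channels."""
    w_c = weights[target_class]
    return [[sum(wk * fk for wk, fk in zip(w_c, cell)) for cell in row]
            for row in features]

def normalize(cam, eps=1e-8):
    """ReLU followed by min-max scaling to [0, 1] (assumed post-processing)."""
    relu = [[max(v, 0.0) for v in row] for row in cam]
    lo = min(min(row) for row in relu)
    hi = max(max(row) for row in relu)
    return [[(v - lo) / (hi - lo + eps) for v in row] for row in relu]

# Toy example: h = w = 2, d = 3 channels, 2 classes.
features = [
    [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]],
    [[0.5, 0.5, 0.5], [2.0, 0.0, 1.0]],
]
weights = [
    [1.0, -1.0, 0.5],  # linear-head weights for class 0
    [0.0, 2.0, 0.0],   # linear-head weights for class 1
]

cam = cam_heatmap(features, weights, target_class=0)
heat = normalize(cam)
```

Because the heatmap is computed from the classifier's own feature maps and head weights, no extra network is needed to produce the map that the discriminator receives.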
3.2. Dataset Description

4. Experimental Results
4.1. Training Setup
4.2. Loss Functions
- Real: The mask (evidence region);
- Fake: The CAM heatmap produced by the classifier.
4.3. Testing
4.3.1. Evaluation Metrics
Augmentation Inconsistency
- Compute the predicted class for the original image x;
- Compute the predicted class for the modified image;
- If the two predicted classes are different, the condition is true and the indicator becomes 1;
- If they are the same, it becomes 0.
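
The indicator steps above amount to a flip rate over the test set. A minimal sketch follows, assuming the metric is the mean of the indicator over all images; `predict` and `augment` are hypothetical callables standing in for the trained classifier and the angle-based augmentation, and the toy data are illustrative only.

```python
def augmentation_inconsistency(predict, images, augment):
    """Fraction of images whose predicted class changes after augmentation."""
    flips = 0
    for x in images:
        # Indicator: 1 if the two predicted classes differ, 0 otherwise.
        flips += int(predict(x) != predict(augment(x)))
    return flips / len(images)

# Toy stand-ins: each "image" is a score vector, the prediction is its argmax,
# and the "augmentation" reverses the vector.
def toy_predict(x):
    return x.index(max(x))

def toy_augment(x):
    return list(reversed(x))

images = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8], [0.3, 0.3]]
ai = augmentation_inconsistency(toy_predict, images, toy_augment)
```

In the toy data two of the four predictions flip, so `ai` is 0.5; a lower value indicates predictions that are more stable under the augmentation.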
Framing Sensitivity
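
One plausible reading of the CAM–mask measure named in the contributions, attention falling outside the evidence region, is the share of non-negative CAM mass that lies outside the mask. The sketch below assumes that definition; the function name, the clipping of negative values, and the toy maps are hypothetical and may differ from the exact formula used in the paper.

```python
def outside_attention_ratio(cam, mask, eps=1e-8):
    """Share of CAM mass located where the binary mask is 0.
    cam and mask are h x w nested lists of the same shape."""
    total = 0.0
    outside = 0.0
    for cam_row, mask_row in zip(cam, mask):
        for a, m in zip(cam_row, mask_row):
            a = max(a, 0.0)  # ignore negative activations (assumption)
            total += a
            if m == 0:
                outside += a
    return outside / (total + eps)

# Toy 2 x 2 example: the right column is the evidence region.
cam = [[0.2, 0.6],
       [0.0, 0.2]]
mask = [[0, 1],
        [0, 1]]

ratio = outside_attention_ratio(cam, mask)
```

Here 0.2 of the total CAM mass of 1.0 sits outside the mask, so `ratio` is approximately 0.2; under this reading, a smaller ratio means attention is more concentrated on the evidence region.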
4.3.2. Ablation Study
4.3.3. Comparisons with X-Ray Image-Based SOTA Methods
4.3.4. Comparisons of Algorithm Complexity and Processing Time
4.3.5. Statistical Analysis
5. Discussion
5.1. Information Fusion
5.2. Error and Correct Classification Cases
5.3. Limitations and Future Work
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 2921–2929.
- Proposed Method. Available online: https://github.com/ganav/Adversarial-CAM-Guidance-for-Chest-X-ray-Classification/tree/main (accessed on 30 April 2026).
- Wang, L.; Lin, Z.Q.; Wong, A. COVID-Net: A tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. Sci. Rep. 2020, 10, 19549.
- Zech, J.R.; Badgeley, M.A.; Liu, M.; Costa, A.B.; Titano, J.J.; Oermann, E.K. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study. PLoS Med. 2018, 15, e1002683.
- Wang, B.; Pan, H.; Aboah, A.; Zhang, Z.; Keles, E.; Torigian, D.; Turkbey, B.; Krupinski, E.; Udupa, J.; Bagci, U. GazeGNN: A Gaze-guided graph neural network for chest X-ray classification. In Proceedings of the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 2183–2192.
- Zhu, H.; Rohling, R.; Salcudean, S.E. Multi-task UNet: Jointly boosting saliency prediction and disease classification on chest X-ray images. arXiv 2022, arXiv:2202.07118.
- Nie, W.; Zhang, C.; Song, D.; Bai, Y.; Xie, K.; Liu, A.A. Chest X-ray image classification: A causal perspective. In Medical Image Computing and Computer Assisted Intervention—MICCAI 2023; Greenspan, H., Madabhushi, A., Mousavi, P., Salcudean, S., Duncan, J., Syeda-Mahmood, T., Taylor, R., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2023; Volume 14222.
- Geirhos, R.; Jacobsen, J.-H.; Michaelis, C.; Zemel, R.; Brendel, W.; Bethge, M.; Wichmann, F.A. Shortcut learning in deep neural networks. Nat. Mach. Intell. 2020, 2, 665–673.
- DeGrave, A.J.; Janizek, J.D.; Lee, S.-I. AI for radiographic COVID-19 detection selects shortcuts over signal. Nat. Mach. Intell. 2021, 3, 610–619.
- Haider, A.; Arsalan, M.; Park, C.; Sultan, H.; Park, K.R. Exploring deep feature-blending capabilities to assist glaucoma screening. Appl. Soft Comput. 2023, 133, 109918.
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Advances in Neural Information Processing Systems 27 (NIPS 2014); MIT Press: Cambridge, MA, USA, 2014; pp. 2672–2680.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention, MICCAI 2015; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2015; Volume 9351, pp. 234–241.
- Wali, R. Xtreme Margin: A tunable loss function for binary classification problems. arXiv 2022, arXiv:2211.00176.
- COVID-19 Radiography Database. Available online: https://www.kaggle.com/datasets/tawsifurrahman/covid19-radiography-database (accessed on 30 April 2026).
- Python. 2026. Available online: https://www.python.org/ (accessed on 30 April 2026).
- TensorFlow. 2026. Available online: https://www.tensorflow.org/ (accessed on 30 April 2026).
- OpenCV. 2026. Available online: http://opencv.org/ (accessed on 30 April 2026).
- Keras. 2026. Available online: https://keras.io/ (accessed on 30 April 2026).
- Zhang, Z.; Sabuncu, M.R. Generalized cross entropy loss for training deep neural networks with noisy labels. arXiv 2018, arXiv:1805.07836.
- Kingma, D.P.; Ba, J.B. ADAM: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015; OpenReview: Alameda, CA, USA, 2015; pp. 1–15.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning (ICML), Long Beach, CA, USA, 9–15 June 2019; PMLR: Cambridge, MA, USA, 2019; pp. 6105–6114.
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 2261–2269.
- Oltu, B.; Güney, S.; Yuksel, S.E.; Dengiz, B. Automated classification of chest X-rays: A deep learning approach with attention mechanisms. BMC Med. Imaging 2025, 25, 71.
- El Houby, E.M.F. COVID-19 detection from chest X-ray images using transfer learning. Sci. Rep. 2024, 14, 11639.
- Lamouadene, H.; El Kassaoui, M.; El Yadari, M.; El Kenz, A.; Benyoussef, A.; El Moutaouakil, A.; Mounkachi, O. Detection of COVID-19, lung opacity, and viral pneumonia via X-ray using machine learning and deep learning. Comput. Biol. Med. 2025, 191, 110131.
- Jetson TX2 Module. 2026. Available online: https://developer.nvidia.com/embedded/jetson-tx2 (accessed on 30 April 2026).
- Mishra, P.; Singh, U.; Pandey, C.M.; Mishra, P.; Pandey, G. Application of student’s t-test, analysis of variance, and covariance. Ann. Card. Anaesth. 2019, 22, 407–411.
- Cohen, J. A power primer. Psychol. Bull. 1992, 112, 155–159.
- Wang, S. Domain-adaptive faster R-CNN for non-PPE identification on construction sites from body-worn and general images. Sci. Rep. 2026, 16, 4793.
- Wang, S. Domain adaptation using transformer models for automated detection of exterior cladding materials in street view images. Sci. Rep. 2026, 16, 2696.
- Mahmood, T.; Wahid, A.; Hong, J.S.; Kim, S.G.; Park, K.R. A novel convolution transformer-based network for histopathology-image classification using adaptive convolution and dynamic attention. Eng. Appl. Artif. Intell. 2024, 135, 108824.






| Hardware | Specification | Software library | Version |
|---|---|---|---|
| Memory | 32 GB RAM | Python [15] | 3.5.4 |
| GPU | Nvidia GeForce TITAN X (12 GB) | TensorFlow [16] | 1.9.0 |
| CPU | Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz (8 CPUs) | OpenCV [17] | 4.3.0 |
| | | Keras API [18] | 2.1.6-tf |

| Parameter | Classifier | Discriminator |
|---|---|---|
| Loss | Categorical cross-entropy (CCE) [19] | Binary cross-entropy (BCE) [13] |
| Optimizer | Adaptive moment estimation (Adam) [20] | Adam |
| Number of epochs | 50 | 50 |
| Learning rate | 0.001 | 0.001 |
| Batch size | 4 | 4 |

| Backbone | Training | Loss Used to Update Classifier | F1 | AI | BAR | |
|---|---|---|---|---|---|---|
| MobileNet-v1 [21] | Baseline | | 0.849 | 0.140 | 0.360 | 0.640 |
| MobileNet-v1 [21] | Proposed | | 0.869 | 0.122 | 0.310 | 0.690 |
| EfficientNet-B0 [22] | Baseline | | 0.878 | 0.125 | 0.340 | 0.660 |
| EfficientNet-B0 [22] | Proposed | | 0.909 | 0.105 | 0.285 | 0.715 |
| DenseNet-121 [23] | Baseline | | 0.898 | 0.118 | 0.325 | 0.675 |
| DenseNet-121 [23] | Proposed | | 0.926 | 0.098 | 0.270 | 0.730 |

| Actual (rows) / Predicted (columns) | Baseline: COVID-19 | Baseline: Lung Opacity | Baseline: Normal | Baseline: Viral Pneumonia | Proposed: COVID-19 | Proposed: Lung Opacity | Proposed: Normal | Proposed: Viral Pneumonia |
|---|---|---|---|---|---|---|---|---|
| COVID-19 | 425 | 48 | 0 | 27 | 467 | 23 | 0 | 10 |
| Lung Opacity | 25 | 463 | 0 | 12 | 38 | 449 | 11 | 2 |
| Normal | 6 | 7 | 427 | 60 | 5 | 0 | 459 | 36 |
| Viral Pneumonia | 2 | 0 | 17 | 481 | 2 | 2 | 19 | 477 |

| Class | Baseline PPV | Baseline TPR | Baseline F1 | Proposed PPV | Proposed TPR | Proposed F1 |
|---|---|---|---|---|---|---|
| COVID-19 | 0.928 | 0.850 | 0.887 | 0.912 | 0.934 | 0.923 |
| Lung Opacity | 0.894 | 0.926 | 0.910 | 0.947 | 0.898 | 0.922 |
| Normal | 0.962 | 0.854 | 0.905 | 0.939 | 0.918 | 0.928 |
| Viral Pneumonia | 0.829 | 0.962 | 0.891 | 0.909 | 0.954 | 0.931 |
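
The per-class values can be recomputed from the confusion matrix. The sketch below uses the counts reported for the proposed method (rows are actual classes, columns are predicted classes) and the standard definitions PPV = TP / predicted positives, TPR = TP / actual positives, and F1 as their harmonic mean. The function and variable names are illustrative, and averaging the per-class F1 values is an assumption about how the summary F1 is obtained.

```python
CLASSES = ["COVID-19", "Lung Opacity", "Normal", "Viral Pneumonia"]

# Confusion matrix of the proposed method: rows = actual, columns = predicted.
proposed = [
    [467, 23, 0, 10],
    [38, 449, 11, 2],
    [5, 0, 459, 36],
    [2, 2, 19, 477],
]

def per_class_metrics(cm):
    """Return a list of (PPV, TPR, F1) tuples, one per class."""
    n = len(cm)
    out = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[r][k] for r in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        ppv = tp / predicted_k
        tpr = tp / actual_k
        f1 = 2 * ppv * tpr / (ppv + tpr)
        out.append((ppv, tpr, f1))
    return out

metrics = per_class_metrics(proposed)
macro_f1 = sum(m[2] for m in metrics) / len(metrics)

for name, (ppv, tpr, f1) in zip(CLASSES, metrics):
    print(f"{name}: PPV={ppv:.3f} TPR={tpr:.3f} F1={f1:.3f}")
print(f"Macro F1: {macro_f1:.3f}")
```

Rounded to three decimals, this reproduces the proposed-method columns of the table, and the unweighted mean of the four F1 values is 0.926, which matches the F1 reported for DenseNet-121 with the proposed training.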

| Backbone | Training | Loss Used to Update Classifier | F1 | AI | BAR | |
|---|---|---|---|---|---|---|
| GazeGNN [5] | Baseline | | 0.892 | 0.120 | 0.331 | 0.669 |
| GazeGNN [5] | Proposed | | 0.924 | 0.101 | 0.276 | 0.724 |
| MT-UNet [6] | Baseline | | 0.842 | 0.143 | 0.365 | 0.635 |
| MT-UNet [6] | Proposed | | 0.867 | 0.124 | 0.314 | 0.686 |
| Causal CXR [7] | Baseline | | 0.881 | 0.127 | 0.346 | 0.654 |
| Causal CXR [7] | Proposed | | 0.910 | 0.108 | 0.292 | 0.708 |
| DenseNet201 + ViT + GAP [24] | Baseline | | 0.872 | 0.119 | 0.318 | 0.682 |
| DenseNet201 + ViT + GAP [24] | Proposed | | 0.895 | 0.101 | 0.264 | 0.736 |
| Transfer-CNN [25] | Baseline | | 0.865 | 0.116 | 0.305 | 0.695 |
| Transfer-CNN [25] | Proposed | | 0.889 | 0.097 | 0.251 | 0.749 |
| EfficientNet-CNN [26] | Baseline | | 0.858 | 0.111 | 0.298 | 0.702 |
| EfficientNet-CNN [26] | Proposed | | 0.884 | 0.093 | 0.243 | 0.757 |

| Backbone | Mode | Params (M) | FLOPs (G) | Desktop inference time per image, ms (fps) | Jetson inference time per image, ms (fps) |
|---|---|---|---|---|---|
| EfficientNet-B0 [22] | Baseline / Proposed | 5.3 | 0.39 | 11.96 (83.61) | 42.86 (23.33) |
| DenseNet-121 [23] | Baseline / Proposed | 8.0 | 2.83 | 24.42 (40.95) | 62.75 (15.94) |
| MobileNet-v1 [21] | Baseline / Proposed | 4.2 | 0.56 | 6.96 (143.68) | 35.38 (28.26) |
| GazeGNN [5] | Baseline / Proposed | 10.7 | 1.7 | 19.80 (50.51) | 56.50 (17.70) |
| MT-UNet [6] | Baseline / Proposed | 29.0 | 6.2 | 43.00 (23.26) | 128.00 (7.81) |
| Causal CXR [7] | Baseline / Proposed | 44.6 | 7.8 | 52.00 (19.23) | 150.00 (6.67) |
| DenseNet201 + ViT + GAP [24] | Baseline / Proposed | 24.0 | 5.1 | 36.87 (27.12) | 105.46 (9.48) |
| Transfer-CNN [25] | Baseline / Proposed | 143.7 | 19.6 | 120.30 (8.31) | 372.21 (2.69) |
| EfficientNet-CNN [26] | Baseline / Proposed | 5.7 | 0.42 | 10.30 (97.04) | 38.76 (25.80) |
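
The frames-per-second figures in parentheses follow from the per-image latency as fps = 1000 / ms. A short check on three of the desktop entries is sketched below; the helper name and dictionary are illustrative.

```python
def fps_from_ms(ms_per_image):
    """Convert a per-image inference time in milliseconds to frames per second."""
    return 1000.0 / ms_per_image

# Desktop inference times (ms) taken from the table.
desktop_ms = {
    "EfficientNet-B0": 11.96,
    "DenseNet-121": 24.42,
    "MobileNet-v1": 6.96,
}

desktop_fps = {name: round(fps_from_ms(ms), 2) for name, ms in desktop_ms.items()}
```

For these three backbones the conversion gives 83.61, 40.95, and 143.68 fps, in agreement with the tabulated values.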
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Batchuluun, G.; Lee, S.J.; Im, S.J.; Park, K.R. Adversarial CAM Guidance for Chest X-Ray Classification: Reducing Framing Sensitivity with Mask Supervision. Biomimetics 2026, 11, 409. https://doi.org/10.3390/biomimetics11060409
