Augmented Grad-CAM++: Super-Resolution Saliency Maps for Visual Interpretation of Deep Neural Network
Abstract
1. Introduction
- The Augmented Grad-CAM++ method introduces an augmentation strategy that preserves the high-frequency details of the saliency maps. It generates multiple input images using image augmentation techniques, creates an activation map carrying the spatial feature information of the target object for each augmented image, and combines the multiple activation maps into a final saliency map. The resulting map contains finer detail, a more accurate and focused distribution over the target object, and better visualization than traditional methods.
- A super-resolution technique is applied to reconstruct the saliency maps: bilinear interpolation increases the number of pixel points in the map, raising its resolution (a minimal sketch of this interpolation step follows this list). Compared with traditional methods, the saliency maps of this model have a higher resolution and provide a more detailed, interpretable visual explanation. The high-resolution heat map gives a better understanding of the model's decision process and helps identify the input-image regions that play an important role in prediction.
- Application to the localization of defects on industrial surfaces. Augmented Grad-CAM++ is applied to locate tiny defects of various shapes in industrial images. The results show that Augmented Grad-CAM++ locates industrial defects of varied damage shapes more accurately than traditional methods, providing strong support for quality control and defect detection and enhancing the interpretability of deep learning models in industrial applications, which demonstrates the potential of this model in the industrial field.
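To make the super-resolution step concrete, the following is a minimal NumPy sketch of bilinear upsampling of a 2-D saliency map. The function name, the half-pixel sampling convention, and the default 2× factor are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def bilinear_upsample(sal: np.ndarray, scale: int = 2) -> np.ndarray:
    """Increase the pixel count of a 2-D saliency map by bilinear interpolation."""
    h, w = sal.shape
    H, W = scale * h, scale * w
    # Half-pixel-centred sample positions in the source map (one of several
    # valid alignment conventions; assumed here for illustration).
    ys = (np.arange(H) + 0.5) / scale - 0.5
    xs = (np.arange(W) + 0.5) / scale - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    wy = np.clip(ys - y0, 0.0, 1.0)[:, None]   # vertical blend weights
    wx = np.clip(xs - x0, 0.0, 1.0)[None, :]   # horizontal blend weights
    tl = sal[y0][:, x0]          # top-left neighbours
    tr = sal[y0][:, x0 + 1]      # top-right neighbours
    bl = sal[y0 + 1][:, x0]      # bottom-left neighbours
    br = sal[y0 + 1][:, x0 + 1]  # bottom-right neighbours
    return (1 - wy) * ((1 - wx) * tl + wx * tr) + wy * ((1 - wx) * bl + wx * br)
```

With a 300 × 300 input map and scale = 2, this quadruples the pixel count from 90,000 to 360,000, consistent with the resolution comparison reported in Section 4.4.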
2. Related Work
3. Algorithm Design
3.1. Geometric Image Augmentation
3.2. Saliency Map Generation Based on Super-Resolution Techniques
3.3. Flow Chart of Augmented Grad-CAM++
3.4. The Pseudo-Code of the Model-Based Approach
Algorithm 1: Augmented Grad-CAM++

1. Start
2. Input: image, index of the target object class, number of augmented images
3. Apply the combination of rotation and translation to augment the image
4. Apply Equation (7) for the rotation and translation operations
5. Calculate the new pixel coordinates after the transformation
6. Compute the activation map of the last convolutional layer of the network for each augmented image
7. Compute the combined activation map of the augmented images
8. Apply the super-resolution operation
9. Compute the saliency map after reconstructing the pixels
10. End
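The following is a minimal PyTorch sketch of the steps in Algorithm 1, assuming torchvision ≥ 0.13, a pretrained VGG-19 classifier with its last convolutional layer (`features[34]`) as the target layer, and torchvision's affine transform for the rotation/translation augmentation. The function names, the augmentation ranges (±15°, ±10 px), the number of augmented copies, and the two-step inverse transform are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF
from torchvision import models

def grad_campp(model, x, class_idx, layer):
    """Grad-CAM++ activation map for one image (steps 6-7 of Algorithm 1)."""
    feats, grads = {}, {}
    h1 = layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    model.zero_grad()
    model(x)[0, class_idx].backward()
    h1.remove(); h2.remove()
    a, g = feats["a"], grads["g"]
    g2, g3 = g ** 2, g ** 3
    # Grad-CAM++ pixel-wise alpha coefficients and channel weights.
    alpha = g2 / (2 * g2 + (a * g3).sum(dim=(2, 3), keepdim=True) + 1e-8)
    w = (alpha * F.relu(g)).sum(dim=(2, 3), keepdim=True)
    return F.relu((w * a).sum(dim=1, keepdim=True))

def augmented_grad_campp(model, x, class_idx, layer, n_aug=100, scale=2):
    """Combine Grad-CAM++ maps of n_aug rotated/translated copies, then upsample."""
    H, W = x.shape[-2:]
    acc = torch.zeros(1, 1, H, W, device=x.device)
    for _ in range(n_aug):
        angle = float(torch.empty(1).uniform_(-15, 15))        # assumed range
        dx, dy = (int(torch.randint(-10, 11, (1,))) for _ in range(2))
        xa = TF.affine(x, angle=angle, translate=[dx, dy], scale=1.0, shear=[0.0])
        cam = grad_campp(model, xa, class_idx, layer)
        cam = F.interpolate(cam, size=(H, W), mode="bilinear", align_corners=False)
        # Undo translation, then rotation, so all maps align in the original frame.
        cam = TF.affine(cam, angle=0.0, translate=[-dx, -dy], scale=1.0, shear=[0.0])
        cam = TF.affine(cam, angle=-angle, translate=[0, 0], scale=1.0, shear=[0.0])
        acc += cam
    acc /= n_aug
    # Super-resolution step: bilinear reconstruction to scale x the input size.
    sal = F.interpolate(acc, size=(scale * H, scale * W), mode="bilinear",
                        align_corners=False)
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)

# Example: explain class 243 for a 300 x 300 input; the result is a 600 x 600 map.
model = models.vgg19(weights=models.VGG19_Weights.DEFAULT).eval()
x = torch.randn(1, 3, 300, 300)  # stand-in for a preprocessed image
saliency = augmented_grad_campp(model, x, 243, model.features[34])
```

The default of 100 augmented copies matches the smallest setting studied in the complexity analysis of Section 4.8, and the 2× upsampling mirrors the 300 × 300 to 600 × 600 resolution gain reported in Section 4.4.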
4. Experiment Results and Evaluation
4.1. Experimental Environment and Data
4.1.1. Experiment Environment
4.1.2. Introduction of Dataset
4.2. Human Trust Assessment
4.3. Model Target Object Visualization and Analysis Experiments
4.3.1. Weakly Supervised Target Localization Experiments
4.3.2. t-SNE Visualization and Analysis
4.4. Experiment on Pixel Count and Resolution Comparison
4.5. Insert and Delete Pixel Test
4.6. Pointing Game
4.7. Integrity Check
4.7.1. Data Category Randomization
4.7.2. Model Randomization
4.7.3. Scalability of the Model
4.8. Analysis of Complexity
4.9. Visualization of Industrial Surface Defect Detection
4.9.1. Human Trust Assessment
4.9.2. Pointing Game
4.9.3. Insertion and Deletion of Pixel Tests
4.9.4. Weakly Supervised Target Localization Experiments
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Ibrahim, I.R.; Shafiq, M.O. Augmented Score-CAM: High resolution visual interpretations for deep neural networks. Knowl.-Based Syst. 2022, 252, 109287.
- Teng, Q.; Liu, Z.; Song, Y.; Han, K.; Lu, Y. A survey on the interpretability of deep learning in medical diagnosis. Multimed. Syst. 2022, 28, 2335–2355.
- Ibrahim, I.R.; Shafiq, M.O. Explainable Convolutional Neural Networks: A Taxonomy, Review, and Future Directions. ACM Comput. Surv. 2023, 55, 206.
- Zhou, X.; Cai, X.; Zhang, H.; Zhang, Z.; Jin, T.; Chen, H.; Deng, W. Multi-strategy competitive-cooperative co-evolutionary algorithm and its application. Inf. Sci. 2023, 635, 328–344.
- Li, X.; Zhao, H.; Deng, W. BFOD: Blockchain-based privacy protection and security sharing scheme of flight operation data. IEEE Internet Things J. 2023.
- Xiao, Y.; Shao, H.; Feng, M.; Han, T.; Wan, J.; Liu, B. Towards trustworthy rotating machinery fault diagnosis via attention uncertainty in Transformer. J. Manuf. Syst. 2023, 70, 186–201.
- Chen, X.; Shao, H.; Xiao, Y.; Yan, S.; Cai, B.; Liu, B. Collaborative fault diagnosis of rotating machinery via dual adversarial guided unsupervised multi-domain adaptation network. Mech. Syst. Signal Process. 2023, 198, 110427.
- Yan, S.; Shao, H.; Min, Z.; Peng, J.; Cai, B.; Liu, B. FGDAE: A new machinery anomaly detection method towards complex operating conditions. Reliab. Eng. Syst. Saf. 2023, 236, 109319.
- Voulodimos, A.; Doulamis, N.; Doulamis, A.; Protopapadakis, E. Deep learning for computer vision: A brief review. Comput. Intell. Neurosci. 2018, 2018, 7068349.
- Dong, S.; Wang, P.; Abbas, K. A survey on deep learning and its applications. Comput. Sci. Rev. 2021, 40, 100379.
- Li, X.H.; Cao, C.C.; Shi, Y.; Bai, W.; Gao, H.; Qiu, L.; Wang, C.; Gao, Y.; Zhang, S.; Xue, X.; et al. A survey of data-driven and knowledge-aware explainable AI. IEEE Trans. Knowl. Data Eng. 2020, 34, 29–49.
- Du, M.; Liu, N.; Hu, X. Techniques for interpretable machine learning. Commun. ACM 2019, 63, 68–77.
- Lipton, Z.C. The mythos of model interpretability. Queue 2018, 16, 31–57.
- Shu, Y.; Jin, T. Stability in measure and asymptotic stability of uncertain nonlinear switched systems with a practical application. Int. J. Control 2023, 96, 2917–2927.
- Zhao, H.; Wu, Y.; Deng, W. An interpretable dynamic inference system based on fuzzy broad learning. IEEE Trans. Instrum. Meas. 2023, 72, 2527412.
- Zhen, Y.; Yang, H.; Guo, D.; Lin, Y. Improving airport arrival flow prediction considering heterogeneous and dynamic network dependencies. Inf. Fusion 2023, 100, 101924.
- Li, M.; Zhang, W.; Hu, B.; Kang, J.; Wang, Y.; Lu, S. Automatic assessment of depression and anxiety through encoding pupil-wave from HCI in VR scenes. ACM Trans. Multimed. Comput. Commun. Appl. 2023, 20, 1–22.
- Yang, J.; Zhang, Y.; Jin, T.; Lei, Z.; Todo, Y.; Gao, S. Maximum Lyapunov exponent-based multiple chaotic slime mold algorithm for real-world optimization. Sci. Rep. 2023, 13, 12744.
- Xie, C.; Zhou, L.; Ding, S.; Liu, R.; Zheng, S. Experimental and numerical investigation on self-propulsion performance of polar merchant ship in brash ice channel. Ocean Eng. 2023, 269, 113424.
- Zhao, H.; Liu, J.; Chen, H.; Chen, J.; Li, Y.; Xu, J.; Deng, W. Intelligent diagnosis using continuous wavelet transform and gauss convolutional deep belief network. IEEE Trans. Reliab. 2023, 72, 692–702.
- Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Part I; Springer: Berlin/Heidelberg, Germany, 2014; pp. 818–833.
- Pezeshkpour, P.; Tian, Y.; Singh, S. Investigating robustness and interpretability of link prediction via adversarial modifications. arXiv 2019, arXiv:1905.00563.
- Shwartz-Ziv, R.; Tishby, N. Opening the black box of deep neural networks via information. arXiv 2017, arXiv:1703.00810.
- Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
- Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2921–2929.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626.
- Chattopadhyay, A.; Sarkar, A.; Howlader, P.; Balasubramanian, V.N. Grad-CAM++: Improved visual explanations for deep convolutional networks. arXiv 2017, arXiv:1710.11063.
- Wang, H.; Wang, Z.; Du, M.; Yang, F.; Zhang, Z.; Ding, S.; Mardziel, P.; Hu, X. Score-CAM: Score-weighted visual explanations for convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 24–25.
- Jiang, P.T.; Zhang, C.B.; Hou, Q.; Cheng, M.M.; Wei, Y. LayerCAM: Exploring hierarchical class activation maps for localization. IEEE Trans. Image Process. 2021, 30, 5875–5888.
- Zhang, Q.; Rao, L.; Yang, Y. Group-CAM: Group score-weighted visual explanations for deep convolutional networks. arXiv 2021, arXiv:2103.13859.
- Zhang, L.; Chen, D.; Ma, J.; Zhang, J. Remote-sensing image superresolution based on visual saliency analysis and unequal reconstruction networks. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4099–4115.
- Liu, Z.; Tian, J.; Chen, L.; Wang, Y. Saliency adaptive super-resolution image reconstruction. Opt. Commun. 2012, 285, 1039–1043.
- Liu, B.; Zhao, L.; Li, J.; Zhao, H.; Liu, W.; Li, Y.; Wang, Y.; Chen, H.; Cao, W. Saliency-guided remote sensing image super-resolution. Remote Sens. 2021, 13, 5144.
- Li, X.; Zhao, H.; Yu, L.; Chen, H.; Deng, W.; Deng, W. Feature extraction using parameterized multisynchrosqueezing transform. IEEE Sens. J. 2022, 22, 14263–14272.
- Gu, Y.; Zhou, L.; Ding, S.; Tan, X.; Gao, J.; Zhang, M. Numerical simulation of ship maneuverability in level ice considering ice crushing failure. Ocean Eng. 2022, 251, 111110.
- Wu, X.; Wang, Z.; Wu, T.; Bao, X. Solving the family traveling salesperson problem in the Adleman–Lipton model based on DNA computing. IEEE Trans. NanoBiosci. 2022, 21, 75–85.
- Deng, W.; Li, Z.; Li, X.; Chen, H.; Zhao, H. Compound fault diagnosis using optimized MCKD and sparse representation for rolling bearings. IEEE Trans. Instrum. Meas. 2022, 71, 3508509.
- Zhang, Z.; Guo, D.; Zhou, S.; Zhang, J.; Lin, Y. Flight trajectory prediction enabled by time-frequency wavelet transform. Nat. Commun. 2023, 14, 5258.
- Wang, Z.; Wang, Q.; Wu, T. A novel hybrid model for water quality prediction based on VMD and IGOA optimized for LSTM. Front. Environ. Sci. Eng. 2023, 17, 88.
- Yao, Z.; Wang, Z.; Wang, D.; Wu, J.; Chen, L. An ensemble CNN-LSTM and GRU adaptive weighting model based improved sparrow search algorithm for predicting runoff using historical meteorological and runoff data as input. J. Hydrol. 2023, 625, 129977.
- Morbidelli, P.; Carrera, D.; Rossi, B.; Fragneto, P.; Boracchi, G. Augmented Grad-CAM: Heat-maps super resolution through augmentation. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 4067–4071.
- Li, M.; Zhang, J.; Song, J.; Li, Z.; Lu, S. A clinical-oriented non-severe depression diagnosis method based on cognitive behavior of emotional conflict. IEEE Trans. Comput. Soc. Syst. 2022, 10, 131–141.
- Zhao, H.M.; Zhang, P.P.; Zhang, R.C.; Yao, R.; Deng, W. A novel performance trend prediction approach using ENBLS with GWO. Meas. Sci. Technol. 2023, 34, 025018.
- Xu, J.J.; Zhao, Y.L.; Chen, H.Y.; Deng, W. ABC-GSPBFT: PBFT with grouping score mechanism and optimized consensus process for flight operation data-sharing. Inf. Sci. 2023, 624, 110–127.
- Mundhenk, T.N.; Chen, B.Y.; Friedland, G. Efficient saliency maps for explainable AI. arXiv 2019, arXiv:1911.11293.
- Rai, A. Explainable AI: From black box to glass box. J. Acad. Mark. Sci. 2020, 48, 137–141.
- Taylor, L.; Nitschke, G. Improving deep learning with generic data augmentation. In Proceedings of the 2018 IEEE Symposium Series on Computational Intelligence (SSCI), Bengaluru, India, 18–21 November 2018; pp. 1542–1547.
- Bloice, M.D.; Roth, P.M.; Holzinger, A. Biomedical image augmentation using Augmentor. Bioinformatics 2019, 35, 4522–4524.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
- Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
- Samek, W.; Binder, A.; Montavon, G.; Lapuschkin, S.; Müller, K.R. Evaluating the visualization of what a deep neural network has learned. IEEE Trans. Neural Netw. Learn. Syst. 2016, 28, 2660–2673.
- Petsiuk, V.; Das, A.; Saenko, K. RISE: Randomized input sampling for explanation of black-box models. arXiv 2018, arXiv:1806.07421.
- Kupferman, O. Sanity checks in formal verification. In Proceedings of CONCUR 2006–Concurrency Theory: 17th International Conference, Bonn, Germany, 27–30 August 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 37–51.
- Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and harnessing adversarial examples. arXiv 2014, arXiv:1412.6572.
- Martin, D.; Heinzel, S.; von Bischhoffshausen, J.K.; Kühl, N. Deep learning strategies for industrial surface defect detection systems. arXiv 2021, arXiv:2109.11304.
- Lakkaraju, H.; Kamar, E.; Caruana, R.; Leskovec, J. Interpretable & explorable approximations of black box models. arXiv 2017, arXiv:1707.01154.
Table 1. Human trust assessment: average percentage of participants preferring each method's explanation.

| Options | Augmented Grad-CAM++ | Grad-CAM++ | Same |
|---|---|---|---|
| Average percentage | 64% | 20% | 16% |

Table 2. Weakly supervised target localization results (mIoU).

| Method | Grad-CAM | Grad-CAM++ | Score-CAM | Augmented Grad-CAM++ |
|---|---|---|---|---|
| mIoU | 49.25% | 49.86% | 50.63% | 53.01% |

Table 3. Pixel count and resolution of the generated saliency maps.

| Method | Grad-CAM | Grad-CAM++ | Score-CAM | Augmented Grad-CAM++ |
|---|---|---|---|---|
| Pixels | 90,000 | 90,000 | 90,000 | 360,000 |
| Resolution | 300 × 300 | 300 × 300 | 300 × 300 | 600 × 600 |

Table 4. Insertion and deletion pixel test (AUC).

| AUC | Grad-CAM | Grad-CAM++ | Score-CAM | Augmented Grad-CAM++ |
|---|---|---|---|---|
| Insertion | 52.6 | 53.0 | 54.3 | 56.2 |
| Deletion | 15.9 | 15.5 | 13.8 | 13.0 |
| Overall | 36.7 | 37.5 | 40.5 | 43.2 |

Table 5. Pointing game accuracy.

| Method | Grad-CAM | Grad-CAM++ | Score-CAM | Augmented Grad-CAM++ |
|---|---|---|---|---|
| Percentage | 41.6% | 45.1% | 50.3% | 52.8% |

Table 6. Model randomization check: VGG-19 classification accuracy under increasing parameter noise.

| Model | Added Noise | Accuracy |
|---|---|---|
| VGG-19 (1) | 0 | 87% |
| VGG-19 (2) | 0.1 | 83% |
| VGG-19 (3) | 0.2 | 60% |
| VGG-19 (4) | 0.3 | 29% |

Table 7. Execution time for different image resolutions and numbers of augmented images.

| Image Resolution | Number of Augmented Images | Execution Time (s) |
|---|---|---|
| High resolution (300 dpi) | 100 | 56.59 |
| High resolution (300 dpi) | 150 | 64.65 |
| High resolution (300 dpi) | 200 | 94.38 |
| Medium resolution (150 dpi) | 100 | 56.59 |
| Medium resolution (150 dpi) | 150 | 64.65 |
| Medium resolution (150 dpi) | 200 | 94.38 |
| Low resolution (75 dpi) | 100 | 56.59 |
| Low resolution (75 dpi) | 150 | 64.65 |
| Low resolution (75 dpi) | 200 | 94.38 |

Table 8. Computational complexity of the compared methods.

| Method | Grad-CAM | Grad-CAM++ | Score-CAM | Augmented Grad-CAM++ |
|---|---|---|---|---|
| FLOPs (G) | 27.28 | 29.65 | 26.7 | 32.58 |

Table 9. Human trust assessment on industrial surface defect images.

| Method | Grad-CAM | Grad-CAM++ | Score-CAM | Augmented Grad-CAM++ |
|---|---|---|---|---|
| Average score percentage | 10% | 15% | 20% | 55% |

Table 10. Pointing game on industrial surface defect images.

| Method | Grad-CAM | Grad-CAM++ | Score-CAM | Augmented Grad-CAM++ |
|---|---|---|---|---|
| Pointing game | 30.6% | 35.1% | 40.3% | 45.8% |

Table 11. Insertion and deletion pixel test on industrial surface defect images (AUC).

| AUC | Grad-CAM | Grad-CAM++ | Score-CAM | Augmented Grad-CAM++ |
|---|---|---|---|---|
| Insertion | 45.5 | 46.3 | 47.6 | 49.2 |
| Deletion | 11.7 | 11.4 | 11.1 | 10.0 |
| Overall | 33.8 | 34.9 | 36.5 | 39.2 |

Table 12. Weakly supervised defect localization on industrial surface defect images (mIoU).

| Method | Grad-CAM | Grad-CAM++ | Score-CAM | Augmented Grad-CAM++ |
|---|---|---|---|---|
| mIoU | 15.6% | 18.1% | 24.3% | 35.9% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).