Beyond Local Explanations: A Framework for Global Concept-Based Interpretation in Image Classification
Abstract
1. Introduction
- (RQ1): Can global explanation methods be grounded in a human-understandable taxonomy?
- (RQ2): What is the relationship between local and global explanations?
- (RQ3): How can our method be adapted to any model and dataset?
2. Related Work
3. Datasets and Annotations
4. Annotation Transfer Across Images
4.1. Finding Visually Nearest Neighbor
4.2. Unsupervised Segmentation
4.3. Part Label Transfer with Correspondence
- For each hyperpixel correspondence between the source and target image:
  - (a) Appearance weight: compute the appearance matching confidence as an exponentiated cosine similarity between the hyperpixel features of the two endpoints.
  - (b) Hough voting: map the correspondence to its offset bin in Hough space and add its appearance confidence to that bin’s accumulator.
  - (c) Spatial regularization: convolve the Hough space with a smoothing filter, then read out the spatial consistency of the correspondence from its offset bin.
  - (d) Final weight: score the correspondence by combining its appearance confidence with its spatial consistency.
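The voting scheme above can be sketched in code. This is a minimal illustration only, assuming a Hyperpixel-Flow-style formulation: a rectified, exponentiated cosine similarity for the appearance term, a 3×3 box filter for spatial regularization, and a multiplicative combination of the two terms for the final weight. All function names, the bin count, and the exponent are illustrative choices, not the paper’s exact implementation.

```python
import numpy as np

def correspondence_weights(src_feats, trg_feats, src_pos, trg_pos,
                           pairs, n_bins=16, exponent=2.0):
    """Weight candidate hyperpixel correspondences by appearance and
    geometric consistency (Hough voting), following steps (a)-(d).

    src_feats, trg_feats : (N, D) and (M, D) hyperpixel feature arrays
    src_pos, trg_pos     : (N, 2) and (M, 2) hyperpixel centres in [0, 1]
    pairs                : list of (i, j) candidate correspondences
    """
    # (a) Appearance weight: rectified cosine similarity, exponentiated
    # to sharpen confident matches.
    def appearance(i, j):
        f, g = src_feats[i], trg_feats[j]
        cos = f @ g / (np.linalg.norm(f) * np.linalg.norm(g) + 1e-8)
        return max(cos, 0.0) ** exponent

    app = np.array([appearance(i, j) for i, j in pairs])

    # (b) Hough voting: each correspondence votes for its offset bin
    # with its appearance confidence.
    offsets = np.array([trg_pos[j] - src_pos[i] for i, j in pairs])  # in [-1, 1]
    bins = np.clip(((offsets + 1.0) / 2.0 * n_bins).astype(int), 0, n_bins - 1)
    hough = np.zeros((n_bins, n_bins))
    for (bx, by), a in zip(bins, app):
        hough[bx, by] += a

    # (c) Spatial regularization: smooth the accumulator with a 3x3 box
    # filter so a vote also supports neighbouring offsets.
    padded = np.pad(hough, 1, mode="edge")
    smooth = sum(padded[dx:dx + n_bins, dy:dy + n_bins]
                 for dx in range(3) for dy in range(3)) / 9.0

    # (d) Final weight: appearance confidence times the smoothed spatial
    # consistency read back from the correspondence's own offset bin.
    spatial = smooth[bins[:, 0], bins[:, 1]]
    return app * spatial
```

In this sketch, correspondences whose offsets agree with many other candidates accumulate more votes in their Hough bin, so geometrically consistent matches end up with higher final weights than appearance-only outliers.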
5. Local and Global Explanations
5.1. Minimal Sufficient Explanations
5.2. Symbolic Representation
5.3. Deriving Global Explanations
6. Results and Discussion
7. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Mean coverage % (± standard deviation) per model:

| Dataset | VGG19 | ResNet101 | DenseNet121 |
|---|---|---|---|
| ADE20k | 48.5 ± 25.9 | 53.5 ± 25.3 | 52.1 ± 27.5 |
| CUB200 | 56.7 ± 12.0 | 59.2 ± 11.2 | 58.3 ± 12.4 |
| Stanford Cars | 98.5 ± 1.7 | 99.3 ± 1.34 | 97.2 ± 3.3 |
Share and Cite
Vasu, B.; Rathore, K.; Tadepalli, P. Beyond Local Explanations: A Framework for Global Concept-Based Interpretation in Image Classification. Electronics 2025, 14, 3230. https://doi.org/10.3390/electronics14163230