The growing use of deep neural networks in critical applications is making interpretability urgently to be solved. Local interpretation methods are the most prevalent and accepted approach for understanding and interpreting deep neural networks. How to effectively evaluate the local interpretation methods is challenging. To address this question, a unified evaluation framework is proposed, which assesses local interpretation methods from three dimensions: accuracy, persuasibility and class discriminativeness. Specifically, in order to assess correctness, we designed an interactive user feature annotation tool to provide ground truth for local interpretation methods. To verify the usefulness of the interpretation method, we iteratively display part of the interpretation results, and then ask users whether they agree with the category information. At the same time, we designed and built a set of evaluation data sets with a rich hierarchical structure. Surprisingly, one finding is that the existing visual interpretation methods cannot satisfy all evaluation dimensions at the same time, and each has its own shortcomings.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited