Pairwise Coupling of Convolutional Neural Networks for the Better Explainability of Classification Systems
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
1. It is recommended that the authors include a literature review section in the introduction to incorporate recent studies on the interpretability of deep learning.
2. The description of the paper’s structural framework requires further elaboration to enhance clarity and coherence.
3. Interpretability evaluation metrics (Grad-CAM) were not employed to compare the proposed method with conventional CNN models.
4. The rationale behind the choice of hyperparameters has not been sufficiently justified.
5. As the experiments were conducted solely on the Fashion-MNIST dataset, it is difficult to assess the method’s effectiveness and generalization capability for more complex tasks.
6. Although the conclusion briefly acknowledges certain experimental limitations, it does not provide specific directions for improvement or discuss potential avenues for future application.
Author Response
Comment 1. It is recommended that the authors include a literature review section in the introduction to incorporate recent studies on the interpretability of deep learning.
Response 1. We agree. As suggested, we have added references in the introduction to recent work on the interpretability of deep learning.
Comment 2. The description of the paper’s structural framework requires further elaboration to enhance clarity and coherence.
Response 2. We agree. We made substantial revisions to the abstract, introduction, and conclusion sections in order to enhance clarity.
Comment 3. Interpretability evaluation metrics (Grad-CAM) were not employed to compare the proposed method with conventional CNN models.
Response 3. We partially agree. Grad-CAM is used to examine other facets of the interpretability of deep learning that are not covered in our paper. However, we find it useful to cite the Grad-CAM paper in the introduction, and we have made that change.
Comment 4. The rationale behind the choice of hyperparameters has not been sufficiently justified.
Response 4. We disagree. Most of the hyperparameters are standard in the context of deep neural networks. For the threshold hyperparameter, we conducted an experiment to find a good value, as illustrated in Figure 4.
Comment 5. As the experiments were conducted solely on the Fashion-MNIST dataset, it is difficult to assess the method’s effectiveness and generalization capability for more complex tasks.
Response 5. We partially agree. We have eliminated the section related to sureness explicability (also called out-of-distribution detection), since, after a literature review, we feel it is not sufficiently supported by our experiments. The remaining findings are intended as a proof of concept, and we therefore consider the (admittedly basic) Fashion-MNIST dataset an appropriate choice.
Comment 6. Although the conclusion briefly acknowledges certain experimental limitations, it does not provide specific directions for improvement or discuss potential avenues for future application.
Response 6. We agree. We added a subsection to the conclusion about future research directions.
We thank the reviewer for the useful suggestions.
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript provides a detailed examination of the pairwise coupling methodology applied to convolutional neural networks (CNNs) for improving classification explicability. The paper presents novel insights into different aspects of explicability such as pairwise explicability, likelihood randomness explicability, and sureness explicability. The comparative evaluation of Wu-Lin-Weng and Bayes covariant methods is useful, and the experiments conducted on the Fashion MNIST dataset yield promising results.
However, there are several areas where the manuscript needs improvement. These include clarity in some explanations, more thorough validation of claims, better statistical reporting, and a deeper discussion on the limitations of the proposed methodologies. Based on the content and structure, I recommend a minor revision with a few critical enhancements to strengthen the paper's impact.
While the novelty of applying pairwise coupling methods for explicability is clearly stated, the clinical or industrial implications of this work could be emphasized more, especially how these methods could improve the adoption of CNNs in real-world applications where interpretability is crucial (e.g., medical diagnoses, autonomous driving).
Include effect sizes alongside p-values to provide a more complete understanding of the results.
Enhance the clarity of figures and tables by providing more detailed captions and context for each figure. For instance, clarify what the “pairwise accuracy” and "multi-class accuracy" represent and how they are calculated.
Ensure that the statistical analysis section includes detailed explanations of the methods used, including any assumptions made during testing.
Expand the discussion to include a more comprehensive comparison of your results with similar studies. How do your findings compare to previous work on model explicability in deep learning, especially with regard to the accuracy and interpretability of predictions?
Include a detailed exploration of potential applications and discuss any real-world challenges in adopting your methods.
Author Response
Comment 1. The manuscript provides a detailed examination of the pairwise coupling methodology applied to convolutional neural networks (CNNs) for improving classification explicability. The paper presents novel insights into different aspects of explicability such as pairwise explicability, likelihood randomness explicability, and sureness explicability. The comparative evaluation of Wu-Lin-Weng and Bayes covariant methods is useful, and the experiments conducted on the Fashion MNIST dataset yield promising results.
Response 1. Thank you for your comment.
Comment 2. However, there are several areas where the manuscript needs improvement. These include clarity in some explanations, more thorough validation of claims, better statistical reporting, and a deeper discussion on the limitations of the proposed methodologies. Based on the content and structure, I recommend a minor revision with a few critical enhancements to strengthen the paper's impact.
Response 2. Thank you for the suggestion; more detailed responses are given below.
Comment 3. While the novelty of applying pairwise coupling methods for explicability is clearly stated, the clinical or industrial implications of this work could be emphasized more, especially how these methods could improve the adoption of CNNs in real-world applications where interpretability is crucial (e.g., medical diagnoses, autonomous driving).
Response 3. We included a new Section 8.2 describing real-world applications of our work.
Comment 4. Include effect sizes alongside p-values to provide a more complete understanding of the results.
Response 4. We added a comment on statistical assumptions in Section 5.2, as well as an effect size statement.
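For readers unfamiliar with pairing an effect size with a p-value, a minimal Python sketch is given below. It is not the analysis used in the paper: the paired t-test, the Cohen's d variant computed on paired differences, the helper name `paired_comparison`, and the numbers are all illustrative assumptions.

```python
# Minimal sketch of reporting an effect size alongside a p-value.
# Assumptions (not from the paper): paired accuracy scores of two methods over
# repeated runs, a paired t-test, and Cohen's d computed on the paired differences.
import numpy as np
from scipy import stats

def paired_comparison(scores_a, scores_b):
    """Return (t statistic, p-value, Cohen's d) for paired samples."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    diff = a - b
    result = stats.ttest_rel(a, b)               # assumes roughly normal differences
    cohens_d = diff.mean() / diff.std(ddof=1)    # standardized mean of the differences
    return result.statistic, result.pvalue, cohens_d

# Made-up accuracies of two coupling methods over ten training runs.
method_a = [0.912, 0.915, 0.910, 0.913, 0.914, 0.911, 0.916, 0.912, 0.913, 0.915]
method_b = [0.914, 0.917, 0.913, 0.915, 0.916, 0.914, 0.918, 0.915, 0.916, 0.917]
print(paired_comparison(method_b, method_a))
```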
Comment 5. Enhance the clarity of figures and tables by providing more detailed captions and context for each figure. For instance, clarify what the “pairwise accuracy” and "multi-class accuracy" represent and how they are calculated.
Response 5. We included a new Section 4.4 in which we define the evaluation metrics.
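As a reading aid, the sketch below shows one common-sense way such metrics could be computed; the function names and exact definitions are illustrative assumptions, not necessarily those of the paper's Section 4.4.

```python
# Hedged sketch of plausible definitions of the two metrics (assumptions only;
# the paper's Section 4.4 contains the authoritative definitions).
import numpy as np

def multiclass_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float((y_true == y_pred).mean())

def pairwise_accuracy(y_true, y_pred, class_a, class_b):
    """Accuracy of a class_a-vs-class_b binary classifier, evaluated only on
    samples that truly belong to one of the two classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mask = (y_true == class_a) | (y_true == class_b)
    return float((y_true[mask] == y_pred[mask]).mean())

# Toy example with three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(multiclass_accuracy(y_true, y_pred))        # 4 of 6 correct
print(pairwise_accuracy(y_true, y_pred, 0, 1))    # evaluated on the four 0/1 samples
```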
Comment 6. Ensure that the statistical analysis section includes detailed explanations of the methods used, including any assumptions made during testing.
Response 6. We added a comment on statistical assumptions in Section 5.2, as well as an effect size statement.
Comment 7. Expand the discussion to include a more comprehensive comparison of your results with similar studies. How do your findings compare to previous work on model explicability in deep learning, especially with regard to the accuracy and interpretability of predictions?
Response 7. We added text in Section 7 which mentions alternatives to uncertainty quantification. The other facets of interpretability have not been considered previously, and thus no comparison can be made. Section 8, about sureness explicability (out-of-distribution detection), was eliminated.
Comment 8. Include a detailed exploration of potential applications and discuss any real-world challenges in adopting your methods
Response 8. We added text to the conclusion which both describes applications and explicates the key challenge of the pairwise coupling method, i.e., the quadratic growth in the number of binary classifiers needed to construct the model.
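To make this quadratic growth concrete, here is a minimal sketch; the helper `pairwise_tasks` is hypothetical and simply enumerates the unordered class pairs, each of which requires its own binary classifier under pairwise coupling.

```python
# Illustrative sketch of the quadratic growth: pairwise coupling needs one binary
# classifier per unordered class pair, i.e. K*(K-1)/2 models for K classes.
from itertools import combinations

def pairwise_tasks(classes):
    """Enumerate the binary classification tasks required by pairwise coupling."""
    return list(combinations(classes, 2))

print(len(pairwise_tasks(range(10))))     # Fashion-MNIST: 10 classes -> 45 classifiers
print(len(pairwise_tasks(range(100))))    # 100 classes -> 4950 classifiers
```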
We thank the reviewer for the useful suggestions.
Reviewer 3 Report
Comments and Suggestions for Authors
The authors should clarify explicability vs. explainability.
The introduction is currently too confusing and difficult to follow. Also, when defining each use case, the introduction should include more citations to support the claims and provide proper context within the existing literature.
The authors should consider creating a summary table that presents the different types of explicabilities discussed in the paper.
The term IIA is introduced before being defined; this should be corrected to ensure conceptual clarity.
From an experimental standpoint, the authors should consider using additional datasets. Since the accuracy on FMNIST is already very high, it becomes difficult to meaningfully assess improvements in model performance.
The statement “The optimal probability of dropout may vary depending on the task” is somewhat trivial, as the same holds for most hyperparameters (e.g., number of layers, number of neurons).
Sharing the weights of the first layers might reduce randomness; this point should be elaborated further.
Section 5 should include appropriate statistical tests to support the reported results.
The concept of Sureness explicability seems closely related to open-set recognition; the authors should clarify this connection explicitly to help readers understand how it fits within existing paradigms.
Finally, not all explicabilities are discussed in sufficient detail, and these sections should be expanded for completeness.
Comments on the Quality of English Language
The text seems too confusing, especially the abstract and introduction.
Author Response
Comment 1. The authors should clarify explicability vs explainability.
Response 1. We agree. We replaced the term “explicability” with “explainability”, since it is more commonly used in the field. Our original choice was based on the fact that the two words are considered synonyms in English.
Comment 2. The introduction is currently too confusing and difficult to follow. Also, when defining each use case, the introduction should include more citations to support the claims and provide proper context within the existing literature.
Response 2. We agree. We rewrote the introduction to enhance clarity and provided supporting references.
Comment 3. The authors should consider creating a summary table that presents the different types of explicabilities discussed in the paper.
Response 3. We agree that a summary table would be a good idea, but after rewriting the logic of the introduction and eliminating one explainability facet (sureness explainability, also known as out-of-distribution detection), we no longer think it is necessary.
Comment 4. The term IIA is introduced before being defined; this should be corrected to ensure conceptual clarity.
Response 4. Regarding IIA, we added a reference to the section where it is discussed in order to improve clarity.
Comment 5. From an experimental standpoint, the authors should consider using additional datasets. Since the accuracy on FMNIST is already very high, it becomes difficult to meaningfully assess improvements in model performance.
Response 5. FMNIST is indeed too simple a dataset for sureness explainability (i.e., out-of-distribution detection). We have therefore eliminated the section related to this aspect of explainability. For the other two facets of explainability studied in the paper (uncertainty quantification and pairwise explainability), we consider the dataset sufficient, since it is used to provide a proof of concept.
Comment 6. The statement The optimal probability of dropout may vary depending on the task is somewhat trivial, as the same holds for most hyperparameters (e.g., number of layers, number of neurons).
Response 6. We agree that the point about the dropout hyperparameter is obvious to experts, but we see no harm in leaving the statement as is.
Comment 7. Sharing the weights of the first layers might reduce randomness, this point should be elaborated further.
Response 7. The point about the randomness introduced by sharing the weights is intriguing and likely valid, but at the moment we do not know how to address it.
Comment 8. Section 5 should include appropriate statistical tests to support the reported results.
Response 8. Section 5 could indeed include standard statistical tests. However, statistical tests are harder for a wide audience to parse than the boxplot visualization we provide, and we therefore propose to leave the section as it is. We limit statistical tests to Section 6.
Comment 9. The concept of Sureness explicability seems closely related to open-set recognition; the authors should clarify this connection explicitly to help readers understand how it fits within existing paradigms.
Response 9. The sureness explicability section was eliminated, as mentioned earlier.
Comment 10. Finally, not all explicabilities are discussed in sufficient detail, and these sections should be expanded for completeness.
Response 10. We partially agree. While it is true that more detail could be provided, we believe that we have given the reader sufficient background for a journal paper.
We thank the reviewer for the useful suggestions.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
We still believe that including the visual results based on Grad-CAM in the experimental section of the paper will significantly enhance the persuasiveness of the paper.
Author Response
Comment 1: We still believe that including the visual results based on Grad-CAM in the experimental section of the paper will significantly enhance the persuasiveness of the paper.
Response 1: Thank you for the suggestion. We have added a visualization of predictions using Grad-CAM in Section 5.2 (Figure 5).
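For readers who wish to reproduce this kind of visualization, a minimal Grad-CAM sketch is given below. It is not the code behind Figure 5: it assumes a torchvision ResNet-18 with three-channel input and uses its last convolutional block as the target layer, whereas the paper works with Fashion-MNIST and its own CNN architecture.

```python
# Minimal Grad-CAM sketch (assumption: a torchvision ResNet-18, not the paper's model).
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()
target_layer = model.layer4[-1]                      # last convolutional block

activations, gradients = {}, {}
target_layer.register_forward_hook(lambda m, i, o: activations.update(value=o))
target_layer.register_full_backward_hook(lambda m, gi, go: gradients.update(value=go[0]))

def grad_cam(x, class_idx=None):
    """Return an (H, W) heatmap in [0, 1] highlighting evidence for class_idx."""
    logits = model(x)                                # x: (1, 3, H, W)
    if class_idx is None:
        class_idx = int(logits.argmax(dim=1))
    model.zero_grad()
    logits[0, class_idx].backward()
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)   # pooled gradients
    cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)      # normalize to [0, 1]
    return cam[0, 0].detach()

heatmap = grad_cam(torch.randn(1, 3, 224, 224))      # random input just to run the sketch
print(heatmap.shape)                                  # torch.Size([224, 224])
```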
