Identifying Bias in Deep Neural Networks Using Image Transforms
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper addresses the highly relevant issue of bias in Convolutional Neural Networks (CNNs) and benchmark datasets. It evaluates bias using image transforms. The use of a mix of natural, synthetic, and hybrid datasets ensures broader applicability. However, there are several areas where the manuscript could be improved.
Abstract: Clearly outline the research gap and contributions. Move introductory statements to the introduction section. Keep the abstract concise and focused on findings.
Introduction: Expand the discussion of existing works tackling bias. Provide more references (e.g., saliency maps, adversarial training, or dataset augmentation) to give readers an overview of current trends. Strengthen the connection between bias detection methods and practical applications.
Classification Bias in Benchmark Datasets: Include a detailed comparison with existing bias detection techniques to contextualize the novelty of this approach.
Methods: Add parameter details (e.g., wavelet function used, filter kernel size) for reproducibility.
Results: Explore the implications of accuracy changes in greater detail, particularly why synthetic datasets and Yale Faces maintain or improve accuracy while natural datasets do not. Mention any statistical tests used to confirm accuracy changes. Consider reporting additional metrics like F1-score and precision/recall for imbalanced datasets. Add a table summarizing key findings for each transform across datasets.
Conclusion: Avoid repeating results verbatim. Instead, summarize contributions and provide actionable insights. Relate findings to practical applications (e.g., medical imaging, face recognition systems). Suggest future research directions, such as bias correction methods or applying these techniques to other neural network architectures.
Code Repository: Include clear instructions for launching the program and reproducing experiments.
Writing:
Line 5 and 14, that —> this
Line 14, used by CNNs to classify images —> that CNNs use to classify images
Line 248, and are —> which were. Please ensure tense consistency across the manuscript.
Line 316, two “when using”? Please fix the sentence.
Figure 13 and 14 captions, “classification accuracy of … images after applying a median filter” is more clear.
Figure 15 and 16, “classification accuracy of full images after applying both the median and wavelet transforms” is more clear.
Figure 16, some text seems to be cropped. For example, at the bottom of COIL-20.
There are several long sentences. Please consider breaking them into shorter sentences for better readability. For example, line 2-5, line 240-242, line 256-259, etc. Please go through the manuscript and improve it.
Author Response
The paper addresses the highly relevant issue of bias in Convolutional Neural Networks (CNNs) and benchmark datasets. It evaluates bias using image transforms. The use of a mix of natural, synthetic, and hybrid datasets ensures broader applicability. However, there are several areas where the manuscript could be improved.
--Author response: Thank you for the time you spent reading and commenting on the manuscript. Many changes have been made based on the comments. For convenience, changes to the manuscript are highlighted in bold font.
Abstract: Clearly outline the research gap and contributions. Move introductory statements to the introduction section. Keep the abstract concise and focused on findings.
--Author response: The abstract has been revised. The introductory sentence has been shortened, but not completely removed, for the readability of the abstract. Other parts of the abstract have also been revised to make it clearer.
Introduction: Expand the discussion of existing works tackling bias. Provide more references (e.g., saliency maps, adversarial training, or dataset augmentation) to give readers an overview of current trends. Strengthen the connection between bias detection methods and practical applications.
--Author response: That is a good point. The description has been added to the Introduction section, with many references. The addition is highlighted in bold font. Adversarial training is somewhat different, but it is related and is now mentioned in the text with references.
Classification Bias in Benchmark Datasets: Include a detailed comparison with existing bias detection techniques to contextualize the novelty of this approach.
--Author response: A comparison has been added to the section. Several methods are now mentioned at the beginning of the section. The most relevant method is the use of saliency maps, which is discussed later in the section (the new text is highlighted in bold font). The downside of that method is that it assumes the images can be registered properly, which is not always the case. It also assumes that certain parts of the image are known in advance to be irrelevant; that is also not always true, and in some datasets no such assumptions can be made. That has also been added to the revised paper.
Methods: Add parameter details (e.g., wavelet function used, filter kernel size) for reproducibility.
--Author response: Reproducibility should be very easy through the code that we made public at
https://github.com/SaiTeja-Erukude/identifying-bias-in-dnn-classification/tree/main . That link has been added to the part of the paper that discusses the wavelet transforms. The wavelet transform was applied simply by using the dwt2 function with the “haar” and “Daubechies” wavelet parameters. That has been added to the section. But the best way to reproduce the experiments is simply to use the code.
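As a minimal sketch of the wavelet step (assuming the standard PyWavelets dwt2 API; the exact parameters and preprocessing are in the linked repository), the transform can be applied as follows:

```python
import numpy as np
import pywt  # PyWavelets

# Illustrative sketch only: a random array stands in for a grayscale image.
image = np.random.rand(128, 128)

# 2D discrete wavelet transform with the Haar wavelet; a Daubechies wavelet
# (e.g., "db2") can be passed the same way.
cA, (cH, cV, cD) = pywt.dwt2(image, "haar")

# dwt2 returns the approximation subband and three detail subbands,
# each half the input size along each axis.
print(cA.shape)  # (64, 64)
```

The approximation subband (cA) keeps the coarse image content, while the detail subbands (cH, cV, cD) isolate horizontal, vertical, and diagonal high-frequency information.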
Results: Explore the implications of accuracy changes in greater detail, particularly why synthetic datasets and Yale Faces maintain or improve accuracy while natural datasets do not. Mention any statistical tests used to confirm accuracy changes. Consider reporting additional metrics like F1-score and precision/recall for imbalanced datasets. Add a table summarizing key findings for each transform across datasets.
--Author response: The classification accuracy, precision, recall, and F1 scores for all experiments have been added in a new sub-section of Section 4 that has been added to the paper. The table summarizes the results of all experiments.
Conclusion: Avoid repeating results verbatim. Instead, summarize contributions and provide actionable insights. Relate findings to practical applications (e.g., medical imaging, face recognition systems). Suggest future research directions, such as bias correction methods or applying these techniques to other neural network architectures.
--Author response: The Conclusion section has been revised substantially. Its beginning has been revised, and repetitions of other parts of the paper were removed, except for the first line of the Conclusion section, which is needed for readability purposes. A description of how and when the method should be used has been added. The limitations of the method are also described, and future work and research directions are added at the end of the Conclusion section.
Code Repository: Include clear instructions for launching the program and reproducing experiments.
--Author response: A note has been added to the code section. Using the code should be easy: it requires only changing the input and output directories and then running the scripts with Python. The required libraries are also mentioned. The code is fairly straightforward.
Writing:
Line 5 and 14, that —> this
--Author response: That has been corrected. Thank you.
Line 14, used by CNNs to classify images —> that CNNs use to classify images
--Author response: A correction was made as proposed. Thank you.
Line 248, and are —> which were. Please ensure tense consistency across the manuscript.
--Author response: That indeed needed a correction, and was corrected.
Line 316, two “when using”? Please fix the sentence.
--Author response: The sentence has been corrected. Thank you.
Figure 13 and 14 captions, “classification accuracy of … images after applying a median filter” is more clear.
--Author response: Yes. The two captions have been completely rewritten.
Figure 15 and 16, “classification accuracy of full images after applying both the median and wavelet transforms” is more clear.
--Author response: Both captions have been corrected.
Figure 16, some text seems to be cropped. For example, at the bottom of COIL-20.
There are several long sentences. Please consider breaking them into shorter sentences for better readability. For example, line 2-5, line 240-242, line 256-259, etc. Please go through the manuscript and improve it.
--Author response: In our version of the manuscript, Figure 16 appears clear. The issue may be caused by the LaTeX compilation on the online submission system. Just in case, we replaced Figure 16; it also uses vector graphics to avoid pixelation problems. Lines 2-5 were removed anyway when the abstract was revised. Other long sentences throughout the paper have been broken into shorter sentences to make them more readable.
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript is devoted to a very interesting subject: how to identify dataset bias. The manuscript is well presented, but there are a few issues that need clarification.
1. It is necessary to describe more clearly what is new in the work.
2. The authors used the Accuracy metric. However, some of the datasets used are imbalanced, and the accuracy metric is not robust for them. The F1 score seems preferable.
3. What about more contemporary neural networks like transformers? Does it make sense to use them for bias dataset identification?
4. The classification process with the 20x20 cropped part of the image needs to be described in more detail. As written, it seems that the classification is not done on the whole image but on the cropped image.
Author Response
The manuscript is devoted to a very interesting subject: how to identify dataset bias. The manuscript is well presented, but there are a few issues that need clarification.
--Author response: We would like to thank you for the comments and for the time you took reading and commenting on the manuscript. The comments have been addressed, and the changes made to the manuscript are provided below each comment. Changes made in the text are highlighted in bold font.
- It is necessary to describe more clearly what is new in the work.
--Author response: That’s a good point. The novelty of the method is that it can identify dataset biases without needing to separate a blank part of the background. Because not every dataset allows separating foreground from background (in many cases the entire image is foreground information), the new method can be used in such cases. That has been added to the Conclusion section, and especially to the abstract of the paper.
- The authors used the Accuracy metric. However, some of the datasets used are imbalanced, and the accuracy metric is not robust for them. The F1 score seems preferable.
--Author response: Yes. We added the precision, recall, and F1 scores for all experiments in a new table (Table 1) added to the paper. Using a table shows all the results in a compact manner.
- What about more contemporary neural networks like transformers? Does it make sense to use them for bias dataset identification?
--Author response: That’s an interesting idea, and testing it would require a whole new work. Unlike wavelets, Fourier transforms, etc., transformers need to be trained with target images before they can be applied. It is not yet clear to us what output images the transformers should be trained with, so at this point it is not entirely clear how that can be done, but it is something that might be possible in the future.
- The classification process with the 20x20 cropped part of the image needs to be described in more detail. As written, it seems that the classification is not done on the whole image but on the cropped image.
--Author response: Thank you for the comment. That is indeed a point that should be explained clearly. Classification is done using both the 20x20 sub-images and the full images. However, that was done only for testing, with datasets already known to be biased, to examine how they respond to the transforms and whether the transforms can identify the bias. When the method is applied in practice, it is applied to the full images. That could indeed confuse readers of the original version of the paper. We added a paragraph to the Conclusion section that summarizes how the method should be applied. In this paragraph we specify explicitly that, when transforming the images, the transformation should be applied to the full images and not to the 20x20 background. The whole purpose of the method is to allow identifying dataset bias even when a blank background is not available in the images.
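To make the distinction concrete, here is a small illustrative sketch (not the authors' exact pipeline; the image size, crop location, and filter kernel are assumptions) of the difference between classifying a 20x20 background crop and transforming the full image:

```python
import numpy as np
from scipy.ndimage import median_filter

# Illustrative sketch only: a random array stands in for a grayscale image.
image = np.random.rand(128, 128)

# For the bias *test*, a 20x20 sub-image (assumed background-only) is
# classified on its own.
crop = image[:20, :20]
assert crop.shape == (20, 20)

# When the method itself is applied, the transform (here a median filter
# with an assumed 3x3 kernel) operates on the full image, not the crop.
filtered = median_filter(image, size=3)
assert filtered.shape == image.shape
```

The point the added paragraph makes is visible here: the crop is only a diagnostic probe for known-biased datasets, while the transform always receives the full image.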
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The questions were well resolved. I recommend acceptance.
Reviewer 2 Report
Comments and Suggestions for Authors
The authors took into account all my comments. I recommend the article for publication.