Peer-Review Record

Domain Adaptive Channel Pruning

Electronics 2024, 13(5), 887; https://doi.org/10.3390/electronics13050887
by Ge Yang 1, Chao Zhang 2,*, Ling Gao 3, Yufei Guo 4 and Jinyang Guo 5,*
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 24 December 2023 / Revised: 2 February 2024 / Accepted: 2 February 2024 / Published: 26 February 2024
(This article belongs to the Special Issue AI Security and Safety)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Summary:

 

The paper addresses the challenge of generalizing deep learning models to different domains, which often leads to instability and a lack of robustness. The authors propose a new channel pruning method called Domain Adaptive Channel Pruning (DACP), specifically designed for unsupervised domain adaptation tasks. The goal of the approach is to reduce the data distribution mismatch between training and testing samples while considering the limitations of deploying deep models on different types of edge devices. In addition, to avoid frequent and time-consuming fine-tuning, the authors formulate the adjustment of the model parameters as a least squares optimization problem that can be solved in closed form. Furthermore, they provide a method to effectively utilize unlabelled data in the target domain: they generate pseudo-labels using the initial DANN model and selectively use these labels or features from the target domain to guide the channel pruning process. This adaptive approach ensures that the pruning process is guided by target samples with different confidence scores of pseudo-labels. The approach was evaluated through various experiments and showed better results than the related work.
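To make the closed-form step above concrete, the following is a minimal sketch (written by me for illustration using PyTorch, not taken from the authors' code; the function and variable names are my assumptions) of a least-squares weight reconstruction for a pruned layer, solved in closed form rather than by gradient-based fine-tuning:

import torch

def prune_and_reconstruct(X, W, keep):
    # X    : (N, C_in) sampled input features for the layer
    # W    : (C_in, C_out) original weight matrix
    # keep : 1-D tensor of input-channel indices to retain
    Y = X @ W                    # target: output of the unpruned layer
    X_kept = X[:, keep]          # features of the surviving channels
    # Closed-form least squares: argmin_W' || X_kept @ W' - Y ||_F^2
    return torch.linalg.lstsq(X_kept, Y).solution

# Example: keep 32 of 64 channels, refit on 512 sampled feature vectors.
X = torch.randn(512, 64)
W = torch.randn(64, 128)
W_new = prune_and_reconstruct(X, W, torch.arange(32))
print(W_new.shape)  # torch.Size([32, 128])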

 

Identified issues:

 

The motivation of the topic in Section 1 should be improved. For example, the issues or problems that domain adaptation successfully addresses should be discussed here. It is important for readers to understand why this topic is significant. The topic itself could also be described in more detail.

 

"While we take the unsupervised domain adaptation method called Domain Adversarial Neural Network (DANN) [2] as an example to introduce our DACP approach, our DACP approach can also be readily used to prune channels for other approaches like [3]." -> In this sentence in Section 1, it would be beneficial for readers to provide more information aside providing only the references. For instance, after "like", you should mention the approaches that the DCHP approach you propose can prune.

 

The paper would benefit if its structure were described at the end of Section 1. This would make it easier for readers to grasp the structure of the paper.

 

As a reader and a reviewer, I think it would be beneficial to add a simpler version of Figure 1 to Section 1. It is very difficult to grasp some things until this figure is seen. However, Figure 1 itself should certainly not be removed.

 

The authors might have missed some related works, such as the paper "Accelerating Deep Unsupervised Domain Adaptation with Transfer Channel Pruning" by Chaohui Yu, Jindong Wang, Yiqiang Chen, and Zijing Wu (2019). This paper seems to discuss the unsupervised domain adaptation setting, for which the authors have stated that their approach is the first to address it. I would advise the authors to provide reasoning on this matter, given the strong claim they have made. In addition, they should check for additional related work they might have missed.

 

Personally, I am not a fan of long captions in figures such as in Figure 1. It is better to describe these things in the text itself.

 

With regard to Section 4, the authors should provide reasoning for choosing the specific models (VGG16, AlexNet, and ResNet-50), as well as for using the Office-31 and ImageCLEF-DA datasets.

 

Why did the authors choose the specific values for the learning rate, batch size, and momentum? Some explanation should be provided to better understand the experiment(s).

 

The authors should discuss the weaknesses and limitations of the proposed approach. In addition, they should state possible future directions that could be addressed.

 

I would also suggest the authors consider adding a "Background" section to explain some concepts to readers who are not too familiar with this topic.

 

Comments on the Quality of English Language

Below are the minor issues that were identified, relating to grammar, typos, sentence structure, etc. Each is shown by indicating the page and line number, the sentence with the issue, and a corrected sentence. These issues are presented in order to improve the overall readability of the publication. I would suggest having someone proofread the paper in detail. You should also consider changing terms such as "learning based" to "learning-based"; there are other examples in the paper too.

 

pg 1. line 16: "While deep learning approaches have achieved promising results in many computer vision tasks, it is still a challenge task to generalize it on different domains, which makes the deep learning approaches unstable and lack of robustness. Domain adaptation is an effective approach to generalize from a labelled domain to a unlabelled domain, where the data distribution are different between these domains." -> "While deep learning approaches have achieved promising results in many computer vision tasks, it is still a challenging task to generalize them on different domains, which makes the deep learning approaches unstable and lack robustness. Domain adaptation is an effective approach to generalize from a labelled domain to an unlabelled domain, where the data distribution is different between these domains."

pg 1. line 31: "Specifically, for the image classification task, it uses the losses from both image classifier and domain classifier to guide the pruning process, which takes both classification accuracy and domain distribution mismatch into consideration." -> "Specifically, the image classification task uses the losses from both the image classifier and domain classifier to guide the pruning process, which takes both classification accuracy and domain distribution mismatch into consideration."

pg 2. line 45: "First, we propose domain adaptive channel pruning approach specifically designed for the unsupervised domain adaptation task, which can improve the generalization ability and also address the deployment issue." -> "First, we propose a domain adaptive channel pruning approach specifically designed for the unsupervised domain adaptation task, which can improve the generalization ability and also address the deployment issue."

pg 2. line 96: "In contrast to [22], our work uses the losses from both image classifier and domain classifier as the guidance for model compression, which is more suitable for the target domain." -> "In contrast to [22], our work uses the losses from both the image classifier and domain classifier as the guidance for model compression, which is more suitable for the target domain."

pg 3. line 99: "Then, we present the process of pre-training an initial DANN model to reduce the data distribution mismatch between two domains." -> "Then, we present the process of pre-training an initial DANN model to reduce the data distribution mismatch between the two domains."

pg 3. line 122: "The feature extractor F uses the CNNs (e.g. VGG [38]) as the backbone." -> "The feature extractor F uses CNNs (e.g. VGG [38]) as the backbone."

pg 3. line 131: "Given an pre-trained initial DANN model, our goal is to compress this model so as to achieve the best performance on the target domain under a given model compression ratio." -> "Given a pre-trained initial DANN model, our goal is to compress this model so as to achieve the best performance on the target domain under a given model compression ratio."

pg 4. line 158: "After pruning the channels by setting bj = 0 for several channel indices j, the compressed model will have a output feature Y at the l-th layer." -> "After pruning the channels by setting bj = 0 for several channel indices j, the compressed model will have an output feature Y at the l-th layer."

pg 5. line 196: "B is the pre-defined number of remained channels." -> "B is the pre-defined number of remaining channels."

pg 6. line 241: "Both of the image classifiers Cs and Ct consists of several fully connected layers." -> "Both of the image classifiers Cs and Ct consist of several fully connected layers."

pg 8. line 301: "Our DACP approach outperforms the CP method on all the settings, which again demonstrates the effectiveness of our DACP approach." -> "Our DACP approach outperforms the CP method in all the settings, which again demonstrates the effectiveness of our DACP approach."

pg 8. line 309: "One possible explanation is that our DACP-ResNet50 additionly utilizes pseudo-labelled samples from the target domain." -> "One possible explanation is that our DACP-ResNet50 additionally utilizes pseudo-labelled samples from the target domain."

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Major issues:

1. The motivation is not convincing. 

a) The authors mention in the Abstract and on Page 2, Line 47 that the existing deep learning methods suffer from deployment problems due to the different types of edge devices, but these problems are not clearly defined or discussed in the paper. This requires further elaboration and specific examples.

b) No references are provided to support the statement: “it is still a challenge task to generalize it on different domains, which makes the deep learning approaches unstable and lack of robustness.”

c) No references are provided to support the statement: “Therefore, it is desirable to perform model compression like channel pruning approaches under the domain adaptation setting.”

2. Since (un)compressed models are formally defined in Section 3.3, it is necessary to revise "(un)compressed networks" accordingly to keep consistency throughout the paper: e.g., Line 142, Page 4: "between the initial network and the compressed network models".

3. The choice of cross-entropy loss in Section 3.5 needs justification. Why is this specific loss function suitable for this research?

4. How is TH determined in Section 3.6? Also, justify why a threshold of 0.3 was chosen in the experiments in Section 4.1 (a minimal sketch of such a confidence gate is given after this list).

5. It is important to discuss the limitations of the current work and suggest directions for future research.
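Regarding item 4 above, here is a minimal sketch of the kind of confidence gate I have in mind (my own illustration in PyTorch, not the authors' code; the variable names and the use of the top softmax probability as the confidence score are assumptions, and th = 0.3 simply mirrors the value reported in Section 4.1):

import torch
import torch.nn.functional as F

def select_pseudo_labels(logits, th=0.3):
    # Keep target-domain samples whose top softmax probability exceeds th.
    probs = F.softmax(logits, dim=1)
    conf, pseudo = probs.max(dim=1)   # confidence score and predicted class
    mask = conf > th                  # the threshold TH in question
    return pseudo[mask], mask

# Example: 8 unlabelled target samples, 31 classes (as in Office-31).
logits = torch.randn(8, 31)
labels, mask = select_pseudo_labels(logits)
print(int(mask.sum()), "of", mask.numel(), "samples pass the gate")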

Minor Issues:

1. The statement on Page 6, Line 207 is unclear: "Since the samples from the target domain are unlabelled, it is important to use the samples in the target domain effectively."

2. Mention Section 3.4 when discussing J1 in Figure 1.

Comments on the Quality of English Language

The paper requires thorough proofreading, for instance:

a) Page 1, Line 20: "…where the data distribution are different between these domains…" -> "…where the data distributions are different between these domains…"

b) Page 3, Line 131: "…Given an pre-trained…" -> "…Given a pre-trained…"

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I have checked the "author_response" document that the authors provided. The changes they made seem fine, and I think the comments have been addressed.

I really do not like how the "Background" section was incorporated into the paper. "Introduction" should surely be Section 1. In this case, I would change Section 3 (Related Work) into "Background and Related Work". In case the authors want to expand on it, "Background" can be a standalone section. I honestly think the authors should add more background information on domain adaptive channel pruning so that it is easier for all readers to understand what this approach is about. There is no clear transition from the "Introduction" section to the "Domain Adaptive Channel Pruning" section. I would strongly advise that this be fixed.

I also noticed that many of the explanations of why a certain approach or model was applied simply state that "it is because they are commonly used or applied". I think the authors should state why they are good in this particular case (i.e., why the authors chose them for their specific approach).

I would also advise the authors to share their implementation in a public repository and provide the URL in the paper for the purposes of replication and future research.

I also added several comments previously, so the authors should take these into consideration too.

Comments on the Quality of English Language

The English in the response document has typos. The authors need to fix this or have someone proofread the paper.

"Compared to TCP, our approach use the cross entropy loss for both image classifier and domain classifier to reduce the data distribution mismatch, which is not considered in [1]." -> "Compared to TCP, our approach uses the cross entropy loss for both image classifier and domain classifier to reduce the data distribution mismatch, which is not considered in [1]."

"We call the domain of training data as source domain and the domain of testing data as target domain." -> "We call the domain of training data as the source domain and the domain of testing data as the target domain."

"Moreover, with the development of deep learning methods, large deep neural networks require a huge amount of computing and storage resources which brings difficulties to deployment in real-world application [1]." -> "Moreover, with the development of deep learning methods, large deep neural networks require a huge amount of computing and storage resources which brings difficulties to deployment in real-world applications [1]."

"For example, it is still challenging to deploy a large-scale ResNet model on apple watch and run it in real-time, which limits the application scenario of deep learning models." -> "For example, it is still challenging to deploy a large-scale ResNet model on an apple watch and run it in real-time, which limits the application scenario of deep learning models."

"While deep learning approaches have achieved promising results in many computer vision tasks, it is still a challenge task to generalize them on different domains, which makes the deep learning approaches unstable and lack of robustness [2]." -> "While deep learning approaches have achieved promising results in many computer vision tasks, it is still a challenging task to generalize them on different domains, which makes the deep learning approaches unstable and lack robustness [2]."

"In order to reduce the data distribution mismatch, we choose the cross entropy loss for both image classifier and domain classifier as our cost function since the cross entropy loss is a common loss function designed for classification tasks, which is widely used in many works [1,44]." -> "In order to reduce the data distribution mismatch, we choose the cross entropy loss for both the image classifier and domain classifier as our cost function since the cross entropy loss is a common loss function designed for classification tasks, which is widely used in many works [1,44]."

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

After carefully reviewing the revised manuscript and the response, I am pleased to say that my concerns have been successfully addressed, and the response provided was thoughtful. Thus, I recommend acceptance as is.

Author Response

Thank you for reviewing our paper and for accepting it for publication. We greatly appreciate your time and valuable feedback during the review process.
