Next Article in Journal
An Improved Submerged Mangrove Recognition Index-Based Method for Mapping Mangrove Forests by Removing the Disturbance of Tidal Dynamics and S. alterniflora
Previous Article in Journal
Multiscale Perspectives on an Extreme Warm-Sector Rainfall Event over Coastal South China
 
 
Article
Peer-Review Record

Self-Supervision and Self-Distillation with Multilayer Feature Contrast for Supervision Collapse in Few-Shot Remote Sensing Scene Classification

Remote Sens. 2022, 14(13), 3111; https://doi.org/10.3390/rs14133111
by Haonan Zhou 1,*, Xiaoping Du 2 and Sen Li 2
Reviewer 1:
Reviewer 2:
Remote Sens. 2022, 14(13), 3111; https://doi.org/10.3390/rs14133111
Submission received: 7 May 2022 / Revised: 23 June 2022 / Accepted: 26 June 2022 / Published: 28 June 2022
(This article belongs to the Section Remote Sensing Image Processing)

Round 1

Reviewer 1 Report

The paper proposes a solution for the scene classification task. The idea is based on conv nets and augmentation methods. The model is described with all necessary re-implementing details, but some issues are missing. Therefore, i must ask the authors for adding this information and making some discussion on some parts of their proposal like:

1)Underline the novelty in the abstract

2)Discuss the use of other machine learning models in the ship classification, the monitoring of coastal, etc.

3)Discuss also other, hybrid models based on machine learning like neural bag-of-words.

4)Explain why the new architecture is needed when we have so many pre-trained models. I understand, that your solution is based on such a model (embedding net), but why such a network is not enough.

5)Did you test some other augmentation tools in your proposal? The last few years brought some interpolation merges, etc. Some discussions also could be worth making.

6)Some information about the kernel sizes and their selection methods should be added. Can you show the feature maps of these kernels?

7) The loss function is based on sum of other ones (see Eq. (2)) - why are all of them equal?

8) Experimental section is well-designed.

Author Response

Please see the attachment

 

 

 

 

 

Author Response File: Author Response.docx

Reviewer 2 Report

1. Figure 1 that representing the overall structure is unclear, especially for the multi-layer feature contrast process.

2. In part 4.1, number and categories of the mixed dataset are not clear enough, a table is better provided.

3. In Table 1, structure of the network is not supposed to be right, where the FC layer is absent. Also, the last output should not be 1x1 vector for your classification model.

4. In Line 553-555, there is no evidence to support the statement that low performance is due to the self-attention mechanism (Transformer). You may need to change the structure of Embedding-Net (ConvNet) of your method with Transformer to support this idea.

5. The novelty is not clear and most of the techniques uses are well known approaches.

6. The listed algorithm/method should be formalized. The corresponding time complexity should be analyzed.

7. Discuss the limitations of the proposed method. Every method has a weakness and should be discussed clearly and give solutions in the future work section.

8. The whole text should be carefully examined and corrected. Lots of language error. Looks like they have not edited the paper

 

Typo and Language error

1. Lines 111 “reduce the compactness …”

2. Lines 239, have no idea of “niece NN”

3. In Figure 381, “will over cluster learning

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The paper was improved to all my comments.

Back to TopTop