Article
Peer-Review Record

Dual-Branch Attention-Assisted CNN for Hyperspectral Image Classification

Remote Sens. 2022, 14(23), 6158; https://doi.org/10.3390/rs14236158
by Wei Huang 1, Zhuobing Zhao 1, Le Sun 2,3,* and Ming Ju 1
Reviewer 1:
Reviewer 2:
Reviewer 3:
Submission received: 8 November 2022 / Revised: 28 November 2022 / Accepted: 1 December 2022 / Published: 5 December 2022
(This article belongs to the Special Issue Deep Learning for the Analysis of Multi-/Hyperspectral Images)

Round 1

Reviewer 1 Report

A dual-branch attention-assisted CNN (DBAA-CNN) for HSI classification is proposed in this paper, which uses a 3-D CNN, a PSA module, and a spectral attention module to jointly extract spatial-spectral features. This is very innovative work, and the effectiveness of the method is verified by experiments. However, I suggest some refinements are needed before publication.

1.   In line 184, it is mentioned that "The features were compressed along the spatial dimension and the resulting feature map had a global perceptual field." The authors do not explain this in detail. It is necessary to add how the features were compressed along the spatial dimension and why the resulting feature map had a global perceptual field.
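[Editor's note] As context for the point above: in a standard SE-style block, "compressing along the spatial dimension" is a global average pool per channel, so each resulting scalar depends on every spatial location, which is why the descriptor is said to have a global perceptual (receptive) field. A minimal sketch, with hypothetical shapes and a hypothetical function name (not the authors' code):

```python
def squeeze_spatial(feature_maps):
    """SE-style "squeeze": compress each channel's 2-D map (H x W, as
    nested lists) to one scalar via global average pooling. Each scalar
    summarizes the whole spatial extent of its channel."""
    descriptors = []
    for fmap in feature_maps:
        total = sum(sum(row) for row in fmap)
        count = sum(len(row) for row in fmap)
        descriptors.append(total / count)
    return descriptors

# Example: 2 channels, each 2x2
maps = [[[1.0, 3.0], [5.0, 7.0]],
        [[2.0, 2.0], [2.0, 2.0]]]
print(squeeze_spatial(maps))  # [4.0, 2.0]
```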

2.   In line 163, it is mentioned that "As shown in Figure 1, we used a cube block … of size …". I don't understand why the size of … is …; please explain the meaning of ….

3.   For the data description section, the training sets of the IP, PU, and SV datasets are divided in different ratios. Why, and how was this determined?

4.   The paper has a large number of abbreviations. In some cases, I had a hard time finding some of them. I suggest that the authors create a table listing all the abbreviations. This will make it easier for the reader to read and understand.

5.   In the Experimental Setting section, the authors parametrically analyze the window size of the input data, but do not explain why it affects classification performance. Also, is there a criterion for setting the size of the window?

Author Response

Please see the attached file for the response.

Author Response File: Author Response.docx

Reviewer 2 Report

The manuscript deals with the hot problem of classification of remotely sensed spectral images. It proposes a further DNN methodology, based on a dual-branch attention-assisted CNN, aimed at simultaneously reaching lower computational times and high accuracy.

The manuscript is very well written and usefully educational. As with any DNN, the proof of validity of the proposed methodology lies (besides its motivations) in its results, which appear to be good according to the tables and figures shown.

Experiments are worked out on three data sets very popular among the scientific community.

In my opinion, the scheme adopted for the train/test validation of the methodology is not clear, or at least not sufficiently detailed. It is only generically stated that training (5%) and test (the remaining part) sets are used, repeated 10 times, but nothing more. In particular, it is not clear whether the separation between the two subsets is always maintained.

I begin with the number of epochs, which is apparently, and surprisingly, kept fixed at 100. Is this the potential maximum number of epochs or the number of epochs really used? In the former case, on which subset (training, test) was it optimized? The high number of 100% success rates achieved by the methodology requires a check against overfitting.
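[Editor's note] The reviewer's concern is that the epoch count should itself be selected on a held-out validation subset rather than fixed. A common way to do this is early stopping; a minimal sketch, with hypothetical loss values and a hypothetical function name:

```python
def early_stopping(val_losses, patience=3):
    """Select the number of epochs on a validation set: stop once the
    validation loss has not improved for `patience` consecutive epochs,
    and return the best epoch (1-based)."""
    best_loss, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Validation loss improves until epoch 3, then stalls
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.64]
print(early_stopping(losses))  # 3
```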

The same for the window size: on which subset was it optimized?

The authors reduce the number of bands actually used to 80 out of those available, even after cleaning for noise and water absorption. How were these bands chosen? Again, on which data set (training, test) were they optimized?

As a general rule, the most consolidated and rigorous splitting scheme is a training/validation/test scheme (with separate subsets, of course), where the training and validation sets are used to fit the model and optimize the hyperparameters (including, e.g., the number of epochs; incidentally, the 10 repetitions used by the authors are computationally of the same order as a 10-fold CV), and the test set is finally evaluated on the optimal model arising from the training/validation phase.
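[Editor's note] The three-way splitting scheme the reviewer describes can be sketched as follows; the function name, fractions, and seed are illustrative only, not drawn from the manuscript:

```python
import random

def train_val_test_split(samples, val_frac=0.1, test_frac=0.2, seed=0):
    """Split samples into three disjoint subsets: train/validation for
    model fitting and hyperparameter selection, test held out until the
    final evaluation."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_test = int(len(samples) * test_frac)
    n_val = int(len(samples) * val_frac)
    test = [samples[i] for i in idx[:n_test]]
    val = [samples[i] for i in idx[n_test:n_test + n_val]]
    train = [samples[i] for i in idx[n_test + n_val:]]
    return train, val, test

data = list(range(100))
tr, va, te = train_val_test_split(data)
print(len(tr), len(va), len(te))  # 70 10 20
```

The key property is disjointness: no sample used to tune hyperparameters ever appears in the test set.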

As a last note, in the spirit of reproducible research, I believe it would be fair to release the code used for the implementation, especially considering that the data sets are well known and public.

Author Response

Please see the attached file for the response.

Author Response File: Author Response.docx

Reviewer 3 Report

The paper "Dual-branch attention-assisted CNN for hyperspectral image classification" is an interesting research paper that combines different known CNN-based methods to improve HSI classification accuracy and reduce the demand for computing resources. The paper can be more attractive to readers if the following minor and major modifications are implemented.

1-  There are discrepancies between the problem and the compared state-of-the-art HSI classification algorithms. The authors state that "the use of complex networks inevitably causes information redundancy and increased computational cost"; they should therefore make clear in the abstract that the comparison is with these complex networks, or with a combination of conventional ML and recent CNN methods.

2-Please include the following literature in the list cited where "a growing number of scholars have investigated HSI classification" (line 40); they are very important:

I-Cooperative evolutionary classification algorithm for hyperspectral images. Journal of Applied Remote Sensing, 14(1), 016509.

II-Hyperspectral image classification on insufficient-sample and feature learning using deep neural networks: A review. International Journal of Applied Earth Observation and Geoinformation, Volume 105, 25 December 2021.

III-Folded LDA: Extending the Linear Discriminant Analysis Algorithm for Feature Extraction and Data Reduction in Hyperspectral Remote Sensing. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 14, pp. 12312-12331, 2021.


3-The related work from lines 134 to 199 should not be included in the data and method section. Only the needed information, such as brief information about the 3D network, can be added in the appropriate place in the subsection "proposed method". This also applies to the Squeeze-and-Excitation (SE) block.

4-The paragraph from lines 217 to 221 raises several ambiguous issues. First, why will PCA dimensionality reduction inevitably lead to information loss? Second, how does a 1x1 convolution remove useless information and reduce the dimension of the HSI from l to b?
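[Editor's note] As background for the second question: a 1x1 convolution acts per pixel as a linear map from the l input bands to b output channels, so choosing b < l reduces the spectral dimension. A minimal sketch, with hypothetical weights (not the authors' implementation):

```python
def conv1x1(pixel_bands, weights):
    """Apply a 1x1 convolution at a single pixel: out[j] = sum_i w[j][i] * in[i].
    pixel_bands: list of l band values for one pixel.
    weights: b x l matrix mapping l input bands to b output channels."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, pixel_bands))
            for row in weights]

pixel = [1.0, 2.0, 3.0, 4.0]            # l = 4 bands
w = [[0.25, 0.25, 0.25, 0.25],          # b = 2 output channels
     [1.0, 0.0, 0.0, 0.0]]
print(conv1x1(pixel, w))  # [2.5, 1.0]
```

Because the weights are learned, the network can in principle weight informative bands highly and suppress redundant ones, which is the usual rationale for a learned 1x1 reduction over a fixed projection such as PCA.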

5-Pay attention to English mistakes: on line 219, replace "we chose to use" with "we used".

6-The authors indicated that the two 3D layers could not provide sufficient feature information, so they added PSA. I would suggest using only PSA in that case; otherwise, the authors have to convince the readers why they have to keep the 3D layers and why they are needed for feature extraction.

7-Can the authors explain the type of multi-scale features extracted in parallel by the SAC module? Before concatenation, is there a mechanism to check for redundancies? How is g determined?

8-When reading subsection 2.2.2, I got confused because the authors indicated that PSA is better at extracting features than 3D convolution, yet they again used two 2D convolution layers to extract shallow features. Why do they need to do this with lower-dimensional layers?

How did they adjust the spatial size and spectral dimension while using the 2D layers, and why?

9-Again, pay attention to English mistakes: on line 287, replace "...is obtained to obtain..." with "...was utilized to obtain...". Pay attention to tenses!

10-Data descriptions should be placed first in the data and methods section. Add references for the IP, PU, and SV datasets.

11-Since the topic is improving HSI classification accuracy, and for ease of reading, I would recommend that the authors remove, or move to the end, the performance experiments using different window sizes and different numbers of bands (lines 347 to 366).

Author Response

Please see the attached file for the response.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

The reply of the authors satisfies the issues I raised in my first review; therefore, in my opinion, the manuscript can be published in the journal.

Reviewer 3 Report

The authors did a great job improving their interesting research paper "Dual-branch attention-assisted CNN for hyperspectral image classification", congratulations! I would ask them to further check for English mistakes to improve the quality of the paper.
