A Feature Embedding Network with Multiscale Attention for Hyperspectral Image Classification
Round 1
Reviewer 1 Report
This paper proposes a new network, MAFEN, for hyperspectral image classification. It introduces a Multiscale Attention Module (MAM) and a Feature Embedding Network to extract and learn multiscale information from features at different depths. An Adaptive Spatial Feature Fusion (ASFF) strategy is introduced to fuse features from different levels. In general, the idea seems feasible.
Below are detailed comments:
1. In the introduction section, the authors should add relevant literature on using spatial-spectral information for classification.
2. In section 2, the paper should provide a more detailed description of the model architecture and clearly label the meaning of each block in Figure 1.
3. The PCA transformation compresses the spectral information: if only the first few principal components are retained, some information is lost. Why do the authors use the principal component images, rather than the original hyperspectral images, as the data source for deep learning?
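For illustration, the trade-off this comment refers to can be sketched with a NumPy-only PCA on synthetic data (a toy example, not the authors' preprocessing pipeline): keeping only the first few principal components preserves most, but not all, of the band variance.

```python
import numpy as np

def pca_reduce(cube, n_components):
    """Project an (H, W, B) hyperspectral cube onto its first
    n_components principal components (illustrative sketch)."""
    H, W, B = cube.shape
    X = cube.reshape(-1, B).astype(np.float64)
    X -= X.mean(axis=0)                       # centre each band
    cov = X.T @ X / (X.shape[0] - 1)          # band covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    reduced = X @ eigvecs[:, :n_components]
    explained = eigvals[:n_components].sum() / eigvals.sum()
    return reduced.reshape(H, W, n_components), explained

rng = np.random.default_rng(0)
cube = rng.normal(size=(10, 10, 200))         # toy 200-band cube
reduced, ratio = pca_reduce(cube, 30)
print(reduced.shape, round(ratio, 3))         # ratio < 1: variance is discarded
```

The explained-variance ratio quantifies exactly how much spectral information the truncation discards, which is the quantity the reviewer's question hinges on.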
4. Convolution kernels of size 1x1, 3x3, and 5x5 are used to obtain multiscale information, but in this setting only local neighborhood information is captured. Why is a larger kernel size not adopted to obtain higher-level spatial or global information?
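As a back-of-the-envelope check on this point, the effective receptive field of stride-1 convolutions can be computed directly (a generic sketch, not tied to the MAFEN architecture): parallel 1x1/3x3/5x5 branches see at most a 5x5 neighborhood, whereas stacked or dilated 3x3 kernels reach farther at lower parameter cost than a single large kernel.

```python
def receptive_field(kernel_sizes, dilations=None):
    """Effective receptive field of a stack of stride-1 conv layers."""
    dilations = dilations or [1] * len(kernel_sizes)
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d                 # each layer widens the field
    return rf

# Parallel branches as in the comment: purely local context
print([receptive_field([k]) for k in (1, 3, 5)])        # [1, 3, 5]
# Stacked or dilated 3x3 convs reach much farther
print(receptive_field([3, 3, 3]))                       # 7
print(receptive_field([3, 3, 3], dilations=[1, 2, 4]))  # 15
```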
5. In Figure 2, appropriate text annotations should be added to the modules described so that readers can understand them.
6. In the figures, “Spectrum Attention” should be “Spectral Attention” for consistency.
7. I suggest the authors add a figure to illustrate the ASFF strategy described in Section 2.3.
8. Tables II-V report the standard deviation only for the OA, AA, and Kappa coefficients. It is recommended to also report the standard deviation for each category.
9. The resolution of Figure 5 is too low, and the legend size and font size are inconsistent.
10. In Section 3.3.1.2) KSC, the corresponding table is not specified.
11. The advantages of MAFEN are not reflected in the comparative experiments on the PU and Salinas datasets: both SSFTT and HybridSN achieved excellent per-category and overall accuracy. It is therefore recommended to reduce the proportion of training samples and repeat the comparative experiments to verify the superiority of the proposed model.
12. Computational performance should be evaluated and compared to other models.
13. The ablation experiments described in Section 3.3.3 were conducted only on the Indian Pines dataset. To improve credibility, I suggest the authors conduct experiments on the other datasets as well.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
It is imperative to present a plausible justification for the low percentage of data used for model training. The authors should cite several studies in the literature in which a low training percentage is adopted, with due justification.
Statistical tests are essential. Are there statistically significant performance differences between the considered methods?
Authors should consider, for example, the use of the Nemenyi test.
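For reference, the Nemenyi test suggested above compares the methods' average ranks across datasets against a critical difference (CD); the sketch below uses the alpha = 0.05 critical values tabulated in Demšar (2006) and made-up accuracy numbers, not results from the manuscript.

```python
import numpy as np

# alpha = 0.05 critical values (studentised range / sqrt(2)),
# indexed by the number of compared methods k (Demšar, 2006)
Q_ALPHA_05 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728}

def nemenyi_cd(k, n):
    """Critical difference for k methods compared on n datasets."""
    return Q_ALPHA_05[k] * np.sqrt(k * (k + 1) / (6.0 * n))

# Hypothetical OA (%) of 4 methods on 5 datasets (made-up numbers)
scores = np.array([
    [97.1, 96.4, 98.0, 98.6],
    [95.2, 95.9, 96.8, 97.3],
    [98.4, 97.9, 98.8, 99.1],
    [92.7, 93.5, 94.2, 95.0],
    [96.0, 95.1, 97.2, 97.9],
])
# Rank 1 = best (highest accuracy) on each dataset; no ties here
ranks = (-scores).argsort(axis=1).argsort(axis=1) + 1
avg_ranks = ranks.mean(axis=0)
cd = nemenyi_cd(k=scores.shape[1], n=scores.shape[0])
# Two methods differ significantly if their average ranks
# differ by more than cd
print(avg_ranks, round(cd, 3))
```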
Overall readability is good. It is essential to make sure that the manuscript reads smoothly; this definitely helps the reader fully appreciate the research findings. Some parts of the manuscript must be improved; the beginning of the third paragraph of the Introduction is an example.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
The authors propose a hyperspectral image classification method based on a feature embedding network with multiscale attention. Two modules, MAM and ASFF, are designed to improve the classification performance on HSI. Extensive experiments are conducted on five datasets to validate the effectiveness of the proposed method. This work is novel, but some issues still need to be discussed, as follows.
1. The terms "spectral" and "spectrum" are used inconsistently in the article.
2. The font in Figure 5 is very blurry, and the font size should be consistent.
3. The details of spectral attention and spatial attention should be given.
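As a generic example of the kind of detail requested here, the sketch below shows a squeeze-and-excitation-style spectral gate and a CBAM-style spatial gate, written NumPy-only with the learned projections replaced by fixed operations; this is an illustrative sketch, not the authors' MAM.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spectral_attention(x):
    """Gate each band of an (H, W, B) feature map by a weight derived
    from its global average (the learned FC layers are omitted)."""
    pooled = x.mean(axis=(0, 1))            # (B,) global average pool
    gate = sigmoid(pooled)                  # per-band weight in (0, 1)
    return x * gate                         # broadcast over H and W

def spatial_attention(x):
    """Gate each spatial position from band-wise average and max
    statistics (the learned convolution is omitted)."""
    avg = x.mean(axis=2, keepdims=True)     # (H, W, 1)
    mx = x.max(axis=2, keepdims=True)       # (H, W, 1)
    gate = sigmoid(avg + mx)                # position weight in (0, 1)
    return x * gate

rng = np.random.default_rng(0)
feat = rng.normal(size=(9, 9, 30))          # toy 9x9 patch, 30 bands
out = spatial_attention(spectral_attention(feat))
print(out.shape)                            # gating preserves the shape
```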
4. Tables II to V provide comparative results of different methods; what does "± 0.001" mean? Is it the standard deviation? In addition, corresponding values should also be provided for each category.
5. The values in Figure 10 are very blurry, and the author should make adjustments.
6. The authors mention that the proposed method is efficient; please compare the running time of the models.
7. In the method comparison section of the article, several recent methods should be added.
Moderate editing of English language required
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Please receive the review comments for Manuscript ID remotesensing-2426075. The authors have answered all my questions with significant improvements. However, the manuscript still needs a little polish:
1. F1', F2', F3' are declared but not explained in Figure 1 or in the text.
2. Annotations for the patch blocks are missing in Figures 2 and 5.
3. The subtitles (a) and (b) are duplicated in Figure 12.
Considering its current status, my review recommendation is: accept after minor revision. Kind regards,
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
The authors have answered all my questions; I have no other questions.
Minor editing of English language required
Author Response
Please see the attachment.
Author Response File: Author Response.pdf