A Convolutional Neural Network Method for Rice Mapping Using Time-Series of Sentinel-1 and Sentinel-2 Imagery
Round 1
Reviewer 1 Report
Dear authors
This article aims to identify rice fields in Mazandaran province, Iran, from optical and radar time series of Sentinel-1 and Sentinel-2. The main novelty is the convolutional neural network method used. In general, it is an interesting article, but some points could be improved.
Below are my questions and suggestions for improving the article:
In the abstract, it is not necessary to mention wheat; please focus only on rice. In addition, only one sentence presents the results. Please add more information about the results and conclusions to the abstract.
In the Introduction (line 60), you mean SPOT instead of SOPT, right?
In line 123, the same reference is repeated.
In line 233, you mean SNAP instead of SANP’s, right?
Why were only 5% of the samples used for training and validation?
If you used NDVI as a vegetation index for the optical images, why didn’t you also use a vegetation index for the radar images?
In Figure 5, it seems that you used only NDVI for the optical images. Did you use all the multispectral information in the Sentinel-2 bands?
I also recommend combining the results and discussion into a single section; some of my earlier questions would not have arisen had these points been explained clearly there. However, some questions remain, so please expand the results and discussion, and compare your findings with the results of other articles.
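For reference, the two kinds of index raised in the comments above can be sketched as follows. This is a minimal illustration, not the authors' processing chain: NDVI uses the standard (NIR - Red) / (NIR + Red) definition (for Sentinel-2, typically bands B8 and B4), and the radar index shown is one commonly used dual-polarization form of the Radar Vegetation Index, 4·VH / (VV + VH), which assumes backscatter in linear power units rather than dB. The function names and the `eps` guard are hypothetical choices made for this example.

```python
import numpy as np

def ndvi(nir, red, eps=1e-12):
    """NDVI = (NIR - Red) / (NIR + Red); eps guards against a zero denominator."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)

def db_to_linear(db):
    """Convert backscatter from decibels to linear power units."""
    return 10.0 ** (np.asarray(db, dtype=float) / 10.0)

def dual_pol_rvi(vv, vh, eps=1e-12):
    """One common dual-pol RVI form: 4 * VH / (VV + VH), inputs in linear power.

    Assumption: this is an illustrative choice of radar index, not one
    named in the manuscript or in the review.
    """
    vv = np.asarray(vv, dtype=float)
    vh = np.asarray(vh, dtype=float)
    return 4.0 * vh / (vv + vh + eps)

# Example with made-up values: one optical pixel and one SAR pixel given in dB.
example_ndvi = ndvi(nir=0.5, red=0.1)
example_rvi = dual_pol_rvi(db_to_linear(-10.0), db_to_linear(-16.0))
```

Applied per acquisition date, either function yields one index value per pixel, so both sensors would contribute a single-channel time series.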
Best regards
Author Response
Please see the attached file.
Author Response File: Author Response.pdf
Reviewer 2 Report
This manuscript reads like a technical report. 1) Among the explanatory variables, the radar data are taken every 14 days, whereas all available optical data are used. Why? 2) There is no analysis of the importance of the explanatory variables. 3) In Figure 8, a zoomed-in view of part of the map should be added. 4) The discussion section needs to be rewritten: the analysis is insufficient, and the comparison with the existing literature should be strengthened. First, the section should be divided into several subsections; second, the analysis of VHR images is unnecessary.
Author Response
Please see the attached file.
Author Response File: Author Response.pdf
Reviewer 3 Report
1. The readability and presentation of the study should be further improved. The paper suffers from language problems. I suggest having it proofread by a native speaker or a proofreading service.
2. It is recommended to revise the abstract by highlighting the novelty of this study.
3. It is recommended to summarize the contribution of this study at the end of the introduction.
4. Please state the claims of this study clearly and link each of them to the related experimental evaluations.
5. It is recommended to compare the effectiveness of the proposed algorithm with existing work.
6. The discussion section should be written in a more focused, argumentative way. The authors should analyze why the reported results were achieved.
Author Response
Please see the attached file.
Author Response File: Author Response.pdf
Reviewer 4 Report
This manuscript presents a novel multi-stream deep feature extraction method. The first and second streams extract deep features from the NDVI time series and the original SAR images, respectively, while the third stream integrates these features at multiple levels. In addition, an attention mechanism and group dilated convolution are adopted to enhance the model's ability to extract features.
In general, the paper is well organized and gives a detailed description of the proposed model. Experiments are also carried out to demonstrate its performance. Below are some comments that I hope the authors will find useful for improving their manuscript.
1. Readers who do not work with agricultural data may not be familiar with NDVI. After looking into it, I found it to be a very interesting data type. Could the authors also describe NDVI in the paper?
2. The authors emphasize the proposed framework several times, but from a reader's point of view the methods section is less specific than the other sections. The authors should abstract the problem into a concrete model and formulate it mathematically.
3. Could the authors also express the CAM module and the group dilated convolution as formulas?
4. The CAM module is composed of channel attention and spatial attention. I hope the authors can introduce their functions and advantages. With the development of deep learning, many attention mechanisms have become available. Why did the authors choose CAM? Have you tried any others?
5. All the tables in the paper need to be centered.
6. I think the third stream, which combines the features, should also play an important role in feature extraction. I hope the authors can design an experiment that removes the third stream to demonstrate the necessity of feature fusion.
7. In Section 3.2, the authors present the proposed multi-stream deep feature extraction framework, but they do not clearly explain the motivation behind its design or how the network works.
8. In Section 4.2, the authors illustrate the classification performance with confusion matrices. This may not be a good choice for comparing the classification performance of different methods. We suggest that the authors try other metrics to make the comparison more intuitive.
9. Some details: the statements about the proposed framework in the abstract, introduction, main text, and conclusion seem somewhat repetitive, and some figures are not visually polished enough.
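As an illustration of comment 8, summary metrics that are easier to compare across methods can be derived directly from a confusion matrix. The sketch below is a hedged example, not the authors' evaluation code: the function name `metrics_from_confusion` is hypothetical, the matrix values are made up, and it assumes rows are reference classes and columns are predicted classes.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Derive summary metrics from a square confusion matrix.

    Assumption: rows = reference (true) classes, columns = predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)               # correctly classified samples per class
    ref_totals = cm.sum(axis=1)    # samples per reference class
    pred_totals = cm.sum(axis=0)   # samples per predicted class

    # Per-class recall and precision; classes with no samples get 0.
    recall = np.divide(tp, ref_totals, out=np.zeros_like(tp), where=ref_totals > 0)
    precision = np.divide(tp, pred_totals, out=np.zeros_like(tp), where=pred_totals > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)

    overall_accuracy = tp.sum() / total
    # Cohen's kappa: agreement corrected for chance agreement.
    expected = (ref_totals * pred_totals).sum() / total**2
    kappa = (overall_accuracy - expected) / (1.0 - expected)

    return {
        "overall_accuracy": overall_accuracy,
        "kappa": kappa,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Made-up two-class example (rice, non-rice), for illustration only.
example = metrics_from_confusion([[50, 10],
                                  [5, 35]])
```

Reporting one row of such scalars per method (overall accuracy, kappa, per-class F1) allows a side-by-side table instead of several separate matrices.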
Author Response
Please see the attached file.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Your manuscript has been sufficiently improved, and I accept it in its present form. Congrats on the research!
Reviewer 4 Report
The authors have addressed all my comments, and the manuscript can be accepted.