Article
Peer-Review Record

The Self-Supervised Spectral–Spatial Vision Transformer Network for Accurate Prediction of Wheat Nitrogen Status from UAV Imagery

Remote Sens. 2022, 14(6), 1400; https://doi.org/10.3390/rs14061400
by Xin Zhang 1, Liangxiu Han 1,*, Tam Sobeih 1, Lewis Lappin 2, Mark A. Lee 3, Andew Howard 4 and Aron Kisdi 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 27 January 2022 / Revised: 4 March 2022 / Accepted: 8 March 2022 / Published: 14 March 2022

Round 1

Reviewer 1 Report

The authors propose a new deep learning network structure for predicting wheat nitrogen status. In general, the paper is well organized and well written. However, a few problems need modification, and they are described as follows:

1) The paper focuses on wheat. Thus, please limit the title to wheat rather than all crops.

2) Part 2.1: there are many studies on non-destructive crop nitrogen estimation. Normally, they can be classified into empirical model methods (including deep learning methods), mechanistic model methods, and hybrid model methods. Thus, this part does not describe existing studies well.

3) The following parts need to be described in detail: i) How were the UAV images collected, and how were they preprocessed (mosaicking, geo-correction)? ii) How was image augmentation performed in Part 3.3?

4) Discussion part: a comparative analysis with existing studies is lacking.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

The paper is well-written and complete. The experimental design is adequate, and the reading was overall enjoyable.

The authors properly motivate the paper stating that identifying optimal nitrogen fertilizer usage for specific crops is good for several reasons. However, the proposed granularity for N levels seems insufficient for such an objective. Still, results are impressive for the scale used.

The ablation study was informative, and the authors explored different modules in the networks to provide evidence that the attention block was more efficient than its counterparts.

Overall, the paper reads very well, and I recommend it for publication. 

A minor correction: "SSL can be broadly divided into Generalized Adversarial Networks (GANs) and Contrastive learning [47]". I believe the correct statement would be "SSL can be broadly divided into generative and/or contrastive methods." Generative adversarial networks are only one of the various existing generative-contrastive approaches. 

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

Specific comments:

Page 2: “One of the most commonly used spectral analysis methods is to use vegetation indexes (VI) based on specific wavelengths to predict the crop N content and status, such as Normalized Difference Vegetation Index (NDVI)[9] and Leaf area index (LAI)” - NDVI and LAI are conceptually very different from each other, in that NDVI is directly computed from multispectral digital imagery data, while LAI requires some sort of ground-based data collection to “calibrate” VI-based maps derived from remotely sensed data.

“However, the N content of crops is considered a complex problem”: the assertion should be changed into something like this: “However, the indirect determination of N content of crops from remotely sensed data is considered a complex problem”.

Page 3: “Over the past two decades, remote sensing technology has been considered one of the most promising methods to provide a non-destructive way in which to measure and estimate crop N content and status in fields and wider environments” - Remove from above text “measure and”.

Page 3: “The spectral information in these regions are considered to measure the biological (e.g. photosynthetic pigments, chlorophylls) and morphological (leaf area, canopy density) features of the crop and thus derive the N content and status[3].” - Change into something like: “The spectral information in these regions is considered to indirectly estimate the biological (e.g. photosynthetic pigments, chlorophylls) and morphological (leaf area, canopy density) features of the crop and thus derive the N content and status[3].”

Page 16: “To avoid the effect of shadows on images, all the data in this work are taken at 11-12 AM to reduce the shadow of the plants and ensure sufficient light conditions.” - This is a heavy constraint that greatly reduces the method's applicability in a real operational context, because common surveys, which are usually much larger than the proposed experimental plot, require long acquisition times, and the proposed time window is too narrow for this. Longer (more realistic) acquisition times, which exceed the proposed time window, would very likely completely invalidate the good performance indexes shown for the proposed method.

General comments:

The usability of the proposed method in a real situation seems quite questionable, since in a real field the investigated parameter “N content” varies continuously across the field itself. In other words, real fields aren’t simple patches of square areas, each characterized by a constant value of “N content”, which is what has been used for the training of the proposed method. It is reasonable to presume that a method that has been trained on such a non-realistic situation won’t be able to perform as well in real conditions. In any case, the paper should contain a warning to this effect in its conclusion.

The initial assumption of “low”, “medium” and “high” fertilization rates does not take into account the nitrate endowment of the soil. Therefore, the three doses may not represent the optimal choice for the specific soil and the specific crop type. In other words, the paper only addresses the ability to discern the effect on vegetation of the dispensed quantities of nitrogen (we do not know whether that effect is optimal), but it does not touch at all on the problem of dose optimality for the specific crop and the specific soil. This can be reached only by integrating in-field physiology measurements aimed at establishing the optimal nutritional condition of the plants. We must remember that the farmer’s objective is not so much to detect healthy and green leaves in the field, but rather to maximise production and/or its quality.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 4 Report

For automatic and accurate crop N status estimation, the work proposes a self-supervised spectral-spatial attention-based transformer network (SSVT). The manuscript is well written, and results are presented and discussed, considering previous studies. Minor suggestions/comments follow:


Page 3: the sentence "The rest of this paper is organized as follows: Section 2 presents the related work; Section 3 details the proposed method; In Section 4, the experimental evaluation is described; Section 5 concludes the proposed work and highlights the future work" can be removed (it is not necessary).


Section 2. Related work: I suggest including this section in the "Material and methods" section. (e.g., Vision Transformer and Self-supervised learning (SSL) can be included in the methods). 


Page 9: the unit of measure for the image size should be added. Moreover, a scale bar should be added to Figs. 4, 6, 12, 13, and 14.

Author Response

Please see the attachment.

Author Response File: Author Response.docx
