Article
Peer-Review Record

Early-Season Crop Identification in the Shiyang River Basin Using a Deep Learning Algorithm and Time-Series Sentinel-2 Data

Remote Sens. 2022, 14(21), 5625; https://doi.org/10.3390/rs14215625
by Zhiwei Yi, Li Jia, Qiting Chen *, Min Jiang, Dingwang Zhou and Yelong Zeng
Submission received: 12 September 2022 / Revised: 29 October 2022 / Accepted: 31 October 2022 / Published: 7 November 2022
(This article belongs to the Special Issue Monitoring Crops and Rangelands Using Remote Sensing)

Round 1

Reviewer 1 Report

The authors of this study exploit the potential of deep learning algorithms and time-series Sentinel-2 data for early-season crop identification. Four classifiers, including two deep learning algorithms and two shallow machine learning algorithms, were tested using Sentinel-2 images in the Shiyang River Basin. Here are the three concerns about the manuscript.

 

1. In the introduction section, please add information on how current studies approach early-season crop classification.

 

2. When the classification uses only half of the VI time series, how was the SG filter applied? Did you use data covering the whole growing season or only the observations available at that point?

 

 

3. The input data contain 0 fill values. How do the fill values influence the results?
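Concern 2 can be illustrated with a minimal Python sketch. It is not the authors' processing chain: the NDVI values, the cutoff and the edge handling (end samples left unsmoothed) are assumptions made for illustration, and only the standard 5-point quadratic Savitzky-Golay kernel is used. It shows that the smoothed features near the early-season cutoff differ depending on whether the filter sees the whole season or only the observations available at mapping time.

```python
# Standard 5-point quadratic Savitzky-Golay smoothing coefficients.
SG5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def sg_smooth(series):
    """Smooth interior points; the two samples at each end stay raw (an assumption)."""
    out = list(series)
    for i in range(2, len(series) - 2):
        out[i] = sum(c * series[i + k] for k, c in zip(range(-2, 3), SG5))
    return out


# Hypothetical NDVI series at a fixed revisit interval (values invented for illustration).
ndvi_full = [0.12, 0.15, 0.13, 0.22, 0.35, 0.30, 0.55, 0.68, 0.72, 0.70, 0.55, 0.30]
cutoff = 6  # hypothetical number of observations available at the early-season date

smoothed_full = sg_smooth(ndvi_full)            # filter sees the whole growing season
smoothed_early = sg_smooth(ndvi_full[:cutoff])  # filter sees only the available data

# Absolute difference in the features a classifier would receive for the same dates.
diff = [abs(a - b) for a, b in zip(smoothed_full[:cutoff], smoothed_early)]
```

In this sketch the first four values agree and the last two differ, which is why the manuscript should state which of the two variants was used.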

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

A very interesting paper on a highly topical issue, very well presented and analysed. It takes advantage not only of the high revisit frequency of Sentinel-2 but also of the benefit of characterizing crops early using different classification techniques, including deep learning algorithms.

I suggest some minor changes.

-  “DOY103 was in early March, when the crops in the Shiyang River Basin had not yet been sown” (3.2. Experimental design, page 2)

However, I consider DOY103 to fall within the first fortnight of April.

-   The F1 score is introduced in section 3.2. Experimental design (page 9) and defined for each class individually by equation 6, but its meaning is not explained, and most of the time it is referred to as if it applied to the whole classification.

 

-   Table 3 (4.4. Early crop mapping in the Shiyang River Basin, page 16) is spread across two pages. It should not be split.
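The DOY103 remark above can be checked with a short Python sketch. The helper name doy_to_date is hypothetical, and the years 2019 and 2020 are taken from the study period mentioned elsewhere in this record.

```python
from datetime import date, timedelta


def doy_to_date(year, doy):
    """Convert a 1-based day of year to a calendar date."""
    return date(year, 1, 1) + timedelta(days=doy - 1)


d2020 = doy_to_date(2020, 103)  # leap year: 12 April 2020
d2019 = doy_to_date(2019, 103)  # non-leap year: 13 April 2019
```

In both years DOY103 falls in the first half of April, not in early March.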

Author Response

Please see the attachment

Author Response File: Author Response.doc

Reviewer 3 Report

 

The paper assesses the potential of deep learning algorithms and Sentinel-2 time-series data for early-season crop identification. It compares four classifiers (two deep learning algorithms: one-dimensional convolutional networks and long short-term memory networks; and two shallow machine learning algorithms: random forest and support vector machine) using time series with three different temporal resolutions (one observation every 5, 10 and 15 days) and durations in the Shiyang River Basin. Generally, the document is well organized and written. However, my main concern is the way the accuracy assessment was carried out. As it stands, it does not allow the authors to draw robust conclusions about the compared results, because it does not permit assessing whether the differences in accuracy estimates are statistically significant or not. I explain this point further below.

The training and ground truth samples need to be described in more detail. I understand that the sampling is opportunistic due to the logistical constraints of fieldwork. Please indicate the type of sampling (opportunistic, random …). Figure 1, “Study area and locations of ground truth samples”, is too small to appreciate the spatial distribution of the samples. It is unclear whether cropland plots are defined as a small area around the center coordinates recorded by GPS (e.g. 5x5 pixels) or whether they depend on the size of the field (in which case the polygon of the crop field is needed). Note that from a statistical point of view, the independent observations are the plots, not the pixels (so N = 654 for 2020).

There were two types of accuracy assessment. The first one aimed to assess the classifications of the 2020 ground samples to compare the performance of the different classifiers/input data. The second one aimed to evaluate the accuracy of the early-season crop map for 2020. In both cases, accuracy assessment is not based on random sampling. Maybe authors can argue that there is not an obvious bias in the sampling (e.g. only samples in flat areas can be considered as a biased sampling). I think it is essential to make clear the type of sampling and the possible consequences on the statistical rigour of the assessment.

In the case of the map assessment, there is no bias correction for the difference between each category's map area and the number of samples for that category (see the Card or Olofsson papers). This is important because, without these corrections, the accuracy estimates are biased. It seems that the authors used R to make the plots; there is an R package called mapaccuracy that can perform all these corrections. Note that the Card method also provides an estimate of each category’s area (taking into account the commission and omission errors from the confusion matrix).
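As a hedged illustration of this correction, the following Python sketch applies the area-weighted (stratified) estimator described in the Card and Olofsson papers to a confusion matrix whose rows are map categories and whose columns are reference categories. The function name, the two-class counts and the map area proportions are invented for illustration and are not taken from the manuscript.

```python
def area_adjusted_accuracy(counts, map_weights):
    """Area-weighted accuracy estimates from a sample confusion matrix.

    counts[i][j]: samples mapped as class i whose reference class is j.
    map_weights[i]: proportion of the map area assigned to class i (sums to 1).
    Assumes every map class has at least one sample.
    """
    q = len(counts)
    row_totals = [sum(row) for row in counts]
    # Estimated area proportions: p_ij = W_i * n_ij / n_i.
    p = [[map_weights[i] * counts[i][j] / row_totals[i] for j in range(q)]
         for i in range(q)]
    # Column sums estimate the true area proportion of each reference class.
    area_props = [sum(p[i][j] for i in range(q)) for j in range(q)]
    overall = sum(p[i][i] for i in range(q))
    users = [p[i][i] / map_weights[i] for i in range(q)]
    producers = [p[j][j] / area_props[j] for j in range(q)]
    return overall, users, producers, area_props


# Hypothetical example: class 0 covers 20% of the map but has half of the samples.
counts = [[90, 10], [5, 95]]
overall, users, producers, area_props = area_adjusted_accuracy(counts, [0.2, 0.8])
naive_overall = (90 + 95) / 200  # unweighted estimate, for comparison
```

With these invented numbers the weighted overall accuracy (0.94) differs from the unweighted one (0.925), and the estimated area proportion of class 0 (0.22) differs from its mapped proportion (0.20).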

The uncertainty (confidence intervals) of the estimates of the accuracy indices and the areas derived from the classified image was not computed. So, it is impossible to know whether the difference between obtained accuracies using the different scenarios is significant. In order to get robust conclusions from the comparisons, it is crucial to assess the uncertainty of the estimates and carry out statistical tests to evaluate the significance of the differences between methods. See the paper of Card (1982) or more recent and easy to find Olofsson (2013, 2014) for methods that provide the confidence intervals of the estimates. Note that in the present case, all the results are assessed with the same sample. Foody (2004) presents a method to carry out such comparisons.
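For two classifiers assessed on the same sample, the comparison described by Foody (2004) is McNemar's test on the discordant cases. The Python sketch below assumes each test unit has a correct/incorrect flag for both classifiers; the function name and the counts are hypothetical, and the p-value uses the normal approximation without continuity correction.

```python
import math


def mcnemar_test(correct_a, correct_b):
    """Return (z, two-sided p) for paired correct/incorrect flags of two classifiers."""
    # f12: A correct and B wrong; f21: B correct and A wrong.
    f12 = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    f21 = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    if f12 + f21 == 0:
        return 0.0, 1.0  # no discordant cases, no evidence of a difference
    z = (f12 - f21) / math.sqrt(f12 + f21)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, standard normal
    return z, p_value


# Hypothetical sample: 80 units both correct, 8 both wrong, 10 only A correct, 2 only B correct.
flags_a = [True] * 80 + [False] * 8 + [True] * 10 + [False] * 2
flags_b = [True] * 80 + [False] * 8 + [False] * 10 + [True] * 2
z, p_value = mcnemar_test(flags_a, flags_b)
```

Only the discordant units enter the statistic, so the test depends on how many independent units are counted, not on the overall accuracies alone.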

Please indicate the software used to perform the classifications in the materials section. If free software such as R or Python was used, it would be great to share the code so that other users can test your methods.

 

Additional specific comments

p 1 I am not sure that “low labor cost” is clear.

In the introduction and the description of the study area, make clear whether only irrigated agriculture is concerned or both irrigated and rain-fed agriculture.

(Pal and Mather, 2003; Pal and Mather, 2004). I guess the correct format is without repeating the authors' names, only the date (Pal and Mather, 2003 and 2004).

“Currently, deep learning algorithms such as the Conv1D network and LSTM network have been applied to crop classification with good results”. I suggest avoiding subjective terms such as “good”. If the evaluation focused on accuracy, “accurate” can replace “good” (and the level of accuracy could perhaps be mentioned).

There are many problems with cross-references of figures along the text (Error! Reference source not found.).

Figure 1. Study area and locations of ground truth samples. It is complicated to see the sample points and appreciate their distribution.

Table 1. Please add a new row with the column sums (totals). That will help readers link the text and the table values (for instance, Crop plots of 654 and 27,843 training sample points in section 3.3. Accuracy assessment).

Figure 6. There is no shading around lines for categories alfalfa and wheat

4.2. Classification performances of the different combinations of classification strategies

The different classifiers had different classification performances. The best classification accuracy was 0.87 for the Conv1D when using the full time-series data (i.e., end-of-season mapping), followed by 0.85 for the LSTM network and the SVM network, while the accuracy was only 0.82 for the RF algorithm.

The accuracy estimates are pretty similar, so without an evaluation of the uncertainty of the estimates, it is impossible to know if one method performs significantly better than another.

Table 3. Confusion matrix of the early-season crop map for 2020 in the Shiyang River Basin obtained using the Conv1D network. The accuracy indices' estimates are incorrect due to non-random sampling and the absence of Card correction.

2.2.4. Feature selection.

Feature selection consists in reducing a large set of features by selecting effective features and discarding the redundant features from the original dataset. I do not feel that computing different vegetation indices is true “feature selection” (it only creates new variables, which can then be selected by applying feature selection).

Using only VIs is somewhat contradictory to a point raised in the discussion (“The different shapes, pigment levels, and moisture contents of the crops cause the spectral properties of the crops to differ”). The VIs enable the assessment of differences in the crops’ phenology. However, differences in pigment could be detected more accurately using all the spectral bands.


5.2. Factors decreasing the accuracy of the early crop mapping

“samples and full season images for the mapping year (SFSMY).using a confusion matrix (     ).” It seems that something is missing between the parentheses.

Authors mention that “The overall accuracy of the HSES decreased by 0.04 and the Kappa coefficient decreased by 0.05 compared to the HSFS; while the overall classification accuracy decreased by 0.13 and the kappa coefficient decreased by 0.16 compared to the SFSMY. This indicates that the spectral variations in the crop itself between 2019 and 2020 in the Shiyang River Basin were an important factor limiting the accuracy of the early crop mapping.” These conclusions should be based on a statistical test to assess the difference in accuracy.

 

Discussion

Compared to HSFS and SFSMY, the F1 scores of HSES decreased by 0.03 and 0.12 for sunflower, and 0.04 and 0.13 for corn. It suggests that the spectral variations of the sunflower and corn between the two years was a major factor reducing their early mapping accuracies. Compared to HSFS and SFSMY, the F1 scores of HSES for alfalfa decreased by 0.05 and 0.08, and those for wheat by 0.01 and 0.02

In this case, the conclusions should be based on a statistical test that assesses whether the observed decrease in accuracy is statistically significant.

I hope that these comments will help the authors in improving the paper.

 

References on accuracy assessment (methods to correct for biased samples):

Card, D., 1982. Using known map category marginal frequencies to improve estimates of thematic map accuracy, Photogrammetric Engineering and Remote Sensing, 48(3), pp. 431-439.

Olofsson, P., Foody, G. M., Stehman, S. V., Woodcock, C. E., 2013. Making better use of accuracy data in land change studies: Estimating accuracy and area and quantifying uncertainty using stratified estimation, Remote Sensing of Environment, 129, pp. 122-131.

Olofsson et al., 2014. Good practices for estimating area and assessing accuracy of land change, Remote Sensing of Environment, 148, pp. 42-57.

Foody, G. M., 2004. Thematic map comparison: evaluating the statistical significance of differences in classification accuracy, Photogrammetric Engineering and Remote Sensing, 70, pp. 627-633. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.476.9115&rep=rep1&type=pdf

 

 

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have replied well.

Author Response

Thank you very much for this positive comment. 

Reviewer 3 Report

It is difficult to assess how the document was improved due to the absence of a version with change tracking.
However, it seems that elements in the response letter are not in the manuscript. For instance, in the letter authors clarified that the sampling is opportunistic and explained why they chose it, but it seems that these important considerations are not in the new version. Moreover, the authors used new accuracy assessment methods following my suggestions but did not present these methods in the method section. For example, a short presentation of Card's correction and McNemar's method should be included in section 3.3. Accuracy assessment.

Finally, I feel that the computation of the accuracy indices' confidence intervals (Table 3) and of McNemar's test (Table 5) is wrong because the authors used the number of pixels, not the number of plots, as the sample size. This artificial increase in sample size led to very small confidence intervals and to minimal differences in accuracy being considered significant. The 654 crop plots covering 27,843 pixels are not 27,843 independent observations, so N = 654, and the number of plots should populate the confusion matrix.
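As a rough illustration of this point, the Python sketch below compares the half-width of a simple binomial 95% confidence interval for N = 27,843 and N = 654. The accuracy of 0.87 is used only as an example value, and the simple binomial formula is an assumption; it is not the estimator used in the manuscript.

```python
import math


def ci_half_width(accuracy, n, z=1.96):
    """Half-width of a normal-approximation binomial confidence interval."""
    return z * math.sqrt(accuracy * (1 - accuracy) / n)


hw_pixels = ci_half_width(0.87, 27843)  # pixels treated as independent observations
hw_plots = ci_half_width(0.87, 654)     # plots treated as independent observations
ratio = hw_plots / hw_pixels            # equals sqrt(27843 / 654)
```

Under these assumptions the interval based on plots is about 6.5 times wider than the one based on pixels, so differences of one or two points in accuracy that look significant at the pixel level may not be significant at the plot level.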
I understand that it is difficult to assess at the plot level because a plot corresponds to one crop category in the field but to several pixels, possibly of different categories, in the classified image. To obtain a confusion matrix at the plot level, the authors can consider only the central pixel of the plot or, better, the majority category among the pixels belonging to the plot.
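The majority-category option can be sketched in Python as follows. The record layout (plot identifier, reference class, predicted class per pixel), the function name and the sample values are assumptions for illustration; ties are resolved in favour of the class encountered first, which is an arbitrary choice.

```python
from collections import Counter, defaultdict


def plot_level_confusion(pixels):
    """Build a plot-level confusion matrix from per-pixel records.

    pixels: iterable of (plot_id, reference_class, predicted_class).
    Returns a Counter keyed by (reference_class, majority_predicted_class).
    """
    reference = {}
    votes = defaultdict(Counter)
    for plot_id, ref, pred in pixels:
        reference[plot_id] = ref      # one reference class per plot
        votes[plot_id][pred] += 1     # count predicted classes within the plot
    confusion = Counter()
    for plot_id, counter in votes.items():
        majority = counter.most_common(1)[0][0]
        confusion[(reference[plot_id], majority)] += 1
    return confusion


# Hypothetical pixels from three plots.
pixels = [
    ("p1", "corn", "corn"), ("p1", "corn", "corn"), ("p1", "corn", "wheat"),
    ("p2", "wheat", "corn"), ("p2", "wheat", "corn"), ("p2", "wheat", "wheat"),
    ("p3", "alfalfa", "alfalfa"),
]
confusion = plot_level_confusion(pixels)
```

The entries of the resulting matrix sum to the number of plots (3 here), so N in any subsequent confidence interval or McNemar computation is the number of plots rather than the number of pixels.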

The justification based on the complexity of crop patterns is weak because a stratified random sampling based on the map could resolve this point. Moreover, the authors should discuss the fact that the samples are not randomly selected. This is not a problem for training but it is for testing (assessing accuracy).

Author Response

Please see the attachment.

Author Response File: Author Response.docx
