Article
Peer-Review Record

Crop Rotation Modeling for Deep Learning-Based Parcel Classification from Satellite Time Series

Remote Sens. 2021, 13(22), 4599; https://doi.org/10.3390/rs13224599
by Félix Quinton and Loic Landrieu *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 11 October 2021 / Revised: 28 October 2021 / Accepted: 10 November 2021 / Published: 16 November 2021

Round 1

Reviewer 1 Report

This manuscript presents the first deep learning approach to simultaneously model the inter- and intra-annual agricultural dynamics for parcel classification, along with simple training adjustments. The proposed model was applied to an area of 110 × 110 km² in the South East of France and demonstrated good performance. The research results are of some interest to the scientific community, but the manuscript needs to be improved, both in its structure and its contents. Please consider the following comments when preparing a revised version of the manuscript.

 

1. On the whole, the manuscript is written like a technical report. The proposed model framework was not compared with other models, and there is no discussion section in the manuscript. The discussion section should clearly establish the link between the introduction, the study objectives, and the results. In other words, it should objectively answer the question of whether, and in what way, the proposed method is superior to previous methods. It should also state the limitations of the study.

 

2. Is the proposed deep learning framework intended for a specific study area, or is it a general technical framework? If it is for a specific study area, please supplement the overview of the study area. If it is a general technical framework, please verify its accuracy in multiple experimental areas.

 

3. Generally speaking, the training data are very important for a deep learning network. It is suggested that the authors analyze the input training data and calculate the training data error, so as to better prove the performance of the proposed framework.

 

4. L12: "monitoring the crops types of for subsidy allocation" should be corrected.

 

  1. L21,"farmers declare the crop cultivated in each of their parcels every year. " What is the accuracy and coverage of the above declared data? If the accuracy and coverage are high, is it better to spatialize the data than to use deep learning for classification? If the accuracy and coverage are not high, how to ensure the accuracy of the deep learning network training data?

 

6. L108: "We consider that the history of a parcel is completely described by its past cultivated". I think this assumption may not be suitable for all research areas.

 

7. The labels are missing in Figures 4 and 7.

Author Response

We thank the reviewer for their comments. Please find attached a revised version of the manuscript, and below detailed answers to the reviewer's questions.

1) Our method aims to classify parcels by leveraging multi-year data. To the best of our knowledge, only two other approaches in the literature (CRFs [32,33] and observation stacking [34]) allow the classification of multi-year time series. We compare our method to (i) prediction from a single year, (ii) smoothing with CRFs, (iii) observation stacking, and (iv) variations of our model. In this sense, we have compared our approach to all other relevant models.

The PSE+LTAE is not in itself part of our contribution, but simply our choice of backbone network (i.e., the spatio-temporal encoder chosen to learn the features), as it constitutes the clear state-of-the-art for parcel classification from SITS. In order for our message to keep a clear focus, we do not evaluate the performance of different single-year encoder networks, as the choice of spatio-temporal encoder network is unrelated to the scope of our article and already extensively covered in the literature. We refer the reader to [6,17-20] for comparisons between the performance of different encoders.

We added a discussion section as recommended, in which we explicitly address this important point.

2) Given the large amount of data involved and the complexity of data collection, we have limited our analysis and open-access dataset to a single area of the French Metropolitan territory. While nothing in our method is specific to this area, some of our analyses may be biased by the preponderance of stable cultures such as vineyards in this area. In order to confirm the generality of our conclusions, we would require a dataset with parcels taken from regions across the world with various meteorological conditions and agricultural practices. This task is complicated by the lack of harmonization between LPIS in terms of open-access policy and even nomenclature. We hope that our results will encourage mapping agencies across Europe to release multi-year LPIS as open data, to help constitute a truly global dataset allowing the spatial generalizability of state-of-the-art methods to be assessed. We added this limitation in the discussion section. Finally, note that the specific study area is already represented in its entirety in Figure 3a.

3) On average, our best model performs at 84.7% mIoU and 98.1% OA on the training sets (Table 3). We added this information.

4) Corrected, thank you for pointing it out.

5) We refer you to L162 for this information: the coverage of the LPIS is total and the accuracy is self-reported as 97%.

6) Indeed, we presented this hypothesis as one of our simplifying assumptions; we are in no way presenting it as a true statement. We evaluated the impact of taking past observations as the history of the parcel instead of the declarations, and obtained worse results. We conclude that, in our precise setting, our assumption is validated experimentally.

7) The colormap of labels is the same across all figures, including Figures 4 and 7. We now state this explicitly.

Author Response File: Author Response.pdf

Reviewer 2 Report

Dear authors, please see my comments in the attached WORD file

Comments for author File: Comments.pdf

Author Response

We thank the reviewer for their insightful comments. We reply point by point below, and have attached a revised version of the manuscript.

1 - Regarding structure and templates, we have checked with the editor and we are following the correct template for this special issue.

2 - We are indeed from the machine learning community. We thank the reviewer for their constructive criticisms, which will allow us to make this paper more easily understandable to the RS community.

3 - Your understanding of our setting and method is indeed correct: each parcel is a geo-referenced polygon for which we have one label and one time series of cropped Sentinel-2 images. Our model classifies the parcels by mapping their time series to a single class representing the prediction for the cultivated crop type. The prediction is done by parcel/polygon and not by pixel. The misunderstanding comes from the use of "mIoU". While the IoU (or Dice score, or Jaccard index) can indeed be a measure of the quality of shape prediction, it is also ubiquitous in the CV/ML communities as a measure of the quality of a classification (IoU = TP / (TP + FP + FN)). We did not realize that this metric was not yet as standard in the RS community, and added the details and formulas necessary to avoid this understandable confusion.
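For concreteness, a minimal sketch of this classification mIoU (illustrative only, not taken from our codebase):

```python
import numpy as np

def classification_miou(y_true, y_pred, num_classes):
    """Per-class IoU = TP / (TP + FP + FN), averaged over classes (mIoU)."""
    ious = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        if tp + fp + fn > 0:  # ignore classes absent from both truth and prediction
            ious.append(tp / (tp + fp + fn))
    return float(np.mean(ious))

# Toy example: three parcels, two crop classes
print(classification_miou(np.array([0, 1, 1]), np.array([0, 1, 0]), 2))  # 0.5
```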

4 - Each parcel has a unique label per year: its main crop ("culture principale"). We do not refer to intra-annual rotations in the article, but use "intra-annual dynamics" to refer to the temporal changes of the observations within one year. We made this distinction clearer to avoid misunderstandings.

5 - As stated L146: "We split our data into 5 folds for cross validation. For each fold, we train on 3 folds and use the last fold for calibration and model selection." This means that our train/validation/test ratio is 60%/20%/20% for each fold (but the evaluation metrics are given across all test folds, i.e., the entire dataset). We now state this explicitly.
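As a sketch, such a split can be reproduced as follows (illustrative; the choice of which fold serves as calibration is an assumption, not our exact protocol):

```python
import numpy as np

rng = np.random.default_rng(0)
parcels = rng.permutation(np.arange(1000))  # placeholder parcel indices
folds = np.array_split(parcels, 5)

for k in range(5):
    test = folds[k]                # 20% test
    val = folds[(k + 1) % 5]       # 20% calibration / model selection
    train = np.concatenate(
        [folds[i] for i in range(5) if i not in (k, (k + 1) % 5)]
    )                              # remaining 60% training
```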

6 - While your suggestion to train a model on 2018 and 2019 and evaluate it on 2020 would indeed lead to a more realistic scenario, we argue that there is not enough data available to run this experiment in a meaningful manner. Indeed, as the Sentinel-2 mission only became fully operational in 2017, we only have access to full-year coverage since 2018. This means that, at the time of writing this paper, there only exist three years' worth of optical time series. In our opinion and from preliminary experiments, this prevents us from evaluating the realistic setting in which the last year is withheld from the training set: the meteorological variations between two years are typically too great to simply test on a third year and reasonably expect good results.
Instead, we use the same setting as the vast majority of work on parcel classification, namely classifying parcels after the year is over [7-20]. As more Sentinel-2 data become available, we will be able to explore this important endeavor.

7 - We have added a discussion section as recommended.

8 - 
> Let's say that you now have [...] all images for 2022, can you classify each parcel?
We are not able to run our model for 2022, as we do not yet have access to the declarations for 2021. However, we can indeed run our model for 2021 using a model trained on 2018, 2019, and 2020. Table 2 shows that mixed-year training allows for good generalization even when the training labels are from different years. However, as of now, the mixed training protocol involves training samples from the evaluation year. See our discussion above (6) for why this is.
> Can you detect which parcel has a few annotations for 2022 (i.e., a few crop cycles)?
As mentioned above (4), we do not predict intra-year rotations.
> “Predictably, the specialized models have good performance when evaluated on a test set composed of parcels from the year they were trained, and poor results for other years.”
Indeed, which is why we proposed mixed-year training as a better alternative to specialized models trained on a single year. Training models on one year (specialized models) is the standard operating procedure, and precisely not our contribution. Instead, we show that mixed-year training improves on this limitation. We clarified this crucial point.
We reiterate that the PSE+LTAE is not our proposed model, but our choice of backbone.

9 - We have removed all occurrences of the word "significantly".

10 - We have altered the caption of Figure 1 to reflect your remark.

11 - We have added exposition sentences to explain the meaning of each subtitle.

12 - We use classic cross validation, without sub-fold splitting. We added more details.

13 - We use Sentinel-2 Level-2A data without the bands B01, B09, and B10, sometimes referred to as atmospheric bands, as they are dedicated to the observation of clouds, e.g., cirrus. We reformulated this passage to make this information clearer.
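For reference, the ten retained bands can be written as a simple configuration (band names follow the standard Sentinel-2 nomenclature):

```python
# Sentinel-2 L2A bands kept after dropping the atmospheric bands B01, B09, and B10
S2_BANDS = ["B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B11", "B12"]
```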

14 - We use bilinear interpolation to resample bands with a resolution coarser than 10 m to the 10 m grid. We have made this clearer, and added a reference to the THEIA data provider.
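A minimal sketch of this resampling step (assuming PyTorch and a placeholder 20 m band; not our exact pipeline):

```python
import torch
import torch.nn.functional as F

band_20m = torch.rand(1, 1, 50, 50)  # (batch, channel, H, W): a 20 m band over a 1 km tile
band_10m = F.interpolate(band_20m, scale_factor=2, mode="bilinear", align_corners=False)
print(band_10m.shape)  # torch.Size([1, 1, 100, 100]) -> resampled to the 10 m grid
```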

15 - The stability of a parcel can easily be checked with information from the LPIS by comparing the annotated polygons across years. However, this information is not directly given by the LPIS and requires some (simple) processing. We added this limitation in the discussion section.

16 - We remove small parcels due to the Sentinel-2 resolution. We have added this clarification.

17 - The 97% accuracy report comes from our direct communication with the French LPIS provider. We made it clearer that this is a self-reported value.

18 - Higher retention means that we would be able to filter out fewer parcels with a more fine-grained analysis. We have reformulated this sentence to make it clearer.

19 - In Figure 4, we observe the intra-year radiometric variability of the parcels. As mentioned above, there is no intra-year rotation. Some parcels may display several growth cycles, but our goal is only to retrieve the main culture; other cultures are simply not reported in the French LPIS.

20 - We have moved the evaluation metrics subsection, the beginning of the section (L185-188), and Figure 5 to Materials & Methods. Figures 5a and 5b are now better divided.

21 - In Table 2, specialized models are trained with data from a single year. They are tested on data from this year as well as other years to evaluate temporal generalization.  As explained, we made sure that no model could be trained and tested on data corresponding to the same parcel, even at different years.

22 - L190-192 illustrates the limitations of training specialized models, and underlines the benefits of our proposed training scheme: mixed-year training.

23 - The "model with mixed training" is indeed M_mixed. We have reformulated this sentence for clarity's sake.

24 - In Table 3, the method is iteratively evaluated on the five folds partitioning the data at a given year, in a classic cross-validation fashion. The training set consists of all the parcels not in that fold, for all years. We have made sure that no parcel can be in both the training and test sets, even at different years.

25 - We have clarified our use of "declaration" and "annotation" (annotations are taken from the farmers' declarations).

26 - We have added a short explanation of calibration through temperature scaling, as requested.
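A minimal sketch of temperature scaling (Guo et al., 2017), assuming held-out validation logits; the optimizer choice is an assumption, not our exact implementation:

```python
import torch
import torch.nn.functional as F

def fit_temperature(logits, labels, max_iter=100):
    """Fit a single scalar T on validation logits by minimizing the NLL.

    logits: (N, C) detached model outputs; labels: (N,) ground-truth class indices.
    Calibrated probabilities are then softmax(logits / T).
    """
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T so that T > 0
    optimizer = torch.optim.LBFGS([log_t], max_iter=max_iter)

    def closure():
        optimizer.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().item()
```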



Author Response File: Author Response.pdf

Reviewer 3 Report

Accurate automated crop type mapping is of great significance for agricultural optimization and agricultural policy. The authors propose a simply modified PSE+LTAE model which considers the declared crop labels of the last two years, improving parcel classification while saving memory and computation. In addition, the authors also release the first large-scale multi-year agricultural dataset, which is of great significance for subsequent research. The following are my specific suggestions:

1. In line 101 of page 3, is TAE+LTAE written incorrectly? It should be PSE+LTAE here.

2. Figure 6 should be more standardized. The legends for time (2018) and for the different crop types cannot be the same shape, and should be distinguished (for example, use bars to represent the different crop legends).

3. The confusion matrix results in Figure 8 cannot be qualitatively represented by the rectangle size alone, and the quantitative results should be added.

4. The dataset is composed of specially selected stable parcels. How can the generalization of the model be ensured when facing dramatic changes of parcels?

5. In Part 2, Materials and Methods, only the rationale of the proposed method is introduced; the specific structure and parameters of the model should be illustrated, which would make it convenient for interested scholars to reproduce the work.

6. In Table 3, the proposed model is mainly compared with classical machine learning algorithms such as CRF. Please add quantitative comparisons of the proposed method with classical deep learning methods such as LSTM to demonstrate the advantages of the proposed method.

7. Due to the crop planting declaration policy in France, millions of annotations can be obtained each year, so the proposed model has a lot of annotated data. How does the model perform in other EU countries lacking annual labeled data? Or is the model only applicable in France?

Author Response

We thank the reviewer for their insightful comments. We reply point by point below, and have attached a revised version of the manuscript.

1. Corrected, thank you for your careful reading.

2. Indeed, thank you for this clarifying design suggestion.

3. Adding quantitative values to Figure 8 would result in a table with 400 numerical values and little legibility. We refer the reader to Table 4 for classwise metrics.

4. Classification of parcels whose shapes are not stable is outside the scope of this article. We did not investigate this as it poses numerous data collection issues and prevents associating multi-year data with parcels. We made this limitation more explicit in the discussion section.

5. Our proposed method is not the PSE+LTAE approach, but a modification of a spatio-temporal encoder in order to take multi-year data into account. The architectures of all competing approaches to handling multi-year data (i.e., the baselines) are represented in Figure 2 and detailed (with formulas) in 2.3. We do not give details on the PSE+LTAE approach because (i) this is not our contribution, (ii) we used the standard parameterization directly from the official repository, (iii) we refer the reader to the papers introducing these methods for more details, (iv) the choice of single-year backbone network is outside the scope of the article, and (v) our code is already publicly available for full reproducibility.

6. The focus of our method is to develop a model able to operate on multi-year data. We propose a simple modification applicable to any deep spatio-temporal encoder. We want to make clear that our contribution is NOT the PSE+LTAE model, which is existing work, but a simple method for handling multi-year data with deep learning. We chose the PSE+LTAE as backbone since it is now the established state-of-the-art on the topic of parcel classification [6,17-20], while LSTMs lag significantly behind. The performance of different spatio-temporal encoders is outside of the scope of this article, and testing less expressive single-year backbones would not bring additional insights regarding the impact of our proposed idea for the classification of parcels from multi-year data. We refer the reader to [6,17-20] for extensive comparisons between the performance of different encoders. We have made this point more explicit in the discussion section.

7. Because our method requires the type of the last two crops grown in a given parcel, it can indeed only be applied if the past declarations are known. This is an important limitation of our approach, which we made sure to make explicit in the discussion. Note that there exists very little work on the classification of parcels for countries without an open-access LPIS, as the lack of ground truth prevents the models' performance from being quantitatively evaluated.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have added the discussion section and significantly improved the manuscript. However, I am still not convinced by the responses to the three items below:
1. L21: "farmers declare the crop cultivated in each of their parcels every year." I think the source and accuracy of the declared data are very important. If the accuracy and coverage of the declared data are high, it is more suitable to spatialize the declared data rather than perform classification based on deep learning. I wonder whether this study is meaningful, especially when the authors only used one study area.
2. The proposed model was applied to an area of 110 × 110 km² in the South East of France. Although the main crops are shown in Figure 1, the manuscript lacks many descriptions of the study area, such as the common land covers and the crop rotation mode in the study area. The lack of this information makes the manuscript read more like a technical report.
3. The figures in the manuscript are not standard. There are no geographic coordinates, and no labels in some figures. I suggest that the authors refer to other manuscripts in Remote Sensing.

Author Response

1 -
The source and accuracy of the declared data are detailed in Section 2.1.

We could not find references related to the spatialization of agricultural declarations as an alternative to deep learning in the computer vision-based Remote Sensing literature. As shown in [20], not using the contour of the parcel but performing pixel-wise prediction leads to lower performance overall: the knowledge of the parcel's extent is a precious source of information. 

We now address the relevance of our experimental setting:

  • Classifying the content of agricultural parcels from observational data, given their extent, is the operational setting of monitoring agencies across Europe. In this sense, our setting is meaningful.
  • This setting (one zone, parcel extent known, high accuracy and coverage) is the standard setting for the task of parcel classification using deep learning, as seen in [6,7,8,18,19].

2 - We have added the requested information in the revised manuscript.

3 - We have harmonized our figures with the standard of Remote Sensing. Note that, as mentioned in the caption of Figure 1, all figures use the same color code to represent parcels.

Reviewer 2 Report

22/10/2021

Review round 2

Crop Rotation Modeling for Deep Learning-Based Parcel Classification from Satellite Time Series

Authors: Félix Quinton, Loic Landrieu

 

 

The authors did a good job in explaining themselves and adding additional information to the text. I find the paper much improved. There is still one issue that is not clear to me, which is the "intra-annual" concept (please see the comments below). Although this paper is not 100% structured and worded like other remote sensing papers, I lean towards recommending that the editor accept this paper after some minor revisions.

 

Please see the comments below:

  • L3 – “the first deep learning approach” – unless you are 1000% sure that this is indeed the first time, I would avoid such a declaration.
  • L4 – "modeling simultaneously the inter- and intra-annual" – I have to ask again because I'm having trouble understanding it: inter-annual means between years, meaning that the crop type changes between years; for example, in 2018 the crop was potato, and in 2019 it was wheat. Intra-annual means within a year, meaning that the crop type changes within a given year; for example, Jan-Apr 2018 the crop was tomato, and Jun-Sep the crop was corn. Correct me if I'm wrong, but what you did was to model inter-annual crop rotation, and NOT intra-annual. In the answers to my first-round review, you explicitly wrote: "we do not predict intra-year rotations". If I'm correct, please change it in the text, and remove the reference to intra-annual. If I'm mistaken, please try to explain it better.
  • L5 - "improvement of over 6.6 mIoU" - as mIoU is not a well-known metric in the RS community, I suggest adding the improvement in percentage as well, so the "common" reader will have a better idea of the improvement your model achieved. In general, I suggest that in any future publication or work you do, no matter the subject or the platform, you always add to your metrics a metric in percentage, which is much easier to understand.
  • L20 – "high revisit time of five days" – it is not always 5 days; in some places it is more and in some less, but usually 5 days.
  • Figure 1 caption (and all figure/table captions) - you don't need to "announce" what you are doing; for example, no need to write: "We represent the crop type of each parcel" … just write what the reader sees in the figure, for example here: "the crop type of each parcel is represented by…"
  • Please add a scale bar and north arrow to all maps
  • L81 - “do not model both inter- and intra-annual dynamics” – but you also don’t model intra-annual dynamics
  • L82 – “our model operates at both intra and inter-annual scales” – same comment here
  • L109 – "This model can be trained end-to-end to simultaneously learn inter-annual crop rotations along with intra-annual temporal patterns" – again, if you don't have intra-annual labels/annotations, and your model predicts only one label/annotation per field per year, how can you say that your model can be trained with intra-annual? In your own words, from your answers: "Our model classify the parcels by mapping their time series to a single class".
  • Figure 3 (previously Figure 5) – same comment from the first round – A and B should be better divided (maybe put a square bracket around each); it looks like they are connected, meaning that it is the same process.
  • Section 2.5 "dataset" – this is your study area. It should be located higher up. The section is called Materials and Methods, meaning you first describe your materials (which are the study area, the Sentinel-2 images, and the LPIS), and then you describe your methods.
  • L173-174 – "We do not apply any pre-processing such as cloud removal or radiometric calibration" – this statement is not an "easy" thing to read from a remote-sensing perspective… and there is actually no need for it, since your provider already did the calibration (and probably also the cloud removal). In line 168 you state that you use Sentinel-2 Level 2A. Level 2A is already calibrated for radiometric and atmospheric effects.
  • L285 – again the “intra” comment
  • L290 – what do you mean by: “non-obvious rotation patterns.”?

Author Response

We thank the reviewer for their appreciation of our effort to improve the manuscript.

"The first deep learning approach" We are pretty sure that no such deep learning model has been yet proposed. But we err on the side of caution and have reformulated this sentence.

"modeling simultaneously the inter- and intra-annual dynamics" We differentiate between "dynamics," i.e., the temporal evolution of the observations, and the "declarations", i.e., the primary culture declared by the farmers. We only predict yearly declarations (one crop per year) and do not use nor predict intra-annual crop types. In contrast, our observations (the satellite time series) have both intra-annual observational dynamics (a series of observations for each year, between 27 and 36 images) and inter-annual observational dynamics (a yearly sequence of observations/declarations, one per year for several years). Our network models not only the intra-annual dynamics (through the temporal evolution of the spectral statistics within a single year) but also inter-annual dynamics (i.e., crop rotations from one year to another). We have added some clarifying phrases.

  • "improvement of over 6.6 mIoU" the mIoU is a percentage-based metric, and Table 3 gives the more commonly used Overall Accuracy. But we hear your concern and have added the improvement in reducing the error rate in the abstract for improved clarity.  
  • "high revisit time of five days" We have incorporated your precision.
  • Figure 1 caption has been reworked.
  • We added a scale bar and north arrow to all maps.
  • L81 - "do not model both inter- and intra-annual dynamics": we do model the intra-year dynamics of the observations, since we have access to a temporal sequence for each year. We clarified this sentence.
  • ", how can you say that your model can be trained with intra-annual" Our model consumes both intra-year time sequence and inter-year information (through past declarations) but is only supervised with yearly labels and no intra-year labels whatsoever.
  • Figure 5: we added a vertical bar.
  • We moved the dataset section to 2.1, as suggested.
  • We clarified the mention of radiometric calibration.
  • "Non-obvious rotation patterns" are nonpermanent cultures. We added a clarifying remark.

Reviewer 3 Report

The authors have made good efforts to address the reviewers' comments. In particular, they discussed in depth the limitations of the method. The manuscript has been improved considerably after the revision. I therefore suggest that this manuscript be published.

Author Response

We thank the reviewer for their insightful comment, which has allowed us to improve the manuscript and make our ideas more straightforward for the journal's audience.
