Article
Peer-Review Record

A Reference-Free Method for the Thematic Accuracy Estimation of Global Land Cover Products Based on the Triple Collocation Approach

Remote Sens. 2023, 15(9), 2255; https://doi.org/10.3390/rs15092255
by Pengfei Chen 1,2, Huabing Huang 1,2, Wenzhong Shi 3,* and Rui Chen 1,2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4:
Submission received: 17 February 2023 / Revised: 21 April 2023 / Accepted: 23 April 2023 / Published: 24 April 2023

Round 1

Reviewer 1 Report

The authors use an innovative approach to an important but daunting problem. I am impressed with the Bayesian approach, and it appears scientifically sound. However, I must admit that I do not have sufficient expertise to rigorously evaluate the details. I do have the general impression that this is an important paper that will often be cited in the future.

There is an obvious error in the first paragraph of the Introduction.

Author Response

Thank you for the evaluation. We are very grateful to receive such high praise.

Reviewer 2 Report

This paper proposes a GLC-TCCA method to model high-precision land use. The accuracy of land use data is related to the accuracy of many studies or earth system models, and is a hot and difficult issue in academic research. The GLC-TCCA method proposed in this study is very interesting and has obtained relatively high-precision results. However, in my opinion, the biggest problem within this study is writing.

First of all, the framework organization of the whole paper is unusual, and it feels a little like an experimental report.

Second, the language of the paper is too long-winded, especially in the introduction.

Third, the method section needs to further improve the logic and clarity, so that readers can clearly understand what the author has done, especially in section 2.2.

Finally, please state the conditions under which the method is applicable, or the limits of its scope.

You can find the file attached.

Comments for author File: Comments.pdf

Author Response

Thanks for the valuable comments. We have adjusted the organization and polished the figures following the comments. Point-by-point responses follow:

First of all, the framework organization of the whole paper is unusual, and it feels a little like an experimental report.

Response: We combined most parts of Section 3 and Section 4 into one experiment section (Section 3) in this revision. Some content that is more about methodology, such as the foundation of CI estimation, was moved into Section 2. The detailed procedures of data preparation, such as the selection and extraction of remote sensing features, were moved into the supplementary material. The discussion and conclusion are now two separate sections.

Please see the highlighted content for details.

 

Second, the language of the paper is too long-winded, especially in the introduction.

Response: We removed unnecessary parts of the introduction and elaborated on the methods in the last paragraph of the introduction section.

Please see the highlighted content at lines 96 and 112 for details.

 

Third, the method section needs to further improve the logic and clarity, so that readers can clearly understand what the author has done, especially in section 2.2.

Response: We improved the methodology section following the comments. The workflow is introduced step by step in this revision.

Please see the highlighted content in Section 2.2.

 

Finally, please state the conditions under which the method is applicable, or the limits of its scope.

Response: In the new discussion section, we analyzed the conditions under which our method is applicable. In our opinion, GLC-TCCA requires the assessed data to be generally correct; otherwise the local classifiers will inherit too much erroneous information, violating the conditional independence between the two local classifications. We also elaborate on the importance of conditional independence, rather than classification accuracy, in GLC-TCCA.

Four limitations are given in the conclusion section. Compared with the previous submission, we emphasized the importance of quantifying the effect of data accuracy on the estimation and of building efficient software.

Please see the highlighted content in Sections 4 and 5 for details.

Reviewer 3 Report

Triple collocation (TCCA) is proposed as an alternative to the usual approach for assessing map accuracy. In the usual approach, a sample is selected and “true” class labels are obtained for those sample units. The “true” class is then compared to the map class to provide a direct determination of agreement (or disagreement). The literature review presented in the Introduction does a nice job providing context for the work described in the manuscript. TCCA is based on creating two additional classifications, and then if an assumption of conditional independence holds, it becomes possible to estimate accuracy of the initial map of primary interest. TCCA offers a mathematically sophisticated and somewhat mystical alternative to assessing map quality. Nonetheless, the manuscript describes what is undoubtedly a different and interesting approach to determine map quality. My primary concerns with the manuscript are related to matters of practical utility in the implementation of the methodology. Hopefully the following comments will provide a basis by which the authors might improve clarity of some of these issues.

Major Considerations

1. The underlying premise of the method is difficult to understand intuitively due to my lack of familiarity with the mathematics involved.  It does seem counterintuitive that we could derive accuracy of a target map by producing two other classifications. These other two classifications must be of poorer quality than our target map. Otherwise, we would be using one of these more accurate classifiers. Yet, we are able to estimate accuracy of our target map on the basis of two inferior map products. It is difficult to understand how that could be possible. Does this process work even if the two other classifiers are vastly inferior to the target map of interest or is there some minimum accuracy that both classifiers should achieve?

2. A practical concern with the map is the requirement to produce three classifications (the target map being evaluated plus two additional maps for assessing accuracy). For many projects it is sufficiently costly and time consuming that completing even a single map is problematic. Yet here we must produce two additional classifications. I suppose one could argue for use of other maps that have already been produced, but this would carry with it problems of similarity of legends, date of the maps, etc.

3. One of the potential strengths of TCCA is the use of a local classifier to produce the two map products needed for TCCA. It makes sense that a local classifier could improve upon the target global classifier for the subset of area to which the local classifier is applied. To diminish the problem that manual interpretation is costly and time consuming (L190), it is proposed to use existing land cover products. However, these existing land cover products would seem to forfeit the “local classifier” advantage as, similar to the global product being assessed, the existing data would not have been developed locally.

4. Providing a measure of degree of conditional independence is essential and the authors have met this requirement (equation 12). This measure seems to be interpretable in a manner like any other correlation metric. At what threshold of the measure (equation 12) would we begin to be concerned about conditional independence? Greater than 0.10? Greater than 0.20?

5. It seems reasonable to assume that TCCA would work best if we had two very accurate classifiers as the information input into the analysis. Yet it would also seem likely that if we did have two accurate classifiers, then they would not be conditionally independent because they have a great deal of commonality to the class labels assigned. To achieve conditional independence, it would seem preferable to have one very good classifier and one poor classifier. Is there a simple explanation to refute this seeming paradox about quality of the two classifications needed to assess the map of interest? 

6. If it is necessary to collect 15 million points to produce the local classifiers is that really decreasing costs relative to collecting reference data? If we also add to that the time and effort to implement the filtering method, it is not clear why the TCCA method would be less costly. The problem of using existing data to reduce costs was noted in comment #3, forfeiture of the advantage of local classification.

7. At Lines 206-208, it is not apparent why using a high-order neighbourhood set would be connected to the conditional independence assumption. Is there an intuitive explanation? Is there a relationship between area classified and conditional independence?

8. Mean absolute percentage error (MAPE) needs to be defined. Why would just using mean UA or mean PA not suffice? Please state how MAPE was calculated. Perhaps the issue is more the term applied than the calculation, but it is unclear what definition is associated with MAPE. Similarly, the term “relative error” needs to be defined. Relative to what feature? Please add these specific definitions to improve clarity.

Other comments:

10. Lines 39-47: The text for these lines is not part of the manuscript but information about content to be included in an Introduction.

11. Line 81: “sample point” should be “sample points” (plural)

 

12. L296 and L302: It is not stated whether there is a sampling design underlying selection of the 1,980 chips or the 89 tiles. It would seem to matter if these two samples (chips and tiles) are purposely or arbitrarily selected versus selected via a rigorous sampling design.

13. The manuscript uses an incorrect definition of “sample” in many places. A sample is defined by a collection of points or spatial units, and each individual point or unit is not a sample. In many places in the text the authors refer to “sample” when it is just a single point or unit, and refer to the collection of points or units as “samples” when in reality there is just one sample. Please check the entire manuscript and correct these errors in use of the statistical term “sample” (e.g., Lines 199, 271, 411, 495, etc.)

14. The TCCA method has very high computational requirements. Is there software that will implement these analyses? This is a potential limitation for practitioners.

15. Reference [18] has multiple authors listed twice – please check for the correct list and order of authors.

 

Author Response

Thanks for the valuable comments; they are very helpful for improving our work. In addition to responding to the comments, we adjusted the organization of the paper and polished the figures in this revision.

Here are our responses:

  1. The underlying premise of the method is difficult to understand intuitively due to my lack of familiarity with the mathematics involved.  It does seem counterintuitive that we could derive accuracy of a target map by producing two other classifications. These other two classifications must be of poorer quality than our target map. Otherwise, we would be using one of these more accurate classifiers. Yet, we are able to estimate accuracy of our target map on the basis of two inferior map products. It is difficult to understand how that could be possible. Does this process work even if the two other classifiers are vastly inferior to the target map of interest or is there some minimum accuracy that both classifiers should achieve?

Response: Thanks for the question. We added some discussion of the relationship between accuracy and TCCA in Section 4. Here, we give two examples to explain the foundation of GLC-TCCA.

An extreme example: if the original data are totally wrong, the training sample and the local classifications will be totally wrong, too. In that situation, the two local classifications, namely X and Y, will not be independent given the ground truth, because their errors are fully correlated.

A general example: if the original data are generally correct (e.g. OA > 70%), we can extract a reliable sample to train two different classifiers (e.g. decision tree and GNB in our paper). Ideally, all training points are correct, so that the error of each local classification depends only on the mechanism of its classifier (the training points also differ between the two classifiers). In this ideal situation, errors are not correlated among the three classifications (no error propagates from the original data to the local classifications, and the local classifiers are trained independently on reliable points), so they are likely to satisfy the conditional independence assumption, and their accuracy can be solved using TCCA.

The second example generally reflects what we do in this study. All efforts made in this study aim to make the three classifications conditionally independent, not just accurate. To that end, the training sample should be generally correct, for the reason discussed in the first example. Once these points are clear, the mathematical details can be found in Section 2.1.
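As a rough illustration of why conditional independence can make accuracy estimable without reference data, the sketch below simulates three classifications whose errors are independent given the true class, then recovers their overall accuracies from the labels alone. It uses a generic latent-class EM procedure, which is an assumption made for illustration and not necessarily the estimation procedure of GLC-TCCA; the function names, the class prior, and the confusion matrices are all hypothetical, and the three classifications are treated symmetrically rather than as one assessed map plus two local classifications.

```python
import numpy as np

def confusion(diag, n_classes=3):
    """Hypothetical confusion matrix: P(label | truth) with equal off-diagonal error."""
    off = (1.0 - diag) / (n_classes - 1)
    return np.full((n_classes, n_classes), off) + (diag - off) * np.eye(n_classes)

def simulate(n, prior, confusions, rng):
    """Draw hidden true classes and classifications that err independently given truth."""
    n_classes = len(prior)
    truth = rng.choice(n_classes, size=n, p=prior)
    labels = np.empty((n, len(confusions)), dtype=int)
    for j, cm in enumerate(confusions):
        for k in range(n_classes):
            idx = truth == k
            labels[idx, j] = rng.choice(n_classes, size=idx.sum(), p=cm[k])
    return truth, labels

def em_latent_class(labels, n_classes, n_iter=300):
    """Fit class prior and per-classifier confusion matrices without using the truth."""
    onehot = np.eye(n_classes)[labels]            # shape (n, classifiers, classes)
    post = onehot.mean(axis=1)                    # start from the vote share per point
    for _ in range(n_iter):
        # M-step: update prior and conf[j, k, l] = P(classifier j says l | truth k).
        prior = post.mean(axis=0)
        conf = np.einsum('nk,njl->jkl', post, onehot)
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step: posterior of the true class, assuming conditional independence.
        log_post = np.log(prior + 1e-12) + np.einsum(
            'njl,jkl->nk', onehot, np.log(conf + 1e-12))
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
    return prior, conf

rng = np.random.default_rng(0)
true_prior = np.array([0.5, 0.3, 0.2])            # made-up class proportions
truth, labels = simulate(
    20000, true_prior, [confusion(0.90), confusion(0.80), confusion(0.75)], rng)

prior_hat, conf_hat = em_latent_class(labels, n_classes=3)
# Estimated overall accuracy: sum over classes of P(class) * P(correct | class).
oa_hat = np.array([prior_hat @ np.diag(conf_hat[j]) for j in range(labels.shape[1])])
# The truth is used only here, to check the estimates in this simulation.
oa_true = (labels == truth[:, None]).mean(axis=0)
```

In this simulation the estimates in `oa_hat` can be compared with `oa_true`; when errors are correlated across classifications, as in the extreme example, such estimates would be expected to degrade.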

 

Accuracy and conditional independence may affect each other, as discussed in the extreme example. Investigating this effect could be important future work.

We added some discussion in Section 4 and limitations in Section 5. Please see the highlighted content in the corresponding sections.

 

  2. A practical concern with the map is the requirement to produce three classifications (the target map being evaluated plus two additional maps for assessing accuracy). For many projects it is sufficiently costly and time consuming that completing even a single map is problematic. Yet here we must produce two additional classifications. I suppose one could argue for use of other maps that have already been produced, but this would carry with it problems of similarity of legends, date of the maps, etc.

Response: We think there is a misunderstanding. Using other maps for TCCA is ideal but not feasible in practice, mainly because of inconsistent legends. This is also why we trained two local classifiers using the land cover labels from the assessed data in this study. Details can be found in lines 104 to 108. Line 110 also states the problem we want to address in this study.

Furthermore, please note that generating two additional classifications is not costly, since both training and prediction are fast for classifiers such as DT and GNB.

  3. One of the potential strengths of TCCA is the use of a local classifier to produce the two map products needed for TCCA. It makes sense that a local classifier could improve upon the target global classifier for the subset of area to which the local classifier is applied. To diminish the problem that manual interpretation is costly and time consuming (L190), it is proposed to use existing land cover products. However, these existing land cover products would seem to forfeit the “local classifier” advantage as, similar to the global product being assessed, the existing data would not have been developed locally.

Response: When training the local classifiers, we assume that local remote sensing features better represent the land characteristics. Even if the global product was produced using a global classifier (in fact, region-specific classifiers are widely used in the literature), we used only the land cover labels it provides and extracted our own local features to train our classifiers. In that sense, the local classifiers should retain their advantage, because the training sample from neighboring tiles should be representative of the land features in the assessed unit.

Please also see the response to the first comment for reference.

Investigating the effect of data accuracy will also be a critical part of our future work. Please see the highlighted content in Sections 4 and 5 for reference.

  4. Providing a measure of degree of conditional independence is essential and the authors have met this requirement (equation 12). This measure seems to be interpretable in a manner like any other correlation metric. At what threshold of the measure (equation 12) would we begin to be concerned about conditional independence? Greater than 0.10? Greater than 0.20?

Response: This is a hard question for us. As far as we know, there is no threshold for this measure that determines its significance, as there is in some statistical tests. According to the following literature, a CI index lower than 0.1 seems to safely indicate negligible conditional dependency.

  • Georgiadis, M. P., Johnson, W. O., Gardner, I. A., & Singh, R. (2003). Correlation‐adjusted estimation of sensitivity and specificity of two diagnostic tests. Journal of the Royal Statistical Society: Series C (Applied Statistics), 52(1), 63-76.
  • Foody, G. M. (2010). Assessing the accuracy of land cover change with imperfect ground reference data. Remote Sensing of Environment, 114(10), 2271-2285.
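To make the notion of conditional dependency more concrete, the following sketch computes, within each true class, the correlation between the "is correct" indicators of two classifications. This is a generic conditional correlation chosen for illustration; it is not claimed to be equation 12 of the manuscript, it needs true labels and so is only usable in a simulation or on a validated subset, and the function name, the accuracies, and the copying probability are made up.

```python
import numpy as np

def conditional_correctness_correlation(truth, x, y):
    """Per true class, correlate the 'is correct' indicators of classifications x and y."""
    result = {}
    for k in np.unique(truth):
        in_class = truth == k
        x_ok = (x[in_class] == k).astype(float)
        y_ok = (y[in_class] == k).astype(float)
        result[int(k)] = float(np.corrcoef(x_ok, y_ok)[0, 1])
    return result

rng = np.random.default_rng(0)
n = 20000
truth = rng.integers(0, 2, size=n)           # hypothetical binary land cover truth

# X is correct 80% of the time; one Y errs independently, the other copies
# X's success or failure on about half of the points (shared errors).
x_ok = rng.random(n) < 0.8
y_ok_indep = rng.random(n) < 0.8
copy_x = rng.random(n) < 0.5
y_ok_dep = np.where(copy_x, x_ok, y_ok_indep)

x = np.where(x_ok, truth, 1 - truth)
y_indep = np.where(y_ok_indep, truth, 1 - truth)
y_dep = np.where(y_ok_dep, truth, 1 - truth)

corr_indep = conditional_correctness_correlation(truth, x, y_indep)
corr_dep = conditional_correctness_correlation(truth, x, y_dep)
print(corr_indep, corr_dep)
```

With independent errors the per-class correlations stay close to zero, well under the 0.1 level mentioned above, while the classification that shares errors with X shows a clearly larger value.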

 

  5. It seems reasonable to assume that TCCA would work best if we had two very accurate classifiers as the information input into the analysis. Yet it would also seem likely that if we did have two accurate classifiers, then they would not be conditionally independent because they have a great deal of commonality to the class labels assigned. To achieve conditional independence, it would seem preferable to have one very good classifier and one poor classifier. Is there a simple explanation to refute this seeming paradox about quality of the two classifications needed to assess the map of interest? 

Response: Please refer to the response to comment #1. All efforts made in this study aim to make the three classifications conditionally independent, not just accurate.

 

 

  6. If it is necessary to collect 15 million points to produce the local classifiers is that really decreasing costs relative to collecting reference data? If we also add to that the time and effort to implement the filtering method, it is not clear why the TCCA method would be less costly. The problem of using existing data to reduce costs was noted in comment #3, forfeiture of the advantage of local classification.

Response: From the perspective of cost, TCCA should be superior to traditional methods, which rely on manual inspection. First, our method is fully automatic. The selection of sample points is easily implemented on GEE, and in theory we can obtain as many sample points as we want. More importantly, manual inspection can take a very long time (loading images and visually checking sample points is a huge workload that usually requires a team), whereas our method is much faster because all tasks are performed by computer. We adopted such a large number of sample points because TCCA is not stable on small data sets.

As for the issue of ‘local classification’, please refer to the response to comment #3.

  7. At Lines 206-208, it is not apparent why using a high-order neighbourhood set would be connected to the conditional independence assumption. Is there an intuitive explanation? Is there a relationship between area classified and conditional independence?

Response: Given the widespread spatial heterogeneity in large-scale land cover, classification errors in different tiles are also likely to follow different distributions. Therefore, using the neighbourhood helps to build effective local classifiers and also helps to avoid overlap between the assessed unit and the local sample.

We added some discussion of this in lines 211 and 528.

  8. Mean absolute percentage error (MAPE) needs to be defined. Why would just using mean UA or mean PA not suffice? Please state how MAPE was calculated. Perhaps the issue is more the term applied than the calculation, but it is unclear what definition is associated with MAPE. Similarly, the term “relative error” needs to be defined. Relative to what feature? Please add these specific definitions to improve clarity.

Response: Thanks for the suggestion; we defined MAPE in line 292. Here we want to compare the estimates with the actual values. Mean absolute error (MAE) and MAPE are commonly used for that purpose. However, MAPE is more suitable here because it excludes the effect of magnitude. Please see the definition in line 288.
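For readers without the manuscript at hand, the standard definitions of the two metrics can be sketched as follows. The accuracy values are invented for illustration and are not results from the paper, and the manuscript's own definition (at the lines cited above) may differ in notation.

```python
def mae(actual, estimated):
    """Mean absolute error, in the same units as the inputs."""
    return sum(abs(a - e) for a, e in zip(actual, estimated)) / len(actual)

def mape(actual, estimated):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((a - e) / a) for a, e in zip(actual, estimated)) / len(actual)

# Hypothetical per-class user's accuracies (%): reference values and estimates.
reference_ua = [80.0, 60.0, 90.0]
estimated_ua = [76.0, 63.0, 90.0]

print(mae(reference_ua, estimated_ua))    # absolute errors 4, 3, 0
print(mape(reference_ua, estimated_ua))   # relative errors 5%, 5%, 0%
```

Because each error is divided by its own reference value, an absolute error of 4 points on a class with 80% accuracy and an error of 3 points on a class with 60% accuracy both count as 5%, which is the sense in which MAPE removes the effect of magnitude.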

 

  10. Lines 39-47: The text for these lines is not part of the manuscript but information about content to be included in an Introduction.

Response: Sorry for that mistake; the text has been removed.

 

  11. Line 81: “sample point” should be “sample points” (plural)

Response: Thanks. We revised that.

 

  12. L296 and L302: It is not stated whether there is a sampling design underlying selection of the 1,980 chips or the 89 tiles. It would seem to matter if these two samples (chips and tiles) are purposely or arbitrarily selected versus selected via a rigorous sampling design.

Response: Sample points are randomly selected from the reliable pixels in the 1,980 chips. Please refer to the revised content in lines 312 and 317.

 

  13. The manuscript uses an incorrect definition of “sample” in many places. A sample is defined by a collection of points or spatial units, and each individual point or unit is not a sample. In many places in the text the authors refer to “sample” when it is just a single point or unit, and refer to the collection of points or units as “samples” when in reality there is just one sample. Please check the entire manuscript and correct these errors in use of the statistical term “sample” (e.g., Lines 199, 271, 411, 495, etc.)

Response: Thanks for the comment; we made a thorough check on this issue.

 

  14. The TCCA method has very high computational requirements. Is there software that will implement these analyses? This is a potential limitation for practitioners.

Response: We added this as a limitation in the conclusion section.

In fact, we have the full code in Python, but it lacks a UI and needs more optimization before it can be released as software.

 

  15. Reference [18] has multiple authors listed twice – please check for the correct list and order of authors.

Response: Sorry for the mistake; we have revised it.

Reviewer 4 Report

The paper presents an original use of the triple collocation approach (TCCA) for estimating the classification accuracy of GLC products. In the case study of WorldCover 2020, the relative error is less than 4% at the continent level. The results are quite positive. Some minor points remain to be improved.

1. In the Introduction, lines 39-47 should be removed.

2. In the Materials and Methods part, Sections 2.1 and 2.2 describe methods; there is no material. It is suggested to change the title. Section 2.1 should preferably be brief.

3. In Part 3.2.2, Table 1 and Figure 4 both support the choice of classifiers; it is suggested to combine them.

4. In Fig. 7, one of the largest differences in estimated accuracy between the GLC report and GLC-TCCA is in North America, but Table 3 shows a different result. Please explain in detail.

5. TCCA is an up-scaling method for assessing the accuracy of GLC products. From the traditional view, a detailed image can serve as the reference or sample data for estimating the accuracy of land cover at a coarse scale. Why can we use the GLC-TCCA result (at a coarse scale) to estimate the accuracy of the more detailed GLC product? Please provide solid proof.

Comments for author File: Comments.pdf

Author Response

Thanks for the valuable comments; they are very helpful for improving our work. Here are our responses:

 

  1. In the Introduction, lines 39-47 should be removed.

Response: Sorry for the mistake; we have removed it.

 

 

  2. In the Materials and Methods part, Sections 2.1 and 2.2 describe methods; there is no material. It is suggested to change the title. Section 2.1 should preferably be brief.

Response: We changed the title and simplified Section 2.1 by removing some verbose parts.

 

 

  3. In Part 3.2.2, Table 1 and Figure 4 both support the choice of classifiers; it is suggested to combine them.

Response: After serious consideration, we decided to keep Table 1 and Figure 4 separate.

We admit that there is some overlap between Table 1 and Figure 4 (i.e. the DT and GNB columns are plotted in Figure 4), but they have different emphases: Table 1 presents the selection of local classifiers, whilst Figure 4 validates the exclusion of the assessed unit from the neighbourhood and also presents the independence between the selected classifiers (i.e. a-3 and b-3).

Furthermore, we tried combining them into a new table, but found that the table was too large to convey the information clearly.

 

  4. In Fig. 7, one of the largest differences in estimated accuracy between the GLC report and GLC-TCCA is in North America, but Table 3 shows a different result. Please explain in detail.

Response: Certain grids at high latitudes have smaller areas, owing to projection distortion and to partial coverage by sea, and therefore carry less weight in the calculation of overall accuracy. The northeastern regions of North America, for instance, visually exhibit a greater difference in accuracy; however, this does not significantly diminish the overall accuracy of North America.

We added the above explanation at line 451.
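The effect of area weighting can be sketched with a small example. The weighting rule used here, the cosine of the grid-centre latitude times the land fraction of the grid, is an assumption that suits equal-angle grids; the response does not state the exact formula, and the grid values are invented.

```python
import math

def area_weighted_accuracy(cells):
    """cells: (centre latitude in degrees, land fraction, accuracy) per grid."""
    # Assumed weight: relative area of an equal-angle grid, reduced by sea coverage.
    weights = [math.cos(math.radians(lat)) * land for lat, land, _ in cells]
    return sum(w * acc for w, (_, _, acc) in zip(weights, cells)) / sum(weights)

# Hypothetical grids: a low-latitude inland grid and a high-latitude coastal grid.
cells = [(0.0, 1.0, 0.90), (60.0, 0.5, 0.50)]

weighted = area_weighted_accuracy(cells)
unweighted = sum(acc for _, _, acc in cells) / len(cells)
print(weighted, unweighted)
```

In this made-up case the poorly classified high-latitude grid receives a quarter of the weight of the low-latitude grid, so the weighted accuracy stays near 0.82 while the unweighted mean is 0.70, which mirrors how a visually striking difference on a map may barely move a continental figure.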

 

  5. TCCA is an up-scaling method for assessing the accuracy of GLC products. From the traditional view, a detailed image can serve as the reference or sample data for estimating the accuracy of land cover at a coarse scale. Why can we use the GLC-TCCA result (at a coarse scale) to estimate the accuracy of the more detailed GLC product? Please provide solid proof.

 

Response: Thanks for the question. We need to emphasize that the assessed unit and the local classifications have the same spatial scale. The remote sensing features used in this study vary in scale, but the finest scale is consistent with the GLC data addressed in this study. So, from the spatial resolution perspective, we do not encounter any scale issues in this study.

However, from the validation perspective, the foundation of GLC-TCCA might not be intuitive. Here, we give a general example to explain it.

If the original data are generally correct (e.g. OA > 70%), we can extract a reliable sample to train two different classifiers (e.g. decision tree and GNB in our paper). Ideally, all training points are correct, so that the error of each local classification depends only on the mechanism of its classifier (the training points also differ between the two classifiers). In this ideal situation, errors are not correlated among the three classifications (no error propagates from the original data to the local classifications, and the local classifiers are trained independently on reliable points), so they are likely to satisfy the conditional independence assumption, and their accuracy can be solved using TCCA.

This example generally reflects what we do in this study. All efforts made in this study aim to make the three classifications conditionally independent, not just accurate. To that end, the training sample should be generally correct; otherwise errors would propagate from the original data into both local classifications. Once these points are clear, the mathematical details can be found in Section 2.1.

We added some discussion of the relationship between accuracy and TCCA in Section 4.

Accuracy and conditional independence may affect each other. Investigating this effect could be important future work.

In the new discussion section, we also analyzed the conditions under which our method is applicable. In our opinion, GLC-TCCA requires the assessed data to be generally correct; otherwise the local classifiers will inherit too much erroneous information, violating the conditional independence between the two local classifications. We also elaborate on the importance of conditional independence, rather than classification accuracy, in GLC-TCCA.

Please see the highlighted content in Sections 4 and 5 for details.

 

Round 2

Reviewer 2 Report

The manuscript has been largely improved, and most of my previous concerns have been appropriately addressed.

Author Response

Thanks for the valuable comments, which substantially helped to improve the work.

Reviewer 3 Report

The authors invested considerable effort to respond to comments.  I read their responses and briefly skimmed the revisions noted in red in the manuscript. The only comment I have is that in the definition of MAPE the use of Y as the predicted value and Y^ as the truth is the reverse of standard notation where Y^ is denoted as the predicted value. But otherwise I did not see any other issues of substance that would need to be addressed. 

Author Response

Thanks for the comments; we have revised the expression accordingly.
