Article

Characterizing Uncertainty and Enhancing Utility in Remotely Sensed Land Cover Using Error Matrices Localized in Canonical Correspondence Analysis Ordination Space

by Yue Wan 1,2, Jingxiong Zhang 3,*, Wangle Zhang 4, Ying Zhang 5, Wenjing Yang 6, Jianxu Wang 3, Okafor Somtoochukwu Chukwunonso 3 and Asurapplullige Milani Tharuka Nadeeka 3
1 College of Resource and Environment, Henan Agricultural University, Zhengzhou 450002, China
2 College of Forestry, Henan Agricultural University, Zhengzhou 450002, China
3 School of Geodesy & Geomatics, Wuhan University, Wuhan 430079, China
4 College of Geology Engineering and Geomatics, Chang’an University, Xi’an 710054, China
5 College of Geo-Exploration Science and Technology, Jilin University, Changchun 130026, China
6 College of Geography and Environment, Shandong Normal University, Jinan 250358, China
* Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(5), 1367; https://doi.org/10.3390/rs15051367
Submission received: 27 December 2022 / Revised: 29 January 2023 / Accepted: 26 February 2023 / Published: 28 February 2023
(This article belongs to the Special Issue State-of-the-Art in Land Cover Classification and Mapping)

Abstract

In response to uncertainty in remotely sensed land cover products, there is continuing research on accuracy assessment and analysis. Given reference sample data, accuracy indicators are commonly estimated from error matrices, from which areal extents of different cover types are also estimated. It is worth exploring how the utility of land cover products may be enhanced beyond map face values and conventional area estimation. This paper presents an integrative method (CCAErrMat) for uncertainty characterization and utility enhancement. It works through reference-map cover type co-occurrence analyses based on error matrices localized in canonical correspondence analysis (CCA) ordination space, rather than in geographic space, to overcome the sparsity of reference sample data. These co-occurrence analyses facilitate quantification of accuracy indicators, identification of correctly classified and perfectly misclassified pixels, and prediction of reference class probabilities, all at individual pixels. Moreover, the predicted reference class probabilities are used as auxiliary variables to formulate model-assisted area estimation, further enhancing map utilities. Extensions to CCAErrMat are also investigated as a way to bypass the pre-computing of map class occurrence pattern indices as candidate explanatory variables for CCAErrMat, leading to two variant methods: CCACCAErrMat and CNNCCAErrMat. A case study based in Wuhan municipality, central China, was undertaken to compare the proposed method against alternative methods, including CCA-separate and CNN-separate. The advantages of CCAErrMat and CCACCAErrMat were confirmed. The proposed method is recommended for characterizing uncertainty and enhancing utilities in land cover maps by analyzing locally constrained error matrices. The method is also cost-effective in terms of reference sample data, as its requirements for them are similar to those of conventional accuracy assessments.

1. Introduction

Land cover information is important for geoscientific research and various spatial applications. A variety of land cover information products are produced by organizations around the world at both finer spatial resolution, such as GlobeLand30 [1] and LCMAP (Land Change Monitoring, Assessment, and Projection) [2], and coarser spatial resolution, such as Copernicus global land cover [3] and MODIS [4,5]. They provide information support for natural resource monitoring and environmental modeling at different scales.
However, land-cover maps are subject to uncertainty, which hampers spatial analyses and applications. There is growing research on accuracy assessment and analysis [6,7,8,9], which are useful not only for judging maps’ fit for use, but also for better understanding occurrences of misclassifications and thus informing about how land cover mapping may be improved. In this paper, uncertainty is a term with broad meanings and includes misclassification, ambiguity and fuzziness in labeling, and other types of errors in classification, though it can be used interchangeably with inaccuracy here.
Accuracy is conventionally assessed in a non-spatial way using error matrices constructed by cross-tabulating map classifications and reference classifications. Accuracy indicators, such as overall accuracy (OA), user’s accuracy (UA), and producer’s accuracy (PA), are often reported.
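The cross-tabulation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the eight-pixel sample are hypothetical, and the sample counts are used unweighted, which assumes an equal-probability sample (a stratified design would require weighting the cells first).

```python
import numpy as np

def error_matrix(map_labels, ref_labels, n_classes):
    """Cross-tabulate map classes (rows) against reference classes (columns)."""
    m = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(m, (np.asarray(map_labels), np.asarray(ref_labels)), 1.0)
    return m

def accuracy_indicators(m):
    """Return overall, user's and producer's accuracy from an error matrix."""
    diag = np.diag(m)
    row_tot = m.sum(axis=1)   # map class totals
    col_tot = m.sum(axis=0)   # reference class totals
    oa = diag.sum() / m.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        ua = np.where(row_tot > 0, diag / row_tot, np.nan)
        pa = np.where(col_tot > 0, diag / col_tot, np.nan)
    return oa, ua, pa

# Hypothetical sample of 8 pixels and 3 classes (illustrative values only).
map_labels = [0, 0, 0, 1, 1, 2, 2, 2]
ref_labels = [0, 0, 1, 1, 1, 2, 0, 2]
m = error_matrix(map_labels, ref_labels, 3)
oa, ua, pa = accuracy_indicators(m)
print(oa, ua, pa)
```

UA divides each diagonal cell by its row (map class) total, whereas PA divides by its column (reference class) total.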
Local accuracy characterization complements non-spatial accuracy assessments mentioned above and provides pixel-level information about spatial variations of classification accuracy. For this, there have been continuing research efforts [10,11,12,13,14]. Three types of methods are often applied: spatial interpolation, empirical modeling, and localized error matrices, as reviewed by Khatami et al. [12], Stehman and Foody [14], and Zhang et al. [15], among other researchers. Spatial interpolation may be based on kriging, as in Steele et al. [16]. Empirical modeling-based methods seek to model relationships between map-reference classification agreements/disagreements and various predictors, either in the image spectral domain [9] or the map domain [13,17]. For empirical modeling, machine learning methods were employed, such as logistic regression [10,13], generalized additive modeling [18], and random forest [9]. The spatially constrained error matrices-based method was proposed by Foody [11] to map local classification accuracies (e.g., local OA, local UA, and local PA). Along this line, Comber et al. [19] computed localized and geographically weighted correspondence matrices to compute local accuracy indicators. The local error matrices-based methods are advantageous over alternative methods for their versatility in modeling multiple accuracy indicators altogether. With per-pixel accuracy analyzed, it is possible to demarcate low-accuracy locations for well-targeted re-mapping. For example, an accuracy increase was achieved by combined use of computer image re-processing over regions relatively easy to map and visual image interpretation over regions difficult to map [20].
Map-wise accuracy assessment and per-pixel accuracy modeling are seldom the ultimate goal themselves. It is also important to investigate how maps’ utilities may be enhanced beyond their face values. Here, utilities refer to the information and functionalities attainable with a map. An example is area estimation based on error matrices along with accuracy assessment [8]. Quantities that can be estimated from error matrices include map-reference class co-occurrences, map-reference class transition probabilities (with UA and PA being their respective special cases), and reference class occurrences, which actually pertain to area estimation.
The quantities mentioned above can be estimated not only for a map as a whole but also around individual pixels. This study seeks to analyze map-reference class co-occurrences locally so that not only accuracy indicators but also map-reference cover type co-occurrences, transitions, and marginals (reference class probabilities) can be predicted, all at individual pixels. For map-reference cover type transitions, there are two extreme cases. One concerns correctly classified pixels, which can be used as classification training data. The other concerns perfectly misclassified pixels (i.e., refinable pixels discussed in [21]), which can be corrected and then added to the aforementioned training data. Such pixels would be difficult to identify on globally estimated error matrices, as misclassifications rarely follow clear patterns, except for cases like those in Campos and Brito [22]. In between, a fuller spectrum of map-reference transition probabilities can be computed for a land cover map individually, as in this research, or for fusion of multiple land cover maps [23], although the latter is beyond the scope of this research.
The predicted per-pixel reference class probabilities can be used to augment land cover representations. They also help identify mixed pixels, which are common for land cover products at 30 m resolution or coarser, especially over fragmented landscapes. The information about mixed pixels and low-accuracy pixels identified in accuracy assessment can be used together for well-targeted land cover re-mapping. Moreover, reference class probabilities are useful for area estimation. While stratified estimators with map classes being strata are highly recommended [8,24], model-assisted estimators are often applied for improving area estimation [25,26]. As reference class probabilities mentioned above are predicted in alignment to reference classifications via reference sample data, they are presumably closer to the true class proportions than original map classifications. It then makes sense to formulate model-assisted area estimators by using these predicted probabilities as auxiliary variables to increase estimation precision.
As discussed above, properly constructed local error matrices are versatile for uncertainty characterization and map-reference cover type co-occurrence analyses, which lead to enhanced map utilities as described above. However, reference cover conditions are often only sparsely sampled, meaning that reference sample data are often inadequate for constructing error matrices localized in geographic space, as revealed in our preliminary experiments. This research proposes constructing error matrices localized in synthetic feature space (i.e., ordination space) defined through CCA (thus named CCAErrMat) to overcome the hurdle of sample data sparsity. This leads to a unifying framework for predictions and analyses concerning local accuracy indicators, correct classifications, perfect misclassification, spectral confusions, and reference class probabilities, which can be further used as auxiliary variables to facilitate model-assisted area estimation for improved precision. The remainder of the paper describes the methods and experiments before some discussion and concluding remarks.

2. Methods

In conventional CCA modeling, response variables are specified according to the objectives of modeling: map-reference classification agreements (conditional or non-conditional) for mapping local accuracies, and reference classifications for predicting reference class probabilities. The conventional CCA method is therefore named CCA-separate to highlight its difference from CCAErrMat (for which response variables are always reference classifications). Likewise, CNN applied with separately specified response variables is compared with CCAErrMat and is named CNN-separate.
To bypass the pre-computing of map class occurrence pattern indices for CCA modeling, novel use was made of CCA and CNN as feature extractors for CCAErrMat in this study. These two variants of CCAErrMat were named CCACCAErrMat and CNNCCAErrMat, respectively. A flowchart for the methods proposed is shown in Figure 1.
As indicated in Figure 1, CCA-separate and CNN-separate perform the modeling for different local accuracy indicators and reference class probabilities only and separately. By CCAErrMat and its two variant methods, map-reference cover types co-occurrence analyses are performed based on CCA feature space localized error matrices, proceeding to training data extraction (by locating correctly classified and perfectly misclassified pixels) and reference class probability predictions. Reference class probabilities can be used to augment representations of land cover. They can also be used along with local accuracy surfaces to demarcate locations of uncertainty to inform land cover mapping. The predicted reference class probabilities can be further used to drive model-assisted area estimation, as shown in Figure 1. The aforementioned methods are described in more detail below while CCA and CNN are described in Appendix A and Appendix B, respectively.

2.1. CCA Feature Space Local Error Matrices: CCAErrMat

The basis for this method is CCA [27,28]. With reference classifications used as response variables and map classifications (usually along with map class occurrence pattern indices) used as explanatory variables, CCA modeling is described in Legendre and Legendre [28] and other related literature [29,30,31,32,33]. See also Appendix A for a concise introduction.
As described in Appendix A, CCA site scores determine the canonical coordinates of each pixel in the canonical space (CCA feature space) defined by the canonical axes. Consider a pixel $x_0$. At its location $O(x_0)$ in CCA feature space, a subset of the sample pixels (say k of them) falling within a distance range Dis are weighted by their distance to $O(x_0)$ to construct the error matrix. The resulting values are feature space distance-weighted counts of CCA sample pixels in each reference class (column j) assigned to each map class (row i). In other words, cell counts $\widehat{NP}_{O(x_0)}(i,j)$ are estimated. Cell counts are the counts of pixels belonging to map-reference cover type combinations in a localized error matrix. These counts are sums of numbers of pixels weighted by distances between unknown locations and sample locations in CCA ordination space. They are non-integers when distance-weighting is applied:
$$\widehat{NP}_{O(x_0)}(i,j) = \sum_{k=1}^{k} W_k \, I\big(i = c'(x_k),\; j = c(x_k)\big), \quad \text{when } d_k(x_0) < Dis \qquad (1)$$
where I() is an indicator; $c'(x_k)$ and $c(x_k)$ refer to the map class and reference class for sample pixel $x_k$, respectively; $W_k$, the weight assigned to sample pixel $x_k$, is based on the distance $d_k(x_0)$ between sample pixel $x_k$ and $x_0$, which is itself eigenvalue-weighted; and Dis refers to the search radius containing k sample pixels. The estimated cell counts $\widehat{NP}_{O(x_0)}(i,j)$ are standardized by their grand total, say TP, to get standardized cell probabilities $\hat{p}_{O(x_0)}(i,j)$ (which equal $\widehat{NP}_{O(x_0)}(i,j)/TP$).
Once the error matrices centered at individual pixels are populated with cell proportions properly estimated, it is straightforward to calculate local accuracy indicators for pixel x0. For example, we can compute local OA as:
$$P_{OA,o}(x_0) = \sum_{i=1}^{CLS} \hat{p}_{O(x_0)}(i,i) \qquad (2)$$
where CLS represents the number of candidate classes considered. Local UA for map class c′ and local PA for reference class c are predicted, respectively, as:
$$P_{UA,o}(x_0 \mid c') = \hat{p}_{O(x_0)}(c',c') \Big/ \sum_{j=1}^{CLS} \hat{p}_{O(x_0)}(c',j) \qquad (3)$$
and
$$P_{PA,o}(x_0 \mid c) = \hat{p}_{O(x_0)}(c,c) \Big/ \sum_{i=1}^{CLS} \hat{p}_{O(x_0)}(i,c) \qquad (4)$$
We can predict probabilities of all candidate reference classes’ occurrences from the aforementioned feature space localized error matrices. This amounts to computing the sum of cell proportions along a particular column (corresponding to a reference class say j):
$$\hat{p}_{O(x_0)}(+,j) = \sum_{i=1}^{CLS} \hat{p}_{O(x_0)}(i,j) \qquad (5)$$
The resultant class probabilities are useful for identifying mixed pixels, which, along with the local accuracy surfaces mapped above, can inform land cover re-mapping. They can also be used for augmented representations of land cover by showing primary and alternate class labels for mixed pixels; among them, pixels showing virtually equal class memberships to more than two classes can be further identified. Furthermore, based on localized error matrices, it is straightforward to determine whether pixels are correctly classified or perfectly misclassified (and thus easily corrected). From such pixels and the pixel segments (3 by 3) they fall in, training data are extracted and used for improving classification.
Note that for different modeling objectives (e.g., different local accuracy indicators, reference class probabilities), the k’s (optimum number of nearest neighboring sample pixels) for constructing local error matrices need to be determined individually. Optimization was carried out through cross-validation in this study.
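The construction of a feature space localized error matrix and the indicators derived from it can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, the distance is assumed to weight squared score differences by the eigenvalues, and inverse-distance weights are assumed because the weighting kernel is not specified in the text. The six sample pixels are made-up values.

```python
import numpy as np

def local_error_matrix(site_scores, eigvals, map_cls, ref_cls, target, k, n_classes):
    """Distance-weighted error matrix around `target` in ordination space.

    Rows are map classes, columns are reference classes; cells sum to 1.
    """
    diff = np.asarray(site_scores, float) - np.asarray(target, float)
    # ASSUMPTION: eigenvalue-weighted Euclidean distance along canonical axes.
    d = np.sqrt((np.asarray(eigvals, float) * diff ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]                  # the k nearest sample pixels
    # ASSUMPTION: inverse-distance weights (kernel not given in the source).
    w = 1.0 / (d[nearest] + 1e-9)
    counts = np.zeros((n_classes, n_classes))
    np.add.at(counts, (np.asarray(map_cls)[nearest], np.asarray(ref_cls)[nearest]), w)
    return counts / counts.sum()                 # standardize by the grand total

def local_indicators(p):
    """Local OA, UA, PA and reference class probabilities from cell proportions."""
    diag = np.diag(p)
    with np.errstate(divide="ignore", invalid="ignore"):
        ua = diag / p.sum(axis=1)                # diagonal over row (map class) sums
        pa = diag / p.sum(axis=0)                # diagonal over column (reference) sums
    return diag.sum(), ua, pa, p.sum(axis=0)

# Hypothetical sample: 6 pixels, 2 canonical axes, 2 classes.
scores = np.array([[0.1, 0.0], [0.0, 0.2], [-0.2, 0.0],
                   [0.3, 0.3], [2.0, 2.0], [3.0, -1.0]])
p = local_error_matrix(scores, [0.5, 0.25], [0, 0, 1, 1, 0, 1],
                       [0, 1, 1, 1, 0, 0], target=[0.0, 0.0], k=4, n_classes=2)
oa, ua, pa, ref_prob = local_indicators(p)
print(oa, ua, pa, ref_prob)
```

A single matrix per pixel yields all indicators at once; only the value of k would change between modeling objectives.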

2.2. CCA and CNN Used as Feature Extractors for CCAErrMat: CCACCAErrMat and CNNCCAErrMat

For modeling by CCAErrMat, CCA-separate, and CNN-separate, candidate explanatory variables usually include map classes, geospatial coordinates, and map class occurrence pattern indices, such as class proportions, homogeneity, heterogeneity, dominance, entropy, and contagion [34,35,36]. For this study, map class occurrence pattern indices were computed over moving window sizes of 3 by 3, 5 by 5, 7 by 7, and 9 by 9 pixels. This maximum window size of 9 pixels was set because smaller window sizes are more informative for pixel-level modeling and help to reduce the computational burden. A total of 63 explanatory variables were considered in model selection: map class (6), class proportions in different-sized windows (28), pattern indices computed in different-sized windows (20), class occurrence frequencies of neighbor polygons (7), and sample pixel’s geospatial coordinates (2) [21].
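Two of the pattern indices listed above, class proportions and entropy, can be computed over a moving window as sketched below. This is a minimal sketch and not the procedure of [21]: the function names are hypothetical, border pixels are assumed to be handled by edge replication, and Shannon entropy with natural logarithms is assumed.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_class_proportions(class_map, n_classes, size):
    """Per-pixel proportion of each class within a size-by-size moving window."""
    pad = size // 2
    # ASSUMPTION: edge pixels are handled by replicating border values.
    padded = np.pad(class_map, pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))   # (rows, cols, size, size)
    props = [(windows == c).mean(axis=(2, 3)) for c in range(n_classes)]
    return np.stack(props, axis=-1)                        # (rows, cols, n_classes)

def shannon_entropy(props):
    """Shannon entropy of window class proportions (0 for a pure window)."""
    safe = np.where(props > 0, props, 1.0)                 # log(1) = 0 for absent classes
    return -(props * np.log(safe)).sum(axis=-1)

# Hypothetical 4 x 4 map with class 0 on the left and class 1 on the right.
class_map = np.array([[0, 0, 1, 1]] * 4)
props = window_class_proportions(class_map, n_classes=2, size=3)
entropy = shannon_entropy(props)
print(props[1, 1], entropy[1, 1])
```

Repeating the call for sizes 3, 5, 7, and 9 would give one set of proportion layers per window size.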
The computing of class occurrence pattern indices is, however, computationally intensive. As CNN is becoming easy to use and is often considered as a feature extractor, there seem to be merits in applying CNN using map class indicators alone as explanatory variables to predict reference class probabilities, which are then used as explanatory variables along with map classifications for CCAErrMat. This leads to a method named CNNCCAErrMat. Likewise, CCA is also used as a feature extractor for CCAErrMat, leading to Method CCACCAErrMat.

2.3. Improved Model-Assisted Area Estimation

Area estimators include design-based estimators, model-based estimators, and model-assisted estimators [25,37,38]. The most common design-based estimator is the π estimator (also known as the Horvitz–Thompson estimator), which uses probability-based samples to estimate areas. It is unbiased and has relatively high precision (i.e., low variance). When stratified sample data with map classes being the strata were organized in an error matrix, areas for individual cover types were easily estimated [8]. See Appendix C for further detail. When full-coverage auxiliary variables were available or could be assembled, model-assisted estimators were used [37,39], such as the difference estimator and the regression estimator, which are also described in Appendix C.
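For orientation, the widely used stratified estimator of class area proportions from an error matrix of sample counts can be sketched as follows. The sketch assumes that map classes are the strata and that stratum weights are the mapped area proportions; the function name and numbers are hypothetical, and the formulation in Appendix C may differ in detail.

```python
import numpy as np

def stratified_area_proportions(counts, stratum_weights):
    """Estimate reference class area proportions and their standard errors.

    counts[i, j]: sample pixels in map class (stratum) i with reference class j.
    stratum_weights[i]: proportion of the mapped area occupied by map class i.
    """
    counts = np.asarray(counts, dtype=float)
    w = np.asarray(stratum_weights, dtype=float)[:, None]
    n_i = counts.sum(axis=1, keepdims=True)      # sample size per stratum
    p_ij = counts / n_i                           # within-stratum reference shares
    area = (w * p_ij).sum(axis=0)
    var = (w ** 2 * p_ij * (1.0 - p_ij) / (n_i - 1.0)).sum(axis=0)
    return area, np.sqrt(var)

# Hypothetical two-class example (illustrative values only).
area, se = stratified_area_proportions([[45, 5], [10, 40]], [0.7, 0.3])
print(area, se)
```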
A model-assisted area estimator, which uses predicted reference class probabilities (Equation (5)) as auxiliary variables, was proposed in this study. Since the predicted class probabilities presumably approximated true class proportions more closely than original map classifications, they were expected to lead to increased precision in area estimation. This was pursued through the difference estimator and regression estimator described in Appendix C.
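A generic difference estimator under stratified random sampling illustrates the idea: the population mean of the predicted probabilities is corrected by the stratum-weighted mean of the sample residuals. This is a textbook-style sketch, not necessarily the exact formulation in Appendix C; the function name and all values are hypothetical, and finite population corrections are ignored.

```python
import numpy as np

def difference_estimator(pred_all, pred_sample, y_sample, strata_sample, stratum_weights):
    """Model-assisted difference estimate of one class's area proportion.

    pred_all: predicted class probabilities for every pixel in the population.
    pred_sample, y_sample: predictions and 0/1 reference indicators at sample pixels.
    strata_sample: stratum label of each sample pixel.
    stratum_weights: dict mapping stratum label to its population share.
    """
    resid = np.asarray(y_sample, float) - np.asarray(pred_sample, float)
    strata_sample = np.asarray(strata_sample)
    correction, var = 0.0, 0.0
    for h, w_h in stratum_weights.items():
        e_h = resid[strata_sample == h]
        correction += w_h * e_h.mean()                   # bias correction from residuals
        var += w_h ** 2 * e_h.var(ddof=1) / e_h.size     # ASSUMPTION: no finite population correction
    return float(np.mean(pred_all) + correction), float(np.sqrt(var))

# Hypothetical population predictions and a 6-pixel sample in 2 strata.
pred_all = np.array([0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.9, 0.1])
est, se = difference_estimator(pred_all,
                               pred_sample=[0.9, 0.8, 0.7, 0.2, 0.1, 0.3],
                               y_sample=[1, 1, 0, 0, 0, 1],
                               strata_sample=[0, 0, 0, 1, 1, 1],
                               stratum_weights={0: 0.6, 1: 0.4})
print(est, se)
```

The closer the predicted probabilities are to the reference indicators, the smaller the residual variance and hence the standard error, which is the rationale given above for using them as auxiliary variables.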

3. Experiments

3.1. The Study Site and Datasets

The GlobeLand30 2010 land-cover dataset (http://www.globallandcover.com (accessed on 26 February 2023)) for Wuhan, China, was used for this study. The city of Wuhan is about 8495 km² in areal extent and is located in the middle reach of the Yangtze, as shown in Figure 2 (where the inset map of China is shown in the lower right corner). Of the seven classes present in Wuhan, the dominant class is cultivated land, followed by water, forest, and artificial surface; they account for about 60%, 15%, 12%, and 7% of the total area, respectively, according to map classifications. Grassland, wetland, and bare land together take up about 6% of Wuhan’s areal extent. Further detail can be found in Wan et al. [21].
Model training and testing were based on the sample data acquired by Zhang et al. [40], which were collected following a class-heterogeneity-stratified random sampling design (StRS). This design uses a local heterogeneity index to stratify pixels within a class (stratum) into relatively homogeneous and heterogeneous sub-strata [40,41], resulting in a total of 14 sub-strata for the 7 strata (cover types) in the study area. The full training sample has a total of 3000 pixels, and the full test sample has a total of 1020 pixels, as shown in Table 1, where the numbers of sample pixels belonging to individual sub-strata are indicated, with sampling intensities shown in parentheses for the full training sample and the full test sample. Sampling intensities are percentages of the total number of pixels (Nstrata) belonging to specific sub-strata (E: heterogeneous; O: homogeneous). Sampling intensities for E sub-strata were much greater than those for their O counterparts, which is a feature of the StRS design adopted by Zhang et al. [40]. Sample size allocations for the training and test samples were configured following the Neyman method [40]. In Table 1, Cultivt and Artfct are abbreviations for cultivated land and artificial surfaces, respectively, while grass, water, and bare stand for grassland, water bodies, and bare land, respectively. These abbreviations are used throughout the remainder of this paper, including the appendices.
In this study, for evaluating the influence of sample size on model performances, sample subsets I and II (of 360 pixels and 1020 pixels, respectively) were selected from the full training sample for experiments, as shown in Table 1. For model testing, the full test sample was adopted. For model-assisted area estimation, sample subset III (of 360 pixels) was selected from the full test sample and was used along with the latter, as shown in Table 1. Note that the aforementioned sample subsets I and II (for model training) were different from the samples used for area estimation (i.e., sample subset III and full test sample), although with the same sizes (strata-allocations) of 360 and 1020 pixels, respectively, as shown in Table 1.

3.2. Results

Modeling and estimation were carried out using methods described in Section 2. Some of the results, especially the intermediate ones, are shown in Appendix D, as indicated in the text below wherever relevant.

3.2.1. Mapping Local Accuracy

Conventional error matrix-based accuracy assessment was carried out, as shown in Table A1 in Appendix D. While it is possible to identify classes of relatively lower UAs and/or PAs from inspecting the error matrix, pixel-level information about spatial variations in accuracy would be possible only through local accuracy mapping.
Local accuracies were mapped using the methods described in Section 2. In Appendix D, Table A2 shows the optimal explanatory variables selected and optimum k’s for KNN (K nearest neighbors) in CCA-separate while shows optimum k’s for CCAErrMat. Some of the local accuracy surfaces obtained with method CCAErrMat based on the training sample subset of 360 pixels are shown in Figure 3, where surfaces of local OAs, local UAs for cultivated land, and local PAs for water bodies are shown in Figure 3a–c, respectively.
Water pixels appear to have higher local OAs while pixels of artificial surfaces and cultivated land tend to have lower local OA, as shown in Figure 3a. For the class of cultivated land, there exist variations in local UAs, as shown in Figure 3b. For the class of water bodies, higher local PAs tend to cluster around water bodies of larger areal extents.
Accuracies of local accuracy indicators predicted using different methods were evaluated based on the full test sample. This was performed using the area under the receiver operating characteristic (ROC) curve (AUC) [42], which is commonly applied for continuously valued quantities such as local accuracies. Results are shown in Table 2.
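As an illustration of this evaluation, AUC can be computed as the probability that a randomly chosen correctly classified test pixel receives a higher predicted local accuracy than a randomly chosen misclassified one, with ties counted as one half. The sketch below uses this pairwise definition; the function name and the six test pixels are hypothetical.

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]            # all positive-negative pairs
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (pos.size * neg.size)

# Hypothetical test pixels: predicted local OA and map-reference agreement (1 = agree).
predicted_oa = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
agreement = [1, 1, 1, 0, 0, 0]
auc_value = auc(predicted_oa, agreement)
print(auc_value)
```

The pairwise form needs memory proportional to the product of the two group sizes, which is unproblematic for a test sample of about a thousand pixels.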
As shown in Table 2, accuracies increase generally as the training sample size increases. For local OAs, all methods tested achieve AUCs higher than 0.70. CCA-separate with 1020 training sample pixels attained the highest accuracy (0.79). For local UAs, relatively accurate predictions are generated by most methods for cultivated land, forest, and artificial surfaces. For local PAs, prediction accuracies are quite high. This is similar to that reported in Wickham et al. [13].
Then, we compare CCAErrMat and CCACCAErrMat. Although they predict local accuracies with comparable accuracies, CCACCAErrMat predicts local OAs and local UAs (for grassland, artificial surfaces, and bare land) with greater accuracy. Thus, it is highly recommended for use given its reduced cost in computing class occurrence pattern indices.

3.2.2. Analyzing Map-Reference Class Co-Occurrences

Alternative methods were applied with training sample subsets I and II detailed in Table 1. Unless stated otherwise, the results obtained using CCAErrMat with the sample of 360 pixels were used as examples below.
Predicted reference class probabilities were used to identify mixed pixels by setting a threshold on the maximum class probability. The threshold applied was 0.60. The mixed pixels identified are shown in Figure 4a, where many pixels were viewed as mixed because the threshold value applied was relatively high. Based on local OAs, locations of relatively low accuracy (thresholding at 0.20) can be marked, resulting in the map shown in Figure 4b, where fewer locations than those in Figure 4a were considered low-accuracy because the threshold value applied was relatively low. By combining the maps shown in Figure 4a,b, locations of low accuracy and mixed cover were identified. This leads to the map shown in Figure 4c, where the marked locations indicate where re-mapping should be targeted.
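The thresholding steps can be sketched as below. Two points are assumptions rather than statements from the text: a pixel is treated as mixed when its maximum class probability falls below 0.60, and the two maps are combined by intersection. The function name and the 2 by 2 example are hypothetical.

```python
import numpy as np

def remapping_targets(ref_prob, local_oa, mixed_thr=0.60, low_acc_thr=0.20):
    """Flag mixed pixels, low-accuracy pixels, and their combination.

    ref_prob: array (rows, cols, n_classes) of predicted reference class probabilities.
    local_oa: array (rows, cols) of predicted local overall accuracy.
    """
    mixed = ref_prob.max(axis=-1) < mixed_thr     # ASSUMPTION: mixed if max probability < threshold
    low_acc = local_oa < low_acc_thr
    return mixed, low_acc, mixed & low_acc        # ASSUMPTION: combined by intersection

# Hypothetical 2 x 2 scene with 3 classes.
ref_prob = np.array([[[0.90, 0.05, 0.05], [0.5, 0.3, 0.2]],
                     [[0.40, 0.40, 0.20], [0.7, 0.2, 0.1]]])
local_oa = np.array([[0.80, 0.15],
                     [0.50, 0.10]])
mixed, low_acc, target = remapping_targets(ref_prob, local_oa)
print(target)
```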
Predicted reference class probabilities were further used to augment land cover representations. The most probable cover types (i.e., primary classes) were determined based on the predicted class probabilities, as shown in Figure 5a. For mixed pixels identified as in Figure 4a, the second most probable class labels (i.e., alternate classes) were determined, resulting in Figure 5b. Furthermore, the mixed pixels exhibiting almost equal probabilities to more than two classes were further identified (called highly mixed pixels), as shown in Figure 5c, where the third most probable classes are depicted.
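Primary, alternate, and third class labels can be derived by ranking the predicted probabilities at each pixel, as in the sketch below. The tolerance used to decide that probabilities are "almost equal" (tie_tol) is a hypothetical parameter, as are the function name and example values; -1 marks pixels for which a label does not apply.

```python
import numpy as np

def ranked_labels(ref_prob, mixed_thr=0.60, tie_tol=0.05):
    """Primary, alternate, and third most probable class labels per pixel."""
    order = np.argsort(-ref_prob, axis=-1, kind="stable")     # classes by descending probability
    sorted_p = np.take_along_axis(ref_prob, order, axis=-1)
    primary = order[..., 0]
    mixed = sorted_p[..., 0] < mixed_thr
    alternate = np.where(mixed, order[..., 1], -1)
    # ASSUMPTION: "highly mixed" means the top three probabilities differ by less than tie_tol.
    highly_mixed = mixed & (sorted_p[..., 0] - sorted_p[..., 2] < tie_tol)
    third = np.where(highly_mixed, order[..., 2], -1)
    return primary, alternate, third

# Hypothetical probabilities for three pixels and three classes.
probs = np.array([[0.90, 0.05, 0.05],
                  [0.50, 0.30, 0.20],
                  [0.34, 0.33, 0.33]])
primary, alternate, third = ranked_labels(probs)
print(primary, alternate, third)
```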
In addition, based on local error matrices, locations that appear to be correctly classified or that can be corrected based on local refinability can be identified. A total of 359,780 pixels were identified as being correctly classified, while only 768 pixels were identified as refinable. All 3 by 3 pixel segments (excluding edge pixels) in which correctly classified pixels fall were extracted (Figure 6a). For pixels found refinable, the homogeneous 3 by 3 pixel segments (again excluding edge pixels) they fall in were corrected accordingly (Figure 6b). The union of correctly classified pixel segments (Figure 6a) and corrected pixel segments (Figure 6b) is shown in Figure 6c, leading to the set of pixels potentially usable as classification training data.

3.2.3. Improved Area Estimation

Based on reference class probabilities predicted using CCAErrMat (trained with the training sample subset I of 360 pixels described in Section 3.1), model-assisted area estimation was carried out and compared with that obtained with the π estimator. For this, two samples (the full test sample set of 1020 pixels and its subset of 360 pixels (sample subset III), as shown in Table 1) were used. The estimated area proportions and the corresponding standard errors (SEs) relative to area proportions estimated are shown in Table 3 and Table 4, respectively.
As shown in Table 3, area proportions of the seven classes estimated by the different methods are quite similar. However, there are noticeable differences in the SEs of the area estimates, as shown in Table 4 (where smaller SEs mean greater precision). Overall, SEs for area estimates obtained with the larger sample (1020 pixels) are smaller than those obtained with the smaller sample (360 pixels), indicating improved precision with a greater sample size. Among the three estimators tested, the π estimator generally leads to the greatest SEs and the regression estimator to the smallest SEs (especially for the classes of cultivated land and water bodies), with the difference estimator in between. Exceptions are observed for the class of bare land, because it has the smallest areal extent and is prone to misclassification.

4. Discussions

4.1. Comparisons with Related Work

This study promotes method CCAErrMat for uncertainty characterization and utility enhancement. The method’s major advantages lie in its multiple functionalities.
For local accuracy mapping, CCAErrMat allows predictions of multiple local accuracy indicators without having to re-run modeling separately, although optimum k’s for different accuracy indicators need to be determined separately. The proposed method complements those devised for image-domain per-pixel accuracy analyses, such as Ebrahimy et al. [9].
Concerning performances of local accuracy modeling, some remarks are in order. Although not reported here, a preliminary experiment was undertaken. Results indicated that geospatially constrained local error matrices are inferior to CCAErrMat due to the sparsity of sample data used in this study (even with the full model-training sample set of 3000 pixels). In addition, the initial results showed that CCAErrMat was comparable to random forest in prediction performances. Nevertheless, it is important to undertake more comprehensive comparative studies on competing methods, although this is out of the scope of this study.
Predicted reference class probabilities can be used to augment land cover representations by providing information about potential land cover distributions through multiple alternative class labels. They can also be used to locate mixed pixels, providing useful information about locations prone to misclassification. The demarcation of low-accuracy locations and mixed pixels is useful for well-targeted land cover re-mapping. This is complementary to the work done by Yu et al. [43], in which difficult-to-map classes are identified and analyzed based on multiple classifications of land cover, and by Huang et al. [20], where areas of uncertainty were identified using multiple map consistency analysis.
Corrections for misclassification errors can be made to increase classification accuracies when misclassification errors exhibit certain patterns [22]. However, map-wise misclassification patterns are rarely apparent or discernible. With the proposed method, patterns of map-reference cover type associations can be analyzed based on localized error matrices. Pixels that are correctly classified or perfectly misclassified can be identified to create classification training datasets, although our preliminary experiments using augmented training data indicated only a modest accuracy increase. Nevertheless, this method provides a map-domain strategy for classification improvement, complementing image-domain strategies such as that of Huang et al. [20], in which image interpretation is adopted over uncertain regions while training sample data for image classification over certain regions are extracted from consistency analyses of multiple products. Furthermore, confusion among cover types can be analyzed locally to aid in searching for potentially informative features for land cover mapping, although this was not investigated further in this study.
With the proposed method, the enhanced map utilities were also demonstrated by an improved area estimation through the use of model-predicted reference class probabilities. Specifically, cover type probabilities were used as auxiliary variables in models for predicting per-pixel areal extents of candidate classes. Areas of multiple cover types were estimated by correcting model predictions using reference sample data in a model-assisted estimation framework. This strategy complements existing model-assisted area estimators, which are often based on remote-sensing images and/or biophysical variables and are mostly applied to a single theme (e.g., forest) [38]. According to Sales et al. [44], consistent area estimates can be obtained using class membership probabilities estimates from a random forest classification, with the error of the predicted class membership probabilities converging to zero given a large sample and proper set of explanatory variables. This model-based strategy can be usefully extended with the model-assisted strategy promoted in this study.

4.2. Recommendation for Further Research

CCAErrMat may be improved by integrating it with geographically localized error matrices when denser reference data can be furnished. CCA can also be improved by considering stratified modeling approaches according to distinct patterns of map-reference cover type associations. This is relevant for large-area applications where regionalization is to be discerned concerning map-reference cover type associations. As a further note, feature space does not need to be synthesized through CCA only. Other kinds of synthetic feature spaces are certainly worth studying.
Land cover information fusion is a long-standing research theme. There is continuing work on the fusion of multiple land cover products by making use of a variety of methods, as exemplified by the relevant literature [45,46,47,48,49,50,51]. Other potentially useful methods are worth exploring. For instance, in addition to original maps, class probabilities predicted can be used as contextual information or as land cover primitives [52] for the fusion of multiple land cover products [53], especially when in combination with accuracy characterization. This may proceed by having individual products accuracy-characterized and thematically aligned using reference sample data before having them fused through accuracy-based weighting [47]. Land cover information fusion likely also benefits from exploiting a fuller spectrum of map-reference class co-occurrence statistics, which can be computed using local error matrices, such as map-reference cover type transition probabilities, as investigated by Zhang et al. [23].
Change detection and analysis are becoming increasingly important [53,54,55]. It is certainly worth exploring how CCAErrMat (and its variant CCACCAErrMat) can be extended from single-date applications to change detection and multi-temporal applications [56]. Both direct and indirect methods are worth exploring. The former treat changes as “from-to” classes (applying methods originally designed for single-date land cover directly to change detection), while the latter handle single-date land cover separately, followed by proper synthesis of single-date results. For the latter, work by Zhang et al. [15] on geostatistical modeling of spatial-temporal correlation (for local accuracy characterization in land cover change) may be usefully extended.
In this study, we considered only discrete classes of land cover. Percent cover information products are also important [13,57]. Numerical ecology and machine learning methods can in principle be used for modeling percent covers of candidate land cover classes by noting the analogy between percentages (or fractions) and abundances. Although there seems to be no consensus on how error matrices should be constructed for percent covers, some work has been done on error matrices for soft classifications [58]; class probabilities can be directly used as explanatory variables for CCA, indicating CCAErrMat’s potential extensibility. Nevertheless, further investigations are needed.

5. Conclusions

The novelty of this study lies in having proposed a method based on CCA ordination space localized error matrices for accuracy characterization (with multiple local accuracy indicators), analyses of map-reference cover type transitions, prediction of reference class probabilities, and improved area estimation. To bypass the pre-computing of map class occurrence pattern indices in modeling, CCACCAErrMat and CNNCCAErrMat are also proposed as useful extensions to CCAErrMat.
Results obtained from experiments in this study are summarized below. First, for local accuracy characterization, local OAs, local UAs, and local PAs were all predicted with reasonable accuracy using alternative methods. However, in terms of cost-effectiveness, CCAErrMat and CCACCAErrMat are preferred. Second, reference class probabilities predicted using CCAErrMat were used to augment land cover representations with primary and alternate labels. They were also used to identify mixed pixels, which, along with low-accuracy pixels, can be used to inform re-mapping. Third, local error matrices were analyzed to extract training data by locating correctly classified and perfectly misclassified pixels. Lastly, standard errors in area estimation were greatly reduced by model-assisted area estimation using reference class probabilities predicted by CCAErrMat as auxiliary variables. The regression estimator gave rise to the greatest precision on the whole. Although CCAErrMat was tested using GlobeLand30 land cover data over Wuhan municipality, it is transferable to other regions and other land cover products with the understanding that the specific results and performances are likely different over different regions and with different products.
Through map-reference co-occurrence analyses via local error matrices, the proposed method, CCAErrMat, provides fuller information about uncertainty in map classification, map-reference cover type transitions (e.g., differentiation among locations correctly classified, perfectly misclassified, or prone to more complex confusion), and reference class probabilities, which lead to further improved area estimation by building model-assisted area estimators. The enhanced map utilities pertain not only to the maps individually, but also collectively when they are used in combination (e.g., for change detection and data fusion). These come at no extra cost with respect to the reference data used, which are usually available or can be easily furnished, as in conventional accuracy assessments.

Author Contributions

Conceptualization, Y.W. and J.Z.; methodology, Y.W. and J.Z.; software, Y.W.; validation, Y.Z., W.Y. and J.W.; formal analysis, Y.W.; investigation, W.Z.; resources, W.Y.; data curation, W.Z., O.S.C. and A.M.T.N.; writing—original draft preparation, J.Z.; writing—review and editing, J.Z.; funding acquisition, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the National Science Foundation of China (grant no. 41471375) and The National High Technology Research and Development Program of China (grant no. 2021YFD1700905).

Data Availability Statement

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. CCA Modeling

The following description of CCA is based on Legendre and Legendre [28] and Wan et al. [21]. Let $I$ and $Z$ represent the matrices for response variables and explanatory variables, respectively. Let $f_{ij}$, $f_{i+}$, and $f_{+j}$ be the absolute frequencies, row totals, and column totals of matrix $I$, respectively. Then, the relative frequencies, row totals, and column totals of matrix $I$, which are represented by $p_{ij}$, $p_{i+}$, and $p_{+j}$, respectively, are computed as:
$$p_{ij} = \frac{f_{ij}}{f_{++}} = \frac{f_{ij}}{\sum_{i=1}^{n}\sum_{j=1}^{p} f_{ij}} \tag{A1}$$
$$p_{i+} = \frac{f_{i+}}{f_{++}} = \frac{\sum_{j=1}^{p} f_{ij}}{\sum_{i=1}^{n}\sum_{j=1}^{p} f_{ij}} \tag{A2}$$
$$p_{+j} = \frac{f_{+j}}{f_{++}} = \frac{\sum_{i=1}^{n} f_{ij}}{\sum_{i=1}^{n}\sum_{j=1}^{p} f_{ij}} \tag{A3}$$
Matrix $I$ (for sample pixels) is then transformed to matrix $Q$, which consists of contributions to chi-squares defined for the cross-tabulation of rows and columns in $I$:
$$Q = [q_{ij}] = \left[\frac{p_{ij} - p_{i+}\,p_{+j}}{\sqrt{p_{i+}\,p_{+j}}}\right] \tag{A4}$$
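The transformation to chi-square contributions can be illustrated with a minimal Python sketch. The function name `chi_square_contributions` and the toy frequency table are hypothetical and serve illustration only; the study itself relied on the R package vegan. The sketch assumes that no row or column of the table sums to zero.

```python
from math import sqrt

def chi_square_contributions(freq):
    """Return Q = [(p_ij - p_i+ * p_+j) / sqrt(p_i+ * p_+j)] for a frequency table.

    `freq` is a list of rows (sample pixels) by columns (classes).
    Assumes every row total and column total is positive.
    """
    total = sum(sum(row) for row in freq)                 # f_++
    p = [[f / total for f in row] for row in freq]        # relative frequencies p_ij
    row_tot = [sum(row) for row in p]                     # p_i+
    col_tot = [sum(col) for col in zip(*p)]               # p_+j
    return [
        [(p[i][j] - row_tot[i] * col_tot[j]) / sqrt(row_tot[i] * col_tot[j])
         for j in range(len(col_tot))]
        for i in range(len(row_tot))
    ]

# Hypothetical table: 3 sample pixels (rows) by 2 classes (columns).
freq = [[1, 0], [0, 1], [1, 1]]
Q = chi_square_contributions(freq)

# The sum of squared entries of Q equals the Pearson chi-square statistic
# of the table divided by the grand total f_++.
inertia = sum(q * q for row in Q for q in row)
```

For this toy table, the third row matches the overall class proportions exactly, so its entries in $Q$ are zero and it contributes nothing to the total.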
After performing multiple regression of matrix $Q$ on matrix $Z$, the matrix of regression coefficients $B$ is obtained as:
$$B = \left[Z^{\top} D(p_{i+})\, Z\right]^{-1} Z^{\top} D(p_{i+})^{1/2}\, Q \tag{A5}$$
where $D(p_{i+})^{1/2} = \mathrm{diag}\left(p_{i+}^{1/2}\right)$, with $D(p_{i+})$ being the diagonal matrix of the row sums of matrix $I$. Matrix $\hat{Q}$ is obtained as:
$$\hat{Q} = D(p_{i+})^{1/2}\, Z B \tag{A6}$$
Principal component analysis is then run on matrix $\hat{Q}$, with eigenvalues $\Lambda$ and eigenvectors $U$ derived. Site scores $O$ are computed as linear combinations of $Z$ variables using $B$. These site scores determine the canonical coordinates of each pixel in the canonical space (i.e., ordination space, called feature space in this paper) defined by the canonical axes:
$$O_x = Z \cdot B \cdot U \Lambda^{-1/2} \tag{A7}$$
After calculating site scores, the CCA feature space is constructed with the training sample and the test sample. Distances between samples are eigenvalue-weighted squared distances:
$$d_{k x_0}^{2} = \left(O_k - O_{x_0}\right) \Lambda \left(O_k - O_{x_0}\right)^{\top} \tag{A8}$$
where $O_k$ and $O_{x_0}$ are the site scores of reference sample pixel $k$ and the target pixel $x_0$, respectively, and $\Lambda$ is the diagonal matrix constructed using the eigenvalues derived.
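As a hedged illustration of how the eigenvalue-weighted distance can drive a nearest-neighbor search in ordination space, the following Python sketch ranks reference pixels by their weighted squared distance to a target pixel. The function names, site scores, and eigenvalues are hypothetical; since $\Lambda$ is diagonal, the quadratic form reduces to a weighted sum of squared score differences.

```python
def weighted_sq_distance(o_k, o_x0, eigenvalues):
    """Eigenvalue-weighted squared distance between two site-score vectors.

    With a diagonal Lambda, (O_k - O_x0) Lambda (O_k - O_x0)^T is the sum of
    lambda_a * (difference on axis a)^2 over the canonical axes.
    """
    return sum(lam * (a - b) ** 2 for a, b, lam in zip(o_k, o_x0, eigenvalues))

def k_nearest(ref_scores, target_score, eigenvalues, k):
    """Indices of the k reference pixels closest to the target pixel."""
    ranked = sorted(
        range(len(ref_scores)),
        key=lambda i: weighted_sq_distance(ref_scores[i], target_score, eigenvalues),
    )
    return ranked[:k]

# Hypothetical site scores on two canonical axes and their eigenvalues.
eigenvalues = [0.5, 0.1]
ref_scores = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
target = [0.0, 0.0]

nearest = k_nearest(ref_scores, target, eigenvalues, k=2)
```

In this toy case the second reference pixel is ranked first, although it is farther away in unweighted Euclidean terms, because its offset lies along the axis with the smaller eigenvalue.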
In this study, the anova.cca function in the R package vegan (vegan: community ecology package, version 2.5–4, http://vegan.r-forge.r-project.org/ (accessed on 26 February 2023)) was used for forward variable selection to identify a reduced set of explanatory variables.

Appendix B. CNN

A CNN architecture and optimization techniques similar to those in Carranza-García et al. [59] were adopted for this study. As shown in Figure A1, the CNN architecture contains two convolution layers, two pooling layers, and one fully connected layer. Batch normalization and dropout, as in Carranza-García et al. [59], were also used to enhance CNN performance. A fixed-size neighborhood patch centered at each pixel being processed was used as the input for the central pixel. As patch size influences CNN performance, the optimal patch size was determined through trial and error to maximize accuracy.
Figure A1. The network architecture of the convolution neural network used in this research.
The training sample size is key to CNN performance. CNNs usually require a large sample size to avoid overfitting and to achieve satisfactory performance. However, reference sample data are expensive to collect in practice. To augment the original reference sample data, each training image patch is rotated by seven fixed angles (45°, 90°, 135°,…, 315°) to fully extract spatial features from sample image patches. Therefore, for a sample of 360 pixels, its sample size can be expanded sevenfold to 2520 pixels.
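The rotation-based augmentation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the functions `rotate_patch` and `augment` are hypothetical, nearest-neighbor resampling is assumed because the patches hold categorical class labels, and cells that fall outside the source patch after rotation receive an assumed fill value. Whether the unrotated patches are retained alongside the rotated copies is not stated, so the sketch returns only the seven rotated copies.

```python
from math import cos, sin, radians

ANGLES = [45, 90, 135, 180, 225, 270, 315]  # the seven fixed rotation angles

def rotate_patch(patch, angle_deg, fill=0):
    """Rotate a square patch of class labels about its centre.

    Uses inverse mapping with nearest-neighbour resampling, so no new label
    values are created; cells with no source cell are set to `fill`.
    """
    n = len(patch)
    c = (n - 1) / 2.0
    a = radians(angle_deg)
    out = [[fill] * n for _ in range(n)]
    for r in range(n):
        for col in range(n):
            y, x = r - c, col - c
            # Map each output cell back to its source location.
            src_x = cos(a) * x + sin(a) * y + c
            src_y = -sin(a) * x + cos(a) * y + c
            i, j = int(round(src_y)), int(round(src_x))
            if 0 <= i < n and 0 <= j < n:
                out[r][col] = patch[i][j]
    return out

def augment(patch):
    """Return the seven rotated copies of one training patch."""
    return [rotate_patch(patch, angle) for angle in ANGLES]

# Hypothetical 3 x 3 patch of class codes.
patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
rotated = augment(patch)
n_augmented = 360 * len(ANGLES)  # 360 sample pixels, seven rotations each
```

The central pixel, which carries the label being predicted, is unchanged by every rotation.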

Appendix C. The π Area Estimator and Some of Model-Assisted Area Estimators

Define an indicator variable for area estimation as $I_t$:
$$I_t = \begin{cases} 1, & \text{if a unit (pixel), say } t, \text{ belongs to reference class } j \\ 0, & \text{otherwise} \end{cases} \tag{A9}$$
Estimating the area for reference class $j$ in the population (problem domain) amounts to estimating the population total $A$ for indicator $I_t$ (with $t$ traversing all units in the population). According to the π estimator, $A$ is estimated as:
$$\hat{A} = \sum_{s} \frac{I_t}{\pi_t} \tag{A10}$$
where $\pi_t$ represents the inclusion probability of element $t$ in the sample and, for stratified random sampling, is computed as:
$$\pi_t = \frac{n_h}{N_h} \tag{A11}$$
where $n_h$ and $N_h$ denote the sample count and population count of stratum $h$, respectively. When stratified sample data with map classes being the strata are organized in an error matrix $\{n_{hj}\}$, where $n_{hj}$ stands for the sample count of stratum $h$ with reference class $j$, Equation (A10) can be rewritten as:
$$\hat{A}_j = \sum_{h=1}^{H} N_h \frac{n_{hj}}{n_h} \tag{A12}$$
where $H$ represents the number of strata pertaining to the sample data [37].
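A small numerical sketch of the stratified form of the π estimator may help. The function `stratified_area` and all counts below are hypothetical and serve only to illustrate the arithmetic; areas are expressed in pixel counts.

```python
def stratified_area(N_h, n_hj, j):
    """Estimate the area of reference class j as sum over strata of N_h * n_hj / n_h.

    N_h  : population (pixel) counts per stratum (map class)
    n_hj : sample error matrix, one row of reference-class counts per stratum
    j    : column index of the reference class of interest
    """
    total = 0.0
    for N, counts in zip(N_h, n_hj):
        n = sum(counts)               # n_h, the sample size of the stratum
        total += N * counts[j] / n
    return total

# Hypothetical two-class example: 1000 and 500 mapped pixels, 50 samples each.
N_h = [1000, 500]
n_hj = [[40, 10],
        [5, 45]]

area_class0 = stratified_area(N_h, n_hj, 0)
area_class1 = stratified_area(N_h, n_hj, 1)
```

Here the estimated areas are 850 and 650 pixels, which sum to the population size of 1500, as they should when every sample pixel is assigned to exactly one reference class.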
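The variance estimator for the π estimator is:
$$\hat{V}(\hat{A}) = \sum\sum_{s} \frac{1}{\pi_{tp}} \left(\frac{\pi_{tp}}{\pi_t \pi_p} - 1\right) I_t I_p \tag{A13}$$
where $\pi_{tp}$ is the probability that element $t$ and element $p$ are simultaneously included in the sample:
$$\pi_{tp} = \begin{cases} \dfrac{n(n-1)}{N(N-1)}, & \text{when } t \neq p \\ \dfrac{n}{N}, & \text{when } t = p \end{cases} \tag{A14}$$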
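For stratified random sampling, the variance is estimated as [37]:
$$\hat{V}(\hat{A}) = \sum_{h=1}^{H} N_h^{2}\, \frac{1 - f_h}{n_h}\, s_{I s_h}^{2} \tag{A15}$$
where $f_h$ is the sampling intensity of stratum $h$ ($f_h = n_h / N_h$) and $s_{I s_h}^{2}$ is the sample variance within stratum $h$:
$$s_{I s_h}^{2} = \frac{\sum_{s_h} \left(I_t - \bar{I}_{s_h}\right)^{2}}{n_h - 1} \tag{A16}$$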
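The stratified variance estimator can be sketched in Python under the same hypothetical setting of two strata with 50 sample pixels each. The function name and counts are illustrative assumptions; the sample variance of the 0/1 indicator is computed directly from the counts in each row of the error matrix.

```python
from math import sqrt

def stratified_area_variance(N_h, n_hj, j):
    """Variance of the stratified area estimate for reference class j.

    Implements sum_h N_h^2 * (1 - f_h) / n_h * s^2, where s^2 is the sample
    variance of the 0/1 indicator within stratum h.
    """
    var = 0.0
    for N, counts in zip(N_h, n_hj):
        n = sum(counts)
        f = n / N                                   # sampling intensity f_h
        mean = counts[j] / n                        # stratum mean of the indicator
        ones, zeros = counts[j], n - counts[j]
        # Sum of squared deviations from the mean, divided by n_h - 1.
        s2 = (ones * (1 - mean) ** 2 + zeros * mean ** 2) / (n - 1)
        var += N ** 2 * (1 - f) / n * s2
    return var

# Hypothetical counts: strata sizes and sample error matrix.
N_h = [1000, 500]
n_hj = [[40, 10],
        [5, 45]]

variance = stratified_area_variance(N_h, n_hj, 0)
standard_error = sqrt(variance)
```

With these counts the standard error of the estimated area of the first class is roughly 59 pixels.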
For the difference estimator, the predicted class probability is set as a proxy of the target variable. The population total of class probabilities serves as a proxy of the ultimate area estimate. The difference between the proxy total and the ultimate estimate can then be estimated from the sample differences between the target variable and the proxy.
Let the proxy variable be $I_t^{0}$. Then, the difference is:
$$D_t = I_t - I_t^{0} \tag{A17}$$
Adding the estimated total of the differences to the proxy total recovers the area estimate of the target variable:
$$\hat{A}_{dif} = \sum_{U} I_t^{0} + \sum_{s} \frac{D_t}{\pi_t} \tag{A18}$$
where $\sum_{U} I_t^{0}$ is the sum of the auxiliary variable over the population and $\sum_{s} D_t / \pi_t$ is the estimated population total of the difference between the target variable and the auxiliary variable, based on the sample weighted by the inverse of the inclusion probabilities [37].
Variance estimation for $\hat{A}_{dif}$ is:
$$\hat{V}(\hat{A}_{dif}) = \sum\sum_{s} \frac{\pi_{tp} - \pi_t \pi_p}{\pi_{tp}}\, \frac{D_t}{\pi_t}\, \frac{D_p}{\pi_p} \tag{A19}$$
where $\pi_{tp}$ is the probability that the $t$-th element and the $p$-th element are both included in the sample. For stratified random sampling, the variance of the difference estimator is:
$$\hat{V}(\hat{A}_{dif}) = \sum_{h=1}^{H} N_h^{2}\, \frac{1 - f_h}{n_h} \left(s_{I s_h}^{2} + s_{Z s_h}^{2} - 2 s_{ZI s_h}\right) \tag{A20}$$
where $s_{I s_h}^{2}$ and $s_{Z s_h}^{2}$ represent the sample variances of variable $I$ (target variable) and variable $Z$ (proxy variable), with $s_{ZI s_h}$ being the sample covariance between $Z$ and $I$ [37].
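The difference estimator and its stratified variance can be sketched as below. The data structure (a list of dictionaries with keys `N`, `proxy_total`, `I`, and `Z`), the function names, and all numbers are hypothetical; in the study, the estimation was carried out with the R packages mase and survey. The proxy total of each stratum is assumed to be the sum of predicted class probabilities over all of its pixels.

```python
def sample_cov(x, y):
    """Sample covariance with the n - 1 denominator (variance when x is y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def difference_estimator(strata):
    """Return (area estimate, variance estimate) under stratified random sampling."""
    area, var = 0.0, 0.0
    for s in strata:
        N, I, Z = s["N"], s["I"], s["Z"]
        n = len(I)
        f = n / N
        # Proxy total plus expanded sample differences (1 / pi_t = N_h / n_h).
        area += s["proxy_total"] + (N / n) * sum(i - z for i, z in zip(I, Z))
        # s_I^2 + s_Z^2 - 2 s_ZI, i.e., the sample variance of D = I - Z.
        spread = sample_cov(I, I) + sample_cov(Z, Z) - 2 * sample_cov(Z, I)
        var += N ** 2 * (1 - f) / n * spread
    return area, var

# Hypothetical single stratum: 100 pixels, 4 sampled, proxy total of 62.
strata = [{
    "N": 100,
    "proxy_total": 62.0,
    "I": [1, 1, 0, 1],            # reference indicator at sample pixels
    "Z": [0.9, 0.7, 0.2, 0.6],    # predicted class probability at sample pixels
}]

area_dif, var_dif = difference_estimator(strata)
```

The closer the predicted probabilities track the reference indicator, the smaller the variance of the differences and hence the standard error of the area estimate.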
The other model-assisted estimator considered is the regression estimator. For this study, the regression estimator uses a linear regression on auxiliary variables as a proxy of the target variable $I$. The regression coefficients are estimated using sample data. For each pixel, the regression model is expressed as:
$$E(I_t) = \beta_1 + \beta_2 Z_t \tag{A21}$$
$$V(I_t) = \sigma^{2} \tag{A22}$$
where $\beta_1$ and $\beta_2$ are regression coefficients and $\sigma^{2}$ is the variance of the $t$-th unit. $I_t$ is the target variable, and $Z_t$ ($t = 1, 2, \ldots, N$) is the auxiliary variable. The regression coefficients are then estimated as $\hat{\beta}_1$ and $\hat{\beta}_2$.
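Then, the regression area estimator is as follows:
$$\hat{A} = N \left[\bar{I}_s + \hat{\beta}_2 \left(\bar{Z}_U - \bar{Z}_s\right)\right] \tag{A23}$$
where $N$ is the population size and $\bar{I}_s$ is the sample mean of the target variable, with $\bar{Z}_U$ and $\bar{Z}_s$ being the population mean and the sample mean of the auxiliary variable, respectively. The variance of the area estimate is:
$$\hat{V}(\hat{A}) = N^{2}\, \frac{1 - f}{n(n-1)} \sum_{s} \left[\left(I_t - \bar{I}_s\right) - \hat{\beta}_2 \left(Z_t - \bar{Z}_s\right)\right]^{2}, \tag{A24}$$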
as described in Särndal et al. (1992) [37].
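A minimal Python sketch of the regression estimator is given below for illustration. It assumes simple random sampling with sampling fraction $f = n/N$, estimates the slope by ordinary least squares, and uses hypothetical names and data; it is not the mase or survey implementation.

```python
def regression_estimator(N, z_pop_mean, I, Z):
    """Return (area estimate, variance estimate, slope) for the regression estimator.

    N          : population size (number of pixels)
    z_pop_mean : population mean of the auxiliary variable
    I, Z       : target indicator and auxiliary variable at the sample pixels
    """
    n = len(I)
    i_bar, z_bar = sum(I) / n, sum(Z) / n
    szz = sum((z - z_bar) ** 2 for z in Z)
    szi = sum((z - z_bar) * (i - i_bar) for z, i in zip(Z, I))
    beta2 = szi / szz                         # least-squares slope estimate
    area = N * (i_bar + beta2 * (z_pop_mean - z_bar))
    f = n / N                                 # sampling fraction (assumed SRS)
    resid_ss = sum(((i - i_bar) - beta2 * (z - z_bar)) ** 2 for z, i in zip(Z, I))
    var = N ** 2 * (1 - f) / (n * (n - 1)) * resid_ss
    return area, var, beta2

# Hypothetical sample of 4 pixels from a population of 100 pixels.
I = [1, 1, 0, 1]
Z = [0.9, 0.7, 0.2, 0.6]
area_reg, var_reg, slope = regression_estimator(100, 0.62, I, Z)
```

The estimate adjusts the expanded sample mean by the gap between the population mean and the sample mean of the auxiliary variable, scaled by the fitted slope.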
Model-assisted survey estimation is described in McConville et al. [60]. In this study, R package mase [60] was applied for model-assisted area estimation, along with package survey [61].

Appendix D. Some of Experiment Results

Appendix D.1. An Error Matrix Estimated for the Original Map

The aforementioned error matrix for the original land cover map is shown in Table A1. It was estimated using the full model-training sample of 3000 pixels, reporting an OA of 74.0%, a Tau coefficient of 0.696 and a Kappa coefficient of 0.601 [62].
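For readers wishing to reproduce such summary measures, the sketch below computes OA, Kappa, and Tau from an error matrix. It assumes the equal a priori probability form of Tau, $(P_o - 1/M)/(1 - 1/M)$ with $M$ classes; the function name and the small two-class matrix are hypothetical. Under this assumption, an OA of 73.9% with seven classes gives a Tau of about 0.696, in line with the value reported above.

```python
def accuracy_measures(matrix):
    """Return (OA, Kappa, Tau) for a square error matrix of counts or proportions.

    Rows are map classes and columns are reference classes. Tau assumes
    equal a priori class probabilities (1 / M for M classes).
    """
    m = len(matrix)
    total = sum(sum(row) for row in matrix)
    p = [[v / total for v in row] for row in matrix]
    oa = sum(p[i][i] for i in range(m))
    row_tot = [sum(row) for row in p]
    col_tot = [sum(col) for col in zip(*p)]
    chance = sum(r * c for r, c in zip(row_tot, col_tot))   # chance agreement
    kappa = (oa - chance) / (1 - chance)
    tau = (oa - 1 / m) / (1 - 1 / m)
    return oa, kappa, tau

# Hypothetical two-class error matrix of sample counts.
oa, kappa, tau = accuracy_measures([[40, 10], [5, 45]])

# Equal-prior Tau for an OA of 73.9% and seven classes.
tau_seven_classes = (0.739 - 1 / 7) / (1 - 1 / 7)
```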
Table A1. Error matrix and accuracy assessment for the original map based on the full model-training sample set (cell proportions and accuracy measures in %).

| Map Class \ Reference Class | Cultivt | Forest | Grass | Wetland | Water | Artfct | Bare | Strata Accuracy | UA |
|---|---|---|---|---|---|---|---|---|---|
| Cultivt (E) | 0.2 | 4.5 × 10⁻² | 6.4 × 10⁻² | 3.5 × 10⁻² | 0.1 | 0.1 | 2.5 × 10⁻² | 36.7 | 71.8 |
| Cultivt (O) | 43.4 | 2.5 | 1.8 | 2.3 | 5.8 | 3.6 | 0.8 | 72.1 | |
| Forest (E) | 0.3 | 0.4 | 0.2 | 0.1 | 0.4 | 0.1 | 3.0 × 10⁻² | 26.4 | 73.1 |
| Forest (O) | 0.8 | 8.3 | 0.8 | 0.2 | 0.3 | 0.1 | 3.8 × 10⁻² | 79.3 | |
| Grass (E) | 0.1 | 0.1 | 0.3 | 3.7 × 10⁻² | 0.1 | 3.1 × 10⁻² | 6.1 × 10⁻³ | 38.3 | 48.3 |
| Grass (O) | 0.3 | 0.2 | 1.3 | 4.6 × 10⁻² | 0.4 | 0.2 | 0.1 | 51.2 | |
| Wetland (E) | 2.7 × 10⁻⁴ | 0 | 5.3 × 10⁻⁴ | 1.1 × 10⁻² | 9.1 × 10⁻³ | 0 | 0 | 53.8 | 69.1 |
| Wetland (O) | 0.1 | 1.0 × 10⁻² | 2.0 × 10⁻² | 1.0 | 0.3 | 0 | 0 | 69.3 | |
| Water (E) | 2.3 × 10⁻² | 7.5 × 10⁻³ | 1.3 × 10⁻² | 3.0 × 10⁻² | 0.2 | 0 | 0 | 71.0 | 85.4 |
| Water (O) | 0.4 | 0.1 | 0.6 | 0.6 | 12.5 | 0.1 | 0.4 | 85.6 | |
| Artfct (E) | 2.0 × 10⁻² | 3.9 × 10⁻² | 1.6 × 10⁻² | 0 | 1.6 × 10⁻² | 0.1 | 6.1 × 10⁻³ | 52.0 | 83.1 |
| Artfct (O) | 0.2 | 0.1 | 0.4 | 0 | 0.1 | 6.2 | 0.4 | 84.0 | |
| Bare (E) | 7.2 × 10⁻³ | 9.6 × 10⁻³ | 8.2 × 10⁻³ | 0 | 4.8 × 10⁻⁴ | 0 | 1.3 × 10⁻² | 33.8 | 45.1 |
| Bare (O) | 9.0 × 10⁻³ | 1.2 × 10⁻² | 1.3 × 10⁻² | 0 | 1.6 × 10⁻³ | 0 | 3.7 × 10⁻² | 51.1 | |
| PA | 95.1 | 73.0 | 29.5 | 22.9 | 62.8 | 59.8 | 2.9 | | 73.9 |

Appendix D.2. Variable Selection and Optimum k for k Nearest Neighbors

Explanatory variables selected and optimum k for k nearest neighbor in Method CCA-separate are shown in Table A2 for re-mapping and for local accuracy mapping using 360 sample pixels and 1020 sample pixels.
Table A2. Selected explanatory variables and optimum k for KNN (Method CCA-separate).

| Re-mapping: Sample Sets | Selected Variables | k |
|---|---|---|
| 360 pixels | mapclass1+mapclass2+mapclass3+mapclass4+mapclass5+ | 20 |
| 1020 pixels | mapclass6+hom3+con3+het3+dom3+ent3+p1w3+p2w3+ | 47 |

| Local accuracy mapping (360 sample pixels) | Selected Variables | k |
|---|---|---|
| OA | ent3+mapclass6+mapclass5+p6w9+mapclass4+p5w7+p2w5 | 43 |
| UA Cultivt | p6w7 | 3 |
| UA Forest | con3+patch5+hom3 | 8 |
| UA Grass | con5+patch2 | 10 |
| UA Wetland | patch3+patch4+dom3+p5w3+con3+het3+patch1+p2w7 | 7 |
| UA Water | p1w7+het5+het7+area+patch3+con9 | 6 |
| UA Artfct | p8w3 | 20 |
| UA Bare | con5+het5+patch4+dom3+het7+p9w3+ent5+patch2+area+patch5+patch1+dom7+con3+p6w9+ent3+p2w3+patch3+hom9+het9+p2w5+p2w7+p1w9+hom3 | 7 |
| PA Cultivt | mapclass1+p1w3+p1w5+p1w7+p1w9 | 1 |
| PA Forest | mapclass2+p1w3+p1w5+p1w7+p1w9 | 1 |
| PA Grass | mapclass3+p1w3+p1w5+p1w7+p1w9 | 1 |
| PA Wetland | mapclass4+p1w3+p1w5+p1w7+p1w9 | 1 |
| PA Water | mapclass5+p1w3+p1w5+p1w7+p1w9 | 1 |
| PA Artfct | mapclass6+p1w3+p1w5+p1w7+p1w9 | 1 |
| PA Bare | mapclass7+p1w3+p1w5+p1w7+p1w9 | 1 |

| Local accuracy mapping (1020 sample pixels) | Selected Variables | k |
|---|---|---|
| OA | ent3+mapclass5+mapclass6+lsihet2_3+mapclass4+mapclass1+p6w9+p5w7+POINT_Y+hom9+p5w3+p5w9 | 37 |
| UA Cultivt | ent3+het3+p5w9+p6w5 | 21 |
| UA Forest | hom9+patch5+p3w9+p8w3+con3 | 40 |
| UA Grass | p1w5+dom5+dom9+patch2+area | 44 |
| UA Wetland | p6w3+patch4+patch3+patch1+p3w9+p1w3 | 36 |
| UA Water | p1w3+dom3+p3w5+patch6+p2w3 | 14 |
| UA Artfct | patch4+dom3+p8w5+hom3+p8w3 | 19 |
| UA Bare | patch3+p1w3+hom5+area | 14 |
| PA Cultivt | mapclass1+p1w3+p1w5+p1w7+p1w9 | 1 |
| PA Forest | mapclass2+p1w3+p1w5+p1w7+p1w9 | 1 |
| PA Grass | mapclass3+p1w3+p1w5+p1w7+p1w9 | 1 |
| PA Wetland | mapclass4+p1w3+p1w5+p1w7+p1w9 | 1 |
| PA Water | mapclass5+p1w3+p1w5+p1w7+p1w9 | 1 |
| PA Artfct | mapclass6+p1w3+p1w5+p1w7+p1w9 | 1 |
| PA Bare | mapclass7+p1w3+p1w5+p1w7+p1w9 | 1 |
Method CCAErrMat is the same as CCA-separate when used for re-mapping. Thus, the selected explanatory variables and optimum k are the same as shown in Table A2. However, when CCAErrMat is used for local accuracy mapping, optimum k values need to be found separately; they are shown in Table A3 for sample sets of 360 pixels and 1020 pixels.
Table A3. Optimum k for k nearest neighbors for local accuracy characterization (method CCAErrMat).
360
Sample Pixels
OAUA CultivtUA ForestUA GrassUA WetlandUA WaterUA ArtfctUA Bare
K128681023237
PA CultivtPA ForestPA GrassPA WetlandPA WaterPA ArtfctPA Bare
K 62421515
1020
Sample Pixels
OAUA CultivtUA ForestUA GrassUA WetlandUA WaterUA ArtfctUA Bare
K425022248124648
PA CultivtPA ForestPA GrassPA WetlandPA WaterPA ArtfctPA Bare
k 12421515
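To make the role of k concrete, the sketch below tallies a local error matrix from the k reference pixels nearest to a target pixel and derives local OA, UA, and PA from it. It is a simplified illustration under the assumption that the local error matrix is a plain tally over the k nearest reference pixels in ordination space; the function names, class labels, and neighbor list are hypothetical.

```python
def local_error_matrix(neighbors, classes):
    """Tally (map class, reference class) pairs of the k nearest reference pixels."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for map_class, ref_class in neighbors:
        matrix[index[map_class]][index[ref_class]] += 1
    return matrix

def local_accuracies(matrix):
    """Return local OA, per-class UA, and per-class PA (None when undefined)."""
    m = len(matrix)
    k = sum(sum(row) for row in matrix)
    oa = sum(matrix[i][i] for i in range(m)) / k
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(col) for col in zip(*matrix)]
    ua = [matrix[i][i] / row_tot[i] if row_tot[i] else None for i in range(m)]
    pa = [matrix[j][j] / col_tot[j] if col_tot[j] else None for j in range(m)]
    return oa, ua, pa

# Hypothetical k = 4 nearest reference pixels as (map class, reference class).
classes = ["Cultivt", "Water"]
neighbors = [("Cultivt", "Cultivt"), ("Cultivt", "Cultivt"),
             ("Cultivt", "Water"), ("Water", "Water")]

matrix = local_error_matrix(neighbors, classes)
local_oa, local_ua, local_pa = local_accuracies(matrix)
```

A small k yields coarse, highly variable local accuracies, while a large k smooths them toward the map-wide values, which is why k is tuned separately for each indicator.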

References

  1. Chen, J.; Chen, J.; Liao, A.; Cao, X.; Chen, L.; Chen, X.; He, C.; Han, G.; Peng, S.; Lu, M.; et al. Global land cover mapping at 30 m resolution: A POK-based operational approach. ISPRS J. Photogramm. Remote Sens. 2015, 103, 7–27.
  2. Brown, J.F.; Tollerud, H.J.; Barber, C.P.; Zhou, Q.; Dwyer, J.L.; Vogelmann, J.E.; Loveland, T.R.; Woodcock, C.E.; Stehman, S.V.; Zhu, Z.; et al. Lessons learned implementing an operational continuous United States national land change monitoring capability—The Land Change Monitoring, Assessment, and Projection (LCMAP) approach. Remote Sens. Environ. 2020, 238, 111356.
  3. Buchhorn, M.; Lesiv, M.; Tsendbazar, N.-E.; Herold, M.; Bertels, L.; Smets, B. Copernicus global land cover layers-collection 2. Remote Sens. 2020, 12, 1044.
  4. Friedl, M.A.; McIver, D.K.; Hodges, J.C.F.; Zhang, X.Y.; Muchoney, D.; Strahler, A.H.; Woodcock, C.E.; Gopal, S.; Schneider, A.; Cooper, A.; et al. Global land cover mapping from MODIS: Algorithms and early results. Remote Sens. Environ. 2002, 83, 287–302.
  5. Sulla-Menashe, D.; Gray, J.M.; Abercrombie, S.P.; Friedl, M.A. Hierarchical mapping of annual global land cover 2001 to present: The MODIS collection 6 land cover product. Remote Sens. Environ. 2019, 222, 183–194.
  6. Yang, Y.; Xiao, P.; Feng, X.; Li, H. Accuracy assessment of seven global land cover datasets over China. ISPRS J. Photogramm. Remote Sens. 2017, 125, 156–173.
  7. Hua, T.; Zhao, W.; Liu, Y.; Wang, S.; Yang, S. Spatial consistency assessments for global land-cover datasets: A comparison among GLC2000, CCI LC, MCD12, GLOBCOVER and GLCNMO. Remote Sens. 2018, 10, 1846.
  8. Stehman, S.V. Estimating area from an accuracy assessment error matrix. Remote Sens. Environ. 2013, 132, 202–211.
  9. Ebrahimy, H.; Mirbagheri, B.; Matkan, A.A.; Azadbakht, M. Per-pixel land cover accuracy prediction: A random forest-based method with limited reference sample data. ISPRS J. Photogramm. Remote Sens. 2021, 172, 17–27.
  10. Smith, J.H.; Stehman, S.V.; Wickham, J.D.; Yang, L. Effects of landscape characteristics on land-cover class accuracy. Remote Sens. Environ. 2003, 84, 342–349.
  11. Foody, G.M. Local characterization of thematic classification accuracy through spatially constrained confusion matrices. Int. J. Remote Sens. 2005, 26, 1217–1228.
  12. Khatami, R.; Mountrakis, G.; Stehman, S.V. Mapping per-pixel predicted accuracy of classified remote sensing images. Remote Sens. Environ. 2017, 191, 156–167.
  13. Wickham, J.; Stehman, S.V.; Homer, C.G. Spatial patterns of the United States National Land Cover Dataset (NLCD) land-cover change thematic accuracy (2001-2011). Int. J. Remote Sens. 2018, 39, 1729–1743.
  14. Stehman, S.V.; Foody, G.M. Key issues in rigorous accuracy assessment of land cover products. Remote Sens. Environ. 2019, 231, 111199.
  15. Zhang, J.; Zhang, W.; Mei, Y.; Yang, W. Geostatistical characterization of local accuracies in remotely sensed land cover change categorization with complexly configured reference samples. Remote Sens. Environ. 2019, 223, 63–81.
  16. Steele, B.M.; Winne, J.C.; Redmond, R.L. Estimation and mapping of misclassification probabilities for thematic land cover maps. Remote Sens. Environ. 1998, 66, 192–202.
  17. Van Oort, P.A.J.; Bregt, A.K.; De Bruin, S.; De Wit, A.J.W.; Stein, A. Spatial variability in classification accuracy of agricultural crops in the Dutch national land-cover database. Int. J. Geogr. Inf. Sci. 2004, 18, 611–626.
  18. Burnicki, A.C. Modeling the probability of misclassification in a map of land cover change. Photogramm. Eng. Remote Sens. 2011, 77, 39–50.
  19. Comber, A.; Brunsdon, C.; Charlton, M.; Harris, P. Geographically weighted correspondence matrices for local error reporting and change analyses: Mapping the spatial distribution of errors and change. Remote Sens. Lett. 2017, 8, 234–243.
  20. Huang, X.; Song, Y.H.; Yang, J.; Wang, W.R.; Ren, H.Q.; Dong, M.J.; Feng, Y.J.; Yin, H.D.; Li, J.Y. Toward accurate mapping of 30-m time-series global impervious surface area (GISA). Int. J. Appl. Earth Obs. Geoinf. 2022, 109, 102787.
  21. Wan, Y.; Zhang, J.; Yang, W.; Tang, Y. Refining land-cover maps based on probabilistic re-classification in CCA ordination space. Remote Sens. 2020, 12, 2954.
  22. Campos, J.C.; Brito, J.C. Mapping underrepresented land cover heterogeneity in arid regions: The Sahara-Sahel example. ISPRS J. Photogramm. Remote Sens. 2018, 146, 211–220.
  23. Zhang, W.L.; Wang, J.W.; Lin, H.T.; Cong, M.; Wan, Y.; Zhang, J.X. Fusing multiple land cover products based on locally estimated map-reference cover type transition probabilities. Remote Sens. 2023, 15, 481.
  24. Olofsson, P.; Foody, G.M.; Herold, M.; Stehman, S.V.; Woodcock, C.E.; Wulder, M.A. Good practices for estimating area and assessing accuracy of land change. Remote Sens. Environ. 2014, 148, 42–57.
  25. Breidt, F.J.; Opsomer, J.D. Model-assisted survey estimation with modern prediction techniques. Stat. Sci. 2017, 32, 190–205.
  26. McConville, K.S.; Moisen, G.G.; Frescino, T.S. A tutorial on model-assisted estimation with application to forest inventory. Forests 2020, 11, 244.
  27. Ter Braak, C.J. The analysis of vegetation-environment relationships by canonical correspondence analysis. Vegetatio 1987, 69, 69–77.
  28. Legendre, P.; Legendre, L. Numerical Ecology; Elsevier: Amsterdam, The Netherlands, 2012.
  29. Crookston, N.L.; Finley, A.O. yaImpute: An R Package for kNN Imputation. J. Stat. Softw. 2008, 23, 1–16.
  30. Duveneck, M.J.; Thompson, J.R.; Wilson, B.T. An imputed forest composition map for New England screened by species range boundaries. For. Ecol. Manage 2015, 347, 107–115.
  31. McRoberts, R.E.; Næsset, E.; Gobakken, T. Optimizing the k-Nearest neighbors technique for estimating forest aboveground biomass using airborne laser scanning data. Remote Sens. Environ. 2015, 163, 13–22.
  32. Xu, C.; Manley, B.; Morgenroth, J. Evaluation of modelling approaches in predicting forest volume and stand age for small-scale plantation forests in New Zealand with RapidEye and LiDAR. Int. J. Appl. Earth Obs. Geoinf. 2018, 73, 386–396.
  33. Feilhauer, H.; Zlinsky, A.; Kania, A.; Foody, G.M.; Doktor, D.; Lausch, A.; Schmidtlein, S. Let your maps be fuzzy!—Class probabilities and floristic gradients as alternatives to crisp mapping for remote sensing of vegetation. Remote Sens. Ecol. Conserv. 2020, 7, 292–305.
  34. McGarigal, K.; Tagil, S.; Cushman, S.A. Surface metrics: An alternative to patch metrics for the quantification of landscape structure. Landsc. Ecol. 2009, 24, 433–450.
  35. O’Neill, R.V.; Krummel, J.R.; Gardner, R.H.; Sugihara, G.; Jackson, B.; DeAngelis, D.L.; Milne, B.T.; Turner, M.G.; Zygmunt, B.; Christensen, S.W.; et al. Indices of landscape pattern. Landsc. Ecol. 1988, 1, 153–162.
  36. Riitters, K.H.; O’Neill, R.V.; Wickham, J.D. A note on contagion indices for landscape analysis. Landsc. Ecol. 1996, 11, 197–202.
  37. Särndal, C.E.; Swensson, B.; Wretman, J. Model-Assisted Survey Sampling; Springer: New York, NY, USA, 1992.
  38. Ståhl, G.; Saarela, S.; Schnell, S.; Holm, S.; Breidenbach, J.; Healey, S.P.; Patterson, P.L.; Magnussen, S.; Næsset, E.; McRoberts, R.E.; et al. Use of models in large-area forest surveys: Comparing model-assisted, model-based and hybrid estimation. For. Ecosyst. 2016, 3, 5.
  39. Pickering, J.; Stehman, S.V.; Tyukavina, A.; Potapov, P.; Watt, P.; Jantz, S.M.; Bholanath, P.; Hansen, M.C. Quantifying the trade-off between cost and precision in estimating area of forest loss and degradation using probability sampling in Guyana. Remote Sens. Environ. 2019, 221, 122–135.
  40. Zhang, J.; Yang, W.; Zhang, W.; Wang, Y.; Liu, D.; Xiu, Y. An explorative study on estimating local accuracies in land-cover information using logistic regression and class-heterogeneity-stratified data. Remote Sens. 2018, 10, 1581.
  41. Sweeney, S.P.; Evans, T.P. An edge-oriented approach to thematic map error assessment. Geocarto Int. 2012, 27, 31–56.
  42. Hanley, J.A.; McNeil, B.J. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 1983, 148, 839–843.
  43. Yu, L.; Liu, X.; Zhao, Y.; Yu, C.; Gong, P. Difficult to map regions in 30 m global land cover mapping determined with a common validation dataset. Int. J. Remote Sens. 2018, 39, 4077–4087.
  44. Sales, M.H.R.; De Bruin, S.; Souza, C.; Herold, M. Land use and land cover area estimates from class membership probability of a random forest classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4402711.
  45. Jung, M.; Henkel, K.; Herold, M.; Churkina, G. Exploiting synergies of global land cover products for carbon cycle modeling. Remote Sens. Environ. 2006, 101, 534–553.
  46. Iwao, K.; Nasahara, K.N.; Kinoshita, T.; Yamagata, Y.; Patton, D.; Tsuchida, S. Creation of new global land cover map with map integration. J. Geogr. Inf. Syst. 2011, 3, 160–165.
  47. Tuanmu, M.-N.; Jetz, W. A global 1-km consensus land-cover product for biodiversity and ecosystem modelling. Glob. Ecol. Biogeogr. 2014, 23, 1031–1045.
  48. See, L.; Schepaschenko, D.; Lesiv, M.; Kraxner, F.; Obersteiner, M. Building a hybrid land cover map with crowdsourcing and geographically weighted regression. ISPRS J. Photogramm. Remote Sens. 2015, 103, 48–56.
  49. Gengler, S.; Bogaert, P. Combining land cover products using a minimum divergence and a Bayesian data fusion approach. Int. J. Geogr. Inf. Sci. 2018, 32, 806–826.
  50. Pérez-Hoyos, A.; Udías, A.; Rembold, F. Integrating multiple land cover maps through a multi-criteria analysis to improve agricultural monitoring in Africa. Int. J. Appl. Earth Obs. Geoinf. 2020, 88, 102064.
  51. Li, Z.; White, J.C.; Wulder, M.A.; Hermosilla, T.; Davidson, A.M.; Comber, A.J. Land cover harmonization using Latent Dirichlet Allocation. Int. J. Geogr. Inf. Sci. 2021, 35, 348–374.
  52. Saah, D.; Tenneson, K.; Poortinga, A.; Nguyen, Q.; Chishtie, F.; Aung, K.S.; Markert, K.N.; Clinton, N.; Anderson, E.R.; Cutter, P.; et al. Primitives as building blocks for constructing land cover maps. Int. J. Appl. Earth Obs. Geoinf. 2020, 85, 101979.
  53. Healey, S.P.; Cohen, W.B.; Yang, Z.; Brewer, C.K.; Brooks, E.B.; Gorelick, N.; Hernandez, A.J.; Huang, C.; Hughes, M.J.; Kennedy, R.E.; et al. Mapping forest change using stacked generalization: An ensemble approach. Remote Sens. Environ. 2018, 204, 717–728.
  54. Chughtai, A.H.; Abbasi, H.; Karas, I.R. A review on change detection method and accuracy assessment for land use land cover. Remote Sens. Appl. Soc. Environ. 2021, 22, 100482.
  55. Xu, L.; Herold, M.; Tsendbazar, N.-E.; Masiliūnas, D.; Li, L.; Lesiv, M.; Fritz, S.; Verbesselt, J. Time series analysis for global land cover change monitoring: A comparison across sensors. Remote Sens. Environ. 2022, 271, 112905.
  56. Olofsson, P.; Arévalo, P.; Espejo, A.B.; Green, C.; Lindquist, E.; McRoberts, R.E.; Sanz, M.J. Mitigating the effects of omission errors on area and area change estimates. Remote Sens. Environ. 2020, 236, 111492.
  57. Masiliūnas, D.; Tsendbazar, N.E.; Herold, M.; Lesiv, M.; Buchhorn, M.; Verbesselt, J. Global land characterisation using land cover fractions at 100 m resolution. Remote Sens. Environ. 2021, 259, 112409.
  58. Silván-Cárdenas, J.L.; Wang, L. Sub-pixel confusion-uncertainty matrix for assessing soft classifications. Remote Sens. Environ. 2008, 112, 1081–1095.
  59. Carranza-García, M.; García-Gutiérrez, J.; Riquelme, J.C. A framework for evaluating land use and land cover classification using convolutional neural networks. Remote Sens. 2019, 11, 274.
  60. McConville, K.; Tang, B.; Zhu, G.; Cheung, S.; Li, S. Mase: Model-Assisted Survey Estimation, Version 0.1.3. 2018; Available online: https://cran.r-project.org/package=mase (accessed on 26 February 2023).
  61. Lumley, T. Survey: Analysis of complex survey samples, R package version 4.0. J. Stat. Softw. 2020, 9, 1–19.
  62. Ma, Z.; Redmond, R.L. Tau coefficients for accuracy assessment of classification of remote sensing data. Photogramm. Eng. Remote Sens. 1995, 61, 435–439.
Figure 1. The flowchart showing the methods proposed in this research.
Figure 2. GlobeLand30 2010 land cover map for Wuhan, China.
Figure 3. The surface of local OAs (a), the surface of local UAs for cultivated land (b), and the surface of local PAs for water bodies (c), predicted using CCAErrMat.
Figure 4. Locations of uncertainty: (a) those of mixed pixels, (b) those of relatively low OAs, and (c) union of locations shown in (a,b).
Figure 5. Land cover representations augmented with: (a) the primary classes, (b) the alternate classes for mixed pixels, and (c) the third most probable classes for highly mixed pixels.
Figure 6. Extracted training data: (a) pixels correctly classified, (b) corrected pixels (shown in exaggeration in order to be legible), and (c) union of pixel segments for (a,b).
Table 1. Model-training sample full set and subsets I and II, the test sample (for model testing and area estimation), and sample subset III (for area estimation).

| Strata and Sub-Strata | N_strata | Training Sample Full Set (Sampling Intensity) | Sample Subset I | Sample Subset II | Test Sample Full Set (Sampling Intensity) | Sample Subset III |
|---|---|---|---|---|---|---|
| Cultivt_E | 56,721 | 120 (0.21) | 15 | 60 | 60 (0.11) | 15 |
| Cultivt_O | 5,739,203 | 1095 (0.02) | 132 | 160 | 160 (0.003) | 132 |
| Forest_E | 133,655 | 140 (0.10) | 18 | 70 | 70 (0.05) | 18 |
| Forest_O | 1,001,767 | 280 (0.03) | 27 | 100 | 100 (0.01) | 27 |
| Grass_E | 70,366 | 120 (0.17) | 15 | 60 | 60 (0.09) | 15 |
| Grass_O | 248,912 | 170 (0.07) | 21 | 80 | 80 (0.03) | 21 |
| Wetland_E | 2033 | 80 (3.93) | 9 | 40 | 40 (1.97) | 9 |
| Wetland_O | 133,735 | 140 (0.10) | 18 | 70 | 70 (0.05) | 18 |
| Water_E | 23,895 | 100 (0.42) | 12 | 50 | 50 (0.21) | 12 |
| Water_O | 1,395,043 | 285 (0.02) | 33 | 110 | 110 (0.01) | 33 |
| Artfct_E | 19,324 | 100 (0.52) | 12 | 50 | 50 (0.26) | 12 |
| Artfct_O | 699,787 | 200 (0.03) | 24 | 90 | 90 (0.01) | 24 |
| Bare_E | 3658 | 80 (2.19) | 12 | 40 | 40 (1.10) | 12 |
| Bare_O | 6994 | 90 (1.29) | 12 | 40 | 40 (0.58) | 12 |
| All | 9,535,093 | 3000 | 360 | 1020 | 1020 | 360 |
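Table 2. AUC-based evaluation for local accuracy characterization. The number in parentheses after each method name is the sample size (360 or 1020).

| Indicator | Class | CCAErrMat (360) | CCAErrMat (1020) | CCA-Separate (360) | CCA-Separate (1020) | CNN-Separate (360) | CNN-Separate (1020) | CCACCAErrMat (360) | CCACCAErrMat (1020) | CNNCCAErrMat (360) | CNNCCAErrMat (1020) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Local OA | | 0.71 | 0.72 | 0.75 | 0.79 | 0.74 | 0.77 | 0.74 | 0.77 | 0.72 | 0.76 |
| Local UA | Cultivt | 0.72 | 0.71 | 0.67 | 0.75 | 0.78 | 0.75 | 0.70 | 0.74 | 0.62 | 0.76 |
| Local UA | Forest | 0.79 | 0.83 | 0.78 | 0.87 | 0.82 | 0.86 | 0.77 | 0.82 | 0.82 | 0.87 |
| Local UA | Grass | 0.61 | 0.50 | 0.57 | 0.54 | 0.66 | 0.61 | 0.67 | 0.65 | 0.61 | 0.56 |
| Local UA | Wetland | 0.61 | 0.68 | 0.61 | 0.71 | 0.40 | 0.70 | 0.54 | 0.67 | 0.36 | 0.70 |
| Local UA | Water | 0.45 | 0.62 | 0.66 | 0.81 | 0.55 | 0.54 | 0.56 | 0.56 | 0.54 | 0.54 |
| Local UA | Artfct | 0.70 | 0.56 | 0.69 | 0.66 | 0.73 | 0.71 | 0.71 | 0.70 | 0.69 | 0.70 |
| Local UA | Bare | 0.63 | 0.65 | 0.58 | 0.80 | 0.72 | 0.87 | 0.62 | 0.70 | 0.73 | 0.79 |
| Local PA | Cultivt | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Local PA | Forest | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Local PA | Grass | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Local PA | Wetland | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Local PA | Water | 0.99 | 0.99 | 1.00 | 1.00 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Local PA | Artfct | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Local PA | Bare | 0.98 | 1.00 | 1.00 | 1.00 | 0.96 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |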
Table 3. Area proportions estimated with different estimators (%).

| Sample | Estimator | Cultivated Land | Forest | Grass | Wetland | Water | Artificial Surfaces | Bare Land |
|---|---|---|---|---|---|---|---|---|
| 360 pixels | π estimator | 50.1 | 12.9 | 6.8 | 3.4 | 17.1 | 8.9 | 0.8 |
| 360 pixels | Difference estimator | 50.4 | 12.6 | 6.9 | 3.5 | 16.9 | 8.8 | 0.8 |
| 360 pixels | Regression estimator | 50.4 | 12.6 | 6.9 | 3.6 | 16.9 | 8.8 | 0.8 |
| 1020 pixels | π estimator | 49.1 | 12.6 | 6.7 | 2.6 | 19.5 | 8.9 | 0.6 |
| 1020 pixels | Difference estimator | 49.9 | 12.6 | 6.5 | 2.5 | 18.9 | 9.1 | 0.6 |
| 1020 pixels | Regression estimator | 50.0 | 12.6 | 6.5 | 2.6 | 18.7 | 9.1 | 0.6 |
Table 4. Standard error (SE) estimates (relative to area proportions estimated) obtained with different estimators (%).

| Sample | Estimator | Cultivated Land | Forest | Grass | Wetland | Water | Artificial Surfaces | Bare Land |
|---|---|---|---|---|---|---|---|---|
| 360 pixels | π estimator | 5.0 | 14.1 | 24.0 | 38.3 | 12.4 | 18.3 | 102.4 |
| 360 pixels | Difference estimator | 3.9 | 11.6 | 21.7 | 28.1 | 8.9 | 12.8 | 91.5 |
| 360 pixels | Regression estimator | 3.8 | 11.6 | 21.7 | 28.2 | 8.7 | 12.8 | 96.5 |
| 1020 pixels | π estimator | 2.6 | 8.9 | 16.1 | 32.2 | 7.0 | 12.0 | 81.9 |
| 1020 pixels | Difference estimator | 2.1 | 6.8 | 15.0 | 26.5 | 5.1 | 7.4 | 78.7 |
| 1020 pixels | Regression estimator | 2.0 | 6.8 | 15.0 | 26.0 | 5.1 | 7.5 | 80.6 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wan, Y.; Zhang, J.; Zhang, W.; Zhang, Y.; Yang, W.; Wang, J.; Chukwunonso, O.S.; Nadeeka, A.M.T. Characterizing Uncertainty and Enhancing Utility in Remotely Sensed Land Cover Using Error Matrices Localized in Canonical Correspondence Analysis Ordination Space. Remote Sens. 2023, 15, 1367. https://doi.org/10.3390/rs15051367