Improving the Accuracy of Remote Sensing Land Cover Classification by GEO-ECO Zoning Coupled with Geostatistical Simulation

Land cover products obtained from remote sensing image classification inevitably contain a large number of false classification or uncertain pixels because of spectral confusion, image resolution limitation, and ground object complexity. The confusion matrix used to evaluate the classification accuracy cannot reflect the spatial variation. The information provided to users of land cover products is incomplete and uncertain. In this study, a method is presented to evaluate and improve the accuracy of land cover classification products by coupling Geo-Eco zoning and Markov chain geoscience statistical simulation. Validation points collected from various sources are used in the model calculation and accuracy verification of results. The pre-classified image that needs to be improved and Geo-Eco zoning attribute data are used as auxiliary data for co-simulation. Results show that the accuracy of Globeland30 data can be improved by more than 10% by coupling Geo-Eco zoning and Markov chain geostatistical simulation.


Introduction
Land cover is a concept emerging with the development of remote sensing technology, and remote sensing is the only effective means for large-scale land cover mapping [1]. Since the 1980s, the international scientific community has been highly concerned about remote sensing mapping of global land cover [2][3][4]. A variety of 1000, 300, 30, and 10 m resolution global, regional, or national land cover products have been developed [5][6][7][8][9][10][11][12]. Classical supervised and unsupervised classification technologies are commonly used, but the accuracy is not high, which is about 60% to 70% [13,14]. A deep learning algorithm has not been applied to the classification of land cover products on a large scale [15]. The second-level land cover product classification is more difficult to obtain reliable results via automatic classification. Classification accuracy varies spatially, and spatial variation in accuracy should be quantitatively evaluated. Evaluating and improving the accuracy of land cover product classification (hereinafter referred to as pre-classified image) obtained by conventional classification algorithm and quantifying the uncertainty of classification are necessary to study global change, geographical condition census, social environment planning, and ecological resource management.
Two kinds of methods are used to improve the accuracy of remote sensing classification. One of the methods is the application of Geoscience Knowledge Rules. Since the 1980s, experts and scholars have introduced expert systems and knowledge engineering to solve remote sensing classification problems [16][17][18][19][20]. In previous land cover mapping, auxiliary data, such as digital elevation model (DEM) data, ecological region data, vegetation data of countries or regions [5], global mangrove atlas, global human settlements, and regional data of global urban coverage (http://maps.elie.ucl.ac.be/CCI/viewer/index.php), MODIS (moderate-resolution imaging spectroradiometer, NASA, USA) NDVI (normalized different vegetation index) data, global geographic information data, global DEM data, various thematic data, and online high-resolution images are employed to improve product accuracy [21]. However, the expert knowledge and reference auxiliary data used in land cover mapping are sporadic and unsystematic, and no global integral system is available to manage expert knowledge and auxiliary data for reuse.
Eco geographical regions are areas where ecosystems (and the type, quality, and quantity of environmental resources) are generally similar. They are relatively large units of land containing a distinct assemblage of natural communities and species, with boundaries that approximate the original extent of natural communities prior to major land-use change [22]. Eco geographical regions can be applied as a frame to construct a global expert knowledge base and assist in remote sensing image classification. Zhu et al. [23] adopted the "world terrestrial ecological region" established by the World Wildlife Fund for natural protection as the basic framework of the global Geo-Eco zoning knowledge base [22]. An object-oriented method is used to construct a rule base and help identify spurious changes in remote sensing image detection. Five kinds of attributes, namely, DEM, slope, NDVI, temperature, and moisture, of each Geo-eco zone, are collected to identify spurious change. The accuracy of change detection is improved to a certain extent [23].
Another method to improve the accuracy of remote sensing classification is the application of geostatistics. Geostatistics is used to analyze and predict the values associated with spatial or spatiotemporal phenomena. Geostatistics has been applied in remote sensing since the 1980s but not popular. Meer [24] made a detailed review. Among them, studies on using geostatistics to improve the accuracy of land cover classification products are limited. Bruin [25] used the sequential indicator simulation algorithm and the cokriging method to predict the area range of olive trees by using the classified image as the soft data and the interpretation sample from the aerial image as the hard index. Tsendbazar et al. [14] used indicator Kriging to estimate the spatial variation in the accuracy of the source land cover products and use integration methods, which consider the local accuracy of each source product to obtain Africa's land cover products. Carvalho et al. [26] improved the accuracy of land cover classification by using a direct sequential co-simulation algorithm combined with field observations and remote sensing images classified by maximum likelihood classification. Tang et al. [27] utilized a multipoint geostatistic calculation method to take the maximum likelihood classification result as the training image, the maximum likelihood probability as the soft condition, and the training sample as the hard condition and improve the accuracy of classification products. Li and Zhang [28] proposed to use the Markov chain random field (MCRF) method to evaluate the uncertainty of the land cover classification. They carried out conditional random simulation at the sample pixels interpreted by experts referring to high-resolution images and other auxiliary information. Unlabeled pixels are regarded as uncertain regions, and the class is obtained by using the MCRF algorithm [29]. However, the disadvantage of this method is that a large number of expert-interpreted samples are needed to ensure the reliability of results; consequently, the method becomes time-consuming and laborious. Li et al. [30] further improved the algorithm by introducing a pre-classified image to the simulation process. The co-simulation algorithm of the MCRF (Co-MCSS) is used to improve the classification accuracy by integrating sample pixels and pre-classified images.
In this study, the boundary of geostatistics is confined in each Geo-Eco zone, and the Co-MCSS method [30] is used to quantify the classification local accuracy by integrating the pre-classified image, the verification points, and the attribute data based on the Geo-Eco zoning. Three contributions are made in terms of determining, evaluating, and improving the accuracy of the land cover product.
(1) Taking Geo-Eco zoning as the area of geostatistics. Generally the area of geostatistics is based on the selected image area, which often contains boundary effect, that is, the fact that a class may have statistically biased smaller frequencies of transitions if it has a higher chance of occurring at boundaries of the study area because boundary polygons are incomplete and have no transition to other classes beyond the boundary [28]. Generally, the boundary of Geo-Eco zoning does not cross two different types of land cover to avoid the boundary effect. (2) In previous research, the large number of sample pixels were interpreted by experts to obtain a transiogram model [28], but this process is time-consuming and laborious. In this study, reliable verification data published by some websites or institutions related to land cover research are reused to reduce the work on visual interpretation. (3) The Co-MCSS method not only combines remote sensing pre-classified images but also takes various attributes of Geo-Eco zoning (such as DEM, slope, temperature, and humidity) as auxiliary data to participate in the simulation and calculation of cross field transition probability. With the combination of additional attribute information, the simulation algorithm becomes more robust.

Geo-Eco Zoning Rule Base
In this study, the global eco-regions established by the World Wild Fund for natural protection were selected. With these eco-regions, the terrestrial world was subdivided into 14 biomes and 8 biogeographic realms and nested 867 Geo-Eco zones [22]. The framework of Zhu et al. [23] Geo-Eco zoning rule base was adopted to collect and sort out the natural attribute data sets, including the DEM, slope, NDVI, temperature, and moisture. This kind of knowledge was highly correlated with land cover. Each attribute was expressed as a layer, which was used as the auxiliary data of co-simulation with the pre-classified image.

Process
The technical process is shown in Figure 1. In this study, a set of transiogram models of each Geo-Eco zone were estimated by using the sample pixels. The cross field transition probability from the sample data to the auxiliary data set was estimated. The Co-MCSS algorithm was used to generate the optimal prediction map and occurrence probability map.
fact that a class may have statistically biased smaller frequencies of transitions if it has a higher chance of occurring at boundaries of the study area because boundary polygons are incomplete and have no transition to other classes beyond the boundary [28]. Generally, the boundary of Geo-Eco zoning does not cross two different types of land cover to avoid the boundary effect.
(2) In previous research, the large number of sample pixels were interpreted by experts to obtain a transiogram model [28], but this process is time-consuming and laborious. In this study, reliable verification data published by some websites or institutions related to land cover research are reused to reduce the work on visual interpretation. (3) The Co-MCSS method not only combines remote sensing pre-classified images but also takes various attributes of Geo-Eco zoning (such as DEM, slope, temperature, and humidity) as auxiliary data to participate in the simulation and calculation of cross field transition probability. With the combination of additional attribute information, the simulation algorithm becomes more robust.

Geo-Eco Zoning Rule Base
In this study, the global eco-regions established by the World Wild Fund for natural protection were selected. With these eco-regions, the terrestrial world was subdivided into 14 biomes and 8 biogeographic realms and nested 867 Geo-Eco zones [22]. The framework of Zhu et al. [23] Geo-Eco zoning rule base was adopted to collect and sort out the natural attribute data sets, including the DEM, slope, NDVI, temperature, and moisture. This kind of knowledge was highly correlated with land cover. Each attribute was expressed as a layer, which was used as the auxiliary data of co-simulation with the pre-classified image.

Process
The technical process is shown in Figure 1. In this study, a set of transiogram models of each Geo-Eco zone were estimated by using the sample pixels. The cross field transition probability from the sample data to the auxiliary data set was estimated. The Co-MCSS algorithm was used to generate the optimal prediction map and occurrence probability map.  The main steps were as follows: (1) The land cover verification points from networks were collected from the relevant websites (citations are provided below). If the amount of verification points from networks was not enough for the transiogram estimation, then visual interpretation of sample points was added as a supplement to form the sample data set. (2) Traditional methods, such as the maximum likelihood method, were used to obtain pre-classified images. The natural attribute layers (such as DEM, slope, and aspect) of Geo-Eco zoning constituted the auxiliary data set for co-simulation.
(3) A set of transiograms were estimated by using the sample data set.
(4) The cross field transition probabilities were estimated by the sample and auxiliary data set. (5) Co-MCSS algorithm was carried out under the condition of sample data and auxiliary data. (6) The optimal prediction map and occurrence probability map were obtained.
In the above steps, except that some of the sample points needed to be interpreted manually, other works were coded by Matlab and realized automatically. These steps will be further presented in detail in the Results part.

Transiogram
Traditionally, variograms are commonly used to describe the correlation between variables. However, variograms cannot describe directional asymmetry when land cover categories appear sequentially, thus they cannot effectively express the parallel relationship between categories. Markov chain conditional simulation provides a general framework for a non-linear non-kriging geostatistical approach. It needs a powerful index of spatial heterogeneity to realize multi-type simulation. Li [31] proposed the concept of transiogram, which is a one-dimensional transition probability function (conditional probability between two points) model at distance h: where x is a specific location, and P ij (h) is the transition probability of a random variable z from class i to class j. When h increases gradually, P ij (h) forms a graph, i.e., a transiogram. P ii (h) denotes an auto-transiogram, which describes the dependency of a type itself; a cross-transiogram describes the dependency between types (including cross correlation, parallel relationship, and directional asymmetry). Transiograms can estimate multistep transition probabilities from sparse point samples for two-dimensional Markov chain simulations, which involves the rich spatial heterogeneity characteristics of land cover types. The transition probabilities of different spatial steps (or lag) can form a one-dimensional continuous transition probability graph. The role of transiograms on Markov chain geostatistics is similar to that of variograms for Kriging geostatistics.
Transiograms can only rely on a large number of reliable and soundly distributed sample points, which need an amount of visual interpretation. When the number of sample points is insufficient, transiograms may show false fluctuations and cannot convey reliable information [31].
In recent years, some organizations participating in land cover mapping around the world have released their validation data to the public, thereby providing a reference for subsequent research. Geoweb-based tagging system also enables users to tag geographical information for land cover data acquisition [32]. We collected these reference sample data, and they were mainly from the following websites: (1) GOFC-GOLD Land Cover Project Office in coordination with reference data producers (http://www.gofcgold.wur.nl/sites/ gofcgold_refdataportal.php). The GOFC-GOLD includes the consolidated GLC 2000 reference (GLC200ref) [33], the consolidated GlobCover 2005 reference (GlobCover2005ref) [34], the System for Terrestrial Ecosystem Parameterization (STEP) reference [35], the Visible Infrared Imaging Radiometer Suite Surface-Type reference [36] and the GLCNMO 2008 datasets [37]; (2) Geo-Wiki crowdsourced data (https://www.geo-wiki.org/); (3) DCP volunteers (http://www.confluence.org); (4) other research institutions, such as the global validation sample set developed by Tsinghua University [38]; http://data.ess.tsinghua.edu. cn/data/temp/GlobalLandCoverValidationSampleSet_v1.xlsx.); (5) Flickr photo-sharing website (www.flickr.com); and (6) LACO-Wiki open access online portal for land cover validation (http://www.laco-wiki.net). Existing reference sample datasets built for calibrating and validating global land cover maps have high reliability and can be reused. However, the density and distribution of these verification points was not balanced because of the Appl. Sci. 2021, 11, 553 5 of 14 scattered collection and may not meet the density requirements of model estimation. Nevertheless, the workload of the visual interpretation of sample points can be appropriately reduced. As more and more organizations release their validation data sets, reusing these data to reduce visual interpretation will become feasible.

Cross Field Transition Probability Matrix
The transition probability from a primary variable to an auxiliary variable can be called a cross field transition probability matrix. Each auxiliary variable is considered independent of each other. The cross field transfer probability is calculated as follows: where f ik is the frequency of class i being transformed into class k in the space of auxiliary variables, and n is the number of auxiliary variables.

MCRF Co-Simulation (Co-MCSS)
Similar to the cokriging model in classical geostatistics, Co-MCSS can be built by extending Markov chain random fields. According to Bayesian reasoning theory, Co-MCSS can be regarded as the Bayesian updating of the Markov chain random field model based on new evidence on auxiliary data.
If X is the target classification variable to be estimated, and E is the auxiliary data set, then the Bayesian inference formula can be written as follows: where C = ∑ X P(E|X)P(X) is a constant. Based on Bayes principle, the simplified Equation (3) is extended to the Co-MCSS model, and the auxiliary data are combined via co-simulation. The contribution of auxiliary variables can be realized in different ways. In this study, auxiliary variable data are considered the nearest neighbor of an unknown location in other variable spaces. The Co-MCSS model with k auxiliary variables can be expressed as follows: where r 0 (k) is the class of the kth auxiliary variable at u 0 (k) , and q i0 r 0 (k) is the transition field probability matrix between i 0 in the space of the primary variable at position u 0 and r 0 in the space of the auxiliary variable. In this study, two auxiliary variables are selected: (1) Pre-classified image and (2) DEM. Therefore, k = 2 in Equation (4).
In practical applications, considering many nearest known neighbors in different directions is unnecessary and difficult. For the pixel data of a remote sensing image, the four main directions are easily considered. Therefore, Equation (4) of the Co-MCSS considering two auxiliary variables and four main directions is as follows: Equation (5) is the Co-MCSS model used in this study.

Study Area
The selected study area was located in Indonesia, Southeast Asia. Indonesia is extremely rich in biological species, and the forest coverage rate has reached 67.8% (according to Globeland30 land cover product 2010, www.globeland30.com).
The collected verification points whose resolution and time met the requirements include reference data of GLC 2000 (GLC2000ref), reference data of Globcover 2005 (Glob-cover2005ref), reference data of system for Territorial Ecosystem parameter (STEP), reference data of visible infrared imaging radiometer Suite (VIIRS), GLCNMO 2008 reference data, Tsinghua University Global validation sample set, and Globeland30 verification points provided by NGCC (National Geomatics Center of China). The collected verification points were filtered because of the inconsistency semantics and resolution between the verification points and the pre-classified image. Figure 2a shows the distribution of reserved verification points according to the source of verification points; Figure 2b illustrates the distribution of reserved verification points in the study area according to the land cover class of Globeland30-2015.
Equation (5) is the Co-MCSS model used in this study.

Study Area
The selected study area was located in Indonesia, Southeast Asia. Indonesia is extremely rich in biological species, and the forest coverage rate has reached 67.8% (according to Globeland30 land cover product 2010, www.globeland30.com).
The collected verification points whose resolution and time met the requirements include reference data of GLC 2000 (GLC2000ref), reference data of Globcover 2005 (Globcover2005ref), reference data of system for Territorial Ecosystem parameter (STEP), reference data of visible infrared imaging radiometer Suite (VIIRS), GLCNMO 2008 reference data, Tsinghua University Global validation sample set, and Globeland30 verification points provided by NGCC (National Geomatics Center of China). The collected verification points were filtered because of the inconsistency semantics and resolution between the verification points and the pre-classified image. Figure 2a  Because of the limitation of the operation speed and data volume of the algorithm, a small block, as shown in the red frame of Figure 2a,b, was selected as the demonstration area based on the richness of land cover type. The demonstration area was located in the IM0104 of the Geo-Eco zone [22]. Only one zone was involved, thus the subsequent processing was carried out uniformly within this zone. Figure 3a presents the land cover classification map of Globeland30-2015, which was used as the pre-classified image in this study. There were five kinds of land cover in this area according to GlobeLand30 classification schema [11]. Globeland30-2015 had a resolution of 30 m and was based on the classification of remote sensing images within year 2015. The demonstration area had more than 70,000 pixels. The number of available verification points was insufficient to meet the requirements of model calculation. Therefore, some sample pixels were interpreted as supplements to form a sample data set as shown in Figure 3b. Because of the limitation of the operation speed and data volume of the algorithm, a small block, as shown in the red frame of Figure 2a,b, was selected as the demonstration area based on the richness of land cover type. The demonstration area was located in the IM0104 of the Geo-Eco zone [22]. Only one zone was involved, thus the subsequent processing was carried out uniformly within this zone. Figure 3a presents the land cover classification map of Globeland30-2015, which was used as the pre-classified image in this study. There were five kinds of land cover in this area according to GlobeLand30 classification schema [11]. Globeland30-2015 had a resolution of 30 m and was based on the classification of remote sensing images within year 2015. The demonstration area had more than 70,000 pixels. The number of available verification points was insufficient to meet the requirements of model calculation. Therefore, some sample pixels were interpreted as supplements to form a sample data set as shown in Figure 3b.
Using the "create random points" function in ArcGIS, the random points were gen- Using the "create random points" function in ArcGIS, the random points were generated and uniformly distributed of the density of 1 point per 4000 m 2 . If there was no network collected verification point at or near the position of the random point, then visually interpreted on Google Earth high resolution images was needed. A total of 15,826 sample points were divided into two parts. Of these points, 7913 were used for model simulation (about 10% of the pixels in the demonstration area), and the remaining points were utilized for the final accuracy verification.
In addition to the sample point data, Geo-Eco zoning related attribute data should also be collected. For the demonstration area, the final data source was GDEM 30m resolution digital elevation data, considering the availability of data, the independence of data, resolution, and year close to 2015 (Figure 4).  Figure 5 is the result of the self-and cross-transiograms of each land cover type. The type is indicated by a code, and the specific meaning is shown in Table 1.  In addition to the sample point data, Geo-Eco zoning related attribute data should also be collected. For the demonstration area, the final data source was GDEM 30 m resolution digital elevation data, considering the availability of data, the independence of data, resolution, and year close to 2015 (Figure 4). In addition to the sample point data, Geo-Eco zoning rel also be collected. For the demonstration area, the final data so olution digital elevation data, considering the availability of data, resolution, and year close to 2015 ( Figure 4).

Color
Code Lan 10 20 30  Figure 5 is the result of the self-and cross-transiograms of each land cover type. The type is indicated by a code, and the specific meaning is shown in Table 1.

Transfer Probability Diagram
For the cultivated land, the curve was more tortuous as shown in Figure 5a. The self-transfer (cultivated land with itself) and cross-transfer (with the other four categories of forest, grassland, wetland, and water body) probabilities of the forest were the highest, wetland were lower than forest, water body were lower than wetland, and grassland were basically 0. This finding was due to the scattered distribution of grassland. The transfer probability map results were consistent with the results of expert interpretation.  For the cultivated land, the curve was more tortuous as shown in Figure 5a. The self-transfer (cultivated land with itself) and cross-transfer (with the other four categories of forest, grassland, wetland, and water body) probabilities of the forest were the highest, wetland were lower than forest, water body were lower than wetland, and grassland were basically 0. This finding was due to the scattered distribution of grassland. The transfer probability map results were consistent with the results of expert interpretation.
The forest had the largest proportion, thus its self-transfer probability was higher than those of the four other categories as shown in Figure 5b. The transfer probabilities of the wetland and the water body were lower than the autocorrelation transfer probability of the forest. The conversion relationship from the cultivated land to the grassland and the forest almost did not exist, because the area of these two categories was relatively small. This finding was consistent with the results of expert interpretation.
In Figure 5c, the distribution of the grassland was scattered, thus the curve of the transiogram was very tortuous and not as smooth as those of the other categories. Generally, the cross-transfer probability of the grassland and the cultivated land was almost 0, and the short-distance cross-transfer probabilities of the forest land, the water body, and the wetland were similar. The long-distance cross-transfer probability of the forest was higher than that of the other categories because the proportion of the forest was the largest.   Using the "create random points" function in ArcGIS, the random points were generated and uniformly distributed of the density of 1 point per 4000 m 2 . If there was no network collected verification point at or near the position of the random point, then visually interpreted on Google Earth high resolution images was needed. A total of 15,826 sample points were divided into two parts. Of these points, 7913 were used for model simulation (about 10% of the pixels in the demonstration area), and the remaining points were utilized for the final accuracy verification.
In addition to the sample point data, Geo-Eco zoning related attribute data should also be collected. For the demonstration area, the final data source was GDEM 30m resolution digital elevation data, considering the availability of data, the independence of data, resolution, and year close to 2015 (Figure 4).  Figure 5 is the result of the self-and cross-transiograms of each land cover type. The type is indicated by a code, and the specific meaning is shown in Table 1.  Using the "create random points" function in ArcGIS, the random points were generated and uniformly distributed of the density of 1 point per 4000 m 2 . If there was no network collected verification point at or near the position of the random point, then visually interpreted on Google Earth high resolution images was needed. A total of 15,826 sample points were divided into two parts. Of these points, 7913 were used for model simulation (about 10% of the pixels in the demonstration area), and the remaining points were utilized for the final accuracy verification.

Transfer Probability Diagram
In addition to the sample point data, Geo-Eco zoning related attribute data should also be collected. For the demonstration area, the final data source was GDEM 30m resolution digital elevation data, considering the availability of data, the independence of data, resolution, and year close to 2015 (Figure 4).  Figure 5 is the result of the self-and cross-transiograms of each land cover type. The type is indicated by a code, and the specific meaning is shown in Table 1.  Using the "create random points" function in ArcGIS, the random points were generated and uniformly distributed of the density of 1 point per 4000 m 2 . If there was no network collected verification point at or near the position of the random point, then visually interpreted on Google Earth high resolution images was needed. A total of 15,826 sample points were divided into two parts. Of these points, 7913 were used for model simulation (about 10% of the pixels in the demonstration area), and the remaining points were utilized for the final accuracy verification.

Transfer Probability Diagram
In addition to the sample point data, Geo-Eco zoning related attribute data should also be collected. For the demonstration area, the final data source was GDEM 30m resolution digital elevation data, considering the availability of data, the independence of data, resolution, and year close to 2015 (Figure 4).  Figure 5 is the result of the self-and cross-transiograms of each land cover type. The type is indicated by a code, and the specific meaning is shown in Table 1.  Using the "create random points" function in ArcGIS, the random points were generated and uniformly distributed of the density of 1 point per 4000 m 2 . If there was no network collected verification point at or near the position of the random point, then visually interpreted on Google Earth high resolution images was needed. A total of 15,826 sample points were divided into two parts. Of these points, 7913 were used for model simulation (about 10% of the pixels in the demonstration area), and the remaining points were utilized for the final accuracy verification.

Transfer Probability Diagram
In addition to the sample point data, Geo-Eco zoning related attribute data should also be collected. For the demonstration area, the final data source was GDEM 30m resolution digital elevation data, considering the availability of data, the independence of data, resolution, and year close to 2015 (Figure 4).  Figure 5 is the result of the self-and cross-transiograms of each land cover type. The type is indicated by a code, and the specific meaning is shown in Table 1. The forest had the largest proportion, thus its self-transfer probability was higher than those of the four other categories as shown in Figure 5b. The transfer probabilities of the wetland and the water body were lower than the autocorrelation transfer probability of the forest. The conversion relationship from the cultivated land to the grassland and the forest almost did not exist, because the area of these two categories was relatively small. This finding was consistent with the results of expert interpretation.

Transfer Probability Diagram
In Figure 5c, the distribution of the grassland was scattered, thus the curve of the transiogram was very tortuous and not as smooth as those of the other categories. Generally, the cross-transfer probability of the grassland and the cultivated land was almost 0, and the short-distance cross-transfer probabilities of the forest land, the water body, and the wetland were similar. The long-distance cross-transfer probability of the forest was higher than that of the other categories because the proportion of the forest was the largest.
As shown in Figure 5d, the curve of the wetland was relatively smooth. As the distance increased, the probability of the conversion of the wetland to the forest gradually increased. The probability of the conversion of the wetland to the water body was lower than that of the forest. Almost no conversion relationship was observed in the cultivated land and the grassland. In addition, the wetlands and water bodies always appeared together, and this finding was consistent with our common sense that wetlands always form around water bodies.
The shapes of the autocorrelation transfer probability curve and the cross-correlation transfer probability curve of the water body were similar to those of the wetland by comparing Figure 5d,e. As distance increased, the probability of the conversion of the water body to the forest increased gradually. The probability of the conversion of the water body to the wetland was less than that of the forest. Almost no conversion relationship was observed in the cultivated land and the grassland. In addition, wetlands and water bodies always appeared together, wetlands always formed around water bodies, thus their transfer probability curves were very similar.
In general, each category had the highest probability of conversion to forest, because the proportion of forest was relatively large. In addition, the forest, wetland, and water sample points were rich, the transfer probability curve was stable, and the credibility was high. Although the curve of cultivated land was not smooth enough, the transiogram model was reliable because of its concentrated distribution. There were few and scattered samples in grassland, and the transfer probability map was not smooth enough, and the confidence level was low.

Simulation Results
For comparison, two kinds of simulation methods were adopted: One was using MCRF, which involves sample pixels only, and the other applies the Co-MCSS method combined with auxiliary data (including pre-classified image, i.e., Globeland30-2015 data and DEM layer). The simulation results include the occurrence probability map of each category and the optimal prediction map.

Simulation Results of MCRF
The occurrence probability map of the simulation results obtained from the sample pixels is shown in Figure 6. The deeper the hue is, the higher the likelihood of a category to be correct, and vice versa. The black part indicates that the confidence level of the type is nearly 1, i.e., it can be considered almost 100% certain. The white part implies that the confidence level of the type is approximately 0, which can be considered impossible. The color of the forest is the deepest, and the range is the widest, suggesting that the reliability of the forest land is highest. Although the scope of the cultivated land is small, the color is deep. As such, the simulation of the cultivated land is reliable. For grassland, few dark places and many gray areas are found, implying that the simulation of the grassland is insufficiently reliable. The light tone part can be considered the warning area for further survey. Wetlands and water bodies are always accompanied by each other, thus their hues are almost complementary, and the range of gray colors is large. Therefore, wetlands and water bodies are easily misclassified.
The optimal prediction map of the simulation results is shown in Figure 7, which was the assembly of the most likely type of each location. The coincidence degree of the forest, cultivated land, and part of grassland, wetland, and water bodies was higher than that of the original classification products (Globeland30-2015), and the other places with deviation were likely to be the ones with false classifications. ity of the forest land is highest. Although the scope of the cultivated land is small, the color is deep. As such, the simulation of the cultivated land is reliable. For grassland, few dark places and many gray areas are found, implying that the simulation of the grassland is insufficiently reliable. The light tone part can be considered the warning area for further survey. Wetlands and water bodies are always accompanied by each other, thus their hues are almost complementary, and the range of gray colors is large. Therefore, wetlands and water bodies are easily misclassified. The optimal prediction map of the simulation results is shown in Figure 7, which was the assembly of the most likely type of each location. The coincidence degree of the forest, cultivated land, and part of grassland, wetland, and water bodies was higher than that of the original classification products (Globeland30-2015), and the other places with deviation were likely to be the ones with false classifications.

Simulation Results of Co-MCRF with Auxiliary Data
After the auxiliary data were added, the occurrence probability map of the co-simulation results is shown in Figure 8. The simulation results of the forest, cultivated land, grassland, wetland, and water body was similar to MCRF. The distribution of the wetland, water body, and grassland was slightly different from that involving the sample data only.

Simulation Results of Co-MCRF with Auxiliary Data
After the auxiliary data were added, the occurrence probability co-simulation results is shown in Figure 8. The simulation results of the fore land, grassland, wetland, and water body was similar to MCRF. The distrib wetland, water body, and grassland was slightly different from that involvin data only.

Simulation Results of Co-MCRF with Auxiliary Data
After the auxiliary data were added, the occurrence probability map of the cosimulation results is shown in Figure 8. The simulation results of the forest, cultivated land, grassland, wetland, and water body was similar to MCRF. The distribution of the wetland, water body, and grassland was slightly different from that involving the sample data only.

Simulation Results of Co-MCRF with Auxiliary Data
After the auxiliary data were added, the occurrence probability map of the co-simulation results is shown in Figure 8. The simulation results of the forest, cultivated land, grassland, wetland, and water body was similar to MCRF. The distribution of the wetland, water body, and grassland was slightly different from that involving the sample data only. The simulation result of the optimal prediction map with the auxiliary data are shown in Figure 9. The simulation results revealed that the cultivated land, forest, and the original classification products had slight variations, the distribution of the cultivated land was relatively concentrated, and the coverage of forest distribution was very wide. The main changes were found in the grassland, the wetland, and the water body. Wetlands and water bodies were always together, thus they were easily misclassified. For grassland, the simulation result considerably differed from the original pre-classified image because of two possible reasons. One was that more pixels were wrongly classified into grassland, and the other is that the transiograms obtained were insufficiently reliable.

Accuracy Analysis
Accuracy verification aims to compare the optimal prediction map of two different simulation methods with the set of sample pixels used for accuracy verification. Type The simulation result of the optimal prediction map with the auxiliary data are shown in Figure 9. The simulation results revealed that the cultivated land, forest, and the original classification products had slight variations, the distribution of the cultivated land was relatively concentrated, and the coverage of forest distribution was very wide. The main changes were found in the grassland, the wetland, and the water body. Wetlands and water bodies were always together, thus they were easily misclassified. For grassland, the simulation result considerably differed from the original pre-classified image because of two possible reasons. One was that more pixels were wrongly classified into grassland, and the other is that the transiograms obtained were insufficiently reliable. The simulation result of the optimal prediction map with the auxiliary data are shown in Figure 9. The simulation results revealed that the cultivated land, forest, and the original classification products had slight variations, the distribution of the cultivated land was relatively concentrated, and the coverage of forest distribution was very wide. The main changes were found in the grassland, the wetland, and the water body. Wetlands and water bodies were always together, thus they were easily misclassified. For grassland, the simulation result considerably differed from the original pre-classified image because of two possible reasons. One was that more pixels were wrongly classified into grassland, and the other is that the transiograms obtained were insufficiently reliable.

Accuracy Analysis
Accuracy verification aims to compare the optimal prediction map of two different simulation methods with the set of sample pixels used for accuracy verification. Type matching means that the simulation results are correct, and the type inconsistency indi-

Accuracy Analysis
Accuracy verification aims to compare the optimal prediction map of two different simulation methods with the set of sample pixels used for accuracy verification. Type matching means that the simulation results are correct, and the type inconsistency indicates that the simulation results are wrong. The results are shown in Table 2. The overall accuracy of Globeland30 products was high [21], but accuracy was spatially varied in Southeast Asia in part or some land cover types. The proportion of matching the pre-classified image with the sample pixels was 75.34%. The accuracy of the MCRF simulation results was 81.52%, which was 6.18% higher than that of the GlobeLand30-2015 products. The accuracy of matching the co-simulation results with the verification points was 86.48%, which was 11.14% higher than that of the GlobeLand30 products. The simulation results also revealed where the accuracy was relatively low. The area with a low accuracy can be used as the warning area of false classification, which provides a reference for subsequent product improvement.

Conclusions
In this study, a method is proposed to improve the accuracy of land cover classification products by coupling Geo-Eco zoning and Markov chain geostatistical simulation. This method can be used to evaluate the spatial accuracy variation of land cover classification products and improve the drawbacks of the overall accuracy evaluation of the general confusion matrix method. In this study, the verification points collected from the network are reused. The Geo-Eco zoning attribute data and pre-classified images are set as auxiliary data for co-simulation. The simulation results are more reliable if the simulation area is limited in the Geo-eco zone. The local accuracy of the pre-classified image can be quantified and improved. Therefore, the coupling of Geo-Eco zoning and Markov chain geostatistical simulation can enhance the accuracy of Globeland30 data by more than 10%.
However, some deficiencies should be improved in the future.
(1) In this study, the image size that can be processed is limited because of the high operation cost of the algorithm. The experimental area only contains one Geo-eco zone. In future studies, the algorithm should be optimized, and a GPU-parallel acceleration method should be used to increase the amount of data that can be processed and make the algorithm more practical. (2) Existing data are limited, thus only the attribute data related to elevation are tested, and the impact of other types of auxiliary data are yet to be described. The role of Geo-Eco zoning in geostatistical simulation should be further explored. For example, geoscience knowledge on Geo-Eco zoning can be applied and combined with verification points to generate reasonable transiograms and further reduce the number of sample pixels required by the algorithm.  Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.