Global Land Cover Assessment Using Spatial Uniformity Validation Dataset

Abstract: The Degree Confluence Project (DCP) is a volunteer-based validation dataset that comprises useful information for global land cover map validation. However, there is a problem with using DCP points as validation data for the accuracy assessment of land cover maps. While resolutions of typical global land cover maps are several hundred meters to several kilometers, DCP points can only guarantee an area of several tens of meters that can be confirmed by ground photographs. So, the objective of this study is to create a land cover map validation dataset with added spatial uniformity information using satellite images and DCP points. For this, we devised a new method to semiautomatically guarantee the spatial uniformity of DCP validation data points at any resolution. This method can judge the validation data with guaranteed uniformity with a user's accuracy of 0.954. Furthermore, we conducted the accuracy assessment for the existing global land cover maps by the DCP validation data with guaranteed spatial uniformity and found that the trends differed by class and region.


Introduction
In the land cover mapping process, accuracy assessment is an expensive yet essential step [1]. In particular, it is almost impossible to conduct a field survey to collect the required amount of validation data in a reasonable period of time to assess the accuracy of global land cover maps. One solution to these problems is the use of volunteer-based Geographic Information (VGI) [2]. VGI is provided by volunteer projects for use in a variety of applications, including land cover map validation. Projects that provide VGI with global coverage include Flickr (https://www.flickr.com/), OpenStreetMap (http://www.openstreetmap.org/), Panoramio (http://www.panoramio.com/), the Degree Confluence Project (DCP) (http://confluence.org/), and the Geo-Wiki project (http://www.geo-wiki.org/). The DCP website provides ground-based photographs of points at integer latitudes and longitudes around the globe, and some studies have used these photographs to evaluate the accuracy of land cover maps. The first study that used DCP information for the accuracy evaluation of land cover maps was by Iwao et al. [3], who proposed a new land cover map validation method using DCP information. In that study, 749 confluences were used to evaluate the accuracy of land cover maps for the Eurasia region using Global Land Cover 2000 (GLC2000) [4], MODIS Land Cover (MOD12) [5], the University of Maryland's 1-km Global Land Cover products (UMD) [6], and the Global Land Cover Characteristics Data Base Version 2.0 (GLCC) [7]. In studies aimed at improving accuracy by integrating existing global land cover maps [8,9], a DCP-based land cover validation dataset was used for accuracy assessment. Foody and Boyd [10] also validated the availability of DCP photographs by evaluating the accuracy of GlobCover for a region of tropical forests in West Africa. Soyama et al. [11] found that highly reliable reference data based on defined quality levels was produced for the IGBP land cover classification scheme using DCP information. Qian et al. [12] used DCP information for land cover accuracy assessment in a study considering the effect of uncertainty in map accuracy caused by references produced under the existing global land cover map matrix structure classification scheme.
Remote Sens. 2021, 13, 2950
As mentioned above, the potential use of DCP information in assessing the accuracy of land cover maps has already been shown in several studies. However, there is an issue regarding spatial representativeness between DCP points and global land cover maps. The resolution of a typical global land cover map is a few hundred meters to a few kilometers. On the other hand, DCP photos cannot confirm the land cover hundreds of meters or kilometers away from the DCP points. As a result, the class that can be identified from the DCP photos may disagree with the true class of the land cover map at its particular resolution. If validation data lacking spatial representativeness are used to evaluate the accuracy of a land cover map, the accuracy will be underestimated, and the data cannot serve as reliable validation data. Therefore, when ground validation data such as DCP are used to evaluate the accuracy of land cover maps, it is necessary to show that the spatial representativeness of the ground validation data is guaranteed. However, there has been no quantitative discussion of spatial representativeness for DCP validation data.
The objective of this study is to create a ground validation dataset that includes spatial uniformity information derived from satellite images. For this, we devised a new method to semiautomatically guarantee the spatial uniformity of DCP validation data points at any resolution. The method constructs an SVM model using some DCP points as training data, and then determines whether or not each DCP point has spatial uniformity at a certain resolution. In addition, we conducted accuracy assessments for existing global land cover maps using a DCP validation dataset with guaranteed spatial uniformity.

Materials and Methods
In this study, we create a validation dataset with spatial uniformity information and evaluate the accuracy of the global land cover map using said validation dataset. The method for adding spatial uniformity information to the validation data is presented in Figure 1. First, we obtain a list of DCP points and download the satellite images for the latitude and longitude of each listed DCP point. Then, a visual interpretation of the class is conducted for all DCP points, and a visual interpretation of spatial uniformity is performed for some DCP points. After that, the SVM model for automatic determination of spatial uniformity is constructed and the determination of spatial uniformity is performed for all DCP points. The data and methods required for these processes are described below.

Materials
DCP information was used to generate validation data for evaluating the accuracy of global land cover maps. The DCP is a volunteer-based ground validation dataset that provides field photographs and descriptions of sites at integer latitudes and longitudes around the globe, including the date of the survey and quality information. In this study, we used DCP points from 2003 to 2007. The reason that we chose this period is that we can assume that there will be little land cover change for five years, and the amount of DCP data in the five years centered on 2005 is the largest among 2000, 2005, 2010, and 2015.

Satellite Images
In this study, the semiautomatic determination of spatial uniformity was performed using information from satellite images. The satellite images used in this study were ASTER L1T Radiance (hereafter referred to as ASTER) from 2003 to 2007, Landsat Global Land Survey 2005 Landsat 5 + 7 scenes (hereafter referred to as GLS2005) whose Landsat images were acquired from 2003 to 2008, and Global PALSAR-2/PALSAR Yearly Mosaic (hereafter referred to as PALSAR) in 2007. The downloaded images are listed in Table 1. These satellite images were downloaded from Google Earth Engine Code (GEE) in the range of 300 m × 300 m and 990 m × 990 m centered on integer latitude and longitude. This is because these resolutions are commonly used in global land cover maps. These clipped satellite images are referred to as patches.

The following preprocessing was performed on the downloaded satellite images. ASTER was downloaded from GEE with 0% cloud cover conditions. The processing for ASTER was to convert radiance to reflectance and to create NDVI from Band 2 and Band 3. The common preprocessing for ASTER and GLS2005 was to select the band with the highest mean NDVI value when there were multiple overlapping satellite images at the same latitude and longitude.
The rationale for choosing the time of maximum NDVI is that the larger the NDVI, the more likely it is that the area contains vegetation and the easier it is to determine whether it is uniform or non-uniform. As a common preprocessing step for ASTER, GLS2005, and PALSAR, if even one pixel in a patch contained "Nodata," the patch was discarded.
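The preprocessing rules above can be illustrated with a minimal sketch. It is not the authors' code: patches are plain nested lists of reflectance values, `NODATA` is a hypothetical sentinel, all function names are assumptions, and the selection rule is read here as choosing among overlapping acquisitions after discarding any that contain Nodata (the order of these two steps is not stated in the text).

```python
from statistics import mean

NODATA = None  # hypothetical sentinel for a missing pixel


def ndvi(red, nir):
    """NDVI of one pixel from red (ASTER Band 2) and NIR (ASTER Band 3) reflectance."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def has_nodata(band):
    """True if any pixel of a 2-D band (list of rows) is missing."""
    return any(p is NODATA for row in band for p in row)


def mean_ndvi(red_band, nir_band):
    """Mean NDVI over a patch, given its red and NIR bands."""
    return mean(
        ndvi(r, n)
        for red_row, nir_row in zip(red_band, nir_band)
        for r, n in zip(red_row, nir_row)
    )


def select_patch(candidates):
    """Pick the overlapping acquisition with the highest mean NDVI.

    `candidates` is a list of dicts holding 'red' and 'nir' 2-D bands.
    Acquisitions with even one Nodata pixel are discarded first.
    Returns None when no usable acquisition remains.
    """
    usable = [c for c in candidates
              if not has_nodata(c["red"]) and not has_nodata(c["nir"])]
    if not usable:
        return None
    return max(usable, key=lambda c: mean_ndvi(c["red"], c["nir"]))
```

In practice the same logic would be applied to image arrays rather than nested lists; the list form is used only to keep the sketch dependency-free.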

World Cities Database
The World Cities Database (https://simplemaps.com/data/world-cities) is a simple database of cities and towns in the world. In this research, when creating the training data for determining uniformity or non-uniformity by SVM, we created uniform training data for each class. Then, we referred to the World Cities Database to create additional training data since the Urban class lacks DCP points. Of these databases, we used the Basic database.

Google Earth
Google Earth was used in the visual interpretation of classes for the DCP validation data and to create the training data for the spatial uniformity judgment.

Method for Creating Validation Data Sets with Guaranteed Spatial Uniformity
This section describes a method for creating a validation dataset to evaluate the accuracy of global land cover maps with guaranteed spatial uniformity.

Visual Interpretation for Classes
Between 2003 and 2007, a total of 6243 points were visually inspected, including multiple visits to a single point. For the visual interpretation of the classes, we adopted the classification system of the IGBP [16] (MCD12 definitions) and further consolidated them into the seven classes defined in this study. The aggregation from the IGBP classes to the classes defined in this study was automated. DCP photographs and aerial photographs from Google Earth were used as references for visual interpretation. This work was carried out by two people, and only their results that were in agreement were considered valid.

Visual Interpretation for Uniformity/Non-Uniformity
In this study, in order to automate the judgment of uniformity and non-uniformity by SVM, we created the training data for constructing the SVM model from some of the DCP and World Cities Database data. Section 2.2.1 uses seven class definitions, but wetland and the mosaic of vegetation and cropland, which are included in "Other", are excluded here because they can be considered non-uniform. Therefore, the six classes that were judged to be uniform or non-uniform were Forest, Grass/Shrub, Cropland, Urban, Barren, and Water. Stratified random sampling was used to obtain uniform training data with 100 points per class. Since the sampling also yielded points that would be considered non-uniform, a "non-uniform" class of 100 points was created as well. These uniform and non-uniform training data were also created by two people who performed a visual interpretation of the data, and only the results that were in agreement were adopted. The criterion for judging uniformity was whether the class representing the 30 m × 30 m area centered on an integer latitude and longitude accounted for 90% or more of the surrounding 300 m × 300 m area.
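To make the 90% criterion concrete, the following sketch applies it to a grid of class labels. In the study the judgment was made visually by two interpreters, so this is only an illustration: the label grid, the function name `is_uniform`, and the default `threshold` argument are assumptions.

```python
def is_uniform(labels, center_class, threshold=0.9):
    """Check whether `center_class` covers at least `threshold` of a patch.

    `labels` is a 2-D list of class names, e.g. a 10 x 10 grid of
    30 m cells covering a 300 m x 300 m area (an assumed layout).
    """
    cells = [c for row in labels for c in row]
    share = sum(1 for c in cells if c == center_class) / len(cells)
    return share >= threshold
```

For a 10 × 10 grid, a patch with 90 or more cells of the center class would be treated as uniform under this reading, and one with 89 or fewer as non-uniform.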

Building the SVM Model
For each patch of the preprocessed satellite images, the mean and the sum of squared deviations (SSD) from the center pixel were calculated. The SSD of the pixels contained in a patch centered on a certain integer latitude and longitude is defined as follows.
$$\mathrm{SSD} = \sum_{i=1}^{n} (p_i - p_0)^2$$
Here, $p_i$ is the value of the $i$-th pixel, $p_0$ is the value of the pixel at the center of the area, and $n$ is the number of pixels contained in the patch. Since the mean and SSD were calculated for 14 bands, a total of 28 features were used as input data for SVM.
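As an illustration of how the mean and the SSD from the center pixel could be turned into the SVM input, consider the sketch below. The function names and the nested-list patch representation are assumptions, and the center pixel is taken at index `size // 2`, which is exact only for odd patch sizes.

```python
def patch_features(band):
    """Return (mean, SSD from the center pixel) for one band of a square patch."""
    size = len(band)
    # Assumption: for even sizes this picks one of the four central pixels.
    center = band[size // 2][size // 2]
    pixels = [p for row in band for p in row]
    mean_value = sum(pixels) / len(pixels)
    ssd = sum((p - center) ** 2 for p in pixels)
    return mean_value, ssd


def feature_vector(bands):
    """Concatenate (mean, SSD) over all bands: 14 bands give 28 values."""
    features = []
    for band in bands:
        features.extend(patch_features(band))
    return features
```

Because both statistics are defined for any patch size, the same function applies unchanged to 300 m × 300 m and 990 m × 990 m patches.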
In this study, two cost parameters were used for SVM. The intended use of the two cost parameters is to retain the "certainly uniform data" and screen out the "possibly non-uniform data" when determining spatial uniformity by SVM. In this study, the two cost parameters were adjusted using LIBSVM version 3.25 software. The RBF (radial basis function) kernel was used in this SVM model.
In this study, nested cross-validation [17], which consists of outer cross-validation (OuterCV) and inner cross-validation (InnerCV), was used. The flow of nested cross-validation is shown in Figure 2. In nested cross-validation, all data are first divided into test data and outer training data. In InnerCV, these outer training data are divided into validation data and inner training data, and five-fold cross-validation [18] is applied to determine the optimal hyperparameters. In OuterCV, five-fold cross-validation [18] is applied to the test data and outer training data to evaluate the model performance. The procedure for building the SVM model using nested cross-validation is shown below.
Initially, all the training data with uniform/non-uniform visual interpretation were divided into five sets, one of which was used as the test data and the rest as outer training data. Then, the outer training data were further divided into five sets, one of which was used as validation data and the rest as inner training data. The inner training data were used to construct a two-class SVM model with each uniform class and the non-uniform class. The optimum parameters were searched by grid search in the range of $2^{2i-1}$ ($i = -2, -1, \ldots, 6$) for C1 and C2, and $2^{2i-1}$ ($i = -8, -7, \ldots, 2$) for gamma. In general, the parameters are adjusted to maximize the mean OA (Overall Accuracy) of InnerCV, but in this study, the following original decision method was established:
Step 1: Create a confusion matrix from the InnerCV validation data.
Step 2: Find the parameter whose uniform UA (User Accuracy), $\mathrm{UA} = m_{11} / (m_{11} + m_{12})$, of the confusion matrix is closest to 100%.
Step 3: If there are multiple parameters that satisfy the condition of UA chosen in Step 2, choose the one with the highest OA.
Step 4: If there are multiple parameters selected in Step 3, select the one with the smallest cost parameter C1.
Maximizing OA in Step 3 is equivalent to maximizing $m_{11}$ in Table 2 when UA is the same. The reason for maximizing $m_{11}$ is that it is better to have a large number of valid DCP points with guaranteed uniformity.
Step 4 is required as overfitting is generally more likely to occur when the cost parameter of SVM is large.
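The decision rule in Steps 2 to 4 can be sketched as a tie-breaking selection over grid-search results. This is a hedged illustration rather than the authors' implementation: the `results` records, their keys, and the function name are hypothetical, and the grids assume the search values are $2^{2i-1}$ as described above.

```python
# Candidate values, assuming the grid 2^(2i-1) described in the text.
C_GRID = [2.0 ** (2 * i - 1) for i in range(-2, 7)]      # for C1 and C2
GAMMA_GRID = [2.0 ** (2 * i - 1) for i in range(-8, 3)]  # for gamma


def select_parameters(results):
    """Choose one parameter set from grid-search results.

    `results` is a list of dicts with keys 'C1', 'C2', 'gamma',
    'ua' (mean uniform UA) and 'oa' (mean OA), all hypothetical names.
    """
    # Step 2: keep the sets whose uniform UA is closest to 100%.
    best_gap = min(abs(1.0 - r["ua"]) for r in results)
    step2 = [r for r in results if abs(1.0 - r["ua"]) == best_gap]
    # Step 3: among those, keep the sets with the highest OA.
    best_oa = max(r["oa"] for r in step2)
    step3 = [r for r in step2 if r["oa"] == best_oa]
    # Step 4: among those, take the smallest cost parameter C1.
    return min(step3, key=lambda r: r["C1"])
```

Note that a parameter set with a higher OA but a lower uniform UA is never preferred, which reflects the priority given to retaining only "certainly uniform" points.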
In this study, SVM models were created for each class, and the final uniform/non-uniform decision was made by integrating the classification results of each model. If we denote by $M_k$ the SVM model that discriminates a uniform class $k$ ($k = 1, 2, \ldots, n$) from non-uniform when there are $n$ classes, the result of the decision by the SVM model for any sample whose attributes are denoted by $x = (x_1, x_2, \ldots, x_{28})$ can be expressed as follows:
$$y_k = M_k(x)$$
Here, $y_k \in \{0, 1\}$, where 1 denotes uniform and 0 denotes non-uniform. Since the results of each model identify "certainly uniform data", if any SVM model judges a point to be uniform, that point is considered uniform. Therefore, the final judgment of uniformity or non-uniformity can be obtained as follows:
$$y_{\mathrm{final}} = \max_{k = 1, \ldots, n} y_k$$
In this way, a sample is classified as uniform when $y_{\mathrm{final}} = 1$ and as non-uniform when $y_{\mathrm{final}} = 0$. Instead of determining uniformity or non-uniformity for all DCP points from the beginning, we perform a two-class classification of uniformity of a class versus non-uniformity for each class and, finally, integrate them. This is to account for the fact that the magnitudes of the mean and of the SSD from the center pixel differ between classes when each class is uniform.
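The integration rule, under which a point is uniform if any per-class model says so, can be written in a few lines. In this sketch each model is represented by a plain callable returning 0 or 1; that interface and the function name `integrate` are assumptions standing in for the trained SVM models.

```python
def integrate(models, x):
    """Return 1 (uniform) if any per-class model judges x uniform, else 0.

    `models` is a list of callables, one per class, each mapping the
    28-value attribute vector `x` to 0 (non-uniform) or 1 (uniform).
    """
    return 1 if any(model(x) == 1 for model in models) else 0
```

The logical OR makes the combined decision at least as permissive as the most permissive single model, so the strictness of the whole procedure rests on each per-class model being tuned for a high uniform UA.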
Because the OuterCV is also five-fold, the InnerCV yields five mean values of uniform UA and five mean values of OA, one for each outer fold. These values are averaged again over the OuterCV; in other words, a double mean is taken. The optimal parameter set is the one whose double-averaged values satisfy the conditions of Step 2 to Step 4.
After the hyperparameters were determined, the SVM model was reconstructed using the determined parameters C1, C2, and gamma for each class using the outer training data. The performance of the SVM model was evaluated by classifying the test data with the SVM model trained by the outer training data and obtaining the mean accuracy of the five datasets. Finally, we reconstructed the final SVM model using all the training data.
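The data partitioning behind the nested cross-validation can be sketched with index lists alone. This is a simplified illustration: the function names are assumptions, the folds are shuffled but not stratified by class, and the sample count used in any call is illustrative rather than taken from Table 3.

```python
import random


def k_fold_indices(indices, k, seed=0):
    """Shuffle `indices` and split them into k folds of near-equal size."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, k_outer=5, k_inner=5, seed=0):
    """Yield (test, outer_train, inner_splits) for each outer fold.

    `inner_splits` is a list of (inner_train, validation) index pairs
    obtained by splitting the outer training data again.
    """
    outer_folds = k_fold_indices(range(n_samples), k_outer, seed)
    for o, test in enumerate(outer_folds):
        outer_train = [i for j, fold in enumerate(outer_folds)
                       if j != o for i in fold]
        inner_folds = k_fold_indices(outer_train, k_inner, seed + 1 + o)
        inner_splits = []
        for v, validation in enumerate(inner_folds):
            inner_train = [i for j, fold in enumerate(inner_folds)
                           if j != v for i in fold]
            inner_splits.append((inner_train, validation))
        yield test, outer_train, inner_splits
```

Hyperparameters would be chosen using only the inner splits, and the test indices of each outer fold would be touched once, for the final performance estimate.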
The number of training, validation, and test data used in each stage of the nested cross-validation process are shown in Table 3. When building the SVM model, we used the mean and SSD from the center pixel for a 300 m × 300 m patch for each band. As long as the mean and SSD from the center pixel can be calculated, the same SVM model can be applied even if the target patch size is different. Therefore, in the evaluation of the accuracy of the existing global land cover map, we considered not only 300 m × 300 m, but also 990 m × 990 m as the patch size to judge the uniformity or non-uniformity of DCP.

Land Cover Class Definition
The class definitions for each global land cover map used in this study are shown in Table 5. A common class definition was also created for accuracy comparison. "Other" in the common class includes the class definition of each global land cover map, and those that cannot be evaluated for spatial uniformity were consolidated into one class. The definition of the common classes was performed by referring to the methods described in [22,23].

Results of Visual Interpretation of DCP
Between 2003 and 2007, a total of 6243 points, including multiple surveys of a single site, were visually interpreted for class. Of these, 4721 points belonged to the six target classes and were assigned the same class by both interpreters. The remaining 1522 points (about 24.4%) were either "disagreement", "Other" class, or "impossible to determine" in the visual interpretation by two people. The DCP validation data removed here are not included in the determination of uniformity/non-uniformity by the SVM model or in the global land cover accuracy assessment.

Adjusting SVM Parameters
The hyperparameters of the finally adopted model and the results of the performance evaluation are shown in Table 6. The mean uniform UA of the InnerCV in Table 6 is the highest obtained for each class, and the mean OA is the highest among the parameter sets attaining that uniform UA. Similarly, for the OuterCV, Table 6 lists the highest mean uniform UA and the highest mean OA under that uniform UA for each class. For the OuterCV, we also calculated the accuracy of the final model after integrating the SVM models of each class. The uniform UA of this integrated model, that is, the UA obtained when uniform DCP validation data are selected using the final SVM model constructed in this study, was 0.954. In addition, the variance of the uniform UA for the final model in the OuterCV was $8.90 \times 10^{-4}$.

Guaranteed Spatial Uniformity
In this study, DCP validation data with guaranteed spatial uniformity at the 300 m and 990 m scales were generated. Tables 7 and 8 show the number of uniform and non-uniform DCP validation data determined by the constructed SVM model. Note that, of the 4721 points remaining after visual interpretation, 1958 points at the 300 m scale and 2114 points at the 990 m scale were removed, either because multiple visits to the same latitude and longitude were merged into a single point or because no satellite image existed for the point. The total number of DCP validation data is therefore 2763 points at the 300 m scale and 2607 points at the 990 m scale.
Figure 3 shows the proportion of DCP points with guaranteed spatial uniformity among all DCP points in each class. More than half of the DCP points of Forest, Grass/Shrub, Cropland, and Urban are judged to have a high possibility of non-uniformity at both the 300 m and 990 m scales. On the other hand, only about 10-20% of Barren and Water points were judged as potentially non-uniform. These results mean that the optimal resolution differs depending on the class. For example, more than half of Forest, Grass/Shrub, Cropland, and Urban points are under-resolved at 300 m resolution, so the accuracy of these classes is likely to have been underestimated with the DCP validation data before spatial uniformity was guaranteed. On the other hand, since the number of Barren and Water points has barely decreased from the originally created DCP data, the results of accuracy assessment should be almost the same between the validation data before and after guaranteeing spatial uniformity at the 300 m and 990 m scales. The relationship between the screened amount of non-uniform validation data and the difference in the results of the accuracy assessment will be discussed in detail later in the accuracy assessment of existing global land cover maps.
Figure 4 shows the distribution of (a) all DCP points and (b) DCP points with guaranteed uniformity at the 300 m scale. Note that points judged as non-uniform are not guaranteed to be non-uniform, only highly likely to be. In the same way, Figure 5 shows the distribution of DCP points at the 990 m scale.
Next, we compared the proportion of DCP points with guaranteed spatial uniformity among all DCP points by continent for North America, Asia, South America, Europe, Oceania, and Africa. Figures 6 and 7 show the class on the horizontal axis and the proportion of uniform DCP points among all DCP points on the vertical axis. Note that "Water" focuses on the interior of the continent, and the ocean is not included. These results show that the smaller the proportion of Uniformity DCP/All DCP is, the more the validation data fail to meet spatial uniformity at the 300 m or 990 m scales. For example, Water in Africa has a proportion of 1.0, indicating that the exact same validation data can be used as when spatial uniformity is not taken into account. However, note that the distribution of DCP points is not necessarily representative of the entire continent, as Figures 4 and 5 show. For example, Figure 6 suggests the possibility that many forests in South America are smaller than the 300 m area. Intuitively, the mean extent of forests in South America seems to be larger than the 300 m scale, but the discrepancy between the intuition and the results is due to the spatial bias of the DCP points, as shown in Figure 4. From these results, we can obtain a trend of how much each class of DCP has spatial uniformity at the 300 m or 990 m scales for the regions where DCP points are distributed. Such information may be useful for discussing the optimal resolution for each class when mapping each region where DCP points exist.
Figure 3. The proportion of DCP points with guaranteed spatial uniformity among all DCP points in each class.
The method for guaranteeing spatial uniformity using information from satellite images in this research was built with 300 m × 300 m patches as the training data, but it is possible to guarantee spatial uniformity at any resolution as long as the mean value and the sum of squared deviations from the center pixel are obtained from the satellite image in a certain area. We used this method to determine the uniformity or non-uniformity at the 990 m scale. The results of screening DCP validation data using this method showed that the number of DCP validation data was reduced by half for Forest, Grass/Shrub, Cropland, and Urban, while most of the data for Water and Barren remained at both the 300 m and 990 m scales. This means that the optimal resolution for each class of land cover classification may be obtained by looking at the variation in the amount of screening of non-uniformity for each class using the proposed method at, for example, 10 m, 100 m, 1 km, and 10 km.

Accuracy Assessment of Existing Global Land Cover Maps
In this study, we created a new DCP validation dataset with guaranteed spatial uniformity. Here, we evaluate the accuracy of the global land cover maps using the DCP validation dataset with guaranteed spatial uniformity and the validation dataset including all DCP points. As both validation datasets have deficiencies, the following points should be noted when assessing the accuracy of global land cover maps with them. The validation dataset with guaranteed spatial uniformity is screened so that only spatially uniform points remain, which makes the bias in the population of validation data points relatively large. In this case, the accuracy is considered to be overestimated. When all the validation data are used for accuracy assessment, the DCP data do not guarantee spatial representativeness at a certain map resolution. In such cases, the accuracy may be underestimated.
With these points in mind, we compared the accuracy of GlobCover assessed with the DCP validation data with guaranteed spatial uniformity at the 300 m scale against the accuracy assessed with all DCP validation data. Table 9 shows the confusion matrices with and without guaranteed spatial uniformity. The class numbers correspond to the following: 1-Forest, 2-Grass/Shrub, 3-Cropland, 4-Urban, 5-Barren, 6-Water, 7-Other. Note that "Other" is not included in the validation data in this study because it may contain mixtures. The overall accuracy (OA) was 0.076 higher when evaluated with the spatially uniform DCP validation data than when evaluated with all DCP validation data. Turning to the user's accuracy (UA) of each class, the UA of Forest, Grass/Shrub, and Cropland was 0.137, 0.071, and 0.051 higher, respectively, with the spatially uniform data. On the other hand, the difference was relatively small for Barren (0.042) and Water (0.038), and there was no change for Urban. As shown in Figure 3, the classes with larger amounts of screened data, which have a higher possibility of non-uniformity, showed a larger change in UA, while the classes with smaller amounts of screened data showed almost no change. Thus, when a large amount of data is screened out as likely non-uniform and the accuracy improves as a result, the accuracy obtained with all DCP points can be considered underestimated, because the spatial representativeness of the screened validation data was not ensured.
Table 10 shows the results of the accuracy assessment for GlobCover resampled to 1 km resolution. The OA was 0.09 higher when 990 m uniformity was guaranteed than when all DCP points were used. By class, the accuracy of Forest, Grass/Shrub, and Urban improved by about 10%, while that of Cropland, Barren, and Water improved relatively little.
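The OA and UA comparisons above can be reproduced from any pair of confusion matrices. The sketch below assumes that rows hold the map class and columns hold the validation (DCP) class, so that UA is the diagonal element divided by its row total; the two small matrices are invented numbers for illustration only and are not taken from Tables 9 or 10.

```python
def overall_accuracy(matrix):
    """Share of all validation points that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def users_accuracy(matrix, i):
    """Diagonal element of class i divided by its row total
    (rows are assumed to be the map class)."""
    row_total = sum(matrix[i])
    return matrix[i][i] / row_total if row_total else float("nan")


# Hypothetical 3-class matrices: all points vs. spatially uniform points only.
all_points = [[50, 10, 5],
              [20, 30, 10],
              [5, 5, 40]]
uniform_points = [[30, 2, 1],
                  [5, 15, 3],
                  [1, 2, 35]]

oa_gain = overall_accuracy(uniform_points) - overall_accuracy(all_points)
ua_gain = [users_accuracy(uniform_points, i) - users_accuracy(all_points, i)
           for i in range(3)]
```

Here `oa_gain` and `ua_gain` play the role of the differences quoted in the text (for example, the 0.076 gain in OA), computed for the made-up matrices.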
Tables 11-13 show the results of the accuracy assessment of MCD12, GLCNMO, and GLC2000 when spatial uniformity at the 990 m scale is guaranteed and when all DCP points are used, as in Table 10. In all maps, the accuracy of Forest and Grass/Shrub is much higher when spatial uniformity is guaranteed than when all DCP points are used. For Barren and Water, on the other hand, guaranteeing spatial uniformity brings only a relatively small improvement in every map, with the exception of Water in MCD12, which shows an exceptionally large improvement. This trend is consistent with the discussion of Figure 3: when a large amount of validation data is screened out as likely non-uniform, the accuracy improves; when the amount of screening is small, there is almost no improvement. However, although Figure 3 shows that a large amount of Cropland validation data is screened out as likely non-uniform, the improvement in Cropland accuracy in Tables 10-13 is small, and in some maps the accuracy decreases. This is thought to be because the mean OA of the SVM OuterCV is lowest for Cropland, as can be seen from Table 6. A low mean OA means that more uniform data are erroneously screened out along with the non-uniform data than in other classes. In fact, Tables 10-13 show that only about 20% of the Cropland data (the row 3, column 3 element of each confusion matrix) is left when uniformity is ensured. In MCD12 and GLC2000, where the UA of Cropland decreases, the decrease is considered to occur because the other classes retain relatively more uniform data than Cropland.
Additionally, the "Other" class (number 7 in each confusion matrix) is actually a mixture of Cropland and Grass/Shrub, or a mixture of water bodies and vegetation, such as wetlands.
By calculating the following equation, we can check how many of the DCP points judged to be non-uniform by the SVM actually belonged to a mixture (non-uniform) class.
Here, m_ij denotes the element of the confusion matrix with class number i = 7 and j = 1, ..., 6. The calculation results of this equation for each map are shown in Table 14. In this study, the SVM model was constructed to increase the accuracy on the uniform side as much as possible at the expense of accuracy on the non-uniform side, but it can be confirmed that the non-uniform data are also separated with high accuracy in some maps.
These results indicate that the use of DCP to evaluate the accuracy of land cover maps may have resulted in underestimation, especially for Forest and Grass/Shrub, because data without spatial representativeness were used. On the other hand, the accuracy assessment for Barren and Water by DCP was shown to be reasonable in terms of spatial representativeness if the resolution is 990 m or finer. In the case of Cropland, restricting the assessment to uniform DCP validation data had the effect of improving accuracy, but this gain was offset by the decrease in accuracy caused by screening out far more validation data than in other classes; as a result, the accuracy was almost the same before and after screening by the SVM. Separating uniform and non-uniform Cropland is thought to be more difficult than for other classes because Cropland may be bare soil, covered with vegetation, or both, depending on the crop calendar, which makes uniform and non-uniform features difficult to distinguish in a satellite image.
In addition, the number of data in the original DCP for Urban was small, and screening non-uniform data at the 300 m and 990 m scales left almost no data. Furthermore, the discriminative performance for uniform Urban was as high as 0.985 in the OuterCV results (Table 6). These results suggest that the spatial representativeness of Urban can be ensured only at resolutions finer than 300 m.
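The Cropland behaviour described above, where UA stays flat or even falls after screening, can be illustrated with a single confusion-matrix row. The sketch below uses invented counts and retention rates, and assumes that the row holds the points mapped as Cropland with the Cropland diagonal in the last position; it only shows the arithmetic and does not reproduce any value in Tables 10-13.

```python
def users_accuracy_from_row(row, i):
    """UA for the class whose diagonal element sits at index i of its row."""
    return row[i] / sum(row)


def screen_row(row, keep_fraction):
    """Scale each count by the (hypothetical) fraction judged uniform."""
    return [n * k for n, k in zip(row, keep_fraction)]


# Hypothetical Cropland row: 20 Forest, 20 Grass/Shrub, 100 Cropland points.
cropland_row = [20, 20, 100]
# Assumed retention: other classes keep 50%, Cropland keeps only 20%.
keep = [0.5, 0.5, 0.2]

ua_before = users_accuracy_from_row(cropland_row, 2)
ua_after = users_accuracy_from_row(screen_row(cropland_row, keep), 2)
```

In this made-up case, UA falls from about 0.71 to 0.50, because the diagonal loses a larger share of its points than the off-diagonal elements do. With equal retention in every column, UA would be unchanged.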
Bearing in mind the above discussion on spatial uniformity, we now discuss the accuracy assessment of the global land cover maps using the DCP validation data that ensure uniformity. In GlobCover, GLCNMO, and GLC2000, half of the Grass/Shrub data are classified as Barren, while fewer are classified as Barren in MCD12 (row 5, column 2 of the confusion matrices). In MCD12, many of the data that are Forest or Cropland in the validation data are classified as Grass/Shrub, and this is more pronounced than in the other global land cover maps. This difference is thought to be due to differences in class definitions. As shown in Table 4, GlobCover, GLCNMO, and GLC2000 are based on the LCCS definitions, while MCD12 is based on the IGBP definitions, and this difference is reflected in the result.

Conclusions
In this study, we created DCP validation data with added spatial uniformity information at the 300 m and 990 m scales, and compared the accuracy of existing global land cover maps assessed with these data against the accuracy assessed with all DCP validation data. Uniformity and non-uniformity were judged semiautomatically by an SVM classifier with an RBF kernel and two cost parameters, using some DCP points as training data. The constructed SVM model was confirmed to identify uniform DCP validation data with high accuracy (UA = 0.954).
When the DCP validation data were judged as uniform or non-uniform using this classifier, more than half of the Forest, Grass/Shrub, Cropland, and Urban points were screened out as non-uniform. This suggests that the typical spatial scale of these classes is smaller than 300 m. The data used for these classes therefore did not ensure spatial representativeness, and accuracy may have been underestimated in previous studies that assessed it at scales larger than 300 m. On the other hand, most of the DCP validation data for Barren and Water were judged to be uniform at both the 300 m and 990 m scales. Therefore, for Barren and Water, accuracy assessment by DCP is spatially representative if the resolution is 990 m or finer. For example, previous accuracy assessments at 1 km are reasonable for these classes from the viewpoint of spatial representativeness, even though the validation data used did not explicitly guarantee it. Due to the nature of the Cropland class, it is relatively difficult to determine uniformity and non-uniformity from satellite images, and the amount of data is greatly reduced by screening.