Consistency Analysis and Accuracy Assessment of Three Global 30-m Land-Cover Products over the European Union using the LUCAS Dataset

Abstract: Land-cover plays an important role in the Earth's energy balance, the hydrological cycle, and the carbon cycle. It is therefore important to evaluate current global land-cover (GLC) products and to understand the differences between them so that they can be used effectively in different applications. In this study, three 30-m GLC products, namely GlobeLand30-2010, GLC_FCS30-2015, and FROM_GLC30-2015, were evaluated in terms of areal and spatial consistency, and their accuracy was assessed using the Land Use/Cover Area frame statistical Survey (LUCAS) reference dataset over the European Union (EU). Given the limitations of the traditional confusion matrix used in accuracy assessment, we adjusted the confusion matrices derived from sample counts to account for the class proportions of the map and reported the standard errors of the descriptive accuracy measures. The results revealed the following. (1) GlobeLand30-2010 had the highest overall accuracy (88.90 ± 0.68%), followed by GLC_FCS30-2015 (84.33 ± 0.80%) and FROM_GLC30-2015 (65.31 ± 1.0%). (2) The consistency between GLC_FCS30-2015 and GlobeLand30-2010 was higher than that between any other pair of products, with an area correlation coefficient of 0.930 and a proportion of consistent pixels of 52.41%. (3) Across the EU, the dominant land-cover types, such as forest and cropland, are the most consistent across the three products, whereas the spatial consistency of bare land, grassland, shrubland, and wetland is relatively low. (4) Low-consistency pixels account for 16.17% of all pixels, whereas high-consistency pixels account for about 39.12%. The disagreement between the products occurs primarily in transitional zones with mixed land-cover types and in mountainous areas.
Overall, the GlobeLand30 and GLC_FCS30 products were found to be the most consistent with each other and to have good classification accuracy in the EU, with the disagreement between the three 30-m GLC products occurring mainly in heterogeneous regions.


Introduction
Land-cover is an essential climate variable affecting the Earth's energy balance, matter cycles, and the structure and function of ecosystems [1][2][3][4]. In recent decades, as satellite remote sensing techniques and computing and storage capacity have improved, various global and regional land-cover products with different spatial resolutions and classification systems have been developed and released. For example, Hansen et al. (2010) and Loveland et al. (2010) used time-series of Advanced Very High Resolution Radiometer (AVHRR) imagery to produce the 1-km global land-cover (GLC) product (GLC_2000) as well as the International Geosphere-Biosphere Programme Data and Information System land-cover product (IGBP_DISCover) [5][6][7]. Based on Moderate Resolution Imaging Spectroradiometer (MODIS) and MEdium Resolution Imaging Spectrometer (MERIS) satellite data, several global 300-m and 500-m land-cover products have been released, including the MODIS GLC datasets (MOD12Q1 and MCD12Q1) [8,9], GlobCover [10], and the Climate Change Initiative land-cover dataset (CCI_LC) [11]. Although these coarse-resolution GLC products have provided valuable information for numerous applications, several studies have shown that their classification accuracy is low in transitional zones with heterogeneous landscapes and that finer-resolution data are required in these areas [12][13][14].
Recently, several GLC products with fine resolutions, for example 30 m, have been produced. These include GlobeLand30, which was developed using a pixel-object-knowledge-base (POK) classification strategy that combines multi-temporal Landsat and HJ-1A/B imagery [15]. Another example is the Global Land Cover with Fine Classification System at 30 m (GLC_FCS30) product, which was developed by combining time-series of Landsat imagery with local adaptive random forest models on the Google Earth Engine platform [16]. Finally, the Finer Resolution Observation and Monitoring of Global Land Cover (FROM_GLC) product was developed using supervised classifiers and single-date Landsat imagery [14]. These fine-resolution GLC products provide potential users with rich spatial detail. However, because they are derived from multisource remote sensing imagery using different classification systems and methods, most users cannot easily tell which product best suits their particular application [12,17]. Moreover, the accuracy assessments provided by the data producers cannot be compared directly because they are based on different validation datasets [17], and an accuracy assessment alone cannot reveal the spatial distribution of classification errors or of the inconsistencies between products [12]. Quantitative and independent consistency analysis and accuracy assessment are therefore essential for helping users choose the best product for their specific application.
Two methods are commonly used to compare GLC products: (1) analyzing the consistency between the products and (2) quantitatively assessing the accuracy of each product with a confusion matrix based on a common validation dataset [18,19]. The first and crucial step in a consistency analysis is to harmonize the classification systems used by the different products. Unfortunately, most previous studies merged land-cover types directly without considering the semantic differences between nominally identical land-cover types in different classification systems [19,20], even though many studies have demonstrated that differences between classification systems affect the consistency and accuracy of GLC products [19,21,22,23]. For example, grassland is defined as land covered by natural herbaceous vegetation using a fraction of vegetation cover (FVC) threshold of 10% in the GlobeLand30 classification system [14], whereas an FVC threshold of 15% is used for FROM_GLC and GLC_FCS30 [14,16].
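To make the effect of such a threshold difference concrete, the minimal sketch below labels a single pixel under each product's FVC threshold. It assumes that a pixel counts as vegetated when its FVC is greater than or equal to the threshold (whether the products use "≥" or ">" is an assumption), and the function names are hypothetical; it illustrates the definitions only and is not any producer's classification code.

```python
# Hypothetical illustration: one pixel judged against the FVC thresholds quoted in the text.
# Assumption: a pixel counts as vegetated when FVC >= threshold.
FVC_THRESHOLDS = {
    "GlobeLand30": 0.10,  # 10% threshold
    "GLC_FCS30": 0.15,    # 15% threshold
    "FROM_GLC": 0.15,     # 15% threshold
}


def vegetated_or_bare(fvc: float, threshold: float) -> str:
    """Label a pixel from its fraction of vegetation cover (0-1)."""
    return "vegetated" if fvc >= threshold else "bare land"


def labels_for(fvc: float) -> dict:
    """Return the label that each product's threshold would assign to this pixel."""
    return {name: vegetated_or_bare(fvc, t) for name, t in FVC_THRESHOLDS.items()}


if __name__ == "__main__":
    # A pixel with 12% vegetation cover falls between the two thresholds.
    print(labels_for(0.12))
```

A pixel with 12% FVC is vegetated under the 10% rule but bare land under the 15% rule, so two products can disagree even where the surface itself is identical; this is the kind of semantic difference that harmonization has to absorb.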
Although many researchers have recognized the importance of rigorous accuracy assessment of different land-cover products using a common validation dataset [21,24,25], collecting a sufficient number of accurate validation sample points is still difficult, because validation data usually rely on the visual interpretation of high-resolution satellite images, regional maps, and geo-tagged photos, which is time-consuming and labor-intensive [10,14,15,26]. To make better use of existing data, several GLC reference datasets are now accessible through the Global Observations of Forest and Land Dynamics (GOFC-GOLD) reference data portal, the Geo-Wiki platform, and other organizations such as the Statistical Office of the European Commission (Eurostat) [27][28][29], and some studies have used these datasets for accuracy assessment. For example, Tsendbazar et al. assessed the accuracy of four GLC products for Africa using existing reference datasets that are publicly accessible through the GOFC-GOLD reference data portal [28]. Furthermore, the overall accuracy (O.A.), producer's accuracy (P.A.), and user's accuracy (U.A.) derived from confusion matrices of sample counts remain the most widely used measures for evaluating product accuracy [14,16,19]. However, Card (1982) first adjusted the traditional confusion matrix by taking into account the area proportions of the land-cover types in the map, which reduces the standard errors of the accuracy estimates [30], and this poststratified estimator has been shown to be more precise than the commonly used estimators [31]. Nevertheless, only a few studies have applied the poststratified estimator to the comparative accuracy assessment of different GLC products based on a common validation dataset [21].
The objectives of this study were (1) to propose a method, based on the EAGLE (EIONET (Environmental Information and Observation Network) Action Group on Land monitoring in Europe) concept, for reducing the differences between the classification systems used by different land-cover products; (2) to compare three products (GlobeLand30, GLC_FCS30, and FROM_GLC) over the territory of the European Union; and (3) to quantitatively assess the performance of these three 30-m GLC products and report confidence intervals for each estimate using adjusted confusion matrices. The results of this study will help users select the most suitable land-cover product for their applications.

Study Area
The study area covers the 27 member states of the European Union (EU) and the United Kingdom and has a total area of approximately 4.3 million km². Seven environmental zones identified by the European Environment Agency [32] are found within this area (Figure 1). Because of this variety of zones, the land cover of Europe is complex and diverse: the most widespread land-cover types are forest and cropland, which are concentrated in the boreal, continental, and Mediterranean biogeographic regions [32].

Global Land-Cover Products
In this study, three global fine-resolution land-cover products with a resolution of 30 m (GlobeLand30-2010, GLC_FCS30-2015, and FROM_GLC30-2015) were acquired and used to analyze their consistency and accuracy within the European Union. GlobeLand30-2010 is one of the most widely used land-cover products and was developed using the pixel-object-knowledge-base (POK) method and multi-temporal Landsat and HJ-1 (China Environment and Disaster Reduction Satellite) A/B imagery [15]. It uses a classification system containing 10 major land-cover types and has been validated as having an overall global accuracy of 80.3% based on 154,586 validation sample points [15].
The GLC_FCS30-2015 product, which contains a large variety of land-cover types, was developed on the Google Earth Engine (GEE) platform by combining time-series of Landsat imagery, a global prior training dataset from GSPECLib, and local adaptive random forest models [16]. It inherits the classification system of the CCI_LC land-cover maps, which consists of 30 land-cover types. GLC_FCS30-2015 has been validated as having an overall global accuracy of 82.5% based on 44,043 validation sample points [16].
FROM_GLC30, the first global 30-m land-cover product, was produced by combining over 9000 single-date Landsat images, 10,000 visually interpreted training sample points, and supervised classifiers including the maximum likelihood classifier (MLC), support vector machine (SVM), random forest (RF), and artificial neural network (ANN); this first version has an overall accuracy of 64.9% [14]. The new generation of FROM_GLC30 uses multi-temporal Landsat imagery to improve the overall accuracy. In this study, FROM_GLC30 for 2015 (FROM_GLC30-2015), which consists of 27 land-cover types and has been validated as having an overall accuracy of 80.3% [33], was selected; the main characteristics of the three GLC products are listed in Table 1.

LUCAS Validation Dataset
The Land Use/Cover Area frame statistical Survey (LUCAS) collects information on land cover and land use, as well as agro-environmental and soil data, every three years through field observations and photo-interpretation on a systematic 2 km × 2 km grid [34][35][36]. This comprehensive in situ framework, launched by the European Union [37], has gradually become the largest and most comprehensive land-cover database in Europe [34] and has been used successfully to analyze and validate land-cover maps in several countries [3,38].
In this study, the LUCAS micro dataset, which is collected as part of the LUCAS survey by the Statistical Office of the European Commission (Eurostat) and is available at https://ec.europa.eu/eurostat/web/lucas/data/lucas-grid, was used to quantitatively assess the accuracy of the three land-cover products. This dataset consists of 1,090,863 records, each with 17 attributes including position, elevation, land-cover type, and time of collection. Because of the differences between classification systems and the time gap between the LUCAS survey and the GLC products, several steps were taken to ensure the quality of the validation sample points. First, to make the sample points consistent with the target classes given in Section 3.1, all 44 land-cover types in the LUCAS micro dataset were translated into a new validation dataset containing nine land-cover types; tundra was excluded from the validation because the LUCAS reference dataset contains no tundra samples. Apart from tundra, these classes are the same as the target classes of the GlobeLand30 classification system. Second, the time gap between the LUCAS sample points and the GLC products was considered: GlobeLand30 was produced for 2010 and FROM_GLC30 and GLC_FCS30 for 2015, whereas the sample points of the LUCAS micro dataset were collected in 2006, 2012, and 2018. A temporal filter was therefore applied to select the validation sample points for each target year. For example, when building the 2010 sample set, sample points whose land-cover types differed between 2006 and 2012 were removed as erroneous.
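The temporal filter can be sketched as follows, assuming each LUCAS point has already been translated to the harmonized classes. The text only spells out the 2010 case (surveys 2006 and 2012); using 2012 and 2018 for 2015, and dropping points not observed in both surveys, are assumptions, and the function and dictionary names are hypothetical.

```python
# Hypothetical sketch of the temporal filter described above.
# Assumption: the surveys bracketing 2015 are 2012 and 2018 (the text only
# spells out the 2010 case, which uses 2006 and 2012).
BRACKETING_SURVEYS = {2010: (2006, 2012), 2015: (2012, 2018)}


def temporal_filter(points: dict, target_year: int) -> dict:
    """Keep points whose harmonized class is unchanged across the bracketing surveys.

    points: {point_id: {survey_year: harmonized_class}}
    Returns {point_id: class} for the retained points.
    """
    before, after = BRACKETING_SURVEYS[target_year]
    kept = {}
    for point_id, labels in points.items():
        # Assumption: points not observed in both surveys cannot be checked and are dropped.
        if before not in labels or after not in labels:
            continue
        if labels[before] == labels[after]:
            kept[point_id] = labels[before]
    return kept


if __name__ == "__main__":
    example = {
        "P1": {2006: "forest", 2012: "forest", 2018: "forest"},
        "P2": {2006: "cropland", 2012: "grassland", 2018: "grassland"},
        "P3": {2012: "water", 2018: "water"},
    }
    print(temporal_filter(example, 2010))  # only P1 is stable over 2006-2012
    print(temporal_filter(example, 2015))
```

Requiring agreement between the two bracketing surveys is a conservative choice: it discards genuinely changed points together with labeling errors, trading sample size for label reliability.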
Third, because some sample points change over time (such as those corresponding to permanent snow or ice) or lack detailed attribute information, the validation sample points were checked by visual interpretation of the high- and medium-resolution imagery available in Google Earth, such as IKONOS, SPOT (Système Probatoire d'Observation de la Terre)-5, Landsat-8 OLI (Operational Land Imager), Landsat-5 TM (Thematic Mapper), and Sentinel-2 imagery. Taking permanent snow/ice as an example, a sample point was retained only if the ground target within its 30-m × 30-m grid cell was visually interpreted as snow/ice in the corresponding year. After careful checking and filtering, a total of 691,521 LUCAS sample points for 2010 and 632,315 sample points for 2015, covering nine land-cover types, were retained (Figure 2). Visually, the spatial distribution of these validation sample points reasonably captures the actual landscapes of the European Union. The final LUCAS validation datasets for 2010 and 2015 are available at https://doi.org/10.5281/zenodo.3998826.

Methods
Because the classification systems of the GLC products differ, evaluating their accuracy is challenging. To evaluate the performance of the three GLC products comprehensively within the European Union, the following steps were taken: (1) harmonization of the classification systems used by the different GLC products, (2) analysis of the consistency between the GLC products, and (3) accuracy assessment using the validation dataset.

Harmonization of the Classification Systems Used for Different GLC Products
There are large discrepancies between the classification systems used by the three GLC products. FROM_GLC30 uses a two-level hierarchical classification system with 10 land-cover types at level 1, which separates traits, life forms, and structural information into distinct layers [14]. The CCI_LC classification system [39] used by GLC_FCS30 has 16 level-1 land-cover types and was designed to serve the climate modeling and climate research communities [16]. GlobeLand30 has 10 basic land-cover types, which are the same as the level-1 types of FROM_GLC30 [15]; its classification system was designed to meet the requirements of Earth system modeling. The same land-cover type may also be defined differently in different classification systems. For example, GlobeLand30 separates the vegetated land-cover types (forest, shrubland, and grassland) from bare areas using an FVC threshold of 10%, whereas GLC_FCS30 uses an FVC threshold of 15% [15,16].
To assess these GLC products with a common validation dataset, the different classification systems had to be transformed into the same target set. Based on the user guidelines for the three GLC products [14,15,16,39,40], a target set with 10 major land-cover types was designed (Table 2). Because the classification systems of GlobeLand30 and FROM_GLC30 (level 1) are similar to that of the LUCAS validation dataset, these two products could be assessed directly against the LUCAS validation dataset. Because the class definitions of GLC_FCS30 differ particularly strongly from the target classes, the first step in converting GLC_FCS30 was to calculate the semantic similarity between its classification system and the target classification set according to the EAGLE concept proposed by Arnold et al. [41]. The EAGLE model describes land units in a feature-oriented manner by decomposing them and highlighting their characteristic landscape elements [41]. Details of the semantic similarity calculation between GLC_FCS30 and the target classes are given in Tables S1-S6 of the supplementary material [41,42]. As shown in Table S6, however, the definition of sparse vegetation in GLC_FCS30 overlaps with several major land-cover types in the target set, including bare land, grassland, shrubland, and forest. Therefore, all 15 level-1 land-cover types of GLC_FCS30 other than sparse vegetation were converted to the target classes with which they completely overlap according to Table S6.
To resolve the ambiguity of the sparse-vegetation class, each pixel classified as sparse vegetation was converted to the candidate land-cover type with the maximum weight, where the weight of a type is the product of (a) the proportion of pixels of that type within the surrounding 0.05° × 0.05° grid cell and (b) the semantic similarity between sparse vegetation and that type. The 0.05° cell size is the same as the statistical cell used for the thematic area-fraction calculation by Zhang et al. [22]. For example, Table S6 shows that the semantic similarity between sparse vegetation and bare land is 0.901 and that between sparse vegetation and grassland is 0.553. If 15% of the pixels in a 0.05° × 0.05° cell were grassland and another 10% were bare land, a sparse-vegetation pixel in that cell would be converted to bare land, because the weight of bare land (0.10 × 0.901 = 9.01%) is higher than that of grassland (0.15 × 0.553 ≈ 8.30%). The final conversion table between the original classes of each land-cover dataset and the target classes is given in Table 2.
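The weighting rule for sparse vegetation can be written as a short function. This is a sketch under stated assumptions: the function name is hypothetical, the cell fractions are assumed to be precomputed for each 0.05° cell, and classes without a similarity score are given zero weight.

```python
def reassign_sparse_vegetation(cell_fractions: dict, similarity: dict):
    """Pick the target class for sparse-vegetation pixels in one 0.05-degree cell.

    cell_fractions: share of the cell's pixels in each candidate class,
        e.g. {"grassland": 0.15, "bare land": 0.10}.
    similarity: semantic similarity between sparse vegetation and each class.
    Returns (chosen_class, weights), where weight = fraction * similarity.
    """
    weights = {
        cls: frac * similarity.get(cls, 0.0)  # classes without a similarity score get zero weight
        for cls, frac in cell_fractions.items()
    }
    if not weights:
        raise ValueError("cell contains no candidate classes")
    chosen = max(weights, key=weights.get)  # ties resolve to the first class listed
    return chosen, weights


if __name__ == "__main__":
    sim = {"bare land": 0.901, "grassland": 0.553}  # similarity values quoted from Table S6
    cls, w = reassign_sparse_vegetation({"grassland": 0.15, "bare land": 0.10}, sim)
    print(cls, w)
```

With the worked example from the text, bare land receives weight 0.0901 and grassland about 0.0830, so the sparse-vegetation pixel goes to bare land even though grassland is the more common class in the cell: the higher semantic similarity outweighs the smaller areal share.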
Finally, to ensure a consistent analysis of the maps across Europe, the reclassified (a) GlobeLand30-2010, (b) GLC_FCS30-2015, and (c) FROM_GLC30-2015 datasets were displayed in a Lambert Azimuthal Equal Area projection (Figure 3). Visually, the spatial patterns of the three GLC products agree well with the distribution of the validation samples in Figure 2, indicating that the products roughly capture the actual distribution of land-cover types within the European Union.

Consistency Analysis for GLC Products
To provide a more comprehensive understanding of the GLC products, the three products were compared from two perspectives, "area-based" and "pixel-based". The area-based analysis focused on the proportion of the study area covered by each land-cover type and derived the similarity of the same land-cover types between products. The pixel-based analysis illustrated the consistency of the GLC products at different spatial locations. Although there is a five-year interval between GlobeLand30 and the other two GLC products, previous studies have shown that land-cover changes at the country scale over such an interval are almost negligible compared with the classification errors of land-cover products [8,43]. It was therefore considered reasonable to analyze the consistency between GlobeLand30 and the other two products.

Area-Based Consistency Analysis
The area-based analysis focused mainly on the consistency of the areal proportions of each land-cover type between the different GLC products [19,23]. To quantitatively evaluate the areal similarity of the same land-cover types in different products, the areal proportion of each land-cover type within the territory of the EU was calculated for each of the three products. The similarity coefficients between the products were then calculated using Equation (1) [19,44]:
$$R_{XY} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}} \quad (1)$$
where $R_{XY}$ is the area correlation coefficient between GLC products X and Y; i is the land-cover type; $X_i$ and $Y_i$ are the total areas of land-cover type i in products X and Y, respectively; $\bar{X}$ and $\bar{Y}$ are the mean total areas of all land-cover types in products X and Y, respectively; and n is the total number of land-cover types.
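Equation (1) is a Pearson correlation computed over land-cover classes rather than over pixels. A minimal sketch follows; the function name and the five-class proportions in the usage example are hypothetical, not values from this study.

```python
import math


def area_correlation(x, y):
    """Area correlation coefficient R_XY of Equation (1).

    x, y: total area (or area proportion) of each land-cover type in products
    X and Y, listed in the same class order.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must list the same classes (at least two)")
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - x_bar) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - y_bar) ** 2 for yi in y))
    if sx == 0 or sy == 0:
        raise ValueError("all classes have equal area in one product; R_XY is undefined")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical area proportions (%) for five classes in two products.
    product_x = [40.0, 30.0, 15.0, 10.0, 5.0]
    product_y = [35.0, 33.0, 18.0, 9.0, 5.0]
    print(round(area_correlation(product_x, product_y), 3))
```

Because $R_{XY}$ is unchanged when either list is multiplied by a positive constant, areal proportions and absolute areas over the same study area give the same coefficient. Note that it measures agreement in class totals only, so two maps can correlate highly while disagreeing pixel by pixel.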

Pixel-Based Consistency Analysis
The pixel-based analysis concerned the spatial (dis)agreement of each land-cover type between the different GLC products [18,20,23]. To illustrate the spatial consistency of the products, the spatial superposition method [23] was used to obtain the pixel-by-pixel correspondence between the products analyzed in this study. Based on the number of products whose target classes match at each pixel, the degree of consistency was assigned to one of three levels, listed here in descending order [19]:
1. High consistency: the target classes of all three GLC products are the same at a given pixel;
2. Moderate consistency: exactly two products have the same target class at a given pixel;
3. Low consistency: the three GLC products all have different target classes at a given pixel.
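The three-level scheme above maps directly onto element-wise comparisons of co-registered rasters. The sketch below assumes the three products have already been reclassified to integer target-class codes on a common grid; the level codes 3/2/1 and the function names are hypothetical, and no-data masking is not handled.

```python
import numpy as np

# Level codes for the three-level scheme (hypothetical encoding).
HIGH, MODERATE, LOW = 3, 2, 1


def consistency_levels(a, b, c):
    """Per-pixel consistency level of three co-registered target-class rasters."""
    a, b, c = (np.asarray(arr) for arr in (a, b, c))
    ab, bc, ac = a == b, b == c, a == c
    high = ab & bc                 # all three products agree
    low = ~ab & ~bc & ~ac          # all three products differ
    levels = np.full(a.shape, MODERATE, dtype=np.int8)  # otherwise exactly two agree
    levels[high] = HIGH
    levels[low] = LOW
    return levels


def level_shares(levels):
    """Fraction of pixels at each consistency level."""
    total = levels.size
    return {
        name: np.count_nonzero(levels == code) / total
        for name, code in (("high", HIGH), ("moderate", MODERATE), ("low", LOW))
    }


if __name__ == "__main__":
    a = [[1, 1], [2, 3]]
    b = [[1, 1], [2, 4]]
    c = [[1, 2], [5, 5]]
    lv = consistency_levels(a, b, c)
    print(lv)
    print(level_shares(lv))
```

Moderate is assigned by elimination: if a pixel is neither all-equal nor all-different, exactly two of the three labels must match, so no separate test is needed.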
To understand the contribution of each product to the overall spatial consistency in more detail, the spatial superposition method was also applied to each pair of products. Pixels with the same target class in both products were labeled "consistent", whereas pixels with different target classes were labeled "inconsistent".
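For a pair of products, the consistent/inconsistent labeling can be summarized as a cross-tabulation whose diagonal holds the consistent pixels. The following is a minimal sketch, assuming both rasters are co-registered and coded 0 to n_classes-1; the function name is hypothetical.

```python
import numpy as np


def pairwise_consistency(x, y, n_classes):
    """Cross-tabulate two target-class rasters coded 0..n_classes-1.

    Returns (matrix, consistent_pct): matrix[i, j] is the percentage of pixels
    labeled i in product X and j in product Y; the diagonal holds the
    "consistent" pixels and consistent_pct is their total share.
    """
    x = np.asarray(x, dtype=np.int64).ravel()
    y = np.asarray(y, dtype=np.int64).ravel()
    if x.shape != y.shape:
        raise ValueError("rasters must be co-registered and equally sized")
    # Encode each (i, j) pair as one integer so bincount counts all pairs at once.
    counts = np.bincount(x * n_classes + y, minlength=n_classes * n_classes)
    matrix = 100.0 * counts.reshape(n_classes, n_classes) / x.size
    return matrix, float(np.trace(matrix))


if __name__ == "__main__":
    x = [[0, 0], [1, 2]]
    y = [[0, 1], [1, 2]]
    m, consistent = pairwise_consistency(x, y, n_classes=3)
    print(m)
    print(consistent)
```

Percentages of pixels equal percentages of area only when every pixel covers the same ground area, which is one reason to run such tabulations on data in an equal-area projection such as the Lambert Azimuthal Equal Area projection mentioned above.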

Accuracy Assessment Using the Validation Dataset
The confusion matrix is widely used to assess the accuracy of GLC products. The common descriptive accuracy metrics, user's accuracy (U.A.), producer's accuracy (P.A.), and overall accuracy (O.A.), were calculated from a confusion matrix. To reduce the standard errors of these accuracy estimates, the confusion matrix derived from sample counts was adjusted to account for the class proportions of the map [21,30] using Equation (2):
$$\hat{p}_{ij} = W_j \frac{n_{ij}}{n_{\cdot j}} \quad (2)$$
where $\hat{p}_{ij}$ is the unbiased estimate of the proportion of area in reference category i that is mapped as category j; $W_j$ is the proportion of the area mapped as category j; $n_{ij}$ is the number of observations in reference category i that are mapped as category j; and $n_{\cdot j} = \sum_{i=1}^{m} n_{ij}$ is the marginal sum of mapped category j, for m land-cover types and N validation sample points in total.
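Equation (2) and the accuracy measures derived from it can be sketched with NumPy as follows. The rows-are-reference, columns-are-map layout follows the index convention of Equation (2); the function names and the two-class counts in the usage example are hypothetical.

```python
import numpy as np


def adjusted_confusion_matrix(counts, map_weights):
    """Equation (2): p_hat[i, j] = W_j * n_ij / n_.j.

    counts[i, j]: samples in reference class i mapped as class j
        (rows = reference, columns = map, as in Equation (2)).
    map_weights[j]: proportion of the map area labeled as class j.
    """
    counts = np.asarray(counts, dtype=float)
    w = np.asarray(map_weights, dtype=float)
    col_totals = counts.sum(axis=0)   # n_.j, one per mapped class
    # Division by n_.j and multiplication by W_j both broadcast across columns j.
    return counts / col_totals * w


def accuracy_measures(p_hat):
    """Overall, user's (per mapped class) and producer's (per reference class) accuracy."""
    diag = np.diag(p_hat)
    overall = float(diag.sum())
    users = diag / p_hat.sum(axis=0)      # U.A._j = p_jj / p_.j
    producers = diag / p_hat.sum(axis=1)  # P.A._i = p_ii / p_i.
    return overall, users, producers


if __name__ == "__main__":
    # Hypothetical two-class example: 100 samples, class 0 covers 80% of the map.
    p = adjusted_confusion_matrix([[40, 10], [10, 40]], [0.8, 0.2])
    print(p)
    print(accuracy_measures(p))
```

In this toy example, the raw counts would give a P.A. of 0.8 for both classes, but the adjusted P.A. of class 1 drops to 0.5: the 10 class-1 reference samples mapped as class 0 come from a map class covering 80% of the area, so each represents far more ground than a sample from the small class-1 map stratum.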
Several descriptive accuracy measures and their 95% confidence intervals were calculated following the method proposed by Olofsson et al. [31,45]. Because the accuracy assessment dataset used in this study was based on a systematic spatial sampling design without any prior information about the proportions of the land-cover classes [29], it was treated as a simple random sample of individual points. Taking the overall accuracy as an example, the O.A. was estimated as
$$\widehat{O.A.} = \sum_{j=1}^{m} \hat{p}_{jj}$$
where m is the number of land-cover types. The 95% confidence interval was then approximated as
$$\widehat{O.A.} \pm 1.96\sqrt{\hat{V}(\widehat{O.A.})}$$
where $\hat{V}(\widehat{O.A.})$ is the estimated variance of the overall accuracy [31,45].

Areal Consistency between the Three 30-m GLC Products
Figure 4 compares the area percentages of the land-cover types in the three GLC products within the EU. Overall, the three products capture the actual land-cover patterns of the European Union [3,46]. Cropland and forest had the largest coverage, generally exceeding 30% of the area in each of the three products, followed by grassland, shrubland, and water. GlobeLand30-2010, GLC_FCS30-2015, and FROM_GLC30-2015 were highly consistent for water, impervious surface, and permanent snow/ice, with proportions of 2.96%, 4.69%, and 2.64% for water, 1.19%, 1.92%, and 1.68% for impervious surface, and 0.02%, 0.11%, and 0.03% for permanent snow/ice, respectively. However, the proportions of cropland and forest in GlobeLand30-2010 were markedly higher than in GLC_FCS30-2015 and FROM_GLC30-2015. For cropland in particular, the proportion reached 53.06% in GlobeLand30-2010, whereas it was only 34.39% in FROM_GLC30-2015 and 36.61% in GLC_FCS30-2015.
For grassland, shrubland, and the other land-cover types, the consistency between the three products was relatively low. For grassland, for example, the proportions ranged from 1.75% to 20.38%. The area classified as wetland in FROM_GLC30-2015 was almost zero, whereas wetland accounted for 0.67% and 1.01% of GlobeLand30-2010 and GLC_FCS30-2015, respectively. To quantify the area-based consistency, the area correlation coefficient was calculated for each pair of products (Table 3). The correlation coefficient between GlobeLand30-2010 and GLC_FCS30-2015 was the highest (0.930), and that between GlobeLand30-2010 and FROM_GLC30-2015 was the lowest (0.538), indicating that the areal consistency between GLC_FCS30-2015 and GlobeLand30-2010 was higher than that between the other pairs of products.
To show the areal consistency of each pair of products, three pairwise consistency matrices giving the percentage of area in agreement and disagreement for each class were calculated (Figure 5; details are given in Tables S7-S9 of the Supplementary Materials). Figure 5 shows that the largest disagreement occurs for areas classified as cropland in one product but as forest, grassland, or impervious surface in the other. Specifically, Figure 5a shows that the areal consistency between GLC_FCS30-2015 and GlobeLand30-2010 is higher than that of any other pair of products, and that most of the areas labeled as grassland in GLC_FCS30-2015 were labeled as cropland in GlobeLand30-2010; these areas account for 6.82% of the study area, whereas grassland as a whole accounts for only 10.63% of the study area in GLC_FCS30-2015. The comparison between GlobeLand30-2010 and FROM_GLC30-2015 (Figure 5b) shows clear disagreement for areas labeled as cropland, forest, and shrubland in GlobeLand30-2010 but as grassland in FROM_GLC30-2015; among these, areas classified as cropland in GlobeLand30-2010 but as grassland in FROM_GLC30-2015 account for 10.21% of the study area. As shown in Figure 5c, and similar to the disagreement between GlobeLand30-2010 and GLC_FCS30-2015, the disagreement between GLC_FCS30-2015 and FROM_GLC30-2015 is concentrated in areas labeled as cropland in GLC_FCS30-2015 but as forest or grassland in FROM_GLC30-2015.

Spatial Consistency between All Three Products
Figure 6 illustrates the spatial consistency between the three land-cover products over the EU based on the consistency levels within 0.05° × 0.05° grid cells. Overall, "low consistency" pixels make up 16.17% of the total, whereas "high consistency" and "moderate consistency" pixels account for about 39.12% and 39.70%, respectively. In terms of biogeographic regions (Figure 1), the high-consistency areas are mainly concentrated in Sweden and Finland in the boreal region and in Poland, Romania, and Bulgaria in the continental region. Most of the Pannonian biogeographic region and parts of the Atlantic region are also areas of high consistency. Most of these areas are flat, with relatively simple land-cover distributions; for example, the boreal region is relatively flat and lies below 500 m, and forest is the dominant land-cover type, although wetlands are also common [32]. Areas of low consistency across the three GLC products are sparsely distributed and found mainly in parts of the Alpine and Atlantic biogeographic regions. Within the Alpine region, the inconsistent areas are mainly located near the Fennoscandian Mountains. This area contains a myriad of microclimates created by complex topography and differing exposures (sheltered south-facing slopes, snow pockets, wind-blasted crags, and uneven rock screes) and such rich biodiversity that almost two-thirds of the plants on the European continent are present there [32]. The differences between the products in this area mainly involve the grassland and bare land classes: GLC_FCS30-2015 classified most of the area as bare land, whereas FROM_GLC30-2015 classified it as grassland.
Finally, most of the low-consistency area in the Atlantic region is in the north of Scotland and results from the heterogeneous patches of wetland and transition zones from forest to shrubland [32] that are present there. From the above analysis, it can be seen that within the EU, there is an easily understood consistency between the distribution of classes that have relatively simple distribution patterns (i.e., forest and cropland) in areas such as plains, whereas inconsistency occurs mainly in areas of complex terrain (such as mountains) and in rapid transition zones. For example, in France, the main mountain ranges are concentrated in the east and northeast, which means that the topography tends to be higher in the east of France than in the west. Correspondingly, it can be seen from Figure 6 that the consistency in the northeast of France is significantly lower than that in the southwest. Within these areas, the areas of high consistency were mostly classified as cropland in all three products, whereas areas of low consistency are mostly transition areas from forest to grassland or forest to other landcover types.
In Figure 7, some abrupt, discontinuous transitions can be seen in the spatial consistency of the three products within France. Further analysis of the spatial distributions of the three products in this region showed significant confusion between cropland, forest, and grassland in these transitional areas. In particular, FROM_GLC30-2015 identified these transitional areas as grassland, while the other two products labeled them as cropland or forest. Figure 8 shows the spatial distribution of the proportion of consistent pixels for each pair of the compared GLC products, plotted on a 0.05° × 0.05° grid. As shown in Tables S7-S9

Accuracy of the Three GLC Products
The three adjusted confusion matrices corresponding to the GLC products were calculated using the LUCAS validation dataset (Tables S10-S12). Table 4 gives the accuracy measures (overall accuracy, user's accuracy, and producer's accuracy) with their 95% confidence intervals. The overall accuracy of GlobeLand30-2010 was the highest, at 88.90 ± 0.68%, followed by GLC_FCS30-2015 at 84.33 ± 0.80%. The overall accuracy of FROM_GLC30-2015 was the lowest, at 65.31 ± 1.0%. Among the land-cover types, the mapping accuracies for cropland, forest, water, and impervious surface were high in all three products. For GlobeLand30-2010 in particular, the accuracies for these types ranged from 84.88 ± 0.06% to 99.01 ± 0.01%. For FROM_GLC30-2015, the accuracies of cropland, water, and impervious surface were higher than those of all other types except forest, but they were clearly lower than the corresponding accuracies of GlobeLand30-2010 and GLC_FCS30-2015. All three products had lower classification accuracies for grassland, shrubland, and wetland. Specifically, the highest producer's accuracy for grassland among the products was only 70.95 ± 0.05% (GLC_FCS30-2015), and the highest user's accuracy was 59.05 ± 0.09% (GlobeLand30-2010). The classification errors for grassland arose mainly from confusion with cropland, forest, and shrubland and, in FROM_GLC30-2015, to a certain extent from confusion between grassland and water. For wetland, the classification accuracy was lower than for the other land-cover types.
The classification accuracy for wetland was very poor in FROM_GLC30-2015, which had a producer's accuracy of 0 ± 0.00% (only 60 of the 18,132 wetland points had labels matching those of the validation sample points) and a user's accuracy of 28.44 ± 11.35%. GlobeLand30-2010 had a high user's accuracy (91.88 ± 0.08%) but a relatively low producer's accuracy (35.81 ± 0.02%) for wetland, indicating that large areas of wetland were omitted by GlobeLand30-2010. Furthermore, the classification accuracies for bare land and permanent snow/ice varied greatly; for bare land in particular, they ranged from 8.81 ± 0.06% to 78.06 ± 0.12%.
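The adjusted confusion matrix re-weights the sample counts by the mapped area of each class, so that each class contributes to the accuracy estimates in proportion to its area on the map. The sketch below uses the standard stratified estimator for this adjustment, with rows as map classes and columns as reference classes: p_ij = W_i × n_ij / n_i, overall accuracy = sum of p_ii, user's accuracy = n_ii / n_i, producer's accuracy = p_jj / (column total of p), and 95% intervals taken as 1.96 standard errors. Whether the study used exactly these variance formulas is an assumption. The producer's-accuracy standard error, which needs a longer formula, is omitted, and every map class is assumed to have at least two samples.

```python
import math


def adjusted_accuracy(counts, map_weights, z=1.96):
    """Area-weighted (stratified) accuracy estimates from a sample-count matrix.

    counts[i][j]:   samples mapped as class i whose reference class is j.
    map_weights[i]: proportion of the mapped area labelled class i (sums to 1).
    Returns the adjusted matrix, overall/user's/producer's accuracy and
    z-scaled half-widths (z = 1.96 gives approximate 95% intervals).
    """
    k = len(counts)
    n_i = [sum(row) for row in counts]  # samples per map class (assumed >= 2)
    # Adjusted confusion matrix of estimated area proportions.
    p = [[map_weights[i] * counts[i][j] / n_i[i] for j in range(k)] for i in range(k)]
    oa = sum(p[i][i] for i in range(k))
    ua = [counts[i][i] / n_i[i] for i in range(k)]
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]
    pa = [p[j][j] / col[j] if col[j] > 0 else float("nan") for j in range(k)]
    # Variance of the overall accuracy under stratified random sampling by map class.
    var_oa = sum(map_weights[i] ** 2 * ua[i] * (1 - ua[i]) / (n_i[i] - 1) for i in range(k))
    ua_ci = [z * math.sqrt(ua[i] * (1 - ua[i]) / (n_i[i] - 1)) for i in range(k)]
    return {"p": p, "oa": oa, "oa_ci": z * math.sqrt(var_oa),
            "ua": ua, "ua_ci": ua_ci, "pa": pa}


# Toy two-class example (hypothetical numbers).
r = adjusted_accuracy([[45, 5], [10, 40]], [0.8, 0.2])
```

With these toy counts and map proportions, the adjusted overall accuracy is 0.72 + 0.16 = 0.88, whereas the unweighted share of correct samples is 85/100 = 0.85. This shows why a raw sample-count matrix can misstate accuracy when the sample is not allocated in proportion to class area.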

Reasons for the Low Level of Consistency between the Different Products
The low consistency between the three 30-m GLC products results from differences in their classification systems, data sources, classification algorithms, and reference periods. Differences in classification systems are one of the main sources of disagreement between classification results [18], and they also make rigorous comparison and the synergistic use of different maps challenging [12,14,18]. In the classification, comparison, and assessment of land-cover maps, the most important issue is the substantial difference between the identification criteria that different maps use for the same or similar land-cover types. For instance, the differences between the products were more obvious for bare land and permanent snow/ice because the classification systems define these types differently. For permanent snow/ice, the definitions do not clearly distinguish temporary from permanent cover, and the characteristics of snow and ice can vary greatly from year to year. As a result, FROM_GLC30-2015 and GlobeLand30-2010 classified this type poorly, and many pixels were misidentified as bare land or shrubland. These differences and uncertainties were already present when the classification systems were transformed into the common legend. Therefore, not all of the errors in the accuracy assessment based on the integrated land-cover types originate from the products themselves.
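Mechanically, transforming a product's legend into the common target classes is a lookup from source codes to target codes applied to every pixel. The sketch below shows that step with NumPy. The CROSSWALK codes are hypothetical placeholders and do not reproduce the actual legends of GlobeLand30, GLC_FCS30, or FROM_GLC30.

```python
import numpy as np

# Hypothetical crosswalk from detailed source codes to target-class codes.
# The codes are illustrative only, not any product's real legend.
CROSSWALK = {
    10: 1,   # detailed cropland subclass A -> cropland
    11: 1,   # detailed cropland subclass B -> cropland
    50: 2,   # detailed forest subclass -> forest
    200: 8,  # bare areas -> bare land
    220: 9,  # permanent snow/ice -> permanent snow/ice
}


def remap(raster, table, fill=0):
    """Translate a non-negative integer label raster through a lookup table.

    Codes absent from `table` become `fill` (treated here as 'unmapped').
    """
    raster = np.asarray(raster, dtype=np.int64)
    size = max(max(table), int(raster.max())) + 1
    lut = np.full(size, fill, dtype=np.int32)  # every code defaults to `fill`
    for src, dst in table.items():
        lut[src] = dst
    return lut[raster]  # fancy indexing applies the lookup to every pixel


harmonized = remap([[10, 11, 50], [200, 220, 999]], CROSSWALK)
```

A flat lookup table can send each source code to only one target. For this reason, a class with a loosely defined boundary, such as snow/ice of uncertain permanence, carries its ambiguity unchanged into the harmonized map.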
The different classification algorithms and remote sensing imagery also affect the consistency between products. Specifically, GlobeLand30-2010 was produced with a "pixel-object-knowledge" classification strategy, together with human expertise, based on individual Landsat TM/ETM images combined with HJ-1A/B images. FROM_GLC30-2015 and GLC_FCS30-2015 were generated by applying random forest classifiers to time series of Landsat-8 OLI images. The consistency analysis and accuracy assessment revealed that the mapping accuracy of the land-cover types, especially land-use-related types such as cropland, was lower for the products mapped from remote sensing imagery alone without human input (FROM_GLC30-2015 and GLC_FCS30-2015) than for the product that combined remote sensing imagery with human input (GlobeLand30-2010).

Uncertainty Due to Different Classification Systems
The accuracy assessments of individual products, mostly provided by the data producers, cannot be compared directly because they are based on different classification systems and validation datasets. Users therefore need a comparative accuracy assessment based on a common validation dataset to choose the most appropriate maps for their specific applications. To compare GLC products that use different classification systems, the first step is to harmonize their classification systems, and maps with more detailed land-cover types are often transformed into generalized systems with fewer types. This naturally leads to a loss of detail in the description of land-cover characteristics. Furthermore, some detailed land-cover types overlap semantically with several target types and cannot simply be merged into one class. An example is the sparse vegetation category in GLC_FCS30-2015. In this study, each of the three products was integrated into nine target classes; however, the sparse vegetation class had high semantic similarity with bare land (0.9014), shrubland (0.5863), and other integrated land-cover types. Whether such detailed classes are merged into integrated classes directly [19] or, as in this study, by adding rule constraints based on a priori information, differences and uncertainties are already introduced during the transformation. Because current land-cover products contain detailed land-cover information (for example, FROM_GLC30 contains 26 land-cover types [14] and GLC_FCS30 contains 30 land-cover types [16]), how to obtain more detailed reference datasets for these classes is an important question that still needs to be addressed.
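The merging problem can be made explicit by listing, for each detailed class, every target class whose semantic similarity reaches a threshold. A detailed class with a single candidate can be merged directly, whereas one with several candidates needs additional rules. The sketch uses the two sparse-vegetation similarities quoted above. The 0.5 threshold, the function names, and the toy forest subclass with its scores are hypothetical.

```python
def candidate_targets(similarity, threshold=0.5):
    """For each detailed class, list target classes with similarity >= threshold, best first.

    similarity: {detailed_class: {target_class: score in [0, 1]}} (hypothetical input).
    """
    out = {}
    for src, scores in similarity.items():
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        out[src] = [target for target, score in ranked if score >= threshold]
    return out


def split_direct_and_ambiguous(similarity, threshold=0.5):
    """Separate classes that map to one target from those with several candidates.

    Classes with no candidate above the threshold appear in neither result.
    """
    cands = candidate_targets(similarity, threshold)
    direct = {src: c[0] for src, c in cands.items() if len(c) == 1}
    ambiguous = {src: c for src, c in cands.items() if len(c) > 1}
    return direct, ambiguous


similarity = {
    "sparse vegetation": {"bare land": 0.9014, "shrubland": 0.5863},  # values quoted above
    "toy forest subclass": {"forest": 0.95, "shrubland": 0.30},       # hypothetical values
}
direct, ambiguous = split_direct_and_ambiguous(similarity)
```

Under these assumptions, the toy forest subclass merges directly into forest, while sparse vegetation is flagged with two candidates (bare land and shrubland), which is the situation that calls for a priori rule constraints.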

Issues Related to the Comparative Assessment of GLC Products for Specific Applications
In general, users with different application objectives have different requirements for GLC products. For example, data producers are more concerned with the thematic similarity between the land-cover classes of different GLC products, whereas agricultural users are concerned only with the cropland-related classes. Therefore, classification errors in different classes have different impacts on specific applications [21]. Take agriculture as an example: although GlobeLand30-2010 had the highest overall accuracy within the EU (88.90 ± 0.68%), its user's accuracy for cropland (88.79 ± 0.01%) was lower than that of GLC_FCS30-2015 (91.53 ± 0.01%). Moreover, GLC_FCS30-2015 has more cropland-related classes, so it may be more appropriate than GlobeLand30 for agricultural users who need to distinguish between cropland types. Therefore, in addition to assessing the overall accuracy of the maps, a thematic accuracy assessment of GLC products for specific applications is also important in helping different users choose the most appropriate maps for their applications.
With this in mind, more effort should be devoted to comparative accuracy assessments of GLC products for specific applications, in which product accuracies can be calculated using weights derived from class similarities [21,47].
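One simple form of similarity-weighted accuracy gives full credit on the diagonal and partial credit w_ij (between 0 and 1) when map class i is confused with a semantically similar reference class j, summed over the adjusted area proportions p_ij. The sketch below is a minimal illustration. The weight matrices and the three-class matrix are hypothetical, and the exact weighting schemes in [21,47] may differ.

```python
def weighted_accuracy(p, w):
    """Similarity-weighted overall accuracy: sum over i, j of w[i][j] * p[i][j].

    p: adjusted confusion matrix of area proportions (rows = map, cols = reference),
       summing to 1.
    w: class-similarity weights with w[i][i] = 1 and 0 <= w[i][j] <= 1
       (hypothetical, chosen per application).
    """
    k = len(p)
    return sum(w[i][j] * p[i][j] for i in range(k) for j in range(k))


# Hypothetical classes: 0 = cropland, 1 = forest, 2 = grassland.
p = [[0.50, 0.05, 0.05],
     [0.02, 0.20, 0.08],
     [0.03, 0.02, 0.05]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # ordinary overall accuracy
# Agricultural view: confusion between non-cropland classes is not penalized.
agri = [[1, 0, 0], [0, 1, 1], [0, 1, 1]]

oa = weighted_accuracy(p, identity)
oa_agri = weighted_accuracy(p, agri)
```

Setting w_ij = 1 for every pair of non-cropland classes makes confusion among them cost nothing. This is equivalent to the overall accuracy of a binary cropland/non-cropland map, which here rises from 0.75 to 0.85.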

Conclusions
To help users understand the performance of different 30-m GLC products, and to provide guidelines for further validation of their classification accuracy, a consistency analysis and accuracy assessment based on the LUCAS validation dataset were performed for three 30-m GLC products covering the EU. GlobeLand30-2010 had the highest overall accuracy within the EU (88.90 ± 0.68%), followed by GLC_FCS30-2015 (84.33 ± 0.80%) and FROM_GLC30-2015 (65.31 ± 1.0%). The consistency analysis showed that low-consistency pixels made up 16.17% of the total, whereas high-consistency pixels amounted to about 39.12%. The three GLC products were fairly consistent in identifying the dominant land-cover types, such as forest and cropland, whereas consistency was lower for bare land, grassland, shrubland, and wetland. Disagreement between the products occurred primarily in transitional zones with mixed land-cover types and in heterogeneous mountain regions.
Supplementary Materials: The following are available online at www.mdpi.com/2072-4292/12/21/3479/s1, Table S1: EAGLE matrix of the target legends, Table S2: EAGLE matrix of the GLC_FCS30 classification system, Table S3: Transformation of the code values into a fuzzy function, Table S4: Weights for each LC type of the target legends, Table S5: Weights for each LC type of the GLC_FCS30 classification system, Table S6: Similarity matrix between the GLC_FCS30 classification system and the target legends, Table S7: Percent area of agreement and disagreement by class between the GlobeLand30-2010 and GLC_FCS30-2015 products, Table S8: Percent area of agreement and disagreement by class between the GlobeLand30-2010 and FROM_GLC30-2015 products, Table S9: Percent area of agreement and disagreement by class between the FROM_GLC30-2015 and GLC_FCS30-2015 products, Table S10: Adjusted confusion matrix of the GlobeLand30-2010 product, Table S11: Adjusted confusion matrix of the GLC_FCS30-2015 product, Table S12.