Comparison of Global Land Cover Datasets for Cropland Monitoring

Accurate and reliable information on the spatial distribution of major crops is needed to detect possible production deficits, with the aim of preventing food security crises and supporting response planning. In this paper, we compared some of the most widely used global land cover datasets to examine their comparative advantages for cropland monitoring. Cropland class areas are compared for the following datasets: FAO-GLCshare (FAO Global Land Cover Network), Geowiki IIASA-Hybrid (hybrid global land cover map from the International Institute for Applied Systems Analysis), GLC2000 (Global Land Cover 2000), GLCNMO2008 (Global Land Cover by National Mapping Organizations), GlobCover, Globeland30, LC-CCI (Land Cover Climate Change Initiative) 2010 and 2015, and MODISLC (MODIS Land Cover product). The methodology involves: (1) highlighting discrepancies in the extent and spatial distribution of cropland, (2) comparing the areas with FAO agricultural statistics at the country level, and (3) assessing accuracy against freely available reference datasets. Recommendations for crop monitoring at the country level are based on a priority ranking derived from the results of analyses 2 and 3. Our results revealed that cropland information varies substantially among the analyzed land cover datasets. FAO-GLCshare and Globeland30 generally provided adequate results for monitoring cropland areas, whereas LC-CCI2010 and GLC2000 are less suitable, due to large overestimations in the former and outdated information and low accuracy in the latter. The recently released LC-CCI dataset (i.e., LC-CCI2015) shows a higher potential for cropland monitoring than the previous version (i.e., LC-CCI2010).


Introduction
Real-time monitoring of crop conditions is important for the early detection of adverse climatic conditions affecting production, particularly in food insecure countries, to support crisis prevention and response planning [1]. A significant number of global operational systems managed by national or international organizations have been developed, using different input sources and methodologies and serving different purposes [2]. These include, for example, the USAID Famine Early Warning Systems Network (FEWS NET), the Global Information and Early Warning System (GIEWS) of the U.N. Food and Agriculture Organization (FAO), the CropWatch system of the Chinese Academy of Sciences, and the European MARS Crop Yield Forecasting System (MCYFS) [3]. More recently, following a need for more consolidated messages, the Global Agricultural Monitoring initiative of the Group on Earth Observations (GEOGLAM) started producing bulletins on crop conditions for both major crop producers and food insecure countries, based on products and analyses from many agencies.
The bulletins from both the individual agencies and the GEOGLAM initiative (called Crop Monitor for Early Warning, CM4EW) ultimately inform food security assessments, such as those carried out under the IPC (Integrated Food Security Phase Classification) or the Global Report on Food Crises [4].
Crop early warning systems mainly rely on vegetation condition indicators derived from satellite data and agro-meteorological models, which have the advantage of covering large spatial extents at high temporal resolution [5]. These data are used in combination with land cover data in order to focus on cropland areas.
Since the first land cover map was derived from 1 km Advanced Very High Resolution Radiometer (AVHRR) data (International Geosphere-Biosphere Programme DISCover) [6], many other maps have been produced, such as the 1 km Global Land Cover 2000 [7] and the 500 m MODIS Land Cover dataset [8]. The availability of medium resolution sensors (e.g., MERIS) has contributed to the emergence of datasets with higher spatial resolution, such as GlobCover [9], the European Space Agency Climate Change Initiative Land Cover products [10], and the Global Land Cover by National Mapping Organizations [11], with spatial resolutions as fine as 300 m. Nevertheless, the still relatively coarse spatial resolution of all these datasets (300 m to 1 km) is insufficient for fine-scale studies [12]. The recently developed GlobeLand30 represents a major change in this respect, as it provides global land cover based on the Landsat archive at 30 m [13].
In addition to datasets based on single satellite sensors, new approaches are being developed to derive land cover products from multiple sources. Several initiatives develop a hybrid or synergy map by reconciling the best characteristics of several existing global datasets and integrating them into a single one [14], or produce a single-category product such as the global cropland map from the International Institute for Applied Systems Analysis and the International Food Policy Research Institute (IIASA-IFPRI) [15]. Other approaches aggregate regional and national land cover products into a single global map, such as FAO's GLCshare [16] or the Joint Research Centre (JRC) crop mask [17].
Comparative analyses of the most commonly used datasets show large disagreements and uncertainties [18]. For example, differences of around 60% are found in Africa when comparing several global land cover datasets [19,20]. Globally, the cropland area from GlobCover is reported to be 20% higher than that derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) [21]. In particular, classification uncertainties for agricultural land cover tend to be larger than for other classes, mainly in areas with low cropping density, where crops have a seasonal development similar to that of natural vegetation (e.g., savannah), or where the landscape is highly fragmented, as is the case in Africa [22]. As a result of such uncertainty and inconsistency [23,24], users of land cover maps frequently find it difficult to select the best map for their specific application. Moreover, with the growing number of global datasets, it remains challenging to establish unambiguously which product is the most suitable for monitoring cropland.
Global crop production monitoring systems need reliable input information; in particular, total cropland extent and location are essential, as they directly affect anomaly warnings that are triggered when a given percentage of cropland is affected, as in FAO's Agricultural Stress Index System (ASIS) or the JRC's Anomaly hot Spots of Agricultural Production (ASAP) [25,26]. In this framework, the overarching goal of this study is to evaluate, at the national level, the most recent and widely used global land cover datasets in order to identify their advantages and disadvantages for use in agricultural early warning. The specific objectives of the study are: (1) to highlight the discrepancies among datasets, (2) to assess their level of agreement with FAO agricultural statistics, (3) to assess the accuracy of each dataset against publicly available reference datasets, and (4) to provide a country-level assessment of the advantages and possible limitations for crop monitoring according to points 2 and 3. Additionally, the suitability of each dataset for monitoring agriculture is assessed based on the parcel size derived from the IIASA dataset [15].
Although the proposed approach can be applied at global scale, in this study we focus on countries with a high risk of food insecurity, where crop monitoring is important for early warning. These include the African continent as well as a selection of countries in Central America, the Caribbean, and Central and Southeast Asia [1].

Global Land Cover Datasets
This paper focuses on the global land cover datasets explained in detail below and summarized in Table 1.
FAO-GLCshare, produced by FAO in 2014 at 1 km resolution, integrates the best available recent national, regional, and global datasets through a multi-stage process that assigns priority ranks to each dataset and each class. The product has a legend of 11 classes and an accuracy of 80% [16].
The GeoWiki Hybrid product, produced for the baseline year 2005, is a hybrid map at 300 m spatial resolution based on geographically weighted regression (GWR) and crowdsourced validation data collected through Geo-Wiki. Two hybrid datasets were derived using GLC2000, MODISLC, and GlobCover as inputs; hybrid 1 and hybrid 2 differ in the way the GWR is applied. In this study, the hybrid 1 dataset, with an overall accuracy of 87.9%, is used [27]. The dataset has a legend of 10 land cover classes that can be compared with the Land Cover Classification System (LCCS) [28].
GLC2000 was developed by the European Commission's JRC for the reference year 2000 at 1 km spatial resolution [7]. The product was classified by unsupervised clustering based on the spectral response and temporal profile of SPOT-VEGETATION daily data collected from November 1999 to December 2000. The GLC2000 legend is described through FAO's LCCS and encompasses 22 classes. The overall accuracy of the product is 68.6% [29].
GLCNMO2008 was generated by the Global Mapping Project organized by the International Steering Committee for Global Mapping (ISCGM). The Version II dataset has a spatial resolution of 500 m and is based on 16-day MODIS Bidirectional Reflectance Distribution Function (BRDF)-Adjusted Reflectance (MCD43A4) data for 2008, with an overall accuracy of 77.9% [11]. The legend has 20 classes defined under the LCCS. The method combines supervised classification for fourteen of the twenty classes with separate, independent classifications for the remaining six classes [30].
The first version of GlobCover was developed in 2005 by the European Space Agency (ESA) in cooperation with an international network of partners, including the European Environment Agency (EEA), the United Nations Environment Programme (UNEP), Global Observation of Forest and Land Cover Dynamics (GOFC-GOLD), the Joint Research Centre (JRC), and the IGBP [31]. In this paper, we evaluate GlobCover 2009. This product used as input cloud-free bi-monthly and annual mosaics derived from the MERIS Full Resolution (300 m) instrument onboard ENVISAT, acquired from January to December 2009. Per-pixel supervised classification was used to identify urban and wetland classes, and unsupervised classification was used for the remaining classes to create spectrally and temporally similar clusters that were subsequently labeled. The global land cover map comprises 22 land cover classes defined with the LCCS and has an overall accuracy of 67.5% [32].
GlobeLand30, produced by the National Geomatics Centre of China (NGCC), was developed using a pixel-object-knowledge (POK) based approach to derive a global land cover map at 30 m resolution for 2010 from multispectral images, including Landsat Thematic Mapper (TM) and Enhanced Thematic Mapper Plus (ETM+) images and multispectral images from the Chinese Environmental Disaster Alleviation Satellite (HJ-1). GlobeLand30 achieves an overall accuracy of 80.2%, and its classification system includes ten land cover types [33].
ESA's Climate Change Initiative Land Cover (LC-CCI) project delivered three consecutive global land cover maps for three five-year epochs centered on 2000, 2005, and 2010, at a spatial resolution of 300 m. The classification combines supervised and unsupervised classification of MERIS Full Resolution (FR) time series, pre-processed into seven-day composites, to generate a baseline land cover map (2003 to 2012). Backdating and updating techniques based on the SPOT-VGT time series were then used to obtain the three epochs. In this study, we use the 2010 epoch, which is representative of the 2008 to 2012 period, with an LCCS-based legend of 22 classes. Its overall accuracy is 74.4%, based on the same reference sample used for the GlobCover 2009 validation [34]. More recently, ESA extended its land cover products with a series of 24 annual maps (1992 to 2015). These maps are derived from a single MERIS baseline (2003 to 2012), and backdating and updating techniques are applied based on 300 m PROBA-V (2013 to 2015), 1 km SPOT-VGT (1999 to 2012), and 1 km AVHRR (1992 to 1999) time series. In this study, the 2015 land cover map (LC-CCI 2015) at 300 m with a 22-class legend is evaluated [35].
MODISLC was developed by Boston University and is coordinated by the MODIS Land Team of the National Aeronautics and Space Administration (NASA) [8]; the first version became available in 2001. In this study, the MODIS Collection 5 Land Cover Type product (MOD12Q1 V051) for 2010 at 500 m spatial resolution is evaluated. MODIS land cover was produced using an ensemble supervised classification algorithm, based on a multi-temporal decision tree combined with boosting, using spectral and temporal information from MODIS bands 1 to 7 supplemented by the enhanced vegetation index and land surface temperature [36]. The product provides five classification schemes, of which the 17-class International Geosphere-Biosphere Programme (IGBP) scheme [6] is used in this study. Its overall accuracy is 71.6% [36].

Reference Global Land Cover Datasets
One of the main obstacles to carrying out a validation is the lack of a common reference dataset [24], especially at continental or global scale [37]. Nevertheless, the development of new datasets in recent years is increasing the accessibility of validation data. This research benefits from freely available, publicly shared datasets that were merged into a single reference dataset, described below and in Table 2. The resulting dataset consists of 7333 reference sample units, of which 21.7% correspond to cropland: 5058 units in Africa (20.8% cropland), 1688 in Asia (26% cropland), and 587 in America (16.7% cropland).
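The continental counts add up to the 7333 reference units, and the overall 21.7% is consistent with the sample-weighted mean of the continental cropland shares. A quick arithmetic check in Python, using only the figures quoted above (the dictionary and function names are illustrative):

```python
# Reference sample counts per continent and the share of each continent's
# samples labelled cropland, as reported in the text.
SAMPLES = {
    "Africa": (5058, 0.208),
    "Asia": (1688, 0.260),
    "America": (587, 0.167),
}


def overall_cropland_share(per_continent):
    """Return (total samples, sample-weighted share of cropland samples)."""
    total = sum(n for n, _ in per_continent.values())
    cropland = sum(n * share for n, share in per_continent.values())
    return total, cropland / total


total, share = overall_cropland_share(SAMPLES)
print(total, round(100 * share, 1))  # 7333 21.7
```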
Global Observation of Forest and Land Cover Dynamics (GOFC-GOLD) has been building a data-access portal that provides various validation datasets [38]. These include the GLC2000con points, the consolidated version of the original GLC2000 reference dataset, reinterpreted with Google Earth into 11 classes [7], and GlobCover2005con, the version of the original ESA GlobCover 2005 reference dataset reinterpreted using Google Earth [28]. Each GlobCover2005con sample unit is a 5 × 5 MERIS pixel window interpreted according to 22 land cover classes. A subset of the System for Terrestrial Ecosystem Parameterization (STEP) database is also provided in the GOFC-GOLD portal. The STEP database, developed by Boston University, is used to calibrate the MODIS land cover products [8]. Each site is a polygon (~4 km²) classified according to the IGBP legend. Boston University also created a validation dataset for the Visible Infrared Imaging Radiometer Suite (VIIRS) to meet the validation requirements of the VIIRS Surface Type (ST) product. Each VIIRS pixel (~1 km²) is interpreted according to the 17-class IGBP legend [39], and the VIIRS validation data consist of 5 × 5 pixel blocks. Lastly, the GLCNMO2008 training datasets, developed by ISCGM to produce the GLCNMO map for 2008, are also included; they consist of polygons classified into 20 classes [11].
In addition to the GOFC-GOLD datasets, we used data from the FROM-GLC (Finer Resolution Observation and Monitoring of Global Land Cover) project [40], which consist of point training samples. These points were interpreted and cross-checked based on Landsat TM and Google Earth data, and then rechecked twice by experienced image interpreters using Enhanced Vegetation Index time series for 2010 as auxiliary data. The classification scheme of this validation dataset includes 11 land cover types.
We also used crowdsourced datasets to enlarge the number of validation samples; such datasets have recently proved valuable in accuracy assessment tasks [41,42]. Hence, data from IIASA's Geo-Wiki project, in collaboration with the University of Applied Sciences in Wiener Neustadt and the University of Freiburg, are also included in this analysis. First, we used data validated by IIASA experts with high-resolution imagery through the Geo-Wiki crowdsourcing land cover validation tool [41]. These reference data consist of 1 km pixels located at each latitude/longitude intersection point across Africa, with a legend that includes the percentage of cropland [17]. Crowdsourced data from the second Geo-Wiki competition, carried out from January to March 2012, were also used. Interpreters were presented with 1 km pixels overlaid on Google Earth and asked to assign each pixel up to three classes from a ten-class legend based on LCCS/IGBP [43].

Methods
The land cover datasets were transferred into a common geographic framework to make them directly comparable. The datasets were re-projected to World Geodetic System 1984 (WGS 84) geographic latitude/longitude coordinates and resampled with the nearest neighbor method to a spatial resolution of about 250 m (0.00225 degrees). This resolution is a compromise suited to monitoring crops at the national or regional level with medium resolution time series such as MODIS.
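The 0.00225 degree grid step corresponds to roughly 250 m only near the equator; the east-west size of a cell shrinks with the cosine of latitude. The sketch below assumes a spherical Earth with a mean radius of about 6371 km (an approximation, not the WGS 84 ellipsoid) and also illustrates why nearest neighbor resampling suits categorical maps: each output cell copies a class code rather than averaging codes. The function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation (assumption)


def cell_size_m(step_deg, lat_deg):
    """Approximate (north-south, east-west) ground size in metres of a square degree cell."""
    ns = math.radians(step_deg) * EARTH_RADIUS_M
    ew = ns * math.cos(math.radians(lat_deg))
    return ns, ew


def nearest_neighbor_resample(grid, src_step, dst_step):
    """Resample a categorical grid (list of rows) onto a grid with cell size dst_step.

    Each output cell takes the class of the source cell that contains its centre,
    so class codes are copied, never averaged.
    """
    rows, cols = len(grid), len(grid[0])
    out_rows = round(rows * src_step / dst_step)
    out_cols = round(cols * src_step / dst_step)
    out = []
    for i in range(out_rows):
        si = min(int((i + 0.5) * dst_step / src_step), rows - 1)
        out.append([
            grid[si][min(int((j + 0.5) * dst_step / src_step), cols - 1)]
            for j in range(out_cols)
        ])
    return out


ns, ew = cell_size_m(0.00225, 0.0)
print(round(ns), round(ew))  # 250 250
print(nearest_neighbor_resample([[1, 2], [3, 4]], src_step=2, dst_step=1))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```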

Legend Harmonization
One of the main challenges in land cover comparison is legend reconciliation or harmonization, as each dataset defines cropland in a different manner and according to different descriptors. The number of classes referring to cropland and their definitions are highly heterogeneous and differ from one dataset to another. The classification schemes vary in the number of categories, density of cover, land use categories (e.g., pure agriculture, agriculture/forestry), life forms (e.g., herbaceous, trees, shrubs, or mixed forms), and the artificiality of cover (e.g., natural or semi-natural versus managed areas) [28] (see Table 3). Legend translation raises challenging issues, especially when the datasets are not specifically designed for agricultural purposes: some legends define only 'pure' land cover classes (e.g., Globeland30), while others include mosaic classes that combine two or more classes, e.g., GlobCover class 20 includes mosaic cropland (50% to 70%) and vegetation (20% to 50%). Furthermore, GlobCover and LC-CCI distinguish between irrigated and rainfed crops, in contrast with MODIS IGBP or Globeland30. The maps also differ in the life forms included in cropland: GLCNMO2008 only considers the herbaceous life form, whilst FAO-GLCshare includes tree life forms.
Translation becomes even more problematic given the lack of consensus on the definition of cropland. For example, according to the Joint Experiment for Crop Assessment and Monitoring (JECAM) definition [44], perennial crops and fallows are excluded from cropland, while FAO includes permanent crops [45]. In this study, we adopted IFPRI's definition, which considers cropland to be the sum of arable land and permanent crops. This inclusive definition facilitates comparison by allowing all possible cropland classes to be allocated regardless of their life forms (e.g., tree forms), production systems (i.e., both rainfed and irrigated), and density of cover.
The datasets were harmonized to a target legend of two classes (cropland and non-cropland), following the approach proposed by Herold et al. [23] and the LCCS-based legend translation protocols [46], and using a long list of references as benchmarks (e.g., [19,23]). Most of the analyzed datasets (i.e., FAO-GLCshare, Geowiki, GLC2000, GLCNMO2008, GlobCover, and LC-CCI) are based on LCCS [32], while MODISLC is based on the IGBP scheme and Globeland30 adopts a ten-class classification scheme defined by [31].
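In practice, harmonization to the two-class legend amounts to a lookup from each dataset's original class codes to cropland or non-cropland. The sketch below uses a partial set of GlobCover 2009 codes for illustration only; treating mosaic class 30 (vegetation 50% to 70%, cropland 20% to 50%) as cropland is an assumption made in line with the inclusive definition adopted above, and the translation actually used is the one in Table 3. The names are hypothetical.

```python
CROPLAND, NON_CROPLAND = 1, 0

# Illustrative (assumed) subset of GlobCover 2009 class codes mapped to cropland.
# Any code not listed is treated as non-cropland.
GLOBCOVER_CROPLAND_CODES = {
    11,  # post-flooding or irrigated croplands
    14,  # rainfed croplands
    20,  # mosaic cropland (50-70%) / vegetation (20-50%)
    30,  # mosaic vegetation (50-70%) / cropland (20-50%)
}


def to_two_class(grid, cropland_codes):
    """Translate a grid of original legend codes into the cropland/non-cropland legend."""
    return [[CROPLAND if code in cropland_codes else NON_CROPLAND for code in row]
            for row in grid]


original = [[14, 40, 20],
            [210, 11, 30]]
print(to_two_class(original, GLOBCOVER_CROPLAND_CODES))  # [[1, 0, 1], [0, 1, 1]]
```

The same function works for any legend once its own set of cropland codes is supplied, which keeps the per-dataset decisions in one table.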

Comparison of Datasets
The first step was to compare the datasets with each other in terms of both the amount and the location of discrepancies. The amount refers to the cropland area, computed from the number of cropland pixels in each dataset. Location refers to the spatial distribution of the discrepancies and was assessed by producing an agreement map from all the land cover datasets. This agreement map, produced by Boolean (crisp) comparison [23], reveals areas of uncertainty in global maps [47] and identifies patterns of agreement. To minimize geolocation errors, agreement among datasets was computed in a 3 × 3 window instead of on a pixel-by-pixel basis [18]. Furthermore, a pair-wise agreement analysis was performed to assess the discrepancies and similarities between pairs of land cover datasets.
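A minimal sketch of the agreement map and the pair-wise agreement, assuming each dataset has already been translated to a binary cropland mask (1 = cropland) on the common grid. The text does not specify how the 3 × 3 window is applied; here, as an assumption, a dataset counts as agreeing at a pixel if it maps cropland anywhere in that pixel's 3 × 3 neighborhood. Pure Python is used for clarity, and all names are hypothetical.

```python
def dilate3x3(mask):
    """1 where any cell of the 3x3 neighbourhood (clipped at the edges) is 1, else 0."""
    rows, cols = len(mask), len(mask[0])
    return [
        [
            max(
                mask[r][c]
                for r in range(max(i - 1, 0), min(i + 2, rows))
                for c in range(max(j - 1, 0), min(j + 2, cols))
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def agreement_map(masks, use_window=True):
    """Per-pixel count of datasets mapping cropland (0 to len(masks))."""
    layers = [dilate3x3(m) if use_window else m for m in masks]
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[sum(layer[i][j] for layer in layers) for j in range(cols)]
            for i in range(rows)]


def pairwise_agreement(a, b):
    """Fraction of pixels on which two binary masks assign the same class."""
    pairs = [(x, y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(x == y for x, y in pairs) / len(pairs)


m1 = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
m2 = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]
m3 = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
print(agreement_map([m1, m2, m3], use_window=False))  # [[1, 1, 0], [0, 0, 0], [0, 0, 1]]
print(agreement_map([m1, m2, m3]))  # [[2, 2, 1], [2, 3, 2], [0, 1, 1]]
print(pairwise_agreement(m1, m3))
```

The windowed version tolerates a one-pixel shift between datasets: m1 and m3 disagree pixel by pixel but both count at their shared neighbors.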

Assessment with FAO Statistics
The consistency of the land cover datasets was evaluated against statistics at the country level. For this purpose, FAOSTAT (http://faostat3.fao.org/home/) was used, as it provides statistics, including agricultural time series, for over 200 countries. FAO statistics are based on national censuses, agricultural sample surveys, questionnaire-based surveys of major agricultural producers, and independent evaluations [45]. Although statistics are recognized to be inadequate in some countries, FAOSTAT is still the most comprehensive agricultural database, particularly in countries where statistics are difficult to obtain [17]. The total cropland area from FAOSTAT was computed as the sum of arable land, defined as "land under temporary crops (multiple-cropped areas are counted only as one), temporary meadows for mowing or pasture, land under market and kitchen gardens and land temporarily fallow (less than five years). The abandoned land resulting from shifting cultivation is not included in this category", and permanent crops, defined as "land cultivated with crops that occupy the land for long periods and need not be replanted after each harvest, such as cocoa, coffee and rubber; this category includes land under flowering shrubs, fruit trees, nut trees and vines, but excludes land under trees grown for wood or timber" [45]. To improve reliability and reduce the effect of differences in reference years, per-country FAO averages for 2006 to 2014 were computed.
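The FAOSTAT reference value per country is thus the mean, over 2006 to 2014, of arable land plus permanent crops. A sketch with hypothetical function and argument names; the text does not say how missing years were handled, so skipping them is an assumption, and the example numbers are toy values, not FAOSTAT data.

```python
def fao_cropland_average(arable, permanent, first=2006, last=2014):
    """Mean yearly cropland (arable land + permanent crops) over first..last inclusive.

    arable, permanent: {year: area}. Years absent from either series are skipped
    (an assumption; the text does not say how gaps were treated).
    """
    totals = [
        arable[y] + permanent[y]
        for y in range(first, last + 1)
        if y in arable and y in permanent
    ]
    if not totals:
        raise ValueError("no year in the period has both series")
    return sum(totals) / len(totals)


# Toy series in thousand hectares (illustrative only).
arable = {y: 100.0 + (y - 2006) for y in range(2005, 2015)}
permanent = {y: 20.0 for y in range(2006, 2015)}
print(fao_cropland_average(arable, permanent))  # 124.0
```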
To compute the cropland area of each dataset, the number of cropland pixels per country was multiplied by the area of the original pixel (i.e., before reprojection and resampling), to minimize errors introduced by those processes. To reduce bias due to the presence of mixed classes, a weight representing the percentage cover of each class (see Table 3) was defined from the original legend descriptions and from previous studies used as benchmarks [20]. Three weights were evaluated: the minimum fraction, the maximum fraction, and a weight of 100%, which treats cropland classes as 'hard' (pure) categories. For Globeland30, only the 'hard' weight is evaluated, whereas for FAO-GLCshare each pixel is multiplied by the cropland fraction layer.
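The three weighting options can be expressed as a per-class weight applied to pixel counts. In the sketch below, the minimum and maximum fractions for GlobCover mosaic classes 20 and 30 are read from the legend wording, and the 0.09 km² nominal area of a 300 m pixel is a simplification; the weights actually used in the study are those in Table 3. Names and counts are illustrative.

```python
# (minimum, maximum) cropland fraction per cropland class; assumed from the
# GlobCover legend wording. Classes not listed are non-cropland.
WEIGHTS = {
    11: (1.0, 1.0),  # post-flooding or irrigated croplands
    14: (1.0, 1.0),  # rainfed croplands
    20: (0.5, 0.7),  # mosaic cropland (50-70%) / vegetation (20-50%)
    30: (0.2, 0.5),  # mosaic vegetation (50-70%) / cropland (20-50%)
}


def cropland_area(pixel_counts, pixel_area_km2, weights, mode):
    """Cropland area (km2) of one country from per-class pixel counts.

    mode: "min" or "max" uses that cover fraction for each class;
    "hard" counts every cropland class at 100%.
    """
    if mode not in ("min", "max", "hard"):
        raise ValueError(f"unknown mode: {mode}")
    total = 0.0
    for code, count in pixel_counts.items():
        if code not in weights:
            continue  # non-cropland class
        w_min, w_max = weights[code]
        w = {"min": w_min, "max": w_max, "hard": 1.0}[mode]
        total += count * w * pixel_area_km2
    return total


counts = {14: 1000, 20: 500, 30: 200, 40: 3000}  # toy pixel counts for one country
for mode in ("min", "max", "hard"):
    print(mode, round(cropland_area(counts, 0.09, WEIGHTS, mode), 1))
```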
The root mean square error (RMSE) between the country cropland statistics and the cropland areas of the land cover datasets was computed following Fritz et al. [20] to identify the dataset closest to the FAO statistics. Moreover, the country cropland statistics were regressed against the cropland areas of the datasets using ordinary least squares (OLS) regression (p-value < 0.1). Finally, the absolute difference from the FAO statistics at the country level (called Error FAO) was calculated for each dataset n, applying each of the three weights to the mosaic classes of the land cover datasets:
$$\text{Error FAO} = \left|\text{Area}_{\text{FAO}} - \text{Area}_{n}\right|$$
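The three comparison measures are straightforward to compute once the FAO and dataset areas are paired by country. A standard-library sketch follows; the regression p-value is not computed here (it would need a t distribution, for example from SciPy), the function names are hypothetical, and the example values are toy numbers.

```python
import math


def rmse(reference, estimate):
    """Root mean square error between paired country values."""
    n = len(reference)
    return math.sqrt(sum((r - e) ** 2 for r, e in zip(reference, estimate)) / n)


def ols(x, y):
    """Ordinary least squares fit y = a + b * x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


def error_fao(area_fao, area_dataset):
    """Absolute difference |Area_FAO - Area_n| for one country."""
    return abs(area_fao - area_dataset)


# Toy areas for four countries (illustrative values, not study data).
fao = [10.0, 20.0, 30.0, 40.0]
dataset = [12.0, 18.0, 33.0, 37.0]
print(rmse(fao, dataset))
# FAO statistics (y) regressed against dataset areas (x), as described in the text.
print(ols(dataset, fao))
print([error_fao(f, d) for f, d in zip(fao, dataset)])
```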

Accuracy Assessment
An accuracy assessment was performed to verify the accuracy of the cropland class in the different global land cover datasets, as accuracy represents the degree of 'correctness' of a land cover dataset [48] and provides a quantitative description of its quality [49]. Although most of the datasets analyzed in this paper are accompanied by their own accuracy assessment (see Table 3), these are not comparable, as they were performed with different methodologies and validation datasets. Hence, a common accuracy assessment was carried out at the continental level and at the country level for countries where at least ten samples were available [17].
The reference datasets have different origins, so they first had to be harmonized and cleaned. To this end, the legends of the reference datasets were translated into the common two-class legend (cropland and non-cropland). Where the original legend contained the percentage of cropland, 50% was used as the threshold for considering the sample to be cropland [17]. To handle differences in sample support (e.g., point, polygon), the land cover maps were related to the centroid of each reference unit [42]. Once each data source had been prepared, the sources were merged to build the final reference dataset.
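A sketch of this harmonization step, with hypothetical record fields (`centroid`, `cropland_pct`, `code`): percentage-based samples are thresholded at 50% (counting exactly 50% as cropland is an assumption), class-based samples are looked up in a set of cropland codes, and every sample is located at its unit's centroid before the sources are merged. The IGBP codes 12 (croplands) and 14 (cropland/natural vegetation mosaic) illustrate a class-based source.

```python
def reference_label(record, cropland_codes, threshold=50.0):
    """Two-class label (1 = cropland, 0 = non-cropland) for one reference sample."""
    if "cropland_pct" in record:  # legend gives a cropland percentage
        return 1 if record["cropland_pct"] >= threshold else 0
    return 1 if record["code"] in cropland_codes else 0  # legend gives a class code


def merge_sources(sources):
    """Merge harmonized samples from several sources into (lon, lat, label) tuples.

    sources: list of (records, cropland_codes); each record carries the centroid
    of its reference unit, whatever its original support (point, pixel, polygon).
    """
    merged = []
    for records, cropland_codes in sources:
        for rec in records:
            lon, lat = rec["centroid"]
            merged.append((lon, lat, reference_label(rec, cropland_codes)))
    return merged


geowiki_like = [{"centroid": (10.0, 5.0), "cropland_pct": 60},
                {"centroid": (11.0, 5.0), "cropland_pct": 40}]
igbp_like = [{"centroid": (20.0, 1.0), "code": 12},
             {"centroid": (21.0, 1.0), "code": 10}]
print(merge_sources([(geowiki_like, set()), (igbp_like, {12, 14})]))
# [(10.0, 5.0, 1), (11.0, 5.0, 0), (20.0, 1.0, 1), (21.0, 1.0, 0)]
```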
The accuracy assessment was carried out by means of the confusion (error) matrix, a cross-tabulation of the land cover data against the reference data for the sample sites [39]. Several classical accuracy measures were derived from the confusion matrix: the overall accuracy, i.e., the percentage of correctly classified samples; the user's accuracy (UA), i.e., the percentage of sites mapped as a given class that actually belong to that class in the reference data (the complement of the commission error); and the producer's accuracy (PA), i.e., the percentage of reference sites of a given class that are correctly mapped (the complement of the omission error). In addition, the F-score, the harmonic mean of PA and UA, which expresses the balance between them, was derived [50]. To minimize errors, the analysis was only carried out in areas classified as cropland by at least one of the analyzed datasets.
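For the two-class legend, these measures reduce to counts in the four cells of a 2 × 2 confusion matrix. The sketch below computes them for the cropland class from paired map and reference labels (1 = cropland); the function name and the zero-division handling are illustrative choices.

```python
def cropland_accuracy(map_labels, ref_labels):
    """Overall accuracy, UA, PA and F-score for the cropland class (labels: 1 = cropland)."""
    pairs = list(zip(map_labels, ref_labels))
    tp = sum(1 for m, r in pairs if m == 1 and r == 1)  # cropland in map and reference
    fp = sum(1 for m, r in pairs if m == 1 and r == 0)  # commission
    fn = sum(1 for m, r in pairs if m == 0 and r == 1)  # omission
    tn = sum(1 for m, r in pairs if m == 0 and r == 0)
    overall = (tp + tn) / len(pairs)
    ua = tp / (tp + fp) if tp + fp else 0.0  # 1 - commission error
    pa = tp / (tp + fn) if tp + fn else 0.0  # 1 - omission error
    f_score = 2 * ua * pa / (ua + pa) if ua + pa else 0.0
    return {"overall": overall, "ua": ua, "pa": pa, "f_score": f_score}


print(cropland_accuracy([1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 1, 0]))
```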

Recommendations at Country Level
We established a priority index to rank the nine land cover datasets at the country level, as an objective tool to assess which dataset is the most appropriate, and hence recommended, for cropland monitoring; the least recommended dataset was also identified. For each dataset n (one of the nine analyzed datasets), the index combines three criteria: the overall accuracy O, the F-score F, and the absolute FAO error FAO, computed as |Area FAO − Area n|. Each criterion ranges between 0 and 1; as a result, the priority index ranges from 0 (least recommended) to 3 (most recommended).
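The exact form of the index is not reproduced here, so the following is only one plausible reading, stated as an assumption: the index is the sum O + F + E, where E rescales the absolute FAO error to the 0 to 1 range as 1 − error / (largest error among the datasets for that country), so that smaller errors score higher. Under this assumption, the ranking and the most and least recommended datasets follow directly. The dataset names in the example are placeholders.

```python
def priority_index(stats):
    """Score and rank datasets for one country.

    stats: {dataset: (overall_accuracy, f_score, abs_fao_error)}, accuracies in 0..1.
    ASSUMPTION: the FAO error is rescaled as 1 - error / max_error so that a
    smaller error gives a larger score; every criterion then lies in 0..1 and
    the index in 0..3.
    """
    max_err = max(err for _, _, err in stats.values()) or 1.0  # avoid dividing by 0
    scores = {
        name: overall + f_score + (1.0 - err / max_err)
        for name, (overall, f_score, err) in stats.items()
    }
    ranking = sorted(scores, key=scores.get, reverse=True)
    return scores, ranking


stats = {"dataset_A": (0.80, 0.70, 10.0),
         "dataset_B": (0.60, 0.50, 40.0),
         "dataset_C": (0.90, 0.60, 20.0)}
scores, ranking = priority_index(stats)
print("most recommended:", ranking[0], "least recommended:", ranking[-1])
```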

Suitability of Datasets to Monitor Agricultural Land Based on Parcel Size
In land cover accuracy assessments, spatial heterogeneity and landscape fragmentation play a key role in driving discrepancies [51]. An important question is therefore whether the available land cover datasets are suitable as reference data for crop monitoring given the locally prevailing field sizes. The suitability of each dataset for monitoring cropland was assessed at the country level by relating the country's typical field size, derived from an IIASA dataset [15], to the spatial resolution required to monitor such fields appropriately. The IIASA dataset has a resolution of 1 km and was derived by inverse distance interpolation of field size data collected, using high-resolution imagery, through a Geo-Wiki crowdsourcing campaign. It is divided into four field size classes: large, medium, small, and very small.
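Inverse distance weighting estimates a value at an unsampled location as a weighted mean of the observations, with weights 1/d^p. The sketch below assumes that field size is coded as a numeric score and uses p = 2, a common default; neither the power nor the search neighborhood used for the IIASA product is given in the text, and planar distances are used for simplicity. The function name is hypothetical.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    samples: list of (x, y, value) observations, e.g., crowdsourced field size scores.
    """
    num = den = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # the target coincides with an observation
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


obs = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0)]  # toy observations
print(idw(obs, 1.0, 0.0))  # 2.0
print(idw(obs, 0.5, 0.0))
```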
A histogram of the IIASA parcel size data was computed for each country. The parcel size of each country was then defined similarly to Waldner et al. [52], by selecting a threshold of 75% from the histogram, so that 75% of the country's fields, starting from the largest, can be monitored. This field size was then related to the GEOGLAM spatial resolution requirements (Table 4) for two products: the ASAP cropland mask, i.e., a generalized map of cultivated areas, and crop condition monitoring, understood as the growing condition of croplands derived from coarse resolution data [53].
Table 4. Spatial resolution requirements for satellite-based Earth observation data developed by the CEOS (Committee on Earth Observation Satellites) Ad Hoc Team for GEOGLAM and the observed field size for crop mask and for derived crop condition indicators [53].
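One reading of the 75% rule, stated as an assumption: order the field size classes from largest to smallest, accumulate each class's share of the country's cropland pixels, and take the first class at which the cumulative share reaches 75%. The class names follow the four IIASA classes; the function name and the counts are illustrative.

```python
FIELD_SIZE_CLASSES = ["large", "medium", "small", "very small"]  # largest first


def country_field_size(histogram, coverage=0.75):
    """Field size class reached when `coverage` of cropland pixels is accumulated
    from the largest fields downwards.

    histogram: {field size class: number of cropland pixels} for one country.
    """
    total = sum(histogram.get(c, 0) for c in FIELD_SIZE_CLASSES)
    if total == 0:
        raise ValueError("empty histogram")
    cumulative = 0
    for c in FIELD_SIZE_CLASSES:
        cumulative += histogram.get(c, 0)
        if cumulative / total >= coverage:
            return c
    return FIELD_SIZE_CLASSES[-1]  # not reached: the final share is always 1.0


print(country_field_size({"large": 10, "medium": 30, "small": 40, "very small": 20}))  # small
```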

Spatial Agreement and Discrepancies
Figure 1 displays the spatial agreement among datasets and the per-country agreement among products. Nine levels of agreement are defined, ranging from 9 (full agreement), when all datasets agree that a pixel is cropland, to 1, when only one dataset classifies the pixel as cropland. Under the hypothesis that a pixel is more likely to be correctly classified as cropland when the majority of datasets agree, this map could serve as a cropland probability map from which a cropland mask can be derived according to different thresholds (e.g., seven, eight, or nine datasets agreeing), overcoming some of the limitations of using a single dataset [54].
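Deriving a cropland mask from the agreement map is a simple threshold on the number of agreeing datasets. A minimal sketch, assuming the agreement map is a grid of counts from 0 to 9; the thresholds are the examples given in the text, and the function names and toy grid are illustrative.

```python
def cropland_mask(agreement, min_agree=7):
    """1 where at least `min_agree` datasets map cropland, else 0."""
    return [[1 if count >= min_agree else 0 for count in row] for row in agreement]


def mask_fraction(mask):
    """Share of grid cells flagged as cropland in a binary mask."""
    cells = [v for row in mask for v in row]
    return sum(cells) / len(cells)


agreement = [[9, 8, 1],
             [0, 7, 3]]
for threshold in (7, 8, 9):
    mask = cropland_mask(agreement, threshold)
    print(threshold, mask, mask_fraction(mask))
```

Raising the threshold trades omission for commission: fewer pixels pass, but those that do are backed by more datasets.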
Remote Sens. 2017, 9, 1118 10 of 23
Full agreement is reached in 11.90% of the analyzed area in Asia, whereas in Africa and America the values are lower, at 2.15% and 1.39%, respectively. Areas of high agreement (values from six to eight) are also larger in Asia (28.68%) and spatially adjacent to areas of full agreement. In Southeast Asia, the highest agreement is mainly found in rice-dominated countries, notably Bangladesh (33.8% full agreement), Thailand (25.7%), and Myanmar (17.56%). Likewise, in America, the highest agreement occurs in Cuba (7.81%), where rice constitutes 50% of agricultural production. Rice production in these environments is characterized by an initial flooded phase and a distinctive phenology that can be readily identified from remote sensing data, resulting in good discrimination and classification [55]. In line with previous studies [19], in Africa the highest agreement is located in the transition zone between the Sudanian and Sahelian bands, running from western Senegal to eastern Ethiopia. A large proportion of the areas of full or high agreement is also concentrated along the Nile Valley and Delta in Egypt (42.6%), in northern Algeria, Morocco, and Tunisia, and around Lake Victoria (i.e., Rwanda and Burundi).
Conversely, areas of full disagreement are more abundant in Africa (29.5%) and America (28.49%) than in Asia (20.60%), as monitoring croplands in these areas remains particularly challenging due to the low intensity of agriculture and its spatial structure. High disagreement areas (values from 2 to 3), covering 31.97% in Africa and 30.12% in America, are scattered around areas of full disagreement, suggesting that areas in which agriculture is not dominant are more prone to classification errors [56]. In most cases, disagreement is widely distributed in transition zones, where two or more classes coexist. For example, in Namibia (61.8%) and Angola (61.4%), the presence of grasslands hinders the detection of croplands due to the spectral similarity between these two classes. Similarly, high disagreement is found in the Sahelian belt, in countries such as Mauritania (42.1%), Niger (24.7%), and northern Burkina Faso, where crops are more scattered and discriminating them from spectrally similar classes such as bare soil or grasslands is also a challenge.
In Central America, Nicaragua, Honduras, and El Salvador present high disagreements (values from 1 to 3), with values of 37.5%, 57.76%, and 24.84%, respectively. These discrepancies are a consequence of the high fragmentation of the landscape, which is harder to classify [21], and of cultivation techniques that make crop signal detection difficult. These include the traditional swidden subsistence agriculture of Mesoamerica, referred to as milpas, which rotates recovering forest with small agricultural plots of maize, sorghum, or beans [57]. Another example is the existence of a second crop season (postrera), or even a third season (Apante), in El Salvador.
Furthermore, cloud cover increases the difficulty of characterizing cultivated areas, which is mainly a problem in the humid tropics, as in Guinea, Liberia, Sierra Leone, and the Republic of Côte d'Ivoire (Figure 1). Indeed, the number of composites on the western coast of central Africa for LC-CCI2010 is reported to be low due to the limited number of valid, cloud-free weekly images [30].
On a pairwise basis, overall agreement is highest between related products such as LC-CCI2010 and LC-CCI2015 in America and Asia. In Africa, their discrepancies are larger because the new LC-CCI dataset (2015) classifies cropland better than the previous version (2010), which entails an important reduction of cropland in the Congo Basin. The Geowiki dataset is in high agreement with MODISLC in Asia and with GLC2000 in Africa, which is coherent with how this hybrid dataset was developed [27]. Agreement drops drastically for other combinations, particularly those involving LC-CCI, GlobCover, and GLC2000.
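Pairwise agreement can be sketched in a similar way. The measure used in the paper is not restated in this section, so the example below adopts one simple, assumed choice: the intersection over union (IoU) of two cropland masks, i.e., the share of pixels mapped as cropland by at least one of the two datasets on which both agree. Function names are hypothetical.

```python
import itertools

import numpy as np


def cropland_iou(mask_a, mask_b):
    """Intersection over union of two Boolean cropland masks (NaN if both are empty).

    Assumed agreement measure for illustration; not necessarily the paper's.
    """
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.count_nonzero(a | b)
    if union == 0:
        return float("nan")
    return np.count_nonzero(a & b) / union


def pairwise_agreement(masks):
    """IoU for every unordered pair of datasets; masks maps name -> array."""
    return {
        (name_a, name_b): cropland_iou(masks[name_a], masks[name_b])
        for name_a, name_b in itertools.combinations(sorted(masks), 2)
    }


# Hypothetical masks: "C" is identical to "A", "B" overlaps A in one pixel.
masks = {
    "A": np.array([[1, 1], [0, 0]]),
    "B": np.array([[1, 0], [1, 0]]),
    "C": np.array([[1, 1], [0, 0]]),
}
for pair, value in pairwise_agreement(masks).items():
    print(pair, round(value, 3))
```

Unlike an overall agreement that also counts shared non-cropland pixels, IoU is not inflated by large areas that every dataset maps as non-cropland, which is why it was chosen for this sketch.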

Agreement with FAO Statistics
The cropland area of each dataset is compared with cropland area statistics from FAOSTAT. The results for Africa, derived using the 100% weight for mosaic classes, show that GLCNMO2008 (261 Mha) is the dataset closest to the FAO total cropland area (257 Mha). Even though the datasets derived from a hybrid approach, i.e., Geowiki (305 Mha) and FAO-GLCshare (337 Mha), slightly overestimate the cropland area, they are still closer to the FAO statistics than the majority of the satellite-based global land cover maps. Their deviations arise because hybrid maps inherit some of the errors of their original sources. For instance, Geowiki, which partly derives from GLC2000, overestimates cropland in Senegal, Ethiopia, and Eritrea similarly to GLC2000 and underestimates cropland in Guinea similarly to MODISLC. FAO-GLCshare, in turn, benefits from the high resolution products used in its development. As expected, its main discrepancies with the FAO statistics occur in regions where no detailed dataset is available [16], hence the large overestimation in Morocco and Western Africa (i.e., Ghana, Benin, and Nigeria) and the underestimation in Namibia and Madagascar.
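The effect of mosaic-class weights on the area totals compared in this section can be illustrated with a small calculation. In the sketch below, each recoded class contributes its mapped area multiplied by a cropland fraction under three schemes (100%, minimum, and maximum weight). The class names, areas, and fractions are hypothetical, loosely modeled on mosaic legend descriptions such as "cropland 50 to 70%"; the actual weights are those listed in Table 3.

```python
# Hypothetical cropland fractions (minimum, maximum) for recoded classes.
# Under the "100%" scheme every recoded class counts as entirely cropland.
CLASS_WEIGHTS = {
    "rainfed_cropland": (1.0, 1.0),
    "mosaic_cropland_50_70": (0.5, 0.7),
    "mosaic_vegetation_cropland_20_50": (0.2, 0.5),
}


def cropland_area(class_area_mha, scheme):
    """Weighted cropland area (Mha) for scheme in {"100%", "minimum", "maximum"}."""
    total = 0.0
    for code, area in class_area_mha.items():
        w_min, w_max = CLASS_WEIGHTS[code]
        weight = {"100%": 1.0, "minimum": w_min, "maximum": w_max}[scheme]
        total += area * weight
    return total


# Hypothetical mapped areas (Mha) of each recoded class in one country.
areas = {
    "rainfed_cropland": 100.0,
    "mosaic_cropland_50_70": 50.0,
    "mosaic_vegetation_cropland_20_50": 40.0,
}
for scheme in ("100%", "minimum", "maximum"):
    print(scheme, round(cropland_area(areas, scheme), 2))
# prints: 100% 190.0, minimum 133.0, maximum 155.0 (one per line)
```

The more area a dataset assigns to mosaic classes, the wider the gap between the 100% and minimum totals, which is consistent with the large reductions reported for GlobCover and GLCNMO2008 when the minimum weight is used.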
In accordance with previous results [12], MODISLC underestimates cropland area at the continental level in all three cases (i.e., 100%, minimum, and maximum weight) due to confusion between cropland and forest/woody savannah, particularly in Angola, Botswana, Tanzania, Namibia, Malawi, Zambia, and Zimbabwe. Globeland30 slightly underestimates cropland area (232 Mha), mainly in Burundi, Cameroon, Central African Republic, the Republic of Côte d'Ivoire, Ghana, Guinea, Liberia, Madagascar, Mauritania, and Sierra Leone.
As anticipated by Fritz et al. [12], GLC2000 tends to overestimate cropland (351 Mha) in the countries located in the transition zone of the Sahelian belt, such as Chad, Mauritania, Senegal, Sudan, and Eritrea. LC-CCI2010 largely over-represents cropland areas, even when considering the minimum weight (717 Mha). Large surpluses occur mainly in Angola, Botswana, Chad, the Republic of Côte d'Ivoire, Ethiopia, the Democratic Republic of the Congo (DRC), Kenya, Mali, Madagascar, and Mozambique. LC-CCI2015 also overestimates cropland area on this continent, mostly in Algeria, Benin, Burkina Faso, Chad, the DRC, and the Republic of Côte d'Ivoire. Nevertheless, this new product improves on its previous version, for example, in the DRC and Ethiopia, and its continental total falls to 378 Mha with the minimum weight. In Africa, when the maximum weight is considered, the linear regressions between the FAO statistics and the total cropland of the datasets yield values ranging from 0.49 for LC-CCI2010 to 0.91 for FAO-GLCshare.
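The regression values quoted above summarize how well the country totals of a dataset track the FAO statistics. This section does not restate the exact statistic, so the sketch assumes they are coefficients of determination (R²) of an ordinary least squares fit across countries; for a simple linear fit, R² is the same whichever variable is treated as dependent. The function name and input values are hypothetical.

```python
import numpy as np


def fit_against_fao(fao_mha, dataset_mha):
    """OLS fit dataset = slope * fao + intercept across countries.

    Returns (slope, intercept, r2). Assumes the paper's regression value
    is an R^2; this is an illustrative assumption.
    """
    x = np.asarray(fao_mha, dtype=float)
    y = np.asarray(dataset_mha, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # highest degree coefficient first
    predicted = slope * x + intercept
    ss_res = np.sum((y - predicted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(slope), float(intercept), float(1.0 - ss_res / ss_tot)


# Hypothetical per-country cropland totals (Mha) for one dataset.
fao = [2.0, 5.0, 9.0, 14.0]
dataset = [2.5, 6.1, 10.2, 17.0]
print(fit_against_fao(fao, dataset))
```

A slope well above 1 would indicate systematic overestimation that a high R² alone does not reveal, so both values are worth reporting.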
Compared with the FAO statistics (114 Mha), Globeland30 slightly underestimates the cropland area in the analyzed Asian countries (98.2 Mha) as a consequence of considerable underestimations in Timor-Leste, Indonesia, the Philippines, and Bangladesh. GLCNMO2008 and Geowiki are the closest to the FAO statistics (around 106 Mha), despite large underestimations in Bangladesh and Indonesia and, for GLCNMO2008, overestimations in the Democratic People's Republic of Korea (DPRK), Myanmar, Nepal, and Timor-Leste. The ESA datasets, i.e., LC-CCI2010, LC-CCI2015, and GlobCover, overestimate the cropland area in Asia, with 195 Mha, 146 Mha, and 194 Mha, respectively, using the 100% weight. These three datasets overestimate cropland in Cambodia, Myanmar, Indonesia, and Thailand.
Unlike on the other continents, in America the best fit is found for LC-CCI2015, with 29.3 Mha compared with 27.4 Mha in the FAO statistics. In this area, the new ESA dataset substantially improves on LC-CCI2010 (70.14 Mha) and GlobCover (61.65 Mha). When the minimum weight is considered, the GlobCover cropland area drops drastically to 25.97 Mha. Hence, if a crop mask is derived from GlobCover, the strategy for America is not to treat class 40 as cropland in the legend translation. A similar reduction occurs with GLCNMO2008, whose cropland area drops from 50.4 Mha with the 100% weight to 27.4 Mha with the minimum weight.
Figure 2 displays the datasets that best fit the FAO statistics using the 100% weight. We chose this weight from an operational point of view, as we intend to use these results as a benchmark for deriving a cropland mask in the future, and for that purpose it is more convenient to assume pure classes than to use cropland fractions. The figure also displays, as an example, the error bars of the statistics for Ecuador.
According to Figure 2, FAO-GLCshare is one of the best datasets for Eastern and Southern Africa, as it indirectly benefits from local knowledge through the Africover and Southern African Development Community datasets [16]. GlobCover is rarely the best dataset; the only exceptions are Malawi, Rwanda, and the Republic of the Congo. Globeland30 remains quite suitable in the Sahel countries, whilst MODISLC and GLCNMO2008 are the best datasets in the western African coastal countries and on the Asian continent, respectively. LC-CCI2010 is the closest only in Niger, where all other datasets underestimate cropland area, whereas LC-CCI2015 is the closest to the FAO statistics in South Africa, Madagascar, Indonesia, Cuba, and Ecuador.
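Selecting the dataset shown for each country in Figure 2 amounts to picking the dataset whose cropland area is closest to the FAOSTAT figure. A minimal sketch, assuming the absolute difference in Mha as the criterion (the paper may also account for the error bars of the statistics), is given below; for illustration it reuses the African continental totals reported earlier in this section, and the function names are hypothetical.

```python
def closest_dataset(area_by_dataset, fao_area):
    """Return the dataset whose cropland area is closest to the FAO statistic."""
    return min(area_by_dataset,
               key=lambda name: abs(area_by_dataset[name] - fao_area))


def best_by_country(areas, fao):
    """areas: country -> {dataset: area}; fao: country -> FAO area."""
    return {country: closest_dataset(areas[country], fao[country])
            for country in fao}


# African cropland totals (Mha) reported above; the FAO total is 257 Mha.
africa = {
    "GLCNMO2008": 261,
    "Geowiki": 305,
    "FAO-GLCshare": 337,
    "GLC2000": 351,
    "Globeland30": 232,
}
print(closest_dataset(africa, 257))  # GLCNMO2008
```

An absolute criterion favors large countries when results are aggregated; a relative error (difference divided by the FAO area) would be a natural alternative when comparing countries of very different sizes.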

Accuracy Assessments
The results presented in this section should be interpreted carefully because the number of samples is quite low in some countries and even insufficient in several of them (displayed as no data in Figure 3). Although the pooled reference datasets used to assess accuracy may not be optimal, they are reliable enough to evaluate land cover datasets [52].
The overall accuracy results at the continental level (Table 5, Appendix A) are generally consistent with previous studies carried out in Africa [42] and China [58], where GlobeLand30 is the dataset with the highest overall accuracy. Indeed, GlobeLand30 shows the highest overall accuracy values (around 80%) on all three continents. The factors behind this are its higher resolution and improved classification algorithm [31]. Furthermore, this dataset has been shown to be suitable for monitoring cultivated areas in arid zones [59].
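Overall accuracy, as used in this section, is the proportion of reference samples whose label matches the map. The sketch below builds a confusion matrix from paired reference and map labels (rows = map, columns = reference, a common convention assumed here) and derives overall accuracy from its diagonal. The labels and function names are hypothetical, and a real assessment would also have to account for the sampling design of the pooled reference data.

```python
import numpy as np


def confusion_matrix(reference, mapped, classes):
    """Cross-tabulate map labels (rows) against reference labels (columns)."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = np.zeros((len(classes), len(classes)), dtype=int)
    for ref_label, map_label in zip(reference, mapped):
        matrix[index[map_label], index[ref_label]] += 1
    return matrix


def overall_accuracy(matrix):
    """Share of samples on the diagonal, i.e., correctly labeled by the map."""
    total = matrix.sum()
    return float(np.trace(matrix) / total) if total else float("nan")


# Hypothetical reference samples and the map labels read at the same points.
reference = ["crop", "crop", "other", "other", "crop"]
mapped = ["crop", "other", "other", "crop", "crop"]
cm = confusion_matrix(reference, mapped, classes=["crop", "other"])
print(cm.tolist())           # [[2, 1], [1, 1]]
print(overall_accuracy(cm))  # 0.6
```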
The results also indicate that, generally, MODISLC and FAO-GLCshare are likely to be more accurate than LC-CCI2010 and GlobCover over the three continents, with overall accuracies of around 40% to 50%. Bontemps et al. [35] attributed the low overall accuracy of GlobCover to the instability of the dataset. Indeed, 25% of the pixels changed class from 2005 to 2009, with most of the change attributed to mosaic classes between cropland and grassland. Moreover, MERIS [28] lacks the SWIR (shortwave-infrared) band, which was of great importance for cropland classification in western Africa [60], and this reduces its capacity to identify cropland. The overall accuracy of GlobCover is especially low in countries such as Cambodia, where cropland is confused with the wetlands surrounding Lake Tonle Sap, and Somalia (23%), where cropland is omitted in the important cropland region of Woqooyi Galbeed and overestimated in Bay in the south of the country. Despite the reported instability of MODISLC, which changes from year to year [8], this dataset obtains the highest accuracies in South Africa, Somalia, Kenya, Mozambique, Namibia, and Botswana.
As for LC-CCI, the overall accuracy improves considerably with the 2015 product compared with the 2010 product. For example, in the Democratic Republic of the Congo, one of the most problematic countries for the 2010 data, the overall accuracy jumps from 20% to 70%. Finally, GLC2000 generally has low accuracy in America and Asia, mainly in Colombia (35%) and Ecuador (41%), and Geowiki reaches its lowest overall accuracy in Haiti (18%).
F-score values at the country level (Figure 3a) are highest in Asia, where fairly homogeneous agricultural landscapes occur. Lower values are registered in mountainous countries such as Nepal, Timor-Leste, and Indonesia (F-scores between 0.30 and 0.50). Compared with countries where flat agricultural areas predominate, such as Bangladesh (F-score of 0.81) or Thailand (0.72), the geomorphology and irregularity of the terrain make these countries more prone to misregistration errors [48] due to shadow [61]. High values are also found in areas with irrigated crops, such as Morocco, Egypt, Cuba, Libya, and South Africa, where croplands do not intermix with other types of vegetation and pixels remain homogeneous. Generally, cropland F-scores are lower in Africa and America, as this class is subject to large classification errors in these areas [24].
Figure 3b depicts the datasets with the highest F-scores. Globeland30 has the highest F-score in South Africa, Burkina Faso, Algeria, Morocco, Kenya, and Madagascar, as its spatial resolution allows it to capture the variety of heterogeneous landscapes in detail. Even where Globeland30 reaches the maximum F-score, its errors are in some cases dominated by high omission errors that are largely compensated for by low commission errors, perhaps due to the limited number of reference samples. FAO-GLCshare generally presents good accuracy, mainly in Ethiopia, Libya, Tunisia, and the DRC, as these countries rely on high resolution data such as Africover or GeoNetwork. MODISLC has the highest F-score in Libya and Guatemala, while Geowiki is the best in Chad, Mauritania, Niger, Nepal, and Bangladesh.
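The remark that high omission errors can be compensated for by low commission errors follows from how the F-score combines the two. Taking user's accuracy as 1 - commission error and producer's accuracy as 1 - omission error, the F-score is their harmonic mean, so quite different error profiles can yield similar scores. The sketch below uses hypothetical error rates and function names.

```python
def f_score_from_counts(tp, fp, fn):
    """Cropland F-score from true positives, commission (fp) and omission (fn) counts."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def f_score_from_errors(commission, omission):
    """Harmonic mean of user's accuracy (1 - commission) and producer's accuracy (1 - omission)."""
    users = 1.0 - commission
    producers = 1.0 - omission
    if users + producers == 0:
        return 0.0
    return 2 * users * producers / (users + producers)


# Hypothetical error profiles: unbalanced (low commission, high omission) vs. balanced.
print(round(f_score_from_errors(0.05, 0.40), 3))  # 0.735
print(round(f_score_from_errors(0.25, 0.25), 3))  # 0.75
```

Both profiles score about 0.75, although the first misses 40% of the cropland; this is why the F-score alone does not reveal whether a dataset tends to omit or to overstate cropland.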

Suitability of Datasets to Monitor Agricultural Land Based on Parcel Size
Field size is used to determine the spatial resolution needed for monitoring agriculture according to GEOGLAM requirements. As Figure 4 shows, most African agricultural land falls within the small (<1.5 ha) and very small (<0.15 ha) parcel size classes. This is the case in countries such as Ethiopia, which is characterized by farms with small parcels, and in coastal Western African countries (e.g., Cameroon), where agriculture mainly consists of small patches integrated in the forest. Hence, a fine-resolution sensor such as Landsat, and, by extension, the Globeland30 dataset, is the most suitable to properly capture the spatial variety of these countries. In fact, Landsat has already proven efficient for land cover characterization, with a minimum mapping unit between 1 and 5 ha [46]. Furthermore, the launch of new high resolution satellites such as Sentinel 1 and 2 offers valuable new sources and the opportunity to improve the monitoring of fragmented landscapes.
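The link between parcel size and spatial resolution can be made concrete by counting how many pixels fit into a parcel. The sketch uses the nominal resolutions of some of the compared datasets (Globeland30 30 m, GlobCover and LC-CCI 300 m, MODISLC and GLCNMO2008 500 m, GLC2000 1 km) and the parcel size limits of Figure 4. The rule that a parcel must span at least one full pixel to be resolvable is a hypothetical simplification, not the GEOGLAM criterion, and the function names are illustrative.

```python
# Nominal spatial resolutions (m) of some of the compared datasets.
RESOLUTION_M = {
    "Globeland30": 30,
    "GlobCover": 300,
    "LC-CCI": 300,
    "MODISLC": 500,
    "GLCNMO2008": 500,
    "GLC2000": 1000,
}

# Upper limits (ha) of the parcel size classes used in Figure 4.
PARCEL_HA = {"very small": 0.15, "small": 1.5}

M2_PER_HA = 10_000


def pixels_per_parcel(parcel_ha, resolution_m):
    """Number of square pixels that fit inside a parcel of the given area."""
    return parcel_ha * M2_PER_HA / resolution_m ** 2


def resolves(parcel_ha, resolution_m, min_pixels=1.0):
    """Hypothetical rule: a parcel is resolvable if it spans at least min_pixels pixels."""
    return pixels_per_parcel(parcel_ha, resolution_m) >= min_pixels


for name, res in RESOLUTION_M.items():
    flags = {size: resolves(ha, res) for size, ha in PARCEL_HA.items()}
    print(f"{name:12s} {flags}")
```

Under this rule only the 30 m dataset resolves both parcel classes: a 1.5 ha parcel covers about 17 Landsat pixels but less than one fifth of a 300 m pixel.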

Recommendations for the Use of Global Land Cover Maps for Agricultural Monitoring
Based on the described analysis, we provide a recommendation for the use of each dataset at the country level.The priority index is a quick tool to rank datasets and easily discard the least adequate for cropland monitoring.Thus, the first three datasets obtained according to the priority index were visually checked using high spatial resolution imagery from Google Earth, and the final selection is depicted in Figure 5.
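The priority index itself is defined in the methods of the paper and is not restated here. Purely as a hypothetical illustration of how such an index can rank datasets for one country, the sketch below combines two ranks, closeness to the FAOSTAT area and cropland F-score, with user-adjustable weights; the first three entries would then be the candidates checked visually against high spatial resolution imagery. The weighting, tie-breaking, and names are assumptions.

```python
def priority_ranking(stats, fao_area, w_area=0.5, w_fscore=0.5):
    """Rank datasets for one country (best first) by a weighted sum of two ranks.

    Hypothetical index, not the paper's formula.
    stats maps dataset name -> (cropland_area, cropland_f_score).
    Rank 0 = closest to the FAO area / highest F-score.
    """
    names = sorted(stats)
    by_area = sorted(names, key=lambda n: abs(stats[n][0] - fao_area))
    by_fscore = sorted(names, key=lambda n: -stats[n][1])
    score = {n: w_area * by_area.index(n) + w_fscore * by_fscore.index(n)
             for n in names}
    # Lower combined score = higher priority; ties broken alphabetically.
    return sorted(names, key=lambda n: (score[n], n))


# Hypothetical statistics for one country (areas in Mha).
stats = {"A": (100.0, 0.80), "B": (150.0, 0.90), "C": (300.0, 0.50)}
print(priority_ranking(stats, fao_area=110.0)[:3])  # ['A', 'B', 'C']
```

Changing `w_area` and `w_fscore` shifts the ranking toward area agreement or toward classification accuracy, which is one way to adapt such an index to specific user needs.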

Generally, Globeland30 and FAO-GLCshare can be highly recommended for early warning and cropland monitoring in Africa. FAO-GLCshare is mainly suitable in countries where high resolution datasets were used in its development, such as Sudan or South Sudan. By contrast, LC-CCI2010 and GLC2000 are unsuitable for monitoring agriculture on all three continents. In America and Asia, the advantage of Globeland30 and FAO-GLCshare is less evident. The remaining datasets are country specific; for example, MODISLC is quite suitable in parts of Asia (e.g., the Philippines). Even if LC-CCI2015 overestimates cropland areas in some countries, this new product improves on the previous version and is especially suitable for monitoring Central America. Hybrid datasets such as Geowiki are generally suitable (e.g., in Botswana). However, Geowiki contains some spatial inconsistencies (abrupt transitions) in some countries, e.g., Bangladesh and Mauritania. In addition, hybrid datasets may inherit errors from their source datasets, as occurs in the DPRK, where Geowiki overestimates cropland similarly to MODISLC, or with FAO-GLCshare in El Salvador due to a GlobCover error. GLCNMO2008 ranks in the middle of the list in Africa; however, it seems to be more adequate in Asia and America, particularly in Guatemala and Nepal.
In parts of Asia and America, the differences between datasets are small, so even datasets not retained in Figure 5 may also be used. In some areas that are prone to errors because of highly fragmented or heterogeneous landscapes, such as the DRC, the differences in the priority index are also small; in this case, most datasets perform poorly. In contrast, there are countries for which one dataset clearly outperforms all others, such as FAO-GLCshare in Kenya.

Discussion
The proposed approach describes a pragmatic way of comparing land cover datasets and is in line with previous studies [17]. Nevertheless, it should be kept in mind that some inherent characteristics of the datasets, as well as assumptions that need to be made during the process, limit quantitative comparison and can introduce bias into the results. Moreover, the results do not represent a complete quality assessment of the considered datasets; they focus on drawing conclusions and making recommendations for using the agricultural land cover classes for crop monitoring.
Legend harmonization is a critical issue, as applying legend translation protocols is a simplification, especially for mixed classes. This inevitably increases errors when dealing with mixed pixels resulting from landscape fragmentation and the continuum between land cover types. Indeed, mosaic classes increase misclassification errors because they tend to be spectrally similar to multiple sub-pixel cover types. From this perspective, we aimed to make the different products as comparable as possible by following an inclusive approach that takes into account all possible cropland classes, including mosaic ones. Although, for simplicity, we mainly used a Boolean approach throughout the paper, as it is easier to deal with whole pixels than with pixel fractions, cropland weights were defined to assess the datasets against the cropland statistics from FAOSTAT. As expected, the results using the minimum or maximum weights for the mosaic classes are generally closer to the statistics. The main reason is that using the cropland area fraction allows us to minimize errors due to legend translation, particularly in datasets with a high number of mosaic classes, such as LC-CCI class 40: Mosaic natural vegetation (tree, shrub, herbaceous cover) (>50%) and cropland (<50%). The dataset that benefits most from using the minimum/maximum weight is GlobCover, which drops from 479 Mha to 299.47 Mha, revealing legend translation as an important source of uncertainty in land cover area estimation. However, there is no clear explanation as to why LC-CCI2010 performs worse than the other datasets even with the minimum/maximum weight, the difference between its maximum and minimum weight totals being only 50 Mha.
The spatial resolution of the original dataset is a key factor in the comparison, as higher spatial resolution generally ensures better land cover characterization by depicting smaller features, mainly in heterogeneous landscapes. This clearly favors higher resolution datasets such as Globeland30. Most of the analyzed datasets, such as MODISLC, LC-CCI, and GLCNMO2008, are derived from coarse spatial resolution data (>100 m) according to the GEOGLAM definition. However, these datasets are able to capture the spatial complexity of only a small number of countries, such as South Africa, Botswana, Bolivia, Guatemala, Cuba, and Nicaragua. From a parcel size perspective, agriculture in Asia is also difficult to monitor with coarse resolution satellites. However, coarse resolution data remain meaningful in areas in which most parcels have the same crop (e.g., rice).
The comparison of datasets against FAOSTAT data has some limitations since the quality of the statistical data could be inadequate.This is acknowledged by FAO itself [62], and a long term initiative has been undertaken to improve the situation [63].In the meantime, the dataset is still the most comprehensive agricultural statistics dataset, and the comparison of land cover data with these data is useful for detecting major discrepancies.
One of the main issues in the accuracy assessment of cropland is the difference in scale when comparing points with spatial data. Similar to previous studies [64], we provide a practical solution by simplifying the information in each sample unit of each dataset to a single value concentrated at a point. We assume that this simplification does not strongly bias the comparison in favour of one particular map; nevertheless, a more in-depth analysis of this issue remains an open problem for the future. Moreover, the limited amount of validation data and the lack of harmonized sampling designs for assessing cropland classifications are also a challenge. Despite the increasing number of initiatives that freely distribute reference datasets, such as Geowiki and GOFC-GOLD, there is still a need to increase the number of samples and tailor the sampling plan for cropland validation. In this respect, new reference datasets for cropland validation developed in the framework of the FP-7 project SIGMA (Stimulating Innovation for Global Monitoring of Agriculture) [65] (http://www.geoglam-sigma.info/) will enhance opportunities for assessing cropland maps.
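The point-based simplification described above amounts to reading, for each reference sample, the single map value at the sample's coordinates. A minimal sketch, assuming a north-up raster already loaded as a NumPy array (reading it with a library such as GDAL or rasterio is omitted) and a GDAL-style six-element geotransform without rotation; the function name is hypothetical.

```python
import math

import numpy as np


def sample_at_points(raster, geotransform, points):
    """Return the raster value of the pixel containing each (x, y) point.

    geotransform follows the GDAL convention
    (x_origin, pixel_width, row_rotation, y_origin, col_rotation, pixel_height),
    with pixel_height negative for north-up rasters. Points outside the
    raster yield None. Hypothetical helper for illustration only.
    """
    x0, dx, rot_x, y0, rot_y, dy = geotransform
    if rot_x != 0 or rot_y != 0:
        raise ValueError("rotated geotransforms are not supported in this sketch")
    nrows, ncols = raster.shape
    values = []
    for x, y in points:
        col = math.floor((x - x0) / dx)
        row = math.floor((y - y0) / dy)
        if 0 <= row < nrows and 0 <= col < ncols:
            values.append(raster[row, col].item())
        else:
            values.append(None)
    return values


# Toy 2 x 2 map with 1-unit pixels whose upper-left corner is at (0, 2).
raster = np.array([[1, 2], [3, 4]])
gt = (0.0, 1.0, 0.0, 2.0, 0.0, -1.0)
print(sample_at_points(raster, gt, [(0.5, 1.5), (1.5, 0.5), (5.0, 5.0)]))  # [1, 4, None]
```

Reading only the pixel under each point ignores any mismatch between the sample unit and the pixel footprint, which is exactly the scale issue noted above.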
Another factor that could be responsible for disagreements among datasets is their temporal inconsistency: there can be almost 20 years of difference between them. The impact of different dates is difficult to assess due to the lack of information on net area changes, mainly at the country level, where changes result from flows from cropland to non-cropland and from non-cropland to cropland. At the global scale, the net change in cropland from 2000 to 2014, according to FAOSTAT, was 1.7%, i.e., an average of 0.13% per year. This is the result of strong increases in some areas, such as Africa (+17.5%), and decreases in others, such as Europe (−3.8%). Nevertheless, values at the continental level may hide heterogeneities among regions, as in South Africa, where the rate of change is −9%. Since crop monitoring is generally concerned with the current crop distribution, a higher weight should be assigned to more recent data. In this sense, LC-CCI2015 would have fewer divergences due to differences in acquisition time. Moreover, this product provides a time series of yearly land cover maps that could be suitable for deriving change flows, although, to our knowledge, this has not yet been properly tested. Conversely, such weighting would penalize older layers, in particular GLC2000, which is fair given the purpose of this analysis.
It should be stated that, in this study, we focus only on cropland, and the conclusions may not be extrapolated to other classes; for example, we noticed that Globeland30 overestimates grassland in most of the analyzed countries. Furthermore, the recommendations are based on a priority index that provides a fast and objective tool to rank our results using widely used standard measures such as overall accuracy. This tool and, by extension, the final recommendations could be adapted to specific user needs. For example, Globeland30 provides the best representation of cropland in a large number of countries under the premise that, for early warning, it is much better to consider pure cropland areas even if total cropland is underestimated, which means that commission errors are penalized. Other priorities could be accommodated by changing the weights given to the factors that compose the index or by adding new factors.
Our findings reveal that LC-CCI2010 and GLC2000 are unsuitable for cropland monitoring due to large overestimations in the former and low accuracy and an outdated production year in the latter. Another important finding is that the cropland accuracy of LC-CCI2015 improves with respect to the previous LC-CCI2010 version. Nevertheless, this is not sufficient to conclude that there was a major increase in the accuracy of the overall GlobCover product, which was obtained with a methodology similar to the first. Datasets derived from hybridization, such as Geowiki and FAO-GLCshare, show medium or higher agreements in a pairwise comparison, as they rely on the integration of the best characteristics of multiple datasets, some of which are also analyzed in this study. Similarly, the combination of GlobCover and GLC2000 presents a reasonable agreement, as GlobCover uses GLC2000 in its development [28]. However, our results also highlight that errors are sometimes transferred from one dataset to another.

Conclusions
Cropland extent and location information is essential for many applications such as early warning, food security assessment, and cropland monitoring, among others.Assessing the reliability and quality of cropland area derived from land cover datasets is necessary for using them appropriately.
The results of this study reveal that total cropland varies significantly among the most widely used global land cover datasets, with generally very low full agreement values, from only 1.39% (America) to 11.90% (Asia). Such discrepancies are related to intrinsic factors of the datasets, such as spatial resolution, legend description, methodology, and sensor used. Furthermore, some areas are more prone to misclassification and disagreement than others, as occurs in land cover transition areas with mixed classes and heterogeneous landscapes.
According to the comparison with FAO statistics and the accuracy assessment based on freely available reference data, Globeland30 and FAO-GLCshare generally proved to be the most adequate datasets to monitor cropland areas, despite some cropland omission in the case of Globeland30 and the low spatial resolution of FAO-GLCshare. By contrast, GLCNMO2008 presents artefacts such as a scattered spatial distribution in some countries (e.g., Rwanda), and LC-CCI2010 is inadequate to monitor cropland areas due to large cropland overestimations, mainly in Africa. Even though the new LC-CCI product (LC-CCI2015) improves on the previous version, it is still affected by inherited errors.
Over the last few years, the number of land cover datasets has been rapidly increasing. With the recent launch of new high spatial resolution sensors such as Landsat 8 and Sentinel 1 and 2, together with the development of cloud computing technologies such as Google Earth Engine, the availability of global land cover datasets is expected to increase considerably [30]. However, given the large discrepancies found between the datasets, and despite the increasing frequency and spatial resolution of the input data for land cover classification, we conclude that land cover assessments similar to the one proposed in this paper are still needed to highlight the differences and document the accuracy of existing and future datasets. This enables users to evaluate their utility for specific applications and provides helpful information for global land cover data producers to further improve their maps. In order to increase the quality of future products and comparison studies, we also note that stronger efforts need to be devoted to collecting reference datasets and agricultural statistics of higher quality than those currently available.
One of the novelties of this research is the inclusion of newly available datasets, such as Globeland30 and LC-CCI2015, that have rarely or never been evaluated. Moreover, the results represent a first step toward deriving a combined cropland mask and updating the previous JRC-produced mask by combining the most suitable land cover datasets at the country level [17].

Figure 1. Spatial agreement among nine datasets and agreement at country level (see details in Appendix A). Full agreement corresponds to a value of 9 (all nine datasets agree), and full disagreement corresponds to a value of 1 (only one dataset identifies cropland). White corresponds to no cropland for all datasets.

Figure 2. Land cover dataset closest to FAOSTAT cropland area at the country level using the 100% weight.

Figure 3. (a) Maximum F-score value and (b) dataset with this maximum F-score. The number of cropland samples or the total number of samples is displayed for each country. An example of the accuracy assessment for the Democratic People's Republic of Korea (DPRK) is also displayed in the figure.

Figure 4. Field size by country.

Figure 5. Recommended land cover dataset for cropland monitoring based on the priority index.

Table 1. Summary of the main characteristics of global land cover datasets.

Table 3. Classes in different maps that have been recoded as cropland. Weight is defined as the percentage of cropland.
All the non-perennial crops that do not last for more than two growing seasons, and crops like sugar cane in which the upper part of the plant is regularly harvested while the root system can remain for more than one year in the field, are included in this class.
Woody crops: the class is composed of a main layer of permanent crops and includes all types of orchards and plantations.
Multiple or layered crops: this class combines different land cover situations: two layers of different crops and the presence of one important layer of natural vegetation that covers one layer of cultivated crops.
MODISLC, 12. Croplands: land covered with temporary crops followed by harvest and a bare soil period. Note that perennial woody crops will be classified as the appropriate forest or shrub land cover type. Weight: 100.
MODISLC, 14. Cropland/natural vegetation mosaic: land with a mosaic of croplands, forest, shrublands, and grasslands in which no one component comprises more than 60% of the landscape.

Table 5. Overall accuracy and F-score of the cropland classes of the nine land cover datasets evaluated at the continental level.