A Unified Cropland Layer at 250 m for Global Agriculture Monitoring

Accurate and timely information on the global cropland extent is critical for food security monitoring, water management and earth system modeling. Principally, it allows satellite image time series to be analyzed to assess crop conditions, and it isolates the agricultural component so that analyses can focus on food security and the impacts of various climatic scenarios. However, despite the critical importance of accurate information on its spatial extent, cropland mapping with remote sensing imagery remains a major challenge. Following an exhaustive identification and collection of existing land cover maps, a multi-criteria analysis was designed at the country level to evaluate the fitness of a cropland map with regard to four dimensions: its timeliness, its legend, its resolution adequacy and its confidence level. As a result, a Unified Cropland Layer that combines the fittest products into a 250 m global cropland map was assembled. With an evaluated accuracy ranging from 82% to 95%, the Unified Cropland Layer successfully improved the accuracy compared to single global products.

Data Set: https://dx.doi.org/10.6084/m9.figshare.2066742 or http://maps.elie.ucl.ac.be
Data Set License: CC-BY


Summary
Mapping the global cropland extent is of paramount importance for agricultural production assessment. Timely and accurate cropland information directly feeds global crop monitoring systems [1] and early warning systems such as the Global Information and Early Warning System (GIEWS), the Early Warning Crop Monitor and the Famine Early Warning Systems Network (FEWSNET) [2,3]. It also serves environmental climate change studies [4]. In both agriculture monitoring and climate modeling, cropland maps mask terrestrial areas dedicated to agriculture in order to (1) assess and compare crop growth and conditions and (2) investigate how agricultural land use could respond to different climatic change scenarios.
Satellite remote sensing provides opportunities for global cropland monitoring in a spatially explicit, economical, efficient, and objective fashion [5]. In the last forty years, numerous initiatives have aimed at deriving cropland, either as a single class in land cover products or as a collection of agricultural land use classes (crops), via classification of satellite images. A large diversity of mapping strategies, ranging from the local to the global scale and associated with various degrees of accuracy, is documented in the literature [6][7][8][9][10]. The existing satellite-derived information on the spatial extent of croplands can be grouped into four types of products. First, global land cover maps (GLC2000 [11], GlobCover 2005/2009 [12,13], MODerate resolution Imaging Spectroradiometer (MODIS) Land Cover [14] and the European Space Agency Climate Change Initiative Land Cover products [15]) often describe the cropland according to a Land Cover Classification System (LCCS) typology and focus mainly on vegetation types. As some cropland areas are also often included in mosaic or mixed classes, their integration in agricultural applications is not trivial. In addition, when analyzing their consistency, Fritz et al. [16] highlighted that they underestimate cropland compared to the official statistics and that they also disagree with one another. Earlier, Ramankutty et al. [17] quantified this uncertainty at the global scale: they estimated that the global cropland extent varies between 1.22 and 1.71 billion hectares, i.e., a difference of about 40% between the lowest and highest estimates. With the first 30 m resolution global land cover map, an increase of accuracy was expected but not observed in the case of the cropland class [18]. Second, high resolution national or regional land cover mapping efforts such as Africover and Corine Land Cover have also provided detailed cropland information, but the update frequency is too low for operational crop monitoring. Third, cropland maps were produced at the global, continental or national scale [5][6][7]. Some of them were strictly devoted to cropland mapping with an emphasis on water management: the global map of rainfed cropland areas (GMRCA) [19] and the global irrigated area map (GIAM) [20]. These products have a relatively coarse spatial resolution (10 km) that does not meet the needs of operational applications, and they suffer from large uncertainties [3]. Fourth, some countries (e.g., the USA, Canada and India) have established dedicated annual national crop type mapping based on satellite remote sensing data. Those initiatives remain limited in number as they require advanced operational remote sensing programs and processing capabilities as well as intensive field data collection. Finally, numerous countries maintain a land parcel information system for managing the farmers' declarations, e.g., the Land Parcel Identification System in Europe.
Given the known limited accuracy of existing global data sets and the still unresolved challenges for mapping specific areas, evaluating and merging existing data sets appears to be a worthwhile alternative to creating new data layers from scratch. Hence, this paper presents the first 250 m global cropland map generated by combining existing maps, hereafter referred to as the "Unified Cropland Layer". After an exhaustive identification and collection of existing maps, the maps to use were selected through a multi-criteria analysis. The Unified Cropland Layer aims at providing a map of the annually cultivated areas fitting the JECAM (Joint Experiment of Crop Assessment and Monitoring) cropland definition. The JECAM network has adopted a shared cropland definition that defines the annual cropland as a piece of land of minimum 0.25 ha (minimum width of 30 m) that is sowed/planted and harvestable at least once within the 12 months after the sowing/planting date. The annual cropland produces an herbaceous cover and is sometimes combined with some tree or woody vegetation. There are three known exceptions to this definition. First, sugarcane plantations and cassava crops are included in the cropland class although they have a longer vegetation cycle and are not planted yearly. Second, taken individually, small plots such as legumes do not meet the minimum size criterion of the cropland definition. However, when considered as a continuous heterogeneous field, they should be included in the cropland. Third, greenhouse crops cannot be monitored by remote sensing and are thus excluded from the definition. This shared definition facilitates across-site comparisons in benchmarking activities as well as map integration.

Data and Metadata
In order to produce the best global cropland map, the fittest existing country-level cropland maps were identified by means of a multi-criteria analysis [21]. Four criteria were selected to evaluate the fitness of a cropland map: the adequacy of the legend, the spatial resolution, the timeliness and the confidence level (see Section 4). These fittest maps were resampled at 250 m in a reference grid with an average resampling method and then joined spatially into the Unified Cropland Layer (Figure 1). This implies that, wherever possible, the Unified Cropland Layer provides a cropland proportion rather than binary cropland/non-cropland information. Average aggregation was found to retain a greater amount of information than other statistical aggregation techniques [22] and provides more flexibility to the users. For products provided on an annual basis, e.g., the US Cropland Data Layer, 2014 was chosen as the reference year. Table 1 provides a summary of the main characteristics of the data set. It is provided as a single GeoTiff in a latitude/longitude grid of about 355 MB. The valid data range spans from 0 to 100, where 0 represents non-cropped pixels and 100 corresponds to fully cropped pixels.
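The average aggregation described above can be illustrated with a minimal sketch. It assumes a binary input grid (1 = cropland, 0 = non-cropland) whose dimensions are exact multiples of the aggregation factor; the function name `aggregate_to_proportion` and the toy grid are illustrative and are not part of the data set or its processing chain.

```python
def aggregate_to_proportion(binary_grid, factor):
    """Average-aggregate a binary cropland grid into percentages (0-100).

    Each output cell is the mean of a factor x factor block of input cells,
    expressed as a cropland proportion in percent.
    """
    rows, cols = len(binary_grid), len(binary_grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of the factor")
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            # Collect all fine-resolution cells that fall in this coarse cell.
            block = [binary_grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            out_row.append(round(100 * sum(block) / len(block)))
        out.append(out_row)
    return out


# Toy 4 x 4 fine-resolution map aggregated by a factor of 2.
fine = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
coarse = aggregate_to_proportion(fine, 2)
print(coarse)
```

A coarse cell that is three-quarters covered by cropland keeps the value 75 instead of being forced to 0 or 100, which is the extra information that a majority or nearest-neighbour rule would discard.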

Accuracy Assessment
The accuracy of the Unified Cropland Layer was assessed in two different ways: (1) with the receiver operating characteristic (ROC) curve and (2) with traditional confusion matrices. In both cases, three different and independent reference data sets were used: the consolidated GlobCover 2005 data set, the VIIRS (Visible Infrared Imaging Radiometer Suite) data set and the geoWiki data set (see Section 3.2).
As the Unified Cropland Layer provides continuous cropland proportion values, an ROC curve analysis was implemented. The ROC curve analysis for two classes (cropland/non-cropland) plots the true positive rate on the y-axis against the corresponding false positive rate on the x-axis for every possible cut-off cropland proportion [23]. Accuracy was measured by the area under the ROC curve (AUC), which provides a single measure of the probability that the classifier will rank a randomly-chosen positive instance higher than a randomly-chosen negative instance [24]. An area of 1 represents perfect accuracy; an area of 0.5 represents the performance of a random classification. Validation with the three sets shows that the AUCs range from 0.8 to 0.93 (Figure 2). The accuracy was further evaluated by means of confusion matrices (Table 2a-c) and their derived indices [25,26]: (i) the overall accuracy (OA), which expresses the probability that a randomly-selected pixel is classified accurately; (ii) the producers' accuracy, which is defined for a given class as the conditional probability that a pixel classified as category c by the reference data is classified as category c by the map; and (iii) the users' accuracy, which is the conditional probability that an area classified as category c by the map is classified as category c by the reference data. Validation with confusion matrices requires discrete inputs, i.e., cropland/non-cropland. A conservative cut-off was chosen: pixels with at least one percent of cropland were considered as cropland. According to the three validation data sets, the overall accuracy figures span from 84% to 94%. This is close to the similar (but at 1 km resolution) International Institute for Applied Systems Analysis-International Food Policy Research Institute (IIASA-IFPRI) map [27], and the overall accuracy is improved with respect to individual cropland products. While the producers' and users' accuracies range between 71% and 99% for the non-cropland class, large differences appear for the cropland class. This might be partially explained by the thematic mismatch between the JECAM and the validation set definitions, which penalizes the Unified Cropland Layer because it is semantically more restrictive. The highest accuracy figures for the cropland class are obtained with the VIIRS data set: 94% producers' accuracy and 80% users' accuracy. Overall, the validation exercise revealed an improved accuracy over individual cropland products. Figures from the GlobCover accuracy assessment might be too optimistic as this consolidated validation data set mainly focuses on areas with obvious interpretation (see Section 3.2 for more details). These figures, as well as the differences between the three accuracy assessments, still highlight the need for both improved spatial cropland information (especially in southeast Asia as well as western and southern Africa) and validation data. It should be noted that none of the validation data sets entirely fits the cropland definition of the Unified Cropland Layer, and the accuracy figures presented might be revised when using one that matches. However, no such global validation data set exists yet.
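The two accuracy measures used in this section can be sketched as follows. The sketch assumes paired lists of mapped cropland proportions (0 to 100) and binary reference labels (1 = cropland); the sample values are invented for illustration and are not taken from the validation data sets. The AUC is computed through its rank interpretation (ties count one half), and the confusion-matrix indices use the one-percent cut-off described above.

```python
def auc(scores, labels):
    """Area under the ROC curve from its rank interpretation.

    Returns the probability that a randomly chosen positive sample has a
    higher score than a randomly chosen negative one (ties count 0.5).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_indices(scores, labels, cutoff=1):
    """Overall, producers' and users' accuracy for the cropland class.

    Pixels with a cropland proportion >= cutoff are treated as cropland.
    Assumes at least one mapped and one reference cropland sample.
    """
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        mapped = 1 if s >= cutoff else 0
        if mapped and y:
            tp += 1
        elif mapped and not y:
            fp += 1
        elif not mapped and y:
            fn += 1
        else:
            tn += 1
    return {
        "overall": (tp + tn) / (tp + fp + fn + tn),
        "producers_cropland": tp / (tp + fn),  # reference cropland found by the map
        "users_cropland": tp / (tp + fp),      # mapped cropland confirmed by reference
    }


# Invented toy sample: mapped proportions and reference labels.
proportions = [0, 0, 5, 40, 80, 100, 0, 60]
reference = [0, 0, 0, 1, 1, 1, 1, 0]
print(auc(proportions, reference))
print(accuracy_indices(proportions, reference))
```

With a cut-off as low as one percent, any pixel with a trace of cropland counts as cropland, so commission errors (lower users' accuracy) are the expected cost of fewer omissions.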

Perspectives of Evolution
In the medium and long term, the Unified Cropland Layer will evolve with future map releases (e.g., with the complete Corine Land Cover 2012 coverage), changes in data policy and new contributions from different national and international institutions. Thus, the accuracy of the Unified Cropland Layer is expected to increase over time. However, without legend harmonization, multi-product maps will remain inconsistent by definition. The FAO Land Cover Meta Language (LCML) provides a robust theoretical framework for legend definition and would certainly play a key role in class definition harmonization.
To facilitate the identification and use of land cover products by the community, this study recommends systematically registering Earth Observation resources in the GEOSS Portal, which is a main entry point to Earth Observation data from all over the world and links a world-wide community of practice in nine societal benefit areas, among which agriculture. With new and upcoming high resolution satellites such as Landsat-8 and Sentinel-1 and -2, the number of high resolution land cover products is expected to increase and their accuracies to improve.

Land Cover and Cropland Maps
Global, regional and national data sets were identified through a systematic review that combined working sessions with key resource people and expert networks, a literature review and a web-based search. About 80 global, regional and national maps were identified (Table 3). It rapidly became necessary to distinguish the data sets that merely exist from those that are actually available: some existing data sets have a distribution policy that prevents their use, or their geo-referenced data source is difficult to access. Therefore, the general rule was to consider only the data sets that grant rights of reuse and are technically available.

Global Validation Data Sets and Ancillary Data
Four global validation data sets were collected for two purposes: (1) one for the assessment of the individual maps and (2) three for the validation of the unified map (Table 4). For the criteria assessment purpose, an independent set of well-distributed validation samples produced by Zhao et al. [44] was utilized. Two reasons motivated this choice: (1) it was the most populated data set, and (2) its legend was closer to the one used in this study than the other available validation data sets. It consists of a global point data set based on interpreting Landsat Thematic Mapper and Enhanced TM+ images for a total of 38,664 sample units pre-determined with an equal-area stratified sampling scheme. This was supplemented by MODIS enhanced vegetation index time series data and other high-resolution imagery on Google Earth. Regarding the validation purpose, the GlobCover 2005, the VIIRS and the geoWiki data sets were selected (Figure 3). The GlobCover 2005 validation data set was built relying on a network of experts familiar with image interpretation and land cover over large areas [12]. In total, 16 experts committed to interpreting validation samples through a dedicated working environment relying on Google Earth and also providing 10-day NDVI profiles to illustrate the seasonal dynamics. For a given sample, the expert saw not only the sample point but also a 225 ha box that coincided with a block of 5 × 5 MEdium Resolution Imaging Spectrometer (MERIS) pixels (MERIS being the sensor used to generate the GlobCover product). For each sample, the experts could describe up to three land cover types and had to provide their level of confidence. This data set counts 4258 samples, from which 500 samples were randomly selected and re-interpreted by a second independent expert to generate a consolidated data set. Only the samples with high confidence (186) were kept in this study, which might lead to an optimistic bias in the estimation of the map accuracy. The VIIRS Surface Type validation database is based on a stratified random sample of 500 5 × 5 km blocks [45,46]. The samples were extracted from strata derived from an intersection of a Köppen climate classification modified with a human population density layer. Samples were interpreted and labeled with very high resolution imagery. The allocation of samples within each stratum was targeted towards heterogeneous and complex land cover types that are more difficult to map. Recently, Fritz et al. [47] proposed a tool known as the geoWiki to collect volunteered geographic information on land cover from crowd-sourcing. The geoWiki Project capitalizes on a global network of volunteers who wish to help improve the quality of global land cover maps. The volunteers were asked to review hot-spot areas where existing global land cover maps disagree. They had to determine, based on what they actually saw in Google Earth and on their local knowledge, whether the land cover maps were correct or incorrect [48].
A global field size data set was also made available as a result of a geoWiki crowd-sourcing campaign [48] in which volunteers labelled the field size in four categories (very small, small, medium and large) based on very high resolution images and on a reference 1 × 1 km square box [27]. Crowd-sourced observations were then interpolated within the cropland using an inverse distance approach.
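The inverse distance interpolation mentioned above can be sketched as follows. The source does not give the details of the interpolation, so the distance exponent (`power=2.0`), the planar coordinates and the numeric coding of the four field size categories (1 = very small to 4 = large) are all assumptions made for illustration.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples: list of (x, y, value) tuples, here crowd-sourced field size
    codes. Closer samples receive larger weights (1 / distance**power).
    """
    num = 0.0
    den = 0.0
    for xs, ys, value in samples:
        d = math.hypot(x - xs, y - ys)
        if d == 0.0:
            # The query point coincides with a sample: return it directly.
            return float(value)
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Hypothetical observations: a "very small" field at (0, 0), a "medium" one at (2, 0).
observations = [(0.0, 0.0, 1), (2.0, 0.0, 3)]
print(idw(observations, 1.0, 0.0))  # midway between the two observations
```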

Methodology
A cropland map is fit for use only if it satisfies several criteria at once: the adequacy of its legend and of its spatial resolution, its timeliness and its confidence level. Therefore, the assessment of the cropland products must consider these four criteria. To handle these four dimensions, a multi-criteria analysis was designed. A multi-criteria analysis is a particularly powerful tool for combining the conflicting objectives described by different data sources into a single index [21] in order to support decision making and priority analysis [49]. The general outline of the methodology included the following steps:

• Constructing a spatial information data base;
• Translating the criteria into scores;
• Defining the weight of each criterion;
• Aggregating the criteria in the output index and selecting the product that maximizes this index.

Four criteria have been selected to evaluate the fitness of the cropland maps: the adequacy of the current legend (ThC), the adequacy of the spatial resolution (RC), the timeliness (TiC) and the confidence level (CC). The scores were attributed following a default rule (Table 5) and were then reinterpreted by experts to ensure consistency with their experience and/or visual assessment. The four criteria were finally aggregated into a single fitness indicator (FI) before being again reinterpreted by the experts. The fitness indicator FI of cropland map j for a country i is computed from the scores of these four criteria. Low scores indicate low fitness with regard to the four prerequisites of a good cropland mask. The Unified Cropland Layer is the spatial combination of the fittest national maps, i.e., the products which maximize the fitness indicator. For more details about the methodology, the reader is referred to Waldner et al. [50].
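The last two steps of the outline (weighting the criteria and selecting the product that maximizes the index) can be sketched as follows. The exact aggregation formula is not reproduced in this text, so the weighted arithmetic mean used here is an assumption, as are the equal weights, the map names and the scores; Waldner et al. [50] should be consulted for the actual method.

```python
def fitness_indicator(scores, weights):
    """Hypothetical aggregation: weighted mean of the four criterion scores."""
    total_weight = sum(weights.values())
    return sum(weights[k] * scores[k] for k in weights) / total_weight


def fittest_map(candidates, weights):
    """Return the name of the candidate map with the highest indicator."""
    return max(candidates, key=lambda name: fitness_indicator(candidates[name], weights))


# Assumed equal weights; the study defines its own weights.
weights = {"ThC": 1.0, "RC": 1.0, "TiC": 1.0, "CC": 1.0}

# Invented scores (1 = poor, 4 = good) for two maps covering one country.
candidates = {
    "map_a": {"ThC": 4, "RC": 2, "TiC": 3, "CC": 3},
    "map_b": {"ThC": 3, "RC": 4, "TiC": 4, "CC": 3},
}
best = fittest_map(candidates, weights)
print(best, fitness_indicator(candidates[best], weights))
```

Repeating this selection country by country and mosaicking the winners gives the spatial combination described above.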

Thematic Consistency Criterion
This criterion evaluates the consistency between the cropland definition of the individual product and that of the Unified Cropland Layer. Indeed, as there is no broad agreement on the cropland definition, a variety of definitions can be found in the individual products, which match that of the Unified Cropland Layer to various degrees. Even if the JECAM definition appears as the most appropriate, a more pragmatic definition was adopted to cope with the current limitations of global land cover legends. The definition was thus modified as follows: "the cropland is a specific area occupied by an herbaceous crop under permanent or fallow cultivation period (including active shifting cultivation fields)". To evaluate the thematic distance to the proposed definition, it was compared to other definitions on the basis of a set of binary criteria, i.e., presence/absence of a given component. The components were the following:
1. Absence of woody crops (WC);
2. Presence of fallows and bare fields (FB);
3. Absence of managed pasture and meadows (MPM).
If, for a given product, a criterion is met, this product scores 1, and 0 otherwise. The individual WC, FB and MPM scores are then summed to obtain the final score of the thematic consistency criterion (ThC):

$$ThC = WC + FB + MPM$$

To allow compatibility with the three other criteria TiC, RC and CC, this score, originally ranging from 0 to 3, is reclassified between 1 and 4 (Table 5a).
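As a minimal sketch of this scoring, the three binary components can be summed and rescaled. The reclassification rule of Table 5a is not reproduced in this text, so the simple shift by one (0 to 3 becomes 1 to 4) is an assumption, and the function name is illustrative.

```python
def thematic_criterion(no_woody_crops, includes_fallows, no_managed_pasture):
    """Score the thematic consistency of a product's cropland definition.

    Each argument is True when the corresponding component (WC, FB, MPM)
    is met. The raw sum (0-3) is shifted to 1-4; this shift is an assumed
    stand-in for the reclassification of Table 5a.
    """
    raw = int(no_woody_crops) + int(includes_fallows) + int(no_managed_pasture)
    return raw + 1


# A legend that excludes woody crops and pastures but ignores fallows.
print(thematic_criterion(True, False, True))
```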

Timeliness Criterion
The timeliness criterion (TiC) characterizes the number of years elapsed since the epoch of a map. It is computed as the difference between the reference year of the Unified Cropland Layer (2014) and the epoch of a given map:

$$TiC = 2014 - t_p$$

where $t_p$ is the reference epoch of the map. The differences are then reclassified in four groups, to each of which a score between 1 and 4 is assigned (Table 5b).
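A minimal sketch of the timeliness scoring follows. The class limits of Table 5b are not reproduced in this text, so the break points used here (2, 5 and 10 years) are purely hypothetical placeholders, exposed as a parameter so that the actual limits can be substituted.

```python
REFERENCE_YEAR = 2014  # reference year of the Unified Cropland Layer


def timeliness_criterion(map_epoch, breaks=(2, 5, 10)):
    """Score a map from 4 (recent) to 1 (old) from its age in years.

    breaks are HYPOTHETICAL age limits standing in for Table 5b: the score
    drops by one for each limit that the age of the map exceeds.
    """
    age = REFERENCE_YEAR - map_epoch
    if age < 0:
        raise ValueError("map epoch is later than the reference year")
    score = 4
    for limit in breaks:
        if age > limit:
            score -= 1
    return score


print(timeliness_criterion(2011))  # a three-year-old map
```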

Resolution Adequacy Criterion
The spatial resolution required to accurately map the cropland of a given area is a function of the field size, the landscape fragmentation and, to some extent, the crop diversity. Areas with small adjacent parcels but low crop diversity tend to behave similarly to large fields, whereas a higher spatial resolution would be required in areas with similar field size but higher crop diversity. Here, only the field size was taken as a proxy to derive the spatial resolution requirements. Country-level field size histograms were derived from the global field size data set. It was assumed that the adequate field size was the one that allows 75% of the largest fields to be mapped. In practice, depending on the distribution, this resolution would tend to fit more than 75% of the cropland extent. The extracted resolution was then related to the GEOGLAM spatial resolution requirements (Table 6), which define the required resolution for cropland mapping for different field sizes [51]. The number of categories separating the field size class and the class that meets the requirements provided the final resolution criterion value (Table 5c).
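One possible reading of the 75% rule is sketched below: field size classes are accumulated from the largest to the smallest until they cover at least 75% of the observations, and the class reached is taken as the target field size. This interpretation, the function name and the histogram counts are assumptions; the link to the GEOGLAM requirements (Table 6) is not reproduced here.

```python
# The four crowd-sourced field size categories, ordered from largest to smallest.
FIELD_SIZE_CLASSES = ["large", "medium", "small", "very small"]


def target_field_size_class(histogram, share=0.75):
    """Return the smallest class needed to cover `share` of the largest fields.

    histogram maps a class name to its number of observations in a country.
    """
    total = sum(histogram.values())
    if total == 0:
        raise ValueError("empty histogram")
    cumulative = 0
    for name in FIELD_SIZE_CLASSES:
        cumulative += histogram.get(name, 0)
        if cumulative >= share * total:
            return name
    return FIELD_SIZE_CLASSES[-1]


# Invented country-level histograms.
large_field_country = {"large": 50, "medium": 30, "small": 15, "very small": 5}
smallholder_country = {"large": 10, "medium": 20, "small": 40, "very small": 30}
print(target_field_size_class(large_field_country))
print(target_field_size_class(smallholder_country))
```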

Table 6. Linking the observed field size by crowd-sourcing to GEOGLAM's spatial resolution requirements.

Confidence Level Criterion
Prior to the confidence level assessment, it was necessary to harmonize the legend of both the maps and the validation data sets. The legend of each data set was thus translated into binary legends that fit most of the legends chosen for this research. Then, the confidence level of the maps was assessed by means of confusion matrices obtained from the Zhao et al. [44] reference data set. The overall accuracy was then derived and reclassified into four confidence level categories (Table 5d).

User Notes
The cropland definition used for the Unified Cropland Layer was a pragmatic one, still constrained by the current diverse definitions. Principally, the pragmatic approach was to use the level of thematic detail that is generally achieved with land cover maps at medium to coarse resolution, i.e., multi-annual cropland that includes the fallow sections. However, users should bear in mind that the products discard the fallow sections and the permanent crops whenever possible.
The crop proportions provided by the Unified Cropland Layer might be affected by some geometric uncertainty that would affect the average aggregator when going to 250 m pixels. Successive resamplings might also result in artifacts, especially in areas where the spatial resolution of the [...] It is expected that the accuracy of the fractional estimates varies depending on the original resolution of the data product.

Figure 1. The Unified Cropland Layer at 250 m for 2014.

Figure 3. Distribution of the reference samples used for the validation of the 250 m Unified Cropland Layer.

Table 6 column headers: Field Size | GEOGLAM Field Size (ha) | GEOGLAM Resolution Requirements (m)

Table 1. Data set characteristics of the Unified Cropland Layer 2014.

Table 2. Accuracy assessments of the Unified Cropland Layer of 2014 with the three different validation data sets.

Table 3. Input maps for the analysis.

Table 4. Validation data sets collected, their geometries and percentage of cropland samples.

Table 5. Default rules of the four components considered in the multi-criteria analysis.