Harmonizing and Combining Existing Land Cover/Land Use Datasets for Cropland Area Monitoring at the African Continental Scale

Mapping cropland areas is of great interest in diverse fields, from crop monitoring to climate change and food security. Recognizing the value of a reliable and harmonized crop mask that entirely covers the African continent, the objectives of this study were to (i) consolidate the best existing land cover/land use datasets, (ii) adapt the Land Cover Classification System (LCCS) for harmonization, (iii) assess the final product, and (iv) compare the final product with two existing datasets. Ten datasets were compared and combined through an expert-based approach in order to create a map of cropland areas at 250 m covering the whole of Africa. The resulting cropland mask was compared with two recent cropland extent maps at 1 km: one derived from MODIS and one derived from five existing products. The accuracy of the three products was assessed against a validation sample of 3,591 pixels (2,942 pixels of 1 km regularly distributed over Africa and 649 pixels of 250 m mainly covering Niger and northern Nigeria), interpreted by experts from high resolution images using the Geo-Wiki tool. The comparison shows that the resulting crop mask agrees better with the expert validation dataset than the existing products do, in particular where cropland represents more than 30% of the area of the validation pixel.


Introduction
Mapping cropland areas is of great interest in diverse fields, from crop monitoring to climate change and food security. Indeed, any decision-making process related, directly or indirectly, to agriculture requires precise and accurate crop extent maps in order to quantify and spatially characterize the role played by the human activity that covers the largest extent of the Earth's surface. However, in Africa, estimating cropland extent from remote sensing remains a challenge. Several reasons can be put forward: the heterogeneous nature of agriculture, differences in crop cycles, the spatial structure of the landscape (parcel size), the spectral similarity with grassland, mainly in arid and semi-arid areas, cloud cover during the growing season, and inter-annual variability caused by climatic events such as droughts and by fallow practices.
A plethora of methods and input data have been tested and used to estimate cropland extent on the continent. (i) Landsat images were used to derive cropland maps, such as the Cropland-Use Intensity (CUI) dataset (United States Geological Survey (USGS), 1986-1988) and the Africover dataset [1]. These spatially detailed maps are, in general, coherent with national statistics [2], but their spatial coverage is limited. Moreover, these datasets cannot be regularly updated with the methodology commonly used to produce them (i.e., visual interpretation). These issues have been addressed by (ii) the global map of cropland extent at 250 m produced using multi-year Moderate Resolution Imaging Spectroradiometer (MODIS) data [3]. However, its global scope does not correctly account for the specificities of some regions of the globe, such as Africa. (iii) Other global maps specifically dedicated to croplands were produced by the International Water Management Institute (IWMI): the global map of rainfed cropland areas (GMRCA) [4] and the global irrigated area map (GIAM) [5]. However, their coarse spatial resolution (10 km) is not suitable for national and regional applications, and they present a large number of uncertainties [6]. The same resolution problem affects (iv) the 10 km cropland mask produced by [7], which is dedicated to agricultural lands and combines two satellite-derived land cover maps, i.e., Boston University's MODIS-derived land cover product [8] and the Global Land Cover 2000 (GLC2000) data set [9], with an agricultural inventory. (v) More recently, existing land use/land cover datasets were combined based on expert knowledge and national statistics to produce a probability map of cropland areas [10]. However, this product is notably based on global land cover products (i.e., GLC2000, MODIS Land Cover, and GlobCover [11]) that do not focus on cropland areas, and the spatial resolution of their input remote sensing data (from 300 m to 1 km) is not adapted to mapping cropland areas. Moreover, the map is highly dependent upon the reliability of national and sub-national statistics [10].
To improve the estimation of cropland areas at the continental scale, two complementary approaches should be considered: (i) selecting the best available national and sub-national cropland maps, derived when possible from high resolution images, and combining them into a continental product; and (ii) developing procedures that use high (~30 m) and moderate (~250 m) resolution satellite data at the national and sub-national scale to update outdated cropland maps and to create higher resolution products where only global maps exist. For the second approach, national and international initiatives already exist: USGS in West Africa for the year 2000 [12] and the Global Land Cover Network (GLCN) launched by the Food and Agriculture Organization of the United Nations (FAO) and the United Nations Environment Programme (UNEP), notably in Senegal for the year 2005 [13]. Other noteworthy initiatives are described in [14][15][16][17]. The first approach is not very common: up to now, only one initiative combining regional and global datasets has been proposed for Africa [11], although similar approaches have been applied elsewhere, for example in Europe with Corine Land Cover [18]. The general objective of this study falls within the first approach: it aims to combine the best existing land cover/land use datasets derived from medium resolution imagery, using a common legend based on the Land Cover Classification System (LCCS) developed by the FAO [19]. Ten datasets are harmonized and combined through an expert-based approach to derive a map of cropland areas at 250 m for Africa.
The accuracy of the resulting cropland mask is compared with that of two recent cropland extent maps at 1 km, one derived from MODIS [3] and the other derived from five existing products [11], using a validation sample of 3,591 pixels (2,942 pixels of 1 km regularly distributed over Africa and 649 pixels of 250 m mainly covering Niger and northern Nigeria) interpreted using high resolution images. The validation datasets were collected through the crowdsourcing land cover validation tool Geo-Wiki, which provides a platform for the interpretation of high resolution images on Google Earth (GE) [20].
The land use/land cover (LULC) 2000 datasets produced by USGS cover nine countries: Benin, Burkina Faso, Ghana, Guinea, Guinea-Bissau, Mali, Mauritania, Niger, and Togo. The MODIS-JRC (Joint Research Centre, Monitoring Agricultural ResourceS (MARS) Unit, EU) crop mask was derived from MODIS time series for the year 2009 over northern Nigeria and Benin [15]. It is worth noting that these datasets differ in terms of data source, resolution, methodology, geographical extent, and time interval used for deriving the land cover/land use dataset.

Validation Datasets and the Geo-Wiki Tool
The validation datasets used in this study are based on the interpretation of high resolution images on Google Earth (GE) through the crowdsourcing land cover validation tool Geo-Wiki [20]. Geo-Wiki has built up a global network of volunteers who help improve the quality of global land cover maps. Volunteers are asked to determine the land cover type from a simple legend, based on Google Earth imagery and their local knowledge. Their input is recorded in a database, along with uploaded photos, which are currently being used to create an improved global land cover map. Several Geo-Wiki variants are available, each focusing on different land cover types (e.g., biomass.geo-wiki.org, urban.geo-wiki.org [22]), including cropland (http://agriculture.geo-wiki.org). In the agricultural version of Geo-Wiki, users are asked to provide the percentage of cropland that they can see in the Google Earth imagery at intervals of 10% (0, 1-10, …, 90-100), as well as the confidence associated with their assessment. In addition, users can specify whether a high resolution image was available and its acquisition date.
Two datasets were used: (i) 2,942 pixels of 1 km resolution, located at each latitude/longitude intersection point across Africa and validated by experts from the International Institute for Applied Systems Analysis (IIASA); and (ii) 649 pixels of 250 m resolution, spaced every 0.36 degrees and covering mainly Niger and northern Nigeria, interpreted by experts from the Monitoring Agricultural Resources (MARS) Unit of the JRC, European Commission.
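Several of the analyses below group the interpreted percentages into these 10% categories. The following minimal sketch shows one way to bin a percentage into Geo-Wiki-style labels; the exact bin edges (for instance, whether exactly 10% falls in "1-10" or "10-20") are our assumption, since the tool's internal convention is not stated here.

```python
import math


def geowiki_category(pct: float) -> str:
    """Map a cropland percentage (0-100) to a Geo-Wiki-style 10% category label.

    Assumed edges: 0 is its own class, then (0, 10], (10, 20], ..., (90, 100].
    """
    if not 0 <= pct <= 100:
        raise ValueError("percentage must lie within [0, 100]")
    if pct == 0:
        return "0"
    upper = int(math.ceil(pct / 10.0)) * 10  # upper edge of the 10% interval
    lower = upper - 10
    return "1-10" if lower == 0 else f"{lower}-{upper}"
```

For example, an interpretation of 15% cropland falls in "10-20" and one of 95% in "90-100".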

Methodology
The methodology follows three steps: (i) harmonization of the ten existing land cover/land use datasets in order to make them comparable, (ii) selection of the cropland classes to be used for the crop mask, and (iii) identification of the best product for each country in order to spatially combine them and derive a map of cropland areas at 250 m for Africa. This section describes these steps in more detail, as well as the validation of the resulting cropland mask.

Harmonization of the Datasets
The land cover/land use products used in this study have different sources and, consequently, different projections, formats, resolutions, and legends. The following processing steps were therefore applied prior to combining the datasets:
1. Legend harmonization: Given the heterogeneity of the products, this is the most critical step in the process. The Land Cover Classification System (LCCS) developed by the FAO aims to analyze and cross-reference regional differences in national land cover descriptions [19]. Some of the products used the LCCS in their development (Globcover, DRC map, Africover, GLCN and the MODIS-JRC dataset) while others did not. Because their aims differ, the products that did not adopt the LCCS also differ in how they define agriculture (i.e., land cover description, land use intensity). In this study, a common legend with five cropland classes in accordance with Globcover standards was adopted. These classes are described in Table 2 using the LCCS. All the cropland classes of each product were then mapped to this legend (Table 3). For the products that used the LCCS, the conversion of the legend was straightforward. For the CUI dataset, the conversion was more complex, since the product describes five levels of agricultural land use intensity (0-5%, 5%-30%, 30%-50%, 50%-70%, 70%-100%) derived from Landsat images from 1988, which no longer correspond to the current situation because of the intensification of agriculture since that period. A visual comparison of the product with recent high resolution imagery available on Google Earth was therefore required. Based on this expert visual analysis, the CUI classes were converted as follows (Table 3): Levels 1 and 2 of the CUI (50%-100%) become "Cultivated and managed areas (70%-100%)"; Level 3 (30%-50%) becomes "Mosaic cropland (50%-70%)/vegetation"; Level 4 (5%-30%) becomes "Mosaic vegetation/cropland (20%-50%)"; and Level 5 (0-5%) is considered as natural vegetation.
2. Conversion from feature to raster: Vector datasets such as Africover, CUI and SADC were converted into a 250 m resolution raster using the "maximum area" criterion, i.e., the feature with the largest area in the cell yields the attribute assigned to that cell.
3. Reprojection: Datasets using other projections (the CUI dataset, the LULC2000 product and the land cover of Mozambique) were reprojected to the Geographic projection (WGS84).
4. Geometric correction: A spatial shift was observed between the CUI and the other products and corrected accordingly.
5. Resampling: All datasets were resampled to a 250 m resolution, which is compatible with MODIS time series, the highest resolution images used for the monitoring of agriculture in Africa.
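The "maximum area" criterion of step 2 can be illustrated for a single output cell with the minimal sketch below. It assumes the overlap areas between the vector features and the 250 m cell have already been computed (e.g., by polygon intersection); the data layout is hypothetical. Read literally, the rule picks the single largest feature rather than the class with the largest total area, and many off-the-shelf rasterizers instead assign a cell by its center point, so the choice of tool matters.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical layout: one (class_label, overlap_area_m2) tuple per vector
# feature intersecting the output cell.
Overlap = Tuple[str, float]


def maximum_area_label(overlaps: Sequence[Overlap]) -> Optional[str]:
    """Return the label of the single feature covering the largest part of the cell.

    Ties keep the first feature listed (an arbitrary choice); a cell that no
    feature intersects gets None (no data).
    """
    best_label, best_area = None, 0.0
    for label, area in overlaps:
        if area > best_area:
            best_label, best_area = label, area
    return best_label
```

In a 62,500 m² cell overlapped by cropland features of 20,000 m² and 17,500 m² and a forest feature of 25,000 m², this rule yields forest even though cropland covers more of the cell in total; summing areas per class first would be a reasonable variant if that behavior is undesirable.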

Selection of Cropland Classes
By default, all cropland classes with a majority of cropland (>50%) were integrated into the crop mask. For classes with a minority of cropland (20%-50%), several experts carried out a visual analysis, comparing the dataset with high resolution images on GE. Based on this analysis, this class was integrated into the final cropland mask for the Globcover dataset only. Indeed, in some equatorial countries, where only mosaic classes are available and/or where cropland areas are mixed with forest, this class was retained in order to avoid underestimating cropland areas.
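The selection rules above, together with the CUI conversion of step 1, can be condensed into a small decision function. This is a minimal sketch: the class names are hypothetical shorthand for three of the harmonized classes (the full legend is in Table 2), and restricting the 20%-50% class to equatorial countries reflects our reading of the rule applied to the Globcover-based mask.

```python
from typing import Optional

# Hypothetical shorthand for three of the harmonized cropland classes.
CULTIVATED = "cultivated_70_100"          # cultivated and managed areas (70%-100%)
MOSAIC_CROP = "mosaic_cropland_50_70"     # mosaic cropland (50%-70%)/vegetation
MOSAIC_VEG = "mosaic_veg_cropland_20_50"  # mosaic vegetation/cropland (20%-50%)

# CUI intensity level -> harmonized class (None = natural vegetation).
CUI_TO_HARMONIZED = {
    1: CULTIVATED,   # 70%-100%
    2: CULTIVATED,   # 50%-70%
    3: MOSAIC_CROP,  # 30%-50%
    4: MOSAIC_VEG,   # 5%-30%
    5: None,         # 0%-5%
}


def in_crop_mask(harmonized_class: Optional[str], source: str,
                 equatorial: bool = False) -> bool:
    """Return True if a pixel of this harmonized class enters the crop mask."""
    if harmonized_class in (CULTIVATED, MOSAIC_CROP):
        return True  # majority cropland (>50%): included by default
    if harmonized_class == MOSAIC_VEG:
        # Minority class: kept only for Globcover in equatorial countries.
        return source == "Globcover" and equatorial
    return False  # natural vegetation or any non-cropland class
```

Under these rules, a CUI level 3 pixel enters the mask while a CUI level 4 pixel does not, because the 20%-50% class is only retained for Globcover.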
For countries where information on irrigated crops was available, irrigated and rainfed crops were distinguished. Three crop masks were therefore produced: (i) one with both irrigated and rainfed cropland areas, (ii) one with irrigated crops only, and (iii) one with rainfed crops only.

Spatial Combination of the Datasets
As the datasets overlap, a priority ranking was determined in order to combine them. Datasets were compared using high resolution images (GE), and the best product was selected on a case-by-case basis. Where the choice was not obvious to the expert(s) because the products were similar, the following rules were applied: priority was given to the product with the highest spatial resolution; if two products had the same resolution, priority was given to the most recent one.
In some cases, when two datasets were complementary, the cropland classes of both products were spatially combined.
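The fallback rules and the combination of complementary datasets can be sketched as follows. The expert judgment that precedes them cannot be coded and is left out; the `Candidate` record, its fields and the example values are hypothetical, and the masks are assumed to be already harmonized onto the same 250 m grid and flattened to lists.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    """Hypothetical record for one land cover product competing for a region."""
    name: str
    resolution_m: float  # nominal spatial resolution in meters (smaller = finer)
    year: int            # reference year (e.g., last acquisition year)


def fallback_choice(candidates: Sequence[Candidate]) -> Candidate:
    """Apply the tie-break rules: finest resolution first, then most recent."""
    return min(candidates, key=lambda c: (c.resolution_m, -c.year))


def combine_complementary(masks: Sequence[Sequence[bool]]) -> List[bool]:
    """Union of binary crop masks on a common grid: crop if any input flags crop."""
    return [any(cells) for cells in zip(*masks)]
```

With two Landsat-based products of equal resolution, the rule reduces to choosing the more recent one; the union is one simple way to merge products whose omissions differ in nature.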

Validation
The validation is based on two samples, one with 2,942 pixels of 1 km covering Africa and another, partially covering Niger and Nigeria, with 649 pixels of 250 m. These samples were visually interpreted by experts using high resolution images through the crowdsourcing land cover validation tool Geo-Wiki [20]. Three products were validated against this reference: (i) the crop mask obtained with the methodology described above, hereafter referred to as MARS-JRC, and two recent cropland products at 1 km spatial resolution, namely (ii) the global cropland extent map derived from MODIS [3] and (iii) the cropland extent product of Africa derived from five existing products [11]. The last two datasets provide the percentage of cropland, while the first is binary. The joint validation of these three products aims to evaluate the improvement, if any, brought by the MARS-JRC crop mask. For both validation datasets, the three cropland extent products were assessed based on (i) the percentage of pixels detected as crop for each category of crop coverage intensity, and (ii) the omission and commission errors for three aggregations of crop coverage intensity, i.e., 1%-100%, 30%-100%, and 50%-100%. In addition, the overall accuracy of the MARS-JRC product was computed per country.
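For a binary mask such as MARS-JRC, the omission and commission errors for one aggregation can be computed as in the sketch below. It assumes the standard definitions (omission = missed reference crop pixels / all reference crop pixels; commission = false crop detections / all crop detections) and treats a reference pixel as crop when its percentage reaches the aggregation's lower bound. How the percentage-valued products were binarized is not stated here and is not modeled.

```python
import math
from typing import Dict, Sequence


def omission_commission(ref_pct: Sequence[float], predicted: Sequence[bool],
                        threshold: float) -> Dict[str, float]:
    """Omission, commission and overall accuracy of a binary crop mask.

    A reference pixel counts as crop when its interpreted cropland percentage
    is >= threshold (1, 30 or 50 for the 1%-100%, 30%-100% and 50%-100%
    aggregations). Undefined ratios are returned as NaN.
    """
    tp = fn = fp = tn = 0
    for pct, pred in zip(ref_pct, predicted):
        is_crop = pct >= threshold
        if is_crop and pred:
            tp += 1  # crop correctly detected
        elif is_crop:
            fn += 1  # crop missed (omission)
        elif pred:
            fp += 1  # non-crop flagged as crop (commission)
        else:
            tn += 1  # non-crop correctly excluded
    n = tp + fn + fp + tn
    return {
        "omission": fn / (tp + fn) if tp + fn else math.nan,
        "commission": fp / (tp + fp) if tp + fp else math.nan,
        "overall_accuracy": (tp + tn) / n if n else math.nan,
    }
```

Running the same function per country on the subset of validation pixels in that country would give the per-country overall accuracy.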
Finally, a limited sample of 179 pixels covering Niger and Nigeria was used to assess the extent of the discrepancies between experts. In each of these randomly selected pixels, the cropland coverage was estimated by two experts. This step of the analysis is important for understanding the scope of validation in a crowdsourcing environment.
The two validation datasets are publicly available for download at the following address: ftp://mars.jrc.ec.europa.eu/Public/cropmask/.

Crop Mask and Sources
The resulting crop mask is delivered in Geographic projection (WGS84) at a spatial resolution of 0.00208333 degrees (approximately 232 m at the equator). It is publicly available for download at the following address: ftp://mars.jrc.ec.europa.eu/Public/cropmask/.
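The ground size of a 0.00208333-degree pixel can be checked with a short calculation. This sketch uses the WGS84 equatorial radius and takes a·cos(latitude) as the radius of the parallel, a spherical approximation of our own choosing; it gives about 232 m east-west at the equator and shows that pixels narrow toward higher latitudes.

```python
import math

WGS84_A = 6378137.0  # WGS84 semi-major (equatorial) axis, in meters


def ew_pixel_width_m(pixel_deg: float, lat_deg: float = 0.0) -> float:
    """Approximate east-west ground width of a pixel spanning pixel_deg degrees.

    Uses a*cos(latitude) as the radius of the parallel, a spherical
    approximation that ignores ellipsoidal flattening (error well below 1%).
    """
    return math.radians(pixel_deg) * WGS84_A * math.cos(math.radians(lat_deg))


print(round(ew_pixel_width_m(0.00208333)))        # 232 (equator)
print(round(ew_pixel_width_m(0.00208333, 35.0)))  # 190 (about 35 degrees N or S)
```

Near the northern and southern extremes of Africa, the same pixel is therefore roughly 185 to 190 m wide east-west.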
Figure 1 shows the source used for each pixel as well as the number of datasets that were available (Globcover included). Of the 44 countries with significant cropland areas, 23 are covered by a dataset derived from Landsat images, three by regional land cover products, and 18 by a global land cover product (Globcover). For eight countries (Burkina Faso, Chad, Mali, Mauritania, Mozambique, Niger, Tanzania and Zimbabwe), two or three Landsat-based products were available, so the different cropland maps had to be compared carefully. Based on expert knowledge and high resolution images (GE), either the best product was selected or the classes derived from different products were spatially combined.
For Mozambique, three products were available (Figure 2): the Cropland Use Intensity dataset (USGS 1986-1988), the SADC land cover database (CSIR, South Africa) and the land cover map of Mozambique (National Directorate of Land and Forests (DNTF)). According to the experts and the visual analysis with GE, the SADC product missed cropland areas, mainly in the Nampula region (North-East) and in the Manica region (from Manica to Chimoio, center of the country), which disqualified it. However, the choice between the two other products was not obvious: an overestimation of cropland areas was observed in the CUI dataset, whereas some crops were missing in the DNTF dataset, in particular in the regions of Tete and Nampula. As neither product seemed better than the other (the true extent probably lies somewhere between the two), priority was given to the most recent product, i.e., the DNTF dataset.
For the West African countries covered by both the CUI dataset and the LULC 2000 (Burkina Faso, Chad, Mali, Mauritania, and Niger), the cropland classes of both products were found to miss important cropland areas. Fortunately, a meticulous analysis of both products showed that their omissions differ in nature and appear to be complementary. In the CUI dataset (1986-1988), the images used are outdated, especially for agriculture, as cultivated areas expanded considerably in these regions during the last 20 years. In the LULC dataset, omission errors were also observed, most likely due to the coarse resolution (2 km) of the available product (the original product is at 30 m but is not available). Both datasets were therefore combined, yielding a better product. Figure 3 shows an example of this spatial combination for South-West Niger.
In Tanzania, the Africover dataset was preferred to the SADC dataset, mainly because important cropland areas were missing in the SADC product, notably in the region of Dodoma between Lake Rukwa and the border with Zambia (South-West), and in the North-East of Lake Tanganyika close to the border with Burundi. For Zimbabwe and Malawi, based on comparison with high resolution images (GE), the SADC map was preferred to the CUI dataset, as omission and commission errors were observed in the latter. For instance, in Malawi, few cropland areas were identified in the Lilongwe districts (around the capital), whereas GE imagery clearly shows cultivated areas. Conversely, the Dzalanyama Forest (west of Lilongwe) and the Majete Wildlife Reserve (in the South) were partially identified as crops.
In the Democratic Republic of the Congo (DRC), the Africover dataset seemingly misses vast cropland areas. Some of these areas are labeled as pure and mixed cropland classes in the DRC map based on SPOT VEGETATION time series [14]; they were therefore combined with the cropland areas of the Africover dataset.
For the countries where Globcover and only one Landsat-based product were available, the Landsat-derived product was selected. These countries are (i) Burundi, Egypt, Eritrea, Kenya, Rwanda, Somalia, Sudan, and Uganda, covered by Africover, (ii) South Africa, covered by the SADC dataset, (iii) Senegal, covered by the GLCN dataset, (iv) Ethiopia, covered by the Woody Biomass product, and (v) Zambia, covered by the CUI dataset.
In Nigeria, Globcover and another product derived from MODIS data (MODIS-JRC [15]) were available. The latter was preferred because of (i) its better spatial resolution and (ii) its classification methodology, which focuses on agricultural lands.
Finally, the crop mask derived from Globcover (classes 10 and 20, plus class 30 for the equatorial countries) was used for the remaining countries.

Qualitative Assessment
The first step in the validation of the three cropland products, i.e., MARS-JRC, the map produced by Pittman et al. [3] (hereafter referred to as Global Croplands) and the beta version of the IIASA African cropland product of Fritz et al. [11] (hereafter referred to as the IIASA product), is an expert visual assessment assisted by the high resolution images available on GE. For this purpose, vast territories randomly distributed throughout Africa were visually analyzed, from which three regions, i.e., Niger/Nigeria, Sudan, and South Africa (Figure 4), were selected as the focus of this section. These regions are characterized by (i) substantial discrepancies between the three cropland maps, (ii) cultivated areas that can be identified with high confidence using GE imagery, and (iii) a wide range of agricultural intensity, from extensive farming in Sudan to intensive farming in South Africa. The observed differences between the three products are as follows. For Nigeria (1) and Sudan (2), the IIASA product appears to be the inverse of the two other products with respect to what can be observed on GE images. For instance, in Nigeria, the west Tangaza Forest Reserve, characterized by a savannah land cover (A in Figure 4), has been classified as crops in the IIASA product, while the surrounding cropland areas have been classified as natural vegetation. The use in the IIASA product of Globcover and GLC2000, which do not focus specifically on agriculture, may explain these misclassifications. This is not observed in the other two products, which correctly exclude the savannah from the crop mask and include the surrounding croplands. In Sudan, both the IIASA product and the MARS-JRC product are based on the Africover dataset, so a result similar to the MARS-JRC crop mask would have been expected. However, the IIASA product misses cropland areas in the North and overestimates crop areas in the South. The Global Croplands and MARS-JRC products have similar patterns, especially for regions (1) and (2), but the Global Croplands product shows a lower percentage of crops. In Sudan, the MARS-JRC crop mask correctly depicts cropland areas south of the river, in particular the linear structures south of Umm Ruwaba, i.e., the succession of crop and fallow areas (B in Figure 4). These areas were not detected by the two other products.
In South Africa, the MARS-JRC and Global Croplands crop masks identify the Hoopstad region (light orange on GE, C in Figure 4), the richest maize-producing district in South Africa, whereas the IIASA product misses it. However, the savannah area south of Hoopstad (green on GE, D in Figure 4) and the cropland areas south of Christiana (dark orange, E in Figure 4) are poorly identified in the Global Croplands product, unlike in the MARS-JRC map and, partially, in the IIASA product. Overall, large discrepancies and sometimes inverted classifications were observed between the three products. The largest errors primarily result from the use of global land cover products, whose thresholds are optimized for multi-class identification problems rather than specifically for agriculture. Moreover, coarse resolution data, while suitable for global and continental products, perform worse than Landsat-derived datasets. Therefore, a single satellite image processing chain applied at the global scale, even one dedicated to agriculture like that of Global Croplands, cannot depict all the cropland areas of the globe with the same intensity, even if the spatial patterns are correct. Finally, calibration using national statistics may introduce errors even where the satellite-derived map is correct (e.g., Africover in Sudan).

Consistency between Experts
Before proceeding to the quantitative accuracy assessment, which takes the expert interpretations collected with the Geo-Wiki tool as reference, the extent of agreement between experts must be evaluated. Indeed, high accuracy levels can only be expected if the experts agree with one another. Otherwise, divergences between experts should be taken into account when interpreting the results of the accuracy assessment.
For the Niger-Nigeria validation window, the sampling was designed to provide an overlap of 180 pixels between two experts, both of whom were confident in their interpretation for 130 of these pixels. For the remaining 50 pixels, at least one of the experts was unsure or gave no interpretation. The percentage of agreement between the two experts was computed in two ways: (i) for each category of cropland percentage (% of agreement for 1 class), and (ii), given the difficulty of determining percentages in a 250 m box, by accepting confusions between neighboring percentage classes (% of agreement for 3 classes).
Figure 5 shows that higher agreement levels between experts are observed for the classes with no (0) or almost full (90%-100%) agricultural coverage, with 83.1% and 45.2% agreement, respectively. These classes are also the most commonly represented (together they include 78.5% of the assessed pixels), as a consequence of the usual clustering of agricultural areas. However, agreement strongly deteriorates for classes with partial agricultural coverage. Although the agreement estimates for those classes are quite unreliable, given that each has only 5 to 10 overlapping pixels, this can be taken as evidence of how difficult it is for a human interpreter to assess percentage coverage. This is confirmed by the increase in agreement when confusions between neighboring classes are not counted as errors (dark bars). Although higher agreement levels are still observed for classes with high or low agricultural coverage, the agreement increases significantly for all intermediate classes.
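The two agreement measures can be sketched as follows. The text does not spell out the denominator used for each category; this minimal sketch assumes it is the set of pixels that the first expert placed in that category, with categories coded as integer indices of the 10% classes.

```python
from collections import defaultdict
from typing import Dict, Sequence


def agreement_by_category(a: Sequence[int], b: Sequence[int],
                          tolerance: int = 0) -> Dict[int, float]:
    """Share of pixels where expert b agrees with expert a, per category of a.

    Categories are integer indices of the 10% classes (0 = no crop,
    1 = 1-10%, ..., 10 = 90-100%). tolerance=0 gives the "1 class" agreement;
    tolerance=1 also accepts a neighboring class ("3 classes").
    """
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for cat_a, cat_b in zip(a, b):
        totals[cat_a] += 1
        if abs(cat_a - cat_b) <= tolerance:
            hits[cat_a] += 1
    return {c: hits[c] / totals[c] for c in sorted(totals)}
```

Allowing a tolerance of one class mainly raises agreement in the intermediate categories, where small differences of judgment move a pixel into an adjacent class.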
Table 4 presents the agreements and disagreements between experts for each category. For 13 pixels, the disagreement is high, with a difference of at least 50% between the percentages given by the two experts. These results highlight the difficulty of mapping different agricultural land use intensities, even with labor-intensive procedures such as visual interpretation, in particular in semi-arid areas. Despite the use of high resolution imagery and visual interpretation, experts agree only on the extreme intensity values. When considering only two classes, presence (more than 50%) and absence (less than 30%), the consistency between experts is 65.1% and 70.2%, respectively, which is acceptable for the validation process if only these classes of low and high crop densities are considered (see the confusion matrices in Table 5).

Figure 5. The percentage of agreement between two experts for each category of crops, taking into account the category concerned only (% of agreement for 1 class, in grey) or also the neighboring classes (% of agreement for 3 classes, in black), and the number of pixels interpreted by both experts for each category (dashed grey line).
Table 4. Number of agreements between experts for each category of crops, and the percentage of agreement between both experts for each category, taking into account the category concerned only (% of agreement for 1 class) or also the neighboring classes (% of agreement for 3 classes).
The MARS-JRC product appears to identify crops markedly better than the other products, regardless of the threshold adopted. For example, the MARS-JRC product detects between 46.7% and 65.1% of the cropland areas, whereas the IIASA product detects between 32.8% and 39.4%. The Global Croplands product detects only 10.1% and 16.1% with the 30% and 50% thresholds, respectively, but 99.3% of the cropland areas when all crops from 1% to 100% are aggregated. This happens because the majority of the Global Croplands values lie between 1% and 10%. On the other hand, the IIASA and Global Croplands products have smaller commission errors, except for the Global Croplands product in the first category (1%-100%).

Although fewer validation pixels are available over the area covering Niger and Nigeria, some interesting results can be observed in the confusion matrices for the three aggregations (1%-100%, 30%-100%, 50%-100%) (Table 6). A 30% threshold was chosen for the "no crop" class, since cropland covering less than this fraction of a pixel is difficult to detect using medium resolution remote sensing. However, such low percentages do occur, particularly in areas of low population density and in areas of shifting cultivation. Future products based on the classification of Landsat imagery [17] should be able to detect these smaller-scale cultivation patterns, making it possible to lower this threshold for validation.
The IIASA crop mask performs better here than for Africa as a whole, with better percentages than the MARS-JRC product for the "crops 1%-100%" class and similar results for the second category (30%-100%). However, the MARS-JRC product identifies crops better than the other products for the third category (50%-100%) and always performs better for the "no crop" class. The Global Croplands product tends to underestimate the percentage of crops for the categories above 30% and 50% (9.8% and 3%), but fewer commission errors are observed.
interpreted as "crops (50%-100%)" by the validator, are lower than 30% for all the countries where the total number of samples (N) is lower than ten, except for four countries: Ivory Coast (32%), Burkina Faso (38%), Ghana (59%), and Uganda (64%).

Discussion
Generating an African cropland mask by combining the best available national and sub-national datasets entails a risk of spatial inconsistencies within the final product. Indeed, the effective spatial resolution, the thematic content and the methodologies adopted vary among datasets and, consequently, from region to region. In addition, inconsistencies may arise from differences in the acquisition dates of the input data. Finally, high uncertainties may arise from the use of the outdated CUI 1986-1988 in Burkina Faso, Chad, Mali, Niger and Zambia. Such caveats should be kept in mind when using the final product proposed here. However, several methodological steps were adopted to minimize these problems. First, inconsistencies were reduced by adopting a common legend and resolution. Second, priority was given to recent products in order to maximize temporal consistency, unless they had been shown to perform significantly worse than older ones. Since performance is assessed using recent high resolution images available on Google Earth, the approach automatically excludes out-of-date products and maximizes the temporal coherence of the final product. Finally, where the CUI 1986-1988 was selected, its cropland use intensity classes were adapted to current cropland use intensity using a visual analysis with GE (see Section 3.1) and combined with a recent dataset (LULC 2000) to add recently cultivated areas.
This being said, the presence of remaining inconsistencies and inaccuracies has been checked by comparing, for each country, the total cropland area of the final product with the figures of the "Arable land and permanent crops" class of FAOSTAT 2009 (Figure 8) and with the cropland areas derived from the IIASA and Globcover products (considering all the classes with more than 50% cropland). Since the extent of cultivated areas varies considerably between countries, mostly but not only because of differences in country size, a logarithmic scale in km² has been adopted in Figure 8. Although comparison with FAO statistics has its own caveats, given that their estimation methodologies may be inconsistent between countries and present non-negligible inaccuracies [7], it is still a valuable additional source of evidence for assessing the consistency of the final product. In this framework, the correlation between the FAOSTAT figures and the MARS-JRC crop mask is 0.59 for all countries and 0.78 when three obvious outliers (DRC, Botswana and Libya) are excluded; this has been taken as evidence of the spatial consistency of the final product. Compared to Globcover, the MARS-JRC and IIASA figures seem more consistent with FAOSTAT since they lie closer to the identity line. It is worth noting that a high correlation was expected for the IIASA product because it has been calibrated with FAO statistics. Consequently, the similar agreement obtained between MARS-JRC and FAOSTAT, although it was not an explicit objective of the product, can be considered additional evidence of the consistency of the product proposed here. The MARS-JRC crop mask tends to over-estimate the cropland areas for the majority of the countries (39 out of 47). As the over-estimation appears as a roughly constant offset on the logarithmic scale, it is proportional: it represents a roughly constant percentage of each country's cropped area rather than a constant absolute area.
The observed over-estimation can be explained by three factors. (i) Three datasets used in the crop mask (Globcover, CUI, and MODIS-JRC) contain mixed cropland classes comprising 50% to 70% cropland. When such a class is counted entirely as cropland, the cropland area is over-estimated by 43% (1/0.7 - 1) to 100% (1/0.5 - 1). (ii) Shifting cultivation is a common practice in Africa, and fallow areas are often counted as cropland when visually interpreting high resolution images. The impact of shifting cultivation on cropland extent is probably higher in equatorial areas than in arid and semi-arid areas, where fields are cropped for longer periods. (iii) Finally, the coarse spatial resolution of some datasets (250 m to 1 km) increases the possible confusion between cropland and natural vegetation at this scale and leads to the generation of mixed classes.
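The 43% to 100% range quoted for mixed classes, and the link between a constant offset on a logarithmic axis and a constant percentage error, can be checked with a few lines of arithmetic. This minimal sketch assumes, as the quoted range implies, that a mixed class is counted entirely as cropland; the function names are illustrative.

```python
import math


def overestimation_if_fully_counted(cropland_fraction):
    """Relative over-estimation when an area whose true cropland share is
    `cropland_fraction` is counted entirely as cropland."""
    if not 0 < cropland_fraction <= 1:
        raise ValueError("cropland_fraction must be in (0, 1]")
    return 1.0 / cropland_fraction - 1.0


def log10_offset(cropland_fraction):
    """Vertical shift this produces on a log10 area axis such as that of
    Figure 8; it does not depend on the country's total area."""
    return math.log10(1.0 / cropland_fraction)


for share in (0.5, 0.7):
    print(f"{share:.0%} cropland -> +{overestimation_if_fully_counted(share):.0%}, "
          f"log10 offset {log10_offset(share):.3f}")
# 50% cropland -> +100%, log10 offset 0.301
# 70% cropland -> +43%, log10 offset 0.155
```

Because the offset is the same whether a country has 1,000 km² or 100,000 km² of cropland, a systematic shift on the log-scale plot signals a proportional, not an absolute, bias.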
As mentioned above, three main outliers have been identified, which require further discussion. For two of these countries (Libya and Botswana), Globcover was used in the MARS-JRC product. However, the deviations go in opposite directions: cropland is over-estimated for Botswana and under-estimated for Libya. Two reasons could explain these outliers: (i) the reference data used for the automatic labelling in Globcover can lead to very different results from one place to another, depending on the quality and accuracy of the reference dataset used, and (ii) the inaccuracies inherent to FAOSTAT. For the DRC, the Africover dataset was not satisfactory as it seemingly misses vast cropland areas (see Figure 8). Therefore, it was combined with the DRC map derived from the interpretation of SPOT VEGETATION time series. Some of these areas are labeled as mixed cropland classes (mainly with grassland), which partly reflects the actual landscape but may cause over-estimation, notably due to the coarse resolution of the sensor and to the shifting cultivation practices that are more frequent in this equatorial region.
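The effect of the outliers on the FAOSTAT comparison (0.59 with all countries versus 0.78 without DRC, Botswana and Libya) can be illustrated with a plain Pearson correlation over per-country areas. This is a sketch under assumptions: the text does not say whether the correlation was computed on raw or log-transformed areas, so log10 areas (as plotted in Figure 8) are used by default, and the country keys and areas below are toy values, not FAOSTAT or MARS-JRC figures.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def area_correlation(reference, product, exclude=(), log_scale=True):
    """Correlate per-country areas (km2) from two sources, skipping the
    countries in `exclude`. Using log10 areas is an assumption."""
    countries = sorted(c for c in reference if c in product and c not in exclude)
    xs = [reference[c] for c in countries]
    ys = [product[c] for c in countries]
    if log_scale:
        xs = [math.log10(v) for v in xs]
        ys = [math.log10(v) for v in ys]
    return pearson(xs, ys)


# Toy data only: "X" plays the role of an outlier country.
reference = {"A": 1_000, "B": 10_000, "C": 100_000, "X": 50_000}
product = {"A": 1_500, "B": 15_000, "C": 150_000, "X": 500}
print(round(area_correlation(reference, product), 2))
print(round(area_correlation(reference, product, exclude={"X"}), 2))  # 1.0
```

In the toy data, the remaining countries are all over-estimated by the same 50%, so they sit on a line parallel to the 1:1 diagonal and correlate perfectly; a single outlier is enough to pull the coefficient down sharply.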
Although the objective of this study was not to estimate cropland areas, the comparison of cropland areas highlights the large differences between sources and the difficulty of estimating these areas accurately. These results confirm the need to improve cropland maps and cropland area estimates, notably by developing procedures at the national and sub-national scales to update outdated cropland maps and to create higher resolution products where only global maps exist.

Conclusion
Recognizing the value of a reliable and harmonized crop mask over Africa, the technical constraints on developing a product based on a single methodology for the entire continent, and the quality of several country-level products, this study aimed to optimally integrate the best available cropland datasets into a consolidated product. First, all existing land cover/land use datasets, at both the continental and country levels, have been identified and harmonized. Second, in areas where multiple products were available, they have been compared and combined through an expert-based approach. Finally, the derived map (MARS-JRC) at 250 m resolution has been presented and its quality has been assessed. Two validation samples have been used: one of 2,942 pixels covering the whole of Africa and another of 649 pixels partially covering Niger and Nigeria, both based on the expert visual interpretation of high resolution images using the Geo-Wiki tool. The results have been compared with two recent cropland products at 1 km spatial resolution [3,11] that went through the same assessment procedure.
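The harmonize-then-combine workflow summarized above can be sketched on tiny rasters: each dataset's native classes are translated into a common legend, and the final map takes each pixel from the dataset selected for it (as in the source map of Figure 1). This is a hypothetical illustration: the dataset names, class codes and common-legend labels are placeholders rather than the entries of Tables 2 and 3, and resampling to the common 250 m grid is assumed to have been done beforehand.

```python
# Hypothetical legend translation table: (dataset, native class code) ->
# common-legend class. Placeholders only, not the study's Table 3.
LEGEND = {
    ("dataset_a", 11): "cropland",
    ("dataset_a", 20): "mixed_cropland",
    ("dataset_a", 30): "non_cropland",
    ("dataset_b", 3): "cropland",
    ("dataset_b", 7): "non_cropland",
}


def harmonize(dataset, grid):
    """Translate a raster of native class codes (list of rows) into the
    common legend; an unmapped code raises KeyError instead of being guessed."""
    return [[LEGEND[(dataset, code)] for code in row] for row in grid]


def combine(harmonized, source_map):
    """Build the final map by taking, at each pixel, the class from the
    dataset named in `source_map` (all grids share the same shape)."""
    return [
        [harmonized[source][i][j] for j, source in enumerate(row)]
        for i, row in enumerate(source_map)
    ]


layers = {
    "dataset_a": harmonize("dataset_a", [[11, 20], [30, 11]]),
    "dataset_b": harmonize("dataset_b", [[3, 7], [7, 3]]),
}
source_map = [["dataset_a", "dataset_b"], ["dataset_b", "dataset_a"]]
print(combine(layers, source_map))
# [['cropland', 'non_cropland'], ['non_cropland', 'cropland']]
```

Raising on unmapped codes is a deliberate choice in this sketch: a silent default would hide gaps in the legend translation, which is exactly where inconsistencies between datasets tend to appear.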
The comparison shows that the MARS-JRC crop mask has a higher accuracy than the other two products. Indeed, the considerable under-estimation observed in the two other crop masks, in particular in areas of high cropland density, is much less pronounced in the MARS-JRC product. The issue of potential inconsistencies, which may result from the combination of different cropland masks, has also been recognized and addressed: first, by methodological principles aimed at minimizing their presence, and second, by a cross-country comparison with FAO statistics. The study highlights that consistent, homogeneous and more accurate cropland masks can result from the spatial combination of various products. Indeed, a carefully designed methodology based on expert knowledge and high resolution images has been shown to lead to the selection of the datasets that best match present cropland coverage.
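Per-country accuracy figures such as those in Figure 7 can be sketched by tallying validation samples. This is a minimal sketch under assumptions: each sample is reduced to a binary label (reference cropland meaning 50% to 100% cropland, as in the Figure 7 caption), "overall accuracy" is taken as the share of samples where map and reference agree, and the reporting rule (grey out countries with n = 0 or N < 10) follows the caption. The country codes and samples are toy data.

```python
from collections import defaultdict


def accuracy_by_country(samples, min_total=10):
    """samples: iterable of (country, ref_is_cropland, map_is_cropland).
    Returns the overall accuracy per country, or None where the Figure 7
    rule (n == 0 or N < min_total) would grey the country out."""
    counts = defaultdict(lambda: [0, 0, 0])  # [correct, n cropland refs, N]
    for country, ref, pred in samples:
        c = counts[country]
        c[0] += int(ref == pred)
        c[1] += int(ref)
        c[2] += 1
    result = {}
    for country, (correct, n, total) in counts.items():
        result[country] = None if n == 0 or total < min_total else correct / total
    return result


# Toy validation samples (not Geo-Wiki data).
samples = (
    [("P", True, True)] * 6 + [("P", False, False)] * 3 + [("P", True, False)]
    + [("Q", False, False)] * 12
    + [("R", True, True)] * 5
)
print(accuracy_by_country(samples))  # {'P': 0.9, 'Q': None, 'R': None}
```

Country "Q" is suppressed despite perfect agreement because it has no cropland reference samples, and "R" because it has too few samples overall; both cases would otherwise produce misleadingly high accuracies.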

Figure 1. Selected sources for each pixel (see the colors and the corresponding legend) of the MARS-JRC cropland map and the number of datasets that were available and that have been compared for each country.

Figure 2. Cropland maps of Mozambique: the Cropland Use Intensity dataset (USGS 1986-1988), the SADC land cover database (CSIR, South Africa) and the land cover of the National Directorate of Land and Forests (DNTF). The window spans 1,100 × 1,700 km.

Figure 6. Agreement between the three cropland extent products (dark line for JRC, grey line for IIASA, dotted grey line for Global Croplands) and the validation sample for each category of crops, based on the African dataset (2,942 pixels).

Figure 7. Overall accuracy per country for the MARS-JRC crop mask. The numbers in black give the number (n) of samples interpreted as cropland (50%-100%) over the total number of samples (N) per country in the African Geo-Wiki validation dataset. Countries where n is equal to zero or N is lower than 10 are shown in grey.

Figure 8. Cropland areas (logarithmic scale, km²) derived from three land cover/cropland map products (IIASA, MARS-JRC, and Globcover) compared with the cropland area estimates from FAOSTAT 2009 (arable land and permanent crops) for all African countries with a cropland area greater than 10,000 km² (and covered by the dataset). The black line represents the 1:1 diagonal.

Table 1. Description of each land cover/land use dataset in terms of reference, data source and resolution, and time interval.

Table 2. Legend description for cropland classes with the Land Cover Classification System (LCCS) codes and classifiers.

Table 3. Translation of the ten legends into the common legend.