Article

Comparison of Different Cropland Classification Methods under Diversified Agroecological Conditions in the Zambezi River Basin

1 State Key Laboratory of Remote Sensing Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China
2 College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
3 Faculty of Agricultural Sciences, Catholic University of Mozambique, Cuamba, Niassa 3305, Mozambique
4 Division of Agriculture Applications, Soils, and Marine (AASMD), National Authority for Remote Sensing & Space Sciences (NARSS), Cairo 11843, Egypt
5 Agriculture and Natural Resources, University of California, Davis, CA 95618, USA
6 Meteorological Services Department, Harare BE150, Zimbabwe
7 Department of Soil Sciences, School of Agricultural Sciences, University of Zambia, Lusaka 10101, Zambia
8 Physics Department, University of Zimbabwe, Harare MP167, Mt Pleasant, Zimbabwe
9 Department of Agriculture-Ministry of Agriculture, Mulungushi House, Lusaka 10101, Zambia
* Author to whom correspondence should be addressed.
Remote Sens. 2020, 12(13), 2096; https://doi.org/10.3390/rs12132096
Submission received: 11 June 2020 / Accepted: 24 June 2020 / Published: 30 June 2020

Abstract

Up-to-date knowledge of cropland extent is essential for crop monitoring and food security early warning. Previous research has proposed different methods and adopted various datasets for mapping cropland areas at regional to global scales. However, most approaches did not consider the characteristics of farming systems and applied the same classification method in different agroecological zones (AEZs). Furthermore, the acquisition of in situ samples for classification training remains challenging. To address these knowledge gaps and challenges, this study applied a zone-specific classification by comparing four classifiers (random forest, the support vector machine (SVM), the classification and regression tree (CART) and minimum distance) for cropland mapping over four different AEZs in the Zambezi River basin (ZRB). Landsat-8 and Sentinel-2 data and derived indices were used and synthesized to generate thirty-five layers for classification on the Google Earth Engine platform. Training samples were derived from three existing landcover datasets to minimize the cost of sample acquisition over the large area. The final cropland map was generated at a 10 m resolution. The performance of the four classifiers and the viability of the training samples were analysed. All classifiers presented higher accuracy in cool AEZs than in warm AEZs, which may be attributed to field size and lower confusion between cropland and grassland classes. This indicates that the agricultural landscape may affect classification results regardless of the classifier. Random forest was found to be the most stable and accurate classifier across different agricultural systems, with an overall accuracy of 84% and a kappa coefficient of 0.67. Samples extracted over the full agreement areas among existing datasets reduced uncertainty and provided reliable calibration sets as a replacement for costly in situ measurements. The methodology proposed by this study can be used to generate periodic high-resolution cropland maps in the ZRB, which is helpful for the analysis of cropland expansion and abandonment as well as intensity changes in response to the escalating population and food insecurity.


1. Introduction

Croplands are changing continuously and intensively at regional to global scales due to climate change and human activities [1]. Cropland extent is an essential part of crop monitoring because it is fundamental for the analysis of crop inventory and the assessment of crop status [2,3,4,5].
Approximately 90% of staple food production in sub-Saharan Africa is provided by rainfed farming systems [6]. In the Zambezi River Basin (ZRB), variations in climate conditions have a direct influence on crop outputs, as they may result in extreme rainfall or extended periods of drought [6,7]. The effects of climate change [8], combined with field sizes and cropping systems in the basin, contribute to reduced food production [9]. This reduced food production makes the ZRB one of the most vulnerable regions on the continent in terms of food security. The expansion of cropland is a primary way to increase crop production in sub-Saharan Africa. According to official statistics, the cropland in the ZRB has experienced a considerable increase since the 1980s [10]. This increase was driven by a major political and socio-economic transition, which changed the agricultural landscape, cropping practices, and productivity, and ultimately influenced the water cycle, water resources, and energy generation. Accurate and timely analysis of cropland extent can provide objective information for decision-making in agricultural management, food security early warning, and food allocation in this region. Nevertheless, mapping cropland extent remains challenging in the ZRB, mainly because of landscape heterogeneity, fragmented agricultural fields, and different cropping patterns [8,11,12], which make it difficult to discriminate cropland from other vegetation classes such as grassland. To address these problems, different aspects, including the quality of input data, the methodology to be applied, and land features, should be considered [13,14,15].
Various approaches have been developed and tested by researchers to obtain an accurate and timely cropland extent. Compared to conventional methods such as field surveys, remote sensing-based cropland identification is a more rapid and cost-effective method [13,14,15], and therefore provides more frequently updated cropland information [15,16,17]. Numerous studies have utilized supervised classification methodologies for cropland mapping using various satellite datasets [1,18,19,20,21]. Among those supervised classifiers, the random forest classifier [18,21], the classification and regression tree (CART), the support vector machine (SVM) [20,22], maximum likelihood and minimum distance [23] and the naive Bayes classifier [24] are commonly used [18]. Based on these approaches, several fine resolution landcover and cropland products have become available in recent years, including the fine resolution observation and monitoring of global land cover (FROM-GLC) dataset at 30 m resolution for 2010 [25], the Landsat-derived GLOBELAND30 (GLC30) dataset at 30 m resolution for 2000 and 2010 [26], the Copernicus global land operations (CGLS) land cover product of Africa (CGLS-LC100 at a 100 m resolution for 2015), the European Space Agency Climate Change Initiative Land Cover, Global Land Cover map at 300 m for 2015 (ESACCL-LC-L4-300), the Sentinel-2 Prototype (ESACCI-LC_S2_Prototype) map for Africa at a 20 m resolution for 2016 [27], and the Global Food Security support analysis data over the continent of Africa for 2015 at a 30 m resolution (GFSAD30AFCE) [18]. The accuracies of these landcover products depend on their input data sources.
Recent studies evaluated the accuracy of these datasets and found that the GLOBELAND30 (GLC30) had an overall accuracy of 78.6% for the year 2000 and 80% for the year 2010 [26]. The overall accuracy of the ESACCI-LC_S2_Prototype dataset was approximately 65% [27], while ESACCL-LC-L4-300 had an overall accuracy of 71.5% [28]. GFSAD30AFCE [18], FROM-GLC [25], and CGLS-LC100 [29] have overall accuracies of 94.5%, 64.9% and 74.3%, respectively. Independent validation shows that the overall accuracies of all four datasets (CGLS-LC100 [29], ESACCL-LC-L4-300 [28], GFSAD30AFCE [18], and ESACCI-LC_S2_Prototype [27]) were below 65% [12]. Furthermore, results revealed an overestimation of crop area in most countries when compared with the statistics from the Food and Agriculture Organization (FAO) [30]. Several studies have attempted to include more indicative spectral features to improve classification accuracy [5,18]. For example, taking advantage of crop growth features derived from satellite data, the GFSAD30AFCE obtained a relatively higher accuracy than that in other studies [12,19]. The use of time-series data has been shown to perform better than the use of a single-date image [31]. Furthermore, the use of remote sensing data with high spatial resolution is one of the primary factors in obtaining high-quality cropland maps [3,14,20]. However, studies have also indicated that a single factor, such as high spatial resolution alone, may not be enough to yield the desired improvement in mapping accuracy. For instance, the improved spatial resolution of Sentinel-2 images does not by itself result in a significant improvement in the accuracy of cropland mapping [12]. Studies have reported that accuracy might be improved by spatial stratification [32], but only limited tests have been carried out, and their results are not publicly available.
Despite the efforts to enhance input data for mapping, the high cost of obtaining in situ samples and the lack of training samples when applied over a large scale are other limiting factors to further improve cropland product accuracy [33].
According to [34], the ZRB is composed of four agroecological zones (AEZs), namely, the tropic cool semiarid, tropic cool sub-humid, tropic warm semiarid, and tropic warm sub-humid zones. Each of these AEZs has its own characteristics in terms of cropping practices, field sizes, heterogeneity of landcover, and climate variation [35]. For example, the tropic cool zones are characterized by large field sizes compared to the tropic warm zones. These characteristics influence the discrimination of cropland from non-cropland areas. This study aims to investigate the stability of several parametric and non-parametric classifiers for cropland mapping by addressing the diversity in landcover and cropping systems over four different AEZs in the ZRB, taking advantage of multiple fine resolution datasets (Landsat-8 and Sentinel-2), as well as cloud computing with Google Earth Engine (GEE) (https://earthengine.google.com/) [36,37]. The objectives of the paper are (1) to evaluate the feasibility of training samples derived from existing datasets for large-scale cropland mapping and (2) to investigate the stability of four different classifiers (machine learning (random forest, support vector machine, and classification and regression tree) and non-parametric (minimum distance) classifiers) over different AEZs with diverse landscapes and cropping systems.

2. Materials and Methods

2.1. Study Region

The study focused on the Zambezi River basin (ZRB), located in Southern Africa (Figure 1). The Zambezi is the fourth-longest river in Africa [38], and its basin covers an estimated area of approximately 1.38 million km² [39,40]. It stretches from the upper Zambezi in Zambia to the Zambezi delta in Mozambique, with a river discharge of approximately 2600 m³ s⁻¹ of water into the Indian Ocean [39,41]. This basin connects eight countries, namely, Angola, Botswana, Malawi, Mozambique, Namibia, Tanzania, Zambia, and Zimbabwe. The largest section of the basin is in Zambia (approximately 40.7% of the total area of the basin), followed by Angola (18.3%). Zimbabwe, Mozambique, Malawi, Botswana, Tanzania, and Namibia represent 15.9%, 11.4%, 7.7%, 2.8%, 2.0% and 1.2% of the total area of the basin, respectively [41].
The ZRB consists of three sections, namely, the Upper Zambezi, which stretches from the headwaters to Livingstone in Zambia; the Middle Zambezi, from Livingstone to the Cahora Bassa Dam in Tete province in Mozambique; and the Lower Zambezi, from the Cahora Bassa Dam to Beira (Zambezi delta) [38,42]. Agriculture is one of the major economic activities of all basin countries. About 70% of the basin population depends on agriculture for subsistence [43]. Over the region, four categories of cropland sizes have been identified, varying from very small field sizes (less than 2.5 ha) to large field sizes (more than 60 ha) [8]. According to [44], the dominant cropping system over the ZRB is rainfed. As a result, agriculture in this region is profoundly affected by rainfall variability [45,46]. Although little rainfall is received during the dry season (April to September), a small fraction of the basin is equipped with irrigation systems that support some agricultural practices all year round.
The climatic conditions in the Zambezi basin follow the seasonal migration of the intertropical convergence zone, hence resulting in distinct hot wet (October to March) and cool dry (April to September) seasons. Specifically, the wet and dry season average precipitation is 137 mm and 14 mm, respectively (Figure 2a), while the temperature averages are 24.2 °C and 20.3 °C, respectively (Figure 2b). A study by [9] indicated that from 1998–2018, the mean annual precipitation recorded in the basin was 965 mm year−1, with the highest precipitation recorded in the northern (more tropical) region. Across the basin, the rainy season coincides with the growing period of most crops. During this period, most of the rainfed crops, such as maize, sorghum, millet, and rice, are planted. Additionally, [9] found that over the period 1998–2017, the peak of precipitation (i.e., 227 mm month−1) occurs in January. In the winter/cool season, the temperature in the basin is low (Figure 2b), making it possible for some farmers to plant wheat and other crops under irrigation since evapotranspiration is low.
In this study, we assessed four different classification methods under various conditions of AEZs within the ZRB. The four AEZs in the basin are, respectively, the tropic cool semiarid, tropic cool sub-humid, tropic warm semiarid, and tropic warm sub-humid zones [34].

2.2. Data and Processing

2.2.1. Remote Sensing Data and Processing

This study used the Landsat-8 and Sentinel-2 image collections covering three consecutive agricultural years (December 2016/November 2017, December 2017/November 2018, and December 2018/November 2019). Six comparable bands (blue, green, red, near infrared (NIR), short wave infrared 1 (SWIR1), and quality band) from Landsat-8 and Sentinel-2 datasets were used. Due to the frequent clouds, which usually occur in the study area, we used the quality bands (Sentinel-2: Quality band at 60 m resolution (Q60) and Landsat-8: Band of Quality Assessment (BQA)) to mask out clouds. Table 1 shows the characteristics of Sentinel-2 and Landsat-8 data used in this study [48].
The processing of the remote sensing data was carried out in the Google Earth Engine (GEE) platform [36]. Six derived remote sensing indices, namely NDVI (normalized difference vegetation index), LSWI (land surface water index), EVI (enhanced vegetation index), BI (bare soil index), SAVI (soil adjusted vegetation index), and GCVI (green chlorophyll vegetation index), were also used as complementary layers for each Landsat-8 and Sentinel-2 image. Table 2 shows the equations used to calculate these derived indices. Although GCVI is not a commonly used index, it is good at reducing the effect of saturation at high vegetation biomass conditions [49,50]. Researchers applied GCVI in the estimation of leaf area index and green leaf biomass in maize canopies and found that it worked well at both growing and flowering stages [49]. Nearest neighbour bilinear resampling was used to resample the reflectance bands to 10 m, based on which the RS-based indices were derived at a 10 m resolution. To take advantage of all available clear pixels and avoid cloud contamination problems during the rainy season, different integration methods were applied in this study for seasonal composition. Percentile composites at the 25th, 50th, 75th, and 95th percentiles were used to compose the derived remote sensing indices (Table 2) during the rainy season, while the median composite was used for the dry season. The percentile was used because it can fully capture the spectrum of satellite data and natural vegetation phenological information [51], whereas the median composite reduces the image collection by calculating the median of all values at each pixel across the stack of all matching bands. The yearly stack image used for the subsequent classification process was obtained by combining the two merged images composed for each period (rainy and dry), giving a total of 35 bands as inputs to the classifiers (Table 3).
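The index calculation and seasonal compositing described above can be illustrated per pixel with a minimal Python sketch. It is not the study's GEE script: the function names are hypothetical, the NDVI and GCVI formulas are the standard textbook definitions (Table 2 remains the authoritative source for the study), and the interpolation rule used for percentiles is an assumption.

```python
from typing import Dict, Sequence


def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def gcvi(nir: float, green: float) -> float:
    """Green chlorophyll vegetation index: NIR / Green - 1."""
    return nir / green - 1.0


def percentile(values: Sequence[float], p: float) -> float:
    """p-th percentile using linear interpolation between ranks (assumed rule)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no clear observations for this pixel")
    rank = (len(ordered) - 1) * p / 100.0
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def rainy_season_composite(series: Sequence[float]) -> Dict[str, float]:
    """25th, 50th, 75th and 95th percentile layers for one index at one pixel."""
    return {f"p{p}": percentile(series, p) for p in (25, 50, 75, 95)}


def dry_season_composite(series: Sequence[float]) -> float:
    """Median composite for one band or index at one pixel."""
    return percentile(series, 50)
```

Here `series` stands for the cloud-free values of one index at one pixel within a season; each returned value would become one layer of the yearly stack.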

2.2.2. Samples

Three available land cover datasets were used to collect the training samples: (i) the GFSAD30AFCE at a 30 m spatial resolution, with an overall accuracy of 94.5% [18]; (ii) the ESA-CCI-LC_S2_Prototype land cover map at a 20 m resolution [27], with a cropland accuracy of 63%; and (iii) the ESACCL-LC-L4-300 (ESA Climate Change Initiative – Land Cover project 2017) at a 300 m resolution, with an overall accuracy of 75.4%. Before generating the training samples, these three datasets were harmonized into generalized land cover datasets with five categories. Table 4 shows the lookup table between the original and regrouped classes for each dataset. The GFSAD30AFCE did not have to be modified since it only presents three classes (cropland, non-cropland and water bodies). Both the ESA-CCI-LC_S2_Prototype and the ESACCL-LC-L4-300 were regrouped into five classes: cropland, forest, grassland, urban areas, and water bodies.
A total of 25,000 random points were generated over the different AEZs within the ZRB. The distribution of these sample points considered the five class types as well as the area of each AEZ. For example, the sampling density was higher for an AEZ with a larger area. The values from each land cover dataset were extracted to the random points to examine the agreement among the three datasets. Only the samples with full agreement among all three land cover datasets were kept as training samples. This is based on the hypothesis that samples with full agreement among different datasets are more reliable than samples with only partial agreement. In total, 7971 training samples (Figure A1) were collected over the ZRB, of which 576 samples were from the tropic cool sub-humid zone, 1372 samples from the tropic cool semiarid zone, 1163 samples from the tropic warm sub-humid zone, and 4866 samples from the tropic warm semiarid zone. The sample preparation process was performed using ArcGIS.
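The full-agreement filter can be sketched as follows. This is a minimal illustration, not the ArcGIS workflow used in the study. The class strings and function names are hypothetical, and the rule for matching the three-class GFSAD30AFCE against the five-class datasets (its non-cropland class is treated as compatible with forest, grassland, and urban areas) is an assumption, since the text does not spell it out.

```python
from typing import Optional

# Assumed mapping: GFSAD30AFCE "non-cropland" covers these regrouped classes.
NON_CROPLAND_CLASSES = {"forest", "grassland", "urban"}


def compatible(gfsad_label: str, five_class_label: str) -> bool:
    """True if a GFSAD30AFCE label does not contradict a five-class label."""
    if gfsad_label == "non-cropland":
        return five_class_label in NON_CROPLAND_CLASSES
    return gfsad_label == five_class_label


def full_agreement_label(gfsad: str, s2_prototype: str, cci_300m: str) -> Optional[str]:
    """Return the agreed class for a random point, or None if the datasets disagree."""
    if s2_prototype != cci_300m:
        return None
    if not compatible(gfsad, s2_prototype):
        return None
    return s2_prototype


def select_training_samples(points: list) -> list:
    """Keep (point_id, label) pairs only for points with full agreement."""
    selected = []
    for point_id, gfsad, s2_prototype, cci_300m in points:
        label = full_agreement_label(gfsad, s2_prototype, cci_300m)
        if label is not None:
            selected.append((point_id, label))
    return selected
```

Points that fail the filter are simply discarded, which trades sample count for label reliability.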
The validation samples (Figure A1b) were collected from multiple sources, including (1) visual interpretation of Google Earth high-resolution images and Sentinel-2 images, (2) crowdsourced cropland data from Geo-Wiki (global reference datasets on cropland) [56], and (3) field surveys. The field survey samples were collected by using the GVG (GPS-Video-Geographic Information Systems) mobile app [57] during the three years of 2017–2019. A total of 4639 samples were used as validation samples (Figure A1b), in which 1573 samples were from the field surveys, 869 samples were from Geo-Wiki, and the remaining 2197 samples were from visual interpretation.

2.3. Methods

A comprehensive overview of the methodology used in this study is summarized in Figure 3. A link containing the Google Earth Engine script used in this study is presented in the Supplementary Materials.

2.3.1. Definition of Cropland

We adopted the cropland definition of the Joint Experiment of Crop Assessment and Monitoring (JECAM) network, which defines annual cropland from a remote sensing perspective as a piece of land of at least 0.25 ha (minimum width of 30 m) that is sowed/planted and harvestable at least once within the 12 months after the sowing/planting date. The annual cropland produces an herbaceous cover and is sometimes combined with some tree or woody vegetation [58]. Based on this definition, we considered three consecutive years (2017–2019) in our research, and a site must have been classified as cropland at least once over the three-year period to be considered as cropland.
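The three-year rule stated above amounts to a logical OR over the yearly classifications. A minimal sketch, with a hypothetical function name and class string:

```python
from typing import Iterable

CROPLAND = "cropland"  # hypothetical class label


def is_cropland_over_period(yearly_classes: Iterable[str]) -> bool:
    """True if the site was classified as cropland in at least one year."""
    return any(label == CROPLAND for label in yearly_classes)
```

For example, a site labelled grassland in 2017, cropland in 2018, and grassland in 2019 would be kept as cropland in the final map.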

2.3.2. Four Classifiers

To map the cropland area over the ZRB, four supervised classifiers were selected based on the literature review and run in GEE [36]. These methods included random forest (RF), the support vector machine (SVM), minimum distance (MD), and the classification and regression tree (CART). The tree-based RF classifier [59] is an ensemble supervised machine learning technique [60] that builds multiple decision trees, each from a randomly selected subset of training samples and variables [61]. By combining a collection of tree-structured classifiers [59,60,61], the RF classifier yields reliable classifications [62]. This classifier uses the Gini index (a generalization of the binomial variance) [63] to measure inequality [64]. The machine-learning SVM classifier [16] has been applied to optical character recognition [65] and has been used to produce improved classification results [66]. It is a supervised non-parametric technique based on statistical learning theory [66,67] that seeks to minimize the classification error on unseen data [67]. The tree-based CART classifier [68] uses historical data and tree-building algorithms to construct a decision tree [22,69]. This classifier consists primarily of three parts: (a) construction of the maximum tree; (b) choice of the right tree size; and (c) classification of new data using the constructed tree [69]. Similar to RF, CART uses the Gini index [70]. The nonparametric MD classifier [73] is mathematically simple and computationally efficient [71,72]; it makes no assumptions about the distribution of the features of interest and does not consider class variability [71,73]. This classifier is based on the MD rule, which calculates the spectral distance between the measurement vector of the candidate pixel and the representative (mean) vector of each class [74].
To perform the classification, RF was optimized using the following parameters: number of trees = 100, and minimum size of terminal node = 1. The SVM classifier was optimized with a kernel type of RBF (radial basis function), cost = 10, and gamma = 0.5. Both CART and MD used default parameters.
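To make the minimum distance rule concrete, the sketch below assigns a pixel to the class whose mean training vector is closest. It is an illustration only, not the GEE implementation: the function names are hypothetical and the use of Euclidean distance is an assumption.

```python
import math
from typing import Dict, List, Sequence


def fit_class_means(samples: Sequence[Sequence[float]],
                    labels: Sequence[str]) -> Dict[str, List[float]]:
    """Compute the mean feature vector of each class from training samples."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vector, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vector)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def classify(pixel: Sequence[float], class_means: Dict[str, List[float]]) -> str:
    """Assign the pixel to the class with the nearest mean (Euclidean distance)."""
    return min(class_means, key=lambda label: math.dist(pixel, class_means[label]))
```

Because only class means are stored, the rule ignores within-class variability, which is the limitation noted in the description of the MD classifier.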

2.3.3. Assessment Indicators

To test the robustness of different classifiers over different AEZs, one test site in each AEZ (Figure 1) in the ZRB was chosen. Each site corresponds to the intersection of one Landsat tile and AEZ boundary. Over these test sites, we mapped the land cover by using different classification methods and evaluated the accuracy of five different classes (cropland, grassland, forest, urban areas and water bodies). The land cover maps were further reclassified into two classes, namely, cropland and non-cropland. The binary cropland and non-cropland maps were also validated and assessed for accuracy.
The confusion matrix, kappa coefficient, analysis of variance, and Tukey HSD (Honestly Significant Difference) were used to assess the accuracy of the resulting maps as well as to assess the differences between the four classifiers across AEZs. The confusion (error) matrix [75] consisted of a cross-tabulation [13] that compares the mapped class labels with the observed class labels on the ground [13,75,76]. By definition, the kappa coefficient (k) reflects the difference between the actual agreement and the agreement expected by chance [77,78]. As in many statistical tests, the kappa coefficient ranges from −1 to +1. Values situated below zero indicate that there is no agreement between the observed and expected data, a value of zero indicates an agreement that can be obtained by random choices, and a value of 1 represents perfect agreement between the data [77]. As indicated by [77], most of the available documentation about the kappa coefficient cites [79] as the source of the following kappa coefficient (Equation (1)):
$$K = \frac{N \sum_{i=1}^{k} X_{ii} - \sum_{i=1}^{k} \left( X_{i+} \times X_{+i} \right)}{N^{2} - \sum_{i=1}^{k} \left( X_{i+} \times X_{+i} \right)} \qquad (1)$$
where k = number of rows and columns in error matrix; N = total number of observations; Xii = observation in row i and column i; Xi+ = marginal total of row i; and X+i = marginal total of column i.
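As a worked illustration of Equation (1), the sketch below computes the overall accuracy and the kappa coefficient from a square error matrix. The function names are hypothetical, and the matrix in the usage note is made up for illustration, not taken from the study.

```python
from typing import Sequence


def overall_accuracy(matrix: Sequence[Sequence[float]]) -> float:
    """Share of observations on the diagonal of the error matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa(matrix: Sequence[Sequence[float]]) -> float:
    """Kappa coefficient from a k x k error matrix, following Equation (1)."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)                      # N
    diagonal = sum(matrix[i][i] for i in range(k))           # sum of X_ii
    row_totals = [sum(row) for row in matrix]                # X_i+
    col_totals = [sum(matrix[r][c] for r in range(k)) for c in range(k)]  # X_+i
    chance = sum(row_totals[i] * col_totals[i] for i in range(k))
    return (n * diagonal - chance) / (n * n - chance)
```

For the illustrative matrix `[[40, 10], [5, 45]]`, the overall accuracy is 0.85 and kappa is (100 × 85 − 5000) / (100² − 5000) = 0.70.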
We assessed the results obtained by the different classifiers by using the analysis of variance (ANOVA) test. ANOVA is a statistical method that uses F-tests [80] to assess differences between several means [80,81]. The study used a two-factor ANOVA without replication [81], i.e., 4 × 4 (four AEZs and four classification methods). In addition to ANOVA, the obtained results were submitted to the Tukey honestly significant difference (HSD) test [82]. The Tukey HSD test is a post hoc multiple-comparison procedure used to determine which pairs of group means differ significantly from each other [82]. It computes the honestly significant difference between two or more means by using the studentized range distribution [83]. This test was used to compare the level of differences between the different results. The formula used to compute Tukey’s analysis is shown in Equation (2):
$$HSD = q_{\alpha A} \sqrt{\frac{MS_{w}}{n_{k}}} \qquad (2)$$
where qαA represents the relevant critical value of the studentized range statistic, nk represents the number of scores used in calculating the group means of interest, and MSw represents the mean square within groups.
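Equation (2) translates directly into code. In the sketch below, the critical value of the studentized range statistic is passed in as an argument, because it has to be looked up in a table or obtained from a statistics package; the function names and the numbers in the usage note are hypothetical.

```python
import math


def tukey_hsd(q_critical: float, ms_within: float, n_per_group: int) -> float:
    """Honestly significant difference: q * sqrt(MS_w / n_k), as in Equation (2)."""
    return q_critical * math.sqrt(ms_within / n_per_group)


def significantly_different(mean_a: float, mean_b: float, hsd: float) -> bool:
    """Two group means differ significantly if their gap exceeds the HSD."""
    return abs(mean_a - mean_b) > hsd
```

With illustrative inputs q = 4.0, MS_w = 0.01, and n_k = 4, the HSD is 4.0 × sqrt(0.0025) = 0.2, so two means would need to differ by more than 0.2 to be declared significantly different.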
Both ANOVA and Tukey’s HSD tests were computed by using the statistic package SISVAR. Based on the validation, the most reliable and stable classifiers were chosen to map the cropland areas over the ZRB.
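For readers without access to SISVAR, the two-factor ANOVA without replication can be sketched in plain Python. This is an assumed, textbook formulation rather than the package's implementation. Rows would stand for AEZs and columns for classifiers, the function name is hypothetical, and the F-ratios still have to be compared against tabulated critical values.

```python
from typing import Dict, Sequence


def two_factor_anova_no_replication(table: Sequence[Sequence[float]]) -> Dict[str, float]:
    """F-ratios for row and column effects with one observation per cell."""
    r, c = len(table), len(table[0])
    if r < 2 or c < 2:
        raise ValueError("need at least two rows and two columns")
    grand_mean = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]

    # Sums of squares for each factor, the total, and the residual error.
    ss_rows = c * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    df_rows, df_cols = r - 1, c - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    return {
        "F_rows": (ss_rows / df_rows) / ms_error,
        "F_cols": (ss_cols / df_cols) / ms_error,
        "df_rows": df_rows,
        "df_cols": df_cols,
        "df_error": df_error,
        "ms_error": ms_error,
    }
```

For a 4 × 4 design such as the one used here, each factor has 3 degrees of freedom and the error term has 9. In this design the error mean square is the natural candidate for the MS_w term in Equation (2).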

3. Results

3.1. The Feasibility of Derived Training Samples

In this study, we relied on the full agreement among land cover (LC) datasets to build our training set. To evaluate the feasibility of this approach, we validated the different areas of agreement (full agreement: all three datasets agree on the existence or absence of cropland; medium agreement: two out of the three datasets agree on the existence of cropland; no agreement: only one dataset confirms the existence of cropland) using our validation set, as shown in Table 5. The area of full agreement was the most accurate in distinguishing cropland from non-cropland areas, with an overall accuracy of 89%. This means that the training samples we produced, based on the full agreement, could contain approximately 11% errors. Nevertheless, this error is below the tolerance threshold for sample errors (20%), beyond which the classification accuracy could decrease significantly [84]. Overall, despite the inevitable errors associated with deriving the training sets in this way, it is still a convenient approach for building training samples in large-scale cropland mapping studies like the current one, where field surveys are not an economic or practical option.

3.2. Performance of Different Classifiers

Figure 4 and Table A1 show the overall accuracy (OA) and kappa coefficient (k) for the cropland classification over four AEZs based on the four classification methods. Overall, the results indicated the performance of classification methods varied with AEZs, but RF performed best among the four methods.
The OA from RF was 94%, 87%, 86%, and 83% in the tropic cool sub-humid, tropic cool semiarid, tropic warm sub-humid, and tropic warm semiarid zones, respectively. In addition, RF had the highest average OA (87.4%) and average k (0.72) among the four classifiers. CART had the lowest average OA (76.9%), while SVM had the lowest average k (0.48) among the four classifiers.
Table 6 summarizes the confusion matrix for the land cover maps. According to Table 6, forest and grassland were the main classes confused with cropland in all AEZs. Another interesting finding is that all classifiers performed better in the cool zones (tropic cool sub-humid/semiarid) than in the warm zones (tropic warm sub-humid/semiarid). This may be due to the higher confusion among the three vegetation classes (cropland, forest, and grassland) in the warm zones (Table 6). Nevertheless, the highest separation between cropland and the other two vegetation classes (forest and grassland) was observed when applying the RF classifier. Meanwhile, when compared to the other classifiers, the MD classifier underestimated the cropland in all four AEZs. This was most visible in the tropic warm sub-humid zone, where cropland area was underestimated by 7%. In the tropic cool sub-humid zone, all classifiers tended to underestimate cropland, by approximately 6%, 3.1%, 20%, and 14% for the RF, SVM, CART, and MD classifiers, respectively. Owing to the clearly distinct spectral features and temporal patterns of urban areas and water bodies, these two classes were rarely mixed with cropland. The only exception was found in the tropic cool semiarid zone, where the urban and cropland types were mixed. This zone includes large cities in Zimbabwe (e.g., Harare and Chitungwiza), where urban areas usually have a high proportion of green areas that could be mistakenly classified as cropland.
The significance level (F-ratio > F-critical) obtained in the analysis of variance (ANOVA) test showed that the agroecological zones, as well as the image classification methods, had a significant effect on the obtained results, with a p-value < 0.05. Table A2 and Table A3 summarize the analysis of variance computed for the OA and kappa coefficients, respectively. Tukey’s HSD test (Table 7) indicated that for the OA, there were no significant differences between the means obtained by the RF and MD classifiers and the means obtained by the SVM and CART classifiers. Concerning the kappa coefficient, significant differences in the means were recorded between the RF and CART classifiers, with corresponding kappa values of 0.72 and 0.47, respectively. In addition, Tukey’s HSD test, calculated for the average OA (HSD = 0.0881) and kappa coefficient (HSD = 0.2382), confirmed the significant differences found by the ANOVA.

3.3. Cropland Extent over the Zambezi River Basin

We mapped the cropland by using the random forest classifier for each AEZ over the ZRB. This classifier was chosen because it showed higher overall accuracy and user accuracy for the cropland class than the other classifiers (Table 6, Table A1, and Figure 4). Figure 5 shows the extracted cropland extent at 10 m over the ZRB, and Figure 6 compares the cropland map from this study with the GFSAD30AFCE. By using independent samples, mostly from field surveys and high-resolution Google Earth images, we assessed the overall accuracy and kappa coefficient for our binary cropland map. Our overall accuracy was 84%, and the kappa coefficient was 0.67. Table 8 summarizes the confusion matrix for this assessment.

4. Discussion

As suggested by previous studies, field size [8], spatial extent, and landscape patterns [85] are the major factors affecting the accuracy of cropland classification [12]. According to [8], field size can be of great importance in agricultural land monitoring because, for example, small fields require higher-resolution images than larger fields. Over the ZRB, the dominant field size is labelled “very small” (0.64–2.56 ha and < 0.64 ha). To handle this issue, different aspects must be taken into consideration, including the methodology to be applied, the characteristics of the land features, and the quality of the input data for classification. The results obtained from this study indicate that, with an average OA of 87.4%, the RF classifier outperformed all other classifiers, including MD (79.4%), SVM (78.2%), and CART (76.9%), in all studied AEZs. The good performance of the RF classifier over different AEZs indicates that it has substantial potential for mapping cropland features under different conditions. Similar findings were reported by [18,21], who used RF to map not only the cropland extent but also other features on the Earth’s surface. Although RF performed best in all AEZs, this classifier still needs to be trained in each region to account for the dynamics of agricultural conditions.
In this study, we found that, when trained in each region, the accuracy of this classifier varied with the AEZ (Figure 4), with the highest accuracy observed in the tropic cool sub-humid region (93.8%) and the lowest in the tropic warm semiarid region (82.6%). These differences might be attributed to landscape patterns, field sizes, and different cropping systems in different AEZs [86]. For example, the tropic cool zones (semiarid and sub-humid) have different cropping systems and field sizes compared with those of the tropic warm zones (semiarid and sub-humid). In the tropic cool zones, a high percentage of the area is characterized by commercial farms with relatively large fields [8], making it easier to identify croplands with higher accuracy. In contrast, over the tropic warm (semiarid and sub-humid) zones, the high phenological similarity between rainfed cropland and other vegetation classes, particularly grassland, could be the main source of confusion. Zone-dependent variations in cropland classification accuracy were also reported by [87] when mapping cropland area over southeast and northeast Asia using multi-year time-series Landsat 30 m data and a random forest classifier. In cropland mapping, one of the biggest challenges is the separation of cropland from grassland. In some growing periods, grassland has spectral features similar to those of cropland, which often hampers discrimination between the two [88]. Some fields are cropped in some years and left idle (bare or as grassland) in others, which also leads to spectral variability and confusion in multi-temporal analysis. This phenomenon may have led to higher misclassification among cropland, grassland, and forest in the two tropic warm (semiarid and sub-humid) zones than in the two tropic cool (semiarid and sub-humid) zones, and hence to the higher accuracy of all four classifiers in cool AEZs (Figure 4).
Notably, cloud computing techniques allowed us to devote particular effort to collecting and processing the input data in order to improve the cropland classification. The cropland extent was finally mapped at a 10 m spatial resolution over the ZRB using three years of data, 2017–2019. We obtained an overall accuracy of 84%, which was 2% higher than that of the GFSAD30AFCE for the years 2015/2016 over the ZRB [18]. The two studies differ in their input datasets. This study enhanced mapping by combining reflective bands with multiple derived indices, thereby increasing spectral discriminability between the different classes. In addition, our 10 m cropland map was compared with the FROM-GLC 30 m cropland map developed by [84] over the ZRB. Not only did we improve the spatial resolution of the cropland map (from 30 m to 10 m), but the accuracy of our cropland map was also 15% higher than that of the FROM-GLC product.
This study also demonstrated the importance of integrating different types of datasets (Landsat-8 and Sentinel-2) for accurate cropland mapping. These two datasets were chosen because most of the study region is characterized by rainfed croplands, and the growing period (an essential element in the identification of cropland areas) coincides with the rainy season, which makes obtaining cloud-free time-series images a challenge. Hence, the use of multiple sensors enhanced the acquisition of cloud-free images, which in turn contributed to more accurate results. Apart from the RF classifier and the use of different datasets, another essential element that contributed to the improvement in mapping accuracy was the technique used to collect samples for classifier training/calibration. The samples were collected from locations where different existing land cover datasets agreed on the class value at a given point. Furthermore, a comparison of the spatial agreement of four land cover datasets (GFSAD30AFCE, CGLS-LC100, ESACCILC_S2_Prototype, and ESACCL-LC-L4-300) based on standard deviation [12] revealed high spatial agreement on cropland maps over the ZRB, suggesting that these datasets are reliable over the region. By using different datasets for sampling, we reduced the uncertainty and therefore provided more reliable calibration sets.
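The idea of sampling only where existing datasets agree can be sketched as follows. This is a simplified illustration under stated assumptions, not the Google Earth Engine workflow used in the study: each land cover product is represented as a flat list of per-pixel class labels on a common grid, and the function name and label values are hypothetical.

```python
def full_agreement_samples(label_maps):
    """Return (pixel_index, label) pairs where all datasets assign the same class.

    `label_maps` is a list of equally long label sequences, one per dataset,
    assumed to be co-registered on the same pixel grid.
    """
    samples = []
    for idx, labels in enumerate(zip(*label_maps)):
        # A pixel qualifies only if every dataset reports the same class.
        if len(set(labels)) == 1:
            samples.append((idx, labels[0]))
    return samples


# Hypothetical per-pixel labels: 1 = cropland, 0 = non-cropland.
dataset_a = [1, 1, 0, 0, 1, 0]
dataset_b = [1, 0, 0, 0, 1, 1]
dataset_c = [1, 1, 0, 1, 1, 0]
candidates = full_agreement_samples([dataset_a, dataset_b, dataset_c])
print(candidates)
```

Pixels on which the products disagree are simply excluded from the candidate pool, which trades sample count for label reliability.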
In terms of limitations, the presence of clouds is a major issue for RS observations in the study area, particularly during the rainy season (October to March). Although a yearly mosaic was used to reduce the impact of clouds, the impact cannot be eliminated. The use of Synthetic Aperture Radar (SAR) data may result in better separation between cropland and grassland, especially in areas close to rivers or wetlands, so not including such data might be another limitation of this research. Fortunately, cloud computing on the GEE platform efficiently processed and composited thousands of images for each year. As a consequence, by integrating Landsat-8 and Sentinel-2 imagery, the number of cloudless observations for the rainy season (October 2018 to March 2019) was 64 on average (Figure A2). Moreover, 64.9% of the ZRB had 34 to 64 valid observations during the rainy season. Thus, the uncertainty due to clouds is limited to some extent. Furthermore, more validation samples based on field surveys are needed for better quality assessment. Because only a small number of validation samples were available, we relied on a freely available validation dataset, which could also introduce some uncertainty in the final assessment.
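The observation statistics quoted above can be understood as per-pixel counts of cloud-free acquisitions. The sketch below shows one way such counts and the share of pixels within a given range could be derived; the masks are hypothetical toy data, and the helper names are illustrative rather than taken from the study's script.

```python
def count_valid(clear_masks):
    """Per-pixel number of cloud-free observations.

    `clear_masks` holds one list of booleans per acquisition
    (True = pixel is clear in that acquisition).
    """
    return [sum(pixel) for pixel in zip(*clear_masks)]


def share_in_range(valid_counts, low, high):
    """Fraction of pixels whose valid-observation count lies in [low, high]."""
    hits = sum(1 for c in valid_counts if low <= c <= high)
    return hits / len(valid_counts)


# Hypothetical toy example: four acquisitions over three pixels.
masks = [
    [True, False, True],
    [True, False, False],
    [False, True, True],
    [True, False, True],
]
counts = count_valid(masks)            # clear observations per pixel
mean_count = sum(counts) / len(counts)
share = share_in_range(counts, 2, 4)   # share of pixels with 2 to 4 clear observations
print(counts, mean_count, share)
```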

5. Conclusions

In this study, we assessed the accuracy of four classifiers for cropland mapping in four different AEZs in the ZRB. Two types of remote sensing data (Landsat-8 and Sentinel-2) and their derived vegetation indices were adopted to improve cropland classification. The results demonstrated the robustness of random forest in obtaining accurate cropland extent across all AEZs. The random forest method outperformed the other classifiers, with an average overall accuracy of 87.4% across the four AEZs. Although the other classifiers had lower accuracy than RF, they still performed better than previous research conducted in the basin. The accuracy obtained in this study is higher than that of various available cropland maps over the ZRB, which is attributed to the high quality of the remote sensing and training data used, as well as to the consideration of agricultural diversity, including agro-ecosystems and field sizes (from small farms to commercial farms). The use of high-resolution, multi-temporal data improves the discrimination of croplands from other vegetation classes and also contributes to high accuracy, even in sub-basins with small field sizes. Based on the methodology proposed in this study, cropland maps with a resolution of 10 m could be generated periodically, which would be helpful for rapid crop monitoring and cropland change analysis in response to the escalating population and food insecurity. The applicability of this method to other areas with similar agroecological conditions (e.g., the Congo River basin in central Africa and the Mekong River basin in southeast Asia) still needs to be further investigated.

Supplementary Materials

The following link contains the Google Earth Engine script used in this study: https://code.earthengine.google.com/2bbd0d138dc672d8ba9ecfc8583a059b.

Author Contributions

Conceptualization, J.B., M.Z. and B.W.; data curation, J.B., M.N. and S.S.N.; formal analysis, J.B.; investigation, J.B., F.T., W.L., E.P., T.D.M., P.K., E.M. and C.M.; methodology, J.B., M.Z. and M.N.; software, J.B.; validation, M.N. and S.A.C.; writing—original draft, J.B.; writing—review and editing, M.Z., B.W., F.T., W.L., H.Z., N.Z., S.S.N., S.A.C., E.P., T.D.M., P.K., E.M. and C.M. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was funded by the National Natural Science Foundation of China (41561144013, 41861144019 and 41701496), the National Key R & D Program of China (2016YFA0600302), and the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA19030201).

Acknowledgments

The first author, José Bofana, acknowledges the Chinese Academy of Sciences (CAS) and The World Academy of Sciences (TWAS) for awarding him the CAS-TWAS President’s fellowship to carry out the research.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A

Figure A1. Distribution of (a) training (6727) samples and (b) validation (4639) samples over the ZRB. Non-cropland here comprises the following classes: forest, grassland, water bodies and urban areas.
Figure A2. Number of good observations during the rainy season (October 2018 to March 2019). The bar legend indicates the frequency of observations of combined Landsat-8 and Sentinel-2 imagery.
Table A1. Results of overall accuracy (OA) and kappa coefficient over the different AEZs based on the applied image classification methods.

Overall accuracy (%)

| Agroecological zone | RF | SVM | MD | CART |
|---|---|---|---|---|
| Tropic Cool Sub-humid | 93.8 | 91.8 | 84.5 | 83.5 |
| Tropic Cool Semiarid | 87.2 | 82.9 | 79.4 | 82.9 |
| Tropic Warm Sub-humid | 85.9 | 68.5 | 76.5 | 68.5 |
| Tropic Warm Semiarid | 82.6 | 69.6 | 77.0 | 72.7 |
| Average of Four AEZs | 87.4 | 78.2 | 79.4 | 76.9 |

Kappa coefficient

| Agroecological zone | RF | SVM | MD | CART |
|---|---|---|---|---|
| Tropic Cool Sub-humid | 0.88 | 0.83 | 0.85 | 0.67 |
| Tropic Cool Semiarid | 0.72 | 0.63 | 0.53 | 0.62 |
| Tropic Warm Sub-humid | 0.68 | 0.17 | 0.42 | 0.46 |
| Tropic Warm Semiarid | 0.60 | 0.27 | 0.45 | 0.33 |
| Average of Four AEZs | 0.72 | 0.48 | 0.56 | 0.52 |
Table A2. Summary of the analysis of variance computed for the OA.

| Source of Variation | SS | df | MS | F | p-Value | F crit |
|---|---|---|---|---|---|---|
| Classification methods | 0.0269 | 3 | 0.0090 | 5.6318 | 0.0188 | 3.8625 |
| Agroecological zones | 0.0507 | 3 | 0.0169 | 10.5938 | 0.0026 | 3.8625 |
| Error | 0.0143 | 9 | 0.0016 | | | |
| Total corrected | 0.0919 | 15 | | | | |

CV (%) = 4.96. Overall mean: 0.8045. Number of observations: 16.
SS: sum of squares; MS: mean square; df: degrees of freedom; F: ratio of two variances; p-value: probability value; F crit: critical F value (hypothesis acceptance level); CV: coefficient of variation.
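The layout of Table A2 (two factors with 3 degrees of freedom each, 9 error degrees of freedom, 16 observations) is consistent with a two-way analysis of variance without replication applied to the OA values in Table A1. The following sketch recomputes the F ratios and the coefficient of variation under that assumption. The design, the CV formula (root mean square error divided by the overall mean), and all names are assumptions made for illustration. Because Table A1 reports rounded values, the results approximate rather than reproduce the tabulated statistics.

```python
from math import sqrt

# Overall accuracy from Table A1, expressed as fractions.
# Rows: AEZs (cool sub-humid, cool semiarid, warm sub-humid, warm semiarid).
# Columns: classifiers (RF, SVM, MD, CART).
oa_table = [
    [0.938, 0.918, 0.845, 0.835],
    [0.872, 0.829, 0.794, 0.829],
    [0.859, 0.685, 0.765, 0.685],
    [0.826, 0.696, 0.770, 0.727],
]


def two_way_anova(table):
    """Two-way ANOVA without replication (one observation per cell)."""
    rows, cols = len(table), len(table[0])
    grand = sum(map(sum, table)) / (rows * cols)
    row_means = [sum(r) / cols for r in table]
    col_means = [sum(c) / rows for c in zip(*table)]
    # Sums of squares for the two factors, the total, and the residual.
    ss_rows = cols * sum((m - grand) ** 2 for m in row_means)
    ss_cols = rows * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in table for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_error = ss_error / ((rows - 1) * (cols - 1))
    return {
        "mean": grand,
        "F_zones": (ss_rows / (rows - 1)) / ms_error,
        "F_methods": (ss_cols / (cols - 1)) / ms_error,
        # Assumed definition: CV = 100 * sqrt(MS_error) / overall mean.
        "cv_percent": 100 * sqrt(ms_error) / grand,
    }


result = two_way_anova(oa_table)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Both F ratios can then be compared with the critical value listed in the table (3.8625) to judge whether classifier and zone effects are significant.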
Table A3. Summary of the analysis of variance computed for the k.

| Source of Variation | SS | df | MS | F | p-Value | F crit |
|---|---|---|---|---|---|---|
| Classification methods | 0.1380 | 3 | 0.0460 | 3.9544 | 0.0473 | 3.8625 |
| Agroecological zones | 0.4117 | 3 | 0.1372 | 11.7954 | 0.0018 | 3.8625 |
| Error | 0.1047 | 9 | 0.0116 | | | |
| Total corrected | 0.6545 | 15 | | | | |

CV (%) = 18.98. Overall mean: 0.5683. Number of observations: 16.

References

  1. Waldner, F.; De Abelleyra, D.; Verón, S.R.; Zhang, M.; Wu, B.; Plotnikov, D.; Bartalev, S.; Lavreniuk, M.; Skakun, S.; Kussul, N.; et al. Towards a set of agrosystem-specific cropland mapping methods to address the global cropland diversity. Int. J. Remote Sens. 2016, 37, 3196–3231.
  2. Lambert, M.J.; Waldner, F.; Defourny, P. Cropland mapping over Sahelian and Sudanian agrosystems: A knowledge-based approach using PROBA-V time series at 100-m. Remote Sens. 2016, 8, 232.
  3. Husak, G.J.; Marshall, M.T.; Michaelsen, J.; Pedreros, D.; Funk, C.; Galu, G. Crop area estimation using high and medium resolution satellite imagery in areas with complex topography. J. Geophys. Res. Atmos. 2008, 113.
  4. Phalke, A.R.; Özdoğan, M. Large area cropland extent mapping with Landsat data and a generalized classifier. Remote Sens. Environ. 2018, 219, 180–195.
  5. Waldner, F.; Hansen, M.C.; Potapov, P.V.; Löw, F.; Newby, T.; Ferreira, S.; Defourny, P. National-scale cropland mapping based on spectral-temporal features and outdated land cover information. PLoS ONE 2017, 12, 1–24.
  6. Rosegrant, M.W.; Cline, S.A. Global Food Security: Challenges and Policies. Science 2003, 302, 1917–1919.
  7. Matyas, C.J.; Silva, J.A. Extreme weather and economic well-being in rural Mozambique. Nat. Hazards 2013, 66, 31–49.
  8. Fritz, S.; See, L.; McCallum, I.; You, L.; Bun, A.; Moltchanova, E.; Duerauer, M.; Albrecht, F.; Schill, C.; Perger, C.; et al. Mapping global cropland and field size. Glob. Chang. Biol. 2015, 21, 1980–1992.
  9. Zeng, H.; Wu, B.; Zhang, N.; Tian, F.; Phiri, E.; Musakwa, W.; Zhang, M.; Zhu, L.; Mashonjowa, E. Spatiotemporal analysis of precipitation in the sparsely gauged Zambezi River Basin using remote sensing and Google Earth Engine. Remote Sens. 2019, 11, 2977.
  10. Marklund, L.G.; Batello, C. FAO Datasets on Land Use, Land Use Change, Agriculture and Forestry and Their Applicability for National Greenhouse Gas Reporting A Background Paper for the IPCC Expert Meeting on Guidance on Greenhouse Gas Inventories of Land Uses such as Agriculture a. 2008. Available online: http://www.fao.org/climatechange/15534-03bd24352e5f95a54c039491c08ca2325.pdf (accessed on 13 June 2020).
  11. Wei, Y.; Lu, M.; Wu, W.; Ru, Y. Multiple factors influence the consistency of cropland datasets in Africa. Int. J. Appl. Earth Obs. Geoinf. 2020, 89, 102087.
  12. Nabil, M.; Zhang, M.; Bofana, J.; Wu, B.; Stein, A.; Dong, T. Assessing factors impacting the spatial discrepancy of remote sensing based cropland products: A case study in Africa. Int. J. Appl. Earth Obs. Geoinf. 2020, 85, 102010.
  13. Foody, G.M. Status of land cover classification accuracy assessment. Remote Sens. Environ. 2002, 80, 185–201.
  14. Tucker, C.J.; Townshend, J.R.G.; Goff, T.E. African land-cover classification using satellite data. Science 1985, 227, 369–374.
  15. Ustuner, M.; Sanli, F.B.; Abdikan, S.; Esetlili, M.T.; Kurucu, Y. Crop type classification using vegetation indices of rapideye imagery. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. ISPRS Arch. 2014, 40, 195–198.
  16. Ingmar, N.; Schulthess, U.; Asche, H. Comparison of Machine Learning Algorithms Random Forest, Artificial Neural Network and Support Vector Machine To Maximum Likelihood for Supervised Crop Type Classification. In Proceedings of the 4th GEOBIA, Rio de Janeiro, Brazil, 7–9 May 2012; pp. 35–40. Available online: https://www.researchgate.net/publication/275641579_COMPARISON_OF_MACHINE_LEARNING_ALGORITHMS_RANDOM_FOREST_ARTIFICIAL_NEURAL_NETWORK_AND_SUPPORT_VECTOR_MACHINE_TO_MAXIMUM_LIKELIHOOD_FOR_SUPERVISED_CROP_TYPE_CLASSIFICATION (accessed on 13 June 2020).
  17. Lobell, D.B.; Asner, G.P. Cropland distributions from temporal unmixing of MODIS data. Remote Sens. Environ. 2004, 93, 412–422.
  18. Xiong, J.; Thenkabail, P.S.; Tilton, J.C.; Gumma, M.K.; Teluguntla, P.; Oliphant, A.; Congalton, R.G.; Yadav, K.; Gorelick, N. Nominal 30-m cropland extent map of continental Africa by integrating pixel-based and object-based algorithms using Sentinel-2 and Landsat-8 data on Google Earth Engine. Remote Sens. 2017, 9, 1065.
  19. Xiong, J.; Thenkabail, P.S.; Gumma, M.K.; Teluguntla, P.; Poehnelt, J.; Congalton, R.G.; Yadav, K.; Thau, D. Automated cropland mapping of continental Africa using Google Earth Engine cloud computing. ISPRS J. Photogramm. Remote Sens. 2017, 126, 225–244.
  20. Waldner, F.; Canto, G.S.; Defourny, P. Automated annual cropland mapping using knowledge-based temporal features. ISPRS J. Photogramm. Remote Sens. 2015, 110, 1–13.
  21. Gislason, P.O.; Benediktsson, J.A.; Sveinsson, J.R. Random forests for land cover classification. In Proceedings of the Pattern Recognition Letters; 2006; Volume 27, pp. 294–300. Available online: https://www.sciencedirect.com/science/article/abs/pii/S0167865505002242 (accessed on 13 June 2020).
  22. Razi, M.A.; Athappilly, K. A comparative predictive analysis of neural networks (NNs), nonlinear regression and classification and regression tree (CART) models. Expert Syst. Appl. 2005, 29, 65–74.
  23. Zhang, D.; Chen, S.; Zhou, Z.H. Learning the kernel parameters in kernel minimum distance classifier. Pattern Recognit. 2006, 39, 133–135.
  24. Ramesh, V.; Ramar, K. Classification of Agricultural Land Soils: A Data Mining Approach. Agric. J. 2011, 6, 82–86.
  25. Gong, P.; Wang, J.; Yu, L.; Zhao, Y.; Zhao, Y.; Liang, L.; Niu, Z.; Huang, X.; Fu, H.; Liu, S.; et al. Finer resolution observation and monitoring of global land cover: First mapping results with Landsat TM and ETM+ data. Int. J. Remote Sens. 2013, 34, 2607–2654.
  26. Chen, J.; Chen, J.; Liao, A.; Cao, X.; Chen, L.; Chen, X.; He, C.; Han, G.; Peng, S.; Lu, M.; et al. Global land cover mapping at 30 m resolution: A POK-based operational approach. ISPRS J. Photogramm. Remote Sens. 2015, 103, 7–27.
  27. CCI Land Cover S2 Prototype Land Cover 20 m map of Africa. ESA. 2017. Available online: https://maps.elie.ucl.ac.be/CCI/viewer/download/ESACCI-LC-Ph2-PUGv2_2.0.pdf (accessed on 13 June 2020).
  28. CCI-LC-PUGV2 Land Cover CCI. Product User Guide Version 2. 2017. Available online: https://maps.elie.ucl.ac.be/CCI/viewer/download/ESACCI-LC-Ph2-PUGv2_2.0.pdf (accessed on 13 June 2020).
  29. CGLOPS-1 Validation Report: Moderate Dynamic Land Cover Collection 100m, Version 1. Copernicus Global Land Operations – Lot 1; Paris, France, 2018; Available online: https://land.copernicus.eu/global/sites/cgls.vito.be/files/products/CGLOPS1_VR_LC100m-V1_I1.20.pdf (accessed on 13 June 2020).
  30. Xu, Y.; Yu, L.; Feng, D.; Peng, D.; Li, C.; Huang, X.; Gong, P. Comparisons of three recent moderate resolution African land cover datasets: CGLS-LC100, ESA-S2-. Int. J. Remote Sens. ISSN 2019, 1161.
  31. Belgiu, M.; Csillik, O. Sentinel-2 cropland mapping using pixel-based and object-based time-weighted dynamic time warping analysis. Remote Sens. Environ. 2018, 204, 509–523.
  32. Inglada, J.; Vincent, A.; Arias, M.; Tardy, B.; Morin, D.; Rodes, I. Operational High Resolution Land Cover Map Production at the Country Scale Using Satellite Image Time Series. Remote Sens. 2017, 9, 95.
  33. Bey, A.; Jetimane, J.; Lisboa, S.N.; Ribeiro, N.; Sitoe, A.; Meyfroidt, P. Mapping smallholder and large-scale cropland dynamics with a flexible classification system and pixel-based composites in an emerging frontier of Mozambique. Remote Sens. Environ. 2020, 239, 111611.
  34. HarvestChoice AEZ (16-class, 2009). Available online: http://harvestchoice.org/data/aez16_clas (accessed on 19 June 2018).
  35. Wu, B.; Gommes, R.; Zhang, M.; Zeng, H.; Yan, N.; Zou, W.; Zheng, Y.; Zhang, N.; Chang, S.; Xing, Q.; et al. Global crop monitoring: A satellite-based hierarchical approach. Remote Sens. 2015, 7, 3907–3933.
  36. Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27.
  37. Jacobson, A.; Dhanota, J.; Godfrey, J.; Jacobson, H.; Rossman, Z.; Stanish, A.; Walker, H.; Riggio, J. A novel approach to mapping land conversion using Google Earth with an application to East Africa. Environ. Model. Softw. 2015, 72, 1–9.
  38. Vörösmarty, C.J.; Moore, B. Modelling basins-scale hydrology in support of physical climate and global biogeochemical studies: An example using the Zambezi River. Surv. Geophys. 1991, 12, 271–311.
  39. Beck, L.; Bernauer, T. How will combined changes in water demand and climate affect water availability in the Zambezi river basin? Glob. Environ. Chang. 2011, 21, 1061–1072.
  40. Timberlake, J. Biodiversity of the Zambezi Basin; Occasional Publications in Biodiversity: Bulawayo, Zimbabwe, 2000; Volume 9.
  41. Cohen Liechti, T.; Matos, J.P.; Boillat, J.L.; Schleiss, A.J. Comparison and evaluation of satellite derived precipitation products for hydrological modeling of the Zambezi River Basin. Hydrol. Earth Syst. Sci. 2012, 16, 489–500.
  42. Moore, A.E.; Cotterill, F.P.D.; Main, M.P.L.; Williams, H.B. The Zambezi River. Large Rivers Geomorphol. Manag. 2008, 311–332.
  43. The World Bank. Zambezi River Basin Sustainable Agriculture Water Development Angola, Botswana, Malawi, Mozambique, Namibia, Tanzania, Zambia, Zimbabwe; The International Bank for Reconstruction and Development/The World Bank: Washington, DC, USA, 2008.
  44. Beyer, M.; Wallner, M.; Bahlmann, L.; Thiemig, V.; Dietrich, J.; Billib, M. Rainfall characteristics and their implications for rain-fed agriculture: A case study in the Upper Zambezi River Basin. Hydrol. Sci. J. 2016, 61, 321–343.
  45. Calzadilla, A.; Zhu, T.; Rehdanz, K.; Tol, R.S.J.; Ringler, C. Economywide impacts of climate change on agriculture in Sub-Saharan Africa. Ecol. Econ. 2013, 93, 150–165.
  46. Milgroom, J.; Giller, K.E. Courting the rain: Rethinking seasonality and adaptation to recurrent drought in semi-arid southern Africa. Agric. Syst. 2013, 118, 91–104.
  47. Harris, I.; Jones, P.D.; Osborn, T.J.; Lister, D.H. Updated high-resolution grids of monthly climatic observations – The CRU TS3.10 Dataset. Int. J. Climatol. 2014, 34, 623–642.
  48. Zhang, H.K.; Roy, D.P.; Yan, L.; Li, Z.; Huang, H.; Vermote, E.; Skakun, S.; Roger, J. Characterization of Sentinel-2A and Landsat-8 top of atmosphere, surface, and nadir BRDF adjusted reflectance and NDVI differences. Remote Sens. Environ. 2018, 215, 482–494.
  49. Gitelson, A.A.; Arkebauer, T.J.; Rundquist, D.C.; Keydan, G.; Leavitt, B. Remote estimation of leaf area index and green leaf biomass in maize canopies. Geophys. Res. Lett. 2003, 30, 4–7.
  50. Cai, Y.; Guan, K.; Peng, J.; Wang, S.; Seifert, C.; Wardlow, B.; Li, Z. A high-performance and in-season classification system of field-level crop types using time-series Landsat data and a machine learning approach. Remote Sens. Environ. 2018, 210, 35–47.
  51. Tian, F.; Wu, B.; Zeng, H.; Zhang, X.; Xu, J. Efficient Identification of Corn Cultivation Area with Multitemporal Synthetic Aperture Radar and Optical Images in the Google Earth Engine Cloud Platform. Remote Sens. 2019, 11, 629.
  52. Tucker, C.J. Red and Photographic Infrared Linear Combinations for Monitoring Vegetation. Remote Sens. Environ. 1979, 8, 127–150.
  53. Huete, A. A Soil-Adjusted Vegetation Index (SAVI). Remote Sens. Environ. 1988, 25, 295–309.
  54. Xiao, X.; Hollinger, D.; Aber, J.; Goltz, M.; Davidson, E.A.; Zhang, Q.; Moore, B. Satellite-based modeling of gross primary production in an evergreen needleleaf forest. Remote Sens. Environ. 2004, 89, 519–534.
  55. Chen, W.; Liu, L.; Zhang, C.; Wang, J.; Wang, J.; Pan, Y. Monitoring the seasonal bare soil areas in Beijing using multi-temporal TM images. Int. Geosci. Remote Sens. Symp. 2004, 5, 3379–3382.
  56. Laso Bayas, J.C.; Lesiv, M.; Waldner, F.; Schucknecht, A.; Duerauer, M.; See, L.; Fritz, S.; Fraisl, D.; Moorthy, I.; McCallum, I.; et al. A global reference database of crowdsourced cropland data collected using the Geo-Wiki platform. Sci. Data 2017, 4, 222222.
  57. Wu, B.; Tian, Y.; Li, Q. GVG, a Crop Type Proportion Sampling Instrument. J. Remote Sens. 2004, 8, 570–580.
  58. Waldner, F.; Fritz, S.; Di Gregorio, A.; Plotnikov, D.; Bartalev, S.; Kussul, N.; Gong, P.; Thenkabail, P.; Hazeu, G.; Klein, I.; et al. A unified cropland layer at 250 m for global agriculture monitoring. Data 2016, 1, 3.
  59. Dubath, P.; Rimoldini, L.; Süveges, M.; Blomme, J.; López, M.; Sarro, L.M.; de Ridder, J.; Cuypers, J.; Guy, L.; Lecoeur, I.; et al. Random forest automated supervised classification of Hipparcos periodic variable stars. Mon. Not. R. Astron. Soc. 2011, 414, 2602–2617.
  60. Kullarni, V.Y.; Sinha, P.K. Random Forest Classifier: A Survey and Future Research Directions. Int. J. Adv. Comput. 2013, 36, 1144–1156.
  61. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
  62. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  63. Pal, M. Random forest classifier for remote sensing classification. Int. J. Remote Sens. 2005, 26, 217–222.
  64. Gastwirth, J.L. The Estimation of the Lorenz Curve and Gini Index. Rev. Econ. Stat. 1972, 54, 306.
  65. Huang, C.; Davis, L.; Townshend, J. An assessment of support vector machines for land cover classification. Int. J. Remote Sens. 2002, 23.4, 725–749.
  66. Kavzoglu, T.; Colkesen, I. A kernel functions analysis for support vector machines for land cover classification. Int. J. Appl. Earth Obs. Geoinf. 2009, 11, 352–359.
  67. Mountrakis, G.; Im, J.; Ogole, C. Support vector machines in remote sensing: A review. ISPRS J. Photogramm. Remote Sens. 2011, 66, 247–259.
  68. Shao, Y.; Lunetta, R.S. Comparison of support vector machine, neural network, and CART algorithms for the land-cover classification using limited training data points. Int. J. Remote Sens. 2005, 26, 217–222.
  69. Temkin, N.R.; Holubkov, R.; Machamer, J.E.; Winn, H.R.; Dikmen, S.S. Classification and regression trees (CART) for prediction of function at 1 year following head trauma. J. Neurosurg. 1995, 82, 764–771.
  70. Loh, W.Y. Classification and regression trees. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2011, 1, 14–23.
  71. Mishra, P.; Singh, D.; Yamaguchi, Y.; Singh, D. Land Cover Classification of Palsar Images By Knowledge Based Decision Tree Classifier and Supervised Classifiers Based on Sar Observables. Prog. Electromagn. Res. B 2011, 30, 47–70.
  72. Sohn, Y.; Rebello, N.S. Supervised and unsupervised spectral angle classifiers. Photogramm. Eng. Remote Sens. 2002, 68, 1271–1280.
  73. Lu, D.S.; Mausel, P.; Batistella, M.; Moran, E. Comparison of land-cover classification methods in the Brazilian Amazon Basin. Photogramm. Eng. Remote Sens. 2004, 70, 723–731.
  74. Perumal, K.; Bhaskaran, R. Supervised classification performance of multispectral images. J. Comput. 2010, 2, 124–129.
  75. Congalton, R.G. A review of assessing the accuracy of classifications of remotely sensed data. Remote Sens. Environ. 1991, 37, 35–46.
  76. Story, M.; Congalton, R.G. Accuracy assessment: A user’s perspective. Photogramm. Eng. Remote Sens. 1986, 52, 397–399.
  77. Banko, G. A Review of Assessing the Accuracy of and of Methods Including Remote Sensing Data in Forest Inventory; IIASA: Laxenburg, Austria, 1998.
  78. Congalton, R.G.; Green, K. Assessing the Accuracy of Remotely Sensed Data: Principles and Practices, 2nd ed.; CRC Press: Boca Raton, FL, USA, 2009; ISBN 978-1-4200-5512-2.
  79. Bishop, Y.M.M.; Fienberg, S.E.; Holland, P.W. Discrete Multivariate Analysis: Theory and Practice; The MIT Press: Cambridge, MA, USA, 1975.
  80. Penny, W.; Henson, R. Analysis of Variance. In Statistical Parametric Mapping: The Analysis of Functional Brain Images; Penny, W., Friston, K., Ashburner, J., Kiebel, S., Nichols, T., Eds.; Academic Press: Cambridge, MA, USA, 2006; ISBN 978-0-12-372560-8.
  81. Lane, D.M. Analysis of Variance. In Introduction to Statistics; Rice University: Houston, TX, USA, 2016; Chapter 15; pp. 517–597.
  82. Conagin, A.; Barbin, D.; Demétrio, C.G.B. Modifications for the Tukey Test Procedure and Evaluation of the Power and Efficiency of Multiple Comparison Procedures. Sci. Agric. 2008, 65, 428–432.
  83. Williams, L.J.; Abdi, H. Fisher’s Least Significant Difference (LSD) Test. Encycl. Res. Des. 2010, 1–6.
  84. Gong, P.; Liu, H.; Zhang, M.; Li, C.; Wang, J.; Huang, H.; Clinton, N.; Ji, L.; Li, W.; Bai, Y.; et al. Stable classification with limited sample: Transferring a 30-m resolution sample set collected in 2015 to mapping 10-m resolution global land cover in 2017. Sci. Bull. 2019, 64, 370–373.
  85. Sanches, I.D.; Moran, E.; Chen, Y.; Batistella, M.; Luiz, A.J.B.; da Silva, R.F.B.; de Oliveira, M.A.F.; Lu, D.; Dutra, L.V.; Huang, J. Mapping croplands, cropping patterns, and crop types using MODIS time-series data. Int. J. Appl. Earth Obs. Geoinf. 2018, 69, 133–147.
  86. Petitjean, F.; Inglada, J.; Gançarski, P. Satellite Image Time Series Analysis Under Time Warping. IEEE Trans. Geosci. Remote Sens. 2012, 50, 3081–3095.
  87. Oliphant, A.J.; Thenkabail, P.S.; Teluguntla, P.; Xiong, J.; Krishna, M.; Congalton, R.G.; Yadav, K. Mapping cropland extent of Southeast and Northeast Asia using multi-year time-series Landsat 30-m data using a random forest classifier on the Google Earth Engine Cloud. Int. J. Appl. Earth Obs. Geoinf. 2019, 81, 110–124.
  88. Gong, P.; Lu, H.; Zhao, J.; Zhao, F.R.; Xu, Y.; Cai, X.; Yu, L. Tracking annual cropland changes from 1984 to 2016 using time-series Landsat images with a change-detection and post-classification approach: Experiments from three sites in Africa. Remote Sens. Environ. 2018, 218, 13–31.
Figure 1. Map of the Zambezi River basin showing the four different agroecological zones (AEZ) and the test sites (with red boundaries) at each AEZ (Adapted from [34]).
Figure 2. Spatial distribution of (a) monthly rainfall and (b) monthly surface air temperature over the ZRB from January to December, averaged over 1986–2015 (30-year period). The black boxes indicate the ZRB. The gridded observational temperature and precipitation dataset [47] from the University of East Anglia’s Climatic Research Unit was used to plot this figure.
Figure 3. Distribution of the training (6727) samples and validation (4639) samples over the ZRB.
Figure 4. (a) Overall accuracy (OA %) and (b) kappa coefficient (k) over different AEZs for four classification methods (RF, SVM, MD, and CART).
Figure 5. (a) Ten-meter cropland extent over the Zambezi River basin (2017–2019) and (b) Landsat-8 color infrared vegetation composite (bands 5, 4, and 3) for 1 January 2018.
Figure 6. Comparison of cropland maps from this study and the GFSAD30AFCE at one subset.
Table 1. Characteristics of the Sentinel-2 and Landsat-8 remote sensing data used in the study.
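| Sensor | Provider | Band | Description | Wavelength (nm) | Resolution (m) |
|---|---|---|---|---|---|
| Sentinel-2 MultiSpectral Instrument (MSI) Level-1C, Top of Atmosphere (TOA) | European Space Agency (ESA) | B2 | Blue | 458–523 | 10 |
| | | B3 | Green | 543–578 | 10 |
| | | B4 | Red | 650–680 | 10 |
| | | B8 | Near-infrared | 785–900 | 10 |
| | | B11 | SWIR 1 | 1565–1655 | 20 |
| | | QA60 | Cloud mask | - | 60 |
| Landsat-8 Surface Reflectance Tier 1 TOA | United States Geological Survey (USGS) | B2 | Blue | 452–512 | 30 |
| | | B3 | Green | 533–590 | 30 |
| | | B4 | Red | 636–673 | 30 |
| | | B5 | Near-infrared | 851–879 | 30 |
| | | B6 | SWIR 1 | 1566–1651 | 30 |
| | | BQA | Quality band | - | - |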
Table 2. Remote sensing derived indices used in this study.
| Remote sensing index | Formula | Ref. |
|---|---|---|
| Normalized Difference Vegetation Index | $NDVI = \frac{NIR - RED}{NIR + RED}$ | [52] |
| Soil Adjusted Vegetation Index | $SAVI = \left(\frac{NIR - RED}{NIR + RED + L}\right)(1 + L)$ | [53] |
| Land Surface Water Index | $LSWI = \frac{NIR - SWIR}{NIR + SWIR}$ | [54] |
| Green Chlorophyll Vegetation Index | $GCVI = \frac{NIR}{GREEN} - 1$ | [49] |
| Bare Soil Index | $BI = \frac{(RED + SWIR) - (NIR + BLUE)}{(RED + SWIR) + (NIR + BLUE)}$ | [55] |
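As a rough illustration of the formulas in Table 2, the sketch below evaluates each index for a single pixel from its band reflectances. It is not the authors' Google Earth Engine code: the function names are made up for this example, and the SAVI soil adjustment factor L = 0.5 is a commonly used default that is assumed here, not a value taken from this table.

```python
# Per-pixel versions of the five indices in Table 2.
# Inputs are reflectances (floats); function names are hypothetical.

def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)

def savi(nir, red, soil_factor=0.5):
    """Soil Adjusted Vegetation Index.
    ASSUMPTION: L = 0.5 is a common default, not a value stated in Table 2."""
    return (nir - red) / (nir + red + soil_factor) * (1 + soil_factor)

def lswi(nir, swir):
    """Land Surface Water Index."""
    return (nir - swir) / (nir + swir)

def gcvi(nir, green):
    """Green Chlorophyll Vegetation Index."""
    return nir / green - 1

def bare_soil_index(red, swir, nir, blue):
    """Bare Soil Index."""
    return ((red + swir) - (nir + blue)) / ((red + swir) + (nir + blue))

# Example pixel with made-up reflectance values.
pixel = {"blue": 0.05, "green": 0.25, "red": 0.1, "nir": 0.5, "swir": 0.2}
indices = {
    "NDVI": ndvi(pixel["nir"], pixel["red"]),
    "SAVI": savi(pixel["nir"], pixel["red"]),
    "LSWI": lswi(pixel["nir"], pixel["swir"]),
    "GCVI": gcvi(pixel["nir"], pixel["green"]),
    "BI": bare_soil_index(pixel["red"], pixel["swir"], pixel["nir"], pixel["blue"]),
}
```

The functions raise `ZeroDivisionError` when a denominator is zero, so masked or no-data pixels would need to be filtered out first.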
Table 3. The two seasonal stack images that composed the yearly mosaic.
Rainy season stack image:
- Five merged bands from Landsat-8 and Sentinel-2 (blue, green, red, NIR, SWIR 1).
- Twenty percentile layers (four percentiles extracted from the merged Landsat-8-derived and Sentinel-2-derived vegetation indices).

Dry season stack image:
- Median composites of the five original merged bands from Landsat-8 and Sentinel-2 (blue, green, red, NIR, SWIR 1).
- Median composites of five vegetation indices derived from Landsat-8 and Sentinel-2 (NDVI, LSWI, EVI, SAVI, GCVI).
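One way to read Table 3 is that the rainy season contributes 25 layers (5 bands plus 4 percentiles for each of 5 vegetation indices) and the dry season contributes 10 median layers, which gives the thirty-five layers mentioned in the abstract. The sketch below builds such a feature set for a single pixel in plain Python. Several things in it are assumptions made only for illustration: the percentile levels (10, 25, 75, 90), the use of a median to reduce the rainy-season bands over time, and all function and layer names.

```python
from statistics import median

BANDS = ["blue", "green", "red", "nir", "swir1"]
INDICES = ["NDVI", "LSWI", "EVI", "SAVI", "GCVI"]  # names as listed in Table 3
PERCENTILES = [10, 25, 75, 90]  # ASSUMPTION: the four levels are not given in Table 3

def percentile(values, p):
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)

def rainy_season_layers(bands, indices):
    """5 band layers + 4 percentiles x 5 indices = 25 layers."""
    # ASSUMPTION: bands are reduced over time with a median.
    layers = {f"rainy_{b}": median(bands[b]) for b in BANDS}
    for name in INDICES:
        for p in PERCENTILES:
            layers[f"rainy_{name}_p{p}"] = percentile(indices[name], p)
    return layers

def dry_season_layers(bands, indices):
    """5 band medians + 5 index medians = 10 layers."""
    layers = {f"dry_{b}_median": median(bands[b]) for b in BANDS}
    layers.update({f"dry_{v}_median": median(indices[v]) for v in INDICES})
    return layers

def yearly_stack(rainy_bands, rainy_indices, dry_bands, dry_indices):
    """Combine both seasonal stacks into one per-pixel feature dictionary."""
    layers = rainy_season_layers(rainy_bands, rainy_indices)
    layers.update(dry_season_layers(dry_bands, dry_indices))
    return layers

# Synthetic time series for one pixel (made-up values).
series = [0.1, 0.2, 0.3, 0.4, 0.5]
bands = {b: series for b in BANDS}
indices = {v: series for v in INDICES}
stack = yearly_stack(bands, indices, bands, indices)
print(len(stack))  # number of layers in the yearly stack
```

On an image collection the same reductions would be applied per pixel across all acquisitions of a season rather than to a single list of values.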
Table 4. Summary of the harmonization process of the different datasets to the five defined land cover classes used in the study.
| Final classes | Original classes: GFSAD30AFCE | Original classes: ESA-CCI-LC_S2_Prototype | Original classes: ESACCL-LC-L4-300 |
|---|---|---|---|
| Cropland | Cropland | Cropland | Cropland, rainfed; Cropland, irrigated or post-flooding; Mosaic cropland (>50%)/natural vegetation (tree, shrub, herbaceous cover) (<50%) |
| Forest | Non-cropland | Trees cover areas; Shrubs cover areas; Lichen mosses/sparse vegetation; Vegetation aquatic or regularly flooded | Mosaic natural vegetation (tree, shrub, herbaceous cover) (>50%)/cropland (<50%); Tree cover, broadleaved, evergreen, closed to open (>15%); Tree cover, broadleaved, deciduous, closed to open (>15%); Tree cover, needleleaved, evergreen, closed to open (>15%); Tree cover, needleleaved, deciduous, closed to open (>15%); Tree cover, mixed leaf type (broadleaved and needleleaved); Mosaic tree and shrub (>50%)/herbaceous cover (<50%); Mosaic herbaceous cover (>50%)/tree and shrub (<50%); Shrubland; Sparse vegetation (tree, shrub, herbaceous cover) (<15%); Tree cover, flooded, fresh or brackish water; Tree cover, flooded, saline water; Shrub or herbaceous cover, flooded, fresh/saline/brackish water |
| Grassland | Non-cropland | Grassland | Grassland |
| Urban areas | Non-cropland | Built up areas | Urban areas |
| Water bodies | Water | Open water | Water bodies |
Table 5. Assessing the accuracy of mapping cropland in the three agreement levels among land cover (LC) datasets.
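| Agreement | Class | Nc | C | Total | OA (%) |
|---|---|---|---|---|---|
| Full agreement | C | 131 | 763 | 894 | 85.4 |
| | Nc | 1865 | 194 | 2059 | 90.6 |
| | Total | 1996 | 957 | 2953 | 89.0 |
| Median agreement | C | 235 | 748 | 983 | 76.1 |
| No-agreement | C | 365 | 338 | 703 | 48.1 |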
C: cropland, Nc: non-cropland, and OA: overall accuracy.
Table 6. Summary of the confusion matrix of land cover maps.
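Rows give the classified data and columns give the reference data for each classifier (RF: random forest; SVM: support vector machine; CART: classification and regression tree; MD: minimum distance).

| AEZ | Classified | RF C | RF F | RF G | RF U | RF W | RF UA | SVM C | SVM F | SVM G | SVM U | SVM W | SVM UA | CART C | CART F | CART G | CART U | CART W | CART UA | MD C | MD F | MD G | MD U | MD W | MD UA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Tropical Cool Sub-humid | C | 45 | 0 | 0 | 0 | 0 | 100 | 42 | 0 | 4 | 0 | 0 | 91.3 | 37 | 0 | 1 | 0 | 0 | 97 | 39 | 0 | 2 | 0 | 0 | 95 |
| | F | 2 | 42 | 6 | 0 | 0 | 84 | 4 | 42 | 8 | 0 | 0 | 77.8 | 5 | 42 | 12 | 0 | 0 | 71 | 2 | 42 | 4 | 0 | 0 | 88 |
| | G | 1 | 0 | 25 | 0 | 0 | 96 | 2 | 0 | 19 | 0 | 0 | 90.5 | 6 | 0 | 18 | 0 | 0 | 75 | 7 | 0 | 25 | 0 | 0 | 78 |
| | U | 0 | 0 | 0 | 7 | 3 | 70 | 0 | 0 | 0 | 6 | 1 | 85.7 | 0 | 0 | 0 | 4 | 1 | 80 | 0 | 0 | 0 | 8 | 2 | 80 |
| | W | 0 | 0 | 1 | 1 | 12 | 86 | 0 | 0 | 1 | 2 | 14 | 82.4 | 0 | 0 | 1 | 4 | 14 | 74 | 0 | 0 | 1 | 0 | 13 | 93 |
| | PA | 94 | 100 | 78 | 88 | 80 | 90.3* | 88 | 100 | 59 | 75 | 93 | 84.8* | 77 | 100 | 56 | 50 | 93 | 87.6* | 81 | 100 | 78 | 100 | 87 | 71.3* |
| Tropical Warm Sub-humid | C | 52 | 3 | 27 | 4 | 2 | 59 | 38 | 14 | 14 | 3 | 0 | 55 | 42 | 5 | 22 | 2 | 1 | 58 | 38 | 0 | 18 | 0 | 0 | 68 |
| | F | 5 | 47 | 8 | 0 | 0 | 78 | 6 | 30 | 5 | 0 | 0 | 73 | 8 | 40 | 7 | 0 | 0 | 73 | 5 | 46 | 6 | 0 | 0 | 81 |
| | G | 5 | 1 | 7 | 0 | 0 | 54 | 16 | 7 | 20 | 1 | 1 | 44 | 12 | 6 | 13 | 3 | 1 | 37 | 10 | 5 | 15 | 0 | 2 | 47 |
| | U | 0 | 0 | 0 | 4 | 0 | 100 | 0 | 0 | 0 | 4 | 1 | 80 | 0 | 0 | 0 | 3 | 0 | 100 | 9 | 0 | 3 | 8 | 1 | 38 |
| | W | 0 | 0 | 0 | 0 | 21 | 100 | 2 | 0 | 3 | 0 | 21 | 81 | 0 | 0 | 0 | 0 | 21 | 100 | 0 | 0 | 0 | 0 | 20 | 100 |
| | PA | 84 | 92 | 17 | 50 | 91 | 70.4* | 61 | 59 | 48 | 50 | 91 | 60.8* | 68 | 78 | 31 | 38 | 91 | 68.3* | 61 | 90 | 36 | 100 | 87 | 64* |
| Tropical Warm Semiarid | C | 56 | 2 | 15 | 3 | 1 | 73 | 53 | 14 | 25 | 2 | 0 | 56 | 40 | 5 | 13 | 0 | 1 | 68 | 50 | 1 | 10 | 0 | 1 | 81 |
| | F | 5 | 46 | 8 | 0 | 0 | 78 | 4 | 44 | 5 | 0 | 0 | 83 | 5 | 42 | 6 | 0 | 0 | 79 | 6 | 46 | 13 | 0 | 0 | 71 |
| | G | 7 | 11 | 24 | 2 | 0 | 55 | 11 | 1 | 17 | 0 | 1 | 57 | 23 | 12 | 27 | 3 | 0 | 42 | 11 | 12 | 23 | 1 | 0 | 49 |
| | U | 0 | 0 | 0 | 9 | 0 | 100 | 0 | 0 | 0 | 12 | 0 | 100 | 0 | 0 | 1 | 11 | 0 | 92 | 1 | 0 | 1 | 13 | 0 | 87 |
| | W | 0 | 0 | 0 | 0 | 30 | 100 | 0 | 0 | 0 | 0 | 30 | 100 | 0 | 0 | 0 | 0 | 30 | 100 | 0 | 0 | 0 | 0 | 30 | 100 |
| | PA | 82 | 78 | 51 | 64 | 97 | 75.3* | 78 | 75 | 36 | 86 | 97 | 71.2* | 59 | 71 | 57 | 79 | 97 | 74* | 74 | 78 | 49 | 93 | 97 | 68.5* |
| Tropical Cool Semiarid | C | 112 | 2 | 28 | 2 | 0 | 78 | 107 | 11 | 24 | 6 | 0 | 72 | 90 | 2 | 14 | 2 | 0 | 83 | 90 | 0 | 8 | 0 | 0 | 92 |
| | F | 4 | 90 | 22 | 0 | 0 | 78 | 1 | 80 | 10 | 1 | 0 | 87 | 6 | 86 | 24 | 0 | 0 | 74 | 8 | 82 | 21 | 0 | 0 | 74 |
| | G | 0 | 0 | 22 | 1 | 0 | 96 | 8 | 5 | 37 | 0 | 0 | 74 | 9 | 5 | 22 | 1 | 0 | 59 | 15 | 15 | 42 | 2 | 0 | 57 |
| | U | 5 | 0 | 2 | 38 | 0 | 84 | 5 | 1 | 3 | 34 | 0 | 79 | 8 | 4 | 14 | 38 | 0 | 59 | 8 | 0 | 3 | 39 | 0 | 78 |
| | W | 0 | 0 | 0 | 0 | 16 | 100 | 0 | 0 | 0 | 0 | 16 | 100 | 1 | 0 | 0 | 0 | 16 | 94 | 0 | 0 | 0 | 0 | 16 | 100 |
| | PA | 93 | 98 | 30 | 93 | 100 | 80.8* | 88 | 82 | 50 | 83 | 100 | 78.5* | 79 | 89 | 30 | 93 | 100 | 77.1* | 74 | 85 | 57 | 95 | 100 | 73.7* |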
* Overall accuracy (%); C—cropland; F—forest; G—grassland; U—urban area; W—water bodies; UA—user accuracy (%); PA—producer accuracy (%).
Table 7. Summary of the Tukey honestly significant difference (HSD) test computed for the overall accuracy and kappa coefficient.
| Classifier | Overall accuracy: mean | Overall accuracy: test result | Kappa coefficient: mean | Kappa coefficient: test result |
|---|---|---|---|---|
| Random Forest | 0.8739 | a | 0.7202 | a |
| Minimum Distance | 0.7938 | ab | 0.5603 | ab |
| Support Vector Machine | 0.7816 | b | 0.5195 | ab |
| Classification and Regression Tree | 0.7687 | b | 0.4734 | b |
| Honestly Significant Difference (HSD) | 0.0881 | | 0.2382 | |

Minimal level of significance: 5% (0.05)
Note: Means followed by the same letter (a, b) do not differ at the 5% minimum level of significance based on Tukey’s HSD test.
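The letters in Table 7 can be checked by comparing each pair of classifier means against the HSD threshold: two classifiers differ at the 5% level when the absolute difference of their means exceeds the HSD. The sketch below does only this final comparison step using the means and HSD values reported in the table. It does not recompute the HSD itself, and the function name and the abbreviations RF, MD, SVM and CART as dictionary keys are chosen for this example.

```python
from itertools import combinations

# Means and HSD thresholds as reported in Table 7.
OA_MEANS = {"RF": 0.8739, "MD": 0.7938, "SVM": 0.7816, "CART": 0.7687}
KAPPA_MEANS = {"RF": 0.7202, "MD": 0.5603, "SVM": 0.5195, "CART": 0.4734}
OA_HSD = 0.0881
KAPPA_HSD = 0.2382

def significant_pairs(means, hsd):
    """Return the classifier pairs whose mean difference exceeds the HSD."""
    return sorted(
        (a, b)
        for a, b in combinations(means, 2)
        if abs(means[a] - means[b]) > hsd
    )

oa_pairs = significant_pairs(OA_MEANS, OA_HSD)
kappa_pairs = significant_pairs(KAPPA_MEANS, KAPPA_HSD)
print(oa_pairs)     # pairs that differ in overall accuracy
print(kappa_pairs)  # pairs that differ in kappa
```

With the reported values, random forest differs from SVM and CART in overall accuracy but only from CART in kappa, which is consistent with the a/ab/b grouping in the table.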
Table 8. Confusion matrix for the accuracy assessment of cropland maps in the ZRB.
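| Classified data | Reference: Non-cropland | Reference: Cropland | Row sum | User accuracy (%) |
|---|---|---|---|---|
| Non-cropland | 2284 | 312 | 2596 | 88.0 |
| Cropland | 439 | 1604 | 2043 | 78.5 |
| Column sum | 2723 | 1916 | 4639 | |
| Producer accuracy (%) | 83.9 | 83.7 | | |

Overall accuracy: 84%
Kappa: 0.67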
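The accuracy figures in Table 8 follow from the four counts of the confusion matrix using the standard definitions of overall accuracy, user accuracy, producer accuracy and Cohen's kappa. The sketch below recomputes them in plain Python. The function name and the returned dictionary keys are made up for this example, and the matrix orientation (rows as classified data, columns as reference data) is taken from the table.

```python
def accuracy_metrics(matrix):
    """Accuracy measures from a square confusion matrix.

    matrix[i][j] is the number of samples classified as class i
    whose reference class is j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    diagonal = [matrix[i][i] for i in range(n)]

    overall = sum(diagonal) / total
    user = [d / r for d, r in zip(diagonal, row_sums)]      # per classified class
    producer = [d / c for d, c in zip(diagonal, col_sums)]  # per reference class
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total**2
    kappa = (overall - expected) / (1 - expected)
    return {"overall": overall, "user": user, "producer": producer, "kappa": kappa}

# Counts from Table 8 (class order: non-cropland, cropland).
TABLE_8 = [
    [2284, 312],   # classified as non-cropland
    [439, 1604],   # classified as cropland
]
metrics = accuracy_metrics(TABLE_8)
print(round(metrics["overall"] * 100))  # 84
print(round(metrics["kappa"], 2))       # 0.67
```

The same function works for the five-class matrices in Table 6, provided every class has at least one classified and one reference sample, since empty rows or columns would cause a division by zero.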

Share and Cite

MDPI and ACS Style

Bofana, J.; Zhang, M.; Nabil, M.; Wu, B.; Tian, F.; Liu, W.; Zeng, H.; Zhang, N.; Nangombe, S.S.; Cipriano, S.A.; et al. Comparison of Different Cropland Classification Methods under Diversified Agroecological Conditions in the Zambezi River Basin. Remote Sens. 2020, 12, 2096. https://doi.org/10.3390/rs12132096
