1. Introduction
Food security, sustainable agricultural production, and poverty reduction are top priorities and challenging societal issues in Mozambique. The production and productivity of the main crops is very low due to limited use of improved agricultural inputs and the post-harvest losses are high. Therefore, the access to food for the gross part of Mozambican population relying on agricultural production is only guaranteed during short periods. As such, about 24% of households are food insecure and there is an extremely high level of chronic under-nutrition (43%) which affects almost one in every two children under the age of five years [
1].
Mapping the cropland extent, distribution, and characteristics is of great importance to support measures to mitigate these priority problems in Mozambique. In fact, accurate and consistent information on cropland and the dynamic location of major crop types is required to support future policy, investment, and logistical decisions [
2,
3], as well as production monitoring [
4]. Timely cropland information directly feeds early warning systems, such as FAO Global Information and Early Warning System (GIEWS), Global Monitoring for Food Security (GMFS), and Famine Early Warning Systems Network (FEWS NET) [
5,
6]. Moreover, this information is also relevant for other disciplines, such as environmental impact and mitigation measures of the foreseen climate scenarios [
7]. In agriculture monitoring, as well as climate modeling, cropland maps can be used as a mask to isolate agricultural land for (1) time-series analysis for crop condition monitoring and (2) to examine how the cropland responds to different climatic scenarios [
3]. Additionally, timely and accurate assessment of land cover and land use is essential in understanding how climate variability affects regional crop production and for supporting the allocation of resources for medium and long-term planning. This issue is particularly relevant considering that Mozambique is highly vulnerable to extreme weather events like droughts and floods (ranks 3rd amongst African countries) and to erratic and unpredictable precipitation patterns. These conditions contribute to instability in crop production and often limit access to food. Traditional smallholder and resources-poor farming systems dominate the agricultural landscape in Mozambique, with 99% of fields smaller than 1.5 hectares and growing mainly maize (
Zea mays L.) or cassava (
Manihot esculenta Crantz) for own consumption [
8]. These traditional farming systems are characterized by a highly dynamic shifting nature because when the production starts to decline (in a time frame of very few years), the crop fields are left for fallow and new natural areas are open for cultivation.
Smallholder farmers constitute the socio-economic backbone of the country and provide the basis for food security, preserving natural biodiversity, and providing environmental protection [
5]. It is therefore imperative to adopt data and tools that enable appropriate monitoring and mapping of these traditional agricultural systems so that they can be included and valued into more global food systems. On the other hand, to meet the rapidly growing demand for food, fueled by a rapid population growth and changing diets, and to enhance economic development, pressure on natural areas (forest and savannahs) is increasing [
9,
10]. Cultivating more land to increase yield has put natural land and water resources under intense pressure [
11,
12]. The introduction of agricultural machinery, the use of irrigation systems and fertilizers are contributing to rapidly modify the land surface. Given the rate of agricultural area expansion, the shifting nature of the dominant agricultural system and considering Mozambique’s vulnerability to risks from droughts and floods, there is a need to significantly improve the information management system aiming to feed crop statistics as well as to monitor and support the sustainable use of soil, water, and other resources [
1]. In such context, the timely, accurate, and cost-efficient mapping of these peculiar and evolving agricultural systems in Mozambique, at fine scales, is paramount.
Remote sensing, through Earth Observation (EO) imagery, has been critical for land cover classification thanks to its wide area and time-repeated coverage. Several global land cover products have been developed and made publicly available. Among others, the following products are the most widespread: (i) the Global Land Cover database for the year 2000 (GLC2000), a SPOT4-based map produced by the European Commission’s Joint Research Center (JRC), with a spatial resolution of about 1 km at the equator and larger at higher latitudes [
13,
14,
15]; (ii) the Globcover, produced by the European Space Agency with ENVISAT-MERIS data, with a 300 m of spatial resolution [
16]; (iii) the Moderate-resolution Imaging Spectroradiometer (MODIS) Collection 5 Land Cover Product, with 500 m of spatial resolution [
17]; (iv) the Global Land Cover Characterization database (GLCC), resulting from a concerted effort between the U.S. Geological Survey (USGS), University of Nebraska Lincoln (UNL) and the JRC, which is based on the Advanced Very High Resolution Radiometer (AVHRR) and has a nominal spatial resolution of 1 km [
18]; (v) the Global Food Security-support Analysis Data Product (GFSAD1000), with a nominal scale of 1 km [
19]; (vi) the global Spatial Production Allocation Model (SPAM) dataset, which utilizes several datasets and ancillary information, with a spatial resolution of about 10 km at the equator [
13]; (vii) the International Institute for Applied Systems Analysis-International Food Policy Research Institute (IIASA-IFPRI) cropland product, a cropland percentage hybrid map generated using some of the datasets mentioned above, with a spatial resolution of 1 km [
2]; the National Geomatic Center of China (NGCC) 30 m global land cover (Globeland 30), based on Landsat and China Environmental Disaster Alleviation Satellite (HJ – 1) [
20].
In almost all the above-mentioned products, the coarse spatial resolution hinders the careful assessment of cropland class, which is characterized by a large variety of agro-systems with a dynamic nature. This issue is more pronounced in regions with prevalence of traditional smallholder and resource-poor farming systems, such as in many African countries [
3,
13,
21]. In addition, the authors of [
3,
12,
22] argued that some of these datasets are often not reliable over croplands, as they show important disagreement with each other and with national statistics. Other important limitations of the current global land cover maps are the lack of temporal updates, the limited number of land cover classes, and the types of classes defined, which are of limited use in finer scale applications [
13].
There is growing interest in mapping smallholder dominated landscapes [
23,
24,
25,
26,
27,
28,
29,
30,
31]. In general, smallholder cropland mapping is hindered by characteristics such as small field size, heterogeneity in management practices, landscape fragmentation, and the widespread presence of trees within the fields [
24,
32]. These characteristics differ widely between countries, agro-ecological zones, socio-economic context, and even region or province, making difficult the definition of the kind of spatial resolution and data input combinations for accurately mapping smallholders, as discussed by [
30,
32,
33]. As such, there are studies reporting accurate land cover classification results within smallholder dominated landscapes using high-resolution imagery [
29], very high-resolution imagery [
23], or a combination [
27]. The classification input data and the features calculated from the data, e.g., textural features at different window sizes, also impact on the classification accuracy, e.g., [
33,
34,
35,
36,
37]. The textural features represent the spatial variation in contiguous pixel values and thus provide a quantitative measure of spatial context. These texture features are extracted through the application of multiple functions to the image bands at different window sizes, and thus can result in a large volume of input data with potentially relevant information for the classification [
37]. Although the use of different sets of features in land cover classification has been widely explored, e.g., [
34,
35,
38], more efforts are needed towards assessing the best combination of features when classifying shifting smallholder agricultural landscapes typical in several African countries as in Mozambique. According to [
32] the selection of appropriate imagery and input feature for shifting smallholder classification is probably a context-specific issue.
The development of powerful cloud-based computational frameworks, coupled with the increasing accessibility of imagery resulting from the Landsat Global Archive Consolidation (LGAC) initiative [
39] and the European Commission Copernicus programme (Sentinel 2A and 2B) [
40] is making custom classification more accessible, and have the potential to help overcome the limitations of existing products. An example of the combined use of these assets is the Google Earth Engine (GEE), a planetary-scale platform for geospatial data science powered by Google’s cloud platform [
41]. As pointed out by [
13], GEE is an invaluable tool by making available a huge data pool of satellite imagery and access to robust algorithms that can be applied for diversified types of uses, including image classification. In fact, numerous studies have been developed to perform classification within GEE at regional [
42,
43,
44,
45] and global scales [
46,
47]. Amongst others, the frequently used classifiers include random forest (RF) [
48]; support vector machine (SVM) [
49], maximum entropy model (MaxEnt) [
50]; classification and regression trees (CART) [
51]. The RF, which is a classifier based on an ensemble of classification trees, has been demonstrated to provide higher accuracies in image classification applications than other techniques [
52,
53], independent of the type of data used and of the targeted environment. Moreover, RF is recognized by its ability to cope with a large set of input variables, which results from the use of a random subset of input features in the division of every node [
37].
As highlighted by [
54], the rising availability of improved RS data and tools offers the opportunity for the development of land cover and land uses maps at finer spatial and temporal resolutions, which are suitable to support applications in an agricultural setting with predominance of smallholder and resource-poor agricultural systems. This is particularly crucial in the face of rising climate variability and the associated uncertainties on smallholder agriculture [
40]. Additionally, according to [
3], Mozambique is one of the three Southern African countries of high priority for cropland mapping and which would really benefit from a timely land cover/land use update.
Most previous studies on smallholder cropland mapping considered only a single date or a single season time scale. However, making use of multitemporal imagery (from various years) is needed to capture both the spatial dynamics and the temporal shifting characteristic of smallholder agriculture in Mozambique.
In this context, the main objective of the current study is to develop and test an automated land cover classification and mapping for Vilankulo district in Mozambique using GEE cloud computing and based on different types of features derived from Landsat imagery for three years within the period between 2012 and 2018. The types of features tested include Landsat spectral bands, spectral vegetation indices, and textural features derived from the spectral bands. The specific objectives include (i) evaluating the best combination of input variables for the classification process; (ii) assessing the most adequate window size for the texture input features; (iii) assessing the spatial and temporal dynamics of agriculture in Vilankulo; (iv) comparing the statistics of agricultural areas as extracted from the produced maps to those yearly reported by the Local Government entities.
2. Materials and Methods
2.1. Study Area Description
The current study was conducted in the district of Vilankulo, within the province of Inhambane in southern Mozambique (
Figure 1), and includes agricultural seasons of the years 2012, 2015, and 2018.
Climatologically, Vilankulo is characterized by a semi-arid to arid climate and presents two distinct seasons, a warm and rainy season running from October to March and a cold and dry season that runs from April to August.
The district of Vilankulo shapes well the complexity and heterogeneity of a landscape characterized by diversified land covers types, including agriculture, open and close canopy natural vegetation, water bodies, grasslands, bare soils, and human settlement areas. The agriculture is characterized by two cropping seasons, one during the rainy season and another during the dry season. The agriculture is mainly rainfed and is practiced only in the rainy season in mixed crop systems (maize, peanuts, beans, cassava, etc.); generally multipurpose trees species are also left within the crop fields for shade, fruits, medicines, and firewood collection. During the dry season, the rainfed croplands are not cultivated. In addition, rainfed croplands may be left for fallow after three or four years of consecutive cropping seasons, when the crop production drops considerably, and new natural areas are open for cultivation. Large scale irrigated crops (mainly maize, beans, soybean, and potatoes) have been introduced more recently and although grown throughout the entire year, it is still restricted to a small number of zones within the study area. Additionally, during the dry season, the cultivation is concentrated in areas neighboring water bodies (lagoons and rivers), where vegetables (e.g., tomatoes, cabbage, onion) are grown in small areas using handheld watering cans or small scales pumping irrigation systems.
The open-canopy natural vegetation (OCNV) comprises areas with vegetation bellow 5 m height and canopy cover below 10%; close canopy vegetation (CCNV) corresponds to areas with vegetation above 5 m height and canopy cover above or equal to 10%; water bodies (WB) comprise areas periodically or permanently covered by water; grasslands areas (GR) are primarily covered by grass with or without scattered shrubs; bare soils (BS) are uncovered areas which in the study area include prepared areas for agriculture, sand bands along water bodies; and agriculture (AGR) consists of cultivated fields.
2.2. Landsat Image Selection and Pre-Processing
Due to its high number of bands, Landsat 8 OLI/TIRS was taken as the prime source of data for the study and a search for all USGS Surface Reflectance (SR) images from the Landsat 8 OLI/TIRS sensor covering Vilankulo district in the main cropping seasons of 2015 and 2018 (October to March) was performed within the GEE’s data pool. The images used for each year are listed in
Table 1.
The study area is covered by three Landsat scenes of the path/row 166/75, 166/76, and 167/75.
Pre-processing of Landsat 8 OLI/TIRS images consisted of application of quality bands to remove cloud contamination.
For the year 2012, prior to the Landsat 8 OLI/TIRS launch (February 2013), a search for Landsat 7 ETM+ was done and six atmospherically corrected surface reflectance images and with cloud cover below 5% were identified. In this year, there was an anticipation of cropping season, taking advantage of good meteorological conditions observed in September. As such, an image of September was used for the scene with path/row 167/75. The USGS Gap-Fill algorithm, as implemented within GEE, was applied to fill the SLC-failure gaps in all selected Landsat 7 ETM+ images. The methodology involves using data from other scenes of the same reference (source scenes), to fill the gaps in the scenes of interest (fill scenes). The source scenes were selected within a time interval of less than four months relative to the date of the fill scenes (
Table 2).
2.3. Texture Features
Texture is a distinctive property of all surfaces. It contains important information about the spatial distribution, structural arrangement of surfaces, and their relationship to the surrounding environment [
55]. Various methods can be used to derive textural information, for example, the first and second-order local statistics [
56] and grey level co-occurrence matrix (GLCM) measurements [
55]. Local statistics (LS) describe the moments of a neighborhood of individual pixels in an image region, while GLCM-based texture features characterize the distance and angular relationships among pixels. The GLCM is a tabulation of how often different combinations of pixel brightness values (grey levels) occur in an image. It counts the number of times a pixel of value X lies next to a pixel of value Y, in a given direction and distance, and then derives statistics from this tabulation. GLCM can be classified into two groups: (i) occurrence or first order measures, which describe a statistical property, such as pixel variance or skewness within a specified neighborhood of an image [
57,
58]; and (ii) co-occurrence or second order measures, which characterize the relative frequencies between two pixel brightness values linked by a spatial relation [
59,
60]. The second order GLCM texture measures have been the most common textural features used in remote sensing studies, including for forest structure modelling [
61,
62,
63], habitat modelling [
64], and land cover classification [
65,
66,
67].
In this study, the 14 GLCM metrics proposed by [
55], namely Angular Second Moment (asm), Contrast (contrast), Correlation (corr), Variance (var), Inverse Difference Moment (idm), Sum Average (savg), Sum Variance (svar), Sum Entropy (sent), Entropy (ent), Difference variance (dvar), Difference entropy (dent), Information Measure of correlation 1 (imcorr1), Information Measure of correlation 2 (imcorr2), and maximum correlation coefficient (maxcorr), and an additional four textural features from [
68], specifically, Dissimilarity (diss), Inertia (inertia), Cluster Shade (shade) and Cluster prominence (prom), were computed for each Landsat band. As the value of texture variables depends mostly on window size and spatial resolution [
61], six different window sizes (2, 5, 10, 20, 25, and 30) were tested while the 3 × 3 square default kernel size was kept constant.
2.4. Vegetation Indices
Four vegetation indices frequently used in land cover and agricultural studies were computed: normalized difference vegetation index (NDVI) [
69]; soil adjusted vegetation index (SAVI) [
70]; enhanced vegetation index (EVI) [
71]; and the normalized water difference index (NWDI) [
72]. NDVI is one of the most used vegetation indices for vegetation phenology characterization thanks to its sensitivity to photosynthetically active biomass and its ability to discriminate vegetation/non-vegetation, as well as wetland/non-wetland cover classes. SAVI was included to make use of its ability to minimize the soil background effect, as the study area presents large areas of sparse vegetation. EVI can be useful in areas of high leaf area index (LAI) where NDVI may saturate and NDWI is useful to discriminate water from land due to its sensitiveness to open water [
73].
2.5. Image Composites
For each year studied, when more than one image per scene existed, an aggregation method on a per-pixel, per-band basis was applied by using the median value. The Pearson’s correlation coefficient was used to assess the correlation between Landsat bands. Bands of the same spectral region showed correlation coefficient greater than 0.90 for both visible and short-wave infrared regions. As such, only three bands were selected for further analysis, being the band 1, B1 (presenting the least correlation coefficient (0.24) with the B5); band 5, B5 (presenting the least correlation coefficient with all bands (0.36) in average); and band 6, B6 (presenting the least correlation with all bands (0.60), as compared to the B7, of the same spectral region).
The final image composites for the years 2015 and 2018 comprised 61 features: (i) three reflective Landsat 8 bands (Band 1—deep blue; Band 5—NIR; Band 6—SWIR1), (ii) four vegetation indices (NDVI, SAVI, EVI, and NWDI), and (iii) 54 GLCM textural features (18 GLCM × 3 bands). For the year 2012, the final image composites included the same numbers of features, but the textural features were extracted from reflective Landsat 7 bands (Band 1—blue; Band 4—NIR; Band 5—SWIR1).
2.6. Training and Validation Samples Collection
Six land cover types, characteristic of the study area landscape, were considered for image classification: (i) Close canopy natural vegetation (CCNV), (ii) Open canopy natural vegetation (OCNV), (iii) Water Bodies (WB), (iv) Agriculture, (v) Bare soils (BS), and Grassland (GR). Each land cover type is characterized according to the description provided in
Section 2.1. The urban settlements were excluded from the classification process by using a mask derived from available cartography.
Global positioning system (GPS) points of each land cover type were collected during two field campaigns in 2017 and in 2019 and used as references during the on-screen collection of training data. During the field campaigns, farmers were asked about cultivation in previous years in order to ensure that only croplands operating for at least two consecutive years were included. The GPS points were then overlaid to the Landsat RGB and to very high-resolution Google Earth imagery of the corresponding cropping season and used as reference to select additional training samples. For the years where no ground data were available, the training data was selected based on visual inspection and interpretation of both Google Earth very high-resolution imagery and Landsat RGB composite. Visual interpretation of reference imagery was based on elements that help to identify land cover features such as location, size, shape, tone, color, shadow, texture, and pattern. A similar procedure was used by [
74,
75].
The validation data were generated by a stratified random sampling method from the classification results. The method was performed for each map (2012, 2015, and 2018) and, for each cover class, setting the number of validation points to at least 30% of the respective training points. This procedure enabled independence between training and validation data. It was also implemented in previous studies by [
34].
Table 3 presents the number of training and validation samples used in each year and
Figure 1 shows the spatial distribution of training and validation samples within the study area.
2.7. Input Features Selection for Classification
For the selection of optimal input features, the three non-correlated Landsat spectral bands, four vegetation indices (VIs), and the 18 GLCM textural features extracted in the three Landsat bands were initially considered.
Subsequently, combinations involving one or more types of the above features were established, including (
Table 4) (i) Landsat bands and 18 GLCM per Landsat band (T1: Angular Second Moment (asm), Contrast (contrast), Correlation (corr), Variance (var), Inverse Difference Moment (idm), Sum Average (savg), Sum Variance (svar), Sum Entropy (sent), Entropy (ent), Difference variance (dvar), Difference entropy (dent), Information Measure of correlation 1 (imcorr1), Information Measure of correlation 2 (imcorr2), Maximum Correlation Coefficient (maxcorr), Dissimilarity (diss), Inertia (inertia), Cluster Shade (shade), Cluster prominence (prom)); (ii) Landsat bands and five GLCM textural features (T2: contrast, corr, variance, sevg, diss); (iii) Landsat bands and five GLCM textural features (T3: asm, shade, ent, prom, inertia); (iv) Landsat bands and six GLCM textural features (T4: asm, contrast, var, corr, ent, and idm); (v) Landsat bands; (vi) Landsat bands and four VIs.
For the combinations involving textural features, six different window sizes (2, 5, 10, 20, 25, and 30) were considered.
The subsets of five and six GLCM textural features selected for the combinations tested (#: 7–20 in
Table 4) were identified among the most relevant according to the literature (asm, contrast, variance, correlation, entropy, inverse difference moment, dissimilarity) [
60,
76,
77,
78,
79]. The textural features proposed by [
68], were also included. These are suggested to improve the perceptual concepts of uniformity, proximity, and the qualities of texture periodicity and gradient, when combined with the so called most relevant GLCM textural metrics. The inverse divergence moment (idm) indicates the image homogeneity as it assumes larger values for smaller grey tone differences in pair elements. Contrast is a degree of spatial frequency, the difference between the maximum and the minimum values of a neighboring set of pixels; high contrast values denote high coarse texture. The dissimilarity (diss) informs about the heterogeneity of the grey levels; coarser textures present higher values of dissimilarity. Entropy (ent) indicates the disorder of an image; non-texturally uniform image presents very large entropy. Angular second moment (asm) is a measure of textural uniformity (pixel pair repetition); high asm values occur when the grey level distribution has either a contrast or a periodic form. Correlation (corr) indicates the grey tone linear dependencies in the image; high correlation values imply a linear relationship between the grey levels of pixel pairs. More information about the GLCM features can be found in [
55,
78].
The Jeffreys–Matusita distance (JM) was computed for each feature combination and all possible pair of classes. The JM is a pair-wise measure of class separability based on the probability distribution of two classes [
80,
81,
82]. JM ranges from 0 to 2.0; values below 1.0 indicate that two classes are not separable; values between 1.0 and 1.5 indicate that two classes are separable but with some degree of confusion, and between 1.5 and 1.9, the classes are clearly separable; completely separable classes present JM values above 1.9 [
38]. The selection of final feature combination considered the largest JM for the least separable pairs of classes, the number of features in the combination, and the GLCM window size. A similar approach was applied in previous studies [
38,
83,
84].
2.8. Classifier Algorithm
In this study we used the random forest classifier (RF) [
48], within the GEE platform. The RF can be described as the collection of tree-structured classifiers. It produces multiple decision trees, using randomly selected subsets of training samples and variables. As such, a classification result is obtained in each tree trained on a subset of the original training sample; subsequently a majority vote is used to create the final classification result. This classifier algorithm is considered very fast, robust against over fitting, allows forming as many trees as the user wants, and each decision tree uses a random subset of training data and a random subset of input predictor variables, which reduces the correlation between decision trees as well as the overall computational complexity [
85]. In our study, a RF with 500 trees was created and the number of features per split was set to the square root of the total number of features, which corresponds to the standard settings [
86].
2.9. Accuracy Assessment
A confusion matrix and the accuracy assessment of each map were computed using the validation data set for each land cover type. The widely applied overall accuracy (OA), Kappa coefficient (KC), producer accuracy (PA), and consumer accuracy (CA) were computed and used to assess the quality of derived land cover maps. The OA determines the overall efficiency of the algorithm and is measured by dividing the total number of correctly classified samples by the total number of the testing samples. The KC indicates the degree of agreement between the ground truth data and the predicted values. The PA measures how well a pixel has been classified. It includes the error of omission which refers to the proportion of observed features on the ground that are erroneously excluded from a class. The CA measures the reliability of the map, informing how well the map represents what is really on the ground and it includes the error of commission which refers to pixels erroneously included in a class [
73,
87].
Furthermore, the statistics of cropped areas for each year were computed and compared to those from the Local Government entity responsible for agricultural issues within the district (Serviços Distritais de Actividades Económicas, SDAE). SDAE estimates the agricultural area (ha) and production (ton) within the district based on the share of economically active population as reported by the National Census and share of population in agricultural sector, and the number and size of croplands as reported by the National Agrarian Census.
4. Discussion
The selected input features for land cover classification in the study area include five GLCM textural features (angular second moment—asm, entropy—ent, inertia—inertia, cluster shade—shade, cluster prominence—prom), and three Landsat spectral bands (Band 1—deep blue, Band 5—NIR, and Band 6—SWIR1 for Landsat 8; and Band 1—blue, Band 4—NIR, and Band 5—SWIR1 for Landsat 7). Several studies have shown that integrating optical spectral data with texture features representing the spatial variability in pixel values can improve classification accuracy [
33]. The textural features asm and ent have been included as optimal features for land cover classification in previous studies, e.g., [
76,
78,
79]. In this study, asm and ent performed poorly when combined together with other so called most relevant textural features (texture features combinations #17–#20;
Table 5), but results improved when asm and ent were combined with the textural features inertia, shade, and prom (texture features combinations #1–#12;
Table 5). These results suggest that textural features inertia, shade, and prom may have improved the separability of different cover classes as argued by [
68]. The Landsat spectral bands selected in this study are widely used as important features for land cover classification in diversified types of landscapes [
13,
88,
89]. The increase of window values for textural feature extraction showed notable gain in land cover class separability up to window equal to 25 for combinations #1–#12 (
Table 4). The increase of class separability with the increase of window is plausible for the study area which presents high degree of spatial variability. In a previous study, the authors [
35] concluded that for areas displaying a lower degree of spatial variation small window sizes perform better, while for heterogeneous landscapes texture features extracted using large window values highly improve the classification accuracies. The fact that in this study the class close canopy natural vegetation (CCNV) was insensitive to window increase in several combinations of features may be explained by the within homogeneity that characterizes this type of land cover class in the study area.
Overall, the separability between the land cover classes was good. Lower separability occurred between open canopy natural vegetation (OCNV) and grassland (GR) and close canopy natural vegetation (CCNV) and agriculture (AGR) (
Table 4 and
Table 5). The low separability between OCNV and GR could be explained by the fact that the distinction of these classes was based on the tree/shrub cover percentage or density as observed in the field and from high-resolution imagery. Nevertheless, the two classes may present similar spectral behavior as consequence of defoliation of the trees comprising the upper canopy. Defoliation and leaf development have been indicated as the main factor of seasonality of spectral behavior in deciduous forest canopies [
90,
91]. When the defoliation occurs in the upper canopy trees, the spectral signature of the site is dominated by the understory vegetation. The opposite is verified in case of leaf development [
90]. Other authors also reported lower separability performance between land cover classes related with sparse vegetation and low shrublands [
88]. The low separability between CCNV and AGR may be explained by the fact that in the study area, forest areas are characterized by high fragmentation, with interstices occupied by other land cover types including AGR. Additionally, AGR is characterized by mixed crop systems, which generally include multipurpose trees within the fields. Additionally, in [
92] the authors reported the complexity of classifying mosaic areas comprising different proportions of various homogeneous classes including forests.
The overall accuracy of the classification performed in this study (percentage values: 89%–98%;
Table 7) outperformed the results of previous studies equally using the random forest algorithm with varying data types and in diversified types of landscapes and area extensions. For example, 85% overall accuracy in crop type classification using a multi-spectral time series of RapidEye images [
93]; more than 80% for a time series of Landsat 7 images in homogeneous regions [
94]; around 89% for a time series of Landsat 7 and 8 in heterogeneous regions [
13]; up to 71.7 for Worldview 2 time series in heterogeneous regions [
95] and 85.9% for crop discrimination using SPOT 5 images [
96]. This may be attributed to the fact that in this study, a reduced spatial extent was under consideration, what made it easier to take the local conditions into account and tune the classification accordingly. Nevertheless, the overall accuracy of these studies reports to different land cover classes, including classes relative to different crop types. The lower classification accuracy for the year 2018 (0.89) as compared to the years 2012 (0.94) and 2015 (0.98) may equally be justified by the high fragmentation of forest areas within the study area. In fact, from
Figure 2 it is noticeable that the forest fragmentation is higher in 2018, mainly due to agricultural expansion. Due to the early mentioned characteristics of agricultural systems, the confusion between this class and the forested areas is higher in 2018, what is confirmed in the confusion matrices presented in the
Table 6.
Most OCNV misclassified pixels were erroneously classified as other vegetated classes (CCNV, AGR, and GR). This may be due to the seasonality in canopy spectral signature, which is characteristic in deciduous vegetation [
90]. In fact, within the study area, CCNV pixels may temporally seem OCNV due to defoliation and the same process can make OCNV appear similar to AGR or GR. The inaccurately classified BS pixels as AGR (
Table 6) could be due to the effect of soil background during initial stages of crop development, as various satellite images used in the classification were from dates corresponding to the initial phase of crops growth cycle (
Table 1).
The land cover analysis showed that the study area landscape is predominantly covered by natural vegetation classes (CCNV, OCNV, and GR), and experiences a strong dynamic from one land cover class to another (
Figure 2). Changes between the three natural vegetation classes appear to be largely due to spectral similarities caused by vegetation seasonality characteristics rather than due to conversion from one to another class. Similar findings were reported in previous studies, e.g., [
35,
38,
78]. As was expected, the class agriculture (AGR) ranks first as the driving of effective land cover conversion, specifically for the class OCNV. During the period 2012–2018 a total of 60,945.12 ha of OCNV area was converted into AGR while 74,752.38 ha was recovered from AGR to OCNV. In the same period, a total of 16,203.78 ha was converted from all other cover classes to AGR (
Table 8 and
Table 9). These land cover dynamic of transitions from AGR to other classes and vice-versa confirms the shifting nature of agriculture in the study area, which is characterized by systematically leaving old croplands for fallow and setting new areas for cultivation in natural vegetation areas. As pointed out by [
9,
11,
97], in most African countries natural areas (forests and savannas) are under increasing pressure as the cultivation of more and more land is the means to meet the growing food demand. Additionally, the authors of [
11,
97,
98,
99] pointed out that anthropogenic land conversion is a primary cause of habitat and biodiversity loss.
The statistics of agriculture area as derived from the classification are relatively greater than those reported by the Local Governmental Entity, the SDAE, but the mean difference for the three years is rather low (slightly above 10%;
Figure 3). The SDAE’s statistics, as the overall agricultural statistics relying on farm surveys, have their own accuracy errors, which may result from the respondent (literacy degree, mistrust), the interviewer, the survey instrument (questionnaire design), the mode of data collection, and the incompleteness of the census list. The mean difference between the satellite-based classification and ground truth statistics from SDAE are within the average error of field surveys based agricultural censuses in developing countries (average error of 11% between the area measured and the area estimated by farmers) [
99].
In a study to map cropping intensity of smallholder farms in India, the authors [
25] reported a large discrepancy between cropped area estimated through remote sensing, using Landsat data, and the area of agricultural census (percentage difference of 150.8%). Likewise, [
100] the authors found that irrigation area derived using MODIS 500 m data was consistently greater than the ground truth data in most of the Indian States but the percentage difference was of about 17.2%.
The good accuracy results obtained for the maps produced in this study indicate their reliability for cropland area estimation, while adding a spatially distribution information. This finding is very important considering that cropland mapping is a pre-requisite for crop monitoring through remote sensing.
The applied methodology may most likely be applied straightforward in other districts of Mozambique, given the similarities of agricultural systems. However, its application in other locations across the globe, should be considered with care as the appropriateness of a method depends on the goals of the study, the scale of analysis, and the characteristics of the agricultural system in question [
30,
32,
33].
5. Conclusions
In this study, Google Earth Engine cloud computing was successfully applied for land cover classification in Vilankulo district in Mozambique. The best performing input variables for the random forest ensemble classifier were identified through systematic testing of different types of features. The selected input features included spectral bands and texture features. Furthermore, the best window size for the texture features computation was selected based on land cover classes separability, allowing the identification of the classes with higher sensitivity to this specific parameter in liaison to the inherent specificities of each class. The classification was applied to various years allowing to evaluate the shifting dynamics of smallholder agriculture in a time frame between 2012 and 2018.
The classes close canopy natural vegetation (CCNV), open canopy natural vegetation (OCNV), and grassland (GR) are the prevalent land cover types within the district while agriculture (AGR) is the most important cause of land cover change either by opening new cropping fields or by leaving the old ones for fallow. The cultivated area consistently increased throughout the entire study period and new spatial patterns were identified, with more concentrated agriculture areas spanning to the south east portion of the district.
The produced maps yielded high overall accuracy and the derived agricultural statistics present reasonable agreement with the ground truth statistics from SDAE.
The operational applications of this study results include timely and spatially distributed monitoring of agricultural areas, which can contribute to the improvement of agricultural statistics, to support local decisions regarding agriculture management and associated inputs management (e.g., irrigation water), to feed crop growth simulation models, among others. Additionally, the results allow the monitoring of natural vegetation dynamics for several years, which can support management decisions related to biodiversity and natural resources conservation. Moreover, on demand products may be derived based upon the methodology developed. Nevertheless, future research should address testing other types of remote sensing data (e.g., Sentinel-1 and Sentinel-2) either paired or not paired with Landsat data.