Delineating Smallholder Maize Farms from Sentinel-1 Coupled with Sentinel-2 Data Using Machine Learning

Rural communities rely on smallholder maize farms for subsistence agriculture, which is the main driver of local economic activity and food security. However, the planted area of these farms is unknown in most developing countries. This study explores the use of Sentinel-1 and Sentinel-2 data to map smallholder maize farms. The random forest (RF) and support vector machine (SVM) machine learning algorithms, as well as model stacking (ST), were applied. Results show that classifying combined Sentinel-1 and Sentinel-2 data improved the accuracy of the RF, SVM and ST algorithms by 24.2%, 8.7% and 9.1%, respectively, compared with classifying Sentinel-1 data alone. The similarity of the estimated areas (7001.35 ± 1.2 ha for RF, 7926.03 ± 0.7 ha for SVM and 7099.59 ± 0.8 ha for ST) shows that machine learning can estimate smallholder maize areas with high accuracy. The study concludes that single-date Sentinel-1 data were insufficient to map smallholder maize farms, whereas single-date Sentinel-1 data combined with Sentinel-2 data were sufficient. These results can support the generation and validation of national crop statistics, thus contributing to food security.


Introduction
Maize (Zea mays L.) is an essential cereal crop worldwide for food consumption, animal feed and the production of industrial products such as biofuels [1]. Developed countries consume lower quantities of maize than developing countries in Asia, Latin America and Africa, which are reliant on maize [2]. Smallholder farmers account for 80% of the maize produced as a staple crop in Africa [3]. However, global climate forecasts indicate that Africa could be one of the regions most susceptible to the effects of climate change by 2050. Climate change is expected to cause growing water shortages and a scarcity of suitable land, which will affect the production of cereal crops, including maize [4,5]. Smallholder maize farms are important for the livelihoods of rural communities in Africa, who depend on agriculture for food security and local economic activity. These farmers face problems such as inadequate rainfall due to droughts, poor soils and limited irrigation infrastructure, which limit their productivity [6]. Despite these problems, demand for maize is increasing as a consequence of population growth [7]. The gap between declining maize supply and increasing demand makes it necessary to develop a methodology for mapping smallholder maize farms and their sizes. Information about the areal extent of smallholder farms will guide the government when disbursing aid, inform land-use policies and indicate the current food security status, especially in vulnerable rural communities. The information provided by this study will support local government initiatives to provide spatial information on agricultural land use by rural communities, as reliable information is lacking in most developing countries.
The study area is located in Makhudutamaga, Limpopo Province, South Africa (Figure 1). This area receives rainfall during the warmer months of October to March, and the mean annual rainfall is 536 mm. The fields have an average elevation of 1333 m above mean sea level. Temperatures can drop to 7 °C in winter and rise as high as 35 °C in summer, according to records from the automatic weather stations of the Agricultural Research Council. This area was selected as a case study because most of the rural population are smallholder maize farmers, who farm primarily for subsistence and partly for sale in local markets [20]. Specific regions of interest (ROI) were delineated for investigation based on the locations of the smallholder maize farms. The ROI were obtained from the local government department of agriculture (DAFF), which developed them through survey campaigns. Using the ROI improved the estimate of the area covered by smallholder farms by excluding built-up areas, where households may have backyard maize gardens that would lead to an overestimation of the planted area; these households consume their maize before harvest time.

Field Data Collection
Field surveys to collect training and validation data for different land-cover types within the ROI were conducted from 18 to 21 February 2019. This period was selected because maize had its maximum green biomass at this time and could be discriminated more clearly from other land-cover types [21]. A handheld Garmin Global Positioning System (GPS) device was used to collect waypoints of different land-cover classes, following a purposive sampling approach. The classes considered were maize (19.72%), bare land (50.01%), vegetation (30.23%) and water (0.0%), which are the dominant classes in the study area. The bare land, vegetation and water classes were merged into a single non-maize class, while maize was retained as a separate class. Using only two classes, (1) maize and (2) non-maize, reduces the classification errors that can arise from classifying each land-cover class individually. For example, there were fewer water pixels in the study area than bare land and vegetated pixels; treating water as a separate class could introduce errors, depending on the sensitivity of the classifier. Ground-based validation samples for 18 smallholder maize farms were also collected with the GPS. These samples were not used as training data for classification.
Sustainability 2021, 13, 4728

Sentinel-1 Data Acquisition and Pre-Processing
Sentinel-1 Level-1 ground range detected (GRD) data, described in Table 1, were acquired from the Copernicus Open Access Hub. The interferometric wide swath (IW) image for 20 February 2019 was used; it contains vertical transmit, vertical receive (VV) and vertical transmit, horizontal receive (VH) polarized backscatter at 10 m spatial resolution. The radar image was pre-processed in the Sentinel Application Platform (SNAP). First, the orbit file was applied to update the orbit state vectors in the metadata. Radiometric calibration was then performed to convert the intensity values into sigma nought (σ⁰) backscatter coefficients. Speckle filtering was implemented to remove the granular noise caused by the interference of waves reflected from many scatterers; the Lee filter was applied with a 7 × 7 window because it was found to be superior in preserving edges, linear features, point targets and texture [22]. Range-Doppler terrain correction was then applied to correct geometric distortions caused by topography, such as foreshortening and shadows, using the Shuttle Radar Topography Mission (SRTM) 3 arc-second Digital Elevation Model (DEM) [23]. Finally, the backscatter values were converted into decibels, and the VV and VH polarizations were used to generate the VV/VH ratio.
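The speckle-filtering and decibel steps above can be illustrated with a small pure-Python sketch. It assumes a textbook form of the Lee filter for multiplicative speckle, with weight w = max(0, 1 − Cu²/Ci²), where Cu² = 1/ENL and Ci is the local coefficient of variation; SNAP's own implementation may differ in detail, and the equivalent number of looks (`enl`) is a user-supplied assumption. The function names are hypothetical. The sketch forms the VV/VH ratio from linear σ⁰ values, which in decibels equals VV_dB − VH_dB (an assumption; the text does not state which form was used).

```python
import math
from typing import List

Image = List[List[float]]


def lee_filter(img: Image, size: int = 7, enl: float = 1.0) -> Image:
    """Lee speckle filter for linear sigma0 values (multiplicative noise model).

    Weight w = max(0, 1 - Cu^2 / Ci^2), with Cu^2 = 1 / ENL and Ci the local
    coefficient of variation in a size x size window (truncated at borders).
    """
    rows, cols = len(img), len(img[0])
    half = size // 2
    cu2 = 1.0 / enl
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                img[i][j]
                for i in range(max(0, r - half), min(rows, r + half + 1))
                for j in range(max(0, c - half), min(cols, c + half + 1))
            ]
            mean = sum(window) / len(window)
            var = sum((v - mean) ** 2 for v in window) / len(window)
            ci2 = var / mean ** 2 if mean > 0 else 0.0
            # Homogeneous areas (ci2 <= cu2) are smoothed to the local mean;
            # edges and point targets (ci2 >> cu2) keep most of their value.
            w = max(0.0, 1.0 - cu2 / ci2) if ci2 > 0 else 0.0
            out[r][c] = mean + w * (img[r][c] - mean)
    return out


def to_db(sigma0: float) -> float:
    """Convert linear backscatter to decibels."""
    return 10.0 * math.log10(sigma0)


def vv_vh_ratio_db(vv_lin: float, vh_lin: float) -> float:
    """VV/VH ratio of linear backscatter, expressed in dB (= VV_dB - VH_dB)."""
    return to_db(vv_lin / vh_lin)


if __name__ == "__main__":
    scene = [[0.05] * 9 for _ in range(9)]
    scene[4][4] = 2.0  # a bright point target inside homogeneous clutter
    filtered = lee_filter(scene, size=7, enl=4.0)
    print(round(filtered[4][4], 3), round(filtered[0][0], 3))
    print(round(vv_vh_ratio_db(0.08, 0.02), 2))  # equals 10 * log10(4)
```

Filtering is applied to linear σ⁰ before the decibel conversion, matching the order of the processing chain described above.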

Sentinel-2 Data Acquisition and Pre-Processing
The Sentinel-2 Level-1C image for 26 February 2019 was acquired from the Copernicus Open Access Hub and pre-processed with the Sen2Cor plugin in SNAP to convert top-of-atmosphere reflectance to bottom-of-atmosphere reflectance [24]. The bands used are summarized in Table 1. The SWIR and vegetation red-edge bands were resampled from 20 m to 10 m resolution. The indices listed in Table 2 were then derived. These indices were investigated for mapping smallholder farms because, together, they draw on a broader part of the electromagnetic spectrum (NIR, red and green) than the normalized difference vegetation index (NDVI) alone. Additionally, the various indices account for changes in the soil background, enhance the green vegetation signal, reduce the saturation effect of NDVI or respond to chlorophyll content [25][26][27][28][29][30][31][32][33][34].
Table 2. Vegetation indices computed from Sentinel-2 imagery.

Vegetation Index | Equation | Justification | Reference
… | … | Distinguishes between maize and soil. | [26]
GNDVI | GNDVI = (NIR − Green)/(NIR + Green) | More sensitive to chlorophyll concentration than NDVI. | [31]
IPVI | IPVI = NIR/(NIR + Red) | Similar to NDVI, but computationally faster. | [28]
… | … | Minimizes the effects of variable soil reflectance. | [30]
… | … | Predicts maize green LAI (leaf area index). | [34]
… | … | Better predictor of maize green LAI than MTVI1; accounts for the soil background. | [34]
… | … | Sensitive to maize greenness, but can saturate in dense vegetation when LAI is very high. | [26]
… | … | Eliminates the effect of the soil background. | [32]
… | … | Detects maize and is not sensitive to the effects of soil and sun-viewing geometry. | [29]
SAVI | … | Similar to NDVI, but reduces the influence of soil. | [27]
… | … | Detects healthy maize, but can saturate in densely vegetated maize plots when LAI is very high. | [25]
… | … | Detects green maize biomass and chlorophyll. |
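To illustrate how indices like those in Table 2 are derived from Sentinel-2 reflectance, the sketch below computes several of them for one pixel. Only the GNDVI and IPVI formulas are stated in the table; the NDVI, DVI, SAVI, OSAVI, RDVI, MTVI1 and MTVI2 expressions are the standard literature forms and are assumptions about what was used here. The soil factors (L = 0.5 for SAVI, 0.16 for OSAVI) are common defaults, and the band mapping (green = B3, red = B4, NIR = B8) is likewise assumed.

```python
import math
from typing import Dict


def vegetation_indices(nir: float, red: float, green: float,
                       soil_l: float = 0.5) -> Dict[str, float]:
    """Vegetation indices from bottom-of-atmosphere reflectance (0-1 scale).

    Assumed Sentinel-2 bands: green = B3, red = B4, NIR = B8.
    """
    # Shared term of the modified triangular vegetation indices.
    mtvi_core = 1.2 * (nir - green) - 2.5 * (red - green)
    return {
        "NDVI": (nir - red) / (nir + red),
        "GNDVI": (nir - green) / (nir + green),  # as in Table 2
        "IPVI": nir / (nir + red),               # as in Table 2
        "DVI": nir - red,
        "SAVI": (1 + soil_l) * (nir - red) / (nir + red + soil_l),
        "OSAVI": (nir - red) / (nir + red + 0.16),
        "RDVI": (nir - red) / math.sqrt(nir + red),
        "MTVI1": 1.2 * mtvi_core,
        "MTVI2": 1.5 * mtvi_core
        / math.sqrt((2 * nir + 1) ** 2 - (6 * nir - 5 * math.sqrt(red)) - 0.5),
    }


if __name__ == "__main__":
    for name, value in vegetation_indices(nir=0.40, red=0.10, green=0.08).items():
        print(f"{name}: {value:.3f}")
```

Note that IPVI is a rescaling of NDVI, IPVI = (NDVI + 1)/2, which is why it carries the same information at a lower computational cost.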

Classification Algorithms
Three approaches were applied to map the smallholder farms, namely RF, SVM and ST. The RF algorithm is a non-parametric ensemble classifier based on decision trees [35]. It consists of a large number of classification and regression trees (CART), and each pixel is assigned the class that receives the majority of the tree votes. The RF algorithm trains each tree on a bootstrap sample drawn from the original data (bagging) and considers a random subset of the features when splitting each node [35]. Only one tuning parameter was defined for RF, the number of trees to grow (ntree); the remaining parameters were left at their default values. In this study, ntree was set to 150, which minimized the out-of-bag (OOB) error, similar to Rodriguez-Galiano et al. [36]. The RF algorithm was selected because it can handle high-dimensional data, is less sensitive to over-fitting and makes no assumptions about the data distribution [18,37,38].
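To make the bagging and voting steps concrete, the following is a toy random forest in pure Python. It is a simplification, not the implementation used in the study: each tree is a depth-1 stump chosen by Gini impurity, whereas RF normally grows deep trees and draws a fresh feature subset at every node. The `mtry` parameter (features considered per split) and all function names are assumptions; ntree = 150 mirrors the value reported above, and OOB error estimation is omitted.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# (feature index, threshold, label if <= threshold, label if > threshold)
Stump = Tuple[int, float, str, str]


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a non-empty set of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def fit_stump(X: List[List[float]], y: List[str], features: List[int]) -> Stump:
    """Best single split among the candidate features (lowest weighted Gini)."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no usable split: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def fit_forest(X: List[List[float]], y: List[str], ntree: int = 150,
               mtry: int = 1, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (bagging)
        feats = rng.sample(range(p), mtry)          # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest: List[Stump], row: List[float]) -> str:
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]      # majority vote
```

Because each tree sees a different bootstrap sample and feature subset, individual trees disagree, and the majority vote averages out their errors.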
The SVM algorithm is also a non-parametric supervised classifier. It uses a kernel function to map the training data into a higher-dimensional feature space, where it identifies the optimal hyperplane, i.e., the one that maximizes the distance (margin) between the separating hyperplane and the nearest training samples [39][40][41]. The radial basis function (RBF) kernel was applied because of its good performance in previous studies [42,43]. The regularization parameter, gamma value and kernel coefficient had to be defined for the classifier. In this study, the regularization parameter was 100, the gamma value was 0.01 and the kernel coefficient was 0, similar to Kumar et al. [44]. The SVM algorithm was selected because it makes no assumptions about the probability distribution of the data and is not sensitive to the training sample size [40]. A grid-search method was used to find the optimum tuning parameters for both SVM and RF.
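The RBF kernel and the grid search can be sketched as follows. The kernel is K(x, z) = exp(−γ‖x − z‖²), so gamma controls how quickly similarity decays with distance in feature space. The `evaluate` callback is a hypothetical placeholder: in practice it would train an SVM with the given regularization parameter C and gamma and return, for example, the mean 10-fold cross-validation accuracy. In common implementations such as scikit-learn's `SVC`, `gamma` is the kernel coefficient for the RBF kernel, while `coef0` is used only by the polynomial and sigmoid kernels.

```python
import itertools
import math
from typing import Callable, Iterable, Sequence, Tuple


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def grid_search(evaluate: Callable[[float, float], float],
                c_values: Iterable[float],
                gamma_values: Iterable[float]) -> Tuple[float, float, float]:
    """Return (C, gamma, score) with the highest score over the full grid."""
    best = None
    for c, gamma in itertools.product(c_values, gamma_values):
        score = evaluate(c, gamma)
        if best is None or score > best[2]:
            best = (c, gamma, score)
    return best
```

For example, `grid_search(cv_accuracy, [1, 10, 100, 1000], [0.001, 0.01, 0.1])`, with a hypothetical `cv_accuracy(C, gamma)` function, would score twelve parameter pairs and keep the best one.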
Model stacking was also applied; it collates the predictions of different machine learning algorithms and uses them as inputs to a second-level (meta) classifier [45]. In this study, the RF and SVM classifiers were stacked, and a logistic regression classifier was used to combine their outputs. This ensemble was applied because combining the two classifiers can increase predictive performance compared with using either of them independently [45].
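A minimal sketch of the stacking step, assuming the base models' outputs for each sample (for example, the RF maize-vote fraction and an SVM maize probability) are used as meta-features. The logistic regression meta-learner is fitted here by plain batch gradient descent; the learning rate, epoch count and function names are illustrative assumptions. A common practice, not stated in the text, is to build the meta-features from out-of-fold base-model predictions so that the meta-learner does not learn from overfitted outputs.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(features: List[List[float]], labels: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit the meta-learner by batch gradient descent on the mean log-loss."""
    n, p = len(features), len(features[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for k in range(p):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def stack_predict(w: Sequence[float], b: float, base_outputs: Sequence[float]) -> int:
    """1 = maize, 0 = non-maize, from the base models' outputs for one pixel."""
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, base_outputs)) + b) >= 0.5)


if __name__ == "__main__":
    # Each row: [RF maize score, SVM maize score] for one training pixel.
    meta_X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6], [0.2, 0.3], [0.1, 0.2], [0.3, 0.1]]
    meta_y = [1, 1, 1, 0, 0, 0]
    w, b = fit_logistic(meta_X, meta_y)
    print(w, b, stack_predict(w, b, [0.75, 0.4]))
```

The learned weights indicate how much the meta-learner trusts each base classifier, which is what lets stacking outperform either model used alone.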
Although RF provides a built-in variable importance measure, permutation feature importance was used in this study to determine the importance of the predictors in each experiment, because previous studies have shown that RF variable importance rankings can vary between runs [46]. Permutation feature importance can also be applied to any trained model (RF, SVM and ST). The algorithm first computes a reference score s for the selected model on the experimental dataset D; here, the reference score is the overall accuracy of the classifier. For each feature j and each repetition k = 1, …, K, the values of feature j are randomly shuffled to generate a corrupted dataset D_{k,j}, and the score s_{k,j} of the model on D_{k,j} is computed. The importance i_j of feature f_j is then computed according to Equation (1):
$$i_j = s - \frac{1}{K}\sum_{k=1}^{K} s_{k,j} \tag{1}$$
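The permutation procedure can be sketched in pure Python as follows, assuming the score s is the overall accuracy and the model is any callable mapping a feature row to a class label. `n_repeats` plays the role of K; its value, the seed and the function names are illustrative assumptions. scikit-learn offers an equivalent `sklearn.inspection.permutation_importance` function.

```python
import random
from typing import Callable, List, Sequence


def accuracy(model: Callable, X: List[List[float]], y: Sequence) -> float:
    """Overall accuracy: share of samples whose prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model: Callable, X: List[List[float]], y: Sequence,
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """i_j = s - (1/K) * sum_k s_kj, with K = n_repeats shuffles per feature."""
    rng = random.Random(seed)
    s = accuracy(model, X, y)                        # reference score on D
    importances = []
    for j in range(len(X[0])):
        scores = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)                      # corrupt feature j only
            X_kj = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            scores.append(accuracy(model, X_kj, y))  # s_kj on D_kj
        importances.append(s - sum(scores) / n_repeats)
    return importances
```

A feature the model ignores gets an importance of exactly zero, because shuffling it cannot change any prediction.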

Experimental Design
The field samples were randomly separated into training (80%) and testing (20%) sets [47]. The training data were used to train the classifiers, whereas the testing data were used to evaluate the models. The vegetation indices in Table 2 were derived for use during classification. The classification experiments in Table 3 were then defined for each algorithm, each using a different combination of input data (data configuration). These experiments were designed to identify the best approach for mapping smallholder maize with Sentinel-1 and Sentinel-2 data.
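The 80/20 hold-out split, together with a 10-fold partition of the kind used for cross-validation, can be sketched as follows. The seed, the rounding of the test size and the function names are assumptions, and this sketch does not stratify the split by class.

```python
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(samples: Sequence[T], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the samples and hold out test_fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every index is tested exactly once."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, folds[f]
```

Keeping the 20% test set out of the K-fold loop ensures that the final evaluation uses samples that never influenced model selection.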

Classification Model Evaluation and Planted Maize Area Estimation
Model evaluation was performed to select the best model for estimating the maize areas. The metrics used were the overall accuracy (OA), the kappa coefficient of agreement (κ), cross-validation accuracy, precision, recall and the F1-score. The OA is the proportion of correctly classified samples; values close to 1 indicate an accurate classification. It is computed according to Equation (2). The OA was adjusted using the procedure of Olofsson et al. [48] to account for classification errors. The κ is calculated according to Equation (3), where k is the number of land-cover classes in the confusion matrix, x_{ii} is the number of observations in row i and column i, x_{i+} and x_{+i} are the marginal totals of row i and column i, and N is the total number of samples:
$$\mathrm{OA} = \frac{1}{N}\sum_{i=1}^{k} x_{ii} \tag{2}$$
$$\kappa = \frac{N\sum_{i=1}^{k} x_{ii} - \sum_{i=1}^{k} x_{i+}\,x_{+i}}{N^{2} - \sum_{i=1}^{k} x_{i+}\,x_{+i}} \tag{3}$$
κ values > 0.8 represent strong agreement between the classification map and the ground reference data, values between 0.4 and 0.8 represent moderate agreement, and values < 0.4 represent poor agreement [49].
The K-fold cross-validation method was then applied [50]. This method randomly divides the training data into K folds (subsets); in this study, the standard value K = 10 was used. One fold is held out as the test set, and the remaining K − 1 folds are used to fit the model. The process is repeated K times so that each fold serves once as the test set, and the average accuracy over the held-out folds is reported; values close to 1 indicate a high probability that a sample is correctly classified. The standard deviation of the fold accuracies is reported as ± after the mean cross-validation accuracy.
Precision, recall and the F1-score were computed to determine the rate at which pixels were correctly classified; a classifier performs well when these values are close to 1 [51]. McNemar's test was used to assess whether the differences between each pair of models were statistically significant [52]. The null hypothesis was that the two models perform the same. A chi-squared value ≤ 3.84 (the critical value for one degree of freedom at the 95% confidence level) indicates that the models have the same error, whereas a value > 3.84 indicates that one model is superior.
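The accuracy measures and McNemar's test can be computed directly from a confusion matrix and per-sample predictions. This sketch assumes that rows of the confusion matrix are the classified (map) classes and columns the reference classes, and it uses the McNemar statistic without continuity correction, χ² = (b − c)²/(b + c), where b and c count the samples that only one of the two models classifies correctly; some authors apply a continuity correction instead. Function names are illustrative.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[int]]  # rows: classified (map) classes, columns: reference


def overall_accuracy(cm: Matrix) -> float:
    n = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / n


def kappa(cm: Matrix) -> float:
    """(N * sum(x_ii) - sum(x_i+ * x_+i)) / (N^2 - sum(x_i+ * x_+i))."""
    k = len(cm)
    n = sum(map(sum, cm))
    diag = sum(cm[i][i] for i in range(k))
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[r][i] for r in range(k)) for i in range(k)]
    chance = sum(row_tot[i] * col_tot[i] for i in range(k))
    return (n * diag - chance) / (n * n - chance)


def precision_recall_f1(cm: Matrix, cls: int) -> Tuple[float, float, float]:
    tp = cm[cls][cls]
    precision = tp / sum(cm[cls])              # correct / all mapped as cls
    recall = tp / sum(row[cls] for row in cm)  # correct / all reference cls
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def mcnemar_chi2(pred_a: Sequence, pred_b: Sequence, truth: Sequence) -> float:
    """McNemar chi-squared statistic without continuity correction."""
    b = sum(a == t and p != t for a, p, t in zip(pred_a, pred_b, truth))
    c = sum(a != t and p == t for a, p, t in zip(pred_a, pred_b, truth))
    return (b - c) ** 2 / (b + c) if b + c else 0.0


def significantly_different(chi2: float, critical: float = 3.84) -> bool:
    """3.84 is the chi-squared critical value for 1 df at the 95% level."""
    return chi2 > critical
```

For a two-class matrix [[40, 10], [5, 45]], for instance, the OA is 0.85 and κ is 0.7, since the chance agreement is 0.5.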
The areas derived from the classification maps were adjusted to account for classification error, and the 95% confidence intervals were computed to compare the three models [48]. The classified areas of the 18 maize farms measured during fieldwork were then compared with their field-measured areas using a regression equation, to indicate how accurately the models estimate maize-planted area. The p-value (p) and Pearson correlation coefficient (R) were used to evaluate this agreement.
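The error-adjusted area estimate can be sketched with the stratified estimator described by Olofsson et al.: for reference class j, the area proportion is p_j = Σ_i W_i n_ij / n_i·, where W_i is the mapped-area proportion of map class i and n_ij counts accuracy-assessment samples, and the 95% confidence interval is ±1.96 times the total area times the standard error of p_j. The sample counts, mapped areas and function names below are illustrative assumptions, and each map class needs at least two assessment samples. The Pearson correlation used for the farm-level comparison is included; its p-value would come from a t-test with n − 2 degrees of freedom, which is omitted here.

```python
import math
from typing import List, Sequence, Tuple


def adjusted_areas(counts: List[List[int]], mapped_area: Sequence[float],
                   z: float = 1.96) -> List[Tuple[float, float]]:
    """Error-adjusted area and confidence half-width for each reference class.

    counts[i][j]: assessment samples mapped as class i with reference class j.
    mapped_area[i]: area mapped as class i (e.g. ha).
    """
    total = sum(mapped_area)
    weights = [a / total for a in mapped_area]              # W_i
    n_row = [sum(row) for row in counts]                    # n_i.
    k = len(counts)
    results = []
    for j in range(k):
        props = [counts[i][j] / n_row[i] for i in range(k)]  # n_ij / n_i.
        p_j = sum(weights[i] * props[i] for i in range(k))
        var_p = sum(weights[i] ** 2 * props[i] * (1 - props[i]) / (n_row[i] - 1)
                    for i in range(k))
        results.append((p_j * total, z * total * math.sqrt(var_p)))
    return results


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between classified and field-measured farm areas."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

The adjusted areas always sum to the total mapped area; the adjustment only redistributes area between classes according to the observed omission and commission errors.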

Classification Model Evaluation
The performance of the three algorithms is presented in Table 4. Experiment 1, which used the Sentinel-1 polarizations alone, had the lowest accuracies, with an OA of 0.68–0.85 and a cross-validation score of 0.65–0.69 across the three algorithms.

Variable Importance
The variable importance was determined for the experiments in Table 3 using the permutation feature importance algorithm [46]. The most important predictors varied between experiments depending on the input data (Figure 2). In experiment 1, the VH polarization had the highest importance; however, when other predictors were integrated (e.g., experiments 3 and 4), the VV polarization was more important than the other polarizations. In experiment 2, DVI outperformed all the other vegetation indices, followed by GNDVI. The most important bands in experiments 3 and 4 were the blue, red-edge and short-wave infrared (SWIR) bands. Overall, the Sentinel-2 spectral bands ranked higher in importance than the Sentinel-1 polarizations.

Mapping and Area Estimates for Maize
The 95% confidence intervals were computed for the maize and non-maize areas within the study area. The total maize areas classified by the three algorithms varied relatively little (Table 6). For the maize-planted area, RF differed by 6% from SVM and by 0.7% from ST, while ST differed by 5.5% from SVM. The 95% confidence intervals were narrower for the maize class (±0.7–1.2 ha) than for the non-maize class (±1.2–1.88 ha). For maize, RF had the widest confidence interval (±1.2 ha) and SVM the narrowest (±0.7 ha).

The classified areas of the 18 smallholder maize farms were compared with the field-measured areas of the same farms (Figure 3). There was a positive relationship between the classified and field-measured areas, which was significant at the 95% confidence level (p < 0.05). The correlation coefficients obtained by the RF, ST and SVM algorithms were 0.51, 0.78 and 0.84, respectively, indicating that SVM agreed most closely with the field measurements and RF least closely. The three algorithms were used to generate the classification maps in Figure 4b–d, which depict the spatial patterns of the two classes within the ROI. These maps agreed well with the Sentinel-2 true color composite in Figure 4a. The classification maps generated by SVM, RF and ST were similar. The maize-planted areas were concentrated in the southern part of the Makhudutamaga district. The crop maps derived in this study are fundamental for crop forecasting and end-of-season yield estimation, and they help in understanding how natural phenomena, such as climate variability, affect crop production.

Discussion
This study assessed the applicability of Sentinel-1, Sentinel-2 and derived vegetation indices for mapping smallholder maize in Makhudutamaga, Limpopo Province. Classification experiments were designed to evaluate the performance of three machine learning algorithms, and variable importance measures were used to investigate which predictors had the most influence in each experiment. The best-performing algorithms were then used to estimate and map the maize-planted areas. The findings suggest that integrating Sentinel-1 and Sentinel-2 data with machine learning algorithms is well suited to mapping smallholder maize farms.
Contrary to our expectations, single-date Sentinel-1 radar data were not effective for mapping smallholder maize farms. The data combination consisting exclusively of Sentinel-1 polarizations had a low OA, ranging from 67.9% to 84.5%, with RF being the worst-performing classifier. These results are similar to those of Abubakar et al. [53], who observed an OA of 78.9% when mapping smallholder maize from Sentinel-1 data with SVM. Useya and Chen [54] reported even lower OAs, of 46% with RF and 40% with K-means classification, when mapping smallholder maize farms and other crops with single-date Sentinel-1 data. The poor performance of the Sentinel-1 C-band data may be due to its shorter wavelength, which penetrates the canopy less than longer-wavelength L-band SAR [55,56]. In addition, the irregular planting patterns of smallholder farms, such as unequal row spacing and differences in plant density, leaf area index and crop height across the study area, degrade the performance of the Sentinel-1 data because, according to Inoue et al. [57], C-band data are sensitive to changes in biomass.
Integrating Sentinel-1, Sentinel-2 and vegetation indices was better suited to detecting smallholder maize farms than using Sentinel-1 data alone, in line with previous studies. Experiments 2, 3 and 4 show a clear increase in both OA and cross-validation scores. These values are also more consistent with each other, indicating the positive impact of radar-optical fusion on classification accuracy. For example, Van Tricht et al. [16] achieved OAs between 75% and 82% when mapping maize and other land-cover classes with Sentinel-1 and Sentinel-2 data, and Abubakar et al. [53] achieved an OA of 97% when mapping smallholder maize with vegetation indices, Sentinel-1 and Sentinel-2 data. The high accuracies attained in the current study can be attributed to the use of informative regions of the electromagnetic spectrum, such as the red edge and SWIR. Furthermore, the vegetation indices applied in this study reduce background effects (soil and other classes such as buildings), thereby enhancing the detection of crops and vegetation [25][26][27][28][29][30][31][32][33][34].
Differences in performance between the SVM, RF and ST algorithms were expected. For example, Ouzemou et al. [58] reported OAs of 89.3%, 85.3% and 57.2% for RF, SVM and the Spectral Angle Mapper (SAM), respectively, for crop type mapping with Landsat 8 data. Sonobe et al. [59] found that SVM (OA of 89.1%) outperformed RF (OA of 87.8%) and the classification and regression tree (CART) algorithm (OA of 81.2%) for classifying crops with TerraSAR-X data. Such differences can be induced by various factors. In this study, the first experiment had the lowest accuracies, and RF performed notably poorly. This is because RF has been shown in previous studies to be more sensitive to small amounts of training input data than SVM and ST [60,61]. All three algorithms had high accuracies in the four experiments, possibly because the ROI used for training focused on maize-planted areas, and because merging the other land-cover classes into a single non-maize class avoided the accuracy loss that can occur when multiple land-cover classes are classified individually.
The variable importance results, which indicate the superiority of the VV polarization, DVI, GNDVI and the blue, red-edge and SWIR bands for mapping maize, were expected. Forkuor et al. [62] found that the VV polarization of TerraSAR-X data was superior to the VH polarization for crop mapping applications, and Deschamps et al. [63] observed that the VV band of Sentinel-1 data was important for crop classification. However, other studies, for example Inglada et al. [64] and Arias et al. [65], have reported that the VH band is more important than the VV band for mapping crops because it captures the volume scattering from the crop canopy structure [66]. These results suggest that polarizations should be evaluated for the locality where they are applied. The finding that DVI and GNDVI are the most important indices when radar data and vegetation indices are combined for crop classification highlights the importance of evaluating different indices instead of relying on the commonly used NDVI. The blue, red-edge and SWIR bands have proven to be important in previous studies [38,67,68]; these bands capture the biochemical properties, water content and residue cover of different crop types, which improves their detection [69]. In experiment 2, OSAVI was the least important variable. However, this changed in experiment 3, where this index ranked higher than RDVI, MTVI2, MTVI1, DVI, SAVI and TVI. This may be due to the correlation of these indices with the raw Sentinel-2 bands in experiment 3, whereas the indices in experiment 2 are less correlated with each other.
The RF and ST algorithms differed by a relatively small 0.7% in the total planted maize area, while SVM appears to have overestimated the planted maize area by approximately 6% relative to the other algorithms. Even though SVM had a higher correlation coefficient than RF and ST, we could not conclude that SVM was the better estimator because the validation sample was relatively small (18 farms). More validation data are required to characterize the performance of each algorithm against ground-measured areas. However, since all algorithms showed significant positive correlations with the field-measured areas, we conclude that they can be used to estimate smallholder maize areas. Unfortunately, official agricultural statistics, such as production areas, which could have been used to validate these estimates, are not available for the study area.
The findings of this study are relevant to the Sustainable Development Goals (SDGs), specifically SDG 2 (Zero Hunger), target 2.4 and indicator 2.4.1, which concern mitigating factors that affect agricultural production, ensuring sustainable agriculture and increasing the proportion of agricultural area under production [70]. The agricultural production area is of great importance because it informs local government and related stakeholders about agricultural activities and provides a basis for forecasting production. The production area is also an important indicator of food insecurity, especially in developing countries such as South Africa. This study therefore contributes to this SDG by using remote sensing data to accurately map the production areas of smallholder maize farms. The spatial information generated can be used by local government to assist smallholder farmers and support policy implementation [70].
This study had several limitations. First, only a limited number of sample points were collected during fieldwork because of the undulating terrain, the high cost of fieldwork and prominent mountainous areas that were not accessible for data collection; this small sample size affects the statistical robustness of the results [71]. Second, poor farm management practices, such as weeds and patches of grass growing in some of the farms, affect the spectral signature of maize and reduce the accuracy with which it can be detected in remotely sensed imagery. Third, red-edge indices, which have shown some potential for improving the detection of vegetation in previous studies, were not explored here and should be investigated [72,73].

Conclusions
The overall aim of the study was to develop a framework to enhance the delineation of smallholder maize areas using single-date Sentinel-1, Sentinel-2 and derived vegetation indices. The results showed that single-date Sentinel-1 data alone were not sufficient for mapping planted maize fields. When Sentinel-2 data were integrated with Sentinel-1 data, accuracy improvements of 24.2%, 8.7% and 9.1% were observed for the RF, SVM and ST algorithms, respectively. Machine learning proved to have a high capacity to estimate smallholder maize-planted areas (7001.35 ± 1.2 ha for RF, 7926.03 ± 0.7 ha for SVM and 7099.59 ± 0.8 ha for ST). The framework used in this study can be applied when evaluating different algorithms for mapping smallholder farms. The crop maps derived in this study are fundamental for crop monitoring, land-use policies and food security planning.