# Understanding the Requirements for Surveys to Support Satellite-Based Crop Type Mapping: Evidence from Sub-Saharan Africa


## Abstract


## 1. Introduction

## 2. Data and Methods

#### 2.1. Survey Data

#### 2.1.1. Malawi

- The coordinates of one plot corner recorded by the enumerator, i.e., “corner point”;
- The coordinates of the plot centroid that is derived from the full boundary, i.e., “centroid”;
- The coordinates of four to eight plot corner points that are derived from the boundary, based on the complexity of the plot shape (geometric simplification), and that are in turn used to:
  - Derive the geospatial predictors for each pixel corresponding to a given corner point, these pixels and the associated predictors being used as training data, i.e., “boundary points”;
  - Randomly select 20% of the pixels within the convex hull formed by the corner points, derive the geospatial predictors of interest for each sampled pixel, and use these pixels and the associated predictors as the training data, i.e., “convex hull”;
  - Derive the geospatial predictors for all pixels within the convex hull and aggregate the information to the plot level by taking the average for each predictor across all pixels, i.e., “hull mean”;
- The full plot boundary that is in turn used to:
  - Randomly select 20% of the pixels from a 10 m grid within the plot, derive the geospatial predictors of interest for each sampled pixel, and use these pixels and the associated predictors as the training data, i.e., “plot points”;
  - Derive the geospatial predictors for all pixels from a 10 m grid within the plot and aggregate the information to the plot level by taking the average, for each predictor, across all pixels, i.e., the “plot mean”.
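
The boundary-based methods above can be illustrated with a minimal sketch. It assumes plot coordinates are already projected to metres, and it uses a hypothetical rectangular plot and a stand-in predictor function; the function names are illustrative and are not taken from the study's code.

```python
import random


def centroid(polygon):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = cx = cy = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    return cx / (3 * area2), cy / (3 * area2)


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count edges crossed by a horizontal ray starting at (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def grid_pixels(polygon, step=10.0):
    """Centres of step-sized grid cells that fall inside the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    pixels = []
    y = min(ys) + step / 2
    while y < max(ys):
        x = min(xs) + step / 2
        while x < max(xs):
            if point_in_polygon(x, y, polygon):
                pixels.append((x, y))
            x += step
        y += step
    return pixels


def plot_points(pixels, share=0.2, seed=0):
    """'Plot points': a random 20% sample of the pixels in the plot."""
    k = max(1, round(share * len(pixels)))
    return random.Random(seed).sample(pixels, k)


def plot_mean(pixels, predictor):
    """'Plot mean': average of one predictor over all pixels in the plot."""
    values = [predictor(x, y) for x, y in pixels]
    return sum(values) / len(values)


# Hypothetical 50 m x 40 m plot and a toy predictor (assumptions for illustration).
plot = [(0.0, 0.0), (50.0, 0.0), (50.0, 40.0), (0.0, 40.0)]
pixels = grid_pixels(plot)
sampled = plot_points(pixels)
mean_value = plot_mean(pixels, lambda x, y: 0.01 * x)
center = centroid(plot)
```

The same `grid_pixels` and averaging steps would apply to the “convex hull” and “hull mean” methods if the polygon passed in were the convex hull of the simplified corner points rather than the full boundary.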

#### 2.1.2. Ethiopia

#### 2.2. Earth Observation Data

#### 2.2.1. Synthetic Aperture Radar Imagery

#### 2.2.2. Optical Imagery

#### Harmonic Regressions for Characterizing Crop Phenology

#### 2.2.3. Additional EO Data

#### 2.3. Methodology

- Define a common modeling pipeline that trains and evaluates a maize classification model for a given dataset;
- Feed the modeling pipeline with each dataset in a sequence designed to emulate hypothetical scenarios of field data collection (varying the number of observations, the plot geolocation method, and the minimum plot size);
- Vary the type of satellite data used by the modeling pipeline (optical only, radar only, both optical and radar); and
- Compare evaluation metrics across different scenarios. (Figure 3 depicts the overall structure of the study).

#### 2.3.1. Maize Classification Pipeline

#### 2.3.2. Survey Data Subsets in Accordance with Plot Area

#### 2.3.3. Modeling Data Collection Scenarios

- Seven geolocation methods—boundary points, centroid, convex hull, corner, hull mean, plot points, and plot mean;
- 50 data subsets—2% to 100% subsets of training data, at an increment of two percentage points;
- Five area thresholds—0, 0.05, 0.1, 0.15 and 0.2 ha;
- Three feature types—optical only, radar only, both optical and radar;
- Five replications to capture variability due to random sampling.
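
As a rough illustration of the size of this design, the sketch below enumerates the scenarios under the assumption that all five dimensions are fully crossed (as they could be for Malawi, where every geolocation method is available). The variable names are hypothetical.

```python
from itertools import product

geolocation_methods = [
    "boundary points", "centroid", "convex hull", "corner",
    "hull mean", "plot points", "plot mean",
]
# 2% to 100% of the training data, in steps of two percentage points.
subset_shares = [pct / 100 for pct in range(2, 101, 2)]
area_thresholds_ha = [0, 0.05, 0.1, 0.15, 0.2]
feature_types = ["optical", "radar", "optical+radar"]
replications = range(5)

# Assumption: a fully crossed design, one model run per combination.
scenarios = list(product(
    geolocation_methods, subset_shares, area_thresholds_ha,
    feature_types, replications,
))

print(len(subset_shares))  # 50
print(len(scenarios))      # 26250
```

Each tuple in `scenarios` would correspond to one pass through the common modeling pipeline under this assumption.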

#### 2.3.4. Assessing Implications of Differences in the Accuracy of Competing Models

## 3. Results

#### 3.1. Effect of the Approach to Georeferencing Plot Location

#### 3.2. Effect of Sample Size

#### 3.3. Effect of Minimum Plot Size

#### 3.4. Effect of Satellite Data Type

#### 3.5. Spatial Variability of Classification Performance

#### 3.6. Implications of Small Changes in Classification Performance

## 4. Discussion

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A

**Table A1.** Summary statistics of pixel-level S2 observation frequency (after pre-processing) within each agricultural season in Malawi.

| | 2016 | 2017 | 2018 | 2019 |
|---|---|---|---|---|
| Mean | 22.12 | 17.79 | 27.23 | 27.64 |
| Median | 19.04 | 15.00 | 23.05 | 24.05 |
| Variance | 216.06 | 129.74 | 301.34 | 324.91 |
| Min | 1 | 1 | 1 | 1 |
| Max | 178 | 135 | 216 | 227 |

| | | | | |
|---|---|---|---|---|
| GDD* | GCVI_sin2 | NDTI_cos1* | NDVI_rmse | RDED4_variance |
| P_{tot}* | GCVI_t | NDTI_cos2* | NDVI_sin1* | SNDVI_constant |
| T_{avg} | GCVI_variance* | NDTI_mean | NDVI_sin2* | SNDVI_cos1 |
| aspect* | NBR1_constant* | NDTI_r2* | NDVI_t | SNDVI_cos2 |
| elevation* | NBR1_cos1* | NDTI_rmse* | NDVI_variance | SNDVI_mean |
| slope* | NBR1_cos2* | NDTI_sin1 | RDED4_constant* | SNDVI_r2* |
| COUNT | NBR1_mean | NDTI_sin2* | RDED4_cos1* | SNDVI_rmse* |
| GCVI_constant | NBR1_r2* | NDTI_t | RDED4_cos2 | SNDVI_sin1 |
| GCVI_cos1 | NBR1_rmse* | NDTI_variance* | RDED4_mean | SNDVI_sin2 |
| GCVI_cos2* | NBR1_sin1 | NDVI_constant | RDED4_r2* | SNDVI_t |
| GCVI_mean | NBR1_sin2* | NDVI_cos1* | RDED4_rmse* | SNDVI_variance* |
| GCVI_r2* | NBR1_t | NDVI_cos2* | RDED4_sin1* | NDVI_sin2 |
| GCVI_rmse* | NBR1_variance | NDVI_mean* | RDED4_sin2* | |
| GCVI_sin1 | NDTI_constant* | NDVI_r2 | RDED4_t* | |

**Figure A1.** Box plots showing the maximum MCC and the minimum sample size required to reach ~90% of that maximum (**a**) for each geolocation strategy in Malawi and (**b**) for the corner point method in Ethiopia.

| | Malawi | Ethiopia |
|---|---|---|
| Field crop | 464 | 477 |
| Tree crop or plantation | 21 | 66 |
| Other vegetation | 711 | 1251 |
| Water or swamp | 166 | 24 |
| Building or road | 73 | 59 |
| Desert or bare | 71 | 193 |
| Total | 1506 | 2070 |


**Figure 1.** Plot geolocation methods and approaches for combining plot geometries with pixel-level data.

**Figure 2.** Examples of harmonic model smoothing for three crop types (maize, groundnut and soybean) using a Sentinel-2 GCVI time series in Malawi. The blue points represent the observed Sentinel-2 GCVI time series at a specific location in Malawi from November 2018 to July 2019. The red line represents the GCVI time series fitted by the harmonic model.
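
The kind of harmonic fit shown in Figure 2 can be sketched as an ordinary least-squares regression. The functional form below (a constant, a linear trend in `t`, and two sine/cosine pairs) is an assumption inferred from feature names such as `GCVI_constant`, `GCVI_t`, `GCVI_cos1` and `GCVI_sin2`; the annual frequency `omega = 1.0`, the synthetic data, and all function names are hypothetical.

```python
import math


def design_row(t, omega=1.0):
    """Regressors for one observation; t is time in years (assumed unit)."""
    w = 2 * math.pi * omega * t
    return [1.0, t, math.cos(w), math.sin(w), math.cos(2 * w), math.sin(2 * w)]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_harmonic(times, values, omega=1.0):
    """Least-squares fit via the normal equations; returns coefficients, fit, RMSE."""
    rows = [design_row(t, omega) for t in times]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * x for c, x in zip(coef, r)) for r in rows]
    rmse = math.sqrt(sum((v - f) ** 2 for v, f in zip(values, fitted)) / len(values))
    return coef, fitted, rmse


# Synthetic, noise-free series generated from known (hypothetical) coefficients.
true_coef = [2.0, 0.5, -1.0, 0.8, 0.3, -0.2]
times = [i / 36 for i in range(36)]
values = [sum(c * x for c, x in zip(true_coef, design_row(t))) for t in times]

coef, fitted, rmse = fit_harmonic(times, values)
```

On real imagery the observations are noisy and irregularly spaced, so the fitted coefficients, together with summary terms such as the RMSE, would serve as per-pixel phenology features rather than an exact reconstruction.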

**Figure 5.** Training curves showing (**a**) test accuracy (Malawi), (**b**) test accuracy (Ethiopia), (**c**) test MCC (Malawi) and (**d**) test MCC (Ethiopia) as a function of training set size for each geolocation strategy in Malawi (**a**,**c**), and for the corner point geolocation method in Ethiopia (**b**,**d**). Each training curve is aggregated over five trials. The curves shown in the left subplots are aggregated using the trials as is, and the ones on the right are aggregated after first smoothing each trial using a lowess estimator. All figures in the remainder of the results section use the smoothed trials.

**Figure 6.** Box plot showing MCC at different training set sizes for each geolocation strategy (Malawi); $n=727, 2834, 4945, 7030$ correspond to ~$10\%$, $40\%$, $70\%$ and $100\%$ of the training set, respectively.

**Figure 7.** Trends showing diminishing marginal returns to sample size (**a**) across all geolocation strategies in Malawi and (**b**) for the corner point geolocation strategy in Ethiopia.

**Figure 8.** Box plots showing the peak MCC and the minimum sample size required to reach it (**a**) for each geolocation strategy in Malawi and (**b**) for the corner point method in Ethiopia.

**Figure 11.** Map of Malawi showing test performance by district, using all the training data from the plot mean sampling strategy, with optical features and no area threshold (single trial). The performance metric is (**a**) accuracy and (**b**) MCC.

**Figure 12.** Scatterplots showing the relationship between test performance and the number of training plots by district, using all the training data from the plot mean sampling strategy in Malawi, with optical features and no area threshold (five trials). The performance metric is (**a**) accuracy and (**b**) MCC.

| Plot Category | IHPS 2019 (Obs) | IHPS 2019 (%) | IHS5 2019/20 (Obs) | IHS5 2019/20 (%) |
|---|---|---|---|---|
| Plots with no geolocation information | 334 | 6.2 | 1105 | 6.4 |
| Plots with a corner point, but no polygon boundary | 1365 | 25.4 | 4871 | 28.4 |
| Plots with a corner point and a polygon boundary, but dropped from analysis | 874 | 16.3 | 2139 | 12.5 |
| Plots with a corner point and a polygon boundary, used for analysis | 2792 | 52.0 | 9059 | 52.7 |
| Total Number of Plots | 5365 | 100.0 | 17,174 | 100.0 |
| Total Number of Associated Households | 2335 | | 8770 | |

**Table 2.** IHPS 2019 and IHS5 2019/20 rainy season plots by maize cultivation status, conditional on being used for analysis.

| Crop type | IHPS 2019, 2018/19 season (Obs) | IHPS 2019, 2018/19 season (%) | IHS5 2019/20, 2017/18 season (Obs) | IHS5 2019/20, 2017/18 season (%) | IHS5 2019/20, 2018/19 season (Obs) | IHS5 2019/20, 2018/19 season (%) |
|---|---|---|---|---|---|---|
| Maize | 2033 | 72.8 | 2330 | 71.4 | 4222 | 72.9 |
| Non-maize | 759 | 27.2 | 935 | 28.6 | 1572 | 27.1 |
| Total Number of Plots | 2792 | 100.0 | 3265 | 100.0 | 5794 | 100.0 |
| Total Number of Associated Households | 1470 | | 1926 | | 3506 | |

| Plot Category | ESS 2018/19 (Obs) | ESS 2018/19 (%) |
|---|---|---|
| Plots with no geolocation information | 1168 | 8.7 |
| Plots with a corner point, but dropped from analysis | 299 | 2.2 |
| Plots with a corner point, used for analysis | 11,905 | 89.0 |
| Total Number of Plots | 13,372 | 100.0 |
| Total Number of Associated Households | 2199 | |

**Table 4.** ESS 2018/19 meher season plots by maize cultivation status, conditional on being used for analysis.

| Crop Type | ESS 2018/19 (Obs) | ESS 2018/19 (%) |
|---|---|---|
| Maize | 1867 | 15.7 |
| Non-maize | 10,038 | 84.3 |
| Total Number of Plots | 11,905 | 100.0 |
| Total Number of Associated Households | 2090 | |

| Band/Index | Name | Central Wavelength/Index Formula | Satellite |
|---|---|---|---|
| VV | Vertically polarized backscatter (vertical transmit, vertical receive) | 5.5465763 cm | Sentinel-1 |
| VH | Cross-polarized backscatter (vertical transmit, horizontal receive) | 5.5465763 cm | Sentinel-1 |
| RATIO | Ratio | VV/VH | Sentinel-1 |
| DIFF | Difference | VV − VH | Sentinel-1 |
| RDED4 | Red Edge 4 | 865 nm | Sentinel-2 |
| GCVI | Green Chlorophyll Vegetation Index | (NIR/GREEN) − 1 | Sentinel-2 |
| NBR1 | Normalized Burn Ratio 1 | (NIR − SWIR1)/(NIR + SWIR1) | Sentinel-2 |
| NDTI | Normalized Difference Tillage Index | (SWIR1 − SWIR2)/(SWIR1 + SWIR2) | Sentinel-2 |
| NDVI | Normalized Difference Vegetation Index | (NIR − RED)/(NIR + RED) | Sentinel-2 |
| SNDVI | Smoothed Normalized Difference Vegetation Index | (NIR − RED)/(NIR + RED + 0.16) | Sentinel-2 |
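
The optical index formulas in the table above translate directly into code. The sketch below is illustrative only: the function names are hypothetical and the surface reflectance values are made-up numbers for a vegetated pixel, not data from the study.

```python
def gcvi(nir, green):
    """Green Chlorophyll Vegetation Index: (NIR / GREEN) - 1."""
    return nir / green - 1


def nbr1(nir, swir1):
    """Normalized Burn Ratio 1: (NIR - SWIR1) / (NIR + SWIR1)."""
    return (nir - swir1) / (nir + swir1)


def ndti(swir1, swir2):
    """Normalized Difference Tillage Index: (SWIR1 - SWIR2) / (SWIR1 + SWIR2)."""
    return (swir1 - swir2) / (swir1 + swir2)


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - RED) / (NIR + RED)."""
    return (nir - red) / (nir + red)


def sndvi(nir, red):
    """Smoothed NDVI: (NIR - RED) / (NIR + RED + 0.16)."""
    return (nir - red) / (nir + red + 0.16)


# Hypothetical reflectances (unitless, 0 to 1) for a single pixel.
pixel = {"green": 0.10, "red": 0.08, "nir": 0.40, "swir1": 0.25, "swir2": 0.15}

indices = {
    "GCVI": gcvi(pixel["nir"], pixel["green"]),
    "NBR1": nbr1(pixel["nir"], pixel["swir1"]),
    "NDTI": ndti(pixel["swir1"], pixel["swir2"]),
    "NDVI": ndvi(pixel["nir"], pixel["red"]),
    "SNDVI": sndvi(pixel["nir"], pixel["red"]),
}
```

The constant 0.16 in the SNDVI denominator keeps the index bounded when both reflectances are small, which is why SNDVI is always smaller in magnitude than NDVI for the same pixel.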

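| Feature | Explanation | Data Source | Included in |
|---|---|---|---|
| Elevation | Obtained using GEE’s inbuilt terrain algorithm that uses an elevation raster to generate slope and aspect bands | Shuttle Radar Topography Mission (30 m resolution) | Malawi, Ethiopia |
| Slope | Obtained using GEE’s inbuilt terrain algorithm that uses an elevation raster to generate slope and aspect bands | Shuttle Radar Topography Mission (30 m resolution) | Malawi, Ethiopia |
| Aspect (direction of slope) | Obtained using GEE’s inbuilt terrain algorithm that uses an elevation raster to generate slope and aspect bands | Shuttle Radar Topography Mission (30 m resolution) | Malawi, Ethiopia |
| Average temperature | Mean daily temperature during growing season | aWhere daily observed weather API (0.1-degree resolution) | Malawi |
| GDD | Growing degree days * accumulated during growing season | aWhere daily observed weather API (0.1-degree resolution) | Malawi |
| Total precipitation | Total precipitation during growing season | aWhere daily observed weather API (0.1-degree resolution) | Malawi |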

| Geolocation Method | Area Threshold (ha) | Satellite Data | Out-of-Sample Accuracy | Out-of-Sample Precision | Out-of-Sample Recall | Out-of-Sample MCC |
|---|---|---|---|---|---|---|
| Boundary points | 0 | Optical only | 0.75 | 0.75 | 0.97 | 0.21 |
| Centroid | 0.05 | Optical and SAR | 0.75 | 0.76 | 0.96 | 0.24 |
| Convex hull | 0.2 | Optical only | 0.75 | 0.75 | 0.98 | 0.21 |
| Corner | 0.05 | Optical only | 0.75 | 0.76 | 0.97 | 0.23 |
| Hull mean | 0 | Optical only | 0.75 | 0.77 | 0.93 | 0.25 |
| Plot points | 0 | Optical only | 0.75 | 0.76 | 0.97 | 0.24 |
| Plot mean | 0.05 | Optical only | 0.75 | 0.77 | 0.94 | 0.26 |
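
To see how an accuracy near 0.75 and a recall above 0.9 can coexist with an MCC near 0.25 when roughly 73% of plots are maize, the sketch below computes all four metrics from a confusion matrix. The counts are hypothetical, chosen only to resemble the pattern in the table above; they are not the study's results.

```python
from math import sqrt


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and MCC for a binary (maize vs. non-maize) task."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "mcc": mcc}


# Hypothetical test set: 730 maize plots and 270 non-maize plots.
model = classification_metrics(tp=708, fp=224, fn=22, tn=46)

# Baseline that labels every plot as maize.
always_maize = classification_metrics(tp=730, fp=270, fn=0, tn=0)

print({k: round(v, 2) for k, v in model.items()})
print({k: round(v, 2) for k, v in always_maize.items()})
```

In this hypothetical case the always-maize baseline already reaches an accuracy of 0.73 with an MCC of 0, which illustrates why MCC is the more informative metric when classes are imbalanced.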

**Table 8.** Malawi maize area as obtained by seven different classification models, and area misclassified as maize/non-maize under each classification model as compared to ‘Plot mean’ (the best performing model).

| Classification Model | Out-of-Sample MCC | Total Maize Area, 2018/19 Rainy Season (million ha) | Difference in Out-of-Sample MCC as Compared to ‘Plot Mean’ | Total Area with Disagreement as Compared to ‘Plot Mean’ (million ha) |
|---|---|---|---|---|
| Boundary points | 0.21 | 2.27 | −0.05 | 0.84 |
| Centroid | 0.24 | 2.17 | −0.02 | 0.48 |
| Convex hull | 0.21 | 2.46 | −0.05 | 0.69 |
| Corner | 0.23 | 2.15 | −0.03 | 0.95 |
| Hull mean | 0.25 | 1.94 | −0.01 | 0.22 |
| Plot points | 0.24 | 2.41 | −0.02 | 0.55 |
| Plot mean | 0.26 | 1.99 | | |
| Mean across models | 0.23 | 2.19 | | |

| Country | Maize Classification Model Specifications | Seasons Trained on | Seasons Predicted on |
|---|---|---|---|
| Malawi | Plot mean geolocation method, 0.05 ha area threshold, optical features only | 2017/18 rainy season, 2018/19 rainy season | 2015/16 rainy season, 2016/17 rainy season, 2017/18 rainy season, 2018/19 rainy season |
| Ethiopia | Corner point geolocation method, no area threshold, optical features only | 2018 meher season | 2016 meher season, 2017 meher season, 2018 meher season, 2019 meher season |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Azzari, G.; Jain, S.; Jeffries, G.; Kilic, T.; Murray, S.
Understanding the Requirements for Surveys to Support Satellite-Based Crop Type Mapping: Evidence from Sub-Saharan Africa. *Remote Sens.* **2021**, *13*, 4749.
https://doi.org/10.3390/rs13234749
