Do Satellite Data Correlate with In Situ Rainfall and Smallholder Crop Yields? Implications for Crop Insurance

Abstract: Adverse weather is one of the most prevalent sources of risk in agriculture. Its impacts are aggravated by the lack of effective risk management mechanisms. That is why resource-poor farmers tend to respond to weather risks by adopting low-capital investment, low-return, and low-risk agricultural practices. This challenge needs to be addressed with innovative risk management strategies. One of the tools that is gaining traction, especially in the developing countries, is weather-index-based insurance (WII). However, WII uptake is still low because of several constraints, one of which is basis risk. This study attempts to address this problem by evaluating the suitability of TAMSAT, CHIRPS, MODIS, and Sentinel-2 data for WII. We evaluated the first three datasets against in situ rainfall measurements at different spatial and temporal scales over the maize-growing season in a smallholder farming area in South Africa. CHIRPS had higher correlations with in situ measured rainfall data than TAMSAT and MODIS NDVI. CHIRPS performed equally well at 10 km and 25 km spatial scales, and better at monthly than daily and 16-day time steps (maximum R = 0.78, mean R = 0.72). Due to the lack of reliable historical yield data, we conducted yield surveys over three consecutive seasons using an objective crop cut method. We then assessed how well rainfall and NDVI related with maize yield. There was a poor relationship between these variables and maize yield (R² ≤ 0.14). The study concludes by pointing out that crop yield does not always have a linear relationship with weather and vegetation indices, and that water is not always the main yield-limiting factor in smallholder farming systems. To minimize basis risk, the process of designing WII must include identification of main yield-limiting factors for specific localities. Alternatively, insurers could use crop water requirement methods to design WII.


Introduction
The need for risk management in smallholder farming areas cannot be overstated. One of the most common sources of production risk is drought [1,2]. This phenomenon threatens food security by causing production fluctuations and low yields [3][4][5][6]. As a result, researchers are increasingly exploring different drought risk management strategies [7][8][9] including, among others, agricultural index insurance (AII) [10]. The most common type of AII is weather-based index insurance (WII). WII uses weather-related variables like rainfall, temperature, soil moisture, evapotranspiration, and vegetation indices to monitor crop growing conditions because these indices can detect when intervention is required [11,12]. Drought insurance, for example, indemnifies farmers when rainfall or soil moisture over the crop-growing period fails to reach a certain threshold [12][13][14]. This threshold marks the point at which yield reduction and crop losses occur.
By using indices as proxies for losses, WII reduces administrative costs and premiums while avoiding moral hazard and adverse selection [1,15,16]. In traditional insurance schemes, moral hazard occurs when the insured farmers deliberately expose crops to risks in order to increase the chances of receiving payouts [1,17]. Adverse selection is a situation where high-risk farmers take up insurance more frequently than others because they perceive more profits from insurance [1,17]. WII avoids these challenges because the insurer and the insured cannot easily manipulate the insured indices. However, WII is exposed to basis risk. Basis risk arises when the insurance index does not match the losses incurred by the farmer [18]. The different types of basis risk are temporal basis risk, spatial basis risk, and product basis risk. Temporal basis risk arises when the index fails to estimate losses by virtue of being measured outside the relevant crop growth stages [19]. Spatial basis risk arises when the index fails to capture spatial variability in the weather variable being measured [20]. Product basis risk arises when the selected index is poorly correlated with crop yields/losses due to poor product design and inappropriate index selection [21].
To reduce basis risk, studies have explored different indices that might be well correlated with crop yields and crop losses. They have sought to design these indices by using high spatial resolution data [22][23][24], phenological information [19,25], spatial interpolation [21], spatial and temporal aggregation [26], agro-ecological information [27], and complementary satellite datasets [28,29]. In Ethiopia, Hochrainer et al. [30] reported that the Vegetation Health Index (VHI), measured in the late crop growth stages, explained 60% of crop yield variation. They concluded that the achieved results were not good enough as AII would require an index with at least 80% explanatory power. Black et al. [26] reported that total seasonal rainfall explained 65% of cotton production losses in Zambia. However, they observed that heavy losses also occurred in seasons of near-normal rainfall due to socioeconomic factors. In East Africa, Enenkel et al. [28] reported that combining remotely-sensed soil moisture, rainfall, and Evaporative Stress Index (ESI) explained 44% of maize yield variation. They stated that the impact of non-weather factors on crop yield might have affected the regression results. In Ethiopia, Eze et al. [31] evaluated the relationship between crop yield and indices derived from rainfall and the Normalized Difference Vegetation Index (NDVI). They found minimum and maximum correlations of −0.007 and 0.64, respectively, with yield showing better correlation with annual NDVI than the other indices. They observed that correlations were not significant in one of the investigated districts due to environmental differences between districts.
In Malawi, Anghileri et al. [24] compared satellite-based rainfall estimates (SRFEs), NDVI, Enhanced Vegetation Index (EVI), soil moisture, and ESI to identify a suitable proxy for maize yield. The study found correlations of 0.01 to 0.67 between crop yields and these indices, with SRFEs achieving the best results. However, they observed that the correlations exhibited high spatial and temporal variability, making it difficult to identify a suitable index for the entire country. The results obtained in these studies show that remotely sensed indices have the potential to reduce basis risk. However, the studies also report discrepancies, which they attribute to spatial variations in environmental conditions and non-weather yield-determining factors. Smallholder crop yields are often influenced by factors not covered by WII contracts. Factors such as seed variety, fertilizer application rate, soil properties, and mechanization, which are neither covered by WII nor measured by weather-related indices, have as much influence on smallholder crop yields as weather variables [32][33][34][35][36].
This raises the question: Do smallholder crop yields really have a linear relationship with indices like rainfall or the NDVI? Discrepancies between crop yields and weather indices can also be caused by unreliable yield statistics [24,37,38,39]. Some studies have raised concerns about government and self-reported yield statistics, and their influence on remote sensing-based yield models [24,37,40]. Other studies have investigated and pointed out the inaccuracies of these datasets [41][42][43]. The Food and Agriculture Organization (FAO) reported that South Africa's crop yield data for smallholder farming systems (SFS) are inaccurate and unreliable [38].
Considering the challenges highlighted above, this study conducted yield surveys over three maize growing seasons in a smallholder farming area in South Africa. We employed an objective crop cut method that South Africa's Crop Estimate Committee (CEC) uses to estimate large-scale commercial crop yields. Since WII, particularly drought insurance, compensates for rainfall/water deficits, we assessed different satellite datasets against ground-based rainfall data at different spatial and temporal scales. We then evaluated how well these datasets explain maize yield. The datasets include the Tropical Applications of Meteorology using SATellite (TAMSAT), Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS), Moderate Resolution Imaging Spectroradiometer (MODIS), and Sentinel-2 images. TAMSAT and CHIRPS were selected because they have higher spatial resolutions than other freely available SRFEs covering Africa [44]. MODIS NDVI is a moderate-resolution product that can be used to monitor smallholder farming areas for AII, while Sentinel-2 images have a high resolution that is suitable for field-level applications. The objectives of this study were to evaluate (1) how sensitive SRFEs and NDVI are to rainfall and (2) how well rainfall and NDVI correlate with maize yield. Although the study is based on a smallholder farming area in South Africa, it also seeks to address the challenges of poor product design and basis risk that confront AII in other parts of Africa and the rest of the world.

Study Area
The study area is O.R. Tambo District Municipality (ORTDM) in the Eastern Cape Province of South Africa (Figure 1). This study focused on three of the five local municipalities in ORTDM (Figure 1). Maize cultivation records obtained from the Department of Agriculture, Land Reform and Rural Development (DALRRD) guided the choice of these municipalities. These records consist of GPS coordinates of farms that had planted maize in 2017, 2018, and 2019. Most farmers in the area grow maize because it is the most important food crop in South Africa [45]. White maize is a staple food for the majority of the population and yellow maize is widely used as animal feed [45]. The area has a warm oceanic climate [46]. Mean annual rainfall ranges from 900 mm to 1300 mm, with summer minimum and maximum temperatures of 14 °C to 19 °C and 14 °C to 27 °C, respectively [47]. The coastal areas are densely vegetated and have an undulating terrain with elevations that range from 5 m to 500 m above sea level. The interior has gentle-to-moderate sloping open grasslands; the northern areas have savannas, forests, and maximum elevations approximating 1500 m. The soils are sandy loams, sandy clay loams, and clays that are yellow to black in color and slightly acidic [48,49]. Maize is planted between October and mid-December, and harvested between June and August. However, sometimes farmers plant between late-December and early-January because of delayed deliveries of inputs [50].

Data
The datasets used in the study include in situ rainfall data from seven weather stations (WS), SRFEs from TAMSAT and CHIRPS, NDVIs from MODIS and Sentinel-2, and yield data. The rainfall and NDVI datasets covered the critical maize growing period, which spans from November to April. Most of this period coincides with summer and is South Africa's rainy season.
• TAMSAT (2002-2020) TAMSAT data were downloaded from https://www.tamsat.org.uk/ (accessed on 25 November 2021). TAMSAT is a daily, pentadal, monthly, and seasonal rainfall product with a resolution of 0.0375°. It is developed by the University of Reading. The dataset is a product of EUMETSAT's Meteosat thermal infrared data and ground-based rainfall measurements from rain gauges [51]. TAMSAT was originally developed to monitor rainfall deficit and its impacts on crop yields over the Sahel, but now covers the rest of Africa.
• CHIRPS (2002-2020) CHIRPS data were downloaded from https://climateserv.servirglobal.net/ (accessed on 20 November 2021). CHIRPS is a daily, pentadal, and monthly high-resolution (0.05°) precipitation dataset developed by the Climate Hazards Group (CHG) and the United States Geological Survey (USGS) Earth Resources Observation and Science Centre (EROS). This dataset is a product of multiple satellite datasets and ground-based rainfall observations [52]. CHIRPS was developed to support drought monitoring and trend analyses.
• MODIS NDVI (2002-2020) MODIS NDVI data were downloaded from https://ladsweb.modaps.eosdis.nasa.gov (accessed on 15 November 2021). We used the MOD13Q1 product, which has a spatial resolution of 250 m and a temporal resolution of 16 days. The MOD13Q1 algorithm selects nadir view scenes that are free of atmospheric contamination by filtering images collected over 16-day periods. These images are then composited and ingested by a vegetation index algorithm, which produces the final 16-day NDVI product [53,54]. MODIS data were extracted from the HDF files and reprojected using ArcGIS model builder.
• Sentinel-2 (2017-2020) Sentinel-2 images were downloaded from https://scihub.copernicus.eu/ (accessed on 20 November 2021). Sentinel-2 is an Earth observation mission comprising a constellation of two satellites that acquire images every 5 to 10 days at spatial resolutions of 10 m, 20 m, and 60 m. This dataset consists of 13 spectral bands covering the visible, near-infrared, and shortwave infrared regions [55]. We pre-processed this dataset using the Sen2Cor algorithm to correct Top-Of-Atmosphere (TOA) Level-1C reflectance to Bottom-Of-Atmosphere (BOA) surface reflectance. The corrected bands were then further processed using QGIS to produce multi-temporal NDVI maps covering the growing season.
• Yield data (2017-2020) The maize-yield data comprised 65 observations collected in 2017-2018, 2018-2019, and 2019-2020. We collected new yield data partly because most of the farmers (80%) indicated that they do not keep yield records and also because the FAO [38] reported that South Africa's SFS yield statistics are inaccurate and unreliable. We collected the yield data using the objective yield survey (OYS) technique that is employed by South Africa's CEC for estimating large-scale commercial farm yields. The FAO [38] report cited above provides details on this technique.
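The NDVI maps derived from Sentinel-2 in the list above rest on the standard NDVI formula, (NIR − Red) / (NIR + Red). The sketch below illustrates that calculation in plain Python. It assumes band 8 (near-infrared) and band 4 (red), the usual 10 m pair for Sentinel-2 NDVI; the text does not state which bands were used, and the reflectance values are hypothetical. It is not the QGIS workflow used in the study.

```python
def ndvi(nir, red):
    """Return NDVI for one pixel from near-infrared and red surface reflectance."""
    denom = nir + red
    if denom == 0:
        # NDVI is undefined when both bands are zero (e.g., a no-data pixel)
        return float("nan")
    return (nir - red) / denom

# Hypothetical BOA reflectance grids (assumed: band 8 = NIR, band 4 = red)
nir_band = [[0.45, 0.40], [0.30, 0.0]]
red_band = [[0.05, 0.10], [0.20, 0.0]]

# Apply the formula pixel by pixel to build a small NDVI map
ndvi_map = [
    [ndvi(n, r) for n, r in zip(nir_row, red_row)]
    for nir_row, red_row in zip(nir_band, red_band)
]
```

Dense, healthy vegetation reflects strongly in the near-infrared and absorbs red light, so values close to 1 indicate vigorous canopy, while bare soil gives values near zero.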

Data Analysis
The data were processed and analyzed using different software packages including QGIS, ArcGIS, Python, RStudio, and Microsoft Excel. Python was used to extract time series NDVI and rainfall values from CHIRPS, MODIS, and Sentinel-2. TAMSAT data are provided by the source in the form of CSV files. RStudio and Microsoft Excel were used for correlation and regression analysis.

Relationship between Satellite Data and In Situ Rainfall
We evaluated the satellite data against the in situ rainfall data using correlation analysis. Other studies also used correlation analysis to validate SRFEs against in situ rainfall data [56][57][58] and to assess the usability of satellite data in AII [26,29,31,59,60]. Since studies [26,29] report that SRFEs perform better when spatially aggregated, we aggregated TAMSAT and CHIRPS to 10 km and 25 km averages (Figure 2). We also aggregated MODIS to the same spatial scales for consistency. We performed spatial aggregation in Python by averaging pixels that lie within 5 km and 12.5 km radii of each weather station. TAMSAT data were downloaded from the TAMSAT web system at the same spatial scales. TAMSAT, CHIRPS, and in situ rainfall measurements were daily, 16-day, and monthly cumulative rainfall measurements; MODIS NDVIs were 16-day and monthly averages.
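The aggregation and correlation steps described above can be sketched as follows. This is a minimal illustration and not the authors' script: it assumes pixels are selected by the great-circle distance from the pixel centre to the station, and all coordinates, rainfall values, and function names are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def mean_within_radius(pixels, station, radius_km):
    """Average the values of pixels whose centres lie within radius_km of the station."""
    vals = [v for lat, lon, v in pixels
            if haversine_km(lat, lon, station[0], station[1]) <= radius_km]
    return sum(vals) / len(vals) if vals else float("nan")

def block_sums(daily, step):
    """Cumulative rainfall over consecutive blocks of `step` days (last block may be shorter)."""
    return [sum(daily[i:i + step]) for i in range(0, len(daily), step)]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical station location and pixel centres: (lat, lon, rainfall in mm)
station = (-31.50, 28.70)
pixels = [(-31.50, 28.70, 10.0), (-31.53, 28.72, 14.0),
          (-31.57, 28.70, 6.0), (-31.80, 29.00, 40.0)]

mean_5km = mean_within_radius(pixels, station, 5.0)    # 10 km spatial scale
mean_12km = mean_within_radius(pixels, station, 12.5)  # 25 km spatial scale

# Hypothetical period totals from the satellite product and the rain gauge
satellite = [12.0, 30.0, 55.0, 20.0]
gauge = [10.0, 35.0, 50.0, 25.0]
r = pearson_r(satellite, gauge)
```

The same `pearson_r` call applies at any time step once both series have been summed to matching periods, for example with `block_sums(daily_values, 16)` for the 16-day step.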

Crop Yield Relationships with Rainfall and NDVI
For minimal basis risk, the insurance index must correlate well with crop yields and crop losses. Studies have used linear regression and correlation analyses to determine how weather indices can be used as proxies for crop yields and crop losses [14,28,30,31,59]. This study used simple linear regression to regress maize yield against rainfall and NDVI. The independent variables (rainfall and NDVI) covered the period between planting and yield survey day. We used cumulative rainfall (planting to survey day) with the SRFEs covering 10 km around each farm. We chose 10 km because the 25 km spatial scale did not introduce any significant changes to the performances of the SRFEs (see results in Section 3.1). For MODIS and Sentinel-2 NDVIs, we conducted preliminary analyses through which we sought to identify the best proxy between seasonal average, seasonal maximum, and stagewise NDVI. In the main analysis, we used seasonal maximum because it produced better results than the other ones. The NDVIs only covered the target crop fields (i.e., NDVI-yield analyses were done at the crop field level).
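To make the regression step concrete, the sketch below fits a simple linear regression of yield on seasonal maximum NDVI and reports the coefficient of determination. It is an illustration under assumptions: the field names, NDVI series, and yields are hypothetical, and the study itself ran the regressions in RStudio and Microsoft Excel.

```python
def linear_fit(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot

# Hypothetical per-field NDVI time series from planting to survey day
ndvi_series = {
    "field_a": [0.21, 0.48, 0.74, 0.66],
    "field_b": [0.25, 0.55, 0.81, 0.70],
    "field_c": [0.19, 0.44, 0.69, 0.61],
    "field_d": [0.23, 0.52, 0.78, 0.64],
    "field_e": [0.20, 0.47, 0.72, 0.58],
}
# Hypothetical crop-cut yields in kg/ha for the same fields
yields = {"field_a": 3100.0, "field_b": 2400.0, "field_c": 4200.0,
          "field_d": 3600.0, "field_e": 1900.0}

fields = sorted(ndvi_series)
seasonal_max = [max(ndvi_series[f]) for f in fields]  # predictor: seasonal maximum NDVI
observed = [yields[f] for f in fields]                # response: maize yield

intercept, slope, r_squared = linear_fit(seasonal_max, observed)
```

Replacing `seasonal_max` with cumulative rainfall per farm gives the rainfall-yield regression in the same way.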

Results
Section 3.1 presents results of the first objective, which was to assess the relationship between the satellite data and in situ rainfall measurements. Section 3.2 presents results of the second objective, which was to assess the relationship between maize yield, rainfall, and NDVI.
Figure 3 shows correlations between WS and satellite data at the daily time step. All correlations above 0.27 were statistically significant at the 95% confidence level (p < 0.05).
At the 16-day time step, CHIRPS and WS data had correlations of 0.51 to 0.68 at the 10 km spatial scale and 0.51 to 0.6 at the 25 km spatial scale. TAMSAT and WS data had correlations of 0.28 to 0.57 at the 10 km spatial scale and 0.30 to 0.57 at the 25 km spatial scale. The 16-day MODIS NDVI and WS data had correlations ranging from 0.22 to 0.39 at the 10 km spatial scale and 0.15 to 0.37 at the 25 km spatial scale.
Figure 5 shows correlations between WS and satellite data at the monthly time step.
Figure 6 shows that CHIRPS estimated rainfall better than TAMSAT across all the spatial and temporal scales investigated in this study. Mean correlations between CHIRPS and WS data at the 10 km spatial scale were 0.34, 0.62, and 0.66 for the daily, 16-day, and monthly time steps, respectively. Mean correlations between CHIRPS and WS data at the 25 km spatial scale were also 0.34, 0.62, and 0.66 for the daily, 16-day, and monthly time steps, respectively. Mean correlations between TAMSAT and WS data at the 10 km spatial scale were 0.32, 0.43, and 0.38 for the daily, 16-day, and monthly time steps, respectively. Mean correlations between TAMSAT and WS data at the 25 km spatial scale were also 0.32, 0.43, and 0.38 for the daily, 16-day, and monthly time steps, respectively. Mean correlations between MODIS NDVI and WS data at the 10 km spatial scale were 0.31 and 0.44 for the 16-day and monthly time steps, respectively. Mean correlations between MODIS NDVI and WS data at the 25 km spatial scale were 0.30 and 0.43 for the 16-day and monthly time steps, respectively.

Yield Estimation with Rainfall and NDVI
Over the three growing seasons (2017-2018, 2018-2019, and 2019-2020), ORTDM's maize yields ranged between 367.10 and 7449.13 kg/ha, with an average of 3259.16 kg/ha. Average yields per season were 2946.40, 3067, and 3509 kg/ha, respectively. Figure 7 shows results of the regression analysis between maize yield and rainfall. CHIRPS data explained about 3% (p = 0.26) of the maize yield variation, while WS data explained 14% (p = 0.01). Figure 8 shows results of the regression analysis between NDVI and maize yield. As shown in Figure 8, the explanatory power of both MODIS NDVI (R² = 0.0017, p = 0.75) and Sentinel-2 NDVI (R² = 0.0004, p = 0.88) was very low.

Relationships between the Satellite and WS Data
The first objective of this study was to assess the relationship between WS rainfall data and CHIRPS, TAMSAT, and MODIS NDVI at different spatial and temporal scales. The results show that although CHIRPS was a better estimator of daily rainfall than TAMSAT, both performed poorly at the daily time step (R ≤ 0.40). At the 16-day and monthly time steps, the results improved, with CHIRPS again outperforming TAMSAT. We also observed that all the satellite data were poorly correlated with WS4 (Tsolo WS). Removing WS4 from the analysis improved the performance of the SRFEs by 6%. For instance, the mean correlation between CHIRPS and the WS data increased from 0.66 to 0.72. Although SRFEs misestimate rainfall over coastal and mountainous areas [57], WS4 is in relatively flat terrain and farther from the coast than the other WSs. A possible explanation for this discrepancy is inaccuracy in the WS4 records, because preliminary testing of the satellite data revealed moderate to strong intercorrelations between CHIRPS, TAMSAT, and MODIS NDVI. Changing the spatial scale from 10 km to 25 km did not have any significant impact on the performance of any of the three satellite datasets. Overall, CHIRPS outperformed TAMSAT and MODIS NDVI and emerged as the most suitable dataset, particularly at the monthly time step.

Relationship of Yield with Rainfall and NDVI
Maize yields were below the national average, which often fluctuates between 4000 kg/ha and 7000 kg/ha [61][62][63]. Low yields in ORTDM, and other smallholder farming areas in South Africa, are also reported by Chimonyo et al. [64] and by Kambanje et al. [65]. South Africa's agricultural sector comprises (1) well-developed commercial farms, which are responsible for most of the country's agricultural output and (2) less-developed resource-poor SFS. ORTDM's maize yields were lower than the national average because the well-developed commercial farms perform better than the SFS, which invest less in farming operations.
The weak linear relationship between maize yield and the two indices (NDVI and rainfall) can be attributed to the influence of non-weather yield-determining factors. In a recent study, Masiza et al. [32] used partial dependence plots to show that ORTDM's maize yields were partially and non-linearly dependent, not only on rainfall, but also on fertilizer application rate, seed variety, heat units, surface moisture, soil pH, and mechanization. They also ran a variable importance algorithm that ranked rainfall as the fifth-most important yield-determining factor in ORTDM [32]. This explains our findings, as one would expect maize yield to have moderate-to-strong correlations with rainfall and NDVI if water were the main limiting factor. In addition, ORTDM experienced no weather shocks over the three seasons investigated in this study. In contrast, studies conducted in South Africa's large-scale commercial farming areas report better correlations between rainfall, NDVI, and maize yield [66][67][68] because the main limiting factor in well-developed farming systems is usually precipitation rather than the wide-ranging factors affecting yields in SFS.
Another important point is that WII (e.g., drought insurance) covers yield losses caused by rainfall or water deficits [15]. Water deficits do cause yield losses, but a significant number of SFS in ORTDM, South Africa, and other parts of the world still produce low yields under optimal weather conditions because of non-weather yield-determining factors. This is supported by Black et al. [26], who observed that heavy crop losses in Zambia occurred even during years of near-normal rainfall because of socio-economic factors. In Uganda, Epule et al. [69] compared climatic and non-climatic determinants of crop yields and observed that non-climatic factors were the main drivers of crop yields. In a district near ORTDM, Mujuru and Obi [70] reported that low maize yields were due to low use of fertilizer. In India, the World Bank [71] reported that correlation between weather indexed payments and crop losses was only 14%.
Based on the findings presented above, we make the following conclusions and recommendations: Although this study made an effort to minimize yield data inaccuracy by employing an objective yield-survey method, maize yield still showed poor correlations with NDVI and rainfall. We, therefore, recommend that the process of designing WII must include identification of main yield-limiting factors for specific localities. Black et al. [26] pointed out that a strong influence of non-weather-related factors on yield will cause high basis risk in WII contracts. The influence of non-weather factors on yields can be quantified through variable ranking algorithms or other multivariate techniques. Information about non-weather factors could also assist insurers who might want to link or bundle insurance with inputs and advisory services. Adoption of improved inputs and good farming practices could reduce the influence of non-weather factors on crop losses, and thereby reduce basis risk in WII contracts. However, this requires reliable data on the current farming practices and crop yields. The lack thereof will necessitate collection of new data. For example, Hernandez et al. [72] report that the PULA insurance program continues to collect new data to recalibrate and improve its insurance contracts. However, collecting new yield data using crop cut methods can be time consuming and costly. Insurers need to partner with governments, input suppliers, research institutions, and farmer organizations to achieve this. In South Africa, public-private partnerships are already underway. Public extension services, private farmer organizations, input suppliers, research institutions, and milling plants are all involved in smallholder cropping programs [32,73]. Such programs could be exploited to establish data standards and data sharing platforms.
Lastly, CHIRPS measured rainfall well at the 10 km and 25 km spatial scales. This shows that CHIRPS can map local rainfall conditions. Therefore, CHIRPS data could be combined with physiographic and agro-ecological data to demarcate unit areas of insurance. CHIRPS also performed well at the 16-day and monthly time steps. An alternative to the method used in this study would be a crop water requirements approach (e.g., CROPWAT). A crop water requirements approach could use 20-day and monthly CHIRPS data to develop insurance indices for the critical growth stages of maize. The insurance would then issue payouts when water requirements are not met over these 20-to-30-day periods. This would work well with the FAO's CROPWAT system, which categorizes maize growth stages into initial (20, 25, or 30 days), development (30 days), mid-season (40 days), and late-season (30 days). Studies have shown that CROPWAT can model water-deficit-caused yield reductions [74]. We hope that these recommendations will reach policy makers, researchers, insurers, and other stakeholders involved in AII.
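As an illustration of the crop water requirements approach suggested above, the sketch below sums daily rainfall over the four growth stages and computes a payout for each stage in which rainfall falls short of a stage requirement. The stage lengths follow the CROPWAT categories quoted in the text (taking 20 days for the initial stage). Everything else is an assumption made for the example: the requirement values are hypothetical and are not CROPWAT outputs, and the proportional payout rule is one possible design, not a contract described in this study.

```python
# Stage lengths in days, following the CROPWAT categories quoted in the text
STAGES = [("initial", 20), ("development", 30), ("mid-season", 40), ("late-season", 30)]

# Hypothetical stage water requirements in mm (illustrative, not CROPWAT outputs)
REQUIREMENT_MM = {"initial": 40.0, "development": 90.0,
                  "mid-season": 160.0, "late-season": 60.0}

def stage_totals(daily_rain, stages=STAGES):
    """Sum daily rainfall (mm) over consecutive growth stages starting at planting."""
    totals, start = {}, 0
    for name, length in stages:
        totals[name] = sum(daily_rain[start:start + length])
        start += length
    return totals

def stage_payout(rain_mm, requirement_mm, sum_insured):
    """Assumed rule: pay in proportion to the unmet share of the stage requirement."""
    deficit = max(0.0, requirement_mm - rain_mm)
    return sum_insured * deficit / requirement_mm

# Hypothetical daily rainfall estimates (mm) for a 120-day season
daily_rain = [2.0] * 20 + [2.0] * 30 + [4.0] * 40 + [2.0] * 30

totals = stage_totals(daily_rain)
payouts = {name: stage_payout(totals[name], REQUIREMENT_MM[name], sum_insured=1000.0)
           for name, _ in STAGES}
```

In this example only the development stage falls short (60 mm against a 90 mm requirement), so it is the only stage that triggers a payout. A real contract would also need locally calibrated requirements and agreed trigger and exit levels.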