# Twice Is Nice: The Benefits of Two Ground Measures for Evaluating the Accuracy of Satellite-Based Sustainability Estimates


## Abstract

$r^{2}$), which account for this noise in ground data, are accordingly higher than, and occasionally even double, the apparent performance based on a naïve comparison of satellite and ground measures. Our results caution against evaluating satellite measures without accounting for noise in ground data and emphasize the benefit of estimating that noise by collecting at least two independent ground measures.

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Correlation between Ground Measures

#### 2.2. Derivation of Correction Equation

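Consider a true outcome $Y$ with mean $\mu$ and standard deviation $\sigma_{Y}$:

$$Y \sim N(\mu, \sigma_{Y}^{2})$$

The satellite estimate $S$ is the true value plus noise,

$$S = Y + \epsilon_{S}$$

with $\epsilon_{S}$ representing normally distributed error with mean 0 and standard deviation $\sigma_{\epsilon_{S}}$, which is independent of $Y$. Then we can express the standard deviation of $S$ as

$$\sigma_{S} = \sqrt{\sigma_{Y}^{2} + \sigma_{\epsilon_{S}}^{2}}$$

and, because $\mathrm{cov}(S,Y) = \sigma_{Y}^{2}$, the correlation of $S$ with the true outcome as

$$r(S,Y) = \frac{\sigma_{Y}^{2}}{\sigma_{S}\,\sigma_{Y}} = \frac{\sigma_{Y}}{\sigma_{S}}$$

Now consider a ground measure $G_{1}$ that is a noisy measure of $Y$, with a standard deviation of noise $\sigma_{\epsilon_{G1}}$:

$$G_{1} = Y + \epsilon_{G1}, \qquad \sigma_{G1} = \sqrt{\sigma_{Y}^{2} + \sigma_{\epsilon_{G1}}^{2}}$$

We take $\epsilon_{G1}$ to be independent of both $S$ and $Y$, in which case

$$r(S,G_{1}) = \frac{\sigma_{Y}^{2}}{\sigma_{S}\,\sigma_{G1}} = r(S,Y)\,\frac{\sigma_{Y}}{\sigma_{G1}}$$

The observed correlation $r(S,G_{1})$ is smaller than $r(S,Y)$, because $\sigma_{G1} > \sigma_{Y}$. Thus, correlations reported in studies are typically a lower bound on the correlation of $S$ with the true outcome, but without knowing the ratio of $\sigma_{G1}$:$\sigma_{Y}$ one cannot estimate the true value of $r(S,Y)$.

Now consider a second ground measure, $G_{2} = Y + \epsilon_{G2}$, which is independent of the first measure, $G_{1}$.

We can express the correlation between $G_{1}$ and $G_{2}$ as:

$$r(G_{1},G_{2}) = \frac{\sigma_{Y}^{2}}{\sigma_{G1}\,\sigma_{G2}}$$

Combining these expressions, an estimate of $r^{2}(S,Y)$ when $\epsilon_{G1}$ and $\epsilon_{G2}$ are independent is:

$$\widehat{r^{2}}(S,Y) = \frac{r(S,G_{1})\,r(S,G_{2})}{r(G_{1},G_{2})} \tag{14}$$

In other words, $r^{2}(S,Y)$ is measured not based on the absolute agreement between $S$ and $G_{1}$ or $G_{2}$, but by how this agreement compares to how well the ground measures agree with each other.

The estimate relies on the errors in the two ground measures being independent. For example, if $G_{1}$ is a crop-cut yield from a random subplot, $G_{2}$ could be self-reported yield by a farmer only if that farmer is not aware of the crop-cut estimate, or it could be a crop-cut yield from a separate location within the same field, as long as that location is randomly selected. The implications of violations of this assumption are addressed in Section 4.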
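As a minimal sketch of how the correction could be computed from data, the function below assumes Equation (14) has the ratio form $r(S,G_{1})\,r(S,G_{2})/r(G_{1},G_{2})$, which is what the surrounding description implies. The function name `corrected_r2` and the synthetic data are illustrative assumptions, not part of the original study.

```python
import numpy as np


def corrected_r2(s, g1, g2):
    """Estimate r^2(S, Y) from one satellite and two ground measures.

    Assumed form of the correction: r(S,G1) * r(S,G2) / r(G1,G2).
    The estimate can exceed 1, and is unstable when r(G1,G2) is near zero.
    """
    s, g1, g2 = (np.asarray(a, dtype=float) for a in (s, g1, g2))
    r_s_g1 = np.corrcoef(s, g1)[0, 1]
    r_s_g2 = np.corrcoef(s, g2)[0, 1]
    r_g1_g2 = np.corrcoef(g1, g2)[0, 1]
    return r_s_g1 * r_s_g2 / r_g1_g2


# Hypothetical data: a true outcome plus independent noise in each measure.
rng = np.random.default_rng(42)
y = rng.normal(5.0, 2.0, 20_000)          # true outcome (never observed in practice)
s = y + rng.normal(0.0, 1.0, y.size)      # satellite estimate
g1 = y + rng.normal(0.0, 2.0, y.size)     # first ground measure
g2 = y + rng.normal(0.0, 1.5, y.size)     # second, independent ground measure

true_r2 = np.corrcoef(s, y)[0, 1] ** 2
naive_r2 = np.corrcoef(s, g1)[0, 1] ** 2
print(f"true={true_r2:.2f} naive={naive_r2:.2f} "
      f"corrected={corrected_r2(s, g1, g2):.2f}")
```

Because the estimate is a ratio of sample correlations, it is not bounded by 1, so in practice it may be worth reporting it alongside a resampled confidence interval.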

#### 2.3. Simulations

We simulate $Y$, $S$, $G_{1}$, and $G_{2}$ on a set of fields under specified values for $\mu$, $\sigma_{Y}$, $\sigma_{\epsilon_{S}}$, $\sigma_{\epsilon_{G1}}$ and $\sigma_{\epsilon_{G2}}$. We then calculate the correlations $r(S,Y)$, $r(S,G_{1})$, $r(S,G_{2})$, and $r(G_{1},G_{2})$, as well as the estimated $\widehat{r^{2}}(S,Y)$ from Equation (14). We repeat these simulations 1000 times, using different combinations of parameters for the simulated noise in both satellite and ground measures to explore a range of potential conditions. The parameter ranges were 1–3 t/ha for $\sigma_{Y}$, and 0.5–3 t/ha for $\sigma_{\epsilon_{S}}$, $\sigma_{\epsilon_{G1}}$ and $\sigma_{\epsilon_{G2}}$. The value of $\mu$ was fixed at 5 t/ha, since varying this value does not affect the resulting $r^{2}$.
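The simulation described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: parameters are drawn uniformly from the reported ranges (the text does not say how combinations were chosen), true yields are drawn from a normal distribution, all errors are additive and independent, and Equation (14) is assumed to be the ratio $r(S,G_{1})\,r(S,G_{2})/r(G_{1},G_{2})$.

```python
import numpy as np


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


def simulate_once(rng, n_fields, mu, sd_y, sd_eps_s, sd_eps_g1, sd_eps_g2):
    """Simulate one set of fields; return (true r2, naive r2, corrected r2)."""
    y = rng.normal(mu, sd_y, n_fields)               # true yields
    s = y + rng.normal(0.0, sd_eps_s, n_fields)      # satellite estimate
    g1 = y + rng.normal(0.0, sd_eps_g1, n_fields)    # ground measure 1
    g2 = y + rng.normal(0.0, sd_eps_g2, n_fields)    # ground measure 2
    true_r2 = corr(s, y) ** 2
    naive_r2 = corr(s, g1) ** 2
    corrected = corr(s, g1) * corr(s, g2) / corr(g1, g2)  # assumed Eq. (14)
    return true_r2, naive_r2, corrected


rng = np.random.default_rng(0)
results = []
for _ in range(1000):
    sd_y = rng.uniform(1.0, 3.0)                     # t/ha
    sd_s, sd_g1, sd_g2 = rng.uniform(0.5, 3.0, size=3)
    results.append(simulate_once(rng, 1000, 5.0, sd_y, sd_s, sd_g1, sd_g2))

true_r2, naive_r2, corrected_r2 = np.array(results).T
print("median (corrected - true):", np.median(corrected_r2 - true_r2))
print("median (naive - true):    ", np.median(naive_r2 - true_r2))
```

Reducing `n_fields` from 1000 to a few dozen is a quick way to see how sampling variability in the three correlations widens the spread of the corrected estimate.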

#### 2.4. Application to Crop Yields

#### 2.5. Application to Household Consumption

The naïve $r^{2}$ (i.e., the direct comparison with ground measures without using Equation (14)) between the NL predictions and the overall average or the average of the first subset was also calculated for comparison.

## 3. Results

#### 3.1. Correlations between Ground Measures

#### 3.2. Simulations Illustrate the Value of Two Ground Measures

The derivation of Equation (14) assumes that the covariance between any two of the measures equals $\sigma_{Y}^{2}$, which is true in the limit since the covariance between independent realizations of noise will be zero. However, for finite samples of noise ($\epsilon_{S}$, $\epsilon_{G1}$, $\epsilon_{G2}$), these covariances will generally be slightly larger or smaller than zero, leading to a question of how accurate Equation (14) will be under typical conditions.

Figure 2 (left) compares the true $r^{2}(S,Y)$ with both $r^{2}(S,G_{1})$ (e.g., the training $r^{2}$ for a satellite model trained on a noisy ground measure) and the estimated $\widehat{r^{2}}(S,Y)$ based on Equation (14). This initial plot pertains to a large number of locations (N = 1000) to illustrate two key points. First, the training $r^{2}$ is always less than the true $r^{2}$, and can be well below 50% of the true value. Second, Equation (14) results in an unbiased estimate of the true $r^{2}$ across a range of parameter combinations, with most points falling very close to the 1:1 line. Equation (14) remains valid even if one assumes that satellite estimates are subject to Berkson rather than classical measurement error, with values that are smoother (i.e., have less variance) than the true values (Figure S1).

#### 3.3. Correcting for Ground Noise Significantly Improves Performance Measures for Crop Yields

We also compute a 5–95% confidence interval for each measure of $r^{2}$ by resampling the data with replacement 100 times, calculating the $r^{2}$ values for each sample, and taking the 5th and 95th values of the 100 bootstrap estimates when ranked from lowest to highest.

The corrected $r^{2}$ using Equation (14) is above 0.8, more than twice the uncorrected values, albeit with wide confidence intervals. When using two 4 × 4 m crop-cuts as the two ground measures, the corrected $r^{2}$ is lower at 0.52. In this study, we also obtained a full plot harvest for a subset of fields [11], which if treated as the truth provides a direct estimate of $r^{2}(S,Y)$ of 0.56. Thus, the corrected $r^{2}$ for the 4 × 4 m crop-cuts was very similar to the direct estimate of the true value. The corrected $r^{2}$ values when using self-report were also consistent with the true value but exhibited wide error bars owing to the smaller number of fields with both self-report and crop-cut data.

The satellite estimates had naïve $r^{2}$ values of 0.07 and 0.20 with self-report and crop-cut yields, respectively. In contrast, the corrected $r^{2}$ was considerably higher at 0.37, because of the low observed correlation between self-report and crop-cuts (r = 0.32). In a study of 147 wheat fields in Nepal, two separate 5 × 5 m crop-cuts were obtained in each field, with a high correlation of 0.93 between them despite the fact that they were reportedly independent samples. This could reflect the fact that these wheat fields were irrigated, which tends to increase the homogeneity of yield outcomes. As a result of the higher correlation between crop-cuts, the corrected $r^{2}$ of 0.45 was only slightly higher than the training $r^{2}$ for the individual crop-cuts (0.41 and 0.43).
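The reported figures can be checked with a few lines of arithmetic. The sketch below assumes Equation (14) is the ratio $r(S,G_{1})\,r(S,G_{2})/r(G_{1},G_{2})$ and that both satellite-ground correlations are positive, so that each correlation is the square root of the reported $r^{2}$; the function name is a hypothetical helper.

```python
import math


def corrected_r2_from_reported(r2_s_g1, r2_s_g2, r_g1_g2):
    """Corrected r2 from two reported naive r2 values and r(G1,G2).

    Assumes r(S,G1) and r(S,G2) are positive, so sqrt(r2) recovers them.
    """
    return math.sqrt(r2_s_g1) * math.sqrt(r2_s_g2) / r_g1_g2


# Self-report and crop-cut case: naive r2 of 0.07 and 0.20, r(G1,G2) = 0.32.
self_report_case = corrected_r2_from_reported(0.07, 0.20, 0.32)

# Nepal wheat, two crop-cuts: naive r2 of 0.41 and 0.43, r(G1,G2) = 0.93.
nepal_case = corrected_r2_from_reported(0.41, 0.43, 0.93)

print(round(self_report_case, 2), round(nepal_case, 2))  # 0.37 0.45
```

Both results match the corrected values quoted in the text, which illustrates how a low correlation between ground measures (0.32) produces a large correction while a high one (0.93) produces a small one.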

#### 3.4. Correcting for Ground Noise Significantly Improves Performance Measures for Household Consumption

Performance ($r^{2}$) on the held-out test data was estimated using the naïve comparison, where all households in a cluster were used to estimate average consumption, as well as using Equation (14) with $G_{1}$ defined as the average of a random half of the households and $G_{2}$ as the average of the other half (Figure 4). For comparison, we also show how the naïve $r^{2}$ changes if using only half of the households ($G_{1}$). Consistent with the notion that sampling more households helps to reduce noise in the ground-based estimate, the naïve $r^{2}$ was higher in all six cases when using the average of all households rather than half of the households. However, the corrected $r^{2}$, which explicitly accounts for noise in the ground-based measures, was higher still, especially in Tanzania and Nigeria. Corrected $r^{2}$ in Ethiopia remained low, mainly because nightlights appear to explain very little variation in ground-measured household consumption (the numerator in Equation (14) is close to zero).
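The split-half construction of $G_{1}$ and $G_{2}$ can be sketched as follows. The helper name `split_half_means`, the synthetic survey (2000 clusters of 10 households), and the noise levels are all assumptions made for illustration, and Equation (14) is again assumed to be the ratio $r(S,G_{1})\,r(S,G_{2})/r(G_{1},G_{2})$.

```python
import numpy as np


def split_half_means(values, cluster_ids, rng):
    """Randomly split each cluster's households in two; return the half means."""
    clusters = np.unique(cluster_ids)
    g1 = np.empty(len(clusters))
    g2 = np.empty(len(clusters))
    for i, c in enumerate(clusters):
        v = rng.permutation(values[cluster_ids == c])  # needs >= 2 households
        half = len(v) // 2
        g1[i] = v[:half].mean()
        g2[i] = v[half:].mean()
    return clusters, g1, g2


def corr(a, b):
    return np.corrcoef(a, b)[0, 1]


# Hypothetical survey: cluster-level true consumption plus household-level noise.
rng = np.random.default_rng(7)
n_clusters, n_households = 2000, 10
cluster_ids = np.repeat(np.arange(n_clusters), n_households)
true_mean = rng.normal(0.0, 1.0, n_clusters)
households = true_mean[cluster_ids] + rng.normal(0.0, 2.0, cluster_ids.size)
satellite = true_mean + rng.normal(0.0, 1.0, n_clusters)  # one value per cluster

_, g1, g2 = split_half_means(households, cluster_ids, rng)
g_all = households.reshape(n_clusters, n_households).mean(axis=1)

naive_all = corr(satellite, g_all) ** 2
naive_half = corr(satellite, g1) ** 2
corrected = corr(satellite, g1) * corr(satellite, g2) / corr(g1, g2)
print(f"naive (half)={naive_half:.2f} naive (all)={naive_all:.2f} "
      f"corrected={corrected:.2f}")
```

The two halves share the same cluster but no households, so their sampling errors are independent by construction, which is the condition the correction relies on.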

On average, the corrected $r^{2}$ was 0.2 points higher than the naïve $r^{2}$ for consumption expenditures across the six cases, with a median difference of 0.16. Interestingly, this difference is similar in magnitude to the difference in many countries between satellite performance for predicting assets vs. consumption [1,20]. Thus, although on the surface it appears that assets are “easier” than consumption to predict from satellite data, much of the difference may in fact arise because consumption is harder to measure than assets on the ground.

#### 3.5. Correcting Satellite Performance Measures in the Absence of Two Ground-Based Estimates

When only a single ground measure is available, the $r^{2}$ between satellite and ground measures should be reported with an emphasis on the downward bias that results from noise in the ground measure.

The corrected $r^{2}$ was then calculated as:

$$\widehat{r^{2}}(S,Y) = \frac{r^{2}(S,G_{1})}{r(S_{1},S_{2})}$$

where $G_{1}$ is the crop-cut yield, and $S_{1}$ and $S_{2}$ are the satellite estimates for the two sampled pixels. This calculation is repeated 100 times, each time taking a new sample of fields and of pixels within the fields, to estimate a confidence interval on the corrected $r^{2}$.

The corrected $r^{2}$ that results from combining these correlations with the single crop-cut (Equation (17)) agreed well with the corrected $r^{2}$ from using the combination of self-report and crop-cut (Equation (14)) (Figure 5). Thus, at least for these two examples, it appears that using satellite measures of in-field heterogeneity can be a useful substitute for two independent ground measures of yield.

## 4. Discussion

## 5. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. Jean, N.; Burke, M.; Xie, M.; Davis, W.M.; Lobell, D.B.; Ermon, S. Combining satellite imagery and machine learning to predict poverty. Science **2016**, 353, 790–794.
2. Blumenstock, J.; Cadamuro, G.; On, R. Supplementary Materials for Predicting poverty and wealth from mobile phone metadata. Science **2015**, 350, 1073–1076.
3. Sheehan, E.; Meng, C.; Jean, N.; Tan, M.; Burke, M.; Ermon, S.; Uzkent, B.; Lobell, D. Predicting economic development using geolocated wikipedia articles. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Anchorage, AK, USA, 4–8 August 2019.
4. Burke, M.; Driscoll, A.; Lobell, D.B.; Ermon, S. Using satellite imagery to understand and promote sustainable development. Science **2021**, 371, eabe8628.
5. Watmough, G.R.; Marcinko, C.L.J.; Sullivan, C.; Tschirhart, K.; Mutuo, P.K.; Palm, C.A.; Svenning, J.C. Socioecologically informed use of remote sensing data to predict rural household poverty. Proc. Natl. Acad. Sci. USA **2019**, 116, 1213–1218.
6. Mirza, M.U.; Xu, C.; van Bavel, B.; van Nes, E.H.; Scheffer, M. Global inequality remotely sensed. Proc. Natl. Acad. Sci. USA **2021**, 118, e1919913118.
7. Clevers, J.G.P.W. A simplified approach for yield prediction of sugar beet based on optical remote sensing data. Remote Sens. Environ. **1997**, 61, 221–228.
8. Lobell, D.B.; Asner, G.P.; Ortiz-Monasterio, J.I.; Benning, T.L. Remote sensing of regional crop production in the Yaqui Valley, Mexico: Estimates and uncertainties. Agric. Ecosyst. Environ. **2003**, 94, 205–220.
9. Lambert, M.J.; Traoré, P.C.S.; Blaes, X.; Baret, P.; Defourny, P. Estimating smallholder crops production at village level from Sentinel-2 time series in Mali’s cotton belt. Remote Sens. Environ. **2018**, 216, 647–657.
10. Burke, M.; Lobell, D.B. Satellite-based assessment of yield variation and its determinants in smallholder African systems. Proc. Natl. Acad. Sci. USA **2017**, 114, 2189–2194.
11. Lobell, D.B.; Azzari, G.; Burke, M.; Gourlay, S.; Jin, Z.; Kilic, T.; Murray, S. Eyes in the Sky, Boots on the Ground: Assessing Satellite- and Ground-Based Approaches to Crop Yield Measurement and Analysis. Am. J. Agric. Econ. **2019**, 1–18.
12. Jain, M.; Singh, B.; Srivastava, A.; McDonald, A.; Lobell, D.B. Mapping Smallholder Wheat Yields and Sowing Dates Using Micro-Satellite Data. Remote Sens. **2016**, 8, 860.
13. Johnson, D.M. An assessment of pre- and within-season remotely sensed variables for forecasting corn and soybean yields in the United States. Remote Sens. Environ. **2014**, 141, 116–128.
14. World Bank Group. Capacity Needs Assessment for Improving Agricultural Statistics in Kenya; World Bank Publications: Washington, DC, USA, 2016.
15. World Bank Group. Capacity Needs Assessment for Improving Agricultural Statistics in Uganda; World Bank Publications: Washington, DC, USA, 2016.
16. Carletto, C.; Jolliffe, D.; Banerjee, R. From Tragedy to Renaissance: Improving Agricultural Data for Better Policies. J. Dev. Stud. **2015**, 51, 133–148.
17. Gourlay, S.; Kilic, T.; Lobell, D.B. A new spin on an old debate: Errors in farmer-reported production and their implications for inverse scale-productivity relationship in Uganda. J. Dev. Econ. **2019**, 141, 102376.
18. FAO. Handbook on Crop Statistics: Improving Methods for Measuring Crop Area, Production and Yield; FAO: Rome, Italy, 2018.
19. Fermont, A.; Benson, T. Estimating Yield of Food Crops Grown by Smallholder Farmers: A Review in the Uganda Context; International Food Policy Research Institute: Washington, DC, USA, 2011.
20. Yeh, C.; Perez, A.; Driscoll, A.; Azzari, G.; Tang, Z.; Lobell, D.; Ermon, S.; Burke, M. Using publicly available satellite imagery and deep learning to understand economic well-being in Africa. Nat. Commun. **2020**, 11, 2583.
21. Lobell, D.B.; Di Tommaso, S.; You, C.; Djima, I.Y.; Burke, M.; Kilic, T. Sight for sorghums: Comparisons of satellite- and ground-based sorghum yield estimates in Mali. Remote Sens. **2020**, 12, 100.
22. Campolo, J.; Güereña, D.; Maharjan, S.; Lobell, D.B. Evaluation of soil-dependent crop yield outcomes in Nepal using ground and satellite-based approaches. Field Crops Res. **2021**, 260, 107987.
23. Ayush, K.; Uzkent, B.; Tanmay, K.; Burke, M.; Lobell, D.; Ermon, S. Efficient Poverty Mapping from High Resolution Remote Sensing Images. Proc. AAAI Conf. Artif. Intell. **2021**, 35, 12–20.
24. Scipal, K.; Holmes, T.; De Jeu, R.; Naeimi, V.; Wagner, W. A possible solution for the problem of estimating the error structure of global soil moisture data sets. Geophys. Res. Lett. **2008**, 35, 2–5.
25. Stoffelen, A. Toward the true near-surface wind speed: Error modeling and calibration using triple collocation. J. Geophys. Res. Oceans **1998**, 103, 7755–7766.
26. Abay, K.A.; Bevis, L.E.M.; Barrett, C.B. Measurement Error Mechanisms Matter: Agricultural Intensification with Farmer Misperceptions and Misreporting. Am. J. Agric. Econ. **2020**, 103, 498–522.

**Figure 1.** Noise in ground-based measures can lead to low correlations between independent measures in the same location. Gray lines show estimates of the correlation between two independent measures, with one line for each separate dataset, and the black line shows the mean value across datasets. Individual studies and sources for crop yields are presented in Table S1. For household assets, we used data from DHS for 23 countries, with 1–4 years per country. For consumption estimates, we used data from LSMS for three countries, with two years per country. The mean number of fields for crop-cut studies was 145. The mean number of clusters for asset surveys was 462, with an average of 25 households per cluster. The mean number of clusters for consumption surveys was 161, with a mean of 10 households per cluster.

**Figure 2.** Simulated performance of Equation (14) to correct the measure of satellite performance for noise in ground data. (**left**) Training $r^{2}$ (gray points) and corrected $r^{2}$ (blue points, using Equation (14)) for a linear regression model that predicts ground-measured yields with satellite-measured yields for 1000 fields, plotted against the “true” $r^{2}$ between satellite-measured and true yields. True yields and the noisy ground- and satellite-based measures of yields were simulated from Equations (4)–(8), (11) and (12), with each point representing a single set of simulation parameters, and simulations then repeated for different combinations of parameter values to span a wide range of $r^{2}$. Lines show a locally-weighted polynomial (LOWESS) fit to the points. Parameters ranged from 1 to 3 t/ha for $\sigma_{Y}$, and 0.5 to 3 t/ha for $\sigma_{\epsilon_{S}}$, $\sigma_{\epsilon_{G1}}$, and $\sigma_{\epsilon_{G2}}$. The value of $\mu$ was fixed at 5 t/ha, since varying this value does not affect the resulting $r^{2}$. Overall, training $r^{2}$ underestimates the true $r^{2}$, whereas Equation (14) results in an unbiased estimate of the true $r^{2}$. (**right**) The difference between the true and corrected $r^{2}$ (from Equation (14)) plotted against the number of fields simulated. The mean difference (i.e., bias) remains small when a small number of fields are simulated, but the median absolute error rises sharply when fewer than 100 fields are measured.

**Figure 3.** Naïve $r^{2}$ understates true performance for satellite crop yield estimation. Comparison of $r^{2}$ between satellite-based yields and two different ground-based measures of yields, as well as a corrected $r^{2}$ using Equation (14). The confidence intervals (gray bars) were calculated by resampling the fields and recalculating the different $r^{2}$ values. The bottom panel indicates the location, crop, and type of ground measures used. Uganda data are from [11], Mali data from [21], and Nepal data from [22].

**Figure 4.** Naïve $r^{2}$ understates true performance for satellite household consumption estimation. Comparison of $r^{2}$ between satellite-based consumption expenditure estimates and household surveys for six country-year combinations. Naïve $r^{2}$ shows estimates when directly comparing satellite estimates to the average value for all or half of households in each cluster, while corrected $r^{2}$ uses Equation (14) with $G_{1}$ and $G_{2}$ each based on half of the households in each cluster. The confidence intervals (gray bars) were calculated by resampling the clusters and recalculating the different $r^{2}$ values.

**Figure 5.** Satellite measures of in-field heterogeneity can substitute for a second crop-cut. Comparison of naïve $r^{2}$ for a regression of satellite-based yields on crop-cut yields ($G_{1}$), the corrected $r^{2}$ using self-report yields and Equation (14) as in Figure 3 (method 1), and corrected $r^{2}$ using satellite-based estimates of the correlation between two crop-cuts and Equation (15) (method 2). The confidence intervals (gray bars) were calculated by resampling the fields and recalculating the different $r^{2}$ values.


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Lobell, D.B.; Di Tommaso, S.; Burke, M.; Kilic, T.
Twice Is Nice: The Benefits of Two Ground Measures for Evaluating the Accuracy of Satellite-Based Sustainability Estimates. *Remote Sens.* **2021**, *13*, 3160.
https://doi.org/10.3390/rs13163160
