Monitoring Maize Yield Variability over Space and Time with Unsupervised Satellite Imagery Features

Cullen Molitor; Juliet Cohen; Grace Lewin; Steven Cognac; Protensia Hadunka; Jonathan Proctor; Tamma Carleton

doi:10.3390/rs17213641

,

and

¹

The Center for Effective Global Action, University of California, Berkeley, CA 94720, USA

²

Bren School of Environmental Science & Management, University of California, Santa Barbara, CA 93106, USA

³

Ecology, Evolution and Marine Biology, University of California, Santa Barbara, CA 93106, USA

⁴

Department of Agricultural and Consumer Economics, University of Illinois Urbana-Champaign, Champaign, IL 61820, USA

Remote Sens.2025, 17(21), 3641;https://doi.org/10.3390/rs17213641

This article belongs to the Special Issue Crop Yield Prediction Using Remote Sensing Techniques

Version Notes

Order Reprints

Review Reports

Highlights

What are the main findings?

Simple, computationally efficient machine learning methods can reliably predict agricultural outcomes in data-scarce environments using publicly available imagery.
Task-agnostic random convolutional features (RCFs) from satellite imagery achieve an out-of-sample $R^{2}$ of 0.83 in predicting maize yields across space and time in Zambia (2016–2021), outperforming traditional NDVI-based approaches.
RCF models achieve strong performance, predicting exclusively temporal variation in yields ( $R^{2}$ = 0.74) when explicitly trained on temporal anomalies and substantially outperforming NDVI-based approaches ( $R^{2}$ = 0.39).

What are the implications of the main findings?

Effective crop-monitoring systems are feasible without expensive proprietary data or computationally intensive deep learning methods.
Imagery-based monitoring can accurately detect temporal yield anomalies, enabling governments and agencies to target interventions like the expansion or release of government grain stocks, food aid, and agricultural insurance payouts.
Task-agnostic satellite features offer a scalable, multipurpose alternative to traditional vegetation indices, as the same RCF features can be reused across different prediction tasks (income, forest cover, water availability) without customization.

Abstract

Recent innovations in task-agnostic imagery featurization have lowered the computational costs of using machine learning to predict ground conditions from satellite imagery. These methods hold particular promise for the development of imagery-based monitoring systems in low-income regions, where data and computational resources can be limited. However, these relatively simple prediction pipelines have not been evaluated in developing-country contexts over time, limiting our understanding of their performance in practice. Here, we compute task-agnostic random convolutional features from satellite imagery and use linear ridge regression models to predict maize yields over space and time in Zambia, a country prone to severe droughts and crop failure. Leveraging Landsat and Sentinel 2 satellite constellations, in combination with district-level yield data, our model explains 83% of the out-of-sample maize yield variation from 2016 to 2021, slightly outperforming a model trained on Normalized Difference Vegetation Index (NDVI) features, a common remote sensing approach used by practitioners to monitor crop health. Our approach maintains an

R^{2}

score of 0.74 when predicting temporal variation alone, while the performance of the NDVI-based approach drops to an

R^{2}

of 0.39. Our findings imply that this task-agnostic featurization can be used to predict spatial and temporal variation in agricultural outcomes, even in contexts with limited ground truth data. More broadly, these results point to imagery-based monitoring as a promising tool for assisting agricultural planning and food security, even in contexts where computationally expensive methodologies remain out of reach.

Keywords:

maize; yield prediction; Landsat; Sentinel; MOSAIKS; Zambia

1. Introduction

Satellite imagery holds substantial promise for addressing data gaps in regions lacking reliable agricultural data []. Despite this potential, the vast majority of research on the remote sensing of agricultural productivity is conducted in the developed world [,]. In sub-Saharan Africa (SSA), where ground-based data collection can be challenging, costly, and time-intensive, satellite imagery offers a scalable and cost-effective alternative [,]. For satellite imagery to be useful in practice, however, tools and methodologies must be accessible to practitioners in these regions []. This requires the development of computationally efficient, yet high-performing, pipelines that can be implemented with limited resources.

By expanding measurements beyond resource-extensive survey campaigns, imagery-based monitoring can transform how agricultural systems are tracked and managed []. In most parts of Africa, such innovations have the potential to generate enormous social and economic benefits. Agriculture in SSA constitutes, on average, 17% of GDP while employing 50% of the population []. Despite its importance, agricultural yields remain highly variable over space and time. For instance, 95% of cultivated fields are rainfed [], resulting in yields that are both lower and more volatile than those achievable with irrigation [,,]. Given that over half of the calories consumed in Africa come from domestically produced staples (grains, roots, and tubers), monitoring these fluctuations in productivity is essential for ensuring food security and economic stability. The accurate monitoring of maize yields in Zambia, our study context, is of particular value for policy planning and risk management, given that maize serves as the primary staple crop and a key determinant of national livelihoods, household nutrition, and food security. To meet this need, a variety of vegetation indices have been developed to monitor crop performance using satellite imagery. Among these, the Normalized Difference Vegetation Index (NDVI) remains the most widely used [], and it has consistently proven to be a reliable predictor of crop yields [,,,,,]. However, NDVI and similar indices capture only a single type of information from the rich, multi-dimensional data contained in satellite imagery. As a result, they may fail to represent the full complexity of agricultural landscapes or to generalize across regions and time periods with differing production practices and environmental conditions.

In recent years, convolutional neural networks (CNNs) have become a powerful tool for predicting crop yields from satellite imagery [,,]. By automatically learning to extract textural and spectral patterns from raw imagery, CNNs can capture relationships that simpler indices like NDVI cannot, improving accuracy and robustness in yield predictions []. However, their adoption in low-income regions remains limited because they are computationally intensive, require large training datasets, and offer limited interpretability for policymakers []. Consequently, the benefits of deep learning have been concentrated in well-resourced contexts. Moreover, neither index-based methods like NDVI nor deep learning methods have been used to decompose performance predicting African yields across space and time, leaving it unclear whether satellite imagery can reliably identify localities and seasons at risk of crop loss.

In this paper, we apply a task-agnostic satellite-embedding method to assess the feasibility of spatiotemporal agricultural monitoring using low-cost computational tools in the SSA context. Specifically, we adapt the Multi-Task Observation using Satellite Imagery and Kitchen Sinks (MOSAIKS) [,,] approach to predict spatial and, independently, temporal variation in maize yield across Zambia. MOSAIKS employs a simple architecture to produce random convolutional features, which are not tailored to any specific task. The simplicity of the architecture means that MOSAIKS is faster and cheaper than more complex CNNs, such as those used in deep learning models, while maintaining competitive performance through its embedding of rich textural and spectral information [,]. While generating MOSAIKS features initially requires moderate computational resources and image-processing expertise, once features have been created, they can be used for a variety of models and tasks with minimal additional computation and without needing to access the raw imagery. Until now, MOSAIKS has only been evaluated in the U.S. or using harmonized and pre-processed global-scale datasets, and its use has been limited to applications featurizing highly processed and cloud-free raster data. Here, for the first time, we use MOSAIKS to predict variations over time, demonstrating how an accessible and lightweight computational pipeline can effectively monitor crop production in data-limited regions.

Our findings indicate that the MOSAIKS embedding performs well in predicting maize yields, which is of critical importance to Zambia and other low-income, agriculturally dependent populations. We specifically focus on the problem of explaining temporal variation in yield, given how critical temporal prediction is for identifying the risk of food insecurity, distributing insurance payouts, and targeting aid. Despite this, few prior studies have isolated a remote sensing model’s performance over time, and those that do tend to find that predicting temporal variation is more difficult than predicting spatial variation alone []. In producing a reliable model of temporal variation in maize yields, this study can, for example, inform the development of emerging satellite-based agricultural insurance programs, which largely rely on NDVI (e.g., for livestock and for crops; []).

This work carries important implications for enhancing the accessibility and equity of emerging imagery-based machine learning research. New technologies combining imagery with machine learning tend to be too costly to leverage for many less-resourced researchers and are often inaccessible to the practitioners most in need of the tools. We hope that our task-agnostic featurization can unlock imagery-based management and monitoring solutions in a way that empowers practitioners as not only consumers but also producers of imagery-based datasets. By developing accessible and low-cost tools, we aim to democratize the use of satellite imagery, enabling practitioners in low-income regions to actively participate in and benefit from these technologies. To this end, all of our code, input data, output data, and corresponding training tutorials are made publicly available.

The paper is organized as follows. First, we describe the data collected across our study context of Zambia. Then, we detail our methodology, including the processing of satellite imagery and the training of machine learning models to predict administrative data on yields over space and time. We then present results and end with a discussion of the relevance of our findings.

2. Materials and Methods

2.1. Study Context

Our study context is Zambia, a landlocked nation situated in the lower central part of sub-Saharan Africa (Figure 1), where agriculture employs 55% of the population []. Zambia is divided into ten provinces, each of which is further divided into districts. As of the 2010 national census, Zambia had 72 districts. The number of districts has since increased to 116 by 2019 with redistricting []. Our study leverages in situ maize data, which we assemble at the level of the initial 72 districts to maintain consistency over time with the 2010 national census sampling frame.

Figure 1. The country of Zambia is shown with 10 provinces (thick borders) and the 72 district divisions (thin borders) used in analysis. The Zambian cropland percentage is shown as a background raster. Cropland data from [], using the 2019 raster layer at 30 m resolution and aggregated to 0.01 degree resolution. For a full list of districts and relevant statistics, please see Supplementary Table S1. The inset map shows Zambia’s location within Africa (red box).

Maize is the single most important staple crop in Zambia, providing more than half of the country’s daily calorie intake []. The majority of farming households grow maize as their primary crop, with many relying on sales from their harvest []. As much as two-thirds of maize production in Zambia comes from rainfed smallholder farms []. Going back as far as the 1960s, maize production has been heavily subsidized by the Zambian government [].

The reliance on rainfed, rather than irrigated, agriculture has left maize production vulnerable to droughts and led to substantial volatility in maize output []. Within our study period, Zambia recorded droughts during the following agricultural seasons: 2015/16 and 2018/19 []. The 2018/19 agricultural year saw the lowest yields among all the drought years because that year also coincided with pest infestations from fall armyworms [].

2.2. Data

2.2.1. Crop Forecast Survey (CFS)

Our primary objective is to learn the relationship between satellite imagery and maize yields over space and time. To do so, we leverage the Zambia Crop Forecast Survey (CFS), the best available data on maize yields that is comprehensive across the country and available consistently over time. The CFS is a nationally representative, pooled, cross-sectional dataset that surveys a large number of farming households across all districts in Zambia. The CFS is conducted by the Ministry of Agriculture in conjunction with the Zambia Statistics Agency (ZamStats). The CFS is collected primarily to forecast the upcoming harvest for the agricultural season.

In the CFS, an agricultural season runs from 1 October to 30 September the following year. The surveys are conducted between March and April when most of the maize (and other crops) have reached physiological maturity, and thus, the area planted and inputs used are a good estimate for the harvest. The maize harvest begins in May and runs through July or August in certain areas (Figure 2). Zambia has a sub-tropical climate with three distinct seasons: a hot and dry season from mid-August to mid-November, a wet rainy season from mid-November to April, and a cool dry season from May to mid-August []. The rainy season overlaps with the majority of the growing season (Figure 2).

Figure 2. Seasonal timing of agricultural activity, precipitation, crop surveying, and satellite imagery collection in Zambia. The blue bars show monthly mean precipitation, averaged over Zambia from October 2008 to December 2020. The precipitation data is from the ERA5 reanalysis, which has 0.25-degree resolution. The black horizontal bars show the average timing of key agricultural activities (sowing, growing, and harvest), as well as survey data collection (denoted CFS for Crop Forecast Survey). The black dotted line denotes an extended harvest into August, which occurs in some Zambian districts in some years. Satellite imagery is available over the entire year. In our analysis, we investigate using the full monthly range of images, many of which are of low quality due to cloud cover, as well as a limited month range that includes harvest time and less cloud cover.

We use maize yield records from the CFS summarized to the district level (due to privacy constraints, we could not access geolocations for farmers below the district level) over 14 agricultural seasons from 2008/09 to 2021/2022 (hereafter referred to as harvest years 2009–2022), represented in metric tons per hectare (mt/ha). With 72 districts over 14 seasons, we have 1008 available observations in our sample. We use subsets of these data based on the overlap with available satellite imagery.

2.2.2. Cropland Data

We use satellite imagery to predict district-scale crop yields. However, only 5% of Zambia’s area is estimated to contain cropland []. Therefore, in order to focus the model’s attention on imagery located in agricultural areas, we leverage high-resolution estimates of where crops are grown across Zambia from []. Specifically, we aggregate binary indicators of cropland at 30 m resolution from [] to the scale of our 0.01 degree resolution grid employed for image sampling (see Section 2.3.1 below) using bilinear interpolation to compute the percentage of any grid cell that is covered in cropland. We then leverage these data in two alternative ways, evaluating model performance under both. First, we use these data as a spatial mask for cropland areas, sampling points uniformly at random from any grid cell indicated to have nonzero cropland. Second, we sample uniformly at random from all locations nationally, but we use the cropland percentages as weights when aggregating from grid-cell-level imagery to district-level image-feature aggregates. While these two uses of cropland data in our imagery-based pipeline should improve predictive performance by focusing attention on the images where crops are actually grown, it is possible that imperfections in this cropland data—or useful information from non-cropped areas—could lead to this not being the case. Therefore, through our model selection process, we evaluate whether and under what conditions these cropland-extent data improve performance, and we use them only when they are beneficial.

2.2.3. Satellite Imagery

We use imagery from the Copernicus Sentinel-2 missions A and B and the Landsat missions 5, 7, and 8, accessed via the Microsoft Planetary Computer (MPC) Hub (Table 1).

Table 1. The spectral bands of Sentinel-2 MSI, Landsat 5 ETM, Landsat ETM+, and Landsat 8 OLI instruments used in the analysis. Spectral ranges are presented in micrometers (

μ

m). Band abbreviations: R = Red, G = Green, B = Blue, NIR = Near-infrared, SWIR_1.6 = Short-Wave Infrared 1.6

μ

m, SWIR_2.2 = Short-Wave Infrared 2.2

μ

m, CA = Coastal aerosol. “–” values represent a band that is not available or was not considered in the analysis. Overlap with the crop forecast survey (CFS) data to show the usable year range of each satellite, given the scope of our analysis.

We use data products from Sentinel-2 multispectral instruments (MSIs) processed to level 2A (bottom-of-atmosphere) using Sen2Cor from the European Space Agency and focusing on imagery from 2016 to 2022 across four spectral bands at a 10 m ground-sample distance (GSD). These twin satellites operate in a sun-synchronous orbit with a 10-day repeat cycle, phased at 180 degrees to each other and providing a combined revisit time of approximately five days []. This revisit time leads to a high chance of obtaining low-cloud-cover images during the rainy season.

The Landsat data are from the Collection 2 Level-2 Science Products from the thematic mapper (TM), enhanced thematic mapper plus (ETM+), and operational land imager (OLI) instruments. All bands used in this analysis across all 3 instruments have a resolution of 30 m GSD. Owing to the 2003 scan line corrector (SLC) error on Landsat 7, we divide our Landsat analysis into two subsets: one including all three Landsat satellites and one with only Landsat 8 []. We corrected Landsat 7 images by imputing missing pixel values through a simple nearest-neighbor approach, resulting in complete but lower-quality images compared to Landsat 5 or 8. The full Landsat collection allows for our longest time series, from 2009 to 2021. Relying only on Landsat 8 for imagery provides higher-quality images but restricts our time range to 2013 to 2021, and relying on one satellite reduces the opportunity for low-cloud-cover images during the rainy season.

2.2.4. Normalized Difference Vegetation Index (NDVI)

The Normalized Difference Vegetation Index (NDVI) is an imagery-derived index that provides a measurement of vegetation health and density, leveraging the differential absorption and reflection properties of vegetation in the red and near-infrared bands of the electromagnetic spectrum. NDVI data is commonly used to assess food security risk. For example, the Famine Early Warning Systems Network (FEWS Net) uses NDVI to monitor drought risk, allowing for more effective agricultural management []. NDVI is commonly used for crop yield estimation and forecasting [,,,,,], pest and disease monitoring [,], and forecasting agricultural responses to climatic changes [,].

Due to its relevance in agricultural modeling, we use NDVI to set a benchmark with which we compare MOSAIKS performance. The NDVI data come from the Terra Moderate Resolution Imaging Spectroradiometer (MODIS) Vegetation Indices Monthly product (MOD13C2 Version 6.1), which provides an NDVI value at a per-pixel basis over the MODIS climate modeling grid at a 0.05-degree resolution []. We subset these global data to Zambia between 2016 and 2021, the timespan that best matches our imagery feature data. We extract the mean NDVI value for the grid cells that overlap each Zambian district for each month and each year. MOD13C2 data are cloud-free spatial composites, making the quality more reliable than raw satellite imagery.

2.2.5. Temperature and Precipitation

To capture climatic variability across Zambia, we incorporate monthly temperature and precipitation data from the Climatic Research Unit (CRU) Time Series version 4.07 dataset []. The CRU TS dataset provides high-resolution gridded climate variables derived from quality-controlled weather station observations, interpolated to a 0.5 spatial resolution and available globally from 1901 to the present. It is among the most widely used observational climate datasets for environmental and agricultural research due to its long temporal coverage, spatial consistency, and extensive validation.

We subset the CRU TS v4.07 data to the period corresponding to our yield analysis and extract monthly total precipitation (mm) and mean near-surface air temperature (C) for Zambia. These gridded data are spatially aggregated to the district level by taking the mean value across all grid cells overlapping each district boundary. This district-level summarization aligns the climate data spatially with our maize yield observations and imagery features, enabling a direct comparison and integration into our modeling framework.

2.3. Methods

We design a modeling pipeline to predict spatial and temporal variation in maize yield across Zambia using satellite imagery. We separate model training into two branches, motivated by the fact that predictive performance over time tends to be more difficult than performance over space [,], which has distinct practical applications (e.g., targeting locations of low yield versus identifying crop failure in a given year). Our first modeling branch aims to predict all variations in maize yields contained in the ground-truth data, pooling observations over space and time. The second branch focuses only on explaining variations in maize yields over time in order to appropriately evaluate the performance that is most relevant to policies that are triggered by seasons of particularly low yield in specific locations. We use the MOSAIKS framework to accomplish both goals []. MOSAIKS requires several steps, which can be summarized as follows: computing random convolutional features (RCFs) from satellite imagery; spatially joining those features to ground truth data; and estimating a linear ridge regression.

We expand the MOSAIKS framework in several novel ways. First, we use public satellite imagery data to make this tool accessible to a wider user base, whereas existing MOSAIKS applications rely on private imagery [,,]. The imagery sources we use additionally allow us to increase the spectral range used in convolutions. Next, our features are time-varying, rather than static, allowing us to evaluate their use in temporally varying tasks. Finally, we build and test our models in an area that suffers from data sparsity—crop yield estimates are vital for monitoring socio-economic wellbeing in sub-Saharan Africa, but these data tend to be limited in scale and scope. Prior applications of MOSAIKS have been conducted in relatively data-rich environments like the United States [] and India [].

Throughout the model evaluation process, we test how a key set of modeling decisions affects overall performance in predicting maize yields (Table S1). Specifically, we build a set of candidate models based on these decisions, such as which satellite sensors to include and whether to apply a cropland mask. We then choose the model with the best out-of-sample fit evaluated via 5-fold cross-validation (CV). The CV process is repeated using 10 random train/test data splits to assess model variability. Our first model is chosen based on its ability to predict variation in yields over space and time. Our second model is selected based on its ability to predict variation over time alone. In the next subsections, we outline each step in our modeling pipeline.

2.3.1. Standardized Grid for Location Sampling

We begin by initializing a 0.01-degree-by-0.01-degree grid over Zambia. This grid permits systematic sampling of image patches from satellite scenes. For computational efficiency, we sample from this grid before computing image features and aggregating them to the district level for model training. We sample in two ways. First, we conduct a uniform sample that selects every nth grid cell along each row and column following a checkerboard pattern. Second, we conduct a targeted cropland sample that selects grid cells with the highest cropland percentage within each district. To implement this, we calculate the cropland percentage per grid cell and retain the top 10% of grid cells in each district, resulting in 19,598 grid cells across Zambia.

2.3.2. Selecting Imagery in Sampled Locations

For selected grid cells, we search the Microsoft Planetary Computer (MPC) data catalog for overlapping imagery per grid cell, month, and year (Figure 3a). However, cloud cover, especially during Zambia’s rainy season, presents a challenge. We filter the imagery to be less than 10% per scene and select the least cloudy image. The Sentinel-2 constellation has a higher revisit rate (5 days) than the Landsat constellation (8–16 days) and, therefore, is more likely to have low cloud-cover imagery at any given location []. Any time step that does not meet the cloud threshold leaves a gap that is later imputed, as described below.

Figure 3. Methodology overview showing (a) the creation of tabular data from satellite imagery and crop forecast survey data, (b) model selection, (c) experiment 1, selecting for overall performance, and (d) experiment 2, selecting for over-time performance. Detailed explanations of the choices made in panel (b) are shown in the Supplementary Methods Section.

Lastly, we utilized imagery with several band combinations. For Sentinel-2, we evaluated the performance of both the visual spectrum alone (RGB) and the visual spectrum with near infrared (NIR). For Landsat 8 and Landsat Collection imagery, we used all bands available at 30 m GSD (Table 1).

2.3.3. Feature Extraction and Processing

The feature extraction process begins by cropping the relevant grid-cell area from the satellite imagery (Figure 4a). We then normalize the pixel values of the cropped image to a range between 0 and 1 using min–max normalization. Following this, a PyTorch (Version 2.0.1) model generates random convolution features (RCFs). This method extracts rich information related to both color and texture from the satellite images.

Figure 4. Comparative examples of two randomly selected locations in two districts with varying maize yields. Row 1 represents Monze, an area with historically lower average maize yields, and row 2 represents Choma, with high average maize yields. Panel (a) displays the true-color satellite imagery (Sentinel 2) for both locations, panel (b) shows random convolutional feature (RCF) activation maps for four selected random features, and panel (c) presents the Normalized Difference Vegetation Index (NDVI; computed from Sentinel 2). Each row contrasts the conditions during two different harvest years: 2019, marked by generally poor yields across most districts, and 2021, noted for an overall good harvest. Yields shown in metric tons per hectare (mt/ha) are denoted for each subfigure, illustrating the substantial variation between the two years and across locations.

The PyTorch model is initialized with weights drawn randomly from a normal distribution (mean = 0, SD = 1) and a bias set to −1. It uses a convolutional layer with a 3-pixel-by-3-pixel kernel, a stride of 1, no padding, and a dilation rate of 1. The number of color channels corresponds to the number of spectral bands; several combinations are tested (see Supplementary Table S3). The model’s forward pass operates through two parallel paths. In the first path, the rectified linear unit (ReLU) activation function is applied to the convolution output to create a feature map (Figure 4b). In the second path, the model negates the convolution output before applying the ReLU activation function to produce a negative feature map. Subsequently, 2D average pooling is applied in both paths. This pooling layer reduces the spatial dimensions (i.e., height and width) of each feature map to a single value per map. These values are then concatenated to form a feature vector, with its length equivalent to the specified number of RCFs (equivalently, filters), which, in our case, is 1000.

2.3.4. Feature Imputation

Processing all images results in an n-by-k matrix, where n represents the number of selected locations and k the number of features. The cloud cover threshold creates temporal gaps in our feature matrix, necessitating imputation. A two-pass process fills these gaps, initially averaging feature values across the grid cells within a district in the same year and month, followed by averaging within the district and month but across years, when the former is not possible due to substantial cloudiness across a district. Any grid cells with missing feature values after these steps are dropped.

2.3.5. Feature Summarization

Post-processing requires spatially and temporally aligning the feature matrix (with additional attributes for longitude, latitude, year, and month) to annual, district-level maize-yield data. To align data temporally, we match monthly feature values to the maize season’s span. For example, a feature vector for a grid cell for the 2017 growing season contains image-feature values from October of 2016 through September of 2017, following the growing season shown in Figure 2.

To align data spatially, we average grid cell features within each district’s boundaries. We implement two forms of averaging: a simple unweighted mean and a cropland-weighted mean; weights are defined as the cropland percentage in each grid cell. We later evaluate the predictive performance of each aggregation scheme.

2.3.6. Model Specification and Tuning

Following prior work (e.g., [,]), we use ridge regression with 5-fold cross-validation (CV) to establish a relationship between imagery features and maize yield. Specifically, the dataset is divided into a randomly sampled training and validation set (80%) and a test set (20%). Within the training and validation set, and for each model in the large set of models described below, we conduct 5-fold cross-validation to select the best-performing set of regularization parameters. Because our models contain multiple feature sets for distinct sensors, as well as district dummy variables, and, in some cases, NDVI and climate variables, we use a grid search method to identify a vector of regularization parameters (one for each category of variables) that best predict out-of-sample values within the training set. This grid search begins with an initial multi-dimensional grid of regularization parameters taking the values of 0.1, 1, and 10 and expands iteratively if any edges of the grid are selected as the best-performing.

Model performance is gauged by combining predicted and observed values from the validation data across the five folds using the top-performing regularization parameters. The coefficient of determination (

R^{2}

) serves as our primary metric, while the squared Pearson’s correlation coefficient (

r^{2}

) is also evaluated (note that these two metrics can differ when evaluated on held-out test data, unlike in standard ordinary least squares regression evaluated in-sample). After selecting the model with the highest validation

R^{2}

, we then assess and report its performance on the independent test set. Model selection is always performed on the validation set and always with

R^{2}

as the measure of performance.

2.3.7. Model Selection

We aim to construct an imagery-based predictive model that best reflects true variation in maize yields. To do so, we systematically evaluate a set of modeling parameters to identify the best-performing combination. Given the number of parameters we can evaluate, we execute this model-selection process in two steps in order to reduce computational requirements (Figure 3b).

In the first step, we apply our linear modeling scheme to a full grid search of possible parameter combinations with a single random train/test (80/20) split of the data. Model parameters include a choice of sensor(s), the range of months within the year to include imagery for, and various feature engineering decisions; these are detailed in Table S3. The full grid search includes 1892 candidate models. The large number of unique parameter combinations, when combined with the regularization parameter grid search, results in several days of computation time on a high-performance computer cluster.

We use the results of step one to restrict modeling options to a smaller set, and we use the 10 random train/test splits to capture variability. Specifically, we gather performance metrics for each iteration of variable inputs (such as satellite, month range, etc.) and feature engineering options (like weighted average and crop masking). We then calculate the distribution of validation scores and use these to narrow down input parameters by making decisions concerning which ones to test further and which to hold static. The reduced list of parameter combinations includes 114 models, all listed in Supplementary Table S1. The mean of the validation

R^{2}

scores from the 10 random train/test splits is used to select the best-performing model.

2.3.8. Model Evaluation

For the reporting of the final results, our top model parameters are then evaluated on the 10 held-out test sets, resulting in 10 test scores. Given that we have a range of scores, we also calculate the standard error of the mean to gain an understanding of the stability of our results. We report a confidence interval that is simply a 2-standard-error margin (SEM) around the mean.

2.3.9. Experiments Overview

Our key aim is to develop a predictive model from satellite imagery to effectively monitor maize yields across space and time in Zambia. We used the modeling procedure described above to conduct three experiments, each of which elucidates different opportunities and challenges of using imagery to monitor crop yields. Because predicting variation over time is less studied, appears to be more difficult in other applications [], and is critical for identifying when to intervene in food-insecure regions, we focus our attention on the ability of our model to explain temporal variation.

2.3.10. Predicting Maize Yields over Space and Time

In this first experiment, we use the modeling procedure described above to predict spatial and temporal variation present in the ground-truth data obtained from the Crop Forecast Survey across 72 districts and 6 agricultural seasons. Specifically, we estimate ridge regression models in the validation set by regressing the logarithm of maize yields on standardized random convolutional imagery features (RCFs). We apply our two-part model selection scheme as described above. Performance in this experiment indicates how well imagery can predict both spatial (i.e., across-district) and temporal (i.e., across years) variation in maize yields across Zambia (Figure 3c).

2.3.11. Isolating Predictive Performance over Time

In our second experiment, we evaluate the ability of the best-performing model to explain variation only over time. To do so, after selecting our top model, we demean the out-of-sample predictions and the corresponding observed values by their district averages. This gives us temporal anomalies in predicted and observed yields, isolating the temporal component from the overall variation in yield. We then average the test scores over the ten random test splits, and we calculate performance confidence intervals using these same ten test splits.

2.3.12. Model Customization for Temporal Prediction

In our final experiment, we assess the returns to customizing model training procedures with the explicit goal of maximizing performance on temporal variation alone. In many policy settings, average differences across space are well known, but variations over time are more difficult to monitor quickly with traditional data collection methods. Because such measurement over time can be critical for timely and effective interventions, it is valuable to understand how well imagery-based models can perform at explaining temporal variation alone, when they are customized for such a task.

To enhance the model’s ability to predict variation over time, we calculate temporal anomalies of crop yield and imagery features before reapplying our entire model selection process. Anomalies are computed by subtracting the district mean from all yield and feature values (Figure 3d).

2.3.13. Establishing an NDVI Performance Benchmark

We process NDVI data identically to the processing of RCF features described above and proceed with model development using the same time period of study as in other experiments (2016–2021). NDVI data are subjected to the same experimental process for the three experiments described above, including the same linear modeling scheme and ten random train/test splits. The results from each NDVI experiment are used to set a baseline for comparison against RCF results throughout the manuscript (Figure 4c).

In addition to setting a benchmark with NDVI, we attempt to improve the performance of NDVI by adding temperature and precipitation as predictor variables. In the Discussion Section, we describe the benefits of combining RCF features with NDVI and climate data to improve model performance over space and time.

3. Results

The results are organized into three sections, corresponding to the three primary experiments. In each section, we compare the results of the RCF model to the NDVI model. We first show the predictive power of our model when considering all variation (over space and time). We then show that the predictive power of the model trained to predict variation over space and time is low when it is used to predict temporal variation alone. Finally, we show that explicitly training the model to explain temporal variation can markedly improve the results.

3.1. Maize Yield Predictions over Space and Time

In our first experiment, we applied the model selection schema and chose a model with the highest average validation score when explaining yield variation over space and time. We evaluate and report performance on the held-out test sets, which results in a mean score of 0.83 with a 2 SEM of 0.02. In comparison, NDVI produces an average test score of 0.8 with a 2 SEM of 0.01. This shows that, with the simple linear modeling framework of MOSAIKS, which leverages task-agnostic features extracted with no information on crop yields, one can achieve excellent predictive power, even with just 432 training observations. NDVI performance is also high, representing the value of a metric that has been developed and customized for predicting plant growth over many years in the rich remote sensing literature [,,,]. The additional benefit of MOSAIKS in this case is that the RCFs computed for this task can be used with no necessary adjustments for other predictive modeling tasks, such as income, forest cover, or water availability [].

Table 2 summarizes the model parameters that best predict overall variation in maize yields in our first experimental procedure. This optimal model combines monthly RCF vectors from the Landsat and Sentinel 2 satellite constellations between 2016 and 2021. These feature vectors are limited in time for the Landsat collection, but they use the full month range for Sentinel 2. This is intuitive, as the higher revisit rate of Sentinel 2 leads to a higher chance of low-cloud-cover images during the full year, resulting in fewer RCFs needing imputation. The cloud-cover limit, along with the lower revisit rate of Landsat, means we get higher-quality features using only images during the dry season, when cloud cover is more limited.

Table 2. Input variables chosen during model selection when selecting for out-of-sample performance over space and time.

Performance predicting maize yields is both high and relatively certain. Figure 5 shows that uncertainty, as measured with the standard error of the mean, is relatively low compared to performance, indicating consistent and stable results. We note that such measures of uncertainty are rarely reported in prior imagery-based modeling of agricultural outcomes because more complex deep learning architectures are so costly to train that repeating model training many times is infeasible.

Figure 5. Test set performance of (a) RCF model and (b) NDVI model, both trained on log-transformed maize yields. The vertical axis represents the model estimates averaged from 10 random data splits, while the horizontal axis displays observed values. Each data point signifies the annual crop yield of a single Zambian district in a single agricultural season. Points and histograms are colored by the year of the CFS. The coefficient of determination (

R^{2}

) and Pearson’s correlation coefficient squared (

r^{2}

) are provided with a 2-standard-error margin in parentheses. A reference line at 45 degrees is represented in black.

Figure 5 shows little evidence of mean-reverting measurement error [] or other systematic biases in the predictions (see Supplementary Figure S1 for a quantification of mean-reverting measurement error). For example, observations are colored by year, and errors appear well centered around the 45-degree line, even for individual years. Moreover, well-documented yield shocks are recovered well through both RCF and NDVI: the 2019 harvest suffered from a drought and fall armyworm pests, consequently resulting in low yields across the country []. This poor harvest is reflected in Figure 5 in both the observed data (x-axis) and the modeled data (y-axis). This suggests that the models perform well even in the context of large adverse shocks to agriculture, which is valuable, given the increased demand for monitoring in times of scarcity.

3.2. Temporal Performance of Spatiotemporal Models

Here, we isolate the ability of each imagery-based model to predict performance only over time. Without retraining any aspect of the models, here, we report performance demeaned by district, indicating the share of the variance over time that can be explained by models based on RCF and NDVI. This is an important experiment, given that many remote-sensing models are trained on only spatial variation, or on both spatial and temporal variation, but are then applied to fill in gaps over time without evaluating the ability of the model to predict changes over time [].

When evaluating the ability to predict temporal anomalies in yield, the RCF model achieves an average test score of 0.42 with a 2 SEM of 0.08. The NDVI model average test score is 0.33 with a 2 SEM of 0.1. The difference in performance is larger between RCF and NDVI than in the first experiment, showing a modest improvement over NDVI at predicting the variation over time. However, both models perform substantially worse in this experiment, highlighting the importance of evaluating model performance over different subsets of overall variation, particularly when downstream monitoring efforts rely on accurate metrics in one of the relevant dimensions.

Figure 6 reveals that model performance over time not only is relatively low but also exhibits structured biases. Specifically, predictions exhibit substantial mean-reversion, overestimating low-yield anomalies and underestimating high-yield anomalies. We quantify this mean reversion in Supplementary Figure S2 by reporting the slope of the coefficient from a linear regression of predictions on ground truth, where coefficients

λ

< 1 are indicative of mean reversion [,]. For the RCF model, we find that

λ

= 0.49 when evaluated on temporal variation only, while

λ

= 0.38 for the analogous NDVI model. This mean reversion is particularly problematic for low-yield observations, as policy-makers need accurate information on where to anticipate poor yield to better distribute resources. This is shown clearly for 2019, where low yields in most districts were systematically over-estimated in both predictive models. This low performance suggests that, if imagery-based models are to be useful in monitoring over time, they need to be developed to explicitly target temporal variation; we implement this in the final experiment.

Figure 6. Demeaned test set performance of (a) RCF model and (b) NDVI model, both trained on log-transformed maize yield. The vertical axis represents the model estimates demeaned and averaged from 10 random splits, while the horizontal axis displays demeaned observed values. Each data point signifies an annual anomaly in maize yield, relative to a district-specific mean for each Zambian district. Points and histograms are colored by the year of the CFS. The coefficient of determination (

R^{2}

) and Pearson’s correlation coefficient squared (

r^{2}

) are provided with a 2-standard-error margin in parentheses. A reference line at 45 degrees is represented in black.

3.3. Maize Yield Predictions Optimized for Temporal Performance

In this final experiment, we fully retrain imagery models, optimizing for predictive performance on temporal variation alone. Repeating the model selection process using temporal anomalies as training data revealed a new set of optimal tuning parameters (Table 3), highlighting the importance of re-evaluating all modeling decisions when training over time. After selecting the optimal model parameters using the validation sets, we evaluate performance on the held-out test sets. This results in a mean test score of 0.74 with a 2 SEM of 0.05 for the RCF model (Figure 7a). In comparison, NDVI produces an average test score of only 0.39 with a 2 SEM of 0.11 (Figure 7b). This highlights that, while RCF models can be retrained to exhibit high performance over time, NDVI is limited in its ability to isolate temporal variation, even when trained explicitly to do so.

Table 3. Input variables chosen during model selection when selecting for out-of-sample performance over time only.

Figure 7. Test-set performance of (a) RCF model and (b) NDVI model, both trained on maize-yield anomalies. The vertical axis represents the model estimates, averaged and demeaned from 10 random splits, while the horizontal axis displays demeaned observed values. Each data point signifies a year’s crop yield for a Zambian district, demeaned by district prior to modeling. Points and histograms are colored by the year of the CFS. The coefficient of determination (

R^{2}

) and Pearson’s correlation coefficient squared (

r^{2}

) are provided with a 2-standard-error margin in parentheses. A reference line at 45 degrees is represented in black.

Across all years, the root mean squared error (RMSE) of predictions is 0.0417 log points, or around 4.2 percent. Model performance is higher than average in the year with the lowest-yield anomaly, 2019 (RMSE of 0.0395), and lower than average in the year with the highest-yield anomaly, 2021 (RMSE 0.0478). This indicates that predictions using this approach are of the highest fidelity during years of large yield losses.

Table 3 shows that the full-month range is valuable when targeting temporal variability, and that Landsat 7 was detrimental to performance (while Landsat 7 was selected in the best spatio-temporal model in Table 2, only Landsat 8 is selected for the temporal model here in Table 3). One explanation for the change in sensors may be that limiting imagery to Landsat 8 eliminates the effects of the SLC failure, leading to higher-quality data, which may be important to picking up smaller-scale changes in yields over time.

The RCF model additionally shows relatively little evidence of a mean-reverting measurement error when trained on temporal variation and appears to accurately predict both extremely low- and extremely high-yield years (Supplementary Figure S3,

λ = 0.79

). In contrast, the NDVI model continues to exhibit a relatively large degree of mean reversion, with a substantial over-prediction of low-yield years in particular (

λ = 0.45

). This indicates that improved model specification can both increase model performance and reduce mean-reverting measurement error. These results also suggest that the many crop-monitoring tools that use NDVI to estimate risk of famine or food insecurity may be improved by incorporating RCF features [].

3.4. Enhancing RCF with NDVI

One valuable feature of the MOSAIKS imagery-based prediction approach is that concatenating features is straightforward; additional features can simply be added to the ridge regression step. Here, we estimate a set of models that concatenate RCF features with NDVI in addition to temperature, a variable known to be an important determinant of crop yield for rainfed crops in sub-Saharan Africa [,]. The results in Table 4 show that, when predicting overall spatial and temporal variation in maize yields (column 1), adding temperature and NDVI to the RCF features optimized for predicting overall variation slightly improves performance, raising

R^{2}

from 0.83 to 0.85. Adding temperature to NDVI alone raises its performance from

R^{2}

= 0.8 to

R^{2}

= 0.83, making it equivalent to RCF.

Table 4. Full-result comparison showing how results change with selected combinations of variables. RCF₁ = the main model specification, RCF₂ = the model selected for overtime variation, NDVI = the normalized difference vegetation index, T = temperature, and P = precipitation.

However, as shown above in the second experiment, training on the full variation does not lead to high performance over time. Column 2 shows that combining the RCF features with NDVI and temperature can improve performance relative to RCF alone, raising it from

R^{2}

= 0.42 to

R^{2}

= 0.47. When the full model is trained on temporal anomalies and optimized for performance over time, column 3 shows that, together, RCF, NDVI, and temperature achieve an out-of-sample

R^{2}

= 0.743, again slightly improving over RCF alone (

R^{2}

= 0.739) and substantially improving over NDVI alone (

R^{2}

= 0.39) when predicting temporal variation. The marginal increase in performance from adding NDVI and temperature features to the RCF model in this setting indicates that the RCF features already contain much of the information contained in the NDVI and temperature features.

4. Discussion

4.1. Advantages of Approach

Task-agnostic satellite-based embeddings (e.g., [,]) represent a powerful new means of monitoring ground conditions at substantially lower computational costs than task-specific image featurization methods, such as convolutional neural networks. This lower cost makes such methods attractive in regions where both computational resources and local training data are limited. And yet, most evaluations of generalizable satellite embeddings are conducted in data-rich environments and leverage nonpublic source imagery. Moreover, few focus explicitly on performance over time, limiting their policy relevance, given the pressing need to use timely satellite information to fill temporal data gaps between irregular economic, environmental, and agricultural ground data collection campaigns [].

This study demonstrates that a simple, task-agnostic representation of publicly available satellite imagery can provide a powerful and practical foundation for crop yield monitoring in data-scarce environments. Leveraging the MOSAIKS framework [], we show that maize yields across Zambia can be predicted with high accuracy over both space and time, using a model that is computationally efficient, transparent, and easily replicable. The approach not only outperforms conventional NDVI-based monitoring but also offers a scalable alternative to more complex deep-learning architectures that are often inaccessible to institutions in low-income settings.

Methodologically, this work advances the use of task-agnostic imagery by extending the MOSAIKS framework to a data-limited context and to a temporal prediction task. Unlike previous studies that relied on harmonized datasets, focused solely on spatial patterns, or pooled spatial and temporal variation in evaluations [,,], our analysis integrates publicly available satellite imagery to explicitly track yield variability within districts across multiple years. We also show that directly training models to capture temporal anomalies substantially improves their predictive power, a generalizable insight that could improve agricultural remote sensing models.

4.2. Model Sensitivity

As in all imagery-based prediction pipelines, key modeling decisions made in satellite image processing and image feature extraction can influence the performance of the downstream model. Here, we evaluate how three such decisions—the treatment of cloud cover, the density of imagery sampling, and the number of random convolutional features—impact model performance. In all sensitivity tests, we use Sentinel-2 imagery with red, blue, and green spectral bands, and evaluate out-of-sample performance.

4.2.1. Cloud Cover

When selecting a threshold for how much cloud cover is acceptable for an image to be used in the model, there is a tradeoff between image quality (less cloud cover) and image quantity (availability of cloud-free images). In our primary analysis, we use only imagery with less than 10% cloud cover. Testing model sensitivity to that choice, we find that reducing the threshold of cloud cover increases the quality of the imagery but substantially reduces the number of locations with suitable imagery, particularly in the rainiest months of the growing season (i.e., December–February) (Figure S4a). Model performance (in the full sample, analogous to the performance shown in Figure 5) is similar for all thresholds between 4% and our selected threshold of 10%, although performance falls at thresholds below 4% due to declining data availability. Thus, while our key findings related to model performance are robust to the precise cloud cover threshold employed, very restrictive threshold values lower performance because relatively few images are available during the growing-season months.

4.2.2. Image Sampling Density

A key design feature of our approach is its accessibility. To keep the computational demands of the implementation low, we sampled images from 10% of croplands, rather than all croplands, before aggregating imagery features to the district level for model training. Here, we evaluate the sensitivity of model performance to this sampling density in the full sample, analogous to the performance shown in Figure 5. We find that a higher sampling density improves the model performance but that performance plateaus after a density of 8% (Figure S5). Increasing the sampling density above 10% could potentially improve performance beyond what we report in the main text, although the marginal gains in performance above an 8% sampling density suggest that such gains would likely be small (and would incur additional computational costs).

4.2.3. Number of Features

Like the sampling density, the number of random convolutional features (RCFs) computed and used to predict maize yields affects both model performance and computational costs. In our main analysis, we use 1000 RCFs. Testing the sensitivity of model performance to the number of features used in the model, we see that model performance in the full sample increases with the number of features used (Figure S6). This makes sense because each additional feature captures distinct textural and spatial patterns. However, gains from additional features are minimal after approximately 300 features. Thus, it is unlikely that employing more than the 1000 features used in our model would augment the reported performance. Further, the finding that model performance is similar using only 300 features, and does not change with additional features, indicates that the model with 1000 features does not involve overfitting to the data once ridge regression regularization is applied.

4.2.4. Ablation Study

As detailed above, we selected each RCF model using a grid search of possible parameter combinations, including the following: the choice of sensor(s); the range of months within the year to include imagery for; the method for aggregating images across district boundaries; and various feature engineering decisions (see Table S3 for details). Here, we investigate the extent to which each key component of this grid search contributes to overall model performance using a simple ablation study. Specifically, Figure S1 plots the distribution of validation set

R^{2}

across all models run in our full grid search, systematically varying one key modeling decision in different columns of the figure. The results indicate large gains from combining Landsat and Sentinel sensors (median

R^{2}

rises from 0.62 with one sensor to 0.70 with two sensors), with particularly detectable benefits of using Landsat-8 (median

R^{2}

is 0.69 in models using Landsat-8, and 0.66 without). Interestingly, restricting months of the year or imagery over croplands appears to exert minimal influence over model performance.

4.3. Limitations and Future Work

Our analysis faces important limitations, which are particularly critical to consider if machine-learning-based predictions are to be used to guide decision making. These include a coarse handling of cloud cover; only imagery with excessive cloud cover was removed—in future work, synthetic-aperture radar data could be used to improve detection despite clouds, which are particularly prevalent during the rainy growing season. Second, for computational efficiency, we sample (≈10%) images from across district areas instead of densely sampling and featurizing all images that overlap each district’s administrative area. Though initial experiments suggest a minimal impact of increasing the sampling density on our results (Figure S5), modest model improvements could likely be realized with more imagery used as inputs. Third, we aggregate NDVI, climate indicators, and the MOSAIKS features to monthly values. This design choice ensures a like-for-like comparison across methods and allows features to capture information at different stages of the maize-growing season. However, the monthly NDVI average may not represent the optimal representation of crop phenology, and other methods may generate performance gains over the results shown here. Fourth, we evaluate out-of-sample performance using random splits of the data during cross-validation. This tests the ability of the model to estimate yields for specific district–year observations, which could, for example, need to be filled in because of missing or poor-quality ground survey data. We hope that future work explores model optimization and performance using RCF for explicit spatial extrapolation and temporal extrapolation tasks. Finally, as with any machine learning pipeline, our results rely heavily on the quantity and quality of the ground-truth data used to train the predictive model. This underscores that the appropriate use of imagery as an input into socioeconomic and environmental monitoring depends critically on continued investment in high-quality ground-truth data.

5. Conclusions

This study used ground-survey data from Zambian districts to train a satellite-based predictive model of maize yields between 2016 and 2021. With public Landsat and Sentinel-2 imagery as inputs, we have shown that a computationally affordable and task-agnostic machine learning model called MOSAIKS can explain 83% of out-of-sample maize-yield variation. This performance is comparable to that of similarly low-cost and widely used methods like NDVI. When trained and evaluated on temporal changes in maize yields, however, MOSAIKS strongly outperforms NDVI (

R^{2}

= 0.74 for MOSAIKS versus 0.39 for NDVI). These findings have important implications for the use of imagery-based prediction models in agriculturally dependent but data-scarce regions of the globe.

Reliable and timely crop-yield monitoring is valuable in low-income countries where agriculture underpins livelihoods and food security. Maize is particularly important to sub-Saharan Africa, where it is the dominant crop, making up one-third of calories. Like other countries in the region, Zambia uses early estimates of harvests to determine government grain purchases and releases, in an attempt to limit price spikes and resulting food insecurity. The early detection of yield anomalies enables governments and development agencies to target resources—such as input subsidies, extension services, and food aid—to the areas most in need. Accurate yield estimates can also strengthen agricultural insurance programs by improving payout precision and reducing risk. Better yield intelligence also supports private actors by informing production, storage, and market decisions, helping to build more resilient and efficient food systems overall.

A key contribution of this work is its accessibility. While input imagery is freely publicly available, we also make the task-agnostic imagery features we compute available so that other users can directly apply them to other tasks in the Zambian context (e.g., poverty prediction or land-use mapping). Resources permitting, similar MOSAIKS features could be calculated and distributed for the entirety of SSA over many years—an important aim of our future work motivated by these results. These features would enable the rapid monitoring of many variables, without users having to process the raw imagery. The predictive modeling approach used in this analysis is simple—ridge regression using precomputed task-agnostic imagery features—and requires minimal computational resources. We hope that this approach enables local institutions to implement imagery-based monitoring and thereby design yield prediction systems for their local contexts and needs. This accessibility is essential to closing the global data divide and ensuring that technological advances in remote sensing directly benefit the regions that are most food-insecure.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/rs17213641/s1, S1. Supplemental Data: Table S1: Zambian districts (ADM 2) with associated mean, minimum (min), and maximum (max) values of maize yields (yield), temperatures (temp), normalized difference vegetation index (NDVI), and precipitation (precip). Table S2: Data products used in the analysis, along with key characteristics; S2. Supplemental Methods: Table S3: Parameters available for the model selection process; Figure S1: Ablation study for select components of model selection; S3. Supplemental Results: Figure S2: Test set performance of (a) RCF model and (b) NDVI model, both trained on log-transformed maize yield. Figure S3: Demeaned test-set performance of (a) RCF model and (b) NDVI model, both trained on log-transformed maize yield. Figure S4: Test-set performance of (a) RCF model and (b) NDVI model, both trained on maize-yield anomalies. Figure S5: Impact of cloud-cover threshold on data availability and test-set performance of RCF model. Figure S6: Effect of sampling density on RCF model performance. Figure S7: Effect of feature vector size on RCF model performance.

Author Contributions

Conceptualization, J.P. and T.C.; data curation, C.M. and P.H.; formal analysis, C.M., J.C., G.L. and S.C.; writing of the original draft, C.M., J.C., G.L., S.C., P.H., J.P. and T.C.; review and editing of the draft, C.M., J.C., P.H., J.P. and T.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

To enable future research, we will make data available upon publication under a Creative Commons (non-commercial, share-alike) license (CC BY-NC-SA), which prohibits commercial use and requires that any derivative works use the same license. All code used in this study is publicly available at (https://github.com/cropmosaiks/crop-modeling) accessed on 16 October 2025.

Acknowledgments

We thank the Master of Environmental Data Science program at UC Santa Barbara’s Bren School of Environmental Science & Management for supporting this work. We thank Kathy Baylis for providing data access and valuable feedback on the manuscript. This work utilized high-performance computational facilities purchased with funds from the National Science Foundation (CNS-1725797) and administered by the Center for Scientific Computing (CSC). The CSC is supported by the California NanoSystems Institute and the Materials Research Science and Engineering Center (MRSEC; NSF DMR 2308708) at UC Santa Barbara.

Conflicts of Interest

The authors declare no competing interests.

References

Burke, M.; Driscoll, A.; Lobell, D.B.; Ermon, S. Using Satellite Imagery to Understand and Promote Sustainable Development. Science 2021, 371, eabe8628. [Google Scholar] [CrossRef]
Darra, N.; Anastasiou, E.; Kriezi, O.; Lazarou, E.; Kalivas, D.; Fountas, S. Can Yield Prediction Be Fully Digitilized? A Systematic Review. Agronomy 2003, 13, 2441. [Google Scholar] [CrossRef]
Joshi, A.; Pradhan, B.; Chakraborty, S.; Behera, M.D. Winter Wheat Yield Prediction in the Conterminous United States Using Solar-Induced Chlorophyll Fluorescence Data and XGBoost and Random Forest Algorithm. Ecol. Inform. 2023, 77, 102194. [Google Scholar] [CrossRef]
Perez, A.; Yeh, C.; Azzari, G.; Burke, M.; Lobell, D.; Ermon, S. Poverty prediction with public landsat 7 satellite imagery and machine learning. arXiv 2017, arXiv:1711.03654. [Google Scholar] [CrossRef]
Nakalembe, C.; Becker-Reshef, I.; Bonifacio, R.; Hu, G.; Humber, M.L.; Justice, C.J.; Keniston, J.; Mwangi, K.; Rembold, F.; Shukla, S.; et al. A review of satellite-based global agricultural monitoring systems available for Africa. Glob. Food Secur. 2021, 29, 100543. [Google Scholar] [CrossRef]
Carletto, C.; Jolliffe, D.; Banerjee, R. From Tragedy to Renaissance: Improving Agricultural Data for Better Policies. J. Dev. Stud. 2015, 51, 133–148. [Google Scholar] [CrossRef]
World Bank. World Development Indicators 2024; Technical Report; World Bank: Washington, DC, USA, 2024. [Google Scholar]
Wani, S.P.; Sreedevi, T.K.; Rockström, J.; Ramakrishna, Y.S. Rainfed Agriculture—Past Trends and Future Prospects. In Rainfed Agriculture: Unlocking the Potential, 1st ed.; Wani, S.P., Rockström, J., Oweis, T., Eds.; CABI: Wallingford, UK, 2009; pp. 1–35. [Google Scholar] [CrossRef]
Shakoor, U.; Saboor, A.; Baig, I.; Afzal, A.; Rahman, A. Climate Variability Impacts on Rice Crop Production in Pakistan. Pak. J. Agric. Res. 2015, 28, 19–27. [Google Scholar]
Apata, T.G. Effects of Global Climate Change on Nigerian Agriculture: An Empirical Analysis. CBN J. Appl. Stat. 2011, 2, 31–50. [Google Scholar]
Granados, R.; Soria, J.; Cortina, M. Rainfall Variability, Rainfed Agriculture and Degree of Human Marginality in North Guanajuato, Mexico. Singap. J. Trop. Geogr. 2017, 38, 153–166. [Google Scholar] [CrossRef]
Idso, S.; Pinter, P.; Hatfield, J.; Jackson, R.; Reginato, R. A Remote Sensing Model for the Prediction of Wheat Yields Prior to Harvest. J. Theor. Biol. 1979, 77, 217–228. [Google Scholar] [CrossRef]
Rasmussen, M.S. Assessment of Millet Yields and Production in Northern Burkina Faso Using Integrated NDVI from the AVHRR. Int. J. Remote Sens. 1992, 13, 3431–3442. [Google Scholar] [CrossRef]
Labus, M.P.; Nielsen, G.A.; Lawrence, R.L.; Engel, R.; Long, D.S. Wheat Yield Estimates Using Multi-Temporal NDVI Satellite Imagery. Int. J. Remote Sens. 2002, 23, 4169–4180. [Google Scholar] [CrossRef]
Bolton, D.K.; Friedl, M.A. Forecasting Crop Yield Using Remotely Sensed Vegetation Indices and Crop Phenology Metrics. Agric. For. Meteorol. 2013, 173, 74–84. [Google Scholar] [CrossRef]
Petersen, L.K. Real-time prediction of crop yields from MODIS relative vegetation health: A continent-wide analysis of Africa. Remote Sens. 2018, 10, 1726. [Google Scholar] [CrossRef]
Guo, Z.; Chamberlin, J.; You, L. Smallholder maize yield estimation using satellite data and machine learning in Ethiopia. Crop Environ. 2023, 2, 165–174. [Google Scholar] [CrossRef]
Kaneko, A.; Kennedy, T.W.; Mei, L.; Sintek, C.; Burke, M.; Ermon, S.; Lobell, D.B. Deep Learning for Crop Yield Prediction in Africa. In Proceedings of the International Conference on Machine Learning AI for Social Good Workshop, Long Beach, CA, USA, 10–15 June 2019; pp. 33–37. Available online: https://aiforsocialgood.github.io/icml2019/accepted/track1/pdfs/20_aisg_icml2019.pdf (accessed on 16 October 2025).
Muruganantham, P.; Wibowo, S.; Grandhi, S.; Samrat, N.H.; Islam, N. A Systematic Literature Review on Crop Yield Prediction with Deep Learning and Remote Sensing. Remote Sens. 2022, 14, 1990. [Google Scholar] [CrossRef]
El Sakka, M.; Mothe, J.; Ivanovici, M. Images and CNN Applications in Smart Agriculture. Eur. J. Remote Sens. 2024, 57, 2352386. [Google Scholar] [CrossRef]
Victor, B.; Nibali, A.; He, Z. A Systematic Review of the Use of Deep Learning in Satellite Imagery for Agriculture. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 18, 2297–2316. [Google Scholar] [CrossRef]
Rolf, E.; Proctor, J.; Carleton, T.; Bolliger, I.; Shankar, V.; Ishihara, M.; Recht, B.; Hsiang, S. A Generalizable and Accessible Approach to Machine Learning with Global Satellite Imagery. Nat. Commun. 2021, 12, 4392. [Google Scholar] [CrossRef]
Brown, C.F.; Kazmierski, M.R.; Pasquarella, V.J.; Rucklidge, W.J.; Samsikova, M.; Zhang, C.; Shelhamer, E.; Lahera, E.; Wiles, O.; Ilyushchenko, S.; et al. AlphaEarth Foundations: An Embedding Field Model for Accurate and Efficient Global Mapping from Sparse Label Data. arXiv preprint 2025, arXiv:2507.22291. [Google Scholar] [CrossRef]
Corley, I.; Robinson, C.; Dodhia, R.; Ferres, J.M.L.; Najafirad, P. Revisiting pre-trained remote sensing model benchmarks: Resizing and normalization matters. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 3162–3172. [Google Scholar]
Sherman, L.; Proctor, J.; Druckenmiller, H.; Tapia, H.; Hsiang, S.M. Global High-Resolution Estimates of the United Nations Human Development Index Using Satellite Imagery and Machine-learning. NBER Work. Pap. Ser. 2023, 31044. [Google Scholar] [CrossRef]
Khachiyan, A.; Thomas, A.; Zhou, H.; Hanson, G.; Cloninger, A.; Rosing, T.; Khandelwal, A.K. Using Neural Networks to Predict Microspatial Economic Growth. Am. Econ. Rev. Insights 2022, 4, 491–506. [Google Scholar] [CrossRef]
Nguyen, T.T.; Mushtaq, S.; Kath, J.; Nguyen-Huy, T.; Reymondin, L. Satellite-Based Data for Agricultural Index Insurance: A Systematic Quantitative Literature Review. Nat. Hazards Earth Syst. Sci. 2025, 25, 913–927. [Google Scholar] [CrossRef]
GRID3 Inc. GRID3 Use Case Report District Boundaries Harmonisation in Zambia; GRID3 Inc.: New York, NY, USA, 2020. [Google Scholar]
Potapov, P.; Turubanova, S.; Hansen, M.C.; Tyukavina, A.; Zalles, V.; Khan, A.; Song, X.P.; Pickens, A.; Shen, Q.; Cortez, J. Global Maps of Cropland Extent and Change Show Accelerated Cropland Expansion in the Twenty-First Century. Nat. Food 2022, 3, 19–28. [Google Scholar] [CrossRef] [PubMed]
Harris, J.; Chisanga, B.; Drimie, S.; Kennedy, G. Nutrition Transition in Zambia: Changing Food Supply, Food Prices, Household Consumption, Diet and Nutrition Outcomes. Food Secur. 2019, 11, 371–387. [Google Scholar] [CrossRef]
Melkani, A.; Mason, N.M.; Mather, D.L.; Chisanga, B. Smallholder Maize Market Participation and Choice of Marketing Channel in the Presence of Liquidity Constraints: Evidence from Zambia; AgEcon Search: St. Paul, MN, USA, 2019. [Google Scholar]
Dorosh, P.A.; Dradri, S.; Haggblade, S. Regional Trade, Government Policy and Food Security: Recent Evidence from Zambia. Food Policy 2009, 34, 350–366. [Google Scholar] [CrossRef]
Morgan, S.N.; Mason, N.M.; Levine, N.K.; Zulu-Mbata, O. Dis-Incentivizing Sustainable Intensification? The Case of Zambia’s Maize-Fertilizer Subsidy Program. World Dev. 2019, 122, 54–69. [Google Scholar] [CrossRef]
Sleimi, R.; Ghosh, S.; Amarnath, G. Development of Drought Indicators Using Machine Learning Algorithm: A Case Study of Zambia. Technical Report, CGIAR Climate Resilience Initiative 2022. Available online: https://cgspace.cgiar.org/handle/10568/127620 (accessed on 16 October 2025).
Hadunka, P.; Baylis, K. Staple Crop Pest Damage and Natural Resources Exploitation: Fall Army Worm Infestation and Charcoal Production in Zambia. In Proceedings of the 2022 Agricultural Applied Economics Association Annual Meeting, Anaheim, CA, USA, 31 July–2 August 2022. [Google Scholar] [CrossRef]
Thurlow, J.; Zhu, T.; Diao, X. The Impact of Climate Variability and Change on Economic Growth and Poverty in Zambia; IFPRI: Washington, DC, USA, 2008. [Google Scholar]
Delwart, S. Sentinel-2 User Handbook; European Space Agency (ESA): Paris, France, 2015. [Google Scholar]
U.S. Geological Survey. Landsat 8 (L8) Data Users Handbook; Technical Report; U.S. Geological Survey: Reston, VA, USA, 2019.
Senay, G.B.; Velpuri, N.M.; Bohms, S.; Budde, M.; Young, C.; Rowland, J.; Verdin, J.P. Chapter 9—Drought Monitoring and Assessment: Remote Sensing and Modeling Approaches for the Famine Early Warning Systems Network. In Hydro-Meteorological Hazards, Risks and Disasters; Shroder, J.F., Paron, P., Baldassarre, G.D., Eds.; Elsevier: Amsterdam, The Netherlands, 2015; pp. 233–262. [Google Scholar] [CrossRef]
Lobell, D.B.; Hicke, J.A.; Asner, G.P.; Field, C.B.; Tucker, C.J.; Los, S.O. Satellite Estimates of Productivity and Light Use Efficiency in United States Agriculture, 1982–1998. Glob. Change Biol. 2002, 8, 722–735. [Google Scholar] [CrossRef]
Mkhabela, M.S.; Mkhabela, M.S.; Mashinini, N.N. Early Maize Yield Forecasting in the Four Agro-Ecological Regions of Swaziland Using NDVI Data Derived from NOAA’s-AVHRR. Agric. For. Meteorol. 2005, 129, 1–9. [Google Scholar] [CrossRef]
Balaghi, R.; Tychon, B.; Eerens, H.; Jlibene, M. Empirical Regression Models Using NDVI, Rainfall and Temperature Data for the Early Prediction of Wheat Grain Yields in Morocco. Int. J. Appl. Earth Obs. Geoinf. 2008, 10, 438–452. [Google Scholar] [CrossRef]
Burke, M.; Lobell, D.B. Satellite-Based Assessment of Yield Variation and Its Determinants in Smallholder African Systems. Proc. Natl. Acad. Sci. USA 2017, 114, 2189–2194. [Google Scholar] [CrossRef]
Lobell, D.B.; Azzari, G.; Burke, M.; Gourlay, S.; Jin, Z.; Kilic, T.; Murray, S. Eyes in the sky, boots on the ground: Assessing satellite-and ground-based approaches to crop yield measurement and analysis. Am. J. Agric. Econ. 2020, 102, 202–219. [Google Scholar] [CrossRef]
Yang, C.; Odvody, G.N.; Thomasson, J.A.; Isakeit, T.; Nichols, R.L. Change Detection of Cotton Root Rot Infection over 10-Year Intervals Using Airborne Multispectral Imagery. Comput. Electron. Agric. 2016, 123, 154–162. [Google Scholar] [CrossRef]
Kumar, S.; Röder, M.S.; Singh, R.P.; Kumar, S.; Chand, R.; Joshi, A.K.; Kumar, U. Mapping of Spot Blotch Disease Resistance Using NDVI as a Substitute to Visual Observation in Wheat (Triticum aestivum L.). Mol. Breed. 2016, 36, 95. [Google Scholar] [CrossRef]
Xu, Y.; Yang, J.; Chen, Y. NDVI-based Vegetation Responses to Climate Change in an Arid Area of China. Theor. Appl. Climatol. 2015, 126, 213–222. [Google Scholar] [CrossRef]
Wang, R.; Cherkauer, K.; Bowling, L. Corn Response to Climate Stress Detected with Satellite-Based NDVI Time Series. Remote Sens. 2016, 8, 269. [Google Scholar] [CrossRef]
Didan, K. MODIS/Terra Vegetation Indices Monthly L3 Global 0.05Deg CMG V061; Dataset; NASA EOSDIS Land Processes Distributed Active Archive Center: Sioux Falls, SD, USA, 2021. [Google Scholar] [CrossRef]
Harris, I.; Osborn, T.J.; Jones, P.; Lister, D. Version 4 of the CRU TS Monthly High-Resolution Gridded Multivariate Climate Dataset. Sci. Data 2020, 7, 109. [Google Scholar] [CrossRef]
Maestrini, B.; Basso, B. Drivers of Within-Field Spatial and Temporal Variability of Crop Yield across the US Midwest. Sci. Rep. 2018, 8, 14833. [Google Scholar] [CrossRef]
Aiken, E.; Rolf, E.; Blumenstock, J. Fairness and Representation in Satellite-Based Poverty Maps: Evidence of Urban-Rural Disparities and Their Impacts on Downstream Policy. arXiv preprint 2023, arXiv:2305.01783. [Google Scholar] [CrossRef]
Li, J.; Roy, D.P. A Global Analysis of Sentinel-2A, Sentinel-2B and Landsat-8 Data Revisit Intervals and Implications for Terrestrial Monitoring. Remote Sens. 2017, 9, 902. [Google Scholar] [CrossRef]
Carrol, R.J.; Rupert, D.; Stefanski, L.A.; Crainiceanu, C.M. Measurement Error in Nonlinear Models: A Modern Perspective, 2nd ed.; Chapman and Hall/CRC: Boca Raton, FL, USA, 2006. [Google Scholar] [CrossRef]
Ratledge, N.; Cadamuro, G.; De La Cuesta, B.; Stigler, M.; Burke, M. Using Machine Learning to Assess the Livelihood Impact of Electricity Access. Nature 2022, 611, 491–495. [Google Scholar] [CrossRef]
Proctor, J.; Carleton, T.; Sum, S. Parameter Recovery Using Remotely Sensed Variables; Technical Report; National Bureau of Economic Research: Cambridge, MA, USA, 2023. [Google Scholar]
Beyer, M.; Wallner, M.; Bahlmann, L.; Thiemig, V.; Dietrich, J.; Billib, M. Rainfall Characteristics and Their Implications for Rain-Fed Agriculture: A Case Study in the Upper Zambezi River Basin. Hydrol. Sci. J. 2015, 61, 321–343. [Google Scholar] [CrossRef]
Van Ittersum, M.K.; van Bussel, L.G.J.; Wolf, J.; Grassini, P.; van Wart, J.; Guilpart, N.; Claessens, L.; de Groot, H.; Wiebe, K.; Mason-D’Croz, D.; et al. Can Sub-Saharan Africa Feed Itself? Proc. Natl. Acad. Sci. USA 2016, 113, 14964–14969. [Google Scholar] [CrossRef]

Figure 1. The country of Zambia is shown with 10 provinces (thick borders) and the 72 district divisions (thin borders) used in analysis. The Zambian cropland percentage is shown as a background raster. Cropland data from [], using the 2019 raster layer at 30 m resolution and aggregated to 0.01 degree resolution. For a full list of districts and relevant statistics, please see Supplementary Table S1. The inset map shows Zambia’s location within Africa (red box).

Figure 2. Seasonal timing of agricultural activity, precipitation, crop surveying, and satellite imagery collection in Zambia. The blue bars show monthly mean precipitation, averaged over Zambia from October 2008 to December 2020. The precipitation data is from the ERA5 reanalysis, which has 0.25-degree resolution. The black horizontal bars show the average timing of key agricultural activities (sowing, growing, and harvest), as well as survey data collection (denoted CFS for Crop Forecast Survey). The black dotted line denotes an extended harvest into August, which occurs in some Zambian districts in some years. Satellite imagery is available over the entire year. In our analysis, we investigate using the full monthly range of images, many of which are of low quality due to cloud cover, as well as a limited month range that includes harvest time and less cloud cover.

Figure 3. Methodology overview showing (a) the creation of tabular data from satellite imagery and crop forecast survey data, (b) model selection, (c) experiment 1, selecting for overall performance, and (d) experiment 2, selecting for over-time performance. Detailed explanations of the choices made in panel (b) are shown in the Supplementary Methods Section.

Figure 4. Comparative examples of two randomly selected locations in two districts with varying maize yields. Row 1 represents Monze, an area with historically lower average maize yields, and row 2 represents Choma, with high average maize yields. Panel (a) displays the true-color satellite imagery (Sentinel 2) for both locations, panel (b) shows random convolutional feature (RCF) activation maps for four selected random features, and panel (c) presents the Normalized Difference Vegetation Index (NDVI; computed from Sentinel 2). Each row contrasts the conditions during two different harvest years: 2019, marked by generally poor yields across most districts, and 2021, noted for an overall good harvest. Yields shown in metric tons per hectare (mt/ha) are denoted for each subfigure, illustrating the substantial variation between the two years and across locations.

Figure 5. Test set performance of (a) RCF model and (b) NDVI model, both trained on log-transformed maize yields. The vertical axis represents the model estimates averaged from 10 random data splits, while the horizontal axis displays observed values. Each data point signifies the annual crop yield of a single Zambian district in a single agricultural season. Points and histograms are colored by the year of the CFS. The coefficient of determination (

R^{2}

) and Pearson’s correlation coefficient squared (

r^{2}

) are provided with a 2-standard-error margin in parentheses. A reference line at 45 degrees is represented in black.

Figure 6. Demeaned test set performance of (a) RCF model and (b) NDVI model, both trained on log-transformed maize yield. The vertical axis represents the model estimates demeaned and averaged from 10 random splits, while the horizontal axis displays demeaned observed values. Each data point signifies an annual anomaly in maize yield, relative to a district-specific mean for each Zambian district. Points and histograms are colored by the year of the CFS. The coefficient of determination (

R^{2}

) and Pearson’s correlation coefficient squared (

r^{2}

) are provided with a 2-standard-error margin in parentheses. A reference line at 45 degrees is represented in black.

Figure 7. Test-set performance of (a) RCF model and (b) NDVI model, both trained on maize-yield anomalies. The vertical axis represents the model estimates, averaged and demeaned from 10 random splits, while the horizontal axis displays demeaned observed values. Each data point signifies a year’s crop yield for a Zambian district, demeaned by district prior to modeling. Points and histograms are colored by the year of the CFS. The coefficient of determination (

R^{2}

) and Pearson’s correlation coefficient squared (

r^{2}

) are provided with a 2-standard-error margin in parentheses. A reference line at 45 degrees is represented in black.

Table 1. The spectral bands of Sentinel-2 MSI, Landsat 5 ETM, Landsat ETM+, and Landsat 8 OLI instruments used in the analysis. Spectral ranges are presented in micrometers (

μ

m). Band abbreviations: R = Red, G = Green, B = Blue, NIR = Near-infrared, SWIR_1.6 = Short-Wave Infrared 1.6

μ

m, SWIR_2.2 = Short-Wave Infrared 2.2

μ

m, CA = Coastal aerosol. “–” values represent a band that is not available or was not considered in the analysis. Overlap with the crop forecast survey (CFS) data to show the usable year range of each satellite, given the scope of our analysis.

Table 1. The spectral bands of Sentinel-2 MSI, Landsat 5 ETM, Landsat ETM+, and Landsat 8 OLI instruments used in the analysis. Spectral ranges are presented in micrometers (

μ

m). Band abbreviations: R = Red, G = Green, B = Blue, NIR = Near-infrared, SWIR_1.6 = Short-Wave Infrared 1.6

μ

m, SWIR_2.2 = Short-Wave Infrared 2.2

μ

m, CA = Coastal aerosol. “–” values represent a band that is not available or was not considered in the analysis. Overlap with the crop forecast survey (CFS) data to show the usable year range of each satellite, given the scope of our analysis.

Satellite	Sensor	Overlap with CFS	Spectral Band ( $μ$ m)
Satellite	Sensor	Overlap with CFS	R	G	B	NIR	SWIR_1.6	SWIR_2.2	CA
Sentinel-2	MSI	2016–2022	0.65–0.68	0.54–0.58	0.46–0.52	0.78–0.90	–	–	–
Landsat 5	TM	2009–2013	0.63–0.69	0.52–0.60	0.45–0.52	0.76–0.90	1.55–1.75	2.08–2.35	–
Landsat 7	ETM+	2009–2021	0.63–0.69	0.53–0.61	0.45–0.52	0.78–0.90	1.55–1.75	2.09–2.35	–
Landsat 8	OLI	2013–2021	0.64–0.67	0.53–0.59	0.45–0.51	0.85–0.88	1.57–1.65	2.11–2.29	0.43–0.45

Table 2. Input variables chosen during model selection when selecting for out-of-sample performance over space and time.

Sensor Platform	Spectral Bands	Spatial Coverage	Month Range	Crop Mask	Feature Summary Method	District Fixed Effects	Year Range
Landsat-C2-L2	R, G, B, NIR, SWIR1, SWIR2	19,598	4–9	TRUE	Simple mean	TRUE	2016–2021
Sentinel-2-L2a	R, G, B, NIR	15,058	1–12	TRUE	Simple mean

Table 3. Input variables chosen during model selection when selecting for out-of-sample performance over time only.

Sensor Platform	Spectral Bands	Spatial Coverage	Month Range	Crop Mask	Feature Summary Method	Year Range
Landsat-8-C2-L2	AOT, R, G, B, NIR, SWIR1, SWIR2	15,058	1–12	TRUE	Simple mean	2016–2021
Sentinel-2-L2a	R, G, B	3772	1–12	TRUE	Simple mean	2016–2021

Table 4. Full-result comparison showing how results change with selected combinations of variables. RCF₁ = the main model specification, RCF₂ = the model selected for overtime variation, NDVI = the normalized difference vegetation index, T = temperature, and P = precipitation.

Variables	Overall	Demeaned	Anomaly	Overall	Demeaned	Anomaly
Variables	Test $R^{2}$	Test $R^{2}$	Test $R^{2}$	Test $r^{2}$	Test $r^{2}$	Test $r^{2}$
RCF₁, NDVI, T	0.846	0.465	0.582	0.852	0.490	0.609
RCF₁, NDVI	0.838	0.449	0.586	0.844	0.478	0.611
T, NDVI	0.834	0.421	0.514	0.840	0.440	0.539
RCF₁	0.832	0.422	0.583	0.838	0.452	0.607
P, T, NDVI	0.832	0.410	0.508	0.837	0.433	0.534
P, NDVI	0.815	0.361	0.435	0.823	0.383	0.467
RCF₂, NDVI, T	0.812	0.427	0.743	0.820	0.484	0.759
T	0.807	0.313	0.378	0.813	0.356	0.432
RCF₂, NDVI	0.806	0.406	0.734	0.814	0.465	0.749
NDVI	0.804	0.331	0.387	0.814	0.354	0.412
RCF₂	0.803	0.393	0.739	0.811	0.451	0.751
P, T	0.802	0.294	0.369	0.808	0.338	0.428
P	0.699	0.006	0.049	0.717	0.080	0.113

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Monitoring Maize Yield Variability over Space and Time with Unsupervised Satellite Imagery Features

Highlights

Abstract

1. Introduction

2. Materials and Methods

2.1. Study Context

2.2. Data

2.2.1. Crop Forecast Survey (CFS)

2.2.2. Cropland Data

2.2.3. Satellite Imagery

2.2.4. Normalized Difference Vegetation Index (NDVI)

2.2.5. Temperature and Precipitation

2.3. Methods

2.3.1. Standardized Grid for Location Sampling

2.3.2. Selecting Imagery in Sampled Locations

2.3.3. Feature Extraction and Processing

2.3.4. Feature Imputation

2.3.5. Feature Summarization

2.3.6. Model Specification and Tuning

2.3.7. Model Selection

2.3.8. Model Evaluation

2.3.9. Experiments Overview

2.3.10. Predicting Maize Yields over Space and Time

2.3.11. Isolating Predictive Performance over Time

2.3.12. Model Customization for Temporal Prediction

2.3.13. Establishing an NDVI Performance Benchmark

3. Results

3.1. Maize Yield Predictions over Space and Time

3.2. Temporal Performance of Spatiotemporal Models

3.3. Maize Yield Predictions Optimized for Temporal Performance

3.4. Enhancing RCF with NDVI

4. Discussion

4.1. Advantages of Approach

4.2. Model Sensitivity

4.2.1. Cloud Cover

4.2.2. Image Sampling Density

4.2.3. Number of Features

4.2.4. Ablation Study

4.3. Limitations and Future Work

5. Conclusions

Supplementary Materials

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Article Metrics

Citations

Article Access Statistics