Article
Peer-Review Record

Monitoring Maize Yield Variability over Space and Time with Unsupervised Satellite Imagery Features

Remote Sens. 2025, 17(21), 3641; https://doi.org/10.3390/rs17213641
by Cullen Molitor 1, Juliet Cohen 2, Grace Lewin 3, Steven Cognac 2, Protensia Hadunka 4, Jonathan Proctor 5 and Tamma Carleton 6,7,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 31 August 2025 / Revised: 17 October 2025 / Accepted: 29 October 2025 / Published: 4 November 2025
(This article belongs to the Special Issue Crop Yield Prediction Using Remote Sensing Techniques)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper presents a compelling application of the MOSAIKS framework for predicting maize yield in Zambia using satellite imagery. The work is timely, methodologically rigorous, and addresses an important gap in the literature by focusing on a low-resource, data-sparse context. However, several limitations worth addressing remain. My comments are as follows:

1) This paper uses a fixed 10% cloud cover threshold and simple imputation (spatial and temporal averaging). This may introduce bias, especially during the rainy season when cloud persistence is high. 

2) Only ~10% of grid cells per district are sampled. While computationally efficient, this may miss important spatial heterogeneity. The authors should conduct a sensitivity analysis to determine an adequate sampling density, or use all available pixels with more efficient featurization (e.g., GPU acceleration); a sketch of such an analysis is given after this list.

3) While mean-reversion bias is noted in temporal predictions, this paper does not deeply investigate its causes (e.g., feature saturation, imputation artifacts, or model misspecification). So, it is suggested to include residual analysis, partial dependence plots, or SHAP values to interpret feature contributions and identify sources of bias.

4) The paper does not clearly quantify which bands, sensors, or months contribute most to predictive power, so ablation studies are needed.

5) The current state of research on monitoring maize yield with remote sensing data is not adequately introduced. More recent studies should be cited and summarized properly.
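
Regarding point 2, a minimal sketch of the kind of sensitivity analysis requested is given below. It is illustrative only: the array names, the per-district sampling scheme, the feature dimension, and the RidgeCV settings are assumptions standing in for the authors' actual pipeline, with dummy data so the snippet runs on its own.

    import numpy as np
    from sklearn.linear_model import RidgeCV
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)

    # Dummy stand-ins for the real inputs (illustrative only).
    n_districts, cells_per_district, n_features = 72, 200, 512
    cell_district = np.repeat(np.arange(n_districts), cells_per_district)
    cell_features = rng.normal(size=(cell_district.size, n_features))
    district_yield = rng.normal(size=n_districts)

    def subsample_per_district(districts, rate, rng):
        """Keep roughly `rate` of the cells in each district (at least one)."""
        keep = np.zeros(districts.size, dtype=bool)
        for d in np.unique(districts):
            idx = np.flatnonzero(districts == d)
            n = max(1, int(round(rate * idx.size)))
            keep[rng.choice(idx, size=n, replace=False)] = True
        return keep

    def district_means(features, districts, n_districts):
        """Aggregate the sampled grid-cell features to district-level means."""
        return np.vstack([features[districts == d].mean(axis=0)
                          for d in range(n_districts)])

    scores = {}
    for rate in (0.05, 0.10, 0.25, 0.50, 1.00):
        keep = subsample_per_district(cell_district, rate, rng)
        X = district_means(cell_features[keep], cell_district[keep], n_districts)
        model = RidgeCV(alphas=np.logspace(-3, 3, 13))
        scores[rate] = cross_val_score(model, X, district_yield, cv=5, scoring="r2").mean()
    print(scores)  # a stable R^2 across rates would support the 10% sampling choice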

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This study employs an innovative approach by utilizing unsupervised satellite image features (MOSAIKS) to predict spatio-temporal variations in maize yields in Zambia. The authors note that they expand the MOSAIKS framework in several novel ways. They use public satellite imagery to increase the spectral range in the convolutions, and they evaluate time-varying features for maize yield estimation in sub-Saharan Africa, where yield data tend to be limited in scale and scope. Throughout the model evaluation process, they also test how a key set of modeling decisions affects overall performance in predicting maize yield. Specifically, they build a set of candidate models based on these decisions, such as which satellite sensors to include and whether to apply a cropland mask.

The methodology exhibits considerable practical significance, and comparisons with the NDVI benchmark substantially enhance the study's credibility, particularly in African regions where agricultural data availability is limited.

 

This manuscript reads well, and the research falls within the scope of this journal. However, I have some concerns about the methods and the presentation of results. I think it needs major modifications and further review before it can be accepted for publication in this journal.

 

  1. Although the use of random convolutional features (RCFs) and ridge regression is mentioned, key parameters such as the number of convolutional kernels, feature dimensions, and regularization parameter ranges remain unspecified. Please supplement the specific parameter settings for MOSAIKS feature extraction, particularly the convolutional layer architecture implemented in PyTorch (e.g., number of channels, number of filters); see the PyTorch sketch after this list for the level of detail meant here.

 

  2. RCFs are “task-agnostic” random features that lack explicit physical meaning in themselves. Although the final prediction performance is excellent, how can we understand which specific image features (texture, structure, phenology) are driving the prediction of maize yield? For example, Figure 4 displays some activation maps, but it lacks an in-depth explanation of why these features correlate with high or low yields.

 

  3. Sampling was conducted using only 10% of the grid cells, which, while computationally efficient, lacks sufficient justification regarding whether this sampling rate adequately represents the entire region. Could you provide a sensitivity analysis demonstrating the impact of different sampling rates on model performance?

 

  4. The model had only 432 training samples (72 districts/counties across 6 years, 2016-2021), yet generated a large number of RCF features (exact quantity unspecified). Although ridge regression was employed to prevent overfitting, the combination of such high dimensionality (potentially thousands of features) with relatively few samples raises concerns about overfitting. Can 5-fold cross-validation adequately assess generalization capability? It is recommended to supplement with learning curves or to report the final number of features retained by the model.

 

  5. The paper only compares against NDVI and does not contrast with current mainstream CNN- or Transformer-based remote sensing models. Could a comparative analysis with other algorithms be added to highlight MOSAIKS' advantages in computational efficiency and performance?

 

  6. With regard to the use of public satellite imagery, was spectral consistency processing applied to the different imagery types? This aspect appears to lack clarity in the methodological description.

 

  7. How can we quantitatively assess predictive capability for “extreme years” (e.g., 2019, with insect infestation and drought)? While the visualizations are useful, we recommend providing classification/regression performance metrics (bias, sensitivity/specificity, or RMSE) specifically for extreme years (below the 10th percentile or above the 90th percentile) to evaluate model reliability during policy-sensitive years.

 

  8. Cloud cover filtering and missing image interpolation: does the adopted interpolation strategy (first within-year averaging, then cross-year averaging) result in “excessive smoothing,” thereby weakening the response to extreme droughts or insect infestations? Has a sensitivity analysis been conducted?

 

  9. The article employs random 80/20 splits with 5-fold cross-validation during training, repeated across 10 random splits. Please clarify whether these random splits are stratified by year or by region (particularly for the time-series performance evaluation, ensuring that test years remain “future” to the model to prevent temporal leakage); a leakage-free, year-held-out evaluation is sketched after this list. Such partitioning is especially sensitive for time-series tasks.

 

  10. The paper requires further clarification regarding the temporal scope of the selected feature data. If annual data are used, consideration should be given to the role of data from the early planting stage or late harvest period in yield estimation, and whether this might introduce redundant information.

 

 

  11. The NDVI and indicators such as temperature and precipitation used in the paper are based on monthly average values. Taking NDVI as an example, maize typically exhibits a rapid upward trend during the reproductive growth stage. Using monthly averages may smooth or weaken this dynamic process to some extent, thereby affecting the characterization of key growth stages. Therefore, it is recommended to further discuss the effectiveness and appropriateness of this approach in capturing the detailed characteristics of the maize growth process.

 

  12. The paper appears to lack a discussion section. It is recommended that the authors consider incorporating relevant content into such a section to enhance the interpretation and contextualization of the findings.

 

  13. Line 18: No literature references should be included in the abstract.

 

  14. Line 206: Is “weather to apply a cropland mask” a typographical error? It should read “whether to apply a cropland mask”.

 

  15. Figure 4: The place names indicated in the figure do not correspond to those in the title.
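
Regarding point 1, a minimal sketch of random convolutional featurization in PyTorch is given below to make the requested level of detail concrete. The number of input channels, number of filters, kernel size, and pooling are illustrative placeholders, not the settings used in the manuscript.

    import torch
    import torch.nn as nn

    class RandomConvFeatures(nn.Module):
        """Random, untrained convolution whose pooled responses serve as features."""
        def __init__(self, in_channels=6, n_filters=1024, kernel_size=3, seed=0):
            super().__init__()
            torch.manual_seed(seed)
            self.conv = nn.Conv2d(in_channels, n_filters, kernel_size, bias=True)
            for p in self.conv.parameters():
                p.requires_grad_(False)   # weights are drawn once and frozen

        def forward(self, x):
            # x: (batch, channels, height, width) image patch
            z = torch.relu(self.conv(x))  # random filter responses
            return z.mean(dim=(2, 3))     # global average pool -> (batch, n_filters)

    patches = torch.rand(8, 6, 64, 64)        # dummy 6-band image patches
    features = RandomConvFeatures()(patches)  # shape (8, 1024), then fed to ridge regression
    print(features.shape)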
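
Regarding point 9, the sketch below illustrates a leakage-free temporal evaluation in which entire years are held out, so the test year is never seen during training. The data are dummy placeholders; the grouping logic is the point.

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.metrics import r2_score
    from sklearn.model_selection import LeaveOneGroupOut

    rng = np.random.default_rng(0)

    # Dummy district-year observations (illustrative only): 72 districts x 6 years.
    years = np.repeat(np.arange(2016, 2022), 72)
    X = rng.normal(size=(years.size, 256))
    y = rng.normal(size=years.size)

    # Hold out one full year at a time.
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=years):
        model = Ridge(alpha=1.0).fit(X[train_idx], y[train_idx])
        year = years[test_idx][0]
        print(year, round(r2_score(y[test_idx], model.predict(X[test_idx])), 3))
    # A random 80/20 split mixes years between training and testing, which can
    # inflate apparent temporal skill through leakage.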

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The authors submit for publication a manuscript that offers a low-cost method for estimating maize yields in the sub-Saharan Zambian context using publicly available satellite imagery. In particular, the authors leverage the MOSAIKS framework. The methodology and validation are robust, the writing is of good quality, the figures/tables support the results, and I believe the geoprocessing has been done competently. Additionally, estimating maize yields in the farming context of Zambia and neighboring countries is complex, and improved remote sensing-based methods are needed. The paper is suitable for publication, though I do have some questions that lead me to suggest major revisions. Much of my feedback below is in the form of questions, which could be addressed in the manuscript text upon revision.

 

The introduction describes the sub-Saharan African agricultural landscape and the utility of the MOSAIKS method, but it could also use some content on the importance of agriculture to Zambia more specifically.

 

Please expand on the use of the cropland data (Section 2.3). Was there a threshold used to determine agriculture / not agriculture from the Potapov et al. dataset? In line 143, weighting is mentioned - was this done with respect to the cropland percentage?  
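
For concreteness, one plausible reading of the weighting question is sketched below: district-level features computed as cropland-fraction-weighted means of grid-cell features. The arrays and the aggregation rule are assumptions for illustration, not a description of the authors' actual procedure.

    import numpy as np

    rng = np.random.default_rng(0)

    # Dummy grid-cell inputs (illustrative only).
    n_cells, n_features = 1000, 128
    cell_features = rng.normal(size=(n_cells, n_features))
    cropland_frac = rng.uniform(0, 1, size=n_cells)   # cropland fraction per cell
    cell_district = rng.integers(0, 72, size=n_cells)

    def weighted_district_features(features, weights, districts):
        """District features as cropland-fraction-weighted means of cell features."""
        out = []
        for d in np.unique(districts):
            m = districts == d
            w = weights[m]
            out.append(np.average(features[m], axis=0, weights=w) if w.sum() > 0
                       else features[m].mean(axis=0))
        return np.vstack(out)

    X_district = weighted_district_features(cell_features, cropland_frac, cell_district)
    print(X_district.shape)   # (number of districts, n_features)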

 

Were the Landsat and Sentinel-2 data harmonized in some way, or fused, or simply used in combination? Please comment on potential limitations of spatial resolution and spectral resolution differences.

 

Why use NASA MODIS for NDVI when so much attention has been given to Landsat and Sentinel-2 reflectance for MOSAIKS? (Section 2.5) Seems like a bit of a mismatch for the comparison when NDVI could also be calculated from Landsat/Sentinel-2. Why not relate MODIS-based MOSAIKS to MODIS NDVI? Also, why MOD13C2 instead of MOD13Q1, which has a much finer spatial resolution? Is the climate modeling grid of some relevance? I don’t discount the value of the analysis, but I think it’s important to understand the rationale for this decision and to frame the results cautiously if this is indeed a major limitation. Would a future study and further promotion of MOSAIKS benefit from tests on the use of MOD13Q1 or Landsat/Sentinel-2 based NDVI?
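
For reference, the alternative raised here is straightforward to compute: NDVI = (NIR - Red) / (NIR + Red) from the same sensors used for featurization, e.g. Sentinel-2 bands B8 (NIR) and B4 (red) or Landsat 8/9 bands B5 (NIR) and B4 (red). A minimal sketch with dummy reflectance values:

    import numpy as np

    def ndvi(red, nir, eps=1e-6):
        """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
        red = np.asarray(red, dtype=float)
        nir = np.asarray(nir, dtype=float)
        return (nir - red) / (nir + red + eps)

    # Dummy reflectance patches standing in for, e.g., Sentinel-2 B4/B8 composites.
    rng = np.random.default_rng(0)
    red = rng.uniform(0.0, 0.3, (64, 64))
    nir = rng.uniform(0.2, 0.6, (64, 64))
    print(float(ndvi(red, nir).mean()))   # patch-mean NDVI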

 

A major point is made in the introduction about a barrier to entry with respect to satellite-based models of maize yield. I agree, but I am not completely persuaded that MOSAIKS removes very many barriers other than the hardware/data requirements. I believe it still requires substantial expert knowledge and isn’t really a hallmark of simplicity. Perhaps the accessibility stated in the introduction could be tempered.

 

Line 38: Consider deleting “possible”

Line 43: In what context of “sub-Saharan Africa” does the 35% of GDP metric relate to? Some definitions would refer to all countries south of, or intersecting, the Sahel, while others will describe more of the southeastern African context (e.g., Zambia, Malawi, Zimbabwe, Tanzania, etc.). What is the fraction of GDP attributed to agriculture for Zambia?

Line 45: Percentage listed as text. Prior sentences use numeric structure. Check for consistency.

Line 52: Consider rephrasing to “raw image data”

Lines 57-61: Could cite corroborating literature

Line 62: “decompose performance predicting” kind of vague/unclear

Line 68: Consider deleting “independently”

Line 73: Delete “or”?

Line 77: Change to “pipeline can be used”?

Line 78: “performs well when predicting maize”

Line 105: Consider changing to “Our study leverages in situ maize data”

Line 137: “these data”

Figure 2 Caption: dotted line instead of dashed line

Table 1: Are these year ranges in the “Overlap with CFS” column? If so, consider adding en dashes.

Table 1: I find the double decimals for spectral band ranges to be confusing. Consider en dashes instead or at least clarify in the caption.

Line 150: Were these harmonized Sentinel-2 data? TOA or SR?

Line 171: Comma after example

Line 179: “these global data”

Line 193: MOSAIKS acronym already defined

Line 279: What’s the difference between R2 and the squared PCC? Is this to compare the ridge regression R2 to the linear R2 (or PCC squared)? A little clarification here would help many readers.
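
For reference: the coefficient of determination and the squared Pearson correlation coincide for an in-sample ordinary-least-squares fit with an intercept, but generally differ otherwise (e.g., for ridge regression or out-of-sample predictions), because R^2 penalizes bias and scale errors in the predictions and can be negative, whereas rho^2 measures only the strength of the linear association. With y_i the observations, \hat{y}_i the predictions, and \bar{y} the observed mean:

    R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
    \qquad
    \rho^2 = \left( \frac{\operatorname{cov}(y, \hat{y})}{\sigma_y \, \sigma_{\hat{y}}} \right)^2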

Line 348: Where did the temperature and precipitation data come from?

Figure 4: Were the panel (c) NDVI maps produced using MOD13C2? Please clarify the spatial resolution of this dataset and how/if it was downscaled to the farm scale in this diagram.

Line 400: “pf” to “of”

Figure 6 and Lines 406-407: R2 is relatively low, but would it be even lower if some outliers were removed?

Results: Should this section be titled “Results and Discussion”?

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have addressed all the questions well.

Reviewer 3 Report

Comments and Suggestions for Authors

The authors have duly considered the suggestions/requests from the reviewers and have made substantial improvements to the manuscript. I have no further comments and would recommend this paper be accepted for publication.
