by Dongming Zhang, Yifu Chen, Pengyao Ma, et al.

Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Leszek Sieczko

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper is well organized and addresses the critical problem of yield forecasting under data-scarce, small-sample conditions. The inclusion of both a temporal rolling forecast for 2024 and a spatial cross-validation test is a significant strength, as it demonstrates a rigorous approach to model validation.
The results clearly show that the IWOA-BP model outperforms the baselines, which is a useful finding. The manuscript offers a promising addition to the field of agricultural modeling. However, before it can be recommended for publication, several major and minor issues must be addressed to make the methods clearer, more precise, and easier to follow.

Major Remarks
1. A critical methodological detail is unclear. The study uses monthly data for eight input variables to predict a single annual yield value. The total sample size is stated to be fifty datasets (five regions over ten years), but it is not specified how the twelve monthly values for each of the eight variables are handled as model inputs.
• Are the monthly values for the eight variables aggregated into annual statistics, such as growing-season average temperature and total precipitation?
• Or are all 96 monthly data points (12 months × 8 variables) used as input features to predict the single annual yield for each of the 50 samples?
This distinction is essential for understanding how the model works and how well it performs. The authors should clearly explain how the monthly time-series data were transformed into a feature vector for the annual prediction model.
2. The authors perform a thorough correlation analysis and correctly identify a high degree of multicollinearity among the independent variables, particularly between the soil and air temperature metrics (for example, the correlation between X1 and X4 is 0.95). In the discussion of this analysis, they propose a sound strategy: "It is hypothesized that the X6 and X7 may serve as core inputs, with the X3 and X5 providing supplementary data as needed..." However, the results section does not specify which combination of variables was employed in the final, reported models. Please clarify whether all eight variables were retained as inputs or whether a feature selection step was performed based on the collinearity analysis. This is crucial for the reproducibility of the study.
3. The study is correctly framed as a "small-sample prediction" problem. The cross-validation approach is appropriate, but the paper would be strengthened by a more in-depth discussion of the challenges and risks of applying a neural network, a data-intensive method, to a dataset of only 50 samples. A brief discussion of the potential for overfitting, even with regularization and cross-validation, and of how the model's architecture was kept simple to mitigate it, would add valuable context and demonstrate deeper engagement with the problem's limitations.
Minor Comments
1. The phrase "per unit area yielded" is used as a verb in several places (for example, "per unit area yielded fifty datasets," "IWOA per unit area yielded 0.185," and "Fangzheng County per unit area yielded error magnitudes..."). This is grammatically incorrect. Please revise these sentences to use correct phrasing.
2. The sentence beginning on line 606 appears to be incomplete or syntactically incorrect: "...minor systematic biases that persist in construction for phenological critical windows...". Please revise this sentence for clarity.
3. The text on line 437 reports the MAPE for the IWOA-BP model as "0.59%," yet Table 2 lists the value as "0.59." Since the MAPE formula includes multiplication by 100, the value in the table is likely already a percentage. Please ensure this metric is described consistently in the text and the tables.
4. The map of Heilongjiang Province in Figure 1 is of low resolution. A higher-quality image would be helpful for the reader.

Recommendation
The research in this manuscript is both useful and well executed. The proposed IWOA-BP model has considerable potential for yield prediction. However, the lack of clarity on key methodological issues, especially how the input features are constructed and how multicollinearity is handled, makes it impossible to fully judge the work's scientific soundness and reproducibility.

Author Response

Response to Reviewer 1 Comments

 

1. Summary

 

 

Thank you for the careful and constructive reviews. We have revised the manuscript and, where appropriate, the supplementary materials, and we believe the paper is now substantially improved. All textual amendments are marked with track changes in the resubmitted files, and every point is addressed in the detailed, point-by-point responses. We appreciate the reviewers’ contributions and have aimed to address each concern clearly and transparently.

2. Questions for General Evaluation

Does the introduction provide sufficient background and include all relevant references? Can be improved.

Are all the cited references relevant to the research? Can be improved.

Is the research design appropriate? Can be improved.

Are the methods adequately described? Can be improved.

Are the results clearly presented? Can be improved.

Are the conclusions supported by the results? Can be improved.
 

3. Point-by-point response to Comments and Suggestions for Authors

Comments 1: A critical methodological detail is unclear. The study uses monthly data for eight input variables to predict a single annual yield value. The total sample size is stated to be fifty datasets (5 regions over 10 years). It is not specified how the twelve-monthly values for each of the eight variables are handled as inputs for the model.

·Are the monthly values for the eight variables aggregated into annual statistics (e.g., growing season average temperature, total precipitation)?

·Or, are all 96 monthly data points (12 months × 8 variables) used as input features to predict the single annual yield for each of the 50 samples?

Response 1: Thank you. We appreciate your careful reading and your emphasis on methodological transparency and reproducibility. Your comment precisely identifies a point that could confuse readers about how our input features are constructed from the monthly records. The manuscript states that we predict a single annual yield for each region–year and that the dataset comprises fifty observations drawn from five regions over ten years. However, it does not explicitly explain how the twelve monthly observations for each of the eight environmental variables are transformed into model inputs. Specifically, the procedures are ambiguous as to whether the monthly series are first aggregated into annual or growing-season statistics, such as average temperatures and total precipitation, or whether all ninety-six raw monthly values per sample are used directly as predictors. This lack of detail makes it difficult for readers to reproduce the feature construction and to understand the dimensionality and scale of the final input set used in the reported models. To address these issues, we have supplemented the article with an annual statistical chart covering all eight factors for the decade spanning 2015 to 2024: Figure 2. Annual statistics for 2015 to 2024 based on eight factors.
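For illustration, collapsing monthly records into one growing-season feature vector per region–year, as discussed above, might be sketched as follows. The record layout, field names, and values here are hypothetical, not taken from the paper:

```python
# Hypothetical monthly records for one region-year (field names are illustrative).
monthly = [
    {"region": "RegionA", "year": 2015, "month": m, "temp": 10 + m, "precip": 30.0}
    for m in range(1, 13)
]

GROWING = range(5, 10)  # May-September growing season

def annual_features(rows):
    """Collapse monthly rows into one feature vector per (region, year):
    mean for temperature, total for precipitation, off-season months dropped."""
    acc = {}
    for r in rows:
        if r["month"] not in GROWING:
            continue  # discard off-season months
        key = (r["region"], r["year"])
        a = acc.setdefault(key, {"temp_sum": 0.0, "precip_total": 0.0, "n": 0})
        a["temp_sum"] += r["temp"]
        a["precip_total"] += r["precip"]
        a["n"] += 1
    return {
        k: {"temp_mean": a["temp_sum"] / a["n"], "precip_total": a["precip_total"]}
        for k, a in acc.items()
    }

feats = annual_features(monthly)
# May-Sep temps are 15..19, so temp_mean is 17.0; precip_total is 5 * 30.0 = 150.0
```

This reduces each region–year from 12 monthly rows to a single feature vector, which is what keeps the sample count at fifty rather than six hundred.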

Comments 2: The authors perform an excellent correlation analysis and correctly identify a high degree of multicollinearity among the independent variables, particularly between soil and air temperature metrics (e.g., correlation between X1 and X4 is 0.95). In the discussion of this analysis, they propose a sound strategy: “It is hypothesized that the X6 and X7 may serve as core inputs, with the X3 and X5 providing supplementary data as required...”. However, the results section does not specify which combination of variables was ultimately used in the final, reported models. Please clarify if all eight variables were retained as inputs or if a feature selection process was conducted based on the collinearity analysis. This is crucial for the reproducibility of the study.

Response 2: Thank you. We appreciate your careful reading and your focus on reproducibility and model transparency. Your comment rightly points out a potential disconnect between the collinearity findings, the strategy outlined in the discussion, and what was actually implemented in the reported models. The manuscript documents strong multicollinearity, especially between soil and air temperature variables, and proposes using X6 and X7 as core inputs with X3 and X5 as supplementary. However, the Results section did not explicitly state which predictors were ultimately included in the final models, leaving it unclear whether all eight growing-season variables were retained or whether a reduced subset was used after the collinearity analysis. This ambiguity makes it difficult for readers to know the final input dimensionality and to reproduce the pipeline. We added a clear statement in the text specifying the final input specification used in all reported models. The text now states that the final models employ a four-variable set, X6, X7, X3, and X5, and that an eight-variable model is retained only as a baseline for comparison. This addition resolves the ambiguity and aligns the Results section with the rationale given in the collinearity analysis.

Comments 3: The manuscript is correctly framed as a “small-sample prediction” problem. While the cross-validation approach is appropriate, the manuscript would be strengthened by a more in-depth discussion of the inherent challenges and risks of applying a neural network, a data-intensive method, to a dataset of only 50 samples. A brief discussion on the potential for overfitting, even with regularization and cross-validation, and how the model's architecture was kept simple to mitigate this, would add valuable context and demonstrate a deeper engagement with the problem's limitations.

Response 3: Thank you. We appreciate the reviewer’s focus on input definition and reproducibility. Your comment rightly asks us to be explicit about which predictors are used in the final models and how the monthly variables are transformed into model inputs. The manuscript described strong collinearity and discussed a compact specification, but the Results did not clearly state the exact combination of predictors used in the reported models, nor did it make explicit whether monthly measurements were aggregated to annual features or used directly. This ambiguity affects readers’ ability to reproduce the pipeline and understand the final input dimensionality. We added a clear statement to the Methods that all reported models use a four-variable input set derived from the collinearity analysis: X6, X7, X3, and X5. We clarify that monthly variables are aggregated to growing-season statistics for May to September, using means for temperature and soil moisture and totals for precipitation. We note that an eight-variable baseline with X1–X8 is included for reference, but unless otherwise stated, results refer to the four-variable specification. We also state that inputs are z-scored within training folds only and that performance is evaluated with year-stratified five-fold cross-validation. Thank you for raising this query; we have now added the relevant explanation in the text.
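As a minimal sketch of the no-leakage standardization described above (not the authors' code; toy numbers, a single feature, sample standard deviation assumed): the z-score parameters are fitted on the training fold only and then applied unchanged to the held-out fold.

```python
import statistics

def zscore_fit(train_col):
    # Scaling parameters come from the training fold only (prevents leakage)
    mu = statistics.mean(train_col)
    sd = statistics.stdev(train_col)
    return mu, sd

def zscore_apply(col, mu, sd):
    return [(x - mu) / sd for x in col]

train = [10.0, 12.0, 14.0, 16.0, 18.0]  # training fold
held_out = [20.0]                        # validation fold

mu, sd = zscore_fit(train)               # mu = 14.0, sd = sqrt(10)
train_z = zscore_apply(train, mu, sd)    # mean of train_z is exactly 0
held_out_z = zscore_apply(held_out, mu, sd)  # reuses the training-fold mu and sd
```

Refitting `mu` and `sd` on the held-out fold (or on all fifty samples at once) would leak test-set statistics into training, which matters most in exactly this small-sample regime.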

Comments 4: There are several instances of the phrase “per unit area yielded” used as a verb (e.g., “per unit area yielded fifty datasets”, “IWOA per unit area yielded 0.185”, “Fangzheng County per unit area yielded error magnitudes...”). This is grammatically incorrect. Please revise these sentences to use correct phrasing.

Response 4: Thank you. We appreciate your careful reading and your attention to grammatical clarity and consistency. Several sentences incorrectly used the phrase “per unit area yielded” as a verb, which conflates a unit-modifying noun phrase with a verbal predicate and can confuse readers about whether a statement refers to numerical results or to the definition of yield per unit area. We conducted a manuscript-wide edit to correct this usage. We standardized the noun phrase to “yield per unit area” and, where a verb was needed, we used grammatically correct verbs such as “yielded,” “produced,” “returned,” or “achieved” without attaching “per unit area.” These corrections were applied consistently across the Abstract, Methods, Results, Discussion, and all figure and table captions to ensure uniform style and clarity.

Comments 5: The sentence beginning on line 606 appears to be incomplete or syntactically incorrect: “...minor systematic biases that persist in construction for phenological critical windows...”. Please revise this sentence for clarity.

Response 5: Thank you. We appreciate your careful reading and attention to sentence clarity and completeness. You noted that the sentence beginning on line 606 was incomplete and syntactically incorrect, specifically the clause “…minor systematic biases that persist in construction for phenological critical windows…,” which obscured the intended meaning. We rewrote the sentence for clarity and grammatical correctness and placed the revised text in the manuscript. The sentence now reads: “These limitations may introduce minor systematic biases, particularly in defining phenological critical windows. To mitigate them, future work will deepen the nonlinear characterization of rainfall and soil moisture, incorporate higher spatiotemporal resolution multisource remote sensing and reanalysis data, and integrate transfer learning with domain adaptation.”

Comments 6: In the text on line 437, MAPE for the IWOA-BP model is reported as “0.59%,” whereas Table 2 lists the value as “0.59.” Since the MAPE formula includes multiplication by 100, the value in the table is likely already a percentage. Please ensure consistency in how this metric is described in the text and tables.

Response 6: Thank you. We appreciate your careful check of unit consistency for error metrics. You noted a mismatch between the narrative, which reported the IWOA-BP MAPE as “0.59%,” and Table 2, which listed “0.59,” creating ambiguity about whether values were expressed as percentages or fractions. We standardized the presentation by changing the narrative value from “0.59%” to “0.59” to match Table 2.
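A small worked example of why the table value is already a percentage: since the MAPE definition multiplies by 100, writing "0.59" and then appending a separate "%" double-counts the unit. The numbers below are purely illustrative, not from the paper:

```python
def mape(y_true, y_pred):
    # The x100 factor is part of the MAPE definition,
    # so the returned value is already a percentage.
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

actual = [100.0, 200.0, 150.0]
predicted = [99.0, 202.0, 151.5]
# each relative error is 0.01, so MAPE comes out near 1.0, i.e. one percent,
# and should be reported as "1.0" (percent) rather than "1.0%" of a percent
print(mape(actual, predicted))
```

Whichever convention the authors adopt, the key point the reviewer raises holds: text and tables must state the same unit for the same number.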

Comments 7: The map of Heilongjiang Province in Figure 1 is of low resolution. If possible, providing a higher-quality image would be beneficial for the reader.

Response 7: Thank you. We appreciate your attention to figure quality and readability. You noted that the map of Heilongjiang Province in Figure 1 was of low resolution, which could impede the legibility of geographic details and labels. We replaced Figure 1 with a higher-quality image and updated the manuscript at line 153. Thank you again for your thoughtful consideration. We have carefully addressed all feedback and believe the manuscript has been strengthened. We are happy to provide any additional materials upon request and can coordinate directly with the editorial office as needed. We appreciate your time and look forward to your decision.

4. Response to Comments on the Quality of English Language

Point 1: The English could be improved to more clearly express the research.

Response 1: Thank you for the suggestion. We have refined the language throughout to improve clarity and readability, and all edits are marked in track changes in the revised manuscript.

5. Additional clarifications

If we have misunderstood any of your comments, please let us know and we will be happy to clarify further.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Summary of the paper

The paper proposes a yield prediction model for millet cultivated in the cold, dry regions of Heilongjiang Province (China). The authors collected field and meteorological data (2014-2023) from five sites, including eight hydrothermal variables. They compared three approaches: standard BP neural network, WOA-BP, and an improved version (IWOA-BP) that integrates Adaptive Inertia Weight and Elite Opposition-Based Learning to enhance optimization. The IWOA-BP model showed the best performance (RMSE = 2.74, R2 = 0.94, MAPE = 0.59%), both in temporal cross-validation and in cross-regional validation using an independent county. The authors also designed a rolling forecast scheme for 2024, showing improved accuracy over the growing season. The paper claims that the IWOA-BP model can provide operational forecasting support for millet yield in small-sample scenarios.

Major comments

Lines 88-148: the description of data sources is detailed, but information about data accessibility and reproducibility is limited. The authors should clarify whether the datasets (meteorological, soil, and yield) are publicly available or if they can be shared upon request, as stated in the “Data Availability” section. This is essential for reproducibility.

Lines 261-346: the mathematical description of the BP and IWOA-BP algorithms is correct but very dense and sometimes lacks explanations for non-specialist readers. A schematic or pseudocode would improve readability.

Lines 349-433: the experimental setup is clearly described, but no information is given about how hyperparameter tuning was validated to avoid data leakage (e.g., tuning on training vs. validation sets). Please clarify the data splits (train/validation/test) explicitly.

Lines 496-516: in the regional validation, the Lanxi site showed weaker performance. The authors mention feature engineering as a potential solution, but this should be discussed more systematically in the Discussion section, including potential causes (e.g., data quality, local heterogeneity).

Lines 518-577: the argument is interesting but mostly restates results. Consider expanding the analysis of the limitations of annual aggregation and discussing how monthly or phenological data could be incorporated in future studies. This would strengthen the methodological contribution.

Minor comments

Line 13: “hydrotemperate”, likely a typo; should be “hydrothermal”.

Lines 74, 102, 173: “precipitaion”, typo; should be “precipitation”.

Author Response

Please refer to the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The paper proposes an improved BP neural network model optimized by an enhanced Whale Optimization Algorithm (IWOA-BP) for predicting millet yield in northern China's cold climate regions. The enhancement incorporates Adaptive Inertia Weight and Elite Opposition-Based Learning to improve hyperparameter optimization, achieving superior performance (RMSE=2.74, R²=0.94) compared to baseline BP and standard WOA-BP approaches across five counties from 2014-2023. Based on this contribution, I offer the following detailed assessment.

Remarks:

  • Why was logarithmic transformation chosen over square root transformation for X7 and X8? I would like to see a comparison table showing model performance with both methods.
  • I would like to see a feature importance analysis showing which of the eight input variables contribute most to yield prediction.
  • How exactly were the 600 monthly rows aggregated to 50 annual samples? I would like to see explicit description of the aggregation method and confirmation that standardization was performed only on training data.
  • I would like to see hyperparameter sensitivity plots showing how RMSE changes when H, lr, and reg vary by ±20% from optimal values.
  • Were the three models (BP, WOA-BP, IWOA-BP) initialized with the same random seed and stopping criteria? I would like to see explicit confirmation of identical experimental conditions.
  • I would like to see effect sizes (Cohen's d) reported alongside the Wilcoxon p-values to quantify practical significance, not just statistical significance.
  • Why were only 20 whales used in the population for WOA and IWOA? I would like to see justification for this population size or a sensitivity analysis with populations of 10, 20, 30, and 40.
  • How was the 1-2 month lag term for X7 and X8 determined? I would like to see cross-correlation analysis or ablation study comparing lag terms of 0, 1, 2, and 3 months.
  • I would like to see the activation functions explicitly stated for the hidden layer and output layer of the BP network, as this is not mentioned in Section 2.3.1.
  • What early stopping criterion was used for BP training? I would like to see the specific patience value and validation loss threshold documented.
  • How were multicollinearity issues handled given the high correlations (0.84-0.95) between soil and air temperatures? I would like to see VIF values or ridge regression comparison.
  • I would like to see the training time comparison between BP, WOA-BP, and IWOA-BP to assess computational efficiency trade-offs.
  • Why was the regularization coefficient range set to 0-0.10? I would like to see justification for this upper bound or results testing higher values up to 0.50.
  • I would like to see residual plots by year and by location to check for systematic bias patterns that could indicate missing variables or interactions.
  • How was overfitting controlled beyond regularization? I would like to see documentation of dropout rates, batch normalization, or other regularization techniques if used.
  • I would like to see the actual yield ranges for each county documented, as Table 1 only shows aggregated statistics across all locations.
  • What software packages and versions were used for implementation? I would like to see complete reproducibility information including MATLAB toolbox versions.
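As context for the multicollinearity remark above, a variance inflation factor is obtained by regressing each predictor on the remaining predictors and computing 1/(1 − R²). The sketch below uses synthetic data purely to illustrate the calculation; it is not the paper's data or code:

```python
import numpy as np

def vif(X):
    """Variance inflation factors: regress each column on the remaining columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        y = X[:, j]
        # Intercept plus all other predictors as the design matrix
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        factors.append(1.0 / (1.0 - r2))
    return factors

# Synthetic demo: x2 nearly duplicates x1, so both get a very large VIF,
# while the independent x3 stays near 1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)
x3 = rng.normal(size=200)
print(vif(np.column_stack([x1, x2, x3])))
```

A common rule of thumb flags VIF values above 5 or 10 as problematic, which the reported 0.84–0.95 correlations would very likely exceed.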

Remarks on Tables and Figures:

  • Table 1 needs clearer column headers—"Standard Deviation" should specify if this is sample or population standard deviation, and units should be included for all temperature and moisture variables.
  • Figure 2(j) is difficult to interpret because the lines overlap significantly. I would like to see this redesigned as separate subplots for each county or use distinct line styles with thicker lines and larger markers.
  • Table 5 should include confidence intervals or standard errors for each metric (RMSE, R², MAE, MAPE, RPD) to indicate statistical reliability of the regional differences reported.

 

Author Response

Please refer to the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

The submitted article addresses the important and topical issue of millet yield forecasting in northern China, where climatic and soil constraints significantly affect crop production stability. The topic is in line with global trends in the use of artificial intelligence methods and hybrid models for the analysis of complex, non-linear environmental data. The authors applied an improved version of the population optimisation algorithm (Improved Whale Optimisation Algorithm – IWOA) to calibrate a BP (Backpropagation) neural network, achieving – according to the results presented – high fit and accuracy rates for the forecasts.
The introductory part of the article is well written and provides a valuable introduction to the topic, placing the research firmly within the relevant literature. The authors correctly point out the difficulties in applying classical analysis methods when data are limited, heterogeneous and non-linear. The scope of the work covers data from ten years of observations in five locations in Heilongjiang Province, which constitutes an interesting research base.
However, the research methodology raises some concerns. The set of eight meteorological variables used (monthly soil and air temperature values, precipitation totals and soil moisture) is insufficient to accurately reflect the growth and yield of millet under field conditions. The use of monthly data significantly limits the model's ability to capture the dynamics of physiological processes occurring in the plant, especially since millet in these climatic conditions is a species with a short, approximately three-month growing season. As a result, some of the data outside the growing season may introduce noise and distort the structure of the model. In this situation, it would be advisable to consider using daily or at least seasonal data, taking into account key indicators such as the sum of effective temperatures, number of days with precipitation, sunshine, soil type and agrotechnical conditions.
From a methodological point of view, it is also important that the authors refer to DSSAT-CERES simulation models, but do not attempt to compare or integrate them with the proposed solution. The lack of such a reference limits the possibility of assessing the real advantage of the developed IWOA-BP model over existing prediction systems.
The results presented in the article (R² = 0.94, RMSE = 2.74, MAPE = 5.9) indicate a very good fit of the model, but in the absence of adequate validation and proper selection of variables, they may result from overfitting rather than actual predictive power. Additional interregional cross-validation is an interesting element, but it does not solve the problem of insufficient biologically justified data.
In summary, despite an interesting idea and a modern computational approach, the article has serious substantive shortcomings in terms of input data selection, modelling concepts and interpretation of results. In its current form, the work does not meet the criteria of methodological reliability required in scientific publications on crop yield modelling. A thorough reformulation of the methodological part, an extension of the set of variables and an adjustment of the time scale of the data to actual biological processes would be necessary before the article could be recommended for publication.

Author Response

Please refer to the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

To the Authors,
Thank you for sending the revised manuscript and the detailed responses. I have reviewed your point-by-point answers and the changes you have made to the text.
I am satisfied that all of my concerns have been fully addressed. I have accepted these changes, and the manuscript is now much improved.

I am pleased with the final result; these substantial revisions have strengthened the paper.

Author Response

Response to Reviewer 1

Thank you very much for your careful reading and encouraging assessment. We appreciate your constructive suggestions during the review, which helped us clarify the methodology, strengthen the empirical evidence, and improve the presentation. We have implemented all requested revisions and verified that the updated text, tables, and explanations are consistent throughout the manuscript. We are pleased that the revisions meet your expectations and that the manuscript has been substantially improved. We remain available for any further clarifications the editor may require and are grateful for your time and support.

Reviewer 2 Report

Comments and Suggestions for Authors

Dear Authors,

Thank you for sending in a revised version of your manuscript together with a detailed cover letter addressing the comments.

I have carefully reviewed your revisions and am now very happy to inform you that the changes you have addressed do indeed satisfy all the comments I raised.

Congratulations and good luck on the publishing of your work.

Best regards.

Author Response

Response to Reviewer 2

Thank you very much for your careful reading and encouraging assessment. We appreciate your constructive suggestions during the review, which helped us clarify the methodology, strengthen the empirical evidence, and improve the presentation. We have implemented all requested revisions and verified that the updated text, tables, and explanations are consistent throughout the manuscript. We are pleased that the revisions meet your expectations and that the manuscript has been substantially improved. We remain available for any further clarifications the editor may require and are grateful for your time and support.

Reviewer 3 Report

Comments and Suggestions for Authors

Thank you to the authors for their substantial effort in addressing the reviewer comments. The manuscript has been significantly strengthened through the addition of methodological details, sensitivity analyses, and transparency measures.

  • Include the transformation comparison table (raw/square-root/log) in manuscript or supplementary materials instead of providing only on request.
  • The Lanxi case needs concrete follow-up—proposed solutions are mentioned but not tested or implemented.
  • Provide a more operational roadmap for incorporating monthly/phenological predictors within sample-size constraints.
  • Table 5 needs explicit column headers for standard deviation columns; Figure 2j requires further visual refinement.
  • Add learning curve evidence to demonstrate the model hasn't memorized training data despite the small sample size.

Author Response

Please refer to the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

The authors focused mainly on justifying their research approach, which differs from the solutions used in large, comprehensive studies. Their extensive explanations have also been included in the content of the paper, which has changed my previous assessment from negative to conditionally acceptable. The inclusion of detailed explanations allows the reader to independently assess the validity of the methods used.
However, it should be emphasised that my assessment refers to the arguments contained in the original review, in which I assumed that the authors would decide to withdraw the article in order to supplement it with new data. Assuming positive opinions from other reviewers, it is likely that the paper will be accepted for publication. In my opinion, the article makes a significant contribution to the development of modelling methods published in the journal Agronomy, although it still falls short of perfection in terms of biometric aspects.
In summary, after the corrections have been made, the paper reliably presents the limitations and imperfections of the methods used. I assess the substantive level as moderate, the potential interest of readers as average, while in terms of editing and style, the article can be considered acceptable.