Deep Transfer Learning for UAV-Based Cross-Crop Yield Prediction in Root Crops
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The manuscript presents a deep transfer learning framework that leverages multispectral UAV data to perform cross-crop yield prediction, transferring knowledge from potato to sweetpotato. The authors integrate convolutional, recurrent, and attention-based neural network components to construct a hybrid architecture and evaluate its performance across multiple growth stages and feature subsets. The study provides a comprehensive experimental design, including detailed field trials, feature engineering, and a methodologically rigorous comparison among CNN, BiGRU, BiLSTM, and the proposed hybrid model. Overall, the manuscript is well-organized, addresses a timely topic in precision agriculture, and contributes to bridging data gaps in cross-crop modeling.
- The manuscript lacks a dedicated "Discussion" section, which is crucial for contextualizing results, implications, and limitations beyond the brief mentions in the Conclusions. Recommend adding a Discussion section before the Conclusions, where the authors could address: (1) the hybrid model's sensitivity to redundant features at high predictor counts (11F–13F), including causes of greater impact compared to baselines and potential mitigations; and (2) stage-specific performance variations, with peak accuracy at tuberization, linking these to underlying biological processes and exploring adaptations for stage-agnostic predictions.
- Abstract, lines 6-7: The description of the hybrid model refers to "convolutional and recurrent nueral netwoek along with attention mechanisms." "nueral netwoek" should be "neural network".
- Page 4, line 139: "WGS 84: 89°00′23.2′′W, ; 34°08′10.0′′N" contains an extra semicolon after "W,"; correct to remove it.
- Terms such as “sweetpotato” and “sweet potato” should be used consistently throughout the manuscript. Please check for uniform formatting.
- Page 7, Figure 2 caption: The caption "Figure 2. Schematic representation of the proposed Hybrid CNN–RNN–Attention architecture with parameter-based transfer learning" lacks a terminating period. Please add a period at the end for grammatical completeness and consistency with other captions in the manuscript.
- Page 11, line 352: "decline in the yield were observed" should be changed to "declines in yield were observed."
Author Response
The manuscript presents a deep transfer learning framework that leverages multispectral UAV data to perform cross-crop yield prediction, transferring knowledge from potato to sweetpotato. The authors integrate convolutional, recurrent, and attention-based neural network components to construct a hybrid architecture and evaluate its performance across multiple growth stages and feature subsets. The study provides a comprehensive experimental design, including detailed field trials, feature engineering, and a methodologically rigorous comparison among CNN, BiGRU, BiLSTM, and the proposed hybrid model. Overall, the manuscript is well-organized, addresses a timely topic in precision agriculture, and contributes to bridging data gaps in cross-crop modeling.
Response: The authors sincerely thank the Reviewer for their thoughtful and constructive feedback. We greatly appreciate the time and effort invested in reviewing our manuscript. We have carefully considered all remaining concerns. In response, we have revised the manuscript to address each point in detail, as outlined in our point-by-point reply.
Here is the point-by-point response to the Reviewer's comments and concerns. The responses are in bold. The changes made in the revised manuscript are highlighted in yellow.
1. The manuscript lacks a dedicated "Discussion" section, which is crucial for contextualizing results, implications, and limitations beyond the brief mentions in the Conclusions. Recommend adding a Discussion section before the Conclusions, where the authors could address: (1) the hybrid model's sensitivity to redundant features at high predictor counts (11F–13F), including causes of greater impact compared to baselines and potential mitigations; and (2) stage-specific performance variations, with peak accuracy at tuberization, linking these to underlying biological processes and exploring adaptations for stage-agnostic predictions.
Response: We thank the reviewer for this insightful recommendation. Section 5 of the revised manuscript now includes an expanded discussion that addresses feature robustness and stage-specific model performance, with explicit links to the underlying physiological basis of robust feature migration in cross-crop learning and to the importance of feature robustness for stage-agnostic prediction.
Two new subsections have been added: 5.3 Physiological Basis for Robust Feature Migration in Cross-Crop Transfer Learning (pages 14-16: Lines 478–530) and 5.5 Relevance of Feature Robustness in Model Performance (pages 19 and 20: Lines 629–670). Additionally, the linkage to underlying biological processes is included on page 19, Lines 623–628.
2. Abstract, lines 6-7: The description of the hybrid model refers to "convolutional and recurrent nueral netwoek along with attention mechanisms." "nueral netwoek" should be "neural network".
Response: Thank you for pointing this out. We have corrected the typographical error. The revision has been implemented in the Abstract on page 1: Line 21 of the revised manuscript.
3. Page 4, line 139: "WGS 84: 89°00′23.2′′W, ; 34°08′10.0′′N" contains an extra semicolon after "W,"; correct to remove it.
Response: Thank you for pointing this out. We have removed the unnecessary semicolon. The revision appears on Page 5: Line 180 of the revised manuscript.
4. Terms such as “sweetpotato” and “sweet potato” should be used consistently throughout the manuscript. Please check for uniform formatting.
Response: Thank you for this observation. We have reviewed the entire manuscript and ensured consistent use of the term “sweetpotato”. All instances have been standardized for consistency. These updates occur throughout the manuscript.
5. Page 7, Figure 2 caption: The caption "Figure 2. Schematic representation of the proposed Hybrid CNN–RNN–Attention architecture with parameter-based transfer learning" lacks a terminating period. Please add a period at the end for grammatical completeness and consistency with other captions in the manuscript.
Response: Thank you for noting this formatting issue. We have added a period at the end of the caption. The change is located in the caption of Figure 3 on Page 9.
6. Page 11, line 352: "decline in the yield were observed" should be changed to "declines in yield were observed."
Response: We agree with the reviewer’s suggestion. The sentence has been corrected, and the update can be found on pages 12 and 13: Lines 419-421 of the revised manuscript.
Additional clarifications:
In addition to the above comments, all spelling and grammatical errors have been corrected.
Reviewer 2 Report
Comments and Suggestions for Authors
This paper focuses on key issues in cross-crop yield prediction, highlighting its innovation and application value. Specific highlights include: First, addressing the challenge of yield prediction for data-scarce crops like sweet potato, a deep transfer learning framework is proposed, using potato as the source domain and sweet potato as the target domain. This breaks the data dependence of traditional single-crop modeling and provides a new path for precision agriculture research on niche crops. Second, the constructed CNN-RNN-Attention hybrid model achieves high-precision prediction using only 7 core spectral and canopy features, significantly outperforming baseline models requiring 11-13 features, demonstrating excellent balance between model efficiency and practicality. Third, combining UAV multispectral remote sensing and field experiments, the interaction between nitrogen levels and cover crops on sweet potato yield is systematically analyzed, providing conclusions with both theoretical and practical value for optimizing field management. Fourth, through strict feature space consistency assurance and temporal alignment strategies, as well as staged pre-training and selective fine-tuning methods, the applicability of cross-crop transfer learning in root crops is verified.
To further improve the quality and academic impact of the paper, the following specific revisions are proposed:
- The abstract could briefly explain the selection logic for the seven core features (e.g., whether it was based on feature importance ranking, correlation analysis, etc.) to enhance the persuasiveness of the research.
- The introduction should strengthen the analysis of the limitations of existing sweet potato yield prediction research, specifically presenting the differences between this study and existing "UAV + machine learning" methods in terms of feature quantity, model complexity, and prediction accuracy (e.g., R², RMSE).
- In the "UAV field imaging" section of the Materials and Methods, the density of ground control points (GCPs) (e.g., number per hectare) could be supplemented to ensure experimental reproducibility.
- Correlation heatmaps or variance analysis results for the 13 initial features need to be added, clarifying the selection method for the seven core features, and explaining the physiological or physical mechanisms by which these features exhibit stronger robustness in cross-crop migration scenarios.
- The model architecture section should include specific parameter settings for each module, such as the kernel size and number, pooling layer type, and the number of hidden neurons and layers in RNNs (BiGRU, BiLSTM), as well as the dimensions and calculation method of the attention mechanism.
- The results section should include scatter plots of actual and predicted values for different models at each growth stage, rather than simply presenting R² and RMSE values.
- Statistical significance tests of the interaction between nitrogen levels and cover crops should be added, such as an ANOVA table, clearly indicating whether the yield differences between different treatment combinations are significant (with p-values), enhancing the scientific rigor and soundness of the conclusions.
- The discussion section should delve into the underlying reasons for the model's performance degradation after the number of features exceeds seven.
- The research limitations section should specifically explain the model's generalization ability across different climates (e.g., temperate and tropical) and different sweet potato varieties, providing clearer guidance for future research directions.
- The references section should include the latest research findings on cross-crop transfer learning in the agricultural field from 2024 to 2025, especially literature related to root crops (such as potatoes and sweet potatoes).
Author Response
This paper focuses on key issues in cross-crop yield prediction, highlighting its innovation and application value. Specific highlights include: First, addressing the challenge of yield prediction for data-scarce crops like sweet potato, a deep transfer learning framework is proposed, using potato as the source domain and sweet potato as the target domain. This breaks the data dependence of traditional single-crop modeling and provides a new path for precision agriculture research on niche crops. Second, the constructed CNN-RNN-Attention hybrid model achieves high-precision prediction using only 7 core spectral and canopy features, significantly outperforming baseline models requiring 11-13 features, demonstrating excellent balance between model efficiency and practicality. Third, combining UAV multispectral remote sensing and field experiments, the interaction between nitrogen levels and cover crops on sweet potato yield is systematically analyzed, providing conclusions with both theoretical and practical value for optimizing field management. Fourth, through strict feature space consistency assurance and temporal alignment strategies, as well as staged pre-training and selective fine-tuning methods, the applicability of cross-crop transfer learning in root crops is verified.
To further improve the quality and academic impact of the paper, the following specific revisions are proposed:
Response: The authors sincerely thank the Reviewer for their thoughtful and constructive feedback. We greatly appreciate the time and effort invested in reviewing our revised manuscript. We have carefully considered all remaining concerns. In response, we have revised the manuscript to address each point in detail, as outlined in our point-by-point reply.
Here is the point-by-point response to the Reviewer's comments and concerns. The responses are in bold. The changes made in the revised manuscript are highlighted in yellow.
1. The abstract could briefly explain the selection logic for the seven core features (e.g., whether it was based on feature importance ranking, correlation analysis, etc.) to enhance the persuasiveness of the research.
Response: Thank you for this valuable suggestion. The Abstract has been revised to clarify the method used to assess feature robustness and the selection of transferable predictors, incorporating the reviewer's recommended analyses. The updated Abstract appears on Pages 1–2, Lines 24–40.
2. The introduction should strengthen the analysis of the limitations of existing sweet potato yield prediction research, specifically presenting the differences between this study and existing "UAV + machine learning" methods in terms of feature quantity, model complexity, and prediction accuracy (e.g., R², RMSE).
Response: Thank you for this recommendation. The Introduction has been substantially revised to provide a more precise and more comprehensive analysis of the limitations of existing sweetpotato yield-prediction studies. The revised text appears in the Introduction on Pages 3-4, Lines 103-132.
3. In the "UAV field imaging" section of the Materials and Methods, the density of ground control points (GCPs) (e.g., number per hectare) could be supplemented to ensure experimental reproducibility.
Response: Thank you for this helpful suggestion. We have now added the number of GCPs deployed for UAV imaging in our study area, along with the corresponding GCP density computation (GCPs/ha), to enhance transparency and reproducibility. The revision can be found on Page 6: Lines 225-228.
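Illustrative note: the GCP density mentioned in this response is simply the number of ground control points divided by the surveyed area in hectares. The sketch below shows that arithmetic; the function name, the GCP count, and the field area are hypothetical placeholders and are not values taken from the manuscript.

```python
def gcp_density_per_ha(num_gcps: int, area_m2: float) -> float:
    """Return ground control points per hectare (1 ha = 10,000 m^2)."""
    if area_m2 <= 0:
        raise ValueError("area_m2 must be positive")
    area_ha = area_m2 / 10_000.0
    return num_gcps / area_ha


# Hypothetical example (not from the manuscript): 8 GCPs over 20,000 m^2.
example_density = gcp_density_per_ha(8, 20_000.0)
print(f"{example_density:.1f} GCPs/ha")
```

Reporting both the raw count and the density lets readers compare georeferencing effort across fields of different size.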
4. Correlation heatmaps or variance analysis results for the 13 initial features need to be added, clarifying the selection method for the seven core features, and explaining the physiological or physical mechanisms by which these features exhibit stronger robustness in cross-crop migration scenarios.
Response: We thank the reviewer for this insightful recommendation. We have moved feature extraction from Section 2.2 to 2.3 and added more detail on the selection of 13 initial features based on the source-domain datasets. These details can be found on pages 7 and 8: Lines 243-251.
In addition, we have added an explanation of the mechanism for robust feature migration in a cross-crop learning scenario. A new subsection has been added to the Results and Discussion section, namely Subsection 5.3: Physiological Basis for Robust Feature Migration in Cross-Crop Transfer Learning. The revision is on pages 14-16: Lines 478–530.
5. The model architecture section should include specific parameter settings for each module, such as the kernel size and number, pooling layer type, and the number of hidden neurons and layers in RNNs (BiGRU, BiLSTM), as well as the dimensions and calculation method of the attention mechanism.
Response: Thank you for this important suggestion. We have revised the Model Architecture section to include detailed parameter settings for all modules. Specifically, we now report the convolutional kernel sizes and counts, pooling layer types, the number of layers and hidden units used in the BiGRU and BiLSTM networks, and the dimensionality and computation procedure of the attention mechanism. The changes in the revised manuscript can be found in Subsection 3.2: Hybrid Deep Transfer Learning Model Architecture. Additionally, we have added Supplementary Table S1, which summarizes the essential aspects of the model architecture, including the parameter settings mentioned.
6. The results section should include scatter plots of actual and predicted values for different models at each growth stage, rather than simply presenting R² and RMSE values.
Response: Thank you for this thoughtful suggestion. We acknowledge the value of 1:1 scatter plots for visualizing the agreement between predicted and actual yields. However, generating scatter plots for every model (CNN, BiGRU, BiLSTM, Hybrid) across all five growth stages would result in a very large number of figures, making the Results section excessively long and difficult to interpret.
To ensure clarity and conciseness, we instead present all model outcomes in tabulated form (Tables 4-7), which fully report the R² and RMSE metrics for each model-stage combination. In addition, Figures 7 and 8 provide aggregated visual summaries of model behavior across feature subsets and growth stages, enabling meaningful comparison without overextending the manuscript.
For these reasons, we respectfully retain the tabular and summarized visual formats, which effectively communicate the predictive performance while maintaining readability.
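Illustrative note: the two metrics discussed in this exchange can be computed directly from paired actual and predicted yields, and the size of the requested scatter-plot grid follows from the four models and five growth stages named above. The sketch below assumes the coefficient-of-determination form of R² (the manuscript's exact definition is not stated here), and the yield values are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up yields, purely for illustration.
actual_yield = [10.0, 12.0, 14.0, 16.0]
predicted_yield = [11.0, 12.0, 13.0, 16.0]
print(round(r_squared(actual_yield, predicted_yield), 3))
print(round(rmse(actual_yield, predicted_yield), 3))

# One scatter panel per model-stage combination, as the reviewer requested.
models = ["CNN", "BiGRU", "BiLSTM", "Hybrid"]
n_growth_stages = 5
print(len(models) * n_growth_stages, "panels")
```

A grid of this size could also be placed in supplementary material rather than in the main Results section.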
7. Statistical significance tests of the interaction between nitrogen levels and cover crops should be added, such as an ANOVA table, clearly indicating whether the yield differences between different treatment combinations are significant (with p-values), enhancing the scientific rigor and soundness of the conclusions.
Response: Thank you for this important suggestion. We have now added a full two-way ANOVA to evaluate the main effects of nitrogen level and cover crop, as well as their interaction, on sweetpotato yield. The updates can be found on Page 14, Lines 450-460 and 474-477. Table 3 for ANOVA has also been added on Page 14.
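Illustrative note: a two-way ANOVA partitions yield variation into two main effects, their interaction, and residual error. The sketch below computes sums of squares and F statistics for a balanced, fully crossed design only. It is not the authors' analysis: it ignores blocking and year effects, the factor levels and yields are made-up placeholders, and p-values are omitted because they require an F distribution (for example `scipy.stats.f.sf(F, df_effect, df_error)`).

```python
def two_way_anova_balanced(cells):
    """Sums of squares, degrees of freedom, and F statistics.

    `cells` maps (level_a, level_b) -> list of replicate observations.
    Every combination must be present with the same number (>= 2) of replicates.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    na, nb = len(a_levels), len(b_levels)
    if na < 2 or nb < 2 or len(cells) != na * nb:
        raise ValueError("need a fully crossed design with >= 2 levels per factor")
    n = len(cells[(a_levels[0], b_levels[0])])
    if n < 2 or any(len(v) != n for v in cells.values()):
        raise ValueError("design must be balanced with >= 2 replicates per cell")

    def mean(xs):
        return sum(xs) / len(xs)

    grand = mean([x for v in cells.values() for x in v])
    a_mean = {a: mean([x for b in b_levels for x in cells[(a, b)]]) for a in a_levels}
    b_mean = {b: mean([x for a in a_levels for x in cells[(a, b)]]) for b in b_levels}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    ss = {
        "A": nb * n * sum((a_mean[a] - grand) ** 2 for a in a_levels),
        "B": na * n * sum((b_mean[b] - grand) ** 2 for b in b_levels),
        "AxB": n * sum(
            (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
            for a in a_levels
            for b in b_levels
        ),
        "error": sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v),
    }
    df = {"A": na - 1, "B": nb - 1, "AxB": (na - 1) * (nb - 1), "error": na * nb * (n - 1)}
    ms_error = ss["error"] / df["error"]
    f_stat = {k: (ss[k] / df[k]) / ms_error for k in ("A", "B", "AxB")}
    return ss, df, f_stat


# Made-up yields: factor A = nitrogen level, factor B = cover crop treatment.
example_cells = {
    (56, "fallow"): [10.0, 12.0],
    (56, "cover"): [14.0, 16.0],
    (112, "fallow"): [20.0, 22.0],
    (112, "cover"): [18.0, 20.0],
}
ss, df, f_stat = two_way_anova_balanced(example_cells)
print(ss)
print(f_stat)
```

For unbalanced data or designs with blocks, a linear-model routine from a statistics package would be the more appropriate tool.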
8. The discussion section should delve into the underlying reasons for the model's performance degradation after the number of features exceeds seven.
Response: We thank the reviewer for this insightful recommendation. The revised manuscript now includes a dedicated discussion of why model performance declines when the feature set exceeds seven predictors. This analysis is incorporated into the newly added Subsection 5.5: Relevance of Feature Robustness in Model Performance (pages 19-20, lines 629–670).
9. The research limitations section should specifically explain the model's generalization ability across different climates (e.g., temperate and tropical) and different sweet potato varieties, providing clearer guidance for future research directions.
Response: Thank you for this valuable suggestion. We have now updated the Conclusion to explicitly address the model’s generalization ability and limitations across contrasting climatic regions and across different sweetpotato cultivars. In addition, the entire Conclusion section has been revised to reflect the broader updates made throughout the manuscript in response to the reviewers’ comments.
The revised text can be found on Pages 20-21: Lines 678-715.
10. The references section should include the latest research findings on cross-crop transfer learning in the agricultural field from 2024 to 2025, especially literature related to root crops (such as potatoes and sweet potatoes).
Response: Thank you for this critical recommendation. We have updated the reference list to include recent studies relevant to transfer learning in agriculture, with particular attention to root and tuber crops. However, we found that very few 2024–2025 studies specifically address cross-crop transfer learning for root crops such as sweetpotato and potato. The relevant works have been incorporated into the Introduction.
Additional clarifications:
In addition to the above comments, all spelling and grammatical errors have been corrected.
Reviewer 3 Report
Comments and Suggestions for Authors
In this manuscript, the authors integrated CNN, RNN and attention mechanisms to predict crop yield. Specifically, they incorporated transfer learning and predicted sweet potato yield using potato data as the source. Generally, the predicted accuracy was fair. The structure of the paper is rather straightforward, and the presentation of the methodology, results and discussion are fine. I list my general comments and detailed suggestions as follows.
General Comments:
1. The authors emphasized that they incorporated RNN and mentioned five growth stages, thus I suppose temporal information is vital in the model predicting mechanism, and the authors mentioned “experiments were conducted under varying temporal lengths”. However, in the results section (5.3), the results were presented separately for different stages, and my impression is that the results were derived from learning of single stage. If not, I would like the authors to clarify how learning was facilitated with data collected from all five growth stages. My intuition is that factors from all five growth stages can affect the final yield, thus should be considered in a suite, not separately.
2. I would suggest the authors include a detailed table of the hyperparameters and the ranges of value used in the model calibration and validation. A supplementary document should suffice. These technical details help the readers to understand and replicate the modeling methods.
Detailed suggestions:
- Title: too general, given that you are virtually presenting a case study (potato -> sweet potato).
- Figure 1: I would suggest combining panels (b) and (c). Label the Lon and Lat of the experimental field in the figure as well, not just in the text. Besides, explain R1-R4 (repetition?)
- Line 156: 101.6 cm -> 1.02 m? (in the same format as the following 9.14 m)
- Line 225-226: I am not sure what it means by “duplicating the final time point of the potato dataset”.
- Line 377: Fallow not highest for 56 in 2023 or 112 in 2022.
- Figure 4: should be moved to after the paragraph between Lines 397-410.
- Line 417-423: should be moved to the methods section. I am not sure about the stepwise procedures of the indices. Does the order of them matter? Or is there a way to identify optimal suite of indices?
Author Response
In this manuscript, the authors integrated CNN, RNN and attention mechanisms to predict crop yield. Specifically, they incorporated transfer learning and predicted sweet potato yield using potato data as the source. Generally, the predicted accuracy was fair. The structure of the paper is rather straightforward, and the presentation of the methodology, results and discussion are fine. I list my general comments and detailed suggestions as follows.
Response: The authors sincerely thank the Reviewer for their thoughtful and constructive feedback. We greatly appreciate the time and effort invested in reviewing our revised manuscript. We have carefully considered all remaining concerns. In response, we have revised the manuscript to address each point in detail, as outlined in our point-by-point reply.
Here is the point-by-point response to the Reviewer's comments and concerns. The responses are in bold. The changes made in the revised manuscript are highlighted in yellow.
General Comments:
1. The authors emphasized that they incorporated RNN and mentioned five growth stages, thus I suppose temporal information is vital in the model predicting mechanism, and the authors mentioned “experiments were conducted under varying temporal lengths”. However, in the results section (5.3), the results were presented separately for different stages, and my impression is that the results were derived from learning of single stage. If not, I would like the authors to clarify how learning was facilitated with data collected from all five growth stages. My intuition is that factors from all five growth stages can affect the final yield, thus should be considered in a suite, not separately.
Response: We thank the reviewer for pointing this out. We have clarified that the model uses cumulative multitemporal data, in which each stage incorporates all observations from emergence onward (i.e., treated as a suite rather than as isolated stages). This clarification has been added to the revised manuscript on Page 16: Lines 537-541.
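Illustrative note: the cumulative multitemporal input described in this response can be pictured as a growing sequence, where the input for a given stage contains every observation up to and including that stage. The sketch below is one plausible reading of that description; the function name, the number of features per stage, and the feature values are hypothetical.

```python
def cumulative_sequences(stage_features):
    """Build one input sequence per growth stage.

    `stage_features` is a chronologically ordered list of per-stage feature
    vectors. The sequence for stage i holds stages 0..i inclusive.
    """
    return [stage_features[: i + 1] for i in range(len(stage_features))]


# Hypothetical data: five growth stages, two features per stage.
per_stage = [
    [0.21, 0.35],
    [0.40, 0.52],
    [0.63, 0.70],
    [0.71, 0.74],
    [0.66, 0.69],
]
sequences = cumulative_sequences(per_stage)
print([len(seq) for seq in sequences])  # sequence length grows by one per stage
```

Under this reading, a recurrent layer evaluated at a later stage still sees the earlier-stage observations, which is the "suite" behavior the reviewer asked about.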
2. I would suggest the authors include a detailed table of the hyperparameters and the ranges of value used in the model calibration and validation. A supplementary document should suffice. These technical details help the readers to understand and replicate the modeling methods.
Response: Thank you for this helpful suggestion. We have now included a detailed Table S1 of all hyperparameters and parameter ranges used for model calibration and validation in the Supplementary Materials. Please see the attached Supplementary Materials for more information.
3. Detailed suggestions: Title: too general, given that you are virtually presenting a case study (potato -> sweet potato).
Response: Thank you for the suggestion. We have revised the title to clearly indicate the crop category examined in this study. The updated title is: “Deep Transfer Learning for UAV-based Cross-Crop Yield Prediction in Root Crops.”
4. Figure 1: I would suggest combining panels (b) and (c). Label the Lon and Lat of the experimental field in the figure as well, not just in the text. Besides, explain R1-R4 (repetition?)
Response: Thank you for this constructive suggestion. Figure 1 has been revised to improve clarity and informativeness. We streamlined the layout, removed unnecessary elements, and updated the visual presentation to provide a more precise and concise depiction of the study site and field design.
5. Line 156: 101.6 cm -> 1.02 m? (in the same format as the following 9.14 m)
Response: Thank you for noticing this inconsistency. We have updated the unit format. The correction is reflected on Page 5: Line 181.
6. Line 224-225: I am not sure what it means by “duplicating the final time point of the potato dataset”.
Response: Thank you for raising this point. We agree that the original phrasing was unclear. In the revised manuscript, we now explicitly state that “duplicating the final time point” refers to a preprocessing step that matches the temporal and feature subset dimensions of the potato (source) and sweetpotato (target) datasets. This clarification has been added to Page 8: Lines 258-260 in the revised manuscript.
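Illustrative note: one plausible reading of "duplicating the final time point" is that the shorter source sequence is padded by repeating its last observation until it has as many time steps as the target sequence. The sketch below shows that padding step only; the function name, sequence lengths, and feature values are hypothetical, and the authors' actual preprocessing may differ.

```python
def pad_by_repeating_last(sequence, target_len):
    """Extend `sequence` to `target_len` time steps by copying its last step."""
    if not sequence:
        raise ValueError("sequence must contain at least one time step")
    if len(sequence) > target_len:
        raise ValueError("sequence is already longer than target_len")
    padding = [list(sequence[-1]) for _ in range(target_len - len(sequence))]
    return [list(step) for step in sequence] + padding


# Hypothetical source (potato) series with four time points and two features,
# padded to match a five-time-point target (sweetpotato) series.
potato_series = [[0.2, 0.3], [0.4, 0.5], [0.6, 0.7], [0.7, 0.8]]
aligned = pad_by_repeating_last(potato_series, 5)
print(len(aligned), aligned[-1])
```

Repeating the last step keeps the feature dimension unchanged, so the same network input shape can be used for both crops.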
7. Line 377: Fallow not highest for 56 in 2023 or 112 in 2022. Figure 4: should be moved to after the paragraph between Lines 397-410.
Response: Thank you for pointing this out. The suggestions have been incorporated based on the resulting figure.
8. Line 417-423: should be moved to the methods section. I am not sure about the stepwise procedures of the indices. Does the order of them matter? Or is there a way to identify optimal suite of indices?
Response: Thank you for this thoughtful recommendation. We have moved the procedural descriptions to Section 4: Transfer Learning Strategy to improve the logical flow (Pages 11 and 12: Lines 377-386).
Regarding the stepwise feature expansion, the order of indices does not impose any algorithmic constraint. The sequence and robustness were initially determined by the source-domain trained model using a robust feature subset. However, we did perform an analysis of cross-crop feature robustness migration to identify the best fit for transfer learning. The analysis can be found in Subsection 5.3: Physiological Basis for Robust Feature Migration in Cross-Crop Transfer Learning (pages 14 and 15: Lines 478–530).
Additional clarifications:
In addition to the above comments, all spelling and grammatical errors have been corrected.
