Author Contributions
Conceptualization, J.P.d.M.X. and K.S.; methodology, J.P.d.M.X.; software, J.P.d.M.X.; validation, J.P.d.M.X., K.S. and G.V.M.; formal analysis, J.P.d.M.X.; investigation, J.P.d.M.X.; resources, G.V.M.; data curation, J.P.d.M.X.; writing—original draft preparation, J.P.d.M.X.; writing—review and editing, K.S. and G.V.M.; visualization, J.P.d.M.X.; supervision, K.S.; project administration, K.S.; funding acquisition, G.V.M.; field experiment coordination, C.L.B. and R.S.; data collection support, M.R. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Location of the study area. (A) Brazil in South America. (B) Approximate position of the study area (red star) in western Paraná. (C) Satellite image showing field boundaries.
Figure 2.
Data processing and machine learning workflow.
Figure 3.
Smoothed HLS NDVI time-series for one representative growing season per crop: corn 2016 S2 (March–August), soybean 2014 S1 (October 2013–March 2014), and wheat 2013 S2 (April–September 2013). Filled points are the mean NDVI across all sampling points on each clear-sky acquisition date. The continuous line is the Whittaker smoother (λ = 100) fitted to the irregularly spaced observations. The irregular spacing of acquisition dates reflects the combined Landsat-8 and Sentinel-2 revisit schedule after cloud masking.
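The smoothing step described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a second-order difference penalty, a daily grid with zero weight on days without a clear-sky acquisition, and unit weights on observed days. The function name `whittaker_smooth` is hypothetical.

```python
import numpy as np


def whittaker_smooth(days, values, lam=100.0):
    """Whittaker smoother on a daily grid (assumed second-order penalty).

    days   : integer observation days, possibly irregularly spaced
    values : NDVI observed on those days
    Returns the daily grid and the smoothed series on that grid.
    """
    days = np.asarray(days, dtype=int)
    values = np.asarray(values, dtype=float)
    grid = np.arange(days.min(), days.max() + 1)
    n = grid.size

    # Weight 1 on days with an observation, 0 on days without one.
    w = np.zeros(n)
    y = np.zeros(n)
    idx = days - days.min()
    w[idx] = 1.0
    y[idx] = values

    # Second-order difference matrix D with shape (n - 2, n).
    D = np.diff(np.eye(n), n=2, axis=0)

    # Minimise sum(w * (y - z)^2) + lam * sum((D z)^2):
    # solve (W + lam * D'D) z = W y.
    A = np.diag(w) + lam * D.T @ D
    z = np.linalg.solve(A, w * y)
    return grid, z
```

Because gaps simply carry zero weight, the penalty term interpolates across them, which is one common way to handle irregular acquisition dates with this smoother.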
Figure 4.
Spatial distribution of correct (green) and misclassified (red) grid cells, RF model, three-class task. No obvious spatial clustering is visible for any crop.
Figure 5.
Spatial distribution of correct (green) and misclassified (red) grid cells, RF model, binary task. For wheat, the RF classifies every Low cell correctly (perfect Low-class recall), so all RF errors are Not Low cells predicted as Low. This holds for the RF only: GB, LR, and KNN, at 84.5% accuracy, have a Low-class recall of 0.56 (Table 10).
Figure 6.
Top 10 feature importances, three-class task (ablated). Left: Gini importance (RF). Right: Mutual Information score. Climate stage features (dry_days_Tillering, mean_t2m_Ripening) are prominent for wheat but absent from the corn and soy top-10.
Figure 7.
Top 10 feature importances, binary task (ablated). For wheat, importance is broadly distributed across temperature, vegetation, and precipitation features, contrasting with the 61.8% single-feature dominance in the unablated model.
Figure 8.
SHAP global feature importance (mean |SHAP|), RF model, three-class task. Rankings are consistent with the MI and Gini orderings shown in Figure 6.
Figure 9.
SHAP global feature importance (mean |SHAP|), RF model, binary task. Climate features dominate for wheat; vegetation features dominate for corn and soybean.
Table 1.
Remote sensing and aggregated climate features.
| Feature Name | Description | Details and Units |
|---|---|---|
| Remote Sensing Features (Vegetation) | | |
| peak_ndvi_to_date | Maximum NDVI until the forecast cutoff. | Peak plant vigor. Range: −1 to 1. |
| ndvi_at_cutoff | NDVI on the last available day. | Vigor at prediction time. |
| evi_at_cutoff | EVI on the cutoff date. | Less prone to canopy saturation than NDVI. |
| ndvi_auc_to_date | Area under the NDVI curve. | Cumulative photosynthetic activity over the cycle. |
| ndvi_std_to_date | Standard deviation of NDVI. | Elevated values suggest stress episodes or uneven development. |
| green_up_slope_to_date | Slope of NDVI during green-up. | Higher values indicate faster early-season development. |
| senescence_slope | Rate of NDVI decline after peak. | Slow decline may indicate stress; fast decline, early maturity. |
| days_since_peak_at_cutoff | Days elapsed since peak NDVI. | How far the crop has progressed into senescence. |
| Accumulated and Aggregated Climate Features | | |
| cum_gdd_base_10_to_date | Growing Degree Days (GDD) accumulated above 10 °C. | Thermal units for corn and soy. Unit: °C-day. |
| cum_gdd_base_0_to_date | GDD accumulated above 0 °C. | Thermal units for wheat. Unit: °C-day. |
| cum_srad_to_date | Accumulated solar radiation. | Total photosynthetically available energy. Unit: MJ/m². |
| mean_vpd_to_date | Mean Vapor Pressure Deficit (VPD). | Atmospheric dryness stress. Unit: kPa. |
| useful_srad_vs_vpd | Radiation discounted for VPD stress. | Net energy available under dry-air conditions. |
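Several of the features in Table 1 reduce to simple arithmetic on daily series. The sketch below illustrates the GDD and NDVI summaries under stated assumptions: GDD is taken as the daily sum of max(T_mean − base, 0), and the NDVI area is a trapezoidal integral over acquisition days. The table does not specify either convention, and the function names are hypothetical.

```python
def cumulative_gdd(daily_mean_temp_c, base_c):
    """Accumulated growing degree days in degC-day.

    ASSUMPTION: daily GDD = max(T_mean - base, 0); the source table
    does not state which GDD variant was used.
    """
    return sum(max(t - base_c, 0.0) for t in daily_mean_temp_c)


def ndvi_features(days, ndvi):
    """Summaries of an NDVI series that ends at the forecast cutoff.

    days : acquisition days in increasing order (last one = cutoff)
    ndvi : NDVI values on those days
    """
    peak = max(ndvi)
    peak_day = days[ndvi.index(peak)]  # first day on which the peak occurs
    # ASSUMPTION: trapezoidal rule for the area under the curve.
    auc = sum(
        (days[i + 1] - days[i]) * (ndvi[i] + ndvi[i + 1]) / 2.0
        for i in range(len(days) - 1)
    )
    return {
        "peak_ndvi_to_date": peak,
        "ndvi_at_cutoff": ndvi[-1],
        "ndvi_auc_to_date": auc,
        "days_since_peak_at_cutoff": days[-1] - peak_day,
    }
```

For example, `cumulative_gdd(temps, 10)` would correspond to cum_gdd_base_10_to_date (corn and soy) and `cumulative_gdd(temps, 0)` to cum_gdd_base_0_to_date (wheat).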
Table 2.
Growth-stage and rolling-window climate features.
| Feature Name | Description | Details and Units |
|---|---|---|
| Rolling Window Climate Features | | |
| precip_7d | Precipitation in the 7 days before cutoff. | Unit: mm. |
| precip_30d | Precipitation in the 30 days before cutoff. | Unit: mm. |
| t2m_mean_30d | Mean temperature in the 30 days before cutoff. | Unit: °C. |
| Climate Features by Growth Stage | | |
| mean_t2m_[stage] | Mean air temperature during the stage. | Unit: °C. |
| total_precip_[stage] | Total precipitation during the stage. | Unit: mm. |
| dry_days_[stage] | Days with precipitation < 1 mm. | Unit: days. |
| inter_heat_dry_[stage] | Heat × dry-day interaction. | Combined heat–drought stress index. |
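The stage and rolling-window features in Table 2 can be illustrated with a short sketch. Two points are assumptions rather than facts from the table: the interaction term is computed here as mean temperature multiplied by the dry-day count (the table only says "heat × dry-day"), and the rolling window is taken as the last entries of a series that ends at the cutoff. Function names are hypothetical.

```python
def stage_features(temps_c, precip_mm, dry_threshold_mm=1.0):
    """Per-stage climate summaries from daily series for one growth stage."""
    mean_t = sum(temps_c) / len(temps_c)
    total_p = sum(precip_mm)
    # A dry day has precipitation below 1 mm, as defined in Table 2.
    dry_days = sum(1 for p in precip_mm if p < dry_threshold_mm)
    return {
        "mean_t2m": mean_t,
        "total_precip": total_p,
        "dry_days": dry_days,
        # ASSUMPTION: interaction = mean temperature x dry-day count.
        "inter_heat_dry": mean_t * dry_days,
    }


def rolling_precip(precip_mm, window_days):
    """Total precipitation over the last `window_days` entries.

    ASSUMPTION: the series ends at the cutoff; whether the cutoff day
    itself belongs to the window is not stated in the source.
    """
    return sum(precip_mm[-window_days:])
```

With these helpers, `rolling_precip(series, 7)` and `rolling_precip(series, 30)` would play the role of precip_7d and precip_30d.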
Table 3.
Productivity statistics for the three-class definition by crop (t/ha).
| Crop | Class | Mean | Min | Max | Count |
|---|---|---|---|---|---|
| Corn | High | 10.57 | 8.89 | 16.07 | 4635 |
| Corn | Low | 5.66 | 0.86 | 7.18 | 4637 |
| Corn | Medium | 8.02 | 7.18 | 8.89 | 4633 |
| Soybean | High | 4.80 | 4.31 | 7.55 | 4085 |
| Soybean | Low | 2.99 | 0.50 | 3.67 | 4087 |
| Soybean | Medium | 3.99 | 3.67 | 4.31 | 4086 |
| Wheat | High | 3.14 | 2.79 | 4.47 | 1224 |
| Wheat | Low | 0.58 | 0.34 | 1.16 | 1224 |
| Wheat | Medium | 2.36 | 1.16 | 2.79 | 1224 |
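The near-equal class counts and shared class boundaries in Table 3 are what tercile thresholds on yield would produce. Assuming that is how the classes were formed (the table itself does not say), a minimal sketch of the labelling, plus the merge into the binary Low / Not Low task of Table 9, could look like this. The function names and the handling of values exactly on a cut point are illustrative choices.

```python
from statistics import quantiles


def tercile_labels(yields_t_ha):
    """Label each yield as Low, Medium or High using tercile cut points.

    ASSUMPTION: classes are defined by the 1/3 and 2/3 quantiles of the
    per-crop yield distribution; ties on a cut point go to the lower class.
    """
    low_cut, high_cut = quantiles(yields_t_ha, n=3, method="inclusive")
    labels = []
    for y in yields_t_ha:
        if y <= low_cut:
            labels.append("Low")
        elif y <= high_cut:
            labels.append("Medium")
        else:
            labels.append("High")
    return labels, (low_cut, high_cut)


def to_binary(labels):
    """Collapse Medium and High into a single Not Low class."""
    return ["Low" if lab == "Low" else "Not Low" for lab in labels]
```

Under this scheme the Not Low class is simply the union of Medium and High, which matches the counts reported for the binary task (for corn, 4633 + 4635 = 9268).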
Table 4.
Ablation study: best accuracy (%) with and without harvest_day_of_year. Unablated values include SVM (five algorithms); ablated values exclude SVM (four algorithms). Δ is the change in accuracy (ablated minus unablated, in percentage points) attributable to removing the feature.
| Crop | Task | Unablated | Ablated | Δ |
|---|---|---|---|---|
| Corn | Three-class | 56.7 (RF) | 49.8 (RF) | −6.9 |
| Corn | Binary | 73.4 (GB) | 73.5 (KNN) | +0.1 |
| Soybean | Three-class | 56.5 (RF) | 48.4 (KNN) | −8.1 |
| Soybean | Binary | 78.9 (GB) | 74.9 (GB/KNN) | −4.0 |
| Wheat | Three-class | 70.9 (KNN) | 58.0 (RF/GB) | −12.9 |
| Wheat | Binary | 98.7 (all) | 84.5 (GB/LR/KNN) | −14.2 |
Table 5.
Model comparison: accuracy (%) and macro-averaged F1. Best per row in bold.
| Crop | Task | RF | GB | KNN | LR | Dummy |
|---|---|---|---|---|---|---|
| Accuracy (%) | | | | | | |
| Corn | Three-class | **49.8** | 49.7 | 48.8 | 43.6 | 33.4 |
| Soybean | Three-class | 47.4 | 47.5 | **48.4** | 45.7 | 33.3 |
| Wheat | Three-class | **58.0** | **58.0** | 52.3 | 53.4 | 33.3 |
| Corn | Binary | 72.5 | 73.4 | **73.5** | 61.7 | 66.6 |
| Soybean | Binary | 72.7 | **74.9** | **74.9** | 71.5 | 66.7 |
| Wheat | Binary | 72.8 | **84.5** | **84.5** | **84.5** | 66.7 |
| Macro-averaged F1 | | | | | | |
| Corn | Three-class | **0.479** | 0.475 | 0.456 | 0.408 | — |
| Soybean | Three-class | **0.474** | **0.474** | 0.464 | 0.453 | — |
| Wheat | Three-class | **0.590** | **0.590** | 0.517 | 0.550 | — |
| Corn | Binary | **0.660** | 0.638 | 0.647 | 0.583 | — |
| Soybean | Binary | **0.663** | 0.659 | 0.662 | 0.653 | — |
| Wheat | Binary | 0.727 | **0.801** | **0.801** | **0.801** | — |
Table 6.
Classification Report—Wheat, RF, Three-Class.
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Low | 0.93 | 0.56 | 0.70 | 306 |
| Medium | 0.43 | 0.74 | 0.54 | 306 |
| High | 0.67 | 0.44 | 0.53 | 305 |
Table 7.
Classification Report—Corn, RF, Three-Class.
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Low | 0.71 | 0.37 | 0.49 | 1160 |
| Medium | 0.48 | 0.30 | 0.37 | 1158 |
| High | 0.44 | 0.82 | 0.57 | 1159 |
Table 8.
Classification Report—Soybean, RF, Three-Class.
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Low | 0.74 | 0.38 | 0.50 | 1022 |
| Medium | 0.38 | 0.66 | 0.48 | 1022 |
| High | 0.51 | 0.38 | 0.44 | 1021 |
Table 9.
Productivity ranges for binary classes (t/ha).
| Crop | Class | Mean | Min | Max | Count |
|---|---|---|---|---|---|
| Corn | Low | 5.66 | 0.86 | 7.18 | 4637 |
| Corn | Not Low | 9.30 | 7.18 | 16.07 | 9268 |
| Soybean | Low | 2.99 | 0.50 | 3.67 | 4087 |
| Soybean | Not Low | 4.40 | 3.67 | 7.55 | 8171 |
| Wheat | Low | 0.58 | 0.34 | 1.16 | 1224 |
| Wheat | Not Low | 2.75 | 1.16 | 4.47 | 2448 |
Table 10.
Binary classification report—Wheat, GB/LR/KNN (all 84.5%).
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Low | 0.96 | 0.56 | 0.71 | 306 |
| Not Low | 0.82 | 0.99 | 0.89 | 612 |

Note: Dummy baseline: 66.7%. RF: 72.8% (macro-F1: 0.727).
Table 11.
Binary classification report—Soybean, GB (74.9%).
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Low | 0.77 | 0.35 | 0.48 | 1022 |
| Not Low | 0.74 | 0.95 | 0.83 | 2043 |

Note: Dummy baseline: 66.7%. KNN also 74.9%. LR: 71.5%. RF: 72.7%.
Table 12.
Binary classification report—Corn, KNN (73.5%).
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Low | 0.70 | 0.35 | 0.47 | 1160 |
| Not Low | 0.74 | 0.93 | 0.82 | 2317 |

Note: Dummy baseline: 66.6%. GB (73.4%) and RF (72.5%) are above the baseline; LR (61.7%) is below it.
Table 13.
Confusion matrix—Corn, RF, Three-Class.
| | Pred Low | Pred Medium | Pred High |
|---|---|---|---|
| Actual Low | 432 | 192 | 536 |
| Actual Medium | 153 | 352 | 653 |
| Actual High | 22 | 190 | 947 |
Table 14.
Confusion matrix—Soybean, KNN, Three-Class.
| | Pred Low | Pred Medium | Pred High |
|---|---|---|---|
| Actual Low | 403 | 132 | 487 |
| Actual Medium | 114 | 265 | 643 |
| Actual High | 64 | 141 | 816 |
Table 15.
Confusion matrix—Wheat, RF, Three-Class.
| | Pred Low | Pred Medium | Pred High |
|---|---|---|---|
| Actual Low | 171 | 135 | 0 |
| Actual Medium | 12 | 226 | 68 |
| Actual High | 0 | 170 | 135 |
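The per-class precision, recall, and F1 reported for wheat in Table 6 can be recomputed from this confusion matrix. The sketch below shows that arithmetic with rows as actual classes and columns as predicted classes; the function name `per_class_report` is hypothetical and the code is an illustration, not the authors' evaluation script.

```python
def per_class_report(matrix, labels):
    """Precision, recall, F1 and support per class from a confusion matrix.

    matrix[r][c] = number of samples with actual class r predicted as class c.
    Returns (report, accuracy, macro_f1), where report maps each label to
    a (precision, recall, f1, support) tuple.
    """
    n = len(labels)
    report = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        support = sum(matrix[k])                      # row total: actual count
        predicted = sum(matrix[r][k] for r in range(n))  # column total
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1, support)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[k][k] for k in range(n)) / total
    macro_f1 = sum(v[2] for v in report.values()) / n
    return report, accuracy, macro_f1


# Wheat, RF, three-class confusion matrix as given in the table above.
wheat_rf = [
    [171, 135, 0],
    [12, 226, 68],
    [0, 170, 135],
]
report, accuracy, macro_f1 = per_class_report(wheat_rf, ["Low", "Medium", "High"])
```

For the Low class, for instance, precision is 171 / (171 + 12 + 0) ≈ 0.93 and recall is 171 / 306 ≈ 0.56, in line with Table 6.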
Table 16.
Confusion matrix—Corn, KNN, Binary.
| | Pred Low | Pred Not Low |
|---|---|---|
| Actual Low | 410 | 750 |
| Actual Not Low | 173 | 2144 |
Table 17.
Confusion matrix—Soybean, GB, Binary (GB and KNN tied at 74.9%).
| | Pred Low | Pred Not Low |
|---|---|---|
| Actual Low | 359 | 663 |
| Actual Not Low | 106 | 1937 |
Table 18.
Confusion matrix—Wheat, RF, Binary. RF achieves perfect Low-class recall (all 306 Low fields correctly identified) at the cost of 250 false positives, yielding 72.8% overall accuracy. This conservative behaviour is discussed as potentially preferable in insurance contexts where false negatives are costly.
| | Pred Low | Pred Not Low |
|---|---|---|
| Actual Low | 306 | 0 |
| Actual Not Low | 250 | 362 |
Table 19.
Confusion matrix—Wheat, GB/LR/KNN, Binary. GB, LR, and KNN produced identical predictions (McNemar's test therefore finds no discordant pairs), so a single matrix represents all three models.
| | Pred Low | Pred Not Low |
|---|---|---|
| Actual Low | 171 | 135 |
| Actual Not Low | 7 | 605 |
Table 20.
Bootstrap 95% confidence intervals for accuracy and macro-F1, best model per crop-task.
| Crop | Task | Model | Acc (%) | Acc 95% CI (%) | Macro-F1 95% CI |
|---|---|---|---|---|---|
| Corn | 3-class | RF | 49.8 | [48.2, 51.5] | [0.463, 0.496] |
| Soybean | 3-class | KNN | 48.4 | [46.8, 50.1] | [0.447, 0.482] |
| Wheat | 3-class | RF | 58.0 | [55.0, 61.4] | [0.560, 0.623] |
| Corn | Binary | KNN | 73.5 | [72.0, 74.8] | [0.629, 0.664] |
| Soybean | Binary | GB | 74.9 | [73.3, 76.4] | [0.638, 0.678] |
| Wheat | Binary | GB | 84.5 | [82.1, 86.7] | [0.771, 0.829] |
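A percentile bootstrap is one common way to obtain intervals like those in Table 20. The sketch below applies it to accuracy for the wheat binary task, rebuilding the per-sample correct/incorrect flags from the GB/LR/KNN confusion matrix in Table 19 (171 + 605 = 776 correct out of 918). The number of resamples, the percentile method, and the seed are assumptions; the table does not state which bootstrap variant was used, so the result should only approximate the reported interval.

```python
import random


def bootstrap_accuracy_ci(correct_flags, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for accuracy.

    correct_flags : list of 1 (correct) / 0 (incorrect), one per test sample
    ASSUMPTION: plain percentile bootstrap with resampling of test samples.
    """
    rng = random.Random(seed)
    n = len(correct_flags)
    stats = []
    for _ in range(n_resamples):
        # Resample the test set with replacement and recompute accuracy.
        sample = rng.choices(correct_flags, k=n)
        stats.append(sum(sample) / n)
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Wheat, binary, GB/LR/KNN: 776 correct and 142 incorrect predictions.
flags = [1] * 776 + [0] * 142
ci_low, ci_high = bootstrap_accuracy_ci(flags)
```

Macro-F1 intervals would need the resampled labels and predictions rather than correctness flags alone, since F1 depends on which class each error falls in.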
Table 21.
Top-five features by mean |SHAP|, Gini importance, and MI score for the three-class RF model, per crop. SHAP values reflect the mean absolute contribution per prediction averaged across all classes; Gini and MI values are the same as reported in the feature importance analysis.
| Crop | Feature | Mean \|SHAP\| | Gini | MI |
|---|---|---|---|---|
| Corn | peak_ndvi_to_date | 0.039 | 0.261 | 0.126 |
| Corn | ndvi_auc_to_date | 0.023 | 0.149 | 0.121 |
| Corn | evi_at_cutoff | 0.019 | 0.143 | 0.125 |
| Corn | senescence_slope | 0.013 | 0.131 | 0.128 |
| Corn | ndvi_at_cutoff | 0.012 | 0.103 | 0.118 |
| Soybean | inter_heat_dry_Flowering | 0.044 | 0.145 | 0.094 |
| Soybean | green_up_slope_to_date | 0.026 | 0.169 | 0.141 |
| Soybean | evi_at_cutoff | 0.026 | 0.090 | 0.116 |
| Soybean | ndvi_at_cutoff | 0.019 | 0.156 | 0.152 |
| Soybean | ndvi_std_to_date | 0.017 | 0.150 | 0.159 |
| Wheat | peak_ndvi_to_date | 0.045 | 0.196 | 0.366 |
| Wheat | dry_days_Tillering | 0.043 | 0.168 | 0.265 |
| Wheat | mean_t2m_Ripening | 0.039 | 0.149 | 0.261 |
| Wheat | cum_srad_to_date | 0.033 | 0.128 | 0.264 |
| Wheat | ndvi_std_to_date | 0.030 | 0.076 | 0.384 |