1. Introduction
Rice is one of the most important staple food crops in China, and its stable production is directly tied to national food security. Accurately predicting rice yield and clarifying how meteorological factors influence it are therefore important for guiding agricultural production and formulating disaster prevention and mitigation measures. Methodologies for predicting agricultural yields have evolved from statistical analysis to mechanistic simulation and, more recently, to machine learning [1, 2, 3, 4]. Traditional regression or correlation models are simple to construct, but they are often confined to specific spatiotemporal conditions and are inadequate for the complex nonlinear problems inherent in crop growth and development. Because crop growth, development, and yield formation are essentially nonlinear natural processes, nonlinear prediction models based on machine learning and deep learning generally outperform traditional models in accuracy and can better characterize the nonlinear and uncertain features prevalent in agricultural systems.
Despite significant advances in the prediction accuracy of machine learning models, their “black-box” nature limits the understanding of their decision-making processes and hinders their adoption in yield prediction [5, 6]. Explainable Artificial Intelligence (XAI) addresses this issue by providing transparent and comprehensible explanations for model decisions while maintaining predictive performance [7]. Among the various XAI methods, SHAP (SHapley Additive exPlanations) stands out for its solid theoretical foundation in game theory and its comprehensive interpretability, and it has become one of the most widely used interpretability tools in agricultural yield prediction. SHAP decomposes a model prediction into the sum of individual feature contributions, thereby quantifying the marginal contribution of each feature to the prediction result.
It is important to note that SHAP is not the only available interpretability method. Partial Dependence Plots (PDP) [8] provide a straightforward visualization of the marginal effect of a feature on the predicted outcome by averaging predictions over the distribution of complementary features. However, PDP can obscure heterogeneous effects when feature interactions are present, as the averaging process may combine subgroups with opposing response patterns into a flat average. Accumulated Local Effects (ALE) plots [9] address this limitation by computing conditional rather than marginal effects, making them more robust to feature correlations and more computationally efficient. Permutation importance [10] offers a model-agnostic measure of feature relevance by assessing the degradation in model performance when a feature’s values are randomly shuffled. While computationally simple, it provides only a global importance ranking without revealing the direction or shape of feature effects. More recently, causal inference frameworks [11] have been proposed to move beyond associational interpretations of machine learning models toward identifying causal mechanisms. These methods are complementary to SHAP, each with distinct trade-offs between computational cost, interpretive depth, and robustness to feature correlations. SHAP was selected for the present study because it combines local and global interpretability, supports interaction value decomposition, and offers a theoretically grounded allocation of feature contributions. These properties are essential for the threshold identification and interaction dominance analysis developed herein.
In the context of maize yield prediction in Kenya, researchers integrated XGBoost with SHAP. The results demonstrated that SHAP dependence plots could reveal the negative effects of high temperatures on maize yield. They also identified diminishing returns once fertilizer inputs exceeded an optimal threshold [12]. Regarding wheat yield prediction, an interpretable yield estimation model was developed using LightGBM and SHAP. This approach revealed the contribution mechanisms of remote sensing indices to yield formation [13]. In research within China, scholars have similarly applied SHAP for feature screening and model interpretation in wheat yield prediction. Identifying nonlinear thresholds from SHAP dependence plots has emerged as an active research direction in agrometeorology [14]. For instance, based on a coupling framework of the APSIM crop growth model and Random Forest-SHAP, researchers identified a key threshold of 75.5 mm for the intense precipitation index (R95p) during the wheat growth period. When intense precipitation was below this threshold, increased precipitation facilitated yield gains. Once the threshold was exceeded, the beneficial effect rapidly reversed into a yield reduction risk [14]. In another study utilizing an XGBoost-SHAP framework during extreme precipitation years, a distinct nonlinear threshold relationship between soil sand content and maize yield was observed. Excessively low sand content (approximately 12.85%) exacerbated waterlogging damage, leading to yield reduction. Conversely, when the sand content ranged from 22% to 30%, the impact shifted from inhibitory to promotive [15]. Collectively, these studies demonstrate that SHAP is not merely a visualization tool for feature importance. It can also serve as an effective instrument for quantifying the nonlinear threshold effects of meteorological factors.
The Zero-Crossing Threshold (ZCT) based on SHAP dependence plots has been extensively applied to identify critical transition points between yield increase and decrease. For example, in a study on landscape ecological risk in the Qilian Mountains, researchers employed spline regression and constraint line methods to identify threshold inflection points for altitude (4200 m), downward shortwave radiation (2502 W/m²), and a dual-threshold response for grazing intensity (3.35 and 14.36 SU/ha) [16]. However, the ZCT can only capture critical points where the direction of the effect reverses. It fails to identify situations where the effect direction remains unchanged but the effect intensity varies significantly (e.g., a yield-increase effect transitioning from rapid growth to saturation, or a yield-reduction effect progressively accelerating). These “qualitative change points of effect intensity” are equally important for agricultural early warning, as they indicate either the saturation point of a suitable range or the acceleration point of disaster losses. Furthermore, analyses of SHAP dependence plots have largely remained at the level of qualitative description, and few studies have quantified changes in curve slope to identify critical positions of effect intensity transformation. In ecological threshold research, first derivatives or inflection point detection have been widely used to locate such positions [17], yet this approach has not been systematically introduced into SHAP analyses for agricultural yield prediction.
In view of these deficiencies, a Derivative Extrema Threshold (DET) detection method is introduced alongside the ZCT analysis. Unlike existing threshold identification approaches in ecological modeling that typically rely on piecewise regression or change-point detection algorithms [18], the DET is based on the extreme points of the first derivative of the smoothed SHAP dependence curve. By locating the positions where the curve is steepest, that is, where the SHAP value changes most rapidly with the feature value, the DET quantitatively localizes qualitative change points in effect intensity without requiring a directional reversal. This approach fills a gap in the existing threshold identification system. Concurrently, to address the lack of normalized quantitative benchmarks for factor interaction effects, the Interaction Dominance Ratio (IDR) is proposed. Unlike traditional interaction metrics such as the H-statistic [19] or variance-based sensitivity indices [20], which quantify interaction strength in absolute terms, the IDR removes the influence of feature units and scales by constructing a dimensionless ratio of the interaction variation span to the dispersion of the total effect. Furthermore, based on the internal distribution characteristics of the data, a three-tier grading standard comprising strong, moderate, and weak levels is established. Through the aforementioned “ZCT-DET-IDR” coupling method, this study aims to systematically elucidate the nonlinear influences and synergistic mechanisms of meteorological factors on rice yield in the Ningbo region, ultimately providing a quantifiable decision-making basis for regional rice yield prediction and for the refined, composite early warning of agrometeorological disasters.
2. Materials and Methods
2.1. Data Sources
The study region is Ningbo City (28°51′–30°33′ N, 120°55′–122°16′ E), located on the southeastern coast of China, covering a total land area of approximately 9365 km² (Figure 1). Ningbo is situated in the northern part of Zhejiang Province and lies within the subtropical monsoon climate zone, characterized by hot, humid summers and mild winters, with an annual mean temperature of approximately 16–17 °C and annual precipitation of approximately 1300–1500 mm. Single-season rice is the predominant rice cropping system in this region, with the growing season generally extending from March to October.
Rice yield data were obtained from the Ningbo Statistical Yearbook (1995–2024). These data encompass the unit area yields of rice across nine districts and counties in Ningbo City: Jiangbei, Zhenhai, Beilun, Yinzhou, Fenghua, Yuyao, Cixi, Ninghai, and Xiangshan. Meteorological data were provided by the Ningbo Meteorological Bureau. A total of nine national quality-controlled meteorological stations, one representative station per district or county, were utilized in this study. The records comprise daily observations from March to October for the period 1995–2024 (30 years), subsequently aggregated into monthly averages or cumulative values for model input. The variables include average temperature, maximum temperature, minimum temperature, relative humidity, precipitation, and maximum wind speed. It should be noted that these meteorological stations are not installed directly within rice fields. However, their site selection strictly adheres to the standard construction criteria for national meteorological stations in China, which ensures that each station is situated in a location capable of accurately representing the regional climate characteristics of its administrative area. This design principle minimizes microenvironmental biases and ensures the spatial representativeness of the meteorological observations for the corresponding rice production areas.
2.2. Data Processing
All data preprocessing and visualization were implemented using Python (version 3.13.9). In rice yield prediction studies, the actual crop yield is generally decomposed into three components: Trend Yield, Meteorological Yield, and random error. The Hodrick–Prescott (HP) filter was applied to separate the Trend Yield from the unit rice yield, thereby deriving the Meteorological Yield [21, 22].
Temporal Feature Expansion was applied to the monthly average data of six variables: average temperature, maximum temperature, minimum temperature, relative humidity, precipitation, and maximum wind speed. Taking the average temperature in Ningbo (March to October over 30 years across nine districts) as an example, the months were combined sequentially into durations of 8, 7, down to 1 month. For instance, TAVG_4-6 denotes the average temperature from April to June, whereas R2020_5-9 represents the cumulative precipitation from May to September. Through this procedure, a total of 216 expanded features were generated.
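The window enumeration described above can be sketched as follows. This is a minimal illustration, not the authors' code: the prefixes TAVG, TMIN, UAVG, R2020, and WSMAX are taken from the feature names in the text, whereas TMAX is an assumed prefix for maximum temperature, and the helper names are hypothetical.

```python
# Prefixes TAVG, TMIN, UAVG, R2020 and WSMAX appear in the text;
# TMAX is an assumed name for maximum temperature.
VARIABLES = ["TAVG", "TMAX", "TMIN", "UAVG", "R2020", "WSMAX"]
MONTHS = range(3, 11)  # March to October


def window_names(variables=VARIABLES, months=MONTHS):
    """Name every contiguous month window for every variable."""
    months = list(months)
    names = []
    for var in variables:
        for i, start in enumerate(months):
            for end in months[i:]:
                suffix = str(start) if start == end else f"{start}-{end}"
                names.append(f"{var}_{suffix}")
    return names


def aggregate(monthly, start, end, how="mean"):
    """Collapse a {month: value} mapping over one window (mean or sum)."""
    values = [monthly[m] for m in range(start, end + 1)]
    return sum(values) / len(values) if how == "mean" else sum(values)


names = window_names()
print(len(names))  # 6 variables x 36 windows = 216
```

Eight months give 8 + 7 + ... + 1 = 36 contiguous windows per variable, which reproduces the 216 features reported. Temperature-like variables would use `how="mean"` and precipitation `how="sum"`.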
To reduce multicollinearity among factors and enhance model prediction accuracy, the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm was employed for preliminary feature screening. In total, 15 expanded meteorological features were retained: average temperature in March (TAVG_3), average temperature from March to May (TAVG_3-5), average temperature in June (TAVG_6), average temperature from June to July (TAVG_6-7), average temperature from August to September (TAVG_8-9), average temperature in October (TAVG_10), average minimum temperature in June (TMIN_6), maximum wind speed in September (WSMAX_9), average relative humidity in September (UAVG_9), cumulative precipitation from March to June (R2020_3-6), cumulative precipitation from April to May (R2020_4-5), cumulative precipitation from April to July (R2020_4-7), cumulative precipitation in May (R2020_5), cumulative precipitation in August (R2020_8), and cumulative precipitation in October (R2020_10).
Regarding spatial dependence, the nine districts in Ningbo are geographically adjacent, which implies potential spatial correlation in agricultural production. This was addressed implicitly through the data decomposition strategy. The Hodrick–Prescott (HP) filter (smoothing parameter λ = 100, which is the standard convention for annual time-series data) was applied to each district’s yield time series individually to extract the meteorological yield. By modeling this detrended component—specifically the inter-annual deviation from each district’s own trend—the analysis effectively controlled for district-specific fixed effects, such as baseline soil fertility and persistent management practices. Consequently, the study focused on the variable component of yield driven primarily by annual weather fluctuations. Given the limited sample size and the dominance of local meteorological drivers, treating each district-year as an independent observation was deemed appropriate for capturing the local crop-weather relationships without introducing the complexity of explicit spatial interaction terms, which could risk overfitting in this sample size regime. To strictly prevent data leakage during preprocessing, all feature scaling (standardization) and LASSO feature selection were fitted exclusively on the training set (1995–2018) and subsequently applied to the test set.
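The per-district HP decomposition described above can be illustrated with a small standard-library sketch. It solves the usual HP normal equations directly and is intended only to show the mechanics; the function name `hp_filter` and the dense solver are choices made here, not details taken from the study.

```python
def hp_filter(y, lamb=100.0):
    """Split a series into (trend, cycle) with the Hodrick-Prescott filter.

    Solves (I + lamb * D'D) trend = y, where D is the second-difference operator.
    """
    n = len(y)
    if n < 3:
        raise ValueError("need at least three observations")
    # Build A = I + lamb * D'D as a dense matrix (fine for ~30 annual points).
    a = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for k in range(n - 2):
        row = {k: 1.0, k + 1: -2.0, k + 2: 1.0}  # one row of D
        for r, vr in row.items():
            for c, vc in row.items():
                a[r][c] += lamb * vr * vc
    # Gaussian elimination with partial pivoting.
    b = [float(v) for v in y]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            if factor != 0.0:
                for c in range(col, n):
                    a[r][c] -= factor * a[col][c]
                b[r] -= factor * b[col]
    # Back substitution.
    trend = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(a[r][c] * trend[c] for c in range(r + 1, n))
        trend[r] = (b[r] - tail) / a[r][r]
    cycle = [obs - t for obs, t in zip(y, trend)]  # the Meteorological Yield
    return trend, cycle
```

Applied to each district's 30-year yield series separately, the `cycle` component plays the role of the Meteorological Yield. In practice a library routine such as `hpfilter` in `statsmodels.tsa.filters.hp_filter` would likely be used instead; the source does not say which implementation was chosen.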
2.3. Model Construction
Multiple Linear Regression (MLR) and five common machine learning algorithms were selected to construct meteorological factor-based rice yield prediction models for the Ningbo region. To ensure optimal performance and prevent overfitting, hyperparameters for the machine learning models were tuned using a grid search (Bayesian optimization in the case of LightGBM) combined with a 5-fold Time Series Split cross-validation applied strictly to the training set (1995–2018), maximizing the cross-validation R².
The parameter configurations and their tuning search spaces were specified as follows. For Support Vector Regression (SVR), the Radial Basis Function (RBF) was specified as the kernel function, with the regularization parameter C searched over {0.1, 1, 10} and ϵ over {0.01, 0.1, 0.5}, yielding final values of C = 1 and ϵ = 0.1. For Bagged Trees, the ensemble size was searched over {100, 200, 300} (final: 200), the base learner was a decision tree, and the minimum number of samples per leaf node was set to 5. For Random Forest (RF), the number of trees was searched over {100, 200, 300} (final: 200), the minimum samples per leaf node over {5, 10, 15} (final: 5), and the number of features considered per split was the square root of the total number of features. For the Back Propagation Neural Network (BPNN), a single hidden layer structure was adopted with the number of neurons searched over {6, 9, 12} (final: 9). To mitigate overfitting, a dropout layer (rate = 0.25) was added after the hidden layer, and L2 regularization (coefficient = 5 × 10⁻⁵) was incorporated into the loss function. Training used the Adam optimizer with an initial learning rate of 0.001 and a maximum of 1000 iterations, alongside an early stopping mechanism. For the LightGBM model, hyperparameter tuning was conducted using a Bayesian optimization framework. The optimization was performed on the training set, maximizing the mean R² of the 5-fold Time Series Split CV. The final selected hyperparameters were: learning_rate = 0.05, n_estimators = 100, num_leaves = 15, subsample = 0.8, and min_child_samples = 10.
The 15 meteorological expanded features and the rice Meteorological Yield were normalized. These were then utilized as the feature variables and the dependent variable, respectively. The samples were chronologically partitioned. A total of 216 samples from 1995 to 2018 constituted the training set, whereas 54 samples from 2019 to 2024 served as the testing set. The prediction performance of the six models was subsequently evaluated. To prevent data leakage and ensure a rigorous evaluation of generalization performance, a time-series nested validation strategy was implemented. During the Recursive Feature Elimination (RFE) phase, feature subsets were evaluated exclusively on the training set using a 5-fold Time Series Split cross-validation (CV). Unlike standard k-fold CV, this method splits the data chronologically, ensuring that validation folds always occur after training folds, thereby preventing future information leakage. The optimal number of features was determined by the mean R² across the 5-fold CV.
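The chronological fold structure can be sketched as below. Two assumptions are made here and are not stated in the text: folds are formed by year, so that all nine districts of a given year fall in the same fold, and the fold sizing is intended to mirror the default rule of scikit-learn's `TimeSeriesSplit`. The function name is hypothetical.

```python
def year_blocked_splits(years, n_splits=5):
    """Yield (train_years, val_years) with validation always after training."""
    years = sorted(set(years))
    fold = len(years) // (n_splits + 1)
    if fold == 0:
        raise ValueError("too few years for the requested number of splits")
    first_val = len(years) - n_splits * fold
    for start in range(first_val, len(years), fold):
        # Training window expands; validation block is the next `fold` years.
        yield years[:start], years[start:start + fold]


splits = list(year_blocked_splits(range(1995, 2019)))
for train, val in splits:
    print(f"train {train[0]}-{train[-1]} -> validate {val[0]}-{val[-1]}")
```

With the 24 training years (1995–2018) this yields five expanding-window folds, the last of which trains on 1995–2014 and validates on 2015–2018. The held-out test years (2019–2024) never enter any fold.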
2.4. Nonlinear Threshold Identification Based on SHAP
SHAP is a model interpretation method derived from the Shapley value in game theory, originally proposed by Lundberg and Lee [
23]. It fairly allocates the model’s prediction value to each input feature, ensuring that the contribution of each feature adheres to axioms such as additivity, consistency, and local accuracy.
Based on this definition, SHAP can be utilized in agricultural meteorology to calculate the thresholds of meteorological factors affecting rice yield. To eliminate the interference of data noise on threshold identification, a local smoothing procedure was initially applied to the scatter points in the dependence plots. Specifically, the feature values were sorted in ascending order. A moving average method was then utilized to compute a smoothed curve. The window width was set to 5% of the sample size (with a minimum of 5), and the mean of the SHAP values within the window was assigned as the smoothed value for that point. This smoothed curve represented the overall variation trend of the SHAP values.
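The smoothing step can be sketched as follows. The text does not state whether the window is centered or how the ends of the curve are treated, so a centered window that shrinks at the edges is assumed here; the function name is hypothetical.

```python
def smooth_dependence(feature_values, shap_values, frac=0.05, min_window=5):
    """Sort by feature value and return (xs, centered moving-average SHAP)."""
    pairs = sorted(zip(feature_values, shap_values))
    xs = [x for x, _ in pairs]
    phis = [p for _, p in pairs]
    n = len(pairs)
    window = max(min_window, round(frac * n))  # 5% of the sample, at least 5
    half = window // 2                         # effective span is 2 * half + 1
    smoothed = []
    for k in range(n):
        lo, hi = max(0, k - half), min(n, k + half + 1)  # window shrinks at the edges
        smoothed.append(sum(phis[lo:hi]) / (hi - lo))
    return xs, smoothed
```

For the 216 training samples, 5% corresponds to a window of about 11 points, so each smoothed value averages the point itself and its five nearest neighbours on either side.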
For each dependence plot, the smoothed curve was scanned for adjacent points whose smoothed SHAP values had a product less than or equal to zero (i.e., opposite signs or an exact zero). Where such a pair existed, linear interpolation was applied to calculate the zero-crossing coordinate:

$$x_{\mathrm{ZCT}} = x_i - \phi_{\mathrm{smooth}}(x_i)\,\frac{x_{i+1} - x_i}{\phi_{\mathrm{smooth}}(x_{i+1}) - \phi_{\mathrm{smooth}}(x_i)}$$

Here, $x_i$ and $x_{i+1}$ denote two adjacent feature values whose smoothed SHAP values have opposite signs, while $\phi_{\mathrm{smooth}}(x_i)$ and $\phi_{\mathrm{smooth}}(x_{i+1})$ represent their corresponding smoothed SHAP values. This threshold marks the critical point at which the effect on the predicted unit rice yield switches between yield-increase dominance (ϕ > 0) and yield-decrease dominance (ϕ < 0).
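A minimal sketch of the zero-crossing search with linear interpolation is given below. It assumes the feature values are already sorted and the SHAP values already smoothed; the function name and the handling of exact zeros are choices made for this illustration.

```python
def zero_crossings(xs, phi_smooth):
    """Return feature values where the smoothed SHAP curve crosses zero."""
    crossings = []
    for k in range(len(xs) - 1):
        p0, p1 = phi_smooth[k], phi_smooth[k + 1]
        if p0 * p1 > 0 or p0 == p1:
            continue  # same sign, or a flat segment sitting on zero
        # Linear interpolation between (x_k, p0) and (x_{k+1}, p1).
        x_zero = xs[k] - p0 * (xs[k + 1] - xs[k]) / (p1 - p0)
        if not crossings or x_zero != crossings[-1]:
            crossings.append(x_zero)  # skip the duplicate when a point is exactly 0
    return crossings
```

For example, a curve falling from 0.2 at 22 °C to −0.2 at 24 °C is assigned a crossing midway, at 23 °C.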
In ecological threshold research, first derivatives and inflection point detection have been widely used to locate critical positions of morphological change in curves [24]. However, this approach has not been systematically introduced into SHAP analyses for agricultural yield prediction. Therefore, building upon the Zero-Crossing Threshold (ZCT) analysis, a threshold identification method termed the Derivative Extrema Threshold (DET) was introduced.
For a smoothed SHAP dependence curve, the first derivative reflects the rate of change in the SHAP value with respect to the feature value. The Derivative Extrema Threshold (DET) is defined as the point where this first derivative attains a local extremum (maximum or minimum). Mathematically, this corresponds to an inflection point of the original SHAP curve, satisfying:

$$\left.\frac{d^{2}\phi_{\mathrm{smooth}}(x)}{dx^{2}}\right|_{x = x_{\mathrm{DET}}} = 0$$

where $\phi_{\mathrm{smooth}}(x)$ is the smoothed SHAP function and the second derivative changes sign at $x_{\mathrm{DET}}$. It is crucial to clarify that the DET does not identify the point where the slope changes fastest (which would be an extremum of the second derivative, i.e., maximum curvature). Instead, the DET pinpoints the location where the effect intensity (the SHAP value itself) changes at a locally maximal rate. This threshold is applicable to scenarios where the direction of the effect remains unchanged but the intensity undergoes a qualitative shift (e.g., a yield-increase effect transitioning from rapid acceleration to saturation).
Cubic spline interpolation was applied to fit the smoothed curve (smoothing parameter s = 0), and the first derivative of the spline function was then computed on a dense grid of 500 equally spaced points spanning the range of the feature values. The DET points were identified as the local extrema of this first-derivative sequence, indicating where the effect of the feature on the predicted unit rice yield changed most rapidly, irrespective of whether the sign of the effect (ϕ > 0 or ϕ < 0) reversed.
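The extremum search can be illustrated with the sketch below. Note the substitution: central finite differences on the supplied grid stand in for the derivative of a fitted cubic spline, so this is not the spline procedure described above, only the same idea in standard-library Python. The `top_k=2` default follows the "top two extreme points" mentioned in the text; the function name is hypothetical.

```python
import math


def derivative_extrema(xs, phi_smooth, top_k=2):
    """Locate interior local extrema of d(phi)/dx, ranked by |derivative|.

    Central differences on the given grid stand in for the derivative of a
    fitted cubic spline; xs must be strictly increasing.
    """
    n = len(xs)
    deriv = [
        (phi_smooth[k + 1] - phi_smooth[k - 1]) / (xs[k + 1] - xs[k - 1])
        for k in range(1, n - 1)
    ]
    grid = xs[1:-1]  # deriv[k] belongs to grid[k]
    candidates = []
    for k in range(1, len(deriv) - 1):
        is_max = deriv[k] > deriv[k - 1] and deriv[k] > deriv[k + 1]
        is_min = deriv[k] < deriv[k - 1] and deriv[k] < deriv[k + 1]
        if is_max or is_min:
            candidates.append((abs(deriv[k]), grid[k]))
    candidates.sort(reverse=True)
    return [x for _, x in candidates[:top_k]]


# Saturating response: the SHAP value stays positive everywhere, so there is
# no zero crossing, yet its growth rate peaks at x = 0.
xs = [(i - 30) / 10 for i in range(61)]
phi = [1.0 + math.tanh(x) for x in xs]
print(derivative_extrema(xs, phi))
```

The synthetic curve never changes sign, so a ZCT search finds nothing, while the derivative extremum recovers the transition from rapid increase to saturation.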
Following the acquisition of DET candidate points, a specific constraint was required. Although DET points situated near the ZCT possessed clear agrometeorological interpretability regarding boundary effects, the algorithm merely captured the top two extreme points ranked by absolute derivative values. This characteristic frequently caused the selected points to reside in the extreme tails of the data distribution, where observations were exceptionally sparse (e.g., regions of maximum precipitation). Consequently, these points degraded into extrapolation artifacts generated by the spline interpolation, losing their actual physical significance. To address this issue, a spatial constraint criterion based on the boxplot principle was introduced. A boundary of 1.5 times the interquartile range (IQR) was established. Specifically, the interval [Q1 − 1.5 × IQR, Q3 + 1.5 × IQR] for each feature value in the training set was defined as the valid boundary. Any DET candidate points falling outside this range were discarded. This procedure effectively ensured that the retained DET points were located within normal intervals characterized by sufficient data density. It thereby avoided pseudo-threshold interference caused by model extrapolation into sparse regions.
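The boxplot-based constraint can be sketched as follows. The quartile convention is not specified in the text; the "inclusive" method of Python's `statistics.quantiles` is assumed here, and the function names are hypothetical.

```python
import statistics


def iqr_bounds(train_values, k=1.5):
    """Return the [Q1 - k*IQR, Q3 + k*IQR] fence for one feature."""
    q1, _, q3 = statistics.quantiles(train_values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def keep_supported(candidates, train_values, k=1.5):
    """Drop threshold candidates that fall outside the boxplot fence."""
    low, high = iqr_bounds(train_values, k)
    return [c for c in candidates if low <= c <= high]
```

Because the fence is computed from training-set feature values only, a candidate produced by spline behaviour in a sparsely observed tail is removed before any agronomic interpretation is attempted.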
To quantify the uncertainty of the threshold estimations, a Bootstrap resampling method (n = 500) was employed to calculate the 95% confidence interval for each threshold. For simplicity of presentation, we retain the term “95% confidence interval” (CI) throughout this paper; however, it must be strictly clarified that this refers to the bootstrap interval, reflecting the stability of the model-identified thresholds under sample perturbations, rather than a classical confidence interval for an unknown population parameter. During each resampling iteration, a dataset equal in size to the original sample was drawn with replacement from the original training set. The dependence plots, smoothed curves, and the two aforementioned thresholds were recalculated, and the threshold estimates from all Bootstrap samples were recorded.
Because the data ranges of different features vary substantially, a unified absolute width standard could not be applied. A relative width indicator, defined as the ratio of the confidence interval width to the feature value range (W/R), was adopted to evaluate the reliability of the thresholds:

$$\frac{W}{R} = \frac{U - L}{\max(X_j) - \min(X_j)}$$
Here, U and L represented the upper and lower limits of the confidence interval, respectively. max(Xj) and min(Xj) denote the maximum and minimum values of feature Xj in the training set, respectively. As noted above, this width does not represent the precision of a population parameter estimate, but rather directly reflects the clarity of the turning point within the data: a narrower interval indicates a more stable, explicitly defined response reversal in the observed data, whereas a wider interval implies a relatively gentle effect transition zone or a limitation imposed by local sample sparsity. It must be emphasized that SHAP values quantify the marginal contribution of features to the model’s prediction rather than establishing causal physiological mechanisms. The identified thresholds and interactions reflect conditional response patterns of the constructed model under the specific observational data distribution.
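The bootstrap interval and the relative width W/R can be sketched together as below. The `estimator` argument is a placeholder for the full pipeline (dependence plot, smoothing, and threshold search), which is not reproduced here; the percentile rule used to pick the interval ends and all function names are assumptions of this illustration.

```python
import random


def bootstrap_interval(rows, estimator, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a threshold returned by `estimator`.

    `estimator(sample)` may return None when no threshold is found in a resample.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]  # same size, with replacement
        value = estimator(sample)
        if value is not None:
            estimates.append(value)
    if not estimates:
        raise ValueError("no resample produced a threshold")
    estimates.sort()
    lo_idx = int((alpha / 2) * (len(estimates) - 1))
    hi_idx = int((1 - alpha / 2) * (len(estimates) - 1))
    return estimates[lo_idx], estimates[hi_idx]


def relative_width(lower, upper, feature_values):
    """W/R: interval width divided by the feature's observed range."""
    return (upper - lower) / (max(feature_values) - min(feature_values))
```

As a worked value, an interval of 22 to 24 on a feature ranging from 10 to 30 gives W/R = 2/20 = 0.1. Resamples in which no threshold is detected are skipped in this sketch; how the study handled such cases is not stated.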
2.5. Factor Interaction Effect Analysis Based on SHAP
SHAP can quantify not only the contribution of a single feature to the model prediction but also the joint impact of interactions between two features via SHAP interaction values [25]. Whereas the standard SHAP value of a feature aggregates its main effect and its share of all interaction effects into a single number, SHAP interaction values decompose the prediction into the sum of the main effects of individual features and the interaction effects of feature pairs.
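According to the SHAP framework extension, for any feature $i$ and feature $j$ ($i \neq j$), the SHAP interaction value $\Phi_{i,j}$ measures the contribution of the interaction between feature $i$ and feature $j$ to the model prediction. The calculation formula is:

$$\Phi_{i,j} = \sum_{S \subseteq N \setminus \{i,j\}} \frac{|S|!\,(M - |S| - 2)!}{2\,(M - 1)!}\,\nabla_{i,j}(S)$$

Here, $N$ represents the set of all features, $M$ is the total number of features, and $\nabla_{i,j}(S)$ denotes the interactive marginal contribution of features $i$ and $j$ on subset $S$:

$$\nabla_{i,j}(S) = f(S \cup \{i,j\}) - f(S \cup \{i\}) - f(S \cup \{j\}) + f(S)$$

where $f(S)$ is the model output when only the features in $S$ are known.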
The intuitive meaning of this formula is as follows. The interaction effect of features i and j equals the marginal gain from adding both features to the model simultaneously, minus the sum of the marginal gains from adding each feature individually. If ∇i,j(S) > 0, a synergistic effect exists between features i and j: their simultaneous presence yields a contribution to the yield that exceeds the sum of their individual contributions. If ∇i,j(S) < 0, an antagonistic effect is indicated. The SHAP interaction values are symmetric, meaning Φi,j = Φj,i. Consequently, the total interaction effect for any feature pair is Φi,j + Φj,i = 2Φi,j.
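The subset-weighted sum can be checked by brute force on a toy problem. The sketch below enumerates all subsets, which is only feasible for a handful of features; the toy value function is hypothetical and stands in for the model output restricted to a feature subset.

```python
from itertools import combinations
from math import factorial


def interaction_value(value_fn, features, i, j):
    """Shapley interaction value Phi_ij for a set function `value_fn(frozenset)`."""
    m = len(features)
    rest = [f for f in features if f not in (i, j)]
    total = 0.0
    for size in range(len(rest) + 1):
        weight = factorial(size) * factorial(m - size - 2) / (2 * factorial(m - 1))
        for subset in combinations(rest, size):
            s = frozenset(subset)
            nabla = (value_fn(s | {i, j}) - value_fn(s | {i})
                     - value_fn(s | {j}) + value_fn(s))
            total += weight * nabla
    return total


# Toy value function (hypothetical): features 1 and 2 only pay off together,
# feature 3 contributes on its own.
def toy_value(s):
    return (4.0 if {1, 2} <= s else 0.0) + (1.0 if 3 in s else 0.0)


features = [1, 2, 3]
print(interaction_value(toy_value, features, 1, 2))  # 2.0
print(interaction_value(toy_value, features, 1, 3))  # 0.0
```

The joint payoff of 4.0 is split evenly between Φ1,2 and Φ2,1, which illustrates why the total interaction for a pair is 2Φi,j. For tree ensembles, the `shap` package exposes `TreeExplainer.shap_interaction_values`, which avoids this exponential enumeration; the source does not state which routine was used.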
For each feature, its main effect can be derived by subtracting all interaction effects from its SHAP value:

$$\Phi_{i,i} = \phi_i - \sum_{j \neq i} \Phi_{i,j}$$

where $\phi_i$ is the SHAP value of feature $i$. The sum of the main effects and interaction effects across all features equals the total deviation of the model prediction $f(x)$ from the baseline value $E[f(x)]$:

$$f(x) - E[f(x)] = \sum_{i=1}^{M} \sum_{j=1}^{M} \Phi_{i,j}$$
Traditional SHAP analysis assigns an importance score to each feature (e.g., mean absolute SHAP value). This score aggregates the feature’s own main effect and its interaction effects with all other features. However, this aggregated representation can be misleading. When the main effect of a feature and its interaction effects act in opposing directions, partial cancelation occurs during the traditional SHAP aggregation. This phenomenon causes the feature to exhibit a falsely low importance. Furthermore, the absolute magnitude of traditional SHAP interaction values is constrained by the feature’s inherent dimensions and data scale. Thus, it cannot serve as a universal comparison benchmark across different feature pairs.
To separate the interpretable components within traditional SHAP values and to characterize the relative strength of interaction effects at the feature-pair level, the interference of feature units and scales had to be eliminated. To address the lack of normalized quantitative benchmarks and grading standards, the Interaction Dominance Ratio (IDR) was proposed. The core design of the IDR is to evaluate the “interaction variation” relative to the “total effect variation.” For any feature pair ($i$, $j$), the IDR was defined as:

$$\mathrm{IDR}_{i,j} = \frac{P_{90}(\Phi_{ij}) - P_{10}(\Phi_{ij})}{\sigma\left(\Phi_{ii} + \Phi_{jj} + \Phi_{ij}\right)}$$

where $P_{90}$ and $P_{10}$ denote the 90th and 10th percentiles across samples and $\sigma$ denotes the standard deviation across samples.
Regarding this formula, the standard deviation is highly sensitive to extreme outlier samples. In contrast, the adoption of the percentile range (P90–P10) effectively truncated the influence of outliers induced by extreme climate disturbances. This metric focused on the interaction variation span of the central 80% of typical samples. It thereby robustly characterized the true interaction dominance degree of feature pairs under typical meteorological conditions.
In this equation, the numerator characterizes the discrete span (P90–P10) of the pure interaction effect value Φij across different samples. This reflected the variability of the interaction effect among observation points. The denominator characterized the overall discrete degree (standard deviation) of the total combined effect for the feature pair. This total combined effect was the sum of the pure main effects Φii and Φjj and the pure interaction effect Φij. By constructing this dimensionless ratio, the IDR achieved a “de-scaled” comparison of interaction strengths across feature pairs with different dimensions and magnitudes. This provided a specific mathematical implementation for the “horizontal comparison ruler” that was missing in previous studies.
In statistics and various applied fields, the effect size interpretation standards proposed by Jacob Cohen are widely accepted. Cohen categorized the magnitude of the Pearson correlation coefficient r into three levels, “small,” “medium,” and “large,” with corresponding thresholds of r ≈ 0.10, r ≈ 0.30, and r ≈ 0.50, respectively [25]. These thresholds have been validated and adopted in fields such as agriculture, ecology, and the social sciences [26, 27]. The IDR, as a dimensionless ratio measuring the relative dominance of interaction variation within total effect variation, shares a similar interpretive logic with the correlation coefficient: both express the strength of one source of variation on a dimensionless scale. Accordingly, adapting Cohen’s benchmarks to the IDR context provides a principled, albeit approximate, basis for grading interaction dominance (Table 1). An IDR ≥ 0.50 is defined as “strong interaction dominance,” indicating that the interaction variation span is at least half the dispersion of the total effect. An IDR of at least 0.30 but below 0.50 is defined as “moderate interaction dominance,” and an IDR < 0.30 as “weak interaction dominance.” While we acknowledge that the direct mapping from Cohen’s r thresholds to IDR values involves an analogical extension rather than a formal statistical equivalence, this adaptation offers a transparent and reproducible grading framework where none previously existed. Future work with larger datasets may enable empirical calibration of these boundaries through receiver operating characteristic analysis or domain-expert elicitation.
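The IDR and its three-tier grading can be sketched as follows, taking per-sample SHAP main effects and interaction values as plain lists. The percentile convention (inclusive deciles) and the use of the sample standard deviation are assumptions, since the text does not specify them; the function names are hypothetical.

```python
import statistics


def idr(phi_ij, phi_ii, phi_jj):
    """Interaction Dominance Ratio from per-sample SHAP interaction components."""
    deciles = statistics.quantiles(phi_ij, n=10, method="inclusive")
    span = deciles[8] - deciles[0]  # P90 - P10 of the pure interaction effect
    total = [a + b + c for a, b, c in zip(phi_ii, phi_jj, phi_ij)]
    spread = statistics.stdev(total)  # sample standard deviation of the combined effect
    return span / spread


def grade(value):
    """Three-tier grading adapted from Cohen's benchmarks."""
    if value >= 0.50:
        return "strong"
    if value >= 0.30:
        return "moderate"
    return "weak"
```

Because numerator and denominator scale together, multiplying all three inputs by a constant (for example, changing the yield unit) leaves the ratio unchanged, which is the "de-scaled" property the metric is designed for.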
Furthermore, to avoid the potential misjudgment of “high interaction ratio but low absolute contribution” that could arise from relying on a single IDR metric, a two-dimensional classification space was constructed. The “absolute interaction amplitude” served as the horizontal axis, and the IDR served as the vertical axis. Through median splits, this space mapped all feature pairs into four functional prototypes. These included interaction-dominated with high amplitude (upper right: strong synergistic/antagonistic pairs with genuine agronomic intervention value), interaction-dominated with low amplitude (upper left: an interaction mechanism exists but the absolute contribution is limited), main-effect-dominated with high amplitude (lower right: core driving factors in traditional understanding), and low-contribution feature pairs (lower left). This framework shifted the analytical perspective of the study from traditional “feature importance ranking” to “feature pair action mechanism tracing.” It clearly distinguished between two fundamentally different driving logics: “important due to inherent strength” and “important due to intense interaction.” Consequently, it provided a direct quantitative basis for formulating multi-factor synergistic regulation strategies.
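The median-split mapping into four prototypes can be sketched as below. The treatment of values exactly equal to a median is not specified in the text; ties are assigned to the "high" side here, and the function name and label strings are illustrative.

```python
import statistics


def classify_pairs(pairs):
    """Map {pair_name: (amplitude, idr)} to one of four prototypes via median splits."""
    amp_med = statistics.median(a for a, _ in pairs.values())
    idr_med = statistics.median(r for _, r in pairs.values())
    labels = {}
    for name, (amp, ratio) in pairs.items():
        high_amp = amp >= amp_med  # ties go to the "high" side (assumed)
        high_idr = ratio >= idr_med
        if high_idr and high_amp:
            labels[name] = "interaction-dominated, high amplitude"
        elif high_idr:
            labels[name] = "interaction-dominated, low amplitude"
        elif high_amp:
            labels[name] = "main-effect-dominated, high amplitude"
        else:
            labels[name] = "low contribution"
    return labels
```

Since both cut-offs are medians of the observed pairs, the quadrant assignment is relative to the feature pairs in the study rather than to any absolute benchmark.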
2.6. Model Evaluation Metrics
To objectively evaluate the performance of the rice Meteorological Yield prediction models, three commonly utilized statistical indicators were selected for quantitative assessment. These included the coefficient of determination (R-squared, R2), the Root Mean Squared Error (RMSE), and the Mean Absolute Error (MAE).
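For reference, the three indicators follow their standard definitions. The sketch below uses only the Python standard library; the function name and the sample values are illustrative and do not come from the study's data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for paired observed and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r ** 2 for r in residuals)          # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


if __name__ == "__main__":
    # Hypothetical observed and predicted values
    r2, rmse, mae = regression_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 3.0, 3.0])
    print(f"R2={r2:.3f} RMSE={rmse:.3f} MAE={mae:.3f}")
```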
4. Discussion
4.1. Our Findings Based on SHAP and ZCT
Based on the dual-indicator detection system constructed from the SHAP dependence curves, the response transition characteristics of all 11 optimized meteorological factors were successfully extracted. Each factor exhibited a single Zero-Crossing Threshold (ZCT) point, signifying a monotonic reversal in the direction of its marginal contribution. A further comparison of the uncertainty spans (the ratio of the confidence interval width to the feature value range) across the factors revealed significant heterogeneity in their response patterns. For instance, the spans for the June average temperature and the June average minimum temperature were extremely narrow (<0.1 °C). These factors displayed an abrupt response reversal with a distinct boundary, indicating high statistical stability at their threshold points. Conversely, the span for the April–July cumulative precipitation was relatively wide (reaching 26.0 mm). The SHAP curve for this factor exhibited a gentle slope near the zero-crossing point, reflecting a gradual transition driven by the water regulation capacity of the soil-crop system.
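The idea of a zero-crossing threshold can be sketched as a sign-change search along a SHAP dependence curve. This is not the dual-indicator procedure used in the study, whose details (including the confidence intervals) are not reproduced here. It assumes the curve has already been smoothed and sorted by feature value, and it uses linear interpolation between neighbouring samples. The function name and sample values are hypothetical.

```python
def zero_crossings(xs, ys):
    """Return feature values at which the curve (xs, ys) changes sign.

    xs: feature values sorted in ascending order.
    ys: SHAP values of a (pre-smoothed) dependence curve at those points.
    """
    crossings = []
    for i in range(len(xs) - 1):
        x0, y0, x1, y1 = xs[i], ys[i], xs[i + 1], ys[i + 1]
        if y0 == 0.0:
            crossings.append(x0)  # sample sits exactly on zero
        elif y0 * y1 < 0.0:
            # Linear interpolation between the two bracketing samples
            crossings.append(x0 + (x1 - x0) * (-y0) / (y1 - y0))
    if ys and ys[-1] == 0.0:
        crossings.append(xs[-1])
    return crossings


if __name__ == "__main__":
    # Hypothetical curve: negative contribution below the threshold, positive above
    print(zero_crossings([10.0, 11.0, 12.0, 13.0], [-0.4, -0.2, 0.2, 0.4]))
```

A factor with a single ZCT, as reported for all 11 factors, would yield a one-element list under this sketch.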
This heterogeneity provides a differentiated perspective for the refined early warning of agrometeorological disasters. For factors characterized by a Narrow Transition Zone, an explicit single early warning threshold can be established. For factors within a Wide Transition Zone, it is necessary to consider implementing a “transition early warning interval” rather than relying exclusively on a single numerical value. Under this theoretical framework, the meteorological thresholds identified in this study possess explicit agronomic validation value:
(1) Validation of the sowing period temperature index: The ZCT for the March average temperature was 11.6 °C, with an extremely narrow interval. Model predictions indicated yield reductions below this value, whereas yield-increase effects emerged above it. This threshold aligns closely with local agrometeorological practices for early rice sowing in the Ningbo region. Local agrometeorological observations indicate that a stable March average temperature passing through 10–12 °C serves as the primary basis for determining the safe sowing period. Temperatures below this limit frequently lead to bud rot and seedling death [
28].
(2) Validation of the composite disaster index during the key growth period: The ZCT for the August–September average temperature was 26.2 °C, and the ZCT for the August cumulative precipitation was 210.6 mm. These two factors precisely formed the core nodes of the Interaction Dominance Triangular Network identified through Interaction Dominance Ratio (IDR) analysis. Local disaster records provided direct physical validation for these two thresholds. Previous research demonstrates that a daily average temperature ≥ 27 °C during the Heading and Grain-Filling Stage of single-season rice in Ningbo can induce significant heat damage. This damage subsequently reduces the seed-setting rate and the 1000-grain weight. Furthermore, August is characterized by frequent typhoons. When monthly precipitation exceeds 200 mm, continuous field waterlogging readily occurs. The superimposition of high temperatures generates a composite stress involving High-Temperature Induced Premature Maturity and overcast, rainy conditions with insufficient sunlight. Under such conditions, yield losses are severely amplified [
29,
30]. The thresholds of 26.2 °C and 210.6 mm identified by the model fall precisely within the risk intervals confirmed by local agrometeorological experiments. This confirms that the “ZCT-DET-IDR” framework captures actual disaster physical processes rather than statistical artifacts.
(3) Validation of the moisture management index during the grain-filling period: The ZCT for the October cumulative precipitation was 83.3 mm. The grain-filling period for late rice in Ningbo coincides with the peak season for continuous autumn rains. Local field experiments and disaster surveys indicate that when October cumulative precipitation consistently exceeds 80 mm, root hypoxia and premature leaf senescence occur. This significantly inhibits substance transport and grain filling [
31]. This explicit threshold is highly consistent with the model outputs, highlighting the need for meticulous drainage and waterlogging mitigation management during the late rice grain-filling period.
4.2. What Role Did DET Play as a Complement to ZCT?
Building upon the absolute reversals identified by the ZCT, the Derivative Extrema Threshold (DET) method further revealed the dynamic evolution of marginal effects. As previously noted, the DET point for the August–September average temperature (26.86 °C) was located to the right of the ZCT and exhibited a significant negative value. This represented a typical deterioration acceleration zone. It indicated that once the high-temperature threshold is breached during the grain-filling stage, negative effects escalate sharply without a buffer. Conversely, the negative effect weakening point detected for the May cumulative precipitation to the left of the ZCT (64.9–79.5 mm) accurately mapped a drought mitigation acceleration zone during the transplanting and returning green stage. It must be emphasized that the DET points for certain long-period cumulative precipitation variables (e.g., March–June cumulative precipitation) fell into extreme tails characterized by sparse data. These points essentially represent uncertainty identifiers of model extrapolation and should not be directly assigned agronomic significance. This observation highlights that when DET is used as a supplement to ZCT, IQR filtering must be applied to eliminate statistical artifacts.
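The two steps discussed above, locating extrema of the first derivative and discarding candidates in sparse tails, can be sketched as follows. Two simplifications are assumed and should not be read as the study's implementation. First, finite differences replace the cubic spline derivative. Second, "IQR filtering" is interpreted as keeping only candidates whose location lies within Tukey fences (Q1 − k·IQR, Q3 + k·IQR) of the observed feature values, with k as a free parameter. All names and values are hypothetical.

```python
from statistics import quantiles


def derivative_extrema(xs, ys):
    """Return (location, slope) pairs where the finite-difference slope
    of the curve (xs, ys) reaches an interior local maximum or minimum."""
    mids = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
    extrema = []
    for i in range(1, len(slopes) - 1):
        before = slopes[i] - slopes[i - 1]
        after = slopes[i + 1] - slopes[i]
        if before * after < 0:  # slope trend reverses at this segment
            extrema.append((mids[i], slopes[i]))
    return extrema


def iqr_filter(points, feature_values, k=1.5):
    """Keep candidates located inside the Tukey fences of the feature values.

    Assumption: this is one possible reading of the IQR filtering step.
    """
    q1, _, q3 = quantiles(feature_values, n=4)
    lower, upper = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [(x, d) for x, d in points if lower <= x <= upper]


if __name__ == "__main__":
    # Hypothetical dependence curve with its steepest rise between x = 2 and x = 3
    print(derivative_extrema([0, 1, 2, 3, 4, 5], [0.0, 0.1, 0.4, 1.2, 1.5, 1.6]))
```

A positive slope extremum marks the steepest improvement and a negative one the steepest deterioration. Candidates removed by the filter correspond to the sparse-tail points that the text advises against interpreting agronomically.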
Through effect decomposition, traditional SHAP analysis revealed significant effect cancelation for the June average minimum temperature and the August cumulative precipitation. This implies that relying solely on traditional mean absolute SHAP values underestimates their true potential. Overcoming this aggregation bias, the IDR matrix accurately characterized the Interaction Dominance Triangular Network composed of the August–September average temperature, the June average minimum temperature, and the August cumulative precipitation. From a rice phenological perspective, this network reflects the deep coupling between meteorological stress during the reproductive growth stage and the preceding climatic background of the vegetative growth stage. The high IDR (0.622) between the August–September average temperature and the June average minimum temperature indicates that the temperature effect during the Heading and Grain-Filling Stage is highly dependent on the preceding June temperature baseline. Growth delays induced by low temperatures in June alter the Phenological Window for subsequent high-temperature exposure. Additionally, the high IDR (0.549) between the August–September average temperature and the August cumulative precipitation reflects the nonlinear amplification effect that occurs when High-Temperature Induced Premature Maturity and insufficient sunlight overlap during the heading window. This forms a closed-loop mutual corroboration with the previously mentioned local composite disaster records. These findings have paradigm-shifting implications for agrometeorological disaster early warning. Traditional single-factor threshold warnings pose a systematic underestimation risk when confronted with this interaction dominance triangle. For high-IDR factors, meteorological services should avoid issuing isolated temperature or precipitation forecasts. Instead, a Composite Meteorological Index based on multi-factor combinations should be established, such as a dynamic heat damage index incorporating preceding temperature anomaly correction terms.
To contextualize our findings within a broader geographical framework, it is essential to compare the identified meteorological thresholds and interaction mechanisms with those reported in other global rice-growing regions. For instance, our study identified a ZCT of 26.2 °C for the August–September average temperature (TAVG_8_9) in Ningbo. This aligns closely with global meta-analyses showing that critical thermal thresholds for rice during the reproductive stage typically range from 26 °C to 28 °C, beyond which heat-induced sterility and yield reduction accelerate non-linearly [
32,
33]. Similarly, the strong interaction dominance between high temperature and excessive precipitation (IDR = 0.549 for TAVG_8_9 × R2020_8) corroborates global observations of compound extreme events. Previous studies have demonstrated that the co-occurrence of heat stress and waterlogging amplifies yield losses far beyond the sum of their individual effects—a phenomenon increasingly recognized in monsoon-affected regions of South and Southeast Asia [
34]. The cross-regional consistency of these thresholds and interactions confirms that the physical mechanisms captured by our model are not localized statistical artifacts, but rather reflect fundamental physiological responses of rice to climatic stress.
Regarding methodological advancements, the proposed ZCT-DET-IDR framework demonstrates distinct advantages over conventional interpretable machine learning approaches, though it is not without limitations. Traditional Partial Dependence Plots (PDP) and Accumulated Local Effects (ALE) plots are effective in visualizing average marginal effects but often obscure threshold behaviors by averaging across instances [
9]. While standard SHAP dependence plots can reveal Zero-Crossing Thresholds (ZCT), they fail to capture qualitative shifts in effect intensity that occur without directional reversal, a gap that our Derivative Extrema Threshold (DET) approach fills. For example, in predicting crop yields under climate extremes, researchers frequently note diminishing marginal returns or effect saturation of certain agronomic factors but lack a quantitative tool to locate the exact inflection point [
3]. DET mathematically locates these inflection points via first-derivative extrema, providing an “intensity early warning” that ZCT misses. Furthermore, compared to the purely qualitative interpretation of SHAP interaction heatmaps common in existing literature, the Interaction Dominance Ratio (IDR) provides a normalized, dimensionless metric that enables cross-comparison of interaction dominance across feature pairs. Unlike the absolute SHAP interaction values—which are inherently constrained by feature scales and thus cannot differentiate between “strong interaction with weak main effect” and “weak interaction with strong main effect” [
35]—IDR eliminates dimensional interference. However, a limitation of IDR is that its current grading thresholds (0.30 and 0.50) are calibrated analogously to Cohen’s effect size benchmarks. While theoretically robust and empirically functional in this dataset, these cut-offs may require regional recalibration. Future simulation studies with known ground-truth interaction strengths are necessary to validate the universality of these specific boundaries.
A fundamental caveat of this study is that the identified thresholds and interaction networks characterize the conditional response modes of the machine learning model, not direct causal crop responses. While the core thresholds (e.g., 26.2 °C for TAVG_8_9, 210.6 mm for R2020_8) demonstrate high consistency with local agrometeorological records, SHAP-based attributions are ultimately correlational. Unobserved confounding factors, such as concurrent shifts in agronomic management practices or soil properties, may be partially captured by the model and erroneously attributed to meteorological drivers. Therefore, translating these model-identified statistical transition points into standard agrometeorological management protocols requires rigorous validation through causal evidence from controlled experiments, such as artificial climate chambers or staggered planting trials.
4.3. Limitations and Future Work
Several limitations of the present study should be acknowledged:
(1) Regarding the temporal feature expansion strategy, the generation of 216 meteorological variables from only six base variables inevitably introduces multicollinearity and redundancy due to overlapping temporal windows. Although LASSO regression was employed for preliminary feature screening, LASSO tends to select one variable from a group of highly correlated variables and discard the others, which may not always retain the most agronomically meaningful representative. Future studies could consider variance inflation factor (VIF) analysis or principal component analysis (PCA) as complementary screening steps to more explicitly address multicollinearity.
(2) Regarding the robustness of the DET methodology, the DET results are sensitive to the choice of spline interpolation method, smoothing parameter, and IQR filtering criterion. The cubic spline interpolation with smoothing parameter s = 0 may produce oscillation artifacts in sparse data regions, as acknowledged in the Results section where several DET points were identified as uncertainty markers originating from spline extrapolation. Alternative smoothing strategies (e.g., LOESS, kernel regression, or different spline smoothing parameters) may yield different threshold patterns. The finding that all 11 factors exhibited single zero-crossing behavior without multiple threshold structures may also be influenced by the smoothing strategy or the relatively limited sample size. Future work should systematically evaluate the sensitivity of DET results to these methodological choices through robustness experiments with varying smoothing parameters and interpolation methods.
(3) Regarding the statistical significance of model improvements, the optimized LightGBM model achieved an R2 of 0.833 after reducing the feature set from 15 to 11 variables, compared to 0.809 with the full feature set. While this improvement is consistent with the expected benefit of removing noise features, the absence of bootstrap uncertainty intervals or repeated experimental runs makes it difficult to determine whether this improvement is statistically significant or simply due to sampling variability. Similarly, the dominance of August precipitation (R2020_8) as the most influential feature has not been evaluated for stability across different training/testing partitions or alternative feature selection procedures. A stability analysis using bootstrap resampling or multiple random splits would strengthen confidence in the reported importance rankings.
(4) Regarding spatial heterogeneity, the interaction analysis identified a strong triangular interaction network among TAVG_8_9, TMIN_6, and R2020_8 at the aggregate level across all nine districts. However, whether these interactions are consistent across all districts within Ningbo or whether spatial heterogeneity exists has not been explored. Coastal districts (e.g., Xiangshan) and inland districts (e.g., Yuyao) may exhibit different interaction patterns due to variations in topography, soil type, and microclimate. Future work should conduct district-level interaction analyses to explore regional variability and improve the practical applicability of the framework.
(5) Regarding generalizability, the applicability of the proposed “ZCT-DET-IDR” framework beyond the Ningbo rice production system remains unclear. The framework was developed and validated using data from a single region with a specific crop type, climatic regime, and agricultural practice. External validation using independent datasets from other crops (e.g., wheat, maize) and climatic regions (e.g., arid, continental) would be necessary to demonstrate the transferability of the methodology. The absence of comparative analyses with alternative threshold identification techniques, ablation studies, or sensitivity analyses further limits the demonstration of the practical superiority or robustness of the DET and IDR methods. Nevertheless, the mathematical formulations of DET and IDR are not crop- or region-specific, and their potential applicability to other XAI interpretation scenarios based on tree models provides a reasonable basis for cautious optimism regarding cross-domain portability.
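The bootstrap check suggested in limitation (3) could take the following form. This is a sketch under stated assumptions rather than an analysis performed in the study. It resamples paired observed and predicted values from a held-out set with replacement and reports a percentile interval for R2. The function names, the number of resamples, and the seed are arbitrary choices, and the example values are invented.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination for paired values."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def bootstrap_r2_interval(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for R2 (resampling prediction pairs)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if max(yt) == min(yt):
            continue  # R2 is undefined when the resample has no variance
        yp = [y_pred[i] for i in idx]
        scores.append(r2_score(yt, yp))
    scores.sort()
    lower = scores[int((alpha / 2) * len(scores))]
    upper = scores[int((1 - alpha / 2) * len(scores)) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical observed and predicted yields
    observed = [float(v) for v in range(1, 11)]
    predicted = [v + (0.5 if i % 2 else -0.5) for i, v in enumerate(observed)]
    print(bootstrap_r2_interval(observed, predicted))
```

If the intervals obtained for the 15-variable and 11-variable models overlap substantially, the reported difference between 0.809 and 0.833 could not be distinguished from sampling variability by this check alone. Note that resampling predictions in this way does not capture variability from refitting the model, which would require repeating the training step on each resample.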
5. Conclusions
Following dual screening via LASSO and SHAP-Recursive Feature Elimination (RFE), the LightGBM model incorporating 11 core meteorological factors demonstrated optimal performance in predicting rice yield in the Ningbo region (R2 = 0.833), achieving a synergistic enhancement of both prediction accuracy and model interpretability. Building upon this, the proposed Derivative Extrema Threshold (DET) successfully addressed the limitation of the traditional Zero-Crossing Threshold (ZCT) by identifying intensity mutation characteristics. Furthermore, the Interaction Dominance Ratio (IDR) facilitated the horizontal quantitative grading of interaction effects and accurately characterized the Interaction Dominance Triangular Network. When cross-checked against local agrometeorological experiment records and disaster survey reports in Ningbo, the core thresholds extracted by the model (e.g., 11.6 °C for the March average temperature, 26.2 °C for the August–September average temperature, and 210.6 mm for the August precipitation) exhibited high consistency with the critical conditions of actual regional rice disasters. This confirmed that the “ZCT-DET-IDR” framework is capable of more than statistical inflection point detection; the identified thresholds correspond directly to disaster occurrence boundaries with distinct physical and agronomic significance. Consequently, this framework provides a decision-making foundation for the refined early warning of regional agrometeorological disasters, integrating mathematical rigor with practical applicability.
From the perspective of methodological advancement, the “ZCT-DET-IDR” framework constructed in this study possesses potential cross-domain portability. The DET addresses the challenge of identifying qualitative changes in effect intensity in the absence of a directional reversal. Concurrently, the IDR resolves the issue regarding the de-scaled comparison of interaction intensities across different feature pairs. These methodological challenges are not exclusive to rice meteorological prediction; they are prevalent in Explainable Artificial Intelligence (XAI) interpretation scenarios based on tree models. However, because the “ZCT-DET-IDR” framework explains model predictions rather than proving causal crop responses, and the IDR grading thresholds may require regional recalibration, the generalizability of these findings must be validated through controlled agronomic experiments and cross-regional datasets. Future work should focus on integrating causal inference methods and multi-source datasets to refine this framework into a standardized “threshold-interaction” dual-metric toolbox for XAI.