1. Introduction
China is the world’s largest tobacco producer and consumer, ranking first globally in both yield and sales and accounting for about 30% of global tobacco production [1,2]. As an important pillar of the national economy, the tobacco industry generated a total industrial and commercial tax and profit of 1600.8 billion CNY in 2024 (accounting for 9.24% of the national tax revenue) and a total fiscal revenue of 1544.6 billion CNY. Its stable development is crucial to economic and fiscal revenue growth [3,4,5].
In the global tobacco industry chain, flue-cured tobacco is the core raw material for cigarette production. Its yield and quality directly shape the trajectory of industrial development, which makes it the core research object of tobacco agronomy. As the core production area of China’s flue-cured tobacco, Yunnan Province has a long planting history and an extensive regional distribution, with 12 prefectures and cities across the province carrying out large-scale planting annually. Statistics from 2024 showed that Yunnan Province’s tobacco industry ranked first in the country, accounting for 39.8% of the total contribution to the national economy, making it an important area for ensuring the security of the national tobacco supply and industrial stability [6].
Meteorological conditions are key drivers of flue-cured tobacco yield and quality [7]. Yunnan Province has a subtropical plateau monsoon climate [8]. Although the field growth period of flue-cured tobacco (April–September) offers sufficient light and suitable temperatures, extreme meteorological events such as regional droughts and episodic rainstorms caused by strong summer convective weather occur frequently. These events significantly interfere with seedling survival at the transplanting stage, with long-term nutrient accumulation, and with quality formation in the maturity period, resulting in pronounced inter-annual fluctuations in yield. Therefore, accurately analyzing the relationship between meteorological factors and flue-cured tobacco production and building a scientific yield prediction model have important theoretical and practical significance for optimizing the production layout, formulating disaster prevention and mitigation measures, and ensuring the stable development of the industry. Crop yield formation results from the combined action of multiple factors, including agricultural scientific and technological progress (such as variety improvement and cultivation technology upgrades) [9], fluctuations in meteorological conditions [10], soil characteristics [11], and field management measures [12]. Among them, meteorological factors have become the core variables in yield prediction research because of their strong inter-annual variability and the fact that they cannot be controlled [13].
In recent years, research on the relationship between meteorological factors and crop yield has deepened. Based on temperature and precipitation data, Zhao et al. used regression models to quantify the influence of meteorological factors on China’s grain yield [14]; Ma et al. combined trend yield and meteorological yield to construct a time series prediction model, providing methodological support for the dynamic prediction of crop yields [15]; Bognár et al. revealed the significant impact of seasonal climate fluctuations on the yields of crops such as corn and winter wheat through partial least squares regression [16]; and Didari et al. used the Lasso regression model to confirm the key role of extreme temperature and precipitation events in the estimation of dryland wheat yields [17]. These studies indicate that analyzing the nonlinear relationship between meteorological factors and crop yield is central to achieving accurate prediction. Current crop yield prediction methods fall into three main categories: crop model methods, statistical model methods, and decomposition model methods. The crop model method estimates yield by simulating the physiological and biochemical processes of crops, but it relies on detailed field parameters. Due to the high cost and poor timeliness of parameter acquisition, its large-scale practical application is limited [18]. The statistical model method predicts yield by constructing a mathematical relationship between meteorological factors and yield, but its ability to represent complex nonlinear relationships is limited, making it difficult to capture the abrupt effects of extreme meteorological events [19]. The decomposition model method separates the actual yield into trend yield (reflecting long-term stable factors such as scientific and technological progress) and meteorological yield (reflecting the short-term impact of climate fluctuations). It can support dynamic prediction based on real-time meteorological data, which makes it particularly valuable in agricultural production [20]. Commonly used trend yield extraction methods include the moving average method, the exponential smoothing method, and the high-pass filtering method, but these methods remain susceptible to short-term random fluctuations and still need to be optimized to capture long-term trends more accurately.
With their outstanding ability to handle nonlinear relationships, machine learning algorithms have been widely used in crop yield prediction [21]. However, a single model is limited by its own assumptions and data adaptability, and its prediction accuracy is susceptible to extreme meteorological events [22,23]. When comparing various machine learning and deep learning models, Sharma et al. found that single models such as decision trees and convolutional neural networks have obvious shortcomings in crop yield prediction, with errors significantly larger than those of random forests and other ensemble models [24]; Luo et al. [25] further confirmed, in research on corn yield prediction under drought conditions, that the predictive ability of traditional single algorithms declines during extreme climate events, and that their robustness remains insufficient even when optimized remote sensing indicators are introduced. By integrating the prediction results of multiple base models, an ensemble learning model can effectively reduce the bias and variance of a single model and improve prediction stability [26]. Among ensemble approaches, the Stacking model is an advanced method that excels at integrating the feature information of different base models through a meta-model. Islam et al. [27] found in rice yield estimation that Stacking multiple tree-based regression models yielded a prediction accuracy significantly better than that of linear regression or individual machine learning models, especially when handling nonlinear coupling between meteorological and remote sensing data.
Based on meteorological data and flue-cured tobacco yield data, with 12 long-term major flue-cured tobacco-producing prefectures and cities in Yunnan Province as the study area, this study constructs a coupled flue-cured tobacco yield prediction framework integrating polynomial regression and the Stacking model. It focuses on exploring the applicability of this framework in the accurate prediction of flue-cured tobacco yield and its ability to interpret key meteorological factors. Using this coupled framework allows for the accurate separation of trend yield and meteorological yield, mitigating the interference of short-term fluctuations on prediction results. Meanwhile, it improves the accuracy of interpreting the nonlinear effects of meteorological factors during key growth stages and optimizes the efficiency of regional flue-cured tobacco yield prediction. The specific research objectives are as follows: (1) Verify the feasibility of the polynomial regression–Stacking coupled framework for flue-cured tobacco yield prediction in Yunnan. (2) Interpret the influence mechanism of meteorological factors during key growth stages on flue-cured tobacco yield using the SHAP (SHapley Additive exPlanations) method. (3) Identify the optimal lead time for flue-cured tobacco yield prediction, so as to provide a basis for formulating production scheduling and disaster prevention/mitigation measures in tobacco-growing areas. This study can provide methodological support for flue-cured tobacco yield prediction in subtropical plateau tobacco-growing areas and offer technical references for the application of machine learning in crop yield simulation.
3. Results
3.1. Comparison of Yield Decomposition Results for Flue-Cured Tobacco
To clarify the differences in the application of different trend decomposition methods in flue-cured tobacco yield analysis, the flue-cured tobacco yield in Honghe from 2003 to 2020 was used as the study subject, and the moving average method, exponential smoothing method, high-pass filtering method, and polynomial regression method were adopted to conduct the separation experiment of trend yield and meteorological yield.
The results show that the four methods differ markedly in how they decompose the flue-cured tobacco yield series. The trend yield extracted by the moving average method and the exponential smoothing method shows obvious high-frequency fluctuations. Long-term drivers of flue-cured tobacco yield, such as technological iteration and policy support, act continuously, but these two methods are too sensitive to short-term fluctuations to characterize the long-term trend stably. The resulting trend yield curve fluctuates frequently and cannot accurately reflect the long-term development of the industry. The high-pass filtering method is based on frequency domain separation and separates meteorological disturbances (high-frequency signals) well. Figure 3 shows that the fluctuation amplitude of its trend yield is lower than that of the moving average method and the exponential smoothing method. However, the trend yield extracted by this method rises first and then falls, which contradicts the continuous progress in flue-cured tobacco production technology and the steady increase in industrial investment: driven by agricultural technology iteration and continuous policy support, the trend yield should increase continuously or evolve gradually toward a steady state. Because the method focuses excessively on separating high-frequency meteorological signals, it cuts the low-frequency long-term component of the trend yield into segments, which easily leads to stage-wise deviations in trend analysis and fits poorly with the continuous “technology accumulation–continuous gain” process of industrial development. In comparison, the trend yield curve fitted by the polynomial regression method is smooth and continuous. It captures the long-term evolution of flue-cured tobacco production, accommodates the cumulative gain brought about by technological progress, and weakens the interference of short-term random fluctuations, which is consistent with the development pattern of the tobacco industry and better reflects the driving effect of long-term factors on yield.
The separation accuracy of meteorological yield depends on the accuracy of trend yield extraction, and the methods differ significantly in this respect. Because the polynomial regression method captures the trend accurately, the meteorological yield it separates reflects the impact of climate fluctuations more faithfully. In 2012, the meteorological yield of flue-cured tobacco was −202 kg/ha, which was directly related to the extreme precipitation in Yunnan that year and the resulting diseases in tobacco fields. Extreme climate events damage the growth environment of tobacco plants, inhibit photosynthesis and nutrient absorption, and ultimately cause yield loss. This result reflects the driving effect of meteorological factors on flue-cured tobacco production [35]. In contrast, with the moving average method and the exponential smoothing method, trend fluctuations mask the meteorological signal, so climate disturbances cannot be clearly distinguished from trend noise in the meteorological yield. With the high-pass filtering method, trend segmentation departs from the logic of industrial development and mixes spurious fluctuations into the meteorological yield. Both cases lead to “signal distortion” and cannot provide accurate data support for research on climate–yield response mechanisms.
In summary, the polynomial regression method has significant advantages in the trend decomposition of flue-cured tobacco yield. By fitting a smooth curve, it captures the long-term increase in production driven by technological progress, weakens the interference of abnormal meteorological events on trend separation, and builds a solid data foundation for the subsequent prediction of flue-cured tobacco meteorological yield. At the same time, a precisely separated meteorological yield can quantify the impact of climate factors on yield, provide methodological support for the industry to formulate disaster-avoidance planting plans and improve meteorological insurance strategies, and help the flue-cured tobacco industry improve its climate adaptability and risk response capabilities.
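The decomposition described in this subsection can be illustrated with a minimal sketch. It assumes a second-degree polynomial (the degree used in the study is not stated here) and uses a synthetic yield series rather than the Honghe data; the function name `decompose_yield` is hypothetical.

```python
import numpy as np

def decompose_yield(years, yields, degree=2):
    """Split a yield series into a polynomial trend and a residual part.

    The residual plays the role of the meteorological yield. The degree is
    an assumption made for illustration only.
    """
    years = np.asarray(years, dtype=float)
    yields = np.asarray(yields, dtype=float)
    # Center the years so the least-squares fit is numerically well conditioned.
    t = years - years.mean()
    coeffs = np.polyfit(t, yields, deg=degree)
    trend = np.polyval(coeffs, t)
    meteo = yields - trend
    return trend, meteo

# Synthetic series (not real data): steady growth plus a loss in one year.
years = np.arange(2003, 2021)
yields = 1800.0 + 15.0 * (years - 2003)
yields[years == 2012] -= 200.0

trend, meteo = decompose_yield(years, yields, degree=2)
worst_year = int(years[np.argmin(meteo)])
print(worst_year, round(float(meteo.min()), 1))
```

In this toy series the most negative residual falls in the year with the injected loss, which mirrors how a smooth trend leaves a single-year climate shock in the meteorological component.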
3.2. Meteorological Characteristics Screening
In order to eliminate the influence of multicollinearity among variables and improve the model’s computing efficiency and prediction accuracy, the characteristic factors with a high correlation were eliminated through Pearson’s correlation analysis, and 83 characteristic factors were retained for subsequent screening. These characteristic factors specifically include 16 basic meteorological characteristics, 33 derivative meteorological characteristics, 19 meteorological suitability indexes, and 15 meteorological interaction characteristics.
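A correlation-based pre-filter of this kind might look like the sketch below. The 0.9 threshold, the greedy keep-first rule, and the synthetic feature names are all assumptions for illustration; the study does not state its cutoff or tie-breaking rule in this section.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.9):
    """Keep a feature only if its |Pearson r| with every kept feature is below threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Synthetic features (not real data): TMEAN is almost a shifted copy of TMAX.
rng = np.random.default_rng(0)
tmax = rng.normal(28.0, 2.0, size=200)
tmean = tmax - 5.0 + rng.normal(0.0, 0.1, size=200)
rain = rng.normal(150.0, 40.0, size=200)

X = np.column_stack([tmax, tmean, rain])
kept = drop_correlated(X, ["TMAX", "TMEAN", "RAIN"])
print(kept)
```

A greedy rule depends on column order, so in practice the ordering (for example by agronomic relevance) would need to be chosen deliberately.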
In the Recursive Feature Elimination (RFE) process, Random Forest was used as the base model, model performance was evaluated through cross-validation scores, and the optimal feature subset size was determined based on the mean square error (MSE). The results show that the MSE changes in stages as the number of input variables increases: it decreases markedly from 1 to 6 variables, fluctuates from 6 to 17 variables, and rises slowly from 17 to 83 variables (Figure 4). Taking into account both model performance and variable redundancy, we selected 17 variables for model construction. This subset retains the core information of the original data set while effectively avoiding the overfitting caused by redundant variables.
3.3. Results and Analysis of Yield Prediction Modeling
To verify the predictive effectiveness of different models for flue-cured tobacco yield, this study used 216 flue-cured tobacco yield records from 12 regions in Yunnan Province (2003–2020) as the sample set and randomly divided them into a training set (151 samples) and a test set (65 samples) at a 7:3 ratio. The data from 2021 to 2023 were retained for model performance verification. The prediction results of the Random Forest (RF), Multi-Layer Perceptron (MLP), Support Vector Regression (SVR), ridge regression (Ridge), and Stacking models were then compared (Table 2).
The results show that the R² of each model exceeds 0.82, with the Stacking model exhibiting the best performance. Among the single models, RF generalizes well owing to its decision tree ensemble strategy, but it lacks the ability to capture nonlinear responses to extreme meteorological events. Although MLP is good at nonlinear fitting, it overfits easily with small samples. SVR and ridge regression handle high-dimensional features poorly. The Stacking model integrates the predictions of the base models via a ridge regression meta-model that balances bias and variance, and its RMSE is significantly lower than that of the MLP, which verifies the advantages of ensemble learning in scenarios involving complex coupling of meteorological factors.
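The model layout described here (RF, MLP, SVR, and ridge regression as base learners with a ridge regression meta-model, trained on a 7:3 split) can be sketched with scikit-learn's `StackingRegressor`. All hyperparameters, the scaling pipelines, and the synthetic data are assumptions; only the row count of 216 mirrors the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data (not real data): one linear and one nonlinear driver.
rng = np.random.default_rng(0)
X = rng.normal(size=(216, 5))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=216)

# A 7:3 random split of 216 rows gives 151 training and 65 test samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

base_models = [
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("mlp", make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    )),
    ("svr", make_pipeline(StandardScaler(), SVR())),
    ("ridge", Ridge(alpha=1.0)),
]

# The meta-model is fitted on out-of-fold predictions of the base models.
stack = StackingRegressor(estimators=base_models, final_estimator=Ridge(alpha=1.0), cv=5)
stack.fit(X_train, y_train)

predictions = stack.predict(X_test)
r2 = stack.score(X_test, y_test)
print(round(r2, 3))
```

Fitting the meta-model on out-of-fold predictions is what keeps it from simply rewarding whichever base model overfits the training set most.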
3.4. Advance Prediction of Flue-Cured Tobacco Yield
Given that flue-cured tobacco matures in September in the study area, this study used the 2003–2020 data set to construct a yield prediction model that combines polynomial regression and the Stacking ensemble model with monthly meteorological data. By analyzing how prediction accuracy changes across the months of the growth period, the optimal lead time for yield prediction was determined (Table 3).
From the perspective of prediction accuracy dynamics, the model accuracy showed a gradual optimization trend with the progression of the growth period: the prediction performance was the worst in the early stage of the vigorous growth period (May), improved in July, and was significantly optimized after entering the maturity period (August–September). Among them, the accuracy was the highest in September (maturity period), and the accuracy in August was close to that in September.
The verification of early prediction performance showed that the Stacking model performed best approximately one month before the maturity period (August, the end of the vegetative growth stage). At this time, the model R² reached 0.78 and the MAPE was 2.92%, an error level close to that of the maturity period (September) (MAPE = 2.29%, R² = 0.87). This result indicates that August can be used as the best lead time for short-term forecasts of flue-cured tobacco production. The model can effectively capture the relationship between meteorological factors and yield formation during the growth period, provide reliable decision-making support for production areas to plan curing schedules, disaster prevention and mitigation, and other management measures in advance, and has significant practical application value.
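A lead-time experiment of this kind needs, for each forecast month, only the features that would already be observed. The sketch below shows one possible way to restrict a feature list by month. It assumes that monthly features end in their month number (as in TDIFF8 or IRRAD4); the helper `features_available_by` and the name RAIN9 are hypothetical, and stage-level aggregates without a month suffix would need a separate mapping that is not shown.

```python
import re

def features_available_by(feature_names, lead_month):
    """Keep monthly features whose trailing month number is <= lead_month."""
    kept = []
    for name in feature_names:
        match = re.search(r"(\d+)$", name)
        if match and int(match.group(1)) <= lead_month:
            kept.append(name)
    return kept

# RAIN9 is a made-up name used only to show a feature being excluded.
names = ["IRRAD4", "TMAX5", "TDIFF6", "IRRAD6", "TDIFF8", "RAIN9"]

august_features = features_available_by(names, lead_month=8)
may_features = features_available_by(names, lead_month=5)
print(august_features)
print(may_features)
```

Refitting the model on each restricted set and comparing the resulting R² and MAPE would give a month-by-month accuracy profile of the kind summarized above.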
3.5. Verification and Analysis of the Model’s Prediction Performance
To further verify the practical predictive effectiveness of the model, this study selected flue-cured tobacco yield data from four typical regions in Yunnan Province (2021–2023) for independent sample validation and systematically compared the deviation between the predicted yield and the actual yield (Table 4).
The results showed high overall consistency between the predicted yield and the actual yield in each region. The Chuxiong area exhibited low prediction errors during the validation period, with the best accuracy in 2022 and a slight increase in error in 2023 due to abnormal fluctuations in meteorological conditions. The Honghe area had a controllable overall error during the validation period, with the smallest error in 2021 and a significant increase in error in 2023, when regional drought events caused the actual yield to deviate from its usual fluctuation range. The forecast error in Kunming showed significant inter-annual differences: the MAPE in 2021 was only 0.11%, with forecast results close to the actual value, whereas in 2023 extreme high-temperature weather made the actual yield abnormally high, raising the MAPE to 6.73%. The forecast stability in the Baoshan area was relatively good, with a three-year MAPE between 0.75% and 4.53%; the error was relatively high in 2022, when the model responded with a lag to local flooding events.
Overall, except for Kunming in 2023, the prediction error (MAPE) of the model in each region and year is controlled within 5%, indicating that the model has good applicability and stability in the actual prediction of tobacco production, and can effectively support the dynamic monitoring and accurate early warning of regional tobacco production.
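For reference, the error measures quoted in the results can be computed as in the sketch below. It assumes the standard textbook definitions of MAPE, RMSE, and R², and the yield values are invented for illustration rather than taken from Table 4.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error, in the units of the yield."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical yields in kg/ha (not values from the study).
actual = [2000.0, 2100.0, 1900.0]
predicted = [1980.0, 2142.0, 1919.0]

print(round(mape(actual, predicted), 2))       # 1.33
print(round(rmse(actual, predicted), 2))       # 29.01
print(round(r_squared(actual, predicted), 3))  # 0.874
```

With a single region-year, as in the per-year comparisons above, MAPE reduces to the absolute percentage error of that one prediction.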
3.6. SHAP Feature Contribution Analysis
To analyze how meteorological factors drive the outputs of the flue-cured tobacco yield prediction model, this study uses the SHAP method to quantify feature importance. As the core indicator of model interpretability analysis, the SHAP value decomposes the predicted result into the contribution of each meteorological factor, indicates the direction and strength of its influence, and provides a theoretical basis for an in-depth explanation of the “meteorological conditions–yield formation” response relationship [36].
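The additive decomposition behind SHAP can be made concrete with a brute-force Shapley computation on a toy model. This is a sketch of the underlying idea only: it is not the algorithm of any SHAP library, which typically relies on background data and approximations or model-specific methods, and the toy model, its coefficients, and the all-zero baseline are assumptions.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values; absent features are replaced by baseline values."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(z):
    # Hypothetical yield response: two additive terms plus one interaction.
    tdiff, irrad, rain = z
    return 2.0 * tdiff + 1.0 * irrad + 0.5 * irrad * rain

x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)  # [2.0, 4.0, 2.0]
```

The contributions sum to the difference between the prediction and the baseline prediction (8.0 here), and the interaction term is split equally between the two features involved, which is why interacting factors can both show elevated importance.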
3.6.1. Analysis of Single Factor and Synergistic Effects
The distribution of SHAP values shows that the spread for the Day and Night Temperature Difference in August (TDIFF8) is significantly greater than that of other factors (Figure 5). TDIFF8 can therefore be identified as the core meteorological variable affecting tobacco yield formation and model prediction accuracy. Its driving mechanism can be explained by the physiology of photosynthesis: a suitably high temperature during the day activates the key enzymes of photosynthetic carbon assimilation and promotes the generation of photosynthetic products, while a moderately low temperature at night inhibits respiration and reduces material loss [37]. The synergistic effect of photothermal conditions dominated by TDIFF8 improves the efficiency of dry matter accumulation by optimizing the dynamic balance of “photosynthetic accumulation–respiratory consumption”, indicating that the model is consistent with physiological mechanisms.
The spread of SHAP values for radiation factors such as April solar radiation (IRRAD4) and temperature factors such as May maximum temperature (TMAX5) is also prominent. Radiation factors, as the energy source of photosynthesis, directly determine the amount of photosynthetic products, while temperature and humidity factors indirectly affect yield formation by regulating physiological metabolism and material distribution [38,39,40]. The quantitative analysis of the model makes the synergistic mechanism of photothermal resources concrete, laying the foundation for subsequent research on multi-factor coupling effects.
3.6.2. Stage Specificity of Growth-Period Factors
The SHAP values of cumulative meteorological factors over the growth period, represented by IRRAD_SUM_maturing (accumulated radiation during the maturity stage), show significant growth-stage specificity. When radiation conditions are suitable, the transport and accumulation of photosynthetic products in the leaves is accelerated and the SHAP value is positive; when radiation is abnormal, the distribution and accumulation of substances is disturbed and the SHAP value turns negative. This verifies the model’s ability to capture the cumulative effect of key meteorological factors during the growth period, which is consistent with the modeling logic of focusing on the “growth-period environment–yield response” relationship. From a methodological perspective, it also supports the soundness of the feature screening (selecting factors tied to growth stages such as transplanting, vigorous growth, and maturity) and of the model architecture (adapting to nonlinear response relationships), ensuring that the analysis results are consistent with the physiology of flue-cured tobacco growth.
3.6.3. Global Quantification and Verification of Feature Importance
The SHAP feature importance map shows that TDIFF8 has the strongest explanatory power for yield prediction, confirming the core regulatory value of a stable day–night temperature difference during the maturity period of flue-cured tobacco (Figure 6). A stable temperature difference can optimize the photothermal synergy and promote dry matter accumulation. IRRAD4 (April solar radiation) ranks second. April corresponds to the transplanting and rooting period, when adequate radiation accelerates seedling establishment and promotes root development, whereas insufficient radiation delays growth and reduces stress resistance. TDIFF6 (Day and Night Temperature Difference in June) and IRRAD6 (June solar radiation) jointly regulate the growth rhythm: abnormal temperature differences often induce leaf deformity, and insufficient radiation limits the accumulation of photosynthetic products. Both are linked to the final yield through their effects on growth rate and leaf development.
In addition, features such as IRRAD5 (May solar radiation) and IRRAD_SUM_maturing (total solar radiation accumulated during the maturity stage) are also among the core influencing features. Covering post-transplanting growth and material accumulation at maturity, they confirm that meteorological factors drive yield across multiple growth stages. The factor effects revealed by the SHAP analysis and the model prediction results corroborate each other. A back-check on a year with abnormal TDIFF8, in which the deviation between the predicted and actual yield was higher than in normal years, further supports the analysis and is consistent with the conclusion in the SHAP plot that “extreme temperature difference leads to negative contribution”.
4. Discussion
The coupling of polynomial regression and the Stacking model proposed in this study provides a valuable approach to alleviating the inherent limitations of single models in analyzing the nonlinear relationship between meteorological factors and flue-cured tobacco yield. In the yield decomposition experiment, polynomial regression exhibited the best performance in trend yield fitting; its smooth long-term trend curve captures the cumulative effect of agricultural technological progress, laying a reliable data foundation for the precise separation of meteorological yield. This result is consistent with Ji et al.’s [41] conclusion that “trend yield should accurately reflect the long-term impact of technological progress,” further supporting the general applicability of polynomial regression in agricultural yield trend extraction. It also offers insights for improving Ma et al.’s [15] crop yield time series prediction framework: while Ma et al.’s framework focuses on the overall prediction process, the optimization of the trend extraction step in this study can provide a reference for applying that framework to subtropical plateau crops. Compared with the moving average, exponential smoothing, and high-pass filtering methods, polynomial regression avoids both the over-sensitivity of the moving average method to short-term fluctuations and the trend segmentation bias that frequency domain separation introduces in the high-pass filtering method [26], making it more aligned with the long-term “technological iteration–yield increase” pattern of the flue-cured tobacco industry. Additionally, this study focuses on the stage-specific effects of meteorological factors during the growth period, extending Zhao et al.’s [14] linear regression-based analysis of meteorological–yield relationships. By moving beyond linear models, it captures the nonlinear coupling effects of multi-stage factors.
The feature selection results reveal the multi-dimensional driving mechanism of meteorological factors on flue-cured tobacco yield. Temperature-derived features and light suitability features during the growth period contribute significantly to yield prediction, which is consistent with the biological characteristic of flue-cured tobacco of “preferring temperature and light but being intolerant to extreme high temperatures.” This mechanism is supported by Didari et al.’s [17] Lasso regression study on dryland wheat yield: Didari et al. confirmed the key role of extreme temperature and precipitation events in yield estimation, while this study further identifies the growth stage at which flue-cured tobacco is sensitive to extreme factors (extreme temperature differences in the August maturity stage may affect yield), which can provide a reference for studies on crop-specific meteorological responses. Critically, the inclusion of interaction terms between growing season solar radiation (IRRAD_growing) and precipitation (RAIN_growing) in the core feature set confirms that the photo–water coupling effect is a key mechanism regulating flue-cured tobacco yield formation. This finding echoes Tang et al.’s [7] conclusion in their Honghe flue-cured tobacco study that “climatic factors affect yield and quality through synergistic effects,” and extends it at the quantitative level: even under drought conditions, the model can still capture synergistic effects well. This result is also consistent with Guo et al.’s [34] view that “multi-factor coupling operations can capture joint meteorological impacts”. Guo et al.’s study focuses on drought dynamics, while this study applies the concept to yield prediction, providing a new scenario for the application of multi-factor coupling in agricultural prediction. Furthermore, significant regional differences in model applicability were observed: the model performed best in the Chuxiong tobacco-growing area (average MAPE = 1.46% during 2021–2023) owing to the stable local climate, whereas the relatively high prediction error in Kunming in 2023 (MAPE = 6.73%) was mainly associated with extreme high temperatures (August average temperature higher than normal) and the inability of NASA POWER data (0.5° × 0.5° spatial resolution) to reflect mountain microclimate heterogeneity. Compared with the winter wheat studied by Zhang et al. [30], flue-cured tobacco, as a leaf crop, is more sensitive to temperature–humidity synergy during the maturity stage. This difference stems from the inherent biological characteristics of the crops and can provide a basis for comparative studies on the meteorological responses of different crops.
Model comparison experiments verify the advantages of ensemble learning in modeling complex agricultural systems. By integrating the algorithmic characteristics of RF, MLP, SVR, and ridge regression, the Stacking model significantly outperforms single models in key performance metrics. This result is consistent with Islam et al.’s [27] finding in rice yield prediction that ensemble models can improve prediction stability by balancing the bias and variance of base models. The two studies differ in their data requirements: Islam et al.’s model required remote sensing data to achieve an R² of 0.85, while the model in this study achieved a higher accuracy using only meteorological data. This characteristic highlights its efficiency advantage in meteorology-driven yield prediction and helps address the concerns about “relatively high remote sensing data acquisition costs” mentioned by Sharma et al. [24] in crop yield prediction studies.
During the verification period, the model showed stable predictive performance at the regional scale, but relatively high errors were observed in some local production areas. This difference can be attributed to three limitations: (1) Non-meteorological factors such as soil physical and chemical properties, field management measures, and biological stresses were not included. This echoes Bukowiecki et al.’s [42] finding that “soil texture and green area index (GAI) can improve wheat yield estimation accuracy”, suggesting that adding non-meteorological factors may leave room for improving model accuracy. (2) The spatial resolution of the meteorological data is insufficient: the 0.5° × 0.5° resolution of NASA POWER data makes it difficult to capture the small-scale heterogeneity of Yunnan’s “three-dimensional climate”, which is consistent with Lecerf et al.’s [13] concern that “coarse-resolution meteorological data may limit the spatial precision of crop yield prediction”. (3) The adaptability to extreme climates beyond the historical record needs to be enhanced: the model’s training data (2003–2020) did not include events such as “20 consecutive days of drought” or “temperatures exceeding 40 °C”, leading to increased prediction errors during the 2023 extreme high-temperature event in Kunming. This limitation is consistent with the “limited response ability of single algorithms to extreme climates” found by Luo et al. [25] in maize drought prediction, suggesting that the introduction of extreme climate indices may help optimize model performance. Based on these observations, future research directions may include: (1) Fusing high-resolution remote sensing data and IoT data to supplement non-meteorological information. (2) Introducing extreme climate indices to optimize Stacking base models. (3) Exploring the application potential of this framework in subtropical plateau crops such as maize and rapeseed by adjusting growth stage divisions and core features.