Article

Predicting Multiple Traits of Rice and Cotton Across Varieties and Regions Using Multi-Source Data and a Meta-Hybrid Regression Ensemble

1 Remote Sensing Information and Digital Earth Center, College of Computer Science and Technology, Qingdao University, Qingdao 266071, China
2 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
3 Department of Biological and Agricultural Engineering, University of California, Davis, CA 95616, USA
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Sensors 2026, 26(2), 375; https://doi.org/10.3390/s26020375
Submission received: 26 November 2025 / Revised: 26 December 2025 / Accepted: 4 January 2026 / Published: 6 January 2026
(This article belongs to the Topic Digital Agriculture, Smart Farming and Crop Monitoring)

Abstract

Timely and accurate prediction of crop traits is critical for precision breeding and regional agricultural production. Previous studies have primarily focused on single crop yield traits, neglecting other crop traits and variety-specific analyses. To address this issue, we employed a Meta-Hybrid Regression Ensemble (MHRE) approach by using multiple machine learning (ML) approaches as base learners, integrating regional multi-year, multi-variety crop field trials with satellite remote sensing indices, meteorological and phenological data to predict major crop traits. Results demonstrated MHRE’s optimal performance for rice and cotton, significantly outperforming individual models (RF, XGBoost, CatBoost, and LightGBM). Specifically, for rice crop, MHRE achieved highest accuracy for yield trait (R2 = 0.78, RMSE = 0.59 t ha−1) compared to the best individual model (XGBoost: R2 = 0.76, RMSE = 0.61 t ha−1); traits like effective spike also showed strong predictability (R2 = 0.64, RMSE = 27.81 10,000·spike ha−1). Similarly, for cotton, MHRE substantially improved yield trait prediction (R2 = 0.82, RMSE = 0.33 t ha−1) compared to the best individual model (RF: R2 = 0.77, RMSE = 0.36 t ha−1); bolls per plant accuracy was highest (R2 = 0.93, RMSE = 2.27 bolls plant−1). Moreover, rigorous validation confirmed that crop-specific MHRE models are robust across five rice and three cotton varietal groups and are applicable across six distinct regions in China. Furthermore, we applied the SHAP (SHapley Additive exPlanations) method to analyze the growth stages and key environmental factors affecting major traits. Our study illustrates a practical framework for regional-scale crop traits prediction by fusing multi-source data and ensemble machine learning, offering new insights for precision agriculture and crop management.

1. Introduction

By 2050, the global population is projected to increase by more than two billion [1]. To sustainably secure food supplies under this mounting pressure, it is imperative to elucidate the evolutionary regulatory mechanisms underlying key crop traits and to translate fundamental discoveries into breeding practice [2,3]. Based on the precise prediction of crop traits, we can not only establish a molecular design breeding system to accelerate the development and screening of new cultivars [4,5], but also optimize the spatiotemporal allocation of field resources and provide scientific support for formulating smart agriculture policies [6,7]. Moreover, by integrating trait-prediction data, growers and agronomists can develop dynamic decision-support models for finely tuned crop management [8].
In agricultural production, crop yield serves as a critical indicator of productivity [9]. Although yield is an important indicator, it represents the cumulative outcome of multiple complex biological processes, making it difficult to capture the real-time dynamics of crop growth. In contrast, key crop traits—such as thousand-kernel weight, stem height, and photosynthetic efficiency—are essential for characterizing crop physiological status and potential productivity [10,11,12]. These traits can reflect crop physiological states and stress responses at enhanced spatial and temporal resolutions [13]. Recent advancements in multispectral/hyperspectral remote sensing technologies have substantially improved the precision of crop traits prediction [14,15]. However, existing research predominantly focuses on end-season yield prediction [16], while systematic predictive modeling of these critical crop traits remains relatively underdeveloped [17]. Furthermore, most current findings are constrained to single cultivars or specific regions, with their robustness and reliability across diverse varieties and geographical areas yet to be thoroughly validated [18]. Therefore, in-depth research on predicting key traits of major crops is crucial for enhancing agricultural production efficiency and refining field management decisions.
Satellite remote sensing, particularly using multispectral bands, provides powerful tools for monitoring crop growth and characterizing field spatial heterogeneity [19]. Vegetation indices (VIs) derived from visible and near-infrared bands effectively reflect photosynthetic activity and biomass dynamics [20,21]. With advances in remote sensing technology, an increasing number of novel spectral products are being incorporated into crop characterization research. For instance, sun-induced chlorophyll fluorescence (SIF), originating from specific narrow bands in the near-infrared region, provides a more precise proxy indicator for instantaneous crop photosynthetic efficiency [22,23], while the retrieval of gross primary productivity (GPP) directly reflects the net photosynthetic rate of crops [24,25]. Additionally, soil moisture products derived from microwave sensors such as SMAP and ASCAT can be utilized to assess surface soil moisture conditions and root zone water availability [26]. Particularly, meteorological factors (e.g., air temperature, precipitation) profoundly influence crop physiology [27,28], yet spectral signals alone often fail to fully capture these abiotic drivers, necessitating integration with ground-based or reanalysis data [29]. Similarly, anthropogenic factors such as field management practices significantly impact crop traits, yet these are typically not directly measurable via satellite remote sensing [30,31]. While data from various satellite platforms are extensively applied to evaluate crop growth status, research integrating remote sensing data, meteorological data, and field data for comprehensive crop trait prediction remains relatively scarce.
Currently, three primary approaches are employed for predicting crop traits. The first approach involves process-based growth simulation combined with data assimilation [32]. This method simulates crop growth and development by finely tuning model parameters to generate predictions [33]. However, the complexity of model parameters and the stringent requirements for initial conditions limit its efficiency and scalability for large-scale applications [34,35]. The second approach utilizes statistical regression models to establish relationships between vegetation indices and crop traits. Nevertheless, this method exhibits limited capability in modeling nonlinear relationships and suffers from inadequate generalization across spatial and temporal scales [36]. The third approach is based on machine learning (ML) algorithms. Traditional ML methods, such as Random Forest (RF) and Extreme Gradient Boosting (XGBoost), have been widely applied in crop traits prediction [37,38,39]. However, these methods often struggle to effectively capture the complex interactions between crop traits and environmental factors and face challenges in integrating heterogeneous data. Moreover, constraints in data quantity and quality have confined most studies to single-yield predictions at county or municipal levels [40,41]. As a branch of ML, deep learning (DL) excels in agricultural remote sensing tasks thanks to its powerful feature extraction capabilities. Algorithms such as convolutional neural networks [42,43] have demonstrated remarkable effectiveness in field-scale crop phenotyping using high-resolution satellite data. However, the reliability of these models critically depends on the quantity and quality of field-collected samples. In China’s agricultural sector, the scarcity of such samples and the high cost of their acquisition have limited the large-scale deployment of DL [44].
Despite these advances, current studies on crop trait prediction still face several key limitations. Most work has focused on end-of-season yield rather than on multiple crop traits at regional scales, and is often restricted to single cultivars or limited regions without rigorous evaluation across varietal groups and ecological zones. In addition, the joint use of multi-year, multi-variety field trials with multi-source satellite products and meteorological variables remains uncommon, and existing modelling efforts typically optimise a single algorithm rather than exploiting the complementary strengths of multiple machine learning methods within an ensemble and meta-learning framework. Together, these limitations highlight the need for a general yet practical framework that can perform multi-trait prediction for major crops, make efficient use of heterogeneous data sources, and generalise robustly across varieties and regions.
To address these issues, namely the need for spatial and temporal generalization, joint multi-trait prediction, and reduced dependence on large sample sizes, we apply a Meta-Hybrid Regression Ensemble (MHRE) approach to crop multi-trait prediction. MHRE uses linear regression, decision tree, RF, XGBoost, LightGBM, and CatBoost as base learners and combines them through meta-learning and ensemble learning techniques [45], which mitigates the tendency of single models to become trapped in local optima and to generalize poorly across regions and varieties [46,47]. Furthermore, we incorporate meteorological data, remote sensing data, and publicly available field trial data to quantitatively analyze yield drivers [48]. In this study, the MHRE framework is applied in a crop-specific manner, with separate models trained for rice and for cotton using their respective multi-year regional field trial datasets.
The main objectives of this study are: (1) to integrate multi-source data and finely characterize crop growth environments using remote sensing and meteorological features; (2) to construct a scalable MHRE approach that outperforms traditional algorithms in regression prediction tasks; (3) to predict major traits of rice and cotton using crop-specific MHRE models across multiple varietal groups and regions in China, and to evaluate the robustness of the framework within each crop; and (4) to use the SHapley Additive Explanations (SHAP) method to improve model interpretability by identifying and quantifying key growth stages and critical environmental factors.

2. Materials and Methods

2.1. Study Area

The sites shown in Figure 1 encompass most of the experimental fields for the national and regional trials of rice and cotton. The cotton trial stations (green markers) are located in the Yellow River and Yangtze River Basins, which are characterized by favorable climatic conditions and abundant agricultural resources, making them the most important cotton production bases in China.
The rice trial stations (purple markers) are primarily situated in the eastern and southwestern regions of China, focusing on core production areas such as the middle and lower plains of the Yangtze River and the Pearl River Delta, which have the most suitable agro-ecological conditions for rice cultivation and support a high-yield and high-quality rice production system.

2.2. Data Sources

2.2.1. Remote Sensing Data

As shown in Table 1, this study used the National Aeronautics and Space Administration’s (NASA) Moderate Resolution Imaging Spectroradiometer (MODIS) remote sensing dataset and the Solar Induced Chlorophyll Fluorescence (SIF) dataset acquired by the TROPOMI sensor on board ESA’s Sentinel-5P satellite. The following MODIS products were used in this study: Normalized Difference Vegetation Index (NDVI), Leaf Area Index (LAI), Potential Evapotranspiration (PET), Evapotranspiration (ET), Gross Primary Productivity (GPP), Fraction of Absorbed Photosynthetic Active Radiation (Fpar). Moreover, the Chinese 1 km resolution daily all-weather surface soil moisture dataset from the National Tibetan Plateau Science Data Center [49] was used. The 36 km-resolution soil moisture (SM) data from AMSR-E and AMSR-2 were integrated with MODIS reflectance and land surface temperature (LST) using a downscaling model to form a high-coverage soil moisture product at 1 km resolution. To ensure spatial consistency among the multi-source datasets with different native spatial resolutions (500 m to ~3.5 km), all remote sensing variables were expressed at a common 1 km spatial resolution. For the 500 m MODIS products, 1 km values were obtained by aggregating each 2 × 2 block of original pixels using the arithmetic mean. For the coarser TROPOMI SIF data (~0.05°, ~3.5 km), 1 km values at the MODIS locations were derived by bilinear interpolation from the original grid. The 1-km soil-moisture product was reprojected to the same coordinate system while maintaining its native resolution. Field plots from the regional variety trials were geolocated in this reference system and used to extract the corresponding 1-km predictors from each dataset. All reprojection and resampling operations were implemented using standard GIS tools (e.g., GDAL, rasterio, xarray) to ensure consistent spatial alignment across sensors and years.
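The 2 × 2 aggregation of 500 m pixels to 1 km described above can be illustrated with a minimal sketch. It assumes the 500 m product is already available as a 2-D NumPy array with even dimensions and no missing values; the function name `block_mean_2x2` is hypothetical and this is not the authors' processing code.

```python
import numpy as np

def block_mean_2x2(grid):
    """Halve the resolution of a 2-D grid by averaging each 2 x 2 block."""
    a = np.asarray(grid, dtype=float)
    rows, cols = a.shape
    if rows % 2 or cols % 2:
        raise ValueError("grid dimensions must be even")
    # Give every 2 x 2 block its own pair of axes, then average over them.
    blocks = a.reshape(rows // 2, 2, cols // 2, 2)
    return blocks.mean(axis=(1, 3))

# Toy 4 x 4 "500 m" grid aggregated to a 2 x 2 "1 km" grid.
fine = [[1, 3, 5, 7],
        [1, 3, 5, 7],
        [2, 2, 0, 0],
        [2, 2, 4, 4]]
coarse = block_mean_2x2(fine)
```

Handling of no-data pixels (for example cloud-masked values) is left out of this sketch and would need an explicit rule in practice.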
In addition, we used a national 1 km land-use remote-sensing dataset provided by the Resources and Environmental Science and Data Center of the Chinese Academy of Sciences to distinguish cropland from non-cropland areas and to reduce contamination of vegetation indices by surrounding roads, buildings, water bodies and forests. In this product, cultivated land is clearly separated from water bodies, urban land, residential land and other non-agricultural types. After aligning the land-use map with the remote-sensing imagery, we applied a cropland mask: NDVI, LAI and other remote-sensing predictors were extracted only from pixels classified as cultivated land, while pixels mapped as non-cropland were discarded and regional variety trial plots falling in such pixels were excluded from the modelling dataset. For each remaining plot, local canopy conditions were then represented by averaging the values of cropland pixels within a 2 × 2 neighbourhood of 1 km pixels around the plot location, which further mitigates mixed-pixel effects near field boundaries at this spatial resolution.

2.2.2. Meteorological Data

The meteorological data used in this study were obtained from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5-Land reanalysis product, covering the period from 2006 to 2023 at a spatial resolution of 0.1° and a daily temporal resolution. The main meteorological variables extracted included daily mean air temperature, daily maximum air temperature, daily minimum air temperature, and 2 m dew point temperature, from which four key environmental indicators were calculated: growing degree days (GDD, Equation (1), °C days), vapor pressure deficit (VPD, Equation (2), hPa), killing degree days (KDD, Equation (3), °C days), and cumulative precipitation (Prcp, Equation (4), mm). These indicators characterize heat accumulation, atmospheric water stress, heat stress, and precipitation supply during the crop growing period, respectively. Their calculation formulas are as follows:
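$$\mathrm{GDD} = \sum_{d=1}^{N} \left( \frac{T_{\max,d} + T_{\min,d}}{2} - T_{\mathrm{base}} \right) \tag{1}$$
$$\mathrm{VPD} = \sum_{d=1}^{N} \left( e_s - e_a \right) \tag{2}$$
$$\mathrm{KDD} = \sum_{d=1}^{N} \max\left( T_{\max,d} - 29,\ 0 \right) \tag{3}$$
$$\mathrm{Prcp} = \sum_{d=1}^{N} \mathrm{Prec}_d \tag{4}$$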
where $N$ is the number of days in the growth period; $T_{\max,d}$ and $T_{\min,d}$ are the daily maximum and minimum temperatures (°C) on day $d$; the base temperature $T_{\mathrm{base}}$ is set at 10 °C for rice and 12 °C for cotton; $\mathrm{Prec}_d$ is the daily precipitation (mm) on day $d$; $e_s$ is the saturated vapor pressure; and $e_a$ is the actual vapor pressure.
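As an illustration of Equations (1) to (4), the following sketch accumulates the four indicators from daily series. The text does not state how $e_s$ and $e_a$ were obtained, so this sketch assumes the Tetens approximation, with $e_s$ evaluated at the daily mean temperature and $e_a$ at the dew point temperature. The function names are hypothetical.

```python
import math

def saturation_vapor_pressure_hpa(temp_c):
    """Tetens approximation of saturation vapour pressure (hPa) at temp_c (deg C)."""
    return 6.108 * math.exp(17.27 * temp_c / (temp_c + 237.3))

def seasonal_indices(tmax, tmin, tdew, prec, t_base, t_kill=29.0):
    """Accumulate GDD, VPD, KDD and Prcp over one phenological window.

    tmax, tmin, tdew are daily series in deg C; prec is daily precipitation in mm.
    """
    gdd = vpd = kdd = 0.0
    for hi, lo, dew in zip(tmax, tmin, tdew):
        t_mean = (hi + lo) / 2.0
        gdd += t_mean - t_base                      # Equation (1)
        # Assumption: e_s at the daily mean temperature, e_a at the dew point.
        vpd += (saturation_vapor_pressure_hpa(t_mean)
                - saturation_vapor_pressure_hpa(dew))  # Equation (2)
        kdd += max(hi - t_kill, 0.0)                # Equation (3)
    prcp = sum(prec)                                # Equation (4)
    return {"GDD": gdd, "VPD": vpd, "KDD": kdd, "Prcp": prcp}

# Two-day toy window for rice (base temperature 10 deg C).
indices = seasonal_indices(
    tmax=[30.0, 32.0], tmin=[20.0, 22.0], tdew=[15.0, 16.0],
    prec=[5.0, 0.0], t_base=10.0,
)
```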

2.2.3. Experimental Field Data

The experimental field data for this study were obtained from the National Variety Regional Trial Network of the Ministry of Agriculture and Rural Affairs of China (MARD), which integrates a multi-cycle regional trial dataset for rice from 2006 to 2016, and trial data for cotton obtained over seven non-consecutive years within the period from 2008 to 2023. In total, the compiled dataset includes 13,998 rice samples from 94 trial stations and 8046 cotton samples from 45 stations. These experimental fields were established in typical rice and cotton growing areas to represent different climatic conditions in the region. All data were collected following standardized operating procedures. Data underwent quality checks after data entry to identify and remove anomalies and to make necessary statistical corrections. These detailed experimental field data provide information on crop responses under real environmental conditions and support in-depth studies on the physiological and yield responses of rice and cotton. Table 2 below shows the main parameters of the experimental field data for rice and cotton.

2.3. Modeling Framework

In this study, we developed an MHRE framework that integrates multiple regression algorithms with meta-learning strategies and ensemble learning techniques for the prediction of major crop traits. The framework integrates multi-source remote sensing data (NDVI, LAI, GPP, SM, SIF, Fpar, ET, and PET), meteorological variables (VPD, GDD, Prec, and KDD), and phenological records and major crop traits from multi-year field trials (Figure 2). The agronomic data collected in the field and the pre-processed meteorological parameters were fused with remote sensing indices after systematic quality control and unified coding to form a complete multi-source database. For both rice and cotton, plot-level phenological observations from the regional variety trials were used to align meteorological and remote-sensing variables across years and cycles. Rather than using raw calendar dates, NDVI, LAI and other satellite-derived indices were reindexed relative to key phenological milestones. For rice, sowing, heading and maturity dates defined two main growth phases (sowing–heading and heading–maturity), within which remote-sensing variables were sampled at 10-day intervals for each plot-year. For cotton, an analogous procedure was applied using seeding, flowering and batting dates. For both crops, cumulative GDD, KDD, precipitation and VPD were computed over these phenology-based windows, providing stage-specific summaries of thermal and hydroclimatic conditions. To mitigate the effects of multicollinearity and enhance model generalizability, a recursive feature elimination (RFE) method was employed to pre-select the most informative predictors from the initial feature set. This process identified and retained a subset of non-redundant features that contributed most significantly to the predictive performance, which were then used as inputs for all subsequent machine learning models. 
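The phenology-based sampling described above can be sketched as follows: given two milestone dates for a plot-year, sampling dates are generated every 10 days within the phase. Whether the end date itself is included, and how the satellite value nearest each date is chosen, are not specified in the text; this sketch assumes inclusive endpoints, and `sampling_dates` is a hypothetical helper.

```python
from datetime import date, timedelta

def sampling_dates(phase_start, phase_end, step_days=10):
    """Return dates at fixed intervals from phase_start up to phase_end (inclusive)."""
    dates = []
    current = phase_start
    while current <= phase_end:
        dates.append(current)
        current += timedelta(days=step_days)
    return dates

# Toy sowing-to-heading phase for one plot-year.
steps = sampling_dates(date(2010, 5, 1), date(2010, 5, 31))
```

Indexing the samples by their position within the phase, rather than by calendar date, is what makes plots with different sowing dates comparable.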
The database was divided into 70% training set and 30% test set to ensure the independence of evaluation. To reduce the impact of measurement errors and extreme values, we applied IQR- and Z-score-based outlier filtering, mean imputation and standardization to the multi-year, multi-cycle trial data. Generalization was evaluated using geographically and varietally stratified cross-validation, yielding a more realistic estimate of predictive performance under heterogeneous conditions.
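A minimal sketch of combined IQR- and Z-score-based filtering is shown below. The thresholds (1.5 × IQR fences and |z| ≤ 3) and the rule that a value must pass both checks are common conventions assumed here, not values reported in the text. `filter_outliers` is a hypothetical name.

```python
import statistics

def filter_outliers(values, iqr_factor=1.5, z_max=3.0):
    """Keep values that lie inside the IQR fences and within z_max standard deviations."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - iqr_factor * iqr, q3 + iqr_factor * iqr
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    kept = []
    for v in values:
        inside_fences = low <= v <= high
        z_ok = sd == 0 or abs(v - mean) / sd <= z_max
        if inside_fences and z_ok:
            kept.append(v)
    return kept

# Toy yield values (t/ha) with one implausible entry.
yields = [4.8, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 20.0]
cleaned = filter_outliers(yields)
```

In this toy series the extreme value inflates the standard deviation enough that the Z-score check alone would keep it, which is one reason to pair it with the IQR fences.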
In this study, the MHRE framework was applied separately to rice and cotton, with each crop model trained on its own multi-year field trial dataset. These four tree-based ensemble methods (RF, XGBoost, LightGBM and CatBoost) were selected because they are widely used and well validated for tabular environmental and remote-sensing data, can effectively capture nonlinear relationships and high-order interactions between crop traits and predictors, and represent complementary algorithmic families (bagging versus gradient boosting), which increases model diversity and improves the performance of the meta-ensemble. The inclusion of linear regression alongside tree-based learners introduces heterogeneity in inductive biases within the ensemble. Tree-based models are well-suited for capturing complex nonlinear interactions and hierarchical feature dependencies, whereas linear regression provides an efficient mechanism for modeling global linear trends. Such diversity is a core principle of ensemble learning, as it allows the meta-learner to synergistically combine the distinct strengths of each model class. The hyper-parameter optimization was carried out by grid search in combination with 5-fold cross-validation. Model performance is quantified based on coefficient of determination (R2), Root Mean Square Error (RMSE), Root Relative Mean Squared Error (RRMSE), Ratio of Performance to Deviation (RPD), and Mean Absolute Error (MAE) metrics of the test set, complemented by residual analysis to verify predictive stability. Among the base learners, relatively more complex ML models including XGBoost, RF, LightGBM and CatBoost were selected as baseline models for comparison with the MHRE framework. 
The generalization ability of the framework is verified through spatial domain cross-validation and new variety testing, while the relative contribution of remote sensing and meteorological variables is elucidated using built-in feature importance indicators and SHAP value analysis to achieve model interpretability.

2.4. Regression Algorithm

The MHRE (Meta-Hybrid Regression Ensemble) combined with six commonly used machine learning (ML) algorithms (Linear regression, Decision tree, LightGBM, RF, XGBoost, and CatBoost) is proposed for multi-source crop traits prediction (Figure 3). In order to avoid problems due to scale differences between the input features and the target variables, this study used normalization techniques and processing steps consistent with related studies. Optimal hyperparameters such as complexity, structure, and learning rate were tuned for each regression algorithm through grid search and k-fold cross-validation [50].
The algorithm constructs better predictive models by combining the predictions of several underlying ML algorithms [51]. Specifically, each base model is first trained on the original training data and then used to generate out-of-sample predictions on a held-out validation set; these predictions, together with the dependent variable Y, form the meta-dataset (Z, Y), where Z is the vector of base-model outputs. The performance of each base model is then evaluated on this meta-dataset using R2, and the best-performing model is chosen as the meta-regressor [52], which learns an optimized weighted or nonlinear combination of the base-model predictions to produce the final forecast. MHRE learns from a population of N independent samples (Yi, Xi) [53] through a two-stage training process: in the first stage, the dataset is randomly split into training and test sets, and the training set is further partitioned into K folds for cross-validation to generate the meta-dataset—each fold uses K − 1 subsets to train the six base models and the remaining subset to obtain out-of-sample predictions; the meta-regressor is then selected based on its R2 performance on this meta-dataset. In the second stage, all six base models are retrained on the entire training set, and for each new sample their predictions form the input vector Z to the already-selected meta-regressor, which produces the final prediction. Finally, the overall predictive performance of MHRE was evaluated on an independent test set.
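The two-stage procedure can be sketched with NumPy alone. To keep the example self-contained, two simple stand-in learners (ordinary least squares and k-nearest neighbours) replace the six base learners used in the study; any objects exposing `fit` and `predict` could be substituted. The names `fit_stack` and `predict_stack` are hypothetical, and this is an illustration of the described logic rather than the authors' implementation.

```python
import numpy as np

class OLS:
    """Ordinary least squares with an intercept (stand-in base learner)."""
    def fit(self, X, y):
        A = np.column_stack([np.ones(len(X)), X])
        self.coef_ = np.linalg.lstsq(A, y, rcond=None)[0]
        return self
    def predict(self, X):
        return np.column_stack([np.ones(len(X)), X]) @ self.coef_

class KNN:
    """k-nearest-neighbour regressor (stand-in base learner)."""
    def __init__(self, k=5):
        self.k = k
    def fit(self, X, y):
        self.X_, self.y_ = X, y
        return self
    def predict(self, X):
        dist = ((X[:, None, :] - self.X_[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(dist, axis=1)[:, :self.k]
        return self.y_[nearest].mean(axis=1)

def r2(y, pred):
    return 1.0 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()

def fit_stack(X, y, factories, n_folds=5, seed=0):
    """Stage 1: build the meta-dataset Z from out-of-fold predictions and pick
    the meta-regressor. Stage 2: refit every base learner on all training data."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    Z = np.empty((len(X), len(factories)))
    for hold in folds:
        train = np.setdiff1d(np.arange(len(X)), hold)
        for j, make in enumerate(factories):
            Z[hold, j] = make().fit(X[train], y[train]).predict(X[hold])
    # Score each base learner on its out-of-fold column; the best type becomes
    # the meta-regressor and is trained on Z.
    scores = [r2(y, Z[:, j]) for j in range(len(factories))]
    meta = factories[int(np.argmax(scores))]().fit(Z, y)
    bases = [make().fit(X, y) for make in factories]
    return bases, meta, scores

def predict_stack(bases, meta, X):
    Z_new = np.column_stack([b.predict(X) for b in bases])
    return meta.predict(Z_new)

# Synthetic demonstration data (not from the study).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 2 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=200)
bases, meta, scores = fit_stack(X[:150], y[:150], [OLS, lambda: KNN(k=5)])
pred = predict_stack(bases, meta, X[150:])
```

Because Z contains only out-of-fold predictions, the meta-regressor never sees a base-model output for a sample that the base model was trained on.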

2.5. Interpretability Analysis of the Model

To address the need for interpretability of ensemble ML models, this study adopts the SHapley Additive exPlanations (SHAP) framework, which is based on cooperative game theory. SHAP quantifies the contribution of each feature to a prediction by comparing model outputs with and without that feature; the resulting Shapley values reveal the importance of individual features and their interactions. This helps explain the decision-making process of complex models, making them more transparent and exposing relationships hidden in the data. We first computed the SHAP values for each base learner, then computed the SHAP values for the meta-learner, and finally used a weighted synthesis to obtain the SHAP value of each feature [54]. See Equation (5) for more details.
$$\mathrm{Final\_values}_i = \sum_{j=1}^{n} w_j \, \mathrm{SHAP}_{B_j}(x_i) \tag{5}$$
where $\mathrm{Final\_values}_i$ denotes the final synthesized contribution of the $i$-th feature; $\mathrm{SHAP}_{B_j}(x_i)$ denotes the SHAP value of the $j$-th base learner (e.g., Decision Tree, XGBoost, CatBoost) for the $i$-th feature; $w_j$ is the weight of the $j$-th base learner in the meta-learner; and $n$ is the number of base learners.
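The weighted synthesis in Equation (5) amounts to a weighted sum over base learners for each feature. The sketch below assumes the per-learner SHAP values and the weights $w_j$ have already been computed; how the weights are derived from the meta-learner is not shown, and `combine_shap` is a hypothetical name.

```python
def combine_shap(base_shap, weights):
    """Weighted sum of per-learner SHAP values.

    base_shap[j][i] is the SHAP value of feature i from base learner j;
    weights[j] is the weight of base learner j in the meta-learner.
    """
    if len(base_shap) != len(weights):
        raise ValueError("one weight is required per base learner")
    n_features = len(base_shap[0])
    return [
        sum(w * shap_j[i] for w, shap_j in zip(weights, base_shap))
        for i in range(n_features)
    ]

# Two base learners, two features, equal weights (toy values).
final_values = combine_shap([[1.0, -2.0], [3.0, 4.0]], [0.5, 0.5])
```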

2.6. Assessment of Predictive Performance

In this study, five indicators are used to evaluate model performance: R2, RMSE, RRMSE, MAE, and RPD. Together, these metrics reflect the prediction accuracy and error of the model [55]. Additionally, Pearson’s correlation coefficient was employed to assess the linear relationship between crop traits. Its use was justified as the data met the key assumptions of normality and linearity, as confirmed by Shapiro–Wilk tests and visual inspection of scatterplots [56]. The R2 (Equation (6)) measures the degree of fit between the predicted and actual values of a model, with values typically ranging from 0 to 1. The closer the value is to 1, the better the model explains the data.
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{6}$$
RMSE (Equation (7)) is a measure of the standard deviation of the difference between predicted and actual values; the smaller the value of RMSE, the smaller the model’s prediction error.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{7}$$
RRMSE (Equation (8)), expressed as a percentage, is derived by normalizing the RMSE by the mean of the observed values [57]. This normalization provides an intuitive measure of the error magnitude relative to the average level of the observations, and lower RRMSE values correspond to better model performance.
$$\mathrm{RRMSE} = \frac{\mathrm{RMSE}}{\bar{y}} \times 100\% \tag{8}$$
MAE (Equation (9)) quantifies model performance by calculating the average of the absolute errors between predicted and actual values [58]. Smaller MAE values indicate higher predictive accuracy and provide a robust measure of the typical prediction error.
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{9}$$
In addition, the ratio of performance to deviation (Equation (10)) was used as a dimensionless indicator to compare the predictive ability of different models, and higher RPD values indicate better predictive performance.
$$\mathrm{RPD} = \frac{SD_{\mathrm{obs}}}{\mathrm{RMSE}} \tag{10}$$
In all of the above equations, $y_i$ and $\hat{y}_i$ denote the observed and predicted values of the target trait for sample $i$, $\bar{y}$ is the mean of the observed values, $n$ is the number of samples, and $SD_{\mathrm{obs}}$ is the standard deviation of the observed values in the validation set.
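Equations (6) to (10) can be computed together in a few lines. In this sketch, the use of the sample standard deviation for $SD_{\mathrm{obs}}$ is an assumption (the text does not say whether the sample or population form was used), and `regression_metrics` is a hypothetical name.

```python
import math
import statistics

def regression_metrics(observed, predicted):
    """Return R2, RMSE, RRMSE (%), MAE and RPD for paired observations and predictions."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(sse / n)
    return {
        "R2": 1.0 - sse / sst,                                            # Equation (6)
        "RMSE": rmse,                                                     # Equation (7)
        "RRMSE": rmse / mean_obs * 100.0,                                 # Equation (8)
        "MAE": sum(abs(o - p) for o, p in zip(observed, predicted)) / n,  # Equation (9)
        # Assumption: SD_obs is the sample standard deviation of the observations.
        "RPD": statistics.stdev(observed) / rmse,                         # Equation (10)
    }

# Toy example with four samples.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 3.0, 3.5])
```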

3. Results

3.1. Overall Performance in Major Crop Traits Prediction

3.1.1. Prediction of Major Traits of Rice

Table 3 compares the performance of the XGBoost, RF, LightGBM, CatBoost, and MHRE models in predicting crop growth parameters in the experimental fields. The results demonstrate that the MHRE model performs best in predicting rice yield and related agronomic traits. In the ES prediction task, MHRE and XGBoost achieved the same R2 (0.64), but MHRE achieved a lower RMSE (27.81 10,000·spike ha−1), RRMSE (11.10%), and MAE (21.79 10,000·spike ha−1), with an RPD of 2, indicating better error control. In general, the base models achieved a reasonable level of accuracy in predicting yield and other key agronomic traits, and RF was relatively stable on some metrics owing to its random sampling and feature-splitting mechanisms, but the overall predictive accuracy of every base model remained lower than that of the MHRE model.
In yield prediction, the MHRE model achieved the highest prediction accuracy (R2 = 0.78, RMSE = 0.59 t ha−1) (Figure 4a), and its performance was significantly better than that of the benchmark models. Accuracy for the other key traits was graded: the highest was achieved for ES (R2 = 0.64; Figure 4e), followed by TNG (R2 = 0.61; Figure 4b) and NFG (R2 = 0.59; Figure 4d), while TSW had relatively low explanatory power (R2 = 0.40) (Figure 4b) and its predictions exhibited a higher degree of uncertainty. The results demonstrate that the meta-learning-based stacking ensemble strategy improves the robustness of predictions for complex agronomic traits when heterogeneous multi-source data are used, and provides reliable technical support for crop growth monitoring in precision agriculture.

3.1.2. Prediction of Major Cotton Traits

Comparative analysis shows that the MHRE model outperforms single algorithms in predicting major cotton traits (Table 4). In particular, for stem height prediction, the MHRE framework achieved the same coefficient of determination (R2 = 0.80) as the RF model but showed higher precision with a lower RMSE (7.72 cm), RRMSE (7.06%), and MAE (5.79 cm). This performance advantage is particularly pronounced for complex, yield-related parameters, suggesting that MHRE captures more effectively the intricate relationships between explanatory variables and phenotypic outcomes. In contrast, traditional XGBoost, CatBoost, and LightGBM models performed relatively worse on the same dataset, although they provided decent predictions in some cases.
Experimental validation on an independent test set confirmed the model’s robust predictive ability across multiple key traits. In holdout trial predictions, several traits reached an R2 of 0.41 or higher. The prediction accuracy for Seed-yield reached R2 = 0.82 (Figure 5a). The model also excelled at structural traits, including NCB (R2 = 0.93; Figure 5c) and stem height (R2 = 0.80; Figure 5b), effectively capturing key features closely related to plant growth structure. Remarkably, MHRE also retained predictive capability for the SUI (R2 = 0.41; Figure 5d)—a composite metric reflecting fiber spinnability and estimated yarn strength. Although the dispersion in SUI predictions suggests residual variation likely due to the trait’s inherent complexity, the ensemble approach still outperformed all individual methods across every evaluated trait. These results demonstrate that ML ensemble strategies are highly effective in addressing the complex phenotype-environment relationships in agricultural crop traits prediction.

3.2. Robustness of Variety-Specific Trait Predictions via Stratified Validation

3.2.1. Robustness of Trait Prediction Across Rice Varieties

To validate the predictive ability of the model across different rice varieties, we adopted a stratified leave-one-out cross-validation strategy, using the coefficient of determination (R2) as the evaluation metric. To prevent data leakage, particularly the temporal bias that could arise from having the same growing year represented in both training and validation sets, all data from a given sub-variety and growing year were kept within the same fold and assigned exclusively to either training or validation in each iteration. Figure 6 shows the predictive ability for five rice varietal groups: late-maturing medium-indica in the middle and lower reaches of the Yangtze River (Figure 6a), late-maturing medium-indica in the upper reaches of the Yangtze River (Figure 6e), late-maturing late-indica (Figure 6b), early-maturing late-indica (Figure 6d), and early indica (Figure 6c). Under stratified leave-one-out cross-validation, we predicted the yield and other agronomic traits for each sub-variety within every varietal group and calculated the corresponding prediction accuracy. Each scatter point represents the prediction accuracy for a specific sub-variety. The late-maturing medium-indica groups in the middle and lower reaches (Figure 6a) and the upper reaches (Figure 6e) of the Yangtze River showed similar prediction patterns, with stable yield prediction (average R2 > 0.7). In contrast, NFG and TSW showed wider dispersion and pronounced clustering of outliers in these groups. All varietal groups shared weak explanatory power for TSW. The two late-indica subgroups, late-maturing (Figure 6b) and early-maturing (Figure 6d), both achieved high yield prediction accuracy (R2 > 0.8). However, the prediction ranges for ES and TNG were limited, while NFG and TSW still showed considerable variation.
The early indica group (Figure 6c) presented a particular challenge, with significant variation in the prediction of TNG and TSW. The MHRE model maintained consistent prediction accuracy for major rice traits under variety-specific growth patterns, highlighting the robustness of the model in dealing with the phenotypic complexity of the rice system.
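
The leakage guard described above, in which all records of a given sub-variety and growing year stay on the same side of each split, can be sketched as a group-wise hold-out. This is a minimal illustration, not the authors' implementation: the record fields `sub_variety`, `year` and `yield_t_ha`, the toy values, and the choice of (sub-variety, year) as the grouping key are all assumptions.

```python
from collections import defaultdict


def leave_one_group_out(records, key=lambda r: (r["sub_variety"], r["year"])):
    """Yield (group, train_idx, test_idx), holding out one group at a time.

    Every record sharing the same key (here sub-variety x growing year) lands
    on the same side of the split, so no group straddles training and
    validation.
    """
    groups = defaultdict(list)
    for i, rec in enumerate(records):
        groups[key(rec)].append(i)
    for g in sorted(groups):
        test_idx = groups[g]
        held_out = set(test_idx)
        train_idx = [i for i in range(len(records)) if i not in held_out]
        yield g, train_idx, test_idx


# Hypothetical toy records; field names and values are assumptions.
records = [
    {"sub_variety": "A", "year": 2018, "yield_t_ha": 8.1},
    {"sub_variety": "A", "year": 2018, "yield_t_ha": 7.9},
    {"sub_variety": "A", "year": 2019, "yield_t_ha": 8.4},
    {"sub_variety": "B", "year": 2018, "yield_t_ha": 7.2},
    {"sub_variety": "B", "year": 2019, "yield_t_ha": 7.5},
]
splits = list(leave_one_group_out(records))
```

Passing a different `key` (for example, the growing year alone) changes which records are forced to stay together; which grouping best matches the validation design is a modeling choice.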

3.2.2. Robustness of Trait Prediction Across Cotton Varieties

The generalizability of the model was further evaluated for three cotton varietal groups, medium-maturing conventional cotton (Figure 7a), medium-maturing hybrid cotton (Figure 7b), and early-maturing conventional cotton (Figure 7c), using the same validation scheme. Both medium-maturing groups were predicted robustly for key traits: seed yield (R2 = 0.84 for medium-maturing conventional and 0.89 for medium-maturing hybrid), NCB (0.86 and 0.90), and stem height (0.74 and 0.81), with data points densely distributed within the 95% confidence interval. The hybrid group (Figure 7b) showed higher prediction consistency, particularly for seed yield. In contrast, the early-maturing conventional group (Figure 7c) exhibited trait-specific differences: seed yield (mean R2 = 0.85) and number of bolls per plant (0.76) maintained high prediction accuracy, whereas stem height prediction fluctuated widely (IQR = 0.57). SUI was the most difficult trait to predict in all three groups.

3.3. Spatial Applicability Under Geographically Stratified Validation

3.3.1. Spatial Applicability for Major Rice Traits

To rigorously evaluate the spatial generalization ability of the model (Figure 8), we constructed a geographically stratified validation framework covering the four major rice production regions in China: East China (EC), South China (SC), Central China (CC), and Southwest China (SW), with R2 as the primary performance metric. A geographically stratified leave-site-out cross-validation was applied, using all observation stations within each region as independent test sets (EC: 24 stations, CC: 12 stations, SC: 12 stations, SW: 11 stations), so that model performance could be assessed under the environmental conditions of each region. CC showed the best prediction performance, with an overall accuracy index of 0.60 (Figure 8c) and an average R2 of 0.752 for yield-related traits. EC and SC maintained robust predictive ability, with average accuracy indices of 0.58 (Figure 8e) and 0.59 (Figure 8d), respectively. By contrast, SW exhibited a lower average accuracy (Figure 8b), which is likely related to its complex topography and highly fragmented rice-growing landscapes, where small paddy fields are frequently mixed with forests, built-up areas and other land uses at the 1 km scale. This spatial heterogeneity, together with variable climate and management and fewer field trial sites, reduces the effective signal-to-noise ratio and limits prediction accuracy in this region.

3.3.2. Spatial Applicability for Major Cotton Traits

To rigorously assess the spatial transferability of the model (Figure 9), we developed a geographically stratified validation framework covering the four major cotton production regions in China: EC, North China (NC), CC, and Northwest China (NW). A station-by-station iterative validation protocol was implemented at all monitoring sites (EC: n = 23, NC: n = 7, CC: n = 17, and NW: n = 2), systematically quantifying the cross-regional prediction performance.
The overall average accuracy indices for the four macro-regions were 0.78 for EC (Figure 9b), 0.77 for NC (Figure 9c), 0.77 for CC (Figure 9d), and 0.80 for NW (Figure 9e). Levene’s test, which assesses homogeneity of variance, indicated no significant differences in the variability of prediction performance across regions for the major cotton traits (p = 0.53 for NCB, 0.57 for seed yield, 0.81 for SUI, and 0.61 for stem height; all > 0.05). EC showed higher consistency in trait prediction accuracy, with the closest match between predicted and observed values. The Spinning Uniformity Index (SUI) was consistently the most challenging prediction target, with markedly lower accuracy than the other traits. This pattern held across the geographic strata, indicating the inherent complexity of modeling this fiber quality parameter.
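
For readers unfamiliar with Levene’s test, the statistic behind the p-values above can be sketched as follows. The test compares the spread of values between groups (here, regions) by running a one-way ANOVA on absolute deviations from each group centre. This is a minimal mean-centred version; whether the authors centred on the mean or the median is not stated, and the function name is illustrative.

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W statistic (mean-centred) for equality of variances.

    W is compared against an F distribution with (k - 1, N - k) degrees of
    freedom to obtain a p-value; that step is omitted here.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    z_bar_i = [mean(zi) for zi in z]
    z_bar = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zb - z_bar) ** 2 for zi, zb in zip(z, z_bar_i))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar_i) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical per-station accuracy values for two regions (not from the paper).
w_same_spread = levene_w([1.0, 2.0, 3.0], [11.0, 12.0, 13.0])
w_diff_spread = levene_w([1.0, 2.0, 3.0], [0.0, 10.0, 20.0])
```

Two groups with identical spread but different means give W = 0, which is why a non-significant result speaks to similar variability rather than to similar mean accuracy. In practice, `scipy.stats.levene` computes the same statistic together with its p-value.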

3.4. SHAP Framework for Evaluating Key Factors in the Yield Formation Process of Rice and Cotton

This study employs the SHAP interpretability framework to quantify the contributions of climatic and remote-sensing variables to yield and to other key crop traits (cotton stem height and rice total grain number, TNG) at critical growth stages. For rice (Figure 10a), yield exhibits a secondary sensitivity peak 20 days after sowing (SHAP = 1.10) and reaches its maximum sensitivity at the heading stage (SHAP = 1.51). TNG (Figure 10b) is most responsive during the heading stage (SHAP = 0.73). Multi-dimensional analysis (Figure 10e) identifies GDD (SHAP = 2.69), SM (SHAP = 1.52), and SIF (SHAP = 1.19) as the principal drivers of rice yield, while TNG (Figure 10f) is chiefly influenced by KDD (SHAP = 1.39) and SIF (SHAP = 1.05). For cotton yield (Figure 10c), sensitivity was relatively high at the seeding stage (SHAP > 1.10) and peaked at 50 days after seeding (SHAP = 1.47). Stem-height analysis (Figure 10d) indicates that internode elongation is most sensitive around 40 days after seeding. Over the whole season (Figure 10g), varietal genetic characteristics (SHAP = 1.60), SM (SHAP = 1.39), and vapor pressure deficit (VPD; SHAP = 1.00) dominate cotton yield formation, whereas stem height (Figure 10h) is primarily regulated by NDVI (SHAP = 0.85) and SM (SHAP = 0.75). These distinct spatiotemporal sensitivity patterns (a mid-growth peak affecting boll and stem development in cotton versus the sowing and heading stages in rice) reveal crop-specific resource-allocation strategies: temperature and canopy photosynthesis around heading govern panicle formation, spikelet fertility, and grain filling in rice, whereas water availability and evaporative demand during flowering and boll setting strongly regulate boll retention in cotton. This alignment between SHAP-derived sensitivities and established physiological understanding provides a basis for targeting management to the most influential growth stages and for optimizing both yield and other key traits.
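
To make the attribution idea concrete, the sketch below computes exact Shapley values for a tiny hand-written response function by enumerating feature coalitions. It is a toy illustration of the quantity that SHAP approximates, not the SHAP library and not the authors' pipeline. The function `toy_yield`, its coefficients, and the input and baseline values are all hypothetical, and the output is not comparable to the SHAP values reported above.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley attributions of f(x) relative to f(baseline).

    Features absent from a coalition are set to their baseline value.
    The cost grows as 2**n, so this is only for small illustrative cases.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_yield(z):
    """Hypothetical response: additive thermal term plus an SM x VPD interaction."""
    gdd, sm, vpd = z
    return 2.0 * gdd + sm * vpd


x = [1.5, 0.8, 1.2]          # assumed standardized GDD, SM, VPD for one plot
baseline = [1.0, 0.5, 1.0]   # assumed reference conditions
phi = shapley_values(toy_yield, x, baseline)
```

The attributions sum to `toy_yield(x) - toy_yield(baseline)`, and the interaction term is shared between SM and VPD. Global importances of the kind quoted in this section are commonly obtained by averaging absolute attributions over many samples; whether the reported values were aggregated this way is not stated in the text.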

4. Discussion

4.1. Potential of MHRE in Major Crop Traits Prediction

Our results show that the MHRE model integrates satellite remote sensing, meteorological data, and crop growth data to predict major crop traits across varieties and ecological zones. This helps address problems commonly encountered in traditional methods, such as observation gaps, spatial sampling bias, and high data collection costs, and demonstrates the value of low-resolution satellite data [59]. By combining multiple individually suboptimal base learners, MHRE extracts complex features from high-dimensional data and improves prediction accuracy over single ML algorithms. In addition to R2- and error-based metrics, the RPD values further support the advantage of MHRE over the individual base learners. For most key traits, MHRE achieves the highest RPD in Table 3 and Table 4, indicating better discrimination between signal and noise and consistent superiority across multiple performance criteria. While UAV-based methods achieve high trait-prediction accuracy [60], their limited spatial coverage impedes large-scale application. Similarly, regression models integrating ground and multispectral data perform well locally but fail in heterogeneous landscapes due to static observations [61]. Deep learning approaches face computational and data-quality constraints despite strong performance on complex traits [62]. In contrast, MHRE addresses these limitations by leveraging low-resolution satellite data. The MHRE model has shown stability and consistency across varieties and geographic regions, integrating multidimensional information from diverse areas to reveal the complex interactions between environmental and genetic factors. Despite robust performance across most regions, MHRE was less accurate in topographically complex areas such as SW (Figure 8b) because of landscape fragmentation [63].
Additionally, prediction accuracy varies across traits. For example, ES (Figure 4e), which is highly correlated with canopy spectral features, is predicted more accurately, whereas TSW (Figure 4c), which is influenced by genetic traits and environmental interactions, is predicted less accurately [64]. SHAP analysis reveals different physiological drivers in the rice and cotton systems; it supports the biological plausibility of the model and provides novel insights into crop response mechanisms by quantifying the contributions of environmental drivers across growth stages and traits [65]. Temporal SHAP profiling reveals sensitive windows beyond the reproductive phase that may have received comparatively less emphasis in prior analyses. For example, a distinct secondary sensitivity peak in rice yield occurs at 20 days after sowing (SHAP = 1.10), preceding the main peak at the heading stage (SHAP = 1.51). This early peak reaches ~73% of the heading peak (1.10/1.51), indicating a non-trivial early-season constraint that would be understated by conventional stage-averaged interpretations. Early vegetative conditions may therefore exert a lasting effect on final yield formation, a temporal nuance that may be underrepresented in conventional growth models [66]. Meanwhile, the responses of different traits to environmental drivers are clearly differentiated. In rice, TNG is primarily influenced by heat stress (KDD, SHAP = 1.39) and canopy photosynthetic capacity (SIF, SHAP = 1.05) [67], whereas accumulated thermal time (GDD, SHAP = 2.69) contributes more strongly to overall yield. This provides a quantitative constraint hierarchy: for TNG, KDD exceeds SIF by 0.34 SHAP units (~32% higher), while for yield, GDD shows a markedly larger contribution (2.69), suggesting distinct physiological “control points” for yield components versus final yield.
This suggests that yield components are governed by distinct physiological pathways [68]. In cotton, stem height is jointly influenced by canopy greenness (NDVI, SHAP = 0.85) [69] and soil moisture (SM, SHAP = 0.75), reflecting tight coupling between canopy status and water availability during vegetative growth [70]. In contrast, seed yield is regulated by a combination of varietal traits, SM, and VPD, highlighting the composite nature of reproductive success [71,72]. Furthermore, the concurrent elevation of SHAP values for multiple variables during key periods implies potential synergistic stress effects. For instance, whole-season contribution analysis (Figure 10g) identifies SM and VPD as dominant factors for cotton yield, while temporal dynamic analysis (Figure 10c) reveals a broadly sensitive window around mid-season (approximately 50 days after seeding). This pattern is consistent with a compound water-stress period as flowering approaches, during which limited soil water supply and high atmospheric vapor pressure deficit may jointly intensify plant water stress and thereby increase the risk of boll shedding, an effect that may be difficult to capture in single-factor study designs [73]. Importantly, the SHAP framework provides quantifiable and comparable measures of influence, moving beyond qualitative descriptions such as “water is important” or “heat stress is critical.” For example, for rice TNG, the SHAP value for KDD (1.39) is higher than that for SIF (1.05), indicating that under the observed conditions, heat stress imposed a stronger limitation on grain number than photosynthetic capacity. By translating physiological narratives into comparable effect sizes, the model outputs provide directly testable priorities: which driver matters more, for which trait, and at which time window.
These data-driven findings offer a mechanistic and quantitative basis for targeting management practices to specific developmental windows and for prioritizing stress-resilience traits in breeding programs within the complex context of genotype–environment–management interactions.
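
The relative comparisons quoted in this discussion follow directly from the reported SHAP values and can be re-derived in a few lines. The variable names below are illustrative; the four input numbers are the ones given in the text.

```python
# SHAP values as reported in the text.
peak_early, peak_heading = 1.10, 1.51   # rice yield: 20 days after sowing vs heading
kdd_tng, sif_tng = 1.39, 1.05           # rice TNG: KDD vs SIF

early_to_heading = peak_early / peak_heading        # about 0.73
kdd_minus_sif = kdd_tng - sif_tng                   # about 0.34 SHAP units
kdd_over_sif_pct = 100 * kdd_minus_sif / sif_tng    # about 32 %
```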
In this study, SHAP is used as a model-based attribution approach to summarize how the fitted predictor-to-trait mapping distributes importance across drivers, traits, and time, rather than as a tool for causal mechanism discovery. Compared with correlation-based summaries and with standard sensitivity/importance summaries that provide a single global ranking, temporal SHAP profiles provide a stage-resolved view of attribution, helping to localize periods when the model is most sensitive to specific drivers (e.g., early-season and mid-season windows) that are not directly captured by season-averaged analyses. In addition, because SHAP values are reported on a comparable contribution scale within the fitted model, they enable quantitative, within-model comparisons among drivers for a given trait and stage (e.g., KDD vs. SIF for rice TNG under the observed conditions), complementing qualitative interpretation. We note important limitations: SHAP explanations are model-dependent and do not establish causality; moreover, when predictors are correlated, attribution can be non-unique. Accordingly, the physiological interpretations are framed as testable, attribution-consistent hypotheses to be evaluated against established experimental and modeling evidence.
MHRE takes the variety encoding (as a proxy for genotype), environmental variables, and stage-specific remote-sensing–derived phenotypic indicators as joint inputs. The base learners comprise both linear and nonlinear models. In particular, nonlinear learners such as tree ensembles and gradient boosting can learn conditional response patterns, under which the contribution of an environmental factor may vary with genotype and environmental context, whereas the linear learner mainly provides a relatively stable characterization of global main effects. The meta-learner then integrates complementary patterns learned by different base learners, thereby better representing stage-dependent and nonlinear response differences. Accordingly, the G × E effects referred to in this study are implicitly characterized through the joint inclusion of genotype-proxy and environmental predictors together with the ensemble’s capacity to represent nonlinear/interaction-like responses, rather than through an explicit decomposition of variance into G, E, and G × E components.
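
The stacking logic described above, in which base learners produce predictions that a meta-learner then combines, can be sketched with two deliberately simple stand-ins. This is not the MHRE implementation: a fitted line and a k-nearest-neighbour regressor replace the actual base learners, the meta-learner is an intercept-free least-squares blend, and the data are synthetic.

```python
def fit_linear(xs, ys):
    """Least-squares line y = a + b*x; returns a predictor (linear learner)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(xs, ys, k=2):
    """k-nearest-neighbour regressor (a simple nonlinear learner)."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def out_of_fold(fit, xs, ys, n_folds):
    """Predict each sample with a model that never saw that sample."""
    preds = [0.0] * len(xs)
    for f in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != f]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(f, len(xs), n_folds):
            preds[i] = model(xs[i])
    return preds


def fit_meta(p1, p2, ys):
    """Least-squares weights (w1, w2) so that w1*p1 + w2*p2 approximates y."""
    a11 = sum(u * u for u in p1)
    a12 = sum(u * v for u, v in zip(p1, p2))
    a22 = sum(v * v for v in p2)
    b1 = sum(u * y for u, y in zip(p1, ys))
    b2 = sum(v * y for v, y in zip(p2, ys))
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det


def sse(pred, ys):
    return sum((p - y) ** 2 for p, y in zip(pred, ys))


# Synthetic, mildly nonlinear data (illustrative only).
xs = [float(i) for i in range(12)]
ys = [0.5 * x + (x / 4.0) ** 2 for x in xs]

p_lin = out_of_fold(fit_linear, xs, ys, n_folds=3)
p_knn = out_of_fold(fit_knn, xs, ys, n_folds=3)
w_lin, w_knn = fit_meta(p_lin, p_knn, ys)
stacked = [w_lin * a + w_knn * b for a, b in zip(p_lin, p_knn)]
```

Training the meta-learner on out-of-fold predictions keeps it from rewarding a base learner that merely memorized its training samples. The blend's error on those same predictions cannot exceed that of either base learner, but this is an optimistic estimate; an honest assessment needs a further held-out set.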
Correlation between yield and other crop traits. Pearson analyses (Table 5 and Table 6) highlight distinct yield–trait linkages in the two cropping systems. Pearson’s correlation was applied to evaluate linear associations between cotton traits, as the data conformed to bivariate normality assumptions, verified by Shapiro–Wilk tests and Q-Q plots. Scatterplots confirmed linearity, satisfying the method’s key requirements. Rice yield showed a moderate, statistically significant association with both NFG (r = 0.52, p < 0.05) and TNG (r = 0.44, p < 0.05). These reproductive components, therefore, appear to be the primary drivers of yield variability within our multi-regional data set. In contrast, TSW is only weakly correlated with yield (r = 0.19), and ES shows essentially no linear relationship (r = −0.08). Among the traits themselves, NFG and TNG are highly collinear (r = 0.87, p < 0.05), whereas ES is negatively related to both NFG and TNG (each r = −0.54, p < 0.05), reflecting the well-known trade-off between panicle number and grain set. Cotton seed yield correlated most strongly with NCB (r = 0.32, p < 0.05). Its correlations with SUI (r = 0.15) and stem height (r = 0.18) were weak and not statistically significant, indicating that fiber quality and vegetative vigor do not materially constrain yield in the current germplasm set. A moderate positive correlation between NCB and stem height (r = 0.55, p < 0.05) suggests that taller plants tend to carry more fruiting sites, but this structural trait did not translate directly into higher seed yield. Overall, the correlation patterns corroborate the MHRE model outputs: traits with the greatest physiological leverage on yield (NFG and TNG in rice; NCB in cotton) coincide with those traits for which the model achieved the highest prediction accuracies (Figure 4).
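
For reference, the Pearson coefficient used in these comparisons can be computed as below. This is a minimal sketch of the formula; the function name and the toy trait values are illustrative and are not taken from Table 5 or Table 6.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired observations of a trait and yield (not real data).
trait = [1.0, 2.0, 3.0, 4.0]
yield_obs = [1.0, 3.0, 2.0, 4.0]
r = pearson_r(trait, yield_obs)
```

Because r measures only linear association, a weak coefficient does not rule out a nonlinear dependence, which is one reason the scatterplot check mentioned above matters.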
In addition, the predictive performance across traits should be interpreted in light of both their variability and the extent to which that variability is captured by the predictors. Descriptive statistics (Table 7) show that some traits have much larger Standard Deviation (SD) and Coefficient of Variation (CV) than others. For rice, TSW exhibits a comparatively modest R2 (0.40) despite its observed variability, suggesting that part of its variation is driven by heterogeneous field conditions and measurement noise that are not fully represented by the predictors. For cotton, NCB combines high variability with very high R2, indicating that most of its dispersion is aligned with the remote-sensing and meteorological predictors. However, the SUI shows low variability but only modest R2 (0.41), suggesting that much of its remaining variation reflects fiber-quality genetics and measurement noise beyond the environmental predictors. These examples show that large variability can lower R2 when it is mainly driven by noise or unobserved factors, whereas traits whose variability is well explained by the available predictors can still be predicted accurately even when their dispersion is large.
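
The interplay between trait variability and the reported metrics can be made explicit with common textbook definitions of R2, RMSE, RPD (standard deviation of the observations divided by RMSE) and CV. The paper's exact formulas are not given in this section, so the definitions below are assumptions, and the toy observations are hypothetical.

```python
from math import sqrt


def regression_metrics(obs, pred):
    """R2, RMSE, RPD and CV (%) under common textbook definitions."""
    n = len(obs)
    mean_obs = sum(obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    rmse = sqrt(ss_res / n)
    sd = sqrt(ss_tot / (n - 1))  # sample standard deviation of the observations
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": rmse,
        "rpd": sd / rmse,
        "cv_pct": 100 * sd / mean_obs,
    }


# Hypothetical observed and predicted trait values (not real data).
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.5, 2.0, 3.0, 4.0, 4.5]
m = regression_metrics(obs, pred)
```

With these definitions, R2 = 1 − (n/(n − 1))/RPD², so at a fixed RMSE a trait with larger SD obtains higher R2 and RPD. A highly variable trait can therefore score well when its dispersion is captured by the predictors, while a low-variability trait such as SUI can score modestly even with a small absolute error.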

4.2. Potential Limitations

Although the MHRE model demonstrates superior predictive performance compared to traditional ML models, three key limitations need to be addressed. First, variety-specific prediction differences reveal an extrapolation bottleneck for modern breeding traits, which calls for dynamic adaptation through the integration of genome-wide selection signals and phenotypic plasticity parameters [74]. ML architectures interpolate well but extrapolate poorly beyond the training domain, especially under novel environmental conditions [75]. MHRE represents G × E in an implicit, predictive manner and therefore cannot resolve several types of G × E complexity. Specifically, it does not explicitly partition variance into genotype (G), environment (E), and G × E components, and it does not provide definitive causal inference or mechanistic evidence. Its extrapolation to unseen genotypes is limited because variety encoding is only a proxy, and it may not disentangle higher-order interactions (e.g., G × E × management) or rare genotype–environment combinations when management variables are unobserved or data coverage is sparse. In addition, fixed stages/windows may not fully capture genotype-dependent phenological shifts, which can blur genotype-specific environmental sensitivities. Second, insufficient representation of terrain-mediated resource heterogeneity (e.g., water redistribution patterns between plains and highlands) may limit the transferability of models across terrain types [76]. Third, as the global climate becomes increasingly non-stationary [77,78], the predictability of precipitation and temperature patterns, particularly in China’s grain-growing regions, has decreased significantly [79]. Rainfall is especially affected: about a quarter of the Earth’s land area now experiences non-stationary rainfall patterns [80], and this proportion is increasing.
The non-stationarity of climate distribution and management practices introduces inherent uncertainty, and the combined effects of climate change and topographic diversity create sudden uncertainties in crop response trajectories, posing challenges for breeding target selection and trait prediction. Furthermore, although non-cropland pixels were removed using the national 1 km land-use dataset and local canopy conditions were represented by averaging cropland pixels within a 2 × 2 neighbourhood around each plot, the use of 1 km resolution remote-sensing data inevitably leads to residual mixed-pixel effects. This issue is particularly relevant in regions with complex topography and highly fragmented fields (e.g., parts of Southwest China), where individual pixels may still contain a mixture of crop and non-crop surfaces. Such pixel-scale heterogeneity can dilute crop-specific spectral signals and thus introduce additional uncertainty into the trait predictions, potentially contributing to the lower accuracy observed in these areas. Moreover, the spatial generalization demonstrated here should be understood as valid within the six studied production regions; applying MHRE to entirely new agro-ecological zones will require further local calibration and independent validation.
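
The masking-and-averaging step mentioned above (dropping non-cropland pixels and averaging the remaining cropland pixels in a 2 × 2 neighbourhood) can be sketched as follows. The grid values, the mask, and the convention that the window is addressed by its top-left corner are assumptions for illustration; the source does not specify how the window is positioned relative to each plot.

```python
def cropland_window_mean(values, is_cropland, row, col, size=2):
    """Mean of cropland pixels in a size x size window starting at (row, col).

    Returns None when the window contains no cropland pixel.
    """
    picked = []
    for r in range(row, min(row + size, len(values))):
        for c in range(col, min(col + size, len(values[0]))):
            if is_cropland[r][c]:
                picked.append(values[r][c])
    return sum(picked) / len(picked) if picked else None


# Hypothetical 1 km index values and land-use mask (not real data).
ndvi = [[0.2, 0.8],
        [0.6, 0.4]]
mask = [[True, False],
        [True, True]]
plot_ndvi = cropland_window_mean(ndvi, mask, 0, 0)
```

The mask removes pixels labelled as non-cropland, but each retained 1 km pixel may still mix crop and non-crop surfaces, which is the residual mixed-pixel effect discussed above.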

4.3. Future Enhancements

To further improve the predictive capability of agricultural modeling, we believe that efforts should focus on three complementary areas. First, a multi-dimensional integration framework should be established that combines crop genotype features with field management records to form a mechanistic model of genotype × environment × management (G × E × M) interactions [81], improving biological realism in yield simulations. Second, adaptive data assimilation mechanisms should be developed to dynamically integrate multi-source observational data [82], including high-resolution satellite imagery, soil moisture sensor networks, and distributed meteorological stations, through self-optimizing parameter calibration [83] to mitigate the effects of non-stationarity. Third, hybrid architectures combining process-based crop models with deep neural networks show great potential for balancing the predictive capabilities of data-driven models with the inherent physiological interpretability of mechanistic approaches. Together, these strategies address increasing environmental variability and severe food security challenges, making integrated crop models indispensable tools for crop breeding and agricultural planning. In addition, the current implementation of MHRE is crop-specific, with separate models trained for rice and for cotton. A promising direction for future work is to extend the framework towards a more unified multi-crop setting, for example, by adopting multi-task or hierarchical architectures that jointly model several crop species while retaining crop-specific responses. Such cross-crop modeling could better exploit shared patterns among crops and potentially improve trait prediction in data-scarce crops or regions, thereby further enhancing the generality and practical value of the MHRE framework.

5. Conclusions

This study proposed the MHRE framework, an artificial intelligence approach that integrates multi-source data and interpretable modeling to advance the prediction of major crop traits across diverse regions and crop varieties. By combining large-scale standardized datasets, multi-modal features, and robust validation, the framework demonstrated both high predictive performance and meaningful biological interpretability. Specifically, it achieved substantial improvements over baselines in predicting rice (R2 = 0.78, +6.85%; RMSE = 0.59 t·ha−1, −9.61%) and cotton (R2 = 0.82, +10.8%; RMSE = 0.33 t·ha−1, −14.1%) yields, alongside key traits such as rice thousand seed weight (R2 +29%). Moreover, MHRE generalizes well, as shown by robust performance in independent tests across six ecological zones (e.g., for cotton, accuracy declined only from 0.80 in NW to 0.77 in NC, −3.8%). In terms of interpretability, this study reveals distinct spatiotemporal sensitivities of yield and key traits (cotton stem height, rice TNG) to environmental and varietal drivers. In particular, it identifies divergent critical phases: rice yield responds most strongly to its drivers at the heading stage (driven by GDD and SIF), while cotton yield shows maximal effects mid-season (controlled by soil moisture and genetics). These insights provide a foundation for developing stage-specific crop management strategies. In conclusion, the MHRE framework improves both predictive accuracy and interpretability for the prediction of major crop traits. Its high performance across multiple varieties and ecological regions makes it a powerful AI-based tool for precision agriculture.

Author Contributions

Y.Q.: Methodology, Software, Data curation, Visualization, Investigation, Formal analysis, Writing—original draft; M.T.: Data curation, Investigation, Formal analysis, Validation, Writing—review and editing; X.Y.: Software, Data curation, Formal analysis, Writing—original draft; X.Z.: Investigation, Software, Data curation, Writing—review and editing; X.J.: Investigation, Formal analysis, Writing—review and editing; N.X.: Data curation, Formal analysis, Writing—review and editing; J.Z.: Conceptualization, Funding acquisition, Project administration, Resources, Supervision, Writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Qingdao Science and Technology Benefiting the People Demonstration Project (No. 25-1-5-xdny-11-nsh) and the Natural Science Foundation of Shandong Province (Nos. ZR2024LQX005, ZR2020QF067, and ZR2023QD073).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are available on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cohen, J.E. Human Population: The Next Half Century. Science 2003, 302, 1172–1175.
  2. Cooper, M.; Messina, C.D. Breeding Crops for Drought-Affected Environments and Improved Climate Resilience. Plant Cell 2023, 35, 162–186.
  3. Wei, X.; Chen, M.; Zhang, Q.; Gong, J.; Liu, J.; Yong, K.; Wang, Q.; Fan, J.; Chen, S.; Hua, H.; et al. Genomic Investigation of 18,421 Lines Reveals the Genetic Architecture of Rice. Science 2024, 385, eadm8762.
  4. Amiteye, S. Basic Concepts and Methodologies of DNA Marker Systems in Plant Molecular Breeding. Heliyon 2021, 7, e08093.
  5. Varshney, R.K.; Barmukh, R.; Roorkiwal, M.; Qi, Y.; Kholova, J.; Tuberosa, R.; Reynolds, M.P.; Tardieu, F.; Siddique, K.H. Breeding Custom-designed Crops for Improved Drought Adaptation. Adv. Genet. 2021, 2, e202100017.
  6. Gill, M.; Anderson, R.; Hu, H.; Bennamoun, M.; Petereit, J.; Valliyodan, B.; Nguyen, H.T.; Batley, J.; Bayer, P.E.; Edwards, D. Machine Learning Models Outperform Deep Learning Models, Provide Interpretation and Facilitate Feature Selection for Soybean Trait Prediction. BMC Plant Biol. 2022, 22, 180.
  7. Samantara, K.; Bohra, A.; Mohapatra, S.R.; Prihatini, R.; Asibe, F.; Singh, L.; Reyes, V.P.; Tiwari, A.; Maurya, A.K.; Croser, J.S. Breeding More Crops in Less Time: A Perspective on Speed Breeding. Biology 2022, 11, 275.
  8. Kakoulidou, I.; Avramidou, E.V.; Baránek, M.; Brunel-Muguet, S.; Farrona, S.; Johannes, F.; Kaiserli, E.; Lieberman-Lazarovich, M.; Martinelli, F.; Mladenov, V.; et al. Epigenetics for Crop Improvement in Times of Global Change. Biology 2021, 10, 766.
  9. Bolton, D.K.; Friedl, M.A. Forecasting Crop Yield Using Remotely Sensed Vegetation Indices and Crop Phenology Metrics. Agric. For. Meteorol. 2013, 173, 74–84.
  10. Meena, M.R.; Appunu, C.; Arun Kumar, R.; Manimekalai, R.; Vasantha, S.; Krishnappa, G.; Kumar, R.; Pandey, S.; Hemaprabha, G. Recent Advances in Sugarcane Genomics, Physiology, and Phenomics for Superior Agronomic Traits. Front. Genet. 2022, 13, 854936.
  11. Newman, S.J.; Furbank, R.T. Explainable Machine Learning Models of Major Crop Traits from Satellite-Monitored Continent-Wide Field Trial Data. Nat. Plants 2021, 7, 1354–1363.
  12. Simmons, C.R.; Lafitte, H.R.; Reimann, K.S.; Brugière, N.; Roesler, K.; Albertsen, M.C.; Greene, T.W.; Habben, J.E. Successes and Insights of an Industry Biotech Program to Enhance Maize Agronomic Traits. Plant Sci. 2021, 307, 110899.
  13. Zhong, S.; Sun, Z.; Di, L. Characteristics of Vegetation Response to Drought in the CONUS Based on Long-Term Remote Sensing and Meteorological Data. Ecol. Indic. 2021, 127, 107767.
  14. Burdett, H.; Wellen, C. Statistical and Machine Learning Methods for Crop Yield Prediction in the Context of Precision Agriculture. Precis. Agric. 2022, 23, 1553–1574.
  15. Singh, N.; Tewari, V.K.; Biswas, P.K.; Dhruw, L.K. Lightweight Convolutional Neural Network Models for Semantic Segmentation of In-Field Cotton Bolls. Artif. Intell. Agric. 2023, 8, 1–19.
  16. Javed, T.; Zhang, J.; Bhattarai, N.; Sha, Z.; Rashid, S.; Yun, B.; Ahmad, S.; Henchiri, M.; Kamran, M. Drought Characterization across Agricultural Regions of China Using Standardized Precipitation and Vegetation Water Supply Indices. J. Clean. Prod. 2021, 313, 127866.
  17. Kaur, B.; Sandhu, K.S.; Kamal, R.; Kaur, K.; Singh, J.; Röder, M.S.; Muqaddasi, Q.H. Omics for the Improvement of Abiotic, Biotic, and Agronomic Traits in Major Cereal Crops: Applications, Challenges, and Prospects. Plants 2021, 10, 1989.
  18. Reiss, E.R.; Drinkwater, L.E. Cultivar Mixtures: A Meta-analysis of the Effect of Intraspecific Diversity on Crop Yield. Ecol. Appl. 2018, 28, 62–77.
  19. Wu, B.; Zhang, M.; Zeng, H.; Tian, F.; Potgieter, A.B.; Qin, X.; Yan, N.; Chang, S.; Zhao, Y.; Dong, Q. Challenges and Opportunities in Remote Sensing-Based Crop Monitoring: A Review. Natl. Sci. Rev. 2023, 10, nwac290.
  20. Cai, Y.; Guan, K.; Lobell, D.; Potgieter, A.B.; Wang, S.; Peng, J.; Xu, T.; Asseng, S.; Zhang, Y.; You, L. Integrating Satellite and Climate Data to Predict Wheat Yield in Australia Using Machine Learning Approaches. Agric. For. Meteorol. 2019, 274, 144–159.
  21. Everingham, Y.; Sexton, J.; Skocaj, D.; Inman-Bamber, G. Accurate Prediction of Sugarcane Yield Using a Random Forest Algorithm. Agron. Sustain. Dev. 2016, 36, 27.
  22. Kimm, H.; Guan, K.; Burroughs, C.H.; Peng, B.; Ainsworth, E.A.; Bernacchi, C.J.; Moore, C.E.; Kumagai, E.; Yang, X.; Berry, J.A. Quantifying High-temperature Stress on Soybean Canopy Photosynthesis: The Unique Role of Sun-induced Chlorophyll Fluorescence. Glob. Change Biol. 2021, 27, 2403–2415.
  23. Liu, Z.; Zhao, F.; Liu, X.; Yu, Q.; Wang, Y.; Peng, X.; Cai, H.; Lu, X. Direct Estimation of Photosynthetic CO2 Assimilation from Solar-Induced Chlorophyll Fluorescence (SIF). Remote Sens. Environ. 2022, 271, 112893.
  24. Leolini, L.; Bregaglio, S.; Ginaldi, F.; Costafreda-Aumedes, S.; Di Gennaro, S.; Matese, A.; Maselli, F.; Caruso, G.; Palai, G.; Bajocco, S. Use of Remote Sensing-Derived fPAR Data in a Grapevine Simulation Model for Estimating Vine Biomass Accumulation and Yield Variability at Sub-Field Level. Precis. Agric. 2023, 24, 705–726.
  25. Zhuo, W.; Huang, J.; Xiao, X.; Huang, H.; Bajgain, R.; Wu, X.; Gao, X.; Wang, J.; Li, X.; Wagle, P. Assimilating Remote Sensing-Based VPM GPP into the WOFOST Model for Improving Regional Winter Wheat Yield Estimation. Eur. J. Agron. 2022, 139, 126556.
  26. Berg, A.; Sheffield, J. Climate Change and Drought: The Soil Moisture Perspective. Curr. Clim. Change Rep. 2018, 4, 180–191.
  27. Wang, Q.; Shao, K.; Cai, Z.; Che, Y.; Chen, H.; Xiao, S.; Wang, R.; Liu, Y.; Li, B.; Ma, Y. Prediction of Sugar Beet Yield and Quality Parameters Using Stacked-LSTM Model with Pre-Harvest UAV Time Series Data and Meteorological Factors. Artif. Intell. Agric. 2025, 15, 252–265.
  28. Baig, I.A.; Irfan, M.; Salam, M.A.; Işik, C. Addressing the Effect of Meteorological Factors and Agricultural Subsidy on Agricultural Productivity in India: A Roadmap toward Environmental Sustainability. Environ. Sci. Pollut. Res. 2023, 30, 15881–15898.
  29. Subedi, B.; Poudel, A.; Aryal, S. The Impact of Climate Change on Insect Pest Biology and Ecology: Implications for Pest Management Strategies, Crop Production, and Food Security. J. Agric. Food Res. 2023, 14, 100733.
  30. Shaheb, M.R.; Venkatesh, R.; Shearer, S.A. A Review on the Effect of Soil Compaction and Its Management for Sustainable Crop Production. J. Biosyst. Eng. 2021, 46, 417–439.
  31. Young, M.D.; Ros, G.H.; de Vries, W. Impacts of Agronomic Measures on Crop, Soil, and Environmental Indicators: A Review and Synthesis of Meta-Analysis. Agric. Ecosyst. Environ. 2021, 319, 107551.
  32. Mubeen, M.; Ahmad, A.; Hammad, H.M.; Awais, M.; Farid, H.U.; Saleem, M.; Din, M.S.; Amin, A.; Ali, A.; Fahad, S.; et al. Evaluating the Climate Change Impact on Water Use Efficiency of Cotton-Wheat in Semi-Arid Conditions Using DSSAT Model. J. Water Clim. Change 2019, 11, 1661–1675. [Google Scholar] [CrossRef]
  33. Ojeda, J.J.; Volenec, J.J.; Brouder, S.M.; Caviglia, O.P.; Agnusdei, M.G. Evaluation of Agricultural Production Systems Simulator as Yield Predictor of Panicum Virgatum and Miscanthus x Giganteus in Several US Environments. Gcb Bioenergy 2017, 9, 796–816. [Google Scholar] [CrossRef]
  34. Wu, S.; Yang, P.; Ren, J.; Chen, Z.; Li, H. Regional Winter Wheat Yield Estimation Based on the WOFOST Model and a Novel VW-4DEnSRF Assimilation Algorithm. Remote Sens. Environ. 2021, 255, 112276. [Google Scholar] [CrossRef]
  35. Zhuo, W.; Huang, J.; Li, L.; Zhang, X.; Ma, H.; Gao, X.; Huang, H.; Xu, B.; Xiao, X. Assimilating Soil Moisture Retrieved from Sentinel-1 and Sentinel-2 Data into WOFOST Model to Improve Winter Wheat Yield Estimation. Remote Sens. 2019, 11, 1618. [Google Scholar] [CrossRef]
  36. Kumari, V.; Agrawal, R.; Kumar, A. Use of Ordinal Logistic Regression in Crop Yield Forecasting. Mausam 2016, 67, 913–918. [Google Scholar] [CrossRef]
  37. Danner, M.; Berger, K.; Wocher, M.; Mauser, W.; Hank, T. Efficient RTM-Based Training of Machine Learning Regression Algorithms to Quantify Biophysical & Biochemical Traits of Agricultural Crops. ISPRS J. Photogramm. Remote Sens. 2021, 173, 278–296. [Google Scholar] [CrossRef]
  38. Joshi, A.; Pradhan, B.; Chakraborty, S.; Behera, M.D. Winter Wheat Yield Prediction in the Conterminous United States Using Solar-Induced Chlorophyll Fluorescence Data and XGBoost and Random Forest Algorithm. Ecol. Inform. 2023, 77, 102194. [Google Scholar] [CrossRef]
  39. Prasad, N.; Patel, N.; Danodia, A. Crop Yield Prediction in Cotton for Regional Level Using Random Forest Approach. Spat. Inf. Res. 2021, 29, 195–206. [Google Scholar] [CrossRef]
  40. Guo, Y.; Fu, Y.; Hao, F.; Zhang, X.; Wu, W.; Jin, X.; Bryant, C.R.; Senthilnath, J. Integrated Phenology and Climate in Rice Yields Prediction Using Machine Learning Methods. Ecol. Indic. 2021, 120, 106935. [Google Scholar] [CrossRef]
  41. Ju, S.; Lim, H.; Ma, J.W.; Kim, S.; Lee, K.; Zhao, S.; Heo, J. Optimal County-Level Crop Yield Prediction Using MODIS-Based Variables and Weather Data: A Comparative Study on Machine Learning Models. Agric. For. Meteorol. 2021, 307, 108530. [Google Scholar] [CrossRef]
  42. Li, J.; Magar, R.T.; Chen, D.; Lin, F.; Wang, D.; Yin, X.; Zhuang, W.; Li, Z. SoybeanNet: Transformer-Based Convolutional Neural Network for Soybean Pod Counting from Unmanned Aerial Vehicle (UAV) Images. Comput. Electron. Agric. 2024, 220, 108861. [Google Scholar] [CrossRef]
  43. Victor, B.; Nibali, A.; Newman, S.J.; Coram, T.; Pinto, F.; Reynolds, M.; Furbank, R.T.; He, Z. High-Throughput Plot-Level Quantitative Phenotyping Using Convolutional Neural Networks on Very High-Resolution Satellite Images. Remote Sens. 2024, 16, 282. [Google Scholar] [CrossRef]
  44. Wang, D.; Cao, W.; Zhang, F.; Li, Z.; Xu, S.; Wu, X. A Review of Deep Learning in Multiscale Agricultural Sensing. Remote Sens. 2022, 14, 559. [Google Scholar] [CrossRef]
  45. Li, Y.; Del Rio Chanona, E.A.; Wong, H.S.; Myers, R.J. Predicting Calcium Carbonate Yield from Wet Carbonation of Recycled Cement Paste Using Interpretable Ensemble Machine Learning. J. Clean. Prod. 2025, 514, 145727. [Google Scholar] [CrossRef]
  46. Okujeni, A.; Van Der Linden, S.; Suess, S.; Hostert, P. Ensemble Learning From Synthetically Mixed Training Data for Quantifying Urban Land Cover With Support Vector Regression. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 1640–1650. [Google Scholar] [CrossRef]
  47. Yoosefzadeh-Najafabadi, M.; Earl, H.J.; Tulpan, D.; Sulik, J.; Eskandari, M. Application of Machine Learning Algorithms in Plant Breeding: Predicting Yield From Hyperspectral Reflectance in Soybean. Front. Plant Sci. 2021, 11, 624273. [Google Scholar] [CrossRef]
  48. Liu, B.; Liu, Y.; Huang, G.; Jiang, X.; Liang, Y.; Yang, C.; Huang, L. Comparison of Yield Prediction Models and Estimation of the Relative Importance of Main Agronomic Traits Affecting Rice Yield Formation in Saline-Sodic Paddy Fields. Eur. J. Agron. 2023, 148, 126870. [Google Scholar] [CrossRef]
  49. Song, P.; Zhang, Y.; Tian, J. Improving Surface Soil Moisture Estimates in Humid Regions by an Enhanced Remote Sensing Technique. Geophys. Res. Lett. 2021, 48, e2020GL091459. [Google Scholar] [CrossRef]
  50. Yu, W.; Yang, G.; Li, D.; Zheng, H.; Yao, X.; Zhu, Y.; Cao, W.; Qiu, L.; Cheng, T. Improved Prediction of Rice Yield at Field and County Levels by Synergistic Use of SAR, Optical and Meteorological Data. Agric. For. Meteorol. 2023, 342, 109729. [Google Scholar] [CrossRef]
  51. Vanschoren, J. Meta-Learning: A Survey. arXiv 2018. [Google Scholar] [CrossRef]
  52. Maciel, A.I.; Costa, I.G.; Lorena, A.C. Measuring the Complexity of Regression Problems. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 1450–1457. [Google Scholar] [CrossRef]
  53. van der Laan, M.J.; Polley, E.C.; Hubbard, A.E. Super Learner. Stat. Appl. Genet. Mol. Biol. 2007, 6, 25. [Google Scholar] [CrossRef]
  54. Wu, L.; Li, J.; Zhang, J.; Wang, Z.; Tong, J.; Ding, F.; Li, M.; Feng, Y.; Li, H. Prediction Model for the Compressive Strength of Rock Based on Stacking Ensemble Learning and Shapley Additive Explanations. Bull. Eng. Geol. Environ. 2024, 83, 439. [Google Scholar] [CrossRef]
  55. Molinaro, A.M.; Simon, R.; Pfeiffer, R.M. Prediction Error Estimation: A Comparison of Resampling Methods. Bioinformatics 2005, 21, 3301–3307. [Google Scholar] [CrossRef]
  56. Valdez, L.; Buono, C.; Braunstein, L.; Macri, P. Effect of Degree Correlations above the First Shell on the Percolation Transition. EPL 2011, 96, 38001. [Google Scholar] [CrossRef][Green Version]
  57. Notton, G.; Paoli, C.; Vasileva, S.; Nivet, M.; Canaletti, J.; Cristofari, C. Estimation of Hourly Global Solar Irradiation on Tilted Planes from Horizontal One Using Artificial Neural Networks. Energy 2012, 39, 166–179. [Google Scholar] [CrossRef]
  58. Nourbakhsh, Z.; Habibi, N. Combining LSTM and CNN Methods and Fundamental Analysis for Stock Price Trend Prediction. Multimed. Tools Appl. 2023, 82, 17769–17799. [Google Scholar] [CrossRef]
  59. Alahacoon, N.; Amarnath, G. Agricultural Drought Monitoring in Sri Lanka Using Multisource Satellite Data. Adv. Space Res. 2022, 69, 4078–4097. [Google Scholar] [CrossRef]
  60. Zhou, H.; Yang, J.; Lou, W.; Sheng, L.; Li, D.; Hu, H. Improving Grain Yield Prediction through Fusion of Multi-Temporal Spectral Features and Agronomic Trait Parameters Derived from UAV Imagery. Front. Plant Sci. 2023, 14, 1217448. [Google Scholar] [CrossRef]
  61. Pant, J.; Pant, R.; Singh, M.K.; Singh, D.P.; Pant, H. Analysis of Agricultural Crop Yield Prediction Using Statistical Techniques of Machine Learning. Mater. Today Proc. 2021, 46, 10922–10926. [Google Scholar] [CrossRef]
  62. Jeong, S.; Ko, J.; Yeom, J.-M. Predicting Rice Yield at Pixel Scale through Synthetic Use of Crop and Deep Learning Models with Satellite Data in South and North Korea. Sci. Total Environ. 2022, 802, 149726. [Google Scholar] [CrossRef] [PubMed]
  63. Zhang, X.; Hu, M.; Guo, X.; Yang, H.; Zhang, Z.; Zhang, K. Effects of Topographic Factors on Runoff and Soil Loss in Southwest China. Catena 2018, 160, 394–402. [Google Scholar] [CrossRef]
  64. Chen, K.; Łyskowski, A.; Jaremko, Ł.; Jaremko, M. Genetic and Molecular Factors Determining Grain Weight in Rice. Front. Plant Sci. 2021, 12, 605799. [Google Scholar] [CrossRef]
  65. Danilevicz, M.F.; Upadhyaya, S.R.; Batley, J.; Bennamoun, M.; Bayer, P.E.; Edwards, D. Understanding Plant Phenotypes in Crop Breeding through Explainable AI. Plant Biotechnol. J. 2025, 23, 4200–4213. [Google Scholar] [CrossRef] [PubMed]
  66. Kalaitzidis, A.; Kadoglidou, K.; Mylonas, I.; Ghoghoberidze, S.; Ninou, E.; Katsantonis, D. Investigating the Impact of Tillering on Yield and Yield-Related Traits in European Rice Cultivars. Agriculture 2025, 15, 616. [Google Scholar] [CrossRef]
  67. Zhang, Y.; Peñuelas, J. Combining Solar-Induced Chlorophyll Fluorescence and Optical Vegetation Indices to Better Understand Plant Phenological Responses to Global Change. J. Remote Sens. 2023, 3, 0085. [Google Scholar] [CrossRef]
  68. Zhou, Z.; Jin, J.; Li, F.; Liu, J. Estimating the Temperature Sensitivity of Rice (Oryza sativa L.) Yield and Its Components in China Using the CERES-Rice Model. Eur. J. Agron. 2025, 162, 127419. [Google Scholar] [CrossRef]
  69. Wang, Q.; Moreno-Martínez, Á.; Muñoz-Marí, J.; Campos-Taberner, M.; Camps-Valls, G. Estimation of Vegetation Traits with Kernel NDVI. ISPRS J. Photogramm. Remote Sens. 2023, 195, 408–417. [Google Scholar] [CrossRef]
  70. Bhattacharya, A. Effect of Soil Water Deficit on Growth and Development of Plants: A Review. In Soil Water Deficit and Physiological Issues in Plants; Springer: Singapore, 2021; pp. 393–488. [Google Scholar] [CrossRef]
  71. Singh, K.; Wijewardana, C.; Gajanayake, B.; Lokhande, S.; Wallace, T.; Jones, D.; Reddy, K.R. Genotypic Variability among Cotton Cultivars for Heat and Drought Tolerance Using Reproductive and Physiological Traits. Euphytica 2018, 214, 57. [Google Scholar] [CrossRef]
  72. Broughton, K.J.; Payton, P.; Tan, D.K.; Tissue, D.T.; Bange, M.P. Effect of Vapour Pressure Deficit on Gas Exchange of Field-Grown Cotton. J. Cotton Res. 2021, 4, 30. [Google Scholar] [CrossRef]
  73. Gao, M.; Xu, B.; Wang, Y.; Zhou, Z.; Hu, W. Quantifying Individual and Interactive Effects of Elevated Temperature and Drought Stress on Cotton Yield and Fibre Quality. J. Agron. Crop Sci. 2021, 207, 422–436. [Google Scholar] [CrossRef]
  74. Li, A.; Dai, H.; Guo, X.; Zhang, Z.; Zhang, K.; Wang, C.; Wang, X.; Wang, W.; Chen, H.; Li, X.; et al. Genome of the Estuarine Oyster Provides Insights into Climate Impact and Adaptive Plasticity. Commun. Biol. 2021, 4, 1287. [Google Scholar] [CrossRef]
  75. Alcorn, M.A.; Li, Q.; Gong, Z.; Wang, C.; Mai, L.; Ku, W.-S.; Nguyen, A. Strike (With) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; IEEE: New York, NY, USA, 2019; pp. 4840–4849. [Google Scholar] [CrossRef]
  76. Wang, F.; Liang, R.; Li, S.; Xiang, M.; Yang, W.; Lu, M.; Song, Y. Assessing the Impact of Multi-Source Environmental Variables on Soil Organic Carbon in Different Land Use Types of China Using an Interpretable High-Precision Machine Learning Method. Ecol. Indic. 2024, 169, 112865. [Google Scholar] [CrossRef]
  77. Lehmann, J.; Coumou, D.; Frieler, K. Increased Record-Breaking Precipitation Events under Global Warming. Clim. Change 2015, 132, 501–515. [Google Scholar] [CrossRef]
  78. Marshall, M.; Belgiu, M.; Boschetti, M.; Pepe, M.; Stein, A.; Nelson, A. Field-Level Crop Yield Estimation with PRISMA and Sentinel-2. ISPRS J. Photogramm. Remote Sens. 2022, 187, 191–210. [Google Scholar] [CrossRef]
  79. Shi, X.; Wang, C.; Zhao, J.; Wang, K.; Chen, F.; Chu, Q. Increasing Inconsistency between Climate Suitability and Production of Cotton (Gossypium hirsutum L.) in China. Ind. Crops Prod. 2021, 171, 113959. [Google Scholar] [CrossRef]
  80. Sun, F.; Roderick, M.L.; Farquhar, G.D. Rainfall Statistics, Stationarity, and Climate Change. Proc. Natl. Acad. Sci. USA 2018, 115, 2305–2310. [Google Scholar] [CrossRef]
  81. Mumford, M.H.; Forknall, C.R.; Rodriguez, D.; Eyre, J.X.; Kelly, A.M. Incorporating Environmental Covariates to Explore Genotype × Environment × Management (G × E × M) Interactions: A One-Stage Predictive Model. Field Crops Res. 2023, 304, 109133. [Google Scholar] [CrossRef]
  82. Zare, H.; Weber, T.K.; Ingwersen, J.; Nowak, W.; Gayler, S.; Streck, T. Within-Season Crop Yield Prediction by a Multi-Model Ensemble with Integrated Data Assimilation. Field Crops Res. 2024, 308, 109293. [Google Scholar] [CrossRef]
  83. Ma, J.; Xie, H.; Song, K.; Liu, H. Self-Optimizing Path Tracking Controller for Intelligent Vehicles Based on Reinforcement Learning. Symmetry 2022, 14, 31. [Google Scholar] [CrossRef]
Figure 1. Study area showing the spatial distribution of 94 rice and 45 cotton stations in China, where green represents cotton and purple represents rice.
Figure 2. Overall methodological flowchart for major crop traits prediction of rice and cotton through multi-source data. P represents the predictions generated by the base model.
Figure 3. Technical modeling diagram for the MHRE framework. M denotes the meta-learner that combines the predictions of the six base models.
Figure 4. Accuracy of the model for the prediction of rice traits. (a) regression analysis of predicted and measured Yield; (b) Total number of grains (TNG); (c) Thousand seed weight (TSW); (d) Number of filled grains (NFG); and (e) Effective spikes (ES).
Figure 5. Accuracy of the model for the prediction of major cotton traits. (a) regression analysis of predicted and measured Seed-yield values; (b) Stem-height; (c) Number of bolls per plant (NCB); (d) Spinning uniformity index (SUI).
Figure 6. Predictive stability of different rice varieties. (a) Middle and lower Yangtze River late-maturing medium-indica, (b) Medium-indica late-maturing, (c) Early indica group, (d) Early-maturing late-indica, and (e) Upper Yangtze River late-maturing medium-indica.
Figure 7. Predictive stability of different cotton varieties. (a) Medium-maturing conventional varieties, (b) Medium-maturity hybrids, and (c) Early maturity routine.
Figure 8. Bar charts of the study regions and the predictive power of the model. (a) The geographic locations of different regions in China, including South China (SC), Southwest China (SW), East China (EC), and Central China (CC). (b) Predictive ability for different traits in SW. (c) Predictive ability for different traits in CC. (d) Predictive ability for different traits in SC. (e) Predictive ability for yield and other traits in EC.
Figure 9. Bar charts showing the study regions and the predictive power of the model. (a) shows the geographic locations of different regions in China, including North China (NC), Northwest China (NW), East China (EC), Central China (CC), and South China. (b) shows the predictive ability for different traits in EC, (c) in NC, (d) in CC, and (e) in NW.
Figure 10. Variable importance over time and combined contribution of variables. (a) Importance of variables at different days after sowing and after heading for rice yield, (b) after sowing and after heading for rice TNG, (c) after seeding and after flowering for cotton seed yield, and (d) after sowing and after heading for cotton stem height. (e) Combined importance of variables for rice yield, (f) for TNG, (g) for cotton seed yield, and (h) for cotton stem height.
Table 1. Details of the remote sensing datasets used in this study.
| Data Sources | Data Type | Variables | Temporal Resolution | Spatial Resolution |
|---|---|---|---|---|
| MODIS | MOD13A1 | NDVI | 16 days | 500 m |
| | MCD15A3H | LAI | 4 days | 500 m |
| | MCD15A3H | Fpar | 4 days | 500 m |
| | MOD16A2 | ET | 8 days | 500 m |
| | MOD16A2 | PET | 8 days | 500 m |
| | MOD17A2H | GPP | 8 days | 500 m |
| TROPOMI | RTSIF | SIF | 8 days | 0.05° |
| National Tibetan Plateau Data Center | SM | SM | 1 day | 1 km |
Table 2. Parameters related to field trials in cotton and rice.
| Crop | Data Types | Variable | Abbreviation |
|---|---|---|---|
| Rice | Phenological stages | Sowing date | Sow |
| | | Heading date | HD |
| | | Mature date | MT |
| | | Growth duration | GD |
| | Rice agronomic traits | Yield (t ha−1) | - |
| | | Thousand seed weight (g) | TSW |
| | | Effective spike (10,000·spike ha−1) | ES |
| | | Number of filled grains (grains/panicle) | NFG |
| | | Total number of grains (grains/panicle) | TNG |
| Cotton | Phenological stages | Seeding date | SD |
| | | Flowering date | Flw |
| | | Batting date | Bat |
| | | Growth duration | GD |
| | Cotton agronomic & fiber traits | Spinning uniformity index | SUI |
| | | Stem-height (cm) | - |
| | | Number of bolls per plant | NCB |
| | | Seed-yield (t ha−1) | - |
Table 3. Comparing the performance of five regression algorithms for the prediction of major rice traits.
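| Traits | Units | Regression Algorithm | R2 | RMSE | RRMSE | MAE | RPD |
|---|---|---|---|---|---|---|---|
| Yield | t·ha−1 | MHRE | 0.78 | 0.59 | 6.78 | 0.45 | 2.12 |
| | | RF | 0.69 | 0.70 | 7.98 | 0.64 | 2.05 |
| | | CatBoost | 0.71 | 0.69 | 7.88 | 0.53 | 1.99 |
| | | XGBoost | 0.76 | 0.61 | 6.99 | 0.46 | 1.82 |
| | | LightGBM | 0.75 | 0.63 | 7.20 | 0.48 | 1.80 |
| ES | 10,000·spike·ha−1 | MHRE | 0.64 | 27.81 | 11.10 | 21.79 | 2.02 |
| | | RF | 0.60 | 30.28 | 12.03 | 23.59 | 1.87 |
| | | CatBoost | 0.63 | 30.87 | 12.15 | 24.05 | 1.85 |
| | | XGBoost | 0.64 | 29.37 | 11.64 | 22.58 | 1.93 |
| | | LightGBM | 0.63 | 29.55 | 11.76 | 22.96 | 1.91 |
| TNG | grains/panicle | MHRE | 0.61 | 24.40 | 14.13 | 18.40 | 1.59 |
| | | RF | 0.58 | 27.11 | 15.52 | 19.95 | 1.45 |
| | | CatBoost | 0.58 | 27.83 | 15.76 | 20.39 | 1.43 |
| | | XGBoost | 0.59 | 27.03 | 15.52 | 19.63 | 1.45 |
| | | LightGBM | 0.58 | 27.24 | 15.61 | 19.78 | 1.44 |
| NFG | grains/panicle | MHRE | 0.59 | 19.44 | 13.81 | 14.81 | 1.57 |
| | | RF | 0.54 | 22.75 | 15.99 | 16.64 | 1.35 |
| | | CatBoost | 0.56 | 21.77 | 15.36 | 16.19 | 1.41 |
| | | XGBoost | 0.57 | 21.49 | 15.12 | 15.81 | 1.43 |
| | | LightGBM | 0.56 | 22.06 | 15.51 | 16.17 | 1.40 |
| TSW | g | MHRE | 0.40 | 2.19 | 8.16 | 1.73 | 1.29 |
| | | RF | 0.31 | 2.42 | 9.02 | 1.91 | 1.17 |
| | | CatBoost | 0.31 | 2.37 | 8.80 | 1.88 | 1.19 |
| | | XGBoost | 0.28 | 2.41 | 8.95 | 1.90 | 1.17 |
| | | LightGBM | 0.32 | 2.34 | 8.69 | 1.88 | 1.21 |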
Bold values indicate the best-performing model for each trait on the test set, considering R2 together with RMSE, RRMSE, MAE, RPD.
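The metrics in Table 3 are not defined in this part of the article, so the following is a minimal sketch under common conventions rather than the authors' implementation. It assumes R2 is the coefficient of determination, RRMSE is RMSE expressed as a percentage of the observed mean, and RPD is the sample standard deviation of the observations divided by RMSE. The function name `regression_metrics` and the sample values are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, RMSE, RRMSE (%), MAE and RPD for paired observations.

    Assumed definitions (not confirmed by the source):
      RRMSE = 100 * RMSE / mean(observed)
      RPD   = sample SD(observed) / RMSE
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)                 # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)    # total sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    sd_obs = math.sqrt(ss_tot / (n - 1))                   # sample standard deviation
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": rmse,
        "RRMSE": 100 * rmse / mean_obs,
        "MAE": mae,
        "RPD": sd_obs / rmse,
    }


# Hypothetical yields in t/ha, for illustration only.
observed = [8.0, 9.0, 10.0, 7.0]
predicted = [8.5, 8.5, 9.5, 7.5]
metrics = regression_metrics(observed, predicted)
```

Under these assumed definitions, dividing the rice yield SD in Table 7 (1.25 t·ha−1) by the MHRE RMSE (0.59 t·ha−1) gives about 2.12, which agrees with the reported RPD, although test-set statistics may differ slightly from the full-sample values.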
Table 4. Comparing the performance of five regression algorithms for the prediction of major cotton traits.
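| Traits | Units | Regression Algorithm | R2 | RMSE | RRMSE | MAE | RPD |
|---|---|---|---|---|---|---|---|
| Seed-yield | t·ha−1 | MHRE | 0.82 | 0.33 | 8.99 | 0.25 | 2.30 |
| | | RF | 0.77 | 0.36 | 9.69 | 0.28 | 2.13 |
| | | CatBoost | 0.68 | 0.43 | 11.65 | 0.34 | 1.77 |
| | | XGBoost | 0.76 | 0.38 | 10.17 | 0.29 | 2.03 |
| | | LightGBM | 0.77 | 0.37 | 9.95 | 0.29 | 2.08 |
| NCB | bolls·plant−1 | MHRE | 0.93 | 2.27 | 10.11 | 1.66 | 3.79 |
| | | RF | 0.89 | 2.98 | 13.14 | 2.22 | 2.91 |
| | | CatBoost | 0.87 | 3.20 | 14.02 | 2.37 | 2.73 |
| | | XGBoost | 0.89 | 2.86 | 12.68 | 2.09 | 3.03 |
| | | LightGBM | 0.89 | 2.84 | 12.62 | 2.08 | 3.02 |
| SUI | index | MHRE | 0.41 | 10.64 | 7.39 | 8.30 | 1.30 |
| | | RF | 0.34 | 12.26 | 8.53 | 9.61 | 1.13 |
| | | CatBoost | 0.40 | 11.84 | 8.23 | 9.25 | 1.17 |
| | | XGBoost | 0.40 | 11.40 | 7.90 | 8.85 | 1.22 |
| | | LightGBM | 0.40 | 11.57 | 8.04 | 9.08 | 1.20 |
| Stem-height | cm | MHRE | 0.80 | 7.72 | 7.06 | 5.79 | 2.22 |
| | | RF | 0.80 | 7.97 | 8.57 | 7.44 | 1.83 |
| | | CatBoost | 0.74 | 8.99 | 8.20 | 7.08 | 1.91 |
| | | XGBoost | 0.79 | 8.05 | 7.40 | 6.22 | 2.12 |
| | | LightGBM | 0.79 | 8.10 | 7.44 | 6.39 | 2.10 |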
Bold values indicate the best-performing model for each trait on the test set, considering R2 together with RMSE, RRMSE, MAE, RPD.
Table 5. Pearson correlation analysis among rice traits.
| Traits | Yield | ES | TSW | NFG | TNG |
|---|---|---|---|---|---|
| Yield | 1 | | | | |
| ES | −0.08 | 1 | | | |
| TSW | 0.19 | −0.30 | 1 | | |
| NFG | 0.52 * | −0.54 * | −0.06 | 1 | |
| TNG | 0.44 * | −0.54 * | −0.03 | 0.87 * | 1 |
* indicates significance at the 0.05 level (p < 0.05).
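As an illustration of how a lower-triangular matrix such as Table 5 can be produced, the sketch below computes pairwise Pearson coefficients in plain Python. The helper names `pearson_r` and `lower_triangle` and the trait values are hypothetical. The significance test behind the asterisks is not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lower_triangle(columns):
    """Return {(row, col): r} for the diagonal and every cell below it."""
    names = list(columns)
    return {
        (names[i], names[j]): pearson_r(columns[names[i]], columns[names[j]])
        for i in range(len(names))
        for j in range(i + 1)
    }


# Made-up trait values for four plots, for illustration only.
traits = {
    "ES": [20.0, 18.0, 15.0, 14.0],
    "NFG": [100.0, 120.0, 140.0, 160.0],
    "TNG": [130.0, 150.0, 180.0, 200.0],
}
matrix = lower_triangle(traits)
```

Each key in `matrix` is a (row, column) pair, so `matrix[("NFG", "ES")]` corresponds to the NFG row and ES column of a table laid out like Table 5.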
Table 6. Pearson correlation analysis among cotton traits.
| Traits | Seed-Yield | NCB | SUI | Stem-Height |
|---|---|---|---|---|
| Seed-yield | 1 | | | |
| NCB | 0.32 * | 1 | | |
| SUI | 0.15 | 0.13 | 1 | |
| Stem-height | 0.18 | 0.55 * | 0.06 | 1 |
* indicates significance at the 0.05 level (p < 0.05).
Table 7. Descriptive statistics of observed rice and cotton traits used for model training and evaluation.
| Crop | Trait | Unit | n | Mean | SD | Min | Max | CV (%) |
|---|---|---|---|---|---|---|---|---|
| Rice | Yield | t·ha−1 | 13,950 | 8.76 | 1.25 | 5.17 | 12.36 | 14.35 |
| | ES | 10,000·spike·ha−1 | 13,798 | 15.69 | 3.53 | 7.9 | 25.3 | 22.47 |
| | TNG | grains/panicle | 13,781 | 172.59 | 38.82 | 69.5 | 280.8 | 22.49 |
| | NFG | grains/panicle | 13,430 | 140.77 | 30.46 | 59.2 | 224.6 | 21.64 |
| | TSW | g | 13,935 | 26.89 | 2.83 | 19.0 | 35.0 | 10.51 |
| Cotton | Seed-yield | t·ha−1 | 8026 | 3.71 | 0.77 | 1.61 | 5.80 | 20.68 |
| | NCB | Bolls·plant−1 | 8016 | 22.55 | 8.63 | 5.1 | 47.6 | 38.27 |
| | SUI | index | 7862 | 144.33 | 13.91 | 107.0 | 182.0 | 9.63 |
| | Stem-height | cm | 8012 | 109.38 | 17.12 | 60.1 | 155.9 | 15.66 |
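The CV (%) column in Table 7 appears to follow the usual definition of the coefficient of variation, the standard deviation as a percentage of the mean. That reading is an assumption, and the short sketch below checks it against two rows using the rounded mean and SD printed in the table. The function name `cv_percent` is hypothetical.

```python
def cv_percent(mean, sd):
    """Coefficient of variation in percent: 100 * SD / mean."""
    return 100.0 * sd / mean


# Mean and SD taken from Table 7 as printed (already rounded).
cv_ncb = cv_percent(22.55, 8.63)     # cotton bolls per plant
cv_tng = cv_percent(172.59, 38.82)   # rice total number of grains
```

Both values round to the tabulated figures (38.27 and 22.49). Some other rows differ in the second decimal place when recomputed this way, which would be consistent with the table's CV having been calculated from unrounded statistics.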
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Qin, Y.; Tauqir, M.; Yu, X.; Zheng, X.; Jiang, X.; Xu, N.; Zhang, J. Predicting Multiple Traits of Rice and Cotton Across Varieties and Regions Using Multi-Source Data and a Meta-Hybrid Regression Ensemble. Sensors 2026, 26, 375. https://doi.org/10.3390/s26020375