Abstract
Timely and accurate prediction of crop traits is critical for precision breeding and regional agricultural production. Previous studies have focused primarily on yield alone, neglecting other crop traits and variety-specific analyses. To address this gap, we employed a Meta-Hybrid Regression Ensemble (MHRE) approach that uses multiple machine learning (ML) models as base learners and integrates regional multi-year, multi-variety crop field trials with satellite remote sensing indices, meteorological data, and phenological data to predict major crop traits. Results demonstrated that MHRE performed best for both rice and cotton, significantly outperforming the individual models (RF, XGBoost, CatBoost, and LightGBM). Specifically, for rice, MHRE achieved the highest accuracy for yield (R² = 0.78, RMSE = 0.59 t ha⁻¹) compared with the best individual model (XGBoost: R² = 0.76, RMSE = 0.61 t ha⁻¹); other traits such as effective spikes also showed strong predictability (R² = 0.64, RMSE = 27.81 × 10⁴ spikes ha⁻¹). Similarly, for cotton, MHRE substantially improved yield prediction (R² = 0.82, RMSE = 0.33 t ha⁻¹) over the best individual model (RF: R² = 0.77, RMSE = 0.36 t ha⁻¹); accuracy was highest for bolls per plant (R² = 0.93, RMSE = 2.27 bolls plant⁻¹). Moreover, rigorous validation confirmed that the crop-specific MHRE models are robust across five rice and three cotton varietal groups and applicable across six distinct regions in China. Furthermore, we applied the SHapley Additive exPlanations (SHAP) method to analyze the growth stages and key environmental factors affecting the major traits. Our study illustrates a practical framework for regional-scale crop trait prediction that fuses multi-source data with ensemble machine learning, offering new insights for precision agriculture and crop management.
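For readers less familiar with the two accuracy metrics reported above, the following minimal sketch shows how the coefficient of determination and the root mean square error are conventionally computed from observed and predicted trait values. The function names and the sample yields are hypothetical and chosen only for illustration; they are not taken from the study, and the abstract does not state which implementation the authors used.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, expressed in the units of the trait."""
    n = len(observed)
    squared_errors = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_errors / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical yields in t/ha (illustrative values, not data from the study).
observed = [6.0, 7.0, 8.0, 9.0]
predicted = [6.5, 6.5, 8.5, 8.5]

print(f"RMSE = {rmse(observed, predicted):.2f} t/ha")
print(f"R2   = {r_squared(observed, predicted):.2f}")
```

Because RMSE carries the units of the trait, it can only be compared between models predicting the same trait, whereas R² is unitless and can be compared across traits such as yield and bolls per plant.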