1. Introduction
Foxtail millet (Setaria italica) plays an essential role in maintaining dietary diversity and nutritional security in China [1,2]. Its exceptional drought tolerance, broad adaptability, high nutritional value, and distinct flavor make it a key cereal in the semi-arid and rainfed regions of northern China [3,4]. Although traditional field methods have extensively explored the physiological traits of foxtail millet (e.g., photosynthetic efficiency, stress resistance, and yield formation) [5,6], these approaches often rely on limited datasets (e.g., single-site or single-season trials), falling short of providing comprehensive insights into millet performance across varied environmental conditions, particularly in remote or semi-arid areas with poor soils and scarce rainfall [7,8,9]. Consequently, bridging these knowledge gaps is crucial for improving the precision management, pest control, and genetic enhancement of foxtail millet [10,11], yet current methods still lack the capacity for large-scale or multi-year applications.
In northern and northwestern China, the cultivation of foxtail millet spans millions of hectares, both underpinning the livelihoods of numerous smallholders and contributing significantly to modernized agricultural production systems [8,9]. For farmers, the principal concern is maximizing yield while minimizing inputs such as fertilizer, irrigation, and labor costs. The timely acquisition of critical canopy indicators, including leaf water content, leaf area index (LAI), and leaf nutrient status (e.g., SPAD-derived chlorophyll), provides direct information on crop growth and final yield potential. Such data also guide precision irrigation and fertilization, thereby boosting resource use efficiency and lowering overall production expenditure. These benefits hold true for both large-scale operations and smallholder farms, highlighting the broader practical significance of advanced monitoring approaches [12,13].
Recent advances in unmanned aerial vehicle (UAV) remote sensing, notably multispectral platforms spanning blue, green, red-edge, and near-infrared bands, have opened up new avenues in high-throughput phenotyping [14,15,16,17]. Compared with traditional methods, UAV-based observations enable more frequent and extensive data collection on crop growth and biochemical attributes, significantly improving spatiotemporal coverage [15,18,19,20,21]. Nevertheless, predictive accuracy often diminishes when models are transferred to new sites or growing seasons, affected by soil variability, climatic fluctuations, and sensor calibration inconsistencies [22,23,24,25]. While mechanistic and hybrid models (e.g., PROSAIL, GREENLAB, and APSIM) demonstrate potential for cross-environment extrapolation [26,27,28], comprehensive assessments of model robustness in foxtail millet across diverse regions and years remain scarce [29,30]. Although previous research on wheat, rice, and maize has validated UAV multispectral approaches for estimating canopy traits such as leaf area index, chlorophyll content, and water status [14,15,19,31,32], transferability to different sites or seasons remains a significant obstacle [33,34,35,36,37]. This challenge becomes even more pronounced for foxtail millet, a relatively understudied crop requiring systematic advancements.
In this study, multi-temporal UAV imagery and ground-based measurements were collected over two consecutive years (2023, a normal precipitation year, and 2024, a severe drought year) from two experimental sites in the Jinzhong region of Shanxi Province, located approximately 50–60 km apart. Several modeling approaches, including regularized regression, tree-based ensemble methods, and neural networks, were evaluated to identify those that maintain a high prediction accuracy under cross-regional and cross-year conditions. Specifically, we aimed to (1) determine the accuracy of UAV-based multispectral sensing in the high-throughput monitoring of key foxtail millet canopy traits (i.e., leaf water content, SPAD-derived chlorophyll, and leaf area index [LAI]); (2) investigate the cross-regional predictive performances of these canopy phenotyping models; (3) assess the cross-year transferability of the resulting spectral prediction models and examine the influence of multi-site data fusion on model robustness; and (4) propose strategies for integrating mechanistic models or advanced data fusion techniques to further expand model applicability. By constructing a multi-environment modeling framework and conducting systematic validation, this study provides UAV remote-sensing-based support for the precision management and genetic improvement of foxtail millet in semi-arid and rainfed regions, while also offering a reference for large-scale phenotyping and cross-season adaptation in other minor cereals.
2. Materials and Methods
2.1. Description of the Study Area
The following two experimental sites were selected in Shanxi Province, China: the Yuci Lifang Experimental Station (37°51′ N, 112°45′ E) and the Shanxi Agricultural University Paotuan Experimental Station (37°25′ N, 112°36′ E), hereafter referred to as LF and PT, respectively. Located in a temperate continental semi-arid climate zone, the two sites lie approximately 60 km apart. The soils are classified as cinnamon soils (Calcaric Fluvisols), with an organic matter content of 1.4–1.6%. The region has an annual precipitation of 400–500 mm, an annual mean temperature of 9.5–10.8 °C, an annual sunshine duration of 2000–3000 h, and an annual evaporation of about 1500–2300 mm. The experimental fields lie at elevations of 800–900 m above sea level, with a frost-free period of 120–220 days and a moderate to relatively high soil fertility. Maize was planted as the previous crop at both stations, creating favorable residual conditions for foxtail millet cultivation.
A single-year field trial (May–October 2023) was conducted at the PT station, covering an area of 3100 m². Meanwhile, two consecutive years of field trials (May–October 2023 and May–October 2024) were carried out at the LF station, with a trial area of 2800 m². The two-year dataset from LF provided critical information for cross-year model validation, while the combined trials at both stations supported the construction and evaluation of cross-regional canopy monitoring models.
2.2. Field Experiment Design
The foxtail millet cultivar “Jingu 21” was selected for this study. Planting was carried out with a row spacing of 25 cm and plant spacing of 10 cm, in accordance with local standard production practices. Water and fertilizer management, as well as pest and disease control, followed standard agronomic protocols to ensure normal crop growth.
Observations covered key growth stages, including seedling emergence, jointing, heading, grain filling, and maturity. During each growing season in 2023 and 2024, measurements were conducted approximately eight times at regular intervals. For each measurement, six representative quadrats (each 50 cm × 50 cm) were randomly chosen in the field. Within each quadrat, 6–9 millet plants were selected, and their positions were recorded using a high-precision M9 GPS (manufactured by Shanghai Huace Navigation Technology Co., Ltd., Shanghai, China) to ensure accurate correspondence between the spectral data and actual phenotypic measurements. For each selected plant, measurements of leaf moisture content, SPAD chlorophyll index, and leaf area index (LAI) were taken. At the end of the experiment, 200 valid datasets had been obtained from each of the three site-years (PT 2023, LF 2023, and LF 2024), giving a total of 600 valid spectral datasets paired with manually measured phenotypic data on millet plants.
2.3. UAV-Based Multispectral Data Acquisitions
A UAV platform (DJI Mavic 3 Multispectral, manufactured by Shenzhen DJI Technology Co., Ltd., Shenzhen, China) equipped with a 4/3-inch visible CMOS sensor and four multispectral CMOS sensors was employed to acquire imagery in the following four key bands: red (650 nm, 16 nm bandwidth), green (560 nm, 16 nm bandwidth), red-edge (730 nm, 16 nm bandwidth), and near-infrared (860 nm, 26 nm bandwidth). The flight altitude was set at 65 m, with forward and side overlaps of 70% and 80%, respectively, to ensure comprehensive field coverage and high-resolution data acquisition. All flights were conducted between 9:00 AM and 11:00 AM under clear, low-wind conditions to minimize variations in illumination.
Before and after each flight, images of a gray reference board (approximately 0.3 reflectance) and a white reference board (approximately 0.5 reflectance) were captured under similar lighting conditions to determine the reference reflectance for each spectral band. The gain and offset for each band were then calculated based on these calibration images, and pixel-wise radiometric corrections were applied to align the raw images with the reference reflectance. By comparing calibration data collected from multiple flights on the same day and on different dates, consistency was maintained across diverse regions and years.
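The gain and offset calculation described above can be illustrated with a two-point linear fit. This is a minimal sketch that assumes a linear relation between raw digital numbers and reflectance and uses the nominal panel reflectances (0.3 and 0.5); the function names and the digital numbers are hypothetical and are not taken from the study.

```python
def fit_gain_offset(dn_gray, dn_white, rho_gray=0.3, rho_white=0.5):
    """Solve reflectance = gain * DN + offset from two reference panels."""
    if dn_white == dn_gray:
        raise ValueError("panel digital numbers must differ")
    gain = (rho_white - rho_gray) / (dn_white - dn_gray)
    offset = rho_gray - gain * dn_gray
    return gain, offset


def to_reflectance(pixels, gain, offset):
    """Apply the per-band linear correction to every pixel value."""
    return [gain * dn + offset for dn in pixels]


# Hypothetical mean digital numbers measured over the two panels in one band.
gain, offset = fit_gain_offset(dn_gray=12000.0, dn_white=20000.0)
band_reflectance = to_reflectance([8000.0, 12000.0, 16000.0, 20000.0], gain, offset)
```

In practice the fit would be repeated for each of the four bands and for each flight, since illumination changes between flights alter the gain and offset.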
To further reduce the impact of environmental light fluctuations, cloud interference, and sensor parameter drift, the raw multispectral images underwent radiometric calibration and Z-score normalization. This process yielded calibrated reflectance data that more closely represented the crop’s intrinsic (i.e., “true”) spectral characteristics, thereby improving the accuracy with which subsequent models captured the crop’s physiological status and ensuring a reliable basis for comparison with ground-based measurements.
Finally, the raw multispectral images were processed in DJI Terra (developed by Shenzhen DJI Technology Co., Ltd., Shenzhen, China) to perform image mosaicking, geometric distortion correction, and orthorectification. By incorporating ground control points (GCPs) or using RTK-GPS assistance, the planar positioning error of the orthomosaic was limited to within 1–2 pixels.
2.4. Ground Truthing and Phenotyping
To obtain accurate phenotypic data for the millet plants during the growth period and to align these measurements with UAV remote sensing information, the following major canopy parameters were measured in the field:
- (1)
Leaf Area Index (LAI)
LAI was measured using an LAI-2200C canopy analyzer (LI-COR, Inc., Lincoln, NE, USA) or a comparable scanning method. Plant density or ground cover was considered when calculating the LAI per unit area, which reflects both the crop’s growth status and photosynthetic potential.
- (2)
Chlorophyll Content (SPAD)
A portable chlorophyll meter (CM 1000 Chlorophyll Meter, Spectrum Technologies, Inc., Aurora, IL, USA) was used to measure the top four functional leaves from each selected millet plant. Each measurement was repeated 3–5 times, and the average value was recorded. The SPAD readings indicated the chlorophyll content of the leaves and could be used to assess the plant’s photosynthetic capacity.
- (3)
Canopy Leaf Moisture Content (CLMC)
Simultaneously, the top four functional leaves from each selected millet plant were sampled and immediately sealed in plastic bags. In the laboratory, the fresh weight (Wf) was measured, after which the leaves were placed in an oven at 105 °C for 30 min, then dried at 80 °C until a constant weight (Wd) was achieved. Leaf moisture content was calculated using Equation (1), as follows:
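Equation (1) is not reproduced in this text. Assuming the conventional fresh-weight basis, which matches the definitions of Wf and Wd given above but is an assumption rather than a confirmed transcription, it would take the following form:

```latex
\mathrm{CLMC}\,(\%) = \frac{W_f - W_d}{W_f} \times 100
```

Here W_f is the fresh weight and W_d the constant dry weight of the sampled leaves. Omitting the factor of 100 gives the same quantity as a fraction.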
2.5. Data Preprocessing and Vegetation Indices
After radiometric calibration and orthorectification, pixel-level reflectance values were extracted from the four original bands (green, red, red-edge, and near-infrared). Eleven common vegetation indices (Table 1) were then calculated to capture variations in the crop chlorophyll content, nitrogen status, and canopy structure.
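As an illustration of how such indices are derived from band reflectance, the sketch below computes six of the indices named later in the text (NDVI, RDVI, RVI, DVI, SAVI, and OSAVI). The formulas are the commonly published definitions and are assumed to match Table 1, which is not reproduced here; the soil adjustment factor of 0.5 for SAVI and the example reflectances are likewise assumptions.

```python
import math


def vegetation_indices(nir, red, soil_l=0.5):
    """Return several common indices from NIR and red reflectance (0-1 scale)."""
    return {
        "NDVI": (nir - red) / (nir + red),
        "RDVI": (nir - red) / math.sqrt(nir + red),
        "RVI": nir / red,
        "DVI": nir - red,
        # soil_l is the soil adjustment factor; 0.5 is the usual default.
        "SAVI": (1 + soil_l) * (nir - red) / (nir + red + soil_l),
        # OSAVI uses a fixed adjustment of 0.16.
        "OSAVI": (nir - red) / (nir + red + 0.16),
    }


# Hypothetical calibrated reflectances for one canopy pixel.
indices = vegetation_indices(nir=0.5, red=0.1)
```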
A total of 15 input variables—including the 4 multispectral bands plus 11 vegetation indices—were ultimately compiled. Each variable was standardized using the Z-score method to reduce dimensional disparities and improve model stability.
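The Z-score step can be sketched as a column-wise standardization. The example below is illustrative only: it uses the population standard deviation (the study does not state which variant was used), and the two-feature matrix is hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one feature to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def standardize_columns(rows):
    """Apply the Z-score to each column of a row-major feature matrix."""
    columns = list(zip(*rows))
    scaled_columns = [zscore(col) for col in columns]
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical samples with two features on very different scales.
features = [[0.1, 2.0], [0.2, 4.0], [0.3, 6.0]]
scaled = standardize_columns(features)
```

When a model is transferred to another site or year, one reasonable choice is to compute the mean and standard deviation on the training data only and reuse them for the target data, so that no target-set statistics leak into training.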
Table 1 presents the formulas and references for the 11 vegetation indices employed in this study.
2.6. Model Construction and Evaluation Metrics
In this study, the following three types of models were selected: linear and regularized regression, tree-based models, and neural networks. Linear and regularized regression included Lasso regression and Ridge regression, both of which have a low computational cost and are straightforward to interpret [20,36]. To determine the optimal regularization parameters (e.g., α for Ridge and Lasso), we performed a grid search over a predefined set of values (e.g., 0.01, 0.1, and 1.0) combined with 5-fold cross-validation, selecting the setting that minimized the validation RMSE. The tree-based models included Decision Tree, Random Forest, XGBoost, and LightGBM, which can capture nonlinear features and are easily parallelized [16,31]. For these algorithms, key hyperparameters such as maximum tree depth, number of trees, and learning rate (for boosting models) were tuned via a grid search and cross-validation. For instance, we tested max_depth from 4 to 10 (in increments of 2), learning_rate values of {0.01, 0.05, 0.1}, and n_estimators of {100, 300, 500}. We then selected the final configuration based on minimizing the RMSE and MRE on the validation set. Neural networks primarily used a Multilayer Perceptron (MLP) architecture. In this study, we adopted two hidden layers, each with 64 neurons, using the ReLU activation function and an Adam optimizer [19]. The batch size (32 or 64) and dropout rate (0.2 or 0.5) were chosen by comparing validation errors under multiple runs, to limit overfitting on smaller datasets.
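The grid search with 5-fold cross-validation can be illustrated with a deliberately small example. The sketch below uses a one-feature ridge regression with a closed-form solution as a stand-in for the 15-feature models in the study; the interleaved fold assignment, the synthetic data, and all function names are assumptions made for illustration.

```python
import math


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge fit for one centered feature; returns (slope, intercept)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)  # alpha shrinks the slope toward zero
    return slope, y_bar - slope * x_bar


def rmse(observed, predicted):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def cv_rmse(xs, ys, alpha, k=5):
    """Mean validation RMSE over k interleaved folds."""
    n = len(xs)
    scores = []
    for fold in range(k):
        val_idx = set(range(fold, n, k))
        x_tr = [x for i, x in enumerate(xs) if i not in val_idx]
        y_tr = [y for i, y in enumerate(ys) if i not in val_idx]
        x_va = [x for i, x in enumerate(xs) if i in val_idx]
        y_va = [y for i, y in enumerate(ys) if i in val_idx]
        slope, intercept = fit_ridge_1d(x_tr, y_tr, alpha)
        scores.append(rmse(y_va, [slope * x + intercept for x in x_va]))
    return sum(scores) / k


def grid_search(xs, ys, alphas=(0.01, 0.1, 1.0), k=5):
    """Return the alpha with the lowest cross-validated RMSE and the full table."""
    results = {alpha: cv_rmse(xs, ys, alpha, k) for alpha in alphas}
    best = min(results, key=results.get)
    return best, results


# Synthetic noise-free data, so the weakest penalty should win.
xs = [i / 10 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
best_alpha, cv_table = grid_search(xs, ys)
```

The same loop extends to the tree-based grid: the stated ranges (max_depth in {4, 6, 8, 10}, three learning rates, and three values of n_estimators) give 36 candidate configurations per boosting model.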
The coefficient of determination (R²) quantifies how well the model fits observed data, with values approaching 1 indicating a stronger explanatory power. Mean Relative Error (MRE) and Maximum Relative Error (MaxRE) represent the average and maximum deviation between predicted and observed values, respectively. The Root Mean Square Error (RMSE) measures how closely predictions conform to actual values (a lower RMSE indicates a higher predictive accuracy). Additionally, 1:1 scatter plots provide a direct comparison between predicted and observed outcomes, while cumulative error distribution plots illustrate the distribution of errors over a range of values. By leveraging these metrics, we systematically assessed both the accuracy and robustness of the models for canopy traits such as CLMC, SPAD, and LAI across diverse environments and growing seasons, addressing the need for broad spatial and temporal extrapolation.
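The four metrics can be computed as in the following sketch. The relative errors are assumed to be absolute errors divided by the observed value and expressed in percent; the text does not give explicit formulas, so these definitions and the sample values are assumptions.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, RMSE, MRE (%), and MaxRE (%) for paired observations."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # Relative error per sample, in percent of the observed value.
    rel_err = [abs(p - o) / abs(o) * 100 for o, p in zip(observed, predicted)]
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MRE": sum(rel_err) / n,
        "MaxRE": max(rel_err),
    }


# Hypothetical LAI observations and predictions.
metrics = regression_metrics([2.0, 4.0, 5.0], [2.2, 3.8, 5.0])
```

Because MRE and MaxRE divide by the observed value, they become unstable when observations approach zero, which is one reason to report RMSE alongside them.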
2.7. Cross-Location and Cross-Year Experimental Scheme
To thoroughly evaluate the models’ spatial extrapolation capabilities and temporal robustness, the following multi-level experiments and validation strategies were adopted:
- (1)
Single-Location Modeling
Models were independently trained and evaluated using data from Yuci Lifang (2023), Paotuan (2023), and Yuci Lifang (2024), respectively, to assess their performance under site-specific conditions.
- (2)
Cross-Location Extrapolation
A model trained on the 2023 data from the Yuci Lifang site was validated on the 2023 data from the Paotuan site (or vice versa) to evaluate the model’s transferability between different geographic locations.
- (3)
Cross-Year Extrapolation
The 2023 data from the Yuci Lifang site were used for training and validated on the 2024 data from the same site, assessing model robustness across different years in the same region. Alternatively, combined data from Yuci 2023 and Paotuan 2023 were used to train the model and validated on Yuci 2024, allowing for a comparison of the predictive improvements gained by data fusion.
- (4)
Multi-Location and Multi-Year Fusion Modeling
Data from multiple sites and different years (e.g., Paotuan 2023 + Yuci 2023 + Yuci 2024) were merged and uniformly radiometrically corrected to build a “universal model.” Independent tests or cross-validation on each subset were then conducted to examine the improvements in model generality and stability contributed by data fusion.
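The four validation levels above can be written down as explicit train/test pairings, which makes it easy to see where the test set also appears in the training pool. This is an illustrative sketch; the dataset labels and the helper name are hypothetical.

```python
DATASETS = ("PT2023", "LF2023", "LF2024")

# Each entry is (training datasets, test datasets).
SCHEMES = {
    "single_location": [((d,), (d,)) for d in DATASETS],
    "cross_location": [
        (("LF2023",), ("PT2023",)),
        (("PT2023",), ("LF2023",)),
    ],
    "cross_year": [
        (("LF2023",), ("LF2024",)),
        (("LF2023", "PT2023"), ("LF2024",)),
    ],
    "fusion": [(DATASETS, (d,)) for d in DATASETS],
}


def shares_source(train, test):
    """True when a test dataset is also part of the training pool."""
    return any(name in train for name in test)


# Pairings that need a sample-level split to keep validation independent.
needs_sample_split = [
    (scheme, train, test)
    for scheme, pairs in SCHEMES.items()
    for train, test in pairs
    if shares_source(train, test)
]
```

For the pairings flagged by `shares_source`, training and validation samples would have to be separated within the shared dataset (for example by a hold-out split) for the reported accuracy to reflect unseen data.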
3. Results
3.1. Consistency and Calibration Effect of Multispectral Data
Figure 1 depicts the multispectral reflectance characteristics of the millet canopy across four key growth stages, ranging from 30 days after emergence (jointing) to 120 days (maturity). The figure compares the following three types of spectra: (1) raw data (in red), (2) data corrected against a gray card (≈0.3 reflectance) and a white card (≈0.5 reflectance) (in blue), and (3) data normalized using Z-score standardization (in green). Between 30 and 50 days, the green band (reflectance roughly 0.2–0.5) exhibited pronounced variability due to factors such as ambient light intensity, cloud cover, and UAV altitude, complicating a stable representation of crop physiology.
Once calibration was applied, all four bands displayed smoother reflectance curves and marked reductions in external illumination and atmospheric interference. For example, the green band (Figure 1A) steadily declined from days 30 to 60, consistent with rising chlorophyll levels and canopy coverage, whereas the near-infrared band (Figure 1B) climbed from about 0.8 to 1.5, mirroring rapid canopy expansion. From days 70 to 120, reflectance decreased in all bands, reflecting typical senescence-related spectral patterns and a declining water content.
Z-score normalization further brought the multispectral values largely within the range of [−2, 2], greatly enhancing cross-stage and cross-site comparability. In the red band (Figure 1C), reflectance declined from days 30 to 60 but rebounded between days 70 and 100, aligning with leaf senescence and chlorophyll degradation. Similarly, the red-edge band (Figure 1D), which is highly sensitive to changes in chlorophyll activity and canopy structure, remained relatively stable from days 30 to 60, yet declined sharply from days 70 to 100. This normalization significantly mitigated spatiotemporal variability and highlighted the dynamic spectral changes over the crop’s life cycle.
Thus, the smoothed spectral signatures (Figure 1) confirm that radiometric calibration and Z-score normalization effectively reduced environmental noise, allowing the inherent canopy reflectance characteristics of the millet to become more apparent. These preprocessed data, therefore, more accurately approximate the “true” reflectance, serving as a robust foundation for subsequent ground validation and model extrapolation.
Overall, the green and red bands exhibited relatively stable fluctuations, driven primarily by chlorophyll absorption and photosynthetic activity, whereas the red-edge and near-infrared bands were more sensitive to changes in canopy structure and biomass—particularly between days 60 and 100. By applying rigorous calibration and normalization, environmental disturbances and UAV parameter fluctuations were substantially minimized, facilitating the precise delineation of the millet canopy’s spectral properties at each growth stage. These steps are instrumental in boosting both model accuracy and extrapolation capacity.
3.2. Importance of Spectral Features and Their Effects on Phenotypic Parameters
In this study, we constructed a Random Forest model to predict three canopy traits—leaf water content (Y1), SPAD (Y2), and leaf area index (Y3)—using 4 multispectral bands (X1–X4) plus 11 derived vegetation indices (X5–X15), forming a total of 15 spectral features. To elucidate the contributions and interactions of these inputs, we employed SHAP (SHapley Additive exPlanations) to interpret the Random Forest predictions.
Figure 2 presents SHAP summary plots for Y1 (Figure 2A), Y2 (Figure 2B), and Y3 (Figure 2C). Larger absolute SHAP values denote stronger feature impacts, whereas the SHAP value’s sign (positive or negative) indicates whether the feature exerts a favorable or adverse effect on predictions.
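To make the meaning of sign and magnitude concrete, the sketch below computes exact Shapley values by brute force for a toy three-feature function. It is not the algorithm used by SHAP for tree ensembles, and it replaces "absent" features with a single baseline value rather than a background sample; the toy model, instance, and baseline are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley attribution of predict(instance) relative to predict(baseline)."""
    n = len(instance)

    def value(subset):
        # Features in the subset take the instance value, the rest the baseline.
        mixed = [instance[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi


def toy_model(x):
    """Toy response with one positive, one negative, and one interaction term."""
    return 0.5 * x[0] - 0.2 * x[1] + 0.1 * x[0] * x[2]


instance = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline. The second feature receives a negative value because increasing it lowers the toy prediction, and the interaction term is shared equally between the first and third features.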
According to Figure 2A, X10 (SAVI) is the most critical feature for leaf water content (Y1). High X10 values (red-colored points) correspond to largely positive SHAP values, implying that increases in SAVI have a generally positive effect on Y1. Following SAVI, X12 (WDRVI) and X9 (RVI) rank next in importance, both showing wide SHAP spreads on the positive and negative ends, indicating notable nonlinear interactions with Y1. Other variables, such as X5 (NDVI) and X13 (TVI), also exhibit a moderate to high importance. In contrast, X6 (RDVI) and X14 (DVI) have smaller SHAP ranges, suggesting minimal impacts on Y1 and offering possible avenues for feature reduction in practical applications.
For SPAD (Y2), Figure 2B reveals that X10 (SAVI) again ranks highly, but X13 (TVI) and X15 (OSAVI) also stand out, underscoring the relevance of red-edge and near-infrared indices in estimating chlorophyll content. Meanwhile, X2 (NIR) and X6 (RDVI) exhibit bipolar SHAP distributions, implying more complex, nonlinear correlations with SPAD. Conversely, X11 (NDGI) and X14 (DVI) contribute less overall, though they still fine-tune predictive accuracy.
For LAI (Y3), Figure 2C highlights X15 (OSAVI) as having the largest SHAP magnitude, reflecting its strong predictive power. The next most important features, X9 (RVI) and X7 (NLI), also show wide SHAP spreads, illustrating significant nonlinear effects on LAI. While higher RVI or NLI values often yield positive SHAP effects, certain subsets of the data indicate negative influences. X12 (WDRVI) and X5 (NDVI) are likewise influential, whereas X2 (NIR) and X11 (NDGI) remain less significant, contributing only in specific scenarios.
In summary, the 15 spectral features studied demonstrate complex and nonlinear interactions with Y1, Y2, and Y3. X10 (SAVI) is particularly influential for leaf water content and SPAD, while X15 (OSAVI) proves critical for LAI. Other indices (e.g., WDRVI, RVI, NDVI, and TVI) also offer substantial contributions, but vary by target trait. These findings suggest that feature selection and modeling approaches should be tailored to specific phenotypic goals. SHAP-based analysis uncovers intricate positive and negative relationships often overlooked by purely linear methods. By combining Random Forest modeling with SHAP interpretability, our approach offers deeper insights into the roles of multispectral and vegetation index features in foxtail millet canopies. Although individual feature importance varies, the collective use of multiple spectral inputs robustly enhances the predictive accuracy for Y1, Y2, and Y3, highlighting promising directions for high-throughput phenotyping and precision agriculture.
3.3. Model Construction and Evaluation Under Different Datasets
Using comprehensively radiometrically corrected and normalized UAV data—alongside 11 widely employed vegetation indices—various regression models (linear/regularized), tree-based models (e.g., Random Forest and Gradient Boosting), and a Multilayer Perceptron (MLP) architecture were tested. We categorized these models according to cross-regional, cross-year, and data fusion strategies to evaluate the following three key canopy traits in foxtail millet: leaf moisture content (CLMC), SPAD-based chlorophyll content (SPAD), and leaf area index (LAI).
3.3.1. Modeling Results for LF Single-Region Data in 2023
Table 2 presents the evaluation results for the 2023 Yuci Lifang (LF) site. For CLMC, Random Forest (RF) achieved R² = 0.852 (training) and 0.607 (validation), with mean relative errors (MRE) of 3.981% and 7.194%, respectively. This underscores RF’s strong capability for fitting nonlinear relationships. Ridge regression ranked second (validation R² = 0.491) while offering a balance of feature constraints and interpretability.
For SPAD, RF again performed the best (R² = 0.946/0.912), with an 11.746% MRE in validation and an acceptable maximum relative error (MaxRE). Gradient Boosting (GB) placed second (R² = 0.932/0.902) and showed an excellent learning capacity (low training MRE), though its validation RMSE was slightly higher than RF’s.
For LAI, both Ridge and GB excelled. Ridge (R² = 0.758/0.864) had MREs of 11.258%/8.388%, while GB reached a high training R² (0.948) but a lower validation R² (0.806). Both models effectively captured canopy structure. Overall, the LF 2023 dataset demonstrated that RF had a higher accuracy for CLMC and SPAD, while Ridge/GB were competitive for LAI. These results confirm that stringent spectral correction and vegetation index selection enable robust trait estimation.
3.3.2. Modeling Results for PT Single-Region Data in 2023
Table 3 presents the modeling outcomes for the 2023 Paotuan (PT) dataset. For canopy leaf moisture content (CLMC), Gradient Boosting (GB) achieved the highest R² values (0.944 for training and 0.512 for validation), highlighting its capacity to handle nonlinear interactions, albeit with a substantially lower validation R². Ridge regression produced a similar validation R² (0.482), but yielded a slightly higher MaxRE (31.342%).
In predicting SPAD, GB again led (R² = 0.981/0.866) with an MRE of around 9.810%, effectively capturing chlorophyll dynamics. Lasso regression ranked second but exhibited larger validation errors. These results underscore the strengths of tree-based models in modeling physiological traits such as CLMC and SPAD.
For LAI, the Multilayer Perceptron (MLP) architecture stood out (R² = 0.921/0.785), offering a validation MRE of 14.432% and an acceptable MaxRE of 41.651%. However, MLP models can be prone to overfitting when the dataset size is limited or when hyperparameter tuning is inadequate. Overall, the results from the 2023 PT site indicate that GB and MLP excelled in capturing nonlinear features, while Ridge and Lasso provided a better interpretability but proved less robust to extreme samples.
3.3.3. Modeling Results for LF Single-Region Data in 2024
Compared to 2023, the 2024 LF dataset (Table 4) showed a notably improved accuracy for Gradient Boosting (GB) and Random Forest (RF). For CLMC, both exceeded 0.98 in terms of training R², with a validation R² of around 0.458–0.513 and low MRE values (e.g., 3.912% for GB). For SPAD, GB again dominated (R² = 0.983/0.956), followed by RF (0.957/0.923). Extended growth-stage sampling likely stabilized model performance.
For LAI, GB reached R² = 0.998 (training) and 0.972 (validation), with a validation MRE of only 4.234%. RF also performed well (R² = 0.989/0.952). Despite severe drought, more comprehensive sampling appeared to mitigate environmental variability. These findings confirm that a combination of multiple vegetation indices and broader sampling supports a consistently high accuracy in key canopy traits, even under harsh conditions.
3.3.4. Model Construction and Evaluation Under Integrated Dataset
Building on the single-location, single-year analyses, we combined the datasets from PT 2023 (A), LF 2023 (B), and LF 2024 (C) in various ways (A + B, A + C, B + C, and A + B + C). Table 5 summarizes the predictive performances for CLMC, SPAD, and LAI under these fusion scenarios.
Overall, merging the datasets generally elevated the validation R² values and reduced MRE, particularly in Gradient Boosting (GB) and Random Forest (RF). For example, in A + C, GB reached training/validation R² values of 0.994/0.853, with an MRE of ~3.904%. SPAD predictions often exceeded 0.93 in validation R² after fusing multi-year or multi-site data, suggesting an enhanced adaptability to chlorophyll variability. Although LAI predictions were somewhat more variable, they still demonstrated gains under certain fusion strategies (e.g., A + B with GB). These results indicate that multi-source data fusion generally bolsters model robustness, highlighting the advantages of diverse environmental inputs for training.
3.4. Cross-Regional and Cross-Year Validation and Evaluation of the Model
3.4.1. Cross-Regional Model Validation and Evaluation in the Same Year
This section explores how models trained at one site performed when applied to another site within the same year. By comparing the top-performing models from the 2023 LF (Yuci Lifang) and 2023 PT (Paotuan) datasets, we assessed cross-site transferability via validation on their respective datasets (Table 6, Figure 3, Figure 4 and Figure 5).
When the 2023 LF-trained model was extrapolated to the 2023 PT dataset, CLMC predictions (Figure 3) achieved R² = 0.502, MRE = 13.55%, and MaxRE = 28.05% (RMSE 0.118). Conversely, models trained on PT 2023 and tested on LF gave an R² of 0.435 but a lower MRE (6.66%), indicating that local environmental factors strongly influenced accuracy, yet the overall performance remained acceptable.
For SPAD (Figure 4), the LF-trained model achieved an R² of approximately 0.597 (MRE 14.96%) on PT, whereas the PT-trained model attained R² = 0.831 on LF but exhibited a higher MRE (21.04%). Although outliers were evident, errors tended to cluster in a manageable range, suggesting some practical utility.
For LAI (Figure 5), the LF-based model produced R² = 0.577 (MRE 18.76%) when applied to PT, whereas PT → LF gave R² = 0.584 (MRE 15.57%). The largest discrepancies occurred at high LAI values or under extreme conditions, reflecting moderate environmental influences. Generally, predictions fell within a viable error range.
A comprehensive review of Figure 3, Figure 4 and Figure 5 yields three major insights. First, the models demonstrated feasible cross-site extrapolation for CLMC, SPAD, and LAI within the same year, with most points scattered near the 1:1 line. Second, soil characteristics, local microclimate, and agronomic management predominantly drove prediction variability, especially under high nitrogen levels or at extreme LAI values. Third, CLMC exhibited a more balanced transferability between LF → PT and PT → LF, whereas SPAD and LAI experienced more significant error dispersion, implying that traits linked to local conditions may require additional calibration.
In summary, the 2023 LF-to-PT and PT-to-LF validation confirmed that rigorous spectral calibration, normalization, and judicious feature selection enable a notable extrapolation capacity. Although soil, climate, and management differences contributed to errors, the models still achieved a respectable accuracy for key canopy traits. Future efforts should incorporate broader, multi-region datasets spanning multiple seasons to further improve robustness.
3.4.2. Cross-Year Model Validation and Evaluation for the Following Year
Here, we examine how models trained on the 2023 dataset performed when predicting 2024, evaluating temporal extrapolation. We also investigate whether multi-source data fusion (e.g., combining multi-regional, multi-year samples) enhanced the accuracy for 2024. Table 7 and Figure 6, Figure 7 and Figure 8 summarize these results.
For CLMC (Table 7, Figure 6), using only the 2023 LF dataset yielded R² = 0.464 (MRE = 8.06%, MaxRE ≈ 20.69%, and RMSE = 0.074) when tested on 2024 LF, implying a partial temporal transferability but also biases stemming from weather and management differences. After fusing the data from 2023 LF and 2023 PT, R² improved to 0.603 (MRE = 5.17%), indicating that multi-regional data helped to capture leaf moisture variability. Merging data from 2023 LF and 2024 LF instead gave R² = 0.547 (MRE ≈ 6.19%), which is above the single-year baseline but below the multi-regional fusion, suggesting that direct familiarity with the target year also benefited predictive stability.
For SPAD (Figure 7), training exclusively on LF 2023 resulted in R² = 0.514 (MRE = 4.21%) on LF 2024, with a MaxRE of 24.72%. Incorporating PT 2023 data elevated R² to 0.658, although extreme values caused a higher MaxRE (59.72%). Adding partial 2024 LF data improved R² to 0.971 (MRE ≈ 1.02%), illustrating that target-year information from the same site could greatly enhance predictive accuracy, although caution is warranted to avoid overlap between training and validation samples.
Regarding LAI (Figure 8), the baseline 2023 LF → 2024 LF model achieved R² = 0.583 (MRE = 18.79%), with errors intensifying at high LAI levels. Including PT data raised R² to 0.849 (MRE = 9.80%). Incorporating 2024 LF samples further boosted R² to 0.937, emphasizing once more that multi-environment data can mitigate extrapolation risks.
Although 2023 had normal precipitation and 2024 was marked by severe drought, the models trained on 2023 LF data alone retained moderate accuracy in 2024 (R² = 0.464–0.583), and the fused models performed considerably better, underscoring the importance of spectral calibration, normalization, and feature selection. These findings suggest that augmenting datasets with additional temporal and environmental heterogeneity can further extend model generalizability.
3.4.3. Model Validation and Evaluation Using Combined Year and Regional Datasets
Building on Sections 3.4.1 and 3.4.2, we next examine how integrating multi-year and multi-regional data influences model construction and extrapolation, validated against the independent 2024 LF dataset. Table 8 and Figure 9 summarize these outcomes.
When data from 2023 and 2024 (including LF and PT) were merged, the model’s CLMC predictions for 2024 LF attained R² = 0.983, MRE ≈ 0.92%, and an RMSE of 0.014 (Figure 9A,B), with most errors confined to ±2%, indicating an exceptionally high extrapolation accuracy. For SPAD (Figure 9C,D), R² reached 0.947 (MRE = 1.85% and RMSE ≈ 7.32), notably reducing errors relative to single-year or single-region training. LAI predictions (Figure 9E,F) scored an R² of 0.829 (MRE ≈ 20.98% and RMSE = 0.589), although the maximum errors remained high (69.06%), implying a need for additional calibration at extremely high LAI values or under extreme conditions.
Collectively, multi-year and multi-region fusion generally improved model reliability and precision. Two key factors explain these gains: (1) broader source data, which encompass a greater range of climates, management practices, and genetic variations and thus allow models to “learn” more versatile spectral–phenotypic relationships, and (2) direct coverage of target features, whereby incorporating data from the target site/year aligns training more closely with actual prediction conditions. Nevertheless, predicting LAI under severe drought or unusually dense canopies remains challenging, indicating that further adaptation is required.
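The second factor, coverage of target conditions, can be illustrated with a deliberately simple toy example. The sketch below uses synthetic, noise-free numbers and a single-feature least-squares line as a stand-in for the study's models; the site offsets, sample values, and function names are all hypothetical. It compares a model fitted on one source site with a model fitted on the source site plus a few target-site calibration samples, both scored on held-out target samples.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(x, y, model):
    """Root-mean-square error of a fitted line on the pairs (x, y)."""
    slope, intercept = model
    sq = [(slope * xi + intercept - yi) ** 2 for xi, yi in zip(x, y)]
    return math.sqrt(sum(sq) / len(sq))

# Synthetic source site: trait = 1.0 + 2.0 * index (values are made up).
src_x = [0.1, 0.2, 0.3, 0.4, 0.5]
src_y = [1.0 + 2.0 * v for v in src_x]

# Synthetic target site with a shifted intercept: trait = 1.5 + 2.0 * index.
tgt_cal_x = [0.2, 0.4, 0.6]            # target samples added to training
tgt_cal_y = [1.5 + 2.0 * v for v in tgt_cal_x]
tgt_val_x = [0.3, 0.5, 0.7]            # held-out target samples
tgt_val_y = [1.5 + 2.0 * v for v in tgt_val_x]

baseline_model = fit_line(src_x, src_y)
fused_model = fit_line(src_x + tgt_cal_x, src_y + tgt_cal_y)

baseline_rmse = rmse(tgt_val_x, tgt_val_y, baseline_model)
fused_rmse = rmse(tgt_val_x, tgt_val_y, fused_model)
```

In this toy setting the source-only model inherits the full intercept shift between sites as bias, whereas the fused model absorbs part of it, so `fused_rmse` is smaller than `baseline_rmse`. Real canopy data are noisier and multivariate, so the size of any gain will differ.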
Overall, the cross-regional and cross-year assessments in Section 3.4 highlight that meticulous radiometric calibration, normalization, and multi-algorithm integration (including linear, regularized, tree-based, and neural network models) yielded strong spatial and temporal extrapolation capabilities. Models trained on multi-year, multi-region datasets displayed a notably improved performance for target sites and years, demonstrating robust generalization. Future efforts to gather more extensive temporal series and geographically diverse samples, potentially enriched by high-dimensional environmental and management variables, will further refine these models, providing a solid technical foundation for the large-scale, dynamic monitoring and precision management of foxtail millet.
5. Conclusions
This study deployed UAV-based multispectral imaging to monitor three key canopy traits—leaf moisture content (CLMC), SPAD, and leaf area index (LAI)—in foxtail millet (Setaria italica L.) at two experimental sites (LF and PT, approximately 50–60 km apart) across two growing seasons (2023 with normal precipitation and 2024 with severe drought). We thoroughly evaluated the models’ cross-regional and cross-year predictive performance and investigated how multi-source data fusion enhanced model robustness. The primary findings were as follows.
- (1)
Accuracy and feasibility of UAV multispectral monitoring
Under single-site, single-year conditions, rigorous radiometric calibration and a suite of multispectral vegetation indices allowed the models to achieve an R² of up to approximately 0.95 for CLMC, SPAD, and LAI, with mean relative errors (MREs) of around 10–15%. These results indicate that UAV-based multispectral sensing can effectively capture the key physiological and structural traits in foxtail millet canopies. When the models were transferred to a different site in the same year or applied to the subsequent drought year, the overall R² values remained around 0.60–0.70, suggesting a reasonable portability despite environmental and management contrasts.
- (2)
Key factors affecting cross-year and cross-regional transferability
Even under severe drought in 2024, the models trained on 2023 data exhibited an acceptable performance; incorporating additional data (e.g., from PT) further enhanced accuracy. This underscores the value of diverse training samples in capturing greater environmental variability. Soil differences, nitrogen application levels, and extreme weather conditions (such as drought) had stronger impacts on certain traits, notably SPAD, and on high-LAI observations, suggesting that site-specific calibration or additional environmental covariates may be required for these cases.
- (3)
Advantages of multi-source data fusion and integration with mechanistic models
By combining data from multiple sites and years, the models achieved R² values exceeding 0.90 in independent tests, alongside notable reductions in both mean and maximum relative errors. This result highlights the benefit of broader environmental sampling for model generality. Future studies could integrate mechanistic models such as PROSAIL or APSIM and employ advanced data fusion techniques (e.g., deep learning or temporal modeling) to further improve resilience under extreme environmental conditions and across different growth stages.
- (4)
Methodological limitations and future directions
The multispectral UAV platform used in this study is well-suited to clear, low-wind conditions, but may suffer degraded image quality or positioning accuracy under complex terrain, strong cloud shadows, or sudden weather changes. Large-scale deployments may necessitate refined flight planning and calibration procedures. Our experiments focused on the widely grown cultivar “Jingu 21” in a typical semi-arid region of Shanxi Province; users planning to apply the models to other millet varieties or more extreme climates should gather supplemental local calibration samples or conduct partial model retraining.
- (5)
Key spectral predictors (SHAP-based insights)
SHAP-based feature importance analysis (see Section 3.2) indicated that SAVI (X10), WDRVI (X12), RVI (X9), NDVI (X5), TVI (X13), and OSAVI (X15) serve as pivotal predictors for CLMC (Y1), SPAD (Y2), and LAI (Y3). Their relative rankings and interactions vary among target traits, suggesting that combining raw multispectral bands with derived vegetation indices can more effectively capture the spatiotemporal dynamics of millet canopies and, in turn, enhance model extrapolation and adaptability.
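For readers unfamiliar with the indices named in item (5), the sketch below computes five of them from red and near-infrared reflectance using commonly cited formulations. These are assumptions: the definitions and coefficients adopted in this study's methods may differ (for example, OSAVI is sometimes written with an additional 1.16 multiplier, and the WDRVI weighting coefficient typically ranges from 0.1 to 0.2). TVI is omitted because several non-equivalent definitions share that abbreviation. The reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)

def rvi(nir, red):
    """Ratio vegetation index (simple ratio)."""
    return nir / red

def savi(nir, red, soil_factor=0.5):
    """Soil-adjusted vegetation index; soil_factor is the adjustment term L."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)

def osavi(nir, red):
    """Optimized SAVI with the fixed 0.16 soil term (no 1.16 multiplier here)."""
    return (nir - red) / (nir + red + 0.16)

def wdrvi(nir, red, alpha=0.2):
    """Wide dynamic range vegetation index; alpha down-weights the NIR band."""
    return (alpha * nir - red) / (alpha * nir + red)

# Hypothetical canopy reflectances (unitless, between 0 and 1).
nir_reflectance, red_reflectance = 0.45, 0.05
indices = {
    "NDVI": ndvi(nir_reflectance, red_reflectance),
    "RVI": rvi(nir_reflectance, red_reflectance),
    "SAVI": savi(nir_reflectance, red_reflectance),
    "OSAVI": osavi(nir_reflectance, red_reflectance),
    "WDRVI": wdrvi(nir_reflectance, red_reflectance),
}
```

All five indices are built from the same two bands, which is one reason their importance rankings can shift between traits: they differ mainly in how they damp soil background (SAVI, OSAVI) or saturation at dense canopies (WDRVI).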
In conclusion, this research provides a validated UAV-based multispectral framework that can reliably estimate foxtail millet canopy traits across moderate spatial scales and at least two consecutive years, offering valuable insights for precision irrigation, fertilization, and cultivar selection in semi-arid agroecosystems. By extending multi-year trials, broadening geographic coverage, and integrating additional sensor types and mechanistic or deep learning approaches, the modeling framework presented here can be further refined to support the large-scale, long-term phenotyping of drought-resilient cereal crops.