1. Introduction
Citrus is among the most widely consumed and commercially traded fruit crops globally, bearing substantial ecological and economic significance [1]. In China, particularly in the southwestern regions, the citrus industry has become a critical driver of agricultural modernization and regional economic growth, including in the major citrus-producing area of western Hubei Province. However, the rising frequency of extreme weather events, such as seasonal droughts, heatwaves, and intense rainfall, has increasingly threatened citrus production [2,3]. These climatic stressors have caused yield instability and quality fluctuations, hindering sustainable citrus production [4].
Temporal variations in leaf water content (LWC) and chlorophyll content (CHL) not only provide direct insights into citrus responses to water stress but also serve as essential indicators of photosynthetic performance and nitrogen metabolism [
5,
6]. LWC and CHL are key physiological indicators of citrus water status and photosynthetic activity, providing valuable information for growth assessment and irrigation management [
7]. Meanwhile, CHL, an indispensable pigment for photosynthesis, undergoes a dynamic equilibrium between biosynthesis and degradation throughout the leaf’s developmental stages [8,9]. Therefore, precise monitoring of LWC and CHL is vital for assessing citrus growth status, regulating irrigation practices, and forecasting yields [
10]. However, conventional measurement techniques—such as the fresh-to-dry weight ratio and spectrophotometry—are inherently destructive and limited by low temporal and spatial resolution, high operational costs, and labor-intensive procedures. These methods often necessitate repeated sampling across multiple locations, rendering them impractical for real-time, large-scale orchard monitoring [
11]. In this context, unmanned aerial vehicle (UAV)-based remote sensing emerges as a promising alternative. The high-resolution (2 cm spatial resolution and five spectral bands) data acquired by UAVs can be integrated with phenotypic vegetation traits to enhance the accuracy of plant characterization [
12]. Owing to species-specific physiological characteristics and their interactions with environmental conditions, vegetation exhibits significant spectral variability, which is distinctly expressed across particular spectral bands in remote sensing imagery [
13,
14,
15]. Accurate estimation of LWC and CHL enables early stress detection and supports precision orchard management. Leaves, as the primary organs responsible for photosynthesis and transpiration, are highly sensitive to both environmental fluctuations and internal metabolic regulation, and their structural and functional traits often undergo adaptive changes under varying growth conditions [
16]. Owing to their efficiency and accuracy, remote sensing techniques have become indispensable in precision agriculture, with multispectral inversion providing a rapid, non-destructive means of estimating key physiological parameters such as LWC and CHL [
17]. Nevertheless, most existing studies have concentrated on single-parameter estimation, and systematic research on the joint inversion of LWC and CHL from UAV multispectral data remains limited.
In recent years, spectral remote sensing has provided new opportunities for the non-destructive monitoring of crop physiological parameters [18,19]. The combination of reflectance in the visible to near-infrared range and vegetation indices has proven effective in characterizing the spectral responses of LWC and pigment concentrations [20,21]. For instance, Narmilan et al. [22] applied various machine learning algorithms in combination with UAV multispectral remote sensing to estimate sugarcane canopy chlorophyll content, with the random forest (RF) model showing the best performance (R² = 0.99). Most existing studies have developed vegetation index models based on 2–3 key spectral bands to estimate LWC and CHL separately. However, a single vegetation index often fails to comprehensively reflect crop physiological status, while incorporating too many variables can lead to overfitting and increased model complexity. Therefore, selecting appropriate spectral variables is crucial for improving both the predictive accuracy and generalization capability of inversion models. Machine learning methods offer significant advantages in modeling complex nonlinearities among variables, providing technical support for remote sensing-based inversion of crop physiological and ecological parameters. In the case of citrus, recent studies using advanced preprocessing and machine learning methods (e.g., wavelet transform, fractional derivatives, random forest, support vector regression, and k-nearest neighbors) have improved the estimation of LWC and CHL [
23,24,25]. Nevertheless, these models still face challenges such as overfitting, high complexity, and reduced robustness under varying conditions. Ensemble learning methods, which integrate multiple base learners, further enhance the capacity of models to capture data nonlinearities and improve prediction accuracy [26]. For example, in the recent work “Enhancing spatial resolution of satellite soil moisture data through stacking ensemble learning techniques”, researchers used a stacking ensemble framework combining Random Forest, Gradient Boosting, and XGBoost as base learners to improve the prediction accuracy of soil moisture estimates [
27]. Likewise, Li et al. [
28] proposed a novel hybrid approach—Spiking-Hybrid—that combines process-based simulation with machine learning models, including partial least squares regression (PLS), Gaussian process regression (GPR), and gradient boosting regression (GBR). Optimization algorithms, which search for optimal parameter configurations within the model space, can significantly enhance the performance of machine learning models by improving data fitting ability, generalization, and prediction accuracy [
29]. However, studies applying optimization algorithm-integrated machine learning approaches to remote sensing estimation of fruit tree physiological parameters remain limited.
Owing to its advantages in high spatial resolution, low cost, and operational flexibility, UAV remote sensing holds great promise for agricultural monitoring. When combined with vegetation index-based inversion, UAV multispectral imagery effectively captures crop growth dynamics and supports nutrient evaluation and yield forecasting. In this study, PLS was selected as a classical linear modeling method widely used in spectral inversion, serving as a benchmark for comparison. The ELM model, known for its fast learning speed and suitability for high-dimensional spectral data, was chosen as the core nonlinear learner. However, the performance of standard ELM is often constrained by randomly initialized weights and biases, which may lead to unstable predictions. To address this, three intelligent optimization algorithms were introduced: Particle Swarm Optimization (PSO), Artificial Hummingbird Algorithm (AHA), and Grey Wolf Optimizer (GWO). PSO provides strong global convergence, AHA can avoid local optima, and GWO effectively balances exploration and exploitation in nonlinear optimization. By integrating these algorithms with ELM, the study aims to enhance model stability and prediction accuracy while systematically evaluating the advantages of different optimization strategies. The objectives of this study are as follows:
- (1) This study develops an inversion framework that integrates UAV multispectral remote sensing with bio-inspired optimization algorithms by constructing and comparing five models: PLS, ELM, and three optimized ELM variants (PSO-ELM, AHA-ELM, GWO-ELM). This provides new insights into the applicability of optimization-enhanced machine learning in crop physiology monitoring.
- (2) We conduct a systematic evaluation of using sensitive band reflectance versus vegetation indices as input variables, thereby clarifying the relative contribution of different spectral features to the estimation of citrus LWC and CHL.
- (3) We apply and validate a synergistic inversion strategy that integrates optimized ELM models for the simultaneous estimation of LWC and CHL, representing a novel application of multi-parameter modeling to citrus physiological monitoring under water stress.
The results of this study aim to provide theoretical support and technical guidance for citrus growth monitoring and precision fertilization management in seasonally drought-prone regions.
2. Materials and Methods
2.1. Overview of the Study Area
This study was conducted at the Cangwubang Citrus Experimental Station, located in Yiling District, Yichang City, Hubei Province, China (30°45′ N, 110°41′ E; elevation: 343 m). The site is situated in a mid-latitude, subtropical continental monsoon climate zone, characterized by an average annual temperature of 16.9 °C, average annual precipitation of 1133.5 mm, average wind speed of 1.4 m/s, and mean annual relative humidity of 75.6%. The topsoil (0–60 cm) in the experimental field is classified as brown earth, with a bulk density of 1.48 g/cm³ and a field capacity of 25.80%. The soil exhibits good fertility, with available nitrogen, phosphorus, and potassium contents of 1.32, 1.84, and 11.54 g/kg, respectively. The organic matter content is 15.6 g/kg, and the soil pH is neutral at 7.0. The experimental trees were 12-year-old Citrus reticulata Blanco cv. Yichang mandarin, planted with a spacing of 4 m × 3 m, and exhibited uniform growth status. The spatial distribution of experimental treatments within the study area is shown in Figure 1.
2.2. Experiment Design and Ground Data Collection
Three irrigation levels were established to simulate varying water supply conditions and assess their effects on citrus growth, namely: high water (T1: 80–90% of field capacity), moderate water (T2: 70–80%), and low water (T3: 60–70%) input levels. A surface drip irrigation system was employed, with emitters installed at 0.4 m from the tree base in the east, south, west, and north directions. Each emitter delivered water at a flow rate of 2 L/h, and the system operated under a pressure of 0.1 MPa. Irrigation volumes were precisely controlled using water meters. Except for irrigation treatments, all other agronomic practices, including fertilization and pest control, followed the local standard orchard management protocols to ensure consistency and repeatability.
A total of 9 independent plots were established, each comprising 3 Citrus reticulata Blanco cv. Yichang mandarin trees with uniform growth status. During each data collection event, 24 mature leaves were systematically sampled per tree from the outer canopy in the four cardinal directions (east, west, south, and north) to ensure sample representativeness. Ground data were collected from 21 April to 25 October 2024, at intervals of 17 to 20 days. CHL was determined after chlorophyll extraction using a UV–visible spectrophotometer (UV-2600, Shimadzu Corp., Kyoto, Japan), and LWC was determined by the oven-drying method at 105 °C and 80 °C. All measurements were performed under clear weather conditions and natural light, between 10:00 AM and 12:00 noon local time, to reduce diurnal variation. UAV-based multispectral imagery acquisition was conducted simultaneously with each ground sampling session to ensure temporal and spatial consistency. After outlier elimination, a total of 263 valid ground observation datasets of LWC and CHL were obtained over the entire experimental period.
2.3. UAV Multispectral Data Acquisition and Processing
Spectral data acquisition was conducted using a DJI Phantom 4 Multispectral UAV (Phantom4-M, P4M; DJI Innovations, Shenzhen, China), equipped with an integrated multispectral imaging system comprising one RGB sensor and five narrowband monochrome sensors. Detailed technical specifications of the imaging system are presented in Table 1. To ensure data quality, multiple pre-flight tests were conducted prior to formal image acquisition in order to optimize flight parameters. The finalized flight configuration included 80% forward and side overlap, a flight altitude of 75 m (yielding a ground sampling distance of 2 cm), and a combination of waypoint hovering and perpendicular shooting along the main flight path. The full set of flight parameters is provided in Table 2.
Radiometric calibration was performed using a MAPIR Spectral Reflectance Calibration Panel (MAPIR, San Diego, CA, USA), consisting of four Lambertian panels with known reflectance values (Figure 2). Before collecting spectral imagery over the citrus orchard, the panel was placed horizontally on a flat surface, and the UAV was manually positioned at a vertical distance equivalent to seven times the panel’s length to capture calibration images. Calibration images were acquired under stable environmental conditions and consistent natural lighting to support subsequent radiometric correction.
To minimize background interference and enhance spectral accuracy, a handheld multispectral camera was additionally used to capture vertical canopy images under controlled environmental conditions. This supplemental imaging ensured that the extracted reflectance data accurately represented the spectral characteristics of the leaf surfaces. Image stitching, geometric correction, and radiometric calibration were carried out using DJI Terra software (v3.5, DJI Innovations, Shenzhen, China), while reflectance data for the five spectral bands were processed using ENVI 5.3 (Harris Geospatial Solutions, Broomfield, CO, USA). Accordingly, all model inputs were based on surface reflectance values derived from radiometrically calibrated images, rather than raw digital numbers (DN), ensuring the physical consistency of spectral information across different acquisition dates.
2.4. Vegetation Index Selection
The spectral reflectance curves of different crop species, or even of the same species at different phenological stages, exhibit dynamic variations in both shape and intensity. These variations are particularly pronounced when physiological parameters such as LWC and CHL change, as such changes directly alter the plant’s absorption and reflectance characteristics in specific wavelength regions, thereby affecting its spectral response patterns. Vegetation indices (VIs), derived from mathematical combinations of reflectance values at different wavelengths captured by multispectral or hyperspectral sensors, are designed to enhance vegetation signals, suppress background noise, and quantitatively characterize the physiological and ecological state of vegetation. With appropriately designed band combinations (e.g., Normalized Difference Vegetation Index [NDVI], Enhanced Vegetation Index [EVI]), VIs enable effective extraction of key biophysical information such as crop growth status, health condition, biomass, and stress response. These indices provide a quantitative foundation for applications in agricultural monitoring, crop classification, and biophysical parameter inversion [30,31].
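As a minimal illustration of how such band combinations are formed, the sketch below computes three widely used normalized-difference indices from per-sample reflectance. The reflectance values are hypothetical, and the helper name `normalized_difference` is introduced here for illustration only; the index set actually used in this study is the one listed in Table 3.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else None

# Hypothetical canopy reflectance (0-1) for a single sample tree.
sample = {"green": 0.08, "red": 0.05, "nir": 0.45}

indices = {
    # NDVI = (NIR - R) / (NIR + R)
    "NDVI": normalized_difference(sample["nir"], sample["red"]),
    # GNDVI = (NIR - G) / (NIR + G)
    "GNDVI": normalized_difference(sample["nir"], sample["green"]),
    # NGRDI = (G - R) / (G + R)
    "NGRDI": normalized_difference(sample["green"], sample["red"]),
}
```

Because each index is a ratio of a difference to a sum, a uniform scaling of both bands (for example, from a change in overall illumination) cancels out, which is one reason such indices tend to be more stable than raw band reflectance.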
Considering the differences in spectral absorption features between LWC and CHL, the vegetation indices selected in this study span key absorption peaks and reflectance troughs from the visible to the near-infrared regions. Because they align with the LWC- and CHL-sensitive regions of the electromagnetic spectrum, the selected indices are expected to capture the variation in these two physiological parameters effectively. To enhance physiological sensitivity, and based on insights from previous studies [32,33,34], a total of 16 vegetation indices were ultimately selected for their strong correlations with LWC and CHL. Among these, 10 indices were highly correlated with LWC and 10 with CHL, with some indices responsive to both parameters. The detailed list of indices and their corresponding band combination formulas is provided in Table 3.
2.5. Construction of Machine Learning Models
The modeling outcomes are largely influenced by the choice of machine learning algorithms. To enhance inversion accuracy and assess the suitability of different methods, five models were employed in this study for the estimation of LWC and CHL: Partial Least Squares Regression (PLS), Extreme Learning Machine (ELM), Particle Swarm Optimization-based ELM (PSO-ELM), Artificial Hummingbird Algorithm-based ELM (AHA-ELM), and Grey Wolf Optimization-based ELM (GWO-ELM). For model construction, 70% of the full-season dataset was randomly selected as the training set, while the remaining 30% was used as the validation set. A 30% validation ratio helps reduce the volatility of model evaluation metrics, thereby improving result stability, while the 70% training set provides sufficient data support for model training and hyperparameter optimization. During model development, the Kolmogorov–Smirnov (K–S) test was conducted to assess whether the training and validation sets followed the same distribution. The results showed that all p-values exceeded 0.05 (α = 0.05), indicating that the null hypothesis—that both datasets are drawn from the same distribution—could not be rejected. This suggests no significant statistical difference between the two subsets, and confirms that the data partitioning was representative and free from distribution bias, thus minimizing evaluation errors caused by sample shift. To ensure model reproducibility and fair comparison among algorithms, the number of hidden neurons in the ELM models was fixed at 80 based on preliminary experiments, which achieved a good balance between model complexity and generalization performance. A uniform L2 regularization coefficient (λ = 0.01) was applied to prevent overfitting. For fairness in computational effort, all three optimization-based ELM models (PSO-ELM, GWO-ELM, and AHA-ELM) were configured with the same population size (30) and maximum iteration number (150). 
The PSO parameters were set to an inertia weight of 0.7 and cognitive/social learning coefficients of 1.5 each, while GWO employed the standard linearly decreasing control parameter (a = 2 → 0) and AHA used a small adaptive step size (σ = 0.05) for local exploration. Meanwhile, a 5-fold cross-validation strategy was employed during model training to evaluate prediction accuracy and to avoid overfitting.
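The partitioning check described above can be sketched as follows. This is a standard-library illustration under stated assumptions, not the authors' code: the LWC values are synthetic, the random seed is arbitrary, and the p-value comes from the asymptotic Kolmogorov approximation with a small-sample correction (in practice a library routine such as `scipy.stats.ks_2samp` would normally be used).

```python
import math
import random

def ks_two_sample(x, y):
    """Return the two-sample K-S statistic D and an asymptotic p-value."""
    x, y = sorted(x), sorted(y)
    n, m = len(x), len(y)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(x[i], y[j])
        while i < n and x[i] <= v:   # step past ties in x
            i += 1
        while j < m and y[j] <= v:   # step past ties in y
            j += 1
        d = max(d, abs(i / n - j / m))  # gap between the empirical CDFs
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:                     # the series equals 1 to machine precision here
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# Synthetic stand-in for the 263 valid LWC observations (percent).
rng = random.Random(42)
lwc = [rng.gauss(62.0, 4.0) for _ in range(263)]

# Random 70/30 split into training and validation subsets.
rng.shuffle(lwc)
n_train = int(0.7 * len(lwc))
train, valid = lwc[:n_train], lwc[n_train:]

d_stat, p_value = ks_two_sample(train, valid)
# p > 0.05 means the same-distribution hypothesis is not rejected.
same_distribution_not_rejected = p_value > 0.05
```

A non-significant result only means that no distributional difference was detected at the chosen α; it does not by itself prove that the two subsets are identically distributed.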
- (1) Partial Least Squares Regression
Partial Least Squares Regression (PLS) is a non-parametric linear multivariate modeling approach that has been widely applied in chemometrics. As a generalized form of multiple linear regression (MLR), PLS is capable of handling datasets with high dimensionality, multicollinearity, or noise interference, making it suitable for modeling tasks involving strong correlations among variables in complex systems [
48]. Among various dimensionality reduction regression methods, PLS effectively mitigates the problems of high dimensionality, multicollinearity, and noise by extracting latent variables that capture the maximum covariance between predictors and response variables, thereby improving both model interpretability and predictive performance. In this study, a five-fold cross-validation strategy was adopted during model training to evaluate the generalization ability of the model across different data subsets. The average performance across the five folds was used as the final evaluation metric, which helps to reduce the risk of overfitting and enhance model robustness.
- (2) Extreme Learning Machine
Extreme Learning Machine (ELM) is an efficient learning algorithm designed for training Single Hidden Layer Feedforward Neural Networks (SLFNs). Unlike traditional SLFN models, ELM does not require iterative optimization of the input layer parameters during training. Instead, it randomly initializes the input weights and hidden layer biases, and directly computes the output weights by minimizing a regularized loss function composed of the training error and the norm of the output weights. This solution is obtained analytically using the Moore–Penrose (MP) generalized inverse [
49]. The core advantage of ELM lies in its significantly enhanced training efficiency, as it avoids the time-consuming backpropagation process typical of conventional neural networks, while still retaining the powerful nonlinear modeling capability and universal approximation performance of SLFNs [
50]. By randomly initializing hidden layer parameters and rapidly solving for the output weights, ELM exhibits strong computational efficiency and generalization ability in handling high-dimensional nonlinear regression and classification tasks. Consequently, it has been widely applied in various fields such as classification, regression, and pattern recognition.
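The training procedure described above can be sketched in a few lines of NumPy. This is a minimal sketch rather than the implementation used in the study: the sigmoid activation, the uniform [-1, 1] initialization range, and the class name `ELMRegressor` are assumptions, while the 80 hidden neurons and λ = 0.01 follow the settings stated earlier. The output weights are obtained in closed form from the ridge-regularized least-squares problem, which reduces to the Moore–Penrose solution as λ approaches zero.

```python
import numpy as np

class ELMRegressor:
    """Single-hidden-layer ELM with ridge-regularized output weights."""

    def __init__(self, n_hidden=80, reg=0.01, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the random projection (assumed activation).
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Input weights and biases are drawn once and never trained.
        self.W = self.rng.uniform(-1.0, 1.0, (X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, self.n_hidden)
        H = self._hidden(X)
        # beta = (H^T H + lambda I)^-1 H^T y, solved without an explicit inverse.
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ y)
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta

# Synthetic regression problem standing in for (features, LWC or CHL).
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 3))
y = 2.0 * X[:, 0] - X[:, 1] + 0.3 * np.sin(X[:, 2])

model = ELMRegressor().fit(X, y)
pred = model.predict(X)
train_rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
```

Changing `seed` changes `W` and `b` and therefore the fitted model, which is the source of the run-to-run variability that the optimization algorithms in the following subsections are meant to reduce.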
- (3) Particle Swarm Optimization
The core concept of the ELM lies in the random initialization of input weights and hidden layer biases, followed by the analytical determination of output weights through minimization of the training error. Although this non-iterative training mechanism significantly improves computational efficiency, it may lead to uncertainty and reduced model stability due to the randomness of initial parameters. To address these limitations and enhance predictive performance, this study incorporates the Particle Swarm Optimization (PSO) algorithm for global optimization of ELM parameters. PSO is a population-based global optimization technique inspired by the foraging behavior of bird flocks. It guides the search process by dynamically updating each particle’s position based on its own historical best and the global best positions found by the swarm, thereby balancing global exploration and local exploitation. PSO has the advantages of fast convergence, ease of implementation, and strong robustness, and it has been widely applied in the modeling of complex nonlinear systems. In this study, PSO is applied to optimize the input weight matrix and hidden layer biases of the ELM model. The goal is to mitigate the uncertainty caused by random initialization and to improve the model’s stability and generalization ability in nonlinear inversion tasks. The overall model structure and optimization workflow are illustrated in
Figure 3.
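A compact sketch of the PSO-ELM idea is given below. It is an illustration under assumptions, not the authors' implementation: each particle encodes the ELM input weights and biases as one flat vector, the fitness is taken to be the training MSE, positions are clipped to [-1, 1], and the data are synthetic. The inertia weight (0.7), learning coefficients (1.5), and population size (30) follow the settings stated in this section; the hidden-layer size and iteration count are reduced in the usage example to keep the run short.

```python
import numpy as np

def elm_fitness(theta, X, y, n_hidden, reg):
    """Decode a particle into (W, b), solve the output weights, return train MSE."""
    d = X.shape[1]
    W = theta[: d * n_hidden].reshape(d, n_hidden)
    b = theta[d * n_hidden:]
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y)
    return float(np.mean((H @ beta - y) ** 2))

def pso_elm(X, y, n_hidden=80, reg=0.01, pop=30, iters=150,
            w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    dim = X.shape[1] * n_hidden + n_hidden
    pos = rng.uniform(-1.0, 1.0, (pop, dim))
    vel = np.zeros((pop, dim))
    fit = np.array([elm_fitness(p, X, y, n_hidden, reg) for p in pos])
    pbest, pbest_fit = pos.copy(), fit.copy()
    g = int(np.argmin(fit))
    gbest, gbest_fit = pos[g].copy(), float(fit[g])
    history = [gbest_fit]
    for _ in range(iters):
        r1 = rng.random((pop, dim))
        r2 = rng.random((pop, dim))
        # Velocity: inertia + pull toward personal best + pull toward global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, -1.0, 1.0)
        fit = np.array([elm_fitness(p, X, y, n_hidden, reg) for p in pos])
        improved = fit < pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = int(np.argmin(pbest_fit))
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])
        history.append(gbest_fit)
    return gbest, gbest_fit, history

# Synthetic data standing in for spectral features and a target trait.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] - X[:, 1] + 0.3 * np.sin(X[:, 2])

gbest, gbest_fit, history = pso_elm(X, y, n_hidden=20, iters=40)
```

Because the global best is only replaced when a strictly better particle is found, `history` is non-increasing. A low training MSE does not guarantee good validation performance, so the selected parameters should still be evaluated on held-out data.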
- (4) Artificial Hummingbird Algorithm
Although ELM offers significant advantages in training efficiency, its random initialization of input weights and biases introduces considerable uncertainty. Different initializations may result in performance fluctuations, particularly affecting the model’s generalization capability. To enhance the stability and predictive accuracy of ELM in nonlinear inversion tasks, this study incorporates the Artificial Hummingbird Algorithm (AHA) to optimize the initial input weights and hidden layer biases. Proposed by Zhao et al. [
51] in 2021, AHA is a novel metaheuristic optimization algorithm inspired by the natural foraging behaviors and flight modes of hummingbirds. It simulates three flight strategies—axial, diagonal, and omnidirectional—and three foraging behaviors: guided foraging, regional foraging, and migratory foraging. By introducing a visit table to mimic the memory mechanism of hummingbirds tracking food sources, AHA effectively balances global search and local exploitation, demonstrating strong optimization and convergence capabilities. In this study, the mean squared error (MSE) of the training set is used as the fitness function during the AHA optimization process to guide the updating of weights and biases. This approach helps mitigate performance variability caused by random initialization in traditional ELM. The optimal network parameters are decoded and applied to both the training and validation sets for prediction. To further enhance model stability and robustness, an L2 regularization term is introduced to reduce overfitting risk. This optimization framework effectively overcomes the randomness associated with ELM initialization and improves the model’s robustness and predictive accuracy. The overall model structure and optimization process are illustrated in
Figure 4.
- (5) Grey Wolf Optimizer
Inspired by the predatory behavior of grey wolves, Mirjalili et al. [
52] proposed a novel swarm intelligence optimization algorithm in 2014—the Grey Wolf Optimizer (GWO). GWO is a bio-inspired optimization method that simulates the social hierarchy and cooperative hunting strategies of grey wolf packs. By mimicking the hunting process—namely tracking, encircling, and attacking prey—the algorithm iteratively approaches the global optimum. In this framework, individual wolves adjust their positions based on fitness values and follow the leadership hierarchy (α, β, δ, and ω wolves), enabling effective global exploration and local exploitation of the solution space [
53]. GWO is characterized by its simple structure, minimal parameter requirement, and ease of implementation. Its convergence factor and dynamic position-updating mechanism contribute to a balanced search strategy, which has demonstrated strong performance in solving complex nonlinear optimization problems [
54]. In traditional ELM, input weights and biases are randomly initialized, which may cause substantial fluctuations in model performance. To enhance the generalization capability and predictive accuracy of ELM, this study employs GWO to optimize its initial weights and biases. The optimization procedure is as follows: First, the positions of the grey wolf population are initialized, where each individual encodes a parameter vector representing ELM input weights and biases. Each individual is then decoded into the corresponding weight matrix and bias vector to construct the ELM network. The coefficient of determination (R²) on the validation set is used as the fitness function to evaluate each individual’s performance. Based on fitness values, the top three individuals are selected as α, β, and δ wolves. The positions of the remaining wolves are updated according to GWO’s position-update strategy. During the iterative process, the position and fitness of the best individual are continuously updated until the maximum number of iterations is reached. Finally, the optimal individual’s position is used as the initialized weights and biases of the ELM network, completing model training and enabling prediction and evaluation on the validation set. The model’s predictive performance is assessed using metrics such as R² and RMSE, and the optimal model parameters are saved for reproducibility. The overall model structure and optimization workflow are illustrated in Figure 5.
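The position-update loop described above can be sketched generically as follows. This is a hedged illustration of the standard GWO scheme, not the authors' code: it minimizes a toy sphere function instead of an ELM fitness, keeps the three best solutions found so far as the α, β, and δ leaders, and clips positions to assumed bounds of [-1, 1]. To use it with the fitness described in the text, the objective would be the negative validation R² of the ELM decoded from each wolf.

```python
import numpy as np

def gwo_minimize(f, dim, bounds=(-1.0, 1.0), pop=30, iters=150, seed=0):
    """Minimize f with a basic Grey Wolf Optimizer; returns (best_pos, best_fit)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    wolves = rng.uniform(lo, hi, (pop, dim))
    leaders = np.zeros((3, dim))            # alpha, beta, delta positions
    leader_fit = np.full(3, np.inf)
    for t in range(iters):
        fit = np.array([f(w) for w in wolves])
        # Keep the three best solutions seen so far as the leaders.
        cand_pos = np.vstack([leaders, wolves])
        cand_fit = np.concatenate([leader_fit, fit])
        top = np.argsort(cand_fit)[:3]
        leaders, leader_fit = cand_pos[top].copy(), cand_fit[top].copy()
        a = 2.0 * (1.0 - t / iters)          # decreases linearly from 2 toward 0
        new_pos = np.zeros_like(wolves)
        for leader in leaders:
            A = 2.0 * a * rng.random((pop, dim)) - a
            C = 2.0 * rng.random((pop, dim))
            D = np.abs(C * leader - wolves)  # distance to this leader
            new_pos += leader - A * D
        # Each wolf moves to the average of the three leader-guided positions.
        wolves = np.clip(new_pos / 3.0, lo, hi)
    return leaders[0], float(leader_fit[0])

def sphere(w):
    """Toy objective with its minimum of 0 at the origin."""
    return float(np.sum(w ** 2))

best_pos, best_fit = gwo_minimize(sphere, dim=5)
```

The control parameter `a` governs the balance mentioned in the text: while it is large, the coefficient `A` can exceed 1 in magnitude and push wolves away from the leaders (exploration); as it shrinks, the pack contracts around them (exploitation).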
2.6. Synergistic Inversion Framework
To simultaneously estimate LWC and CHL, a synergistic inversion framework was implemented. Two types of input datasets were constructed: (i) a vegetation index group, consisting of the three most sensitive indices for LWC and the three most sensitive indices for CHL; and (ii) a spectral band group, consisting of two sensitive bands for LWC and two sensitive bands for CHL.
The Extreme Learning Machine (ELM) and its optimized variants (PSO-ELM, AHA-ELM, GWO-ELM) were extended into a multi-output regression architecture, where a shared hidden layer was followed by two output neurons, corresponding to LWC and CHL, respectively. This design allowed the models to leverage shared spectral–physiological information while providing dual predictions. Training and parameter optimization followed the same procedure as in the single-target inversion. Model performance was reported separately for LWC and CHL using R² and RMSE.
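One way to write the two-output architecture is sketched below, assuming the same ridge-regularized output layer as in the single-target case. The symbols $\mathbf{H}$, $\mathbf{T}$, $\mathbf{B}$, $N$, and $L$ are notation introduced here for illustration and do not appear in the original text.

```latex
% N samples, L hidden neurons, shared hidden-layer output matrix H (N x L).
% Targets are stacked column-wise: T = [t_LWC, t_CHL] (N x 2).
\mathbf{B}
  = \left(\mathbf{H}^{\top}\mathbf{H} + \lambda \mathbf{I}\right)^{-1}
    \mathbf{H}^{\top}\mathbf{T},
\qquad
\mathbf{T} = \begin{bmatrix} \mathbf{t}_{\mathrm{LWC}} & \mathbf{t}_{\mathrm{CHL}} \end{bmatrix},
\qquad
\hat{\mathbf{T}} = \mathbf{H}\mathbf{B}
```

Under this formulation each column of $\mathbf{B}$ equals the single-target solution for the same $\mathbf{H}$, so the coupling between LWC and CHL enters through the shared input weights and biases. When these are tuned by PSO, AHA, or GWO against a fitness that combines both outputs, the two targets should be placed on comparable scales (for example, by standardization) so that neither dominates the search; that scaling step is a suggestion here, not something stated in the text.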
2.7. Evaluation Indicators
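The accuracy of the LWC and CHL inversion models based on vegetation indices extracted from UAV multispectral remote sensing was evaluated using the coefficient of determination (R²), root mean square error (RMSE), mean relative error (MRE), and estimation accuracy (EA). The expressions for R², RMSE, and MRE are as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MRE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left|\hat{y}_i - y_i\right|}{y_i} \times 100\%$$

In the equations, $y_i$ denotes the observed value, $\hat{y}_i$ denotes the predicted value, $\bar{y}$ represents the mean of the observed values, and $n$ is the number of samples. A higher coefficient of determination (R², closer to 1), along with lower values of root mean square error (RMSE) and mean relative error (MRE), and an estimation accuracy (EA) closer to 100%, indicates more stable model performance, better fitting accuracy, and stronger predictive capability.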
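The four indicators can be computed as in the sketch below. R², RMSE, and MRE follow their standard definitions. The expression for EA is not shown in the text, so the form used here, EA = (1 − RMSE / mean of observations) × 100%, is an assumption based on a definition that is common in related work and may differ from the one the authors applied. The sample values are hypothetical.

```python
import math

def regression_metrics(observed, predicted):
    """Return R2, RMSE, MRE (%) and an assumed EA (%) for paired values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    mre = 100.0 * sum(abs(p - o) / abs(o)
                      for o, p in zip(observed, predicted)) / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": rmse,
        "MRE_percent": mre,
        # ASSUMED definition: the source does not give the EA expression.
        "EA_percent": 100.0 * (1.0 - rmse / mean_obs),
    }

# Hypothetical LWC observations and predictions (percent).
observed = [60.0, 62.0, 64.0, 66.0]
predicted = [61.0, 63.0, 65.0, 67.0]
metrics = regression_metrics(observed, predicted)
```

Note that R² computed this way can be negative on a validation set when a model predicts worse than the mean of the observations, whereas RMSE keeps the units of the target variable.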
3. Results
3.1. Correlation Between Multispectral Indices and Citrus Leaf LWC and CHL
Pearson correlation analysis was performed between the spectral band data, vegetation indices, and LWC and CHL of citrus leaves, as shown in Table 4 and Table 5. Among the 15 spectral bands and vegetation indices, most variables were negatively correlated with LWC. The red, green, and blue bands showed highly significant correlations with LWC (p < 0.01), while the red-edge and near-infrared bands did not pass the significance test. Therefore, the red, green, and blue bands were selected as sensitive bands for LWC inversion and used as model input features.
Regarding vegetation indices, all indices had absolute correlation coefficients greater than 0.35 with LWC, among which EXG had the highest (|r| = 0.709). To ensure strong correlation and good discriminative power, the five indices with the highest |r| values (NGRDI, GNDVI, EXG, EXR, and GI) were selected as sensitive vegetation indices for LWC inversion (Table 4).
For CHL, nine spectral bands and vegetation indices had absolute correlation coefficients greater than 0.5. Using the same selection criteria as for LWC, the red, red-edge, and green bands were identified as sensitive bands for CHL inversion. The corresponding sensitive vegetation indices—OSAVI, NGRDI, GLI, NGBDI, and TSAVI—were used to construct the CHL inversion model (
Table 5). It is worth noting that some spectral bands and vegetation indices exhibited weak or insignificant correlations with LWC and CHL. This can be attributed to several factors: (i) physiological mechanisms—LWC primarily affects leaf reflectance in the visible spectrum through changes in leaf thickness and internal water absorption, while CHL is mainly linked to pigment absorption peaks in the red and green regions; hence, bands in the NIR region showed limited sensitivity under the multispectral setting; (ii) spectral resolution limitations—compared with hyperspectral sensors, the multispectral camera used in this study has relatively broad band ranges, which may dilute subtle absorption features (e.g., red-edge response to CHL); (iii) environmental and canopy effects—soil background, illumination variability, and canopy structure may introduce noise, weakening the correlation between some indices (such as NDVI or SIPI) and physiological traits. Therefore, only those bands and indices showing stable and physiologically interpretable correlations were selected as sensitive variables for subsequent modeling.
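The selection rule used in this section, ranking candidate variables by the absolute Pearson correlation with the target and keeping the strongest ones, can be sketched as follows. The feature values below are hypothetical and chosen only to make the ranking visible; the function names are introduced for illustration, and the significance test (p < 0.01) applied in the study is not reproduced here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def top_features(features, target, k):
    """Return the k (name, r) pairs with the largest |r| against the target."""
    scores = {name: pearson_r(values, target)
              for name, values in features.items()}
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return [(name, scores[name]) for name in ranked[:k]]

# Hypothetical LWC observations (%) and three candidate indices.
lwc = [58.0, 60.0, 62.0, 64.0, 66.0]
features = {
    "EXG": [0.30, 0.27, 0.25, 0.21, 0.20],
    "NDVI": [0.70, 0.72, 0.69, 0.71, 0.70],
    "GI": [1.50, 1.45, 1.47, 1.40, 1.38],
}

selected = top_features(features, lwc, k=2)
```

Ranking by |r| rather than r keeps strongly negative correlations, which matters here because most variables were reported as negatively correlated with LWC.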
3.2. Construction of Citrus LWC and CHL Inversion Models
Based on two types of input datasets (sensitive spectral band combinations and sensitive vegetation index combinations), five models (PLS, ELM, PSO-ELM, AHA-ELM, and GWO-ELM) were constructed to invert citrus LWC and CHL from UAV multispectral data, enabling comparative analysis. All models showed good fitting performance on the training set, and the validation results are shown in Table 6. Overall, models constructed using vegetation index combinations outperformed those using spectral band combinations. In the PLS model, the validation set R² values for LWC and CHL based on vegetation indices were 0.686 and 0.437, respectively, whereas the corresponding values for spectral bands were only 0.572 and 0.371, highlighting the stronger representational power of vegetation indices in parameter inversion. This trend was further confirmed in the ELM model, where the vegetation index-based LWC and CHL achieved validation set R² values of 0.654 and 0.534, and RMSE values of 3.173% and 0.217 mg/g, respectively. In contrast, the spectral band-based model had R² values of 0.624 and 0.478, and RMSEs of 4.009% and 0.227 mg/g.
With the integration of the PSO algorithm, the inversion accuracy significantly improved. The PSO-ELM model achieved the highest performance, with validation set R² values for LWC and CHL reaching 0.744 and 0.646, and RMSE values decreasing to 3.104% and 0.183 mg/g, respectively. The AHA-ELM model also showed high accuracy in CHL inversion (R² = 0.638, RMSE = 0.246 mg/g), with an LWC R² of 0.660, lower than that of PSO-ELM. The GWO-ELM model achieved a CHL validation set R² of 0.547, better than PLS and ELM, but in LWC prediction it only slightly outperformed ELM (R² = 0.671). Further comparison revealed that PSO-ELM performed best in both accuracy and stability. Compared to PLS, its prediction accuracy (in terms of validation set R²) for LWC and CHL improved by 8.45% and 47.83%, respectively; compared to ELM, the improvements were 13.76% and 20.97%. AHA-ELM was slightly inferior to PSO-ELM in CHL inversion but outperformed GWO-ELM, with CHL accuracy 46% higher than PLS and 19.48% higher than ELM. In LWC prediction, AHA-ELM was 3.79% lower than PLS but still 0.91% higher than ELM. For GWO-ELM, CHL inversion accuracy improved by 25.17% over PLS and 2.43% over ELM; in LWC prediction, although 2.19% lower than PLS, it exceeded ELM by 2.60%.
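The text does not state how the percentage improvements are computed. The PSO-ELM figures quoted above are, however, consistent with a relative change in validation set R², which the following sketch reproduces. The helper name `relative_gain` is hypothetical, and reading "accuracy improvement" as relative R² change is an inference from the numbers rather than a stated definition.

```python
def relative_gain(r2_new: float, r2_ref: float) -> float:
    """Percentage change of r2_new relative to r2_ref (assumed definition)."""
    return (r2_new - r2_ref) / r2_ref * 100.0

# Validation set R² values quoted in the text for vegetation-index inputs.
r2 = {
    "PLS": {"LWC": 0.686, "CHL": 0.437},
    "ELM": {"LWC": 0.654, "CHL": 0.534},
    "PSO-ELM": {"LWC": 0.744, "CHL": 0.646},
}

gains = {
    (ref, trait): relative_gain(r2["PSO-ELM"][trait], r2[ref][trait])
    for ref in ("PLS", "ELM")
    for trait in ("LWC", "CHL")
}
for (ref, trait), gain in gains.items():
    print(f"PSO-ELM vs {ref} ({trait}): {gain:+.2f}%")
```

Because the measure is relative, a small absolute gain over a low baseline R² (as for CHL under PLS) appears as a large percentage.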
For LWC inversion (
Figure 6), using spectral bands as input, PSO-ELM showed the best overall performance, while ELM achieved the highest validation set accuracy. When using vegetation indices, AHA-ELM performed best on the training set, and PSO-ELM was the most accurate on the validation set. Similar trends were observed in CHL inversion (
Figure 7): under spectral band input, AHA-ELM had the highest training accuracy, while ELM was best on the validation set; with vegetation indices, AHA-ELM again led on the training set, while ELM showed the highest validation set performance.
These performance differences can be attributed to both the nature of the algorithms and the input features. PLS, as a linear regression approach, is constrained in modeling nonlinear relationships between spectral features and physiological parameters, which explains its relatively low performance. ELM introduces nonlinear activation functions and therefore achieves better predictive accuracy. However, its random initialization of input weights and biases often leads to unstable solutions, which the optimization algorithms were designed to overcome. Among them, PSO provided the most consistent improvement owing to its rapid global convergence, while AHA and GWO showed more variable performance due to their sensitivity to parameter settings and potential for premature convergence. In addition, models based on vegetation indices generally outperformed those based on raw spectral bands, since indices can enhance physiological signals while reducing background noise and multicollinearity. This explains why optimized models, particularly PSO-ELM, yielded the highest accuracy when applied to vegetation index inputs.
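The sensitivity of a standard ELM to its random initialization can be seen in a minimal sketch. It assumes a single sigmoid hidden layer with output weights solved by pseudoinverse; the hidden-layer size, the synthetic data, and the function names are illustrative choices, not the configuration used in the study.

```python
import numpy as np

def fit_elm(X, y, n_hidden=20, seed=0):
    """Fit a basic ELM: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random, never trained
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random, never trained
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                   # sigmoid activations
    beta = np.linalg.pinv(H) @ y                             # only trained parameters
    return W, b, beta

def predict_elm(model, X):
    W, b, beta = model
    return (1.0 / (1.0 + np.exp(-(X @ W + b)))) @ beta

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic nonlinear regression problem standing in for index -> trait data.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(120, 3))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(0.0, 0.05, 120)
X_train, y_train, X_val, y_val = X[:80], y[:80], X[80:], y[80:]

# Same data, different random hidden layers: validation error varies by seed.
scores = [
    rmse(y_val, predict_elm(fit_elm(X_train, y_train, seed=s), X_val))
    for s in range(10)
]
print(f"validation RMSE range over seeds: {min(scores):.4f} to {max(scores):.4f}")
```

The spread across seeds is the instability that a metaheuristic search over the input weights and biases is meant to reduce.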
3.3. Construction of Synergistic Inversion Models for Citrus LWC and CHL
To explore the correlation between citrus LWC and CHL and further improve the inversion accuracy based on UAV multispectral data, a synergistic inversion strategy was developed. Under complex background conditions, the number of required spectral bands theoretically increases with the number of target variables. However, in typical multispectral scenarios (≤10 bands), 2–4 bands are sufficient to extract key spectral information, and three vegetation indices usually involve 2–3 spectral channels, ensuring information diversity while controlling input dimensionality [
55]. Based on Pearson correlation analysis, red and green bands (sensitive to LWC) and green and red-edge bands (sensitive to CHL) were selected as the sensitive band group. Meanwhile, GNDVI, EXG, EXR (for LWC) and OSAVI, GLI, NGBDI (for CHL) formed the sensitive vegetation index group. Since LWC responds strongly to red/green bands and CHL is more responsive to the red-edge, combining both types of spectral features enhances the simultaneous representation of water status and pigment distribution, mitigating band-specific limitations.
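For reference, the six indices in the sensitive vegetation index group can be computed from band reflectances as sketched below. The formulas are commonly cited formulations and are an assumption here: the study's own index definitions lie outside this section and may differ (for example, OSAVI is sometimes scaled by 1.16, and EXG/EXR are sometimes computed on raw rather than normalized bands). The function name is hypothetical.

```python
import numpy as np

def vegetation_indices(blue, green, red, nir):
    """Compute six vegetation indices from reflectance values or arrays."""
    blue, green, red, nir = (
        np.asarray(a, dtype=float) for a in (blue, green, red, nir)
    )
    total = blue + green + red
    r, g, b = red / total, green / total, blue / total  # chromatic coordinates
    return {
        "GNDVI": (nir - green) / (nir + green),
        "EXG": 2.0 * g - r - b,
        "EXR": 1.4 * r - g,
        "OSAVI": (nir - red) / (nir + red + 0.16),
        "GLI": (2.0 * green - red - blue) / (2.0 * green + red + blue),
        "NGBDI": (green - blue) / (green + blue),
    }

# Illustrative reflectances for a healthy canopy pixel (made-up values).
vi = vegetation_indices(blue=0.04, green=0.10, red=0.06, nir=0.50)
for name, value in vi.items():
    print(f"{name}: {float(value):.4f}")
```

The same function works on whole reflectance rasters, since every operation is element-wise.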
To estimate both parameters efficiently, synergistic models based on PSO-ELM, AHA-ELM, and GWO-ELM were constructed. In the training set, PSO-ELM achieved superior performance (LWC: R² = 0.792, RMSE = 3.157%; CHL: R² = 0.690, RMSE = 0.197 mg/g). On the validation set (Figure 8), LWC prediction remained stable (R² = 0.790, RMSE = 3.292%), while CHL prediction improved significantly (R² = 0.672, RMSE = 0.200 mg/g), confirming the effectiveness of PSO in enhancing ELM generalization. Compared with PLS, the PSO-ELM-based synergistic model improved LWC and CHL prediction accuracy by 15.16% and 53.78%, respectively; compared to standard ELM, improvements reached 20.80% and 25.84%. Relative to independent PSO-ELM models for LWC and CHL, the synergistic strategy further improved accuracy by 6.18% and 4.02%, respectively, demonstrating its advantage in multi-parameter physiological trait modeling.
The AHA-ELM joint model also exhibited strong performance on the training set, achieving R² values of 0.741 and 0.682 for LWC and CHL, with corresponding RMSEs of 3.070% and 0.191 mg/g. Validation set results (Figure 9) showed stable LWC prediction (R² = 0.685, RMSE = 4.906%) and improved CHL inversion accuracy (R² = 0.651, RMSE = 0.229 mg/g). Compared to the single-parameter PLS model, prediction accuracy improved by 8.02% for LWC and 56.06% for CHL. Relative to the ELM model, improvements reached 13.30% and 27.72%, respectively. Compared with the standalone AHA-ELM model, the joint modeling strategy further enhanced LWC and CHL accuracy by 12.27% and 6.90%. The GWO-ELM joint model also yielded favorable results. On the training set, R² values for LWC and CHL were 0.720 and 0.627, with RMSEs of 3.699% and 0.204 mg/g. Validation set performance (Figure 10) reached R² values of 0.703 (LWC) and 0.621 (CHL), with RMSEs of 3.752% and 0.239 mg/g, respectively. Compared with the PLS model, LWC and CHL accuracy improved by 4.96% and 50.57%; compared with the ELM model, improvements were 10.09% and 17.42%. Compared to the standalone GWO-ELM model, the joint strategy further enhanced prediction accuracy by 7.30% (LWC) and 14.63% (CHL).
In summary, ELM models optimized by intelligent algorithms and equipped with a joint inversion strategy significantly outperformed single-task models for simultaneous estimation of LWC and CHL. The integration of multi-source sensitive features and optimization algorithms effectively captured the underlying physiological relationships between parameters, thereby improving both prediction accuracy and generalization capability, demonstrating strong application potential.
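A dual-output PSO-ELM of the kind summarized above can be sketched as follows. This is a toy illustration under stated assumptions, not the study's implementation: the swarm settings (`w`, `c1`, `c2`, particle count), the hidden-layer size, the use of a held-out tuning split as the fitness, the standardized two-column target, and all function names are hypothetical choices.

```python
import numpy as np

def hidden(X, W, b):
    """Sigmoid hidden-layer activations of an ELM."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def make_fitness(X_fit, Y_fit, X_tune, Y_tune, n_hidden):
    """Build a fitness function: joint RMSE over both targets on the tuning split."""
    n_in = X_fit.shape[1]

    def fitness(params):
        W = params[: n_in * n_hidden].reshape(n_in, n_hidden)
        b = params[n_in * n_hidden:]
        beta = np.linalg.pinv(hidden(X_fit, W, b)) @ Y_fit  # one solve, two outputs
        resid = hidden(X_tune, W, b) @ beta - Y_tune
        return float(np.sqrt(np.mean(resid ** 2)))

    return fitness

def pso(fitness, dim, n_particles=20, n_iter=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO over the ELM input weights and biases."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
    start_fit = float(pbest_fit.min())  # best purely random initialization
    for _ in range(n_iter):
        gbest = pbest[pbest_fit.argmin()]
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, -1.0, 1.0)
        fit = np.array([fitness(p) for p in pos])
        better = fit < pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fit[better]
    best = pbest_fit.argmin()
    return pbest[best].copy(), float(pbest_fit[best]), start_fit

# Synthetic stand-in: six index inputs, two correlated targets (LWC-like, CHL-like).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(100, 6))
shared = np.sin(3.0 * X[:, 0]) + X[:, 1] * X[:, 2]
Y = np.column_stack([shared + 0.5 * X[:, 3], 0.8 * shared - 0.5 * X[:, 4]])
Y += rng.normal(0.0, 0.05, size=Y.shape)
Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)  # put both targets on one scale

n_hidden = 8
fitness = make_fitness(X[:70], Y[:70], X[70:], Y[70:], n_hidden)
params, tuned_rmse, start_rmse = pso(fitness, dim=6 * n_hidden + n_hidden)
print(f"best random init RMSE: {start_rmse:.4f}, after PSO: {tuned_rmse:.4f}")
```

Both targets share one hidden layer, so the search is rewarded for features that serve LWC and CHL jointly. Standardizing the targets keeps the percentage-scale variable from dominating the joint RMSE. A final accuracy estimate would need a third split that the optimizer never sees.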
3.4. Comparison of Models Using Different Input Features
As shown in
Figure 11, models using sensitive band inputs generally achieved relatively high R² and low RMSE on the training set, indicating good fitting ability. Among them, PSO-ELM-MLT and AHA-ELM-MLT not only maintained high training accuracy but also significantly outperformed traditional PLS and single ELM models on the validation set. Although models based on spectral bands also demonstrated certain predictive capability, their overall accuracy was slightly lower. Under the sensitive vegetation index input condition, R² and RMSE on both the training and validation sets showed that most models generalized better. In particular, integrated optimization models such as PSO-ELM-MLT and AHA-ELM-MLT achieved high fitting accuracy on the training set (R² = 0.792 and 0.741, respectively), while also delivering excellent performance on the validation set (R² = 0.790 and 0.685; RMSE = 3.292% and 4.096%, respectively).
Regarding the CHL-sensitive spectral band group (upper part of
Figure 12), most models achieved relatively high R² and low RMSE on the training set, indicating good fitting ability. Optimization models like PSO-ELM and AHA-ELM also showed significantly better generalization performance on the validation set compared with traditional PLS and ELM, with PSO-ELM-MLT and AHA-ELM-MLT maintaining a balanced performance between training and validation. This confirms the potential of multi-source collaborative modeling in improving robustness. Under the CHL-sensitive vegetation index input condition (lower part of Figure 12), although a few models had slightly lower R² on the training set, most exhibited more stable performance with lower RMSE on the validation set, indicating that this type of feature offers clear advantages in enhancing generalization ability. In particular, integrated models such as PSO-ELM-MLT and GWO-ELM-MLT achieved relatively high prediction accuracy and stability, confirming the effectiveness of vegetation indices in capturing CHL variation. In conclusion, compared with sensitive spectral bands, sensitive vegetation indices as input features allow a more comprehensive representation of plant physiological traits, contributing to improved model stability and inversion accuracy. These results further validate the advantage of vegetation index fusion in the joint inversion of citrus LWC and CHL, improving both the modeling capability for CHL and the robustness and practical applicability of the models.
3.5. Model Validation and Spatial Heterogeneity Analysis Based on the PSO-ELM Synergistic Inversion
Based on the best-performing PSO-ELM multi-task inversion model, simulated prediction and inversion validation were carried out using field-measured spectral data acquired on 14 July 2024. The results showed clear spatial heterogeneity in citrus LWC and CHL distributions (
Figure 13). High-value zones of LWC (≥70%) and CHL (≥3.0 mg/g) were mainly concentrated in areas with higher irrigation levels, indicating that water supply significantly regulates key leaf physiological parameters. Under the high irrigation treatment (T1), both LWC and CHL reached higher overall levels and exhibited more uniform spatial distributions. The area proportion with LWC ≥ 70% was approximately 60–70%, and that with CHL ≥ 3.0 mg/g was about 50–60%, suggesting that sufficient soil moisture allows citrus leaves to maintain higher water content and chlorophyll levels, thus promoting photosynthesis and growth. Under moderate irrigation (T2), both LWC and CHL decreased notably, with moderately heterogeneous spatial distributions, implying a reduction in resource allocation efficiency due to limited water availability. Under low irrigation (T3), LWC and CHL reached their lowest levels and displayed highly scattered spatial patterns, further confirming that severe water deficit restricts water retention and may inhibit chlorophyll synthesis, negatively impacting overall physiological status and plant health.
In conclusion, the PSO-ELM model demonstrated high accuracy and robustness in the joint inversion of citrus LWC and CHL, confirming its potential for precision crop monitoring and drought stress evaluation.
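Area proportions such as those reported above for LWC ≥ 70% and CHL ≥ 3.0 mg/g can be derived from a predicted map by simple thresholding. The sketch below assumes the prediction is available as a 2-D array with NaN for non-canopy pixels and that each irrigation treatment has a boolean plot mask; the function name, the tiny example map, and the mask are hypothetical.

```python
import numpy as np

def area_fraction(pred_map, threshold, mask=None):
    """Fraction of valid (non-NaN) pixels at or above `threshold`.

    `mask` optionally restricts the calculation to one treatment plot.
    """
    pred_map = np.asarray(pred_map, dtype=float)
    if mask is not None:
        values = pred_map[np.asarray(mask, dtype=bool)]
    else:
        values = pred_map.ravel()
    values = values[~np.isnan(values)]
    if values.size == 0:
        return float("nan")  # no canopy pixels in the requested region
    return float(np.mean(values >= threshold))

# Tiny made-up LWC map (%); NaN marks a non-canopy pixel.
lwc_map = np.array([[72.0, 65.0, 71.0],
                    [70.0, np.nan, 68.0]])
plot_t1 = np.array([[True, True, True],
                    [False, False, False]])  # hypothetical T1 plot footprint

whole = area_fraction(lwc_map, 70.0)
t1_only = area_fraction(lwc_map, 70.0, mask=plot_t1)
print(f"LWC >= 70%: whole map {whole:.2f}, T1 plot {t1_only:.2f}")
```

Applying the same function per treatment mask gives a direct comparison of T1, T2, and T3.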
4. Discussion
In this study, we systematically evaluated the inversion accuracy of PLS, ELM, and three optimized ELM models—PSO-ELM, AHA-ELM, and GWO-ELM. By modifying the ELM framework to support dual-parameter modeling, the model’s ability to handle nonlinear relationships, suppress noise, and mitigate multicollinearity was enhanced. Meanwhile, the physiological correlation between LWC and CHL was leveraged to improve inversion accuracy and model stability, effectively addressing the issues of limited predictive performance and poor robustness in conventional PLS and standard ELM models. Results indicate that models based on sensitive vegetation indices generally outperform those constructed using sensitive spectral bands. This may be attributed to the stronger spectral response of vegetation indices to physiological parameters under specific band combinations, which effectively enhances the signal-to-noise ratio and reduces background interference, thus improving model robustness and prediction performance [
56]. In contrast, single spectral bands are more prone to environmental influence, such as soil background and illumination changes, and are relatively less effective in representing physiological states. Moreover, vegetation indices often have clear physiological relevance and targeted sensitivity mechanisms. For example, GLI is more responsive to changes in chlorophyll, while EXG is highly sensitive to variations in greenness. This type of prior-informed feature fusion enhances feature expression efficiency, reduces model dimensionality, and helps avoid performance degradation due to redundant inputs. Therefore, under conditions of limited spectral resolution, vegetation indices serve as an efficient tool for feature representation and information compression, offering stronger generalization and robustness. They represent one of the key strategies for improving model performance in multispectral inversion tasks.
In this study, the correlations between multispectral bands, vegetation indices, and citrus LWC and CHL were systematically analyzed. Based on the results, sensitive band groups and vegetation index groups were separately selected for modeling LWC and CHL. Among the indices, EXG showed the strongest correlation with LWC, indicating its high sensitivity in representing citrus leaf water status. While most previous studies have employed hyperspectral data for LWC inversion [
57,
58,
59], this study explored the response capacity of multispectral band data for estimating leaf physiological parameters, demonstrating its potential advantages in balancing accuracy and practical applicability in agricultural remote sensing. Additionally, the study found that GLI had the highest correlation with CHL. Interestingly, red-edge-related indices commonly used in traditional CHL inversion research did not show optimal performance in this study [
60]. This could be attributed to the limited spectral resolution of multispectral cameras in the red-edge region, in contrast to hyperspectral instruments. As a result, vegetation indices based on red-edge bands may exhibit reduced sensitivity to chlorophyll content under multispectral conditions, thus constraining their inversion performance. Conversely, under high CHL levels and complex backgrounds, indices derived from red and green bands tend to avoid red-edge saturation and minimize interference from soil and canopy structure, thereby more effectively capturing CHL variations and leading to superior inversion accuracy.
During the process of improving machine learning approaches for citrus LWC and CHL inversion (
Table 6), it was observed that both PSO-ELM and ELM models outperformed the traditional PLS model in terms of inversion accuracy. This difference likely stems from the fact that PLS, as a typical linear regression method, struggles to fully capture the complex nonlinear mapping between LWC and multispectral vegetation features, limiting its ability to interpret correlations among variables [
61,
62]. In contrast, ELM and its improved variant PSO-ELM employ nonlinear activation functions within a feedforward neural network architecture, allowing for better learning of nonlinear relationships and improved predictive performance. The PSO-ELM model demonstrated significantly better performance than the standard ELM in both LWC and CHL inversion tasks. This improvement can be attributed to the limitations of ELM, where input weights and biases are randomly initialized during training, lacking a global optimization mechanism. Such randomness can cause the model to fall into local optima, negatively impacting prediction performance [
63]. By introducing the particle swarm optimization algorithm, PSO-ELM performs a global search over ELM parameters, thereby overcoming the randomness of initialization and enhancing model stability and generalization capability [
64]. Furthermore, the AHA-ELM and GWO-ELM models also achieved higher inversion accuracy compared to the base ELM model. AHA-ELM integrates adaptive weight adjustment with hybrid genetic operators to improve search efficiency and solution diversity, while GWO-ELM mimics the cooperative hunting behavior of grey wolf populations for parameter optimization, balancing global exploration with strong local exploitation ability. Overall, ELM models enhanced by intelligent optimization algorithms exhibit greater robustness and predictive accuracy in modeling complex nonlinear relationships and avoiding local optima, offering a more reliable approach for the joint inversion of multiple physiological variables from multisource data.
From the comparative results (
Figure 6 and
Figure 7), we observed three recurring situations where an optimized ELM variant failed to yield expected improvements: (1) when a model shows substantially higher training accuracy but lower test accuracy, indicating overfitting to the training data; (2) at the extremes of the target ranges (very low or very high LWC/CHL), where sample density is low and prediction errors increased noticeably; and (3) when using raw spectral-band inputs rather than vegetation indices, since band-based inputs are more sensitive to noise, background effects and multicollinearity, which can destabilize the optimizer and the learned mapping. These failure modes can be explained by several factors. First, the limited sample size and uneven distribution across the physiological ranges reduce the model’s ability to generalize—optimizers may fit idiosyncratic patterns in the training set that do not hold in unseen samples. In particular, when samples near the upper or lower physiological limits are scarce, the optimizer may not effectively learn the nonlinear boundary conditions, leading to larger prediction residuals. Second, multispectral sensors have coarser spectral resolution, which weakens subtle pigment- or water-related signals and increases susceptibility to noise; optimized models that aggressively fit training error therefore risk capturing noise rather than true signal. This issue becomes more pronounced under complex illumination or canopy background conditions, where spectral reflectance variability unrelated to physiological changes can mislead the optimizer during fitness evaluation. Third, specific optimization behaviors can lead some metaheuristics to stagnate in suboptimal solutions or to overfit. For example, PSO may prematurely converge when particles cluster too early around local optima, while AHA’s adaptive search can sometimes overweight exploration at the expense of fine-tuning. 
GWO, on the other hand, may maintain a better balance between exploration and exploitation but converge more slowly when the fitness landscape is highly rugged. These insights suggest that while optimization algorithms substantially enhance model robustness under most conditions, their success depends strongly on data diversity, noise level, and the spectral sensitivity of the input features. Future research should therefore incorporate larger and more evenly distributed datasets, as well as adaptive regularization strategies, to further mitigate these failure modes. Additionally, an analysis of potential error sources indicates that the inversion uncertainty mainly arises from three aspects: image resolution, illumination variability, and model-related factors. The UAV multispectral imagery in this study provided a 2 cm spatial resolution, which effectively reduced mixed-pixel effects and improved canopy-level spectral fidelity. Radiometric calibration using reflectance panels and consistent data collection between 10:00 and 12:00 local time minimized illumination-induced fluctuations. Therefore, residual inversion errors are primarily attributed to model limitations—specifically, the random initialization and limited global search ability of the standard ELM model. The integration of PSO, AHA, and GWO optimization algorithms effectively mitigated these model-related errors by improving parameter stability and global convergence, thereby enhancing prediction accuracy and robustness.
In addition to PLS and ELM, other machine learning models such as Random Forest (RF) and Support Vector Machines (SVMs) were also tested in combination with optimization algorithms. However, their optimized versions still yielded lower accuracy compared to the optimized ELM variants. For example, PSO-RF achieved a validation R² of 0.63 for LWC and 0.54 for CHL, while PSO-ELM reached 0.79 and 0.67, respectively. Similarly, optimized SVM variants showed limited improvement and unstable convergence. This result suggests that, under the condition of limited sample size and high-dimensional spectral input, ELM’s structural simplicity and nonlinear fitting ability make it more suitable for multispectral inversion tasks, while RF and SVM may require larger datasets or additional feature engineering to achieve competitive performance. These findings further demonstrate the superiority of the optimization-based ELM framework in capturing nonlinear relationships and ensuring stable inversion accuracy. Overall, the use of metaheuristic optimization algorithms enhances the basic ELM model mainly by stabilizing parameter initialization, guiding the search towards globally optimal solutions, and reducing variance across runs. This results in more consistent and accurate predictions compared to the unoptimized ELM, as evidenced by the higher validation set R² and lower RMSE values achieved by PSO-ELM, AHA-ELM and GWO-ELM in most scenarios. These improvements confirm that the optimizers effectively alleviate the randomness inherent in ELM initialization and strengthen the nonlinear mapping ability of the model.
Application validation was conducted on the developed synergistic inversion models for citrus LWC and CHL, demonstrating stable predictive performance. Under the sensitive vegetation index-driven condition, the PSO-ELM-based synergistic inversion method showed the best overall performance. Compared with the traditional PLS model, the inversion accuracy for LWC and CHL improved by 15.16% and 53.78%, respectively. Compared to the single ELM model, the improvements were 20.80% and 25.84%, respectively. Furthermore, relative to the PSO-ELM model used for single-variable inversion, the synergistic inversion strategy further enhanced accuracy by 6.18% for LWC and 4.02% for CHL. The AHA-ELM synergistic inversion model also performed well on the validation set, achieving improvements of 12.27% and 6.90% in LWC and CHL prediction accuracy over single-target inversion, respectively. This highlights the advantage of adaptive optimization strategies in enhancing model stability and global search capability. A similar trend was observed in the GWO-ELM synergistic model, with prediction accuracy improvements of 7.30% for LWC and 14.63% for CHL. These results suggest that the grey wolf optimization algorithm, while maintaining global search ability, also enhances the nonlinear model’s capacity to capture complex interaction information between dual targets. Overall, all optimized models outperformed their single-variable counterparts under the synergistic inversion framework, confirming the effectiveness and adaptability of the dual-parameter strategy in improving modeling accuracy and generalization capability. From a plant physiology perspective, LWC and CHL are tightly linked during citrus growth. Both are co-regulated by similar environmental stressors (e.g., water stress, light intensity) and exhibit a high degree of coupling in physiological processes such as photosynthetic efficiency, water metabolism, and nutrient accumulation [
65]. Therefore, introducing a synergistic inversion framework for LWC and CHL in multispectral modeling allows for the exploitation of their complementary spectral responses, enhancing the model’s ability to characterize the physiological status of leaves comprehensively [
66]. These findings validate the effectiveness of the synergistic strategy in capturing co-variation patterns of the two variables and provide both theoretical foundation and methodological support for high-precision remote sensing monitoring of fruit tree health.
Despite the encouraging results, several limitations should be acknowledged. First, the dataset used in this study was collected from a single orchard, which may limit the generalization of the models to different growth stages, environmental conditions, or crop varieties. Second, the UAV multispectral sensor provides only a limited number of broad bands, which constrains the ability to capture subtle spectral variations; the inclusion of hyperspectral data or fusion with RGB/thermal imagery could further enhance model performance. Third, although metaheuristic optimization improved ELM performance, the algorithms may still suffer from parameter sensitivity and computational cost when applied to larger-scale datasets. Future studies will address these issues by incorporating multi-site and multi-temporal datasets, validating additional data sources, and refining optimization strategies for scalability and robustness.
Furthermore, the results reinforce the significant advantages of the synergistic inversion framework in enhancing model prediction accuracy—particularly under multisource information fusion and sensitive feature co-driving conditions—by delivering superior modeling capacity and generalizability. Although this study constructed inversion models based solely on sensitive band groups and vegetation index groups, it effectively reflected the spectral response characteristics and variation patterns of LWC and CHL. However, to further improve model robustness and adaptability, future research should incorporate more diverse and multidimensional features (e.g., structural parameters, thermal infrared indicators, texture characteristics), alongside ground-truth data across different growth stages, for model optimization. Rich and varied datasets can help reduce the risk of overfitting and minimize uncertainty in predictions [
67], thereby improving the overall applicability and scalability of inversion models in complex agricultural scenarios.