Article

Hyperspectral Estimation of Chlorophyll Density in Populus pruinosa Incorporating Leaf Water Content

1
College of Agriculture, Tarim University, Alar 843300, China
2
Key Laboratory of Genetic Improvement and Efficient Production for Specialty Crops in Arid Southern Xinjiang Production and Construction Corps, Tarim University, Alar 843300, China
3
College of Life Science and Technology, Tarim University, Alar 843300, China
4
College of Resource and Environment, Huazhong Agricultural University, Wuhan 430070, China
*
Author to whom correspondence should be addressed.
Forests 2026, 17(6), 692; https://doi.org/10.3390/f17060692 (registering DOI)
Submission received: 4 April 2026 / Revised: 22 May 2026 / Accepted: 1 June 2026 / Published: 11 June 2026

Abstract

Populus pruinosa Schrenk is a keystone species in arid riparian ecosystems, where its physiological status is critical for biodiversity and soil stabilization. In this study, spectral reflectance, leaf chlorophyll density (CHD), and leaf water content (LWC) were measured for Populus pruinosa in the Tarim River headwater region and Awati County, Xinjiang, from July to October 2023. The aim was to estimate CHD using hyperspectral data combined with machine learning and to evaluate the effect of LWC on model accuracy. Raw spectra were preprocessed using Savitzky–Golay (SG) smoothing and continuous wavelet transform (CWT). A two-step feature selection strategy comprising Random Frog and iterative retaining informative variables (IRIV) was applied to extract characteristic bands. Three machine learning models—support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost)—were developed for CHD estimation with and without LWC as an additional input. Incorporating LWC consistently improved the predictive performance of all models. Without LWC, the RF model achieved the best accuracy (training R2 = 0.842, test R2 = 0.830), whereas after LWC integration, XGBoost reached the optimal performance (training R2 = 0.871, test R2 = 0.865). SHAP analysis identified the 687 nm wavelength and its interaction with LWC as the most important predictors. These results indicate that combining spectral information with LWC effectively improves the accuracy and stability of CHD estimation for Populus pruinosa, providing a reliable non-destructive approach for assessing forest ecosystem physiological status—a key contribution to the sustainable management of arid riparian forests.

1. Introduction

Populus pruinosa Schrenk, a species belonging to the Populus genus, is endemic to the riparian zones of the Tarim River, Kashgar River, and Hotan River in the Xinjiang Uygur Autonomous Region, China. Distinguished from other Populus species by its exceptional tolerance to high temperatures and saline–alkaline conditions, this critical desert plant plays an indispensable role in ecological restoration and reconstruction and is essential for maintaining the ecological balance of desert oases [1]. As a third-level nationally protected endangered species, Populus pruinosa is currently experiencing a gradual decline in both population size and distribution range, which endows it with great research value—particularly in the fields of genetic diversity conservation and ecological significance exploration. For example, Thomas and Lang [2] reported that overexploitation of water resources has caused a severe reduction in poplar forest area in the Tarim Basin over the past decades, and that Populus pruinosa stands have become older, sparser, and ultimately doomed to die off with increasing distance from the water table. Given its ecological importance and endangered status, accurate monitoring of its physiological status is crucial. The leaf chlorophyll density (CHD), as a key indicator of plant health, has become a core focus of related research.
Current methods for leaf chlorophyll estimation include traditional chemical extraction, vegetation indices, radiative transfer models such as PROSPECT, and machine learning algorithms like random forest [3]. Chemical extraction is accurate but destructive and labor-intensive, whereas vegetation indices are rapid but species-specific and prone to interference from water stress in arid riparian zones. Radiative transfer models offer interpretability yet suffer from low inversion accuracy, with R2 typically ranging from 0.6 to 0.8, while machine learning approaches achieve higher accuracy, often with R2 exceeding 0.9, but face challenges such as overfitting, poor generalizability, and confounding by leaf water content [4]. Integrating spectral data with leaf water content has been shown to mitigate water-induced spectral variations [5]; for instance, the fusion of hyperspectral reflectance with leaf water content explained over 90% of the variance in chlorophyll fluorescence parameters under dehydration stress [6]. However, such integration remains rarely explored for Populus pruinosa in arid environments, where water stress critically affects leaf spectra and chlorophyll dynamics.
For Populus pruinosa, whose physiological status monitoring relies heavily on CHD as a key health indicator, chlorophyll, which determines CHD, is predominantly composed of two isoforms: chlorophyll a (Chl a) and chlorophyll b (Chl b). These pigments play pivotal roles in photosynthetic capacity through the absorption of light and the conversion of photonic energy, water, and carbon dioxide into carbohydrates and oxygen. The relative abundances of Chl a and Chl b serve as sensitive biomarkers for assessing vegetation responses to environmental perturbations (such as drought and salinity in arid riparian zones) and reflect the intrinsic physiological states of plants [7]. Notably, CHD changes dynamically during plant ontogeny, particularly under conditions of environmental stress (e.g., drought, salinity) or senescence. Therefore, quantitative measurements of CHD provide a robust proxy for evaluating plant nutritional status and act as a critical diagnostic parameter for phytosanitary monitoring of Populus pruinosa. Since the spectral reflectance signatures of leaves are fundamentally determined by their photosynthetic pigment composition, reflectance spectroscopy has become a reliable nondestructive methodology for CHD quantification [8].
In recent years, hyperspectral remote sensing has been widely applied in CHD inversion, with common methods including vegetation index-based models, radiative transfer models, and machine learning algorithms. Although these methods have achieved certain success in CHD monitoring of general vegetation, they have significant limitations when applied to Populus pruinosa in arid regions. Specifically, the riparian zones where Populus pruinosa grows are characterized by scarce water resources and frequent water stress, which can alter leaf spectral reflectance by affecting LWC and chlorophyll degradation, thereby interfering with the accuracy of CHD hyperspectral inversion. However, most existing studies have only relied on hyperspectral data alone, ignoring the interference of water stress on spectral signatures [9,10]. For example, Zarco-Tejada et al. [11] demonstrated that water stress significantly interferes with chlorophyll-related spectral signals when using conventional spectral indices alone, leading to reduced inversion accuracy under arid conditions. Consequently, reliance on hyperspectral data alone results in insufficient inversion accuracy and poor model stability in arid environments. To fill this research gap, this study aims to improve the accuracy and stability of CHD inversion for Populus pruinosa, and its core research objective is to propose an optimized multimodal band selection algorithm combined with hyperspectral-LWC data fusion, so as to realize the accurate nondestructive monitoring of foliar CHD in this species under arid water stress conditions. 
The innovations of this study lie in four aspects: first, it specifically targets Populus pruinosa, an endangered species in arid riparian zones, which is more targeted than general vegetation CHD inversion studies; second, it breaks through the limitation of traditional studies relying solely on hyperspectral data [12], and integrates LWC data to eliminate the interference of water stress on spectral signals; third, it optimizes the multimodal band selection algorithm to further extract effective spectral features related to CHD, improving the inversion accuracy and model generalization ability; fourth, it employs SHapley Additive exPlanations (SHAP) plots to analyze the inversion promotion effect between LWC and characteristic bands, quantitatively revealing the interaction mechanism of LWC and spectral features in CHD inversion, which enhances the interpretability of the fusion model and provides a theoretical basis for the rational application of LWC and spectral data in arid zone vegetation CHD monitoring. Hyperspectral techniques combined with various modeling approaches have been successfully applied to estimate chlorophyll across different plant species. For instance, machine learning algorithms such as random forest and support vector regression have demonstrated high predictive accuracy for chlorophyll content in citrus leaves [12]; radiative transfer models, including PROSPECT integrated with active learning, have been used for potato leaf chlorophyll estimation [13]; and spectral vegetation indices have been developed to estimate the chlorophyll a/b ratio in wheat under contrasting water availability conditions [14]. These studies collectively demonstrate the potential of hyperspectral methods for non-destructive chlorophyll monitoring. However, their direct applicability to water-stressed Populus pruinosa in arid riparian zones—where both leaf water content and chlorophyll are strongly affected by seasonal water scarcity—remains unexplored.

2. Materials and Methods

2.1. Overview of the Study Area

The study area comprises two sites: the Awati County Populus pruinosa Forest Reserve and the Tarim River headwater region, both located in Xinjiang, China (Figure 1). The Awati Reserve lies between 40°15′53.72″–40°18′23.13″ N and 80°19′54.71″–80°24′13.10″ E and has a typical continental arid climate with scarce precipitation, strong evaporation, and water resources mainly supplied by glacier meltwater [15]. The Tarim River headwater region is located at 40°27′ N, 80°56′ E in Aral City, where the three rivers meet. This area belongs to a warm-temperate extreme continental arid desert climate, characterized by drought and low rainfall, with a high-flow period from July to September due to seasonal flood overflow [15]. Within the Awati Reserve, trees older than 200 years account for about 18% of the population, with an average height of 12.3 m and trunk diameter of 48 cm [16]. All leaf samples were collected from this reserve between July and October 2023.

2.2. Data Collection

To ensure that the collected samples covered a gradient of CHD, samples were collected on a fixed date (the 20th) in each of four months, July, August, September, and October, which fall within the active growing season of Populus pruinosa when leaf photosynthetic activity exhibits measurable variation [17]. Sampling was carried out only in clear, windless weather with cloud cover below 20%. Within each sampling area, only Populus pruinosa trees that met the sampling requirements were sampled; these requirements concerned tree age, pest and disease status, and growth uniformity. Tree age was estimated from diameter at breast height (DBH): only trees with a DBH between 35 cm and 55 cm were selected, which corresponds to an estimated age range of approximately 150 to 250 years based on previously published growth equations for Populus pruinosa in the Tarim Basin [16]. Pest and disease status was assessed by visual inspection of leaves, branches, and stems, focusing on symptoms such as leaf spots, holes, wilting, cankers, or visible insect colonies. No molecular or rapid diagnostic tests were performed because the study area has a historically low incidence of severe pest problems. “Relatively small growth difference” was operationally defined as a coefficient of variation in DBH of less than 15% among selected trees within the same sampling point. In practice, this criterion excluded suppressed or extremely dominant individuals and ensured that all sampled trees had comparable crown size and stem form. Five sampling points were set, and at each point 10 representative leaves were taken from each of two trees (20 leaves per point), resulting in 100 leaves per month.
The same sampling procedure was repeated in July, August, September, and October, yielding a total of 400 leaf samples across the four months. All 400 samples were combined for subsequent model training and testing. Hyperspectral data require dimensionality reduction to avoid overfitting [18]; our Random Frog-IRIV feature selection effectively reduces input dimensionality. Similar moderate sample sizes have been successfully used for chlorophyll estimation with machine learning after feature reduction [19]. Therefore, the 400-sample dataset is statistically adequate for this study. Leaves were cut with high-branch pruning shears. After collection, leaf samples were immediately placed in portable insulated coolers with ice packs to maintain a temperature of approximately 4 °C. The coolers were transported to the laboratory by car within 4–6 h, taking care to avoid physical damage to the samples. Upon arrival, the samples were immediately preserved in 95% ethanol for chlorophyll extraction. All samples were processed on the same day as collection, ensuring that the time between collection and ethanol treatment did not exceed 8 h. Spectral data were collected with an ASD FieldSpec HandHeld2 portable spectroradiometer (ASD Inc., Boulder, CO, USA), which has a resolution of 1 nm, a spectral range of 325–1075 nm, rapid scanning (<2 s per scan), and a battery life of more than 8 h. The leaf spectral reflectance of the Populus pruinosa samples was measured with this device during periods of stable natural light to reduce the effect of light intensity fluctuations on the reflectance. The measurement angle was fixed at 60° relative to the incident angle, and the distance between the sensor and the leaf was kept the same for all leaves.
During the measurement, the temperature was maintained between 25 °C and 35 °C, and care was taken to avoid water condensation on the leaves to prevent the effect of high humidity on reflectance.

2.3. Measurement of Leaf Physiological Indicators

2.3.1. Measurement of CHD

After the collected leaves of Populus pruinosa were cut into pieces, they were extracted in 95% ethanol by immersion in a refrigerator at 4 °C for 48 h. The absorbance of the extract was then measured using a spectrophotometer, and the corresponding concentration and density values were calculated using the following formulas:
$$A_{\mathrm{chl}\,a} = 0.01373\,A_{663} - 0.000897\,A_{537} - 0.003046\,A_{647} \tag{1}$$
$$A_{\mathrm{chl}\,b} = 0.02405\,A_{647} - 0.004305\,A_{537} \tag{2}$$
$$SLW = \frac{LDW}{LA} \tag{3}$$
$$CHD_{a} = A_{\mathrm{chl}\,a} \cdot SLW \tag{4}$$
$$CHD_{b} = A_{\mathrm{chl}\,b} \cdot SLW \tag{5}$$
$$CHD_{a+b} = CHD_{a} + CHD_{b} \tag{6}$$
where $A_{537}$, $A_{663}$, and $A_{647}$ represent the absorbance values measured using a 1 cm cuvette at wavelengths of 537 nm, 663 nm, and 647 nm, respectively; $LDW$ represents the dry matter weight of the leaves measured after drying (g); $LA$ represents the projected area of the leaves (cm2); $SLW$ represents the specific leaf weight (g/cm2); $CHD_{a}$ and $CHD_{b}$ represent the density values of chlorophyll a and chlorophyll b (mg/cm2), respectively; and $CHD_{a+b}$ represents the total density value (mg/cm2).
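The calculation chain from absorbance to chlorophyll density can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name and the example absorbance, dry weight, and leaf area values are hypothetical, and the sketch assumes that the second and third absorbance terms in the chlorophyll formulas are subtracted.

```python
def chlorophyll_density(a537, a647, a663, leaf_dry_weight_g, leaf_area_cm2):
    """Return chlorophyll a, b and total density from extract absorbances.

    Assumption: the absorbance terms after the first one are subtracted.
    """
    # Pigment terms from the absorbance readings (1 cm cuvette).
    chl_a = 0.01373 * a663 - 0.000897 * a537 - 0.003046 * a647
    chl_b = 0.02405 * a647 - 0.004305 * a537

    # Specific leaf weight: dry weight per unit projected leaf area (g/cm2).
    slw = leaf_dry_weight_g / leaf_area_cm2

    # Densities are the pigment terms scaled by the specific leaf weight.
    chd_a = chl_a * slw
    chd_b = chl_b * slw
    return {"slw": slw, "chd_a": chd_a, "chd_b": chd_b, "chd_total": chd_a + chd_b}


# Hypothetical readings for a single leaf, for illustration only.
example = chlorophyll_density(a537=0.1, a647=0.3, a663=0.6,
                              leaf_dry_weight_g=0.2, leaf_area_cm2=20.0)
```

Returning the intermediate specific leaf weight alongside the densities makes it easier to check each step against laboratory records.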

2.3.2. Measurement of the LWC

The LWC was determined using the drying method. The sample was placed in an oven at 105 °C until it reached a constant weight [20]. The percentage of weight loss (the reference method) was calculated according to Equation (7).
$$\varphi\,(\%) = \frac{W_{1} - W_{2}}{W_{1}} \times 100 \tag{7}$$
where $W_{1}$ represents the fresh weight of the leaf, and $W_{2}$ represents the dry weight of the leaf.

2.4. Data Processing

2.4.1. Spectral Data Preprocessing

The raw spectral signals of the collected samples are subject to several sources of interference, such as baseline noise caused by random fluctuations of the instrument detector, stray light caused by uneven or contaminated optical components, and other environmental factors [21]. Noise, baseline drift, and environmental interference therefore need to be removed to increase the signal-to-noise ratio and model accuracy. Preprocessing (such as smoothing and normalization) can reduce irrelevant variation and enhance the informative features in the spectral data [22]. In this study, the raw spectra of Populus pruinosa leaves were first smoothed with a Savitzky–Golay (SG) filter and then subjected to a continuous wavelet transform (CWT). SG smoothing filters out high-frequency noise by fitting a polynomial to the data in a local window while retaining the spectral trend. The SG smoothing was performed with a window size of 11 and a polynomial order of 3. This method is fast and its parameters are easy to adjust, but it may weaken fine spectral details [23]. The CWT then decomposes the smoothed signal at multiple scales with joint time–frequency localization (here, in wavelength and scale): it can suppress residual noise and extract weak features (such as minute absorption peaks) that are masked by smoothing while improving the resolution. The CWT was applied using a Gaussian mother wavelet of order 4 (gaus4) with a decomposition scale of 4. The combination of the two methods provides both global smoothing and local detail enhancement and is suitable for precise analysis of complex spectra, especially for improving stability and sensitivity in quantitative analysis.
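The two preprocessing steps can be illustrated with a small NumPy sketch. It is a hand-rolled approximation under stated assumptions, not the authors' pipeline: the window size (11), polynomial order (3), wavelet family (fourth derivative of a Gaussian), and scale (4) come from the text, whereas the edge padding, the ±5 wavelet support, the unit-energy normalisation, and all function names are assumptions. Library routines may use different sign and normalisation conventions for the wavelet.

```python
import numpy as np


def savgol_kernel(window=11, polyorder=3):
    """Savitzky-Golay weights: least-squares polynomial fit evaluated at the window centre."""
    half = window // 2
    x = np.arange(-half, half + 1, dtype=float)
    design = np.vander(x, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse maps the window samples to the fitted value at x = 0.
    return np.linalg.pinv(design)[0]


def gaus4_kernel(scale=4, support=5.0):
    """Fourth derivative of exp(-t^2), sampled at the given scale (assumed conventions)."""
    half = int(np.ceil(support * scale))
    t = np.arange(-half, half + 1) / scale
    psi = (16 * t**4 - 48 * t**2 + 12) * np.exp(-t**2)
    psi = psi - psi.mean()                   # enforce zero mean after sampling
    return psi / np.sqrt(np.sum(psi**2))     # unit energy (assumed normalisation)


def filter_same(signal, kernel):
    """Correlate with edge padding so the output has the same length as the input."""
    half = len(kernel) // 2
    padded = np.pad(np.asarray(signal, dtype=float), half, mode="edge")
    return np.correlate(padded, kernel, mode="valid")


def preprocess(reflectance):
    """SG smoothing followed by a single-scale CWT of the smoothed spectrum."""
    smoothed = filter_same(reflectance, savgol_kernel())
    coefficients = filter_same(smoothed, gaus4_kernel())
    return smoothed, coefficients


# Synthetic spectrum on the instrument grid (325-1075 nm at 1 nm), for illustration only.
wavelengths = np.arange(325, 1076)
spectrum = 0.3 + 0.1 * np.sin(wavelengths / 50.0)
smoothed, cwt_coefficients = preprocess(spectrum)
```

Because the wavelet kernel has zero mean, a flat spectrum yields coefficients of zero, so the transformed signal responds to local curvature such as absorption features rather than to the overall reflectance level.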

2.4.2. Characteristic Bands Selection Method

Hyperspectral datasets exhibit substantial spectral dimensionality complexity, necessitating feature band selection to mitigate information redundancy and noise propagation, which may compromise analytical fidelity [24]. This critical preprocessing step serves three primary functions: (1) improving model generalizability through dimensionality reduction, (2) isolating spectral regions that are sensitive to target biochemical parameters (e.g., LWC and CHD), and (3) enabling robust physical parameter inversion by eliminating collinear variables [9]. The present study implements two advanced feature selection algorithms: the Random Frog method and the iterative retaining informative variables (IRIV) approach.
Random Frog employs a stochastic global search mechanism based on Monte Carlo sampling, simulating biomimetic “frog leap” dynamics to probabilistically identify optimal spectral bands [25]. This technique synergizes variable importance projection with adaptive probability weighting, revealing efficacy in high-dimensional hyperspectral processing.
Similarly, the IRIV approach operates through an iterative optimization process that dynamically eliminates redundant variables while preserving informative spectral features [26]. The hybrid architecture of the IRIV approach integrates permutation importance assessment with forward–backward stepwise selection, explicitly accounting for multivariate synergistic effects in spectral response patterns.

2.5. Construction and Evaluation Criteria of the Prediction Model

2.5.1. Methodology for Model Construction

The CHD prediction model employed extreme gradient boosting (XGBoost), Random Forest (RF), and support vector machine (SVM) algorithms, which were selected for their complementary capabilities in hyperspectral data processing. The gradient-boosted tree architecture of XGBoost optimizes predictive accuracy through parallelized tree construction and L1/L2 regularization, effectively balancing spectral feature interactions while minimizing overfitting risks via shrinkage (η = 0.1) and subsampling (colsample_bytree = 0.8) mechanisms [27]. The RF ensemble of bootstrap-aggregated decision trees (ntrees = 500) leverages out-of-bag error estimation and permutation-based variable importance ranking, which is particularly advantageous for identifying LWC–chlorophyll spectral covariance patterns through its inherent resistance to high-dimensional noise [28]. The SVM kernel-induced feature space transformation, which uses radial basis functions (γ = 0.01), projects nonlinear spectral relationships into linearly separable hyperplanes while maintaining the generalizability via soft-margin optimization (C = 10) [29]. The model inputs consisted of the measured LWC obtained by laboratory oven-drying and the vegetation-sensitive bands identified through the Random Frog-IRIV hybrid selection. The experimentally determined LWC values were used as direct input features to the models, not as derived or predicted variables. These inputs were used to predict CHD calibrated against laboratory spectrophotometric measurements. To evaluate the influence of LWC integration, two datasets were prepared: Dataset A without LWC and Dataset B with LWC. Each dataset was divided into training and test sets at a ratio of 2:1, and all three models were implemented using Jupyter Notebook (version 6.1.4).
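The construction of the two input sets and the 2:1 split can be sketched as follows. This is an illustrative outline only: the function names, the random seed, and the decision to reuse one split for both datasets are assumptions, and the hyperparameter dictionary simply restates the values given above using the keyword names of scikit-learn and the xgboost scikit-learn wrapper, which the text does not confirm were the libraries used.

```python
import random

# Hyperparameters quoted in the text; the keyword names are an assumption about tooling.
HYPERPARAMS = {
    "xgboost": {"learning_rate": 0.1, "colsample_bytree": 0.8},
    "random_forest": {"n_estimators": 500},
    "svm": {"kernel": "rbf", "gamma": 0.01, "C": 10},
}


def build_datasets(band_features, lwc):
    """Dataset A: selected bands only. Dataset B: selected bands plus measured LWC."""
    dataset_a = [list(row) for row in band_features]
    dataset_b = [list(row) + [water] for row, water in zip(band_features, lwc)]
    return dataset_a, dataset_b


def split_two_to_one(n_samples, seed=0):
    """Shuffle sample indices and split them 2:1 into training and test indices."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * 2 / 3)
    return indices[:n_train], indices[n_train:]


# Placeholder inputs: 400 samples with 12 band features each, plus one LWC value per sample.
bands = [[0.01 * i + 0.001 * j for j in range(12)] for i in range(400)]
lwc_values = [50.0 + 0.01 * i for i in range(400)]
dataset_a, dataset_b = build_datasets(bands, lwc_values)
train_idx, test_idx = split_two_to_one(len(bands))
```

Applying the same index split to both datasets would keep the with-LWC and without-LWC comparisons on identical samples, which is one reasonable way to isolate the effect of the added feature.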

2.5.2. Model Performance Evaluation Metrics and Calculation Methods

The constructed models were evaluated using the coefficient of determination (R2), root mean square error (RMSE), and ratio of performance to deviation (RPD). These three indicators are used mainly to test the goodness of fit and the estimation ability of the models. The calculation formulas are shown in Equations (8)–(10). Moreover, to explain the contribution of each feature within the model, this study uses SHAP plots to display the Shapley value of each feature. The calculation formula is shown in Equation (11) [30]:
$$R^{2} = \frac{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})\right]^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\,\sum_{i=1}^{n}(y_{i}-\bar{y})^{2}} \tag{8}$$
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n}(y_{i}-x_{i})^{2}}{n}} \tag{9}$$
$$RPD = \frac{\sqrt{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}/n}}{RMSE} \tag{10}$$
$$\phi_{i} = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!}\left[f(S \cup \{i\}) - f(S)\right] \tag{11}$$
where $x_{i}$ represents the predicted value; $\bar{x}$ represents the average of the predicted values; $y_{i}$ represents the measured value; $\bar{y}$ represents the average of the measured values; $n$ represents the number of samples; $\phi_{i}$ represents the Shapley value of the i-th feature; $F$ represents the set of all features; $S$ represents the feature subset; and $f(S)$ represents the model output based on the subset $S$.
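The three accuracy metrics can be computed directly from paired predicted and measured values. The sketch below follows the definitions given here, with R2 as the squared correlation between predicted and measured values and RPD as the population standard deviation of the predicted values divided by RMSE; the function name and the four example values are hypothetical.

```python
import math


def evaluate(predicted, measured):
    """Return (R2, RMSE, RPD) for paired predicted (x) and measured (y) values."""
    n = len(predicted)
    x_bar = sum(predicted) / n
    y_bar = sum(measured) / n

    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(predicted, measured))
    s_xx = sum((x - x_bar) ** 2 for x in predicted)
    s_yy = sum((y - y_bar) ** 2 for y in measured)

    r2 = s_xy ** 2 / (s_xx * s_yy)                       # squared correlation
    rmse = math.sqrt(sum((y - x) ** 2 for x, y in zip(predicted, measured)) / n)
    rpd = math.sqrt(s_xx / n) / rmse                     # standard deviation over RMSE
    return r2, rmse, rpd


# Hypothetical values, for illustration only.
r2, rmse, rpd = evaluate([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

Note that RPD is undefined when RMSE is zero, so a perfect fit needs separate handling if it can occur.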

3. Results

3.1. The Variation Characteristics of CHD in the Leaves of Populus pruinosa Trees in Different Months

The maximum CHD was 0.1095 mg/cm2, the minimum value was 0.0068 mg/cm2, the average value was 0.0450 mg/cm2, and the variance was 0.000453 (mg/cm2)2 (Figure 2). The observed CHD differences among samples within the same month may reflect spatial heterogeneity in micro-environmental conditions across the five sampling points in the Awati Reserve, such as variations in local groundwater depth, soil salinity, and light availability. Additionally, despite careful selection of trees with similar diameters at breast height and apparent health, intrinsic physiological and genetic differences among individuals could also contribute to the inter-sample variability. The trend in CHD was essentially consistent across samples: CHD decreased continuously from July to October, most markedly from September to October. This finding indicates that the CHD of the Populus pruinosa leaves collected in this study conforms to the CHD change pattern of Populus pruinosa leaves in the natural environment; that is, the CHD increases significantly during the growing season, and the chlorophyll decomposes rapidly beginning in autumn, resulting in a rapid decrease in CHD [31].

3.2. Spectral Reflectance Characteristics of Populus pruinosa Leaves in Different Months

Across the sampling months, the structure and physiological characteristics of the leaves of naturally growing Populus pruinosa change regularly with the prevailing climate. These changes in turn affect the CHD and spectral reflectance of the leaves. Each spectral reflectance curve shown in Figure 3 represents the mean value calculated from all leaf samples collected in that month. The monthly curves generally followed the same pattern, with the samples collected in August showing higher reflectance across the full spectrum than the leaves collected in other months. The differences were most obvious in the visible light region (400–700 nm), especially in the 500–700 nm range. The curves show the “peak-valley” shape typical of green plants: a relatively obvious peak appears near 550 nm, and two valleys appear in the visible light range, namely, the “blue valley” near 485 nm and the “red valley” near 690 nm. This phenomenon is caused mainly by the absorption of red light and the reflectance of green light by chlorophyll and other pigments in Populus pruinosa leaves [32]. Within the range of one peak and two valleys, the spectral reflectance was highest in August, followed by October and July, and varied considerably in these three months, whereas the spectral reflectance in September was lower and varied less.

3.3. Results of Characteristic Bands Selection

Because the collected spectra contain a large number of bands, some of which may interfere with the prediction modeling of CHD, this study uses a phased optimization strategy to select the characteristic bands and thereby increase the accuracy and robustness of the CHD inversion model. The screening process is described below.
Initial screening with Random Frog
Random Frog was applied to the full-band spectral data (325–1075 nm, a total of 751 bands) after Savitzky–Golay smoothing and CWT processing, with the maximum number of latent variables set to 7. Through 1000 iterations of Random Frog sampling, the probability of each band being included in the optimal variable subset (selection probability, SP) was calculated. A threshold of SP ≥ 25% was set, and 21 significant bands (about 2.8% of the total bands) were selected. The selected bands were concentrated mainly in the visible light region (400–780 nm) and the near-infrared region (780–1075 nm). The SP values of the 734 nm band and the 858 nm band were the highest (56.3% and 51.8%, respectively), indicating the strongest sensitivity to the change in CHD. Figure 4a shows the probability of each band being selected during the Random Frog screening process. Moreover, to verify that the selected variables were informative, a partial least squares (PLS) regression model was built using the optimal number of latent variables from the Random Frog screening process [33]. Although the RMSEP reached its global minimum at approximately 5–6 selected variables (Figure 4b), the final set of characteristic bands was not determined solely by RMSEP minimization. Instead, the Random Frog algorithm ranked bands by SP, and we retained those with SP ≥ 25% (21 bands). This subset was then further refined by IRIV to obtain the final bands used in the prediction models. The temporary decrease in RMSEP between 70 and 90 variables is a stochastic artifact of the reversible jump Markov chain Monte Carlo (RJMCMC)-based search [34]. Such artifacts are inherent to these algorithms because they explore the variable space probabilistically and may temporarily accept suboptimal solutions to avoid local optima. This phenomenon does not indicate a superior variable subset.
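The selection probability and the thresholding step can be illustrated with a short Python sketch. It covers only the bookkeeping that follows the stochastic search, not the Random Frog search itself: the function names are hypothetical, and the eight toy "iterations" stand in for the variable subsets that a real run would record at each of its 1000 iterations.

```python
def selection_probability(subsets_per_iteration, n_bands):
    """Fraction of iterations in which each band index appears in the retained subset."""
    counts = [0] * n_bands
    for subset in subsets_per_iteration:
        for band_index in set(subset):
            counts[band_index] += 1
    n_iterations = len(subsets_per_iteration)
    return [count / n_iterations for count in counts]


def bands_above_threshold(wavelengths, probabilities, threshold=0.25):
    """Keep (wavelength, SP) pairs whose selection probability meets the threshold."""
    return [(wl, sp) for wl, sp in zip(wavelengths, probabilities) if sp >= threshold]


# Toy record of retained band indices per iteration (illustrative, not real data).
toy_subsets = [[2], [2, 3], [0, 2], [2, 3], [1], [2], [3], [2]]
toy_wavelengths = [685, 686, 687, 688]

sp = selection_probability(toy_subsets, n_bands=len(toy_wavelengths))
selected = bands_above_threshold(toy_wavelengths, sp, threshold=0.25)
```

In this toy record, the third and fourth bands pass the 25% threshold, while the two bands that appear in only one of eight iterations are discarded.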
Secondary optimization with iterative retaining informative variables (IRIV)
With the 21 bands initially selected through Random Frog as the input, IRIV filtering was performed to further eliminate redundant and interfering bands. For IRIV filtering, the maximum number of principal factors and the number of cross-validation iterations were set to 10 and 5, respectively, after multiple tests. The IRIV algorithm filtered the characteristic bands for a total of 3 rounds, as shown in Figure 5a. In each iteration, a p value is calculated for each variable. As the number of iterations increases, redundant and low-contributing spectral bands are progressively eliminated. Figure 5b shows the bands ultimately selected by the IRIV algorithm. A total of 12 characteristic bands, namely, 414, 447, 564, 565, 590, 634, 635, 687, 693, 703, 802, and 858 nm, were selected. Compared with either filtering method used alone, the Random Frog-IRIV combination shows significant advantages in cross-validation. The method integrates the global search ability of Random Frog with the local optimization of IRIV, further reducing the number of redundant bands and effectively balancing the information content of the spectral features against the complexity of the model.

3.4. Construction of a Spectral Prediction Model for CHD in the Leaves of Populus pruinosa

3.4.1. Building a Model Without Integrating the LWC

After the characteristic bands were selected with the Random Frog-IRIV method, the three models performed differently. Judged by R2, RMSE, and RPD, the RF model outperformed the SVM and XGBoost models when the LWC was not integrated. The results are shown in Figure 6: for the training set, the R2 is 0.842, the RMSE is 0.0100, and the RPD is 2.513, whereas for the test set, the R2 is 0.830, the RMSE is 0.0093, and the RPD is 2.425.

3.4.2. Building a Model Integrating the LWC

After the LWC was integrated, the performance of all three models improved to varying degrees. As shown in Table 1 and Table 2, modeling on the sample dataset with integrated LWC achieved significantly better results than modeling without it. The XGBoost model improved the most and surpassed the RF model, making it the best-performing model among the three. The results are shown in Figure 7. For the training set, the R2 is 0.871, an increase of 0.061; the RMSE is 0.0090, a decrease of 0.0017; and the RPD is 2.782, an increase of 0.486. For the test set, the R2 is 0.865, an increase of 0.059; the RMSE is 0.0083, a decrease of 0.0021; and the RPD is 2.725, an increase of 0.456.

3.4.3. Model Performance Explanation

SHAP (Shapley additive explanations) is an interpretability method based on game theory that is used to explain the predictions of machine learning models. SHAP quantifies the contribution of each feature to the model output (its Shapley value), providing intuitive global and local explanations. We selected the XGBoost model, which performed best after LWC was integrated, and used SHAP plots to examine its most important features and the role of LWC in the modeling process. As shown in Figure 8a, the feature with the highest Shapley value is the wavelength of 687 nm. The added LWC feature also has a high Shapley value, and LWC interacts with the spectral bands, which makes the prediction more accurate; Figure 8b shows the SHAP interaction plot between the highest-contributing wavelength and the LWC feature. Thus, the addition of LWC makes a substantial contribution to the predictive performance of the model.
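To make the Shapley value concrete, the sketch below computes exact Shapley values for a single sample by enumerating all feature coalitions. It is a toy illustration, not the SHAP workflow used in this study: the model `toy_model`, its coefficients, the feature names `R687`, `R802`, and `LWC`, and all numeric values are hypothetical, and features outside a coalition are simply replaced by a reference value, whereas SHAP explainers for tree models use their own estimation schemes.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample by enumerating feature coalitions.

    Features outside a coalition are replaced by their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        inputs = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(inputs)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi

# Hypothetical toy model with an R687 x LWC interaction (NOT the fitted XGBoost model).
def toy_model(f):
    return (0.10 - 0.30 * f["R687"] + 0.02 * f["LWC"]
            - 0.10 * f["R687"] * f["LWC"] + 0.05 * f["R802"])

sample = {"R687": 0.08, "LWC": 0.60, "R802": 0.45}     # made-up leaf
reference = {"R687": 0.12, "LWC": 0.50, "R802": 0.40}  # made-up reference leaf
phi = shapley_values(toy_model, sample, reference)
print(phi)
```

The values sum to the difference between the prediction for the sample and the prediction for the reference, and the interaction term is shared between `R687` and `LWC`, which is the kind of joint effect that an interaction plot such as Figure 8b visualizes.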

4. Discussion

4.1. The Possibility of Accurately Estimating the CHD of Populus pruinosa Leaves Using Hyperspectral Technology

Changes in the spectral reflectance of plant leaves have multiple causes. The most significant influences are changes in indicators such as CHD, LWC, the cellular structure within the leaves, and dry matter content. This dependence can therefore be used to identify the corresponding spectral features of plant leaves [35,36,37,38,39]. Once these features have been identified, machine learning models can relate changes in the spectral features to changes in plant biochemical indicators and thereby invert those indicators. For example, Li et al. [40] preprocessed the original spectra using methods such as standard normal variate (SNV) transformation and multiplicative scatter correction (MSC) before modeling; they established the relationship between chlorophyll content and the identified spectral features, thereby achieving accurate estimation of chlorophyll content using hyperspectral imaging. Compared with traditional spectrophotometry, hyperspectral technology is a nondestructive detection method that does not damage plant leaves and has high measurement efficiency, so it can promptly reflect changes in biochemical indicators during plant development [41,42]. In addition, hyperspectral technology can be combined with remote sensing platforms such as unmanned aerial vehicles (UAVs), enabling rapid monitoring of chlorophyll content over large areas of vegetation [43]. In recent years, hyperspectral remote sensing technology has made significant progress in chlorophyll content prediction. Through innovations in key methods such as spectral preprocessing and feature selection, Esquivel et al. [44,45,46,47] explored various factors that affect the original spectral reflectance and proposed a series of preprocessing methods for original spectral data.
In conclusion, accurately estimating the CHD of Populus pruinosa leaves using hyperspectral technology is feasible. Unlike conventional hyperspectral studies that estimate CHD from reflectance data alone, this research integrates LWC into the CHD model for Populus pruinosa leaves, further improving the feasibility and accuracy of the estimation.
Recent advances in hyperspectral chlorophyll estimation have explored various methodological frameworks. Zhao et al. [48] developed a spatial-spectral feature extraction method for in-field chlorophyll content estimation using hyperspectral imaging, demonstrating the importance of combining spectral and spatial information for accurate prediction. Alam et al. [49] systematically compared five machine learning regression algorithms with the PROSAIL radiative transfer model for crop canopy chlorophyll estimation. Their results showed that machine learning methods applied to UAV–satellite data fusion achieved significantly higher accuracy (R2 up to 0.89) than hybrid PROSAIL-ML models (maximum R2 = 0.77), confirming that machine learning is a powerful tool for chlorophyll retrieval. Li et al. [50] applied fractional-order differentiation combined with random forest regression for chlorophyll content estimation in grape leaves, demonstrating that fractional-order differentiation significantly enhanced spectral features and improved model accuracy compared with integer-order methods, achieving an R2 of 0.778 after optimization. In the Tarim Basin region, earlier studies on Populus euphratica have investigated leaf chlorophyll content under different groundwater depths using BP neural network, PLSR, and SVM models, confirming that chlorophyll content is highly sensitive to groundwater availability [51]. In contrast to these earlier studies, which relied on single-source spectral data and traditional machine learning methods, our approach integrates leaf water content as an additional predictor and uses XGBoost with SHAP analysis for model interpretability, providing a more comprehensive and explainable framework for chlorophyll density estimation in arid riparian species. This methodological novelty is further supported by recent applications of SHAP analysis in remote sensing, where Shan et al. [52] demonstrated that SHAP-correlation integrated feature selection enhances model robustness in multi-source hyperspectral retrieval.

4.2. Selection of a Modeling Method for Estimating the CHD of Populus pruinosa Leaves

In this study, SG smoothing followed by CWT was chosen as the spectral preprocessing method, and it markedly reduced the noise and baseline drift in the spectral data. He et al. [53,54,55,56,57] also combined SG smoothing with first-order derivative processing, as well as CWT with first-order derivative processing, when preprocessing spectral data, thereby reducing interference in the spectra and significantly improving the performance of the final model. The bands of Populus pruinosa leaves that respond sensitively to CHD were selected by the Random Frog-IRIV method. Similarly, Zou et al. [58] combined the Random Frog and variable combination population analysis-iterative retained information variable (VCPA-IRIV) methods for feature variable selection and found that the combination outperformed a single selection method. Twelve bands that responded significantly to CHD were selected from the wavelength range of 325–1075 nm. Modeling was divided into two steps: prediction without LWC and prediction with LWC added. The prediction model built on the sensitive bands achieved significantly better accuracy with LWC than without it. In the arid Tarim Basin, Populus pruinosa frequently suffers from water stress, which accelerates chlorophyll degradation and disrupts the photosynthetic apparatus. Under water deficit, the production of reactive oxygen species (ROS) increases, leading to lipid peroxidation and chlorophyll destruction, thereby reducing leaf chlorophyll density (CHD) [59]. This stress-induced pigment loss directly alters leaf spectral reflectance, particularly in the visible and red-edge regions.
However, the spectral response to chlorophyll degradation can be confounded by concurrent changes in leaf water content (LWC), because water also absorbs light in the near-infrared region and affects leaf internal structure. Therefore, when using only spectral data, the model faces a “precision gap”: it cannot distinguish whether a spectral change is caused by chlorophyll loss, water variation, or their interaction. Our multimodal model bridges this gap by incorporating measured LWC as an additional predictor. Because the water status is provided explicitly, the model can account for the confounding effect of water and focus on the chlorophyll-related spectral features, thereby significantly improving the accuracy and stability of CHD estimation. LWC is also an important factor that affects plant reflectance. In the visible wavelength range (400–700 nm), because water may absorb incident light, the reflectance of wet soil or plants is lower than that of dry soil or plants. Adding LWC to the model therefore allows CHD to be predicted more accurately. Caturegli et al. [60,61,62,63] showed that water stress significantly changes the spectral reflectance of plant leaves, especially in the visible and near-infrared wavelength ranges. In the SHAP plot, the contribution of LWC ranks fourth among all input features.
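As a concrete illustration of the SG smoothing step discussed in this subsection, the sketch below applies a 5-point quadratic Savitzky–Golay filter to a reflectance sequence. The window width and polynomial order used in this study are not stated here, so the 5-point quadratic window is an assumption, the function name `sg_smooth5` is hypothetical, and the subsequent CWT step is not shown.

```python
# Standard 5-point quadratic Savitzky-Golay smoothing weights (divided by 35).
SG5 = (-3, 12, 17, 12, -3)

def sg_smooth5(values):
    """Smooth a sequence with a 5-point quadratic Savitzky-Golay filter.

    The two points at each end are copied unchanged (a simplification).
    """
    out = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        out[i] = sum(w * v for w, v in zip(SG5, window)) / 35.0
    return out

# Made-up reflectance values with one noisy spike, for illustration only.
reflectance = [0.10, 0.11, 0.12, 0.20, 0.14, 0.15, 0.16, 0.17]
print(sg_smooth5(reflectance))
```

Because the filter fits a local polynomial, a smooth quadratic trend passes through unchanged while isolated spikes are damped, which is why it is commonly applied before derivative or wavelet transforms.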

4.3. Inter-Sample Variability and Seasonal Spectral Anomalies

Inter-sample variability
The observed CHD differences among samples within the same month may reflect spatial heterogeneity in micro-environmental conditions across the five sampling points in the Awati Reserve, such as variations in local groundwater depth, soil salinity, and light availability. Previous studies on Populus pruinosa have demonstrated that environmental factors, including annual mean temperature and precipitation, significantly influence leaf structural traits, and that leaf anatomical traits exhibit higher variability than morphological traits, while leafing intensity (LIM) and main vein vascular bundle area (MVBA) showed high coefficients of variation [64]. Zhang et al. [17] further showed that groundwater depth in natural Populus pruinosa forest habitats ranges from 3.4 to 7.9 m, and that net photosynthetic rate (Pn) and leaf-specific conductivity (Kl) significantly decrease with increasing groundwater depth, reflecting strong hydraulic–photosynthetic coordination and suggesting that water availability directly affects photosynthetic performance. Additionally, despite careful selection of trees with similar diameters at breast height and apparent health, intrinsic physiological and genetic differences among individuals could also contribute to the observed inter-sample variability. Zou et al. [65] reported that tree age alone can explain 15.1% to 38.1% of the variation in leaf functional traits, including Chl a, Chl b and total chlorophyll content, and that soil pH and soil total nitrogen content are also significant influencing factors. These combined environmental and intrinsic factors provide a reasonable explanation for the observed variability in CHD among the sampled trees within the same month.
Seasonal spectral anomalies
The spectral reflectance curves displayed two notable anomalies: the unusually high reflectance of August leaves (up to 40% at the green peak) and the lowest reflectance observed in September, despite the declining CHD trend that began in July. These anomalies are not measurement errors but can be explained by species-specific morphological and physiological factors.
August high reflectance. Taxonomic descriptions indicate that Populus pruinosa leaves are densely tomentulose (covered with short hairs) on both surfaces and possess a thick epicuticular wax layer. Both pubescence and glaucousness are known to markedly increase leaf reflectance in the visible spectrum by enhancing light scattering [66]. Removal of leaf wax has been shown to decrease reflectance in the 400–1000 nm range [67]. In August, the hot and dry conditions of the Tarim Basin likely exacerbate the expression of these surface traits, which serve to reduce heat load and water loss. The effect of these morphological adaptations may outweigh the expected reduction in reflectance that would normally accompany high chlorophyll content during this period, leading to the observed high reflectance.
September low reflectance. The lowest reflectance in September seems contradictory to the declining CHD trend. However, two factors can explain this phenomenon. First, leaf water content (LWC) may be relatively higher in September due to occasional autumn rainfall or reduced evapotranspiration; increased water content is known to decrease leaf reflectance in the visible and near-infrared regions [68,69]. Second, as leaves begin to senesce in September, chlorophyll degradation is accompanied by relative increases in other pigments and structural changes within the leaf mesophyll, which can alter scattering properties and result in a net decrease in reflectance [70,71]. The combined effects of water-induced absorption and pigment-composition changes can overcome the expected reflectance increase from chlorophyll loss, leading to the lowest reflectance in September. This interpretation is consistent with the observed CHD dynamics.

4.4. Implications for Conservation and Management

The hyperspectral-LWC fusion model developed in this study has direct practical value for the conservation and management of Populus pruinosa. As an endangered foundation species of Central Asian riparian forests, Populus pruinosa faces population declines due to groundwater depletion and habitat degradation [72]. Because the proposed model relies only on leaf spectral reflectance and measured LWC, it provides a rapid, non-destructive tool for assessing CHD in the field without laboratory extraction. Reserve managers can therefore monitor the physiological health of individual trees or forest stands in real time and detect water-stress-induced decline before visible symptoms appear.
The practical value of this method extends to broader applications. When integrated with unmanned aerial vehicles or satellite platforms equipped with hyperspectral sensors, the approach can be scaled up to monitor CHD across large areas, helping to identify priority zones for targeted conservation interventions such as ecological water delivery. Seeley et al. [73] demonstrated that leaf-level spectroscopy can capture interspecific and intraspecific variation in water stress responses in riparian cottonwood species, highlighting the potential of remote sensing to monitor drought impacts from leaves to landscapes. Furthermore, the SHAP analysis identified the 687 nm wavelength and its interaction with LWC as the most important predictors. This finding points to a simplified spectral indicator that could be implemented using multispectral sensors for cost-effective routine monitoring.
By providing an accurate, interpretable and scalable method for CHD estimation, this study directly supports the conservation of Populus pruinosa and evidence-based management of arid riparian forests under increasing water scarcity.

4.5. Limitations and Future Research Directions

Several limitations of this study should be acknowledged. First, the total sample size was 400 leaves, which is moderate for machine learning applications. Although we used a fixed 2:1 training-to-test split and feature selection to reduce overfitting risks, a larger dataset would further improve model generalizability. Second, the study was conducted over a single growing season from July to October 2023. Seasonal and inter-annual variations in climate and water availability may affect the relationship between spectral reflectance and CHD. Therefore, long-term observations across multiple years are needed to validate the robustness of the proposed model. Third, we did not directly measure micro-environmental variables such as soil salinity, groundwater depth, or light intensity at each sampling point. These factors were invoked to explain inter-sample variability but were not included as model inputs. Future studies should incorporate these covariates to better understand their influence on CHD prediction.
For future research, we suggest several directions. Expanding the dataset to include more individuals and additional sites across the Tarim Basin would enhance the model's applicability. Collecting data over multiple growing seasons would allow assessment of model stability under varying climatic conditions. Transferability of the model to other Populus pruinosa populations or related species should be tested. Additionally, more advanced machine learning techniques such as deep learning or ensemble methods could be explored to further improve prediction accuracy and interpretability. Finally, integrating the proposed method with UAV-based hyperspectral imaging would enable operational large-scale monitoring for conservation management.

5. Conclusions

This study demonstrates that integrating leaf water content with hyperspectral data significantly improves the estimation of CHD in Populus pruinosa, an endangered foundation species of Central Asian riparian forests. The XGBoost model achieved the best predictive performance after LWC fusion, and SHAP analysis identified the 687 nm wavelength and its interaction with LWC as key predictors. The proposed method offers a rapid, non-destructive tool for assessing the physiological health of Populus pruinosa in the Awati Reserve. When integrated with unmanned aerial vehicles or satellite remote sensing, the model can provide reserve managers with real-time, spatially explicit assessments of tree physiological status, enabling prompt identification of water-stressed areas. This approach can support early detection of water-stress-induced decline, guide ecological water delivery in the Tarim Basin, and contribute to the evidence-based conservation of arid riparian ecosystems. Future research should focus on multi-year data collection, testing model transferability to other populations, and integrating UAV-based hyperspectral imaging for large-scale monitoring.

Author Contributions

B.Z.: Writing—original draft, Software, Formal analysis. J.W.: Conceptualization, Writing—review & editing, Funding acquisition. H.L.: Methodology, Investigation, Data curation. C.C.: Resources, Supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by grants from the National Natural Science Foundation of China (Grant No. 32360289).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Li, Z.J.; Liu, J.P.; Yu, J.; Zhou, Z.L. Investigation on Bio-ecological Characteristics of Populus pruinosa. Acta Bot. Boreali-Occident. Sin. 2003, 23, 1234–1245.
  2. Thomas, F.M.; Lang, P. Growth and water relations of riparian poplar forests under pressure in Central Asia’s Tarim River Basin. River Res. Appl. 2020, 37, 233–240.
  3. Kandpal, K.C.; Kumar, A. Migrating from invasive to noninvasive techniques for enhanced leaf chlorophyll content estimations efficiency. Crit. Rev. Anal. Chem. 2024, 54, 2583–2598.
  4. Li, X.; Zhu, B.; Li, S.; Liu, L.; Song, K.; Liu, J. A comprehensive review of crop chlorophyll mapping using remote sensing approaches: Achievements, limitations, and future perspectives. Sensors 2025, 25, 2345.
  5. Chopra, Y.; Xie, X.; Clothier, J.; Ghosh, S.; Yu, H.; Walia, H.; Sattler, S.E. Hyperspectral imaging to characterize the vegetative tissue biochemical changes in response to water deficit conditions in sorghum (Sorghum bicolor). Front. Plant Sci. 2025, 16, 1515998.
  6. Shao, Z.; Wang, Q.; Zhuang, J. Integrating leaf spectral and water status information to effectively track chlorophyll a fluorescence parameters during dehydration. Physiol. Plant. 2024, 176, e14391.
  7. Chen, Y.; Chen, Y.; Xu, C.; Li, W. Photosynthesis and water use efficiency of Populus euphratica in response to changing groundwater depth and CO2 concentration. Environ. Earth Sci. 2011, 62, 119–125.
  8. Gitelson, A.; Solovchenko, A. Non-invasive quantification of foliar pigments: Possibilities and limitations of reflectance- and absorbance-based approaches. J. Photochem. Photobiol. B 2018, 178, 537–544.
  9. Sonobe, R.; Yamashita, H.; Mihara, H.; Morita, A.; Ikka, T. Estimation of leaf chlorophyll a, b and carotenoid contents and their ratios using hyperspectral reflectance. Remote Sens. 2020, 12, 3265.
  10. Zhang, F.; Zhou, G. Estimation of vegetation water content using hyperspectral vegetation indices: A comparison of crop water indicators in response to water stress treatments for summer maize. BMC Ecol. 2019, 19, 18.
  11. Zarco-Tejada, P.J.; González-Dugo, V.; Williams, L.E.; Suárez, L.; Berni, J.A.J.; Goldhamer, D.; Fereres, E. A PRI-based water stress index combining structural and chlorophyll effects: Assessment using diurnal narrow-band airborne imagery and the CWSI thermal index. Remote Sens. Environ. 2013, 138, 38–50.
  12. Li, D.; Hu, Q.; Ruan, S.; Liu, J.; Zhang, J.; Hu, C.; Liu, Y.; Dian, Y.; Zhou, J. Utilizing hyperspectral reflectance and machine learning algorithms for non-destructive estimation of chlorophyll content in citrus leaves. Remote Sens. 2023, 15, 4934.
  13. Ma, Y.; Qiu, C.; Zhang, J.; Pan, D.; Zheng, C.; Sun, H.; Feng, H.; Song, X. Potato leaf chlorophyll content estimation through radiative transfer modeling and active learning. Agronomy 2023, 13, 3071.
  14. Mulero, G.; Bacher, H.; Kleiner, U.; Peleg, Z.; Herrmann, I. Spectral estimation of in vivo wheat chlorophyll a/b ratio under contrasting water availabilities. Remote Sens. 2022, 14, 2585.
  15. Tao, H.; Gemmer, M.; Bai, Y.G.; Su, B.D.; Mao, W.Y. Trends of streamflow in the Tarim River Basin during the past 50 years: Human impact or climate change? J. Hydrol. 2011, 400, 1–9.
  16. Wang, J.Q.; Han, L.; Li, Z.J.; Zhou, Z.L.; Ma, C.H. Life history characteristics and spatial distribution of Populus pruinosa population at the upper reaches of Tarim River. Acta Ecol. Sin. 2013, 33, 6196–6204.
  17. Zhang, J.; Zhai, J.; Wang, J.; Si, J.; Li, J.; Ge, X.; Li, Z. Interrelationships and Environmental Influences of Photosynthetic Capacity and Hydraulic Conductivity in Desert Species Populus pruinosa. Forests 2024, 15, 1094.
  18. Bioucas-Dias, J.M.; Plaza, A.; Camps-Valls, G.; Scheunders, P.; Nasrabadi, N.; Chanussot, J. Hyperspectral remote sensing data analysis and future challenges. IEEE Geosci. Remote Sens. Mag. 2013, 1, 6–36.
  19. Eshkabilov, S.; Simko, I. Assessing contents of sugars, vitamins, and nutrients in baby leaf lettuce from hyperspectral data with machine learning models. Agriculture 2024, 14, 834.
  20. Stegen, S.; Queirolo, F.; Ostapczuk, P.; Groemping, A.; Paz, M.; Restovic, M.; Carrasco, C. Comparison of different methods for the determination of the water content and the dry mass correction factor in various plant samples. Fresenius J. Anal. Chem. 1998, 360, 601–604.
  21. Yan, C. A review on spectral data preprocessing techniques for machine learning and quantitative analysis. iScience 2025, 28, 112759.
  22. Kang, M.; Han, S.P.; Yang, H.J.; Tang, D.D.; Li, Y.J.; Wang, Z.Q. Data Preprocessing Method for Infrared Spectra Analysis of Natural Gas Components. Infrared Technol. 2021, 43, 804–808.
  23. Schmid, M.; Rath, D.; Diebold, U. Why and How Savitzky–Golay Filters Should Be Replaced. ACS Meas. Sci. Au 2022, 2, 185–196.
  24. Wang, J.; Tang, C.; Liu, X.; Zhang, W.; Li, W.; Zhu, X.; Wang, L.; Zomaya, A.Y. Graph regularized spatial-spectral subspace clustering for hyperspectral band selection. Neural Netw. 2022, 153, 292–302.
  25. Yao, X.; Yang, W.; Li, M.; Zhou, P.; Chen, Y.; Hao, H. Prediction of Total Nitrogen in Soil Based on Random Frog and Wavelet Neural Network. IFAC-PapersOnLine 2018, 51, 660–665.
  26. Nian, Y.; Su, X.; Yue, H.; Anwar, S.; Li, J.; Wang, W.; Sheng, Y.; Ma, Q.; Liu, J.; Li, X. Winter Wheat SPAD Prediction Based on Multiple Preprocessing, Sequential Module Fusion, and Feature Mining Methods. Agriculture 2024, 14, 2258.
  27. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2016.
  28. Dhillon, M.S.; Dahms, T.; Kuebert-Flock, C.; Rummler, T.; Arnault, J.; Steffan-Dewenter, I.; Ullmann, T. Integrating Random Forest and Crop Modeling Improves the Crop Yield Prediction of Winter Wheat and Oil Seed Rape. Front. Remote Sens. 2023, 3, 1010978.
  29. Zhou, Z.; Morel, J.; Parsons, D.; Kucheryavskiy, S.; Gustavsson, A.-M. Estimation of yield and quality of legume and grass mixtures using partial least squares and support vector machine analysis of spectral data. Comput. Electron. Agric. 2019, 162, 246–253.
  30. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4765–4774.
  31. Yu, B.; Zhao, C.Y.; Li, J.; Li, J.Y.; Peng, G. Morphological, Physiological, and Biochemical Responses of Populus euphratica to Soil Flooding. Photosynthetica 2015, 53, 110–117.
  32. Paradiso, R.; Cocetta, G.; Proietti, S. Beyond red and blue: Unveiling the hidden action of green wavelengths on plant physiology, metabolisms and gene regulation in horticultural crops. Environ. Exp. Bot. 2025, 230, 106089.
  33. Richter, N.F.; Tudoran, A.A. Elevating theoretical insight and predictive accuracy in business research: Combining PLS-SEM and selected machine learning algorithms. J. Bus. Res. 2024, 173, 114453.
  34. Li, H.-D.; Xu, Q.-S.; Liang, Y.-Z. Random frog: An efficient reversible jump Markov Chain Monte Carlo-like approach for variable selection with applications to gene selection and disease classification. Anal. Chim. Acta 2012, 740, 20–26.
  35. Zhang, Y.; Han, X.; Yang, J. Selection of optimal spectral features for leaf chlorophyll content estimation. Sci. Rep. 2024, 14, 25598.
  36. Chen, X.; Dong, Z.; Liu, J.; Wang, H.; Zhang, Y.; Chen, T.; Du, Y.; Shao, L.; Xie, J. Hyperspectral Characteristics and Quantitative Analysis of Leaf Chlorophyll by Reflectance Spectroscopy Based on a Genetic Algorithm in Combination with Partial Least Squares Regression. Spectrochim. Acta A 2020, 243, 118786.
  37. Jin, X.; Xu, X.; Song, X.; Li, Z.; Wang, J.; Guo, W. Estimating Leaf Water Content in Winter Wheat Using Grey Relational Analysis–Partial Least Squares Modeling with Hyperspectral Data. Agron. J. 2013, 105, 1385–1392.
  38. Li, C.; Xiao, Z.; Liu, Y.; Meng, X.; Li, X.; Wang, X.; Li, Y.; Zhao, C.; Ren, L.; Yang, C.; et al. Hyperspectral Estimation of Winter Wheat Leaf Water Content Based on Fractional Order Differentiation and Continuous Wavelet Transform. Agronomy 2023, 13, 56.
  39. Li, W.; Sun, Z.; Lu, S.; Omasa, K. Estimation of the Leaf Chlorophyll Content Using Multiangular Spectral Reflectance Factor. Plant Cell Environ. 2019, 42, 3152–3165.
  40. Li, Y.; Xu, X.; Wu, W.; Zhu, Y.; Gao, L.; Jiang, X.; Meng, Y.; Yang, G.; Xue, H. Hyperspectral estimation of chlorophyll content in grapevine based on feature selection and GA-BP. Sci. Rep. 2025, 15, 8029.
  41. Ruszczak, B.; Wijata, A.M.; Nalepa, J. Unbiasing the Estimation of Chlorophyll from Hyperspectral Images: A Benchmark Dataset, Validation Procedure and Baseline Results. Remote Sens. 2022, 14, 5526.
  42. Raj, R.; Bayat, B.; Raza, A.; Gaiser, T.; Vereecken, H.; Montzka, C. Within-Field Crop Leaf Area Index Simulation Using a Hybrid PROSAIL-SVR Approach: Evaluating Sentinel-2 and PlanetScope Potential. Int. J. Remote Sens. 2025, 46, 8546–8566.
  43. Xue, Q.; Gao, X.; Lu, F.; Ma, J.; Song, J.; Xu, J. Development and Application of Unmanned Aerial High-Resolution Convex Grating Dispersion Hyperspectral Imager. Sensors 2024, 24, 5812.
  44. Esquivel, F.J.; Esquivel, J.A.; Morgado, A.; Romero-Béjar, J.L.; García del Moral, L.F. Preprocessing of Spectroscopic Data Using Affine Transformations to Improve Pattern-Recognition Analysis: An Application to Prehistoric Lithic Tools. Mathematics 2022, 10, 4250.
  45. Zhao, H.; Wang, Z.; Jia, G.; Tian, J.; Jin, S.; Liang, S.; Liu, Y. The Impact and Correction of Sensitive Environmental Factors on Spectral Reflectance Measured In Situ. Remote Sens. 2023, 15, 5332.
  46. Azarfar, G.; Aboualizadeh, E.; Walter, N.M.; Ratti, S.; Olivieri, C.; Norici, A.; Nasse, M.; Kohler, A.; Giordano, M.; Hirschmugl, C.J. Estimating and Correcting Interference Fringes in Infrared Spectra in Infrared Hyperspectral Imaging. Analyst 2018, 143, 4674–4683.
  47. Rasti, B.; Scheunders, P.; Ghamisi, P.; Licciardi, G.; Chanussot, J. Noise Reduction in Hyperspectral Imagery: Overview and Application. Remote Sens. 2018, 10, 482.
  48. Zhao, R.; Tang, W.; Liu, M.; Wang, N.; Sun, H.; Li, M.; Ma, Y. Spatial-spectral feature extraction for in-field chlorophyll content estimation using hyperspectral imaging. Biosyst. Eng. 2024, 246, 263–276.
  49. Alam, M.M.T.; Milas, A.S.; Gašparović, M.; Osei, H.P. Retrieval of crop canopy chlorophyll: Machine learning vs. radiative transfer model. Remote Sens. 2024, 16, 2058.
  50. Li, Y.; Xu, X.; Wu, W.; Zhu, Y.; Yang, G.; Yang, X.; Meng, Y.; Jiang, X.; Xue, H. Hyperspectral estimation of chlorophyll content in grape leaves based on fractional-order differentiation and random forest algorithm. Remote Sens. 2024, 16, 2174.
  51. Wang, H.Z.; Chen, J.L.; Han, L. Effects of groundwater levels on photosynthetic pigments and light response of chlorophyll fluorescence parameters of Populus euphratica and Populus pruinosa. J. Desert Res. 2013, 33, 1054–1062.
  52. Shan, C.; Cai, T.; Wang, J.; Ma, Y.; Du, J.; Jia, X.; Yang, X.; Guo, F.; Li, H.; Qiu, S. Refined leaf area index retrieval in Yellow River Delta coastal wetlands: UAV-borne hyperspectral and LiDAR data fusion and SHAP–correlation-integrated machine learning. Remote Sens. 2026, 18, 40.
  53. Arts, L.P.A.; van den Broek, E.L. The fast continuous wavelet transformation (fCWT) for real-time, high-quality, noise-resistant time–frequency analysis. Nat. Comput. Sci. 2022, 2, 47–58.
  54. Gu, C.; Ji, S.; Xi, X.; Zhang, Z.; Hong, Q.; Huo, Z.; Li, W.; Mao, W.; Zhao, H.; Zhang, R.; et al. Yield Estimation Based on Continuous Wavelet Transform with Multiple Growth Periods. Front. Plant Sci. 2022, 13, 931789.
  55. Sivagami, A.V.; Ramakrishnan, R.; Subasini, A. Weather Prediction Model using Savitzky-Golay and Kalman Filters. Procedia Comput. Sci. 2019, 165, 449–455.
  56. Liu, B.; Yu, X.; Chen, J.; Wang, Q. Air Pollution Concentration Forecasting Based on Wavelet Transform and Combined Weighting Forecasting Model. Atmos. Pollut. Res. 2021, 12, 101144.
  57. He, J.; He, J.; Liu, G.; Li, W.; Li, Z.; Li, Z. Inversion analysis of soil nitrogen content using hyperspectral images with different preprocessing methods. Ecol. Inform. 2023, 78, 102381.
  58. Zou, Y.; Zhang, A.; Wang, X.; Yang, L.; Ding, M. Comparison of feature selection and data fusion of Fourier transform infrared and Raman spectroscopy for identifying watercolor ink. J. Forensic Sci. 2024, 69, 584–592.
  59. Yu, L.; Dong, H.; Li, Z.; Han, Z.; Korpelainen, H.; Li, C. Species-specific responses to drought, salinity and their interactions in Populus euphratica and P. pruinosa seedlings. J. Plant Ecol. 2020, 13, 563–573.
  60. Caturegli, L.; Matteoli, S.; Gaetani, M.; Grossi, N.; Magni, S.; Minelli, A.; Corsini, G.; Remorini, D.; Volterrani, M. Effects of water stress on spectral reflectance of bermudagrass. Sci. Rep. 2020, 10, 15055.
  61. Kovar, M.; Brestic, M.; Sytar, O.; Barek, V.; Hauptvogel, P.; Zivcak, M. Evaluation of Hyperspectral Reflectance Parameters to Assess the Leaf Water Content in Soybean. Water 2019, 11, 443.
  62. Wu, D.; Wang, P.; Chen, B.; Yi, L.; Dai, Z.; Xiao, B. Evaluation of Leaf Water Content in Watermelon Based on Hyperspectral Reflectance. Water 2025, 17, 1142.
  63. Chen, J.-J.; Sun, Y.; Kopp, K.; Oki, L.; Jones, S.B.; Hipps, L. Effects of Water Availability on Leaf Trichome Density and Plant Growth and Development of Shepherdia × utahensis. Front. Plant Sci. 2022, 13, 855858.
  64. Yang, Z.G.; Zhang, J.L.; Zhai, J.T.; Liu, H.J.; Li, Z.J.; Hu, Y.Y.; Tang, Z. Study on Spatial Variation of Leaf Morphology and Anatomical Structural Traits of Populus pruinosa. Bot. Res. 2024, 13, 553–564.
  65. Zou, X.G.; Wang, Y.; Wang, J.M.; Qu, M.J.; Zhu, W.L.; Zhao, H.; Si, J.H.; Li, J.W. Synergistic and trade-off responses of leaf functional traits of Populus euphratica to tree age and soil factors. J. Beijing For. Univ. 2024, 46, 82–92. [Google Scholar] [CrossRef]
  66. Holmes, M.G.; Keiller, D.R. Effects of pubescence and waxes on the reflectance of leaves in the ultraviolet and photosynthetic wavebands: A comparison of a range of species. Plant Cell Environ. 2002, 25, 85–93. [Google Scholar] [CrossRef]
  67. Lu, S. Effects of leaf surface wax on leaf spectrum and hyperspectral vegetation indices. In Proceedings of the 2013 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Melbourne, VIC, Australia, 21–26 July 2013; pp. 1–4. [Google Scholar] [CrossRef]
  68. Carter, G.A. Primary and secondary effects of water content on the spectral reflectance of leaves. Am. J. Bot. 1991, 78, 916–924. [Google Scholar] [CrossRef]
  69. Ceccato, P.; Flasse, S.; Tarantola, S.; Jacquemoud, S.; Grégoire, J.M. Detecting vegetation leaf water content using reflectance in the optical domain. Remote Sens. Environ. 2001, 77, 22–33. [Google Scholar] [CrossRef]
  70. Gitelson, A.A.; Chivkunova, O.B.; Merzlyak, M.N. Nondestructive estimation of anthocyanins and chlorophylls in anthocyanic leaves. Am. J. Bot. 2009, 96, 1861–1868. [Google Scholar] [CrossRef]
  71. Merzlyak, M.N.; Gitelson, A.A.; Chivkunova, O.B.; Rakitin, V.Y. Non-destructive optical detection of pigment changes during leaf senescence and fruit ripening. Physiol. Plant. 1999, 106, 135–141. [Google Scholar] [CrossRef]
  72. Dimeyeva, L.; Islamgulova, A.; Permitina, V.; Ussen, K.; Kerdyashkin, A.; Tsychuyeva, N.; Salmukhanbetova, Z.; Kurmantayeva, A.; Iskakov, R.; Imanalinova, A.; et al. Plant Diversity and Distribution Patterns of Populus pruinosa Schrenk (Salicaceae) Floodplain Forests in Kazakhstan. Diversity 2023, 15, 797. [Google Scholar] [CrossRef]
  73. Seeley, M.M.; Wiebe, B.C.; Gehring, C.A.; Hultine, K.R.; Posch, B.C.; Cooper, H.F.; Schaefer, E.A.; Bock, B.M.; Abraham, A.J.; Moran, M.E.; et al. Remote sensing reveals inter- and intraspecific variation in riparian cottonwood (Populus spp) response to drought. J. Ecol. 2025, 113, 1760–1779. [Google Scholar] [CrossRef]
Figure 1. Distribution of sampling sites for Populus pruinosa leaves collected from July to October 2023 in Akesu (Awati region), Xinjiang, China (n = 100 leaves per month, 400 leaves in total).
Figure 2. Distribution of CHD (mg/cm²) in Populus pruinosa leaves sampled from July to October 2023 (n = 100 leaves per month, 400 leaves in total).
Figure 3. Monthly mean spectral reflectance curves of Populus pruinosa leaves (July–October 2023), calculated as the average of all samples per month. Measurements were performed using an ASD FieldSpec spectrometer (325–1075 nm).
Figure 4. Screening process of spectral bands based on the Random Frog algorithm. (a) Selection probability of each spectral band (values are probabilities ranging from 0 to 1, not percentages); (b) Variation in RMSEP with the number of selected variables in the PLS model.
Figure 5. Two-step spectral feature band screening via the Random Frog-IRIV hybrid algorithm. (a) p-value heatmap of feature selection during IRIV iteration; (b) Average spectral reflectance of Populus pruinosa leaves with the 12 finally selected characteristic bands.
Figure 6. CHD estimation model of Populus pruinosa based on Random Forest (RF) without integrating LWC. The solid lines represent the regression fits for the training and test sets (training R² = 0.842, test R² = 0.830).
Figure 7. CHD estimation model of Populus pruinosa based on XGBoost with the integration of LWC. The solid lines represent the regression fits for the training and test sets (training R² = 0.871, test R² = 0.865).
Figure 8. SHAP analysis of the XGBoost model for CHD estimation. (a) Mean Shapley value of each feature representing feature importance; (b) beeswarm plot showing the interaction between the 687 nm band and LWC.
Table 1. Performance of the CHD estimation models without integrating LWC.
| Modeling Method | Training R² | Training RMSE (mg/cm²) | Training RPD | Test R² | Test RMSE (mg/cm²) | Test RPD |
|---|---|---|---|---|---|---|
| RF | 0.842 | 0.0100 | 2.513 | 0.830 | 0.0093 | 2.425 |
| XGBoost | 0.810 | 0.0107 | 2.296 | 0.806 | 0.0104 | 2.269 |
| SVM | 0.691 | 0.0136 | 1.805 | 0.627 | 0.0145 | 1.645 |
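The three accuracy measures reported in the table can be sketched as follows. This is a minimal illustration using the conventional definitions of R², RMSE, and RPD, which are assumed here rather than taken from the article. The function name `regression_metrics`, the use of the sample standard deviation (`ddof=1`) for RPD, and the CHD values are all hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, RPD) for measured vs. predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    rmse = np.sqrt(np.mean(residuals ** 2))

    # Assumption: RPD = sample standard deviation of measured values / RMSE.
    # Some authors use the population standard deviation (ddof=0) instead.
    rpd = np.std(y_true, ddof=1) / rmse
    return r2, rmse, rpd


# Hypothetical CHD values in mg/cm^2, for illustration only (not study data).
measured = [0.030, 0.045, 0.052, 0.061, 0.074]
predicted = [0.033, 0.043, 0.055, 0.058, 0.072]

r2, rmse, rpd = regression_metrics(measured, predicted)
print(f"R2 = {r2:.3f}, RMSE = {rmse:.4f} mg/cm^2, RPD = {rpd:.3f}")
```

Under these definitions RPD and R² are closely linked: RPD² = n / ((n − 1)(1 − R²)), so for large n the RPD approaches 1/√(1 − R²). The tabulated values are roughly consistent with this relation (for RF on the training set, 1/√(1 − 0.842) ≈ 2.52 against a reported 2.513).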
Table 2. Performance of the CHD estimation models after integrating LWC.
| Modeling Method | Training R² | Training RMSE (mg/cm²) | Training RPD | Test R² | Test RMSE (mg/cm²) | Test RPD |
|---|---|---|---|---|---|---|
| RF | 0.851 | 0.0097 | 2.589 | 0.845 | 0.0090 | 2.538 |
| XGBoost | 0.871 | 0.0090 | 2.782 | 0.865 | 0.0083 | 2.725 |
| SVM | 0.736 | 0.0124 | 1.948 | 0.666 | 0.0130 | 1.736 |
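The effect of adding LWC can be read directly from the test-set columns of Tables 1 and 2. The short sketch below transcribes those reported values and computes, per model, the change in R² and the relative reduction in RMSE. The helper name `lwc_gains` and the dictionary layout are illustrative choices, not part of the study.

```python
# Test-set metrics transcribed from Tables 1 and 2: (R2, RMSE in mg/cm^2, RPD).
without_lwc = {
    "RF": (0.830, 0.0093, 2.425),
    "XGBoost": (0.806, 0.0104, 2.269),
    "SVM": (0.627, 0.0145, 1.645),
}
with_lwc = {
    "RF": (0.845, 0.0090, 2.538),
    "XGBoost": (0.865, 0.0083, 2.725),
    "SVM": (0.666, 0.0130, 1.736),
}


def lwc_gains(before, after):
    """Per-model change in test R2 and percentage reduction in test RMSE."""
    gains = {}
    for model, (r2_before, rmse_before, _) in before.items():
        r2_after, rmse_after, _ = after[model]
        gains[model] = {
            "delta_r2": round(r2_after - r2_before, 3),
            "rmse_reduction_pct": round(
                100.0 * (rmse_before - rmse_after) / rmse_before, 1
            ),
        }
    return gains


gains = lwc_gains(without_lwc, with_lwc)
for model, g in gains.items():
    print(f"{model}: dR2 = {g['delta_r2']:+.3f}, "
          f"RMSE reduced by {g['rmse_reduction_pct']:.1f}%")
```

On these figures every model improves on the test set, with XGBoost gaining the most (R² up by 0.059) and RF the least (R² up by 0.015), which matches the switch in the best-performing model from RF to XGBoost once LWC is included.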
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhang, B.; Wang, J.; Li, H.; Cai, C. Hyperspectral Estimation of Chlorophyll Density in Populus pruinosa Incorporating Leaf Water Content. Forests 2026, 17, 692. https://doi.org/10.3390/f17060692

