Phenology-Aware Machine Learning Framework for Chlorophyll Estimation in Cotton Using Hyperspectral Reflectance

Jiang, Chunbo; Cheng, Yi; Li, Yongfu; Peng, Lei; Dong, Gangshang; Lai, Ning; Geng, Qinglong

doi:10.3390/rs17152713

Open Access Article


by Chunbo Jiang ¹, Yi Cheng ¹, Yongfu Li ², Lei Peng ², Gangshang Dong ², Ning Lai ² and Qinglong Geng ²,*

¹ Agricultural Engineering and Information Technology, College of Resources and Environment, Xinjiang Agricultural University, Urumqi 830052, China

² Xinjiang Academy of Agricultural Sciences, Resource and Environmental Information Technology Innovation Team, Urumqi 830091, China

* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(15), 2713; https://doi.org/10.3390/rs17152713

Submission received: 20 May 2025 / Revised: 16 July 2025 / Accepted: 28 July 2025 / Published: 6 August 2025

(This article belongs to the Special Issue Remote Sensing and Machine Learning in Vegetation Biophysical Parameters Estimation (Second Edition))


Abstract

Accurate and non-destructive monitoring of leaf chlorophyll content (LCC) is essential for assessing crop photosynthetic activity and nitrogen status in precision agriculture. This study introduces a phenology-aware machine learning framework that combines hyperspectral reflectance data with various regression models to estimate LCC in cotton at six key reproductive stages. Field experiments collected synchronized spectral and SPAD measurements, and several spectral transformations were evaluated as predictors: vegetation indices (VIs), first-order derivative spectra, and trilateration edge parameters (TEPs, a new set of geometric metrics for red-edge characterization). Five regression approaches were compared: univariate and multivariate linear models, and three machine learning algorithms (Random Forest, K-Nearest Neighbor, and Support Vector Regression). Random Forest consistently outperformed the other models, achieving the highest R² (0.85) and the lowest RMSE (4.1) at the beginning bud stage. Notably, optimal prediction accuracy was achieved with fewer than five spectral features. The proposed framework demonstrates the potential for scalable, stage-specific monitoring of chlorophyll dynamics and offers practical insights for large-scale crop management.

Keywords:

chlorophyll estimation; hyperspectral reflectance; random forest regression; phenological stages; vegetation indices; cotton; precision agriculture


1. Introduction

Chlorophyll, the principal light-harvesting pigment in plants, converts solar energy into chemical energy in the form of ATP and NADPH, which in turn drive carbon fixation in the Calvin–Benson cycle [1]. This biochemical process underpins plant physiological activity, nitrogen assimilation, and overall crop productivity, while also serving as a reliable indicator of vegetation health [2]. As a key determinant of photosynthetic efficiency, leaf chlorophyll content (LCC) is closely correlated with maximum carboxylation rates [3]. Substantial inter- and intra-specific variations in LCC have been observed, influenced by developmental stage, nutrient and water availability, and environmental–phenological interactions [4]. Although chemical quantification methods provide high accuracy, their destructive nature and labor-intensive protocols restrict their practical application in field settings, emphasizing the need for non-invasive monitoring technologies [5].

Remote sensing has transformed plant trait analysis through spectral metrics and vegetation indices (VIs) [6], facilitating multi-scale (ground, airborne, satellite) chlorophyll estimation with improved efficiency [7]. Among these approaches, hyperspectral remote sensing offers superior spectral resolution and sensitivity for agricultural monitoring [8]. Chlorophyll imprints distinct signatures on spectral reflectance: strong absorption in the blue (400–450 nm) and red (650–700 nm) regions due to pigment absorption, and high reflectance in the near-infrared (700–1050 nm) due to leaf anatomical features [9,10]. Advanced preprocessing methods, such as first-order derivative spectroscopy (FODS) for suppressing baseline effects and trilateration edge parameters (TEPs), a geometric approach to quantifying spectral features in the red-edge region (680–780 nm), have significantly enhanced vegetation trait extraction [11,12]. The integration of these spectral techniques with machine learning algorithms has shown promise for capturing temporal variation in plant traits [13].

Recent advances in machine learning, particularly ensemble (random forest regression, RFR) [14], kernel-based (support vector regression, SVR) [15], and instance-based (k-nearest neighbor regression, KNNR) approaches, have outperformed traditional linear models in spectral trait estimation [16]. These models handle high-dimensional data effectively, improve prediction accuracy, and offer greater robustness [17]. RFR has notably reduced estimation error, while SVR and KNNR have performed well on sparse or feature-rich datasets [18].

Cotton, a globally important cash crop supporting the textile, food, and pharmaceutical industries, has seen its cultivation expand considerably. Hyperspectral remote sensing has been widely applied to cotton chlorophyll estimation in both domestic and international studies [19,20], and a variety of spectral indices and machine learning methods have been explored to monitor leaf biochemical parameters [21]. However, several technical challenges remain. First, spectral redundancy, in which adjacent bands provide highly correlated information, can reduce model generalizability and increase the risk of overfitting. Second, canopy saturation in later growth stages impairs the ability of reflectance data to capture biochemical changes, especially in dense planting patterns. Third, most existing models do not distinguish between phenological stages and instead aggregate data across the growing season, which limits prediction accuracy and model transferability [22]. Finally, environmental variables such as solar angle, background noise, and leaf orientation introduce variability in spectral response that many models fail to compensate for. These limitations highlight the need for phenology-aware frameworks that combine spectral feature selection with temporal specificity [23].

Although physically based radiative transfer models (RTMs) provide a mechanistic understanding of vegetation reflectance, their application in precision agriculture remains constrained by their high dependency on accurate biophysical parameters, computational complexity [24], and sensitivity to observational conditions [25]. These limitations become more pronounced in dynamic, large-scale field environments, making RTMs less practical for real-time crop monitoring [26].

In contrast, vegetation indices (VIs), derived from specific band combinations, offer a simpler and more robust alternative. VIs are computationally efficient and have shown stable sensitivity to chlorophyll content, particularly in red-edge spectral regions [27]. When combined with machine learning algorithms, they allow effective dimensionality reduction in hyperspectral data while improving model adaptability across phenological stages [28]. Therefore, this study adopts a VI-centered feature selection strategy to establish a phenology-aware chlorophyll estimation framework tailored to cotton [29].

Existing studies on chlorophyll estimation have largely focused on aggregated or static growth stages, often neglecting the phenological dynamics that significantly modulate plant spectral responses. While hyperspectral models have been applied to crops such as maize, rice, and wheat, few studies have systematically examined cotton, especially in a stage-specific context. To our knowledge, this study is the first to incorporate phenological-stage-specific modeling into hyperspectral LCC estimation for cotton. By aligning chlorophyll measurements with six discrete reproductive stages and tailoring spectral feature selection to each phase, we address a key limitation of prior models, namely their reduced accuracy under phenological variability.

Accordingly, we propose a phenology-aware machine learning framework that integrates multistage hyperspectral data with targeted regression models. The main contributions of this work are as follows: (1) we quantify LCC dynamics across six reproductive stages; (2) we identify minimal, stage-specific spectral features to reduce redundancy and optimize prediction; and (3) we evaluate model generalizability using both linear and nonlinear regression schemes. Our results offer practical implications for scalable, adaptive chlorophyll monitoring in cotton production.

Although physically based RTMs offer mechanistic insights into canopy reflectance and biochemical traits, they often require numerous structural and optical parameters (e.g., LAI, leaf angle distribution, soil reflectance) that are difficult to acquire accurately under field conditions. In contrast, empirical approaches using VIs are computationally efficient, require fewer assumptions, and have been validated extensively for chlorophyll estimation in various crops. VIs offer robustness against environmental noise and have demonstrated consistent performance across phenological stages, especially when paired with machine learning. Therefore, this study prioritizes a VI-based feature selection strategy to enable a scalable and operationally efficient chlorophyll monitoring framework for precision agriculture.

2. Materials and Methods

2.1. Workflow Overview

To enhance the clarity of the methodology, a complete workflow diagram is presented in Figure 1, outlining the sequential steps followed in this study: from field-based spectral and chlorophyll data acquisition to spectral preprocessing, feature extraction, dimensionality reduction, model construction, and performance evaluation. Each step is described in detail in the following subsections.

2.2. Study Area and Experimental Design

The field experiment was conducted during the 2024 cotton growing season (April–October) at the Huaxing Agricultural Plantation in Changji Prefecture, Xinjiang (43°55′N, 87°34′E), as shown in Figure 2. The site experiences an arid continental climate, with approximately 2700 h of annual sunshine and substantial seasonal temperature fluctuations. Monthly average temperatures range from −15.6 °C in January to 24.5 °C in July, and the annual accumulated temperature above 10 °C exceeds 3200 °C·d. Annual precipitation averages 190 mm, over 70% of which falls between June and August, and the frost-free period of 160–190 days is suitable for cotton cultivation.

Experimental plots were established using CCM 113, a cotton cultivar adapted to regional agroecological conditions. Systematic sampling was conducted across six reproductive stages: beginning bud (June 5–15), full bud (June 20–30), first flowering (July 5–15), full flowering (July 20–30), initial boll development (August 5–15), and full boll development (August 20–30).

At each growth stage, synchronized spectral and chlorophyll measurements were collected from apical and basal trifoliate leaves, yielding six temporally resolved datasets. The experimental design accounted for both diurnal and vertical canopy variability, with data collected between 12:00 and 14:00 under consistent solar zenith angles (±5°).

2.3. Data Acquisition

2.3.1. Hyperspectral Data Collection

Hyperspectral reflectance data were acquired using an ASD FieldSpec HandHeld 2 spectrometer (spectral range: 320–1080 nm), which offers a wavelength accuracy of 1 nm and a spectral resolution of <3 nm at 700 nm. Prior to each measurement session, radiometric calibration was performed using a standardized white reference panel to obtain digital-to-radiance conversion coefficients. Spectral measurements were conducted under diffuse light conditions (clear skies, solar zenith angle < 15°), with the 25° field-of-view sensor positioned perpendicular to the center of the leaf lamina.

2.3.2. Leaf Chlorophyll Content Measurement

Chlorophyll content was non-destructively estimated using a SPAD-502Plus meter. Measurements were synchronized with spectral readings on the same leaves, and six replicates were averaged after removing outliers exceeding ±5%.
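A minimal sketch of the replicate-averaging step, assuming that "exceeding ±5%" means a reading that deviates from the replicate mean by more than 5% of that mean. The helper name `average_spad` and the example readings are hypothetical, not study data.

```python
import numpy as np


def average_spad(readings, tol=0.05):
    """Average replicate SPAD readings after dropping outliers.

    Assumption: a reading is an outlier when it deviates from the replicate
    mean by more than `tol` (5%) of that mean.
    """
    x = np.asarray(readings, dtype=float)
    mean = x.mean()
    keep = np.abs(x - mean) <= tol * mean
    # Fall back to the raw mean if every reading would be discarded.
    return float(x[keep].mean()) if keep.any() else float(mean)


# Six replicate readings on one leaf (illustrative values only).
leaf_spad = average_spad([52.1, 53.0, 51.8, 52.6, 60.2, 52.3])
```

Here the 60.2 reading falls outside the ±5% band and is excluded before averaging. A robust centre such as the median would be an equally plausible reading of the protocol.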

2.4. Data Preprocessing and Feature Extraction

2.4.1. Spectral Preprocessing

Hyperspectral reflectance data underwent a series of preprocessing steps to ensure data quality and minimize noise. Noisy spectral regions outside 400–1000 nm were excluded. Savitzky–Golay smoothing (window size = 11, polynomial order = 2) reduced high-frequency noise. Standard Normal Variate (SNV) transformation corrected for baseline shifts and scatter. Outliers were removed via visual and statistical screening.
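The trimming, smoothing, and SNV steps can be sketched with NumPy and SciPy's `savgol_filter`, using the stated window size (11) and polynomial order (2). The 1 nm sampling grid, the helper name `preprocess`, and the synthetic spectra are assumptions for illustration; the visual and statistical outlier screening is not reproduced.

```python
import numpy as np
from scipy.signal import savgol_filter


def preprocess(wavelengths, spectra, lo=400.0, hi=1000.0):
    """Trim, smooth, and SNV-normalize reflectance spectra.

    spectra: array of shape (n_samples, n_bands), one spectrum per row.
    """
    wl = np.asarray(wavelengths, dtype=float)
    X = np.asarray(spectra, dtype=float)
    # 1) Keep only the 400-1000 nm range.
    mask = (wl >= lo) & (wl <= hi)
    wl, X = wl[mask], X[:, mask]
    # 2) Savitzky-Golay smoothing along the band axis (window 11, order 2).
    X = savgol_filter(X, window_length=11, polyorder=2, axis=1)
    # 3) Standard Normal Variate: center and scale each spectrum individually.
    X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    return wl, X


rng = np.random.default_rng(0)
wl = np.arange(320.0, 1081.0)  # assumed 1 nm sampling over 320-1080 nm
spectra = 0.3 + 0.01 * rng.standard_normal((5, wl.size))
wl_trim, X_snv = preprocess(wl, spectra)
```

SNV is applied per spectrum (row-wise), which is what distinguishes it from column-wise z-scoring of features before modeling.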

2.4.2. Spectral Feature Extraction

Three categories of features were extracted: 45 vegetation indices (VIs), 4 trilateration edge parameters (TEPs), and first-order derivative spectra (FODS). The VIs were computed from bands known to be sensitive to chlorophyll (e.g., MTCI, CIgreen, Datt1), the TEPs captured red-edge geometric properties, and the FODS enhanced sensitivity to local reflectance changes (Table 1). All computations were performed in Python 3.10 with NumPy and SciPy.
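The first-order derivative spectrum and the two TEP-based ratios reported later, SDr/SDb and (SDr − SDb)/(SDr + SDb), could be computed along these lines. The edge windows (blue edge 490–530 nm, red edge 680–760 nm) and the definition of SDb and SDr as sums of derivative values within each window are common literature conventions assumed here, not settings confirmed by the study. `edge_parameters` is a hypothetical helper.

```python
import numpy as np


def edge_parameters(wavelengths, spectrum, blue=(490.0, 530.0), red=(680.0, 760.0)):
    """Derivative-based edge parameters for one reflectance spectrum.

    Assumed definitions: Db/Dr are the maximum first derivatives within the
    blue/red edge windows; SDb/SDr are the sums of first derivatives there.
    """
    wl = np.asarray(wavelengths, dtype=float)
    # First-order derivative spectrum (FODS), dR/dλ, via central differences.
    fods = np.gradient(np.asarray(spectrum, dtype=float), wl)
    in_blue = (wl >= blue[0]) & (wl <= blue[1])
    in_red = (wl >= red[0]) & (wl <= red[1])
    sdb, sdr = fods[in_blue].sum(), fods[in_red].sum()
    return {
        "Db": float(fods[in_blue].max()),
        "Dr": float(fods[in_red].max()),
        "SDb": float(sdb),
        "SDr": float(sdr),
        "SDr/SDb": float(sdr / sdb),
        "(SDr-SDb)/(SDr+SDb)": float((sdr - sdb) / (sdr + sdb)),
    }


# Synthetic spectrum (not real leaf data): a small rise near 510 nm and a red edge near 720 nm.
wl = np.arange(400.0, 1001.0)
spectrum = (0.04 + 0.04 / (1 + np.exp(-(wl - 510) / 8))
            + 0.45 / (1 + np.exp(-(wl - 720) / 12)))
params = edge_parameters(wl, spectrum)
```

With 1 nm sampling, SDr is approximately the reflectance gain across the red-edge window. This is why the normalized ratio behaves like a contrast between the red-edge and blue-edge slopes.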

2.5. Feature Selection and Dimensionality Reduction

To reduce multicollinearity and high dimensionality, a three-step selection method was applied:

Pearson correlation ( $| r | > 0.55$ ) pre-screened candidate features.
Lasso regression introduced sparsity, identifying key variables.
Random forest importance scoring quantified predictor relevance.
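The three-step screening could be sketched with NumPy and scikit-learn as follows. The |r| > 0.55 threshold comes from the text. Choosing the Lasso penalty by 5-fold cross-validation (`LassoCV`), standardizing features before the Lasso step, and ranking with scikit-learn's impurity-based `feature_importances_` are assumptions about details the section does not specify. `select_features` is a hypothetical helper, and the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler


def select_features(X, y, names, r_min=0.55, seed=0):
    """Pearson pre-screen -> Lasso sparsity -> random-forest importance ranking."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    # Step 1: keep features whose |Pearson r| with LCC exceeds r_min.
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    kept = np.flatnonzero(np.abs(r) > r_min)
    if kept.size == 0:
        return []
    # Step 2: Lasso on standardized features; keep non-zero coefficients.
    Xs = StandardScaler().fit_transform(X[:, kept])
    lasso = LassoCV(cv=5, random_state=seed).fit(Xs, y)
    kept = kept[np.abs(lasso.coef_) > 1e-8]
    if kept.size == 0:
        return []
    # Step 3: rank the survivors by random-forest impurity-based importance.
    rf = RandomForestRegressor(n_estimators=300, random_state=seed).fit(X[:, kept], y)
    order = np.argsort(rf.feature_importances_)[::-1]
    return [(names[kept[i]], float(rf.feature_importances_[i])) for i in order]


# Synthetic example: an LCC-like target driven by f0 and f1, plus four noise features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
y = 45 + 5 * X[:, 0] + 5 * X[:, 1] + 0.5 * rng.normal(size=200)
ranking = select_features(X, y, [f"f{j}" for j in range(6)])
```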

This combined statistical and machine learning framework ensured interpretability and robustness in feature prioritization.

2.6. Model Construction and Evaluation

2.6.1. Linear Regression Models

Univariate linear regression (ULR) and multiple linear regression (MLR) provided interpretable baselines:

$$Y = c_{0} + c_{1} x_{1} \tag{1}$$

$$Y = c_{0} + c_{1} x_{1} + c_{2} x_{2} + c_{3} x_{3} \tag{2}$$

where $x_{1}$, $x_{2}$, and $x_{3}$ are spectral predictors, $Y$ is leaf chlorophyll content, and $c_{i}$ are regression coefficients.
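Equations (1) and (2) are ordinary least-squares fits. The minimal NumPy sketch below, with hypothetical helper names, recovers the intercept and slopes from a predictor matrix. It illustrates the model form and is not the authors' code.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares for Eq. (1) or (2); returns [c0, c1, ..., ck]."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:  # univariate case, Eq. (1)
        X = X[:, None]
    A = np.column_stack([np.ones(len(X)), X])  # leading column of ones estimates c0
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict_linear(coef, X):
    """Evaluate Y = c0 + sum_i c_i * x_i."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    return coef[0] + X @ coef[1:]


# Noise-free synthetic data, so the coefficients are recovered exactly.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2]
coef = fit_linear(X, y)
```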

2.6.2. Machine Learning Models

Three algorithms were applied:

RFR: Ensemble method with ntree and mtry tuning.
KNNR: Instance-based learning with k tested from 1 to 15.
SVR: RBF kernel-based model tuned using C and $\gamma$.

Input features were z-score normalized, and the data were split into training (70%) and testing (30%) sets. Hyperparameters were optimized via Bayesian optimization.
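A scikit-learn sketch of this setup follows. Each regressor sits behind a `StandardScaler` (z-score normalization), the data are split 70/30, and k for KNNR is searched over 1 to 15 with 5-fold cross-validation. scikit-learn's `n_estimators` and `max_features` play the roles of ntree and mtry. The fixed RFR and SVR hyperparameters are placeholders. The grid search stands in for the Bayesian optimization used in the study, which is not reproduced here. Function names and data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def build_models(seed=0):
    """RFR, KNNR, and SVR, each preceded by z-score normalization (placeholder settings)."""
    knn_search = GridSearchCV(
        make_pipeline(StandardScaler(), KNeighborsRegressor()),
        {"kneighborsregressor__n_neighbors": list(range(1, 16))},  # k = 1..15
        cv=5,
    )
    return {
        # n_estimators ~ ntree, max_features ~ mtry (fraction of features per split)
        "RFR": make_pipeline(StandardScaler(), RandomForestRegressor(
            n_estimators=500, max_features=1.0, random_state=seed)),
        "KNNR": knn_search,
        "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, gamma="scale")),
    }


def train_and_score(X, y, seed=0):
    """70/30 split; returns the test-set R^2 of each model."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    return {name: model.fit(X_tr, y_tr).score(X_te, y_te)
            for name, model in build_models(seed).items()}


# Synthetic, mildly nonlinear data (not study data).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 3))
y = 40 + 10 * X[:, 0] + 5 * X[:, 1] ** 2 + 0.5 * rng.normal(size=300)
scores = train_and_score(X, y)
```

Placing the scaler inside each pipeline means normalization statistics are learned on the training portion only, which avoids leaking test-set information.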

2.6.3. Model Evaluation

Model accuracy was assessed using:

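$$R^{2} = 1 - \frac{\sum_{i=1}^{N} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{N} (y_{i} - \bar{y})^{2}} \tag{3}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_{i} - \hat{y}_{i})^{2}} \tag{4}$$

where $y_{i}$ and $\hat{y}_{i}$ are the measured and predicted LCC of sample $i$, $\bar{y}$ is the mean measured LCC, and $N$ is the number of samples.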

Cross-validation (k = 5) ensured generalization. Results were visualized using scatterplots with regression and 1:1 reference lines.
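Equations (3) and (4) and the 5-fold cross-validation can be written directly in NumPy, with scikit-learn's `KFold` producing the splits. Shuffling with a fixed seed and averaging the per-fold scores are assumptions. `r2`, `rmse`, and `cv_scores` are hypothetical helper names.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold


def r2(y, y_hat):
    """Eq. (3): coefficient of determination."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))


def rmse(y, y_hat):
    """Eq. (4): root-mean-square error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def cv_scores(model, X, y, k=5, seed=0):
    """Mean R^2 and mean RMSE over k shuffled folds."""
    scores = []
    for tr, te in KFold(n_splits=k, shuffle=True, random_state=seed).split(X):
        model.fit(X[tr], y[tr])
        pred = model.predict(X[te])
        scores.append((r2(y[te], pred), rmse(y[te], pred)))
    mean_r2, mean_rmse = np.mean(scores, axis=0)
    return float(mean_r2), float(mean_rmse)


# Usage on exactly linear synthetic data, where a linear model should be near-perfect.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = 40 + X @ np.array([3.0, -2.0, 1.0])
mean_r2, mean_rmse = cv_scores(LinearRegression(), X, y)
```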

3. Results

3.1. Statistics of Measured LCC

Figure 3 illustrates the temporal variation in leaf chlorophyll content (LCC) in cotton, based on a total of 1980 samples, with 330 samples collected at each of the six developmental stages from mid-June to early August. The average LCC exhibited a clear phenological trend, rising from 33.1 (mid-June) to a peak of 71.2 (early July), followed by a gradual decline to 63.2 (mid-July). In subsequent stages, maximum LCC values continued to decrease, reaching 61.1 by early August.

Early July was identified as a critical stage, with significantly higher mean LCC (range: 35.4–72.5) compared to other phases: mid-June (33.1–59.7), late June (38.4–61.3), mid-July (39.8–57.7), late July (37.8–64.2), and early August (30.4–63.1). Variability analysis showed coefficients of variation (CV) ranging from 18.9% to 33.5%, with the highest dispersion observed in late July (CV > 30%) and early August. Notably, the mid-July stage showed moderate variability (39.8–57.7), despite being a transitional growth phase.
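For reference, the coefficient of variation is the standard deviation divided by the mean, expressed as a percentage. The sketch assumes the sample standard deviation (ddof = 1), since the section does not state which estimator was used. The helper name is hypothetical.

```python
import numpy as np


def coefficient_of_variation(values):
    """CV (%) = sample standard deviation / mean * 100."""
    v = np.asarray(values, dtype=float)
    return float(100.0 * v.std(ddof=1) / v.mean())


# Illustrative values only: mean 50, sample SD 10, so CV = 20%.
cv_example = coefficient_of_variation([40.0, 50.0, 60.0])
```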

3.2. Parameter Selection

Figure 4 presents the correlation coefficients between LCC and all spectral predictors, including 45 vegetation indices (VIs), 4 trilateration edge parameters (TEPs), and the first-order derivative spectrum (FODS). Through Pearson correlation screening, ten key predictors were identified: seven VIs (e.g., MTCI, SR_750/710, RVI), two TEPs, and FODS, with correlation coefficients ranging from 0.56 to 0.73. Notably, Carter4 exhibited the strongest negative correlation (r = −0.78), indicating a unique sensitivity to chlorophyll variation.

Figure 5 shows the phenology-specific relationship between LCC and FODS. Peak correlations occurred at different wavelengths across growth stages, such as 743 nm during the bud initiation stage (r = 0.65) and 752.4 nm during full flowering (r = 0.64). These optimal bands lie within the red-edge region, which is known to be highly sensitive to chlorophyll content, supporting the biological relevance of the selected features.

Biological Significance of Key Features

Among the spectral indices evaluated for leaf chlorophyll content (LCC) estimation, two key features, MTCI (MERIS Terrestrial Chlorophyll Index) and FODS-743 (first-order derivative at 743 nm), demonstrated robust predictive capability.

MTCI: This index exploits the red-edge region (700–750 nm), where reflectance transitions sharply due to chlorophyll absorption properties. MTCI is calculated as:

$$\mathrm{MTCI} = \frac{R_{753} - R_{708}}{R_{708} - R_{681}},$$

where R denotes reflectance at specified wavelengths [71]. As chlorophyll content increases, the red-edge position shifts toward longer wavelengths (red shift), altering reflectance gradients captured by MTCI [72]. This sensitivity to chlorophyll-mediated spectral displacement makes MTCI effective for monitoring photosynthetic pigment dynamics and plant physiological status.
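The following is a minimal sketch of the MTCI calculation. It assumes reflectance is sampled on a wavelength grid and takes the band nearest each nominal wavelength. `band_value`, `mtci`, and `sigmoid_edge` are hypothetical helper names, and the sigmoid spectra are synthetic. The check illustrates the red-shift argument: moving the red edge to longer wavelengths raises MTCI.

```python
import numpy as np


def band_value(wavelengths, spectrum, target_nm):
    """Reflectance at the sampled band closest to target_nm."""
    wl = np.asarray(wavelengths, dtype=float)
    return float(np.asarray(spectrum, dtype=float)[np.argmin(np.abs(wl - target_nm))])


def mtci(wavelengths, spectrum):
    """MTCI = (R753 - R708) / (R708 - R681)."""
    r753, r708, r681 = (band_value(wavelengths, spectrum, nm) for nm in (753, 708, 681))
    return (r753 - r708) / (r708 - r681)


def sigmoid_edge(wl, centre):
    """Synthetic red edge: reflectance rising from ~0.05 to ~0.50 around `centre` nm."""
    return 0.05 + 0.45 / (1.0 + np.exp(-(wl - centre) / 12.0))


wl = np.arange(650.0, 801.0)
low_chl = mtci(wl, sigmoid_edge(wl, 715))   # red edge at shorter wavelength
high_chl = mtci(wl, sigmoid_edge(wl, 725))  # red-shifted edge
```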

FODS-743: The first-order derivative at 743 nm ($\left. \frac{dR}{d\lambda} \right|_{743\,\mathrm{nm}}$) enhances detection of subtle chlorophyll variations by quantifying the slope within the red-edge transition zone. Spectral derivatives suppress baseline effects while amplifying subtle absorption features linked to chlorophyll concentration [73]. The 743 nm wavelength lies near the peak of the first-derivative spectrum at the red-edge inflection, coinciding with maximal sensitivity to chlorophyll-induced scattering and absorption changes. This technique effectively tracks pigment dynamics across phenological stages by isolating chlorophyll-dependent spectral variations.
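The derivative feature can be sketched as follows. The first-derivative spectrum is computed by central differences (`np.gradient`), FODS-743 is read at the band nearest 743 nm, and the red-edge position is located as the wavelength of the derivative maximum within 680–780 nm. Helper names and the synthetic sigmoid spectra are illustrative assumptions, not the study's processing chain.

```python
import numpy as np


def fods_at(wavelengths, spectrum, target_nm=743.0):
    """First derivative dR/dλ at the sampled band closest to target_nm."""
    wl = np.asarray(wavelengths, dtype=float)
    d = np.gradient(np.asarray(spectrum, dtype=float), wl)
    return float(d[np.argmin(np.abs(wl - target_nm))])


def red_edge_position(wavelengths, spectrum, lo=680.0, hi=780.0):
    """Wavelength of the maximum first derivative inside the red-edge window."""
    wl = np.asarray(wavelengths, dtype=float)
    d = np.gradient(np.asarray(spectrum, dtype=float), wl)
    window = (wl >= lo) & (wl <= hi)
    return float(wl[window][np.argmax(d[window])])


def synthetic_edge(wl, centre):
    """Synthetic red edge centred at `centre` nm (not measured data)."""
    return 0.05 + 0.45 / (1.0 + np.exp(-(wl - centre) / 12.0))


wl = np.arange(650.0, 801.0)
rep_720 = red_edge_position(wl, synthetic_edge(wl, 720))
slope_shifted = fods_at(wl, synthetic_edge(wl, 735))
slope_short = fods_at(wl, synthetic_edge(wl, 715))
```

As the edge shifts toward 743 nm, the derivative read at that band grows. This is the mechanism by which a single-band derivative can track chlorophyll-driven red shifts.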

Collectively, these indices leverage distinct aspects of light interaction with foliar components: MTCI characterizes chlorophyll-driven red-edge displacement through band ratios, while FODS-743 isolates chlorophyll-sensitive scattering signatures via spectral derivatives. Their efficacy stems from targeting specific biophysical mechanisms—chlorophyll absorption in red wavelengths (670–700 nm) and photon scattering in the near-infrared (>750 nm)—enabling precise quantification of photosynthetic pigment concentrations.

3.3. Univariate Linear Regression

The ten selected spectral features were individually modeled against LCC using univariate linear regression (ULR). Model performance varied across developmental stages, with the coefficient of determination ($R^{2}$) peaking for different predictors depending on the phenophase (Table 2). For instance, MTCI yielded the highest $R^{2}$ value (0.62) at the full bud stage, while Datt1 performed best during full flowering ($R^{2} = 0.68$). The strongest predictive accuracy was observed during the first boll (mNDVI₇₀₅, $R^{2} = 0.72$) and full boll stages (FODS-743, $R^{2} = 0.73$).

Conversely, early stages such as the beginning bud exhibited lower performance (SR_750/710, $R^{2} = 0.39$). Transitional stages such as first flower maintained moderate predictability (FODS-752.4, $R^{2} = 0.60$). These findings underscore the phenological dependency of spectral response and support the use of stage-adaptive predictor selection in chlorophyll modeling.

3.4. Algorithm Implementation


3.5. Multiple Linear Regression

Lasso regression was applied to the ten pre-selected features, yielding three optimal predictors per phenological stage for use in multiple linear regression (MLR) models. Model performance improved progressively across stages, achieving maximum explanatory power at the full boll stage ($R^{2} = 0.77$, RMSE = 4.37), as shown in Table 3.

RMSE values generally decreased with developmental progression, indicating enhanced model stability in later stages. Post-flowering stages consistently achieved $R^{2} > 0.60$, while the first flower stage showed lower performance ($R^{2} = 0.59$). These results affirm the effectiveness of Lasso regularization in balancing model complexity and prediction accuracy, especially under high-dimensional spectral conditions.

Machine Learning Algorithms

Figure 6 displays the top ten most important features identified by random forest regression for each phenological stage. Two dominant features were consistently observed per stage, including RVI and Datt1 during bud initiation, and mNDVI₇₀₅ and Datt3 during the flowering-to-boll transition.

A comparative analysis of machine learning algorithms (Table 4, Figure 7, Figure 8 and Figure 9) revealed that RFR outperformed KNNR and SVR across all growth stages, with $R^{2}$ consistently above 0.70. The highest accuracy was achieved at the beginning bud stage using RFR ($R^{2} = 0.85$), compared to SVR ($R^{2} = 0.72$) and KNNR ($R^{2} = 0.80$). Although all models performed well during early stages, RFR demonstrated superior stability throughout the reproductive cycle.

4. Discussion

This study established quantitative relationships between cotton leaf chlorophyll content (LCC) and hyperspectral reflectance through systematic correlation and model performance analyses. Seven vegetation indices (VIs) exhibited strong correlations ($|r| > 0.6$), primarily within the chlorophyll-sensitive red-edge region (680–780 nm), such as MTCI ($r = 0.73$), SR_750/710 ($r = 0.70$), and mNDVI₇₀₅ ($r = 0.67$). TEPs also showed moderate correlations, including $(SD_{r} - SD_{b})/(SD_{r} + SD_{b})$ and $SD_{r}/SD_{b}$ ($r > 0.56$). Additionally, the first-order derivative at 743 nm (FODS-743) achieved $r = 0.67$, confirming its effectiveness in capturing subtle chlorophyll-related reflectance variations.

Two modeling strategies were employed for chlorophyll estimation: (1) full-spectrum modeling using all available wavelengths, and (2) feature-optimized modeling based on selected predictors. Machine learning algorithms—Random Forest (RFR), Support Vector Regression (SVR), and K-Nearest Neighbor Regression (KNNR)—were compared with linear regression models to assess the advantages of parameter optimization in high-dimensional spectral spaces. This dual-strategy approach enabled a comprehensive evaluation of model robustness while accounting for the trade-off between data dimensionality and generalizability.

4.1. Environmental Context of Stage-Specific Variability

The observed high coefficients of variation (CV) in leaf chlorophyll content (LCC) during certain phenological stages, particularly in late July and early August, can be attributed to various environmental factors that influenced the physiological state of the cotton plants. Environmental stressors, such as temperature fluctuations, water availability, and light intensity, were particularly prominent during these stages, which likely contributed to the variability in LCC measurements.

For example, during late July, when the CV was notably high (CV > 30%), water stress became a critical factor. The combination of reduced irrigation and high evapotranspiration during this period may have led to inconsistent chlorophyll synthesis, as plants adapted to moisture stress by reducing photosynthetic activity, which directly impacted chlorophyll content. Additionally, the fluctuation in soil moisture could have affected stomatal conductance, further exacerbating the variability in LCC.

Similarly, in early August, during the full boll development stage, a decrease in light intensity and temperature associated with the late reproductive phase could have affected chlorophyll stability. As cotton plants undergo metabolic transitions during boll development, the physiological processes influencing chlorophyll content become more susceptible to environmental changes, which likely contributed to the observed variation.

These environmental influences underscore the importance of considering the local climate and agronomic practices when interpreting stage-specific variability in LCC. Future research that spans multiple growing seasons will provide more comprehensive insights into how these environmental factors interact with plant phenology to influence chlorophyll dynamics.

4.2. Spectral Predictor Optimization via Linear Regression for Chlorophyll Quantification

Univariate analysis identified FODS-743 as the optimal single predictor at the full boll stage, consistent with prior studies on hyperspectral chlorophyll estimation in crops such as tobacco [74], wheat [75], and soybean [76]. The enhanced predictive capacity of spectral derivatives is attributed to their ability to amplify chlorophyll-related features in the red-edge region (680–780 nm), where reflectance rises steeply from the red chlorophyll absorption band to the near-infrared plateau [77].

In multivariate regression, combining $SD_{r}/SD_{b}$, MTCI, and Carter3 yielded the highest performance ($R^{2} = 0.77$) at the full boll stage (Table 3). This represents a 23.6% improvement over univariate models, underscoring the synergistic effect of integrating multiple spectral features. MTCI reflects red-edge inflection dynamics [78], while $SD_{r}/SD_{b}$ captures variations in internal leaf structure. The consistent performance of SR_750/710 across phenological stages also supports its broad applicability in canopy-level chlorophyll estimation, in line with results from forest-based studies [79].

4.3. Comparative Efficacy of ML Architectures in Chlorophyll Quantification

Among the machine learning models tested, Random Forest Regression (RFR) demonstrated the highest predictive accuracy across all phenological stages. The model achieved optimal performance at the beginning bud stage ($R^{2} = 0.85$, RMSE = 4.1), outperforming both KNNR ($R^{2} = 0.80$) and SVR ($R^{2} = 0.72$). RFR maintained high accuracy in later stages as well, confirming its adaptability to spectral and phenological variability.

RFR’s superior performance is attributable to three core advantages: (1) bootstrap aggregation and random feature subspace sampling reduce overfitting [80]; (2) ensemble averaging enhances robustness to spectral noise; and (3) internal variable importance, computed as the mean decrease in impurity (variance reduction for regression trees), improves model interpretability [81]. In contrast, KNNR showed limited flexibility due to its reliance on local spectral distance, while SVR’s performance fluctuated across stages, likely due to the kernel function’s limitations in handling high-dimensional inputs.

These findings reinforce conclusions from previous crop modeling studies, emphasizing the value of ensemble learning in trait estimation. The stage-specific accuracy variations further highlight the need for adaptive model selection strategies, particularly during transitions when biochemical and structural traits undergo rapid change.

4.4. Model Performance Across Key Phenological Stages

The performance differences observed among models at specific phenological stages provide insights into their adaptability and practical relevance in real-world agricultural scenarios. For instance, during the full flowering stage, Support Vector Regression (SVR) outperformed the other models ($R^{2} = 0.79$), likely because of its ability to handle nonlinear spectral responses associated with complex canopy structures and rapid chlorophyll changes. In contrast, Random Forest Regression (RFR) showed relatively lower performance ($R^{2} = 0.52$), possibly because increased noise from overlapping leaves and variable light scattering challenges ensemble tree-based methods at this stage.

During the early boll development stage, RFR regained its advantage, achieving the highest $R^{2}$ (0.80). This suggests that ensemble methods are more robust to the gradual spectral shifts and structural stabilization of the canopy that occur after flowering. KNNR, while effective during certain stages, exhibited performance drops in later growth phases, likely due to its sensitivity to local data distribution and diminished discriminative power in increasingly homogeneous spectral environments.

These results suggest that SVR is particularly suitable for phases with rapid physiological changes, while RFR is better suited to structurally stable stages like early or full boll development. Such insights can guide practitioners in selecting models tailored to the phenological stage of interest, thus improving prediction accuracy in field applications.

4.5. Exploring Future Prospects for Cotton Chlorophyll Research

This study proposes a scalable framework that integrates univariate, multivariate, and machine learning models for cotton LCC estimation across six reproductive stages. Although the method was demonstrated at the leaf level, it has considerable potential for upscaling. With the integration of UAV-mounted or satellite-based hyperspectral sensors, this approach could facilitate real-time, field-to-regional scale monitoring of crop photosynthetic activity and nitrogen dynamics.

Performance variation across stages was partly influenced by environmental conditions: (1) declines in light and temperature during late reproductive phases reduced chlorophyll stability; (2) variations in soil moisture and water stress altered stomatal conductance and pigment synthesis; and (3) combined meteorological effects (e.g., UV radiation, humidity, wind) modulated physiological processes. These interactions suggest the need for adaptive monitoring systems that integrate spectral and micrometeorological data through advanced sensor fusion.

Considering that chlorophyll content is sensitive to multiple environmental stressors, integrating micrometeorological measurements (e.g., solar radiation, air temperature, and soil moisture) with spectral data will enhance model reliability. Future work should focus on environmental-feature-aware modeling frameworks to address spatiotemporal variability in chlorophyll dynamics.

Future research should focus on: (1) validating hyperspectral indices across platforms and sensors (e.g., UAV, LiDAR); (2) developing mechanistic models of chlorophyll–metabolite interactions under climatic stress; and (3) creating phenology-aware calibration protocols for satellite-based applications.

4.6. Practical Implications of Minimal-Band Modeling

The finding that fewer than five spectral features can achieve high prediction accuracy has practical implications for agricultural remote sensing. It enables the design of simplified, cost-efficient multispectral sensors tailored to chlorophyll monitoring, reducing hardware and data processing complexity. This is particularly valuable for UAV and other airborne platforms, where payload and battery constraints limit spectral resolution. Deploying models based on a few key features (e.g., the 743 nm derivative and SR_750/710) allows real-time monitoring at the canopy level, supporting precision nitrogen management in large-scale cotton systems.

5. Conclusions

This study achieved three core objectives: (1) it quantitatively characterized the effect of phenological stage on chlorophyll–reflectance relationships, identifying stage-specific optimal predictors; (2) it identified a minimal set of high-performing spectral features, including mNDVI₇₀₅ and FODS at 743 nm, that consistently delivered strong predictive accuracy; and (3) it demonstrated that ensemble-based models, particularly Random Forest Regression (RFR), achieved the highest overall accuracy, although the best-performing model varied by phenophase (Table 4). These results establish a scalable, phenology-aware framework for chlorophyll estimation.

Among the tested models, RFR demonstrated superior performance and temporal robustness, achieving the highest accuracy at the beginning bud ($R^{2} = 0.85$) and first boll ($R^{2} = 0.80$) stages. This robustness is attributed to RFR’s ensemble structure, which mitigates overfitting and improves noise resistance through bootstrap aggregation and feature subspace sampling.
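The two ensemble mechanisms credited above can be illustrated with a deliberately small, NumPy-only sketch: each tree is grown on a bootstrap resample, each split considers only a random subset of features, and the forest prediction is the mean over trees. The decile-based split search, the default hyperparameters, and the synthetic data are assumptions chosen for brevity; this is not the RFR configuration used in the study, and a production workflow would normally rely on an established library implementation.

```python
import numpy as np


def _best_split(X, y, feats, min_leaf):
    """Return (feature, threshold) that minimises the summed squared error."""
    best_sse, best = np.inf, (None, None)
    for j in feats:
        # Candidate thresholds: feature deciles (a simplification for brevity).
        for t in np.unique(np.quantile(X[:, j], np.linspace(0.1, 0.9, 9))):
            left = X[:, j] <= t
            n_left = int(left.sum())
            if n_left < min_leaf or len(y) - n_left < min_leaf:
                continue
            sse = (((y[left] - y[left].mean()) ** 2).sum()
                   + ((y[~left] - y[~left].mean()) ** 2).sum())
            if sse < best_sse:
                best_sse, best = sse, (j, t)
    return best


def _grow(X, y, rng, max_features, depth, max_depth, min_leaf):
    """Grow one regression tree; leaves are floats, splits are tuples."""
    if depth == max_depth or len(y) < 2 * min_leaf:
        return float(y.mean())
    # Feature subspace sampling: each split only sees a random feature subset.
    feats = rng.choice(X.shape[1], size=max_features, replace=False)
    j, t = _best_split(X, y, feats, min_leaf)
    if j is None:
        return float(y.mean())
    m = X[:, j] <= t
    args = (rng, max_features, depth + 1, max_depth, min_leaf)
    return (j, t, _grow(X[m], y[m], *args), _grow(X[~m], y[~m], *args))


def _predict_tree(node, x):
    """Walk from the root to a leaf for a single sample."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def fit_forest(X, y, n_trees=50, max_features=2, max_depth=4, min_leaf=5, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap resample, with replacement
        trees.append(_grow(X[idx], y[idx], rng, max_features, 0, max_depth, min_leaf))
    return trees


def predict_forest(trees, X):
    # Aggregation step of bagging: average the per-tree predictions.
    return np.mean([[_predict_tree(t, x) for x in X] for t in trees], axis=0)


# Synthetic demo (not study data): three features, only the first informative.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(300, 3))
y = 40 + 10 * X[:, 0] + rng.normal(0, 1, size=300)
trees = fit_forest(X[:200], y[:200])
pred = predict_forest(trees, X[200:])
resid = y[200:] - pred
r2 = 1 - (resid ** 2).sum() / ((y[200:] - y[200:].mean()) ** 2).sum()
print(f"hold-out R2 = {r2:.2f}")
```

Averaging many decorrelated trees is what reduces variance here; a single depth-4 tree on the same data would give a coarser, noisier step function.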

Phenological sensitivity was evident: LCC varied substantially (23–41%) across stages, consistent with changing environmental conditions. Decreasing light intensity and temperature during reproductive maturation likely reduced chlorophyll stability, while variable precipitation may have affected stomatal conductance and pigment synthesis. These observations underscore the need for stage-specific, environment-aware monitoring strategies.

The proposed framework bridges proximal sensing and precision agriculture by enabling early detection of physiological stress (e.g., water and nitrogen deficiency) through chlorophyll dynamics. Future implementations may leverage UAV-mounted hyperspectral sensors and edge-computing platforms for real-time, large-scale phenotyping. These tools have the potential to enhance irrigation management and fertilization strategies during critical periods of cotton development, particularly boll formation.

Author Contributions

Writing—original draft, methodology, software, and visualization: C.J.; conceptualization, data curation, and funding acquisition: Y.C.; project administration, supervision, and data curation: Y.L.; formal analysis and supervision: L.P.; project administration and supervision: G.D.; validation, supervision, and data curation: N.L.; funding acquisition and writing—review and editing: Q.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Major Science and Technology Special Project of Xinjiang Uygur Autonomous Region (Grant No. 2022A02011-2).

Data Availability Statement

Due to privacy restrictions, the data supporting this study are not publicly available. Reasonable data requests may be directed to the corresponding author (Chunbo Jiang, jiangchunbo@xaas.ac.cn), subject to compliance with applicable regulations and ethical guidelines.

Conflicts of Interest

The authors declare no conflict of interest.

References

Peng, Y.; Nguy-Robertson, A.; Arkebauer, T.; Gitelson, A.A. Assessment of canopy chlorophyll content retrieval in maize and soybean: Implications of hysteresis on the development of generic algorithms. Remote Sens. 2017, 9, 226.
Evans, J.R. Photosynthesis and nitrogen relationships in leaves of C3 plants. Oecologia 1989, 78, 9–19.
Luo, J.; Zhou, J.J.; Masclaux-Daubresse, C.; Wang, N.; Wang, H.; Zheng, B. Morphological and physiological responses to contrasting nitrogen regimes in Populus cathayana is linked to resources allocation and carbon/nitrogen partition. Environ. Exp. Bot. 2019, 162, 247–255.
Porcar-Castell, A.; Tyystjärvi, E.; Atherton, J.; Van der Tol, C.; Flexas, J.; Pfündel, E.E.; Moreno, J.; Frankenberg, C.; Berry, J.A. Linking chlorophyll a fluorescence to photosynthesis for remote sensing applications: Mechanisms and challenges. J. Exp. Bot. 2014, 65, 4065–4095.
Croft, H.; Chen, J.M.; Luo, X.; Bartlett, P.; Chen, B.; Staebler, R.M. Leaf chlorophyll content as a proxy for leaf photosynthetic capacity. Glob. Change Biol. 2017, 23, 3513–3524.
Sage, R.F.; Pearcy, R.W.; Seemann, J.R. The nitrogen use efficiency of C3 and C4 plants: III. Leaf nitrogen effects on the activity of carboxylating enzymes in Chenopodium album (L.) and Amaranthus retroflexus (L.). Plant Physiol. 1987, 85, 355–359.
Luo, J.; Zhou, J.; Li, H.; Shi, W.; Polle, A.; Lu, M.; Sun, X.; Luo, Z.B. Global poplar root and leaf transcriptomes reveal links between growth and stress responses under nitrogen starvation and excess. Tree Physiol. 2015, 35, 1283–1302.
Qiao, L.; Tang, W.; Gao, D.; Zhao, R.; An, L.; Li, M.; Sun, H.; Song, D. UAV-based chlorophyll content estimation by evaluating vegetation index responses under different crop coverages. Comput. Electron. Agric. 2022, 196, 106775.
Aasen, H.; Honkavaara, E.; Lucieer, A.; Zarco-Tejada, P.J. Quantitative remote sensing at ultra-high resolution with UAV spectroscopy: A review of sensor technology, measurement procedures, and data correction workflows. Remote Sens. 2018, 10, 1091.
Ta, N.; Chang, Q.; Zhang, Y. Estimation of apple tree leaf chlorophyll content based on machine learning methods. Remote Sens. 2021, 13, 3902.
Zhou, J.J.; Zhang, Y.H.; Han, Z.M.; Liu, X.Y.; Jian, Y.F.; Hu, C.G.; Dian, Y.Y. Evaluating the performance of hyperspectral leaf reflectance to detect water stress and estimation of photosynthetic capacities. Remote Sens. 2021, 13, 2160.
Gamon, J.A.; Somers, B.; Malenovský, Z.; Middleton, E.M.; Rascher, U.; Schaepman, M.E. Assessing vegetation function with imaging spectroscopy. Surv. Geophys. 2019, 40, 489–513.
Zhang, Y.; Hui, J.; Qin, Q.; Sun, Y.; Zhang, T.; Sun, H.; Li, M. Transfer-learning-based approach for leaf chlorophyll content estimation of winter wheat from hyperspectral data. Remote Sens. Environ. 2021, 267, 112724.
Zarco-Tejada, P.J.; González-Dugo, M.V.; Fereres, E. Seasonal stability of chlorophyll fluorescence quantified from airborne hyperspectral imagery as an indicator of net photosynthesis in the context of precision agriculture. Remote Sens. Environ. 2016, 179, 89–103.
Datt, B. Remote sensing of chlorophyll a, chlorophyll b, chlorophyll a+ b, and total carotenoid content in eucalyptus leaves. Remote Sens. Environ. 1998, 66, 111–121.
Yamashita, H.; Sonobe, R.; Hirono, Y.; Morita, A.; Ikka, T. Dissection of hyperspectral reflectance to estimate nitrogen and chlorophyll contents in tea leaves based on machine learning algorithms. Sci. Rep. 2020, 10, 17360.
Poobalasubramanian, M.; Park, E.S.; Faqeerzada, M.A.; Kim, T.; Kim, M.S.; Baek, I.; Cho, B.K. Identification of early heat and water stress in strawberry plants using chlorophyll-fluorescence indices extracted via hyperspectral images. Sensors 2022, 22, 8706.
Zheng, Y.M.; Zhang, T.Q.; Zhang, J.; Chen, X.D.; Shen, X.G. Influence of smooth, 1st derivative and baseline correction on the near-infrared spectrum analysis with PLS. Guang Pu Xue Yu Guang Pu Fen Xi 2004, 24, 1546–1548.
Zhao, T.; Komatsuzaki, M.; Okamoto, H.; Sakai, K. Cover crop nutrient and biomass assessment system using portable hyperspectral camera and laser distance sensor. Eng. Agric. Environ. Food 2010, 3, 105–112.
Zhao, T.; Nakano, A.; Iwaski, Y.; Umeda, H. Application of hyperspectral imaging for assessment of tomato leaf water status in plant factories. Appl. Sci. 2020, 10, 4665.
Turpie, K.R. Explaining the spectral red-edge features of inundated marsh vegetation. J. Coast. Res. 2013, 29, 1111–1117.
Shah, S.H.; Angel, Y.; Houborg, R.; Ali, S.; McCabe, M.F. A random forest machine learning approach for the retrieval of leaf chlorophyll content in wheat. Remote Sens. 2019, 11, 920.
Cui, B.; Zhao, Q.; Huang, W.; Song, X.; Ye, H.; Zhou, X. A new integrated vegetation index for the estimation of winter wheat leaf chlorophyll content. Remote Sens. 2019, 11, 974.
Verrelst, J.; Camps-Valls, G.; Muñoz-Marí, J.; Rivera, J.P.; Veroustraete, F.; Clevers, J.G.P.W.; Moreno, J. Optical remote sensing and the retrieval of terrestrial vegetation bio-geophysical properties—A review. Remote Sens. Environ. 2015, 169, 130–147.
Combal, B.; Baret, F.; Weiss, M.; Trubuil, A.; Macé, D.; Pragnere, A.; Myneni, R.; Knyazikhin, Y.; Wang, L. Retrieval of canopy biophysical variables from bidirectional reflectance: Using prior information to solve the ill-posed inverse problem. Remote Sens. Environ. 2003, 84, 1–15.
Jacquemoud, S.; Féret, J.-B.; Baret, F.; Weiss, M. PROSPECT + SAIL models: A review of use for vegetation characterization. Remote Sens. Environ. 2009, 113, S56–S66.
Delegido, J.; Verrelst, J.; Alonso, L.; Moreno, J. Evaluation of Sentinel-2 red-edge bands for empirical estimation of green LAI and chlorophyll content. Remote Sens. 2011, 3, 2874–2898.
Gitelson, A.A.; Gritz, Y.; Merzlyak, M.N. Relationships between leaf chlorophyll content and spectral reflectance and algorithms for non-destructive chlorophyll assessment in higher plant leaves. J. Plant Physiol. 2003, 160, 271–282.
Zarco-Tejada, P.J.; González-Dugo, V.; Berni, J.A.J. Fluorescence, temperature and narrow-band indices acquired from a UAV platform for water stress detection using a micro-hyperspectral imager and a thermal camera. Remote Sens. Environ. 2012, 117, 322–337.
Narmilan, A.; Gonzalez, F.; Salgadoe, A.S.A.; Kumarasiri, U.W.L.M.; Weerasinghe, H.A.S.; Kulasekara, B.R. Predicting canopy chlorophyll content in sugarcane crops using machine learning algorithms and spectral vegetation indices derived from UAV multispectral imagery. Remote Sens. 2022, 14, 1140.
Weng, H.; Liu, Y.; Captoline, I.; Li, X.; Ye, D.; Wu, R. Citrus Huanglongbing detection based on polyphasic chlorophyll a fluorescence coupled with machine learning and model transfer in two citrus cultivars. Comput. Electron. Agric. 2021, 187, 106289.
Kong, W.; Huang, W.; Casa, R.; Zhou, X.; Ye, H.; Dong, Y. Off-nadir hyperspectral sensing for estimation of vertical profile of leaf chlorophyll content within wheat canopies. Sensors 2017, 17, 2711.
Jin, X.; Li, Z.; Feng, H.; Xu, X.; Yang, G. Newly combined spectral indices to improve estimation of total leaf chlorophyll content in cotton. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 4589–4600.
Jin, X.; Wang, K.; Xiao, C.; Diao, W.; Wang, F.; Chen, B.; Li, S. Comparison of two methods for estimation of leaf total chlorophyll content using remote sensing in wheat. Field Crop. Res. 2012, 135, 24–29.
Ali, A.; Imran, M. Remotely sensed real-time quantification of biophysical and biochemical traits of Citrus (Citrus sinensis L.) fruit orchards—A review. Sci. Hortic. 2021, 282, 110024.
Sari, M.; Sonmez, N.K.; Karaca, M. Relationship between chlorophyll content and canopy reflectance in Washington navel orange trees (Citrus sinensis (L.) Osbeck). Pak. J. Bot. 2006, 38, 1093.
Osco, L.P.; Ramos, A.P.M.; Faita Pinheiro, M.M.; Moriya, É.A.S.; Imai, N.N.; Estrabis, N.; Ianczyk, F.; Araújo, F.F.d.; Liesenberg, V.; Jorge, L.A.d.C.; et al. A machine learning framework to predict nutrient content in valencia-orange leaf hyperspectral measurements. Remote Sens. 2020, 12, 906.
Gerhards, M.; Schlerf, M.; Mallick, K.; Udelhoven, T. Challenges and future perspectives of multi-/Hyperspectral thermal infrared remote sensing for crop water-stress detection: A review. Remote Sens. 2019, 11, 1240.
Li, F.; Wang, L.; Liu, J.; Wang, Y.; Chang, Q. Evaluation of leaf N concentration in winter wheat based on discrete wavelet transform analysis. Remote Sens. 2019, 11, 1331.
Gitelson, A.A.; Merzlyak, M.N.; Chivkunova, O.B. Optical properties and nondestructive estimation of anthocyanin content in plant leaves. Photochem. Photobiol. 2001, 74, 38–45.
Liang, L.; Qin, Z.; Zhao, S.; Di, L.; Zhang, C.; Deng, M.; Lin, H.; Zhang, L.; Wang, L.; Liu, Z. Estimating crop chlorophyll content with hyperspectral vegetation indices and the hybrid inversion method. Int. J. Remote Sens. 2016, 37, 2923–2949.
Carter, G.A. Ratios of leaf reflectances in narrow wavebands as indicators of plant stress. Int. J. Remote Sens. 1994, 15, 697–703.
Datt, B. A new reflectance index for remote sensing of chlorophyll content in higher plants: Tests using Eucalyptus leaves. J. Plant Physiol. 1999, 154, 30–36.
Huete, A.; Justice, C.; Liu, H. Development of vegetation and soil indices for MODIS-EOS. Remote Sens. Environ. 1994, 49, 224–234.
Daughtry, C.S.T.; Walthall, C.L.; Kim, M.S.; De Colstoun, E.B.; McMurtrey III, J.E. Estimating corn leaf chlorophyll concentration from leaf and canopy reflectance. Remote Sens. Environ. 2000, 74, 229–239.
Haboudane, D.; Miller, J.R.; Pattey, E.; Zarco-Tejada, P.J.; Strachan, I.B. Hyperspectral vegetation indices and novel algorithms for predicting green LAI of crop canopies: Modeling and validation in the context of precision agriculture. Remote Sens. Environ. 2004, 90, 337–352.
Marshak, A.; Knyazikhin, Y.; Davis, A.B.; Wiscombe, W.J.; Pilewskie, P. Cloud-vegetation interaction: Use of normalized difference cloud index for estimation of cloud optical thickness. Geophys. Res. Lett. 2000, 27, 1695–1698.
Merzlyak, M.N.; Gitelson, A.A.; Chivkunova, O.B.; Rakitin, V.Y. Non-destructive optical detection of pigment changes during leaf senescence and fruit ripening. Physiol. Plant. 1999, 106, 135–141.
Roujean, J.L.; Breon, F.M. Estimating PAR absorbed by vegetation from bidirectional reflectance measurements. Remote Sens. Environ. 1995, 51, 375–384.
Clevers, J.G.P.W. Imaging spectrometry in agriculture-plant vitality and yield indicators. In Imaging Spectrometry—A Tool for Environmental Observations; Springer: Berlin, Germany, 1994; pp. 193–219.
Vincini, M.; Frazzi, E.; D’Alessio, P. Angular dependence of maize and sugar beet VIs from directional CHRIS/Proba data. In Proceedings of the 4th ESA CHRIS PROBA Workshop, Frascati, Italy, 19–21 September 2006; pp. 19–21.
Penuelas, J.; Baret, F.; Filella, I. Semi-empirical indices to assess carotenoids/chlorophyll a ratio from leaf spectral reflectance. Photosynthetica 1995, 31, 221–230.
Lichtenthaler, H.K.; Lang, M.; Stober, F.; Sowinska, M.; Heisel, F.; Miehe, J.A. Detection of photosynthetic parameters and vegetation stress via a new high resolution fluorescence imaging-system. In International Colloquium Photosynthesis and Remote Sensing; EARSeL: Strasbourg, France, 1995; pp. 103–112.
McMurtrey III, J.E.; Chappelle, E.W.; Kim, M.S.; Meisinger, J.J.; Corp, L.A. Distinguishing nitrogen fertilization levels in field corn (Zea mays L.) with actively induced fluorescence and passive reflectance measurements. Remote Sens. Environ. 1994, 47, 36–44.
Gitelson, A.A.; Merzlyak, M.N. Remote estimation of chlorophyll content in higher plant leaves. Int. J. Remote Sens. 1997, 18, 2691–2697.
Zarco-Tejada, P.J.; Miller, J.R. Land cover mapping at BOREAS using red edge spectral parameters from CASI imagery. J. Geophys. Res. Atmos. 1999, 104, 27921–27933.
Sims, D.A.; Gamon, J.A. Relationships between leaf pigment content and spectral reflectance across a wide range of species, leaf structures and developmental stages. Remote Sens. Environ. 2002, 81, 337–354.
Haboudane, D.; Miller, J.R.; Tremblay, N.; Zarco-Tejada, P.J.; Dextraze, L. Integrated narrow-band vegetation indices for prediction of crop chlorophyll content for application to precision agriculture. Remote Sens. Environ. 2002, 81, 416–426.
Rondeaux, G.; Steven, M.; Baret, F. Optimization of soil-adjusted vegetation indices. Remote Sens. Environ. 1996, 55, 95–107.
Broge, N.H.; Leblanc, E. Comparing prediction power and stability of broadband and hyperspectral vegetation indices for estimation of green leaf area index and canopy chlorophyll density. Remote Sens. Environ. 2001, 76, 156–172.
Datt, B. Remote sensing of water content in Eucalyptus leaves. Aust. J. Bot. 1999, 47, 909–923.
Blackburn, G.A. Spectral indices for estimating photosynthetic pigment concentrations: A test using senescent tree leaves. Int. J. Remote Sens. 1998, 19, 657–675.
Gitelson, A.A.; Viña, A.; Ciganda, V.; Rundquist, D.C.; Arkebauer, T.J. Remote estimation of canopy chlorophyll content in crops. Geophys. Res. Lett. 2005, 32, L08403.
Fitzgerald, G.; Rodriguez, D.; O’Leary, G. Measuring and predicting canopy nitrogen nutrition in wheat using a spectral index—The canopy chlorophyll content index (CCCI). Field Crop. Res. 2010, 116, 318–324.
Dash, J.; Curran, P.J. The MERIS terrestrial chlorophyll index. Int. J. Remote Sens. 2004, 25, 5403–5413.
Gitelson, A.A.; Keydan, G.P.; Merzlyak, M.N. Three-band model for noninvasive estimation of chlorophyll, carotenoids, and anthocyanin contents in higher plant leaves. Geophys. Res. Lett. 2006, 33, L11402.
Jordan, C.F. Derivation of leaf-area index from quality of light on the forest floor. Ecology 1969, 50, 663–666.
Li, X.Y.; Liu, G.S.; Yang, Y.F.; Zhao, C.H.; Yu, Q.W.; Song, S.X. Relationship between hyperspectral parameters and physiological and biochemical indexes of flue-cured tobacco leaves. Agric. Sci. China 2007, 6, 665–672.
Ying, X. An overview of overfitting and its solutions. In Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2019; Volume 1168, p. 022022.
Fonti, V.; Belitser, E. Feature selection using lasso. VU Amst. Res. Pap. Bus. Anal. 2017, 30, 1–25.
Gondek, J.S.; Meyer, G.W.; Newman, J.G. Wavelength dependent reflectance functions. In Proceedings of the 21st Annual Conference on Computer Graphics and Interactive Techniques, Orlando, FL, USA, 24–29 July 1994; pp. 213–220.
Dash, J.; Curran, P.J. Evaluation of the MERIS terrestrial chlorophyll index (MTCI). Adv. Space Res. 2007, 39, 100–104.
Demetriades-Shah, T.H.; Steven, M.D.; Clark, J.A. High resolution derivative spectra in remote sensing. Remote Sens. Environ. 1990, 33, 55–64.
Zhang, X.; Sun, H.; Qiao, X.; Yan, X.; Feng, M.; Xiao, L.; Song, X.; Zhang, M.; Shafiq, F.; Yang, W.; et al. Hyperspectral estimation of canopy chlorophyll of winter wheat by using the optimized vegetation indices. Comput. Electron. Agric. 2022, 193, 106654.
Shi, H.; Guo, J.; An, J.; Tang, Z.; Wang, X.; Li, W.; Zhao, X.; Jin, L.; Xiang, Y.; Li, Z.; et al. Estimation of chlorophyll content in soybean crop at different growth stages based on optimal spectral index. Agronomy 2023, 13, 663.
Li, L.; Ren, T.; Ma, Y.; Wei, Q.; Wang, S.; Li, X.; Cong, R.; Liu, S.; Lu, J. Evaluating chlorophyll density in winter oilseed rape (Brassica napus L.) using canopy hyperspectral red-edge parameters. Comput. Electron. Agric. 2016, 126, 21–31.
Zarco-Tejada, P.J.; Miller, J.R.; Morales, A.; Berjón, A.; Agüera, J. Hyperspectral indices and model simulation for chlorophyll estimation in open-canopy tree crops. Remote Sens. Environ. 2004, 90, 463–476.
Bhadra, S.; Sagan, V.; Maimaitijiang, M.; Maimaitiyiming, M.; Newcomb, M.; Shakoor, N.; Mockler, T.C. Quantifying leaf chlorophyll concentration of sorghum from hyperspectral data using derivative calculus and machine learning. Remote Sens. 2020, 12, 2082.
Angel, Y.; McCabe, M.F. Machine learning strategies for the retrieval of leaf-chlorophyll dynamics: Model choice, sequential versus retraining learning, and hyperspectral predictors. Front. Plant Sci. 2022, 13, 722442.
An, G.; Xing, M.; He, B.; Liao, C.; Huang, X.; Shang, J.; Kang, H. Using machine learning for estimating rice chlorophyll content from in situ hyperspectral data. Remote Sens. 2020, 12, 3104.
Maimaitijiang, M.; Sagan, V.; Sidike, P.; Daloye, A.M.; Erkbol, H.; Fritschi, F.B. Crop monitoring using satellite/UAV data fusion and machine learning. Remote Sens. 2020, 12, 1357.

Figure 1. Comprehensive workflow of the phenology-aware chlorophyll estimation framework.

Figure 2. Overview map of the research area. (A) Location of Xinjiang Uygur Autonomous Region within China; (B) Changji Prefecture within Xinjiang; (C) Detailed view of the experimental site at Huaxing Agricultural Plantation.

Figure 3. Distribution of leaf chlorophyll content (LCC) across six cotton reproductive stages. Each stage includes 330 samples, for a total of 1980 measurements.

Figure 4. Correlation coefficients between all measured LCC values and the candidate spectral parameters.

Figure 5. Stage-specific correlations between LCC and FODS; panels (a–f) run from the bud to the boll stages.

Figure 6. Stage-specific parameter significance (top 10 features): budding (a,b), flowering (c,d), and bolling (e,f).

Figure 7. Measured versus RFR-predicted LCC, plotted against the 1:1 line. Panels (a–f) correspond to the beginning bud, full bud, first flower, full flower, first boll, and full boll stages.

Figure 8. Measured versus KNNR-predicted LCC, plotted against the 1:1 line. Panels (a–f) correspond to the beginning bud, full bud, first flower, full flower, first boll, and full boll stages.

Figure 9. Measured versus SVR-predicted LCC, plotted against the 1:1 line. Panels (a–f) correspond to the beginning bud, full bud, first flower, full flower, first boll, and full boll stages.
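Figures 4 and 5 summarize correlations between LCC and spectral parameters. A correlation spectrum of the kind shown in Figure 5 can be sketched as follows: take the first-order derivative of each spectrum along wavelength, then compute a Pearson coefficient between LCC and the derivative at every band. The 1-nm grid, the central-difference derivative, and the synthetic spectra (a logistic red edge whose position is assumed to shift with LCC) are illustrative assumptions, not the study's processing chain or data.

```python
import numpy as np


def first_derivative(reflectance, wavelengths):
    """FODS along the wavelength axis (samples x bands), central differences."""
    return np.gradient(reflectance, wavelengths, axis=1)


def correlation_spectrum(features, lcc):
    """Pearson r between LCC and every column of `features` (samples x bands)."""
    fc = features - features.mean(axis=0)
    yc = lcc - lcc.mean()
    return (fc.T @ yc) / np.sqrt((fc ** 2).sum(axis=0) * (yc ** 2).sum())


# Synthetic spectra (illustrative only): a logistic red edge whose
# inflection point is ASSUMED to shift to longer wavelengths as LCC rises.
rng = np.random.default_rng(0)
wl = np.arange(400, 901, dtype=float)          # 1-nm grid, 400-900 nm
lcc = rng.uniform(30, 60, size=300)            # SPAD-like values
pos = 700 + 0.5 * lcc                          # inflection at 715-730 nm
refl = 0.05 + 0.45 / (1 + np.exp(-(wl[None, :] - pos[:, None]) / 10))
refl += rng.normal(0, 0.001, size=refl.shape)  # measurement noise

r = correlation_spectrum(first_derivative(refl, wl), lcc)
best_wl = wl[np.argmax(np.abs(r))]
print(f"strongest |r| = {np.abs(r).max():.2f} at {best_wl:.0f} nm")
```

In this toy setting the strongest correlations fall on the flanks of the red edge, where the derivative changes most as the edge shifts; outside the edge the derivative is dominated by noise and the correlation collapses toward zero.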

Table 1. List of vegetation indices (VIs), their formulas, and references.

No.	Name	Formula	Reference
1	Anthocyanin Reflectance Index 1	$A R I 1 = 1 / (R_{550}) - 1 / R_{700}$	[30]
2	Anthocyanin Reflectance Index 2	$A R I 2 = R_{800} (1 / R_{550} - 1 / R_{700})$	[30]
3	Green Normalized Difference Vegetation Index hyper 1	$G N D V I_{h y p e r 1} = (R_{750} - R_{550}) / (R_{750} + R_{550})$	[31]
4	Green Normalized Difference Vegetation Index hyper 2	$G N D V I_{h y p e r 2} = (R_{800} - R_{550}) / (R_{800} + R_{550})$	[31]
5	Modified Normalized Difference Vegetation Index	$m N D V I_{705} = (R_{750} - R_{705}) / (R_{750} + R_{705} - 2 R_{445})$	[31]
6	Canopy Chlorophyll Index	$C C I = (R_{777} - R_{747}) / R_{763}$	[32]
7	Vogelmann Index 2	$V O G 2 = (R_{734} - R_{747}) / (R_{715} + R_{726})$	[33]
8	Carter1	$C a r t e r 1 = R_{695} / R_{420}$	[34]
9	Carter2	$C a r t e r 2 = R_{695} / R_{760}$	[34]
10	Carter3	$C a r t e r 3 = R_{605} / R_{760}$	[34]
11	Carter4	$C a r t e r 4 = R_{710} / R_{760}$	[35]
12	Carter5	$C a r t e r 5 = R_{695} / R_{670}$	[36]
13	Datt1	$D a t t 1 = (R_{850} - R_{710}) / (R_{850} - R_{680})$	[37]
14	Datt2	$D a t t 2 = R_{850} / R_{710}$	[38]
15	Datt3	$D a t t 3 = R_{754} / R_{704}$	[39]
16	Enhanced Vegetation Index	$E V I = 2.5 \times (\frac{(R_{800} - R_{670})}{R_{800} + 6 R_{670} - 7.5 R_{475} + 1})$	[40]
17	Modified Chlorophyll Absorption in Reflectance Index	$M C A R I = ((R_{700} - R_{670}) - 0.2 \times (R_{700} - R_{550})) \times (R_{700} / R_{670})$	[41]
18	Modified Triangular Vegetation Index 1	$M T V I 1 = 1.2 \times (1.2 \times (R_{800} - R_{550}) - 2.5 \times (R_{670} - R_{550}))$	[42]
19	Normalized Difference Cloud Index	$N D C I = (R_{762} - R_{527}) / (R_{762} + R_{527})$	[43]
20	Plant Senescence Reflectance Index	$P S R I = (R_{678} - R_{500}) / R_{750}$	[44]
21	Renormalized Difference Vegetation Index	$R D V I = (R_{800} - R_{670}) / \sqrt{R_{800} + R_{670}}$	[45]
22	Red-Edge Position Linear Interpolation	$R E P = 700 + 40 \times (((R_{670} + R_{780}) / 2 - R_{700}) / (R_{740} - R_{700}))$	[46]
23	Spectral Polygon Vegetation Index 1	$S P V I 1 = 0.4 \times (3.7 \times (R_{800} - R_{670}) - 1.2 \times | R_{530} - R_{670} |)$	[47]
24	Simple Ratio Pigment Index	$S R P I = R_{430} / R_{680}$	[48]
25	Simple Ratio 440/690	$S R (440 / 690) = R_{430} / R_{690}$	[49]
26	Simple Ratio 700/670	$S R (700 / 670) = R_{700} / R_{670}$	[50]
27	Simple Ratio 750/550	$S R (750 / 550) = R_{750} / R_{550}$	[51]
28	Simple Ratio 750/700	$S R (750 / 700) = R_{750} / R_{700}$	[52]
29	Simple Ratio 750/710	$S R (750 / 710) = R_{750} / R_{710}$	[53]
30	Simple Ratio 752/690	$S R (752 / 690) = R_{752} / R_{690}$	[54]
31	Simple Ratio 800/680	$S R (800 / 680) = R_{800} / R_{680}$	[55]
32	Transformed Chlorophyll Absorption in Reflectance Index	$T C A R I = 3 \times ((R_{700} - R_{670}) - 0.2 \times (R_{700} - R_{550}) \times (R_{700} / R_{670}))$	[56]
33	Optimized Soil Adjusted Vegetation Index	$O S A V I = (1 + 0.16) \times (R_{800} - R_{670}) / (R_{800} + R_{670} + 0.16)$	[57]
34	Transformed Chlorophyll Absorption in Reflectance Index/Optimized Soil Adjusted Vegetation Index	$T C A R I / O S A V I = 3 \times ((R_{700} - R_{670}) - 0.2 \times (R_{700} - R_{550}) \times (R_{700} / R_{670})) / ((1 + 0.16) \times (R_{800} - R_{670}) / (R_{800} + R_{670} + 0.16))$	[58]
35	Triangular Vegetation Index	$T V I = 0.5 \times (120 \times (R_{750} - R_{550}) - 200 \times (R_{670} - R_{550}))$	[59]
36	Leaf Chlorophyll Index	$L C I = (R_{850} - R_{710}) / (R_{850} - R_{680})$	[60]
37	Structure Intensive Pigment Index 1	$S I P I 1 = (R_{800} - R_{445}) / (R_{800} - R_{680})$	[61]
38	Structure Intensive Pigment Index 2	$S I P I 2 = (R_{800} - R_{505}) / (R_{800} - R_{690})$	[62]
39	Structure Intensive Pigment Index 3	$S I P I 3 = (R_{800} - R_{470}) / (R_{800} - R_{680})$	[63]
40	Red-Edge Ratio Vegetation Index	$R E R V I = R_{840} / R_{717}$	[64]
41	Red-Edge Normalized Difference Vegetation Index	$R E N D V I = (R_{840} - R_{717}) / (R_{840} + R_{717})$	[64]
42	Green Ratio Vegetation Index	$G R V I = R_{840} / R_{560}$	[65]
43	MERIS Terrestrial Chlorophyll Index	$M T C I = (R_{753} - R_{708}) / (R_{708} - R_{681})$	[66]
44	Chlorophyll Index Green	$C I - g r e e n = (R_{780} / R_{550}) - 1$	[67]
45	Ratio Vegetation Index	$R V I = R_{765} / R_{720}$	[68]
46	FODS	First-order differential spectrum	[69]
47	$S D_{r}$	First-order differential spectral integration in the wavelength range of 680∼760 nm	[69]
48	$S D_{b}$	First-order differential spectral integration in the wavelength range of 490∼530 nm	[69]
49	$S D_{r} / S D_{b}$	Ratio of the red edge area to the blue edge area	[70]
50	$(S D_{r} - S D_{b}) / (S D_{r} + S D_{b})$	Normalized value of the red edge area and the blue edge area	[70]

Note: R, r, and b represent spectral reflectance, red edge, and blue edge, respectively. No. 1–45, 46, and 47–50 were the VIs, FODS, and TEPs, respectively.
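The derivative-based entries at the end of Table 1 (FODS and the TEPs $SD_{r}$, $SD_{b}$, their ratio, and their normalized difference) can be computed from a single reflectance spectrum as sketched below. The sketch assumes a regularly sampled spectrum in nm and a central-difference derivative (`np.gradient`), and approximates each integral by summing derivative values times the sampling step; the authors' exact smoothing and integration scheme is not given in this section, the function name is hypothetical, and the spectrum is synthetic.

```python
import numpy as np


def trilateration_edge_parameters(reflectance, wavelengths):
    """Red/blue edge areas from FODS and the two TEP combinations (Table 1, No. 47-50)."""
    fods = np.gradient(reflectance, wavelengths)   # first-order derivative spectrum
    step = float(np.mean(np.diff(wavelengths)))    # sampling interval (nm)

    def edge_area(lo, hi):
        band = (wavelengths >= lo) & (wavelengths <= hi)
        return float(fods[band].sum() * step)      # rectangle-rule integral

    sd_r = edge_area(680, 760)   # red edge
    sd_b = edge_area(490, 530)   # blue edge
    return {
        "SDr": sd_r,
        "SDb": sd_b,
        "SDr/SDb": sd_r / sd_b,
        "(SDr-SDb)/(SDr+SDb)": (sd_r - sd_b) / (sd_r + sd_b),
    }


# Synthetic leaf spectrum: green peak near 550 nm plus a red edge near 715 nm.
wl = np.arange(400, 901, dtype=float)
spectrum = (0.05
            + 0.45 / (1 + np.exp(-(wl - 715) / 12))
            + 0.06 * np.exp(-((wl - 550) / 30) ** 2))
teps = trilateration_edge_parameters(spectrum, wl)
print(teps)
```

Because these parameters integrate a derivative, $SD_{r}$ is numerically close to the reflectance rise between 680 and 760 nm, which is a useful sanity check when implementing them.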

Table 2. Performance of univariate linear regression (ULR) models across phenological stages.

Reproductive Stage	Model Equation	$R^{2}$	RMSE	Best Predictor
Beginning Bud	$y = 54.35 + 6.12 \cdot x_{1}$	0.39	8.78	SR_750/710
Full Bud	$y = 46.03 + 6.12 \cdot x_{1}$	0.62	7.85	MTCI
First Flower	$y = 49.68 + 6.12 \cdot x_{1}$	0.60	8.94	FODS(752.4)
Full Flower	$y = 39.90 + 17.32 \cdot x_{1}$	0.68	9.72	Datt1
First Boll	$y = 49.68 + 6.12 \cdot x_{1}$	0.72	8.32	mNDVI₇₀₅
Full Boll	$y = 59.95 + 6.12 \cdot x_{1}$	0.73	8.64	FODS(743)
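Each row of Table 2 is a single-predictor linear fit evaluated with $R^{2}$ and RMSE. A minimal sketch of that procedure follows, assuming ordinary least squares and the conventional definitions $R^{2} = 1 - SS_{res}/SS_{tot}$ and RMSE as the square root of the mean squared residual; the predictor values are simulated, the beginning-bud coefficients are reused only as an illustrative target, and the metrics are computed on the fitting data for simplicity rather than on a separate validation set.

```python
import numpy as np


def fit_ulr(x, y):
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    b1, b0 = np.polyfit(x, y, deg=1)   # polyfit returns the slope first
    return b0, b1


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of y (here SPAD-like LCC)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Simulated single-stage data, matching the 330 samples per stage of Figure 3.
rng = np.random.default_rng(42)
x = rng.uniform(1.5, 3.5, size=330)                  # hypothetical SR750/710 values
y = 54.35 + 6.12 * x + rng.normal(0, 2.0, size=330)  # illustrative target + noise
b0, b1 = fit_ulr(x, y)
y_hat = b0 + b1 * x
print(f"y = {b0:.2f} + {b1:.2f} * x1, R2 = {r2_score(y, y_hat):.2f}, "
      f"RMSE = {rmse(y, y_hat):.2f}")
```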

Table 3. Multiple linear regression (MLR) models across phenological stages.

Reproductive Stage	Model Equation	$R^{2}$	RMSE	Best Predictors
Beginning Bud	$y = 58.52 - 0.63 \cdot x_{1} + 0.78 \cdot x_{2} - 0.15 \cdot x_{3}$	0.59	6.87	SR_750/710, Datt1, Carter4
Full Bud	$y = 48.06 - 0.24 \cdot x_{1} - 0.03 \cdot x_{2} - 0.03 \cdot x_{3}$	0.62	7.85	MTCI, SIPI1, NDVI
First Flower	$y = 62.01 + 1.38 \cdot x_{1} - 0.73 \cdot x_{2} + 1.13 \cdot x_{3}$	0.60	8.94	FODS, Datt2, RERVI
Full Flower	$y = 51.03 + 4.53 \cdot x_{1} - 6.31 \cdot x_{2} + 2.57 \cdot x_{3}$	0.68	9.72	Datt1, VOG2, Carter5
First Boll	$y = 67.00 + 2.53 \cdot x_{1} + 6.83 \cdot x_{2} - 0.70 \cdot x_{3}$	0.72	8.32	mNDVI₇₀₅, CI_green, MTCI
Full Boll	$y = 60.82 - 1.57 \cdot x_{1} + 2.05 \cdot x_{2} + 0.34 \cdot x_{3}$	0.77	4.37	FODS(743), MTCI, TCARI

Note: $x_{1}$, $x_{2}$, and $x_{3}$ denote the parameters of the best-fit model. Beginning bud: $x_{1} = (SD_{r} - SD_{b}) / (SD_{r} + SD_{b})$, $x_{2} = SD_{r} / SD_{b}$, and $x_{3} = \mathrm{Datt1}$; Full bud: $x_{1} = (SD_{r} - SD_{b}) / (SD_{r} + SD_{b})$, $x_{2} = SD_{r} / SD_{b}$, and $x_{3} = \mathrm{SR}_{(750, 710)}$; First flower: $x_{1} = SD_{r} / SD_{b}$, $x_{2} = \mathrm{MTCI}$, and $x_{3} = \mathrm{RVI}$; Full flower: $x_{1} = \mathrm{SR}_{(750, 710)}$, $x_{2} = \mathrm{MTCI}$, and $x_{3} = (SD_{r} - SD_{b}) / (SD_{r} + SD_{b})$; First boll: $x_{1} = (SD_{r} - SD_{b}) / (SD_{r} + SD_{b})$, $x_{2} = \mathrm{SR}_{(750, 710)}$, and $x_{3} = \mathrm{Carter3}$; Full boll: $x_{1} = SD_{r} / SD_{b}$, $x_{2} = \mathrm{MTCI}$, and $x_{3} = \mathrm{Carter3}$.
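The three-predictor equations in Table 3 have the form $y = b_{0} + b_{1}x_{1} + b_{2}x_{2} + b_{3}x_{3}$. A minimal least-squares sketch of fitting such a model is shown below. It assumes ordinary least squares with an intercept; the predictors are random placeholders rather than spectral features, and the beginning-bud coefficients are reused as a known, noise-free target so that coefficient recovery can be checked directly.

```python
import numpy as np


def fit_mlr(X, y):
    """OLS fit of y = b0 + b1*x1 + ... + bk*xk; returns [b0, b1, ..., bk]."""
    design = np.column_stack([np.ones(len(y)), X])   # intercept column first
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Apply fitted coefficients to a (samples x predictors) matrix."""
    return coef[0] + X @ coef[1:]


# Noise-free illustration with three hypothetical predictors.
rng = np.random.default_rng(7)
X = rng.normal(size=(330, 3))
true_coef = np.array([58.52, -0.63, 0.78, -0.15])
y = true_coef[0] + X @ true_coef[1:]
coef = fit_mlr(X, y)
print(np.round(coef, 2))
```

With real spectral features the three predictors are often strongly intercorrelated (several are red-edge ratios), so individual MLR coefficients can be unstable even when the overall fit is good.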

Table 4. Performance of the RFR, KNNR, and SVR models after dimensionality reduction and parameter selection.

Reproductive Stage	Metric	RFR	KNNR	SVR	Key Parameters
Beginning Bud	$R^{2}$	0.85	0.80	0.72	RVI, FODS(743)
	RMSE	5.80	5.60	4.10
Full Bud	$R^{2}$	0.62	0.68	0.78	$(SDr - SDb) / (SDr + SDb)$ , RVI
	RMSE	7.70	4.07	4.30
First Flower	$R^{2}$	0.70	0.63	0.70	Datt3, mNDVI₇₀₅
	RMSE	3.50	4.03	7.10
Full Flower	$R^{2}$	0.52	0.71	0.79	Datt1, Carter3
	RMSE	8.80	4.25	5.90
First Boll	$R^{2}$	0.80	0.74	0.61	mNDVI₇₀₅, Carter3
	RMSE	4.01	3.80	9.30
Full Boll	$R^{2}$	0.69	0.37	0.69	Datt3, FODS(752.4)
	RMSE	4.07	11.53	6.30
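Table 4 reports KNNR alongside RFR and SVR. The core of k-nearest-neighbour regression is a local average, sketched below with NumPy under stated assumptions: features are z-scored with training statistics, distances are Euclidean, neighbours are weighted uniformly, and k = 5. None of these settings is taken from the study, and the two predictors (labelled as hypothetical RVI and FODS(743) values) are simulated.

```python
import numpy as np


def knn_regress(X_train, y_train, X_query, k=5):
    """Predict each query as the mean target of its k nearest training samples.

    Features are z-scored with training statistics so that indices on
    different numeric scales contribute comparably to the distance.
    """
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd[sd == 0] = 1.0                                # guard constant features
    A = (X_train - mu) / sd
    B = (X_query - mu) / sd
    d2 = ((B[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)  # squared distances
    nn = np.argsort(d2, axis=1)[:, :k]               # indices of k nearest
    return y_train[nn].mean(axis=1)


# Simulated data: hypothetical RVI and FODS(743) values on very different scales.
rng = np.random.default_rng(3)
X = np.column_stack([rng.uniform(1.0, 2.0, 200), rng.uniform(0.002, 0.012, 200)])
y = 30 + 10 * X[:, 0] + 2000 * X[:, 1] + rng.normal(0, 1, 200)
pred = knn_regress(X[:150], y[:150], X[150:], k=5)
print(np.round(pred[:5], 1))
```

Without the z-scoring step, the predictor with the larger numeric range would dominate the distance, which is one reason KNNR can be sensitive to how features are preprocessed.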

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Jiang, C.; Cheng, Y.; Li, Y.; Peng, L.; Dong, G.; Lai, N.; Geng, Q. Phenology-Aware Machine Learning Framework for Chlorophyll Estimation in Cotton Using Hyperspectral Reflectance. Remote Sens. 2025, 17, 2713. https://doi.org/10.3390/rs17152713

