1. Introduction
Grey jujube (Ziziphus jujuba Mill.) is an important economic fruit tree that is unique to China, with its fruit being rich in nutrients and having broad market prospects [1]. In the high-quality and high-yield cultivation management of grey jujube, the plant’s nitrogen nutrition status is a key limiting factor. Chlorophyll, as the core pigment of plant photosynthesis, is strongly and positively correlated with the leaf nitrogen content, and is therefore regarded as a reliable indicator for assessing the plant physiological health status and nitrogen nutrition levels [2,3]. The SPAD-502 chlorophyll meter, by measuring the light transmittance of leaves at specific wavelengths, can quickly and non-destructively obtain the relative chlorophyll content (SPAD value), and has become an important tool for field nutrition diagnosis [4]. However, as a point-based measurement system, the SPAD meter is inefficient at characterizing large, spatially heterogeneous canopies and is difficult to integrate into future high-throughput field phenotyping platforms. Therefore, developing new technology capable of rapid and comprehensive monitoring of nitrogen status in grey jujube is of great significance for precise fertilization and intelligent management.
Near-infrared (NIR) spectroscopy, as an efficient and environmentally friendly means of non-destructive inspection, has its analytical foundation in the overtone and combination band vibrations of hydrogen-containing groups such as C-H, O-H, and N-H in organic compounds [5]. This technology has shown great potential in the quantitative inversion of plant leaf biochemical parameters such as nitrogen, chlorophyll, and moisture. In recent years, numerous studies have applied NIR spectroscopy to estimate the chlorophyll content of various crops. For example, Zhang et al. [6] used NIR spectroscopy to predict the SPAD value of rice leaves and found that the model performance improved significantly after standard normal variate (SNV) pretreatment; Li et al. [7] compared multiple modeling methods in a study of citrus leaves and confirmed the superiority of machine learning algorithms in such nonlinear problems; Prattana Lopin et al. [8] further screened the characteristic wavelengths related to chlorophyll through the competitive adaptive reweighted sampling algorithm, simplifying the model. These studies have laid a solid foundation for the application of NIR technology in plant nutrition monitoring. However, NIR spectra are susceptible to interference from environmental noise, light scattering, and sample baseline drift. Therefore, selecting appropriate spectral preprocessing methods (such as smoothing, standard normal variate transformation, and derivative processing) to extract effective information, combined with characteristic wavelength selection algorithms (such as competitive adaptive reweighted sampling) to eliminate redundant variables, is key to constructing robust and high-precision quantitative models [9,10,11].
In terms of modeling algorithms, machine learning methods that are capable of handling complex nonlinear relationships have demonstrated significant advantages. The back propagation (BP) neural network and radial basis function neural network, among others [12,13], have been widely applied in the field of spectral analysis due to their powerful function approximation capabilities. Different training algorithms affect both the convergence speed and prediction accuracy by altering the optimization path of the model, and the systematic comparison of these algorithms is crucial for constructing the optimal model [14].
Although NIR spectroscopy has made significant progress in crop nutrition monitoring, existing research has primarily focused on staple food crops such as rice and wheat as well as some fruit trees. Grey jujube is an important economic tree species whose leaf structure and biochemical composition may differ from those of other crops, so models developed for other crops may not transfer directly. Currently, research on the systematic assessment of the SPAD value in grey jujube leaves using NIR spectroscopy remains very limited, particularly in exploring the optimal spectral preprocessing workflow, feature band selection, and the combined effects of different neural network training strategies, where in-depth and systematic reports are still lacking.
Therefore, this study aims to fill this research gap by taking jujube leaves as the research object and systematically investigating the quantitative analysis model of SPAD value based on NIR spectroscopy. The specific objectives of this study include: (1) collecting the NIR spectra and SPAD values of jujube leaves to construct a dataset; (2) comparing the enhancement effects of various spectral preprocessing methods on model performance; (3) applying the competitive adaptive reweighted sampling algorithm to screen for the characteristic wavelengths and optimize model inputs; (4) constructing BPNN and RBFNN models and evaluating the predictive efficacy of different training functions; and (5) determining the optimal model combination suitable for the non-destructive inspection of SPAD value in jujube leaves. This study is expected to provide a reliable technical solution and theoretical basis for the rapid and non-destructive diagnosis of the nutritional status of jujube.
2. Materials and Methods
2.1. Overview of Test Site
This experiment was conducted on the campus of Tarim University in Alaer City, Xinjiang. The experimental area is characterized by a typical extremely continental arid desert climate in a warm temperate zone, with an average annual precipitation of approximately 50 mm, scarce winter snowfall, annual evaporation exceeding 2000 mm, abundant sunshine, and significant diurnal temperature variations. The test soil was classified as sandy loam soil, characterized by uniform texture and good permeability. The experimental material consisted of 17-year-old grey jujube trees, planted in north–south oriented rows with a spacing of 2 m × 4 m. The orchard was irrigated using the flood irrigation method, and no fertilizers were applied nor pest control measures implemented on the experimental trees. Healthy, uniformly growing mature trees were selected as the research subjects.
2.2. Sample Collection
During the maturity period of grey jujube fruits (30 September 2024), mature leaves were collected from the middle and lower segments of the current-year branches at the outer edge of the tree crown of each test tree. During collection, individual leaves from the four cardinal directions (east, south, west, and north) of the tree crown were carefully selected. The relative chlorophyll content of grey jujube leaves was measured using a chlorophyll meter (SPAD-502Plus, Konica Minolta, Tokyo, Japan), with the SPAD value used as the characterization index [15]. The instrument was calibrated before measurement. Avoiding the midrib, three points were selected at approximately 2/3 of the distance from the petiole in the middle part of the leaf for measurement, and the average value was taken as the SPAD value for that leaf, thereby reducing errors caused by uneven chlorophyll distribution. All measurements were conducted under stable natural light intensity conditions, ensuring that the probe was tightly attached to the leaf surface with no light leakage. The single-leaf measurement time was controlled within 10 s to prevent leaf deformation due to pressure or dehydration. Finally, each leaf number was recorded with its corresponding three readings and the mean value to establish a SPAD value database. A total of 188 leaves were collected and stored in Ziplock bags inside a 4 °C vehicle refrigerator until spectral scanning.
2.3. Original Spectral Acquisition
After sample collection was completed, the test samples were removed from the 4 °C vehicle refrigerator on the same day (30 September 2024) and placed in the laboratory housing the spectrometer (ambient temperature 24 °C) for a 12-h equilibration period. This ensured that the samples reached room temperature and eliminated any interference from temperature gradients. After powering on the Fourier transform near-infrared spectrometer (Antaris II FTNIR; Thermo Fisher Scientific, Madison, WI, USA) and preheating for 30 min, diffuse reflectance correction was performed using the standard whiteboard [16]. If necessary, dust was gently cleaned from the leaf surface with a dust-free cloth. On each leaf, using the leaf vein as a boundary, two regions were selected at each of the upper and lower ends (four sites in total), and the spectra of different regions were marked with different colors. In each region, the scan was repeated four times with the following parameters: spectral range 10,000–4000 cm−1; resolution 8 cm−1; gain 2×; and 64 accumulations per scan. A total of 16 spectral curves were obtained from each leaf, and the average value was calculated after baseline correction to serve as the final absorbance (A) value of the sample for subsequent chemometric modeling and analysis. This method controlled the effects of temperature fluctuation, instrument drift, and leaf heterogeneity through standardized preprocessing, instrument calibration, and multi-point repeated measurement, laying the data foundation for the construction of a high-precision prediction model.
2.4. Spectral Data Conversion
Since the acquired original near-infrared (NIR) spectra are susceptible to interference from environmental noise, light scattering, and baseline drift, direct modeling would affect the prediction accuracy. Therefore, it was necessary to preprocess the original spectral data to enhance the effective signal. This study systematically compared multiple preprocessing methods:
Smoothing: Utilizes the Savitzky–Golay convolution smoothing algorithm [17] to suppress random noise and improve the signal-to-noise ratio.
Standard normal variate transformation (SNV): Used to eliminate baseline drift and scale differences caused by leaf surface scattering and variations in optical path length.
Derivative processing: Employs first derivative (FD) and second derivative (SD) processing [18] to eliminate baseline interference and enhance the resolution of overlapping absorption peaks.
Combined preprocessing: To further improve the results, the aforementioned methods were serially combined, such as “smooth + FD” and “SNV + FD”, aiming to comprehensively leverage the advantages of multiple methods and provide higher-quality spectral data for the model.
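The preprocessing steps listed above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this study: the window length (11 points), polynomial order (2), unit point spacing, edge padding, and the function names `savgol` and `snv` are all assumptions chosen for illustration.

```python
from math import factorial

import numpy as np


def savgol_coeffs(window: int, polyorder: int, deriv: int = 0) -> np.ndarray:
    """Savitzky-Golay weights for the window centre (odd window, unit spacing)."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Columns are offsets**0, offsets**1, ..., offsets**polyorder.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse yields polynomial coefficient c_deriv;
    # the deriv-th derivative at the centre equals deriv! * c_deriv.
    return np.linalg.pinv(design)[deriv] * factorial(deriv)


def savgol(spectra: np.ndarray, window: int = 11, polyorder: int = 2,
           deriv: int = 0) -> np.ndarray:
    """Filter each row of a (samples, points) array; deriv=0 smooths, 1 gives FD."""
    weights = savgol_coeffs(window, polyorder, deriv)
    half = window // 2
    # Edge padding is an assumption; values within `half` points of either end
    # are therefore less reliable, especially for derivatives.
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    # np.convolve flips its kernel, so the weights are reversed beforehand.
    return np.array([np.convolve(row, weights[::-1], mode="valid") for row in padded])


def snv(spectra: np.ndarray) -> np.ndarray:
    """Standard normal variate: centre and scale every spectrum individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def smooth_then_fd(spectra: np.ndarray) -> np.ndarray:
    """Serial combination 'smooth + FD': smooth first, then differentiate."""
    return savgol(savgol(spectra, deriv=0), deriv=1)
```

Applying `snv` before `savgol(..., deriv=1)` gives the "SNV + FD" combination in the same way.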
2.5. Spectral Feature Band Extraction
Full spectral band data are voluminous and exhibit significant multicollinearity and redundant information. To simplify the model, improve computational efficiency, and enhance the model’s generalization ability, this study employed the competitive adaptive reweighted sampling (CARS) algorithm to screen the characteristic wavelength variables most correlated with SPAD value from the preprocessed spectra [19]. The CARS algorithm iteratively selects wavelength points with larger absolute regression coefficients through adaptive reweighted sampling and a competitive mechanism, thereby obtaining a highly representative characteristic wavelength subset and significantly reducing the data dimension.
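Two ingredients of the CARS procedure described above can be sketched in a few lines. The sketch covers only the exponentially decreasing retention schedule, in the form commonly given for CARS, and the ranking of variables by absolute regression coefficient. The full algorithm also involves Monte Carlo sampling, PLS regression, adaptive reweighted sampling, and cross-validation, all omitted here; the function names are hypothetical.

```python
import numpy as np


def cars_retention_ratios(n_vars: int, n_runs: int) -> np.ndarray:
    """Fraction of variables kept in sampling run i = 1..n_runs.

    Uses r_i = a * exp(-k * i), with a and k chosen so that all variables
    are kept in the first run and only two remain in the last run.
    """
    a = (n_vars / 2) ** (1 / (n_runs - 1))
    k = np.log(n_vars / 2) / (n_runs - 1)
    runs = np.arange(1, n_runs + 1)
    return a * np.exp(-k * runs)


def keep_largest(coefs: np.ndarray, n_keep: int) -> np.ndarray:
    """Sorted indices of the n_keep variables with the largest |coefficient|."""
    order = np.argsort(-np.abs(coefs), kind="stable")
    return np.sort(order[:n_keep])
```

In each run, the number of retained wavelengths would be the retention ratio multiplied by the total number of wavelengths, rounded to an integer.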
2.6. Machine Learning Modeling
Using the previously obtained characteristic wavelength variables as input and the measured SPAD value as output, we established two prediction models: a back propagation neural network (BPNN) and a radial basis function neural network (RBFNN).
The BP neural network employs a three-layer topology and adjusts its weights and thresholds through the error back propagation algorithm. This study focused on comparing three training functions, namely LBFGS (quasi-Newton algorithm), Adam (adaptive moment estimation), and SGD (stochastic gradient descent), to optimize the model performance [20].
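The three-layer structure and the back propagation of errors can be sketched as follows. This is an illustration under stated assumptions, not the configuration used in this study: the hidden layer size, tanh activation, learning rate, and weight initialization are arbitrary, and plain full-batch gradient descent stands in for the SGD option. LBFGS and Adam would replace only the weight-update rule.

```python
import numpy as np


def train_bp(X, y, n_hidden=8, lr=0.01, epochs=500, seed=0):
    """Train an input-hidden-output network with mean squared error loss."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W1 = rng.normal(0.0, 0.5, (p, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, n_hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)        # hidden activations, shape (n, n_hidden)
        pred = H @ W2 + b2              # linear output, shape (n,)
        err = pred - y
        losses.append(float(np.mean(err ** 2)))
        # Back-propagate the loss gradient from the output to the input layer.
        g_pred = 2.0 * err / n
        g_W2 = H.T @ g_pred
        g_b2 = g_pred.sum()
        g_pre = np.outer(g_pred, W2) * (1.0 - H ** 2)   # through tanh
        g_W1 = X.T @ g_pre
        g_b1 = g_pre.sum(axis=0)
        # Gradient descent update of weights and thresholds (biases).
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), losses


def predict_bp(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2
```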
The RBF neural network employs radial basis functions as the activation function in the hidden layer and performs fitting through nonlinear mapping. Three solution methods were compared: direct (direct computation), SVD (singular value decomposition), and gradient (gradient descent) [21].
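As an illustration of how the output weights of an RBF network might be solved, the sketch below builds a Gaussian hidden layer and fits the linear output layer in two ways. The Gaussian kernel, the user-supplied centres and width, the small ridge term, and the mapping of the labels "direct" and "svd" to normal equations and the pseudo-inverse are assumptions, since these details are not specified here. The gradient-based variant is omitted.

```python
import numpy as np


def rbf_design(X, centers, width):
    """Gaussian hidden-layer outputs plus a constant bias column."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    hidden = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([hidden, np.ones((len(X), 1))])


def fit_rbf(X, y, centers, width, method="svd", ridge=1e-8):
    """Solve the linear output weights of the RBF network."""
    Phi = rbf_design(X, centers, width)
    if method == "direct":
        # Normal equations; the small ridge term keeps the system solvable.
        A = Phi.T @ Phi + ridge * np.eye(Phi.shape[1])
        return np.linalg.solve(A, Phi.T @ y)
    if method == "svd":
        # The Moore-Penrose pseudo-inverse is computed through an SVD.
        return np.linalg.pinv(Phi) @ y
    raise ValueError(f"unknown method: {method}")


def predict_rbf(X, centers, width, weights):
    return rbf_design(X, centers, width) @ weights
```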
For modeling, the sample set was randomly split into a training set (for model development) and a validation set (for model evaluation) at a ratio of 3:1.
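A random 3:1 split of this kind might look like the sketch below. The seed and the function name are hypothetical. With the 188 leaves collected in this study, the split gives 141 training and 47 validation samples.

```python
import numpy as np


def split_3_to_1(n_samples: int, seed: int = 0):
    """Return sorted index arrays for a random 3:1 train/validation split."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    n_train = round(n_samples * 3 / 4)
    return np.sort(shuffled[:n_train]), np.sort(shuffled[n_train:])


train_idx, val_idx = split_3_to_1(188)
```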
2.7. Model Evaluation Method
In this study, the regression algorithms were implemented using MATLAB R2024b software. Three metrics, the coefficient of determination (R²), root mean square error (RMSE), and residual prediction deviation (RPD), were used to comprehensively evaluate the model performance [22]:
Coefficient of determination (R²): This measures the model goodness of fit and ranges from 0 to 1. A value closer to 1 indicates a stronger agreement between the model predicted value and the measured value. The calculation formula is provided in Formula (1).
Root mean square error (RMSE): Measures the absolute magnitude of the prediction error. A smaller root mean square error (RMSE) value indicates greater model prediction accuracy. The calculation is given in Formula (2).
Residual prediction deviation (RPD): This indicates the model’s predictive ability, calculated as shown in Formula (3). Its evaluation criteria are generally defined as: RPD > 3 indicates excellent model prediction ability; 2 < RPD ≤ 3 means that the model can be used for preliminary prediction; RPD ≤ 2 suggests poor model prediction ability.
In the model evaluation, the dataset was split into a training set and test set at a 3:1 ratio. The above metrics were computed separately to comprehensively assess the model’s fitting effect, prediction accuracy, and generalization ability.
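$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \qquad (1)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \qquad (2)$$
$$\mathrm{RPD} = \frac{SD}{\mathrm{RMSE}} \qquad (3)$$
where $n$ is the sample size; $y_i$ and $\hat{y}_i$ are the actual value and the predicted value of the grey jujube leaf SPAD value, respectively; $\bar{y}$ is the average of the actual SPAD values; and $SD$ is the standard deviation of the measured SPAD values.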
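The three evaluation metrics can be computed as in the following sketch. It uses the sample standard deviation (`ddof=1`) for RPD, which is an assumption, and the function names are illustrative.

```python
import numpy as np


def r2_score(y, y_hat):
    """Coefficient of determination: 1 - residual SS / total SS."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def rmse(y, y_hat):
    """Root mean square error of the predictions."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def rpd(y, y_hat):
    """Standard deviation of the measured values divided by RMSE."""
    return float(np.std(np.asarray(y, float), ddof=1)) / rmse(y, y_hat)


measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r2_score(measured, predicted), 3))   # 0.98
print(round(rmse(measured, predicted), 4))       # 0.1581
print(round(rpd(measured, predicted), 3))        # 8.165
```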
3. Results
3.1. Analysis of Near Infrared Original Spectrum and SPAD Content of Grey Jujube Leaves
Figure 1a shows the original spectral reflection characteristics of grey jujube leaves in the wavenumber range of 4000–10,000 cm−1. The spectral curve overall exhibited the reflection characteristics typical of healthy green plants. There was a relatively broad and intense absorption feature, at an absorbance of approximately 1.5, whose position corresponded to the O-H stretching vibrational absorption band of water (H2O) (strong absorption peaks typically exist near 5200 cm−1 and 6900 cm−1). This prominent feature indicates that the measured grey jujube leaves contained abundant water and had a good moisture status. Furthermore, in the 4000–4500 cm−1 range, the spectral curve was relatively flat with comparatively weak absorption. The spectral characteristics across the entire near-infrared (NIR) region were closely related to the leaf’s internal structure, water content, and biochemical composition. The original spectrum provides an important data foundation for the subsequent extraction of characteristic wavelength bands and the establishment of a quantitative inversion model for SPAD value or other biochemical parameters.
Figure 1b displays the frequency distribution histogram of the SPAD values in the grey jujube leaf samples. The SPAD value serves as an important indicator of the leaf relative chlorophyll content and even nitrogen nutrition status. In terms of distribution pattern, the SPAD value approximately followed a normal distribution, suggesting that the chlorophyll content in most leaves was moderate and the sample population exhibited good uniformity. The SPAD value was predominantly concentrated between 25 and 45, with the peak count (i.e., sample number) occurring between 35 and 40, indicating that this SPAD value range represents the most typical chlorophyll content level in this study. Notably, leaves with a SPAD value below 25 or above 45 were relatively scarce, which may be due to the fact that all samples were obtained from healthy grey jujube plants, with few individuals showing extreme deficiency or nutrient surplus. The distribution results confirm that the grey jujube leaf samples used in the experiment exhibited both natural variation in chlorophyll content and a relatively stable overall condition, rendering them highly suitable for the development of a predictive model for chlorophyll content.
In summary, this study successfully acquired spectral data and their corresponding SPAD values from healthy grey jujube leaves. The leaves exhibited strong moisture absorption characteristics in the near-infrared region, while their chlorophyll content (represented by SPAD value) showed a concentrated and reasonable distribution. The spectral reflectance characteristics of the leaf are physically associated with its internal biochemical parameters (such as chlorophyll, moisture, and nitrogen content). Therefore, based on the spectral data shown in Figure 1a, it was entirely feasible to further employ modeling methods, such as feature extraction and machine learning, to explore the quantitative relationship with the SPAD value in Figure 1b, ultimately achieving the rapid and non-destructive spectral diagnosis of the chlorophyll content in grey jujube leaves.
3.2. Spectral Preprocessing
Figure 2 shows the effect of different preprocessing techniques on the near-infrared spectra of grey jujube leaves. Smoothing (Figure 2a) effectively suppresses noise and improves the signal-to-noise ratio, but may weaken subtle characteristic peaks. SNV processing (Figure 2b) eliminates baseline drift and the scattering effect, highlights differences in chemical composition, and enhances model robustness. First derivative (FD) (Figure 2c) effectively removes baseline interference and improves the resolution of overlapping peaks by converting the absorption peak into a zero point and the inflection point into an extreme point. Second derivative (SD) (Figure 2d) further sharpens spectral features and is sensitive to weak signals, though its signal amplitude is relatively small. Combined methods demonstrate a better performance: smooth + FD/SD (Figure 2e,f) applies smoothing first to reduce noise before differentiation, resulting in smoother derivative spectra and suppressing noise amplification caused by derivation; SNV + FD/SD (Figure 2g,h) corrects the scattering effects first and then eliminates baseline drift, addressing both issues in sequence. After processing, the spectral characteristics are prominent and the baseline is flat, significantly improving the data quality, which is highly beneficial for subsequent quantitative model development. Through comparison, it was seen that the combined preprocessing methods could leverage the advantages of their component steps, providing a higher quality spectral data foundation for establishing a high-precision SPAD prediction model.
3.3. Correlation Analysis Between Pretreatment Spectrum and Leaf SPAD Content
To further examine how preprocessing affects the relationship between the spectra and the leaf SPAD value, this study conducted a correlation analysis. As shown in Figure 3a–h, the correlation coefficient between the preprocessed spectrum and the leaf SPAD value was calculated for each of the following preprocessing methods: smooth, standard normal variate, first derivative, second derivative, smooth + first derivative, smooth + second derivative, standard normal variate + first derivative, and standard normal variate + second derivative. After smoothing (Figure 3a), the correlation coefficient was generally within the strong negative correlation interval of −1 to −0.75. While this highlights a significant association between the spectrum and the target parameter, it also has obvious drawbacks: on the one hand, such a uniformly strong negative correlation may obscure the true physical and chemical mechanism and weaken model interpretability, and smoothing may introduce false signals or suppress weak features, causing a loss of key information and a higher overfitting risk; on the other hand, this extreme linear trend may contradict the nonlinear relationship of the actual system, misleading feature selection and model optimization. After three pretreatment methods (standard normal variate, first derivative, and smooth + first derivative), the correlation range fell within −1 to 1, a broader interval than with smoothing alone, whereas the second derivative, smooth + second derivative, standard normal variate + first derivative, and standard normal variate + second derivative performed poorly.
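A correlation analysis of this kind is typically a Pearson coefficient computed band by band. The sketch below assumes a `spectra` array of shape (samples, bands) and a `spad` vector of matching length; both names are hypothetical.

```python
import numpy as np


def band_correlations(spectra: np.ndarray, spad: np.ndarray) -> np.ndarray:
    """Pearson correlation between every spectral band (column) and SPAD."""
    centered_x = spectra - spectra.mean(axis=0)
    centered_y = spad - spad.mean()
    numerator = centered_x.T @ centered_y
    denominator = np.sqrt((centered_x ** 2).sum(axis=0) * (centered_y ** 2).sum())
    # A band that is constant across samples has zero variance and yields NaN.
    return numerator / denominator
```

Plotting the returned vector against wavenumber gives one correlation curve per preprocessing method.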
3.4. Spectral Feature Extraction
To enhance the computational efficiency of the model, this study adopted the competitive adaptive reweighted sampling (CARS) algorithm to extract characteristic wavelengths from spectral data subjected to different preprocessing methods. The CARS algorithm screens the characteristic bands most correlated with the target variable through adaptive reweighted sampling and a competitive mechanism, thereby significantly reducing the data dimension and optimizing the structure of the model input.
Specifically, after screening by the CARS algorithm, the number of characteristic wavelengths extracted under different preprocessing methods varied notably. Among them, the FD preprocessing method yielded the highest number of spectral features, reaching 50, indicating that its spectral response is relatively complex and may retain more original details and variation information (Figure 4c). Combined smooth + first derivative preprocessing obtained 28 characteristic wavelengths, demonstrating that smoothing can retain a considerable amount of effective information while reducing noise (Figure 4e). Standard normal variate + second derivative preprocessing extracted 18 characteristic wavelengths, showing that this combination not only performs scatter correction and baseline elimination, but also exhibits good feature expression capability (Figure 4h).
On the other hand, standard normal variate, second derivative, and smooth + second derivative each extracted only five characteristic wavelengths, while standard normal variate + first derivative extracted only six. This indicates that these methods exhibit high feature concentration after CARS screening, potentially focusing more on core sensitive bands, which is beneficial for building more concise and efficient prediction models. However, it also implies that some informative bands may be ignored, leading to weaker model prediction ability (Figure 4a,b,d,f,g). The variation in the number of characteristic wavelengths not only reflects the impact of different preprocessing methods on the distribution of spectral information, but also hints at their applicability under various modeling requirements: a larger number of features may contain richer variation information but entails higher computational complexity, while a smaller number of features facilitates fast and robust modeling, though the risk of information omission must be guarded against. The results of this study can provide a quantitative basis for subsequent feature selection and model optimization aimed at estimating the SPAD value of grey jujube leaves.
3.5. Visualization of Model Evaluation Indicators
To establish a high-precision prediction model for the SPAD value of grey jujube leaves, this study employed both the BP neural network and RBF neural network methods to develop a quantitative analysis model based on the characteristic wavelengths extracted by the competitive adaptive reweighted sampling algorithm. The coefficient of determination (R²), root mean square error (RMSE), and residual prediction deviation (RPD) were selected as evaluation indicators for model performance. The modeling results are shown in Figure 5.
Figure 5a–c displays the distribution of R², RMSE, and RPD indicators on the training set and validation set for each model under different preprocessing methods. Comprehensive analysis indicated that the BP neural network model with smooth + first derivative preprocessing combined with competitive adaptive reweighted sampling feature screening (smooth + first derivative-competitive adaptive reweighted sampling-BP) performed the best. This model achieved R² values of 0.87 and 0.85 on the training set and validation set, respectively, both the highest among all models, demonstrating a strong ability to explain variance and good generalization performance. In terms of error, its root mean square error (RMSE) values were the lowest, at 1.36 and 1.45, respectively, indicating a small deviation between the predicted value and the measured value. Furthermore, the RPD values of this model were 2.81 (training set) and 2.56 (validation set), further demonstrating its prediction stability and practicality. Therefore, this study considers that the smooth + first derivative-competitive adaptive reweighted sampling-BP model performed the best across multiple metrics, not only exhibiting a good fitting effect but also strong generalization ability, enabling the reliable prediction of the SPAD value of grey jujube leaves, making it the most promising prediction model in this study.
3.6. Model Training Function Selection
To further investigate the influence of different training functions on the performance of the neural networks, this study conducted multiple sets of comparative experiments on both the BP neural network and RBF neural network. The BP neural network employed three training algorithms, Adam, LBFGS, and SGD, whereas the RBF neural network utilized three solution methods, direct, SVD, and gradient. By systematically comparing the coefficient of determination (R²) among the models, we found that the BP neural network generally exhibited a superior performance over the RBF neural network (Figure 6a).
Within the BP neural network, the LBFGS training algorithm achieved the best prediction performance, with R² values reaching 0.87 and 0.85 on the training set and validation set, respectively, exceeding those of the top RBF neural network model by 0.04 in both cases. In terms of prediction accuracy, the BP-LBFGS model also excelled, recording the lowest root mean square error (RMSE) values among all six models: 1.36 for the training set and 1.35 for the validation set. These results demonstrate its high prediction accuracy. Additionally, the BP-LBFGS model exhibited strong performance in terms of residual prediction deviation (RPD), with values of 2.81 and 2.56 on the training set and validation set, respectively. Although the RBF-direct model also performed well on the RPD metric (training set: 2.67; validation set: 2.65), the BP-LBFGS model still demonstrated a clear advantage when considering both key indicators of coefficient of determination (R²) and root mean square error (RMSE).
In summary, the BP neural network utilizing the LBFGS training algorithm showed excellent overall performance in predicting the SPAD value of grey jujube leaves. It not only achieved high goodness of fit and prediction accuracy, but also exhibited strong stability, making it better suited for the application needs of this study.
3.7. Linear Fitting and Relative Error Analysis of Model Prediction Data
Following the comparison of the model evaluation metrics above, the smooth + first derivative-competitive adaptive reweighted sampling-BP model based on the LBFGS function was selected for predicting the SPAD value of grey jujube leaves.
A linear fit between the model’s predicted results and the actual measurements yielded a coefficient of 0.88 for the training set (Figure 7a) and 0.85 for the validation set (Figure 7b).
Additionally, the relative error was calculated between the predicted and measured values. The results for the training set and validation set are presented in Figure 7c,d, respectively. The maximum relative error was below 0.3% for the training set and below 0.25% for the validation set.
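For reference, a per-sample relative error can be computed as in the sketch below. It assumes the absolute relative error expressed in percent, since the exact definition is not stated here; the numbers are made up for illustration and are not data from this study.

```python
import numpy as np


def relative_error_percent(measured, predicted):
    """Absolute relative error of each prediction, in percent."""
    measured = np.asarray(measured, float)
    predicted = np.asarray(predicted, float)
    return np.abs(predicted - measured) / np.abs(measured) * 100.0


# Illustrative values only.
errors = relative_error_percent([40.0, 30.0], [41.0, 29.7])
```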
These results demonstrate that the smooth + first derivative-competitive adaptive reweighted sampling-BP model based on the LBFGS function exhibits strong predictive ability for estimating the SPAD value of grey jujube leaves.
4. Discussion
This study systematically compared different spectral preprocessing methods, characteristic wavelength selection algorithms, and neural network modeling strategies [
23], successfully constructing a high-precision prediction model for SPAD value in grey jujube leaves based on near-infrared spectroscopy. The results indicate that the BP neural network model trained with LBFGS, after preprocessing with smoothing combined with first derivative (Smooth + FD) and extracting characteristic wavelengths via the competitive adaptive reweighted sampling algorithm, demonstrated the optimal prediction performance. This finding not only confirms the feasibility of near-infrared spectroscopy for the non-destructive inspection of biochemical parameters in plant leaves, but also highlights the importance of collaborative optimization among spectral preprocessing, feature selection, and model architecture [
24], providing a theoretical basis and methodological reference for developing field-applicable rapid diagnostic equipment in the future.
From the perspective of spectral preprocessing, a single preprocessing method often struggles to comprehensively refine the spectral quality. For instance, in this study, although smoothing alone improved the signal-to-noise ratio, it caused the correlation between the spectrum and SPAD value to be highly concentrated in the strong negative range (−1 to −0.75), which may have obscured the true physicochemical response mechanisms and even introduced false signals [
25]. In contrast, derivative processing (especially first derivative) could effectively eliminate baseline drift and improve resolution, but it was also more sensitive to noise [
26]. While standard normal variate processing could correct for the scattering effect, it may not have sufficiently highlighted the subtle spectral characteristics related to chlorophyll [
27]. Notably, the combined preprocessing method demonstrated significant advantages: smooth + first derivative enhanced the spectral characteristics while suppressing noise, making subsequent modeling more robust. This is consistent with the findings of Tang et al. (2023) in the detection of tea components [
28], indicating that combined preprocessing can balance noise control and feature enhancement, serving as an effective strategy to improve the accuracy of near-infrared spectral analysis.
In terms of characteristic wavelength selection, the competitive adaptive reweighted sampling algorithm showed significant differences in the number of features extracted from different preprocessed spectra. The highest number of wavelengths was extracted after first derivative preprocessing (50), possibly because derivative processing amplified and preserved more subtle variations from the original spectrum, but it may also have introduced redundant information [29]. In contrast, fewer features were extracted (5–6) after combined preprocessing methods such as smooth + second derivative and standard normal variate + first derivative, demonstrating higher feature concentration. This indicates that preprocessing methods directly affect the distribution of spectral information and the redundancy of feature expression. Although a greater number of characteristic wavelengths may contain richer information, it also increases the model complexity and the overfitting risk. In this study, the optimal model (smooth + first derivative-competitive adaptive reweighted sampling-BP) ultimately selected 28 characteristic wavelengths, achieving a dimensionality reduction while preserving information content. This indicates that feature selection must seek a balance between informational completeness and model conciseness.
Regarding modeling methods, the BP neural network generally outperformed the RBF neural network, particularly when employing LBFGS as the training method, demonstrating higher prediction accuracy and stability [
30]. As a limited-memory quasi-Newton algorithm, LBFGS is well suited to optimization problems with a moderate number of parameters. On the small datasets typical of spectral modeling, it often converges faster and to a more accurate solution than stochastic gradient descent (SGD) and even adaptive moment estimation (Adam). This finding is consistent with the conclusion of Yang et al. (2021) for the prediction of wheat leaf nitrogen content [
31], indicating that training algorithms that approximate second-order derivative (curvature) information are better suited to the complex nonlinear mapping between the spectrum and biochemical parameters. Furthermore, the optimal model maintained a high coefficient of determination (R² > 0.84) and RPD (> 2.5) on both the training set and the validation set, demonstrating good fitting ability and generalization performance and suggesting that it can meet the demands of practical applications.
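For readers less familiar with the two evaluation metrics quoted above, the following sketch shows one common way of computing them. It assumes that RPD is defined as the sample standard deviation of the reference values divided by the RMSE of prediction; conventions differ on whether n or n - 1 is used, and the text does not state which one was applied. The function name and the SPAD readings are made up for illustration and are not results from this study.

```python
import math

def r2_rmse_rpd(y_true, y_pred):
    """Return (R^2, RMSE, RPD) for paired reference and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot          # coefficient of determination
    rmse = math.sqrt(ss_res / n)        # root mean square error
    sd = math.sqrt(ss_tot / (n - 1))    # sample SD of the reference values (assumed convention)
    return r2, rmse, sd / rmse          # RPD = SD / RMSE

# Made-up SPAD readings, for illustration only.
measured = [25.0, 30.0, 35.0, 40.0, 45.0]
predicted = [26.0, 29.0, 36.0, 39.0, 46.0]
r2, rmse, rpd = r2_rmse_rpd(measured, predicted)
```

Because RPD scales the prediction error by the spread of the reference values, a narrow SPAD range makes a high RPD harder to reach for the same absolute error.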
Although this study achieved good prediction results, several aspects merit further exploration. First, the distribution of SPAD values in gray jujube leaves was relatively concentrated (25–45), with few samples at the extremes, which to some extent limits the model’s extrapolation capability under extreme nutritional conditions. Future studies could expand the sample scope to include leaves from different growth stages and under different nutritional treatments to enhance the model’s generalizability.
Second, although this study employed competitive adaptive reweighted sampling for feature screening, the physical significance and physiological mechanisms of the selected wavelengths still require further analysis. Interpreting these wavelengths physically, in light of the absorption characteristics of chlorophyll and the moisture interference bands, would help improve the model’s interpretability.
Finally, the current model has not yet considered the impact of environmental factors (such as light and temperature) and differences in leaf physiological structure on the spectrum. Introducing these variables as auxiliary inputs or adopting more complex model architectures such as deep learning is expected to further enhance the prediction accuracy. In summary, this study, through multi-method collaborative optimization, constructed a high-precision near-infrared spectroscopy model suitable for the quantitative prediction of SPAD values in gray jujube leaves, providing a reliable analytical framework for the non-destructive diagnosis of crop nutrition. In the future, this method could be extended to other jujube cultivars and to other economic crops, promoting the widespread application of spectroscopic technology in smart agriculture.
5. Conclusions
This study validated the feasibility of using near-infrared spectroscopy for the rapid and non-destructive determination of SPAD values in gray jujube leaves. Compared with traditional destructive chemical measurement methods, this method achieved non-destructive and rapid analysis; moreover, compared with the single-point measurement of SPAD meters, spectroscopy lends itself more readily to high-throughput assessment at the canopy scale.
Through systematic comparison, smooth + first derivative preprocessing combined with characteristic wavelength selection by competitive adaptive reweighted sampling was determined to be the optimal scheme. On this foundation, the BP neural network model trained with the LBFGS algorithm achieved the best prediction performance (validation set coefficient of determination R² > 0.84, RPD > 2.5), confirming the model’s accuracy and robustness.
In summary, the technical system constructed in this research provides an advanced and reliable solution for the rapid field diagnosis of gray jujube nitrogen nutrition and precision fertilization management. The successful application of this method will directly serve production practice, providing strong technical support in achieving the precision agriculture goals of improving fertilizer utilization efficiency, reducing production costs, decreasing non-point source pollution, and guaranteeing high-quality and high-yield jujube fruits.