Detection of SPAD Content in Leaves of Grey Jujube Based on Near Infrared Spectroscopy

Wang, Lanfei; Zeng, Junkai; Yu, Mingyang; Fan, Weifan; Bao, Jianping

doi:10.3390/horticulturae11101251

by Lanfei Wang ¹,²,†, Junkai Zeng ¹,²,†, Mingyang Yu ¹,², Weifan Fan ¹,² and Jianping Bao ¹,²,*

¹ College of Horticulture and Forestry Science, Tarim University, Alar 843300, China

² Southern Xinjiang Special Fruit Trees High-Quality, High-Quality Cultivation and Deep Processing of Fruit Products Processing Technical National Local Joint Engineering Laboratory, Alar 843300, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Horticulturae 2025, 11(10), 1251; https://doi.org/10.3390/horticulturae11101251

Submission received: 13 September 2025 / Revised: 12 October 2025 / Accepted: 15 October 2025 / Published: 17 October 2025

(This article belongs to the Section Fruit Production Systems)

Abstract

The efficient and non-destructive detection of the chlorophyll content of grey jujube leaves is of great significance for growth monitoring and nutritional diagnosis. Near-infrared spectroscopy combined with chemometric methods provides an effective approach to achieve this goal. This study took grey jujube leaves as the research object, systematically collected near-infrared spectral data in the range of 4000–10,000 cm⁻¹, and simultaneously measured their soil and plant analyzer development (SPAD) value as a reference index for chlorophyll content. Eight pretreatments were applied to the original spectrum, alone and in combination: smooth, standard normal variate transformation (SNV), first derivative (FD), second derivative (SD), smooth + first derivative (Smooth + FD), smooth + second derivative (Smooth + SD), SNV + FD, and SNV + SD. Their effects on the quality of the spectrum and on its correlation with the SPAD value were compared. The competitive adaptive reweighted sampling (CARS) algorithm was adopted to extract the characteristic wavelengths, aiming to reduce data dimensionality and optimize model input. Both BP neural network and RBF neural network prediction models were established, and the model performance under different training functions was compared. The results indicate that after Smooth + FD pretreatment, followed by CARS screening of the characteristic wavelengths, the BP neural network model trained using the LBFGS algorithm demonstrated the best performance, with a coefficient of determination (R²) of 0.87 (training set) and 0.85 (validation set), root mean square error (RMSE) of 1.36 (training set) and 1.35 (validation set), and residual prediction deviation (RPD) of 2.81 (training set) and 2.56 (validation set), showing good prediction accuracy and robustness. These results indicate that by combining near-infrared spectroscopy with feature extraction and machine learning methods, the rapid and non-destructive detection of the grey jujube leaf SPAD value can be achieved, providing reliable technical support for the real-time monitoring of the nutritional status of jujube trees.

Keywords: spectral analysis; soil and plant analyzer development value; competitive adaptive reweighted sampling; back propagation neural network; radial basis function neural network

1. Introduction

Grey jujube (Ziziphus jujuba Mill.) is an important economic fruit tree that is unique to China, with its fruit being rich in nutrients and having broad market prospects [1]. In the high-quality and high-yield cultivation management of grey jujube, the plant’s nitrogen nutrition status is a key limiting factor. Chlorophyll, as the core pigment of plant photosynthesis, is strongly and positively correlated with the leaf nitrogen content, and is therefore regarded as a reliable indicator for assessing the plant physiological health status and nitrogen nutrition levels [2,3]. The SPAD-502 chlorophyll meter, by measuring the light transmittance of leaves at specific wavelengths, can quickly and non-destructively obtain the relative chlorophyll content (SPAD value), and has become an important tool for field nutrition diagnosis [4]. However, as a point-based measurement system, the SPAD meter is inefficient at characterizing large, spatially heterogeneous canopies, and is difficult to integrate into future high-throughput field phenotyping platforms. Therefore, the development of new technology capable of rapid and comprehensive monitoring of nitrogen status in grey jujube is of great significance for realizing precise fertilization and intelligent management.

Near-infrared spectroscopy, as an efficient and environmentally friendly means of non-destructive inspection, has its analytical foundation in the overtone and combination band vibrations of hydrogen-containing groups such as C-H, O-H, and N-H in organic compounds [5]. This technology has shown great potential in the quantitative inversion of plant leaf biochemical parameters such as nitrogen, chlorophyll, and moisture. In recent years, numerous studies have been devoted to applying NIR spectroscopy to estimate the chlorophyll content of various crops. For example, Zhang et al. [6] successfully used NIR spectroscopy to predict the SPAD value of rice leaves and found that after standard normal variate (SNV) pretreatment, the model performance was significantly improved; Li et al. [7] compared multiple modeling methods in the study of citrus leaves and confirmed the superiority of machine learning algorithms in such nonlinear problems; Prattana Lopin et al. [8] further effectively screened the characteristic wavelengths related to chlorophyll through the competitive adaptive reweighted sampling algorithm, simplifying the model. These studies have laid a solid foundation for the application of NIR technology in plant nutrition monitoring. However, NIR spectra are susceptible to interference from environmental noise, light scattering, and sample baseline drift. Therefore, selecting appropriate spectral preprocessing methods (such as smoothing, standard normal variate, and derivative processing) to extract effective information, combined with characteristic wavelength selection algorithms (such as competitive adaptive reweighted sampling) to eliminate redundant variables, is key to constructing robust and high-precision quantitative models [9,10,11].

In terms of modeling algorithms, machine learning methods that are capable of handling complex nonlinear relationships have demonstrated significant advantages. The back propagation (BP) neural network and radial basis function neural network, among others [12,13], have been widely applied in the field of spectral analysis due to their powerful function approximation capabilities. Different training algorithms affect both the convergence speed and prediction accuracy by altering the optimization path of the model, and the systematic comparison of these algorithms is crucial for constructing the optimal model [14].

Although NIR spectroscopy has made significant progress in crop nutrition monitoring, existing research has primarily focused on staple food crops such as rice and wheat as well as some fruit trees. For grey jujube, an important economic tree species, the leaf structure and biochemical composition may be distinctive, so models developed for other crops cannot be assumed to transfer directly. Currently, research on the systematic assessment of the SPAD value in grey jujube leaves using NIR spectroscopy remains very limited, particularly in exploring the optimal spectral preprocessing workflow, feature band selection, and the combined effects of different neural network training strategies, where in-depth and systematic reports are still lacking.

Therefore, this study aims to fill this research gap by taking jujube leaves as the research object and systematically investigating the quantitative analysis model of SPAD value based on NIR spectroscopy. The specific objectives of this study include: (1) collecting the NIR spectra and SPAD values of jujube leaves to construct a dataset; (2) comparing the enhancement effects of various spectral preprocessing methods on model performance; (3) applying the competitive adaptive reweighted sampling algorithm to screen for the characteristic wavelengths and optimize model inputs; (4) constructing BPNN and RBFNN models and evaluating the predictive efficacy of different training functions; and (5) determining the optimal model combination suitable for the non-destructive inspection of SPAD value in jujube leaves. This study is expected to provide a reliable technical solution and theoretical basis for the rapid and non-destructive diagnosis of the nutritional status of jujube.

2. Materials and Methods

2.1. Overview of Test Site

This experiment was conducted on the campus of Tarim University in Alaer City, Xinjiang. The experimental area is characterized by a typical extremely continental arid desert climate in a warm temperate zone, with an average annual precipitation of approximately 50 mm, scarce winter snowfall, annual evaporation exceeding 2000 mm, abundant sunshine, and significant diurnal temperature variations. The test soil was classified as sandy loam soil, characterized by uniform texture and good permeability. The experimental material consisted of 17-year-old grey jujube trees, planted in north–south oriented rows with a spacing of 2 m × 4 m. The orchard was irrigated using the flood irrigation method, and no fertilizers were applied nor pest control measures implemented on the experimental trees. Healthy, uniformly growing mature trees were selected as the research subjects.

2.2. Sample Collection

During the maturity period of grey jujube fruits (30 September 2024), mature leaves were collected from the middle and lower segments of the current-year branches at the outer edge of the tree crown of each test tree. During collection, individual leaves from the four cardinal directions (east, south, west, and north) of the tree crown were carefully selected. The relative chlorophyll content of grey jujube leaves was measured using a chlorophyll meter (SPAD-502Plus, Konica Minolta, Tokyo, Japan), with the SPAD value used as the characterization index [15]. The instrument was calibrated before measurement. Avoiding the midrib, three points were selected at approximately 2/3 of the distance from the petiole in the middle part of the leaf for measurement, and the average value was taken as the SPAD value for that leaf, thereby reducing errors caused by uneven chlorophyll distribution. All measurements were conducted under stable natural light intensity conditions, ensuring that the probe was tightly attached to the leaf surface with no light leakage. The single-leaf measurement time was kept within 10 s to prevent leaf deformation due to pressure or dehydration. Finally, each leaf number was recorded with its corresponding three readings and their mean value to establish a SPAD value database. A total of 188 leaves were collected and stored in Ziplock bags inside a 4 °C vehicle refrigerator until spectral scanning.

2.3. Original Spectral Acquisition

After sample collection was completed, the test samples were removed from the 4 °C vehicle refrigerator on the same day (30 September 2024) and placed in the laboratory housing the spectrometer (ambient temperature 24 °C) for a 12-h equilibration period. This ensured that the samples reached room temperature and eliminated any interference from temperature gradients. After powering on the Fourier transform near-infrared spectrometer (Antaris II FTNIR (Thermo Fisher Scientific, Madison, WI, USA)) and preheating for 30 min, diffuse reflectance correction was performed using the standard whiteboard [16]. If necessary, dust was gently cleaned from the leaf surface with a dust-free cloth. On each leaf, using the leaf vein as a boundary, 2 regions each were selected at the upper and lower ends (totaling 4 sites), and the spectra of different regions were marked with different colors. In each region, the scan was repeated four times with the following parameters: spectral range 10,000~4000 cm⁻¹; resolution 8 cm⁻¹; gain 2×; and 64 accumulations per scan. A total of 16 spectral curves were obtained from each leaf, and the average value was calculated after baseline correction to serve as the final absorbance (A) value of the sample for subsequent chemometric modeling and analysis. This method controlled the effects of temperature fluctuation, instrument drift, and leaf heterogeneity through standardized sample handling, instrument calibration, and multi-point repeated measurement, laying the data foundation for the construction of a high-precision prediction model.

2.4. Spectral Data Conversion

Since the acquired original near-infrared (NIR) spectra are susceptible to interference from environmental noise, light scattering, and baseline drift, direct modeling would affect the prediction accuracy. Therefore, it was necessary to perform preprocessing on the original spectral data to enhance the effective signal. This study systematically compared multiple preprocessing methods:

Smooth: Utilizes the Savitzky–Golay convolution smoothing algorithm [17] to suppress random noise and improve the signal-to-noise ratio.

Standard normal variate transformation (SNV): Used to eliminate baseline drift and scale differences caused by leaf surface scattering and variations in optical path length.

Derivative processing: Employs first derivative (FD) and second derivative (SD) processing [18] to eliminate baseline interference and enhance the resolution of overlapping absorption peaks.

Combined preprocessing: To further refine the effects, the aforementioned methods were serially combined, such as “smooth + FD”, “SNV + FD”, etc., aiming to comprehensively leverage the advantages of multiple methods and provide higher-quality spectral data for the model.

2.5. Spectral Feature Band Extraction

Full spectral band data are voluminous and exhibit significant multicollinearity and redundant information. To simplify the model, improve computational efficiency, and enhance the model’s generalization ability, this study employed the competitive adaptive reweighted sampling (CARS) algorithm to screen the characteristic wavelength variables most correlated with SPAD value from the preprocessed spectra [19]. The CARS algorithm iteratively retains the wavelength points with larger absolute regression coefficients through adaptive reweighted sampling and a competitive mechanism, thereby selecting a highly representative characteristic wavelength subset and significantly reducing the data dimension.
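One part of CARS that is easy to show in isolation is how quickly wavelengths are discarded. In the commonly described formulation, the fraction of wavelengths kept at sampling run i follows an exponentially decreasing function r_i = a·exp(−k·i), with a and k set so that all wavelengths are kept in the first run and only two in the last. The sketch below computes that schedule only; it omits the regression and reweighted-sampling steps, and the wavelength count and run count are hypothetical, since the article does not report them.

```python
import math


def cars_retention_schedule(n_wavelengths, n_runs):
    """Number of wavelengths kept at each CARS sampling run under the
    exponentially decreasing function r_i = a * exp(-k * i)."""
    ratio = n_wavelengths / 2
    a = ratio ** (1 / (n_runs - 1))
    k = math.log(ratio) / (n_runs - 1)
    return [
        max(2, round(n_wavelengths * a * math.exp(-k * i)))
        for i in range(1, n_runs + 1)
    ]


# Hypothetical settings: 1500 spectral points and 50 sampling runs.
schedule = cars_retention_schedule(1500, 50)
```

The schedule drops steeply in the early runs and flattens later, which is why CARS is often described as a fast coarse selection followed by a refined one.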

2.6. Machine Learning Modeling

Using the previously obtained characteristic wavelength variable as input and the measured SPAD value as output, we established two prediction models: a back propagation (BP) neural network (BPNN) and a radial basis function neural network (RBFNN).

The BP neural network employs a three-layer topology, adjusting weights and thresholds through the back propagation (BP) error algorithm. This study focused on comparing three training functions—LBFGS (quasi-Newton algorithm), Adam (adaptive moment estimation), and SGD (stochastic gradient descent)—to optimize the model performance [20].
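To make the three-layer topology and the weight update concrete, the following is a minimal pure-Python sketch of a back propagation network trained by per-sample stochastic gradient descent, one of the three training functions compared. It is illustrative only: the hidden-layer size, tanh activation, learning rate, epoch count, and function name are assumptions, and LBFGS and Adam would replace the plain update step with their own rules.

```python
import math
import random


def train_bp_network(X, y, n_hidden=4, learning_rate=0.05, epochs=300, seed=0):
    """Train an input -> tanh hidden layer -> linear output network by
    back-propagating the squared error one sample at a time (SGD).
    Returns a function that predicts the output for one input vector."""
    rng = random.Random(seed)
    n_inputs = len(X[0])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_out = 0.0

    def forward(x):
        hidden = [
            math.tanh(sum(w * v for w, v in zip(w_hidden[j], x)) + b_hidden[j])
            for j in range(n_hidden)
        ]
        return hidden, sum(w * h for w, h in zip(w_out, hidden)) + b_out

    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            hidden, output = forward(X[i])
            error = output - y[i]  # derivative of 0.5 * error**2 w.r.t. output
            for j in range(n_hidden):
                # Error signal propagated back through the tanh unit,
                # computed before the output weight is changed.
                delta = error * w_out[j] * (1.0 - hidden[j] ** 2)
                w_out[j] -= learning_rate * error * hidden[j]
                for k in range(n_inputs):
                    w_hidden[j][k] -= learning_rate * delta * X[i][k]
                b_hidden[j] -= learning_rate * delta
            b_out -= learning_rate * error

    return lambda x: forward(x)[1]
```

In practice the inputs would be the selected characteristic wavelengths (scaled to a comparable range) and the target the measured SPAD value.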

The RBF neural network employs radial basis functions as the activation function in the hidden layer and performs fitting through nonlinear mapping. Three solution methods were compared: direct (direct computation), SVD (singular value decomposition), and gradient (gradient descent) [21].

For model development and evaluation, the sample set was randomly split into a training set (for model development) and a validation set (for model evaluation) at a ratio of 3:1.
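A random 3:1 split can be sketched as follows. The seed and function name are assumptions for illustration; the article does not say how the random split was seeded. With the 188 leaves collected, a 3:1 ratio works out to 141 training and 47 validation samples.

```python
import random


def split_indices(n_samples, train_fraction=0.75, seed=42):
    """Shuffle sample indices and split them into training and
    validation index lists (3:1 when train_fraction is 0.75)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train_idx, val_idx = split_indices(188)
```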

2.7. Model Evaluation Method

In this study, the regression algorithm was implemented using MATLAB R2024b software. Meanwhile, three metrics—coefficient of determination (R²), root mean square error (RMSE), and residual prediction deviation (RPD)—were used to comprehensively evaluate the model performance [22]:

Coefficient of determination (R²): This measures the model goodness of fit and ranges from 0 to 1. A value closer to 1 indicates a stronger agreement between the model predicted value and the measured value. The calculation formula is provided in Formula (1).

Root mean square error (RMSE): Measures the absolute magnitude of the prediction error. A smaller root mean square error (RMSE) value indicates greater model prediction accuracy. The calculation is given in Formula (2).

Residual prediction deviation (RPD): This indicates the model’s predictive ability, calculated as shown in Formula (3). Its evaluation criteria are generally defined as: RPD > 3 indicates excellent model prediction ability; 2 < RPD ≤ 3 means that the model can be used for preliminary prediction; RPD ≤ 2 suggests poor model prediction ability.

In the model evaluation, the above metrics were computed separately for the training set and the validation set (split at a 3:1 ratio) to comprehensively assess the model’s fitting effect, prediction accuracy, and generalization ability.

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{m,i} - y_{p,i})}^{2}}{\sum_{i = 1}^{n} {(y_{m,i} - \bar{y}_{m})}^{2}}

(1)

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{m,i} - y_{p,i})}^{2}}

(2)

RPD = \frac{S_{y}}{RMSE} = \frac{\sqrt{\frac{1}{n - 1} \sum_{i = 1}^{n} {(y_{m,i} - \bar{y}_{m})}^{2}}}{RMSE}

(3)

where n is the sample size; y_{m,i} and y_{p,i} are the measured value and the predicted value of the SPAD value of the i-th grey jujube leaf, respectively; \bar{y}_{m} is the average of the measured SPAD values; and S_{y} is the standard deviation of the measured SPAD values.
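As a worked illustration of Formulas (1)–(3) and the RPD criteria stated above, the sketch below computes the three metrics in plain Python. The function names and the toy values are assumptions; the standard deviation uses the n − 1 denominator, as in Formula (3).

```python
import math


def regression_metrics(measured, predicted):
    """Return (R2, RMSE, RPD) for paired measured and predicted values."""
    n = len(measured)
    mean_measured = sum(measured) / n
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    s_y = math.sqrt(ss_tot / (n - 1))  # sample standard deviation
    return r2, rmse, s_y / rmse


def interpret_rpd(rpd):
    """Apply the RPD thresholds given in Section 2.7."""
    if rpd > 3:
        return "excellent"
    if rpd > 2:
        return "usable for preliminary prediction"
    return "poor"


# Toy SPAD values (hypothetical, not from the study).
r2, rmse, rpd = regression_metrics([30, 35, 40, 45], [31, 34, 41, 44])
```

For the toy values, the residual sum of squares is 4 and the total sum of squares is 125, so R² = 1 − 4/125 = 0.968, RMSE = 1, and RPD = √(125/3) ≈ 6.45.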

3. Results

3.1. Analysis of Near Infrared Original Spectrum and SPAD Content of Grey Jujube Leaves

Figure 1a shows the original spectral reflection characteristics of grey jujube leaves in the wavenumber range of 4000–10,000 cm⁻¹. The spectral curve overall exhibited the reflection characteristics typical of healthy green plants. There was a relatively broad and intense absorption feature with an absorbance of approximately 1.5, whose position corresponded to the O-H stretching vibrational absorption band of water (H₂O) (strong absorption peaks typically exist near 5200 cm⁻¹ and 6900 cm⁻¹). This prominent feature indicates that the measured grey jujube leaves contained abundant water and had a good moisture status. Furthermore, in the 4000–4500 cm⁻¹ range, at the low-wavenumber end of the spectrum, the spectral curve was relatively flat with comparatively weak absorption. The spectral characteristics across the entire near-infrared (NIR) region were closely related to the leaf’s internal structure, water content, and biochemical composition. The original spectrum provides an important data foundation for the subsequent extraction of characteristic wavelength bands and the establishment of a quantitative inversion model for SPAD value or other biochemical parameters.

Figure 1b displays the frequency distribution histogram of the SPAD values in the grey jujube leaf samples. The SPAD value serves as an important indicator of the leaf relative chlorophyll content and even nitrogen nutrition status. In terms of distribution pattern, the SPAD value approximately followed a normal distribution, suggesting that the chlorophyll content in most leaves was moderate and the sample population exhibited good uniformity. The SPAD value was predominantly concentrated between 25 and 45, with the peak count (i.e., sample number) occurring between 35 and 40, indicating that this SPAD value range represents the most typical chlorophyll content level in this study. Notably, leaves with a SPAD value below 25 or above 45 were relatively scarce, which may be due to the fact that all samples were obtained from healthy grey jujube plants, with few individuals showing extreme deficiency or nutrient surplus. The distribution results confirm that the grey jujube leaf samples used in the experiment exhibited both natural variation in chlorophyll content and a relatively stable overall condition, rendering them highly suitable for the development of a predictive model for chlorophyll content.

In summary, this study successfully acquired spectral data and their corresponding SPAD values from healthy grey jujube leaves. The leaves exhibited strong moisture absorption characteristics in the near-infrared region, while their chlorophyll content (represented by SPAD value) showed a concentrated and reasonable distribution. The spectral reflectance characteristics of the leaf are physically associated with its internal biochemical parameters (such as chlorophyll, moisture, and nitrogen content). Therefore, based on the spectral data shown in Figure 1a, it was entirely feasible to further employ modeling methods, such as feature extraction and machine learning, to explore the quantitative relationship with the SPAD value in Figure 1b, ultimately achieving the rapid and non-destructive spectral diagnosis of the chlorophyll content in grey jujube leaves.

3.2. Spectral Preprocessing

Figure 2 shows the effect of different preprocessing techniques on the near-infrared spectroscopy of grey jujube leaf. Smooth processing (Figure 2a) effectively suppresses noise and improves the signal-to-noise ratio, but may weaken subtle characteristic peaks. SNV processing (Figure 2b) eliminates baseline drift and the scattering effect, highlights differences in chemical composition, and enhances model robustness. First derivative (FD) (Figure 2c) effectively removes baseline interference and improves the resolution of overlapping peaks by converting the absorption peak into a zero point and the inflection point into an extreme point. Second derivative (SD) (Figure 2d) further sharpens spectral features and is sensitive to weak signals, though its signal amplitude is relatively small. Combined methods demonstrate a better performance: smooth + FD/SD (Figure 2e,f) applies smooth first to reduce noise before differentiation, resulting in smoother derivative spectra and suppressing noise amplification caused by derivation; SNV + FD/SD (Figure 2g,h) corrects the scattering effects first and then eliminates baseline drift, addressing both issues simultaneously. After processing, the spectral characteristics are prominent and the baseline is flat, significantly improving the data quality, which is highly beneficial for subsequent quantitative model development. Through comparison, it was seen that the combined preprocessing method could comprehensively leverage their respective advantages, providing a higher quality spectral data foundation for establishing a high-precision SPAD prediction model.

3.3. Correlation Analysis Between Pretreatment Spectrum and Leaf SPAD Content

To further examine how preprocessing affects the relationship between the spectrum and the leaf SPAD value, this study conducted a correlation analysis. As shown in Figure 3a–h, the correlation coefficients between the preprocessed spectra and the leaf SPAD value were calculated for the following preprocessing methods: smooth, standard normal variate, first derivative, second derivative, smooth + first derivative, smooth + second derivative, standard normal variate + first derivative, and standard normal variate + second derivative. After preprocessing with smooth (Figure 3a), the correlation coefficient was generally within the strong negative correlation interval of −1 to −0.75. While this highlights a significant association between the spectrum and the SPAD value, it also has obvious drawbacks. On the one hand, an excessively strong negative correlation may obscure the true physical and chemical mechanism and weaken model interpretability, and smoothing may introduce false signals or suppress weak features, increasing the loss of key information and the risk of overfitting. On the other hand, this extreme linear trend may contradict the nonlinear relationship of the actual system, misleading the direction of feature selection and model optimization. After three pretreatment methods (standard normal variate, first derivative, and smooth + first derivative), the correlation range fell within −1 to 1, a broader interval than with the smooth pretreatment, whereas the second derivative, smooth + second derivative, standard normal variate + first derivative, and standard normal variate + second derivative performed poorly.
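A correlation analysis of this kind can be sketched as a Pearson coefficient computed column by column, that is, one coefficient per spectral point across all leaves. The sketch assumes this per-wavenumber layout (one row per leaf, one column per wavenumber); the function names and the toy data are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_spectrum(spectra, spad):
    """Correlate each spectral column with the SPAD values.
    spectra: list of rows, one preprocessed spectrum per leaf."""
    return [pearson(list(column), spad) for column in zip(*spectra)]


# Toy data: 3 leaves x 2 spectral points (hypothetical values).
toy_r = correlation_spectrum([[1, 3], [2, 2], [3, 1]], [10, 20, 30])
```

In the toy data the first column rises with SPAD and the second falls, so the two coefficients are +1 and −1.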

3.4. Spectral Feature Extraction

To enhance the computational efficiency of the model and alleviate its computational burden, this study adopted the competitive adaptive reweighted sampling (CARS) algorithm to conduct characteristic wavelength extraction on spectral data subjected to different preprocessing methods. The competitive adaptive reweighted sampling algorithm screens the characteristic bands most correlated with the target variable through adaptive reweighted sampling and a competitive mechanism, thereby significantly reducing the data dimension and optimizing the structure of the model input.

Specifically, after screening by the competitive adaptive reweighted sampling algorithm, the number of characteristic wavelengths extracted under different preprocessing methods varied notably. Among them, the FD preprocessing method yielded the highest number of spectral features, reaching 50, indicating that its spectral response is relatively complex and may hold more original details and variation information (Figure 4c). Combined smooth + first derivative preprocessing obtained 28 characteristic wavelengths, demonstrating that smoothing can retain a considerable amount of effective information while reducing noise (Figure 4e). Standard normal variate + second derivative preprocessing extracted 18 characteristic wavelengths, showing that this combination not only performs scatter correction and baseline elimination, but also exhibits good feature expression capability (Figure 4h).

On the other hand, standard normal variate, second derivative, and smooth + second derivative each extracted only five characteristic wavelengths, while standard normal variate + first derivative extracted only six. This indicates that these methods exhibit high feature concentration after competitive adaptive reweighted sampling screening, potentially focusing more on core sensitive bands, which is beneficial for building more concise and high-impact prediction models. However, it also implies the possibility of ignoring some bands, leading to weaker model prediction ability (Figure 4a,b,d,f,g). The variation in the number of characteristic wavelengths not only reflects the impact of different preprocessing methods on the distribution of spectral information, but also hints at their applicability under various modeling requirements: a larger number of features may contain richer variation information but entails higher computational complexity, while a smaller number of features facilitates fast and robust modeling, though the risk of omitting information must be guarded against. The results of this study can provide a quantitative basis for subsequent feature selection and model optimization aimed at estimating the SPAD value of grey jujube leaves.

3.5. Visualization of Model Evaluation Indicators

To establish a high-precision prediction model for the SPAD value of grey jujube leaves, this study employed both the BP neural network and RBF neural network methods to develop a quantitative analysis model based on the characteristic wavelength extracted by the competitive adaptive reweighted sampling algorithm. The coefficient of determination (R²), root mean square error (RMSE), and residual prediction deviation (RPD) were selected as evaluation indicators for model performance. The modeling results are shown in Figure 5.

Figure 5a–c displays the distribution of R², RMSE, and RPD indicators on the training set and validation set for each model under different preprocessing methods. Comprehensive analysis indicated that the BP neural network model with smooth + first derivative preprocessing combined with competitive adaptive reweighted sampling feature screening (smooth + first derivative-competitive adaptive reweighted sampling-BP) performed the best. This model achieved R² values of 0.87 and 0.85 on the training set and validation set, respectively, both the highest among all models, demonstrating an excellent ability to explain variance and good generalization performance. In terms of error, its root mean square error (RMSE) values were the lowest, at 1.36 and 1.35, respectively, indicating a small deviation between the predicted value and the measured value. Furthermore, the RPD values of this model were 2.81 (training set) and 2.56 (validation set), further demonstrating its prediction stability and practicality. Therefore, this study considers that the smooth + first derivative-competitive adaptive reweighted sampling-BP model performed the best across multiple metrics, exhibiting both a good fitting effect and strong generalization ability, enabling the reliable prediction of the SPAD value of grey jujube leaves and making it the most promising prediction model in this study.

3.6. Model Training Function Selection

To further investigate the influence of different training functions on the performance of the neural network models, this study conducted multiple sets of comparative experiments on both the BP neural network and RBF neural network. The BP neural network employed three training algorithms, Adam, LBFGS, and SGD, whereas the RBF neural network utilized three solution methods, direct, SVD, and gradient. By systematically comparing the coefficient of determination (R²) among the models, we found that the BP neural network generally exhibited a superior performance over the RBF neural network (Figure 6a).

Within the BP neural network, the LBFGS training algorithm achieved the best prediction performance, with R² values reaching 0.87 and 0.85 on the training set and validation set, respectively, exceeding those of the top RBF neural network model by 0.04 on both sets. In terms of prediction accuracy, the BP-LBFGS model also excelled, recording the lowest root mean square error (RMSE) values among all six models: 1.36 for the training set and 1.35 for the validation set. These results demonstrate its high prediction accuracy. Additionally, the BP-LBFGS model exhibited strong performance in terms of residual prediction deviation (RPD), with values of 2.81 and 2.56 on the training set and validation set, respectively. Although the RBF-direct model also performed well on the RPD metric (training set: 2.67; validation set: 2.65), the BP-LBFGS model still demonstrated a clear advantage when considering both key indicators of coefficient of determination (R²) and root mean square error (RMSE).

In summary, the BP neural network utilizing the LBFGS training algorithm showed excellent overall performance in predicting the SPAD value of grey jujube leaves. It not only achieved high goodness of fit and prediction accuracy, but also exhibited strong stability, making it better suited for the application needs of this study.

3.7. Linear Fitting and Relative Error Analysis of Model Prediction Data

Following the comparison of the model evaluation metrics above, the smooth + first derivative-competitive adaptive reweighted sampling-BP model based on the LBFGS function was selected for predicting the SPAD value of grey jujube leaves.

A linear fit between the model’s predicted results and the actual measurements yielded a coefficient of 0.88 for the training set (Figure 7a) and 0.85 for the validation set (Figure 7b).

Additionally, the relative error was calculated between the predicted and measured values. The results for the training set and validation set are presented in Figure 7c,d, respectively. The maximum relative error was below 0.3% for the training set and below 0.25% for the validation set.

These results demonstrate that the Smooth + FD-CARS-BP model trained with the LBFGS algorithm has strong predictive ability for estimating the SPAD value of grey jujube leaves.
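The linear fit and relative error analysis described in this section can be sketched as follows. This is an illustrative Python example under stated assumptions: the fit is an ordinary least-squares line of predicted against measured values, and the relative error of a sample is taken as |predicted − measured| / |measured|, expressed as a fraction. The function names and the SPAD values are hypothetical and are not the study's data.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are constant; the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def relative_errors(measured, predicted):
    """Per-sample |predicted - measured| / |measured|, as a fraction."""
    if len(measured) != len(predicted):
        raise ValueError("sequences must have the same length")
    if any(m == 0 for m in measured):
        raise ValueError("relative error is undefined for a measured value of 0")
    return [abs(p - m) / abs(m) for m, p in zip(measured, predicted)]


# Hypothetical measured and predicted SPAD values, for illustration only.
measured_spad = [30.0, 35.0, 40.0, 45.0]
predicted_spad = [31.5, 35.0, 38.0, 45.9]

slope, intercept = linear_fit(measured_spad, predicted_spad)
errors = relative_errors(measured_spad, predicted_spad)
print(f"fit: predicted = {slope:.3f} * measured + {intercept:.3f}")
print(f"maximum relative error: {max(errors):.3f}")
```

Multiplying the fractions by 100 gives percentages. A slope close to 1 and an intercept close to 0 indicate that the predictions track the measurements without systematic bias.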

4. Discussion

This study systematically compared different spectral preprocessing methods, characteristic wavelength selection algorithms, and neural network modeling strategies [23], successfully constructing a high-precision prediction model for SPAD value in grey jujube leaves based on near-infrared spectroscopy. The results indicate that the BP neural network model trained with LBFGS, after preprocessing with smoothing combined with first derivative (Smooth + FD) and extracting characteristic wavelengths via the competitive adaptive reweighted sampling algorithm, demonstrated the optimal prediction performance. This finding not only confirms the feasibility of near-infrared spectroscopy for the non-destructive inspection of biochemical parameters in plant leaves, but also highlights the importance of collaborative optimization among spectral preprocessing, feature selection, and model architecture [24], providing a theoretical basis and methodological reference for developing field-applicable rapid diagnostic equipment in the future.

From the perspective of spectral preprocessing, a single preprocessing method often struggles to comprehensively refine the spectral quality. For instance, in this study, although smoothing alone improved the signal-to-noise ratio, it caused the correlation between the spectrum and SPAD value to be highly concentrated in the strong negative range (−1 to −0.75), which may have obscured the true physicochemical response mechanisms and even introduced false signals [25]. In contrast, derivative processing (especially first derivative) could effectively eliminate baseline drift and improve resolution, but it was also more sensitive to noise [26]. While standard normal variate processing could correct for the scattering effect, it may not have sufficiently highlighted the subtle spectral characteristics related to chlorophyll [27]. Notably, the combined preprocessing method demonstrated significant advantages: smooth + first derivative enhanced the spectral characteristics while suppressing noise, making subsequent modeling more robust. This is consistent with the findings of Tang et al. (2023) in the detection of tea components [28], indicating that combined preprocessing can balance noise control and feature enhancement, serving as an effective strategy to improve the accuracy of near-infrared spectral analysis.
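The preprocessing steps discussed above can be illustrated with a short Python sketch. It is a simplified stand-in rather than the study's implementation: smoothing is shown as a centered moving average and the first derivative as a finite difference, whereas the filter type, window width, and derivative scheme actually used are not stated in this section. All function names and the example spectrum are hypothetical.

```python
import math


def moving_average(spectrum, window=5):
    """Centered moving-average smoothing; the window shrinks at the edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(spectrum)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        smoothed.append(sum(spectrum[lo:hi]) / (hi - lo))
    return smoothed


def first_derivative(spectrum, step=1.0):
    """Finite-difference first derivative (central inside, one-sided at ends)."""
    n = len(spectrum)
    if n < 2:
        raise ValueError("need at least 2 points")
    deriv = [(spectrum[1] - spectrum[0]) / step]
    for i in range(1, n - 1):
        deriv.append((spectrum[i + 1] - spectrum[i - 1]) / (2 * step))
    deriv.append((spectrum[-1] - spectrum[-2]) / step)
    return deriv


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own SD."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    if sd == 0:
        raise ValueError("spectrum is constant; SNV is undefined")
    return [(v - mean) / sd for v in spectrum]


# Hypothetical absorbance values with a constant baseline offset of 0.5.
raw = [0.50, 0.52, 0.55, 0.61, 0.70, 0.78, 0.83, 0.85, 0.86]

# Smooth + FD: suppress noise first, then differentiate.
smooth_fd = first_derivative(moving_average(raw, window=5))
print(smooth_fd)
```

Because the derivative of a constant is zero, adding a fixed offset to `raw` leaves `smooth_fd` unchanged, which is the baseline-removal effect described above. Differentiating without prior smoothing would amplify point-to-point noise, which is why the two steps are combined in that order.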

In terms of characteristic wavelength selection, the competitive adaptive reweighted sampling algorithm showed significant differences in the number of features extracted from different preprocessed spectra. The highest number of wavelengths was extracted after first derivative preprocessing (50), possibly because derivative processing amplified and preserved more subtle variations from the original spectrum, but it may also have introduced redundant information [29]. In contrast, fewer features were extracted (5–6) after combined preprocessing methods such as standard normal variate + second derivative, demonstrating higher feature concentration. This indicates that preprocessing methods directly affect the distribution of spectral information and the redundancy of feature expression. Although a greater number of characteristic wavelengths may contain richer information, it also increases the model complexity and the overfitting risk. In this study, the optimal model (smooth + first derivative-competitive adaptive reweighted sampling-BP) ultimately selected 28 characteristic wavelengths, achieving a dimensionality reduction while preserving information content. This indicates that feature selection must seek a balance between informational completeness and model conciseness.
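To make the shrinking of the wavelength set more concrete, the sketch below reproduces only the exponentially decreasing function that CARS is commonly described as using to set how many wavelengths survive each sampling run: the retained fraction at run i is r_i = a·exp(−k·i), with a and k chosen so that all p wavelengths are kept in the first run and two in the last. The PLS fitting, Monte Carlo sampling, and adaptive reweighted sampling steps are omitted, and the function name, the 1500 wavelengths, and the 50 runs are hypothetical values chosen for illustration.

```python
import math


def cars_retention_schedule(n_wavelengths, n_runs):
    """Number of wavelengths retained after each CARS sampling run.

    Uses r_i = a * exp(-k * i) with
      a = (p / 2) ** (1 / (N - 1)) and k = ln(p / 2) / (N - 1),
    so that r_1 = 1 (all p wavelengths) and r_N = 2 / p (two wavelengths).
    """
    if n_wavelengths < 2 or n_runs < 2:
        raise ValueError("need at least 2 wavelengths and 2 runs")
    half = n_wavelengths / 2.0
    a = half ** (1.0 / (n_runs - 1))
    k = math.log(half) / (n_runs - 1)
    return [
        max(1, round(n_wavelengths * a * math.exp(-k * i)))
        for i in range(1, n_runs + 1)
    ]


# Hypothetical settings: 1500 spectral points, 50 sampling runs.
schedule = cars_retention_schedule(1500, 50)
print(schedule[:5], "...", schedule[-5:])
```

In the usual formulation, the subset finally reported is the one from the run with the lowest cross-validated error, so the number of selected wavelengths depends on how the preprocessed spectra distribute information, which is consistent with the different counts obtained for different preprocessing methods.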

Regarding modeling methods, the BP neural network generally outperformed the RBF neural network, particularly when LBFGS was used as the training method, which gave higher prediction accuracy and stability [30]. LBFGS is a limited-memory quasi-Newton algorithm that is suitable for optimization problems with a moderate number of parameters. On small, full-batch problems it often converges faster than stochastic gradient descent (SGD) or adaptive moment estimation (Adam), and in this study it produced the most accurate model of the three. This finding is consistent with the conclusion of Yang et al. (2021) for the prediction of wheat leaf nitrogen content [31] and suggests that training algorithms that approximate second-order (curvature) information are well suited to the complex nonlinear mapping between the spectrum and biochemical parameters. Furthermore, the optimal model maintained a high coefficient of determination (R² > 0.84) and RPD (>2.5) on both the training set and validation set, demonstrating that it combines good fitting ability with generalization performance and can meet the demands of practical applications.
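The difference between a first-order method and a quasi-Newton method can be stated compactly. The block below gives the standard textbook update rules, with notation introduced here for illustration: θ_k for the network weights at iteration k, L for the training loss, η and α_k for step sizes, and H_k for the approximation of the inverse Hessian.

```latex
% Gradient descent step (SGD replaces g_k with a mini-batch estimate):
\theta_{k+1} = \theta_k - \eta\, g_k, \qquad g_k = \nabla L(\theta_k)

% Quasi-Newton step using an inverse-Hessian approximation H_k:
\theta_{k+1} = \theta_k - \alpha_k H_k\, g_k

% BFGS update of H_k from successive parameter and gradient differences:
s_k = \theta_{k+1} - \theta_k, \qquad
y_k = g_{k+1} - g_k, \qquad
\rho_k = \frac{1}{y_k^{\top} s_k}

H_{k+1} = \left(I - \rho_k s_k y_k^{\top}\right) H_k
          \left(I - \rho_k y_k s_k^{\top}\right) + \rho_k s_k s_k^{\top}
```

The limited-memory variant (L-BFGS) does not store H_k explicitly. It keeps only the most recent few (s_k, y_k) pairs and applies the product H_k g_k implicitly, which keeps memory use low while still exploiting curvature information that SGD and Adam do not model directly.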

Although this study achieved good prediction results, several aspects merit further exploration. First, the distribution of SPAD values in grey jujube leaves was relatively concentrated (25–45), with few samples at the extremes, which limits the model’s extrapolation capability under extreme nutritional conditions. Future studies could expand the sample scope to include leaves from different growth stages and under different nutritional treatments to enhance the model’s generalizability.

Second, although this study employed competitive adaptive reweighted sampling for feature screening, the physical significance and physiological mechanisms of the extracted wavelength points still require further analysis. Interpreting them in light of the absorption characteristics of chlorophyll and the bands affected by moisture interference would help improve the model’s interpretability.

Finally, the current model has not yet considered the impact of environmental factors (such as light and temperature) or differences in leaf physiological structure on the spectrum. Introducing these variables as auxiliary inputs, or adopting more complex model architectures such as deep learning, may further enhance the prediction accuracy.

In summary, this study used multi-method collaborative optimization to construct a high-precision near-infrared spectroscopy model for the quantitative prediction of SPAD value in grey jujube leaves, providing a reliable analytical framework for the non-destructive diagnosis of crop nutrition. In the future, this method could be extended to other jujube varieties and to other economic crops, promoting the wider application of spectroscopic technology in smart agriculture.

5. Conclusions

This study validated the feasibility of using near-infrared spectroscopy for the rapid and non-destructive inspection of SPAD value in grey jujube leaves. Compared with traditional destructive chemical measurement methods, this method achieved non-destructive and rapid analysis; moreover, compared with the single-point measurement of SPAD meters, spectroscopy can more readily support high-throughput assessment at the canopy scale.

Through systematic comparison, the combination of smooth with first derivative and competitive adaptive reweighted sampling characteristic wavelength selection was determined to be the optimal scheme. On this foundation, the established BP neural network model based on the LBFGS algorithm achieved the best prediction performance (validation set coefficient of determination (R²) > 0.84, RPD > 2.5), confirming the model’s excellent accuracy and robustness.

In summary, the technical system constructed in this research provides a reliable solution for the rapid field diagnosis of grey jujube nitrogen nutrition and for precision fertilization management. Applying this method in production can provide technical support for the precision agriculture goals of improving fertilizer utilization efficiency, reducing production costs, decreasing non-point source pollution, and securing high-quality, high-yield jujube fruit.

Author Contributions

Conceptualization, L.W. and J.Z.; Methodology, L.W. and J.Z.; Software, M.Y.; Validation, W.F.; Investigation, W.F.; Resources, J.B.; Data curation, L.W. and J.Z.; Writing—original draft, L.W. and J.Z.; Writing—review & editing, L.W., J.Z. and J.B.; Visualization, M.Y.; Project administration, J.B.; Funding acquisition, J.B. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Bingtuan Science and Technology Program (2022CB001-11).

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ji, X.; Peng, Q.; Yuan, Y.; Shen, J.; Xie, X.; Wang, M. Isolation, structures and bioactivities of the polysaccharides from jujube fruit (Ziziphus jujuba Mill.): A review. Food Chem. 2017, 227, 349–357.
2. Cai, W.; Zhuang, H.; Wang, X.; Fu, X.; Chen, S.; Yao, L.; Sun, M.; Wang, H.; Yu, C.; Feng, T. Functional Nutrients and Jujube-Based Processed Products in Ziziphus jujuba. Molecules 2024, 29, 3437.
3. Murchie, E.H.; Lawson, T. Chlorophyll fluorescence analysis: A guide to good practice and understanding some new applications. J. Exp. Bot. 2013, 64, 3983–3998.
4. Zarco-Tejada, P.J.; González-Dugo, V.; Berni, J.A.J. Fluorescence, temperature and narrow-band indices acquired from a UAV platform for water stress detection using a micro-hyperspectral imager and a thermal camera. Remote Sens. Environ. 2012, 117, 322–337.
5. Xu, L.; Baldocchi, D.D. Seasonal trends in photosynthetic parameters and stomatal conductance of blue oak (Quercus douglasii) under prolonged summer drought and high temperature. Tree Physiol. 2003, 23, 865–877.
6. Zhang, Y.; Liang, K.; Zhu, F.; Zhong, X.; Lu, Z.; Chen, Y.; Pan, J.; Lu, C.; Huang, J.; Ye, Q.; et al. Differential Study on Estimation Models for Indica Rice Leaf SPAD Value and Nitrogen Concentration Based on Hyperspectral Monitoring. Remote Sens. 2024, 16, 4604.
7. Li, D.; Hu, Q.; Ruan, S.; Liu, J.; Zhang, J.; Hu, C.; Liu, Y.; Dian, Y.; Zhou, J. Utilizing Hyperspectral Reflectance and Machine Learning Algorithms for Non-Destructive Estimation of Chlorophyll Content in Citrus Leaves. Remote Sens. 2023, 15, 4934.
8. Lopin, P.; Nawsang, P.; Laywisadkul, S.; Lopin, K.V. Evaluation of Low-Cost Multi-Spectral Sensors for Measuring Chlorophyll Levels Across Diverse Leaf Types. Sensors 2025, 25, 2198.
9. Barnes, R.J.; Dhanoa, M.S.; Lister, S.J. Standard Normal Variate Transformation and De-Trending of Near-Infrared Diffuse Reflectance Spectra. Appl. Spectrosc. 1989, 43, 772–777.
10. Ferguson, D.; Henderson, A.; McInnes, E.F.; Lind, R.; Wildenhain, J.; Gardner, P. Infrared micro-spectroscopy coupled with multivariate and machine learning techniques for cancer classification in tissue: A comparison of classification method, performance, and pre-processing technique. Analyst 2022, 147, 3709–3722.
11. Bian, X.; Wang, K.; Tan, E.; Diwu, P.; Zhang, F.; Guo, Y. A selective ensemble preprocessing strategy for near-infrared spectral quantitative analysis of complex samples. Chemom. Intell. Lab. Syst. 2020, 197, 103916.
12. Li, M. Comprehensive Review of Backpropagation Neural Networks. Acad. J. Sci. Technol. 2024, 9, 150–154.
13. Wang, Z.; Chen, M.; Chen, J. Solving multiscale elliptic problems by sparse radial basis function neural networks. J. Comput. Phys. 2023, 492, 112452.
14. Angel, Y.; McCabe, M.F. Machine Learning Strategies for the Retrieval of Leaf-Chlorophyll Dynamics: Model Choice, Sequential Versus Retraining Learning, and Hyperspectral Predictors. Front. Plant Sci. 2022, 13, 722442.
15. Fonseka, C.L.I.S.; Halloluwa, T.; Hewagamage, K.P.; Rathnayake, U.; Bandara, R.M.U.S. A dataset of unmanned aerial vehicle multispectral images acquired over a field to identify nitrogen requirements. Data Brief 2024, 54, 110479.
16. Paiva, D.N.A.; de Oliveira Perdiz, R.; Almeida, T.E. Using near-infrared spectroscopy to discriminate closely related species: A case study of neotropical ferns. bioRxiv 2020. bioRxiv:2020.2010.2019.343947.
17. Zhang, G.; Hao, H.; Wang, Y.; Jiang, Y.; Shi, J.; Yu, J.; Cui, X.; Li, J.; Zhou, S.; Yu, B. Optimized adaptive Savitzky-Golay filtering algorithm based on deep learning network for absorption spectroscopy. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2021, 263, 120187.
18. Satyanarayana, C.; Yadav, M.K.; Nath, M. Multiquadric based RBF-HFD approximation formulas and convergence properties. Eng. Anal. Bound. Elem. 2024, 160, 234–257.
19. Zhang, W.; Kasun, L.C.; Wang, Q.J.; Zheng, Y.; Lin, Z. A Review of Machine Learning for Near-Infrared Spectroscopy. Sensors 2022, 22, 9764.
20. Chainakun, P.; Fongkaew, I.; Hancock, S.; Young, A.J. Predicting the black hole mass and correlations in X-ray reverberating AGNs using neural networks. Mon. Not. R. Astron. Soc. 2022, 513, 648–660.
21. Malladi, S.; Lyu, K.; Panigrahi, A.; Arora, S. On the SDEs and Scaling Rules for Adaptive Gradient Algorithms. arXiv 2022, arXiv:2205.10287.
22. Wang, Z.; Zhou, J.; Peng, K. The Potential of Multi-Task Learning in CFDST Design: Load-Bearing Capacity Design with Three MTL Models. Materials 2024, 17, 1994.
23. Moradmand, H.; Aghamiri, S.M.R.; Ghaderi, R. Impact of image preprocessing methods on reproducibility of radiomic features in multimodal magnetic resonance imaging in glioblastoma. J. Appl. Clin. Med. Phys. 2020, 21, 179–190.
24. Li, Q.; Zhang, Z.; Ma, Z. Raman spectral pattern recognition of breast cancer: A machine learning strategy based on feature fusion and adaptive hyperparameter optimization. Heliyon 2023, 9, e18148.
25. Alsaify, B.A.; Almazari, M.M.; Alazrai, R.; Alouneh, S.; Daoud, M.I. A CSI-Based Multi-Environment Human Activity Recognition Framework. Appl. Sci. 2022, 12, 930.
26. Wang, X.; Lu, R.; Bi, H.; Li, Y. An Infrared Small Target Detection Method Based on Attention Mechanism. Sensors 2023, 23, 8608.
27. Muncan, J.; Tsenkova, R. Aquaphotomics—Exploring Water Molecular Systems in Nature. Molecules 2023, 28, 2630.
28. Tang, T.; Luo, Q.; Yang, L.; Gao, C.; Ling, C.; Wu, W. Research Review on Quality Detection of Fresh Tea Leaves Based on Spectral Technology. Foods 2024, 13, 25.
29. Leitherer, C. Massive Star Formation in the Ultraviolet Observed with the Hubble Space Telescope. Galaxies 2020, 8, 13.
30. Livieris, I.E. Improving the Classification Efficiency of an ANN Utilizing a New Training Methodology. Informatics 2019, 6, 1.
31. Yang, B.; Ma, J.; Yao, X.; Cao, W.; Zhu, Y. Estimation of Leaf Nitrogen Content in Wheat Based on Fusion of Spectral Features and Deep Features from Near Infrared Hyperspectral Imagery. Sensors 2021, 21, 613.

Figure 1. Near-infrared original spectrum and SPAD value statistics of grey jujube leaf: (a) visualization of the original spectrum of grey jujube leaf; (b) visualization of the SPAD value statistics of grey jujube leaf. Lines of different colors correspond to distinct sample datasets.

Figure 2. Visualization of spectral preprocessing: (a) smooth algorithm; (b) SNV algorithm; (c) first derivative algorithm; (d) second derivative algorithm; (e) smooth + first derivative algorithm; (f) smooth + second derivative algorithm; (g) SNV + first derivative algorithm; (h) SNV + second derivative algorithm. Lines of different colors correspond to distinct sample datasets.

Figure 3. Correlation between the preprocessed spectra and leaf SPAD content: (a) smooth; (b) standard normal variate; (c) first derivative; (d) second derivative; (e) smooth + first derivative; (f) smooth + second derivative; (g) standard normal variate + first derivative; (h) standard normal variate + second derivative.

Figure 4. Feature extraction sites of various preprocessed spectra based on competitive adaptive reweighted sampling: (a) smooth preprocessing algorithm; (b) standard normal variate preprocessing algorithm; (c) first derivative preprocessing algorithm; (d) second derivative preprocessing algorithm; (e) smooth + first derivative preprocessing algorithm; (f) smooth + second derivative preprocessing algorithm; (g) standard normal variate + first derivative preprocessing algorithm; (h) standard normal variate + second derivative preprocessing algorithm.

Figure 5. Visualization of the model evaluation metrics: (a) R² of the training set and the validation set; (b) root mean square error (RMSE) of the training set and the validation set; (c) RPD of the training set and the validation set. BP denotes the model developed using the BP neural network method, while RBF represents the model constructed with the RBF neural network method.

Figure 6. Visualization of the model evaluation metrics under different training functions: (a) R² of models under different training functions; (b) root mean square error (RMSE) of models under different training functions; (c) RPD of models under different training functions.

Figure 7. Linear fitting and relative error analysis of the smooth + first derivative-competitive adaptive reweighted sampling-BP model prediction data: (a) linear fitting plot for the training set; (b) linear fitting plot for the validation set; (c) relative error analysis for the training set; (d) relative error analysis for the validation set.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

