Estimation of Cadmium Content in Lactuca sativa L. Leaves Using Visible–Near-Infrared Spectroscopy Technology

Abstract: To monitor cadmium contamination in lettuce quickly, non-invasively, and accurately, and to understand the growth status of lettuce under cadmium pollution, lettuce was used as the test material, and the visible–near-infrared reflectance spectra and leaf cadmium content were measured and analyzed under different concentrations of cadmium stress. A model for estimating lettuce leaf cadmium content was then established. First, the original spectra were preprocessed using Savitzky–Golay (SG) smoothing alone and SG combined with multiplicative scatter correction (MSC), standard normal variate transformation (SNV), mean normalization (MN), the first derivative (FD), the second derivative (SD), baseline offset (B), and de-trending (D). Then, principal component analysis (PCA) was applied to reduce the dimensionality of the data. Finally, the reduced dataset was divided into training and testing sets in a 2:1 ratio, and separate models for estimating lettuce leaf cadmium content were built using partial least squares regression (PLSR), the backpropagation neural network (BP-NN), and support vector regression (SVR), each in combination with the preprocessing methods. The results showed that the cadmium content accumulated in lettuce leaves increased with the soil cadmium concentration. In the visible range, the spectral reflectance of lettuce leaves increased with the cadmium concentration. In the near-infrared range, the spectral reflectance of lettuce leaves under 10 mg/kg and 20 mg/kg cadmium stress was lower than that of the control group. The PLSR models established using the SG + MSC and SG + SNV preprocessing methods exhibited the strongest estimation capability for lettuce leaf cadmium content, with $R_p^2$ and $\mathrm{RMSE}_p$ values of 0.92 and 1.53 mg/kg, respectively, for the testing dataset. This study demonstrated that visible–near-infrared spectroscopy has great potential for monitoring cadmium contamination in lettuce.


Introduction
Cadmium (Cd) contamination is a global environmental issue, with many countries and regions facing varying degrees of cadmium pollution. In the 1930s, Japan experienced "Itai-itai disease", which was caused by Cd contamination. Surveys conducted in the United States and the European Community have shown that 82–94% of the cadmium released into the environment ends up in the soil through various pathways, with agricultural soils making up a significant portion of this [1]. In India, the average levels of Cd in different types of soil have been found to exceed the prescribed limits [2]. Roughly 20% of China's arable land is affected by heavy metal pollution [3], with cadmium contamination exceeding standards in 7% of cases, making it the primary pollutant in contaminated soil areas [4]. Cadmium is strongly biotoxic and difficult to degrade. Once absorbed and enriched by plants, it not only affects their growth and development but also threatens human health through the food chain [5][6][7]. Thus, swiftly and accurately monitoring or identifying the extent of heavy metal stress in plants is important for ensuring food safety. Traditional chemical detection methods, while accurate, are complex and destructive, which makes rapid, large-scale monitoring difficult and demanding in terms of manpower and resources [8,9]. By contrast, spectroscopy offers high resolution and rich information content [10], making the fast and non-destructive detection of heavy metal pollution in plants possible.
Cadmium stress can inhibit plant root growth, disrupt plant water balance and nutrient uptake, reduce photosynthetic efficiency, and lead to oxidative damage of leaves and chlorophyll reduction [11][12][13]. These symptoms can affect the absorption or reflection of light by plants, resulting in changes in the reflectance spectrum of plant leaves [14]. Therefore, reflectance spectroscopy can be used to identify cadmium pollution in plants.
Extensive research has used the variation characteristics of plant reflectance spectra to monitor heavy metal pollution in plants. Gu et al. [15] used correlation analysis and stepwise regression to analyze the relationships between leaf cadmium content and the original leaf spectrum, the first-order derivative spectrum, and spectral parameters. The results showed that cadmium pollution has a significant impact on the spectral reflectance of small bok choy leaves in the visible, near-infrared, and far-infrared regions. Shi et al. [16] investigated the spectral response of citrus leaves under soil cadmium stress and developed a regression prediction model based on spectral indices. They concluded that visible-near-infrared spectroscopy has great potential for monitoring heavy metal pollution in citrus trees. Abdel-Rahman et al. [17] used hyperspectral data and partial least squares (PLS) regression to establish a quantitative monitoring model for cadmium content in Swiss chard leaves. Paresh H et al. [18] investigated the reflectance spectra of barley leaves under cadmium stress and found significant differences in spectral reflectance under different levels of cadmium stress. Spectroscopic techniques can accurately monitor cadmium stress in crops. Li et al. [19] studied the hyperspectral reflectance of lettuce leaves under cadmium stress and found that a PLS regression model based on the first-order differential spectrum (FDR) could accurately predict the cadmium mass ratio in lettuce leaves.
Therefore, in this study, lettuce was used as the test material, and visible-near-infrared spectroscopy was employed as the research approach. Using a pot cultivation method with externally added cadmium, the impact of cadmium stress on the visible-near-infrared reflectance spectra of lettuce leaves was investigated. Visible-near-infrared spectroscopy not only enables rapid, accurate, non-destructive testing but also offers advantages such as easy operation, environmental friendliness, and versatility. The aim was to establish a monitoring model for cadmium stress in lettuce, providing a theoretical basis and reference for the rapid, non-destructive, and accurate monitoring of cadmium stress in lettuce.

Materials and Experimental Design
The experiment was conducted in July 2023 on the campus of Jilin Agricultural University in Changchun, Jilin Province (125°42′ E, 43°82′ N), using a pot cultivation method with externally added cadmium. The test material was Lactuca sativa L., purchased from Kuishou Agriculture and Technology Company (Weifang, China). The test soil was an all-purpose seedling substrate consisting of pure natural black peat, purchased from Changchun Saishi Agricultural Development Co., Ltd. (Changchun, China). It had an organic matter content of 12.59 g/kg, a total nitrogen content of 0.727 g/kg, an available phosphorus content of 0.007 g/kg, and an available potassium content of 0.15 g/kg. The soil was sieved to remove impurities, ground into fine particles, and then left to rest in a dry and ventilated place for 3 days. Distilled water was used as the solvent, and cadmium nitrate served as the exogenous cadmium source. A 200 mL solution containing the required concentration of cadmium was sprayed layer by layer onto the respective experimental soil. Based on the range of 0.2 to 20.0 mg/kg for the environmental standard values of cadmium in agricultural soils, obtained from surveys across 16 countries and 2 international organizations, the soil cadmium content was set at 0 (CK), 1, 5, 10, and 20 mg/kg, following the principle of incremental increase [20]. After thorough mixing, the soil was left to age for 10 days before being loaded into pots. The dimensions of each pot were 480 mm × 230 mm × 160 mm, and 1.5 kg of soil was loaded into each pot [21]. Each treatment level had 3 replicates. When the lettuce seedlings reached the two-leaf, one-heart stage, seedlings with consistent growth and good development were selected and transplanted into the pots, with 3 seedlings per pot.
To ensure the healthy growth of the lettuce, an adequate water supply was maintained throughout the entire experiment. Watering was conducted daily to keep the soil moisture between 60% and 70%. Soil moisture was measured using the Y 315 soil temperature, moisture, and pH triple-function meter (YANI, Jinan, China). Furthermore, the pots were repositioned every two days during the experiment to ensure even exposure to light. The experiment was conducted under natural light conditions, with daytime temperatures of 25 ± 2 °C and nighttime temperatures of 18 ± 2 °C. The relative humidity was maintained between 60% and 70%. The environmental temperature and humidity were measured using a DL 9010 hygrometer (Deli, Ningbo, China).

Spectral Data Acquisition
The visible-near-infrared reflectance spectra of the lettuce leaves were collected for the five treatments after 45 days of cadmium stress. The measurements were obtained using the AvaSpec-ULS2048 multi-purpose fiber optic spectrometer manufactured by Avantes (Apeldoorn, The Netherlands), which has a wavelength range of 200-1100 nm and a spectral resolution of 0.05-20 nm. The light source was the AvaLight-DHc full-spectrum compact light source, also produced by Avantes, comprising a deuterium lamp with a wavelength range of 200-400 nm and a tungsten halogen lamp with a wavelength range of 400-2500 nm. Each measured spectrum contained 1610 data points.
The measurements were taken on a sunny day with no wind and few clouds, between 10:00 and 14:00. During data collection, the reflective probe's fiber optic cables were connected to the spectrometer and light source. Then, the reflective probe was fixed using a probe holder so that it formed a 45° angle with the leaf surface. Finally, the deuterium-halogen lamp was selected as the light source. After preheating for 8 min, a white calibration was performed, followed by the measurements. Throughout the entire experiment, a white calibration was conducted every 30 min. The top 2 or 3 leaves of each lettuce sample were selected for measurement, avoiding the main leaf veins. Three spectral readings were taken from each lettuce leaf, resulting in a total of 135 spectral curves. The spectral information was collected as shown in Figure 1.
Spectral data can be affected by high-frequency random noise, baseline drift, and light scattering. To mitigate these interferences, preprocessing of the raw spectra is essential [22]. In this research, the original spectra were preprocessed using Savitzky-Golay (SG) smoothing, both alone and combined with multiplicative scatter correction (MSC), standard normal variate transformation (SNV), mean normalization (MN), the first derivative (FD), the second derivative (SD), baseline offset (B), and de-trending (D).
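The preprocessing combinations listed above can be sketched in Python as follows. This is only an illustrative sketch: the study itself used The Unscrambler X, and the SG window length, polynomial order, the definition of the baseline offset (subtracting each spectrum's minimum), and the synthetic spectra are all assumptions introduced here rather than settings reported in the paper.

```python
import numpy as np
from scipy.signal import savgol_filter, detrend


def sg_smooth(spectra, window=11, polyorder=2, deriv=0):
    """Savitzky-Golay smoothing (deriv=0) or SG derivative (deriv=1 or 2) along the band axis."""
    return savgol_filter(spectra, window_length=window, polyorder=polyorder, deriv=deriv, axis=1)


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum by its own mean and std."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: the mean spectrum)."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, deg=1)  # row ~ intercept + slope * ref
        corrected[i] = (row - intercept) / slope
    return corrected


# Synthetic stand-in for leaf spectra (6 samples x 781 bands, 400-850 nm); NOT measured data.
rng = np.random.default_rng(0)
wavelengths = np.linspace(400, 850, 781)
base = 0.3 + 0.2 * np.tanh((wavelengths - 715) / 20)   # crude red-edge shape
gains = rng.uniform(0.8, 1.2, size=(6, 1))             # multiplicative scatter
offsets = rng.uniform(-0.05, 0.05, size=(6, 1))        # additive baseline shift
raw = gains * base + offsets + rng.normal(0, 0.005, size=(6, 781))

smoothed = sg_smooth(raw)                                      # SG
sg_msc = msc(smoothed)                                         # SG + MSC
sg_snv = snv(smoothed)                                         # SG + SNV
sg_mn = smoothed / smoothed.mean(axis=1, keepdims=True)        # SG + MN
sg_fd = sg_smooth(raw, deriv=1)                                # SG + FD (per band step)
sg_sd = sg_smooth(raw, deriv=2)                                # SG + SD (per band step)
sg_b = smoothed - smoothed.min(axis=1, keepdims=True)          # SG + B (assumed definition)
sg_d = detrend(smoothed, axis=1, type="linear")                # SG + D (linear trend removed)
```

Here the derivatives are computed in a single SG pass rather than as smoothing followed by a separate differencing step; the two approaches are closely related but not identical, so results may differ slightly from those of other software.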

Leaf Cadmium Content Determination
The cadmium content in the leaves was determined using a digestion method [23]. First, the collected leaves were washed, dried, and ashed. Then, 0.10-0.70 g of the sample was placed in a dry polyvinyl chloride microwave tube, and 10 mL of high-purity concentrated nitric acid was added. The mixture was oscillated for 1 min. Subsequently, the polyvinyl chloride digestion tube was placed in a reaction furnace at 120 °C for pre-digestion for 20 min. Pre-digestion was considered complete when a small amount of yellow smoke emerged from the digestion vessel, after which the tube was cooled to room temperature. Next, 5 mL of high-purity concentrated nitric acid was added to the pre-digested microwave tube, which was then placed in a Touchwin 2.0 microwave digestion instrument for microwave digestion. Finally, the digestion solution was transferred to a centrifuge tube and brought to a total volume of 50 mL with three additions of ultrapure water. The mixture was oscillated for 1 min and filtered through a 0.22 µm filter membrane to remove impurities. The cadmium content in the leaves was then quantified via inductively coupled plasma-mass spectrometry (ICP-MS) (300D, PerkinElmer, Waltham, MA, USA).

Data Processing and Analysis
The spectral data were collected using AvaSoft 8 (Avantes, Apeldoorn, The Netherlands) spectral acquisition software, with the basic data input and analysis conducted in Excel 2010 (Microsoft, Redmond, WA, USA). The preprocessing of the collected spectral data was performed using The Unscrambler X 10.4 (CAMO, Oslo, Norway), while the model training and validation were carried out in MATLAB R2023b (MathWorks, Natick, MA, USA). The graphs were produced using Origin 2021 (OriginLab, Northampton, MA, USA).
A significance test was used to determine whether the mean cadmium content of lettuce leaves differed among the treatment groups. Analysis of variance (ANOVA) was conducted, followed by Duncan's multiple range test and the least significant difference (LSD) test for post hoc comparisons. Significance levels of 5% and 1% were used as thresholds: p < 0.05 indicates a significant difference, and p < 0.01 indicates a highly significant difference.
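The first step of this significance testing, a one-way ANOVA across the five treatment groups, might look like the sketch below. The group means, the spread, and the nine values per group are hypothetical numbers chosen only to make the example run; they are not the measured cadmium contents. Duncan's multiple range test is not available in SciPy and is therefore not shown.

```python
import numpy as np
from scipy.stats import f_oneway


def significance_label(p_value):
    """Map a p-value onto the 5% and 1% thresholds described in the text."""
    if p_value < 0.01:
        return "highly significant"
    if p_value < 0.05:
        return "significant"
    return "not significant"


rng = np.random.default_rng(1)
soil_levels = [0, 1, 5, 10, 20]                 # soil Cd treatments, mg/kg (from the text)
assumed_means = [0.5, 2.0, 6.0, 12.0, 20.0]     # hypothetical leaf Cd means, mg/kg
groups = [rng.normal(loc=m, scale=1.0, size=9) for m in assumed_means]

# One-way ANOVA: do the treatment means differ more than within-group scatter explains?
f_stat, p_value = f_oneway(*groups)
print(f"F = {f_stat:.1f}, p = {p_value:.2e}: {significance_label(p_value)}")
```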
Partial least squares regression (PLSR) integrates the characteristics of multiple linear regression analysis, principal component analysis, and canonical correlation analysis. It mitigates multicollinearity among the independent variables, thereby enhancing the stability and predictive power of the constructed model [24]. This makes it particularly suitable for spectral analysis, which involves a large number of independent variables, and it offers an effective approach to regression problems with multiple predictors and multiple responses [25].

The BP neural network (BPNN) is a multilayer neural network trained by error backpropagation, comprising an input layer, hidden layers, and an output layer. These layers are interconnected through activation functions, weights, and thresholds [26]. The BPNN has strong capabilities for nonlinear representation, along with good adaptive capacity and generalization performance, and it is commonly used in research on pattern recognition and classification problems [27,28].
Support vector regression (SVR) is an effective predictive algorithm for regression problems. It operates by identifying a hyperplane in the feature space that maps input variables to output variables. To handle nonlinear problems, support vector machines use kernel functions to perform a nonlinear mapping into a high-dimensional space [29,30]. SVR balances training error against generalization capability and offers distinct advantages for problems characterized by small samples, nonlinearity, high dimensionality, and local minima [31].
Therefore, this study employed three algorithms, PLSR, BPNN, and SVR, to construct predictive models of lettuce Cd contamination. The accuracy of the models was evaluated using the coefficient of determination ($R^2$) and the root mean square error (RMSE). An $R^2$ closer to 1 and an RMSE closer to 0 indicate a more accurate model.
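For reference, the two evaluation metrics can be computed as in the sketch below. It assumes the common definition of $R^2$ as one minus the ratio of the residual to the total sum of squares; the paper does not state its formula, and some software reports the squared correlation coefficient instead. The measured and predicted values are made-up numbers used only to show the arithmetic.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the measurements (here mg/kg)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical measured vs. predicted leaf Cd contents (mg/kg).
measured = [0.0, 2.0, 4.0, 6.0]
predicted = [0.5, 1.5, 4.5, 5.5]

print(r_squared(measured, predicted))  # SS_res = 1, SS_tot = 20, so R^2 = 0.95
print(rmse(measured, predicted))       # every error is 0.5, so RMSE = 0.5
```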

Effects of Cadmium Stress on Cadmium Content in Lettuce Leaves
As shown in Figure 2, the cadmium content in lettuce leaves increases with the cadmium concentration in the soil. This indicates that lettuce is sensitive to cadmium stress and has a pronounced capability for cadmium accumulation. Furthermore, the accumulation effect intensifies as the cadmium concentration increases. This can be attributed to the substantial presence of organic acids, cellulose, pectin, and similar substances in the plant cell wall, which are able to chelate cadmium. With escalating cadmium concentrations, the cadmium content within the cellular organelles increases correspondingly, leading to its accumulation [32,33].

Analysis of Visible-Near-Infrared Reflectance Spectroscopy
The wavelength range of the spectra collected in the experiment was 177.480 nm to 1100.316 nm. Plants are sensitive to heavy metal stress in the visible-near-infrared wavelength range, and within the near-infrared range, the spectral response of plants to heavy metal stress is mainly observed in the red edge (670 nm to 780 nm) and red valley bands (600 nm to 720 nm) [19,34]. To improve model accuracy and reduce noise interference, and because the spectra showed pronounced jitter beyond 850 nm, this study selected the spectral bands from 400 nm to 850 nm for analysis. The spectral curves were averaged for each treatment, giving the visible-near-infrared reflectance spectra shown in Figure 3.
The visible-near-infrared reflectance spectra of lettuce leaves show distinct absorption valleys, known as the "blue valley" and "red valley", around 430 nm and 670 nm, respectively. A reflection peak, referred to as the "green peak", appears around 550 nm. Between 680 nm and 750 nm, there is a sharp increase in leaf reflectance, the typical "red edge effect". Additionally, a small absorption valley is observed around 760 nm, possibly caused by a narrow absorption band near 760 nm attributed to water vapor [35,36].
In the visible range, the leaf spectral reflectance increases with the cadmium concentration. However, under cadmium stress at 1 mg/kg and 5 mg/kg, the leaf spectral reflectance differs only slightly from that of the CK group. This could be because cadmium stress at 1 mg/kg and 5 mg/kg promotes plant growth and development, leading to an increase in chlorophyll content and therefore to less noticeable changes in leaf spectral reflectance relative to the CK group [37]. Under cadmium stress at 10 mg/kg and 20 mg/kg, the leaf spectral reflectance is significantly higher than that of the CK group. This increase could be attributed to the inhibitory effect of cadmium stress at these concentrations on the chlorophyll content of lettuce leaves: the chlorophyll content decreases, the leaves yellow, and the leaf spectral reflectance changes [38].
In the near-infrared range, under cadmium stress of 10 mg/kg and 20 mg/kg, the leaf spectral reflectance is lower than that of the CK group. This could be because cadmium stress at these concentrations damages the internal cellular structures of the leaves, which reduces their ability to reflect near-infrared light [39].

Spectral Preprocessing
As shown in Figure 4, the SG treatment effectively reduces noise and interference signals in the data. The SNV treatment standardizes the spectral curves through normalization. The MSC treatment alleviates the impact of scattering on the spectral curves. The MN treatment helps achieve consistency in spectral intensity. The FD and SD treatments resolve overlapping peaks in the spectral curves, enhancing the discrimination between spectra. The B treatment increases the signal resolution, making features and peaks more prominent. The D treatment removes trend signals, thereby improving the accuracy of data analysis and reinforcing signal characteristics. These methods not only enhance the signal-to-noise ratio but also preserve the useful spectral information [40].

Dimensionality Reduction Processing
Because each original spectrum contains 781 data points, forming a high-dimensional matrix of 781 × 135, and because this information is partly correlated, which can lead to information overlap, dimensionality reduction is necessary. In this study, principal component analysis (PCA) was used to map the spectral data from a high-dimensional space to a low-dimensional one. This not only preserves the most important features of the spectral data but also removes redundant information and reduces the impact of collinearity, thereby speeding up subsequent data analysis and modeling. The number of principal components for the original spectra and for each set of preprocessed spectra was determined according to two criteria: a cumulative contribution rate ≥85% and a variance percentage ≥1% for each principal component [29] (Table 1).
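One possible reading of these two selection criteria is sketched below: every leading component that explains at least 1% of the variance is kept, and more components are added if the cumulative contribution is still below 85%. How the two criteria were actually combined is not stated, so this rule is an assumption, as are the function name `pca_reduce` and the synthetic data. The sketch stores samples in rows (135 × 781), the transpose of the 781 × 135 layout mentioned in the text.

```python
import numpy as np


def pca_reduce(spectra, cum_threshold=0.85, min_ratio=0.01):
    """Project spectra (samples x bands) onto the principal components that are retained."""
    centred = spectra - spectra.mean(axis=0)
    # SVD of the centred data; squared singular values are proportional to PC variances.
    _, singular_values, components = np.linalg.svd(centred, full_matrices=False)
    ratio = singular_values**2 / np.sum(singular_values**2)
    n_cum = int(np.searchsorted(np.cumsum(ratio), cum_threshold)) + 1  # reach >= 85% in total
    n_min = int(np.sum(ratio >= min_ratio))                            # PCs with >= 1% each
    n_keep = max(n_cum, n_min)
    scores = centred @ components[:n_keep].T
    return scores, ratio[:n_keep]


# Synthetic stand-in: 135 spectra x 781 bands driven by three latent factors; NOT real data.
rng = np.random.default_rng(0)
latent = rng.normal(size=(135, 3)) * np.array([3.0, 2.0, 1.0])
loadings = rng.normal(size=(3, 781))
spectra = latent @ loadings + 0.05 * rng.normal(size=(135, 781))

scores, kept_ratio = pca_reduce(spectra)
print(scores.shape, np.round(kept_ratio, 3), round(float(kept_ratio.sum()), 3))
```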

Model Establishment
Estimation models for cadmium content in lettuce leaves were built using partial least squares regression (PLSR), the backpropagation neural network (BP-NN), and support vector regression (SVR). In the PLSR model, the preprocessed spectral data were used as the input variables and the leaf cadmium content as the output variable. In the BP-NN and SVR models, the principal component scores obtained after dimensionality reduction were used as the input variables, and the leaf cadmium content was the output variable. The training set comprised 90 samples, and the test set contained 45 samples. Samples were ordered by cadmium content, and one of every three was randomly selected for the test set so that both sets covered the full range of cadmium content.
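A minimal sketch of such a 2:1 split is given below, under the assumption that "one out of every three, based on the range of cadmium content" means ranking the samples by cadmium content and drawing one test sample at random from each consecutive group of three. The function name `split_two_to_one` and the synthetic cadmium values are hypothetical.

```python
import numpy as np


def split_two_to_one(y, seed=0):
    """Rank samples by y, then draw one test sample at random from each group of three."""
    rng = np.random.default_rng(seed)
    order = np.argsort(y, kind="stable")
    test_idx = []
    for start in range(0, len(order), 3):
        group = order[start:start + 3]
        test_idx.append(int(rng.choice(group)))
    test_idx = np.array(sorted(test_idx))
    train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
    return train_idx, test_idx


# Hypothetical leaf Cd contents (mg/kg) for 135 spectra; NOT measured values.
rng = np.random.default_rng(42)
cd_content = rng.uniform(0.0, 25.0, size=135)

train_idx, test_idx = split_two_to_one(cd_content)
print(len(train_idx), len(test_idx))  # 90 training and 45 test samples
```

Because every test sample comes from a different rank group, the test set spans low, medium, and high cadmium contents in roughly the same proportions as the training set.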

Model Establishment Based on Partial Least Squares Regression (PLSR)
PLSR modeling was performed on the original spectra and on each set of preprocessed spectra. The training set accounted for two-thirds of the dataset, and the test set accounted for one-third.
As shown in Table 2, the PLSR models established using the different processing methods all exhibit good predictive performance. The determination coefficients ($R^2$) for the training set are all above 0.95, and the root mean square errors (RMSE) range from 0.33 mg/kg to 0.81 mg/kg. For the test set, the prediction determination coefficients ($R_p^2$) are all above 0.80, and the root mean square errors of prediction ($\mathrm{RMSE}_p$) range from 1.53 mg/kg to 2.08 mg/kg. The PLSR models established using the SG + MSC and SG + SNV preprocessing methods perform best. Their $R^2$ values for the training set are both 0.98, and their $R_p^2$ values for the test set are both 0.92. Their $\mathrm{RMSE}_p$ is the lowest among all the models, at 1.53 mg/kg, which is 0.46 mg/kg lower than that of the model established using the original spectra. These results indicate that combining either SG + MSC or SG + SNV preprocessing with PLSR modeling is effective for accurately estimating cadmium content in lettuce leaves.

Model Establishment Based on the Backpropagation Neural Network (BP-NN)
The BP-NN algorithm is a multilayer feedforward perceptron neural network with strong self-learning and nonlinear mapping capabilities. For the BP-NN model, the training set accounts for two-thirds of the dataset, and the testing set accounts for one-third. The learning rate is set to 0.1, and the number of iterations is set to 1000.
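The BP-NN configuration described above (learning rate 0.1, up to 1000 iterations) could be approximated with scikit-learn as sketched below. Only the learning rate and iteration count come from the text. The single hidden layer of 8 tanh units, plain gradient descent without momentum, the standardization of inputs and target, and the synthetic principal component scores are all assumptions, since the network architecture is not reported.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Hypothetical PCA scores (135 samples x 4 components) and leaf Cd contents; NOT real data.
rng = np.random.default_rng(0)
scores = rng.normal(size=(135, 4))
y = 8.0 + scores @ np.array([4.0, 2.0, 1.0, 0.5]) + 0.3 * rng.normal(size=135)

is_test = np.arange(135) % 3 == 0          # every third sample: 45 test, 90 training
x_scaler = StandardScaler().fit(scores[~is_test])
y_mean, y_std = y[~is_test].mean(), y[~is_test].std()

net = MLPRegressor(
    hidden_layer_sizes=(8,),   # assumed architecture
    activation="tanh",
    solver="sgd",              # gradient descent with error backpropagation
    momentum=0.0,
    learning_rate_init=0.1,    # learning rate from the text
    max_iter=1000,             # iteration limit from the text
    random_state=0,
)
# Train on standardized inputs and a standardized target for stable gradient steps.
net.fit(x_scaler.transform(scores[~is_test]), (y[~is_test] - y_mean) / y_std)

# Predict on the test set and map predictions back to mg/kg.
pred_test = net.predict(x_scaler.transform(scores[is_test])) * y_std + y_mean
r2_test = r2_score(y[is_test], pred_test)
rmse_test = float(np.sqrt(mean_squared_error(y[is_test], pred_test)))
print(f"test: R2 = {r2_test:.3f}, RMSE = {rmse_test:.3f}")
```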
The modeling results are shown in Table 3. For the BP-NN models, the R² values of the training set are all above 0.80, with the RMSE ranging from 1.50 mg/kg to 3.01 mg/kg. The Rp² values of the testing set are all above 0.65, and the RMSEp values range from 1.74 mg/kg to 3.01 mg/kg. Among them, the BP-NN model established using SG + SD preprocessing performs best, with an R² of 0.86 and an RMSE of 1.93 mg/kg on the training set. On the testing set it achieves an Rp² of 0.88 and an RMSEp of 1.87 mg/kg, which is 0.16 mg/kg lower than that of the model established using the original spectra. This indicates that the model constructed using SG + SD preprocessing combined with the BP-NN method can accurately estimate the cadmium content in lettuce leaves.

Model Establishment Based on Support Vector Regression (SVR)
For the SVR model to perform optimally, the best penalty factor (c), kernel function parameter (g), and kernel function must be determined. A grid search explores a specified parameter space exhaustively, evaluating every combination of parameter values on the grid so that no candidate combination is overlooked [41]. Therefore, in this study, a grid search was employed to determine the optimal penalty factor (c) and kernel function parameter (g) for SVR, as shown in Table 4.
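The exhaustive search over (c, g) can be sketched as follows. This is a schematic illustration only: the power-of-two grid is an assumption, and `placeholder_error` is a hypothetical stand-in for the cross-validated error of an RBF-kernel SVR, which is not reproduced here. In practice the scoring function would train and evaluate an SVR for each pair.

```python
import math
from itertools import product

def grid_search(score_fn, c_values, g_values):
    """Evaluate score_fn on every (c, g) pair and return (best_score, c, g).

    Lower scores are better (e.g. a cross-validated RMSE).
    """
    best = None
    for c, g in product(c_values, g_values):
        score = score_fn(c, g)
        if best is None or score < best[0]:
            best = (score, c, g)
    return best

# Hypothetical grid: powers of two from 2^-4 to 2^4 for both parameters.
c_grid = [2.0 ** k for k in range(-4, 5)]
g_grid = [2.0 ** k for k in range(-4, 5)]

def placeholder_error(c, g):
    # Stand-in for the cross-validated error of an SVR trained with (c, g);
    # by construction it is smallest at c = 4, g = 0.25.
    return (math.log2(c) - 2.0) ** 2 + (math.log2(g) + 2.0) ** 2

best_score, best_c, best_g = grid_search(placeholder_error, c_grid, g_grid)
print(best_c, best_g)  # 4.0 0.25
```

The cost grows with the product of the grid sizes (here 9 x 9 = 81 evaluations), which is the price of not skipping any combination.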
An exhaustive (brute-force) search over the linear, polynomial, radial basis function (RBF), and sigmoid kernels identified the RBF kernel as the most effective kernel function for the SVR model. In the SVR model, the training set accounts for two-thirds of the dataset, while the test set accounts for one-third. The modeling results are shown in Table 5. For all nine treatment methods, the R² of the training set is above 0.90, and the RMSE ranges from 0.27 mg/kg to 1.68 mg/kg. The Rp² of the test set is above 0.70, and the RMSEp ranges from 1.80 mg/kg to 2.69 mg/kg. Among them, the SVR model established with the SG + D method performs best. The R² of the training set is 0.98, with an RMSE of 0.27 mg/kg. The Rp² of the test set is 0.84, which is 0.11 higher than that of the model built with the original spectra. The RMSEp is 1.80 mg/kg, which is 0.89 mg/kg lower than that of the model built with the original spectra. These results indicate that the model established with SG + D preprocessing and SVR predicts the cadmium content in lettuce leaves effectively.
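All three model families are judged by the same two statistics. A minimal sketch of how they can be computed is given below, assuming the conventional definitions (R² as one minus the ratio of residual to total sum of squares, RMSE as the square root of the mean squared residual). The measured and predicted values are hypothetical and are not data from this study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error, in the same units as y (here mg/kg)."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical measured and predicted cadmium contents (mg/kg).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]

print(rmse(measured, predicted))  # 1.0
print(round(r_squared(measured, predicted), 6))  # 0.2
```

Applied to the training set these give R² and RMSE; applied to the held-out test set they give Rp² and RMSEp.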

Model Comparison
The best combination of preprocessing method and model from each approach was selected for comparison, as shown in Table 6. Among the four models, the BP-NN (SG + SD) model has the lowest training-set R² (0.86) and the highest test-set RMSEp (1.87 mg/kg), although its test-set Rp² (0.88) exceeds that of the SVR (SG + D) model (0.84). The SVR (SG + D) model performs best on the training set, but its performance on the test set is relatively poor. The PLSR (SG + MSC) and PLSR (SG + SNV) models demonstrate the highest accuracy: their Rp² on the test set is the highest among the four models, indicating a good fit, and their RMSEp is the lowest, indicating a small prediction error. Therefore, it can be concluded that the PLSR model with SG + MSC or SG + SNV preprocessing is the most accurate in estimating lettuce leaf cadmium content.

This may be because PLSR combines the advantages of canonical correlation analysis, principal component analysis, and multiple linear regression analysis, forming a comprehensive multivariate statistical analysis technique. It efficiently harnesses information from all predictor variables to build robust predictive models [42]. In addition, PLSR can effectively suppress or circumvent multicollinearity among the independent variables by extracting latent variables that maximize the covariance between the independent and dependent variables. This feature makes PLSR especially applicable to spectral data, in which the number of independent variables greatly exceeds the number of dependent variables and the independent variables are noticeably collinear [43]. Many researchers have built qualitative or quantitative diagnostic models for crop nutrient parameters using PLSR, and their findings are consistent with the results obtained in this study [16,19,42].

Visible-near-infrared spectroscopy technology shows promising prospects for monitoring heavy metals. Villatoro-Pulido et al. [44] suggest that NIRS could act as a rapid screening method for assessing the total mineral, iron, sodium, potassium, and zinc contents in rocket. Font et al. [45,46] suggest that visible-near-infrared spectroscopy has considerable potential for determining total arsenic (As) in prostrate amaranth (Amaranthus blitoides S. Watson) and for screening the inorganic arsenic (i-As) content in commercial rice. Together with these reports, this study illustrates that visible-near-infrared spectroscopy technology, when integrated with the PLSR method, is capable of estimating the cadmium content in lettuce leaves.

Conclusions
In this study, lettuce was selected as the research subject, and a pot experiment with exogenous cadmium treatment was used to investigate the cadmium content in lettuce leaves and to analyze their visible-near-infrared reflectance spectra under cadmium stress. A cadmium content estimation model for lettuce leaves was then established. The following conclusions can be drawn:

• Lettuce leaves can effectively accumulate cadmium, and the enrichment effect becomes more pronounced with an increasing cadmium concentration.

• In the visible light range, the spectral reflectance of lettuce leaves increases with the increase in cadmium concentration. In the near-infrared range, under 10 mg/kg and 20 mg/kg cadmium stress, the spectral reflectance of lettuce leaves decreases compared to the control (CK) group.

• Among the PLSR, BP-NN, and SVR models established for estimating lettuce leaf cadmium content, the PLSR model with preprocessing using SG + SNV and SG + MSC demonstrates the highest accuracy. The R² values for the training set are both 0.98, while the Rp² and RMSEp values for the testing set are 0.92 and 1.53 mg/kg, respectively.
Therefore, the application of visible-near-infrared spectroscopy technology can offer a theoretical foundation and guidance for ensuring the safe production and quality management of lettuce.

Figure 2. Effects of cadmium stress on cadmium content in lettuce leaves. Note: Different lowercase letters indicate significant differences at the p < 0.05 level, with an F value of 226.77.


Figure 3. Visible-near-infrared reflectance spectral characteristics in lettuce leaves under cadmium stress.

Figure 4. The spectral curves after preprocessing: (a) the original spectral curves; (b) the spectra preprocessed with SG; (c) the spectra preprocessed with SG + MSC; (d) the spectra preprocessed with SG + SNV; (e) the spectra preprocessed with SG + MN; (f) the spectra preprocessed with SG + FD; (g) the spectra preprocessed with SG + SD; (h) the spectra preprocessed with SG + B; and (i) the spectra preprocessed with SG + D.


Table 1. Determination of the number of principal components.

Table 2. The effectiveness of the PLSR models.

Table 3. The effectiveness of the BP-NN models.

Table 5. The effectiveness of the SVR models.

Table 6. Comparison of different models' performance.