Detection of Cadmium Content in Pak Choi Using Hyperspectral Imaging Combined with Feature Selection Algorithms and Multivariate Regression Models

Chen, Yongkuai; Wang, Tao; Lin, Shanshan; Liao, Shuilan; Wang, Songliang

doi:10.3390/app16020670

by Yongkuai Chen ¹, Tao Wang ¹, Shanshan Lin ², Shuilan Liao ¹ and Songliang Wang ³,*

¹ Institute of Digital Agriculture, Fujian Academy of Agricultural Sciences, Fuzhou 350003, China

² Fujian Provincial Seed General Station, Fuzhou 350003, China

³ Faculty of Agriculture, Fujian Agriculture and Forestry University, Fuzhou 350002, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(2), 670; https://doi.org/10.3390/app16020670

Submission received: 13 November 2025 / Revised: 27 December 2025 / Accepted: 6 January 2026 / Published: 8 January 2026

(This article belongs to the Section Agricultural Science and Technology)

Abstract

Pak choi (Brassica chinensis L.) readily takes up and accumulates the heavy metal cadmium (Cd), which poses a serious threat to human health. Traditional detection methods are destructive, time-consuming, and inefficient. This study therefore aimed to construct a non-destructive prediction model for Cd content in pak choi leaves using hyperspectral imaging combined with feature selection algorithms and multivariate regression models. Four cadmium concentration treatments (0 (CK), 25, 50, and 100 mg/L) were established to monitor the apparent characteristics, chlorophyll content, cadmium content, chlorophyll fluorescence parameters, and spectral features of pak choi. Competitive adaptive reweighted sampling (CARS), the successive projections algorithm (SPA), and random frog (RF) were used for feature wavelength selection. Partial least squares regression (PLSR), random forest regression (RFR), Elman neural network, and bidirectional long short-term memory (BiLSTM) models were established using both full spectra and feature wavelengths. The results showed that high-concentration Cd (100 mg/L) significantly inhibited pak choi growth, leaf Cd content was significantly higher than in the control group, chlorophyll content decreased by 16.6%, and damage to the PSII reaction centre was aggravated. Among the models, the FD–RF–BiLSTM model (first-derivative preprocessing, RF wavelength selection, and BiLSTM regression) achieved the best prediction performance, with a determination coefficient of the prediction set (Rp²) of 0.913 and a root mean square error of the prediction set (RMSEP) of 0.032. This study revealed the physiological, ecological, and spectral response characteristics of pak choi under Cd stress and showed that leaf Cd content can be detected non-destructively and with high precision by combining hyperspectral imaging with chemometric methods. This provides an efficient technical means for the rapid screening of Cd pollution in vegetables and has practical significance for ensuring the quality and safety of agricultural products.

Keywords:

pak choi; cadmium; hyperspectral technology; feature selection; chlorophyll fluorescence

1. Introduction

With increasing global industrialization, cadmium pollution in agricultural production is becoming an increasingly severe global environmental challenge, primarily due to anthropogenic activities such as industrial emissions, mining operations, the application of phosphate fertilizers and sewage sludge, and improper disposal of electronic waste [1,2]. Studies have reported that the most severe cadmium pollution in agricultural soils globally occurs in regions such as northern and central India, Pakistan, Bangladesh, southern China, and southern Thailand [3]. In China, the average cadmium concentration in agricultural soils is 0.19 mg/kg, approximately twice the natural background level of 0.097 mg/kg. Alarmingly, cadmium pollution affects 33.54% of farmland and 44.65% of urban soils [4].

Cadmium is classified as a Group 1 carcinogen by the International Agency for Research on Cancer (IARC) [5]. Previous research has indicated that Cd, directly or indirectly, impairs plant physiological and molecular processes such as tissue growth, nutrient uptake and balance, photosynthesis, and antioxidant enzyme activity, and that it leads to ROS accumulation, biomass reduction, and disturbed molecular pathways. The toxic effects of Cd include growth inhibition, root damage, leaf curling and yellowing, and even leaf abscission. Furthermore, excessive accumulation of Cd in plants can induce massive production of reactive oxygen species (ROS), causing cell membrane lipid peroxidation, chloroplast degradation, severe damage to photosynthetic reaction centres, and inhibition of plant growth and development [6,7].

Pak choi (Brassica chinensis L.) is a high-value vegetable in Asia and is important for ensuring global food security because of its widespread consumption. This vegetable is widely cultivated in Europe, the Mediterranean, and East Asia (particularly China, South Korea, and Japan) [8,9]. Pak choi, a crop within the Brassicaceae family, Brassica genus, B. campestris L. subspecies, has a very high capacity for cadmium uptake and accumulation. Cadmium, a nonessential element, is easily absorbed and accumulates in its edible leaves, subsequently inducing a biomagnification effect through the food chain. This effect not only threatens crop production itself but also causes irreversible harm to human health [10]. In the General Standard for Contaminants and Toxins in Food and Feed (Codex Stan 193-1995) [11] of the Codex Alimentarius Commission (CAC) and China’s current National Food Safety Standard—Maximum Levels of Contaminants in Food (GB 2762-2022) [12], the maximum limit of cadmium (Cd) in leafy vegetables is 0.2 mg/kg. However, due to factors such as soil cadmium contamination and varietal characteristics, cadmium levels in some pak choi crops may exceed this limit during actual production, posing a direct threat to food safety and human health.

Therefore, assessing cadmium levels in pak choi can mitigate the risk of human exposure via the food chain. Early detection of cadmium accumulation can increase crop safety, reduce losses, prevent cadmium from entering the human body, and guide pollution prevention and control. Early detection is important for increasing food security, protecting public health, and facilitating sustainable agricultural development. Traditional methods for detecting heavy metals in crops rely on laboratory chemical analysis of a large number of leaf samples, which is inefficient and time-consuming. With the development of photoelectric non-destructive prediction technology, efficient and time-saving visible–near-infrared reflectance spectroscopy has become an alternative technique for detecting heavy metal pollution [13]. Spectral data from only plant leaves are needed to establish Cd content models based on sensitive bands or spectral indices [14]. Some studies have reported that spectral-based methods can be used to determine the type of pollution or stress suffered by plants, including pest and disease stress, salinity stress, water stress, and heavy metal pollution [15,16,17,18]. Wang et al. [19] analysed the relationship between the spectral reflectance of pepper leaves at four growth stages at different cadmium stress levels and the Cd content in mature pepper fruits and estimated the fruit Cd content using multiple regression. Yi et al. [20] used hyperspectral remote sensing data combined with support vector machine regression (SVMR) to estimate the cadmium content in field pepper and eggplant leaves, with prediction set determination coefficients R² of 0.897 and 0.726, respectively. Sun et al. [21] used deep belief networks combined with hyperspectral imaging to estimate the cadmium content in lettuce. Shen et al. [22] used hyperspectral imaging and chemometrics to estimate the free proline content in rice leaves under cadmium stress. 
The results indicated that the extreme learning machine (ELM) model based on 27 feature wavelengths selected by CARS performed best, with an R² value of 0.9426, and could be used to explore changes in free amino acids in rice leaves under Cd stress. Together, these studies indicate that hyperspectral technology is feasible for detecting and analysing element and compound contents in crops.

Research on the spectral inversion of cadmium has focused on soil, rice, tomatoes, lettuce, and other crops [23,24,25,26], with few studies on pak choi, despite its strong enrichment capacity. More critically, existing technologies often struggle to achieve precise detection during the sub-visual stage of cadmium stress, when conventional physiological indicators such as chlorophyll fluorescence and pigment indices show no significant changes, which limits their practical value for early warning. This study uses hyperspectral imaging and machine learning to construct a prediction model for cadmium content in pak choi. Its objectives are as follows: (1) to determine the effect of different cadmium stress levels on pak choi using phenotypic data such as 2D images, physiological indicators, cadmium accumulation, and chlorophyll fluorescence parameters; (2) to evaluate the effectiveness of preprocessing and feature band selection in improving model performance; and (3) to apply chemometrics and regression techniques to develop prediction models. The study aims to fill the technical gap in early sub-visual detection of cadmium stress in pak choi and to provide a non-destructive, rapid early-warning pathway for cadmium content in this crop.

2. Materials and Methods

2.1. Experimental Design

The tested variety was ‘Huaguan Qinggengcai’, bred by Musashino Seed Co., Ltd., Tokyo, Japan. The seedling substrate was imported Pindstrup peat (Denmark). Anhydrous cadmium chloride (CdCl₂) was obtained from Shanghai Macklin Biochemical Technology Co., Ltd., Shanghai, China.

The experiments were conducted from 14 February to 20 April 2025, in the phenotyping experimental greenhouse and physiological and biochemical laboratory of the Digital Agriculture Research Institute, Fujian Academy of Agricultural Sciences. Pak choi was sown on 14 February in plug trays covered with peat. After sowing, the seedlings were placed on a tidal seedling bed within a small greenhouse. The nutrient solution formula was as follows: to 100 L of water, A (11.2 kg Ca(NO₃)₂, 12 kg KNO₃) and B (2.6 kg KH₂PO₄, 3 kg MgSO₄, and 750 g EDTA-Fe) were added. During the seedling stage, the electrical conductivity (EC) of the nutrient solution was 1.0 mS/cm. The tidal bed was flooded daily at 08:00 for 5 min, held for 10 min, and then allowed to drain for 10 min. To reduce interference from other factors in the soil, a nutrient film technique (NFT) was used for cultivation to ensure that the experiment was affected only by the cadmium concentration. The transplanting date was 2 April, when the seedlings were moved from the plug trays to custom-made NFT cultivation troughs. The nutrient solution EC was 1.5 mS/cm. After precultivation with a normal nutrient solution for 7 days, cadmium was added to the nutrient solution on 9 April. In this experiment, the cadmium concentrations were 25, 50, and 100 mg/L, and a cadmium-free treatment (0 mg/L) was used as the control (CK). Each treatment consisted of 300 plants, with three replicates per treatment. There were 4 NFT experimental areas, each with an independent water and fertilizer supply system, using a timed irrigation mode (each irrigation for 10 min and an interval of 15 min). The nutrient solution in the reservoir was replaced once every Monday. After 7 days of treatment, pak choi samples were collected to determine their chlorophyll content, cadmium content, and chlorophyll fluorescence parameters, as well as their hyperspectral reflectance. For each treatment, 40 samples were collected, totalling 160 samples.

2.2. Determination of Chlorophyll and Cadmium Contents

Chlorophyll was extracted using the acetone–ethanol mixed solution method [27]. The cadmium content was determined by digestion with a nitric–perchloric acid mixture (4:1, v/v) and measured using an atomic absorption spectrophotometer (AAS, PinAAcle 900F, PerkinElmer, Waltham, MA, USA). Each sample was tested in triplicate, and the final result was determined as the average of the three replicates.

2.3. Measurement of Chlorophyll Fluorescence Data

A chlorophyll fluorescence imager (FC800-D, Photon Systems Instruments PSI, Drásov, Czech Republic) was used. Data were acquired in an indoor dark chamber, with key parameters set as follows: the saturating pulse value was 5250 μmol/(m²·s), the actinic light 2 value was 422 μmol/(m²·s), and the object distance was 30 cm. After dark adaptation of the sample for 30 min, image data acquisition was performed. Upon completion, the following parameters were obtained: initial fluorescence value (F₀), maximum PSII quantum yield (F_v/F_m), effective PSII quantum yield (F_v′/F_m′), actual PSII quantum efficiency (ΦPSII), nonphotochemical quenching coefficient (NPQ), and photochemical quenching coefficient (qP). Each treatment was performed in triplicate.

2.4. Hyperspectral Imaging Data Acquisition

In this study, a visible-near-infrared hyperspectral imaging system (FX10, SPECIM, Oulu, Finland) was used. The system consisted of a hyperspectral imager, an imaging lens, a 500 W halogen lamp, a 99% reflectivity white reflective reference panel, an industrial touchscreen tablet computer, and an electric control translation stage (as shown in Figure 1). The spectral range collected was 397.66–1003.81 nm. The key parameters were as follows: the moving speed of the electric control translation stage was 2 mm/s, the exposure time was 10 ms, and the object distance was 40 cm. The technical roadmap is presented in Figure 2. Before data acquisition, the hyperspectral system was preheated for 30 min to achieve a stable operating state. Subsequently, a white calibration panel with 99% reflectivity and the sample to be tested were fixed side by side at the same sample acquisition station, ensuring that both fully covered the system’s field of view under uniform lighting conditions. After initiating the sample acquisition program, the system automatically completed the capture and storage of dark current images according to the preset workflow and synchronously acquired an integrated spectral image containing both the reflective area of the white calibration panel and the sample area. Upon completion of the acquisition, the reflectance correction of the sample spectral image was performed using ENVI 4.8 software to separately extract the reflectance data of the white panel area from the integrated image as the white reference and invoke the stored dark reference data. The correction formula is detailed in Equation (1).

R = \frac{R_{raw} - R_{dark}}{R_{white} - R_{dark}}    (1)

where R represents the corrected sample spectral image, R_{raw} denotes the original sample hyperspectral image, R_{white} indicates the white calibration image, and R_{dark} represents the dark reference image.
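The correction in Equation (1) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the ENVI workflow used in the study; the function name `calibrate_reflectance`, the array shapes, the synthetic digital numbers, and the `eps` guard against a zero denominator are all assumptions made for the example.

```python
import numpy as np


def calibrate_reflectance(raw, white, dark, eps=1e-12):
    """Apply R = (R_raw - R_dark) / (R_white - R_dark) element-wise.

    raw:   sample image, shape (rows, cols, bands)
    white: white reference, broadcastable to raw (e.g. shape (bands,))
    dark:  dark reference, broadcastable to raw
    """
    raw = np.asarray(raw, dtype=float)
    white = np.asarray(white, dtype=float)
    dark = np.asarray(dark, dtype=float)
    denom = white - dark
    # Assumption: bands with no usable white signal become NaN
    # instead of being divided by (almost) zero.
    denom = np.where(np.abs(denom) < eps, np.nan, denom)
    return (raw - dark) / denom


# Synthetic 2 x 2 image with 3 bands (hypothetical digital numbers).
dark = np.full(3, 100.0)
white = np.full(3, 4100.0)
raw = np.full((2, 2, 3), 2100.0)
reflectance = calibrate_reflectance(raw, white, dark)
```

In this toy case every pixel sits halfway between the dark and white signals, so the corrected reflectance is 0.5 in every band.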

2.5. Hyperspectral Data Analysis and Model Evaluation

2.5.1. Hyperspectral Data Preprocessing

The region of interest (ROI) corresponding to the canopy of each sample was delineated using ENVI 5.3 software to extract spectral information. The average spectral data of all pixel points within the ROI was adopted as the representative spectral feature of the sample. Given the significant noise interference in the spectral range of 803.1–1003.81 nm, only the reflectance spectral data of 151 bands within the range of 397.66–800.34 nm for each sample were selected for subsequent modelling and analysis.
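The ROI averaging and band truncation described above might look as follows in NumPy. The function names, the boolean-mask representation of the ROI, and the coarse synthetic wavelength grid are assumptions for illustration; the study delineated ROIs interactively in ENVI, and the real sensor grid differs from the one used here.

```python
import numpy as np


def roi_mean_spectrum(cube, roi_mask):
    """Average the spectra of all pixels where roi_mask is True.

    cube:     reflectance image, shape (rows, cols, bands)
    roi_mask: boolean array, shape (rows, cols)
    """
    return cube[roi_mask].mean(axis=0)


def crop_bands(spectrum, wavelengths, lo=397.66, hi=800.34):
    """Keep only the bands whose centre wavelength lies in [lo, hi] nm."""
    keep = (wavelengths >= lo) & (wavelengths <= hi)
    return spectrum[keep], wavelengths[keep]


# Hypothetical 3 x 3 image with 7 bands on a coarse 400-1000 nm grid.
wavelengths = np.linspace(400.0, 1000.0, 7)
cube = np.ones((3, 3, 7))
cube[1, 1, :] = 3.0                      # one brighter pixel inside the ROI
roi_mask = np.zeros((3, 3), dtype=bool)
roi_mask[1, :] = True                    # ROI = middle row (3 pixels)

mean_spec = roi_mean_spectrum(cube, roi_mask)
cropped, kept_wl = crop_bands(mean_spec, wavelengths)
```

Each sample is thus reduced to one mean spectrum, and the noisy bands beyond the upper limit are dropped before modelling.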

To reduce the effect of noise and other interference factors, four methods were used to preprocess the original spectral data: first derivative (FD), second derivative (SD), multiplicative scatter correction (MSC), and normalization (Nor). FD and SD describe the rate of change of the spectral signal. FD is mainly used to eliminate baseline drift and background interference, enhance parts of the spectrum that change rapidly, separate overlapping absorption peaks, and increase spectral resolution. SD further increases the resolution of spectral details, allowing clearer identification of the position and shape of weak absorption peaks [28]. MSC is primarily used to eliminate scattering interference caused by uneven particle size, surface roughness, or differences in optical path length [29]. Nor rescales the spectral data to a comparable scale or distribution, which eliminates differences in units across feature dimensions, balances the weight of each feature during analysis, and can improve the performance of subsequent models [30].
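The four preprocessing steps can be sketched with NumPy as below. The text does not specify the exact variants used, so several choices here are assumptions: derivatives are plain finite differences (Savitzky–Golay filtering is a common alternative), MSC uses the mean spectrum as its reference, and Nor is taken to be min-max scaling of each band. All function names are hypothetical.

```python
import numpy as np


def first_derivative(X):
    """FD: finite-difference derivative along the band axis."""
    return np.gradient(X, axis=1)


def second_derivative(X):
    """SD: derivative of the first derivative."""
    return np.gradient(np.gradient(X, axis=1), axis=1)


def msc(X, reference=None):
    """MSC: regress each spectrum on a reference (default: mean spectrum)
    and remove the fitted offset and slope."""
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, float)
    corrected = np.empty_like(X)
    for i, spectrum in enumerate(X):
        slope, intercept = np.polyfit(ref, spectrum, 1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected


def normalize(X):
    """Nor (assumed here: min-max scaling of each band to [0, 1])."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # constant bands map to 0
    return (X - lo) / span
```

Each function takes a matrix with one spectrum per row (samples x bands) and returns a matrix of the same shape, so the outputs can be passed directly to wavelength selection or regression.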

2.5.2. Feature Wavelength Selection Algorithms

Hyperspectral full-band data often suffer from information redundancy and multicollinearity. Feature wavelength selection can reduce data dimensionality, eliminate redundant information, and improve the generalizability of the model. This study employed three algorithms for feature wavelength selection: competitive adaptive reweighted sampling (CARS), the successive projections algorithm (SPA), and random frog (RF; not to be confused with random forest regression, abbreviated RFR in this paper). CARS simulates a “biological evolution” process, adaptively reweighting the spectral bands and gradually eliminating redundant and unimportant ones. It relies on Monte Carlo sampling and an exponential decay function to adjust the selection probability of each band, ultimately selecting the band combination that contributes most to modelling performance [31]. SPA is a forward variable selection method that identifies the main feature vectors of the data through iterative projection, retaining the main information and eliminating redundancy [32]. RF is an efficient feature selection method for high-dimensional data. It is based on sequential random sampling and probability statistics: it generates different feature subsets over many iterations and uses the frequency with which each band is selected as a measure of its importance [33].
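To make the iterative projection behind SPA concrete, the following is a simplified sketch of its core step. It is not the implementation used in the study: the function name `spa_select` is hypothetical, the starting band and the number of wavelengths are fixed inputs here (a full SPA normally chooses both by comparing validation errors), and the data are synthetic.

```python
import numpy as np


def spa_select(X, n_select, start=0):
    """Pick n_select columns of X that are as non-collinear as possible.

    At each step every column is projected onto the subspace orthogonal
    to the most recently selected column, and the column with the largest
    remaining norm is selected next.
    """
    Xp = np.asarray(X, dtype=float)
    Xp = Xp - Xp.mean(axis=0)            # centre each band
    selected = [start]
    for _ in range(n_select - 1):
        v = Xp[:, selected[-1]]
        # Remove from all columns the component that lies along v.
        Xp = Xp - np.outer(v, (v @ Xp) / (v @ v))
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0           # never pick a band twice
        selected.append(int(np.argmax(norms)))
    return selected


# Synthetic example: 30 samples, 10 bands, band 5 duplicates band 0.
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(30, 10))
X_syn[:, 5] = X_syn[:, 0]
chosen = spa_select(X_syn, n_select=4, start=0)
```

Because band 5 carries exactly the same information as the starting band 0, its projected norm collapses to zero after the first step and it is not chosen, which is the redundancy-removal behaviour described above.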

2.5.3. Modelling Algorithms

Partial least squares regression (PLSR), random forest regression (RFR), the Elman neural network (Elman NNs), and the bidirectional long short-term memory network (BiLSTM) were used to predict the cadmium content. PLSR is a common statistical method for regression analysis and modelling. When data exhibit multicollinearity, PLSR extracts latent variables (also referred to as components or principal components) to project the independent and dependent variables into a new low-dimensional space. These latent variables are the components most correlated between the independent and dependent variables. The main goal of PLSR is to maximize the covariance between independent and dependent variables, establishing a regression relationship between them [34]. In this study, the optimal parameters of the PLSR model were determined via grid search combined with ten-fold cross-validation, with the search range of the number of principal components specified as 1 to 20. RFR is an ensemble learning algorithm that is based on decision trees, builds multiple decision trees, and combines their predictions to perform regression tasks. The random forest is a powerful regression model with strong generalizability and robustness [35]. The core hyperparameter settings for the RFR model in this study were as follows: the number of decision trees was set to 100; the minimum number of samples per leaf node was set to 3; and the mean squared error (MSE) was adopted as the splitting criterion. Elman NNs is an improved recurrent neural network that is based on the BP neural network and is a powerful and widely used neural network model [36]. 
The core configurations of the Elman NNs were as follows: (1) Network architecture: the number of neurons in the input layer was equal to the number of input features in the dataset; the hidden layer had 10 neurons (the context layer had the same number of neurons as the hidden layer, following the default feedback structure of Elman NNs); and the output layer contained 1 neuron. (2) Activation functions: the hidden layer used tanh (the hyperbolic tangent function), and the output layer used purelin (a linear function). Long short-term memory (LSTM) networks improve on standard recurrent neural networks (RNNs), and BiLSTM extends LSTM with forward and backward layers. BiLSTM therefore learns features in both directions, which can better identify correlations among the features of multivariate regression data [37]. For the BiLSTM model, the core parameters were configured as follows: 4 hidden units, a maximum of 1000 training epochs, and a batch size of 128. The Adam optimizer was used with an initial learning rate of 0.01. For dataset division, the 160 samples were split into a training set of 128 samples and a prediction set of 32 samples (80%:20%).

2.5.4. Model Evaluation Methods

To validate the reliability of the established models, the determination coefficient (R²) and root mean square error (RMSE) were used to evaluate model accuracy. Each metric was computed separately for the training set (R_c² and RMSEC) and for the prediction set (R_p² and RMSEP). R² is at most 1 and usually lies between 0 and 1, although it can become negative on a prediction set when a model predicts worse than the mean of the measured values. When both R_c² and R_p² are high and differ only slightly, the model both fits the data well and generalizes well. If R_c² is relatively high while R_p² is markedly lower, the model is overfitted. If both R_c² and R_p² are low, the model is underfitted. The RMSE measures the magnitude of the prediction error; a smaller value indicates higher prediction accuracy [38].

R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}    (2)

RMSE = \sqrt{\frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{n}}    (3)

where n represents the number of samples in the dataset, y_{i} denotes the measured value of sample i, \hat{y}_{i} denotes the predicted value of sample i, and \bar{y} denotes the average of all measured values in the dataset.
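Equations (2) and (3) translate directly into NumPy. The sketch below uses hypothetical function names and toy values chosen only so that the arithmetic is easy to follow by hand.

```python
import numpy as np


def r_squared(y, y_hat):
    """Equation (2): 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def rmse(y, y_hat):
    """Equation (3): square root of the mean squared error."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


measured = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(measured, measured)                 # 1.0
mean_only = r_squared(measured, [2.5, 2.5, 2.5, 2.5])   # 0.0
reversed_fit = r_squared(measured, [4.0, 3.0, 2.0, 1.0])
```

A perfect prediction gives R² = 1, predicting the mean everywhere gives R² = 0, and the reversed prediction gives R² = -3, which shows that the quantity defined in Equation (2) can fall below zero for a poor model.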

3. Results

3.1. Effect of Cadmium Stress on Pak Choi Growth and Cadmium Accumulation

Images of the growth status of the pak choi populations under the different cadmium treatments, obtained by guided inspection, are shown in Figure 3. The 0 mg/L (CK) plants had dense, dark green leaves, indicating that pak choi grows normally in a cadmium-free environment. In the 25 mg/L treatment, slight leaf yellowing appeared, plant density decreased somewhat, and growth was slightly inhibited. In the 50 mg/L treatment, leaf yellowing increased, plants were more obviously sparse, and growth inhibition was stronger. In the 100 mg/L treatment, leaf yellowing and abscission were extensive, plants were sparse, and growth was significantly inhibited. These findings indicate that cadmium has a concentration-dependent inhibitory effect on pak choi growth: the higher the concentration, the stronger the inhibition, as shown by leaf yellowing, plant sparsity, and other symptoms. As shown in Table 1, the minimum, maximum, and average cadmium contents in pak choi increased significantly with the cadmium concentration of the treatment. The leaf cadmium content differed significantly among treatments and was positively correlated with the cadmium concentration applied.

3.2. Effect of Cadmium Stress on Chlorophyll Content in Pak Choi

As shown in Figure 4, CK (0 mg/L) had the highest chlorophyll content, approximately 1.35 mg/g. At cadmium concentrations of 25, 50, and 100 mg/L, the chlorophyll content decreased significantly and stabilized at approximately 1.10 mg/g, with no significant difference among the three groups. These findings are consistent with cadmium damaging chloroplast structure, inhibiting chlorophyll synthase activity, and accelerating chlorophyll degradation, which ultimately reduces the leaf chlorophyll content.

3.3. Effect of Cadmium Stress on the Chlorophyll Fluorescence Characteristics of Pak Choi

F₀ (initial fluorescence) is the fluorescence level when the photosystem II (PSII) reaction centres are completely open, and changes in F₀ indicate the degree of damage to these centres. An increase in F₀ usually indicates that PSII reaction centres are damaged and that electron transport is hindered. In the images (Figure 5), colours towards red indicate higher F₀ values, and colours towards green and blue indicate lower values. At 0 mg/L (CK), the leaves were mainly green and yellow with few red areas, indicating that the PSII reaction centres were essentially normal and that F₀ was low. At 25 mg/L, the red areas increased slightly, but the overall colour was still mainly green and yellow, indicating weak damage to the PSII reaction centres at a low cadmium concentration. At 50 mg/L, the red areas expanded further, indicating that a medium cadmium concentration caused more obvious damage and a higher F₀. At 100 mg/L, red areas covered the largest proportion of the leaf among the four treatments, indicating that the high cadmium concentration caused the most severe damage to PSII reaction centres and a marked increase in F₀. Overall, F₀ increased gradually with cadmium concentration, indicating that the damage caused by cadmium to the PSII reaction centres of pak choi was concentration-dependent and that a high cadmium concentration significantly inhibited photosynthesis.

As shown in Table 2, with different cadmium concentration treatments, the F_v/F_m and F_v′/F_m′ of pak choi decreased, indicating that cadmium stress inhibited the PSII reaction centres of the photosynthetic system, but the differences among treatments were small. The ΦPSII did not significantly differ between the treatment groups and the control group, indicating that cadmium stress had a minimal effect on the quantum efficiency of PSII electron transport and that the core function of the photosynthetic electron transport chain was not severely damaged. It is likely that cadmium accumulates mainly in the root system and the cytoplasmic matrix of leaves in the short term, without permeating extensively into the thylakoid membranes of chloroplasts; consequently, core parameters such as F_v/F_m and ΦPSII did not exhibit significant fluctuations. qP tended to increase slightly with increasing cadmium concentration, indicating that pak choi enhances electron transport efficiency by upregulating qP, allocating more excess light energy to the carbon assimilation process, and reducing photodamage to the PSII reaction centres, thereby maintaining the relative stability of parameters including F_v′/F_m′ and ΦPSII. NPQ tended to significantly decrease with increasing cadmium concentration, which suggests that the reduced fluidity of chloroplast membranes under cadmium stress inhibits the activity of key enzymes involved in the xanthophyll cycle, resulting in the prominent and early decline in NPQ.

The above phenotypic data indicate that cadmium stress can significantly inhibit the growth and development of pak choi and promote cadmium enrichment in leaves. However, chlorophyll content and chlorophyll fluorescence parameters did not differ significantly among the cadmium treatments, indicating that the degree of cadmium stress can be determined accurately only from the cadmium content of pak choi leaves. Currently, the detection of leaf cadmium content relies mainly on laboratory chemical methods, which are destructive, slow, and costly. To address this problem and increase detection efficiency, this study established a non-destructive, rapid detection approach for cadmium content by modelling the relationship between leaf spectral characteristics and cadmium content.

3.4. Spectral Curves

In this study, we obtained spectral information from 160 pak choi samples. The spectral range was 397.66–800.34 nm (151 bands in total). The original spectral curves of pak choi at four different cadmium stress concentrations are shown in Figure 6.

3.5. Feature Wavelength Selection Results

The feature wavelengths selected by CARS, SPA, and RF within the complete spectral range are shown in Table 3. The three methods selected different wavelengths. CARS selected between 23 and 63 feature wavelengths: the largest number, 63 (41.72% of the full spectral data), was selected after FD pretreatment, and the smallest, 23 (15.23%), after Nor pretreatment. SPA selected the fewest feature wavelengths among the three methods, ranging from 5 to 13 and accounting for 3.31%, 3.97%, 7.28%, and 8.61% of the complete spectral data. RF selected between 23 and 44 feature wavelengths, accounting for 24.50%, 15.23%, 15.89%, and 29.14% of the full spectral data. All three methods removed redundant, collinear variables and compressed the spectral data, which reduces the complexity and computational cost of subsequent modelling and can improve the prediction accuracy of the pak choi leaf cadmium content model. The distributions of the feature wavelengths extracted by the three methods are shown in the Supplementary Materials (Figures S1–S3). These distributions are similar: all three algorithms selected a large number of feature wavelengths in the visible band (400–600 nm) and the near-infrared band (700–800 nm).
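The percentages above follow from dividing each wavelength count by the 151 bands retained for modelling. The short check below reproduces the two stated CARS values and back-calculates the counts implied by the SPA and RF percentages; those back-calculated counts are derived here for illustration and are not quoted from Table 3, and the function names are hypothetical.

```python
N_BANDS = 151  # bands retained in the 397.66-800.34 nm range


def share_of_spectrum(n_selected, n_bands=N_BANDS):
    """Percentage of the full spectrum covered by the selected bands."""
    return round(100.0 * n_selected / n_bands, 2)


def count_from_share(percent, n_bands=N_BANDS):
    """Back-calculate the number of bands from a reported percentage."""
    return round(percent * n_bands / 100.0)


cars_fd = share_of_spectrum(63)    # FD pretreatment, 63 wavelengths
cars_nor = share_of_spectrum(23)   # Nor pretreatment, 23 wavelengths
spa_counts = [count_from_share(p) for p in (3.31, 3.97, 7.28, 8.61)]
rf_counts = [count_from_share(p) for p in (24.50, 15.23, 15.89, 29.14)]
```

The back-calculated counts span 5 to 13 for SPA and 23 to 44 for RF, which matches the ranges given in the text.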

3.6. Results of Cadmium Content Detection Models Based on Feature Wavelengths

3.6.1. PLSR Prediction Model for Pak Choi Leaf Cadmium Content Based on Feature Wavelengths

With the feature wavelengths extracted by the CARS, SPA, and RF feature wavelength extraction algorithms as input variables and the cadmium content of pak choi leaves as the output variable, PLSR models for pak choi leaf cadmium content based on the CARS, SPA, and RF algorithms were constructed. Original spectral data modelling was also performed for comparison between the models. The modelling results are listed in Table 4. The table shows that the MSC–CARS–PLSR model had the best prediction performance, with an R_c² of 0.918, an RMSEC of 0.026, an R_p² of 0.894, and an RMSEP of 0.034. The difference between R_c² and R_p² of this model was only 0.024, which was relatively small. This indicated that the fitting ability and prediction ability of the model were well-balanced, with no obvious overfitting or underfitting phenomenon. Compared with the raw-PLSR model, R_p² increased by 7.32%, and RMSEP decreased by 17.07%. The FD–SPA–PLSR model performed the worst, with an R_c² of 0.587, an RMSEC of 0.052, an R_p² of 0.732, and an RMSEP of 0.046. The difference between R_c² and R_p² was relatively large, indicating an obvious discrepancy between the fitting performance and prediction performance of the model.

3.6.2. RFR Prediction Model for Pak Choi Leaf Cadmium Content Based on Feature Wavelengths

Compared with the RFR model established using raw data, the RFR models constructed using feature wavelengths selected by CARS, SPA, and RF all achieved better prediction performance. As Table 5 shows, the prediction-set determination coefficient R_p² was greater than 0.8 for every feature-wavelength model, whereas it was only 0.686 for the raw model. The SD–CARS–RFR and SD–RF–RFR models performed better than the other models. For the SD–CARS–RFR model, the R_c², RMSEC, R_p², and RMSEP values were 0.932, 0.025, 0.903, and 0.034, respectively, with a difference of 0.029 between R_c² and R_p². For the SD–RF–RFR model, the corresponding values were 0.943, 0.023, 0.901, and 0.034, with a difference of 0.042 between R_c² and R_p². With a slightly higher R_p² and a smaller gap between R_c² and R_p², the SD–CARS–RFR model exhibited the better predictive performance of the two.
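For readers who want to recompute the metrics in Tables 4–7 from predicted and measured values, the sketch below uses the conventional definitions of the determination coefficient (1 minus the ratio of residual to total sum of squares) and the root mean square error. This is an assumption: the exact formulas used by the authors are given in the methods section, not here, and the sample values are invented for illustration.

```python
import math
from typing import Sequence


def rmse(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between measured and predicted values."""
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(squared / len(measured))


def r_squared(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Determination coefficient: 1 - SS_res / SS_tot."""
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("measured values must not all be identical")
    return 1.0 - ss_res / ss_tot


# Invented example values (mg/g); they are not data from the study.
measured = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]
print(r_squared(measured, predicted), rmse(measured, predicted))
```

Applied to the training set, these functions would give R_c² and RMSEC; applied to the prediction set, R_p² and RMSEP.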

3.6.3. Elman Prediction Model for Pak Choi Leaf Cadmium Content Based on Feature Wavelengths

As shown in Table 6, the Nor–RF–Elman model performed best, with an R_c² of 0.918, an RMSEC of 0.027, an R_p² of 0.900, and an RMSEP of 0.034. The difference between R_c² and R_p² was only 0.018. Compared with the raw Elman model, R_p² increased by 5.14% and RMSEP decreased by 17.07%. In contrast, the MSC–RF–Elman model performed poorly, with an R_c² of 0.915, an RMSEC of 0.028, an R_p² of 0.649, and an RMSEP of 0.064. R_c² exceeded R_p² by more than 0.1, which indicates overfitting. Compared with the raw Elman model, its R_p² decreased by 24.18% and its RMSEP increased by 56.10%.
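The overfitting judgement above rests on the gap between R_c² and R_p². A minimal sketch of that check follows. The 0.1 cut-off is taken from the wording of this section and should be read as a rule of thumb rather than a formal test; the function names are hypothetical.

```python
# Rule-of-thumb cut-off taken from the text: a gap above 0.1 is read as overfitting.
OVERFIT_GAP = 0.1


def fit_gap(r2_calibration: float, r2_prediction: float) -> float:
    """Difference R_c^2 - R_p^2, rounded to the precision of the tables."""
    return round(r2_calibration - r2_prediction, 3)


def looks_overfitted(
    r2_calibration: float, r2_prediction: float, threshold: float = OVERFIT_GAP
) -> bool:
    """True when the calibration fit exceeds the prediction fit by more than the cut-off."""
    return fit_gap(r2_calibration, r2_prediction) > threshold


# (R_c^2, R_p^2) pairs copied from Table 6.
elman_models = {
    "Nor-RF-Elman": (0.918, 0.900),
    "MSC-RF-Elman": (0.915, 0.649),
}

for name, (r2_c, r2_p) in elman_models.items():
    print(name, fit_gap(r2_c, r2_p), looks_overfitted(r2_c, r2_p))
```

A negative gap, as in the FD–SPA–PLSR model, is not flagged by this check and would need separate interpretation.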

3.6.4. BiLSTM Prediction Model for Pak Choi Leaf Cadmium Content Based on Feature Wavelengths

Table 7 shows that, compared with the BiLSTM model established using raw data, most of the BiLSTM models constructed with CARS, SPA, and RF feature selection achieved a higher R_p² (10 of the 12 models, the exceptions being SD–CARS and Nor–RF). The FD–RF–BiLSTM model performed best, with an R_c² of 0.946, an RMSEC of 0.022, an R_p² of 0.913, and an RMSEP of 0.032. Compared with the raw BiLSTM model, R_p² increased by 6.16% and RMSEP decreased by 20%. The SD–RF–BiLSTM model ranked second, with R_c², RMSEC, R_p², and RMSEP values of 0.917, 0.028, 0.903, and 0.034, respectively. For both models, the difference between R_c² and R_p² was small (0.033 and 0.014, respectively), indicating consistent performance across the training and prediction datasets.

3.6.5. Optimal Model Validation

The above analysis shows that, for each of the four modelling methods (PLSR, RFR, Elman, and BiLSTM), the best-performing model used feature wavelengths selected by the CARS or RF algorithm. Overall, the model built with FD preprocessing combined with the RF algorithm performed best. Ranked by prediction-set R², the best model of each type was ordered as follows: FD–RF–BiLSTM (0.913) > SD–CARS–RFR (0.903) > Nor–RF–Elman (0.900) > MSC–CARS–PLSR (0.894). Therefore, the FD–RF–BiLSTM model (R_p² = 0.913) was taken as the optimal prediction model for pak choi cadmium content. Its prediction results are shown in Figure 7.
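The ranking above can be reproduced by sorting the best model of each type by its prediction-set R². The values below are copied from Tables 4–7; the variable names are illustrative only.

```python
# Best model of each type with its prediction-set R^2 (Tables 4-7).
best_per_model_type = {
    "MSC-CARS-PLSR": 0.894,
    "SD-CARS-RFR": 0.903,
    "Nor-RF-Elman": 0.900,
    "FD-RF-BiLSTM": 0.913,
}

# Sort from highest to lowest R_p^2.
ranked = sorted(best_per_model_type.items(), key=lambda item: item[1], reverse=True)

for position, (model, r2_p) in enumerate(ranked, start=1):
    print(position, model, r2_p)

best_model = ranked[0][0]
```

Note that the four candidates differ by less than 0.02 in R_p², so the ordering reflects small margins on a single prediction set.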

4. Discussion

In this study, pak choi showed some tolerance and could still grow and develop under low cadmium stress, whereas high cadmium stress significantly inhibited its growth, and the degree of inhibition intensified with increasing stress concentration. The cadmium content in the shoots of pak choi increased significantly with increasing cadmium concentration, which is consistent with the results of Chang et al. [39]. After 7 days of cadmium stress, the total chlorophyll content of pak choi decreased significantly under the cadmium treatments, but the differences among the cadmium treatments were not significant. Except for NPQ, the chlorophyll fluorescence parameters did not change significantly. The response of the plant to cadmium stress has a time lag: 7 days may not be sufficient for cadmium to accumulate to a level that significantly affects the photosynthetic system, or the defence mechanisms of the plant may offset the adverse effects of cadmium during the initial stage, so that a longer stress duration is needed to reveal significant parameter changes. Alternatively, pak choi may have a specific tolerance to cadmium, so that its photosynthetic system remains relatively stable under short-term cadmium stress, which is similar to the results of Wang et al. [40]. As the cadmium treatment time and concentration increased, the photosynthesis of pak choi was more severely inhibited. These physiological response characteristics motivate the use of hyperspectral modelling to predict cadmium content. With detection methods based on traditional physiological indicators, it is difficult to distinguish among treatments under early cadmium stress, whereas hyperspectral technology can capture subtle biochemical and structural changes that conventional methods cannot identify, and these changes form characteristic spectral responses.
According to previous studies on pak choi, rapeseed, flue-cured tobacco, and other crops, the spectral characteristics of leaves are strongly correlated with the leaf cadmium content [41,42,43]. Therefore, in this study, regression models were constructed using the selected feature bands as independent variables and the leaf cadmium content as the dependent variable. The optimal modelling method for pak choi leaf cadmium content was then identified by comparing the different models.

In this study, hyperspectral imaging was used to obtain spectral data from pak choi, and the cadmium content in the leaves was measured. The FD, SD, MSC, and Nor algorithms were used to preprocess the original hyperspectral data. The CARS, SPA, and RF algorithms were used to screen feature bands from the preprocessed data. The PLSR, RFR, Elman, and BiLSTM algorithms were used to construct prediction models for the pak choi leaf cadmium content. When FD or SD preprocessing was combined with feature wavelengths screened by the CARS or RF algorithm, the best models reached an R_p² above 0.9. FD–RF retained 24.50% of the original bands, SD–RF retained 15.23%, and SD–CARS retained 19.21%. The FD–RF–BiLSTM model had the highest accuracy, indicating that FD preprocessing combined with the RF algorithm was the best feature wavelength extraction method for pak choi leaf cadmium content in this study and that BiLSTM was the best prediction model. This approach reduces the reliance of traditional hyperspectral detection on distinct changes in physiological indicators. Even at the early exposure stage, when chlorophyll fluorescence and pigment indices exhibit only minimal changes, it still predicted the cadmium content of pak choi accurately, demonstrating sub-visual detection capability.
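To illustrate what the FD and SD preprocessing steps do to a spectrum, the sketch below computes first and second derivatives with simple finite differences. This is an assumption made for illustration: the section does not state how the authors computed the derivatives (a smoothing derivative filter is another common choice), and the reflectance values are invented.

```python
from typing import List, Sequence


def finite_difference(values: Sequence[float], positions: Sequence[float]) -> List[float]:
    """Forward difference quotient between neighbouring bands."""
    if len(values) != len(positions) or len(values) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    return [
        (values[i + 1] - values[i]) / (positions[i + 1] - positions[i])
        for i in range(len(values) - 1)
    ]


def midpoints(positions: Sequence[float]) -> List[float]:
    """Wavelengths halfway between neighbouring bands."""
    return [(positions[i] + positions[i + 1]) / 2 for i in range(len(positions) - 1)]


# Invented reflectance values at four bands (not measured data).
wavelengths = [400.0, 410.0, 420.0, 430.0]
reflectance = [0.10, 0.12, 0.16, 0.22]

# First derivative (FD): slope of reflectance between neighbouring bands.
fd = finite_difference(reflectance, wavelengths)
# Second derivative (SD): slope of the first derivative, located at band midpoints.
sd = finite_difference(fd, midpoints(wavelengths))
print(fd, sd)
```

Each differentiation shortens the spectrum by one band and amplifies noise, which is one reason derivative preprocessing is often paired with smoothing in practice.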

It should be noted that the total sample size of 160 in this study was relatively limited. Interfering variables were strictly controlled using the NFT hydroponic system, measurement errors were minimized by three replicate determinations, and the core model was stable (R_p² = 0.913, RMSEP = 0.032). Nevertheless, the small sample size may still restrict the generalization ability of the model to pak choi of different genotypes and to complex field environments, and it hinders adequate capture of subtle heterogeneity in spectral responses under cadmium stress. This is one of the main limitations of this study. In addition, all models were constructed and validated under controlled conditions (a single variety, an NFT hydroponic system, and a stable temperature, humidity, and light environment). Although they exhibit high prediction accuracy, systematic external validation is still required before they are promoted and applied in actual production environments. Firstly, this study used only “Huaguan Qinggengcai” as the test variety. Different genotypes of pak choi vary in leaf structure and cadmium enrichment capacity, which may lead to heterogeneous spectral responses, so the range of varieties must be expanded to verify the generality of the models. Secondly, various cultivation modes, such as soil cultivation and substrate cultivation, exist in actual production. The chemical form of cadmium and the crop absorption environment in these systems differ significantly from those in hydroponic conditions, so the adaptability of the models to different production systems needs to be verified. Thirdly, environmental factors such as light, temperature, and humidity fluctuate dynamically in open-field or greenhouse cultivation and may interfere with leaf spectral characteristics, so the stability of the models must be tested under multiple environmental conditions. Finally, cadmium pollution in actual farmland is mostly low-concentration pollution or combined pollution with other heavy metals, which differs from the medium-to-high concentration, single-cadmium stress scenario set in this study. Relevant samples need to be added to verify the detection sensitivity of the models in complex pollution scenarios.

External validation across different cultivars, production systems, environmental conditions, and contamination scenarios, combined with an expanded sample size, would clarify the applicable boundaries of the model. It would also provide a solid basis for subsequent optimization measures such as cultivar correction coefficients and environmental factor compensation modules, thereby improving the practical value of the technology and facilitating the large-scale application of hyperspectral non-destructive detection in screening vegetables for heavy metal contamination.

5. Conclusions

(1): The cultivation experiments showed that, in the early stage of stress, the apparent growth characteristics of pak choi declined significantly and cadmium enrichment increased significantly, but chlorophyll content and chlorophyll fluorescence did not change significantly at this initial stage.
(2): In feature band selection, CARS and RF reduced model complexity while maintaining or improving accuracy through effective dimensionality reduction. The BiLSTM model built on FD preprocessing combined with RF feature band selection gave the best predictions (R_p² = 0.913, RMSEP = 0.032).
(3): This study combined hyperspectral imaging technology with feature wavelength selection and with machine learning and deep learning algorithms to predict the cadmium content in pak choi leaves.
(4): The models constructed in this study performed well under controlled conditions, but multi-dimensional external validation is required before they are applied in production. In addition, the total sample size of 160 was relatively limited, which may affect the generalization ability of the models to a certain extent. The sample size should therefore be expanded in future research to further validate and optimize the models. Such validation should cover different pak choi cultivars, various production systems (e.g., soil cultivation, substrate cultivation, and hydroponics), dynamic conditions of light, temperature, and humidity, and practical contamination scenarios including low-concentration and combined heavy metal contamination. This will help clarify the applicable boundaries of the models and further enhance their generalization ability and application reliability.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app16020670/s1, Figure S1: Distribution map of feature wavelengths selected by the CARS algorithm; Figure S2: Distribution map of feature wavelengths selected by the SPA algorithm; Figure S3: Distribution map of feature wavelengths selected by the RF algorithm; (a), (b), (c), and (d) represent FD, SD, MSC, and Nor preprocessing, respectively.

Author Contributions

Y.C. conceived the experimental ideas and designed the experiments. Y.C. and T.W. conducted the experiments. S.L. (Shanshan Lin) and S.L. (Shuilan Liao) carried out data processing and analysis and drafted the manuscript. S.W. revised the manuscript and supervised and guided the whole experiment. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Basic Research Projects of Public Welfare Research Institutes in Fujian Province (2025R1033002).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in the study are openly available at https://zenodo.org/records/17626908 (accessed on 25 November 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Wang, P.; Chen, H.; Kopittke, P.M.; Zhao, F.J. Cadmium contamination in agricultural soils of China and the impact on food safety. Environ. Pollut. 2019, 249, 1038–1048.
2. Sun, X.; Zhang, L.; Gu, Y.; Wang, P.; Liu, H.; Qiang, L.; Huang, Q. Nutrient-Element-Mediated Alleviation of Cadmium Stress in Plants: Mechanistic Insights and Practical Implications. Plants 2025, 14, 3081.
3. Hou, D.; Jia, X.; Wang, L.; McGrath, S.P.; Zhu, Y.G.; Hu, Q.; Zhao, F.; Bank, M.S.; O’Connor, D.; Nriagu, J. Global soil pollution by toxic metals threatens agriculture and human health. Science 2025, 388, 316–321.
4. Yuan, X.; Xue, N.; Han, Z. A Meta-Analysis of Heavy Metals Pollution in Farmland and Urban Soils in China over the Past 20 Years. J. Environ. Sci. 2021, 101, 217–226.
5. Genchi, G.; Sinicropi, M.S.; Lauria, G.; Carocci, A.; Catalano, A. The Effects of Cadmium Toxicity. Int. J. Environ. Res. Public Health 2020, 17, 3782.
6. Zhang, D.; Liu, X.; Zhang, Y.; Ye, J.; Yi, Q. Effects of Arbuscular Mycorrhizal Fungi on the Physiological Responses and Root Organic Acid Secretion of Tomato (Solanum lycopersicum) Under Cadmium Stress. Horticulturae 2025, 11, 1204.
7. Zhou, J.; Zhu, J.-G.; Xiao, P.; Wang, K.-L.; Xu, Q.; Wu, M.-X.; Pan, Y.-Z. Physiological and Multi-Omics Analysis in Leaves of Solanum americanum in Response to Cd Toxicity. Plants 2025, 14, 2131.
8. Kim, J.; Lee, J.; Jang, Y.; Lee, S.; Lee, W.M.; Wi, S.; Yoon, H.I. Elucidating Genetic Mechanisms of Summer Stress Tolerance in Chinese Cabbage through GWAS and Phenotypic Analysis. Agronomy 2024, 14, 1960.
9. Li, Y.; Liu, G.F.; Ma, L.M.; Liu, T.K.; Zhang, C.W.; Xiao, D.; Zheng, H.K.; Chen, F.; Hou, X.L. A chromosome-level reference genome of non-heading Chinese cabbage [Brassica campestris (syn. Brassica rapa) ssp. chinensis]. Hortic. Res. 2020, 7, 212.
10. Xia, W.; Liao, Y.; Chen, X.; Li, L.; Shi, Y.; Liu, Y.; Zhang, J.; Fu, J. The Impact of Cd Pollution on Arbuscular Mycorrhizal Fungal Communities in Paddy Fields. Plants 2025, 14, 2501.
11. Codex Standard 193-1995; Codex General Standard for Contaminants and Toxins in Food and Feed. Codex Alimentarius Commission: Rome, Italy, 1995.
12. GB 2762-2022; National Health Commission of the People’s Republic of China, State Administration for Market Regulation. National Food Safety Standard: Limit of Pollutants in Food. China Standards Publishing House: Beijing, China, 2022.
13. Zhang, S.; Zhu, Y.; Wang, M.; Fei, T. Selection of the Optimal Spectral Resolution for the Cadmium-Lead Cross Contamination Diagnosing Based on the Hyperspectral Reflectance of Rice Canopy. Sensors 2019, 19, 3889.
14. Zhang, B.; Guo, B.; Zou, B.; Wei, W.; Lei, Y.; Li, T. Retrieving Soil Heavy Metals Concentrations Based on GaoFen-5 Hyperspectral Satellite Image at an Opencast Coal Mine, Inner Mongolia, China. Environ. Pollut. 2022, 300, 118981.
15. Wang, Y.; Sun, J.; Wu, Z.; Jia, Y.; Dai, C. Application of Non-Destructive Technology in Plant Disease Detection: Review. Agriculture 2025, 15, 1670.
16. Choi, J.-Y.; Lee, M.; Lee, D.U.; Choi, J.H.; Lee, M.-A.; Min, S.G.; Park, S.H. Non-destructive monitoring of qualitative properties of salted cabbage using hyperspectral image analysis. LWT 2024, 203, 116329.
17. Li, L.; Huang, G.; Wu, J.; Yu, Y.; Zhang, G.; Su, Y.; Wang, X.; Chen, H.; Wang, Y.; Wu, D. Combine photosynthetic characteristics and leaf hyperspectral reflectance for early detection of water stress. Front. Plant Sci. 2025, 16, 1520304.
18. Zhang, J.; Wang, M.; Yang, K.; Zhao, H. Inversion monitoring of heavy metal pollution in corn crops based on ZY-1 02D hyperspectral imaging. Microchem. J. 2025, 208, 112305.
19. Wang, T.; Wei, H.; Zhou, C.; Gu, Y.; Li, R.; Chen, H.; Ma, W. Estimating Cadmium Concentration in the Edible Part of Capsicum Annuum Using Hyperspectral Models. Environ. Monit. Assess. 2017, 189, 548.
20. Yi, X.; Wen, X.; Lan, A.; Dai, Q.; Yan, Y.; Zhang, Y.; Yao, Y. Monitoring Cadmium Content in the Leaves of Field Pepper and Eggplant in a Karst Area Using Hyperspectral Remote Sensing Data. Sustainability 2023, 15, 3508.
21. Sun, J.; Wu, M.; Hang, Y.; Lu, B.; Wu, X.; Chen, Q. Estimating Cadmium Content in Lettuce Leaves Based on Deep Brief Network and Hyperspectral Imaging Technology. J. Food Process Eng. 2019, 42, e13293.
22. Shen, T.; Zhang, C.; Liu, F.; Wang, W.; Lu, Y.; Chen, R.; He, Y. High-Throughput Screening of Free Proline Content in Rice Leaf under Cadmium Stress Using Hyperspectral Imaging with Chemometrics. Sensors 2020, 20, 3229.
23. Zhang, X.; Sun, W.; Cen, Y.; Zhang, L.; Wang, N. Predicting cadmium concentration in soils using laboratory and field reflectance spectroscopy. Sci. Total Environ. 2019, 650, 321–334.
24. Wu, C.; Liu, M.; Liu, X.; Wang, T.; Wang, L. Developing a New Spectral Index for Detecting Cadmium-Induced Stress in Rice on a Regional Scale. Int. J. Environ. Res. Public Health 2019, 16, 4811.
25. Aguilar-Ariza, A.; Sotta, N.; Fujiwara, T.; Guo, W.; Kamiya, T. A multi-target regression method to predict element concentrations in tomato leaves using hyperspectral imaging. Plant Phenomics 2024, 6, 0146.
26. Sun, J.; Shi, L.; Cheng, J.; Dai, C.; Wu, X. Hyperspectral imaging for trace cadmium prediction in lettuce leaves. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2025, 344, 126735.
27. Zhang, X.Z. Plant chlorophyll content determination: acetone ethanol mixture method. Liaoning Agric. Sci. 1986, 3, 26–28.
28. Zhang, Y.; Wu, W.; Zhou, X.; Cheng, J.-H. Non-Destructive Detection of Soybean Storage Quality Using Hyperspectral Imaging Technology. Molecules 2025, 30, 1357.
29. Meng, L.; Chen, G.; Liu, D.; Tian, N. Universal Modeling for Non-Destructive Testing of Soluble Solids Content in Multi-Variety Blueberries Based on Hyperspectral Imaging Technology. Appl. Sci. 2025, 15, 3888.
30. Li, K.; Guo, Y.; Zhong, H.; Jin, Y.; Li, B.; Fang, H.; Yao, L.; Zhao, C. Rapid Identification of Dendrobium Species Using Near-Infrared Hyperspectral Imaging Technology. Sensors 2025, 25, 5625.
31. Li, S.; Sun, R.; Li, X.; Li, Y.; Zhao, L.; Huang, X.; Xu, Y. Estimation of Amino Acid and Tea Polyphenol Content of Tea Fresh Leaves Based on Fractional-Order Differential Spectroscopy. Appl. Sci. 2025, 15, 5792.
32. Park, M.-S.; Faqeerzada, M.A.; Jang, S.H.; Kim, H.; Lee, H.; Kim, G.; Cho, Y.-S.; Hwang, W.-H.; Kim, M.S.; Baek, I.; et al. Detection of Abiotic Stress in Potato and Sweet Potato Plants Using Hyperspectral Imaging and Machine Learning. Plants 2025, 14, 3049.
33. Yang, H.; Chen, Q.; Qian, J.; Li, J.; Lin, X.; Liu, Z.; Fan, N.; Ma, W. Determination of Dry-Matter Content of Kiwifruit before Harvest Based on Hyperspectral Imaging. AgriEngineering 2024, 6, 52–63.
34. Tang, Z.; Ma, S.; Qi, H.; Zhang, X.; Zhang, C. Nondestructive Detection of Rice Milling Quality Using Hyperspectral Imaging with Machine and Deep Learning Regression. Foods 2025, 14, 1977.
35. Zhang, L.; Liu, C.; Han, J.; Yang, Y. Variety Identification of Corn Seeds Based on Hyperspectral Imaging and Convolutional Neural Network. Foods 2025, 14, 3052.
36. Lan, T.; Shen, S.; Yuan, H.; Jiang, Y.; Tong, H.; Ye, Y. A Rapid Prediction Method of Moisture Content for Green Tea Fixation Based on WOA-Elman. Foods 2022, 11, 2928.
37. Yan, X.; Liu, S.; Wang, S.; Cui, J.; Wang, Y.; Lv, Y.; Li, H.; Feng, Y.; Luo, R.; Zhang, Z.; et al. Predictive Analysis of Linoleic Acid in Red Meat Employing Advanced Ensemble Models of Bayesian and CNN-Bi-LSTM Decision Layer Fusion Based Hyperspectral Imaging. Foods 2024, 13, 424.
38. Tian, W.; Zang, L.; Li, Y.; Peng, C.; Zhang, J.; Shao, H.; Xian, R.; Sun, R.; Liu, F.; Tan, H.; et al. A Transformer-based framework with generative spectral augmentation for online monitoring of hyaluronic acid fermentation. Carbohydr. Polym. 2025, 369, 124278.
39. Chang, P.Y.; Wang, S.L.; Wang, T.; Chen, Y.K. Study on phenotypic characteristics, photosynthetic ability, and cadmium enrichment ability of Chinese cabbage under cadmium stress. Jiangsu Agric. Sci. 2024, 52, 164–172.
40. Wang, T.; Huang, Y.Y.; Chen, Y.K.; Liao, S.L. Effects of cadmium stress on growth, physiological characteristics and cadmium enrichment and transport of lettuce. J. Northwest A&F Univ. (Nat. Sci. Ed.) 2024, 52, 115–124.
41. Gao, W. Hyperspectral Characteristics and Simulation of Some Physiological Parameters of Leafy Vegetables Under Cd Stress. Ph.D. Thesis, Southwest University, Chongqing, China, 2012.
42. Liu, L.; Zhang, W.J.; Wang, W.H.; Zhang, Y.; Li, Q.; Han, D.; Li, J.J.; Lin, L. Estimation models for spectral response and cadmium contents in leaves of Brassica napus L. Chin. J. Oil Crop Sci. 2019, 41, 46–52.
43. Chen, N.; Feng, H.L.; Yang, Y.D.; Chen, P.; Ren, T.B.; Jia, F.F.; Liu, G.S. Establishment of hyperspectral prediction model for cadmium content in flue-cured tobacco leaves. J. Agric. Resour. Environ. 2021, 38, 570–575.

Figure 1. Schematic of the hyperspectral acquisition setup.

Figure 2. Technical roadmap.

Figure 3. Comparison of the growth phenotypes of pak choi with different cadmium concentration treatments.

Figure 4. Effect of different cadmium concentration treatments on the chlorophyll content in pak choi leaves. Different letters above the columns indicate significant differences among treatments (p < 0.05).

Figure 5. Effect of cadmium treatment on chlorophyll fluorescence F₀. The colour scale for F₀ runs from 0 (blue) to 3000 (red).

Figure 6. Spectral reflectance of pak choi leaves under cadmium stress.

Figure 7. Scatter plot of the predicted vs. actual values for the FD–RF–BiLSTM model. Blue dots represent the training set samples, red dots represent the prediction set samples, and the dashed line denotes an ideal reference line (y = x) for evaluating prediction accuracy.

Table 1. Statistical analysis of cadmium content in pak choi with different cadmium concentration treatments.

Cadmium Mass Concentration (mg/L)	Minimum Value (mg/g)	Maximum Value (mg/g)	Mean (mg/g)	Standard Deviation (mg/g)
0	0	0	0	0
25	0.10	0.20	0.14	0.03
50	0.15	0.25	0.20	0.03
100	0.20	0.30	0.26	0.03

Table 2. Effect of different cadmium concentration treatments on the chlorophyll fluorescence parameters of pak choi.

Cadmium Mass Concentration (mg/L)	F_v/F_m	F_v′/F_m′	ΦPSII	NPQ	qP
0	0.85 ± 0.00 a	0.80 ± 0.02 a	0.45 ± 0.03 a	0.50 ± 0.04 a	0.57 ± 0.05 a
25	0.84 ± 0.01 ab	0.78 ± 0.01 ab	0.46 ± 0.03 a	0.43 ± 0.01 b	0.59 ± 0.04 a
50	0.83 ± 0.00 b	0.77 ± 0.01 b	0.47 ± 0.04 a	0.45 ± 0.01 b	0.62 ± 0.05 a
100	0.84 ± 0.02 ab	0.79 ± 0.02 ab	0.47 ± 0.06 a	0.37 ± 0.04 c	0.60 ± 0.07 a

Note: Different lowercase letters indicate significant differences among treatments (p < 0.05).

Table 3. Feature band selection results.

Feature Wavelength Selection Method	Preprocessing Method	Number of Wavelengths
CARS	FD	63
CARS	SD	28
CARS	MSC	37
CARS	Nor	23
SPA	FD	5
SPA	SD	6
SPA	MSC	11
SPA	Nor	13
RF	FD	37
RF	SD	23
RF	MSC	24
RF	Nor	44

Table 4. PLSR prediction models for pak choi leaf cadmium content established by different feature wavelength extraction algorithms.

Modelling Method	Latent Variables	Training Set R_c²	Training Set RMSEC	Prediction Set R_p²	Prediction Set RMSEP
Raw	11	0.794	0.040	0.833	0.041
FD-CARS	20	0.941	0.023	0.782	0.045
SD-CARS	9	0.827	0.037	0.769	0.045
MSC-CARS	20	0.918	0.026	0.894	0.034
Nor-CARS	13	0.867	0.033	0.818	0.039
FD-SPA	5	0.587	0.052	0.732	0.046
SD-SPA	6	0.613	0.051	0.787	0.043
MSC-SPA	11	0.777	0.041	0.873	0.034
Nor-SPA	12	0.815	0.038	0.866	0.038
FD-RF	11	0.784	0.040	0.811	0.040
SD-RF	14	0.762	0.042	0.762	0.045
MSC-RF	20	0.840	0.036	0.892	0.034
Nor-RF	11	0.811	0.038	0.873	0.034

Table 5. RFR prediction models for pak choi leaf cadmium content established by different feature wavelength extraction algorithms.

Modelling Method	Training Set R_c²	Training Set RMSEC	Prediction Set R_p²	Prediction Set RMSEP
Raw	0.905	0.030	0.686	0.060
FD-CARS	0.933	0.025	0.874	0.038
SD-CARS	0.932	0.025	0.903	0.034
MSC-CARS	0.938	0.024	0.855	0.041
Nor-CARS	0.916	0.028	0.837	0.044
FD-SPA	0.902	0.030	0.838	0.043
SD-SPA	0.906	0.030	0.867	0.039
MSC-SPA	0.918	0.027	0.839	0.043
Nor-SPA	0.916	0.028	0.805	0.048
FD-RF	0.945	0.023	0.896	0.035
SD-RF	0.943	0.023	0.901	0.034
MSC-RF	0.937	0.024	0.836	0.044
Nor-RF	0.921	0.027	0.819	0.046

Table 6. Elman prediction models for pak choi leaf cadmium content established by different feature wavelength extraction algorithms.

Modelling Method	Training Set R_c²	Training Set RMSEC	Prediction Set R_p²	Prediction Set RMSEP
Raw	0.845	0.038	0.856	0.041
FD-CARS	0.945	0.022	0.817	0.046
SD-CARS	0.925	0.026	0.866	0.039
MSC-CARS	0.924	0.026	0.826	0.045
Nor-CARS	0.917	0.028	0.858	0.040
FD-SPA	0.840	0.038	0.864	0.040
SD-SPA	0.850	0.037	0.879	0.037
MSC-SPA	0.877	0.034	0.845	0.042
Nor-SPA	0.925	0.026	0.822	0.045
FD-RF	0.977	0.014	0.796	0.049
SD-RF	0.925	0.026	0.854	0.041
MSC-RF	0.915	0.028	0.649	0.064
Nor-RF	0.918	0.027	0.900	0.034

Table 7. BiLSTM prediction models for pak choi leaf cadmium content established by different feature wavelength extraction algorithms.

Modelling Method	Training Set R_c²	Training Set RMSEC	Prediction Set R_p²	Prediction Set RMSEP
Raw	0.773	0.046	0.860	0.040
FD-CARS	0.940	0.023	0.877	0.038
SD-CARS	0.922	0.027	0.835	0.044
MSC-CARS	0.882	0.033	0.870	0.039
Nor-CARS	0.922	0.028	0.870	0.039
FD-SPA	0.808	0.042	0.872	0.039
SD-SPA	0.819	0.041	0.899	0.034
MSC-SPA	0.751	0.048	0.863	0.040
Nor-SPA	0.850	0.037	0.898	0.034
FD-RF	0.946	0.022	0.913	0.032
SD-RF	0.917	0.028	0.903	0.034
MSC-RF	0.873	0.034	0.866	0.039
Nor-RF	0.904	0.030	0.844	0.042

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Chen, Y.; Wang, T.; Lin, S.; Liao, S.; Wang, S. Detection of Cadmium Content in Pak Choi Using Hyperspectral Imaging Combined with Feature Selection Algorithms and Multivariate Regression Models. Appl. Sci. 2026, 16, 670. https://doi.org/10.3390/app16020670

