Article

Research on Non-Destructive Detection Method and Model Optimization of Nitrogen in Facility Lettuce Based on THz and NIR Hyperspectral

1
School of Agricultural Engineering, Jiangsu University, Zhenjiang 212013, China
2
College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling 712100, China
3
Huai’an Chaimihe Agricultural Technology Co., Ltd., Huai’an 223006, China
*
Author to whom correspondence should be addressed.
Agronomy 2025, 15(10), 2261; https://doi.org/10.3390/agronomy15102261
Submission received: 23 August 2025 / Revised: 19 September 2025 / Accepted: 21 September 2025 / Published: 24 September 2025
(This article belongs to the Special Issue Crop Nutrition Diagnosis and Efficient Production)

Abstract

Considering the growing demand for modern facility agriculture, it is essential to develop non-destructive technologies for assessing lettuce nutritional status. To overcome the limitations of traditional methods, which are destructive and time-consuming, this study proposes a multimodal non-destructive nitrogen detection method for lettuce based on multi-source imaging. The approach integrates terahertz time-domain spectroscopy (THz-TDS) and near-infrared hyperspectral imaging (NIR-HSI) to achieve rapid and non-invasive nitrogen detection. Spectral imaging data of lettuce samples under different nitrogen gradients (20–150%) were acquired using a THz-TDS system (0.2–1.2 THz) and a NIR-HSI system (1000–1600 nm), with image segmentation applied to remove background interference. During data processing, Savitzky–Golay smoothing, MSC (for THz data), and SNV (for NIR data) were employed for combined preprocessing, and sample partitioning was performed using the SPXY algorithm. Subsequently, the SCARS, iPLS, and IRIV algorithms were applied for THz feature selection, while the RF, SPA, and ICO methods were used for NIR feature screening, followed by nitrogen content prediction modeling with LS-SVM and KELM. Furthermore, small-sample learning was utilized to fuse crop feature information from the two modalities, providing a more comprehensive and effective detection strategy. The results demonstrated that the THz-based model with SCARS-selected power spectrum features and an RBF-kernel LS-SVM achieved the best predictive performance among the THz models (R² = 0.96, RMSE = 0.20), while the NIR-based model with ICO features and an RBF-kernel LS-SVM achieved the highest accuracy among the NIR models (R² = 0.967, RMSE = 0.193). The fusion model, combining SCARS and ICO features, exhibited the best overall performance, with training accuracy of 96.25% and prediction accuracy of 95.94%.
This dual-spectral technique leverages the complementary responses of nitrogen in molecular vibrations (THz) and organic chemical bonds (NIR), significantly enhancing model performance. To the best of our knowledge, this is the first study to realize the synergistic application of THz and NIR spectroscopy in nitrogen detection of facility-grown lettuce, providing a high-precision, non-destructive solution for rapid crop nutrition diagnosis.

1. Introduction

With the large-scale development of facility agriculture, real-time and precise monitoring of crop nutritional status has become a core aspect for optimizing fertilization decisions and improving resource utilization efficiency. As one of the most critical elements for plant growth, nitrogen is closely related to crop development [1]. For fast-growing leafy vegetable crops like lettuce, accurate nitrogen management can effectively ensure yield, quality, and economic benefits [2]. Traditional nitrogen detection relies on destructive chemical methods such as the Kjeldahl method [3] and indophenol blue colorimetry [4]. These approaches are time-consuming, costly, and unable to dynamically track physiological changes in individual plants, making continuous measurement impractical [5]. To overcome this bottleneck, non-destructive detection technologies need to evolve toward multi-modal sensing fusion, synergistically leveraging the complementary advantages of different spectral and imaging techniques to achieve comprehensive analysis of crop biochemical parameters.
Spectroscopy, owing to its rapidity, efficiency, non-destructiveness, and wide applicability, has matured in the field of crop nutrient monitoring [6,7,8]. It provides a viable pathway for the dynamic monitoring of nitrogen levels in lettuce. Its core advantage lies in unified spatial-spectral analysis, which enables simultaneous interpretation of the spatial distribution of nitrogen-containing compounds in leaves and of the spectral response features in specific wavelength bands. Many researchers have also found spectroscopy well suited to detecting leaf nitrogen content. Kühn applied different levels of nitrogen and phosphorus treatments to three plants and used spectroscopy to achieve accurate predictions [9]. Karoojee sprayed foliar fertilizers with different N and P levels on Dendrobium officinale and used spectroscopy to evaluate the changes in leaf nutrient elements [10]. These results confirm that spectroscopy is a reasonable means of evaluating the content of basic substances at the leaf scale [11].
Near-infrared spectroscopy (NIR) and terahertz spectroscopy (THz) have become prominent research focuses due to their unique mechanisms. In recent years, extensive studies utilizing these technologies have established them as mainstream methods for quantitative and qualitative analysis of agricultural products and food [12,13,14,15]. A large number of researchers have conducted effective evaluations of the quality attributes of various fruits using near-infrared hyperspectral technology [16,17,18]. Liu investigated the chlorophyll variation mechanism during matcha processing and achieved real-time and quantitative monitoring of chlorophyll using near-infrared (NIR) spectroscopy [19]. Arslan combined near-infrared (NIR) spectroscopy with multiple algorithms to evaluate the antioxidant activity of black wolfberry (Lycium ruthenicum) [20]. Yang fused NIR hyperspectral imagery with deep features to effectively estimate nitrogen levels in wheat leaves at near-ground level [21]. Shi mapped cucumber chlorophyll distribution via NIR hyperspectral data, correlating it with HPLC measurements to accurately diagnose nitrogen deficiency [22]. Mao combined reflectance from selected wavelengths with image texture and morphological features for machine learning-based lettuce canopy nitrogen detection [23]. Tang developed a BP-Adaboost model using NIR spectra to classify rubber tree leaf nitrogen with high accuracy [24]. Azadnia R further established nonlinear in situ prediction models for apple tree nitrogen, phosphorus, and potassium by preprocessing visible-NIR spectra and extracting sensitive wavelengths [25].
However, near-infrared spectroscopy is constrained by moisture absorption interference and shallow detection depth, which limits its capability to resolve internal leaf structures and dry matter distribution [26]. Attention has therefore shifted to terahertz spectroscopy.
As a relatively novel and promising sensing technology, terahertz time-domain spectroscopy (THz-TDS) offers a new pathway to overcome the aforementioned limitations. Characterized by low photon energy, deep penetration, and fingerprint spectral characteristics, terahertz (THz) radiation has become a preferred method for rapid, non-destructive detection in agriculture, demonstrating excellent performance in quantitative and qualitative analysis of food [27,28]. Terahertz waves have seen extensive applications in diverse agricultural sectors, including plant health monitoring [29], leaf water content assessment [30,31], seed testing [32,33,34], pesticide detection [35,36,37], identification of hazardous substances [38,39,40,41], disease detection [37], and soil composition analysis [42]. Additionally, since terahertz pulses can effectively penetrate the surface structure of plant leaves to directly characterize rotational energy level transitions in nitrogen-related organic macromolecules [43,44], they provide opportunities for directly detecting internal leaf nutrients. Zang utilized terahertz spectroscopic imaging to quantitatively monitor the spatial variability of leaf water, solids, and gas content, enabling non-destructive nutrient monitoring [45]. Zhang collected terahertz spectral data from tomato leaves under varying nitrogen levels and established a machine learning model that predicts the nitrogen content of tomato leaves [46].
Given that a single sensing modality may not comprehensively reflect crop nutritional status, this study proposes to integrate terahertz spectroscopy's strength in detecting internal crop traits with hyperspectral imaging's capacity for capturing external characteristics. By combining feature information across the two modalities, the study attempts to obtain a more holistic and effective detection methodology. After spectral data are acquired through the near-infrared hyperspectral and terahertz imaging systems, the workflow proceeds through sample processing, dataset splitting, feature selection, and model construction. Subsequently, the fused terahertz-hyperspectral features are trained within a deep learning framework, ultimately establishing a feature-fusion model that capitalizes on the complementary strengths of both technologies.

2. Materials and Methods

2.1. Sample Cultivation, Processing, and Collection

The experimental samples used Italian annual bolt-resistant lettuce cultivated in a Venlo-type greenhouse at Jiangsu University (32.2° N, 119.5° E). Seeds were germinated uniformly under identical conditions, and seedlings at the 4–5 true leaf stage with consistent morphology were transplanted into 15 cm-diameter pots containing perlite substrate to ensure standardized initial growth conditions. Plants were subjected to four nitrogen stress gradients (20%, 60%, 100%, and 150% nitrogen levels), with the 100% group serving as the control using modified Yamazaki formula nutrient solution, totaling 100 plants (25 per gradient). Daily volume-equivalent differential-nitrogen irrigation was administered each morning while maintaining optimal greenhouse temperature and humidity. After 40 days of cultivation, 2–3 intact undamaged leaves per plant were excised and stored in sealed plastic bags, yielding 200 leaf samples for analysis.

2.2. Determination of Sample Nitrogen Content

The collected lettuce leaf samples underwent nitrogen content quantification by the Kjeldahl method after terahertz data acquisition. To eliminate moisture interference with nitrogen detection while preserving sample structural integrity, samples were freeze-dried using a Christ Alpha 1-2 LD Plus instrument (Martin Christ Co., Ltd., Osterode, Germany). Following dehydration, samples were pulverized into powder with a RETSCH MM400 ball mill (RETSCH Co., Ltd., Düsseldorf, Germany), then digested into test solutions using a TD20-HG08SM-108G digestion furnace (Beijing Haifuda Technology Co., Ltd., Beijing, China) for subsequent total nitrogen measurement with a UDK159 Kjeldahl nitrogen analyzer (VELP Co., Ltd., Via Stazione, Italy). This sequential processing ensured accurate reference data acquisition while maintaining sample physicochemical stability.

2.3. Near Infrared Hyperspectral Imaging Data Acquisition

The hyperspectral data was collected using the HSI Analyzer system (Shanghai Wuling Optoelectronics Technology Co., Ltd., Shanghai, China). The overall structure diagram of the hyperspectral imaging system is shown in Figure 1.
After being prepared, the lettuce sample was placed on the stage of the hyperspectral imaging system. Under illumination from a stable light source, it generated reflected light signals, which were collected by the lens and transmitted to the spectral splitting unit. Following spectral decomposition, these signals were sequentially acquired line by line by the detector. As the stage moved at a constant speed, the system continuously recorded and stitched together the spectral information, ultimately generating a hyperspectral volume of data that integrated both spatial and spectral details, thereby providing a foundation for subsequent data processing and modeling.
The spectral data acquired by this hyperspectral imaging system in the present study covers near-infrared wavelengths ranging from 871.607 nm to 1766.322 nm, capturing both the sample’s images and spectral intensity values. This system has a spectral resolution of 3.5 nm, with a total of 256 wavelength points within this range. For the near-infrared images collected, their RGB channels correspond to specific wavelengths: Band R at 1448.899 nm (wavelength index 166), Band G at 1376.352 nm (wavelength index 143), and Band B at 1307.224 nm (wavelength index 121).

2.4. Terahertz Spectral Imaging System Data Acquisition

This study employed the TAS7400 terahertz time-domain spectroscopy (THz-TDS) system developed by Advantest Corporation (Advantest Corporation Co., Ltd., Tokyo, Japan) for sample data acquisition. The system’s configuration and working principle are illustrated in Figure 2. The THz frequency range covered 0–4 THz with a sampling interval of 0.0038 THz. To mitigate moisture interference during THz data collection, a dehumidifier was utilized to reduce the relative humidity within the sample scanning chamber to below 5% RH prior to measurements, with this humidity level rigorously maintained throughout the entire experimental process.
Prior to measurement, the system required preheating to ensure spectral signal stability, followed by sample transmittance spectral scanning. The experimental parameters were standardized according to sample dimensions: the scanning interval between X/Y axes was set to 1.5 mm with 80 steps each, totaling 6400 scanning points. Measurement frequency ranged from 0–4 THz with 3.8 GHz intervals, and sample thickness was configured at 1 mm. Post-scanning, the acquired data was generated as a three-dimensional image volume containing spatial-spectral information, as illustrated in Figure 3.

2.5. Spectral Data Analysis and Model Evaluation

2.5.1. Spectral Preprocessing Method

After acquiring NIR (near-infrared hyperspectral) and THz (terahertz spectroscopy) data, this study employed the Savitzky–Golay (S-G) smoothing algorithm and multiplicative scatter correction (MSC) algorithm to process raw spectral data. These preprocessing steps aimed to enhance modeling efficiency and accuracy by reducing noise interference, emphasizing effective information, and improving the signal-to-noise ratio (SNR). Specifically, the S-G algorithm effectively suppressed random noise while preserving spectral features through polynomial fitting within sliding windows. Concurrently, MSC minimized baseline drift and scattering effects caused by sample heterogeneity by normalizing spectra relative to a reference mean spectrum. This dual approach ensured high-fidelity spectral data extraction, mitigated signal distortion and optimized modeling performance.
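The two preprocessing steps described above can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation: the function names, the polynomial order of 2, and the edge-padding strategy are assumptions, and only the 9-point window echoes a value reported later in the paper for the THz data.

```python
import numpy as np

def savgol_smooth(y, window=9, polyorder=2):
    """Savitzky-Golay smoothing: least-squares polynomial fit in a sliding
    window, keeping the fitted value at the window centre."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Columns are offsets**0, offsets**1, ...; row 0 of the pseudo-inverse
    # gives the weights that produce the fitted value at offset 0.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    weights = np.linalg.pinv(design)[0]
    padded = np.pad(np.asarray(y, dtype=float), half, mode="edge")  # assumed edge handling
    return np.convolve(padded, weights[::-1], mode="valid")

def msc(spectra, reference=None):
    """Multiplicative scatter correction: regress each spectrum on a reference
    (the mean spectrum by default) and remove the fitted offset and slope."""
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra)
    for i, s in enumerate(spectra):
        slope, intercept = np.polyfit(ref, s, 1)
        corrected[i] = (s - intercept) / slope
    return corrected
```

A typical call would smooth each spectrum first and then pass the stacked, smoothed spectra (one row per sample) to `msc`.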

2.5.2. Dataset Splitting

After preprocessing, the sample set was divided into a calibration set (used for model development) and a prediction set (used to test predictions). This study employed the Random Sampling (RS) algorithm and the Sample set Partitioning based on joint X-Y distance (SPXY) algorithm to partition the dataset at a ratio of 4:1, giving 80 samples in the calibration set and 20 samples in the prediction set.
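For readers unfamiliar with SPXY, the sketch below shows the usual formulation: pairwise distances in spectral space and in reference-value space are each scaled by their maximum, summed, and then used in a Kennard-Stone style max-min selection. The function name and arguments are hypothetical, and the paper does not state which variant or implementation was used.

```python
import numpy as np

def spxy_split(X, y, n_cal):
    """Return (calibration_indices, prediction_indices) chosen by SPXY."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(len(X), -1)
    # Pairwise Euclidean distances in X space and in y space.
    dx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    dy = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=2)
    d = dx / dx.max() + dy / dy.max()  # joint X-Y distance
    # Start from the two most distant samples.
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    # Repeatedly add the sample whose nearest selected neighbour is farthest away.
    while len(selected) < n_cal:
        min_d = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_d))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining)
```

With 100 samples and `n_cal=80`, this reproduces the 4:1 split size described in the text.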

2.5.3. Selection of Characteristic Frequency Bands

Spectral raw data typically contains a large number of variables, and utilizing all variables for model establishment is not only time-consuming and resource-intensive but also prone to degrading model accuracy due to the inclusion of redundant or irrelevant information. Therefore, preprocessing the data to screen critical variables is essential for improving modeling efficiency and enhancing predictive precision.
In this study, the Stability Competitive Adaptive Reweighted Sampling (SCARS), Interval Partial Least Squares (iPLS), and Iteratively Retained Informative Variables (IRIV) algorithms were employed to screen characteristic frequency bands in the terahertz spectra. Interval Combination Optimization (ICO), the Successive Projections Algorithm (SPA), and Random Frog (RF) were employed to screen characteristic wavelengths in the near-infrared hyperspectral data.

2.5.4. Modeling Algorithm

After dimensionality reduction and feature variable extraction from the two types of spectral data, separate prediction models for lettuce nitrogen content were established using the Least Squares Support Vector Machine (LS-SVM) and Kernel Extreme Learning Machine (KELM). For LS-SVM modeling of both spectral datasets, two different kernel functions—Linear Kernel and Radial Basis Function (RBF) Kernel—were employed, while KELM modeling exclusively adopted the RBF Kernel. Specifically, the regularization parameter (C) and kernel parameter (s) were set to 100 and 10 for near-infrared hyperspectral data, and to 100 and 0.1 for terahertz spectral data, respectively.
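As a rough illustration of LS-SVM regression with an RBF kernel, the sketch below solves the standard LS-SVM linear system directly. It rests on several assumptions: the names are hypothetical, the paper's regularization parameter C is mapped to `gamma`, and the kernel parameter s is treated as the squared bandwidth `sigma2` in exp(-||x - z||² / sigma2). The paper does not specify the kernel convention, so the defaults (100 and 10, the NIR settings) are only indicative.

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    """RBF kernel matrix between rows of A and rows of B (assumed convention)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma2)

def lssvm_fit(X, y, gamma=100.0, sigma2=10.0):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    system = np.zeros((n + 1, n + 1))
    system[0, 1:] = 1.0
    system[1:, 0] = 1.0
    system[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.asarray(y, dtype=float)))
    solution = np.linalg.solve(system, rhs)
    return solution[0], solution[1:]  # bias b, dual weights alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma2=10.0):
    """Prediction: sum_i alpha_i * k(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b
```

Because training reduces to one linear solve, the approach is convenient for small calibration sets such as the one used here.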
It is particularly noteworthy that after establishing the two spectral models, this study further integrates them through a small-sample learning approach. Specifically, partial crop samples’ terahertz and hyperspectral characteristic wavelengths were selected and normalized to eliminate the influence of different dimensions, after which they were concatenated into a single sample. The small-sample learning model employed a metric learning method, and the overall framework is illustrated in Figure 4.
Considering that metric learning primarily requires image data as input, which are typically represented as two-dimensional matrices, whereas the extracted spectral features are one-dimensional data, a transformation method was adopted to convert them into two-dimensional form for subsequent processing. To achieve this, the one-dimensional spectral feature vector was multiplied by its transpose, thereby preserving both the form and spectral information, as expressed in Equation (1):
$$D = X^{T}X = \begin{bmatrix} x_1^2 & x_1x_2 & \cdots & x_1x_n \\ x_2x_1 & x_2^2 & \cdots & x_2x_n \\ \vdots & \vdots & \ddots & \vdots \\ x_nx_1 & x_nx_2 & \cdots & x_n^2 \end{bmatrix} \tag{1}$$
In the equation, $X = (x_1, x_2, \ldots, x_n)$ represents the one-dimensional spectral feature data, and $X^{T}$ denotes its transpose.
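The transformation from a one-dimensional feature vector to a two-dimensional matrix is an outer product, sketched below in NumPy. The function name is hypothetical. The result is always a symmetric n × n matrix whose diagonal holds the squared features.

```python
import numpy as np

def spectrum_to_matrix(x):
    """Map a 1-D feature vector (x_1, ..., x_n) to the n x n matrix D with
    D[i, j] = x_i * x_j."""
    x = np.asarray(x, dtype=float).ravel()
    return np.outer(x, x)

# Toy example with three features.
D = spectrum_to_matrix([1.0, 2.0, 3.0])
```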
For the label input, the corresponding nitrogen content values of the samples are used.
In the feature extractor module, a neural network framework is employed. Considering the limited sample size, a relatively shallow yet classical residual network, ResNet-18 with 18 layers, is selected.
In the metric space module, Euclidean distance is used to calculate the differences between features, and its formula is given as follows.
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{2}$$
For the classification module, a Softmax classifier is adopted, as it is more suitable for multi-class classification tasks.
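One common way to combine a Euclidean metric with a Softmax classifier is to apply the softmax to the negative distances between a query embedding and one representative embedding per class, as in prototypical networks. The sketch below illustrates that idea only. The paper does not state how the two modules are connected, so the prototype construction and the function names are assumptions.

```python
import numpy as np

def euclidean(x, y):
    """Euclidean distance between two feature vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

def distance_softmax(query, prototypes):
    """Class probabilities from a softmax over negative distances to
    per-class prototype embeddings (assumed design)."""
    distances = np.array([euclidean(query, p) for p in prototypes])
    logits = -distances
    logits = logits - logits.max()  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()
```

In such a setup the embeddings would come from the feature extractor, and the class with the nearest prototype receives the highest probability.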

2.5.5. Model Evaluation Method

In this study, the model performance was evaluated using the coefficient of determination (R²) and the root mean square error (RMSE), with the latter further distinguished as the root mean square error of calibration (RMSEC) and prediction (RMSEP). The calculation formulas are provided in Equations (3) and (4). Generally, R² values closer to 1 and RMSE values closer to 0 indicate better model performance. A smaller difference between RMSEC and RMSEP suggests greater model stability, while R² values near unity enhance prediction reliability.
$$R^2 = \frac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2} = 1 - \frac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2} \tag{3}$$
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (y - \hat{y})^2}{n - 1}} \tag{4}$$
In the equations, $y$ represents the measured true value of the lettuce sample; $\hat{y}$ denotes the value predicted by the model; $\bar{y}$ is the average of the true values of the samples; and $n$ is the number of samples.
Herein, $R_c^2$ and $RMSE_c$ refer to the coefficient of determination and root mean square error (RMSE) of the calibration set, respectively, while $R_p^2$ and $RMSE_p$ represent the coefficient of determination and RMSE of the prediction set, respectively. A larger coefficient of determination and a smaller RMSE indicate better performance of the established model.
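The two metrics can be computed as in the sketch below. The function names are hypothetical. R² uses the residual form on the right of Equation (3), which coincides with the explained-variance form only for least-squares fits with an intercept. RMSE uses the n − 1 denominator given in Equation (4), although n is also common in practice.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination, residual form: 1 - SSE / SST."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst

def rmse(y, y_hat):
    """Root mean square error with the n - 1 denominator used in the text."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.sum((y - y_hat) ** 2) / (len(y) - 1)))
```

Applying both functions to the calibration set and to the prediction set yields the calibration and prediction variants of each metric.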

3. Results

3.1. Frequency Selection and Segmentation of Terahertz Images

3.1.1. Frequency Selection

Since the image volume data obtained from the scanning measurements contains a large volume of data, its processing is relatively complex and unfavorable for further analysis. Therefore, the amount of data in the image volume data needs to be reduced to facilitate subsequent exploration.
From the frequency dimension, the measurement range is 0–4 THz with an interval of 3.8 GHz, resulting in 1049 frequency sampling points. Among these, frequency points with low correlation to the measurement target can be discarded. By grouping the spectrum into intervals of 0.5 THz, the averaged terahertz images in each interval were obtained, as shown in Figure 5.
As illustrated in Figure 5, the sample information within the 0–2 THz range is relatively complete and easy to process, whereas the images in the 2–4 THz range are blurred, contain numerous irrelevant details, and are significantly affected by noise, which reduces data quality and complicates processing. Accordingly, the frequency range of terahertz images to be analyzed was limited to 0–2 THz, thereby reducing the data volume by half and potentially improving the accuracy of subsequent analysis.
To further evaluate the data, several points were randomly selected from the images, and their frequency-dependent relationships in the 0–2 THz range are shown in Figure 6.
From these results, it can be observed that in the 0–0.2 THz range, the sample and background data exhibit almost no differences, indicating that this range cannot effectively reflect sample features and should therefore be excluded. Similarly, in the 1.2–2 THz range, the fluctuations of the sample and background data become increasingly similar and eventually converge, suggesting limited research value for this interval. Through comparative analysis, the final frequency range for investigation in this study was determined as 0.2–1.2 THz.
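The frequency-domain reduction described in this subsection amounts to two array operations: averaging the image cube over 0.5 THz intervals for visual inspection, and cropping it to the retained 0.2–1.2 THz band. The sketch below assumes a cube laid out as (rows, columns, frequencies) and a matching frequency axis in THz. The layout and the function names are assumptions.

```python
import numpy as np

def interval_mean_images(cube, freqs, width=0.5, f_max=4.0):
    """Average the image cube over consecutive frequency intervals of `width` THz."""
    edges = np.arange(0.0, f_max + width / 2, width)
    images = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_interval = (freqs >= lo) & (freqs < hi)
        images.append(cube[..., in_interval].mean(axis=-1))
    return np.stack(images)  # shape: (n_intervals, rows, cols)

def crop_band(cube, freqs, f_lo=0.2, f_hi=1.2):
    """Keep only the frequency slices inside [f_lo, f_hi] THz."""
    keep = (freqs >= f_lo) & (freqs <= f_hi)
    return cube[..., keep], freqs[keep]
```

For an 80 × 80 scan with 3.8 GHz spacing, `interval_mean_images` returns eight averaged images covering 0–4 THz.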

3.1.2. Image Segmentation

When processing terahertz image data of a sample at a single frequency, each image consists of 80 × 80 pixels, totaling 6400 data points. Such a large amount of data introduces excessive variables, complicating the analysis procedure and reducing processing efficiency. To address this issue, the average value of all pixels in a terahertz image at a given frequency was used to represent the sample data at that frequency. This approach significantly reduces the complexity of data processing and improves the efficiency of model development. Figure 7 shows the differences among data from different measurement regions within the 0.2–1.2 THz frequency range. In Figure 7, “Raw data” refers to the unprocessed original averaged image data after measurement, which contains both sample and background regions; “Sample data” represents the averaged data of the sample region, excluding the background; while “Background” corresponds to the averaged data of the background region, excluding the sample.
Because the raw data are affected by irrelevant background information, the measured values cannot accurately reflect the true characteristics of the sample. In certain frequency ranges, the signals are even distorted, which severely compromises the reliability of sample information. To mitigate this problem, an image segmentation method was applied to separate the sample region from the background in the terahertz images, thereby improving the correlation of sample data.
Before edge extraction, grayscale conversion and binarization were performed. After the conversion, edge extraction was carried out using the Canny operator, and the results are presented in Figure 8. Finally, the background-removed data were averaged and normalized. The resulting three-dimensional average view of the sample in the 0.2–1.2 THz frequency range is shown in (f), which represents the processed sample data region required for subsequent analysis.
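Once a sample mask is available, background removal and averaging reduce to a masked mean per frequency. The sketch below substitutes a simple intensity threshold for the grayscale, binarization, and Canny edge-extraction pipeline described above, so it is a stand-in rather than the authors' procedure. It also assumes that sample pixels are darker than the background in the chosen image. The names are hypothetical.

```python
import numpy as np

def threshold_mask(image, threshold=None):
    """Binary sample mask from one image; True marks assumed sample pixels.
    Uses a plain threshold, not Canny edge extraction."""
    image = np.asarray(image, dtype=float)
    if threshold is None:
        threshold = 0.5 * (image.min() + image.max())
    return image < threshold

def masked_mean_spectrum(cube, mask):
    """Average the spectra of all masked pixels; cube is (rows, cols, freqs)."""
    return cube[mask].mean(axis=0)

# Toy cube: uniform background with a darker 4 x 4 "leaf" region.
cube = np.ones((8, 8, 5))
cube[2:6, 2:6, :] = 0.2
mask = threshold_mask(cube[..., 0])
spectrum = masked_mean_spectrum(cube, mask)
```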

3.2. Preprocessing of Spectral Data

3.2.1. Terahertz Spectral Data Preprocessing

When applying the SG smoothing algorithm with different window widths, the R² value of the model initially increases and then decreases as the window width expands. A lower RMSE value indicates higher data accuracy. After a comparative analysis of preprocessing results with various window widths, a window width of 9 points was selected for SG smoothing. Figure 9a presents the comparison of terahertz power spectra before and after SG smoothing.
The Multiplicative Scatter Correction (MSC) algorithm can effectively eliminate baseline shifts and offsets among samples, thereby enhancing spectral information related to component content. As shown in Figure 9b, the comparison before and after MSC preprocessing under the same gradient demonstrates that the corrected sample spectra remain aligned with the reference spectrum.

3.2.2. Near Infrared Hyperspectral Data Preprocessing

In the near-infrared images collected, the RGB channels correspond to the following wavelengths: Band R—1448.899 nm, Band G—1376.352 nm, and Band B—1307.224 nm, with the corresponding band indices of 166, 143, and 121, respectively. The raw data acquired by the hyperspectral imaging system are shown in Figure 10a.
As illustrated in Figure 10a, the extracted raw data at both ends of the near-infrared range are significantly affected by noise, leading to data distortion in those regions. To minimize the impact of noise, the study range was restricted to 1000–1600 nm, covering 180 spectral bands. After selecting this range, SG smoothing was applied to the raw data, with an optimal window width of 7 points per iteration. At this parameter, the R² value reached 0.9889 and the RMSE achieved its lowest value of 4.4987 × 10⁻⁹. Subsequently, multiplicative scatter correction (MSC) was applied to samples within the same gradient, and the differences among samples at different gradient levels are presented in Figure 10b.
In Figure 10b, significant differences in spectral intensity exist among samples of different gradients. At the same wavelength, the spectral intensity increased with higher nitrogen levels. The raw spectral intensity data were then converted to reflectance, and the data within the selected range were further processed using the standard normal variate (SNV) transformation algorithm to eliminate adverse effects on the near-infrared spectra. The calculation formula is as follows:
$$X_{snv} = \frac{X - \bar{X}}{\sqrt{\dfrac{\sum_{k=1}^{m} (X_k - \bar{X})^2}{m - 1}}}$$
In this formula, $\bar{X} = \frac{1}{m}\sum_{k=1}^{m} X_k$, $m$ is the number of wavelength points, and $k = 1, 2, \ldots, m$.
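The SNV transformation standardizes each spectrum by its own mean and sample standard deviation across wavelengths. A minimal NumPy sketch, with a hypothetical function name and one spectrum per row, is:

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) using its
    own mean and standard deviation with the m - 1 denominator."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std
```

After the transformation every spectrum has zero mean and unit standard deviation, which removes additive and multiplicative differences between samples.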

3.3. Terahertz Spectral Feature Band Selection

The original terahertz spectral data usually contain a large number of variables. Using all of them to build a model is not only time-consuming and labor-intensive, but also problematic because some variables have little or no correlation with the target, which can seriously affect the accuracy of the model. To reduce data dimensionality and accurately identify spectral frequency variables that characterize nitrogen content in lettuce, it is necessary to apply feature extraction algorithms to screen key frequency bands after spectral preprocessing.

3.3.1. SCARS-Based Feature Band Selection

The stability competitive adaptive reweighted sampling (SCARS) algorithm uses variable stability as the selection criterion, with more stable variables being more likely to be retained. In this study, the number of sampling iterations in the SCARS algorithm was set to 50, and the results eventually converged to stability. Taking the power spectrum data as an example, the algorithm output is shown in Figure 11.
At the 39th iteration, the root mean square error of cross-validation (RMSECV) of the power spectrum model reached its minimum value of 0.2942, after which the error gradually increased. Therefore, the subset of feature variables obtained at the 39th iteration was selected as the optimal variable subset, yielding six terahertz feature bands significantly associated with nutrient content. The extraction process for absorbance feature bands was similar, and the results are presented in Table 1.

3.3.2. iPLS-Based Feature Band Selection

The interval partial least squares (iPLS) method divides the full spectrum into equidistant sub-intervals, builds PLS models for each sub-interval, and calculates the RMSECV values. After identifying the optimal sub-interval, the method expands and combines it with neighboring high-accuracy intervals centered around the sub-interval showing the largest difference from the full-spectrum RMSECV, thereby determining the best predictive interval combination. In this study, the iPLS algorithm divided the frequency points within the 0.2–1.2 THz range into 15 equal parts. Under this partitioning, two intervals (the 5th and the 10th) had RMSECV values below the set threshold, as indicated by the dashed line in Figure 12. This threshold value was obtained from a model built with nine PLS principal components, where the minimum RMSECV reached 0.3656. The modeling results with different numbers of principal components are shown in Figure 12.
As shown in Table 2, the RMSECV values for the 5th and 10th intervals were 0.3356 and 0.3489, respectively, both significantly lower than the threshold of 0.3656. Their corresponding frequency ranges were 0.4654–0.5302 THz and 0.8125–0.8735 THz. Considering all factors, the final power spectrum feature frequency range selected by the iPLS algorithm was 0.8125–0.8735 THz, covering 17 frequency points. The process of extracting the feature frequency range for absorbance was similar. Ultimately, the 12th interval, spanning 0.9384–0.9956 THz and containing 16 frequency points, was selected. The corresponding RMSECV value for this interval was 0.3691, below the threshold of 0.3975, which was obtained from a model built with seven PLS principal components. The results are summarized in Table 3.
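The interval-scanning step of iPLS can be sketched as below: the variables are split into equal-width intervals, a PLS model is cross-validated on each interval, and the per-interval RMSECV values are compared. The sketch uses a compact NIPALS PLS1 written in NumPy, contiguous unshuffled folds, and a fixed number of components. All of these, along with the function names, are assumptions, and the threshold comparison and interval expansion described above are not implemented.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """Minimal PLS1 (NIPALS); returns regression coefficients and the means."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        p = Xk.T @ t / tt
        c = yk @ t / tt
        Xk = Xk - np.outer(t, p)  # deflate X
        yk = yk - c * t           # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean

def rmsecv(X, y, n_comp, n_folds=5):
    """Root mean square error of k-fold cross-validation."""
    idx = np.arange(len(y))
    sq_err = 0.0
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        coef, x_mean, y_mean = pls1_fit(X[train], y[train], n_comp)
        pred = (X[fold] - x_mean) @ coef + y_mean
        sq_err += np.sum((y[fold] - pred) ** 2)
    return float(np.sqrt(sq_err / len(y)))

def ipls_scan(X, y, n_intervals=15, n_comp=3):
    """Return (first_index, last_index, RMSECV) for each equal-width interval."""
    intervals = np.array_split(np.arange(X.shape[1]), n_intervals)
    return [(int(iv[0]), int(iv[-1]), rmsecv(X[:, iv], y, min(n_comp, len(iv))))
            for iv in intervals]
```

The interval with the lowest RMSECV is the natural candidate to compare against a full-spectrum threshold.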

3.3.3. IRIV-Based Feature Band Selection

The IRIV (Iteratively Retained Informative Variables) algorithm transforms the original data matrix A into a binary matrix B (where 1/0 indicates whether a variable is selected), and evaluates variable importance using RMSECV. Through multiple iterations and backward elimination, the algorithm screens out the optimal subset of variables. In this study, the parameters were set with a maximum of 10 principal factors and 5-fold cross-validation for variable selection. The running process is shown in Figure 13a.
As illustrated in Figure 13a, after the first three iterations, the number of variables dropped sharply from the original 262 to 38, essentially eliminating most non-informative and interfering variables. The number of retained variables then tended to stabilize. Ultimately, after the 7th round of backward elimination, 15 feature variables were preserved, with their distribution shown in Figure 13b.
The process of selecting absorbance feature variables was similar to that for power spectrum feature variables. The final selection results obtained by the IRIV algorithm are summarized in Table 4.

3.4. NIR Hyperspectral Feature Wavelength Selection

This section applies three commonly used feature wavelength selection algorithms, Random Frog (RF), the Successive Projections Algorithm (SPA), and Interval Combination Optimization (ICO), to select the key wavelengths that reflect differences in the NIR spectra.

3.4.1. RF-Based Feature Band Selection

The RF algorithm was applied to extract features from sample data within the 1000–1600 nm range, which contained a total of 180 variables. To minimize the influence of randomness on data analysis, the algorithm was run 1000 times. During execution, a selection probability is calculated for each variable; the higher the probability, the more important the variable. When the selection probability exceeds a predefined threshold, the variable is retained. With the threshold set at 0.2, six variables were selected, corresponding to indices 10, 24, 28, 30, 35, and 161. Their respective wavelengths were 1035.136 nm, 1086.922 nm, 1101.293 nm, 1108.413 nm, 1126.031 nm, and 1533.281 nm. The selected feature wavelengths are shown on the curve in Figure 14.
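The final thresholding step of Random Frog can be sketched as follows. The sketch covers only the conversion of selection counts into probabilities and the comparison with the 0.2 threshold, not the sampling procedure itself. The counts below are invented purely for illustration (the paper does not report them), and the function names are hypothetical.

```python
def selection_probabilities(selection_counts, n_iterations):
    """Convert per-variable selection counts into selection probabilities."""
    return {idx: count / n_iterations for idx, count in selection_counts.items()}


def retain_variables(probabilities, threshold=0.2):
    """Keep the variable indices whose selection probability exceeds the threshold."""
    return sorted(idx for idx, p in probabilities.items() if p > threshold)


# Hypothetical counts over 1000 iterations (illustrative only, not from the paper)
counts = {10: 412, 24: 305, 28: 260, 30: 233, 35: 251, 90: 150, 120: 37, 161: 208}
probs = selection_probabilities(counts, n_iterations=1000)
selected = retain_variables(probs, threshold=0.2)
print(selected)  # [10, 24, 28, 30, 35, 161]
```

With these made-up counts, variables 90 and 120 fall below the threshold and are dropped, while the six indices reported in the text are retained.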

3.4.2. SPA-Based Feature Band Selection

The Successive Projections Algorithm (SPA) was then employed to identify variable combinations with minimal redundancy and the least linear correlation among numerous spectral variables. In the SPA-based data processing, the minimum number of selected feature variables was set to 1, and the maximum to 60. The feature extraction results are shown in Figure 15. Using SPA, five feature wavelengths were ultimately selected, with an RMSE of 2.0507. Based on the selected feature indices, their corresponding positions in the full spectrum were 46, 50, 61, 67, and 104, which correspond to wavelengths of 1163.943 nm, 1177.468 nm, 1214.039 nm, 1233.65 nm, and 1351.262 nm, respectively.
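The projection stage that gives SPA its name can be sketched with NumPy. This is a minimal sketch under stated assumptions: it covers only the chain of projections from one starting column, whereas the full algorithm also compares candidate subsets by their validation RMSE. The function name `spa_projections` and the toy matrix are hypothetical.

```python
import numpy as np


def spa_projections(X, n_select, start=0):
    """Pick columns by repeatedly projecting onto the orthogonal complement."""
    P = np.asarray(X, dtype=float).copy()
    selected = [start]
    for _ in range(n_select - 1):
        v = P[:, selected[-1]]
        # Remove from every column its component along the last selected column
        P = P - np.outer(v, v @ P) / (v @ v)
        norms = np.linalg.norm(P, axis=0)
        norms[selected] = -1.0  # never pick an already selected column again
        selected.append(int(np.argmax(norms)))
    return selected


# Toy matrix: column 1 is collinear with column 0, column 2 is orthogonal to it
X = np.array([
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])
print(spa_projections(X, n_select=2))  # [0, 2]
```

Starting from column 0, the collinear column 1 projects to zero and is skipped, so the least redundant column 2 is chosen next.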

3.4.3. ICO-Based Feature Band Selection

The Interval Combination Optimization (ICO) algorithm divides the sample variables into n equally spaced intervals. Partial least squares (PLS) regression and five-fold cross-validation are then applied to calculate the root mean square error (RMSE) of each interval. Variables that reduce the model’s RMSE are retained until further changes no longer improve the model performance. The corresponding intervals are then designated as the final variable intervals selected by the algorithm. After multiple runs and comparisons, the maximum number of potential variables was set to 5, the number of matrix samples to 400, the proportion of selected submodels to 0.05, and the number of equally spaced spectral intervals to 40. The results are shown in Figure 16. From the selected intervals indicated by black bars, a total of seven feature wavelengths were identified from the near-infrared data, corresponding to indices 12, 13, 36, 37, 50, 173, and 174. Their respective wavelengths were 1042.684 nm, 1046.439 nm, 1129.523 nm, 1133.007 nm, 1177.468 nm, 1573.699 nm, and 1577.12 nm.
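One weight-update round of this interval search can be sketched as follows. The sketch is a simplification under several assumptions: a least-squares model stands in for PLS, interval combinations are drawn by independent weighted coin flips, and the later local refinement of interval widths is omitted. The function names, the toy data and the small settings (4 intervals, 100 submodels) are hypothetical; only the 0.05 ratio of best submodels is taken from the text.

```python
import numpy as np


def rmsecv(X, y, n_folds=5):
    """K-fold RMSECV of a least-squares model with intercept (stand-in for PLS)."""
    n = len(y)
    fold = np.arange(n) % n_folds
    sq_err = 0.0
    for f in range(n_folds):
        tr, te = fold != f, fold == f
        A_tr = np.column_stack([np.ones(tr.sum()), X[tr]])
        A_te = np.column_stack([np.ones(te.sum()), X[te]])
        coef, *_ = np.linalg.lstsq(A_tr, y[tr], rcond=None)
        sq_err += np.sum((y[te] - A_te @ coef) ** 2)
    return np.sqrt(sq_err / n)


def ico_round(X, y, weights, n_intervals, n_models=100, best_ratio=0.05, seed=0):
    """Sample interval combinations, keep the best submodels, return new weights."""
    rng = np.random.default_rng(seed)
    intervals = np.array_split(np.arange(X.shape[1]), n_intervals)
    masks = rng.random((n_models, n_intervals)) < weights  # sampled combinations
    errors = np.empty(n_models)
    for m, mask in enumerate(masks):
        cols = np.zeros(X.shape[1], dtype=bool)
        for k in range(n_intervals):
            if mask[k]:
                cols[intervals[k]] = True
        errors[m] = rmsecv(X[:, cols], y)
    n_best = max(1, int(round(best_ratio * n_models)))
    best = masks[np.argsort(errors)[:n_best]]
    return best.mean(axis=0)  # share of the best submodels that used each interval


# Toy data (not the NIR spectra): only the second interval (columns 3 to 5) matters
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 12))
y = X[:, 3:6].sum(axis=1) + 0.1 * rng.normal(size=40)
new_weights = ico_round(X, y, weights=np.full(4, 0.5), n_intervals=4)
print(new_weights)
```

Repeating such rounds concentrates the sampling weights on the intervals that keep appearing in the best submodels.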

3.5. Construction and Analysis of Predictive Models

3.5.1. Terahertz-Based Prediction Model for Nitrogen Content

After dimensionality reduction and feature variable selection using the SCARS, iPLS, and IRIV algorithms, prediction models of lettuce nitrogen content were established from the selected feature variables using the Least Squares Support Vector Machine (LS-SVM) and the Kernel Extreme Learning Machine (KELM). For LS-SVM, two kernel functions were adopted: the linear kernel and the RBF kernel. The modeling results of the three methods are summarized in Table 5, Table 6 and Table 7.
The results show that, across the three modeling methods and three feature extraction algorithms, modeling in the power spectrum dimension was overall superior to modeling in the absorbance dimension. With the linear kernel, the best power spectrum model was obtained with IRIV-selected features: its determination coefficients for both the calibration set and the prediction set were higher than those of the other two algorithms, and its root mean square errors (RMSE) were the lowest. In the absorbance dimension, iPLS-based feature extraction performed best, outperforming the other two algorithms on all evaluation metrics.
When using the RBF Kernel, both the power spectrum and absorbance dimensions exhibited optimal performance when feature variables were extracted using the SCARS algorithm. Again, the power spectrum dimension showed superior overall modeling performance. Further comparison of the RBF-Kernel KELM model showed that the model constructed with six SCARS-extracted variables in the power spectrum dimension achieved the best prediction performance, outperforming the IRIV-based model in the absorbance dimension. In summary, combining the power spectrum dimension with appropriate feature extraction methods can significantly enhance modeling performance and achieve higher prediction accuracy.
Figure 17 shows the scatter plots of prediction results for the above modeling methods in both the power spectrum and absorbance dimensions. From the fitting performance of the models, the predicted values of each sample are close to the regression line, indicating good predictive capability in both dimensions. Among them, the three methods all demonstrated the best predictive performance for lettuce nitrogen content when using SCARS-based features in the power spectrum dimension.
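Both regressors used above reduce to solving a linear system in the kernel matrix, which can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the authors' code: the RBF kernel is written as exp(-||a - b||² / σ²), the data are a toy sine curve rather than lettuce spectra, the LS-SVM parameters are borrowed from Table 6 only as illustrative values, and the KELM parameters `C` and `sigma2` are hypothetical because the paper does not report them.

```python
import numpy as np


def rbf_kernel(A, B, sigma2):
    """RBF kernel exp(-||a - b||^2 / sigma2) between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma2)


def lssvm_fit(X, y, gamma, sigma2):
    """Solve the LS-SVM linear system for the bias b and the multipliers alpha."""
    n = len(y)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    M[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(n) / gamma
    sol = np.linalg.solve(M, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]


def lssvm_predict(X_train, b, alpha, sigma2, X_new):
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b


def kelm_fit(X, y, C, sigma2):
    """KELM output weights (K + I / C)^-1 y; there is no bias term."""
    return np.linalg.solve(rbf_kernel(X, X, sigma2) + np.eye(len(y)) / C, y)


def kelm_predict(X_train, beta, sigma2, X_new):
    return rbf_kernel(X_new, X_train, sigma2) @ beta


# Toy one-dimensional data (not the lettuce spectra)
X = np.linspace(0.0, 6.0, 30).reshape(-1, 1)
y = np.sin(X[:, 0])

b, alpha = lssvm_fit(X, y, gamma=479.7452, sigma2=0.9964)  # values from Table 6
beta = kelm_fit(X, y, C=100.0, sigma2=1.0)                 # hypothetical values

rmse_lssvm = np.sqrt(np.mean((lssvm_predict(X, b, alpha, 0.9964, X) - y) ** 2))
rmse_kelm = np.sqrt(np.mean((kelm_predict(X, beta, 1.0, X) - y) ** 2))
print(rmse_lssvm, rmse_kelm)
```

The two models differ mainly in the bias term and its equality constraint, which LS-SVM includes and KELM omits. This is one reason their results on the same features can be close but not identical.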

3.5.2. NIR-Based Prediction Model for Nitrogen Content

The NIR hyperspectral prediction models were established following the same procedure as in Section 3.5.1.
The modeling results for the near-infrared data are summarized in Table 8, Table 9 and Table 10. For LS-SVM with the linear kernel, the best performance was achieved with the six feature variables extracted by the Random Frog (RF) algorithm, with the γ parameter set to 317.0212. The coefficient of determination (R²) and root mean square error (RMSE) were 0.9578 and 0.2312 for the calibration set, and 0.9617 and 0.2485 for the validation set, respectively. With the RBF kernel, LS-SVM performed best using the seven feature variables extracted by the Interval Combination Optimization (ICO) algorithm, with γ and σ² set to 23.2556 and 0.3164, respectively. The corresponding R² and RMSE values were 0.9677 and 0.1938 for the calibration set, and 0.9603 and 0.2620 for the validation set. Similarly, for KELM with the RBF kernel, the best model was constructed using the seven ICO-selected feature variables, achieving R² and RMSE values of 0.9581 and 0.2383 for the calibration set, and 0.9620 and 0.2628 for the validation set. The best model of each type therefore demonstrated strong predictive capability, with R² values exceeding 0.95.
On the calibration set, LS-SVM with the RBF kernel outperformed LS-SVM with the linear kernel (R² of 0.9677 versus 0.9578), whereas the two kernels performed comparably on the validation set. The best KELM model also achieved determination coefficients above 0.95, but its calibration performance remained slightly below that of the RBF-kernel LS-SVM model. The corresponding prediction scatter plots are shown in Figure 18.
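For reference, the two evaluation metrics quoted throughout this section can be computed as in the sketch below. It assumes the common definition R² = 1 − SS_res/SS_tot. The paper does not state which definition it uses, and the measured and predicted values here are invented for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not data from the paper)
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(measured, predicted), 4))  # 0.98
print(round(rmse(measured, predicted), 4))       # 0.1581
```

Applying the same two functions to the calibration set and to the validation set gives the paired values reported in the tables.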

3.6. Fusion Model Based on Few-Shot Learning

The spectral information network was trained using 200 leaf samples, expanded to 1000 via noise augmentation. The calibration and validation sets were split 4:1, and the input dimensions were slightly larger than 100 × 100 pixels. Training was performed in PyTorch (v2.3.0) using ResNet-18 for 30 epochs. Taking the SCARS + ICO combination as an example, the calibration set reached 96.25% accuracy and the validation set 95.94%. Figure 19 shows that the accuracy and loss curves change more slowly after about 15 epochs and begin to oscillate and converge after about 20 epochs, with the residual modules accelerating convergence while maintaining performance. Across all fusion model combinations, training accuracies exceeded 85%, indicating that the approach is effective and reliable for few-shot classification of spectral data. The test results of the fusion models with different combinations are shown in Figure 20.
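The data preparation described above (200 samples expanded to 1000, then a 4:1 split) can be sketched with NumPy. The sketch rests on assumptions that the paper does not confirm: the noise is taken to be additive Gaussian, each sample receives four noisy copies, and the noise level, the 13-column placeholder array and the function names are hypothetical. The network training itself is not reproduced.

```python
import numpy as np


def augment_with_noise(X, copies=4, noise_std=0.01, seed=0):
    """Return the original samples followed by `copies` noisy versions of each."""
    rng = np.random.default_rng(seed)
    noisy = [X + rng.normal(0.0, noise_std, X.shape) for _ in range(copies)]
    return np.concatenate([X] + noisy, axis=0)


def split_4_to_1(X, seed=0):
    """Shuffle the samples and split them 4:1 into calibration and validation sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = len(X) * 4 // 5
    return X[idx[:cut]], X[idx[cut:]]


# Placeholder feature array: 200 samples, 13 columns (illustrative shape only)
samples = np.zeros((200, 13))
augmented = augment_with_noise(samples)
calibration, validation = split_4_to_1(augmented)
print(augmented.shape, calibration.shape, validation.shape)
# (1000, 13) (800, 13) (200, 13)
```

The same two helpers work for arrays of any shape whose first axis indexes samples, so image-like inputs could be passed in the same way.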

4. Conclusions

This study employed terahertz (THz) and near-infrared (NIR) hyperspectral techniques to predict nitrogen content in lettuce. THz spectral analysis determined the effective detection range to be 0.2–1.2 THz. Background data were removed using the Canny edge detection operator, and preprocessing with SG smoothing and MSC algorithms improved both data quality and detection efficiency.
For THz feature selection, the preprocessed spectral data were divided into calibration and validation sets using the SPXY algorithm. Feature frequency bands were then screened using the SCARS, iPLS, and IRIV algorithms. SCARS extracted key frequencies mainly at 0.24, 0.32, 0.40, 0.87, and 0.90 THz; iPLS selected the 10th interval (0.8125–0.8735 THz, 17 frequency points) for the power spectrum and the 12th interval (0.9384–0.9956 THz, 16 frequency points) for absorbance; IRIV retained 15 power spectrum features and 12 absorbance features. Based on these selected variables, predictive models were constructed using LS-SVM and KELM. The LS-SVM model achieved the best performance with the RBF kernel and SCARS-selected features, yielding calibration set R² and RMSE of 0.96 and 0.19, and validation set R² and RMSE of 0.96 and 0.20. Similarly, the KELM model performed best with SCARS-selected features, with calibration set R² and RMSE of 0.97 and 0.19, and validation set R² and RMSE of 0.96 and 0.20.
For NIR hyperspectral data, the spectral range was set to 1000–1600 nm. After SG smoothing and SNV preprocessing, and sample set division via SPXY, feature wavelengths were extracted using the RF, SPA, and ICO algorithms. Both the LS-SVM and KELM models achieved optimal performance using ICO-selected features. LS-SVM yielded calibration set R² and RMSE of 0.9677 and 0.1938, and validation set R² and RMSE of 0.9603 and 0.2620. KELM yielded calibration set R² and RMSE of 0.9581 and 0.2383, and validation set R² and RMSE of 0.9620 and 0.2628, indicating that ICO features provided the best predictive capability.
The THz and NIR models were then fused using a few-shot learning method. The SCARS + ICO combination performed best, with a calibration set accuracy of 90.25% and a validation set accuracy of 85.24%. All fusion combinations achieved training accuracies above 70%, confirming the feasibility and effectiveness of this fusion approach.
In conclusion, combining THz and NIR hyperspectral data with feature extraction and modeling techniques, including LS-SVM, KELM, and fusion models, effectively predicts lettuce nitrogen content. Among the feature selection strategies, SCARS and ICO demonstrated the highest predictive performance, providing a reliable method for rapid and non-destructive nitrogen detection in lettuce.

Author Contributions

Conceptualization, J.Z. (Jingbo Zhi), X.Z. and J.H.; methodology, Y.Z.; software, J.Z. (Jingbo Zhi) and J.G.; validation, T.L., J.Z. (Jialiang Zheng) and W.L.; formal analysis, Y.Z.; investigation, Y.Z.; resources, X.Z.; data curation, J.Z. (Jingbo Zhi); writing—original draft preparation, Y.Z., J.Z. (Jialiang Zheng), J.G. and W.L.; writing—review and editing, Y.Z., J.Z. (Jialiang Zheng) and W.L.; visualization, T.L.; supervision, X.Z. and J.H.; project administration, X.Z.; funding acquisition, X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program for Young Scientists (Grant No. 2022YFD2000200), the Jiangsu Province Industry Forward-looking Program Project (Grant No. BE2023017), the Agricultural Equipment Department of Jiangsu University (Grant No. NZXB20210106), and the National Key Research and Development Program of China (Grant No. 2022YFD2002302).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

We thank the School of Agricultural Engineering, Jiangsu University, for providing essential facilities and technical support. We are grateful to our laboratory colleagues for their assistance with THz-TDS and NIR-HSI experiments, and to the team members who contributed to lettuce cultivation, nitrogen gradient preparation, and data collection. We also sincerely acknowledge the valuable comments and kind attention of the anonymous reviewers and the editor, which have greatly improved the quality of this work.

Conflicts of Interest

Author Wei Liu was employed by the company Huai’an Chaimihe Agricultural Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

ICO      Interval Combination Optimization
iPLS     Interval Partial Least Squares
IRIV     Iteratively Retained Informative Variables
KELM     Kernel Extreme Learning Machine
LS-SVM   Least Squares Support Vector Machine
MSC      Multiplicative Scatter Correction
NIR-HSI  Near-Infrared Hyperspectral Imaging
RF       Random Frog
RBF      Radial Basis Function
RS       Random Sampling
SCARS    Stability Competitive Adaptive Reweighted Sampling
SG       Savitzky–Golay
SNV      Standard Normal Variate
SPA      Successive Projections Algorithm
SPXY     Sample set Partitioning based on joint X-Y distance
THz-TDS  Terahertz Time-Domain Spectroscopy

References

  1. Kergoat, L.; Lafont, S.; Arneth, A.; Le Dantec, V.; Saugier, B. Nitrogen controls plant canopy light-use efficiency in temperate and boreal ecosystems. J. Geophys. Res. Biogeosci. 2008, 113. [Google Scholar] [CrossRef]
  2. Li, P.; Sun, Z.; Yang, Y.; Lu, M.; Li, H.; Yan, H.; Jin, H.; Song, Y. Determining optimal nitrogen concentration intervals throughout lettuce growth using fluorescence parameters. Comput. Electron. Agric. 2024, 226, 109438. [Google Scholar] [CrossRef]
  3. Lynch, J.M.; Barbano, D.M. Kjeldahl nitrogen analysis as a reference method for protein determination in dairy products. J. AOAC Int. 1999, 82, 1389–1398. [Google Scholar] [CrossRef]
  4. Novamsky, I.; Van Eck, R.; Van Schouwenburg, C.H.; Walinga, I. Total nitrogen determination in plant material by means of the indophenol-blue method. Neth. J. Agric. Sci. 1974, 22, 3–5. [Google Scholar] [CrossRef]
  5. Zhang, Y.K.; Luo, B.; Pan, D.Y.; Song, P.; Lu, W.C.; Wang, C.; Zhao, C.J. Estimation of Canopy Nitrogen Content of Soybean Crops Based on Fractional Differential Algorithm. Spectrosc. Spectr. Anal. 2018, 38, 3221–3230. [Google Scholar]
  6. Prananto, J.A.; Minasny, B.; Weaver, T. Near infrared (NIR) spectroscopy as a rapid and cost-effective method for nutrient analysis of plant leaf tissues. Adv. Agron. 2020, 164, 1–49. [Google Scholar]
  7. Zahir, S.A.D.M.; Jamlos, M.F.; Omar, A.F.; Jamlos, M.A.; Mamat, R.; Muncan, J.; Tsenkova, R. Review-Plant nutritional status analysis employing the visible and near-infrared spectroscopy spectral sensor. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2024, 304, 123273. [Google Scholar] [CrossRef] [PubMed]
  8. Lee, W.S.; Ehsani, R. Sensing systems for precision agriculture in Florida. Comput. Electron. Agric. 2015, 112, 2–9. [Google Scholar] [CrossRef]
  9. Kühn, P.; Proß, T.; Römermann, C.; Wesche, K.; Bruelheide, H. Using near-infrared spectroscopy to predict nitrogen and phosphorus concentrations of herbarium specimens under different storage conditions. Plant Methods 2024, 20, 19. [Google Scholar] [CrossRef] [PubMed]
  10. Karoojee, S.; Noypitak, S.; Abdullakasim, S. Determination of total nitrogen content in fresh leaves and leaf powder of Dendrobium orchids using near-infrared spectroscopy. Hortic. Environ. Biotechnol. 2021, 62, 31–40. [Google Scholar] [CrossRef]
  11. Davrinche, A.; Haider, S. Intra-specific leaf trait responses to species richness at two different local scales. Basic Appl. Ecol. 2021, 55, 20–32. [Google Scholar] [CrossRef]
  12. Zhang, J.; Liu, Z.; Pu, Y.; Wang, J.; Tang, B.; Dai, L.; Yu, S.; Chen, R. Identification of transgenic agricultural products and foods using NIR spectroscopy and hyperspectral imaging: A review. Processes 2023, 11, 651. [Google Scholar] [CrossRef]
  13. Zareef, M.; Chen, Q.; Hassan, M.M.; Arslan, M.; Hashim, M.M.; Ahmad, W.; Kutsanedzie, F.Y.H.; Agyekum, A.A. An overview on the applications of typical non-linear algorithms coupled with NIR spectroscopy in food analysis. Food Eng. Rev. 2020, 12, 173–190. [Google Scholar] [CrossRef]
  14. Zhang, J.; Dai, L.; Huang, Z.; Gong, C.; Chen, J.; Xie, J.; Qu, M. Corn Seed Quality Detection Based on Spectroscopy and Its Imaging Technology: A Review. Agriculture 2025, 15, 390. [Google Scholar] [CrossRef]
  15. Mu, X.; Lu, Y. Non-destructive detection of spotted wing Drosophila infestation in blueberry fruit using hyperspectral imaging technology. Agric. Commun. 2025, 3, 100096. [Google Scholar] [CrossRef]
  16. Wang, J.; Guo, Z.; Zou, C.; Jiang, S.; El-Seedi, H.R.; Zou, X. General model of multi-quality detection for apple from different origins by Vis/NIR transmittance spectroscopy. J. Food Meas. Charact. 2022, 16, 2582–2595. [Google Scholar] [CrossRef]
  17. Shen, T.; Zou, X.; Shi, J.; Li, Z.; Huang, X.; Wu, C. Determination geographical origin and flavonoids content of goji berry using near-infrared spectroscopy and chemometrics. Food Anal. Methods 2016, 9, 68–79. [Google Scholar]
  18. Ding, Y.; Yan, Y.; Li, J.; Chen, X.; Jiang, H. Classification of tea quality levels using near-infrared spectroscopy based on CLPSO-SVM. Foods 2022, 11, 1658. [Google Scholar] [CrossRef]
  19. Liu, L.; Zareef, M.; Wang, Z.; Li, H.; Chen, Q.; Ouyang, Q. Monitoring chlorophyll changes during Tencha processing using portable near-infrared spectroscopy. Food Chem. 2023, 412, 135505. [Google Scholar] [CrossRef]
  20. Bilal, M.; Zou, X.; Arslan, M.; Tahir, H.E.; Sun, Y.; Aadil, R.M. Near-infrared spectroscopy coupled chemometric algorithms for prediction of antioxidant activity of black goji berries (Lycium ruthenicum Murr.). J. Food Meas. Charact. 2018, 12, 2366–2376. [Google Scholar] [CrossRef]
  21. Yang, B.; Ma, J.; Yao, X.; Cao, W.; Zhu, Y. Estimation of Leaf Nitrogen Content in Wheat Based on Fusion of Spectral Features and Deep Features from Near Infrared Hyperspectral Imagery. Sensors 2021, 21, 613. [Google Scholar] [CrossRef]
  22. Shi, J.; Zou, X.; Zhao, J.; Wang, K.; Chen, Z.; Huang, X.; Zhang, D.; Holmes, M. Nondestructive Diagnostics of Nitrogen Deficiency by Cucumber Leaf Chlorophyll Distribution Map Based on near Infrared Hyperspectral Imaging. Sci. Hortic. 2012, 138, 190–197. [Google Scholar] [CrossRef]
  23. Mao, H.; Gao, H.; Zhang, X.; Kumi, F. Nondestructive Measurement of Total Nitrogen in Lettuce by Integrating Spectroscopy and Computer Vision. Sci. Hortic. 2015, 184, 1–7. [Google Scholar] [CrossRef]
  24. Tang, R.; Chen, K.; Jiang, C.; Li, C. Determining the content of nitrogen in rubber trees by the method of NIR spectroscopy. J. Appl. Spectrosc. 2017, 84, 627–632. [Google Scholar] [CrossRef]
  25. Azadnia, R.; Rajabipour, A.; Jamshidi, B.; Omid, M. New approach for rapid estimation of leaf nitrogen, phosphorus, and potassium contents in apple-trees using Vis/NIR spectroscopy based on wavelength selection coupled with machine learning. Comput. Electron. Agric. 2023, 207, 107746. [Google Scholar] [CrossRef]
  26. Knyazikhin, Y.; Schull, M.A.; Stenberg, P.; Mõttus, M.; Rautiainen, M.; Yang, Y.; Marshak, A.; Latorre Carmona, P.; Kaufmann, R.K.; Lewis, P.; et al. Hyperspectral remote sensing of foliar nitrogen content. Proc. Natl. Acad. Sci. USA 2013, 110, E185–E192. [Google Scholar] [CrossRef]
  27. Han, C.; Qu, F.; Wang, X.; Zhai, X.; Li, J.; Yu, K.; Zhao, Y. Terahertz spectroscopy and imaging techniques for herbal medicinal plants detection: A comprehensive review. Crit. Rev. Anal. Chem. 2024, 54, 2485–2499. [Google Scholar] [CrossRef] [PubMed]
  28. Afsah-Hejri, L.; Hajeb, P.; Ara, P.; Ehsani, R.J. A comprehensive review on food applications of terahertz spectroscopy and imaging. Compr. Rev. Food Sci. Food Saf. 2019, 18, 1563–1621. [Google Scholar] [CrossRef]
  29. Wang, X.; Wu, Q.Y.S.; Zhang, N.; Ngo, A.C.Y.; Tanadi, J.; Khoo, E.H.; Zhu, Q.; Ke, L. Non-invasive early monitoring plant health using terahertz spectroscopy. J. Mater. Sci. Mater. Electron. 2024, 35, 1346. [Google Scholar] [CrossRef]
  30. You, X.; Pagay, V.; Withayachumnankul, W. Non-Contact Monitoring of Plant Leaf Water Status Using Terahertz Waves. J. Infrared Millim. Terahertz Waves 2025, 46, 1–15. [Google Scholar] [CrossRef]
  31. Pagano, M.; Hoshika, Y.; Gennari, F.; Jacopo, M.; Elena, M.; Andrea, V.; Elena, P.; Sharmin, S.; Alessandro, T.; Alessandra, T. Probing ozone effects on European hornbeam (Carpinus betulus L. and Ostrya carpinifolia Scop.) leaf water content through THz imaging and dynamic stomatal response. Sci. Total Environ. 2024, 956, 177358. [Google Scholar] [CrossRef]
  32. Yang, J.; Li, B.; Yang, A.; Sun, Z.; Wan, X.; Ouyang, A.; Liu, Y. A generalized model for seed internal quality detection based on terahertz imaging technology combined with image compressed sensing and improved-real ESRGAN. Microchem. J. 2025, 208, 112410. [Google Scholar]
  33. Hu, J.; Xu, S.; Huang, Z.; Liu, W.; Zheng, J.; Liao, Y. Rapid Non-Destructive Detection of Rice Seed Vigor via Terahertz Spectroscopy. Agriculture 2024, 15, 34. [Google Scholar] [CrossRef]
  34. Jiang, W.; Wang, J.; Lin, R.; Chen, R.; Chen, W.; Xie, X.; Hsiung, K.L.; Chen, H.Y. Machine learning-based non-destructive terahertz detection of seed quality in peanut. Food Chem. X 2024, 23, 101675. [Google Scholar] [CrossRef]
  35. Qin, B.; Li, Z.; Hu, F.; Hu, C.; Chen, T.; Zhang, H.; Zhao, Y. Highly sensitive detection of carbendazim by using terahertz time-domain spectroscopy combined with metamaterial. IEEE Trans. Terahertz Sci. Technol. 2018, 8, 149–154. [Google Scholar] [CrossRef]
  36. Chen, Z.; Zhang, Z.; Zhu, R.; Xiang, Y.; Yang, Y.; Harrington, P.B. Application of terahertz time-domain spectroscopy combined with chemometrics to quantitative analysis of imidacloprid in rice samples. J. Quant. Spectrosc. Radiat. Transf. 2015, 167, 1–9. [Google Scholar] [CrossRef]
  37. Lee, D.K.; Kim, G.; Kim, C.; Jhon, Y.M.; Kim, J.H.; Lee, T.; Son, J.H.; Seo, M. Ultrasensitive detection of residual pesticides using THz near-field enhancement. IEEE Trans. Terahertz Sci. Technol. 2016, 6, 389–395. [Google Scholar] [CrossRef]
  38. Afsah-Hejri, L.; Hajeb, P.; Ehsani, R. Application of ozone for degradation of mycotoxins in food: A review. Compr. Rev. Food Sci. Food Saf. 2020, 19, 1777–1808. [Google Scholar] [CrossRef]
  39. Liu, W.; Zhao, P.; Wu, C.; Liu, C.; Yang, J.; Zheng, L. Rapid determination of aflatoxin B1 concentration in soybean oil using terahertz spectroscopy with chemometric methods. Food Chem. 2019, 293, 213–219. [Google Scholar] [CrossRef] [PubMed]
  40. Tong, Y.; Wang, S.; Han, K.; Song, X.; Zhang, W.; Ye, Y.; Ren, X. Development of a novel metal grating and its applications of terahertz spectroscopic detection of CuSO4 in fruit. Food Anal. Methods 2021, 14, 1590–1599. [Google Scholar] [CrossRef]
  41. Zhang, X.; Wang, Y.; Zhou, Z.; Zhang, Y.; Wang, X. Detection method for tomato leaf mildew based on hyperspectral fusion terahertz technology. Foods 2023, 12, 535. [Google Scholar] [CrossRef]
  42. Siles, J.V.; Cooper, K.B.; Lee, C.; Lin, R.H.; Chattopadhyay, G.; Mehdi, I. A new generation of room-temperature frequency-multiplied sources with up to 10× higher output power in the 160-GHz–1.6-THz range. IEEE Trans. Terahertz Sci. Technol. 2018, 8, 596–604. [Google Scholar] [CrossRef]
  43. Feng, L.; Wu, B.; Chen, S.; Zhang, C.; He, Y. Application of visible/near-infrared hyperspectral imaging with convolutional neural networks to phenotype aboveground parts to detect cabbage Plasmodiophora brassicae (clubroot). Infrared Phys. Technol. 2022, 121, 104040. [Google Scholar] [CrossRef]
  44. Feng, C.H.; Otani, C. Terahertz spectroscopy technology as an innovative technique for food: Current state-of-the-Art research advances. Crit. Rev. Food Sci. Nutr. 2021, 61, 2523–2543. [Google Scholar] [CrossRef] [PubMed]
  45. Zang, Z.; Wang, J.; Cui, H.-L.; Yan, S. Terahertz Spectral Imaging Based Quantitative Determination of Spatial Distribution of Plant Leaf Constituents. Plant Methods 2019, 15, 106. [Google Scholar] [CrossRef] [PubMed]
  46. Zhang, X.; Duan, C.; Wang, Y.; Gao, H.; Hu, L.; Wang, X. Research on a Nondestructive Model for the Detection of the Nitrogen Content of Tomato. Front. Plant Sci. 2023, 13, 1093671. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Hyperspectral imaging system.
Figure 2. Composition of the terahertz measurement system.
Figure 3. Image volume data.
Figure 4. Metric learning framework.
Figure 5. Terahertz images at different frequencies (The pseudocolor scale represents the relative reflectivity intensity of the THz signal, with warmer colors indicating higher reflectivity and cooler colors indicating lower reflectivity. Scale bars denote normalized reflectivity values for direct comparison across frequencies).
Figure 6. Research Scope Selection.
Figure 7. Comparison of data at different points.
Figure 8. Image processing and edge extraction results. (a) Original pseudo-color image; (b) Grayscale image; (c) Binarized image; (d) Edge extraction using Canny operator; (e) Background-removed pseudo-color image after edge extraction; (f) 3D view.
Figure 9. (a) Comparison of the terahertz power spectrum before and after smoothing; (b) comparison before and after MSC.
Figure 10. (a) Raw data and schematic diagram of data range selection; (b) values of samples at different nitrogen gradient levels; (c) SNV process.
Figure 11. SCARS algorithm results.
Figure 12. iPLS interval partitioning and modeling results with principal components (the red dashed line is the decision threshold, and the number below it represents the optimal number of principal components used to establish the PLS model in that specific frequency range).
Figure 13. (a) The result of IRIV algorithm; (b) Distribution of variables filtered by the IRIV.
Figure 14. The results of RF operation (The red line is the set threshold, which is 0.2).
Figure 15. Feature variables selected by SPA.
Figure 16. Wavelength intervals selected by ICO.
Figure 17. Prediction of 3 modeling methods based on THz. (a) Linear kernel—Power spectrum dimension; (b) Linear kernel—Absorbance dimension; (c) RBF kernel—Power spectrum dimension; (d) RBF kernel—Absorbance dimension; (e) KELM—Optimal result in power spectrum dimension; (f) KELM—Absorbance dimension.
Figure 18. Prediction of 3 Modeling Methods Based on NIR. (a) Linear kernel; (b) RBF kernel; (c) KELM.
Figure 19. The training results: (a) The accuracy result of training, (b) the loss result of training.
Figure 20. The test results of fusion model.
Table 1. Feature frequency band extraction results.
Dimension   Iterations  Variables selected  Min RMSECV  Characteristic frequency (THz)
Power       39          6                   0.2942      0.2441, 0.3052, 0.3509, 0.3929, 0.8697, 0.9002
Absorbance  34          7                   0.3181      0.2480, 0.2937, 0.3395, 0.4005, 0.6485, 0.8774, 0.9097
Table 2. Results of each interval.
Interval  Frequency (THz)  RMSECV
1         0.2022–0.2632    0.5072
2         0.2632–0.3319    0.4524
3         0.3319–0.3967    0.5387
4         0.3967–0.4654    0.6353
5         0.4654–0.5302    0.3356
6         0.5302–0.5958    0.4965
7         0.5958–0.6638    0.4693
8         0.6638–0.7286    0.5497
9         0.7286–0.8125    0.4399
10        0.8125–0.8735    0.3489
11        0.8735–0.9384    0.4624
12        0.9384–0.9956    0.3853
13        0.9956–1.0643    0.5861
14        1.0643–1.1292    0.6674
15        1.1292–1.1978    0.4067
Table 3. Final selection results of the iPLS algorithm.
Dimension   Interval  Frequency (THz)  Frequency points  RMSECV  Selection threshold
Power       10        0.8125–0.8735    17                0.3489  0.3656
Absorbance  12        0.9384–0.9956    16                0.3691  0.3975
Table 4. Final selection results of the IRIV algorithm.
Dimension   Iterations  Variables selected  RMSE    Characteristic frequency (THz)
Power       7           15                  0.2752  0.2213, 0.2441, 0.2747, 0.3014, 0.3052, 0.4234, 0.4425, 0.4845, 0.6561, 0.8621, 0.9155, 0.9537, 0.9842, 0.9880, 1.0376
Absorbance  6           12                  0.2541  0.2213, 0.2441, 0.2899, 0.4120, 0.4616, 0.4845, 0.5760, 0.6561, 0.8850, 0.9308, 1.0033, 1.0567
Table 5. Results of linear-kernel function modeling based on THz (bold represents the selected optimal prediction model).
Dimension   Method  Variables  γ          Rc²     RMSEC   Rp²     RMSEP
Power       SCARS   6          467.2646   0.9472  0.2414  0.9583  0.1997
Power       iPLS    17         7918.1961  0.9511  0.2323  0.9484  0.2396
Power       IRIV    15         878.8059   0.9573  0.2268  0.9657  0.1835
Absorbance  SCARS   7          356.0443   0.9430  0.2383  0.9502  0.2284
Absorbance  iPLS    16         9491.4866  0.9451  0.2332  0.9681  0.1816
Absorbance  IRIV    17         1031.1703  0.9405  0.2427  0.9437  0.2378
Table 6. The result of RBF-Kernel function modeling by THz (Bold represents the selected optimal prediction model).

| Dimension | Method | Variables | γ/σ² | Rc² | RMSEC | Rp² | RMSEP |
|---|---|---|---|---|---|---|---|
| Power | SCARS | 6 | 479.7452/0.9964 | 0.9640 | 0.1939 | 0.9606 | 0.1986 |
| Power | iPLS | 17 | 195.1037/15.5183 | 0.9469 | 0.2256 | 0.9433 | 0.2589 |
| Power | IRIV | 15 | 18.9048/1.5270 | 0.9535 | 0.2113 | 0.9585 | 0.2058 |
| Absorbance | SCARS | 7 | 139.9541/0.7255 | 0.9591 | 0.2066 | 0.9483 | 0.2176 |
| Absorbance | iPLS | 16 | 66.8276/14.2646 | 0.9368 | 0.2481 | 0.9474 | 0.2374 |
| Absorbance | IRIV | 17 | 5.8173/3.1297 | 0.9437 | 0.2439 | 0.9364 | 0.2534 |
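To show how the two hyperparameters in the γ/σ² column enter an RBF-kernel LS-SVM, the following is a minimal sketch of the standard LS-SVM regression formulation, in which the bias b and the coefficients α are obtained from a single linear system. It is not the authors' implementation. The kernel convention K(x, z) = exp(−‖x − z‖²/σ²) is an assumption (other conventions place 2σ² in the denominator), and the function names, the synthetic sine data, and the values chosen for γ and σ² are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, sigma2):
    """RBF kernel matrix, assuming K(x, z) = exp(-||x - z||^2 / sigma2)."""
    sq_dist = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma2)


def lssvm_fit(X, y, gamma, sigma2):
    """Solve the LS-SVM system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = X.shape[0]
    system = np.zeros((n + 1, n + 1))
    system[0, 1:] = 1.0
    system[1:, 0] = 1.0
    system[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(system, rhs)
    return solution[0], solution[1:]  # bias b, coefficients alpha


def lssvm_predict(X_train, b, alpha, X_new, sigma2):
    """Predict with f(x) = sum_i alpha_i * K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b


# Synthetic one-dimensional data, not spectral measurements.
X_demo = np.linspace(0.0, 1.0, 30)[:, None]
y_demo = np.sin(2.0 * np.pi * X_demo[:, 0])

gamma_demo, sigma2_demo = 100.0, 0.1  # hypothetical hyperparameter values
b_demo, alpha_demo = lssvm_fit(X_demo, y_demo, gamma_demo, sigma2_demo)
fitted = lssvm_predict(X_demo, b_demo, alpha_demo, X_demo, sigma2_demo)
print("training RMSE:", float(np.sqrt(np.mean((fitted - y_demo) ** 2))))
```

In this formulation a larger γ weakens the regularization term I/γ, so the model follows the calibration data more closely, while σ² sets the kernel width. Both would normally be tuned by cross-validation rather than fixed by hand as in this sketch.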
Table 7. The result of KELM by THz (Bold represents the selected optimal prediction model).

| Dimension | Method | Variables | Rc² | RMSEC | Rp² | RMSEP |
|---|---|---|---|---|---|---|
| Power | SCARS | 6 | 0.9675 | 0.1913 | 0.9596 | 0.1997 |
| Power | iPLS | 17 | 0.9512 | 0.2132 | 0.9503 | 0.2204 |
| Power | IRIV | 15 | 0.9567 | 0.2083 | 0.9537 | 0.2117 |
| Absorbance | SCARS | 7 | 0.9517 | 0.2126 | 0.9494 | 0.2215 |
| Absorbance | iPLS | 16 | 0.9465 | 0.2266 | 0.9523 | 0.2044 |
| Absorbance | IRIV | 17 | 0.9604 | 0.1991 | 0.9542 | 0.2015 |
Table 8. The result of Linear-Kernel function modeling by NIR.

| Method | Variables | γ | Rc² | RMSEC | Rp² | RMSEP |
|---|---|---|---|---|---|---|
| RF | 6 | 317.0212 | 0.9578 | 0.2312 | 0.9617 | 0.2485 |
| SPA | 5 | 10.5236 | 0.9481 | 0.2452 | 0.9527 | 0.2594 |
| ICO | 7 | 768.7644 | 0.9564 | 0.2388 | 0.9541 | 0.2548 |
Table 9. The result of RBF-Kernel function modeling by NIR.

| Method | Variables | γ | σ² | Rc² | RMSEC | Rp² | RMSEP |
|---|---|---|---|---|---|---|---|
| RF | 6 | 60.7091 | 53.0736 | 0.9593 | 0.2287 | 0.9601 | 0.2454 |
| SPA | 5 | 4.5137 | 29.0079 | 0.9492 | 0.2481 | 0.9524 | 0.2491 |
| ICO | 7 | 23.2556 | 0.3164 | 0.9677 | 0.1938 | 0.9603 | 0.2620 |
Table 10. The result of KELM modeling by NIR.

| Method | Variables | Rc² | RMSEC | Rp² | RMSEP |
|---|---|---|---|---|---|
| RF | 6 | 0.9534 | 0.2427 | 0.9541 | 0.2631 |
| SPA | 5 | 0.9473 | 0.2582 | 0.9512 | 0.2699 |
| ICO | 7 | 0.9581 | 0.2383 | 0.9620 | 0.2628 |
Share and Cite

MDPI and ACS Style

Zhang, Y.; Zheng, J.; Zhi, J.; Guo, J.; Hu, J.; Liu, W.; Li, T.; Zhang, X. Research on Non Destructive Detection Method and Model Op-Timization of Nitrogen in Facility Lettuce Based on THz and NIR Hyperspectral. Agronomy 2025, 15, 2261. https://doi.org/10.3390/agronomy15102261