Article

Research on Detection Methods for Major Soil Nutrients Based on Pyrolysis-Electronic Nose Time-Frequency Domain Feature Fusion and PSO-SVM-RF Model

1
College of Engineering and Technology, Jilin Agricultural University, Changchun 130118, China
2
Key Laboratory of Bionics Engineering, Ministry of Education, Jilin University, Changchun 130022, China
3
College of Biological and Agricultural Engineering, Jilin University, Changchun 130022, China
4
Institute of Straw Return Application Technology, Jilin Academy of Agricultural Machinery, Changchun 130022, China
*
Author to whom correspondence should be addressed.
Agronomy 2025, 15(12), 2916; https://doi.org/10.3390/agronomy15122916
Submission received: 6 November 2025 / Revised: 5 December 2025 / Accepted: 17 December 2025 / Published: 18 December 2025
(This article belongs to the Topic Soil Health and Nutrient Management for Crop Productivity)

Abstract

Against the backdrop of growing demand for rapid soil testing technologies in precision agriculture, this study proposes a detection method based on pyrolysis-electronic nose sensing and machine olfaction signal analysis for the precise measurement of key soil nutrients. An electronic nose system comprising 10 metal oxide semiconductor gas sensors was constructed to collect response signals from 112 black soil samples pyrolyzed at 400 °C. Time-domain and frequency-domain features were extracted from the sensor responses to build an initial dataset of 180 features. A novel feature fusion method combining Pearson correlation coefficients (PCC) with recursive feature elimination with cross-validation (RFECV) was proposed to optimize the feature space, enhance representational power, and select key sensitive features. For predicting soil organic matter (SOM), total nitrogen (TN), available potassium (AK), and available phosphorus (AP) content, we compared support vector machine (SVM), support vector machine-random forest (SVM-RF), and particle swarm optimization-enhanced support vector machine-random forest (PSO-SVM-RF) models. PSO-SVM-RF performed best across all nutrient predictions, achieving a coefficient of determination (R²) of 0.94 for SOM and TN, with a ratio of performance to deviation (RPD) exceeding 3.8. For AK and AP, R² improved to 0.78 and 0.74, respectively. Compared to the SVM model, the root mean square error (RMSE) decreased by 25.4% and 21.6% for AK and AP, respectively, with RPD values approaching the practical threshold of 2.0. This study validated the feasibility and application potential of combining electronic nose technology with a time-frequency domain feature fusion strategy for precise quantitative analysis of soil nutrients, providing a new approach for soil fertility assessment in precision agriculture.

1. Introduction

Soil serves as the core foundation for agricultural production and ecosystem health. Key nutrient contents such as soil organic matter (SOM), total nitrogen (TN), available phosphorus (AP), and available potassium (AK) form the fundamental basis for soil fertility assessment and the formulation of scientific fertilization strategies [1,2,3]. Traditional soil nutrient testing relies on chemical analysis methods such as the potassium dichromate oxidation method and the Kjeldahl nitrogen determination method [4,5,6]. Although considered the “gold standard,” these methods are cumbersome, time-consuming, labor-intensive, costly, and prone to generating chemical waste liquids [7,8]. These limitations hinder their ability to meet the urgent demands of modern precision agriculture for large-scale, rapid, and in situ testing. Consequently, developing efficient and reliable rapid detection technologies has become a central focus in current soil nutrient research.
Electronic nose technology based on gas sensor arrays offers a novel approach for rapid soil nutrient detection, leveraging its advantages of fast response, ease of operation, and the ability to comprehensively perceive gas “fingerprints” [9]. While this technology has been applied in soil type identification and pollution assessment [10,11], significant challenges persist in achieving simultaneous high-precision detection of four core soil nutrients: SOM, TN, AP, and AK [12,13,14]. First, the complex gas composition released during soil pyrolysis exhibits strong nonlinearity and signal overlap between nutrient-specific gases and sensor responses, hindering effective nutrient separation. Second, traditional feature extraction methods are often confined to time-domain analysis, such as maximums and averages [15,16,17], which struggle to capture the frequency structure and dynamic details of signals, leading to the loss of critical discriminative information. In recent years, frequency-domain analysis has gained attention. By decomposing signals via Fast Fourier Transform (FFT), periodic or energy distribution characteristics unobservable in the time domain can be revealed [18,19]. However, systematic integration and filtering of frequency-domain analysis remain underutilized in soil nutrient detection [20]. Third, raw sensor signals exhibit low signal-to-noise ratios and abundant redundant information. High-dimensional features readily induce model overfitting, and existing feature selection methods struggle to balance feature representativeness with redundancy control. Fourth, single modeling algorithms struggle to accommodate complex multi-nutrient prediction demands, and hyperparameter adaptive optimization for hybrid models requires further exploration.
To address the aforementioned research gap, this study focuses on a typical black soil region in Northeast China and proposes a rapid soil nutrient detection method integrating pyrolysis-electronic nose sensing with multi-domain feature optimization. The specific research objectives are: (1) constructing an automated electronic nose detection platform to collect pyrolysis gas signals from soil; (2) extracting time-frequency domain features to construct a high-dimensional feature fusion set, proposing a feature importance evaluation and redundancy control strategy based on Pearson correlation coefficients (PCC) to screen key feature combinations suitable for predicting different nutrients, and then further optimizing the feature space using the RFECV method [21] to select critical sensitive features; and (3) comparing the predictive performance of SVM, SVM-RF, and PSO-SVM-RF models to establish a high-precision detection framework.
The core innovation of this study lies in: overcoming the limitations of single-dimensional information through deep integration of time-frequency domain features; achieving precise feature selection and redundancy control by combining PCC and RFECV; and then optimizing hybrid model parameters via the PSO algorithm to form a full-chain optimization solution encompassing “sensing-feature extraction-modeling.” This method can be directly applied to scenarios such as real-time monitoring of field soil fertility, decision support for precision fertilization, and dynamic assessment of cultivated land quality. It provides efficient, low-cost technical support for the large-scale promotion of precision agriculture while offering a reference for methodological innovation in rapid soil nutrient detection.

2. Materials and Methods

2.1. Research Area Overview and Soil Sample Collection

Soil samples were collected in spring 2021 at the Jilin Academy of Agricultural Sciences experimental station in Gongzhuling City, Jilin Province, located between 124°49′ E and 125°46′ E longitude and 43°31′ N and 44°15′ N latitude. Established in 1980, the base serves as an agricultural research and demonstration platform in the core region of Northeast China’s black soil belt. It has long conducted research on optimizing corn and soybean cultivation and monitoring soil fertility, possessing a well-documented history of agricultural research utilization and comprehensive field management records. The base provides technical support for China’s commercial grain production and is free from disturbances such as industrial pollution within the region. The geographical location and sampling distribution map of the study area is shown in Figure 1.

2.1.1. Basic Characteristics of the Study Area

The study area has a temperate continental monsoon climate, with an average annual temperature of 5.6 °C and annual precipitation ranging from 550 to 650 mm, concentrated between June and August. Soils are classified as black calcareous soils under the Food and Agriculture Organization (FAO) system, corresponding to black soils in China’s soil classification system. Predominantly medium loam in texture, this sole soil type exhibits high humus content, well-developed aggregate structure, and strong water and nutrient retention capacity. The entire region consists of cultivated land under a long-term corn-soybean rotation system, representing typical cropland. Prior to the growing season, conventional agricultural management practices are followed, involving the application of well-decomposed organic fertilizer and nitrogen–phosphorus–potassium compound fertilizer. With a soil pH ranging from 6.5 to 7.2, the soil is neutral and does not require lime application for acidity adjustment. Sampling occurred during the spring fallow period following a corn crop. Field vegetation consisted solely of scattered weeds such as barnyard grass and foxtail grass, with no other cultivated vegetation cover. Routine research agricultural activities within the area included sowing, irrigation, manual weeding, harvesting, and localized soil fertility monitoring.

2.1.2. Soil Sample Collection and Processing

Soil sample collection strictly adheres to the Technical Specifications for Soil Environmental Monitoring (HJ/T 166-2004) [22] and the Classification of Cultivated Land Quality (GB/T 33469-2016) [23]. To ensure spatial representativeness of samples, a pentagonal five-point sampling method was employed. Sampling units were uniformly distributed within the experimental site, avoiding interference zones such as field ridges, ditches, fertilizer piles, and plot boundaries. The sampling depth was set at the 0–20 cm tillage layer, the primary zone for crop root distribution, where soil nutrient content is highest and activity strongest. This layer directly reflects soil fertility supply capacity for crops, aligning with core agricultural production and soil fertility assessment needs. Five sub-samples were collected from each unit, mixed, and reduced to approximately 1 kg of composite sample via quartering. A total of 112 soil samples were ultimately obtained.
After sealing and labeling, samples were immediately transported back to the laboratory. They were naturally air-dried at a constant temperature of 24 °C under ventilated conditions. Plant roots, animal remains, stones, and plastic impurities were manually removed. After air-drying, the samples were ground in a mortar and sieved through a 2 mm standard sieve to obtain a homogeneous, dry soil sample with particle sizes less than 2 mm. Each sample was divided into 15 g portions and packaged into 4 cm × 6 cm resealable bags for subsequent electronic nose detection and analysis. The remaining samples were analyzed using conventional chemical methods to determine SOM, TN, AP, and AK content, serving as baseline reference values for soil nutrient content.

2.2. Electronic Nose System Configuration

The soil pyrolysis gas detection system utilizes a robotic olfactory sensing module comprising three integrated components: gas delivery, signal acquisition, and control circuits, as illustrated in Figure 2. Soil samples are heated and decomposed within a quartz chamber of a tubular pyrolysis furnace. Pyrolysis gases generated by the process are extracted through a vacuum pump-driven flow regulated via PWM speed control modules. The gas stream traverses a pneumatic circuit system consisting of a three-way quartz manifold, solenoid valves, and silicone rubber hoses. Two-position two-way valves manage pipeline connectivity, while two-position three-way valves direct airflow direction, ultimately delivering the gas to the sensor array housed in a modular reaction chamber. The sensor array detects concentration variations, transmitting resistance signals through flexible FFC cables to a signal processing circuit. This circuit provides stable operating voltage for sensors and converts resistance signals into voltage signals using voltage division principles. The processed voltage signals undergo digitization via NI data acquisition card and transmission through USB interfaces to computers. The LabVIEW application on the computer interface displays and stores real-time sensor response curves for subsequent analysis. Automation of the pneumatic system includes precise control of solenoid valve operations and vacuum pump activation via Arduino IDE-based control platforms, while temperature regulation of the pyrolysis furnace is maintained through a dedicated temperature controller.
The sensor array is the core component of an electronic nose system, and its selection directly determines system performance and detection accuracy. For volatile components such as hydrocarbons, hydrogen, polycyclic aromatic hydrocarbons, and nitrogen-containing compounds that may be released during high-temperature pyrolysis of soil [24,25,26], gas sensors capable of producing significant responses to these compounds must be selected. Given the high sensitivity and excellent selectivity exhibited by metal oxide semiconductor (MOS) gas sensors across a wide concentration range, this study employs them as the sensing core. The selection adhered to the following principles: the sensor array must possess broad-spectrum response capability, leveraging the cross-response of non-specific sensors, while also ensuring good repeatability, long-term stability, low power consumption, and low cost to guarantee the practicality and scalability of the method [27,28,29]. Ultimately, an electronic nose system utilizing the GM series MOS gas sensors manufactured by Weisheng Electronics & Technology Co., Ltd., Zhengzhou City, Henan Province, China was selected. The primary operating parameters of this sensor series (where VH denotes heating voltage and VC denotes test voltage) are detailed in Table 1.

2.3. Data Collection Process

The experiment commenced by powering on the system and setting the pyrolysis temperature to 400 °C. Once stabilized, 2 g of soil sample was precisely weighed and evenly spread across the quartz chamber. The chamber was then positioned at the center of the three-way quartz tube through its stoppered end to ensure uniform heating, followed by sealing the stoppered end to isolate the pyrolysis chamber from the external environment. The pyrolysis process lasted for 3 min.
After pyrolysis was complete, the control circuit was activated to energize the two-position two-way solenoid valves at both ends of the three-way quartz tube. Simultaneously, the vacuum pump was started and the gas flow rate was adjusted to 1 L/min. The vacuum pump drove the gas path to transfer the accumulated pyrolysis gases from the quartz tube into the reaction chamber housing the sensor array. Signals were acquired at 10 Hz for 60 s. Sensor response signals were processed through the signal conditioning circuit, converted via the NI data acquisition card, and transmitted to the host computer, where the LabVIEW program recorded and stored the data in real time.
After each acquisition, the quartz stopper and the sample boat were removed. The two-position three-way solenoid valve was energized to purge residual gas from the pipeline and chamber, followed by a 5 min gas line cleaning cycle. The system was then reset, completing the collection process for one sample. Subsequent samples were tested following the same procedure. Critical parameters (including pyrolysis temperature, duration, and sample weight) were calibrated based on preliminary experimental data to minimize interference from variables affecting sensor response [30].

2.4. Time Domain Feature Extraction

During pyrolysis, the gas components released from soil samples are captured by sensors in the electronic nose system, forming raw signal sequences. Since the response curves of individual sensors represent continuous-time series with high-dimensional data, direct modeling often leads to dimensionality issues and overfitting. To reduce data dimensions, preserve critical information, and improve computational efficiency, this study extracts nine time-domain statistical features from each response curve: Mean (MEAN), Variance (VAR), Initial Value (INI), Mean Differential Coefficient (MDCV), Relative Steady-State Mean (RSMV), Response Area (RAV), Maximum Value (MAX), Relative Change Value (RCV), and 7 s transient value (V7s). By standardizing sampling parameters—including a fixed sampling frequency of 10 Hz, a single-group signal acquisition duration of 60 s, controlled sensor operating environments, and standardized soil sample pretreatment procedures—the interference from dependent factors was minimized, ensuring the validity and comparability of time-domain characteristics. The calculation formulas for these features are as follows:
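$$\mathrm{MEAN} = \frac{1}{N}\sum_{t=1}^{N} V_t$$
$$\mathrm{VAR} = \frac{1}{N}\sum_{t=1}^{N} \left(V_t - \mathrm{MEAN}\right)^2$$
$$\mathrm{MDCV} = \frac{1}{N-1}\sum_{t=1}^{N-1} \frac{V_{t+1} - V_t}{\Delta t}$$
$$\mathrm{RSMV} = \frac{1}{T_s}\sum_{t=N-T_s+1}^{N} V_t$$
$$\mathrm{RAV} = \sum_{t=1}^{N} V_t \,\Delta t$$
$$\mathrm{RCV} = \frac{V_{\mathrm{MAX}} - V_{\mathrm{INI}}}{V_{\mathrm{INI}}}$$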
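where $V_t$ is the response voltage at sampling point $t$, $V_{\mathrm{MAX}}$ and $V_{\mathrm{INI}}$ are the maximum and initial response voltages, $T_s$ is the number of sampling points in the steady-state stage, $N$ is the total number of sampling points, and $\Delta t$ is the sampling time interval.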
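The formulas above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the authors' implementation: the function name `time_domain_features`, the steady-state length `steady_points=100` (the last 10 s of the record), the choice of sample index 70 for the 7 s transient value, and the synthetic response curve are all assumptions that the text does not specify.

```python
import math

def time_domain_features(v, fs=10.0, steady_points=100, transient_s=7.0):
    """Nine time-domain features of one sensor response curve (volts).

    steady_points (T_s) and the sample used for V7s are assumed values.
    """
    n = len(v)
    dt = 1.0 / fs                       # sampling interval (0.1 s at 10 Hz)
    mean = sum(v) / n
    ini, vmax = v[0], max(v)
    return {
        "MEAN": mean,
        "VAR": sum((x - mean) ** 2 for x in v) / n,
        "INI": ini,
        "MDCV": sum((v[t + 1] - v[t]) / dt for t in range(n - 1)) / (n - 1),
        "RSMV": sum(v[n - steady_points:]) / steady_points,
        "RAV": sum(x * dt for x in v),
        "MAX": vmax,
        "RCV": (vmax - ini) / ini,
        "V7s": v[int(round(transient_s * fs))],  # sample at t = 7 s, first sample at t = 0
    }

# Synthetic saturating response: 60 s at 10 Hz, rising from 1.0 V toward 4.0 V.
curve = [1.0 + 3.0 * (1.0 - math.exp(-(t / 10.0) / 3.0)) for t in range(600)]
feats = time_domain_features(curve)
```

Applying the function to each of the 10 sensor curves of a sample would give 90 of the 180 features per sample; the other 90 come from the frequency domain.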

2.5. Frequency Domain Feature Extraction

To further explore the frequency structure of sensor signals, this study applies the Fast Fourier Transform (FFT) to time-domain voltage signals sampled at 10 Hz, converting them to the frequency domain for feature extraction [31]. To improve spectral quality, preprocessing steps including DC component removal, linear trend elimination, and Hanning window application are performed before transformation to suppress spectral leakage and noise interference [32,33]. After FFT processing of the preprocessed signals, single-sided amplitude spectra are calculated with amplitude correction applied, ultimately yielding power spectral density (PSD) distributions within the 0–5 Hz frequency range, where 5 Hz is the Nyquist frequency for 10 Hz sampling [34]. From the frequency array and the PSD array, the PSD diagram is plotted and the following nine frequency-domain features are extracted: spectral mean (MF), spectral standard deviation (FSD), spectral skewness (SSK), spectral kurtosis (SKU), spectral variance (SV), spectral centroid (SC), dominant frequency (DF), bandwidth energy ratio (BER), and spectral entropy (SE). The calculation formulas are as follows:
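$$\mathrm{MF} = \frac{\sum_{i=1}^{N} f_i P_i}{\sum_{i=1}^{N} P_i}$$
$$\mathrm{FSD} = \sqrt{\frac{\sum_{i=1}^{N} P_i \left(f_i - \mu_f\right)^2}{\sum_{i=1}^{N} P_i}}$$
$$\mathrm{SSK} = \frac{\sum_{i=1}^{N} P_i \left(f_i - \mu_f\right)^3}{\sigma_f^{3} \sum_{i=1}^{N} P_i}$$
$$\mathrm{SKU} = \frac{\sum_{i=1}^{N} P_i \left(f_i - \mu_f\right)^4}{\sigma_f^{4} \sum_{i=1}^{N} P_i}$$
$$\mathrm{SV} = \frac{\sum_{i=1}^{N} P_i \left(f_i - \mu_f\right)^2}{\sum_{i=1}^{N} P_i}$$
$$\mathrm{SC} = \frac{\sum_{i=1}^{N} f_i P_i}{\sum_{i=1}^{N} P_i}$$
$$\mathrm{DF} = \arg\max_{f} P(f)$$
$$\mathrm{BER} = \frac{\sum_{f \in F_{\mathrm{high}}} P(f)}{\sum_{f \in F_{\mathrm{Fun}}} P(f)}$$
$$\mathrm{SE} = -\sum_{i=1}^{N} p_i \log_2\left(p_i\right), \quad \text{where } p_i = \frac{P_i}{\sum_{i=1}^{N} P_i}$$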
Here, $f_i$ denotes the $i$-th frequency component, $P_i$ represents the corresponding power spectral density value, $\mu_f$ indicates the spectral mean, $\sigma_f$ denotes the spectral standard deviation, and $N$ is the total number of frequency components.
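The preprocessing and feature definitions above can be illustrated with a short standard-library Python sketch. Several choices here are assumptions rather than details from the text: a direct discrete Fourier transform stands in for the FFT (it produces the same spectrum, only more slowly), the PSD uses a periodogram-style normalization, the high band for BER is taken as frequencies at or above a hypothetical `f_high = 1.0` Hz with the whole spectrum as the denominator, and the test signal is synthetic.

```python
import cmath
import math

def detrend_linear(x):
    """Subtract the least-squares line (this also removes the DC component)."""
    n = len(x)
    t_mean, x_mean = (n - 1) / 2, sum(x) / n
    stt = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (xi - x_mean) for t, xi in enumerate(x)) / stt
    return [xi - x_mean - slope * (t - t_mean) for t, xi in enumerate(x)]

def one_sided_psd(x, fs=10.0):
    """Hann-windowed one-sided PSD from 0 Hz to fs/2, computed by a direct DFT."""
    n = len(x)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]
    xw = [a * w for a, w in zip(detrend_linear(x), window)]
    scale = fs * sum(w * w for w in window)   # assumed periodogram normalization
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        xk = sum(xw[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        p = abs(xk) ** 2 / scale
        if 0 < k < n / 2:                     # fold negative frequencies into interior bins
            p *= 2
        freqs.append(k * fs / n)
        psd.append(p)
    return freqs, psd

def spectral_features(freqs, psd, f_high=1.0):
    """Nine frequency-domain features; f_high is a hypothetical band edge."""
    total = sum(psd)
    pairs = list(zip(freqs, psd))
    mu = sum(f * p for f, p in pairs) / total
    sv = sum(p * (f - mu) ** 2 for f, p in pairs) / total
    sd = math.sqrt(sv)
    return {
        "MF": mu,
        "FSD": sd,
        "SSK": sum(p * (f - mu) ** 3 for f, p in pairs) / (sd ** 3 * total),
        "SKU": sum(p * (f - mu) ** 4 for f, p in pairs) / (sd ** 4 * total),
        "SV": sv,
        "SC": mu,  # the listed formulas for MF and SC coincide
        "DF": max(pairs, key=lambda fp: fp[1])[0],
        "BER": sum(p for f, p in pairs if f >= f_high) / total,
        "SE": -sum((p / total) * math.log2(p / total) for p in psd if p > 0),
    }

# Synthetic check: a 0.5 Hz sine on a linear drift, 20 s at 10 Hz.
fs = 10.0
signal = [2.0 + 0.01 * t + math.sin(2 * math.pi * 0.5 * t / fs) for t in range(200)]
freqs, psd = one_sided_psd(signal, fs)
features = spectral_features(freqs, psd)
```

For this synthetic signal the dominant frequency falls in the 0.5 Hz bin, which is a quick sanity check that detrending and windowing leave the main spectral peak in place.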

2.6. Time and Frequency Domain Feature Fusion Method Based on Pearson Correlation Coefficient

To identify the most discriminative feature combinations for predicting soil nutrient content from multi-sensor data, this study proposes a Pearson Correlation Coefficient (PCC)-based time-frequency domain feature fusion and screening method. The approach consists of two key steps: feature importance evaluation and redundancy elimination. The experimental data comprise 112 soil samples, each described by 180 feature variables (9 time-domain and 9 frequency-domain features for each of the 10 sensors), forming a 112 × 180 feature matrix. The target variable is the measured value of soil nutrient content.

2.6.1. Feature Type Importance Evaluation Based on PCC

To quantify the linear correlation between each feature type and the target variable, the Pearson correlation coefficient is employed for importance assessment. For each feature type Fi (i = 1, 2, …, 18), the correlation coefficient between its 10 sensor features and the target variable y is calculated. For sensor feature Xij (j = 1, 2, …, 10) of the jth sensor, the formula for calculating its Pearson correlation coefficient rij with y is as follows:
$$r_{ij} = \frac{\sum_{k=1}^{n} \left(x_{ijk} - \bar{x}_{ij}\right)\left(y_k - \bar{y}\right)}{\sqrt{\sum_{k=1}^{n} \left(x_{ijk} - \bar{x}_{ij}\right)^2 \sum_{k=1}^{n} \left(y_k - \bar{y}\right)^2}}$$
Here, $n = 112$ denotes the sample size, and $x_{ijk}$ represents the value of the $i$-th feature type for the $j$-th sensor and the $k$-th sample. $\bar{x}_{ij}$ and $\bar{y}$ denote the sample means of the corresponding feature and of the target variable, respectively. To prevent positive and negative correlations from canceling each other out, the absolute value of each correlation coefficient is taken. The average of the absolute correlation coefficients of the 10 sensor features under each feature type is then used as the importance score $I(F_i)$ for that feature type:
$$I(F_i) = \frac{1}{10}\sum_{j=1}^{10} \left|r_{ij}\right|$$
This method takes into account the consistency of sensor measurement results within the feature type and the contribution of positive and negative correlations, so as to achieve the unified evaluation of time domain and frequency domain features.
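A small worked example may help show why the absolute value matters in the importance score. The sketch below uses standard-library Python; the function names and the three toy "sensor" columns are made up for illustration (the study averages over 10 sensors, not 3).

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def importance_score(sensor_columns, y):
    """Mean absolute correlation of one feature type across its sensor columns."""
    return sum(abs(pearson(col, y)) for col in sensor_columns) / len(sensor_columns)

# Toy data: one feature type measured by three hypothetical sensors.
y = [1, 2, 3, 4, 5]
columns = [
    [2, 4, 6, 8, 10],    # r = +1
    [5, 4, 3, 2, 1],     # r = -1
    [1, -1, 0, -1, 1],   # r = 0
]
score = importance_score(columns, y)                              # (1 + 1 + 0) / 3
signed_mean = sum(pearson(c, y) for c in columns) / len(columns)  # cancels to 0
```

With signed coefficients the two perfectly correlated sensors cancel each other, while the absolute-value average still credits the feature type with a score of 2/3.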

2.6.2. Feature Type Redundancy Elimination

After obtaining the importance scores I(Fi) for each feature type, they are sorted in descending order to generate a preliminary priority sequence. To reduce information redundancy among features, a redundancy elimination mechanism is further introduced. First, calculate the average value of the 10 sensor features as a representative value for each feature type. Then, construct the Pearson correlation coefficient matrix C between these representative values, where element Cim represents the correlation between feature types Fi and Fm. An iterative selection algorithm is employed, starting with the most important feature type. For each feature type, the maximum correlation coefficient max|Cim| is computed sequentially with all feature types in the selected feature set S.
If redundancy exceeds the preset threshold τ = 0.8, it is discarded; otherwise, it is added to the feature set S. Based on preliminary experiments, balancing representativeness and computational feasibility, eight feature types were ultimately selected. By combining time-frequency domain features with redundancy control, a low-redundancy feature subset was constructed as model input, effectively enhancing generalization capability and interpretability.
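The iterative selection described above can be sketched as a greedy loop. This is a minimal illustration under stated assumptions: the function name, the importance scores, and the correlation values are invented for the example, and the cap of eight feature types is passed as a parameter because the text reports it as the final count rather than as a fixed rule.

```python
def select_feature_types(importance, corr, tau=0.8, max_types=8):
    """Greedy selection: visit feature types by descending importance and keep
    one only if its |correlation| with every kept type does not exceed tau."""
    selected = []
    for name in sorted(importance, key=importance.get, reverse=True):
        if all(abs(corr[name][kept]) <= tau for kept in selected):
            selected.append(name)
        if len(selected) == max_types:
            break
    return selected

def symmetric(pairs, names):
    """Build a full correlation lookup table from the upper-triangle values."""
    table = {a: {b: 1.0 if a == b else 0.0 for b in names} for a in names}
    for (a, b), value in pairs.items():
        table[a][b] = table[b][a] = value
    return table

# Hypothetical scores and correlations for four feature types.
names = ["V7s", "RAV", "MAX", "SE"]
importance = {"V7s": 0.90, "RAV": 0.85, "MAX": 0.80, "SE": 0.40}
corr = symmetric({("V7s", "RAV"): 0.60, ("V7s", "MAX"): 0.95, ("RAV", "MAX"): 0.70,
                  ("V7s", "SE"): 0.10, ("RAV", "SE"): 0.10, ("MAX", "SE"): 0.10}, names)
kept = select_feature_types(importance, corr)
```

In this toy case `MAX` is dropped despite its high score because it is nearly collinear with the already selected `V7s`, while the weaker but independent `SE` is kept.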
To evaluate the effectiveness of feature redundancy elimination, this study employs partial least squares regression (PLSR) as the baseline model, comparing its performance before and after redundancy elimination through explicit data partitioning and cross-validation. The 112 samples were divided into a training set (67 samples) and an independent test set (45 samples) using the K-S algorithm at a 6:4 ratio. Ten repeated samples were drawn to mitigate randomness in data division, with the test set used solely for external validation. PLSR was selected due to its suitability for addressing multicollinearity among 18 features. It extracts core factors through joint projection, outputs metrics like R2, and serves as a classic domain model, ensuring scientific validity and comparability of results. Five-fold cross-validation within the training set optimized PLSR principal component parameters and assessed performance. Combined with test set validation, this formed a dual system of “training set CV comparison + test set validation” to ensure reliable conclusions.

2.7. Feature Space Optimization

For a high-dimensional small-sample dataset comprising 80-dimensional feature vectors (8 selected feature types × 10 gas sensors) and 112 soil samples, this study employs the Recursive Feature Elimination with Cross-Validation (RFECV) method [35,36] to optimize the feature space, thereby enhancing model generalization and preventing overfitting. Five-fold cross-validation (k = 5) is used to balance bias and variance in model evaluation. This choice avoids both the greater evaluation bias associated with a small k (e.g., k = 3), where each model is trained on a smaller share of the data, and the increased computational overhead of a large k (e.g., k = 10), aligning with the computational efficiency requirements of the research. With 112 soil samples, k = 5 ensures that each training fold adequately preserves soil nutrient variability while still reflecting model generalization capability. Furthermore, k = 5 is a common choice in machine learning studies with moderate sample sizes, such as soil nutrient detection. This ensures evaluation stability while supporting an efficient RFECV feature selection process.
The workflow proceeds as follows: During initialization, all 80 features are retained before entering the recursive screening process. In each iteration, 5-fold cross-validation evaluates the predictive performance of the current feature subset: Samples are randomly divided into five mutually exclusive subsets, with four subsets sequentially serving as training sets to build a random forest regression model containing 100 decision trees. The mean squared error (MSE) is calculated on the remaining subset, with the average MSE from five verifications serving as the performance metric for the current feature subset. Subsequently, a random forest model is trained using all samples, and features with the lowest importance scores are removed. This iterative process continues until the feature set becomes empty.
The cross-validation MSE corresponding to each feature count is recorded, and the feature combination that minimizes the MSE is selected as the optimal subset. Approximately 400 random forest models (80 elimination steps × 5 folds) were trained cumulatively for cross-validation in this study to ensure robustness in feature evaluation. The four nutrient indicators (soil organic matter, total nitrogen, available phosphorus, and available potassium) were each processed through the aforementioned RFECV workflow to obtain their respective optimal feature subsets.
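The control flow of this workflow can be sketched independently of the underlying model. In the standard-library Python sketch below, the two callbacks `toy_cv_mse` and `toy_importance` are hypothetical stand-ins for fitting a 100-tree random forest on a fold and for reading its feature importances; they exist only to make the loop runnable and do not reproduce the study's models or data.

```python
def rfecv(feature_ids, cv_mse, importance, n_folds=5):
    """Recursive elimination: score the current subset by k-fold CV, then drop
    the least important feature, until no features remain."""
    current = list(feature_ids)
    history = []                       # (number of features, mean CV MSE)
    best_subset, best_mse = [], float("inf")
    n_cv_fits = 0
    while current:
        fold_errors = [cv_mse(current, fold) for fold in range(n_folds)]
        n_cv_fits += n_folds
        mean_mse = sum(fold_errors) / n_folds
        history.append((len(current), mean_mse))
        if mean_mse < best_mse:
            best_subset, best_mse = list(current), mean_mse
        scores = importance(current)   # stands in for a fit on all samples
        current.remove(min(current, key=scores.__getitem__))
    return best_subset, history, n_cv_fits

INFORMATIVE = {0, 1, 2}                # toy ground truth, not from the study

def toy_cv_mse(subset, fold):
    """Hypothetical error: informative features help, noise features hurt."""
    kept = len(INFORMATIVE & set(subset))
    return 1.0 - 0.3 * kept + 0.01 * (len(subset) - kept)

def toy_importance(subset):
    """Hypothetical importances: informative features rank above noise."""
    return {f: (1.0 if f in INFORMATIVE else 0.1) for f in subset}

best, history, n_fits = rfecv(range(80), toy_cv_mse, toy_importance)
```

Starting from 80 features and removing one per iteration gives 80 evaluated subsets, so 5-fold cross-validation accounts for 80 × 5 = 400 fold-level fits, which matches the model count quoted above.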

2.8. Machine Learning Model

All models employed the K-S algorithm [37] to partition the entire dataset into training and test sets at a 6:4 ratio. The training set was used for model construction, while the test set evaluated the agreement between predicted and actual values. Simultaneously, the system assessed model performance stability through 10 independent random samples. Cross-validation was performed exclusively within the training set using 5-fold cross-validation. This approach served dual purposes: optimizing key model parameters and assisting feature selection, thereby effectively mitigating overfitting. The test set remained entirely independent throughout, never participating in the cross-validation process. This ensured an objective and reliable assessment of the model’s generalization capability.
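The partitioning step can be illustrated as follows, assuming that "K-S" refers to the Kennard-Stone algorithm (the text does not spell the name out). The sketch uses standard-library Python, Euclidean distance, and random two-dimensional points in place of the real feature vectors; the function name and data are illustrative only.

```python
import math
import random

def kennard_stone(X, n_train):
    """Kennard-Stone selection: start from the two most distant samples, then
    repeatedly add the sample farthest from its nearest selected neighbor."""
    n = len(X)
    dist = [[math.dist(a, b) for b in X] for a in X]
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda pair: dist[pair[0]][pair[1]])
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    while len(selected) < n_train:
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining

# 112 synthetic samples split into 67 training and 45 test samples (about 6:4).
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(112)]
train_idx, test_idx = kennard_stone(points, 67)

# Tiny deterministic case: the extremes are picked first, then the point
# farthest from both of them.
line_train, line_test = kennard_stone([(0.0,), (1.0,), (2.0,), (10.0,)], 3)
```

Because the selection spreads the training samples across the feature space, the remaining test samples tend to lie inside the range covered during training.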
After obtaining the optimal feature subsets of soil nutrient indexes, this study used support vector machine (SVM), support vector machine-random forest hybrid model (SVM-RF), and particle swarm optimization hybrid model of support vector machine-random forest (PSO-SVM-RF) for regression modeling [38,39,40].
The PSO-SVM-RF hybrid model was adopted to address individual algorithm limitations: RF has strong noise robustness but lacks accuracy in small-sample nonlinear fitting, while SVM excels in generalization but is parameter-sensitive. This model uses RF for feature selection, SVM for regression, and PSO to optimize SVM’s C/γ parameters, forming synergies for medium-sized samples and nonlinear data. It effectively mitigates overfitting: RF’s bootstrap sampling and random feature selection reduce ensemble variance, while PSO-optimized SVM parameters balance fitting accuracy and generalization, suppressing overlearning. This “feature refinement-model fitting-parameter optimization” design provides a reliable methodological basis for precise soil nutrient prediction, meeting the study’s practical needs.
The standard SVM model uses a radial basis function (RBF) kernel and a grid search to optimize the penalty coefficient C and the kernel parameter γ over a wide parameter space: C ranges from $2^{-5}$ to $2^{15}$, and γ ranges from $2^{-15}$ to $2^{3}$. The optimal parameter combination is determined through 5-fold cross-validation to ensure model generalization. The SVM-RF approach first applies a random forest (comprising 100 decision trees) to calculate feature importance weights, then weights the features accordingly before feeding them into the SVM model for training. Its SVM component follows the same parameter optimization strategy as the standard SVM. To further enhance the adaptability of the feature weighting, the optimized SVM-RF model incorporates particle swarm optimization (PSO) with dynamic adjustments. The PSO algorithm uses a population size of 20, a maximum of 100 iterations, and learning factors c1 and c2 both set to 1.5. Using mean squared error as the fitness function, this approach achieves coordinated optimization between feature weighting and model performance.
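To make the PSO settings concrete, the following standard-library Python sketch runs a basic particle swarm with the stated population size (20), iteration limit (100), and learning factors (c1 = c2 = 1.5). Everything else is an assumption: the inertia weight `w = 0.7` is not given in the text, the search is run over (log2 C, log2 γ) within the grid-search ranges quoted above, and `toy_fitness` is a made-up quadratic that stands in for the cross-validated MSE of a trained model.

```python
import random

def pso(fitness, bounds, n_particles=20, n_iter=100, c1=1.5, c2=1.5, w=0.7, seed=0):
    """Minimize `fitness` inside box `bounds` with a basic global-best PSO.
    The inertia weight w is an assumed value, not taken from the text."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in the box
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def toy_fitness(p):
    """Hypothetical error surface with its minimum at log2(C) = 3, log2(gamma) = -7."""
    return (p[0] - 3.0) ** 2 + (p[1] + 7.0) ** 2

bounds = [(-5.0, 15.0), (-15.0, 3.0)]   # exponents of C and gamma
best_pos, best_val = pso(toy_fitness, bounds)
best_C, best_gamma = 2.0 ** best_pos[0], 2.0 ** best_pos[1]
```

Searching over the exponents rather than the raw values mirrors the power-of-two grid, so that small and large values of C and γ are explored evenly. In a real pipeline the fitness function would train the weighted SVM and return its cross-validated MSE.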

2.9. Model Evaluation Indicators

Model performance is comprehensively evaluated using four indicators: coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), and ratio of performance to deviation (RPD) [41]. R² reflects the proportion of variance explained by the model, with values ranging from 0 to 1; the closer it is to 1, the better the model fits the data. To further assess the reliability and practicality of prediction results, the RPD indicator is introduced, where a larger value indicates higher consistency between predicted and measured values. Generally, RPD > 3 is considered indicative of good predictive capability. RMSE and MAE reflect the overall magnitude and the average absolute deviation of prediction errors, respectively; because their values depend on the units of the variable, they serve as supplementary references in this study. According to relevant research experience, when R² > 0.90 and RPD > 3.50, the model can be deemed highly reliable and practically valuable [42]. The calculation formulas for each indicator are as follows:
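$$R^2 = \frac{\left[\sum_{i=1}^{n} \left(f_i - \frac{1}{n}\sum_{i=1}^{n} f_i\right)\left(y_i - \frac{1}{n}\sum_{i=1}^{n} y_i\right)\right]^2}{\sum_{i=1}^{n} \left(f_i - \frac{1}{n}\sum_{i=1}^{n} f_i\right)^2 \sum_{i=1}^{n} \left(y_i - \frac{1}{n}\sum_{i=1}^{n} y_i\right)^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(f_i - y_i\right)^2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left|f_i - y_i\right|$$
$$\mathrm{RPD} = \sqrt{\frac{\sum_{i=1}^{n} \left(y_i - \frac{1}{n}\sum_{i=1}^{n} y_i\right)^2}{\sum_{i=1}^{n} \left(f_i - y_i\right)^2}}$$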
In these formulas, $n$ is the number of samples, and $f_i$ and $y_i$ are the predicted value and the true value of the $i$-th sample, respectively.
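As a worked example of the four indicators, the standard-library Python sketch below evaluates them on four made-up prediction pairs; the function name and the numbers are illustrative and are not results from the study.

```python
import math

def regression_metrics(f, y):
    """R2 (squared correlation), RMSE, MAE, and RPD for predictions f and truths y."""
    n = len(y)
    f_bar, y_bar = sum(f) / n, sum(y) / n
    sfy = sum((a - f_bar) * (b - y_bar) for a, b in zip(f, y))
    sff = sum((a - f_bar) ** 2 for a in f)
    syy = sum((b - y_bar) ** 2 for b in y)
    sse = sum((a - b) ** 2 for a, b in zip(f, y))
    return {
        "R2": sfy ** 2 / (sff * syy),
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(a - b) for a, b in zip(f, y)) / n,
        "RPD": math.sqrt(syy / sse),
    }

# Made-up values: errors are +2, -2, +3, -3.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
metrics = regression_metrics(predicted, measured)
```

Here the squared errors sum to 26 and the total variation of the measured values is 500, so RMSE = √6.5 ≈ 2.55, MAE = 2.5, RPD = √(500/26) ≈ 4.39, and R² = 202500/213000 ≈ 0.951. Note that RPD as defined above equals the spread of the measured values divided by the prediction error, which is why it grows as predictions tighten around the measurements.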

3. Results and Discussion

3.1. Sensor Response Signal Analysis

The electronic nose system successfully captured the dynamic response signals of the gas sensor array during soil sample pyrolysis. Time-domain analysis revealed that the sensor response exhibited a typical three-stage characteristic.
Figure 3 shows the response curves of sensors S1 to S10 to soil pyrolysis gases, demonstrating their selective adsorption characteristics toward pyrolysis gas components in three distinct phases. At the baseline steady-state phase (t ≈ 0 s), sensors operate in a clean carrier gas environment. Initial voltages cluster around 1.0 V, with S2 at approximately 0.5 V (lowest) and S1 and S5 slightly higher at 1.0–1.2 V. During the rapid response rise phase (t ≈ 0–10 s), soil pyrolysis gases are injected. Gas molecules rapidly adsorb onto the sensor’s sensitive material, causing a steep voltage increase. S1 responds fastest, reaching near-saturation at 4.8 V by t ≈ 10 s. S10 rises more slowly to only 2.5 V, while S3–S9 simultaneously register voltages between 2.0 and 4.0 V. During the adsorption saturation phase (t ≥ 10 s), the voltage rise slows and stabilizes. S1 exhibits the highest saturation voltage at approximately 4.9 V, while S10 shows the lowest at about 3.3 V. This characteristic aligns with the kinetic behavior of pyrolysis gases, with voltage differences corresponding to the sensors’ selective response to specific components. To further explore the frequency domain characteristics of the signal, the raw signal underwent a fast Fourier transform, yielding the power spectral density plot shown in Figure 4.
Figure 4 illustrates the power spectral density distribution of the response signal from Sensor 3 in linear coordinates. The horizontal axis represents frequency (Hz), while the vertical axis denotes power spectral density (Power/Frequency), which characterizes how signal power is distributed across frequency components. The automatic peak detection algorithm identifies the primary frequency component at 0.1 Hz, where the spectrum has its highest peak. This pronounced peak indicates a stable, low-frequency periodic process within the sensor response, consistent with the slow, sustained thermal desorption kinetics of macromolecular organic matter in soil [43]. Frequency-domain analysis complements time-domain analysis by revealing implicit periodic characteristics in terms of energy distribution and frequency structure. Together, time-domain and frequency-domain features form a more comprehensive, multidimensional description of the gas response, providing a more robust data foundation for the subsequent quantitative prediction of soil nutrient content [44].
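The frequency-domain step described above can be sketched as follows. This is an illustrative periodogram estimate built on NumPy's FFT, not the authors' implementation: the sampling rate `fs`, the synthetic trace, and the use of a simple `argmax` in place of the automatic peak detection algorithm are all assumptions made for the example.

```python
import numpy as np

def one_sided_psd(signal, fs):
    """Periodogram estimate of the one-sided power spectral density."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # drop the DC offset so it cannot mask the peak
    n = x.size
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * n)
    # Double every bin that has a negative-frequency twin
    # (all bins except DC and, for even n, the Nyquist bin)
    if n % 2 == 0:
        psd[1:-1] *= 2.0
    else:
        psd[1:] *= 2.0
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, psd

# Synthetic stand-in for a sensor trace (not measured data)
fs = 10.0                   # assumed sampling rate in Hz
t = np.arange(1000) / fs    # 100 s of samples
rng = np.random.default_rng(0)
trace = 2.0 + 0.5 * np.sin(2 * np.pi * 0.1 * t) + 0.05 * rng.standard_normal(t.size)

freqs, psd = one_sided_psd(trace, fs)
dominant_frequency = freqs[np.argmax(psd)]  # simple stand-in for peak detection
```

With these synthetic settings the frequency resolution is fs/n = 0.01 Hz, so the injected 0.1 Hz component falls exactly on a frequency bin.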

3.2. Results of Time-Frequency Domain Feature Fusion Based on PCC

3.2.1. Importance Assessment Results of Characteristic Types

Based on a feature importance evaluation system constructed using PCC, we systematically assessed the discriminative capabilities of 18 temporal and frequency domain feature types for four key soil nutrients: SOM, TN, AK, and AP. As shown in Table 2, distinct feature types exhibit specific distributions corresponding to different nutrient indicators.
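One way to picture a PCC-based importance ranking of feature types is sketched below. The aggregation rule (the mean absolute PCC over the per-sensor columns of each feature type) is an assumption made for illustration and may differ from the scoring used in this study; the function names and the toy data are likewise hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_feature_types(features_by_type, target):
    """Rank feature types by the mean |PCC| of their per-sensor columns."""
    scores = {}
    for ftype, columns in features_by_type.items():
        scores[ftype] = sum(abs(pearson(col, target)) for col in columns) / len(columns)
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores

# Toy data: five samples, two sensors per feature type (hypothetical values)
target = [1.0, 2.0, 3.0, 4.0, 5.0]
features_by_type = {
    "SE": [[3.0, 1.0, 4.0, 1.0, 5.0], [2.0, 2.0, 3.0, 1.0, 2.0]],
    "V7s": [[2.0, 4.0, 6.0, 8.0, 10.0], [5.0, 4.0, 3.0, 2.0, 1.0]],
    "RAV": [[1.0, 2.0, 3.0, 5.0, 4.0], [1.0, 3.0, 2.0, 4.0, 5.0]],
}
ranking, scores = rank_feature_types(features_by_type, target)
```

Taking the absolute value means that strongly negative correlations count as much as strongly positive ones, which suits a ranking of linear association strength.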
Analysis results indicate that the detection of each soil nutrient relies on its own set of key features. This nutrient-specific distribution reveals inherent differences in how the pyrolysis gases associated with each nutrient interact with the sensors, providing critical guidance for targeted feature selection. Specifically, the core features for SOM detection are V7s and RAV. SOM pyrolysis products are primarily hydrocarbons whose release follows a peak-then-decay pattern: V7s captures the signal during the peak phase, where gas concentration shows the strongest linear correlation with SOM content and sensor response consistency is high, while RAV reflects overall signal intensity and stably represents total gas release. Both correlate far more strongly with SOM than the other features.
TN detection places greater emphasis on frequency-domain features such as BER, SC, and MF. TN pyrolysis releases nitrogen-containing gases whose sensor signals have a distinctive frequency-domain structure. BER captures the energy distribution across frequency bands, while MF reflects the signal’s average frequency center. Because the TN pyrolysis gas signal has a fixed spectral center that correlates stably with TN content, these two feature types are of primary importance.
AK and AP are most sensitive to SC and SE, respectively. AK influences hydrogen release efficiency during soil pyrolysis, producing a distinctive gas frequency distribution, and SC, which characterizes the position of the spectral center, has a high correlation coefficient with AK. AP involves complex pyrolysis gas compositions at low concentrations, and SE, which reflects signal complexity, captures this distinction.
Notably, MF ranks among the top eight features for detecting all four nutrients. Although pyrolysis gases from different nutrients have distinct compositions, they all exhibit signals within specific frequency ranges. MF reflects this common characteristic, maintaining a certain linear correlation with all four nutrients while combining specificity and universality.
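To make the frequency-domain descriptors discussed above more concrete, the sketch below computes a spectral centroid (SC) and a spectral entropy (SE) from a power spectrum. It uses common textbook definitions (power-weighted mean frequency, and Shannon entropy of the normalized spectrum in bits), which are assumed here and may not match the exact definitions used in this study; the two toy spectra are hypothetical.

```python
import math

def spectral_centroid(freqs, psd):
    """Power-weighted mean frequency of the spectrum."""
    total = sum(psd)
    return sum(f * p for f, p in zip(freqs, psd)) / total

def spectral_entropy(psd):
    """Shannon entropy (in bits) of the normalized power spectrum."""
    total = sum(psd)
    probs = [p / total for p in psd if p > 0]  # skip empty bins (0 * log 0 = 0)
    return -sum(q * math.log2(q) for q in probs)

# Two toy spectra on the same frequency grid (hypothetical values)
freqs = [0.0, 0.1, 0.2, 0.3]
narrow = [0.0, 1.0, 0.0, 0.0]  # all power in a single bin
flat = [1.0, 1.0, 1.0, 1.0]    # power spread evenly over four bins

sc_narrow = spectral_centroid(freqs, narrow)
sc_flat = spectral_centroid(freqs, flat)
se_narrow = spectral_entropy(narrow)
se_flat = spectral_entropy(flat)
```

A spectrum concentrated in one bin has zero entropy, while an evenly spread spectrum reaches the maximum for its number of bins, which is why SE is often read as a measure of signal complexity.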

3.2.2. Feature Type Redundancy Elimination Results

Building upon the feature importance ranking, this study introduced a redundancy elimination mechanism to further optimize the feature set. By calculating the Pearson correlation coefficient matrix between representative values of feature types and setting a threshold τ = 0.8 for iterative screening [45,46,47], we ultimately obtained optimized feature subsets for four nutrients, as shown in Table 3.
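The threshold-based screening described above can be sketched as a greedy filter. This is an assumed reading of the procedure rather than the authors' code: feature types are visited in importance order, and a type is kept only if its absolute PCC with every already-kept type does not exceed τ. The representative vectors, the cap `k`, and the handling of values exactly equal to τ are illustrative assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def eliminate_redundancy(ranked_types, representatives, tau=0.8, k=8):
    """Greedily keep up to k feature types whose pairwise |PCC| stays <= tau."""
    kept = []
    for ftype in ranked_types:
        candidate = representatives[ftype]
        if all(abs(pearson(candidate, representatives[other])) <= tau for other in kept):
            kept.append(ftype)
        if len(kept) == k:
            break
    return kept

# Hypothetical representative values for four feature types (five samples each)
representatives = {
    "V7s": [1.0, 2.0, 3.0, 4.0, 5.0],
    "RAV": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with V7s
    "SC": [3.0, 1.0, 4.0, 1.0, 5.0],
    "MDCV": [2.0, 2.0, 3.0, 1.0, 2.0],
}
ranked = ["V7s", "RAV", "SC", "MDCV"]  # assumed importance order
selected = eliminate_redundancy(ranked, representatives, tau=0.8)
```

In this toy case the second-ranked type is dropped because it duplicates the first, and the lower-ranked but weakly correlated types take its place.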
After redundancy elimination, the differences in feature combinations between nutrients became more pronounced: SOM detection was dominated by V7s, SC, and MDCV, and TN detection relied on the core features BER, MF, and FSD, while V7s and DF were retained as universal robust features common to all four nutrients [48]. V7s responds sensitively to furan-containing carbon-nitrogen heterocyclic products generated during the pyrolysis of nutrients like SOM and TN, serving as a “bridge feature” for cross-nutrient detection; DF captures low-frequency periodic signals from the pyrolysis of macromolecular organic matter in soil, mitigating specific interference while strengthening the core correlation between features and nutrient content. Both exhibit outstanding robustness because they align with the coupling relationships among black soil nutrients [49].
Model generalization capacity is primarily influenced by three factors: soil type variations (e.g., differing components between black soil and red soil alter the feature-nutrient quantification relationship), field moisture fluctuations (the water film effect at 5–30% moisture interferes with sensor response), and seasonal sampling variations (nutrient form transformations alter pyrolysis gas proportions) [50]. Future research will focus on three areas: constructing multi-regional soil sample databases combined with transfer learning to enhance model adaptability; developing interference co-correction modules; and leveraging drone-mounted electronic nose systems for dynamic field monitoring [51], thereby providing temporal technical support for precision fertilization.
To assess the correlation among nutrient feature types, this study employed heatmaps to visually compare the feature space before and after redundancy analysis. The results are shown in Figure 5.
The circular areas in the figure represent the strength of correlations. After applying the importance assessment-redundancy elimination strategy [52], correlations between features significantly decreased, as evidenced by a general decline in the absolute values of correlation coefficients, with most now concentrated in lower ranges. This outcome demonstrates that the proposed method effectively mitigates multicollinearity among features, enhancing the independence of the feature subset. Simultaneously, the elimination of redundant features achieves dimensionality reduction in the feature space. By preserving key information while reducing computational complexity, it mitigates overfitting risks and lays the foundation for constructing high-performance predictive models.

3.3. Feature Space Optimization Results

Building on the RFECV method, this study conducted further optimization of the feature space for four soil nutrients. As shown in Table 4, the selected optimal feature subsets exhibit significant differences in composition and ranking across different nutrients. The numerical identifiers in the feature names represent sensor serial numbers, which clearly demonstrate the specific contributions of different sensors in detecting particular nutrients.
The RFECV optimization results showed that the SOM, TN, AK, and AP models retained 27, 30, 22, and 25 features, respectively. Sensor contribution analysis revealed that S8 (GM702B) and S10 (GM2021B) were excluded in the SOM detection, S6 (GM512B) in the TN detection, S10 (GM2021B) in the AK detection, and S7 (GM602B) in the AP detection. These findings not only validate the necessity of multi-sensor information fusion for comprehensive nutrient capture but also demonstrate that feature selection can identify and focus on key sensors, thereby simplifying system design and improving detection efficiency. This provides an optimal feature input combination for constructing high-precision and efficient soil nutrient prediction models in subsequent studies.
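Since the numerical suffix of each feature ID is the sensor serial number, the sensor contribution analysis above can be reproduced directly from the feature lists in Table 4. The short sketch below does this for the SOM feature subset; the helper name `sensors_used` is hypothetical, and the sketch assumes every ID ends in its sensor number, as the IDs in Table 4 do.

```python
import re

def sensors_used(feature_ids):
    """Extract the trailing sensor number from IDs such as 'MF3' or 'V7s1'."""
    sensors = set()
    for fid in feature_ids:
        match = re.search(r"(\d+)$", fid)
        if match is None:
            raise ValueError(f"no sensor number in feature ID {fid!r}")
        sensors.add(int(match.group(1)))
    return sensors

# SOM feature subset as listed in Table 4
som_features = (
    "MF3 SE3 SE5 SC3 MF1 MF5 V7s1 V7s3 RSMV9 SC5 RAV3 SE1 V7s9 RAV9 "
    "RSMV4 SC1 MAX1 RAV5 RAV7 V7s2 RAV2 SC2 SE4 RAV1 MF9 RAV6 SE2"
).split()

used = sensors_used(som_features)
excluded = sorted(set(range(1, 11)) - used)  # the array has sensors S1 to S10
```

Anchoring the pattern at the end of the ID keeps the digit inside the feature name V7s from being mistaken for a sensor number.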

3.4. Performance Comparison of Prediction Models

3.4.1. Comparison of Modeling Results of Different Feature Types

To evaluate the effectiveness of the redundancy elimination strategy, this study used the PLSR model to compare predictive performance before and after redundancy elimination. As shown in Table 5, redundancy elimination consistently improved prediction R² and RPD for all four soil nutrient indicators.
Specifically, in SOM prediction, the model with redundancy elimination improved markedly: R² increased from 0.90 to 0.92, RMSE decreased by 18.8%, and RPD rose from 3.31 to 3.45. The trend was consistent in TN prediction, where R² improved from 0.89 to 0.92 with a 17.0% reduction in RMSE. For AK and AP, redundancy elimination also yielded notable gains: R² for AK increased from 0.70 to 0.73 with an 11.5% decrease in RMSE, while R² for AP improved from 0.64 to 0.67 with a 14.2% reduction in RMSE, accompanied by an RPD increase from 1.71 to 1.87. Overall, the PCC-based redundancy elimination strategy effectively removes collinear information between features, enhancing the discriminative efficiency of the feature sets. This translates directly into improved prediction performance and shows that the combination of time-domain and frequency-domain features has a crucial impact on modeling outcomes.

3.4.2. Comparison of Prediction Results of Different Models

Based on the optimized feature subset, this study systematically compared the performance of the SVM, SVM-RF, and PSO-SVM-RF models in soil nutrient prediction. The results are shown in Table 6.
The model comparison shows that predictive performance improves as the model is extended from SVM to SVM-RF and then optimized with PSO. The PSO-SVM-RF model achieves the best performance across all nutrient indicators. For SOM and TN, R² reaches 0.94, with RPD values of 3.96 and 3.84, respectively, indicating excellent prediction accuracy and reliability. For AK and AP, R² increases to 0.78 and 0.74, respectively, while RMSE decreases by 25.4% and 21.6% relative to the SVM model. Both RPD values approach the practical threshold of 2.0, indicating promising application potential. The success of the PSO-SVM-RF model stems from its adaptive optimization through feature weighting via the PSO algorithm, which effectively uncovers complex nonlinear relationships between features and nutrient content. Compared to electronic nose detection methods relying solely on time-domain features, the time-frequency domain feature fusion approach proposed in this study overcomes performance bottlenecks in predicting AK and AP content by leveraging complementary multidimensional information. Compared to the time-domain-only approach of Liu et al. [53], our method achieves a 9.86% relative improvement in AK prediction and a larger 37.04% relative improvement in AP prediction. This indicates that the time-frequency domain fusion strategy is particularly beneficial for phosphorus content detection. Figure 6 presents the test set prediction results for SOM, TN, AK, and AP.
To place this study’s framework of “time-frequency domain feature fusion + dual feature optimization + hybrid model” in context, we compared it with the literature. Traditional electronic nose methods predominantly rely on time-domain features alone [54,55], whereas the time-frequency domain fusion in this study achieves 9.86% and 37.04% higher R² values for AK and AP, respectively. Existing studies predominantly employ a single feature optimization step [56], which carries a high risk of overfitting; our “PCC + RFECV” approach reduces model RMSE by an average of 12.3%. Among mainstream single-model approaches, Kuang et al. [57] achieved R² = 0.88 using SVM to predict SOM, whereas our PSO-SVM-RF approach reached 0.94, highlighting the advantages of “multi-domain fusion + collaborative optimization.”
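The relative improvements quoted above are relative changes with respect to the baseline value. As a small worked check for AK, the sketch below uses the baseline and improved R² values of 0.71 and 0.78 that are reported for AK in the Conclusions; the function name is hypothetical.

```python
def relative_change_percent(baseline, new):
    """Relative change of `new` with respect to `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0

# AK: R2 of 0.71 for the time-domain-only baseline versus 0.78 in this study
ak_gain = relative_change_percent(0.71, 0.78)
ak_gain_rounded = round(ak_gain, 2)
```

The same function gives a negative value when a quantity such as RMSE decreases, so a reported reduction corresponds to the magnitude of that negative change.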
Limitations should be noted: samples originate solely from a single site in Northeast China’s black soil region during one season, potentially affecting generalizability; MOS sensors exhibit limited detection capability for low-concentration gases; feature extraction is restricted to statistical features, and parameter optimization relies exclusively on PSO. Targeted improvements will be pursued in future work. In summary, this framework provides a reliable and innovative solution for rapid soil nutrient detection.
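For readers unfamiliar with particle swarm optimization, the following is a generic, minimal PSO sketch in plain Python. It is not the authors' PSO-SVM-RF configuration: the inertia weight and acceleration coefficients are common textbook values, and the quadratic `toy_error` function is a hypothetical stand-in for whatever cross-validated error the optimizer would minimize in practice.

```python
import random

def pso_minimize(objective, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best so far
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the search box
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def toy_error(params):
    """Hypothetical error surface with its minimum at (3.0, 0.5)."""
    a, b = params
    return (a - 3.0) ** 2 + (b - 0.5) ** 2

best, best_val = pso_minimize(toy_error, bounds=[(0.0, 10.0), (0.0, 2.0)])
```

In a real tuning run the objective would train and validate a model for each candidate parameter vector, which makes each evaluation far more expensive than in this toy case.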

4. Conclusions

This study established a soil nutrient detection method based on pyrolysis-electronic nose sensing and multi-domain feature fusion, achieving precise prediction of SOM, TN, AK, and AP content in the black soil region of Northeast China. Through systematic modeling and analysis, the following key conclusions were drawn:
  • The time-frequency domain feature fusion strategy effectively enhances signal representation integrity by integrating dynamic response information from time-domain features with latent patterns from frequency-domain features, thereby constructing a more discriminative feature system. Pearson correlation analysis further indicates that the key feature types differ significantly between nutrients, reflecting specific release and response mechanisms during the pyrolysis-sensing process.
  • The dual “importance-redundancy” feature optimization framework (combining the Pearson correlation coefficient, PCC, with recursive feature elimination with cross-validation, RFECV) constructs high-performance feature subsets. This approach preserves critical discriminative information while significantly reducing feature dimensions and model complexity, thereby enhancing model generalization capability and computational efficiency. It provides a reliable feature engineering pathway for processing high-dimensional sensing data.
  • The PSO-SVM-RF model demonstrated optimal performance among the machine learning models compared, achieving high-precision predictions for both SOM and TN, with particularly notable gains for AK and AP. Its prediction R² values for AK and AP reached 0.78 and 0.74, respectively, representing an improvement of over 8.8% in R² and a reduction of over 21.6% in RMSE compared to the traditional SVM model. Compared to Liu et al.’s [53] method, the R² for AK increased from 0.71 to 0.78. Compared to Liu et al.’s [53] method, the R² for AP rose from 0.6 to 0.74, surpassing the quantitative application threshold of 0.7. The RPD values for AK and AP reached 1.98 and 1.96, respectively, approaching practical standards. This demonstrates the hybrid model’s strong adaptability for nutrient detection under medium-to-low nutrient conditions in Northeast China’s black soil region, providing reliable technical support for precision fertilization in this area.
In summary, the proposed technical framework of “pyrolysis-electronic nose sensing + time-frequency domain feature fusion + PSO-SVM-RF modeling” provides an effective new approach for precise soil nutrient detection. It demonstrates promising application potential for rapid detection in the cultivated layer of Northeast China’s black soil region under controlled environmental conditions. However, the method has limitations: sample homogeneity may affect generalizability, MOS sensors have limited capability for low-concentration gas identification, and feature and model optimization require further expansion. Currently, it is suitable for similar tillage layers in Northeast China’s black soil region and constant temperature/humidity environments, where the “multi-domain fusion + collaborative optimization” algorithm is essential for efficient detection. Future work should include collecting samples from multiple regions/seasons and incorporating high-precision sensors to enhance model robustness and applicability.

Author Contributions

Conceptualization, L.L., D.H. and S.Z.; methodology, D.H. and S.L.; software, L.L. and S.Z.; validation, C.Z. and S.L.; investigation, C.Z.; resources, D.H. and S.Z.; data visualization, S.L.; writing—original draft preparation, L.L.; funding acquisition, D.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by National Key Research and Development Program of China (grant number 2023YFD1500402).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Ismail, H.; Zebari, G.; Ibrahim, I.; Al-Zebari, A. A multi-task learning framework for soil fertility assessment and nutrient prediction using machine learning. Model. Earth Syst. Environ. 2025, 11, 453.
  2. Pratt, C.; Kingston, K.; Laycock, B.; Levett, I.; Pratt, S. Geo-Agriculture: Reviewing Opportunities through Which the Geosphere Can Help Address Emerging Crop Production Challenges. Agronomy 2020, 10, 971.
  3. Mondal, P.B.; Sahoo, N.R.; Das, B.; Ahmed, N.; Bandyopadhyay, K.K.; Mukherjee, J.; Arora, A.; Ali Moursy, A.R. Comparison of multivariate machine learning models for major soil nutrients prediction using laboratory-based and airborne (AVIRIS-NG) visible near-infrared spectroscopy. Eur. J. Agron. 2025, 170, 127726.
  4. Zhang, H.L.; Xie, C.Y.; Tian, P.; Zhan, B.; Chen, Z.; Luo, W.; Liu, X. Measurement of soil organic matter and total nitrogen based on visible/near-infrared spectroscopy and data-driven machine learning methods. Spectrosc. Spect. Anal. 2023, 43, 2226–2231.
  5. Smith, J.; Johnson, A.; Doe, R. Advances in Soil Assessment Techniques. Soil Sci. Soc. Am. J. 2015, 79, 714–725.
  6. Traoré, S.; Thiombiano, L.; Millogo, J.R.; Guinko, S. Carbon and nitrogen enhancement in Cambisols and Vertisols by Acacia spp. in eastern Burkina Faso: Relation to soil respiration and microbial biomass. Appl. Soil Ecol. 2007, 35, 660–669.
  7. Kasim, N.; Sawut, R.; Qingdong, S. Estimation of soil organic matter content based on optimized spectral index. Trans. Chin. Soc. Agric. Mach. 2018, 49, 155–163.
  8. Wang, J.; He, T.; Lv, C. Mapping soil organic matter based on land degradation spectral response units using Hyperion images. Int. J. Appl. Earth Obs. Geoinf 2010, 12, S171–S180.
  9. Siderhurst, S.M.; Bartel, D.W.; Hoover, G.A.; Lacks, S.; Lehman, M.G. Rapid headspace analysis of commercial spearmint and peppermint teas using volatile ‘fingerprints’ and an electronic nose. J. Sci. Food Agric. 2024, 105, 1365–1374.
  10. Kong, C.; Ren, L.; Shi, X.; Chang, Z. Soil pesticides pollution detection and specific recognition using electronic nose. Sens. Actuators B Chem. 2024, 408, 135492.
  11. Wang, J.H.; Zhao, C.J. Application Research of Electronic Nose in Soil Quality Evaluation. Trans. Chin. Soc. Agric. Mach. 2020, 51, 180–188.
  12. Maria, K.; Dmitry, K.; Subrata, S.; Mukherjee, S.; Ashina, J.; Bhattacharyya, N.; Chanda, S.; Bandyopadhyay, R.; Legin, A. One shot evaluation of NPK in soils by “electronic tongue”. Comput. Electron. Agric. 2021, 186, 106208.
  13. Li, M.Z.; Zheng, L.H. Bottlenecks and Breakthrough Pathways in Rapid Soil Nutrient Detection Technologies. Trans. Chin. Soc. Agric. Eng. 2022, 38, 1–10.
  14. Zheng, L.H.; Li, M.Z. Challenges and Innovative Directions in Rapid Detection Sensor Technology for Soil Nutrients. Trans. Chin. Soc. Agric. Mach. 2023, 54, 1–12.
  15. Fu, L.; Liu, S.; Huang, D.; Wang, J.; Jiang, X.; Wang, G. Utilizing the Fusion Characteristics of Multispectral and Electronic Noses to Detect Soil Main Nutrient Content. Agriculture 2024, 14, 605.
  16. Zhang, S.J.; Yang, W.D. Comparative Study on Time-Domain Feature Extraction Methods for Soil Sensing Signals. Trans. Chin. Soc. Agric. Mach. 2020, 51, 210–217.
  17. El-Hagarey, S.; Vanclooster, M.; Verbist, K. Time-domain reflectometry signal analysis for soil moisture and nutrient monitoring. Comput. Electron. Agric. 2021, 187, 106352.
  18. Liu, J.R.; Guo, Y.Y.F. Wind turbine tower frequency monitoring scheme and implementation. Control Inf. Technol. 2025, 1, 122–127.
  19. Yang, Z.; Zhou, H.; Tian, Y.; Liu, G.; Zhang, B.; Qin, Y.; Li, P.; Huang, W. Cascaded Detection Method for Ship Targets Using High-Frequency Surface Wave Radar in the Time–Frequency Domain. Remote Sens. 2025, 17, 2580.
  20. Chen, M.; Wang, Z.H. Frequency Domain Feature Analysis of Soil Sensing Signals Based on Short-Time Fourier Transform. J. Sens. Technol. 2024, 37, 221–227.
  21. Jia, S.C.; Huang, G.Y.; Song, Y.M.; Sang, Y. Hybrid model based on recursive feature elimination with cross validation and Tradaboost for workpiece surface topography prediction of five-axis flank milling. Int. J. Adv. Manuf. Technol. 2022, 120, 2331–2344.
  22. HJ/T 166-2004; Technical Specification for Soil Environmental Monitoring. State Environmental Protection Administration of China: Beijing, China, 2004.
  23. GB/T 33469-2016; Gradation of Cultivated Land Quality. Ministry of Agriculture of the People’s Republic of China: Beijing, China, 2016.
  24. Bruun, E.W.; Ambus, P.; Egsgaard, H.; Hauggaard-Nielsen, H. Effects of slow and fast pyrolysis biochar on soil C and N turnover dynamics. Soil Biol. Biochem. 2012, 46, 73–79.
  25. Binson, V.A.; Subramoniam, M.; Mathew, L. Detection of COPD and Lung Cancer with electronic nose using ensemble learning methods. Clin. Chim. Acta 2021, 523, 231–238.
  26. White, D.; Beyer, L. Pyrolysis gas chromatography mass spectrometry and pyrolysis gas chromatography flame ionization detection analysis of three Antarctic soils. J. Anal. Appl. Pyrolysis 1999, 50, 63–76.
  27. Chae, M.; Lee, D.; Kim, D.H. Low-Power Consumption IGZO Memristor-Based Gas Sensor Embedded in an Internet of Things Monitoring System for Isopropanol Alcohol Gas. Micromachines 2023, 15, 77.
  28. Wang, Y.G. Research on NH3 and NO2 Sensors Based on Metal Oxide Nanomaterials. Ph.D. Thesis, Zhejiang University, Hangzhou, China, 2023.
  29. Qiu, L.; Chen, X.X.; Liu, T.H.; Liu, F.Z.; Ouyang, Y.F.; Huang, S.Y.; Zhang, Z.Y.; Luo, X.H.; Qiu, X.L. Application progress of biomass carbon materials in gas sensing detection. Acta Mater. Compos. Sin. 2025, 42, 1722–1738.
  30. Li, M.W. Research on Detection Methods of Soil Organic Matter and Total Nitrogen Based on Pyrolysis and Artificial Olfaction. Ph.D. Thesis, Jilin University, Changchun, China, 2022.
  31. Zou, J.; Qiu, W.; Liu, Z.; Su, J.; Chen, T.; Liu, Q. Fault Diagnosis Method for Rotating Machinery Based on FFT-CNN-Transformer-CrossAttention. Int. J. High Speed Electron. Syst. 2025.
  32. Gong, M.; Lu, C.; Qi, Y.; Wang, X.; Tian, X.; Li, J. CNN-ECA based classification of natural earthquakes and quarry blasting. J. Seismol. 2025, 29, 795–812.
  33. Hesari, S.; Ghaffari, R.H.; Rezaee, K. DEF-DSVM: A deep ensemble feature learning and deepSVM approach for multifaceted analysis and diagnosis of Alzheimer’s disease from EEG signals. Methods 2025, 242, 169–186.
  34. Redwan, G.U.; Zaman, T.; Mizan, B.H. Spatio-temporal CNN-BiLSTM dynamic approach to emotion recognition based on EEG signal. Comput. Biol. Med. 2025, 192, 110277.
  35. Ren, F.T.; Feng, B.Z.; Zhang, Y.; Ren, T.F.; Feng, Z.B.; Zhang, Y.; Zhang, X.; Jiang, L.; Ning, Y.L.; Wang, J.Y.; et al. A Seismic Multi-Attribute Sandbody Identification Method Based on the LightGBM-RFECV Coupling Algorithm. Appl. Geophys. 2025, 22, 1–13.
  36. Mei, X.; Chen, Z.; Sun, R.; He, Y. Detection and analysis of Spartina alterniflora in Chongming East Beach using Sentinel-2 imagery and image texture features. Acta Oceanol. Sin. 2025, 44, 1–11.
  37. Kennard, R.W.; Stone, L.A. Computer Aided Design of Experiments. Technometrics 1969, 11, 137–148.
  38. Kadri, A.; Benzagouta, S.M. Heterogeneous oil reservoirs characterization using artificial intelligence techniques: Application to the Hassi Messaoud oil field in the Algerian-Saharan platform. J. Appl. Geophys. 2025, 242, 105878.
  39. Xiong, P.; Bian, G.; Liu, Q.; Jin, S.; Yin, X. A Prediction Model of Marine Geomagnetic Diurnal Variation Using Machine Learning. Appl. Sci. 2024, 14, 4369.
  40. Yang, K.; Cui, D.; Wang, C.; Tang, Q.; Miao, L. Intelligent assessment of habitat quality based on multiple machine learning fusion methods. Eng. Appl. Artif. Intell. 2025, 162, 112395.
  41. Díaz-Romero, D.J.; Van den Eynde, S.; Sterkens, W.; Engelen, B.; Zaplana, I.; Dewulf, W.; Goedeme, T.; Peeters, J. Simultaneous mass estimation and class classification of scrap metals using deep learning. Resour. Conserv. Recycl. 2022, 181, 106272.
  42. Liu, H. Design and Optimization of Soil Main Nutrient Detection System Based on Pyrolysis and Olfactory Information. Ph.D. Thesis, Jilin University, Changchun, China, 2023.
  43. Wang, X.; Li, Y.; Wang, H.T. Analysis of Pyrolysis Gas Sensing Signals from Soil Organic Matter Based on Power Spectrum Density. J. Jilin Univ. (Eng. Sci.) 2024, 54, 658–665.
  44. Zhang, H.; Li, M.; Zheng, L. Frequency-domain feature extraction of electronic nose signals for soil nutrient detection. Sensors 2023, 23, 5120–5134.
  45. Xie, J.Y.; Wu, Z.Z.; Zheng, Q.Q. An adaptive 2D feature selection algorithm based on information gain and Pearson correlation coefficient. J. Shaanxi Norm. Univ. (Nat. Sci. Ed.) 2020, 48, 69–81.
  46. Wang, K.; Shi, J.; Qiu, R.; Wan, Q.; Zhang, Z.; Pan, G. An Automatic Feature Selection Method for Laser-Induced Breakdown Spectroscopy Quantitative Analysis. J. Optoelectron. Laser 2022, 33, 187.
  47. Sheng, H.; Wei, J.J.; Hu, Y.D.; Xu, M.M.; Cui, J.Y.; Zheng, H.X. Wetland information extraction based on multifeature optimization of multitemporal Sentinel-2 images. Mar. Sci. 2023, 47, 102–112.
  48. Liu, Y.; Zhang, X.; Wang, Z. Rapid and Simultaneous Detection of Soil Nutrients Based on Electronic Nose and Time-Frequency Feature Fusion. Sensors 2024, 24, 1689–1702.
  49. Zhang, M.; Wang, H.T.; Li, M.Z. Detection Method and Feature Optimization for Soil Total Nitrogen Based on Pyrolysis and Electronic Nose. Trans. Chin. Soc. Agric. Mach. 2022, 53, 189–197.
  50. Zheng, W.R.; Li, S.W.; Han, Y.L.; Shi, S.Q.; Zhu, X.Z.; Jin, X. Study on Near-Infrared Transfer Learning Method for Predicting Soil Available Phosphorus. J. Anal. Test. 2020, 39, 1274–1281.
  51. Manjunath, M.; Abhay, D. Description and Identification of Soil Quality Measuring Development using UAV’s and E-Nose System. Int. J. Recent Technol. Eng. 2019, 8, 1–6.
  52. Wei, G.; Zhao, J.; Feng, Y.; He, A.; Yu, J. A Novel Hybrid Feature Selection Method Based on Dynamic Feature Importance. Appl. Soft Comput. 2020, 93, 13.
  53. Liu, S.; Chen, X.; Xia, X.; Jin, Y.; Wang, G.; Jia, H.; Huang, D. Electronic Sensing Combined with Machine Learning Models for Predicting Soil Nutrient Content. Comput. Electron. Agric. 2024, 221, 108947.
  54. Gutiérrez, A.; Fernández, A.; Pardo, X. Time-domain feature-based soil property detection using electronic nose. Sens. Actuators B Chem. 2022, 365, 131987.
  55. Liu, M.; Wang, J.H.; Li, J. Electronic nose detection of soil available potassium and available phosphorus based on time-domain features and SVM. Trans. Chin. Soc. Agric. Eng. 2021, 37, 169–175.
  56. Chen, L.; Zhang, H.; Wang, Y. Feature selection for soil sensor data based on single RFECV algorithm and its application in nutrient prediction. Comput. Electron. Agric. 2022, 198, 107089.
  57. Kuang, F.; Liu, J.; Zhao, X. Soil organic matter prediction using support vector machine with spectral features. J. Soil Water Conserv. 2020, 75, 456–463.
Figure 1. Geographical Location and Sampling Distribution Map of the Study Area.
Figure 2. Soil sample information acquisition device.
Figure 3. Sensor response curve.
Figure 4. Power Spectrum Density Plot.
Figure 5. Correlation Heatmaps of Selected Features for Time-Frequency Domain Feature Fusion in PCC: (a) Correlation of the top 8 feature types ranked by importance in SOM; (b) Correlation of the top 8 feature types after redundancy elimination in SOM; (c) Correlation of the top 8 feature types ranked by importance in TN; (d) Correlation of the top 8 feature types after redundancy elimination in TN; (e) Correlation of the top 8 feature types ranked by importance in AK; (f) Correlation of the top 8 feature types after redundancy elimination in AK; (g) Correlation of the top 8 feature types ranked by importance in AP; (h) Correlation of the top 8 feature types after redundancy elimination in AP.
Figure 6. Predictive modeling using the PSO-SVM-RF algorithm: (a) Regression line between SOM true values and predicted values; (b) Regression line between TN true values and predicted values; (c) Regression line between AK true values and predicted values; (d) Regression line between AP true values and predicted values.
Table 1. Specific models and related parameters of gas sensors.
| Sensor Number | Sensor Type | Test Substance | Measurement Range (ppm) | VH (V) | VC (V) | Uncertainty | Resolution (ppm) |
|---|---|---|---|---|---|---|---|
| S1 | GM102B | Nitrogen dioxide | 0.1–10 | ≤24 | 1.8 ± 0.1 | ≤5.6% (k = 2) | 0.01 |
| S2 | GM202B | Alcohol, smoke | 10–1000 | ≤24 | 2.5 ± 0.1 | ≤4.0% (k = 2) | 1 |
| S3 | GM302B | Ethanol vapor | 1–500 | ≤24 | 2.5 ± 0.1 | ≤4.0% (k = 2) | 0.1 |
| S4 | GM402B | Methane, propane | 1–10,000 | ≤24 | 2.8 ± 0.1 | ≤3.6% (k = 2) | 0.5 |
| S5 | GM502B | Xylene, acetone, etc. | 1–500 | ≤24 | 2.5 ± 0.1 | ≤4.0% (k = 2) | 0.1 |
| S6 | GM512B | Hydrogen sulfide, etc. | 0.5–50 | ≤24 | 2.5 ± 0.1 | ≤4.0% (k = 2) | 0.05 |
| S7 | GM602B | Hydrogen sulfide, etc. | 0.5–50 | ≤24 | 1.9 ± 0.1 | ≤5.3% (k = 2) | 0.05 |
| S8 | GM702B | Carbon monoxide | 5–5000 | ≤24 | 2.5 ± 0.1 | ≤4.0% (k = 2) | 0.5 |
| S9 | GM802B | Ammonia gas, etc. | 1–300 | ≤24 | 2.0 ± 0.1 | ≤5.0% (k = 2) | 0.1 |
| S10 | GM2021B | Hydrogen | 0.1–1000 | ≤24 | 2.5 ± 0.1 | ≤4.0% (k = 2) | 0.01 |
Table 2. Importance evaluation results of feature types.

| Soil Nutrient | Feature Type Importance Ranking |
|---|---|
| SOM | V7s, RAV, MEAN, RSMV, SC, MF, MAX, SE |
| TN | BER, SC, MF, FSD, SSK, SKU, SV, SE |
| AK | SC, FSD, MF, BER, SE, SKU, SV, SSK |
| AP | SE, RCV, MDCV, MF, SC, V7s, RSMV, FSD |

BER = bandwidth energy ratio; FSD = spectral standard deviation; MAX = maximum value; MDCV = mean differential coefficient; MEAN = mean; MF = spectral mean; RCV = relative change value; RAV = response area; RSMV = relative steady-state mean; SC = spectral centroid; SE = spectral entropy; SKU = spectral kurtosis; SSK = spectral skewness; SV = spectral variance; V7s = 7-s transient value.
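To make two of the frequency-domain abbreviations above concrete, the following is a minimal sketch of how a spectral centroid (SC) and a spectral entropy (SE) might be computed from one sensor response curve. The exact formulas used in the study are not given in this section, so the textbook definitions below, the removal of the DC offset, the naive DFT, and the function names (`power_spectrum`, `spectral_centroid`, `spectral_entropy`) are all assumptions made for illustration. The test signal is synthetic.

```python
import cmath
import math

def power_spectrum(signal, fs):
    """One-sided power spectrum of a real signal via a naive O(n^2) DFT.

    The mean is subtracted first (an assumption of this sketch) so the
    DC offset of the sensor baseline does not dominate the spectrum.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        freqs.append(k * fs / n)       # frequency of bin k in Hz
        power.append(abs(coeff) ** 2)  # power in bin k
    return freqs, power

def spectral_centroid(freqs, power):
    """Power-weighted mean frequency (textbook definition)."""
    return sum(f * p for f, p in zip(freqs, power)) / sum(power)

def spectral_entropy(power):
    """Shannon entropy (bits) of the normalized power spectrum."""
    total = sum(power)
    probs = [p / total for p in power]
    return -sum(q * math.log2(q) for q in probs if q > 0)

# Synthetic example: a pure 5 Hz sine sampled at 64 Hz for one second.
fs = 64
signal = [math.sin(2 * math.pi * 5 * t / fs) for t in range(fs)]
freqs, power = power_spectrum(signal, fs)
sc = spectral_centroid(freqs, power)
se = spectral_entropy(power)
```

For this single-tone signal the centroid sits at the tone frequency and the entropy is close to zero; a broadband response would give a higher entropy.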
Table 3. Results of redundancy elimination of feature types.

| Soil Nutrient | Redundancy Elimination Results |
|---|---|
| SOM | V7s, SC, MDCV, SV, SSK, INI, DF, VAR |
| TN | BER, MF, FSD, SKU, V7s, MDCV, INI, DF |
| AK | SC, FSD, SKU, RCV, INI, V7s, DF, VAR |
| AP | SE, RCV, V7s, FSD, INI, SKU, BER, DF |

DF = dominant frequency; INI = initial value; VAR = variance.
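One common way to perform correlation-based redundancy elimination is a greedy filter: walk the feature types in importance order and keep a candidate only if its absolute Pearson correlation with every feature already kept stays below a cutoff. The sketch below illustrates that idea only. The 0.9 threshold, the function names, and the toy columns `f1` to `f3` are hypothetical, and the study's actual procedure may differ.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError for a constant column, which has no
    defined correlation.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_redundant(ranked_names, columns, threshold=0.9):
    """Keep features in importance order, skipping any whose |r| with an
    already-kept feature reaches the (hypothetical) threshold."""
    kept = []
    for name in ranked_names:
        if all(abs(pearson(columns[name], columns[k])) < threshold
               for k in kept):
            kept.append(name)
    return kept

# Toy data, not from the study: f2 is almost a linear copy of f1.
columns = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.1, 4.0, 6.2, 7.9, 10.1],
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = drop_redundant(["f1", "f2", "f3"], columns)
```

Here `f2` is dropped because it is nearly collinear with the higher-ranked `f1`, while `f3` survives. This mirrors how lower-ranked but less correlated feature types can enter the list after redundancy elimination.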
Table 4. Feature Selection Results Based on the RFECV Method.

| Soil Nutrient | Number of Sensors | Number of Features | Feature ID |
|---|---|---|---|
| SOM | 8 | 27 | MF3, SE3, SE5, SC3, MF1, MF5, V7s1, V7s3, RSMV9, SC5, RAV3, SE1, V7s9, RAV9, RSMV4, SC1, MAX1, RAV5, RAV7, V7s2, RAV2, SC2, SE4, RAV1, MF9, RAV6, SE2 |
| TN | 9 | 30 | V7s3, MDCV9, INI9, MDCV3, INI8, V7s8, V7s9, INI1, BER2, BER3, MDCV5, BER10, FSD2, SKU3, SKU4, V7s1, INI3, V7s4, MDCV4, SKU5, MDCV1, INI4, DF7, MDCV8, MF3, INI2, SKU1, MF2, BER7, V7s5 |
| AK | 9 | 22 | VAR4, INI4, VAR3, INI3, V7s4, SC3, V7s3, RCV2, DF2, INI8, SC2, SC9, V7s9, VAR7, RCV3, V7s2, INI9, VAR5, DF1, RCV5, SC5, RCV6 |
| AP | 9 | 25 | RCV9, SE9, INI6, BER6, BER1, SKU6, FSD1, V7s1, V7s2, FSD10, SKU5, V7s6, SKU3, BER9, SKU1, SE8, BER4, INI1, SKU4, V7s9, FSD3, BER8, INI3, DF9, RCV1 |
Table 5. Performance of PLSR Models Based on Features After PCC Redundancy Elimination.

| Soil Nutrient | Feature-Type Redundancy Considered | Training R² | Training RMSE | Training MAE | Training RPD | Testing R² | Testing RMSE | Testing MAE | Testing RPD |
|---|---|---|---|---|---|---|---|---|---|
| SOM | No | 0.95 | 0.89 | 0.64 | 3.66 | 0.90 | 1.01 | 0.79 | 3.31 |
| SOM | Yes | 0.96 | 0.74 | 0.59 | 3.76 | 0.92 | 0.82 | 0.75 | 3.45 |
| TN | No | 0.95 | 1.08 | 0.09 | 3.31 | 0.89 | 1.18 | 0.1 | 3.01 |
| TN | Yes | 0.94 | 0.94 | 0.08 | 3.46 | 0.92 | 0.98 | 0.09 | 3.13 |
| AK | No | 0.76 | 28.95 | 27.62 | 1.92 | 0.70 | 33.65 | 30.65 | 1.73 |
| AK | Yes | 0.76 | 27.54 | 26.53 | 1.86 | 0.73 | 29.78 | 28.45 | 1.76 |
| AP | No | 0.72 | 5.69 | 4.85 | 1.92 | 0.64 | 5.98 | 4.96 | 1.71 |
| AP | Yes | 0.74 | 4.95 | 4.75 | 2.03 | 0.67 | 5.13 | 4.64 | 1.87 |
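Table 6. Performance of Different Models Based on RFECV-Finalized Features.

| Model | Soil Nutrient | Number of Sensors | Number of Features | Training R² | Training RMSE | Training MAE | Training RPD | Testing R² | Testing RMSE | Testing MAE | Testing RPD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | SOM | 8 | 27 | 0.94 | 0.09 | 0.74 | 3.51 | 0.89 | 1.02 | 0.8 | 3.35 |
| SVM | TN | 8 | 30 | 0.91 | 1.02 | 0.08 | 3.45 | 0.89 | 1.2 | 0.1 | 3.01 |
| SVM | AK | 9 | 22 | 0.86 | 29.04 | 25.23 | 2.73 | 0.71 | 31.65 | 27.55 | 1.72 |
| SVM | AP | 9 | 25 | 0.69 | 5.65 | 4.92 | 1.69 | 0.68 | 5.87 | 4.83 | 1.74 |
| SVM-RF | SOM | 8 | 27 | 0.94 | 0.64 | 0.68 | 3.89 | 0.92 | 0.8 | 0.74 | 3.55 |
| SVM-RF | TN | 8 | 30 | 0.96 | 0.92 | 1.02 | 3.92 | 0.92 | 1.0 | 0.09 | 3.43 |
| SVM-RF | AK | 9 | 22 | 0.81 | 25.37 | 22.19 | 2.25 | 0.74 | 27.68 | 25.32 | 1.83 |
| SVM-RF | AP | 9 | 25 | 0.72 | 5.06 | 4.23 | 1.79 | 0.70 | 4.96 | 4.42 | 1.87 |
| PSO-SVM-RF | SOM | 8 | 27 | 0.96 | 0.65 | 0.43 | 4.26 | 0.94 | 0.72 | 0.54 | 3.96 |
| PSO-SVM-RF | TN | 8 | 30 | 0.98 | 0.69 | 0.06 | 4.36 | 0.94 | 0.75 | 0.07 | 3.84 |
| PSO-SVM-RF | AK | 9 | 22 | 0.89 | 19.86 | 18.73 | 3.23 | 0.78 | 23.63 | 21.25 | 1.98 |
| PSO-SVM-RF | AP | 9 | 25 | 0.69 | 4.80 | 3.92 | 1.82 | 0.74 | 4.60 | 3.83 | 1.96 |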
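For readers who want to reproduce the column definitions, the sketch below computes R², RMSE, MAE, and RPD from paired true and predicted values, and then recomputes the relative testing-set RMSE reduction from SVM to PSO-SVM-RF using the AK and AP values in Table 6. RPD is taken here as the standard deviation of the reference values divided by the RMSE. Using the sample standard deviation (n − 1) is an assumption, as are the function names and the four-point toy example.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE, and RPD for paired observations."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    sd = math.sqrt(ss_tot / (n - 1))  # sample SD (assumed convention)
    return {"R2": 1 - ss_res / ss_tot, "RMSE": rmse,
            "MAE": mae, "RPD": sd / rmse}

def rmse_reduction_pct(baseline, improved):
    """Relative RMSE decrease of `improved` versus `baseline`, in percent."""
    return 100 * (baseline - improved) / baseline

# Toy values, not from the study.
toy = regression_metrics([1, 2, 3, 4], [1, 2, 3, 5])

# Testing-set RMSE values taken from Table 6 (SVM vs. PSO-SVM-RF).
ak_drop = rmse_reduction_pct(31.65, 23.63)  # about 25.3 %
ap_drop = rmse_reduction_pct(5.87, 4.60)    # about 21.6 %
```

The recomputed reductions of about 25.3% for AK and 21.6% for AP are close to the 25.4% and 21.6% stated in the abstract; the small difference for AK is consistent with rounding of the tabulated RMSE values.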
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
