Article

Vis–NIR Spectroscopy Characteristics of Wetland Soils with Different Water Contents and Machine Learning Models for Carbon and Nitrogen Content

1
State Key Laboratory of Wetland Conservation and Restoration, Beijing 100091, China
2
Institute of Wetland Research, Chinese Academy of Forestry, Beijing 100091, China
3
Key Laboratory of Wetland Services and Restoration, Beijing 100091, China
4
Chahannaoer Wetland Ecosystem Research Station, Ulanqab 013400, China
5
Wuhan Forestry Workstation, Wuhan 430023, China
*
Author to whom correspondence should be addressed.
These authors have contributed equally to this work and share first authorship.
Ecologies 2025, 6(4), 75; https://doi.org/10.3390/ecologies6040075
Submission received: 5 August 2025 / Revised: 23 October 2025 / Accepted: 4 November 2025 / Published: 6 November 2025

Abstract

Soil nutrient detection in wetlands is critical for rapidly and effectively managing these ecosystems. Our objective was to provide a methodological framework for identifying the optimal data processing methods and machine learning models for predicting soil organic carbon (SOC) and total nitrogen (TN) content using Vis–NIR spectroscopy under the confounding influence of varying soil moisture. Soil samples (n = 474) were collected from the Shaanxi Yellow River Wetland Provincial Nature Reserve and prepared at five moisture levels (0, 5, 10, 20, and 30%). Using a Vis–NIR spectroscopy system (ASD FS4 spectrometer), SOC and TN were detected within the 350–2500 nm spectral range. Machine learning models were established using Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Partial Least Squares Regression (PLSR). The results indicated that: (1) spectral reflectance values increased as soil moisture content decreased, with the 0% moisture model being consistently more accurate; (2) models for SOC and TN built on first-derivative spectra had higher accuracy; (3) the RF model exhibited higher inversion accuracy and stability (R² = 0.30–0.69); and (4) SHAP analysis identified 1865 nm and 1419 nm as the most contributory bands for SOC and TN prediction, respectively, validating the RF model’s spectral interpretation capability.

1. Introduction

Carbon and nitrogen in soil are pivotal to the biogeochemical cycles of terrestrial ecosystems and plant growth, serving as key indicators of soil quality [1]. Soil carbon components aid in the formation and decomposition of organic matter, improve soil structure, support microbial activity and diversity, and enhance soil carbon storage, thereby mitigating climate change [2,3]. Soil nitrogen provides essential nutrients for plant growth, bolstering soil fertility through the nitrogen cycle and fostering ecosystem productivity [4,5]. To understand the critical role of soil in global material cycles, it is necessary to quantitatively assess soil carbon and nitrogen contents and their management [6]. Soil, as a natural continuum, exhibits significant spatial heterogeneity in the distribution of soil organic carbon (SOC) and total nitrogen (TN) [7]. Traditional laboratory chemical measurement methods are expensive, generate chemical waste that can pollute the environment, and are not feasible for large-scale applications [8,9]. Visible–near infrared (Vis–NIR) spectroscopy can efficiently and rapidly acquire information on various soil nutrients. With low operational costs and strong adaptability, it provides robust technical support for measuring soil nutrient content [10]. The principles of reflectance spectroscopy in soil science are closely related to the interaction between electromagnetic waves and soil components [11,12]. Different wavelengths of soil reflectance spectra exhibit specific optical responses to soil components such as carbon and nitrogen, enabling the estimation of their content [13,14,15]. In recent years, Vis–NIR spectroscopy has increasingly been applied in soil science, including soil classification [16], nutrient content estimation [17], moisture measurement [18], and pollutant detection [19], becoming a reliable tool for investigating soil-related issues.
However, the collection of soil spectra is influenced by numerous factors, of which soil moisture is one of the most critical [20,21,22]. Previous studies have shown that when collecting in situ spectra in the field, the presence of soil pores and soil microaggregates causes the soil to absorb moisture from the air. This moisture increases the soil absorbance, increases the light refracted from the soil surface to the air, and decreases the light reflected. Consequently, this leads to measurement errors in the soil spectral reflectance and makes it challenging to establish high-accuracy prediction models for soil property content [23,24,25].
In recent years, researchers worldwide have conducted extensive studies on the relationship between soil moisture and soil spectral reflectance. It is widely accepted that this relationship is nonlinear. Specifically, when the soil moisture content is below field capacity, spectral reflectance is negatively correlated with moisture content, with reflectance values decreasing as moisture content increases. Conversely, when the soil moisture content exceeds field capacity, the correlation is positive, with reflectance values increasing as moisture content increases [26]. Other work has suggested that the significant influence of soil moisture on spectral reflectance may obscure the spectral characteristics of other substances [27]. Morgan et al. found that rewetting air-dried soil samples reduced the prediction accuracy of SOC and inorganic carbon owing to the impact of soil moisture [28]. Further studies have found that the predictive ability of Vis–NIR for soil properties declined when the soil moisture content exceeded 10% [29]; that estimating soil carbon, nitrogen, phosphorus, and potassium contents and pH from oven-dried soil in the laboratory slightly improved prediction accuracy owing to the removal of moisture [30]; that the prediction accuracy for organic matter, available nitrogen, and available potassium decreased significantly with increasing soil moisture [31]; and that, when soil TN content was predicted from samples at eight moisture levels, moisture affected the predictive performance of the models [32].
Research on the impact of soil moisture on near-infrared spectroscopy has made progress, and several studies have analysed in detail how moisture affects the prediction of certain soil parameters (such as soil organic matter, available nitrogen, and available potassium). However, research on the effect of soil moisture on the prediction of SOC and TN remains limited. The objectives of this study were to: (1) systematically evaluate the impact of soil moisture on the prediction accuracy of SOC and TN; (2) compare the effectiveness of spectral preprocessing techniques, specifically raw and first-derivative (FD) spectra; (3) assess the performance and stability of three machine learning algorithms, namely eXtreme Gradient Boosting (XGBoost), Random Forest (RF), and Partial Least Squares Regression (PLSR), across five soil moisture levels (0%, 5%, 10%, 20%, and 30%); and (4) interpret the prediction mechanisms of the optimal model using SHapley Additive exPlanations (SHAP) to identify the most contributory spectral bands.
While the impact of soil moisture on Vis–NIR spectra is well-documented, a systematic comparison of prediction models across a controlled moisture gradient, combined with interpretable machine learning for SOC and TN in wetlands, remains limited. The novelty of this study lies in this comprehensive framework. Specifically, compared to earlier works, which often focused on specific moisture levels or correction techniques, our study provides a direct and systematic evaluation of model performance and interpretability across five defined moisture levels (0–30%), explicitly quantifying the accuracy loss with increasing moisture and identifying the most robust modeling strategy for such conditions.

2. Materials and Methods

2.1. Study Site

The Shaanxi Yellow River Wetland Provincial Nature Reserve is located in the eastern part of the Guanzhong Plain in Shaanxi Province (110°10′–110°36′ E, 34°36′–35°40′ N), within the Weinan city region. It encompasses the Yellow River channel, floodplains, and the confluence areas of the Yellow River, Wei River, and Luo River in Hancheng, Heyang, Dali, Huayin, and Tongguan, covering a total area of 45,986 ha. It is the largest riverside wetland in the Yellow River Basin in China. The reserve primarily aims to protect the natural environment of the wetland ecosystem and the related biota dependent on it. The terrain within the protected area is flat, and the soil types include saline soil, new soil, and swamp soil, with a soil pH of 7.66–10.56. The climate is warm temperate continental semi-humid monsoon, with an average annual temperature of 13.5 °C and an average annual precipitation of 529–574 mm [33]. The reserve boasts rich plant resources, including 287 species of seed plants from 236 genera and 70 families, two species of gymnosperms from one family, 69 species of monocots from 17 families, and 216 species of dicots from 52 families. Dominant plant species within the wetland communities include common reed (Phragmites australis) and oriental cattail (Typha orientalis), both typical of herbaceous marshes.

2.2. Soil Sample Collection

The sampling was conducted between August and September 2022. Soil samples were collected from a depth of 0–30 cm at each sampling site, thoroughly mixed, and bagged. The sampling points were spaced more than 50 m apart, resulting in 474 sampling points. The final collection included 116, 117, 120, and 121 samples from bare, reed (Phragmites australis), nutgrass (Cyperus rotundus), and cattail (Typha orientalis) wetlands, respectively. The sampling areas were consistent within the Yellow River Wetland Provincial Nature Reserve, providing ample data to support our results.

2.3. Chemical Determination of Soil Carbon and Nitrogen Content

SOC content was determined using the potassium dichromate-ferrous sulphate titration method, whereas TN content was measured using the semi-micro Kjeldahl method.
The descriptive statistics of the measured SOC and TN contents for all 474 samples are as follows: SOC ranged from 0.29 to 13.69 g/kg with a mean ± standard deviation of 3.32 ± 2.03 g/kg; TN ranged from 0.09 to 1.57 g/kg with a mean ± standard deviation of 0.40 ± 0.24 g/kg.

2.4. Spectral Data Measurement of Soil Samples with Different Moisture Contents

Gravel, insects, plant roots, and leaf debris were manually removed from the freshly collected moist soil samples. The cleaned samples were then placed in aluminium boxes that had been pre-dried and weighed, with the box weight recorded as M1. We dried the samples in an oven at 105 °C for 24 h to ensure that all moisture was removed. After drying, the samples were weighed again, and the weight of the box plus dry soil was recorded as M2. The difference M2 − M1 represented the dry weight of the soil sample in the aluminium box, denoted M. Fully dried soil samples were labelled as samples with 0% moisture content. Subsequently, we added 0.05 M, 0.10 M, 0.20 M, and 0.30 M of distilled water to the dried samples to prepare soil samples with 5%, 10%, 20%, and 30% moisture content, respectively.
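The water additions described above reduce to simple arithmetic on the dry soil mass. The following Python sketch illustrates it; the function name and the example masses (a 25 g box, 75 g for box plus dry soil) are hypothetical, and the moisture levels are treated as fractions of dry soil mass, as the 0.05 M to 0.30 M additions imply.

```python
def water_to_add(box_mass_g, dried_total_mass_g, target_moisture):
    """Grams of distilled water needed to reach a target moisture fraction.

    box_mass_g         -- M1, the empty pre-dried aluminium box
    dried_total_mass_g -- M2, the box plus oven-dried soil
    target_moisture    -- moisture as a fraction of dry soil mass (0.05 = 5%)
    """
    dry_soil_mass = dried_total_mass_g - box_mass_g  # M = M2 - M1
    if dry_soil_mass <= 0:
        raise ValueError("M2 must be larger than M1")
    if target_moisture < 0:
        raise ValueError("target moisture cannot be negative")
    return target_moisture * dry_soil_mass


# Hypothetical example: 25 g box, 75 g after drying, so M = 50 g of dry soil.
moisture_levels = [0.0, 0.05, 0.10, 0.20, 0.30]
additions = [round(water_to_add(25.0, 75.0, w), 2) for w in moisture_levels]
print(dict(zip(moisture_levels, additions)))
```

Each sample would be re-dried or prepared separately per level in practice; the sketch only covers the mass calculation.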
Using an ASD FS4 (Analytical Spectral Devices, Inc., Boulder, CO, USA) equipped with a soil reflectance probe, the spectral data of soil samples with different moisture gradients were collected in a dark room across the 350–2500 nm wavelength range. The prepared soil samples with varying moisture content were poured into preprepared black Petri dishes and the surface of each soil sample was smoothed with a knife. The soil reflectance probe was positioned vertically 5 cm above the sample surface, ensuring that the receiving area of the probe did not exceed the diameter of the dish, to minimise the influence of external light on the spectral reflectance of the soil sample. The instrument was preheated for 30 min before calibration with a white reference panel, and a white reference correction was performed before the spectral measurement of each soil sample. Ten spectral curves were collected for each soil sample, and the arithmetic average of these 10 curves was used as the actual reflectance spectral data to reduce measurement errors.
The moisture levels investigated in this study (0–30%) encompass a range relevant to field conditions in wetlands. Field capacity for many mineral soils typically falls between 20% and 30%. Therefore, our gradient from 0% to 30% covers conditions from completely dry to near or at field capacity, which are commonly encountered in field monitoring and remote sensing scenarios. This allows our findings to be applied directly to understanding and mitigating moisture effects in practical applications.

2.5. Soil Spectral Data Preprocessing

Spectral data were extracted using Viewspec Pro software (Version 5.6), and the average of the 10 spectra collected at each sample point was exported as the reflectance spectrum for that point. To avoid redundancy caused by overly narrow bands, the spectral data were smoothed by averaging every 10 adjacent bands. To improve inversion accuracy, the raw spectral reflectance was transformed using the first-derivative (FD) method [34]. The formula is:
$$FDR(\lambda_i) = \frac{R(\lambda_{i+1}) - R(\lambda_{i-1})}{\Delta\lambda}$$
In the formula, $\lambda_i$ represents the wavelength of each band, $FDR(\lambda_i)$ is the FD value at wavelength $\lambda_i$, and $\Delta\lambda$ represents the wavelength interval from band $i$ to band $i+1$.
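As an illustration of the two preprocessing steps in this section, the sketch below averages every 10 adjacent bands and then takes a difference between the neighbours of each band. The function names and the synthetic linear spectrum are hypothetical. One assumption is made explicit: the denominator is taken as the wavelength span between the two differenced bands, so the result is a slope in reflectance per nm; if the denominator is read as the spacing between adjacent bands, the values differ only by a constant factor of 2.

```python
import numpy as np


def average_bands(values, width=10):
    """Average each run of `width` adjacent bands; an incomplete tail is dropped."""
    v = np.asarray(values, dtype=float)
    n_full = (v.shape[-1] // width) * width
    return v[..., :n_full].reshape(v.shape[:-1] + (-1, width)).mean(axis=-1)


def first_derivative(reflectance, wavelengths):
    """Difference of the two neighbouring bands divided by their wavelength span.

    Returns values for interior bands only (the first and last band have
    no neighbour on one side).
    """
    r = np.asarray(reflectance, dtype=float)
    w = np.asarray(wavelengths, dtype=float)
    return (r[..., 2:] - r[..., :-2]) / (w[2:] - w[:-2])


# Synthetic spectrum: 1 nm sampling over 350-2500 nm, reflectance rising linearly.
wl = np.arange(350, 2501)
refl = 0.1 + 1e-4 * wl

wl10 = average_bands(wl)        # band centres after 10-band averaging
refl10 = average_bands(refl)    # averaged reflectance
fd = first_derivative(refl10, wl10)
print(wl10.shape, fd.shape)
```

Because both helpers operate on the last axis, the same calls work on a samples-by-bands matrix.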

2.6. Selection of Characteristic Bands

The Successive Projections Algorithm (SPA) is a feature selection method aimed at improving the robustness and predictive power of multivariate regression models by iteratively selecting variables that best explain the data characteristics [35]. The SPA begins by randomly selecting a variable from the dataset, then projects the remaining variables onto the orthogonal complement space of the current variable set. The variable with the largest projection length is added to the variable set. This process is repeated until the number of selected variables reaches a predetermined value n. Using raw spectral reflectance and FD spectral reflectance as independent variables and soil carbon and nitrogen values as dependent variables, SPA calculations were implemented using MATLAB R2023a. The calculated wavelengths were used as characteristic bands to predict soil carbon and nitrogen values. To determine the optimal number of characteristic bands (n), the SPA was run for a range of variable subset sizes. For each candidate subset selected by SPA, a preliminary PLSR model was constructed, and the corresponding Root Mean Square Error (RMSE) was calculated. The final number of variables (n = 20) was chosen because it corresponded to the point where the RMSE reached a minimum or a stable plateau, following the standard optimization procedure for SPA.
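The projection step described above can be sketched in a few lines of Python with NumPy. This is an illustrative sketch, not the MATLAB implementation used in the study: the function name is hypothetical, the starting band is fixed rather than random so the result is reproducible, columns are mean-centred first (a common convention, assumed here), and the RMSE-based choice of n is omitted.

```python
import numpy as np


def spa_select(X, n_select, start=0):
    """Pick `n_select` column indices of X by successive orthogonal projections."""
    X = np.asarray(X, dtype=float)
    n_bands = X.shape[1]
    if not 1 <= n_select <= n_bands:
        raise ValueError("n_select must be between 1 and the number of bands")

    Xp = X - X.mean(axis=0)          # assumption: mean-centre each band
    selected = [start]
    for _ in range(n_select - 1):
        ref = Xp[:, selected[-1]].copy()
        denom = ref @ ref
        if denom < 1e-12:
            raise ValueError("remaining bands add no independent information")
        # Project every band onto the orthogonal complement of the last pick.
        Xp = Xp - np.outer(ref, ref @ Xp) / denom
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0       # never re-select a band
        selected.append(int(np.argmax(norms)))
    return selected


# Toy data: bands 1 and 3 are scaled copies of bands 0 and 2 (fully collinear).
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 30))
X_demo = np.column_stack([a, 0.5 * a, b, 0.5 * b, c])
picked = spa_select(X_demo, n_select=3, start=0)
print(sorted(picked))
```

In the toy data the collinear copies project to near-zero length after their parent band is chosen, which is how the method suppresses redundant bands.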

2.7. Construction and Accuracy Verification of Machine Learning Models

To ensure that both the calibration and validation sets were representative of the overall distribution of soil carbon and nitrogen content, a stratified sampling method was employed. Specifically, the entire dataset of 474 samples was first sorted in ascending order based on their Soil Organic Carbon (SOC) and Total Nitrogen (TN) content values. The sorted data was then systematically partitioned into four substrata by selecting every third sample. Finally, approximately three-quarters of the samples from each substratum were assigned to the calibration set, while the remaining one-quarter formed the validation set. XGBoost, RF, and PLSR prediction models were constructed for soil ecological stoichiometric characteristics. Training and validation of the three regression models were implemented using the Scikit-learn library in Python 3.9.
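One possible reading of the partitioning above is a rank-based systematic split: sort the samples by the target value and hold out one sample in every four. The sketch below shows that reading only; the function name, the `every` and `offset` parameters, and the synthetic SOC values are hypothetical, and the resulting set sizes need not match those reported in the study.

```python
import numpy as np


def rank_based_split(target, every=4, offset=3):
    """Sort by target and send one sample in every `every` to validation.

    Returns (calibration_indices, validation_indices) into the original array.
    """
    order = np.argsort(np.asarray(target, dtype=float), kind="stable")
    held_out = (np.arange(order.size) % every) == offset
    return order[~held_out], order[held_out]


# Synthetic, right-skewed stand-in for 474 measured SOC values (g/kg).
rng = np.random.default_rng(42)
soc = rng.gamma(shape=2.7, scale=1.2, size=474)

cal_idx, val_idx = rank_based_split(soc)
print(len(cal_idx), len(val_idx))
print(soc[cal_idx].min(), soc[cal_idx].max())
print(soc[val_idx].min(), soc[val_idx].max())
```

Because the held-out samples are spread evenly along the sorted values, both sets span nearly the full range of the target, which is the stated aim of the stratification.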
The accuracy of the machine learning models was evaluated by comparing the coefficient of determination (R²), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). A larger R² and smaller RMSE and MAE indicate a higher prediction accuracy of the model [36,37].
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y - y_i)^2}{\sum_{i=1}^{n} (y - \bar{y})^2}$$
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (y - y_i)^2}{n}}$$
$$MAE = \frac{\sum_{i=1}^{n} \lvert y_i - y \rvert}{n}$$
where y represents the measured values of the soil element content, y i represents the model-predicted values of the soil element content, y ¯ is the average of the measured soil element content values, and n is the number of samples.
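The three metrics can be computed directly with NumPy, as in the sketch below. The function name and the four example values are hypothetical, and R² is computed with the conventional definition, one minus the ratio of residual to total sum of squares, which is also what scikit-learn's `r2_score` reports.

```python
import numpy as np


def regression_metrics(measured, predicted):
    """Return (R2, RMSE, MAE) for measured vs. model-predicted values."""
    y = np.asarray(measured, dtype=float)
    p = np.asarray(predicted, dtype=float)
    if y.shape != p.shape:
        raise ValueError("measured and predicted must have the same shape")
    residuals = y - p
    ss_res = np.sum(residuals ** 2)           # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)      # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(residuals ** 2))
    mae = np.mean(np.abs(residuals))
    return float(r2), float(rmse), float(mae)


# Hypothetical measured and predicted contents (g/kg).
r2, rmse, mae = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(r2, rmse, mae)
```

Note that R² defined this way can be negative on a validation set when the model predicts worse than the mean of the measured values.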
To ensure the robustness and optimal performance of the machine learning models, key hyperparameters were tuned. The key hyperparameters and their search ranges ultimately employed for each model are as follows:
eXtreme Gradient Boosting (XGBoost): The search ranges for key parameters were: the number of estimators (n_estimators) from 0 to 300; maximum tree depth (max_depth) selected from [3, 5–7, 9, 12, 15]; minimum child weight (min_child_weight) selected from [1, 3, 5, 7]; the regularization parameter (gamma) selected from [0, 0.05–0.1, 0.3, 0.5, 0.7, 0.9, 1]; and the subsampling ratios for instances (subsample) and features (colsample_bytree) both selected from [0.6, 0.7, 0.8, 0.9, 1].
Random Forest (RF): The search ranges for key parameters were: the number of trees (n_estimators) from 0 to 300; maximum depth (max_depth) selected from [3, 5–7, 9, 12, 15]; the number of features considered for splitting (max_features) from 5 to 20; the minimum samples required to split a node (min_samples_split) from 2 to 11; and the minimum samples required to be at a leaf node (min_samples_leaf) from 1 to 11.
Partial Least Squares Regression (PLSR): The number of latent components (n_components) was set to 2, as this value was determined to optimally explain the variance in the calibration dataset while avoiding overfitting.
Savitzky–Golay (SG) Filter: Spectral smoothing was applied using an SG filter with a window length set to 9 and a polynomial order of 3. This configuration effectively reduced high-frequency noise while preserving the essential shape of the spectral curves.
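To make the SG configuration concrete, the following sketch derives the filter weights for a window of 9 and a polynomial order of 3 from a least-squares fit and applies them to a spectrum. It is a NumPy-only illustration with hypothetical function names; the edge bands are simply left unsmoothed, which is an assumption and not necessarily how the study handled the edges. SciPy's `scipy.signal.savgol_filter` offers the same filter with `window_length` and `polyorder` arguments.

```python
import numpy as np


def savgol_coefficients(window_length=9, polyorder=3):
    """Weights that return the fitted polynomial's value at the window centre."""
    if window_length % 2 == 0 or window_length <= polyorder:
        raise ValueError("window_length must be odd and larger than polyorder")
    half = window_length // 2
    offsets = np.arange(-half, half + 1)
    design = np.vander(offsets, polyorder + 1, increasing=True)  # 1, x, x^2, ...
    # Row 0 of the pseudo-inverse maps window values to the constant term,
    # i.e. the least-squares polynomial evaluated at offset 0.
    return np.linalg.pinv(design)[0]


def savgol_smooth(spectrum, window_length=9, polyorder=3):
    """Smooth a 1-D spectrum; the first and last half-window are left as-is."""
    y = np.asarray(spectrum, dtype=float)
    if y.size < window_length:
        raise ValueError("spectrum is shorter than the window")
    coeffs = savgol_coefficients(window_length, polyorder)
    half = window_length // 2
    out = y.copy()
    # np.convolve flips its kernel, so pass it reversed to get a sliding dot product.
    out[half:-half] = np.convolve(y, coeffs[::-1], mode="valid")
    return out


coeffs = savgol_coefficients(9, 3)
x = np.arange(50.0)
cubic = 0.5 * x**3 - 2.0 * x**2 + x + 3.0
smoothed = savgol_smooth(cubic)
print(np.round(coeffs * 231))
```

A cubic signal passes through unchanged because the filter fits a cubic in every window; only components the polynomial cannot follow, such as high-frequency noise, are attenuated.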

2.8. Model Interpretation Using SHAP Analysis

To interpret the prediction mechanisms of the optimal machine learning model and identify the most influential spectral bands, SHapley Additive exPlanations (SHAP) analysis was employed. SHAP is a game-theoretic approach that assigns each feature an importance value for a particular prediction, quantifying its contribution to the model’s output. The analysis was performed specifically on the best-performing Random Forest model under 0% moisture conditions using the SHAP Python library. The mean absolute SHAP value was calculated across all samples in the validation set to determine the overall importance of each wavelength. This analysis allowed for the identification of the key spectral bands driving the predictions of SOC and TN content, as well as the direction of their influence.
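The idea behind the mean absolute SHAP ranking can be illustrated without the SHAP library by computing exact Shapley values for a tiny model. The sketch below is a simplified stand-in, not the procedure used in the study: the function names, the three-feature linear toy model, and the random data are hypothetical, and features outside a coalition are replaced by a single baseline vector (here the validation mean), which is only one of several conventions. Exact enumeration is feasible only for a handful of features.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(predict, x, baseline):
    """Exact Shapley values of one sample relative to a baseline vector."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = x.size

    def value(coalition):
        # Features in the coalition take the sample's values, the rest the baseline's.
        z = baseline.copy()
        idx = np.array(coalition, dtype=int)
        z[idx] = x[idx]
        return predict(z)

    phi = np.zeros(n)
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                phi[j] += weight * (value(coalition + (j,)) - value(coalition))
    return phi


# Toy model over three hypothetical bands; the third band is unused.
weights = np.array([2.0, -3.0, 0.0])


def toy_model(z):
    return float(weights @ z)


rng = np.random.default_rng(1)
X_val = rng.normal(size=(20, 3))
baseline = X_val.mean(axis=0)

phi_all = np.array([shapley_values(toy_model, row, baseline) for row in X_val])
importance = np.abs(phi_all).mean(axis=0)   # mean absolute contribution per band
print(importance)
```

For each sample the contributions sum to the difference between the model's prediction and its prediction at the baseline, and the sign of each value gives the direction of the band's influence.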

3. Results

3.1. Characteristics of Soil Spectral Curves Under Different Moisture Contents

The spectral data of the wetland soils with different moisture contents in the 350–2500 nm range were averaged, the reflectance data were smoothed, and the noise bands were removed using an SG filter. This resulted in the average reflectance spectra for wetland soils with five different moisture contents, as shown in Figure 1. The figure shows that the reflectance spectra of the soils with the five moisture gradients generally increased with wavelength, exhibiting a similar overall trend. However, specific differences were evident at the absorption peaks at approximately 1400 and 1900 nm, where the depth of the absorption peaks increased with increasing soil moisture content. Additionally, differences were observed in the overall reflectance levels; as the soil moisture content increased, the soil spectral reflectance decreased. Notably, the absorption peak at 2200 nm was almost identical across the five moisture levels and, unlike the previous two absorption peaks, did not vary with moisture content. Quantitatively, the average reflectance at 800 nm decreased from approximately 0.35 at 0% moisture to below 0.20 at 30% moisture, representing a reduction of over 40%. The absorption depths at 1400 nm and 1900 nm, which are key water absorption features, increased markedly with moisture content.

3.2. Successive Projections Algorithm for Feature Bands

The SPA reduces multicollinearity among variables by iteratively selecting them through successive projections, thereby enhancing the robustness of the model. This method identifies the most representative variables, improving the predictive accuracy and interpretability of multivariate regression models. Compared to other feature selection methods, SPA efficiently performs dimensionality reduction on large-scale Vis–NIR spectroscopy data with high computational efficiency. The results of the feature bands selected using SPA are shown in Figure 2.
To achieve a good balance in terms of information retention, model stability, and generalisation ability and to enhance the effectiveness of feature band extraction and the accuracy of subsequent analysis, we set the minimum value of the number of selected variables, n, to 20. Consequently, we obtained feature bands for the original and FD spectral data for the five different moisture levels.

3.3. Accuracy Evaluation of Soil Element Content Model Validation Based on Full-Band Analysis of Different Moisture Contents

Based on the full-band raw and FD spectral data and the measured SOC and TN values, XGBoost, RF, and PLSR models were established for wetland soils with five moisture content gradients using 357 training samples. The model validation results are shown in Figure 3.
First, the prediction effectiveness of the three models for SOC and TN was compared. The R² range for SOC was 0.18 to 0.58, whereas for soil TN, the R² range was 0.14 to 0.60, indicating that the prediction accuracy was relatively better for soil TN. Comparing the three machine learning models, the R² range for XGBoost was 0.23 to 0.53, for RF was 0.24 to 0.58, and for PLSR was 0.14 to 0.47. Among these, RF demonstrated the best prediction accuracy.
Comparing the accuracy of the raw and FD spectral data transformation types across the different models showed that, for full-band models at every moisture gradient, the R² values of the three models based on the FD spectra were higher than those based on the raw spectra.
Comparing the inversion results of the models set for the five soil moisture gradients in this study, it was found that the inversion model based on the 0% moisture gradient had higher accuracy and stability than the models for the other four moisture gradients. In addition, the inversion accuracy of the models for the other four moisture gradients decreased as the moisture content increased.

3.4. Accuracy Evaluation of Soil Element Content Model Validation Based on Different Moisture Content Characteristic Bands

Based on the characteristic bands selected by SPA, raw and FD spectral data, and measured SOC and TN values, XGBoost, RF, and PLSR models were established for wetland soils with five moisture content gradients using 357 training samples. The model validation results are listed in Table 1.
First, comparing the prediction accuracy of the three models for SOC and TN, the R² range for SOC was 0.18 to 0.59, whereas for soil TN, the R² range was 0.14 to 0.69, indicating that the prediction accuracy was relatively better for soil TN. Comparing the three machine learning models, the R² range for XGBoost was 0.22 to 0.64, for RF was 0.25 to 0.69, and for PLSR was 0.14 to 0.59. The results showed that RF gave the most stable and accurate predictions, and the accuracy of all three models improved compared to the full-band models.
Comparing the accuracy of the raw and FD spectral data transformation types across the different models, the R² values of the three characteristic-band models based on the FD spectra were higher than those based on the raw spectra at every moisture gradient.
Comparing the inversion results of the models set for the five soil moisture gradients in this study, the inversion model based on the 0% moisture gradient had higher accuracy and stability than the models for the other four moisture gradients. In addition, the inversion accuracy of the models for the other four moisture gradients decreased as the moisture content increased. Therefore, the RF model based on the FD spectral data of the characteristic bands for 0% moist soil exhibited higher inversion accuracy and stability.
Furthermore, as shown in Figure 4, the scatter plots of SOC and TN estimates using the two spectral characteristic bands and three models at the 0% moisture gradient indicated that the RF validation set points were more concentrated around the 1:1 line. This suggests that the RF model has higher accuracy than the other two models.

3.5. Interpretation of Prediction Mechanisms via SHAP Analysis

The SHAP analysis was applied to the optimal Random Forest model (0% moisture, FD spectra) to elucidate the contribution of individual spectral bands to the prediction of SOC and TN. The results indicated that 1865 nm was the most influential band for SOC prediction, where lower reflectance contributed more significantly to higher predicted values (Figure 5). For TN prediction, 1419 nm was identified as the most contributory band, with higher reflectance exerting a stronger negative contribution to the predicted value (Figure 6).

4. Discussion

While the inverse relationship between soil moisture and spectral reflectance is a well-established phenomenon, the primary novelty of this work lies not in rediscovering this relationship, but in providing a systematic, methodological framework to address its confounding effects. This study moves beyond mere observation by rigorously evaluating and identifying the optimal combination of spectral preprocessing, machine learning algorithms, and interpretability tools to maximize the prediction accuracy of soil carbon and nitrogen under variable moisture conditions.

4.1. Impact of Soil Moisture on the Accuracy of Soil Nutrient Retrieval

The spectral reflectance characteristics of the soil are influenced by its texture and physicochemical properties [38]. The complex composition, physicochemical properties, structure, and functions of soil result in a significant amount of interference in spectral information [39]. Notably, during the acquisition of soil spectral information, soil moisture, clay content, salinity, and temperature are the primary factors causing the appearance of absorption peaks in the spectra [40,41]. Soil moisture content affects spectral reflectance; specifically, as soil moisture content increases, soil reflectance gradually decreases. In this study, after converting the original soil spectra to FD data, Vis–NIR spectroscopy fitting models for SOC and TN content were constructed using the XGBoost, RF, and PLSR methods. The dry-soil model with 0% moisture content exhibited the highest fitting accuracy. This phenomenon may occur because soil moisture masks the spectral features of soil nutrients (such as carbon and nitrogen), leading to differences in spectral bands and affecting soil spectral reflectance.
In addition, because soil moisture has a high absorption coefficient in the near-infrared band, it significantly interferes with the detection of soil nutrient content. Under different moisture conditions, the influence of moisture on the spectrum varied across bands, but significant differences were observed at approximately 1400, 1900, and 2200 nm. This is because of the well-defined water absorption bands near 1400 and 1900 nm [42]. When the soil moisture content is low, its surface roughness may increase, leading to more scattered light being reflected towards the detector, thereby increasing the reflectance. Conversely, when the soil moisture content is high, the dielectric constant of the soil increases, affecting the interaction between the soil and electromagnetic waves, and leading to more light absorption and reduced reflectance. These findings are similar to those of Muller et al. [43] and Croft et al. [44], suggesting that the lower the soil moisture content, the stronger the ability to predict soil carbon and nitrogen contents using soil spectral features. Future research should focus on eliminating the impact of moisture on the Vis–NIR spectral estimation of soil elements.

4.2. Differences in Element Content Inverted from Vis–NIR Spectroscopy Data

Vis–NIR spectroscopy technology determines the chemical composition and relative content of substances based on spectral reflectance to identify materials. Early studies indicated that visible spectra are primarily produced by outer electron transitions, whereas near-infrared spectra are mainly influenced by molecular vibrations, reflecting the composition and structure of molecules [45]. Soil carbon and nitrogen content are considered soil properties that can enhance spectral reflectance intensity [46]. The successful prediction of carbon and nitrogen contents is mainly because of their direct spectral response in the Vis–NIR spectrum, which is caused by the overtones and combinations of N-H, O-H, C-H+C-H, and C-H+C-C [47,48,49]. Therefore, using Vis–NIR spectroscopy technology to establish models for soil carbon and nitrogen content allows for the rapid estimation of these contents with high prediction accuracy.

4.3. Differences in Inversion of Different Vis–NIR Spectroscopy Data of Wetland Soil

Vis–NIR spectroscopy data are vast and exhibit strong correlations between bands. Before performing data analysis with Vis–NIR spectroscopy data, feature selection methods were used to extract characteristic bands. Eliminating bands unrelated to carbon and nitrogen content could significantly reduce model complexity. This is an important aspect of simplifying Vis–NIR spectroscopy data models [50,51,52]. Using SPA, the number of bands in the raw and FD spectral data was reduced to approximately 20, greatly decreasing the data dimensionality. Additionally, the characteristic bands of the raw and FD spectral data were mainly concentrated in the peaks and troughs where reflectance changes were more pronounced (Figure 2), indicating that the characteristic bands extracted by SPA retained the spectral features well. The models built based on the characteristic bands calculated using SPA showed significantly higher accuracy than the models that did not use SPA. The use of the SPA significantly reduced the model complexity and computational load while improving the model accuracy. Therefore, combining SPA with machine-learning models could be an effective method for rapidly predicting soil carbon and nitrogen content.

4.4. Interpretation of Prediction Mechanisms Through SHAP Analysis

The band contributions for Soil Organic Carbon (SOC) and Total Nitrogen (TN) prediction by the Random Forest model under 0% moisture conditions were analyzed using SHAP values (Figure 5 and Figure 6). The SHAP analysis identified 1865 nm and 1419 nm as the most influential bands for SOC and TN prediction, respectively. It is critical to note that these wavelengths coincide with strong water absorption features. Therefore, their high importance in the model likely reflects the residual coupling between soil moisture and the spectral features of carbon and nitrogen constituents, rather than representing direct, fundamental spectral features of SOC or TN molecules themselves. This underscores the challenge of disentangling moisture effects from nutrient signals, even in models trained on dried samples (0% moisture), and highlights that these models may still be leveraging covariances related to the soil’s water-holding capacity or organic matter structure.
For TN inversion, 1419 nm was the most influential band, with higher reflectance exerting a stronger negative contribution to the predicted value. The next most contributory band was 795 nm, where higher reflectance yielded a positive contribution to the prediction.
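To make the notion of a per-band SHAP contribution concrete, the sketch below computes exact Shapley values by brute force for a tiny stand-in model. Everything in it is an assumption for illustration: `toy_model`, its coefficients, the three-band input, and the single-baseline definition of a "missing" band are hypothetical, and the study itself applied SHAP to a Random Forest, which is normally done with a dedicated tree explainer rather than by enumeration. The coefficient signs merely mirror the directions described above (negative for the first band, positive for the second).

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to one baseline sample.

    A band outside the coalition is replaced by its baseline value
    (one common convention, assumed here for simplicity).
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        contributions.append(total)
    return contributions


def toy_model(reflectance):
    # Hypothetical linear stand-in for a TN model; the third band is ignored.
    return 2.0 - 3.0 * reflectance[0] + 1.5 * reflectance[1]


sample = [0.40, 0.30, 0.20]    # invented reflectance values for three bands
reference = [0.30, 0.25, 0.20] # invented baseline (for example, a mean spectrum)
phi = shapley_values(toy_model, sample, reference)
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that allows SHAP plots such as Figure 5 and Figure 6 to be read band by band.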

4.5. Impact of Machine Learning Models on the Accuracy of Wetland Soil Carbon and Nitrogen Inversion

This study utilised three regression models to explore the inversion of soil carbon and nitrogen content. The RF and PLSR models have been widely used to predict soil elemental content, confirming their versatility (Table 2). The XGBoost model, an emerging ensemble learning algorithm known for its fast training speed and high fitting accuracy, has been successfully applied in fields such as salinity inversion [53] and chlorophyll-a concentration inversion [54] but is less commonly used in soil nutrient inversion studies. The RF model is an ensemble learning algorithm primarily used for classification and regression tasks. It trains and predicts samples by constructing multiple decision tree models [55]. When building each tree, Random Forest randomly selects a specific number of features at each node split, rather than using all features, which increases model diversity and reduces overfitting [56,57]. The PLSR model combines the advantages of principal component analysis, correlation analysis, and linear regression, effectively removing multicollinearity and handling high-dimensional data [58,59]. The XGBoost model is an ensemble learning algorithm based on tree models, optimized and improved using the gradient boosting algorithm. It introduces regularization terms to control model complexity and prevent overfitting. Additionally, it can automatically handle missing data by learning the optimal split direction for missing values, thereby enhancing the model’s robustness [60,61].
The comparatively lower predictive accuracy of the Partial Least Squares Regression (PLSR) model, relative to the tree-based ensemble methods, can be primarily attributed to its inherent linearity. PLSR is designed to identify linear relationships between spectral data and soil properties. However, the introduction of varying soil moisture creates complex, non-linear interactions and overlaps within the spectral data, which a linear model struggles to adequately capture. In contrast, tree-based models like Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) excel in this context due to their inherent capability to model these non-linear relationships and complex feature interactions through their hierarchical, decision-making structure. Between the two non-linear ensemble methods, the Random Forest (RF) model demonstrated superior robustness and overall accuracy compared to XGBoost. This can be explained by key architectural differences. RF employs “bagging” (Bootstrap Aggregating), which builds each tree on a random subset of data and features. This introduces high model diversity and effectively reduces variance, making it particularly robust to noisy data and complex spectral patterns. While XGBoost is a powerful “boosting” algorithm that sequentially corrects errors, it can be more sensitive to such noise and may require more extensive hyperparameter tuning to achieve optimal generalization. The inherent stability of the RF algorithm likely rendered it more adaptable to the challenging spectral variations induced by moisture in this study.
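The variance-reduction argument for bagging can be stated compactly. The expression below is a standard textbook result rather than something derived in this study, and the symbols are assumptions introduced only for illustration: B is the number of trees, T_b(x) the prediction of tree b, σ² the variance of a single tree's prediction, and ρ the pairwise correlation between trees.

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \frac{1}{B^{2}}\left[B\sigma^{2} + B(B-1)\rho\sigma^{2}\right]
  = \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
```

Under these assumptions, adding trees shrinks the second term, while random feature selection at each split lowers ρ and hence the first term, which is consistent with the robustness to noisy spectra attributed to RF above.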
This study obtained R² values ranging from 0.30 to 0.69 for the best-performing models (Table 1), which are lower than the high R² values reported in some previous studies under ideal dry conditions (Table 2). This discrepancy does not indicate a failure of our methodology, but rather underscores the central challenge addressed in our work: the confounding effect of soil moisture. The wide range of R² values in the literature, from as low as 0.44 to as high as 0.95, can be attributed to several critical factors:
  • Soil Moisture Content: This is the most significant factor. Studies achieving near-perfect prediction were almost exclusively conducted on oven-dried or air-dried soils. By eliminating moisture’s masking effect, these models capture the purest spectral signal of SOC and TN. In contrast, our study intentionally introduced moisture gradients (0–30%) to simulate a more realistic field scenario, where water absorption features overwhelm the weaker spectral features of carbon and nitrogen, inevitably leading to a reduction in predictive accuracy.
  • Sample Size and Diversity: The representativeness of the calibration dataset greatly influences model performance. Studies with a large number of samples (n) covering a wide range of soil types, textures, and land uses tend to build more robust but potentially less precise models (with lower maximum R²), as they must account for greater inherent variability. Studies on smaller, more homogeneous datasets can achieve very high accuracy for that specific context but may lack generalizability.
  • Model and Preprocessing Choices: The choice of algorithm and spectral preprocessing significantly impacts results. Nonlinear models like RF and XGBoost, as demonstrated in our work, are better suited to handle the complex, non-linear interactions introduced by moisture compared to linear models like PLSR.
Therefore, the key contribution of this study is not in achieving the highest possible R² under ideal conditions, but in systematically quantifying the performance of different models across a gradient of soil moisture, a prevalent and disruptive condition in real-world applications. Our finding that the RF model on first-derivative spectra of dry soil yielded the most accurate and stable predictions provides a critical benchmark and a practical methodological framework for future applications where sample drying is feasible. For in situ sensing, our results clearly delineate the expected performance loss with increasing moisture, which is vital for setting realistic expectations in precision agriculture and environmental monitoring.
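Because the comparison above rests on R², RMSE, MAE, and RPD, a short sketch of how these validation metrics are commonly computed may help when reading Table 1 and Table 2. The definitions used here are assumptions: R² is taken as one minus the ratio of residual to total sum of squares, and RPD as the sample standard deviation of the observed values divided by RMSE. Individual studies, possibly including this one, may use slightly different conventions. The function name `validation_metrics` and the example values are hypothetical.

```python
import math


def validation_metrics(observed, predicted):
    """Return R2, RMSE, MAE and RPD for paired observed/predicted values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")

    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0.0:
        raise ValueError("observed values are constant; R2 is undefined")

    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    sd_obs = math.sqrt(ss_tot / (n - 1))                 # sample standard deviation
    rpd = sd_obs / rmse if rmse > 0.0 else math.inf
    return {"r2": 1.0 - ss_res / ss_tot, "rmse": rmse, "mae": mae, "rpd": rpd}


# Invented SOC values (g/kg), for illustration only.
metrics = validation_metrics([1, 2, 3, 4, 5], [1.5, 2, 2.5, 4, 5.5])
```

With this definition R² can fall below zero for a model that predicts worse than the mean of the observations, which is one reason low-moisture and high-moisture results are easier to compare when RMSE and MAE are reported alongside it.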

5. Conclusions

This study systematically evaluated the optimal strategy for predicting soil carbon and nitrogen content under varying moisture conditions. By comparing different preprocessing techniques and machine learning models on soil samples at five moisture levels, we established prediction models for soil carbon and nitrogen content at each moisture level. The conclusions are as follows.
  • The soil spectral reflectance values gradually increased as the soil moisture content decreased, with the 0% moisture content prediction model consistently exhibiting better accuracy than other moisture levels.
  • By comparing the validation accuracy of models based on raw and FD spectra, it was found that the estimation models for SOC and TN content built on FD spectra had higher accuracy.
  • The RF model based on SPA-selected characteristic bands had a validation R² range of 0.30–0.69, demonstrating higher inversion accuracy and greater stability.
  • The SHAP analysis identified 1865 nm and 1419 nm as the most contributory bands for SOC and TN prediction, respectively, validating the RF model’s spectral interpretation capability.
In conclusion, the principal contribution of this study is methodological. We systematically demonstrated that the combination of first-derivative spectral preprocessing, the Random Forest algorithm, and feature selection via SPA presents the most robust framework for estimating SOC and TN in wetland soils when confronted with the challenge of variable moisture content. Furthermore, the application of SHAP analysis provided critical, interpretable insights into the prediction mechanisms, identifying key spectral bands and thereby validating the model’s decision-making process. This integrated approach offers a practical and insightful pathway for enhancing the application of Vis–NIR spectroscopy in real-world scenarios where soil moisture cannot be strictly controlled.

Author Contributions

Conceptualization, K.Q. and W.L.; Data curation, K.Q., L.N. and M.X.; Funding acquisition, W.L.; Methodology, K.Q., H.L. and M.X.; Resources, K.Q. and L.C.; Software, X.Z. (Xiajie Zhai) and X.Z. (Xinsheng Zhao); Supervision, J.W.; Visualization, Y.L. and W.L.; Writing—original draft, L.N.; Writing—review & editing, K.Q., L.N. and M.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by China’s Special Fund for Basic Scientific Research Business of Central Public Research Institutes (CAFYBB2021ZB003) and the National Key R&D Program of China (2017YFC0506200).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data is available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yuan, J.; Liang, Y.; Zhuo, M.; Sadiq, M.; Liu, L.; Wu, J.; Xu, G.; Liu, S.; Li, G.; Yan, L. Soil nitrogen and carbon storages and carbon pool management index under sustainable conservation tillage strategy. Front. Ecol. Evol. 2023, 10, 1082624.
  2. Billings, S.A.; Lajtha, K.; Malhotra, A.; Berhe, A.A.; de Graaff, M.A.; Earl, S.; Fraterrigo, J.; Georgiou, K.; Grandy, S.; Hobbie, S.E.; et al. Soil organic carbon is not just for soil scientists: Measurement recommendations for diverse practitioners. Ecol. Appl. 2021, 31, e02290.
  3. Telo da Gama, J. The role of soils in sustainability, climate change, and ecosystem services: Challenges and opportunities. Ecologies 2023, 4, 552–567.
  4. Reich, P.B.; Tjoelker, M.G.; Machado, J.-L.; Oleksyn, J. Universal scaling of respiratory metabolism, size and nitrogen in plants. Nature 2006, 439, 457–461.
  5. Spohn, M.; Stendahl, J. Carbon, nitrogen, and phosphorus stoichiometry of organic matter in Swedish forest soils and its relationship with climate, tree species, and soil texture. Biogeosciences 2022, 19, 2171–2186.
  6. Sweetman, A.J.; Valle, M.D.; Prevedouros, K.; Jones, K.C.J. The role of soil organic carbon in the global cycling of persistent organic pollutants (POPs): Interpreting and modelling field data. Chemosphere 2005, 60, 959–972.
  7. Tian, J.; Yuan, Y.; Zhou, P.; Wang, L.; Chen, Z.; Chen, Q. Spatial distribution of soil organic carbon and total nitrogen in a micro-catchment of northeast China and their influencing factors. Sustainability 2023, 15, 6355.
  8. Bao, Y.; Meng, X.; Ustin, S.; Wang, X.; Zhang, X.; Liu, H.; Tang, H. Vis-SWIR spectral prediction model for soil organic matter with different grouping strategies. Catena 2020, 195, 104703.
  9. Xu, X.; Chen, S.; Xu, Z.; Yu, Y.; Zhang, S.; Dai, R. Exploring appropriate preprocessing techniques for Hyperspectral soil organic matter content estimation in black soil area. Remote Sens. 2020, 12, 3765.
  10. Katuwal, S.; Knadel, M.; Moldrup, P.; Norgaard, T.; Greve, M.H.; de Jonge, L.W. Visible–near–infrared spectroscopy can predict mass transport of dissolved chemicals through intact soil. Sci. Rep. 2018, 8, 11188.
  11. Schwartz, G.; Eshel, G.; Ben Dor, E. Reflectance spectroscopy as a tool for monitoring contaminated soils. In Soil Contamination; IntechOpen: Rijeka, Croatia, 2011; p. 6790.
  12. Vasava, H.B.; Gupta, A.; Arora, R.; Das, B.S. Assessment of soil texture from spectral reflectance data of bulk soil samples and their dry-sieved aggregate size fractions. Geoderma 2019, 337, 914–926.
  13. Yu, S.; Bu, H.; Dong, W.; Jiang, Z.; Zhang, L.; Xia, Y. Construction and evaluation of prediction model of main soil nutrients based on spectral information. Appl. Sci. 2022, 12, 6298.
  14. Odebiri, O.; Mutanga, O.; Odindi, J.; Naicker, R.; Masemola, C.; Sibanda, M. Deep learning approaches in remote sensing of soil organic carbon: A review of utility, challenges, and prospects. Environ. Monit. Assess. 2021, 193, 802.
  15. Misbah, K.; Laamrani, A.; Khechba, K.; Dhiba, D.; Chehbouni, A. Multi-sensors remote sensing applications for assessing, monitoring, and mapping NPK content in soil and crops in African agricultural land. Remote Sens. 2022, 14, 81.
  16. Ewing, J.; Oommen, T.; Jayakumar, P.; Alger, R. Utilizing Hyperspectral remote sensing for soil gradation. Remote Sens. 2020, 12, 3312.
  17. Jia, S.; Li, H.; Wang, Y.; Tong, R.; Li, Q. Hyperspectral imaging analysis for the classification of soil types and the determination of soil total nitrogen. Sensors 2017, 17, 2252.
  18. Guo, Z.; Li, X.; Ren, Y.; Qian, S.; Shao, Y. Research on regional soil moisture dynamics based on Hyperspectral remote sensing technology. Int. J. Low-Carbon Technol. 2023, 18, 737–749.
  19. Zea, M.; Souza, A.; Yang, Y.; Lee, L.; Nemali, K.; Hoagland, L. Leveraging high-throughput Hyperspectral imaging technology to detect cadmium stress in two leafy green crops and accelerate soil remediation efforts. Environ. Pollut. 2022, 292, 118405.
  20. Seidel, M.; Vohland, M.; Greenberg, I.; Ludwig, B.; Ortner, M.; Thiele-Bruhn, S.; Hutengs, C. Soil moisture effects on predictive VNIR and MIR modeling of soil organic carbon and clay content. Geoderma 2022, 427, 116103.
  21. Lesaignoux, A.; Fabre, S.; Briottet, X. Influence of soil moisture content on spectral reflectance of bare soils in the 0.4–14 μm domain. Int. J. Remote Sens. 2013, 34, 2268–2285.
  22. Eredics, A.; Németh, Z.I.; Rákosa, R.; Rasztovits, E.; Móricz, N.; Vig, V. The effect of soil moisture on the reflectance spectra correlations in beech and sessile oak foliage. Acta Silv. Lignaria Hung. 2015, 11, 9–26.
  23. Nocita, M.; Stevens, A.; Noon, C.; van Wesemael, B. Prediction of soil organic carbon for different levels of soil moisture using Vis-NIR spectroscopy. Geoderma 2013, 199, 37–42.
  24. Wijewardane, N.K.; Ge, Y.; Morgan, C.L.S. Moisture insensitive prediction of soil properties from VNIR reflectance spectra based on external parameter orthogonalization. Geoderma 2016, 267, 92–101.
  25. Minasny, B.; Hartemink, A.E. Predicting soil properties in the tropics. Earth-Sci. Rev. 2011, 106, 52–62.
  26. Liu, W.; Baret, F.; Gu, X.; Tong, Q.; Zheng, L.; Zhang, B. Relating soil surface moisture to reflectance. Remote Sens. Environ. 2002, 81, 238–246.
  27. Xia, K.; Xia, S.; Shen, Q.; Yang, B.; Song, Q.; Xu, Y.; Zhang, S.; Zhou, X.; Zhou, Y. Moisture spectral characteristics and Hyperspectral inversion of fly ash-filled reconstructed soil. Spectrochim. Acta A 2021, 253, 119590.
  28. Morgan, C.L.S.; Waiser, T.H.; Brown, D.J.; Hallmark, C.T. Simulated in situ characterization of soil organic and inorganic carbon with visible near-infrared diffuse reflectance spectroscopy. Geoderma 2009, 151, 249–256.
  29. Roudier, P.; Hedley, C.B.; Lobsey, C.R.; Viscarra Rossel, R.A.; Leroux, C. Evaluation of two methods to eliminate the effect of water from soil vis-NIR spectra for predictions of organic carbon. Geoderma 2017, 296, 98–107.
  30. Mouazen, A.M.; Kuang, B.; De Baerdemaeker, J.; Ramon, H. Comparison among principal component, partial least squares and back propagation neural network analyses for accuracy of measurement of selected soil properties with visible and near infrared spectroscopy. Geoderma 2010, 158, 23–31.
  31. Wan, S.; Hou, J.; Zhao, J.; Clarke, N.; Kempenaar, C.; Chen, X. Predicting soil organic matter, available nitrogen, available phosphorus and available potassium in a black soil using a nearby Hyperspectral sensor system. Sensors 2024, 24, 2784.
  32. Liu, Y.; Jiang, Q.; Shi, T.; Fei, T.; Wang, J.; Liu, G.; Chen, Y. Prediction of total nitrogen in cropland soil at different levels of soil moisture with Vis/NIR spectroscopy. Acta Agric. Scand. Sect. B—Soil Plant Sci. 2014, 64, 267–281.
  33. Nie, L.; Qu, K.; Cui, L.; Zhai, X.; Zhao, X.; Lei, Y.; Li, J.; Wang, J.; Wang, R.; Li, W. Inversion of soil carbon, nitrogen, and phosphorus in the Yellow River Wetland of Shaanxi Province using field in situ hyperspectroscopy. Front. Soil Sci. 2024, 4, 1364426.
  34. Cui, L.; Zuo, X.; Dou, Z.; Huang, Y.; Zhao, X.; Zhai, X.; Lei, Y.; Li, J.; Pan, X.; Li, W. Plant identification of Beijing Hanshiqiao wetland based on Hyperspectral data. Spectrosc. Lett. 2021, 54, 381–394.
  35. Araújo, M.C.U.; Saldanha, T.C.B.; Galvão, R.K.H.; Yoneyama, T.; Chame, H.C.; Visani, V. The successive projections algorithm for variable selection in spectroscopic multicomponent analysis. Chemom. Intell. Lab. Syst. 2001, 57, 65–73.
  36. Cui, L.; Dou, Z.; Liu, Z.; Zuo, X.; Lei, Y.; Li, J.; Zhao, X.; Zhai, X.; Pan, X.; Li, W. Hyperspectral inversion of phragmites communis carbon, nitrogen, and phosphorus stoichiometry using three models. Remote Sens. 2020, 12, 1998.
  37. Beaver, J.; Humphreys, E.R.; King, D. Random forest development and modeling of gross primary productivity in the hudson bay lowlands. Can. J. Remote Sens. 2024, 50, 2355937.
  38. Coblinski, J.A.; Giasson, É.; Demattê, J.A.; Dotto, A.C.; Costa, J.J.F.; Vašát, R. Prediction of soil texture classes through different wavelength regions of reflectance spectroscopy at various soil depths. Catena 2020, 189, 104485.
  39. Yang, S.; Wang, Z.; Yang, C.; Wang, C.; Wang, Z.; Yan, X.; Qiao, X.; Feng, M.; Xiao, L.; Shafiq, F.; et al. Estimation of generalized soil structure index based on differential spectra of different orders by multivariate assessment. Int. Soil Water Conserv. Res. 2024, 12, 313–321.
  40. Divyesh, M.; Ajay, K.; Onkar, D. Development of spectral indexes in Hyperspectral imagery for land cover assessment. IETE Tech. Rev. 2019, 36, 216–230.
  41. Ciampalini, A.; André, F.; Garfagnoli, F.; Grandjean, G.; Lambot, S.; Chiarantini, L.; Moretti, S. Improved estimation of soil clay content by the fusion of remote Hyperspectral and proximal geophysical sensing. J. Appl. Geophys. 2015, 116, 135–145.
  42. Demattê, J.A.M.; Sousa, A.A.; Alves, M.C.; Nanni, M.R.; Fiorio, P.R.; Campos, R.C. Determining soil water status and other soil characteristics by spectral proximal sensing. Geoderma 2006, 135, 179–195.
  43. Muller, E.; Decamps, H. Modeling soil moisture–reflectance. Remote Sens. Environ. 2001, 76, 173–180.
  44. Croft, H.; Anderson, K.; Kuhn, N. Evaluating the influence of surface soil moisture and soil surface roughness on optical directional reflectance factors. Eur. J. Soil Sci. 2014, 65, 605–612.
  45. Danesh, M.; Bahrami, H.A. Modeling of soil sand particles using spectroscopy technology. Commun. Soil Sci. Plant Anal. 2022, 53, 2216–2228.
  46. Kearney, M.S.; Stutzer, D.; Turpie, K.; Stevenson, J.C. The effects of tidal inundation on the reflectance characteristics of coastal marsh vegetation. J. Coast. Res. 2009, 256, 1177–1186.
  47. Nie, L.; Dou, Z.; Cui, L.; Tang, X.; Zhai, X.; Zhao, X.; Lei, Y.; Li, J.; Wang, J.; Li, W. Hyperspectral inversion of soil carbon and nutrient contents in the Yellow River Delta wetland. Diversity 2022, 14, 862.
  48. Kuang, B.; Mahmood, H.S.; Quraishi, M.Z.; Hoogmoed, W.B.; Mouazen, A.M.; van Henten, E.J. Sensing soil properties in the laboratory, in situ, and on-line: A review. In Advances in Agronomy; Elsevier Inc.: Amsterdam, The Netherlands, 2012; Volume 114, pp. 155–223.
  49. Dhawale, N.M.; Adamchuk, V.I.; Prasher, S.O.; Rossel, R.A.V.; Ismail, A.A. Evaluation of two portable Hyperspectral-sensor-based instruments to predict key soil properties in Canadian soils. Sensors 2022, 22, 2556.
  50. Zhang, F.; Wu, S.; Liu, J.; Wang, C.; Guo, Z.; Xu, A.; Pan, K.; Pan, X. Predicting soil moisture content over partially vegetation covered surfaces from Hyperspectral data with deep learning. Soil Sci. Soc. Am. J. 2021, 85, 989–1001.
  51. Liu, J.; Xie, J.; Meng, T.; Dong, H. Organic matter estimation of surface soil using successive projection algorithm. Agron. J. 2022, 114, 1944–1951.
  52. Lin, L.; Liu, X. Mixture-based weight learning improves the random forest method for Hyperspectral estimation of soil total nitrogen. Comput. Electron. Agric. 2022, 192, 106634.
  53. Aksoy, S.; Sertel, E.; Roscher, R.; Tanik, A.; Hamzehpour, N. Assessment of soil salinity using explainable machine learning methods and Landsat 8 images. Int. J. Appl. Earth Obs. Geoinf. 2024, 130, 103879.
  54. Zhou, H.; Fu, X.; Li, H. Inversion of chlorophyll-a concentration in Wuliangsu Lake based on OGolden-DBO-XGBoost. Appl. Sci. 2024, 14, 4798.
  55. Li, H.; Jia, S.; Le, Z. Quantitative analysis of soil total nitrogen using Hyperspectral imaging technology with extreme learning machine. Sensors 2019, 19, 4355.
  56. Ndlovu, H.S.; Odindi, J.; Sibanda, M.; Mutanga, O.; Clulow, A.; Chimonyo, V.G.; Mabhaudhi, T. A comparative estimation of maize leaf water content using machine learning techniques and unmanned aerial vehicle (UAV)-based proximal and remotely sensed data. Remote Sens. 2021, 13, 4091.
  57. Wang, J.; Zhou, Q.; Shang, J.; Liu, C.; Zhuang, T.; Ding, J.; Xian, Y.; Zhao, L.; Wang, W.; Zhou, G.; et al. UAV-and machine learning-based retrieval of wheat SPAD values at the overwintering stage for variety screening. Remote Sens. 2021, 13, 5166.
  58. Pechanec, V.; Mráz, A.; Rozkošný, L.; Vyvlečka, P. Usage of airborne Hyperspectral imaging data for identifying spatial variability of soil nitrogen content. ISPRS Int. J. Geo-Inf. 2021, 10, 355.
  59. Rischbeck, P.; Elsayed, S.; Mistele, B.; Barmeier, G.; Heil, K.; Schmidhalter, U. Data fusion of spectral, thermal and canopy height parameters for improved yield prediction of drought stressed spring barley. Eur. J. Agron. 2016, 78, 44–59.
  60. Wei, L.; Wang, Z.; Huang, C.; Zhang, Y.; Wang, Z.; Xia, H.; Cao, L. Transparency estimation of narrow rivers by UAV-borne Hyperspectral remote sensing imagery. IEEE Access 2020, 8, 168137–168153.
  61. Zhang, Y.; Xia, C.; Zhang, X.; Cheng, X.; Feng, G.; Wang, Y.; Gao, Q. Estimating the maize biomass by crop height and narrowband vegetation indices derived from UAV-based Hyperspectral images. Ecol. Indic. 2021, 129, 107985.
  62. Wang, S.; Guan, K.; Zhang, C.; Lee, D.; Margenot, A.J.; Ge, Y.; Peng, J.; Zhou, W.; Zhou, Q.; Huang, Y. Using soil library Hyperspectral reflectance and machine learning to predict soil organic carbon: Assessing potential of airborne and spaceborne optical soil sensing. Remote Sens. Environ. 2022, 271, 112914.
  63. Mondal, B.P.; Sekhon, B.S.; Sahoo, R.N.; Paul, P. VIS-NIR reflectance spectroscopy for assessment of soil organic carbon in a rice-wheat field of Ludhiana district of Punjab. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, XLII-3/W6, 417–422.
  64. Ribeiro, S.G.; Teixeira, A.d.S.; de Oliveira, M.R.R.; Costa, M.C.G.; Araújo, I.C.d.S.; Moreira, L.C.J.; Lopes, F.B. Soil organic carbon content prediction using soil-reflected spectra: A comparison of two regression methods. Remote Sens. 2021, 13, 4752.
Figure 1. Spectral reflectance curves of soil with different moisture contents.
Figure 2. Selection results of SPA feature bands. (a,b) Screening results of the original and first-order spectral characteristic bands with different moisture contents of soil organic carbon and total nitrogen are presented in sequence.
Figure 3. Model validation accuracy evaluation of soil organic carbon and total nitrogen based on full-band spectra with different moisture contents. (a,b) Validation R² and RMSE values for constructing models using raw spectral data and first-order differential spectral data, respectively; (c,d) MAE values for constructing models using raw spectral data and first-order differential spectral data, respectively.
Figure 4. Validation results of spectral reflectance model for characteristic bands of soil organic carbon and total nitrogen based on 0% moisture content. (a–c) Scatter plots of organic carbon from XGBoost, RF, and PLSR models constructed using raw spectral data, respectively; (d–f) Scatter plots of organic carbon from XGBoost, RF, and PLSR models constructed using first-order differential spectral data, respectively; (g–i) Scatter plots of total nitrogen from XGBoost, RF, and PLSR models constructed using raw spectral data, respectively; (j–l) Scatter plots of total nitrogen from XGBoost, RF, and PLSR models constructed using first-order differential spectral data, respectively.
Figure 5. SHAP analysis for SOC prediction (0% moisture). Y-axis: feature bands (nm) ranked by importance; X-axis: SHAP value (contribution strength); Color: reflectance (blue = low, red = high).
Figure 6. SHAP analysis for TN prediction (0% moisture).
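Table 1. Model validation accuracy evaluation of soil organic carbon and total nitrogen based on spectral characteristics of different moisture content bands. All values are for the validation set.
Element | Water Content | Model | Original R² | Original RMSE | Original MAE | First Derivative R² | First Derivative RMSE | First Derivative MAE
--- | --- | --- | --- | --- | --- | --- | --- | ---
SOC (g/kg) | 0% | XGBoost | 0.473 | 1.149 | 0.989 | 0.527 | 1.065 | 0.922
SOC (g/kg) | 0% | RF | 0.582 | 1.074 | 0.906 | 0.598 | 1.010 | 0.867
SOC (g/kg) | 0% | PLSR | 0.340 | 1.322 | 1.152 | 0.467 | 1.151 | 0.972
SOC (g/kg) | 5% | XGBoost | 0.415 | 1.316 | 1.139 | 0.475 | 1.239 | 1.032
SOC (g/kg) | 5% | RF | 0.482 | 1.255 | 1.063 | 0.532 | 1.167 | 0.956
SOC (g/kg) | 5% | PLSR | 0.295 | 1.425 | 1.219 | 0.452 | 1.244 | 1.053
SOC (g/kg) | 10% | XGBoost | 0.368 | 1.346 | 1.147 | 0.457 | 1.266 | 1.065
SOC (g/kg) | 10% | RF | 0.403 | 1.310 | 1.124 | 0.479 | 1.251 | 1.036
SOC (g/kg) | 10% | PLSR | 0.244 | 1.483 | 1.209 | 0.384 | 1.318 | 1.098
SOC (g/kg) | 20% | XGBoost | 0.341 | 1.389 | 1.135 | 0.397 | 1.339 | 1.129
SOC (g/kg) | 20% | RF | 0.354 | 1.364 | 1.172 | 0.418 | 1.296 | 1.089
SOC (g/kg) | 20% | PLSR | 0.246 | 1.481 | 1.240 | 0.373 | 1.331 | 1.078
SOC (g/kg) | 30% | XGBoost | 0.227 | 1.484 | 1.249 | 0.249 | 1.455 | 1.187
SOC (g/kg) | 30% | RF | 0.255 | 1.449 | 1.190 | 0.311 | 1.394 | 1.145
SOC (g/kg) | 30% | PLSR | 0.185 | 1.527 | 1.269 | 0.334 | 1.378 | 1.120
TN (g/kg) | 0% | XGBoost | 0.577 | 0.119 | 0.090 | 0.648 | 0.116 | 0.092
TN (g/kg) | 0% | RF | 0.606 | 0.129 | 0.106 | 0.693 | 0.114 | 0.090
TN (g/kg) | 0% | PLSR | 0.322 | 0.162 | 0.138 | 0.590 | 0.128 | 0.104
TN (g/kg) | 5% | XGBoost | 0.389 | 0.148 | 0.118 | 0.569 | 0.122 | 0.098
TN (g/kg) | 5% | RF | 0.479 | 0.131 | 0.105 | 0.585 | 0.135 | 0.103
TN (g/kg) | 5% | PLSR | 0.269 | 0.171 | 0.139 | 0.539 | 0.138 | 0.110
TN (g/kg) | 10% | XGBoost | 0.311 | 0.167 | 0.134 | 0.492 | 0.134 | 0.109
TN (g/kg) | 10% | RF | 0.394 | 0.158 | 0.130 | 0.516 | 0.133 | 0.112
TN (g/kg) | 10% | PLSR | 0.205 | 0.182 | 0.151 | 0.464 | 0.141 | 0.119
TN (g/kg) | 20% | XGBoost | 0.296 | 0.170 | 0.136 | 0.435 | 0.152 | 0.121
TN (g/kg) | 20% | RF | 0.327 | 0.165 | 0.136 | 0.427 | 0.150 | 0.125
TN (g/kg) | 20% | PLSR | 0.196 | 0.183 | 0.154 | 0.369 | 0.156 | 0.128
TN (g/kg) | 30% | XGBoost | 0.256 | 0.174 | 0.137 | 0.397 | 0.160 | 0.125
TN (g/kg) | 30% | RF | 0.308 | 0.169 | 0.136 | 0.457 | 0.150 | 0.113
TN (g/kg) | 30% | PLSR | 0.142 | 0.187 | 0.156 | 0.294 | 0.170 | 0.135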
Table 2. Summary of literature results on the prediction of soil carbon and nitrogen elements. Abbreviations: R², Coefficient of determination; RMSE, Root Mean Square Error (units: g/kg for TN and SOC unless otherwise stated); RPD, Ratio of Performance to Deviation; OR, Original Reflectance; FD, First Derivative; PLSR, Partial Least Squares Regression; RF, Random Forest.
Element | Accuracy | Data Type | Model | Author
--- | --- | --- | --- | ---
TN | R² = 0.921; RMSE = 0.086; RPD = 2.59 | Full Spectra-OR | PLSR | Li et al., 2019 [53]
TN | R² = 0.915; RMSE = 0.089; RPD = 2.51 | SPA-OR | PLSR | Li et al., 2019 [53]
TN | R² = 0.757; RMSE = 0.235 | OR | RF | Lin et al., 2022 [49]
TN | R² = 0.355; RMSE = 0.019; RPD = 1.245 | OR | PLSR | Pechanec et al., 2021 [56]
SOC | R² = 0.950 | FD | RF | Wang et al., 2022 [62]
SOC | R² = 0.440; RMSE = 0.070; RPD = 1.570 | OR | PLSR | Mondal et al., 2019 [63]
SOC | R² = 0.740; RMSE = 0.159; RPD = 1.780 | FD | PLSR | Ribeiro et al., 2021 [64]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Qu, K.; Nie, L.; Cui, L.; Li, H.; Xiong, M.; Zhai, X.; Zhao, X.; Wang, J.; Lei, Y.; Li, W. Vis–NIR Spectroscopy Characteristics of Wetland Soils with Different Water Contents and Machine Learning Models for Carbon and Nitrogen Content. Ecologies 2025, 6, 75. https://doi.org/10.3390/ecologies6040075
