Article

Impact of Spectral Resolution and Signal-to-Noise Ratio in Vis–NIR Spectrometry on Soil Organic Matter Estimation

1 Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 Center of Materials Science and Optoelectrics Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
4 National Key Laboratory of Science and Technology on Vacuum Technology and Physics, Lanzhou Institute of Physics, Lanzhou 730000, China
* Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(18), 4623; https://doi.org/10.3390/rs15184623
Submission received: 3 July 2023 / Revised: 7 September 2023 / Accepted: 10 September 2023 / Published: 20 September 2023

Abstract

Recently, considerable efforts have been devoted to the estimation of soil properties using optical payloads mounted on drones or satellites. Nevertheless, many studies focus on diverse pretreatments and modeling techniques, while research examining how the parameters of optical remote sensing payloads affect predictive performance remains conspicuously absent. The main aim of this study is to evaluate how the spectral resolution and signal-to-noise ratio (SNR) of spectrometers affect the precision of predictions of soil organic matter (SOM) content. For this purpose, the initial soil spectral library was used to generate two simulated soil spectral libraries, in which the spectral resolution and the SNR level, respectively, were adjusted independently. To verify the consistency and generality of our results, we employed four multivariate regression models to develop calibration models. Subsequently, to determine the coarsest spectral resolution and lowest SNR level that do not significantly affect prediction accuracy, we conducted ANOVA tests on the RMSE and R² obtained from the independent validation datasets. Our results revealed that (i) the factors significantly affecting SOM prediction performance, in descending order of magnitude, were SNR level > spectral resolution > estimation model; (ii) there was no substantial difference in predictive performance as long as the spectral resolution was 100 nm or finer; and (iii) when the SNR level exceeded 15%, altering it did not notably affect SOM predictive performance. This study is expected to provide valuable insights for the design of future optical remote sensing payloads aimed at monitoring large-scale SOM dynamics.


1. Introduction

Soil forms the foundation of the agricultural ecosystem. Soil organic matter (SOM) comprises a complex mixture of organic materials derived from plants and animals at different stages of decomposition. It exerts a profound influence on soil nutrients, plant development, human well-being, and the climate [1]. A decrease in SOM significantly affects soil structural stability, water retention, infiltration capacity, nutrient retention, soil biodiversity, fertility, and, ultimately, ecological and agroecological productivity. Accurately assessing SOM is therefore of great importance for food production, carbon cycling, and climate regulation [2]. Conventional methods for estimating SOM rely on labor-intensive field sampling followed by laboratory analysis; these approaches are expensive, time-consuming, and destructive, and they offer limited spatial coverage. In contrast, spectroscopy offers distinct advantages, including high efficiency, speed, non-destructive detection, and ease of use. Over the past few years, with the rapid development of spectral sensing technology, numerous researchers have used visible and near-infrared (Vis–NIR) spectroscopy to estimate SOM content [3,4]. Furthermore, spectral sensors can be mounted on unmanned aerial vehicles and satellites to monitor SOM dynamics over large areas in real time [5,6,7,8]. Consequently, the spectroscopic determination of SOM content is important for achieving precision agriculture and advancing agricultural modernization.
The spectral profile of soil results from the combined contributions of its various constituents; because absorption bands overlap and the concentrations of soil components are low, the resulting features are nonspecific, faint, and broad. Therefore, multivariate calibration techniques are usually used to relate SOM content to soil diffuse reflectance spectra [9]. Common multivariate linear regression models include multiple linear regression, ridge regression (RR), principal component regression (PCR), and partial least squares regression (PLSR), among others [8,10,11,12]. Linear regression models offer greater interpretability, and RR, PCR, and PLSR in particular are, by construction, insensitive to collinearity. However, the relationship between the dependent variable (SOM content) and the spectral data may not be purely linear because of the complex composition of soil. Machine learning methods, such as support vector machine regression (SVMR), the back propagation neural network (BPNN), the Cubist regression tree, and random forests, can be employed to address such nonlinear problems [8,13,14], although their inherent structures make it difficult to uncover an explicit functional relationship between Vis–NIR spectra and SOM. Numerous studies have used various multivariate regression models to predict SOM, but because soil types and measurement environments vary, there is no consensus on which model yields the best prediction performance. Therefore, to validate the consistency and generality of our findings, four regression techniques (RR, PLSR, SVMR, and BPNN) were used in this study to examine how variations in two core spectrometer parameters, spectral resolution and signal-to-noise ratio (SNR), affect SOM estimation.
Different spectral pretreatments and modeling approaches can affect prediction accuracy [15,16]. Nevertheless, prediction accuracy also depends on the parameters of the Vis–NIR spectrometer. In general, a sensor with more spectral bands, a higher spectral resolution, and a higher SNR tends to produce more accurate data, albeit with potential data redundancy. Hyperspectral data are well known to be autocorrelated, meaning that many wavelengths convey the same information about land cover properties. Do an abundance of wavelengths, high spectral resolution, and elevated SNR levels significantly enhance SOM prediction accuracy? And how should the spectral resolution and SNR be selected to estimate SOM efficiently? Currently, there is limited research analyzing the influence of core spectrometer parameters on soil property estimation. Castaldi et al. reported that introducing noise into simulated spectra decreased the prediction accuracy of the model; they also found that a spectral resolution of 40 nm yielded soil texture estimation accuracy comparable to that achieved by sensors with higher spectral resolutions [17]. Knadel et al. compared the prediction performance of three Vis–NIR spectrometers that differed in spectral resolution, SNR, and spectral bands. Their findings indicated that the spectral range had the most significant impact on prediction performance, and they emphasized that, in the trade-off between spectral resolution and SNR, a high SNR played the more crucial role [18]. Gomez et al. established ten spectral configurations featuring various spectral resolutions and found that configurations with spectral resolutions from 5 to 100 nm delivered comparable and effective predictive performance for clay estimation [19].
Jia transformed airborne hyperspectral images into degraded hyperspectral libraries with different spectral resolutions, spatial resolutions, and SNR levels and then assessed the crop classification accuracy achieved with these degraded datasets. Accuracy declined as the SNR decreased, whereas with respect to spectral resolution, accuracy first increased, then stabilized, and ultimately declined [20]. Building on this prior research, the present study seeks to determine suitable instrument parameters for accurate SOM estimation.
The purpose of this work is to investigate how spectral resolution and SNR affect the accuracy of SOM estimation. From the initial spectral library, we created two simulated spectral libraries, one with varying spectral resolutions and one with varying SNR levels, which allowed us to manipulate each parameter individually. To ensure the consistency and generalizability of our findings, we employed four regression models (RR, PLSR, SVMR, and BPNN), using spectral data with the different spectral resolutions and SNR levels as input variables. We then used analysis of variance (ANOVA) to determine which factors significantly influenced SOM predictive performance, and we identified the coarsest spectral resolution and lowest SNR level that did not significantly affect prediction accuracy.

2. Materials and Methods

2.1. Study Area and Soil Data Collection

The study was carried out in Qiqihar, Heilongjiang Province, Northeast China, which extends from 109°26′15.86″ to 126°45′22.44″ E and from 47°24′31.40″ to 48°14′25.33″ N (Figure 1). The study area has a typical temperate continental monsoon climate with four distinct seasons. Over the year, temperatures generally range from −24 °C to 27 °C and rarely fall below −29 °C or rise above 32 °C. Annual average rainfall ranges from 400 to 550 mm. The soil in the study area is mainly black soil, which is characterized by high heat, good permeability, light texture, and high soil organic carbon content [21]. Corn, soybeans, and rice are currently the predominant crops grown in this region.
In October 2016, we randomly collected 112 soil samples from the surface layer (0–15 cm). In the field, we removed the surface litter, collected the soil samples, and sealed them in plastic bags; each sampling location was recorded with a portable GPS. In the laboratory, the soil samples were air-dried naturally, gently ground, and passed through a 1 mm sieve to obtain the fine earth fraction, removing small stones, coarse roots, and fallen leaves. Spectral measurements and chemical analyses were then performed on the processed soil samples.

2.2. Analysis of SOM Content and Spectra Measurement

A conversion coefficient of 1.724 was applied to convert soil organic carbon (SOC) to SOM: SOM (g kg⁻¹) = 1.724 × SOC (g kg⁻¹). SOC was measured by the potassium dichromate oxidation–external heating method. Vis–NIR diffuse reflectance spectra of the soil samples were measured in a dark room using an ASD FieldSpec3 spectroradiometer (Analytical Spectral Devices, Boulder, CO, USA) with a spectral range of 350 to 2500 nm. In the 350–1000 nm range, the spectral sampling interval of the instrument is 1.4 nm and its spectral resolution is 3 nm; in the 1000–2500 nm range, the sampling interval is 2 nm and the spectral resolution is 10 nm. The reflectance data delivered to users were resampled to 1 nm intervals with ViewSpecPro (version 6.0.0, ASD, Boulder, CO, USA) across both ranges, resulting in 2151 spectral bands. A 50 W halogen lamp positioned 10 cm from the soil sample at a 30° incidence angle served as the light source. An optical probe with a 1° field of view (FOV) was placed vertically 5 cm above the center of the soil sample surface. Reflectance was calibrated against a standard white reference panel before the readings commenced and every 30 min thereafter. To reduce noise, ten measurements were taken for each soil sample and averaged to obtain its spectrum. The splice correction in ViewSpecPro was applied to remove the discontinuities near 1000 nm and 1800 nm. The regions below 450 nm and above 2400 nm were omitted because of significant instrumental artifacts at the edges of the spectrum, leaving a total of 1951 spectral bands (450–2400 nm).

2.3. Creation of Simulated Spectral Libraries

To investigate how two core spectrometer parameters, spectral resolution and SNR, affect SOM prediction performance, we transformed the initial soil spectral library (soil spectra measured with the ASD spectrometer) into two simulated spectral libraries. The first, the spectral configuration library, consists of nine degraded spectral configurations with regular spectral resolution (i.e., the spectral resolution remains constant throughout the considered spectral range). In these configurations, the number of spectral bands decreases from 323 to 8, and the spectral resolution is coarsened from 3 nm (450–1000 nm) and 10 nm (1000–2400 nm) to 200 nm (Table 1). The second, the spectral SNR library, was obtained by adding Gaussian noise to the original spectra to reduce their SNR to the desired levels; it contains 15 spectral datasets with SNR levels from 100% down to 1%. Simulating the two libraries separately allows the sensitivity of SOM prediction to spectral resolution and to SNR to be assessed independently of other variables. The generation of both libraries is detailed in the following subsections.

2.3.1. Spectral Configuration Library

The spectral configuration library contains a total of 10 spectral configurations in two categories: ASD_1/1 and nine Con_X/Y configurations (Table 1). The original soil spectra are named ASD_1/1 because the spectra measured by the ASD spectrometer were sampled at 1 nm intervals in both the 450–1000 nm and 1000–2400 nm ranges. The nine Con_X/Y configurations were derived from ASD_1/1, where X denotes the spectral resolution from 450 to 1000 nm and Y the spectral resolution from 1000 to 2400 nm. Across the Con_X/Y configurations, the number of spectral bands decreases from 323 to 8 and the spectral resolution is coarsened from 3 nm to 200 nm, with the spectral sampling interval set equal to the spectral resolution. The spectral reflectance of each Con_X/Y configuration is calculated as follows.
First, three parameters define a spectral configuration [22]: the number of spectral bands (N), the spectral resolution (SR), and the spectral sampling interval (SSI). The SR is also called the full width at half maximum (FWHM), and the SSI is the spacing between the central wavelengths of two adjacent bands (Figure 2). The laboratory-measured spectra are then resampled with Gaussian filters whose tails are cut to twice their width, following the filter response function $G(\lambda)$:
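$$G(\lambda) = \exp\left(-\frac{(\lambda - \lambda_c)^2}{2\sigma^2}\right), \quad \text{with} \quad \sigma = \frac{\mathrm{SR}}{2\sqrt{2\ln 2}} \tag{1}$$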
where $\lambda$ is the wavelength and $\lambda_c$ is the central wavelength of the simulated band, the band centres being spaced according to the chosen SSI.
Finally, the spectral reflectance of each Con_X/Y is determined as follows:
$$R_i = \frac{\sum_{\lambda=\lambda_s}^{\lambda_k} r(\lambda)\, G(\lambda)}{\sum_{\lambda=\lambda_s}^{\lambda_k} G(\lambda)} \tag{2}$$
where $r(\lambda)$ is the original (1 nm) reflectance at wavelength $\lambda$, $R_i$ is the simulated reflectance of band $i$, and $\lambda_s$ and $\lambda_k$ are the starting and ending wavelengths of the truncated filter.
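The resampling above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the input is assumed to be sampled at 1 nm, and "tails cut to twice their width" is interpreted here as keeping only wavelengths within ±1 FWHM of each band centre (a filter support of 2 × FWHM).

```python
import numpy as np


def gaussian_srf(wl, centre, fwhm):
    """Gaussian spectral response G(lambda) for a band with the given centre and FWHM."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-((wl - centre) ** 2) / (2.0 * sigma**2))


def resample_gaussian(wl, refl, centres, fwhm, cut=1.0):
    """Simulate coarser bands as the G-weighted mean of fine-resolution reflectance.

    wl      : (n_wl,) wavelengths of the original spectra, in nm
    refl    : (n_samples, n_wl) or (n_wl,) reflectance
    centres : band centres of the simulated configuration (spaced by the SSI)
    fwhm    : spectral resolution (FWHM) in nm, scalar or one value per centre
    cut     : support half-width in units of FWHM (assumption: 1.0, i.e. width 2*FWHM)
    """
    wl = np.asarray(wl, dtype=float)
    refl = np.atleast_2d(np.asarray(refl, dtype=float))
    centres = np.asarray(centres, dtype=float)
    fwhm = np.broadcast_to(np.asarray(fwhm, dtype=float), centres.shape)
    out = np.empty((refl.shape[0], centres.size))
    for i, (c, w) in enumerate(zip(centres, fwhm)):
        mask = np.abs(wl - c) <= cut * w  # truncate the Gaussian tails
        g = gaussian_srf(wl[mask], c, w)
        out[:, i] = refl[:, mask] @ g / g.sum()  # normalised weighted sum
    return out
```

For a hypothetical configuration with SR = SSI = 50 nm, the band centres could be generated with `np.arange(475, 2400, 50)` and passed with `fwhm=50`; the exact centre placement behind Table 1 is not stated, so this layout is only an assumption.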

2.3.2. Spectral SNR Library

The SNR is an important factor in spectrometer performance; it compares the signal level in the presence of a signal with the level of system noise in the absence of a signal. Because spectrometers differ in operating principle and environment, the models used to calculate the SNR also differ, so a reasonable range of SNR values must be chosen to study the impact of SNR on SOM estimation. In this study, the SNR of each soil spectrum was calculated as the ratio of the mean reference signal intensity to its standard deviation [23] (Figure 3), with the mean and standard deviation of spectral reflectance obtained from 10 repeated measurements of a soil sample.
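The per-band SNR described above reduces to a mean and a standard deviation along the repeat axis. The sketch below assumes that the ten repeated scans of one sample are stacked in a `(n_repeats, n_bands)` array and uses the sample standard deviation (`ddof=1`); both choices, and the function name, are illustrative assumptions.

```python
import numpy as np


def band_snr(repeats, ddof=1):
    """Per-band SNR = mean / standard deviation over repeated scans of one sample.

    repeats : (n_repeats, n_bands) reflectance of one sample measured n_repeats times
    ddof    : 1 gives the sample standard deviation (an assumption; not stated in the text)
    """
    repeats = np.asarray(repeats, dtype=float)
    mean = repeats.mean(axis=0)
    std = repeats.std(axis=0, ddof=ddof)
    return mean / std
```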
We repeated this calculation for all bands of the soil spectrum, obtaining SNRs for a total of 1951 bands from 450 nm to 2400 nm (Figure 3a). To better understand the distribution of the SNR values, we plotted their histogram and estimated the probability density function by kernel density estimation (KDE) with a Gaussian kernel (Figure 3b). KDE can be viewed as a smoothed histogram and represents multimodality better [24]. We then set up a spectral SNR library containing 15 SNR levels (100%, 90%, 80%, 70%, 60%, 50%, 40%, 30%, 25%, 20%, 15%, 10%, 5%, 3%, and 1%). To simulate spectra at these SNR levels, we added random Gaussian noise to each initial spectrum. The process can be represented by the following equations [25]:
$$N(L) = \frac{L}{\mathrm{SNR}(L)} \tag{3}$$
$$N_d(L) = \sqrt{N_i(L)^2 - N(L)^2} \tag{4}$$
$$L_d = L + \mathrm{Rnd}(0,1) \cdot N_d(L) \tag{5}$$
where $L$ represents the measured raw radiance, $N_d(L)$ is the noise added to the soil spectrum to degrade its SNR to the level corresponding to the required noise $N_i(L)$, $\mathrm{Rnd}(0,1)$ draws random values from a Gaussian distribution with a mean of 0 and a standard deviation of 1, and $L_d$ is the radiance after the Gaussian noise has been added.
In practice, $\mathrm{SNR}(L)$ is calculated for each band as shown in Figure 3a, and the original noise level $N(L)$ for radiance $L$ is then obtained using Equation (3). The noise $N_i(L)$ corresponding to each desired degraded SNR for radiance $L$ is calculated with the same equation. Because independent noise sources add in quadrature (the square of the total noise equals the sum of the squares of the individual noise contributions), the noise $N_d(L)$ to be added to the spectral data follows from Equation (4). Finally, the degraded radiance $L_d$ is obtained from the original radiance $L$ and the added noise $\mathrm{Rnd}(0,1) \cdot N_d(L)$ according to Equation (5).
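A minimal NumPy sketch of Equations (3)–(5) follows. It assumes that an "SNR level" of p% means a target SNR equal to p% of the original per-band SNR (so 100% adds no noise), which is our reading of the text rather than a stated definition; `degrade_snr` is a hypothetical name.

```python
import numpy as np


def degrade_snr(L, snr, level, rng=None):
    """Add Gaussian noise so that the per-band SNR drops to `level` times its original value.

    L     : (..., n_bands) original signal (radiance or reflectance)
    snr   : (n_bands,) original per-band SNR
    level : target SNR as a fraction of the original (1.0 = 100 %, 0.15 = 15 %); assumed meaning
    """
    if not 0.0 < level <= 1.0:
        raise ValueError("level must be in (0, 1]")
    rng = np.random.default_rng() if rng is None else rng
    L = np.asarray(L, dtype=float)
    snr = np.asarray(snr, dtype=float)
    n_orig = L / snr                                   # N(L), Equation (3)
    n_target = L / (level * snr)                       # N_i(L) for the desired SNR
    n_add = np.sqrt(n_target**2 - n_orig**2)           # N_d(L): noise adds in quadrature
    return L + rng.standard_normal(L.shape) * n_add    # L_d, Equation (5)
```

Under this interpretation a level of 1.0 leaves the spectrum unchanged, and halving the SNR adds noise with a standard deviation of √3 times the original noise, since $\sqrt{(2N)^2 - N^2} = \sqrt{3}\,N$.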

2.4. Calibration and Validation of Prediction Models

The whole soil data set was randomly split into a calibration set and a validation set at a ratio of 7:3. To assess the effect of instrumental parameters on SOM estimation more reliably, this partitioning was repeated 20 times. For each partition, Levene’s test was performed on the calibration and validation sets to verify that they had homogeneous variances. Four multivariate techniques (RR, PLSR, SVMR, and BPNN) were used to relate the spectra to SOM on the calibration sets, and the resulting models were tested on the independent validation sets. Before model development, several spectral pretreatments were evaluated: Savitzky–Golay (SG) smoothing with a second-order polynomial and a window of 11 wavelengths, first derivative (FD), second derivative (SD), absorbance (Abs), multiplicative scatter correction (MSC), their combinations, and normalization (NOR). Normalized spectra alone already achieved the desired prediction accuracy. Moreover, this study focuses on the impact of spectral resolution and SNR, two crucial parameters of spectroscopic instruments, on the accuracy of SOM estimation rather than on maximizing model accuracy. Therefore, we only normalized the raw spectra before modeling. A brief description of each multivariate regression technique is provided below.
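The repeated 7:3 partitioning and the variance check can be sketched with scikit-learn and SciPy. This illustration rests on assumptions: the per-repetition seeds, applying Levene's test to the SOM values, and SciPy's default median-centred variant of the test are our choices, not details given in the text. With 112 samples, `test_size=0.3` yields 78 calibration and 34 validation samples.

```python
import numpy as np
from scipy.stats import levene
from sklearn.model_selection import train_test_split


def repeated_splits(X, y, n_repeats=20, test_size=0.3, alpha=0.05, seed=0):
    """Draw n_repeats random calibration/validation splits and run Levene's test on each."""
    splits = []
    for k in range(n_repeats):
        X_cal, X_val, y_cal, y_val = train_test_split(
            X, y, test_size=test_size, random_state=seed + k  # one seed per repetition
        )
        _, p = levene(y_cal, y_val)  # H0: equal SOM variances in both subsets
        splits.append(
            {"X_cal": X_cal, "X_val": X_val, "y_cal": y_cal, "y_val": y_val,
             "levene_p": float(p), "equal_var": bool(p > alpha)}
        )
    return splits
```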
Most spectral variables in the full Vis–NIR spectrum are strongly correlated; that is, the reflectance at one wavelength is very similar to that at adjacent wavelengths. Although RR and PLSR are linear models, their mathematical formulation makes them well suited to handling multicollinearity, so they are commonly used to analyze Vis–NIR spectroscopy data. RR is a modified least squares estimation approach that penalizes the size of the coefficients [26]. The hyperparameter α controls the strength of this penalty: a larger α indicates stronger regularization. The optimal α typically introduces a slight bias into the regression in exchange for effectively addressing multicollinearity and improving the prediction performance of the calibration model. PLSR integrates compression and regression by extracting a fixed number of orthogonal factors, called latent variables (LVs), that maximize the covariance between the predictor and response variables [27]. The essence of PLSR therefore lies in constructing linear combinations of the features that carry the most information, rather than processing a large set of correlated variables directly. SVMR is a kernel-based modeling technique rooted in statistical learning theory. It projects the raw data from a low-dimensional feature space into a high-dimensional feature space through an implicit mapping defined by a kernel function and constructs an optimal linear hyperplane in that space, which corresponds to a nonlinear regression function in the original space. In short, SVMR converts a nonlinear regression problem into a linear one in a high-dimensional feature space [28]. BPNN is a highly nonlinear mathematical model composed of nodes (or neurons) organized in layers and connected by weighted links. It optimizes the weights between neurons by propagating the prediction error backward from the output layer to the input layer and minimizing it [29].
For each of the four models, appropriate hyperparameters must be selected to optimize performance. For RR, α took 30 values in a geometric progression from 0.001 to 1. For PLSR, the maximum number of LVs was set to 15. For SVMR, the Gaussian radial basis function (RBF) was chosen as the kernel, which involves two critical hyperparameters: the regularization parameter C and the kernel parameter γ. C was set to 0.1, 1, 10, 50, 80, 100, 120, 150, 180, and 200, and γ to 0.0001, 0.001, 0.01, 0.1, and 10. For BPNN, we constructed a four-layer perceptron network with one input layer (spectral data), two hidden layers (with the number of nodes per layer set to 2, 4, 6, …, 20), and one output layer. The ReLU function was used as the activation function of the hidden layers, and L-BFGS was selected as the solver for weight optimization. A grid search with 5-fold cross-validation on the calibration sets was used to determine, for each model, the hyperparameters that minimized the cross-validated root mean square error (RMSE_cv). All four models were implemented in Python 3.8.5 with the scikit-learn package.

2.5. Evaluation of Models

The root mean squared error (RMSE), the coefficient of determination (R²), and the ratio of performance to interquartile distance (RPIQ) are employed to assess the prediction performance of the four models on both the calibration sets and the validation sets.
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{N}} \tag{6}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \tag{7}$$
where $y_i$ and $\hat{y}_i$ are the observed and predicted values, respectively, $\bar{y}$ is the mean of the observed values, and $N$ is the number of samples, with $i$ ranging from 1 to $N$.
$$\mathrm{RPIQ} = \frac{\mathrm{IQ}}{\mathrm{RMSE}} \tag{8}$$
where IQ is the difference between the third and first quartiles of the observed values.
RMSE quantifies the difference between observed and predicted values and is expressed in the same units as the dependent variable. R² is the proportion of variance in the dependent variable explained by the independent variables in the regression model and is used to assess the goodness of fit. RPIQ accounts for both prediction error and the variation in observed values, providing a more objective and easily comparable measure of model validity during validation [30,31]. In our study, RMSE_cv, R²_cv, and RPIQ_cv denote the RMSE, R², and RPIQ obtained in cross-validation, and RMSE_p, R²_p, and RPIQ_p denote those obtained on the independent validation sets.
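The three indicators translate directly into NumPy. The only assumption in this sketch is how the quartiles for IQ are computed: NumPy's default linear interpolation is used, which may differ slightly from other quartile definitions.

```python
import numpy as np


def rmse(y, y_hat):
    """Root mean squared error between observed y and predicted y_hat."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rpiq(y, y_hat):
    """Interquartile range of the observed values divided by RMSE."""
    q1, q3 = np.percentile(np.asarray(y, dtype=float), [25, 75])  # linear interpolation
    return float((q3 - q1) / rmse(y, y_hat))
```

For observations [1, 2, 3, 4] and predictions [1, 2, 3, 5], these functions return RMSE = 0.5, R² = 0.8, and RPIQ = 1.5 / 0.5 = 3.0.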

2.6. Application of ANOVA Technique

ANOVA is a statistical method used to test whether the means of two or more groups differ significantly by partitioning the observed variance and evaluating it against a probability distribution, the F distribution [32]. It separates the variation attributable to each factor from the total variation in the presence of random interference and tests whether that variation is significant, in which case the null hypothesis is rejected. The significance of each factor’s contribution is judged by comparing its F value with the critical F value (F_crit). Equivalently, a p value can be computed to determine whether a factor exerts a significant influence on the response. The null hypothesis H0 posits that there is no difference among the studied groups, whereas the alternative hypothesis H1 posits that a difference exists. The p value is the probability, assuming H0 is true, of obtaining an F value at least as large as the one observed. H0 is rejected when the p value falls below the chosen significance level, which is the probability of rejecting H0 when it is in fact true. A smaller p value therefore provides stronger evidence against H0 and in favor of H1 [33].
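The F-test logic described above can be illustrated with SciPy, which provides the one-way ANOVA (`scipy.stats.f_oneway`) and the F distribution (`scipy.stats.f`). The wrapper name and its returned fields are hypothetical; the critical value uses k − 1 and n − k degrees of freedom for k groups and n observations in total.

```python
from scipy.stats import f, f_oneway


def one_way_anova(*groups, alpha=0.05):
    """One-way ANOVA returning the F statistic, the critical F at `alpha`, and the p value."""
    res = f_oneway(*groups)
    k = len(groups)
    n = sum(len(g) for g in groups)
    f_crit = f.ppf(1.0 - alpha, k - 1, n - k)  # dfn = k - 1, dfd = n - k
    return {
        "F": float(res.statistic),
        "F_crit": float(f_crit),
        "p": float(res.pvalue),
        "reject_H0": bool(res.statistic > f_crit),  # same decision as p < alpha
    }
```

For the toy groups [1, 2, 3], [4, 5, 6], and [7, 8, 9], F = 27, well above F_crit ≈ 5.14 for (2, 6) degrees of freedom, so H0 would be rejected.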
In this study, we employed ANOVA to assess the impact of spectral resolution and SNR on the accuracy of SOM estimation. Because different prediction models may yield different SOM estimates, we conducted two separate two-way ANOVAs, one with spectral configuration and prediction model as factors and one with SNR level and prediction model as factors. To examine the influence of the instrument parameters on SOM prediction performance in more depth, we then conducted a one-way ANOVA for each of the four prediction models to determine whether the spectral configuration had a significant effect on the accuracy of SOM estimation. The same approach was used to analyze the effect of SNR on SOM prediction performance.
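For the two-way designs, a balanced layout can be analyzed directly from sums of squares. The sketch below assumes that the RMSE_p (or R²_p) values are arranged as an array of shape (levels of the instrument parameter, 4 models, 20 repeated splits) and includes an interaction term; it is a textbook balanced two-way ANOVA written from scratch, not necessarily the exact procedure or software the authors used, and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import f


def two_way_anova(data):
    """Balanced two-way ANOVA with replication.

    data : (a, b, n) array, e.g. a instrument-parameter levels x b models x n repeated splits
    Returns SS, df, F and p for factor A, factor B and the A:B interaction, plus the error term.
    """
    data = np.asarray(data, dtype=float)
    a, b, n = data.shape
    grand = data.mean()
    mean_a = data.mean(axis=(1, 2))   # marginal means of factor A
    mean_b = data.mean(axis=(0, 2))   # marginal means of factor B
    cell = data.mean(axis=2)          # cell means (A x B)
    ss_a = b * n * np.sum((mean_a - grand) ** 2)
    ss_b = a * n * np.sum((mean_b - grand) ** 2)
    ss_ab = n * np.sum((cell - grand) ** 2) - ss_a - ss_b
    ss_e = np.sum((data - cell[:, :, None]) ** 2)
    df_a, df_b, df_ab, df_e = a - 1, b - 1, (a - 1) * (b - 1), a * b * (n - 1)
    ms_e = ss_e / df_e
    table = {}
    for name, ss, df in (("A", ss_a, df_a), ("B", ss_b, df_b), ("A:B", ss_ab, df_ab)):
        F = (ss / df) / ms_e
        table[name] = {"SS": ss, "df": df, "F": F, "p": f.sf(F, df, df_e)}
    table["Error"] = {"SS": ss_e, "df": df_e}
    return table
```

With 10 spectral configurations, 4 models, and 20 splits, the error term in this layout has 10 × 4 × 19 = 760 degrees of freedom.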

3. Results

3.1. Analysis of SOM

SOM content in the study area ranged from 29.308 g kg⁻¹ to 59.650 g kg⁻¹, with a mean of 42.780 g kg⁻¹, a standard deviation of 8.198 g kg⁻¹, and a coefficient of variation of 19.16%, indicating moderate variation. To assess the impact of the instrumental parameters on SOM estimation more accurately, the entire soil data set (n = 112) was randomly divided into a calibration set comprising 7/10 of the samples (n = 78) and a validation set comprising 3/10 of the samples (n = 34), and this partitioning was repeated 20 times. Descriptive statistics of the 20 partitions are presented in Figure 4. Levene’s test (p > 0.8 for all groups, at a significance level of 0.05) indicated homogeneous variances between the calibration and validation sets (Table 2). We therefore assumed that the validation sets were representative of the data under investigation and used the calibrated models to predict the SOM content of the validation sets.

3.2. Spectral Characteristics of the Studied Soils

The spectra of the 112 soil samples (Figure 5) show that reflectance decreased as SOM content increased, in agreement with previous studies [34,35,36]. The profiles were broadly similar, with lower reflectance in the visible (Vis) range and higher reflectance in the near-infrared (NIR) range. Three distinct absorption features appeared near 1400 nm, 1900 nm, and 2200 nm. The features around 1400 nm and 1900 nm are associated with crystallized water and hydrated water [37], whereas the feature near 2200 nm may be associated with organic molecules, Si-OH bonds, and cation-OH bonds in phyllosilicate minerals [38].

3.3. Reflectance Spectrum of Spectral Configuration Library

Figure 6 shows the reflectance profiles of the nine spectral configurations, making their differences visually apparent. As the spectral resolution coarsened, the profiles of the nine configurations increasingly diverged, resulting in differences in the magnitude of band reflectance.

3.4. Reflectance Spectrum of SNR Library

To investigate the effect of spectrometer SNR on SOM estimation accuracy, we generated a spectral library with 15 SNR levels ranging from 1% to 100% based on Equations (3)–(5). Figure 7 shows that at low SNRs, the spectral information of the samples was overwhelmed by noise, severely degrading the quality of the spectral data. As the SNR increased, the spectral profiles gradually became clearer, and at high SNR levels they appeared quite similar.

3.5. Prediction Results of Multiple Regression Models

We designed two independent experiments to analyze the effects of spectral resolution and SNR on SOM estimation accuracy. The evaluation results were obtained by averaging RMSE_p, R²_p, and RPIQ_p over the 20 independent tests, and this section discusses these averaged statistics. RMSE_cv_av, R²_cv_av, and RPIQ_cv_av denote the averages of RMSE_cv, R²_cv, and RPIQ_cv, respectively, across the 20 groups; similarly, RMSE_p_av, R²_p_av, and RPIQ_p_av denote the averages of RMSE_p, R²_p, and RPIQ_p across the 20 groups.
In the first experiment, we investigated how coarsening the spectral resolution affected SOM estimation. We built four prediction models for each spectral configuration, from ASD_1/1 to Con_200/200. The performance indicators (RMSE, R², and RPIQ) of the four models consistently followed the same trend as the spectral resolution varied (Table 3). From ASD_1/1 to Con_100/100, RMSE_p_av, R²_p_av, and RPIQ_p_av on the independent validation set changed only slightly. However, models built with spectral resolutions coarser than 100 nm showed notably worse prediction performance (Figure 8a–c).
In the second experiment, we analyzed the effect of SNR on SOM estimation using the RR, PLSR, SVMR, and BPNN models. The initial spectral data measured by the ASD spectrometer were assigned an SNR level of 100%, and Gaussian random noise was added to reduce the SNR of the data set from 100% to 1%. The observed influence of SNR on SOM estimation accuracy matched our expectations. The results of all four regression methods on the validation sets consistently showed no significant change in RMSE, R², or RPIQ as the SNR initially decreased; once the SNR dropped below a threshold, however, prediction accuracy decreased notably (Table 4). This trend is shown more clearly by RMSE_p_av, R²_p_av, and RPIQ_p_av for the independent validation set (Figure 8d–f). As Figure 8d illustrates, the threshold was approximately 15%. When the SNR exceeded 20%, the minor differences between groups with different SNRs could be attributed to the slightly different hyperparameters selected by cross-validation, and this effect was relatively weak. When the SNR fell below 15%, by contrast, prediction performance became highly sensitive to changes in SNR. We therefore conclude that at low SNR levels, estimation performance improves significantly as the SNR increases, whereas at high SNR levels the influence of SNR on SOM estimation diminishes.
In this study, low SNR levels affected SOM estimation performance more strongly than coarse spectral resolutions did. On the independent validation set, the models built with the Con_200/200 configuration performed poorly, with RMSE_p_av of 4.10–4.12, R²_p_av of 0.75, and RPIQ_p_av of 3.57–3.58 (Table 3). The models built at the 1% SNR level performed worst, with RMSE_p_av of 4.73–4.83, R²_p_av of 0.65–0.67, and RPIQ_p_av of 3.06–3.12 (Table 4). All four models (RR, PLSR, SVMR, and BPNN) showed similar trends for both the spectral configuration library and the SNR library. However, it remains unclear whether the four models produced significantly different SOM estimates. Likewise, Figure 8 gives an initial picture of how spectral resolution and SNR affect SOM estimation, but it cannot establish whether these effects are statistically significant. ANOVA was therefore used to identify the factors that significantly affect SOM estimation accuracy.
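The indicators compared above can be computed for a single validation set as follows. This minimal sketch makes two assumptions:

- R² is the coefficient of determination, 1 − SS_res/SS_tot. Some studies instead use the squared Pearson correlation.
- RPIQ = IQ/RMSE, with IQ the interquartile range of the measured values, as defined in Section 3.6. NumPy's default linear interpolation is assumed for the quartiles.

The function name `regression_metrics` is hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return RMSE, R² and RPIQ for one validation set.

    R² is the coefficient of determination (1 - SS_res / SS_tot).
    RPIQ = IQ / RMSE, where IQ = Q3 - Q1 of the measured values; quartiles
    use NumPy's default linear interpolation (an assumption).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    rmse = float(np.sqrt(np.mean(residuals**2)))
    r2 = float(1.0 - np.sum(residuals**2) / np.sum((y_true - y_true.mean()) ** 2))
    q1, q3 = np.percentile(y_true, [25, 75])
    rpiq = float((q3 - q1) / rmse)
    return {"RMSE": rmse, "R2": r2, "RPIQ": rpiq}


# Toy example: RMSE = sqrt(0.2), R2 = 0.9, RPIQ = 2 / sqrt(0.2)
print(regression_metrics([1, 2, 3, 4, 5], [1, 2, 3, 4, 6]))
```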

3.6. ANOVA Results

In this study, we used ANOVA to identify the factors that significantly influence SOM estimation and to determine which factor levels produce the differences. RPIQ = IQ/RMSE, where IQ is the interquartile range (the third quartile minus the first quartile) of the measured SOM values in each group. RPIQ therefore also varies with the group-specific IQ, which adds within-group variation unrelated to the factors under study and would distort the analysis. We therefore restricted the ANOVA to two performance indicators, RMSE_p and R²_p. Four regression models were used to predict SOM for each spectral configuration and SNR level. We first applied two-way ANOVAs to analyze the contributions of the models and the instrument parameters to SOM estimation. We then performed one-way ANOVAs for each of the four prediction models to test whether the spectral resolution and SNR had significant effects. ANOVA is valid only if the variances are homogeneous and the residuals are normally distributed. Levene's test indicated that the variances of the RMSE_p and R²_p residuals were homogeneous across groups. The Shapiro–Wilk test returned p values greater than 0.05, indicating that the normality requirement was satisfied. Together, these results support the use of ANOVA.
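The two prerequisite checks can be run with SciPy, as in the minimal sketch below. It rests on three assumptions:

- Normality is tested on the pooled residuals, that is, each value minus its own group mean.
- Levene's test uses SciPy's default median centring (the Brown–Forsythe variant).
- Both tests use α = 0.05.

The function name `check_anova_assumptions` is hypothetical.

```python
import numpy as np
from scipy import stats


def check_anova_assumptions(groups, alpha=0.05):
    """Check variance homogeneity (Levene) and residual normality (Shapiro-Wilk)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    # Levene's test across all groups; SciPy centres on the median by default.
    levene_p = stats.levene(*groups).pvalue
    # Residuals: each observation minus its own group mean, pooled over groups.
    residuals = np.concatenate([g - g.mean() for g in groups])
    shapiro_p = stats.shapiro(residuals).pvalue
    return {
        "levene_p": float(levene_p),
        "shapiro_p": float(shapiro_p),
        "equal_variances": bool(levene_p > alpha),
        "normal_residuals": bool(shapiro_p > alpha),
    }


# Hypothetical usage with four synthetic groups of R²_p values.
rng = np.random.default_rng(42)
print(check_anova_assumptions([rng.normal(0.78, 0.01, 20) for _ in range(4)]))
```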

3.6.1. Two-Way ANOVA Results

We treated the two performance indicators, RMSE_p and R²_p, as dependent variables and the spectral configuration and prediction model as factors. Table 5 presents the results of this two-way ANOVA. At a significance level of 0.05, both the spectral configuration and the prediction model had significant effects on prediction performance (p < 0.05; F > F_crit). Their interaction, however, was not significant (p > 0.05; F < F_crit). Notably, the p value for the spectral configuration was considerably smaller than that for the prediction model (e.g., 9.51 × 10⁻²⁹ versus 0.0049 for RMSE_p), and its F value and sum of squares were considerably larger. This indicates that the spectral configuration has a more substantial impact on prediction performance and provides stronger grounds for rejecting the null hypothesis. Together with the conclusions drawn in Section 3.3, these findings show that the spectral resolution has a greater effect on SOM prediction performance than the choice of prediction model.
Table 6 presents the results of a two-way ANOVA with RMSE_p and R²_p as dependent variables and the SNR level and prediction model as factors. At a significance level of 0.05, the main effects of the SNR level and the prediction model were statistically significant, whereas their interaction was not. The p value for the SNR level was much smaller than that for the prediction model. This indicates that the spectrometer's SNR had a more substantial impact on SOM prediction performance.
The two-way ANOVA results show that both the instrument parameters and the prediction models significantly influence SOM estimation accuracy. They also show that the instrument parameters have the more pronounced impact, which underscores the practical relevance of this study. The primary objective of this paper was to assess how SOM estimation accuracy varies with spectral resolution and SNR level. We therefore performed one-way ANOVAs for each prediction model to examine the effect of the instrument parameters further.

3.6.2. One-Way ANOVA Results

Tables 7 and 8 present the one-way ANOVA results for the spectral configurations and SNR levels, respectively. The ANOVAs were conducted at a significance level of 0.05. For all four regression models, both the ANOVA for spectral configuration and the ANOVA for SNR level yielded p values below 0.05. The null hypothesis is rejected when the p value is less than 0.05 or, equivalently, when the calculated F value exceeds the critical F value (F_crit). Our findings therefore indicate that, regardless of the prediction model, the spectrometer's parameters significantly influenced SOM prediction accuracy. ANOVA alone does not identify which groups differ from one another, so Tukey post hoc tests were applied to R²_p to interpret the ANOVA results further. The post hoc tests were conducted using Python 3.8.5 and the statsmodels package.
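For a single prediction model, the one-way test reduces to comparing the R²_p samples obtained at each factor level. For example, each sample could be the values from the 20 calibration/validation groups at one spectral configuration. The minimal SciPy sketch below also returns F_crit. Because the F test is upper-tailed, "p < α" and "F > F_crit" are the same decision. The function name `one_way_anova` is hypothetical.

```python
from scipy import stats


def one_way_anova(groups, alpha=0.05):
    """One-way ANOVA across factor levels, together with the critical F value.

    `groups` is a list of samples, one per factor level (e.g. one spectral
    configuration or SNR level) for a single prediction model.
    """
    f_value, p_value = stats.f_oneway(*groups)
    k = len(groups)
    n = sum(len(g) for g in groups)
    f_crit = stats.f.ppf(1.0 - alpha, k - 1, n - k)
    # For this upper-tail test, p < alpha is equivalent to F > F_crit.
    reject = bool(p_value < alpha)
    return float(f_value), float(p_value), float(f_crit), reject


# Toy example with three levels: F = 27 with (2, 6) degrees of freedom.
print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```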
Figure 9 visualizes the post hoc test results for the spectral configurations, where 1 indicates a significant difference between two groups and 0 indicates no significant difference. The Tukey test results differed slightly across the prediction models, but one observation was consistent: the R²_p values at a 100 nm spectral resolution did not differ significantly from those at finer spectral resolutions. The remaining variation among these groups is attributed mainly to random factors in the data, which mirrors the visual assessment of Figure 8a–c. Based on these post hoc results, we conclude that similar SOM estimation performance can be achieved with any spectral resolution up to 100 nm.
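A 0/1 matrix like those in Figures 9 and 10 can be built from the statsmodels Tukey HSD result. This sketch assumes that the pairwise rejections in `reject` follow the upper-triangle order of the sorted unique group labels (`groupsunique`), which is how statsmodels enumerates the pairs. The function name `tukey_matrix` and the toy labels are hypothetical.

```python
import itertools

import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd


def tukey_matrix(values, labels, alpha=0.05):
    """Return sorted group names and a symmetric 0/1 matrix of Tukey HSD rejections."""
    res = pairwise_tukeyhsd(endog=np.asarray(values, dtype=float),
                            groups=np.asarray(labels), alpha=alpha)
    names = list(res.groupsunique)
    k = len(names)
    mat = np.zeros((k, k), dtype=int)
    # Pairs (i, j), i < j, in the same order as res.reject.
    for (i, j), rejected in zip(itertools.combinations(range(k), 2), res.reject):
        mat[i, j] = mat[j, i] = int(rejected)  # 1 = significant difference
    return names, mat


# Toy R²_p values: groups A and B are alike, group C is clearly lower.
values = [0.78, 0.79, 0.77, 0.78, 0.78,
          0.78, 0.77, 0.79, 0.78, 0.78,
          0.65, 0.66, 0.64, 0.65, 0.65]
labels = ["A"] * 5 + ["B"] * 5 + ["C"] * 5
names, mat = tukey_matrix(values, labels)
print(names)
print(mat)
```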
Figure 10 shows the post hoc test results for R²_p across the four prediction models at the various SNR levels. Across the 15 SNR levels, all four models exhibited a significant difference in prediction performance once the SNR decreased to 10%. In contrast, no significant difference in R²_p was found between 100% and 15%, consistent with the visual inspection of Figure 8d–f. This pattern indicates that excessive noise in the spectral data can obscure the characteristic information of the measured object and thereby reduce estimation accuracy.

4. Discussion

Spectral resolution and SNR are two crucial parameters of optical remote sensing payloads. We analyzed and compared results derived from the same original soil spectral library and found that the modeling approach, spectral resolution, and SNR all influence SOM estimation. The preceding sections quantified the magnitude of each effect. In this section, we discuss how the instrument parameters influence estimation performance and offer recommendations for designing optical instruments dedicated to SOM monitoring.
The SNR of a spectrometer depends on its optical, detector, and electronic subsystems. In practice, the maximum achievable SNR in a given spectral channel is limited by several factors [39]:
  • fluctuations in the detector dark current;
  • readout noise;
  • photon noise from thermal emission within the instrument;
  • nonuniformity across the detector array;
  • difficulties in detector calibration.
In the field, the lower solar irradiance in the NIR region also reduces the SNR of the NIR bands. The SNR of a spectroscopic instrument is therefore affected by numerous practical factors, and the actual design of a spectrometer involves trade-offs among these parameters. Generally, increasing the SNR comes at the cost of reduced spectral resolution or larger and heavier optical components [17,25].
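The noise sources listed above are often combined in a simplified per-channel noise budget. Independent noise terms, expressed in electrons, add in quadrature:

- Photon (shot) noise of the signal.
- Dark-current shot noise.
- Shot noise of instrument thermal background.
- RMS readout noise.

The minimal sketch below uses this standard model. It ignores array nonuniformity and calibration error, and its function name and parameters are hypothetical. It shows why a weaker NIR signal lowers the SNR even when the detector noise is unchanged.

```python
import math


def detector_snr(signal_e, dark_e, read_noise_e, background_e=0.0):
    """Simplified per-channel SNR; independent noise terms add in quadrature.

    signal_e:     photo-electrons from the target (Poisson, variance = signal_e)
    dark_e:       dark-current electrons during integration (Poisson variance)
    read_noise_e: RMS readout noise in electrons (Gaussian, variance = read_noise_e**2)
    background_e: electrons from instrument thermal emission (Poisson variance)
    """
    variance = signal_e + dark_e + background_e + read_noise_e**2
    return signal_e / math.sqrt(variance)


# Hypothetical numbers: halving the signal (e.g. lower NIR irradiance) lowers the SNR.
print(detector_snr(10000, 100, 10), detector_snr(5000, 100, 10))
```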
The spectral configuration Con_200/200 yielded poor prediction performance, with RMSE_p_av of 4.10–4.12, R²_p_av of 0.75, and RPIQ_p_av of 3.57–3.58. The models built at the 1% SNR level performed worst, with RMSE_p_av of 4.73–4.83, R²_p_av of 0.65–0.67, and RPIQ_p_av of 3.06–3.12. Together with the ANOVA results, this shows that the SNR exerts a more pronounced influence on SOM estimation than the other factors. Prediction accuracy increases with the SNR, but the rate of improvement diminishes at high SNR levels. This behavior is consistent with how most machine learning regressors work: they learn features from the training data by iteratively minimizing a loss and then apply the learned relationship to new samples. Excessive noise can overwhelm the informative spectral features and so reduce prediction accuracy, which is why the SNR significantly affects prediction performance. Furthermore, hyperspectral data typically contain many wavelength variables and are therefore susceptible to the “curse of dimensionality” [40]. Our results suggest that coarsening the spectral resolution, and thereby reducing the number of spectral bands, could alleviate problems related to multicollinearity and redundant information. The resulting simpler prediction models can perform comparably to, or even more efficiently than, models built at higher spectral resolutions. Nevertheless, too few spectral bands would leave insufficient information for accurate SOM estimation.
When designing optical remote sensing instruments for SOM monitoring, the SNR requirement should be addressed first to limit its influence on the accuracy of SOM estimation. In some situations, a lower spectral resolution may be an acceptable way to meet the required SNR. For hyperspectral instruments, a lower spectral resolution also means fewer bands and thus a smaller volume of spectral data, which simplifies data processing. Understanding how spectral resolution and SNR affect SOM estimation therefore broadens the potential of spectroscopic techniques for SOM estimation. These findings may also apply to other soil regions and could be extrapolated to broader scales.

5. Conclusions

In this study, we simulated two spectral libraries from spectra measured with an ASD spectrometer. We evaluated the effects of two fundamental spectrometer parameters, spectral resolution and SNR, on SOM estimation using four multivariate regression methods (RR, PLSR, SVMR, and BPNN). The following conclusions are drawn from our experimental analysis:
  • Spectral resolution, SNR level, and the choice of multivariate regression model all significantly influence SOM prediction performance. The variation attributable to the instrument parameters exceeded that attributable to the prediction models. The largest differences stem from the SNR level, followed by the spectral resolution.
  • The ANOVA of the performance indicator R²_p indicates no significant difference in SOM prediction performance when the spectral resolution is within 100 nm. When the spectral resolution is coarser than 100 nm, however, estimation accuracy declines significantly.
  • The SNR level of the spectroscopic instrument emerged as the most important factor for accurate SOM estimation. Higher SNR levels generally yield higher estimation accuracy, but the impact of the SNR on SOM estimation diminishes as the SNR increases. The ANOVA results for R²_p suggest that once the SNR level exceeds 15%, further changes do not produce a significant difference in SOM estimation performance.
In conclusion, spectral resolution and SNR are important spectrometer specifications. Understanding their influence on SOM estimation paves the way for the efficient design of future optical remote sensing payloads for monitoring large-scale SOM variations.

Author Contributions

Conceptualization, B.Y.; methodology, B.Y.; software, B.Y.; validation, B.Y., J.Y. and C.Y.; formal analysis, B.Y.; investigation, B.Y.; resources, B.Y., J.X., C.M. and H.D.; data curation, B.Y.; writing—original draft preparation, B.Y.; writing—review and editing, C.Y., J.Y., J.X. and C.M.; visualization, B.Y. and J.Y.; supervision, C.Y. and J.Y.; project administration, C.Y.; funding acquisition, C.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Jilin Key R&D Program of China under Grant 20230201036GX; in part by the National Natural Science Foundation of China (NSFC) under Grants 62105331 and 62275114; in part by the Qingdao Industrial Experts Program; and in part by the Taishan Industrial Experts Program.

Data Availability Statement

The data are not publicly available because they are subject to privacy restrictions.

Acknowledgments

The editor and the reviewers are thanked for their helpful comments and criticisms of the initial draft of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Viscarra Rossel, R.A.; Behrens, T.; Ben-Dor, E.; Brown, D.J.; Demattê, J.A.M.; Shepherd, K.D.; Shi, Z.; Stenberg, B.; Stevens, A.; Adamchuk, V.; et al. A global spectral library to characterize the world’s soil. Earth-Sci. Rev. 2016, 155, 198–230.
  2. Prăvălie, R. Exploring the multiple land degradation pathways across the planet. Earth-Sci. Rev. 2021, 220, 103689.
  3. Li, T.; Mu, T.; Liu, G.; Yang, X.; Zhu, G.; Shang, C. A Method of Soil Moisture Content Estimation at Various Soil Organic Matter Conditions Based on Soil Reflectance. Remote Sens. 2022, 14, 2411.
  4. Wang, Y.; Huang, T.; Liu, J.; Lin, Z.; Li, S.; Wang, R.; Ge, Y. Soil pH value, organic matter and macronutrients contents prediction using optical diffuse reflectance spectroscopy. Comput. Electron. Agric. 2015, 111, 69–77.
  5. Sun, W.; Liu, S.; Zhang, X.; Li, Y. Estimation of soil organic matter content using selected spectral subset of hyperspectral data. Geoderma 2022, 409, 115653.
  6. Hong, Y.; Guo, L.; Chen, S.; Linderman, M.; Mouazen, A.M.; Yu, L.; Chen, Y.; Liu, Y.; Liu, Y.; Cheng, H.; et al. Exploring the potential of airborne hyperspectral image for estimating topsoil organic carbon: Effects of fractional-order derivative and optimal band combination algorithm. Geoderma 2020, 365, 114228.
  7. Ma, H.; Wang, C.; Liu, J.; Wang, X.; Zhang, F.; Yuan, Z.; Yao, C.; Pan, X. A Framework for Retrieving Soil Organic Matter by Coupling Multi-Temporal Remote Sensing Images and Variable Selection in the Sanjiang Plain, China. Remote Sens. 2023, 15, 3191.
  8. Wang, S.; Guan, K.; Zhang, C.; Lee, D.; Margenot, A.J.; Ge, Y.; Peng, J.; Zhou, W.; Zhou, Q.; Huang, Y. Using soil library hyperspectral reflectance and machine learning to predict soil organic carbon: Assessing potential of airborne and spaceborne optical soil sensing. Remote Sens. Environ. 2022, 271, 112914.
  9. Rossel, R.A.V.; Behrens, T. Using data mining to model and interpret soil diffuse reflectance spectra. Geoderma 2010, 158, 46–54.
  10. Sirsat, M.S.; Cernadas, E.; Fernández-Delgado, M.; Barro, S. Automatic prediction of village-wise soil fertility for several nutrients in India using a wide range of regression methods. Comput. Electron. Agric. 2018, 154, 120–133.
  11. Lucà, F.; Conforti, M.; Castrignanò, A.; Matteucci, G.; Buttafuoco, G. Effect of calibration set size on prediction at local scale of soil carbon by Vis-NIR spectroscopy. Geoderma 2017, 288, 175–183.
  12. Gupta, A.; Vasava, H.B.; Das, B.S.; Choubey, A.K. Local modeling approaches for estimating soil properties in selected Indian soils using diffuse reflectance data over visible to near-infrared region. Geoderma 2018, 325, 59–71.
  13. Jia, P.; Zhang, J.; He, W.; Hu, Y.; Zeng, R.; Zamanian, K.; Jia, K.; Zhao, X. Combination of Hyperspectral and Machine Learning to Invert Soil Electrical Conductivity. Remote Sens. 2022, 14, 2602.
  14. Xu, S.; Wang, M.; Shi, X. Hyperspectral imaging for high-resolution mapping of soil carbon fractions in intact paddy soil profiles with multivariate techniques and variable selection. Geoderma 2020, 370, 114358.
  15. Zhang, Z.; Ding, J.; Zhu, C.; Wang, J. Combination of efficient signal pre-processing and optimal band combination algorithm to predict soil organic matter through visible and near-infrared spectra. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2020, 240, 118553.
  16. Deiss, L.; Margenot, A.J.; Culman, S.W.; Demyan, M.S. Tuning support vector machines regression models improves prediction accuracy of soil properties in MIR spectroscopy. Geoderma 2020, 365, 114227.
  17. Castaldi, F.; Palombo, A.; Santini, F.; Pascucci, S.; Pignatti, S.; Casa, R. Evaluation of the potential of the current and forthcoming multispectral and hyperspectral imagers to estimate soil texture and organic carbon. Remote Sens. Environ. 2016, 179, 54–65.
  18. Knadel, M.; Stenberg, B.; Deng, F.; Thomsen, A.; Greve, M.H. Comparing Predictive Abilities of Three Visible-Near Infrared Spectrophotometers for Soil Organic Carbon and Clay Determination. J. Near Infrared Spectrosc. 2013, 21, 67–80.
  19. Gomez, C.; Adeline, K.; Bacha, S.; Driessen, B.; Gorretta, N.; Lagacherie, P.; Roger, J.M.; Briottet, X. Sensitivity of clay content prediction to spectral configuration of VNIR/SWIR imaging data, from multispectral to hyperspectral scenarios. Remote Sens. Environ. 2018, 204, 18–30.
  20. Jia, J.; Chen, J.; Zheng, X.; Wang, Y.; Guo, S.; Sun, H.; Jiang, C.; Karjalainen, M.; Karila, K.; Duan, Z.; et al. Tradeoffs in the Spatial and Spectral Resolution of Airborne Hyperspectral Imaging Systems: A Crop Identification Case Study. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5510918.
  21. Wang, L.; Qiu, J.; Tang, H.; Li, H.; Li, C.; Van Ranst, E. Modelling soil organic carbon dynamics in the major agricultural regions of China. Geoderma 2008, 147, 47–55.
  22. Adeline, K.R.M.; Gomez, C.; Gorretta, N.; Roger, J.M. Predictive ability of soil properties to spectral degradation from laboratory Vis-NIR spectroscopy data. Geoderma 2017, 288, 143–153.
  23. Liu, L.; Liu, X.; Hu, J. Effects of spectral resolution and SNR on the vegetation solar-induced fluorescence retrieval using FLD-based methods at canopy level. Eur. J. Remote Sens. 2017, 48, 743–762.
  24. Scott, D.W. Multivariate Density Estimation; Wiley: Hoboken, NJ, USA, 2015.
  25. Schott, J.R.; Gerace, A.; Woodcock, C.E.; Wang, S.; Zhu, Z.; Wynne, R.H.; Blinn, C.E. The impact of improved signal-to-noise ratios on algorithm performance: Case studies for Landsat class instruments. Remote Sens. Environ. 2016, 185, 37–45.
  26. Hoerl, A.E.; Kennard, R.W. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics 1970, 12, 55–67.
  27. Geladi, P.; Kowalski, B.R. Partial Least-Squares Regression: A Tutorial. Anal. Chim. Acta 1986, 185, 1–17.
  28. Schölkopf, B. Statistical Learning and Kernel Methods; Springer: Berlin/Heidelberg, Germany, 2001.
  29. Haykin, S. Redes Neurais Artificiais: Princípios e Práticas, 2nd ed.; Bookman: Sao Paulo, Brazil, 2001.
  30. Lin, L.I. A concordance correlation coefficient to evaluate reproducibility. Biometrics 1989, 45, 255–268.
  31. Bellon-Maurel, V.; Fernandez-Ahumada, E.; Palagos, B.; Roger, J.-M.; McBratney, A. Critical review of chemometric indicators commonly used for assessing the quality of the prediction of soil attributes by NIR spectroscopy. Trac-Trends Anal. Chem. 2010, 29, 1073–1081.
  32. Stahle, L.; Wold, S. Analysis of variance (ANOVA). Chemom. Intell. Lab. Syst. 1989, 6, 259–272.
  33. Wasserstein, R.L.; Lazar, N.A. The ASA’s Statement on p-Values: Context, Process, and Purpose. Am. Stat. 2016, 70, 129–131.
  34. Ba, Y.; Liu, J.; Han, J.; Zhang, X. Application of Vis-NIR spectroscopy for determination the content of organic matter in saline-alkali soils. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2020, 229, 117863.
  35. Stoner, E.R.; Baumgardner, M.F. Characteristic Variations in Reflectance of Surface Soils. Soil Sci. Soc. Am. J. 1981, 45, 1161–1165.
  36. Shepherd, K.D.; Walsh, M.G. Development of reflectance spectral libraries for characterization of soil properties. Soil Sci. Soc. Am. J. 2002, 66, 988–998.
  37. Bishop, J.L.; Pieters, C.M.; Edwards, J.O. Infrared Spectroscopic Analyses on the Nature of Water in Montmorillonite. Clays Clay Miner. 1994, 42, 702–716.
  38. Clark, R.N.; King, T.V.V.; Klejwa, M.; Swayze, G.A.; Vergo, N. High Spectral Resolution Reflectance Spectroscopy of Minerals. J. Geophys. Res.-Solid Earth Planets 1990, 95, 12653–12680.
  39. Swayze, G.A.; Clark, R.N.; Goetz, A.F.H.; Chrien, T.G.; Gorelick, N.S. Effects of spectrometer band pass, sampling, and signal-to-noise ratio on spectral identification using the Tetracorder algorithm. J. Geophys. Res.-Planets 2003, 108, 1975.
  40. Salimi, A.; Ziaii, M.; Amiri, A.; Hosseinjani Zadeh, M.; Karimpouli, S.; Moradkhani, M. Using a Feature Subset Selection method and Support Vector Machine to address curse of dimensionality and redundancy in Hyperion hyperspectral data classification. Egypt. J. Remote Sens. Space Sci. 2018, 21, 27–36.
Figure 1. Location of the study area in China.
Figure 2. The Gaussian response function G(λ) for N = 3, SR = 20 nm, and SSI = 20 nm.
Figure 3. (a) The SNR of a soil sample spectrum with 1951 spectral bands in the 450–2400 nm range. (b) Histogram of the SNR and its probability density estimated by kernel density estimation (KDE).
Figure 4. Boxplots depicting the SOM values for the calibration and validation sets across 20 groups.
Figure 5. Reflectance spectra for the entire data set. The highlighted curves are the mean spectra for different ranges of SOM content.
Figure 6. Reflectance spectra for each spectral configuration from Con_3/10 to Con_200/200.
Figure 7. Reflectance spectra for different SNR levels from 1% SNR to 100% SNR.
Figure 8. Variation in RMSE_p_av, R²_p_av, and RPIQ_p_av across all spectral configurations and SNR levels for the four prediction models. (a) RMSE_p_av, (b) R²_p_av, and (c) RPIQ_p_av for predicting SOM with the various spectral configurations; (d) RMSE_p_av, (e) R²_p_av, and (f) RPIQ_p_av for predicting SOM at the various SNR levels.
Figure 9. Results of the post hoc test for each set of R²_p values for the spectral configurations (1 indicates a significant difference between groups; 0 indicates no significant difference).
Figure 10. Results of the post hoc test for each set of R²_p values for the SNR levels (1 indicates a significant difference between groups; 0 indicates no significant difference).
Table 1. Descriptions of ten spectral configurations. ASD_1/1 represents the initial spectral library.
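N: number of bands; SSI: spectral sampling interval; SR: spectral resolution (SSI and SR in nm).
Configuration | N | SSI, 450–1000 nm | SR, 450–1000 nm | SSI, 1000–2400 nm | SR, 1000–2400 nm
ASD_1/1 | 1951 | 1 | 3 | 1 | 10
Con_3/10 | 323 | 3 | 3 | 10 | 10
Con_10/10 | 194 | 10 | 10 | 10 | 10
Con_20/20 | 96 | 20 | 20 | 20 | 20
Con_40/40 | 47 | 40 | 40 | 40 | 40
Con_60/60 | 31 | 60 | 60 | 60 | 60
Con_80/80 | 23 | 80 | 80 | 80 | 80
Con_100/100 | 18 | 100 | 100 | 100 | 100
Con_150/150 | 12 | 150 | 150 | 150 | 150
Con_200/200 | 8 | 200 | 200 | 200 | 200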
Table 2. Descriptive statistics of SOM for the calibration and validation sets, along with the probability levels from Levene’s test under various groupings.
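Cal. = calibration set; Val. = validation set; SOM values in g kg⁻¹.
Group | Cal. Min | Cal. Max | Cal. Mean | Cal. SD | Val. Min | Val. Max | Val. Mean | Val. SD | Levene's test (p)
1 | 29.308 | 58.616 | 43.647 | 8.124 | 29.308 | 59.650 | 40.773 | 8.141 | 0.840
2 | 29.308 | 59.650 | 41.646 | 7.914 | 29.308 | 58.616 | 45.400 | 8.367 | 0.921
3 | 29.308 | 59.650 | 43.531 | 8.004 | 29.308 | 58.616 | 41.042 | 8.505 | 0.824
4 | 29.308 | 57.237 | 42.604 | 8.058 | 29.308 | 59.650 | 43.186 | 8.632 | 0.961
5 | 29.308 | 59.650 | 43.205 | 8.213 | 29.308 | 58.616 | 41.796 | 8.209 | 0.989
6 | 29.480 | 58.616 | 43.409 | 7.976 | 29.308 | 59.650 | 41.322 | 8.644 | 0.827
7 | 29.308 | 58.616 | 42.012 | 7.963 | 29.308 | 59.650 | 44.555 | 8.582 | 0.985
8 | 29.308 | 58.616 | 42.848 | 8.126 | 29.480 | 59.650 | 42.621 | 8.493 | 0.815
9 | 29.308 | 58.616 | 42.557 | 8.101 | 29.308 | 59.650 | 43.294 | 8.527 | 0.873
10 | 29.308 | 59.650 | 42.741 | 8.207 | 30.687 | 58.616 | 42.868 | 8.308 | 0.800
11 | 29.308 | 57.237 | 41.975 | 7.945 | 29.308 | 59.650 | 44.641 | 8.595 | 0.925
12 | 29.308 | 59.650 | 42.494 | 8.172 | 29.308 | 57.064 | 43.439 | 8.352 | 0.845
13 | 29.308 | 59.650 | 42.222 | 8.257 | 30.687 | 58.616 | 44.069 | 8.039 | 0.894
14 | 29.480 | 59.650 | 43.128 | 8.207 | 29.308 | 57.237 | 41.974 | 8.251 | 0.823
15 | 29.308 | 58.616 | 42.664 | 8.165 | 30.687 | 59.650 | 43.046 | 8.400 | 0.833
16 | 29.308 | 59.650 | 42.942 | 8.251 | 29.308 | 57.237 | 42.405 | 8.192 | 0.954
17 | 29.308 | 59.650 | 43.401 | 8.099 | 29.308 | 57.237 | 41.344 | 8.376 | 0.901
18 | 29.308 | 58.616 | 42.471 | 8.121 | 31.545 | 59.650 | 43.493 | 8.460 | 0.915
19 | 29.308 | 59.650 | 42.278 | 8.172 | 29.480 | 58.616 | 43.940 | 8.271 | 0.894
20 | 29.308 | 59.650 | 43.415 | 8.246 | 29.480 | 58.616 | 41.311 | 8.020 | 0.938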
Table 3. Comparison of SOM prediction accuracy values of each model with different spectral configurations.
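Configuration | Model | RMSE_cv_av (g kg⁻¹) | R²_cv_av | RPIQ_cv_av | RMSE_p_av (g kg⁻¹) | R²_p_av | RPIQ_p_av
ASD_1/1 | RR | 3.84 | 0.74 | 3.52 | 3.81 | 0.78 | 3.85
ASD_1/1 | PLSR | 3.85 | 0.74 | 3.51 | 3.83 | 0.78 | 3.82
ASD_1/1 | SVMR | 3.92 | 0.73 | 3.46 | 3.87 | 0.78 | 3.80
ASD_1/1 | BPNN | 3.91 | 0.73 | 3.45 | 3.81 | 0.78 | 3.85
Con_3/10 | RR | 3.84 | 0.74 | 3.51 | 3.81 | 0.78 | 3.85
Con_3/10 | PLSR | 3.85 | 0.74 | 3.51 | 3.82 | 0.78 | 3.84
Con_3/10 | SVMR | 3.93 | 0.73 | 3.45 | 3.88 | 0.77 | 3.79
Con_3/10 | BPNN | 3.89 | 0.74 | 3.47 | 3.80 | 0.78 | 3.85
Con_10/10 | RR | 3.83 | 0.74 | 3.52 | 3.81 | 0.78 | 3.85
Con_10/10 | PLSR | 3.83 | 0.74 | 3.52 | 3.82 | 0.78 | 3.83
Con_10/10 | SVMR | 3.94 | 0.73 | 3.44 | 3.89 | 0.77 | 3.78
Con_10/10 | BPNN | 3.89 | 0.74 | 3.48 | 3.83 | 0.78 | 3.83
Con_20/20 | RR | 3.83 | 0.74 | 3.53 | 3.81 | 0.78 | 3.85
Con_20/20 | PLSR | 3.82 | 0.74 | 3.53 | 3.80 | 0.78 | 3.85
Con_20/20 | SVMR | 3.93 | 0.73 | 3.44 | 3.87 | 0.78 | 3.80
Con_20/20 | BPNN | 3.88 | 0.74 | 3.48 | 3.83 | 0.78 | 3.83
Con_40/40 | RR | 3.83 | 0.74 | 3.53 | 3.82 | 0.78 | 3.83
Con_40/40 | PLSR | 3.82 | 0.74 | 3.54 | 3.83 | 0.78 | 3.82
Con_40/40 | SVMR | 3.95 | 0.73 | 3.42 | 3.85 | 0.78 | 3.81
Con_40/40 | BPNN | 3.88 | 0.74 | 3.48 | 3.83 | 0.78 | 3.82
Con_60/60 | RR | 3.82 | 0.74 | 3.54 | 3.82 | 0.78 | 3.84
Con_60/60 | PLSR | 3.82 | 0.74 | 3.54 | 3.82 | 0.78 | 3.83
Con_60/60 | SVMR | 3.94 | 0.73 | 3.43 | 3.86 | 0.78 | 3.80
Con_60/60 | BPNN | 3.87 | 0.74 | 3.49 | 3.81 | 0.78 | 3.84
Con_80/80 | RR | 3.83 | 0.74 | 3.53 | 3.83 | 0.78 | 3.82
Con_80/80 | PLSR | 3.82 | 0.74 | 3.53 | 3.86 | 0.78 | 3.79
Con_80/80 | SVMR | 3.94 | 0.73 | 3.43 | 3.90 | 0.77 | 3.76
Con_80/80 | BPNN | 3.88 | 0.74 | 3.48 | 3.82 | 0.78 | 3.83
Con_100/100 | RR | 3.85 | 0.74 | 3.50 | 3.83 | 0.78 | 3.82
Con_100/100 | PLSR | 3.85 | 0.74 | 3.51 | 3.86 | 0.78 | 3.79
Con_100/100 | SVMR | 3.97 | 0.73 | 3.41 | 3.94 | 0.77 | 3.73
Con_100/100 | BPNN | 3.90 | 0.73 | 3.47 | 3.83 | 0.78 | 3.82
Con_150/150 | RR | 3.94 | 0.73 | 3.43 | 3.98 | 0.76 | 3.69
Con_150/150 | PLSR | 3.96 | 0.73 | 3.41 | 3.97 | 0.76 | 3.68
Con_150/150 | SVMR | 4.04 | 0.71 | 3.35 | 4.04 | 0.76 | 3.63
Con_150/150 | BPNN | 3.98 | 0.72 | 3.40 | 3.99 | 0.76 | 3.68
Con_200/200 | RR | 4.09 | 0.71 | 3.31 | 4.11 | 0.75 | 3.58
Con_200/200 | PLSR | 4.09 | 0.71 | 3.31 | 4.12 | 0.75 | 3.57
Con_200/200 | SVMR | 4.09 | 0.71 | 3.31 | 4.10 | 0.75 | 3.58
Con_200/200 | BPNN | 4.09 | 0.71 | 3.30 | 4.11 | 0.75 | 3.58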
Table 4. Comparison of SOM prediction accuracy values of each model with different SNR levels.
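| SNR level | Model | RMSE_cv_av | R²_cv_av | RPIQ_cv_av | RMSE_p_av | R²_p_av | RPIQ_p_av |
|---|---|---|---|---|---|---|---|
| 100% SNR | RR | 3.84 | 0.74 | 3.52 | 3.81 | 0.78 | 3.85 |
|  | PLSR | 3.85 | 0.74 | 3.51 | 3.83 | 0.78 | 3.82 |
|  | SVMR | 3.92 | 0.73 | 3.46 | 3.87 | 0.78 | 3.80 |
|  | BPNN | 3.91 | 0.73 | 3.45 | 3.81 | 0.78 | 3.85 |
| 90% SNR | RR | 3.85 | 0.74 | 3.51 | 3.81 | 0.78 | 3.84 |
|  | PLSR | 3.86 | 0.74 | 3.50 | 3.86 | 0.78 | 3.80 |
|  | SVMR | 3.92 | 0.73 | 3.45 | 3.87 | 0.78 | 3.80 |
|  | BPNN | 3.91 | 0.73 | 3.46 | 3.82 | 0.78 | 3.85 |
| 80% SNR | RR | 3.84 | 0.74 | 3.52 | 3.88 | 0.77 | 3.79 |
|  | PLSR | 3.87 | 0.74 | 3.49 | 3.90 | 0.77 | 3.76 |
|  | SVMR | 3.92 | 0.73 | 3.45 | 3.90 | 0.77 | 3.77 |
|  | BPNN | 3.91 | 0.73 | 3.46 | 3.83 | 0.78 | 3.83 |
| 70% SNR | RR | 3.84 | 0.74 | 3.52 | 3.81 | 0.78 | 3.85 |
|  | PLSR | 3.87 | 0.74 | 3.49 | 3.86 | 0.78 | 3.80 |
|  | SVMR | 3.91 | 0.73 | 3.46 | 3.89 | 0.77 | 3.78 |
|  | BPNN | 3.89 | 0.73 | 3.47 | 3.81 | 0.78 | 3.85 |
| 60% SNR | RR | 3.83 | 0.74 | 3.53 | 3.82 | 0.78 | 3.84 |
|  | PLSR | 3.87 | 0.74 | 3.49 | 3.88 | 0.78 | 3.78 |
|  | SVMR | 3.91 | 0.73 | 3.46 | 3.91 | 0.77 | 3.76 |
|  | BPNN | 3.87 | 0.74 | 3.49 | 3.85 | 0.78 | 3.81 |
| 50% SNR | RR | 3.86 | 0.74 | 3.50 | 3.83 | 0.78 | 3.83 |
|  | PLSR | 3.90 | 0.73 | 3.46 | 3.92 | 0.77 | 3.74 |
|  | SVMR | 3.91 | 0.73 | 3.46 | 3.91 | 0.77 | 3.77 |
|  | BPNN | 3.92 | 0.73 | 3.45 | 3.91 | 0.77 | 3.75 |
| 40% SNR | RR | 3.87 | 0.74 | 3.49 | 3.84 | 0.78 | 3.81 |
|  | PLSR | 3.90 | 0.73 | 3.46 | 3.98 | 0.76 | 3.68 |
|  | SVMR | 3.92 | 0.73 | 3.45 | 3.88 | 0.78 | 3.78 |
|  | BPNN | 3.93 | 0.73 | 3.44 | 3.90 | 0.77 | 3.76 |
| 30% SNR | RR | 3.85 | 0.74 | 3.50 | 3.84 | 0.78 | 3.82 |
|  | PLSR | 3.88 | 0.74 | 3.48 | 3.95 | 0.77 | 3.71 |
|  | SVMR | 3.92 | 0.73 | 3.45 | 3.96 | 0.77 | 3.70 |
|  | BPNN | 3.88 | 0.74 | 3.48 | 3.86 | 0.78 | 3.80 |
| 25% SNR | RR | 3.97 | 0.72 | 3.41 | 3.90 | 0.77 | 3.78 |
|  | PLSR | 3.95 | 0.73 | 3.42 | 3.98 | 0.76 | 3.68 |
|  | SVMR | 3.95 | 0.73 | 3.42 | 3.90 | 0.77 | 3.78 |
|  | BPNN | 4.00 | 0.72 | 3.38 | 3.94 | 0.77 | 3.73 |
| 20% SNR | RR | 3.97 | 0.72 | 3.40 | 3.95 | 0.77 | 3.72 |
|  | PLSR | 3.96 | 0.72 | 3.41 | 4.02 | 0.76 | 3.65 |
|  | SVMR | 4.00 | 0.72 | 3.38 | 4.05 | 0.75 | 3.64 |
|  | BPNN | 4.04 | 0.71 | 3.35 | 4.06 | 0.75 | 3.61 |
| 15% SNR | RR | 4.07 | 0.71 | 3.32 | 4.03 | 0.76 | 3.66 |
|  | PLSR | 4.06 | 0.71 | 3.33 | 4.01 | 0.76 | 3.68 |
|  | SVMR | 4.05 | 0.71 | 3.34 | 4.01 | 0.76 | 3.68 |
|  | BPNN | 4.11 | 0.70 | 3.29 | 4.04 | 0.76 | 3.65 |
| 10% SNR | RR | 4.17 | 0.70 | 3.25 | 4.13 | 0.74 | 3.57 |
|  | PLSR | 4.17 | 0.70 | 3.25 | 4.20 | 0.74 | 3.51 |
|  | SVMR | 4.17 | 0.69 | 3.25 | 4.19 | 0.74 | 3.53 |
|  | BPNN | 4.22 | 0.69 | 3.21 | 4.13 | 0.74 | 3.58 |
| 5% SNR | RR | 4.36 | 0.67 | 3.10 | 4.28 | 0.73 | 3.45 |
|  | PLSR | 4.33 | 0.68 | 3.12 | 4.45 | 0.70 | 3.33 |
|  | SVMR | 4.38 | 0.67 | 3.09 | 4.36 | 0.72 | 3.40 |
|  | BPNN | 4.43 | 0.66 | 3.06 | 4.32 | 0.72 | 3.42 |
| 3% SNR | RR | 4.55 | 0.65 | 2.97 | 4.45 | 0.70 | 3.32 |
|  | PLSR | 4.51 | 0.65 | 3.00 | 4.60 | 0.68 | 3.21 |
|  | SVMR | 4.53 | 0.65 | 2.99 | 4.46 | 0.70 | 3.30 |
|  | BPNN | 4.61 | 0.64 | 2.93 | 4.54 | 0.69 | 3.25 |
| 1% SNR | RR | 4.83 | 0.60 | 2.80 | 4.78 | 0.66 | 3.09 |
|  | PLSR | 4.76 | 0.61 | 2.84 | 4.73 | 0.67 | 3.12 |
|  | SVMR | 4.85 | 0.60 | 2.79 | 4.83 | 0.65 | 3.06 |
|  | BPNN | 4.84 | 0.60 | 2.79 | 4.81 | 0.65 | 3.07 |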
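Table 4 can be read in relative terms. The sketch below recomputes, for each model, the percentage increase in RMSE_p_av at 15% and 1% SNR, using the 100% SNR row as the reference. The numbers are copied from Table 4, and `relative_increase` is a hypothetical helper introduced only for this illustration.

```python
# RMSE_p_av values copied from Table 4 (100%, 15% and 1% SNR rows).
RMSE_P = {
    "RR":   {100: 3.81, 15: 4.03, 1: 4.78},
    "PLSR": {100: 3.83, 15: 4.01, 1: 4.73},
    "SVMR": {100: 3.87, 15: 4.01, 1: 4.83},
    "BPNN": {100: 3.81, 15: 4.04, 1: 4.81},
}


def relative_increase(model, level, reference=100):
    """Percent change in RMSE_p_av at `level`% SNR relative to `reference`% SNR."""
    base = RMSE_P[model][reference]
    return 100.0 * (RMSE_P[model][level] - base) / base


for name in RMSE_P:
    print(f"{name}: +{relative_increase(name, 15):.1f}% at 15% SNR, "
          f"+{relative_increase(name, 1):.1f}% at 1% SNR")
```

With these inputs, the RMSE_p_av of RR rises by about 5.8% at 15% SNR and by about 25.5% at 1% SNR.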
Table 5. Results of the two-way ANOVA with RMSE_p and R²_p as dependent variables and spectral configuration and prediction model as factors.
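| Indicator | Source | SS | df | MS | F | p value | F_crit |
|---|---|---|---|---|---|---|---|
| RMSE_p | Configuration | 6.50 | 9 | 0.72 | 18.96 | 9.51 × 10⁻²⁹ | 1.89 |
|  | Model | 0.49 | 3 | 0.16 | 4.33 | 0.0049 | 2.62 |
|  | Interaction | 0.16 | 27 | 0.0058 | 0.15 | 1.00 | 1.50 |
|  | Error | 28.96 | 760 | 0.038 |  |  |  |
|  | Total | 36.11 | 799 |  |  |  |  |
| R²_p | Configuration | 0.094 | 9 | 0.010 | 21.10 | 5.48 × 10⁻³² | 1.89 |
|  | Model | 0.0072 | 3 | 0.0024 | 4.86 | 0.0023 | 2.62 |
|  | Interaction | 0.0022 | 27 | 8.32 × 10⁻⁵ | 0.17 | 1.00 | 1.50 |
|  | Error | 0.38 | 760 |  |  |  |  |
|  | Total | 0.48 | 799 |  |  |  |  |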
SS: sum of squares; df: degrees of freedom; MS: mean square; F_crit: critical F value at the 0.05 significance level.
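Tables 5 and 6 both report a balanced two-way ANOVA with replication: one factor (configuration or SNR level) crossed with the prediction model, plus their interaction. The error degrees of freedom in Table 5 (760 = 10 × 4 × 19) are consistent with 20 replicates per configuration/model cell.

The pure-Python sketch below shows how such a table can be assembled from raw observations. It is an illustration, not the authors' code, and the function name `two_way_anova` is hypothetical. It stops at the F ratio; p values and F_crit would additionally need an F-distribution routine (for example `scipy.stats.f`).

```python
def two_way_anova(cells):
    """Balanced two-way ANOVA with replication.

    cells[i][j] holds the replicate observations for level i of factor A
    (e.g. spectral configuration) and level j of factor B (e.g. model).
    Returns {source: (SS, df, MS, F)}; F is None for Error and Total,
    and MS is None for Total.
    """
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    if r < 2 or any(len(row) != b or any(len(c) != r for c in row) for row in cells):
        raise ValueError("need a balanced design with at least 2 replicates per cell")
    n = a * b * r
    values = [x for row in cells for c in row for x in c]
    grand = sum(values) / n

    # Cell, factor-A and factor-B means (averaging cell means is valid when balanced).
    cell_mean = [[sum(c) / r for c in row] for row in cells]
    a_mean = [sum(row) / b for row in cell_mean]
    b_mean = [sum(cell_mean[i][j] for i in range(a)) / a for j in range(b)]

    ss_a = b * r * sum((m - grand) ** 2 for m in a_mean)
    ss_b = a * r * sum((m - grand) ** 2 for m in b_mean)
    ss_ab = r * sum((cell_mean[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_e = sum((x - cell_mean[i][j]) ** 2
               for i in range(a) for j in range(b) for x in cells[i][j])
    ss_t = sum((x - grand) ** 2 for x in values)

    df = {"A": a - 1, "B": b - 1, "AxB": (a - 1) * (b - 1),
          "Error": a * b * (r - 1), "Total": n - 1}
    ms_e = ss_e / df["Error"]
    table = {}
    for name, ss in (("A", ss_a), ("B", ss_b), ("AxB", ss_ab)):
        ms = ss / df[name]
        table[name] = (ss, df[name], ms, ms / ms_e)  # F = MS_source / MS_error
    table["Error"] = (ss_e, df["Error"], ms_e, None)
    table["Total"] = (ss_t, df["Total"], None, None)
    return table
```

As a check, a toy 2 × 2 design with two replicates per cell, [[[1, 3], [2, 4]], [[5, 7], [6, 8]]], gives SS_A = 32, SS_B = 2, SS_AxB = 0 and SS_Error = 8. These add up to SS_Total = 42, and F_A = 16.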
Table 6. Results of the two-way ANOVA with RMSE_p and R²_p as dependent variables and SNR level and prediction model as factors.
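| Indicator | Source | SS | df | MS | F | p value | F_crit |
|---|---|---|---|---|---|---|---|
| RMSE_p | SNR | 91.82 | 14 | 6.56 | 79.05 | 8 × 10⁻¹⁵⁷ | 1.70 |
|  | Model | 0.75 | 3 | 0.25 | 3.02 | 0.029 | 2.61 |
|  | Interaction | 1.14 | 42 | 0.027 | 0.33 | 1.00 | 1.39 |
|  | Error | 94.57 | 1140 | 0.083 |  |  |  |
|  | Total | 188.29 | 1199 |  |  |  |  |
| R²_p | SNR | 1.51 | 14 | 0.11 | 92.63 | 9.9 × 10⁻¹⁷⁷ | 1.70 |
|  | Model | 0.011 | 3 | 0.0036 | 3.10 | 0.026 | 2.61 |
|  | Interaction | 0.019 | 42 | 0.0004 | 0.38 | 1.00 | 1.39 |
|  | Error | 1.33 | 1140 | 0.0012 |  |  |  |
|  | Total | 2.86 | 1199 |  |  |  |  |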
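Each F value in Table 6 is MS_source/MS_Error, where MS = SS/df, so the reported F ratios can be cross-checked from the SS and df columns alone. Below is a small sketch for the RMSE_p block. The (SS, df) pairs are copied from Table 6, and the helper name `f_ratios` is hypothetical.

```python
# (SS, df) pairs copied from the RMSE_p block of Table 6.
TABLE6_RMSE_P = {
    "SNR": (91.82, 14),
    "Model": (0.75, 3),
    "Interaction": (1.14, 42),
    "Error": (94.57, 1140),
}


def f_ratios(rows):
    """Recompute F = (SS/df) / (SS_error/df_error) for every non-error source."""
    ss_e, df_e = rows["Error"]
    ms_e = ss_e / df_e
    return {src: (ss / df) / ms_e for src, (ss, df) in rows.items() if src != "Error"}


for source, f_value in f_ratios(TABLE6_RMSE_P).items():
    print(f"{source}: F = {f_value:.2f}")
```

The recomputed ratios (about 79.06, 3.01 and 0.33) agree with the tabulated 79.05, 3.02 and 0.33 to within the rounding of the printed sums of squares.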
Table 7. Results of one-way ANOVA using spectral configuration as the independent variable.
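| Model | Indicator | Source | SS | df | MS | F | p value | F_crit |
|---|---|---|---|---|---|---|---|---|
| RR | RMSE_p | Configuration | 1.81 | 9 | 0.20 | 5.96 | 2.42 × 10⁻⁷ | 1.93 |
|  |  | Error | 6.42 | 190 | 0.034 |  |  |  |
|  |  | Total | 8.23 | 199 |  |  |  |  |
|  | R²_p | Configuration | 0.026 | 9 | 0.002 | 6.84 | 1.69 × 10⁻⁸ | 1.93 |
|  |  | Error | 0.081 | 190 | 0.0004 |  |  |  |
|  |  | Total | 0.11 | 199 |  |  |  |  |
| PLSR | RMSE_p | Configuration | 1.79 | 9 | 0.20 | 7.46 | 2.61 × 10⁻⁹ | 1.93 |
|  |  | Error | 5.06 | 190 | 0.027 |  |  |  |
|  |  | Total | 6.85 | 199 |  |  |  |  |
|  | R²_p | Configuration | 0.026 | 9 | 0.0029 | 7.57 | 1.84 × 10⁻⁹ | 1.93 |
|  |  | Error | 0.073 | 190 | 0.0004 |  |  |  |
|  |  | Total | 0.099 | 199 |  |  |  |  |
| SVMR | RMSE_p | Configuration | 1.29 | 9 | 0.14 | 2.39 | 0.014 | 1.93 |
|  |  | Error | 11.42 | 190 | 0.060 |  |  |  |
|  |  | Total | 12.72 | 199 |  |  |  |  |
|  | R²_p | Configuration | 0.018 | 9 | 0.002 | 2.69 | 0.0058 | 1.93 |
|  |  | Error | 0.14 | 190 | 0.0007 |  |  |  |
|  |  | Total | 0.16 | 199 |  |  |  |  |
| BPNN | RMSE_p | Configuration | 1.82 | 9 | 0.22 | 6.48 | 4.97 × 10⁻⁸ | 1.93 |
|  |  | Error | 5.93 | 190 | 0.031 |  |  |  |
|  |  | Total | 7.75 | 199 |  |  |  |  |
|  | R²_p | Configuration | 0.026 | 9 | 0.0029 | 7.25 | 4.78 × 10⁻⁹ | 1.93 |
|  |  | Error | 0.076 | 190 | 0.0004 |  |  |  |
|  |  | Total | 0.10 | 199 |  |  |  |  |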
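Tables 7 and 8 run a separate one-way ANOVA for each model, with spectral configuration or SNR level as the only factor. The degrees of freedom are consistent with 20 replicates per level: 9 and 190 in Table 7 (10 levels) and 14 and 285 in Table 8 (15 levels).

Below is a minimal pure-Python sketch of the one-way F statistic. The function name `one_way_anova` is hypothetical, and the grouping of replicate results by level is an assumption. Only the F ratio is computed here; p values would need an F-distribution routine.

```python
def one_way_anova(groups):
    """One-way ANOVA on a list of groups (e.g. replicate RMSE_p values per level).

    Returns (SS_between, df_between, SS_within, df_within, F).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: group sizes times squared deviations of group means.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Within-group variation: squared deviations from each group's own mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, df_between, ss_within, df_within, f_stat
```

For two toy groups [1, 2, 3] and [4, 5, 6], this gives SS_between = 13.5 with df = 1, SS_within = 4.0 with df = 4, and F = 13.5.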
Table 8. Results of one-way ANOVA using SNR level as the independent variable.
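| Model | Indicator | Source | SS | df | MS | F | p value | F_crit |
|---|---|---|---|---|---|---|---|---|
| RR | RMSE_p | SNR | 23.15 | 14 | 1.65 | 20.07 | 7.2 × 10⁻³⁵ | 1.73 |
|  |  | Error | 23.47 | 285 | 0.082 |  |  |  |
|  |  | Total | 46.62 | 299 |  |  |  |  |
|  | R²_p | SNR | 0.38 | 14 | 0.027 | 23.88 | 3.21 × 10⁻⁴⁰ | 1.73 |
|  |  | Error | 0.32 | 285 | 0.0011 |  |  |  |
|  |  | Total | 0.69 | 299 |  |  |  |  |
| PLSR | RMSE_p | SNR | 22.95 | 14 | 1.64 | 21.60 | 4.49 × 10⁻³⁷ | 1.73 |
|  |  | Error | 21.62 | 285 | 0.076 |  |  |  |
|  |  | Total | 44.57 | 299 |  |  |  |  |
|  | R²_p | SNR | 0.38 | 14 | 0.027 | 24.73 | 2.31 × 10⁻⁴¹ | 1.73 |
|  |  | Error | 0.31 | 285 | 0.0011 |  |  |  |
|  |  | Total | 0.69 | 299 |  |  |  |  |
| SVMR | RMSE_p | SNR | 22.02 | 14 | 1.57 | 16.49 | 2.17 × 10⁻²⁹ | 1.73 |
|  |  | Error | 27.19 | 285 | 0.095 |  |  |  |
|  |  | Total | 49.21 | 299 |  |  |  |  |
|  | R²_p | SNR | 0.36 | 14 | 0.026 | 19.80 | 1.8 × 10⁻³⁴ | 1.73 |
|  |  | Error | 0.37 | 285 | 0.0013 |  |  |  |
|  |  | Total | 0.74 | 299 |  |  |  |  |
| BPNN | RMSE_p | SNR | 24.85 | 14 | 1.77 | 22.69 | 1.35 × 10⁻³⁸ | 1.73 |
|  |  | Error | 22.29 | 285 | 0.078 |  |  |  |
|  |  | Total | 47.14 | 299 |  |  |  |  |
|  | R²_p | SNR | 0.41 | 14 | 0.029 | 26.03 | 4.69 × 10⁻⁴³ | 1.73 |
|  |  | Error | 0.31 | 285 | 0.0011 |  |  |  |
|  |  | Total | 0.72 | 299 |  |  |  |  |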
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yu, B.; Yuan, J.; Yan, C.; Xu, J.; Ma, C.; Dai, H. Impact of Spectral Resolution and Signal-to-Noise Ratio in Vis–NIR Spectrometry on Soil Organic Matter Estimation. Remote Sens. 2023, 15, 4623. https://doi.org/10.3390/rs15184623

AMA Style

Yu B, Yuan J, Yan C, Xu J, Ma C, Dai H. Impact of Spectral Resolution and Signal-to-Noise Ratio in Vis–NIR Spectrometry on Soil Organic Matter Estimation. Remote Sensing. 2023; 15(18):4623. https://doi.org/10.3390/rs15184623

Chicago/Turabian Style

Yu, Bo, Jing Yuan, Changxiang Yan, Jiawei Xu, Chaoran Ma, and Hu Dai. 2023. "Impact of Spectral Resolution and Signal-to-Noise Ratio in Vis–NIR Spectrometry on Soil Organic Matter Estimation" Remote Sensing 15, no. 18: 4623. https://doi.org/10.3390/rs15184623

APA Style

Yu, B., Yuan, J., Yan, C., Xu, J., Ma, C., & Dai, H. (2023). Impact of Spectral Resolution and Signal-to-Noise Ratio in Vis–NIR Spectrometry on Soil Organic Matter Estimation. Remote Sensing, 15(18), 4623. https://doi.org/10.3390/rs15184623
