Soil organic matter (SOM) is the main source of soil nutrients, which are essential for the growth and development of agricultural crops. Hyperspectral remote sensing is one of the most efficient ways of estimating the SOM content. Visible, near-infrared, and mid-infrared reflectance spectroscopy combined with partial least squares regression (PLSR) is considered an effective way of determining soil properties. In this study, we used 54 different spectral pretreatments to preprocess soil spectral data. Each pretreatment combined one of three denoising methods, one of six data transformations, and one of three dimensionality reduction methods (3 × 6 × 3 = 54). The three denoising methods were no denoising (ND), Savitzky–Golay denoising (SGD), and wavelet packet denoising (WPD). The six data transformations were the original spectral data, R; reciprocal, 1/R; logarithmic, log(R); reciprocal logarithmic, log(1/R); first derivative, R′; and first derivative of the reciprocal, (1/R)′. The three dimensionality reduction methods were no dimensionality reduction (NDR), sensitive waveband dimensionality reduction (SWDR), and principal component analysis dimensionality reduction (PCADR). The processed spectra were then used to construct PLSR models for predicting the SOM content. The main results were as follows: (1) The WPD-R′ and WPD-(1/R)′ data showed stronger correlations with the SOM content. Furthermore, these methods could effectively limit the correlation between adjacent bands and thus prevent overfitting. (2) Of the 54 pretreatments investigated, WPD-(1/R)′-PCADR yielded the model with the highest accuracy and stability. (3) For the same denoising method and spectral transformation, the SOM content estimation model based on SWDR was more accurate than the model based on NDR, and the model based on PCADR was more accurate than the model based on SWDR.
(4) Dimensionality reduction was effective in preventing overfitting. (5) Appropriate preprocessing (in this study, the combination of WPD and PCADR) improved the quality of the spectral data and effectively enhanced the accuracy of the SOM content estimation model.
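
To make the structure of the pretreatment grid concrete, the following minimal Python sketch enumerates the 3 × 6 × 3 = 54 combinations and applies the six data transformations to a single reflectance spectrum. It is an illustration only, not the implementation used in the study: the function names, the base-10 logarithm, the forward-difference derivative with unit band spacing, and the sample reflectance values are all assumptions. Denoising, dimensionality reduction, and PLSR fitting are deliberately left out.

```python
import math
from itertools import product

# Abbreviations follow the abstract; all function and variable names are illustrative.
DENOISING = ["ND", "SGD", "WPD"]
TRANSFORMS = ["R", "1/R", "log(R)", "log(1/R)", "R'", "(1/R)'"]
REDUCTIONS = ["NDR", "SWDR", "PCADR"]


def first_derivative(values, step=1.0):
    """Forward finite difference (an assumed stand-in for the study's derivative)."""
    return [(b - a) / step for a, b in zip(values, values[1:])]


def transform(reflectance, name, step=1.0):
    """Apply one of the six spectral transformations to a reflectance spectrum.

    Assumes every reflectance value is strictly positive; base-10 logs are assumed.
    """
    if name == "R":
        return list(reflectance)
    if name == "1/R":
        return [1.0 / r for r in reflectance]
    if name == "log(R)":
        return [math.log10(r) for r in reflectance]
    if name == "log(1/R)":
        return [math.log10(1.0 / r) for r in reflectance]
    if name == "R'":
        return first_derivative(reflectance, step)
    if name == "(1/R)'":
        return first_derivative([1.0 / r for r in reflectance], step)
    raise ValueError(f"unknown transformation: {name}")


def pretreatment_labels():
    """Enumerate every denoising/transformation/reduction combination."""
    return ["-".join(combo) for combo in product(DENOISING, TRANSFORMS, REDUCTIONS)]


labels = pretreatment_labels()
print(len(labels))  # 3 * 6 * 3 = 54 combinations

spectrum = [0.10, 0.20, 0.40, 0.50]  # made-up reflectance values for illustration
print(transform(spectrum, "(1/R)'"))
```

Each label, such as `WPD-(1/R)'-PCADR`, names one pretreatment in the same denoising, transformation, reduction order used in the abstract. Note that the derivative transformations return one fewer value than the input spectrum because of the forward difference; a real pipeline would likely use a smoothing derivative and would feed the resulting spectra into a PLSR model.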
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.