Hyperspectral Modeling of Soil Organic Matter Based on Characteristic Wavelength in East China

Abstract: Soil organic matter (SOM) is a key index of soil fertility. Visible and near-infrared (VNIR, 350–2500 nm) reflectance spectroscopy is an effective method for modeling SOM content. Characteristic wavelength screening and spectral transformation may improve the performance of SOM prediction. This study aimed to explore the optimal combination of characteristic wavelength selection and spectral transformation for hyperspectral modeling of SOM. Four types of VNIR spectral data were used: original reflectance (R), inverse-log reflectance (LR), continuum removal (CR) and first-order derivative reflectance (FDR). Characteristic wavelengths were selected using the competitive adaptive reweighted sampling (CARS) and uninformative variables elimination (UVE) algorithms. The partial least squares regression (PLSR), random forest (RF) and support vector regression (SVR) methods were then applied to the full spectra and to the selected wavelengths to build SOM prediction models for two soil types. The main results were as follows. (1) The CARS algorithm screened 40–125 characteristic wavelengths from the full spectra. The UVE algorithm screened 105–884 characteristic wavelengths. (2) For both soil types, CARS and UVE improved the SOM modeling precision of the PLSR and SVR methods relative to the full spectra. The coefficient of determination (R²) in the validation of the CARS-PLSR (PLSR model combined with CARS) and CARS-SVR (SVR model combined with CARS) models ranged from 0.69 to 0.95, and the relative percent deviation (RPD) ranged from 1.74 to 4.31. Lin's concordance correlation coefficient (LCCC) values ranged from 0.83 to 0.97. The UVE-PLSR and UVE-SVR models showed moderate precision. (3) The PLSR and SVR modeling accuracies for Paddy soil were better than those for Shajiang black soil. RF models performed worse for both soil types, with R² values of validation ranging from 0.22 to 0.68 and RPD values ranging from 1.01 to 1.60. (4) For Paddy soil, the optimal SOM prediction models (highest R² and RPD, lowest root mean square error (RMSE)) were a CARS-PLSR model (R² and RMSE: 0.97 and 1.21 g/kg in calibration sets, 0.95 and 1.72 g/kg in validation sets, RPD: 4.31) and CR-CARS-SVR (R² and RMSE: 0.98 and 1.04 g/kg in calibration sets, 0.91 and 2.24 g/kg in validation sets, RPD: 3.37). For Shajiang black soil, the optimal SOM prediction models were LR-CARS-PLSR (R² and RMSE: 0.95 and 0.93 g/kg in calibration sets, 0.86 and 1.44 g/kg in validation sets, RPD: 2.62) and FDR-CARS-SVR (R² and RMSE: 0.99 and 0.45 g/kg in calibration sets, 0.83 and 1.58 g/kg in validation sets, RPD: 2.38). The results suggested that the CARS algorithm combined with CR and FDR can significantly improve the modeling accuracy of SOM content.

Due to differences in the soil forming environment, there are different hyperspectral characteristics among different geographical regions and soil types. Many studies have been conducted in different geographical regions and soil types [10,19-21,26]. SOM prediction models with spectral variables for grouping soil samples are more accurate than global modeling methods. A local PLSR model based on the spatial constraints proposed by Shi et al. [19] predicted the SOM content more accurately using a soil spectral library in China. Bao et al. [26] improved the SOM prediction accuracy by applying an optimal soil grouping strategy.
Soil hyperspectral data are composed of various wavelengths with different correlation degrees among them and some information redundancy. Characteristic wavelength screening aims to eliminate the uninformative variables while selecting characteristic variables from hyperspectral data using algorithms and various criteria. After characteristic wavelength screening, the number of spectral wavelengths is compressed significantly, which reduces the dimensionality of the variables and the complexity of the models in the modeling process. Sophisticated methods include the competitive adaptive reweighted sampling (CARS) algorithm, the uninformative variables elimination (UVE) algorithm, the successive projections algorithm (SPA), uniform-interval wavelength reduction, the genetic algorithm, and particle swarm optimization [26-31]. Moreover, combinations of multiple algorithms, such as UVE-SPA, CARS-SPA and Monte Carlo-based UVE, have been used to optimize selected wavelengths [29,32]. Some studies reported that CARS could compress the number of original spectral wavelengths to lower than 16% [26,32,33].
PLSR modeling precision based on selected characteristic wavelengths is usually higher than that based on the full spectra [26-28,31]; alternatively, the dimensionality of the spectral data can be significantly reduced while assuring modeling precision [34]. The CARS and UVE algorithms optimize wavelength selection based on the PLSR model. The UVE algorithm selects characteristic wavelengths based on stability analyses of the PLSR regression coefficient [30]. The CARS algorithm selects characteristic wavelengths with high absolute regression coefficient values in the PLSR model [29]. Both algorithms were shown to be effective ways to reduce the number of inputs and improve the PLSR modeling accuracy of SOM [26,27,32,33]. However, it is rarely reported whether the CARS and UVE algorithms can improve the accuracy of machine learning methods, such as random forest (RF), support vector regression (SVR) and others.
Spectral transformation, such as inverse-log reflectance (LR), continuum removal (CR), first-order derivative reflectance (FDR) and fractional order derivative, might increase the precision of SOM prediction models by enhancing the absorption or reflection characteristics of the soils in some wavelengths [11,23,35-37]. For example, Nawar et al. [35] and Dotto et al. [23] reported that CR and FDR transformation had a strong positive influence on the performance of most SOM prediction models. FDR transformation showed better model performance than the second derivative transformation for SOM estimations in several modeling methods [36]. Some research has also explored the prediction effect of SOM content using characteristic wavelength screening combined with different spectral transformation techniques [38,39]. As shown above, characteristic wavelength screening, spectral transformation and combinations of the two approaches have been widely applied to improve the accuracy of SOM spectral modeling.
Paddy soil and Shajiang black soil, the two main types of cultivated soil in East China, were selected as the study objects in this research. After different spectral transformations of the VNIR hyperspectral data of the two soil types, characteristic wavelength datasets were selected using the CARS and UVE algorithms. Then, PLSR, SVR and RF were used to establish SOM prediction models. The objectives of this research were to: (1) analyze the influence of the CARS and UVE algorithms on the accuracy of the PLSR, SVR and RF models, (2) compare the improvements in modeling accuracy achieved by characteristic wavelength screening and by spectral transformation, and (3) assess the modeling performance of the PLSR, SVR and RF models and establish an optimal SOM prediction model for Paddy soil and Shajiang black soil in East China.

Study Area
Study area 1 is located in the central plains of Jiangsu Province (119°53′37″-120°14′4″ E, 32°20′17″-32°44′50″ N) in eastern China, covering an area of 1050 km² (Figure 1). The annual average temperature, precipitation and elevation are 14.5 °C, 991.7 mm and 5-10 m, respectively. Parent materials mainly include lagoonal facies sediments. Paddy soil dominates. Paddy fields dominate the land use type, and the rice-rape rotation is the main crop rotation system.

Study area 2 is dominated by Shajiang black soil (Figure 1). The annual average temperature and precipitation are 14.8 °C and 821.5 mm, respectively. The elevation is 20-30 m, decreasing from the northwest to the southeast. Upland dominates the land use type, and wheat-soybean rotation is the main crop rotation system.

Soil Sampling and Analysis
In study area 1, 111 Paddy soil samples were collected from the surface layer (0-20 cm) in November 2009 ( Figure 1). In study area 2, 108 Shajiang black soil samples were collected from the surface layer in June 2016 ( Figure 1). In each field, 8-12 soil samples were collected within a radius of 10-20 m from the field center. The collected soil samples were mixed and 1 kg was retained using the quartation method. After soil samples were air-dried and ground in the lab, a part of each sample was sieved using a 0.2-mm soil sieve and used to measure SOM content. The SOM content was determined using the potassium dichromate method, which is the same as the wet oxidation method [40].

Soil Spectrum Collection and Preprocessing
After air-drying, grinding and sieving (<2 mm), the diffuse reflectance spectra of the soil samples were measured using an ASD FieldSpec 4 portable spectroradiometer (Analytical Spectral Devices Inc., Boulder, CO, USA). The wavelength range was VNIR (350-2500 nm) and the resampling interval was 1 nm. The entire operation was performed in a dark laboratory with controlled lighting conditions; the light source was a halogen lamp. The soil samples were placed in containers with a diameter of 10 cm and a depth of 1.5 cm, and the surface of the soil sample was scraped flat. The sensor probe was located 15 cm above the surface of the soil sample, with a probe view angle of 25°. A white panel with 99% reflectance was used to calibrate the spectrometer before measuring. Each sample was rotated four times, and 10 scans were performed from each direction. Hence, 40 scanning spectral curves were collected for each sample and their mean was used as the spectrum of the soil sample [41].
The Savitzky-Golay (SG) filter with a moving window of 11 nm and a second-order local polynomial was used to smooth the reflectance curves. LR, CR and FDR were applied to transform the original reflectance (R) to strengthen the relationship between the SOM and the spectra. Finally, each soil sample yielded 2141 wavelengths for each type of spectral data in the VNIR (355-2495 nm) domain. Spectral data processing was performed using the "prospectr" package [42] in the R software.
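The three transformations can be illustrated with a minimal Python sketch. It is not the authors' "prospectr" workflow: the function names are hypothetical, LR is assumed to be log10(1/R), FDR is approximated by central differences, CR is computed as reflectance divided by its upper convex hull, and SG smoothing is omitted.

```python
import math


def inverse_log(reflectance):
    """LR: log10(1/R) for each band (base 10 is an assumption)."""
    return [math.log10(1.0 / r) for r in reflectance]


def first_derivative(wavelengths, reflectance):
    """FDR by central differences; the two end bands use one-sided differences."""
    n = len(reflectance)
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        out.append((reflectance[hi] - reflectance[lo])
                   / (wavelengths[hi] - wavelengths[lo]))
    return out


def continuum_removed(wavelengths, reflectance):
    """CR: reflectance divided by its upper convex hull (the continuum)."""
    hull = []  # indices of the hull vertices, from left to right
    for i in range(len(wavelengths)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = ((wavelengths[b] - wavelengths[a]) * (reflectance[i] - reflectance[a])
                     - (reflectance[b] - reflectance[a]) * (wavelengths[i] - wavelengths[a]))
            if cross >= 0:   # b lies on or below the chord a -> i, so drop it
                hull.pop()
            else:
                break
        hull.append(i)

    out = [0.0] * len(wavelengths)
    for a, b in zip(hull, hull[1:]):
        for i in range(a, b + 1):
            # Linear interpolation of the continuum between two hull vertices.
            t = (wavelengths[i] - wavelengths[a]) / (wavelengths[b] - wavelengths[a])
            continuum = reflectance[a] + t * (reflectance[b] - reflectance[a])
            out[i] = reflectance[i] / continuum
    return out
```

With this definition, CR values equal 1 at the hull vertices and fall below 1 inside absorption features, which is why absorption valleys become more visible after the transformation.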

Characteristic Wavelength Screening Algorithms
The CARS algorithm selects characteristic wavelengths by choosing variables with high absolute regression coefficient values in the PLSR model. It consists of three major steps, i.e., Monte Carlo sampling, PLSR modeling and the acquisition of variable weights. The algorithm enforces wavelength selection through an exponentially decreasing function and makes competitive wavelength selections using the adaptive reweighted sampling technique. The detailed process of the CARS algorithm is described in [29]. The UVE algorithm is a variable selection approach based on stability analysis of the PLSR regression coefficient [30]. This algorithm eliminates uninformative variables that have relatively small covariance with dependent variables but high variances. The detailed process of the UVE algorithm is described in [43,44]. The CARS and UVE algorithms were run in MATLAB R2012a.
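Two ingredients of these algorithms can be sketched in Python under stated assumptions. The first function follows the commonly cited form of the exponentially decreasing function in CARS, r_i = a·exp(−k·i), which keeps all p variables after the first run and two after the last. The second computes a UVE-style stability value (mean divided by standard deviation of a coefficient across resampled PLSR models). Function names and the number of runs are hypothetical, the PLSR fitting itself is not shown, and the noise-variable threshold of UVE is omitted.

```python
import math
import statistics


def cars_retention_ratios(n_variables, n_runs):
    """Fraction of wavelengths retained after each Monte Carlo run.

    Uses r_i = a * exp(-k * i) with a and k chosen so that r_1 = 1
    and r_N = 2 / n_variables.
    """
    k = math.log(n_variables / 2) / (n_runs - 1)
    a = (n_variables / 2) ** (1 / (n_runs - 1))
    return [a * math.exp(-k * i) for i in range(1, n_runs + 1)]


def uve_stability(coefficients_per_run):
    """Stability of each variable across resampled models.

    coefficients_per_run holds one list of regression coefficients per
    resampled model; the result is mean / sample standard deviation
    for each variable.
    """
    per_variable = zip(*coefficients_per_run)
    return [statistics.mean(c) / statistics.stdev(c) for c in per_variable]


# Example: 2141 wavelengths and an assumed 50 Monte Carlo runs.
ratios = cars_retention_ratios(2141, 50)
retained = [round(r * 2141) for r in ratios]
```

Variables with low absolute stability are the candidates that UVE removes, whereas CARS ranks variables by absolute coefficient size and shrinks the retained set according to the schedule above.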

SOM Spectral Modeling
Sustainability 2022, 14, 8455
A total of 111 Paddy soil samples were divided into a calibration set (74, 2/3) and a validation set (37, 1/3) using the Kennard-Stone method. For Shajiang black soil, 72 soil samples were selected as the calibration set, and 36 were used as the validation set. Figure 2 presents a flowchart of the process. Firstly, characteristic wavelengths were screened from the R, LR, CR and FDR spectral data using the CARS and UVE algorithms. Secondly, for each type of spectral data, SOM models were established with the PLSR, SVR and RF methods using the characteristic wavelengths and the full spectra. Finally, the performance of the models was compared and the optimal model for each soil type was selected.
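The Kennard-Stone split can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, Euclidean distance on the spectral values is assumed, and the full distance matrix is built in memory, which is acceptable for about a hundred samples.

```python
import math


def kennard_stone(samples, n_calibration):
    """Return the indices of the calibration samples chosen by Kennard-Stone."""
    n = len(samples)
    dist = [[math.dist(samples[i], samples[j]) for j in range(n)] for i in range(n)]

    # Start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = set(range(n)) - set(selected)

    while len(selected) < n_calibration:
        # Add the sample whose nearest already-selected neighbour is farthest away.
        nxt = max(sorted(remaining), key=lambda i: min(dist[i][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Toy example with one-dimensional "spectra".
toy = [[0.0], [1.0], [2.0], [10.0], [5.0]]
calibration = kennard_stone(toy, 3)
validation = [i for i in range(len(toy)) if i not in calibration]
```

For the Paddy soil data this would be called with the 111 spectra and `n_calibration=74`; the 37 samples not selected form the validation set.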

In RF modeling, the two main parameters are the number of trees grown in the forest (ntree) and the number of randomly selected predictor variables at each node (mtry). In SVR modeling, the linear kernel function was used to build the model and the main parameter was the penalty coefficient (C). The parameters mtry and ntree were set to 1-5 and 100-2000 for RF modeling, respectively, and the C range was set to 2⁻⁴ to 2⁴ for SVR modeling. RF and SVR modeling were performed using the "e1071" package [45] in the R software, with parameters tuned by grid search and 10-fold cross-validation. PLSR modeling was performed using the "pls" package [46] in the R software.
Statistical analyses were performed using the "stats" package [47] in the R software (R Core Team, R version 4.2.0, https://www.r-project.org/).
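The grid search with 10-fold cross-validation described above can be sketched in Python. This is an illustration only and does not reproduce the "e1071" tuning functions: the grid steps (integer exponents for C, steps of 100 trees) are assumptions because the text gives only the ranges, and `cv_error` is a hypothetical user-supplied callback that fits a model on the training indices and returns its error on the held-out fold.

```python
import random


def svr_cost_grid(min_exp=-4, max_exp=4):
    """Candidate penalty coefficients C = 2^-4 ... 2^4 (integer exponents assumed)."""
    return [2.0 ** e for e in range(min_exp, max_exp + 1)]


def rf_grid(mtry_values=range(1, 6), ntree_values=range(100, 2001, 100)):
    """All (mtry, ntree) pairs; the step of 100 trees is an assumption."""
    return [(m, t) for m in mtry_values for t in ntree_values]


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def grid_search(candidates, folds, cv_error):
    """Return the candidate with the lowest mean cross-validated error."""
    def mean_error(candidate):
        total = 0.0
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            total += cv_error(candidate, train_idx, test_idx)
        return total / len(folds)

    return min(candidates, key=mean_error)


# Example with a dummy error function that prefers C close to 1.
folds = k_fold_indices(74, k=10)
best_c = grid_search(svr_cost_grid(), folds, lambda c, train, test: abs(c - 1.0))
```

In practice the callback would fit an SVR or RF model on the calibration samples in `train` and return, for example, the RMSE on `test`.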

Model Evaluation
The coefficient of determination (R²), root mean square error (RMSE), relative percent deviation (RPD) and Lin's concordance correlation coefficient (LCCC) were chosen as the evaluation indexes. The closer R² is to 1 and the smaller the RMSE, the better the stability and the higher the prediction precision of the model. If RPD is below 1.5, the models have poor estimation abilities. With RPD in the range of 1.5 to 1.8, the estimation precision of the models is acceptable but leaves room for improvement. For RPD in the range of 1.8 to 2, the prediction is considered to be good. When RPD is higher than 2, the models achieve a high level of precision. LCCC represents the distribution and aggregation degree of the predicted and observed values near the 1:1 line; the larger the value, the better.
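The evaluation indexes were calculated as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(O_i - P_i)^2}{\sum_{i=1}^{n}(O_i - \bar{O})^2}$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(O_i - P_i)^2}$$

$$RPD = \frac{s_o}{RMSE}$$

$$LCCC = \frac{2\, r\, s_o\, s_p}{s_o^2 + s_p^2 + (\bar{O} - \bar{P})^2}$$

where $O_i$ and $P_i$ are the observed and predicted values, respectively; $\bar{O}$ and $\bar{P}$ are the mean values of the observed and predicted values, respectively; $s_o$ and $s_p$ are the corresponding standard deviations; $r$ is the correlation coefficient between the observed and predicted values; and $n$ is the number of observations.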
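As an illustration, the four indexes can be computed with a short Python sketch. The function names are hypothetical and several conventions are assumptions: R² is taken as one minus the ratio of residual to total sum of squares, RPD uses the sample standard deviation of the observed values, LCCC uses population (1/n) variances and covariance, and the handling of values exactly on an RPD class boundary is a choice made here.

```python
import math


def evaluate_predictions(observed, predicted):
    """Return R2, RMSE, RPD and LCCC for paired observed/predicted values."""
    n = len(observed)
    o_mean = sum(observed) / n
    p_mean = sum(predicted) / n

    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - o_mean) ** 2 for o in observed)
    rmse = math.sqrt(sse / n)

    # Sample standard deviation of the observed values (n - 1 in the denominator).
    sd_observed = math.sqrt(sst / (n - 1))

    # Population variances and covariance for Lin's concordance coefficient.
    var_o = sst / n
    var_p = sum((p - p_mean) ** 2 for p in predicted) / n
    cov = sum((o - o_mean) * (p - p_mean) for o, p in zip(observed, predicted)) / n

    return {
        "r2": 1.0 - sse / sst,
        "rmse": rmse,
        "rpd": sd_observed / rmse,
        "lccc": 2.0 * cov / (var_o + var_p + (o_mean - p_mean) ** 2),
    }


def rpd_class(rpd):
    """Map an RPD value to the qualitative classes used in the text."""
    if rpd < 1.5:
        return "poor"
    if rpd < 1.8:
        return "acceptable"
    if rpd <= 2.0:
        return "good"
    return "high"


metrics = evaluate_predictions([10, 20, 30, 40], [12, 18, 33, 39])
```

The last line gives a small worked example; `rpd_class(metrics["rpd"])` then maps the result to one of the four qualitative classes.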

Characteristic of Soil Spectral Curves
The SOM of Paddy soil samples was relatively high, averaging 32.13 ± 7.21 g/kg (Table 1), while that of Shajiang black soil was relatively low, averaging 21.60 ± 3.94 g/kg. The coefficients of variation (CV) of SOM in Paddy soil and Shajiang black soil were 22.44% and 18.24%, respectively, showing moderate variation. The CV of Paddy soil was relatively high.
The SOM content was divided into seven levels: <15 g/kg, 15-20 g/kg, 20-25 g/kg, 25-30 g/kg, 30-35 g/kg, 35-40 g/kg and >40 g/kg [39]. The mean spectral reflectance curves corresponding to the seven SOM content levels were calculated (Figure 3a). With increasing SOM content, the spectral reflectance of the soil decreased over the full spectral range, except for the spectral curve of SOM below 15 g/kg. With increasing wavelength, the reflectance in the visible spectrum increased quickly. In the NIR region, the reflectance of the soils increased steadily (Figure 3a). The mean spectral reflectance of soil samples with SOM < 15 g/kg was smaller than that of the samples with SOM from 15 to 25 g/kg.
The absorption characteristics were not apparent in the original spectral curves; however, after CR transformation, they were visibly strengthened and the depth of the absorption valleys increased (Figure 3b). In addition to the prominent absorption valleys near 1400 nm, 1900 nm and 2200 nm, evident absorption features were also detected near 500 nm, 650 nm and 850 nm. The absorption characteristics near 650 nm were generally strengthened with an increase in SOM content.
The spectral reflectance curves of the two soil types at the minimum, 25%, 50%, 75% and maximum of SOM content were used to analyze the spectral characteristics (Figure 4). The spectral reflectance curves of Paddy soil and Shajiang black soil gradually became flat with increasing SOM content, indicating a negative correlation between the spectral reflectance and SOM content (Figure 4). For Shajiang black soil, the two spectral curves at SOM contents of 18.91 g/kg and 21.82 g/kg were extremely similar and overlapped (Figure 4b). A similar phenomenon was observed for the two spectral curves at SOM contents of 23.89 g/kg and 31.30 g/kg (Figure 4b). The spectral features showed no significant differences with the change in SOM content.
SOM content showed significantly negative correlations with the R spectra over the full spectral range; however, it showed significantly positive correlations with the LR spectra (Figure 5). For Paddy soil, the correlations between SOM content and spectra were stronger. The correlations in the 400-900 nm wavelengths were significantly stronger than those in the other wavelengths and the absolute values of the correlation coefficients were higher than 0.6 (Figure 5a). For Shajiang black soil, slightly weaker correlations with the SOM content were observed, without great differences in correlation among the different wavelengths. The absolute value of the correlation coefficient was between 0.30 and 0.48.
SOM content presented significant positive or negative correlations with the CR and FDR spectra at 400-750 nm, 1400-1700 nm and 2200-2400 nm, and the absolute values of the correlation coefficients were lower than those of the R and LR spectra (Figure 5).
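The band-by-band correlation analysis behind Figure 5 can be sketched in Python as follows. The function names are hypothetical, Pearson's correlation coefficient is assumed, and significance testing is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_spectrum(som, spectra):
    """Correlation between SOM content and each wavelength.

    spectra holds one list of band values per sample, in the same
    order as the SOM values.
    """
    return [pearson(som, band) for band in zip(*spectra)]


# Toy example: three samples, two bands.
som = [10.0, 20.0, 30.0]
spectra = [[0.5, 1.0], [0.4, 2.0], [0.3, 3.0]]
r_by_band = correlation_spectrum(som, spectra)
```

In the toy data the first band decreases and the second increases with SOM content, so the two coefficients have opposite signs, analogous to the opposite signs reported for the R and LR spectra.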

Results of Characteristic Wavelength Screening
For the R spectra of Paddy soil, the screening results based on the UVE algorithm are shown in Figure 6. A total of 815 wavelengths were screened from 2141 wavelengths, accounting for 38.07% of the total number of spectral wavelengths. Of these, 551 wavelengths were distributed at 1223-1550 nm, 1929-2100 nm and 2233-2485 nm, whereas the 264 visible wavelengths were distributed at 355-432 nm, 506-610 nm and 637-718 nm.
For the R spectra of Paddy soil, the screening results based on the CARS algorithm are shown in Figure 7. The number of retained wavelengths decreased continuously, eventually reaching zero, as the number of Monte Carlo sampling runs increased (Figure 7a). According to the trend graph of the RMSE of cross-validation (RMSECV) (Figure 7b), the RMSECV decreased, and the modeling precision therefore increased, as the number of sampling runs increased from 1 to 27, due to the deletion of wavelengths that were poorly correlated with SOM. At the 27th sampling run, RMSECV reached a minimum; therefore, the spectral variable subset selected at that run was optimal. A total of 61 wavelengths screened by the CARS algorithm were mainly distributed within 1990-2490 nm, accounting for 2.85% of the total number of wavelengths.
The screened characteristic wavelengths of the two soil types are shown in Figure 8. The number of characteristic wavelengths selected by the UVE algorithm was higher than that selected by the CARS algorithm; this was related to the principles of the algorithms. The UVE algorithm screened 105-884 characteristic wavelengths for the two soil types. The CARS algorithm compressed the characteristic wavelengths of the two soil types to lower than 6% of the full spectral wavelengths and reduced the complexity of the SOM spectral modeling. For the R, LR, CR and FDR spectra, the CARS algorithm screened 61-125 characteristic wavelengths from all 2141 wavelengths for Paddy soil and 40-61 for Shajiang black soil.

PLSR Modeling Based on Characteristic Wavelengths
The PLSR models of SOM were established using the selected wavelengths and the full spectral wavelengths (Table 2). The validation results of the SOM PLSR models are shown in Figure 9 (Paddy soil) and Figure 10 (Shajiang black soil). For different soil types and spectral transformation data, the accuracy of the SOM models using the selected wavelengths was improved to different extents compared to the models using the full spectral wavelengths. The accuracy of the PLSR models combined with the CARS algorithm (CARS-PLSR) was higher than that of the PLSR models combined with the UVE algorithm (UVE-PLSR). CARS-PLSR models had the highest accuracies, with R²p, RPD and LCCC values higher than 0.80, 2.0 and 0.90, indicating that the SOM content could be accurately predicted. The PLSR modeling accuracy of Paddy soil was better than that of Shajiang black soil.

Note: R, LR, CR, and FDR stand for different spectral data. F stands for full spectral wavelengths; UVE stands for wavelengths selected by the UVE algorithm; CARS stands for wavelengths selected by the CARS algorithm. Model R-F-PLSR stands for the PLSR model using reflectance at all spectral wavelengths; R-UVE-PLSR stands for the PLSR model using reflectance at the wavelengths selected by the UVE algorithm.
Figure 9. Scatter plots of measured and predicted SOM of Paddy soil.
With the same soil type and wavelength screening algorithm, the accuracy of the PLSR models after spectral transformation (LR, CR, and FDR) was improved compared with the original reflectance (R). For example, the CARS-PLSR models of Paddy soil (models LR-CARS-PLSR, CR-CARS-PLSR and FDR-CARS-PLSR), with R²p and RPD values greater than 0.90 and 3 and RMSEp lower than 2.43 g/kg, outperformed model R-CARS-PLSR (with an R²p value of 0.87, an RPD value of 2.81 and an RMSEp value of 2.68 g/kg). LCCC values also increased. For Shajiang black soil, the accuracy of LR-CARS-PLSR and FDR-CARS-PLSR was slightly better than that of R-CARS-PLSR. Overall, this study showed that LR and FDR transformation improved the modeling accuracy, which was consistent with other research results [23,35,36,38].
The improvement of model accuracy by characteristic wavelength screening was superior to that of spectral transformation. For example, in Paddy soil, the R²p values of the PLSR models using full transformed spectra (models LR-F-PLSR, CR-F-PLSR and FDR-F-PLSR) ranged from 0.62 to 0.80, and the RPD values ranged from 1.62 to 2.24. Compared with R-F-PLSR (R²p and RPD values of 0.76 and 2.06), the accuracies of LR-F-PLSR, CR-F-PLSR and FDR-F-PLSR were slightly improved (Table 2). In Paddy soil, the R²p and RPD values of the UVE-PLSR (LR-UVE-PLSR, CR-UVE-PLSR and FDR-UVE-PLSR) and CARS-PLSR models (LR-CARS-PLSR, CR-CARS-PLSR and FDR-CARS-PLSR) ranged from 0.81 to 0.95 and from 2.29 to 4.31, respectively, and the accuracy of these models was significantly higher than that of the full-spectrum models above.
For both soil types, the predictive accuracy for samples with SOM content lower than 20 g/kg was improved significantly in the CARS-PLSR models. The predicted and measured values were concentrated around the 1:1 line (Figures 9i-l and 10i-l). In this study area, the average SOM content of the Shajiang black soil was 21.60 ± 3.94 g/kg, and 34% of the samples had an SOM content lower than 20 g/kg (Table 1). For models using full spectral data of Shajiang black soil, the predicted and measured values were relatively dispersed near the 1:1 line, regardless of whether the SOM content was lower or higher than 20 g/kg (Figure 10a-d). The corresponding LCCC values were between 0.54 and 0.74 and the RPD values ranged from 1.17 to 1.59, indicating the poor predictive ability of the models. The accuracy of the CARS-PLSR models was significantly improved. The corresponding LCCC values were between 0.84 and 0.92 and the predicted and measured values were uniformly distributed near the 1:1 line (Figure 10i-l). The RPD values were all higher than 2, indicating the high accuracy and superior predictive capabilities of the models.
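For readers who want to reproduce the four validation statistics quoted throughout this section, the sketch below computes them from paired measured and predicted values. It rests on assumptions the text does not state: R 2 is taken as 1 minus the ratio of residual to total sum of squares (some studies use the squared correlation instead), RPD is the sample standard deviation of the measured values divided by RMSE, and LCCC uses population moments as in Lin's original definition. The function name is hypothetical.

```python
import numpy as np

def validation_metrics(measured, predicted):
    """Return R2, RMSE, RPD and Lin's CCC for one validation set."""
    y = np.asarray(measured, dtype=float)
    yhat = np.asarray(predicted, dtype=float)
    resid = y - yhat
    rmse = np.sqrt(np.mean(resid ** 2))
    # Assumed R2 definition: 1 - SS_res / SS_tot
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    # Assumed RPD definition: sample SD of measured values over RMSE
    rpd = np.std(y, ddof=1) / rmse
    # Lin's concordance: penalizes both scatter and bias from the 1:1 line
    cov = np.mean((y - y.mean()) * (yhat - yhat.mean()))
    lccc = 2.0 * cov / (y.var() + yhat.var() + (y.mean() - yhat.mean()) ** 2)
    return {"R2": r2, "RMSE": rmse, "RPD": rpd, "LCCC": lccc}

# Illustrative values only (g/kg), not data from the study
stats = validation_metrics([10, 20, 30, 40], [12, 18, 32, 38])
```

Unlike R 2, LCCC drops when predictions are systematically offset from the 1:1 line, which is why the two statistics are reported side by side.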
According to previous studies, soil spectral characteristics are not recommended for modeling and prediction when SOM is lower than 20 g/kg [48,49]. This might be because, at low SOM contents, the spectral reflectance of the soil is dominated by other factors [48,50]. This study established optimal SOM prediction models for Shajiang black soil after wavelength screening with the CARS algorithm, suggesting that the CARS algorithm is an effective means to improve prediction precision for soils with low SOM contents.

SVR and RF Modeling Based on Characteristic Wavelengths
The SVR and RF models of SOM were established based on the characteristic wavelengths and the full spectral data (Tables 3 and 4). For the different soil types and spectral transformation data, the results of SVR modeling were superior to those of RF modeling. On the whole, the accuracy of the SVR models was similar to that of the PLSR models, while the accuracy of the RF models was notably worse than that of the PLSR models. The RF models performed poorly for both soil types, with R 2 values of validation ranging from 0.22 to 0.68 and RPD values ranging from 1.01 to 1.60. The SVR and RF modeling accuracies of Paddy soil were better than those of Shajiang black soil. The accuracies of the SVR models combined with the CARS algorithm (CARS-SVR) and the UVE algorithm (UVE-SVR) were higher than those of the SVR models using full spectral data, and the CARS-SVR models were the best. In Paddy soil, the R 2 p values of the CARS-SVR models ranged from 0.88 to 0.92, the RPD values ranged from 2.70 to 3.37 and the LCCC values ranged from 0.92 to 0.96 (Table 3). The R 2 p values of the SVR models using full spectral data ranged from 0.75 to 0.83, the RPD values ranged from 1.87 to 2.36 and the LCCC values ranged from 0.85 to 0.90. These results indicated that combining SVR with either the UVE or the CARS algorithm could significantly improve the accuracy of SVR modeling.
Relative to the R spectra of Paddy soil, the LR, CR and FDR transformations improved the SVR and RF modeling accuracy moderately. For example, the CARS-SVR models of Paddy soil (models LR-CARS-SVR, CR-CARS-SVR and FDR-CARS-SVR), with R 2 p and RPD values greater than 0.90 and 3 and RMSE p lower than 2.37 g/kg, outperformed R-CARS-SVR (with an R 2 p value of 0.88, an RPD value of 2.70 and an RMSE p value of 2.85 g/kg). LCCC values increased moderately. Additionally, the UVE-SVR models of Paddy soil (models LR-UVE-SVR, CR-UVE-SVR and FDR-UVE-SVR) also moderately outperformed R-UVE-SVR. In Shajiang black soil, spectral transformation did not improve the SVR and RF modeling accuracy.
The results showed that the improvement of SVR and PLSR modeling accuracy by characteristic wavelength screening was superior to that of spectral transformation. CARS-PLSR and CARS-SVR using CR spectra produced the best predictions (highest R 2 and RPD, lowest RMSE) for SOM modeling of Paddy soil. CARS-PLSR and CARS-SVR using LR and FDR spectra were optimal for Shajiang black soil.

Discussion
This study selected effective spectral wavelengths using the CARS and UVE algorithms. The two algorithms decreased the number of input variables for modeling and increased the modeling accuracy and robustness. In this study, the CARS algorithm reduced the number of wavelengths from the original 2141 to 40-125 for R, LR, CR and FDR, and the UVE algorithm selected 257-884 wavelengths from the full spectral data of the two soil types. The screened wavelengths were mainly distributed in the ranges of 400-900 nm, 1400-1700 nm and 2000-2400 nm, which was consistent with the conclusions of Bao et al. [26]. This further supports the importance of eliminating uninformative variables from full spectral data during spectral modeling [26-28,32,33].
After combining with the CARS algorithm, the modeling precision was remarkably improved compared to models combined with the UVE algorithm. This was consistent with the studies reported by Vohland et al., Yu et al. and Tang et al. [27,32,33]. The CARS algorithm was superior to the UVE algorithm during SOM spectral modeling, which was mainly related to the different principles of the two algorithms. The CARS algorithm selects variables with relatively high absolute regression coefficient values in the PLSR model based on adaptive reweighted sampling technology and eliminates wavelengths with small weights [29]. The UVE algorithm selects variables based on the stability of the PLSR coefficients. It can avoid model overfitting and increase the prediction abilities of the models [43,44]. This approach differs from previous wavelength selection methods (i.e., selection according to prior knowledge or the correlation with SOM). For example, in the R spectral data of Paddy soil, a total of 61 wavelengths selected by the CARS algorithm were mainly distributed within 1990-2495 nm, with the absolute value of the correlation coefficient between reflectance and SOM content being lower than 0.46.
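The CARS principle described above (Monte Carlo sampling, weighting wavelengths by absolute PLSR coefficient, forced reduction, adaptive reweighted sampling, and choosing the subset with the lowest cross-validated error) can be illustrated with a simplified NumPy sketch. It is not the implementation used in this study: the PLS1 routine, the function names, the 80% sampling fraction, the fold count and the default number of runs and components are all assumptions chosen for brevity.

```python
import numpy as np

def pls1_coef(X, y, n_comp):
    """Regression coefficients of a PLS1 model (NIPALS) on centered data."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    W, P, q = [], [], []
    for _ in range(min(n_comp, X.shape[1], X.shape[0] - 1)):
        w = Xc.T @ yc
        norm = np.linalg.norm(w)
        if norm < 1e-12:          # nothing left to explain
            break
        w = w / norm
        t = Xc @ w                # scores
        tt = t @ t
        p_load = Xc.T @ t / tt    # X loadings
        q_k = yc @ t / tt         # y loading
        Xc = Xc - np.outer(t, p_load)
        yc = yc - q_k * t
        W.append(w); P.append(p_load); q.append(q_k)
    if not W:
        return np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

def rmsecv(X, y, n_comp, folds):
    """Root mean square error of k-fold cross-validation for PLS1."""
    sq_err = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        coef = pls1_coef(X[train], y[train], n_comp)
        pred = (X[test] - X[train].mean(axis=0)) @ coef + y[train].mean()
        sq_err += np.sum((y[test] - pred) ** 2)
    return np.sqrt(sq_err / len(y))

def cars(X, y, n_iter=50, n_comp=5, n_folds=5, seed=0):
    """Simplified CARS: returns (sorted selected column indices, RMSECV)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    folds = np.array_split(rng.permutation(n), n_folds)
    # Exponentially decreasing function: kept ratio falls from 1 to 2/p
    a = (p / 2.0) ** (1.0 / (n_iter - 1))
    k = np.log(p / 2.0) / (n_iter - 1)
    kept = np.arange(p)
    best_rmse, best_subset = np.inf, kept
    for i in range(1, n_iter + 1):
        # Monte Carlo sampling: fit PLS on a random 80% of the samples
        rows = rng.choice(n, size=int(0.8 * n), replace=False)
        coef = pls1_coef(X[np.ix_(rows, kept)], y[rows], n_comp)
        total = np.abs(coef).sum()
        if total == 0:
            break
        weights = np.abs(coef) / total
        # Forced reduction: keep only the largest-weight variables
        n_keep = min(len(kept), max(2, int(round(a * np.exp(-k * i) * p))))
        order = np.argsort(weights)[::-1][:n_keep]
        # Adaptive reweighted sampling: draw with replacement by weight
        prob = weights[order] / weights[order].sum()
        drawn = rng.choice(order, size=n_keep, replace=True, p=prob)
        kept = kept[np.unique(drawn)]
        rmse = rmsecv(X[:, kept], y, n_comp, folds)
        if rmse < best_rmse:
            best_rmse, best_subset = rmse, kept
    return np.sort(best_subset), best_rmse
```

Called as `selected, rmse = cars(spectra, som)` on a samples-by-wavelengths matrix, the sketch returns column indices rather than wavelengths in nm. Because a wavelength that is dropped never re-enters, the result depends on the random seed, so published CARS runs are often repeated several times.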
SVR and RF methods have advantages over other prediction models, as they are able to model complex, non-linear and linear relationships between variables [10]. Rossel and Behrens [10] reported that predictions by SVR using all VNIR wavelengths produced the smallest RMSE values, with RF performing weakly. Ji et al. [25] reported that SVR using all VNIR wavelengths produced the best prediction (R 2 of validation: 0.64 and RPD: 2.16), while the precision of RF was poor (R 2 of validation: 0.40 and RPD: 1.61). Terra et al. [51] also modeled SOM accurately based on SVR with a linear kernel function. Dotto et al. [23] found that the SVR model yielded robust predictions, while the overall predictive ability of the RF models was considered insufficient. The R 2 of the RF models was 0.47 to 0.77. Our study was consistent with the above studies and showed that the UVE and CARS algorithms can improve SVR modeling accuracy, even though the accuracy remains poor when they are combined with the RF model. For example, Paddy soil SVR modeling produced greater R 2 and RPD values, from 0.75 to 0.92 and from 1.87 to 3.24.
However, Knox et al. [52] showed that the RF model produced an R 2 from 0.63 to 0.88 when using different spectral preprocessing only in the VNIR range. The RF model combined with CARS produced more accurate SOM predictions, with R 2 values ranging from 0.65 to 0.89, as reported by Bao et al. [26]. In that study, the SOM content ranged from 4.25 to 80.32 g/kg, with a mean of 39.5 ± 13.21 g/kg. In our study, the RF models performed poorly for both soil types, with R 2 values ranging from 0.22 to 0.68, RPD values ranging from 1.01 to 1.60 and mean SOM contents of 32.13 ± 7.21 g/kg and 21.60 ± 3.94 g/kg for Paddy soil and Shajiang black soil, respectively. The difference in soil types and SOM content levels might be one reason that the RF models performed differently.
The PLSR and SVR models for Shajiang black soil using the full spectral domain of R, LR, CR and FDR produced poor results. The R 2 in validation ranged from 0.26 to 0.63 and the RPD ranged from 1.17 to 1.64. The poor model performance was consistent with the conclusions of Lu et al. [53], who related it mainly to low SOM content and its weak correlation with the spectra. In this study, there were no significant differences in the correlations between the SOM and different wavelengths, with the absolute value of the correlation coefficient ranging between 0.30 and 0.48 (Figure 5b). Moreover, there were few differences in the characteristics among the spectral curves of different SOM contents (Figure 4b). After screening the wavelengths with the CARS algorithm, the PLSR and SVR models were significantly improved, with RPD values greater than 2.0.
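The per-wavelength correlation referred to here can be computed in one vectorized step. The sketch below assumes a samples-by-wavelengths reflectance matrix and the Pearson coefficient; the function name and the small example matrix are illustrative and not taken from the study.

```python
import numpy as np

def wavelength_correlations(spectra, som):
    """Pearson r between SOM and reflectance at each wavelength (column)."""
    X = np.asarray(spectra, dtype=float)
    y = np.asarray(som, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))

# Toy example: 4 samples, 3 wavelengths (illustrative values only)
toy_spectra = np.array([[1.0, 4.0, 1.0],
                        [2.0, 3.0, 3.0],
                        [3.0, 2.0, 2.0],
                        [4.0, 1.0, 4.0]])
toy_r = wavelength_correlations(toy_spectra, [1.0, 2.0, 3.0, 4.0])
```

A nearly flat curve of absolute correlation across wavelengths, as reported for Shajiang black soil, means that no single band stands out, so selection by correlation alone has little to work with.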
For soils with low SOM contents, different spectral transformation approaches can improve the precision of spectral modeling. Nawar et al. [35] reported that CR and FDR spectral transformation improved SOM prediction models, to varying degrees, based on PLSR, SVR and MARS. In that study, the SOM content ranged between 0.2 and 23.0 g/kg, averaging 8.9 g/kg. Wang et al. [54] found that discrete wavelet transformation of the original spectra improved the modeling precision of SOM in northern China. The R 2 of the optimal model reached as high as 0.72. In that study, the average SOM was 15.76 g/kg and about 70% of samples had SOM contents lower than 20 g/kg. Yang et al. [55] used spectral characteristic indexes to efficiently predict SOM content of Shajiang black soil in eastern China, achieving a high degree of precision (R 2 : 0.81). That study included 45 soil samples with SOM contents ranging from 2.07 g/kg to 21.21 g/kg. Further spectral processing, wavelength screening algorithms and modeling techniques were applied to hyperspectral modeling of soil properties. For different soil types and SOM content levels, the optimal spectral treatment might be different, although this requires more comparative studies in the future.

Conclusions
(1) The CARS and UVE algorithms reduced the dimensionality of the soil hyperspectral data and the complexity of SOM spectral modeling. The CARS algorithm had a relatively high compression ratio and selected 40-125 characteristic wavelengths from all VNIR wavelengths of R, LR, CR and FDR. The selected wavelengths of the two soil types were mainly distributed in the near-infrared wavelength range.
(2) For the two soil types and four full spectral domains (R, LR, CR, and FDR), the CARS and UVE algorithms improved the SOM modeling precision based on the PLSR and SVR methods. PLSR and SVR combined with the CARS algorithm displayed the best prediction power, providing an important reference for band selection. The improvement of SVR and PLSR modeling accuracy by CARS and UVE was superior to that of spectral transformation.
(3) CARS-PLSR and CARS-SVR using CR spectra produced the best predictions (highest R 2 and RPD, lowest RMSE) for SOM modeling of Paddy soil. CARS-PLSR and CARS-SVR using LR and FDR spectra were the optimal models for Shajiang black soil. The modeling accuracies of PLSR and SVR of Paddy soil were better than those for Shajiang black soil. RF models performed poorly for both soil types. The CARS algorithm improved predictions considerably for soil samples with low SOM contents.