Prediction of Soil Organic Matter by VIS–NIR Spectroscopy Using Normalized Soil Moisture Index as a Proxy of Soil Moisture

Soil organic matter (SOM) is an important parameter of soil fertility, and visible and near-infrared (VIS–NIR) spectroscopy combined with multivariate modeling techniques has provided new possibilities for estimating SOM. However, in the field the spectral signal is strongly influenced by soil moisture (SM). There is growing interest in using spectral classification to predict soil properties under moist conditions while minimizing the influence of SM. The objective of this study was to investigate the transferability of two approaches for predicting SOM directly from moist soil spectra: an SM-based cluster method with known SM, which classifies the VIS–NIR spectra into different SM clusters and develops a separate model for each, and a normalized soil moisture index (NSMI)-based cluster method with unknown SM, which uses the NSMI to indicate SM and establishes separate models accordingly. One hundred and twenty-one soil samples were collected from Central China, and eight SM levels were obtained for each sample through rewetting experiments. Their reflectance spectra and SOM concentrations were measured in the laboratory. Partial least squares–support vector machine (PLS-SVM) was used to construct SOM prediction models; in particular, prediction models were developed for NSMI-based clusters with unknown SM. The models were assessed in calibration and validation with three statistics: the coefficient of determination (R²), root mean square error (RMSE) and the ratio of performance to deviation (RPD). The results showed that increasing SM reduced VIS–NIR reflectance nonlinearly across the entire spectral range. NSMI was an effective spectral index for indicating SM. When SM was known, classifying the VIS–NIR spectra into different SM clusters improved the performance of the PLS-SVM models to acceptable prediction accuracies (R²cv = 0.69–0.77, RPD = 1.79–2.08).
SOM estimation with the NSMI-based cluster method with unknown SM (RPD = 1.95–2.04) was similar to that with the SM-based cluster method with known SM (RPD = 1.79–2.08). The prediction results (RPD = 1.87–2.06) demonstrated that the NSMI-based cluster method has potential for predicting SOM outside the laboratory without explicit knowledge of SM; the method is also easy to carry out and requires only spectral information.


Introduction
It is well recognized that soil organic matter (SOM) strongly influences the physical and chemical properties of soil and plays a positive role in crop growth [1]. Fast, accurate and cost-effective determination of SOM content across large areas is therefore crucial for local agricultural development [2,3]. Traditional chemical methods of SOM analysis in the laboratory are relatively accurate, but they are tedious and labor-intensive and cannot rapidly monitor SOM over broad areas.
In recent years, visible (VIS, 400–700 nm) and near-infrared (NIR, 700–2500 nm) spectroscopy combined with multivariate modeling techniques has provided an alternative tool for characterizing SOM [4,5]. The VIS-NIR technique can be applied either in the laboratory or in the field: once a calibration model relating spectral data to the corresponding soil property reference values has been developed, it can be used to predict that property for other soil samples from the same area using only their VIS-NIR spectra [6]. Compared with laboratory measurements, field-based VIS-NIR measurements greatly improve scanning efficiency because they avoid the collection and preparation of soil samples (e.g., transporting, air drying, grinding and sieving), which saves considerable time and labor [7]. However, field spectra are more susceptible than laboratory spectra to interference from external environmental factors, such as variable soil moisture, temperature, natural aggregation and the condition of the soil surface, which may make field predictions less accurate than those obtained in the stable laboratory environment [8]. Among these factors, soil moisture (SM) has a substantial, complex and nonlinear effect on reflectance spectra [9–11]. Generally, reflectance across the entire spectral range (350–2500 nm) decreases with increasing SM, and SM may vary significantly in the field [10]. However, because SM varies, direct analysis of one-dimensional spectra is not efficient enough to explore its effect on reflectance. A better approach is two-dimensional correlation spectroscopy, which can provide more detailed spectral information [12].
Recently, many researchers have investigated the effect of SM on reflectance spectra, and several methods for removing or minimizing the SM effect and improving the prediction accuracy of SOM have been proposed and explored, such as external parameter orthogonalization (EPO) [9,13–15], direct standardization (DS) and piecewise direct standardization (PDS) [11,16–18], the "spiking" method [19,20], first derivative [21], slope bias correction (SB) [22], orthogonal signal correction (OSC) and generalized least squares weighting (GLSW) [23,24], and spectral classification [25,26]. The EPO, DS and PDS strategies usually require dry soil spectral libraries (SSLs) at a specific scale (global, continental, national or regional) and then use a projection (or transfer) matrix to correct the moist spectra. These SSLs contain both laboratory-based VIS-NIR spectra and the corresponding soil property data, which can be used to develop spectroscopic calibration models applicable to moist spectra. The "spiking" strategy strengthens the leverage of the validation samples (moist samples) by increasing the diversity of the calibration samples, and thus improves the generalization capacity of the model for the validation samples; this method, too, is usually associated with SSLs. First-derivative preprocessing, as applied by Wu et al. [21], is another way to alleviate the effect of SM: they found that, within some specific wavelength ranges, first-derivative spectra were insensitive to SM, and that these regions could be used to determine soil parameters under field conditions. However, because the effects of SM on reflectance spectra are nonlinear and very complex, this method might neglect some important variables. The SB method assumes that the systematic prediction error caused by SM can be removed by a linear slope and bias correction. It therefore also cannot handle the complex, nonlinear effect of SM, because it applies only simple linear corrections to the target variables [22]. Jiang et al. [23] used the OSC and GLSW algorithms to remove SM effects and verified the transferability of OSC-partial least squares (OSC-PLS) and GLSW-PLS models between different SM levels, reporting successful results. One concern with their approach is that the SM level is difficult to determine when the method is applied directly in the field.
Mouazen et al. [25] successfully used factorial discriminant analysis (FDA) to classify soil VIS-NIR spectra into different SM groups to minimize the effect of SM, and pointed out that spectral classification could improve the prediction accuracy for other soil properties (e.g., C and N). Nocita et al. [26] applied spectral classification to determine soil organic carbon (SOC) content in moist samples, computing the normalized soil moisture index (NSMI) as a proxy of SM to classify soil VIS-NIR spectra into different clusters. Separate PLS models relating VIS-NIR spectra to SOC content were then established both for known-SM clusters and for unknown-SM clusters determined by the NSMI. Their results showed that SOC prediction accuracies after NSMI classification (with unknown SM) were similar to those obtained with known SM. Thus, efforts to improve SOM prediction accuracy across a wide range of SM may benefit from dividing soil VIS-NIR spectra into subsets with smaller SM variation. The main advantage of spectral classification using the NSMI is that it can be used when SM is unknown and varies greatly across large areas [26]. Moreover, if the VIS-NIR spectra of moist soil can be used directly to predict soil properties in the field, no sample transportation or preparation is required [27]. However, in the work of Nocita et al. [26], whether a spectral class was considered clear or mixed was decided mainly by visual observation, so their classification lacked objective criteria. Fuzzy k-means (FKM) clustering can identify the optimal number of clusters without setting artificial thresholds, and can also handle the continuous and complex relationships in spectral data [3,28,29]. Although the NSMI calculated from the reflectance at 1800 and 2119 nm is straightforward to use for reducing the impact of SM, it is difficult to extend to other datasets: an NSMI derived from one dataset may not suit another, and a single general NSMI is hard to use as an SM indicator. Therefore, a more practical NSMI needs to be considered when constructing specific predictive models.
Soil VIS-NIR spectra are largely non-specific, containing many weak, broad and overlapping bands, so multivariate statistics are needed to relate the spectra to soil parameters and calibrate prediction models. PLS is a commonly used technique for this purpose, but it is only a linear calibration method. Attention to nonlinear modeling techniques is increasing, because the relationships between reflectance spectra and soil parameters are rarely linear, especially when SM varies, since its effects are inherently nonlinear [9]. In particular, the support vector machine (SVM), a kernel-based learning method, has attracted extensive attention in soil VIS-NIR spectroscopy [5]. Thus, combining PLS and SVM is expected to outperform PLS alone.
In this study, we aimed to: (1) investigate the influence of SM on reflectance spectra; (2) explore the feasibility of dividing the global dataset into different clusters when SM is known (SM-based clusters); and (3) verify whether the NSMI can be used as an indicator of SM and, specifically, assess the transferability of models between the SM-based and NSMI-based cluster methods in corresponding clusters. We hope this research will provide theoretical guidance for monitoring SOM in moist samples with unknown SM.

Study Area and Field Sampling
The study area (Chahe town) is situated in the east of the Jianghan Plain (Hubei Province, China) (Figure 1). It has a typical subtropical humid monsoon climate with abundant sunlight and rainfall and distinct seasons, and it is an important agricultural region. Elevations in the study area range from about 2 to 35 m, and it covers an area of 153 km².
In December 2011, July 2012, November 2012 and April 2013, 121 soil samples were collected (Figure 1). At each site, approximately 1 kg of surface soil (0–20 cm) was obtained, and the geographical coordinates were recorded with a handheld global positioning system (GPS) receiver (positional error < 10 m). The fresh soil samples were packed in sealed plastic bags, labeled and taken to the laboratory. According to the Chinese Soil Taxonomic Classification, the 121 samples belong to paddy soils and fluvo-aquic soils, and the major land use type is irrigable land.
Remote Sens. 2017, 10, 28; doi:10.3390/rs10010028 www.mdpi.com/journal/remotesensing

Laboratory Analyses and Rewetting Experiment
The collected soil samples were air-dried and crushed to pass through a 2 mm sieve, and stones, roots and plant litter were removed. Each soil sample was divided into two portions, one for SOM analysis and the other for spectral measurement. The SOM concentrations of all 121 samples were determined chemically using the potassium dichromate oxidation-titration method [30].
Prior to the rewetting experiment, all soil samples were oven-dried at 105 °C for 24 h to eliminate soil moisture. Rewetting was conducted in a single batch (n = 121). For each sample, approximately 100 g of oven-dried soil was weighed on a laboratory scale (accuracy = 0.01 g), placed in a petri dish and labeled. Each sample was then wetted with 40 g of deionized water and weighed. The water was added slowly with a spray bottle, and the dishes were immediately covered with lids for 24 h to prevent evaporation and allow the moisture to distribute uniformly within the samples. The dishes were then weighed to determine SM, and the first set of moist spectra was scanned. Over the following days, the uncovered samples were left to air-dry at room temperature; they were weighed daily, and their spectra were recorded at the same time. In total, eight different SM levels were obtained. The mean SM values of the eight levels were 32.66%, 29.10%, 25.50%, 21.62%, 16.95%, 11.85%, 6.87% and 2.55% (gravimetric, dry basis).
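As a small worked example of the gravimetric (dry-basis) SM reported above, the sketch below divides the water mass by the oven-dry soil mass. It assumes each dish is weighed together with its soil, so a tare mass for the empty dish (a hypothetical input not given in the text) is subtracted first; the function name is illustrative only.

```python
def gravimetric_sm(wet_total_g, dish_g, dry_soil_g):
    """Gravimetric soil moisture (%, dry basis).

    wet_total_g: mass of dish + moist soil at the time of scanning
    dish_g:      tare mass of the empty petri dish (hypothetical input)
    dry_soil_g:  mass of oven-dried soil placed in the dish
    """
    water_g = (wet_total_g - dish_g) - dry_soil_g  # mass of water held by the soil
    return 100.0 * water_g / dry_soil_g            # dry basis: divide by dry soil mass


# Immediately after wetting: 100 g dry soil + 40 g water in a 50 g dish
print(gravimetric_sm(190.0, 50.0, 100.0))  # 40.0
```

On this basis, 40 g of water on 100 g of oven-dry soil corresponds to 40% SM at the moment of wetting; each later daily weighing gives a lower value as the samples dry.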

Spectral Measurement and Pre-Processing
The reflectance spectrum of each soil sample was acquired in a dark room using an ASD FieldSpec® 3 portable spectrometer (Analytical Spectral Devices, Boulder, CO, USA), with a sampling interval of 1.4 nm (350–1000 nm) and 2 nm (1000–2500 nm). The measurement geometry was as follows: a 50 W halogen lamp at a 45° incidence angle was the only light source; the lamp was placed 30 cm from the petri dish; and the probe was mounted vertically about 15 cm above the dish, with a field of view much smaller than the dish diameter. Each sample was scanned in four directions (rotating the dish by 90° each time), with five scans per direction (20 scans in total), and the scans were averaged into one spectrum per sample. Every ten samples, the spectrometer was optimized using a standardized white Spectralon® panel as the white reference. The resampling interval of the ASD spectrometer was 1 nm.
Before the raw spectral data were exported, splice correction was performed in viewSpec™ software (version 6.2.0, ASD Inc.: Longmont, CO, USA) to remove the breakpoints around 1000 and 1800 nm. Each spectrum was then trimmed to 400–2400 nm and filtered with Savitzky–Golay smoothing (11-point window, second-order polynomial) [31]. Finally, each spectrum was resampled by averaging ten successive wavelengths to reduce the dimensionality of the spectral matrix, leaving 201 wavelengths per spectrum.
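The trimming, smoothing and resampling steps can be sketched in Python with SciPy's `savgol_filter`, under stated assumptions. The spectra are assumed to be splice-corrected 1 nm data in a NumPy array. Because the text does not say exactly how averaging ten successive wavelengths yields 201 bands, each output band is taken here as the mean of the 1 nm values from 5 nm below to 4 nm above a center at 400, 410, and so on up to 2400 nm (truncated at the ends). This is one plausible reading, not necessarily the authors' Matlab procedure.

```python
import numpy as np
from scipy.signal import savgol_filter


def preprocess_spectra(reflectance, wavelengths):
    """Trim, smooth and resample 1 nm reflectance spectra.

    reflectance: array (n_spectra, n_bands), splice-corrected, 1 nm steps
    wavelengths: array (n_bands,) in nm
    Returns (centers, resampled) with one value per 10 nm (201 bands).
    """
    keep = (wavelengths >= 400) & (wavelengths <= 2400)
    wl = wavelengths[keep]
    r = reflectance[:, keep]
    # Savitzky-Golay: 11-point window, 2nd-order polynomial, along wavelength
    r = savgol_filter(r, window_length=11, polyorder=2, axis=1)
    # Assumption: average the 1 nm bands in a 10 nm window around each center
    centers = np.arange(400, 2401, 10)
    resampled = np.empty((r.shape[0], centers.size))
    for k, c in enumerate(centers):
        window = (wl >= c - 5) & (wl <= c + 4)
        resampled[:, k] = r[:, window].mean(axis=1)
    return centers, resampled
```

With `window_length=11` and `polyorder=2`, the filter fits a quadratic to each 11-point neighborhood, so a constant or slowly varying spectrum passes through essentially unchanged while high-frequency noise is attenuated.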

Spectral Angle and Two-Dimensional Correlation Spectroscopy
Spectral angle (SA) measures the spectral similarity between a test spectrum t and a reference spectrum r by calculating the "angle" between them [32]. The SA (θ) is defined by:
$$\theta = \cos^{-1}\left(\frac{\sum_{i=1}^{n} t_i r_i}{\sqrt{\sum_{i=1}^{n} t_i^{2}}\,\sqrt{\sum_{i=1}^{n} r_i^{2}}}\right) \quad (1)$$
where $t_i$ and $r_i$ are the reflectances of the test and reference spectra at wavelength i, and n is the total number of wavelengths (here, n = 201).
Two-dimensional correlation spectroscopy, developed by Noda, is a powerful method for analyzing complex variations in spectral intensity observed successively under some form of perturbation, such as temperature, pressure or concentration [12,33]. In two-dimensional correlation spectroscopy, a set of spectra is transformed into a correlation intensity map defined by two independent spectral axes. Spreading the spectra over a second dimension can reveal features that are not readily observed in conventional one-dimensional spectra, and overlapping peaks can be more easily distinguished in real data. Two-dimensional correlation analysis mainly yields three basic spectra: the synchronous, asynchronous and disrelation spectra. Here, we used the synchronous correlation spectrum to investigate the influence of SM on the VIS-NIR spectra of the calibration dataset. Readers are referred to Noda [12] for further details on two-dimensional correlation spectroscopy.
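The spectral angle θ = arccos(t·r / (‖t‖‖r‖)) and the synchronous spectrum can be computed as in the following NumPy sketch. For the synchronous spectrum it uses Noda's standard form, Φ(ν1, ν2) = (1/(m − 1)) Σ ỹ(ν1) ỹ(ν2) over the m perturbation steps, with the dynamic spectra ỹ taken relative to the mean spectrum; using the mean as the reference is a common choice and an assumption here. Rows of the input are assumed to be the averaged spectra at the eight SM levels.

```python
import numpy as np


def spectral_angle(t, r):
    """Spectral angle (degrees) between test spectrum t and reference r."""
    t = np.asarray(t, dtype=float)
    r = np.asarray(r, dtype=float)
    cos_theta = t @ r / (np.linalg.norm(t) * np.linalg.norm(r))
    # Clip guards against rounding pushing the cosine slightly outside [-1, 1]
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


def synchronous_2d(spectra):
    """Synchronous 2D correlation spectrum (Noda).

    spectra: array (m, n), one row per perturbation step (here, SM level).
    Returns an (n, n) symmetric matrix.
    """
    y = np.asarray(spectra, dtype=float)
    dynamic = y - y.mean(axis=0)            # dynamic spectra vs. mean spectrum
    return dynamic.T @ dynamic / (y.shape[0] - 1)
```

Because θ depends only on the direction of the spectral vectors, multiplying a spectrum by a constant leaves θ unchanged, so θ reflects changes in spectral shape rather than overall brightness. The diagonal of the synchronous matrix (the autocorrelation peaks) is the variance of reflectance at each wavelength across the perturbation steps.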

Principal Component Analysis and Fuzzy K-Means Clustering
Principal component analysis (PCA) is a mathematical method for data compression and reduction, and it is commonly used to extract informative features from high-dimensional datasets. Through an orthogonal transformation, the original spectral matrix, whose variables may be correlated, is converted into a set of new uncorrelated variables, the principal components (PCs), which are linear combinations of the original variables. PCA uses a small number of PCs to capture as much of the variation in the original data as possible, so the first few PCs can help interpret the original dataset. We used the nonlinear iterative partial least squares (NIPALS) algorithm to compute the PCs and scores [34]. This algorithm avoids calculating the covariance matrix of the spectral matrix, which can effectively reduce computation time. In general, spectral clustering depends first on PC1, which mainly reflects reflectance intensity, and then on PC2, which reflects spectral shape [28].
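A minimal NIPALS sketch for PCA in NumPy is given below; it is a textbook version, not the authors' Matlab code. Each component is found by alternating between loadings and scores until the scores stop changing, after which the component is subtracted from the data (deflation), so the covariance matrix is never formed. The tolerance and iteration cap are illustrative values.

```python
import numpy as np


def nipals_pca(X, n_components, tol=1e-10, max_iter=500):
    """PCA scores/loadings by NIPALS on mean-centered data.

    Returns scores T (n, a), loadings P (p, a) and explained-variance ratios.
    """
    Xr = np.asarray(X, dtype=float)
    Xr = Xr - Xr.mean(axis=0)               # mean-center each wavelength
    total_ss = np.sum(Xr ** 2)
    n, p = Xr.shape
    T = np.zeros((n, n_components))
    P = np.zeros((p, n_components))
    explained = np.zeros(n_components)
    for a in range(n_components):
        # Start from the column with the largest residual variance
        t = Xr[:, np.argmax(np.sum(Xr ** 2, axis=0))].copy()
        for _ in range(max_iter):
            loading = Xr.T @ t / (t @ t)
            loading /= np.linalg.norm(loading)
            t_new = Xr @ loading
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        Xr = Xr - np.outer(t, loading)      # deflate: remove this component
        T[:, a], P[:, a] = t, loading
        explained[a] = (t @ t) / total_ss
    return T, P, explained
```

Applied to the mean-centered calibration spectra (640 × 201), a call such as `nipals_pca(X, 2)` would return the two score columns that are clustered next, together with their shares of the total variance.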
Fuzzy k-means (FKM) clustering is a commonly used unsupervised clustering approach. Unlike k-means clustering and discriminant analysis, the FKM algorithm provides an objective criterion for determining the optimal number of clusters without manually set thresholds, which is a competitive advantage over other methods [28]. FKM divides a dataset (here, the PCA scores) into k classes by iteratively minimizing an objective function. Three evaluation parameters are obtained from the FKM algorithm: the fuzziness performance index (FPI), the modified partition entropy (MPE) and the clustering separation index (S). FPI measures the continuity between classes; a value close to 0 indicates little shared membership and a clear partition. MPE is a comprehensive index of the degree of fuzziness among classes. S represents the relative distinction between classes. The optimal number of classes is the one at which these three values simultaneously approach 0. For a more comprehensive description of the FKM algorithm, readers are referred to Shi et al. [28]. FuzME 3.0 software [35] was used to perform the FKM clustering. The maximum number of iterations, the convergence threshold and the fuzzy weighting exponent were set to 300, 0.001 and 1.5, respectively [28].
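The sketch below implements generic fuzzy k-means (also called fuzzy c-means) with Euclidean distance, using the settings quoted above (weighting exponent 1.5, at most 300 iterations, threshold 0.001). It is not FuzME: the distance metric, initialization and exact convergence test are assumptions here (random initial memberships; stop when no membership changes by more than the threshold). FPI and MPE use the commonly cited definitions FPI = 1 − (kF − 1)/(k − 1) and MPE = H/log k, where F is the partition coefficient and H the partition entropy; the separation index S is omitted because its formula is not given in the text.

```python
import numpy as np


def fuzzy_kmeans(X, k, m=1.5, max_iter=300, tol=1e-3, seed=0):
    """Fuzzy k-means with Euclidean distance; returns memberships U and centroids V."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], k))
    U /= U.sum(axis=1, keepdims=True)                  # rows sum to 1
    for _ in range(max_iter):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]         # membership-weighted centroids
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                          # avoid division by zero
        # u_ic = 1 / sum_j (d_ic / d_ij) ** (2 / (m - 1))
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1))).sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V


def fpi_mpe(U):
    """Fuzziness performance index and modified partition entropy."""
    n, k = U.shape
    F = np.sum(U ** 2) / n                             # partition coefficient
    H = -np.sum(U * np.log(np.fmax(U, 1e-300))) / n    # partition entropy
    return 1 - (k * F - 1) / (k - 1), H / np.log(k)
```

To mimic the choice of class number, one would run `fuzzy_kmeans` on the PC scores for k = 2 to 10 and compare FPI and MPE across k; both equal 0 for a crisp partition and 1 when every membership is 1/k.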

Normalized Soil Moisture Index
The normalized soil moisture index (NSMI) is a dimensionless index calculated as the normalized difference of the reflectance at two wavelengths [36]. To select the optimal index for indicating SM, all possible wavelength combinations in the 400–2400 nm region were tested for their correlation with SM, using Equation (2):
$$\mathrm{NSMI} = \frac{R_i - R_j}{R_i + R_j} \quad (2)$$
where $R_i$ and $R_j$ are the reflectance values at wavelengths i and j (nm), respectively. A two-dimensional correlation map, generated with a program written in Matlab R2014a, was used to display all possible combinations. The NSMI is easy to use and readily interpretable [26,36].
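The exhaustive search over band pairs can be sketched as below. For every pair (i, j) with i < j it computes NSMI = (R_i − R_j)/(R_i + R_j) for all samples and its Pearson correlation with SM, giving the triangular correlation map described in the text. Swapping i and j only flips the sign of the index, so only the upper triangle is filled. The function names are hypothetical, and this NumPy version stands in for the authors' Matlab program.

```python
import numpy as np


def nsmi_correlation_map(R, sm):
    """Pearson r between SM and NSMI(i, j) = (R_i - R_j) / (R_i + R_j).

    R:  reflectance array (n_samples, n_bands)
    sm: soil moisture array (n_samples,)
    Returns an (n_bands, n_bands) matrix filled above the diagonal (i < j).
    """
    R = np.asarray(R, dtype=float)
    sm_c = np.asarray(sm, dtype=float) - np.mean(sm)
    n_bands = R.shape[1]
    r_map = np.full((n_bands, n_bands), np.nan)
    for i in range(n_bands - 1):
        ri = R[:, [i]]                           # column vector, broadcasts
        rj = R[:, i + 1:]
        nsmi = (ri - rj) / (ri + rj)             # all pairs (i, j > i) at once
        nsmi_c = nsmi - nsmi.mean(axis=0)
        num = sm_c @ nsmi_c
        den = np.sqrt(np.sum(nsmi_c ** 2, axis=0) * np.sum(sm_c ** 2))
        r_map[i, i + 1:] = num / den
    return r_map


def best_pair(r_map):
    """Band indices (i, j) with the largest |r|."""
    return np.unravel_index(np.nanargmax(np.abs(r_map)), r_map.shape)
```

With 201 bands this evaluates 20,100 band pairs (201 × 200 / 2), which is cheap because each row of the map is computed in one vectorized step.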

Calibration and Validation
The whole dataset (n = 121) was sorted in ascending order of SOM content and, using a stratified sampling approach, divided into 41 strata of two or three samples each; one sample from each stratum was selected for the independent validation dataset (41 samples in total). The remaining 80 samples formed the calibration dataset. Each sample in both datasets was represented at all 8 SM levels.
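A rank-based stratified split like the one described can be sketched as follows. The text does not say how the validation sample was chosen within each stratum, so random selection with a fixed seed is assumed; `np.array_split` turns 121 sorted samples into 39 strata of three and 2 strata of two. The second helper assumes a hypothetical storage layout in which the eight SM levels of each sample occupy consecutive rows.

```python
import numpy as np


def stratified_split(som, n_strata=41, seed=0):
    """Rank-based stratified split into calibration and validation indices.

    Samples are sorted by SOM, cut into n_strata consecutive strata and one
    sample per stratum is drawn (at random here, an assumption) for validation.
    """
    order = np.argsort(som)
    rng = np.random.default_rng(seed)
    strata = np.array_split(order, n_strata)    # sizes differ by at most one
    val = np.array([rng.choice(s) for s in strata])
    cal = np.setdiff1d(order, val)
    return cal, val


def expand_levels(sample_idx, n_levels=8):
    """Row indices when every sample is stored at n_levels SM levels."""
    return (np.asarray(sample_idx)[:, None] * n_levels + np.arange(n_levels)).ravel()
```

With 121 samples this yields 80 calibration and 41 validation samples, that is, 640 and 328 spectra across the eight SM levels; keeping all levels of a sample on the same side of the split prevents spectra of one soil from appearing in both datasets.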
Leave-one-out cross-validation was used to select the number of PLS latent variables, taken as the number at which the root mean squared error of cross-validation first reached a minimum. The PLS latent variables were then used as inputs to an SVM to establish the PLS-SVM models. We used the ε-SVM algorithm with a radial basis function kernel, and a grid search with 5-fold cross-validation was used for model optimization [37,38]. The coefficient of determination (R²), root mean squared error (RMSE) and ratio of performance to deviation (RPD) between predicted and measured SOM in calibration, cross-validation and validation were used to evaluate model performance. In terms of RPD, RPD < 1.4 indicates unacceptable models/predictions; 1.4 ≤ RPD < 1.8, fair; 1.8 ≤ RPD < 2.0, good; 2.0 ≤ RPD < 2.5, very good; and RPD ≥ 2.5, excellent [39,40]. In general, a higher R² and RPD and a lower RMSE indicate a better model. All data analyses were carried out in Matlab R2014a (The MathWorks Inc.: Natick, MA, USA).
To avoid pseudo-replication of soil samples, samples were labeled by SM level rather than as individual samples. The procedures of the two classification methods for model calibration and validation are shown in Figure 2 and compared as follows: (1) PCA was performed on the calibration dataset with 8 SM levels (n = 640), and the scores of the first few PCs were then clustered with FKM clustering (referred to as SM-based clusters); PLS-SVM was then used to develop a separate calibration model with known SM for each cluster. (2) The NSMI was also calculated for the calibration dataset with 8 SM levels (n = 640), and the NSMI values were divided into the same number of clusters as the SM classification (referred to as NSMI-based clusters); likewise, a separate PLS-SVM model was established for each NSMI-based cluster, and the results were compared with those of the corresponding SM-based cluster. (3) The models calibrated on the NSMI-based clusters were further tested on the independent validation dataset (8 SM levels, n = 328) with unknown SM to evaluate the NSMI-based classification. Mouazen et al. [25] assessed classification performance with the correct classification (CC) rate, calculated by dividing the number of correctly grouped samples by the total number of samples in that cluster; we adopted the same measure to evaluate the NSMI classification.
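The CC rate is simple to compute once each spectrum carries two cluster labels. The sketch below assumes the SM-based cluster label is the reference and that "that cluster" means the reference cluster; both readings are assumptions, and the function name is hypothetical.

```python
import numpy as np


def correct_classification(reference, predicted):
    """Per-cluster correct classification (CC) rate.

    reference: cluster labels from the SM-based classification (assumed reference)
    predicted: cluster labels assigned from the NSMI
    Returns {cluster: fraction of that reference cluster assigned correctly}.
    """
    reference = np.asarray(reference)
    predicted = np.asarray(predicted)
    rates = {}
    for c in np.unique(reference):
        in_c = reference == c                      # samples belonging to cluster c
        rates[c.item()] = float(np.mean(predicted[in_c] == c))
    return rates
```

For instance, if a reference cluster holds three spectra and two of them receive the same NSMI-based label, its CC rate is 2/3.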

Descriptive Statistics of SOM
The summary statistics of SOM content measured by the traditional chemical method for the whole, calibration and independent validation datasets are given in Figure 3. SOM in the calibration dataset ranged from 8.90 to 46.15 g·kg⁻¹ with a mean of 22.03 g·kg⁻¹, and in the independent validation dataset from 11.41 to 44.02 g·kg⁻¹ with a mean of 22.56 g·kg⁻¹. Overall, the statistics of both the calibration and independent validation datasets were similar to those of the whole dataset, indicating that the split was representative of the whole dataset.

Influence of SM on VIS-NIR Spectra
To analyze the influence of SM on the reflectance spectra, the spectral reflectance of the calibration dataset (n = 80) at different SM levels was investigated (Figure 4). The spectral curves at different SM levels had similar shapes but different intensities (Figure 4a), and three distinct absorption features around 1420, 1940 and 2200 nm appeared at all SM levels. Reflectance decreased across the entire spectral range as SM increased, but the decrease was not uniform along the wavelengths. The effect of SM on the spectra was evident: at low SM levels (SM ≤ 17.66%), reflectance dropped markedly as SM increased, whereas above 17.66% the reflectance spectra were much less sensitive to further changes in SM. To quantify the impact of SM, we calculated the spectral angle (SA, θ) between the mean spectral curves at different SM levels (Figure 4b). Overall, the SA varied widely (0–12.06°). Taking the mean spectrum at the 2.72% SM level as an example, its SA with the spectra at SM levels of 7.47%, 12.67%, 17.66%, 22.14%, 25.87%, 29.32% and 32.82% ranged from 0 to 12.06° and became progressively larger as SM increased, which further confirms that SM strongly affected the reflectance spectra.
Two-dimensional synchronous correlation spectra were also computed from the averaged reflectance at the different SM levels in the calibration dataset (Figure 5). As the color bar in the figure indicates, the influence of SM on reflectance was clearly stronger in the NIR range (1000–2400 nm) than in the visible range. In addition, two autocorrelation peaks were clearly visible on the diagonal near 1450 nm and 1940 nm. The autocorrelation peak around 1940 nm was more pronounced than that around 1450 nm, indicating that the wavebands around 1940 nm were more sensitive to SM, whereas those around 1450 nm were relatively insensitive.
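Both quantities used above can be sketched in a few lines of NumPy. The spectral angle is the arccos of the normalized dot product of two spectra, and the synchronous spectrum follows the generalized two-dimensional correlation formulation (mean-centred dynamic spectra, Φ = ỸᵀỸ/(m − 1)). This is a minimal sketch under our own assumptions: the function names are hypothetical, and using the mean spectrum as the reference is our choice rather than a confirmed detail of the study.

```python
import numpy as np

def spectral_angle_deg(a, b):
    """Spectral angle (degrees) between two spectra: arccos(a.b / (|a||b|))."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip guards against floating-point values slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

def spectral_angle_matrix(mean_spectra):
    """Pairwise SA (degrees) for an (n_levels, n_wavelengths) array of mean spectra."""
    k = len(mean_spectra)
    return np.array([[spectral_angle_deg(mean_spectra[i], mean_spectra[j])
                      for j in range(k)] for i in range(k)])

def synchronous_2d(spectra):
    """Generalized 2D synchronous correlation spectrum.

    Rows are the (mean) spectra at successive SM levels; the mean spectrum
    over all levels is used as the reference (an assumption of this sketch).
    """
    spectra = np.asarray(spectra, dtype=float)
    dynamic = spectra - spectra.mean(axis=0)  # dynamic spectra
    return dynamic.T @ dynamic / (spectra.shape[0] - 1)
```

Because the SA is unchanged when a spectrum is multiplied by a constant, a non-zero SA reflects a change in spectral shape rather than a uniform drop in brightness. The diagonal of the synchronous matrix (the autopeaks) equals the variance of reflectance across SM levels at each wavelength.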

SM Classification
We first mean-centered the reflectance spectra of the calibration dataset (n = 640) and then performed PCA on the pretreated spectra to reduce their dimensionality. The first two PCs together accounted for more than 95% of the total spectral variation (97.53% and 1.70% for PC1 and PC2, respectively). FKM clustering was then used to divide the scores of the first two PCs into spectrally similar clusters, and numbers of classes from 2 to 10 were examined to identify the optimal number of clusters. The values of FPI, MPE and S for the different numbers of classes are compared in Figure 6: all three indices reached their minimum simultaneously at four classes, which was therefore selected as the optimal number. Accordingly, the scores of the first two PCs at the different SM levels of the calibration dataset (n = 640) were divided into four clusters, as shown in Figure 7. As SM increased, the samples shifted in the PC space from cluster 1 towards cluster 4 (Figure 7). The PC1 values of clusters 3 and 4 were relatively concentrated, whereas those of clusters 1 and 2 were widely distributed.
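The PCA and FKM steps can be sketched as follows. This is a minimal NumPy sketch under our own assumptions: PCA via SVD of the mean-centred spectra, fuzzy k-means with Euclidean distance and a fuzziness exponent m = 2, and the FPI and MPE definitions commonly used in soil-science fuzzy k-means work (FPI = 1 − (cF − 1)/(c − 1) with F the mean squared membership, MPE = H/log c with H the mean partition entropy). The helper names (`pca_scores`, `fuzzy_kmeans`, `fpi_mpe`) are hypothetical; the S index and the authors' exact distance metric and exponent are not reproduced here.

```python
import numpy as np

def pca_scores(spectra, n_components=2):
    """Mean-centre spectra (rows = samples) and return PC scores and explained-variance ratios."""
    X = spectra - spectra.mean(axis=0)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return U[:, :n_components] * s[:n_components], explained[:n_components]

def fuzzy_kmeans(data, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Fuzzy k-means with Euclidean distance; returns memberships (n, c) and centres (c, p)."""
    rng = np.random.default_rng(seed)
    u = rng.random((data.shape[0], c))
    u /= u.sum(axis=1, keepdims=True)          # memberships of each sample sum to 1
    centres = None
    for _ in range(n_iter):
        w = u ** m
        centres = (w.T @ data) / w.sum(axis=0)[:, None]
        d = np.linalg.norm(data[:, None, :] - centres[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                  # avoid division by zero at a centre
        inv = d ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.max(np.abs(u_new - u)) < tol
        u = u_new
        if converged:
            break
    return u, centres

def fpi_mpe(u):
    """Fuzziness performance index and modified partition entropy (0 = crisp, 1 = maximally fuzzy)."""
    n, c = u.shape
    f = np.sum(u ** 2) / n
    fpi = 1.0 - (c * f - 1.0) / (c - 1.0)
    h = -np.sum(u * np.log(np.fmax(u, 1e-12))) / n
    return fpi, h / np.log(c)
```

To choose the number of clusters, one would run `fuzzy_kmeans` on the first two PC scores for c = 2, ..., 10 and look for the c at which the validity indices are minimal, as in Figure 6.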

The concentrated PC1 values of clusters 3 and 4 agree with Figure 4a, which shows that at higher SM levels the reflectance spectra were insensitive to further changes in SM. In addition, the ranges of PC2 values differed among the four clusters, indicating some differences in spectral shape, which is also consistent with Figure 4a.
The 640 soil samples were then regrouped into the four clusters according to these results, and the descriptive statistics of their SOM contents are listed in Table 1. Cluster 2, comprising 94 soil samples, had the largest variability in SOM content, with a coefficient of variation (CV) of 38.90%; cluster 1 had the smallest variability (CV = 32.51%) and consisted of 65 soil samples with SOM ranging from 8.90 to 36.54 g·kg⁻¹. Cluster 3 comprised 117 soil samples with a CV of 37.84%. Cluster 4 contained the largest number of soil samples, about six times as many as cluster 1, which indicates that when SM increased to higher levels (Figure 8), the soil samples occupied a similar region of the spectral space.
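Table 1 reports the usual descriptive statistics of SOM for each cluster. A minimal sketch of how such per-cluster statistics, including CV = 100 × SD/mean, could be computed is given below; the function name `cluster_stats` is hypothetical, and the use of the sample standard deviation (ddof = 1) is our assumption, since the text does not specify it. As a consistency check on the counts quoted above, cluster 4 must hold 640 − 65 − 94 − 117 = 364 samples, about 5.6 times the 65 samples of cluster 1.

```python
import numpy as np

def cluster_stats(values, labels):
    """Per-cluster n, min, max, mean, SD and CV (%) of a soil property such as SOM."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    out = {}
    for k in np.unique(labels):
        v = values[labels == k]
        mean = v.mean()
        sd = v.std(ddof=1)  # sample standard deviation (assumption)
        out[int(k)] = {"n": v.size, "min": v.min(), "max": v.max(),
                       "mean": mean, "sd": sd, "cv": 100.0 * sd / mean}
    return out
```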

NSMI Classification
The NSMI was computed for every combination of two wavelengths in the 400–2400 nm range, and the coefficient of determination (R²) between SM and each NSMI was calculated (Figure 9a). SM and NSMI were strongly related, and the wavelength combinations with high R² were located mainly within 1200–2400 nm (red regions). The highest R² (0.9194) was obtained with 1360 nm on the x-axis and 1940 nm on the y-axis, i.e., NSMI = (R1360 − R1940)/(R1360 + R1940), where R1360 and R1940 denote the reflectance at 1360 nm and 1940 nm.
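The wavelength-pair search behind Figure 9a can be sketched as follows: for every pair (i, j), the normalized index (R_i − R_j)/(R_i + R_j) is computed for all samples, and its squared Pearson correlation with SM (equal to the R² of a simple linear regression) is stored. This NumPy sketch is our own, not the authors' code; the function name `nsmi_r2_correlogram` and the synthetic test data are hypothetical.

```python
import numpy as np

def nsmi_r2_correlogram(spectra, sm):
    """R^2 between SM and (R_i - R_j)/(R_i + R_j) for every wavelength pair (i, j).

    spectra: (n_samples, n_wavelengths) reflectance; sm: (n_samples,) soil moisture.
    """
    X = np.asarray(spectra, dtype=float)
    yc = np.asarray(sm, dtype=float) - np.mean(sm)
    n_wl = X.shape[1]
    r2 = np.full((n_wl, n_wl), np.nan)
    for i in range(n_wl):
        # Index for all pairs (i, j) at once: shape (n_samples, n_wavelengths).
        idx = (X[:, [i]] - X) / (X[:, [i]] + X)
        ic = idx - idx.mean(axis=0)
        denom = np.sqrt((ic ** 2).sum(axis=0) * (yc ** 2).sum())
        with np.errstate(invalid="ignore", divide="ignore"):
            r = (ic * yc[:, None]).sum(axis=0) / denom
        r2[i] = r ** 2
    np.fill_diagonal(r2, np.nan)  # i == j gives a constant (zero) index
    return r2
```

Because swapping i and j only flips the sign of the index, the R² map is symmetric. With real spectra, the best pair is `np.unravel_index(np.nanargmax(r2), r2.shape)`, converted back to wavelengths.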
To investigate this optimal NSMI further, its value was computed for each soil sample from the reflectance at 1360 nm and 1940 nm, and the overall relationship between SM and NSMI(R1360−R1940)/(R1360+R1940) was fitted with a linear regression equation (Figure 9b):

$$\mathrm{SM} = 0.6209 \times \mathrm{NSMI} + 0.0032 \tag{3}$$
The R² between NSMI(R1360−R1940)/(R1360+R1940) and SM was 0.9194, showing that this NSMI was strongly correlated with SM. When Equation (3) was applied to the independent validation dataset (n = 328) at the different SM levels, it gave a validation R² of 0.8824, indicating that NSMI can serve as a proxy for soil moisture.
To examine the transferability between the SM-based and NSMI-based cluster methods, the NSMI ranges corresponding to the four SM-based clusters of the 640 calibration samples were used as threshold criteria: (1) cluster 1, 0.5045 ≤ NSMI < 0.9534; (2) cluster 2, 0.4123 ≤ NSMI < 0.5045; (3) cluster 3, 0.3419 ≤ NSMI < 0.4123; (4) cluster 4, 0.0034 ≤ NSMI < 0.3419. The descriptive statistics of SOM for each resulting cluster are summarized in Table 2. SOM contents in cluster 1 displayed a narrow range of 8.90–41.61 g·kg⁻¹, with a CV of 35.26%, whereas clusters 2, 3 and 4 had slightly larger ranges (Min.–Max.) and CVs than cluster 1. Although some differences existed between corresponding clusters obtained from the SM-based and NSMI-based cluster methods (Tables 1 and 2), the SOM statistics of corresponding clusters were broadly comparable. For instance, cluster 4 from the SM-based cluster method was similar to cluster 4 from the NSMI-based cluster method (the same SOM range), with only minor differences (mean = 22.75 g·kg⁻¹ and CV = 36.33% for the SM-based cluster method versus mean = 21.86 g·kg⁻¹ and CV = 37.33% for the NSMI-based cluster method).
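Assigning a new spectrum to an NSMI-based cluster then only requires its NSMI value and the thresholds listed above. The sketch below encodes those thresholds with `np.digitize`; the function name is hypothetical, and sending values outside the calibration range (above 0.9534 or below 0.0034) to the nearest end cluster is our assumption, since the text does not say how such values were handled.

```python
import numpy as np

# Inner NSMI thresholds from the text, in increasing order.
# Cluster 1 has the highest NSMI values and cluster 4 the lowest.
EDGES = np.array([0.3419, 0.4123, 0.5045])

def assign_nsmi_cluster(nsmi_values):
    """Map NSMI values to clusters 1-4 using lower-inclusive thresholds.

    Values outside the calibration range fall into cluster 1 or 4 (assumption).
    """
    v = np.asarray(nsmi_values, dtype=float)
    # np.digitize (right=False) returns b with EDGES[b-1] <= v < EDGES[b]:
    # 0 for v < 0.3419, 1 for [0.3419, 0.4123), 2 for [0.4123, 0.5045), 3 for v >= 0.5045.
    b = np.digitize(v, EDGES)
    return 4 - b
```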

Estimation of SOM with PLS-SVM Model
Separate PLS-SVM models for SOM estimation were built for each cluster generated by the SM-based cluster method (known SM) and by the NSMI-based cluster method (unknown SM); the cross-validation results are shown in Table 3. For the SM-based cluster method, the best model was obtained for cluster 4 (R²cv = 0.77 and RPD = 2.08). According to the five-level interpretation of RPD (Section 2.7), the model for cluster 1 was very good (RPD = 2.05), whereas the models for cluster 2 (RPD = 1.79) and cluster 3 (RPD = 1.90) were fair and good, respectively. For the NSMI-based cluster method (Table 3), the PLS-SVM models of clusters 2 and 3 performed slightly better than those of the corresponding SM-based clusters (RPD = 1.95 and 2.01, respectively), while the cross-validation accuracies (R²cv and RPD) for clusters 1 and 4 were lower than with the SM-based cluster method. Overall, the PLS-SVM cross-validation models achieved similar accuracies with the NSMI-based and SM-based cluster methods.
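The three statistics used to assess the models can be written compactly. In this sketch RPD is the standard deviation of the observed values divided by RMSE, and R² is computed as 1 − SS_res/SS_tot; whether the study used this form or the squared correlation coefficient is not stated here, so treat it as an assumption. The RPD class boundaries are hypothetical placeholders chosen only to agree with the labels quoted above (1.79 fair, 1.90 good, 2.05 very good); the actual five-level scheme is defined in Section 2.7.

```python
import numpy as np

def r2_rmse_rpd(y_obs, y_pred):
    """R^2 (as 1 - SS_res/SS_tot), RMSE and RPD (= SD of observed values / RMSE)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_obs - y_pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    r2 = float(1.0 - np.sum(resid ** 2) / np.sum((y_obs - y_obs.mean()) ** 2))
    rpd = float(np.std(y_obs, ddof=1) / rmse)
    return r2, rmse, rpd

# Hypothetical upper bounds for five RPD classes, chosen only to match the labels
# quoted in the text; the study's own scheme is given in Section 2.7.
RPD_CLASSES = [(1.4, "poor"), (1.8, "fair"), (2.0, "good"),
               (2.5, "very good"), (float("inf"), "excellent")]

def rpd_class(rpd):
    """Return the first class whose upper bound exceeds the RPD value."""
    for upper, label in RPD_CLASSES:
        if rpd < upper:
            return label
```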
To assess whether splitting the calibration dataset into smaller clusters improves SOM estimation at different SM levels, a PLS-SVM model was also built on the whole calibration dataset (n = 640, global calibration) and compared with the sub-models (Table 3). The sub-models outperformed the global model: RPD ranged from 1.79 to 2.08 for the SM-based cluster method and from 1.95 to 2.04 for the NSMI-based cluster method, whereas the global calibration reached an RPD of only 1.56.
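The comparison between a global calibration and cluster-specific sub-models can be illustrated with a toy example. To keep the sketch dependency-free, ordinary least squares with k-fold cross-validation stands in for PLS-SVM; this substitution, the helper names and the synthetic data are ours and not part of the study. When the relationship between a spectral predictor and SOM differs between moisture clusters, a single global model averages over incompatible relationships and its cross-validated RMSE rises.

```python
import numpy as np

def cv_predict(X, y, n_folds=5, seed=0):
    """K-fold cross-validated predictions from OLS with intercept (stand-in for PLS-SVM)."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds  # random, roughly balanced fold labels
    pred = np.empty(len(y))
    for f in range(n_folds):
        test = folds == f
        A_train = np.column_stack([np.ones(np.sum(~test)), X[~test]])
        coef, *_ = np.linalg.lstsq(A_train, y[~test], rcond=None)
        pred[test] = np.column_stack([np.ones(np.sum(test)), X[test]]) @ coef
    return pred

def rmse(obs, pred):
    return float(np.sqrt(np.mean((np.asarray(obs) - np.asarray(pred)) ** 2)))

def global_vs_clustered(X, y, labels, n_folds=5):
    """Cross-validated RMSE of one global model and of separate per-cluster sub-models."""
    pred_global = cv_predict(X, y, n_folds)
    pred_local = np.empty(len(y))
    for k in np.unique(labels):
        mask = labels == k
        pred_local[mask] = cv_predict(X[mask], y[mask], n_folds)
    return rmse(y, pred_global), rmse(y, pred_local)
```

Cross-validation is used here instead of calibration residuals because in-sample, per-cluster least-squares fits can never do worse than a single global fit, so an in-sample comparison would say nothing about predictive gain.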
The calibration models of the different clusters developed by the NSMI-based cluster method were further applied to the independent validation dataset (8 SM levels, n = 328), whose 328 soil samples were assigned to the four clusters on the basis of the calculated NSMI threshold criteria […]
[…] higher SM levels (SM > 17.66%), the decrease was not obvious (Figure 4a). This may be explained by Lobell and Asner [10], who showed that as SM increases, once most of the soil surface has absorbed enough water, the additional water that fills the micro- and macropores has little effect on the reflectance spectra. Mouazen et al. [25] established six SM levels (0% to 27.5% by weight) by adding water to samples from a single field and reported that the sensitivity of the reflectance spectra to SM decreased when SM exceeded 15%. Their results were in line with the findings of Nocita et al. [26], and our results are similar. We therefore believe that our experimental design can serve as a reference for future research at SM levels below field capacity.
Figure 5 shows that the influence of SM was more evident at longer wavelengths (1000–2400 nm), particularly in the strong SM absorption feature around 1940 nm, which even masked the absorption signals around 2200 nm associated with organic functional groups. However, published studies have indicated that several wavebands related to SOM (or SOC) lie within these SM-affected regions [5,28]. For example, Knadel et al. [41] noted that key components of organic matter have a peak around 1930 nm, and Vasques et al. [42] reported that the wavebands around 1400 nm and from 1800 to 2400 nm were especially important for SOC estimation. Therefore, a method that minimizes the effects of SM on reflectance spectra is essential for estimating SOM.

Clustering the Modeling Dataset into Different SM Levels
Although reflectance spectra are strongly influenced by SM, some studies have shown that if the calibration and validation datasets come from specific, approximately similar SM conditions, the effects of SM on SOM/SOC estimation are reduced, and moist VIS-NIR spectra can also be used to predict soil properties with suitable modeling strategies [2]. For instance, Rodionov et al. [43] found that SOC could be predicted separately at each SM level (5% to 25%, at 5% intervals) with RPD ranging from 2.25 to 3.07. Wang et al. [44] also pointed out that when SM was below 22%, SOM could be reliably predicted provided that the SM range at each level was well defined (i.e., samples within a level had similar SM).
In the current work, we assumed that the more similar two soil samples are in their VIS-NIR spectra, the more similar they are likely to be in SM. In other words, within a set of soil samples at different SM levels, the variation in SM can be explained to some degree by their spectral similarity or dissimilarity. We used FKM clustering to divide the calibration dataset at eight SM levels (n = 640) into smaller clusters, which partitioned the SM range into more specific SM conditions and reduced the non-linear effects of SM on SOM estimation. The cross-validation accuracies of the clustered models (RPD = 1.79–2.08) were higher than that of the global model (RPD = 1.56). Castaldi et al. [45] confirmed that, with a priori knowledge of SM, predictive models built for four SM classes improved the estimation accuracy of clay, and our results support their findings.

NSMI
Figure 9b shows that NSMI was highly correlated with SM across all SM levels, with a strong linear relationship. Moreover, NSMI requires the reflectance at only two wavelengths (1360 nm and 1940 nm) and no a priori knowledge of SM [26,35]. Thus, the NSMI derived from the VIS-NIR spectra is a useful and promising index for monitoring SM, and the proposed methodology is simple and easy to implement.
The correct classification (CC) rates obtained by the NSMI-based cluster method are provided in Table 2. Cluster 4 had the best classification result, with a CC of 90.93% (33 soil samples misclassified). The classification accuracies of the other three clusters ranked as follows: cluster 2 (CC = 82.98%, 16 soil samples misclassified) > cluster 1 (CC = 75.38%, 16 soil samples misclassified) > cluster 3 (CC = 71.79%, 33 soil samples misclassified). This suggests that the NSMI-based cluster method did not fully reproduce the SM-based clusters, but the CC values of all four clusters were still high (71.79–90.93%).
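The reported CC values are consistent with CC being the share of samples in each SM-based cluster (65, 94, 117 and 364 samples for clusters 1 to 4) that the NSMI thresholds place in the same cluster; for example, (364 − 33)/364 = 90.93% for cluster 4. The sketch below computes per-cluster CC and the number of misclassified samples from two label vectors; the function name is hypothetical.

```python
import numpy as np

def correct_classification(reference, predicted):
    """Per-cluster CC (%) and misclassified count, relative to the reference (SM-based) clusters."""
    reference = np.asarray(reference)
    predicted = np.asarray(predicted)
    cc = {}
    for k in np.unique(reference):
        mask = reference == k
        n_total = int(mask.sum())
        n_ok = int(np.sum(predicted[mask] == k))
        cc[int(k)] = (100.0 * n_ok / n_total, n_total - n_ok)
    return cc
```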
Notably, the cross-validation accuracy of each cluster in the NSMI-based cluster method was comparable to that of the corresponding cluster in the SM-based cluster method (Table 3). The independent validation of the NSMI-based cluster method yielded predictions ranging from good to very good (RPD = 1.87 to 2.06) (Figure 10), and the prediction accuracy of cluster 4 was superior to that of the other three clusters. One possible explanation is the finding summarized by Stenberg [2] that rewetting samples had a positive effect on the estimation of SOC content.
Through its kernel function, the PLS-SVM model can flexibly handle complex, non-linear regression problems. Other researchers have also concluded that SVM is a suitable multivariate method for VIS-NIR calibration on field-moist samples (e.g., Li et al. [46]; Xu et al. [47]). In future studies, the technique could be refined further by combining NSMI with the OSC and GLSW algorithms to investigate their potential for removing the effects of SM.

Conclusions
The results of our study clearly demonstrate the need to account for the influence of variable soil moisture (SM) when predicting soil organic matter (SOM) with VIS-NIR spectroscopy. Variable SM reduced VIS-NIR reflectance nonlinearly across the entire spectral range. When fuzzy k-means clustering was applied to partition the calibration dataset into four spectrally similar clusters (SM-based cluster method), the model accuracies in all clusters improved compared with the whole calibration dataset (n = 640, global calibration), indicating that clustering reduced the non-linear effect of SM. The normalized soil moisture index (NSMI) was strongly correlated with SM across all SM levels (validation R² = 0.8824). SOM estimation with the NSMI-based cluster method (unknown SM) achieved modeling accuracies comparable to those of the SM-based cluster method (known SM), and good to very good predictions of SOM (RPD = 1.87–2.06) were obtained with the NSMI-based cluster method. Moreover, the NSMI-based cluster method is easy to carry out because it uses only spectral information, which might facilitate the prediction of SOM in the field without explicit knowledge of SM. Because well-defined SM levels are difficult to obtain in the field, the present study was conducted in a controlled laboratory environment. In further studies, the effects of soil surface roughness, vegetation cover and soil composition (clay, sand) on reflectance spectra need to be taken into account. In addition, future studies should explore the potential of the NSMI-based cluster method combined with more advanced modeling techniques in other study areas.

Figure 1. Study area and the location of each sampling site.

Figure 2. The flow chart of the two classification methods (SM-based cluster method and NSMI-based cluster method) for SOM estimation at different SM levels in this study. n: the number of soil samples, same as below.

Figure 3. Box-plots, histograms and descriptive statistics of SOM: (a) the whole dataset; (b) calibration dataset; and (c) independent validation dataset. Min.: minimum, Max.: maximum, SD: standard deviation, CV: coefficient of variation, n: the number of soil samples.

Figure 4. (a) Mean reflectance at different SM levels in the calibration dataset and (b) two-dimensional sample-sample spectral angle (SA, by angle) between different SM levels. θ: spectral angle.

Figure 5. Two-dimensional synchronous correlation spectra of the mean reflectance at different SM levels (in the calibration dataset).

Figure 6. FPI, MPE and S values versus different numbers of classifications in the calibration dataset.

Remote Sens. 2017, 10, 28; doi:10.3390/rs10010028 www.mdpi.com/journal/remotesensing

Figure 7. Scatter plot of the first two principal components (PC1, PC2) of four spectral clusters in the calibration dataset (n = 640).

Figure 8. Box-plots of SM in four clusters using SM-based cluster method (in the calibration dataset).

Figure 9. (a) 2-D correlogram of the coefficient of determination (R²) between SM and NSMI indices (n = 640) and (b) the relationship between SM and the optimal NSMI index, (R1360 − R1940)/(R1360 + R1940), at different SM levels (n = 640). The blue line in (b) is the regression line.

Table 1. Statistical characteristics of SOM using SM-based cluster method in the calibration dataset.
a Minimum; b Maximum; c Standard deviation; d Coefficient of variation.

Table 2. Statistical characteristics of SOM using NSMI classification method in the calibration dataset.
a Minimum; b Maximum; c Standard deviation; d Coefficient of variation; e Correct classification.

Table 3. PLS-SVM cross-validation results for SOM estimation based on SM-based cluster method, NSMI-based cluster method, and the entire calibration dataset (n = 640).