Prediction of Soil Organic Carbon in a New Target Area by Near-Infrared Spectroscopy: Comparison of the Effects of Spiking in Different Scale Soil Spectral Libraries

Near-infrared (NIR) spectroscopy is widely used to predict soil organic carbon (SOC) because it is rapid and accurate under proper calibration. However, the prediction accuracy of the calibration model may be greatly reduced if the soil characteristics of some new target areas are different from the existing soil spectral library (SSL), which greatly limits the application potential of the technology. We attempted to solve the problem by building a large-scale SSL or using the spiking method. A total of 983 soil samples were collected from Zhejiang Province, and three SSLs were built according to geographic scope, representing the provincial, municipal, and district scales. The partial least squares (PLS) algorithm was applied to establish the calibration models based on the three SSLs, and the models were used to predict the SOC of two target areas in Zhejiang Province. The results show that the prediction accuracy of each model was relatively poor regardless of the scale of the SSL (residual predictive deviation (RPD) < 2.5). Then, the Kennard-Stone (KS) algorithm was applied to select 5 or 10 spiking samples from each target area. According to different SSLs and numbers of spiking samples, different spiked models were established by the PLS. The results show that the predictive ability of each model was improved by the spiking method, and the improvement effect was inversely proportional to the scale of the SSL. The spiked models built by combining the district scale SSL and a few spiking samples achieved good prediction of the SOC of two target areas (RPD = 2.72 and 3.13). Therefore, it is possible to accurately measure the SOC of new target areas by building a small-scale SSL with a few spiking samples.


Introduction
Soil is a natural resource that humans depend on for survival. Soil organic carbon (SOC) content is an important indicator to evaluate soil fertility, which directly reflects the soil quality and affects crop growth [1]. The measurement of SOC content is of great significance for accurate fertilization [2]. In addition, as a part of the global carbon cycle, studying the content and distribution of SOC helps human society understand the problem of global warming [3].
In recent decades, as a rapid, nondestructive, and accurate technique, near-infrared (NIR) spectroscopy has been widely used to predict soil nutrient information [4]. He et al. predicted soil This work explored how to use small-scale SSLs to predict the SOC of new target areas and studied the feasibility of the spiking method. It was expected that we would find an effective spiking strategy by adding different amounts of spiking samples and comparing the improvement effect of the predictive ability of the models, based on SSLs of different scales. In this work, three issues were mainly discussed and studied: (1) Does a larger SSL, on which the model is based, provide better prediction for new target areas? (2) Can the method of spiking improve the predictive ability of models built by SSLs of different scales? (3) What effect does the number of spiked samples have on the predictive ability of the model?

Soil Samples
The study area is located in Zhejiang Province (27 • 02 -31 • 11 N, 118 • 01 -123 • 10 E), People's Republic of China. Zhejiang is located in the middle of the subtropical zone, which belongs to the subtropical monsoon climate. According to Chinese standard GB/T 17296-2009 (classification and codes for Chinese soil) [16], the soil of the province is classified as mainly yellow and red soil, mostly distributed in hilly and mountainous areas. In addition, paddy soil is distributed in the plain area, and some saline and desalinized soils are distributed along the coastal area.
First, 714 surface soil samples were collected from 11 prefecture-level cities in Zhejiang Province to form the PSSL. The soil types included red, yellow, paddy, and saline soil. At each collection point, five soil samples were collected from the surface layer (0-20 cm) according to the "plum blossom" sampling method [17]. After picking out the straw and stones, the five soil samples were mixed evenly as an experimental soil sample. In order to reduce the influence of particle size and moisture on the measurement results, each sample was sequentially processed by drying, grinding, and sieving (2 mm aperture). Each processed sample was divided into two parts: one part was sent to the Institute of Soil Fertilizer (Zhejiang Academy of Agricultural Sciences) to measure the chemical reference value of SOC content (g kg −1 ) by the Walkley-Black method [18], and the other part was stored in a Petri dish for NIR spectroscopy measurement and analysis. Then 167 CSSL soil samples were collected by the same method from Jinhua City (28 • 32 -29 • 41 N, 119 • 14 -120 • 46 E). The soil types in the CSSL were mainly red, yellow, and paddy soil. In addition, 102 DSSL soil samples were similarly collected from farmland in Wencheng District (27 • 34 -27 • 59 N, 119 • 46 -120 • 15 E), Wenzhou City. The soil type in the DSSL was mainly paddy soil.
The first target area (TA1) was located in fields in Daoxu Town (30 • 00 -30 • 05 N, 120 • 43 -120 • 48 E), Shaoxing City. A total of 60 soil samples were collected. The typical soil type in TA1 was paddy soil, and the target fields in the area were mainly planted with double-season rice. The second target area (TA2) was located in fields in Hengdian Town (29 • 05 -29 • 13 N, 120 • 14 -120 • 22 E), Jinhua City. A total of 66 soil samples were collected. The typical soil types in TA2 were red, yellow, and paddy soil. The target fields in the area were mainly used for growing vegetables, soybeans, medicinal herbs, rice, corn, sweet potatoes, and other crops. A map showing the three soil libraries and two target areas, and the distribution of sampling points is shown in Figure 1.
In this work, the setting of the three soil libraries and the selection of the two target areas mainly considered two factors: first, the geographic scope covered by the soil libraries was from large to small, covering the provincial, municipal, and district scales in turn, and the corresponding soil sample quantity was also from large to small. In addition, as shown in Figure 1, the geographic scope of the PSSL included TA1 and TA2. The geographic scope of the CSSL included TA2, but not TA1. The geographic scope of the DSSL included neither TA1 nor TA2. Second, from the perspective of soil types, the PSSL contained all soil types in the province, but the proportion of paddy soil was not high (approximately 20%). Therefore, the target area with mainly the paddy soil type was selected as TA1 to study the universality of the large soil library. The soil types of the CSSL included both TA1 and TA2, but the number of soil samples was not as large as in the PSSL. The soil type of the DSSL was the same  In this work, the setting of the three soil libraries and the selection of the two target areas mainly considered two factors: first, the geographic scope covered by the soil libraries was from large to small, covering the provincial, municipal, and district scales in turn, and the corresponding soil sample quantity was also from large to small. In addition, as shown in Figure 1, the geographic scope of the PSSL included TA1 and TA2. The geographic scope of the CSSL included TA2, but not TA1. The geographic scope of the DSSL included neither TA1 nor TA2. Second, from the perspective of soil types, the PSSL contained all soil types in the province, but the proportion of paddy soil was not high (approximately 20%). Therefore, the target area with mainly the paddy soil type was selected as TA1 to study the universality of the large soil library. The soil types of the CSSL included both TA1 and TA2, but the number of soil samples was not as large as in the PSSL. The soil type of the DSSL was the same as TA1, but different from TA2. The main characteristics of all soil samples for the soil libraries and target areas are shown in Table 1.

NIR Spectral Data Measurement
In this work, a Bruker Fourier-type NIR spectrometer (Matrix-I, Bruker Optics Inc., Ettlingen, Germany) was used to measure the NIR spectral data of soil samples. The spectrometer was performed in the reflectance mode, the spectral measurement range of the instrument is 1000-2500 nm and the spectral resolution is 8 cm −1 . Each spectrum contained 1555 wavelength variables. Further, each soil sample was evenly dispersed in the sample cup so that the thickness of the sample reached the height of the sample cup, which was 20 mm. The NIR light source penetrated the quartz window from bottom to top to illuminate the soil sample placed in the sample cup with a diameter of 50 mm. During the measurement, the soil sample rotated with the sample cup to obtain the average spectrum. The rotation speed of the sample cup was 4 cm/s, and each soil sample was scanned 64 times in 40 s. These scans were arithmetic averaged and the result used as the soil sample spectrum. The background spectrum of the spectrometer was obtained by the OPUS software when the sample cup was empty, and the background spectrum was recalibrated every 10 samples. All soil samples in this work followed the same procedure. The Matrix-I was directly connected to a computer via a network cable, and OPUS Sensors 2020, 20, 4357 5 of 14 software (Bruker Optics Inc., Ettlingen, Germany) was used to control the spectrometer to collect data and process the data.

Modeling Methods
In the quantitative detection of SOC with NIR, the calibration set of representative soil samples should be selected first, and then the NIR spectrum and chemical SOC reference values of the soil samples in the calibration set are measured. In the NIR spectral data of soil samples obtained by the spectrometer, the information of various components was superimposed at each wavelength point. Due to the wide range of NIR spectral data and the serious overlap of spectral peaks, it is difficult to analyze directly. Therefore, it is necessary to extract effective information from the spectral data by chemometrics [19]. The calibration model between the NIR spectrum and the SOC reference values of soil samples was established by the chemometric method, so there was a certain statistical relationship between the spectrum and the corresponding SOC reference values. After establishing the calibration model, the NIR spectrum data of the new soil sample can be substituted into the model to predict the corresponding SOC.
The partial least squares (PLS) algorithm is a linear regression method commonly used in spectral data analysis [20]. The PLS algorithm simultaneously considers the spectral information and the corresponding physicochemical value information during modeling, and converts the original data into new variables termed latent variables (LVs), which are mutually orthogonal and uncorrelated through linear transformation. In this work, the optimal LVs used in the PLS model were determined by cross-validation with Unscrambler 9.7 software (CAMO Inc., Oslo, Norway).
Three original calibration models without the target area samples added were established by the PLS method based on the data in the three SSLs, called the unspiked models (UMs). Then, in order to study the effect of spiking, the Kennard-Stone (KS) algorithm was applied to select the spiking subset from each target area [21]. The specific steps were as follows. First, principal component analysis (PCA) was performed on the NIR spectra of the samples in sets TA1 and TA2. Then, two principal components were generated in each case based on the predefined interpretation variance threshold (>99%). Second, five spiking samples were selected from the sample set of a target area by the KS algorithm, so that they were evenly distributed in the space defined by the first two principal components. Furthermore, in order to compare the effect of different numbers of spiking samples on the model, the KS algorithm was also used to select 10 spiking samples from the sample set in a target area. Score plots of the two target area samples in principal component space defined by the first two principal components are shown in Figure 2. Finally, the remaining sample set of the target area was used as a prediction set to investigate the prediction capability of different models. Since for each target area at most 10 samples were selected for the original calibration model, the predicted sets of TA1pre and TA2pre consisted of 50 and 56 samples, respectively. The main characteristics of the two prediction sets are shown in Table 2. Table 2. Main characteristics of the two prediction sets. Finally, the remaining sample set of the target area was used as a prediction set to investigate the prediction capability of different models. Since for each target area at most 10 samples were selected for the original calibration model, the predicted sets of TA1 pre and TA2 pre consisted of 50 and 56 samples, respectively. The main characteristics of the two prediction sets are shown in Table 2.

Model Evaluation
In this work, the decision coefficient (R 2 p ), root mean square error of prediction (RMSEP), bias, and residual predictive deviation (RPD) were used to evaluate the predictive ability of the model. R 2 p was used to indicate the correlation between the predicted and the actual value of the model. The larger the R 2 p , the higher the correlation between predicted and actual value, and the better the model prediction effect [22]. The RMSEP was used to measure the deviation between predicted and the actual value. The smaller the RMSEP, the better the model prediction effect [23]. The RPD is an intuitive indicator used for evaluating a model's predictive ability in soil spectroscopy. Specifically, when the RPD is less than 1.5, it indicates that the model effect is poor. When the RPD is 1.5-2.0, it indicates that the model effect is moderate, i.e., it can only distinguish physical and chemical values. When the RPD is 2.0-2.5, it indicates that the model effect is relatively good and it can be used for quantitative analysis. When the RPD is 2.5-3.0, it indicates that the model is very effective and can be used for quantitative analysis. When the RPD is greater than 3, it indicates that the model has a very good prediction effect [24].

Stastical Analysis
The RPD allowed us to compare the accuracy obtained in different prediction models. In order to analyze whether the difference between the RPD obtained based on different scales SSL is effective, the one-way analysis of variance (ANOVA) was performed using three scales of SSL as the single factor. The specific method is as follows: First, 70% of the soil samples in PSSL were randomly selected for modeling, and then the model was used to predict the two prediction sets to obtain the RPD, respectively. Second, the same operations such as 70% random sampling, modeling and prediction were repeated 50 times, and the mean and standard deviation of the RPD were calculated. The third step was to perform the same operations as PSSL on the CSSL and DSSL models. Finally, the one-way ANOVA was performed to check if the differences between the results obtained with the different scales SSLs were effectively significant.

Soil Spectrum
Before establishing the calibration models, a preprocessing method of standard normal variate (SNV) combined with a detrending method with a polynomial order of 2 were used to eliminate the scattering effects in the spectra. The average NIR spectra after preprocessing of soil sample sets used in this work are shown in Figure 3.

Soil Spectrum
Before establishing the calibration models, a preprocessing method of standard normal variate (SNV) combined with a detrending method with a polynomial order of 2 were used to eliminate the scattering effects in the spectra. The average NIR spectra after preprocessing of soil sample sets used in this work are shown in Figure 3.

ANOVA Test
Based on 70% samples randomly selected from the three SSLs′ sample sets, each model PSSL, CSSL, and DSSL was established by PLS. After 50 predictions were performed by each model on two prediction sets, the statistical results of the R 2 , RMSEP, and RPD are shown in Table 3. Then, the one-way ANOVA test was performed using three scales of SSL as the single factor, and the results of the ANOVA to evaluate the effect of different scale SSLs on the RPD are shown in Table 4. The results show that different scales of SSLs have a significant impact on the predicted RPD (P < 0.001, Table 4), so the difference between the RPD obtained by different scales of SSL is valid.

ANOVA Test
Based on 70% samples randomly selected from the three SSLs sample sets, each model PSSL, CSSL, and DSSL was established by PLS. After 50 predictions were performed by each model on two prediction sets, the statistical results of the R 2 , RMSEP, and RPD are shown in Table 3. Then, the one-way ANOVA test was performed using three scales of SSL as the single factor, and the results of the ANOVA to evaluate the effect of different scale SSLs on the RPD are shown in Table 4. The results show that different scales of SSLs have a significant impact on the predicted RPD (p < 0.001, Table 4), so the difference between the RPD obtained by different scales of SSL is valid.

Unspiked Models
Based on the three SSLs, each unspiked model (UM), PSSL-UM, CSSL-UM, and DSSL-UM, was established by PLS. Then, these models were used to predict the SOC contents of the samples in the two prediction sets, TA1 pre and TA2 pre . The statistical results for the prediction performances of the UMs are shown in Table 5. First, when the PSSL-UM established by the largest SSL was used to predict TA1 pre and TA2 pre , the RPD was 1.90 and 2.17, respectively. This may be because the large-scale PSSL covered the geographic range and soil types of TA2 pre , so a relatively good prediction effect was obtained. Meanwhile, when the soil type of TA1 pre was insufficient in the PSSL, the prediction effect was poor. These conclusions are similar to the results of Guerrero et al., indicating that large-scale soil libraries need to include the similar characteristics of soil samples in the target area to ensure a good prediction effect [25].
Second, the best prediction effect (RPD = 2.36) was obtained when using the CSSL-UM to predict TA2 pre . Although the PSSL contained many more soil samples, CSSL may contain more samples similar to the soil characteristics in TA2 pre due to geographic location, so that the CSSL-UM achieved better prediction performance for TA2 pre . Other researchers found similar results. Zeng et al. reported that samples used in the calibration model were geographically close to the target area, so the modeled soils were likely to have similar characteristics and spectra to the target area samples [26]. Furthermore, compared with a calibration model using all samples of a large-scale SSL, Liu et al. selected samples with similar spectral characteristics to the prediction set samples from the large-scale SSL for calibration modeling, and the ability to predict SOC content was significantly improved [27]. However, when the CSSL neither covered the geographical range of TA1 pre nor contained enough samples of the same soil type, the prediction effect of the CSSL-UM on TA1 pre was very poor (RPD = 1.46).
Third, when using the DSSL-UM established by the smallest SSL to predict TA1 pre and TA2 pre , although the DSSL did not cover the geographic range of both target areas, the DSSL-UM had a moderate to good predictive ability (RPD = 2.05) for TA1 pre because the soil sample type in the DSSL was the same as TA1 pre . Araújo et al. found similar results; when the types and spectra of soil samples in the calibration set and the target area were similar, quantitative analysis of the target area could be carried out even if it was based on a small SSL [28]. However, when some soil types in TA2 pre were missing in the DSSL, the prediction effect of the DSSL-UM on TA2 pre was poor (RPD = 1.83).
Comparing the performance of the UMs, we believe that it was a relatively stable method to use a large-scale SSL covering a wide geographic range and containing a full range of soil types to model, which can generally achieve better prediction results for target areas. However, among all the UMs, the CSSL-UM provided the best prediction results for TA2 pre (R 2 = 0.81, RMSEP = 2.81, bias = 0.18, RPD = 2.36), indicating that the predictive ability of the model could be improved when the SSL contained more soil samples similar to the soil spectrum in target areas. Therefore, in order to obtain a better prediction effect, we studied whether the predictive ability of the model could be improved by means of spiking.

Spiked Models
The KS algorithm was used to select 5 or 10 spiking samples from each target area to establish the spiked calibration model. Then, based on different SSLs and numbers of spiking samples, each spiked model (SM), PSSL-SM, CSSL-SM, and DSSL-SM, was established by PLS, and these models were used to predict the SOC content of the samples in prediction sets TA1 pre and TA2 pre . The statistical results of the prediction performances of the SMs are shown in Table 6. No., number of spiking samples.
To formulate a more intuitive comparison before and after spiking, we needed to determine the differences between the prediction effects of spiking on the models established by the three SSLs containing different soil samples, soil types, and geographic ranges. Figures 4 and 5 show scatter plots of the models' predicted values for soil samples in prediction sets TA1 pre and TA2 pre , and the corresponding SOC laboratory reference value for each sample. Each row in the figures represents a model based on the same SSL. The left, middle, and right-hand columns show the prediction effect of the UMs, the SMs with 5 spiking samples from the target area, and the SMs with 10 spiking samples from the target area, respectively.     Three significant results are shown in Figures 4 and 5. First, the prediction effect of each model was improved with the addition of spiking samples from the target areas, which indicates that the spiking method increased the predictive ability of the model. Other researchers also obtained similar results. Hong et al. found that adding spiking samples to the original model increased the accuracy of the prediction model [29]. In addition, compared with the UMs, improvements in the predictive ability of SMs was inversely proportional to the scale of the SSL. This may be because the smaller the original SSL, the higher the corresponding proportion of spiking samples in the SSL. Therefore, the established prediction model was more consistent with the soil characteristics of the target area, thus obviously improving its predictive ability. Guerrero et al. found a similar phenomenon by comparing eight SSLs of different sizes; the improvement effect of spiking was weaker in the large-scale SSL models, but more obvious in the small-scale SSL models [30].
Second, the method of spiking can solve the problem of poor prediction effect of the original model due to the lack of soil type samples in the target area. Comparing Figure 4d-f, it can be seen that when the CSSL contained a low proportion of soil types in TA1, after adding 5 and 10 spiking samples of the target area, the prediction effect of the CSSL models on TA1 pre was continuously improved (the RPD increased from 1.46 to 2.29 and 2.66). In addition, comparing Figure 5g-i shows that even when the DSSL lacked some soil type samples in TA2, after adding 5 and 10 spiking samples of the target area, the prediction effect of the DSSL models on TA2 pre was also continuously improved (the RPD increased from 1.83 to 2.77 and 3.16). However, when the original SSL already contained sufficient soil samples of similar characteristics to the target area, the predictive ability of the model was not significantly improved by spiking, and decreased due to saturation with the excessive number of spiking samples. For example, comparing Figure 4h,i shows that when 10 spiking samples were added, the predictive ability of the DSSL model on TA1 pre was reduced compared with 5 spiking samples (the RPD decreased from 2.72 to 2.67). In addition, the predictive ability of the CSSL model on TA2 pre was also reduced when 10 spiking samples were added, compared with 5 spiking samples, as shown in Figure 5e,f (the RPD decreased from 2.70 to 2.64). Seidel et al. found a similar result; when the number of spiking samples reached 15-20, the predictive ability of the model declined due to saturation [31].
Third, the method of spiking caused the original SSL to not be affected by the geographic scope, so the model based on the small-scale SSL could be applied to the measurement of the target area outside the geographic scope of the SSL [32]. For example, the geographic scope of the DSSL did not include TA1 and TA2, but after adding 5 and 10 spiking samples, respectively, the corresponding prediction models achieved good results (RPD > 2.5), as shown in Figure 4h,i.
In my opinion, the number of spiking samples is firstly related to the method of selecting spiking samples. Guerrero et al. used the KS algorithm and selected 10% of the samples from the target area as the spiking samples according to the principle of uniform distribution of the principal component scores of the spectral data [30]. Seidel et al. used the KS algorithm and selected 12.5% of the samples from the target area as the spiking samples according to the principle of maximum differentiation based on the Mahalanobis distance of spectral data [31]. The results showed that these spiking samples had greatly improved the prediction accuracy of the model. Second, in order to make the spiking samples as representative of the soil characteristics of the target area as possible, the number of spiking samples should account for approximately 10% of the target set samples. In this work, for the small-scale SSL model DSSL-SM, the KS algorithm was used to select five spiking samples from two different target areas at a time (approximately 10% of the target set samples), the corresponding prediction models achieved good results (RPD > 2.5), as shown in Figures 4h and 5h. Third, for the calibration model based on large-scale SSL, when the number of spiking samples is not enough to improve the prediction accuracy of the model, it is also considered to increase the weight of spiking samples to enhance their influence on the calibration model. Jiang et al. built the calibration model based on 608 soil samples, and selected 10 spiking samples from the target area. The results showed that the prediction accuracy of the model gradually improved as the weight of spiking samples was increased by 1, 5, and 10 times [33].
Overall, for TA1 pre , the DSSL-SM with five spiking samples achieved the best prediction effect in all models, with an R 2 of 0.86, RMSEP of 1.92, bias of −0.15, and RPD of 2.72, as shown in Figure 4h. For TA2 pre , the DSSL-SM with 10 spiking samples achieved the best prediction effect in all models, with an R 2 of 0.90, RMSEP of 2.10, bias of 0.21, and RPD of 3.16, as shown in Figure 5i. In comparison, Castaldi et al. used 12,128 samples from the large-scale LUCAS soil library to predict the SOC of three target sites in Belgium and Luxembourg, and the best prediction accuracy of the model was an RPD of 2.54 and R 2 of 0. 8 [34]. This indicates that the prediction performance of the model based on the small-scale SSL can also be compared with that of the large-scale SSL model after adding a few (5-10) spiking samples.

Conclusions
In this work, three SSLs of different scales (PSSL, CSSL, and DSSL) were constructed using NIR spectral data, and prediction models were established through the PLS algorithm to predict the SOC of two target areas. Since SOC is affected by regional climatic conditions, vegetation, land use, and other factors, the feasibility of the spiking method was further studied in order to improve the universality of the prediction model. For the new target area, it was expected that a model with high prediction accuracy could be built by using the existing SSL with the fewest spiking samples.
The results show that there was no correspondence between the model's predictive ability and the scale of the SSL on which the model was based. In addition, the UMs (PSSL-UM, CSSL-UM, and DSSL-UM) established by the original SSLs were not accurate enough to predict the SOC of the new target areas. Then, the KS algorithm was used to select 5-10 spiking samples evenly distributed in the space defined by the first two principal components from the sample set of the target area. The results show that with the help of spiking samples, the predictive ability of each model was improved, and the improvement effect of the model based on the small-scale SSL was more obvious than that of the large-scale SSL. In this work, the DSSL-SM established by combining the original small-scale DSSL and a few spiking samples achieved good prediction of the SOC of two target areas. Therefore, it may not be necessary to build a large-scale SSL when using NIR spectroscopy to detect the SOC in a new target area. By building a small-scale SSL and measuring the chemical reference values of a few spiked samples, the SOC in a new target area can be accurately predicted.