Improving Spectral Estimation of Soil Organic Carbon Content through Semi-Supervised Regression

Visible and near infrared (VIS-NIR) spectroscopy has been applied to estimate soil organic carbon (SOC) content with many modeling strategies and techniques, and a crucial and challenging problem is obtaining accurate estimations from a limited number of samples with reference values (labeled samples). To address this problem, this study, with Honghu City (Hubei Province, China) as the study area, aimed to apply semi-supervised regression (SSR) to estimate SOC contents from VIS-NIR spectroscopy. A total of 252 soil samples were collected in four field campaigns for laboratory-based SOC content determinations and spectral measurements. Semi-supervised regression with co-training based on least squares support vector machine regression (Co-LSSVMR) was applied to estimate SOC contents from the spectra, and it was compared with LSSVMR. Results showed that, when the number of labeled samples was not excessively small, Co-LSSVMR could improve the estimations of SOC contents by exploiting samples without reference values (unlabeled samples) and produce better estimations than LSSVMR. Therefore, SSR could reduce the number of labeled samples required in calibration for a given accuracy threshold, and it holds advantages for SOC estimation from VIS-NIR spectroscopy with a limited number of labeled samples. Considering the increasing popularity of airborne platforms and sensors, SSR might be a promising modeling technique for SOC estimation from remotely sensed hyperspectral images.


Introduction
Soil organic carbon (SOC) plays important roles in the chemical and physical processes of the soil environment [1], and it is a key indicator of soil quality [2]. Therefore, effective estimation of SOC contents is helpful for soil quality mapping and precision agriculture [3]. Over the past several decades, visible and near infrared (VIS-NIR) reflectance spectroscopy has proven to be an efficient, non-destructive and cost-effective alternative for SOC content estimation [1,4–8]. Although most previous studies have focused on modeling SOC contents using laboratory-based reflectance spectroscopy, some studies have demonstrated the feasibility of estimating SOC contents from airborne and even spaceborne hyperspectral images at within-field and regional scales [5,7].
Several regression models, such as multiple linear regression, partial least squares regression [9], principal component regression, support vector machine regression (SVMR) [10], artificial neural networks and random forests, have been employed to estimate SOC contents from VIS-NIR spectroscopy [11]. For all of these methods, sufficient training samples describing the soil variation of the study area play a decisive role in accurately estimating soil properties, including SOC contents [12]. Traditional soil property determination is time-consuming and costly, which limits the number of samples available for model calibration. By contrast, the spectral measurements of soil samples used to derive soil properties are more efficient, and large numbers of soil spectra can be obtained with hyperspectral imaging systems under ideal soil and weather conditions.
To combine the above-mentioned traditional soil property determination with modern spectroscopy, semi-supervised learning (SSL) might be an attractive solution, because it is designed to enhance model performance by using both samples with reference values (labeled samples) and samples without reference values (unlabeled samples) [13]. The underlying idea of SSL is to exploit unlabeled samples to refine models initially calibrated with labeled samples. As a paradigm of SSL, co-training was first proposed by Blum and Mitchell [14]; it trains two classifiers separately on two sufficient and redundant views. An algorithm proposed by Goldman and Zhou [15] instead trains two classifiers on a single view using two different supervised learning algorithms, and it has attracted considerable attention in text classification [16] and sentiment classification [17].
SSL with co-training has also been introduced to regression. A semi-supervised regression (SSR) approach called Co-training Regressors was proposed by Zhou and Li [18]; it generates two k-nearest neighbor models on the same dataset with different distance metrics, and during the learning phase each model labels unlabeled data for the other. The labeling confidence of an unlabeled sample is determined by how much it reduces the mean squared error on the labeled samples, and the final estimation is obtained by averaging the estimates of the two refined models. Although some problems of SSL remain to be resolved, such as the stopping criterion and the risk of introducing noise, studies [19,20] have indicated that SSL might be a promising technique in both qualitative and quantitative remote sensing, such as image classification, spectral unmixing and water quality parameter retrieval. To our knowledge, no study has applied SSR to estimate SOC contents from VIS-NIR spectroscopy.
Using laboratory-based VIS-NIR reflectance spectroscopy, this study aimed to: (i) evaluate the effectiveness of SSR in improving SOC content estimation by exploiting unlabeled samples; (ii) determine the behavior of SSR with respect to the percentage of labeled samples; and (iii) investigate the sensitivity of SSR to the numbers of labeled and unlabeled samples.

Least Squares Support Vector Machine Regression
Support vector machine regression (SVMR) can fit complex relationships by mapping training data non-linearly into a high-dimensional feature space using a kernel function [21]. Least squares SVMR (LSSVMR) is a modified version of SVMR [22] that uses a least squares error term in the training objective, so that training reduces to solving a set of linear equations. LSSVMR has a simpler training process than SVMR [22], and it has proven to be a favorable supervised calibration technique for estimating soil properties from VIS-NIR spectroscopy [23–26].
The LSSVMR model can be expressed as follows:
$$f(x) = \sum_{i=1}^{|L|} \alpha_i K(x_i, x) + b$$
where $K(x_i, x)$ is the kernel function, $|L|$ is the number of training samples, and $\alpha$ and $b$ are the regression coefficients. The most popular kernel function is the radial basis function (RBF), $K(x_i, x) = \exp\left(-\|x_i - x\|^2 / (2\sigma^2)\right)$, where $\sigma^2$ is the width of the Gaussian function, owing to its adaptability to non-linear data [25]. RBF4, a variant of the RBF kernel, $K(x_i, x) = \frac{1}{2}\left(3 - \|x_i - x\|^2 / \sigma^2\right)\exp\left(-\|x_i - x\|^2 / (2\sigma^2)\right)$ [27], was used in this study to generate diversity in SSR with co-training.
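The model and the two kernels above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the LS-SVMlab implementation used in the study: the dual linear system is the standard LS-SVM formulation, while the regularization parameter `gamma`, the helper names (`lssvmr_fit`, `lssvmr_predict`) and the toy data are assumptions introduced here, and the hyperparameter values are arbitrary.

```python
import numpy as np

def sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    d = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.maximum(d, 0.0)  # clip tiny negatives caused by round-off

def rbf(A, B, sigma2):
    """RBF kernel: exp(-||x_i - x||^2 / (2 sigma^2))."""
    return np.exp(-sq_dists(A, B) / (2.0 * sigma2))

def rbf4(A, B, sigma2):
    """RBF4 kernel: 0.5 * (3 - ||x_i - x||^2 / sigma^2) * exp(-||x_i - x||^2 / (2 sigma^2))."""
    d = sq_dists(A, B)
    return 0.5 * (3.0 - d / sigma2) * np.exp(-d / (2.0 * sigma2))

def lssvmr_fit(X, y, kernel, sigma2, gamma):
    """Solve the LS-SVM dual system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = kernel(X, X, sigma2) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return {"X": X, "b": sol[0], "alpha": sol[1:], "kernel": kernel, "sigma2": sigma2}

def lssvmr_predict(model, X_new):
    """f(x) = sum_i alpha_i K(x_i, x) + b for every row x of X_new."""
    K = model["kernel"](X_new, model["X"], model["sigma2"])
    return K @ model["alpha"] + model["b"]

# Toy usage on synthetic "spectra" (random numbers, not soil measurements)
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=30)
model_rbf = lssvmr_fit(X, y, rbf, sigma2=8.0, gamma=10.0)
model_rbf4 = lssvmr_fit(X, y, rbf4, sigma2=8.0, gamma=10.0)
y_hat = 0.5 * (lssvmr_predict(model_rbf, X[:5]) + lssvmr_predict(model_rbf4, X[:5]))
```

At zero distance the RBF kernel equals 1 and RBF4 equals 1.5, and the first row of the system forces the coefficients `alpha` to sum to zero.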
For more details about LSSVMR, refer to Appendix A.

Semi-Supervised Regression with Co-Training Based on LSSVMR (Co-LSSVMR)
Let $L = \{(x_1, y_1), (x_2, y_2), \ldots, (x_{|L|}, y_{|L|})\}$ denote the labeled sample set, where $x_i$ represents a soil spectrum vector and $y_i$ is the associated SOC content. Let $U$ denote the unlabeled sample set, whose soil spectra are available but whose SOC contents are unknown.
In SSR with co-training, two regressors are first trained with labeled samples, and each regressor is then gradually refined with the unlabeled samples selected by the other regressor during co-training. The first key point of this algorithm is building two diverse regressors. In this study, the difference between the two regressors was achieved by using two different kernel functions, because the LSSVMR solution is obtained in a kernel-induced feature space. The second key point is determining the labeling confidence of the unlabeled samples. In [18], the labeling confidence is rated by the influence of each unlabeled sample on the labeled samples, and the sample with the highest labeling confidence is the one that most reduces the fitting error when used in calibration. In this study, the labeling confidence of each unlabeled sample was measured by the reduction in the root mean square error of leave-one-out cross-validation (RMSECV) after the sample, with its estimated label, was added to the training set.
Two other important problems to be addressed in SSR with co-training are the stopping criterion and model selection during the learning phase. In theory, a weak learner trained with labeled samples can be raised to arbitrary precision through the continued use of unlabeled samples [28]. However, experiments have shown that learning performance could not be improved further after a number of learning iterations [28]. Hence, to avoid overfitting, the number of learning rounds is often fixed in advance [18]. However, in our initial experiments, a fixed number of learning iterations often failed to select the best model; thus, in this study, the learning phase continued until all potentially useful unlabeled samples had been used. RMSECV and the root mean square error of calibration (RMSEC) were tested as model selection criteria, and the model with the lowest RMSECV or RMSEC was selected for each regressor.
In Co-LSSVMR, two LSSVMR models ($s_1$ and $s_2$), calibrated on the labeled samples with two kernel functions (RBF and RBF4), were used to label the unlabeled samples and to determine the labeling confidence. In each iteration, the sample labeled with the highest confidence by one regressor was used to refine the other. The learning process continued until no unlabeled sample could reduce the RMSECV of either model. After the learning phase, one model was selected for each regressor according to the model selection criterion, and the final estimation was obtained by averaging the estimations of these two refined models. The steps of Co-LSSVMR are illustrated in Figure 1 and summarized as follows:
(1) Copy the labeled set $L$ to $L_1$ and $L_2$.
(2) Train two LSSVMR regressors, $s_1$ and $s_2$, on $L_1$ and $L_2$ with RBF and RBF4 as their kernel functions, respectively.
(3) Obtain the labeled sets $U_1$ and $U_2$ by estimating the SOC contents of the unlabeled set $U$ with $s_1$ and $s_2$, respectively.
(4) Add the most confidently labeled sample $x_u$ of $U_2$ to $L_1$, and remove $x_u$ from $U_1$ and $U_2$; $x_u$ is the sample that yields the largest reduction in the RMSECV of $s_2$.
(5) Add the most confidently labeled sample $x_u$ of $U_1$ to $L_2$, and remove $x_u$ from $U_1$ and $U_2$; $x_u$ is the sample that yields the largest reduction in the RMSECV of $s_1$.
(6) Retrain $s_1$ and $s_2$, and update the labels of $U_1$ and $U_2$ with $s_1$ and $s_2$, respectively.
(7) Repeat Steps (4)–(6) until neither $L_1$ nor $L_2$ changes.
(8) Select a refined model for each regressor ($s_1$ and $s_2$) according to the model selection criterion.
(9) For a sample to be estimated, take the average of the estimations of the two refined models as the final estimation.
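The nine steps can be sketched in Python/NumPy as follows. It is a hedged, simplified illustration rather than the authors' MATLAB code. Assumptions: the hyperparameters are fixed instead of optimized; RMSECV is computed by brute-force leave-one-out refitting; RMSEC is computed on each regressor's current training set (labeled plus pseudo-labeled samples); Step 5 scores candidates against $L_1$ as it was before Step 4's addition; and the helper names (`fit`, `rmsecv`, `co_lssvmr`) are hypothetical.

```python
import numpy as np

def _sq_dists(A, B):
    d = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.maximum(d, 0.0)

def rbf(A, B, s2):
    return np.exp(-_sq_dists(A, B) / (2.0 * s2))

def rbf4(A, B, s2):
    d = _sq_dists(A, B)
    return 0.5 * (3.0 - d / s2) * np.exp(-d / (2.0 * s2))

def fit(X, y, kern, s2=10.0, gam=10.0):
    """LSSVMR with fixed, illustrative hyperparameters; returns a predict function."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = kern(X, X, s2) + np.eye(n) / gam
    sol = np.linalg.solve(A, np.r_[0.0, y])
    return lambda Z: kern(Z, X, s2) @ sol[1:] + sol[0]

def rmse(a, b):
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

def rmsecv(X, y, kern):
    """Leave-one-out RMSE, computed by brute force (one refit per sample)."""
    idx = np.arange(len(y))
    preds = [fit(X[idx != i], y[idx != i], kern)(X[i:i + 1])[0] for i in idx]
    return rmse(y, preds)

def co_lssvmr(XL, yL, XU, kernels=(rbf, rbf4)):
    L = [(XL, yL), (XL, yL)]                 # Step 1: L1 and L2 both start as L
    pool = list(range(len(XU)))              # indices still present in U1/U2
    history = ([], [])                       # (RMSEC, model) per round, per regressor
    while True:
        models = [fit(X, y, k) for (X, y), k in zip(L, kernels)]  # Steps 2 and 6
        for j in (0, 1):                     # RMSEC on the current training set
            history[j].append((rmse(L[j][1], models[j](L[j][0])), models[j]))
        snapshot, changed = list(L), False
        for j in (1, 0):                     # Step 4 (s2 -> L1), then Step 5 (s1 -> L2)
            if not pool:
                break
            X, y = snapshot[j]
            y_u = models[j](XU[pool])        # Steps 3/6: labels from regressor j
            base = rmsecv(X, y, kernels[j])
            drop = [base - rmsecv(np.vstack([X, XU[u]]), np.r_[y, y_u[p]], kernels[j])
                    for p, u in enumerate(pool)]
            best = int(np.argmax(drop))
            if drop[best] <= 0:              # no sample reduces this regressor's RMSECV
                continue
            u = pool.pop(best)               # remove x_u from both U1 and U2
            other = 1 - j
            L[other] = (np.vstack([L[other][0], XU[u]]), np.r_[L[other][1], y_u[best]])
            changed = True
        if not changed:                      # Step 7: neither L1 nor L2 changed
            break
    # Step 8: per regressor, keep the model with the lowest RMSEC; Step 9: average
    chosen = [min(h, key=lambda t: t[0])[1] for h in history]
    return lambda Z: 0.5 * (chosen[0](Z) + chosen[1](Z))

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
y = X @ np.array([1.0, -0.5, 0.2, 0.0]) + 0.05 * rng.normal(size=20)
predict = co_lssvmr(X[:12], y[:12], X[12:])  # 12 labeled, 8 unlabeled (synthetic)
estimates = predict(X[:3])
```

Because the round-0 models (trained on labeled samples only) stay in the history, the RMSEC-based selection in Step 8 can fall back to plain LSSVMR if no co-trained model fits better.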

Study Area and Field Sampling
Honghu City (113  C and a mean annual precipitation ranging from 1000 to 1300 mm. The landform is flat and extensive, with an average elevation of less than 50 m. The parent material is dominated by Quaternary alluvial deposits and lacustrine sediments, and the main soil types are paddy soil (Anthrosols) and fluvo-aquic soil (Gleysols) [29,30].
A total of 252 soil samples were collected in four field campaigns (108 on 20–21 December 2011, 60 on 10–11 July 2012, 40 on 17–19 November 2012 and 44 on 14–15 April 2013); 180 of them were fluvo-aquic soil and the others were paddy soil. At each sampling point, about 1.0 kg of surface soil (0–10 cm) was collected after removing plant material, plant residues, roots and stones. Each soil sample was kept in a sealed package for spectral measurement and SOC content determination in the laboratory.

Laboratory Analyses and Measurements
After being air-dried at room temperature for three days and cleared of stones and plant residues, all soil samples were ground with an agate mortar and passed through a 20-mesh sieve (<2 mm). Each sample was placed in a 10 cm-diameter petri dish. The measurement geometry was detailed by Shi [6]. The reflectance spectra were measured in the laboratory with an ASD FieldSpec 3 portable spectroradiometer (Analytical Spectral Devices, Inc., Boulder, CO, USA) with a wavelength range of 350–2500 nm. Its sampling intervals are 1.4 nm in the 350–1000 nm range and 2 nm in the 1000–2500 nm range, and its spectral resolutions are 3 nm in the 350–1000 nm range and 10 nm in the 1000–2500 nm range. The measured values are interpolated, so the spectroradiometer provides a spectrum of 2151 bands at a uniform 1 nm interval. Calibration with a standardized white Spectralon panel (Labsphere, Inc., North Sutton, NH, USA) with near 100% reflectance was performed prior to each scan. The average of 10 spectral measurements of each sample was used as its final reflectance spectrum. After the spectral measurements, the SOC contents of all soil samples were determined in the laboratory using the Walkley and Black method [31], which is based on wet oxidation with potassium dichromate.

Spectral Preprocessing and Outlier Detection
Considering the high noise at the spectral edges, the reflectance spectra were trimmed to 400–2450 nm and then smoothed with the Savitzky–Golay method [32] using a second-order polynomial and a window of 9 data points. Outliers were detected using the robust principal component analysis (ROBPCA) method: samples with a large score distance within the principal component analysis (PCA) subspace and a large orthogonal distance to that subspace were identified as outliers. ROBPCA was implemented with a MATLAB toolbox provided by Verboven and Hubert [33]. Six outliers (Figure 2) were detected and eliminated.
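A NumPy sketch of the trimming, smoothing and distance computations described above. Two loud assumptions: the Savitzky–Golay edges are handled here by repeating end values (other edge rules exist), and classical PCA stands in for ROBPCA, so the score distance (SD) and orthogonal distance (OD) below are not robust and ROBPCA's cutoff values are omitted. The function names are hypothetical.

```python
import numpy as np

wavelengths = np.arange(350, 2501)               # 2151 bands at 1 nm, as delivered
keep = (wavelengths >= 400) & (wavelengths <= 2450)

def savgol_weights(window=9, order=2):
    """Savitzky-Golay weights: least-squares polynomial fit evaluated at the window centre."""
    half = window // 2
    t = np.arange(-half, half + 1)
    V = np.vander(t, order + 1, increasing=True)     # columns 1, t, t^2, ...
    return np.linalg.pinv(V)[0]                      # row giving the fitted value at t = 0

def savgol_smooth(spectra, window=9, order=2):
    """Smooth each spectrum (row); edges padded by repeating end values (assumption)."""
    w = savgol_weights(window, order)
    half = window // 2
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    return np.array([np.convolve(row, w[::-1], mode="valid") for row in padded])

def pca_distances(X, k):
    """Score distance (SD) and orthogonal distance (OD) for a k-component PCA model.
    Classical (non-robust) PCA replaces ROBPCA's robust centre and loadings here."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                                     # loadings
    T = Xc @ P                                       # scores
    var = s[:k] ** 2 / (len(X) - 1)                  # variance of each score
    sd = np.sqrt((T ** 2 / var).sum(axis=1))         # distance within the PCA subspace
    od = np.linalg.norm(Xc - T @ P.T, axis=1)        # distance to the PCA subspace
    return sd, od

rng = np.random.default_rng(2)
raw = rng.normal(size=(5, wavelengths.size)).cumsum(axis=1)   # synthetic spectra
smoothed = savgol_smooth(raw[:, keep])
sd, od = pca_distances(smoothed, k=3)
```

For a 9-point quadratic window the weights reduce to the classic table (−21, 14, 39, 54, 59, 54, 39, 14, −21)/231, and a quadratic signal passes through unchanged away from the edges.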
To reduce data dimensionality and to match the finer spectral resolution (3 nm) of the spectroradiometer, the reflectance spectra were resampled at 3 nm intervals, providing 681 variables. The SOC contents of the remaining 246 samples were described statistically. The soil samples were divided into four groups of equal size according to their SOC contents, from low to high, and the average spectrum of each group was calculated, visualized and analyzed.
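One way to form the four SOC groups and their average spectra, sketched with NumPy on synthetic data. Because 246 is not divisible by four, `np.array_split` yields groups of 62, 62, 61 and 61 samples; the text does not say how the remainder was handled, so this split, like the function name `soc_group_means`, is an assumption.

```python
import numpy as np

def soc_group_means(spectra, soc, n_groups=4):
    """Sort samples by SOC, split into near-equal groups, and average each group."""
    order = np.argsort(soc)                            # low -> high SOC
    groups = np.array_split(order, n_groups)           # sizes differ by at most one
    mean_soc = np.array([soc[g].mean() for g in groups])
    mean_spectra = np.vstack([spectra[g].mean(axis=0) for g in groups])
    return mean_soc, mean_spectra, [len(g) for g in groups]

rng = np.random.default_rng(3)
soc = rng.gamma(shape=3.0, scale=4.0, size=246)         # synthetic, positively skewed
spectra = rng.normal(size=(246, 681))                   # synthetic 681-variable spectra
mean_soc, mean_spectra, sizes = soc_group_means(spectra, soc)
```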

Model Calibration
Models were calibrated using the settings in Figure 1a–c, respectively, to investigate the sensitivity of SSR to the percentage of labeled samples and to the numbers of labeled and unlabeled samples. A total of 164 of the 246 samples were selected as the candidate calibration dataset with the Kennard–Stone (KS) algorithm [34], and the selected samples were recorded in the order produced by the KS algorithm. The remaining 82 samples were used as the validation dataset. To investigate the sensitivity of SSR to the percentage of labeled samples, the first m samples (m = 10, 20, …, 160 and 164) of the 164 samples were used as labeled samples, and the remaining 164 − m samples were used as unlabeled samples in Co-LSSVMR. Considering the popularity of partial least squares regression (PLSR) for SOC content estimation [35], PLSR models were also calibrated with only the first m labeled samples and compared with the LSSVMR and Co-LSSVMR models.
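A compact Kennard–Stone sketch that returns indices in selection order, which is what allows the candidate set to be consumed as "the first m samples" above. It assumes Euclidean distances on the preprocessed spectra (the distance space is not stated in the text), and random data stand in for the soil spectra; `kennard_stone` is a hypothetical helper name.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return indices of n_select rows of X in Kennard-Stone selection order."""
    sq = (X ** 2).sum(axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    i, j = np.unravel_index(np.argmax(D), D.shape)    # start with the two most distant samples
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    # distance from each remaining sample to its nearest already-selected sample
    min_d = np.minimum(D[remaining, i], D[remaining, j])
    while len(selected) < n_select:
        pos = int(np.argmax(min_d))                   # the sample farthest from the selected set
        new = remaining.pop(pos)
        selected.append(new)
        min_d = np.minimum(np.delete(min_d, pos), D[remaining, new])
    return np.array(selected[:n_select])

rng = np.random.default_rng(4)
X = rng.normal(size=(246, 50))                        # synthetic stand-in for 246 spectra
cal = kennard_stone(X, 164)                           # candidate calibration set, in KS order
val = np.setdiff1d(np.arange(len(X)), cal)            # the remaining 82 samples
```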
In the above experiments, the gains in estimation accuracy diminished once 100 of the 164 samples were used as labeled samples. To assess the sensitivity of Co-LSSVMR to the number of labeled samples, the unlabeled samples were kept fixed: the last 64 of the 164 samples were used as unlabeled samples, and the first m samples (m = 10, 20, …, 100) were used as labeled samples for Co-LSSVMR. Model performance was studied as the number of labeled samples increased gradually.
To investigate the sensitivity of Co-LSSVMR to the number of unlabeled samples, the labeled samples used for calibration were kept fixed: the first 80 of the 164 samples were used as labeled samples, and the following 10, 20, …, 60 and 64 samples were used as unlabeled samples. Model performance was studied as the number of unlabeled samples increased gradually.
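The three labeled/unlabeled settings described above can be enumerated as index ranges over the KS-ordered candidate set. This illustrative helper (the name `calibration_settings` is hypothetical) encodes only the sample counts stated in the text; positions 0 to 163 refer to the candidate calibration samples in KS order.

```python
def calibration_settings(n_cal=164):
    """Yield (setting, n_varied, labeled_positions, unlabeled_positions)."""
    # (a) percentage of labeled samples: first m labeled, remaining n_cal - m unlabeled
    for m in list(range(10, 161, 10)) + [n_cal]:
        yield "a", m, list(range(m)), list(range(m, n_cal))
    # (b) number of labeled samples: last 64 fixed as unlabeled, first m labeled
    for m in range(10, 101, 10):
        yield "b", m, list(range(m)), list(range(n_cal - 64, n_cal))
    # (c) number of unlabeled samples: first 80 labeled, the following k unlabeled
    for k in list(range(10, 61, 10)) + [64]:
        yield "c", k, list(range(80)), list(range(80, 80 + k))

settings = list(calibration_settings())
```

In every setting the labeled and unlabeled positions are disjoint, and at m = 164 setting (a) leaves no unlabeled samples, so Co-LSSVMR reduces to the initial LSSVMR pair.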

Model Evaluation and Comparison
To compare LSSVMR with Co-LSSVMR, the average of the SOC estimations of the two initial LSSVMR models, calibrated only with labeled samples, was taken as the LSSVMR estimation. Estimation performance was evaluated on the validation dataset, and accuracy was assessed by the root mean square error of validation (RMSEV), the coefficient of determination (R²V) and the ratio of the inter-quartile range to RMSEV (RPIQ) [36,37]. Bellon-Maurel et al. [37] suggested that RPIQ, being based on quartiles, might be a better performance indicator than the residual prediction deviation (RPD). Moreover, the gain in estimation accuracy of Co-LSSVMR relative to LSSVMR was calculated as (RMSEV of LSSVMR − RMSEV of Co-LSSVMR)/RMSEV of LSSVMR.
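The evaluation metrics can be written as small NumPy functions. Assumptions: R² is computed as 1 − SSE/SST (rather than as a squared correlation), quartiles use NumPy's default linear interpolation, and the gain is expressed in percent; the function names are hypothetical.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error, e.g. RMSEV on the validation set."""
    return float(np.sqrt(np.mean((np.asarray(y, float) - np.asarray(y_hat, float)) ** 2)))

def r2(y, y_hat):
    """Coefficient of determination as 1 - SSE/SST."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))

def rpiq(y, y_hat):
    """Inter-quartile range of the measured values divided by RMSEV."""
    q1, q3 = np.percentile(y, [25, 75])
    return float((q3 - q1) / rmse(y, y_hat))

def accuracy_gain(rmsev_lssvmr, rmsev_co):
    """Relative RMSEV reduction of Co-LSSVMR with respect to LSSVMR, in percent."""
    return 100.0 * (rmsev_lssvmr - rmsev_co) / rmsev_lssvmr
```

For example, an RMSEV falling from 10.0 to 8.0 corresponds to a 20% gain; a negative value means Co-LSSVMR was less accurate than LSSVMR.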

All programs, including spectra preprocessing, parameter optimization and modeling, were implemented in MATLAB 7.11.0 (www.mathworks.com), and the Parallel Computing Toolbox was used to improve computing efficiency. The LS-SVMlab toolbox [27] was used to implement LSSVMR.

Descriptive Statistics and Reflectance Spectra of Soil Samples
The statistical descriptions of SOC contents for the whole, candidate calibration and validation datasets are shown in Table 1. The SOC contents of the whole dataset varied from 0.76 to 45.73 g·kg⁻¹, with a mean of 11.51 g·kg⁻¹ and a median of 10.04 g·kg⁻¹. The whole, calibration and validation datasets were all positively skewed, with skewness values of 1.01, 1.26 and 0.65 and kurtosis values of −0.83, 2.74 and 1.28, respectively. The average reflectance curves show the typical patterns of soil spectra in the VIS-NIR region, with three prominent absorption features around 1400, 1900 and 2200 nm (Figure 3). The absorption near 1400 nm is due to the first overtone of OH stretching, and the absorption near 1900 nm is due to the combination of OH stretching and H-O-H bending [38]. The absorption near 2200 nm results from OH stretching and Al/Fe-OH bending [25]. The average spectrum of the group with 4.31 g·kg⁻¹ SOC has the highest reflectance, and that of the group with 21.35 g·kg⁻¹ SOC has the lowest. However, the spectral curves of the other two groups are very close to each other, possibly indicating a non-linear relationship between spectra and SOC contents, because SOC and other soil constituents jointly determine a soil spectrum.
Remote Sens. 2017, 9, 29 7 of 20

Sensitivity to the Percentage of Labeled Samples
To illustrate the models' behavior during co-training, the RMSECV, RMSEC and RMSEV of the two regressors are plotted against the number of unlabeled samples exploited for the scenarios with 10, 70 and 150 labeled samples in the calibration (Figure 4). In all three cases, the RMSECV of the two regressors decreased as more unlabeled samples were incorporated in the calibration, whereas RMSEC and RMSEV displayed different patterns. When only 10 labeled samples were used, clear overfitting was observed, with RMSEC close to 0, while the validation performance of the two regressors deteriorated gradually. The calibration and validation accuracies of the two regressors clearly increased when 70 labeled samples were used. For 150 labeled samples, only a slight improvement was observed in RMSEC and RMSEV. Besides the case with 10 labeled samples, overfitting was observed whenever no more than 40 labeled samples were used in the calibration, which indicated that 40 samples were not sufficient to capture the SOC variation in this study area. Therefore, the cases with fewer than 40 labeled samples were not considered in determining the model selection criterion. RMSEV versus RMSEC and RMSEV versus RMSECV are plotted (Figures 5 and 6) to examine how validation performance relates to calibration accuracy and cross-validation accuracy. In most cases, RMSEV followed the same increasing or decreasing trend as RMSEC (Figure 5), whereas RMSEV was only weakly correlated with RMSECV (Figure 6), because a decrease in RMSECV did not necessarily lead to a decrease in RMSEV. Thus, RMSEC was chosen as the model selection criterion, and the refined model for each regressor was the one with the lowest RMSEC. The cases with fewer than 40 labeled samples in the calibration also adopted this model selection criterion.
The percentage of labeled samples used in calibration had an obvious effect on the estimation accuracies of both the models trained only with labeled samples and the refined models of Co-LSSVMR. In general, more labeled samples were more likely to result in better estimation performance (Figure 7a,b). When only 10 labeled samples were used, both the LSSVMR models and the refined Co-LSSVMR models produced poor estimations, with high RMSEV and low R²V. The performances of the two LSSVMR models improved gradually as the number of labeled samples increased to 100, at which point performance leveled off (Figure 7a). No notable improvement was observed for the two refined models in Co-LSSVMR as the number of labeled samples increased from 20 to 50; however, the estimation accuracies improved remarkably as the number of labeled samples increased from 50 to 60. Thereafter, the performances of both models improved until they leveled off at 80 labeled samples.
The number of unlabeled samples exploited and the total number of labeled and unlabeled samples used by each refined model in Co-LSSVMR are summarized in Figure 7c,d. The number of unlabeled samples used by each refined model increased quickly as the number of labeled samples increased from 10 to 30. This number decreased when more than 50 labeled samples were used, roughly following the maximum number of unlabeled samples available to each regressor. The total number of labeled and unlabeled samples used by each refined model increased approximately in line with the maximum number available to each regressor when more than 30 labeled samples were used in the calibration.
Figure 8 illustrates and compares the final estimation performances of LSSVMR and Co-LSSVMR. The RMSEV obtained by LSSVMR decreased gradually from 5.16 g·kg⁻¹ at 30 labeled samples (RPIQ = 1.97, R²V = 0.46) to 3.12 g·kg⁻¹ at 100 labeled samples (RPIQ = 3.31, R²V = 0.79) and then remained relatively stable. The RMSEV obtained by Co-LSSVMR decreased sharply from 4.62 g·kg⁻¹ at 50 labeled samples (RPIQ = 2.19, R²V = 0.53) to 3.58 g·kg⁻¹ at 60 labeled samples (RPIQ = 2.84, R²V =
The gains in estimation accuracy achieved by Co-LSSVMR relative to LSSVMR varied with the percentage of labeled samples used in calibration (Figure 8c). When fewer than 40 of the 164 samples were used as labeled samples, Co-LSSVMR performed similarly to or worse than LSSVMR, whereas Co-LSSVMR had an advantage over LSSVMR, with gains in accuracy above 15%, when 60 to 90 labeled samples were used. The gains in accuracy reached a maximum of 22.52% at 80 labeled samples. The gains obtained by Co-LSSVMR were moderate (6.64%, 11.50%, 10.04% and 8.16%, respectively) for 100 to 130 labeled samples. When more than 140 samples were labeled, Co-LSSVMR produced only slightly better estimations than LSSVMR, with gains in accuracy below 5%.
The calibration and validation results obtained by LSSVMR and Co-LSSVMR with 80 labeled samples are illustrated and compared in Figure 9. LSSVMR obtained acceptable fitting accuracy, with samples scattered around the 1:1 line (Figure 9a,b); however, it tended to overestimate low SOC values and underestimate high SOC values, with a slope of 0.60 and an intercept of 4.05 g·kg⁻¹ (Figure 9c). By exploiting the unlabeled samples, Co-LSSVMR obtained better fitting accuracy (Figure 9e,f) and better validation performance, with a slope of 0.76 and an intercept of 2.45 g·kg⁻¹ (Figure 9g).
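A short worked reading of the fitted lines in Figure 9c,g, assuming (the text does not state it explicitly) that the reported slope $a$ and intercept $b$ describe a line of estimated SOC $\hat{y}$ on measured SOC $y$. Under that assumption the fitted line crosses the 1:1 line at $y^{*}$, below which values are overestimated and above which they are underestimated; the value $y = 30$ g·kg⁻¹ is an arbitrary illustrative point.

```latex
\begin{aligned}
&\hat{y} = a\,y + b \;\Rightarrow\; \hat{y} - y = b - (1 - a)\,y,
  \qquad \hat{y} > y \iff y < y^{*} = \frac{b}{1 - a} \quad (0 < a < 1) \\
&\text{LSSVMR: } y^{*} = \frac{4.05}{1 - 0.60} \approx 10.1,
  \qquad \text{Co-LSSVMR: } y^{*} = \frac{2.45}{1 - 0.76} \approx 10.2 \quad (\mathrm{g\,kg^{-1}}) \\
&y = 30:\quad \hat{y} - y = 4.05 - 0.40 \times 30 = -7.95 \ \text{(LSSVMR)},
  \qquad 2.45 - 0.24 \times 30 = -4.75 \ \text{(Co-LSSVMR)}
\end{aligned}
```

Both lines cross the 1:1 line at nearly the same SOC content, so the improvement from the steeper Co-LSSVMR slope shows up mainly as a smaller systematic underestimation of high SOC values.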

Sensitivity to the Number of Labeled Samples
Figures 10 and 11 show the sensitivity of Co-LSSVMR to the number of labeled samples. The two refined models in Co-LSSVMR and their average estimations exhibited similar patterns, improving gradually as the number of labeled samples increased from 20 to 80. The number of unlabeled samples included in each refined model increased as the number of labeled samples increased from 10 to 40 (Figure 10b), while the total number of labeled and unlabeled samples in each refined model increased steadily with the number of labeled samples (Figure 10c). The effectiveness of Co-LSSVMR was sensitive to the number of labeled samples (Figure 11). Co-LSSVMR produced less accurate estimations than LSSVMR when fewer than 40 labeled samples were used in calibration. With 50 labeled samples, the gains in accuracy obtained by Co-LSSVMR were negligible even though 57 unlabeled samples were exploited. However, Co-LSSVMR produced notable gains in accuracy, ranging from 13% to 21%, when 60 to 90 labeled samples were used. The gains in accuracy fell to 6.64% with 100 labeled samples.
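The setup behind Figures 10 and 11 can be illustrated with a small index-splitting sketch. It assumes, as a hypothetical scheme, that the 64-sample unlabeled pool stays fixed while the labeled set grows, that the labeled subsets are nested, and that the pool is divided into two disjoint halves so that each regressor can exploit at most 32 samples (the "Max" in Figure 10b). The function name and random seed are illustrative only.

```python
import numpy as np


def sweep_split(order, n_pool, m):
    """Split a fixed permutation of calibration indices (hypothetical scheme).

    The first n_pool indices form the unlabeled pool, divided into two disjoint
    halves (one per regressor); the first m of the remaining indices are labeled,
    so the labeled sets are nested as m grows.
    """
    pool, rest = order[:n_pool], order[n_pool:]
    if not 0 < m <= rest.size:
        raise ValueError(f"m must be between 1 and {rest.size}")
    half = n_pool // 2
    return rest[:m], pool[:half], pool[half:]


rng = np.random.default_rng(0)
order = rng.permutation(164)  # indices of the 164 samples used in calibration
sizes = []
for m in range(10, 101, 10):
    labeled, pool_s1, pool_s2 = sweep_split(order, n_pool=64, m=m)
    # Fit the supervised model on `labeled`, then co-train with pool_s1/pool_s2.
    sizes.append((m, labeled.size, pool_s1.size, pool_s2.size))
```

Keeping the pool fixed isolates the effect of the labeled-set size; the same function with a fixed m and a varying n_pool would mimic the unlabeled-pool experiment.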

Sensitivity to the Number of Unlabeled Samples
The results obtained by Co-LSSVMR with 80 labeled samples are summarized in Figures 12 and 13, which show the sensitivity of Co-LSSVMR to the pool size of unlabeled samples. The RMSEV of each refined model and the difference between the RMSEVs of the two refined models decreased as the pool size of unlabeled samples increased from 10 to 60, whereas the number of unlabeled samples exploited by each model increased. The RMSEVs of the two refined models differed by more than 0.1 g·kg⁻¹ when fewer than 50 unlabeled samples were available. Beyond that, the performance of the two regressors leveled off and became comparable as more unlabeled samples were available and incorporated. For the overall accuracy of Co-LSSVMR, RMSEV decreased gradually as the pool size of unlabeled samples increased from 10 to 60, whereas R²V, RPIQ and the gains in estimation accuracy increased (Figure 13). Co-LSSVMR achieved an 11.36% gain in accuracy over LSSVMR when the pool size of unlabeled samples was 10. The gains exceeded 20% when the pool size was larger than 50. The estimation performance remained stable when more than 60 unlabeled samples were available.

Discussion
This study demonstrated the effectiveness of SSR in estimating SOC contents from VIS-NIR spectroscopy when the number of labeled samples was limited but not excessively small. SSR relies on the assumptions that the data lie on a low-dimensional manifold embedded in a higher-dimensional space and that similar inputs should have similar outputs [13]. These assumptions are valid for estimating SOC contents with VIS-NIR spectroscopy, because they resemble the premise of spectral matching in soil spectroscopy, which has been applied successfully to estimate SOC contents as well as other soil properties [39][40][41][42]. SSR obtained better SOC estimations because the unlabeled data provided useful information on the true distribution of the data [43]. This explanation is supported by Figure 9, which shows that the additional unlabeled samples included in the refined models made each model more consistent with the labeled samples and accordingly improved performance on the validation dataset.
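The smoothness assumption ("similar inputs should have similar outputs") is the same premise behind spectral matching. A minimal k-nearest-neighbour sketch makes it concrete: the SOC content of a query spectrum is estimated as the mean SOC of its k most similar library spectra under Euclidean distance. This is an illustration only, not the specific spectral-matching methods of [39][40][41][42]; the function name, the choice of k and the toy data are hypothetical.

```python
import numpy as np


def knn_spectral_estimate(library_spectra, library_soc, query_spectra, k=3):
    """Mean SOC of the k library spectra closest (Euclidean) to each query."""
    lib = np.asarray(library_spectra, float)
    qry = np.atleast_2d(np.asarray(query_spectra, float))
    soc = np.asarray(library_soc, float)
    d2 = ((qry[:, None, :] - lib[None, :, :]) ** 2).sum(axis=2)  # (n_query, n_lib)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return soc[nearest].mean(axis=1)


# Toy two-band "spectra": two similar pairs with contrasting SOC values.
library = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0], [11.0, 11.0]]
soc = [1.0, 1.0, 5.0, 5.0]
print(knn_spectral_estimate(library, soc, [[0.5, 0.5], [10.5, 10.5]], k=2))  # [1. 5.]
```

If spectrally similar soils had very different SOC contents, such neighbour-based estimates would fail, and so would the pseudo-labels that SSR assigns to unlabeled samples.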
This study found that SSR produced little model improvement, or even reduced model performance, when fewer than 50 labeled samples were used. In that range, the labeling confidence of unlabeled samples, estimated from the reduction of RMSECV on the labeled samples, was not reliable, as also observed in other studies [19,43,44]. This could be explained by the fact that a small labeled dataset cannot capture enough information to describe the soil variation. Overfitting was therefore more likely when few labeled samples were used in calibration: the model fitted the training dataset almost perfectly but generalized poorly. Given this poor validation performance, the labels assigned to the unlabeled samples were unlikely to be accurate, and adding those samples might have introduced more noise than useful information.
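The labeling-confidence mechanism discussed above can be sketched as a COREG-style co-training loop. The sketch rests on several assumptions: LSSVMR is a minimal LS-SVM regression with an RBF kernel, solved through its dual linear system; the two regressors differ only in kernel width; each regressor draws candidates from its own half of the unlabeled pool; confidence is the reduction of k-fold RMSECV on the labeled samples when one pseudo-labeled candidate is added; the most confident candidate is handed to the other regressor; and learning stops when no candidate reduces RMSECV. All names, hyperparameters and the synthetic data are hypothetical and do not reproduce the authors' implementation.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """RBF kernel exp(-||a - b||^2 / sigma^2) between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma ** 2)


class LSSVMR:
    """Minimal LS-SVM regression: solves [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""

    def __init__(self, gamma=10.0, sigma=1.0):
        self.gamma, self.sigma = gamma, sigma

    def fit(self, X, y):
        n = len(y)
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = A[1:, 0] = 1.0
        A[1:, 1:] = rbf_kernel(X, X, self.sigma) + np.eye(n) / self.gamma
        sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
        self.b, self.alpha, self.X = sol[0], sol[1:], X
        return self

    def predict(self, X):
        return rbf_kernel(X, self.X, self.sigma) @ self.alpha + self.b


def rmsecv(X_lab, y_lab, X_add, y_add, params, n_folds=5):
    """k-fold RMSE on the labeled samples; pseudo-labeled samples stay in training."""
    idx = np.arange(len(y_lab))
    errors = []
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        model = LSSVMR(**params).fit(np.vstack([X_lab[train], X_add]),
                                     np.concatenate([y_lab[train], y_add]))
        errors.append(model.predict(X_lab[test]) - y_lab[test])
    return float(np.sqrt(np.mean(np.concatenate(errors) ** 2)))


def co_train(X_lab, y_lab, pools, params):
    """Return the two refined models after COREG-style co-training."""
    d = X_lab.shape[1]
    add_X = [np.empty((0, d)), np.empty((0, d))]  # samples labeled by the other regressor
    add_y = [np.empty(0), np.empty(0)]
    pools = [np.asarray(p, float) for p in pools]
    changed = True
    while changed:
        changed = False
        for j, other in ((0, 1), (1, 0)):
            if len(pools[j]) == 0:
                continue
            model = LSSVMR(**params[j]).fit(np.vstack([X_lab, add_X[j]]),
                                            np.concatenate([y_lab, add_y[j]]))
            y_hat = model.predict(pools[j])
            base = rmsecv(X_lab, y_lab, add_X[j], add_y[j], params[j])
            conf = [base - rmsecv(X_lab, y_lab, np.vstack([add_X[j], x[None]]),
                                  np.append(add_y[j], yh), params[j])
                    for x, yh in zip(pools[j], y_hat)]
            best = int(np.argmax(conf))
            if conf[best] > 0:  # keep only candidates that reduce RMSECV
                add_X[other] = np.vstack([add_X[other], pools[j][best][None]])
                add_y[other] = np.append(add_y[other], y_hat[best])
                pools[j] = np.delete(pools[j], best, axis=0)
                changed = True
    return [LSSVMR(**params[j]).fit(np.vstack([X_lab, add_X[j]]),
                                    np.concatenate([y_lab, add_y[j]])) for j in (0, 1)]


# Synthetic stand-in data; not the study's spectra or SOC values.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
y = 2.0 * X[:, 0] + np.sin(X[:, 1])
params = [dict(gamma=10.0, sigma=2.0), dict(gamma=10.0, sigma=3.0)]
refined = co_train(X[:12], y[:12], [X[12:20], X[20:28]], params)
y_val = np.mean([m.predict(X[28:]) for m in refined], axis=0)  # average of both models
```

With very few labeled samples, the k-fold RMSECV used as the confidence score is itself noisy, which is consistent with the unreliable labeling confidence described above.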
We also found that SSR appeared to be most useful when the percentage of labeled samples was neither excessively large nor excessively small. As mentioned above, the performance degradation at small labeled-sample sizes could be attributed to unreliable evaluation of labeling confidence and low labeling quality. As more labeled samples were used, the labeling confidence became more reliable and the labeling quality improved, so SSR could capture more useful information from the unlabeled samples to improve estimation accuracy. However, the usefulness of SSR seemed to be limited when the percentage of labeled samples was large. This finding might be explained as follows: (i) the high estimation accuracy obtained with the labeled samples alone left little room for improvement [44,45]; and (ii) with only a small number of unlabeled samples available for exploitation, the probability of selecting highly informative unlabeled samples was reduced. Moreover, the results obtained with a fixed number of unlabeled samples showed a similar trend with respect to the number of labeled samples.
Co-LSSVMR needed fewer labeled samples than LSSVMR to achieve good estimations, possibly because its estimation performance improved more quickly, through the use of unlabeled samples, as the labeled-sample size increased. However, this study also indicated that estimating SOC contents accurately with very few labeled soil samples through SSR might be impractical. In some applications, SSL appeared to require significantly fewer labeled samples than the corresponding supervised technique to obtain high accuracy [45], whereas the usefulness and performance of SSL were also found to depend on the complexity of the problem and the labeled sample size [19,44,45]. Viscarra Rossel [4] pointed out that high estimation accuracy for soil properties requires enough samples to describe adequately the soil variation of the study area. The complex relationship between SOC content and soil reflectance might explain the failure to achieve accurate estimations with very few labeled samples, because a soil spectrum is determined by the combined absorption features of different mineral components and organic matter [46,47].
We found that the effectiveness of SSR was also influenced by the pool size of unlabeled samples. Up to a point, a larger pool of unlabeled samples available for exploitation tended to yield larger gains in accuracy. Notable performance differences between the two regressors in Co-LSSVMR were observed when only a limited number of unlabeled samples were available. These differences indicated remaining potential for improvement, which was confirmed by the further improvements obtained with larger pools of unlabeled samples.
In this study, no extra stopping criteria were adopted to terminate the learning phases because the number of unlabeled samples was limited. However, exhaustively testing all unlabeled samples from a large dataset is inefficient, especially in hyperspectral imaging applications, where a large population of soil spectra can be collected [48][49][50]; in such a scenario, testing all available unlabeled samples is impractical. To address this problem, a representative subset of unlabeled samples should be extracted in advance [18]. Bazi et al. [19] compared three unlabeled-sample selection strategies based on random sampling, variance and differential entropy, and found that differential entropy outperformed the other two. Their comparison could serve as a useful reference for future applications of SSR in estimating SOC contents from hyperspectral images.
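To illustrate how an entropy criterion can pick an informative subset from a large pool, the sketch below greedily selects candidates that maximize the joint Gaussian differential entropy, ½ log det(2πe K_S), of the selected subset S, where K is a covariance matrix over candidate spectra. This is one common formulation, assumed here for illustration; it is not necessarily the exact criterion of Bazi et al. [19], and the RBF covariance, the function name and the toy data are hypothetical.

```python
import numpy as np


def greedy_max_entropy(K, n_select):
    """Greedily choose indices maximizing 0.5 * log det(2*pi*e * K[S, S]).

    Adding candidate i raises the joint entropy by 0.5 * log(2*pi*e * v_i),
    where v_i is its variance conditioned on the already selected set, so each
    step simply picks the largest conditional variance.
    """
    K = np.asarray(K, float)
    selected = []
    for _ in range(min(n_select, len(K))):
        best_i, best_v = -1, -np.inf
        for i in range(len(K)):
            if i in selected:
                continue
            v = K[i, i]
            if selected:
                k_si = K[selected, i]
                v -= k_si @ np.linalg.solve(K[np.ix_(selected, selected)], k_si)
            if v > best_v:
                best_i, best_v = i, v
        selected.append(best_i)
    return selected


# Toy 1-D "spectra": the first two candidates are near-duplicates.
x = np.array([0.0, 0.01, 5.0])
K = np.exp(-(x[:, None] - x[None, :]) ** 2)  # RBF covariance (hypothetical choice)
print(greedy_max_entropy(K, 2))  # [0, 2]
```

Because the selection conditions on the samples already chosen, near-duplicate spectra are skipped, whereas ranking candidates by their individual variances alone would not penalize such redundancy.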
Several modeling strategies and techniques have been explored in the literature to improve the estimation of SOC contents from hyperspectral data. For example, local regression is a promising modeling strategy, especially for diverse datasets with large soil variations [40,42,51]. Testing the applicability of SSR to large, diverse and heterogeneous soil spectral datasets [52,53] would be a meaningful task for future work. Investigating the compatibility of SSR with these modeling strategies and techniques would also be worthwhile.

Conclusions
This study investigated the effectiveness of SSR in estimating SOC contents from laboratory-based VIS-NIR spectroscopy with a limited number of samples with reference values. The principal conclusions can be summarized as follows:
(1) Co-LSSVMR generally produces better estimations than LSSVMR when the number of labeled samples is not excessively small (>50), and the gains in accuracy of Co-LSSVMR with respect to LSSVMR can exceed 20%.
(2) SSR requires fewer labeled samples to produce estimations of a given accuracy.
(3) The usefulness of SSR is sensitive to the numbers of labeled and unlabeled samples: SSR is more likely to produce larger gains in estimation accuracy when the number of labeled samples is neither excessively small nor excessively large and when sufficient unlabeled samples are available.

Figure 1 .
Figure 1. Flow chart for model calibration and validation: (a) the setup for the sensitivity to the percentage of labeled samples; (b) the setup for the sensitivity to the number of labeled samples; (c) the setup for the sensitivity to the number of unlabeled samples; and (d) the co-training process in semi-supervised regression (SSR).

Figure 2 .
Figure 2. Scatterplot of soil reflectance spectra for outlier detection using robust principal component analysis (ROBPCA) with two principal components. The vertical and horizontal lines mark the ROBPCA cutoff values of the orthogonal and score distances used to detect outliers, and star symbols denote the outliers.

Figure 3 .
Figure 3. Average reflectance of four groups and their corresponding soil organic carbon contents (g·kg⁻¹).

the RMSECV of the two regressors exhibited a decreasing trend as more unlabeled samples were incorporated in the calibration, whereas RMSEC and RMSEV displayed different patterns. When only 10 labeled samples were used, clear overfitting was observed, with RMSEC close to 0, whereas the validation performance of the two regressors deteriorated gradually. The calibration and validation accuracies of the two regressors clearly increased when 70 labeled samples were used. For 150 labeled samples, only a slight improvement was observed in RMSEC and RMSEV.

Figure 4 .
Figure 4. Behavior of the root mean square error of cross-validation (RMSECV), root mean square error of calibration (RMSEC) and root mean square error of validation (RMSEV) achieved by regressors s1 and s2 versus the number of unlabeled samples exploited, when 10, 70 and 150 labeled samples (m) were used in Co-LSSVMR.

Figure 5 .
Figure 5. Root mean square error of validation (RMSEV) versus root mean square error of calibration (RMSEC) obtained by regressor s1 (a,b); and RMSEV versus RMSEC obtained by regressor s2 (c,d) in Co-LSSVMR training phases when more than 50 labeled samples are used in calibration.

Figure 6 .
Figure 6. Root mean square error of validation (RMSEV) versus root mean square error of cross-validation (RMSECV) obtained by regressor s1 (a,b); and RMSEV versus RMSECV obtained by regressor s2 (c,d) in Co-LSSVMR training phases when more than 50 labeled samples are used in calibration.

Figure 7 .
Figure 7. The performance of the two models trained only with labeled samples (a) and of the refined models of Co-LSSVMR (b); the number of unlabeled samples used in each refined model (c); and the total number of labeled and unlabeled samples used in each refined model (d) versus the number of labeled samples and its percentage (in brackets). The "Max" in (c) indicates the maximum number of unlabeled samples available for each regressor, which is one half of the pool size of unlabeled samples, and the "Max" in (d) indicates the maximum total number of labeled and unlabeled samples available for each regressor.

Figure 8 .
Figure 8. The performance of LSSVMR and Co-LSSVMR (a); ratio of inter-quartile range to RMSEV (RPIQ) of LSSVMR and Co-LSSVMR (b); and the gains in accuracy obtained by Co-LSSVMR with respect to LSSVMR (c) versus the number of labeled samples and its percentage (in brackets). For comparison, the estimation results obtained by PLSR are also plotted in (a,b).

Figure 9 .
Figure 9. Scatter plots of the estimated versus measured SOC content (g·kg⁻¹) for the calibration dataset obtained by the two supervised LSSVMR models s1 (a) and s2 (b), and for the validation dataset obtained by LSSVMR trained with 80 labeled samples (c); scatter plots of the estimated versus measured SOC content for the calibration dataset obtained by the two refined models s1 (d) and s2 (e), and for the validation dataset obtained by Co-LSSVMR (f), where 80 of the 164 samples were used as labeled samples. The "Measured SOC content" for each unlabeled sample in (d,e) is the value labeled by the other regressor. The solid line is the regression line between estimated and measured values, and the dashed line is the 1:1 line.

Figure 10 .
Figure 10. The performance of the two refined models of Co-LSSVMR (a); the number of unlabeled samples used in each refined model (b); and the total number of labeled and unlabeled samples used in each refined model (c) versus the number of labeled samples, with 64 unlabeled samples available for exploitation. The "Max" in (b) indicates the maximum number of unlabeled samples available to each regressor (32), and the "Max" in (c) indicates the maximum total number of labeled and unlabeled samples available to each regressor.

Figure 11 .
Figure 11. The performance of Co-LSSVMR (a); RPIQ of Co-LSSVMR (b); and the gains in accuracy obtained by Co-LSSVMR with respect to LSSVMR (c) versus the number of labeled samples, with 64 unlabeled samples available for exploitation.

Figure 12 .
Figure 12. The performance of the two refined models of Co-LSSVMR (a); and the number of unlabeled samples used in each refined model (b) versus the pool size of unlabeled samples, with 80 labeled samples used in calibration. The "Max" in (b) indicates the maximum number of unlabeled samples available for each regressor.

Figure 13 .
Figure 13. The performance of Co-LSSVMR (a); RPIQ of Co-LSSVMR (b); and the gains in accuracy obtained by Co-LSSVMR with respect to LSSVMR (c) versus the pool size of unlabeled samples, with 80 labeled samples used in calibration.

°07′-114°05′ E, 29°38′-30°12′ N) is situated on the north bank of the middle reaches of the Yangtze River, in the south-central part of Hubei Province, China. It has a mean annual temperature of 16.6

Table 1 .
Statistical description of the soil organic carbon (SOC) contents (g·kg⁻¹) of soil samples.