Vis-NIR Spectroscopy and PLS Regression with Waveband Selection for Estimating the Total C and N of Paddy Soils in Madagascar

Visible and near-infrared (Vis-NIR) diffuse reflectance spectroscopy with partial least squares (PLS) regression is a quick, cost-effective, and promising technology for predicting soil properties. An advantage of PLS regression is that all available wavebands can be incorporated in the model; however, earlier studies indicate that PLS models include redundant wavelengths and that selecting specific wavebands can refine PLS analyses. This study evaluated the performance of PLS regression with waveband selection using Vis-NIR reflectance spectra to estimate the total carbon (TC) and total nitrogen (TN) in soils collected mainly from the surface of upland and lowland rice fields in Madagascar (n = 59 after outliers were removed). We used iterative stepwise elimination-based PLS (ISE-PLS) to estimate soil TC and TN and compared its predictive ability with that of standard full-spectrum PLS (FS-PLS). The predictive abilities were assessed using the coefficient of determination (R²), the root mean squared error of cross-validation (RMSECV), and the residual predictive deviation (RPD). Overall, ISE-PLS using first derivative reflectance (FDR) showed a better predictive accuracy than FS-PLS for both TC (R² = 0.972, RMSECV = 0.194, RPD = 5.995) and TN (R² = 0.949, RMSECV = 0.019, RPD = 4.416) in the soils of Madagascar. The important wavebands for estimating TC (12.59% of all wavebands) and TN (3.55% of all wavebands) were selected from all 2001 wavebands over the 400–2400 nm range using ISE-PLS. These findings suggest that ISE-PLS based on Vis-NIR diffuse reflectance spectra can be used to estimate soil TC and TN contents in Madagascar with an improved predictive accuracy.


Introduction
Carbon (C) and nitrogen (N) contents in soils are two key parameters for sustaining soil and environmental quality, as well as for improving crop productivity, because of their involvement in a number of natural processes related to soil health and fertility [1]. Moreover, monitoring C levels in soils is increasingly needed because the depleted C levels, particularly in croplands, present an opportunity for carbon sequestration through adequate management practices [2]. To efficiently manage C and N in soils, a large number of soil samples must be evaluated for soil spatial variability [3]. However, standard procedures for assessing the state of C and N in soils are costly and time consuming [4,5] and require experienced operators. Thus, possible alternatives such as visible (Vis, 400-700 nm) and near-infrared (NIR, 700-2500 nm) spectroscopy are gaining attention; both of these alternatives have been widely accepted as fast and non-destructive methods for estimating soil properties [6,7]. These techniques measure the radiation absorbed by various bonds such as O-H, C-H, N-H, C=O, C-N, or C=C, resulting in bending, twisting, stretching, or scissoring [8,9]. Diffusely reflected NIR radiation is then correlated with measured material properties using various multivariate calibration techniques [10]. Among linear multivariate analyses, partial least squares (PLS) regression is the most commonly used approach for soil spectral analyses. Using PLS regression analyses, many calibrations have been conducted in recent decades to predict soil properties from Vis-NIR spectral data [11,12]. The infrared PLS method of soil property prediction was shown to be well suited for the characterization of soils [13].
However, waveband selection can also refine the performance of PLS analysis, not only for the prediction of soil properties [14,15], but also for other chemical and physical properties, such as forage in paddy fields [16], forest [17], and grassland [18,19], or for water quality in irrigation ponds [20], food [21], and fuel [22]. The PLS regression method combines the most useful information from hundreds of wavebands into the first several PLS factors (or latent variables), whereas the less important factors might include background effects [17,23]. Thus, many approaches for selecting wavebands or wavelength regions have been developed to eliminate useless (or to select useful) wavebands/wavelength regions in PLS analyses; these approaches include iterative stepwise elimination PLS (ISE-PLS) [24], uninformative variable elimination PLS (UVE-PLS) [25], competitive adaptive reweighted sampling (CARS) [26], interval PLS (iPLS) [27], moving window PLS (MW-PLS) [28], and genetic algorithm PLS (GA-PLS) [29]. Much of the literature has reported that more accurate calibration models may be achieved by selecting the most informative spectral variables instead of using the standard full-spectrum PLS (FS-PLS). In addition, waveband selection attempts to reduce the complexity and thus improve the robustness of a calibration model [23,30,31]. For example, Kawamura et al. [23] reported that removal of the redundant wavebands by ISE-PLS greatly improved the estimation accuracy of herbage mass and forage chemical properties in pasture. The results also suggested that ISE-PLS has the advantage of tuning the optimum bands for PLS regression with a better predictive ability in pastures, although this method has not been applied to soil spectra and soil properties.
In Madagascar, rice is important not only as the country's staple food, but also as the major rural income-generating resource. However, rice yield has been stagnant at less than 3 t ha⁻¹ in recent decades despite relatively favorable water conditions, with 70% of rice-cropping areas categorized as irrigated in this country [32]. In a survey of several rice fields in Madagascar's central highland, Tsujimoto et al. [33] showed a significant and linear response of rice yield to the soil organic carbon (SOC) content in relation to the N-supplying capacity of soils, which strongly indicates the importance of soil fertility management for increasing regional rice yields. Extensive research on SOC has been conducted using standard procedures, but most studies have focused on forest carbon stocks in the context of carbon dynamics, global warming, and environmental degradation in Madagascar [34-38]. Extensive and field-based soil C and N evaluations concerning the development of appropriate soil and nutrient management recommendations for the rice-cropping system, the country's major land use, are limited.
The aim of this study was to evaluate whether waveband selection by ISE-PLS would improve the predictive ability of calibrations using laboratory Vis-NIR spectroscopy when predicting soil total C (TC) and total N (TN) contents in Madagascar. The study compares the performance of ISE-PLS with FS-PLS using a set of 59 soil samples collected from upland and lowland rice fields in the central highland of Madagascar.

Study Site and Soil Sampling and Chemical Analyses
The field survey was conducted in the central highland of Madagascar (Figure 1). This region has a subtropical climate, with an altitude of 1000-1500 m above sea level. The mean temperature is 14-17 °C in winter and 20-22 °C in summer. The average annual rainfall is 1100 mm (>80% occurs in November-March) [33]. The area is dominated by inherently nutrient-poor soil types that are mainly classified into Ferralsols and Acrisols [39] or into Oxisols of semiarid to humid climates [40].
Soil sampling was conducted in 55 rice fields from August to November 2016, comprising eight upland and 47 lowland fields under various cropping systems (Figure 1). The sampling positions were recorded with a handheld GPS (Colorado300, Garmin, Ltd., Kansas, TX, USA). Surface soil samples were collected from a 0-10 cm depth as composites of three to four cores in each field. Within three fields, sub-surface samples (10-20 cm depth in one field; 10-20, 20-30, and 30-40 cm depths in two fields) were also collected. Thus, 62 soil samples were obtained.

Soil Chemical Analyses
In the laboratory, soil samples were sieved to <2 mm and air dried for seven days. Earlier studies compared the effect of samples sieved to 2 mm and ground to 200 μm and did not obtain highly significant differences with respect to accuracy [41]. Thus, we worked with 2 mm crushed and sieved soil samples (0.6 g) in this study.
The TC and TN contents of soils were determined using an automatic NC analyzer, the SUMIGRAPH NC-220F (Sumika Chemical Analysis Service, Ltd., Osaka, Japan).

Vis-NIR Diffuse Reflectance Measurement
Laboratory soil reflectance measurements were conducted in a dark room at the Graduate School of Agriculture, Kyoto University, Japan, on 12-13 December 2016, using a portable spectroradiometer (ASD FieldSpec 4 Hi-Res, ASD Inc., Longmont, CO, USA) and an ASD contact probe (Figure 2). The ASD FieldSpec measures spectral reflectance in the 350-2500 nm wavelength region with a spectral sampling of 1.4 nm in the 350-1000 nm range and 2 nm in the 1000-2500 nm range. The spectral resolution (full-width-half-maximum; FWHM) was 3 nm in the 350-1000 nm range and 6 nm in the 1000-2500 nm range; the output data were resampled to 1 nm intervals using the cubic spline interpolation function in the ASD software (RS3 for Windows; ASD). The contact probe light source (halogen lamp) was aligned at 12° to the probe body, ensuring illumination at a fixed angle without the influence of ambient light. The fiber optic cable of the ASD FieldSpec was attached to the contact probe at a fixed measurement angle of 35°. The sensed spot area had a diameter of ~1.1 cm with a field of view of 1.33 cm². A Spectralon (Labsphere, Inc., Sutton, NH, USA) reference panel (white reference) was used to optimize the ASD instrument prior to taking Vis-NIR reflectance measurements for each sample. Bulk soil samples were spread in optical-glass Petri dishes 85 mm in diameter and pressed to form a layer ~19 mm thick. The soil surfaces were scanned 25 times with five replications per soil sample (see Figure 2c), and the spectral readings were averaged.

Preprocessing of Spectral Data
Spectral data in both edge wavelength regions (350-399 nm and 2401-2500 nm) were eliminated because of low signal-to-noise ratios in the instrument. Thus, a total of 2001 spectral bands between 400 nm and 2400 nm were used for analyses.
First derivative reflectance (FDR) spectra were used to reduce baseline variation and enhance spectral features [42]. The FDR was calculated using the Savitzky-Golay smoothing filter [43]. A third-order, 15-band moving polynomial was fitted to the original reflectance signatures. The parameters of this polynomial were subsequently used to calculate the derivative at the center waveband of the moving window. In addition, a standard normal variate (SNV) transform was employed to reduce the particle size effect [41].
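The two transforms described above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: the function names, the NumPy-only implementation, the choice to drop the seven edge bands on each side, and the synthetic spectrum are all assumptions, and the source does not state the order in which the derivative and SNV were applied.

```python
import numpy as np

def savgol_first_derivative(spectrum, window=15, polyorder=3, step=1.0):
    """First derivative from a moving least-squares polynomial fit.

    Only bands where the full window fits are returned, so the output
    is (window - 1) bands shorter than the input (an assumption here).
    """
    half = window // 2
    offsets = np.arange(-half, half + 1) * step
    # Columns are offsets**0, offsets**1, ..., offsets**polyorder.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 1 of the pseudo-inverse maps a window of values to the fitted
    # linear coefficient, which is the derivative at the window centre.
    weights = np.linalg.pinv(design)[1]
    values = np.asarray(spectrum, dtype=float)
    return np.correlate(values, weights, mode="valid")

def snv(spectrum):
    """Standard normal variate: centre and scale a single spectrum."""
    values = np.asarray(spectrum, dtype=float)
    return (values - values.mean()) / values.std(ddof=1)

wavelengths = np.arange(400, 2401)            # 2001 bands at 1 nm spacing
toy = 0.3 + 1e-4 * (wavelengths - 400)        # synthetic linear "spectrum"
fdr = savgol_first_derivative(toy)            # constant slope of 1e-4 per nm
toy_snv = snv(toy)
```

For a third-order fit the derivative of any cubic is recovered exactly, which is a convenient sanity check before applying the filter to measured reflectance.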
To detect outliers, a principal component analysis was performed on the spectral data to calculate the Mahalanobis distance H, and samples with H > 3 were eliminated as outliers. As a result, three samples were considered outliers, leaving 59 samples for further analyses.
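A hedged sketch of this screening step is shown below. The source gives only the rule H > 3; the number of principal components (three here), the absence of any further standardization of H, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def mahalanobis_h(spectra, n_components=3):
    """Mahalanobis distance of each sample in principal-component score space."""
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)
    # SVD of the centred matrix yields the PCA: scores = U * S.
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    score_var = S[:n_components] ** 2 / (X.shape[0] - 1)
    return np.sqrt(np.sum(scores ** 2 / score_var, axis=1))

rng = np.random.default_rng(0)
spectra = rng.normal(size=(40, 50))                  # 40 ordinary synthetic samples
spectra = np.vstack([spectra, np.full(50, 10.0)])    # one deliberately extreme sample
h = mahalanobis_h(spectra)
outliers = np.flatnonzero(h > 3)                     # the H > 3 rule from the text
```

Because the score variances are estimated from the same samples, a single extreme spectrum dominates the first component and receives a large distance, which is why it is flagged here.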

Standard Full-Spectrum Partial Least Squares (FS-PLS) Regression
PLS regression analyses were performed to estimate soil parameters using the reflectance and FDR datasets (n = 59). The standard FS-PLS regression equation is as follows:
$$y = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_i x_i + \varepsilon$$
where the response variable $y$ is a vector of the soil parameters (TN and TC); the predictor variables $x_1$ to $x_i$ are the surface reflectance or FDR values for spectral bands 1 to $i$ (400, 401, ..., 2400 nm), respectively; $\beta_1$ to $\beta_i$ are the estimated weighted regression coefficients; and $\varepsilon$ is the error vector. The latent variables were introduced to simplify the relationship between the response variables and predictor variables. To avoid over-fitting, the optimal number of latent variables (NLV) was determined by leave-one-out (LOO) cross-validation as the value giving the minimum root mean squared error of cross-validation (RMSECV) (see Supplementary Materials: Figure S1). The RMSECV was calculated as follows:
$$\mathrm{RMSECV} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - y_p)^2}{n}}$$
where $y_i$ and $y_p$ represent the measured and predicted soil parameters for sample $i$, respectively, and $n$ is the number of samples in the data sets (n = 59).
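The selection of NLV by minimum LOO RMSECV can be sketched as below. This is an illustrative NumPy implementation of single-response PLS (NIPALS), not the MATLAB routines used in the study; the function names and the small synthetic dataset are assumptions.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """Fit PLS1 (NIPALS); return (beta, intercept) for y ~ X @ beta + intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)       # weight vector
        t = Xk @ w                      # scores
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        c = yk @ t / tt                 # y loading
        Xk = Xk - np.outer(t, p)        # deflate X
        yk = yk - c * t                 # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, y_mean - x_mean @ beta

def rmsecv_loo(X, y, n_lv):
    """Leave-one-out RMSECV for a given number of latent variables."""
    n = len(y)
    pred = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta, b0 = pls1_fit(X[keep], y[keep], n_lv)
        pred[i] = X[i] @ beta + b0
    return float(np.sqrt(np.mean((y - pred) ** 2)))

def select_nlv(X, y, max_lv):
    """Return (best NLV, RMSECV per NLV) using the minimum-RMSECV rule."""
    errors = [rmsecv_loo(X, y, k) for k in range(1, max_lv + 1)]
    return int(np.argmin(errors)) + 1, errors

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(30, 8))                       # synthetic predictors
y_demo = X_demo @ np.arange(1.0, 9.0) + rng.normal(scale=0.1, size=30)
best_nlv, cv_errors = select_nlv(X_demo, y_demo, max_lv=8)
```

With as many latent variables as predictors, PLS1 reduces to ordinary least squares, which gives a simple check that the fitting routine behaves as expected.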

Iterative Stepwise Elimination Partial Least Squares (ISE-PLS) Regression
ISE-PLS is a PLS model that incorporates a waveband elimination algorithm. The ISE method eliminates noisy variables and selects useful predictors. When PLS models include large numbers of redundant variables or outliers, their predictive ability may be poor; the ISE method can overcome such problems. Elimination is based on the importance of each predictor ($z_i$), described as follows:
$$z_i = \frac{|\beta_i|\, s_i}{\sum_{j} |\beta_j|\, s_j}$$
where $s_i$ is the standard deviation and $\beta_i$ is the regression coefficient; both $s_i$ and $\beta_i$ correspond to the predictor variable of waveband $i$, and the sum runs over all wavebands $j$ in the current model.
Initially, all available wavebands (2001 bands, 400-2400 nm) are used to develop the PLS regression model. Then, to remove useless predictor variables and improve the predictive ability, the importance $z_i$ of each predictor is evaluated, and the predictors with the minimum values are eliminated as less informative wavebands. Subsequently, the PLS model is re-calibrated with the remaining predictors [44]. The model-building procedure is repeated until the final model is calibrated with the maximum predictive ability.
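The elimination loop described above can be sketched as follows. Several things here are assumptions rather than details from the source: one waveband is removed per cycle, the regression and cross-validation routines are passed in as the hypothetical callables `fit` and `cv_error`, and the demonstration substitutes ordinary least squares for PLS purely to keep the example short.

```python
import numpy as np

def importance(beta, X):
    """z_i = |beta_i| * s_i / sum_j |beta_j| * s_j for the bands in X."""
    weighted = np.abs(beta) * X.std(axis=0, ddof=1)
    return weighted / weighted.sum()

def ise_select(X, y, fit, cv_error, min_bands=1):
    """Drop the least important band each cycle; keep the best-scoring subset.

    fit(X, y) -> coefficient vector and cv_error(X, y) -> float are supplied
    by the caller (for ISE-PLS they would wrap a PLS model and its RMSECV).
    """
    active = np.arange(X.shape[1])
    best_bands, best_err = active.copy(), cv_error(X, y)
    while len(active) > min_bands:
        z = importance(fit(X[:, active], y), X[:, active])
        active = np.delete(active, np.argmin(z))
        err = cv_error(X[:, active], y)
        if err < best_err:
            best_bands, best_err = active.copy(), err
    return best_bands, best_err

# Stand-ins for the demonstration only: least squares instead of PLS.
def ols_fit(X, y):
    return np.linalg.lstsq(X - X.mean(axis=0), y - y.mean(), rcond=None)[0]

def ols_loo_rmse(X, y):
    n = len(y)
    pred = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b = ols_fit(X[keep], y[keep])
        pred[i] = y[keep].mean() + (X[i] - X[keep].mean(axis=0)) @ b
    return float(np.sqrt(np.mean((y - pred) ** 2)))

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 6))      # six synthetic "bands"
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 3] + rng.normal(scale=0.1, size=40)
bands, err = ise_select(X_demo, y_demo, ols_fit, ols_loo_rmse)
```

In this toy case only bands 0 and 3 carry signal, so they survive every elimination cycle that precedes the best-scoring subset.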

Predictive Ability of the PLS Models
The predictive abilities of the FS-PLS and ISE-PLS models were assessed by calculating the coefficient of determination (R²), RMSECV, and the residual predictive deviation (RPD) using LOO cross-validation. High R² and low RMSECV values indicate the best model for predicting the soil parameters. The RPD is defined as the ratio of the standard deviation (SD) of the reference data to the RMSECV [45]. Regarding the performance of calibration models, an RPD of at least 3 was suggested for agricultural applications, while RPD values between 2 and 3 indicate a model with a good prediction ability, 1.5 < RPD < 2 indicates an intermediate model needing some improvement, and RPD < 1.5 indicates that the model has a poor prediction ability [13].
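As a small worked illustration of these statistics, the sketch below computes them from observed values and cross-validated predictions. The function names are hypothetical, R² is computed as 1 − SS_res/SS_tot (one common convention; the source does not state which was used), and the assignment of the exact boundary values 2 and 1.5 to a class is an assumption.

```python
import math

def cv_metrics(observed, predicted):
    """R2, RMSECV and RPD from observed values and cross-validated predictions."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmsecv = math.sqrt(ss_res / n)
    sd = math.sqrt(ss_tot / (n - 1))          # sample SD of the reference data
    return {"r2": 1 - ss_res / ss_tot, "rmsecv": rmsecv, "rpd": sd / rmsecv}

def rpd_class(rpd):
    """Map an RPD value to the categories quoted in the text."""
    if rpd >= 3:
        return "suitable for agricultural applications"
    if rpd >= 2:
        return "good"
    if rpd >= 1.5:
        return "intermediate"
    return "poor"

# Rounded SD (1.16) and RMSECV (0.194) reported for TC give about 5.98,
# close to the reported RPD of 5.995 (presumably a rounding difference).
tc_rpd_from_rounded_values = 1.16 / 0.194
```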
To determine the significant wavelengths used in FS-PLS calibrations, the variable importance in the projection (VIP) [46,47] was calculated and compared with the wavelength regions selected by the ISE-PLS models. The VIP score gives a summary of the importance of an x-variable (waveband) for an observed y-variable and is calculated using the following equation:
$$\mathrm{VIP}_k(a) = m \sum_{a} W_{ak}^{2} \left( \frac{SSY_a}{SSY_t} \right)$$
where $\mathrm{VIP}_k(a)$ is the importance of the $k$th predictor variable based on a model with $a$ factors, $W_{ak}$ is the corresponding loading weight of the $k$th variable in the $a$th PLS regression factor, $SSY_a$ is the explained sum of squares of $y$ obtained from a PLS regression model with $a$ factors, $SSY_t$ is the total sum of squares of $y$, and $m$ is the total number of predictor variables. A high VIP score indicates an important x-variable (waveband) [46,48]. All the data handling and linear regression analyses were performed using MATLAB software ver. 9.0 (MathWorks, Sherborn, MA, USA).

Soil Properties (TC and TN) and Their Correlations with Each Waveband
Table 1 shows the descriptive statistics for soil TC and TN in the 59 samples. The mean (and SD) values of TC and TN were 2.18% (±1.16%) and 0.17% (±0.08%), respectively. The soil samples yielded a wide range of TC (coefficient of variation [CV] = 53.35) and TN values (CV = 48.08). The SD and range of the samples affect the accuracy of soil property predictions using Vis-NIR spectroscopy [11]. In the present study, the ranges in soil TC and TN were considered sufficiently large to develop the calibration models using PLS regression analyses. A significant correlation (r = 0.977, p < 0.001) was found between TC and TN in the soil samples. The soil TC and TN showed similarly shaped correlations with the Vis-NIR reflectance and FDR spectra (see Supplementary Materials: Figure S2). In the reflectance data, reflectance values at 1413 and 2207 nm were highly correlated with the soil TC and TN contents. A peak of negative correlation at 598 nm was also obtained in the Vis wavelength region. In a previous study [49], soil reflectance in the NIR wavelength region was characterized by well-defined absorption features associated with overtones of O-H and H-O-H stretch vibrations in free water (1455 and 1915 nm) and overtones and combinations of O-H stretch and metal-OH bends in a clay lattice (1415 and 2207 nm).

Comparison between FS-PLS and ISE-PLS Models
Figure 3 shows changes in the RMSECV and R² values with iterative stepwise elimination of redundant wavebands in the prediction of TC and TN using FDR. The RMSECV decreased as wavebands were removed but increased rapidly after more than 1749 and 1930 wavebands had been removed for TC and TN, respectively. Similarly, the R² value tended to increase slowly until the maximum value was obtained when 1749 and 1930 wavebands had been removed. The remaining 252 (=2001 − 1749) and 71 (=2001 − 1930) wavebands were considered useful wavelengths for estimating TC and TN, respectively. The selected number of wavebands (NW) and the selected NW as a percentage of the full spectrum (NW% = NW/whole waveband [N = 2001]) are presented in Table 2, with the values of NLV, R², RMSEC, RMSECV, and RPD from the FS-PLS and ISE-PLS models using the FDR dataset. The optimum NLV ranged between 7 and 15, determined as the lowest RMSECV values calculated from LOO cross-validation to avoid over-fitting of the model.
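The waveband counts and percentages quoted above follow from simple arithmetic, reproduced in the short check below (the helper name is hypothetical).

```python
TOTAL_BANDS = 2001  # 400-2400 nm at 1 nm steps

def selected_share(removed, total=TOTAL_BANDS):
    """Return (bands kept, bands kept as a percentage of the full spectrum)."""
    kept = total - removed
    return kept, round(100 * kept / total, 2)

tc = selected_share(1749)   # total carbon model
tn = selected_share(1930)   # total nitrogen model
print(tc, tn)
```

These values match the NW and NW% figures reported for the TC and TN models (252 bands, 12.59%; 71 bands, 3.55%).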
Table 2. Optimum number of latent variables (NLV), coefficient of determination (R²), root mean squared errors of calibration (RMSEC) and cross-validation (RMSECV), and residual predictive deviation (RPD) from full-spectrum PLS (FS-PLS) and iterative stepwise elimination PLS (ISE-PLS) models with a selected number of wavebands (NW) and their percentages of the full spectrum (NW%).
Considering the difference in model accuracies between FS-PLS and ISE-PLS (Table 2), better predictive accuracies were obtained with ISE-PLS than with FS-PLS for both soil TC (R² = 0.972, RMSECV = 0.194) and TN (R² = 0.949, RMSECV = 0.019), with RPDs of 5.995 and 4.416, respectively. Figure 4 shows the relationships between the observed and cross-validated predicted values of soil TC and TN from ISE-PLS using FDR data. These results indicate that soil TC and TN can be rapidly and accurately predicted from Vis-NIR diffuse reflectance spectroscopy using PLS regression. Selecting a subset of wavebands related to soil chemical properties and removing unrelated wavebands further improved the PLS regression results. Moreover, based on RPD > 3, the models can be considered to have an excellent predictive ability. The remaining NW (NW%) of TC and TN was 252 (12.59%) and 71 (3.55%), respectively, suggesting that over 87% of the waveband information from the soil reflectance spectrum was redundant and either did not contribute to or disturbed the prediction of soil TC and TN.
These results agree with previous results indicating that the most useful information in the Vis-NIR region (400-2400 nm) was less than 20% for predicting forage [18,19] and water parameters [20]. These findings also support previous results showing that the performance of PLS models can be improved through waveband selection. Yang et al. [14] suggested that reducing large spectral datasets is valuable for more efficient storage, computation, and transmission, as well as for the ease of spectral analysis [50]. In addition, when fewer wavebands are used, simpler and cheaper spectroradiometric procedures can be developed.

Selected Wavebands from ISE-PLS Models
The selected wavebands from ISE-PLS using FDR spectra to estimate soil TC and TN are shown in Figure 5, with VIP score values from FS-PLS. Based on the VIP score (>1), the wavelengths centered near 418, 470, 760, 1408, 1912, 2255, 2314, and 2339 nm were identified as common important wavelengths for estimating soil TC and TN. Most of the VIP peak regions were selected in the final ISE-PLS models. Although they did not perfectly fit with previously known absorption wavelength regions, some of the wavelengths were within 30 nm of known absorption features. For soil TC prediction, the final model included Vis wavelength regions (400-480 and 640-700 nm), which are associated with soil color and had a large influence on model calibration. Soil becomes darker as soil organic matter (SOM) increases; thus, several researchers have tried to use soil color information to estimate SOM [9,51]. However, soil darkness is only a useful discriminator within limited geological variation. In general, soil reflectance decreases with increasing organic matter content [49] and water content [52]. Absorptions at approximately 400, 450, 510, 550, 700, 870, and 1000 nm are characterized by the presence of ferrous and ferric iron oxides and are due to the electronic transitions of the iron cations [53]. A spectral band of 2100-2500 nm contributes to the model calibration of C and N in soils [54].
Martin et al. [55] reported that the NIR spectroscopy-based prediction of TN may be indirect due to a close correlation with TC, and that the calibration accuracy is higher for TC than for TN. Chang and Laird [56] confirmed that the NIR spectroscopy determination of TN does not always rely on a strong correlation with TC and can determine TN directly. Brunet et al. [41] hypothesized that, depending on the studied dataset, TN can be predicted based on its correlation with TC when the correlation is high; otherwise, it can be predicted directly. In our results, soil TC data showed a high correlation with soil TN data (r = 0.977), and calibrations obtained a better predictive accuracy for TC (R² = 0.972, RMSECV = 0.194) than for TN (R² = 0.949, RMSECV = 0.019). Within the selected wavebands of soil TN (Figure 5), 90.1% (=64/71 bands × 100%) overlapped with the selected wavebands of soil TC, whereas the wavebands that differed from the TC calibration were found mainly in the NIR region (707, 717-719, 774 nm). These results indicated that TN prediction using our dataset was affected by strong correlations with TC data but might be directly estimated.
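The overlap figure quoted above is a simple set calculation, sketched below. The helper name and the short band lists are hypothetical; only the 64-of-71 count comes from the text.

```python
def overlap_percent(bands_a, bands_b):
    """Share of bands_a that also appear in bands_b, in percent."""
    a, b = set(bands_a), set(bands_b)
    return 100 * len(a & b) / len(a)

# Hypothetical band lists (nm), purely to show the call.
tn_bands_demo = [418, 470, 707, 2314]
tc_bands_demo = [418, 470, 760, 2314]
demo_overlap = overlap_percent(tn_bands_demo, tc_bands_demo)

# Reported counts: 64 of the 71 TN bands were also selected for TC.
reported_overlap = round(100 * 64 / 71, 1)
```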
Lastly, we note that this study was carried out on heterogeneous sample data sets, which were collected from upland and lowland soils under various rice-based cropping systems, including a wide range of soil types in Madagascar. However, several researchers consider the reliability of the prediction questionable when studying heterogeneous sample sets [41]. Particle size and arrangement might also affect the calibration due to the light transmission path [57]. Moreover, to map the carbon stock at a larger spatial scale in Madagascar, evaluating an appropriate spatial scale with a larger data set is required [58]. Thus, future studies need more information concerning the effect of a heterogeneous data set on the accuracy of NIRS predictions at different scales in order to apply the methodology to soil characterization of the whole island of Madagascar.

Conclusions
We investigated the performance of waveband selection in the spectral estimation of soil TC and TN using Vis-NIR reflectance data. The results indicated that soil TC and TN in Madagascar can be more accurately estimated by ISE-PLS than by standard FS-PLS using laboratory Vis-NIR spectroscopy. ISE-based wavelength selection in PLS calibration suggested that the important wavebands for estimating soil TC and TN were, respectively, 12.59% and 3.55% of all 2001 wavebands in the 400-2400 nm range. Using the FDR wavebands selected by the ISE-PLS models, soil TC and TN were predicted with excellent accuracy (RPD > 3), with RMSECV values of 0.194% and 0.019%, respectively. The use of PLS with ISE waveband selection in Vis-NIR reflectance spectra is promising for the spectral assessment of soil TC and TN in Madagascar. Furthermore, optimizing the wavelength subset with ISE-PLS refined the predictive ability of the calibration.

Figure 1. Locations of studied regions and soil sampling points.

Figure 2. (a) The setup used to measure the soil reflectance in a dark room; (b) the use of a contact probe that touches the surface of the soil sample; and (c) the five measuring spots on a soil sample.

Figure 3. Changes in RMSECV (black line) and R² values (red line) in models to estimate total carbon (TC) (a) and total nitrogen (TN) (b) with the stepwise removal of redundant wavebands. The minimum value of the root mean squared error of cross-validation (RMSECV) (blue dotted line) was obtained when 1749 and 1930 wavebands were removed for TC and TN, respectively.

Figure 4. Observed and predicted values of soil total carbon (TC) and soil total nitrogen (TN) contents using ISE-PLS models with first derivative reflectance (FDR) data (n = 59). The coefficient of determination (R²), root mean squared error of cross-validation (RMSECV), and residual predictive deviation (RPD) were obtained by leave-one-out cross-validation (see Table 2).

Figure 5. Soil reflectance and its first derivative reflectance (FDR) spectra for the total carbon (TC; a) and total nitrogen (TN; b) datasets and selected wavebands (red bars) in iterative stepwise elimination partial least squares (ISE-PLS) with the variable importance in the projection (VIP) score (blue line) from full-spectrum PLS (FS-PLS) models.

Table 1. Descriptive statistics of soil sample data.