Effects of Moisture and Particle Size on Quantitative Determination of Total Organic Carbon (TOC) in Soils Using Near-Infrared Spectroscopy

Near-infrared (NIR) spectroscopy is a cost-effective and environmentally friendly technique that could represent an alternative to conventional methods of soil analysis, including the determination of total organic carbon (TOC). Soil fertility and quality are usually measured by traditional methods that involve the use of hazardous and strong chemicals. Understanding the effects of physical soil characteristics, such as moisture content and particle size, on spectral signals is of great interest for optimizing prediction capability and setting up a robust and reliable calibration model, with the future perspective of application in the field. Spectra of 46 soil samples were collected. Soil samples were divided into three data sets (unprocessed; dried only; dried, ground and sieved) in order to evaluate the effects of moisture and particle size on spectral signals. Both separate and combined normalization methods, including standard normal variate (SNV), multiplicative scatter correction (MSC) and normalization by closure (NCL), as well as smoothing using first and second derivatives (DV1 and DV2), were applied, for a total of seven cases. Pretreatments for model optimization were designed and compared for each data set. The best combination of pretreatments was achieved by applying SNV and DV2 before partial least squares (PLS) modelling. There were no significant differences between the predictions using the three different data sets at the 0.05 significance level. Finally, a unique database including all three data sets was built to include all the sources of sample variability that were tested, and was used for final prediction. External validation of TOC was carried out on 16 unknown soil samples to evaluate the predictive ability of the final combined calibration model. Hence, we demonstrate that sample preprocessing has a minor influence on the quality of NIR predictions, laying the ground for a direct and fast in situ application of the method.
Data can be acquired outside the laboratory since the method is simple and does not need more than a simple band ratio of the spectra.


Introduction
Soil is an essential pillar of agriculture and any form of human intervention influences its activity and the equilibrium of the entire ecosystem [1]. Decades of soil exploitation and intensive land management have led to its dramatic chemical degradation, especially in terms of nutrient losses [2]. Notwithstanding, in recent years, increasing awareness has been reversing the trend by the introduction of improved management technologies such as precision farming [3]. However, implementation of [...] variation within the study site, as shown in previous studies on the same site. The sampling site had an overall surface of 190,000 m², divided into four fields (Figure 1). Soil samples were collected through grid sampling (n = 23) from an area that had been previously studied with reference to nitrogen fertilization and is presently under study for increasing carbon storage and soil organic matter [24]. An additional 16 samples, to be used as an external validation set, were collected within the same field at points different from the grid sampling and were randomly chosen from topsoil and subsoil. Soil cores were broken apart by hand and, in the laboratory, homogenised and put in plastic bags while still moist, without any other treatment. All samples were then stored at −20 °C until submitted to reference assays and spectra acquisition.
TOC was determined using a Carbon Analyzer TOC-V-CSM (Shimadzu, Tokyo, Japan) after acidification with 2 M HCl to remove dissolved carbonate [25]. The instrument has a detection limit of 5 µg/L and a measurement accuracy, expressed as CV, of 1.5%. All determinations were performed in triplicate to calculate the reproducibility of the reference data.

Sample Treatments for NIR Spectral Acquisition
Three sets of NIR spectra were collected from the 46 soil samples: wet samples (WS), dried samples (DS) and ground and sieved samples (GSS), depending on the soil processing. NIR spectra of wet soil samples were collected by simply placing soil on glass Petri dishes after conditioning at room temperature for about 2 h to avoid interference with the spectral signals caused by low temperature (the WS set). The same samples were then dried overnight at 55 °C without any other manipulation and the spectra were recorded the following day to obtain the DS set. Finally, the dried soils were ground and passed through a 2 mm sieve to obtain a fine powder, and NIR spectra were collected to obtain the GSS set.

NIR Spectroscopy
NIR spectra of soil samples were acquired with a NIRFlex N-500 (Büchi, Flawil, Switzerland) equipped with a polarization interferometer with TeO2 wedges and a Solids Cell Module (Büchi, Flawil, Switzerland), designed to house standard 9.0 cm diameter Petri dishes (Schott, Mainz, Germany). Soil samples were loaded on the dishes and pressed by means of a stainless steel disk. Due to the eccentrically rotating cup housing, two scans were taken for each sample. The instrument was able to operate in the range of 5-35 °C without any drift in the spectral signal. The reflectance spectra were collected over the full 1000-2500 nm wavelength interval using NIRWare 1.4 (Büchi, Flawil, Switzerland), at 2-4 scans/s. An optimized signal-to-noise ratio was guaranteed by averaging 64 scans for each spectrum, with an overall measurement time of 15 s. Internal and external references were acquired every 10 spectra in order to optimize the spectrum signal and set up the light source alignment.

NIR Statistics and Chemometrics
All chemometric analyses, including calibration and validation, were performed using NIRCal 5.4 (Büchi, Flawil, Switzerland). The raw optical data were pre-processed with full multiplicative scatter correction (MSC), standard normal variate (SNV) and normalization by closure (NCL) to reduce the variability due to scattering, and with 1st and 2nd derivatives (DV1 and DV2, respectively) to remove baseline offset and linear trends [26]. Depending on the regression coefficients computed for reflectance at each wavelength, NIRCal 5.4 recommends the wavelength interval for each data set. Principal component analysis (PCA) was carried out to produce qualitative principal component (PC) plots. Partial least squares regression (PLSR) was used as the regression model to correlate the reference data and the NIR-predicted results. The optimal number of factors was assessed by calculating the predicted residual error sum of squares (PRESS), i.e., the sum of the squared deviations between predicted and reference values [27]. Cross-validation, or internal validation, was carried out as default software output using the blockwise procedure, dividing the calibration set into three blocks and testing each block in turn as the validation set with the others as calibration sets. The software computed a series of calibration models and automatically selected the best one by comparing the squared Pearson correlation coefficient for both calibration (R²_cal) and cross-validation (R²_CV), the standard error of calibration (SEC) and the standard error of cross-validation (SECV). The regression model statistics were also evaluated in terms of relative prediction deviation (RPD), i.e., the ratio of the standard deviation (SD) of the entire population to the SEC [28]. Quality of calibration was expressed by the Q-value, a weighted combination of all relevant statistical measures (SEC, SEP, bias and regression coefficients). The Q-value is automatically calculated by the software during the calibration protocol from the weight w assigned to each statistical measure and the corresponding value v of that measure, over all i measures. The Q-value can be considered an overall quality index that classifies regressions by a number between 0 (useless) and 1 (ideal). To be considered reliable, a calibration has to obtain a Q-value greater than 0.50 [29]. The Mahalanobis distance criterion was used as the default method to find outliers in sample sets [30]. The standard error of the laboratory (SEL), i.e., the error of the reference data, was reported in order to benchmark the NIR statistics (SEC and SEP).
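The blockwise cross-validation and PRESS calculation described above can be sketched as follows. This is a minimal illustration, not the NIRCal implementation: the function names are hypothetical, the data are invented, and a one-variable least-squares line stands in for the PLS model so that the example needs only the Python standard library.

```python
from statistics import mean


def blockwise_folds(n_samples, n_blocks=3):
    """Split indices 0..n_samples-1 into contiguous blocks of near-equal size."""
    base, extra = divmod(n_samples, n_blocks)
    folds, start = [], 0
    for b in range(n_blocks):
        size = base + (1 if b < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (a stand-in for a PLS model)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def press(x, y, n_blocks=3):
    """Sum of squared deviations between cross-validated predictions and references."""
    total = 0.0
    for held_out in blockwise_folds(len(y), n_blocks):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        total += sum((a + b * x[i] - y[i]) ** 2 for i in held_out)
    return total


# Invented example: one spectral feature (x) against reference TOC in % (y).
x = [0.10, 0.14, 0.18, 0.21, 0.25, 0.30, 0.33, 0.37, 0.41]
y = [1.02, 1.15, 1.31, 1.40, 1.58, 1.74, 1.85, 2.01, 2.12]
print(blockwise_folds(len(y)))
print(press(x, y))
```

In a real workflow, PRESS would be computed for models with an increasing number of PLS factors and the number of factors giving the lowest (or a near-minimal) PRESS would be retained.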
The calibrations for soils were then validated by means of external validation: 16 new independent soil samples were collected in order to verify the predictive ability of NIR on supplementary unknown soil samples. The accuracy of NIR-predicted data sets was measured as the squared correlation coefficient (R²_EX.V) and the root mean square error of prediction (RMSEP) between predictions and reference values [31]. NIR repeatability on predictions was calculated as the average of three acquisitions for each sample. Mean absolute error (MAE) with its standard deviation (SD_MAE) was also calculated to measure prediction accuracy [32].
Calibrations and validations were performed by means of a full-spectrum approach using all the wavelengths in the interval of acquisition (1000-2500 nm).
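As a hedged illustration of the external validation statistics named above, the sketch below computes RMSEP, MAE with its standard deviation, and the squared Pearson correlation for a pair of reference and predicted vectors. The function name and the numbers are hypothetical, and the exact formulas used by the authors' software may differ in detail (for example in bias correction).

```python
from math import sqrt
from statistics import mean, stdev


def validation_metrics(reference, predicted):
    """Return RMSEP, MAE, SD of absolute errors and squared Pearson r."""
    residuals = [p - r for p, r in zip(predicted, reference)]
    abs_err = [abs(e) for e in residuals]
    rmsep = sqrt(mean([e * e for e in residuals]))
    mr, mp = mean(reference), mean(predicted)
    cov = sum((r - mr) * (p - mp) for r, p in zip(reference, predicted))
    var_r = sum((r - mr) ** 2 for r in reference)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return {
        "RMSEP": rmsep,
        "MAE": mean(abs_err),
        "SD_MAE": stdev(abs_err),  # sample standard deviation of |error|
        "R2": cov * cov / (var_r * var_p),
    }


# Invented TOC values (%), for illustration only.
reference = [1.05, 1.32, 1.48, 1.77, 2.21]
predicted = [1.10, 1.25, 1.55, 1.70, 2.05]
for name, value in validation_metrics(reference, predicted).items():
    print(name, round(value, 3))
```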

Analysis of Spectral Data
The diffuse reflectance NIR spectra of all soil samples are shown in Figure 2. The major signals around 1400 nm (OH second overtone) and 1900 nm (OH third overtone) are caused by water absorption. Other signals related to organic constituents of soils are almost completely covered by the overwhelming presence of water, resulting in relatively smooth spectra. Various components of soil organic matter are generally more visible in the mid-infrared region, although the weak overtones and combination bands of fundamental vibrations occur in the near-infrared range and can be exploited for analytical purposes [32]. As reported in the literature [33], absorption values below 1000 nm (not shown) are usually associated with humic compounds and with pigments derived from chlorophyll and phenolic compounds during decomposition of organic materials and plant residues. Even though often hidden, spectral signals at 1700 nm (CH2 overtones) and in the range 2200-2300 nm (aliphatic CH and OH phenolic compounds) have already been correlated with soil characterization [34][35][36]. In particular, the region around 2200 nm has been correlated with clay minerals [37], which are abundant in the soils we collected.
NIR reflectance of WS is strongly influenced by moisture, which induces an overall decrease in the reflectance of soils with respect to dry samples [38]. As evidenced in Figure 2b, soil moisture markedly influences the soil scattering features and baseline offset effects. When soil moisture increases, soil particles absorb the water and then micro- and macropores are filled with water. Depending on the moisture content, the water film around the soil particles modifies the refractive index with respect to dry soils, where particles are surrounded by air [39], so that a larger part of the light propagates deeper into the soil and light scattering is consequently lowered [40]. Moreover, it is worth noting that the overall decrease in reflectance with increasing moisture content is not constant along the spectrum, but becomes more marked towards longer wavelengths, causing an increase in the slope of the spectral curves between 1800 and 2500 nm. Water absorbs more strongly at longer wavelengths, emphasizing the change in reflectance [41]. This also generates a slight shift of the maximum of the peak at 1900 nm and peak broadening, probably due to the increasing water content.
Conversely, particle size seemed not to have a significant effect on diffuse reflectance of soil samples. As expected, light scattering is more appreciable in the case of only dried samples because of the non-homogeneity of particles with respect to those that are ground and sieved, but comparing red spectra in Figure 2c,d, the overall spectral offsets are almost completely superimposed.
The PCA carried out on all the original NIR spectra confirmed what was detected by visual inspection of the spectra (Figure 3). Along PC1, which explains 76.22% of the variance, samples are grouped based on moisture content, showing the greatest relevance of this variable. Equally interesting, inside the DS group, a separate sub-cluster corresponding to only dried (blue) and GSS (green) samples along PC2 (19.78% of variance explained) can be clearly recognized. Spectra were surely influenced by the homogeneity of particle size, but as a second-priority variable.

Spectra Pre-Treatments
As previously shown, the effect of light scattering is markedly predominant in our spectral set, generating spectral baseline shifts and non-linearity phenomena, which constitute the major part of the total variation of the signals. Because soils are complex, multi-component matrices, their spectra usually contain noise and interferences, which can be minimized by applying mathematical treatments to the spectra before calibration [42]. Figure 4 shows the effects of NCL, MSC and SNV on the average spectra calculated for the three sample sets (WS, DS and GSS). These are the most widely used pre-processing techniques for NIR spectra to reduce the variability among samples due to scattering and to adjust for baseline shifts [43]. In MSC, the light scattering is estimated and corrected for each sample relative to an ideal sample obtained by averaging the complete wavelength range of the data [44].
The signal correction concepts behind SNV and normalization are the same as those for MSC, except that an average reference signal is not required. Instead, each observation is processed on its own, isolated from the remainder of the set [45]. As already discussed by Dhanoa et al. [46], the three pretreatments are similar up to a simple spectral rotation and offset correction (Figure 4b-d).
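The difference between the two scatter corrections can be made concrete with a short sketch. It assumes the common textbook definitions (SNV: centre each spectrum on its own mean and scale by its own standard deviation; MSC: regress each spectrum on the mean spectrum of the set and remove the fitted offset and slope); the function names and the toy spectra are hypothetical and this is not the NIRCal code.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: uses only the spectrum itself."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(v - m) / s for v in spectrum]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum of the set."""
    n_wl = len(spectra[0])
    reference = [mean([s[j] for s in spectra]) for j in range(n_wl)]
    m_ref = mean(reference)
    s_ref = sum((r - m_ref) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        m_s = mean(s)
        # Fit s = a + b * reference by least squares.
        b = sum((r - m_ref) * (v - m_s) for r, v in zip(reference, s)) / s_ref
        a = m_s - b * m_ref
        corrected.append([(v - a) / b for v in s])
    return corrected


# Toy data: the second "spectrum" is the first with an added offset and gain,
# mimicking a pure scatter effect.
base = [0.20, 0.40, 0.30, 0.70, 0.50]
scattered = [0.5 + 2.0 * v for v in base]

print(snv(base))
print(snv(scattered))          # same as above, up to rounding
print(msc([base, scattered]))  # both rows collapse onto the mean spectrum
```

Because the second spectrum differs from the first only by an offset and a gain, both corrections map the two spectra onto the same curve, which is the behaviour the pretreatments are meant to provide for scatter-dominated variation.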
To reduce the effect of the background and remove the baseline shifts, Savitzky-Golay first and second derivatives (5-point and 9-point segments, respectively) were also applied to our spectra [47] (Figure 5). It is worth noting that in all cases pretreatments are useful to reduce or eliminate the effect of undesired scattering due to particle size and shape, as well as to sample packing arising from the different grinding treatment of soils (red and green spectra in all figures), confirming that in that case the scattering phenomena derive from the typical physical variations of the measured samples. When the particle size is larger than the wavelength, as is generally the case for solid samples analyzed by NIR, anisotropic Lorenz-Mie scattering [26] is predominant and can be easily reduced or eliminated by applying mathematical functions to the original spectra [48]. On the contrary, the presence of water in samples, besides light absorption, generates Rayleigh scattering, i.e., scattering by small particles such as water molecules [49], which is strongly wavelength-dependent and nearly isotropic. In this case, the results of applying preprocessing techniques are very different from the previous cases and more marked at longer wavelengths [50].
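A minimal sketch of Savitzky-Golay differentiation with the window sizes quoted above follows. The polynomial order is not stated in the text, so the sketch assumes the standard tabulated convolution weights for a quadratic fit (5-point first derivative) and a quadratic/cubic fit (9-point second derivative); the function name is hypothetical and edge points are simply dropped.

```python
# Tabulated Savitzky-Golay convolution weights (assumed polynomial orders).
SG_D1_5PT = ([-2, -1, 0, 1, 2], 10)                          # 1st derivative
SG_D2_9PT = ([28, 7, -8, -17, -20, -17, -8, 7, 28], 462)     # 2nd derivative


def sg_derivative(y, kernel, order, step=1.0):
    """Apply a Savitzky-Golay derivative kernel; edge points are not returned."""
    weights, norm = kernel
    half = len(weights) // 2
    out = []
    for i in range(half, len(y) - half):
        window = y[i - half:i + half + 1]
        acc = sum(w * v for w, v in zip(weights, window))
        out.append(acc / (norm * step ** order))
    return out


# A constant offset vanishes in the first derivative,
# and a linear baseline vanishes in the second derivative.
offset_only = [3.0] * 12
sloping_baseline = [0.4 + 0.05 * k for k in range(12)]
print(sg_derivative(offset_only, SG_D1_5PT, order=1))
print(sg_derivative(sloping_baseline, SG_D2_9PT, order=2))
```

Both printed lists contain only values that are zero up to floating-point rounding, which illustrates why derivatives are used to remove baseline offsets and linear trends before calibration.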

PLSR Calibration Models for TOC Predictions
Using the reference assay values of TOC and the spectral data, original and pretreated, calibration models were generated by the PLS regression method for the DS (Table 1), GSS (Table 2) and WS (Table 3) sets. The reference analysis on soil samples gave values in the range of 0.99-2.42% of TOC, with a standard deviation of 0.33%. All the sample sets contained 46 samples, 30 as C-set and 16 as CV-set.
Table 1. The optimal partial least squares (PLS) model prediction results for total organic carbon (TOC) and the corresponding statistical parameters of the various single and combined pretreatments for the DS set (NCL = normalization by closure; MSC = full multiplicative scatter correction; SNV = standard normal variate; DV1 = 1st derivative; DV2 = 2nd derivative).
The reproducibility of reference data was 0.24%, whilst NIR measurement repeatability was in the range 0.11% for the dried and ground sample set to 0.14% for the WS set, slightly more than the expected value <0.5 SECV, surely due to the high heterogeneity of soils, which makes NIR determinations more subject to errors [51].

The only outliers found in the case of the DS and GSS sets were probably due to a failure in TOC determination for those samples, because they were recognized by the software as original property value-residual outliers. In the case of the WS set, the second outlier was recognized as a NIR-predicted value-residual outlier. Comparing the statistical results reported in the three tables, it is worth noting that the DS and GSS sets show similar results and have overall better performances in terms of correlations and robustness of calibrations than the WS set, where the interference of water on spectra surely decreases the model's efficiency. Different single pretreatments generated different effects, SNV being the most suitable treatment for scattering correction, and DV2 (9-point) for smoothing. The combined use of SNV and DV2 gave the best calibration performances for all three sample sets, as also confirmed by the cross-validation results (Figure 6).
When compared with previous studies estimating the organic carbon content of soils by using visible/NIR spectra, the predictive performance obtained in this study was in accordance with Stevens et al. [5], on a large-scale EU soil survey of about 20,000 samples belonging to eight land-use types, with R² values from 0.76-0.96 and RPD values ranging from 1.74-2.88. Brown et al. [52] achieved an R² value of 0.87 at a global scale. With 1011 soil samples, Shepherd and Walsh [53] obtained an R² value of 0.80 for organic carbon content estimation. Moreover, Saiano et al. [54] estimated the soil carbon contents of 89 soils from a small and homogeneous area, Pantelleria Island, and achieved a considerably higher accuracy, with a cross-validation R² value of 0.951 and an RPD value of 4.49. Accordingly, Cozzolino et al. [33] reported R²_CAL = 0.94-0.96 for silt and clay soils and R²_CAL = 0.89-0.92 for sand soils.
As reported by several authors [55,56], moisture has dramatic effects on NIR reflectance, which usually decreases with increasing water content from dryness to saturation. In our case, we could confirm a general decrease in the spectral reflectance of wet samples compared with dry samples. However, the relatively low absolute water content of the soil samples, which did not exceed 14% (w/w), and its narrow range of distribution, within six percentage points (analysis not shown), were probably due to the particular seasonal characteristics during sample collection, and resulted in an acceptable worsening of calibration parameters from GSS to WS models.

External Validation
After developing a calibration model, it is essential to evaluate the performance of the model with samples independent from those used to develop the model itself. External validation is a consolidated procedure that uses a separate data set to validate the calibrations before applying them in routine analysis; especially in cases of complex and heterogeneous samples strongly affected by composition and structure, cross-validation alone is not sufficiently reliable to trust model performances. Accordingly, another 16 independent soil samples, randomly chosen between topsoil and subsoil samples, were collected as previously described, and submitted to NIR detection and to TOC reference assays. The results of these supplementary tests are reported in terms of NIR-predicted TOC (Table 4) against the original property results. The reference TOC values were in the range of 1.05-2.21% with a standard deviation of 0.32%. The three sets of NIR-predicted data were evaluated by one-way ANOVA and were not found to be significantly different at the 0.05 significance level [57]. As a consequence, neither particle size nor moisture seems to dramatically influence the predictions, or at least not in a way that needs to be taken into account at the moment of sample collection. Samples could be collected and spectra acquired without any physical pre-treatment such as drying or grinding, while still obtaining reliable NIR predictions for samples with different physical characteristics.
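The one-way ANOVA comparison of the three prediction sets can be illustrated with the following sketch, which computes the F statistic and its degrees of freedom from first principles. The function name and the prediction values are invented for illustration; the p-value step (comparing F with the F distribution at the chosen significance level) is left to a statistics table or package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([v for g in groups for v in g])
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented NIR-predicted TOC values (%) for the same samples in three sets.
ws_pred = [1.21, 1.45, 1.62, 1.88, 2.05]
ds_pred = [1.18, 1.49, 1.60, 1.91, 2.10]
gss_pred = [1.20, 1.47, 1.65, 1.86, 2.08]
print(one_way_anova_f([ws_pred, ds_pred, gss_pred]))
```

With three sets of 16 predictions each, as in the external validation described here, the degrees of freedom would be 2 (between sets) and 45 (within sets).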
As a consequence, an attempt was made to gather all samples together in a single set and recalculate a unique calibration model. The regression model (three factors) and the cross-validation curve are shown in Figure 7. In this way, it was possible to include within a unique calibration all the signal variations derived from both water and particle size that generate scattering in the spectra. A total of 138 spectra were processed with SNV and 2nd derivative as pretreatments and used to develop the calibration model. Blockwise cross-validation assigned 92 spectra to the C-set and 46 to the V-set. The squared correlation coefficient (R²) and the standard error of calibration (SEC) were 0.78 and 0.17, respectively. The validation samples were predicted with a SECV of 0.18 and an R² of 0.80. Based on the Durbin-Watson (DW) statistics, both the C-set and V-set showed no autocorrelation. RPD for calibration and cross-validation were 1.9 and 1.8, respectively, which are satisfactory in terms of the predictive ability of the model. The new calibration model was validated using the same 16 samples previously collected, with an R² of 0.71 and an RMSEP of 0.30 (Figure 8).
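As a quick arithmetic check, the RPD values quoted for the combined model can be reproduced from the definition given in the chemometrics section (SD of the reference values divided by the standard error). The sketch assumes that the reference SD of 0.33% reported for the 46 calibration samples also applies to the combined set, since the same samples are measured in all three conditions; the function name is hypothetical.

```python
def rpd(sd_reference, standard_error):
    """Relative prediction deviation: SD of reference values / standard error."""
    return sd_reference / standard_error


SD_REFERENCE = 0.33   # % TOC, SD of the reference values (assumed to carry over)
SEC, SECV = 0.17, 0.18

print(round(rpd(SD_REFERENCE, SEC), 1))   # 1.9
print(round(rpd(SD_REFERENCE, SECV), 1))  # 1.8
```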

Conclusions
NIR spectroscopy has demonstrated great potential to predict TOC in soil samples with different characteristics in terms of particle size and moisture content. A lab-scale method to predict TOC has been set up and proposed for soils at different moisture content and two particle sizes. Results have shown linearity and satisfactory regression models in all the three cases separately. The absence of differences in the prediction capability of independent samples has demonstrated that the general effects of physical soil characteristics do not generate dramatic interferences with spectral signals and TOC quantifications. Despite a slight worsening of the prediction capacity, the possibility to gather all samples and build a unique calibration model has permitted us to encompass the two principal sources of spectral offsets and shifts in the calibration model, increasing its robustness and reliability with unknown samples. Future improvements of this application could permit performing NIR analysis of soils directly in field by potentially using a probe connected to the NIR instrument.
Author Contributions: Elena Tamburini has worked for years on the development of NIR in the agri-food field, and carried out the experimental tests and chemometric evaluation required to build up the NIR calibrations and validation. Fabio Vincenzi and Stefania Costa carried out all the experimental assays and data processing related to reference methods. Paolo Mantovi contributed to the revision process with regard to the agronomic aspects of the study and the potential usefulness of in-field application. As supervisors of the research group, Paola Pedrini and Giuseppe Castaldelli defined the general research statement, from analytical and in-field perspectives, respectively.