Quantification of Soil Properties with Hyperspectral Data: Selecting Spectral Variables with Different Methods to Improve Accuracies and Analyze Prediction Mechanisms

We explored the potential of both non-imaging laboratory and airborne imaging spectroscopy to assess indicators of arable soil quality. We focused on microbial biomass C (MBC) and hot water-extractable C (HWEC), complemented by organic carbon (OC) and nitrogen (N) as well-studied spectrally active parameters. We aggregated different spectral variable selection strategies to analyze their benefits for attainable estimation accuracies and to explore the spectral predictive mechanisms for MBC and HWEC. With selected variables, quantification accuracies improved markedly for MBC (laboratory: RPD = 2.32 instead of 1.33 with full spectra; airborne: 2.35 instead of 1.80) and OC (laboratory: RPD = 3.08 instead of 2.36; airborne: 2.20 instead of 1.94). Patterns of selected variables indicated similarities between HWEC and OC, but significant differences between all other soil variables. This agreed with the results of indirect approaches in which (i) wet-chemical data of OC and N and (ii) spectra fitted to measured OC and N values were used to estimate MBC and HWEC. Compared to these approaches, laboratory and airborne data showed marked benefits for a direct spectral quantification of MBC (but not of HWEC). This suggests that spectra carry specific information on MBC that can be used to determine this important soil parameter.


Introduction
Compared to laboratory reflectance spectroscopy in the Vis-NIR-SWIR domain (i.e., the spectral region between 400 and 2500 nm), which works with prepared soil samples under well-defined illumination and viewing conditions, imaging spectroscopy (IS) with air- or spaceborne data is far less common in soil applications. One main reason for the limited number of IS studies is the still restricted availability of data. Beyond this, IS approaches suffer from a series of limitations compared to laboratory or field spectroscopy. To retrieve surface reflectances from the measured radiances, the data have to be corrected for atmospheric effects. The signal-to-noise ratio of air- or spaceborne hyperspectral data is relatively low compared to laboratory data because of the short integration time over the target area. Furthermore, a precise georectification of the data is necessary to correctly attribute the collected and analyzed soil samples to image pixels [1,2].
IS measures the reflectance of only the first few millimeters of the soil surface. Variable surface properties affect the spectral assessment of soil properties, as they may blur, depending on the target variable, the specific spectral responses that are relevant for calibrating the estimation model. Surface properties that vary in space (and time) include moisture, roughness, degree of crusting, texture, and the spectral mixture of bare soil with other materials such as vegetation or crop residues. Compared to airborne data, Hyperion satellite hyperspectral data are additionally limited by a coarser spatial resolution with a nominal pixel size of 30 m, which aggravates, for example, the problem of different surfaces being mixed in one pixel [1,3,4].
MBC is a measure of the mass of the living component of soil organic matter. It underpins any biologically related property of soil systems and is essentially linked to nutrient cycling, humification, physical structure formation, degradation of contaminants and soil fertility [17,18]. Thus, MBC and the species and functional diversity of soil microorganisms have recently gained significance as bioindicators of soil quality [18][19][20], leading to a large number of studies over the last decade [17,21]. Compared to total OC, MBC is substantially more dynamic and responds quickly to changes in agricultural management, such as tillage practice, organic amendments and the incorporation of cover crops, or to changes in soil moisture and temperature. MBC depends on the amount of available soil C substrate, so that the MBC-to-OC ratio indicates the availability of substrate to soil microorganisms [22][23][24][25][26]. Nevertheless, MBC and OC may be highly correlated; Jenkinson and Ladd [27], for example, reported that conditions that promote the accumulation of organic matter in general also increase MBC.
Because of its value as a biological soil quality indicator, we examined MBC in addition to OC and N. As another soil quality indicator, we included hot water-extractable C (HWEC), as this fraction can be used to quantify a labile, decomposable part of OC (Cdec, active pool) that is distinguishable from a passive, inert pool of OC. Under long-term steady-state conditions, the relationship between HWEC and Cdec can be expressed as a constant ratio [22,28]. HWEC sensitively reflects changes in soil organic matter, for example those caused by the input, quality and degradation of primary organic material and by soil management practices [29][30][31]. However, some studies question the usefulness of water extractions for understanding C dynamics [32].
All four soil variables were tested for whether they could be quantified from laboratory spectroscopic and/or airborne hyperspectral data. For MBC and HWEC, however, we hypothesize that their spectral assessment is at least partly linked to that of OC. We refer, for example, to the early study of Chang et al. [12], who attributed good predictions of soil biological properties to spectral responses very similar to those of soil OC. In contrast, Zornoza et al. [16] found wavelength ranges other than those for OC to be most useful for calibrating MBC (and other microbiological soil properties) from Vis-NIR-SWIR spectra.
Hyperspectral data provide large sets of usually strongly collinear predictor variables, which, independent of the target variable, hampers a successful calibration. To partly compensate for these effects, dimensionality reduction is a standard approach. Partial least squares regression (PLSR), for example, projects the spectral data into a low-dimensional space formed by a set of orthogonal latent variables, with the aim of maximizing the covariance between predictors and target variable(s). PLSR is currently the most common multivariate calibration method [33], although other approaches have also been applied successfully (see overviews in Viscarra Rossel et al. [34], Vohland et al. [35], Bellino et al. [36]). Furthermore, many studies have shown that it is beneficial to include spectral variable selection in the multivariate modeling process to exclude uninformative spectral variables [37]. Possible benefits are increased estimation accuracies and more parsimonious and thus robust calibration models, which may improve the predictive ability for independent validation samples. Additionally, the selected informative or key variables can provide insight into the underlying spectral predictive mechanisms [37][38][39]. However, relevant mechanisms, e.g., for soil organic matter variables, may vary from one soil population to another, each with a specific combination of spectrally meaningful factors such as soil texture or color [40,41]. For example, a series of studies (see overview in Cécillon et al. [1]) indicates that only spectral variables beyond 1100 nm are closely associated with OC, whereas Viscarra Rossel et al. [42] showed that soil color can be used as a proxy for an indirect quantification of OC.
Several approaches exist for selecting an optimal subset of spectral wavelengths (see, e.g., the overview in Xiaobo et al. [37]). They may be subdivided into categories depending on the statistical features of the variables used for selection (e.g., regression coefficients or t-statistics) and on whether and how the interaction of variables in the search space is considered. In general, approaches have to compromise between an exhaustive search through the possible combinations of variables and computational efficiency [39]. Because of differences in the search procedure, selections will not fully match but will differ from one approach to another. We assume that multi-method selections, together with their comparison and synopsis, are worthwhile for deepening the understanding of the specific spectral responses of the studied soil variables. Moreover, similarities and discrepancies between the specific selection patterns should help to analyze to what extent the predictive mechanisms for the different soil properties are related to each other.
Against this background, the objectives of our study were: (i) to study and compare the usability of laboratory (non-imaging) and airborne (imaging) spectroscopic measurements for estimating OC, N, MBC and HWEC; (ii) to explore whether estimation accuracies increase when different spectral variable selection techniques are applied (in combination with PLSR); (iii) to analyze and compare predictive mechanisms based on the multi-method spectral selection patterns obtained for each soil property; and (iv) to clarify whether the quantification of MBC and/or HWEC is based on specific spectral responses or is instead driven by their relation to the spectrally active constituents OC and N.
Our dataset comprised airborne spectra acquired with the HyMap sensor (Integrated Spectronics, Baulkham Hills, Australia) and laboratory spectra measured with a FOSS XDS Rapid Content Analyzer spectrometer (FOSS NIRSystems, Laurel, MD, USA). In total, data from 42 agricultural plots sampled during the HyEurope 2009 campaign were analyzed. Although limited in size, the sample set covered a wide range of geological substrates, pH values, soil types and soil texture classes. For the multivariate modeling, we combined PLSR with three spectral variable selection methods that implement fundamentally different search strategies for finding the optimal subset of spectral variables: competitive adaptive reweighted sampling (CARS [43]), a method that iteratively retains informative variables (IRIV [39]), and a genetic algorithm (GA [44]).

Study Site, Soil Samples and Their Analytical Properties
The study area (49°55′21.5″N/6°25′05″E to 49°53′53.5″N/6°45′52.5″E) was located in the Bitburger Gutland, part of the Eifel low mountain range in Rhineland-Palatinate, Germany. The Bitburger Gutland forms a slightly undulating landscape with plateaus between 300 and 450 m a.s.l., partly dissected by V-shaped, forested valleys running from north to south. The plateaus are predominantly used for agriculture (Figure 1). Geologically, the Bitburger Gutland is mainly characterized by Triassic sediments that were deposited in the Trier-Bitburger basin (an extension of the Parisian basin) (Figure 1); following tectonic movements in the Neogene, differential resistance to erosion has led to the formation of a series of escarpments [45]. In this region, we sampled a total of 60 agricultural plots with bare soils during a campaign between 23 and 27 August 2009. These plots were located in an area of 23 × 2.5 km², which represented the overflight stripe of the HyMap sensor (see Section 2.2). Plots affected by clouds in the airborne image data were excluded, so that data from 42 plots were finally used for the analysis in this study (Figure 1). They represented conventionally used arable land with regular plowing; typical crops were winter wheat, winter rape, rye and maize.
At the sampled plots, soils were classified according to the World Reference Base for Soil Resources [47]. The arable soils developed from a range of parent materials (e.g., debris from iron-rich sandstone, limestone, dolomite or clay sediments, each overlain by layers with admixtures of loess) and were identified as Luvic and Haplic Stagnosols, Dystric, Eutric and Colluvic Cambisols, Dystric Regosols and Calcic Leptosols. Topsoil (Ap horizon) textures were (sample counts in parentheses) sand (2), loamy sand (9), silty sand (8), sandy loam (2), silty loam and silt (11), clay loam (6) and silty clay (4). At each plot, soil was sampled from the top horizon (Ap, 0-10 cm depth). To this end, five samples were taken, one at a central point and four at sampling points 1.5 m from the central point in each cardinal direction. All five samples per plot were pooled into one composite sample. For each plot, the position of the central point was recorded with GPS.
In the laboratory, soil samples were sieved to ≤2 mm, air-dried and, for C and N analyses, subsequently pulverized by grinding in an agate mortar. The pH was determined potentiometrically in a 0.01 M CaCl₂ solution with a glass electrode (soil-to-solution ratio 1:2.5). Total contents of OC and N were measured by gas chromatography after dry combustion at 1100 °C with a EuroEA elemental analyzer (HekaTech, Wegberg, Germany). Soil samples possibly containing free carbonates were pretreated to remove carbonate C. In brief, 200 mg of pulverized soil (dry matter basis) were treated with 200 mL of 0.22 M HCl solution and then suspended (18,000 rpm) with an Ultra Turrax vortexer (IKA, Staufen, Germany) for at least 3 min. Non-purgeable organic carbon, equivalent to total organic carbon (TOC), was then analyzed from this suspension with a TOC-VCPN analyzer (Shimadzu, Duisburg, Germany). HWEC was determined following the method of Körschens et al. [48]: after a 1-h extraction of 10 g soil with distilled water (50 mL) at 100 °C in a Gerhardt Turbotherm TT 125 (Gerhardt, Bonn, Germany), cooling, addition of MgSO₄ and centrifugation at 2600 rpm for 10 min, the dissolved organic carbon of the supernatant was analyzed with the Shimadzu TOC-VCPN analyzer.
For soil microbiological analysis, the soil material was rewetted by the capillary method [49], using a spray bottle to gently sprinkle the samples twice a day for four days in total. MBC was determined by the chloroform fumigation extraction method according to Joergensen [50]; 0.01 M CaCl₂ solution was used to extract organic C from non-fumigated and chloroform-fumigated soil samples. C was subsequently determined with the Shimadzu TOC-VCPN analyzer. MBC was calculated from the difference in extracted C between fumigated and non-fumigated samples, using the correction factor kEC = 0.45 [50,51].
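The final conversion step is simple arithmetic. The sketch below assumes the usual convention of the fumigation-extraction method, in which the extractable fraction E_C is the organic C extracted from the fumigated sample minus that extracted from the non-fumigated sample; the function name and the example values are hypothetical and not taken from this study.

```python
def microbial_biomass_c(c_fumigated, c_non_fumigated, k_ec=0.45):
    """Microbial biomass C from chloroform fumigation extraction.

    E_C (fumigated minus non-fumigated extractable organic C) is divided
    by k_EC, the extractable fraction of microbial biomass C.
    Units follow the inputs, e.g. µg C per g dry soil.
    """
    e_c = c_fumigated - c_non_fumigated
    if e_c < 0:
        raise ValueError("fumigated extract contains less C than the control")
    return e_c / k_ec


# Hypothetical extract values, not data from this study
print(microbial_biomass_c(130.0, 40.0))
```

With kEC = 0.45, an extra 90 units of extractable C after fumigation correspond to about 200 units of microbial biomass C.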
Statistics of the analyzed soil organic matter parameters (and pH values) are provided in Table 1. Correlations between the analyzed properties are shown in Table 2. The samples covered a broad range of organic matter contents and C-to-N ratios, and a soil reaction from strongly acidic to slightly alkaline. Overall, the soil parameters given in Table 1 cover values that are typical for agricultural soils [52,53]. Plot selection, sampling, wet-chemical analyses and spectral readings (see Section 2.2) provided the data pool for the subsequent statistical analyses and the interpretation of the results, as illustrated in Figure 2.

Spectral Database
A hyperspectral image was recorded with the 125-band HyMap airborne imaging sensor on 27 August 2009 near solar noon (solar zenith: 39.9°, solar azimuth: 180.7°). The band configuration of HyMap covered a wavelength range from 0.45 to 2.48 µm with band-specific FWHM (full width at half maximum of the spectral response function) values between 13 and 17 nm. At a flight altitude of about 2290 m, the ground resolution of our dataset was approximately 4 m. A detailed description of the laboratory performance of HyMap can be found in Cocks et al. [54]; among other things, HyMap achieves signal-to-noise ratios of about 1000:1 or better in the atmospheric window regions when imaging a 50% reflectance target at an illumination zenith angle of 30 degrees.
The pre-processing of the acquired HyMap data comprised the following steps and provided a geocoded and atmospherically corrected image product containing surface reflectances:
- conversion of digital numbers to at-sensor radiances using laboratory and in-flight radiometric calibration data provided by HyVista Corporation (Baulkham Hills, New South Wales, Australia);
- atmospheric correction to surface reflectances with the airborne ATCOR® software (ATCOR-4, German Aerospace Center DLR, Weßling, Germany), which is based on the MODTRAN® (MODerate resolution atmospheric TRANsmission) computer code;
- parametric geocoding with the PARGE® software (ReSe Applications Schläpfer, Wil, Switzerland), which directly supports the processing of HyMap data and integrates flight parameters, terrain information and ground control reference points;
- cross-track illumination correction (excluding forested and clouded areas) using a second-order polynomial approach implemented in ENVI™ (ENvironment for Visualizing Images, Harris Geospatial Solutions, Broomfield, CO, USA).
Data calibration and atmospheric correction were performed by DLR (German Aerospace Center, Oberpfaffenhofen, Weßling, Germany) as data provider. After pre-processing, the pixels matching the field-measured GPS coordinates were extracted, and their spectra were transformed to (pseudo-)absorbances by log10(1/reflectance).
In the laboratory, Vis-NIR-SWIR spectra of all 42 air-dried and ground samples were collected with a FOSS XDS Rapid Content Analyzer (RCA) spectrometer (FOSS NIRSystems, Laurel, MD, USA), which combines a single-beam monochromator with the RCA module containing the detectors. The FOSS instrument measures in diffuse reflectance mode. A tungsten halogen lamp serves as the internal light source; after the light is separated into individual wavelengths by a holographic grating, it irradiates the sample at normal incidence (0° zenith angle). The detectors of the RCA module are all mounted at a 45° angle to the sample surface. Spectra are recorded between 400 and 2500 nm with a sampling interval of 0.5 nm and a spectral resolution of 2 nm. The spectrometer is internally calibrated with a ceramic standard before each measurement [55,56]. In our case, soil samples were scanned in duplicate in a quartz-bottomed Petri dish placed on the sample window. The resulting spectra were recorded as (pseudo-)absorbances, obtained by log10(1/reflectance), and averaged over 32 scans for each sample. Finally, the FOSS XDS spectra were resampled to the HyMap spectral bands, approximating the spectral response function of each band by a Gaussian function.
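The two spectral transformations described above can be sketched as follows, assuming NumPy. The Gaussian width is derived from each band's FWHM via σ = FWHM / (2√(2 ln 2)). Weighting the fine-resolution spectrum by the Gaussian response and normalizing by the summed weights is one common way to approximate band integration and is an assumption here, as are the function names and the three illustrative band centres (they are not the actual HyMap band list).

```python
import numpy as np

# sigma of a Gaussian with a given full width at half maximum
FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))


def pseudo_absorbance(reflectance):
    """Transform reflectance (0-1) to pseudo-absorbance log10(1/R)."""
    return np.log10(1.0 / np.asarray(reflectance, dtype=float))


def resample_gaussian(wl, spectrum, centers, fwhms):
    """Resample a finely sampled spectrum to broader bands.

    Each target band's spectral response is approximated by a Gaussian
    with the given centre and FWHM (nm); the band value is the
    response-weighted mean of the fine spectrum.
    """
    wl = np.asarray(wl, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    out = np.empty(len(centers))
    for i, (center, fwhm) in enumerate(zip(centers, fwhms)):
        sigma = fwhm * FWHM_TO_SIGMA
        weights = np.exp(-0.5 * ((wl - center) / sigma) ** 2)
        out[i] = np.sum(weights * spectrum) / np.sum(weights)
    return out


# Hypothetical example: constant 25% reflectance on a 0.5 nm grid (400-2500 nm)
wl = np.arange(400.0, 2500.5, 0.5)
absorbance = pseudo_absorbance(np.full(wl.shape, 0.25))
bands = resample_gaussian(wl, absorbance, centers=[500.0, 1400.0, 2200.0],
                          fwhms=[15.0, 15.0, 17.0])
print(bands)  # all three values ≈ log10(4) ≈ 0.602
```

Because the weights are normalized, a flat spectrum is preserved exactly, while absorption features narrower than the band FWHM are smoothed.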
Figure 3 shows, as examples, FOSS and HyMap spectra measured and pre-processed for three different plots: a Dystric Regosol with high sand contents, low pH values (free of carbonates) and low OC contents; a clayey Stagnosol with higher pH values and higher contents of OC and inorganic C; and a loamy Colluvic Cambisol with again slightly higher values of pH, OC and inorganic C.
Remote Sens. 2017, 9, 1103 7 of 22

Multivariate Calibrations of OC, N, MBC and HWEC from Spectral Measurements
PLSR based on the nonlinear iterative partial least squares (NIPALS) algorithm was used for the multivariate calibration of the studied soil properties from the pre-processed spectral datasets (FOSS and HyMap data, each with 125 spectral variables). PLSR was used with full spectra and in combination with CARS, IRIV and GA as different spectral variable selection techniques, which are briefly described below. The number of latent variables was determined in each case by leave-one-out (LOO) cross-validation with the minimum root-mean-square error (RMSE) as criterion; to reduce possible overfitting, we allowed a general maximum of twelve latent variables. As the limited total number of samples (n = 42) did not allow a split into calibration and external validation samples, we instead performed an internal LOO cross-validation of the estimation model. However, LOO cross-validation may provide over-optimistic results, particularly when the number of samples is small, and accuracies may drop with independent validation samples [57,58].
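A compact PLS1 (single-response) sketch of this calibration setup, assuming NumPy, is given below: NIPALS extraction of latent variables, leave-one-out predictions, and selection of the number of latent variables by minimum LOO RMSE with an upper bound of twelve. It illustrates the technique and is not the software used in the study; the function names are hypothetical, and the cap is additionally limited by the number of variables and training samples so that the deflation does not run out of rank.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """Single-response PLS via NIPALS on mean-centred data.

    Returns (x_mean, y_mean, b) such that y_hat = y_mean + (x - x_mean) @ b.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr                      # weights: covariance of X with y
        w /= np.linalg.norm(w)
        t = Xr @ w                         # scores of this latent variable
        tt = t @ t
        p = Xr.T @ t / tt                  # X loadings
        q = yr @ t / tt                    # y loading
        Xr = Xr - np.outer(t, p)           # deflate X and y
        yr = yr - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P = np.column_stack(W), np.column_stack(P)
    b = W @ np.linalg.solve(P.T @ W, np.array(Q))   # b = W (P'W)^-1 q
    return x_mean, y_mean, b


def loo_predictions(X, y, n_comp):
    """Leave-one-out predictions for a fixed number of latent variables."""
    pred = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        x_mean, y_mean, b = pls1_fit(X[keep], y[keep], n_comp)
        pred[i] = y_mean + (X[i] - x_mean) @ b
    return pred


def select_n_components(X, y, max_comp=12):
    """Choose the number of latent variables with minimum LOO RMSE."""
    max_comp = min(max_comp, X.shape[1], X.shape[0] - 2)
    best = None
    for a in range(1, max_comp + 1):
        pred = loo_predictions(X, y, a)
        rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
        if best is None or rmse < best[1]:
            best = (a, rmse, pred)
    return best


# Hypothetical synthetic data: 30 samples, 8 spectral variables
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))
y = X @ rng.normal(size=8) + 0.05 * rng.normal(size=30)
n_lv, rmse_cv, y_cv = select_n_components(X, y)
```

With as many latent variables as (linearly independent) predictors, PLS1 reproduces ordinary least squares; the benefit of PLSR for collinear spectra comes from stopping earlier.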
Goodness of estimation was evaluated on the cross-validated (cv) results with the coefficient of determination (r²cv), the ratio of performance to deviation (RPDcv, defined as the ratio of the standard deviation of the reference values to the standard error of the cross-validated estimates), RMSEcv and the relative RMSEcv (rRMSEcv = RMSEcv divided by the arithmetic mean of the measured values).
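The four statistics can be computed from measured values and their cross-validated estimates as sketched below (standard library only). Two conventions are assumptions: r² is taken as 1 − SS_res/SS_tot (some studies use the squared Pearson correlation instead), and the "standard error" in the RPD denominator is taken as the bias-corrected standard deviation of the residuals (SEP). The function name is hypothetical.

```python
import math
import statistics


def cv_metrics(measured, predicted):
    """Goodness-of-fit statistics for cross-validated estimates."""
    n = len(measured)
    resid = [p - m for m, p in zip(measured, predicted)]
    mean_m = statistics.fmean(measured)
    ss_res = sum(e * e for e in resid)
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    rmse = math.sqrt(ss_res / n)
    sep = statistics.stdev(resid)              # bias-corrected standard error
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rpd": statistics.stdev(measured) / sep,
        "rmse": rmse,
        "rrmse": rmse / mean_m,                # relative to the measured mean
    }


print(cv_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0]))
```

In this small example, RPD equals √10 ≈ 3.16, because the sample standard deviation of the measured values is √10 times that of the residuals.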
CARS, here combined with PLSR, aims at selecting key wavelengths with a physical meaning [38,43] in a computationally efficient procedure that is based on the values of the regression coefficients. In detail, a series of m sampling runs (m = 50 in our case) is performed, each with two selection steps. In the first step, an exponential function (with m and the number of original wavelengths as parameters) defines the number of spectral variables to be removed in each sampling run; this number of spectral variables with the smallest absolute regression coefficients is then excluded. In the second step, n random numbers are generated to pick a selection from the n variables kept in the first step. The probability of a variable being picked with these random numbers depends on its weight, which is directly related to its regression coefficient. At the end of this step, only those variables are kept that have been picked more than once. After all sampling runs, the best of the m resulting models is chosen by the minimum RMSE in the LOO cross-validation. Because of the random numbers in the second step, however, no unique solution exists. We therefore used CARS in an ensemble mode, i.e., the complete procedure was repeated 30 times and the final results were obtained by averaging all 30 (generally normally distributed) predictions per sample.
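The first CARS step can be sketched as follows. The exponentially decreasing function uses the form commonly given for CARS, r_i = a·exp(−k·i) with a = (p/2)^(1/(m−1)) and k = ln(p/2)/(m−1), so that all p variables remain in the first run and two in the last; this parameterization is an assumption to check against [43]. Only the deterministic, coefficient-based removal is shown, not the adaptive reweighted sampling of the second step, and the function names are hypothetical.

```python
import math


def edf_retained_counts(p, m):
    """Number of variables retained in each of the m CARS sampling runs.

    Exponentially decreasing function r_i = a * exp(-k * i), with
    a = (p/2)**(1/(m-1)) and k = ln(p/2)/(m-1): run 1 keeps all p
    variables and run m keeps 2.
    """
    a = (p / 2) ** (1 / (m - 1))
    k = math.log(p / 2) / (m - 1)
    return [max(2, round(p * a * math.exp(-k * i))) for i in range(1, m + 1)]


def keep_largest_coefficients(coefs, n_keep):
    """Indices of the n_keep variables with the largest |regression coefficient|."""
    order = sorted(range(len(coefs)), key=lambda j: abs(coefs[j]), reverse=True)
    return sorted(order[:n_keep])


# 125 bands and 50 sampling runs, as in the text
counts = edf_retained_counts(125, 50)
print(counts[0], counts[-1])  # 125 2
```

The exponential schedule removes many variables in early runs and few in late runs, which is what keeps CARS computationally cheap.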
Compared to CARS, IRIV follows a more exhaustive search strategy. Given the spectral data matrix with n samples and p variables, a binary matrix filled with 1 and 0 (in a ratio of 1:1) is generated; its dimension is k × p, with k as the number of random combinations of variables, which is defined depending on the number of considered spectral variables. Each row of this binary matrix, a randomly generated sequence of 0 (variable switched off) and 1 (variable switched on), determines the variables used as input for PLSR. The performance of each subset (row) in PLSR is then evaluated by the five-fold cross-validated RMSE, yielding a vector of RMSEcv values of dimension k × 1. In the next step, all subsets defined by the rows of the binary matrix are again considered for modeling, but now with systematic changes to assess the importance of each variable. Consider the first column (spectral variable) of the binary matrix: a former 0 is now changed to 1 (and vice versa), while the states of all other variables in columns 2, 3, ..., p remain unchanged. Again, each subset (now changed at the first position) is used as input for PLSR. At the end of this step, we obtain k pairs of RMSEcv values, where each pair relates to one row of the binary matrix, once including the first spectral variable (case A) and once excluding it (case B). The average RMSEcv is calculated over all case A and all case B models. If the mean RMSEcv(A) is lower than the mean RMSEcv(B), the variable is considered informative. The described procedure is performed for all variables (columns) of the binary matrix. At the end, all informative variables are retained, while all other variables are removed. Then, a new binary matrix with a reduced p is generated and the selection procedure starts again. As many rounds as necessary are performed until only informative variables remain. Finally, a backward elimination is conducted with the retained informative variables to obtain the final list of variables for the PLSR approach. In this step, RMSEcv is first calculated with all j retained variables. Then it is tested whether leaving out one variable i (with i = 1, 2, ..., j) yields a new minimum RMSEcv that is smaller than RMSEcv with all variables. If so, variable i is left out and the elimination starts again (with j changed to j − 1); if not, no further variable is removed and the current list of variables is used for the final PLSR estimation (with LOO cross-validation). Like CARS, IRIV does not provide a unique solution because of the random combinations of variables in the binary matrix. Thus, the complete IRIV cycle was also repeated 30 times and the final results were averaged over all 30 outputs.
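One screening round of IRIV, as described above, is sketched below. Instead of flipping the entries of each row, the code evaluates every row once with variable j forced on (case A) and once forced off (case B), which yields the same pairs. The scoring callback stands in for the five-fold cross-validated PLSR RMSE and is hypothetical, as are the function names; the repeated rounds, the backward elimination, and refinements of the published algorithm [39] (such as significance testing of the A/B differences) are omitted.

```python
import random
import statistics


def random_binary_matrix(k, p, rng):
    """k x p matrix; each column holds k//2 ones in random order (1:1 ratio)."""
    cols = []
    for _ in range(p):
        col = [1] * (k // 2) + [0] * (k - k // 2)
        rng.shuffle(col)
        cols.append(col)
    return [[cols[j][i] for j in range(p)] for i in range(k)]


def iriv_informative(p, score, k=50, seed=0):
    """Indices of variables whose inclusion lowers the mean RMSE.

    score(mask) returns a cross-validated RMSE for the 0/1 mask of a
    variable subset (lower is better).
    """
    rng = random.Random(seed)
    matrix = random_binary_matrix(k, p, rng)
    informative = []
    for j in range(p):
        rmse_a, rmse_b = [], []
        for row in matrix:
            with_j = row.copy()
            with_j[j] = 1                      # case A: variable j included
            without_j = row.copy()
            without_j[j] = 0                   # case B: variable j excluded
            rmse_a.append(score(with_j))
            rmse_b.append(score(without_j))
        if statistics.fmean(rmse_a) < statistics.fmean(rmse_b):
            informative.append(j)
    return informative


def toy_rmse(mask):
    # hypothetical: variable 0 lowers the error, every other variable adds noise
    return 1.0 - 0.5 * mask[0] + 0.1 * sum(mask[1:])


print(iriv_informative(6, toy_rmse, k=20))  # [0]
```

Averaging over many random backgrounds is what lets IRIV judge each variable in combination with others rather than in isolation.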
GAs are guided random search strategies inspired by natural selection. The strategy is to randomly select an initial set of "chromosomes" (in our case, each with 125 spectral variables, or "genes", each gene being switched on or off by binary coding) and then to generate new chromosomes by reproduction and mutation. The responses of the new chromosomes are evaluated (improved fitness or not), which leads to a decision to include them in the population or to discard them. Here, the response was defined as the cross-validated percentage of explained variance of the target variable obtained in the respective PLS regression. One hundred evaluations were performed in each run, with a total of 100 runs. At the end of all runs, the selected spectral variables of the overall fittest chromosome were used as input for PLSR to obtain estimates of the respective target soil variable. A detailed description of the GA used in this study can be found in Leardi [44]. Unlike other GAs, the runs are not independent of each other: each run learns from the information (specified weights for each variable) provided by the previous runs. To combine the benefits of this learning with the advantage of several independent solutions, we repeated the complete cycle of 100 runs 30 times. Final estimates with this method were then obtained by averaging all 30 solutions for each sample.
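For orientation, a generic steady-state GA for binary variable selection is sketched below. It is not Leardi's implementation [44]: there is no learning across runs and no further refinement of the selected subset, and parameters such as population size, initial selection probability and mutation rate are illustrative assumptions. The fitness callback stands in for the cross-validated percentage of explained variance of a PLS model, and all names are hypothetical.

```python
import random


def ga_select(p, fitness, pop_size=30, n_iter=100, p_on=0.1, p_mut=0.01, seed=0):
    """Generic steady-state GA for binary variable selection (sketch).

    fitness(mask) returns the value to maximise for a 0/1 mask of length p.
    Returns the fittest chromosome and its fitness.
    """
    rng = random.Random(seed)

    def random_mask():
        mask = [1 if rng.random() < p_on else 0 for _ in range(p)]
        if not any(mask):                       # avoid empty variable subsets
            mask[rng.randrange(p)] = 1
        return mask

    pop = [random_mask() for _ in range(pop_size)]
    fit = [fitness(m) for m in pop]
    for _ in range(n_iter):
        a, b = rng.sample(range(pop_size), 2)   # two random parents
        cut = rng.randrange(1, p)               # single-point crossover
        child = pop[a][:cut] + pop[b][cut:]
        child = [g ^ 1 if rng.random() < p_mut else g for g in child]  # bit-flip mutation
        if not any(child) or child in pop:      # skip empty or duplicate chromosomes
            continue
        f = fitness(child)
        worst = min(range(pop_size), key=fit.__getitem__)
        if f > fit[worst]:                      # child replaces the weakest member
            pop[worst], fit[worst] = child, f
    best = max(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


def toy_fitness(mask):
    # hypothetical: reward the first five "bands", penalise subset size
    return sum(mask[:5]) - 0.1 * sum(mask)


mask, score = ga_select(20, toy_fitness, n_iter=500)
```

Because a child only replaces the weakest member when it is fitter, the best fitness in the population never decreases from one iteration to the next.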
Finally, the selections obtained with the different methods were "pooled". All spectral variables (spectral bands) were ranked according to their selection frequencies in CARS, IRIV and the GA. For this ranking, the relative importance (RI) was calculated for each spectral variable and each of the studied soil properties according to Equation (1), in which sel_i is the selection frequency of spectral variable i. Spectral variables were then sorted by their RI values, and the corresponding measured spectral data were entered into PLSR modeling in forward selection mode. Estimates were taken from the most successful model, i.e., the one with the highest cross-validated r².
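The ranking and forward selection step can be sketched as below, under stated assumptions. The RI values are taken as input, however they were computed from the selection frequencies. `r2_cv` is a hypothetical callable that returns the cross-validated r² of a PLSR model on a tuple of band indices. Bands are added strictly in descending RI order, and ties are broken by band index.

```python
from typing import Callable, Sequence


def forward_selection_by_ri(ri: Sequence[float],
                            r2_cv: Callable[[tuple], float]) -> tuple:
    """Add bands in descending RI order and keep the prefix with the highest r2_cv."""
    # sorted() is stable even with reverse=True, so equal RI values keep band order.
    order = sorted(range(len(ri)), key=lambda i: ri[i], reverse=True)
    best_subset, best_r2 = (), float("-inf")
    for n in range(1, len(order) + 1):
        subset = tuple(order[:n])
        r2 = r2_cv(subset)
        if r2 > best_r2:
            best_subset, best_r2 = subset, r2
    return best_subset, best_r2
```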
Based on the calculated RI values, the selection patterns were compared across the soil properties. This was done for all 125 bands and, to account for multicollinearity, also for groups of wavelengths identified by k-means clustering of the measured FOSS and HyMap data. Six groups were defined for FOSS (455-514 nm, 529-588 nm, 603-748 nm, 762-1207 nm, 1221-2331 nm, 2348-2478 nm). The HyMap data were clustered into seven groups (455-558 nm; 573-705 nm; 719-893 nm; 909-1193 nm; 1207-1405 nm and 1533-1776 nm; 1419-1519 nm and 1788-2331 nm; 2348-2478 nm). The distributions of RI values, now aggregated by these groups, were then compared pairwise using χ² statistics to identify similarities and significant differences between the soil variables.
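The following sketch shows one possible way to carry out the grouped comparison. It sums RI values per wavelength group and computes a Pearson χ² homogeneity statistic for two soil variables. A group may consist of several disjoint ranges, as for two of the HyMap groups. Treating the grouped RI sums as a 2 × g contingency table is an assumption, because the exact test layout is not specified here. The p-value would then follow from the χ² survival function, for example `scipy.stats.chi2.sf(stat, df)`.

```python
from typing import Sequence

# A group is a list of (lower, upper) wavelength ranges in nm, both inclusive.
Group = Sequence[tuple]


def group_sums(ri: Sequence[float], wavelengths: Sequence[float],
               groups: Sequence[Group]) -> list:
    """Sum the RI values of all bands whose centre wavelength falls into each group."""
    return [sum(v for v, w in zip(ri, wavelengths)
                if any(lo <= w <= hi for lo, hi in group))
            for group in groups]


def chi2_homogeneity(row_a: Sequence[float], row_b: Sequence[float]) -> tuple:
    """Pearson chi-square statistic and degrees of freedom for a 2 x g table.

    Both rows must have positive totals.
    """
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    row_totals = [sum(row_a), sum(row_b)]
    grand = sum(row_totals)
    stat, used = 0.0, 0
    for c, col_total in enumerate(col_totals):
        if col_total == 0:
            continue  # an empty group carries no information
        used += 1
        for r, row in enumerate((row_a, row_b)):
            expected = row_totals[r] * col_total / grand
            stat += (row[c] - expected) ** 2 / expected
    return stat, used - 1
```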

Indirect Assessment of MBC and HWEC from OC, N and with Modelled Spectra
In addition to PLSR, we also quantified MBC and HWEC with multiple linear regressions using N and OC as predictor variables. This analysis explored the possible benefits of spectroscopic approaches for the quantification of these soil properties compared with a proxy approach based on wet-chemical data. Again, LOO cross-validation was used to evaluate the predictive power of these regression models.
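Below is a minimal sketch of this proxy approach: ordinary least squares with OC and N as predictors, evaluated by leave-one-out cross-validation. The metric definitions are assumptions made for illustration. Here r²cv is taken as 1 − PRESS/total sum of squares, and RPDcv as the sample standard deviation of the measured values divided by RMSEcv. The study may define these metrics slightly differently.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> list:
    """Coefficients [b0, b1, ...] of y = b0 + b1*x1 + ... via the normal equations."""
    rows = [[1.0, *x] for x in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y], solved by Gaussian elimination with partial pivoting.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, k):
            f = m[r][c] / m[c][c]
            m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):
        coef[r] = (m[r][k] - sum(m[r][j] * coef[j] for j in range(r + 1, k))) / m[r][r]
    return coef


def loo_mlr(X: list, y: list) -> dict:
    """Leave-one-out predictions plus RMSEcv, r2cv and RPDcv for a linear model."""
    preds = []
    for i in range(len(y)):
        coef = ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])  # refit without sample i
        preds.append(coef[0] + sum(b * x for b, x in zip(coef[1:], X[i])))
    press = sum((p - t) ** 2 for p, t in zip(preds, y))
    rmse = math.sqrt(press / len(y))
    y_bar = mean(y)
    ss_tot = sum((t - y_bar) ** 2 for t in y)
    rpd = stdev(y) / rmse if rmse > 0 else math.inf
    return {"pred": preds, "rmse_cv": rmse, "r2_cv": 1 - press / ss_tot, "rpd_cv": rpd}
```

A call such as `loo_mlr([[oc_i, n_i], ...], mbc_values)` would then return cross-validated statistics comparable in form to those of the spectral models.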
Besides this approach with only two predictor variables, we also applied PLSR to spectra that were not measured but fitted to the measured values of OC and N. The fitting procedure comprised three steps: (i) the selection of four samples (those with the minimum and maximum values of OC and N) and their measured spectra; (ii) for each band, a linear regression on these four samples, with OC and N (four values each) as x variables and the respective four spectral values as the y variable; and (iii) the application of the band-wise regression coefficients to all other samples (n = 38) to model spectra that fit their measured OC and N contents. In step (iii), the spectral value of each sample in band i (with i = 1, 2, ..., 125) was modeled as b0 + b1 × OC + b2 × N. The retrieved spectra contained the complete spectral variability introduced by OC and N and preserved the typical shape of FOSS and HyMap spectra (Figure 4). The modeled spectra were then subjected to multivariate calibration to obtain cross-validated estimates for all studied soil properties.
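The band-wise fitting of spectra to OC and N can be sketched as follows, under several assumptions. The minimum/maximum OC and N samples are assumed to be at least three distinct, non-collinear samples. Each band is fitted by ordinary least squares on these anchors. Following the wording "all other samples", the anchors keep their measured spectra by default. The function names are hypothetical.

```python
from typing import Sequence


def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_band(oc, n, s):
    """Least-squares (b0, b1, b2) for s = b0 + b1*OC + b2*N (normal equations, Cramer's rule)."""
    rows = [(1.0, o, v) for o, v in zip(oc, n)]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * t for r, t in zip(rows, s)) for i in range(3)]
    d = _det3(a)
    return [_det3([[rhs[i] if j == k else a[i][j] for j in range(3)] for i in range(3)]) / d
            for k in range(3)]


def model_spectra(oc: Sequence[float], n: Sequence[float],
                  spectra: Sequence[Sequence[float]], keep_anchors: bool = True):
    """Return spectra modeled from OC and N, plus the indices of the anchor samples."""
    anchors = sorted({oc.index(min(oc)), oc.index(max(oc)), n.index(min(n)), n.index(max(n))})
    coefs = [fit_band([oc[i] for i in anchors], [n[i] for i in anchors],
                      [spectra[i][band] for i in anchors])
             for band in range(len(spectra[0]))]
    modeled = []
    for i, (o, v) in enumerate(zip(oc, n)):
        if keep_anchors and i in anchors:
            modeled.append(list(spectra[i]))  # anchor samples keep their measured spectra
        else:
            modeled.append([b0 + b1 * o + b2 * v for b0, b1, b2 in coefs])
    return modeled, anchors
```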
Remote Sens. 2017, 9, 1103

Estimation Results for Soil Variables from Measured Spectra
PLSR models based on full spectra without spectral variable selection provided the following order of accuracies for the FOSS laboratory measurements: OC >> N > MBC > HWEC (Table 3).With the exception of OC, results were weak, with RPD values less than 1.5 in the LOO cross-validation.For HyMap data, the order was OC > MBC > HWEC > N; results for OC and N were worse than with the FOSS data, while estimates for MBC and HWEC, although at a low accuracy level, were slightly more accurate than with FOSS (Table 3).
The analysis of the FOSS results showed that one particular sample produced by far the largest errors for N, MBC and HWEC. In an outlier analysis performed in the spectral feature space, this sample was the only outlier identified, with a Mahalanobis distance of H = 18.5 and a p-value < 0.01 when compared with the χ² probability distribution function. Excluding this sample considerably improved estimation accuracies for N, HWEC and MBC for both FOSS and HyMap data (Table 3). However, the corresponding outlier analysis of the HyMap data did not indicate any outlier. The further analysis using PLSR combined with spectral variable selection was therefore performed with all 42 samples.
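The outlier screening can be illustrated with a small sketch that computes squared Mahalanobis distances of samples in a feature space, for example principal component scores of the spectra. The choice of feature space and the use of the sample covariance matrix (divisor n − 1) are assumptions. A p-value could then be obtained by comparing each squared distance with a χ² distribution whose degrees of freedom equal the number of features, for example with `scipy.stats.chi2.sf(d2, p)`.

```python
from typing import Sequence


def invert(a: Sequence[Sequence[float]]) -> list:
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        scale = m[c][c]
        m[c] = [v / scale for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [row[n:] for row in m]


def mahalanobis_sq(scores: Sequence[Sequence[float]]) -> list:
    """Squared Mahalanobis distance of each sample (row) to the mean of all samples."""
    n, p = len(scores), len(scores[0])
    mu = [sum(row[j] for row in scores) / n for j in range(p)]
    dev = [[row[j] - mu[j] for j in range(p)] for row in scores]
    cov = [[sum(d[i] * d[j] for d in dev) / (n - 1) for j in range(p)] for i in range(p)]
    inv = invert(cov)
    return [sum(d[i] * inv[i][j] * d[j] for i in range(p) for j in range(p)) for d in dev]
```

With the sample covariance matrix, the squared distances always sum to (n − 1) · p, which provides a quick sanity check of an implementation.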
Estimation accuracies improved considerably when PLSR was combined with spectral variable selection techniques (Table 4). With FOSS data, the RPDcv obtained for OC exceeded 3.0 in all cases. Results were poorest for HWEC, with RPDcv values generally below 2.0. Compared with the FOSS results, accuracies on HyMap data dropped distinctly for OC and N but remained stable or even improved for MBC and HWEC. As with PLSR without variable selection, the order of accuracies obtained with CARS, IRIV and GA switched from OC > N > MBC > HWEC (FOSS) to OC, MBC > HWEC (at least for CARS and IRIV) > N (HyMap).
Of all the selection techniques applied, GA was the most successful for all soil variables and for both FOSS and HyMap data (Table 4). Differences between GA and the other selection approaches were most pronounced for N (with FOSS data, RPDcv of 3.70 versus less than 2.5 for all other methods) and for HWEC (with HyMap data, RPDcv of 2.68 versus less than 2.0 for the other methods). With pooled selections, accuracies usually ranked in the mid-range; only for N was this approach slightly inferior to the other selection procedures (Table 4). In most cases, with the exception of N and HWEC (CARS), fewer spectral variables were selected from the laboratory data than from the HyMap data (Table 4).

Analysis of Selection Patterns
Because the RI integrates the results of CARS, IRIV and GA, it was analyzed for all 125 spectral variables to elucidate differences between soil variables and between FOSS and HyMap data. Comparing HyMap and FOSS, we found marked differences for all soil variables, less pronounced for OC than for N (both illustrated in detail in Figure 5). For easy comparison, Figure 6 shows relative RI values for all data. In the case of OC, spectral values in the visible range and in the SWIR beyond 2050 nm were important for the PLSR approach on both laboratory and airborne data. Differences between the two datasets were more pronounced in the NIR range from 700 to 1000 nm, which was of greater relevance for the HyMap data. They were also more pronounced in the SWIR range below 2050 nm, where some marked RI peaks occurred only for the laboratory data measured with the FOSS instrument (Figure 5a). Values in the NIR between 1000 and 1300 nm were irrelevant for both datasets. For N, a series of marked RI peaks was found for both laboratory and airborne data in the SWIR range at wavelengths greater than 1400 nm, but, except at 1505 and 2210 nm, the positions of these peaks did not match (Figure 5b). In addition, the NIR range below 1000 nm was relevant for modeling with HyMap data but irrelevant for the laboratory (FOSS) data. For HWEC, the importance of the spectral region below 700 nm differed markedly between FOSS and HyMap data, whereas this region was relevant for MBC in both datasets (Figure 6). For both variables, some NIR and SWIR key wavelengths in the HyMap data were of low importance for the FOSS data (HWEC: 1560-1599 nm; MBC: 1560, 1573, 2210 and 2297 nm; Figure 6 and Table 5).

Concerning RI peaks, we found some bands of general relevance for all four constituents (Figure 6, Table 5). The band at 2210 nm, usually attributed to the OH combination band of clay minerals [59], was markedly prominent, with peaks for OC and N (FOSS and HyMap data) and, on HyMap data, for HWEC and MBC. In the laboratory data, absorbances measured at 1776 nm were important for all constituents, and spectral values at 632 nm were important for OC, HWEC and MBC. For HyMap, values at 1560, 1573 and 2192 nm were identified as important for N, HWEC and MBC. These wavelengths may be related to the combination band of free or adsorbed soil water located at 1450 nm (see also Figure 3 for the relevance of this band in the HyMap data) and again to the clay hydroxyl band around 2200 nm [59,60].

Based on grouped spectral variables (see Section 2.3), χ² statistics indicated highly significant differences between the RI distributions obtained with FOSS data for OC, N, HWEC and MBC (p < 0.01 in all cases). The only exception was MBC compared with HWEC, where differences were weaker but still significant (p = 0.019). Results for HyMap differed in that OC and HWEC showed similar distributions (p = 0.33); in all other cases, differences were again highly significant (p < 0.01).

Indirect Approaches to Quantify HWEC and MBC
We used both measured and estimated values of OC and N to predict HWEC and MBC (Table 6). For both variables, measured values were generally more appropriate than estimated values. For HWEC, wet-chemical data of OC and N provided estimates that were more accurate than those of almost all spectrally based approaches (Tables 3, 4 and 6); the indirect approach was outperformed only by GA-PLSR applied to laboratory and airborne data. For MBC, the indirect approaches using OC and N values did not reach the accuracies of PLSR combined with variable selection, regardless of the selection method and the data (laboratory or airborne, Table 4).
Using modeled spectra fitted to measured OC and N values as input for PLSR provided perfect estimates for both OC and N (with modeled FOSS as well as modeled HyMap spectra; r²cv = 1, each with three latent variables). Using these spectra to estimate HWEC and MBC supported the results obtained with linear regression on wet-chemical data of OC and N (Table 6). With modeled FOSS spectra, HWEC was estimated with r²cv = 0.72, RMSEcv = 116 and RPDcv = 1.90 (Table 6, Figure 3). Results based on modeled HyMap spectra were almost identical (Table 6). The scatterplot of measured vs. estimated HWEC values shows that the indirect approaches (with chemical data and especially with calculated spectra) outperformed direct estimation from measured spectra. They were both more accurate and had more homoscedastic residuals (Figure 7). Overall, these results indicate that direct spectral approaches are of very limited use for this soil variable. In contrast, MBC estimates obtained with indirect approaches scattered more widely around the 1:1 line than those obtained from the measured spectra (Figure 7).

Discussion
This study confirmed that OC, a key soil characteristic, can be estimated successfully using airborne IS. The results agree well with those of other studies, e.g., Steinberg et al. [8] (r²cv = 0.74, RPDcv = 1.90) and Hbirkou et al. [6] (r²cv = 0.83, RPDcv = 2.45). However, OC could not be predicted from HyMap spectra in the study of Gomez et al. [5] (r²cv = 0.02, RPDcv = 0.99), probably because of low OC contents (between 0.4 and 1.5%) in their calibration data. Stevens et al. [61] studied airborne data of the AHS-160 sensor (Argon ST, Ann Arbor, MI, USA) for the retrieval of OC. They showed that, under heterogeneous soil conditions, stratified approaches referring to sub-regions or, even more appropriately, to different soil types are a reasonable way to further improve estimation accuracies. For example, with PLSR and the 100 validation samples examined in that study, accuracies improved from r²cv = 0.53 and RPDcv = 1.47 in a "global" approach without stratification to r²cv = 0.86 and RPDcv = 2.76 with soil-type-specific sub-models. This is in line with the findings of Colombo et al. [62], who proposed classifying soil samples according to their main pedological features and then calibrating specific PLS models. This kind of stratification would also be an option for the heterogeneous soil region studied here, provided that a larger number of samples were available. Selige et al. [7] used HyMap data and obtained highly accurate estimates for both OC and total N (r²cv = 0.90 and r²cv = 0.92, respectively); their results for N in particular markedly outperformed ours. This could partly be due to a very high correlation between OC and N in the dataset of Selige et al. [7] (Pearson's r = 0.97, compared with 0.80 in our data).
For HWEC and MBC, comparable data from other airborne IS studies are not available. For both variables, estimates from HyMap data were generally more accurate than those from FOSS data. However, PLSR combined with pooled variable selections yielded only very small differences. In contrast, estimates of OC and N were markedly less accurate with HyMap than with laboratory data. This agrees with the findings of Stevens et al. [9], who also compared laboratory (ASD FieldSpec) and airborne (AHS-160 sensor) hyperspectral data. They found markedly reduced accuracies for OC at the image level, with RPDcv dropping from 2.03 (laboratory) to 1.47 (image). Similarly, Lagacherie et al. [10] found, using HyMap data, that empirical relations between laboratory spectra and soil variables (clay and calcium carbonate contents) degraded distinctly at the image level, and the estimation accuracies obtained from the two datasets differed accordingly. Different data levels or scales (laboratory vs. image) are reflected in systematic variations in the shapes of the measured spectra, as discussed in detail by Stevens et al. [9]. Different shapes were also obvious for our data (Figures 3 and 4), although some typical absorption features were notable in both datasets (absorption peaks for carbonate at 2350 nm and for clay around 2200 nm [2,10]). Different general reflectance levels due to different soil textures (decreasing reflectance with increasing particle size) were also apparent (Figure 3). Remote but direct measurements of more or less rough topsoil surfaces with HyMap are fundamentally different from laboratory measurements of prepared (dried and ground) samples. Increasing soil roughness reduces reflectance through micro-shadowing, so the absorbance values of the laboratory data were markedly lower than those recorded by the HyMap sensor. Spectrally, HyMap provides contiguous sampling, except for the strong atmospheric water absorption bands near 1380 and 1875 nm [63]. However, water absorption bands adjacent to the water vapor bands markedly influenced the shape of the HyMap spectra (Figures 3 and 4). These bands have peaks at 1940 nm, at 1450 nm and, weaker and broader than the other two, at 1200 nm [63]. Vegetation cover and crop residues are also spectrally meaningful factors [9], but they are of little relevance here, as we only considered bare soils with no or very little crop residue or stubble. We therefore consider differences in soil moisture content and soil surface roughness to be the main factors behind the variability of the HyMap spectra compared with the FOSS spectra. In addition, radiometric calibration uncertainties of the HyMap sensor, possible residual atmospheric effects [10] and incompletely compensated directional effects of the monitored soil surfaces may also have caused general spectral differences between airborne and laboratory data.
The reduced variability of the FOSS spectra compared with the HyMap spectra explains why, in most cases (see Section 3.1), fewer spectral variables were selected from the laboratory data. Selection patterns differed between the two datasets. In all cases, variable selection improved estimation accuracies markedly, most strongly for OC, N and MBC on FOSS data. With pooled selections, RPDcv increased to values greater than 2.0 for MBC (both datasets), N (laboratory) and OC (HyMap); at the laboratory scale, the RPDcv obtained for OC increased from 2.36 to more than 3.0. Despite the differences between laboratory and HyMap spectra, we identified some key wavelengths that were highly prominent for the quantification of the studied soil variables, above all the hydroxyl band around 2200 nm. This coincides with findings for other datasets, for example by Viscarra-Rossel and Behrens [34] (OC), Vohland and Emmerling [64] (OC and HWEC) and Vohland et al. [65] (OC, N). These studies also identified the water band at 1940 nm as a key region for OC, N and HWEC, but in our data this applied only to N in the laboratory spectra. However, the second important water band, at 1450 nm, was identified as a key region for OC and N (laboratory data) and MBC (HyMap data). Stenberg et al. [59] provide a detailed overview of the main soil organic constituents and their overtone and combination bands in the NIR region. They reported bands around 1100, 1600, 1700 to 1800, 2000 and 2200 to 2400 nm as particularly relevant for OC and N calibration. In addition to the hydroxyl band around 2200 nm, we found a match for the 1700 to 1800 nm region for OC and N in the laboratory data, and for N also in the HyMap data, which may be attributed to an alkyl band in this spectral region [34]. In line with previous studies [59], we found the visible region to be important for the assessment of OC.
Accuracies obtained for the estimation of HWEC and MBC differed distinctly in our study. For both, however, it was hypothesized that their spectral prediction may be closely coupled to that of spectrally active main soil properties. For MBC, Zornoza et al. [16] discussed this issue with a focus on OC as the main factor. Values of r² obtained in a simple linear regression between measured OC and MBC values were lower than the r² values obtained for MBC in the multivariate estimation procedure using NIR data and PLSR. Furthermore, the spectral ranges used for the multivariate calibration of OC and MBC differed markedly. From these findings, the authors concluded that the spectra must contain information specific to MBC. Ludwig et al. [66,67] showed that other main soil properties, in addition to OC, were also relevant for the quantification of MBC. Depending on the sample set, they found the following predictor variables to add significantly to the quantification of MBC in multiple linear regressions: (N, pH, clay), (pH, sulphur (S)), (OC, pH) and (OC, N, S, phosphorus (P)). Estimates with these variables were almost as successful as those with spectral data. Note, however, that these studies used mid-infrared data in the range between 400 and 7000 cm⁻¹ rather than Vis-NIR-SWIR data. However, because the most appropriate set of predictor variables differed from one sample set to another, it can be concluded that the underlying predictive mechanisms varied and that the prediction models could not be transferred in space and time. This is also in line with the findings of other studies [68,69]. Emmerling and Udelhoven [68] modeled MBC of agricultural soils with multiple regressions based on other soil properties and identified clay content, pH and HWEC as highly significant predictors. Applying their model to another dataset produced marked underestimates of MBC. Better results were obtained by considering clay content, organic matter, pH and annual precipitation values, or by a combination of OC, N, clay and sand content [69]. However, these results also indicate that a combination of spectral and non-spectral predictors may be promising. Peng et al. [70] realized such a combination, though not for biological soil properties. They used laboratory spectral data combined with ancillary environmental data (e.g., plant available water, soil texture classes) and spectral indices derived from multispectral remote sensing data for the quantification and spatial modeling of OC.
For HWEC, indirect approaches were more successful than direct approaches based on measured spectra. The indirect approaches were a multiple linear regression with OC and N as predictors and PLSR based on spectra calculated from OC and N values. The direct approaches used PLSR alone or PLSR combined with CARS, IRIV or pooled selected variables. Furthermore, the spectral variable selection patterns calculated for HyMap did not differ significantly between HWEC and OC. These findings indicate that Vis-NIR-SWIR data, both laboratory and airborne, are of very limited use for a direct quantification of HWEC. No overall spectral pattern was found for HWEC. This may also result from the large variability and heterogeneity of this pool, which consists largely of easily degradable compounds such as carbohydrates, peptides and amino acids [30,71].
For MBC, the findings were different: the pattern analysis of selected variables indicated significant differences from those of OC, N and also HWEC. Indirect approaches were outperformed by PLSR coupled with any of the spectral variable selection methods, especially IRIV, GA and pooled selections. This did not change even when measured pH values were added to the multiple linear regression in addition to OC and N, as suggested by the findings of Ludwig et al. [66,67]. Adding pH improved the accuracy of indirect MBC estimates only slightly (r²cv = 0.68, RPDcv = 1.77, RMSEcv = 0.37). Taken together, this suggests that the spectral information is specific to MBC. This specificity may result from a direct spectral signature of MBC itself, from a complex interaction of other spectrally meaningful factors related to MBC, or from a combination of both.

Conclusions
Airborne IS proved appropriate for the quantification of key soil properties, above all OC and MBC. The shapes of the spectra indicated marked differences between the laboratory level (prepared samples) and airborne observations (bare soil surfaces), the latter being influenced by more variable on-site conditions. Empirical approaches applied successfully to laboratory spectroscopic data are thus not per se ready to be employed for airborne data. The same holds for spaceborne hyperspectral data, such as those of the forthcoming Environmental Mapping and Analysis Programme (EnMAP) or Hyperspectral Infrared Imager (HyspIRI) missions. Given the coarser spatial resolutions of these satellite missions, pure bare soil pixels will be less likely, and approaches are needed to cope with mixed pixels. In any case, estimation models should be recalibrated for each new situation and for changing conditions defined by soil moisture, roughness and possibly crop residues.
The combination of PLSR with spectral variable selection is a valuable tool for improving accuracies of the obtained estimates.This was shown for conceptually different approaches and proved valid for a more generalized procedure with pooled selections, which balanced the specific responses of the individual methods.Identified key variables could in part be interpreted physically and related to the absorption band of free or adsorbed soil water around 1450 nm and the hydroxyl band around 2200 nm.
Based on our results, the spectral approach is beneficial for the quantification of MBC (but not of HWEC). It outperformed the indirect approaches that used OC and N as predictor variables and/or calculated spectra ideally fitted to the measured OC and N values. The spectra proved specific to MBC, but the analysis of a single sample set cannot fully explain the underlying spectral principles that make a statistical prediction of MBC possible. For this, a comparative analysis would be advisable, covering a large number of sample sets with highly different spectral properties and a very wide range of couplings between physical, chemical and biological soil properties.

Figure 1.Location of the study region with HyMap flight stripe, geological situation and sampling positions (geological data modified after Wagner [46]).

Figure 2. Workflow of the study.

Figure 3. Examples of spectra measured in the laboratory (FOSS) and from the aircraft (HyMap) for three plots (for clarity, FOSS spectra were shifted by −0.1 absorbance units).

Figure 4. Examples of modeled (a) FOSS and (b) HyMap spectra fitted to measured values of OC and N (for clarity, spectra are shown for a subset of seven samples; absorbance is defined as log10(1/reflectance)).

Figure 5. Relative importance of all 125 spectral variables (FOSS XDS and HyMap) for the estimation of (a) OC and (b) N; peaks are labeled with the central wavelength (rounded to the nearest nm value) of the respective HyMap band.

Figure 6. Relative importance of all 125 spectral variables for the estimation of the studied soil properties based on (a) laboratory and (b) airborne data.

Figure 7. Scatterplots of measured vs. estimated values of HWEC (a-c) and MBC (d,e) (cross-validated results, n = 42), in which estimates were obtained with (a,d) PLSR and pooled selected variables based on measured FOSS spectra; (b,e) linear regression based on wet-chemical data of OC and N; and (c,e) PLSR and calculated FOSS spectra that were perfectly fitted to OC and N values (results of both indirect approaches matched perfectly for MBC).

Table 1. Descriptive statistics of the parameters analytically determined for the 42 studied topsoil samples.

Table 3. Statistics of estimates from FOSS and HyMap spectra using PLS regression with all 125 spectral variables (LOO cross-validated results).
a: number of latent variables; b: units are given in the first column; c: biascv (mean error) = mean of cross-validated estimates − mean of measured values, units are given in the first column.

Table 4. Statistics of estimates from FOSS and HyMap spectra using PLSR combined with different spectral variable selection methods (LOO cross-validated results, n = 42).
a: OC and N in %, HWEC and MBC in mg kg⁻¹; b: number of latent variables (for CARS, IRIV and GA averaged over 30 runs), with the number of selected original spectral variables in brackets; c: units are given in the first column; d: biascv (mean error) = mean of cross-validated estimates − mean of measured values, units are given in the first column; *: between given value and 0.0.

Table 6. Statistics of estimates for HWEC and MBC obtained with multiple linear regression (measured and estimated values of OC and N as predictors*) and with PLSR using modeled spectra (LOO cross-validated results, n = 42).
*: predictors were significant with p-values < 0.05 in all cases; 1: estimates of OC and N obtained with PLSR and pooled selected variables; 2: as described in Section 2.4; a: units are given in the first column; b: biascv (mean error) = mean of cross-validated estimates − mean of measured values, units are given in the first column.