Soil Organic Carbon Estimation in Croplands by Hyperspectral Remote APEX Data Using the LUCAS Topsoil Database

The most commonly used approach to estimate soil variables from remote-sensing data entails time-consuming and expensive data collection including chemical and physical laboratory analysis. Large spectral libraries could be exploited to decrease the effort of soil variable estimation and obtain more widely applicable models. We investigated the feasibility of a new approach, referred to as bottom-up, to provide soil organic carbon (SOC) maps of bare cropland fields over a large area without recourse to chemical analyses, employing both the pan-European topsoil database from the Land Use/Cover Area frame statistical Survey (LUCAS) and Airborne Prism Experiment (APEX) hyperspectral airborne data. This approach was tested in two areas having different soil characteristics: the loam belt in Belgium, and the Gutland–Oesling region in Luxembourg. Partial least square regression (PLSR) models were used in each study area to estimate SOC content, using both bottom-up and traditional approaches. The PLSR model’s accuracy was tested on an independent validation dataset. Both approaches provide SOC maps having a satisfactory level of accuracy (RMSE = 1.5–4.9 g·kg−1; ratio of performance to deviation (RPD) = 1.4–1.7) and the inter-comparison did not show differences in terms of RMSE and RPD either in the loam belt or in Luxembourg. Thus, the bottom-up approach based on APEX data provided high-resolution SOC maps over two large areas showing the withinand between-field SOC variability.


Introduction
Many soil chemical components (chromophores) interact with the electromagnetic radiation within the visible-near infrared (VNIR: 400-1300 nm) and short-wave infrared (SWIR: 1300-2500 nm) spectral regions.Soil spectroscopy exploits the correlation between spectral features and chromophores in order to estimate soil variables.The strength of this relationship is not the same for all soil variables, and also the shape of the spectral features can be very different.For example, the metal-OH bending and O-H stretching related to a clay lattice cause a narrow and shallow absorption feature at 2207 nm [1,2].Overtones of the O-H and H-O-H stretch vibrations in the free water produce two deep absorption features at 1455 nm and 1915 nm, while organic matter has a strong relationship with electromagnetic radiation in the visible region, due to a wide spectral feature centred around 664 nm related to the chlorophyll pigment [3,4].However other narrow peaks have a strong relation with organic carbon and organic matter in the SWIR, especially between 2100 nm and 2300 nm [5][6][7].In order to exploit spectral features, a suitable spectral resolution is necessary, especially for soil variables affecting spectra in narrow spectral regions (e.g., clay and organic matter).Most of the chemometric approaches to estimate soil variables use the whole spectrum; for example, the partial least square regression (PLSR) method exploits both the wavelengths correlated to the target variable (i.e., absorption features) and other bands indirectly related to the target variable [8].Thus, in addition to band width, the spectral range is very important to ensure an accurate soil variable estimation by chemometric approaches.PLSR allows summarizing of the spectral information and, therefore, in some cases a very narrow band width is not strictly necessary.Castaldi et al. [9] showed that reducing the spectral resolution from 1 nm to 40 nm did not cause a significant decrease in the estimation accuracy for the SOC, clay, sand and silt content.Spectroscopy, applied in the laboratory, allows estimating soil variables only at the sampling points, while both airborne and satellite data provide quantitative soil information over large areas.As the band width of the main airborne sensors is smaller than 40 nm, their spectral range allows for most of the soil chromophores to be exploited.There is an increasing number of papers concerning the estimation of SOC exploiting airborne hyperspectral data [10][11][12][13][14][15][16][17].
Hyperspectral satellite data could be more attractive than airborne data as satellites, having a short revisit time, allow for the collection of a number of images during a year for the same area from the same sensor under same angle of observation.Unlike airborne data, satellite acquisition ensures a cheap way to obtain spectral data over very large areas.However, the quantitative prediction of soil properties using the first generation of hyperspectral satellite sensors is hampered by the very low signal-to-noise ratio (SNR) in the SWIR region for Hyperion imagers on board of the NASA EO-1 platform [9,18,19], or by the restricted spectral range (415-1050 nm) for the Compact High Resolution Imaging Spectrometer (CHRIS) on the European Space Agency's PROBA platform [9,[20][21][22].Because of the low SNR, Zhang et al. [23] obtained unreliable PLSR predictions for soil moisture and clay content using Hyperion image reflectance spectra.In order to decrease the SNR of the Hyperion data, Castaldi et al. [19] used the bands retrieved from the minimum-noise fraction transformation (MNF) as regressors in the multivariate models for the clay, sand and organic matter estimation; in this case, the authors managed to improve the estimation accuracy as compared to multispectral data obtained by the Advance Land Imager (ALI).In the near future, at least five satellites equipped with hyperspectral imagers are due to be launched: the German Environmental Mapping and Analysis Program (EnMap) [24], the Italian PRecursore IperSpettrale della Missione Applicativa (PRISMA) [25], the U.S. NASA Hyperspectral Infrared Imager (HyspIRI) [26], the Japanese Hyperspectral Imager Suite (HISUI) [27], the Israeli Hyperspectral imager (SHALOM) [28], and the China Commercial Remote-sensing Satellite System (CCRSS).
Most authors directly calibrate the airborne or satellite spectra on soil samples collected from the corresponding pixels [16,19,[21][22][23]29].This involves collection of soil samples within bare fields, laboratory measurement of the target variable, and calibration of a multivariate model linking the quantity of the target variable with the spectra extracted from the remote-sensing instrument at the sampling points.This procedure generally ensures a high estimation accuracy within the investigated area, but it entails time-consuming and expensive data collection (sampling, chemical or physical laboratory analysis).The use of other covariates can increase the efficiency of soil-estimation models provided by sampling that is not intensive (i.e., the collection of a few samples in a large area); for example, Casa et al. [21] showed the advantage of merging remote sensing and geophysical data sources using hybrid methods, i.e., statistical techniques combining regression and geostatistical analysis.However, the calibration of empirical models allows only local efficiency and their application to different conditions or areas is often difficult or impossible, even for neighboring fields in cropland as shown by Casa et al. [22] for clay, silt and sand estimation using airborne (MIVIS) and satellite (CHRIS-PROBA) hyperspectral data and PLSR models.The authors compared two different validation strategies: the random selection of the calibration and validation dataset from two neighboring fields, and the spatially independent validation; i.e., using data from one field for calibration and data from another field for validation, Casa et al. [22] showed a lower accuracy of the spatially independent models as compared to the random strategies, confirming the issues about the extrapolation from empirical models.
In this context, national, continental and global soil spectral libraries could be exploited to decrease the efforts concerning the soil-variable estimation and obtaining more widely applicable models.A consistent number of large soil spectral libraries are available at national [30][31][32][33] and continental scale [34,35].Moreover, a global soil spectral library was created connecting spectral libraries from all over the world [36].All these libraries provide, in addition to spectral data, analytical data on a number of soil variables, allowing the calibration of multivariate models covering larger soil variability than the models calibrated using local libraries.However, often the large spectral libraries are built collating local libraries that were collected under differing conditions and using different protocols and instruments.Thus, a spectral alignment between libraries is necessary.Recently, Castaldi et al. [37] used the pan-European LUCAS topsoil database to accurately estimate SOC content in two local spectral libraries.They did not need to apply any spectral alignment, because both LUCAS and local spectra were scanned with the same model of spectrometer (FOSS NIR Systems Inc., Laurel, MD, USA), which provided very stable measurements.This approach could allow estimating SOC or other soil variables in croplands in Europe exploiting the LUCAS database and without chemical laboratory analyses.The satisfactory exploitation of the LUCAS database on local spectral libraries was made possible by the high stability of the laboratory spectral measurements.Unfortunately, remote-sensing data are not stable because of the change of the atmospheric and soil conditions between two different acquisition times and the uncertainty of the pre-processing procedure (radiometric, atmospheric and geometric correction).This instability hampers the use of remote-sensing data to build a reliable spectral library and, consequently, the application of a single calibration model on spectral data acquired at different times.At the same time, the future availability of hyperspectral satellite data entails the need for an automated approach that allows estimation of soil variables over large areas, avoiding the calibration of predicted models using remote-sensing spectra of soil samples collected at the location of the image pixels.For this purpose, large spectral libraries could be exploited in synergy with remote-sensing data, but the comparison between remote and laboratory data is hampered by several factors: the difference in support size, the georeferencing error, the spectral characteristics, and the atmospheric effects.Although the remote-sensing data are atmospherically corrected, this correction introduces an uncertainty in the measurements, causing a non-systematic or random difference between laboratory and remote spectra.
Although a consistent number of spectral-transfer procedures were tested between two datasets both composed of laboratory spectra [38][39][40][41][42][43], the transfer between spectra acquired in laboratory conditions to those acquired by remote sensors is a still rare [44,45].Moreover, in order to find a reliable transfer function, it is mandatory to collect and scan soil samples.In fact, a transfer function can only be obtained by the comparison between spectra acquired using both the master and slave instrument.The transfer function can be stable in the case of two sensors working in the same conditions, like two laboratory spectrometers, while for the remote data the change in atmospheric conditions between two different acquisition times involves different spectral responses at the same points.This difference entails a new transfer function, but the available bare soil surfaces are not always the same, thus implying the need for new sampling for each remote acquisition.
Here we aim to test a new approach, named bottom-up, that allows avoidance of the aforementioned issues related to the spectral transfer between laboratory and remote-sensing data, while at the same time prepares for an automated approach to estimate soil variables over large areas, as will be required once the hyperspectral satellites will be launched.This approach started from the methodology proposed by [37] to predict the SOC values at sampling points based on the LUCAS spectral library; then these values were linked to the airborne spectra building a PLSR model; and in the final step the PLSR model was applied to all bare soil pixels of the airborne image producing SOC maps with the same spatial resolution as the airborne data.Thus, this approach allows laboratory analysis of the target variable to be avoided.The new method was compared with the traditional approach, i.e., the calibration of a multivariate model linking remote spectra and the quantity of the target variable measured in the laboratory.

Study Area
We chose two study areas with a fair amount of cropland having different soil characteristics: the loam belt in Belgium and the Gutland-Oesling region in Luxembourg.
The northern part of Wallonia is mostly characterized by Quaternary and Tertiary geological formations.Iin particular, the Belgian loam belt is mainly characterized by niveo-eolian sediments dominated by cropland with productive silt loam soils and a rolling topography [46].The Belgian research area consists of a 7 km-wide and 66 km-long SW-NE strip (NW corner: 50 • 46 N, 5 • 5 E; SE corner: 50 • 32 N, 4 • 41 E) from Gembloux (Walloon region) to Sint Truiden (Flanders region).The Belgian loam belt has a temperate oceanic climate, the mean annual rainfall is 790 mm, the lowest monthly mean temperature is in January (2.3 • C), and the highest monthly mean in July (17.8 • C).The Luxembourg research area is located in the central and northern part of the Grand Duchy of Luxembourg within the Gutland and Oesling regions (Figure 1) along a 5 km-wide and 29 km-long North-South strip (NW corner: 49  C in summer (July) [46].The mean temperatures in the north of the country (Oesling region) are slightly lower both in winter and summer.The mean annual rainfall is approximately 800 mm in the central and southern Luxembourg regions and 1000 mm in the north.The Gutland region is located in central Luxembourg and, in the southern part of the image area, it consists of secondary marls, limestone, sandstone and dolomites with a rolling topography.The Gutland region can be divided in two agro-pedological zones: the Redange-Diekirch area with clay, clay-loam and sandy-loam soils and the Central area with clay, sandy-loam and sandy soil [14].The Oesling region is located in the north of the Grand Duchy of Luxembourg and it is mainly characterized by shallow stony loam soils derived from Devonian slate, quartzite and sandstone substrates on a sub-horizontal plateau with a mean altitude of 450 m incised by the main rivers [47].
analysis of the target variable to be avoided.The new method was compared with the traditional approach, i.e., the calibration of a multivariate model linking remote spectra and the quantity of the target variable measured in the laboratory.

Study Area
We chose two study areas with a fair amount of cropland having different soil characteristics: the loam belt in Belgium and the Gutland-Oesling region in Luxembourg.
The northern part of Wallonia is mostly characterized by Quaternary and Tertiary geological formations.Iin particular, the Belgian loam belt is mainly characterized by niveo-eolian sediments dominated by cropland with productive silt loam soils and a rolling topography [46].The Belgian research area consists of a 7 km-wide and 66 km-long SW-NE strip (NW corner: 50°46′N, 5°5′E; SE corner: 50°32′N, 4°41′E) from Gembloux (Walloon region) to Sint Truiden (Flanders region).The Belgian loam belt has a temperate oceanic climate, the mean annual rainfall is 790 mm, the lowest monthly mean temperature is in January (2.3 °C), and the highest monthly mean in July (17.8 °C).The Luxembourg research area is located in the central and northern part of the Grand Duchy of Luxembourg within the Gutland and Oesling regions (Figure 1) along a 5 km-wide and 29 km-long North-South strip (NW corner: 49°57′N, 6°49′E; SE corner: 49°42′N, 6°11′E).The climate in Luxembourg is temperate oceanic.The mean temperatures ranging from 0.7 °C in winter (January) to 17 °C in summer (July) [46].The mean temperatures in the north of the country (Oesling region) are slightly lower both in winter and summer.The mean annual rainfall is approximately 800 mm in the central and southern Luxembourg regions and 1000 mm in the north.The Gutland region is located in central Luxembourg and, in the southern part of the image area, it consists of secondary marls, limestone, sandstone and dolomites with a rolling topography.The Gutland region can be divided in two agro-pedological zones: the Redange-Diekirch area with clay, clay-loam and sandyloam soils and the Central area with clay, sandy-loam and sandy soil [14].The Oesling region is located in the north of the Grand Duchy of Luxembourg and it is mainly characterized by shallow stony loam soils derived from Devonian slate, quartzite and sandstone substrates on a sub-horizontal plateau with a mean altitude of 450 m incised by the main rivers [47].

Lucas Topsoil Database
The topsoil dataset provided by the Land Use/Cover Area frame statistical Survey (LUCAS) is a pan-European spectral library, composed of 19,967 topsoil samples collected in 2009 throughout 25 countries [35].The soil samples were air-dried and sieved (2 mm) before the spectral acquisition by means of an XDS Rapid Content Analyzer (FOSS NIR Systems Inc., Laurel, MD, USA) following the protocol described by the manufacturer and the Soil Spectroscopy Group [36].The diffuse high-resolution reflectance was continuously measured from 400 nm to 1300 nm (VNIR) and from 1300 nm to 2500 nm (SWIR) with a spectral sampling interval of 0.5 nm.In addition to spectral measurements, the LUCAS dataset includes 12 chemical and physical soil properties.The total carbon was measured by dry-combustion using a CN analyser (VarioMax, Elementar Gmbh, Hanau, Germany) and SOC content was obtained subtracting the inorganic carbon by means of measuring the CO 2 pressure upon addition of concentrated HCl [48].
A subset of the LUCAS dataset was created selecting only the sample collected in croplands [37].This subset, named LUCAS_crop, consisted of 12,128 samples, which were split into seven classes according to a cluster analysis using the k-means algorithm and the Euclidean distance as a metric [49].The LUCAS clusters were obtained exploiting 12 soil variables: clay, sand, silt, and coarse fragments content, SOC, carbonate content (CaCO 3 ), pH (in H 2 O and CaCl 2 ), cation-exchange capacity (CEC), total nitrogen (N), soluble phosphorus (P), and potassium (K).Some of the LUCAS soil variables were correlated among each other, thus a principal component analysis (PCA) was carried out after the standardization of the variables, obtaining uncorrelated principal components (for more details about the cluster analysis on LUCAS_crop data, see [37]).
The LUCAS_crop classes differ in pH, soil texture, SOC and macroelements: the soils of class A are loamy and rich in N, P, K and SOC (29.7 g•kg −1 ); class B is characterized by acid soils having a sandy loam texture and an average SOC value of 21.5 g•kg −1 ; class C includes slightly acid soils having a silt loam texture, the mean SOC value is 25.7 g•kg −1 ; the soils of class D are organic, with a very high SOC content; the soil of class E are basic with a clay loam texture and are rich in CaCO 3 ; class F consists in slightly basic soils with a sandy loam texture and poor in nutrients; the soils of class F are silty clay loam with high K content (326.4 mg•kg −1 ).The average values of the main soil variables of the seven LUCAS_crop classes are shown in Table 1.
Table 1.Textural class and average values of the main soil variables for the seven classes of the LUCAS_crop dataset.

Airborne Images
Airborne imagery was acquired with the Airborne Prism Experiment (APEX) sensor mounted on a non-pressurized Dornier 228 aircraft of the German Aerospace Center (DLR).The APEX is an airborne (dispersive push broom) imaging spectrometer recording hyperspectral data in 313 bands (in binned mode) in the wavelength range 400-2500 nm [50].APEX spectrometer is developed by a Swiss-Belgian consortium on behalf of the European Space Agency.Since 2010 several flight campaigns have been completed across Europe, covering about 160 areas.The large amount of these airborne data have allowed addressing a wide range of research topics related to remote sensing in land and water [50].The APEX data were recorded on 1 July 2015 in Belgium (flying height 3900 m) and on 20 March 2015 in Luxembourg (flying height 4700 m).The location of the two study areas is shown in Figure 1.The APEX data were pre-processed at the VITO Remote Sensing Department, using the Central Data Processing Center (CDPC) for airborne Earth observation data [51].The APEX radiometric and spectral calibration is performed by means of calibration cubes generated from data measured and collected on the APEX Calibration Home Base (CHB) hosted at DLR Oberpfaffenhofen, Germany [52].The geometric correction of the APEX data is based on direct georeferencing [53].GPS/IMU data registered during the flight were post processed with Applanix POSPac MMS software, thereby using GPS base stations data.The post-processed position and orientation data, together with boresight correction data, the detailed sensor model and the SRTM DEM were used in the ortho-rectification process.Finally, the images were spatially resampled to an output resolution of 2.5 m and projected to the geographic coordinate system (WGS84).The atmospheric correction of the APEX data is performed using the MODTRAN4 radiative transfer code and reflectance retrieval algorithms given in [54,55].As a result, the pre-processed image data were provided as geo-referenced at surface reflectance.

Soil Organic Carbon (SOC) Estimation
SOC estimation using the traditional approach entails merging a soil-variable value obtained by analytical measurements in the laboratory and the spectrum extracted from the APEX imagery at the sampling point, where the former is the dependent variable, while the bands of the electromagnetic spectra are the independent variables [56].These two types of data are used to build a multivariate model to estimate soil variables in other bare soil pixels where we do not have information about the dependent variable (Figure 2).This approach entails the collection of soil samples, laboratory analysis for each variable, the calibration of a multivariate model using a consistent number of samples, and finally the validation of the model in some points not included in the calibration dataset.We used this traditional approach in order to compare it with an alternative approach, the bottom-up, which we tested, to the best of our knowledge, for the first time.
land and water [50].The APEX data were recorded on 1 July 2015 in Belgium (flying height 3900 m) and on 20 March 2015 in Luxembourg (flying height 4700 m).The location of the two study areas is shown in Figure 1.The APEX data were pre-processed at the VITO Remote Sensing Department, using the Central Data Processing Center (CDPC) for airborne Earth observation data [51].The APEX radiometric and spectral calibration is performed by means of calibration cubes generated from data measured and collected on the APEX Calibration Home Base (CHB) hosted at DLR Oberpfaffenhofen, Germany [52].The geometric correction of the APEX data is based on direct georeferencing [53].GPS/IMU data registered during the flight were post processed with Applanix POSPac MMS software, thereby using GPS base stations data.The post-processed position and orientation data, together with boresight correction data, the detailed sensor model and the SRTM DEM were used in the ortho-rectification process.Finally, the images were spatially resampled to an output resolution of 2.5 m and projected to the geographic coordinate system (WGS84).The atmospheric correction of the APEX data is performed using the MODTRAN4 radiative transfer code and reflectance retrieval algorithms given in [54,55].As a result, the pre-processed image data were provided as geo-referenced at surface reflectance.

Soil Organic Carbon (SOC) Estimation
SOC estimation using the traditional approach entails merging a soil-variable value obtained by analytical measurements in the laboratory and the spectrum extracted from the APEX imagery at the sampling point, where the former is the dependent variable, while the bands of the electromagnetic spectra are the independent variables [56].These two types of data are used to build a multivariate model to estimate soil variables in other bare soil pixels where we do not have information about the dependent variable (Figure 2).This approach entails the collection of soil samples, laboratory analysis for each variable, the calibration of a multivariate model using a consistent number of samples, and finally the validation of the model in some points not included in the calibration dataset.We used this traditional approach in order to compare it with an alternative approach, the bottom-up, which we tested, to the best of our knowledge, for the first time.The main difference between the traditional and bottom-up approaches is that the latter does not require analytical laboratory measurements.Instead, different soil variables are estimated exploiting laboratory spectral data.The bottom-up approach consists of two main steps: (i) the estimation of a soil variable at sampling points exploiting the LUCAS spectral library and its ancillary data; and (ii) the mapping of the soil variables using remote-sensing data (Figure 2).The main difference between the traditional and bottom-up approaches is that the latter does not require analytical laboratory measurements.Instead, different soil variables are estimated exploiting laboratory spectral data.The bottom-up approach consists of two main steps: (i) the estimation of a soil variable at sampling points exploiting the LUCAS spectral library and its ancillary data; and (ii) the mapping of the soil variables using remote-sensing data (Figure 2).
The first step consists in splitting the LUCAS_crop dataset into classes using a cluster analysis (see Section 3.1 and [37] for details).Furthermore, some soil samples are collected within the investigated area.For each sample, the spectrum is acquired using the FOSS spectrometer (calibration spectral dataset).Later, the LUCAS dataset classified according to the cluster analysis is used to train an artificial neural network (ANN), which is used to allocate to each sample of the calibration dataset one of the LUCAS classes (see Table 1).The ANN is a multilayer perceptron feed-forward neural network; the input layer consists of the LUCAS spectral data, while the output layer contains seven neurons, one for each of the detected classes.The number of neurons in the only hidden layer were set observing the cross-validation results of classification of 500 iterations.Then, PLSR models are calibrated for each LUCAS class and they were used to estimate the soil variable on the soil spectra of the calibration datasets of the same class.
In the second step of the bottom-up approach, the previously estimated values were used to calibrate the SOC concentration of the pixels in the APEX imagery corresponding to the sampling points.Thus, a PLSR model was created for the bare soil fields in each area and was subsequently applied to create SOC maps.
In order to reduce propagation of the georeferencing error, the spectra were extracted from APEX images using a bi-linear interpolation algorithm for both the traditional and bottom-up approaches.This method provides the spectrum of an interpolated pixel, taking into account the weighted average of the four pixels nearest to the center of the sample area.Moreover, pre-processing of the spectra consisted of removing the water bands (1349-1446 nm and 1795-1972 nm) and applying a Savitzky-Golay smoothing filter [57] with a second-order polynomial fit and a window size of 11 data points.

The Calibration and the Independent Validation Datasets
Two soil spectral datasets were collected for calibration in July 2016, consisting of 84 samples in Luxembourg and 54 in Belgium (Figure 1).The calibration samples were collected within fields that were bare during the over flight.We first checked the state of the soil surface during a field survey one day after the flight campaign.Then we also computed the NDVI on the APEX images in order to locate the remainder of the bare soil cropland fields.A sampling campaign was carried out in July 2016 in order to collect at least one sample for each selected field, and the sample location was randomly chosen in a homogeneous area away from field borders.Each calibration sample consists of five sub-samples collected from the 0-10 cm layer taken with a gouge auger within an area of 3 m radius.Soil samples were air dried, gently crushed, and passed through a 2 mm sieve in the laboratory.Each sample was scanned and SOC values were measured (for the traditional approach only) using the same instruments (FOSS spectrometer, model XDS rapid content analyzer and Variomax CN analyzer), and protocols were used to compile the LUCAS dataset (see Section 3.1).
The SOC maps of all bare soils obtained using the bottom-up approach were divided into seven classes based on the standard deviation of the SOC concentration.Then, we selected some fields both for the loam belt and Luxembourg areas (Figure 3) showing a good variability in terms of SOC content.Within these fields, we applied a stratified random strategy to select at least one random point for each of the seven standard deviation classes.The same within-field sampling procedure used for the calibration datasets was adopted for the validation points.All samples were air-dried and sieved (2 mm) and the SOC content was measured by dry-combustion using the same protocol used for the LUCAS dataset and the calibration datasets (see Section 3.1).
The root mean square error (RMSE; Equation ( 1)) and the ratio of performance to deviation (RPD; Equation (2)) were used as measures for the estimation accuracy.
where y o is the observed and y p is the predicted value, n the number of the samples, and std the standard deviation.
Remote Sens. 2018, 10, x FOR PEER REVIEW 8 of 18 where yo is the observed and yp is the predicted value, n the number of the samples, and the standard deviation.

Calibration and Validation Datasets
The traditional and bottom-up approaches were tested in two areas having different geological and pedological characteristics: the loam belt area in Belgium and the Gutland-Oesling region in Luxembourg.The loam belt is quite homogenous in terms of substrate, soil texture and organic carbon content.The niveo-eolian loess deposit uniformly covered all the area, resulting in welldrained silt loam soils [14].According to the classification of the LUCAS_crop dataset [37], the samples collected in this area for the calibration dataset belong to class F, which is characterized by a low SOC content (Table 1).Actually, both the samples of the calibration and validation dataset in Belgium have a SOC content that ranges between 6.1-15.5 g•kg −1 (Table 2).
The Luxembourg area is less homogenous than the loam belt from a geological and geomorphological point of view.The SOC values range from 6.6-49.8g•kg −1 and standard deviation value is considerably higher than in Belgium (Table 2).The measured SOC values of the validation datasets ranges between 6.9-36.0g•kg −1 (Table 2).

Calibration and Validation Datasets
The traditional and bottom-up approaches were tested in two areas having different geological and pedological characteristics: the loam belt area in Belgium and the Gutland-Oesling region in Luxembourg.The loam belt is quite homogenous in terms of substrate, soil texture and organic carbon content.The niveo-eolian loess deposit uniformly covered all the area, resulting in well-drained silt loam soils [14].According to the classification of the LUCAS_crop dataset [37], the samples collected in this area for the calibration dataset belong to class F, which is characterized by a low SOC content (Table 1).Actually, both the samples of the calibration and validation dataset in Belgium have a SOC content that ranges between 6.1-15.5 g•kg −1 (Table 2).
The Luxembourg area is less homogenous than the loam belt from a geological and geomorphological point of view.The SOC values range from 6.6-49.8g•kg −1 and standard deviation value is considerably higher than in Belgium (Table 2).The measured SOC values of the validation datasets ranges between 6.9-36.0g•kg −1 (Table 2).

SOC Estimation
The first step of the bottom-up approach consisted of the SOC estimation at sampling points exploiting the LUCAS topsoil database.The SOC content at sampling points in Luxembourg was estimated with a RMSE of 5.4 g•kg −1 and a RPD of 2.1 i.e., the calibration (Table 3).In the second step of the bottom-up approach, the estimated SOC values were coupled with the corresponding APEX spectra to calibrate a PLSR model for each flight area.The PLSR models were applied to all bare fields obtaining SOC maps for each study area (Figure 4).The differences between the two study areas in terms of range and standard deviation of SOC values (Table 2) result in different RMSE and RPD values for the independent validation, both for traditional and bottom-up approaches.The independent validation in Luxembourg provided a RMSE values of 4.9 g•kg −1 and a RPD of 1.7 for both approaches, while in Belgium, both traditional and bottom-up approach gave a RMSE of 1.5 g•kg −1 and a RPD of 1.4 (Table 3; Figure 5).Table 3. Estimation accuracy in terms of root mean square error (RMSE) and ratio of the performance to deviation (RPD) obtained from the validation datasets using the traditional and bottom-up approaches.The SOC maps in Luxembourg highlighted a much higher SOC content in the northern area than in the central and southern areas (Figure 6).The SOC ranges between 19.7-51 g•kg −1 with a mean value of 31 g•kg −1 in the north (Figure 6a), while the minimum and maximum values in the central area are, respectively, 2.8 g•kg −1 and 39.3 g•kg −1 and the mean SOC content is 19.4 g•kg −1 (Figure 6b).This north-south gradient in SOC content is in agreement with the findings in Stevens et al. [14] and it is probably due to wetter and colder conditions in the northern plateau which allow a general higher SOC content as compared with the values of the central and southern area.Most of the calibration samples located in the northern area were classified in group A according to the LUCAS_crop classification; this class is mainly characterized by quite high SOC content (Table 1).The substrate of the central and southern area (Gutland region) consists of marls and limestone and most of the calibration samples of this region were classified as B and F according to the LUCAS_crop dataset.These two types of soils are poorer than those of class A, their texture is sandy loam with high sand content and low SOC concentration.The SOC maps obtained by the bottom-up approach in Belgian loam belt confirmed a greater soil homogeneity of this area as compared to Luxembourg (Figure 4b).The SOC maps in Luxembourg highlighted a much higher SOC content in the northern area than in the central and southern areas (Figure 6).The SOC ranges between 19.7-51 g•kg −1 with a mean value of 31 g•kg −1 in the north (Figure 6a), while the minimum and maximum values in the central area are, respectively, 2.8 g•kg −1 and 39.3 g•kg −1 and the mean SOC content is 19.4 g•kg −1 (Figure 6b).This north-south gradient in SOC content is in agreement with the findings in Stevens et al. [14] and it is probably due to wetter and colder conditions in the northern plateau which allow a general higher SOC content as compared with the values of the central and southern area.Most of the calibration    The digital terrain model (DTM) of Wallonia (1 m of resolution) and that of Luxembourg (5 m of resolution) allow a better interpretation of the SOC patterns within the fields.In fact, for both study areas the intra-field variability is mainly due to erosion and sedimentation of topsoil.Higher SOC content can be detected along the downslope field border in field 1 in Luxembourg (Figures 4a and 7a) and in most of the fields in the loam belt (Figure 4b).Although the slope is gentle, the higher SOC values in field 7 are close to the bottom of the field along a strip quite parallel to the contours (Figure 8a); and also for the field in Figure 8b, the highest SOC values were detected in the southern part of the field bordering a small river valley.SOC content is higher to the top and bottom of field 4 (Figure 7b).Field 5 is located in a very flat area and this entails a very low in-field SOC variability; however, there is a clear-cut area in the north-eastern part of the field showing higher SOC content, and this is probably because the neighboring woodland extended in the past in the field.Although most of the SOC maps confirm the relevance of the slope for the within-field variability, some differences, especially between contiguous fields, are due to historical differences in management or land use, as can be clearly seen from the rectangular patterns in the fields in the eastern part of Figure 6a.

Discussion
In the first step of the bottom-up approach we obtained the estimation of SOC values at sampling points with a sufficient level of accuracy (i.e., calibration in Table 3) and without new analytical SOC measurements.The SOC estimation was gained from calibration models obtained by subsetting the LUCAS soil database into classes based on 12 soil variables, and the comparison between measured and estimated SOC values showed an overall RMSE of 4.3 g•kg −1 and a RPD of 2.5 for the calibration

Discussion
In the first step of the bottom-up approach we obtained the estimation of SOC values at sampling points with a sufficient level of accuracy (i.e., calibration in Table 3) and without new analytical SOC measurements.The SOC estimation was gained from calibration models obtained by subsetting the LUCAS soil database into classes based on 12 soil variables, and the comparison between measured and estimated SOC values showed an overall RMSE of 4.3 g•kg −1 and a RPD of 2.5 for the calibration datasets (Table 3).Nocita et al. [58] and Clairotte et al. [59] showed similar results in terms of RPD subsetting large spectral library by means of a local PLSR models; however, the validation datasets are not completely independent for both studies, because the samples used for the validation were selected from the same library as that used for the calibration, i.e., the LUCAS dataset for [58] and the French national spectral library for [59].
Remote-sensing data (airborne and satellite) are used to obtain quantitative soil variables estimation for a large area [29,46,56].For this purpose, most studies report that a calibration dataset is compiled consisting of measured soil values collected at points corresponding to the remote spectral data [56].The calibration dataset is generally used to build a multivariate calibration model to estimate the soil properties across the remote images (traditional approach).Theoretically, large spectral libraries, such as the European LUCAS soil database, could be exploited not only to estimate point data, but also to build estimation models of the soil variables at large scale, thus avoiding the collection of a calibration dataset.Unfortunately, the existing large spectral libraries provide soil spectra acquired in laboratory conditions, thus using spectrometers conceived to provide high-resolution spectral measurements in a controlled environment.Therefore, the main challenge is to find a method to align laboratory and remote spectral data [45,60].Many issues need to be addressed for this goal: first of all, it is necessary to consider the different technical characteristics of the two sensors [40], in particular the full width at half maximum (FWHM) response at each band, the spectral sampling interval and the spectral range, which entail a systematic differences between sensors [7,45].However, when moving from laboratory to field conditions the radiometric resolution and the SNR hamper the alignments of dissimilar spectral outputs, and the atmospheric correction introduces non-systematic or random differences between laboratory and remote spectra.Other important factors concerning the alignment between laboratory and remote-sensing data are the different spatial resolution, the effect of the georeferencing accuracy, and the different soil conditions at acquisition time (soil moisture, roughness) [61].The uncertainty of the pre-processing procedure, and the changing atmospheric conditions at each acquisition time [45], do not allow stable spectral data to be obtained at the same location.In other words, the remote-sensing spectra extracted in the same geographical point but at different dates will never be identical.In order to verify the uncertainty of the spectral data from remote sensors and, consequently, their instability, we compared the spectra acquired over a dark (Figure 9a,b) and bright (Figure 9c,d) reference surface both by Analytical Spectral Devices (ASD) Field Spec Fr Pro spectroradiometer (ASD Inc., Boulder, CO, USA) (Figure 9a,c) in the field and by the APEX airborne sensor (Figure 9b,d).The measurements were carried out in 2013 and 2015 in the Belgian loam belt area, and highlight a consistently greater measurement stability using ASD as compared to remote acquisition (Figure 9). the spectral data from remote sensors and, consequently, their instability, we compared the spectra acquired over a dark (Figure 9a,b) and bright (Figure 9c,d) reference surface both by Analytical Spectral Devices (ASD) Field Spec Fr Pro spectroradiometer (ASD Inc., Boulder, CO, USA) (Figure 9a,c) in the field and by the APEX airborne sensor (Figure 9b,d).The measurements were carried out in 2013 and 2015 in the Belgian loam belt area, and highlight a consistently greater measurement stability using ASD as compared to remote acquisition (Figure 9).In light of the results of this comparison, a different approach is necessary to exploit the spectral libraries over a large area.In this work we proposed the bottom-up approach to bypass all the issues that hamper the stability of the remote-sensing data and, consequently, the alignment between laboratory and remote spectra.The main advantage of the bottom-up as compared to the traditional approach is that the chemical analyses in the calibration dataset are replaced by spectral measurements.Thus, the traditional approach uses the measured soil variable values, while the bottom-up approach uses the estimated values obtained by means of the LUCAS library (Figure 2).Another advantage in connection with the use of the bottom-up approach is that it could estimate different soil variables with the same spectral data; in fact, the LUCAS dataset includes 12 soil variables, most of them linked with spectral features in the VNIR region.Thus, the bottom-up approach could be useful for obtaining maps of different soil variables at regional scale on bare fields without chemical or physical laboratory measurements and without any spectral transfer between laboratory and remote spectra.The application of a spectral-transfer function entails the collection of a specific spectral dataset, where each sample, named standard, is measured both with master and slave instruments.Although Nouri et al. [45] successfully applied some transfer methods from laboratory to airborne data to map clay content, their results were dependent on the quantity of the In light of the results of this comparison, a different approach is necessary to exploit the spectral libraries over a large area.In this work we proposed the bottom-up approach to bypass all the issues that hamper the stability of the remote-sensing data and, consequently, the alignment between laboratory and remote spectra.The main advantage of the bottom-up as compared to the traditional approach is that the chemical analyses in the calibration dataset are replaced by spectral measurements.Thus, the traditional approach uses the measured soil variable values, while the bottom-up approach uses the estimated values obtained by means of the LUCAS library (Figure 2).Another advantage in connection with the use of the bottom-up approach is that it could estimate different soil variables with the same spectral data; in fact, the LUCAS dataset includes 12 soil variables, most of them linked with spectral features in the VNIR region.Thus, the bottom-up approach could be useful for obtaining maps of different soil variables at regional scale on bare fields without chemical or physical laboratory measurements and without any spectral transfer between laboratory and remote spectra.The application of a spectral-transfer function entails the collection of a specific spectral dataset, where each sample, named standard, is measured both with master and slave instruments.Although Nouri et al. [45] successfully applied some transfer methods from laboratory to airborne data to map clay content, their results were dependent on the quantity of the standards, in other words the best estimation performances were obtained with the maximum number of standards available (64 samples).Moreover, the collection of new standards is necessary for each remote acquisition because of the change in atmospheric conditions between two different acquisition times.
The traditional and bottom-up approaches provided very similar estimation accuracy on the independent validation datasets (Table 3; Figure 5).Both approaches allowed estimating SOC with a sufficient degree of accuracy (RMSE = 1.5-4.9g•kg −1 ; RPD = 1.4-1.7)and using the same number of samples for the calibration datasets.The optimum number of samples for the bottom-up approach, or more in general for all remote-sensing applications, depends on the characteristics of the investigated area and the range of the target variable, but usually the information about the range or the variability of the target variable are not available in advance, making it difficult to set the optimum number of samples for the bottom-up approach.
Many authors report SOC estimation from airborne data, especially exploiting the VNIR-SWIR region [10][11][12][13]16,62,63], but also the long-wave infrared region (LWIR; 8-14 µm) [15].However, all of them build a calibration dataset for the airborne spectra on samples analysed in the laboratory (i.e., the traditional approach).The HyMap hyperspectral sensor (400-2450 nm, with a spectral resolution of 12-17 nm) has been tested for SOC estimation, with mitigated results: Gomez et al. [12] gained very poor results in southern France (RPD = 0.99; RMSE = 2.6 g•kg −1 ) as a result of the very low SOC concentrations.Hbirkou et al. [13] obtained a RPD of 2.32 and a RMSE of 1.1 g•kg −1 in northern Germany with higher and more variable SOC concentrations.The aisaDUAL hyperspectral scanner (367 bands between 400 nm and 2500 nm) provided good cross-validation results in Germany [63]; in this case, the RMSE was 2.7 g•kg −1 , but the standard deviation was quite high (8.2 g•kg −1 ).The full VNIR-SWIR data (400-2500 nm) from the airborne hyperspectral scanner (AHS) was employed by Stevens et al. [11] in Belgium, obtaining a RPD of 1.47; while Stevens et al. [46] in Luxembourg discarded the SWIR region because of the low SNR, thus using only 20 bands from 442 nm and 1019 nm, and they obtained a RPD of 1.9.Most of the aforementioned papers provided cross-validation statistics [10][11][12]63], and only Hbirkou et al. [13] and Stevens et al. [46] tested their models on a validation dataset.In both cases, the entire dataset was split into calibration and validation ensuring a representative subset of the data both in the calibration and validation datasets.Here, we tested the PLSR estimation models on a completely independent validation dataset, obtaining an estimation accuracy similar to those obtained by most of the authors cited above who used the traditional approach.In particular, the results obtained for the loam belt area (RPD = 1.40) are very similar to those obtained by Stevens et al. [11] from leave-one-out cross validation in Belgium (RPD = 1.47), while the results that we gained in Luxembourg (RPD = 1.70) are slightly lower to those obtained by [46] in the same area (RPD = 1.90).

Conclusions
We investigated the feasibility of a new approach, named bottom-up, which allows SOC maps to be provided over a large area by APEX airborne data without recourse to chemical analyses or any spectral transfer between laboratory and remote spectra, by exploiting the spectra and ancillary data of the European LUCAS topsoil database.This approach was tested on fields that were bare during the over flight in two areas with different soil characteristics i.e., the loam belt in central Belgium and a north-south flight strip across Luxembourg.The results were compared to those provided by a direct calibration of the airborne spectra.The comparison between the independent validations of these two approaches did not show differences in terms of RMSE or RPD.Thus, the new approach tested in this work (bottom-up) allowed SOC maps to be obtained without chemical laboratory analyses over two large areas with the same satisfactory level of accuracy (RPD ≥ 1.4) obtainable by a traditional approach.Moreover, the two steps of the bottom-up approach allowed bypassing for the first time (to our knowledge) the issues and restrictions related to the spectral transfer between laboratory and remote-sensing data, in this case between LUCAS spectral data obtained by FOSS spectrometer and APEX spectra.
The loam belt area, due to its homogenous geological substrate, did not show any SOC gradient along the flight line (SW-NE).While clear differences between the northern and southern parts of the flight line exist in Luxembourg, this is due to wetter and colder conditions in the northern plateau, which allow a generally higher SOC content as compared with the values of the central and southern area.The within-field variability could be nearly always explained by the geomorphology in these two areas; in fact, by overlaying SOC maps on a DTM it has was possible to show that the areas along downslope field borders or in zero-order valleys are characterized by higher SOC content than the top of the fields, also with gentle slopes, as is the case in the loam belt.We conclude that the bottom-up approach provided the same SOC estimation accuracy as the traditional approach in the loam belt and Luxembourg areas, and that it is able to explain the SOC variability both at field and regional scale, increasing the level of knowledge of SOC availability for agricultural and environmental management.
In light of these promising results, further research will need to focus on the accuracy level of the bottom-up approach using simulated or real satellite data achievable from the new generation of hyperspectral imagers, and also to investigate an algorithm that allows the optimum number of samples for the calibration dataset to be set.Since the efficiency of the bottom-up approach is strongly affected by the LUCAS dataset, it will be appropriate to test the method in different soil and climate regions, covered by the LUCAS dataset, thereby eventually investigating the potential for predicting other soil variables such as clay content or CEC linked with spectral features in the VNIR region.

Figure 1 .
Figure 1.Location of the two study areas in Belgium and Luxembourg over which Airborne Prism Experiment (APEX) data were acquired.

Figure 1 .
Figure 1.Location of the two study areas in Belgium and Luxembourg over which Airborne Prism Experiment (APEX) data were acquired.

Figure 2 .
Figure 2. Flow chart concerning the two soil organic carbon (SOC) estimation approaches used in this work: traditional and bottom-up.

Figure 2 .
Figure 2. Flow chart concerning the two soil organic carbon (SOC) estimation approaches used in this work: traditional and bottom-up.

Figure 3 .
Figure 3. RGB (red: 661 nm; green: 545 nm; blue: 450 nm) airborne images acquired by APEX sensor in Luxembourg (a), and Belgium (b), and the location of the eight validation fields.

Figure 3 .
Figure 3. RGB (red: 661 nm; green: 545 nm; blue: 450 nm) airborne images acquired by APEX sensor in Luxembourg (a), and Belgium (b), and the location of the eight validation fields.

Figure 4 .
Figure 4. SOC maps of the validation fields in Luxembourg (a), and Belgium (b), using the bottomup approach.The white points in the fields correspond to the validation dataset.Figure 4. SOC maps of the validation fields in Luxembourg (a), and Belgium (b), using the bottom-up approach.The white points in the fields correspond to the validation dataset.

Figure 4 . 18 Figure 5 .
Figure 4. SOC maps of the validation fields in Luxembourg (a), and Belgium (b), using the bottomup approach.The white points in the fields correspond to the validation dataset.Figure 4. SOC maps of the validation fields in Luxembourg (a), and Belgium (b), using the bottom-up approach.The white points in the fields correspond to the validation dataset.Remote Sens. 2018, 10, x FOR PEER REVIEW 10 of 18

Figure 5 .
Figure 5. Plot of measured against estimated soil organic carbon content obtained by traditional (a), and bottom-up (b), approaches on the independent validation datasets in Belgium and Luxembourg.

Figure 6 .
Figure 6.RGB (red: 661 nm; green: 545 nm; blue: 450 nm) airborne images acquired by APEX sensor in Luxembourg (left), and some SOC maps obtained by the bottom-up approach in the northern (a), and central (b), areas.

Figure 7 .
Figure 7. Soil organic carbon maps of the field 1 (a) and field 4 (b) obtained by the bottom-up approach in the loam belt area laid on top of the digital terrain model (DTM) of the Luxembourg region with a 5 m resolution.

Figure 6 .
Figure 6.RGB (red: 661 nm; green: 545 nm; blue: 450 nm) airborne images acquired by APEX sensor in Luxembourg (left), and some SOC maps obtained by the bottom-up approach in the northern (a), and central (b), areas.

Figure 6 .
Figure 6.RGB (red: 661 nm; green: 545 nm; blue: 450 nm) airborne images acquired by APEX sensor in Luxembourg (left), and some SOC maps obtained by the bottom-up approach in the northern (a), and central (b), areas.

Figure 7 .
Figure 7. Soil organic carbon maps of the field 1 (a) and field 4 (b) obtained by the bottom-up approach in the loam belt area laid on top of the digital terrain model (DTM) of the Luxembourg region with a 5 m resolution.

Figure 7 .
Figure 7. Soil organic carbon maps of the field 1 (a) and field 4 (b) obtained by the bottom-up approach in the loam belt area laid on top of the digital terrain model (DTM) of the Luxembourg region with a 5 m resolution.Remote Sens. 2018, 10, x FOR PEER REVIEW 12 of 18

Figure 8 .
Figure 8. Soil organic carbon maps of the field 7 (a) and another bare field (b) obtained by the bottomup approach in the loam belt area laid on top of the digital terrain model (DTM) of the Wallonia region with a 1 m resolution.

Figure 8 .
Figure 8. Soil organic carbon maps of the field 7 (a) and another bare field (b) obtained by the bottom-up approach in the loam belt area laid on top of the digital terrain model (DTM) of the Wallonia region with a 1 m resolution.

Figure 9 .
Figure 9. Spectral comparison between Analytical Spectral Devices (ASD) (a,c) and APEX (b,d) measurements acquired on dark (a,b), and bright (c,d), reference surfaces in the loam belt area in 2013 and 2015.

Figure 9 .
Figure 9. Spectral comparison between Analytical Spectral Devices (ASD) (a,c) and APEX (b,d) measurements acquired on dark (a,b), and bright (c,d), reference surfaces in the loam belt area in 2013 and 2015.

Table 2 .
Descriptive statistics of the soil organic carbon content in the calibration and validation dataset of the loam belt and Luxembourg areas.

Table 2 .
Descriptive statistics of the soil organic carbon content in the calibration and validation dataset of the loam belt and Luxembourg areas.