Employing a Multi-Input Deep Convolutional Neural Network to Derive Soil Clay Content from a Synergy of Multi-Temporal Optical and Radar Imagery Data

: Earth observation (EO) has an immense potential as being an enabling tool for mapping spatial characteristics of the topsoil layer. Recently, deep learning based algorithms and cloud computing infrastructure have become available with a great potential to revolutionize the processing of EO data. This paper aims to present a novel EO-based soil monitoring approach leveraging open-access Copernicus Sentinel data and Google Earth Engine platform. Building on key results from existing data mining approaches to extract bare soil reflectance values the current study delivers valuable insights on the synergistic use of open access optical and radar images. The proposed framework is driven by the need to eliminate the influence of ambient factors and evaluate the efficiency of a convolutional neural network (CNN) to effectively combine the complimentary information contained in the pool of both optical and radar spectral information and those form auxiliary geographical coordinates mainly for soil. We developed and calibrated our multi-input CNN model based on soil samples (calibration = 80% and validation 20%) of the LUCAS database and then applied this approach to predict soil clay content. A promising prediction performance (R 2 = 0.60, ratio of performance to the interquartile range (RPIQ) = 2.02, n = 6136) was achieved by the inclusion of both types (synthetic aperture radar (SAR) and laboratory visible near infrared–short wave infrared (VNIR-SWIR) multispectral) of observations using the CNN model, demonstrating an improvement of more than 5.5% in RMSE using the multi-year median optical composite and current state-of-the-art non linear machine learning methods such as random forest (RF; R 2 = 0.55, RPIQ = 1.91, n = 6136) and artificial neural network (ANN; R 2 = 0.44, RPIQ = 1.71, n = 6136). Moreover, we examined post-hoc techniques to interpret the CNN model and thus acquire an understanding of the relationships between spectral information and the soil target identified by the model. Looking to the future, the proposed approach can be adopted on the forthcoming hyperspectral orbital sensors to expand the current capabilities of the EO component by estimating more soil attributes with higher predictive performance.


Introduction
In a world subject to constant climate change and increasing pressures by agricultural intensification and other human activities, soil is considered a vital but endangered component of the global life 2 of 26 support system [1]. In that regard, the challenge to obtain better information, with high resolution and better representation, should be fulfilled at multiple scales to understand and protect the world's soil resource. In this context a recent review about global soil property maps was provided by Dai et al. [2], indicating the advantages and limitations of available soil datasets derived through digital soil mapping techniques.
Digital soil mapping based on spaceborne Earth observation (EO) data is currently undergoing a significant shift. This is driven first and foremost by the advent of the big EO data era, mainly spearheaded by open access of the Landsat archive in 2008 [3], and more recently by the operation of the Europe's Copernicus programme, which provide free and open super spectral imagery data. This is further supported by novel machine learning algorithms for the spatial estimations of soil properties from spaceborne-sourced environmental covariates and harmonized soil observations [4]. Regarding this last point, the work of Behrens et al. [5] has provided a first overview of the potential of deep learning methods for multi-scale terrain feature construction and their relative effectiveness for digital soil mapping. More recently, Padarian et al. [6] and Wadoux et al. [7] expanded the digital soil mapping previously proposed by Behrens et al. [8], by including deep learning techniques to optimally search for local contextual information of covariates, while Tsakiridis et al. [9] recently introduce an interpretable novel localized multi-channel 1-D convolutional neural network (CNN).
Currently, besides the growing trend in EO-based soil mapping using hyperspectral sensors there is another important trend focusing on open multi-spectral data and their analysis. A critical review about recent advances in this domain was provided by Angelopoulou et al. [10]. Recent studies [11][12][13] have shown promising results through the exploitation of Multi-Spectral Instrument (MSI) Sentinel-2 data, together with advanced regression analytics for estimating and mapping the spatial variability of soil properties over a single considered date. Similarly, Gholizadeh et al. [14] extracted spectral reflectance values and associated indices from Sentinel-2 bands and employed a support vector regression algorithm at the pixel level to produce spatial distribution maps of topsoil properties. Recent studies are based on optical imagery data since the integration of multi-sensor data, such as optical and radar, is not trivial since different sensors exhibit various resolutions and depict different phenomena. Only recent studies [15,16] provide promising insights to fuse optical and radar data, where a conjoint use of synthetic aperture radar (SAR) and multi-spectral EO data was employed to account for the soil roughness levels during the prediction of the soil properties.
However, these approaches put specific emphasis on spatial prediction of properties that are relatively static over the observational time period, and mainly focus on soil texture. One key challenge is to extract information and knowledge from this big EO data, possibly almost as soon as the EO data are available for processing and integrating between disciplines. Proper approaches need to be chosen carefully to overcome the issue of limited soil exposure for spatiotemporal regression analysis. In the temporal analysis of EO spaceborne data, it is customary to calculate spectral indices from the initial recorded spectra to enhance the given information and classify the pixels in accordance to them, i.e., valid thresholds are selected and then applied, and only the pixels corresponding to bare soil regions are retained to generate a composite image [17]. Data mining procedures to retrieve bare soil spectral reflectance demand extremely large satellite data archives, as well as high performance image processing computing infrastructure in order to undertake the processing. Recently, several scientific groups presented promising results regarding the development of bare soil composite methods by performing data mining of satellite time series, such as the soil composite mapping processor [18] and the geospatial soil sensing system [19]. However, the temporal variability of the Earth's surface such as soil moisture and roughness have a significant influence on the reflectance values of bare soils, affecting the quantitative assessment of soil parameters [17].
Despite the aforementioned advances in topsoil EO monitoring, the generated datasets correspond to a fairly coarse representation of the various properties and correspond to an average status of the soil in the last decades, since the production of maps is derived from multi-annual data spanning the course of at least a decade. It is in this realm that the deep learning techiques may enable a more proficient way for mapping spatial characteristics of soils with EO data in near real-time. However, poorly documented soil datasets and the small number of ground truth recordings hinder the penetration of these techniques to soil regression analysis. In that regard, large soil databases collected and analyzed under standardized methods and protocols could facilitate to eliminate the need of collecting new samples for model calibrations and this could be a strong asset for EO-based soil monitoring [20].
Differently from previous approaches that mainly focus on one dimensional spectral signature processing and regression, as derived from individual images or generated as the median value of consecutive images, this paper aims to describe a novel methodological framework called the multi-input sentinel-based soil monitoring scheme (S2MoS), which is based on open-access EO data and cloud computing facilities to allow topsoil monitoring. The novel aspect of this framework is that it efficiently leverages information from the temporal behavior in top-soil. Moreover, building on key results from existing multi-temporal data mining approaches the current study will strive to (i) deliver preliminary insights of the synergistic use of open access SAR and optical images obtained from Sentinel-1 and Sentinel-2 sensors; and (ii) evaluate the efficiency of CNNs to produce fine resolution (i.e., 10 m) soil maps based on multi-temporal analysis. Further, the current approach relies on promoting the exploitation of existing soil databases to eliminate the need for new field campaigns and calibration datasets. Given the sufficient spatial distribution and representativeness of Land Use and Coverage Area Frame Survey (LUCAS) dataset across the European continent [21], we leveraged this reference soil data in our study. The developed methodology can be transferred and scaled-up to generate spatial explicit soil indicators over agricultural regions at various scales. Importantly, this model is developed from continental-level data and thus is not restricted to a specific region or area. To demonstrate this, we evaluated the performance of the S2MoS approach in an independent agricultural area (217 km 2 ) in Northern Greece, unseen by the model, involving different types of soils. The results of this study demonstrate a paradigm-shift from standard static soil mapping approaches toward generating more dynamic, on-demand, soil maps through advanced cloud computing resources that simplify access to and processing of a large volume of satellite imagery.

Materials and Methods
The methodological approach consists of three discrete steps: (i) data processing, which includes the creation of the multispectral and radar backscattering time series, the bare soil masking, and the generation of calibration and validation datasets; (ii) EO-based soil regression analysis, where a CNN is used to predict the target soil variable; and (iii) evaluation, for assessing the performance metrics obtained by CNN as well as for comparison of the obtained results with those by other state-of-the-art machine learning algorithms. The overall data processing and analysis workflow is illustrated in Figure 1 and detailed descriptions of the different steps are provided in the sections below.
Remote Sens. 2020, 12, x FOR PEER REVIEW 3 of 26 status of the soil in the last decades, since the production of maps is derived from multi-annual data spanning the course of at least a decade. It is in this realm that the deep learning techiques may enable a more proficient way for mapping spatial characteristics of soils with EO data in near real-time. However, poorly documented soil datasets and the small number of ground truth recordings hinder the penetration of these techniques to soil regression analysis. In that regard, large soil databases collected and analyzed under standardized methods and protocols could facilitate to eliminate the need of collecting new samples for model calibrations and this could be a strong asset for EO-based soil monitoring [20]. Differently from previous approaches that mainly focus on one dimensional spectral signature processing and regression, as derived from individual images or generated as the median value of consecutive images, this paper aims to describe a novel methodological framework called the multiinput sentinel-based soil monitoring scheme (S2MoS), which is based on open-access EO data and cloud computing facilities to allow topsoil monitoring. The novel aspect of this framework is that it efficiently leverages information from the temporal behavior in top-soil. Moreover, building on key results from existing multi-temporal data mining approaches the current study will strive to (i) deliver preliminary insights of the synergistic use of open access SAR and optical images obtained from Sentinel-1 and Sentinel-2 sensors; and (ii) evaluate the efficiency of CNNs to produce fine resolution (i.e., 10 m) soil maps based on multi-temporal analysis. Further, the current approach relies on promoting the exploitation of existing soil databases to eliminate the need for new field campaigns and calibration datasets. Given the sufficient spatial distribution and representativeness of Land Use and Coverage Area Frame Survey (LUCAS) dataset across the European continent [21], we leveraged this reference soil data in our study. The developed methodology can be transferred and scaled-up to generate spatial explicit soil indicators over agricultural regions at various scales. Importantly, this model is developed from continental-level data and thus is not restricted to a specific region or area. To demonstrate this, we evaluated the performance of the S2MoS approach in an independent agricultural area (217 km 2 ) in Northern Greece, unseen by the model, involving different types of soils. The results of this study demonstrate a paradigm-shift from standard static soil mapping approaches toward generating more dynamic, on-demand, soil maps through advanced cloud computing resources that simplify access to and processing of a large volume of satellite imagery.

Materials and Methods
The methodological approach consists of three discrete steps: (i) data processing, which includes the creation of the multispectral and radar backscattering time series, the bare soil masking, and the generation of calibration and validation datasets; (ii) EO-based soil regression analysis, where a CNN is used to predict the target soil variable; and (iii) evaluation, for assessing the performance metrics obtained by CNN as well as for comparison of the obtained results with those by other state-of-theart machine learning algorithms. The overall data processing and analysis workflow is illustrated in Figure 1 and detailed descriptions of the different steps are provided in the sections below.  Table 1 summarizes the data sets used in the current study. They included several EO sources, both optical and radar, derived indices for bare soil masking, and reference data used in the model's building and performance assessment. The LUCAS topsoil data archive was used as the source for the soil texture information. The current study only considered the 8426 georeferenced mineral cropland samples from the 2009 campaign to verify and further validate the modeling activities, since they are the more recent data currently openly available. The percentage of clay in soil material was estimated, following an international standardization procedure, as described by Tóth et al. [22]. The distribution of these samples can be considered as geographically uniform since they widely cover most of the European countries. Further, the LUCAS dataset includes laboratory visible near infrared-short wave infrared (VNIR-SWIR) spectral profiles of each soil sample. The spectra were measured using an XDS Rapid Content Analyzer spectrometer (FOSS NIR Systems Inc., Laurel, MD, USA) operating in the 400-2500 nm wavelength range with 2 nm spectral resolution, and 0.5 nm output after interpolation, resulting in a total of 4200 spectral bands. The spectral data were resampled according to the spectral bands of MSI Sentinel-2 sensor, for further analysis. The resampled spectra were obtained by resampling the LUCAS dataset, using convolution procedures, to the specific spectral response and resolution of the MSI Sentinel-2.

Data Sets Description
Additionally, we used a reference soil dataset from a rural area, which is representative of regional variation in terms of both agricultural management uses and different background soils, to independently verify our methodology ( Figure 2). Our test area comprised of about 217 km 2 in the Zazari river basin in Northern Greece, with extensive agricultural areas covered by annual crops vegetation. According to the 1:1,000,000 scale map, Luvisols, Cambisols, Lithosols, and Regosols [23] are dominant soils of the region (Figure 2b). The major agricultural crops in the region are maize, permanent and temporary meadow for forage, and winter cereals (e.g., wheat and barley). In situ data were collected during a field survey in the fall periods of 2018 and 2019. The local dataset consisted of 52 soil samples, each sample consisting of a mix of five subsamples collected from the upper soil layer (0-10 cm) within an area of 5 m radius. The particle size distribution data was measured using the Bouyoucos [24] method, from which the texture classes were derived. The clay content values of the Greek study area ranged between 10.90% and 60%. The quality control of the soil analysis was secured by a quality assurance protocol followed by the certified laboratory of Aristotle University of topsoil database and the Greek local dataset, can be found in Table 2.

Optical Imagery Data
The available MSI Sentinel-2 satellite images for the sites of LUCAS and Greek dataset were analyzed using computing facilities from the cloud-based platform of Google Earth Engine (GEE), without needing to download them. In this study, both Sentinel-2A and Sentinel-2B Level-2A data from January 2017 to December 2019 were used. To mitigate the limitation that arises due to cloud cover, we applied a selection criterion to cloud percentage (<10%) when generating our cloud-free time series. In the current work, only visible (B2, B3, and B4), near infrared (B5-B8 and B8a) and shortwave infrared (B11 and B12) bands were used, since B1 and B10 were only used for the precise aerosol and cirrus correction, respectively. Accordingly, for the Greek study area the MSI Sentinel-2 composite was created for six-months between October and March, during the non-cultivated seasons of 2018 and 2019. This time period was selected since it allowed for maximum scene data coverage, due to tillage practices. As a last step, we used the mean and standard deviation of each spectral band and then we identified 140 outliers as those having reflectance values more than three standard deviations away from the mean.
Data normalization is an important step to ensure that our model's features (input spectral and radar data, as well as the independent variable) have a similar data range. In this study we calculated the minimum and maximum values of our dataset parameters and used them to normalize the sets

Optical Imagery Data
The available MSI Sentinel-2 satellite images for the sites of LUCAS and Greek dataset were analyzed using computing facilities from the cloud-based platform of Google Earth Engine (GEE), without needing to download them. In this study, both Sentinel-2A and Sentinel-2B Level-2A data from January 2017 to December 2019 were used. To mitigate the limitation that arises due to cloud cover, we applied a selection criterion to cloud percentage (<10%) when generating our cloud-free time series. In the current work, only visible (B2, B3, and B4), near infrared (B5-B8 and B8a) and short-wave infrared (B11 and B12) bands were used, since B1 and B10 were only used for the precise aerosol and cirrus correction, respectively. Accordingly, for the Greek study area the MSI Sentinel-2 composite was created for six-months between October and March, during the non-cultivated seasons of 2018 and 2019. This time period was selected since it allowed for maximum scene data coverage, due to tillage practices. As a last step, we used the mean and standard deviation of each spectral band and then we identified 140 outliers as those having reflectance values more than three standard deviations away from the mean.
Data normalization is an important step to ensure that our model's features (input spectral and radar data, as well as the independent variable) have a similar data range. In this study we calculated the minimum and maximum values of our dataset parameters and used them to normalize the sets to the [−1, 1] range. We applied the normalization to both the LUCAS and Greek reference soil datasets.

Radar Data
VV and VH polarizations for the images available from January 2017 to December 2019 were used. VH is a vertically transmitted and horizontally received SAR backscatter signal from Sentinel-1, while VV is considered as the vertically transmitted and received SAR backscatter signal. We restricted the selection of radar data imagery in those that do not exceed one day from the acquisition time of the corresponding Sentinel-2 data.

Bare soil Filtering
An important parameter that characterizes agricultural regions is the presence of active and non-active vegetation cover, which hinders the acquisition of the raw soil surface spectrum. To exclude vegetated and mixed pixels, NDVI values were derived from the optical imagery data considering the geographical coordinates of each LUCAS sample, using the B4 and B8 bands. Additionally, the differences between B3 and B2 as well as B4 and B3 were derived to remove some of the erroneous data by keeping only the ones with positive differences in these bands. Lastly, the NBR2 related to the dry vegetation presence in the pixels has been used in support of the selection of representative bare soil areas. NBR2 is a ratio index that takes advantage of the contrast between the two Sentinel-2 SWIR bands, as highlighted in recent works for bare soil masking [19,25].
The selection of NDVI and NBR2 thresholds influences the number of observations in the dataset. Hence, a set of combined thresholds was tested ranging from 0.20 to 0.30 in the step of 0.05 and from 0.015 to 0.30 in the step of 0.03 for NDVI and NBR2, respectively. The optimum values were selected by examining the correlation statistics between the corresponding resampled LUCAS soil spectral signature and the median surface reflectance of Sentinel-2 values from the selected bare soil pixels. To summarize, the LUCAS database was utilized in the initial step in order to define the optimal combinations of our thresholds to mask the bare soil pixels from Sentinel-2 imagery data. Overall, we defined the NDVI and NBR2 thresholds equal to 0.25 and 0.075, respectively, since this combination showed the Pearson's correlation values larger than 0.25 (absolute value) especially in the B12 SWIR band that is considered crucial for clay estimation [26]. The results are in compliance with the last study of Castaldi et al. [25] where lower NBR2 values increased the overall performance of the developed models. The correlogram of clay at the generated features from optical and SAR imagery data overall shows very small values, with the highest correlations being for the latitude and the B12 band, which is in SWIR (Figure 3). The low values in Pearson's correlation can be explained due to the changes of the soil moisture and/or roughness, as well as dry vegetation debris compared to clean resampled spectra of laboratory soil spectral libraries (SSLs).

EO-Based Multispectral Regression Analysis
The data collected from Sentinel-1 and Sentinel-2 were used to predict the clay content following the novel methodological framework proposed herein. To compare the accuracy of the proposed CNN we also tested the performance of two commonly used machine learning algorithms, namely random forest (RF) and an artificial neural network (ANN).

Convolution Neural Network for Soil Attributes Mapping
Previous works indicated great potential in utilization of non-linear machine learning methods for quantification of soil attributes with reflectance values and complementary covariates of multispectral EO platforms. Mainly, these studies focused on the information derived from the median spectral values of satellite products, without taking into account the temporal variability of the Earth's surface such as soil moisture and roughness. Our choice for the incorporation of new deep learning techniques, such as CNN, was motivated by our goal to evaluate the complementary information of inter-and intra-annual spectral variations in the chemometric modeling. These algorithms are considered very effective to derive features from short segments of the overall spectral signals [27]. Readers unfamiliar with the CNNs may refer to recent studies [9,28] and relevant reviews [29] where 1D-CNN basic features and operations are described. In addition, a glossary related to deep learning terminology is provided in the end of this manuscript to further explain specific terms that support the description of the CNN model in the following paragraphs.
Here we present the network architecture of the proposed multi-input CNN, which aims to predict topsoil clay content based on the remotely sensed imagery data. The proposed approach leverages the techniques of 1-D CNNs to estimate clay content using in conjunction as predictors (i) all the selected 1-D spectral signatures of masked bare soil pixels (Sentinel-2 10 bands), (ii) their closest, in terms of date, combined observations from VV and VH data (additional two bands), and (iii) the geographical coordinates for each sample point across the observation time. This spectral information, both radar and optical values, is seen by the network as a single entity for each observation date, while the multiple observations within the years captured the related introduced uncertainties (e.g., decorrelation effects through temporal environmental factors). Thus, the end goal is to combine the potentially complementary information arising by the variations of these values.
The proposed network is arranged in a series of six different types of layers, namely: (i) the input layers, where the remotely sensed values accounted for constitute the spectral input channel and the geographical coordinates the auxiliary input channel; (ii) the convolutional layers, which filters a given input and extracts information from specific spectral regions by carrying out the convolution operation; (iii) the pooling layer, used to reduce the resolution of the temporal dimension; (iv) the flattening layer, which converts the multi-channel filters into a single continuous vector; (v) the dense layer, which connects all outputs of the previous layer to all inputs of the forthcoming layer; and (vi) the concatenate layer, which takes as input the resulted tensors from spectral and auxiliary input and returns a single tensor. Figure 4 depicts the overall network architecture. To train the network, we employed the Adam optimizer [30] using a batch size of eight while the number of epochs are set to 100. Most notably, were are not interested in identifying the temporal trends in the spectral signatures per se, considering that the texture of the soil samples remains unaltered within the span of few years. Thus, the convolution occurs across the spectral dimension and not the temporal one.
More concretely, first, a 1-D convolutional layer with 16 filters convolves the spectral input (12 channels), utilizing a kernel of size three. Then we applied a second 1-D convolutional layer that comprised of 64 filters with a kernel size of three. Each convolutional layer was followed by a rectified linear unit (ReLU) activation layer. The convolved layers of the CNN were flattened and then fully connected by a dense layer, followed by a dropout layer with a rate of 0.5. For the dense layer, an exponential linear unit (eLU) activation function was selected, whilst a kernel regularizer 0.002 was used to penalize the weights and reduce overfitting. In order to reduce the dimension of the data (50% each time) we applied a max pooling layer after the convolutional layers. Considering the auxiliary input (two channels, namely the latitude and longitude) two dense layers were used. The first dense layer had eight outputs, and the second dense layer had four outputs. The results of the spectral and auxiliary inputs were then combined whereby a concatenation layer was established. Subsequently, two dense layers of 64 and 16 channels, respectively, applied with an eLU activation function and a kernel L2 regularizer of 0.002. Both were followed by the application of a dropout layer (0.2). Finally, we concluded with one output layer of size one, which corresponded to the target variable, using the tanh activation function. The layers are also presented in Table 3. Table 3. Description of the layers used in the multidimensional convolutional neural network (CNN) architecture.

Type
Kernel Size (Channels × Width) Filters Channels Width Activation  [31] and their applications allow us to derive meaningful interpretations in the domain of soil spectroscopy. A summary of them may be found in Zhang et al. [32].
To understand how the model derives its predictions a post-hoc explainability analysis was conducted. This is an important step in order to make the system more interpretable [33] and ensure the robustness of the predictions, in accordance to the principles of explainable artificial intelligence [34]. To this end, we examined the generated feature maps of the final convolutional layer, which takes into account only the spectral and radar information (and ergo not the location of the samples). For each feature map, the top 5% (in terms of activation) of the patterns from the training set were identified, and their input features and respective clay content were visualized.

Competing Modeling Approaches and Additional Experiments
To compare the accuracy of the proposed multidimensional CNN we also tested the performance of RF and of a different ANN architecture, as representative approaches of the current state-of-the-art. Previous work of Demattê et al. [19] was used as the benchmark. In that regard, we produced the median spectral reflectance of the Sentinel-2 bands to assess the performance of the competing algorithms based on the temporal synthetic spectral reflectance profile of each soil sample.
RF is an ensemble learning classifier [35] that has achieved good performance metrics in various soil spectroscopy studies [36,37]. An appropriate tuning of hyperparameters ensures the RF's consistency. Thus, a grid search on a five-fold cross-validation experiment was conducted to select the optimal hyperparameters for RF model. In particular, we selected the number of variables that can be sampled in each split of the trees from {6, 24} and the minimal number of samples for the terminal nodes from {5, 10, 20, 50}, while the tree parameter were selected from {500, 1000, 1500}. Finally, the optimal set of hyperparameters is as follows: 24 and 20 for the first and second parameter and 1000 number of trees, for the RF and 22 and 50 for the first and second parameter, respectively and 1000 number of trees, for the RF without the location information.
Similarly, we created an ANN architecture using as input (i) the median temporal synthetic spectral reflectance profile of each soil and (ii) the median values along with the geographical coordinates of each sample. Briefly, this network consisted of four trainable layers. The first layer contained eight units. Then we applied a second dense layer, which comprised of 32 units, followed by a dropout layer with a rate of 0.2. For the second dense layer, an eLU activation function was selected, whilst a kernel regularizer L2 of 0.002 was used. In addition, we had another dense layer with eight units and the same parameters as the previous one. We concluded with one output layer of size one, which corresponds to the target variable, using the tanh activation function.
Furthermore, additional experiments were performed. We evaluated the performance of RF and ANN including also the data from Sentinel-1 for each time step and not solely the median reflectance values of the multispectral bands to statistically compare their predictive performance with those derived from the proposed 1D-CNN. Moreover, the proposed CNN model was trained and tested without the use of SAR data, as well as without the use of SAR and geographical coordinates.

Dataset Partition
The averaged spectra on a per pixel basis of Sentinel-2 values for each of the selected samples from the LUCAS database were used to partition the samples into calibration (4901) and testing (1235) set (Figure 5a), uniformly by the Kennard-Stone algorithm [38], so that the calibration set was distributed evenly based on spectral representativeness, extending among the EU territory, as illustrated in Figure 5. We performed bootstrap aggregation without replacement where five different sets of size 3920 (i.e., 80% of the 4901 calibration data) were used to each train a separate model and predict on the test set. Then, these results were combined by averaging the output and thus producing a single prediction for each testing datum. This procedure was performed for all the algorithms, CNN, RF, and ANN. Finally, to test the CNN model the 52 samples from the Zazari river basin were utilized as an independent test dataset.

Evaluation of the Models
In order to assess the performance of calibration models for soil clay content we calculated the root-mean-square error (RMSE, Equation (1)), the coefficient of determination (R 2 , Equation (2)), and the ratio of performance to interquartile range (RPIQ, Equation (3)). The equations used were as follows: where is the observed value and is the predicted value of sample i, N the number of observations (Equation (1)), is the mean of the observed values (Equation (2)), and IQ is the interquartile range (IQ = Q3 − Q1) of the observed values (Equation (3)). Q1 and Q3 denote the first and third quartile, respectively.

Implementation
Our approach was implemented in a way such that Sentinel-1 and Sentinel-2 data handling were processed using the scalable capabilities of GEE [39], making the whole processing routine distributed and thus more computationally efficient. The coordinates (WGS84) of the soil sample sites were imported into GEE using Google fusion tables. The spectral signatures corresponding to the sample sites were then extracted and downloaded for further processing in a local environment. The statistical and CNN regression analyses were performed utilizing the R software [40], with the keras package [41]. The caret package [42] was also used for the development of the RF model. The models were built on a workstation with a NVIDIA Quadro K4200 graphics processing unit.

Evaluation of the Models
In order to assess the performance of calibration models for soil clay content we calculated the root-mean-square error (RMSE, Equation (1)), the coefficient of determination (R 2 , Equation (2)), and the ratio of performance to interquartile range (RPIQ, Equation (3)). The equations used were as follows: where y i is the observed value andŷ i is the predicted value of sample i, N the number of observations (Equation (1)), y is the mean of the observed values (Equation (2)), and IQ is the interquartile range (IQ = Q3 − Q1) of the observed values (Equation (3)). Q1 and Q3 denote the first and third quartile, respectively.

Implementation
Our approach was implemented in a way such that Sentinel-1 and Sentinel-2 data handling were processed using the scalable capabilities of GEE [39], making the whole processing routine distributed and thus more computationally efficient. The coordinates (WGS84) of the soil sample sites were imported into GEE using Google fusion tables. The spectral signatures corresponding to the sample sites were then extracted and downloaded for further processing in a local environment. The statistical and CNN regression analyses were performed utilizing the R software [40], with the keras package [41].
The caret package [42] was also used for the development of the RF model. The models were built on a workstation with a NVIDIA Quadro K4200 graphics processing unit.

Times Series Preliminary Analysis
The multi-temporal analysis of bare soil reflectance of LUCAS sampling sites implemented in this study extended from January 2017 to December 2019. Our choice to focus on a three year study was mainly based on the need to provide a sufficient number of scenes for statistical analysis, able to capture the clean soil spectrum (i.e., without cloud coverage or presence of dry and green vegetation) each sample site at least once. It should be noted that the distribution of cloud-free scenes was significantly variable for different parts in the European continent, having an impact on the number of time pixels were exposed (i.e., more clean soil pixels in Mediterranean countries compared to Northern Europe).
Overall, there were a total of 21,334, 48,913, and 45,842 bare soil pixels in 2017, 2018, and 2019 (the Sentinel-2 Level 2 were ingested in GEE from 2019 and backwards), corresponding to 4507, 5659, and 5397 extracted points, respectively ( Figure 6). Then, a time series of each soil site sample with 10 spectral bands was generated. For the majority of pixels (96%) at least two cloud free observations were available in this period and about a quarter of the pixels had more than 25 observations available. Approximately 15% of the pixels had less than four cloud-free recordings during the entire study period.

Times Series Preliminary Analysis
The multi-temporal analysis of bare soil reflectance of LUCAS sampling sites implemented in this study extended from January 2017 to December 2019. Our choice to focus on a three year study was mainly based on the need to provide a sufficient number of scenes for statistical analysis, able to capture the clean soil spectrum (i.e., without cloud coverage or presence of dry and green vegetation) each sample site at least once. It should be noted that the distribution of cloud-free scenes was significantly variable for different parts in the European continent, having an impact on the number of time pixels were exposed (i.e., more clean soil pixels in Mediterranean countries compared to Northern Europe).
Overall, there were a total of 21,334, 48,913, and 45,842 bare soil pixels in 2017, 2018, and 2019 (the Sentinel-2 Level 2 were ingested in GEE from 2019 and backwards), corresponding to 4507, 5659, and 5397 extracted points, respectively ( Figure 6). Then, a time series of each soil site sample with 10 spectral bands was generated. For the majority of pixels (96%) at least two cloud free observations were available in this period and about a quarter of the pixels had more than 25 observations available. Approximately 15% of the pixels had less than four cloud-free recordings during the entire study period.

Bare Soil Spectral Pattern from LUCAS and Sentinel-2 Data
The next steps in our data analyses were performed using the LUCAS soil sampling points that were masked out as bare soil pixels. Then, for each selected soil sampling point the median absorbance spectral response as well as the associated standard deviation among all the MSI Sentinel-2 recordings was calculated (Figure 7). For comparison we also resampled and converted to reflectance values the corresponding laboratory LUCAS spectra to Sentinel-2 bands. Overall, we could observe that the position and the shape of Sentinel-2 and resampled LUCAS spectral signatures present noticeable differences (e.g., albedo). In particular, the resampled laboratory spectral signatures indicated higher reflectance values relative to MSI Sentinel-2, since the measurement were performed in a controlled environment and not influenced by the surface conditions of the open fields (e.g., roughness caused by the particle size and soil moisture). Moreover, this variation was expected since we should consider that LUCAS SSL was made in disturbed soil samples, while in real field conditions the soil surface were in an undisturbed condition.

Bare Soil Spectral Pattern from LUCAS and Sentinel-2 Data
The next steps in our data analyses were performed using the LUCAS soil sampling points that were masked out as bare soil pixels. Then, for each selected soil sampling point the median absorbance spectral response as well as the associated standard deviation among all the MSI Sentinel-2 recordings was calculated (Figure 7). For comparison we also resampled and converted to reflectance values the corresponding laboratory LUCAS spectra to Sentinel-2 bands. Overall, we could observe that the position and the shape of Sentinel-2 and resampled LUCAS spectral signatures present noticeable differences (e.g., albedo). In particular, the resampled laboratory spectral signatures indicated higher reflectance values relative to MSI Sentinel-2, since the measurement were performed in a controlled environment and not influenced by the surface conditions of the open fields (e.g., roughness caused by the particle size and soil moisture). Moreover, this variation was expected since we should consider that LUCAS SSL was made in disturbed soil samples, while in real field conditions the soil surface were in an undisturbed condition.
Examples of variation of spectral signatures taken from the Sentinel-2 bare soil images for representative LUCAS sites are illustrated in Figure 8. Examples correspond to clay, sandy, silty and silty clay soil type [43]. The number of pixels masked out as bare soil was different depending on the region. For example, at points characterized as clay, sand, silty and silty clay we created a time series of 16, 22, 29 and 18 stacked pixels, respectively. Hence, the illustrated trajectories correspond to the associated number of selected observations across the period of 2017-2019 for the 10 Sentinel-2 bands. Examples of variation of spectral signatures taken from the Sentinel-2 bare soil images for representative LUCAS sites are illustrated in Figure 8. Examples correspond to clay, sandy, silty and silty clay soil type [43]. The number of pixels masked out as bare soil was different depending on the region. For example, at points characterized as clay, sand, silty and silty clay we created a time series of 16, 22, 29 and 18 stacked pixels, respectively. Hence, the illustrated trajectories correspond to the associated number of selected observations across the period of 2017-2019 for the 10 Sentinel-2 bands.   Examples of variation of spectral signatures taken from the Sentinel-2 bare soil images for representative LUCAS sites are illustrated in Figure 8. Examples correspond to clay, sandy, silty and silty clay soil type [43]. The number of pixels masked out as bare soil was different depending on the region. For example, at points characterized as clay, sand, silty and silty clay we created a time series of 16, 22, 29 and 18 stacked pixels, respectively. Hence, the illustrated trajectories correspond to the associated number of selected observations across the period of 2017-2019 for the 10 Sentinel-2 bands.

Prediction Performance
The results of the CNN model for the prediction of clay content across the soil samples of the LUCAS database and the best models from the competing approaches are presented in Figure 9. It should be noted that the CNN model leveraged a plethora of input features attaining a performance of RPIQ 2.02 and R 2 = 0.60, a notable result ( Figure 9C). This shows the multi-input architecture's potential to combine effectively the complimentary information contained in the pool of both optical and radar spectral information and those from auxiliary geographical coordinates. Therefore, in comparison with the prediction performance values of the competing algorithms the CNN model significantly improved the performance predictions (R 2 = 0.55, RPIQ = 1.91 and R 2 = 0.44, RPIQ = 1.71, Figure 9A,B) and decreased the RMSE by a relative 4.5% and 6.5% to RF and ANN, respectively. Hence, the effect of the synergistic use of optical and SAR data was evident, since the most accurate models were developed from the integration of both inputs in the proposed CNN model compared to both RF and ANN models that used the entire Sentinel-1 and Sentinel-2 dataset or simple the median values of Sentinel-2. Moreover, the results indicate that each model has increased its performance with the inclusion of geographical coordinates as auxiliary features. Considering the additional information derived from geographical coordinates, we could observe that it slightly improved (5.5% decrease in RMSE) the CNN's prediction of the soil clay content compared to the CNN without the auxiliary input layer (R 2 = 0.53 and RPIQ = 1.87, Figure 9F). However, the improvement due to additional geographical variables was more pronounced for the RF model, indicating that it did not sufficiently exploit the spectral information (R 2 = 0.38 and RPIQ = 1.63, 9E). A significant improvement of 4% was recorded with the ANN model compared with the version without the location information (R 2 = 0.41 and RPIQ = 1.68, Figure 9D). Overall, even if the improvement of prediction is very slight, it can be expected that complementing the models with auxiliary input features capturing the spatial characteristics of the soil samples enable more precise prediction of topsoil clay content. To conclude, in all the cases the CNN model outperformed the competing modeling approaches, as illustrated in Figure 9, where the results are given. Finally, the corresponding results and the associated scatter plots from the additional analyses are given as Appendices A and B.

Interpretability of the Multi-Input CNN Model
The post-hoc explainability analysis that we performed enabled us to understand which groups of patterns and consequently which features each feature map identifies and considers as important for the prediction of the target property. Out of all generated feature maps, we selected six representative filters to assist in the interpretation of the model (Figures 10 and 11).  Since it is difficult to infer what the deep features depict by directly visualizing their values, the visualization propagates back to the initial features, which are more interpretable and depicts in the form of boxplots, per each filter, the reflectance spectra, the polarization, the location, and the clay content. For visualization purposes, the filters were ordered according to the average clay content of their top 5% activated samples, and the corresponding input features were unnormalized in order to be expressed in their original units. The first two filters detect "low" clay content (i.e., with values mostly between the minimum and Q1), the second two "medium" clay content (i.e., with values at Q2), while final the two last filters predict the "high" contents (i.e., at about Q3 and beyond). This is an interesting result suggesting that the CNN had untangled the latent information and developed complex features, which could identify the clay content only using the spectral and radar information.

Surface Mapping of Soil Attributes
In this case, S2MoS was implemented using EO images from 25 distinct dates, well distributed across the agricultural calendar, acquired over a period of two years over the Zazari test site. Using the predefined 0.25 and 0.075 threshold values for NDVI and NBR2, respectively, we extracted 72.12% bare soil pixels over the agricultural fields, as illustrated in Figure 12.

Surface Mapping of Soil Attributes
In this case, S2MoS was implemented using EO images from 25 distinct dates, well distributed across the agricultural calendar, acquired over a period of two years over the Zazari test site. Using the predefined 0.25 and 0.075 threshold values for NDVI and NBR2, respectively, we extracted 72.12% bare soil pixels over the agricultural fields, as illustrated in Figure 12. So, once the CNN model was validated (see Section 3.3), it was directly applied to the selected bare soil pixels for all of their temporal observations (Sentinel-1 + Sentinel-2), which represents situation's information of the soil surface across different periods (wet, tillage, ploughed, etc.) in the Zazari river basin. At the scale provided, the results were visually homogenous and free of any apparent artifacts, mainly due to the number of pixels masked as bare soil varies depending on the region. The resulting map (Figure 13) illustrates the parcel level soil clay content as derived by the proposed CNN model. The predicted clay values using the proposed CNN model showed relatively similar patterns with the areas covered by the different soil classes (Figure 2a). Moreover, we had an underestimation of the higher clay content, resulting to a moderate predictive performance (R 2 = 0.51, RMSE = 19.8%), compared to the ground truth 52 points. So, once the CNN model was validated (see Section 3.3), it was directly applied to the selected bare soil pixels for all of their temporal observations (Sentinel-1 + Sentinel-2), which represents situation's information of the soil surface across different periods (wet, tillage, ploughed, etc.) in the Zazari river basin. At the scale provided, the results were visually homogenous and free of any apparent artifacts, mainly due to the number of pixels masked as bare soil varies depending on the region. The resulting map ( Figure 13) illustrates the parcel level soil clay content as derived by the proposed CNN model. The predicted clay values using the proposed CNN model showed relatively similar patterns with the areas covered by the different soil classes (Figure 2a). Moreover, we had an underestimation of the higher clay content, resulting to a moderate predictive performance (R 2 = 0.51, RMSE = 19.8%), compared to the ground truth 52 points.

Discussion
The S2MoS proposed in this research has proved its capability to attain better prediction performance metrics compared to the current state-of-the-art approaches. These results were attributed to the multi-input CNN architecture, which has enabled us to leverage the complementary information contained among reflectance and radar values, as well as the geographical information of each sample. In this section we discussed whether the objectives of this research were achieved (Section 4.1), as well as we evaluated the interpretation of the way that input and relevant output associations was modeled (Section 4.2). We also compared the findings of this paper with existing literature (Section 4.3) and suggest directions for future research (Section 4.4). Finally, in Section 4.5, we provided insights and recommendation for the development of a global and operational EO-based soil monitoring system.

EO Regression Analysis with the Use of Convolutional Neural Networks and Synergistic Use of Optical and SAR Data
Currently, EO-based soil spectroscopic regression analysis mainly focuses on machine learning models [36,44] and recently on one-dimensional spectral signature processing with deep learning techniques [45]. Following a growing trend in the application of deep learning techniques in multidimensional prediction on various Earth system science problems [46], we employed a new multi-input approach of spectroscopy analysis using a CNN model, which takes into account both the spectral and radar characteristics of exposed soil sites, across multiple observations, for mapping their spatial characteristics. Those studies have used the Landsat archive to extract bare soil spectral signatures over large areas. To the best of our knowledge, such an investigation has not yet been thoroughly examined for Sentinel optical data. The latter data offer enhanced spatiotemporal resolution compared to Landsat imagery, making them advantageous for producing high resolution soil spatial explicit indicators on large scales. Unlike the proposed S2MoS, none of the above data mining approaches have previously examined leveraging multiple observations as a single entity. Until now, multi-temporal approaches generated outcomes reflecting the median value of the available observations. The proposed S2MoS enables a multi-temporal analysis, which efficiently can leverage information from the temporal behavior in top-soil across the years (Figure 8). This was introduced here for the first time and was the cornerstone of the proposed framework.
The use of Sentinel-2 bands was further supported, by integrating radar values as supplementary predictors. Consequently, a considerable expansion of the feature space (considering both temporal and spectral dimensions) was created, which could not be efficiently handled by the current machine learning approaches. Although the RF algorithm can be considered as a successful approach to soil texture classification [36,37], the proposed CNN network has proven to have significant advantages over other methods used in the EO-based soil spectroscopic regression analysis with respect to prediction performance (R 2 = 0.60, RPIQ = 2.02). In particular, this approach demonstrates a significant improvement in overall accuracy compared to the competing machine learning methods that make use of the multi-year optical composite (i.e., the median) as indicated in Figure 9. Furthermore, as it is expected, the proposed CNN produced the best results even when the RF and ANN models were trained in the whole dataset (see Appendix A). This scenario demands a long training time for few thousand points considering geographical coordinates, optical, and SAR imagery data as input values. The importance of the proposed CNN architecture can be further highlighted by a set of additional comparisons that were performed using as predictors the spectral source without the use of supplementary predictors (i.e., radar values). The corresponding results are given in Appendix B, where in all cases the integrated information of SAR data increases the overall predictive performance. Accordingly, it was concluded that the complementary SAR observations are very useful for improving overall accuracy by capturing the spectral influence of soil moisture and roughness of soil exposed pixels. This finding is also highlighted by Bousbih et al. [15], where they retrieved soil roughness levels and accounted them in soil texture classification.

Feature Importance
The proposed CNN allows a degree of interpretability enabling the identification of key spectral regions in accordance to the principles of explainable artificial intelligence [34]. By visualizing the selected filters from the feature maps ( Figures 10 and 11), the following observations can be made: (i) the albedo of the samples tends to increase as the clay content increases, but at high concentrations it decreases again; this is in accordance with the observations made from the laboratory spectra of the air-dried samples [47], (ii) although the coordinates' distributions are mainly influenced by the high presence of samples in central Europe, a trend can be identified in that samples with higher clay content are to be found in higher latitudes; this is in accordance with the identified correlation between them, which the CNN identified only though the spectral and radar information, (iii) there is a noticeably larger absorption in band B12 as the clay content increases; again, this complies with the well-reported observation that absorbances around 2.2 and 2.3 µm are due to the presence of OH in the mineral structure [48]-this is a region that may also affected by the presence of soil organic matter, as well as CaCO 3 , whose content however is low in the European croplands [21], thus its effect is not profound, (iv) VV tends to be more sensitive in the changes of the average clay content, as it is more affected than VH by the changes in soil roughness, which may be at least partially caused by the different soil texture [16], (v) large clay contents (i.e., larger than Q3) are harder to identify as the generated feature maps evidently confound them with lower concentrations. These interpretations have made the reasoning process of the model more transparent; they indicate how the model has obtained a clear mapping between the predictors and the output property, albeit not an infallible one, with a distinct physical meaning.

Comparison with Current State-of-the Art Regression Algorithms
Several studies [36,37] have assessed the potential of Landsat archive imagery data to the map surface clay content, finding satisfactory-good results (0.63 < R 2 < 0.75). Recently, Gholizadeh et al. [14] and Vaudour et al. [12] evaluated Sentinel-2 images for soil surface properties estimation. In the aforementioned studies, the soil attributes from the surface layer were modeled using machine learning techniques, where each pixel was enriched by some additional features coming from combinations performed among the extracted band values [14,49]. Herein, there is no need to include additional covariates in the model, as convolutional layers ensure that deep features are automatically constructed. Moreover, most studies in EO spectroscopic analysis calibrate multivariate regression models based on reference samples from their study area. In this research, the number of observations in the calibration dataset was restricted to LUCAS data, in contrast to current approaches. Liu et al. [45] also used existing SSLs to develop models for estimation of topsoil clay content and later generated relevant maps based on hyper-spectral image and a transfer function. In their approach, a few field samples and relevant spectral signatures were utilized to fine tune the calibration of model, resulting in a model with R 2 around 0.6. However, we should mention a typical extrapolation problem in digital soil mapping, since very often the developed models perform quite well in training and even test datasets but deviate strongly for data outside their valid area. In that regard, the lower performance in the Greek (R 2 = 0.51) test area ( Figure 13) can be explained by the availability of only one LUCAS soil sample within the Greek study area or by the fact that current samples are not representative of the soil variability of the other regions.

Perspective of Embedding New Reference Soil Databases and Additional Spectral Information
Even though soil texture can be considered constant for more than 20 years, several changes can be potentially caused by natural or anthropogenic activities affecting the reflectance in VNIR-SWIR. This is more prominent for other soil parameters such as organic carbon. In that regard, our approach should be extended to explore more soil properties, such as soil organic carbon, by leveraging the more recent LUCAS topsoil database of 2015 that covers a time period when the Sentinel satellites were operational. Moreover, LUCAS was designed as a monitoring system and not as a mapping tool, hence we should take into account the feature space in the design of sampling strategies [50] in order to develop more reliable calibration models. Finally, it should be noted that further analyses should be performed for assessing precisely temporal spectral variations, e.g., through temporal changes of the soil moisture and/or roughness and appropriate parametrization should be performed to effectively combine SSLs and EO data.
Additionally, by the combination of reflectance data and EO-based terrain or land use derivatives [8,36], we expected to overcome the limitations of Sentinel-2 spectral resolution in key ranges of the electromagnetic spectrum. Furthermore, the S2MoS scheme described herein may be applied to combine other heterogeneous sources in a similar fashion. For example, spectra from the VNIR-SWIR and the thermal infrared range where some of the fundamental vibrations take place, as demonstrated in Chabrillat et al. [51] and Notesco et al. [52] may be appropriately combined to yield better performance than the one attained. In this manner, although Sentinel-2 data has spectral bands similar to Landsat 8's, it excludes the thermal infrared sensor. However, its capability to obtain a large number of images allows the retrieval of more information and the calibration of robust models. This is an important fact when considering the differences in models' performance between Landsat estimations and those from Sentinel-2, like the current research.

Towards an EO-Based Soil Monitoring System
This section highlights the current limitations, discusses potential solutions that reinforce the rationale that remote sensing, artificial intelligence, and digital soil mapping communities should work closely together to better understand and eventually overcome the challenges for an EO-based soil monitoring system.
Copernicus data can be considered as a key resource but the need for higher spatial resolution optical and SAR, as well as hyperspectral and thermal space-based observations can be the game changers for an operational EO-based soil monitoring system. The forthcoming Copernicus Hyperspectral Imaging Mission (CHIME), NASA Surface Biology and Geology (SBG) mission, and Copernicus Land Surface Temperature Monitoring (LSTM) may close this gap and fulfill these needs in them of spectral resolution and thermal coverage, respectively. In future studies, it is worthy to investigate how we can properly integrate satellite soil moisture retrievals and fuse them with multispectral bands for soil mapping. It has been proven that data from Sentinel-1 can be used to define operational schemes and identify synergies, allowing for observational methods of soil moisture to be further evolved and for tailored algorithms delivering mature products to be developed based on our proposed methodology [53]. Synergies with Landsat or the existing hyperspectral sensors in orbit (e.g., PRISMA from Italian Space Agency) should be prioritized to assess the limitations whereby standalone multispectral Sentinel-2 data is not sufficient to reach the desired spatiospectral reliability [54].
The proposed deep learning CNN paves the way to overcome many of the limitations that have, until now, hindered a more wide-spread adoption of artificial intelligence in the domain of soil spectroscopy, such as multiple kernel learning approaches [55] and interpretable fuzzy rule based algorithms. This kind of models are transparent in terms of their reasoning process, thus enabling experts alike to holistically address the issues tackled in contrast to black box models. In future studies, the application of generative adversarial networks [56] for regression should be prioritized, since the use of generator and discriminator networks can be a powerful tool for the synergistic used of large SSLs and space-borne data. In that regards, unlike on LUCAS, Global [57], and GEOCRADLE SSLs [58], large, reliable, labeled soil data do not always exist in soil science, not only because of the sizes of the datasets involved, but also owing to the absence of standard operating procedures in chemical laboratory measurements, as well as common guidelines on lab spectroscopy measurements to ensure interoperability and reliable reusability. The upcoming release of LUCAS version of 2015 as well as new efforts in Africa and Latin America based on agreed set of harmonization principles will allow us to have better chance of chemical attributes estimation within an intra-annual calibration of models.
It should be noted that, approaches based on synergistic use of multimodal EO data, including the available hyper-spectral missions [59], can further raise computational challenges dealing with petabytes of data. To that end, it is critical to exploit the existing IT capabilities of cloud computing and processing platforms, such as GEE, DIAS, and Committee on Earth Observation Satellites Data Cube [60], to significantly reduce the time needed to process EO information.

Conclusions
This paper proposed a topsoil clay content quantification technique based on a multi-input deep convolutional network and a multi-temporal analysis of optical and radar imagery data to produce detailed soil maps of high spatial coverage over agricultural regions. Our approach was implemented with Sentinel-1 and Sentinel-2 data using data from 6136 agricultural sites of the LUCAS SSL, over a three-year period. A detailed study was conducted in the Zazari river basin in Northern Greece to independently examine the performance of the model and allow for better visualization. For that purpose, we used soil data from 52 sites collected within a study period extending from August 2018 to September 2019.
The results deliver preliminary insights of the potential of the integrated optical and radar time series products that can be used for topsoil mapping. In particular, our novel methodological framework can efficiently utilize both Sentinel-1 and Sentinel-2 information instead of solely the median spectral reflectance of each sample. The proposed technique shows promising results (R 2 = 0.60, RPIQ = 2.02) comparing favorably against current state-of-the-art data mining procedures to produce accurate soil predictions from satellite images. Optical and radar bands play different roles in EO-based soil monitoring and integrating the relative utility of these bands is an effective way to enhance the regression performance. In that regard the proposed approach shows that the synergistic use of SAR and optical remote sensing could facilitate the provision of useful added value information regarding the effect of ambient factors of exposed soils, achieving an improvement of 5.5% in RMSE over the performance of the RF, as the best model, which used as input solely the synthetic spectral reflectance of each sample. Moreover, the information based on additional auxiliary input (e.g., geographical coordinates) is an effective complementary source of information to enhance the prediction performance Remote Sens. 2020, 12, 1389 21 of 26 by considering the spatial relationship between regions and soil characteristics. Although this approach has been developed and tested on MSI Sentinel-2 imagery data, its potential and capability can be adopted on the forthcoming hyperspectral orbital sensors where better results are expected, and more soil attributes could be predicted accurately. Funding: This research was partly funded by the "e-shape" project under grant agreement No 820852, supported by European Union's Horizon 2020 research and innovation program.
Acknowledgments: Authors acknowledge the cloud computing platform Google Earth Engine for providing access to Copernicus Sentinel preprocessed images. The LUCAS topsoil dataset used in this work was made available by the European Commission through the European Soil Data Centre managed by the Joint Research Centre (http://esdac.jrc.ec.europa.eu).

Conflicts of Interest:
The authors declare no conflict of interest.