Sentinel-2 and Landsat-8 Multi-Temporal Series to Estimate Topsoil Properties on Croplands

: The spatial and temporal monitoring of soil organic carbon (SOC), and other soil properties related to soil erosion, is extremely important, both from the environmental and economic perspectives. Sentinel-2 (S2) and Landsat-8 (L8) time series increase the probability to observe bare soil ﬁelds in croplands, and thus, monitor soil properties over large regions. In this regard, this work suggests an automated pixel-based approach to select only pure soil pixels in S2 and L8 time series, and to make a synthetic bare soil image (SBSI). The SBSIs and the soil properties measured in the framework of the European LUCAS survey were used to calibrate SOC, clay, and CaCO 3 prediction models. The results highlight a high correlation between laboratory soil spectra and the SBSIs median spectra, especially for the SBSI obtained by a three-year S2 collection, which provides satisfactory results in terms of SOC prediction accuracy (RPD: 1.74). The comparison between S2 and L8 results demonstrated the higher capability of the S2 sensor in terms of SOC prediction accuracy, mainly due to the greater spatial resolution of the bands in the visible region. Whereas, neither S2 nor L8 could accurately predict the clay and CaCO 3 content. This is because of the low spectral and spatial resolution of their SWIR bands that prevent the exploitation of the narrow spectral features related to these two soil attributes. The results of this study prove that large S2 time series can estimate and monitor SOC in croplands using an automated pixel-based approach that selects pure soil pixels and retrieves reliable synthetic soil spectra.


Introduction
Soil organic matter loss, and consequently soil erosion, is one of the main processes of land degradation in croplands, often due to increasingly intensive agricultural management [1,2]. In this regard, The Voluntary Guidelines for Sustainable Soil Management published by Intergovernmental Technical Panel on Soils [3] indicated the loss of soil organic carbon (SOC) as one of the main causes of soil degradation and lay down a set of good practices to enhance the soil organic matter content and improve the soil fertility. These practices will be used as guidelines for cross-compliance rules for Common Agricultural Policy (CAP) related to the good agricultural and environmental condition (GAEC) of the land for the period post-2020 [4].
Other soil properties like topsoil clay, sand, and calcium carbonate (CaCO 3 ) can be used to quantify the soil vulnerability in terms of erosion [5,6]. Consequently, spatial and temporal monitoring of SOC and other soil properties related to soil erosion is extremely important, both from the environmental and economic perspective for sustainable soil management within the European Green Deal, that aims to reduce the misuse of fertilizers, and at the same time, to increase the carbon stock in the soil [7,8].
The physical link between soil properties and the electromagnetic spectrum in the optical spectral domain is well-known [9], and widely exploited in soil spectroscopy for soil properties estimation and monitoring [10]. This is due to chemical components or chromophores interacting with visible and infrared radiations and showing well-defined the SBSIs were obtained using images acquired in three different time windows (1, 2, and 3 years), both for S2 and L8 images, and the soil samples collected for the 2015 LUCAS topsoil survey were used to build multivariate regression models.

LUCAS Topsoil Dataset
In the framework of the Land Use and Coverage Area frame Survey (LUCAS), a topsoil survey is carried out in the Member States of the European Union (EU) every three years to collect statistical information on soil properties. The 2015 LUCAS database includes soil samples for all the 28 countries of the EU, and it was carried out over 90% of the locations sampled in the earlier surveys (2009 and 2012), while the remaining 10%, 5656 samples, were collected over new locations [30].
All the soil samples from the LUCAS surveys in 2009, 2012, and 2015 were collected following the same standard protocol that entails the collection of a composite sample (about 500 g and 20 cm depth) taken by a spade in five points: The first point corresponds to the point location and the other four at 2 m distance following the cardinal directions [31].
Thirteen chemical and physical parameters were measured in the laboratory for each soil samples from the 2015 LUCAS topsoil database, by the ISO methods (clay [32]; organic carbon [33]; calcium carbonate [34]).
The spectra were acquired with an XDS Rapid Content Analyzer (FOSS NIR Systems Inc., Laurel, MD, USA) spectrometer between 400 and 2500 nm with 2 nm resolution following the protocol described by [35].
To link the most recent and available LUCAS soil data with satellite images, and to avoid exceeding the user memory limit imposed by GEE, some regions across Europe were selected, where 815 soil samples were collected in new locations in croplands during the 2015 LUCAS survey: Great Britain and Ireland, Central Europe (Germany, Austria, Czech Republic, Slovakia, Hungary, and Poland), Baltic states (Latvia, Lithuania, Estonia), Western Europe (Great Britain, Ireland, Belgium, Netherlands, Luxembourg, and South France), and Southern Europe (Italy and Malta) ( Figure 1). The 1Y_L collection was added to have satellite data close to the LUCAS survey period-unfortunately S2 was not yet available.
The selection of the bare soil pixels for each image of the collection was carried out in the selected regions concerned by the new sampling location of the 2015 LUCAS survey.
This selection entails applying a cloud mask removing pixel affected by clouds and clouds' shadows using the products included in the two satellite collections. The 'QA60 Sentinel-2 bitmask band with cloud mask information was used to mask cirrus and opaque clouds, and the 'MSK_CLDPRB' and 'MSK_SNWPRB' bands were used to remove cloudy and snowy pixels, respectively. The Pixel Quality Assessment Band provide by L8 was used to mask cloudy pixels.
After that, the Normalized Difference Vegetation Index (NDVI) and Normalized Burn Ratio 2 (NBR2) Equation (1) indices were computed for the remaining pixels to detect green and dry vegetation, respectively, The SWIR 1 band corresponds to B6 for L8 and B11 for S2, while SWIR 2 to B7 for L8 and B12 for S2 (Table 1). The NBR2 index can also remove pixels interested by high soil moisture content that can affect the soil spectrum shape [22]. Only the pixels which comply with the following conditions: No clouds, NDVI < 0.35, NBR2 < 0.125 were kept. The NDVI and NBR2 thresholds were set according to a conservative approach that aims to find a compromise between the size of the dataset and the estimation accuracy [17,23]. For each geographical location represented by the center of the satellite pixel, we could get several dates complying with the above-mentioned conditions. Therefore, we computed the median value for each band, but only considering the locations that respect all the conditions at least three times within the time series ( Figure 2). The selected bands for S2 and L8 data are listed in Table 1.
To demonstrate the role of the time series to increase the SBSIs area and their reliability for soil properties mapping, four representative areas of interest from the 2015 LUCAS survey were selected: Baltic states, Ireland, South Italy, and East Germany-West Poland ( Figure 1). For each sampling location in these areas, we counted how many times the corresponding pixel complying with the conditions described above-in other words, the bare soil frequency within a time interval for the same point. After that, the Normalized Difference Vegetation Index (NDVI) and Normalized Burn Ratio 2 (NBR2) Equation (1) indices were computed for the remaining pixels to detect green and dry vegetation, respectively, The SWIR1 band corresponds to B6 for L8 and B11 for S2, while SWIR2 to B7 for L8 and B12 for S2 ( Table 1). The NBR2 index can also remove pixels interested by high soil moisture content that can affect the soil spectrum shape [22]. Only the pixels which comply with the following conditions: No clouds, NDVI < 0.35, NBR2 < 0.125 were kept. The NDVI and NBR2 thresholds were set according to a conservative approach that aims to find a compromise between the size of the dataset and the estimation accuracy [17,23]. For each geographical location represented by the center of the satellite pixel, we could get several dates complying with the above-mentioned conditions. Therefore, we computed the median value for each band, but only considering the locations that respect all the conditions at least three times within the time series ( Figure 2). The selected bands for S2 and L8 data are listed in Table 1.

Soil Properties Estimation Models
The SOC, clay, and CaCO 3 content provided by the 2015 LUCAS survey and the median spectrum of the SBSIs were used to build the prediction models using the Cubist algorithm. The Cubist is a predicted-oriented regression algorithm with a unique linear regression model at each node defined by a rule. The rules concern the predictors, here the reflectance values of the satellite bands, defining a subset at each node that makes a multi-way tree structure [36]. The Cubist [37] and caret R packages were used to tune the models to find the best number of neighbors and committees (boosting iterations) that minimizes the 10-fold cross-validation root mean square error (RMSE). To compare the accuracy obtained in this work for the three soil properties and with other results in the literature, the ratio of performance to deviation (RPD) Equation (2), were computed, where std is the standard deviation of the observed values. The LUCAS soil properties values were also used with laboratory spectra to build estimation models that can be used as the reference test, since laboratory spectral data should represent the best condition for soil properties estimation, where there are no disturbing effects, due to the distance between sensor and target and the roughness is negligible. In this case, due to the high number of predictors (bands), the well-known partial least squares regression (PLSR) algorithm was also tested [38], which has proved to be powerful for hyperspectral data in soil spectroscopy [39,40]. Cubist and PLSR models were tested for all satellite and laboratory datasets, and only the best result of the two was reported. Moreover, the laboratory spectra were resampled using Gaussian models, according to the L8 (Lab_L8) and S2 (Lab_S2) central wavelengths and bandwidths listed in Table 1.

Bare Soil Selection
Obviously, the larger the time interval, higher the number of bare soil pixels in all the selected regions ( Table 2). The 3Y collection generally avoids very low bare soil frequency, especially for the S2 time series. The highest average values were detected in South Italy and East Germany-West Poland, where the S2 time series provides more bare soil pixels than the L8 collection. The lowest number of bare soil pixels was obtained in the Ireland region, mainly due to the high cloudiness.  Table 3 shows the descriptive statistics of the soil samples (Figure 1), for which it was possible to retrieve a median bare soil spectrum both for the S2 and L8 time series. Although the number of available samples decreases using a narrower time interval (from 3Y to 1Y), the range, mean, and standard deviation only vary slightly. All three soil properties have a very large range of values and high variability. These datasets were used to build clay, SOC, and CaCO 3 prediction models.

SOC, Clay, and CaCO3 Estimation Accuracy
The accuracy of the models calibrated using laboratory spectra (Lab) is very high for CaCO 3 (RPD: 2.74) and lower for clay (RPD: 1.83) and SOC (RPD: 1.58). Resampling the laboratory spectra according to the S2 and L8 spectral characteristics, the accuracy decreases for all the three soil properties, and, in particular, for CaCO 3 , for which the RMSE is almost doubled ( Table 4). The models calibrated with the S2 data showed a good estimation accuracy for SOC content, especially using the S2_3Y time series (RPD: 1.74), while the RPDs of clay and CaCO 3 are quite low. The SOC model using the L8_3Y data provided a sufficient degree of accuracy (RPD: 1.53), however all the L8 statistics are worse than those obtained by S2 data. Generally, the 3Y models provided the best results, while the use of the L8_1Y_L collection did not show any advantage in terms of estimation accuracy as compared to 1Y. Table 4. Estimation accuracy of the soil properties prediction models in terms of root mean square error (RMSE) and the ratio of performance to deviation (RPD). Lab, spectra acquired in laboratory condition; Lab_L8, laboratory spectra resampled, according to the Landsat-8 OLI sensor; Lab_S2, laboratory spectra resampled, according to the Sentinel-2 MSI sensor; L8, synthetic bare soil image obtained by Landsat-8 collection; S2, synthetic bare soil image obtained by Sentinel-2 collection.

Discussion
Vegetation in cropland fields persists most of the time during the year. Consequently, the probability of acquiring a clouds free image and to find, at the same time, bare soil is quite low. In this regard, the composite image increases the investigated area for soil properties mapping, especially if a long and large time series collection is available, such as that provided by the Landsat program, and in particular Landsat-5 (started in 1984), Landsat-7, and the most recent Landsat-8 mission. In this regard, Demattê et al. [23] used Landsat 5 collection to obtain a synthetic soil image of the Sao Paulo region in Brazil, Demattê et al. [29] obtained bare Earth's surface spectra based on Landsat series, and Safanelli et al. [41] used 37 years Landsat data to map soil properties in Europe. Although Sentinel-2 constellation delivered the first images in 2015, the short revisit time provided by the Sentinel-2 already collects a large amount of data worldwide, and some authors started to explore the capability of the S2 time series. These authors include Vaudour et al. [26], who used S2 time series to map SOC in the Versailles plain in France, and Silvero et al. [42], who combined S2 and L8 time series for soil mapping purposes over large areas. Table 2 showed how the better S2 revisit time (five days), as compared with that of L8 (sixteen days), entails collecting more bare soil pixels at the same location, and consequently, obtaining more reliable median reflectance values, smoothing the effect of extreme values. Moreover, collecting images across more years increases the probability of including a larger number of exposed soils in the SBSI, thus increasing the mapping area in croplands. This is particularly important in regions where it is more difficult to acquire clear-sky images, such as those of North Europe (e.g., the Baltic States and Ireland).
The number of selected pixels, classified as bare soil, depends on the criteria adopted for the selection, and thus, for the minimization of the disturbing factors. The NDVI index was widely and successfully used to discriminate between soil (NDVI < 0.25-0.35) and green vegetation, due to the sharp increase of the reflectance between red and NIR region in the vegetation spectrum, while the differences between dry vegetation (crop residues, straw, stubble, etc.) and soil are less showy. The lignin spectral feature in SWIR (around 2100 nm) could help detect dry vegetation, but it is located very close to kaolinite and other clay minerals features that characterize soil spectra. Therefore, the NBR2 has been widely used to detect dry vegetation and spectra affected by high moisture content [16,23]. This normalized index exploits the spectral region between 1600 and 2100, which is almost flat for soil spectra, while the dry vegetation curve is descending, and the slope of the curve is related to the percentage of dry vegetation [23]. The NBR2 was successfully tested both with L8 data using B6 and B7 and with S2 using B11 and B12 [17,26]. Although lower the NBR2 value, higher the probability of detecting pure soil, there is not yet a clear agreement about the best NBR2 threshold value for the discrimination, and the study of Castaldi et al. [17] suggests that the choice should be guided by the specific requirements of the study trying to find a compromise between prediction accuracy and spatial coverage of the map. Vaudour et al. [26] compared per-pixel and per-date approaches for S2 time series, and they gained the best compromise between mapped extent and SOC accuracy (RPD= 1.50) using a per-date approach selecting the driest soil based on soil surface moisture index (S2WI).
Another important disturbing factor is the soil surface roughness, which can be minimized by selecting images acquired when it is more probable to find soil in a seedbed condition. In this regard, Dvorakova et al. [20] selected the 'greening-up' period based on the NDVI timeline, i.e., the last date of acquisition where the NDVI is lower than 0.25 before the crop develops, and they were able to significantly improve the SOC estimation accuracy combing the 'greening-up' approach with a very strict NBR2 threshold (< 0.07), which, however, limited the extent of the mapping area.
If, on one the hand, the composite images increase the mapping area, then, on the other hand, soil imaging spectroscopy should use images acquired as close in time as possible to ground truth survey to build a reliable prediction model, and this assumption is not respected using large multi-temporal data. It must nevertheless be noted that most of the soil properties are quite stable, or change very slowly over time, if no drastic changes in soil management, land use, or cover occur and where there were not remarkable erosion processes or extreme meteorological events. The comparison between 2009 and 2015 LUCAS surveys in 17,613 soil points showed limited changes over the six years period [43]. Consequently, the temporal difference between soil survey and satellite data could be irrelevant for modeling and mapping purposes.
Another important aspect to consider is the consistency between survey area and pixel size or ground sampling distance (GSD). A composite topsoil sample was collected for the LUCAS survey within a circular area of 2 m radius, while the ground sampling distance (GSD) for S2 is 10 m for the visible band, and 20 m in the NIR and SWIR bands. The discrepancy between soil sampling area and GSD is even more marked for L8: All the L8 bands have a resolution of 30 m. This incongruity could lead to not reliable soil estimation models if the measured values do not represent the actual situation within the pixel area [27,44,45]; this issue may become more pronounced according to the magnitude of the geometric error and if the investigated soil property has a large short-range spatial variability. The Pearson's correlation coefficient between reflectance measured on LUCAS soil samples in the laboratory and satellite data (Figure 3) highlighted how the S2 SBSIs are closer to lab spectra than L8 SBSIs, probably due to the lower spatial resolution of the NASA sensor. Figure 3 also shows that, in general, the correlation is stronger in the visible region and that the 3Y SBSI is more correlated to laboratory spectra than 2Y and 1Y for all the S2 bands and for B2, B3, B4, and B5 of the L8. These outcomes are consistent with [41] that using L8 found that generating a bare soil composite image by using a time interval close in time to soil observations did not improve neither the quality of topsoil reflectance nor the prediction accuracy of soil properties, especially for the more stable soil attributes, such as clay and CaCO 3 . Observing Table 4, it is quite evident how the highest accuracy was obtained using 3Y SBSI, especially for SOC estimation. Thus, both the correlation investigation and the model accuracy demonstrated that a large time interval leads to a more reliable SBSI, probably because it can reduce the negative effect of extreme estimates along with the multi-temporal survey [41]. Figure 4 shows how the S2 SBSI spectra are quite similar to laboratory spectra for high SOC content and low clay and CaCO 3 (Figure 4d-f).
The observed differences between laboratory and satellite spectra in terms of reflectance value are probably due to soil moisture that reduces the reflectance over the entire spectrum, influencing the amount of reflected and emitted energy from a soil surface [13,46]; the LUCAS soil samples were dried before being scanned in the laboratory, while the satellite spectra, although the NBR2 was applied to exclude high soil moisture content, still retain a certain amount of water especially for soil with a high clay content. As proof of the previous sentence, we have shown the difference between Lab_S2 and the satellite spectrum is bigger for very high clay content than for low content (Figure 4e).
LUCAS surveys in 17,613 soil points showed limited changes over the six years period [43]. Consequently, the temporal difference between soil survey and satellite data could be irrelevant for modeling and mapping purposes.
Another important aspect to consider is the consistency between survey area and pixel size or ground sampling distance (GSD). A composite topsoil sample was collected for the LUCAS survey within a circular area of 2 m radius, while the ground sampling distance (GSD) for S2 is 10 m for the visible band, and 20 m in the NIR and SWIR bands. The discrepancy between soil sampling area and GSD is even more marked for L8: All the L8 bands have a resolution of 30 m. This incongruity could lead to not reliable soil estimation models if the measured values do not represent the actual situation within the pixel area [27,[44][45]; this issue may become more pronounced according to the magnitude of the geometric error and if the investigated soil property has a large short-range spatial variability. The Pearson's correlation coefficient between reflectance measured on LUCAS soil samples in the laboratory and satellite data (Figure 3) highlighted how the S2 SBSIs are closer to lab spectra than L8 SBSIs, probably due to the lower spatial resolution of the NASA sensor.  Figure 3 also shows that, in general, the correlation is stronger in the visible region and that the 3Y SBSI is more correlated to laboratory spectra than 2Y and 1Y for all the S2 bands and for B2, B3, B4, and B5 of the L8. These outcomes are consistent with [41] that using L8 found that generating a bare soil composite image by using a time interval close in time to soil observations did not improve neither the quality of topsoil reflectance nor the prediction accuracy of soil properties, especially for the more stable soil attributes, such as clay and CaCO3. Observing Table 4, it is quite evident how the highest accuracy was obtained using 3Y SBSI, especially for SOC estimation. Thus, both the correlation investigation and the model accuracy demonstrated that a large time interval leads to a more reliable SBSI, probably because it can reduce the negative effect of extreme estimates along with the multi-temporal survey [41]. Figure 4 shows how the S2 SBSI spectra are quite similar to laboratory spectra for high SOC content and low clay and CaCO3 (Figure  4d-f). The observed differences between laboratory and satellite spectra in terms of reflectance value are probably due to soil moisture that reduces the reflectance over the entire spectrum, influencing the amount of reflected and emitted energy from a soil surface [13,46]; the LUCAS soil samples were dried before being scanned in the laboratory, while the satellite spectra, although the NBR2 was applied to exclude high soil moisture content, still retain a certain amount of water especially for soil with a high clay content. As proof of the previous sentence, we have shown the difference between Lab_S2 and the satellite spectrum is bigger for very high clay content than for low content (Figure 4e). The L8 spectra in Figure 4a,b and c clearly highlights how the reflectance of the synthetic curves is greatly lower than laboratory spectra both for high and low SOC, clay, and CaCO3 contents-their shape is quite flat and the reflectance values are relatively low (most of them around 0.1). The 30 m spatial resolution of the L8 bands increases the probability to include more than one element within the pixel area (e.g., vegetation and soil), and at the same time, this entails a lower capability to detect the presence of the two main elements using the spectral behavior [47,48]; in other words, the NDVI, or other indices, could be not able to discriminate between pure soil and mixed pixels and this could lead to include not bare soil spectra in a time series. However, the effect of these mixed spectra can be mitigated by the computation of the median spectra along with the time series. The prediction models calibrated with laboratory spectra provided a good accuracy for all the three soil properties, and, in particular, the CaCO3 showed a very high RPD (2.74). The resampling process according to the S2 and L8 bands highlighted how a high spectral resolution is needed for clay and calcium carbonates estimation, while it seems less important for SOC. This agrees with the observations from the authors of [49] that The L8 spectra in Figure 4a,b and c clearly highlights how the reflectance of the synthetic curves is greatly lower than laboratory spectra both for high and low SOC, clay, and CaCO 3 contents-their shape is quite flat and the reflectance values are relatively low (most of them around 0.1). The 30 m spatial resolution of the L8 bands increases the probability to include more than one element within the pixel area (e.g., vegetation and soil), and at the same time, this entails a lower capability to detect the presence of the two main elements using the spectral behavior [47,48]; in other words, the NDVI, or other indices, could be not able to discriminate between pure soil and mixed pixels and this could lead to include not bare soil spectra in a time series. However, the effect of these mixed spectra can be mitigated by the computation of the median spectra along with the time series.
The prediction models calibrated with laboratory spectra provided a good accuracy for all the three soil properties, and, in particular, the CaCO 3 showed a very high RPD (2.74). The resampling process according to the S2 and L8 bands highlighted how a high spectral resolution is needed for clay and calcium carbonates estimation, while it seems less important for SOC. This agrees with the observations from the authors of [49] that noted that the reduction of spectral resolution has not a remarkable effect on SOC estimation accuracy, while for a bandwidth larger than 40 nm, the clay estimation accuracy decreases significantly. The authors of [15] and [16] compared Sentinel-2 and airborne hyperspectral data in terms of SOC estimation accuracy. Both papers concluded that Sentinel-2 estimates and maps SOC, and that the retrievable accuracy is not significantly different than that obtained by a better spectral and spatial resolution-especially where a large SOC variability is observable. Whereas, for clay and CaCO 3 estimation, only narrow bands can exploit the spectral features related to these two properties, the heterogeneity of the organic matter composition entails a spectral response spread over the whole electromagnetic spectrum and not limited to a specific spectral region. The clay and CaCO 3 spectral features are located in the SWIR region. Figure 5 clearly shows that the most important bands for clay and CaCO 3 Lab models are located between 2100 and 2500 nm and only a few bands in the visible (Figure 5a).
Remote Sens. 2021, 13, x 12 of 16 bands can exploit the spectral features related to these two properties, the heterogeneity of the organic matter composition entails a spectral response spread over the whole electromagnetic spectrum and not limited to a specific spectral region. The clay and CaCO3 spectral features are located in the SWIR region. Figure 5 clearly shows that the most important bands for clay and CaCO3 Lab models are located between 2100 and 2500 nm and only a few bands in the visible (Figure 5a). The variable importance analysis for 3Y SBSI models confirmed the trend, described above (Figure 5b,c). However, despite both S2 and L8 have two bands in this SWIR, the clay and CaCO3 prediction accuracy is much lower than that observed with laboratory data, this is due on the one hand to the large bandwidth of the S2 and L8 SWIR bands The variable importance analysis for 3Y SBSI models confirmed the trend, described above (Figure 5b,c). However, despite both S2 and L8 have two bands in this SWIR, the clay and CaCO 3 prediction accuracy is much lower than that observed with laboratory data, this is due on the one hand to the large bandwidth of the S2 and L8 SWIR bands (between 85 and 187 nm), that does not properly exploit the narrow spectral feature of the clay mineral and those related to CaCO 3 , and on the other hand to the large pixel size of these bands (20 m for S2 and 30 m for L8). The most important bands for SOC prediction are mostly located in the visible region (Figure 5a), this is also clear for 3Y SBSI models both for L8 and S2, where basically only the blue, green, and red bands contribute to the cubist models (Figure 5b,c). Therefore, the higher accuracy of the satellite SOC models can be explained by the better spatial and spectral resolution (10 m and bandwidth between 30 and 65 nm) of the S2 visible bands compared with SWIR bands. The importance of the resolution of the S2 visible bands can be confirmed by the high correlation between laboratory and satellite spectra shown in Figure 3b: Around 0.7 for 3Y image and around 0.65 for 2Y and 1Y. As shown in Table 4, it should be noted that S2 provided a better SOC prediction accuracy than L8-probably due to the higher spatial resolution of the Copernicus' sensor. This was confirmed by the authors of [19], who compared S2 and L8 in terms of SOC prediction accuracy, and they observed a worsening of the RPD values from 1.53 to 1.40 degrading the S2 GSD to that of L8.
The hyperspectral sensor of the PRISMA (PRecursore IperSpettrale della Missione Applicativa) mission of the Italian Space Agency (ASI) [50] has been acquiring freely available images for the scientific community since 2019. The high spectral resolution of the PRISMA data could be very useful to exploit the narrow spectral features related to clay and CaCO 3 [49]. However, the capability of the PRISMA data in terms of soil prediction accuracy should be assessed in relation to the spatial resolution of the PRISMA hyperspectral sensor (30 m) and the panchromatic camera (5 m). In the future, other hyperspectral satellite missions, such as the Environmental Mapping and Analysis Program (EnMAP) [51] of the German Aerospace Center (DLR) and the planned Sentinel-10/CHIME (Copernicus Hyperspectral Imaging Mission for the Environment) [52], will deliver a large amount of hyperspectral data making it possible to carry out multi-temporal analysis to improve soil properties prediction and mapping.

Conclusions
This work investigates the capability of the S2 and L8 multi-temporal collection to estimate soil properties by extracting bare soil spectral data from a composite image: The synthetic bare soil image (SBSI). Since one of the main challenges for automating the soil properties mapping and monitoring from imaging spectroscopy is the minimization of disturbing factors, an automated pixel-based approach to select exposed soil pixels not affected by clouds, vegetation, and high soil moisture was proposed, with the aim to clearly define the conditions under which reliable soil properties map can be produced over a large region.
The high correlation between soil spectra scanned for the 2015 LUCAS dataset and those retrieved from the three-year SBSI, in correspondence of the LUCAS samples points, especially for S2 data, demonstrated the goodness of the pixel selection method, although a clear reflectance shift can be observed, due to the drier condition of the soil samples scanned in the laboratory.
The SBSIs and some of the samples collected in the framework of the 2015 LUCAS survey were used to calibrate SOC, clay, and CaCO 3 . The results highlighted how the threeyear collection increases the number of selected pixels and the SOC estimation accuracy (RPD: 1.74), smoothing the negative effect of extreme estimates along with the multitemporal survey. The comparison between S2 and L8 outputs demonstrated the higher capability of the Copernicus sensor in terms of SOC prediction accuracy, due to greater spatial resolution of the bands in the visible region. While the two multispectral sensors were no able to properly predict clay and CaCO3 content because of the low spectral and spatial resolution of their SWIR bands, that does not exploit the spectral features related to these soil attributes.
The results of this study proved the capability of large S2 time series to estimate and monitoring SOC in croplands using an automated pixel-based approach that selects pure soil pixels and retrieves reliable synthetic soil spectra. Future research should be focused on fine-tuning the methodology to retrieve a composite image of bare soil from large image collections that aim to find a compromise between SOC prediction accuracy and spatial coverage of the map.