remote Improvement of Clay and Sand Quantiﬁcation Based on a Novel Approach with a Focus on Multispectral Satellite Images

: Soil mapping demands large-scale surveys that are costly and time consuming. It is necessary to identify strategies with reduced costs to obtain detailed information for soil mapping. We aimed to compare multispectral satellite image and relief parameters for the quantiﬁcation and mapping of clay and sand contents. The Temporal Synthetic Spectral (TESS) reﬂectance and Synthetic Soil Image (SYSI) approaches were used to identify and characterize texture spectral signatures at the image level. Soil samples were collected (0–20 cm depth, 919 points) from an area of 14,614 km 2 in Brazil for reference and model calibration. We compared different prediction approaches: (a) TESS and SYSI; (b) Relief-Derived Covariates (RDC); and (c) SYSI plus RDC. The TESS method produced highly similar behavior to the laboratory convolved data. The sandy textural class showed a greater increase in average spectral reﬂectance from Band 1 to 7 compared with the clayey class. The prediction using SYSI produced a better result for clay (R 2 = 0.83; RMSE = 65.0 g kg − 1 ) and sand (R 2 = 0.86; RMSE = 79.9 g kg − 1 ). Multispectral satellite images were more stable for the identiﬁcation of soil properties than relief parameters. These results show that despite the higher efﬁciency expected for hyperspectral sensors, it is not only the number of bands that matter for estimating clay and sand content, but also having bands at key wavelengths related to soil properties.


Introduction
The characteristics of the topsoil layer are important because they provide fundamental information for food production. Soil is composed of physical, chemical, mineralogical, and organic compounds, plus water and air, and these properties have been degraded in many agricultural regions through poor management actions [1]. An important soil property related to physical structure is the particle size distribution, which is divided into three main fractions: clay (<0.002 mm), silt (0.002-0.02 mm), and sand (>0.02 mm). The clay fraction influences several important processes, such Table 1. Literature review concerning prediction of clay content by laboratory spectral data, satellite multispectral data, and relief-derived covariates.
Several hypotheses were formulated to address these questions as follows: (1) A Landsat multispectral sensor can quantify clay and sand contents since it has key bands distributed along VIS-NIR-SWIR; (2) Soil spectral libraries can assist in the interpretation and characterization of satellite information; (3) We also expect that a composite of temporal imagery can achieve the best mapping of spatially continuous bare soil data since it will fill the gaps where soil was bare in one period and covered in another; (4) Soil properties are typically determined by classical tacit knowledge, which makes inferences along landforms or by mathematical procedures and interpolation. Spectra from pixels are a direct source of information, and should draw a better picture of clay and sand distribution than reliefs. One technique that can detect even the barest soil over time periods is the Geospatial Soil Sensing System (GEOS3) [33], which will be evaluated in this study. Thus, the main objective of this study is to evaluate multispectral satellite images for clay and sand content quantification and mapping.

Study Area
The study area is located in the central region of São Paulo state, Brazil, with a total area of 14,614 km 2 ( Figure 1). The climate in the region is classified as humid subtropical (Cwa) according to the Köppen classification, with dry winters and hot summers [34]. The mean annual precipitation is 1480 mm, while the mean annual maximum and minimum temperatures are 28 • C and 15 • C, respectively. The terrain elevation varies between 400 and 1100 m above sea level. The predominant soils are Ferralsols, Lixisols, Arenosols, and Nitosols, with different particle size distributions [35]. The primary land uses are for sugarcane, silviculture, oranges, and pasture. This area was selected because it is subject to intense agricultural usage and great soil variation, expressing the characteristics of a typical tropical region.

Field Sampling and Soil Analysis
Soil samples were established in catenas to represent variations of relief and geology using a centimeter-accuracy (RTK) Trimble GPS unit (Figure 1), and were collected at 0-20 cm depth at a total of 919 locations. All soil samples were ground, dried, and sieved (2 mm), and spectral reflectance measurements were taken using a Spectroradiometer FieldSpec 3 sensor (ASD, Boulder, CO, USA) in the 350 to 2500 nm range (Visible, VIS; Near-infrared, NIR; and Shortwave infrared, SWIR) under laboratory conditions. A white Spectralon plate was used as a reference plate with a 50 W halogen lamp positioned at 35 cm from the platform with a zenith angle of 30 • . Samples were placed on petri dishes and the sensor was positioned vertically at 8 cm from the platform. The soil samples were also analyzed in the laboratory to determine particle size fractions, following the Pipette method [36]. Spectral reflectance information acquired in the laboratory was also convolved to the Landsat TM bands, assuming a Gaussian response function for the full width at half the maximum spectral response of the sensor [37].

A Brief Description of the Geospatial Soil Sensing System (GEOS3)
The GEOS3 was developed by Demattê et al. [33], and is defined as a data mining procedure to process multitemporal Landsat 5 TM Time Series. A user can define a time series containing only cloud-free images from the dry season (to avoid moisture interference) and obtain the same from a database containing legacy data [38]. The method to retrieve topsoil reflectance can be summarized as follows. The system uses a time series of atmospherically corrected Landsat 5 TM images transformed into surface reflectance. All images are masked using several approaches, including NDVI and NBR2, among others, and the pixels that are not soil are extracted. Then, all images containing the remaining pixels with bare soil are superimposed, and the system averages the reflectance to produce the median spectral reflectance, resulting in the Temporal Synthetic Spectral (TESS) reflectance. The composite of all TESS values generates a unique spatially continuous image called the Synthetic Soil Image (SYSI). The SYSI is based on the availability of remote sensing spectral reflectance measurements at different times which are used to achieve spatially continuous maps, rather than modelling, interpolating, or inferencing the soil surface.

Image and Spectral Information
The Landsat data were selected from orbit 220 and path 75, in the period of 1984 to 2011, totaling 151 images. During the time series, most cropping systems were characterized by a period with bare soil. In general, mechanical crop operations are carried out during May and October, matched with the dry season, where the effects of soil moisture can be minimized in the images. In addition, both natural and anthropogenic locations present bare soil, most of which is plowed. The six spectral bands acquired by the sensor are suitable for soil characterization, with measurements in the VIS-NIR-SWIR bands. The atmospherically and geometrically processed higher-level products were acquired from EROS (Earth Resource Observation and Science Center). These products consist of surface reflectance derived through the Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS). The products are based on Radiative Transfer Modeling performed by the 6S algorithm [39].
GEOS3 was used to construct a single image (SYSI) of the study area (Figure 2a), from which 919 topsoil samples were collected and their laboratory spectra measured. The SYSI indicated that 68% of the pixels were bare soil over the entire study area. More specifically, this corresponded to 92% of the agricultural areas. The spectral signature from the laboratory (convolved to Landsat bands) was compared with the respective spectra from the image (TESS). A morphological evaluation of the spectra was carried out between the sources, as indicated by Demattê et al. [40], considering the intensity and shape. Furthermore, models were constructed to quantify clay and sand content based on spectral reflectance information both from the laboratory and from the SYSI [41]. The values from the raster layers at sampling points were extracted using bilinear interpolation, which uses the four nearest neighbors to find a weighted average for the georeferenced point.

Preprocessing and Model Calibration
A Digital Elevation Model (DEM) (Figure 2b) derived from Shuttle Radar Topography Mission (SRTM v.3, 30 m) data was used to derive five ancillary terrain variables (elevation, vertical distance to channel network, valley depth, relative slope position, and channel network base level). These five terrain covariates were chosen to represent the variability of the area ( Table 2). According to Moura-Bueno et al. [42], SRTM can represent maps at scales from 1: 25,000 to 1: 50,000. They were generated using the compound terrain analysis function of SAGA GIS.
RDC values and the reflectance from SYSI were extracted at the same locations as the soil samples using bilinear interpolation. The SYSI provided reflectance in the six original Landsat 5 TM bands (VIS-NIR-SWIR): Band (B) 1 (450-520 nm), B2 (520-600 nm), B3 (630-690 nm), B4 (760-900 nm), B5 (1550-1750 nm), and B7 (2080-2350 nm). Then, the 919 georeferenced samples were divided into textural classes for stratified random sampling as follows (g kg −1 ): very sandy (<100), sandy (100-150), Remote Sens. 2018, 10, 1555 6 of 21 sandy loam (150-250), clayey loam (250-350), clayey (350-600), and very clayey (> 600). Samples were randomly divided into calibration (75%, n = 686) and validation (25%, n = 233) sets. The textural classes are related to the importance of clay and soil management in Brazil, as referenced in Santos et al. [43]. RDC values and the reflectance from SYSI were extracted at the same locations as the soil samples using bilinear interpolation. The SYSI provided reflectance in the six original Landsat 5 TM bands (VIS-NIR-SWIR): Band (B) 1 (450-520 nm), B2 (520-600 nm), B3 (630-690 nm), B4 (760-900 nm), B5 (1550-1750 nm), and B7 (2080-2350 nm). Then, the 919 georeferenced samples were divided into textural classes for stratified random sampling as follows (g kg −1 ): very sandy (<100), sandy (100-150), sandy loam (150-250), clayey loam (250-350), clayey (350-600), and very clayey (> 600). Samples were randomly divided into calibration (75%, n = 686) and validation (25%, n = 233) sets. The textural classes are related to the importance of clay and soil management in Brazil, as referenced in Santos et al. [43].   Clay and sand quantification were also carried out using laboratory spectral measurements. The same dataset was used to convolve to Landsat 5 TM bands. These approaches were used to compare data sources and spectral resolutions. Only the SYSI, RDC, and the two combined were used for the topsoil mapping, as these covariates have their spatial representation. The spatial coordinates were also used as covariates, enabling enhanced spatial stratification by the models. Clay and sand predictive models were tested using the Cubist and Random Forest (RF) algorithms. The Cubist algorithm finds the relationships between soil covariates by calculating decision trees and multiple regressions. It was implemented using the Cubist R package ensemble with 15 to 20 committees trying to find the best fit model [45]. RF constructs an ensemble of multiple decision trees (ntree = 1000 and mtry = default), returning their mean prediction. The RF algorithm was implemented using the randomForest R package [46]. The selected models were considered when the Root-Mean-Squared Error (RMSE) was minimized and stabilized in the calibration. The statistical procedures were carried out in the programming language R [47].

Soil Maps and Validation
Soil clay and sand maps were produced by applying the models using the raster R package [48]. Predictions by Cubist and RF models were made for the pixels containing all the covariates stacked. SYSI had discontinuity gaps (black regions) which generated unmapped areas because they were related to natural forests or other permanent land uses not assessed by GEOS3. The best validated models (lowest RMSE) using the different approaches (SYSI, RDC, and SYSI + RDC) were used to illustrate spatial variations and artifact production between the covariates. Additionally, predictions did not consider possible spatial dependencies for the properties, i.e., using variogram analysis.
The external validation (25% of samples, n = 233) was performed as a comparison between predicted values and results from the reference physical laboratory analysis. The parameters used to evaluate the performance of the models were RMSE, the coefficient of determination (R 2 ), Ratio of Performance to the Interquartile range (RPIQ) [49], and Ratio of Performance to Deviation (RPD). Differences observed in the predicted clay and sand distributions were also described by plotting the model's outputs in the texture triangle implemented through the soiltexture R package [50].

Characterization of Spectral Patterns of Soils from Ground to Space
The characterization of spectral curves from ground to satellite level ( Figure 3) provides the bases for understanding prediction results. The lower intensity for clayey soils is due to energy absorption by iron oxide minerals (hematite) and other opaque minerals (i.e., magnetite and ilmenite). Soils with higher iron oxide content are derived from basalt [51]. When these minerals are absent in the soil, higher quartz content causes a strong increase in reflectance from B3 to B4, with a peak at B5. The spectral curves of clayey samples present low reflectance intensity across the spectrum, with a typical flat shape, which is different than sandy soils with an increasing reflectance shape [52].
The main, albeit slight, difference between convolved spectral measurements and TESS is observed in B1, B5, and B7 ( Figure 3). The decrease of intensity from B5 to B7 being stronger in the TESS information is due to field conditions, since the information came from a pixel [53]. Nevertheless, spectral shapes from the laboratory and image are very similar, which indicates low moisture influence. The resampling of satellite data from laboratory measurements removed the absorption features present in the spectral measurements, preventing the detection of specific features for mineral identification. It can be observed that from B1 to B5, the clayey signature is almost flat, and as soil becomes sandy, it demonstrates a strong increase in B5 (due to quartz). These observations are in agreement with those of Demattê et al. [54] and Franceschini et al. [6]. Moreover, as the laboratory information is measured under controlled conditions, the similarity with TESS reinforces the robust processing of GEOS3. of satellite data from laboratory measurements removed the absorption features present in the spectral measurements, preventing the detection of specific features for mineral identification. It can be observed that from B1 to B5, the clayey signature is almost flat, and as soil becomes sandy, it demonstrates a strong increase in B5 (due to quartz). These observations are in agreement with those of Demattê et al. [54] and Franceschini et al. [6]. Moreover, as the laboratory information is measured under controlled conditions, the similarity with TESS reinforces the robust processing of GEOS3. Principal component (PC) analysis yielded similar transformations between laboratory spectral measurements and TESS data (Figure 4), grouping different particle size distributions along the two principal components, similar to the results of Lacerda et al. [55]. As the particle size distribution changes, samples shift slightly to the right, increasing the dispersion of the second component ( Figure 4a). The same behavior occurred for TESS samples (Figure 4b). The increasing dispersion of the second component expresses the quartz influence in the infrared spectral reflectance. Meanwhile, clayey soils are related to opaque oxides from the left-hand side of the first principal component (Figure 4a,b). In the characterization of the textural classes, there was no difference between the weights for the six bands (see expression in Figure 4) in both spectral measurements resampled to multispectral satellite data. The loadings of the six factors (Bands 1, 2, 3, 4, 5, and 7) for PC1 were similar (approximately 0.40), as can be seen in the expression in Figure 4a,b. This shows that all bands are important for the differentiation of textural classes. Additionally, PC1 presents 92% and 93% of the proportion of variance for resampled and satellite data (Figure 4), respectively. Principal component (PC) analysis yielded similar transformations between laboratory spectral measurements and TESS data (Figure 4), grouping different particle size distributions along the two principal components, similar to the results of Lacerda et al. [55]. As the particle size distribution changes, samples shift slightly to the right, increasing the dispersion of the second component ( Figure 4a). The same behavior occurred for TESS samples (Figure 4b). The increasing dispersion of the second component expresses the quartz influence in the infrared spectral reflectance. Meanwhile, clayey soils are related to opaque oxides from the left-hand side of the first principal component (Figure 4a,b). In the characterization of the textural classes, there was no difference between the weights for the six bands (see expression in Figure 4) in both spectral measurements resampled to multispectral satellite data. The loadings of the six factors (Bands 1, 2, 3, 4, 5, and 7) for PC1 were similar (approximately 0.40), as can be seen in the expression in Figure 4a,b. This shows that all bands are important for the differentiation of textural classes. Additionally, PC1 presents 92% and 93% of the proportion of variance for resampled and satellite data (Figure 4), respectively. Remote Sens. 2018, 10, x FOR PEER REVIEW 9 of 21

Performance of Clay and Sand Content Predictions
Comparing the models, the Cubist algorithm was more accurate than RF for clay and sand content prediction (Table 3). The decrease in error (RMSE) using Cubist was approximately 20 g kg −1 when using the laboratory spectral reference models, regardless of particle size fraction, while for the SYSI and the other approaches, the decrease in RMSE was between 2 and 10 g kg −1 (Table 3). Although it was not our intention to compare the accuracy of Cubist with RF as both are similar machine learning approaches with superior performance [56,57], the Cubist algorithm was chosen as the best global predictor (RMSE minimization) to be used as the standard algorithm for clay and sand mapping over the study area. Predictions were better for clay than for sand content (Table 3). The sand RMSE was over 100 g kg −1 in some approaches, but was reasonable for clay, although it had been expected that sand estimates would be better than clay estimates due to the higher representativeness of this fraction in the area. The best clay prediction was achieved by applying Cubist with the laboratory spectral reference data (R 2 = 0.86, RMSE = 59.02 g kg −1 , RPD = 2.73, RPIQ = 1.97), which provided a better representation of soil variability. The use of laboratory spectral reference data to estimate clay and sand content was applied to compare the performance of SYSI. Indeed, these results indicated similar performance, despite the limited resolution of the multispectral data. The use of laboratory spectral information has been previously demonstrated to be a good alternative for soil characterization and the prediction of soil properties [58].
A fair comparison between the approaches was made using the laboratory spectral resampled data (Table 3). The laboratory convolved reflectance produced similar results to SYSI and SYSI plus RDC. The best accuracies were achieved using Cubist with only the SYSI bands as predictors (Table  3, RMSE = 65.01 g kg −1 clay and RMSE = 79.99 g kg −1 sand). The accuracies were even close to the laboratory spectral data for estimating sand (Table 3, RMSE = 79.29 g kg −1 ). The results from applying the combination of SYSI with RDC did not differ from those from SYSI itself (R 2 of 0.83 and 0.83, respectively, for clay; R 2 of 0.85 and 0.86, respectively, for sand). The prediction using only the RDC had the lowest R 2 , RPD, and RPIQ (Table 3)

Performance of Clay and Sand Content Predictions
Comparing the models, the Cubist algorithm was more accurate than RF for clay and sand content prediction (Table 3). The decrease in error (RMSE) using Cubist was approximately 20 g kg −1 when using the laboratory spectral reference models, regardless of particle size fraction, while for the SYSI and the other approaches, the decrease in RMSE was between 2 and 10 g kg −1 (Table 3). Although it was not our intention to compare the accuracy of Cubist with RF as both are similar machine learning approaches with superior performance [56,57], the Cubist algorithm was chosen as the best global predictor (RMSE minimization) to be used as the standard algorithm for clay and sand mapping over the study area. Predictions were better for clay than for sand content (Table 3). The sand RMSE was over 100 g kg −1 in some approaches, but was reasonable for clay, although it had been expected that sand estimates would be better than clay estimates due to the higher representativeness of this fraction in the area. The best clay prediction was achieved by applying Cubist with the laboratory spectral reference data (R 2 = 0.86, RMSE = 59.02 g kg −1 , RPD = 2.73, RPIQ = 1.97), which provided a better representation of soil variability. The use of laboratory spectral reference data to estimate clay and sand content was applied to compare the performance of SYSI. Indeed, these results indicated similar performance, despite the limited resolution of the multispectral data. The use of laboratory spectral information has been previously demonstrated to be a good alternative for soil characterization and the prediction of soil properties [58].
A fair comparison between the approaches was made using the laboratory spectral resampled data ( Table 3). The laboratory convolved reflectance produced similar results to SYSI and SYSI plus RDC. The best accuracies were achieved using Cubist with only the SYSI bands as predictors (Table 3, RMSE = 65.01 g kg −1 clay and RMSE = 79.99 g kg −1 sand). The accuracies were even close to the laboratory spectral data for estimating sand (Table 3, RMSE = 79.29 g kg −1 ). The results from applying the combination of SYSI with RDC did not differ from those from SYSI itself (R 2 of 0.83 and 0.83, respectively, for clay; R 2 of 0.85 and 0.86, respectively, for sand). The prediction using only the RDC had the lowest R 2 , RPD, and RPIQ (Table 3). Predicting clay using only terrain covariates resulted in R 2 values of 0.61 and 0.64 with the highest RMSE values of 97.73 and 93.44 g kg −1 for RF and Cubist models, respectively. Moreover, the prediction of sand was slightly superior, reaching R 2 values of 0.63 and 0.65, and RMSE values of 128.52 and 125.69 g kg −1 . Henderson et al. [21] reported similar results in clay quantification using covariates derived from digital elevation models. The authors obtained an R 2 of 0.44 and Stępień et al. [59] found an R 2 of 0.2 using elevation. Landrum et al. [60] found a strong relationship between relief and clay content for soils in the northeast region of Brazil. The influence of topography on soil properties is widely discussed in Florinsky et al. [61]. In fact, the parameters of relief provide a moderate understanding of soil-landscape relationships, while the satellite images added detailed information representing the study area. Ben-Dor et al. [62] conducted an extensive literature review on the importance of satellite imagery to evaluating and characterizing soils. Nanni et al. [63] found a close relationship between satellite and field reference data, highlighting the importance of spectral reflectance for quantifying soil properties.
Several approaches have been used to map topsoil clay, such as using the vegetation index [64], airborne hyperspectral soil data of single images [65], terrain attributes, geology-based stratification [30], and electromagnetic induction techniques [66], among others. Additionally, the clay fraction has also been quantified using multispectral satellite sensors and laboratory spectral sensors (Table 1). These results can be discussed with our findings. Ahmed and Iqbal [67] found a good relationship between Bands 4 and 6 from Landsat 5 TM for clay estimation (R 2 = 0.509 and RMSE = 5.02%), but lower than the result found in Table 3 (R 2 = 0.83, RMSE = 65.01 g kg −1 ). The prediction of clay in the laboratory usually produces better results than other sensors (satellite), in accordance with the study by Ahmed and Iqbal [67], who found an R 2 reaching a maximum of 0.90. Their result, however, is not far from our findings (R 2 = 0.86 and RMSE = 59.02 g kg −1 ). Laboratory spectral data have a strong relationship with soil constituents due to improved experimental conditions [58,68,69], but several bands are used in the models. Despite this, the present study indicates that only a few bands would be sufficient to quantify clay and sand, which confirms the importance of this tool for mapping large areas using images [62].
At satellite sensors (i.e., at approximately 800 km distance from the targets), R 2 performance decreases [70]. Indeed, Coleman et al. [7] started with intermediate performances, reaching an R 2 of 0.4 for clay mapping using Landsat. Khalil et al. [71] reached an R 2 of 0.4 for the same attribute using a single image. Furthermore, the study by Shabou et al. [24] was based on 100 soil samples and a dataset of four Landsat TM scenes, reporting an R 2 of 0.65 and an RMSE of 100 g kg −1 for clay. These accuracies were lower than those found in the present study (R 2 = 0.83 and RMSE = 65.01 g kg −1 ) because here we applied 919 soil samples to 151 Landsat images. Our results can be directly related to the spectral morphology evaluation (Figure 3), which confirms that moisture from field measurements did not significantly interfere in the dataset. As stated by Ackerson et al. [72], when soil samples have differences in moisture, they do not have a linear relationship with the property to be estimated and, thus, there is no correlation. We could only depict these details because of the descriptive evaluation of the spectra, which assists in deciding whether the pixel is from soils as a clinical evaluation and not just a mathematical artifact.
In general, we have observed in previous studies that the R 2 values for clay prediction, using all kinds of sensors, range from approximately 0.2 to 0.9. The spectral sensors with fewer bands, like IKONOS, have lower results. It is evident that the scientific community emphasizes the use of hyperspectral data for soil mapping using, for instance, the airborne hyperspectral imaging spectrometer (HyperSpecTIR) [73] and the airborne hyperspectral sensor 160 (AHS) [74]. However, it is also evident that the use of multispectral satellite data can achieve good performance similar to that of laboratory spectral data [65,75]. These results show that despite the higher efficiency expected for hyperspectral sensors, it is not only the number of bands that matter for estimating clay and sand content, but also having bands at key wavelengths related to soil properties.

Variable Importance for Mapping Soil Attributes
Using the Cubist model and only the SYSI, B7 was found to be the most important covariate in the prediction of clay and sand (Figure 5a,b). Bands 5, 4, and 3 were also highly important to both attribute models. Bands 2 and 1 along with X and Y coordinates provided a slight contribution to the prediction of both soil attributes. Bands of the visible spectral region participated more in sand estimation as this soil particle is strongly related to the high albedo of quartz, while clay content is related to energy absorption by its minerals.
The high dependence of spatial coordinates showed contrasting results for the RDC covariates. Some parameters of RDC showed high participation in clay and sand estimation (Figure 5c,d). However, for clay quantification, the highest participation for conditional rules was from the Y (89%) and X coordinates (41%), indicating strong control from unknown factors, possibly geology. In the regression models of Cubist trees, the variable with the highest contribution was the DEM with 89% inclusion, followed by X coordinate, CNBL, valley depth, Y coordinate, and VDTCN. The contributions for sand models were similar (Figure 5d). The difference was that the relative slope had higher participation in sand models than in clay models.
For the use of the SYSI plus RDC (Figure 5e,f), B7 was the most important covariate for the Cubist conditional rules for both clay and sand models. Subsequently, for all other conditional variables, a low contribution was observed. For the regression part of Cubist, B7 was the most used covariate (>90%), followed by B5, with 89% and 87% for clay and sand modelling, respectively. These SWIR bands (B5 and B7) are still most frequently employed in both attribute estimates, indicating the strong role of this spectral region in the prediction of clay and sand attributes. Although the spectral resolution of Landsat 5 is coarse, the SWIR spectral region contains a relative contribution from the clay mineral absorption feature, which may explain why these bands were important for the attribute estimation [76].

Clay and Sand Content Maps
In both datasets, most soil samples present high sand content and low silt (Figure 6a). Clay distribution is highly concentrated between 100 and 300 g kg −1 . Regarding all samples, 37% were classified as very sandy and sandy, 43% were loamy in texture, and 20% were classified as clayey. The variability of the particle size content is due to geological influences, as the soils of the region are formed by different parent materials, from igneous (basalt) to sedimentary rocks (sandstones and other mixes). The prediction of soil particle size distribution had different performances, although

Clay and Sand Content Maps
In both datasets, most soil samples present high sand content and low silt (Figure 6a). Clay distribution is highly concentrated between 100 and 300 g kg −1 . Regarding all samples, 37% were classified as very sandy and sandy, 43% were loamy in texture, and 20% were classified as clayey. The variability of the particle size content is due to geological influences, as the soils of the region are formed by different parent materials, from igneous (basalt) to sedimentary rocks (sandstones and other mixes). The prediction of soil particle size distribution had different performances, although the samples were uniformly represented in the validation dataset (Figure 6a). Comparing the predictions, all the strategies had a slight tendency to underestimate the clayey samples (Figure 6b,c), except for the use of SYSI plus RDC, which improved representativeness of clayey-textured soils (Figure 6d).
Remote Sens. 2018, 10, x FOR PEER REVIEW 13 of 21 the samples were uniformly represented in the validation dataset (Figure 6a). Comparing the predictions, all the strategies had a slight tendency to underestimate the clayey samples (Figure 6b,c), except for the use of SYSI plus RDC, which improved representativeness of clayey-textured soils (Figure 6d). The clay and sand content maps were produced using the Cubist model. This model performed better than did the RF based on the validation parameters (Table 3). It is important to note that the maps were produced using global predictors instead of spatial prediction functions, which did not consider spatial dependence for the estimations. Because the Cubist algorithm combines the use of decision rules and multiple linear regression to calibrate a model, only the maps that used SYSI had better visual continuity. Spatial continuity is the ability to represent surface variations at finer scales (Figure 7a,b,e,f). The maps with only RDC (Figure 7c,d) provided satisfactory performance, albeit with poor visual continuity due to the presence of large artifacts created by Cubist computation [77]. The clay and sand content maps were produced using the Cubist model. This model performed better than did the RF based on the validation parameters (Table 3). It is important to note that the maps were produced using global predictors instead of spatial prediction functions, which did not consider spatial dependence for the estimations. Because the Cubist algorithm combines the use of decision rules and multiple linear regression to calibrate a model, only the maps that used SYSI had better visual continuity. Spatial continuity is the ability to represent surface variations at finer scales (Figure 7a,b,e,f). The maps with only RDC (Figure 7c,d) provided satisfactory performance, albeit with poor visual continuity due to the presence of large artifacts created by Cubist computation [77]. Terrain inference has usually been used to map soils because most of the surface is covered by vegetation and DEMs are globally available to users. However, the use of relief landforms is insufficient when it comes to explaining variations in soils along the landscape. Thus, spectral features related to mineralogical characteristics of soils may improve predictions in digital soil mapping. Reflectance factors derived from the SYSI and laboratory data show high linear relationships with clay and sand content for the full VIS-NIR-SWIR spectral bands (Figure 8). Spectral Terrain inference has usually been used to map soils because most of the surface is covered by vegetation and DEMs are globally available to users. However, the use of relief landforms is insufficient when it comes to explaining variations in soils along the landscape. Thus, spectral features related to mineralogical characteristics of soils may improve predictions in digital soil mapping. Reflectance factors derived from the SYSI and laboratory data show high linear relationships with clay and sand content for the full VIS-NIR-SWIR spectral bands (Figure 8). Spectral reflectance reached correlations of at least 0.45 and 0.57 with sand and clay content, respectively, with a maximum of 0.78 for clay. The RDC covariates had poor to intermediate associations. Nevertheless, the integration of RDCs with other kinds of data has been demonstrated to be a useful strategy for soil mapping [78]. It is unusual that satellite data achieve similar results to laboratory results. Considering our results, the use of the SYSI shows great potential for soil mapping, and the spatial trends from the images can increase the prediction accuracy of unmapped regions.  [78]. It is unusual that satellite data achieve similar results to laboratory results. Considering our results, the use of the SYSI shows great potential for soil mapping, and the spatial trends from the images can increase the prediction accuracy of unmapped regions. It is worth mentioning that the predicted maps using SYSI have consistency in the field. Figure  9a shows a study case illustrating the variation of clay within a relief landform, which could not be estimated using only the relief parameters. The laboratory ( Figure 9b) and SYSI (Figure 9c) spectral patterns confirm that the two locations have different soils, with 160 and 650 g kg −1 of clay estimated from the physical laboratory analysis. The illustration shows an example in which the relief would not be able to detect this variation when the region has the same relief parameters (elevation, slope, and others). In this study we provided a system to process multitemporal and multispectral satellite images to aid soil mapping. However, using satellite platforms (e.g., using Landsat) is a challenging task, as the sensor is located 800 km away from the surface with coarse spatial resolution (where spectral reflectance mixtures represent different materials within the pixel) and limited spectral resolution. Additionally, several factors can impact soil reflectance, such as atmospheric attenuation, ground surface roughness, presence of topographic shade, rocks, tree stumps, crops, and crop stubble or debris.
It is important to state that the key points of GEOS3 are based on image selection, the evaluation of the spectral signature, and validation indicators. The foremost contribution of SYSI to digital soil mapping is based on the empirical quantitative description of soil formation factors in the scorpan model [10]. As we have shown, SYSI is innovative and can play an important role in predicting topsoil attributes. Thus, SYSI can be considered as a new source of s for scorpan for predicting soil classes and/or attributes. In fact, McBratney et al. [10] pointed out that the s factor is expected to increase in importance for soil predictions as technology enhances. It is worth mentioning that the predicted maps using SYSI have consistency in the field. Figure 9a shows a study case illustrating the variation of clay within a relief landform, which could not be estimated using only the relief parameters. The laboratory ( Figure 9b) and SYSI (Figure 9c) spectral patterns confirm that the two locations have different soils, with 160 and 650 g kg −1 of clay estimated from the physical laboratory analysis. The illustration shows an example in which the relief would not be able to detect this variation when the region has the same relief parameters (elevation, slope, and others). In this study we provided a system to process multitemporal and multispectral satellite images to aid soil mapping. However, using satellite platforms (e.g., using Landsat) is a challenging task, as the sensor is located 800 km away from the surface with coarse spatial resolution (where spectral reflectance mixtures represent different materials within the pixel) and limited spectral resolution. Additionally, several factors can impact soil reflectance, such as atmospheric attenuation, ground surface roughness, presence of topographic shade, rocks, tree stumps, crops, and crop stubble or debris.
It is important to state that the key points of GEOS3 are based on image selection, the evaluation of the spectral signature, and validation indicators. The foremost contribution of SYSI to digital soil mapping is based on the empirical quantitative description of soil formation factors in the scorpan model [10]. As we have shown, SYSI is innovative and can play an important role in predicting topsoil attributes. Thus, SYSI can be considered as a new source of s for scorpan for predicting soil classes and/or attributes. In fact, McBratney et al. [10] pointed out that the s factor is expected to increase in importance for soil predictions as technology enhances.

Conclusions
The spectral signature obtained from the Synthetic Soil Image (SYSI) presented very similar patterns to the equivalent spectral signature of the laboratory data. From clayey to sandy texture, the

Conclusions
The spectral signature obtained from the Synthetic Soil Image (SYSI) presented very similar patterns to the equivalent spectral signature of the laboratory data. From clayey to sandy texture, the spectral signature moved from flat low reflectance to a strong increase after Band 4, respectively. The quantification of clay and sand contents using Temporal Synthetic Spectral Reflectance (TESS) well matched the respective georeferenced location of the laboratory spectral samples. This supported the robustness of the TESS collected along the time series.
The topsoil prediction of clay and sand fractions using the SYSI presented higher-accuracy results than using Relief-Derived Covariates (RDC) alone. The attribute maps originating from the RDC approach presented spatial artifacts, despite their significance for the soil-landscape relationship. The integrated approach of the SYSI + RDC also produced accurate predictions, albeit ones that were similar to the SYSI alone. Relief is important, but it uses inferential, empirical, or mathematical approaches to estimate a soil attribute, while the SYSI is based on the availability of remote sensing spectral reflectance measurements at different times which are then used to achieve spatially continuous results rather than modeling, interpolating, or inferencing its reflectance. Furthermore, the SYSI performance was quite similar to the laboratory reference modelling results, demonstrating its strong capability for soil mapping.
TESS has provided novel information because it gives the user the capacity to evaluate the morphology of the object's spectral signature, as information from specific points, to make an adequate decision. In addition, TESS provides a first impression of soil texture through spectral signature evaluation and SYSI supports this with the precise variation of the soil characteristics along the landscape.
The paradigm asserted by many users that soils cannot be evaluated from images because they are usually covered by vegetation is partially true. The GEOS3 method depicts images from different periods and could fill gaps that were covered with vegetation in other periods with bare soil, thus producing continuous spatial attribute maps. In forestry areas, this technique will certainly have more limitations and its usefulness will be situationally dependent.
Looking towards the future, the technique can be applied to optimizing other digital mapping projects of soils all over the world. The results are promising because they may assist in mapping large areas with bare soils occurring due to agriculture and/or anthropogenic action, degradation, and climate processes.