Estimation of Soil Organic Carbon Contents in Croplands of Bavaria from SCMaP Soil Reﬂectance Composites

: For food security issues or global climate change, there is a growing need for large-scale knowledge of soil organic carbon (SOC) contents in agricultural soils. To capture and quantify SOC contents at a ﬁeld scale, Earth Observation (EO) can be a valuable data source for area-wide mapping. The extraction of exposed soils from EO data is challenging due to temporal or permanent vegetation cover, the inﬂuence of soil moisture or the condition of the soil surface. Compositing techniques of multitemporal satellite images provide an alternative to retrieve exposed soils and to produce a data source. The repeatable soil composites, containing averaged exposed soil areas over several years, are relatively independent from seasonal soil moisture and surface conditions and provide a new EO-based data source that can be used to estimate SOC contents over large geographical areas with a high spatial resolution. Here, we applied the Soil Composite Mapping Processor (SCMaP) to the Landsat archive between 1984 and 2014 of images covering Bavaria, Germany. Compared to existing SOC modeling approaches based on single scenes, the 30-year SCMaP soil reﬂectance composite (SRC) with a spatial resolution of 30 m is used. The SRC spectral information is correlated with point soil data using different machine learning algorithms to estimate the SOC contents in cropland topsoils of Bavaria. We developed a pre-processing technique to address the issue of combining point information with EO pixels for the purpose of modeling. We applied different modeling methods often used in EO soil studies to choose the best SOC prediction model. Based on the model accuracies and performances, the Random Forest (RF) showed the best capabilities to predict the SOC contents in Bavaria (R 2 = 0.67, RMSE = 1.24%, RPD = 1.77, CCC = 0.78). We further validated the model results with an independent dataset. The comparison between the measured and predicted SOC contents showed a mean difference of 0.11% SOC using the best RF model. The SCMaP SRC is a promising approach to predict the spatial SOC distribution over large geographical extents with a high spatial resolution (30 m).


Introduction
Precise knowledge about the distribution of soil organic carbon (SOC) contents in agricultural soils is a valuable information for, e.g., food security issues [1] or global climate change [2]. The organic carbon stocks in soils represent one of the largest reservoirs in the global carbon cycle [3,4] and are affected by various drivers [5]. Soils with sufficiently high [6,7] and balanced SOC contents are considered healthy soils [8,9] and are less prone to impacts of climate change [7]. Adequate land management is necessary to preserve soil health and soil quality [10] and enables an increase of agroecosystem resiliency [11].
To capture and quantify SOC contents in agricultural soils for efficient and sustainable land use, data with high spatial resolution is needed in order to understand the impacts of climate change on soil quality [12]. High-resolution surveys at the national to regional scale are urgently required, as detailed spatial patterns in SOC are an important aspect for land management at the farm or even the field scale [13]. For applications with a large geographical extent [14] (national to European-wide), SOC maps are mostly available with a spatial resolution of 250 m to 1 km. The European Soil Data Center (ESDAC) provides several pan-European SOC maps. However, both well-known maps OCTOP: Topsoil Organic Carbon Contents for Europe [15] and the Topsoil Soil Organic Carbon Map based on the LUCAS (Land Use/Cover Area frame statistical Survey) soil datasets for EU25 [16] are distributed in a coarse 1 km raster format. The maps have, therefore, a limited suitability as a basis for high-resolution analysis at the farm or even the field scale.
Earth Observation (EO) can be a valuable data-source for area-wide mapping with a resolution that allows distinguishing between or even with field patterns [17]. In this context, hyperspectral (e.g., [18][19][20][21]) and multispectral images (e.g., [22][23][24]) are commonly used EO datasets to derive SOC contents. In Table 1, studies for different European regions are compared regarding their capabilities of SOC modeling. Generally, point soil information was correlated with multi-or hyperspectral pixel values using different machine learning (ML) techniques to derive SOC contents. However, in most studies, the estimation of SOC was restricted to relatively small areas (0.09 to 10.000 km 2 ) in which the soil conditions (bare, smooth and dry soils) were considered to be optimal. This optimization prevented applying the models to cover larger geographical areas, except for a couple of previous efforts [25,26]. Additionally, the use of hyperspectral and multispectral remote sensing data for the estimation of SOC contents and other soil variables is hampered by the need for data that provide bare soil conditions. Mapping of exposed soils and the estimation of soil parameters is challenging due to temporal or permanent vegetation cover [27]. The area of exposed soils on a single remote sensing scene is limited, and often, the periods in which exposed soils dominate are restricted to short time windows [28] when the soil is in seedbed condition. Compositing techniques of multi-temporal satellite image archives provide an alternative and are widely used in the literature [25][26][27][28][29][30][31][32][33][34]. The compositing approach allows combining all bare soils of all input scenes, which enables a joint estimation of soil parameters for all exposed soils in the observed time period. For several years, new compositing techniques were developed in the course of opening the Landsat archive [35] that can retrieve exposed soils from multi-temporal satellite image archives [26,28,36,37]. An averaging of exposed soil areas over several years allows producing a new and spatially enhanced data source for soil analyses. Here, the soil spectra are relatively independent from seasonal differences in soil moisture and other soil surface conditions occurring during rain events or longer drought periods. In the resulting new data source, only permanent spatial soil moisture differences such as for the different soil types and texture characteristics remain. However, an in-depth proof of this assumption has not yet been provided.
The operational Soil Composite Mapping Processor (SCMaP) is a multitemporal compositing approach [36], which enables an automated generation of area-wide soil reflectance composites (SRC) for the estimation of soil parameters using all available multispectral reflectance images for a defined period. So far, SCMaP SRC has not been used as an EO database for the SOC modeling of exposed topsoils in croplands of large geographical extents. Therefore, in this study, the SOC modeling capabilities of the SCMaP SRC are investigated and performed for a large portion of the German federal state of Bavaria and adjacent areas (about 130,000 km 2 ) as solid calibration and validation datasets are available.   Generally, linking point data with EO images (30 m, 20 m pixel resolution) can be considered as a potential source of inaccuracies for soil parameter modeling as not all samples are collected at least 30 m from the field border the sample is related to [38]. In this case, the EO pixel may reflect the signal from adjacent fields with different spectral information, which is related to a single soil sample. New approaches are necessary that can handle the misalignment of the soil database and the spectral pixel information from the EO images for SOC modeling.
The overall purpose is to test the potential of the SCMaP SRC database derived from Landsat images covering 30 years to derive a high-resolution map of SOC contents in Bavarian croplands. For this purpose, the SRC is correlated with point soil measurements to derive spatial SOC contents for an area-wide mapping approach. The objectives of the study are: Develop a spatial/spectral filtering technique to prepare the point dataset of the Bavarian test site for modeling purpose using the novel SCMaP SRC.

2.
Apply the 30-year SCMaP SRC to derive SOC contents in Bavaria using different machine learning algorithms.

3.
Validate the SOC map using an additional independent external dataset not included in the model calibration and validation.

Study Area
The study area covers most of the Federal State of Bavaria ( Figure 1) and adjacent regions in southeast Germany and was selected regarding the diversity of landscape and soil types. The area south of 48 • N was excluded as permanent grassland is the dominating land use in this region, and SCMaP is not able to detect soils covered by permanent vegetation. Moreover, mountainous regions of the Alps in southern Bavaria were also excluded. Due to the frequent cloud coverage in this region, only a small number of cloudless scenes per pixel were available for the compositing process compared to other parts of Bavaria.
Generally, linking point data with EO images (30 m, 20 m pixel resolution) can b considered as a potential source of inaccuracies for soil parameter modeling as not a samples are collected at least 30 m from the field border the sample is related to [38]. I this case, the EO pixel may reflect the signal from adjacent fields with different spectra information, which is related to a single soil sample. New approaches are necessary tha can handle the misalignment of the soil database and the spectral pixel information from the EO images for SOC modeling.
The overall purpose is to test the potential of the SCMaP SRC database derived from Landsat images covering 30 years to derive a high-resolution map of SOC contents i Bavarian croplands. For this purpose, the SRC is correlated with point soil measurement to derive spatial SOC contents for an area-wide mapping approach. The objectives of th study are: 1. Develop a spatial/spectral filtering technique to prepare the point dataset of th Bavarian test site for modeling purpose using the novel SCMaP SRC.
2. Apply the 30-year SCMaP SRC to derive SOC contents in Bavaria using differen machine learning algorithms.
3. Validate the SOC map using an additional independent external dataset not include in the model calibration and validation.

Study Area
The study area covers most of the Federal State of Bavaria ( Figure 1) and adjacen regions in southeast Germany and was selected regarding the diversity of landscape an soil types. The area south of 48° N was excluded as permanent grassland is the dominatin land use in this region, and SCMaP is not able to detect soils covered by permanen vegetation. Moreover, mountainous regions of the Alps in southern Bavaria were als excluded. Due to the frequent cloud coverage in this region, only a small number o cloudless scenes per pixel were available for the compositing process compared to othe parts of Bavaria.  The study area comprises about 130,000 km 2 , in which the elevation ranges between 100 m and 1000 m above sea level. The mean annual temperature lies between 6 • C and 10 • C, and the precipitation is between 551 mm and 1800 mm. The region is mainly dominated by Cambisols, Luvisols, Stagnosols, Gleysols and Leptosols [49] according to the World Reference Base for Soil Resource ( [50]).

Soil Organic Carbon Modeling
An overview of the SOC modeling approach is outlined in Figure 2. Landsat 4-7 collection data from 1984 to 2014 are used to build the SRC based on the SCMaP workflow (Section 2.3).
To calibrate an SOC model, SRC reflectance values and spectral indices (Section 2.3) are regressed against topsoil SOC measurements provided by two local authorities and the European LUCAS survey (Section 2.6). dominated by Cambisols, Luvisols, Stagnosols, Gleysols and Leptosols [49] according to the World Reference Base for Soil Resource ( [50]).

Soil Organic Carbon Modeling
An overview of the SOC modeling approach is outlined in Figure 2. Landsat 4-7 collection data from 1984 to 2014 are used to build the SRC based on the SCMaP workflow (Section 2.3). To calibrate an SOC model, SRC reflectance values and spectral indices (Section 2.3) are regressed against topsoil SOC measurements provided by two local authorities and the European LUCAS survey (Section 2.6).
Due to the positioning of several measurements at field borders, and thus, the potential integration of disturbances, a new filtering technique was developed and applied to evaluate the quality of calibration samples (Section 2.4). For the regression, three machine learning algorithms were tested and evaluated (Section 2.5). The models were trained for different datasets combining the reflectances and additional spectral indices.
Per algorithm, the best model is chosen, and the SOC contents are predicted for the entire study area. The prediction results are validated against external, independent SOC point measurements not included in the model calibration by a spatial correlation analysis (Section 2.7).

SCMaP SRC and Spectral Indices
The SCMaP chain [36] allows the generation of soil reflectance composites for individually determined time periods of different years. Bare soil pixels are selected based on a modified vegetation index (PV) using two thresholds that allow separating predominantly undisturbed soils from all other land cover types such as permanent vegetation and permanent non-vegetation. The derivation of the thresholds is based on Due to the positioning of several measurements at field borders, and thus, the potential integration of disturbances, a new filtering technique was developed and applied to evaluate the quality of calibration samples (Section 2.4). For the regression, three machine learning algorithms were tested and evaluated (Section 2.5). The models were trained for different datasets combining the reflectances and additional spectral indices.
Per algorithm, the best model is chosen, and the SOC contents are predicted for the entire study area. The prediction results are validated against external, independent SOC point measurements not included in the model calibration by a spatial correlation analysis (Section 2.7).

SCMaP SRC and Spectral Indices
The SCMaP chain [36] allows the generation of soil reflectance composites for individually determined time periods of different years. Bare soil pixels are selected based on a modified vegetation index (PV) using two thresholds that allow separating predominantly undisturbed soils from all other land cover types such as permanent vegetation and permanent non-vegetation. The derivation of the thresholds is based on an automated technique described in [51]. All selected bare soil pixels are averaged. The operational SCMaP chain can be used to build SRCs containing all pixels in a given time period showing at least once exposed soil.
For the SOC modeling, a period of 30 years (1984-2014) was chosen to provide a smooth spectral database that averages the seasonal variabilities of bare soils. The 30-year period was chosen for several reasons. A high possible soil coverage should be achieved. Using 5-year composites, 31.79% to 34.17% of the entire investigation area are selected as bare soil pixels. Ten-year composites provide 37.54% to 41.21% as exposed soils, and the 30-year composite enables the analysis of 54.53% of the investigation area as uncovered soils. Additionally, a possible large number of soil samples was intended to be used in the modeling dataset. A reduction of the compositing period would have significantly reduced the number of soil samples. For 5-year periods, 112 to 397 soil samples were collected in the respective periods. Based on 10-year composites, 261 to 536 samples are available. For a 30-year compositing range, 1,250 samples can be used for the modeling dataset. Additionally, an average over multiple years enables a reduction of seasonal soil moisture differences and permanent spatial differences remaining in the composite.
The SRC was processed for 228 Landsat-4 TM, 9,990 Landsat-5 ETM and 4,333 Landsat-7 ETM+ collection scenes [52] available between 1984 and 2014. For all scenes of all sensors, the same pre-processing steps were performed. The FMask algorithm [53,54] was used to detect and remove clouds, cloud shadows and pixels that were covered by snow. Additionally, an atmospheric correction was applied to all scenes using Atmospheric Topographic Correction (ATCOR) software for satellite imagery [55]. The quality of the composites is defined, among other factors, by the number of cloudless scenes per pixel [50]. The consistently large number of cloudless scenes per pixel for the total investigation time is given in Figure A1 in Appendix A.
In addition to the point spectral information of the SCMaP SRC, different indices were selected and computed ( Table 2). Indices are commonly used in remote sensing to parameterize specific spectral features caused by physical and/or chemical properties [56]. Besides established indices, an additional index (SCMaPI) was developed to capture the difference between the green and the SWIR I bands of the SCMaP SRC. The SCMaPI shows smaller differences for high SOC content and higher differences for lower SOC content.

Spectral/Spatial Filtering Technique
Based on Landsat imagery, SCMaP provides soil reflectance information with a pixel resolution of 30 m. The link of a point soil sample to a 30 m pixel can result in inaccuracies if the soil sample is not collected at least 30 m from the field border. In this case, the SCMaP pixel may combine multiple surfaces with different spectral information, which are related to one soil sample. A proportion of the soil samples (especially the LUCAS points, [38]) are often taken within a few meters from the borders of agricultural fields, e.g., as shown by the photo documentation of the sampling points at the LUCAS viewer online (https://ec. europa.eu/eurostat/statistical-atlas/gis/viewer/?config=LUCAS-2009.json, accessed on 6 August 2021). The disturbance factors primarily exist at the field boundaries. Eliminating all samples that were collected within 30 m of the field border could decrease the number of biased pixels. However, this would drastically decrease the database and was therefore not considered. Instead, a spectral/spatial filtering technique was developed to prepare the soil database and the spectral information from the EO images for SOC modeling.
The filtering technique evaluates the spectral differences between the sample SRC pixel and its eight neighboring pixels. A comparison of the sample spectra to the neighboring pixel spectra allows an estimation if the reflectance spectra of the sample pixel are influenced by any external disturbances or data artefacts (e.g., mixed spectra of soil and a small portion of vegetation, local variation) or if they are comparable to the surrounding spectra. The spectral/spatial filtering aims to detect pixel clusters with deviating spectra to remove this from further processing. For this purpose, all STDs per pixel cluster per band were used to define a threshold to exclude the deviating pixel clusters. Twice, the STD per band of all pixel cluster STDs was selected as the threshold. The threshold was determined and applied per band. The identified pixel cluster containing at least one to several spectral bands above the thresholds was excluded from the dataset.
The calibration dataset was randomly split into a training (70%) and test (30%) subset. The training set was used to train the model, whereas the test subset of the calibration data was used to validate the model. For the model calibration and validation, common accuracy and performance measures, such as the R 2 (coefficient of determination, from sklearn.metrics), the root mean square error (RMSE) and the ratio of performance to deviation (RPD), were used to evaluate the model performances and to allow a comparison with the literature ( Table 1). The RPD is an established performance measure that determines the quality of a model [81]. Moreover, the commonly used accuracy and performance measure, the Concordance Correlation Coefficient (CCC) [82], is given to assess the agreement between the predicted and measured SOC contents. Additionally, ten-fold cross-validation (cv) was performed to evaluate the performance of the models. The cv was applied to the training subset of the calibration data. In addition to the established accuracy and performance measures, an analysis of the standardized residuals and the autocorrelation of the residuals for the model calibration samples is given in Figures A2 and A3 in the Appendix A.
Besides the reflectances, additional spectral indices ( Table 2) per spectrally/spatially filtered sampling cluster were calculated and implemented in the modeling framework to investigate the influence of further spectral details. For this purpose, for each algorithm, three different model setups were prepared to estimate the influence of the spectral indices on the modeling capabilities. The models were trained based on (a) the composite reflectances (R), (b) the composite reflectances and all indices (RI_all) and (c) the composite reflectances and for each algorithm individually selected indices (RI_sel). Besides the reflectances and indices, no other covariates (e.g., clay content of climate variables) were used in the modeling framework.
For each algorithm, a selection of important features (RI_sel) was performed. The identification of the relevant features for the MLR was based on a linear correlation (Pearson's correlation from Python sklearn.metrics). First, the relationship between the reflectances and indices to the modeling variable SOC was evaluated to exclude insignificant features (correlation coefficients (R) > 0.3). All significant feature pairs with correlation coefficients between −0.7 and 0.7 were then selected for the RI_sel dataset. For the PLSR, the variable importance in projection (VIP) per feature was calculated. Features with a VIP higher than 1.0 [22,83] were selected for the RI_sel dataset. For the RF, a calculation of the internal feature importance score (Mean Decrease Impurity (MDI)) was performed. Features with a score higher than 4.0% were selected as relevant features [84,85].
For each algorithm, the best model setup was selected regarding the cross-validation results and the model validation accuracies. For the best performance dataset (RI, RI_all, RI_sel), the models were applied to the 30-year SRC to predict the spatial SOC contents for the entire study area.

Soil Samples
For point SOC measurements, all available legacy data to cover the highest possible temporal and spatial overlap with the SRC between 1984 and 2014 were used. Kühnel et al. [86] found no significant OC changes between 1986 and 2015 of Bavarian croplands. The authors analyzed 92 repeatedly measured cropland sites in Bavaria. Therefore, all available sampling points between 1984 and 2014 were combined for the modeling dataset. However, there is a disadvantage of using legacy data, i.e., the sampling schemes are not optimally distributed.
The SOC measurements for the calibration set were provided by the Bavarian Environment Agency (LfU-1071 sampling sites) and the Bavarian State Research Center for Agriculture (LfL-134 sampling sites). Additionally, soil samples (504 sampling sites) collected in the framework of the LUCAS (Land Use/Cover Area frame statistical Survey) 2009 Topsoil Survey provided by the European Soil Data Centre (ESDAC) [87] were added ( Figure 1). The LfU provided a large database with topsoil samples equally distributed across Bavaria [88]. The sites were each sampled once between 1984 and 2014. The LfL calibration dataset contained data from the permanent soil observation program (BDF) of Bavaria. In contrast to the once sampled LfU sites, these 134 BDFs were sampled multiple times in the observation period. As [86] found no significant change between 1986 and 2015 for the BDF sites across Bavaria, the available samples per BDF between 1984 and 2014 were averaged. Thus, one measurement per sampling location is included in the calibration dataset. The LUCAS soil samples were collected in 2009 from unique spatial positions across the investigation area. SOC contents of the LfU, LfL and LUCAS databases were all determined by dry combustion using elemental analyzers [88,89].
From all data sources, the samples intersecting with the SCMaP SRC and within the investigation period (1984 to 2014) were selected, i.e., 1385 soil samples. Per soil sample, the reflectance spectra and its eight neighboring pixels of the SRC were extracted and averaged to reduce local spatial variability. After the spectral/spatial filtering, 1215 sampling points for model calibration are remaining.
The external validation set was provided by the LfL and included 352 cropland fields with point SOC measurements sampled between 2001 and 2008. For each agricultural field, five sampling locations were randomly selected. At each sampling location (radius = 1.5 m), six soil samples were taken. SOC contents of all six soil samples of all five sampling locations were averaged to one SOC content per field. SOC contents were determined by dry combustions using CN elemental analyzers. For the external validation of the dataset, 308 samples were intersecting with the SCMaP SRC.
The data provided by the LfU contained the highest range of SOC contents (0.26% to 18.30%; Table 3). The LfL calibration data showed a lower mean SOC content in comparison to the LfU and LUCAS datasets. Overall, the calibration dataset contained locations with higher SOC contents compared to the external validation dataset. For the model calibration and validation, the distribution of SOC contents of the training (70%) (Figure 3a) and test data (30%) (Figure 3b) were comparable. Both datasets contained samples with high SOC concentrations. The distribution of the SOC percentages of the calibration (cal) and the external independent validation (val) datasets (Figure 3c) showed a similar mean; however, the external validation dataset did not contain as high SOC concentrations.

External Validation
To evaluate the accuracy of the prediction models more precisely, a further validation was performed using an independent external dataset provided by LfL, which was not included in the training. For each regression algorithm, the best data set up based on the

External Validation
To evaluate the accuracy of the prediction models more precisely, a further validation was performed using an independent external dataset provided by LfL, which was not included in the training. For each regression algorithm, the best data set up based on the model accuracies and performance was selected and applied to predict the SOC contents for the entire investigation area. The difference for all samples between the predicted and measured SOC contents of the external independent validation dataset were calculated to estimate the averaged difference between the predicted and the measured contents to provide the reliability of predicting SOC for each algorithm.

Spectral/Spatial Filtering
The spectral/spatial filtering was implemented in the model framework to ensure a high-quality calibration database. In order to ensure homogenous pixel clusters, a threshold is necessary to identify the heterogenous pixel clusters. Most of the nine individual pixel spectra per cluster showed homogenous patterns (Figure 4a). Here, an SOC measurement is linked to valid spectral information. However, a few pixel clusters showed deviating spectra (Figure 4b). These heterogenous pixel clusters with deviating individual spectra are represented by high standard deviations (STD) and need to be filtered, as here the possibility of any external influence or data artefacts (e.g., mixed spectra of soil and a small portion of vegetation, local variation) impacting the cluster is very high.

External Validation
To evaluate the accuracy of the prediction models more precisely, a further validation was performed using an independent external dataset provided by LfL, which was not included in the training. For each regression algorithm, the best data set up based on the model accuracies and performance was selected and applied to predict the SOC contents for the entire investigation area. The difference for all samples between the predicted and measured SOC contents of the external independent validation dataset were calculated to estimate the averaged difference between the predicted and the measured contents to provide the reliability of predicting SOC for each algorithm.

Spectral/Spatial Filtering
The spectral/spatial filtering was implemented in the model framework to ensure a high-quality calibration database. In order to ensure homogenous pixel clusters, a threshold is necessary to identify the heterogenous pixel clusters. Most of the nine individual pixel spectra per cluster showed homogenous patterns (Figure 4a). Here, an SOC measurement is linked to valid spectral information. However, a few pixel clusters showed deviating spectra (Figure 4b). These heterogenous pixel clusters with deviating individual spectra are represented by high standard deviations (STD) and need to be filtered, as here the possibility of any external influence or data artefacts (e.g., mixed spectra of soil and a small portion of vegetation, local variation) impacting the cluster is very high.   Figure 5 shows six histograms of all STDs per band for all sample clusters of the calibration dataset. As the threshold twice, the STD per band was selected to identify and eliminate the heterogenous clusters with deviating spectra. Overall, 135 pixel clusters (9.7% of the total calibration dataset) were eliminated from the calibration dataset. The  Figure 5 shows six histograms of all STDs per band for all sample clusters of the calibration dataset. As the threshold twice, the STD per band was selected to identify and eliminate the heterogenous clusters with deviating spectra. Overall, 135 pixel clusters (9.7% of the total calibration dataset) were eliminated from the calibration dataset. The preprocessed, spectrally/spatially filtered calibration set accordingly comprises 1250 averaged sampling clusters. As shown in Figure 5, using the STD as the threshold, a higher number of pixel clusters (674, 48.7%) are identified as heterogeneous clusters. However, a visual analysis showed that too many pixel clusters would be eliminated using the STD as the threshold. It was also tested to set the threshold at three-fold STD (3 STD). Although, this would result in an insufficient selection of heterogenous pixel clusters. A visual analysis has shown that the filtering of 50 pixel clusters (3.6%) selected by the three-fold STD threshold does not filter all heterogenous pixel clusters sufficiently.
Although, this would result in an insufficient selection of heterogenous pixel clusters. A visual analysis has shown that the filtering of 50 pixel clusters (3.6%) selected by the threefold STD threshold does not filter all heterogenous pixel clusters sufficiently.
The spatial/spectral filtering has a significant positive effect on the model accuracies. The RF was applied to the filtered and unfiltered RI_all datasets. A significant increase of the R² (0.64 to 0.67), a decrease of the RMSE (1.38% to 1.26%) and an increase of the RPD (1.46 to 1.56) indicated an improved performance of SOC modeling.

Feature Selection
For each ML algorithm, a feature selection was performed based on the correlation coefficients for the MLR (Figure 6), the VIP scores for the PLSR (Figure 7a) and the feature importance scores for the RF (Figure 7b). We selected 15 features for the MLR, 14 features for the PLSR, and 10 features for the RF. Overall, similar reflectance bands and indices were identified as important features for the individual RI_sel subsets. For the PLSR and the RF, the Landsat bands two (green), three (red) and four (near infrared-NIR) were The spatial/spectral filtering has a significant positive effect on the model accuracies. The RF was applied to the filtered and unfiltered RI_all datasets. A significant increase of the R 2 (0.64 to 0.67), a decrease of the RMSE (1.38% to 1.26%) and an increase of the RPD (1.46 to 1.56) indicated an improved performance of SOC modeling.

Feature Selection
For each ML algorithm, a feature selection was performed based on the correlation coefficients for the MLR (Figure 6), the VIP scores for the PLSR (Figure 7a) and the feature importance scores for the RF (Figure 7b). We selected 15 features for the MLR, 14 features for the PLSR, and 10 features for the RF. Overall, similar reflectance bands and indices were identified as important features for the individual RI_sel subsets. For the PLSR and the RF, the Landsat bands two (green), three (red) and four (near infrared-NIR) were selected as important features. For all three algorithms, the SCMaPI and the NDSI were selected, whereas the overlap of selected indices was higher for the PLSR and RF (Figure 7a,b). However, for the MLR, not all reflectance bands identified as significant features were flagged as independent features showing high R scores (>0.7) in the correlation matrix. This would result in an elimination of most of the reflectance bands. However, they were all included in the RI_sel database as the analysis of the SCMaP SRC reflectances is the focus of this study. selected as important features. For all three algorithms, the SCMaPI and the NDSI were selected, whereas the overlap of selected indices was higher for the PLSR and RF (Figure  7a,b). However, for the MLR, not all reflectance bands identified as significant features were flagged as independent features showing high R scores (>0.7) in the correlation matrix. This would result in an elimination of most of the reflectance bands. However, they were all included in the RI_sel database as the analysis of the SCMaP SRC reflectances is the focus of this study.

Model Results-Calibration
In order to ensure reliable models not overfitting the data, 10-fold cross-validation (cv) using 70% of the calibration data was additionally conducted to the model calibration using the same portion of data ( Table 4). The remaining 30% of the calibration sampling (a)

Model Results-Calibration
In order to ensure reliable models not overfitting the data, 10-fold cross-validation (cv) using 70% of the calibration data was additionally conducted to the model calibration using the same portion of data ( Table 4). The remaining 30% of the calibration sampling points excluded from the model training were used for validating the models (Table 4, Figure 8).     The model accuracies and performances of the cv in comparison to the calibration results (cal) and validation results (val) are given in Table 4. Overall, except for minor differences for the RF similar R 2 , RMSE and RPD values comparing the cal and cv results were detected for all datasets. This indicates that the cal models are valid and did not overfit the data. However, except for MLR and PLSR, based on the R datasets, the cv results are in a similar range compared to the val results. The model val results are in a similar range for the three algorithms. For PLSR and MLR, the use of additional indices is increasing the accuracies significantly. The influence of additional indices on the RF is less visible. Here, the R 2 is constant, the RPD is increasing, and the RMSE is decreasing. The RF shows the highest CCC values compared to the MLR and PLSR, whereas the CCC values for PLRS are lower based on the MLR. Figure 8 shows the model validation results in detail for all three ML algorithms and for all prepared datasets. Overall, the RF regression performed best, showing the highest R 2 (0.67) and RPD scores (1.62 to 1.77) for the different datasets using the 30% validation sampling points of the calibration dataset. Based on RI_all, the best model accuracies comparing all datasets were obtained. The PLSR showed the lowest modeling accuracies with lower R 2 and RPDs and a higher RMSE in comparison to the other algorithms. The models based on the RI_all showed higher accuracies overall than the models based on the R data setup. The indices positively influenced the prediction of SOC. The RI_sel database showed no improvements in the model accuracies for the different ML algorithms. It is worth noting that for high SOC values, all regression approaches and all data set ups (reflectance and/or indices) underestimated the SOC.
Based on the cv and val results, the best set of features for all models was selected. For MLR, PLSR and RF, the model based on RI_all showed the best performances regarding the model validation and is therefore further used in this study. Using the RI_all feature set, the RF showed the best accuracies concerning the model training, cross validation and external validation.

External Validation
For each ML algorithm, the model based on the best feature subset was selected and applied to the whole investigation area. For each point of the external validation dataset, the predicted SOC contents were compared to the measured SOC values. Figure 9 shows the differences between the 308 pairs of values of the predicted and measured SOC contents for Figure  In total, the average errors were relatively low (0.11% ± 0.56% to 0.21% ± 0.61%) comparing the predicted and the measured values. The comparison of the predicted data based on the RF (RI_all) to the external validation data indicated the lowest mean difference. However, all three histograms showed a Gaussian-like distribution with a small number of outliers and a relatively small bias (mean and median values are close to 0.0 for all cases). The absolute differences ranged between −2.23% to +3.14% for MLR, −1.83% to 2.17% for PLSR and −2.56% to 3.05% for RF. algorithms. It is worth noting that for high SOC values, all regression approaches and all data set ups (reflectance and/or indices) underestimated the SOC. Based on the cv and val results, the best set of features for all models was selected. For MLR, PLSR and RF, the model based on RI_all showed the best performances regarding the model validation and is therefore further used in this study. Using the RI_all feature set, the RF showed the best accuracies concerning the model training, cross validation and external validation.

External Validation
For each ML algorithm, the model based on the best feature subset was selected and applied to the whole investigation area. For each point of the external validation dataset, the predicted SOC contents were compared to the measured SOC values. Figure 9 shows the differences between the 308 pairs of values of the predicted and measured SOC contents for Figure 9a MLR (RI_all), Figure 9b PLSR (RI_all) and Figure 9c RF (RI_all). In total, the average errors were relatively low (0.11% ± 0.56% to 0.21% ± 0.61%) comparing the predicted and the measured values. The comparison of the predicted data based on the RF (RI_all) to the external validation data indicated the lowest mean difference. However, all three histograms showed a Gaussian-like distribution with a small number of outliers and a relatively small bias (mean and median values are close to 0.0 for all cases). The absolute differences ranged between −2.23% to +3.14% for MLR, −1.83% to 2.17% for PLSR and −2.56% to 3.05% for RF.
(a) (b) (c) Figure 9. A comparison of the differences between the 308 predicted and the measured SOC contents of the external validation dataset using the best set of input data for (a) MLR, (b) PLSR and (c) RF.

Spatial SOC Prediction
Overall, RF using the RI_all dataset provided the best model performance (R² = 0.67, RPD = 1.77), the highest model accuracy (RMSE = 1.24%) and the lowest mean difference Figure 9. A comparison of the differences between the 308 predicted and the measured SOC contents of the external validation dataset using the best set of input data for (a) MLR, (b) PLSR and (c) RF.

Spatial SOC Prediction
Overall, RF using the RI_all dataset provided the best model performance (R 2 = 0.67, RPD = 1.77), the highest model accuracy (RMSE = 1.24%) and the lowest mean difference comparing the predicted and the measured SOC contents of the independent external validation dataset (0.11% ± 0.56%). Consequently, the RF based on the RI_all dataset was applied to the whole study area. Figure 10 shows the spatial prediction results of the RF model.

Spectral/Spatial Filtering
We applied spatial/spectral filtering to enable the link of a 30 m SRC pixel possibly influenced by different spectral information or other artefacts to a single point SOC measurement. The filtering is based on neighborhood relationships by evaluating the spectral information of the direct neighboring pixel in comparison to the spectral information of the sampling location. It is assumed that the soil sample is representative of the direct surrounding areas.
Due to the analysis of the STDs of all bands of all pixel clusters, all clusters with a heterogenous land cover were identified to be excluded from the input database due to their possible spectral influences. The spatial/spectral filtering has a significant positive effect on the model accuracies.
As shown in Figure 5, the distribution of the cluster STDs per reflectance band is representing a non-gaussian behavior, which indicates using the median and quantiles is more appropriate. Nevertheless, the selection of the threshold of the 2STD is equal to the 95% quantile for most bands. If the threshold is set using the 95% percentile, additionally six pixel clusters would be excluded and could provide an alternative to the presented method.
In general, the link between EO data with a pixel resolution of several meters and Most of the study area revealed SOC contents lower than 2.0%, which is comparable to the mean SOC content of the soil datasets (Table 3). Regions with higher SOC contents (>6.0%) were mainly predicted in the South of Bavaria. Here, several patterns with relatively high SOC contents (>8.0%) are visible. High SOC contents are predicted in the river valleys in the south of the study area and in bogs and marshlands (e.g., Erdinger Moos around the Airport to the northeast of the city Munich or Königsmoos at the northeast of the city Augsburg). This is in line with an SOC map generated for Bavaria using a geostatistical modelling approach that showed the highest SOC stocks in floodplains and bogs [90].

Spectral/Spatial Filtering
We applied spatial/spectral filtering to enable the link of a 30 m SRC pixel possibly influenced by different spectral information or other artefacts to a single point SOC measurement. The filtering is based on neighborhood relationships by evaluating the spectral information of the direct neighboring pixel in comparison to the spectral information of the sampling location. It is assumed that the soil sample is representative of the direct surrounding areas.
Due to the analysis of the STDs of all bands of all pixel clusters, all clusters with a heterogenous land cover were identified to be excluded from the input database due to their possible spectral influences. The spatial/spectral filtering has a significant positive effect on the model accuracies.
As shown in Figure 5, the distribution of the cluster STDs per reflectance band is representing a non-gaussian behavior, which indicates using the median and quantiles is more appropriate. Nevertheless, the selection of the threshold of the 2STD is equal to the 95% quantile for most bands. If the threshold is set using the 95% percentile, additionally six pixel clusters would be excluded and could provide an alternative to the presented method.
In general, the link between EO data with a pixel resolution of several meters and point soil samples provides a challenge for a wide variety of modeling purposes as point information is related to a larger area. The spectral/spatial filtering technique presented can help to identify, in particular, pixels that are not completely within a field boundary and therefore may contain a mixture of several spectral information (e.g., bare soil and vegetation at the edges of fields) for the sample point. As spectral neighborhood relationships of the pixel to which a soil sample is assigned are included in the assessment, the method can also be applied to other areas and is independent of any region or sensor-specific characteristics. The method shown is a simple and robust approach to exclude possibly disturbed pixels from the given data compilation. However, the applicability of the filtering technique has to be evaluated because it might not be best suited for larger or smaller pixel sizes. More suitable approaches that address the issue of linking point information to a pixel with the spatial extent of several meters should be explored.

Data and Modeling
In contrast to many other studies (Table 1), we used a bare soil composite consisting of a long time series of spaceborne Landsat imagery. Except for some case studies [25,37,43,48], all other models listed in Table 1 were built for single, cloudless multi-or hyperspectral scenes. In contrast, the SOC contents were predicted for a novel multispectral data source that was based on a large number of input scenes (14,061) for an area of nearly 130,000 km 2 . Due to the long compositing period, all variabilities were included, and a stable mean SRC was provided. Small-scale spatial differences due to seasonal soil moisture differences are minimized. Soil moisture differences can have a huge influence on hyperspectral and multispectral remote sensing analysis and are addressed by several authors [91][92][93]. It hampers the prediction of soil variables from the reflectance spectra [94]. However, the influence of the permanent soil moisture differences regarding the used SRC has to be investigated.
For SOC modeling, three different ML algorithms were used and compared (Table 4, Figure 8). Overall, the RF showed the best model performances comparing the R 2 (0.62-0.67), RMSE (1.23-1.31), RPD values (1.62-1.77) and CCC results (0.78) for the model validation. However, the results of the RF and PLSR are comparable. Several indices were implemented in order to improve the SOC modeling capabilities. The application of indices is a widely used technique in remote sensing analyses and helps to capture more information, such as band ratios and spectral indices that are, e.g., sensitive to differences in soil properties [44]. As indicated by the model performance (Figure 8), an improvement can be noted for all three algorithms with the additional use of indices. However, the influence on MLR and PLSR was higher compared to the RF results. In this context, the additionally performed 10-fold cross-validation (Table 4) showed that a selection of relevant features is not necessarily required for the different ML algorithms. The PLSR and the RF results using RI_sel showed a small decrease in the model accuracies comparing the RI_all runs.
The model performances (Figure 8) based on the SCMaP SRC were comparable to the SOC prediction capabilities presented by various authors (Table 1). However, we covered a distinctly larger area (except for [25]) with a lower soil point density. However, in almost all studies, lower RMSE values were reported for the SOC prediction. The SOC content available for the study area shows a large range (0.26% to 18.3%; Table 3). A few of the referenced studies were based on a comparable distributed SOC data range. The high RMSEs could be related to the wide range of SOC content in the study area, as indicated by the general underestimation during the calibration stage for high SOC values (Figure 8). For analyzing the influence of high SOC content, it could be considered to separate organic soils with naturally high SOC contents from mineral soils with lower SOC content. A split of the soil samples regarding their SOC distribution was not considered as there was a small number of soil samples with higher SOC contents (52 samples; SOC content > 6%).
Using a 30-year composite could hamper the mapping of SOC contents if changes occur in the investigation area over time. However, for Bavaria, an analysis of SOC changes of the permanent soil observation sites in Bavaria showed a constant behavior of the SOC contents with relatively low overall changes between 1986 and 2016 [86]. Therefore, the use of a 30-year composite to overcome seasonal soil moisture differences is a reasonable approach to model SOC contents for large geographical areas where SOC changes are limited. Although, further analysis of the compositing technique to overcome seasonal soil moisture differences has to be conducted also with respect to the length of the compositing period. Additionally, for the investigation of SOC changes, the applicability of shorter compositing lengths has to be considered.

External Validation
In addition to the cross and model validation, we conducted an external validation based on an independent dataset. The predicted and measured SOC contents were compared, and the mean difference was calculated to estimate the accuracy of the modeling. The comparison showed small mean differences for MLR, PLSR and RF (0.11% to 0.21%). However, the SOC distribution of the validation dataset indicated small differences in comparison to the calibration dataset. The calibration dataset stretches between 0.26% and 18.3%, whereas the validation dataset contained samples with SOC contents between 0.55% and 4.65%. Although the majority of the calibration data is represented by the validation samples, very low (0.11% to 0.25%) and very high (4.66% to 18.3%) SOC contents are not included in the validation dataset. The comparison of the predicted SOC contents with the external validation dataset showed a small overestimation of the modeled SOC contents Figure 9, which has to be considered in the interpretation for the prediction of the entire study area (Figure 10).
To address large-scale SOC predictions (national to European-wide), further standardized validation datasets are needed. However, large-scale SOC maps are mostly available at a lower resolution (250 m to 1 km) and have limited suitability as a basis for validation for the 30 m pixel resolution of the SCMaP SRC database. A different aspect that could be considered for validation is an internal quality measure provided by the number of cloudless scenes per pixel. The usable data availability can be taken into consideration for data validation [51,95]. For the calibration and the validation dataset, an analysis of the number of cloudless observations of all sampling pixels showed a similar distribution (Figure 11a,b). On average, 300 scenes were available for the pixels of the SOC measurements. Both datasets showed a smaller peak on 200 scenes. These pixels are located in areas where the data of one Landsat path row is available. The absolute number of input scenes is smaller there. The smaller peak at 500 scenes per pixel can be related to overlapping areas, where several Landsat path rows are intersecting.

SCMaP SRC as Database for Modeling SOC Contents with High Spatial Resolution Covering Large Geographical Areas
In comparison to existing SOC maps (e.g., OCTOP [15], Topsoil Soil Organic Carbon Map based on the LUCAS Soil datasets for EU25 [16] or SoilGrids [96]), SCMaP provides a novel database for the estimation of soil parameters. The compositing approach allows the investigation of all areas which show at least once a bare soil within the observed time period. As the approach was trained using a large database and was successfully validated using independent data, the transferability to large-scale applications is feasible. In addition, high-resolution analysis considering within or between field differences is still possible as the original Landsat pixel size (30 m) is preserved ( Figure 12). As shown in Figures 10 and 12, for several areas, relatively high SOC contents were predicted by the RF model. Here, a former peat bog ("Königsmoos") is located, which naturally shows higher SOC contents. Such organic soils naturally contain higher SOC contents (>18.0 %) in comparison to other soils. Most of these peatlands have been drained for agricultural use [97].
A comparison with the soil map 1:200,000 (BUEK200, Federal Institute for Geosciences and Natural Resources, BGR) showed that areas with high predicted SOC contents are mostly fens, underpinning the correctness of the results. In addition, the  Figure 11c shows the link between the SOC differences and the distribution of cloudless scenes. The higher differences (<−2% and >2%) cannot be related to fewer cloudless scenes per pixel as the overall average of 300. These findings indicate that SCMaP captured the exposed soils well at the validation sampling points, and the influence of potentially remaining clouds is minimized and seems not to have any influence.

SCMaP SRC as Database for Modeling SOC Contents with High Spatial Resolution Covering Large Geographical Areas
In comparison to existing SOC maps (e.g., OCTOP [15], Topsoil Soil Organic Carbon Map based on the LUCAS Soil datasets for EU25 [16] or SoilGrids [96]), SCMaP provides a novel database for the estimation of soil parameters. The compositing approach allows the investigation of all areas which show at least once a bare soil within the observed time period. As the approach was trained using a large database and was successfully validated using independent data, the transferability to large-scale applications is feasible. In addition, high-resolution analysis considering within or between field differences is still possible as the original Landsat pixel size (30 m) is preserved ( Figure 12).

SCMaP SRC as Database for Modeling SOC Contents with High Spatial Resolution Covering Large Geographical Areas
In comparison to existing SOC maps (e.g., OCTOP [15], Topsoil Soil Organic Carbon Map based on the LUCAS Soil datasets for EU25 [16] or SoilGrids [96]), SCMaP provides a novel database for the estimation of soil parameters. The compositing approach allows the investigation of all areas which show at least once a bare soil within the observed time period. As the approach was trained using a large database and was successfully validated using independent data, the transferability to large-scale applications is feasible. In addition, high-resolution analysis considering within or between field differences is still possible as the original Landsat pixel size (30 m) is preserved ( Figure 12). As shown in Figures 10 and 12, for several areas, relatively high SOC contents were predicted by the RF model. Here, a former peat bog ("Königsmoos") is located, which naturally shows higher SOC contents. Such organic soils naturally contain higher SOC contents (>18.0 %) in comparison to other soils. Most of these peatlands have been drained for agricultural use [97].
A comparison with the soil map 1:200,000 (BUEK200, Federal Institute for Geosciences and Natural Resources, BGR) showed that areas with high predicted SOC contents are mostly fens, underpinning the correctness of the results. In addition, the qualitative SOC distributions shown in the study area are consistent with the results of SOC mapping As shown in Figures 10 and 12, for several areas, relatively high SOC contents were predicted by the RF model. Here, a former peat bog ("Königsmoos") is located, which naturally shows higher SOC contents. Such organic soils naturally contain higher SOC contents (>18.0 %) in comparison to other soils. Most of these peatlands have been drained for agricultural use [97].
A comparison with the soil map 1:200,000 (BUEK200, Federal Institute for Geosciences and Natural Resources, BGR) showed that areas with high predicted SOC contents are mostly fens, underpinning the correctness of the results. In addition, the qualitative SOC distributions shown in the study area are consistent with the results of SOC mapping in Bavaria shown in [90].

Conclusions
The potential of the SCMaP SRC database derived from Landsat images between 1984 and 2014 for large-scale applications with a high spatial resolution was evaluated. We used the SRC to model the spatial SOC distribution of exposed topsoils of croplands in Bavaria. The SRC was correlated with soil point measurements to quantify SOC contents for an area-wide mapping approach. We first developed a spatial/spectral filtering technique to address the challenge of linking a point soil sample to EO data with a pixel resolution of several meters. The results show that a spectral/spatial filtering of heterogenous pixel clusters is improving the SOC modeling.
For SOC quantification, several ML algorithms were applied and compared. The RF showed the highest capabilities to model the SOC content in Bavaria (R 2 = 0.67, RMSE = 1.24%, RPD = 1.77). Further, we determined that the use of additional spectral indices compared to the usage of reflectance data alone can improve SOC modeling.
In addition to the model validation based on a subset of the data, the best model setups (RI_all) were applied to the entire test area and validated using an external independent dataset (n = 308). The differences between the measured and predicted SOC contents were minor for all three ML algorithms and showed the lowest differences for the RF (0.11% ± 0.56% SOC).
The SCMaP SRC is a promising approach to predict the spatial SOC distribution for mapping a large geographical extent with high resolution at the farm or even the field scale. Nevertheless, for application on a larger scale, a validation approach has to be further developed. Several large-scale SOC products are available, although these maps are distributed on a lower resolution in comparison to the SCMaP capabilities.      Figure A3. Autocorrelation of the prediction residuals of the model validation (30% of data points). Figure A3. Autocorrelation of the prediction residuals of the model validation (30% of data points).