Three-Dimensional Mapping of Forest Soil Carbon Stocks Using SCORPAN Modelling and Relative Depth Gradients in the North-Eastern Lowlands of Germany

: To cope with the challenges in forest management that are contemporarily caused by climate change, data on current chemical and physical soil properties are more and more necessary. For this purpose, we present a further amalgam of depth functions and SCORPAN modelling to provide data at arbitrary depth layers. In this concept, regionalisation is split up into the modelling of plot totals and the estimation of vertical distributions. The intended beneﬁts by splitting up are: consistency between estimates on plot level and depth layer level, avoidance of artiﬁcial depth gradients, straightforward interpretation of covariates in the sense of pedogenetic processes, and circumnavigation of the propagation of uncertainties associated with separation between horizons during ﬁeld sampling. The methodology was tailored to the circumstances within the north-eastern lowlands and the utilisation of current inventory data of the National Forest Soil Inventory (NFSI) in Brandenburg (Germany). Using the regionalisation of soil organic carbon (SOC) as an example, the application is demonstrated and discussed in detail. The depth to groundwater table and terrain parameters related to the catchment area were the main factors in SOC storage. The use of kriging did not improve the model performance. The relative depth gradients of SOC were especially distinguished by tree species composition and stand age. We suppose that interesting ﬁelds of application may be found in scenario-based modelling of SOC and when SOC serves as a basis for hydrological modelling.


Introduction
Nowadays, regional data on forest soil properties are increasingly demanded for various questions concerning forest management practices, like tree species selection, liming, harvest intensity, or the detection of risk areas. Thus, quantitative data related to climate change and its site-specific drought effects, as well as data on current soil nutrient status, are required to support decision-making.
Soil organic carbon (SOC) profoundly influences a soil's cation exchange capacity [1][2][3], which is a key feature for nutrient supply to trees. Soil texture, SOC , and the usually tightly correlated bulk density [4][5][6] are the main factors controlling the soil hydraulic properties [7][8][9]. Especially in the case of the mainly sandy textured soils of the northeastern lowlands, soil organic carbon strongly influences the spatial patterns of hydraulic properties [10]. In addition to these well-understood principles, which are widely used in modelling water and element budgets, SOC also affects the wettability of soils [11,12] and, thus, the development of preferential flow paths [13]. Implications of preferential flow paths are not limited to percolation and actual soil water storage [14], and may also include feedback effects on local climate [15].
Whereas in the glacial deposits of the north-eastern lowlands, soil texture is largely predetermined by parent material, other soil forming factors, like groundwater, vegetation/forest management, and climate, also have to be taken into consideration for soil organic carbon. In particular, due to the influence of groundwater and vegetation, the temporal variation of SOC is expected to be high (compared to soil texture). Since enrichment of soils with soil organic matter usually starts just after deposition of parent material, tighter correlations with current terrain can also be expected. On the other hand, to cope with the temporal variation, contemporary measurements of SOC are strictly needed for regionalisation.
Over the last decades, several approaches for mapping SOC have been developed, which generally make intense use of terrain parameters derived from digital elevation models. Further covariates related to soil forming factors, like climate data, geological mapping units, or data on landcover, are also frequently used. The methods usually involved range from machine learning techniques, like support vector machines [16] or artificial neural networks [17], to all kinds of statistical models, such as stepwise linear regression and decision trees [18][19][20][21]. Especially since the introduction of the SCORPAN framework for digital soil mapping by McBratney et al. [22], which extended the classic concept of soil forming factors [23] by soil and space, the additional use of geostatistics in regionalisation approaches for SOC became a widely used technique [24]. Newer approaches also use deep learning [25], data augmentation [26], and increment-averaged kriging [27] to predict multiple depths with a single model.
The majority of studies conducted are on the prediction of SOC for predefined depths of the entire soil solum or discrete soil layers. Prior to development of prediction models, it is therefore necessary that the solum depth or soil layer boundaries are well defined for the respective purposes. If the SOC of several soil layers should be mapped, fitting of the increasing number of prediction models becomes more and more time consuming. The SOC stock of every layer is then treated as unbounded random variables (see also [28]). Thus, it becomes far from straightforward to ensure consistency between SOC stored in single layers and the stocks of the entire soil solum at every single site in the resulting maps.
Analogous problems arise for the resulting depth gradients of SOC, which might adopt artificial shapes. Furthermore, fitting of individual models for respective soil layers forces the user to deal with several different estimates of variable importance. Except for soil layers with a priori well-known differences regarding the main processes for carbon enrichment, such as the organic layer and mineral soil, it will hardly be possible to logically interpret the influence of covariates in terms of soil forming factors. This may hinder the optimal selection of predictors and detection of spurious covariates to be excluded. Last but not least, individual regionalisation of depth layers may enhance the propagation of potentially large sampling errors caused by incorrect separation. Such errors especially result from mingling of the organic layer and mineral soil [29].
The usage of depth functions to describe the vertical variation of soil properties has a long tradition in soil science (see [30] for a detailed overview). The application of soil attribute depth functions in three-dimensional spatial prediction of soil properties was already proposed by Bishop et al. [31]. It has also already been shown that the concept of depth functions is beneficial in mapping SOC distribution throughout the soil profile: Equal-area quadratic smoothing splines were used to derive SOC for homogeneous depth increments from legacy profile data as a data basis for the development of artificial neural networks and regression models [21,32,33]. To obtain maps of continuous depth functions, Malone et al. [32] utilised equal-area quadratic splines for interpolation through predicted SOC contents of discrete soil layers after regionalisation. To calculate weighted means of SOC for mapping units from available legacy soil data, Odgers et al. [34] applied equal-area quadratic smoothing splines. From a more conceptual perspective, depth functions are used for two reasons within 2.5D soil mapping frameworks [35]: The first reason is the harmonisation of soil observations. Equal-area quadratic smoothing splines are then often fitted to depth interval values after mapping to derive continuous depth functions for each pixel mapped. The equal-area smoothing splines of Bishop et al. [31] can be considered as the state of the art for these two applications for one reason in particular: The interpolation process preserves the masses originally measured or predicted on a depth interval basis (especially for small values of λ).
Direct utilisations of depth functions during the mapping process itself are given by Mishra et al. [36] and Kempen et al. [37]. Mishra et al. [36] used ordinary kriging to interpolate parameters of exponential functions fitted to single-point observations beforehand. Kempen et al. [37] predicted parameters of soil-type-specific depth functions of SOC using cokriging.
By using relative depth gradients, we try to strengthen the interpretability of SCOR-PAN factors in three-dimensional mapping of SOC in forest soils. Our further objectives were to ensure the consistency between solum stocks and stocks within single depth layers, to avoid the construction of artificial depth gradients, and to lessen the propagation of sample separation errors. The basic conceptual hypothesis of the method is that distinguishing between covariates that control horizontal and vertical distribution of SOC simplifies variable selection during model development. Even though increasing interpretability may decrease model performance measures, we believe that a good understanding of covariates in terms of soil forming factors is valuable for the mapping results.

Material and Methods
For the regionalisation of SOC stocks in the north-eastern lowlands, we started with the general methodology proposed by Kempen et al. [37]. The main accommodations to the requirements for mapping SOC in the north-eastern lowlands concern a simplified derivation of depth functions, which does not require expert-knowledge-based groupings like horizons and soil types. Additionally, we tried to facilitate the accessibility of the main factors controlling the horizontal and vertical distribution of SOC. To achieve this, we substituted soil-type-specific depth functions with data-driven clustering and split up the mapping process for the vertical and horizontal distribution of SOC with relative depth gradients. A flowchart of the methods and data involved is given in Figure 1.  In the first step, stepwise regression analysis was used to select the most significant covariates, which determine whole solum SOC stocks at inventory plots, from a bundle of potentially relevant SCORPAN factors. Afterwards, the residuals of the fitted regression models were examined using geostatistical methods (variogram analysis, ordinary kriging) on spatial dependencies. Hence, the regionalisation of the SOC solum stocks complied with the widely used regression kriging discussed in detail by Hengl et al. [38].
The regionalisation of depth gradients started with calculation of relative SOC stocks within the particular depth layer by expressing single stocks in proportion to solum stocks. The obtained relative depth gradients were then grouped into depth gradient types using cluster analysis. Finally, classification tree approach was used to estimate the probability that a site was in one of the distinct depth types. In addition to the purpose of regionalisation, the resulting classification tree offers insights into the main factors controlling the vertical distribution of SOC.

Inventory Data
To address the expected temporal variably of SOC adequately, we restricted the SOC data invoked to inventory and inventory-like data. Our primary source of SOC data was thus the second National Forest Soil Inventory (NFSI) [39]. An additional NFSI-like soil sampling campaign was conducted at a regional scale to enhance the analysis of topographic effects on SOC. Within the NFSI, field sampling was conducted during 2006 and 2009. At the regional scale, the collection of soil samples was done in the years 2008 and 2009 [40]. We did not make use of additional legacy profile data to avoid biased estimates on total SOC stocks and to prevent a mix-up of spatial and temporal effects on SOC. In contrast to the often-used legacy profile data, the NFSI comes with six well-defined uniform depth layers (organic layer, 0-5, 5-10, 10-3, 30-60, and 60-90 cm). Hence, the homogenisation or refinement of soil layers was not necessary prior to regionalisation. The volumetric SOC contents (kg m −3 ) were obtained by using Equation (1), where ρ fe , C org , and CF denote the bulk density of the fine earth (kg m −3 ), the concentration of organic carbon (kg kg −1 ), and the volume fraction of coarse fragments (%), respectively.
The volume fraction of coarse fragments was estimated in the field and measured by sieve analysis in the laboratory (depending on rock size). Mass fractions obtained from the sieve were transformed into volume fractions by assuming a rock density of 2.65 g cm −3 . The bulk density of the fine earth was determined on soil-core samples. Mass and volume fractions of coarse fragments (>2 mm) contained within the core samples were excluded from the calculation. Organic carbon was determined by dry combustion using a total analyser (LECO CNS-2000). All laboratory analyses were performed in compliance with the NFSI standards, which are documented in detail in GAFA [41].

Environmental Covariates
Since the preliminary considerations strongly suggested that SOC is controlled by a wide range of influences, we tried to obtain environmental covariates that address all seven SCORPAN factors appropriately. A short overview of the compiled variables is given in Tables 1 and 2. Covariates related to parent material and soil properties were primarily derived from forest site mapping data [42,43] and by means of digital soil mapping techniques [40,44]. In addition to widely used particle size fractions, we additionally computed the geometric mean particle size Dg% ... for numerical characterisation of the soil texture. Dg% ... was originally proposed by Meersmans et al. [45] to improve the reflection of SOC variation due to texture. For estimates of the depth of groundwater, we additionally considered water-table maps based on monitoring wells [46].
Climatic factors were directly taken from the detailed compilation by Riek et al. [47]. The selection of indicators to be considered was guided by assumptions on dominant processes influencing the storage of SOC inside the study area. Thus, CWB win (climatic water balance during winter) and P win (winter precipitation) were chosen as proxies for vertical translocation, whereas CWB sum (climatic water balance in the summer half-year), P sum (summer precipitation), and T a (annual mean temperature) were used to indicate climatic effects on soil biological activity. Additionally, we calculated the annual rage of temperature ∆ T as a surrogate for continentality and the number of frost-free days. The latter was originally introduced by Ramann [48] and then revisited by Jenny [23] as Ramann's Weathering Factor, expressing the length of the 'chemical weathering season'. The proportions of tree species were derived from forest inventory data ('Datenspeicher Wald') to obtain forest-specific indicators on land-use effects. The usage of forest inventory data allowed us to calculate the respective percentages from ratios of a singlespecies basal area to the total basal area of the forest stands. In the rare case of missing data, CORINE landcover classes [49] were used instead (applies to mapping only).
The role of topography on SOC was examined with several terrain attributes derived from a digital elevation model (DEM). As the cell size of the underlying DEM was 25 m, the terrain attributes derived form the environmental covariates with the highest spatial resolution in our study. Seeing that the covariates derived from DEM might be able to improve the prediction of SOC at the smallest spatial scales and the general availability of DEM data, we made exhaustive use of them. Nevertheless, the selection of indices from the vast number of existing terrain attributes was once again strictly constrained with respect to the theory of topography as a soil forming factor. The computed terrain attributes are tabulated in Table 2.
The time factor on SOC was considered from a geologic and a forest management perspective. For this, we computed the mean stand age as a basal-area-weighted average on the basis of forest inventory data. To cope with the problem of soil formation time, we used the geochemical series gathered by forest site mapping as a rough estimate.

Topographic Indices Calculated at the Catchment Scale
A C catchment area [59] β C average slope [50] within catchment L, S, LS slope length, slope steepness, and topographic factors from the Revised Universal Soil Loss Equation (RUSLE in [60]) TWI topographic wetness index [61] SPI stream power index [62] Terrain Attributes Derived from Drainage (DN) and Ridge Networks (RL) [63,64] at Different Spatial Scales (Large Scale (l), Small Scale (s)) uphill slope angle (to ridge network), downhill slope angle (to drainage network), and total slope angle (slope from ridgeline to drainage network) [65] s RL ( ... ), s DN ( ... ), s tot ( ... ), horizontal distances to ridgeline and drainage networks and total slope length (distance from ridgeline to drainage network) [65] ∆z DN ( ... ) elevation above the thalweg [66] RSP ( ... ) relative slope position [65] MBI ( ... ) mass balance index [67] In addition to the consideration of space by the use of geostatistics, we also computed simple distance-based indices. The created indices should serve as proxies for edge effects, like micro-climate and atmospheric deposition. Since the probability of land-use changes in the past increases with closeness to the forest edge, the indices may also act as proxies for former stand history (beyond the time horizon of forest inventory data).
During the development of the model, we tried to circumnavigate the positional errors contained in the different sources of spatial data. Thus, whenever the NFSI contained the same information, as in the case of soil texture or tree species composition, we favoured the inventory data as a source for derivation of environmental covariates. Field estimates from the NFSI of terrain features, slope, exposition, and curvature were also used to verify the spatial positioning of sample locations within the DEM. Following Grimm and Behrens [68], we also iteratively inspected alternative residual values at DEM cells surrounding the measured coordinates. Thereby, we attempted to obtain a further mitigation of uncertainties resulting from locational errors of the DEM and sample points. The assumed range of spatial uncertainty and, thus, the respective domain for virtually shifting the sample points within the DEM were constrained to the 3 × 3 altitude submatrix in both cases.

Regression Analysis of Total Solum SOC
The amount of SOC within the soil solum was assumed to be equal to the sum of SOC stored in the six NFSI depth layers (organic layer plus mineral soil 0-90 cm). Altogether, 424 profiles could be used within the analysis. The total amount of SOC stored within the soil solum SOC Solum (kg m −2 ) was modelled using multiple linear regression equations in the usual form of Equation (2). Predicted values, the constant, regression coefficients, covariates, and the number of explanatory variables are denoted byŶ b 0 , b j , X j , and J, respectively:Ŷ Distributions of solum stocks were inspected by histogram analysis. The functional relationships to predictor variables were examined using scatter plots. In cases showing evidence of non-linear relationships, appropriate intrinsically linear functions were tested. Residual values were examined for heteroscedasticity and autocorrelation in scatter plots and were additionally tested with the Durbin-Watson statistic for the latter. As a measure for the multicollinearity, the tolerance value T j was computed. Low tolerances indicate increasing magnitudes of multicollinearity. A tolerance value of 1 denotes perfectly uncorrelated predictors. Thresholds for entry and removal of covariates into/from the regression equations within the stepwise selection process were set to probabilities of F of 0.1 and 0.15, respectively [69].
We chose stepwise linear regression analysis over the more elaborate statistical techniques that are currently used in digital soil mapping. The intended advantages by sticking to this well-established approach were the integrated variable selection during model development and the direct interpretability of covariate effects: Using multiple linear regression, the influence of each predictor on the SOC can be seen from the regression coefficients b j . Positive coefficients indicate that SOC increases with increasing covariate values. Absolute values of standardised regression coefficientsb j were used as an indicator for relative variable importance within the regression models. Throughout the iterative model development, we permanently examined the plausibility of the predictors that were entered. Covariates that failed to comply with the expected effects associated with the respective soil forming factors were continuously eliminated. This was especially the case if the sign of the estimated regression coefficients was not in accordance with the a priori assumed causal relationships. We also examined the stability of the direction of effects within the lower and upper confidence interval bounds, b j − t 95 s b j and b j + t 95 s b j .
Model performance was assessed by means of scatter plots and statistical indices. For the latter, the common measures-coefficient of determination r 2 , adjusted coefficient of determination r 2 adj , norm of residuals s e , and F-statistic F emp -were calculated. In addition to the classical model performance measures computed on the calibration sample, we used cross-validation to estimate the effects of random fluctuations due to the limited calibration sample size [70]. For this purpose, we employed repeated tenfold cross-validation [71]. For each model one hundred runs of tenfold cross-validation were performed, resulting in one thousand model fits. Based on the one hundred validation data sets that were obtained, the coefficient of determination r 2 cv and the root mean square error RMSE cv were recalculated. We also computed the least squares analysis of slope b 1 cv and the concordance correlation coefficient of Lin [72] (CCC cv ). Both statistics are common indicators in assay and instrument validations. All computations were performed using IBM SPSS Statistics 19 (IBM ® ).
Inspired by the stratified modelling approach of Zirlewagen and v. Wilpert [73], regionalisation of the total solum SOC was divided into two submodels. The first submodel was calibrated at terrestrial sites, while the second addressed soil development under hydromorphic conditions. The idea behind partitioning into submodels is the assumption that markedly different factors control storage of SOC in both hydrologic domains. A desired side effect was the introduction of a (nonlinear) threshold point for groundwater effects on SOC at the terrestrial sites. To prevent discontinuities and to maintain sufficient cases, sites within a transient area from the hydromorphic regime to terrestrial soils were used in the development of both models. The overlap used ranged from 2.0 to 3.0 m depth to the groundwater table. The upper threshold for distinguishing between both hydrologic domains conforms to the classification given by the German soil mapping guidelines, 'GWS6' [74], whereas the lower bound of 3.0 m is taken from local forest site mapping instructions [43]. It is the cut-off above which forest sites are considered to be influenced by groundwater.

Geostatistical Analysis
Following the recommendations of McBratney et al. [22], we applied variogram analysis and kriging on the residuals to examine the spatial trends within the remaining variance. Considering the density of the sampling points, we mainly intended to incorporate unknown large-scale effects, such as paleoclimate or atmospheric depositions, rather than to improve predictions in the close surroundings of single NFSI plots.
All geostatistical analyses were conducted under the assumption of isotropy. The autocorrelation coefficients ρ(h) and semivariances (Matheron's estimator) γ(h) were calculated according to Webster and Oliver [75]. Since variograms and autocorrelograms are sensitive to the underlying lag increment, the first step of the analysis was the estimation of a suitable interval. In order to find an appropriate increment, we calculated the average separation between sample points and their nearest neighbours, as recommended by Webster and Oliver [75]. Finally, the potential model improvement due to the estimation of residuals by ordinary kriging was evaluated. For this purpose, we performed a leave-one-out cross-validation, as recommended by Webster and Oliver [75].

Depth Gradients
To derive the relative depth gradients, the volumetric SOC contents (kg m −3 ) of the six NFSI layers i were expressed as depth-related percentages SOC% (% m −1 ) of total solum SOC storage SOC Solum (kg m −2 ) (Equation (3)). At sites where a forest floor layer was lacking, the partial SOC storage for this layer was set to zero.
All depth data were grouped into depth gradient types using clustering. We used the common agglomerative hierarchical procedure of Ward [76]. As an objective function for minimising at each fusion step, the squared Euclidean distance was used. As stated by Backhaus et al. [77], Ward's method can be considered as conservative, which means that there is little tendency to form equal or unequal group sizes. Therefore, Ward's method is usually one of the best grouping methods for finding the actual underlying structure of the data. In particular, it is an excellent choice if the number of clusters is unknown [78]. For that reason, hierarchical grouping methods were preferred over iterative partitioning methods that require a priori assumptions on the best number of clusters [79], even if this solution may not be superior in terms of within-group homogeneity. For the calculation of Euclidean distances, the partial SOC stocks were equally weighted for each depth layer.
The estimation of the optimal cluster number was guided by examination of the scree plot (elbow method) and the stopping rule of Mojena [78]. The estimation of optimal cluster solutions by the latter is based on the threshold ranges of the standardised error sum of squares (ESS). For this, ESS values are standardised with respect to the mean value and standard deviation of the ESS across all clustering stages. Within standardised ESS values ranging from 2.75 to 3.50, Mojena [78] observed the best overall results in terms of the predicted number of clusters for the subsequent clustering stage. To obtain suitable graphical representations of the selected cluster solution, we used the equal-area quadratic smoothing splines of Bishop et al. [31]. Where available, the SOC data of the NFSI depth layer of 90-140 cm were used to improve the fitting at the bottom of the solum. According to the findings of Bishop et al. [31], we set the the roughness parameter λ to 0.1. Only in cases where the fit using λ = 0.1 obviously failed, we increased λ to 1.0 and refitted the the respective spline. The between-cluster distances of the obtained clusters are delineated numerically by the normalised differences between within-cluster means and total means (t-statistics). In addition to the graphical depiction, the within-cluster distances were assessed by the F-statistic, which expresses the ratio of within-group variability to total variability. Finally, depth gradient types could be obtained as arithmetic means of relative depth gradients merged to a single cluster.

Mapping of Relative Depth Gradient Types
To predict the probabilities of individual depth gradient types, we employed the classification and regression trees approach (CART) of Breiman et al. [80]. The usage of relative depth gradients enhances non-linear dependencies between environmental covariates and class probabilities. Therefore, we preferred tree-based approaches over parametric approaches, like stepwise linear discriminant programs. Thus, using a treebased approach, we avoided elaborate transformations of the predictor variables. A further consideration leading to the employment of CART was the metric scale in the majority of the environmental covariates. Compared to other tree classification approaches like Chi-Squared Automatic Interaction Detection (CHAID) [81], CART is able to process intervallevel variables directly without prior grouping. The Gini diversity index [82,83] was used as measure of node impurity in the growing and pruning steps of tree construction. Thus, the obtained trees were optimised to predict the probabilities of the depth gradient types (class probability tree). The most characteristic feature of CART is the determination of right-sized trees by optimal pruning instead of by using stopping rules. We employed tenfold cross-validation and selected the best-performing tree within the standard error (1 SE rule). The random samples used in tenfold cross-validation were constructed with regard to class membership. The random selection was constrained so that the proportions of clusters in each test sample equalled the respective proportions in the total sample. The only 'stopping rule' invoked was that each terminal node should contain at least a minimum of five cases, thus reducing computational effort, but still providing sufficiently large trees for subsequent pruning in almost any situation [80].

Mapping Total Solum SOC
The best covariates for predicting solum SOC, which were determined by stepwise regression, are given in Table 3. The signs of RSP (s) (relative slope position) and s DN (s) (horizontal distance to drainage network) show that SOC increases at low relative slope positions and at toe-slope positions in close proximity to the drainage line. In terms of soil forming processes, this indicates lateral inputs of organic matter due to flooding and depositions by overland flow from higher slope positions. Similar explanations can be given for the increase of SOC with the steepness of catchment slopes β C and the decrease with the stream power index SPI. Erosion within the catchments causes a corresponding deposition at the sites, while at high erosive potential (SPI ), the SOC is lowered by topsoil losses or loss of surface inputs (litter). In addition to these covariates related to erosion by overland flow, the terrain parameter east ×β (product of local eastness and slope) can be considered as an indicator for the redistribution of organic matter by the prevailing westward winds.
An indicator for the vertical translocation of SOC within the soil profile can be seen in the winter precipitation P win . Especially during winter season, when the soil is wet, rainfall strengthens podzolization (e.g., [84,85]), and thus enhances SOC sequestration in deeper soil layers.
Covariates focusing on in situ soil development are z CaCO 3 (depth of calcareous horizon), CF 0...90 (proportions of coarse fragments) andŜ (direct solar beam irradiance). At first glance, the positive correlation coefficient of z CaCO 3 appears contradictory to the in situ effects of calcium. Actually, calcium bridges organic matter and clay minerals and stabilises aggregates [86]. However, in acidic soils without near-surface or even any calcareous layers within the solum (e.g. z CaCO 3 > 3.0 m), soil respiration is often hindered due to the low activity of soil microbial communities under acid conditions. Of course, acid conditions induced by high z CaCO 3 may also enhance podzolization processes. On the contrary, soil respiration usually increases with increasing soil temperatures [87,88] under intense direct insolationŜ. The reduction of SOC stocks at high rock contents CF 0... 90 indicates that the corresponding loss of soil volume is not balanced by higher concentrations of SOC in the remaining fine-earth fraction (see [89]). Table 3. Selected covariates X j for estimation of SOC stocks in the soil solum and the corresponding statistical parameters: regression coefficient b j (10 −4 ), standard error of the regression coefficients s b j (10 −4 ), standardised regression coefficientsb j , t-statistic t j , p-value p j , confidence interval bounds b j − t 95 s b j and b j + t 95 s b j (10 −4 ), and tolerance value T j . The land-use effects on SOC are captured by the proportions of deciduous trees p deciduous . The negative regression coefficient implies the well-known risk of litter loss by wind blowing for leaf litter. On the other hand, this might also signal that SOC stored in the generally thicker forest floor layers at coniferous sites is not transferred to mineral soil at deciduous sites to the same extent. The raised SOC storage at juvenile glacial deposits (series I ) appears as a result of the higher ecosystem productivity in these nutrient-rich soils.

Submodel for Terrestrial Soils (z GW > 200 cm)
The increase of SOC stocks in the presence of high groundwater tables (z GW ≤ 300 cm) is in line with expectations. Carbon mineralisation is more and more inhibited due to the development of anaerobic conditions with rising groundwater levels at these sites. To describe the nonlinear response to groundwater levels z GW properly, the best fit was found for the logarithmic transformation. Larger catchments also result in higher SOC stocks. In addition to the mean depth of groundwater, a large catchment area A C indicates stable, lessfluctuating groundwater levels, thus causing higher ecosystem productivity at low water tables and reduced mineralisation (especially at high water tables). The effect of z CaCO 3 is similar to that of the terrestrial sites. Acidic conditions inhibit soil microbial activity. The decrease of SOC with increasing mass balance index MBI (l ) is in good agreement with net loss due to the lateral transport processes associated with positive MBI (l ) values.
Especially for the various terrain attributes, the occurrence of multicollinearity, masking the individual variable importance, cannot be completely neglected. In particular, within the submodel for the terrestrial soils, some smaller tolerance values were observed. Despite these restrictions, the inspection of standardised regression coefficientsb j clearly indicates a generally high importance of terrain parameters in the submodel for terrestrial soils. The highest absolute value ofb j was found for the slope within the catchment β C . The soil property z CaCO 3 is also of particular importance in this submodel. The distinctly most important predictor in the submodel for soils with near-surface groundwater was, of course, the depth of groundwater z GW itself. The standardised coefficient obtained for the terrain parameter A C must be seen in terms of the groundwater regime rather than terrain effects. Although the number and importance of included variables are quite different, predictors related to nearly every SCORPAN factor are incorporated into the regression equations. The only exception to this is the space factor, indicating that edge effects and stand history are of minor importance.
The p-values p j derived from the t-statistic are small for the majority of the regression coefficients. Increased p-values occur only for coefficients in the submodel for sites with near-surface groundwater. However, even in these cases, the sign and, thus, the direction of effects do not shift within the lower and upper confidence interval bounds, b j − t 95 s b j and b j + t 95 s b j . All in all, the results of the t-statistic suggest statistically reliable regression equations.
The model performance measures are given in Figure 2. As expected, the p-values derived from the F-distribution signal high significance. The number of explanatory variables J entered into both equations is distinctly below the threshold of 10 N/J that was recommended by Hengl et al. [38] to prevent overfitting. Particularly in terms of absolute errors, the norm of residuals s e confirms that the performance of the groundwater submodel is slightly poorer. The increased differences in this domain can also be seen clearly from the scatter plot of the measured and predicted values ( Figure 2). These differences between both submodels mainly result from the high variability of SOC in the group of soils with near-surface groundwater. Thus, the coefficients of determination, r 2 and r 2 adj , are quite similar in both models. The scatter plot shows no extreme outliers or systematic deviations. Merely by viewing the data of both models at once, some heteroscedasticity appears. Within the single models, though, the observed overall heteroscedasticity has no effect on the estimation of confidence intervals [77]. The assessment of random fluctuations due to the limited size of the calibration sample is shown in Figure 3. The cross-validation estimates of r 2 cv derived for the terrestrial soil models are in good agreement with the values obtained from the calibration sample. In the case of models for hydromorphic conditions, the agreement between the calibration sample and cross-validation estimates of r 2 cv is slightly poorer. In comparison to the cross-validation results for the group of terrestrial sites, the fluctuations of r 2 cv across the repetitions are distinctly higher for the hydromorphic sites. Quite similar relations can be discovered by inspecting s e and RMSE cv . The cross-validation estimates RMSE cv are almost as small as the respective calibration sample estimates s e . The fluctuation of RMSE cv across validation runs is nearly negligible. Again, the situation is different for the crossvalidation estimates derived for hydromorphic sites, showing slightly increased values that vary substantially over the performed repetitions. Altogether, these observations are in line with the findings of Browne [70] regarding the use of cross-validation in regression applications: If the calibration sample is sufficiently large, the model performance measures yielded by calibration and cross-validation are essentially the same. On the other hand, the cross-validation clearly unveils the slightly optimistic impression of model performance that was originally derived for the small sample of hydromorphic sites. The indicators b 1 cv and CCC cv adopted from assay and instrument validation also reflect this dependence on sample size. By definition, the slope b 1 between the observed and predicted values equals the coefficient of determination when this performance measure is used in conjunction with regression analysis (applies to the calibration sample).
Thus, the cross-validation estimates r 2 cv and b 1 cv also reach a comparable level, as long as the sample size is large. CCC cv extends the Pearson correlation coefficient by an additional penalty for deviations of b 1 cv from the 45 • line. As a consequence of this, squared values of CCC cv are distinctly lower than r 2 cv each time.

Geostatistical Analysis of Residuals
After selecting the shortest separations from all 89,676 distances between the 424 sample points, the obtained mean of 3.684 km served as a starting point to choose the lag increment. For practical reasons, e.g., plotting the variogram, the lag increment was finally set to 4.0 km. In the range 0 to 200 km, the mean number of observations contained in each of the 51 lags was 1752. Only at lags > 150 km did the numbers of cases fall below 1000.
The experimental autocorrelation coefficients ρ(h) are shown in Figure 4. The autocorrelogram shows an immediate decline of autocorrelation coefficients within the first lag interval. At all subsequent lag distances h ≥ 2 km, the autocorrelation coefficients show a narrow range of scatter around zero. As a consequence of the considerable number of negative coefficients, it is not possible to compute the autocorrelation length with the exponential autocorrelation function (Equation (4)) commonly used for this purpose [90].
Even as the autocorrelation length reaches close to zero, the sum of coefficients cannot be balanced by the exponential autocorrelation function. Figure 4 shows an alternative fit that assumes all negative coefficients as zero, yielding an autocorrelation length of 1.1 km. These findings imply that autocorrelation is of minor importance at the scale analysed. Likewise, this can be taken as an indication that the main influences affecting SOC in the study area are already captured by the regression models. This is very similar to the findings of Zirlewagen and von Wilpert [91] on the regionalisation of SOC in the forest soil of Baden-Wuerttemberg (Germany). They also observed only very weak autocorrelation coefficients for residuals remaining after regression analysis.  The values of the sample variogram ( Figure 4) reach the sill within a range of 100 km. After reaching this maximum, the semivariance shows a hole effect, which typically arises from regular repetitions [75] or spatial variation occurring at several scales [90]. The observed regularly oscillating semivariances are possibly caused by the periodically arranged glacial deposits in the north-eastern lowlands resulting from successive ice advances. Especially within the study area, the distance between moraines and related landscape elements left by the successive glacial stages is quite similar to the observed periodicity of the semivariogram [92,93]. Although variables like z CaCO 3 and CF 0...90 were entered into the regression equations, these effects may be enhanced by a slight under-representation of the corresponding predictors (e.g., particle size classes) within the regression models.
For modelling the variogram, we selected an exponential function with an additional nugget component (Figure 4). The range for fitting the function to the experimental values was restricted to 100 km, thus neglecting the hole effect at larger lag distances. Following Webster and Oliver [75], we did not fit the variogram model to the hole effect, as there was no clear theory for the observed periodicity. In addition, the use of increased weight factors at high distances in the kriging procedure would not support the desired incorporation of unknown large-scale effects. The combined variogram model achieved a high goodness of fit (R 2 = 0.874). The parameters estimated using the Levenberg-Marquardt method [94] indicate that pairs of residuals are spatially correlated up to a distance of 108 km (ca 3 ×r) until the model approaches the maximum semivariance (sill c). However, nearly half of the spatially dependent variation occurs at distances below the lag increment or is contributed by measurement errors, as can be seen from the fairly high nugget component c 0 [90].
Although modelling the variogram yielded a properly fitted equation with high goodness of fit, solving the kriging equations based on it had only a low predictive power. Plotting the scatter of residuals retained from regression analysis e Reg against their estimates obtained from leave-one-out cross-validation showed only a poor correlation ( Figure  5). When the kriging estimates are used to improve the SOC stocks predicted by regression equations, the conducted correction actually brings a slight deterioration in model performance. Analogously to autocorrelogram analysis, these results show that a high degree of large-scale variation is already explained by regression. In this study, the use of geostatistical estimates can thus be omitted.

Depth Gradients
In the first step, relative depth gradients of the individual inventory plots are grouped into depth gradient types by clustering. Throughout the stagewise proceeding of the clustering process, the number of clusters decreases, starting with the number of observations (424), until all sites and clusters are unified ( Figure 6).
While the the number of clusters decreases, the ESS based on squared Euclidean distances increases. After discarding the initial partition of 424 clusters and the final onecluster solution, 422 possible cluster solutions remain. A first approximation of the optimal number of clusters with the elbow method suggests a solution with five clusters, since the most distinct increase in the ESS appears after the solutions with four clusters. Regarding the standardised ESS values obtained by clustering the relative depth gradients of SOC, the stopping rule of Mojena [78] indicates an optimal solution in the range of 5 to 8 clusters. Finally, taking into account practical considerations on handy cluster sizes that facilitate interpretation and prediction of the resulting depth gradient types, we selected the solution with five clusters for the subsequent analyses. The five relative depth gradient types obtained and the sites assigned to the respective clusters are shown in Figure 7. The most distinct differences between the five clusters can be seen for the forest floor and top soil layers. This is also indicated by the t-statistics. In these depth increments, the t-statistics attain the highest absolute values. In contrast, close to the bottom of the soil solum, the derived depth gradient types seem very similar, which is also expressed by small t-values in these depth increments. The cluster C 5 stands out as having the highest proportions of SOC stored in the organic layer of any cluster. Consequently, C 5 is also characterised by the smallest SOC portions inside the mineral soil with t-statistics that are invariably negative. An opposite depth gradient is typified by cluster C 3 , which contains the lowest proportions of SOC inside the organic layer. It also shows the highest t-values for any mineral soil layer below 10 cm depth. Cluster C 4 is in close accordance with C 3 concerning the forest floor, whereas the mineral soil SOC is closely bunched in the upper parts of the profile. The clusters C 1 and C 2 can be seen as intermediate types between C 3 and C 5 in which increasing proportions of SOC are stored in the forest floor layer.
It does not seem far fetched to bring the derived depth gradients in line with soil types and humus forms. Actually, the clusters C 3 and C 4 often coincide with the humus forms of mull and mull-like moder, respectively. The humus forms of mor humus and mor-humus-like moder appear most frequently within the clusters C 2 and C 5 . Soil types were merely in rough accordance with the derived depth gradient types. Lessives tend to occur preferentially in cluster C 4 . Additionally, the frequencies of arenosols and podzols are increased in the clusters associated with mor humus forms C 2 and C 5 . The loose relationship between clusters and soil types may be a consequence of the fact that vertical distribution of SOC is not a diagnostic criterion for many soil types. On the other hand, the observed differences between soil types and humus forms indicate that the vertical distribution of SOC throughout the soil profile is often altered by current land-use and stand history.
The plotted depth gradients (Figure 7) show a reasonable homogeneity. The assessment of within-group variability by the computed F-statistics supports this impression: Overall, with F-statistics below 1 in the majority of cases, the obtained clusters can be considered as homogeneous ( [77], p. 447). The 'mor humus cluster' C 5 is by far the most homogeneous cluster. The occurrence of increased F-statistics is limited to deeper soil layers. Due to the small proportions of SOC stored in these depth increments, the higher variability can be widely neglected. In fact, the increased F-statistics within the deeper layers can be seen as a result of the equally weighted distances used for hierarchical grouping.
Compared to the variability observed within the upper parts of the soil, also these layers are very homogeneous, as shown by the line graphs in Figure 7.

Mapping of Relative Depth Gradient Types
The selection of an optimum-sized classification tree for the prediction of relative depth gradients is shown in Figure 8. The starting point of the selection process is the initially grown much-too-large tree with 18 terminal nodes | T k | and the 10 respective auxiliary trees grown on subsamples for cross-validation. Subsequently, the weakest subbranches are incrementally pruned off by minimal cost-complexity pruning with respect to the complexity parameter α k . This results in a sequence of 18 subtrees (and 180 auxiliary subtrees) with decreasing complexity, finally ending at the root node (| T k | = 1). Since α is used as a penalty for complexity in the pruning procedure, the complexity parameter α k consequently increases at every stage. As one might expect, the resubstitution estimates for the risks R(T k ), which directly express the diversity within the terminal nodes, monotonically increase with each pruning stage. In contrast, the misclassification rate of the auxiliary trees is highest when sequential pruning is started. Hence, after a phase of decreasing cross-validation estimates R cv (T k ) and reaching of the minimum 0.689 at k = 15, R cv (T k ) increases again and equals R(T k ) at the end, when all subbranches of the auxiliary tree are pruned off and just the root nodes remain. In addition to using R cv (T k ) as an honest estimate of the misclassification rate, Breiman et al. [80] introduced the 1 SE rule, additionally invoking the standard error SE, to guide the user towards a more stable tree selection. Using the 1 SE rule, the subtree of the smallest complexity within the range of ±SE around the minimum value of R cv (T k ) should be selected. Keeping close to this rule, the tree T 16 , which contains just three terminal nodes | T k |, should be selected. It is obvious that a classification tree pruned back to three terminal nodes is not able to predict five clusters. Therefore, we used a less rigid interpretation of this rule and selected tree T 14 . This is exactly the smallest tree within the range of ±SE around the minimum of R cv (T k ) that makes it feasible to predict any distinguished depth gradient type.  | T k | Figure 8. Selection of the best pruned subtree by tenfold cross-validation: resubstitution R(T k ) and cross-validation R cv (T k ) estimates of risks for the sequence k of subtrees and auxiliary subtrees derived by minimal cost-complexity pruning according to the complexity parameter α k . | T k | denotes the number of terminal nodes of the respective subtrees. SE is the estimate for the standard error of R cv (T k ).
The obtained classification tree T 14 is shown in Figure 9. In addition to the splitting rules determined, the frequencies of the clusters contained in each node are also given. The frequencies shown for the root node thus equal the number of clusters inside the total sample. A quite distinct split between clusters associated with thick organic layers, C 2 and C 5 and the clusters with high fractions of SOC contained in mineral soil, C 3 and C 4 , can thus be achieved by the proportions of deciduous trees within the forest stand. Actually, in sum just nine sites of the types C 2 and C 5 remained within the right subbranch (p deciduous > 92%). The clear recognition of depth gradient types associated with mor-humus-like humus forms by the proportions of deciduous trees is in line with the well-known preference of coniferous stands for developing thicker organic layers.
The next and final split on the right is the SOC stored in the soil solum, and it separates cluster C 3 from cluster C 4 . This separation reflects the usually less distinct gradients of SOC across the mineral soil layers in soils rich in organic matter, like gleysols, or soils containing colluvic material.
The left branch, which contains the sites with higher proportions of deciduous trees, is subsequently split by stand age t stand . In particular, the 'mor humus cluster' C 5 is rarely observed in juvenile coniferous stands. Presumably, this must be seen in terms of stand history and the more recent forest plantings. These were often combined with tillage procedures, removing the organic layer, or transferring it to the mineral soil. In the past, clear-cut logging was an established technique widely used throughout the north-eastern lowlands. Especially since the 1960s, the combination of clear-cut logging with intense tillage procedures became a popular method [95]. To facilitate the ploughing, tillage was often preceded by stump removal with bulldozers. Even in cases where no-tillage planting was performed, the necessary clearings stimulate the mineralisation of the organic layer through warming of the forest floor and increased light exposure.  Finally, the subbranch of more mature coniferous forest stands is further differentiated by clay content within the mineral soil C% 0... 90 . At sites that are extremely poor in clay, the highest proportions of SOC are stored inside the organic layer (C 5 ). If at least some clay is present within the soil profile, the cluster C 2 is most frequently found. As described above, the main difference between the two clusters is the less steep gradient of SOC from the organic layer to the mineral soil. The observed shift of SOC from the organic layer to mineral soil with increasing clay content can be seen as a result of higher soil biological activity. In these usually less acidic and dry soils, an increased bioturbation by earthworms can be expected. Furthermore, this may point to the reduced ability of clay-poor mineral soils to stabilise SOC in organic-mineral complexes [86,96,97]. In mineral soils containing at least some clay, clay minerals increase the proportions of the pore space, where carbon is protected against bacteria and enzymes.
In addition to assessing the accuracy of the tree predictions with risks R(T 14 ) and R cv (T 14 ), based on the node impurities and, hence, expressing the risk for erroneous class probabilities, we also computed the error matrix of Congalton and Green [98]. For building the error matrix, we used the most frequent cluster within the individual terminal nodes of the tree for classifications ( Figure 10). As one might expect, the p-values clearly show that the classifications of each individual cluster (p K i ), as well as for the entire error matrix (pK), are significantly better than random results. Referring to the nomenclature of Landis and Koch [99], the relative strength of agreement associated with the derived values of conditional KHAT statisticK i can be considered as moderate in the majority of cases (KHAT is an estimate of Kappa). The classification achieved at least fair accordances for any of the clusters. For the overall classification,K also indicates a fair strength of agreement between the reference and classified data. The poorest results in terms ofK i are produced for cluster C 1 . This can be seen as a result of the anthropogenic origin assumed for this depth gradient, making predictions challenging. In total, the hit rate of the classification tree expressed by overall accuracy (OA) slightly exceeds 50%. The depth gradient with the highest individual hit rates (producer's accuracy (PA) and user's accuracy (UA)) is the most frequently observed cluster C 5 , which is common in mature pine stands on sandy soils. The ratios of row-(n i+ ) and column-sums (n +j ) are reasonably balanced, so the predictions are unbiased and no depth gradient is over-predicted. 28 Figure 10. Cross-tabulation of predicted vs. observed depth gradient types. Accuracy measures of the error matrix are denoted as: row total n i+ , column total n +j , producer's accuracy (PA), user's accuracy (UA), overall accuracy (OA), KHAT statisticK, conditional KHAT statisticK i , and p-values pK of respective KHAT statistics.
By combining the tree estimates of the relative depth gradient probabilities with the estimates of the SOC stocks within the entire soil solum derived by regression analysis, the volumetric SOC content of single layers can be calculated. The application of the complete methodology for making predictions for NFSI depth layers then results in the estimates presented in Table 4 and Figure 11. In addition to the total model performance measures, values based on single layers-thus excluding depth as an independent variableare also given. The p-values derived from the F-statistics signal highly significant prediction of SOC in every layer considered. As one might expect, the scatter plot ( Figure 11) shows a clear decrease of absolute errors with increasing depth, since SOC generally decreases with increasing soil depth. This decrease of absolute errors and variability of SOC is also confirmed by the norm of residuals and the standard deviations of volumetric SOC. Both measures are highest within the forest floor layer and lowest in the 60-90 cm layer of the NFSI.
In terms of the predictable variation expressed by the coefficients of determination r 2 and r 2 adj , the model performance is highest within the 10-30 cm layer. It then decreases towards the the mineral soil surface, as well as with increasing soil depth. The relatively low coefficients of determination within the upper parts of the mineral soil probably result from steep vertical gradients of SOC. Inside these thin near-surface layers, a high degree of disturbance can also often be expected. Furthermore, the results within these layers suffer most from sampling artefacts related to segregation of the organic layer from the mineral soil. Close to the bottom of the soil solum, the coefficients of determination are quite low. However, for the majority of practical applications, this may be balanced by the small SOC contents within these layers in total. Within the organic layer, the coefficients of determination are again fairly high. However, especially in the range of small SOC contents, several outliers that overestimate the SOC stored within the organic layer occur (Figure 11). Facing thin forest floor layers, the distinction between the mineral soil and the organic layer during sampling becomes more and more challenging. In extreme cases, very thin organic layers are 'overlooked' during sampling and are finally sampled together with the first layer of the mineral soil. Thus, the slightly biased estimation observed at low amounts of SOC within the organic layer must not necessarily be taken as a sign of poor model performance. Due to the generally high predictive power of soil depth on SOC content, the coefficients of determination increase markedly if these are computed using all observations jointly. Model performance also increases with increasing layer thickness. If the SOC content within the total mineral soil (0. . . 90 cm) is predicted, the coefficients of determination are similar to the values calculated for total solum stocks.
All in all, the model performance measures obtained from the calibration sample are well confirmed by the respective cross-validation estimates. In particular, the concordance between s e and RMSE cv is promising for an estimation of the prediction errors that is not overly optimistic. Larger discrepancies between the calibration and cross-validation estimates are limited to the depth layers at 0-5 and 60-90 cm. As indicated by the doubled standard deviations, the results of the repeated runs of cross-validation are also less stable within these depth layers. Since CCC cv , which comprises an additional assessment of slope, is a much stricter model performance indicator than r is, the squared values of CCC cv are again distinctly lower than r 2 cv . The application of the derived models for mapping SOC stored in the forest soils of Brandenburg is shown in Figure 12. As examples, the geographic distributions of SOC within the six NFSI depth layers used for model development are shown. Nevertheless, whenever required, the usage of the relative depth gradient types enables the user to predict any depth layer of interest. The cartograms are based on estimates derived for a grid resolution of 100 m. To prevent the estimates from drifting beyond physical bounds, the range of values of the predictor variables was constrained. Cut-offs were set with respect to the limits that are present in the NFSI data used for model development. This became necessary because, in case of the grid, which contains one million individual grid cells, the sample size was drastically enlarged compared to the NFSI sample. Associated with this is a broadened range of values for the environmental covariates, resulting in extreme combinations of predictors in single grid cells. Similarly, according to the soil types that were present in the training sample, histosols, which were not within the scope of the conducted analysis, were excluded from mapping. To obtain a proper visualisation of the spatial distribution for the majority of sites, the colour map legends of the final cartograms were made to span between the 5th und 95th percentiles.
The cartograms for every depth layer drawn show a high spatial variation. Especially at small spatial scales, a high heterogeneity can be observed. As can be seen within the example landscape extract under the magnifying glass, even at the smallest distances, changes of SOC equivalent to the total range of values appear. In particular, the SOC contents of the three upper mineral soil layers attain magnitudes that may significantly contribute to soil water storage and nutrient status. The SOC contents also vary in a range wide enough to substantially alter the soil's physical and chemical properties, which underlines the practical importance of regionalisation. Pursuant to the classification given by Arbeitsgruppe Boden [74], the equivalent soil organic matter contents within the 0-5 cm layer range from medium (h3) to very high (h5) values. Even within the 10-30 cm layer, the h2 class still occurs frequently. The SOC contents of the two subsoil layers are generally very low (h1), and are thus usually only of minor importance for ecological site characteristics. On the other hand, only small quantities of hydrophobic compounds are required to form water-repellent particle coatings [100]. In particular, under acidic conditions, the SOC within the subsoil layers may still contribute to prolongation of preferential flow paths. However, in terms of carbon sequestration, the amounts of SOC actually stored within these thick subsoil layers are essential reservoirs (see below).
In addition to the predominant small-scale patterns caused by terrain and land cover, some large-scale spatial gradients of mineral soil SOC can also be found. The SOC contents are slightly increased in the climatically wetter regions in the north-west and south of the study area. Amongst the large-and small-scale variations, some spatial clusters also become apparent at medium scale. These areas reflect the effects of parent material, which is usually closely related to the glacial deposits within the north-eastern lowlands. Increased SOC contents occur in the area of the post-glacial floodplains, whereas outwash plains are very poor in SOC.
On the small scale as well as on the large scale, the spatial patterns of SOC stored within the organic layer are highly independent from those observed for any layer of the mineral soil. Even the vertically directly adjacent mineral soil layer at 0-5 cm shows a widely different spatial distribution. At first glance, this is not a surprise, since SOC storage in both compartments underlies distinct mechanisms. Whereas the amounts of SOC within the forest floor layer are highly controlled by short-term influences, such as tree species compositions or forest management and stand history in general, the SOC storage is usually more closely related to long-term factors, like topography or parent material. On the other hand, these findings underline that the depth gradient types approach is flexible enough to cope with these different sources of variation, despite the fact that the horizontal distribution of the total SOC stocks is done using the same regression model. Table 5 gives a small example of potential utilisations within the context of carbon sequestration: Across all NFSI depth increments, the highest amounts of SOC are stored in the organic layer by far. In total, more than one-third of the solum SOC is stored within the forest floor. However, due to the thickness of the subsoil layers, the deeper NFSI depth increments also substantially contribute to the stock. Although sites influenced by a near-surface groundwater table merely cover 12.5% of the forested area of Brandenburg, considerable proportions of SOC are stored in these soils. Especially when the stable carbon pools within the mineral soil are considered, the proportion of 20.9% of stored SOC is distinctly larger than the respective area percentage.

Discussion
Since soils and the responsible soil forming factors are unique to the spatial domain under consideration, objective comparisons with studies conducted elsewhere are likely to fail. These difficulties are enhanced by the fact that the available environmental covariates also strongly depend on administrative units. This is especially the case for soil and geological maps. However, despite these limitations, the underlying principles of SOC storage in soils are, of course, transferable to other regions, and thus, some similarities can be observed.
A strong influence of terrain attributes associated with catchment area on SOC was already found by the pioneers in the field of soil attribute prediction through digital terrain analysis [18]. Similar findings have also often been reported in more recent studies. The TWI (topographic wetness index) was one of the best predictors for mapping soil organic matter in Croatia [38]. A high regression coefficient of TWI was also found for mapping mineral soil SOC within the pre-alpine lowlands [91]. Concerning terrain parameters, the most similar results to those of this study were reported by Grimm et al. [20], who found A C , RSP ( ... ), LS (LS factor from the Revised Universal Soil Loss Equation), and TWI as variables with the highest importances for mapping SOC within the topsoil layer on Barro Colorado Island. Slightly different rankings of variable importance were given by Ballabio [16]: For the prediction of SOC stored in A horizons within the Italian Alps, elevation,Ŝ, and Ψ S (sky-view factor) had the highest relative influence, whereas TWI and LS were ranked third and fifth. The findings of Zushi [101], that for the prediction of topsoil SOC stocks in Japanese cedar plantations with multiple linear regression analysis only β and Ψ S were selected by stepwise variable selection, while TWI and north (northness derived from local aspect) had no significant influence, seem even more contradictory to the variable importances observed here.
Beyond that, there are also landscapes where topography appears to be behind other soil forming factors in general. In addition to studies that focused on soil properties and land-use effects [45], in some cases, the influence of terrain on mineral soil SOC was low, even where it was explicitly tested [19,21,102]. Increased influences of landcover classes and tree species compositions were generally observed when SOC within topsoil or forest floor layers was predicted [21,91,102]. This is in good agreement with the superior importance of p deciduous in the CART model for the prediction of relative depth gradients. In studies where data on the depth of the groundwater table were available, it was naturally one of the main factors that controlled SOC at the respective sites [45]. The fairly high influence of z CaCO 3 in our regression models must be interpreted in relation to the often-observed influences of mapping units [19,21,102], since this covariate seems to be a property unique to local forest site mapping.
The achieved model performance is in a range comparable to that of other studies dealing with the prediction of SOC for numerous depth layers [32,36,37]. With low values near the upper and lower boundaries of the soil solum, the coefficients of determination also often exhibit similar depth dependences [20,33].
The reported improvement of predictions by kriging the residuals varies widely throughout the literature. As the improvement of regression estimates by subsequent geostatistical interpolation is the underlying idea of regression kriging, corresponding improvements are usually found [32,38,103]. However, there were also studies that found no substantial improvements [91,104]. Similarly to our results, these studies were conducted in highly structured landscapes with relatively low sampling densities. As already stated by McKenzie and Ryan [105], in these situations, the weak spatial dependence diminishes the advantages of kriging. With this in mind, it is not surprising that even the advantage of regression may disappear if the landscape of the study area is homogeneous and the sampling density is high [106,107]. In these situations, pure spatial interpolation methods become superior to regression (kriging). In addition to landscape properties and sampling densities, these relations are, of course, strongly affected by the availability of suitable environmental covariates at a proper spatial scale.

Conclusions
The concept of relative depth gradients offers a straightforward framework to ensure consistency of depth gradients and model interpretability, whereas other methodologies, such as SCORPAN modelling of depth function parameters, require further constraints.
We suppose that our amalgam of soil depth functions and digital soil mapping can be a handy methodology whenever coherent depth gradients are of special importance for the subsequent analyses. It may also offer a useful methodological framework if the interpretability of covariates that control the horizontal and vertical distribution of SOC is of particular interest. Since SOC stocks were estimated as a whole and an acceptable model performance for single layers was still achieved, the usage of relative depth gradients may help to lessen the propagation of sample separation errors. Therefore, the concept of the relative depth gradients can be beneficial if the regionalisation of SOC serves as a basis for hydrological or geochemical modelling.
Further interesting fields of application may be found in scenario-based modelling of the land-use and climate effects on SOC storage and distribution.
Funding: This study was partially supported through a doctoral scholarship from the Faculty of Forest and Environment of Eberswalde University for Sustainable Development 'Arbeitsgruppe Forschung'.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.