Evaluation of the Terrestrial Ecosystem Model Biome-BGCMuSo for Modelling Soil Organic Carbon under Different Land Uses

: Soil organic carbon (SOC) is a mandatory pool in national inventory reports on greenhouse gas (GHG) emissions and removals to the UNFCCC. Hence, its accurate assessment is important. Modelling SOC changes for national GHG reports is encouraged, but the uncertainty related to this pool still presents a signiﬁcant challenge; thus, verifying modelling results with ﬁeld observations is essential. We used the process-based model Biome-BGCMuSo and assessed its suitability for use in Croatia’s GHG reporting. We modelled SOC stocks in the top 30 cm of the mineral soil layer (SOC 30 ) for four different land-use (LU) categories (Deciduous/Coniferous Forest, Grassland and Annual Cropland) distributed in three biogeographical regions (Alpine, Continental and Mediterranean) and compared them with results of a national soil survey. A total of 573 plot level simulations were undertaken and results were evaluated at three stratiﬁcation levels (LU, LU × biogeographical region, and plot). The model reproduced the overall country mean of SOC 30 with no overall bias, and showed good performance at the LU level with no signiﬁcant ( p < 0.05) difference for all LUs except Deciduous Forest (11% overestimation). At ﬁner stratiﬁcations, the model performance considerably worsened. Further model calibration, improvement and testing, as well as repeated soil survey are needed in order to assess the changes in SOC 30 and to evaluate the potential of the Biome-BGCMuSo model for use in GHG reporting.


Introduction
Soil organic carbon (SOC) is one of five mandatory pools in national reports on greenhouse gas (GHG) emissions and removals that must be submitted pursuant to the United Nations Framework Convention on Climate Change (UNFCCC) and accounted for under the Kyoto Protocol (KP) [1] and Paris Agreement (PA) [2]. Carbon (C) emissions and removals within the SOC pool are included in the land use, land-use change and forestry (LULUCF) sector of the National Inventory Reports (NIR) of the Parties to the UNFCCC. which are representative of specific biomes [46,47], and subsequently used for simulations and/or predictions at various temporal and spatial scales [18,43,48]. To account for spatial variability within a specific ecosystem, multi-site calibration and validation are recommended [49,50]. As a relatively small country in Europe, but with very high biogeographical diversity, Croatia is an ideal area for testing the model performance for specific ecosystems in different biogeographical regions.
Although C stock changes are more important than current C stocks for GHG accounting, in order to obtain reliable estimates of C stock changes it is important to first be able to properly estimate current C stocks. Therefore, the aim of this research was to test the suitability of the Biome-BGCMuSo terrestrial ecosystem model for modelling SOC stocks in the top 30 cm of the soil mineral layer, at the national scale, under different land uses, through comparison with field data from a national soil survey.

Study Area
The research focused on the total area of Croatia, which is distributed among three biogeographical regions: Continental, Alpine and Mediterranean ( Figure 1) [51]. The Continental region covers the northern and central parts of Croatia, and is bounded to the south by the Dinarides mountains. It is mainly an area of lowlands and low hills at 200-500 m a.s.l., with Pannonian Mountains reaching 1059 m (Mt Ivanščica) in the north. It is characterised by a temperate rainy climate with warm summers and cold winters, and two precipitation maxima classified as Cfwbx" according to the Köppen classification [52]. The mean annual temperature (Tair) is around 10 • C and annual precipitation (P) ranges from approximately 600 to 1200 mm following an east-west gradient [53]. The soil parent material is mainly alluvial deposits (lowlands) and silicate rocks (hills and mountains) [54,55]. The geological background consists of magmatic, clastic and metamorphic rocks [56,57]. The Alpine region of Croatia is situated between the Continental and Mediterranean regions, and covers the central belt of the Dinarides (i.e., high Dinaric Alps) at elevations mainly above 500 m and with the highest mountain peak at 1831 m a.s.l (Mt Dinara). The climate of the Alpine region is characterised by relatively short and cool summers and long winters with abundant snowfall (Cfsbx") [52]. According to the Zalesina meteorological station, situated at 750 m a.s.l., Tair is 6.6 • C and P is 2021 mm, whereas above 1200 m a.s.l. a harsher climate prevails (Dfsbx" according to the Köppen classification) [52] with Tair of 3.6 • C and P of 1892 mm (Zavižan meteorological station at 1594 m a.s.l., for the period 1961-1990) [53]. The soil parent material is mainly limestone and dolomite [54,55], developed on the geological background consisting dominantly of carbonate Mesozoic rocks [56,57]. The Mediterranean region in Croatia covers coastal areas and islands in the Adriatic Sea. It is diverse in orography, with an elevation range of 0-1000 m a.s.l. The climate is warm with hot summers and mild winters, and classified as Csa according to Köppen [52], with Tair of 13.1 and 15.5 • C, and P of 1177 and 936 mm, respectively. The soil parent material is mainly carbonate. Both the Alpine and Mediterranean regions of Croatia feature karst landforms (e.g., sinkholes and caves) and disappearing streams.

Field Measurements and Laboratory Analysis
Soil survey in Croatia was conducted in line with the IPCC guidelines [10], during 2015 and 2016, at 725 plots distributed among different LU categories (Figure 1). At each plot within the Forest land category, 5 m from the plot centre, mineral soil was sampled at 6-8 positions in the directions N, NE, E, SE, S, SW, W and NW. Soil cores were taken with a split tube sampler (Eijkelkamp) to the total depth of 30 cm and divided into 10 cm deep layers. In the case where the use of a sampler was not possible due to the high content of rocks at the site, soil samples were collected from one soil pit sampled in three layers (0-10, 10-20 and 20-30 cm). In cases when the soil was less than 30 cm deep, the soil depth was recorded. At plots within LU categories other than FL, the soil was sampled similarly, but only at 4 positions per plot due to lower heterogeneity in comparison with the forest soils. An additional difference was the sampling within the Annual Cropland LU category, where soil samples were collected from two layers (0-20 and 20-30 cm). Soil sampling in two separate layers (0-10 and 10-20 cm) at plots within AC LU was not justifiable due to soil tilling and mixing of the topsoil layer.
At all non-rocky sites, soil bulk density was estimated by taking three soil samples per layer using Kopecky cylinders. At rocky sites, soil bulk density was estimated from the soil pit using inert quartz sand for estimating the volume of the soil pit according to the ISO standard (ISO 11272:1998). Soil samples were sieved, oven-dried and weighed. For every plot, soil samples from a given layer were then pooled together, homogenised and analysed for soil texture and carbon content. Soil organic carbon (SOC) of soil samples was determined using a Thermo Scientific FLASH 2000 Soil Nitrogen and Carbon analyser. This method allows direct measurements of the total carbon (TC). Carbonates were removed by treating soil samples with 4 M HCl, and heating in a centrifuge tube sitting in a hot block for 2 h. The insoluble residue was washed with Milli-Q water and centrifuged (2×), freeze dried and weighed. The carbon content of the insoluble residue after HCl treatment is the soil organic carbon (SOC).

The Biome-BGCMuSo Model-Parameterisation and Input Data Collection
In this research we used a model Biome-BGCMuSo 4.0., a biogeochemical model that simulates the storage and flux of water, carbon and nitrogen in the soil-plant-atmosphere system [34]. Although newer model versions are also available, version 4.0 has been extensively evaluated [34]. The main improvement of the Biome-BGCMuSo in comparison to the original Biome-BGC model is a multilayer soil sub-model that has 7 layers (i.e., 0-10, 10-30, 30-60, 60-100, 100-200, 200-300 and 300-1000 cm). The Biome-BGCMuSo simulates soil temperature, soil water content, root mass proportion, soil organic C and mineral N, soil moisture stress index with accompanying senescence, and maintenance root respiration flux in each of the layers. The decomposition process is represented as a converging cascade scheme in which organic material, initially passed from plant pools to litter and coarse woody debris pools, decomposes and passes through a cascade of different pools with specific turnover times [60]. It should be noted that the decomposition scheme of Biome-BGCMuSo corresponds to the decomposition scheme, together with specific decomposition constants, used in the CLM-CN model [60]. In each step, organic material is partially respired following the decay rates described by the first-order kinetics [11,23,61]. Another major feature of Biome-BGCMuSo v.4.0 is the implementation of several management options, i.e., harvest, ploughing, fertilisation, planting, thinning and irrigation. The model is freely available online with the source code and full documentation at http://nimbus.elte.hu/bbgc/index.html (accessed on 23 July 2021).
Three main datasets are needed as model inputs: daily meteorological data, sitespecific data and ecophysiological parameters. Optionally, the user can provide yearly atmospheric CO 2 concentration, yearly atmospheric N deposition and management data. Meteorological data needed as model drivers are daily maximum and minimum air temperature, daily precipitation amount, daylight mean global radiation, vapour pressure deficit and geographical coordinates used by the model for the calculation of the daylength.
Spatially explicit data that fully match these requirements are available for Central Europe within the FORESEE database [62]. FORESEE v3.1 is an open-access gridded meteorological database with a spatial resolution of 1/6 • × 1/6 • , spanning between 10 • 55 -28 • 05 E and 42 • 35 -51 • 05 N, and covering the 1951-2100 period with observations and climate model-based projections.
Site-specific data include soil texture, maximum rooting depth, long-term mean air temperature and air temperature range, elevation, latitude and shortwave albedo. These data were estimated from field observations (soil texture, and maximum rooting depth), from the processing of FORESEE database (air temperature) and its ancillary data (elevation and latitude) and based on expert judgment (albedo). Lists of ecophysiological constants or parameters (EPC list) are biome specific or species specific. Lists of ecophysiological parameters used in this research are provided in Table A1. We used EPC lists for Grassland, Cropland and Deciduous forest published in   [34], and adjusted a number of parameters according to Dalrymple and Dwyer (1967) [63], Barbosa  Of the optional input data, we used yearly atmospheric CO 2 concentration data from Mauna Loa Observatory [68] and ice cores [69], and yearly atmospheric N deposition data were estimated from the literature [70]. LU category-specific data on average management practices were estimated based on national regulations (Forest land), statistical yearbooks, through consultations with local experts and authors' experience (Cropland and Grassland) (Table A2).

Soil Organic Carbon (SOC) Modelling
We performed spatial modelling of SOC down to 30 cm (SOC 30 ) for four different LU categories, i.e., Deciduous forests (DF), Coniferous forests (CF), Annual cropland (AC) and Grassland (GL). By modelling within the selected four LU categories, we accounted for 4310.4 kha or 76.2% of Croatia's land territory ( Table 1).
The model simulation had three phases: spin-up, transient and normal run. Spin-up (self-initialisation or equilibrium run) is a widely used method for the estimation of the initial conditions of biogeochemical models [71]. Using this method, long-term simulations are initiated starting from zero SOC and reusing the meteorological data that drives the simulations. The aim of the spin-up simulation is to reach equilibrium with the current climate using predefined N deposition and atmospheric CO 2 concentration values. It was simulated for 6000 years, using: repeating meteorology from 1951-1999, fixed atmospheric CO 2 concentration of 290 ppm, fixed atmospheric N deposition of 0.0002 kg N m −2 yr −1 and annual fire-mortality rate of 0.002 (personal assessment). The transient run was simulated for 100 years from 1900 to 1999, using the same meteorology as in spin-up, estimates (1900-1957) [69] and records   [68] of atmospheric CO 2 concentration, estimates of the atmospheric N deposition [70], and LU-specific management activities (Table A2). The purpose of the transient run is to alleviate undesired ecosystem behaviour caused by a sharp change in atmospheric CO 2 concentration and atmospheric N deposition between the spin-up and normal simulation. The normal run was performed for the period 1990-2016 using meteorology data from the FORESEE database [62], atmospheric CO 2 concentration from Mauna Loa [68], atmospheric N deposition [70] and LU-specific management activities (Table A2).
Meteorological data, site-specific data and ecophysiological parameters were all spatially explicit and their intersection at each sample plot resulted in a table that contained all the specific attributes needed for the model run. Of 725 sample plots within the soil survey, 629 plots were under four LU categories of interest, for which we performed model simulation. One plot was excluded from further analysis as a probable outlier due to the unrealistically high measured value of SOC 30 (>350 t C ha −1 ). Of 628 field plots for which we performed a model run, 55 model runs resulted in unsuccessful simulations (zero live biomass at some point during the simulation). A possible reason for this may be the model sensitivity to N that may result in an unsuccessful growth (and death) of vegetation during the spin-up phase due to high N demand of vegetation and low N availability at specific locations. For more details on the technical components of the model and simulation, see the User's Guide [72]. Finally, we achieved a total of 573 successful plot level simulations for use in the model-data comparison (dataset is available in the Supplementary Materials).

Model Evaluation
Biome-BGCMuSo was previously evaluated with flux and biometric data for C3 grassland, C4 maize and pedunculate oak forest [34], in addition to tree-rings data for oak forests [73]. In this research, validation of the model was performed through comparison of modelled and measured SOC 30 data at three different levels: land use (regardless of the biogeographical region), land use × biogeographical region and plot.
For each LU category, and for the LU × biogeographical region level, the difference between the modelled and the measured SOC 30 stocks was assessed with a Student's t-test. The differences in SOC 30 stocks between LU categories were tested with the Kruskal-Wallis H test, separately for the modelled and the measured SOC 30 stocks. Furthermore, evaluation of model results was performed using quantitative measures, namely coefficient of determination (R 2 ) of the linear regression, mean absolute error (MAE), root mean square error (RMSE) and Nash-Sutcliffe model efficiency (NSE). Finally, analysis of residuals, estimated as the difference between measured and modelled SOC 30 data, was performed at the plot level. Uncertainty analysis at the plot level was not included because it exceeds the scope of this study.

Results
To test the suitability of the model for reporting within the National GHG Inventory, we aggregated our model results according to different LU categories and found a strong correlation between measured and modelled SOC 30 stocks ( Figure 2).
We observed that modelled and measured SOC 30 stocks significantly differ (p < 0.01) for DF only, with the model overestimating the soil carbon stocks ( Figure 2). The magnitude of overestimation (mean difference) was 7.38 tC ha −1 (11.0%). For all other LU categories, there was no statistically significant (p < 0.05) difference between the measured and the modelled SOC 30 stocks.
Furthermore, the SOC 30 stocks in the AC significantly (p < 0.01) differ from SOC 30 stocks in all other LU categories for both the modelled results and the measured data. Measured SOC 30 in DF is significantly (p < 0.05) lower than the measured SOC 30 in GL, whereas modelled SOC 30 in GL significantly (p < 0.01) differs from both DF and CF categories.
For every LU category, the measured SOC 30 showed greater variation in comparison to the modelled one, with the measured SOC 30 in CF having the highest and the modelled SOC 30 in AC having the lowest coefficient of variation (CV) ( Table 2). Moreover, measured variability within different LU was similar (CV of 39% to 45%), whereas the variability in the modelled SOC 30 stock was larger for DF and CF LU categories (CV of 28% and 29%, respectively) than for GL (22%) and AC (19%).   88 30 In preparation of the country's NIR with mandatory land stratification according to the LU, the IPCC GPG considers additional land stratification, with respect to climate, soil types, etc., to be good practice [10]. Although the model showed good performance at the LU level, disaggregation of the results by the biogeographical regions revealed a significant mismatch between measured and modelled data (Figures 3 and A1). For all LU categories within the Mediterranean region, a significant underestimation of the modelled SOC 30 was observed. In contrast, an overestimation of the model was observed for DF and GL LU in the Continental region, and for the DF and CF LU categories in the Alpine region. For the AC in the Alpine region only, the modelled SOC 30 was underestimated. However, because there were only two data points in this category, the confidence interval for this estimate is unknown and an estimate of model accuracy is unfeasible.
No statistically significant (p < 0.05) difference was observed between the measured and the modelled SOC 30 stocks in Alpine GL, Continental CF and AC, and Mediterranean DF land-use categories.
The results also revealed an apparent gradient in modelled SOC 30 with respect to biogeographical regions, namely Alpine > Continental > Mediterranean. No such gradient was observed in the measured data (Table 3, Figure A1). Furthermore, observed gradients in modelled data were present in all LU categories (Table 3, Figure A1). Further disaggregation of model results and comparison at the plot level revealed a very poor correlation between the measured and the modelled SOC 30 stocks (Figure 4). Residual analysis ( Figure A2) showed that the smaller values of the modelled SOC 30 stocks are more likely underestimating the true values (positive residuals), whereas the large model values are likely overestimating the true values. The largest bias was observed for Deciduous forests (R 2 = 0.308, p < 0.001), and the lowest bias was observed for Annual cropland (R 2 = 0.116, p < 0.001).
Summary results of the model evaluation at different levels of stratification are shown in Table 4. Although at the land-use level model performance is very good, when disaggregating results to lower scales, significant discrepancies between measured and modelled SOC 30 data can be observed.

Discussion
During a long history of model development [34,36,38,74] and due to a variety of calibration/validation studies [40,75,76], Biome-BGC evolved and became increasingly more suitable for simulations across various terrestrial ecosystems [77,78] and environmental gradients [42]. The results of our simulation with the new model variant (Biome-BGCMuSo) indicate its suitability for modelling the average mineral SOC down to 30 cm, for Croatia at the national level, for four different land-use categories, namely Deciduous forests (DF), Coniferous forests (CF), Annual croplands (AC) and Grasslands (GL). For some of the more heterogeneous land-use categories, the model could not be initialised due to difficulties in defining adequate ecophysiological parameters. In particular, land-use categories Forests out of yield (i.e., maquis and shrubs), and Perennial croplands represent mixtures of several different types of ecosystems. For example, Forests out of yield include Mediterranean maquis and continental shrubs, whereas Perennial croplands include vineyards, olive groves and orchards. By comparison, the LU category Settlements encompasses a mosaic of sealed land, grass, parks, trees, water bodies, etc., making the selection of the appropriate EPC extremely difficult; in addition, the representativeness of the measured data is frequently questionable. The use of the unique list of ecophysiological parameters for modelling strata encompassing such diverse groups cannot be justified and would require a finer stratification of LU categories [79]. In addition, with a finer stratification (e.g., to vineyards, maquis, etc.) the number of plots for testing the modelling results would become prohibitively small. The lack of completeness at the national level represents a key issue if the model would be used for national GHG reporting, and would require the use of different tiers for different strata [10]. This may not be an issue, provided that in the country's NIR it has been demonstrated that only the modelled strata are Key categories [10]; that is, the contribution of the non-modelled strata to the overall GHG emissions and removals must be comparatively small, so that these can be safely estimated by some other, simpler, robust, albeit less precise, method.
At the LU category level, we observed a strong correlation between the mean modelled and the mean measured SOC 30 stocks ( Figure 2). However, this is due to SOC 30 stocks in AC, which are generally lower than those in grassland or forest land [79]. The variability in the measured SOC 30 stocks for all LU is consistently larger than the variability in modelled SOC 30 (Table 2). Considering the high spatial heterogeneity of soil in Croatia [54], high variability in measured SOC 30 stocks is expected. On the other hand, the use of the "average" management practices in simulations, due to the lack of management data, contributes to lower variability in modelled SOC 30 compared to the measured value, which reflects diverse management practices.
Disaggregation of the results to the level of land use within a specific biogeographical region, or to a plot level, shows worsening of the agreement between the modelled results and field measurements (Figures 3 and 4, and Figure A1). Variability in the measured SOC 30 did not change significantly with stratification. The exceptions are CF and AC in the Mediterranean biogeographical region, which showed considerably larger (52%) and smaller (33%) variability than the overall means for CF and AC, respectively (compare Tables 2 and 3). This is most likely due to the high variability in the soil depth. SOC 30 stocks, apart from the carbon content, strongly depend on the soil depth. In the Mediterranean part of Croatia, the karst landforms dominate and the variability in soil depth is very high, with many areas having less than 30 cm of soil [54]. Therefore, the greater variability in SOC 30 is expected in forest LU categories, but not in croplands, because agriculture targets locations with more (i.e., deeper) soil [80].
High variability in the modelled SOC 30 stocks in CF between biogeographical regions, with low variability within each, is most likely the result of the use of the single EPC list. Considering that for the Biome-BGCMuSo model there is only one published EPC list for the forest, namely, for pedunculate oak forests [34], to model SOC 30 in CF we had to adjust some parameters (Table A1). Although we used a single EPC list for CF, we are aware that, for example, fir, spruce and pine differ significantly in their ecophysiology [81], which should be reflected in the EPC lists. Unfortunately, such species-specific EPC lists for Biome-BGCMuSo are currently lacking. Although we modelled each LU with a unique EPC list, we accounted for site-specific soil parameters, i.e., soil texture and maximum rooting depth, which are assumed to affect SOC processes [82,83]. Evidently, this is not sufficient and, for all LU categories, the modelled SOC 30 stocks show gradients Alpine > Continental > Mediterranean, which is not observed in the measured data.
Overestimation of the SOC 30 stocks in the Alpine biogeographical region (for DF and CF) and in the Continental region (for DF and GL), and underestimation in the Mediterranean region (all LU categories; Figure A1), is a strong indicator that a single EPC parameter list per LU used for all biogeographical regions is not adequate. Finer calibration, or at least adjustment of some of the model parameters, is needed at the level of LU within each biogeographic region. For example, parameters C/N of different plant organs and/or C allocation parameters are found to be key parameters in the original Biome-BGC model, in which model results were very sensitive to even a slight change in these parameters [39,84]. C/N is mainly species specific, whereas C allocation is strongly dependent on tree species, and also environmental (i.e., site and climate) conditions [85]. Of several C allocation parameters used in the model (Table A1), the ratio of fine root-to-leaf C (FRC/LC) can be considered to be the most important for modelling of SOC 30 stocks [86,87]. In dry soil conditions, as is frequently the case in the Mediterranean region, plants tend to allocate more C into roots to reach soil water reservoirs and meet the plant's water demands [88]. By using a single LU-specific EPC list, with a unique FRC/LC ratio in all biogeographical regions, it is likely that for wetter and richer sites the ratio will be too high (resulting in an overestimation of SOC stocks), whereas for dryer and poorer sites, it is likely that the ratio will be too low (resulting in underestimation of SOC stocks). Our results (Figure 3) indicate that this may be one of the causes behind the model overestimation (Alpine and Continental) and underestimation (Mediterranean region) of SOC 30 stocks in some LU. If so, it is likely that for the Alpine and the Continental region the FRC/LC parameter was, on average, probably too high, whereas for the Mediterranean region it was probably too low. Here it is also important to note that although the model shows good efficiency at the national LU level (Table 4), this apparent agreement may only be a fortunate compensation of two opposing biases, i.e., underestimation of SOC 30 at the Mediterranean sites and overestimation of SOC 30 at the Continental and the Alpine sites. This issue is still being investigated and the results will be presented in our future publications.
The strong bias of the model regarding different biogeographical regions is also reflected in the negative Nash-Sutcliff model efficacy (Table 4), which provides evidence against the use of the model on LUs stratified according to biogeographical regions. This confirms the need for improved model parameterisation, which will reflect not only speciesspecific differences, but also differences in ecophysiological traits that could be characteristic at the level of the biogeographical region.
At the plot level, the Biome-BGCMuSo model was not able to account for spatial variation in SOC 30 stocks that originate from site-specific conditions, although the soil sub-model has been substantially improved regarding new processes and parameters for different soil layers [34]. Disagreement between measured and modelled soil carbon stocks at the plot level was previously observed in the original version of Biome-BGC [89]. For modelling at the plot level, reliable, long-term data on site management, disturbance and land-use changes are needed for improvement of the model performance at fine spatial variability. Because an accurate and comprehensive site history for each specific plot is difficult (or even impossible) to obtain, the issue of proper model initialisation [71] still hinders accurate reconstruction of SOC. A potential way forward is to explore the possibility of using a data assimilation method [90] to improve SOC estimates. Another issue is the limitation of the soils sampling. In addition to financial and other constraints that limit the number of soil samples that can be taken within a survey, soil sampling has, by its nature, a point-in-space characteristic, and although it is (if undertaken properly) representative of the area of the site, it probably does not equal the true mean of the SOC 30 stock for the wider area of the site.
Uncertainty analysis implies estimating the model result uncertainty originating from different model inputs and/or different model parameters [91]. Estimating the uncertainty originating from model inputs for Biome-BGCMuSo would assume possessing more than one climate and soil database covering the entire area of the investigation (i.e., the whole national territory) [92]. Currently it is not the case in our study; therefore, it was not performed. Nevertheless, a recent study with Biome-BGCMuSo showed that the spatial variation in the SOC change rate is much larger than the variance in the SOC change rate originating from changes in model inputs (e.g., climate and soil data), for all investigated land uses (i.e., croplands, grasslands and forests) [92]. Accordingly, we can assume that the introduction of input data-related uncertainty at the plot level would not provide us with more information than we already obtained. Furthermore, estimating the uncertainty originating from model parameters is essential in modelling studies [93,94]; however, for the Biome-BGCMuSo model it would be an extensive task due to the high number of model parameters (i.e., 107). Nevertheless, sensitivity analysis is planned as part of the future work on Biome-BGCMuSo model calibration and validation. Finally, our results also indicate that the Biome-BGCMuSo model is still dominantly driven by meteorology, as is the case with the original Biome-BGC model [95]. Among various soil properties that cause spatial heterogeneity of SOC stocks, e.g., soil texture, bulk density, and soil aggregates, recent research shows that SOC is also strongly driven by mycorrhiza [96], microbial biomass [97] and soil fauna [98]. These processes are currently not considered within the Biome-BGCMuSo model. Therefore, in addition to model parameterisation, improvement of the model logic regarding specific processes and geological conditions should also be addressed in the future.

Conclusions
In this study, we focused on the estimation of SOC 30 for Croatia using a process-based biogeochemical model. The study also introduced a new national soil survey dataset, which is consistent with the IPCC guidelines, and served as a reference for the first Biome-BGCMuSo model verification study of SOC 30 at a large spatial scale in Central Europe. We demonstrated that the Biome-BGCMuSo model can reproduce the overall country mean of SOC 30 , but with considerable spatial variability. Although biases were present in different biogeographical regions that eventually compensated for each other, the study also highlighted that the overall, country-scale results of the equilibrium run were consistent with the observed means. Thus, no overall bias was found for the country means, which is a promising finding.
The results of the simulation of SOC 30 in four different land-use categories (Deciduous forests, Coniferous forests, Annual croplands and Grasslands) at the national scale in Croatia indicate that additional calibration/adjustment regarding specific land uses is required, given that the spatial variability in SOC 30 was not well reproduced. This can be partly attributed to the simplistic method that is used for the initialisation of the model, as the spin-up approach entirely ignores the site history. Note that, for the present modelling exercise, information on the past management was not available; thus, future studies should address this issue. From the perspective of the observations, there are also issues with the point-sampling method of the SOC 30 , where the effects of high spatial variability in soil become pronounced. These issues are very difficult to overcome at the national scale.
Model testing through model-data comparisons at different scales, such as the one presented in this work, highlights the main model issues and may serve as a guideline for improvement of model parameters or the internal model logic. A repeated soil survey is needed in order to assess the changes in SOC 30

Conflicts of Interest:
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.   Ploughing 300 down to 30 cm * Deciduous forests in Croatia are even-aged managed, with thinnings of 15% performed every 10 years and with regeneration cuts (2-3 cuts) performed during the last 10 years of the rotation period. In order to perform spatial modelling, we needed thinnings and regeneration cuts distributed among different locations. In the absence of this information, we estimated average annual thinning intensity to ensure evenly distributed thinning at the spatial scale. If we assume a rotation period of 140 years (i.e., prescribed rotation for pedunculate oak), annual thinning intensity should account for 1.5% thinning rate during a 130 year period and 100% regeneration cut during whole rotation period, which sums to 2.1%. Coniferous forests are uneven-aged managed with thinning of 30% performed every 10 years, i.e., average annual thinning intensity is 3%. Figure A1. Comparison of measured and modelled soil organic carbon stocks down to 30 cm in different land-use categories with respect to biogeographical regions. Asterisk indicates statistically significant difference (p < 0.01) between measured and modelled data at the land use × biogeographical region level.