Elaborating Hungarian Segment of the Global Map of Salt-A ﬀ ected Soils (GSSmap): National Contribution to an International Initiative

: Recently, the Global Map of Salt-a ﬀ ected Soils (GSSmap) was launched, which pursued a country-driven approach and aimed to update the global and country-level information on salt-a ﬀ ected soils (SAS). The aim of this paper was to present how Hungary contributed to GSSmap by preparing its own SAS maps using advanced digital soil mapping techniques. We used not just a combination of random forest and multivariate geostatistical techniques for predicting the spatial distribution of SAS indicators (i.e., pH, electrical conductivity and exchangeable sodium percentage) for the topsoil (0–30 cm) and subsoil (30–100 cm), but also a number of indices derived from Sentinel-2 satellite images as environmental covariates. The importance plots of random forests showed that in addition to climatic, geomorphometric parameters and legacy soil information, image indices were the most important covariates. The performance of spatial modelling was checked by 10-fold cross validation showing that the accuracy of the SAS maps was acceptable. By this study and by the resulting maps of it, we not just contributed to GSSmap, but also renewed the SAS mapping methodology in Hungary, where we paid special attention to modelling and quantifying the prediction uncertainty that had not been quantiﬁed or even taken into consideration earlier. and Z.B.; investigation, Z.B., A.L., R.P. and O.P.; data curation, A.L., R.P. and O.P.; writing—original draft preparation, G.S., Z.B., A.L., T.T. and L.P.; writing—review and editing, G.S., Z.B., T.T. and A.L.; visualization, G.S. and A.L.; supervision, Z.B., and L.P.; project administration, G.S.; acquisition,


Introduction
In the past decade, a number of international initiatives have been launched in order to provide thematic spatial soil information with relatively high resolution (from 1 km to 100 m) at the global scale, e.g., SoilGrids [1], Global Soil Map [2] and Global Soil Organic Carbon Map [3]. Nowadays, a country-driven (a.k.a "bottom-up") approach is frequently pursued in which a given country is invited to prepare its own thematic soil maps for targeted soil properties and functions according to the specifications summarized in a guideline or "cookbook" (e.g., [3,4]). This approach has its own advantages and disadvantages, e.g., a map prepared by a country could be more accurate than a globally or continentally compiled one [5,6]. However, merging countrywide maps prepared by different countries (with possibly different methodology, sampling density, data quality etc.) are going to yield artefacts at state or country borders, which makes the final global map not so attractive [7,8].
Recently, an international initiative, namely Global Map of Salt-affected Soils (GSSmap), which has also pursued a country-driven approach, has been launched by the Global Soil Partnership (GSP) of the Food and Agriculture Organization (FAO). Salt-affected soils (SAS) are groups of soils with a high content of soluble salts and/or high amounts of sodium ions and have significant impacts on the environment, water and agriculture [4]. GSSmap is aimed at updating the global and country-level information on these soils and lay ground for future monitoring [4]. The initiative is highly appreciated because the global distribution of salt-affected soils was first estimated in the late 1970s by Szabolcs [9] and since then there has not been consistent updates of the global distribution [4]. In the framework of GSSmap, the role of the countries is to prepare and deliver their own SAS maps for the topsoil (0-30 cm) and for the subsoils (30-100 cm) with quantified prediction uncertainty, which calls for using advanced digital soil mapping techniques. The global map of GSSmap is expected to be published in December, 2020.
Digital soil mapping (DSM), which has become a successful sub-discipline of soil science with an active research output [10], aims to provide spatial soil information for a wide range of studies, such as precision agriculture [11,12], hydrology [13][14][15], environmental sciences [16,17], conservation biology [18,19] or spatial planning [20,21]. For this purpose, geostatistical techniques are widely used, which have been complemented by machine learning algorithms in the past decade. Nowadays, these advanced geostatistical and machine learning techniques, as well as their combinations, are in use for inferring the spatial variations of soil properties, functions and/or services [22][23][24][25]. In addition to the new techniques, the amount of environmental covariates used in DSM is continuously expanding mainly thanks to remote sensing. Although digital elevation models and their derivatives (e.g., slope, aspect and topographic wetness index) are proved to be useful covariates in DSM, remote sensing is able to provide a huge amount of information on land surface with a continuously increasing spatial, temporal and spectral resolution [26][27][28]. By the combination of certain bands of satellite images, such image indices (e.g., normalized difference vegetation index, salinity index and vegetation soil salinity index) can be gained, which provide specific information for the problem in hand. Thus, a lot of papers have addressed the issue of exploiting information acquired by remote sensing techniques in spatial modelling and inferencing (e.g., [29][30][31][32]).
In Hungary, there is a long tradition and history of studying salt-affected soils that is nicely demonstrated by a huge number of monographs dedicated to this topic (e.g., [9,[33][34][35][36]). Most of the areas with SAS can be found in the Great Hungarian Plain that is an alluvial plain filled up with thick alluvial sediments on an ancient seabed. Later loess formation also took place here and the influence of shallow fluctuating, saline-sodic groundwater, as well as permanent or temporary waterlogging created the conditions of SAS formation. Sodium ions, being considered as the most important factor, either dissolved from the Tertiary Era deposits into groundwater [37] or concentrated during consecutive drying and wetting of infiltrated water [38]. Systematic mapping of salt-affected soils has a history of more than a hundred years in Hungary. The first medium-scale SAS map (1:75,000) was prepared in the late 1920s by Arany [39] and Magyar [40] presenting the status and vegetation of salt-affected soils, respectively. Sigmond [33] prepared the first ever quantitative map of soil salinity at the scale of 1:300,000. The first large-scale SAS map (1:10,000) was compiled by Szabolcs [41], who later also compiled SAS maps not just for Europe [42] but for the world [9] as well. The latter is important because it was the basis for assessing the global distribution of salt-affected soils [4].
The objective of our study was to present how Hungary contributed to the GSSmap international initiative by preparing its own maps of salt-affected soils according to the GSSmap specifications. For this purpose, we applied not just a combination of advanced machine learning algorithms and multivariate geostatistical techniques but also a number of image indices exploiting a huge amount of relevant information contained in remote sensing images. Our maps were prepared with a resolution of 100 m because we wanted to simultaneously update the available SAS maps for Hungary.

GSSmap Specifications
In the framework of GSSmap, three SAS indicators were selected, namely pH (H 2 O), electrical conductivity (EC) and exchangeable sodium percentage (ESP) of the soil paste extract, as identifiers of the status and occurrence of salt-affected soils. It was mandatory to map these indicators for the topsoil (0-30 cm) and for the subsoil (30-100 cm) with quantified prediction uncertainty that had to be expressed by the width of the 95% prediction interval. The target spatial resolution of the deliverable maps was 1 km. In addition to the maps of SAS indicators, it was also mandatory to prepare a classified salt severity map for the topsoil and for the subsoil with quantified prediction uncertainty using the maps of SAS indicators. Either the FAO or the USDA classification scheme could be used for preparing the map of salt severity. In summary, 4+4 maps for the topsoil and 4+4 maps for the subsoil were expected to be delivered by a country.

Soil Data, Conversion and Harmonization
For mapping the SAS indicators, we derived soil data from the Hungarian Soil Information and Monitoring System (SIMS) established in 1992 ( Figure 1). SIMS is a countrywide monitoring system providing geographically referenced biological, physical and chemical information on the temporal change of the Hungarian soils. It consists of 4859 soil horizons belonging to 1236 soil profiles.

GSSmap Specifications
In the framework of GSSmap, three SAS indicators were selected, namely pH (H2O), electrical conductivity (EC) and exchangeable sodium percentage (ESP) of the soil paste extract, as identifiers of the status and occurrence of salt-affected soils. It was mandatory to map these indicators for the topsoil (0-30 cm) and for the subsoil (30-100 cm) with quantified prediction uncertainty that had to be expressed by the width of the 95% prediction interval. The target spatial resolution of the deliverable maps was 1 km. In addition to the maps of SAS indicators, it was also mandatory to prepare a classified salt severity map for the topsoil and for the subsoil with quantified prediction uncertainty using the maps of SAS indicators. Either the FAO or the USDA classification scheme could be used for preparing the map of salt severity. In summary, 4+4 maps for the topsoil and 4+4 maps for the subsoil were expected to be delivered by a country.

Soil data, Conversion and Harmonization
For mapping the SAS indicators, we derived soil data from the Hungarian Soil Information and Monitoring System (SIMS) established in 1992 ( Figure 1). SIMS is a countrywide monitoring system providing geographically referenced biological, physical and chemical information on the temporal change of the Hungarian soils. It consists of 4859 soil horizons belonging to 1236 soil profiles. Since neither EC nor ESP of the soil paste-extract are part of the standard laboratory analysis of the soils in Hungary, thus neither does the laboratory protocol of SIMS extend to measure these indicators. For example, instead of measuring EC of the soil-paste extract, the measurement of salt concentration of the soil solution is preferred. Therefore, we had to use conversion methods in order to gain EC and ESP data. In Hungary, Filep [43], as well as Filep and Wafi [44] extensively studied the statistical relationships between EC, ESP and more commonly measured soil properties and elaborated a number of pedotransfer functions by adapting and modifying the internationally used functions. The main advantage of these pedotransfer functions is that they have been elaborated on a consistent and detailed data set of Hungarian SAS and therefore they have been specifically designed for deriving EC and ESP Since neither EC nor ESP of the soil paste-extract are part of the standard laboratory analysis of the soils in Hungary, thus neither does the laboratory protocol of SIMS extend to measure these indicators. For example, instead of measuring EC of the soil-paste extract, the measurement of salt concentration of the soil solution is preferred. Therefore, we had to use conversion methods in order to gain EC and ESP data. In Hungary, Filep [43], as well as Filep and Wafi [44] extensively studied the statistical relationships between EC, ESP and more commonly measured soil properties and elaborated a number of pedotransfer functions by adapting and modifying the internationally used functions. The main advantage of these pedotransfer functions is that they have been elaborated on a consistent and detailed data set of Hungarian SAS and therefore they have been specifically designed for deriving EC and ESP values for Hungarian salt-affected soils. Table 1 summarizes the pedotransfer functions used in this study. Since the depth of soil horizons varies from one soil profile to another, we used mass-preserving splines [45] for modelling the vertical distribution of each SAS indicator at each soil profile. The fitted splines were used to derive the values of SAS indicators for the topsoil (0-30 cm) and for the subsoil (30-100 cm) at each monitoring point. The splined, so-called harmonized values of SAS indicators were used in further spatial modelling. Table 2 summarizes their descriptive statistics.  Table 3 summarizes the environmental covariates used in spatial modelling. Representing the spatial variation of soil mantle, we used the genetic soil type map of Hungary [46] and a thematic layer of the Digital Kreybig Soil Information System (DKSIS [47]), namely the chemical properties of soils. The latter contains detailed, legacy, spatial information on the chemical properties of the Hungarian soils including categories of various types of SAS. Data layers provided by the Hungarian Meteorological Service were used to characterize climate. We characterized topography by a digital elevation model [48] and a number of its derivatives (see Table 3). Parent material was considered based on the correlation between the legend of the geological map of Hungary and 13 parent material classes according to the FAO code system [49].  [49] In addition to the covariates summarized above, we also used a number of relevant indices derived from Sentinel-2 satellite images. The spectral bands of Sentinel-2 range from the visible and near infrared to the short wave infrared with a spatial resolution of 10, 20 and 60 m depending on the sensor. If the meteorological conditions are perfect, 22 Sentinel-2 satellite images in total are needed to cover Hungary entirely. However, due to cloudiness, 27 Sentinel-2 satellite images in total were used in this study, which were acquired during the spring of the year 2019 (i.e., 23/03/2019, 24/03/2019 and 31/03/2019). In the supplementary material, Figure S1 graphically presents the satellite images used in this study and their acquisition date. Spring is an appropriate period for surveying and mapping of salt-affected soils since soil mantle is either uncovered (i.e., bare) or only covered by natural vegetation. Using the satellite images, we compiled a countrywide, cloud-free mosaic for Hungary with the closest possible acquisition times. This mosaic was the basis for deriving satellite-based soil and/or vegetation indices. In this study, we derived those indices, which are the most frequently used in SAS mapping [4]. Table 4 presents these indices with their formulas. All the indices were computed at a resolution of 10 m in an automatized Python environment. Since SAS mapping was targeted with a resolution of 100 m, we aggregated the derived indices to 100 m using a median filter. Table 4. Summary of vegetation and soil indices derived from Sentinel-2 satellite images. Abbreviations: G: green band, R: red band, B: blue band, and NIR: near infrared band.

Indices Abbreviation Formula
All the environmental covariates listed above were resampled into a common geographic reference system with a resolution of 100 m. This geographic reference system was also used in spatial modelling. In the case of categorical covariates (i.e., soil type, DKSIS chemical properties of soils, CORINE land cover, parent material), we applied a nearest neighbor resampling technique, whereas in the case of continuous covariates, a cubic spline technique was applied. Resampling was carried out in SAGA GIS software environment [51].

Spatial Modelling and Classification of SAS
The harmonized data on EC and ESP showed positively skewed distribution (Table 1) and therefore we applied log transformation in order to obtain quasi-normal distribution. The log-transformed EC and ESP data were used in further spatial modelling. The harmonized data on pH showed quasi-normal distribution.
Spatial variation of soil properties at a given depth can be described and modelled in terms of a deterministic component and a stochastic component, that is where Z is the soil property of interest, m is the deterministic component describing structural variation, ε is the stochastic part consisting of random variation that could be spatially correlated, u is the vector of the geographical coordinates, and d is the target soil depth. In this study, we used a combination of advanced machine learning algorithm and multivariate geostatistical techniques for modelling the spatial variation of the SAS indicators at both soil depths. We used random forest (RF [52]) for describing, modelling and predicting the deterministic part of variation of the SAS indicators at both soil depths. We selected RF not only because it has become a frequently applied machine learning technique in various fields (e.g. [53][54][55][56]), but it commonly outperforms other techniques as well [23,57]. Before fitting RF between the indicators and the environmental covariates listed in Table 3, we fine-tuned the hyper-parameter mtry of RF that is the number of input covariates selected randomly at each split. A tuning vector was generated containing the possible values of mtry, and then a repeated 10-fold cross-validation was used for evaluating the performance on each SAS indicator at both soil depths. The fine-tuned values of mtry were used for fitting RF to each indicator and then the fitted RFs were used for spatially exhaustively predicting each SAS indicator at both soil depths.
Since we could reasonably assume that the SAS indicators are interdependent, it is better to jointly model their spatial variation at a given soil depth. Therefore, a geostatistical model, namely cokriging with external drift (coKED), was built up for each soil depth. In coKED, we used the RF predictions of the SAS indicators as external drifts, i.e., they were interpreted as deterministic, known functions of the SAS indicators (i.e., m(u) of Eq. 1). The residuals (i.e., the difference between the RF predictions and the observations of the SAS indicators) represented the stochastic part of variation of the SAS indicators (i.e., ε(u) of Eq. 1). We computed the direct-and cross-variograms of the residuals and then a linear model of coregionalization (LMC) was fitted in order to obtain a statistically valid model [58]. The spatial prediction of SAS indicators and their uncertainty were identified by the corresponding coKED prediction and its kriging variance, respectively. In the case of EC and ESP, we had to transform the coKED predictions back to the original, positively skewed scale since spatial modelling was carried out on a transformed, normal scale. It can be done by adding half of the kriging variance to the coKED prediction and then taking its exponential [59].
According to the GSSmap specifications, we finally classified the spatial predictions of the SAS indicators in order to prepare a salt severity map for both soil depths. For this purpose, we used the FAO classification scheme presented in Table 5.

Quantification and Propagation of Uncertainty
The prediction uncertainty of the SAS indicators was quantified by the width of the 95% prediction interval (PI). This PI reports the range of values within which the true value is expected to occur 95 times out of 100. If the distribution is normal, as in the case of pH, the upper and lower limit of the 95% PI can be readily computed by subtracting and adding 1.96 times the kriging standard deviation to the kriging prediction [60]. If the distribution is lognormal, as in the case of EC and ESP, the limits of the 95% PI are computed on the transformed, normal scale as above and then transformed back to the original, lognormal scale by taking their exponential [61]. The width of the 95% PI (i.e., the difference between the upper and lower limit) is a useful measure of uncertainty because its value can be interpreted as the higher the value, the higher the uncertainty [62].
We also examined how the prediction uncertainty of SAS indicators propagates through the classification scheme (Table 5) at both soil depths. Sequential Gaussian cosimulation was conducted to the geostatistical models built up in the section above and 100 equally probable stochastic realizations of the SAS indicators were simulated for both soil depths. Cosimulation was conditional, that is, it honored the SAS data at the monitoring locations. In addition, the generated stochastic realizations reproduced the joint spatial relationship (i.e., interdependency) of the SAS indicators. We used the generated 100 realizations as inputs of classification for investigating how the prediction uncertainty of inputs propagates through [63]. Thus, 100 classified salt severity maps were obtained for both soil depths, which means 100 simulated salt severity classes at each location for both soil depths. To quantify the uncertainty in the classification output, we used Shannon's [64] information entropy, that is where H(u) is the value of Shannon's information entropy at location u, p i (u) is the probability of the ith salt severity class at location u, and N is the number of possible salt severity classes. The value of p i (u) was determined by the relative frequency of the ith salt severity class based on the 100 salt severity classes simulated at location u. Term in the denominator of Equation (2) ensures that the value of Shannon's entropy falls within the interval [0,1]. Its value can be interpreted as the higher the value, the higher the uncertainty. If H(u) takes the value of zero, then there is no uncertainty, i.e., the probability of one of the possible classes is equal to one at location u. If H(u) takes the value of one, the uncertainty is highest, i.e., each possible class has equal probability of occurring at location u.

Validation
Due to the limited number of soil observations, we used 10-fold cross-validation for inspecting the performance of the spatial predictions and uncertainty quantifications of SAS indicators. In 10-fold cross-validation, the dataset is randomly partitioned into 10 equal-size parts. One of these parts is retained for validating the spatial predictions, which were given by using the remaining nine parts. This step is repeated until each of the 10 parts become a validation set exactly once.
We computed the most commonly used error measures, i.e., bias (mean error, ME) and the spread of the error distribution (root mean square error, RMSE). Furthermore, Lin's [65] concordance correlation coefficient (CCC) and Nash-Sutcliffe [66] model efficiency coefficient (NSE) were computed.
We examined the performance of the uncertainty quantifications by using accuracy plots and G statistics. The underlying theory is that if an uncertainty quantification reports, e.g., the map of the 95% PI, then we expect that 95% of the observed values coming from the validation set should fall within this PI. If the observed fraction is lower than the expected fraction, then the uncertainty has been underestimated. If the observed fraction is higher, the uncertainty has been too liberally estimated (i.e., overestimated). This can be extended to any symmetric PIs. An accuracy plot (a.k.a. prediction interval coverage probability plot) graphically presents the expected and observed fraction for any symmetric PIs and ideally follows the 1:1 line. Based on the accuracy plot, G statistics [67] measures the overall closeness of the observed and the expected fractions, that is where ξ(p) and p are the observed and expected fraction for a p-width symmetric PI, respectively. The value of G statistics can be interpreted as the higher the value, the closer the observed and the expected fractions. Ideally, its value is equal to 1.

Spatial Prediction of SAS Indicators
The log-transformed data on EC and ESP showed quasi normal distribution in both soil depths and therefore they could be easily conducted to spatial modelling. For pH, EC and ESP, mtry values of 9, 18 and 7 were found to be optimal in the topsoil, respectively, whereas mtry values of 14, 15 and 11 were found to be optimal in the subsoil for pH, EC and ESP, respectively.
The importance of the environmental covariates for each SAS indicators at both soil depths is given in the supplementary material (see Figure S2). A terser summary is given in Figure 2 in which the importance of a given environmental covariate was presented by counting its occurrence on the list of the top 3, 5, 10, and 20 important covariates of each SAS indicator. Besides, Figure 2 also presents the most important predictors (MIPs) for the SAS indicators, i.e., those ones which proved to be the most important (or best) covariates in predicting SAS indicators. Temperature, precipitation and evapotranspiration were found to be MIPs at both soil depths, which is not a surprise considering that salt-affected soils have developed with an excess of evaporation over precipitation, helping to raise salt from shallow groundwater [68]. In addition to the climatic data layers detailed above, DEM derivatives (i.e., altitude and channel network base level) and soil information (i.e., soil type and DKSIS's chemical properties of soils) were found to be very important in predicting SAS indicators in the topsoil and subsoil based on the top three lists. Hungary is located in the Carpathian (or Pannonian) Basin, and thus, Hungarian salt-affected soils have been mainly formed in the Great Hungarian Plain, which are discharge areas of regional flow systems of groundwater characterized by surplus salts and especially sodium ions. The importance of DKSIS's thematic layer of chemical properties of soils is evident in Figure 2, meaning that legacy spatial soil information provided by DKSIS are definitely valuable for predicting SAS indicators. Based on the top 5, 10 and 20 lists, we found that further DEM derivatives (e.g., vertical and horizontal distance to channel network, topographic wetness index), land cover and image indices (e.g., salinity index and ratio, brightness index and vegetation soil salinity index) derived from Sentinel-2 satellite images were also important in predicting SAS indicators at both soil depths. Image indices mainly represented the mosaic-like patches of salt-affected soils, which could be hardly captured by other covariates.
Remote Sens. 2020, 12, x FOR PEER REVIEW 9 of 20 index and vegetation soil salinity index) derived from Sentinel-2 satellite images were also important in predicting SAS indicators at both soil depths. Image indices mainly represented the mosaic-like patches of salt-affected soils, which could be hardly captured by other covariates.  Figure 3 presents the experimental direct-and cross-variograms computed from the RFs residuals, and it also depicts the fitted LMCs at both soil depths. In both cases, LMCs have a nested model structure in which the first structure models the discontinuity at the origin (a.k.a. nugget effect), whereas the second one models spatial continuity with range values of 40 and 35 km in the topsoil and in the subsoil, respectively. The model type of the second structure was spherical at both soil depths.  Figure 3 presents the experimental direct-and cross-variograms computed from the RFs residuals, and it also depicts the fitted LMCs at both soil depths. In both cases, LMCs have a nested model structure in which the first structure models the discontinuity at the origin (a.k.a. nugget effect), whereas the second one models spatial continuity with range values of 40 and 35 km in the topsoil and in the subsoil, respectively. The model type of the second structure was spherical at both soil depths. Figure 4 presents the maps of the SAS indicators, whereas their prediction uncertainty is presented in the supplementary material ( Figure S3). The spatial predictions represent well the overall spatial distribution of the SAS indicators in Hungary. High values of EC, ESP and pH have been predicted on those areas where it is well-known that soils are strongly affected by salinity, sodicity and/or alkalinity. These areas are mainly located in the Danube-Tisza Interfluve, the upper part of Tisza River and in the eastern part of Hungary. In addition, the maps presented in Figure 4 give a fairly detailed picture about the spatial variability of the SAS indicators, which makes it easy to identify the mosaic-like patches of salt-affected soils. We should note that the prediction uncertainty of ESP and EC was explicitly high on those areas, where the prediction of EC and ESP was also high. This is because both SAS indicators are lognormally distributed. A property of a lognormally distributed random variable is the proportional effect, i.e., variability is higher in areas with high average values than in areas with low average values [61]. This implies high prediction uncertainty in areas with high predicted values [62,69].   Figure S3). The spatial predictions represent well the overall spatial distribution of the SAS indicators in Hungary. High values of EC, ESP and pH have been predicted on those areas where it is well-known that soils are strongly affected by salinity, sodicity and/or alkalinity. These areas are mainly located in the Danube-Tisza Interfluve, the upper part of Tisza River and in the eastern part of Hungary. In addition, the maps presented in Figure 4 give a fairly detailed picture about the spatial variability of the SAS indicators, which makes it easy to identify the mosaic-like patches of salt-affected soils. We should note that the prediction uncertainty of ESP and EC was explicitly high on those areas, where the prediction of EC and ESP was also high. This is because both SAS indicators are lognormally distributed. A property of a lognormally distributed random variable is the proportional effect, i.e., variability is higher in areas with high average values than in areas with low average values [61]. This implies high prediction uncertainty in areas with high predicted values [62,69].

Performance of Spatial Predictions and Uncertainty Quantifications
In Table 6, we summarized the performance of the spatial predictions by the computed values of ME, RMSE, CCC and NSE. An important aim of every DSM procedure is to provide predictive soil maps without bias (i.e., value of ME close to zero) and with RMSE as low as possible. According to the error measures, the spatial models gave unbiased spatial predictions for each SAS indicator at both soil depths. The values of CCC range from 0.531 to 0.764. The values of NSE can be interpreted as the value of R-square if its value is greater than zero. The values of NSE range from 0.333 to 0.621. The lowest values were obtained for ESP, meaning that its performance was the lowest between the

Performance of Spatial Predictions and Uncertainty Quantifications
In Table 6, we summarized the performance of the spatial predictions by the computed values of ME, RMSE, CCC and NSE. An important aim of every DSM procedure is to provide predictive soil maps without bias (i.e., value of ME close to zero) and with RMSE as low as possible. According to the error measures, the spatial models gave unbiased spatial predictions for each SAS indicator at both soil depths. The values of CCC range from 0.531 to 0.764. The values of NSE can be interpreted as the value of R-square if its value is greater than zero. The values of NSE range from 0.333 to 0.621. The lowest values were obtained for ESP, meaning that its performance was the lowest between the SAS indicators. Table 6. The performance of spatial predictions by 10-fold cross-validation. Abbreviations: ME: mean error, RMSE: root mean square error, CCC: Lin's concordance correlation coefficient, NSE: Nash-Sutcliffe model efficiency coefficient, SAS: salt-affected soils, EC: electrical conductivity, and ESP: exchangeable sodium percentage. We also checked the performance of the uncertainty quantifications of the SAS indicators by accuracy plots and G statistics. Figure 5 presents the prepared accuracy plots and computed G statistics for each SAS indicator at both soil depths. The accuracy plots approximately follow the 1:1 line, proving that the accuracy of uncertainty quantifications is acceptable. This statement is also supported by the computed G statistics, which are quite close to the expected value.

SAS Indicators
Remote Sens. 2020, 12, x FOR PEER REVIEW 12 of 20 Table 6. The performance of spatial predictions by 10-fold cross-validation. Abbreviations: ME: mean error, RMSE: root mean square error, CCC: Lin's concordance correlation coefficient, NSE: Nash-Sutcliffe model efficiency coefficient, SAS: salt-affected soils, EC: electrical conductivity, and ESP: exchangeable sodium percentage. We also checked the performance of the uncertainty quantifications of the SAS indicators by accuracy plots and G statistics. Figure 5 presents the prepared accuracy plots and computed G statistics for each SAS indicator at both soil depths. The accuracy plots approximately follow the 1:1 line, proving that the accuracy of uncertainty quantifications is acceptable. This statement is also supported by the computed G statistics, which are quite close to the expected value.  Figure 6 presents the classified salt severity maps for the topsoil and for the subsoil. Areas with already known salt-affected soils are well reflected in the classified maps, if we compare them with Szabolcs's [42] map (see Figure S4). However, the comparison of these maps is not so straightforward since the SAS categories presented in Szabolcs's map were defined according to the Hungarian soil classification system, which distinguishes salt-affected soils by (i) the vertical distribution of salinity, Figure 5. Accuracy plots and G statistics for the indicators of salt-affected soils in the topsoil (0-30 cm) and in the subsoil (30-100 cm). Abbreviations: EC: electrical conductivity, and ESP: exchangeable sodium percentage. Figure 6 presents the classified salt severity maps for the topsoil and for the subsoil. Areas with already known salt-affected soils are well reflected in the classified maps, if we compare them with Szabolcs's [42] map (see Figure S4). However, the comparison of these maps is not so straightforward since the SAS categories presented in Szabolcs's map were defined according to the Hungarian soil classification system, which distinguishes salt-affected soils by (i) the vertical distribution of salinity, sodicity and alkalinity; (ii) the soil structure; and (iii) the vertical sequence of diagnostic horizons [70]. Nonetheless, areas severely affected by salinity or sodicity are quite similar in Szabolcs's map and in the classified salt severity maps. These areas are mainly located in the Danube-Tisza Interfluve, upper part of Tisza River and in the eastern part of Hungary. Though the improvement (i.e., reclamation or amelioration) of these soils is scientifically well founded in Hungary, it is a rather costly operation. This is the reason why large parts of these areas are kept as graze-land or hayfield, land for afforestation, paddy field, or fishpond. Besides, most of the Hungarian National Parks have salt-affected grasslands, hayfields, marshes, reed-lands, and lakes, and these provide habitat for protected animals (mainly birds), plants and attract lots of tourists.

Classified Maps of Salt Severity
Remote Sens. 2020, 12, x FOR PEER REVIEW 13 of 20 sodicity and alkalinity; (ii) the soil structure; and (iii) the vertical sequence of diagnostic horizons [70]. Nonetheless, areas severely affected by salinity or sodicity are quite similar in Szabolcs's map and in the classified salt severity maps. These areas are mainly located in the Danube-Tisza Interfluve, upper part of Tisza River and in the eastern part of Hungary. Though the improvement (i.e., reclamation or amelioration) of these soils is scientifically well founded in Hungary, it is a rather costly operation. This is the reason why large parts of these areas are kept as graze-land or hayfield, land for afforestation, paddy field, or fishpond. Besides, most of the Hungarian National Parks have salt-affected grasslands, hayfields, marshes, reed-lands, and lakes, and these provide habitat for protected animals (mainly birds), plants and attract lots of tourists. It should be mentioned that the FAO classification scheme (Table 5) proposed by the GSSmap specification is not so appropriate for classifying Hungarian salt-affected soils since it applies a very It should be mentioned that the FAO classification scheme (Table 5) proposed by the GSSmap specification is not so appropriate for classifying Hungarian salt-affected soils since it applies a very strict threshold on salinity (i.e., 0.75 dS·m −1 ). Furthermore, the threshold on sodicity is not so strict with the common ESP value of 15% (Table 5). That is the reason why areas of slight and moderate salinity classes, as well as areas of slight sodicity class, are so widespread in Hungary, especially in the eastern part of Hungary and in the Danube-Tisza Interfluve ( Figure 6). This can be attributed to the fact that large parts of the Hungarian lowlands have been affected by stagnant water and consequently have fine sediments with some minor soluble salt concentration. Since the values of EC reflect not only soluble salts but also fine particles and organic matter [71], the categories of slight and moderate salinity also reflect clayey soils, which are quite widespread in the eastern part of Hungary. These areas are classified as "potential salt-affected soils" in Szabolcs's map (see Figure S4), meaning that these areas could be potentially threated by salt problems.
The quantified uncertainty in the SAS classification outputs is presented in the supplementary material (see Figure S5). There is a strong relationship between the uncertainty in SAS classification outputs and the prediction uncertainty of the SAS indicators ( Figure S3), i.e., the higher the prediction uncertainty of SAS indicators, the higher the uncertainty in the SAS classification output. This is quite conspicuous for salinity (i.e., EC) and sodicity (i.e., ESP). This could be attributed to the fact that the prediction uncertainty of ESP and EC was explicitly high on those areas, where the prediction of ESP and EC was also high. This is why areas strongly affected by salinity or sodicity show higher uncertainty in the SAS classification than areas without salt problems.

Discussion
This section was addressed to discuss some issues in more detail in relation to the methodology and environmental covariates used in this study. Two of them were raised in the course of spatial modelling, whereas one of them relates to the remote sensing images applied as environmental covariates.

On the Interpretability of Machine Learning Algorithms
It is commonly known that data-driven models given by machine learning algorithms (MLAs), including RF, are not easily interpretable since they are too complex and complicated to understand. They frequently appear like a black box in which it is hard to trace and understand what happens. Although there are some MLAs, which can provide more-or-less easily interpretable models (e.g., cubist), the most of them cannot do so. Prediction accuracy and model interpretability are in conflict as it was pointed out by Breiman [72]. Thus, in most cases, a more accurate prediction can be gained by a complex and complicated model than by a simple and easily interpretable one. However, we should try to understand these models as far as possible [73], e.g., by the application of post-hoc techniques [74,75]. The importance plot derived from a RF model can be a valuable tool trying to do so. In this study, we examined these plots of the environmental covariates ( Figure S2) at different levels of importance ( Figure 2) and tried to interpret with expert knowledge why these environmental covariates were important in predicting SAS indicators in Hungary. Obviously, it did not give a full picture about the fitted RFs but, at least, there was a chance to explore and try to interpret the main relationships between the SAS indicators and the environmental covariates, which defined these predictive models. We could identify the background, conditions and driving forces of SAS formation, which is important for studying, mapping and monitoring of these soils.

Remotely Sensed Information as Important Covariates for SAS Mapping
Remotely sensed information, including aerial photographs, on land surface have been used for decades to study the spatio-temporal variability of salt-affected soils (e.g., [29,30,76,77]). Thus, we dedicated a separate subsection to discuss and highlight the importance of remote sensing in SAS mapping. In this study, we pointed out that indices derived from remotely sensed images were informative covariates ( Figure S2 and Figure 2) in representing smaller scale spatial variability of salt-affected soils. Indeed, conditions and driving forces of SAS formation operate at various scales and, therefore, it is a real challenge to map these soils accurately. We have seen that environmental covariates associated with topography and climate were principal because they are important conditions and driving forces of the potential occurrence of SAS at a larger scale. However, salt-affected soils frequently appear as mosaic-like patches. This can be attributed to conditions and driving forces operating at a much smaller scale, such as microtopography or microclimate, which can be hardly taken into consideration directly, especially when countrywide SAS mapping is targeted. The easiest way to capture this small scale variability is the application of image indices derived from remotely sensed information. Indices can provide specific, spatially (or even spatio-temporally) detailed information on the mosaic-like appearance of SAS via the natural vegetation or bare soil surface. It is commonly known that halophytes (i.e., salt-tolerant plants) and their communities are important indicators of SAS and are closely related to mosaic-like patches of SAS. This knowledge on botanic and ecology have been used for decades in field surveying and mapping of SAS [78,79]. Indices derived from remotely sensed images are able to provide information on the occurrence of these plant communities and, therefore, these indices as environmental covariates can be successfully used for modelling smaller scale spatial variability of SAS. Furthermore, since these plant communities are differentiated by microtopography [78], indices containing information on these plant communities can also provide indirect information on the characteristic of microtopography, which cannot be reflected by a countrywide digital elevation model with a relatively low resolution.

Pros And Cons of Using Multivariate Geostatistics
In this study, we used multivariate geostatistics, to be more precise the cokriging approach, with random forest since we could reasonably assume that the SAS indicators are interdependent and jointly vary in space. The computed experimental direct-and cross-variograms ( Figure 3) confirmed this assumption and, therefore, we jointly modelled the spatial distribution of SAS indicators, which is a rarely used approach in practice. When two or more variables are targeted for mapping or spatial modelling, it is common in practice to model their spatial distribution separately. In geostatistics, Goovaerts [58], Wackernagel [80] and Cressie [81] pointed out that this approach could yield inconsistent results, e.g., the sum of the separately kriged particle size fractions of the soils (i.e., sand, silt and clay) is not going to be 100%. Therefore, it is better to jointly model their spatial distribution. The application of the cokriging approach is able not just to exploit the advantages of this interdependency in spatial modelling but also to provide consistent results that are highly appreciated in further assessment [58,80,81]. Besides, in this study, this approach allowed us to generate such stochastic realizations that honored the joint spatial distribution of the SAS indicators and to examine in a consistent way (thanks to these realizations) how the prediction uncertainty of the SAS indicators propagates through the SAS classification scheme. As a matter of fact, there are some disadvantages that make the cokriging approach not so attractive in practice. In this study, we modelled the spatial distribution of three SAS indicators, which meant three direct-and three cross-variograms for a given soil depth (Figure 3), and then we used a linear model of coregionalization (Figure 3) to fit a statistically valid model. As the number of variables increases, not just the number of direct-and cross-variograms increases exponentially, which have to be modelled, but also the modelling becomes complicated and needs more effort because of the increasing number of conditions that have to be satisfied (for more details see Goovaerts [58]).

Conclusions
The objective of our paper was to present how Hungary contributed to GSSmap by preparing its own SAS maps using advanced DSM techniques. We used a combination of random forest and multivariate geostatistics for jointly modelling and predicting the spatial distribution of the selected SAS indicators with special attention to quantifying the prediction uncertainty and how this uncertainty propagates through the SAS classification scheme recommended by FAO.
By the interpretation of the importance plots of the fitted RFs, we have explored and identified the conditions and driving forces of SAS formation at various scales, which could support further studies on SAS. Furthermore, these findings can serve as a basis not just for better understanding the spatial distribution of SAS but also for further surveying, mapping and monitoring of these soils.
In this study, we have pointed out that indices derived from remotely sensed images can serve as highly informative covariates in digital mapping of salt-affected soils. It was revealed that short-scale variability of salt-affected soils, which causes mosaic-like patches in field, can be appropriately captured and modelled via remote sensing indices.
As we have highlighted, there is a long history and tradition of studying salt-affected soils in Hungary, and there are a number of SAS maps with varying scales. By this study and by the resulting maps of it, we not just successfully contributed to the GSSmap international initiative and complemented the available map series of salt-affected soils in Hungary, but also renewed their mapping methodology by using advanced DSM techniques. In the renewal of the methodology, we paid special attention to modelling and quantifying the prediction uncertainty that had not been quantified or even taken into consideration earlier.