The Soil Nutrient Digital Mapping for Precision Agriculture Cases in the Trans-Ural Steppe Zone of Russia Using Topographic Attributes

: Topographic features of territory have a signiﬁcant impact on the spatial distribution of soil properties. This research is focused on digital soil mapping (DSM) of main agrochemical soil properties—values of soil organic carbon (SOC), nitrogen, potassium, calcium, magnesium, sodium, phosphorus, pH, and thickness of the humus-accumulative (AB) horizon of arable lands in the Trans-Ural steppe zone (Republic of Bashkortostan, Russia). The methods of multiple linear regression (MLR) and support vector machine (SVM) were used for the prediction of soil nutrients spatial distribution and variation. We used 17 topographic indices calculated using the SRTM (Shuttle Radar Topography Mission) digital elevation model. Results showed that SVM is the best method in predicting the spatial variation of all soil agrochemical properties with comparison to MLR. According to the coefﬁcient of determination R 2 , the best predictive models were obtained for content of nitrogen ( R 2 = 0.74), SOC ( R 2 = 0.66), and potassium ( R 2 = 0.62). In our study, elevation, slope, and MMRTF (multiresolution ridge top ﬂatness) index are the most important variables. The developed methodology can be used to study the spatial distribution of soil nutrients and large-scale mapping in similar landscapes. According to the coefﬁcient of determination ( R 2 ) and RMSE values, the SVM method was superior to MLR in predicting all studied agrochemical soil properties. The SVM model was obtained to explain the spatial variation of N content and showed the best results ( R 2 = 0.74, RMSE = 14.23). This model is characterized by highly predictive ability. The medium level of prediction was obtained for modeling of SOC, Na, K, and thickness of the humus-accumulative (AB) horizon. The most important variables for predicting the spatial distribution of soil properties were elevation, slope, and MRRTF index. Our results showed the possibility of using topography variables with an SVM approach to predict the main soil agrochemical properties in the Trans-Ural steppe zone (Republic of Bashkortostan, Russia). The use of such ML techniques can reduce soil sampling efforts, and therefore reduce soil mapping costs and keep time/human resources. We assumed that the accuracies obtained in this study are promising for current/future local scale digital soil mapping efforts as well as for the other areas in the world with similar soil/climate conditions. We propose that the elaborated SVM models should be improved by adding other covariates, analyzing other soil formation factors, increasing the number and period of soil sampling, and using other ML techniques to obtain the best results for soil mapping.


Introduction
Nutrients are essential for soil fertility and stable plant growth [1]. Studying, modeling, and mapping the spatial distribution of soil properties is an important task for effective farming and sustainable land management [2]. Most Russian large-scale soil maps were created prior to the late 1980s [3], which is almost the same time as the start of economic problems in the USSR. Thus, the elaboration of new cartographic data and updating of existing cartographic data is necessary, especially due to the active agricultural use and transformation of climatic conditions. Nevertheless, field soil surveys and further large-scale mapping (including updating old maps) are expensive and time-consuming processes.
In recent decades, digital soil mapping (DSM) methods have been actively used to study and map soils and their properties. DSM is based on statistical, geostatistical, and mathematical methods to identify relationships between soil properties and soil formation factors, called environmental covariates [4]. DSM methods are more cost-effective and allow the creation of maps with greater accuracy and higher spatial resolution [5]. These methods are especially relevant to digital mapping of soil nutrients, as laboratory analyses of soil nutrients are costly and time-consuming. Profile curvature PrCu 5 Plan curvature PlCu 6 Flow accumulation FlAc 7 Analytical hillshading AnHil 8 Channel network base level CNBL 9 Channel network distance CND 10 Convergence index CI 11 LS-Factor LS-F 12 Topographic wetness index TWI 13 Valley depth VD 14 Closed depressions CD 15 Relative slope position RSP 16 Multiresolution ridge top flatness MRRTF 17 Multiresolution valley bottom flatness MRVBF Information about relief as a major or additional factor was successfully applied in different studies such as the migration of radiocesium in Japan [18], analysis of shallow landslides after an extreme rainstorm in Italy [19], and proliferation of fungal communities in a boreal alpine ecosystem [20]. In the studied region (Republic of Bashkortostan), the topography data were used in mapping subtypes of Chernozems soils and heavy metals [21], in the study of erosion processes [22], and in preparation of arable land for irrigation reclamation [23].
Machine learning (ML) techniques are actively used for prediction and mapping of soil properties. Various techniques including multiple linear regression (MLR) and support vector machine (SVM) have been successfully applied to the interrelation of soil properties and covariates [24][25][26]. MLR is a machine learning algorithm applied to regress a target variable. MLR is a least-squares model in which a targeted soil property is predicted from selected explanatory topographic variables [27]. This method establishes a linear relationship between the predicted elements and the covariates. For a better understanding of these relationships, MLR is presented below in Equation (1) [28]: where n is the number of predictors, y-response variables, x i -explanatory variables or predictors, a-intercept (constant term), b i -partial regression coefficients, and ε i -residuals. SVM is one of the basic supervised approaches for classification and regression analysis [29,30]. SVM using binary classification defines the closest points between two classes in feature space and draws an optimal dividing line between them [31,32]. This hyperplane, from the maximized margins, is used as a criterion for subsequent classification. Several investigators have reported that the SVM approach provides lower prediction errors compared with other methods [33][34][35].
As relief is one of the main factors of soil formation, we hypothesized that the ML techniques using topography information can predict the spatial distribution of most soil properties with good accuracy.
The objectives of this study are the following: (i) digital mapping of soil nutrients using relief attributes in arable land at the Trans-Ural steppe zone (Republic of Bashkortostan, Russia), (ii) model performance evaluation using MLR and SVM algorithms, and (iii) creating digital maps and determining the spatial distribution of soil nutrients. We assume that the developed models can be used for further large-scale digital mapping of agricultural lands in other regions of the world with similar conditions.

Materials and Methods
The study site is located in the Trans-Ural steppe zone, near the southern border of the Ural Mountains (Republic of Bashkortostan, Russia) ( Figure 1). The study area consists of 1400 ha and is characterized mostly by gentle (1-3 • ) slopes up to 8 • with various exposures. The elevation of the studied field ranges from 460 m in the northwestern part of the site to 377 m above sea level in the southeastern part. The state of soil cover at studied plot is characterized as arable land (the plowing by a tractor with a turnover of the soil layer to a 10-15 cm depth repeated every year in spring). Wheat (Triticum aestivum) is predominantly cultivated on the plot. The reconnaissance survey of the study area showed that in the southern part of territory, water erosion processes (sheet and rill erosion) are occurring.
As relief is one of the main factors of soil formation, we hypothesized that the ML techniques using topography information can predict the spatial distribution of most soil properties with good accuracy.
The objectives of this study are the following: (i) digital mapping of soil nutrients using relief attributes in arable land at the Trans-Ural steppe zone (Republic of Bashkortostan, Russia), (ii) model performance evaluation using MLR and SVM algorithms, and (iii) creating digital maps and determining the spatial distribution of soil nutrients. We assume that the developed models can be used for further large-scale digital mapping of agricultural lands in other regions of the world with similar conditions.

Materials and Methods
The study site is located in the Trans-Ural steppe zone, near the southern border of the Ural Mountains (Republic of Bashkortostan, Russia) ( Figure 1). The study area consists of 1400 ha and is characterized mostly by gentle (1-3°) slopes up to 8° with various exposures. The elevation of the studied field ranges from 460 m in the northwestern part of the site to 377 m above sea level in the southeastern part. The state of soil cover at studied plot is characterized as arable land (the plowing by a tractor with a turnover of the soil layer to a 10-15 cm depth repeated every year in spring). Wheat (Triticum aestivum) is predominantly cultivated on the plot. The reconnaissance survey of the study area showed that in the southern part of territory, water erosion processes (sheet and rill erosion) are occurring. The climate of the study area is characterized as arid and slightly arid, or as Dfa by the Köppen climate classification. The average annual air temperature is 1.4°. The average annual amount of precipitation is 379 mm. The soil cover is presented by Haplic Chernozems [36]. The parent materials are eluvial-deluvial carbonate clay, heavy loams, and eluvium of sandy schists.
In the studied arable plot, soils were sampled from the upper horizons (0-10 cm) in representative places. The scheme of sample points location is shown on the map ( Figure  1). We used a total of 76 topsoil samples, collected in July of 2020. Sample preparation The climate of the study area is characterized as arid and slightly arid, or as Dfa by the Köppen climate classification. The average annual air temperature is 1.4 • C. The average annual amount of precipitation is 379 mm. The soil cover is presented by Haplic Chernozems [36]. The parent materials are eluvial-deluvial carbonate clay, heavy loams, and eluvium of sandy schists.
In the studied arable plot, soils were sampled from the upper horizons (0-10 cm) in representative places. The scheme of sample points location is shown on the map ( Figure 1). We used a total of 76 topsoil samples, collected in July of 2020. Sample preparation was carried out according to the standard procedure: samples were air-dried, then sieved through a 1 mm sieve for further laboratory trials.
Topographic (or terrain) attributes were calculated using the digital elevation model (DEM) NASA's Shuttle Radar Topography Mission (SRTM), with a resolution of 30 m (https: //www2.jpl.nasa.gov/srtm, accessed on 12 October 2020). Calculation of topographic attributes was performed in the SAGA GIS. The terrain attributes obtained from DEM are present in Table 1.
MLR and SVM techniques were used for spatial modeling of soil agrochemical properties. For choosing the final MLR model and to prevent overfitting, we used minimizing Mallows's C p [40] and the stepwise backward selection method.
The SVM approach was conducted using a radial basis kernel. The tuning method was used for choosing the best parameters in the SVM model (C and gamma). These values were defined by the following intervals: 0.1 < C < 1.0 (step size is 0.1) and 0.2 < γ < 0.5 (step size is 0.05), according to Mattera and Haykin [41] and Cherkassky and Ma [42]. The root-mean-square error of prediction (RMSE) and coefficient of determination (R 2 ) values were used for determining final accuracy prediction (Equations (2) and (3)). These values were evaluated by the 10-fold cross-validation approach [43]. Randomly selected, 80% of the total samples were used for model training data, and then the remaining 20% were used for model validation and accuracy assessment. The analyzed data set was randomly divided into ten equally sized folds. Then, nine of these folds were used to calibrate the model and predict the values for the remaining fold. This procedure was repeated 10 times, each time setting a different fold aside [44]. The advantage of this method is that it performs reliably and is unbiased on smaller data sets [45].
where O i and P i are observed and predicted values of soil properties, O avg and P avg are the average values, and n is the number of samples. The model with the lowest RMSE and highest R 2 values was considered to be the most applicable [46]. The R 2 was determined by the following classification [47]: models with R 2 < 0.4 show a poor or very low level of predictive ability; values of 0.5 < R 2 < 0.7 indicate models with an average level of forecasting; models with R 2 > 0.7 are highly predictive.

Results and Discussion
The studied soil in the 0-10 cm depth has a low and medium SOC content and a neutral and slightly alkaline reaction. The content of NPK is sufficient and commensurate for this region, as well as the values of exchangeable cations [50]. According to the thickness of the humus-accumulative (AB) horizon, the soils are characterized as an eroded, uneroded, and somewhere with sediment deposition. The general statistics parameters of soil propertiesmean, minimum, maximum, standard deviation (SD), and coefficient of variation (CV)-are shown in Table 2.  cs description of soil properties in the 0-10 cm depth. SOC: soil organic carbon; AB: humus-accumulative.

SOC, %
The    The final results of modeling using MLR and SVM methods are presented in Table 3. According to the R 2 and RMSE values, SVM showed a better performance for all predicting properties compared to the MLR method. The SVM method has the best ability to predict N (R 2 = 0.74, RMSE = 14.23). For example, Zhou [25] also developed an SVM model for soil total nitrogen prediction in northwestern China. The authors explained 50% of the variation using data about land use, land cover, climate, topography, and remote-sensing-derived variables.  [15] obtained an MLR model with an R 2 value of 0.45 for predicted spatial variation of exchangeable potassium on the arable plot in Russia using topography variables.
Good performance for predicting spatial distribution was obtained by SOC (R 2 = 0.66, RMSE = 0.53) using the SVM approach. This model uses seven topographic variables (AnHil, As, LS-F, TWI, MRRTF). Silatsa [51] concluded that terrain attributes are important variables for predicting SOC content. Similarly, Mahmoudzadeh [35] found that 58% of SOC distribution was described by topographic and climate variables using the SVM model in Iran.
The SVM technique showed almost similar coefficients of determination for the Na (R 2 = 0.49, RMSE = 0.20) and depth of humus horizon (R 2 = 0.52, RMSE = 6.81). The SVM method performed significantly better than the MLR in RMSE value for depth of humus horizon (6.8 vs. 47.03, respectively).
The SVM model explained 43% of the Ca variation (R 2 = 0.43, RMSE = 5.33). The abovementioned study [15], Gopp obtained a model with a 0.67 coefficient of determination for exchangeable Ca.
Low coefficients of determination (R 2 < 0.25) and, consequently, no predictive capabilities were found for pH (R 2 = 0.01), P (R 2 = 0.11), and Mg (R 2 = 0.04). These elements are not predicted with satisfactory accuracy by the methods used. Previous results also obtained poor models for predicting the spatial distribution of pH using machine learning approaches [52,53].
The relative importance of the variables used for predicting soil elements is shown in Figure 3. The MMRTF index is a key attribute for predicting the SOC and thickness of the humus horizon variation (56% and 47%, respectively). MMRTF is a basic variable in Ca modeling (38%). This topographic attribute is a multiscale flatness index that identifies sloping and flat areas and characterizes the direction and strength of the distribution of matter in the landscape [54]. For example, an MMRTF attribute was successfully used in models to predict SOC, K, pH, and clay using regression kriging approaches [55]. , 10, x FOR PEER REVIEW 7 o of matter in the landscape [54]. For example, an MMRTF attribute was successfully us in models to predict SOC, K, pH, and clay using regression kriging approaches [55]. Terrain elevation is the most important variable for explaining the variability of and Ca in the study area (81% and 41%, respectively). Wang [56] reported that elevati is the most important variable affecting the spatial distribution of N. In some other digi soil mapping studies, elevation was also noted as the most important covariate for p dicting soil elements [51,53,[57][58][59].
The slope is the main covariate in the model for Na in our research (41%). The m effective variables are CNBL (21%), LS-Factor (18%), and elevation (17%). The LS-Fac attribute is the second most important parameter in the SOC and N models (16% and 9 respectively). The LS-Factor is a combination of slope length and slope angle. This tribute is one of the five factors of the universal soil loss equation (USLE) and describi the influence of topography on soil erosion risk. Figure 4 shows the frequency of use of all 17 topographic attributes. The most f quently used topographic attribute in modeling is aspect (slope exposition). It is us four times in various regression equations and is involved in the prediction of almost elements (except Ca and Mg). The topographic attribute analytical hillshading (AnHil used three times, which is closely related to aspect; AnHil raster image is a relief with shadow, created based on the angle of the light source and the shadow. Sahabiev [5 used a southwest position as a separate topographic parameter in the regression analy for the prediction model for humus content, N, pH, and physical clay (the sum of s fractions less 0.01 mm, %). The elevation, LS-Factor, and MRRTF variables were also us three times each in the prediction. These variables are closely related to each other a successfully determine areas where accumulation of soil nutrients is occurring. Terrain elevation is the most important variable for explaining the variability of N and Ca in the study area (81% and 41%, respectively). Wang [56] reported that elevation is the most important variable affecting the spatial distribution of N. In some other digital soil mapping studies, elevation was also noted as the most important covariate for predicting soil elements [51,53,[57][58][59].
The slope is the main covariate in the model for Na in our research (41%). The most effective variables are CNBL (21%), LS-Factor (18%), and elevation (17%). The LS-Factor attribute is the second most important parameter in the SOC and N models (16% and 9%, respectively). The LS-Factor is a combination of slope length and slope angle. This attribute is one of the five factors of the universal soil loss equation (USLE) and describing the influence of topography on soil erosion risk. Figure 4 shows the frequency of use of all 17 topographic attributes. The most frequently used topographic attribute in modeling is aspect (slope exposition). It is used four times in various regression equations and is involved in the prediction of almost all elements (except Ca and Mg). The topographic attribute analytical hillshading (AnHil) is used three times, which is closely related to aspect; AnHil raster image is a relief with a shadow, created based on the angle of the light source and the shadow. Sahabiev [55] used a southwest position as a separate topographic parameter in the regression analysis for the prediction model for humus content, N, pH, and physical clay (the sum of soil fractions less 0.01 mm, %). The elevation, LS-Factor, and MRRTF variables were also used three times each in the prediction. These variables are closely related to each other and successfully determine areas where accumulation of soil nutrients is occurring.
Only models with a coefficient of determination R 2 ≥ 0.25 were used to create the maps. The spatial variation of studied soil properties using the SVM predicted models is shown in Figure 5. The spatial variation of SOC, Ca, N, K, and thickness of humus horizon correlate with elevation and are characterized by higher values in the top areas of the western part and the central part of the studied plot. Furthermore, these properties gradually decrease towards the southern part of the field, where runoff processes occur. This south area sufficiently demonstrates the zones of accumulation of Ca, Na, and SOC content due to runoff and sediment deposition from elevated areas. The studied region is characterized by active wind erosion processes [60]. Thus, increased values of these nutrients are also found along roads and forest belts, designed to protect against wind erosion. The spatial distribution of Na has a different character compared to other mapped elements. The Na accumulation zones are concentrated in the southwestern part.
tribute is one of the five factors of the universal soil loss equation (USLE) and describing the influence of topography on soil erosion risk. Figure 4 shows the frequency of use of all 17 topographic attributes. The most frequently used topographic attribute in modeling is aspect (slope exposition). It is used four times in various regression equations and is involved in the prediction of almost all elements (except Ca and Mg). The topographic attribute analytical hillshading (AnHil) is used three times, which is closely related to aspect; AnHil raster image is a relief with a shadow, created based on the angle of the light source and the shadow. Sahabiev [55] used a southwest position as a separate topographic parameter in the regression analysis for the prediction model for humus content, N, pH, and physical clay (the sum of soil fractions less 0.01 mm, %). The elevation, LS-Factor, and MRRTF variables were also used three times each in the prediction. These variables are closely related to each other and successfully determine areas where accumulation of soil nutrients is occurring.  Only models with a coefficient of determination R 2 ≥ 0.25 were used to create the maps. The spatial variation of studied soil properties using the SVM predicted models is shown in Figure 5. The spatial variation of SOC, Ca, N, K, and thickness of humus horizon correlate with elevation and are characterized by higher values in the top areas of the western part and the central part of the studied plot. Furthermore, these properties gradually decrease towards the southern part of the field, where runoff processes occur. This south area sufficiently demonstrates the zones of accumulation of Ca, Na, and SOC content due to runoff and sediment deposition from elevated areas. The studied region is characterized by active wind erosion processes [60]. Thus, increased values of these nutrients are also found along roads and forest belts, designed to protect against wind erosion. The spatial distribution of Na has a different character compared to other mapped elements. The Na accumulation zones are concentrated in the southwestern part. For validation and assessment of the ML techniques accuracy in soil mapping (prediction of soil elements spatial distribution), we compared the results with ordinary kriging approach. The maps based and created using a kriging approach ( Figure 6) have a similar distribution of studied soil parameters as obtained by the SVM method. The Figure 5. Spatial distribution of soil agrochemical properties using SVM method (for R 2 models ≥ 0.25). Here, and in Figure 6, areas with vegetation and roads are masked by white color. niques will be absent or minor. For example, the research [61] conducted earlier in the studied region showed that some properties of soils (namely, the thickness of humus-accumulative horizon, SOC, P, and N) changed insignificantly (less than 5-10%) from 1975 to 2011. Basically, those changes were associated with erosion processes and mechanical impact by tillage operations. Figure 6. Spatial distribution of soil agrochemical properties using an ordinary kriging method (for R 2 models ≥ 0.25).

Conclusions
The monitoring of agricultural land conditions is a necessary task, especially due to intensive anthropogenic pressures and climate transformation. Large-scale mapping of soil properties is necessary for rational land management and agriculture reclamation measures.
According to the coefficient of determination (R 2 ) and RMSE values, the SVM method was superior to MLR in predicting all studied agrochemical soil properties. The SVM model was obtained to explain the spatial variation of N content and showed the best results (R 2 = 0.74, RMSE = 14.23). This model is characterized by highly predictive ability. The medium level of prediction was obtained for modeling of SOC, Na, K, and thickness of the humus-accumulative (AB) horizon. The most important variables for predicting the spatial distribution of soil properties were elevation, slope, and MRRTF index.
Our results showed the possibility of using topography variables with an SVM approach to predict the main soil agrochemical properties in the Trans-Ural steppe zone (Republic of Bashkortostan, Russia). The use of such ML techniques can reduce soil sampling efforts, and therefore reduce soil mapping costs and keep time/human resources. We assumed that the accuracies obtained in this study are promising for current/future local scale digital soil mapping efforts as well as for the other areas in the For validation and assessment of the ML techniques accuracy in soil mapping (prediction of soil elements spatial distribution), we compared the results with ordinary kriging approach. The maps based and created using a kriging approach ( Figure 6) have a similar distribution of studied soil parameters as obtained by the SVM method. The highest values of SOC, Ca, N, K, and thickness of humus horizon are also located in the western part on elevated areas. Thus, the accuracies obtained for the ML techniques in this study are promising and useful in digital soil mapping.
It should be noted that the soil properties could changing over the time. Thus, it is necessary to continue the present study for comparing and validating the results. At the same time, we assumed that the error in digital soil mapping with use of the ML techniques will be absent or minor. For example, the research [61] conducted earlier in the studied region showed that some properties of soils (namely, the thickness of humus-accumulative horizon, SOC, P, and N) changed insignificantly (less than 5-10%) from 1975 to 2011. Basically, those changes were associated with erosion processes and mechanical impact by tillage operations.

Conclusions
The monitoring of agricultural land conditions is a necessary task, especially due to intensive anthropogenic pressures and climate transformation. Large-scale mapping of soil properties is necessary for rational land management and agriculture reclamation measures.
According to the coefficient of determination (R 2 ) and RMSE values, the SVM method was superior to MLR in predicting all studied agrochemical soil properties. The SVM model was obtained to explain the spatial variation of N content and showed the best results (R 2 = 0.74, RMSE = 14.23). This model is characterized by highly predictive ability. The medium level of prediction was obtained for modeling of SOC, Na, K, and thickness of the humus-accumulative (AB) horizon. The most important variables for predicting the spatial distribution of soil properties were elevation, slope, and MRRTF index.
Our results showed the possibility of using topography variables with an SVM approach to predict the main soil agrochemical properties in the Trans-Ural steppe zone (Republic of Bashkortostan, Russia). The use of such ML techniques can reduce soil sampling efforts, and therefore reduce soil mapping costs and keep time/human resources. We assumed that the accuracies obtained in this study are promising for current/future local scale digital soil mapping efforts as well as for the other areas in the world with similar soil/climate conditions. We propose that the elaborated SVM models should be improved by adding other covariates, analyzing other soil formation factors, increasing the number and period of soil sampling, and using other ML techniques to obtain the best results for soil mapping.
Author Contributions: Azamat Suleymanov and Evgeny Abakumov interpreted the data and prepared the manuscript. Azamat Suleymanov, Ruslan Suleymanov, Ilyusya Gabbasova and Mikhail Komissarov performed sample collection and laboratory analyses. All authors have read and agreed to the published version of the manuscript.

Funding:
The article was made with support of the Ministry of Science and Higher Education of the Russian Federation in accordance with agreement № 075-15-2020-922 date 16.11.2020 on providing a grant in the form of subsidies from the Federal budget of Russian Federation. The grant was provided for state support for the creation and development of a world-class scientific center "Agrotechnologies for the Future".

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.