Smart-Map: An Open-Source QGIS Plugin for Digital Mapping Using Machine Learning Techniques and Ordinary Kriging

Machine Learning (ML) algorithms have been used as an alternative to conventional and geostatistical methods in digital mapping of soil attributes. An advantage of ML algorithms is their flexibility to use various layers of information as covariates. However, ML algorithms come in many variations that can make their application by end users difficult. To fill this gap, a Smart-Map plugin, which complements Geographic Information System QGIS Version 3, was developed using modern artificial intelligence (AI) tools. To generate interpolated maps, Ordinary Kriging (OK) and the Support Vector Machine (SVM) algorithm were implemented. The SVM model can use vector and raster layers available in QGIS as covariates at the time of interpolation. Covariates in the SVM model were selected based on spatial correlation measured by Moran’s Index (I’Moran). To evaluate the performance of the Smart-Map plugin, a case study was conducted with data of soil attributes collected in an area of 75 ha, located in the central region of the state of Goiás, Brazil. Performance comparisons between OK and SVM were performed for sampling grids with 38, 75, and 112 sampled points. R2 and RMSE were used to evaluate the performance of the methods. SVM was found superior to OK in the prediction of soil chemical attributes at the three sample densities tested and was therefore recommended for prediction of soil attributes. In this case study, soil attributes with R2 values ranging from 0.05 to 0.83 and RMSE ranging from 0.07 to 12.01 were predicted by the


Introduction
Digital mapping of soil and plant attributes provides information allowing variablerate (VR) application of agricultural inputs [1]. However, the precision of the VR application depends on precision of the maps that are obtained, typically through interpolation among georeferenced samples. In an economically viable sampling system, a range of interpolation methods can be used, including the geostatistical method of Ordinary Kriging (OK), which is popular in digital soil mapping [2]. However, a disadvantage of OK is the need for large numbers of sampling points for semi-variance modeling [3,4].
Recently, with the large volume of information generated in production fields, Machine Learning (ML) techniques have been used as an alternative to OK for digital mapping of soil attributes [5][6][7][8][9]. ML algorithms attempt to discover and quantify patterns among available data to make predictions. Several models that use ML algorithms for prediction and mapping of soil attributes have been developed [7,10,11], among which are Random Forest, Support Vector Machine (SVM), Cubist, K-Nearest Neighbors, and Artificial Neural Networks [10,12,13]. However, to implement ML models for digital mapping, it is necessary to master open-source programming languages such as Python (Python Software Foundation, Wilmington, DE, USA) and R [14].
For the development of applications using ML, several layers of data must be available, such as environmental and climatic variables, soil and plant sensor data, satellite imagery, yield maps, and digital elevation models. These data can be in matrix or vector format, and in various spatial resolutions, which can make the implementation of the ML interpolation model challenging. As many of these features may have a greater or lesser importance in modeling, it may be necessary to use feature selection and elimination techniques [13,15,16].
A computational tool that facilitates the use of ML techniques in digital mapping without requiring programming knowledge can assist users of geographic information systems (GIS) software. QGIS [17] is open-source software featuring a user-friendly interface and an active community of developers and users. Free computer programs are available for Ordinary Kriging, such as Vesper [18], SGeMS [19], DAGApy [20], and KrigMe [21]. However, none of these are available as a QGIS plugin. Given the potential application of ML and the need to integrate QGIS into a system for digital mapping of soil attributes, this study aimed to develop a plugin called Smart-Map that is integrated with QGIS software for digital mapping using OK and ML as interpolation methods.

Materials and Methods
Smart-Map was registered with the National Institute of Industrial Property (INPI, Ministry of Economy, Brazil, BR 51 2021 000002-1). Its latest version can be found on GitHub web site. Available online: https://github.com/gustavowillam/SmartMapPlugin (accessed on 25 May 2022) or installed from the QGIS plugin repository. Available online: https://plugins.qgis.org/plugins/Smart_Map (accessed on 25 May 2022). Python 3.7 was used to develop the software, being compatible with macOS, Linux, and Windows operating systems. The graphical user interface (GUI) was designed using PyQt5 (Riverbank Computer Limited, Dorchester, United Kingdom). The software is a plugin to QGIS version 3.10 or higher.

Smart-Map Implementation
To validate the OK and ML methodology used by Smart-Map, a case study was conducted, where the accuracy of the interpolation of soil attributes was compared using OK and ML for different sampling grids. For the OK interpolation method, the protocols and equations described by [22] were adopted. The developed plugin allows the user to fit five models of isotropic theoretical semivariograms: linear, linear with sill, exponential, spherical, and Gaussian. The semivariogram model was chosen using a cross-validation method.
The Support Vector Machine (SVM) method is a machine learning algorithm, developed in the 1990s and used for both regression and classification of datasets [23]. The SVM method was chosen for interpolation because it can handle smaller and larger volumes of data [24]. For most ML algorithms, it is necessary to fit hyperparameters that need to be chosen by the user because they depend on the data type and variation. For the SVM algorithm, hyperparameters such as C and gamma (γ) were optimized using a systematic grid search method [25,26], enabling automated fitting. Hence, the C and gamma hyperparameters were optimized based on the RMSE value found during cross-validation. Kernel function is another important hyperparameter for SVM. For the plugin, the Radial Basis Function (RBF) kernel was chosen because it is a non-linear function and can be fitted to most of the data.
In addition to the generation of interpolation maps, Smart-Map can perform cluster analyses using the fuzzy k-means method [27], yielding Management Zones (MZ) maps. To define the ideal number of classes, Smart-Map calculates the FPI (Fuzzy Performance Index) and NCE (Normalized Classification Entropy) indices, which are widely recommended in the literature to define the appropriate number of MZs [28,29]. To execute the cluster process and define the MZs, the fuzzy k-means algorithm of the Scikit-Fuzzy Python library was implemented [30]. The flowchart of the Smart-Map plugin is shown in Figure  1, whereas Figure 2 shows the GUI for map interpolation using OK and SVM in Smart-Map.

Case Study for Smart-Map Plugin Evaluation
A case study to evaluate Smart-Map was conducted in an area of 75 ha, located between the municipalities of Anápolis and Goianápolis at latitude and longitude of approximately −16.274839 and −48.593840, in the central region of the state of Goiás, Brazil (Figure 3). This area is cultivated with soybean, has an average altitude of 1017 m, a flat relief with a soil predominantly classified as Ferralsols, based on the World Reference Base for Soil Resources [31]. Soil samples were collected using a regular grid with a sampling density of two points per hectare, totaling 150 composite samples. The samples were georeferenced with a topographic GNSS Promark 3 (Magellan Co., Santa Clara, CA, USA). Each composite sample comprised 10 individual samples (0 to 0.20 m depth), collected within a 3 m radius. Composite samples were homogenized, packed in plastic bags and identified using a composite sample number. Laboratory analyses were performed to measure the concentrations of macronutrients (P, K + , Ca 2+ and Mg 2+ ), organic matter, cation exchange capacity at pH 7, and particle size. Data of apparent soil electrical conductivity (ECa) were also collected on five dates (Eca_1 measured on 11/11/2010, Eca_2 measured on 11/23/2010, Eca_3 measured on 12/04/2010, Eca_4 measured on 12/13/2010 and Eca_5 measured on 01/26/2011) using a portable conductivity meter Landviser LandMapper ® ERM 02 (Landviser LLC, League City, TX, USA). This device measures the electrical resistivity of the soil using four equally spaced electrodes [32]. The apparent electrical conductivity of the soil is obtained by 1/resistivity. The data used in the case study, were made available to the research community [33]. Descriptive statistics of the data are presented in Table 1.   (6) CEC, cation exchange capacity at pH 7; (7) Altitude; (8) Clay; (9) Silt; (10) Sand; (11) Eca_1, Apparent Soil Electrical conductivity measured on 11/11/2010; (12) Eca_2, Apparent Soil Electrical conductivity measured on 11/23/2010; (13) Eca_3, Apparent Soil Electrical conductivity measured on 12/04/2010; (14) Eca_4, Apparent Soil Electrical conductivity measured on 12/13/2010; (15) Eca_5, Apparent Soil Electrical conductivity measured on 01/26/2011; (16) Eca_Avg, Mean Value of Apparent Soil Electrical conductivity of Eca_1, Eca_2, Eca_3, Eca_4, Eca_5; (17) SD, Standard Deviation; (18) CV, Coefficient of Variation.

Methods of Interpolation and Spatial Correlation Analysis
In the case study, an interpolation grid of 10 m × 10 m was defined to perform interpolation by OK and SVM. To interpolate each point of the grid using OK, the search radius was defined equal to the range obtained by the theoretical semivariogram; the maximum number of neighbors was defined as 16. For interpolation by OK, Smart-Map uses the Python open-source PyKrige library [34]. The PyKrige library performs the interpolation using the k-nearest neighbors method. The library was adapted to also accept the search radius. Interpolation was performed using the k-nearest neighbors method or using a neighborhood search radius, selected by the user.
For interpolation by SVM, a supervised learning model, available in the open-source Scikit-Learn Python library, was implemented [35]. For modeling, it is necessary to construct the X matrix and y vector. The X matrix is composed of columns with the features (covariates) and rows, which are the soil samples. In the X matrix, the geographic coordinates x and y of the point to be interpolated were added as features. In addition to geographic coordinates, other features, including the feature of the variable itself, were added in the X matrix. In this case, the feature is created based on the calculation of the Inverse Distance Weighting (IDW) of the nearest neighbors to the point to be interpolated. The y vector was composed of the observed (true) values of each soil attribute to be interpolated. In this case study, the attributes P, K + , Ca 2+ , and Mg 2+ were interpolated variables. Thus, the observed value obtained for the point is part of the y vector and is not used for feature creation, rather merely the IDW of neighbors of the point were considered. In addition, Smart-Map allows the use of data from other layers in the QGIS database (vector or raster) as features.
In the case study, two methods of modeling by SVM were used, which were termed as SVM1 and SVM2. For the SVM1 method, the geographic coordinates (x and y) and the value of the variable itself, which was estimated using the IDW interpolation method, were used as features. In SVM2, those features that were more correlated with the variable to be interpolated were used as covariates, in addition to the geographic coordinates (x and y) and the value of the variable itself, interpolated using IDW. The selection of covariates was made based on the spatial correlation of Moran's Index (I'Moran), one of the most popular indices for evaluation of spatial correlation [36] of regionalized variables. The univariate I'Moran was used to compare the degree of correlation of the variable itself in different distance spaces (spatial autocorrelation). The univariate I'Moran measures the autocorrelation of the variable to be interpolated. This index was used as an indicator of the spatial dependence of each attribute [37]. A univariate I'Moran value equal to zero means that the variable under study does not show spatial correlation. The closer the value is to 1 or −1, the greater the autocorrelation, that is, the greater the spatial correlation of the variable [6,38]. Univariate I'Moran was calculated according to Equation (1) [39]. The bivariate I'Moran was used to measure the spatial correlation between the available covariates such as CEC, OM, Altitude, Clay, Silt, Sand, and ECa, with the attribute that was interpolated. Its value was calculated according to Equation (2) [40].
where: is the number of observations in the area under study; , represent the observed values of the soil attributes to be interpolated at the points i, j; , represent the observed values of the selected covariate at the points i, j; ̅ is the average of x; � is the average of y; are the elements of the matrix of spatial weights with value 0 on the diagonal (wii = 0).
The optimal subset of covariates for the SVM2 method was selected considering the bivariate I'Moran. Covariates that showed greater spatial correlation with the variable to be interpolated were added to the SVM2 method. To verify the significance of I'Moran, the pseudo p-value was obtained from 999 permutations between the points of the sampling grid at 1% and 5% probability levels. For the calculation of I'Moran, Smart-map used the PySAL open-source Python library [41].

Generation of Scenarios and Performance Criteria for Comparison between Interpolation Methods
To compare the performance of the OK method and the SVM models (SVM1 and SVM2) at various sampling densities, the regular grid of 150 points in the area was reduced to grids with lower densities (25%, 50%, and 75%). Three grids were obtained with 38, 75, and 112 points, respectively. These points were used for semivariogram modeling in the OK method and definition of the training set in the SVM model, whereas the remaining points were used for verification of the accuracy of the prediction. Figure 4a shows the original grid with 150 points and the reduced grids composed of modeling and testing data. In the grid with 38 points (Figure 4b), 38 points were used for modeling and 112 points for testing. In the grid with 75 points (Figure 4c), 75 points were used for modeling and 75 points were used for testing. In the grid with 112 points (Figure 4d), 112 points were used for modeling and 38 points were used for testing.
From the reduction of the sampling grid, interpolated maps were generated using the sets of training points for the OK method and the SVM model at the three densities of sampling grids. In this case study, the attributes P, K + , Ca 2+ , and Mg 2+ were interpolated. For modeling, the SVM method requires the adjustment of two hyperparameters, C and gamma. K-fold cross-validation was used to obtain optimal values of these hyperparameters. Validation with 5-folds was used to optimize the model in the selection of the best hyperparameters using the training dataset. The leave-one-out cross-validation (LOOCV) [42] method was used to measure the performance of the implemented methods. LOOCV consists of using all data and leaving one data point out and has been widely used due to its mathematical simplicity. The outside point is then interpolated by one of the interpolation methods [43]. This strategy was applied to all samples in the set. As the actual values of the set are known, the Coefficient of Determination (R 2 ) and RMSE values of the LOOCV were calculated. The R 2 and the RMSE of predicted and observed data of LOOCV were calculated for each model and for each interpolated attribute. The test sets were used to calculate the R 2 and RMSE of each map obtained by interpolation of P, K + , Ca 2+ and Mg 2+ , after modeling. For this, the interpolated values of P, K + , Ca 2+ and Mg 2+ were extracted from the same places where the test points were located. R 2 and RMSE were calculated using Equations (3) and (4), for P, K + , Ca 2+ , and Mg 2+ for the various sampling grids.
where: � represents the estimated value of the soil attribute at the point k; ̅ is the average of the n sampled points of the soil attribute; is the observed value of the soil attribute at the point k; and is the number of points sampled.

Definition and Selection of Features for the SVM Model
To define the features to be inserted in the SVM model, the user defines the Inverse Distance Weighting (IDW) parameters such as the weighting value (p), the search radius, and the number of neighbors (n) to consider for calculating the feature for the X matrix. In the case study, a search radius equal to the maximum distance between the sampled points was used, the number of neighbors was equal to 16 and a weight (p) equal to unity were used as default values. Figure 5a shows a selection of the 16 closest neighbors to the point where the user wishes to estimate the attribute value using the IDW method of the selected attribute of the QGIS layer (target_A). Figure 5b shows how the ML model for the SVM1 and SVM2 methods was constructed divided into features (X matrix) and variable to be interpolated (y vector). Each row of the training data represents a sample of the grid. In the X matrix, coordX and coordY are the x and y coordinates of the sampled point, respectively; idwA represents the estimated value for the variable based on IDW using the 16 neighbors closest to the sampled point of the attribute to be interpolated; idw_At1, idw_At2, idw_Atn represent the estimated value based on IDW using the 16 neighbors closest to the sampled point of the selected features. In the y vector, target_A represents the sampled values of the attribute to be interpolated, which were P, K + , Ca 2+ , and Mg 2+ . For SVM1, the features consisted of the coordinates (coordX and coordY) of the point and the IDW value of the variable (y) using the 16 neighbors closest to the sampled point, within the defined search radius of the attribute to be estimated. The variable to be interpolated (y) represents the observed soil attribute, for which the user wishes to predict its values at unsampled locations. In this case study, the variables are P, K + , Ca 2+ and Mg 2+ .
In the second approach (SVM2), the features were the coordinates (coordX and coordY), and the IDW of 12 covariates available in the study area: OM, CEC, Altitude, Clay, Silt, Sand, ECa_1, ECa_2, ECa_3, ECa_4, ECa_5, and ECa_Avg. In this case, the features used originated from the original grid with 150 points. This was done because the goal of using the SVM is to take advantage of information that has been densely sampled in the area. These data can be obtained by sensors or comprise quasi-static information.
The R 2 accuracy metric of the LOOCV cross-validation was applied for each subset of covariate added. The subset of covariates that had the best value of R 2 was chosen to define the SVM model to be used for the variable to be interpolated. This selection was performed considering all features for grids with 38, 75, and 112 sampling points. The final trained SVM model was obtained after performing the LOOCV of all points of the training set (Figure 5c). With the trained model, the interpolation of soil variables (P, K + , Ca 2+ , and Mg 2+ ) was performed, thus obtaining the interpolated map for the attribute (Figure 5d).

Results and Discussion
In this section we discuss the results of the spatial correlation obtained through the I'Moran method between the covariates used by the SVM1 and SVM2 methods and the interpolated variables (P, K + , Ca 2+ , and Mg 2+ ). In addition, a performance comparison between the OK and SVM methods is discussed. The OK and SVM1 methods used only the estimated value of the variable to be interpolated as an input feature for the model. The SVM2 method used, in addition to the estimated value of the variable, the covariates with the highest spatial correlation with the variable to be interpolated as input for the model.

Spatial Correlation and Selection of Covariates for the SVM Model
For spatial correlation analysis at the three densities of sampling grid, bivariate I'Moran was used to measure the correlation between the contents of the macronutrients P, K + , Ca 2+ , and Mg 2+ and the covariates with highest temporal stability (CEC, OM, Altitude, Clay, Silt, Sand, ECa_1, ECa_2, ECa_3, ECa_4, ECa_5, ECa_Avg). Figure 6 shows the values of univariate I'Moran for the variables to be interpolated (P, K + , Ca 2+ , and Mg 2+ ) and bivariate I'Moran between the variables to be interpolated and the covariates with greatest temporal stability for the sampling densities of 38, 75, and 112 points. Figure 6 shows that apparent soil electrical conductivity (ECa) measured on five dates showed a significant positive correlation with the attributes Mg 2+ and Ca 2+ , with values ranging from 0.12 (between Ca 2+ and ECa_4, grid of 75 points) to 0.61 (between Mg 2+ and ECa_Avg, grid of 38 points). For the interpolation of these two soil attributes, ECa was used as a covariate in the SVM2 method at the three densities of sampling grids (Figure 6). In the grid of 38 sampling points (Figure 6a), the covariates ECa_1 for the attributes Mg 2+ and Ca 2+ and CEC for Ca 2+ were used. In the grid with 75 points (Figure 6b), ECa_Avg was used for the Mg 2+ attribute and ECa_1 was used for the Ca 2+ attribute. Finally, in the grid with 112 sampling points (Figure 6c), the attributes OM and ECa_1 for Mg 2+ and ECa_Avg for Ca 2+ were used as interpolation covariates.
ECa showed low correlations with the attributes P and K + , implying a lower potential for use as covariates to interpolate P and K + . ECa_4 was used to interpolate only the P attribute in the grid with 38 points, since the correlation was significant with I'Moran of −0.18 (Figure 6a). For the same grid, CEC was used as a covariate for the K + attribute. For the grid with 75 points (Figure 6b), the covariates CEC and OM were used for the K + attribute and the covariate Altitude was used for the P attribute. According to Figure 6b, Altitude presented the highest spatial correction of I'Moran with attribute P, 0.19 and pvalue ≤ 0.05, as well as attribute Sand. However, only Altitude was used because presented the best score in LOOCV. For the grid with 112 points (Figure 6c), the K + attribute used the covariates CEC, OM, and Altitude, and the P attribute used Sand as covariate for interpolation.

Comparison between OK and SVM Methods
For the training set, at three different densities of sampling grids, the values of R 2 (Figure 7) show that the SVM2 method was superior for the four soil attributes analyzed (P, K + , Ca 2+ , and Mg 2+ ), except for K + in the grid with 75 points. The univariate I'Moran for the K + attribute was 0.72 and significant at a 1% probability level in the grid with 75 points, as shown in Figure 6b. Values of R 2 for the SVM2 method in the training set ranged from 0.16 to 0.38. Compared to the SVM1 method, the SMV2 method obtained a higher R 2 for all attributes analyzed in all point densities of the sampling grids.
OK showed the lowest coefficients of determination values for the attributes P, K + , and Ca 2+ in the grid with 38 points (Figure 7). The values of univariate I'Moran for P and K + were low and not significant for the two soil attributes analyzed (Figure 6a). As mentioned by [2,3], OK requires a minimum number of sampling points for good semivariogram modeling. For the grid with 38 points, the SVM2 method performed better than the OK method, with R 2 values ranging from 0.19 to 0.38. For the P attribute, the three methods had the lowest values of R 2 . These data corroborate Figure 6, in which the values of univariate and bivariate I'Moran were low for the P attribute. In general, OK, SVM1, and SVM2 showed lower R² values for the grid of 38 sampling points, compared to grids with higher sampling density. As in the training set, the values of R 2 were also higher for the SVM2 method in the test set ( Figure 8). The lowest correlation coefficient for the SVM2 method was obtained for the P attribute in the sampling grid with 38 points in the test set (R 2 = 0.15). The low performance of SVM2 for predicting the P attribute is related to the covariate added to the grid with 112 points in the training set. The Sand covariate used by the SVM2 method had bivariate I'Moran of 0.14 with the P attribute (Figure 6c). This value was the lowest used by a covariate added to the SVM2 method. Covariates that have low value of bivariate I'Moran with the attribute to be interpolated may not contribute or contribute in a nonsignificant way to a better performance of the SVM2 method. [16] claim that the low correlation between predictor variables and the dependent variable (y) directly impacts the performance of the ML model. The RMSE values for the soil attributes P, K + , Ca 2+ , and Mg 2+ , for the OK, SVM1, and SVM2 methods are shown in Tables 2 and 3 for the training and test sets, respectively. In Table 2 the training sets with 38, 75, and 112 points were displayed, while in Table 3 their respective test sets are 112, 75, and 38 points, in this order, thus totaling 150 sampled points divided between training and testing. As expected, the RMSE values tended to be lower for greater values of R 2 for the training set ( Figure 7 and Table 2) and for the test set ( Figure 8 and Table 3). Similar results have been observed in other studies [5,15]. With lower RMSE values (Table 2), it can be inferred that OK was superior to SVM1 in the prediction of P, as the R 2 was similar (R 2 = 0.11 and 0.15 in the grids of 75 and 112 points, respectively) as shown in Figure 7.

Maps of Soil Attributes
The maps of soil attributes were generated using the samples selected from the training sets with 38, 75, and 112 sampling points, as shown in Figure 4. The set of 150 points was also used to perform interpolation and obtain interpolated maps. The attributes P, K + , Ca 2+ , and Mg 2+ were interpolated using the methods OK, SVM1, and SVM2, obtaining maps with four densities of points. To obtain the maps, a grid with 10 m × 10 m cells was used, totaling 7388 interpolated points. Each interpolated attribute showed a different pattern of spatial variability (Figures 9-12). This may be associated with the characteristics of mobility of the attribute in soil, relief shape, soil formation and soil management over time.
The RMSE shown in Table 3 can be interpreted as the interpolation error for each map obtained by interpolation in each density of the sampling grid and for each soil attribute. This error was calculated based on the test set, because the values of the maps obtained by interpolation of P, K + , Ca 2+ , and Mg 2+ were extracted in the same places where the test points were located, thus calculating the RMSE between the value predicted by the method and the value observed in the test set.
For the maps obtained by interpolation of the P attribute in the grid with 38 sampling points of the training set (Figure 9(a.1-c.1)), the SVM2 method had the lowest error (RMSE = 3.22 mg/dm 3 ), considering the test set of 112 points, according to Table 3. For the grid with a density of 75 sampling points in the training and test set (Figure 9(a.2-c.2)), SVM2 also had the lowest RMSE (2.74 mg/dm 3 ), followed by SVM1 and OK (Table 3). For the grid with a density of 112 sampling points in the training set (Figure 9(a.3-c.3)) and 38 test points, the map obtained by interpolation through the SVM1 method showed the lowest RMSE (1.94 mg/dm 3 ), followed by OK. SVM2 had the highest error (RMSE = 2.79 mg/dm 3 ). For the map obtained by interpolation in the grid with 150 sampling points of the training set, it was not possible to calculate the error, since no observed points were separated for the test set. For this density, the highest P contents are distributed in the central part of the map (Figure 9(a.4-c.4)). For the maps obtained by interpolation of the K + attribute in grids with 38 ( Figure  10(a.1-c.1)), 75 (Figure 10(a.2-c.2)), and 112 ( Figure 10(a.3-c.3)) samples of the training set, the SVM2 method had the lowest error, followed by OK in sets with 38 and 112 points and by SVM1 in the set of 75 points (Table 3). For the map obtained by interpolation in the grid of 150 points, the highest concentrations of K + are located in the east and west parts of the map (Figure 10(a.4-c.4)). The SVM2 method obtained the lowest interpolation error in the three densities of sampling grids, followed by SVM1 and OK for the Ca 2+ attribute, as shown in Table 3. For the grid with density of 150 sampling points, Ca 2+ had higher values in the north and center parts of the study area (Figure 11(a.4-c.4)) for the three interpolation methods. The Mg 2+ attribute, as observed for Ca 2+ and K + , had the lowest error for maps obtained by interpolation through the SVM2 method. In the grid of 75 sampling points for the training and test set (Figure 12(a.2-c.2)), the SVM1 and SVM2 methods obtained the same error (RMSE = 0.10 cmolc/dm 3 ). As the R 2 value of SVM2 (0.47) was higher than the R 2 in SVM1 (0.41) according to Figure 8, implying that the performance of SVM2 was better than that of SVM1. The same occurred for the grid with 38 points of the training set ( Figure  12(a.1-c.1)) and the grid with 112 samples in the test set, as the error of the OK and SVM1 methods was 0.11 cmolc/dm 3 . OK was superior because it had higher R 2 values (Figure 8). For the grid with 150 sampling points, the map obtained by interpolation showed spatial behavior with the highest values concentrated in the northern part of the area ( Figure  12(a.4-c.4)).

Limitations and Future Developments
Smart-Map is a QGIS plugin, allowing generation of interpolated soil attribute maps. A limitation of the plugin is that the maximum number of sampling points in the input layer is limited to 1000; for grids exceeding this limit, the plugin resamples the data based on the neighborhood of the sampled points. Another limitation is that only Ordinary Kriging and Support Vector Machine methods are implemented. Although both methods allow for generation of high-quality maps covering a wide range of applications, they do not necessarily perform well in any conceivable application. In addition, to evaluate the models, only RMSE and R 2 metrics were used based on Leave-one-out cross-validation; however, for certain applications there are more appropriate metrics.
Future extension of the plugin comprises implementation of Co-Kriging and other Machine Learning models such as Cubist, XGBoost, and LightGBM. Techniques for selecting features that are not based on spatial correlation such as Recursive Feature Elimination (RFE) will be implemented as well. Finally, model evaluation metrics such as EAM (Mean Absolute Error), RPD (Relative Difference Percentage) and cross-validation techniques such as K-fold and Holdout will be implemented.

Conclusions
Techniques for digital mapping of soil attributes were implemented using Ordinary Kriging (OK) and the Machine Learning (ML) Support Vector Machine (SVM) algorithm coded in a Smart-Map plugin for QGIS. Machine Learning interpolation allowed data from the QGIS database layers of raster-and vector-type to be used as covariates in the interpolation. The maps generated by the plugin can be exported to QGIS in a shapefile and/or raster format.
In a case study used to evaluate the performance of the Smart-Map plugin, interpolation was compared using three methods being Ordinary Kriging (OK), a machine learning Support Vector Machine method that uses the attribute itself interpolated by Inverse Distance Weighting (IDW) as covariate (SVM1), and with the use of covariates (SVM2). Conclusions are as follows: (1) The SVM2 method was superior to other models in the prediction of soil chemical attributes for the three densities of points in the sampling grids. The R 2 values were higher in 11 of the 12 combinations among the four soil attributes interpolated in three densities of points of the sampling grids, considering the training set. (2) Considering the RMSE of the test set, SVM2 had the lowest error for the prediction of maps obtained by interpolation for the four soil attributes in the three sampling densities, except for the P attribute in the SVM1 method with a grid of 38 points in the test set. (3) One difficulty encountered by ML algorithms for problems of mapping and prediction of soil attributes is to handle the excessive number of covariates in the model. Spatial correlation of I'Moran proved to be efficient for the selection of covariates of greater importance in the model. (4) In areas with low spatial correlation of soil attributes and few sampled points, ML techniques are an alternative to the OK method, especially when covariates with a higher number of points and a significant level of correlation with the variables to be interpolated are available. The results in this study confirmed the feasibility and applicability of ML techniques, especially the "Support Vector Machine" method, for prediction and mapping of soil chemical attributes on a regional scale.