
Soil Salinity Estimation for South Kazakhstan Based on SAR Sentinel-1 and Landsat-8,9 OLI Data with Machine Learning Models

Institute of Automation and Information Technology, Satbayev University (KazNRTU), Satpayev Str., 22A, Almaty 050013, Kazakhstan
Institute of Information and Computational Technologies, Pushkin Str., 125, Almaty 050010, Kazakhstan
Institute of Zoology SC MES RK, al-Faraby Av., 93, Almaty 050060, Kazakhstan
Faculty of Management Science and Informatics, University of Zilina, Univerzitná 8215/1, 010 26 Žilina, Slovakia
Baltic International Academy, 1/4 Lomonosov Str., LV-1003 Riga, Latvia
Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(17), 4269
Submission received: 12 May 2023 / Revised: 30 July 2023 / Accepted: 7 August 2023 / Published: 30 August 2023
(This article belongs to the Special Issue Advanced Sensing and Image Processing in Agricultural Applications)


Climate change, uneven distribution of water resources and anthropogenic impact have led to salinization and land degradation in the southern regions of Kazakhstan. Identifying and mapping saline lands is a laborious process that requires extensive ground measurements, so remote sensing data are widely used to solve this problem. This paper considers the assessment of land salinity in the South Kazakhstan region using remote sensing data. The aim of the study is to analyze the applicability of machine learning methods to assess the salinity of agricultural lands in southern Kazakhstan based on remote sensing. The authors present a salinity dataset obtained from field studies and containing more than 200 laboratory measurements of soil salinity. Moreover, the authors describe the results of applying several regression algorithms (XGBoost, LightGBM, random forest, support vector machines, elastic net, etc.), where synthetic aperture radar (SAR) data from the Sentinel-1 satellite and optical data in the form of spectral salinity indices are used as input data. The obtained results show that, in general, these input data can be used to estimate the salinity of wetted arable land. The XGBoost regressor showed the best results (R2 = 0.282). Supplementing the radar data with the values of spectral salinity indices improves the result significantly (R2 = 0.356). For the local datasets, the best results shown by the model are R2 = 0.473 (SAR) and R2 = 0.654 (SAR with spectral indices). The study also revealed a number of problems that justify the need for a broader range of ground surveys and consideration of multi-year factors affecting soil salinity.
Key results of the article: (i) a set of salinity data for different geographical zones of southern Kazakhstan is presented for the first time; (ii) a method is proposed for determining soil salinity on the basis of synthetic aperture radar supplemented with optical data, and this resulted in the improved prediction of the results for the region under consideration; (iii) a comparison of several types of machine learning models was made and it was found that boosted models give, on average, the best prediction result; (iv) a method for optimizing the number of model input parameters using explainable machine learning is proposed; (v) it is shown that the results obtained in this work are in better agreement with ground-based measurements of electrical conductivity than the results of the previously proposed global model.

1. Introduction

Climate change and anthropogenic impact lead to the degradation of agricultural land due to increased salinity. Climatic factors create a potential danger of soil salinization, while population growth and uncontrolled intensification of the use of land and water resources can lead to an abrupt reduction in the area of pastures and arable land [1]. According to [2], the total area of primary saline soils is about 955 million hectares, and about 77 million hectares are exposed to secondary salinization, of which 58% are irrigated areas. Nearly 20% of all irrigated land is saline, and this proportion tends to increase despite significant land reclamation efforts. It is assumed that, by 2050, the salinity of arable land in the world may exceed 50% [3].
Irrigation of salinized arable land and degradation of agricultural land in the south of Kazakhstan are systemic negative factors that most strongly affect four regions of Kazakhstan: Turkestan, Almaty, Zhambyl and Kyzylorda. The main problem is related to water resources, which are formed by the transboundary flow of large rivers in the region (the Syrdarya, Ile and Chu rivers). Kazakhstan is located in the lower reaches of these river basins and is therefore very vulnerable with regard to water supply. The growth of water consumption in the upper parts of the river basins, which belong to neighboring countries (Uzbekistan, China, Kyrgyzstan), together with climate change, creates problems for the water supply of irrigated arable land in southern Kazakhstan. Salinization and its mapping have become pressing problems because water and food security is poorly ensured in the south of Kazakhstan, where two of the big cities of the Republic (Almaty, Shymkent) are located. Several modern approaches to soil salinity assessment are based on remote sensing monitoring and GIS.
One such direction is the application of synthetic aperture radar data for estimation of the salinity of various territories [4,5], as well as combined methods, including both radar and conditionally optical remote sensing data [6,7,8].
In this paper, the authors used the data from the field studies sampled during May–July 2022 in three geographically significantly separated areas of southern Kazakhstan. The synthetic aperture radar (SAR) data from Sentinel-1 satellite and optical data from Landsat-8 in the form of spectral salinity indices were used as remote sensing data.
The aim of the study was to analyze the applicability of these data for estimating the salinity of the territories, using machine learning methods. At the same time, the applicability of this method was evaluated to territories that differ significantly in geography and soil condition.
The second task was a comparative analysis of machine-learning algorithms, which could be used in the future to calculate the salinity of the territories of Kazakhstan.
The main contribution of this study is as follows:
  • A set of field data for the study of salinity in the southern regions of Kazakhstan was prepared;
  • The method of soil salinity assessment based on SAR data proposed earlier in the literature was supplemented;
  • The method of salinity assessment was extended by using a combination of multispectral and SAR radar data;
  • A comparative analysis of machine learning algorithms solving the salinity estimation problem on the proposed data set is performed;
  • The significant input parameters (features) were selected and their effect was estimated using the methods of explainable machine learning (EML);
  • The boundaries of the joint use of obtained field data were determined;
  • The obtained modeling results were compared with the results of one of the known models of soil salinity assessment.
Section 2 discusses scientific papers devoted to solving salinity estimation problems using machine learning methods. Section 3 describes the salinity estimation methods based on several machine learning models. Section 4 presents the results. Interpretations of the findings are discussed in Section 5. Section 6 summarizes the results, describes the limitations of the proposed method and formulates the objectives of further research.

2. Related Works

2.1. Classical Salinity Estimation Methods Based on Spectral Data

The use of remote sensing data to assess soil salinity has a history of more than 40 years [1]. After reviewing the scientific studies, the authors revealed that studies [9,10,11] demonstrate a similar approach to the issue of estimating the salinity level of arable irrigated areas; this estimation is based on satellite data and GIS technologies, including a description of the salinity of some areas of irrigated arable land in Kazakhstan [12,13]. Currently, the use of satellite data and GIS technologies is the most widely used means to monitor arable land salinity, which in some cases can compete successfully with technically complex and expensive ground-based monitoring methods. The ideas of various satellite monitoring techniques are generally similar [14,15,16]. Some satellite images, or their time series, are used from various orbital platforms [17,18,19], including hyperspectral data [20,21], low spatial resolution satellite data (such as MODIS) [22,23], and radar images [24]. The spectral channels of the visible and near infrared spectral ranges are used and analyzed for the detection of soil salinity [14,15,16]. The different indices of salinity are constructed in combination with the vegetation indices on the basis of channel combinations; this allows the creation of a logical system of relationships between the field samples data on the existing level of arable land salinity and the spectral characteristics of the underlying surface. Another approach is to use more complex methods, such as clustering algorithms for remote sensing data and other machine learning methods [25,26,27].

2.2. Machine Learning Methods in Salinity Estimation Problems

Machine learning (ML) methods are an important subsection of artificial intelligence (AI), putting into practice the ideas of AI to create learning systems [28]. Machine learning technologies can effectively solve the nonlinear problem of the relationship between soil properties and environmental factors [29]. ML methods have produced a number of significant results in classification, clustering, and regression analysis in various scientific fields [30,31,32,33,34,35,36,37], and, starting around 2012, they have also been used in studies devoted to soil salinity.
In the paper [27], the authors provided an estimation and mapping of soil salinity in the Aral Sea basin; they used data such as terrain indices, height above sea level, remote sensing data, distance to drainage channels, long-term observations of groundwater levels, etc. These input variables predict salinity, and an electromagnetic induction instrument (CM-138) measures the electrical conductivity (EC) of the soil profile down to a depth of 1.5 m in vertical mode. The created classification model obtained an accuracy of estimating saline soils at close to 90%, the salinity threshold defined as about 0.7 dS m−1. The authors provided a sensitivity analysis of the model; as a result, the dependence of the model on such terrain variables as curvature (curv), plan curvature (planc), profile curvature (profc), and solar radiation (solar) was stated; the terrain factors are shown in descending order by their importance for the model. The conclusion of the research is that soil salinity is largely affected by microrelief, convexity, or concavity, which in turn affects surface water retention.
In [30], the authors aim to monitor the aquifer of the eastern Nile Delta, Egypt, to check the level of groundwater salinity caused by seawater intrusion (SWI); for this purpose, they propose a predictive regression model. The initial inputs were hydrogeological and hydrogeochemical data from the groundwater wells in 1996, 2007, and 2018, distance from the coastline, aquifer type, and hydraulic pressure. Using these input data, the authors constructed four baseline ML models to forecast such indicators as the base exchange index (BEX), the groundwater quality index for seawater intrusion (GQISWI), and water quality. The models were logistic regression, Gaussian process regression, feed-forward neural networks (FFNN), and one deep learning model, long short-term memory. The best scores in terms of root mean square error (RMSE) and R2 were obtained by the FFNN model, and testing showed R2 = 0.9667 for BEX, R2 = 0.9316 for GQISWI, and R2 = 0.9259 for water quality. Ground samples of soils from arid areas in northwestern China and multispectral remote sensing data are used in [32]. The authors collected soil samples and determined the correlation between ground measurements and Landsat-8 OLI and Sentinel-2 MSI multispectral images. The results showed that the higher resolution Sentinel-2 MSI imagery gives more accurate results in estimating the salinity in desert areas.
The problem of predicting the location and degree of salinity of territories within the irrigation systems of the Vaalharts and Breede rivers in South Africa was considered in [26]. Digital elevation models (DEMs) are considered as input data: a DEM obtained from satellite radar data (Shuttle Radar Topography Mission (SRTM)), and a DEM based on photogrammetry (photogrammetrically-extracted digital surface model (DSM)). Areas with soil electrical conductivity (EC) values > 4 dS/m are classified as saline. The salinity values based on soil samples were used to develop the model, to train the classifier, and to evaluate the score of the model. The binary classifier (based on a decision tree) demonstrated accuracy within 75%. The authors of the paper claim that the use of height data and their derivatives as input data for geostatistics and machine learning has great potential for monitoring salt accumulation in irrigated areas, especially for modeling subsurface conditions.
In [24], the authors considered the relationship between the backscattering coefficient (BC) and EC for Maha-Sarakham district in the northeast region of Thailand. ALOS-PALSAR provides data in four polarizations, HH, HV, VH and VV, with a resolution of 12.5 m. The total for the ground measurements is about 500 points from a depth of 5 cm, and the distance between each measurement is 20 m. Each measurement is identified by GPS with an accuracy of 4 m. The electrical conductivity was evaluated in the laboratory at 25 °C (dS m−1). Several feed forward neural net models have been used to reconstruct the regression relationship between radar data and ground measurements. The models showed average values of the coefficient of determination R2 = 0.85 and RMSE = 5.
A similar method for detecting soil salinity from Sentinel-1 radar data was described in [4]. The authors used a support vector regressor (SVR) to determine salinity, based on the field data.
An interesting approach to salinity mapping for the Mekong Delta using radar images and machine learning algorithms was presented in [33]. The input values were features generated from Sentinel-1 C-band synthetic aperture radar (SAR) images, and the researchers treated the task as a regression problem, predicting the electrical conductivity of the soil. The features were generated from the radar images by constructing the Gray Level Co-occurrence Matrix (GLCM), as proposed in [37]. The best result was achieved using the Gaussian Process Regressor (GPR) model, with RMSE = 2.885, MAE = 1.897, and R2 = 0.808.
Recently, the joint use of optical and radar data has gained great popularity. For example, ref [38] describes the combined use of Sentinel-1 and Sentinel-2 data, as well as a digital elevation model (DEM). In [7], data from the Sentinel-1 C-band radar and optical data from Sentinel-2 are used. Study [6] also uses Sentinel-1 SAR data and Sentinel-2 multispectral data to calculate the normalized differential salinity index (NDSI). The optical indices NDVI, SAVI, NDSI, SI based on Landsat 8 data and ALOS PALSAR-2 SAR data were used to form several combined optical radar indices (ORSDI) [3].
It should be noted that the list of input parameters for the developed soil salinity assessment models varies considerably. Insufficient or excessive number of parameters can lead to decreased quality of operation of machine learning algorithms. That is why it is necessary to optimize a set of model input parameters. In this case, the list of input parameters can be locally dependent. Correlation analysis, the importance of variables in projection (VIP), competitive adaptive sampling with re-weighting (CARS), the genetic algorithm (GA) [39,40,41], a method based on optimization of macro-parameters of the model and the number of input variables [42] are mentioned as means to solve this problem.
Summing up, it can be noted that the machine learning methods most often used for predicting soil salinity are algorithms based on decision trees, such as the above-mentioned decision tree [26], random forest (RF) [43,44], and boosting models [38,46], as well as support vector machines (SVMs) [4,45] and classical FFNN models [27,30,47].
According to the available information, the works devoted to the use of combined SAR and optical spectral bands data are limited to relatively small areas of the territory and to the application of a small number of machine learning models. However, a comparative analysis of the applicability of machine learning models and satellite data of different spectral ranges to substantially different territories, where both cultivated and uncultivated fields are located, is useful.

3. Method

In this research, the authors have extended the approach proposed in [33]: the SAR Sentinel-1 data were supplemented with terrain and temperature data, which, as noted in the literature [27,44] and as supported by this research (see Section 4 and Section 5), improve the performance of the models. The authors also supplemented the SAR data with optical data, which further improved the performance of the machine learning models. In addition, a comparative analysis of machine learning methods was performed, and the capabilities of the proposed salinity prediction technique were evaluated for significantly different areas of southern Kazakhstan.
The flowchart of the process of salinity detection is shown in Figure 1. The process of training and applying the model consists of the following steps:
  • Obtaining the salinity data using the field studies;
  • Obtaining the radar, multispectral and SRTM data from Google Earth Engine;
  • Extraction of linear back scattering intensity in VV and VH polarizations;
  • Texture analysis using the GLCM method;
  • Application of the machine learning algorithms and evaluation of the quality of the trained model;
  • Mapping the selected areas of the territory.
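The two radar-specific steps in the list above (linear backscatter and GLCM texture) can be illustrated with a minimal pure-Python sketch. It assumes that the backscatter is supplied in decibels, which is how Sentinel-1 GRD data are commonly distributed, and it uses common definitions of ASM, energy, contrast and homogeneity; the window values, the number of gray levels and the function names are hypothetical and are not taken from the authors' pipeline.

```python
import math
from collections import Counter


def db_to_linear(sigma_db):
    """Convert backscatter from decibels to linear power units."""
    return 10.0 ** (sigma_db / 10.0)


def quantize(window, levels):
    """Map a 2D window of floats to integer gray levels 0..levels-1."""
    flat = [v for row in window for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat window
    return [[min(levels - 1, int((v - lo) / span * levels)) for v in row]
            for row in window]


def glcm(gray, levels, dx=1, dy=0):
    """Symmetric, normalized co-occurrence matrix for one pixel offset."""
    counts = Counter()
    rows, cols = len(gray), len(gray[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = gray[r][c], gray[r2][c2]
                counts[(i, j)] += 1
                counts[(j, i)] += 1  # count the pair in both directions
    total = sum(counts.values())
    return [[counts[(i, j)] / total for j in range(levels)]
            for i in range(levels)]


def texture_features(p):
    """A few common GLCM statistics (energy taken as sqrt of ASM)."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    asm = sum(p[i][j] ** 2 for i, j in cells)
    return {
        "asm": asm,
        "energy": math.sqrt(asm),
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells),
    }


# Hypothetical 3x3 window of VH backscatter in dB (toy values).
window_db = [[-18.0, -17.5, -20.1],
             [-16.9, -18.2, -19.4],
             [-15.8, -18.8, -21.0]]
window_lin = [[db_to_linear(v) for v in row] for row in window_db]
features = texture_features(glcm(quantize(window_lin, 4), 4))
print(features)
```

With energy defined as the square root of ASM, the two features are monotonically related, which is consistent with the near-perfect correlation between them reported in Section 3.2.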
Multispectral data from Landsat-8 were obtained and used to calculate the spectral indices.
As a result, as will be shown below, the regressors that demonstrate the best results were identified, and limitations in the application of the method for individual subsets of data were determined.

3.1. Data Preparation

Field studies of the soil were carried out during the expeditions on 23 May (Shelek), 26 May, 17, 18 and 29 June (Kapchagay) and 18 July 2022 (Alakol). The sampling route was designed taking into account the availability of vehicles and the expected presence of pronounced areas of saline soil. A portable GPS device (Garmin 65) with a positioning accuracy of ≤5 m was used to record the geographic coordinates. Although this positioning accuracy is not perfect, it is sufficient to ensure reasonable matching between the sampling sites and satellite image pixels, since the pixel size of the Sentinel-1 imagery is 10 m. The sampling locations were photographed and described.
A total of 207 soil samples were collected in the area of Lake Alakol, near Kapchagay Reservoir and Shelek village. Accordingly, three subsets of soil samples were formed: Alakol (45 samples), Kapchagay (84 samples), and Shelek (78 samples). Figure 2 shows the locations of soil sample collection near the Kapchagay reservoir and on the foothill plains near Lake Alakol.
Figure 3 shows the soil sampling routes. The collection conditions differed significantly. The Shelek samples were collected on moist loamy arable land with corn plantations at the end of May. In contrast, the Kapchagay samples were collected on dried hilly sandy soil, mainly outside of ploughed areas. The Alakol samples were collected in mixed mode, both within and outside of the arable areas. As will be shown below, this had a significant impact on the quality of the regression analysis. All soil samples were properly sealed, labeled, and transported to the laboratory for measurement of their electrical conductivity.
The samples were completely dried and sieved through a 2-mm mesh sieve to remove vegetation and stone residues. The prepared soil samples were mixed with water at a ratio of 1:5 (one weight fraction of soil and five weight fractions of water). The obtained solutions were left to settle for one day for complete dissolution of the fractions. Then, the electrical conductivity of the soil was determined using a digital meter (Hanna GroLine HI9814) at room temperature (25 °C).
The data from Sentinel-1 Synthetic Aperture Radar (SAR) and optical data obtained in April–August 2022 were used as initial data. Figure 4 shows the images synthesized from the SAR data, which are composed of a combination of colors imitating the reflected radar signals in the following way: Red is VH, Green is VV, and Blue is VH/VV. The first letter of the signal code indicates the polarization of the emitted signal and the second letter the polarization of the reflected signal.
The pre-processing of Sentinel 1 SAR radar data resulted in a data set, a fragment of which is shown in Figure 5. The figure demonstrates the set obtained from the SAR data in April 2022. The entire dataset can be downloaded from the link: (accessed on 1 August 2023).
The radar dataset consists of the field soil sample measurements (electrical conductivity, elco50), 16 features generated from the radar image using the GLCM method, temperature, and information about the terrain (dem, slope). Table 1 contains abbreviations and descriptions of the dataset. To augment the radar data, the authors added the altitude measured by GPS, the slope computed from the digital elevation model, and temperature data, all of which can affect the salinity model. USGS MODIS land surface temperature (LST) and ground elevation model (ELV) data were used to extend the data set. The resolution of these images was 1 km for LST and 30 m for ELV. Height above sea level and slope of the relief were derived from the ELV data.
The data from the Landsat-8 satellite with 30 m resolution were used to calculate the spectral indices listed in Table 2.
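As a hedged illustration of how spectral indices are computed from band reflectances, the sketch below implements three widely used definitions (NDVI, NDSI and a simple salinity index SI). Table 2 is not reproduced in this text, so which indices and exact formulas the authors used is an assumption here; the Landsat-8 OLI band mapping (B2 blue, B4 red, B5 near infrared) is standard, and the reflectance values are toy numbers.

```python
import math


def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - R) / (NIR + R)."""
    return (nir - red) / (nir + red)


def ndsi(nir, red):
    """Normalized difference salinity index: (R - NIR) / (R + NIR)."""
    return (red - nir) / (red + nir)


def si(blue, red):
    """A commonly used salinity index: sqrt(B * R)."""
    return math.sqrt(blue * red)


# Toy surface reflectances for one pixel (hypothetical values).
pixel = {"B2": 0.10, "B4": 0.20, "B5": 0.40}  # blue, red, NIR on Landsat-8 OLI

print("NDVI:", ndvi(pixel["B5"], pixel["B4"]))
print("NDSI:", ndsi(pixel["B5"], pixel["B4"]))
print("SI:  ", si(pixel["B2"], pixel["B4"]))
```

In practice, a guard against a zero denominator (for example, masked or no-data pixels) would be needed before applying the normalized differences to whole images.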

3.2. Analysis of Data

To train the machine learning models, an approximately equal distribution of objects with different salinity values is desirable. For this purpose, the target column was converted into a column of five salinity classes for loam soil according to [57,58]. Table 3 shows the distribution of samples by salinity class.
Because the distribution of samples by salinity class was approximately equal (except for class 3, highly saline), it was decided to divide the data set into training and test sets randomly, using the standard train_test_split utility from scikit-learn [59].
An analysis of the correlation of the input parameters of the SAR data shows that the correlations depend significantly on the analyzed data subset (Figure 6). For example, in the Alakol dataset, ASM_Vh has a zero value, while the rest of the data sets demonstrate an almost 100% correlation of this feature with energy_vh. There is a positive correlation between dem, temp and slope in the Alakol and Kapchagay sets, while in the Shelek set there is a significant negative one. The correlation between elco50 (the target value) and the coordinates also differs significantly between datasets, including the sign of the correlation.
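The binning and random split described above can be sketched as follows. The class boundaries in this example are hypothetical placeholders (the actual thresholds for loam soil come from [57,58] and are not given in this text), the 20% test fraction is an assumption, and the helper names are illustrative; the split function only mimics the idea behind a random train/test split and is not the scikit-learn implementation.

```python
import random
from bisect import bisect_right
from collections import Counter

# HYPOTHETICAL class boundaries in dS/m; the real ones are defined in [57,58].
THRESHOLDS = [0.2, 0.4, 0.8, 1.6]


def salinity_class(ec):
    """Return the class index 0..4 for an electrical conductivity value."""
    return bisect_right(THRESHOLDS, ec)


def random_split(items, test_fraction=0.2, seed=0):
    """Shuffle the items and cut them into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy EC measurements (hypothetical values, not the field data).
ec_values = [0.05, 0.15, 0.25, 0.35, 0.5, 0.7, 0.9, 1.2, 1.8, 2.5]
print(Counter(salinity_class(v) for v in ec_values))

train, test = random_split(ec_values)
print(len(train), "training samples,", len(test), "test samples")
```

Counting the samples per class before splitting is a quick way to check whether a purely random split is reasonable or whether a stratified split would be safer.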
In this regard, it was decided to evaluate the performance of the machine learning models on the Alakol, Kapchagay, and Shelek data subsets separately.
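A per-subset correlation check of the kind described above can be sketched in a few lines. The column names elco50 and temp follow the dataset description, while the "subset" key and all numeric values are hypothetical toy data chosen only to show a sign change between subsets.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns NaN if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def correlation_by_subset(rows, feature, target="elco50", key="subset"):
    """Group rows by subset name and correlate one feature with the target."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return {name: pearson([r[feature] for r in g], [r[target] for r in g])
            for name, g in groups.items()}


# Toy rows (hypothetical values, not the published dataset).
rows = [
    {"subset": "Alakol", "temp": 30.0, "elco50": 0.4},
    {"subset": "Alakol", "temp": 32.0, "elco50": 0.6},
    {"subset": "Alakol", "temp": 35.0, "elco50": 0.9},
    {"subset": "Shelek", "temp": 24.0, "elco50": 0.8},
    {"subset": "Shelek", "temp": 26.0, "elco50": 0.5},
    {"subset": "Shelek", "temp": 29.0, "elco50": 0.3},
]
print(correlation_by_subset(rows, "temp"))
```

A feature that is constant within a subset (such as a zero-valued texture feature) has no defined correlation, which is why the helper returns NaN in that case rather than raising an error.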
It should be noted that some input parameters can be redundant and even reduce the quality of the model performance. To optimize the set of properties, random search algorithms are used, for example, the genetic algorithm [7,60]. The disadvantage of this approach consists of rather significant computational costs.
The second question is how to evaluate the influence of the input parameters (features) on the model results. In other words, it is necessary to explain the result obtained by the model. Model-specific and model-agnostic interpretation methods are used to solve this problem [61]. Because the authors analyze machine learning models based on different mathematical principles, it is convenient to apply a model-agnostic interpretation method, such as SHapley Additive exPlanations (SHAP) [62]. As will be described below, the authors also used the same method to optimize the set of features.
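To make the idea behind SHAP more concrete, the following sketch computes exact Shapley values by brute force for a tiny toy model. It is not the SHAP library's algorithm (which relies on efficient approximations and model-specific explainers); representing a "missing" feature by a single baseline value is a simplifying assumption, and the model and inputs are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature of x relative to a baseline."""
    n = len(x)

    def value(subset):
        # Features outside the subset are replaced by their baseline value.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0..n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                subset = set(combo)
                phi[i] += weight * (value(subset | {i}) - value(subset))
    return phi


def toy_model(z):
    """Hypothetical model: one additive term and one interaction term."""
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
# The contributions add up to the difference between the two predictions.
print(sum(phi), toy_model(x) - toy_model(baseline))
```

The interaction term is shared equally between the two features involved, and the contributions sum to the difference between the prediction and the baseline prediction. This additivity is what allows SHAP values to be compared across models built on different principles.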

3.3. Machine Learning Models

The authors previously considered the possibility of applying a wide range of machine learning algorithms, both classical and modern [63]. Obviously, the application of deep learning models could give a good result. However, the generated dataset is relatively small. The authors are also not aware of similar initial SAR datasets of significant volume or of pre-trained deep neural network models for the salinity estimation problem, which would allow the application of transfer learning techniques. Therefore, the use of deep learning models was considered inappropriate. The authors used boosting models [64], support vector machines, and classical regression algorithms as a reference point (baseline) for comparative analysis (Table 4).
To assess the quality of regression models, the following accuracy indicators are often used [31]: coefficient of determination (R2), Mean squared error (MSE) and Mean absolute error (MAE). The quality indicators used to estimate the regressors, as well as the corresponding expressions, are listed in Table 5.
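Since Table 5 is not reproduced in this text, the sketch below gives the standard definitions of the three indicators: MAE is the mean of the absolute errors, MSE is the mean of the squared errors, and R2 equals one minus the ratio of the residual sum of squares to the total sum of squares. The sample values are hypothetical.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Toy measured and predicted values (hypothetical).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(mae(measured, predicted), mse(measured, predicted), r2(measured, predicted))
```

Note that R2 can be negative when a model predicts worse than the mean of the measured values, which is worth keeping in mind when reading low R2 scores on small subsets.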

4. Results

4.1. Evaluation of Regression Models

The quality of performance of machine learning models was evaluated by Random permutations cross-validation (ShuffleSplit). In this case, the raw data are divided into training and test sets randomly in a given proportion (in this case, 80% is training data, and 20% is test data). To ensure the statistical significance of the result, such splitting was performed 200 times for each regressor model. The obtained values of the estimates were averaged and variance was calculated for them using the statistics library. The results of the models are shown in Table 6 and Table 7 (full results of the calculations are presented in Appendix A and Appendix B), where varMAE, varMSE, VarR2 are the variance of the obtained estimates, and Duration is the training and estimation time of the model in seconds.
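The evaluation protocol described above (200 random 80/20 splits, with the mean and variance of the scores computed using the statistics library) can be sketched as follows. The one-variable least-squares line is only a hypothetical stand-in for the regressors in Table 4, the data are synthetic, and the splitter mimics the idea of ShuffleSplit without being the scikit-learn implementation.

```python
import random
import statistics


def shuffle_split(n_samples, n_splits=200, test_size=0.2, seed=0):
    """Yield (train_idx, test_idx) for repeated random permutations."""
    rng = random.Random(seed)
    n_test = round(n_samples * test_size)
    for _ in range(n_splits):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]


def fit_line(x, y):
    """Least-squares line y = slope * x + intercept (stand-in model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def evaluate(x, y, n_splits=200):
    """Return mean and variance of the test MAE over all random splits."""
    scores = []
    for train, test in shuffle_split(len(x), n_splits):
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        errors = [abs(y[i] - (slope * x[i] + intercept)) for i in test]
        scores.append(sum(errors) / len(errors))
    return statistics.mean(scores), statistics.variance(scores)


# Synthetic data: a noisy linear relationship (not the field measurements).
rng = random.Random(42)
x = [float(i) for i in range(100)]
y = [0.05 * v + rng.gauss(0.0, 0.5) for v in x]
mean_mae, var_mae = evaluate(x, y)
print("MAE:", mean_mae, "varMAE:", var_mae)
```

Reporting the variance alongside the mean shows how sensitive a regressor is to the particular split, which matters for a dataset of about 200 samples.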
The XGB regressor shows the best results (highlighted in bold). Figure 7 shows a scatterplot of the salinity measured and predicted by XGB for different data sets. The diagonal line in the figure shows the optimum line, where the prediction value coincides with the actual value.
It can be seen that the results of the regressor on the Shelek dataset have the point density closest to the optimum line. The experiments with this dataset show R2 = 0.473 using SAR data and R2 = 0.654 for the full set of input variables, including radar, spectral indices, temperature, and terrain information. Without going into detail, it should be noted that classification with the XGB classifier using SAR data for five salinity classes gave an average accuracy of 54%. Applying the same classifier to three and two classes gives an average accuracy of 65% and 75%, respectively. At the same time, the use of both spectral indices and radar data gives slightly better average accuracy for five, three, and two classes: 58%, 71%, and 77%, respectively.

4.2. Analysis of Influence of Input Parameters

To analyze the dependence of the model results on the input parameters, the authors used the SHAP library [62]. The results of the analysis of the XGB model are shown in Figure 8. In the figure, panels a and c show the influence of the parameters obtained with SAR data, and panels b and d show the influence of spectral data and optical indices.
The regression result depends on almost all input parameters. For example, a decrease of latitude positively affects the salinity value; in other words, the more southern the considered zone, the more likely is high salinity value; high slope value, on the contrary, negatively influences the soil salinity degree—high salinity on a slope is improbable. Temperature is the second most influential parameter. When using the optical data, the third and fourth most important factors are optical indices.
Due to the significant geographical dispersion of the data sets, the use of coordinate values in the number of input parameters may not be appropriate. To this end, in Figure 8, fragments c, d show the effect of parameters without regard to coordinates.
The obtained results allow us to rank the input parameters of the regression model. The parameter gamma_vh has the greatest influence. The second parameter in the ranking is the temperature: the higher the temperature, the higher the value of the electrical conductivity (salinity) of the soil. The third and fourth parameters are the optical indices, namely the water index WI and the salinity index SI2 (Figure 8d): the higher the average WI and SI2 values, the higher the salinity.
It can be assumed that the exclusion of terrain and temperature data from the input parameters will reduce the quality of the regression models, which is confirmed by the results of the computational experiment (see Table 8).
At the same time, the set of input parameters can be reduced without degrading the quality of the regressors. However, it must be taken into account that the relationship between the input parameters and the target variable is not linear and that the input parameters affect the result jointly. Therefore, the simple removal of “insignificant” parameters can lead to deterioration of the regression indicators. To remove the parameters that reduce the performance of the regression model, the authors used the following method. SHAP values provide an objective relation between the model results and the input parameters. Consequently, it is possible to exclude from the feature set those variables that do not affect the result and those whose SHAP values are strongly correlated with one another. Figure 9 shows the correlation matrices of the original (left) and reduced (right) sets of SHAP values.
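One possible reading of this reduction procedure is sketched below: features whose mean absolute SHAP value is negligible are dropped, and among features whose SHAP values are strongly correlated only the most influential one is kept. The thresholds, the greedy ordering and the toy SHAP values are assumptions made for illustration; the authors' exact selection rule is not specified in this text.

```python
import math


def mean_abs(values):
    """Mean absolute value, used as a global importance score."""
    return sum(abs(v) for v in values) / len(values)


def pearson(x, y):
    """Pearson correlation; a constant series is treated as uncorrelated."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def select_features(shap_columns, min_importance=1e-3, max_corr=0.95):
    """Greedy selection from a dict {feature: per-sample SHAP values}.

    Both thresholds are HYPOTHETICAL defaults, not values from the paper.
    """
    importance = {f: mean_abs(col) for f, col in shap_columns.items()}
    ranked = sorted((f for f in importance if importance[f] >= min_importance),
                    key=lambda f: importance[f], reverse=True)
    kept = []
    for f in ranked:
        if all(abs(pearson(shap_columns[f], shap_columns[k])) < max_corr
               for k in kept):
            kept.append(f)
    return kept


# Toy SHAP values for four features over four samples (hypothetical).
shap_columns = {
    "energy_vh": [1.0, -0.8, 0.6, -0.4],
    "asm_vh": [0.5, -0.4, 0.3, -0.2],   # perfectly correlated with energy_vh
    "temp": [0.1, 0.2, -0.1, 0.05],
    "unused": [0.0, 0.0, 0.0, 0.0],     # no influence on the result
}
print(select_features(shap_columns))
```

Because the features interact, any such reduction should be followed by retraining and re-evaluating the model, as a feature that looks redundant in isolation may still matter in combination with others.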
After the above-described optimization, the models use 24 input parameters instead of 36 and their results have improved, especially for XGB, RF, Ridge, and SVR (see Table 9).
The results of the regressors’ performance with an exclusively optimized list of spectral indices as input are shown in Appendix C and Appendix D. The list of input parameters was reduced from 20 to 16 and the results improved by about 1%.

5. Discussion

To evaluate the obtained results, it is useful, in addition to the numerical metrics, to compare them with results previously obtained for the same territory using another model. Figure 10 shows the results of applying the XGB regressor to salinity mapping at a site approximately corresponding to site 4 (Shelek) in Figure 3. The model was trained using the spectral indices. The figure shows a section of the terrain in the foothills of the Zailiisky Alatau (1) and a Landsat 8 satellite image (band 4, red) (2). The soil sampling sites are marked on the map (1) by green rectangles. The right part (3) shows the predicted salinity levels obtained using Landsat 8, 9 images from 1, 9, 10, 17 and 26 April 2022 (from top to bottom). Green areas correspond to the minimum level of salinity, and lighter areas, shading to yellow and red, to a high level of salinity. Note that cloud cover on 9 and 10 April distorted the result. The bottom right (4) shows the prediction of the RF-based model from [44], which is based on temperature data (hereafter Mtemp). The Mtemp model predicts two classes of soil salinity, which in general coincide neither with the measured electrical conductivity of the soil samples nor with the XGB regressor predictions.
Figure 11 illustrates the predicted change in the condition of the soil cover in the Zailiisky Alatau area between 1 April (2) and 23 August (3) 2022. The soil sampling zone (sampled on 23 May 2022) is marked on the map (1) by a white rectangle and corresponds to the area shown in Figure 10. The map in Figure 11 covers approximately 25 times that area.
Figure 12 shows the simulation results for the Lake Alakol area in comparison with the Mtemp model. The two models again produce significantly different results.
In general, the computational experiments have shown that the satellite SAR data can be used to estimate the salinity in the southern regions of Kazakhstan. Expanding the number of input variables by a set of spectral indices leads to improved results. It can be stated that:
  • XGBRegressor has the best quality indicators among the considered regressors; LightGBM is second.
  • The results of the constructed regression are significantly better on moist cultivated soil (Shelek) than on the entire data set.
  • Model quality on the Alakol and Kapchagay datasets is low. It can be assumed that sampling in a local area of hilly terrain (Kapchagay) and over large areas in the Alakol region requires a more laborious soil sampling procedure, for example, five-spot sampling [38].
  • Comparison of the results of the XGB regressor with the Mtemp model shows that the models produce significantly different results. It can be assumed that a possible reason for the discrepancy is that Mtemp was trained without using data from the regions of Kazakhstan.
The resulting regression metrics are worse than those reported by other authors [76]. It can be assumed that this is a consequence of the complexity of the foothill landscape and the large differences in the timing of the ground surveys. For example, soil samples from the Kapchagay dataset were collected on dry hilly terrain, where there were only small (3–5 m) local patches of salinity that are not visible from space. Samples of the Alakol set were collected during a period of rapid vegetation growth. By contrast, the Shelek set was formed from samples collected on a plowed flat area with low seedlings, where salinity appeared over relatively large areas of the field (hundreds of meters).
The large area of the country and the resulting variability of local conditions mean that field and satellite data from different areas may differ significantly, and their combined use requires further research. Accounting for terrain parameters is one way to improve the results. Another possible solution is to apply deep learning models [63,77,78,79,80], which have already demonstrated advantages in some conditions [81].
It should be noted that UAVs [82] are actively used in precision agriculture. They make it possible to collect high-resolution multispectral and hyperspectral data that are not available from satellite images [39,83,84]. Such data allow a detailed assessment of salinity within relatively small fields of a few hectares [85].
The results of mapping some regions of South Kazakhstan and high-resolution illustrations are attached to the article as Supplementary Materials.

6. Conclusions

Estimating soil salinity from remotely sensed data with accuracy sufficient for practical use is not an easy task. The main geophysical patterns of soil salinity are well understood, but soil salinity indicators vary both spatially and temporally and depend significantly on weather conditions, irrigation conditions, and the time of data collection during field studies.
The experiments have shown that XGBRegressor has the best quality indicators among the considered regression models (R² = 0.654 for the Shelek dataset). Although SAR data can in general be used to assess salinity in the southern regions of Kazakhstan, the computational experiments showed that the collected datasets are heterogeneous. On wet soil (Shelek dataset) the results are significantly better, while the other two datasets, taken separately, show unsatisfactory results. It can be assumed that this is a consequence of the complexity of the landscape in the study area, the difference in sampling time, and the different nature and development of the vegetation cover. This requires additional analysis. Increasing the number of input parameters might help to improve the regression performance. Another possible solution is to use machine learning methods designed for image processing, namely convolutional neural networks of various architectures.
Salinity manifests itself differently in optical and radar data. To assess the influence of the input parameters, the article proposes, probably for the first time, a method for ranking and optimizing model input parameters using one of the EML methods, SHapley Additive exPlanations (SHAP). In addition, in this work:
  • A labeled dataset of soil electrical conductivity is presented for sites in southern Kazakhstan that differ significantly in geographical location;
  • The method of soil salinity estimation described in [33] has been modified and extended with optical data;
  • Several types of machine learning models were analyzed, and it was shown that boosting regression models generally give the best results;
  • The results of the developed model are compared with those of the Mtemp model [44], and it is shown that the developed model agrees better with ground-based measurements of electrical conductivity for this region.
Limitations of the study
  • This study is based on a relatively small amount of field data, which differ significantly in geophysical indicators of collection sites and collection times;
  • The quality of the regressors depends significantly on their settings. Despite the search for the best combinations of parameters, it is not possible to analyze all combinations in a limited study;
  • The considered set of input parameters is not exhaustive. Remote sensing data acquired both close to the time of sampling and at other times could reasonably be used.
Future research
In the future, it is planned to:
  • Evaluate the effect of optical-range data, including infrared, on regression quality, depending on the time of remote sensing data acquisition;
  • Evaluate the impact of optical and radar data collected within the vegetation growth season (April–August) or for a longer period of time;
  • Apply deep learning models to account for terrain parameters;
  • Evaluate the possibilities of using multispectral images acquired from a UAV for mapping of focal salinity of agricultural fields.

Supplementary Materials

The following supporting information can be downloaded at:

Author Contributions

Conceptualization, R.I.M. and T.M.; methodology, R.I.M.; software, R.I.M., G.S. and T.M.; validation, Y.K., D.M., E.Z. and V.L.; investigation, R.I.M., T.M., D.M. and A.S.; resources, R.I.M., Y.K. and E.Z.; data curation, Y.K., T.M. and G.S.; writing—original draft preparation, R.I.M., T.M. and Y.K.; writing—review and editing, Y.P. and A.S.; visualization, R.I.M.; supervision, R.I.M.; project administration, Y.A.; funding acquisition, Y.A., E.Z. and V.L. All authors have read and agreed to the published version of the manuscript.


Funding

This research was funded by the Committee of Science of the Ministry of Science and Higher Education of the Republic of Kazakhstan, under Grants: Grant BR10965172 “Space monitoring and GIS for quantitative assessment of soil salinity and degradation of agricultural lands in South Kazakhstan”, BR18574144 “Development of a Data Mining System for Monitoring Dams and Other Engineering Structures under the Conditions of Man-Made and Natural Impacts”, BR21881908 “Complex of urban ecological support”, AP14869972 “Development and Adaptation of Computer Vision and Machine Learning Methods for Solving Precision Agriculture Problems Using Unmanned Aerial Systems”. This research was implemented under the projects “Earth Observation for Early Warning of Land Degradation at European Frontier” (No. ID 101086250) of EU Framework Programme for Research and Innovation Horizon Europe and “New approaches of Reliability Analysis of non-coherent systems” (VEGA 1/0165/21) of the Ministry of Education, Science, Research, and Sport of the Slovak Republic.

Data Availability Statement

The data presented in this study are openly available in, accessed on 1 August 2023.

Conflicts of Interest

The authors declare no conflict of interest.


Abbreviations

AI: Artificial intelligence
BC: Backscattering coefficient
BEX: Base exchange index
CARS: Competitive adaptive sampling with re-weighting
DEM: Digital elevation models
DSM: Digital surface model
EC: Electrical conductivity
ELV: Ground elevation model
EML: Explainable machine learning
FFNN: Feed-forward neural networks
GA: Genetic algorithm
GIS: Geographical information system
GLCM: Gray level co-occurrence matrix
GPS: Global positioning system
GQISWI: Groundwater quality index for seawater intrusion
LST: Land surface temperature
MAE: Mean absolute error
ML: Machine learning
MSE: Mean squared error
NDSI: Normalized differential salinity index
R²: Coefficient of determination
RF: Random forest
RMSE: Root mean square error
SAR: Synthetic Aperture Radar
SHAP: SHapley Additive exPlanations
SRTM: Shuttle radar topography mission
SVM: Support vector machine
SWI: Seawater intrusion
UAVs: Unmanned aerial vehicles
VarMAE: Variance of MAE
VarMSE: Variance of MSE
VarR²: Variance of R²
VIP: Variable in projection

Appendix A. Results of Machine Learning Models Using SAR Data

Dataset       Regressor  MAE    MSE    R²     VarMAE  VarMSE  VarR²  Duration
Full Dataset  XGB        0.644  1.991  0.282  0.024   3.542   0.046  28.14647
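The columns of the appendix tables can be read with the help of a small sketch. It assumes that each metric is averaged over several validation splits and that the Var columns are the population variances of the per-split values. This section does not state how the variances were actually computed, so that reading is an assumption, and the function names and fold data below are hypothetical.

```python
from statistics import mean, pvariance


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def mse(y_true, y_pred):
    """Mean squared error."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def summarize(folds):
    """Average each metric over the folds and report its variance across folds.

    folds: list of (y_true, y_pred) pairs, one pair per validation split.
    Population variance is used here; this is an assumption of the sketch.
    """
    per_fold = {
        "MAE": [mae(t, p) for t, p in folds],
        "MSE": [mse(t, p) for t, p in folds],
        "R2": [r2(t, p) for t, p in folds],
    }
    summary = {}
    for name, values in per_fold.items():
        summary[name] = mean(values)
        summary["Var" + name] = pvariance(values)
    return summary


if __name__ == "__main__":
    # Two toy validation splits with measured and predicted conductivity values.
    folds = [
        ([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]),
        ([2.0, 4.0, 6.0], [2.0, 4.0, 6.0]),
    ]
    for key, value in summarize(folds).items():
        print(f"{key}: {value:.4f}")
```

Under this reading, a large VarR² relative to R² indicates that regression quality varies strongly from one split to another. That would be consistent with the heterogeneity of the collected datasets noted in the Discussion.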

Appendix B. Results of Machine Learning Models Using SAR Data and Spectral Indices

Dataset       Regressor  MAE    MSE    R²     VarMAE  VarMSE  VarR²  Duration
Full Dataset  XGB        0.569  1.889  0.339  0.023   3.49    0.057  28.16568

Appendix C. Regressor Results Using an Optimized Set of Spectral Indices

Dataset       Regressor  MAE    MSE    R²     VarMAE  VarMSE  VarR²  Duration
Full Dataset  XGB        0.587  1.93   0.305  0.027   3.455   0.048  23.44931

Appendix D. Results of Regressors with Optimized SAR Dataset and Optical Indices

Dataset       Regressor  MAE    MSE    R²     VarMAE  VarMSE  VarR²  Duration
Full Dataset  XGB        0.575  1.858  0.356  0.023   3.526   0.061  36.6489
N.B. The optimized set of input parameters includes: ‘dem’, ‘temp’, ‘slope’, ‘dissimilarity_vv_1’, ‘contrast_vv_1’, ‘homogeneity_vv_1’, ‘correlation_vv_1’, ‘entropy_vv_1’, ‘dissimilarity_vh_1’, ‘contrast_vh_1’, ‘homogeneity_vh_1’, ‘correlation_vh_1’, ‘entropy_vh_1’, ‘gamma_vv_1’, ‘gamma_vh_1’, ‘NDSI1’, ‘S31’, ‘SI11’, ‘SI31’, ‘SI81’, ‘NDSIre1’, ‘SI3re1’, ‘SSRIre1’, ‘SSRI1’.


References

  1. Li, X.; Wang, Z.; Song, K.; Zhang, B.; Liu, D.; Guo, Z. Assessment for salinized wasteland expansion and land use change using GIS and remote sensing in the west part of Northeast China. Environ. Monit. Assess. 2007, 131, 421–437.
  2. Metternicht, G.I.; Zinck, J. Remote sensing of soil salinity: Potentials and constraints. Remote Sens. Environ. 2003, 85, 1–20.
  3. Muhetaer, N.; Nurmemet, I.; Abulaiti, A.; Xiao, S.; Zhao, J. An Efficient Approach for Inverting the Soil Salinity in Keriya Oasis, Northwestern China, Based on the Optical-Radar Feature-Space Model. Sensors 2022, 22, 7226.
  4. Taghadosi, M.M.; Hasanlou, M.; Eftekhari, K. Soil salinity mapping using dual-polarized SAR Sentinel-1 imagery. Int. J. Remote Sens. 2019, 40, 237–252.
  5. Grissa, M.; Abdelfattah, R.; Mercier, G.; Zribi, M.; Chahbi, A.; Lili-Chabaane, Z. Empirical model for soil salinity mapping from SAR data. In Proceedings of the 2011 IEEE International Geoscience and Remote Sensing Symposium, Vancouver, BC, Canada, 24–29 July 2011; pp. 1099–1102.
  6. Tripathi, A.; Tiwari, R.K. A simplified subsurface soil salinity estimation using synergy of SENTINEL-1 SAR and SENTINEL-2 multispectral satellite data, for early stages of wheat crop growth in Rupnagar, Punjab, India. Land Degrad. Dev. 2021, 32, 3905–3919.
  7. Mohamed, S.A.; Metwaly, M.M.; Metwalli, M.R.; AbdelRahman, M.A.; Badreldin, N. Integrating Active and Passive Remote Sensing Data for Mapping Soil Salinity Using Machine Learning and Feature Selection Approaches in Arid Regions. Remote Sens. 2023, 15, 1751.
  8. Nurmemet, I.; Ghulam, A.; Tiyip, T.; Elkadiri, R.; Ding, J.-L.; Maimaitiyiming, M.; Abliz, A.; Sawut, M.; Zhang, F.; Abliz, A. Monitoring soil salinization in Keriya River Basin, Northwestern China using passive reflective and active microwave remote sensing data. Remote Sens. 2015, 7, 8803–8829.
  9. Singh, A.; Dwivedi, R. Delineation of salt-affected soils through digital analysis of Landsat MSS data. Remote Sens. 1989, 10, 83–92.
  10. Metternicht, G.; Zinck, J. Spatial discrimination of salt-and sodium-affected soil surfaces. Int. J. Remote Sens. 1997, 18, 2571–2586.
  11. Fernandez-Buces, N.; Siebe, C.; Cram, S.; Palacio, J. Mapping soil salinity using a combined spectral response index for bare soil and vegetation: A case study in the former lake Texcoco, Mexico. J. Arid. Environ. 2006, 65, 644–667.
  12. Masoud, A.; Koike, K. Arid land salinization detected by remotely-sensed landcover changes: A case study in the Siwa region, NW Egypt. J. Arid. Environ. 2006, 66, 151–167.
  13. Gabdullin, B.; Zhogolov, A.; Savin, I.Y.; Otarov, A.; Ibrayeva, M.; Golovanov, D. Application of multi-spectral satellite data for interpretation of soil salinization of the irrigated areas (case study of Southern Kazakhstan). Vestn. Mosk. Univ. Seriya 5 Geogr. 2016, 5, 34–41. Available online: (accessed on 5 May 2023).
  14. Gorji, T.; Yildirim, A.; Sertel, E.; Tanik, A. Remote sensing approaches and mapping methods for monitoring soil salinity under different climate regimes. Int. J. Environ. Geoinform. 2019, 6, 33–49.
  15. Allbed, A.; Kumar, L. Soil salinity mapping and monitoring in arid and semi-arid regions using remote sensing technology: A review. Adv. Remote Sens. 2013, 2, 373–385.
  16. Abbas, A.; Khan, S.; Hussain, N.; Hanjra, M.A.; Akbar, S. Characterizing soil salinity in irrigated agriculture using a remote sensing approach. Phys. Chem. Earth Parts A/B/C 2013, 55, 43–52.
  17. Scudiero, E.; Skaggs, T.H.; Corwin, D.L. Regional-scale soil salinity assessment using Landsat ETM+ canopy reflectance. Remote Sens. Environ. 2015, 169, 335–343.
  18. Rahmati, M.; Hamzehpour, N. Quantitative remote sensing of soil electrical conductivity using ETM+ and ground measured data. Int. J. Remote Sens. 2017, 38, 123–140.
  19. Fan, X.; Weng, Y.; Tao, J. Towards decadal soil salinity mapping using Landsat time series data. Int. J. Appl. Earth Obs. Geoinf. 2016, 52, 32–41.
  20. Qu, Y.-H.; Duan, X.-L.; Gao, H.-Y.; Chen, A.-P.; An, Y.-Q.; Song, J.-L.; Zhou, H.-M.; He, T. Quantitative retrieval of soil salinity using hyperspectral data in the region of Inner Mongolia Hetao irrigation district. Spectrosc. Spectr. Anal. 2009, 29, 1362–1366.
  21. Dutkiewicz, A.; Lewis, M.; Ostendorf, B. Evaluation and comparison of hyperspectral imagery for mapping surface symptoms of dryland salinity. Int. J. Remote Sens. 2009, 30, 693–719.
  22. Fallah Shamsi, S.R.; Zare, S.; Abtahi, S.A. Soil salinity characteristics using moderate resolution imaging spectroradiometer (MODIS) images and statistical analysis. Arch. Agron. Soil Sci. 2013, 59, 471–489.
  23. Zhang, T.-T.; Qi, J.-G.; Gao, Y.; Ouyang, Z.-T.; Zeng, S.-L.; Zhao, B. Detecting soil salinity with MODIS time series VI data. Ecol. Indic. 2015, 52, 480–489.
  24. Phonphan, W.; Tripathi, N.K.; Tipdecho, T.; Eiumnoh, A. Modelling electrical conductivity of soil from backscattering coefficient of microwave remotely sensed data using artificial neural network. Geocarto Int. 2014, 29, 842–859.
  25. Zeng, W.; Zhang, D.; Fang, Y.; Wu, J.; Huang, J. Comparison of partial least square regression, support vector machine, and deep-learning techniques for estimating soil salinity from hyperspectral data. J. Appl. Remote Sens. 2018, 12, 022204.
  26. Vermeulen, D.; van Niekerk, A. Machine learning performance for predicting soil salinity using different combinations of geomorphometric covariates. Geoderma 2017, 299, 1–12.
  27. Akramkhanov, A.; Vlek, P.L. The assessment of spatial distribution of soil salinity risk using neural network. Environ. Monit. Assess. 2012, 184, 2475–2485.
  28. Mukhamediev, R.I.; Symagulov, A.; Kuchin, Y.; Yakunin, K.; Yelis, M. From classical machine learning to deep neural networks: A simplified scientometric review. Appl. Sci. 2021, 11, 5541.
  29. Allbed, A.; Kumar, L.; Aldakheel, Y.Y. Assessing soil salinity using soil salinity and vegetation indices derived from IKONOS high-spatial resolution imageries: Applications in a date palm dominated region. Geoderma 2014, 230, 1–8.
  30. Nosair, A.M.; Shams, M.Y.; AbouElmagd, L.M.; Hassanein, A.E.; Fryar, A.E.; Abu Salem, H.S. Predictive model for progressive salinization in a coastal aquifer using artificial intelligence and hydrogeochemical techniques: A case study of the Nile Delta aquifer, Egypt. Environ. Sci. Pollut. Res. 2022, 29, 9318–9340.
  31. Mukhamediev, R.I.; Kuchin, Y.; Amirgaliyev, Y.; Yunicheva, N.; Muhamedijeva, E. Estimation of Filtration Properties of Host Rocks in Sandstone-Type Uranium Deposits Using Machine Learning Methods. IEEE Access 2022, 10, 18855–18872.
  32. Wang, J.; Ding, J.; Yu, D.; Teng, D.; He, B.; Chen, X.; Ge, X.; Zhang, Z.; Wang, Y.; Yang, X. Machine learning-based detection of soil salinity in an arid desert region, Northwest China: A comparison between Landsat-8 OLI and Sentinel-2 MSI. Sci. Total Environ. 2020, 707, 136092.
  33. Hoa, P.V.; Giang, N.V.; Binh, N.A.; Hai, L.V.H.; Pham, T.-D.; Hasanlou, M.; Tien Bui, D. Soil salinity mapping using SAR Sentinel-1 data and advanced machine learning algorithms: A case study at Ben Tre Province of the Mekong River Delta (Vietnam). Remote Sens. 2019, 11, 128.
  34. Merembayev, T.; Amirgaliyev, Y.; Saurov, S.; Wójcik, W. Soil Salinity Classification Using Machine Learning Algorithms and Radar Data in the Case from the South of Kazakhstan. J. Ecol. Eng. 2022, 23, 61–67.
  35. Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106.
  36. Rivest, R.L. Learning decision lists. Mach. Learn. 1987, 2, 229–246.
  37. Haralick, R.M.; Shanmugam, K.; Dinstein, I.H. Textural features for image classification. IEEE Trans. Syst. Man Cybern. 1973, 6, 610–621.
  38. Ma, G.; Ding, J.; Han, L.; Zhang, Z.; Ran, S. Digital mapping of soil salinization based on Sentinel-1 and Sentinel-2 data combined with machine learning algorithms. Reg. Sustain. 2021, 2, 177–188.
  39. Yang, N.; Yang, S.; Cui, W.; Zhang, Z.; Zhang, J.; Chen, J.; Ma, Y.; Lao, C.; Song, Z.; Chen, Y. Effect of spring irrigation on soil salinity monitoring with UAV-borne multispectral sensor. Int. J. Remote Sens. 2021, 42, 8952–8978.
  40. Wei, G.; Li, Y.; Zhang, Z.; Chen, Y.; Chen, J.; Yao, Z.; Lao, C.; Chen, H. Estimation of soil salt content by combining UAV-borne multispectral sensor and machine learning algorithms. PeerJ 2020, 8, e9087.
  41. Guan, Y.; Grote, K.; Schott, J.; Leverett, K. Prediction of soil water content and electrical conductivity using random Forest methods with UAV multispectral and ground-coupled geophysical data. Remote Sens. 2022, 14, 1023.
  42. Chen, B.; Zheng, H.; Luo, G.; Chen, C.; Bao, A.; Liu, T.; Chen, X. Adaptive estimation of multi-regional soil salinization using extreme gradient boosting with Bayesian TPE optimization. Int. J. Remote Sens. 2022, 43, 778–811.
  43. Fathizad, H.; Ardakani, M.A.H.; Sodaiezadeh, H.; Kerry, R.; Taghizadeh-Mehrjardi, R. Investigation of the spatial and temporal variation of soil salinity using random forests in the central desert of Iran. Geoderma 2020, 365, 114233.
  44. Ivushkin, K.; Bartholomeus, H.; Bregt, A.K.; Pulatov, A.; Kempen, B.; de Sousa, L. Global mapping of soil salinity change. Remote Sens. Environ. 2019, 231, 111260.
  45. Guan, X.; Wang, S.; Gao, Z.; Lv, Y. Dynamic prediction of soil salinization in an irrigation district based on the support vector machine. Math. Comput. Model. 2013, 58, 719–724.
  46. Wei, L.; Yuan, Z.; Yu, M.; Huang, C.; Cao, L. Estimation of arsenic content in soil based on laboratory and field reflectance spectroscopy. Sensors 2019, 19, 3904.
  47. Shahabi, M.; Jafarzadeh, A.A.; Neyshabouri, M.R.; Ghorbani, M.A.; Valizadeh Kamran, K. Spatial modeling of soil salinity using multiple linear regression, ordinary kriging and artificial neural network methods. Arch. Agron. Soil Sci. 2017, 63, 151–160.
  48. Khan, N.M.; Rastoskuev, V.V.; Shalina, E.V.; Sato, Y. Mapping salt-affected soils using remote sensing indicators—A simple approach with the use of GIS IDRISI. In Proceedings of the 22nd Asian Conference on Remote Sensing, Singapore, 5–9 November 2001.
  49. Bannari, A.; Guedon, A.; El-Harti, A.; Cherkaoui, F.; El-Ghmari, A. Characterization of slightly and moderately saline and sodic soils in irrigated agricultural land using simulated data of advanced land imaging (EO-1) sensor. Commun. Soil Sci. Plant Anal. 2008, 39, 2795–2811.
  50. Tripathi, N.; Rai, B.K.; Dwivedi, P. Spatial modeling of soil alkalinity in GIS environment using IRS data. In Proceedings of the 18th Asian Conference in Remote Sensing, Kuala Lumpur, Malaysia, 20–24 October 1997; pp. A.8.1–A.8.6.
  51. Nicolas, H.; Walter, C. Detecting salinity hazards within a semiarid context by means of combining soil and remote-sensing data. Geoderma 2006, 134, 217–230.
  52. Khan, N.M.; Rastoskuev, V.V.; Sato, Y.; Shiozawa, S. Assessment of hydrosaline land degradation by using a simple approach of remote sensing indicators. Agric. Water Manag. 2005, 77, 96–109.
  53. Abbas, A.; Khan, S. Using remote sensing techniques for appraisal of irrigated soil salinity. In Proceedings of the International Congress on Modelling and Simulation (MODSIM), Auckland, New Zealand, 10–13 December 2007; pp. 2632–2638.
  54. Guo, B.; Zang, W.; Zhang, R. Soil salizanation information in the Yellow River Delta based on feature surface models using Landsat 8 OLI data. IEEE Access 2020, 8, 94394–94403.
  55. Yu, X.; Chang, C.; Song, J.; Zhuge, Y.; Wang, A. Precise monitoring of soil salinity in China’s Yellow River Delta using UAV-borne multispectral imagery and a soil salinity retrieval index. Sensors 2022, 22, 546.
  56. USGS EROS Archive, Landsat Archives, Landsat 8 OLI (Operational Land Imager) and TIRS (Thermal Infrared Sensor) Level-1 Data Products. Available online: (accessed on 5 May 2023).
  57. Richards, L.A. Diagnosis and Improvement of Saline and Alkali Soils; LWW: Philadelphia, PA, USA, 1954; Volume 78.
  58. Measuring Soil Salinity. Available online: (accessed on 5 May 2023).
  59. Scikit-Learn. Machine Learning in Python. Available online: (accessed on 5 May 2023).
  60. Pang, G.; Wang, T.; Liao, J.; Li, S. Quantitative Model Based on Field-Derived Spectral Characteristics to Estimate Soil Salinity in Minqin County, China. Soil Sci. Soc. Am. J. 2014, 78, 546–555.
  61. Linardatos, P.; Papastefanopoulos, V.; Kotsiantis, S. Explainable AI: A review of machine learning interpretability methods. Entropy 2020, 23, 18.
  62. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 1–10.
  63. Mukhamediev, R.I.; Popova, Y.; Kuchin, Y.; Zaitseva, E.; Kalimoldayev, A.; Symagulov, A.; Levashenko, V.; Abdoldina, F.; Gopejenko, V.; Yakunin, K. Review of Artificial Intelligence and Machine Learning Technologies: Classification, Restrictions, Opportunities and Challenges. Mathematics 2022, 10, 2552.
  64. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
  65. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  66. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A highly efficient gradient boosting decision tree. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Paper ID 1786. Available online: (accessed on 1 August 2023).
  67. Al Daoud, E. Comparison between XGBoost, LightGBM and CatBoost using a home credit dataset. Int. J. Comput. Inf. Eng. 2019, 13, 6–10.
  68. Bentéjac, C.; Csörgő, A.; Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 2021, 54, 1937–1967.
  69. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  70. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
  71. Yu, H.-F.; Huang, F.-L.; Lin, C.-J. Dual coordinate descent methods for logistic regression and maximum entropy models. Mach. Learn. 2011, 85, 41–75.
  72. Santosa, F.; Symes, W.W. Linear inversion of band-limited reflection seismograms. SIAM J. Sci. Stat. Comput. 1986, 7, 1307–1330.
  73. Tikhonov, A.N.; Goncharsky, A.; Stepanov, V.V.; Yagola, A.G. Numerical Methods for the Solution of Ill-Posed Problems; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1995; Volume 328.
  74. Hoerl, A.E.; Kennard, R.W. Ridge regression: Applications to nonorthogonal problems. Technometrics 1970, 12, 69–82.
  75. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320.
  76. Li, J.; Zhang, T.; Shao, Y.; Ju, Z. Comparing machine learning algorithms for soil salinity mapping using topographic factors and Sentinel-1/2 data: A case study in the Yellow River delta of China. Remote Sens. 2023, 15, 2332.
  77. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
  78. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
  79. Taigman, Y.; Yang, M.; Ranzato, M.; Wolf, L. Deepface: Closing the gap to human-level performance in face verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1701–1708.
  80. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015, 18th International Conference, Munich, Germany, 5–9 October 2015, Part III 18; Springer International Publishing: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
  81. Gu, Q.; Han, Y.; Xu, Y.; Ge, H.; Li, X. Extraction of saline soil distributions using different salinity indices and deep neural networks. Remote Sens. 2022, 14, 4647.
  82. Mukhamediev, R.I.; Symagulov, A.; Kuchin, Y.; Zaitseva, E.; Bekbotayeva, A.; Yakunin, K.; Assanov, I.; Levashenko, V.; Popova, Y.; Akzhalova, A. Review of some applications of unmanned aerial vehicles technology in the resource-rich country. Appl. Sci. 2021, 11, 10171.
  83. Wang, D.; Chen, H.; Wang, G.; Cong, J.; Wang, X.; Wei, X. Salinity inversion of severe saline soil in the yellow river estuary based on UAV multi-spectra. Sci. Agric. Sin. 2019, 52, 1698–1709.
  84. Hu, J.; Peng, J.; Zhou, Y.; Xu, D.; Zhao, R.; Jiang, Q.; Fu, T.; Wang, F.; Shi, Z. Quantitative estimation of soil salinity using UAV-borne hyperspectral and satellite multispectral images. Remote Sens. 2019, 11, 736.
  85. Mukhamediev, R.; Amirgaliyev, Y.; Kuchin, Y.; Aubakirov, M.; Terekhov, A.; Merembayev, T.; Yelis, M.; Zaitseva, E.; Levashenko, V.; Popova, Y.; et al. Operational Mapping of Salinization Areas in Agricultural Fields Using Machine Learning Models Based on Low-Altitude Multispectral Images. Drones 2023, 7, 357.
Figure 1. Methodological scheme of the study. X: input variables (features), y: target value.
Figure 2. Soil sampling locations near Kapchagai Reservoir and Lake Alakol.
Figure 3. Map of soil sample collection in Almaty and Zhetysu regions (1, 2: Kapchagay; 3: Alakol; 4: Shelek).
Figure 4. The color composite image of the SAR Sentinel-1 data from Alakol lake, Kapchagay reservoir and Shelek region (Red: VH, Green: VV, Blue: VH/VV). 1, 2—Kapchagay, 3—Alakol, 4—Shelek.
Figure 5. Part of the Santinel-1 SAR dataset on the salinity of certain areas of South Kazakhstan.
Figure 5. Part of the Santinel-1 SAR dataset on the salinity of certain areas of South Kazakhstan.
Remotesensing 15 04269 g005
Figure 6. Feature correlation heatmaps.
Figure 6. Feature correlation heatmaps.
Remotesensing 15 04269 g006
Figure 7. Scatterplots of laboratory and predicted salinity values for different data sets.
Figure 7. Scatterplots of laboratory and predicted salinity values for different data sets.
Remotesensing 15 04269 g007
Figure 8. Impact of input values on model output.
Figure 8. Impact of input values on model output.
Remotesensing 15 04269 g008
Figure 9. Correlation matrix of SHAP values.
Figure 9. Correlation matrix of SHAP values.
Remotesensing 15 04269 g009
Figure 10. Mapping of soil salinity in April 2022 in Shelek area.
Figure 10. Mapping of soil salinity in April 2022 in Shelek area.
Remotesensing 15 04269 g010
Figure 11. Dynamics of soil condition changes between the southern bank of the Kapchagai reservoir and the foothills of the Zailiisky Alatau.
Figure 11. Dynamics of soil condition changes between the southern bank of the Kapchagai reservoir and the foothills of the Zailiisky Alatau.
Remotesensing 15 04269 g011
Figure 12. Soil salinity in the area of Lake Alakol. Top left (1) is the field study area. Top right (2) is predicted by the developed model. Bottom (3) is predicted by the Mtemp model.
Figure 12. Soil salinity in the area of Lake Alakol. Top left (1) is the field study area. Top right (2) is predicted by the developed model. Bottom (3) is predicted by the Mtemp model.
Remotesensing 15 04269 g012
Table 1. Names of features and their description.

| Group | Feature | Description |
|---|---|---|
| Target value | elco50 | Soil salinity, field data |
| Features generated using SAR Sentinel-1 data | dissimilarity_vv | Dissimilarity of gray level co-occurrence matrix for polarization VV |
| | contrast_vv | Contrast of gray level co-occurrence matrix for polarization VV |
| | homogeneity_vv | Homogeneity of gray level co-occurrence matrix for polarization VV |
| | energy_vv | Energy of gray level co-occurrence matrix for polarization VV |
| | entropy_vv | Entropy of gray level co-occurrence matrix for polarization VV |
| | gamma_vh | Linear backscatter intensity in VH polarization |
| | gamma_vv | Linear backscatter intensity in VV polarization |
| | dissimilarity_vh | Dissimilarity of gray level co-occurrence matrix for polarization VH |
| | contrast_vh | Contrast of gray level co-occurrence matrix for polarization VH |
| | homogeneity_vh | Homogeneity of gray level co-occurrence matrix for polarization VH |
| | energy_vh | Energy of gray level co-occurrence matrix for polarization VH |
| | entropy_vh | Entropy of gray level co-occurrence matrix for polarization VH |
| | correlation_vv | Correlation, polarization VV |
| | correlation_vh | Correlation, polarization VH |
| | ASM_vh | Angular second moment, polarization VH |
| | ASM_vv | Angular second moment, polarization VV |
| Environmental features | Long_dec | Longitude in decimal coordinates, WGS84 |
| | Lat_dec | Latitude in decimal coordinates, WGS84 |
| | Altitude | Altitude measured by GPS |
| | temp | MODIS land surface temperature |
| | slope | Slope calculated from DEM |
| Spectral indexes | (see Table 2) | |
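
The gray level co-occurrence matrix (GLCM) features listed in Table 1 follow standard texture definitions. As a minimal sketch, assuming a small integer-valued image, a single pixel offset, a symmetric normalized matrix and a natural-log entropy (none of which the table specifies), they could be computed as follows. The function name `glcm_features` and its parameters are hypothetical, and the correlation feature is left out for brevity.

```python
import math
from collections import defaultdict


def glcm_features(image, dx=1, dy=0):
    """Texture features from a symmetric, normalized GLCM of a 2D integer image."""
    counts = defaultdict(float)
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1.0  # pair in the offset direction
                counts[(b, a)] += 1.0  # mirrored pair keeps the matrix symmetric
    total = sum(counts.values())
    if total == 0:
        raise ValueError("image is too small for the requested offset")
    # Normalize counts to joint probabilities p(i, j).
    p = {pair: n / total for pair, n in counts.items()}

    asm = sum(v * v for v in p.values())
    return {
        "contrast": sum(v * (i - j) ** 2 for (i, j), v in p.items()),
        "dissimilarity": sum(v * abs(i - j) for (i, j), v in p.items()),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items()),
        "ASM": asm,
        "energy": math.sqrt(asm),  # energy is the square root of ASM
        "entropy": -sum(v * math.log(v) for v in p.values() if v > 0),
    }
```

In practice such features are computed per polarization (VV and VH) over a moving window of quantized backscatter values; the window size and quantization used in the study are not given in this table.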
Table 2. Spectral indices used as input for machine learning models.

| Spectral index | Ref. |
|---|---|
| $\mathrm{NDSI} = \frac{\mathrm{red} - \mathrm{nir}}{\mathrm{red} + \mathrm{nir}}$ | [48] |
| $\mathrm{S1} = \frac{\mathrm{blue}}{\mathrm{red}}$ | [49] |
| $\mathrm{S2} = \frac{\mathrm{blue} - \mathrm{red}}{\mathrm{blue} + \mathrm{red}}$ | [49] |
| $\mathrm{S3} = \frac{\mathrm{green} \times \mathrm{red}}{\mathrm{blue}}$ | [49] |
| $\mathrm{SI1} = \sqrt{\mathrm{green} \times \mathrm{red}}$ | [50] |
| $\mathrm{SI2} = \sqrt{\mathrm{green}^2 + \mathrm{red}^2 + \mathrm{nir}^2}$ | [51] |
| $\mathrm{SI3} = \sqrt{\mathrm{green}^2 + \mathrm{nir}^2}$ | [52] |
| $\mathrm{SI8} = \frac{\mathrm{blue} \times \mathrm{red}}{\mathrm{green}}$ | [53] |
| $\mathrm{WI1} = 0.1761 \times \mathrm{green} + 0.322 \times \mathrm{red} + 0.3396 \times \mathrm{nir}$ | [54] |
| $\mathrm{SSRI} = \frac{\mathrm{nir}}{\sqrt{\mathrm{green} \times \mathrm{red}}}$ | [55] |
| $\mathrm{NDSI_{re}} = \frac{\mathrm{red} - \mathrm{swir\_16}}{\mathrm{red} + \mathrm{swir\_16}}$ | * |
| $\mathrm{SI3_{re}} = \sqrt{\mathrm{green}^2 + \mathrm{swir\_16}^2}$ | * |
| $\mathrm{SSRI_{re}} = \frac{\mathrm{swir\_16}}{\sqrt{\mathrm{green} \times \mathrm{red}}}$ | * |

The following spectral ranges were used to calculate the indices [56]: Band 2, Blue (0.45–0.51 µm); Band 3, Green (0.53–0.59 µm); Band 4, Red (0.64–0.67 µm); Band 5, Near-Infrared (0.85–0.88 µm); Band 6, SWIR 16 (1.57–1.65 µm); Band 7, SWIR 22 (2.11–2.29 µm). To make use of the “swir_16” range, additional spectral indices are proposed that use this range instead of “nir” (marked with “*”).
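
To illustrate how the indices in Table 2 turn band values into model inputs, the sketch below evaluates a subset of them for a single pixel. It assumes the inputs are surface reflectance values for the Landsat bands named in the note above, and that the root-based indices (SI1, SI2, SI3) are plain square roots; the function name `salinity_indices` is hypothetical, and SSRI and SSRIre are omitted.

```python
import math


def salinity_indices(blue, green, red, nir, swir_16):
    """Return a subset of the Table 2 indices for one pixel of band reflectances."""
    return {
        "NDSI": (red - nir) / (red + nir),
        "S1": blue / red,
        "S2": (blue - red) / (blue + red),
        "S3": green * red / blue,
        "SI1": math.sqrt(green * red),
        "SI2": math.sqrt(green**2 + red**2 + nir**2),
        "SI3": math.sqrt(green**2 + nir**2),
        "SI8": blue * red / green,
        "WI1": 0.1761 * green + 0.322 * red + 0.3396 * nir,
        # Variants marked "*" in Table 2: swir_16 replaces nir.
        "NDSI_re": (red - swir_16) / (red + swir_16),
        "SI3_re": math.sqrt(green**2 + swir_16**2),
    }
```

Pixels with zero reflectance in a denominator band would raise `ZeroDivisionError` here; a raster implementation would need to mask such pixels first.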
Table 3. Distribution of field data by salinity class based on electrical conductivity as EC1:5 for loam soil.

| Salinity class | Class number | Number of samples | EC1:5 range for loams (dS/m) |
|---|---|---|---|
| Slightly saline | 1 | 42 | 0.19–0.36 |
| Moderately saline | 2 | 41 | 0.37–0.72 |
| Highly saline | 3 | 21 | 0.73–1.45 |
| Severely saline | 4 | 43 | >1.45 |
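
The class boundaries in Table 3 can be applied to a laboratory reading as in the short sketch below. Two choices here are assumptions not stated in the table: readings below 0.19 dS/m are returned as class 0 (treated as non-saline), and readings that fall in the small gaps between the tabulated ranges (for example 0.365) are assigned to the next higher class. The function name `salinity_class` is hypothetical.

```python
def salinity_class(ec_1_5):
    """Map an EC1:5 reading (dS/m) for loam soil to the class number of Table 3."""
    if ec_1_5 < 0.19:
        return 0  # assumption: below the lowest tabulated range, treated as non-saline
    if ec_1_5 <= 0.36:
        return 1  # slightly saline
    if ec_1_5 <= 0.72:
        return 2  # moderately saline
    if ec_1_5 <= 1.45:
        return 3  # highly saline
    return 4  # severely saline
```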
Table 4. Machine learning models.

| Regression model | Abbreviation | Method | References |
|---|---|---|---|
| XGBoost | XGB | Ensemble learning method based on the gradient boosted trees algorithm. | [65] |
| LightGBM | LGBM | Ensemble learning method based on the gradient boosted trees algorithm. | [66,67,68] |
| Random forest | RF | Ensemble learning method based on the bagging technique. | [69] |
| Support vector machines | SVM | Linear and non-linear classification based on the kernel trick. | [70] |
| Linear regression | LR | Linear approach to modeling the impact of independent variables on the dependent (target) variable. | [71] |
| Lasso regression | Lasso | Uses a regularization mechanism that reduces overfitting and can also help in feature selection. | [72] |
| Ridge regression | Ridge | Uses a regularization mechanism to prevent overfitting. | [73,74] |
| Elastic net | ElasticNet | Hybrid of ridge and lasso regularization. | [75] |
Table 5. Quality metrics for regression models.

| Accuracy index | Abbreviation | Equation | Explanation |
|---|---|---|---|
| Determination coefficient | $R^2$ / r2_score | $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$, $SS_{res} = \sum_{i=1}^{m_k} (y_i - h_i)^2$, $SS_{tot} = \sum_{i=1}^{m_k} (y_i - \bar{y})^2$, $\bar{y} = \frac{1}{m_k} \sum_{i=1}^{m_k} y_i$ | where $y_i$ is the actual value; $h_i$ is the estimated value (the value of the hypothesis function) for the i-th sample; $m_k$ is a part of the training sample $m$ (the set of labeled objects) |
| Mean absolute error | MAE | $MAE = \frac{\sum_{i=1}^{n} \lvert y_i - h_i \rvert}{n}$ | where $n$ is the sample size; when evaluating the performance of the model on the test set, $n$ is the size of the test set |
| Mean squared error | MSE | $MSE = \frac{\sum_{i=1}^{n} (y_i - h_i)^2}{n}$ | |
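
The three metrics in Table 5 can be checked by hand on a small example. The following is a minimal sketch in plain Python that follows the formulas as written, with `y_true` standing for the actual values $y_i$ and `y_pred` for the estimates $h_i$; the function name `regression_metrics` is hypothetical and this is not the evaluation code used in the study.

```python
def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE and R^2 for paired actual and estimated values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / n
    # Residual and total sums of squares, as in the R^2 definition.
    ss_res = sum((y - h) ** 2 for y, h in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return {
        "MAE": sum(abs(y - h) for y, h in zip(y_true, y_pred)) / n,
        "MSE": ss_res / n,
        "R2": 1.0 - ss_res / ss_tot,
    }
```

Note that $R^2$ can be negative when the model fits worse than predicting the mean of the actual values, which is why it is reported alongside MAE and MSE rather than on its own.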
Table 6. Results of machine learning models using SAR data.

| Dataset | Regression model | MAE | MSE | $R^2$ | VarMAE | VarMSE | Var$R^2$ | Duration |
|---|---|---|---|---|---|---|---|---|
| Full Dataset | XGB | 0.644 | 1.991 | 0.282 | 0.024 | 3.542 | 0.046 | 28.14647 |
Table 7. Results of machine learning models using radar and optical data.

| Dataset | Regression model | MAE | MSE | $R^2$ | VarMAE | VarMSE | Var$R^2$ | Duration |
|---|---|---|---|---|---|---|---|---|
| Full Dataset | XGB | 0.569 | 1.889 | 0.339 | 0.023 | 3.49 | 0.057 | 28.16568 |
Table 8. Results of regressor performance with the SAR dataset, in which the parameters slope, dem, temp are excluded.

| Dataset | Regressor | MAE | MSE | $R^2$ | VarMAE | VarMSE | Var$R^2$ | Duration |
|---|---|---|---|---|---|---|---|---|
| Full Dataset | XGB | 0.677 | 2.053 | 0.227 | 0.027 | 3.329 | 0.034 | 29.84913 |
Table 9. Results of machine learning model performance after optimizing the number of input parameters.

| Dataset | Regressor | MAE | MSE | $R^2$ | VarMAE | VarMSE | Var$R^2$ | Duration |
|---|---|---|---|---|---|---|---|---|
| Full Dataset (optimized set of features) | XGB | 0.575 | 1.858 | 0.356 | 0.023 | 3.526 | 0.061 | 32.16551 |

MDPI and ACS Style

Mukhamediev, R.I.; Merembayev, T.; Kuchin, Y.; Malakhov, D.; Zaitseva, E.; Levashenko, V.; Popova, Y.; Symagulov, A.; Sagatdinova, G.; Amirgaliyev, Y. Soil Salinity Estimation for South Kazakhstan Based on SAR Sentinel-1 and Landsat-8,9 OLI Data with Machine Learning Models. Remote Sens. 2023, 15, 4269.

