Mapping Soil Salinity/Sodicity by using Landsat OLI Imagery and PLSR Algorithm over Semiarid West Jilin Province, China

Soil salinity and sodicity can significantly reduce the value and the productivity of affected lands, posing degradation, and threats to sustainable development of natural resources on earth. This research attempted to map soil salinity/sodicity via disentangling the relationships between Landsat 8 Operational Land Imager (OLI) imagery and in-situ measurements (EC, pH) over the west Jilin of China. We established the retrieval models for soil salinity and sodicity using Partial Least Square Regression (PLSR). Spatial distribution of the soils that were subjected to hybridized salinity and sodicity (HSS) was obtained by overlay analysis using maps of soil salinity and sodicity in geographical information system (GIS) environment. We analyzed the severity and occurring sizes of soil salinity, sodicity, and HSS with regard to specified soil types and land cover. Results indicated that the models’ accuracy was improved by combining the reflectance bands and spectral indices that were mathematically transformed. Therefore, our results stipulated that the OLI imagery and PLSR method applied to mapping soil salinity and sodicity in the region. The mapping results revealed that the areas of soil salinity, sodicity, and HSS were 1.61 × 106 hm2, 1.46 × 106 hm2, and 1.36 × 106 hm2, respectively. Also, the occurring area of moderate and intensive sodicity was larger than that of salinity. This research may underpin efficiently mapping regional salinity/sodicity occurrences, understanding the linkages between spectral reflectance and ground measurements of soil salinity and sodicity, and provide tools for soil salinity monitoring and the sustainable utilization of land resources.


Introduction
Basically, mapping and assessing the soil salinity and sodicity is a first step for salinity monitoring, salt-soil agriculture development, and environmental sustainability maintenance throughout the world [1,2]. Such work is also valuable to the scientific understanding of soil salinity and sodicity occurrences and pursuing effective measures for evaluating the affected soils [3].

Study Area Descriptions
The west Jilin Province, as the largest sodic saline-alkali soil distribution in Northeast China, is located in the southern Songnen Plains (43 • 59 ~46 • 18 N, 121 • 38 ~126 • 11 E. Figure 1), with low-lying terrains and elevations between 100~200 m. The groundwater depth is about 2 m. The study area is under a temperate, semi-arid continental monsoon climate, with precipitation dwindling from 420~460 mm in the east to 350~420 mm in the west, and evaporation increasing from 1200~1600 mm in the east to 1500~2000 mm in the west [26,27]. About 80% of the total precipitation occurs between mid-June and mid-August. The soils are Chernozem (Haplic Chernozem, FAO), meadow soil (Eutric Vertisol, FAO), aeolian soil (Arenosol, FAO), Solonetz (Solonetz, FAO), and Chestnut soil (Haplic Kastanozem, FAO). Pristine vegetation of this area is temperate steppes grown on the salted soils, with widespread fragmented scar-like saline lands that were spotted on the landscapes. However, grassland conversion to cropland has been common since the 1980s in this region, leading to less than one-quarter of the natural steppes are merely remained and cultivated croplands are ubiquitous with salinity to some extent.

Data Sources and Pre-Processing
Six scenes of satellite images created by Landsat 8 OLI covering the study region were used, of which five scenes were acquired on August 2016, and one image at the end of July. The images were geo-rectified to the Universal Transverse Mercator (UTM) coordinate system using World Geodetic System (WGS) 1984 datum assigned to north UTM Zone 51. Atmospheric correction was performed using FLAASH Model (Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes) that can correct both additive and multiplicative atmospheric effects [28]. Using ENVI 5.3, the study region was clipped from a mosaic of all the processed images. The spectral indices that link vegetation performance and soil salinity were calculated by the reflectance of corresponding bands, including NDVI, EVI, SIs [2,15,29], DVI [30], SAVI, and SRSI [31]. The spectral indices are listed in Table 1. For enhancing the remotely sensed data that built the linkages between spectral signatures and soil salinity and verifying the inversion models on the retrieval of soil salinity and sodicity, we conducted field measurements and soil sampling in October 2016. As a result, all of the soil samples were collected in triplicate from the 0-5 cm layer. The number of soil samples obtained from each land cover type is shown in Figure 1. All of the samples were air-dried, sieved through a 2 mm sieve. Soil pH and EC were measured in 1:5 soil:water extracts using LEICI pH5-3E and LEICI DDS-307A meters, respectively [17]. Wetland, water body, and built-up land were not involved in the inversion because we focused on the predominantly terrestrial salinity, and thus excluded samples from the lands. By excluding four data points with aberrant values following experimental analysis, we finally accepted 157 sampling points for analysis.
Auxiliary data included Digital Elevation Model (DEM, resolution 30 m) data; a soil type map (1:1000,000), provided by Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences; and, reference/validation site for classification of land cover by field survey or from Google Earth in 2016.

Random Forest to Classifying Land Cover
The RS images were classified by Random Forest (RF) to draw a land cover map for analyzing soil salinity and sodicity within each land cover type. RF is an ensemble of classification trees, where each tree contributes with a single vote for the assignment of the most frequent class to the input data. Key advantages of RF include non-parametric nature, high classifying accuracy, and capability to weigh variable importance [32,33]. RF uses a random subset of inputs or predictive variables in a node division instead of using best variables, which can reduce the generalizing error and finally lead to highest accuracy. Here, the classification system consisted of seven basic land cover classes: woodland, grassland, wetland, water body, barren land, cropland, and built-up [34,35]. The overall land cover classification accuracy was 95.76%, with a Kappa value 0.91 for RF.

The PLSR
The PLSR is an extension of multiple regression analysis, in which the effects of linear combinations of several predictors on a response variable (or multiple response variables) are analyzed. The method combines the traits of principal component analysis and multiple linear regressions [20]. Therefore, to eliminate the influence of collinearity among variables on multiple linear regression models, PLSR is usually used to improve the stability of estimators [36]. For developing the PLSR algorithms in this research, soil EC and pH were used each as an indicator of soil salinity and sodicity. Based on field sampling, the PLSR models can be established for soil salinity and sodicity retrieval [3,16,36]. The variables were transformed into six formulas to better retrieve soil EC and pH before three times of PLSR analyses. For the first time, soil EC and pH were retrieved merely by the reflectance of bands. Subsequently, we only used spectral indices to build up the retrieval models. Finally, we combined the bands and spectral indices to retrieve soil EC and pH. By using R 2 and RMSE, the models were assessed to determine best models. Consequently, the optimized latent variables for regression were obtained from the models for soil salinity and sodicity. Model performance can be evaluated with R 2 , Bias, Standard Deviation (SD), and RMSE. R 2 indicates the strength of statistical correlation between measured and predicted values. The model can be accurate for R 2 > 0.91, good for 0.82 < R 2 < 0.90, moderate for 0.66 < R 2 < 0.81, and poor for 0.5 < R 2 < 0.65, according to Farifteh et al. [37]. Bias measures the mean difference between prediction and measurements, and SD represents the random component of total uncertainties [38]. The metrics are described, as follows: where y denotes the measures with a mean value of y,ŷ denotes the predicted values, and N the number of samples. The 0.05 probability level was used to test the significance of correlation.

Sensitivity of RS Variables to Soil EC and pH
In order to distil the variables sensitive to soil pH and EC, spectral analyses of soil samples were implemented. Regression coefficients were used to indicate the impact of independent variables on the dependent variable, for instance, the higher the regression coefficient value, the greater the impact on the dependent variable [3]. Before modelling, the 157 soil samples were divided into two subsets. The first set (110 samples) was used for calibration and the second set (47 samples) for validation. Laboratory analysis showed that the pH values of the sampled soils ranged from 5.90 to 10.45, with an average 8.70 that generally pointed to sodicity [1]. EC measurements of the samples ranged from 0.03 to 3.99 mS/cm, with an average 0.54 mS/cm that indicated lower soluble salt concentration. Thus, most soils in the study area were more affected by sodicity rather than by salinity. Table 2 shows the correlation coefficients between EC, pH, and reflectance of bands, and between bands. EC showed the lowest coefficient positive with NIR (R = 0.348), moderately correlated with SWIR1, SWIR2, and PAN (R = 0.704, 0.738, and 0.763), and higher with Coastal, Red, Green, and Blue (R = 0.821, 0.810, 0.826, and 0.818). In particular, the correlation coefficient between Red vs. pH (R = 0.805), or between EC vs. Green (R = 0.826) was the highest each among the group, implying that the band Red had priority for soil salinity retrieval and the band Green for sodicity.
Note: * represents significance at the 0.05 level; ** represents significance at the 0.01 level; bold number means the maximum value in a column. Table 3 indicates the correlation coefficients between soil EC, pH, and all of the spectral indices one another. Vegetation indices (NDVI, EVI, DVI, and SAVI) were negatively correlated with EC and pH, with NDVI/SAVI the most negatively correlated. By contrast, salinity indices (SI, SI2, SI3, and SI4) were positively correlated with EC and pH. Since SI (R = 0.803) and SAVI (R = 0.814) exhibited more sensitive to soil sodicity than other spectral indices, thus we stipulated that SAVI was the priority spectral index to reflect soil sodicity. For EC, SI3 was the best estimator of soil salinity (R = 0.826), yet EVI had the poorest (R = −0.449). Overall, the RS-derived vegetation indices were more sensitive to soil sodicity than to soil salinity.

The PLSR Models for Soil Salinity/Sodicity Retrieval
We conducted mathematical transformations on all of the normalized variables, which included the bands and spectral indices before building regression models. The results implied that mathematical transformations upon the normalized variables that integrated bands and spectral indices should be carried out before building regression models, which improved the retrieval accuracy of soil salinity and sodicity than before.
For better comparing accuracy of the inversion models after different forms of mathematical transformation, all of the variables were normalized, as follows [39]: where x is an original value, x is the normalized value. The variables transformed into "1/log(r)" or "e r " exhibited better correlations with soil EC (Table 4). For retrieval of soil pH, the "1/r" was best transform (Table 5). At the same time, we found that some nonlinear transformations can enhance the correlations between the variables and EC or pH, and the accuracy of the regression models that were established by combining bands and spectral indices was improved than that by bands or spectral indices per se. The prediction results showed that the R 2 for EC increased from 0.704 to 0.735, and from 0.666 to 0.694 for pH. The optimized models were highlighted in Tables 4 and 5. For the EC regression model, the most influential variables were SI3 and the band Green, which were transformed through "e r ". For the pH regression model, the most influential variables were SAVI and the band Coastal, which were transformed through "1/r". Obviously, the optimized variables and the mathematical transformations were pivotal in well predicting soil EC and pH.  The prediction accuracy was assessed with 47 validating samples using RMSE and R 2 , which revealed the RMSE of the pH reaching 0.427 (R 2 = 0.747) and that of the EC reaching 0.532 mS/cm (R 2 = 0.698), respectively. These indicated that the two models were acceptable for estimating soil salinity and sodicity.
The mapping results showed that the non-salt affecting area was located in northeast and southwest of the study area, and the intensively salted area was in the middle, whilst the moderately salted area was in the south and middle of the area (Figure 2). The salt-affecting area was 1.61 × 10 6 hm 2 , accounting for 34.28% of the study area; slightly salted was common in the salt-affecting lands (1.17 × 10 6 hm 2 ), accounting for 72.90% of the area. The areas of moderately salted and intensively salted were 0.23 × 10 6 hm 2 and 0.21 × 10 6 hm 2 , accounting for 14.31% and 12.79% of the salt-affecting area, respectively. The sodium-affecting area was 1.46 × 10 6 hm 2 , accounting for 31.15% of the study area; slightly sodic was the dominant level (0.79 × 10 6 hm 2 ), accounting for 54.51% of the area. The areas of moderately sodic and intensively sodic were 0.41 × 10 6 hm 2 and 0.25 × 10 6 hm 2 , accounting for 28.14% and 17.35% in the sodium-affecting area, respectively. The intensively sodic area was distributed in the middle and the north of the study area, and moderately and slightly sodic areas were in the south and middle. By contrast, the non-sodium affecting areas were close to the non-salt affecting area, which was mainly located in the northeast and southwest of the study area ( Figure 3). Spatial distribution of the soils that were affected by HSS was obtained by overlay analysis with the maps of soil salinity and sodicity. The HSS levels are shown in Table 6. The spatial distribution of HSS indicated that the occurrence of HSS appeared roughly at the same place of soil salinity and soil sodicity (Figure 4). The HSS affecting area was 1.36 × 10 6 hm 2 , accounting for 28.91% of the study area. The areas of moderate and intensive HSS were 0.24 × 10 6 hm 2 and 0.19 × 10 6 hm 2 ; accounting for 18.07% and 14.14% in the HSS affecting area, respectively. The intensive HSS area was distributed in the middle of the study area, as revealed manifest spatial overlap between the salinity and sodicity occurrences.

Land Cover Classifications
The spatial distribution of land cover in the west Jilin Province is shown in Figure 5. The results indicated that cropland was the largest land cover, with an area of 3.05 × 10 6 hm 2 , accounting for 65.00% of the study area, as followed by grassland (12.93%), woodland (5.03%), water body (4.77%), wetland (4.64%), built-up land (4.34%), and barren land (3.29%). Grassland and barren land were mainly distributed in the middle of the study area, where many small lakes and ponds were available; woodland scattered across the area, with many as shelter belts on the fields. Accordingly, these classification results can be a basis for assessing soil salinity and sodicity in relation to land cover in the study area.

The Distribution of Soil Salinity/Sodicity with Land Cover
In this research, we merely analyzed soil salinity, sodicity, and HSS for woodland, cropland, grassland, and barren land in the study area. Figure 6 illustrates the areas of soil salinity, sodicity, and HSS with regard to land cover. The results indicated that cropland was the largest in area among all of the land cover types, with the areas of soil salinity, sodicity, and HSS to be 0.91 × 10 6 hm 2 , 0.76 × 10 6 hm 2 , and 0.69 × 10 6 hm 2 , respectively. Grassland, barren land, and woodland followed then. However, the slightly salted level resided the prevalent degree in the affected area among cropland, grassland, and woodland, which emerged the same for the slightly HSS level. Further, the barren land was mostly intensively salted and/or sodic (with about 60.00% of the barren land). As for grassland, moderately sodic level was dominant in the sodium-affecting area (32.24%), whereas woodland exhibited the least affected by salinity and sodicity.  Figure 7 presents the areas of soil salinity, sodicity and HSS with regard to soil types. In this research, soil types encompassed eight classes in the west Jilin Province. Chernozem soil and Meadow soil were dominant in this area. The areas of soil salinity, sodicity, and HSS, with regard to soil types, followed as: Chernozem > Meadow soil > Aeolian soil > Solonetz > Chestnut soil > Bog soil > Solonchak > Black soil. The results stipulated that Chernozem was the largest in area among the soil types, in which the areas of soil salinity, sodicity, and HSS were 0.76 × 10 6 hm 2 , 0.71 × 10 6 hm 2 , and 0.65 × 10 6 hm 2 , respectively. Meadow soil and Aeolian soil followed as the second and the third. Solonetz soil was the most seriously affected by salinity, sodicity and HSS, in which the influenced percentages in area were 72.14%, 68.34%, and 66.27%, respectively. Solonchak soil then followed, with an affected percentage of 57.72% in area. In addition, the soils with affected percentages above 35.00% in area included Aeolian soil, Meadow soil, and Bog soil. The soils with affected percentages below 35.00% in area pointed to Chestnut soil, Chernozem, and Black soil (Mollisol). Naturally, the Black soils were least affected among the soils.

Discussion
This research endeavored to map soil salinity and sodicity via building the relationships between Landsat 8 OLI imagery and in-situ measurements of soil EC and pH over the west Jilin of China. We pursued establishing and improving the models for soil salinity and sodicity retrieval using PLSR and mathematical transformation of selected variables. As a result, we corroborated that combining Landsat 8 OLI imagery and improved PLSR algorithm can efficiently optimize the retrieval models of soil salinity (measured with soil EC) and sodicity (measured with soil pH).
In the past decades, soil salinity and/or sodicity retrieval models have been proposed often by relating soil physi-chemical properties and spectral performance with RS data and ground measurements [29,43]. By using three dielectric retrieval algorithms, small perturbation, optical physics, and Dubois models to implement analysis, Bell et al. [14] retrieved an improved estimation of a dielectric constant for soil salinity discrimination (optimized R 2 = 0.649). Fan et al. [16] established a soil salinity retrieval model (R 2 = 0.749) with advanced Multi-Spectral Sensor and PLSR. Janik et al. [24] documented a method by combining partial least square (PLS) and ANN (R 2 = 0.94). Nawar et al. [36] suggested the multivariate adaptive regression splines (MARS) model (R 2 = 0.73) for soil salinity prediction, which performed better than PLSR. Sidike et al. [3] found that PLSR was better than Stepwise Multiple Regressions (SMR) in retrieving soil salinity. However, we found that the spectral reflectance increased with soil EC, which became dramatic when EC exceeded 1 mS/cm; while pH exceeded 8, the spectral reflectance increased dramatically too (Figure 8). This implied nonlinear relationship might exist between soil EC, pH and the spectral data. Therefore, unlike previous research, we improved the PLSR models' accuracy via combining reflectance bands and spectral indices through nonlinear mathematical transformation before modelling. Consequently, we confirmed that mathematical transformation indeed could improve the correlation between spectral reflectivity and surface sampling measurements, which further improved the accuracy of the PLSR models. Here, "e r "and "1/r" were the optimal mathematical transforms for building the inversion models of soil EC and pH, respectively.
Then, we verified that Landsat 8 OLI imagery was appropriate and economic for extracting the information of soil salinity and sodicity over a large area. Generally, the advantages of RS technology involve quick access and saving time, wider coverage, and constant time series provision for long term monitoring [6,16,44]. By contrast, OLI imagery owns broader views, comparable spatial resolution relative to satellite hyperspectral data (e.g., Hyperion [45]), and abundant band/spectral information competent with higher resolution imagery like QuickBird, IKONOS, and GeoEye-1 [46]. Most importantly, Landsat 8 OLI is free to obtain [47].
Further, we witnessed that linking Landsat OLI-derived variables and field measurements of salinity and sodicity indeed made sense for the mapping and assessment in our case. We found that the band Green best correlated with soil EC, and the band Red best correlated with soil pH (Table 2). Yet, Bai et al. [17] found the band Coastal was best correlated with soil EC and pH. This may lie in the imagery acquisition time difference between spring season in their case and summer growing season in ours. Undoubtedly, it should be better to represent soil EC by reading vegetation performance during summer, as can be readily described by the band Green [29]. As for spectral indices, we found the vegetation indices revealed negatively, and the salinity indices were positively correlated with soil EC and pH. Moreover, the vegetation indices were more sensitive to soil pH than to soil EC, NDVI/SAVI was best correlated spectral index, and SI3 was the most sensitive salinity index to soil EC (Table 2). Nonetheless, Shrestha [11] found that NDVI was poorly negatively correlated with soil EC, and NIR was best correlated with soil EC. Finally, our mapping results indicated that about one third of the area was affected by salt and sodium in the west Jilin Province. The salt-affecting area was larger than that by sodicity, however, the severity of soil salinity was less than that of sodicity, because the moderate and intensive sodicity accounted for 45.49% of the sodic area, and the moderate and intensive salinity accounted for 29.03% of the saline area. Bai et al. [17] have ever estimated the soil sodicity and salinity for the northern Songnen Plains. But soil sodicity in our study was much more serious (mean pH = 8.70) than that in Bai's study area (mean pH = 7.92). The causes may lie in less precipitation and higher temperatures in the west Jilin [26]. The soil EC values were lower when compared with other research, which may be due to the regional variations of the study areas [12,15,16,48] and field sampling uncertainties [15,38]. Despite that the soil samples were collected during a desalting period, we classified the levels of soil salinity and sodicity by expert knowledge and characteristics of the study area [14,23]. This ensures that our results can represent the spatial distributions of soil salinity and sodicity in the region well. Cropland (68.51%) was the major contributor to slight salinity, while in the moderate salinity area, grassland was dominant (50.23%). Among all of the land cover types, grassland and barren land were the major contributors to intensive salinity, with percentages 41.48% and 44.31% in area. Cropland and grassland were also dominant contributors to slight and moderate sodicity in the area. However, some differences existed between soil salinity and sodicity, e.g., the contribution rate of cropland to intensive sodicity was twice of that to intensive salinity in area.
However, several aspects should be consolidated in our research. First, we did not sample the soils from built-up land, wetland, and waters for the sake of targeting terrestrial off-town salinity issues. However, the excluded areas were also likely affected by salt problems. Second, human activities played critical roles in affecting the soil salinity and sodicity, e.g., worsening salinity and sodicity of the top soils by bringing salt to soil surface through altering natural water cycles, or allowing for excessive recharge of groundwater and salt accumulation by concentrating [1]. Built-up land is the central of human activities, however, soil salinity and sodicity in built-up land was poorly assessed [49]. Khaledian et al. [50] found that the soil quality of built-up land was poorer than that of other land types. Also, as wetland and waters are important natural resources, their salinity and sodicity will directly affect vegetation and water quality [48,51]. Therefore, to accurately analyze soil degradation by salinity and sodicity, sampling sites should be much representative, better covering all sorts of landscapes and human activities. In future research, soil samples from wetlands and urban areas are needed. Moreover, it is promising to retrospect soil salinity and sodicity during historical stages in accordance with the employed inversion models, and to unveil the areal trending of salt affection through inversion results of soil salinity and sodicity throughout multiple periods.
Land degradation, featured with land salinity mapping in our case, can be interpreted as a decline in land quality and quantity with spatio-temporal variations. However, the soils and RS data used in this study were for only one year. Hence, the use of richer time series of soils and RS data would help to boost an understanding the land degradation processes. We may focus more on the spatio-temporal dynamics of soil sodicity and salinity at diverse scales and the underlying drivers in our future investigations.

Conclusions
We successfully mapped multiple forms of soil salinity and sodicity in the west Jilin Province of China using Landsat 8 OLI imagery and improved PLSR algorithm through the nonlinear transformation to retrieve soil salinity and sodicity. Relationships between field measurements of salinity and sodicity, reflectance bands, and spectral indices were built up; and, spatial distributions and occurring severity of salinity and sodicity were also analyzed. The results showed that the most influential variables were SI3 and the band Green were transformed through "e r ". For the pH regression model, the most influential variables were SAVI and the band Coastal were transformed through "1/r". RMSE of pH reached 0.427 (R 2 = 0.747) and that of EC reached 0.532 mS/cm (R 2 = 0.698).
By statistics from the mapping, about one-third of the study area was influenced by salinity and sodicity. The intensive salinity, sodicity, and HSS lands were mostly distributed in the low-elevation areas near lakes. Soil degradation due to salinity and sodicity was common with regard to land cover in the area. Barren land was the most affected land cover that was subjected to salinity and sodicity. Grassland was the most notable vegetated land cover affected by soil salinity and sodicity, in which exceeding 75.00% in area was salted. By contrast, cropland was the largest affected area among all the land cover, in which the areas that were subjected to salinity, sodicity and HSS were 0.91 × 10 6 hm 2 , 0.76 × 10 6 hm 2 , and 0.69 × 10 6 hm 2 , respectively. With regard to soil types, Chernozem had the largest area subjected to salinity and sodicity, while Solonetz was the most serious type that was affected by salinity. In short, our results may provide realistic guidance for land degradation monitoring, ecological reclamation, salt-soil agriculture management, and comprehensive utilization of natural resources in the west Jilin Province of China.