remote sensing Estimation of Soil Salinization by Machine Learning Algorithms in Different Arid Regions of Northwest China

Abstract: Hyperspectral data have attracted considerable attention in recent years because of their high accuracy in monitoring soil salinization. Most existing research focuses on saline soil in a single area without comparative analysis between regions, so the regional differences in the hyperspectral characteristics of saline soil remain unclear. We therefore chose Golmud in the cold–dry Qaidam Basin (QB–G) and Gaotai–Minghua in the relatively warm–dry Hexi Corridor (HC–GM) as the study areas, used the deep extreme learning machine (DELM) and the sine cosine algorithm–Elman (SCA–Elman) model to predict soil salinity, and then selected the most suitable algorithm for these two regions. A total of 79 (QB–G) and 86 (HC–GM) soil samples were collected and tested to obtain their electrical conductivity (EC) and corresponding hyperspectral reflectance (R). We derived the land surface parameters that affect the soil from Landsat 8 and digital elevation model (DEM) data, selected the variables using the light gradient boosting machine (LightGBM), and built SCA–Elman and DELM models from the hyperspectral reflectance data combined with the land surface parameters. The results revealed the following: (1) The soil hyperspectral reflectance in QB–G was higher than that in HC–GM. The soils of QB–G are mainly of the chloride type, whereas those of HC–GM mainly belong to the sulfate type, which has lower reflectance. (2) The accuracies of some of the SCA–Elman and DELM models in QB–G (the best MAEv, RMSEv, and R²v were 0.09, 0.12, and 0.75, respectively) were higher than those in HC–GM (the best MAEv, RMSEv, and R²v were 0.10, 0.14, and 0.73, respectively), which has flatter terrain and less obvious surface changes. The surface parameters in QB–G had higher correlation coefficients with EC because of the regular altitude change and cold–dry climate. (3) Most of the SCA–Elman results (the mean R²v in HC–GM and QB–G were 0.62 and 0.60, respectively) were better than the DELM results (the mean R²v in HC–GM and QB–G were 0.51 and 0.49, respectively). Therefore, SCA–Elman was more suitable for soil salinity prediction in HC–GM and QB–G. This can provide a reference for soil salinization monitoring and model selection in the future.


Introduction
Soil salinization is an important land degradation problem that plagues the world and seriously threatens food security. From 1986 to 2016, the area of salinized land increased by more than 100 million hectares [1]. On 21 October 2021, the Food and Agriculture Organization of the United Nations (FAO) released a global map of the saline soil distribution [2]. This map shows that the world's saline soil covered an area of 833 million hectares and was mainly distributed in the arid and semi-arid areas of Asia, Africa, and Latin America. In arid climate zones, 20-50% of the irrigated soils on all of the continents have an excessively high salt content. Therefore, it is imperative to monitor soil salinization in the future.
To date, many scholars have carried out research using multispectral, hyperspectral, thermal infrared, and microwave techniques to study soil salinization [3][4][5][6]. Among them, hyperspectral remote sensing data have the unique advantage of a continuous band and rich information, but the correlation between the different bands is relatively strong. Accordingly, characteristic bands need to be selected to solve the collinearity problem. The existing variable selection methods primarily include the following: (1) filtering (principal component analysis, correlation coefficient, gray correlation, and chi-square test); (2) embedding (variable selection method based on penalty items or tree models); and (3) encapsulation (generally combined with the multi-objective grey wolf optimizer (MOGWO) [7], whale optimization algorithm (WOA) [8], sine cosine algorithm (SCA) [9], or other heuristic algorithms). Heuristic algorithms can obtain a better solution in a short time and have been widely used in model prediction [10][11][12]. The filtering method is based on statistical principles for variable selection, and has high computational efficiency but low selection accuracy, and is prone to data redundancy. Embedding and encapsulation methods can effectively improve the selection accuracy and reduce the data dimensions [13].
Artificial intelligence methods are widely used in soil salinization research. The concept of artificial intelligence was introduced in 1956. At the end of the 20th century, nonlinear machine learning algorithms developed rapidly and were gradually adopted by many scholars to construct salinity prediction models, with good results [14,15]. By 2012, deep learning algorithms had become widely used. Deep learning is a further development of machine learning and can be used to mine data and obtain multiple levels of information. Such algorithms have strong generalization ability and robustness [16]. Zhang et al. [17] used the Cubist, partial least squares regression (PLSR), and extreme learning machine (ELM) models to establish soil salinity models, and showed that the Cubist model had the highest accuracy. Wang et al. [18] used a back propagation neural network (BPNN) to estimate the soil salt content in the Aibi Lake Wetland Nature Reserve and achieved an R² of 0.95. In the SCORPAN equation, the soil is related to a set of auxiliary data, such as the climate, living organisms, relief, parent material, time, and location [19]. Many studies have predicted soil salinity based on these auxiliary data [20,21]. The soil ecosystem is highly complex, and there may be a nonlinear relationship between soil salinity and spectral reflectance. DELM improves on the traditional ELM algorithm by increasing the number of hidden layers, which allows it to process high-dimensional data more efficiently, and by introducing regularization coefficients, which help prevent the model from over-fitting [22]. Ouyang et al. [23] confirmed that their deep ELM model performed better than the other regression models evaluated for analyzing NOx. Many studies have used the SCA to optimize the weights and thresholds of a neural network to improve its performance [24].
However, these algorithms (e.g., DELM, LightGBM, and SCA) have rarely been used in the salinization research area. Therefore, LightGBM, DELM, and SCA-Elman machine learning algorithms were selected for salinity inversion in this study. EC is a parameter that many researchers use to study soil salinity at the local scale [25,26]. The geographic environments of the Qaidam Basin and the Hexi Corridor are quite different, especially regarding their climates and altitudes. However, although the degree of salinization in these two places is serious, related research is lacking; thus, in this study these environments were chosen as the study areas, and comparative analysis was conducted. We used the LightGBM to select the modeling variables, and then predicted the EC of the different study areas based on hyperspectral data combined with the surface parameters of the SCORPAN equation and conducted a comparative analysis.

Study Area
Gaotai County and Minghua Township in Yugur Autonomous County of Sunan are located in the Hexi Corridor (HC-GM) of China (Figure 1). The southern part of HC-GM in the Hexi Corridor contains the Qilian Mountains, and the northern part contains the Heli Mountains. The Heihe River flows through the central flat region of HC-GM. HC-GM has a continental arid climate with little precipitation, and the average annual temperature and precipitation are 8.1 °C and 112.3 mm, respectively. In HC-GM, 23, 2, 3, 5, 3, and 50 samples were collected from arable land, woodland, grassland, waters (e.g., beaches in this study), construction land, and unused land (e.g., sandy land, Gobi, saline-alkali land, marshland, bare land, and other lands), respectively. Among the samples from HC-GM, 21, 23, and 23 samples were categorized as meadow saline soils, desert sandy soils, and gray irrigated desert soils, respectively, and the remaining samples belonged to other soils. The soil type map used the Chinese soil genetic classification system. The normalized difference vegetation index (NDVI) was calculated from Landsat 8.
The Qaidam Basin is located in the northwestern part of Qinghai Province and the northeastern part of the Qinghai-Tibetan Plateau (Figure 1). It is one of the four major basins in China. The southern part of Golmud in the Qaidam Basin (QB-G) contains the Kunlun Mountains, and the northern part contains the Chaerhan Salt Lake. The landform types in QB-G from south to north include a plateau, a proluvial plain, an alluvial-proluvial plain, an alluvial plain, a salt lake sedimentary plain, and a denudation plain. QB-G has a plateau continental climate, with an average temperature of −6.5 °C in winter and 17.5 °C in summer. This area has a high altitude and a cold climate. In QB-G, 17, 11, 28, and 23 samples were collected from arable land, woodland, grassland, and unused land, respectively. The soil types of 35, 12, 10, and 15 samples were classified as meadow saline soils, desert sandy soils, gray-brown desert soils, and dark yellow-brown soil, respectively, and the remaining samples belonged to other types.

Electrical Conductivity and Eight Major Saline Ions Data
In this study, 79 and 86 samples were collected from QB-G and HC-GM, respectively, based on the accessibility of the areas from September to October 2020 when there was no precipitation to wash the salt away and the evaporation was strong; thus, the salt was exposed at the surface as the water evaporated. We collected samples at a depth of 10 cm below the surface and three replicates at each sample point were collected. The collected soil samples were dried in a ventilated laboratory and then sieved through a 1 mm sieve to test the soil EC and hyperspectral reflectance in the laboratory.
The soil solution was prepared according to a water-soil ratio of 5:1. It was stirred well and left to stand for 1 h. The EC of the filtrate was measured after extraction. In addition, we also measured the eight major saline ions in the soil (Cl⁻, HCO₃⁻, CO₃²⁻, SO₄²⁻, K⁺, Na⁺, Mg²⁺, and Ca²⁺).

Hyperspectral Reflectance Data
We used an ASD FieldSpec 4 spectrometer (Analytical Spectral Devices, Boulder, Colorado, USA) with a spectral range of 350-2500 nm to measure the hyperspectral reflectance of soil samples, with a resampling interval of 1 nm. We placed a 70 W halogen lamp that simulates sunlight in a darkroom in the laboratory. A dark-colored vessel with a diameter of 20 cm and a depth of 2 cm was used to hold the soil samples. The distances between the light source and the probe, and between the probe and the soil, were 50 and 10 cm, respectively, and the zenith angle was 15°. Before each measurement, we used a whiteboard for calibration, and then we collected 20 spectral curves from four directions. Finally, we took the average value as the original hyperspectral reflectance of the sample.

Landsat 8 Remote Sensing Data
Due to the availability of Landsat data for the study areas, eight Landsat 8 OLI scenes acquired from 30 August to 7 October 2020 were selected and mosaicked into a single image. The cloud cover was less than 10%, and the spatial resolution was 30 m.

Soil and Terrain Influence Factors
Soil is a complex system. The reflectance of soil is also affected by the parent material, climate, vegetation, topography, water, and salt content [27]. The higher the degree of salinization, the lower the vegetation coverage [28]. The smaller the particle size of the soil, the weaker the absorption of the spectrum [29]. In this study, salinity indexes, vegetation indexes, water indexes, terrain attributes, and drought indexes were included as modeling parameters (Tables 1-4). The terrain attributes were calculated based on the DEM using the System for Automated Geoscientific Analyses-Geographic Information System (SAGA-GIS), and the other parameters were calculated using the Landsat 8 data in ENVI 5.3. Due to the complexity of the UNVI formula, we calculated it in PyCharm Community Edition 2021.2.1.

Table 2. Vegetation indexes required for the model.
Land Surface Parameters | Abbreviation | Formula | References
 |  | L is the background adjustment parameter and C1 and C2 are the atmospheric correction parameters | 
Universal normalized vegetation index | UNVI | where i is the band number, R(i) is the spectrum under the i band of the ground object, Pw, Pv, Ps, and P4 respectively represent the normalized reflectance values of the four reference samples, and Cw, Cv, Cs, and C4 represent the UPDM coefficients corresponding to each sample | [40]
 |  | γ is the correction coefficient of atmospheric radiation | [41]
Difference vegetation index | DVI | B5 − B4 | [42]
Green vegetation index |  |  | 
Optimized soil adjusted vegetation index | OSAVI * | (B5 − B4)/(B5 + B4 + θ), θ is the soil regulation parameter that has nothing to do with vegetation coverage conditions | [44]
Renormalized difference vegetation index | RDVI * | (NDVI × DVI)^0.5 | [45]
Soil adjusted vegetation index | SAVI * | (1 + L)(B5 − B4)/(B5 + B4 + L), L is the soil brightness index | [46]
Transformed difference vegetation index |  |  | 
Canopy response salinity index | CRSI |  | [48]
*: NDVI, EVI, ARVI, GVI, OSAVI, RDVI, SAVI, and TDVI were calculated by the Spectral Tool with built-in formulations in ENVI 5.3.1.

Table 3. Water indexes, drought indexes and remote sensing data required for the model.
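The formulas that survive in Table 2 (DVI, OSAVI, RDVI, and SAVI) can be illustrated with a minimal sketch. It assumes Landsat 8 surface reflectance values for band 4 (red) and band 5 (near infrared); the function name `vegetation_indexes` and the default values θ = 0.16 and L = 0.5 are assumptions chosen for illustration, not values stated in this paper, and NDVI is taken in its usual form (B5 − B4)/(B5 + B4).

```python
import math


def vegetation_indexes(b4, b5, theta=0.16, soil_l=0.5):
    """Compute a few Table 2 indexes from red (b4) and NIR (b5) reflectance.

    theta and soil_l are illustrative defaults (assumptions), not values
    reported in the paper.
    """
    dvi = b5 - b4                                   # difference vegetation index
    ndvi = (b5 - b4) / (b5 + b4)                    # normalized difference vegetation index
    osavi = (b5 - b4) / (b5 + b4 + theta)           # optimized soil adjusted vegetation index
    rdvi = math.sqrt(ndvi * dvi)                    # renormalized difference vegetation index
    savi = (1 + soil_l) * (b5 - b4) / (b5 + b4 + soil_l)  # soil adjusted vegetation index
    return {"DVI": dvi, "NDVI": ndvi, "OSAVI": osavi, "RDVI": rdvi, "SAVI": savi}


# Hypothetical pixel: red reflectance 0.10, near-infrared reflectance 0.30.
indexes = vegetation_indexes(0.10, 0.30)
for name, value in indexes.items():
    print(f"{name}: {value:.4f}")
```

Because NDVI and DVI always share the same sign when both reflectances are positive, the square root in RDVI is well defined for valid reflectance pairs.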

Methods
The method used in this study included three main steps (Figure 2). The first step was to select the forms (mathematical form, order, and resampling interval) of the hyperspectral reflectance that corresponded to the top three absolute values of the Pearson correlation coefficient. The hyperspectral reflectances of the selected forms were combined with land surface parameters as the data source of the model. The second step was to calculate the land surface parameters that affect the soil based on Landsat 8 and DEM data. The third step was to select the variables using LightGBM and to build the DELM and SCA-Elman models for HC-GM, QB-G, and the two areas combined (HCQB-GMG).

Hyperspectral Reflectance Data Processing
The hyperspectral data cover the 350-2500 nm range with good band continuity. In this study, the 350-399 nm and 2401-2500 nm intervals, which have low signal-to-noise ratios, were first deleted, and then Savitzky-Golay filtering of the spectral data was conducted. The Savitzky-Golay smoothing filter is a low-pass filter based on polynomial fitting [53]. The Savitzky-Golay filtering formula is shown in Equation (1).
where the left-hand side of Equation (1) is the fitted value after smoothing and noise reduction at point i of the spectrum, ρ i is the original reflectance at point i, C i is the weight coefficient, 2m + 1 is the width of the filter window, and k is the order of the smoothing polynomial. The number of points on each of the left and right sides was 10 and the polynomial order was 2 in the Unscrambler X 10.4. The 1/lgR, lgR, √R, and 1/R mathematical transformations were then performed to eliminate noise interference. Next, we used the fractional differential formula to perform differential processing of orders 0-2 (at intervals of 0.2) and applied a resampling method with intervals of 10, 20, 30, 40, and 50 nm. The Grünwald-Letnikov fractional differential formula is shown in Equation (2).
where α is the order, h is the step size, t and a are the upper and lower limits of the differential, and Γ denotes the Gamma function [54,55]. These processes were implemented in MATLAB R2018b.
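The preprocessing chain described in this subsection (Savitzky-Golay smoothing, mathematical transformation, and Grünwald-Letnikov fractional differentiation) can be sketched as follows. This is an illustrative Python sketch, not the Unscrambler or MATLAB workflow used in the study. It assumes the closed-form weights for a quadratic fit on a symmetric window, leaves the first and last m points unsmoothed, and uses the common recursive form of the Grünwald-Letnikov coefficients with step size h = 1 (one band); all function names are hypothetical.

```python
import math


def savgol_quadratic_weights(m):
    """Closed-form Savitzky-Golay smoothing weights (order 2, window 2m + 1)."""
    denom = (2 * m + 3) * (2 * m + 1) * (2 * m - 1)
    return [3.0 * (3 * m * m + 3 * m - 1 - 5 * j * j) / denom
            for j in range(-m, m + 1)]


def savgol_smooth(values, m=10):
    """Smooth interior points; the m points at each edge are left unchanged."""
    weights = savgol_quadratic_weights(m)
    out = list(values)
    for i in range(m, len(values) - m):
        out[i] = sum(w * values[i + j] for w, j in zip(weights, range(-m, m + 1)))
    return out


def transform(values, form):
    """Apply one of the mathematical forms; reflectance must lie in (0, 1)."""
    funcs = {
        "R": lambda r: r,
        "1/R": lambda r: 1.0 / r,
        "lgR": math.log10,
        "1/lgR": lambda r: 1.0 / math.log10(r),
        "sqrtR": math.sqrt,
    }
    return [funcs[form](r) for r in values]


def gl_derivative(signal, alpha, h=1.0):
    """Grünwald-Letnikov derivative of order alpha using all earlier points."""
    weights = [1.0]
    for k in range(1, len(signal)):
        # Recursive form of (-1)^k * binomial(alpha, k).
        weights.append(weights[-1] * (k - 1 - alpha) / k)
    scale = h ** (-alpha)
    return [scale * sum(weights[k] * signal[i - k] for k in range(i + 1))
            for i in range(len(signal))]


# Hypothetical spectrum: 60 bands with smoothly increasing reflectance.
reflectance = [0.20 + 0.0005 * i for i in range(60)]
smoothed = savgol_smooth(reflectance, m=10)
derivative = gl_derivative(transform(smoothed, "1/lgR"), alpha=0.8)
print(len(derivative))
```

With α = 0 the derivative returns the input unchanged, and with α = 1 it reduces to the first difference, which is a convenient sanity check on the coefficient recursion.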

Variable Selection and Inversion of EC
In this study, the Kennard-Stone algorithm was used to divide the soil samples into calibration and validation datasets.
(1) LightGBM The LightGBM is an ensemble learning algorithm. An ensemble learning algorithm integrates the prediction results of multiple base learners to improve the generalization ability and robustness of the base learners [56]. The existing ensemble learning methods include serialization methods (e.g., boosting, of which LightGBM is an example) and parallelization methods (e.g., bagging and random forest). The LightGBM measures the importance of variable i in a single tree by calculating the reduction in the loss value after splitting on variable i. Variables with importance values greater than 0 were used in the model in this study. The LightGBM was implemented using the machine learning library (Sklearn) in Python 3.7.
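The Kennard-Stone split mentioned at the start of this subsection can be sketched in a few lines. This is a minimal illustration that assumes Euclidean distance between feature vectors; the calibration set size is left as a parameter because the split ratio is not stated here, and the function name `kennard_stone` is hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kennard_stone(samples, n_select):
    """Return (selected, remaining) sample indexes using Kennard-Stone.

    The two most distant samples are picked first; each later pick is the
    sample whose nearest already-selected neighbour is farthest away.
    """
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    dist = [[euclidean(a, b) for b in samples] for a in samples]
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    first, second = max(pairs, key=lambda p: dist[p[0]][p[1]])
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_select:
        nxt = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining


# Hypothetical one-dimensional features for four samples.
calibration, validation = kennard_stone([[0.0], [1.0], [2.0], [10.0]], n_select=3)
print(calibration, validation)
```

The selected indexes would form the calibration dataset and the remaining indexes the validation dataset, so the calibration set spans the feature space as evenly as possible.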
(2) DELM The extreme learning machine (ELM) was developed in 2004 [57]. Wang et al. [58] provided the code for the ELM in their book. The DELM uses multiple extreme learning machine-auto encoders (ELM-AE) to perform unsupervised training ( Figure 3). The DELM is initialized based on the output weights of the different ELMs. Therefore, the DELM is also called the multi-layer extreme learning machine (ML-ELM). The deep ELM method both significantly improves the network training time relative to a single ELM and can improve the classification accuracy [59]. Other studies have shown that DELM's performance is significantly better than that of principal components regression, partial least squares regression, and neural networks [23]. The activation function in this study was the sigmoid function, and the regularization coefficient was introduced in the solution of the weight coefficient to improve the generalization ability of the model. In this study, the DELM model was built in MATLAB R2018b.
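The stacking idea behind the DELM can be illustrated with a short NumPy sketch. It is a simplified stand-in for the MATLAB model used in this study: each ELM auto-encoder draws random input weights, solves a regularized least-squares problem to reconstruct its own input, and passes its transposed output weights on as the next layer's mapping. The hidden layer sizes, the regularization coefficient `c`, and all function names are assumptions for illustration only.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def ridge_solve(h, target, c):
    """Regularized least squares: (H^T H + I / c)^-1 H^T T."""
    n_hidden = h.shape[1]
    return np.linalg.solve(h.T @ h + np.eye(n_hidden) / c, h.T @ target)


def fit_delm(x, y, hidden_sizes=(20, 10), c=1e3, seed=0):
    """Train stacked ELM auto-encoders followed by a ridge output layer."""
    rng = np.random.default_rng(seed)
    layers = []
    h = x
    for n_hidden in hidden_sizes:
        w = rng.standard_normal((h.shape[1], n_hidden))  # random input weights
        b = rng.standard_normal(n_hidden)                # random biases
        encoded = sigmoid(h @ w + b)
        beta = ridge_solve(encoded, h, c)   # auto-encoder weights reconstructing h
        layers.append(beta)
        h = sigmoid(h @ beta.T)             # features passed to the next layer
    out_w = ridge_solve(h, y.reshape(-1, 1), c)
    return layers, out_w


def predict_delm(model, x):
    layers, out_w = model
    h = x
    for beta in layers:
        h = sigmoid(h @ beta.T)
    return (h @ out_w).ravel()


# Hypothetical data: 30 samples with 8 predictor variables.
rng = np.random.default_rng(1)
x_demo = rng.random((30, 8))
y_demo = 2.0 * x_demo[:, 0] + x_demo[:, 1]
model = fit_delm(x_demo, y_demo)
print(predict_delm(model, x_demo)[:3])
```

Only the regularized linear systems are solved; no iterative back propagation is involved, which is why this family of models trains quickly.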
(3) SCA-Elman The Elman model was proposed by J. L. Elman in 1990. It mainly consists of an input layer, a hidden layer, an inheritance layer, and an output layer. Compared with the BP neural network, it has one additional inheritance layer, which helps improve the global stability of the network. Wang et al. [58] provided the code for the Elman model in their book. The SCA is a stochastic optimization algorithm proposed by Seyedali Mirjalili in 2016, who provided a download link for the code (http://www.alimirjalili.com/SCA.html, accessed on 1 September 2021) in his paper [9]. The SCA uses a mathematical model based on sine and cosine functions to search for the optimal solution and can effectively converge toward the global optimum. Liu et al. [60] optimized the weights and thresholds of a BPNN through particle swarm optimization (PSO), which effectively prevented the training from falling into a local optimum. In the learning process of Pi-Sigma artificial neural networks (PS-ANNs), a position in the SCA consists of the weight and bias values of the PS-ANN, and using the SCA to train PS-ANNs produces better results than many other artificial intelligence optimization algorithms [24]. Similarly, SCA-Elman uses the SCA to optimize the weights and threshold parameters of the Elman model. In this study, SCA-Elman was constructed in MATLAB R2018b.
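The sine cosine position update can be sketched on a toy objective. This is a simplified illustration rather than the published MATLAB code: it assumes the commonly used update in which each coordinate moves by r1·sin(r2)·|r3·P − X| or r1·cos(r2)·|r3·P − X| around the best position P found so far, with r1 decreasing linearly from a to 0. In an SCA-Elman setting the objective would be the training error of the Elman network as a function of its weights and thresholds; here a simple sum of squares stands in for it, and the function name `sca_minimize` is hypothetical.

```python
import math
import random


def sca_minimize(objective, dim, lower, upper, n_agents=20, n_iter=100,
                 a=2.0, seed=0):
    """Minimize objective with a basic sine cosine algorithm."""
    rng = random.Random(seed)
    agents = [[rng.uniform(lower, upper) for _ in range(dim)]
              for _ in range(n_agents)]
    best_pos = list(min(agents, key=objective))
    best_val = objective(best_pos)
    history = [best_val]
    for t in range(n_iter):
        r1 = a - t * a / n_iter          # shrinks the step size over time
        for agent in agents:
            for d in range(dim):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                r4 = rng.random()
                step = abs(r3 * best_pos[d] - agent[d])
                trig = math.sin(r2) if r4 < 0.5 else math.cos(r2)
                # Keep the new coordinate inside the search bounds.
                agent[d] = min(upper, max(lower, agent[d] + r1 * trig * step))
        for agent in agents:
            value = objective(agent)
            if value < best_val:
                best_val, best_pos = value, list(agent)
        history.append(best_val)
    return best_pos, best_val, history


def sphere(x):
    """Toy objective standing in for a network's training error."""
    return sum(v * v for v in x)


best_pos, best_val, history = sca_minimize(sphere, dim=3, lower=-5.0, upper=5.0)
print(best_val)
```

Because the best position is only replaced when a strictly better one is found, the recorded best value never increases from one iteration to the next.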

Model Verification
The RMSE, coefficient of determination (R²), and mean absolute error (MAE) were used to verify the model's ability to predict the EC. The larger the R², the better the model fits the data. The smaller the RMSE, the higher the accuracy of the model. The range of MAE is [0, +∞). An MAE of 0 means that the predicted values are identical to the true values. As the error increases, the MAE also increases [61].
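The three verification metrics can be written out directly. The sketch below assumes R² is defined as one minus the ratio of the residual sum of squares to the total sum of squares; the paper may instead use the squared correlation coefficient, so this definition is an assumption, as are the function names and the example values.

```python
import math


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted))
                     / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted EC values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(mae(observed, predicted), rmse(observed, predicted),
      r_squared(observed, predicted))
```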

Descriptive Statistics of the EC and Chemistry Types of the Soil Samples
The mean, SD, and CV of the values corresponding to EC for all of the samples in HC-GM, QB-G, and HCQB-GMG were between those of the calibration dataset and the validation dataset (Table 5), indicating that the quality of the selected samples was good. The CVs of the EC for the samples from HC-GM, QB-G, and HCQB-GMG were all greater than 1, indicating a large degree of variation. The CV for HC-GM was greater than that for QB-G.
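The coefficient of variation used above is the standard deviation divided by the mean, so a CV greater than 1 means the spread of the EC values exceeds their average. A minimal sketch, assuming the sample standard deviation is used (the paper does not say which form was applied) and using made-up EC values:

```python
import statistics


def coefficient_of_variation(values):
    """Return SD / mean, using the sample standard deviation (an assumption)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean


# Hypothetical EC values dominated by one strongly saline sample.
print(coefficient_of_variation([0.5, 1.0, 1.5, 40.0]))
```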

Hyperspectral Reflectance Curve of Soil Samples
Based on the soil salinity classification standard and the actual situation in the study area, we classified the soil samples as non-saline soil, very slightly saline soil, slightly saline soil, moderately saline soil, and severely saline soil [62] (Figure 4). The hyperspectral curves for HC-GM and QB-G increased rapidly at 400-600 nm, remained stable at 600-1800 nm with wide and shallow troughs, and gradually decreased at 2000-2400 nm (Figure 4). This revealed that the hyperspectral reflectance curves for HC-GM and QB-G have obvious wave-type characteristics [63]. The hyperspectral reflectance of QB-G's severely saline soil and moderately saline soil was significantly higher than that of the other soils, and the hyperspectral reflectance values of the HC-GM samples increased with their EC. Hygroscopic water refers to the water still contained in the soil after the fresh soil has been dried for 1 week and stabilized under ventilated conditions. The higher the salt content, the greater the moisture content [64]. The soil samples in this study were naturally air dried indoors, so the reflectance of the severely saline soil from QB-G was not the highest. However, this phenomenon was not observed for the HC-GM samples, which may be due to their lower salt contents. The shape of the soil reflectance curves was affected by the strong water absorption bands at 1450 and 1950 nm, and occasionally by weaker water absorption bands at 1200 and 1770 nm, in HC-GM and QB-G [65]. An absorption band at 2200 nm was influenced by the vibrational mode of the hydroxyl ion in HC-GM and QB-G [66]. Hydroxyl ion absorption also occurs at 1450 nm, as does water absorption. The weak absorption bands at 1200 and 1770 nm correspond to the absorption bands observed in the transmission spectra of relatively thick water films [67]. The bands at 1450 and 1950 nm were sharp in HC-GM, indicating that the water molecules were located in well-defined, ordered sites; in contrast, the relatively broader bands at 1450 and 1950 nm in QB-G indicated that the water molecules were in relatively unordered sites, probably as water films on soil particle surfaces [66]. The 2200 nm hydroxyl absorption band could be seen in the samples from HC-GM and QB-G. The organic-affected form exhibits a concave shape from 500 to 750 nm and a convex shape from 750 to 1300 nm [68], which showed that the spectral curves in HC-GM and QB-G belonged to the organic-affected form. Karmanov [69] found that the reflection intensity of iron hydroxides containing little water and having a dark brown-red color increased most strongly in the wave interval from 550 to 600 nm, which was the same as for the curves in HC-GM and QB-G. The curves in the study areas were distinguished by an iron absorption band at around 870 nm [66]. Based on the soil classification scheme [70], 20%, 13%, 31%, and 36% of all samples from HC-GM and 44%, 18%, 30%, and 8% of all samples from QB-G were classified as chloride-type, sulfate-chloride-type, chloride-sulfate-type, and sulfate-type soils, respectively (Figure 4). Thus, the samples from HC-GM and QB-G were mainly of the sulfate type and the chloride type, respectively. The soil salts in the central and southern parts of the Qaidam Basin are mainly NaCl and KCl [71], whereas those in the Hexi Corridor are dominated by sulfate and chloride-sulfate [72], which was consistent with the contents of the eight major saline ions measured in this study. Moreover, NaCl and KCl have weak absorption from the visible to thermal infrared bands, and the average reflectivity of NaCl is higher than that of Na₂SO₄ [64,73,74]. Therefore, the reflectance of the soil in QB-G was generally higher than that in HC-GM.

Correlation between EC and Different Forms of Hyperspectral Reflectance Data
In this study, the correlation coefficients between the EC and each band (400-2400 nm) of the five forms (R, 1/lgR, 1/R, lgR, and √R) of hyperspectral reflectance at orders 0-2 and at different resampling intervals were calculated. The absolute value of the correlation coefficient was noticeably higher when the resampling interval was 1 nm than at the other intervals. The correlation coefficients of 1/lgR and R were higher than those of 1/R, lgR, and √R for HC-GM, QB-G, and HCQB-GMG (Figure 5). Across the five forms and all study areas, the orders corresponding to the top three absolute values of the correlation coefficients were 0.8, 1.2, 1.4, and 1.6. Based on this, we selected 1/lgR and R of orders 0.8, 1.2, 1.4, and 1.6 for the modeling, with a band interval of 1 nm.
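The band-wise screening described above amounts to computing a Pearson correlation coefficient between the EC and each band and ranking the bands by its absolute value. The following sketch illustrates that step on a tiny made-up data set; the function names and the choice to treat a constant band as having zero correlation are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_bands(spectra, ec, top_k=3):
    """Return per-band correlations and the indexes of the top_k bands by |r|."""
    n_bands = len(spectra[0])
    r_values = [pearson([row[b] for row in spectra], ec) for b in range(n_bands)]
    order = sorted(range(n_bands), key=lambda b: abs(r_values[b]), reverse=True)
    return r_values, order[:top_k]


# Hypothetical data: four samples (rows) by three bands (columns).
spectra = [[1.0, 5.0, 2.0], [2.0, 3.0, 2.0], [3.0, 4.0, 2.0], [4.0, 1.0, 2.0]]
ec = [10.0, 20.0, 30.0, 40.0]
r_values, top_bands = rank_bands(spectra, ec, top_k=2)
print(r_values, top_bands)
```

Repeating this ranking for every combination of mathematical form, fractional order, and resampling interval would reproduce the kind of comparison summarized in this subsection.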

Modeling Results of DELM and SCA-Elman
The MAEv, RMSEv, and R²v of the best SCA-Elman model in HC-GM were 0.10, 0.14, and 0.73, respectively, which were better than those of DELM for HC-GM. The model with the highest accuracy for QB-G was SCA-Elman, whose MAEv, RMSEv, and R²v were 0.09, 0.12, and 0.75, respectively. The best model across HC-GM, QB-G, and HCQB-GMG was the DELM model for HCQB-GMG (MAEv = 0.08, RMSEv = 0.11, R²v = 0.77), because its multiple hidden layers were more suitable for handling the larger sample, and its accuracy was slightly higher than that of SCA-Elman (Figure 6). The accuracy of SCA-Elman for HC-GM and QB-G was generally higher than that of DELM, whereas the accuracy of DELM for HCQB-GMG was slightly higher than that of SCA-Elman. Therefore, SCA-Elman is more suitable for salinity prediction in the study areas.

Modeling Results for Different Data Forms and Different Regions
For HC-GM, the order with the highest accuracy of DELM was 1.4 for R, whereas that of SCA-Elman was 1.4 for 1/lgR ( Figure 6). For QB-G, the order with the highest accuracy of DELM and SCA-Elman was 0.8 for R. For HCQB-GMG, the order with the highest accuracy of DELM was 1.6 for 1/lgR, whereas that of SCA-Elman was 0.8 for 1/lgR. Overall, the accuracies for the 1/lgR with different fractional orders were slightly higher than those of R, and its over-fitting and under-fitting phenomena appeared less frequently than those of R. The accuracies of most of the models with data sources on the order of 0.8 were basically better than those with orders of 1.2, 1.4, and 1.6, indicating that the fractional differential transformation and mathematical transformation improved the accuracy of the model.
By counting the average values of the accuracy indicators of the different models for HC-GM, QB-G, and HCQB-GMG, we found that the accuracy of all of the SCA-Elman and DELM models for HC-GM and QB-G was similar (Table 6). In HC-GM and QB-G, the accuracy of SCA-Elman was slightly higher than DELM, which indicated that SCA-Elman was more suitable for salinity prediction in these two study areas.

Correlation between Different Surface Parameters and EC
All vegetation indexes in QB-G passed the significance test, whereas in HC-GM only GDVI passed the significance test (Figure 7). GDVI has higher sensitivity than NDVI, SAVI, EVI, and SARVI under low vegetation coverage [37]. Most of the salinity indexes in HC-GM passed the significance test, and their number was slightly greater than that in QB-G. The MPDI and NDWI of QB-G and HC-GM passed the significance test. Among all the terrain attributes in HC-GM, only flow-line curvature passed the significance test, whereas many terrain attributes in QB-G showed a high correlation with EC. The other remote sensing parameters that passed the significance test in QB-G were the wetness of the Kauth-Thomas (K-T) transformation, PC3-6, and BI. The brightness of the K-T transformation, PC1, texture (homogeneity, contrast, dissimilarity, entropy, second moment, correlation), and BI passed the significance test in HC-GM.

Analysis of Correlation between Different Surface Parameters and EC
The correlations between the vegetation indexes and the EC in the two regions were relatively high, but they were higher in QB-G than in HC-GM. Generally, the higher the degree of salinization, the lower the vegetation coverage [75]. The EC of most of the samples from QB-G was higher, and was coupled with the cold-dry climate of this area. This had a harmful impact on the vegetation, which mainly comprises salt-sensitive plants, and enhanced the sensitivity of the vegetation to salinity. The Heihe River Basin in the Hexi Corridor is becoming warmer and wetter [76,77], which enhances the salt tolerance of vegetation, especially halophytes. NDVI proved to be an ambiguous indicator of soil salinization because it is also related to biomass, leaf area, plant cover, and nitrogen and chlorophyll content, and its sensitivity differs among species [78]. Zhang et al. found that most vegetation indexes have a weak relationship with soil salinity (mean R 2 = 0.28) [79]. In the study of Wang et al., NDVI, RVI, GDVI, SAVI, and EVI failed the significance test (p < 0.05) with the soil EC [21]. In addition, most of the samples (50 of 86 samples) on HC-GM were distributed on unused land with low vegetation coverage, which was proved by the NDVI map of Figure 1, and most vegetation indexes are not applicable to this land use type. However, there were fewer samples (23 of 79 samples) of unused land in QB-G.
The climates of HC-GM and QB-G are dry. Evaporation is relatively strong in high salinity regions, and the surface is relatively dry [80]. PDI is more suitable for bare soil. Ghulam et al. introduced the vegetation coverage factor into the PDI and proposed the MPDI [52]. In this study, some samples were distributed in grassland and arable land with vegetation. HC-GM is located in an oasis in the middle reaches of the Heihe River, so its terrain is flat. The altitude in QB-G gradually decreases from the southern mountains to the salt lake in the center of the Qaidam Basin, and the vegetation cover decreases with decreasing distance from the salt lake. Akramkhanov et al. [81] found that most of the terrain attributes had a low correlation with soil salinity because the study area was very flat, which was similar to HC-GM. However, other research proved that combining satellite data with DEM data to study soil salinity will make the results more effective and improve accuracy [6], when the surface terrain is similar to that of QB-G.
In another study, we found that PC2 and PC3 attained the highest correlation with the soil EC and that there was a statistically significant correlation between the measured soil salinity and the wetness of the K-T transformation [21]. Wang found that the wetness of the K-T transformation and the salinity index were important [82], which is consistent with the results of this study. QB-G and HC-GM had many salinity indexes that passed the 0.05 and 0.01 significance tests, and their number was higher than that of the other indexes. The correlation between the texture and EC was higher in HC-GM than in QB-G. The mean EC of HC-GM (8.06) was far below that of QB-G (30.92). There were obvious salt crusts in QB-G. In the field investigation, we found that the soil of QB-G has a puffy salt frost and structural crust due to irrigation in the cultivated land, and a smooth salt crust in the grassland, which had higher reflectance than the slightly saline and non-saline soil without crust or salt frost [75,83,84]; this indicated that its surface conditions were relatively uniform (Figure 7). The soil type map in Figure 1 shows that most of the samples in HC-GM were evenly distributed among different soil types, whereas many samples (35 of 79) from QB-G were distributed in meadow saline soils. Thus, the texture in QB-G had a weak correlation with EC. Because BI carries the brightness information of the image, BI is positively correlated with EC, and areas with high BI correspond to high EC in the image [85]. The BI values of QB-G and HC-GM were both significantly correlated with EC. According to remote sensing images, the high values of PC are mainly distributed in high-brightness areas [86]. PC3 and EC were highly correlated in QB-G, and PC1 and EC were significantly correlated in HC-GM, because the first three components of the PCA carry a large amount of information. Metternicht and Zinck [75] proved that PCA of remote sensing images was a useful method to distinguish saline soils.
In the Ardakan region of Yazd Province in central Iran, the most important of various auxiliary data for EC prediction were PC1-3 and terrain attributes [87].
The indexes that passed the significance test in HC-GM (PDI, MPDI, VSSI, Int2, SI2, NDSI, BI, GDVI) and QB-G (MPDI, MSAVI, CRSI, NLI, VSSI, S5, NDSI, BI, GDVI, DVI, EVI, GVI, and NDVI) were all calculated with the fourth (red) and fifth (near-infrared) bands of Landsat 8. Other studies have shown that the red and near-infrared bands contain more soil salinity information [88,89]. Over-fitting and under-fitting problems indicate that the generalization ability of a model is weak [90]. The DELM for HCQB-GMG (R²v = 0.77) had the highest accuracy among all of the models for all of the areas. It was even slightly higher than that of SCA-Elman (R²v = 0.74), which shows that the two areas have similarities in terms of dry climate. However, DELM and SCA-Elman had serious over-fitting problems for HC-GM and higher accuracies for QB-G. The changes in the surface environment, especially the altitude and vegetation, are subject to more obvious natural laws in QB-G, whereas HC-GM is located in the flatter oasis of the Heihe River with a single surface environment. Some studies achieved similar results [82,91–93] (Table 7), including those of Peng et al. [91] and Wang et al. [82], whose study areas comprised flat alluvial fans and oases, respectively. Thus, most of the land surface parameters had higher correlation coefficients in QB-G than in HC-GM, which resulted in the lower accuracies of the models for HC-GM. In addition, the lower accuracies may be due to the relatively poorer matching between the sub-samples and the Landsat 8 pixels in HC-GM than in QB-G.
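To illustrate why these indexes depend only on the red and near-infrared bands, the following minimal sketch computes four of them from band reflectance. The formulas are the definitions commonly used in the salinity literature and are an assumption here, since this section does not state the exact expressions used in the study; the function name `red_nir_indices` and the sample values are hypothetical.

```python
import numpy as np

def red_nir_indices(red, nir, eps=1e-12):
    """Compute a few red/NIR indexes from Landsat 8 band 4 (red) and band 5 (NIR).

    The formulas are common literature definitions (assumed, not taken from
    the paper): NDVI, NDSI (salinity), DVI, and BI (brightness).
    """
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    total = red + nir
    # Avoid division by zero for pixels where both bands are (near) zero.
    safe_total = np.where(np.abs(total) < eps, np.nan, total)
    return {
        "NDVI": (nir - red) / safe_total,      # vegetation signal
        "NDSI": (red - nir) / safe_total,      # salinity signal, equals -NDVI
        "DVI": nir - red,                      # difference vegetation index
        "BI": np.sqrt(red ** 2 + nir ** 2),    # brightness index
    }

# Hypothetical surface reflectance values for two pixels.
indices = red_nir_indices(red=[0.2, 0.35], nir=[0.4, 0.30])
```

Under these definitions NDSI is simply the negative of NDVI, which is one reason vegetation and salinity indexes tend to pass or fail a significance test together.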

Advantages of Hyperspectral Data and Fractional Differential Transformation
Hyperspectral data contain abundant spectral information [94], but they are prone to data redundancy, so it is necessary to select characteristic variables. Fractional-order image filtering is better than integer-order filtering because it can significantly enhance image edges while avoiding large noise [95]. The 0.8, 1.2, 1.4, and 1.6 order differential transformations in this study significantly enhanced the correlation between the hyperspectral reflectance and the EC. The models based on these four fractional orders performed better than those based on the original spectral reflectance in HC-GM, QB-G, and HCQB-GMG. Other scholars have reached similar conclusions. Hong et al. [96] reported that the partial least squares support vector machine (PLS-SVM) based on an order of 1.25 achieved the best performance.
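As an illustration of a fractional-order differential transformation of a reflectance spectrum, the sketch below uses the Grünwald-Letnikov discretization, which is a common choice for spectral data. That this particular discretization matches the one used in the study is an assumption, and the function names and the sample spectrum are hypothetical.

```python
import numpy as np

def gl_weights(alpha, n_terms):
    """Grünwald-Letnikov coefficients w_k = (-1)^k * C(alpha, k), via recursion."""
    w = np.empty(n_terms)
    w[0] = 1.0
    for k in range(1, n_terms):
        w[k] = w[k - 1] * (k - 1 - alpha) / k
    return w

def gl_fractional_derivative(reflectance, alpha, step=1.0):
    """Approximate the alpha-order derivative of a 1-D spectrum.

    Each output band i is the weighted sum of band i and all preceding bands,
    scaled by step**alpha (step = band spacing, assumed uniform).
    """
    r = np.asarray(reflectance, dtype=float)
    w = gl_weights(alpha, r.size)
    out = np.empty(r.size)
    for i in range(r.size):
        # r[i::-1] is r[i], r[i-1], ..., r[0], matching w[0], w[1], ..., w[i].
        out[i] = np.dot(w[: i + 1], r[i::-1])
    return out / step ** alpha

# Hypothetical reflectance values at uniformly spaced bands.
spectrum = np.array([0.21, 0.24, 0.28, 0.27, 0.31, 0.36])
transformed = {a: gl_fractional_derivative(spectrum, a) for a in (0.8, 1.2, 1.4, 1.6)}
```

With an order of 0 the transformation returns the original spectrum, and with an order of 1 it reduces to the ordinary first difference, so the fractional orders interpolate between and beyond these integer cases.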

Analysis of the Different Machine Learning Algorithms
At present, many improved algorithms for ELM have been developed [97][98][99]. In this study, the introduction of a regularization coefficient improved the generalization ability of the model [23]. Thus, the simulation accuracy of the DELM for HCQB-GMG was the highest among all of the models for all of the areas. ELM has a single hidden layer, whereas DELM uses a multi-hidden-layer structure to increase its applicability to large samples. The SCA algorithm is characterized by a simple structure, fast convergence, high exploratory power, and the ability to avoid local optima. In this study, the SCA algorithm was used to optimize the weight and threshold parameters of the Elman network. The simulation accuracies of most of the SCA-Elman results for HC-GM and QB-G were better than those of the DELM. Even for HCQB-GMG, the accuracy of SCA-Elman was only slightly lower than that of DELM. Nabiollahi et al. reported similar results when they used three optimization algorithms (i.e., particle swarm optimization (PSO), genetic algorithm (GA), and bat algorithm (BAT)) to compare the hybridized RF with the standard RF, and concluded that the PSO-RF performed best [100].
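The following is a minimal sketch of the standard sine cosine algorithm position update, given only to make the optimization step concrete. It minimizes a toy sphere function as a stand-in for the Elman network training error; the function name `sca_minimize`, the default parameter values, and the objective are assumptions and do not reproduce the configuration used in this study.

```python
import numpy as np

def sca_minimize(objective, dim, bounds, n_agents=20, n_iter=200, a=2.0, seed=0):
    """Minimize `objective` with a basic sine cosine algorithm (SCA).

    Each agent moves relative to the best solution found so far using a
    sine or cosine term chosen at random; r1 shrinks linearly so the search
    shifts from exploration to exploitation.
    """
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(n_agents, dim))
    fitness = np.array([objective(x) for x in X])
    i = int(np.argmin(fitness))
    best, best_fit = X[i].copy(), float(fitness[i])

    for t in range(n_iter):
        r1 = a - t * a / n_iter                        # decreases from a toward 0
        r2 = rng.uniform(0.0, 2.0 * np.pi, size=X.shape)
        r3 = rng.uniform(0.0, 2.0, size=X.shape)
        r4 = rng.uniform(0.0, 1.0, size=X.shape)
        dist = np.abs(r3 * best - X)
        step = np.where(r4 < 0.5, r1 * np.sin(r2) * dist, r1 * np.cos(r2) * dist)
        X = np.clip(X + step, lo, hi)                  # keep agents inside the bounds
        fitness = np.array([objective(x) for x in X])
        i = int(np.argmin(fitness))
        if fitness[i] < best_fit:                      # keep the best solution so far
            best, best_fit = X[i].copy(), float(fitness[i])
    return best, best_fit

def sphere(x):
    """Toy objective standing in for a network's training error."""
    return float(np.sum(x ** 2))

best_params, best_error = sca_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
```

In an SCA-Elman setting, each agent would instead encode the flattened weights and thresholds of the network, and the objective would return the prediction error on the calibration samples.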
Taghadosi et al. extracted features from the radar intensity images and texture analysis of Sentinel-1 data and established soil salinization monitoring models [101]. Zhang et al. extracted the normalized backscatter coefficient, entropy, alpha, anisotropy, and other radar indexes from Sentinel-1 dual-polarized data for salinization inversion [102]. Hoa et al. included two gamma-nought backscattering coefficients and various texture features in their soil salinity mapping [103]. All of these studies achieved good results. Shi et al. conducted a meta-analysis of 57 articles on salinization prediction and found that models using radar data such as Sentinel-1 outperformed those using Landsat, whereas models using hyperspectral satellite data such as HJ-1 and EO-1 Hyperion performed similarly to those using Landsat [104]. Therefore, radar data can be used for soil salinization prediction modeling and regional mapping in the future.

Conclusions
Soil salinization is a serious land degradation problem in arid and semi-arid regions of the world. Hyperspectral data have the advantages of high spectral resolution, continuous bands, and rich information, which benefited the modeling in this research. In this study, QB-G in the Qaidam Basin and HC-GM in the Hexi Corridor were chosen as the study areas, and machine learning methods were used to compare the salinization in these two areas. The 1/lgR mathematical transformation was found to improve the correlation between the hyperspectral reflectance and the EC. In addition, the 0.8, 1.2, 1.4, and 1.6 order differential transformations significantly enhanced the correlation between the hyperspectral reflectance and the EC. The soils of QB-G mainly belonged to the chloride type, whereas sulfate-type soils predominated in HC-GM. The reflectance of the chloride-type soil was higher than that of the sulfate-type soil. The results of most of the SCA-Elman models for HC-GM and QB-G were better than those of DELM, indicating that SCA-Elman was more suitable for monitoring salinity in these areas. The accuracy of the salinization monitoring model for QB-G was higher than that for HC-GM. The topography of the oasis in HC-GM is flatter, with less obvious surface changes. The topography and vegetation in QB-G exhibit regular changes as the altitude decreases from the south to the center of the Qaidam Basin, and its cold-dry climate weakens the tolerance of the vegetation to salt, which results in a higher correlation between the land surface parameters and the EC. However, the DELM for HCQB-GMG had the highest accuracy among all of the models for HC-GM, QB-G, and HCQB-GMG, which shows that HC-GM and QB-G are similar in terms of their dry climates. This study can provide a valuable reference for salinity prediction and regional development.