Inversion of Soil Salinity Using Multisource Remote Sensing Data and Particle Swarm Machine Learning Models in Keriya Oasis, Northwestern China

Soil salinization is a global problem that damages soil ecology and affects agricultural development. Timely monitoring and management of soil salinity are essential to achieving sustainable development goals in arid and semi-arid regions. Polarimetric Synthetic Aperture Radar (PolSAR) data have been shown to be highly sensitive to the soil dielectric constant and soil surface roughness and thus have great potential for detecting soil salinity. However, few studies have combined PALSAR-2 data and Landsat 8 data to invert soil salinity information. The particle swarm optimization (PSO) algorithm is characterized by simple operation, fast computation, and good adaptability, but it has also rarely been applied to soil salinity. Taking the Keriya Oasis as an example, this paper proposes the PSO-SVR and PSO-BPNN models by combining PSO with support vector machine regression (SVR) and back-propagation neural network (BPNN) models. PALSAR-2 data, Landsat 8 data, evapotranspiration data, groundwater burial depth data, and DEM data were then combined to invert soil salinity in the study area. The results showed that introducing PSO yielded a satisfactory estimation performance. The SVR model accuracy (R²) improved by 0.07 (PALSAR-2 data), 0.20 (Landsat 8 data), and 0.19 (PALSAR + Landsat data); the BPNN model accuracy (R²) improved by 0.03 (PALSAR-2 data), 0.24 (Landsat 8 data), and 0.12 (PALSAR + Landsat data). Combined with the model inversion maps, we found that PALSAR + Landsat data combined with the PSO-SVR model achieved better inversion results. The fine texture information of PALSAR-2 data, combined with the rich spectral information of Landsat 8 data, can be used to better invert the soil salinity in the study area. This study complements the research ideas and methods for soil salinization using multi-source remote sensing data and provides scientific support for salinity monitoring in the study area.


Introduction
Soil salinization is one of the main types of land degradation that affects the ecological environment and crop production security worldwide [1,2]. Due to the dry climate, high evaporation, and shallow groundwater in arid and semi-arid areas, large amounts of salt in the soil accumulate at the surface, causing salinization. In addition, mismatched irrigation and drainage systems and unscientific farming methods recharge the groundwater with large amounts of irrigation water, causing secondary salinization and making the already fragile soil ecosystem of arid areas even more unstable [3][4][5]. Excessive salt accumulation can affect the growth and development of crops through osmotic and ionic stress and can also damage the soil structure, eventually reducing the fertility of arable land, turning it into wasteland, and affecting food production [6][7][8][9][10].
In addition, salinization affects species diversity, harms human society, degrades roads, and destabilizes buildings [11]. To date, more than three percent of the world's soil resources have been affected by salinization [12]. China attaches great importance to the problem of soil salinization; the total area of saline soils in China is 3.67 × 10⁷ ha, mainly in the northeastern plains, north-central, northwestern inland, northern China, and coastal areas [13,14]. Xinjiang is the largest province in China in terms of land area, and its saline soil area accounts for 60.6% of the total saline soil area in the country. Because Xinjiang is a region dominated by agriculture and animal husbandry, salinization has seriously threatened its sustainable agricultural development and ecological security [2,8,15]. Therefore, real-time monitoring and scientific management of soil salinization are essential in Xinjiang.
Remote sensing technology has the characteristics of large scale, timeliness, and periodicity, and only a small amount of field sampling data is needed to establish a model relationship between the remote sensing information of soil salinity and actual ground measurement points. It overcomes the disadvantages of traditional methods of measuring soil salinity (time-consuming and laborious, inability to obtain dynamic information, etc.) and is increasingly used by scholars in the study of soil salinization [16][17][18]. Abderrazak El Harti et al. estimated soil conductivity in the Tadla plain of central Morocco using Landsat TM/OLI satellite imagery data for the period 2000-2013 and proposed a new soil salinity index (SI) [19].
Hongnan Jiang et al. used HJ-1A hyperspectral data to select the wavelength 510.975 nm for soil spectral reflectance and vegetation index to estimate the surface soil salinity of the Kuqa Oasis in Xinjiang, which provided a new concept for the application of hyperspectral data [20]. Fahad Khan Khadim et al. used remote sensing to map the soil salinity distribution in Everglades National Park and established salinity tolerance thresholds for various vegetation types in the study area [21]. Zheng Wang et al. combined the advantages of remote sensing and GIS, proposed a method to evaluate the risk of uncontrolled soil salinity based on a comprehensive scoring method, and used it for a salinity risk assessment of the Ebinur Lake watershed in Xinjiang [22]. Numerous scholars have found that saline soils have distinct spectral features in the visible and near-infrared spectra, such that these feature components can be used to analyze and study soil salinity. However, traditional optical data are susceptible to cloud cover, weather, and light; accordingly, there are still limitations in the study of soil salinization.
Microwave remote sensing has the advantage of all-day, all-weather operation with high penetration, which can compensate for the deficiencies of optical images in soil salinity studies [23][24][25][26]. Numerous studies have shown that soil water content and salinity affect the dielectric constant of the soil, and the magnitude of the dielectric constant affects the backscatter coefficient in microwave images [27][28][29]. Mohammad Mahdi Taghadosi et al. used Sentinel-1 SAR images combined with support vector regression (SVR) to predict the soil salinity in Qom County, Iran, and found through a comparison that both VV and VH polarization data of SAR images can identify surface saline soils [30]. A.N. Romanov et al. compared SMOS satellite subsurface brightness temperature data with field survey data from the Kuranda Steppe in Siberia and found a good correlation between them [31]. Leilei Dong et al. introduced saturation as a new parameter for the Dobson model, thus improving the accuracy with which the dielectric constant of saline soils can be estimated in the microwave C-band. The new model was also compared with other models, and it was found that the model with the new parameter could better simulate the imaginary part of the dielectric constant, providing a scientific basis for microwave remote sensing inversion of soil salinity [32]. In summary, the application of microwave data is still at a preliminary stage compared to the application of optical data, and the ability to utilize microwave data in salinity research needs to be explored.
Remote sensing technology provides new methods for the study of salinization, but algorithms and models are needed to further improve the accuracy of the salinization information extraction [2,16]. Differences in models and algorithms, from simple linear regression to complex machine learning, may lead to inconsistent results from the same remote sensing data, and researchers have continuously tried to apply new methods to study soil salinity to improve the prediction accuracy. For example, multiple linear regression [33,34], neural network models [35], support vector machine models [36], partial least squares regression models [37], and Cubist models [38] have shown their potential for soil salinity studies. Other researchers have compared multiple models to select the best regional salinity inversion model [39][40][41][42]. There are many algorithms and models that have not been used in the field of soil salinization research to date, and thus have yet to be explored deeply by researchers.
In this paper, the study of soil salinization in the Keriya Oasis consists of the following: (1) Taking the study area as an example, we introduce particle swarm optimization into the study of salinization; the PSO-SVR, PSO-BPNN, SVR, and BPNN machine learning methods are used to model the inversion of soil salinity in this area, and the models are compared and analyzed to identify the most suitable inversion model for the study area.
(2) We compare the advantages and disadvantages of microwave remote sensing images and optical images for the inversion of soil salinity in the study area by inputting the PALSAR-2 and Landsat 8 data sources into the models individually or in combination. (3) The optimal model and the best data source are used to invert the soil salinity in the study area and analyze its distribution. This study optimizes the model based on multisource data to improve the prediction accuracy of soil salinity and provide a scientific basis for reasonable management of salinization in the study area.

Study Site
Keriya Oasis (36°47′–37°6′ N, 81°8′–81°45′ E) is located in Keriya County, Hotan Region, Xinjiang, bordering the Taklamakan Desert and Shaya County in the north, Tibet in the south, Qira County in the west, and Minfeng County in the east, as shown in Figure 1. Influenced by a continental arid climate and the geomorphological pattern between mountains and basins, a typical oasis-desert ecosystem has formed. The Keriya Oasis is high in the south and low in the north, with an altitude of 1304-1639 m. It is rich in heat and light, with an average annual temperature of 12.4 °C, a ≥10 °C accumulated temperature of 4340 °C, a total annual radiation of 6.12 × 10⁵ J/cm², and an annual sunshine duration of 2.73 × 10³ h [25]. The average annual precipitation of the plain oasis is only 14 mm, and the average annual evaporation is as high as 2500 mm. The natural vegetation of the Keriya Oasis is mainly reed, tamarisk, poplar, camel thorn, etc.; the soil types are mainly meadow soil and brown desert soil. The Keriya River, a seasonal river, is the main water source of the Keriya Oasis, originating at the foot of the Kunlun Mountains and eventually disappearing into the dunes of the Taklamakan Desert. The Keriya River basin is home to approximately 250,000 inhabitants, most of whom are engaged in farming, with the main crops including cotton, wheat, corn, and grapes. Salinization studies in this area can be very helpful for local ecological conservation and sustainable agricultural development [43,44].

Data
Phased-Array L-Band Synthetic Aperture Radar-2 (PALSAR-2) is an L-band synthetic aperture radar sensor on the Advanced Land Observing Satellite-2 (ALOS-2), an Earth observation satellite launched by Japan, which is mainly used for mapping, regional observation, and disaster monitoring [45,46]. The full polarization (HH, HV, VH, VV) data of 23 April 2015 were acquired for this study. Because PALSAR-2 is an active microwave sensor, it can observe features day and night, and full polarization imaging captures more feature information, which can better express the soil salt distribution and accumulation in the study area. The PALSAR-2 data used in this paper are Level 1 images, which were pre-processed in the SARscape 5.2.1 module of ENVI 5.3, including multi-looking, refined Lee filtering with a 3 × 3 window, geocoding, and radiometric calibration, to obtain the standard quad-polarization backscattering coefficient maps.
Landsat 8 is a satellite jointly developed by NASA and USGS for medium-resolution observation of the globe. Its Operational Land Imager (OLI) has nine bands and can be used for effective monitoring of soil salinization [47,48]. To reduce the variability due to time, the Landsat 8 OLI image of 26 April 2015 was selected in this paper. The image was pre-processed before the different indices were calculated. To roughly coincide with the imaging time, soil samples collected in the field from 20 April to 1 May 2015 were used in this study; 65 representative sample points were selected to collect surface soil from 0 to 20 cm based on the regional extent, soil characteristics, vegetation distribution status, geomorphological features, and climatic conditions. The coordinates of each soil sample point were recorded by GPS, and the samples were brought to the laboratory, where they were oven-dried, ground, and passed through a 1 mm sieve [49,50]. The drying method (oven at 105 °C, 48 h) was used to determine the soil water content. After preparing a soil solution with a soil-to-water ratio of 1:5, the soil conductivity was measured with a conductivity meter at room temperature (25 °C). Finally, the total soluble salt content (in g/kg) was calculated from an equation established between conductivity and total soluble salt [51].

Water Cloud Model
The study area is in an inland arid region, but the surface in the oasis region is still covered by a great deal of vegetation, whose scattering and absorption effects on microwave signals cannot be ignored. To accurately obtain the backscattering coefficient of the underlying soil, the interference of the vegetation layer on the scattering needs to be removed [52]. Attema et al. proposed the Water Cloud Model (WCM), which simplifies the microwave scattering process by assuming that the vegetation layer is a horizontally homogeneous cloud layer and disregarding the multiple scattering between the vegetation layer and the underlying soil layer [53], with the following expressions:

$$\sigma^0_{PP} = \sigma^0_{veg} + \gamma^2 \sigma^0_{soil} \quad (1)$$

$$\sigma^0_{veg} = A \, m_{veg} \cos\theta \, (1 - \gamma^2) \quad (2)$$

$$\gamma^2 = \exp(-2 B \, m_{veg} \sec\theta) \quad (3)$$

where $\sigma^0_{PP}$ is the total backscatter coefficient; $\sigma^0_{veg}$ is the vegetation layer's backscatter coefficient; $\sigma^0_{soil}$ is the soil surface's backscatter coefficient; $\gamma^2$ is the two-way attenuation factor of the vegetation layer; $m_{veg}$ is the vegetation water content; $\theta$ is the radar wave incidence angle; and $A$, $B$ are the parameters of the vegetation. Table 1 shows the empirical parameters obtained from previous studies [54]. Bringing all parameters into the equations yields the bare soil backscattering coefficient (Equation (4)):

$$\sigma^0_{soil} = \frac{\sigma^0_{PP} - A \, m_{veg} \cos\theta \, (1 - \gamma^2)}{\gamma^2} \quad (4)$$
The backscattering coefficients obtained after the WCM processing are shown in Figure 2. The backscattering coefficients of all four polarizations are reduced after the WCM processing. Figure 3 shows the correlation coefficients between the backscattering coefficients and soil salinity before and after the WCM processing; the correlation improved for all four polarizations after the processing. This indicates that the WCM can remove the interference of the vegetation layer on the backscattering coefficients in the study area.
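The vegetation correction described above can be illustrated with a minimal sketch. It assumes the commonly used form of the WCM, in which the two-way attenuation is exp(−2·B·m_veg·sec θ) and the vegetation term is A·m_veg·cos θ·(1 − γ²), and it assumes that the subtraction is carried out in linear power units rather than in dB. The function names and the numerical values of A, B, and m_veg in the usage note are hypothetical placeholders, not the values of Table 1.

```python
import math


def db_to_linear(value_db):
    """Convert a backscatter coefficient from dB to linear power units."""
    return 10.0 ** (value_db / 10.0)


def linear_to_db(value_linear):
    """Convert a backscatter coefficient from linear power units to dB."""
    return 10.0 * math.log10(value_linear)


def wcm_soil_backscatter(sigma_total, m_veg, theta_deg, A, B):
    """Invert the Water Cloud Model for the bare-soil backscatter.

    sigma_total : total backscatter coefficient (linear units)
    m_veg       : vegetation water content
    theta_deg   : radar incidence angle in degrees
    A, B        : empirical vegetation parameters (assumed inputs)
    """
    theta = math.radians(theta_deg)
    # Two-way attenuation through the vegetation layer
    gamma2 = math.exp(-2.0 * B * m_veg / math.cos(theta))
    # Direct contribution of the vegetation layer
    sigma_veg = A * m_veg * math.cos(theta) * (1.0 - gamma2)
    # Remove the vegetation term and undo the attenuation
    return (sigma_total - sigma_veg) / gamma2
```

For example, `linear_to_db(wcm_soil_backscatter(db_to_linear(-12.0), 1.5, 35.0, 0.0012, 0.091))` would give a vegetation-corrected value in dB for one pixel under these placeholder parameters.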

Back-Propagation Neural Network (BPNN)
BPNN is a classical machine learning method whose algorithm was proposed by Rumelhart et al. in 1986 [55]. Its operation consists of two processes. In forward propagation, the data flow is passed from the input layer to the hidden layer and then to the output layer. If the desired output is not obtained in the output layer, back propagation of the error signal is started, and the weights and thresholds are adjusted by feeding the errors back to the individual neurons. Forward and backward propagation alternate until the network error function reaches its minimum [56,57]. The back propagation of the error signal gives the BPNN a strong nonlinear fitting capability, which makes it suitable for geography-related problems.

Support Vector Machine
Support vector machine (SVM) is a machine learning algorithm developed on the theoretical basis of statistical learning theory and structural risk minimization [58]. It solves the original low-dimensional problem by mapping it into a high-dimensional space through kernel functions and seeking the optimal hyperplane there. Additionally, the goal of SVM is to obtain the optimal solution under the available sample information rather than the optimal solution as the number of samples tends to infinity, and it is thus suitable for problems with small samples [59,60]. Depending on the task, SVM is divided into two categories, support vector classification (SVC) and support vector regression (SVR); the radial basis function is chosen as the kernel function of SVR in this paper. Radial basis functions are widely used for problems of different dimensions and sample sizes because of their powerful nonlinear mapping ability [60]. However, the accuracy of the model simulation is largely affected by the penalty parameter C and the kernel parameters, so these need to be optimized. The radial basis function is formulated as follows:

$$K(x, x_i) = \exp\left(-\gamma \lVert x - x_i \rVert^2\right)$$

where $x$ is the vector of the prediction factor; $x_i$ is the sample vector of the SVM; and $\gamma$ is a kernel parameter.
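As a small illustration of the kernel used by the SVR, the sketch below evaluates the radial basis function in its usual Gaussian form, K(x, x_i) = exp(−γ‖x − x_i‖²). The function name and the example values are illustrative assumptions, not part of the original modelling code.

```python
import math


def rbf_kernel(x, x_i, gamma):
    """Gaussian radial basis function kernel between two feature vectors."""
    # Squared Euclidean distance between the prediction vector and the sample vector
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_i))
    return math.exp(-gamma * sq_dist)
```

A larger γ makes the kernel decay faster with distance, so each support vector influences a smaller neighbourhood; this is one reason why γ, together with the penalty parameter C, strongly affects the fitted model.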

Particle Swarm Optimization
Because of the complex nonlinear relationship between the selected inversion parameters and soil salinity in this article, particle swarm optimization (PSO) was used to optimize the key parameters of the SVR and BPNN models. The idea of the particle swarm algorithm is derived from the foraging behavior of birds and was first proposed by Kennedy and Eberhart [61]. The algorithm treats each candidate solution of the optimization problem as a particle, and by simulating the foraging process of birds, each particle is given a random position and flight velocity. During the iterative process, each particle updates its current position and velocity and records its own best position and the best position of the population [62,63]. For example, for an optimization problem in an n-dimensional space with a population of m particles, the position of the ith particle is denoted as $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ and its velocity as $v_i = (v_{i1}, v_{i2}, \ldots, v_{in})$. The iterative formulas for updating the velocity and position of each particle in the population are as follows [64]:

$$v_i^{k+1} = \omega v_i^{k} + c_1 r_1 \left(p_i - x_i^{k}\right) + c_2 r_2 \left(p_g - x_i^{k}\right)$$

$$x_i^{k+1} = x_i^{k} + v_i^{k+1}$$

where $i = 1, 2, 3, \ldots, m$; $k$ is the iteration number; $p_i$ is the individual best position; $p_g$ is the global best position; $\omega$ is the inertia weight, generally ranging from 0 to 1.4; $c_1$, $c_2$ are the accelerating coefficients; and $r_1$, $r_2$ are random numbers between 0 and 1. The addition of the particle swarm algorithm allows the BPNN model to overcome the drawbacks of slow convergence and a tendency to fall into local optima, and the optimal weights and thresholds are searched for to minimize the network error [57]. For the SVM model, which is based on the principle of structural risk minimization, the addition of the PSO algorithm with its global search capability can better ensure the rationality of the parameter optimization [64]. Therefore, the PSO algorithm is chosen to optimize the models in this paper.
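The velocity and position updates described above can be sketched as a generic minimizer. This is a minimal illustration, not the authors' MATLAB implementation: the function name `pso_minimize`, the default inertia weight and accelerating coefficients (0.7, 1.5, 1.5), the clipping of positions to the search bounds, and the fixed random seed are all assumptions made for the example. When tuning an SVR, the objective would typically be a validation error evaluated at a candidate (C, γ) pair.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iter=60,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box given by `bounds` with a basic PSO.

    bounds is a list of (low, high) pairs, one per dimension.
    Returns the best position found and its objective value.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests (p_i) and global best (p_g)
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive term + social term
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Position update, clipped to the search bounds
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

For instance, `pso_minimize(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2, [(-5.0, 5.0), (-5.0, 5.0)])` searches a two-dimensional box for the minimum of a simple quadratic; replacing the lambda with a function that trains a model and returns its validation RMSE turns the same loop into a hyperparameter search.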

Model Variables
In this paper, through a large number of experiments, 18 polarization combinations with a high correlation with the soil salinity in the study area were first selected by the Pearson correlation coefficient method, and the four with the highest correlation were then chosen as model parameters to invert the soil salinity (Figure 4): VV/VH (−0.4), (VV − VH)/(VV + VH) (0.41), (HV − HH)/(HV + HH) (0.46), and (HV² − HH²)/(HV² + HH²) (0.44). For the spectrally rich Landsat data, researchers have established numerous soil indices and vegetation indices for scientific studies. In this paper, the soil salinity index (SI-T) [65], salinity index (SI2) [66], salinity ratio index (SAIO) [33], and canopy response salinity index (CRSI) [16], which have a high correlation with soil salinity in the study area, were selected as model parameters based on previous studies. To make the predictions more accurate, this paper also included evapotranspiration, groundwater burial depth, and digital elevation data, which have previously been shown to be related to soil salinity [67][68][69], as shown in Table 2. Because the data sources differ, three BPNN structures were constructed in this paper, with input, hidden, and output layer neurons of 7:75:1 for the PALSAR-2 data source, 7:80:1 for the Landsat 8 data source, and 11:155:1 for the PALSAR + Landsat data source. The output layer of all three neural network models is soil salinity, and a gradient descent algorithm with a variable learning rate is used as the training function, with a learning rate of 0.01, 1000 iterations, and an allowable error of 0.0001. The PSO settings for the different data sources are as follows: PALSAR-2 data source, 60 evolution times and a population size of 30; Landsat 8 data source, 60 evolution times; the inertia weight is 1 in all three models, and the accelerating coefficients are 2 and 1, respectively.
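The selection of polarization combinations by correlation can be sketched as follows. The example assumes that the two difference-type combinations are normalized differences, that is, (VV − VH)/(VV + VH) and (HV − HH)/(HV + HH), and that the backscattering coefficients are supplied in linear units; the function names are illustrative.

```python
import math


def polarization_features(hh, hv, vh, vv):
    """Compute the four polarization combinations for one pixel or sample."""
    return {
        "VV/VH": vv / vh,
        "(VV-VH)/(VV+VH)": (vv - vh) / (vv + vh),
        "(HV-HH)/(HV+HH)": (hv - hh) / (hv + hh),
        "(HV^2-HH^2)/(HV^2+HH^2)": (hv ** 2 - hh ** 2) / (hv ** 2 + hh ** 2),
    }


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)
```

Computing each feature at the sample points and ranking the features by the absolute value of `pearson_r` against the measured salinity mirrors the screening step described above.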
In this paper, the BPNN, SVR, PSO-BPNN, and PSO-SVR models were established in MATLAB, and the model accuracy was evaluated by the coefficient of determination (R²) and the root mean square error (RMSE).
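The two accuracy measures can be written out explicitly. The sketch below assumes the common definition R² = 1 − SS_res/SS_tot; some studies instead report the squared Pearson correlation between measured and predicted values, and the text does not state which definition was used.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Both functions take the measured salinity values of a set of sample points and the corresponding model predictions, so they can be applied separately to the training and validation sets.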

Data Division
In this study, a total of 65 soil sample points were collected, with soil salinity ranging from 0.0046 to 59.7895 g/kg, and the mean value was 9.2930 g/kg with a standard deviation of 11.6379 g/kg. By referring to Wei et al.'s study, the conditional Latin Hypercube Sampling design (cLHS) was used to divide all sample points into training and validation sets in the ratio of 7:3 [70]. The training set has 46 soil sample points with salinity ranging from 0.0046 to 59.7895 g/kg, with a mean value of 8.7631 g/kg and a standard deviation of 11.1126 g/kg; the validation set has 19 soil sample points with salinity ranging from 0.0495 to 53.1814 g/kg, with a mean value of 10.5757 g/kg and a standard deviation of 13.1318 g/kg. As shown in Figure 5, the data characteristics of the three data sets were relatively similar.

Results
To compare the predictive ability of the different models for soil salinity in the study area, different inversion parameters were brought into the models in this study, and the R² and RMSE of the training and validation sets are shown in Table 3. As seen from Table 3, for the three data sets, the R² of the PSO-SVR model improved by 0.07 (PALSAR-2 data), 0.20 (Landsat 8-OLI data), and 0.19 (PALSAR + Landsat data) compared with that of the SVR model, and the R² of the PSO-BPNN model improved by 0.03 (PALSAR-2 data), 0.24 (Landsat 8-OLI data), and 0.12 (PALSAR + Landsat data) compared with that of the BPNN model. After adding PSO, the current optimal position of each particle is recorded at each iteration, and based on the position of the individual within the population, the algorithm determines whether the historical optimal solution of the particle population needs to be updated and continues to iterate from that optimal solution; thus, the simulation accuracy of both the SVR and BPNN models is improved.
According to Table 2, the different input variables of the different data sources and different model input parameters affect the simulation accuracy of the model. In general, however, the prediction accuracy of the three data sources, from high to low, is PALSAR + Landsat, Landsat 8-OLI, and PALSAR-2. The scatter plots of the results predicted by the different data sources and different models are presented in Figure 6. The figure shows that the predicted values of all models are too low when the soil salinity is higher than 20 g/kg. The predicted values of the PALSAR + Landsat data source are less scattered than those of the single data sources, indicating that the combination of multiple remote sensing data sources can reflect the soil salinity information in the study area more effectively; the predicted values of the models with PSO are also less scattered than those of the initial models. When PALSAR + Landsat is used as the data source, the simulated values of the PSO-SVR model are more uniformly distributed around the 1:1 line in the range of 0-20 g/kg, and the predicted values above 20 g/kg are lower than the measured values but closer to the 1:1 line than those of the other models, indicating that the model is relatively stable and has better accuracy. To compare the inversion results of the different data sources and different methods in more detail, three representative sub-study areas (I, II, and III) were selected to compare the local features of the inversion results. As shown in Figure 7A, sub-study areas I and II are at the edge of the oasis, in the transition zone between the oasis and the desert. Soil salinity is lower in oasis soils and higher in desert soils, resulting in more pronounced differences in soil salinity in the interlaced zone, which can better reflect the inversion results. Sub-study area III is in the interior of the oasis and can reflect the model inversion of soil salinity in the interior of the oasis.
We found through our fieldwork that sub-study area I is located in an area of moderate salinity, sub-study area II is located in an area of more severe salinity, and sub-study area III is located within agricultural land with a low level of salinity, but with minor secondary salinization occurring. The different degrees of salinization therefore allow the inversion results of the models to be assessed more fully. To compare the inversion results of different data sources, the PALSAR-2 data source, Landsat 8 data source, and PALSAR + Landsat data source were brought into the PSO-SVR model in this paper, and the inversion results were compared in the sub-study areas. In Figure 7A, (a) indicates the Landsat 8 true color (RGB) images; (b) indicates the PALSAR + Landsat data source inversion results; (c) indicates the Landsat 8 data source inversion results; and (d) indicates the PALSAR-2 data source inversion results. In sub-study areas I and II, the inversion results from the Landsat 8 data source are closer to the soil salinity distribution characteristics of the sub-study areas than those from the PALSAR-2 data source; the inversion results from the PALSAR-2 data source are too fragmented and invert areas with high salinity as low-value areas, but retain the feature boundary characteristics well. In sub-study area III, the Landsat 8 data source gives inversion values that are too high for field soil salinity, and the PALSAR-2 data source inversion results are similarly too fragmented and do not quite match the soil salinity distribution characteristics of the sub-study area. In the three sub-study areas, the inversion results of the PALSAR + Landsat data source combine the high-resolution texture information of PALSAR-2 and the rich spectral information of Landsat 8, making the inversion results more detailed and better reflecting the variability in soil salinity.
To compare the inversion results of the different models, PALSAR + Landsat data were brought into the PSO-SVR, PSO-BPNN, SVR, and BPNN models in this paper to compare the inversion results in the sub-study areas. As can be seen from Figure 7B, the inversion results of the models without PSO are relatively poor and do not reflect well the differences in soil salinity in the oasis-desert interlaced zones. Their inversion values of soil salinity inside the oasis are too high and do not match the soil salinization in the sub-study areas. The inversion results of the PSO-SVR model are more detailed and represent well the differences in soil salinity in different areas, both in the interlaced zone and within the oasis, and are more consistent with the soil salinity distribution characteristics of the sub-study areas (sub-study area I moderate salinity, sub-study area II high salinity, and sub-study area III low salinity) than those of the PSO-BPNN model. Consistent with Table 3, the accuracy of the predicted values was improved by bringing the PALSAR + Landsat data source into the PSO-SVR model. Therefore, the PSO-SVR model and PALSAR + Landsat data source were finally selected in this paper to invert the soil salinity in the study area, and the results are shown in Figure 8. To avoid errors as much as possible, the river area in the study area is masked in this paper.
Figure 8. The inversion results of soil salinity in the study area using the optimal model PSO-SVR and the optimal data source PALSAR + Landsat. The white part is the masked river area.

Model Analysis
In this paper, four methods, namely, SVR, BPNN, PSO-SVR, and PSO-BPNN, were used to invert soil salinity. According to the results in Table 3 and Figure 6, the inversion accuracies of the SVR and BPNN models were relatively close before the addition of PSO, while after the addition of PSO the inversion results of the PSO-SVR model were better than those of the PSO-BPNN model, and both improved on the original models. The reasons for this are as follows: (1) Compared with the BPNN model, the SVR model relies more on the kernel function and the selection of relevant parameters. With the addition of PSO, the SVR model can automatically iterate through the particles to optimize the key parameters within a given range, thus improving the overall accuracy of the model. As can be seen from Figure 9, the PSO-SVR model converges faster and more accurately than the PSO-BPNN model during training. Therefore, the PSO-SVR model is more applicable in this study area. (2) Compared with the BPNN, the SVR model is more suitable for small sample studies. The theoretical basis of nonlinear mapping allows the SVR model to determine the final results with fewer key samples and avoids the drawback of falling into a local optimum, to which the BPNN model is prone. Similar conclusions were presented in the study of soil salinity in Yanqi, Xinjiang, by Hong Jiang et al. [71].

Analysis of Model Variables
The SVR and BPNN models selected for this paper do not provide variable importance rankings, so we followed the approach used by Kennedy Were et al. in their study of the Eastern Mau Forest Reserve [59]. The importance of each predictor variable is represented by removing it from the PSO-SVR and PSO-BPNN models in turn and measuring the resulting increase in the RMSE of the predictions. Figure 10 shows that the soil indices, vegetation indices, and polarization combinations extracted from the remote sensing data sources have a comparatively large influence on both the PSO-SVR and PSO-BPNN models. The reason may be that in arid and semi-arid regions, vegetation and soil characteristics are indispensable environmental variables for the inversion of soil salinity [72]. Among the remote sensing data sources, however, the four indices extracted from the Landsat 8 data are more important to the models than the four polarization combinations extracted from the PALSAR-2 data. This may be because the PALSAR-2 bands (HH, HV, VH, VV) have a low direct correlation with soil salinity in the study area (as shown in Figure 4), which makes their combinations less important to the models than the Landsat 8 indices. Among all eleven variables, CRSI had the highest importance, indicating that it was the strongest indicator of soil salinity in the models of this paper. This is probably because CRSI, which combines all visible bands and one near-infrared band, highlights the small peak of vegetation reflectance at the 400-500 nm wavelengths and the sudden change in reflectance that occurs between the red and near-infrared bands [73]. Most of the sampling sites in this paper are in the interior of the oasis, so CRSI is more sensitive to soil salinity in this study area. Kristen Whitney et al. used CRSI as the primary basis for their soil salinity study in farmland of California's western San Joaquin Valley [74]. Wang et al. also proposed CRSI as one of the important indicators for soil salinity identification in their quantitative assessment of soil salinity in three oases in Xinjiang [75]. Our result is consistent with these earlier findings and again supports the important role of CRSI in soil salinity studies. Although the variable-removal method above illustrates the importance of the model input variables, the black-box nature of machine learning models remains a potential source of uncertainty in this study. In future studies, we will attempt to address this problem with other methods.
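The removal-based importance procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in this paper: a simple k-nearest-neighbour regressor stands in for the PSO-SVR and PSO-BPNN models, the data are synthetic, and the function names (`knn_predict`, `drop_variable_importance`) and variable names (`strong`, `weak`, `noise`) are hypothetical.

```python
import math
import random


def knn_predict(train_X, train_y, x, k=3):
    """Predict y for x as the mean target of the k nearest training rows."""
    dists = sorted((math.dist(row, x), y) for row, y in zip(train_X, train_y))
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def rmse(train_X, train_y, test_X, test_y, k=3):
    """Root mean squared error of the stand-in model on the test rows."""
    sq = [(knn_predict(train_X, train_y, x, k) - y) ** 2
          for x, y in zip(test_X, test_y)]
    return math.sqrt(sum(sq) / len(sq))


def drop_column(X, j):
    """Return a copy of X without column j."""
    return [row[:j] + row[j + 1:] for row in X]


def drop_variable_importance(train_X, train_y, test_X, test_y, names, k=3):
    """Importance of each variable = RMSE without it minus baseline RMSE."""
    baseline = rmse(train_X, train_y, test_X, test_y, k)
    importance = {}
    for j, name in enumerate(names):
        reduced = rmse(drop_column(train_X, j), train_y,
                       drop_column(test_X, j), test_y, k)
        importance[name] = reduced - baseline
    return baseline, importance


# Synthetic data (assumption): the target depends strongly on the first
# column, weakly on the second, and not at all on the third.
rng = random.Random(0)
names = ["strong", "weak", "noise"]
X = [[rng.random() for _ in names] for _ in range(300)]
y = [10.0 * row[0] + 1.0 * row[1] for row in X]
train_X, test_X = X[:200], X[200:]
train_y, test_y = y[:200], y[200:]

baseline, importance = drop_variable_importance(
    train_X, train_y, test_X, test_y, names)
for name in sorted(importance, key=importance.get, reverse=True):
    print(f"{name}: RMSE increase = {importance[name]:.3f}")
```

A negative value means that removing the variable reduced the RMSE, so the variable contributed noise rather than information to the stand-in model.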

Data Analysis
In this paper, different data sources were used to invert the soil salinity of the Keriya Oasis (see Table 2 and Figure 6). The inversion results were strongly influenced by the data source: among the single data sources, the PALSAR data predicted soil salinity less well than the Landsat data. The possible reasons are as follows: (1) The imaging mechanism of PALSAR data differs from that of optical data, so PALSAR images are seriously affected by speckle noise, and although their image resolution is higher, their spectral information is not as rich as that of Landsat data. (2) Soil salinity in arid areas is affected by many factors; the real and imaginary parts of the soil dielectric constant are affected by water content and salt content, respectively. Although the soil salinity characteristics can be enhanced to some extent by band combination, the effect of water content on the dielectric constant still cannot be eliminated. In contrast, Landsat images can better express soil salinity through more mature spectral indices, thus improving the prediction accuracy of the model. (3) The influence of vegetation in the study area should not be neglected. Although this paper uses the WCM to remove the interference of vegetation to a certain extent, its effect still differs between the densely vegetated and the sparsely vegetated areas of the oasis-desert transition zone, and it cannot completely eliminate the influence of vegetation, which leaves considerable noise in the PALSAR images. (4) Surface roughness has also been shown to be one of the important factors affecting the backscattering coefficient of ground features [76,77]. The topography of the study area is complex, with many zones where the Gobi desert intersects the vegetated areas; surface roughness is therefore difficult to characterize, which affects the correlation between the backscattering coefficient and the soil salinity. From the above analysis, it remains difficult to invert soil salinity using only a single radar data source [78,79], and future research will need multiple methods or multiple data sources to reduce the effect of noise.
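The vegetation correction mentioned in reason (3) can be illustrated with a short sketch. It assumes the commonly used form of the water cloud model, in which the total backscatter in linear units is the vegetation contribution plus the soil contribution attenuated by the two-way canopy transmissivity. The paper does not state its exact parameterization, so the vegetation descriptor `veg`, the coefficients `a` and `b`, and all numerical values below are hypothetical.

```python
import math


def db_to_linear(db):
    """Convert a backscattering coefficient from dB to linear power units."""
    return 10.0 ** (db / 10.0)


def linear_to_db(linear):
    """Convert a backscattering coefficient from linear power units to dB."""
    return 10.0 * math.log10(linear)


def canopy_terms(veg, theta_deg, a, b):
    """Return (two-way transmissivity tau^2, vegetation backscatter)."""
    cos_theta = math.cos(math.radians(theta_deg))
    tau2 = math.exp(-2.0 * b * veg / cos_theta)
    sigma_veg = a * veg * cos_theta * (1.0 - tau2)
    return tau2, sigma_veg


def wcm_total_backscatter(sigma_soil_db, veg, theta_deg, a, b):
    """Forward model: total = vegetation + tau^2 * soil (linear units)."""
    tau2, sigma_veg = canopy_terms(veg, theta_deg, a, b)
    return linear_to_db(sigma_veg + tau2 * db_to_linear(sigma_soil_db))


def wcm_soil_backscatter(sigma_total_db, veg, theta_deg, a, b):
    """Inverse model: remove the vegetation term to recover soil backscatter."""
    tau2, sigma_veg = canopy_terms(veg, theta_deg, a, b)
    sigma_soil = (db_to_linear(sigma_total_db) - sigma_veg) / tau2
    if sigma_soil <= 0.0:
        raise ValueError("vegetation term exceeds the observed backscatter")
    return linear_to_db(sigma_soil)


# Hypothetical values, for illustration only.
A_COEF, B_COEF = 0.01, 0.1   # assumed empirical vegetation coefficients
observed_db = wcm_total_backscatter(-14.0, veg=0.5, theta_deg=35.0,
                                    a=A_COEF, b=B_COEF)
recovered_db = wcm_soil_backscatter(observed_db, veg=0.5, theta_deg=35.0,
                                    a=A_COEF, b=B_COEF)
print(f"observed {observed_db:.2f} dB, soil-only {recovered_db:.2f} dB")
```

Because `a` and `b` are fitted empirically, a single pair of coefficients is unlikely to suit both dense and sparse canopies, which is consistent with the residual vegetation effect discussed above.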

Soil Salinity Distribution in the Study Area
The inversion results of this paper show that soil salinity in the study area is high in the north and south and low in the middle, which is similar to the results of previous studies [25]. Figure 11 shows the spatial interpolation map of the groundwater level and the DEM of this area; the northern part of the area has low-lying topography and shallow groundwater. The higher soil salinity in the north may have two causes. First, because the water table is shallow, salts in the soil are brought to the surface by capillary action, and the dry climate evaporates the water and leaves the salts behind, causing salinization. Second, the terrain in the northern part of the area is relatively flat, so groundwater moves mostly vertically and horizontal runoff is effectively stagnant, which creates favorable conditions for salt accumulation in the soil. In the southern part of the area, agricultural activities are more significant, and Figure 8 shows that soil salinity is high around the cultivated land there, which may be due to unscientific cultivation practices that lead to secondary salinization.

Conclusions
In this paper, we combine the particle swarm optimization algorithm with machine learning models to propose a method for soil salinity inversion from multi-source remote sensing data. The main findings are as follows: (1) Comparing the simulation results of the different models, we found that adding PSO improved the prediction accuracy of both the BPNN and SVR models. Relative to the SVR model, the R² of the PSO-SVR model improved by 0.07 (PALSAR-2 data), 0.20 (Landsat 8-OLI data), and 0.19 (PALSAR + Landsat data). Relative to the BPNN model, the R² of the PSO-BPNN model improved by 0.03 (PALSAR-2 data), 0.24 (Landsat 8-OLI data), and 0.12 (PALSAR + Landsat data). This indicates that adding PSO can improve model accuracy. The simulation results of the PSO-SVR model are better than those of the PSO-BPNN model, indicating that the former is more suitable for the study of soil salinization in small areas. The final comparison showed the PSO-SVR model to be the best model for the inversion of soil salinity in this area. (2) Comparing the inversion result plots, we found that using only a single microwave image to invert soil salinity is not as effective as using the Landsat image. The combination of PALSAR's high-resolution texture information and Landsat's rich spectral information better reflects the distribution of soil salinity in the study area. (3) Soil salinity in the study area tends to be low in the middle and high in the north and south. In the north, the low-lying topography and shallow groundwater make the soil vulnerable to salinization; in the south, agricultural activities are more common, and unscientific irrigation methods may lead to secondary salinization. The surface runoff formed by the high terrain in the south may also wash salt into the study area, intensifying salinization.
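For readers unfamiliar with PSO, the optimization step referred to in finding (1) can be sketched as follows. This is a generic, minimal particle swarm written for illustration, not the configuration used in this paper: the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the swarm size, and the objective `surrogate_rmse` are all assumptions. In practice the objective would be the cross-validated RMSE of an SVR or BPNN as a function of its hyperparameters; here a smooth placeholder function with a known minimum is used so that the example runs on its own.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective(position) inside box bounds with a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position per particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position of the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def surrogate_rmse(params):
    """Placeholder objective (assumption): minimum 0.3 at (1.0, -2.0)."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2 + 0.3


# Hypothetical search ranges for log10(C) and log10(gamma) of an SVR.
best_params, best_value = pso_minimize(surrogate_rmse,
                                       bounds=[(-3.0, 3.0), (-5.0, 1.0)])
print(best_params, best_value)
```

Replacing `surrogate_rmse` with a function that trains a model and returns its validation error turns the same loop into a hyperparameter search, at the cost of one model fit per particle per iteration.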