1. Introduction
Soil salinization is a major cause of global land degradation, affecting approximately 1 billion hectares of arable land worldwide and posing a serious threat to ecosystems [1,2]. The effect of soil salinization is particularly pronounced in arid zones [3]. The excessive accumulation of soluble salts decreases soil fertility, accelerates the decomposition of organic matter at the soil surface, hinders the uptake of soil water by vegetation, and threatens crop production, agricultural activities, and sustainability [4,5]. Cultivated land in oases is an important part of the carbon pool in arid areas, and increasing soil salinization decreases the organic matter content, thereby contributing to the loss of soil carbon sinks [6]. As such, high-resolution soil salinity monitoring is vital for maintaining food yields, slowing land degradation, and achieving dual carbon goals.
Remote sensing is an effective way to extract quantitative information about soil salinization. The Landsat series [7,8], the Sentinel series [9], and other multispectral satellites have proven effective for soil salinity monitoring. However, the low to medium spatial resolution of these images remains a limiting factor for the accuracy of soil salinity maps, which restricts their applicability to soil salinity assessment [10]. Han et al. used Gaofen-2 images to estimate soil salinity and showed that the texture characteristics of high-spatial-resolution imagery can influence soil salinity estimation [11]. When different remote-sensing sensors are used to construct a soil salinity prediction model, the higher the spatial resolution, the better the correlation of the model [12]. Shi et al. found that estimation models built from Sentinel-2 data with 10-m spatial resolution outperformed counterparts built from Landsat-8 data with 30-m spatial resolution. Higher spatial resolution reduces the number of mixed pixels, providing more pure soil pixels for the model [13]. Therefore, high-spatial-resolution multispectral satellite monitoring may provide more accurate information on soil salinity.
Soil salinity studies commonly apply spectral indices, which exploit the different spectral behaviors of ground targets in image pixels [14]. Salinized soils have higher spectral reflectance than nonsaline soils, and reflectance increases with increasing soil salinity [15]. Salinity indices are the most effective way to assess and monitor soil salinity on a large scale [16]; indicators such as the soil salinity index (Int 1–2), the Normalized Difference Salinity Index (NDSI), and the Salinity Index (SI 1–5) are very effective for salinity mapping [17,18,19]. There is also a direct relationship between the Normalized Difference Vegetation Index (NDVI) and soil salinity: as salinity increases, vegetation growth is inhibited and plants do not survive in the long term, so related studies have used NDVI as an indicator of soil salinity [20,21]. Soil salinity indices thus provide a reliable theoretical basis for assessing soil salinity trends. Significant progress has been made in estimating and mapping soil salinity by combining various spectral salinity indices [22].
PlanetScope satellite data have brought unprecedented opportunities for high-precision, global-scale remote-sensing mapping by providing 3-m spatial resolution and near-daily revisit frequency. Planet Labs Inc. (San Francisco, CA, USA) launched the third-generation PSB.SD (SuperDove) sensors in April 2019, using Sentinel-2 data as a reference for improved spectral-band calibration to obtain more accurate spectral information [23]. The SuperDoves capture image information in eight spectral bands: coastal blue, blue, green I, green II, yellow, red, red-edge, and near infrared (NIR). PlanetScope data are used to monitor rapid changes in land-surface status and processes, such as fine-scale land-use development [24,25], agricultural expansion [26,27], forest change [28], glacial lake mapping [29], and associated impact assessments. Other applications include crop growth prediction [30,31], quantifying fine-scale plant phenology [32], and characterizing surface carbon [33,34]. In summary, PlanetScope is a powerful new tool for accurate surface monitoring. Salinity monitoring is also a vital aspect of surface monitoring. However, it remains unclear whether PlanetScope satellites can effectively monitor and map surface soil salinity. The practical advancement of PlanetScope satellites may therefore present a new opportunity for high-precision salinity monitoring.
Previous research on PlanetScope bands was limited to the fusion of data in four bands: blue, green, red, and near-infrared [24,29]. The SuperDove satellites have eight bands, among which the yellow and red-edge bands set them apart from other satellites. Current spectral indices are biased toward the visible and near-infrared bands [35] rather than the yellow and red-edge bands. Muller and van Niekerk [36] used WorldView-2 data and found that a yellow band index performed better than the standard salinity index and NDVI, demonstrating the monitoring capability of the yellow band. As such, introducing the yellow and red-edge bands into spectral indices for soil salinity characterization provides an opportunity to further explore the salinity monitoring potential of the PlanetScope satellites.
To fill the gap in high-resolution soil salinity monitoring, we address the following question: are PlanetScope data applicable to soil salinity monitoring in inland drylands? This study has four aims: (1) to clarify the relationship between PlanetScope waveband information and soil salinity; (2) to develop new PlanetScope salinity indices and validate their salinity monitoring capability; (3) to construct an optimal retrieval model by comparing the estimation accuracy of different partial least-squares regression (PLSR) models; and (4) to monitor and map soil salinity in the arid inland region of Xinjiang, China.
2. Materials and Methods
2.1. Study Area
This study selected the Ogan-Kucha River Oasis (abbreviated here as Weiku Oasis), located in the north-central Tarim Basin at the southern foot of the Tianshan Mountains in the Aksu region of southern Xinjiang (Figure 1). The oasis includes Kuche, Shaya, and Xinhe counties. It extends from 41°06′N to 41°38′N and from 81°26′E to 83°17′E, encompassing a typical and complete premountain alluvial fan and plain. The area has a typical continental, warm temperate, arid climate, with hot summers with little rain and cold, dry winters. The annual average temperature is 10.5–14.4 °C, the multiyear average diurnal temperature range is 14.7 °C, and the annual accumulated temperature above 10 °C is 4208 °C. The annual average sunshine duration is 2789–3000 h, yearly evaporation is 2420.23 mm, and the annual average precipitation is 43.1 mm. The evaporation–precipitation ratio is approximately 54:1 [37]. Natural vegetation is dominated by Populus euphratica, Tamarix chinensis, Phragmites australis, Alhagi sparsifolia, Suaeda glauca, and Kalidium foliatum. The crops are primarily wheat, cotton, and maize, which are distributed in the inner part of the oasis where the irrigation and drainage systems are relatively well developed.
2.2. Data Sources
Planet Labs is a remote-sensing satellite data company founded in 2010 in San Francisco by former U.S. National Aeronautics and Space Administration (NASA) scientists. The company developed microsatellite constellation technology, creating a global, high-resolution remote-sensing satellite system with high-frequency coverage that provides inexpensive, fast, and widely applicable data acquisition. To date, Planet Labs has launched 122 satellites, forming a constellation that acquires 3–5-m resolution imagery with near-daily revisit frequency. The satellites are characterized by frequent coverage, high spatial resolution, and standard spectral resolution.
The PlanetScope satellite data comprise eight bands and have been orthorectified, geometrically and radiometrically corrected, and atmospherically corrected, among other operations [23]. The waveband information of the PlanetScope satellite is shown in Table 1. The spatial resolution is 3 m, the swath width is 24 km, the revisit period is 1–2 days, and the product used in this work was acquired on 24 July 2021.
2.3. Field Sampling and Data Acquisition
A field soil survey was conducted in the study area from 20 to 25 July 2021. Based on land use/land cover (LU/LC), soil types, soil surface characteristics, previous field sampling experience, and accessibility of the potential sampling sites, a total of 84 representative sampling points compatible with the spatial resolution of the satellites were selected in agricultural fields and deserts. (1) Field sampling: A portable global positioning system (GPS; LT500T, CHC Navigation Technology Co. Ltd., Shanghai, China) was used to record the exact location of the center point of each sample. A five-point composite sample (center point and four corners) was taken to a depth of 10 cm and mixed in the field, giving a mixed sample of approximately 500 g. The collected soil samples were placed in sealed waterproof bags and labeled for subsequent chemical analysis. (2) Laboratory sample preparation: Samples were air dried, ground, homogenized, and passed through a 2-mm sieve. Then, to establish a 1:5 soil-to-water ratio, 20 g of soil and 100 g of distilled water were thoroughly mixed for at least 30 min. (3) Laboratory measurements: The leachate was extracted, and its electrical conductivity (EC) was measured at room temperature (25 °C) with a digital multiparameter measuring apparatus (Multi 3420 Set B, WTW GmbH, Weilheim, Germany) equipped with a composite electrode (TetraCon 925).
2.4. Spectral Salinity Indices
Some widely used indices that reflect the spatial distribution of soil salinity are listed in Table 2. The satellite's green band has two parts, labeled G1 and G2 in this work. To document the novel spectral monitoring capabilities of the yellow and red-edge bands, we used these two bands (B5 and B7) to construct new spectral indices and calculated various combinations of them to generate potential soil salinity indices for estimating EC. The naming rule is illustrated with the Salinity index (S1): when the red-edge band replaces the red band, the index is named RS1; when the yellow band replaces the blue band, it is named YBS1. Following this construction rule, 22 new red-edge spectral indices and 42 new yellow spectral indices were constructed in this work (Table 3 and Table 4).
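The band-substitution rule can be illustrated with a minimal sketch. It assumes two commonly cited index forms, NDSI = (R − NIR)/(R + NIR) and SI = √(B × R); the definitions in Table 2 may differ, and the function names, variable names, and reflectance values below are hypothetical.

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    total = a + b
    result = np.full_like(total, np.nan)
    np.divide(a - b, total, out=result, where=total != 0)
    return result


def ndsi(red, nir):
    """Assumed NDSI form: (R - NIR) / (R + NIR)."""
    return normalized_difference(red, nir)


def si(blue, red):
    """Assumed salinity index form: sqrt(B * R)."""
    return np.sqrt(np.asarray(blue, dtype=float) * np.asarray(red, dtype=float))


# Hypothetical surface reflectances for three pixels (not data from this study).
# The third pixel is a no-data pixel with zero reflectance in every band.
bands = {
    "blue": np.array([0.10, 0.12, 0.00]),
    "yellow": np.array([0.16, 0.20, 0.00]),
    "red": np.array([0.30, 0.20, 0.00]),
    "red_edge": np.array([0.25, 0.30, 0.00]),
    "nir": np.array([0.20, 0.40, 0.00]),
}

# Standard indices.
ndsi_std = ndsi(bands["red"], bands["nir"])
si_std = si(bands["blue"], bands["red"])

# Red-edge variant: the red-edge band takes the place of the red band.
ndsi_red_edge = ndsi(bands["red_edge"], bands["nir"])

# Yellow variant: the yellow band takes the place of the blue band.
si_yellow = si(bands["yellow"], bands["red"])
```

Keeping the index formula separate from the band choice means each substituted variant is a one-line call, which is convenient when many combinations have to be generated and screened.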
2.5. Modeling Strategy
PLSR is a statistical method that finds a linear regression model by projecting the predictor and response variables into a new space [41]. It is one of the most widely used regression strategies for spectral modeling [42,43]. PLSR can model multiple response variables simultaneously while reducing dimensionality and avoiding multicollinearity [44]. It combines the advantages of principal component analysis, canonical correlation analysis, and multiple linear regression, and is more suitable for small samples than traditional multiple linear regression (MLR) [45]. This study used PLSR to assess the potential relationships between the dependent and independent variables. However, PLSR models carry uncertainty, and errors can occur in prediction [46]. Band feature optimization, which can reduce this uncertainty and error, is therefore critical to the performance of the PLSR model. For this reason, three strategies for screening band importance were established in this work [47]: Boruta feature selection, random forest (RF), and extreme gradient boosting (XGBoost).
2.5.1. Boruta Feature Selection
The Boruta algorithm generates three random shadow feature attributes by shuffling the values of the original variables, thereby breaking their correlation with the dependent variable [9]. Subsequently, RF regression is run on the extended data set that combines the original features with the three shadow attributes. Up to a set maximum number of iterations, each iteration checks whether each real feature has higher importance than the best shadow feature; by repeated comparison, features are sorted into important and unimportant, and the Z-score is taken as the importance of each variable. The Boruta feature selection method selects the set of all features relevant to the dependent variable and thus provides a more comprehensive understanding of the factors influencing it [48].
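The comparison between real and shadow features can be sketched as follows. This is a simplified illustration, not the Boruta implementation used in this study: absolute Pearson correlation is swapped in for the RF importance so that the example depends only on NumPy, one shadow is created per feature, the final statistical test on the hit counts is omitted, and all names and data are hypothetical.

```python
import numpy as np


def abs_corr_importance(X, y):
    """Stand-in importance (an assumption): |Pearson r| of each column with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(Xc.T @ yc) / denom


def shadow_hits(X, y, n_iter=50, seed=0):
    """Count how often each real feature beats the best shadow feature."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for _ in range(n_iter):
        # Shadow features: every column is permuted independently, which keeps
        # its distribution but breaks its relationship with y.
        shadows = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(n_features)]
        )
        importance = abs_corr_importance(np.hstack([X, shadows]), y)
        real, shadow = importance[:n_features], importance[n_features:]
        # A "hit" is a real feature scoring above the best shadow feature.
        hits += real > shadow.max()
    return hits


# Synthetic demonstration: only the first of four features drives the response.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(84, 4))
y_demo = 3.0 * X_demo[:, 0] + 0.5 * rng.normal(size=84)
hits = shadow_hits(X_demo, y_demo, n_iter=50, seed=0)
```

A feature that collects hits in nearly every iteration is a candidate for confirmation, whereas features that rarely beat the best shadow are candidates for rejection; a full Boruta run decides this with a statistical test on the hit counts.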
2.5.2. Random Forest (RF) for Waveband Selection
Breiman [49] developed RF, which aggregates many decision trees to solve classification and regression problems. RF is an ensemble machine-learning algorithm that has proven practical and is therefore increasingly popular for variable selection [50]. In the RF framework, variable importance is influenced by two main parameters: the size of the subset of input variables tried at each split (mtry) and the number of trees in the forest (ntree). The value of ntree was set to 5 in this study, based on repeated tests.
2.5.3. Extreme Gradient Boosting (XGBoost)
To improve model interpretation and estimation accuracy, it is crucial to select sensitive wavelengths associated with the target parameters [51]. This study used the XGBoost method for feature selection. XGBoost is a variant of gradient-boosting decision trees that evaluates feature importance across the boosted trees with three measures (gain, frequency, and coverage) [52,53]. Gain describes the improvement contributed by splits on a feature, frequency counts how often the feature is used for splitting in the constructed trees, and coverage reflects the relative number of observations affected by splits on the feature.
2.6. Model Evaluation
In this study, the Boruta and RF algorithms were implemented on the R platform, and XGBoost was implemented in Python 3.7. To determine the optimal number of latent variables for PLSR, leave-one-out cross-validation (LOOCV) was performed to prevent overfitting or underfitting the data, either of which may produce models with poor performance [54].
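The selection of the number of latent variables by LOOCV can be sketched as below. This is an illustrative, NumPy-only implementation of single-response PLS (PLS1) with the usual deflation scheme, not the software used in this study; the function names and the synthetic data are assumptions made for the example.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit single-response PLS and return (coefficients, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                 # weight vector: covariance direction
        w /= np.linalg.norm(w)
        t = Xk @ w                    # scores of the latent variable
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        c = (yk @ t) / tt             # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - c * t               # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean


def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean


def loocv_rmse(X, y, n_components):
    """Leave each sample out once, refit, and predict the held-out sample."""
    errors = []
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        model = pls1_fit(X[train], y[train], n_components)
        errors.append(y[i] - pls1_predict(model, X[i:i + 1])[0])
    return float(np.sqrt(np.mean(np.square(errors))))


def select_components(X, y, max_components):
    """Return the component count with the lowest LOOCV RMSE, plus all scores."""
    scores = {k: loocv_rmse(X, y, k) for k in range(1, max_components + 1)}
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic demonstration data (not data from this study).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 4))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5, 0.0]) + 0.1 * rng.normal(size=40)
best_k, cv_scores = select_components(X_demo, y_demo, max_components=4)
```

Because every sample serves once as the held-out case, LOOCV uses a small sample such as the 84 points collected here efficiently, at the cost of refitting the model once per sample for each candidate number of latent variables.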
To evaluate the performance of the calibrated model, several statistical parameters were compared between the measured values and model estimates based on independent validation sets: the coefficient of determination (R²), root mean square error (RMSE), and the ratio of performance to deviation (RPD). According to the classification guidelines for inversion models outlined in [55], models fall into three categories: class A (RPD ≥ 2.00) is the most efficient model with reliable predictive power; class B (1.40 ≤ RPD < 2.00) is a good model that usually gives adequate results; and class C (RPD < 1.40) is an unreliable model. Typically, the most effective models have high R² and RPD values and low RMSE values.
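The three evaluation statistics and the RPD classes can be computed as in the following sketch. The formulas are assumptions based on common usage, since the text does not state them: the coefficient of determination is taken as 1 − SS_res/SS_tot, and RPD as the sample standard deviation of the measured values divided by the RMSE. The function names and demonstration values are hypothetical.

```python
import numpy as np


def regression_metrics(observed, predicted):
    """Return R2, RMSE, and RPD for paired measured and estimated values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = observed - predicted
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((observed - observed.mean()) ** 2))
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    r2 = 1.0 - ss_res / ss_tot
    # Assumed RPD definition: sample standard deviation of observations / RMSE.
    rpd = float(np.std(observed, ddof=1)) / rmse
    return {"r2": r2, "rmse": rmse, "rpd": rpd}


def rpd_class(rpd):
    """Map an RPD value to class A, B, or C.

    An RPD of exactly 1.40 is assigned to class B here (an assumption).
    """
    if rpd >= 2.0:
        return "A"
    if rpd >= 1.4:
        return "B"
    return "C"


# Hypothetical measured and estimated values (not data from this study).
observed_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted_demo = [1.1, 1.9, 3.2, 3.8, 5.0]
metrics_demo = regression_metrics(observed_demo, predicted_demo)
demo_class = rpd_class(metrics_demo["rpd"])
```

Under these thresholds, the RPD of 2.442 reported for the best-performing model in this work falls into class A.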
5. Conclusions
High-spatial-resolution soil salinization mapping is of great significance for fine-scale monitoring of salinization. The recent emergence of PlanetScope data provides an unprecedented opportunity to monitor fine-scale patterns of soil salinity. In this study, we identified the bands of PlanetScope satellite data that are sensitive to soil salinity and developed new PlanetScope spectral salinity indices, combined with measured soil salinity data, to assess salinity monitoring in oases in the arid zone of Xinjiang. The yellow band of PlanetScope satellite data has a higher sensitivity to soil salinity. Across the three feature selection algorithms (Boruta, RF, and XGBoost), the newly constructed yellow and red-edge band indices are advantageous for soil salinity estimation, with weights of 80%, 80%, and 60%, respectively. The analysis showed that the new salinity indices contributed more to soil salinity estimation. We found that PlanetScope satellites can estimate and map soil salinity under the different modeling strategies; the best-performing XGBoost-PLSR model had R², RMSE, and RPD values of 0.832, 12.050, and 2.442, respectively, and the results indicated that salt-affected soils dominate in the study area. The model generates 3-m resolution EC maps with more soil salinization detail than medium-resolution maps. The study verifies that the model can be used to monitor soil salinization in arid or semi-arid regions. Furthermore, although this has not yet been implemented, we intend to use PlanetScope time-series data for soil salinity mapping at different periods in the future, which we believe can further reduce uncertainties and improve the accuracy of predictions. This work provides an essential reference for developing strategies to manage and mitigate soil salinity in Xinjiang for sustainable oasis development and continued food security.
Meanwhile, PlanetScope has gradually become an essential part of Earth-observation remote sensing, offering high precision, supporting effective interpretation of ground features, and promoting the rapid development of mapping, agriculture, environmental monitoring, and other industries.