Combining the SHAP Method and Machine Learning Algorithm for Desert Type Extraction and Change Analysis on the Qinghai–Tibetan Plateau

Lu, Ruijie; Liu, Shulin; Duan, Hanchen; Kang, Wenping; Zhi, Ying

doi:10.3390/rs16234414


by Ruijie Lu ¹, Shulin Liu ¹,*, Hanchen Duan ¹, Wenping Kang ¹ and Ying Zhi ¹,²

¹ Key Laboratory of Ecological Safety and Sustainable Development in Arid Lands, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou 730000, China

² University of Chinese Academy of Sciences, Beijing 100049, China

* Author to whom correspondence should be addressed.

Remote Sens. 2024, 16(23), 4414; https://doi.org/10.3390/rs16234414

Submission received: 21 October 2024 / Revised: 15 November 2024 / Accepted: 22 November 2024 / Published: 25 November 2024


Abstract

For regional desertification control and sustainable development, it is critical to quickly and accurately understand the distribution pattern and the spatial and temporal changes of deserts. In this work, five machine learning algorithms are used to classify desert types on the Qinghai–Tibetan Plateau (QTP), and their performance is evaluated on the basis of their classification results and classification accuracy. Then, on the basis of the best classification model, the SHapley Additive exPlanations (SHAP) method is used to clarify the contribution of each classification feature to the identification of desert types during the machine learning classification process, both globally and locally. Finally, the independent and interactive effects of each factor on desert change on the QTP during the study period are quantitatively analyzed via geodetector. The main results are as follows: (1) Compared with the other classification algorithms (GTB, CART, KNN, and SVM), the RF classifier achieves the best performance in classifying QTP desert types, with an overall accuracy (OA) of 87.11% and a kappa coefficient of 0.83. (2) For the overall classification of deserts, five features, namely, elevation, slope, VV, VH, and GLCM, contribute the most. In terms of the influence of each classification feature on the extraction of different types of deserts, the radar backscattering coefficient VV plays the most important role in distinguishing sandy deserts; VH helps distinguish four types of deserts (rocky desert, alpine cold desert, sandy desert, and loamy desert); slope is effective in separating rocky desert and alpine cold desert from the other desert types; elevation has a significant role in the identification of alpine cold deserts; and the short-wave infrared band SR_B7 has an important role in the identification of salt crusts and saline deserts. (3) During the study period, the QTP deserts exhibited a reversing trend, and the proportion of desert area decreased from 28.62% to 26.20%. (4) Compared with other factors, slope, precipitation, elevation, vegetation type, and the human footprint have greater effects on changes in the QTP desert area, and the interactions among the factors affecting changes in the desert area all show bidirectional enhancement or nonlinear enhancement effects.

Keywords:

desert classification; Qinghai-Tibetan Plateau; SHAP; geodetector

1. Introduction

Deserts are defined by the China Scientific and Technical Nomenclature Review Committee as geographic landscapes with sparse vegetation formed under arid climatic conditions [1]. In addition, one kind of special desert type, called “cold deserts”, is widely distributed in upper alpine or high-latitude subpolar zones due to physiological drought caused by alpine and low temperatures. Desert is defined as a kind of geographic landscape with sparse or bare surface vegetation formed under arid or alpine climatic conditions [2]. Desert areas are characterized by scarce precipitation, arid climates, poor soil, and sparse vegetation [3]. The unique structure and function of desert areas differ from those of other ecosystems, providing important ecological service values in terms of wind and sand control, soil conservation, carbon sequestration and oxygen release, cultural tourism, hydrological regulation, and maintenance of biodiversity and providing a material basis for the survival and development of people living in desert regions [4,5,6]. However, the ecological environment of desert areas is fragile and vulnerable to environmental degradation and desertification due to external disturbances [7,8]. Moreover, different types of deserts have different causes and processes of formation, as well as different geomorphological conditions, vegetation conditions, soil characteristics, and exploitation and utilization, resulting in different environmental problems [9,10]. At present, desertification has affected one-fourth of the global land area and the livelihoods of nearly one billion people, causing severe environmental degradation and enormous economic losses and threatening the survival and development of human beings [11,12]. 
China is one of the countries most seriously affected by desertification in the world, with a desertified land area of approximately 2.6 million km², accounting for 27.20% of the country’s land area, and a population of nearly 400 million people affected by desertification, resulting in a direct economic loss of approximately RMB 54 billion [13]. The cold and dry climate in most parts of the Qinghai–Tibetan Plateau (QTP) has created favorable conditions for the development of desertification [14]. Desertification has become a serious problem that hinders the socioeconomic development of the plateau and threatens the ecological security of the region and even Asia [15]. As one of the most serious environmental and ecological problems in the world, desertification was included in Agenda 21 by the United Nations Conference on Environment and Development (UNCED) as an important issue affecting the sustainable development of human society [16]. Therefore, a comprehensive understanding of the distribution and change characteristics of different desert types is highly important for maintaining the function of deserts and preventing the further development of desertification.

The Qinghai–Tibetan Plateau is affected by aridity and low temperatures, and the deserts in the region are widely distributed and of various types [17]. The selection of appropriate classification features for different deserts plays an important role in accurately extracting desert types in the study area. However, owing to the similarity of the spectral response in desert areas, accurately distinguishing desert types via only optical remote sensing data is difficult [18]. In their study on desertification information extraction in the Aral Sea region, Song et al. noted that microwave backscattering is very sensitive to changes in the surface soil particle size, and this response is more obvious in arid areas with sparse vegetation [19]. Zhang et al. also noted in their discussion of the desert classification system in alpine regions that terrain is an important factor affecting the distribution of water and heat, which is closely related to the distribution of deserts [20]. In addition, some studies have shown that combining texture data with remote sensing data can improve the accuracy of image classification and object recognition [21]. However, the efficiency of preprocessing large amounts of geographic data and remote sensing data via conventional methods is low. Google Earth Engine (GEE), an online processing platform that provides rich remote sensing data and machine learning algorithms internally, can realize the acquisition, processing, analysis, and application of data in a single unit, which can greatly improve the efficiency of image processing and has been widely used in the field of remote sensing classification [22,23]. However, the current application of machine learning classification algorithms to identify desert types and distributions via the GEE platform is still relatively rare. 
This study uses the GEE data processing platform to extract the QTP desert types, which is of great practical significance for quickly understanding the current status and changing characteristics of large desert areas.

Machine learning algorithms are known for their strong generalization ability and high classification accuracy; they can also automate the processing of large amounts of data and have been shown to produce good classification results even when processing high-dimensional and complex data [24,25]. However, machine learning models are black-box models, and the relationship between the input features and model predictions is difficult to understand [26]. To understand the decision-making process within a machine learning model and to explain the impact of input variables on the prediction results, an in-depth and systematic analysis of input features is necessary [27]. Traditional feature evaluation methods, such as the mean decrease in impurity (MDI) and permutation-based feature importance measures, tend to assess the global importance of features and cannot analyze each prediction individually [28,29]. Decision tree-based machine learning classifiers (e.g., RF and GTB) internally provide an MDI-based feature importance metric, and most of the current research on classification feature importance is based on this approach [28,30]. However, this approach only evaluates the contribution of features to the overall model performance, and the unique contribution to different categories is not considered. SHAP is a game theory-based approach with the main goal of explaining the output of a black-box model by attributing the prediction results to different input features [31]. SHAP provides both local and global explanations in terms of the model’s influence process and output results and allows not only the overall importance of features to be assessed but also the degree of contribution (magnitude and direction) of each feature to the model results. Therefore, the SHAP model effectively overcomes the limitations of traditional feature importance analysis methods [32]. 
Owing to its strong theoretical foundation and rich visualization tools, it has been widely used in the field of natural and social sciences [33]. The SHAP model has been applied to classify wetland plant communities in northeastern China, soil textures, igneous rocks, and urban land use [34,35,36,37]. These findings suggest that SHAP provides a more nuanced interpretation of the predicted results, increases the transparency of machine learning classification models, and helps to improve understanding and trust in the models. This study applies the SHAP model in combination with machine learning algorithms in QTP desert type extraction, aiming to clarify the contribution and importance of different classification features in the overall desert extraction and the identification of different desert types from both global and local perspectives.

The QTP is known as the third pole of the Earth and is more sensitive to climate change and human activity disturbances than other regions [38]. Exploring the drivers of changes in desert dynamics is highly important for the prevention and control of land desertification and the protection and improvement of the ecological environment in the QTP. Many studies have shown that the dynamic change in deserts is a complex process that is influenced by both changes in natural factors and human activities [39,40]. Because of this complexity, previous studies have focused on analyzing the influencing factors of desertification from a single aspect of climate or human activity [41]. Some studies suggested that climate change affects soil quality and vegetation cover, leading to land degradation and the development of desertification [42,43]. Other studies argued that anthropogenic factors such as grazing and wood-cutting lead to the expansion of desertification [44,45]. However, these studies did not consider both the impact of natural factors and human activities on desertification changes. Therefore, there has been a gradual increase in studies using correlation analysis, principal component analysis, and residual trend analysis methods to analyze natural and anthropogenic factors affecting desert change [46,47,48]. However, these research methods cannot fully reflect the nonlinearity and complexity of the influencing factors and ignore the interaction between the factors [49,50]. The geodetector model is a spatial statistical method for detecting the spatial dissimilarity of geographic phenomena, which not only quantifies the importance of different factors in a geographic phenomenon but also explains the influence of the interaction between any two factors [51]. 
This study uses the geodetector to analyze the driving factors of QTP desert change, quantify the importance of each factor for the dynamic changes in deserts, and reveal the influence of the interactions between the factors on these changes.

The main objectives of this study are to (1) compare the performance of different machine learning classification algorithms in QTP desert typing via the GEE platform; (2) implement the SHAP method to analyze the importance of different classification features from both global and local perspectives on the basis of the optimal classification model; and (3) quantify the drivers of changes in QTP desert dynamics from 2000 to 2020 via geodetector. The research results can provide support for QTP desertification control measures and ecological environment management.

2. Materials and Methods

2.1. Overview of the Study Area

The QTP ranges from 25°–40°N, 73°–105°E, with an average altitude of more than 4000 m; it is known as the “Roof of the World”, “the Third Pole of the Earth”, and “Asia’s Water Tower”; and it is an important ecological security barrier in China [38,52,53]. The QTP is located in southwestern China, and the administrative division covers the entire territory of Qinghai Province and the Tibet Autonomous Region as well as parts of the Xinjiang Autonomous Region, Gansu Province, Yunnan Province, and Sichuan Province, with a total area of approximately 2.79 × 10⁶ km² (Figure 1). The QTP is characterized by low precipitation, uneven spatial and temporal distributions, low temperatures, large temperature differences between day and night, and strong solar radiation, and its overall climate changes from warm and humid to cold and dry from southeast to northwest. With changes in water and heat conditions, forests, meadows, grasslands, deserts, and other landscapes appear from southeast to northwest [54].

Desertification is defined as land degradation in arid, semiarid, and dry subhumid areas resulting from climate change and human activities [55]. The aridity index (AI) proposed by the United Nations Environment Programme (UNEP) is widely used to define arid zones [56]. The AI is the ratio of precipitation to potential evapotranspiration (AI = precipitation/potential evapotranspiration), and its thresholds are as follows: extremely arid zone (AI < 0.05), arid zone (0.05 ≤ AI < 0.20), semiarid zone (0.20 ≤ AI < 0.50), subhumid arid zone (0.50 ≤ AI < 0.65), and humid zone (AI ≥ 0.65). In combination with field survey work, the extraction of desert types and the analysis of desert changes in this study were carried out in the subhumid arid zone of the QTP.
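The AI zoning can be illustrated with a short Python sketch. It assumes the standard UNEP break at 0.50 between the semiarid and subhumid arid zones; the function name `aridity_zone` and its arguments are illustrative and are not taken from the study.

```python
def aridity_zone(precipitation: float, pet: float) -> str:
    """Classify a location by the aridity index AI = precipitation / PET.

    Both inputs must use the same unit (e.g., mm per year).
    """
    if pet <= 0:
        raise ValueError("potential evapotranspiration must be positive")
    ai = precipitation / pet
    if ai < 0.05:
        return "extremely arid"
    if ai < 0.20:
        return "arid"
    if ai < 0.50:
        return "semiarid"
    if ai < 0.65:
        return "subhumid arid"
    return "humid"
```

For example, a site with 300 mm of annual precipitation and 1000 mm of potential evapotranspiration has AI = 0.3 and falls in the semiarid zone.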

2.2. Data Sources and Preprocessing

2.2.1. Remote Sensing Data

The Landsat ETM+/OLI surface reflectance Tier 1 products have been atmospherically corrected, as well as topographically and geometrically corrected using a digital elevation model (DEM) and ground control points. The Landsat datasets used in this study are all Tier 1 (T1) surface reflectance data: LANDSAT/LE07/C02/T1_L2 for 2000 and 2010, and LANDSAT/LC08/C02/T1_L2 for 2020. The Landsat 7 images were atmospherically corrected using the Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS) algorithm, and the Landsat 8 images were corrected using the Land Surface Reflectance Code (LaSRC) algorithm. The Landsat image data used in this work therefore meet established geometric and radiometric quality requirements, with a revisit period of 16 days and a spatial resolution of 30 m. Vegetation information plays an important role in distinguishing deserts from non-deserts; to identify the distribution range of deserts on the Qinghai–Tibetan Plateau more accurately, Landsat images from the vegetation growing season (July and August) of 2000, 2010, and 2020 were selected on the GEE platform. However, because this period is also the rainy season on the QTP, with frequent cloudy weather, it is difficult to obtain cloud-free or nearly cloud-free images covering the entire study area; thus, we selected images from neighboring years as supplements. These image sets were then clipped, mosaicked, and cloud-masked using the quality assessment band “pixel_qa”. Finally, a median composite was created to produce the Landsat image used in the desert classification of the study area. The specific information for the Landsat dataset is given in Table A1 (Appendix A).
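The cloud masking and median compositing steps can be sketched in plain Python. This is a simplified illustration and not the GEE implementation: the image stack is represented as nested lists, and the names `median_composite`, `stack`, and `cloudy` are hypothetical.

```python
from statistics import median
from typing import List, Optional, Sequence


def median_composite(
    stack: Sequence[Sequence[float]],
    cloudy: Sequence[Sequence[bool]],
) -> List[Optional[float]]:
    """Per-pixel median over a time stack, ignoring cloud-flagged observations.

    stack[t][p] is the reflectance of pixel p in scene t.
    cloudy[t][p] is True where the quality band flags cloud or shadow.
    Pixels with no clear observation are returned as None.
    """
    n_pixels = len(stack[0])
    composite: List[Optional[float]] = []
    for p in range(n_pixels):
        # Keep only the observations of this pixel that are not masked.
        clear = [scene[p] for scene, mask in zip(stack, cloudy) if not mask[p]]
        composite.append(median(clear) if clear else None)
    return composite
```

Because the median is insensitive to a few extreme values, residual cloud or shadow pixels that escape the mask have limited influence on the composite, which is one reason this reducer is commonly chosen.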

Sentinel-1 images: The Sentinel-1 SAR dataset used in this study is COPERNICUS/S1_GRD, which has a spatial resolution of 10 m. On the GEE platform, these data have been preprocessed with the Sentinel-1 Toolbox, including thermal noise removal, radiometric calibration, and terrain correction. In this work, median composites covering the study area were generated from the Sentinel-1 images acquired in July and August 2020 in two polarization modes, vertical transmit-vertical receive (VV) and vertical transmit-horizontal receive (VH). The collection comprises 449 VV images and 881 VH images. The composites were then resampled to 30 m and used as radar feature images in the classification process.

2.2.2. Selection of Impact Factors for Desert Change

The development of and changes in deserts are influenced by a combination of factors [40]. In this work, 12 representative, easily accessible, and quantifiable factors were selected from the four dimensions of terrain, environment, climate, and human activities to investigate their influence on the dynamics of deserts. Among them, elevation (Ele), slope (Slo), and aspect (Asp) were the terrain factors; vegetation type (VT) and soil type (ST) were the environmental factors; and precipitation (Pre), temperature (Tem), wind speed (WS), and potential evapotranspiration (PET) were the climatic factors. Population density (Pop), actual livestock carrying capacity (ALCC), and the human footprint (HF) served as human activity factors. The specific information for each factor is shown in Table 1. The spatial distribution of driving factors is shown in Figure A1 (Appendix A).

The trends of the seven climate and human activity factors from 2000 to 2020 were calculated via linear regression trend analysis [57] and used to analyze their contributions to desert change. Subsequently, considering the uniformity of the sample point distribution, the extent of the study area, and the computational efficiency of the model, we used the ArcGIS Fishnet tool to generate a 10 km × 10 km grid and took the center points of the grid cells as sample points. The trend values of these seven factors and the values of the other factors were extracted at the sample points as the independent variables for the analysis of factors affecting desert change.
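The linear regression trend of a factor at one sample point is the ordinary least squares slope of its annual values against time. A minimal sketch follows; the function name `linear_trend` and the example series are illustrative assumptions, not data from the study.

```python
from typing import Sequence


def linear_trend(years: Sequence[float], values: Sequence[float]) -> float:
    """Ordinary least squares slope of values against years (units per year)."""
    n = len(years)
    if n != len(values) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_t = sum(years) / n
    mean_y = sum(values) / n
    # Slope = covariance(t, y) / variance(t).
    sxx = sum((t - mean_t) ** 2 for t in years)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(years, values))
    return sxy / sxx
```

A positive slope indicates that the factor increased over the period and a negative slope indicates a decrease; applying the function to each sample point yields the trend values used as independent variables.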

2.3. Methods

2.3.1. Desert Classification System and Sample Selection

In this work, we refer to the Chinese land use classification system established by the Chinese Academy of Sciences and the Ministry of Agriculture and construct a classification system for Qinghai–Tibetan Plateau deserts on the basis of the remote sensing spectral characteristics of deserts combined with factors such as terrain, climate, and ground material composition [58]. The image identification markers of desert types in the QTP were established by combining Landsat 8 false-color composite images with images captured by UAVs during field survey (Table 2).

The selection of classification samples should follow the principles of uniformity and representativeness [59], and accurate and sufficient samples are the prerequisite for accurate classification of desert types. On the basis of this principle, a 10 km × 10 km grid was constructed via the ArcGIS Fishnet tool, and the center points of the grid cells were extracted as sample points; representative areas were then supplemented with additional sample points. The number of sample points is given in Table A2 (Appendix A). Subsequently, the selected samples were manually interpreted with reference to Google Earth high-resolution images and 1:1 million vegetation type maps of China. The “randomColumn” algorithm in GEE allows for the random division of the sample data. The algorithm can generate random numbers from two distributions (uniform and normal); the uniform distribution is used in this study, which helps ensure that the samples of each desert type are divided according to the set ratio when the whole sample dataset is split. Therefore, based on the “randomColumn” algorithm, 70% of the sample points are used as training samples for classification, and the remaining 30% are used as validation samples for the classification results.
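The random 70/30 division can be mimicked outside GEE with a uniform random number per sample. The sketch below only illustrates the idea behind a uniform random column and is not the GEE algorithm itself; `split_samples` and its parameters are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(
    samples: Sequence[T], train_fraction: float = 0.7, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Split samples into training and validation sets with a uniform draw."""
    rng = random.Random(seed)
    train: List[T] = []
    validation: List[T] = []
    for sample in samples:
        # One uniform draw in [0, 1) per sample, like a random column value.
        if rng.random() < train_fraction:
            train.append(sample)
        else:
            validation.append(sample)
    return train, validation
```

With this kind of split, the 70/30 ratio is met only approximately, and the deviation shrinks as the number of samples grows.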

2.3.2. GEE Platform

Google Earth Engine (GEE) is a cloud-based, global-scale geospatial data analysis platform developed by Google [22]. Its data catalog provides petabyte-scale geospatial datasets, including imagery from satellites such as Landsat, Sentinel, and MODIS, as well as land cover, terrain, socioeconomic, and climate data. The GEE Code Editor is an integrated development environment (IDE) whose programming interface is built on JavaScript [60]. Through this interface, users can query, manage, visualize, download, and process multi-source remote sensing data [61].

2.3.3. Machine Learning Classification Algorithms

Based on the GEE platform, five typical machine learning classification methods are selected to evaluate the classification performance for different desert types, with the aim of finding the algorithm best suited to desert classification on the Qinghai–Tibetan Plateau. In addition, to make full use of the classification performance of each algorithm, the parameters of the different classification algorithms need to be set. In this work, the classification accuracy of each machine learning algorithm under different parameter settings is evaluated in Python 3.8 using the GridSearchCV method, and the optimal parameters are selected with the computing time of the model also taken into account [62]. Then, these parameters are applied to the corresponding classification algorithms in GEE.

Random forest (RF) is a nonparametric ensemble machine learning algorithm proposed by Breiman that generates multiple decision trees by randomly selecting subsets of training samples and variables, with the final class determined by a majority vote of all the decision trees [63,64]. Before RF is applied to classification, two parameters need to be set: the number of decision trees to be generated (ntree) and the number of features considered at each node (mtry). It has been shown that the classification accuracy of RF is sensitive to the ntree value, but as ntree continues to increase, the classification error gradually stabilizes [65]. In the parameter optimization, ntree is searched from 30 to 200 with a step size of 5, and the final value selected is 175. The mtry parameter is set to the square root of the number of input features.

Gradient tree boosting (GTB) is a tree-based ensemble machine learning algorithm that sequentially trains a set of weak learners and iteratively combines them, via gradient descent, into a strong learner that serves as the final classification model [66]. During model construction, misclassified samples are given higher weights to minimize the loss function and achieve higher classification accuracy [67]. The classification performance of GTB is strongly affected by the parameter settings; the parameters to be set include ntree, the sampling rate, shrinkage, and the loss function. The sampling rate is the proportion of samples used for each decision tree in GTB; a value that is too large leads to overfitting, whereas a value that is too small increases the bias of the fit. Shrinkage is the weight reduction factor for each decision tree; a smaller value means that more iterations are needed. Loss is the loss function for the regression. The loss and shrinkage use the default settings in GEE, the former being the least absolute deviation and the latter being 0.005, whereas ntree and the sampling rate are optimized with GridSearchCV. In this study, ntree is searched from 10 to 50 with a step size of 5, and the sampling rate is searched from 0.1 to 1 with a step size of 0.1; the final selected ntree is 45, with a sampling rate of 0.8.
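The exhaustive search over a parameter grid can be sketched with the standard library. The grid below reproduces the GTB search ranges stated above (9 ntree values × 10 sampling rates = 90 combinations); the function `grid_search` and the scoring function `toy_score` are hypothetical stand-ins, since in practice the score would be a cross-validated accuracy obtained from the trained classifier.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# Search ranges stated in the text for GTB.
GTB_GRID: Dict[str, List[float]] = {
    "ntree": list(range(10, 51, 5)),                  # 10, 15, ..., 50
    "sampling_rate": [k / 10 for k in range(1, 11)],  # 0.1, 0.2, ..., 1.0
}


def grid_search(
    score: Callable[[Dict[str, float]], float],
    grid: Dict[str, List[float]],
) -> Tuple[Dict[str, float], float]:
    """Evaluate every parameter combination and return the best one."""
    names = list(grid)
    best_params: Dict[str, float] = {}
    best_score = float("-inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(params: Dict[str, float]) -> float:
    """Hypothetical score with an arbitrary peak; NOT a result of the study."""
    return -abs(params["ntree"] - 30) / 100 - abs(params["sampling_rate"] - 0.5)
```

Calling `grid_search(toy_score, GTB_GRID)` returns the combination at which the toy score peaks; replacing `toy_score` with a real cross-validation routine gives the same kind of search that GridSearchCV automates.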

A support vector machine (SVM) is a theoretically superior machine learning algorithm that maximizes the distance between the nearest samples and the plane by creating an optimal hyperplane, also known as a decision boundary, during training to effectively separate classes [68]. The classification accuracy of support vector machines is strongly affected by the kernel function, and it is necessary to choose the appropriate kernel function to accurately create the hyperplane to minimize the error [69]. The four commonly used kernel functions are linear, polynomial, radial basis function (RBF), and sigmoid, among which the RBF kernel has been proven to be superior to the others [70]. The results of parameter optimization also show that a higher classification accuracy can be obtained via the RBF kernel than via the other kernel functions.

The classification and regression tree (CART) is a binary classifier based on a hierarchical decision tree framework. Owing to its simple structure, fast computation, and easily interpreted input-output relationships, it has been widely used in remote sensing studies [71]. The CART algorithm provided by GEE has two adjustable parameters: the maximum number of nodes and the minimum leaf population. The algorithm splits the data recursively into two branches, subject to these two parameters, until all samples are classified. For the minimum leaf population, the range searched in the parameter optimization is from 1 to 10 with a step size of 1, and the final value selected is 5. The maximum number of nodes is left at GEE’s default, None (no limit).

K-nearest neighbor (KNN) is a nonparametric algorithm that determines the class of an unknown sample from the K training samples closest to it, as measured by a distance function, by taking the majority class (or, for regression, the average of the response variables) of those neighbors [72]. Therefore, K and the distance metric are the key parameters that determine the performance of the classifier. A value of K that is too large oversmooths the decision boundary and can blur distinctions between classes, whereas a value that is too small produces complex decision boundaries that are sensitive to noise [73]. For parameter optimization, K values ranging from 1 to 20 were tested to determine the optimal K value applicable to desert classification in this study, and the final selection was 9. In addition, the distance metrics for KNN include the Euclidean, Mahalanobis, Manhattan, and Bray-Curtis distances. The final distance metric chosen was “Manhattan”.
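A minimal KNN classifier with the Manhattan distance and the selected K = 9 as the default can be sketched as follows. The function names and the toy data in the usage note are illustrative assumptions; the GEE implementation may differ in details such as tie handling.

```python
from collections import Counter
from typing import Hashable, Sequence


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan (city-block) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[Hashable],
    query: Sequence[float],
    k: int = 9,
) -> Hashable:
    """Return the majority class among the k training samples nearest to query."""
    ranked = sorted(
        zip(train_x, train_y), key=lambda pair: manhattan(pair[0], query)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

For instance, with three training samples of class "A" near the origin and three of class "B" near (5, 5), a query at (0.5, 0.5) with k = 3 is assigned to class "A".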

2.3.4. Selection of Classification Features

Spectral indexes formed by mathematical combinations of the spectral bands of remote sensing images have been widely used to distinguish various types of landforms [74,75]. In this study, nine spectral indexes were selected to assist in the classification of QTP desert types. Three vegetation indexes, the normalized difference vegetation index (NDVI), enhanced vegetation index (EVI), and modified soil adjusted vegetation index (MSAVI), were used to distinguish deserts from areas with high vegetation cover. The salinity index (SI), normalized difference snow index (NDSI), and normalized difference water index (NDWI) were used to differentiate saline deserts, snow and ice, and water bodies, respectively. The classification performance of the bare soil index (BSI) has been reported to increase with aridity, making it an effective indicator for distinguishing deserts from other soils [76]. Xiao et al. proposed a topsoil grain size index (TGSI) based on the relationship between the degree of desertification and the composition of topsoil particles [77]; in this study, TGSI was tested for differentiating desert types composed of different topsoil particles (e.g., SD, GD, and LD). Meanwhile, surface albedo, one of the important parameters of the surface radiant energy balance, varies with land cover type [78]. In addition, previous studies have shown that adding radar, terrain, and texture features to the classification process improves the accuracy of object recognition [19,20,21]. Therefore, this study combines the above spectral indexes with spectral band, radar, terrain, and texture features as the classification features for the desert types on the Qinghai–Tibetan Plateau (Table 3). The specific formulas of the spectral indexes are shown in Table A3 (Appendix A).
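Several of these indexes share the normalized difference form. As an illustration, the sketch below computes NDVI from near-infrared and red reflectance using its standard definition, (NIR − Red)/(NIR + Red); the exact formulas used in the study are those of Table A3, and the function names here are illustrative.

```python
from typing import Optional


def normalized_difference(a: float, b: float) -> Optional[float]:
    """Return (a - b) / (a + b), or None where the denominator is zero."""
    denominator = a + b
    if denominator == 0:
        return None
    return (a - b) / denominator


def ndvi(nir: float, red: float) -> Optional[float]:
    """Normalized difference vegetation index from NIR and red reflectance."""
    return normalized_difference(nir, red)
```

Sparse desert surfaces typically have similar red and near-infrared reflectance and therefore NDVI values close to zero, whereas dense vegetation yields clearly positive values.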

2.3.5. Geodetector Model

In this study, the factor detection module of the model was chosen to explore the extent to which a single factor affects changes in deserts, and the interaction detection module was used to explore whether two factors, when interacting, increase or decrease the extent to which they affect the dependent variable [51]. The explanation of the dependent variable by the factor and the interaction between the two factors is measured by the q value, which is calculated as follows:

q = 1 - \frac{\sum_{h = 1}^{L} N_{h} σ_{h}^{2}}{N σ^{2}}

(1)

where h = 1, …, L indexes the strata (i.e., the categories or grades) of the dependent or independent variable; N_h and N are the numbers of cells in stratum h and in the whole region, respectively; σ_h^2 and σ^2 are the variances of stratum h and of the whole region, respectively; and the value of q ranges from 0 to 1. The greater the value of q, the greater the degree of influence of the independent variable on the dependent variable.
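Equation (1) can be evaluated directly from the stratified data. The sketch below is a minimal illustration with population variances, so that N_h σ_h^2 equals the within-stratum sum of squared deviations; the function name `q_statistic` is an assumption and not part of the “GD” package.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def q_statistic(values: Sequence[float], strata: Sequence[Hashable]) -> float:
    """Geodetector q = 1 - sum_h(N_h * var_h) / (N * var)."""
    n = len(values)
    mean = sum(values) / n
    total_ss = sum((v - mean) ** 2 for v in values)  # equals N * sigma^2
    if total_ss == 0:
        raise ValueError("the dependent variable has zero variance")
    groups = defaultdict(list)
    for value, stratum in zip(values, strata):
        groups[stratum].append(value)
    within_ss = 0.0
    for members in groups.values():
        group_mean = sum(members) / len(members)
        # Sum of squares inside the stratum equals N_h * sigma_h^2.
        within_ss += sum((v - group_mean) ** 2 for v in members)
    return 1 - within_ss / total_ss
```

When the strata separate the values perfectly, the within-stratum variance vanishes and q = 1; when the strata carry no information about the values, q = 0.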

There are five types of interaction detection roles, and the specific assessment bases and role types are shown in Table A4 (Appendix A).
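The interaction type follows from comparing the q value of the combined factors, q(X1 ∩ X2), with the single-factor values q(X1) and q(X2). The sketch below uses the criteria as they are commonly stated for the geodetector; Table A4 remains the authoritative definition for this study, and the function name and tolerance are illustrative assumptions.

```python
def interaction_type(q1: float, q2: float, q12: float, tol: float = 1e-9) -> str:
    """Classify the interaction of two factors from their q values.

    q1 and q2 are the single-factor q values; q12 is the q value of
    the strata formed by overlaying both factors.
    """
    low, high, total = min(q1, q2), max(q1, q2), q1 + q2
    if q12 < low:
        return "nonlinear weakening"
    if q12 < high:
        return "single-factor nonlinear weakening"
    if abs(q12 - total) <= tol:
        return "independent"
    if q12 > total:
        return "nonlinear enhancement"
    # Remaining case: max(q1, q2) <= q12 < q1 + q2.
    return "bidirectional enhancement"
```

For example, q(X1) = 0.2, q(X2) = 0.3, and q(X1 ∩ X2) = 0.4 indicate bidirectional enhancement, whereas q(X1 ∩ X2) = 0.7 would indicate nonlinear enhancement.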

Geodetector requires the independent variables to be discrete, so continuous variables need to be discretized. With the “GD” package developed in R, the optimal discretization of each independent variable can be found by calculating the q value of the continuous factor under different grading methods and different numbers of breaks. This study considered five grading methods (equal interval, natural breaks, quantile, geometric interval, and standard deviation), with the number of breaks set from 4 to 7.
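The selection of an optimal discretization can be sketched as a search over grading methods and numbers of classes that keeps the combination with the largest q value. The sketch is written in Python for illustration and covers only two of the five methods; it does not reproduce the “GD” package, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def q_value(y: Sequence[float], strata: Sequence[int]) -> float:
    """Geodetector q of y for the given strata (population variances)."""
    mean = sum(y) / len(y)
    total_ss = sum((v - mean) ** 2 for v in y)
    groups = defaultdict(list)
    for value, stratum in zip(y, strata):
        groups[stratum].append(value)
    within_ss = 0.0
    for members in groups.values():
        group_mean = sum(members) / len(members)
        within_ss += sum((v - group_mean) ** 2 for v in members)
    return 1 - within_ss / total_ss


def equal_breaks(x: Sequence[float], k: int) -> List[float]:
    """Interior break points of k equal-width classes."""
    low, high = min(x), max(x)
    return [low + (high - low) * j / k for j in range(1, k)]


def quantile_breaks(x: Sequence[float], k: int) -> List[float]:
    """Interior break points of k classes with roughly equal counts."""
    ordered = sorted(x)
    return [ordered[min(len(ordered) - 1, len(ordered) * j // k)] for j in range(1, k)]


def assign(x: Sequence[float], breaks: Sequence[float]) -> List[int]:
    """Stratum index = number of break points that a value reaches or exceeds."""
    return [sum(v >= b for b in breaks) for v in x]


def best_discretization(
    x: Sequence[float],
    y: Sequence[float],
    methods: Dict[str, Callable[[Sequence[float], int], List[float]]],
    class_counts: Sequence[int] = range(4, 8),
) -> Tuple[float, str, int]:
    """Return (q, method name, number of classes) with the largest q."""
    best = None
    for name, make_breaks in methods.items():
        for k in class_counts:
            q = q_value(y, assign(x, make_breaks(x, k)))
            if best is None or q > best[0]:
                best = (q, name, k)
    return best
```

Called with `{"equal": equal_breaks, "quantile": quantile_breaks}`, the function tries 4 to 7 classes for each method, which mirrors the range of break numbers used in this study.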

2.3.6. Model Interpretation

In this work, the SHAP method is used to evaluate the contribution and importance of the classification features in identifying the different desert types on the QTP. SHAP is derived from Lloyd Shapley’s work on the fair distribution of payoffs in cooperative game theory [28]. From a game theory perspective, each feature is considered a player and the model prediction is considered the payoff; the contribution of each player is then evaluated so that the payoff can be distributed fairly [28,79]. Building on this theoretical foundation, Lundberg et al. proposed the SHAP method in 2017 [31]. It is a model-agnostic method that can explain different models [31].

In the SHAP model, the contribution of each feature to the prediction result is quantified by its Shapley value. The overall importance of a feature is then obtained as the mean of the absolute Shapley values of that feature over all samples [31]. The Shapley value (φ_i) of feature i for a given sample is calculated as follows:

φ_{i} = \frac{1}{n!} \sum_{S \subseteq N \setminus \{i\}} |S|! (n - 1 - |S|)! [f(S \cup \{i\}) - f(S)]

(2)

where S is any subset of the features that does not contain feature i, N is the set of all features, n is the total number of features, |S| is the number of features in S, and f is the model prediction function.
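
For a handful of features, Equation (2) can be evaluated exactly by enumerating every subset S. The sketch below does this for a hypothetical three-feature value function; the payoffs are invented for illustration and are not taken from the RF model in this study, and exhaustive enumeration is not how tree-based SHAP explainers compute the values in practice.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values by enumerating all subsets S of N without feature i."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (n - 1 - |S|)! / n! from Equation (2).
            weight = factorial(size) * factorial(n - 1 - size) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_value(subset):
    """Hypothetical model output when only the features in `subset` are known."""
    v = 0.0
    if "elevation" in subset:
        v += 0.30
    if "slope" in subset:
        v += 0.20
    if "VV" in subset:
        v += 0.10
    if {"elevation", "slope"} <= subset:
        v += 0.10  # interaction payoff, shared equally by the two features
    return v


features = ["elevation", "slope", "VV"]
phi = shapley_values(features, toy_value)
print({name: round(val, 4) for name, val in phi.items()})
```

The interaction payoff of 0.10 is split evenly between elevation and slope, and the three values sum to the difference between the output with all features and the output with none.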

SHAP expresses the predicted value of the model as the sum of the attribution values of the input features and defines a linear function g of binary variables to explain the model f [33]:

f (x) = g (x^{'}) = φ_{0} + \sum_{i = 1}^{n} φ_{i}

(3)

where φ₀ is the constant model output when all inputs are missing, x is the sample being interpreted, and x′ is its simplified binary representation. That is, for each sample, the prediction of the model is the sum of a constant and the Shapley values of all features.
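
The two quantities described above, the per-sample reconstruction in Equation (3) and the global importance obtained from mean absolute Shapley values, can be illustrated with a small table of values. The numbers, the base value, and the function names below are hypothetical and serve only to show the arithmetic.

```python
def reconstruct_prediction(base_value, shap_row):
    """Equation (3): prediction = constant phi_0 plus the sum of the Shapley values."""
    return base_value + sum(shap_row)


def global_importance(shap_rows, names):
    """Mean absolute Shapley value of each feature over all samples."""
    n = len(shap_rows)
    return {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(names)
    }


# Hypothetical Shapley values: one row per sample, one column per feature.
names = ["elevation", "slope", "VV"]
shap_rows = [
    [0.20, -0.10, 0.05],
    [-0.40, 0.30, 0.05],
    [0.30, -0.20, -0.20],
]
base_value = 0.14  # hypothetical phi_0

importance = global_importance(shap_rows, names)
ranking = sorted(importance, key=importance.get, reverse=True)
first_prediction = reconstruct_prediction(base_value, shap_rows[0])
print(ranking, round(first_prediction, 2))
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling, which is why a feature can rank highly overall even when its effect changes sign between samples.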

In this study, we imported the SHAP package (version 0.44.1) into Python 3.8 and used the “TreeExplainer” method to interpret the contribution of each feature to the prediction results in the machine learning classification process.

3. Results

3.1. Differences in Classification Performance of Different Machine Learning Algorithms

This section compares the classification results of five machine learning algorithms (RF, GTB, CART, KNN, and SVM) for QTP desert types in 2020 on the GEE platform. Their classification performance is evaluated using the confusion matrix-based overall accuracy (OA), kappa coefficient, producer accuracy (PA), and user accuracy (UA). Subsequently, the effectiveness of the five algorithms in extracting different desert types is compared in five representative regions that contain various desert types. The locations of these regions and the confusion matrices for the five machine learning algorithms are shown in Appendix A.
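
The four accuracy measures can all be derived from one confusion matrix. The sketch below assumes that rows hold the reference classes and columns hold the predicted classes; the three-class matrix is hypothetical and is not one of the matrices in Appendix A.

```python
def accuracy_metrics(matrix):
    """matrix[i][j] = number of samples of reference class i predicted as class j."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]  # reference totals
    col_sums = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # predicted totals
    diagonal = [matrix[i][i] for i in range(k)]

    oa = sum(diagonal) / total
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    kappa = (oa - expected) / (1 - expected)
    # Producer accuracy: correct / reference total; user accuracy: correct / predicted total.
    pa = [d / r if r else float("nan") for d, r in zip(diagonal, row_sums)]
    ua = [d / c if c else float("nan") for d, c in zip(diagonal, col_sums)]
    return {"OA": oa, "kappa": kappa, "PA": pa, "UA": ua}


# Hypothetical confusion matrix for three classes.
confusion = [
    [50, 3, 2],
    [4, 40, 6],
    [1, 2, 42],
]
metrics = accuracy_metrics(confusion)
print(round(metrics["OA"], 4), round(metrics["kappa"], 4))
print([round(v, 3) for v in metrics["PA"]], [round(v, 3) for v in metrics["UA"]])
```

PA measures omission errors for each reference class, whereas UA measures commission errors for each mapped class, so a class can score well on one and poorly on the other.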

To evaluate the differences in the overall classification performance of the five algorithms, this paper compares their overall classification results (Figure 2) and classification accuracies (Table 4). The comparison of classification results reveals that three methods, RF, GTB, and CART, are more consistent with one another in the spatial distributions of the different desert types. KNN and SVM successfully recognize some sandy deserts, gravelly deserts, salt crust, and saline deserts distributed in the Qaidam Basin, but they cannot effectively recognize the remaining three types: loamy deserts, rocky deserts, and alpine cold deserts. The comparison of classification accuracies shows that RF and GTB have the highest accuracies, followed by CART, whereas KNN and SVM have poorer accuracies and cannot recognize the different deserts well. RF has the highest classification accuracy of the five, with an OA of 87.11% and a kappa coefficient of 0.83.

The classification accuracy of the five machine learning algorithms for each desert type was evaluated using PA and UA. Table 5 shows that the PA of GD, SM, MS, LD, RD, and AC is highest with the RF classifier. The PA of the RF classifier for SD is second only to that of the GTB classifier, and the difference is only 0.01%. In terms of UA, SM has its highest UA with GTB and MS has its highest UA with CART, whereas the other five desert types have their highest UA with RF. Considering PA and UA together, CART performs better on SD, GD, MS, and AC than on the other desert types. KNN achieves good accuracy only for SD and has lower accuracy for the other desert types. SVM is the least effective in classifying the various desert types and cannot distinguish SM, MS, and LD. For SD, the RF, GTB, and CART algorithms all achieved satisfactory accuracy, with PA and UA above 90%. All five algorithms were less effective for SM and LD than for the other desert types, with varying degrees of misclassification. Overall, RF has the highest classification accuracy for most of the desert types.

To evaluate the differences in the recognition effects of different machine learning algorithms for each desert type, we chose the classification results for five regions containing various desert types for a detailed comparison (Figure 3). The comparison reveals that in region (a), all five classification methods recognize the SD better, with high consistency for boundary discrimination; however, the CART, KNN, and SVM algorithms present different degrees of confusion in distinguishing between GD, SM, and MS, whereas the RF and GTB algorithms achieve different recognition results between GD and MS. In region (b), the RF and GTB algorithms have better recognition effects on various types of deserts, and the classification results of the CART classifier show greater salt-and-pepper effects. The KNN and SVM algorithms cannot effectively recognize desert types other than MS, especially confusing RD and nondesert classes, which is not conducive to accurately describing the distribution patterns of different desert types. In region (c), there are four desert types, including GD, LD, RD, and AC, and among the five machine learning classification algorithms, RF, GTB, and CART have better recognition effects for these four desert types, as well as snow and ice, and are more accurate in recognizing the boundaries of different deserts. However, the KNN and SVM algorithms cannot describe the boundary between deserts and nondeserts well, and at the same time, they are less effective in recognizing the three types of deserts, i.e., LD, RD, and AC. Region (d) mainly contains five different types of deserts, such as GD, SM, LD, RD, and AC, and nondeserts, such as water bodies and vegetation. The classification results of different algorithms for this region show that RF, GTB, and CART have the ability to recognize the complex desert types in the region. 
Region (e), on the other hand, includes two types of distributed deserts, GD and RD, and some areas with high vegetation coverage. The classification results reveal that KNN and SVM are not effective at delineating deserts and nondeserts, whereas the other three classification algorithms yield better and more consistent recognition results.

Overall, the RF algorithm classifies QTP desert types most accurately, followed by the GTB. These two algorithms can not only better differentiate deserts and nondeserts but also offer greater consistency in the classification of different desert types. Although the CART classifier has a better classification effect in general, it is not as good as RF and GTB in terms of local detail. KNN and SVM are less effective in classifying different desert types and are not applicable to QTP areas where deserts are widely distributed and complex.

3.2. Classification Feature Importance Analysis Based on SHAP

The above results indicate that the RF classification model performs the best in terms of QTP desert type extraction. Therefore, this section explores the contribution and importance of different features in the RF classification process for desert classification on the Qinghai-Tibetan Plateau via the SHAP method. The bar chart uses the absolute mean value of SHAP to measure the importance of different features, which provides a global interpretation perspective for the RF classification model (Figure 4a).

Figure 4a shows that the elevation among the terrain features has the highest importance in desert classification and is the most important feature in determining the prediction results of the RF model, with an average SHAP value of 0.305, whereas the slope feature has the second highest overall contribution to the classification, with an average SHAP value of 0.289. Among the radar features, both VV and VH play more important roles in the classification process. The average SHAP value of VV is 0.271, whereas that of VH is 0.236. In addition, the texture feature also plays a more important role in the classification process, with an average SHAP value of 0.205. The TGSI and BSI are more important than the other spectral indices. Moreover, among the Landsat image bands, the two shortwave infrared bands, SR_B7 and SR_B6, have higher mean SHAP values and contribute more to the classification than the other bands. The overall contribution of the other features to the classification of desert types was smaller, and the difference between them was not significant. Overall, when extracting the QTP desert types, the five different dimensions of classification features are, in descending order of importance, as follows: terrain features, radar features, texture features, spectral indices, and spectral bands. Among them, the five features of elevation, slope, VV, VH, and GLCM contributed most prominently to desert classification and together explained 66.4% of the RF model results for the classification of QTP deserts.

In addition, the swarm plots generated via the SHAP method were used to further explain the extent and direction of the influence of individual sample feature size on the identification of each desert type (Figure 4b–h). The vertical coordinates in the graph represent the importance of the features, which are ordered from top to bottom in order from large to small, and the color of each feature value from small to large ranges from dark blue to light yellow. The horizontal coordinates represent the size of the SHAP value: a value less than 0 indicates that the sample feature has a negative influence on the prediction of the classification result, and a value greater than 0 indicates that the sample feature has a positive influence on the prediction of the classification result. Figure 4b shows that VV, elevation, VH, GLCM, TGSI, and BSI have greater impacts on SD in the classification process. When the values of VV, elevation, VH, and GLCM features increase, the probability of the sample being classified as SD decreases, which has a negative impact, whereas TGSI and BSI have positive impacts. Figure 4c shows that slope, elevation, the TGSI, and VH have a negative influence and that the GLCM and VV have a positive influence when classifying the GD. When predicting saline deserts (SM and MS), the magnitudes of features such as elevation, slope, SR_B7, and VV have the same direction of influence on the prediction of classification results, which is shown by the fact that elevation, slope, and SR_B7 have a negative influence, whereas VV has a positive influence. However, there is a slight difference in the degree of influence, with slope having the greatest influence on SM and elevation having the greatest influence on MS. In addition, the BSI imparts a greater contribution to the prediction of SM than that of MS, and the shortwave infrared band, SR_B6, also plays a more important role in the classification of MS (Figure 4d,e). 
Figure 4f shows that elevation, VV, and SR_B5 have positive effects on the prediction results when classifying LD, and slope, VH, and GLCM have negative effects. Figure 4g shows that the three features slope, VH, and VV have a positive influence on the prediction of RD classification results, whereas the GLCM, elevation, and BSI have a negative influence. In the classification of AC, elevation, VH, slope, and VV have positive influences on the prediction of the results, whereas the two shortwave infrared bands SR_B7 and SR_B6 have negative influences (Figure 4h).

Finally, comparing how the five features with the greatest overall importance affect the prediction of each desert type leads to the following conclusions. VV and GLCM both contribute strongly to the prediction of SD and GD, but their influences on the two types are opposite as the feature value increases, which indicates that VV and GLCM are effective in distinguishing these two types of deserts. The influence of VH on the prediction of RD and AC differs from its influence on SD and LD, which indicates that VH plays an important role in differentiating these types of deserts. Slope has a positive influence on the prediction of RD and AC and a negative influence on the other desert types, which suggests that RD and AC are located in areas with steep slopes, whereas the other deserts are located in areas with gentle slopes. The distribution of the elevation values shows that LD and AC occur in relatively high elevation areas; the effect on AC is greater, especially at the largest elevation values, which indicates that AC lies at higher elevations than the other desert types. Moreover, some features that are not important in the overall classification are still important for specific desert types; for example, SR_B7 is important in predicting salt crusts and saline deserts, and the TGSI affects SD and GD differently, which helps to distinguish these two types.

3.3. Spatial Distribution and Changes in QTP Deserts

Figure 5 shows the spatial distributions of different desert types in the QTP region in 2000, 2010, and 2020. Overall, deserts are concentrated in the western and northern parts of the QTP, in the Qaidam Basin, and in the high mountains in the southern part of the QTP. Specifically, SD is distributed mainly in the flat area at the inner edge of the Qaidam Basin. GD is distributed mainly in the piedmont alluvial plains of the Qaidam Basin and the Altun Mountains, with some occurrences around rivers and lakes. SM and MS are concentrated in the low-lying area in the center of the Qaidam Basin, with sporadic occurrences in dry riverbeds and lake areas. LD is concentrated on the southern side of the Kunlun Mountains and on the Qiangtang Plateau. RD and AC are widely distributed at relatively high altitudes and on steep slopes in the Qilian Mountains, Altun Mountains, Kunlun Mountains, Kangdese Mountains, and the Ali area in the western part of the QTP, and the Kangdese Mountains and Himalayas in the southern part of the plateau are occupied mainly by alpine cold deserts affected by low temperatures.

Table 6 and Figure 6 show the area statistics of QTP deserts in different periods and the spatial distribution of the changes in QTP deserts during the study period, respectively. Table 6 shows that the total area of QTP deserts decreased continuously from 2000 to 2020: the proportion of desert area on the QTP fell from 28.62% to 26.20%, an overall decrease of 66,955.47 km². During the study period, the areas of the gravelly and loamy deserts changed greatly and showed a decreasing trend. The sandy deserts and the moderate and severe saline deserts also changed markedly, but their decreases were small relative to the overall change on the QTP. The areas of the salt crust and mild saline deserts increased slightly, and the overall changes in the rocky deserts and alpine cold deserts were small. Therefore, the reversal of QTP deserts resulted mainly from the transformation of gravelly and loamy deserts to non-desert areas.

Figure 7 shows that the transfers between four desert types (GD, LD, RD, and AC) and the non-desert types are relatively large, and among the desert types, the transfers between GD and LD and between RD and AC are the largest. SD, SM, and MS show smaller transfers because of their smaller overall areas. Table 6 and Figure 7 also show that the reversal of QTP deserts from 2010 to 2020 was larger than that from 2000 to 2010: the desert area decreased by 19,643.96 km² from 2000 to 2010 and by 47,311.51 km² from 2010 to 2020. During the study period, the reversal of QTP deserts occurred mainly in the northwestern part of the Qiangtang Plateau, the northeastern part of the Qaidam Basin, and around the salt lakes in the basin, whereas the expansion of QTP deserts occurred mainly on the flanks of the Kangdese Mountains and the Himalayas.
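
A transfer analysis of this kind amounts to cross-tabulating the class of each pixel in two periods. The sketch below builds such a transition table from two short, hypothetical label sequences (the label "ND" for non-desert is an assumption) and then checks that the two decadal decreases reported above add up to the overall decrease of 66,955.47 km².

```python
from collections import Counter


def transition_counts(before, after):
    """Count pixels for every (class in period 1, class in period 2) pair."""
    if len(before) != len(after):
        raise ValueError("both maps must cover the same pixels")
    return Counter(zip(before, after))


def net_change(counts, cls):
    """Pixels gained by `cls` from other classes minus pixels it lost to them."""
    gained = sum(n for (a, b), n in counts.items() if b == cls and a != cls)
    lost = sum(n for (a, b), n in counts.items() if a == cls and b != cls)
    return gained - lost


# Hypothetical per-pixel labels for two periods (not from the paper).
labels_t1 = ["GD", "GD", "LD", "RD", "ND", "GD"]
labels_t2 = ["ND", "GD", "ND", "AC", "GD", "GD"]
counts = transition_counts(labels_t1, labels_t2)
print(counts[("GD", "ND")], net_change(counts, "ND"), net_change(counts, "LD"))

# Consistency check on the desert area decreases reported in the text (km^2).
decrease_2000_2010 = 19643.96
decrease_2010_2020 = 47311.51
total_decrease = decrease_2000_2010 + decrease_2010_2020
ratio = decrease_2010_2020 / decrease_2000_2010
print(round(total_decrease, 2), round(ratio, 2))
```

Multiplying each count by the pixel area would convert the table into areas comparable to those in Table 6.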

3.4. Analysis of Factors Influencing Changes in QTP Deserts

Because the data values of the driving factors are distributed differently, we used the “GD” package in R to search, for each driving factor, for the grading method and number of breaks that yield the maximum q value, so that the discretization captures as much of the difference among strata as possible. Figure 8 shows the resulting grading of each driving factor. The optimal discretization method is natural breaks for Ele and WS; quantile for Slo, Tem, Pet, Pop, ALCC, and HF; standard deviation for Pre; and geometric interval for Asp.

Through comparison, it was found that the order of influence of single factors on QTP desert change was Slo > Pre > Ele > VT > HF > ALCC > WS > Pop > ST > Tem > Pet > Asp, but Asp did not pass the significance test of 0.05 (Figure 9). Slo, among the terrain factors, had the most significant effect on QTP desert change, with the strongest explanatory power and a q value of 0.2035. Ele among the terrain factors also had a strong effect on desert change. Among the climate factors, Pre also exerted the most significant effect on desert change, second only to Slo, with a q value of 0.1516. The factor with the greatest influence on desert change among the human activity factors was HF, with a q value of 0.0705, and among the environmental factors, VT had a significantly greater influence on desert change than ST. Among the 12 factors, the top 5 factors with the greatest influence on desert change represent different types of factors, which indicates that the change in the QTP desert area is jointly influenced by multiple factors.

The geodetector interaction detection results (Figure 10) revealed that the interaction between any two factors had greater explanatory power for the changes in the QTP deserts than either factor alone, which manifested as either bidirectional enhancement or nonlinear enhancement. The interaction between Pre and Slo shows the strongest influence, with a q value of 0.3342, indicating bidirectional enhancement. It is followed by Ele ∩ Slo (q value of 0.3182), Pre ∩ Ele (q value of 0.2884), Slo ∩ VT (q value of 0.2695), and Slo ∩ HF (q value of 0.2625), which complete the top five interactions; among these, the interaction between Pre and Ele shows nonlinear enhancement. In addition, the interactions of Asp, Tem, Pet, and Pop with the other factors were all nonlinearly enhanced. This suggests that the combination of terrain, climatic, environmental, and human activity factors greatly enhances the effect on QTP desert change.
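
The interaction type follows from comparing the joint q value with the two single-factor q values. The sketch below uses the criteria commonly applied with geodetector; Table A4 is not reproduced here, so its exact wording is assumed to match. The function name and the tolerance are hypothetical. Applied to the q values reported above for Slo (0.2035), Pre (0.1516), and Pre ∩ Slo (0.3342), it returns bidirectional enhancement, in agreement with the text.

```python
def interaction_type(q1, q2, q12, tol=1e-9):
    """Classify a two-factor interaction from single-factor and joint q values."""
    low, high, total = min(q1, q2), max(q1, q2), q1 + q2
    if abs(q12 - total) <= tol:
        return "independent"
    if q12 > total:
        return "nonlinear enhancement"
    if q12 > high:
        return "bidirectional enhancement"
    if q12 > low:
        return "single-factor nonlinear weakening"
    return "nonlinear weakening"


# q values reported in Section 3.4 for Slo, Pre, and their interaction.
pre_slo = interaction_type(0.2035, 0.1516, 0.3342)
print(pre_slo)
```

Here the joint q value exceeds the larger single-factor value (0.2035) but stays below the sum of the two (0.3551), which is what separates bidirectional from nonlinear enhancement.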

4. Discussion

4.1. Machine Learning Algorithm Classification Performance

In this study, we used five different machine learning classification algorithms to extract the desert types of the QTP and selected the optimal algorithm. The RF classifier classifies the target from the predictions of an ensemble of decision trees and has been widely used by virtue of its low sensitivity to missing values and unbalanced training data, as well as its good resistance to noise and overfitting [64]. By comparing the classification accuracy and the portrayal of local detail of each algorithm, we found that RF is more accurate than the other algorithms and also portrays each desert type better. This finding is in line with the results of several other studies. Pizarro et al. compared the performance of six machine learning methods (RF, CART, GTB, SVM, minimum distance, and naive Bayes) for Andean ecosystem land cover classification on the GEE platform and found that RF performed better [80]. Yang et al., in their study on the classification of land cover in the Qilian Mountains, noted that the classification accuracy of RF was higher than that of CART and SVM [81]. It has also been reported that, compared with other classifiers, RF is less sensitive to parameter settings and better able to deal with a large number of features [82], which may be one of the reasons why RF displays a higher classification performance. The comparison of the overall classification results and local details also showed that RF and GTB were more consistent with each other in the extraction of desert types. This may be attributed to the similarity of their model structures: both are ensemble algorithms based on decision trees. 
The CART classifier, although it is also a decision tree-based classification model, as a binary decision tree classifier, has a simpler structure than the RF and GTB; thus, its classification performance is relatively poor. In addition, our study shows that KNN and SVM display poorer performance in classifying desert types, which may be related to their classification principles and sample selection. KNN is a proximity algorithm, and its classification prediction is strongly influenced by neighboring samples [83]. SVM classification is based on the idea that only training samples located on class boundaries can effectively discriminate between classes [69]. In selecting the samples, we followed the principle of uniformity, which makes KNN and SVM susceptible to the influence of different samples, thus resulting in misclassification, and this effect is more obvious in regions with complex desert types (Figure 3 region c and region d).

4.2. Impact of Classification Features on Desert Type Identification

The importance of classification features based on the SHAP model revealed that the terrain features of elevation and slope had the highest overall importance in the extraction of desert types. This is because terrain is an important factor affecting water and heat distributions and influences the spatial distribution of deserts through direct or indirect effects [84]. Moreover, SAR data reflect the dielectric properties of surface features, soil roughness, particle size, and other information, which makes them highly suitable for the research and monitoring of desert areas [85]. Zhu et al. reported that, because the surface roughness of rocks differs between sandy plains and paleochannels, the inversion of SAR signals in desert areas also differed significantly [86]. The importance of each classification feature in the extraction of different desert types shows that VV and VH play important roles in identifying sandy deserts, which further confirms the applicability of radar data in the study of desert areas. The GLCM is a commonly used method for the statistical analysis of texture; it reflects the uniformity of the gray-level distribution of the image, the depth of the texture grooves, the degree of brightness and darkness of the image, and other information [87]. With this information, texture features help separate the “same object, different spectra” and “different objects, same spectrum” phenomena in images, thus improving the classification accuracy [88]. In our study, because of differences in composition and terrain, desert types with similar spectra show different texture information. Thus, texture features play an important role in the classification. 
In addition, because shortwave infrared radiation is more sensitive to moisture changes [89], the importance of shortwave infrared signals is greater than that of the near-infrared and visible light bands in the classification process, especially in the identification of saline deserts, which is second only to the importance of elevation and slope. This is the case because the saline deserts in the QTP are distributed mainly in the Qaidam Basin, where the climate is arid and evaporation is high, and salts accumulate on the surface because of the continuous evaporation of soil and surface water [90]. Spectral indices based on combinations of spectral bands can further increase the spectral variability among different features and are now widely used to classify various types of features [74]. However, in our study, we found that the commonly used vegetation indices were less helpful in classifying desert types, while the TGSI showed greater feature importance than the other indices. This is because all desert types have low vegetation cover, and the vegetation index is less sensitive to areas with low vegetation cover and therefore has some limitations when applied in arid desert areas [91]. Additionally, since there is a significant difference in grain size between sandy deserts and gravelly deserts, the TGSI has relatively high importance in distinguishing between these two types of deserts.

4.3. Impact Factors on Desert Change

Desert area ecosystems are very fragile and sensitive to environmental changes and are susceptible to changes due to external disturbances. In this study, we analyzed the effects of 12 different types of factors on QTP desert change. The results of factor detection revealed that the terrain factor plays an important role in QTP desert change, and Slo has the highest q statistic value. This is because terrain factors (Ele, Slo) can directly influence surface runoff, soil erosion, and local microclimate conditions through the redistribution of hydrothermal conditions, thus exerting an important influence on desert development and change [92,93]. In addition, Chen et al. noted that the QTP, as the third pole of the earth, is the most sensitive region to climate change and is the most significant region of climate wetting in China [53]. Precipitation directly affects changes in vegetation, which in turn has an important influence on the distribution of and changes in deserts. Our results also revealed that among climate factors, precipitation has the greatest influence on desert change. This finding is consistent with the results of the study of QTP desertification influencing factors by Cuo et al. [94]. Vegetation is the result of the combined influence of terrain, climate, soil, and other factors, and the response of different vegetation types to climate change is spatially heterogeneous [95]. Our study revealed that among environmental factors, vegetation type also has a relatively important influence on desert change.

In addition, animal husbandry is the traditional land use in the QTP area, and overgrazing leads to serious grassland degradation, land desertification, and other problems [96,97]. To restore and protect the environment, the Chinese government has implemented a series of ecological restoration projects, such as the development and protection of grasslands through the construction of fences, grazing bans, and rotational grazing, as well as soil and water erosion control and desertification control [98,99,100,101]. After analyzing the ecological benefits generated by the management and protection measures in different areas of the QTP, Ma et al. noted that the implementation of a series of ecological restoration projects effectively curbed the trend of ecological degradation [102]. Zhao et al. noted that a series of ecological restoration projects effectively promoted an increase in vegetation cover on the QTP [103]. The actual livestock load in most areas of the QTP has been effectively controlled, and the geodetector factor detection results show that HF and ALCC also have an important influence on changes in the QTP deserts. Moreover, the development of and changes in deserts are not the result of a single factor but are affected by multiple factors, such as terrain, climate, the environment, and human activities [49]. The results of the interaction analysis based on geodetector show that the interaction between any two factors is either bidirectionally or nonlinearly enhanced, which further confirms this view.

4.4. Advantages and Limitations of This Study

In this study, we compared the performance of five different machine learning algorithms in desert type extraction, from which we selected the optimal classification method, which is highly important for effectively and quickly understanding the spatial distribution of QTP deserts. Subsequently, the importance of different features in the classification and their positive and negative impacts on different desert types were analyzed via the SHAP model from both global and local perspectives, which effectively compensated for the shortcomings of traditional importance feature assessment. In addition, the factors affecting desert change were quantified via geodetector, and the effects of interactions among the factors were analyzed, which is conducive to obtaining a comprehensive understanding of the dynamic process of desert change and has important reference value for QTP environmental protection and ecological restoration.

However, this study still has some shortcomings, such as the limited time range of the Sentinel-1 data and the “salt and pepper effect” of pixel-based classification, both of which affect the recognition of changes in deserts. Although we merged patches smaller than 6 × 6 pixels into the surrounding patches to minimize the influence of the “salt and pepper effect” on the QTP desert classification and change analysis, it may still influence the results. Second, to keep the images in each period as temporally consistent as possible, we used only July and August images for the median composite; however, because of the large extent of the Qinghai–Tibetan Plateau, the image information at different locations will still differ somewhat in time, which may affect the classification results. In addition, because of the similarity in spectral response between salt crust and mild saline deserts, they are treated as one type in the classification process. However, they differ greatly in the degree of salinity, and we will try to use more information and methods to differentiate them in subsequent studies. Moreover, owing to the limited distribution of meteorological stations in the QTP region, the interpolated data used for each meteorological element cannot fully reflect the actual meteorological conditions on site, which may affect the geodetector results. Furthermore, although we optimized the discretization of continuous variables in the geodetector model, grid density still affects the detection results. In the future, we will consider using deep learning algorithms and object-oriented classification methods for classification and related problems.

Despite these limitations, this study used SHAP and geodetector to deepen the understanding of desert distribution and change, which provided data support for desertification control on the QTP. In the prevention and control of desertification, it is important to pay attention to adjusting the land-use structure, reasonably allocating the structure of agricultural, animal husbandry, and industrial production, and preventing soil erosion and land desertification caused by human activities (e.g., returning sloping farmland to forests or grasslands, prohibiting and rotating grazing, and ecological protection and restoration of the mining areas). In addition, a series of ecological restoration projects (e.g., protection and restoration of vegetation, establishment of sand barriers, etc.) can be carried out in accordance with local conditions to improve soil quality and reduce the migration capacity of wind and sand, with the purpose of controlling the expansion of desertification.

5. Conclusions

This study compared the performance of five machine learning classification algorithms (RF, GTB, CART, KNN, and SVM) for QTP desert type extraction to select the optimal algorithm. The results showed that the RF classifier achieved the best performance, with an overall accuracy of 87.11% and a kappa coefficient of 0.83. Moreover, a comparison of the PA and UA of different desert types across the algorithms, combined with a comparison of local classification details, showed that RF performs better in recognizing most of the desert types (SD, GD, LD, RD, and AC). The SHAP method was then used to analyze the importance of different classification features in the RF classification process from both global and local perspectives. From the perspective of overall desert classification, the most prominent contributions come from elevation and slope, followed by VV, VH, and GLCM. From the perspective of the impact of various classification features on the extraction of different types of deserts, the radar backscatter coefficient VV helps distinguish sandy deserts from other types of deserts; VH helps distinguish RD, AC, SD, and LD deserts; slope helps distinguish RD and AC from other types of deserts; elevation plays a more prominent role in distinguishing alpine cold deserts from other types of deserts; and the short-wave infrared band SR_B7 plays an important role in the identification of salt crusts and saline deserts. In addition, an analysis of the changes in the QTP desert from 2000 to 2020 revealed that the QTP desert area exhibited an overall reversal trend during the study period, with the share of the QTP covered by desert decreasing from 28.62% to 26.20%. The desert area decreased by 19,643.96 km² from 2000 to 2010 and by 47,311.51 km² from 2010 to 2020, indicating that the rate of QTP desert reversal accelerated significantly after 2010.
The reversal of deserts mainly comes from the transformation of GD and LD to non-deserts, which are mainly distributed in the northwestern part of the Qiangtang Plateau, the northeastern part of the Qaidam Basin, and the areas around the salt lakes in the basin. The geodetector results reveal that the key factors affecting desert change include slope (Slo), precipitation (Pre), elevation (Ele), vegetation type (VT), and human footprint (HF). The interaction results indicate that changes in the QTP desert are influenced by multiple factors, such as terrain, climate, environment, and human activities, and that the interactions among these factors act on desert change through bidirectional enhancement or nonlinear enhancement.

Author Contributions

Conceptualization, R.L. and S.L.; methodology, R.L.; software, R.L. and Y.Z.; validation, R.L. and S.L.; investigation, W.K. and Y.Z.; writing—original draft preparation, R.L.; writing—review and editing, S.L. and H.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Second Tibetan Plateau Scientific Expedition and Research Program (The Ministry of Science and Technology of the People’s Republic of China, 2019QZKK0305).

Data Availability Statement

The detailed information and acquisition method of the data used have been described in this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Specific information of the Landsat dataset.

	Datasets	Years	Months	Number of Images	Path and Row
2000	LANDSAT/LE07/C02/T1_L2	1998–2003	July and August	1043	Path: 131–151; Row: 32–41
2010	LANDSAT/LE07/C02/T1_L2	2007–2013	July and August	1782	Path: 131–151; Row: 32–41
2020	LANDSAT/LC08/C02/T1_L2	2019–2022	July and August	1723	Path: 131–151; Row: 32–41

Table A2. Number of sample points in 2000, 2010, and 2020.

Desert Type	2000	2010	2020
SD	4546	4517	4598
GD	3505	3422	3286
SM	297	293	295
MS	181	177	165
LD	1189	1097	1006
RD	2057	1950	1889
AC	1909	1893	1869
Total	13,684	13,349	13,108

Table A3. The construction formulas of spectral indexes.

Spectral Indexes	Formulas
TGSI	TGSI = $(ρ_{RED} - ρ_{BLUE}) / (ρ_{RED} + ρ_{GREEN} + ρ_{BLUE})$
BSI	BSI = $(ρ_{RED} + ρ_{SWIR 1} - ρ_{NIR} - ρ_{BLUE}) / (ρ_{RED} + ρ_{SWIR 1} + ρ_{NIR} + ρ_{BLUE})$
NDVI	NDVI = $(ρ_{NIR} - ρ_{RED}) / (ρ_{NIR} + ρ_{RED})$
EVI	EVI = $2.5 \times (ρ_{NIR} - ρ_{RED}) / (ρ_{NIR} + 6 \times ρ_{RED} - 7.5 \times ρ_{BLUE} + 1)$
MSAVI	MSAVI = $(2 \times ρ_{NIR} + 1 - \sqrt[2]{{(2 \times ρ_{NIR} + 1)}^{2} - 8 \times (ρ_{NIR} - ρ_{RED})}) / 2$
SI	SI = $\sqrt[2]{ρ_{BLUE} \times ρ_{RED}}$
NDSI	NDSI = $(ρ_{GREEN} - ρ_{SWIR 1}) / (ρ_{GREEN} + ρ_{SWIR 1})$
NDWI	NDWI = $(ρ_{GREEN} - ρ_{NIR}) / (ρ_{GREEN} + ρ_{NIR})$
Albedo	Albedo = $0.356 ρ_{BLUE} + 0.13 ρ_{RED} + 0.373 ρ_{NIR} + 0.085 ρ_{SWIR 1} + 0.072 ρ_{SWIR 2} - 0.0018$

In the formulas, $ρ_{BLUE}$, $ρ_{GREEN}$, $ρ_{RED}$, $ρ_{NIR}$, $ρ_{SWIR 1}$, and $ρ_{SWIR 2}$ represent the surface reflectance in the blue, green, red, near-infrared, shortwave infrared 1, and shortwave infrared 2 bands, respectively. These spectral indexes are calculated based on Landsat images with a spatial resolution of 30 m.
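The formulas in Table A3 can be sketched for a single pixel as follows. This is a minimal illustration rather than the processing chain used in the study: the function name `spectral_indices` and the sample reflectance values are hypothetical, the inputs are assumed to be surface reflectance on a 0–1 scale (with any product scale factors already applied), and zero denominators are not handled.

```python
import math


def spectral_indices(blue, green, red, nir, swir1, swir2):
    """Compute the Table A3 spectral indexes from surface reflectance (0-1 scale)."""
    return {
        "TGSI": (red - blue) / (red + green + blue),
        "BSI": (red + swir1 - nir - blue) / (red + swir1 + nir + blue),
        "NDVI": (nir - red) / (nir + red),
        "EVI": 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1),
        "MSAVI": (2 * nir + 1
                  - math.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2,
        "SI": math.sqrt(blue * red),
        "NDSI": (green - swir1) / (green + swir1),
        "NDWI": (green - nir) / (green + nir),
        "Albedo": (0.356 * blue + 0.13 * red + 0.373 * nir
                   + 0.085 * swir1 + 0.072 * swir2 - 0.0018),
    }


# Hypothetical reflectance values for one sparsely vegetated pixel
idx = spectral_indices(blue=0.10, green=0.15, red=0.20,
                       nir=0.30, swir1=0.35, swir2=0.30)
for name, value in idx.items():
    print(f"{name}: {value:.4f}")
```

For whole images, the same expressions can be applied band-wise with array arithmetic.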

Table A4. Types of interactions and basis of assessment.

Basis of Assessment	Interaction Type
q(X1∩X2) < Min(q(X1), q(X2))	nonlinear weakening
Min(q(X1), q(X2)) < q(X1∩X2) < Max(q(X1), q(X2))	single-factor nonlinear weakening
q(X1∩X2) > Max(q(X1), q(X2))	bidirectional enhancement
q(X1∩X2) = q(X1) + q(X2)	independent
q(X1∩X2) > q(X1) + q(X2)	nonlinear enhancement
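The rules in Table A4 can be written as a small decision function. The sketch below is illustrative only: the function name `classify_interaction`, the tolerance used for the equality test, and the example q values are assumptions, and the boundary cases that Table A4 leaves undefined (q(X1∩X2) exactly equal to Min or Max) are assigned to single-factor nonlinear weakening by convention here.

```python
import math


def classify_interaction(q1, q2, q12, tol=1e-9):
    """Classify a geodetector interaction following Table A4.

    q1, q2: q values of the two single factors; q12: q value of their interaction.
    """
    lo, hi = min(q1, q2), max(q1, q2)
    if math.isclose(q12, q1 + q2, abs_tol=tol):
        return "independent"
    if q12 > q1 + q2:
        return "nonlinear enhancement"
    if q12 > hi:
        return "bidirectional enhancement"
    if q12 < lo:
        return "nonlinear weakening"
    # Remaining case: Min <= q12 <= Max (boundaries assigned here by assumption)
    return "single-factor nonlinear weakening"


# Hypothetical q values, not results from this study
print(classify_interaction(0.2, 0.3, 0.6))  # nonlinear enhancement
print(classify_interaction(0.2, 0.3, 0.4))  # bidirectional enhancement
```

The sum test is evaluated before the Max test because any q(X1∩X2) above q(X1) + q(X2) also exceeds Max(q(X1), q(X2)).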

Table A5. The confusion matrix of the RF classification algorithm.

	SD	GD	SM	MS	LD	RD	AC
SD	1291	35	0	0	4	5	0
GD	47	863	4	0	49	15	13
SM	2	27	42	4	3	4	0
MS	3	8	2	37	0	1	0
LD	11	55	4	0	217	24	1
RD	9	32	0	0	17	456	55
AC	0	11	0	0	2	53	473
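The overall accuracy and kappa coefficient reported for the RF classifier can be reproduced from Table A5. The sketch below uses the standard definitions (observed agreement and chance agreement from the row and column totals); the function name is illustrative. Both measures are unaffected by whether rows represent reference or predicted classes, which Table A5 does not state.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix."""
    n = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    p_observed = diagonal / n
    p_chance = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return p_observed, (p_observed - p_chance) / (1 - p_chance)


# Table A5 (RF), class order: SD, GD, SM, MS, LD, RD, AC
rf_matrix = [
    [1291, 35, 0, 0, 4, 5, 0],
    [47, 863, 4, 0, 49, 15, 13],
    [2, 27, 42, 4, 3, 4, 0],
    [3, 8, 2, 37, 0, 1, 0],
    [11, 55, 4, 0, 217, 24, 1],
    [9, 32, 0, 0, 17, 456, 55],
    [0, 11, 0, 0, 2, 53, 473],
]

oa, kappa = overall_accuracy_and_kappa(rf_matrix)
print(f"OA = {oa:.2%}, kappa = {kappa:.2f}")  # OA = 87.11%, kappa = 0.83
```

The same function can be applied to Tables A6–A8 to compare the other classifiers on an equal footing.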

Table A6. The confusion matrix of the GTB classification algorithm.

	SD	GD	SM	MS	LD	RD	AC
SD	1276	45	1	0	8	5	0
GD	48	856	3	3	48	19	14
SM	2	18	48	5	7	2	0
MS	2	5	6	38	0	0	0
LD	14	55	7	0	212	21	3
RD	5	35	0	1	20	455	53
AC	0	9	3	0	2	64	461

Table A7. The confusion matrix of the CART classification algorithm.

	SD	GD	SM	MS	LD	RD	AC
SD	1218	84	5	3	12	13	0
GD	83	794	9	1	62	31	11
SM	8	20	43	6	2	3	0
MS	2	4	5	40	0	0	0
LD	17	69	9	0	185	31	1
RD	16	45	2	0	29	416	61
AC	0	16	0	0	7	88	428

Table A8. The confusion matrix of the KNN classification algorithm.

	SD	GD	SM	MS	LD	RD	AC
SD	1145	167	1	0	6	16	0
GD	143	725	9	6	78	21	9
SM	6	38	28	6	4	0	0
MS	3	7	9	32	0	0	0
LD	44	99	8	3	143	10	5
RD	49	76	0	1	27	302	114
AC	1	7	0	0	16	90	425

Table A9. The confusion matrix of the SVM classification algorithm.

	SD	GD	RD	AC
SD	1273	49	3	10
GD	773	55	5	158
SM	63	4	1	14
MS	51	0	0	0
LD	137	5	1	169
RD	314	63	10	182
AC	14	14	11	500

Table A10. The meaning of the vegetation type codes.

Number	Meaning
1	Needleleaf forests
2	Alpine vegetation
3	Cultivated vegetation
4	Needleleaf and Broadleaf mixed forests
5	Broadleaf forests
6	Scrubs
7	Deserts
8	Steppes
9	Grasslands
10	Meadows
11	Marshes

Table A11. The meaning of the soil type codes.

Number	Meaning
1	Alpine soil
2	Ferroalloysite soil
3	Calcic soil
4	Saline-alkali soil
5	Desert soil
6	Leached soil
7	Hydromorphic soil
8	Arid soil
9	Semi-leached soil
10	Semi-hydromorphic soil
11	Primary soil
12	Anthropogenic soil

Figure A1. The spatial distribution of driving factors.

In the figure, the seven factors—precipitation (Pre), temperature (Tem), wind speed (WS), potential evapotranspiration (PET), population density (Pop), actual livestock carrying capacity (ALCC), and human footprint (HF)—were averaged from 2000 to 2020.

Figure A2. The location of the local classification detail regions.

References

China Scientific and Technical Nomenclature Review Committee. Geographic Terminology, 2nd ed.; Science Press: Beijing, China, 2007. [Google Scholar]
Yang, F.X.; Gui, D.W.; Yue, J.; Lei, J.Q.; Zhang, Z.W. A discussion on classification system of desert in arid land. J. Arid Land Resour. Environ. 2015, 29, 145–151. [Google Scholar]
Khan, A.A.; Ruby, T.; Naz, N.; Rafay, M. Desert ecosystem management: A sustainable and wise use. In Ecosystem Functions and Management: Theory and Practice; Springer: Cham, Switzerland, 2017; pp. 85–99. [Google Scholar]
Cheng, L.L.; Que, X.E.; Yang, L.; Yao, X.L.; Lu, Q. China’s desert ecosystem: Functions rising and services enhancing. Bull. Chin. Acad. Sci. 2020, 35, 690–698. [Google Scholar]
Wu, S.; Liu, L.; Li, D.; Zhang, W.; Liu, K.; Shen, J.; Zhang, L. Global desert expansion during the 21st century: Patterns, predictors and signals. Land Degrad. Dev. 2023, 34, 377–388. [Google Scholar] [CrossRef]
Chen, H.; Costanza, R. Valuation and management of desert ecosystems and their services. Ecosyst. Serv. 2024, 66, 101607. [Google Scholar] [CrossRef]
Feng, Q.; Chang, Z.Q.; Xi, H.Y.; Su, Y.H.; Wen, X.H.; Zhu, M.; Zhang, J.T.; Zhang, C.Q. Response to Global Change in the Ecologically Fragile and Desert Region of China-Mongolia Based on Carbon and Nitrogen Cycles. Adv. Earth Sci. 2022, 37, 1101. [Google Scholar]
Liu, H.L.; Willems, P.; Bao, A.M.; Wang, L.; Chen, X. Effect of climate change on the vulnerability of a socio-ecological system in an arid area. Glob. Planet Chang. 2016, 137, 1–9. [Google Scholar] [CrossRef]
Yang, L.; Zhao, G.J.; Mu, X.M.; Lan, Z.F.; Jiao, J.Y.; An, S.S.; Wu, Y.Q.; Mi-Ping, P.Q. Integrated assessments of land degradation on the Qinghai-Tibet plateau. Ecol. Indic. 2023, 147, 109945. [Google Scholar] [CrossRef]
Department of Economic and Social Affairs. Transforming Our World: The 2030 Agenda for Sustainable Development; United Nations: New York, NY, USA, 2015. [Google Scholar]
Huang, J.P.; Zhang, G.L.; Zhang, Y.T.; Guan, X.D.; Wei, Y.; Guo, R.X. Global desertification vulnerability to climate change and human activities. Land Degrad. Dev. 2020, 31, 1380–1391. [Google Scholar] [CrossRef]
Sterk, G.; Stoorvogel, J.J. Desertification–Scientific Versus Political Realities. Land 2020, 9, 156. [Google Scholar] [CrossRef]
Zhao, Y.Y.; Gao, G.L.; Qin, S.G.; Yu, M.H.; Ding, G.D. Desertification detection and the evaluation indicators: A review. J. Arid Land Resour. Environ. 2019, 33, 81–87. [Google Scholar]
Li, S.; Dong, Y.X.; Dong, G.R. Sandy Desertification Problem and Sustainable Development in Qinghai-Tibet Plateau; China Tibetology Publishing House: Beijing, China, 2001. [Google Scholar]
Zhang, C.L.; Li, Q.; Shen, Y.P.; Zhou, N.; Wang, X.S.; Li, J.; Jia, W.R. Monitoring of aeolian desertification on the Qinghai-Tibet Plateau from the 1970s to 2015 using Landsat images. Sci. Total Environ. 2018, 619, 1648–1659. [Google Scholar] [CrossRef] [PubMed]
United Nations. Managing fragile ecosystems: Combating desertification and drought. In Agenda 21; A/CONF.151/4 (Part II); United Nations: Rio de Janeiro, Brazil, 1992; pp. 46–66. [Google Scholar]
Wang, X.M.; Geng, X.; Liu, B.; Cai, D.W.; Li, D.F.; Xiao, F.Y.; Zhu, B.Q.; Hua, T.; Lu, R.J.; Liu, F. Desert ecosystems in China: Past, present, and future. Earth Sci. Rev. 2022, 234, 104206. [Google Scholar] [CrossRef]
Halmy, M.W.A.; Gessler, P.E. The application of ensemble techniques for land-cover classification in arid lands. Int. J. Remote Sens. 2015, 36, 5613–5636. [Google Scholar] [CrossRef]
Song, Y.B.; Zheng, H.W.; Chen, X.; Bao, A.M.; Lei, J.Q.; Xu, W.Q.; Luo, G.P.; Guan, Q. Desertification Extraction Based on a Microwave Backscattering Contribution Decomposition Model at the Dry Bottom of the Aral Sea. Remote Sens. 2021, 13, 4850. [Google Scholar] [CrossRef]
Zhang, Z.W.; Yin, H.Y.; Qian, D.F.; Zhou, Y.Z. A Preliminary Study on the Classification System of Desert in Alpine Region—Taking Duilongdeqing District as an Example. J. Plateau Agric. 2019, 3, 2096–4781. [Google Scholar]
Kupidura, P. The comparison of different methods of texture analysis for their efficacy for land use classification in satellite imagery. Remote Sens. 2019, 11, 1233. [Google Scholar] [CrossRef]
Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27. [Google Scholar] [CrossRef]
Wang, L.; Diao, C.Y.; Xian, G.; Yin, D.M.; Lu, Y.; Zou, S.Y.; Erickson, T.A. A summary of the special issue on remote sensing of land change science with Google earth engine. Remote Sens. Environ. 2020, 248, 112002. [Google Scholar] [CrossRef]
Shetty, S. Analysis of Machine Learning Classifiers for LULC Classification on Google Earth Engine. Master’s Thesis, University of Twente, Enschede, The Netherlands, 2019. [Google Scholar]
Maxwell, A.E.; Warner, T.A.; Fang, F. Implementation of machine learning classification in remote sensing: An applied review. Int. J. Remote Sens. 2018, 39, 2784–2817. [Google Scholar] [CrossRef]
Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 5, 206–215. [Google Scholar] [CrossRef]
Fu, B.L.; Liang, Y.Y.; Lao, Z.N.; Sun, X.D.; Li, S.Z.; He, H.C.; Sun, W.W.; Fan, D.L. Quantifying scattering characteristics of mangrove species from Optuna-based optimal machine learning classification using multi-scale feature selection and SAR image time series. Int. J. Appl. Earth Obs. 2023, 122, 103446. [Google Scholar] [CrossRef]
Chen, H.F.; Yang, L.P.; Wu, Q.S. Enhancing land cover mapping and monitoring: An interactive and explainable machine learning approach using Google Earth Engine. Remote Sens. 2023, 15, 4585. [Google Scholar] [CrossRef]
Stojić, A.; Stanić, N.; Vuković, G.; Stanišić, S.; Perišić, M.; Šoštarić, A.; Lazić, L. Explainable extreme gradient boosting tree-based prediction of toluene, ethylbenzene and xylene wet deposition. Sci. Total Environ. 2019, 653, 140–147. [Google Scholar] [CrossRef] [PubMed]
Shafizadeh-Moghadam, H.; Khazaei, M.; Alavipanah, S.K.; Weng, Q. Google Earth Engine for large-scale land use and land cover mapping: An object-based classification approach using spectral, textural and topographical factors. GISci. Remote Sens. 2021, 58, 914–928. [Google Scholar] [CrossRef]
Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
Zhang, Y.; Fu, B.L.; Sun, X.D.; Yao, H.; Zhang, S.R.; Wu, Y.; Kuang, H.Y.; Deng, T.F. Effects of multi-growth periods UAV images on classifying karst wetland vegetation communities using object-based optimization stacking algorithm. Remote Sens. 2023, 15, 4003. [Google Scholar] [CrossRef]
Meng, X.Y.; Li, S.Y.; Akhmadi, K.; He, P.X.; Dong, G.P. Trends, turning points, and driving forces of desertification in global arid land based on the segmental trend method and SHAP model. GISci. Remote Sens. 2024, 61, 2367806. [Google Scholar] [CrossRef]
Feng, K.D.; Mao, D.H.; Zhen, J.N.; Pu, H.G.; Yan, H.Q.; Wang, M.; Wang, D.R.; Xiang, H.X.; Ren, Y.X.; Luo, L.; et al. Potential of Sample Migration and Explainable Machine Learning Model for Monitoring Spatiotemporal Changes of Wetland Plant Communities. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 9894. [Google Scholar] [CrossRef]
Zhou, Y.A.; Wu, W.; Wang, H.; Zhang, X.; Yang, C.; Liu, H.B. Identification of soil texture classes under vegetation cover based on Sentinel-2 data with SVM and SHAP techniques. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 3758–3770. [Google Scholar] [CrossRef]
Antonini, A.S.; Tanzola, J.; Asiain, L.; Ferracutti, G.R.; Castro, S.M.; Bjerg, E.A.; Ganuza, M.L. Machine Learning model interpretability using SHAP values: Application to Igneous Rock Classification task. Appl. Comput. Geosci. 2024, 23, 100178. [Google Scholar] [CrossRef]
Hosseiny, B.; Abdi, A.M.; Jamali, S. Urban land use and land cover classification with interpretable machine learning—A case study using Sentinel-2 and auxiliary data. Remote Sens Appl. 2022, 28, 100843. [Google Scholar] [CrossRef]
Yao, T.D.; Thompson, L.G.; Mosbrugger, V.; Zhang, F.; Ma, Y.M.; Luo, T.X.; Xu, B.Q. Third pole environment (TPE). Environ. Dev. 2012, 3, 52–64. [Google Scholar] [CrossRef]
Olsson, L.; Eklundh, L.; Ardö, J. A recent greening of the Sahel—Trends, patterns and potential causes. J. Arid Environ. 2005, 63, 556–566. [Google Scholar] [CrossRef]
Yang, Z.W.; Gao, X.; Lei, J.Q.; Meng, X.Y.; Zhou, N. Analysis of spatiotemporal changes and driving factors of desertification in the Africa Sahel. Catena 2022, 213, 106213. [Google Scholar] [CrossRef]
Feng, Q.; Ma, H.; Jiang, X.M.; Wang, X.; Cao, S. What has caused desertification in China? Sci. Rep. 2015, 5, 15998. [Google Scholar] [CrossRef] [PubMed]
Al-Blooshi, L.S.; Issa, S.; Ksiksi, T. Assessing the environmental impact of climate change on desert ecosystems: A review. Adv. Ecol. Res. 2020, 5, 27–52. [Google Scholar]
Zhang, C.X.; Wang, X.M.; Li, J.C.; Hua, T. Identifying the effect of climate change on desertification in northern China via trend analysis of potential evapotranspiration and precipitation. Ecol. Indic. 2020, 112, 106141. [Google Scholar] [CrossRef]
Zhao, H.L. Desertification processes due to heavy grazing in sandy rangeland, Inner Mongolia. J. Arid Environ. 2005, 62, 309–319. [Google Scholar] [CrossRef]
Xu, D.Y.; Li, C.L.; Song, X.; Ren, H.Y. The dynamics of desertification in the farming-pastoral region of North China over the past 10 years and their relationship to climate change and human activity. Catena 2014, 123, 11–22. [Google Scholar] [CrossRef]
Burrell, A.L.; Evans, J.P.; Liu, Y. Detecting dryland degradation using time series segmentation and residual trend analysis (TSS-RESTREND). Remote Sens. Environ. 2017, 197, 43–57. [Google Scholar] [CrossRef]
Liu, Q.F.; Zhang, Q.; Yan, Y.Z.; Zhang, X.F.; Niu, J.M.; Svenning, J.C. Ecological restoration is the dominant driver of the recent reversal of desertification in the Mu Us Desert (China). J. Clean Prod. 2020, 268, 122241. [Google Scholar] [CrossRef]
Li, S.; Zheng, Y.; Luo, P.; Wang, X.; Li, H.; Lin, P. Desertification in western Hainan Island, China (1959 to 2003). Land Degrad. Dev. 2007, 18, 473–485. [Google Scholar] [CrossRef]
Zhi, Y.; Liu, S.L.; Wang, T.; Duan, H.C.; Kang, W.P. Quantifying the impact of natural and human activity factors on desertification in the Qinghai-Tibetan Plateau. Catena 2024, 246, 108392. [Google Scholar] [CrossRef]
Zhao, H.Y.; Zhai, X.H.; Li, S.; Wang, Y.H.; Xie, J.L.; Yan, C.Z. The continuing decrease of sandy desert and sandy land in northern China in the latest 10 years. Ecol. Indic. 2023, 154, 110699. [Google Scholar] [CrossRef]
Wang, J.F.; Xu, C.D. Geodetector: Principle and prospective. Acta Geogr. Sin. 2017, 72, 116–134. [Google Scholar]
Li, W.H.; Zhou, X.M. Ecosystems of Qinghai-Xizang (Tibetan) Plateau and Approach for Their Sustainable Management; Guangdong Science and Technology Press: Guangzhou, China, 1998. [Google Scholar]
Chen, F.H.; Wang, Y.F.; Zhen, X.L.; Sun, J. Environmental impacts and response strategies on the Qinghai-Tibet Plateau under global change. China Tibetol. 2021, 21, e28. [Google Scholar]
Li, M.; Zhang, X.Z.; He, Y.T.; Niu, B.; Wu, J.S. Assessment of the vulnerability of alpine grasslands on the Qinghai-Tibetan Plateau. PeerJ 2020, 8, e8513. [Google Scholar] [CrossRef]
United Nations Convention to Combat Desertification (UNCCD). United Nations Convention to Combat Desertification in Those Countries Experiencing Serious Drought and/or Desertification Particularly in Africa: Text with Annexes; UNEP: Nairobi, Kenya, 1994. [Google Scholar]
UNEP. World Atlas of Desertification; Edward Arnold: London, UK, 1992. [Google Scholar]
Zhao, H.Y.; Yan, C.Z.; Li, S.; Wang, Y.H. Remote sensing monitoring of aeolian desertification and quantitative analysis of its driving force in the Yellow River Basin during 2000–2020. J. Desert Res. 2023, 43, 127–137. [Google Scholar]
Zhang, J.H.; Feng, Z.M.; Jiang, L.G. Progress on studies of land use/land cover classification systems. Resour. Sci. 2011, 33, 1195–1203. [Google Scholar]
Li, C.X.; Ma, Z.Y.; Wang, L.Y.; Yu, W.J.; Tan, D.L.; Gao, B.B.; Feng, Q.L.; Guo, H.; Zhao, Y.Y. Improving the accuracy of land cover mapping by distributing training samples. Remote Sens. 2021, 13, 4594. [Google Scholar] [CrossRef]
Mbatha, N.; Xulu, S. Time series analysis of MODIS-Derived NDVI for the Hluhluwe-Imfolozi Park, South Africa: Impact of recent intense drought. Climate 2018, 6, 95. [Google Scholar] [CrossRef]
Velastegui-Montoya, A.; Montalván-Burbano, N.; Carrión-Mero, P.; Rivera-Torres, H.; Sadeck, L.; Adami, M. Google Earth Engine: A global analysis and future trends. Remote Sens. 2023, 15, 3675. [Google Scholar] [CrossRef]
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. 2016, 114, 24–31. [Google Scholar] [CrossRef]
Lawrence, R.L.; Wood, S.D.; Sheley, R.L. 2006. Mapping invasive plants using hyperspectral imagery and Breiman Cutler classifications (RandomForest). Remote Sens. Environ. 2006, 100, 356–362. [Google Scholar] [CrossRef]
Friedman, J.H. Stochastic gradient boosting. Comput. Stat. Data. Anal. 2002, 38, 367–378. [Google Scholar] [CrossRef]
Ustuner, M.; Balik Sanli, F. Polarimetric target decompositions and light gradient boosting machine for crop classification: A comparative evaluation. ISPRS Int. J. Geo-Inf. 2019, 8, 97. [Google Scholar] [CrossRef]
Huang, C.Q.; Davis, L.S.; Townshend, J.R.G. An assessment of support vector machines for land cover classification. Int. J. Remote Sens. 2002, 23, 725–749. [Google Scholar] [CrossRef]
Foody, G.M.; Mathur, A. A relative evaluation of multiclass image classification by support vector machines. IEEE Trans. Geosci. Electron. 2004, 42, 1335–1343. [Google Scholar] [CrossRef]
Yang, X.Y. Parameterizing support vector machines for land cover classification. Photogramm. Eng. Remote Sens. 2011, 77, 27–37. [Google Scholar] [CrossRef]
Loh, W.Y. Classification and regression trees. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2011, 1, 14–23. [Google Scholar] [CrossRef]
Martin, B. Instance-Based Learning: Nearest Neighbour with Generalization; University of Waikato: Hamilton, New Zealand, 1995. [Google Scholar]
Ge, G.; Shi, Z.J.; Zhu, Y.J.; Yang, X.H.; Hao, Y.G. Land use/cover classification in an arid desert-oasis mosaic landscape of China using remote sensed imagery: Performance assessment of four machine learning algorithms. Glob. Ecol. Conserv. 2020, 22, e00971. [Google Scholar] [CrossRef]
Capolupo, A.; Monterisi, C.; Caporusso, G.; Tarantino, E. Extracting land cover data using GEE: A review of the classification indices. In Proceedings of the Computational Science and Its Applications–ICCSA 2020: 20th International Conference, Cagliari, Italy, 1–4 July 2020; Proceedings, Part IV 20. Springer International Publishing: Cham, Switzerland; pp. 782–796. [Google Scholar]
Lu, R.J.; Liu, S.L.; Kang, W.P.; Feng, K.; Guo, Z.C.; Zhi, Y. Combining the GEE platform and machine learning algorithm for desert information extraction. J. Desert Res. 2023, 43, 60–70. [Google Scholar]
Abida, K.; Barbouchi, M.; Boudabbous, K.; Toukabri, W.; Saad, K.; Bousnina, H.; Sahli Chahed, T. Sentinel-2 data for land use mapping: Comparing different supervised classifications in semi-arid areas. Agriculture 2022, 12, 1429. [Google Scholar] [CrossRef]
Xiao, J.; Shen, Y.; Tateishi, R.; Bayaer, W. Development of topsoil grain size index for monitoring desertification in arid land using remote sensing. Int. J. Remote Sens. 2006, 27, 2411–2422. [Google Scholar] [CrossRef]
Liang, S.L. Narrowband to broadband conversions of land surface albedo I: Algorithms. Remote Sens. Environ. 2001, 76, 213–238. [Google Scholar] [CrossRef]
Kawauchi, H.; Fuse, T. Shap-based interpretable object detection method for satellite imagery. Remote Sens. 2022, 14, 1970. [Google Scholar] [CrossRef]
Pizarro, S.E.; Pricope, N.G.; Vargas-Machuca, D.; Huanca, O.; Ñaupari, J. Mapping land cover types for highland Andean ecosystems in Peru using google earth engine. Remote Sens. 2022, 14, 1562. [Google Scholar] [CrossRef]
Yang, Y.P.; Yang, D.; Wang, X.F.; Zhang, Z.; Nawaz, Z. Testing accuracy of land cover classification algorithms in the qilian mountains based on gee cloud platform. Remote Sens. 2021, 13, 5064. [Google Scholar] [CrossRef]
Pelletier, C.; Valero, S.; Inglada, J.; Champion, N.; Dedieu, G. Assessing the robustness of Random Forests to map land cover with high resolution satellite image time series over large areas. Remote Sens. Environ. 2016, 187, 156–168. [Google Scholar] [CrossRef]
Zhang, S.C. Challenges in KNN classification. IEEE Trans. Knowl. Data Eng. 2021, 34, 4663–4675. [Google Scholar] [CrossRef]
Peng, W.F.; Kuang, T.T.; Tao, S. Quantifying influences of natural factors on vegetation NDVI changes based on geographical detector in Sichuan, western China. J. Clean Prod. 2019, 233, 353–367. [Google Scholar] [CrossRef]
Hakdaoui, S.; Emran, A.; Pradhan, B.; Qninba, A.; Balla, T.E.; Mfondoum, A.H.N.; Lee, C.W.; Alamri, A.M. Assessing the changes in the moisture/dryness of water cavity surfaces in imlili sebkha in southwestern morocco by using machine learning classification in google earth engine. Remote Sens. 2020, 12, 131. [Google Scholar] [CrossRef]
Zhu, J.; Liu, G.X.; Zhao, R.; Ding, X.L.; Fu, H.Q. ML based approach for inverting penetration depth of SAR signals over large desert areas. Remote Sens. Environ. 2023, 295, 113643. [Google Scholar] [CrossRef]
Hall-Beyer, M. Practical guidelines for choosing GLCM textures to use in landscape classification tasks over a range of moderate spatial scales. Int. J. Remote Sens. 2017, 38, 1312–1338. [Google Scholar] [CrossRef]
Zhou, H.Y.; Fu, L.Y.; Sharma, R.P.; Lei, Y.C.; Guo, J.P. A hybrid approach of combining random forest with texture analysis and VDVI for desert vegetation mapping Based on UAV RGB Data. Remote Sens. 2021, 13, 1891. [Google Scholar] [CrossRef]
Sadeghi, M.; Jones, S.B.; Philpot, W.D. A linear physically-based model for remote sensing of soil moisture using short wave infrared bands. Remote Sens. Environ. 2015, 164, 66–76. [Google Scholar] [CrossRef]
Wang, X.M.; Li, X.B.; Cai, D.W.; Lou, J.P.; Li, D.F.; Liu, F. Salinification and salt transports under aeolian processes in potential desertification regions of China. Sci. Total Environ. 2021, 782, 146832. [Google Scholar] [CrossRef]
Wu, W.C. The generalized difference vegetation index (GDVI) for dryland characterization. Remote Sens. 2014, 6, 1211–1233. [Google Scholar] [CrossRef]
Guo, B.; Yang, F.; Fan, Y.W.; Zang, W.Q. The dominant driving factors of rocky desertification and their variations in typical mountainous karst areas of Southwest China in the context of global change. Catena 2023, 220, 106674. [Google Scholar] [CrossRef]
Meng, X.Y.; Gao, X.; Li, S.Y.; Lei, J.Q. Spatial and temporal characteristics of vegetation NDVI changes and the driving forces in Mongolia during 1982–2015. Remote Sens. 2020, 12, 603. [Google Scholar] [CrossRef]
Cuo, L.; Zhang, Y.X.; Wu, Y.Q.; Hou, M. Desertification affecting the Tibetan Plateau between 1971–2015: Viewed from a climate perspective. Land Degrad. Dev. 2020, 31, 1956–1968. [Google Scholar] [CrossRef]
Duan, H.C.; Xue, X.; Wang, T.; Kang, W.P.; Liao, J.; Liu, S.L. Spatial and temporal differences in alpine meadow, alpine steppe and all vegetation of the Qinghai-Tibetan Plateau and their responses to climate change. Remote Sens. 2021, 13, 669. [Google Scholar] [CrossRef]
Gao, Q.Z.; Wan, Y.F.; Li, Y.; Qin, X.B.; JiangCun, W.; Xu, H.M. Spatial and temporal pattern of alpine grassland condition and its response to human activities in Northern Tibet, China. Rangeland J. 2010, 32, 165–173. [Google Scholar] [CrossRef]
Huang, K.; Zhang, Y.J.; Zhu, J.T.; Liu, Y.J.; Zu, J.X.; Zhang, J. The influences of climate change and human activities on vegetation dynamics in the Qinghai-Tibet Plateau. Remote Sens. 2016, 8, 876. [Google Scholar] [CrossRef]
Fu, B.J.; Ouyang, Z.Y.; Shi, P.; Fan, J.; Wang, X.D.; Zhang, H.; Zhao, W.W.; Wu, F. Current condition and protection strategies of Qinghai-Tibet Plateau ecological security barrier. Bull. Chin. Acad. Sci. 2021, 36, 1298–1306. [Google Scholar]
Sun, H.L.; Zheng, D.; Yao, T.D.; Zhang, Y. Protection and construction of the national ecological security shelter zone on Tibetan Plateau. Acta Geogr. Sin. 2012, 67, 3–12. [Google Scholar]
Shao, H.Y.; Sun, X.F.; Wang, H.X.; Zhang, H.X.; Xiang, Z.Y.; Tan, R.; Chen, X.Y.; Xian, W.; Qi, J.G. A method to the impact assessment of the returning grazing land to grassland project on regional eco-environmental vulnerability. Environ. Impact Assess. Rev. 2016, 56, 155–167. [Google Scholar] [CrossRef]
Yu, L.; Liu, S.L.; Wang, F.F.; Liu, H.; Liu, Y.X.; Wang, Q.B.; Zhao, Y.F. Effect of ecological restoration projects on carbon footprint in a grassland ecosystem on the Qinghai-Tibet Plateau. Land Degrad. Dev. 2023, 34, 5824–5834. [Google Scholar] [CrossRef]
Ma, S.; Wang, L.J.; Wang, H.Y.; Jiang, J.; Zhang, J.C. Multiple ecological effects and their drivers of ecological restoration programmes in the Qinghai-Tibet Plateau, China. Land Degrad. Dev. 2023, 34, 1415–1429. [Google Scholar] [CrossRef]
Zhao, Z.X.; Dai, E. Vegetation cover dynamics and its constraint effect on ecosystem services on the Qinghai-Tibet Plateau under ecological restoration projects. J. Environ. Manag. 2024, 356, 120535. [Google Scholar] [CrossRef] [PubMed]

Figure 1. Location and geographical overview of the Qinghai–Tibetan plateau.

Figure 2. Overall classification results of five machine learning algorithms for QTP deserts.

Figure 3. Comparison of local classification details of different machine learning algorithms. (a) is the typical region containing mainly SD and MS; (b) is the typical region containing mainly GD and SM; (c) is the typical region containing mainly LD and AC; (d) is the typical region with diverse and complex desert types; (e) is the typical region containing mainly GD, RD and Non-desert.

Figure 4. Global and local importance of SHAP-based classification features. (a) is the global importance; (b–h) are the importance of classification features for different desert types, where (b) is SD, (c) is GD, (d) is SM, (e) is MS, (f) is LD, (g) is RD, and (h) is AC.

Figure 5. Spatial distribution of QTP desert types in 2000, 2010, and 2020.

Figure 6. Spatial distribution of changes in QTP deserts during different periods.

Figure 7. Sankey diagram of QTP desert changes in different periods.

Figure 8. Spatial distribution of driver factor rating results. Note: See Appendix A for details of the meaning of vegetation type and soil type codes.

Figure 9. The q-value of the individual factors. Note: “*” represents p < 0.05.

Figure 10. Detection results of the interaction between the two factors. Note: “↑” and “↑↑” represent bidirectional enhancement and nonlinear enhancement, respectively.

Table 1. Description of factors influencing desert change.

Factor Type	Factor	Year	Unit	Spatial Resolution	Data Sources
Terrain	elevation	-	(m)	30 m	National Aeronautics and Space Administration
Terrain	slope	-	(°)	30 m	National Aeronautics and Space Administration
Terrain	aspect	-	(°)	30 m	National Aeronautics and Space Administration
Environment	vegetation type	2001	-	1:1 Million	Data Center for Resources and Environmental Sciences, Chinese Academy of Sciences (https://www.resdc.cn/, accessed on 29 September 2024)
Environment	soil type	1995	-	1:1 Million	Data Center for Resources and Environmental Sciences, Chinese Academy of Sciences (https://www.resdc.cn/, accessed on 29 September 2024)
Climate	precipitation	2000–2020	(mm)	1 km	National Tibetan Plateau/Third Pole Environment Data Center (http://data.tpdc.ac.cn, accessed on 28 March 2023)
Climate	temperature	2000–2020	(°C)	1 km	National Tibetan Plateau/Third Pole Environment Data Center (http://data.tpdc.ac.cn, accessed on 28 March 2023)
Climate	wind speed	2000–2020	(m/s)	1 km	National Earth System Science Data Center, National Science & Technology Infrastructure of China (http://www.geodata.cn, accessed on 27 August 2024)
Climate	potential evapotranspiration	2000–2020	(mm)	1 km	National Tibetan Plateau/Third Pole Environment Data Center (http://data.tpdc.ac.cn, accessed on 28 March 2023)
Human activity	population density	2000/2005/2010/2015/2019	(people/km²)	1 km	Data Center for Resources and Environmental Sciences, Chinese Academy of Sciences (https://www.resdc.cn/, accessed on 26 May 2022)
Human activity	actual livestock carrying capacity	2000–2019	(MU/km²)	1 km	National Tibetan Plateau/Third Pole Environment Data Center (http://data.tpdc.ac.cn, accessed on 28 September 2024)
Human activity	human footprint	2000–2020	-	1 km	Urban Environmental Monitoring and Modeling (UEMM) Team from the School of Land Science and Technology, China Agricultural University

Table 2. Desert classification systems and identifiers on the QTP.

Desert Type	Surface Characterization	Landsat8 Image	DJ4 Image
Sandy desert (SD)	The surface is sand-covered and dominated by sandy shrub/semi-shrub communities, mainly consisting of gently accreting sands, dunes, and scrubby sands
Gravel desert (GD)	Gravel-covered surface, dominated by shrubby, semi-shrubby, or arid herbaceous communities, predominantly alluvial floodplains, and wind-swept Gobi with a high content of various gravels
Salt crust or Mild saline desert (SM)	High topsoil salinity with white salt crusts on the surface; mild salinization with white salt crystals occurs in small areas around rivers and lakes
Moderate and Severe saline desert (MS)	Surface saline aggregation, formed and developed over a long period of time, with hard black salt crusts or crystals distributed on the surface and sparse or bare vegetation
Loamy desert (LD)	Surface soil cover, dominated by shrub and semi-shrub communities, mostly distributed in loess and river alluvial (flood) accumulation areas
Rocky desert (RD)	The surface is dominated by bedrock and rock debris, with poorly developed soils and rugged terrain, dominated by arid and hyper-arid shrub communities
Alpine cold desert (AC)	High-altitude frigid zone with little or no surface vegetation, some areas with alpine mat or ice-marginal vegetation

Table 3. Feature variables in machine learning classification.

Feature Type	Feature Name
Spectral bands	$ρ_{BLUE}$ , $ρ_{GREEN}$ , $ρ_{RED}$ , $ρ_{NIR}$ , $ρ_{SWIR 1}$ , $ρ_{SWIR 2}$
Spectral indexes	TGSI, BSI, NDVI, EVI, MSAVI, Albedo, SI, NDSI, NDWI
Radar features	VV, VH
Terrain features	Elevation, Slope
Texture features	Gray-Level Co-occurrence Matrix (GLCM)

Table 4. Classification accuracy of five machine learning algorithms.

	RF	GTB	CART	KNN	SVM
OA	87.11%	86.26%	80.56%	72.18%	47.38%
Kappa	0.83	0.82	0.75	0.63	0.27

Table 5. PA and UA for each desert type under different machine learning classification algorithms.

		SD	GD	SM	MS	LD	RD	AC
RF	PA	94.72%	83.71%	80.77%	90.24%	74.32%	81.72%	87.57%
RF	UA	96.70%	87.08%	51.22%	72.55%	69.55%	80.14%	87.76%
GTB	PA	94.73%	83.68%	70.59%	80.85%	71.38%	80.39%	86.82%
GTB	UA	95.58%	86.38%	58.54%	74.51%	67.95%	79.96%	85.53%
CART	PA	90.63%	76.94%	58.90%	80.00%	62.29%	71.48%	85.43%
CART	UA	91.24%	80.12%	52.44%	78.43%	59.29%	73.11%	79.41%
KNN	PA	82.31%	64.79%	50.91%	66.67%	52.19%	68.79%	76.85%
KNN	UA	85.77%	73.16%	34.15%	62.75%	45.84%	53.08%	78.85%
SVM	PA	48.50%	28.95%	0%	0%	0%	32.26%	48.40%
SVM	UA	95.36%	5.55%	0%	0%	0%	1.76%	92.76%
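
The accuracy measures reported in Tables 4 and 5 (OA, kappa, PA, and UA) are standard quantities derived from a confusion matrix. The sketch below shows one common way to compute them. The confusion matrices behind the tables are not given here, so the three-class matrix `toy_cm` and the function name `accuracy_metrics` are hypothetical, and mapping an empty row or column to 0 is an assumption rather than the authors' stated convention.

```python
import numpy as np

def accuracy_metrics(cm):
    """Return OA, kappa, PA, and UA from a confusion matrix.

    Assumed layout: cm[i, j] is the number of reference samples of
    class i that the classifier assigned to class j.
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    diag = np.diag(cm)
    ref_totals = cm.sum(axis=1)   # samples per reference class (rows)
    pred_totals = cm.sum(axis=0)  # samples per predicted class (columns)

    oa = diag.sum() / n                           # overall accuracy
    pe = (ref_totals * pred_totals).sum() / n**2  # chance agreement
    kappa = (oa - pe) / (1 - pe)

    # Producer's accuracy (per reference class) and user's accuracy
    # (per predicted class); empty classes are reported as 0 (assumption).
    with np.errstate(divide="ignore", invalid="ignore"):
        pa = np.where(ref_totals > 0, diag / ref_totals, 0.0)
        ua = np.where(pred_totals > 0, diag / pred_totals, 0.0)
    return oa, kappa, pa, ua

# Hypothetical three-class confusion matrix, for illustration only.
toy_cm = [[45, 5, 0],
          [4, 40, 6],
          [1, 5, 44]]

oa, kappa, pa, ua = accuracy_metrics(toy_cm)
print(f"OA = {oa:.2%}, kappa = {kappa:.2f}")
print("PA:", np.round(pa, 2), "UA:", np.round(ua, 2))
```

Under this layout, a class that is never predicted has an undefined UA. That is one possible reading of the zero entries in the SVM rows of Table 5, although the source does not say how those cases were handled.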

Table 6. QTP desert area statistics in 2000, 2010, and 2020.

Desert Type	2000		2010		2020
Desert Type	Area (km²)	Percentage	Area (km²)	Percentage	Area (km²)	Percentage
SD	33,154.17	1.19%	31,932.52	1.15%	25,291.25	0.91%
GD	220,350.54	7.91%	198,217.35	7.11%	173,921.76	6.24%
SM	20,315.61	0.73%	23,777.04	0.85%	24,025.67	0.86%
MS	17,426.58	0.63%	15,735.47	0.56%	14,485.88	0.52%
LD	115,894.56	4.16%	114,263.61	4.10%	98,674.19	3.54%
RD	170,438.61	6.12%	171,920.68	6.17%	172,043.15	6.17%
AC	219,667.10	7.88%	221,756.54	7.96%	221,849.80	7.96%
Total	797,247.17	28.62%	777,603.21	27.90%	730,291.70	26.20%
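
The net change per desert type between 2000 and 2020 follows directly from the areas in Table 6. The short sketch below carries out that arithmetic. The area values are copied from the table, while the variable names are illustrative and not part of the original study.

```python
# Areas in km², copied from Table 6 and keyed by desert type.
area_2000 = {"SD": 33154.17, "GD": 220350.54, "SM": 20315.61, "MS": 17426.58,
             "LD": 115894.56, "RD": 170438.61, "AC": 219667.10}
area_2020 = {"SD": 25291.25, "GD": 173921.76, "SM": 24025.67, "MS": 14485.88,
             "LD": 98674.19, "RD": 172043.15, "AC": 221849.80}

# Net change per type; negative values mean the desert type shrank.
net_change = {k: round(area_2020[k] - area_2000[k], 2) for k in area_2000}

total_2000 = sum(area_2000.values())
total_change = sum(net_change.values())
relative_change = total_change / total_2000 * 100  # percent of the 2000 area

for desert_type, change in net_change.items():
    print(f"{desert_type}: {change:+,.2f} km²")
print(f"Total: {total_change:+,.2f} km² "
      f"({relative_change:+.2f}% of the 2000 desert area)")
```

Computed this way, GD and LD account for most of the net decrease, while SM, RD, and AC show small net increases, in line with the totals in the last row of the table.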

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Lu, R.; Liu, S.; Duan, H.; Kang, W.; Zhi, Y. Combining the SHAP Method and Machine Learning Algorithm for Desert Type Extraction and Change Analysis on the Qinghai–Tibetan Plateau. Remote Sens. 2024, 16, 4414. https://doi.org/10.3390/rs16234414
