1. Introduction
The expansion of impervious surfaces can drive global land cover and land use change, and it is a result of global economic growth and environmental changes [1]. Impervious surface (IS) is a type of surface coverage where water cannot infiltrate below the surface layer, and it mainly includes artificial landscapes, such as roads, squares, parking lots, and rooftops [2]. As an artificial land cover, IS strongly affects the quality of the regional ecological environment. The proliferation of IS mirrors urbanization patterns, and it can be viewed as a tangible result of economic globalization. Economic globalization instigates a redistribution of the urban populace, prompting extensive and rapid urban transformation. These changes, in turn, alter the composition and properties of the land cover beneath, correlating with an elevated demand for ecological and environmental resources. Over recent decades, global economic growth has yet to decouple from ecological demand [3]. For developing nations, such as China, certain disparities in urbanization exacerbate the imbalance between economic and ecological costs [4], leading to numerous environmental and ecological issues. Examples include an increased risk of urban flooding [5], the intensification of the urban heat island effect [6], and the loss of carbon storage and biodiversity [7,8]. Hence, understanding the current state and expansion mechanisms of IS, along with the rules governing its dynamic change, is crucial for sustainable urban ecological development.
Traditional methods for identifying IS largely rely on data collection, field surveys, or mapping. However, these approaches are labor-intensive, time-consuming, slow to update, and poorly suited to real-time monitoring. Rapidly advancing remote sensing technology is now widely used for the mapping and dynamic monitoring of IS, given its convenience, speed, and real-time capabilities [9]. Existing IS extraction methods based on remote sensing technology primarily comprise the regression model method [10], the spectral index method [11], the spectral mixture decomposition method [12], and various machine learning methods [13]. The spectral index method is preferred by many researchers for its simplicity and ease of implementation, but its subjective threshold selection and the spatial resolution limitations of remote sensing images may lead to the misclassification of ground objects [14]. The spectral mixture decomposition method is more objective; however, the selection of IS endmembers is intricate, and an improper selection can significantly reduce the precision of IS extraction [15].
Machine learning algorithms excel in detecting changes in land use and land cover. Ghayour et al. [16] evaluated the performance of several machine learning algorithms for generating land use and land cover (LULC) maps using Sentinel-2 and Landsat 8 satellite data, wherein the overall accuracy of the support vector machine classifier was reported to be 94%. Using the Google Earth Engine (GEE) platform, Saeid et al. [17] mapped LULC variations accurately from historical Landsat datasets, demonstrating the efficacy of the random forest algorithm as a potent classifier. Machine learning algorithms are also advanced techniques for artificial surface extraction. These models minimize the influence of subjective factors in the learning process, providing swift and precise recognition. For instance, Mahyou et al. [18] used the random forest algorithm to extract sample points and artificial neural networks to identify IS, which effectively and accurately extracted the impervious surface of Marrakesh. Utilizing Sentinel-1 and Sentinel-2 data, Shrestha et al. [19] employed the random forest algorithm to identify the IS of nine cities in Pakistan, achieving an overall classification accuracy between 85% and 98%. Esch et al. [20] combined Landsat images and road network data with the support vector machine method to map IS in parts of Germany, attesting to the method’s ability to accurately map large-scale IS. Jiang et al. [21] improved the extraction accuracy of Baoding’s built-up area by using Landsat 8 images and nighttime light data, along with the support vector machine algorithm. Despite their significant role in land use and artificial surface monitoring, machine learning algorithms encounter limitations when applied to IS recognition in medium-resolution remote sensing images. The primary reason lies in the inherent complexity and computational intensity of machine learning algorithms. Additionally, medium-resolution remote sensing images are inherently constrained by limitations in resolution and imaging performance. When these limitations are combined with the rich diversity and intricate complexity of the landscape, the result is spectral confusion. This, in turn, reduces the accuracy of IS extraction [22].
Many researchers have fused multiple features of remote sensing images to obtain high-quality information on IS and improve the accuracy of IS extraction. Shaban et al. [23] extracted three texture features, including the gray level co-occurrence matrix (GLCM), gray level difference histogram (GLDH), and sum and difference histogram (SADH), which were combined with spectral features to significantly improve the accuracy of IS extraction. Wang et al. [24] used the normalized difference vegetation index (NDVI) time series, reflectance spectral features, and spatial texture features as the feature input of support vector machine classification to extract IS. The overall classification accuracy of the method was 93.66%. IS extraction methods based on multi-feature inputs can significantly mitigate spectral mixing and increase the classification accuracy of IS. Traditional research methodologies typically take into account a limited number of features present in remote sensing images, thereby failing to comprehensively capture the distinctions between IS and other land types. Some studies attempt to leverage a diverse set of features in the process of IS identification, but they often overlook the significance of feature selection [25,26], leading to feature redundancy. In this research, we hypothesized that the classification accuracy may be influenced by both the category and the number of combined features, and that distinct categories of features might be optimally suited to different machine learning algorithms. Consequently, we selected spectral features, texture features, and seasonal land cover features for further examination, in order to investigate the extraction accuracy of IS under varying feature combinations and diverse machine learning algorithms.
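The experimental design implied by this hypothesis can be sketched as a simple enumeration. The sketch below assumes that every subset of the three feature groups is added to the base image bands and paired with each of the three classifiers; this assumption is consistent with the 24 datasets compared later in the paper, but the enumeration itself is illustrative. The abbreviations follow the model names used in the Discussion (IMG for image bands, SPE for spectral features, SSC for seasonal land cover features, TEX for texture features), and the ordering of the groups inside a name is a choice made here.

```python
from itertools import combinations

# Assumed labels, mirroring the model names used in the Discussion:
# IMG = image bands, SPE = spectral, SSC = seasonal land cover, TEX = texture.
FEATURE_GROUPS = ["SPE", "SSC", "TEX"]
CLASSIFIERS = ["ANN", "SVM", "RF"]


def coupling_models(feature_groups=FEATURE_GROUPS, classifiers=CLASSIFIERS):
    """Return (feature_combination, classifier) pairs for every subset of feature groups."""
    combos = []
    for size in range(len(feature_groups) + 1):
        for subset in combinations(feature_groups, size):
            # The image bands are always present; auxiliary groups are appended.
            combos.append("-".join(("IMG",) + subset))
    return [(combo, clf) for clf in classifiers for combo in combos]


models = coupling_models()
print(len(models))
for combo, clf in models[:3]:
    print(combo, clf)
```

Each pair would then be trained and evaluated on the same sample points, so that differences in accuracy can be attributed to the feature combination or to the classifier.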
The Dianchi Lake Basin is the largest plateau lake basin in the Yunnan–Guizhou Plateau. It is an important water conservation ecological function area, identified based on the ecosystem assessment and ecological security of China [27], and it is also the most economically dynamic area in Yunnan Province. Due to the rapid expansion of IS, the urban surface of the Dianchi Lake Basin has undergone rapid and dramatic changes, threatening the regional ecological environment, which is vulnerable and sensitive. Therefore, in this study, we investigated the Dianchi Lake Basin using remote sensing and machine learning algorithms, with the following objectives: (1) to optimize the IS extraction method by selecting the optimal coupling model of machine learning algorithm and remote sensing features, and to use it to map the IS in the Dianchi Lake Basin from 2000 to 2022; (2) to quantitatively analyze the dynamic change characteristics of IS in the Dianchi Lake Basin based on the long time-series mapping results. Furthermore, building upon the foundation of prior studies, we implemented two key enhancements: (1) We compared the coupling models of various machine learning algorithms and different remote sensing features, with the aim of selecting the most effective coupling model to maximize the accuracy of IS extraction. (2) We adopted the empirical analysis methodology of the partial least squares structural equation model (PLS-SEM) to dissect the causal link between IS distribution and its influencing factors, thereby providing a more precise depiction of the impact of both latent and observed variables.
5. Discussion
5.1. Effects of IS Expansion on the Ecological Quality in the Dianchi Basin
Greenness, humidity, dryness, and heat are closely associated with the quality of the ecological environment. Most studies have used these four indicators to construct the remote sensing ecological index (RSEI) [49,50,51]. To reflect the goal of “carbon neutrality”, in this study, we added “carbon storage” to the RSEI. Carbon storage was calculated using the InVEST model from the carbon density dataset and the land use/land cover types. Leveraging insights from prior research [52,53], the carbon density dataset was adjusted using the rainfall and temperature model [54,55]. This modification yielded the final carbon density data for the Dianchi Lake Basin. Finally, the improved remote sensing ecological index, C-RSEI, was constructed by conducting a principal component analysis on five indices: greenness (NDVI), humidity (WET), dryness (NDBSI), heat (LST), and carbon storage (Carbon). A larger value indicated a better eco–environmental quality.
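The construction described above can be illustrated with a minimal sketch. It assumes the common RSEI recipe, which the text does not spell out: each indicator is min-max normalized, the first principal component (PC1) of the five indicators is extracted, and the PC1 score is rescaled to [0, 1]. Orienting PC1 so that greenness loads positively, the use of power iteration, and all function names and sample values are assumptions made for illustration; the paper's actual processing chain may differ.

```python
def minmax(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def first_principal_component(columns, iterations=500):
    """Return (PC1 loadings, centered columns) using power iteration on the covariance matrix."""
    p, n = len(columns), len(columns[0])
    means = [sum(col) / n for col in columns]
    centered = [[v - m for v in col] for col, m in zip(columns, means)]
    cov = [[sum(a * b for a, b in zip(centered[i], centered[j])) / (n - 1)
            for j in range(p)] for i in range(p)]
    # Start on the first indicator's axis; this assumes PC1 has a nonzero loading on it.
    vec = [1.0] + [0.0] * (p - 1)
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    return vec, centered


def c_rsei(ndvi, wet, ndbsi, lst, carbon):
    """Illustrative C-RSEI: rescaled PC1 score of five normalized indicators."""
    columns = [minmax(c) for c in (ndvi, wet, ndbsi, lst, carbon)]
    loadings, centered = first_principal_component(columns)
    if loadings[0] < 0:  # assumption: higher greenness means better quality
        loadings = [-w for w in loadings]
    scores = [sum(loadings[i] * centered[i][k] for i in range(len(columns)))
              for k in range(len(ndvi))]
    return minmax(scores)


# Hypothetical values for five pixels, ordered from most built-up to most vegetated.
index = c_rsei(
    ndvi=[0.1, 0.3, 0.5, 0.7, 0.9],
    wet=[1, 2, 3, 4, 5],
    ndbsi=[0.9, 0.7, 0.5, 0.3, 0.1],
    lst=[40, 35, 30, 25, 20],
    carbon=[10, 20, 30, 40, 50],
)
print(index)
```

In this toy example, greenness, humidity, and carbon storage rise together while dryness and heat fall, so the index increases monotonically from the first pixel to the last.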
The C-RSEI of the Dianchi Basin in 2000, 2005, 2010, 2015, and 2022 was 0.5054, 0.5071, 0.5118, 0.5085, and 0.5055, respectively, showing an inverted “U” trend, i.e., the eco–environmental quality first increased and then decreased. The temporal and spatial distribution of the eco–environmental quality in the Dianchi Basin, from 2000 to 2022, is shown in Figure 8. The overall eco–environmental quality in the Dianchi Basin showed a zonal distribution of “good in the north and south, poor in the middle”. In Songming County, the south of the Jinning District, the east of the Chenggong District, the east and north of the Panlong District, and the eastern part of the Guandu District, the overall level of urbanization was low, the intensity of land use development was small, and the quality of the ecological environment was good. In contrast, the quality of the ecological environment was poor in the Wuhua District, the southwest of the Panlong District, the northeast of the Xishan District, the west of the Guandu District, the west of the Chenggong District, and the central part of the Jinning District because of the high level of urbanization, the accumulation of construction land, and the fragmentation of ecological land. From 2000 to 2022, with the continuous expansion of the IS, the eco–environmental quality of the north bank of Dianchi Lake gradually decreased.
To analyze the influence of IS expansion on the quality of the ecological environment, the study area was divided into grids of 1 km × 1 km; then, the IS coverage in each grid was calculated, and the average value of the C-RSEI was extracted. Pearson’s correlation coefficients between IS coverage and the C-RSEI in the Dianchi Basin in 2000, 2005, 2010, 2015, and 2022 were −0.408, −0.366, −0.403, −0.419, and −0.532, respectively, indicating a moderate negative relationship between IS coverage and the quality of the ecological environment at the scale of 1 km × 1 km. We then performed a bivariate spatial autocorrelation analysis on the data. The bivariate global spatial autocorrelation was expressed using the bivariate Moran’s I index. For the same five years, the bivariate Moran’s I indices of IS coverage and the C-RSEI in the Dianchi Basin were −0.398, −0.354, −0.387, −0.398, and −0.519, respectively, indicating a global negative spatial correlation between IS coverage and the quality of the ecological environment in the Dianchi Basin. The bivariate LISA cluster map of IS coverage and the C-RSEI in the Dianchi Basin is shown in Figure 9. The main cluster types were “low-high” and “high-low”, i.e., the quality of the ecological environment was better in regions with lower IS coverage and poorer in regions with higher IS coverage, indicating a local negative spatial correlation between IS coverage and the quality of the ecological environment.
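The two global statistics used above can be sketched for gridded data as follows. This is a minimal illustration, not the study's implementation: it assumes rook contiguity (cells sharing an edge) with row-standardized weights on a regular grid of at least two cells, whereas the spatial weights actually used are not stated in the text. The grid values and function names are hypothetical.

```python
def standardize(values):
    """Convert values to z-scores using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """Pearson correlation as the mean product of z-scores."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def rook_neighbors(rows, cols):
    """Map each cell index (row-major) to the cells that share an edge with it."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            nbrs[r * cols + c] = [rr * cols + cc for rr, cc in candidates
                                  if 0 <= rr < rows and 0 <= cc < cols]
    return nbrs


def bivariate_morans_i(x, y, neighbors):
    """Global bivariate Moran's I: z(x) at each cell against the mean z(y) of its neighbors."""
    zx, zy = standardize(x), standardize(y)
    total = 0.0
    for i, nbr in neighbors.items():
        spatial_lag = sum(zy[j] for j in nbr) / len(nbr)  # row-standardized weights
        total += zx[i] * spatial_lag
    return total / len(x)


# Hypothetical 3 x 3 grid: IS coverage rises from west to east, C-RSEI falls.
is_cover = [0.1, 0.5, 0.9,
            0.1, 0.5, 0.9,
            0.1, 0.5, 0.9]
c_rsei_mean = [1.0 - v for v in is_cover]

r = pearson_r(is_cover, c_rsei_mean)
moran_bv = bivariate_morans_i(is_cover, c_rsei_mean, rook_neighbors(3, 3))
print(r, moran_bv)
```

A negative bivariate Moran's I means that cells with high IS coverage tend to be surrounded by cells with low C-RSEI, which is the pattern reported for the basin.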
5.2. Comparison of the Results of IS Extraction for Each Coupled Model
Based on the classification accuracy of each coupling model (Figure 4), in the ANN extraction experiment, the OA and Kappa coefficient of IMG-SPE-SSC-TEX_ANN were the highest (84.0093% and 0.7830, respectively). The UA of IMG-TEX_ANN was the highest (92.5602%), and the PA of IMG-SSC_ANN was the highest (90.3676%). In general, the classification effect of IMG-SPE-SSC-TEX_ANN was the best; the OA, Kappa coefficient, and UA of IMG_ANN, which used only images for IS extraction, were the lowest, and the PA of IMG-SPE_ANN and IMG-TEX_ANN was also the lowest. In the extraction experiment based on the SVM coupling model, the OA and Kappa coefficient of IMG-SPE_SVM were the highest (91.9841% and 0.8882, respectively). The UA of IMG-SSC_SVM was the highest (90.7648%), and the PA of IMG-SSC-SPE_SVM was the highest (95.7752%). The extraction effect of IMG-SPE_SVM was the best. The OA and Kappa coefficient of IMG_SVM were the lowest, the UA of IMG-SPE-SSC-TEX_SVM was the lowest, and the PA of IMG_SVM and IMG-TEX_SVM was the lowest. In the RF-based IS extraction model, the OA and Kappa coefficient of IMG-SSC_RF were the highest (90.4579% and 0.8673, respectively). The UA of IMG-TEX_RF was the highest (88.7873%). The PA of IMG-SSC-SPE_RF was the highest (96.3245%). The OA and Kappa coefficient of IMG-SPE_RF were the lowest, the UA of IMG_RF was the lowest, and the PA of IMG-TEX_RF was the lowest.
We found that using only remote sensing images to extract the IS limited the extraction accuracy, while using remote sensing image features as auxiliary information improved the classification accuracy to some extent, and both the type and the number of fused features affected the extraction accuracy. This was because different kinds of features carried overlapping information, so fusing too many features caused redundancy and decreased the classification accuracy. Among the three machine learning methods, the best overall classification result was provided by the SVM, followed by the RF, and the ANN provided the worst classification result. The coupled models with the highest extraction accuracy for each of the three machine learning algorithms, namely IMG-SPE-SSC-TEX_ANN, IMG-SSC_RF, and IMG-SPE_SVM, were selected to compare the classification results of the three models.
The differences in the extraction ability of the three models were mainly concentrated in the extraction of low-reflectivity IS (such as roads and cement floors), and the details are compared in Figure 10. As shown in Figure 10a, in the suburban area in the central part of the Dianchi Lake Basin, the ANN omitted low-reflectivity IS, and more IS was erroneously classified as bare land. The ability of the SVM and RF to identify low-reflectivity IS was considerably higher than that of the ANN. As shown in Figure 10b, at Changshui International Airport in the Guandu District, the ANN also omitted IS and identified it as bare land, whereas the RF identified permeable areas (mostly bare land) as IS and thus showed commission errors, and the SVM showed a better recognition ability. As shown in Figure 10c, in the eastern suburb of the Chenggong District, the ANN had difficulty identifying IS with low reflectivity, which was similar to the results described for Figure 10a; thus, the ANN was poor at identifying areas with low IS cover.
5.3. Limitations and Prospects
Based on the three machine learning algorithms coupled with multiple features of remote sensing images, the optimal model was selected for extracting and mapping the IS of the Dianchi Lake Basin from 2000 to 2022. The dynamic characteristics of the IS in the Dianchi Lake Basin from 2000 to 2022 were then analyzed in terms of expansion speed, expansion direction, spatial correlation, and driving mechanism. In this study, we proposed a relatively innovative research framework for extracting and analyzing regional IS over a long time series. Our methods and findings might be important for the economic and urban development, as well as the sustainable coordination, of the Dianchi Lake Basin.
However, this study had some shortcomings. First, in the IS extraction experiment, the quantity and quality of sample points were directly related to the IS extraction accuracy, so the selection of sample points was very important but extremely tedious. For subsequent experiments on the extraction of IS in a large area, we can use OpenStreetMap and other open-source products to obtain sample points automatically. Second, in this study, only three machine learning algorithms were used—ANN, SVM, and RF—which have some limitations. We need to use other machine learning algorithms to compare and analyze the results obtained here. Additionally, this study was based on the empirical analysis method, where PLS-SEM was used to identify the driving mechanism of IS expansion in the study area; only the influences of elevation, slope, temperature, rainfall, GDP, population, tourism, road, and other factors were considered, while the reasons for IS changes were complex and diverse. Future studies might consider the impact of natural factors, such as soil types and solar radiation, as well as social factors, such as hospitals and schools.
6. Conclusions
Based on the Landsat images from 2000 to 2022, the optimal coupling model was used to extract and analyze the IS of the Dianchi Lake Basin. The results showed the following: (1) By comparing the confusion matrix and accuracy evaluation results of 24 sets of IS datasets, the optimal coupling model for extracting IS in the Dianchi Basin was found to be IMG-SPE_SVM, and the extraction effect of SVM was better than that of the other two machine learning methods. (2) The mapping results of IS, in different years, showed that significant changes occurred in the spatial distribution and shape of IS in the Dianchi Lake Basin between 2000 and 2022, but all changes occurred in the area around Dianchi Lake. (3) The IS expanded at a medium speed from 2000 to 2005, at a fast speed from 2005 to 2010, and at a high speed from 2010 to 2015. The rate of IS expansion showed a sequential acceleration from 2000 to 2015 and contracted slowly from 2015 to 2022. (4) The center of mass of the IS moved to the northeast in 2000, the IS expanded to the northeast with Dianchi Lake as the center, and the urban core also moved with it. The standard deviation ellipse shifted considerably in the south–north direction, and the degree of dispersion continued to increase. The IS expansion showed “north extension, east extension, and south extension”. (5) The coverage of IS showed a certain spatial global and local autocorrelation from 2000 to 2022. (6) Natural factors negatively affected the expansion of the IS, and this effect increased slightly after 2010. In contrast, social factors positively affected the distribution of the IS, and their effect gradually weakened from 2005 to 2022. (7) At the 1 km × 1 km scale used in the analysis, a moderate negative correlation was recorded between IS coverage and the eco–environmental quality in the study area, and a global and local negative spatial correlation was found between them.
(8) In this investigation, sample selection largely hinged on visual interpretation, which presents certain limitations. Future studies focusing on IS extraction might benefit from utilizing open-source products, such as OpenStreetMap, to acquire samples automatically. Further research could also consider deploying a range of machine learning or deep learning algorithms, beyond ANN, SVM, and RF, for comparative evaluation. Moreover, the influence of factors such as soil types, solar radiation, hospitals, and schools on the distribution of impervious surfaces can be explored in forthcoming studies.