Aquaculture Ponds Identification Based on Multi-Feature Combination Strategy and Machine Learning from Landsat-5/8 in a Typical Inland Lake of China

Abstract: Inland aquaculture ponds, as an important land use type, have brought great economic benefits to local people but at the same time have caused many environmental problems that threaten regional ecological security. Therefore, understanding the spatiotemporal pattern of aquaculture ponds and its potential influence on water quality is vital for the sustainable development of inland lakes. In this study, based on Landsat-5/8 images, three types of land features, namely spectral features, index features, and texture features, and five machine learning algorithms, namely random forest (RF), extreme gradient boosting (XGBoost), artificial neural network (ANN), k-nearest neighbor (KNN), and Gaussian naive Bayes (GNB), were combined to identify aquaculture ponds and other primary land use types around a typical inland lake of China. The results demonstrated that the XGBoost algorithm integrating all three feature types performed the best among all combinations of the five algorithms and the three feature types, with an overall accuracy of up to 96.15%. In particular, the texture features provided useful information beyond the spectral features, allowing more accurate separation of aquaculture ponds from other land use types and thus improving land use mapping in complex inland lakes. Next, this study examined the trend in aquaculture pond area and found a segmented increase, namely a sharp increase during 1984–2003 followed by a slow rise since 2003. Furthermore, the positive correlation detected between the area of aquaculture ponds and phytoplankton population dynamics suggests a likely influence of aquaculture activity on lake water quality. This study provides an important scientific basis for the sustainable management and ecological protection of inland lakes.


Introduction
Land use and cover change (LUCC) is currently an active research area in global environmental change, as it closely links human society with natural ecological processes and has a profound impact on human survival and development [1,2]. Aquaculture ponds, one of the important land use types, serve as one of the main sources of animal protein and are increasingly contributing to food security in Asia's populous inland cities. Specifically, China's aquaculture production accounted for approximately 60% of the world total until 2020 [3,4]. As the size of ponds has increased considerably, intensive aquaculture has caused serious destructive effects on local environments, such as the decrease in water quality, the decline in biodiversity, and the loss of services provided by aquatic ecosystems [5–7]. Therefore, understanding the expansion pattern of aquaculture ponds in inland lakes and its influences on local environments is of great importance to the healthy development of human-natural ecosystems.
In previous studies, scholars have sought to improve the accuracy of land use classification in two ways, namely, the incorporation of multi-source features and the development of new classification algorithms. Multi-source features mainly include band reflectance, remote sensing indices, and texture characteristics, and the fusion of these features can provide comprehensive surface information and thus enhance land use mapping. For example, Chen et al. (2017) [8] jointly employed Landsat-8 OLI, MODIS, HJ-1A, and ASTER DEM data to perform land cover classification in Beijing by integrating temporal, spectral, angular, and topographic information, which achieved a 4.53% higher overall accuracy (OA) than using only OLI data. Li et al. (2023) [9] explored the scaling effect of image spatial resolution on land cover classification from the perspectives of mixed-pixel decomposition and spatial heterogeneity based on GF-2, SPOT-6, Sentinel-2, and Landsat-8, and showed that GF-2 and SPOT-6 had the best classification performance with an OA of up to 92.81%. In addition, Feyisa et al. (2014) [10] proposed the Automatic Water Extraction Index (AWEI), which improved the classification accuracy (kappa = 0.98) of shaded and dark surface areas that are usually difficult to classify correctly by normal methods in New Zealand. Huang et al. (2015) [11] integrated texture features and DEM data using a BP artificial neural network and achieved high accuracy in remote sensing image classification and land use change detection (OA = 95.08%).
In terms of algorithms applied to land use classification, traditional classification models such as the maximum likelihood method [12], the K-means method [13,14], and the k-nearest neighbor algorithm (KNN) [15] dominated early research. However, with the rapid development of pattern recognition and machine learning, intelligent algorithms such as support vector machine (SVM) [16] and neural networks [17] have gradually come to the forefront, presenting higher accuracy and effectiveness than the traditional parametric methods in land use classification. Tree-based models, especially those built on ensemble learning methods such as random forest (RF) [18] and extreme gradient boosting (XGBoost) [19], have attracted widespread attention for their excellent performance and ease of use. In addition, feature selection plays a key role in improving classification accuracy [20]. By removing irrelevant or redundant features, model performance can be largely optimized without losing important information. Proper feature selection has been shown to have a significant impact on the final classification accuracy [21].
Inland lake aquaculture ponds are often overlooked or not included in the existing land use classification system. This is mainly due to the special nature and complexity of lake aquaculture ponds, which make them difficult to delineate clearly against traditional land use types. As a way of utilizing waters, lake aquaculture ponds have their own unique functions and characteristics, which differ from those of general land use types such as water bodies or agricultural land. Specifically, aquaculture ponds often appear as regularly shaped, isolated, and enclosed bodies of water [22,23]. The water quality of ponds is highly affected by aquaculture activities, such as feed delivery and fish excretion, which lead to a significant increase in the material circulation efficiency in the pond [24]. Because of this abundant nutrient supply, many aquaculture ponds are easily covered by vegetation and phytoplankton in the growing season, which makes it hard to distinguish them from land vegetation.
Traditional land use classification mainly focuses on distinguishable land utilization modes, such as water bodies, agriculture, forests, and built-up areas, while it lacks a finer categorization of water use modes in the land-water transition zone. Normally, in the existing land classification system, lake aquaculture ponds are categorized as an unspecified type or ignored. This fails to meet the needs of the scientific protection and sustainable development of lake resources. To accurately evaluate the intensity of aquaculture ponds, and to detect and quantify the distribution and change trends of aquaculture ponds, we selected a typical inland lake with a long history of fish pond culture in northern China to (1) implement different classifiers based on multilevel feature fusion for LUCC mapping and change detection for the selected 10 years of Landsat data from 1984 to 2022 and (2) explore the spatiotemporal pattern of aquaculture ponds and other associated land use types and their potential impacts on local water quality.

Study Area
Nansi Lake (116°34′-117°21′E, 34°27′-35°20′N) is one of the most important freshwater lakes in North China (Figure 1), which is not only the main fishery base of Shandong Province but also a critical intermediate lake on the east route of the South-to-North Water Transfer Project. It is approximately 126 km long from north to south and 5-25 km wide from east to west. The central part of the lake is slightly narrower, while the northern and southern parts are broader, forming a teardrop shape. The average water depth in the lake is 2 m. The study region belongs to a warm temperate semi-humid monsoon climate zone with an average annual temperature of about 13.7 °C and an average annual precipitation of 695.2 mm. More than 70% of the annual precipitation falls in the flood season from June to September. Pit-pond culture is the dominant fishery type in Nansi Lake and a vital part of the local economy. Aquaculture ponds constructed by setting up dike banks near the shore are mainly distributed in the water of Nansi Lake. However, with continuous economic development, the land use structure of the lake area has undergone significant changes in the past several decades. The explosive growth in local population and economic development and the rapid expansion in aquaculture ponds have led to a continuous decrease in arable land and waters and unavoidably resulted in a certain degree of ecological imbalance in this area.

Data
Landsat TM/OLI (L2) images from 1984 to 2022 provided by the United States Geological Survey (USGS) were used in this study to carry out land cover classification. To accurately separate aquaculture ponds from nearshore vegetation, we carefully selected a total of 10 cloud-free winter images to eliminate the effect of spectral similarity caused by plants growing in the ponds during the growing season. The acquisition dates of these images were 7 February 1987, 27 January 1989, 24 December 1993, 5 Febru-

In order to comprehensively analyze the long-term changes in LUCC around the lake, we set a 5 km buffer zone based on the vector extent of the lake. When determining the land cover types, we fully considered the actual land use status in the Nansi Lake area and the potential relationship between different land use types and the dynamic changes in aquaculture ponds. Therefore, five land cover types were identified in the current study, namely farmland, water, aquaculture pond, built-up land, and others (primarily consisting of forests and barrens). In addition, the phytoplankton density data derived from Wang et al. (2024) [25], who developed an RF-based model to quantify the ecological status of Nansi Lake from Landsat-8 OLI images and obtained a high prediction accuracy, were adopted to study the possible influence of aquaculture activity on lake water quality.

Samples Collection
In remote sensing image classification, sample quality is crucial to the final mapping accuracy. Following the principle of full frame selection, all Landsat TM/OLI images covering the study area were comprehensively visually interpreted. Special attention was paid to the selection of representative pixels for each type of land use to ensure that the samples could truly reflect the spectral characteristics and spatial distribution of each type of feature. After strict screening and calibration, 11,489 sample points evenly distributed throughout the study area were finally identified, including 1877 samples for farmland, 2649 samples for water, 3102 samples for aquaculture pond, 1122 samples for built-up land, and 2739 samples for others. This approach fully considered the balance of the scale of each LUCC type and thus could avoid classification bias due to excessive differences in the number of samples. Then, we randomly divided them into training and test datasets with an 8:2 ratio to ensure that the model could be adequately trained and its classification performance effectively evaluated.
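The random 8:2 division described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_samples`, the seed, and the placeholder sample identifiers are assumptions, and only the per-class counts come from the text.

```python
import random

def split_samples(samples, test_fraction=0.2, seed=42):
    """Shuffle labelled samples and split them into training and test sets."""
    rng = random.Random(seed)      # fixed seed (an assumption) for reproducibility
    shuffled = list(samples)       # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Per-class sample counts as reported in the text; feature vectors are omitted
# and replaced by placeholder (label, index) pairs.
class_counts = {
    "farmland": 1877,
    "water": 2649,
    "aquaculture pond": 3102,
    "built-up land": 1122,
    "others": 2739,
}
samples = [(label, i) for label, n in class_counts.items() for i in range(n)]
train, test = split_samples(samples)
```

A purely random split like this keeps class proportions only approximately; a stratified split would be an alternative if exact per-class ratios were required.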

Classification Features
First, this study selected the three visible bands, the near-infrared band, and the two short-wave infrared bands, corresponding to bands B1, B2, B3, B4, B5, and B7 of Landsat-5 TM and bands B2, B3, B4, B5, B6, and B7 of Landsat-8 OLI, as the direct features to capture the differences among land use types in spectral characteristics. Second, we considered the Enhanced Vegetation Index (EVI) and the Modified Normalized Difference Water Index (MNDWI) as two other keys to enhance the discrepancy among the targeted objects. The EVI reduces atmospheric effects and addresses the saturation issue that the traditional normalized difference vegetation index shows in areas of high vegetation coverage [26,27]. The expression is as follows:

$$\mathrm{EVI} = 2.5 \times \frac{\rho_{nir} - \rho_{red}}{\rho_{nir} + 6\rho_{red} - 7.5\rho_{blue} + 1}$$

where $\rho_{nir}$, $\rho_{red}$, and $\rho_{blue}$ are the atmospherically corrected reflectance values for the near-infrared, red, and blue bands, respectively. MNDWI can eliminate the effect of terrain difference and solve the problem of noise in water body identification [28,29]. The expression is as follows:

$$\mathrm{MNDWI} = \frac{\mathrm{Green} - \mathrm{SWIR1}}{\mathrm{Green} + \mathrm{SWIR1}}$$

where Green and SWIR1 are the reflectance values in the green band and short-wave infrared band 1, respectively.
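As an illustration of the two index features, the sketch below computes EVI and MNDWI per pixel using the standard published forms of both indices (EVI coefficients 2.5, 6, 7.5, and 1). The function names and the reflectance values are hypothetical and chosen only for demonstration.

```python
def evi(nir, red, blue):
    """Enhanced Vegetation Index from surface reflectance values."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

def mndwi(green, swir1):
    """Modified Normalized Difference Water Index from surface reflectance values."""
    return (green - swir1) / (green + swir1)

# Hypothetical reflectances for a vegetated pixel and an open-water pixel.
vegetation_evi = evi(nir=0.40, red=0.05, blue=0.03)
water_mndwi = mndwi(green=0.06, swir1=0.01)
land_mndwi = mndwi(green=0.10, swir1=0.30)
```

Water tends to give positive MNDWI because it absorbs strongly in the short-wave infrared, whereas dry land surfaces tend to give negative values. Pixels whose denominator is zero would need separate handling in a real workflow.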
In addition, we adopted the gray-level co-occurrence matrix (GLCM) method to extract texture information by calculating the gray-level spatial relationship between pixels, with the aim of further improving the classification accuracy [30]. Texture features play a crucial role in the recognition of ground object types, especially in distinguishing ground objects with similar spectral features but different spatial features. For example, aquaculture ponds and water bodies are similar in spectral reflectance, but their texture features, such as regularity, roughness, and grain size, may differ significantly. By introducing texture features, we can depict the spatial structure of ground objects more accurately and thus improve the accuracy of classification. In total, eight texture metrics were calculated, including mean, variance, homogeneity, contrast, dissimilarity, entropy, angular second moment, and correlation. These variables provide an effective tool for quantifying surface irregularities and are essential for distinguishing different land cover categories (Table 1). To guarantee the accuracy and effectiveness of the texture analysis, we used a window size of 3 × 3 to traverse the entire image pixel by pixel and used 64 gray levels to capture the detailed texture information in the image. With the above settings, we extracted 48 texture features from the original image (eight metrics for each of the six bands). However, too many features may lead to an increase in computational complexity and a decrease in classification performance. In order to reduce the feature dimension and extract the most important information, we performed principal component analysis (PCA) on these texture features. PCA is a statistical tool that transforms the original features into new, uncorrelated features through linear transformations. These new features are called principal components. The purpose of PCA is to identify the most important features from the data and aggregate them into a new, smaller set of features that explain the greatest degree of variance in the data. By calculating the covariance matrix of the texture feature matrix, PCA determines the direction that maximizes the variance in the data, that is, the main direction of the data change. Each principal component is a linear combination of the original features, with the first principal component explaining the largest variance in the data, the second principal component explaining the largest portion of the remaining variance, and so on. The first 5 principal components were selected for the subsequent land use classification.
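The PCA step described above (covariance matrix, directions of maximum variance, successive components) can be sketched in plain Python with power iteration and deflation. This is an illustrative sketch rather than the implementation used in the study: the function names, the seed, and the three-column demonstration data are assumptions, whereas the study reduced 48 texture features to 5 components.

```python
import math
import random

def covariance(rows):
    """Return the sample covariance matrix and the mean-centred rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centred

def leading_eigenpair(matrix, rng, iterations=1000):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    d = len(matrix)
    vec = [rng.gauss(0.0, 1.0) for _ in range(d)]   # random start direction
    scale = math.sqrt(sum(v * v for v in vec))
    vec = [v / scale for v in vec]
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(p * p for p in product))
        if norm == 0.0:                              # no variance left to explain
            return 0.0, vec
        vec = [p / norm for p in product]
    product = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
    return sum(v * p for v, p in zip(vec, product)), vec

def pca(rows, n_components, seed=0):
    """Project rows onto the first n_components principal components."""
    rng = random.Random(seed)
    cov, centred = covariance(rows)
    d = len(cov)
    total_variance = sum(cov[i][i] for i in range(d))
    components, explained = [], []
    for _ in range(n_components):
        value, vec = leading_eigenpair(cov, rng)
        components.append(vec)
        explained.append(value / total_variance)
        # Deflation: remove the variance captured by this component.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
              for c in centred]
    return scores, explained

# Hypothetical data: two strongly correlated columns plus a weak third one.
rows = [[i, 2 * i + (i % 3) * 0.1, (i % 5) * 0.05] for i in range(50)]
scores, explained = pca(rows, n_components=2)
```

Each entry of `explained` is the share of total variance carried by one component, which is the quantity one would inspect when deciding how many components to retain.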

Table 1. Texture metrics and descriptions.

| Texture metric | Description |
|---|---|
| Mean | Reflects the degree of regularity of the texture. |
| Variance | Measures the dispersion of the gray-level distribution to emphasize the visual edges of land cover patches. |
| Homogeneity | Measures the local gray-level homogeneity of an image. |
| Contrast | Reflects the total amount of local gray-level changes in an image. |
| Dissimilarity | Similar to contrast; if the local contrast is higher, the dissimilarity is also higher. |
| Entropy | Measures the amount of information contained in an image, representing the degree of non-uniformity or complexity of textures within the image. |
| Angular Second Moment | Measures the uniformity of the image gray-level distribution and the coarseness of the texture. |
| Correlation | Measures the linear relationship of gray levels, describing the degree of similarity between elements in rows or columns. |
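To make the texture metrics concrete, the following sketch builds a GLCM for a single 3 × 3 window quantized to 64 gray levels and derives five of the eight metrics from it. The pixel offset (one pixel to the right), the symmetric normalization, and the homogeneity form with 1 + (i − j)² in the denominator are common conventions assumed here; the text does not report which were used. Mean, variance, and correlation are omitted for brevity.

```python
import math

def glcm(window, levels, offset=(0, 1)):
    """Symmetric, normalised gray-level co-occurrence matrix of a 2-D window."""
    d_row, d_col = offset
    counts = [[0.0] * levels for _ in range(levels)]
    n_rows, n_cols = len(window), len(window[0])
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                i, j = window[r][c], window[r2][c2]
                counts[i][j] += 1.0      # count the pair in both directions
                counts[j][i] += 1.0      # so that the matrix is symmetric
    total = sum(map(sum, counts))
    return [[value / total for value in row] for row in counts]

def texture_metrics(p):
    """Selected texture metrics computed from a normalised GLCM."""
    levels = len(p)
    cells = [(i, j, p[i][j]) for i in range(levels) for j in range(levels)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "dissimilarity": sum(v * abs(i - j) for i, j, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "angular_second_moment": sum(v * v for _, _, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
    }

# Hypothetical windows: a perfectly smooth surface and a strongly alternating one.
smooth = texture_metrics(glcm([[5, 5, 5], [5, 5, 5], [5, 5, 5]], levels=64))
rough = texture_metrics(glcm([[0, 63, 0], [63, 0, 63], [0, 63, 0]], levels=64))
```

The smooth window yields zero contrast and maximal homogeneity, while the alternating window yields high contrast, which is the kind of difference that helps separate regular pond patterns from open water.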

Classification Algorithms
In this study, we employed two tree-based machine learning models, RF and XGBoost; two classical models, KNN and Gaussian naive Bayes (GNB); and an artificial neural network (ANN) for land cover classification. The RF model is the most widely used classification model in LUCC classification, with proven accuracy [31,32]. The XGBoost model stands out in the field of machine learning due to its efficient processing of large-scale data [33,34]. ANN is one of the most commonly used non-parametric classification techniques, renowned for its strong generalization capabilities [35]. KNN and GNB are both traditional machine learning algorithms that are computationally simple, run efficiently, and perform well in identifying land objects with high homogeneity [36,37].

Remote Sens. 2024, 16, 2168

RF
RF is an ensemble classifier based on decision trees, each of which is independently generated with a user-defined number of features on which its node splits are based. The selection of these features is randomized to ensure model diversity. The training data and variables for each decision tree are generated through a bagging strategy, and the final classification results are derived through majority voting [38]. In this study, we set the number of trees (n_estimators) to 100 to ensure that the model had sufficient diversity, and we set the maximum number of features considered at each split (max_features) to the square root of the total number of features to balance the complexity of the model and the risk of overfitting.
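The three ingredients named above (bootstrap sampling, a square-root feature subset, and majority voting) can be illustrated in isolation. This sketch does not train any trees; the helper names and the hypothetical votes are assumptions made for illustration only.

```python
import math
import random
from collections import Counter

def bootstrap_sample(samples, rng):
    """Draw a bagging sample: same size as the input, drawn with replacement."""
    return [rng.choice(samples) for _ in samples]

def features_per_split(n_features):
    """Square-root rule for the number of candidate features at each split."""
    return max(1, int(math.sqrt(n_features)))

def majority_vote(predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
bag = bootstrap_sample(list(range(10)), rng)

# With 13 input features, the square-root rule gives 3 candidates per split.
n_candidates = features_per_split(13)

# Hypothetical votes from 100 trees for a single pixel.
votes = ["aquaculture pond"] * 60 + ["water"] * 35 + ["others"] * 5
label = majority_vote(votes)
```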

XGBoost
XGBoost, a machine learning algorithm based on the Gradient Boosting Decision Tree framework, stands out due to its flexibility, high efficiency, and outstanding performance in Kaggle machine learning competitions [19]. By introducing a regularization mechanism, XGBoost is able to smooth the final learned weights, effectively avoid overfitting, and thus improve the learning accuracy. In addition, XGBoost is equipped with parallel and distributed computing capabilities, which significantly accelerate learning. In this study, we set the number of iterations (n_estimators) to 100, the maximum depth of the decision tree (max_depth) to 10, and learning_rate to 1.
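For reference, the regularization mentioned above can be written out using the objective of the standard XGBoost formulation. The symbols below are not defined in this paper and follow that formulation: $l$ is the loss, $f_k$ is the $k$-th tree, $T$ is its number of leaves, $w$ is its vector of leaf weights, and $\gamma$ and $\lambda$ are regularization coefficients whose values are not reported here and are therefore left symbolic.

```latex
\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^{2}

\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta\, f_t(x_i)
```

Under the settings stated in the text, $K$ corresponds to n_estimators = 100 and $\eta$ to learning_rate = 1, so each new tree's output is added to the running prediction without shrinkage.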

KNN
KNN is an instance-based classifier for classification and regression [39]. It does not rely on an explicit model training process. Instead, it measures the distance from an unknown sample to all samples in the training set, finds the K closest training samples, and uses the category with the most votes among those K samples as the prediction for the unknown sample [40]. After experimental validation, K was set to 20 to ensure that the model could make full use of the information from neighboring samples when classifying, while avoiding the influence of noisy data on the classification results.
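The voting procedure described above can be sketched in a few lines. Euclidean distance is assumed here because the text does not name the distance metric, and the two-feature training samples are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_features, train_labels, query, k=20):
    """Classify `query` by majority vote among its k nearest training samples."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-feature samples for two classes, 25 points each.
water = [(0.02 + 0.001 * i, 0.01) for i in range(25)]
pond = [(0.10 + 0.001 * i, 0.05) for i in range(25)]
train_features = water + pond
train_labels = ["water"] * 25 + ["aquaculture pond"] * 25

prediction = knn_predict(train_features, train_labels, query=(0.03, 0.012), k=20)
```

A larger K smooths out isolated noisy samples at the cost of blurring boundaries between small classes, which is the trade-off the choice of K = 20 addresses.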

GNB
GNB is a probabilistic machine learning algorithm that relies on Gaussian distributions. Its principle is based on Bayes' theorem and the assumption of conditional independence between features, with the feature variables of each category assumed to obey a normal distribution. By calculating the mean and variance of the feature variables of each category, the algorithm can estimate the probability that an unknown sample belongs to each category based on these statistics [41,42]. In our dataset, the distribution of most features is approximately normal, which provides a reasonable basis for the application of the GNB model (Figure S1). In this study, the prior was set to None and var_smoothing was set to 1 × 10⁻⁹.
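A minimal sketch of the GNB computation follows: per-class means and variances are estimated, and a sample is assigned to the class with the highest log posterior. Class priors are taken from class frequencies, which is assumed to be what leaving the prior unset implies. The smoothing term is applied as a fraction of the largest overall feature variance, mirroring how scikit-learn documents var_smoothing; that correspondence is an assumption, as are the function names and the sample values.

```python
import math
from collections import defaultdict

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def fit_gnb(features, labels, var_smoothing=1e-9):
    """Estimate the prior, per-feature means, and variances for each class."""
    n_features = len(features[0])
    epsilon = var_smoothing * max(
        variance([x[j] for x in features]) for j in range(n_features)
    )
    grouped = defaultdict(list)
    for x, y in zip(features, labels):
        grouped[y].append(x)
    model = {}
    for label, rows in grouped.items():
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        variances = [variance([r[j] for r in rows]) + epsilon
                     for j in range(n_features)]
        model[label] = (len(rows) / len(features), means, variances)
    return model

def predict_gnb(model, x):
    """Return the class with the highest log posterior for sample x."""
    def log_posterior(prior, means, variances):
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
            for value, mean, var in zip(x, means, variances)
        )
        return math.log(prior) + log_likelihood
    return max(model, key=lambda label: log_posterior(*model[label]))

# Hypothetical (NIR reflectance, MNDWI) samples for two classes.
features = [(0.02, 0.60), (0.03, 0.70), (0.025, 0.65), (0.035, 0.75),
            (0.30, -0.40), (0.35, -0.30), (0.32, -0.35), (0.28, -0.45)]
labels = ["water"] * 4 + ["farmland"] * 4
model = fit_gnb(features, labels)
```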

ANN
The ANN classification algorithm learns the relationship between input features and output categories through a training process. During training, the network calculates outputs through forward propagation and then computes output errors through the backpropagation algorithm, updating network weights based on these errors [43]. ANN classification algorithms typically consist of an input layer, one or more hidden layers, and an output layer, with each neuron using an activation function (Sigmoid, Tanh, ReLU, etc.) to determine whether to activate, introducing non-linear factors that enable the neural network to learn and model complex non-linear relationships [44]. In this study, we employed a Multilayer Perceptron (MLP) as the neural network architecture, selected the logistic function as the activation function, and chose the lbfgs optimizer to refine the weights.
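The option names in the text (logistic activation, lbfgs optimizer) match those of scikit-learn's `MLPClassifier`, so the configuration might look like the sketch below. The use of scikit-learn is an assumption, since the text does not name the library. The hidden layer size is left at the library default because it is not reported, and `max_iter`, `random_state`, and the training samples are hypothetical choices made only to keep the example small and repeatable.

```python
from sklearn.neural_network import MLPClassifier

# Settings named in the text: logistic activation and the lbfgs solver.
# max_iter and random_state are assumptions, not values from the study.
mlp = MLPClassifier(activation="logistic", solver="lbfgs",
                    max_iter=500, random_state=0)

# Hypothetical two-feature samples (for example NIR reflectance and MNDWI).
X = [[0.02, 0.60], [0.03, 0.70], [0.30, -0.40], [0.35, -0.30]] * 5
y = ["water", "water", "farmland", "farmland"] * 5

mlp.fit(X, y)
predicted = mlp.predict([[0.025, 0.65], [0.32, -0.35]])
```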

Analysis
Based on spectral features, index features, and texture features, we constructed three feature schemes (Table 2) and trained KNN, GNB, ANN, RF, and XGBoost classification models, respectively. In order to objectively and systematically evaluate the performance of different classification algorithms and feature schemes, we used a variety of statistical metrics for quantitative analysis. Specifically, we calculated the confusion matrix of each model based on the training and testing datasets to visualize the model's classification effect on each category of samples. On this basis, we further quantified the OA, which directly reflects the proportion of objects correctly classified by the model and thus provides an intuitive performance measurement.
In addition, in order to evaluate the model performance more comprehensively, we introduced the Kappa consistency coefficient, which describes the degree of consistency between the model's classification results and the actual situation. Meanwhile, we calculated the Producer Accuracy (PA) and recall to assess, respectively, how reliable the predicted positive examples were and how many of the real positive examples were recovered.
Finally, we adopted the F1 score as a comprehensive evaluation metric, which combines the information of precision and recall and reflects the overall performance of the model in the classification task. Through the combined analysis of these metrics, we are able to assess the performance of different classification algorithms and feature schemes in LUCC classification more objectively and comprehensively.
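The metrics listed above can all be derived from a confusion matrix, as the following sketch shows. The function name and the two-class matrix are hypothetical. The column-based measure is labeled precision here, which is assumed to be the role PA plays in the text, since F1 is described as combining precision and recall.

```python
def classification_metrics(matrix):
    """OA, kappa, and per-class precision/recall/F1 from a confusion matrix.

    matrix[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(matrix)
    total = sum(map(sum, matrix))
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    correct = sum(matrix[i][i] for i in range(n_classes))

    overall_accuracy = correct / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (overall_accuracy - expected) / (1 - expected)

    per_class = []
    for i in range(n_classes):
        precision = matrix[i][i] / col_totals[i] if col_totals[i] else 0.0
        recall = matrix[i][i] / row_totals[i] if row_totals[i] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append({"precision": precision, "recall": recall, "f1": f1})
    return overall_accuracy, kappa, per_class

# Hypothetical two-class confusion matrix (rows: truth, columns: prediction).
oa, kappa, per_class = classification_metrics([[45, 5], [10, 40]])
```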

Feature Selection and Feature Importance
In this study, feature importance was assessed for the three feature schemes using RF (Figure 2). Among the 13 features assessed, the index feature EVI had the highest importance score of 0.16, indicating that EVI played a key role in classification prediction. The spectral features NIR and SWIR1 also exhibited high importance scores, reflecting their effectiveness in distinguishing different LUCC types. For texture features, the importance scores of the principal components PC1 and PC2 were relatively higher than those of PC3, PC4, and PC5. In comparison, the effects of the visible bands were relatively weaker than those of the index and texture features.

Feature Profile Comparison
In order to clearly understand the differences among the five targeted objects, we systematically compared their spectral features, index features, and principal components of texture features (Figure 3). In the visible light bands, the reflectance distributions of the five objects were relatively similar and generally in the range of 0~0.2. This indicated that the spectral characteristics of these classes did not differ much in the visible light bands, making it difficult to distinguish them effectively by relying only on the visible bands. In comparison, in the non-visible bands (NIR, SWIR1, and SWIR2), the spectral properties of the five classes showed significant differences. Specifically, farmland and built-up areas had relatively higher reflectance in the NIR and SWIR bands than aquaculture ponds, water, and others. Meanwhile, water in particular reflected less than aquaculture ponds, which helps to distinguish the two. Regarding index features, the highest EVI values and the smallest MNDWI values were detected in farmland. The EVI and MNDWI values of aquaculture ponds lay between those of water and those of built-up areas/others. As for texture profiles, only PC1 and PC2 exhibited obvious differences among the five classes, providing an effective basis for LUCC classification. In contrast, the PC3-PC5 principal components largely overlapped with each other, suggesting that they contributed little to improving classification accuracy.

Accuracy Comparison of Different Classification Models
The performance of the classification models was explored in depth on the test data (Table 3). Among the classifiers, the XGBoost and RF models exhibited better performance in the classification task and reached higher accuracies of up to 96.15% and 95.92%, respectively. In contrast, the GNB model performed the worst, with an accuracy below 65%. Among the feature schemes examined, scheme 5, which incorporated texture features, outperformed all other schemes overall, especially with the XGBoost classifier. In comparison, scheme 3, which included only spectral and texture features, was slightly less effective than scheme 5, indicating the significance of index features in the classification process. Scheme 4, which contained only index and texture features, had lower classification accuracy than scheme 5, confirming the critical role of spectral features in classification tasks. The XGBoost classifier under scheme 5 was then used to evaluate the accuracy for each land use type, and the corresponding normalized confusion matrix is presented in Figure 4. Among the different land use types, built-up land had the lowest classification accuracy, with PA and recall of 91.3% and 88.63%, respectively. In contrast, farmland had the highest recall, reaching 97.19%, while the 'other' type had the highest PA, at 97.24% (Figure 4b). Apart from built-up land, the correct classification ratio for the land use types was generally higher than 0.9. The correct classification ratio for aquaculture ponds was 0.93, which was relatively high among all land use types. Further analysis of the misclassification of aquaculture ponds revealed that the highest proportion of errors was with water, amounting to 0.03 (Figure 4a). This indicates the high accuracy of XGBoost under scheme 5 in classifying most types of land use.
In terms of visualization, compared to scheme 1, scheme 2, and scheme 4, the patch integrity of surface objects predicted under scheme 5 was significantly improved and the confusion of categories was significantly reduced (Figure 5). Especially in the categorization of aquaculture ponds, scheme 5-based prediction significantly improved the continuity and completeness of their distribution.

Land Cover Changes in Nansi Lake
Based on the results predicted by the XGBoost classifier under scheme 5, this study analyzed the development of the area of five land cover types around Nansi Lake between 1987 and 2021 (Figure 6b). Built-up land and aquaculture ponds have expanded substantially since 1987. Specifically, the area of built-up land extended from 120 km² in 1987 to 296 km² in 2021, while the area of aquaculture ponds surged from 48 km² to 842 km², with a sharp increase during 1984-2003. On the contrary, the area of the 'other' type greatly shrank from the dominant cover type to about 215 km² by 2021. Farmland showed a slight expansion tendency overall with a sudden drop around 2003. The area of lake bodies fluctuated dramatically during the study period, but no significant trend was found. Spatially, the distribution of aquaculture ponds after 2002 has shown pronounced geographic clustering, primarily concentrated in the western and central regions of Nansi Lake (Figure 6a).
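Class areas such as those above are typically obtained by counting classified pixels and multiplying by the pixel footprint. The sketch below assumes the 30 m pixel size of the Landsat TM/OLI multispectral bands; the function names are hypothetical, and the expansion factors are computed directly from the areas reported in the text.

```python
def class_area_km2(pixel_count, pixel_size_m=30.0):
    """Area covered by `pixel_count` classified pixels, in square kilometres."""
    return pixel_count * pixel_size_m ** 2 / 1_000_000

def expansion_factor(start_area, end_area):
    """Ratio of the final area to the initial area."""
    return end_area / start_area

# Areas in km^2 as reported in the text for 1987 and 2021.
built_up_factor = expansion_factor(120, 296)
pond_factor = expansion_factor(48, 842)
```

With these figures, built-up land grew roughly 2.5-fold and aquaculture ponds roughly 17.5-fold over the study period.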

Relationship between Water Quality and the Expansion of Aquaculture Ponds
This study analyzed the potential impact of aquaculture pond expansion on water quality, expressed through the amount of phytoplankton in Nansi Lake, by means of correlation analysis (Figure 7). We found that phytoplankton abundance decreased after 2003 while the area of aquaculture ponds continued to increase. Nonetheless, the two variables showed a positive correlation (R = 0.5) in their annual fluctuations, suggesting that the water quality of the lake may be partially affected by aquaculture activity.
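For reference, a correlation coefficient of the kind reported above can be computed as in the following minimal sketch. It assumes that R denotes the Pearson coefficient, which this section does not state explicitly; the function name is hypothetical, and the two short series are placeholders that do not come from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long numeric series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Placeholder series, for illustration only (not measurements from Nansi Lake).
placeholder_area = [1.0, 2.0, 3.0, 4.0]
placeholder_density = [2.0, 4.0, 6.0, 8.0]
r_perfect = pearson_r(placeholder_area, placeholder_density)
r_inverse = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

A coefficient of 0.5 corresponds to a moderate positive association, which is consistent with the cautious interpretation given above.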

Discussion
In this study, we performed land use classification with Landsat 5/8 images around Nansi Lake by means of KNN, GNB, RF, and XGB algorithms under multilevel features. Our results showed that the XGB algorithm, especially when combined with texture features, achieved the highest classification accuracy of up to 96.15%. Compared with the existing literature, the classification accuracy in this study is noticeably higher. For example, Talukdar et al. (2020) [35] used the RF algorithm to classify a riparian landscape in India and obtained a lower classification accuracy with a kappa coefficient of 0.89. Similarly, Abbas and Jaber (2020) [45] used a WorldView-2 image and the SVM algorithm to classify the land use in Hilla City in Babylon, Iraq, and obtained an overall classification accuracy of 94.48% and a kappa coefficient of 0.9, which are still lower than in the current study. Xia et al. (2020) [46] extracted aquaculture ponds in Shanghai by integrating existing multi-source remote sensing data on the Google Earth Engine platform and combining multi-threshold connected component segmentation with the random forest algorithm, reaching an OA of 91.8%, which is also lower than in the current study. This study not only confirms the key role of multiple feature integration in improving classification results, but also highlights the great potential of advanced machine learning algorithms in land use classification.
The key to achieving such a high classification accuracy in this study is the introduction of texture features combined with effective dimensionality reduction. Texture features enable the model to better distinguish objects with similar spectra but large texture differences, especially for specific surface object types such as aquaculture ponds. The PCA method effectively reduced the information redundancy among features and improved the classification ability. In the PCA of the texture data, the first principal component accounts for the largest share of the variance, the second principal component explains the most variance among the remaining components, and so on. The first five principal components collectively represent approximately 99.99% of the shape information of all land cover types and contributed more to distinguishing aquaculture ponds and farmlands with a regular shape from others. The overall accuracy of RF scheme 4 is 92.03%, indicating that satisfactory classification can be achieved even when using only index and texture principal component features. This may be attributed to the EVI, a vegetation index that integrates information from the near-infrared, red, and blue bands, outperforming single-band data in terms of classification performance. Similarly, the MNDWI, as a water body index, effectively distinguishes water from other land use types by utilizing information from the shortwave infrared and green bands. It is important to note that although index features (EVI and MNDWI) are included in this study, their contribution to the final classification accuracy when combined with spectral and texture features is not significant. However, in the feature importance evaluation, these indicators scored higher. The reason may be that the spectral information was already sufficient and strongly collinear with the index features, so that adding the indices brought no significant increment in useful information to the final classification. Nevertheless, the contribution of spectral features to the classification process remains substantial. The overall accuracy of the RF classification model based on spectral features reached 92.51%, highlighting the key role of spectral features in distinguishing different land cover types. Particularly in the feature importance evaluation, the NIR band, which scores highly, is very effective in differentiating vegetation types, while the SWIR band demonstrates its unique ability in identifying water types.
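The two index features discussed above can be written compactly as per-pixel functions. The sketch below is illustrative only: it uses the widely used EVI coefficients (gain 2.5, aerosol coefficients 6 and 7.5, canopy adjustment 1) and the common green/SWIR form of the MNDWI, and it assumes that inputs are surface reflectance values already scaled to the range 0 to 1. Whether this study used exactly these coefficients, and which SWIR band it used, is not restated in this section, so both are assumptions; the function names and sample values are hypothetical.

```python
def evi(nir, red, blue):
    """Enhanced Vegetation Index with the commonly used coefficients (assumed here)."""
    denominator = nir + 6.0 * red - 7.5 * blue + 1.0
    if denominator == 0:
        return float("nan")  # index is undefined for this pixel
    return 2.5 * (nir - red) / denominator


def mndwi(green, swir):
    """Modified Normalized Difference Water Index from green and shortwave infrared."""
    denominator = green + swir
    if denominator == 0:
        return float("nan")  # index is undefined for this pixel
    return (green - swir) / denominator


# Hypothetical reflectance values, for illustration only.
vegetation_evi = evi(nir=0.5, red=0.1, blue=0.05)
water_mndwi = mndwi(green=0.10, swir=0.02)  # water absorbs strongly in the SWIR
soil_mndwi = mndwi(green=0.10, swir=0.30)   # dry surfaces reflect more in the SWIR
```

Because both indices are algebraic combinations of bands that are already present in the spectral feature set, they are strongly collinear with those bands, which is consistent with their limited added value when all three feature types are combined.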
From the perspective of land use change, this study revealed dramatic conversions among land cover types around Nansi Lake. With the advancement of urbanization, natural land types such as bare land and forest land have been gradually converted to other uses. At the same time, the expansion of built-up land and aquaculture ponds reflects the increasing demand for land due to population growth and urban expansion. It is worth noting that since 2003, the expansion rate of aquaculture ponds in Nansi Lake has slowed down, and at the same time, the water quality of Nansi Lake has also shown a gradual improvement. This change may be closely related to the government's policy of keeping the lake natural, which forced many farmers to reduce aquaculture activities to restore the integrity of the lake ecosystem. With the reduction in fertilizer application, a decrease in phytoplankton density was detected. The diverging long-term trends of aquaculture pond area and phytoplankton density likely arise because the embankments of many aquaculture ponds remain in place despite the absence of fishery activity under the strict management of the local government. However, we must also recognize the complexity of the relationship between aquaculture ponds and changes in water quality in the lake region. The change in water quality is the result of multiple factors, including climate, hydrology, land use, and human activities. Although the change in aquaculture pond area has a certain impact on water quality, it is only one of many factors affecting water quality changes. Therefore, to fully and deeply understand the causes of water quality evolution, more comprehensive investigation is needed in the future.
Despite the remarkable results of this study, some limitations need to be noted. First, high-precision classification relies heavily on the accurate selection of training and validation samples. In this study, we used visual interpretation to select samples, which is inevitably affected by subjective factors. Different interpreters may label land cover types in the same area differently according to their own experience and judgment, leading to mislabeled samples and thus adversely affecting the accuracy of the classification results. Meanwhile, due to the complexity of local land use types, especially in the transition areas, the unclear boundaries between these types often make it particularly difficult to accurately select high-purity pixels. Second, the current study mainly relies on traditional feature selection methods, which may not be able to fully mine the potential information in the data. Future research could try to use deep learning techniques to automatically extract and select the most discriminative features.

Conclusions
To address the difficult problem of identifying aquaculture ponds in the Nansi Lake region, this study integrated multi-level features into different machine learning algorithms and achieved high-precision land use classification with a highest accuracy of 96.15%, overcoming limitations of traditional methods. This study shows that the land use pattern in the region has greatly transformed, and natural land such as bare land and forest land has largely been replaced by aquaculture ponds and built-up land. At the same time, we found that phytoplankton density was correlated with the changes in the area of aquaculture ponds, suggesting that the expansion of ponds and the reduction in local farming intensity may have changed the hydrological environment of the lake.

Figure 1. Geographical location and overview of Nansi Lake.

2.2. Data
Landsat TM/OLI (L2) images from 1984 to 2022 provided by the United States Geological Survey (USGS) were used in this study to carry out land cover classification. In order to accurately separate aquaculture ponds from nearshore vegetation, we carefully selected a total of 10 cloud-free winter images to eliminate the effect of spectral convergence caused by plants growing in the ponds during the growing season. The acquisition dates of these images were 7 February 1987, 27 January 1989, 24 December 1993, 5 February 1998, 31 January 2002, 20 December 2003, 26 January 2006, 29 November 2013, 10 December 2017, and 5 December 2021.
PC4, and PC5. In comparison, the effects of visible bands are relatively weaker than those of index and texture features.

Figure 3. Quantity difference among five targeted objects in (a) spectral features, (b) index features, and (c) texture features.

Figure 4. Comparison of classification accuracy for land use types identified with XGBoost scheme 5: (a) confusion matrix; (b) PA and recall for each land use type.

Remote Sens. 2024, 16, x FOR PEER REVIEW 11 of 16

Figure 5. LUCC mapping and details comparison: (a,b) are two partial details presented to compare the land use classification performance among different models.

Figure 7. Correlation of aquaculture pond area and phytoplankton density. The correlation coefficient (a) and regression equation (b) between aquaculture pond area and phytoplankton density.

Table 1. Characteristics and description of selected GLCM.

Table 3. Comparison of classification accuracy of different models.