Exploring Wetland Dynamics in Large River Floodplain Systems with Unsupervised Machine Learning: A Case Study of the Dongting Lake, China

: Large river ﬂoodplain systems (LRFS) are among the most diverse and dynamic ecosystems. Accurately monitoring the dynamics of LRFS over long time series is fundamental and essential for their sustainable development. However, challenges remain because the spatial distribution of LRFS is never static due to inter- and intra-annual changes in environmental conditions. In this study, we developed and tested a methodological framework to re-construct the long-term wetland dynamics in Dongting Lake, China, utilizing an unsupervised machine-learning algorithm (UMLA) on the basis of MODIS (Moderate Resolution Imaging Spectroradiometer) EVI (Enhanced Vegetation Index) time series. Our results showed that the UMLA achieved comparable performance to the time-consuming satellite image segmentation method with a Kappa coe ﬃ cient of agreement greater than 0.75 and an overall accuracy over 85%. With the re-constructed annual wetland distribution maps, we found that 31.35% of wet meadows, one of most important ecological assets in the region, disappeared at an average rate of c.a. 1660 ha year − 1 during the past two decades, which suggests that the Dongting Lake is losing its ecological function of providing wintering ground for migratory water birds, and remediation management actions are urgently required. We concluded that UMLA o ﬀ ers a fast and cost-e ﬃ cient alternative to monitor ecological responses in a rapidly changing environment.


Introduction
Globally, the largest freshwater wetlands are associated with floodplains of large rivers [1,2]. Some well-known examples including the Amazon floodplains (over 800,000 km 2 of wetlands) [3], middle-lower Yangtze floodplains [4,5], and the Okavango Delta in northern Botswana [6]. These wetlands are highly dynamic systems displaying great inter-and intra-annual expansion and shrinkage in extent driven by flow regimes of their associated rivers [7][8][9][10]. Moreover, variations in topography will dictate the ability of the floodplain to drain, as well as the direction of flow and the ponding of river water. The variations of hydrological regimes and topography in space and time pixels according to a defined distance measure and labels clusters based on ancillary information. In contrast, with labelled reference (training) data-a supervised method-trains a classifier, which is then used to predict the classes of the un-labelled data [60]. The major advantage of unsupervised clustering methods is that they require no prior training [54], which can be time-consuming and costly, making them the dominant method for large area land-cover mapping and monitoring [49,61].
In this study, we used the 16-day composite MODIS EVI time series (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017)(2018)(2019) to explore the wetland dynamics in a large river floodplain system, the Dongting Lake (referred as the Lake thereafter) in China. The Lake is one of two large lakes that remains freely connected with the Yangtze River. The completion of the Three Gorges Dam (TGD) has greatly changed the hydraulic relationship between the Yangtze and its floodplains [62] and induced alteration in inundation regimes in the floodplain, which threatens the ecological integrity of floodplains [12]. Our main objective is to develop and test a methodological framework to re-construct the long-term wetland dynamics utilizing k-medoids algorithm (PAM), an unsupervised machine-learning algorithm (UMLA). The study was designed to address the two specific issues: (1) whether UMLA based on 250 m MODIS data is robust and can produce annual wetland maps with comparable accuracy; (2) whether wetlands can be efficiently discriminated using phenological metrics derived from MODIS EVI time series.

Study Site
The Dongting Lake (28 • 30 N-30 • 20 N, 111 • 40 E-113 • 10 E, hereafter referred as the Lake) is the second largest freshwater wetland complex in China, consisting a mosaic of permanent lakes and ephemeral swamps and floodplains that are interconnected through a network of channels and streams. It is located within the middle Yangtze River catchment of China ( Figure 1) in a subtropical humid zone. Precipitation deceases from south to north with basin average annual total of 1400 mm. Temperature variation within the basin is relatively small and the mean summer and mean winter temperatures are 28.0 • C and 6.7 • C, respectively.
Remote Sens. 2020, 12, x FOR PEER REVIEW 3 of 19 classifier, which is then used to predict the classes of the un-labelled data [60]. The major advantage of unsupervised clustering methods is that they require no prior training [54], which can be timeconsuming and costly, making them the dominant method for large area land-cover mapping and monitoring [49,61].
In this study, we used the 16-day composite MODIS EVI time series (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017)(2018)(2019) to explore the wetland dynamics in a large river floodplain system, the Dongting Lake (referred as the Lake thereafter) in China. The Lake is one of two large lakes that remains freely connected with the Yangtze River. The completion of the Three Gorges Dam (TGD) has greatly changed the hydraulic relationship between the Yangtze and its floodplains [62] and induced alteration in inundation regimes in the floodplain, which threatens the ecological integrity of floodplains [12]. Our main objective is to develop and test a methodological framework to re-construct the long-term wetland dynamics utilizing k-medoids algorithm (PAM), an unsupervised machine-learning algorithm (UMLA). The study was designed to address the two specific issues: (1) whether UMLA based on 250 m MODIS data is robust and can produce annual wetland maps with comparable accuracy; (2) whether wetlands can be efficiently discriminated using phenological metrics derived from MODIS EVI time series.

Study Site
The Dongting Lake (28°30′ N-30°20′ N, 111°40′ E-113°10′ E, hereafter referred as the Lake) is the second largest freshwater wetland complex in China, consisting a mosaic of permanent lakes and ephemeral swamps and floodplains that are interconnected through a network of channels and streams. It is located within the middle Yangtze River catchment of China ( Figure 1) in a subtropical humid zone. Precipitation deceases from south to north with basin average annual total of 1400 mm. Temperature variation within the basin is relatively small and the mean summer and mean winter temperatures are 28.0 °C and 6.7 °C, respectively.  The Lake has three main water supply: escaping flows from the Yangtze through the "Three Outfalls", inflows from four major tributaries (i.e., Li, Yuan, Zi, and Xiang), and local rainfall and runoff. The Lake discharges into the Yangtze through the only outlet channel located at northeast corner, and the flow was gauged at Chenglingji ( Figure 1). The entire lake was divided into three sub-lakes, West Dongting Lake (WDTL), South Dongting Lake (SDTL), and East Dongting Lake (EDTL) (Figure 1), mainly due to the administrative boundary. The sub-lakes are connected through braided streams. In a typical year, lake water level starts to rise from early March and reaches peak at July, when most wetlands are inundated. When flood recedes in late September, vast mudflats and shallow lakes emerge, providing opportunity for plants to colonize and develop, which in turn provides foraging areas for wintering migratory birds, such as the greater white-fronted goose (Anser albifrons) and the tundra swan (Cygnus columbianus).
The water level in the Lake is characterized by large seasonal fluctuation: it can be up to 9.5 m at the inlet (Nanzue) and to 16.7 m at the outlet (Chenglingji) (Figure 1). The dramatic water level dynamics maintains a unique vegetation community zonation along the elevation gradient (Table 1), ranging from permanent waters with submersed macrophytes to the forested wetlands with almond willow (Salix triandra) and forest plantation of black poplar (Populus nigra). The Lake experienced a long history of land reclamation and the alternation of the natural flow regimes, especially after the Three Gorges Dam. These human disturbances have dramatically reduced the lake capacity [63] and changed the flow seasonality [62], resulting significant difference in wetland distribution [13] with potentially detrimental ecological consequences, such as losing wintering ground for migratory water birds [12] and reducing fish diversity [64]. Thus, wetland conservation is a major challenge to the ecological integrity of the Lake due to the increasing threats of water regime alternations [12,62].
With increasing acknowledgement of the importance of wetlands to sustaining the goods and services of the Lake, conservation actions, such as converting croplands to wetlands, breaching artificial levees, and removing poplar plantation, are practiced in the Dongting areas. In this study, we map the annual distribution of key wetland types in the Lake over a 20-year period (2000~2019) to explore the impacts of hydrological alternation and land-use change on wetland transition.

MODIS Enhanced Vegetation Index (EVI)
Enhanced vegetation Index (EVI) time series [65] from 2000 to 2019 downloaded from NASA EOSDIS LP DAAC (Earth Observing System Data and Information System, Land Processes Distributed Active Archive Center, https://earthdata.nasa.gov/eosdis/daacs/lpdaac). We used the "optimized" Remote Sens. 2020, 12, 2995 5 of 19 vegetation index EVI instead of NDVI (Normalized Difference Vegetation Index), as it enhances the vegetation signal with improved sensitivity in high biomass regions [66].
As some pixels have missing data due to cloud, we first applied a cubic smoothing spline [67] to all pixels with missing data to generate regular 16-day time series of EVI from 18 February 2000 to 24 May 2020. There are two phases of grassy vegetation development in a typical water year in the Lake driven by seasonal water level fluctuation and climate [68]. To fully account for the growth dynamics, we divided the yearly EVI time series into two: one for the water rise phase (i.e., December to May) and one for water recession phase (June to November). For each phase, we calculated a monotonic growth trend (Trend 1 and Trend 2 ) as Theil-Sen slope [69]. Based on the nonparametric Kendall tau statistic [70], the value of all grids with no significant trend (p > 0.05) was replaces with zero. In addition, we calculated four more annual summary metrics, the maximum, mean, minimum, and intra-annual standard deviation, making a total of six EVI-derived variables to capture the land-cover phenology of each grid. We used package spatialEco [71] and Raster package [72] within R version 3.6.1 [73] for all raster manipulations and calculations.

Geomorphological Variables
Inclusion of topographic metrics can improve the accuracy of wetland mapping [74]. We computed a range of geomorphological variables from DEM (digital elevation model). The DEM with 1:10,000 spatial scale was provided by the Changjiang Water Resource Commission.

1.
One and two degree detrended DEM ( Figure 2): the residuals of the linear regression of poly (x, y) with degree of 1 and 2, where Equation (1) for first order detrended elevation and Equation (2) for second order detrended DEM, and where x and y are the longitude and latitude of central of a grid and DEM mean is the global mean DEM.

3.
Local deviation from global (LDFG) [76]: where y is global mean elevation of a 3 × 3 window, and y i is the elevation of grid i.

4.
TPI (Topographic Position Index) [77]: the difference between the value of a cell and the mean value of its 8 surrounding cells These variables are computed using the 'spatialEco' package by Evans and Ram [71] and package 'Spatstat' by Baddeley [78]. All these topographic variables were resampled to 250 m.
To limit the effect of collinearity on clustering, we excluded the highly correlated variables. We randomly selected 20,000 points (with the constraint of 250 m apart) within the study area (about 25% of the total grids) and extracted all EVI and geomorphological variables using these random points. We calculated the Pearson correlation coefficients r between every pair of the variables. For the pair with absolute r greater than 0.75, we kept only one variable for further analysis. This procedure resulted in a set of eight variables, namely Detrend.1, CTI, TPI, Trend 1 , Trend 2 , EVI mean , EVI min , and EVI sd . We built a raster brick with these eight layers (each layer is a raster for a variable). The raster brick was then normalized for further k-means clustering. These variables are computed using the 'spatialEco' package by Evans and Ram [71] and package 'Spatstat' by Baddeley [78]. All these topographic variables were resampled to 250 m.
To limit the effect of collinearity on clustering, we excluded the highly correlated variables. We randomly selected 20,000 points (with the constraint of 250 m apart) within the study area (about 25% of the total grids) and extracted all EVI and geomorphological variables using these random points. We calculated the Pearson correlation coefficients r between every pair of the variables. For the pair with absolute r greater than 0.75, we kept only one variable for further analysis. This procedure resulted in a set of eight variables, namely Detrend.1, CTI, TPI, Trend1, Trend2, EVImean, EVImin, and EVIsd. We built a raster brick with these eight layers (each layer is a raster for a variable). The raster brick was then normalized for further k-means clustering.

Assessing Clustering Tendency
Before clustering, we assessed the clustering tendency of the dataset using the Hopkins statistic [79]. The Hopkins statistic tests the spatial randomness of the data, ranging from 0 (uniformly distributed) to 1 (highly clustered), and a spatially random data tend to have a value of 0.5. The Hopkins statistics for our dataset ranged from 0.88 to 0.95, and the Hopkins statistic for majority of years is over 0.90 (Table 2), far above the threshold of 0.5, indicating that the land grids are well clustered and highly separable.

Assessing Clustering Tendency
Before clustering, we assessed the clustering tendency of the dataset using the Hopkins statistic [79]. The Hopkins statistic tests the spatial randomness of the data, ranging from 0 (uniformly distributed) to 1 (highly clustered), and a spatially random data tend to have a value of 0.5. The Hopkins statistics for our dataset ranged from 0.88 to 0.95, and the Hopkins statistic for majority of years is over 0.90 (Table 2), far above the threshold of 0.5, indicating that the land grids are well clustered and highly separable.

Estimating the Optimal Number of Clusters
The k-medoids clustering requires a prior specification of the number of clusters to be generated. There are many indices developed to determine the optimal number of clusters [80]. However, the optimal number of clusters is somehow subjective and depends on the method used for measuring similarities and the indices used for partitioning.
In this study, we use three methods to estimate the optimal number of clusters: elbow, gap statistic [81], and average silhouette method. The elbow method minimizes the total within-cluster sum of square (WSS), which measures the compactness of the clustering. The elbow method looks at the total WSS as a function of the number of clusters: the number of clusters is selected in such a way so that adding another cluster does not improve much better the total WSS (the elbow point). The gap statistic method compares the change in with-cluster dispersion with the expected dispersion, which is defined through bootstrapping simulation (n = 500 in this study) from a reference distribution. The average silhouette method computes the average silhouette of observations for different values of k. The optimal number of clusters k is the one that maximize the average silhouette width (see below) over a range of possible values for k [82]. The three methods were applied to a subset of 5000 randomly selected observations.

CLARA an Extension of k-Medoids Algorithm (PAM)
We used k-medoids or Partitioning Around Medoids (PAM), one of the most popular unsupervised learning algorithms in data mining [83,84], to delineate vegetation condition class in the Lake. PAM is a robust alternative to k-means for partitioning a dataset into clusters of observation, as it is less sensitive to noise and outliers [82], which are the features of our predictor variables. The goal of the PAM algorithm is to find a sequence of objects (medoids) that are centrally located in clusters by minimizing the average dissimilarity of objects to their closest selected object [82].
The dataset is very large (i.e., 52,498 × 8), preventing the direct application of PAM to the entire dataset due to RAM storage and computing time problems. Therefore, we use CLARA (i.e., Clustering Large Applications) [85,86], which is an extension to PAM designed to deal with data containing a large number of observations. Instead of finding medoids for the entire dataset, CLARA considers a subset of random samples from the original data with a fixed size (5100, about 10% of the original dataset in our study) and applies the PAM algorithm to generate an optimal set of medoids. The procedure of sampling and clustering is repeated for a pre-specified number of times (1000 in this study) in order to minimize the sampling bias. The final clustering results correspond to the set of medoids with the minimal cost, defined as where M is a set of selected clusters, dissimilarity CLARA is conducted with the "clara" function from the cluster package [87].

Clustering Validation
We first assess the clustering by visually inspecting the partitioning result using the function fviz_cluster in factoextra package [88], which produces a scatter plot of datapoints colored by cluster numbers. We then employed both internal and external cluster validation measurements to quantify the cluster results.

Internal Validation
We calculated two indicators of each cluster map, i.e., silhouette width and isolation, to assess the quality of clustering. The silhouette analysis measures how well an observation is clustered, and it estimates the average distance between clusters as where a i is the average dissimilarity between observation i and all other points of the cluster to which i belongs. For all other clusters C, d(i,C) is average dissimilarity of i to all observations of C, and b i is the smallest of these d(i,C). b i can be interpreted as the dissimilarity between i and its "neighbor" cluster, i.e., the nearest one to which it does not belong. A large S i (approaching 1) are very well clustered. A small S i (around 0) means that the observation lies between two clusters. Observations with a negative S i are probably placed in the wrong cluster. Isolation is computed as the ratio of the maximal dissimilarity between the observations in the cluster and the cluster's medoid and the minimal dissimilarity between the cluster's medoid and the medoid of any other cluster. If this ratio is small, the cluster is well-separated from the other clusters.

Map Validation
We compared the clustering maps with two existing vegetation maps for the Dongting area (2009 and 2013), which were based on Landsat image segmentation and validated by ground-truthing. The vegetation maps have 7 classes, i.e., water, mudflat, grassland, reed, poplar tree, other trees, and others (including roads, human settlement, and croplands). We combined these classes to the three clusters in our wetland type maps, namely shallow waters (include waters and mudflats), wet meadow, and rushes (combination of reed marshes and forests). As reed and forest are both subject to harvesting in winter and regeneration (reed) and replantation (poplar tree) in spring, it is difficult to distinguish without extensive field survey. Moreover, more and more poplar plantations are abandoned due to the ongoing "returning cropland to wetland" policy and are gradually colonized by reed, we grouped reed and forest together as one class. The combined 30 m vegetation maps were then aggregated to 250 m. We conducted a cell to cell comparison and computed the Cohen's Kappa index [89] as a measurement of the degree of agreement for both years.

Optimal Number of Clusters
For most of the years, both elbow and average silhouette methods are consistent, indicating that the optimal number of clusters was three (Table 2, Figure 3). However, the gap statistics often provided different answers. For example, the optimal k for 2004 was 10 according to gap statistics, while both elbow and silhouette methods suggested k = 3 (Table 2). To be consistent, we selected three as the optimal number of clusters for all years.

Internal Clustering Validation
The average silhouette width ranged from 0.39 in 2017 to 0.52 in 2018, and the isolation index was small as well (ranged from 0.98 in 2002 to 2.90 in 2012 (Table 3)), suggesting the clustering algorithm separated the grids well. For individual classes, water areas were generally better distinguished than grasslands and territorialized areas, as the silhouette width was comparably larger and the isolation index was smaller for water than for the others (Table 3).

Internal Clustering Validation
The average silhouette width ranged from 0.39 in 2017 to 0.52 in 2018, and the isolation index was small as well (ranged from 0.98 in 2002 to 2.90 in 2012 (Table 3)), suggesting the clustering algorithm separated the grids well. For individual classes, water areas were generally better distinguished than grasslands and territorialized areas, as the silhouette width was comparably larger and the isolation index was smaller for water than for the others (Table 3).   The visual inspection of the clustering showed that the separation of clusters was well performed (Figures 4 and 5). The first two PCA (principal component analysis) axes explained more than 85% of the difference in the EVI metrics of the grids (Figure 4). Most critically, there are very few overlaps of the convex hull of the clusters. The silhouette plots also showed that very few grids were wrongly placed (i.e., negative si value, Figure 5).
The visual inspection of the clustering showed that the separation of clusters was well performed (Figures 4 and 5). The first two PCA (principal component analysis) axes explained more than 85% of the difference in the EVI metrics of the grids (Figure 4). Most critically, there are very few overlaps of the convex hull of the clusters. The silhouette plots also showed that very few grids were wrongly placed (i.e., negative si value, Figure 5).

Validation of Clustering with Vegetation Maps
The comparisons with vegetation maps showed that the unsupervised machine-learning technique achieved satisfactory results for both 2009 and 2013 (Table 4). Although the performance is slightly better for 2013 (Table 4), the Kappa coefficient was over 0.75 for both years, indicating substantial performance, according to Landis and Koch [90]. The overall accuracy, which defines the percentage of correctly classified instances, was also high (greater than 85%, Table 4) for both years.

Validation of Clustering with Vegetation Maps
The comparisons with vegetation maps showed that the unsupervised machine-learning technique achieved satisfactory results for both 2009 and 2013 (Table 4). Although the performance is slightly better for 2013 (Table 4), the Kappa coefficient was over 0.75 for both years, indicating substantial performance, according to Landis and Koch [90]. The overall accuracy, which defines the percentage of correctly classified instances, was also high (greater than 85%, Table 4) for both years. For individual class, there was no systematic misclassification as the balanced accuracy for each class was comparable, which was confirmed by visually comparing the maps. The small patches of wet meadows within large area of rushes distinguished by the wetland distribution maps based on Landsat image were generally absent from the map produced using MODIS data, presenting the majority of mismatches. In addition, both the producer's and user's accuracy were high and comparable (Table 4).

Changes of Wetland Extent
The overall dynamics of wetland transition during 2000 to 2019 in Dongting Lake was the encroachment of reeds and forests into wet meadows, and to less extent, the colonization of wet meadows in mudflats and waters ( Figure 6). While the encroachment of reeds (forests) was widespread across the Lake, the wet meadow colonization mainly occurred in EDTL, probably because of the limited availability of shallow waters and mudflats in WDTL and SDTL.
Remote Sens. 2020, 12, x FOR PEER REVIEW 12 of 19 For individual class, there was no systematic misclassification as the balanced accuracy for each class was comparable, which was confirmed by visually comparing the maps. The small patches of wet meadows within large area of rushes distinguished by the wetland distribution maps based on Landsat image were generally absent from the map produced using MODIS data, presenting the majority of mismatches. In addition, both the producer's and user's accuracy were high and comparable (Table 4).

Changes of Wetland Extent
The overall dynamics of wetland transition during 2000 to 2019 in Dongting Lake was the encroachment of reeds and forests into wet meadows, and to less extent, the colonization of wet meadows in mudflats and waters ( Figure 6). While the encroachment of reeds (forests) was widespread across the Lake, the wet meadow colonization mainly occurred in EDTL, probably because of the limited availability of shallow waters and mudflats in WDTL and SDTL. The overall change of wetland extent and distribution for the 20-year study period in Dongting Lake is presented in Figure 6. During the 20-year period, rushes increased substantially in the expense of wet meadows and shallow water/mudflat (Table 5). For the entire system, 31.35% of wet meadows was lost at a rate of 1675.95 ha year −1 , while rushes increased nearly 80% at an annual rate of 2322.85 ha. A comparison of wetland distribution maps in 2000 and 2019 found that growth (i.e., expansion of existing patches) and colonization (i.e., establishing new colonies) of phragmite were both responsible for the rapid gain of rush land. Compared with wet meadows, the loss of water/mudflat was less striking (16.34% loss at 666.60 ha year −1 ). For individual sub-lakes, the trends of loss and gain were similar although varied in change rates. The loss of wet meadows was most severe in WDTL at the upstream, where up to 65.78% was replaced with rushes ( Table 5). The fastest gain of rushes occurred in EDTL, where area of rushes increased 104.32% (an increasing rate of 961.75 ha year −1 ) during the 20-year period. Until 2019, although there was still large area of shallow waters/mudflats in WDTL, there was no area of wet meadows large enough to support viable population of watering geese in WDTL ( Figure 6). This is also true for SDTL, where large patches of wet meadow were pushed to the downstream deltas, where Zi River and Xiang River enter the Lake ( Figure 6).  The overall change of wetland extent and distribution for the 20-year study period in Dongting Lake is presented in Figure 6. During the 20-year period, rushes increased substantially in the expense of wet meadows and shallow water/mudflat (Table 5). For the entire system, 31.35% of wet meadows was lost at a rate of 1675.95 ha year −1 , while rushes increased nearly 80% at an annual rate of 2322.85 ha. A comparison of wetland distribution maps in 2000 and 2019 found that growth (i.e., expansion of existing patches) and colonization (i.e., establishing new colonies) of phragmite were both responsible for the rapid gain of rush land. Compared with wet meadows, the loss of water/mudflat was less striking (16.34% loss at 666.60 ha year −1 ). For individual sub-lakes, the trends of loss and gain were similar although varied in change rates. The loss of wet meadows was most severe in WDTL at the upstream, where up to 65.78% was replaced with rushes ( Table 5). The fastest gain of rushes occurred in EDTL, where area of rushes increased 104.32% (an increasing rate of 961.75 ha year −1 ) during the 20-year period. Until 2019, although there was still large area of shallow waters/mudflats in WDTL, there was no area of wet meadows large enough to support viable population of watering geese in WDTL ( Figure 6). This is also true for SDTL, where large patches of wet meadow were pushed to the downstream deltas, where Zi River and Xiang River enter the Lake ( Figure 6).
For the entire lake, the overall increasing trend of rushes and decreasing trend of wet meadow were clear, whereas there was no clear trend for water/mudflat (Figure 7). However, the change rates were not spatially consistent. Moreover, the dynamics for wetland types also varied. The overall pattern of wet meadow dynamics was decreasing and was mainly driven by the changes in WDTL and SDTL (Figure 7). The decreases in the extent of wet meadows were almost linear in WDTL and SDTL, and the rate of changes were comparable indicated by the smooth lines. However, in EDTL, the fitted smooth line indicated the decrease was sharper before 2009 (although at a much slower rate that in WDTL and SDTL) but flatted till the end of study period. In contrast, the general dynamics of rushes was increasing although the trend was not linear. In SDTL (and the whole lake), the colonization of reed and forest began to slow down after 2009, whereas the increasing rate remained high until the end of the study period in WDTL. However, in EDTL, the fitted smooth line suggested that the increasing trend of rushes was reversed after 2009 ( Figure 7). For the entire lake, the overall increasing trend of rushes and decreasing trend of wet meadow were clear, whereas there was no clear trend for water/mudflat (Figure 7). However, the change rates were not spatially consistent. Moreover, the dynamics for wetland types also varied. The overall pattern of wet meadow dynamics was decreasing and was mainly driven by the changes in WDTL and SDTL (Figure 7). The decreases in the extent of wet meadows were almost linear in WDTL and SDTL, and the rate of changes were comparable indicated by the smooth lines. However, in EDTL, the fitted smooth line indicated the decrease was sharper before 2009 (although at a much slower rate that in WDTL and SDTL) but flatted till the end of study period. In contrast, the general dynamics of rushes was increasing although the trend was not linear. In SDTL (and the whole lake), the colonization of reed and forest began to slow down after 2009, whereas the increasing rate remained high until the end of the study period in WDTL. However, in EDTL, the fitted smooth line suggested that the increasing trend of rushes was reversed after 2009 ( Figure 7). Dots are mapped areas of water/mudflat, wet meadows, and rushes; lines are the gam (generalized additive model) smoothed trends, and the shaded areas are the 95% confident intervals. DTL = the entire lake, EDTL = East Dongting Lake, SDTL = South Dongting Lake, and WDTL = West Dongting Lake.

Discussion
For effective wetland restoration and management, it is essential to consider them as a dynamic landscape unit rather than a static land-cover type [11], which requires periodic mapping to monitor their spatiotemporal dynamics [44]. In this study, we developed and tested a framework that uses topographic variables derived from DEM and land-surface phenological metrics based on a MODIS Dots are mapped areas of water/mudflat, wet meadows, and rushes; lines are the gam (generalized additive model) smoothed trends, and the shaded areas are the 95% confident intervals. DTL = the entire lake, EDTL = East Dongting Lake, SDTL = South Dongting Lake, and WDTL = West Dongting Lake.

Discussion
For effective wetland restoration and management, it is essential to consider them as a dynamic landscape unit rather than a static land-cover type [11], which requires periodic mapping to monitor their spatiotemporal dynamics [44]. In this study, we developed and tested a framework that uses topographic variables derived from DEM and land-surface phenological metrics based on a MODIS EVI time series to generate annual wetland maps for a large subtropical monsoon wetland complex.
The validation results showed that the accuracy of our approach was consistent and comparable to that of the time-consuming manual Landsat image segmentation with high Kappa coefficient of agreement and accuracy for both overall and individual classes (Table 4). Our framework provides a fast and cost-efficient alternative to monitor ecological responses in a rapidly changing environment, which is key to making effective conservation decisions [91].
MODIS data in 250 m resolution have been widely used in large-scale land-cover mapping [29,92,93]. The trade-off in spatial and temporal resolution of MODIS data provides a unique capability to distinguish annually recurring temporal patterns and to map land-cover types without the need for expensive ground data [93]. The regularity of the 16-day MODIS products provides a major advantage for capturing the land-surface phenology [50]. This is particular useful to study dynamic landscapes that have distinct seasonal patterns including croplands [92,93] and floodplains [44,94,95]. The Hopkins statistic [79] indicated that pixels with annual metrics of land-surface phenology and topographic features have very high cluster tendency (>0.9, Table 2), providing a solid basis for using unsupervised clustering. However, the results of all three employed methods (i.e., elbow, gap statistic, and average silhouette) suggested that our approach might not be consistent for finer (i.e., above three classes) clustering. Powell et al. [42] suggested that discrimination limit for semi-arid floodplain using Landsat based NDVI metrics was four classes, beyond which the performance was significantly deteriorated.
The widespread and rapid loss of wet meadows to rushes in the Lake, particularly within the upstream section (i.e., WDTL and SDTL) reflects the global trend of encroachment of Common Reed (Phragmites australis), a semi-aquatic species, into freshwater and blackish wetlands in subtropical and temperate climates [96,97]. The encroachment of phragmites is associated with increasing nutrient level, altered water regimes, and human disturbances [98,99]. During 2003-2009, the rate of rush expansion was very high (Figure 7), coincident to the period of TGD impounding. This finding suggests that the TGD accelerated the xerophilization trend in the Lake. Large patches of wet meadows (often in hundreds to thousands ha) in the Lake provide irreplaceable winter foraging habitats for hundreds and thousands of migratory water birds, especially ducks and geese [12], representing the most significant ecological asset in the region. In contrast, water birds avoid tall and dense phragmites marshes [100]. Moreover, the rushes offer poor-quality habits for larval and juvenile fish [101]. Thus, the encroachment of rushes into wet meadows presents a real ecological threat to local and regional biodiversity, which requires urgent remediation management actions. This urgency is even more pressing for WDTL and upstream of SDTL, where rush expansion was almost linear from 2000 to 2019, and there is no sign of slowing down.

Conclusions and Management Implications
Periodically mapping of the distribution and extent of wetlands subjected to natural and/or anthropogenic disturbances is a vital task to identify long-term trends and short-term variations for sustainable natural resource management. In this study, we applied an unsupervised machine-learning algorithm-the k-medoids algorithm-to re-construct the long-term floodplain wetland dynamics using land phenology metrics derived from the 16-day composite MODIS EVI time series. Despite the relatively coarse spatial resolution of MODIS data, our approach achieved comparable performance to the time-consuming satellite image segmentation method with a Kappa coefficient of agreement greater than 0.75 and an overall accuracy over 85%. The framework offers a fast and cost-effective alternative to monitor spatial distribution dynamics of wetland in the Dongting Lake and may be applicable to other large river floodplain systems across the world. With the re-constructed annual wetland distribution maps, we found widespread and continuous encroachment of rushes into wet meadows, the most significant ecological asset in the region. In addition, to a much lesser extent, wet meadows have expanded into shallow waters. During the past two decades, 31.35% of wet meadows in Dongting Lake were lost at a rate of 1675.95 ha year −1 , while the area of rushes increased by 79.53% of rushes at an annual rate of 2322.85 ha. Moreover, we identified a period of accelerated rush encroachment (i.e., 2003-2009), which was coincident to the TGD impounding. These findings highlight the rapid