Wetland Vegetation Classiﬁcation through Multi-Dimensional Feature Time Series Remote Sensing Images Using Mahalanobis Distance-Based Dynamic Time Warping

: Efﬁcient methodologies for vegetation-type mapping are signiﬁcant for wetland’s management practices and monitoring. Nowadays, dynamic time warping (DTW) based on remote sensing time series has been successfully applied to vegetation classiﬁcation. However, most of the previous related studies only focused on Normalized Difference Vegetation Index (NDVI) time series while ignoring multiple features in each period image. In order to further improve the accuracy of wetland vegetation classiﬁcation, Mahalanobis Distance-based Dynamic Time Warping (MDDTW) using multi-dimensional feature time series was employed in this research. This method extends the traditional DTW algorithm based on single-dimensional features to multi-dimensional features and solves the problem of calculating similarity distance between multi-dimensional feature time series. Vegetation classiﬁcation experiments were carried out in the Yellow River Delta (YRD). Compared with different classiﬁcation methods, the results show that the K-Nearest Neighbors (KNN) algorithm based on MDDTW (KNN-MDDTW) has achieved better classiﬁcation accuracy; the overall accuracy is more than 90%, and kappa is more than 0.9.


Introduction
Wetlands are one of the most important and biologically diverse ecosystems on earth, providing numerous essential ecosystem services and representing significant economic value [1].Wetland vegetation, as an important component of wetland ecosystems, plays an important role in biodiversity conservation [2,3], climate regulation [4], and water conservation [5].The knowledge of the geospatial distribution of vegetation is paramount to support high levels of biodiversity and provide a wide range of ecosystem services in wetland.The Yellow River Delta (YRD) reserve is the largest natural vegetation area of new wetland along the coast of China, which has been challenged by climate change and alien vegetation invasion in recent years, and some plant communities are facing significant threats [6].Therefore, wetland monitoring and mapping has become the key to protecting the ecological environment of the region.However, traditional wetland plant community mapping requires intensive field work, which is labor intensive, expensive, time consuming, and sometimes not applicable due to limited accessibility.Fortunately, remote sensing technology provides a powerful tool for identifying and mapping wetland plant communities in large areas [7,8].
With the improvement of the spatial and temporal resolution of satellite remote sensing, time series remote sensing images have played an important role in land cover mapping and other fields [9][10][11][12].The accessibility of multitemporal satellite images could reduce the impact on classification caused by the complexity of surfaces and, to some Remote Sens. 2022, 14, 501 2 of 21 extent, the similarity between the spectrum of different vegetation [13].However, part of the image is occluded by clouds and fog, resulting in missing data values [14,15], and the inconsistency of the input data dimensions makes it difficult to apply conventional classifiers, which need consistent data dimensions.In addition, time series remote sensing images have proven to be of great value to acquire phenological insights for land cover mapping, which is reflected by vegetation indexes [16,17].Some data fitting methods, such as Linear fitting [18] and Fourier transform [19][20][21], were used to fit vegetation growth curves and extract phenological features for classification [22][23][24][25], but the periodicity of vegetation is vulnerable to climate, and the time of plant growth and withering may be delayed or advanced [26][27][28], which affects the classification results.
The dynamic time warping (DTW) algorithm, which was first applied to speech and pseudo signature recognition, has been proven to be an effective solution to deal with the time series challenges described above [29,30].As an algorithm widely used for time series matching, this method finds the minimum distance of time series to measure the similarity between two time series by stretching or shrinking distorted time series [31], and the partial data values could be overcome.Although the DTW method can make full use of all time information for better time series shape matching, it ignores the time range, which is a troublesome issue for vegetation classification with similar time series shape but different phenological cycles.In response to this shortcoming, some scholars have introduced time constraints and proposed time-weighted dynamic time warping (TWDTW) to help distinguish different types of land cover categories.This avoids the false matching caused by time delay considering the time offset of vegetation phenological characteristics [32][33][34].This method has been widely used in crop phenological information extraction and mapping, forest tree species classification, etc.
At present, the time series analysis method based on DTW has been successfully applied to the field of vegetation classification; unfortunately, most of the relevant studies are based on a single vegetation index such as Normalized Difference Vegetation Index (NDVI) or Enhanced Vegetation Index (EVI) [35,36].Plants have obvious spectral reflection characteristics, which are closely related to their development, health, and growth conditions.Making full use of spectral information in complex wetland environments is very important for vegetation classification.For example, paper [37] extracted a variety of spectral indices combined with DTW for classification.However, this method uses each time point as a vector to calculate its Euclidean distance in the DTW local distance, which is also DTW based on Euclidean distance (EDDTW), and assigns the same weight to each feature while ignoring the relationship between different features [38,39].On the one hand, the time series with multiple features should be considered as a whole in order to maintain the correlation between features [40].On the other hand, different features may be subject to various interferences or have nothing to do with the final classification results, and there are differences between features.Therefore, how to calculate the overall time series similarity distance and maintain the relationship between different features is the focus of research in time series classification with multiple features.
In this study, Mahalanobis Distance-based Dynamic Time Warping (MDDTW) was proposed to solve the problem of calculating the distance of multi-dimensional feature time series.This method extends the traditional time series similarity calculation based on single features to a multi-dimensional feature time series, and effectively combines multiple vegetation indexes for wetland vegetation classification.Specifically, this study combined multiple spectral indexes to construct multi-dimensional feature time series, then calculated similarity distance with MDDTW.Finally, the Random Forest (RF) algorithm and the K-Nearest Neighbors (KNN) algorithm, based on distance measurement, was carried out in wetland vegetation classification mapping.There are many abbreviations in this study, which are summarized in Appendix A.

Study Area
The Yellow River Delta (118 • 33 -119 • 20 E, 37 • 35 -38 • 12 N) is located in the Shandong Peninsula.The study area, approximately 50 square kilometers, is shown in Figure 1.Located in the land-sea transition zone at the estuary of the Yellow River, the YRD is about to be ranked as one of the most intact, largest, and the most juvenile wetland ecosystems on the planet [41].It is also the largest natural vegetation area of newborn wetland along the coast of China, with an abundance of natural plants resources.The halophyte community that consists of Phragmites australis (P.australis), Robinia pseudoacacia (R. pseudoacacia), Suaeda salsa (S. salsa), Tamarix chinensis (T.chinensis), Salix matsudana (S. matsudana), and Spartina alterniflora (S. alterniflora), is the main biocoenosis in the study area.Furthermore, there are many farmlands on the edge of the study area, which will be classified as a vegetation type in this study because crops are similar to wetland halophytes and have a certain growth cycle.

Study Area
The Yellow River Delta (118°33′-119°20′E, 37°35′-38°12′N) is located in the Shandong Peninsula.The study area, approximately 50 square kilometers, is shown in Figure 1.Located in the land-sea transition zone at the estuary of the Yellow River, the YRD is about to be ranked as one of the most intact, largest, and the most juvenile wetland ecosystems on the planet [41].It is also the largest natural vegetation area of newborn wetland along the coast of China, with an abundance of natural plants resources.The halophyte community that consists of Phragmites australis (P.australis), Robinia pseudoacacia (R. pseudoacacia), Suaeda salsa (S. salsa), Tamarix chinensis (T.chinensis), Salix matsudana (S. matsudana), and Spartina alterniflora (S. alterniflora), is the main biocoenosis in the study area.Furthermore, there are many farmlands on the edge of the study area, which will be classified as a vegetation type in this study because crops are similar to wetland halophytes and have a certain growth cycle.

Remote Sensing Data
The Sentinel-2 MSI data used in this study were from the European Space Agency's Sentinel Scientific Data Hub.The Sentinel-2 consists of two identical satellites, namely Sentinel-2A and Sentinel-2B, allowing a 5-day revisit time at the equator and an even shorter time toward the poles.As a system suitable for monitoring vegetation growth, the Sentinel-2 records the reflected radiance in 13 spectral bands covering the visible and nearinfrared to the short-wave infrared spectrum, with a spatial resolution as high as 10 m.The data were the top-of-atmosphere (TOA) reflectance after orthophoto correction and geometric fine correction.In order to avoid the interference of cloud, 31 images with less cloud cover were selected, and the image dates obtained are shown in Table 1.

Remote Sensing Data
The Sentinel-2 MSI data used in this study were from the European Space Agency's Sentinel Scientific Data Hub.The Sentinel-2 consists of two identical satellites, namely Sentinel-2A and Sentinel-2B, allowing a 5-day revisit time at the equator and an even shorter time toward the poles.As a system suitable for monitoring vegetation growth, the Sentinel-2 records the reflected radiance in 13 spectral bands covering the visible and near-infrared to the short-wave infrared spectrum, with a spatial resolution as high as 10 m.The data were the top-of-atmosphere (TOA) reflectance after orthophoto correction and geometric fine correction.In order to avoid the interference of cloud, 31 images with less cloud cover were selected, and the image dates obtained are shown in Table 1.Compared with the landlocked environment, natural coastal wetland environments are complex and variable.In order to better understand the spatial distribution of vegetation communities, researchers conducted a field expedition of wetland vegetation groups in the YRD from 20 September to 30 September 2019.
There are 116 field stations, which can be seen as calibration points in this investigation, whose distribution is shown in Figure 2. Furthermore, UAV aerial photography was used to obtain information regarding 574 validation points in three areas that are difficult for personnel to reach: natural tidal beach area, semi closed area affected by S. alterniflora, and fully closed tidal flat.The collection of field survey data could be used as a reference for selecting training samples and validation samples.The distribution of validation points covers the vegetation growth area, as shown in Figure 3.In the remote sensing image, pixel points of 3 × 3 window size with each point as the center are selected, and the vegetation of different pixels in the same window is guaranteed to be the same, which reduces the influence of positioning error.The final classification results are evaluated by validation pixel points in combination with the field survey data.It should be noted that the calibration points and validation points selected in the experiment are completely independent and their sites are different.

Data Preprocessing
Preprocessing is required because the Sentinel-2 Level-1C products in this research are top-of-atmosphere radiance.Firstly, all images were subjected to radiometric calibration and atmospheric correction to obtain surface reflectance, employing the Sen2Cor algorithm [42] using SNAP software.The Fmask4.0 [43] model in MATLAB software was then used to set the pixels affected by cloud as invalid pixels because the pixels of the original image can be cloud-contaminated.Finally, the Sentinel-2 satellite images were clipped to extract the study area.
The land cover types in the study area are complex, because there are many nonvegetated areas such as bare tidal flat and water.However, this study mainly focuses on the classification of vegetation using its time series.Extracting the vegetation covered area is the top priority of vegetation classification.Because the vegetation grows vigorously in September, when the image has fewer clouds and fog, a single decision tree was used to generate the surface classification for the image of 29 September to classify water, vegetation, and tidal flat with the support of eCognition software, which contains a variety of multi-scale segmentation methods.The classification results of the ground surface are shown in Figure 4.It should be pointed out that the automatic classification results are manually edited in combination with the survey data to ensure the accuracy of extracting the vegetation coverage area as much as possible.In the experiment, classification is only carried out in the vegetation area.

Data Preprocessing
Preprocessing is required because the Sentinel-2 Level-1C products in this research are top-of-atmosphere radiance.Firstly, all images were subjected to radiometric calibration and atmospheric correction to obtain surface reflectance, employing the Sen2Cor algorithm [42] using SNAP software.The Fmask4.0 [43] model in MATLAB software was then used to set the pixels affected by cloud as invalid pixels because the pixels of the original image can be cloud-contaminated.Finally, the Sentinel-2 satellite images were clipped to extract the study area.
The land cover types in the study area are complex, because there are many nonvegetated areas such as bare tidal flat and water.However, this study mainly focuses on the classification of vegetation using its time series.Extracting the vegetation covered area is the top priority of vegetation classification.Because the vegetation grows vigorously in September, when the image has fewer clouds and fog, a single decision tree was used to generate the surface classification for the image of September 29 to classify water, vegetation, and tidal flat with the support of eCognition software, which contains a variety of multi-scale segmentation methods.The classification results of the ground surface are shown in Figure 4.It should be pointed out that the automatic classification results are manually edited in combination with the survey data to ensure the accuracy of extracting the vegetation coverage area as much as possible.In the experiment, classification is only carried out in the vegetation area.

Methods
In this study, four vegetation index features (NDVI, DVI, EVI, RVI) were extracted based on Sentinel-2 data to first construct multi-dimensional feature time series.The Mahalanobis distance was then used to extend the DTW algorithm-based one-dimensional features to multi-dimensional features for calculating the similarity distance of multi-dimensional feature time series.Because the traditional Mahalanobis distance based on the covariance of the sample set could not accurately measure the sample similarity relationship, this study uses metric learning to train the Mahalanobis matrix and then solve the similarity distance.Finally, the KNN classifier was used to classify wetland vegetation.The overall steps of the experiment are shown in Figure 5.

Methods
In this study, four vegetation index features (NDVI, DVI, EVI, RVI) were extracted based on Sentinel-2 data to first construct multi-dimensional feature time series.The Mahalanobis distance was then used to extend the DTW algorithm-based one-dimensional features to multi-dimensional features for calculating the similarity distance of multidimensional feature time series.Because the traditional Mahalanobis distance based on the covariance of the sample set could not accurately measure the sample similarity relationship, this study uses metric learning to train the Mahalanobis matrix and then solve the similarity distance.Finally, the KNN classifier was used to classify wetland vegetation.The overall steps of the experiment are shown in Figure 5.

Multi-Dimensional Feature Time Series
The Ratio Vegetation Index (RVI) is the earliest vegetation index [44], which strengthens the difference between vegetation in the infrared band and the near-infrared band.However, it does not consider the influence of atmospheric and soil factors.The EVI [45] has a high sensitivity for vegetation, especially for high vegetation cover, and reduces the interference of soil area and atmospheric factors.The Difference Vegetation Index (DVI) [46] is sensitive to vegetation with medium and low coverage.The NDVI is one of the most widely used vegetation indexes, and is also the most commonly used index to monitor plant growth status.Based on these analyses, four common vegetation indices (NDVI, DVI, EVI, RVI) were extracted to build multi-dimensional time series.It is important to point out that different features can be developed for classification, including but not limited to the above four.
The vegetation indices were calculated by the following Equations ( 1)-( 4). ) where, NIR ρ represents the near-infrared band reflectance and BLUE ρ is the spectral reflectance of the blue band, while RED ρ represents the red band reflectance.As for EVI, G is the gain factor, which was set to 2.5.L is the background adjustment factor, which was set to 1. Furthermore, C was set to 7.5.

Multi-Dimensional Feature Time Series
The Ratio Vegetation Index (RVI) is the earliest vegetation index [44], which strengthens the difference between vegetation in the infrared band and the near-infrared band.However, it does not consider the influence of atmospheric and soil factors.The EVI [45] has a high sensitivity for vegetation, especially for high vegetation cover, and reduces the interference of soil area and atmospheric factors.The Difference Vegetation Index (DVI) [46] is sensitive to vegetation with medium and low coverage.The NDVI is one of the most widely used vegetation indexes, and is also the most commonly used index to monitor plant growth status.Based on these analyses, four common vegetation indices (NDVI, DVI, EVI, RVI) were extracted to build multi-dimensional time series.It is important to point out that different features can be developed for classification, including but not limited to the above four.
where, ρ N IR represents the near-infrared band reflectance and ρ BLUE is the spectral reflectance of the blue band, while ρ RED represents the red band reflectance.As for EVI, G is the gain factor, which was set to 2.5.L is the background adjustment factor, which was set to 1. Furthermore, C 1 and C 2 are adjustment factors for fit; in this study, C 1 was set to 6 and C 2 was set to 7.5.

Mahalanobis Distance-Based DTW (MDDTW)
DTW is an algorithm based on the idea of dynamic programming; it is commonly used to measure the similarity of two time series of different lengths and was initially applied to problems involving template matching [47,48].It has been widely cited in the analysis of time series remote sensing data processing in recent years [49].The algorithm uses the time warping function, satisfying certain conditions to describe the time correspondence between the standard time series features and the time series features to be matched.According to the warping path corresponding to the minimum cumulative distance when the two-time series features are matched, the similarity of different time series is measured, and the DTW distance is obtained.
Two time series data, X and Y, are expressed as: where m and n are the lengths of X and Y.
In order to calculate the DTW distance, a matrix of m × n is constructed according to the time series X and Y, where elements (i, j) represent the relative distance, d(x i , y j ), between the two points x i and y i in the corresponding sequences.The data on sequence Y are nonlinearly mapped to sequence X.
The optimal warp path W is expressed as: where max(m, n) ≤ K ≤ m + n − 1.The path satisfies the following constraints: boundary, continuity, and monotonicity [50].The main aim is to find the only optimal path among many bending distances that minimizes the cumulative optimal distance, as shown in Figure 6.

Mahalanobis Distance-Based DTW (MDDTW)
DTW is an algorithm based on the idea of dynamic programming; it is commonly used to measure the similarity of two time series of different lengths and was initially applied to problems involving template matching [47,48].It has been widely cited in the analysis of time series remote sensing data processing in recent years [49].The algorithm uses the time warping function, satisfying certain conditions to describe the time correspondence between the standard time series features and the time series features to be matched.According to the warping path corresponding to the minimum cumulative distance when the two-time series features are matched, the similarity of different time series is measured, and the DTW distance is obtained.
Two time series data, X and Y, are expressed as: , , , , , , , , where m and n are the lengths of X and Y.
In order to calculate the DTW distance, a matrix of m × n is constructed according to the time series X and Y, where elements (i, j) represent the relative distance, d(xi, yj), between the two points xi and yi in the corresponding sequences.The data on sequence Y are nonlinearly mapped to sequence X.
The optimal warp path W is expressed as: ( ) , , , , , where max( , ) 1 m n K m n ≤ ≤ + − .The path satisfies the following constraints: bound- ary, continuity, and monotonicity [50].The main aim is to find the only optimal path among many bending distances that minimizes the cumulative optimal distance, as shown in Figure 6.The final DTW distance is calculated according to Equation (7).
The final DTW distance is calculated according to Equation (7).
where r(i, j) represents the cumulative distance, which is the cumulative minimum distance from the starting point (1,1) to the end point (i, j), and r(1, 1) = D M X 1 , Y 1 .The local distance measure, d(i, j), is defined as: For the multi-dimensional feature time series constructed by NDVI, DVI, EVI, and RVI, different index features will be given the same weight, and the correlation between features will be ignored if the Euclidean distance is chosen to calculate the DTW local distance.As mentioned above, features have different correlation with the label of subjects, and there might also be correlations between the different features.A feasible strategy is to use the Mahalanobis distance function to measure the local distance of vectors in multi-dimensional feature time series.
Mahalanobis distance was designed to measure the distance between a distribution and a single point [51], and it is an effective method to calculate the similarity of two unknown sample sets, using the Mahalanobis matrix to calculate the correlation between variables.
Two multi-dimensional time series X and Y are expressed as: where d is the dimension of the multi-feature time series, and m and n represent the length of multi-dimensional time series X and Y, respectively.The local distance in DTW is calculated by the Mahalanobis distance as Equation (10), where X i = (x 1i , x 2i , . . ., x di ) T , 1 ≤ i ≤ m represents the ith column of X, which is the value of each variable at time i; Y j = (y 1j , y 2j , . . ., y dj ) T , 1 ≤ j ≤ n represents the jth column of Y, which is the value of each variable at time j.The PSD matrix, M, is named as the Mahalanobis matrix.The dynamic time warping algorithm is expressed as: where r(i, j) is the cumulative minimum distance from (1,1) to (i, j), and r(1, 1) Through the above theoretical derivation, a new measure called MDDTW is formed, combining Mahalanobis distance and DTW, which represent the similarity of multidimensional characteristic time series.How to obtain the appropriate Mahalanobis distance is a key problem in this study.The traditional Mahalanobis distance is generally obtained based on the inverse of the covariance matrix, which represents the relationship of the internal aggregation of data.However, it is not enough that the measurement distance function can only reflect the internal aggregation relationship of data.More importantly, it is necessary to establish the relationship between sample attributes and categories.Therefore, a metric learning method is employed for Mahalanobis distance learning to calculate the similarity of multi-dimensional characteristic time series in this study.

Metric Learning for MDDTW
In the study of metric learning, Mahalanobis distance is no longer simply limited to the inverse of the covariance matrix, but needs to be obtained through the process of metric learning [52].The purpose of metric learning based on Mahalanobis distance is to obtain a positive semi-definite symmetric matrix, M, for a given training sample set, X, to establish the relationship between the eigenvectors of the samples.In this way, the similar relationship between the training samples is preserved, and the distance of the same samples is closer, while different kinds are distant.The paper [53,54] developed a new Mahalanobis distance metric learning algorithm using triple constraint and information entropy, and it succeeded in image processing and pattern recognition.Therefore, this method is employed in this study.
Applying the LogDet divergence as the regularization term, the LogDet divergencebased metric learning model [55,56] for multi-dimensional features time series is expressed as: where M t is the Mahalanobis matrix of the tth iteration and η t represents a regularization parameter, which satisfies η t > 0 to balance D ld (M, M t ) and (M, ŷt , y t ).
D ld (M, M t ) is the regularization function, which is expressed as: (M, ŷt , y t ) is the loss function, which is expressed as: where ŷt represents the predicted value, y t represents the expected value, and d M is the Mahalanobis distance of two samples.The three samples x i , x j , x k in the sample set constitute a set of triple constraints, and x i , x j , x k represent multi-dimensional feature time series and belong to two different categories, respectively.The final solution result of M is expressed as Equation ( 15) through mathematical derivation [57,58], while q t M t q T t − p t M t p T t < ρ .
where p t = x i − x j ; q t = (x i − x k ), and α is a constant learning rate parameter, which is chosen between 0 and 1.When the M is obtained by metric learning, the similarity distance of multi-dimensional feature time series can be calculated by MDDTW.In the later stage of this study, a KNN algorithm suitable for distance measurement is combined for vegetation classification.
In the vegetation covered area, each pixel position is time-expanded, and the Mahalanobis matrix (M) is obtained by using the metric learning method described as above.Because the triple constraint needs to be rebuilt whenever M is updated, the number of cycles for setting the dynamic triple constraint construction strategy is set to 10.In each cycle, the number of ternary constraints is N = 5n, where n is the number of training samples.In the later stage of this study, a KNN algorithm suitable for distance measurement is combined for vegetation classification.

Experiment and Results
In order to verify the effectiveness of the classification method based on MDDTW, this study compared four classification methods, including Random Forest based on multitemporal (RF-Multitemporal), Random Forest based on pixel-differential time series (RF-PDTS), TWDTW, and EDDTW.Section 3.1 mainly explains the experimental details and the parameter settings of different classification methods.The classification maps of the different methods are shown in Section 3.2.

Experiment
For the classification based on MDDTW, RF, and KNN, the classifiers were employed after calculating the distance of multi-dimensional features by the MDDTW method.For RF classifiers, the number parameter of classifiers is set to 10 and the depth of decision tree is 50.For the KNN classifier, the distance between each pixel and the samples was calculated directly and then classified.In this study, the K was set to 1 after many experimental comparisons.It should be noted that parallel processing was used to improve efficiency during the KNN classifier employed.As for the EDDTW, which is similar to MDDTW, the time series is constructed by combining four vegetation indexes.Different time points are regarded as vectors, and the local distance of DTW is calculated by Euclidean distance [59].The RF and KNN classifiers were used for classification, and the parameter settings are the same as for MDDTW.For the classification based on RF-Multitemporal, the remote sensing images of the four seasons of spring, summer, autumn, and winter are selected as the data to be classified, as shown in Figure 7; this can reflect the seasonal difference and ensure that the vegetation area is cloudless.The number of Random Forest classifiers is set to 10, and the depth of the decision tree is 50.
different methods are shown in Section 3.2.

Experiment
For the classification based on MDDTW, RF, and KNN, the classifiers were employed after calculating the distance of multi-dimensional features by the MDDTW method.For RF classifiers, the number parameter of classifiers is set to 10 and the depth of decision tree is 50.For the KNN classifier, the distance between each pixel and the samples was calculated directly and then classified.In this study, the K was set to 1 after many experimental comparisons.It should be noted that parallel processing was used to improve efficiency during the KNN classifier employed.As for the EDDTW, which is similar to MDDTW, the time series is constructed by combining four vegetation indexes.Different time points are regarded as vectors, and the local distance of DTW is calculated by Euclidean distance [59].The RF and KNN classifiers were used for classification, and the parameter settings are the same as for MDDTW.For the classification based on RF-Multitemporal, the remote sensing images of the four seasons of spring, summer, autumn, and winter are selected as the data to be classified, as shown in Figure 7; this can reflect the seasonal difference and ensure that the vegetation area is cloudless.The number of Random Forest classifiers is set to 10, and the depth of the decision tree is 50.For the RF-PDTS [25], the number of Random Forest classifiers is set to 10 and the depth of the decision tree is 50.
TWDTW constrains time by setting weights and takes into account the seasonal changes of land cover types.It is an advanced time series classification method [60], where α (gain factor) and β (time distance) are important weight parameters.In this study, α = 0.25 and β = 45 were determined based on experiments, as shown in Figure 8.The RF and KNN classifiers are used for classification, respectively, and are the same as the MDDTW parameters settings.For the RF-PDTS [25], the number of Random Forest classifiers is set to 10 and the depth of the decision tree is 50.
TWDTW constrains time by setting weights and takes into account the seasonal changes of land cover types.It is an advanced time series classification method [60], where α (gain factor) and β (time distance) are important weight parameters.In this study, α = 0.

Result
The classification results based on multitemporal Random Forest, RF-PDTS, TWDTW, EDDTW, and MDDTW are shown in Figure 9a-h.For crops S. alterniflora, R. pseudoacacia, and T. chinensis, different classification maps have spatial consistency, and the distribution of those is roughly the same.Seven areas, A-G, where classified maps have obvious differences, are selected for further analysis.
As shown in Figure 9, the difference between areas A, B, and C is the most obvious.According to the field survey, area A is the production area of Dongying Shengli Oilfield, where artificial buildings, bare land, and P. australis are mixed.The same complex situation occurs in area B, where the mixed growth of P. australis, S. salsa, S. matsudana, and T. chinensis is serious, and there is a small amount of bare land.The classification results (a), (b), and (f) suggest that P. australis occupies a dominant position in the A and B area and other vegetation is ignored.Because the backgrounds of bare land and farmland are relatively similar, the classification results (g) and (h) are closer to the actual situation in area B. Area C is an ecological restoration area; it is in a bare land state due to water shortage in spring and winter, while P. australis, S. salsa, and T. chinensis coexist in summer and autumn.The results of different classification methods differ the most in region C, because the ground features in this region are the most complicated, so it is difficult to evaluate the pros and cons of the different methods in this region.In conclusion, except that the classification results of RF-EDDTW, RF-MDDTW, and KNN-MDDTW in area B are significantly better than other methods, it is meaningless to compare the classification results in areas A and C, considering that there are many non-vegetation interferences.The results (a) and (f) show that the distribution of S. matsudana is relatively concentrated, while the other classification results are relatively broken, which is mainly due to the classification based on pixels in this study.Figure 9c,e shows that this area is mainly covered by S. matsudana, which proves that RF-EDDTW and RF-TWDTW have a poor distinction between P. australis and S. matsudana.
For area E, there are mainly P. australis and S. salsa and T. chinensis.T. chinensis is mainly scattered at the junction of P. australis and S. salsa, so the classification results (a), (b), and (h) are more reasonable.For areas F and G, comparing different classification results, it can be seen that the area of S. alterniflora in the classification results of KNN-MDDTW is significantly larger than that of the other classification methods.Among them, (h) shows that there are scattered S. alterniflora on both sides of the tidal trench in area F. Although the area is small, the classification results, except for (d), failed to identify the classification of S. alterniflora at this location.Combined with the field survey situation, KNN-MDDTW is more accurate in the classification of S. alterniflora in these two areas.
The overall accuracy (OA) and kappa of different classification results are shown in Table 2.Because the RF classifier is not suitable for distance measurement classification, the OA and kappa of RF-TWDTW and RF-EDDTW have not reached 80%.Furthermore, the operation of averaging the time series of different samples as the standard time series cannot accurately reflect the category characteristics [59], which is also the reason why RF-TWDTW and RF-EDDTW have poor classification accuracy.The OA and kappa of MDDTW and TWDTW based on the KNN classifier are better than other classification methods because MDDTW and TWDTW calculate the distance metric, which is based on the KNN classifier.In addition to that, the classification accuracy of KNN-MDDTW based on multi-dimensional features is better than the current advanced KNN-TWDTW that uses single feature classification, indicating that the use of multi-dimensional features has advantages in temporal classification.However, despite the use of four vegetation indices for classification, the classification effect of EDDTW is not satisfactory, and the simple Euclidean distance is less effective for calculating the similarity distance of multidimensional feature time series.The user accuracy (UA) and producer accuracy (PA) of the classification results of different methods are further calculated, as shown in Figures 10 and 11.The classification method based on KNN-MDDTW ranks among the top three in terms of user accuracy and producer accuracy in all categories except T. chinensis.Among them, S. alterniflora, S. matsudana, S. salsa, and R. pseudoacacia have the best UA.PA achieves the best accuracy on all vegetation types except R. pseudoacacia and T. chinensis, so this method is the best overall.At the same time, the results show that although the combination of multiple spectral indexes has achieved good results, the traditional RF-PDTS and TWDTW based on NDVI single vegetation index still have good performance in wetland vegetation classification; however, the effects are better than the multi-dimensional feature calculation method based on EDDTW.

Comparison of Vegetation Classification Methods
In this section, the RF-PDTS and KNN-TWDTW methods, which were successfully applied in the field of vegetation classification with time series remote sensing data [25,[32][33][34] and also performed well in this research, are selected for comparing.The RF-PDTS selects the key phenological phases to classify by analyzing the vegetation growth curve, while the TWDTW and MDDTW both classify by calculating the distance of vegetation growth curve.Therefore, different vegetation growth curves are shown in Figure 12a-g, and confusion matrices from different methods are shown in Table 3 (SA = S. alterniflora, PA = P. australis, SM = S. matsudana, RP = R. pseudoacacia, TC = T. chinensis, SS = S. salsa.).The purpose is to discuss the applicability and deficiencies of different methods by analyzing the growth curves of different vegetation and the misclassification of vegetation.

Comparison of Vegetation Classification Methods
In this section, the RF-PDTS and KNN-TWDTW methods, which were successfully applied in the field of vegetation classification with time series remote sensing data [25,[32][33][34] and also performed well in this research, are selected for comparing.The RF-PDTS selects the key phenological phases to classify by analyzing the vegetation growth curve, while the TWDTW and MDDTW both classify by calculating the distance of vegetation growth curve.Therefore, different vegetation growth curves are shown in Figure 12a-g, and confusion matrices from different methods are shown in Table 3 (SA = S. alterniflora, PA = P. australis, SM = S. matsudana, RP = R. pseudoacacia, TC = T. chinensis, SS = S. salsa.).The purpose is to discuss the applicability and deficiencies of different methods by analyzing the growth curves of different vegetation and the misclassification of vegetation.As shown in Figure 12, the curves of S. alterniflora (b) and S. salsa (g) are obviously different from the others.This is the reason why all classification methods have less misclassification of S. alterniflora and S. salsa in the confusion matrix.Therefore, only the other five types of vegetation are considered in the follow-up comparison.The growth curve comparison is shown in Figure 12h, which clearly shows the differences between the different curves.In order to describe the misclassification of different vegetation from confusion matrix, the following Equation ( 16) is used to simply quantify the degree of confusion between the two vegetations:  As shown in Figure 12, the curves of S. alterniflora (b) and S. salsa (g) are obviously different from the others.This is the reason why all classification methods have less misclassification of S. alterniflora and S. salsa in the confusion matrix.Therefore, only the other five types of vegetation are considered in the follow-up comparison.The growth curve comparison is shown in Figure 12h, which clearly shows the differences between the different curves.
In order to describe the misclassification of different vegetation from confusion matrix, the following Equation ( 16) is used to simply quantify the degree of confusion between the two vegetations: where M(A, B) represent the possibility of misclassification between vegetation A and vegetation B, M B,A represents the amount of vegetation A that was misclassified into B, and T A represents the amunt of vegetation A. The same definition applies to M A,B and T B .For example, M(Crop, TC) represents the misclassification between Crop and T. chinensis.The higher the value, the greater the possibility of misclassification between the two vegetations.Figure 13 shows the misclassification of different types of vegetation with different methods.
In the RF-PDTS method, P. australis and T. chinensis are the most likely to be misclassified.From Figure 12h, it can be seen that their growth curves (yellow and dark blue curves) are the closest, and the time when the maximum NDVI value occurs of those are both around 210 days.After reaching the maximum value, it almost keeps the same trend and drops to the minimum value.At the same time, the possibility of misclassification between Crop and R. pseudoacacia is the smallest in this method.The reason for this can also be found in Figure 12h.The two vegetations have the largest difference in growth and withering time.In other words, the phenological difference between Crop and R. pseudoacacia is large.Although the overall accuracy of KNN-TWDTW is better than that of the RF-PDTS method, the results in Figure 13 show that the possibility of misclassification in the classification of part of the vegetation is greater than that of the RF-PDTS method, such as P. australis and S. matsudana classification.This is mainly because the TWDTW method expresses its similarity by calculating the distance of the curve as a whole.In the classification of P. australis, S. matsudana, and T. chinensis, this method may consider P. australis (yellow) and S. matsudana (light blue) curves as being more similar in overall shape compared to P. australis and T. chinensis.Nevertheless, the TWDTW method has fully proved its role in the field of time series classification, especially for vegetation with similar phenological time.As for the KNN-MDDTW, it has achieved good results in almost all vegetation.two vegetations.Figure 13 shows the misclassification of different types of vegetation with different methods.In the RF-PDTS method, P. australis and T. chinensis are the most likely to be misclassified.From Figure 12h, it can be seen that their growth curves (yellow and dark blue curves) are the closest, and the time when the maximum NDVI value occurs of those are both around 210 days.After reaching the maximum value, it almost keeps the same trend and drops to the minimum value.At the same time, the possibility of misclassification between Crop and R. pseudoacacia is the smallest in this method.The reason for this can also be found in Figure 12h.The two vegetations have the largest difference in growth and withering time.In other words, the phenological difference between Crop and R. pseudoacacia is large.Although the overall accuracy of KNN-TWDTW is better than that of the RF-PDTS method, the results in Figure 13 show that the possibility of misclassification in the classification of part of the vegetation is greater than that of the RF-PDTS method, such as P. australis and S. matsudana classification.This is mainly because the TWDTW method expresses its similarity by calculating the distance of the curve as a whole.In the classification of P. australis, S. matsudana, and T. chinensis, this method may consider P. australis (yellow) and S. matsudana (light blue) curves as being more similar in overall shape compared to P. australis and T. chinensis.Nevertheless, the TWDTW method has fully proved its role in the field of time series classification, especially for vegetation with similar phenological time.As for the KNN-MDDTW, it has achieved good results in almost all vegetation.
The difference in phenology is the basis for the RF-PDTS based on time series remote sensing images.The greater the phenological difference of vegetation, the better the classification effect.The vegetation of the YRD wetland is complex, and the growth cycles of different vegetation are similar, which lead to more errors in classification results.The TWDTW method is an improvement based on DTW, relying on time series curves for classification.The TWDTW algorithm works by comparing the similarity between two time-series and finds their optimal alignment, resulting in dissimilarity measures.The time series curve of typical vegetation in the YRD wetland has a high degree of similarity, as shown in Figure 12.Therefore, it has a poor effect on the classification of some vegetation.Compared with RF-PDTS and TWDTW, the KNN-MDDTW has achieved better results, whether it is vegetation with similar phenological time or similar curve shape.The The difference in phenology is the basis for the RF-PDTS based on time series remote sensing images.The greater the phenological difference of vegetation, the better the classification effect.The vegetation of the YRD wetland is complex, and the growth cycles of different vegetation are similar, which lead to more errors in classification results.The TWDTW method is an improvement based on DTW, relying on time series curves for classification.The TWDTW algorithm works by comparing the similarity between two timeseries and finds their optimal alignment, resulting in dissimilarity measures.The time series curve of typical vegetation in the YRD wetland has a high degree of similarity, as shown in Figure 12.Therefore, it has a poor effect on the classification of some vegetation.Compared with RF-PDTS and TWDTW, the KNN-MDDTW has achieved better results, whether it is vegetation with similar phenological time or similar curve shape.The use of a single vegetation index has limited effect on the classification of complex wetland vegetation; however, the KNN-MDDTW shows the effectiveness of using multi-dimensional features for time series classification.This method can calculate reasonable similarity distance from different multi-dimensional features, which meets the needs of classification.But it needs to be pointed out that although the method has achieved good accuracy, its computational cost is also too large to ignore.At the same time, how to select effective features and the importance of different features still needs further research.

Influence of Parameters on MDDTW Performance
For the classification based on MDDTW, the learning efficiency of M depends on the number of samples.Therefore, it is important to compare the efficiency and accuracy under different sample numbers.The results are shown in Figure 14.It should be pointed out that the experiment was carried out by randomly selecting sample points to calculate time and accuracy, and the result is the average of multiple experiments, which avoid the uncertainty caused by selecting sample points at different positions.
The number of samples has a significant relationship with the accuracy of the final result and the required time.With the increase of the number of samples, the accuracy and kappa both increase, but the time consumption also increases, as shown in Figure 14a.The classification accuracy and kappa coefficient change little with the increase of the number of samples when the number of samples reaches 900.The accuracy has exceeded 90% and kappa was more than 0.9 when the number of samples reaches 1000, as shown in Figure 14b.In the experiment, 1000 samples were chosen for MDDTW classification.
For the classification based on MDDTW, the learning efficiency of M depends on the number of samples.Therefore, it is important to compare the efficiency and accuracy under different sample numbers.The results are shown in Figure 14.It should be pointed out that the experiment was carried out by randomly selecting sample points to calculate time and accuracy, and the result is the average of multiple experiments, which avoid the uncertainty caused by selecting sample points at different positions.The number of samples has a significant relationship with the accuracy of the final result and the required time.With the increase of the number of samples, the accuracy and kappa both increase, but the time consumption also increases, as shown in Figure 14a.The classification accuracy and kappa coefficient change little with the increase of the number of samples when the number of samples reaches 900.The accuracy has exceeded 90% and kappa was more than 0.9 when the number of samples reaches 1000, as shown in Figure 14b.In the experiment, 1000 samples were chosen for MDDTW classification.

Conclusions
The composition and distribution information of wetland vegetation plays a significant role in protecting the wetland ecological environment and formulating effective wetland management plans.In this study, we extracted multiple vegetation indexes based on Sentinel-2 data and studied the multi-dimensional feature time series classification method.The results show that the classification method based on multi-dimensional features has significant advantages in wetland vegetation classification.The overall classification accuracy based on KNN-MDDTW is more than 90% and kappa is more than 0.9, which is better than comparison methods.
The traditional remote sensing time series classification based on DTW has been extended to multi-dimensional feature time series in order to fully utilize spectral information in the current research.This method calculated the similarity distance of multidimensional feature time series and effectively improved the accuracy of wetland vegetation classification in the Yellow River Delta, and it is expected to play an important role in the annual dynamic monitoring of wetland vegetation.However, only four indexes

Conclusions
The composition and distribution information of wetland vegetation plays a significant role in protecting the wetland ecological environment and formulating effective wetland management plans.In this study, we extracted multiple vegetation indexes based on Sentinel-2 data and studied the multi-dimensional feature time series classification method.The results show that the classification method based on multi-dimensional features has significant advantages in wetland vegetation classification.The overall classification accuracy based on KNN-MDDTW is more than 90% and kappa is more than 0.9, which is better than comparison methods.
The traditional remote sensing time series classification based on DTW has been extended to multi-dimensional feature time series in order to fully utilize spectral information in the current research.This method calculated the similarity distance of multi-dimensional feature time series and effectively improved the accuracy of wetland vegetation classification in the Yellow River Delta, and it is expected to play an important role in the annual dynamic monitoring of wetland vegetation.However, only four indexes (including NDVI, RVI, EVI, and DVI) were selected to construct the multi-dimensional feature time series in the current research.The impact of different index features on classification and the subsequent addition of other features to construct more time series dimensions for classification will be focused on in future research.

Figure 1 .
Figure 1.Location of study area in Shandong Province of China.The image to the right is a true color composite (red, green, blue bands) from Sentinel-2.

Figure 1 .
Figure 1.Location of study area in Shandong Province of China.The image to the right is a true color composite (red, green, blue bands) from Sentinel-2.

Figure 3 .
Figure 3.The distribution of verification points.

Figure 6 .
Figure 6.The optimal warping path between time series X and Y (red squares).
25 and β = 45 were determined based on experiments, as shown in Figure 8.The RF and KNN classifiers are used for classification, respectively, and are the same as the MDDTW parameters settings.

21 Figure 10 .
Figure 10.Producer accuracy of different classification methods.

Figure 10 .
Figure 10.Producer accuracy of different classification methods.

Figure 10 .
Figure 10.Producer accuracy of different classification methods.

Figure 11 .
Figure 11.User accuracy of different classification methods.

Figure 11 .
Figure 11.User accuracy of different classification methods.
(a) Curve of Crops (b) Curve of S. alterniflora
of vegetation A that was misclassified into B,and AT represents the amunt of vegetation A. The same definition applies to TC represents the misclassification between Crop and T.chinensis.The higher the value, the greater the possibility of misclassification between the

Figure 13 .
Figure 13.Misclassification of different vegetation with different methods.

Figure 13 .
Figure 13.Misclassification of different vegetation with different methods.

Figure 14 .
Figure 14.The relationship between number of samples, time-consumption, and classification accuracy.(a) Time-Number of samples.(b) OA/kappa-Number of samples.

Figure 14 .
Figure 14.The relationship between number of samples, time-consumption, and classification accuracy.(a) Time-Number of samples.(b) OA/kappa-Number of samples.

Table 2 .
Accuracy of different classification methods.

Table 3 .
Confusion matrix of the wetland vegetation classification.

Table 3 .
Confusion matrix of the wetland vegetation classification.