Continuous Wavelet Analysis of Leaf Reflectance Improves Classification Accuracy of Mangrove Species

Due to the continuous degradation of mangrove forests, accurate monitoring of the spatial distribution and species composition of mangroves is essential for the restoration, conservation, and management of coastal ecosystems. Using leaf hyperspectral reflectance, this study explored the potential of continuous wavelet analysis (CWA) combined with different sample subset partition methods (stratified random sampling (STRAT), the Kennard-Stone sampling algorithm (KS), and sample subset partition based on joint X-Y distances (SPXY)) and feature extraction methods (principal component analysis (PCA), the successive projections algorithm (SPA), and vegetation indices (VIs)) for mangrove species classification. A total of 301 mangrove leaf samples of four species (Avicennia marina, Bruguiera gymnorrhiza, Kandelia obovata, and Aegiceras corniculatum) were collected across six regions. The smoothed reflectance (Smth) and first derivative reflectance (Der) spectra were subjected to CWA at different wavelet scales, and a total of 270 random forest classification models were established and compared. Among the 120 models with CWA of Smth, 88.3% increased the overall accuracy (OA), with improvements of 0.2–28.6% over the model with the Smth spectra; among the 120 models with CWA of Der, 25.8% increased the OA, with improvements of 0.1–11.4% over the model with the Der spectra. The model with CWA of Der at the scale of $2^3$ coupled with STRAT and SPA achieved the best classification result (OA = 98.0%), while the best models with Smth and Der alone had OA values of 86.3% and 93.0%, respectively. Moreover, the models using STRAT outperformed those using KS and SPXY, and the models using PCA and SPA performed better than those using VIs. We conclude that CWA with suitable scales holds great potential for improving the classification accuracy of mangrove species, and that STRAT combined with PCA or SPA is recommended to improve classification performance. These results may lay the foundation for further studies with UAV-acquired or satellite hyperspectral data, and the encouraging performance of CWA for mangrove species classification may also extend to other plant species.


Introduction
Mangrove forests are communities of diverse salt-tolerant evergreen trees and other plant species in tropical and subtropical intertidal zones, and they provide important ecosystem services such as nutrient cycling, carbon sequestration, and mitigation of coastal hazards (e.g., shoreline erosion, soil salinization, hurricanes, and tsunamis) [1][2][3][4]. Due to climate change, natural disasters, and coastal development, the ecological functions of mangrove forests have been continuously degraded for decades [5,6]. The diversity and composition of tree species are key parameters for assessing forest ecosystems [7] and are particularly essential for understanding their response to environmental change and for monitoring the integrity of endangered ecosystems such as mangroves [8]. Therefore, accurate classification of mangrove species and timely monitoring of their spatial distribution are critical for conserving and restoring mangrove forests.
Conventionally, obtaining species information on mangrove forests requires costly, labor-intensive, and time-consuming field investigations, and it is often difficult for investigators to access mangrove forests [9]. Owing to their rapid, large-scale, and cost-effective monitoring capacities, remote sensing techniques have been increasingly adopted to survey and evaluate mangrove resources during the past several decades [10]. Medium-resolution multispectral imagery, such as that from Landsat [11,12], SPOT [13], and Sentinel-2 [8], is often used to map the distribution of mangrove forests at regional, national, or even global scales. Owing to their superb spatial and textural detail, high-resolution multispectral images from satellites such as Quickbird [14], IKONOS [15], Worldview [16,17], and Pléiades-1 [18] have been widely employed to classify mangrove species at landscape or regional scales.
Compared with multispectral satellite images, which carry limited spectral information, hyperspectral remote sensing data contain dozens or hundreds of contiguous wavebands with spectral features related to plant functional traits and have been found to be more efficient for tree species classification [19]. Leaf [20,21], canopy [21][22][23], satellite (e.g., Hyperion [24,25]), and airborne (e.g., CASI [26], AVIRIS [27,28], and AISA [29]) hyperspectral data have been widely employed in classifying mangrove forests and other forest types (e.g., temperate, subtropical, tropical rainforest, and urban). With rapid advancements in unmanned aerial vehicles (UAV) and self-driving cars, light detection and ranging (LiDAR) techniques have been increasingly utilized in classifying tree species. However, several studies have pointed out that LiDAR alone cannot accurately classify tree species, although it can be combined with hyperspectral images to further improve classification accuracy [30][31][32]. Therefore, LiDAR-acquired structural parameters are often used as auxiliary information, and exploring hyperspectral data remains indispensable for plant species classification.
The high dimensionality and multi-collinearity of hyperspectral data may decrease model accuracy in supervised learning because the number of spectral wavebands often exceeds the number of model calibration samples [33]. Therefore, additional processing is needed for hyperspectral data to resolve the problem of redundant predictors and to enhance spectral differences. For feature extraction, dimensionality reduction (e.g., principal component analysis (PCA)), waveband selection (e.g., the successive projections algorithm (SPA)), and vegetation index (VI) extraction are the three commonly used strategies for relating sensitive spectral features to plant species information [34][35][36]. Moreover, different sample subset partition methods (e.g., stratified random sampling (STRAT), the Kennard-Stone sampling algorithm (KS), and sample subset partition based on joint X-Y distances (SPXY)) may lead to different classification results [37,38]. However, very few studies have investigated the combination of feature extraction and sample subset partition in the classification of mangrove species.
Continuous wavelet analysis (CWA) is an effective noise reduction method that can also enhance the details of spectral features in hyperspectral data [39][40][41]. Hence, CWA has been successfully utilized in quantitative remote sensing for retrieving plant functional traits (e.g., leaf mass per area [42], canopy water content [43], and leaf dry matter content and specific leaf area [44]). Beyond trait retrieval, a few studies have applied CWA to improve species classification accuracy in herbaceous wetlands and tropical dry forests [45,46]. However, the advantage of CWA applied to hyperspectral data has rarely been investigated in mangrove species classification.
Using the leaf hyperspectral reflectance spectra of four mangrove species sampled across six regions, this study aimed to explore the potential of CWA combined with different sample subset partition and feature extraction methods in mangrove species classification. The results at the leaf scale may lay the foundation for further studies with UAV- and satellite-based hyperspectral images.

Field Sample
A total of 301 leaf samples of four mangrove species were collected from six sites (Figure 1) in 2017 and 2018 (Table 1), comprising 60 of Avicennia marina, 46 of Bruguiera gymnorrhiza, 81 of Kandelia obovata, and 114 of Aegiceras corniculatum. Each sample was collected from a single-species plot of 10 m × 10 m, and the center location of each plot was recorded with a handheld global positioning system (GPS) receiver. Any two plots were at least 30 m apart. For each plot, about 20–30 leaves were picked from the canopy using an extendable trimming pole. To ensure the picked leaves were mature, leaves between the third and fifth layers from the top were selected [47]. Finally, each sample was immediately sealed in a fresh-keeping bag, kept in a dark box with ice packs, and transported to a nearby laboratory for spectral measurement and chemical analysis.

Leaf Reflectance Measurement and Spectra Preprocessing
An ASD FieldSpec 4 portable spectroradiometer (Analytical Spectral Devices, Inc., Boulder, CO, USA) was used to measure the leaf reflectance spectra of the four mangrove species. The instrument records 2151 wavebands from 350 to 2500 nm, with a sampling interval of 1.4 nm in the range of 350–1000 nm and 2 nm in the range of 1000–2500 nm. For each sample, ten leaves were randomly selected and measured with an ASD leaf clip and plant probe; the spectrum of each leaf was recorded with ten successive scans, and the spectra of the ten leaves were averaged to give the final reflectance spectrum of the sample.
Due to systematic noise at the two edges of the leaf spectrum (350–399 nm and 2451–2500 nm), the leaf reflectance spectra of the 301 samples were first reduced to 400–2450 nm (Figure 2). To minimize the effects of random noise on model calibration, the remaining spectra were then processed with a Savitzky-Golay smoothing filter using a second-order polynomial fit and a window size of seven data points [48]. Finally, the smoothed spectra (hereafter Smth) were subjected to first derivative analysis (hereafter Der), because this can enhance the peaks and valleys of spectral features [49] and minimize the impact of multiple scattering of irradiation [50].
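The preprocessing chain described above can be sketched in a few lines. The following Python example is only an illustration under stated assumptions, not the procedure actually run for this study: the spectrum is synthetic, the edge-padding rule and the use of central finite differences for the derivative are assumptions, and the function names `savgol_weights` and `smooth_and_differentiate` are hypothetical.

```python
import numpy as np

def savgol_weights(window=7, order=2):
    """Least-squares smoothing weights for a centred Savitzky-Golay window."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Columns are offsets**0, offsets**1, ..., offsets**order.
    design = np.vander(offsets, order + 1, increasing=True)
    # Row 0 of the pseudo-inverse maps the window to the fitted value at its centre.
    return np.linalg.pinv(design)[0]

def smooth_and_differentiate(wavelengths, reflectance, window=7, order=2):
    """Return (Smth, Der) for one spectrum sampled at `wavelengths` (nm)."""
    weights = savgol_weights(window, order)
    half = window // 2
    # Edge padding keeps the output the same length as the input (an assumption).
    padded = np.pad(np.asarray(reflectance, dtype=float), half, mode="edge")
    smth = np.correlate(padded, weights, mode="valid")
    # Central finite differences stand in for the first derivative analysis (an assumption).
    der = np.gradient(smth, wavelengths)
    return smth, der

# Synthetic stand-in for one leaf spectrum over 400-2450 nm (2051 bands).
wavelengths = np.arange(400.0, 2451.0)
rng = np.random.default_rng(0)
reflectance = 0.4 + 0.1 * np.sin(wavelengths / 150.0) + rng.normal(0.0, 0.002, wavelengths.size)
smth, der = smooth_and_differentiate(wavelengths, reflectance)
```

With a seven-point window and a second-order polynomial, the smoothing weights reduce to the familiar (-2, 3, 6, 7, 6, 3, -2)/21, so a purely quadratic signal passes through the filter unchanged away from the edges.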

Continuous Wavelet Analysis of Leaf Reflectance
Wavelet analysis is an effective way of decomposing an original signal into multiple amplitudes and scales [51,52] and has been widely applied in vegetation remote sensing [41,43,44]. It is generally implemented as either discrete wavelet analysis (DWA) or continuous wavelet analysis (CWA). DWA transforms only the most informative part of the input data to avoid redundancy, but its decomposed components are difficult to interpret in a waveband-by-waveband analysis [44]. In contrast, CWA generates interpretable signals that correspond directly to the original leaf spectra, so the decomposed signals can reflect information on plant absorption features [53,54]. Therefore, we employed CWA to explore the details of leaf spectra in the classification of mangrove species.
CWA convolves the reflectance spectrum $f(\lambda)$ with a mother wavelet function at various scales, producing sets of wavelet coefficients (Equation (1)) [55]. This may be expressed as

$$W_f(a,b) = \langle f, \psi_{a,b} \rangle = \int_{-\infty}^{+\infty} f(\lambda)\,\psi_{a,b}(\lambda)\,d\lambda, \qquad \psi_{a,b}(\lambda) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{\lambda - b}{a}\right) \tag{1}$$

where $W_f(a,b)$ is the vector of wavelet coefficients; $a$ and $b$ (positive real numbers) are the scaling and shifting factors, respectively, indicating the width and position of the wavelet function [56]; and $\psi_{a,b}(\lambda)$ is the mother wavelet function $\psi$ scaled by $a$ and shifted by $b$.
The shape of the leaf reflectance spectrum is similar to a Gaussian or quasi-Gaussian function, or a composition of several Gaussian functions [57]. Following the suggestion of Torrence et al. [58], the second-order derivative of the Gaussian (the 'Mexican Hat', "mexh") was chosen as the mother wavelet function (Figure 3). The "mexh" function is symmetric and has zero mean [59]. Moreover, although the "mexh" function has infinite support, its effective support is $(-5, 5)$ at the basic scale and $(-5s, 5s)$ at scale $s$ ($s \in \mathbb{Z}^{+}$) [60].
To reduce the computational load, following the suggestion of Cheng et al. [61], CWA was performed only at dyadic scales instead of all possible scales. Moreover, the 2051 wavebands (400–2450 nm) available in this study limited the dyadic scales to less than $2^{10} = 1024$. Based on the suggestion of Cheng et al. [42] and preliminary experiments on mangrove species classification, eight scales ($2^0, 2^1, \ldots, 2^7$) were chosen for the CWA of Smth and Der (also referred to as the "wavelet power spectra of Smth and Der"), and CWA was implemented with the wavelet packets of MATLAB R2018a.
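To make the transform concrete, the following Python sketch evaluates Mexican hat wavelet coefficients at the eight dyadic scales by direct convolution. It is a minimal illustration, not the MATLAB routine used in this study: the test signal is a synthetic Gaussian absorption-like feature, the kernel is truncated at five times the scale to mirror the effective support quoted above, and the names `mexican_hat` and `cwt_mexh` are hypothetical.

```python
import numpy as np

def mexican_hat(t):
    """Mexican hat wavelet: negated, normalised second derivative of a Gaussian."""
    norm = 2.0 / (np.sqrt(3.0) * np.pi ** 0.25)
    return norm * (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)

def cwt_mexh(spectrum, scales, support=5.0):
    """Wavelet coefficients for every band position (columns) and scale (rows)."""
    spectrum = np.asarray(spectrum, dtype=float)
    coeffs = np.empty((len(scales), spectrum.size))
    for row, a in enumerate(scales):
        # Sample the scaled wavelet on its effective support (-5a, 5a).
        half = int(np.ceil(support * a))
        t = np.arange(-half, half + 1) / a
        kernel = mexican_hat(t) / np.sqrt(a)
        # The kernel is symmetric, so convolution and correlation coincide.
        coeffs[row] = np.convolve(spectrum, kernel, mode="same")
    return coeffs

# Eight dyadic scales 2^0 ... 2^7, applied to a synthetic feature centred on band 1000.
scales = [2 ** k for k in range(8)]
bands = np.arange(2051)
bump = np.exp(-0.5 * ((bands - 1000) / 8.0) ** 2)
coeffs = cwt_mexh(bump, scales)
```

Each row of `coeffs` has the same length as the input spectrum, which is what allows the coefficients to be read waveband by waveband. Because the wavelet has zero mean, a flat spectrum yields coefficients close to zero away from the edges.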

Establishment of Mangrove Species Classification Model
To examine the impact of different sample subset partition and feature extraction methods on the performance of mangrove species classification, three commonly used sample subset partition methods (STRAT, KS, and SPXY) and three feature extraction methods (PCA, SPA, and VI) were tested and compared. Random forests (RF), a prevalent and successful machine learning method, was chosen as the classification model in this study.
Following the suggestion of Roth et al. [62], to ensure that the modeling and prediction sets contained samples of each species, the original dataset (301 samples) was first divided into two sets with STRAT, using 70% (211) of the samples for modeling and 30% (90) for prediction (Figure 4). STRAT was used to select the prediction set because the KS and SPXY algorithms select sample subsets based on Euclidean distances in x-space, or in x- and y-space, which resulted in unbalanced prediction sample sizes across the four species. Afterwards, the modeling set was divided into two sets with each of the three partition methods, with 70% (148 samples) used as a calibration set to construct the species classification models; the remaining samples (the validation set) were discarded because of the aforementioned influence of KS and SPXY. Each process of sample subset partitioning was repeated 50 times to ensure the reliability of the classification results [63].
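A total of 270 (18 spectra × 3 sample subset partition methods × 5 feature extraction methods) random forest classification models were established and compared.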
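The first, stratified split can be illustrated with the species counts reported in the Field Sample section. The Python sketch below is an assumption-laden illustration rather than the MATLAB procedure used here: it rounds 70% within each species, and the labels "Bg", "Ko", and "Ac" as well as the function name `stratified_split` are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, fraction=0.7, seed=0):
    """Split sample indices so that each class contributes `fraction` to the first set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    modeling, prediction = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Rounding within each class is an assumption of this sketch.
        n_model = round(fraction * len(indices))
        modeling.extend(indices[:n_model])
        prediction.extend(indices[n_model:])
    return sorted(modeling), sorted(prediction)

# Species counts from the Field Sample section; the labels are shorthand.
species_counts = {"Am": 60, "Bg": 46, "Ko": 81, "Ac": 114}
labels = [name for name, count in species_counts.items() for _ in range(count)]
modeling, prediction = stratified_split(labels)
print(len(modeling), len(prediction))  # prints: 211 90
```

Rounding per species gives 42 + 32 + 57 + 80 = 211 modeling samples and 90 prediction samples, which agrees with the set sizes reported above and guarantees that every species appears in both sets.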

Sample Subset Partition
Compared with simple random sampling (SRS), STRAT can select more representative samples and is especially suitable for remote sensing-based plant species classification [28]. The KS algorithm calculates the Euclidean distances between samples in the independent variable (x) space using a stepwise procedure: the two samples with the largest Euclidean distance are selected first, and each subsequent sample is the one farthest from those already selected [64]. SPXY improves on the KS algorithm [65] by extending the Euclidean distance calculation to both the independent and dependent variables. For details of KS and SPXY, refer to Galvao et al. [65]. The three aforementioned sample subset partition methods were implemented in MATLAB R2018a.
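The stepwise KS selection described above can be sketched as follows. This Python example is a minimal illustration under the usual maximin reading of the algorithm ("farthest" meaning the largest distance to the nearest already-selected sample); it is not the MATLAB code used in this study, and the function name `kennard_stone` and the toy data are hypothetical. SPXY is not shown.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return indices of n_select samples chosen by Kennard-Stone in x-space."""
    X = np.asarray(X, dtype=float)
    # Squared Euclidean distances from the Gram matrix (ordering is unchanged by squaring).
    sq = (X ** 2).sum(axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Start with the two samples that are farthest apart.
    first, second = np.unravel_index(np.argmax(dist2), dist2.shape)
    selected = [int(first), int(second)]
    remaining = [i for i in range(len(X)) if i not in selected]
    while len(selected) < n_select and remaining:
        # For each candidate, distance to its nearest already-selected sample.
        nearest = dist2[np.ix_(remaining, selected)].min(axis=1)
        # Pick the candidate whose nearest selected sample is farthest away.
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected

# Toy one-dimensional example: samples at 0, 1, 2, and 10.
toy = np.array([[0.0], [1.0], [2.0], [10.0]])
order = kennard_stone(toy, 3)
print(order)  # prints: [0, 3, 2]
```

Because the selection depends only on x-space distances and ignores class labels, nothing prevents one species from dominating the selected subset, which is consistent with the unbalanced sample sizes noted for KS and SPXY in the preceding section.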

Feature Extraction
PCA is a typical dimensionality reduction method that is widely applied in hyperspectral image analysis [34,66]. In this study, the spectra (Smth, Der, Smth + CWA (eight scales), and Der + CWA (eight scales)) were subjected to PCA using the pca() function in MATLAB R2018a, and the leading principal components were chosen according to the eigenvalues-greater-than-one rule to calibrate the model [67]. In addition, we summed the percentages of total variance explained by the selected principal components for each spectrum (Table 2). SPA is a forward variable selection algorithm that begins with one waveband, adds a new one at each iteration, and applies projection operators in a vector space until a specified number of wavebands is reached [68]. The advantage of SPA lies in its deterministic search process, which gives good robustness and reproducible results. A ratio of the number of selected wavebands to the total number of training samples of 0.15–0.2 is recommended to avoid over-fitting [69]. Gross et al. [45] found that model performance with fewer than 20 features was relatively stable and that extra features markedly increased computation time. Hence, we set the parameter m_max (maximum number of variables) of the spa() function to 20. The process was implemented with SPA code (www.ele.ita.br/~kawakami/spa) in MATLAB R2018a.
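The projection step at the heart of SPA can be illustrated with a short sketch. The Python code below shows only the forward chain from a given starting waveband, under the assumption that columns are mean-centred first; it omits the validation phase that chooses the starting waveband and the final number of wavebands, and it is not the spa() code cited above. The function name `spa_chain` and the toy matrix are hypothetical.

```python
import numpy as np

def spa_chain(X, start, n_select):
    """Forward SPA chain: indices of n_select wavebands beginning at column `start`."""
    work = np.asarray(X, dtype=float).copy()
    work -= work.mean(axis=0)  # mean-centre each waveband (an assumption)
    selected = [start]
    for _ in range(n_select - 1):
        ref = work[:, selected[-1]]
        # Project every column onto the subspace orthogonal to the last selected column.
        work = work - np.outer(ref, ref @ work) / (ref @ ref)
        norms = np.linalg.norm(work, axis=0)
        norms[selected] = -1.0  # never re-select a waveband
        # The column with the largest residual norm is the least collinear candidate.
        selected.append(int(np.argmax(norms)))
    return selected

# Toy matrix: column 1 is an exact multiple of column 0, column 2 is orthogonal to both.
a = np.array([1.0, -1.0, 1.0, -1.0])
b = np.array([1.0, 1.0, -1.0, -1.0])
X_toy = np.column_stack([a, 3.0 * a, b])
chain = spa_chain(X_toy, start=0, n_select=2)
print(chain)  # prints: [0, 2]
```

In the toy case the collinear column is skipped because its projection has zero norm, which is the mechanism by which SPA reduces multi-collinearity among selected wavebands.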

Random Forests Classification
The RF algorithm combines K binary Classification and Regression Trees (CART) with the bagging (bootstrap aggregation) sampling method and internal cross-validation, and can effectively overcome the over-fitting problem of machine learning [73][74][75]. RF constructs hundreds of decision trees, each built from a randomized subset of the target data and variables [76]. The final class is then determined by majority voting among these classification trees [77]. An RF model has two main tuning parameters (ntree and mtry); both were kept at their default values because researchers have reported that the default values and the empirical criteria generally produce acceptable results [74,78]. RF was performed with RF code (https://code.google.com/archive/p/randomforest-matlab/downloads) in MATLAB R2018a.

Evaluation of Classification Model Performance
Overall accuracy (OA) (Equation (2)), producer's accuracy (PA) (Equation (3)), and user's accuracy (UA) (Equation (4)) were employed to evaluate the performance of each classification model. Furthermore, the allocation disagreement (AD) (Equation (5)) and quantity disagreement (QD) (Equation (6)) were adopted rather than Kappa, since Kappa neglects the assessment of off-diagonal elements [79] and is highly correlated with OA [63]. For specific descriptions and explanations of AD and QD, refer to Jr et al. [80] and Nurmemet et al. [81]. The larger the values of OA, PA, or UA, the better the model performance; conversely, the larger the value of AD or QD, the poorer the model performance. These measures may be expressed as

$$OA = \frac{1}{n}\sum_{i=1}^{4} A_{ii} \times 100\% \tag{2}$$

$$PA_i = \frac{A_{ii}}{A_{i+}} \times 100\% \tag{3}$$

$$UA_i = \frac{A_{ii}}{A_{+i}} \times 100\% \tag{4}$$

$$AD = \frac{1}{n}\sum_{i=1}^{4} \min\left(A_{i+} - A_{ii},\; A_{+i} - A_{ii}\right) \times 100\% \tag{5}$$

$$QD = \frac{1}{2n}\sum_{i=1}^{4} \left|A_{i+} - A_{+i}\right| \times 100\% \tag{6}$$

$$A = \begin{pmatrix} A_{11} & A_{12} & A_{13} & A_{14} \\ A_{21} & A_{22} & A_{23} & A_{24} \\ A_{31} & A_{32} & A_{33} & A_{34} \\ A_{41} & A_{42} & A_{43} & A_{44} \end{pmatrix} \tag{7}$$

where $A$ is a 4 × 4 error matrix (Equation (7)), and $i$ is the row/column number of the relevant species; $A_{i+}$ represents the sum of the $i$th row of the error matrix, i.e., the total number of samples of the $i$th species, whether assigned to the $i$th species or to the other three species; $A_{+i}$ represents the sum of the $i$th column of the error matrix, i.e., the total number of samples assigned to the $i$th species; $A_{ii}$, the $i$th diagonal element of the error matrix, indicates the number of samples of the $i$th species that were correctly classified; and $n$ is the sample size of the prediction set.
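As a worked illustration of these measures, the Python sketch below computes OA, PA, UA, AD, and QD from a 4 × 4 error matrix whose rows are reference species and whose columns are assigned species. The matrix is invented for illustration (its row sums merely match a 90-sample prediction set) and does not reproduce any result of this study; AD and QD follow the standard allocation and quantity disagreement definitions, and the function name `accuracy_metrics` is hypothetical.

```python
import numpy as np

def accuracy_metrics(error_matrix):
    """OA, PA, UA, AD, and QD as fractions (multiply by 100 for percentages)."""
    A = np.asarray(error_matrix, dtype=float)
    n = A.sum()
    diag = np.diag(A)
    row = A.sum(axis=1)  # A_{i+}: samples that truly belong to species i
    col = A.sum(axis=0)  # A_{+i}: samples assigned to species i
    return {
        "OA": diag.sum() / n,
        "PA": diag / row,
        "UA": diag / col,
        # Allocation disagreement: errors that could be fixed by swapping assignments.
        "AD": np.minimum(row - diag, col - diag).sum() / n,
        # Quantity disagreement: mismatch between reference and assigned class totals.
        "QD": 0.5 * np.abs(row - col).sum() / n,
    }

# Hypothetical error matrix for a 90-sample prediction set.
A_example = [
    [17, 0, 1, 0],
    [0, 13, 1, 0],
    [1, 0, 22, 1],
    [0, 1, 1, 32],
]
metrics = accuracy_metrics(A_example)
```

For this matrix, OA = 84/90, AD = 5/90, and QD = 1/90, so the three values sum to one. The same identity holds, up to rounding, for the mean OA, AD, and QD values reported for each sample subset partition method in the results.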

Mean Reflectance and Wavelet Power Spectra of Mangrove Leaf
Taking Am (Avicennia marina) as an example (the other species are shown in Figures S1-1 and S1-2; see supplementary material Figure S1), the mean reflectance and wavelet power spectra of Smth/Der at the eight scales are illustrated in Figure 5. The mean Smth reflectance spectrum had only 10 peaks and 10 troughs (Figure 5a), while 52 peaks and 53 troughs were observed for the first derivative spectrum (Figure 5b). Compared with the Smth/Der spectra, the number of peaks and troughs in the wavelet power spectra of Smth/Der at low scales (scale = 1, 2, and 4) increased substantially, revealing much more detailed spectral features; however, the wavelet power spectra at scales of 32–128 carried less spectral feature information than the Smth/Der spectra. The other three species (Figures S1-1 and S1-2 in Figure S1) showed the same behavior: the first derivative and the low-scale wavelet power spectra (scale = 1, 2, and 4) enhanced the differences between spectral wavebands and improved the details of spectral features.

Performance of Classification Models with Reflectance, Derivative and Wavelet Power Spectra
To explore the advantage of CWA for mangrove species classification, the OAs, ADs, and QDs of the 270 models with reflectance, derivative, and wavelet power spectra were summarized and compared (Table 3). Regardless of the effect of sample subset partition and feature extraction, the models with CWA of Smth spectra at the scales of $2^1$–$2^6$ (denoted Smth_2, Smth_4, Smth_8, Smth_16, Smth_32, and Smth_64) and CWA of Der spectra at the scales of $2^3$ and $2^4$ (denoted Der_8 and Der_16) had higher mean OAs and lower mean QDs than those with the Smth/Der spectra; the Der spectra achieved much better classification performance than the Smth spectra. In addition, among the 120 models with the wavelet power spectra of Smth, 88.3% increased the OA, with improvements of 0.2–28.6% over the model with the Smth spectra, and among the 120 models with the wavelet power spectra of Der, 25.8% increased the OA, with improvements of 0.1–11.4% over the model with the Der spectra (Table S1-OA comparison).
Among the 270 models, those with the Smth_4 and Der_8 spectra had the highest mean classification accuracies, whereas those with CWA of Smth spectra at the scale of 128 and CWA of Der spectra at the scale of 1 had the poorest classification performance. The two models with CWA of Smth/Der spectra at the scale of 8 achieved the best classification accuracies, with OA reaching 97.6% and 98.0%, respectively.

Performance of Models with Different Sample Subset Partition Methods
To explore the effect of different sample subset partition methods on the performance of mangrove species classification, the OAs, ADs, and QDs of the 270 models using the STRAT, KS, and SPXY methods were compared (Figure 6). The models using the STRAT method (mean OA = 84.5%, mean AD = 10.8%, and mean QD = 4.8%) achieved better classification results than those using the KS (mean OA = 72.7%, mean AD = 20.4%, and mean QD = 6.9%) and SPXY (mean OA = 71.7%, mean AD = 21.3%, and mean QD = 7.0%) methods. Moreover, the standard deviations of the OAs, ADs, and QDs using the STRAT method (SD = 6.7%, 5.9%, and 1.2%, respectively) were much smaller than those using the KS (SD = 13.1%, 11.3%, and 2.4%, respectively) and SPXY (SD = 13.7%, 11.9%, and 2.6%, respectively) methods. In addition, the lowest accuracies of the models using these three methods (OA = 66.2% for STRAT, OA = 38.0% for KS, and OA = 36.9% for SPXY) all occurred with the wavelet power spectra of Der_1.

Performance of Classification Models with Different Feature Extraction Methods
To explore the effect of different feature extraction methods on the model performance of mangrove species classification, the OAs, ADs, and QDs of the 270 models using PCA, SPA, and three VIs (NDVI, RVI, and TBVI) were compared (Figure 7). The ranking order of the mean OA values was PCA > SPA > TBVI > NDVI > RVI, while the order was PCA < SPA < TBVI < NDVI < RVI for the mean AD values and SPA < PCA < TBVI < NDVI < RVI for the mean QD values. The models using PCA had lower standard deviations of the OAs, ADs, and QDs (SD = 4.6%, 3.5%, and 1.5%, respectively) than those using the other four methods. Moreover, the model using SPA combined with the STRAT method and the wavelet power spectra of Der_8 achieved the highest OA value of 98.0% among the 270 models. Among the three VIs, there were no significant differences between NDVI and RVI in mangrove species classification considering the mean or the SD values of OAs, ADs, and QDs; however, both provided lower classification accuracies than TBVI (mean OA = 72.7%, mean AD = 17.2%, and mean QD = 6.6%). Notably, the model using TBVI (the three wavebands were 730, 1680, and 1735 nm) combined with the STRAT method and the wavelet power spectra of Smth_4 held the highest OA value of 93.6% among the models using VIs, which was higher than the best result of the models using PCA (Figure 7a).
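For reference, the two-band indices can be sketched in their standard normalized-difference and ratio forms, applied to an arbitrary pair of wavebands. The band pair, the reflectance values, and the helper `band_value` are hypothetical; TBVI is omitted because its formula is not given in this section.

```python
def ndvi(r_a, r_b):
    """Normalized difference of two band values: (a - b) / (a + b)."""
    return (r_a - r_b) / (r_a + r_b)

def rvi(r_a, r_b):
    """Simple ratio of two band values: a / b."""
    return r_a / r_b

def band_value(wavelengths, spectrum, target_nm):
    """Value of the band whose centre wavelength is closest to target_nm."""
    idx = min(range(len(wavelengths)), key=lambda i: abs(wavelengths[i] - target_nm))
    return spectrum[idx]

# Hypothetical leaf spectrum sampled at five wavelengths (nm).
wavelengths = [680, 730, 780, 1680, 1735]
reflectance = [0.05, 0.25, 0.45, 0.30, 0.28]

r780 = band_value(wavelengths, reflectance, 780)
r680 = band_value(wavelengths, reflectance, 680)
features = {"NDVI": ndvi(r780, r680), "RVI": rvi(r780, r680)}
```

Each index reduces a whole spectrum to a single number built from two bands, whereas PCA and SPA retain information from many more wavebands.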
To explore which waveband or spectral range was sensitive to mangrove species classification, the wavebands which were frequently selected (with ratios of the frequency of the specific waveband to the number of all the wavebands selected by SPA greater than 0.01) by SPA (54 models, 714 selected wavebands in total) and VIs (54 × 3 = 162 models, 378 selected wavebands in total) were compiled (Table 4). The selected bands mostly lie in the range of 680-780 nm (red edge region), 1650-1950 nm, and 2200-2450 nm.
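The frequency criterion can be sketched as a simple count over all selections. With 714 wavebands selected by SPA in total, a ratio above 0.01 means that a waveband must have been selected at least 8 times (714 × 0.01 = 7.14). The function name `frequent_bands` and the toy selections are hypothetical.

```python
import math
from collections import Counter

def frequent_bands(selected_per_model, min_ratio=0.01):
    """Bands whose share of all selections exceeds min_ratio.
    selected_per_model: one list of selected wavebands (nm) per model."""
    counts = Counter(b for bands in selected_per_model for b in bands)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items() if c / total > min_ratio}

# Smallest selection count that exceeds a ratio of 0.01 out of 714 selections.
min_count_spa = math.floor(714 * 0.01) + 1

# Toy example (not the study's selections), with a higher threshold for readability.
toy_models = [[730, 1680], [730, 1735], [730, 2245]]
toy_frequent = frequent_bands(toy_models, min_ratio=0.2)
```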

Continuous Wavelet Analysis for Mangrove Species Classification
With leaf hyperspectral reflectance, we have explored the potential of CWA combined with different sample subset partition and feature extraction methods in mangrove species classification, and the results have demonstrated that CWA of Smth and Der spectra with suitable wavelet scales could greatly improve the classification accuracy compared to Smth or Der spectra alone (Table 3). Notably, although low-scale (e.g., scale = 1) wavelet coefficients were able to capture numerous details from leaf spectra (Figure 5), the results were unsatisfactory (mean OA: 63.6%); the high-scale (e.g., scale = 128) wavelet coefficients also corresponded to poorer classification performance (mean OA: 69.3%). Such unsuccessful results may be explained by the fact that wavelet power spectra at very low or very high scales contain noise, or spectral information that is insensitive to, or carries little detail about, leaf structure and biochemical components [41,44]. In addition, both derivative analysis and CWA can enhance the differences between spectral wavebands; however, their combination did not improve the accuracy at the scale of 2^1 or 2^2, suggesting that the details of the original spectra could be largely preserved by derivative analysis or CWA alone. This inference is also supported by the result that Der spectra performed much better than Smth spectra. In general, low-scale wavelet coefficients are adept at detecting narrow absorption features of leaves, whereas high-scale coefficients are better suited to capturing the overall shape of leaf reflectance spectra [44]. Therefore, it is of crucial importance to determine a suitable wavelet scale when reflectance or derivative spectra are subjected to CWA.
Among relevant studies using CWA on hyperspectral data at the leaf level, many have reported that the optimal scale of CWA in plant disease detection and biochemical component estimation lies between 2^2 and 2^5. For instance, Shi et al. [41] selected wavelet features at the scales of 2^2, 2^4 and 2^5 to study rust infestation; Zhang et al. [23] identified disease-sensitive wavelet features at the scales of 2^2 to 2^4; Li et al. [82] found that scale 2^3 performed best for extracting the red edge position from leaf reflectance; and 2^4 was specified as the optimal scale for estimating leaf water content and leaf dry matter content by Cheng et al. [54] and Cheng et al. [42], respectively. Therefore, the optimal continuous wavelet scale within the range of 2^2 to 2^5 still needs to be investigated in future studies.
Previous studies have reported the classification of mangrove species without CWA. For instance, Zhang et al. [23] coupled laboratory leaf hyperspectral reflectance with mangrove health conditions, acquiring an OA of around 90%. In terms of other plant species classification without CWA, Shen et al. [34] combined airborne hyperspectral images with LiDAR to classify tree species of subtropical forests, and their best OA reached around 91.5%. Our study has revealed the advantage of CWA in mangrove species classification and suggests that CWA is promising for further improving the accuracy of plant species classification with drone or satellite-based hyperspectral data.

Impact of Sample Subset Partition and Feature Extraction on Classification Accuracy
Our results have demonstrated that the selection of sample subset partition and feature extraction methods plays an important role in improving the classification accuracies of mangrove species. Generally, the models using the STRAT method exhibited more stable and higher classification accuracies than those using the KS and SPXY methods. To our knowledge, very few studies have applied SPXY in the sample partition for plant species classification, and comparisons among SRS, KS and SPXY have been performed in quantitative models to determine the asiaticoside content in centella total glucosides [83]. However, Zhan et al. [83] concluded that the model using SPXY achieved the highest prediction accuracy, which is contrary to our results. Such disagreement may be related to differences in species, models or the statistical distribution of dependent variables. Moreover, the sample subsets partitioned by KS and SPXY cannot ensure an equal number of samples for each species. Hence, it is recommended to employ the STRAT method in plant species classification.
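The point about class balance can be illustrated with a small sketch that contrasts Kennard-Stone selection with a stratified random split. The implementation below is a plain reading of the Kennard-Stone idea (start from the two most distant samples, then repeatedly add the sample farthest from the already selected set) and is not necessarily the implementation used in this study; the function names and the one-dimensional toy data are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

def kennard_stone(X, n_select):
    """Indices of n_select samples chosen by the Kennard-Stone max-min distance rule."""
    n = len(X)
    dist = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    # Start with the two most distant samples.
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    i0, j0 = max(pairs, key=lambda p: dist[p[0]][p[1]])
    selected = [i0, j0]
    remaining = [i for i in range(n) if i not in (i0, j0)]
    while len(selected) < n_select:
        # Add the sample whose nearest selected neighbour is farthest away.
        nxt = max(remaining, key=lambda r: min(dist[r][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected

def stratified_split(labels, train_fraction, seed=0):
    """Indices of a training set drawn at random with the same fraction from every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train = []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        train.extend(idxs[: round(train_fraction * len(idxs))])
    return sorted(train)

# Toy data: class "A" is spectrally compact, class "B" is spread out.
X = [(0.0,), (0.1,), (0.2,), (0.3,), (5.0,), (7.0,), (9.5,), (11.0,)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]

ks_counts = Counter(labels[i] for i in kennard_stone(X, 4))
strat_counts = Counter(labels[i] for i in stratified_split(labels, 0.5))
```

In this toy case Kennard-Stone draws three of its four samples from the spread-out class, while the stratified split takes two from each class. SPXY extends the same distance rule with a term for the dependent variable, so it carries no per-class guarantee either.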
Regarding the feature extraction method, the performances of the models using PCA and SPA were better than those using VIs, which could be explained by the fact that VIs carry fewer spectral features, with only two or three wavebands. This inference is supported by the results of the models using TBVI: TBVI has only one more waveband than NDVI and RVI, but the OAs of the models using TBVI were higher than those using NDVI and RVI. Although the models using SPA had slightly lower mean classification accuracies than those using PCA (Figure 7), the SPA models offer better interpretability in terms of wavebands, and the model using SPA coupled with CWA of Der at the scale of 8 and STRAT had the best classification result among the 270 models. Guzmán Q et al. [39] recently applied PCA alone for data reduction and successfully discriminated liana and tree leaves from a neotropical dry forest. Hence, it is difficult to determine whether PCA or SPA is more suitable for mangrove species classification, and both methods still require investigation with other plant species or other forms of hyperspectral data (satellite-based or UAV-acquired).
This study has only focused on wavebands selected by SPA and VIs, and many of the selected wavebands lie in the red edge region (680-780 nm) (Table 4). Red edge wavebands are sensitive to vegetation chlorophyll, stress, dynamics, and nitrogen accumulation [84,85]. Moreover, the selected wavebands located around 1885 and 2245 nm are related to foliar starch [86]. The selected wavebands within 1600-1750 nm might be associated with complex biochemical properties, such as salt, sugar, water, protein, oil, lignin, starch, and cellulose, as well as leaf structure [10]; and the wavebands in the range of 1900-1950 nm and around 1400 nm are strongly affected by water absorption [86]. Hyperspectral data alone can reveal the sensitive spectral information related to mangrove species classification, and the integration of spectral information and supplementary data (e.g., biochemical components, soil property, and geomorphology) might help to further understand the mechanism of mangrove species composition and extend the classification model to other study sites.

Taxonomically Comparing the Accuracy of Mangrove Species Classification
Some studies mention that the spectral differences of plants within different taxonomic levels are significant [87][88][89]. Kiang et al. [88] found that the spectral differences within higher taxonomic levels were more pronounced than those within lower taxonomic levels. Hence, we compiled detailed taxonomical information (Figure S1-3 (Figure S1)) (source: Flora Reipublicae Popularis Sinicae, frps.iplant.cn) on the four mangrove species.
Bg and Ko belong to the same family, while Am and Ac are classified into different orders. In most cases, the PAs and UAs of Bg and Ko are lower than those of Am and Ac (Figure S1-5 (Figure S1)). Moreover, we found that the majority of the misclassified Bg and Ko samples were classified as Ko and Bg, respectively, which may be explained by the fact that the shape, size and thickness of their leaves are similar and their biochemical components are close (leaf water content (Ko: 70.93% (SD = 2.82%); Bg: 68.27% (SD = 2.67%)); chlorophyll (measured by SPAD-502, Ko: 68.6 (SD = 5.18); Bg: 70.1 (SD = 3.89))). In general, we can infer that it is still a challenge to use hyperspectral data alone to accurately discriminate between taxonomically similar species, and supplementary information such as canopy structure, leaf area index, and leaf biochemical components may be incorporated into the classification model to improve the accuracies of species discrimination.

Conclusions
With leaf hyperspectral data, we have explored the potential of CWA coupled with different sample subset partition and feature extraction methods in mangrove species classification. The following conclusions may be drawn:
1) Regardless of the effect of sample subset partition and feature extraction methods on the performance of mangrove species classification, CWA with suitable scales has great potential to improve the classification accuracy.
2) The STRAT method combined with PCA or SPA methods is recommended to improve classification performance.
3) Compared with the original reflectance spectra, the derivative spectra can significantly improve the classification accuracy.
The leaf-level results can lay the foundation for further in-depth study of mangrove species classification with UAV-acquired or satellite hyperspectral data, contributing to the understanding of large-scale species composition and to the effective protection and management of mangrove forests. Moreover, the encouraging performance of CWA can also be extended to other plant species, such as forest and crop species. Further, the effects of eco-environmental factors (e.g., elevation, soil property, leaf biochemical components, and canopy structure) on the performance of mangrove species classification need to be investigated in future studies.

Figure 1. The distribution and location of study areas.

Figure 2. The mean smooth (Smth) reflectance spectrum and photos of four mangrove species.

A total of 270 (18 × 3 × 5) models were constructed, considering 18 types of spectra (Smth, Der, Smth + CWA (eight scales), and Der + CWA (eight scales)) in conjunction with three sample subset partition methods (STRAT, KS, and SPXY) and five feature extraction methods (PCA, SPA, and three VIs). To present the 270 models simply and clearly, CWA of Smth and Der were expressed as Smth_scale and Der_scale (scale = 1, 2, 4, 8, 16, 32, 64, 128), and each combination of sample subset partition and feature extraction was written in the form STRAT_PCA, which indicates that the sample subset was partitioned by STRAT and the features were extracted by PCA.

Figure 6. The box plots of the OAs (a), ADs (b) and QDs (c) of the 270 models with the stratified random sampling (STRAT), Kennard-Stone sampling algorithm (KS), and subset partition based on joint X-Y distances (SPXY) methods. Each box has three whisker labels representing the maximum, mean, and minimum values for each dataset from top to bottom. The box width represents SD, and a comparatively wider box indicates a larger SD value.

Figure 7. The box plots of the OAs (a), ADs (b) and QDs (c) of the 270 models using PCA, successive

Table 1. Statistics of 301 mangrove leaf samples.

Table 2. Summary of percentages of total variance explained by the selected principal components.

Table 4. The wavebands selected by SPA and vegetation indices (VIs) with high frequency.