Classification of Mangrove Species Using Combined WorldView-3 and LiDAR Data in Mai Po Nature Reserve, Hong Kong

Mangroves have significant social, economic, environmental, and ecological value, but they are under threat from human activities. An accurate map of mangrove species distribution is required to conserve mangrove ecosystems effectively. This study evaluates the synergy of WorldView-3 (WV-3) spectral bands and elevation metrics derived from high return density airborne Light Detection and Ranging (LiDAR) data for classifying seven species in the mangrove habitat of Mai Po Nature Reserve in Hong Kong, China. A recursive feature elimination algorithm was used to identify important spectral bands and LiDAR metrics, and the appropriate spatial resolution for pixel-based classification of mangrove species was investigated. Two classifiers, support vector machine (SVM) and random forest (RF), were compared. The results indicated that the combination of 2 m resolution WV-3 and LiDAR data yielded the best overall accuracy of 0.88 with the SVM classifier, compared with WV-3 alone (0.72) and LiDAR alone (0.79). Important features were the green (510–581 nm), red edge (705–745 nm), red (630–690 nm), yellow (585–625 nm), and NIR (770–895 nm) bands of WV-3, and LiDAR metrics related to canopy height (e.g., canopy height model), canopy shape (e.g., canopy relief ratio), and the variation of height (e.g., variance and standard deviation of height). LiDAR features contributed more information than spectral features. The significance of this study is that a mangrove species distribution map with satisfactory accuracy can be acquired with the proposed classification scheme. In addition, the LiDAR data allowed the vertical stratification of the mangrove forests in Mai Po to be mapped for the first time, which is significant for bio-parameter estimation and ecosystem service evaluation in future studies.


Introduction
Mangrove forests are unique inter-tidal wetlands in tropical and subtropical coastal areas that have significant value to human society and the living environment. In terms of social and economic value, mangrove forests have high primary productivity, which on the one hand provides habitat for diverse aquatic plants and animals, attracts thousands of water birds, and hence maintains biodiversity, and on the other hand supplies food and timber for local people. With respect to environmental and ecological value, mangrove forests protect the shoreline from erosion by tides, wind, and storms, and act as the first line of defence against extreme weather for coastal areas. In addition, the mangrove ecosystem is a valuable resource for recreation, education, and scientific research [1]. However, mangrove habitat is threatened around the world by human activities. Therefore, understanding the mangrove habitat is crucial for environmental conservation, and an accurate map of species distribution is required to this end.
Situated in the inter-tidal zone, most mangrove habitats are swampy and inaccessible, which makes field survey difficult. Remote sensing technology therefore plays an important role in mangrove species mapping. High-resolution multispectral satellite images such as Quickbird, Ikonos, WorldView-2 (WV-2), and WorldView-3 (WV-3) have been used to classify mangrove species in various studies [2–6]. Specifically, WV-2 is the first commercial satellite offering sub-meter imagery with eight multispectral bands. Spectral bands such as the yellow, red edge, and NIR bands recorded by the WV-2 and WV-3 sensors showed tremendous potential for improving vegetation species classification and biophysical parameter estimation in previous studies [2,7–12]. However, in terms of species discrimination, misclassification still occurred because different species have similar spectral reflectance characteristics.
The Airborne Light Detection and Ranging (LiDAR) system is able to provide reliable estimates of forest structural attributes and vegetation biophysical parameters such as canopy height, canopy cover, canopy stratification, leaf area index, and biomass [13–15]. Elevation metrics derived from high return density LiDAR were found to be effective in tree species classification [16]. Shi et al. [17] indicated that no statistically significant difference was found between leaf-on and leaf-off LiDAR metrics, while LiDAR radiometric metrics were more important than elevation metrics in tree species classification. Therefore, a combination of spectral imagery and LiDAR data, corresponding to the vegetation spectral reflectance and the canopy vertical structure respectively, can be a promising approach for better tree species discrimination. The synergy of airborne hyperspectral images and LiDAR data for classifying tree species, in both pixel-based and object-based approaches, was shown to yield better accuracy than using either spectral or structural data alone [18,19]. Chadwick [20] combined a LiDAR-derived digital terrain model (DTM) and Ikonos multispectral imagery to improve mangrove identification, with a 7.1% increase in overall accuracy compared with using multispectral imagery alone. Zhang & Liu [11] integrated LiDAR elevation metrics and a WV-2 image to classify temperate forest species and obtained a 20% increase in overall accuracy.
Feature selection is commonly carried out on hyperspectral images to reduce inter-correlation and the Hughes effect. It can shorten the computation time, and the selected feature subset can yield even higher classification accuracy than the original data [21,22]. In addition, feature selection is robust against overfitting since bias is introduced [23,24]. As with hyperspectral data, some LiDAR metrics were found to be highly correlated. Several studies [19,25] selected important features either from LiDAR-derived metrics or from WV-2-derived features based on the ranking of predictors produced by the random forest algorithm.
There are, however, limited studies that combine high-resolution satellite images and high return density LiDAR to classify mangrove species. Factors such as the appropriate grid size or diameter of LiDAR metrics were explored in a biophysical parameter retrieval study [26], but very few studies have investigated the impact of these factors in a pixel-based classification. Furthermore, the importance of spectral bands and LiDAR elevation metrics has seldom been evaluated.
This study assesses the capacity of combining a WV-3 image and airborne high return density (20 points per m²) LiDAR data to classify six mangrove species and a Gramineae species in the Mai Po Nature Reserve in Hong Kong, China. A recursive feature elimination algorithm was applied to select relevant and crucial features. Two classifiers, support vector machine and random forest, were compared for their classification performance. The objectives of this study are (1) to accurately map the mangrove species distribution in the core zone of Mai Po Nature Reserve, (2) to identify the spectral bands and LiDAR elevation metrics that are important for distinguishing mangrove species, (3) to investigate the appropriate image resolution and LiDAR metric grid size for a pixel-based classification, and (4) to evaluate the performance of different classifiers in the proposed classification scheme.

Study Area
This study was conducted in Mai Po Nature Reserve (22°29′N–22°31′N, 113°59′E–114°03′E) in Hong Kong, China. Mai Po is located in the northwestern part of the New Territories of Hong Kong. It is close to the border between Hong Kong and the Shenzhen Special Economic Zone of Guangdong Province, China, where the Shenzhen River and Deep Bay separate the two cities. Located in the subtropical region, Hong Kong receives abundant solar heat and is strongly affected by monsoons that bring heavy rainfall in the wet season, with annual precipitation exceeding 1600 mm. Drained by the Pearl River system as well as the Shenzhen River, the intertidal mudflat receives sediments and gradually develops mangrove habitat. With an area of 380 hectares, Mai Po Nature Reserve is the largest mangrove ecosystem in Hong Kong [27]. It helps alleviate flooding and stabilize the shore along inner Deep Bay [27,28]. The Mai Po mangrove ecosystem and its surrounding wetlands were designated as a Wetland of International Importance under the Ramsar Convention in 1995 [27].
The major mangrove species found in Mai Po include seven native species: Kandelia obovata, Avicennia marina, Aegiceras corniculatum, Bruguiera gymnorhiza, Excoecaria agallocha, Acrostichum aureum, and Acanthus ilicifolius [28]. An exotic tropical species, Sonneratia apetala, has spread from the mangrove plantation in Futian National Nature Reserve, Shenzhen. It grows fast and tall and blocks sunlight, which affects the growth of native mangrove species and the foraging of wetland animals. The Agriculture, Fisheries, and Conservation Department (AFCD) of the Hong Kong government monitors the distribution of Sonneratia closely and removes the trees regularly. In this study, the core area of Mai Po Nature Reserve, approximately 2 km² in area, was selected as the study area, as shown in Figure 1.
One of the objectives is to select an appropriate spatial resolution for classification. The multispectral and panchromatic images were fused using Gram-Schmidt pan-sharpening in the ENVI 5.5 software for further analysis. Map reprojection and registration of the 2 m MS image and the pan-sharpened image were carried out in ArcMap 10.6 with reference to the official digital aerial orthophotos acquired from the Lands Department of the Hong Kong Government, which were registered in the Hong Kong 1980 Grid projection. To facilitate the analysis, non-vegetated areas were identified using the normalized difference vegetation index (NDVI). Through trial and error and validation by visual interpretation, areas with NDVI less than 0.39 were masked out.
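The masking step can be illustrated with a minimal sketch. It assumes that NDVI is computed as (NIR − red)/(NIR + red) from the red and NIR bands; the text does not state which WV-3 bands were used, and the function names and reflectance values below are hypothetical.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # avoid division by zero on empty pixels
    return (nir - red) / total


def vegetation_mask(nir_band, red_band, threshold=0.39):
    """Return a 2-D list of booleans: True where NDVI >= threshold (pixel kept)."""
    return [
        [ndvi(n, r) >= threshold for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Toy 2 x 2 reflectance grids (made-up values, not from the study).
nir = [[0.50, 0.20], [0.45, 0.10]]
red = [[0.10, 0.15], [0.30, 0.09]]
mask = vegetation_mask(nir, red)
```

Pixels flagged False would be excluded from the later feature extraction and classification.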

Airborne LiDAR Data
LiDAR data were acquired using the GL-70A survey-grade LiDAR system mounted on a helicopter on March 21, 2018, during a period of low tide (1.5-1.8 m). The study area was scanned by a Riegl VUX-1LR laser scanner (RIEGL Inc., Horn, Austria) operating in the near-infrared. Flying at a low altitude (approximately 155-270 m above ground level) produced a small-footprint, high return density dataset. The LiDAR data were projected in the Hong Kong 1980 Grid and contain attributes including XYZ coordinates, 16-bit intensity, and number of returns. The specifications of the LiDAR system and LiDAR data are described in Table 1. The LiDAR data were also registered to align with the digital aerial orthophotos. A filtering method, Progressive Triangular Irregular Network (TIN) Densification, devised by Axelsson [30], was applied to the LiDAR point cloud to classify ground and non-ground points. The LiDAR data were preprocessed using LAStools (http://cs.unc.edu/~isenburg/lastools/). Figure 2 shows an example of the vertical profile of the mangroves in the LiDAR point cloud. LiDAR metrics were generated at 2 m and 5 m resolution for comparison.

Field Survey and Reference Data Collection
Species distribution surveys were conducted during March and May in 2018 when Aegiceras corniculatum (AC) blossomed. The presence of flowers improved the certainty and efficiency of distinguishing different species. The mangroves were accessed by a floating boardwalk and by boat from south to north along the mangrove edge in Deep Bay. A GPS receiver with 5 m accuracy was used to measure the location of in situ samples. Site photos and tree heights were also recorded. According to the site survey, Kandelia obovata (KO) and Avicennia marina (AM) were the two dominant overstory species. Acanthus ilicifolius (AI) was mainly distributed as a pioneer species along coastal edges or as understory in the middle part of the study area. AC was frequently found at the seaward edges and along waterways. The exotic species Sonneratia apetala (SA) was found sporadically at coastal edges.
As most of the study area is swampy and inaccessible, extra sample data were collected by visual interpretation of the WV-3 pan-sharpened image, aided by experience from the field survey. The mangrove species were classified into seven classes. Specifically, KO was sub-classified into two types based on their significant difference in crown shape and spectral information. One group was identified as pure KO with a dense canopy crown, and the other group was noted as a mixture of KO and AI (KOAI) in which the canopy crown of KO is relatively small and sparse while the understory AI is dense. A total of 245 samples were selected from the study area, as shown in Figure 1, and these samples were randomly partitioned into training and testing samples in a ratio of 7:3. Species classes and sample sizes are described in Table 2.
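A random 7:3 partition of this kind can be sketched as follows. This is only an illustration: the seed, the function name, and the sample identifiers are hypothetical, and the text does not say whether the split was stratified by class.

```python
import random


def split_samples(samples, seed=0):
    """Randomly partition samples into training and testing sets in a 7:3 ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    n_train = len(shuffled) * 7 // 10       # integer arithmetic avoids float rounding
    return shuffled[:n_train], shuffled[n_train:]


sample_ids = list(range(10))                # hypothetical sample identifiers
train, test = split_samples(sample_ids)
```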

Feature Generation and Selection
Feature selection algorithms are able to identify important features, reduce data redundancy, and increase model interpretability, while improving the speed and accuracy of training and classification [7,21]. To this end, feature selection was performed with recursive feature elimination (RFE) based on the Random Forest (RF) model in this study. Recursive feature elimination is a backward selection algorithm that first fits a random forest model with all features and returns a ranking of feature importance. At each iteration, only the top-ranked features are retained to train the model. A performance profile is created to summarize all the iterations so that an optimal subset can be determined, and the ranking of features is returned.
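The backward selection loop described above can be sketched generically. The sketch below is not the caret implementation used in the study: the ranking and scoring callables stand in for a random forest fit and its resampled accuracy, re-ranking at every step is an assumption, and the toy importance values are made up.

```python
def recursive_feature_elimination(features, rank_features, score_subset, sizes):
    """Backward selection: rank, keep the top-k features, re-fit, and profile.

    features      -- list of feature names
    rank_features -- callable(subset) -> subset ordered from most to least important
    score_subset  -- callable(subset) -> performance of a model trained on subset
    sizes         -- candidate subset sizes to evaluate
    """
    current = list(features)
    profile = {}
    for size in sorted(set(sizes) | {len(features)}, reverse=True):
        ranked = rank_features(current)   # importance ranking of the current subset
        current = ranked[:size]           # retain only the top-ranked features
        profile[size] = (score_subset(current), list(current))
    # Best score wins; ties are broken in favour of the smaller subset.
    best_size = max(profile, key=lambda s: (profile[s][0], -s))
    return best_size, profile[best_size][1], profile


# Toy stand-ins for a fitted model (hypothetical importance values).
toy_importance = {"CHM": 0.9, "Elev stddev": 0.7, "B6": 0.6, "B3": 0.5, "B1": 0.05}


def toy_rank(subset):
    return sorted(subset, key=lambda f: toy_importance[f], reverse=True)


def toy_score(subset):
    # Made-up score: informative features help, each extra feature costs a little.
    return sum(toy_importance[f] for f in subset) - 0.1 * len(subset)


best_size, best_subset, profile = recursive_feature_elimination(
    list(toy_importance), toy_rank, toy_score, sizes=[1, 2, 3, 4]
)
```

The `profile` dictionary plays the role of the performance profile: one score per subset size, from which the optimal size is read off.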
To explore the spectral regions and the optimal spatial resolution for distinguishing mangrove species, both the WV-3 MS image and the pan-sharpened image were used, and the eight bands of each image were input as spectral data for feature selection.
Various LiDAR metrics derived from point cloud elevation describe tree structure characteristics and have been widely applied in tree classification, crown segmentation, and biophysical parameter retrieval [19,31]. To obtain meaningful metrics, a grid size larger than individual tree crowns has been suggested. However, the diameter of most canopy crowns is less than 5 m, and some are even less than 2 m, and mixed species are present in some areas, so a fine-scale grid is more likely to contain a single species in each cell. Therefore, a 2 m grid size, which is compatible with the WV-3 MS image resolution, and a 5 m grid size were used to explore the optimal LiDAR metric resolution for mangrove species classification. AI was observed as the dominant understory species, and its canopy height ranged from 0.48 m to 1.62 m as measured in the field survey. Therefore, the height threshold separating overstory and understory was set to 1.8 m in the LiDAR metric computation. Fifty-seven elevation metrics were generated from all returns of the LiDAR point cloud at 2 m and 5 m cell sizes respectively in the Fusion software (http://forsys.cfr.washington.edu/fusion/fusion_overview.html). The description of the 57 elevation metrics is provided in Appendix A Table A1.
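To make the grid-based metrics concrete, the sketch below bins height-normalized returns into square cells and computes a handful of the elevation metrics named in this paper. It is not the Fusion implementation: the maximum height is used only as a stand-in for a canopy height value, the canopy relief ratio is taken as (mean − min)/(max − min), the sample (n − 1) form of variance is an assumption, and the point coordinates are made up.

```python
from collections import defaultdict
import math
import statistics

HEIGHT_BREAK = 1.8  # m, overstory/understory threshold reported in the text


def grid_points(points, cell_size):
    """Group (x, y, height) returns by the square grid cell they fall in."""
    cells = defaultdict(list)
    for x, y, height in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append(height)
    return cells


def elevation_metrics(heights):
    """Summarise the return heights of one cell (needs at least two returns)."""
    h_min, h_max = min(heights), max(heights)
    mean = statistics.fmean(heights)
    spread = h_max - h_min
    return {
        "elev_max": h_max,
        "elev_mean": mean,
        "elev_median": statistics.median(heights),
        "elev_stddev": statistics.stdev(heights),   # sample standard deviation
        "elev_var": statistics.variance(heights),   # sample variance
        "elev_aad": sum(abs(h - mean) for h in heights) / len(heights),
        "canopy_relief_ratio": (mean - h_min) / spread if spread > 0 else 0.0,
        "share_above_break": sum(h > HEIGHT_BREAK for h in heights) / len(heights),
    }


# Made-up returns: (x, y, height above ground), all in metres.
points = [
    (0.5, 0.5, 0.2), (1.0, 1.5, 3.0), (1.9, 0.1, 4.0), (1.2, 1.2, 5.0),
    (2.5, 0.5, 1.0), (3.5, 1.0, 1.4),
]
metrics = {cell: elevation_metrics(h) for cell, h in grid_points(points, 2.0).items()}
```

Changing the `cell_size` argument from 2.0 to 5.0 reproduces the kind of resolution comparison described above.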
Feature selection was carried out in R (https://www.r-project.org/) with the "caret" package. Ten-fold cross validation was applied to evaluate model performance by overall accuracy and Kappa. Feature selection was performed on the WV-3 data and the LiDAR metrics separately. The selected features were then combined to ascertain the effectiveness of spectral and structural features in mangrove species discrimination and identification.

Classification and Validation
Machine learning classifiers such as Support Vector Machine (SVM) and Random Forest (RF) have been successfully applied in vegetation classification studies using remote sensing data [32,33]. In the present study, these two supervised classification algorithms were compared for their performance in classifying mangrove species.

Support Vector Machine Classifier
The SVM classifier is based on the statistical learning theory of structural risk minimization (SRM) to improve classification performance. A kernel-based SVM essentially performs the classification by using a kernel function to map the data from the original feature space, in which the classes are not linearly separable, to a higher-dimensional space in which they are. Support vectors are searched to construct a hyperplane with the maximum margin to separate different classes [34]. The radial basis function (RBF) kernel is used to map samples nonlinearly to a higher-dimensional space, which handles real-world conditions better [35]. Two parameters, the cost of constraint violation (C) and sigma (σ), control overfitting and the shape of the hyperplane respectively during classifier training. In this study, pairs of C and σ within a range of 10⁻³ to 10³ were searched with 10-fold cross validation to determine the best tuning parameters in R with the "e1071" package.
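The parameter search can be pictured with a short sketch of the candidate grid and the fold assignment. Several points are assumptions rather than details from the study: the kernel is written as exp(−σ‖x − y‖²), the grid uses decade steps, and the fold construction is a simple shuffled split; training an SVM on each fold is left out.

```python
import itertools
import math
import random


def rbf_kernel(x, y, sigma):
    """RBF kernel under the assumed form exp(-sigma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sigma * squared_distance)


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


# Candidate values 10^-3 ... 10^3 in decade steps (the step size is an assumption).
exponents = range(-3, 4)
cost_values = [10.0 ** e for e in exponents]
sigma_values = [10.0 ** e for e in exponents]
parameter_grid = list(itertools.product(cost_values, sigma_values))

folds = k_fold_indices(25)  # 25 hypothetical training samples
```

Each (C, σ) pair in `parameter_grid` would be scored by training on nine folds and testing on the tenth, and the pair with the best mean accuracy would be kept.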

Random Forest Classifier
RF is an ensemble learning method for classification. It grows each decision tree from a different bootstrap sample drawn from the training data. At each node, a subset of input features (Mtry) is randomly selected to search for the best split. When a test sample is input, each tree gives a classification result and the forest takes the majority vote as the final classification result [36]. In this study, the parameter Mtry was tuned over a range from 2 to the number of input features by 10-fold cross validation carried out in R with the "randomForest" package.
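The three ingredients named above (bootstrap sampling, random feature subsets of size Mtry, and majority voting) can be sketched without a full tree learner. The function names and the example votes are hypothetical, and the tie-breaking rule is an artefact of this sketch rather than a property of the "randomForest" package.

```python
from collections import Counter
import random


def bootstrap_sample(n_samples, rng):
    """Draw n_samples indices with replacement, as used to grow one tree."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def random_feature_subset(features, mtry, rng):
    """Pick mtry distinct candidate features for one node split."""
    return rng.sample(features, mtry)


def majority_vote(tree_predictions):
    """Return the class predicted by most trees (first seen wins a tie)."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(42)
sample_indices = bootstrap_sample(10, rng)
split_candidates = random_feature_subset(["CHM", "B3", "B6", "Elev var"], 2, rng)
winner = majority_vote(["KO", "AM", "KO", "AI", "KO"])
```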

Validation
After the models were trained with the training samples, the testing samples were input to the models for species identification. Classification accuracy was assessed by overall accuracy (OA) and the Kappa coefficient for the entire dataset, whilst user's accuracy (UA) and producer's accuracy (PA) were computed for individual classes.
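These four measures all follow from the confusion matrix. The sketch below uses the standard definitions, with rows as reference classes and columns as predicted classes; the three-class matrix is made up for illustration and is not taken from Tables 4 to 6.

```python
def accuracy_metrics(matrix, labels):
    """matrix[i][j] = number of samples of reference class i predicted as class j."""
    n = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(len(labels))]
    row_totals = [sum(row) for row in matrix]        # reference totals per class
    col_totals = [sum(col) for col in zip(*matrix)]  # predicted totals per class
    overall = sum(diagonal) / n
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (overall - expected) / (1 - expected)
    producers = {l: d / r if r else 0.0 for l, d, r in zip(labels, diagonal, row_totals)}
    users = {l: d / c if c else 0.0 for l, d, c in zip(labels, diagonal, col_totals)}
    return {"OA": overall, "Kappa": kappa, "PA": producers, "UA": users}


# Made-up confusion matrix for three classes.
labels = ["KO", "AM", "AI"]
confusion = [
    [18, 1, 1],
    [2, 15, 3],
    [0, 2, 8],
]
result = accuracy_metrics(confusion, labels)
```

Producer's accuracy divides the diagonal by the reference (row) total, so it reflects omission errors, while user's accuracy divides by the predicted (column) total and reflects commission errors.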

Feature Selection
Table 3 summarizes the feature selection results for each dataset. When the WV-3 spectral bands were the only inputs for feature selection, the features selected from the original 2 m MS image and the 0.5 m pan-sharpened image were similar. For the original 2 m MS image, seven bands were selected, all except the coastal blue band (B1), while for the 0.5 m pan-sharpened image all eight bands were selected. Both results showed similar feature importance. The green band (B3) and red edge band (B6) were identified as the most important features, followed by the yellow band (B4), red band (B5), and NIR band (B7). The blue band (B2), the additional NIR2 band (B8), and the coastal blue band (B1) were, however, considered less important in distinguishing mangrove species. Differences in WV-3 band values among species are shown in Figure 3. KO and KOAI were very similar in the visible bands, but KO had a much higher spectral reflectance than KOAI in the red edge band (B6) and NIR bands (B7 and B8). A few pairs of classes were hard to differentiate. For instance, AC and AI tended to have very similar spectral reflectance in most bands, although the green band (B3) might contribute some information. Grass and AM differed greatly from the other species in the red band (B5).
When the LiDAR elevation metrics were input for feature selection, the results showed a slight difference between the 2 m-grid metrics and the 5 m-grid metrics. Ten metrics were retained from the 2 m-grid metrics while eight metrics were selected from the 5 m-grid metrics. Six metrics were selected in both cases: canopy height model (CHM), standard deviation of elevation (Elev stddev), variance of elevation (Elev var), canopy relief ratio, absolute average deviation of elevation (Elev AAD), and Elevation L2 (Elev L2). In the 2 m-grid metrics, the 99th and 95th percentile heights (Elev P99 and Elev P95) and the mean and median of elevation were also considered important. Additionally, skewness of elevation (Elev skewness) and the number of returns above the mean height divided by the number of total first returns, multiplied by 100 ((All returns above mean)/(Total first returns) × 100), were chosen from the 5 m-grid metrics. Figure 4 describes the differences among species in the 10 important LiDAR metrics.
Remote Sens. 2019, 11, x FOR PEER REVIEW 8 of 17
To determine the best resolution, the optimal feature subsets were input to train the classifiers and assess the classification accuracy. As shown in Table 3, the 2 m resolution WV-3 image produced higher OA (0.70 for RF, 0.72 for SVM) than the 0.5 m fused image (0.68 for both RF and SVM). Similarly, the 2 m LiDAR metrics produced slightly higher OA (0.79) than the 5 m LiDAR metrics (0.78) using both SVM and RF classifiers. Therefore, the 2 m resolution features were subsequently combined for feature selection. Fourteen features were selected from the combined dataset, comprising eight LiDAR elevation metrics and six spectral bands. The feature selection result showed that LiDAR features were more important than spectral features, as the top five selected features were all LiDAR metrics, followed by the red edge band (B6). The other five bands were regarded as less important features.
The optimal subset sizes ranged from 7 to 14 across all feature selection exercises, which was consistent with previous studies [7,21]. Figure 5 shows the variation of overall accuracy and Kappa generated from the RFE RF model with changes in the number of features used in the combined data. Overall accuracy and the Kappa coefficient kept increasing as the number of features increased. The improvement was more pronounced when the subset size was less than 10. Beyond 10 features, the improvement was dampened and showed a trend of gradual saturation between 11 and 15 features. In this example, the highest overall accuracy (0.91) and Kappa (0.89) were achieved with 14 features. When the subset size was larger than 15, the Kappa coefficient saturated and declined as more features were added.

Classification and Validation
Table 4 shows the confusion matrix together with the overall accuracy and Kappa yielded by the different classifiers when the classification was performed with WV-3 data only. SVM (OA = 0.72 with Kappa = 0.66) performed slightly better than RF (OA = 0.70 with Kappa = 0.63).
Classification results from using only the LiDAR data showed that SVM and RF produced nearly identical overall accuracy at 0.79, with Kappa coefficients of 0.74 and 0.75 respectively (Table 5). Specifically, RF yielded higher PA for AC, grass, KO, and SA, while SVM yielded higher PA for AI and AM.
With the combined data, SVM and RF obtained similar results. OA and Kappa were 0.88 and 0.85 for the SVM classifier and 0.87 and 0.85 for the RF classifier. As shown in Table 6, SVM identified AI and SA better than RF, while RF outperformed SVM in identifying AC.
Comparing the data sources, the LiDAR data outperformed the WV-3 image. AC, AI, and AM were not accurately mapped when using the WV-3 data only. AC obtained a PA of only around 0.2 and was largely misclassified as AI, AM, and KO. AI was misclassified as AC. AM obtained a PA of only about 0.65, and approximately 40% of AM was misclassified as KOAI. The presence of gaps and shadows might have affected the spectral reflectance of AM. Meanwhile, the training samples of KOAI also included the gaps between KO and AI. Therefore, misclassification is more likely if spectral reflectance is considered alone.
The LiDAR data were able to differentiate the three mixed species AC, AI, and AM, as their PA improved to around 0.5, 0.8, and 0.9 respectively. However, the PA of grass and SA declined to 0.5. Low shrub vegetation such as grass and AI was confused due to similar heights. The same problem was observed for the exotic SA, KO, and AM, which have similar heights.
Compared to the two independent datasets, the combined feature dataset significantly enhanced the PA and UA of most of the species. KO and KOAI yielded over 90% PA and UA using both the SVM and RF algorithms. Among the species, AC yielded the lowest PA and UA. Figure 6 shows the RF classified maps using WV-3 data, LiDAR data, and combined data, with the false color WV-3 image for reference. As shown in Figure 6b, using spectral data alone tends to produce more scattered species pixels than using LiDAR data alone (Figure 6c). Based on visual analysis, areas of AC and AI were underestimated in Figure 6b but overestimated in Figure 6c. Combining both datasets improved the identification of AC and AI. Additionally, the shadow influence along the coast and the salt-and-pepper effect were significantly reduced with the combined data.
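For readers less familiar with the accuracy measures reported above, the following sketch shows how overall accuracy (OA), Kappa, producer's accuracy (PA), and user's accuracy (UA) are conventionally derived from a confusion matrix. It assumes rows hold reference classes and columns hold predicted classes, which may differ from the layout of Tables 4 to 6; the function name and the two-class matrix are hypothetical and do not reproduce any values from this study.

```python
def accuracy_metrics(matrix, labels):
    """Compute OA, Kappa, PA, and UA.

    Assumes matrix[i][j] counts reference samples of class i predicted as class j,
    and that every class has at least one reference and one predicted sample.
    """
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    row_totals = [sum(matrix[i]) for i in range(n)]  # reference totals
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # predicted totals

    oa = correct / total
    # Chance agreement expected from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (oa - expected) / (1 - expected)
    pa = {labels[i]: matrix[i][i] / row_totals[i] for i in range(n)}  # 1 - omission error
    ua = {labels[j]: matrix[j][j] / col_totals[j] for j in range(n)}  # 1 - commission error
    return oa, kappa, pa, ua


# Hypothetical two-class example (not data from this study).
toy_matrix = [[8, 2],
              [1, 9]]
oa, kappa, pa, ua = accuracy_metrics(toy_matrix, ["class_1", "class_2"])
print(round(oa, 2), round(kappa, 2), pa, ua)
```

A low PA for a class, as observed for AC with WV-3 data alone, means that many reference samples of that class were assigned to other classes, whereas a low UA means that many pixels labelled as that class actually belong elsewhere.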

Discussion
In this study, important spectral bands of WV-3 and crown structure features derived from airborne LiDAR were used for mangrove species classification in the Mai Po Ramsar site in Hong Kong. Two machine learning algorithms, SVM and RF, were applied to examine the capacity of the selected spectral features, structure features, and their combination in mangrove species classification. The results revealed that both classifiers performed best with the combined features, yielding the highest OA of 0.88 compared with using WV-3 (OA at around 0.70) or LiDAR (OA at 0.79) individually.
The recursive feature elimination based on RF selected a series of important features and determined the optimal feature subset size for each dataset. Feature selection results suggested that the useful WV-3 bands for differentiating the species included the green, red edge, red, yellow, NIR1, blue, and NIR2 bands. The two most important bands, green and red edge, were sensitive to the concentration of chlorophyll [37] and controlled by leaf internal scattering [38], respectively. The red and blue bands correspond to the absorption peaks of chlorophyll for photosynthesis, which clearly separated grass from the mangrove species. The yellow band reflected non-green pigments in the leaf, such as carotenoids and xanthophyll. Finally, although both NIR bands were dominantly controlled by the cellular structure within the leaf and sensitive to moisture content [39,40], NIR1 was considered more useful than NIR2 because NIR1 was able to tell apart AM, KO, and grass.
Therefore, spectral bands that are able to reveal differences in pigment, chemical compounds, and leaf internal structure among species are useful. This echoes the results found in previous studies. However, some species have similar pigment contents, and leaf internal structure is hard to differentiate using spectral data alone. As demonstrated by this study, species such as AC, AI, and AM share similar spectral reflectance.
Canopy height and crown shape are good indicators for vegetation species classification [18,19], and this study shows similar results. As shown in Figure 4, LiDAR metrics describing canopy height, such as CHM, cubic mean of elevation, 95th percentile, and 95th percentile of the elevation, were selected as useful metrics because AI and grass can easily be separated from arbor species. Meanwhile, KO was generally taller than AM and SA. In addition, important LiDAR metrics describing the variability of LiDAR point elevation were identified, including standard deviation of elevation, variance of elevation, and median of elevation, which reflected canopy shape and vertical stratification. For example, canopies of KO and AM were sparse and more open, so LiDAR pulses could reach the AI understory directly through gaps and generate returns from lower elevations. Therefore, the variability of canopy height in KOAI and AM samples was larger than in other species with dense canopies. As another example, the exotic SA is usually clustered in groups or occurs as an isolated tree among a patch of pure species characterized by a flat canopy, such as pure KO and AC. Canopy relief ratio, a descriptor of crown shape from altimetry observation, reflected whether the canopy surfaces were in the upper or lower portion of the height range. Both [38,41] indicated the importance of vertical stratification of samples. Thus, the single canopy layers of AI, grass, and KO obtained extreme values in canopy relief ratio. Shi et al. and Liu et al. found that LiDAR data acquired in the leaf-off or semi-leaf-on period interacted more with the upper canopy and with the spatial characteristics of trunks and branches, which could thus be better described [17,19]. Likewise, although mangroves are evergreen vegetation in our study site, the majority of the upper mangrove canopies in Mai Po were sparse. Therefore, LiDAR structure metrics carried important information and could classify species with similar spectral characteristics. From the above discussion, the combined data made complementary use of spectral and spatial structure features and produced the best accuracy.
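To make the elevation metrics discussed above concrete, the sketch below computes several of them for the LiDAR return heights falling in one grid cell. It is an illustration under stated assumptions rather than the processing chain used in this study: heights are assumed to be normalized above ground (non-negative), the canopy relief ratio is taken as (mean - min) / (max - min), the cubic mean as the cube root of the mean of cubed heights, and the percentile uses linear interpolation. The function names and sample heights are hypothetical.

```python
from statistics import mean, median, pstdev, pvariance


def percentile(sorted_values, p):
    """Linear-interpolation percentile of an ascending list, with p in [0, 100]."""
    k = (len(sorted_values) - 1) * p / 100
    lower = int(k)
    upper = min(lower + 1, len(sorted_values) - 1)
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * (k - lower)


def height_metrics(heights):
    """Summarize above-ground return heights (metres, assumed >= 0) for one grid cell."""
    h = sorted(heights)
    h_min, h_max, h_mean = h[0], h[-1], mean(h)
    return {
        "max": h_max,
        "mean": h_mean,
        "median": median(h),
        "std": pstdev(h),          # spread of returns through the canopy
        "variance": pvariance(h),
        "p95": percentile(h, 95),
        "cubic_mean": (sum(v ** 3 for v in h) / len(h)) ** (1 / 3),
        # Near 1: most returns sit in the upper part of the height range;
        # near 0: most returns sit in the lower part.
        "canopy_relief_ratio": (h_mean - h_min) / (h_max - h_min) if h_max > h_min else 0.0,
    }


# Hypothetical return heights for a single cell (not data from this study).
metrics = height_metrics([0, 2, 4, 6, 8])
print(metrics["canopy_relief_ratio"], metrics["variance"], round(metrics["p95"], 2))
```

Under these definitions, an open canopy with many understory returns tends to give a larger standard deviation and variance than a dense, flat canopy, which is the contrast described above for KOAI and AM.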
The classified map of WV-3 data (Figure 6b) is more scattered than the classified maps of LiDAR (Figure 6c) and combined data (Figure 6d). In the study area, the upper mangrove canopies are relatively sparse, resulting in many small gaps and shadows in the WV-3 image, which can affect the mangrove canopy reflectance and lead to a mix of canopy and gap/shadow pixels in the classified maps. The canopy heights derived from the first LiDAR return produce a much smoother canopy surface, and areas of the same species tend to produce a more aggregated result. Where the canopy heights of two species differ, the boundary is very distinct. The classified map of WV-3 data (Figure 6b) showed an underestimation of AC. Among the species, AC and KO were hard to discriminate either in fieldwork or through visual interpretation, as they have very similar spectral properties and physical appearance. Besides, the areal coverage of AC is relatively small, while KO is the dominant species. Hence, there were far fewer reliable training samples of AC than of KO. The WV-3 image alone is not able to separate AC from KO. From fieldwork experience, the canopy of mature KO is taller than that of AM and AC. This is also revealed by the CHM metric in Figure 4. The average canopy heights of KO, AM, and AC were 6.57, 5.18, and 3.36 m respectively. Hence, species with low canopy height were classified as AC when using LiDAR data.
The comparative analysis of spatial resolution demonstrated that, at 2 m resolution, both WV-3 and LiDAR data were suitable for pixel-based classification. For the WV-3 data, the pan-sharpening process changed the original spectral information to some degree, which might affect the true spectral characteristics of the species. Meanwhile, many studies have indicated that object-based classification outperforms pixel-based classification for very high-resolution images [2,7]. However, it was challenging to apply object segmentation in this study, as different mangrove species were mixed together and grew into patches with a flat canopy surface. It was also found that high return density LiDAR enabled meaningful metrics to be derived at a finer grid size. The 5 m grid metrics attained marginally lower accuracy than the 2 m grid metrics. The integration of WV-3 and LiDAR data at the same 2 m spatial resolution also provided the most accurate classification result.
Previous studies found that SVM outperformed RF [7,32,33]. In this study, both classifiers obtained nearly identical overall accuracy and Kappa when the combined dataset was used. However, the two classifiers showed their respective advantages in identifying various species. SVM obtained a more obvious improvement in classifying AI and SA using the combined data. RF used bootstrap samples to grow each decision tree, which made it less sensitive to outliers. It should be noted that there were limited samples for some species (e.g., AC, SA, grass) in the study site. RF randomly selected a subset of features to search for the best split at each node, which enabled RF to handle collinear features better. In terms of computational efficiency, RF was more efficient than SVM because SVM needed more computation time for the grid search to find the best parameter pairs.

Conclusions
This study compared a single WV-3 image, LiDAR data, and the combined dataset in classifying six mangrove species in the Mai Po Nature Reserve. The results demonstrated that the combined data can effectively identify and map the species and obtained better accuracy than previous studies in the same area [6,42,43]. The identification of Aegiceras corniculatum (AC) and Avicennia marina (AM) was greatly improved compared with using WV-3 or LiDAR data alone. Important features were selected from the spectral and LiDAR data using recursive feature elimination (RFE) based on the Random Forest (RF) model. The red edge and green bands, followed by the red, yellow, and NIR bands, were considered useful spectral bands. LiDAR elevation metrics describing crown characteristics from three aspects (canopy height, variation of canopy surface, and canopy stratification) were selected as important canopy structure features. The results also revealed that the LiDAR structural features were more important than spectral data for discriminating between the mangrove species. In terms of image classification, this study found that the RF algorithm was more effective in handling the combined dataset than SVM.
The significance of this study lies in two aspects. First, the distribution of mangrove species was mapped with high accuracy, which may assist government officials in monitoring and conserving the mangrove habitats and in limiting the invasion of exotic species such as SA. Second, the combined data showed great potential not only in identifying mangrove species, but also in describing the vertical stratification, which was not available before. However, some challenges should be addressed in future studies. In this study, LiDAR crown shape and structure descriptors, as well as LiDAR intensity metrics, were not fully explored for classification. Besides, the data can contribute to the estimation of other biophysical parameters and the evaluation of ecosystem services.

Figure 1. Map of the study area and the distribution of ground truth samples.

Figure 2. Illustration of mangrove vertical profile produced from LiDAR data in the study area. The LiDAR point cloud is displayed by height and the location of the sample area is shown in Figure 1.

Figure 5. Overall accuracy and Kappa changing with the number of features by recursive feature elimination (RFE) based on the Random Forest (RF).

Figure 6. (a) WV-3 multispectral image and classification image, (b) RF classification based on WV-3 images, (c) RF classification based on LiDAR metrics, (d) RF classification based on combined WV-3 and LiDAR.

Table 1. Specification of LiDAR (Airborne Light Detection and Ranging) system and LiDAR data.

Table 2. Vegetation types and their training and testing samples for image classification.

Table 3. Selected features and corresponding classification accuracy.

Table 4. Confusion matrix for classification with WV-3 data.

Table 5. Confusion matrix for classification with LiDAR data.

Table 6. Confusion matrix for classification with combined WV-3 and LiDAR data.