Discrimination of Tropical Mangroves at the Species Level with EO-1 Hyperion Data

Understanding the dynamics of mangroves at the species level is key to securing sustainable conservation of mangrove forests around the globe. This study demonstrates the capability of hyper-dimensional remote sensing data for discriminating diversely populated tropical mangrove species. It was found that five different tropical mangrove species of Southern Thailand, namely Avicennia alba, Avicennia marina, Bruguiera parviflora, Rhizophora apiculata, and Rhizophora mucronata, were correctly classified. The selected data treatment (a well-established spectral band selector) helped improve the overall accuracy from 86% to 92%, despite the remaining confusion between the two members of the Rhizophoraceae family and the mix-up between the pioneer species and the other mangroves. It is therefore anticipated that the methodology presented in this study can be used as a practical guideline for detailed mangrove species mapping in other study areas. The next stage of this work will be to exploit the differences between the leaf textures of the two Rhizophoraceae mangroves in order to refine the classification outcome.

The hyperspectral sensor is a new-generation sensor capable of collecting images in hundreds or more contiguous spectral bands [48-51]. A number of related studies claim an advantage from exploiting such hyper-dimensional data [47,51-61]. Some of these reports use hyperspectral data to discriminate mangroves at the species level [25,26,44,51,62]. Unfortunately, the outcomes of these studies remain inconclusive because their study sites are covered by only a few mangrove species, and a recent conclusion [46] is in doubt because of a poor choice of GPS measurement.
Owing to the high dimensionality of hyperspectral data, the practitioner faces difficulties with covariance matrix inversion [60,63-66]. This is called the Hughes phenomenon or the curse of dimensionality [67,68]. Furthermore, collinearity (i.e., redundant spectral information) imposes a risk of overfitting when the classification is performed [67,69]. To alleviate this problem, the dimensionality of hyperspectral data needs to be reduced while preserving the key spectral information [63]. In the remote sensing literature, a popular approach to reducing the spectral dimension is to use feature selection algorithms [60,65,67,68,70-73]. The genetic search algorithm (GA) is one of the most frequently used band selection methods in the recent literature and has also proved effective for selecting spectral subsets for vegetation classification [60,72,74].
Consequently, this study further investigates the potential of remote sensing for mangrove mapping. The aim of this work is to establish for the first time whether space-borne hyperspectral data can be used for discriminating and mapping diversely populated tropical mangrove species. Thus, the objective of this study was to test the capability of hyperspectral data and feature selection algorithms for classifying mangroves at the species level. The study area was the Pak Phanang mangrove forest of Southern Thailand, which is densely covered by five different tropical mangrove species. The satellite-borne hyperspectral data used was the EO-1 Hyperion hyperspectral image. The dimension of the hyperspectral data was reduced using a well-established genetic search algorithm [60,72]. The final classification results were statistically tested against the independent testing data set under a data rotation scheme.

Study Site
The study site (Figure 1) is at the Talumpuk cape, Pak Phanang District, Nakorn Sri Thammarat Province, Thailand (8°31′N, 100°9′E). The eastern side of the cape is a long narrow sandy beach. The rest of the land is a large intertidal mudflat extensively covered by dense mangrove forests (approximately 57 km²). Seven mangrove species have been reported in this area, namely Avicennia alba*, Avicennia marina*, Avicennia officinalis, Bruguiera parviflora*, Rhizophora apiculata*, Rhizophora mucronata*, and Sonneratia caseolaris [75], but only the five marked with asterisks are now dominant. The most prominent species is R. apiculata, which covers approximately one third of the cape on the western section. The mangrove species of the study area barely intermingle: each species is found surrounded almost solely by other trees of the same species [60]. The climate of the study area is tropical. The dry period is between February and April, and the rest of the year is dominated by monsoons [75].

Image Acquisition and Processing
The EO-1 Hyperion image was captured on 29 June 2010, covering the western side of the Talumpuk cape (see a sample image in Figure 1). The Hyperion image has 242 wavebands ranging from 400 nm to 2,500 nm with 10 nm spectral resolution and 30 m spatial resolution [76]. The image was provided as Hyperion level 1R data and was radiometrically corrected and calibrated into 196 wavebands. Only 155 stable bands [77] were selected for this study. A de-streaking algorithm [78] was required to minimize the effect of systematic noise. Then, the image was atmospherically corrected and transformed to reflectance using the MODTRAN-based FLAASH (Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes) algorithm in commercial software (ENVI version 4.7). FLAASH provides well-adjusted input for the atmospheric correction by deriving atmospheric properties such as surface albedo, surface altitude, water vapor column and aerosol from the image [79]. The locations of easily recognizable landscape features (e.g., canals, roads and houses) were recorded and used for rectifying the image. The ground control points were recorded by hand-held GPS receivers (Garmin 60CSX), and the differential global positioning system (DGPS) technique [80] was used for post-processing the GPS data. The final positional error of the image after resampling with a nearest neighbor algorithm is less than one pixel (i.e., <0.33 pixels).

Field Data Collection
The field data collection was conducted during the dry season between February and March 2011, 8 months after the image acquisition. Based on 5 years of experience in the study area [11,21], the Pak Phanang mangroves are deemed to have remained unchanged over this 8-month period, as the composition of mangrove forests is generally resilient to natural interference [81]. A stratified random sampling method was used for locating the sampling plots. The stratification was done by partitioning the study area into 15 clusters using a K-means method. The mangrove species composition of the trees (i.e., ≥2.5 m high) was recorded at each 30 m × 30 m sampling station. The recorded forest stand parameters were species names, tree heights, diameters at breast height, crown cover areas, and DGPS coordinates in the UTM system. Then, the floristic composition of each sampling station was classified into one of the five dominant species (i.e., R. mucronata, R. apiculata, A. marina, A. alba, or B. parviflora) under the supervision of the Royal Thai Marine and Coastal Resource department. There were 402 sampling stations in total. The stations were randomly divided into two groups for the purpose of image classification and validation (Table 1). The similarity of the spectral properties of these five mangrove species is displayed in Figure 2. The mean spectral profiles are stacked on one another for clarity; thus, the vertical axis of Figure 2 has no physical meaning.

Genetic Search Algorithm (GA)-Based Band Selection and Classification
A well-established band selection and classification algorithm was used in this study [60]. The concept of the algorithm is summarized in Figure 3. The algorithm was run with the following initial parameters: Population Size = 500; Crossover Rate = 80%; and Mutation Rate = 1%. It was stopped when there was no improvement in the fitness function over 10 consecutive iterations. Following an unconstrained combinatorial optimization search [60,82], the algorithm had to be trialed with different chromosome sizes to find a winning chromosome length that indicated the appropriate number of spectral bands needed for the classification process. According to the guidelines for chromosome size selection [60], this study used chromosome sizes varying from 2 to 9. Firstly, the 402 samples were randomly divided in half to create training and testing data for the classification. This process was repeated 30 times in order to rotate the input data. The rotated input data (30 sets in total) were then fed into the algorithm one set at a time. Secondly, spectral subsets were randomly assigned to each chromosome, and the fitness value of each chromosome was determined at this stage. The overall accuracy of the spectral angle mapper (SAM) classifier was used as the fitness value. Then, the crossover and mutation modules were applied to the chromosomes one after another to produce the offspring (i.e., new-generation chromosomes). Lastly, the whole process was started over again as the new-generation chromosomes were tested for their fitness scores.

The spectral angle mapper (SAM) classifier is one of the most popular classification techniques for hyperspectral data [83,84]. First, the reflectance of each pixel is coded as an n-dimensional vector. Next, the angular distance between each vector and each reference is calculated and compared. Each unknown vector is then assigned to the nearest class. However, if the angular distances of an unknown vector are all found to be greater than a pre-defined threshold, the unknown vector is assigned to the unclassified class [51,85-87]. In this study, SAM was used by the genetic algorithm for calculating the fitness value of each chromosome.
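The SAM decision rule described above can be illustrated with a minimal Python sketch. This is not the implementation used in the study: the function names, the 0.10 rad threshold, and the 4-band reference spectra are all hypothetical values chosen for illustration.

```python
import math


def spectral_angle(a, b):
    """Angle in radians between two reflectance vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_theta)


def sam_classify(pixel, references, threshold=0.10):
    """Return the label of the nearest reference spectrum, or 'unclassified'
    when even the smallest angle exceeds the threshold (assumed value)."""
    best_label, best_angle = min(
        ((label, spectral_angle(pixel, ref)) for label, ref in references.items()),
        key=lambda item: item[1],
    )
    return best_label if best_angle <= threshold else "unclassified"


# Made-up 4-band mean spectra, for illustration only.
references = {
    "RA": [0.04, 0.06, 0.40, 0.22],
    "AM": [0.05, 0.09, 0.35, 0.30],
}
print(sam_classify([0.08, 0.12, 0.80, 0.44], references))  # brighter copy of RA
print(sam_classify([0.50, 0.10, 0.05, 0.02], references))  # unlike both references
```

Because the angle ignores vector length, a pixel that is simply a brighter copy of a reference spectrum is still assigned to that class, which is one reason SAM is often favored when illumination varies across a scene.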

Sequential Forward Selection
A typical sequential forward selection (SFS) algorithm [88] was used in this study for comparison with the genetic search algorithm, in order to see whether there is any bias in the final classification results. The SFS method is a suboptimal search algorithm that collects the spectral features that have the highest objective values until the number of features reaches the pre-defined number [89,90].
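The greedy character of SFS can be seen in a short Python sketch. It is a generic illustration, not the specific variant of [88]; the objective function and band scores below are made up, whereas in this study the objective would be a classification-based criterion.

```python
def sequential_forward_selection(n_bands, n_select, objective):
    """Greedily add the band that maximizes objective(subset) at each step."""
    selected = []
    remaining = list(range(n_bands))
    while len(selected) < n_select and remaining:
        best_band = max(remaining, key=lambda b: objective(selected + [b]))
        selected.append(best_band)
        remaining.remove(best_band)
    return selected


# Toy objective (hypothetical): each band has a made-up usefulness score, and
# every pair of adjacent bands in the subset costs 0.5 to mimic redundancy.
scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.3]


def toy_objective(subset):
    total = sum(scores[b] for b in subset)
    penalty = sum(0.5 for i in subset for j in subset if abs(i - j) == 1) / 2
    return total - penalty


print(sequential_forward_selection(len(scores), 3, toy_objective))
```

Once a band has been added it is never removed, so an early choice constrains every later one. This is why the method is described as suboptimal.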

Statistical Test
At the final stage of this study, a two-tailed paired t-test was used to test for bias in the final classification results when different feature selection algorithms were used. The classification results (i.e., the overall accuracies and κ statistics) were statistically compared given the null hypothesis H₀: μ₁ = μ₂ and the alternative H₁: μ₁ ≠ μ₂, where μ₁ and μ₂ denote the mean results obtained with the two feature selection algorithms. The p-values of the test are reported.
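For readers who wish to reproduce this kind of comparison, the paired t statistic can be computed as in the following Python sketch. The function name and the accuracy values are hypothetical and are not taken from Table 2.

```python
import math
import statistics


def paired_t_statistic(x, y):
    """Return (t, degrees_of_freedom) for paired samples x and y."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 divisor)
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Made-up overall accuracies (%) of two methods over five data rotations.
method_1 = [90, 88, 91, 87, 89]
method_2 = [85, 86, 84, 85, 83]
t, dof = paired_t_statistic(method_1, method_2)
print(round(t, 3), dof)
```

The two-tailed p-value then follows from the Student t distribution with n − 1 degrees of freedom. A library routine such as `scipy.stats.ttest_rel` returns both the statistic and the p-value directly.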

The Genetic Algorithm (GA) Band Selector
According to the guidelines for chromosome size selection [60], this study used chromosome sizes varying from 2 to 9. The results of the chromosome size variations are displayed in Figure 4. It was found that the 7-channel chromosome achieved the highest average class separability of 87% (i.e., 86.8% in Figure 4) with a standard deviation of ±2%. Note that each vertical bar in Figure 4 indicates the standard deviation after rotating the data 30 times. Additionally, the 8-channel and 6-channel chromosomes were the second and third best performers in terms of average class separability (85.6% and 85.1% in Figure 4, respectively). The performance of the winning chromosome, which possessed 7 spectral channels, after rotating the input data 30 times is displayed in Table 2a. The best classification results belonged to the 9th rotation (see the bold area in Table 2a). The spectral channels of this winning chromosome were 549 nm, 712 nm, 732 nm, 1,034 nm, 1,235 nm, 2,073 nm, and 2,083 nm. This specific band combination achieved a training accuracy of 94% and a testing accuracy of 92%.

Table 2. (a) The performance of the 7-channel chromosome (the winning chromosome) after rotating the input data 30 times, with the best band combination highlighted in bold typeface; (b) the performance of the 7-channel features selected by the SFS method under a data rotation scheme, with the best band combination highlighted in bold typeface. OA = Overall Accuracy.

For the purpose of visualization, the spectral bands selected by the genetic algorithms were grouped by minimizing their variances. Only the principal spectral locations and the standard deviation bars of the 6-channel, 7-channel, and 8-channel chromosomes are displayed in Figure 5 for visual clarity. The 8 principal locations of the 7-channel chromosome (i.e., the winning chromosome) are plotted in black. Two of the 8 locations were in the visible region (477 ± 9 nm and 560 ± 23 nm) and the rest were in the infrared regions (751 ± 40 nm, 1,054 ± 38 nm, 1,244 ± 48 nm, 1,538 ± 26 nm, 1,757 ± 36 nm and 2,122 ± 77 nm). The errors given are one standard deviation. The principal spectral locations of the 8-channel and 6-channel chromosomes (i.e., the second and third best performers, respectively) are plotted in different colors in Figure 5.
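The grouping procedure is not specified in detail here. One plausible reading of "grouped by minimizing their variances" is a one-dimensional k-means clustering of the selected wavelengths, and the Python sketch below follows that assumption. The function name, the number of groups, and the wavelengths are illustrative only.

```python
import statistics


def principal_locations(wavelengths, k, max_iter=100):
    """Cluster 1-D wavelengths into k groups (k-means, an assumed method) and
    return a (mean, standard deviation) pair for each non-empty group."""
    values = sorted(wavelengths)
    # Start the centres at evenly spaced positions in the sorted list.
    centres = [values[(2 * i + 1) * len(values) // (2 * k)] for i in range(k)]
    groups = []
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        updated = [statistics.mean(g) if g else centres[i]
                   for i, g in enumerate(groups)]
        if updated == centres:  # assignments have stabilized
            break
        centres = updated
    return [(statistics.mean(g), statistics.pstdev(g)) for g in groups if g]


# Made-up band centres (nm) pooled from several hypothetical rotations.
pooled = [550, 560, 570, 710, 730, 750, 1030, 1050]
for centre, spread in principal_locations(pooled, k=3):
    print(f"{centre:.0f} ± {spread:.0f} nm")
```

Each returned pair corresponds to one "principal location ± one standard deviation" entry of the kind plotted in Figure 5.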

The Sequential Forward Selection
The 7-channel selection of the SFS method is presented in Table 2b. In general, the spectral combinations selected by the SFS algorithm were different from the results of the GA method. However, when all of the selected bands were grouped by minimizing their variances (see red areas in Figure 5), the principal locations of the SFS method were rather similar to the principal locations of the genetic algorithms. Three of the 7 locations were in the visible spectral region (476 ± 25 nm, 553 ± 20 nm and 641 ± 0 nm) and the rest were in the infrared regions (749 ± 26 nm, 834 ± 5 nm, 1,082 ± 32 nm, and 2,151 ± 68 nm). Two of these principal locations, at 476 nm and 834 nm, even coincided with the results of the 8-channel chromosome. Additionally, the best spectral combination in terms of classification accuracy belonged to the 26th rotation (see the bold highlight in Table 2b). This specific band combination achieved an overall testing accuracy of 87%.

The Image Classification
For brevity, this report presents only the classification result of the best spectral combination selected by each of the two feature selection algorithms and compares it to the situation without the intervention of the two algorithms (i.e., using all 155 spectral bands) (Table 3a-c). The total testing accuracy was improved from 86% to 87% and 92% after applying the traditional SFS algorithm and the genetic band selector, respectively. It was clear that there was a bias in the final classification results when different feature selection algorithms were used. Most values of the producer's and user's accuracies increased after changing the feature selection method from the SFS algorithm to the genetic band selector. In particular, the confusion between the Rhizophora apiculata (RA) and Rhizophora mucronata (RM) classes was significantly reduced (compare the highlighted areas in Table 3a-c). However, two outliers were noticeable: the decreases in the RA producer's accuracy and in the A. alba (AA) user's accuracy. The classification results of the two methods were statistically compared using a paired t-test. The results confirmed that the overall accuracies and the κ values of the winning chromosome selected by the genetic algorithm were superior to the classification results of the SFS algorithm (i.e., p-value < 0.001). Finally, the classified images are shown in Figure 6. For brevity, only the classified images of the two feature selection algorithms are displayed. The non-mangrove areas and the clouds in Figure 6 are masked in black and white tones.
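The accuracy measures quoted above all derive from the confusion matrix. The Python sketch below shows the standard formulas under the assumption that rows hold the reference (field) classes and columns hold the classified classes. The orientation of Table 3 may differ, and the 2 × 2 matrix is made up.

```python
def accuracy_metrics(matrix):
    """Compute overall accuracy, producer's and user's accuracies, and kappa.

    matrix[i][j] = number of samples of reference class i assigned to class j
    (assumed orientation)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(n))
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    overall = diagonal / total
    producers = [matrix[i][i] / row_totals[i] for i in range(n)]  # 1 - omission error
    users = [matrix[j][j] / col_totals[j] for j in range(n)]      # 1 - commission error
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    return overall, producers, users, kappa


# Made-up two-class example, for illustration only.
overall, producers, users, kappa = accuracy_metrics([[45, 5], [10, 40]])
print(overall, producers, users, kappa)
```

A class whose user's accuracy is lower than its producer's accuracy is one to which pixels of other classes are frequently assigned, the pattern reported here for the A. alba class.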

Table 3. (a) The confusion matrix and producer's and user's accuracies of the winning chromosome selected by the genetic search algorithm (Overall Accuracy = 92%); (b) the confusion matrix and producer's and user's accuracies of the band combination selected by the SFS feature selector (Overall Accuracy = 87%); and (c) the confusion matrix and producer's and user's accuracies of the all-spectral-band combination (Overall Accuracy = 86%).

Discussion
In light of the existing literature, many scientists in the field of remote sensing have already tried to discriminate and map mangroves at the species level [25,26,44,46,51,62], but their efforts have been inconclusive with respect to tropical mangrove species discrimination. The authors of [26] would have been the first to resolve this problem if their hyperspectral image had not been obscured by cloud. Other authors [25,44,51,62] could not draw any strong conclusion, as their study sites were covered by only a few mangrove species. For [46], it is doubtful whether the accuracy of the low-resolution GPS measurement was adequate for the high-resolution image analysis.
This study has demonstrated for the first time that space-borne hyperspectral data, with the help of the well-established genetic search algorithm [60,72], is capable of discriminating and mapping diversely populated tropical mangrove species of Southern Thailand. This claim is supported by the classification results of five different tropical mangrove species, presented in Table 3a. This accuracy level is acceptable for the purpose of species-level classification by the USGS standard [91].
Moreover, the selected data rotation method (i.e., rotating the independent testing data 30 times) helped ensure the reliability of the classification results.
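The data rotation scheme can be sketched in Python as repeated random half splits whose accuracies are then summarized by their mean and standard deviation. The function names, the random seed, and the toy nearest-mean classifier are hypothetical stand-ins. In the study itself, each split was fed to the band selector and the SAM classifier.

```python
import random
import statistics


def rotate_and_evaluate(samples, evaluate, n_rotations=30, seed=0):
    """Randomly halve the samples n_rotations times and summarize accuracy."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    accuracies = []
    for _ in range(n_rotations):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train, test = shuffled[:half], shuffled[half:]
        accuracies.append(evaluate(train, test))
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def nearest_mean_accuracy(train, test):
    """Toy classifier: assign each test value to the class with the closest mean."""
    groups = {}
    for value, label in train:
        groups.setdefault(label, []).append(value)
    means = {label: statistics.mean(vals) for label, vals in groups.items()}
    correct = sum(
        1 for value, label in test
        if min(means, key=lambda m: abs(value - means[m])) == label
    )
    return correct / len(test)


# Made-up one-dimensional samples with overlapping classes.
samples = [(x, "A") for x in range(0, 20)] + [(x, "B") for x in range(15, 35)]
mean_acc, std_acc = rotate_and_evaluate(samples, nearest_mean_accuracy)
print(round(mean_acc, 3), round(std_acc, 3))
```

Reporting the spread across rotations, as in the standard deviation bars of Figure 4, shows how sensitive the accuracy is to the particular split of training and testing stations.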
Although the testing accuracy was as high as 92%, the difficulties in discriminating between the Rhizophora apiculata (RA) class and the Rhizophora mucronata (RM) class are still noticeable (i.e., the highlighted area in Table 3a). This spectral confusion agrees with previous work [21,47], which reported that the two species could not be clearly separated, even with the help of high signal-to-noise laboratory data and post-classification treatment. A new study is now under way to solve this problem. As the two mangroves have quite different leaf shapes [92], it is hypothesized that the difference between their leaf textures could be exploited for this purpose.
The other observable outlier is the user's accuracy of the A. alba class (see Table 3a). Unlike the other classes, the user's accuracy of the A. alba class is lower than its producer's accuracy. This discrepancy reflects the actual situation of the A. alba class in the field. The A. alba mangrove is a pioneer mangrove species of the study site (i.e., the leading mangrove to colonize the study area) [75]. Therefore, the A. alba mangrove is typically mixed with the other classes throughout the study area.
Unlike the selection results of the genetic algorithm (Table 2a), the band combinations selected by the SFS method (Table 2b) were found to be less meaningful. For each search, the sequential forward selection algorithm repeatedly selected spectral locations from the same spectral regions, and its selections were not spread over the significant locations listed in the previous paragraph. For example, the 26th rotation in Table 2b (i.e., the winning combination of the sequential forward selection) contained three very close spectral bands (i.e., 498 nm, 529 nm, and 569 nm). It was evident that the sequential forward selection search could not overcome the local minima problem, which explains the lower classification accuracy of the sequential forward selection method compared with that of the genetic algorithm-based search. However, when the results of all rotations were lumped together by minimizing the variances (see red areas in Figure 5), the principal locations of the sequential forward selection method were very similar to the principal locations of the genetic algorithms. This may be explained by the use of the data rotation scheme, as it helped the search algorithm to overcome the local minima (i.e., by starting each search from a different region of the feature space).
The genetic search algorithm is flexible in terms of computational variations [60,82,105-108]. It is possible to vary the initial parameters, including the encoding scheme, population size, crossover rate, selection method, and mutation probability. Furthermore, any other popular classifier (e.g., the maximum likelihood classifier) can be used instead of the spectral angle mapper classifier. Additionally, the fitness scoring system could be changed from tracking the overall accuracy to other optimizing criteria (e.g., monitoring the κ statistic). These modifications may have some influence on the evolution, but it is expected that the robustness of the evolutionary search can still provide similar results [105]. However, a study of the effects of these variations is beyond the scope of this work.
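This flexibility can be made concrete with a minimal Python sketch of a genetic band search in which the fitness function and the main parameters are arguments. It is not the algorithm of [60]: the tournament selection, single-point crossover, smaller default population, and toy fitness function are assumptions chosen to keep the example short.

```python
import random


def tournament(population, fitness, rng):
    """Pick the fitter of two randomly drawn chromosomes (assumed selection rule)."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def genetic_band_search(n_bands, chromosome_size, fitness, population_size=50,
                        crossover_rate=0.8, mutation_rate=0.01, patience=10, seed=0):
    """Search for a fixed-size band subset that maximizes fitness(chromosome).

    Stops after `patience` consecutive generations without improvement.
    chromosome_size must be at least 2 for the single-point crossover."""
    rng = random.Random(seed)
    population = [rng.sample(range(n_bands), chromosome_size)
                  for _ in range(population_size)]
    best = max(population, key=fitness)
    best_score = fitness(best)
    stale = 0  # generations without improvement
    while stale < patience:
        offspring = []
        while len(offspring) < population_size:
            parent1 = tournament(population, fitness, rng)
            parent2 = tournament(population, fitness, rng)
            child = parent1[:]
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, chromosome_size)  # single-point crossover
                child = parent1[:cut] + parent2[cut:]
            # Mutation: occasionally replace a gene with a random band index.
            # Duplicate bands may arise; the fitness function must tolerate them.
            child = [rng.randrange(n_bands) if rng.random() < mutation_rate else gene
                     for gene in child]
            offspring.append(child)
        population = offspring
        champion = max(population, key=fitness)
        if fitness(champion) > best_score:
            best, best_score, stale = champion, fitness(champion), 0
        else:
            stale += 1
    return best, best_score


# Toy fitness (hypothetical): count how many made-up "informative" bands are present.
INFORMATIVE = {10, 30, 50, 70, 90, 110, 130}


def toy_fitness(chromosome):
    return len(set(chromosome) & INFORMATIVE)


best, score = genetic_band_search(n_bands=155, chromosome_size=7, fitness=toy_fitness)
print(sorted(best), score)
```

Swapping `toy_fitness` for a function that returns the overall accuracy or the κ statistic of any classifier changes the optimizing criterion without touching the search loop, which is the kind of variation discussed above.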

Conclusions
This study is the first to confirm the capability of hyper-dimensional remote sensing data for discriminating diversely populated tropical mangrove species. It is found that five different tropical mangrove species of Southern Thailand can be correctly classified. With the help of the band selection method, the classification accuracy is improved to 92% despite the remaining confusion between the two members of the Rhizophoraceae family and the mix-up between the pioneer species and the other mangroves. Since the methodology proposed in this study can accurately classify the five tropical mangrove species that possess very similar spectral properties, it is anticipated that this methodology can be used as a guideline for detailed mangrove species mapping in other study areas. Additionally, a follow-up study is now being conducted to exploit the differences between the leaf textures of the two Rhizophoraceae mangroves and thus refine the classification outcome.

Figure 1. The location of the Talumpuk cape (a), Pak Phanang District, Nakorn Sri Thammarat Province, Thailand, shown against an enlarged satellite image of the cape (b) captured by the EO-1 Hyperion sensor on 29 June 2010.

Figure 2. The stack plot of average reflectance curves of five tropical mangrove species under study.

Figure 3. A flowchart (after [60]) showing the concept of the band selection and classification algorithm (OA = Overall Accuracy; GA = Genetic Algorithm).

Figure 4. A comparison of the average overall accuracies, with standard deviation bars, for eight chromosome sizes varying from 2 to 9 spectral channels selected by the genetic search algorithm.

Figure 5. The principal spectral locations and the standard deviation bars of the 6-channel (blue), 7-channel (black), and 8-channel (green) chromosomes selected by the genetic algorithm against the locations selected by the traditional SFS method (red).

Figure 6. (a) The classified image of the winning chromosome selected by the genetic search algorithm (Overall Accuracy = 92%) and (b) the classified image of the 7-spectral-band combination selected by the SFS feature selector (Overall Accuracy = 87%).

Table 1. The number of training and testing samples per species and their abbreviations.