Ensemble of ERDTs for Spectral–Spatial Classification of Hyperspectral Images Using MRS Object-Guided Morphological Profiles

In spectral-spatial classification of hyperspectral images, the performance of conventional morphological profiles (MPs), which use a sequence of structural elements (SEs) with predefined sizes and shapes, can be limited by the mismatch between those predefined SEs and the actual sizes and shapes of real-world objects in an image. To overcome this limitation, this paper proposes object-guided morphological profiles (OMPs), which adopt multiresolution segmentation (MRS)-based objects as SEs for morphological closing and opening by geodesic reconstruction. Additionally, the ExtraTrees, bagging, adaptive boosting (AdaBoost), and MultiBoost ensemble versions of extremely randomized decision trees (ERDTs) are introduced and comparatively investigated for spectral-spatial classification of hyperspectral images. Two hyperspectral benchmark images are used to validate the proposed approaches in terms of classification accuracy. The experimental results confirm the effectiveness of the proposed spatial feature extractors and ensemble classifiers.


Introduction
Due to the technical evolution of optical remote sensors over the last few decades, the remote sensing (RS) community can now obtain diverse data sets with rich spatial, spectral and temporal information. In particular, hyperspectral sensors provide detailed spectral information over hundreds of spectral wavelengths and increase the possibility of discriminating materials of interest more accurately. Furthermore, the high (5.0 m ≤ spatial resolution ≤ 10.0 m) and very high (spatial resolution < 5.0 m) spatial resolution (HR, VHR) of some of these sensors enables the analysis of small spatial structures with unprecedented detail. However, the high dimensionality of hyperspectral images may lead to the Hughes phenomenon, in which classification accuracy degrades when the number of training samples is limited and the classification method cannot handle high-dimensional data [1]. Additionally, while HR and VHR data solve the problem of being able to "see" small structures and objects, they do not by themselves help in focusing the feature extraction on them. The ERDT algorithm, which selects attribute and cut-point splits either totally or partially at random, and its ExtraTrees ensemble version were proposed for use in both classification and regression problems [38]. In our previous work, ERDT and ExtraTrees were investigated for their ability to classify three VHR multispectral images acquired over urban areas, and were compared against decision tree (DT, C4.5), bagging, RaF, SVM and RoF in terms of classification accuracy and computational efficiency [10]. However, the performance of other ensemble versions (e.g., bagging, AdaBoost and MultiBoost) of ERDT for RS, particularly for hyperspectral image classification tasks using OMP and OMPsM features, has not yet been investigated. Hence, another contribution of this letter is to introduce and investigate the performance of the bagging, AdaBoost and MultiBoost versions of ERDT in a hyperspectral image classification task.

Object-Guided MPs
Generally, morphological operators act on pixel values by considering the neighborhood of each pixel determined by an SE with a predefined size and shape, based on two basic operators: dilation and erosion. In grayscale morphological reconstruction, two images and one SE are involved: the marker f contains the starting points of the transformation, while the mask g constrains it. According to the definitions of MM, opening by reconstruction (OBR) of a grayscale image is obtained by first eroding the input image (returning the minimum values of f contained in the specified SE) and using the result as a marker for geodesic reconstruction by dilation, while closing by reconstruction (CBR) is obtained by complementing the image, computing the OBR, and complementing the result [10,11,24,34,35]. Analogously, the object-guided OBR (OOBR) is obtained by first eroding the input image using the segmented objects $J^{\lambda} = \{J_1^{\lambda}, \ldots, J_S^{\lambda}\}$ (the $S$ objects produced by the MRS procedure with scale parameter $\lambda$) as SEs, and by using the result as a marker in the geodesic reconstruction by dilation phase [35]:

$$\mathrm{OOBR}_{J^{\lambda}}(f) = R_{f}^{\delta}\big(\varepsilon_{J^{\lambda}}(f)\big),$$

while the object-guided CBR (OCBR) is obtained by complementing the image, computing the OOBR using the objects $J_i^{\lambda} \in J^{\lambda}$ as SEs, and complementing the result:

$$\mathrm{OCBR}_{J^{\lambda}}(f) = \big(\mathrm{OOBR}_{J^{\lambda}}(f^{c})\big)^{c}.$$

In MM, the erosion of f by an SE b ($J_i^{\lambda} \in J^{\lambda}$ in our case) at any location assigns the minimum value of all the pixels in the neighborhood defined by b; in contrast, dilation returns the maximum value of the image in the window outlined by b.
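As a concrete illustration of OBR and CBR (not the authors' implementation), the two operations can be sketched with scikit-image's `reconstruction` function, which performs the geodesic reconstruction step; the disk SE and toy image below are our own assumptions for demonstration:

```python
import numpy as np
from skimage.morphology import disk, erosion, reconstruction

def opening_by_reconstruction(f, se):
    """OBR: erode f to obtain the marker, then geodesic reconstruction
    by dilation of the marker under the mask f."""
    marker = erosion(f, se)          # marker <= f everywhere
    return reconstruction(marker, f, method='dilation')

def closing_by_reconstruction(f, se):
    """CBR by duality: complement f, apply OBR, complement the result."""
    top = f.max()
    return top - opening_by_reconstruction(top - f, se)

# Toy scene: a small bright square (removed by OBR because erosion leaves
# no marker inside it) and a large bright square (fully regrown)
f = np.zeros((20, 20))
f[2:4, 2:4] = 1.0
f[8:16, 8:16] = 1.0
obr = opening_by_reconstruction(f, disk(3))
```

Unlike plain opening, OBR either removes a bright structure entirely or restores its exact shape, which is the property the object-guided profiles build upon.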
Then, the erosion and dilation operators can be defined as

$$[\varepsilon_{b}(f)](x,y) = \min_{(s,t)\in b(x,y)} f(s,t), \qquad [\delta_{b}(f)](x,y) = \max_{(s,t)\in b(x,y)} f(s,t).$$

Finally, if the structuring elements are specified by the objects $J_i^{\lambda} \in J^{\lambda}$, the OMPs of an image f over the scale set $\{\lambda_1, \ldots, \lambda_n\}$ can be defined as the stack

$$\mathrm{OMP}(f) = \big\{\mathrm{OCBR}_{J^{\lambda_n}}(f), \ldots, \mathrm{OCBR}_{J^{\lambda_1}}(f),\; f,\; \mathrm{OOBR}_{J^{\lambda_1}}(f), \ldots, \mathrm{OOBR}_{J^{\lambda_n}}(f)\big\}.$$

To avoid possible side effects from unusual minimum or maximum pixel values within objects, OMPsM are proposed by adding, in an object-oriented manner, an extra band that assigns to each pixel the mean value of the object containing it. Although the use of MPs helps create an image feature set with more discriminative information, redundancy remains evident in the feature set, particularly for hyperspectral images. Therefore, feature extraction can be applied first to find the most important features, to which the morphological operators are then applied [24]. After PCA is applied to the original feature set, EOMPs and EOMPsM can be obtained by applying the basic principles of OMPs and OMPsM described above to the first few (typically three) features.
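The object-guided statistics underlying these profiles can be sketched with per-object reductions; the sketch below (our illustration, with any integer label map standing in for the MRS objects) broadcasts the object-wise minimum, maximum and mean back to every pixel, the last of which is the extra OMPsM band:

```python
import numpy as np
from scipy import ndimage as ndi

def object_guided_features(f, labels):
    """Per-object minimum, maximum and mean of f, broadcast back to every
    pixel of the object. `labels` is an integer segmentation map with
    contiguous labels 1..S; the min/max maps correspond to object-guided
    erosion/dilation markers, and the mean map is the OMPsM band."""
    index = np.arange(1, labels.max() + 1)
    omin = np.asarray(ndi.minimum(f, labels, index))[labels - 1]
    omax = np.asarray(ndi.maximum(f, labels, index))[labels - 1]
    omean = np.asarray(ndi.mean(f, labels, index))[labels - 1]
    return omin, omax, omean

# Tiny example: two objects, each covering one row
f = np.array([[1.0, 2.0],
              [3.0, 8.0]])
labels = np.array([[1, 1],
                   [2, 2]])
omin, omax, omean = object_guided_features(f, labels)
```

Stacking such maps over several segmentation scales yields an OMP-like feature cube without ever fixing an SE size or shape in advance.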

ExtraTrees
The ERDT approach is a decision tree (DT) induction algorithm that selects attribute and cut-point splits either completely or partially at random, whereas ExtraTrees is an ensemble version of unpruned ERDTs built according to the random committee ensemble criterion [38]. Compared with other DT-based ensemble methods such as bagging, boosting and RaF, the main differences are: (1) the algorithm splits nodes by choosing cut-points (which are responsible for a significant part of the error rates of tree-based methods) fully at random in the tree induction phase, which makes the tree structures independent of the target variable values of the learning samples, and (2) it uses the entire set of learning samples, rather than a bootstrap replica (typically adopted by the other DT methods), to grow the trees.
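To make the totally random cut-point selection concrete, here is a minimal NumPy sketch (our illustration, not the authors' implementation; the split score is taken to be the Gini impurity reduction, an assumption, since the score function is not fixed here):

```python
import numpy as np

def gini(y):
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def extra_split(X, y, K, rng):
    """Pick K attributes at random, draw one uniform cut-point for each,
    and return the (attribute, cut-point) pair with the best Gini gain."""
    n, d = X.shape
    attrs = rng.choice(d, size=K, replace=False)
    best, best_gain = None, -np.inf
    for a in attrs:
        lo, hi = X[:, a].min(), X[:, a].max()
        if lo == hi:                      # constant attribute: no valid cut
            continue
        c = rng.uniform(lo, hi)           # the totally random cut-point
        left = X[:, a] < c
        if left.all() or (~left).all():
            continue
        gain = gini(y) - (left.mean() * gini(y[left])
                          + (~left).mean() * gini(y[~left]))
        if gain > best_gain:
            best, best_gain = (a, c), gain
    return best
```

Note that only the cut-point is random; the choice among the K candidate splits is still score-driven, which is what "partially at random" refers to.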
Let $X = \{x_{\tau}\}_{\tau=1}^{l}$ denote a labeled training set with labels $Y = \{y_{\tau}\}_{\tau=1}^{l}$, let $K$ be the number of attributes randomly selected at each node, and let $\eta$ be the minimum sample size for splitting a node. An ERDT can be built by following the steps described in Algorithm 1.

Algorithm 1. Building a single ERDT from $X$:
1. If $|X| < \eta$, or all attributes are constant in $X$, or all labels in $Y$ are identical, return a leaf labeled with the majority class of $X$;
2. Otherwise, randomly select $K$ attributes $\{a_1, \ldots, a_K\}$ without replacement;
3. Generate $K$ splits $\{s_1, \ldots, s_K\}$, $s_i = [a_i < a_c]$, $\forall i = 1, \ldots, K$, where $a_i$ is a numerical attribute and $a_c$ is a cut-point uniformly drawn from $[a_{\min}^{X}, a_{\max}^{X}]$, the minimal and maximal values of $a_i$ in $X$;
4. Select the split $s^{*} = \arg\max_{i=1,\ldots,K} \mathrm{Score}(s_i, X)$;
5. Split $X$ into subsets $X_l$ and $X_r$ according to $s^{*}$;
6. Build the ERDT subtrees $t_l$ and $t_r$ from $X_l$ and $X_r$, respectively.

Thereafter, the ExtraTrees ensemble can be built by exploiting the random committee ensemble learning (EL) criterion, i.e., an ensemble of randomizable base classifiers is built using different random number seeds, and the final prediction is the average of the predictions generated by the individual base classifiers [10,38]. Similarly, the bagging, AdaBoost and MultiBoost versions of ERDT can be realized following the corresponding ensemble construction criteria. Bagging, also called bootstrap aggregating, trains each model in the ensemble on a randomly drawn bootstrap replica of the training set and then votes with equal weight [39]. AdaBoost, an abbreviation for adaptive boosting, builds an ensemble incrementally, tweaking subsequent weak learners in favor of the instances misclassified by previous classifiers [40]. MultiBoost can be viewed as a combination of AdaBoost with bagging: it harnesses both AdaBoost's bias reduction and variance reduction and bagging's superior variance reduction to produce a committee with lower error, while also offering, as an advantage over AdaBoost, suitability for parallel execution [41].
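These ensemble variants can be sketched with scikit-learn, which implements extremely randomized trees as `ExtraTreeClassifier`/`ExtraTreesClassifier` (MultiBoost has no scikit-learn counterpart, so it is omitted here; the toy data set is our own assumption):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              ExtraTreesClassifier)
from sklearn.model_selection import train_test_split
from sklearn.tree import ExtraTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=0)

# ExtraTrees: a committee of fully randomized trees, no bootstrap by default
et = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(Xtr, ytr)

# Bag(ERDT): one ExtraTree per bootstrap replica of the training set
bag = BaggingClassifier(ExtraTreeClassifier(random_state=0),
                        n_estimators=100, random_state=0).fit(Xtr, ytr)

# AB(ERDT): reweight toward previously misclassified samples each round
ab = AdaBoostClassifier(ExtraTreeClassifier(max_depth=5, random_state=0),
                        n_estimators=100, random_state=0).fit(Xtr, ytr)

scores = {name: m.score(Xte, yte) for name, m in
          [('ExtraTrees', et), ('Bag(ERDT)', bag), ('AB(ERDT)', ab)]}
```

The depth limit on the AdaBoost base learner is a deliberate choice: boosting expects weak learners, whereas bagging and the random committee work with fully grown trees.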

Data Sets
The first hyperspectral image was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) optical sensor, which provides 115 bands covering the spectral range from 0.43 µm to 0.86 µm. The main objective of the ROSIS project is the detection of spectral fine structures, especially in coastal waters; this task determined the selection of the spectral range, bandwidth, number of channels, radiometric resolution and the sensor's tilt capability for sun glint avoidance. However, ROSIS can be used equally well for monitoring spectral features over land or within the atmosphere. The image shown in Figure 1a depicts the Engineering School of Pavia University (Pavia, Italy) with a geometric resolution of 1.3 m. The image has 610 × 340 pixels with 103 spectral channels, after 12 very noisy bands were discarded manually following the data acquisition. The validation data refer to nine land cover classes (as shown in Figure 1). This scene was provided by Professor Paolo Gamba from the Telecommunications and Remote Sensing Laboratory, Pavia University (Pavia, Italy).

The second hyperspectral image is the GRSS-DFC2013 data set [42]. Originally, this image has 349 × 1905 pixels with 144 spectral bands in the spectral range between 380 and 1050 nm. In our experiment, the dense cloud-covered area in the right part of the scene and a total of nine blank pixel lines at the upper and lower image edges were removed, resulting in a subset image of 340 × 1350 pixels.

Experimental Configuration
The free parameters of ERDT, such as K, the number of attributes selected at each node, are kept at the same defaults as for the C4.5 algorithm used in bagging and RaF. The overall accuracy (OA) and kappa statistic are used to evaluate the classification performances of these methods. In the case of multiclass classification, OA is usually calculated by dividing the sum of the diagonal entries of the confusion matrix, which represent correctly classified instances, by the total number of reference instances:

$$\mathrm{OA} = \frac{\sum_{i=1}^{N} Tp_i}{\sum_{i=1}^{N} Tn_i},$$

where $Tp_i$ is the number of correctly classified instances of class $i$, $Tn_i$ is the total number of reference instances of class $i$, and $N$ is the number of classes.
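Both measures follow directly from the confusion matrix; a short sketch (our illustration) computes OA as the observed agreement and the kappa statistic as its chance-corrected version:

```python
import numpy as np

def oa_and_kappa(cm):
    """Overall accuracy and Cohen's kappa from an N x N confusion matrix
    (rows: reference classes, columns: predicted classes)."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    po = np.trace(cm) / total                                   # OA
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2   # chance agreement
    return po, (po - pe) / (1.0 - pe)
```

For example, the matrix [[50, 10], [5, 35]] gives OA = 0.85, while kappa ≈ 0.69 because part of that agreement is expected by chance.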
To generate MPs and MPPR, we apply a disk-shaped SE with n = 10 openings and closings by conventional and partial reconstruction, with radii ranging from one to ten with a step-size increment of one. This choice results in a total of 2163 = 103 + 103 × 10 × 2 and 3024 = 144 + 144 × 10 × 2 dimensional stacked data sets using the original spectral bands, and a total of 70 = 10 + 3 × 10 × 2 and 67 = 7 + 3 × 10 × 2 dimensional stacked data sets using PCA-transformed features, for Pavia University and GRSS-DFC2013, respectively. Note that only the first ten and seven PCA-transformed features from Pavia University and GRSS-DFC2013, respectively, are considered in the experiments. For fair comparison purposes, we set a total of ten scales for MRS in the image segmentation phase. For instance, the scale parameter is increased from 10 to 55 with a step size of five to produce a total of ten scale segmentation results. The segmentation result, which is crucial for guiding MPs, typically relies on the scale parameters, which are highly dependent on the spatial resolution and geometrical complexity of the image under consideration. Hence, in the next experiment, we examine the performance of OMPs, OMPsM, EOMPs and EOMPsM with different scale sets. Note that OMPsM and EOMPsM also contain the mean pixel values within objects, which produces 3193 = 103 + 103 × 10 × 3, 4464 = 144 + 144 × 10 × 3, 100 = 10 + 3 × 10 × 3 and 97 = 7 + 3 × 10 × 3 dimensional stacked data sets using the original spectral bands and PCA-transformed features for Pavia University and GRSS-DFC2013, respectively. Figure 3 shows examples of OBR, opening by partial reconstruction (OBPR), and the proposed OOBR with different parameter sets using the second principal component of the Pavia University data. The radii of the disk-shaped SEs in OBR and OBPR were set between 6 and 10, while the scale parameters of MRS for OOBR were empirically set between 60 and 100.
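The stacked dimensionalities quoted above all follow the same pattern: the retained base features plus one feature image per profiled feature, scale and operation. A quick check (our illustration):

```python
def stack_dim(n_kept, n_profiled, n_scales, n_ops):
    """Retained base features plus one image per profiled feature,
    segmentation scale and morphological operation."""
    return n_kept + n_profiled * n_scales * n_ops

# MPs/OMPs use 2 operations (opening, closing); OMPsM adds the object mean.
# For the PCA variants, 10 (or 7) components are kept but only the first
# 3 are profiled.
assert stack_dim(103, 103, 10, 2) == 2163   # Pavia University, spectral bands
assert stack_dim(10, 3, 10, 2) == 70        # Pavia University, PCA features
```

The distinction between retained and profiled features explains why the PCA-based stacks use 3 in the product but 10 (or 7) in the additive term.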
A comparison of the results in the first row indicates that OBPR is more capable of modeling the attributes of different objects than OBR from a sequence of SEs, in accordance with the finding in [12]. However, many large objects and boundaries between different objects that should have appeared were removed at a very small scale after OBPR. In contrast, OOBR maintains the object information between boundaries exactly as in the original, affecting only the brightness or darkness of the objects at different scale parameters. In other words, the effect of the scale parameter on OOBR is much smaller than its effect on OBPR.

Results and Analysis
In Figure 4, we present the results for various spatial feature extractors with different parameter sets, using an SVM with a radial basis function (RBF) kernel to evaluate the performance of EOMPs and EOMPsM on the considered data sets. A total of 10 rounds were executed for each experiment for the purpose of an objective evaluation. The graphs confirm the superiority of the proposed feature extraction methods over MPs and MPPR, with the best improvements obtained by EOMPsM, and this holds for both data sets (Figure 4c,d,g,h). However, the effects of the segmentation scale parameter in MRS differ between the considered data sets when using the original spectral and the PCA-transformed features. For instance, the best OA curves are achieved by OMPs using the original spectral bands of the ROSIS university data with the MRS scale range set from 40 to 400 with a sequence step of 40 (see Figure 4a). In contrast, the best OA curves are achieved by OMPsM using the original spectral bands with the MRS scale range set from 310 to 400 with a sequence step of 10 (see Figure 4b).
Interestingly, the superiority of the larger scale set relative to the smaller one no longer holds when using the original spectral features, because some noise-corrupted bands mislead the partial reconstruction in MPs and MPPR and the image segmentation procedures in OMPs and OMPsM (see Figure 4a,b,e,f). Additionally, EOMPsM with a larger scale set can limit and even degrade the classification accuracy, since a single mean value is assigned to different targets contained in a single large segmented object (see Figure 4c,d,g,h). Summarizing these results, OMPs and EOMPs are more suitable for accommodating the original spectral and PCA-transformed features with a larger MRS scale range and a larger sequence step, while OMPsM and EOMPsM are more suitable for a larger MRS scale range but with a smaller sequence step. Figure 5 shows the OA values with respect to the number of trees in bagging, RaF and ExtraTrees, and with respect to the number of iterations in the AdaBoost and MultiBoostAB ensemble classifiers. According to these graphs, there are no prominent improvements or decreasing trends for a tree size greater than 100 in most cases, a result consistent with the findings of other studies [8,10]. Moreover, it is clear that the bagging ensemble of ERDT (Bag(ERDT)) is uniformly better than the bagging ensemble of the conventional C4.5 approach (Bag(C4.5)) in all classification scenarios in terms of classification accuracy. In contrast, the performances of the MultiBoostAB and AdaBoost ensembles of ERDT (MB(ERDT) and AB(ERDT)) are not consistently superior to those of the MultiBoostAB and AdaBoost ensembles of C4.5 (MB(C4.5) and AB(C4.5)) across different features on the two data sets.
Specifically, MB(C4.5) and AB(C4.5) show better OA values than MB(ERDT) and AB(ERDT) when using MPs and MPPR features, but lower OA values when using OMPs features on the considered data sets; the two families show similar OA values with the OMPsM features of the Pavia University data set, while MB(C4.5) and AB(C4.5) fall behind with the OMPsM features of the GRSS-DFC2013 data set. In other words, MB(ERDT) and AB(ERDT) perform better with OMPs and OMPsM features but worse with MPs and MPPR features. This can be explained as follows: (1) the MultiBoostAB and AdaBoost criteria focus on the fewer, harder-to-classify instances, which can further weaken ERDT and lead to an overly abundant diversity that hinders the construction of an improved ensemble; (2) this shortcoming can, however, be overcome by exploiting more discriminative features. Other possible remedies are (1) early stopping of the ensemble or (2) careful tuning of the ERDT parameters in each iteration step.
Finally, Figures 6 and 7 present the best classification maps, corresponding to the highlighted values in Tables 1 and 2, together with the ground truth maps of the Pavia University and GRSS-DFC2013 test data for comparison.
Once again, all classifiers uniformly and clearly confirm the effectiveness of the proposed MRS-based OMPs over the conventional MPs and MPPR approaches. For instance, the best classification results, with the highest OA (98.75%) and kappa statistic (0.98), were achieved by MultiBoost(C4.5) using EOMPsM features on the ROSIS university data, and by AdaBoost(ERDT) using OMPsM features on the GRSS-DFC2013 data (OA = 96.59%, kappa statistic = 0.96). Comparing the ensemble versions of ERDT, it is clear that ExtraTrees is better than Bag(C4.5) and comparable to RaF(C4.5), and that the best improvement in OA is achieved by either the AdaBoost or the MultiBoost ensemble (see the numbers in bold in Tables 1 and 2). Additionally, the superiority of EOMPsM over EMPs is clear, and its performance is comparable to, or in some cases better than, that of EMPPR.

Conclusions
In this study, we propose the concept of OMPs for spatial feature extraction in high-resolution hyperspectral images, using the multiscale objects produced by multiresolution segmentation as the SEs. Additionally, the ExtraTrees, bagging, AdaBoost, and MultiBoost ensemble versions of the ERDT algorithm are introduced and comparatively investigated on two benchmark hyperspectral data sets. The experimental results confirm the effectiveness of the proposed OMPs, OMPsM and their extended versions. In addition, the superiority of EOMPsM over the conventional MPs and MPPR is reported. In the evaluation of the adopted classifiers, the bagging ensemble of ERDT is better than the bagging version of C4.5, and ExtraTrees is better than Bag(C4.5) but comparable to RaF(C4.5). The best improvements are reached by the AdaBoost or MultiBoost ensemble of ERDT using OMPsM extracted from the original bands, or EOMPsM extracted from the PCA-transformed features.
Future work will focus on the role of self-adaptive segmentation scale selection for multiscale segmentation in the usefulness of OMPs and EOMPsM. Early stopping and self-adaptive parameter tuning of the individual ERDTs within the AdaBoost and MultiBoost ensemble frameworks will also be investigated.