Deep Learning for Land Cover Classification Using Only a Few Bands

Abstract: There is an emerging interest in using hyperspectral data for land cover classification. The motivation is the notion that increasing the number of narrowband spectral channels provides richer spectral information and thus helps improve land cover classification performance. Although hyperspectral data with hundreds of channels provide detailed spectral signatures, the curse of dimensionality might lead to degradation in the land cover classification performance. Moreover, in some practical applications, hyperspectral data may not be available due to cost, data storage, or bandwidth issues, and RGB and near-infrared (NIR) could be the only image bands available for land cover classification. Light detection and ranging (LiDAR) data is another type of data that can assist land cover classification, especially if the land covers of interest have different heights. In this paper, we examined the performance of two Convolutional Neural Network (CNN)-based deep learning algorithms for land cover classification using only four bands (RGB+NIR) and five bands (RGB+NIR+LiDAR), where this limited number of image bands was augmented using Extended Multi-attribute Profiles (EMAP). The deep learning algorithms were applied to a well-known dataset used in the 2013 IEEE Geoscience and Remote Sensing Society (GRSS) Data Fusion Contest. With EMAP augmentation, the two deep learning algorithms were observed to achieve better land cover classification performance using only four bands than with all 144 hyperspectral bands.

There is also an increasing interest in adapting deep learning methods for land cover classification after several breakthroughs have been reported in a variety of computer vision tasks such as image classification. In this work, investigations were conducted for the cases when all hyperspectral bands were used for land cover classification, in contrast to EMAP-augmented bands only and RGB+NIR bands alone. Additionally, the impact of adding LiDAR on the classification performance of the three cases was examined. These three cases are: (a) limited number of bands (RGB+NIR) vs. (RGB+NIR+LiDAR); (b) EMAP augmentation of RGB+NIR vs. EMAP augmentation of RGB+NIR+LiDAR; and (c) hyperspectral bands vs. hyperspectral bands+LiDAR. We used the 2013 IEEE GRSS Data Fusion Contest dataset [44] in the investigations. Two conventional classification algorithms, Joint Sparse Representation (JSR) [14] and Support Vector Machine (SVM) [47], were also applied to the same dataset for the EMAP-augmented cases. The contributions of this paper are as follows:
• We provided a comprehensive performance evaluation of two CNN-based deep learning methods for land cover classification when a limited number of bands (RGB+NIR and RGB+NIR+LiDAR) were used for augmentation with EMAP. The evaluation also included detailed classification accuracy comparisons when the limited bands were used alone and when all hyperspectral bands were used without EMAP.

• We showed that, with deep learning methods using a fewer number of bands and utilizing EMAP-based augmentation, it is possible to achieve highly decent accuracies for land cover classification. This eliminates the need for hundreds of hyperspectral bands and reduces it to four bands.

• We demonstrated that, even though adding the LiDAR band to the RGB+NIR bands for EMAP augmentation made a significant impact with the conventional classifiers, JSR and SVM, no considerable impact was observed with the deep learning methods, since their classification performance was already good when using the EMAP-augmented RGB+NIR bands.
The rest of this paper is organized as follows. In Section 2, we review the two deep learning methods, EMAP, and the 2013 IEEE GRSS Data Fusion Contest Data. In Section 3, we summarize our findings. Finally, we conclude our paper with a few remarks.

Our Customized CNN Method
For this CNN-based method, we used the same structure as in our previous work [39]. Only the filter size in the first convolutional layer was changed to be consistent with the input patch sizes. This CNN model has four convolutional layers with various filter sizes and a fully connected layer with 100 hidden units, as shown in Figure 1. When we designed the network, we tried different configurations for the number of layers and the size of each layer (both convolutional and fully connected) and selected the one that provided the best results. The choice of 100 hidden units was the outcome of these design studies. This model was also used for detecting soil due to illegal tunnel excavation, as described in [48]. Each convolutional layer uses the Rectified Linear Unit (ReLU) as the activation function. The last fully connected layer uses the SoftMax function for classification. We added a dropout layer after each convolutional layer, with a dropout rate of 0.1, to mitigate overfitting [49]. It should be noted that the network size changes depending on the input size. The network for a 5 × 5 input is the same as that for 7 × 7 in Figure 1, except that the first layer is deleted; for a 3 × 3 input, the first two layers are deleted. The number of bands (N) in the input image can be any integer.
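The relationship between patch size and network depth noted above can be checked with the valid-convolution size formula; the filter sizes below are illustrative assumptions (the exact sizes are those in Figure 1 and [39]), chosen only to reproduce the "drop the first layer(s)" pattern:

```python
# Sketch: why smaller input patches support fewer convolutional layers.
# With no padding, a k x k convolution shrinks a patch of size s to s - k + 1.
def valid_conv_sizes(patch, kernels):
    """Spatial size after each 'valid' (no-padding) convolution."""
    sizes = [patch]
    for k in kernels:
        patch = patch - k + 1
        if patch < 1:          # patch exhausted: this layer cannot be applied
            break
        sizes.append(patch)
    return sizes

# Hypothetical filter sizes (3, 3, 2, 2) for the four convolutional layers.
print(valid_conv_sizes(7, [3, 3, 2, 2]))  # full network: 7 -> 5 -> 3 -> 2 -> 1
print(valid_conv_sizes(5, [3, 2, 2]))     # first layer deleted: 5 -> 3 -> 2 -> 1
print(valid_conv_sizes(3, [2, 2]))        # first two layers deleted: 3 -> 2 -> 1
```

With these assumed kernels, a 7 × 7 patch supports all four layers, a 5 × 5 patch only the last three, and a 3 × 3 patch only the last two, matching the layer-deletion rule above.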

CNN-3D
CNN-3D is a CNN-based method [50] with two convolutional layers and two fully connected layers. The two convolutional layers have pooling units, and all four layers use ReLU units. Figure 2 shows the architecture of this second CNN-based method used for land cover classification. This network was previously used for hyperspectral pixel classification; the details have been described by the authors of [50]. Due to the use of pooling layers, the patch size cannot be too small; otherwise, the later layers would have no input left to operate on.
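The pooling constraint can be illustrated with a small size calculation; the 3 × 3 kernels and 2 × 2 pooling below are assumptions for illustration, not the exact values from [50]:

```python
# Sketch: why CNN-3D needs large patches. Each no-padding 3x3 convolution
# followed by 2x2 pooling maps a spatial size s to (s - 3 + 1) // 2.
def after_conv_pool(size, layers=2, kernel=3, pool=2):
    for _ in range(layers):
        size = (size - kernel + 1) // pool
    return size

print(after_conv_pool(17))  # 17 -> 7 -> 2: the later layers still see data
print(after_conv_pool(13))  # 13 -> 5 -> 1
print(after_conv_pool(7))   # 7 -> 2 -> 0: nothing left for the later layers
```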





EMAP
For a grayscale image f and a sequence of threshold levels {Th 1 , Th 2 , . . . Th n }, the attribute profile (AP) of f was obtained by applying a sequence of thinning and thickening attribute transformations to every pixel in f as follows:

AP(f) = {φn(f), φn−1(f), . . ., φ1(f), f, γ1(f), γ2(f), . . ., γn(f)}, (1)

where φ i and γ i (i = 1, 2, . . . n) are the thickening and thinning operators at threshold Th i , respectively. The EMAP of f was then acquired by stacking two or more APs computed with different attributes, such as purely geometric attributes (e.g., area, length of the perimeter, image moments, shape factors) or textural attributes (e.g., range, standard deviation, entropy) [36,39-41], as shown in (2). Further technical details about EMAP can be found in [36,39-41].
When generating the EMAP-augmented bands in the conducted investigations, no feature dimension reduction process (such as Principal Component Analysis) was applied to the hyperspectral data, because the bands of interest were only RGB+NIR rather than all the available hyperspectral bands. EMAP was directly applied to this limited number of bands (RGB+NIR) with the selected attribute profiles, which were 'area' (a) and 'length of the diagonal of the bounding box' (d) [43]. The lambda parameters in EMAP for both attributes were arbitrarily chosen. For the 'area' attribute, which is related to modeling spatial information, a higher number of extrema (lambda parameters) retains more detail, whereas a smaller number smooths out the input data. In this work, smoothing was favored, and the lambda parameters for the area attribute, which form the sequence of thresholds used by the morphological attribute filters, were set to 10 and 15. For the 'length of the diagonal of the bounding box' attribute, which is related to the shape of the regions, the lambda parameters were set to 50, 100, and 500. With these two attributes and their parameter settings, 10 augmented bands were generated for a single-band image; the total number of image bands becomes 11 when including the original single-band image. Because the resultant EMAP-augmented bands were few in number, no feature dimension reduction was applied to them, and all of the EMAP-augmented bands were used with the deep learning classifiers.
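As a consistency check, the band counts implied by these settings (referred to elsewhere in the paper as the 44 and 55 EMAP-augmented bands cases) can be computed directly, since each lambda value contributes one thinning and one thickening band per input band:

```python
# Band-count arithmetic for the EMAP settings in the text.
area_lambdas = [10, 15]          # 'area' attribute thresholds
diag_lambdas = [50, 100, 500]    # 'length of the diagonal of the bounding box'

# One thinning and one thickening transformation per lambda value.
augmented_per_band = 2 * (len(area_lambdas) + len(diag_lambdas))  # -> 10
total_per_band = augmented_per_band + 1   # plus the original band -> 11

print(total_per_band * 4)  # RGB+NIR        -> 44 EMAP-augmented bands
print(total_per_band * 5)  # RGB+NIR+LiDAR  -> 55 EMAP-augmented bands
```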

Dataset
The hyperspectral image dataset for the University of Houston area and the corresponding LiDAR data were used in this paper. This dataset, together with its ground truth land cover maps, was obtained from the IEEE GRSS Data Fusion package [44] and was used in the 2013 IEEE Geoscience and Remote Sensing Society Data Fusion Contest. The hyperspectral data in this dataset contain 144 bands ranging in wavelength from 380 nm to 1050 nm, with a spectral width of 4.65 nm and a spatial resolution of 2.5 m per pixel. The LiDAR data contain the height information and have a resolution of 2.5 m per pixel. Table 1 displays the number of training and test data pixels per land cover class provided in this dataset. There were 15 land cover classes, which were identified and named by the contest. These classes and their corresponding training and test datasets (each pixel in the training and test dataset had its own land cover class label) were fixed. The training dataset included 2832 pixels, and the test dataset included the remaining 12,197 labeled pixels. Figure 3a shows the color image, and Figure 3b,c show the color image with ground truth land cover annotations for the training and test datasets, respectively. The brightness and contrast of the color image in Figure 3 were adjusted for better visual assessment.
Figure 3. (a) Color image. (b) Color image overlaid with ground truth training data land cover pixels. (c) Color image overlaid with ground truth test data land cover pixels.
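As a quick sanity check, the reported spectral range and channel width are consistent with the 144 bands:

```python
# Dataset consistency check: spectral range divided by the channel width
# should roughly match the 144 reported hyperspectral bands.
span_nm = 1050.0 - 380.0   # 670 nm spectral range
width_nm = 4.65            # spectral width per channel
n_channels = span_nm / width_nm
print(round(n_channels))   # -> 144
```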

Performance Metrics
For the classification performance evaluation of the deep learning methods with EMAP-augmented bands and all other band combinations, we used the overall accuracy (OA), which is the ratio of the sum of correctly classified pixels from all classes to the total number of pixels in the test data. In addition to OA, we also report the average accuracy (AA), which corresponds to the average of the individual class accuracies and is also known as 'balanced accuracy'. The last performance metric was the well-known Kappa (K) coefficient [51]. Table 2 shows the classification results of our customized CNN model [48] for the test dataset with six different sets of image bands. Table 2 consists of two parts: the first part corresponds to the resultant performance metrics (OA, AA, and Kappa), and the second part corresponds to the correct classification accuracy for each land cover type using the investigated band combinations with our customized CNN model. Figures 4 and 5 show these results using bar charts. When applying the customized CNN model, two different patch sizes were investigated, 3 × 3 and 5 × 5. In Figures 4 and 5, the value in parentheses after the band combination type corresponds to the patch size; as an example, (p3) indicates a patch size of 3 × 3.
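The three metrics can be computed from a confusion matrix as in the following sketch (numpy, with a made-up two-class example):

```python
import numpy as np

# Minimal sketch of OA, AA, and Kappa from a confusion matrix C, where
# C[i, j] counts test pixels of true class i predicted as class j.
def oa_aa_kappa(C):
    C = np.asarray(C, dtype=float)
    n = C.sum()
    oa = np.trace(C) / n                               # overall accuracy
    aa = np.mean(np.diag(C) / C.sum(axis=1))           # average (balanced) accuracy
    pe = np.sum(C.sum(axis=0) * C.sum(axis=1)) / n**2  # chance agreement
    kappa = (oa - pe) / (1 - pe)                       # Cohen's Kappa
    return oa, aa, kappa

# Made-up two-class example: 90 + 30 correct out of 140 test pixels.
oa, aa, k = oa_aa_kappa([[90, 10], [10, 30]])
print(round(oa, 3), round(aa, 3), round(k, 3))  # -> 0.857 0.825 0.65
```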

Our Customized CNN Results
The highest classification accuracy was found with the 44 EMAP-augmented bands case using the 3 × 3 patch size, followed by the 55 EMAP-augmented bands case with the 3 × 3 patch size. Among the 15 different land covers, two land covers, synthetic grass and tennis court, had perfect classification, followed by close-to-perfect classification accuracy for the soil land cover. The lowest classification accuracy was found for the highway class. In the investigations with the four bands (RGB+NIR), these bands correspond to narrow hyperspectral bands directly retrieved from the hyperspectral data. These bands are Red (R), Green (G), Blue (B), and NIR, and the band numbers in the hyperspectral data for the RGB and NIR bands are (R), #30 (G), #22 (B), and #103 (NIR).
We would like to mention that there is another dataset known as Trento data that contains both hyperspectral and LiDAR images. However, the Trento dataset is no longer publicly available to researchers.


Figure 6 shows the estimated land cover maps for the whole image with our customized CNN model. The estimated land cover map with the highest overall accuracy for each image band set is shown in Figure 6.
Figure 6. Estimated land cover maps by our customized CNN method for the whole image.

CNN-3D Results
Table 3 shows the classification results of the CNN-3D method [50] for the test dataset with six different sets of image bands.
With this method, two different patch sizes, 13 × 13 and 17 × 17, were investigated for three of the five sets of bands. As mentioned earlier, the patch sizes cannot be too small due to the pooling layers in CNN-3D. For instance, if the patch size is 7 × 7, the patch size will drop to single digits after the first two layers, and hence the network cannot function as desired. For the two remaining band sets, the 144 hyperspectral bands and the 144 hyperspectral bands+LiDAR, the GPU memory was not sufficient for CNN-3D to run. Consequently, only the patch size of 7 × 7 was considered for the 144 hyperspectral bands and the 144 hyperspectral bands+LiDAR cases. Even though comparing the 7 × 7 patch size results of these two image band sets with the 13 × 13 and 17 × 17 patch size results of the other three image band sets would not be fair, we still included these results in Table 3 to provide a rough idea about the impact of adding the LiDAR band to the hyperspectral bands. Figure 7 corresponds to the bar chart of the resultant performance metrics (OA, AA, and Kappa), and Figure 8 shows the bar chart of the correct classification accuracy for each land cover type using the investigated band combinations with the CNN-3D model in Table 3. With CNN-3D, the highest classification accuracy was obtained with the 55 EMAP-augmented bands and 44 EMAP-augmented bands cases using a patch size of 13 × 13. The 55 EMAP-augmented bands case was observed to have slightly higher accuracy than the 44 EMAP-augmented bands case. The classification results with all other image band sets had relatively lower values. For the 55 EMAP-augmented bands case, the top three close-to-perfect correct classifications were for the soil, synthetic grass, and tennis court land covers, whereas the lowest classification accuracy was for the highway class.
In our customized CNN model results, similar observations were made with respect to the land covers with the top three best and the worst classification accuracies. Figure 9 shows the estimated land cover maps with this method for the whole image. For image band sets with more than one investigated patch window size, the one providing the better classification performance is shown in Figure 9.
Figure 9. CNN-3D method estimated land cover maps for the whole image.
Table 4 shows the highest classification accuracy values obtained with our customized CNN model and the CNN-3D method for each of the six image band sets. In five of these six cases, our customized CNN method was observed to perform better than the CNN-3D method. CNN-3D performed better than our customized CNN method only in the 55 EMAP-augmented bands case.


Performance Comparison of Deep Learning Methods
Considering the best classification results from either of the two deep learning methods for each image band set, one interesting observation from Table 5 is that using the RGB+NIR bands (four bands) without LiDAR and using the hyperspectral bands (144 bands) without LiDAR resulted in slightly better classification accuracies than the same cases with LiDAR included. For the 44 EMAP-augmented bands and 55 EMAP-augmented bands cases, a similar observation was made when the average accuracy and Kappa metrics were considered instead of the overall accuracy, since these two metrics were slightly higher in the 44 EMAP-augmented bands case than in the 55 EMAP-augmented bands case; for the overall accuracy metric, the two cases were also very close in value. Considering the results with and without LiDAR in these three cases, including LiDAR did not seem to have a considerable impact with the deep learning methods. This is somewhat counterintuitive, since LiDAR was expected to bring valuable height information. A possible explanation is that several of the 15 land covers in this dataset are surface-level land covers, meaning they have similar heights, with the exception of the trees, residential, and commercial land covers. When looking at the classification accuracies of these three land cover classes for the LiDAR-added cases to see whether LiDAR had any impact, mixed results were seen as well. It was anticipated that no significant impact of LiDAR would be observed with the two deep learning methods, because their classification performances were already good when using the RGB+NIR bands.
By comparing the classification maps of Figures 6 and 9, it can be seen that our customized CNN results look crisper due to the use of small patch sizes. In a nutshell, the two CNNs differed only in the pooling layers. In our investigations, we found that the pooling layers did not help the overall classification performance.

Comparison with Conventional Methods
The deep learning methods' best classification performances for the 44 EMAP-augmented bands and 55 EMAP-augmented bands cases were compared with two conventional methods, Joint Sparse Representation (JSR) [14] and Support Vector Machine (SVM) [47], using the same two sets of EMAP-augmented bands (44 and 55). JSR is more computationally demanding, as it exploits neighborhood pixels for joint land type classification. In JSR, a 3 × 3 or 5 × 5 patch of pixels is used in the S target matrix. The parameter s0 in Equation (13) of [52] is the design parameter that controls the number of sparse elements; details of the mathematics have been described by the authors of [52]. We chose that parameter to be 10^-4. There were two SVM parameters, the penalty factor C and the radial basis function (RBF) kernel width γ, which were chosen to be 10 and 0.1, respectively. In addition to these two conventional methods, classification results reported in two papers that used this same dataset with hyperspectral and LiDAR bands augmented with EMAP were also considered for comparison. Details of the JSR and SVM results have been described by the authors of [52].
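For reference, the RBF kernel used by the SVM baseline, with the kernel width quoted above (γ = 0.1), can be sketched as follows; the feature vectors here are made-up stand-ins for EMAP-augmented pixel features:

```python
import numpy as np

# RBF kernel K(x, y) = exp(-gamma * ||x - y||^2), with gamma = 0.1 as in
# the SVM settings quoted in the text.
def rbf_kernel(x, y, gamma=0.1):
    d2 = np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return np.exp(-gamma * d2)

x = np.zeros(3)                    # made-up 3-feature pixel vector
print(rbf_kernel(x, x))            # identical pixels -> 1.0
print(rbf_kernel(x, [1.0, 2.0, 0.0]))  # more distant pixels -> smaller value
```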
The resultant overall accuracies of these methods are shown in Table 5. The two conventional methods were observed to perform extremely close in accuracy to the deep learning methods for the 55 EMAP-augmented bands case. For the 44 EMAP-augmented bands case, on the other hand, the deep learning methods' performances were considerably better than those of the conventional methods. From Table 5, it can be noticed that adding the LiDAR band to the RGB+NIR bands for EMAP augmentation made a significant impact on the overall classification accuracy with the conventional classifiers, JSR and SVM. However, no considerable impact was observed when adding the LiDAR band to the EMAP augmentation with the deep learning methods, as discussed before.
In [44], the authors investigated three different combinations of hyperspectral bands for land cover classification. Among these, the investigation that used the hyperspectral bands, additional bands from LiDAR data, and the EMAP-augmented hyperspectral bands resulted in an overall accuracy of 90.65%, whereas the overall accuracy when using only the hyperspectral bands and the EMAP-augmented hyperspectral bands was 84.40%. In another paper [53], a graph-based approach was proposed to fuse hyperspectral and LiDAR data for land cover classification, using the same dataset with three different band combinations. Their investigation with hyperspectral bands only, using an SVM classifier, had an accuracy of 80.72%; our result using only hyperspectral bands is in line with this, as we reached a slightly higher accuracy of 81.48% with our customized CNN method. A second investigation described by the authors of [53] used the morphological profiles of the hyperspectral and LiDAR bands and reached an accuracy of 86.39%.
In our case, even though only four bands (RGB+NIR) and five bands (RGB+NIR+LiDAR) were used with EMAP for augmentation, decent classification accuracies of 87.92% (our CNN) and 87.96% (CNN-3D) were reached, respectively. These overall accuracy values are higher than those of the investigations described by the authors of [44,53], which used all hyperspectral bands and additional LiDAR bands together with EMAP augmentation. This is quite impressive considering that we augmented only four bands with EMAP instead of using all hyperspectral bands and additional bands from LiDAR and still reached an overall accuracy of 87.92%. Our work shows that, with deep learning methods using a fewer number of bands and utilizing EMAP augmentation, it is possible to arrive at very decent classification accuracies for land cover classification, eliminating the need for hundreds of hyperspectral bands. This could ultimately reduce computation and data storage needs. We also compared the computational costs of the different methods, using a desktop PC with an i7 quad-core CPU and a GPU (NVIDIA GeForce Titan X Pascal, 12 GB GDDR5X). The JSR and SVM runs used only the CPU, while the CNN and CNN-3D runs used the GPU. Table 6 summarizes the computational times in minutes. It can be seen that SVM was much faster than JSR, because JSR needed much more time to minimize the sparseness constraint. The CNN and CNN-3D times are similar and less than three minutes, but a direct comparison might be misleading: the deep learning methods used the GPU, hence their computational times are lower than those of the conventional methods.

Conclusions
In this paper, we investigated the performance of two deep learning methods for land cover classification, where only four bands (RGB+NIR) and five bands (RGB+NIR+LiDAR) were used with EMAP augmentation. In the performance evaluations, the RGB+NIR bands and the hyperspectral bands were also used without EMAP. The results showed that, using deep learning methods and EMAP augmentation, RGB+NIR (four bands) or RGB+NIR+LiDAR (five bands) can produce very good classification performance, with results better than those obtained by conventional classifiers such as SVM and JSR. We observed that, even though adding the LiDAR band to the RGB+NIR bands for augmentation had a significant classification performance impact with the conventional classifiers (JSR and SVM), no significant impact was observed with the deep learning methods, since their classification performances were already very good when using the RGB+NIR bands. This work demonstrated that using RGB+NIR (four bands) or RGB+NIR+LiDAR (five bands) with EMAP augmentation is feasible for land cover classification, as the resultant accuracies are only a few percentage points lower than some of the best performing methods in the literature that used the same dataset and utilized all the available hyperspectral bands.
The fusion of different classifiers using decision level fusion [54] or pixel level fusion [55] could be a good future direction. Parallel implementation of JSR and SVM using parallel processing techniques could be another future direction.