Improving Land Cover Classification Using Extended Multi-Attribute Profiles (EMAP) Enhanced Color, Near Infrared, and LiDAR Data

Hyperspectral (HS) data have found a wide range of applications in recent years. Researchers have observed that more spectral information helps land cover classification performance in many cases. However, in some practical applications, HS data may not be available due to cost, data storage, or bandwidth issues. Instead, users may only have RGB and near infrared (NIR) bands available for land cover classification. Sometimes, light detection and ranging (LiDAR) data may also be available to assist land cover classification. A natural research problem is to investigate how well land cover classification can be achieved under the aforementioned data constraints. In this paper, we investigate the performance of land cover classification while using only four bands (RGB+NIR) or five bands (RGB+NIR+LiDAR). A number of algorithms have been applied to a well-known dataset (2013 IEEE Geoscience and Remote Sensing Society Data Fusion Contest). One key observation is that, with the help of synthetic bands generated by Extended Multi-attribute Profiles (EMAP), some algorithms can achieve better land cover classification performance using only four bands than using all 144 bands in the original hyperspectral data. Moreover, LiDAR data improve the land cover classification performance even further.


Introduction
Hyperspectral (HS) images have been used in many applications [1]. Examples of HS sensors include Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) [2] and Adaptive Infrared Imaging Spectroradiometer (AIRIS) [3]. The AVIRIS images have 224 bands in the range of 0.4 to 2.5 µm. AIRIS is a longwave infrared (LWIR) sensor with 20 bands for the remote detection of chemical agents, such as nerve gas. In the literature, people have used HS data in small target detection [4,5], fire damage assessment [6,7], anomaly detection [8][9][10][11][12][13][14], chemical agent detection and classification [3,15], border monitoring [16], change detection [17][18][19][20], and Mars mineral map abundance estimation [21,22]. There are also many papers on land cover classification. For instance, the fusion of HS and LiDAR data was proposed in [23] and applied to the 2013 IEEE Geoscience and Remote Sensing Society (GRSS) Data Fusion Contest dataset. All 144 bands with help from Extended Multi-attribute Profiles (EMAP) were used. The results achieved 90% in overall accuracy. In another paper [24], a graph-based approach was proposed in order to fuse HS and LiDAR data for land cover classification.
However, HS sensors are expensive and they usually demand large data storage. In some real-time applications, HS data may need to be transmitted via bandwidth constrained channels to ground stations for data processing. The above scenarios prohibit the use of HS sensors for some applications, such as precision agriculture, where farmers may have a limited budget. In [25], some challenges in practical applications are mentioned. One of them is that, in the event that no HS data are available, synthetic spectral bands using EMAP may be a good alternative. Some recent applications of using EMAP for soil detection and change detection can be found in [26].
In this paper, we focus on addressing the above practical problem in land cover classification, where only a few bands, namely RGB and near infrared (NIR) bands, are available. Under such data constrained situations, our first question is about what performance we can achieve for land cover classification while only using those four bands. Second, if synthetic bands using EMAP are available, what kind of performance boost can one get? Is the resulting performance close to the case of using all hyperspectral bands? Third, if LiDAR data are available, can we see even further enhancement in land cover classification?
It should be noted that RGB and NIR images are mostly used for vegetation detection, which involves the simple calculation of the normalized difference vegetation index (NDVI). The generation of NDVI can be done in real time. Some recent researchers have also used color images for visual object classification (VOC) using deep learning methods (DeepLabV3+ [27], SegNet [28], Pyramid Scene Parsing network (PSP) [29], and Fully Convolutional Network (FCN) [30]). In principle, those deep learning methods for object detection using color images can be adapted to land cover classification. Recently, our team initiated an investigation along that direction in [31]. However, those deep learning methods require a lot of training data and may not yield better results when data are scarce, which is the case for the IEEE Houston dataset presented in this paper.
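The NDVI calculation mentioned above is a simple per-pixel band ratio; a minimal sketch (the reflectance values and the 0.3 vegetation cutoff are illustrative, not taken from the dataset):

```python
import numpy as np

# NDVI = (NIR - Red) / (NIR + Red), computed per pixel.
# red and nir are toy 2 x 2 reflectance images (values are illustrative).
red = np.array([[0.10, 0.40], [0.05, 0.30]])
nir = np.array([[0.50, 0.45], [0.60, 0.31]])

ndvi = (nir - red) / (nir + red + 1e-12)   # epsilon guards against 0/0

# Dense vegetation typically has NDVI well above 0.3.
vegetation_mask = ndvi > 0.3
```

Because the ratio involves only two bands and elementwise arithmetic, it is easy to see why NDVI can be generated in real time.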
In our investigations, we applied nine algorithms to the 2013 IEEE GRSS Data Fusion Contest dataset [23] for land cover classification: three hyperspectral detection methods, Matched Subspace Detection (MSD), Adaptive Subspace Detection (ASD), and Reed-Xiaoli Detection (RXD); their kernel versions; and Sparse Representation (SR), Joint SR (JSR), and Support Vector Machine (SVM). Some of them cannot be directly applied and required some customization. For instance, RXD is a well-known technique for anomaly detection; we modified it for land cover classification. The customization of existing algorithms can be considered our first contribution. In our studies, we clearly saw the advantages of using EMAP: in most cases, the EMAP versions resulted in a significant performance increase. Quantifying the performance gain from using synthetic bands for land cover classification is our second contribution. Moreover, we also confirmed that land cover classification performance can be further enhanced if LiDAR is combined with EMAP. Confirming that LiDAR does help the classification accuracy is our third contribution.
It is worth mentioning that the proposed methodology has also been applied to another dataset, known as the Trento dataset, which contains both hyperspectral and LiDAR data. The results clearly demonstrated that our proposed approach achieved land cover classification results very close to the state-of-the-art methods in the literature while using only RGB, NIR, and LiDAR with EMAP. However, we did not have permission to publish the results related to the Trento dataset. Interested readers can contact us directly for those results.
The paper is organized as follows. In Section 2, we review the classification algorithms, the 2013 IEEE GRSS Data Fusion Contest data, EMAP, and the evaluation metrics. In Section 3, we summarize our findings. The classification results using different methods and combinations of spectral bands are presented with tables and classification maps. Moreover, we compare our results with two representative results in the literature for the same dataset. Our results using only four available bands with help from EMAP are very close to the results presented in [23,24] that used all 144 bands. Finally, we conclude the paper with a few remarks.

Land Cover Classification Methods
Although land cover classification can be done using object-based detection methods, here we perform pixel-based classification. This is because the IEEE dataset only has ground truth land cover labels at individual pixels rather than full land cover maps. There are 15 land cover classes in the 2013 IEEE GRSS Data Fusion Contest. For each class, a number of signatures from the training data are available. For a particular classifier, the classification process begins by detecting each class separately. The maximum detection value at each pixel location across all classes' maps is then taken, and the index of that maximum value becomes the class label. Aggregating all of those classification labels into a two-dimensional (2D) matrix yields the overall classification map. The details of each method are as follows.
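The per-pixel decision rule just described (take the maximum detection value across the 15 per-class maps) can be sketched as follows; the detection scores here are random placeholders standing in for the outputs of any of the detectors below:

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, height, width = 15, 4, 5

# One detection-score map per class; random values stand in for the
# outputs of a detector (MSD, ASD, RXD, ...) run over the whole image.
detection_maps = rng.random((n_classes, height, width))

# The label at each pixel is the index of the maximum detection value
# across the per-class maps; adding 1 gives class labels 1..15.
classification_map = np.argmax(detection_maps, axis=0) + 1
```

The resulting 2D integer matrix is exactly the overall classification map described above.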

Matched Subspace Detection (MSD)
MSD [32] classifies a given pixel by matching its signature against target and background subspaces. Two competing hypotheses are considered, target absent (H0) and target present (H1):

H0: x = Bζ + n (target absent),  (1)
H1: x = Tθ + Bζ + n (target present),  (2)

where T and B are the orthogonal matrices whose column vectors span the target and background/non-target subspaces, respectively, θ and ζ are the unknown vectors that hold the coefficients for the corresponding columns of T and B, and n represents random noise. These hypotheses can then be transformed into a generalized likelihood ratio test (GLRT) to predict whether a specific pixel is a target or background pixel:

D_MSD(x) = (x^T (I − P_B) x) / (x^T (I − P_TB) x),  (3)

where P_B is the orthogonal projection onto the subspace spanned by B and P_TB is the orthogonal projection onto the combined subspace spanned by [T B]. We chose MSD as one of the methods because MSD has been proven to work well in some hyperspectral target detection applications [32]. In this paper, we use MSD as follows. To detect class i, we use pixel signatures from the training samples in class i to form the target matrix T_i and the samples from the other classes to form the background matrix B_i. We then insert T_i and B_i into (1) and (2) and perform target detection for class i using (3) over the whole image cube. The detection map for class i is saved, and the process is repeated for i = 1 to 15. The label corresponding to the maximum value among the 15 maps at each pixel location is the class identity.
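The MSD GLRT can be sketched numerically; the following toy example (random subspaces and synthetic pixels, not the paper's data) shows that a pixel drawn from the target subspace scores higher than a background pixel:

```python
import numpy as np

def projection(A):
    """Orthogonal projection matrix onto the column space of A."""
    return A @ np.linalg.pinv(A)

def msd_score(x, T, B):
    """MSD GLRT: x^T (I - P_B) x / x^T (I - P_TB) x.
    Larger values favor the 'target present' hypothesis."""
    I = np.eye(len(x))
    P_B = projection(B)                    # background subspace projector
    P_TB = projection(np.hstack([T, B]))   # combined subspace projector
    return (x @ (I - P_B) @ x) / (x @ (I - P_TB) @ x + 1e-12)

rng = np.random.default_rng(1)
d = 10
T = rng.standard_normal((d, 3))            # target subspace basis
B = rng.standard_normal((d, 3))            # background subspace basis

# Synthetic pixels: one in the target subspace, one in the background,
# each with a little additive noise.
x_target = T @ np.array([1.0, 2.0, -1.0]) + 0.01 * rng.standard_normal(d)
x_bg = B @ np.array([0.5, -1.0, 2.0]) + 0.01 * rng.standard_normal(d)

score_t = msd_score(x_target, T, B)
score_b = msd_score(x_bg, T, B)
```

Running the detector over a whole image amounts to evaluating `msd_score` at every pixel with the class-specific T_i and B_i.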

Adaptive Subspace Detection (ASD)
ASD [32] follows a process very similar to MSD with a slightly different set of hypotheses:

H0: x = n (target absent),
H1: x = Uθ + n (target present),

where U is the orthogonal matrix whose column vectors are eigenvectors of the target subspace, θ is the unknown vector whose entries are coefficients for the target subspace, and n is the random Gaussian noise. These hypotheses are then used to derive a GLRT similar to that of MSD. The classification is done for each class first, and the final decision is made by picking the class label that corresponds to the maximum detection value at each pixel location.
Similar to MSD, we adopted ASD simply because it has been proven to work quite well in target detection while using hyperspectral images [32,33].

Reed-Xiaoli Detection (RXD)
In the hyperspectral image processing community, RXD [34] is usually used for anomaly detection. It is simple and efficient. We would like to emphasize that, to the best of our knowledge, no one has applied RXD to land cover classification before. Here, we apply RXD in a very different way. For land cover classification, RXD follows the same procedure as MSD and ASD, using H0 and H1 hypotheses and combining them to generate a GLRT. To detect pixels in class i, the background pixels come from the training samples of the other 14 classes. That is, RXD can be expressed as:

D_RXD(r) = (r − µ_b)^T C_b^(−1) (r − µ_b),

where r is the test pixel, µ_b is the estimated sample mean of the 14 background classes, and C_b is the background covariance of the training samples in the 14 other classes. The process repeats for all 15 classes, one run per class. The final classification is done by choosing the class label that corresponds to the maximum detection value at each pixel location. The kernel versions of each of the above methods (MSD, ASD, and RXD) all follow a similar fashion.
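The RXD score is a Mahalanobis distance; a minimal numerical sketch, with a synthetic Gaussian background standing in for the 14-class training pool:

```python
import numpy as np

def rxd_score(r, background):
    """RXD score: Mahalanobis distance of pixel r from the background
    distribution estimated from training samples (rows of `background`)."""
    mu_b = background.mean(axis=0)
    C_b = np.cov(background, rowvar=False)
    diff = r - mu_b
    return diff @ np.linalg.inv(C_b) @ diff

rng = np.random.default_rng(2)
# Toy background pool: 200 samples of a 5-band distribution, standing in
# for the training pixels of the 14 "other" classes.
background = rng.standard_normal((200, 5))

r_typical = np.zeros(5)          # close to the background mean
r_anomalous = np.full(5, 6.0)    # far from the background mean

s_typical = rxd_score(r_typical, background)
s_anomalous = rxd_score(r_anomalous, background)
```

In the classification setting above, a pixel that is "anomalous" with respect to the other 14 classes scores high for class i, which is why the maximum detection value is taken.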

Kernel MSD (KMSD)
In [32], it was demonstrated that KMSD performs better than MSD. In light of that, we also included KMSD in our investigations. In KMSD, the input data are implicitly mapped by a nonlinear function Φ into a high-dimensional feature space F. The detection model in F is then given by:

H0: Φ(x) = B_Φ ζ_Φ + n_Φ (target absent),
H1: Φ(x) = T_Φ θ_Φ + B_Φ ζ_Φ + n_Φ (target present),

where T_Φ, B_Φ, θ_Φ, ζ_Φ, and n_Φ are the feature-space counterparts of the target matrix, background matrix, coefficient vectors, and noise, as defined in [32]. The kernelized GLRT for KMSD can be found in Equation (9) of [32].

Kernel ASD (KASD)
Similar to KMSD, KASD [32] was also adopted in our investigations because of its good performance in [32]. In KASD, the detection formulation can be written as:

H0: Φ(x) = n_Φ (target absent),
H1: Φ(x) = U_Φ θ_Φ + n_Φ (target present),

where U_Φ, θ_Φ, and n_Φ are the feature-space counterparts of the target subspace matrix, the coefficient vector, and the noise; the various variables are defined in [15]. The final detector in kernelized form can be found in Equation (30) of [15].

Kernel RXD (KRXD)
The reason for including KRXD is that KRXD was demonstrated to perform much better than RXD in [34]. In KRXD, every pixel is transformed to a high-dimensional space via a nonlinear transformation Φ. The kernel representation for the dot product in feature space between two arbitrary vectors x_i and x_j is expressed as

k(x_i, x_j) = ⟨Φ(x_i), Φ(x_j)⟩.

A commonly used kernel is the Gaussian radial basis function (RBF) kernel

k(x, y) = exp(−||x − y||^2 / c),

where c is a constant and x and y are spectral signatures of two pixels in a hyperspectral image. This kernel function is the well-known kernel trick that avoids the explicit computation of high-dimensional features and enables the implementation of KRXD. Details of KRXD can be found in [19,34].
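The RBF kernel can be evaluated for all pixel pairs at once; a small sketch (random spectra, and c = 2 chosen arbitrarily):

```python
import numpy as np

def rbf_kernel(X, Y, c=2.0):
    """Gaussian RBF kernel matrix K[i, j] = exp(-||x_i - y_j||^2 / c).
    This is the kernel trick: inner products in the implicit feature
    space F without ever forming Phi(x) explicitly."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / c)   # clamp tiny negatives

rng = np.random.default_rng(3)
X = rng.standard_normal((6, 4))   # six pixels, four spectral bands
K = rbf_kernel(X, X)
```

The resulting Gram matrix K is symmetric with unit diagonal, which is all the kernelized detectors (KMSD, KASD, KRXD) need from the data.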

Sparse Representation (SR)
In [16], we applied SR to detect soil due to illegal tunnel excavation, and SR was observed to be one of the best performing methods; we therefore also included SR in this paper. SR exploits the structure of having only a few nonzero rows in the coefficient matrix S by solving the row-sparsity-constrained problem

min_S ||X − DS||_F  subject to  ||S||_row,0 ≤ s_0,  (13)

where X holds the signature values of the given pixel(s), D is the dictionary of class signatures, ||S||_row,0 is defined as the number of nonzero rows of S, and s_0 is a pre-defined maximum row-sparsity parameter. In practice, the nonconvex row-sparsity constraint is relaxed to the convex l_1,q norm with q > 1, which encourages a common sparsity pattern across multiple observations.
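The row-sparse recovery problem above is often solved greedily in practice; the following is a minimal sketch of Simultaneous Orthogonal Matching Pursuit (SOMP), a common greedy alternative to the convex l_1,q program (the dictionary and signals here are synthetic, not class signatures from the dataset):

```python
import numpy as np

def somp(X, D, s0):
    """Simultaneous OMP for the row-sparse model X ~ D S.
    X: (bands, pixels), D: (bands, atoms), s0: max nonzero rows of S."""
    residual = X.copy()
    support = []
    S_s = np.zeros((0, X.shape[1]))
    for _ in range(s0):
        # Pick the atom most correlated with the residual across all pixels.
        corr = np.abs(D.T @ residual).sum(axis=1)
        corr[support] = -np.inf          # never re-pick an atom
        support.append(int(np.argmax(corr)))
        # Least-squares refit on the current support, then update residual.
        D_s = D[:, support]
        S_s, *_ = np.linalg.lstsq(D_s, X, rcond=None)
        residual = X - D_s @ S_s
    S = np.zeros((D.shape[1], X.shape[1]))
    S[support, :] = S_s
    return S

rng = np.random.default_rng(4)
D = rng.standard_normal((20, 10))
D /= np.linalg.norm(D, axis=0)           # unit-norm dictionary atoms
S_true = np.zeros((10, 3))
S_true[[2, 7], :] = rng.standard_normal((2, 3))
X = D @ S_true                           # three pixels built from two atoms
S_hat = somp(X, D, s0=2)
```

For classification, the residual per class sub-dictionary (or the recovered support) indicates which class best explains the pixel.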

Joint Sparse Representation (JSR)
Similar to SR, JSR was used in our earlier study on soil detection [16]. Although JSR is more computationally demanding, it exploits neighboring pixels for joint land type classification. In JSR, a 3 × 3 or 5 × 5 patch of pixels is used to form the target matrix S. The formulation is the same as Equation (13), but with an added dimension to S that accounts for each pixel within the chosen patch size. Details of the mathematics can be found in [16].

Support Vector Machine (SVM)
SVMs were first suggested in the 1960s [35] for classification and have since been an area of intense research, owing to developments in the techniques and theory, coupled with extensions to regression and density estimation. An SVM is a general architecture that can be applied to pattern recognition and classification [36], regression estimation, and other problems such as speech and target recognition. An SVM can be constructed from a simple linear maximum-margin classifier that is trained by solving a convex quadratic programming problem with constraints.
The reason for including SVM in our experiments is simply because several past papers [23,24] also used SVM in land cover classifications.

EMAP
In this section, we briefly introduce EMAP, which has been shown to yield good classification performance when one has only a few spectral bands available. Mathematically, given an input grayscale image f and a sequence of threshold levels {Th_1, Th_2, . . . , Th_n}, the attribute profile (AP) of f is obtained by applying a sequence of thinning and thickening attribute transformations to every pixel in f, as follows:

AP(f) = {φ_n(f), φ_(n−1)(f), . . . , φ_1(f), f, γ_1(f), . . . , γ_(n−1)(f), γ_n(f)},

where φ_i and γ_i (i = 1, 2, . . . , n) are the thickening and thinning operators at threshold Th_i, respectively.
The EMAP of f is then acquired by stacking two or more APs computed on features extracted from the multispectral/hyperspectral data (e.g., via a feature reduction technique), using different attributes, such as purely geometric attributes (e.g., area, length of the perimeter, image moments, shape factors) or textural attributes (e.g., range, standard deviation, entropy) [37][38][39][40].
In this paper, the "area (a)" and "length of the diagonal of the bounding box (d)" attributes of EMAP [26] were used. For the area attribute, the two thresholds used by the morphological attribute filters were set to 10 and 15. For the length attribute, the thresholds were set to 50, 100, and 500. These thresholds were chosen based on experience, as we observed them to yield consistent results in our experiments. With this parameter setting, EMAP creates 11 bands for a given single-band image: the original band plus ten synthetic bands.
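As an illustration of attribute filtering with the area thresholds 10 and 15 above, here is a simplified sketch on a binary toy image using area opening as the attribute thinning (real EMAP implementations operate on grayscale max-tree representations and also include the dual thickenings):

```python
import numpy as np
from scipy import ndimage

def area_opening_binary(img, area_threshold):
    """Binary area opening: remove connected foreground components
    with fewer than `area_threshold` pixels (an attribute thinning)."""
    labels, n = ndimage.label(img)
    if n == 0:
        return img.astype(bool)
    sizes = np.asarray(ndimage.sum(img, labels, index=range(1, n + 1)))
    keep_labels = 1 + np.flatnonzero(sizes >= area_threshold)
    return np.isin(labels, keep_labels)

def attribute_profile(img, thresholds):
    """Simplified AP: the original image stacked with one thinning per
    threshold; a full AP also includes the dual thickenings."""
    bands = [img.astype(bool)]
    for th in thresholds:
        bands.append(area_opening_binary(img, th))
    return np.stack(bands)

img = np.zeros((8, 8), dtype=bool)
img[0:2, 0:2] = True      # 4-pixel component (removed by both thresholds)
img[4:8, 4:8] = True      # 16-pixel component (survives both thresholds)
ap = attribute_profile(img, thresholds=[10, 15])
```

Each filtered band keeps only structures whose attribute exceeds the threshold, which is the spatial information EMAP injects as synthetic bands.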
EMAP has been used in hyperspectral image processing before. More technical details and applications of EMAP can be found in [37][38][39][40]. In fact, in [23,24], EMAP has been used for land cover classification before. One key difference between the above references and our approach here is that we applied EMAP to only RGB+NIR and RGB+NIR+LiDAR, whereas the above methods all used the original hyperspectral data.

Dataset Used
From the IEEE GRSS Data Fusion package [23], we obtained the ground truth classification pixels, the hyperspectral image of the University of Houston area, and the LiDAR data of the same area. The instruments used to collect the dataset are a hyperspectral sensor and a LiDAR sensor. The hyperspectral data contain 144 bands ranging in wavelength from 380 to 1050 nm with a spatial resolution of 2.5 m. Each band has a spectral width of 4.65 nm. The LiDAR data contain the elevation information with a resolution of 2.5 m. Table 1 displays the number of training and testing pixels per class. Figure 1 below shows the Houston area with the ground truth classifications used to determine the overall accuracy.
The predetermined training data set includes 2832 pixels and the testing data include the remaining 12,197 labeled pixels from the University of Houston dataset. The results from using this predetermined training data were considerably worse than those from a random subsampling approach, which suggests that the predetermined training data might not be entirely representative of the testing data. In any event, we conducted our investigations with the predetermined data set in order to compare our results with past studies.

There are a number of datasets used for analysis, as shown in Table 2. The first group is the RGB bands (bands #60, #30, and #22 in the hyperspectral data) and the NIR band (band #103). It should be noted that this selection of bands is not the same as band selection in the literature [41]; in band selection, the objective is to select the most informative bands out of the available hyperspectral bands, whereas in our case we are restricted to having only a few bands. One might have some concerns about the use of narrow bands in the HS data to emulate RGB and NIR bands. We will address this issue in Section 3.2; it turns out that such simple selection is almost equivalent to creating "realistic" RGB and NIR bands by applying the spectral responses of actual color and NIR imagers to the hyperspectral bands. We call this group Dataset-4 (DS-4). The second group is RGB and NIR coupled with LiDAR data, denoted as Dataset-5 (DS-5). The third group is the four-band group put through EMAP augmentation to produce 44 bands, as each band produces ten additional bands on top of the original band (denoted as Dataset-44 (DS-44)). The fourth group is the five-band group plus EMAP augmentation (denoted as Dataset-55 (DS-55)). The fifth group is the full hyperspectral image of 144 bands (denoted as Dataset-144 (DS-144)). Finally, the last group is the 144 bands + LiDAR case (denoted as Dataset-145 (DS-145)).
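The band counts of the dataset variants follow directly from EMAP's 11-bands-per-input expansion; a trivial bookkeeping sketch:

```python
# Band counts for each dataset variant: EMAP expands every input band
# into 11 bands (the original plus 10 synthetic attribute bands).
EMAP_BANDS_PER_INPUT = 11

datasets = {
    "DS-4": 4,                          # RGB + NIR
    "DS-5": 5,                          # RGB + NIR + LiDAR
    "DS-44": 4 * EMAP_BANDS_PER_INPUT,  # EMAP applied to the 4 optical bands
    "DS-55": 5 * EMAP_BANDS_PER_INPUT,  # EMAP applied to all 5 bands
    "DS-144": 144,                      # full hyperspectral cube
    "DS-145": 145,                      # full cube + LiDAR
}
```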
The DS-4 and DS-5 editions require less computational time but suffer a degradation in accuracy. DS-5 results in a smaller reduction in accuracy with a minimal increase in time, since the LiDAR data help the classification of tall structures as well as the differentiation of trees, shrubs, and grass. These two cases are not expected to work well in land cover classification. Meanwhile, the DS-44 and DS-55 cases provide a middle ground in both time consumption and accuracy loss that, depending on the method, could prove useful in practical applications. The full hyperspectral image (DS-144) and the DS-145 case (144 + LiDAR) are simply used as benchmarks against the rest of the combinations.

Evaluation Metrics
We have adopted overall accuracy (OA), average accuracy (AA), and the kappa (κ) coefficient as the performance metrics, in order to be consistent with existing land cover classification methods in the literature. OA is defined as the ratio between the sum of correctly classified pixels from all classes and the total number of pixels in all classes. AA is the average of the individual class accuracies. The kappa coefficient is defined as κ = (overall accuracy − random accuracy)/(1 − random accuracy), where random accuracy is also known as accuracy by chance. For more details, please visit L3 Harris' website at https://www.harrisgeospatial.com/docs/CalculatingConfusionMatrices.html.
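All three metrics can be computed from a confusion matrix; a small sketch with a toy two-class matrix (values illustrative):

```python
import numpy as np

def classification_metrics(confusion):
    """OA, AA, and kappa from a confusion matrix (rows: true classes,
    columns: predicted classes)."""
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    oa = np.trace(confusion) / total
    per_class = np.diag(confusion) / confusion.sum(axis=1)  # class accuracies
    aa = per_class.mean()
    # Random (chance) accuracy from the row and column marginals.
    ra = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / total**2
    kappa = (oa - ra) / (1.0 - ra)
    return oa, aa, kappa

# Toy 2-class confusion matrix: 85 of 100 pixels classified correctly.
cm = [[40, 10],
      [5, 45]]
oa, aa, kappa = classification_metrics(cm)
```

For this toy matrix the chance accuracy is 0.5, so kappa = (0.85 − 0.5)/(1 − 0.5) = 0.7, illustrating how kappa discounts agreement by chance.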

Results of Using Narrow Bands
Analysis was conducted using each method, with the different combinations of bands in Table 2, to determine their accuracy and computational efficiency. Performance is evaluated with the overall accuracy (OA), average accuracy (AA), and Kappa (κ) coefficient metrics defined above.
Tables 3-5 summarize the key metrics (OA, AA, and Kappa) of each method for different band combinations. Red numbers indicate the best accuracy for each method and bold numbers indicate the best performance for each dataset. Tables 6-8 provide the detailed class-specific accuracies. For the majority of the tables, accuracy improves progressively from the fewest to the most bands used; the most obvious exception is the full hyperspectral image. It should be noted that, without EMAP, the performance metrics of SVM in DS-4 and DS-5 are, in general, inferior to the full HS cases. This answers our first question raised in Section 1 about the feasibility of using only RGB and NIR for land cover classification: if one only uses the RGB and NIR bands, the land cover classification performance is not accurate enough.

The advantage of using EMAP is evident here. In most cases, the EMAP versions resulted in a significant performance increase. For instance, when we used the four bands with EMAP (DS-44), the JSR and SVM methods achieved 80.77% and 82.64% overall accuracy, respectively, whereas the corresponding results without EMAP (DS-4) are only 59.83% and 70.43%. When using EMAP with five bands (DS-55), Joint Sparse Representation (JSR) provided the highest overall accuracy at 86.86%, compared with 70.81% without EMAP (DS-5) using the same five bands. This is followed by SVM, again using EMAP with five bands (DS-55), which had an overall accuracy of 86.00%, compared with 74.62% without EMAP (DS-5). The above practically answers our second question raised in Section 1 about whether EMAP can help classification performance.
In several instances, the accuracy decreases from the DS-55 case of each method to the DS-144 case. Interestingly, for both of the two best performing methods (JSR and SVM), if all 144 original bands were used, the classification accuracies were 72.57% (JSR in DS-144) and 78.68% (SVM in DS-144), respectively, which are considerably lower. This shows that sometimes using all of the hyperspectral bands for land cover classification can lead to poor results. We will discuss this point in Section 3.2. From Tables 3-5, it is also quite clear that LiDAR did help the performance, especially for the SVM cases. For example, DS-5 is better than DS-4 in SVM; DS-55 is better than DS-44 in SVM; and DS-145 is better than DS-144 in SVM. This answers the third question raised in Section 1 about whether LiDAR can help the classification performance.
The next important consideration is computational complexity. As a general rule, all of the standard methods are faster than their kernel counterparts, and the kernel methods are faster than the SR and JSR methods. Table 9 shows the elapsed time (ET) values during training, in minutes. Table 9 adopts the same layout as Tables 3-5, except that the best value is the minimum value and no bold values are used. A Windows 7 PC without GPU (16 GB RAM and an i7 CPU) was used in our experiments. By looking at Table 9, it is clear that certain methods have major advantages in terms of ET. The most drastic difference in time involves the SR and JSR methods: while the kernel methods take up to two hours to process the image, JSR can take over a day and a half. The final three methods are the most consistently accurate, but only SVM is really worth the increased consistency, given the vast amount of time that SR and JSR take to generate results. At its slowest, JSR classifies about five pixels per second; in contrast, MSD at its slowest still processes over 1250 pixels per second and KASD processes about 85 pixels per second. JSR simply cannot keep up. In the SVM row of Table 9, we noticed that the ET training times for DS-4 and DS-5 are actually higher than the other cases. This observation might look strange, but we have repeated our experiments multiple times and the ET times are all correct. The reason is most likely that the support vectors in SVM are obtained via an iterative optimization process during training, which stops when the optimization metric reaches a pre-specified threshold. We believe that, in the DS-44, DS-55, DS-144, and DS-145 cases, convergence was faster than in the DS-4 and DS-5 cases and, hence, less training time was needed to reach good performance.
This might be corroborated by the fact that DS-4 and DS-5 indeed have inferior performance to the other cases: more computational iterations (more time) were needed, and yet the final performance metrics were still lower than those of the other cases.
The accuracy values are generated from a small subsection of the full image, but the ET value is calculated across the entire image: the ground truth covers only around twelve thousand pixels, while the full image encompasses more than 650,000 pixels. It is therefore important to look at the classification maps of the full image in order to obtain a better sense of the accuracy of the methods. All of the classification maps (Figures A1-A9) are in Appendix A, but the classification maps of the DS-44 case for the nine methods are shown in Figure 2, as it is the most practical case and gives a good sense of the accuracy of each method.

Comparison with Khodadadzadeh et al.'s Results [23]
Here, we extracted some numbers from Table V in [23] and put them in Table 10. According to [23], the case of X h +EMAP(X h ) used all 144 bands and some additional EMAP bands; the case of X h + AP(X L ) used 144 bands and some additional bands from the LiDAR data; the last case is the combination of 144 bands, LiDAR, and EMAP bands.
We also extracted our best performing numbers from Table 3 and put them in Table 10. Our DS-44 case includes four optical bands (RGB+NIR) and 40 EMAP bands. This case is similar to X h +EMAP(X h ), except that we only used four bands out of the 144 bands. It can be seen that our results are only 2 to 4% lower than those using 144 bands. The DS-55 case includes LiDAR information. Comparing our results to the two cases (X h + AP(X L )) and (X h + AP(X L ) + EMAP(X h )) in [23], our results are only 1 to 4% lower.
The above comparison shows that our results using only four or five bands with EMAP are only a few percentage points lower than those of MLRsub in [23]. This means that it is feasible to use RGB+NIR with EMAP for practical land cover classification.

As can be seen in Figure 2, a large distortion is clearly present in the right quarter of each image. This is most likely caused by a cloud, which then affects the detection performance of each method. It can also be seen that certain maps, such as KRXD (sub-image (f)), have a decent amount of noise in their classifications. Even with that noise, the ground truth accuracy for KRXD is still around 64%. In contrast, the SVM map is quite consistent, with much less noise.

Comparison with Liao et al.'s Results [24]
Next, we compare our results with those generated by Generalized Graph-based Fusion (GGF) [24]. We extracted some numbers from Table I of [24] and put them in Table 10. The HS case (known as Raw HS in [24]) is the result of directly applying SVM to the 144 bands. The MPSHSLi case used SVM on features generated by morphological profiles of HS and LiDAR. The GGF case applied SVM to GGF features (both hyperspectral and LiDAR).
When comparing our DS-44 case with the Raw HS case in [24], one can see that our results are almost the same. Comparing our DS-55 case to the MPSHSLi case in [24], the results are also comparable. Finally, our DS-55 results are 7% lower than those of GGF. However, it is important to mention that the GGF method utilized some additional information from the test samples. In the paragraph below Figure 2 of Liao et al.'s paper [24], the authors mention that "in our experiments, 5000 samples were randomly selected to train . . . our proposed GGF". As there are only 2832 pixels in the training data, the 5000 samples must have come from the test data, so this is not a fair comparison between GGF and our results.

Wide RGB and NIR Bands
When we created the RGB and NIR bands, we directly selected the corresponding narrow bands from the hyperspectral data. However, commercial color imagers cover different wavelength ranges. Some imagers have narrow bands that indeed resemble hyperspectral bands, while in other imagers the spectral response of each individual band is much wider. See Figure 3 for the actual spectral responses of the RGB and NIR bands of a commercial imager [42]. It is therefore important to investigate the robustness of our approach with respect to different sensors: will the bandwidth affect the observations in this paper? Here, we carry out a comparative study between our choice of narrow RGB and NIR bands and wide-band images synthesized by combining multiple bands from the hyperspectral image using the actual spectral response functions.
Here, we first synthesize wide RGB and NIR bands from the hyperspectral bands. Each wide band is generated as a weighted average of the hyperspectral bands, where the weights come from the spectral response curves shown in Figure 3: we multiply each weight by the corresponding band in the hyperspectral data and then add the results together. After generating the four wide bands, we apply the EMAP algorithm to generate the synthetic EMAP bands, and finally we apply the SVM classifier to those wide bands. Table 11 shows the classification results. From Table 11, it can be seen that the performance gap between the narrow bands taken from the HS data and the wide bands is less than 0.5%. In four out of 12 cases, the narrow bands perform slightly better; in the remaining cases, the wide bands perform slightly better, which makes sense because the wide bands integrate more spectral information. In short, the land cover classification performances of narrow and wide bands are comparable, and our observations hold regardless of the bandwidths.
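The weighted-average synthesis described above can be sketched as follows. The spectral response curve here is a hypothetical Gaussian stand-in, not the actual sensor curve of Figure 3, and the cube is random toy data.

```python
import numpy as np

def synthesize_wide_band(cube, response):
    """Synthesize one wide band as a response-weighted average of HS bands.

    cube:     (rows, cols, n_bands) hyperspectral cube
    response: (n_bands,) sampled spectral response of the target wide band
    """
    w = np.asarray(response, dtype=float)
    w = w / w.sum()   # normalize the weights so they sum to 1
    # Weighted average over the spectral axis, per pixel.
    return np.tensordot(cube, w, axes=([2], [0]))

# Toy cube: 5x5 pixels, 10 narrow bands.
rng = np.random.default_rng(1)
cube = rng.uniform(0.0, 1.0, size=(5, 5, 10))

# A hypothetical Gaussian-shaped "red" response centered on band 6.
centers = np.arange(10)
red_response = np.exp(-0.5 * ((centers - 6) / 1.5) ** 2)
wide_red = synthesize_wide_band(cube, red_response)
print(wide_red.shape)  # (5, 5)
```

Note that with a flat response the synthesized band reduces to the plain mean of the narrow bands, which is a useful sanity check when plugging in real sensor curves.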

Discussion
It should be noted that our emphasis in this paper is to investigate the performance of using four bands (DS-4) and, if LiDAR data are available, five bands (DS-5) for land cover classification, with the four and five bands augmented by EMAP. We provide some additional discussion in the following.

Full Hyperspectral Data vs. Synthetic Bands
One might expect the case of using all 144 bands of hyperspectral data to yield the best performance. This turns out not to be the case in our findings. Although we do not have a solid theory to explain this behavior, similar observations appear in other researchers' results. For instance, in Table IV of [43], at least two approaches (PCA and MPsHSI) utilized fewer bands and yet still achieved better overall accuracies than using all bands. In Table 4 of [41], the method known as ISSC used fewer bands and yet performed better than using all bands. In Table 5 of [41], another method known as LP also used fewer bands and achieved higher accuracy than using all bands. One potential explanation for these observations is the curse of dimensionality in hyperspectral data; in other words, more bands may confuse the classifiers in some ways. Another potential explanation, suggested by one anonymous reviewer, is the "redundancy" issue in hyperspectral data: many redundant spectral bands may be more harmful than fewer non-redundant bands. However, more theoretical research is needed to fully understand these observations.
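The redundancy argument can be made concrete with a small numerical illustration: when many bands are noisy copies of one underlying signal, a single principal component captures almost all of the variance, so the extra bands add dimensionality without adding information. The data below are entirely synthetic and only illustrate the statistical effect, not the actual hyperspectral bands.

```python
import numpy as np

# Simulate highly redundant "spectral bands": each of 50 bands is the same
# underlying signal plus small independent noise.
rng = np.random.default_rng(2)
n_pixels, n_bands = 1000, 50
signal = rng.normal(size=(n_pixels, 1))
bands = signal + 0.1 * rng.normal(size=(n_pixels, n_bands))

# PCA via SVD of the mean-centered data matrix.
centered = bands - bands.mean(axis=0)
s = np.linalg.svd(centered, compute_uv=False)
explained = s**2 / np.sum(s**2)   # fraction of variance per component

print(f"variance captured by first component: {explained[0]:.3f}")
# With this much redundancy, one component carries nearly all the variance,
# so most of the 50 bands contribute dimensionality but little information.
```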

EMAP Based Augmentation vs. Deep Learning
Another natural question is whether EMAP augmentation is outdated in the era of deep learning, since deep learning has some generalization capability. This viewpoint may or may not be valid, depending on the application. For the same dataset, we are currently investigating two deep learning methods using only four (RGB+NIR) or five (RGB+NIR+LiDAR) bands. The first, which we developed for soil detection using multispectral images, is a customized structure with six layers; details of the architecture can be found in [44]. The second was developed by other researchers for hyperspectral image classification, and we used the open-source code from GitHub [45]. Both deep learning methods are based on convolutional neural networks (CNNs).
In our preliminary experiments, we observed that the overall accuracies are around 80% using four or five bands. However, with the EMAP augmented bands, the deep learning results are improved to close to 88%.
In our opinion, the power of deep learning can only emerge when a vast amount of training data is available. Unfortunately, in the IEEE GRSS Data Fusion Contest dataset, the training samples are much fewer than the testing samples, and consequently the deep learning methods did not show their power in boosting the land cover classification performance. We plan to complete this deep learning work in the near future.

Potential of Using Object Based Approaches
In recent years, object based classification approaches have gained popularity in remote sensing. Object based approaches involve segmentation and classification steps, and they avoid the salt-and-pepper classification maps generated by pixel based approaches. Moreover, object based approaches can incorporate geometric shapes, sizes, and spectral information into the land cover classification process. In theory, object based approaches should have better performance in land cover classification, and some researchers have concluded that object based methods outperform pixel based methods [46,47].
Unfortunately, the training, testing, and ground truth label data are all in pixels for both the IEEE and the Trento datasets. That is, we do not have ground truth land cover maps for either dataset. A good future direction would be to work with the dataset owners to define and generate ground truth land cover maps for all of the land cover types, so that object based approaches can be evaluated on those datasets.
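To illustrate why an object based step suppresses salt-and-pepper noise, the sketch below relabels every segment with the majority class of its pixels, given any segmentation map and any pixel based classification map. Both inputs here are toy arrays; the segmentation method itself is left unspecified.

```python
import numpy as np

def majority_vote_per_segment(pixel_labels, segments):
    """Relabel each segment with the majority class of its pixels.

    pixel_labels: (rows, cols) per-pixel class map from any pixel based classifier
    segments:     (rows, cols) integer segment ids from any segmentation step
    """
    out = np.empty_like(pixel_labels)
    for seg_id in np.unique(segments):
        mask = segments == seg_id
        votes = np.bincount(pixel_labels[mask])
        out[mask] = np.argmax(votes)   # majority class wins the whole segment
    return out

# Toy example: one 4x4 segment of class 1 with two "salt-and-pepper" errors.
pixel_labels = np.ones((4, 4), dtype=int)
pixel_labels[0, 0] = 0
pixel_labels[2, 3] = 2
segments = np.zeros((4, 4), dtype=int)   # the whole image is a single segment
cleaned = majority_vote_per_segment(pixel_labels, segments)
print(np.array_equal(cleaned, np.ones((4, 4), dtype=int)))  # True
```

Evaluating such a step properly would still require object-level ground truth maps, which, as noted above, are not available for these datasets.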

Conclusions and Future Directions
In this paper, we have investigated the performance of land cover classification while only using four bands (RGB+NIR) or five bands (RGB+NIR+LiDAR). Our first key observation is that the land cover classification performance using four or five bands without EMAP is not good enough, regardless of the classification algorithm. Our second key observation is that, with help from EMAP, the four or five bands can produce very good classification performance using the SVM and JSR algorithms.
We also observed that LiDAR data further enhance the classification performance. Comparing our results with representative papers in the literature shows that using four or five bands with EMAP is feasible for land cover classification, as the accuracies are only a few percentage points lower than some of the best performing methods in the literature that utilize all of the hyperspectral bands. Computational time adds another consideration: the best performing methods often take significantly longer to process the data. The one exception is the SVM method, which performs above average in all scenarios and is the best in all but the DS-55 case, while maintaining a computational time of less than five minutes in all but the DS-4 case.
There are several future directions for our work. The first is whether one can fuse multiple classification maps to further improve the classification accuracy. For instance, we applied nine methods, each generating a land cover classification map; it would be interesting to investigate the fusion of those nine maps via voting or Dempster-Shafer fusion algorithms. The second direction is to explore deep learning approaches for land cover classification using only the RGB+NIR bands. A third direction is to investigate which bands in the EMAP enhanced data are more useful than others.
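The map-fusion idea in the first direction can be sketched with simple per-pixel majority voting; a Dempster-Shafer scheme would additionally weight each classifier's belief. The three toy maps below stand in for the nine classification maps.

```python
import numpy as np

def fuse_maps(maps):
    """Fuse several classification maps by per-pixel majority vote.

    maps: (n_classifiers, rows, cols) integer class maps
    """
    maps = np.asarray(maps)
    n_classes = maps.max() + 1
    # Count, for every pixel, how many classifiers voted for each class.
    votes = np.stack([(maps == c).sum(axis=0) for c in range(n_classes)])
    return np.argmax(votes, axis=0)   # ties resolve toward the lower class id

# Toy example: three 2x2 maps that disagree on two pixels.
m1 = np.array([[0, 1], [1, 1]])
m2 = np.array([[0, 1], [2, 1]])
m3 = np.array([[0, 1], [1, 0]])
fused = fuse_maps([m1, m2, m3])
print(fused.tolist())  # [[0, 1], [1, 1]]
```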