Article

Relative Total Variation Structure Analysis-Based Fusion Method for Hyperspectral and LiDAR Data Classification

1
Department of Remote Sensing Science and Technology, School of Electronic Engineering, Xidian University, Xi’an 710071, China
2
Laboratory of Information Processing and Transmission, L2TI, Institut Galilée, University Paris XIII, 93430 Paris, France
3
Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
4
Academy of Advanced Interdisciplinary Research, Xidian University, Xi’an 710071, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2021, 13(6), 1143; https://doi.org/10.3390/rs13061143
Submission received: 9 February 2021 / Revised: 7 March 2021 / Accepted: 12 March 2021 / Published: 17 March 2021
(This article belongs to the Section Remote Sensing Image Processing)

Abstract

The fusion of hyperspectral image (HSI) and light detection and ranging (LiDAR) data has a wide range of applications. This paper proposes a novel feature fusion method for urban area classification, namely the relative total variation structure analysis (RTVSA), to combine various features derived from HSI and LiDAR data. In the feature extraction stage, a variety of high-performance methods, including the extended multi-attribute profile, Gabor filter, and local binary pattern, are used to extract the features of the input data. The relative total variation is then applied to remove useless texture information from the processed data. Finally, nonparametric weighted feature extraction is adopted to reduce the dimensions. Random forest and convolutional neural networks are utilized to evaluate the fusion images. Experiments conducted on two urban University of Houston datasets (Houston 2012 and the training portion of Houston 2017) demonstrate that the proposed method can extract the structural correlation from heterogeneous data, withstand noise well, and improve the land cover classification accuracy.

1. Introduction

Recently, advances in remote sensing technology have increased the availability of multi-sensor data over the same region and deepened the understanding of the areas under study [1,2]. Specifically, a hyperspectral image (HSI) records hundreds of spectral bands for each pixel, and the rich spectral detail it preserves gives a detailed description of the spectral features of the ground cover [3]. However, HSI may not reliably distinguish objects with the same spectral properties [4]. Light detection and ranging (LiDAR) data provide height information that is complementary to the spectral detail [1,5], but objects of the same elevation made from different materials cannot be separated using LiDAR elevation information alone [6]. Therefore, fusing the high spectral resolution of HSI with the structural information given by LiDAR provides more complete and enhanced surface properties for a broader range of applications, such as forest monitoring [7,8,9], biomass estimation [10], and geological analysis [11]. To make the best use of the information given by different sensors, several feature extraction techniques have been recommended. Typical feature extraction methods include unsupervised techniques such as principal component analysis (PCA) [12], independent component analysis [13], and the maximum noise fraction [14], and supervised methods such as linear discriminant analysis [15]. However, these methods process each pixel separately and are optimized only for extracting spectral features, without taking spatial context into account [16,17,18]. Moreover, it has been shown that image fusion benefits from combining spectral features with spatial information, because the surface materials in remote sensing images are typically spatially regular within a limited region [19,20].
Different feature extraction techniques are commonly adopted to make full use of the spectral and spatial information given by both HSI and LiDAR data. For example, the attribute profile (AP) [21], which focuses on high-level spatial features, has attracted a great deal of interest. It can retain the geometric features of the input image while removing unimportant details [22,23]. The extended multi-attribute profile (EMAP) [24], which was later proposed to create spectral-spatial features through a set of attribute profiles, has been successfully applied to the fusion of HSI and LiDAR data [25,26]. In addition, Gabor features can express spatial structures of various sizes and orientations in the image [27,28]. Lin et al. enhanced the discriminative low-rank Gabor filter to fully explore its capacity to extract useful information from the spectral and spatial domains [29]. The local binary pattern (LBP) [30], a gray-scale and rotation-invariant texture operator, was proposed for texture extraction. Unlike the feature extraction methods described above, the LBP concentrates on texture details (including global contrast information and texture depth) [31]. Furthermore, other advanced tools such as composite kernel methods [32], low-rank representation [20], and edge-preserving filtering [16] are utilized to exploit spectral and spatial information. However, the automatic feature fusion of multiple types of data is not straightforward [6]. To combine the heterogeneous features, several feature fusion methods have been developed. These methods can be roughly divided into five categories: feature-based stack structure, multiple kernel learning, sparse representation, graph-based methods, and deep learning.
  • Feature-based stack structure: Stacking features to create a spectral-spatial cube is relatively simple [33,34]. However, the stacked feature vector assigned to each pixel has a high dimensionality, which leads to the curse of dimensionality when only a limited number of training samples is available [35].
  • Multiple kernel learning: Integrating multi-source data based on multiple kernels is effective [5]. For example, Camps-Valls et al. suggested a general kernel-based architecture that allows multi-sensor images to be combined with contextual information [36]. However, building a suitable kernel and choosing its parameters is challenging [37].
  • Sparse representation: Some fusion strategies combine heterogeneous features by dictionary creation and sparse coefficient solutions based on sparse representation [38,39]. These are non-parametric approaches that do not require any data distribution or mathematical estimation assumptions. Dian et al. formulated the fusion problem as the calculation of the spectral basis and coefficients by taking advantage of the non-local spatial self-similarities, prior knowledge of the spectral unmixing, and a sparse prior [40]. Nevertheless, solving the optimization problem in sparse representation is difficult [35].
  • Graph-based method: The discriminant graph-based method merges heterogeneous features by mining the manifold structure of these features [3,41]. Liao et al. proposed a general graph-based fusion approach that combines dimension reduction and the spectral details with morphological profiles (MPs) and applied the method to HSI and LiDAR fusion [1].
  • Deep learning: Deep learning approaches such as convolutional neural networks (CNNs) can derive joint spectral-spatial features layer by layer [12,42,43,44]. These approaches show great promise for extracting non-linear and hidden features. However, deep learning networks have many hyper-parameters and are thus vulnerable to over-fitting [35].
Total variation (TV) [45] is an efficient regularization technique for image processing and has been commonly used in remote sensing applications such as pansharpening [46], image denoising [47], and feature extraction [48]. In image fusion, TV-based feature extraction performs well. For example, Kumar et al. used a TV-based semi-norm method to fuse the pixels of the input images [49]. Ma et al. formulated fusion as an optimization problem and combined gradient transfer with TV minimization for infrared and visible image fusion [50]. However, there are relatively few studies on TV-based image feature fusion, especially for HSI and LiDAR data fusion. In this paper, a TV-based feature fusion method is proposed for merging urban HSI and LiDAR datasets. In the feature extraction stage, a variety of high-performance methods including EMAP, Gabor, and LBP are used to extract the features of the input data. Then, the relative TV structure analysis (RTVSA) is adopted to combine various features derived from HSI and LiDAR data. The proposed method is evaluated on two publicly accessible urban University of Houston datasets (Houston 2012 and the training portion of Houston 2017). Two classification methods, random forest and a convolutional neural network, are applied to the datasets for land cover classification. The main contributions of this paper are as follows:
(1) This paper proposes a novel algorithm for the fusion of HSI and LiDAR data based on RTV and nonparametric weighted feature extraction. The proposed RTVSA method can effectively improve the classification accuracy of the fused HSI and LiDAR data, extract the structural association from heterogeneous data, and adapt to noise.
(2) The spatial features used in the HSI and LiDAR fusion are studied in this paper. The experiments show that the LBP and EMAP features can achieve high classification accuracy.
The remainder of this paper is structured as follows. Section 2 describes the related work. Section 3 introduces in depth the feature extraction methods and the feature fusion approach proposed in this paper. Section 4 presents the datasets used in this paper. Section 5 shows the experimental results. The discussion is presented in Section 6. Finally, the concluding remarks are given in Section 7.

2. Related Works

To evaluate the quality of the fusion images, two classification methods, including random forest (RF) and the convolutional neural network (CNN), are used to classify the image before and after the fusion.

2.1. Random Forest

Random forest (RF) is a standard ensemble learning method with strong classification performance and high processing speed, and it can effectively prevent over-fitting [51,52,53,54,55,56,57]. RF is a combination of tree predictors. Each tree depends on the values of a random vector that is sampled independently and with the same distribution for all trees in the forest [58]. The generalization error of a forest of tree classifiers depends on the strength of the individual trees and the correlation between them. After a large number of trees has been generated, they vote for the most popular class.
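The voting principle can be illustrated with a minimal sketch. It is a deliberate simplification and not the classifier used in the experiments: each "tree" is a depth-1 stump trained on a bootstrap sample with one randomly chosen feature, and the function names (`train_stump`, `train_forest`, `predict`) and the toy data are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(X, y, feature):
    """Fit a depth-1 tree (one threshold on one feature) by minimising training errors."""
    best = None
    for t in sorted({row[feature] for row in X}):
        left = [lab for row, lab in zip(X, y) if row[feature] <= t]
        right = [lab for row, lab in zip(X, y) if row[feature] > t]
        if not right:  # threshold at the maximum value: no real split
            continue
        l_lab, r_lab = majority(left), majority(right)
        errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
        if best is None or errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    if best is None:  # only one distinct value: always predict the majority class
        m = majority(y)
        return (feature, float("inf"), m, m)
    return (feature,) + best[1:]

def train_forest(X, y, n_trees=25, seed=0):
    """Each tree sees a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        feature = rng.randrange(d)                  # random feature selection
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def predict(forest, row):
    """All trees vote; the most popular class wins."""
    votes = [l if row[f] <= t else r for f, t, l, r in forest]
    return majority(votes)

# Toy two-class data (hypothetical): class 0 near the origin, class 1 near (5, 5).
X = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [5.1, 5.0], [4.8, 5.2], [5.3, 4.9]]
y = [0, 0, 0, 1, 1, 1]
forest = train_forest(X, y)
print(predict(forest, [-1.0, -1.0]), predict(forest, [10.0, 10.0]))
```

A full random forest grows deep trees and samples a random feature subset at every split; the stump version keeps only the two ingredients named in the text, randomised trees and majority voting.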

2.2. Convolutional Neural Network

The convolutional neural network (CNN) is one of the most commonly used deep learning methods for visual data processing. Recent studies have shown that the 3D-CNN achieves satisfactory results for HSI classification. In this paper, the hybrid spectral CNN (HybridSN) [59] is used for classification. Let the input data cube be represented as $I \in \mathbb{R}^{M \times N \times D}$. The input data cube is divided into small overlapping 3D neighboring patches $P \in \mathbb{R}^{S \times S \times D}$. The ground-truth label of each patch is determined by the label of its central pixel $(\alpha, \beta)$; the patch covers the $S \times S$ spatial window and all $D$ spectral bands. 3D convolution is accomplished by convolving a 3D kernel with the 3D data. The feature maps of a convolution layer are created by applying the 3D kernel over several contiguous bands of the input layer, which captures the spectral information. In 3D convolution, the activation value at the spatial location $(x, y, z)$ in the j-th feature map of the i-th layer is:
$$\nu_{i,j}^{x,y,z} = \varphi\left(b_{i,j} + \sum_{\tau=1}^{d_{i-1}} \sum_{\mu=-\eta}^{\eta} \sum_{\rho=-\gamma}^{\gamma} \sum_{\sigma=-\delta}^{\delta} \omega_{i,j,\tau}^{\mu,\rho,\sigma} \times \nu_{i-1,\tau}^{x+\mu,\, y+\rho,\, z+\sigma}\right)$$
where $\varphi$ is the activation function, $b_{i,j}$ is the bias parameter for the j-th feature map of the i-th layer, $d_{i-1}$ is the number of feature maps in the $(i-1)$-th layer, and $\omega_{i,j,\tau}$ holds the weights of the j-th feature map of the i-th layer for the $\tau$-th feature map of the previous layer. The parameters of the CNN are trained in a supervised manner using gradient descent optimization.
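To make the summation concrete, the following minimal sketch evaluates the activation at a single location with plain nested lists. It assumes a ReLU activation and odd kernel sizes $(2\eta+1) \times (2\gamma+1) \times (2\delta+1)$; the function name `conv3d_activation` and the toy values are hypothetical and are not taken from HybridSN.

```python
def conv3d_activation(prev_maps, kernels, bias, x, y, z, phi=lambda t: max(0.0, t)):
    """Activation of one output feature map at location (x, y, z).

    prev_maps[tau][x][y][z] : feature maps of the previous layer
    kernels[tau][mu][rho][sigma] : one 3D kernel per previous feature map
    phi : activation function (ReLU is an assumption of this sketch)
    """
    total = bias
    for fmap, kernel in zip(prev_maps, kernels):
        eta = len(kernel) // 2          # half-width along x
        gamma = len(kernel[0]) // 2     # half-width along y
        delta = len(kernel[0][0]) // 2  # half-width along z
        for mu in range(-eta, eta + 1):
            for rho in range(-gamma, gamma + 1):
                for sigma in range(-delta, delta + 1):
                    weight = kernel[mu + eta][rho + gamma][sigma + delta]
                    total += weight * fmap[x + mu][y + rho][z + sigma]
    return phi(total)

# Toy example: one 3x3x3 input map of ones and a 3x3x3 kernel of 0.5.
ones = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
half = [[[0.5] * 3 for _ in range(3)] for _ in range(3)]
print(conv3d_activation([ones], [half], -1.0, 1, 1, 1))  # 27 * 0.5 - 1 = 12.5
```

The location must lie far enough from the border that every index `x + mu`, `y + rho`, `z + sigma` stays inside the feature map, which corresponds to a convolution without padding.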

3. Methods

In this section, a TV-based method is proposed to fuse the HSI and LiDAR data. The approach consists of two parts: feature extraction and feature fusion. Three high-performance methods, the LBP, EMAP, and Gabor filter, are used for feature extraction. The relative total variation structure analysis method is proposed for feature fusion. The processed data are then classified using random forest and a convolutional neural network, respectively. The flowchart is shown in Figure 1.

3.1. Feature Extraction

3.1.1. Local Binary Pattern

The local binary pattern (LBP) [30] is a simple, yet effective texture descriptor that analyzes each pixel and its neighborhood to summarize the local structure in the image. It has the characteristics of gray-scale invariance and rotation invariance. For each band of the input image, the texture in a local neighborhood is defined as the joint gray-level distribution of the image pixels. Here, $m_c$ is the gray value of the center pixel of the local neighborhood, and $m_p$ ($p = 0, 1, \ldots, P-1$) are the gray values of $P$ equidistant pixels on a circle of radius $Q$ ($Q > 0$) that form a circularly symmetric neighbor set. If the coordinates of $m_c$ are $(0, 0)$, then the coordinates of $m_p$ can be expressed as $(-Q\sin(2\pi p/P),\ Q\cos(2\pi p/P))$. The LBP is calculated by thresholding the neighbors $m_p$ against $m_c$ to create a P-bit binary code:
$$LBP(m_p) = \sum_{p=0}^{P-1} U(m_p - m_c)\, 2^p$$
where $U(\cdot)$ is the thresholding function, with $U(x) = 1$ for $x \ge 0$ and $U(x) = 0$ otherwise.
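As an illustration, the sketch below computes the plain P-bit code of the formula above for one pixel. It makes two simplifying assumptions that are not taken from the paper: the circular neighbor positions are rounded to the nearest pixel instead of being interpolated, and no rotation-invariant mapping is applied. The function name `lbp_code` and the sample image are hypothetical.

```python
import math

def lbp_code(img, r, c, P=8, Q=1):
    """Plain LBP code of pixel (r, c) in a 2D list of gray values.

    Neighbour p lies at (-Q sin(2 pi p / P), Q cos(2 pi p / P)) relative to the
    centre; offsets are rounded to the nearest pixel (a simplification).
    """
    center = img[r][c]
    code = 0
    for p in range(P):
        dx = round(-Q * math.sin(2 * math.pi * p / P))  # column offset
        dy = round(Q * math.cos(2 * math.pi * p / P))   # row offset
        if img[r + dy][c + dx] >= center:               # U(m_p - m_c) = 1
            code += 2 ** p
    return code

img = [[5, 9, 1],
       [4, 6, 7],
       [2, 6, 3]]
print(lbp_code(img, 1, 1))  # bits 0, 4 and 6 are set: 1 + 16 + 64 = 81
```

With $P = 8$ and $Q = 1$, the rounded offsets coincide with the eight pixels of the 3 × 3 neighborhood, which is why this simplification is common for that setting.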

3.1.2. Extended Multi-Attribute Profile

The extended multi-attribute profile (EMAP) relies on the application of the attribute profile (AP) to the data. The AP is a multi-level decomposition of the image by attribute filters [24]. Depending on the attribute considered, such as the area, range, length of the diagonal of the bounding box, or entropy, APs extract different information from the image. The size of the structuring element (SE) also affects the degree of processing of the input image. The AP is built on dual operators, so that brighter and darker components of the image are processed by opening and closing by reconstruction, respectively. The opening and closing by reconstruction of the image $I$ are respectively given by:
$$\gamma_S^{i}(I) = S_I^{\delta}\left(\xi^{i}(I)\right)$$
$$\phi_S^{i}(I) = S_I^{\xi}\left(\delta^{i}(I)\right)$$
where $\xi^{i}$ and $\delta^{i}$ are the erosion and dilation with an SE of size $i$, respectively, and $S_I^{\delta}$ and $S_I^{\xi}$ denote reconstruction by dilation and by erosion, respectively. Unlike the morphological profile (MP), the AP progressively simplifies the image as the filter threshold increases, because the filtering is cumulative. To ensure that the ordering of the attribute filters holds, the family of criteria $T_i$ must satisfy $i \le j \Rightarrow T_i \subseteq T_j \Rightarrow \gamma^{T_i} \ge \gamma^{T_j}$. Therefore, the AP can be regarded as the concatenation of a thickening AP, $\Pi_{\phi^T}$, and a thinning AP, $\Pi_{\gamma^T}$:
$$AP(I) = \left\{ \Pi_i : \begin{cases} \Pi_i = \Pi_{\phi^{T_\theta}}, & \text{with } \theta = (n - 1 + i),\ \forall \theta \in [1, n] \\ \Pi_i = \Pi_{\gamma^{T_\theta}}, & \text{with } \theta = (i - n - 1),\ \forall \theta \in [n + 1, 2n + 1] \end{cases} \right\}$$
where $T = \{T_1, T_2, \ldots, T_n\}$ is a set of ordered criteria. The EMAP is computed as follows. First, principal component analysis is applied to the input image. An AP is then calculated on each of the $n$ extracted principal components (PCs). The extended attribute profile ($EAP$), composed of the $n$ APs, can be formulated as:
$$EAP = \{AP(PC_1), AP(PC_2), \ldots, AP(PC_n)\}$$
The $EMAP$ is a combination of different EAPs:
$$EMAP = \{EAP_{a_1}, EAP_{a_2}, \ldots, EAP_{a_k}\}$$
where $a_i$ ($i = 1, \ldots, k$) is a generic attribute.

3.1.3. Gabor Filter

Gabor filters have been commonly used in hyperspectral image classification, among which 3D-Gabor filters are efficient and superior in extracting spectral-spatial characteristics [60]. In this paper, to extract the features of input images, the discriminative low-rank Gabor filtering proposed in [29] is adopted. This method decomposes the standard 3D Gabor filter into several sub-filters, which correspond to various combinations of rank-one low-pass and band-pass filters. Compared with the conventional Gabor filter, it preserves the image features that are useful for discrimination. For the input image $I$ with a size of $x, y, z$, the rank-one Gaussian envelope is calculated first:
$$en_i = \frac{1}{(2\pi)^{1/2} \sigma_i} \exp\left(-\frac{i^2}{2\sigma_i^2}\right), \quad i = x, y, z$$
where $\sigma_i$ is the standard deviation. Then, the rank-one cosine harmonic $har_{\cos}$ and sine harmonic $har_{\sin}$ are found, respectively:
$$har_{\cos}^{i} = \cos(i\,\psi_i), \quad i = x, y, z$$
$$har_{\sin}^{i} = \sin(i\,\psi_i), \quad i = x, y, z$$
Each required sub-filter is the product of the Gaussian envelope and the corresponding harmonic function:
$$gau_{\cos}^{i} = en_i \cdot har_{\cos}^{i}, \quad i = x, y, z$$
$$gau_{\sin}^{i} = en_i \cdot har_{\sin}^{i}, \quad i = x, y, z$$
Finally, the 3D Gabor filter response of image $I$ can be obtained by:
$$Gabor_I = I * gau_{\cos}^{x} * gau_{\cos}^{y} * gau_{\sin}^{z}$$
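The construction of the rank-one sub-filters can be sketched as follows. This is a minimal illustration and not the implementation of [29]: the kernels are sampled on integer positions of a truncated window, the response is evaluated at a single voxel of a nested-list cube, and the names `envelope`, `sub_filters`, `response_at` as well as the values of $\sigma$, $\psi$, and the window size are assumptions chosen for the example.

```python
import math

def envelope(sigma, half):
    """Rank-one Gaussian envelope sampled at t = -half..half."""
    return [math.exp(-t * t / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
            for t in range(-half, half + 1)]

def sub_filters(sigma, psi, half):
    """Cosine and sine sub-filters: envelope times the corresponding harmonic."""
    en = envelope(sigma, half)
    ts = range(-half, half + 1)
    gau_cos = [e * math.cos(t * psi) for e, t in zip(en, ts)]
    gau_sin = [e * math.sin(t * psi) for e, t in zip(en, ts)]
    return gau_cos, gau_sin

def response_at(cube, kx, ky, kz, cx, cy, cz):
    """Response of the separable filter kx (x) ky (x) kz at voxel (cx, cy, cz).

    Because the 3D kernel is an outer product of three 1D kernels, the 3D
    convolution reduces to a triple sum over the three sub-filters.
    """
    hx, hy, hz = len(kx) // 2, len(ky) // 2, len(kz) // 2
    total = 0.0
    for a, wx in enumerate(kx):
        for b, wy in enumerate(ky):
            for c, wz in enumerate(kz):
                total += wx * wy * wz * cube[cx - (a - hx)][cy - (b - hy)][cz - (c - hz)]
    return total

# Hypothetical settings: sigma = 1, psi = pi / 4, window of 5 samples per axis.
gau_cos, gau_sin = sub_filters(sigma=1.0, psi=math.pi / 4, half=2)

# A cube whose value grows along z: the sine (band-pass) filter along z reacts to it.
ramp = [[[float(z) for z in range(5)] for _ in range(5)] for _ in range(5)]
print(response_at(ramp, gau_cos, gau_cos, gau_sin, 2, 2, 2))
```

The sine sub-filter is antisymmetric, so it sums to zero and gives no response on a constant cube, while the cosine sub-filters act as smoothing filters along the other two axes.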

3.2. Relative Total Variation Structure Analysis

The proposed feature fusion method, relative total variation structure analysis (RTVSA), includes two parts: RTV-based fusion and feature dimension reduction. The relative total variation (RTV) [61] model can capture meaningful structural information and remove redundant texture information in the image. Furthermore, the approach is general, i.e., it is also suitable for decomposing anisotropic texture. The RTV model does not assume any prior form of the texture to be removed; instead, it introduces a novel measure based on the windowed inherent variation. For the combined HSI and LiDAR image after feature extraction, the RTV model can be expressed as:
$$\arg\min_{s} \frac{1}{2}\|f - s\|_2^2 + \lambda \sum_i \left( \frac{D_x(i)}{L_x(i) + \varepsilon} + \frac{D_y(i)}{L_y(i) + \varepsilon} \right) \tag{14}$$
$$D_x(i) = \sum_{j \in R(i)} g_{i,j}\, \left|(\partial_x s)_j\right|$$
$$D_y(i) = \sum_{j \in R(i)} g_{i,j}\, \left|(\partial_y s)_j\right|$$
$$L_x(i) = \left|\sum_{j \in R(i)} g_{i,j}\, (\partial_x s)_j\right|$$
$$L_y(i) = \left|\sum_{j \in R(i)} g_{i,j}\, (\partial_y s)_j\right|$$
where $f$ is the input image, $s$ is the structure image to be estimated, $\varepsilon$ is a small positive constant that avoids division by zero, $j$ is the index of all pixels in a square region $R(i)$ centered on point $i$, and $g_{i,j}$ is a weighting function specified according to spatial affinity:
$$g_{i,j} \propto \exp\left(-\frac{(x_i - x_j)^2 + (y_i - y_j)^2}{2\sigma^2}\right)$$
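The behavior of the penalty can be checked on a one-dimensional toy signal. The sketch below is an illustration under simplifying assumptions, not the paper's implementation: it uses a 1D forward difference, an unnormalized Gaussian weight truncated at $3\sigma$, and the hypothetical function name `rtv_ratio`. It compares the ratio $D/(L + \varepsilon)$ at a single step edge with the ratio inside an oscillating texture.

```python
import math

def rtv_ratio(s, i, sigma=2.0, eps=1e-3):
    """Relative total variation D(i) / (L(i) + eps) of a 1D signal at position i."""
    grad = [s[j + 1] - s[j] for j in range(len(s) - 1)]  # forward difference
    radius = int(3 * sigma)                              # truncated window R(i)
    d_sum, l_sum = 0.0, 0.0
    for j in range(max(0, i - radius), min(len(grad), i + radius + 1)):
        g = math.exp(-((i - j) ** 2) / (2 * sigma ** 2))  # spatial affinity weight
        d_sum += g * abs(grad[j])  # windowed total variation D
        l_sum += g * grad[j]       # signed sum; its magnitude is the inherent variation L
    return d_sum / (abs(l_sum) + eps)

edge = [0.0] * 10 + [1.0] * 10               # one structural step
texture = [float(k % 2) for k in range(20)]  # oscillating fine texture

print(rtv_ratio(edge, 9))     # close to 1: the gradients in the window agree in sign
print(rtv_ratio(texture, 9))  # much larger: the gradients cancel, so L is small
```

In a window that contains a structural edge, the gradients share one direction and $L$ is as large as $D$; in a textured window the gradients alternate in sign, $L$ nearly vanishes, and the ratio grows. Minimizing the penalty therefore suppresses texture while leaving main structures largely intact.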
In [61], the authors transformed the original nonlinear problem into a set of sub-problems that are easier to solve. The RTV penalty is decomposed into a nonlinear term and a quadratic term, so that the problem can be solved as a sequence of linear systems. To solve Equation (14), the penalty in the x direction is expanded as:
$$\sum_i \frac{D_x(i)}{L_x(i) + \varepsilon} = \sum_j \sum_{i \in R(j)} \frac{g_{i,j}}{\left|\sum_{j \in R(i)} g_{i,j}\, (\partial_x s)_j\right| + \varepsilon} \left|(\partial_x s)_j\right| \approx \sum_j \sum_{i \in R(j)} \frac{g_{i,j}}{L_x(i) + \varepsilon} \cdot \frac{1}{\left|(\partial_x s)_j\right| + \varepsilon_s} (\partial_x s)_j^2 = \sum_j u_{xj}\, w_{xj}\, (\partial_x s)_j^2$$
The y-direction term can be treated similarly. This separates the quadratic term $(\partial_x s)_j^2$ from the nonlinear component $u_{xj} w_{xj}$, whose factors are expressed as:
$$u_{xj} = \sum_{i \in R(j)} \frac{g_{i,j}}{L_x(i) + \varepsilon} = \left( G_\sigma * \frac{1}{\left|G_\sigma * \partial_x s\right| + \varepsilon} \right)_j$$
$$w_{xj} = \frac{1}{\left|(\partial_x s)_j\right| + \varepsilon_s}$$
where $G_\sigma$ is the Gaussian kernel function with standard deviation $\sigma$ and $*$ is the convolution symbol. Then, Equation (14) can be transformed into the following matrix form:
$$(v_s - v_I)^T (v_s - v_I) + \lambda \left( v_s^T C_x^T U_x W_x C_x v_s + v_s^T C_y^T U_y W_y C_y v_s \right)$$
where $v_s$ and $v_I$ represent the column vectors of $s$ and of the input image $I$ (written $f$ in Equation (14)), respectively. $C_x$ and $C_y$ are the Toeplitz matrices of the forward difference gradient operators. $U_x$, $U_y$, $W_x$, and $W_y$ are all diagonal matrices, and the values on their diagonals are $U_x[i,i] = u_{xi}$, $U_y[i,i] = u_{yi}$, $W_x[i,i] = w_{xi}$, and $W_y[i,i] = w_{yi}$. Taking the derivative of the above expression yields the following linear equation, which is solved at each iteration $t$:
$$(\mathbf{1} + \lambda L^t) \cdot v_s^{t+1} = v_I$$
where $\mathbf{1}$ is the identity matrix and $L^t$ is the weight matrix computed from the vector $v_s^t$.
The data processed by the RTV still retain a large number of features, which may be redundant. Many studies have shown that nonparametric weighted feature extraction (NWFE) is a powerful tool for extracting features from remote sensing images. The main idea of NWFE is to assign different weights to each sample and to define new nonparametric between-class and within-class scatter matrices. In NWFE, the distance between each pair of samples is calculated first to form the inverse-distance matrix $dist(h_l^{(i)}, h_k^{(j)})^{-1}$, where $h_l^{(i)}$ represents the l-th sample in class i. Then, the weight function $\varpi_{lk}^{(i,j)}$ is calculated from the distance matrix:
$$\varpi_{lk}^{(i,j)} = \frac{dist(h_l^{(i)}, h_k^{(j)})^{-1}}{\sum_{t=1}^{N_j} dist(h_l^{(i)}, h_t^{(j)})^{-1}}$$
The weighted mean function $M_j(h_l^{(i)})$ can be obtained from $\varpi_{lk}^{(i,j)}$:
$$M_j(h_l^{(i)}) = \sum_{k=1}^{N_j} \varpi_{lk}^{(i,j)}\, h_k^{(j)}$$
The scatter matrix weight $\mu_l^{(i,j)}$ is a function of $h_l^{(i)}$ and $M_j(h_l^{(i)})$, defined as:
$$\mu_l^{(i,j)} = \frac{dist(h_l^{(i)}, M_j(h_l^{(i)}))^{-1}}{\sum_{t=1}^{N_i} dist(h_t^{(i)}, M_j(h_t^{(i)}))^{-1}}$$
NWFE extracts $L$ features by using the weighted between-class and within-class scatter matrices. The extracted $L$ features are the $L$ eigenvectors with the largest eigenvalues of the following matrix:
$$\left(S_{wi}^{NW}\right)^{-1} S_{be}^{NW}$$
where the within-class scatter matrix is defined as:
$$S_{wi}^{NW} = \sum_{i=1}^{C} P_i \sum_{l=1}^{N_i} \frac{\mu_l^{(i,i)}}{N_i} \left(h_l^{(i)} - M_i(h_l^{(i)})\right) \left(h_l^{(i)} - M_i(h_l^{(i)})\right)^T$$
The between-class scatter matrix for $C$ classes is defined as:
$$S_{be}^{NW} = \sum_{i=1}^{C} P_i \sum_{j=1,\, j \ne i}^{C} \sum_{l=1}^{N_i} \frac{\mu_l^{(i,j)}}{N_i} \left(h_l^{(i)} - M_j(h_l^{(i)})\right) \left(h_l^{(i)} - M_j(h_l^{(i)})\right)^T$$
where $N_i$ is the number of training samples in class $i$ and $P_i$ is the prior probability of class $i$.
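The NWFE weighting steps can be sketched for the between-class case ($i \ne j$) as follows. This is a minimal illustration under stated assumptions rather than a full NWFE implementation: Euclidean distance is assumed for $dist$, no sample of class $i$ coincides with a sample or local mean of class $j$ (which would give a zero distance), and the scatter matrices and eigen-decomposition are omitted. The names `local_mean` and `scatter_weights` and the toy samples are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors (an assumption of this sketch)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def local_mean(h, class_j):
    """Inverse-distance weights and weighted mean M_j(h) of class j around sample h."""
    inv = [1.0 / dist(h, hk) for hk in class_j]
    total = sum(inv)
    weights = [v / total for v in inv]  # closer samples get larger weights
    mean = [sum(w * hk[d] for w, hk in zip(weights, class_j)) for d in range(len(h))]
    return weights, mean

def scatter_weights(class_i, class_j):
    """Scatter weights: samples of class i close to the local mean of class j weigh more."""
    inv = [1.0 / dist(h, local_mean(h, class_j)[1]) for h in class_i]
    total = sum(inv)
    return [v / total for v in inv]

class_i = [[0.0, 0.0], [1.0, 0.0]]
class_j = [[2.0, 0.0], [10.0, 0.0]]

weights, mean = local_mean(class_i[0], class_j)
print(weights, mean)                      # the nearer sample [2, 0] dominates the local mean
print(scatter_weights(class_i, class_j))  # the sample nearer to class j gets the larger weight
```

Samples near the class boundary receive larger weights in this scheme, so the scatter matrices emphasize the region where the classes are hardest to separate.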

4. Materials

Two Houston datasets, including hyperspectral images and LiDAR data, were adopted to evaluate the effectiveness of the proposed RTVSA method. Table 1 lists the detailed information about the training and test sets.

4.1. 2012 Houston Dataset

The 2012 Houston dataset (http://www.classic.grss-ieee.org/community/technical-committees/data-fusion/2013-ieee-grss-data-fusion-paper-contest-results/, accessed on 23 December 2020), originally distributed for the 2013 IEEE Geoscience and Remote Sensing Society (GRSS) Data Fusion Contest (DFC), contains an HSI and a LiDAR-derived digital surface model (DSM). This dataset was acquired in June 2012 and covers the University of Houston campus and the surrounding urban areas. The HSI consists of 144 spectral bands in the 380 nm to 1050 nm region and has been calibrated to at-sensor spectral radiance units. It was captured by the ITRES CASI (Compact Airborne Spectrographic Imager) 1500 hyperspectral imager. The LiDAR data were acquired using an Optech Gemini sensor. The size of the HSI and LiDAR-derived data is 349 × 1905 with a spatial resolution of 2.5 m. The ground truth map contains 15 classes and is provided as a raster with a 2.5 m ground sampling distance (GSD). Figure 2 shows the HSI, the LiDAR-derived DSM, and the ground truth map. In Table 1, the division into training data and test data for the 2012 Houston dataset follows the 2013 GRSS DFC.

4.2. 2017 Houston Dataset

The 2017 Houston dataset (http://www.classic.grss-ieee.org/community/technical-committees/data-fusion/2018-ieee-grss-data-fusion-contest/, accessed on 23 December 2020), originally distributed for the 2018 IEEE GRSS Data Fusion Contest, is composed of HSI and multispectral LiDAR data. In this paper, the training portion of the dataset was used. The HSI, captured by an ITRES CASI 1500, contains 48 bands with a spectral range of 0.38–1.05 μm at a 1 m GSD. The multispectral LiDAR data, acquired by an Optech Titan MW (14SEN/CON340), include point cloud data at 1550 nm, 1064 nm, and 532 nm, as well as first-return intensity rasters for each channel and digital surface model (DSM) data with a GSD of 0.5 m. The ground truth map contains 20 urban land cover/land use classes and is provided as a 0.5 m GSD raster with an image size of 1202 × 4768. Figure 3 presents the HSI, LiDAR-derived DSM, and the ground truth map. In Table 1, for each class, one percent of the samples are randomly selected as the training samples and the rest are used as test samples.

5. Results

5.1. Experiment Settings

Three different methods including the LBP, EMAP, and Gabor were adopted in the feature extraction part. In the experiment, their parameters were designed as follows.
  • EMAP: According to References [21,24,25], three different attributes are considered when constructing the EMAP: the area, standard deviation, and length of the diagonal of the bounding box. The EMAP with the area attribute describes the scale of the structures in the scene. To create the profile with the area attribute, the sizes of the SE were set to 10, 15, and 20. The standard deviation attribute is not related to the geometric shape of the regions; it models the homogeneity of the gray levels of the pixels within each region and thus yields a multi-level decomposition of the objects in the scene [21]. For the standard deviation, the size of the SE was 150. The length attribute gives the diagonal length of the smallest rectangle surrounding the connected components, and the sizes of the SEs were set to 50, 100, and 500.
  • Gabor: Before extracting Gabor features, the first seven principal components were extracted from the HSI using principal component analysis. For each Gabor filter, the frequency magnitude $\omega$ was set to $\pi/2$, $\pi/4$, $\pi/8$, and $\pi/16$, and the angle between the frequency vector and the spectral dimension was set to 0, $\pi/4$, $\pi/2$, and $3\pi/4$.
  • LBP: The input data were first normalized. The radius $Q$ of the circle was set to one, and the number of sampling points $P$ on the circularly symmetric neighbor set was eight.
The parameters of the RTVSA method were set as follows: $\lambda = 0.04$, $\sigma = 2$, and $L = 30$. In image classification, according to the results in Reference [62] and our previous work [51], the number of decision trees in the RF was set to 100, and the number of prediction variables was set approximately to the square root of the number of input bands. The settings of the CNN were based on [59]. In the 3D-CNN framework, the dimensions of the 3D convolution kernels were $8 \times 3 \times 3 \times 7 \times 1$, $16 \times 3 \times 3 \times 5 \times 8$, and $32 \times 3 \times 3 \times 3 \times 16$. The mini-batch size was 256, and the network was trained for 100 epochs without batch normalization or data augmentation. No pre-trained models were used, and all models were trained from scratch. A learning rate of 0.001 was selected because it gave the best classification results. The training and test sets were randomly selected from the input data according to Table 1.
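The kernel dimensions above determine how the patch shrinks through the three 3D convolution layers. The following sketch traces the feature-map shapes under assumptions that are not stated in this section: each kernel specification is read as (number of kernels × height × width × depth × input maps), the convolutions use stride 1 without padding, and the patch size $S = 25$ with $D = 30$ bands is a hypothetical example. The function name `conv3d_valid_shape` is also hypothetical.

```python
def conv3d_valid_shape(in_shape, kernel):
    """Output shape of a stride-1, unpadded 3D convolution.

    in_shape = (maps, height, width, depth)
    kernel   = (n_kernels, k_height, k_width, k_depth, in_maps)
    """
    maps, h, w, d = in_shape
    n_kernels, kh, kw, kd, in_maps = kernel
    if maps != in_maps:
        raise ValueError("kernel expects %d input maps, got %d" % (in_maps, maps))
    return (n_kernels, h - kh + 1, w - kw + 1, d - kd + 1)

kernels = [(8, 3, 3, 7, 1), (16, 3, 3, 5, 8), (32, 3, 3, 3, 16)]

shape = (1, 25, 25, 30)  # one input map, S = 25, D = 30 (hypothetical patch)
for k in kernels:
    shape = conv3d_valid_shape(shape, k)
    print(shape)
# (8, 23, 23, 24)
# (16, 21, 21, 20)
# (32, 19, 19, 18)
```

Each layer trims the spatial window by two pixels and the spectral axis by the kernel depth minus one, so the input patch must be large enough for all three layers to produce a non-empty output.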

5.2. Evaluation Indexes

To evaluate the performance of fusion image classification, four commonly used quantitative metrics, namely class accuracy (CA), overall accuracy (OA), average accuracy (AA), and the Kappa coefficient, were used to assess the classification results. Specifically, CA measures the percentage of correctly classified pixels in each class. OA represents the proportion of samples that are correctly classified among all samples. AA is the average of the per-class percentages of correctly classified pixels. The Kappa coefficient measures the agreement between the classification result and the reference labels, corrected for the agreement expected by chance.
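The four metrics can be computed from a confusion matrix as in the minimal sketch below. It assumes that rows hold the reference classes and columns the predicted classes; the function name `accuracy_metrics` and the example matrix are hypothetical.

```python
def accuracy_metrics(cm):
    """Class accuracy, OA, AA and Kappa from a confusion matrix.

    cm[i][j] = number of samples of reference class i predicted as class j.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    diag = [cm[i][i] for i in range(k)]
    row_tot = [sum(cm[i]) for i in range(k)]                        # reference totals
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]   # predicted totals
    ca = [diag[i] / row_tot[i] for i in range(k)]                   # per-class accuracy
    oa = sum(diag) / n                                              # overall accuracy
    aa = sum(ca) / k                                                # average accuracy
    pe = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)   # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return ca, oa, aa, kappa

cm = [[50, 0],
      [10, 40]]
ca, oa, aa, kappa = accuracy_metrics(cm)
print(ca, oa, aa, kappa)  # CA = [1.0, 0.8], OA = 0.9, AA = 0.9, Kappa is approximately 0.8
```

OA weights every sample equally, whereas AA weights every class equally, so the two diverge when the classes are unbalanced; Kappa additionally discounts the agreement that a random labeling with the same class totals would reach.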

5.3. Results and Analysis

5.3.1. The 2012 Houston Dataset

For the 2012 Houston dataset, the RTVSA fusion method was applied based on the LBP, EMAP, and Gabor feature extraction, respectively. Random forest and CNN classifiers were used to classify the results of the different fusion methods. The RF and CNN classification performances are shown in Table 2 and Table 3, respectively. The best results are shown in bold, and r represents the number of bands in each image. It can be seen from Table 2 that, compared with directly classifying the original data, feature extraction effectively improves the performance of random forest. In particular, the spatial information extracted by the EMAP significantly improves the classification accuracy. For example, applying the EMAP improves the AA of the HSI by nearly 7.72%. Among all the methods considered in the paper, the RTVSA fusion method captured the redundant information in the HSI and LiDAR data and eliminated unnecessary texture information. All OAs obtained with the RTVSA method exceeded 90%. Moreover, since the proposed method takes both spectral and spatial information into account, it showed excellent accuracy for specific classes, such as synthetic grass, soil, tennis court, and running track. In the CNN classification results, the EMAP extracted multi-scale spatial features, and the RTVSA fusion based on EMAP feature extraction performed best. For example, in terms of OA, the classification accuracy of the fused images based on the EMAP was 11.93% higher than that based on the LBP and 2.93% higher than that based on Gabor features. However, the HSI data after feature extraction also performed well in certain classes. For example, the classification accuracy of synthetic grass, soil, tennis court, and running track for the HSI data with Gabor feature extraction reached 100%. HSI data themselves contain rich spectral information, and the data after spatial feature extraction were more conducive to image classification.
Integrating the LiDAR data supplied complementary information and further helped image classification. The classification maps using RF and CNN classifiers obtained by different fusion approaches on the 2012 Houston data are shown in Figure 4 and Figure 5, respectively. The sample labels are consistent with those in Figure 2. Both figures demonstrate that the proposed fusion method gives classification maps with homogeneous regions while maintaining the structure. For the urban dataset, the use of RTV not only increases the smoothness, but also preserves the existing structure. It is worth noting that the CNN classification map of the unprocessed LiDAR data is distorted because the number of bands was too limited.

5.3.2. The 2017 Houston Dataset

The RF and CNN classification results of different fusion methods on the 2017 Houston dataset are shown in Table 4 and Table 5, respectively. Unlike on the 2012 Houston dataset, the LBP-based RTVSA fusion method achieved the best classification accuracy on the 2017 Houston dataset. This is related to the spectral and spatial information contained in the original data. Based on LBP feature extraction, the classification results after RTVSA fusion were excellent. In terms of OA, the LBP-based fusion achieved 91.83%, which was 15.44% higher than that of the HSI and 14.33% higher than that of the LiDAR data. The CNN classification results confirmed this finding. The results indicated that the proposed fusion method was stable enough to accurately identify complex classes with few features. However, since the 2017 Houston dataset covers only a part of the 2012 dataset and, with a 1 m GSD, is much more detailed, the classification performance on the 2017 Houston dataset was slightly inferior to that on the 2012 Houston dataset. The classification maps using RF and CNN classifiers obtained by different fusion approaches on the 2017 Houston data are shown in Figure 6 and Figure 7, respectively. The sample labels are consistent with those in Figure 3. In the RF classification map (Figure 6), the classes in the fused data based on the LBP and RTVSA differ clearly from each other and can be distinguished well. There was a wide difference between the Gabor-based HSI and LiDAR classification maps, primarily because the spatial information contained in the LiDAR data differs considerably from the spectral information contained in the HSI data. In Figure 7, it can be seen that using the EMAP makes the classification map more uniform by extracting the structure of various objects, compared with the case where only the spectral information is considered.
This was further enhanced by using the proposed method of fusing spectral, spatial, and elevation information.
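The OA, AA, and Kappa values reported in Table 4 and Table 5 can be related to a confusion matrix as in the following minimal sketch. It assumes the standard definitions of overall accuracy, average (per-class) accuracy, and Cohen's kappa, which this section does not restate. The function name `classification_scores` and the two-class toy matrix are hypothetical illustrations, not data from the experiments.

```python
def classification_scores(confusion):
    """Return (OA, AA, kappa) for a square confusion matrix.

    confusion[i][j] is the number of test samples whose true class is i
    and whose predicted class is j (assumed convention).
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]

    # Overall accuracy: fraction of all test samples on the diagonal.
    oa = sum(confusion[i][i] for i in range(n_classes)) / total

    # Average accuracy: mean of per-class accuracies, skipping empty classes.
    per_class = [confusion[i][i] / row_totals[i]
                 for i in range(n_classes) if row_totals[i] > 0]
    aa = sum(per_class) / len(per_class)

    # Cohen's kappa: agreement corrected for the chance agreement pe.
    pe = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    kappa = (oa - pe) / (1.0 - pe) if pe < 1.0 else float("nan")
    return oa, aa, kappa


# Hypothetical two-class example (not taken from the paper's tables).
toy_confusion = [[50, 10],
                 [5, 35]]
oa, aa, kappa = classification_scores(toy_confusion)
print(f"OA={oa:.4f}  AA={aa:.4f}  Kappa={kappa:.4f}")
```

Because AA weights every class equally while OA weights every sample equally, small classes that are poorly classified (such as crosswalks in Table 4) pull AA down much more than OA.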

5.4. Parametric Analysis

The influence of three important parameters, namely λ, σ, and L, on the performance of the proposed RTVSA method is discussed in this section. The experiment was conducted on the 2012 Houston dataset, using the RF and CNN classifiers. When the effect of λ was examined, the other parameters were fixed at σ = 2 and L = 30. Similarly, when the effect of σ was examined, the other parameters were fixed at λ = 0.04 and L = 30. When the effect of L was examined, λ and σ were set to 0.04 and 2, respectively. The effect of these parameters on the classification performance is shown in Figure 8 (RF) and Figure 9 (CNN). It can be seen from Figure 8 that as the value of λ increases, the RF classification accuracy shows an upward trend. The accuracy grows with σ at a rate close to that observed for λ, and when σ exceeds three, the classification accuracy essentially no longer improves. The classification accuracy also shows an upward trend as L increases, and when L exceeds 40, it no longer increases. This behavior is not surprising. More features can introduce more useful information, which is beneficial for classification performance. However, if the number of features is too large, the resulting information redundancy can reduce the classification accuracy. The effects of the parameters with the CNN classifier are similar to those with RF. However, owing to its characteristics, the CNN is more sensitive to changes in the parameters.
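The one-parameter-at-a-time protocol described above can be sketched as follows. This is only an illustration under stated assumptions: the default values (λ = 0.04, σ = 2, L = 30) come from the text, whereas the callback `evaluate`, the helper names, the grid values, and the synthetic `toy_evaluate` function are hypothetical and do not reproduce the curves in Figure 8 or Figure 9.

```python
# Values held fixed in the text while one parameter is varied.
DEFAULTS = {"lam": 0.04, "sigma": 2, "L": 30}


def one_at_a_time_sweep(evaluate, grids, defaults=DEFAULTS):
    """Vary each parameter over its grid while the others stay at defaults.

    evaluate(lam=..., sigma=..., L=...) is a hypothetical callback that would
    run fusion plus classification and return an accuracy score.
    """
    results = {}
    for name, values in grids.items():
        curve = []
        for value in values:
            params = dict(defaults)   # copy so the defaults are never mutated
            params[name] = value
            curve.append((value, evaluate(**params)))
        results[name] = curve
    return results


def plateau_point(curve, tol=0.1):
    """Smallest grid value after which the score gains at most `tol`."""
    for i, (value, score) in enumerate(curve):
        later = [s for _, s in curve[i + 1:]]
        if not later or max(later) - score <= tol:
            return value


def toy_evaluate(lam, sigma, L):
    """Synthetic stand-in for the real pipeline; the numbers are made up."""
    return float(min(L, 40))


# Hypothetical grid; the text does not list the values that were tested.
sweep = one_at_a_time_sweep(toy_evaluate, {"L": [10, 20, 30, 40, 50, 60]})
print(sweep["L"])
print("plateau at L =", plateau_point(sweep["L"]))
```

Sweeping one parameter at a time keeps the number of runs linear in the grid sizes, at the cost of not revealing interactions between λ, σ, and L.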

6. Discussion

Hyperspectral data contain rich spectral information, but, owing to the limitations of imaging equipment, they may not reliably distinguish objects with the same spectral characteristics. LiDAR data can provide complementary information to the HSI. To meet the needs of social and economic applications, more flexible land cover data products are required. This paper aims to extract structural correlations from heterogeneous data in order to fuse HSI and LiDAR data. The proposed feature-level fusion method provides an effective way to extract and fuse features and thereby improve land cover classification accuracy. In this study, a total variation (TV)-based feature fusion method is proposed for fusing HSI and LiDAR urban datasets and is applied to two publicly available Houston datasets. The random forest and CNN classification results demonstrate the effectiveness of the proposed method.
(1) In this study, three feature extraction methods (LBP, EMAP, and Gabor) are used to extract the spatial features of the HSI and LiDAR data. The results in Table 2, Table 3, Table 4 and Table 5 show that the use of these three features can improve the accuracy of land cover classification. On the 2012 Houston dataset, the EMAP feature provides the highest OA. Compared with the other methods, the EMAP feature obtains the highest classification accuracy for the classes of healthy grass, stressed grass, synthetic grass, trees, soil, residential, commercial, and roads. However, it is slightly inferior to the LBP feature for the other classes. The EMAP extracts multi-scale structural information, while the LBP feature describes the texture of a local area. On the 2017 Houston dataset, unlike on the 2012 Houston dataset, the LBP obtains the highest classification accuracy. The reason is that the 2017 Houston dataset, with a 1 m GSD, contains more detailed information, and the portion of it used in this paper covers only part of the area of the 2012 Houston dataset. When the samples lie in homogeneous regions, the LBP feature plays an important role. When the samples contain more boundary regions, the EMAP feature performs better.
(2) Although the two datasets used in this paper cover part of the same area, the classification accuracy on the 2012 Houston dataset is slightly higher than that on the 2017 Houston dataset. There are several reasons for this. The two datasets have different spatial resolutions: the 2012 Houston dataset has a 2.5 m GSD, while the 2017 Houston dataset has a 1 m GSD. The dimensions of the HSI and LiDAR data are not the same in the two datasets. The two datasets also use different data types: the 2017 Houston dataset contains multispectral LiDAR, while the 2012 Houston dataset contains a LiDAR-derived DSM. In addition, both datasets cover the area of Houston University, but geo-referencing is not addressed in this paper.
(3) The feature-level fusion method proposed in this paper only contains a single spatial feature. Some researchers have found that the fusion of multiple features can improve the accuracy of land cover classification [63,64]. Therefore, the fusion of multiple features to improve the usability of remote sensing images will be considered in future work.

7. Conclusions

In this paper, a feature-level fusion method based on total variation is proposed to combine the heterogeneous features of HSI and LiDAR data. The proposed method consists of two parts, feature extraction and feature fusion. In feature extraction, three effective methods (LBP, EMAP, and Gabor) are adopted to extract spatial features, among which the LBP and EMAP contribute more to improving land cover classification. In feature fusion, the proposed RTVSA method removes useless texture and noise information and enhances class separability, which allows it to effectively combine the complementary information of the various data. Two urban Houston University datasets, the 2012 Houston dataset and the training portion of the 2017 Houston dataset, are used to evaluate the proposed method. Experimental results show that the RTVSA fusion method can capture feature redundancy while improving classification accuracy on the study datasets. Moreover, RTVSA retains the structure and provides a uniform classification map. In the future, the fusion of multiple features to improve the usability of remote sensing images will be considered.

Author Contributions

Y.Q. and W.F. conceived of and designed the experiments; Y.T. wrote the paper. Y.T. and W.Z. performed the experiments. W.F. and G.D. revised the paper. W.H. and M.X. edited the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (61772397, 12005169), the National Key R&D Program of China (2016YFE0200400), the Open Research Fund of Key Laboratory of Digital Earth Science (2019LDE005), the science and technology innovation team of Shaanxi Province (2019TD-002), and Special fund for basic scientific research project in the central scientific research institutes (Institute of Grassland Research of CAAS) (1610332020026).

Data Availability Statement

The study did not report any data.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Liao, W.; Pižurica, A.; Bellens, R.; Gautama, S.; Philips, W. Generalized Graph-Based Fusion of Hyperspectral and LiDAR Data Using Morphological Features. IEEE Geosci. Remote Sens. Lett. 2015, 12, 552–556.
  2. Xia, J.; Yokoya, N.; Iwasaki, A. Fusion of Hyperspectral and LiDAR Data With a Novel Ensemble Classifier. IEEE Geosci. Remote Sens. Lett. 2018, 15, 957–961.
  3. Gu, Y.; Wang, Q. Discriminative Graph-Based Fusion of HSI and LiDAR Data for Urban Area Classification. IEEE Geosci. Remote Sens. Lett. 2017, 14, 906–910.
  4. Rasti, B.; Ghamisi, P.; Plaza, J.; Plaza, A. Fusion of Hyperspectral and LiDAR Data Using Sparse and Low-Rank Component Analysis. IEEE Trans. Geosci. Remote Sens. 2017, 55, 6354–6365.
  5. Gu, Y.; Wang, Q.; Jia, X.; Benediktsson, J.A. A Novel MKL Model of Integrating LiDAR Data and MSI for Urban Area Classification. IEEE Trans. Geosci. Remote Sens. 2015, 53, 5312–5326.
  6. Rasti, B.; Ghamisi, P.; Gloaguen, R. Hyperspectral and LiDAR Fusion Using Extinction Profiles and Total Variation Component Analysis. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3997–4007.
  7. Sankey, T.; Donager, J.; McVay, J.; Sankey, J.B. UAV lidar and hyperspectral fusion for forest monitoring in the southwestern USA. Remote Sens. Environ. 2017, 195, 30–43.
  8. Dalponte, M.; Bruzzone, L.; Gianelle, D. Fusion of Hyperspectral and LIDAR Remote Sensing Data for Classification of Complex Forest Areas. IEEE Trans. Geosci. Remote Sens. 2008, 46, 1416–1427.
  9. Alonzo, M.; Bookhagen, B.; Roberts, D.A. Urban tree species mapping using hyperspectral and lidar data fusion. Remote Sens. Environ. 2014, 148, 70–83.
  10. Swatantran, A.; Dubayah, R.; Roberts, D.; Hofton, M.; Blair, J.B. Mapping biomass and stress in the Sierra Nevada using lidar and hyperspectral data fusion. Remote Sens. Environ. 2011, 115, 2917–2930.
  11. Buckley, S.J.; Kurz, T.H.; Howell, J.A.; Schneider, D. Terrestrial lidar and hyperspectral data fusion products for geological outcrop analysis. Comput. Geosci. 2013, 54, 249–258.
  12. Li, S.; Hao, Q.; Kang, X.; Benediktsson, J.A. Gaussian Pyramid Based Multiscale Feature Fusion for Hyperspectral Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 3312–3324.
  13. Falco, N.; Benediktsson, J.A.; Bruzzone, L. Spectral and Spatial Classification of Hyperspectral Images Based on ICA and Reduced Morphological Attribute Profiles. IEEE Trans. Geosci. Remote Sens. 2015, 53, 6223–6240.
  14. Licciardi, G.; Pacifici, F.; Tuia, D.; Prasad, S.; West, T.; Giacco, F.; Thiel, C.; Inglada, J.; Christophe, E.; Chanussot, J.; et al. Decision Fusion for the Classification of Hyperspectral Data: Outcome of the 2008 GRS-S Data Fusion Contest. IEEE Trans. Geosci. Remote Sens. 2009, 47, 3857–3865.
  15. Hang, R.; Liu, Q.; Sun, Y.; Yuan, X.; Pei, H.; Plaza, J.; Plaza, A. Robust Matrix Discriminative Analysis for Feature Extraction from Hyperspectral Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 2002–2011.
  16. Kang, X.; Li, S.; Benediktsson, J.A. Feature Extraction of Hyperspectral Images with Image Fusion and Recursive Filtering. IEEE Trans. Geosci. Remote Sens. 2014, 52, 3742–3752.
  17. Fang, L.; He, N.; Li, S.; Ghamisi, P.; Benediktsson, J.A. Extinction Profiles Fusion for Hyperspectral Images Classification. IEEE Trans. Geosci. Remote Sens. 2018, 56, 1803–1815.
  18. Gao, L.; Hong, D.; Yao, J.; Zhang, B.; Gamba, P.; Chanussot, J. Spectral Superresolution of Multispectral Imagery with Joint Sparse and Low-Rank Learning. IEEE Trans. Geosci. Remote Sens. 2020, 1–12.
  19. Jia, S.; Wu, K.; Zhu, J.; Jia, X. Spectral-Spatial Gabor Surface Feature Fusion Approach for Hyperspectral Imagery Classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 1142–1154.
  20. Jia, S.; Tang, G.; Zhu, J.; Li, Q. A Novel Ranking-Based Clustering Approach for Hyperspectral Band Selection. IEEE Trans. Geosci. Remote Sens. 2016, 54, 88–102.
  21. Dalla Mura, M.; Benediktsson, J.A.; Waske, B.; Bruzzone, L. Morphological Attribute Profiles for the Analysis of Very High Resolution Images. IEEE Trans. Geosci. Remote Sens. 2010, 48, 3747–3762.
  22. Ghamisi, P.; Benediktsson, J.A.; Phinn, S. Land cover classification using both hyperspectral and LiDAR data. Int. J. Image Data Fusion 2015, 6, 189–215.
  23. Khodadadzadeh, M.; Li, J.; Prasad, S.; Plaza, A. Fusion of Hyperspectral and LiDAR Remote Sensing Data Using Multiple Feature Learning. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 2971–2983.
  24. Mura, M.D.; Benediktsson, J.A.; Waske, B.; Bruzzone, L. Extended profiles with morphological attribute filters for the analysis of hyperspectral data. Int. J. Remote Sens. 2010, 31, 5975–5991.
  25. Kwan, C.; Gribben, D.; Ayhan, B.; Bernabe, S.; Plaza, A.; Selva, M. Improving Land Cover Classification Using Extended Multi-Attribute Profiles (EMAP) Enhanced Color, Near Infrared, and LiDAR Data. Remote Sens. 2020, 12, 1392.
  26. Luo, R.; Liao, W.; Zhang, H.; Zhang, L.; Scheunders, P.; Pi, Y.; Philips, W. Fusion of Hyperspectral and LiDAR Data for Classification of Cloud-Shadow Mixed Remote Sensed Scene. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 3768–3781.
  27. Kang, X.; Li, C.; Li, S.; Lin, H. Classification of Hyperspectral Images by Gabor Filtering Based Deep Network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 1166–1178.
  28. Li, W.; Du, Q. Gabor-Filtering-Based Nearest Regularized Subspace for Hyperspectral Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 1012–1022.
  29. He, L.; Li, J.; Plaza, A.; Li, Y. Discriminative Low-Rank Gabor Filtering for Spectral-Spatial Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 1381–1395.
  30. Ojala, T.; Pietikainen, M.; Maenpaa, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987.
  31. Ge, C.; Du, Q.; Li, W.; Li, Y.; Sun, W. Hyperspectral and LiDAR Data Classification Using Kernel Collaborative Representation Based Residual Fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 1963–1973.
  32. Fang, L.; Li, S.; Duan, W.; Ren, J.; Benediktsson, J.A. Classification of Hyperspectral Images by Exploiting Spectral-Spatial Information of Superpixel via Multiple Kernels. IEEE Trans. Geosci. Remote Sens. 2015, 53, 6663–6674.
  33. Puttonen, E.; Jaakkola, A.; Litkey, P.; Hyyppä, J. Tree classification with fused mobile laser scanning and hyperspectral data. Sensors 2011, 11, 5158–5182.
  34. Pedergnana, M.; Marpu, P.R.; Dalla Mura, M.; Benediktsson, J.A.; Bruzzone, L. Classification of Remote Sensing Optical and LiDAR Data Using Extended Attribute Profiles. IEEE J. Sel. Top. Signal Process. 2012, 6, 856–865.
  35. Imani, M.; Ghassemian, H. An overview on spectral and spatial information fusion for hyperspectral image classification: Current trends and challenges. Inf. Fusion 2020, 59, 59–83.
  36. Camps-Valls, G.; Gomez-Chova, L.; Munoz-Mari, J.; Rojo-Alvarez, J.L.; Martinez-Ramon, M. Kernel-Based Framework for Multitemporal and Multisource Remote Sensing Data Classification and Change Detection. IEEE Trans. Geosci. Remote Sens. 2008, 46, 1822–1835.
  37. Zhang, Y.; Yang, H.L.; Prasad, S.; Pasolli, E.; Jung, J.; Crawford, M. Ensemble Multiple Kernel Active Learning For Classification of Multisource Remote Sensing Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 845–858.
  38. Zhang, Y.; Prasad, S. Multisource Geospatial Data Fusion via Local Joint Sparse Representation. IEEE Trans. Geosci. Remote Sens. 2016, 54, 3265–3276.
  39. Rasti, B.; Ghamisi, P. Remote sensing image classification using subspace sensor fusion. Inf. Fusion 2020, 64, 121–130.
  40. Dian, R.; Li, S.; Fang, L.; Wei, Q. Multispectral and hyperspectral image fusion with spatial-spectral sparse representation. Inf. Fusion 2019, 49, 262–270.
  41. Hong, D.; Gao, L.; Yao, J.; Zhang, B.; Plaza, A.; Chanussot, J. Graph Convolutional Networks for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2020, 1–13.
  42. Chen, Y.; Li, C.; Ghamisi, P.; Jia, X.; Gu, Y. Deep Fusion of Remote Sensing Data for Accurate Classification. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1253–1257.
  43. Zhang, M.; Li, W.; Du, Q.; Gao, L.; Zhang, B. Feature Extraction for Classification of Hyperspectral and LiDAR Data Using Patch-to-Patch CNN. IEEE Trans. Cybern. 2020, 50, 100–111.
  44. Feng, Q.; Zhu, D.; Yang, J.; Li, B. Multisource hyperspectral and lidar data fusion for urban land use mapping based on a modified two-branch convolutional neural network. ISPRS Int. J. Geo Inf. 2019, 8, 28.
  45. Rudin, L.I.; Osher, S.; Fatemi, E. Nonlinear total variation based noise removal algorithms. Phys. D Nonlinear Phenom. 1992, 60, 259–268.
  46. Palsson, F.; Sveinsson, J.R.; Ulfarsson, M.O. A New Pansharpening Algorithm Based on Total Variation. IEEE Geosci. Remote Sens. Lett. 2014, 11, 318–322.
  47. Chang, Y.; Yan, L.; Fang, H.; Liu, H. Simultaneous Destriping and Denoising for Remote Sensing Images With Unidirectional Total Variation and Sparse Representation. IEEE Geosci. Remote Sens. Lett. 2014, 11, 1051–1055.
  48. Duan, P.; Kang, X.; Li, S.; Ghamisi, P. Noise-Robust Hyperspectral Image Classification via Multi-Scale Total Variation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 1948–1962.
  49. Kumar, M.; Dass, S. A Total Variation-Based Algorithm for Pixel-Level Image Fusion. IEEE Trans. Image Process. 2009, 18, 2137–2143.
  50. Ma, J.; Chen, C.; Li, C.; Huang, J. Infrared and visible image fusion via gradient transfer and total variation minimization. Inf. Fusion 2016, 31, 100–109.
  51. Quan, Y.; Tong, Y.; Feng, W.; Dauphin, G.; Huang, W.; Xing, M. A Novel Image Fusion Method of Multi-Spectral and SAR Images for Land Cover Classification. Remote Sens. 2020, 12, 3801.
  52. Feng, W.; Dauphin, G.; Huang, W.; Quan, Y.; Liao, W. New margin-based subsampling iterative technique in modified random forests for classification. Knowl.-Based Syst. 2019, 182, 104845.
  53. Feng, W.; Dauphin, G.; Huang, W.; Quan, Y.; Bao, W.; Wu, M.; Li, Q. Dynamic Synthetic Minority Over-Sampling Technique-Based Rotation Forest for the Classification of Imbalanced Hyperspectral Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 2159–2169.
  54. Feng, W.; Huang, W.; Bao, W. Imbalanced Hyperspectral Image Classification With an Adaptive Ensemble Method Based on SMOTE and Rotation Forest With Differentiated Sampling Rates. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1879–1883.
  55. Feng, W.; Huang, W.; Ren, J. Class Imbalance Ensemble Learning Based on the Margin Theory. Appl. Sci. 2018, 8, 815.
  56. Quan, Y.; Zhong, X.; Feng, W.; Dauphin, G.; Gao, L.; Xing, M. A Novel Feature Extension Method for the Forest Disaster Monitoring Using Multispectral Data. Remote Sens. 2020, 12, 2261.
  57. Li, Q.; Feng, W.; Quan, Y. Trend and forecasting of the COVID-19 outbreak in China. J. Infect. 2020, 80, 469–496.
  58. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  59. Roy, S.K.; Krishna, G.; Dubey, S.R.; Chaudhuri, B.B. HybridSN: Exploring 3D-2-D CNN Feature Hierarchy for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2020, 17, 277–281.
  60. Shen, L.; Jia, S. Three-Dimensional Gabor Wavelets for Pixel-Based Hyperspectral Imagery Classification. IEEE Trans. Geosci. Remote Sens. 2011, 49, 5039–5046.
  61. Xu, L.; Yan, Q.; Xia, Y.; Jia, J. Structure extraction from texture via relative total variation. ACM Trans. Graph. TOG 2012, 31, 1–10.
  62. Rodriguez-Galiano, V.; Ghimire, B.; Rogan, J.; Chica-Olmo, M.; Rigol-Sanchez, J. An assessment of the effectiveness of a random forest classifier for land cover classification. ISPRS J. Photogramm. Remote Sens. 2012, 67, 93–104.
  63. Mu, C.; Liu, Y.; Liu, Y. Hyperspectral Image Spectral-Spatial Classification Method Based on Deep Adaptive Feature Fusion. Remote Sens. 2021, 13, 746.
  64. Mohla, S.; Pande, S.; Banerjee, B.; Chaudhuri, S. FusAtNet: Dual Attention Based SpectroSpatial Multimodal Fusion Network for Hyperspectral and LiDAR Classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Seattle, WA, USA, 14–19 June 2020.
Figure 1. The flowchart of the algorithm design. LBP, local binary pattern; EMAP, extended multi-attribute profile; RTVSA, relative total variation structure analysis.
Figure 2. The 2012 Houston dataset.
Figure 3. The 2017 Houston dataset.
Figure 4. Classification maps obtained by RF classifiers combined with the proposed methods on the 2012 Houston dataset.
Figure 5. Classification maps obtained by CNN classifiers combined with the proposed methods on the 2012 Houston dataset.
Figure 6. Classification maps obtained by RF classifiers combined with the proposed methods on the 2017 Houston dataset.
Figure 7. Classification maps obtained by CNN classifiers combined with the proposed methods on the 2017 Houston dataset.
Figure 8. Influence of the parameters on the RF classification accuracies of the proposed method on the 2012 Houston dataset.
Figure 9. Influence of the parameters on the CNN classification accuracies of the proposed method on the 2012 Houston dataset.
Table 1. Dataset information of 2012 Houston and the 2017 Houston. The sizes of the 2012 Houston dataset and the 2017 Houston dataset are 349 × 1905 and 1202 × 4768 , respectively.
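The 2012 Houston Dataset |  |  |  |  | The 2017 Houston Dataset |  |  |  |
No. | Class Name | Train | Test | Total | No. | Class Name | Train | Test | Total
1 | Healthy grass | 198 | 1053 | 1251 | 1 | Healthy grass | 98 | 9701 | 9799
2 | Stressed grass | 190 | 1064 | 1254 | 2 | Stressed grass | 325 | 32,177 | 32,502
3 | Synthetic grass | 192 | 505 | 697 | 3 | Artificial turf | 7 | 677 | 684
4 | Trees | 188 | 1056 | 1244 | 4 | Evergreen trees | 136 | 13,452 | 13,588
5 | Soil | 186 | 1056 | 1242 | 5 | Deciduous trees | 50 | 4998 | 5048
6 | Water | 182 | 143 | 325 | 6 | Bare earth | 45 | 4471 | 4516
7 | Residential | 196 | 1072 | 1268 | 7 | Water | 3 | 263 | 266
8 | Commercial | 191 | 1053 | 1244 | 8 | Residential | 398 | 39,364 | 39,762
9 | Road | 193 | 1059 | 1252 | 9 | Commercial | 2237 | 221,447 | 223,684
10 | Highway | 191 | 1036 | 1227 | 10 | Roads | 458 | 45,352 | 45,810
11 | Railway | 181 | 1054 | 1235 | 11 | Sidewalks | 340 | 33,662 | 34,002
12 | Parking Lot 1 | 192 | 1041 | 1233 | 12 | Crosswalks | 15 | 1501 | 1516
13 | Parking Lot 2 | 184 | 285 | 469 | 13 | Major thoroughfares | 464 | 45,894 | 46,358
14 | Tennis Court | 181 | 247 | 428 | 14 | Highways | 98 | 9751 | 9849
15 | Running Track | 187 | 473 | 660 | 15 | Railways | 69 | 6868 | 6937
 |  |  |  |  | 16 | Paved parking lots | 115 | 11,360 | 11,475
 |  |  |  |  | 17 | Unpaved parking lots | 1 | 148 | 149
 |  |  |  |  | 18 | Cars | 66 | 6512 | 6578
 |  |  |  |  | 19 | Trains | 54 | 5311 | 5365
 |  |  |  |  | 20 | Stadium seats | 68 | 6756 | 6824
Total |  | 2832 | 12,197 | 15,029 | Total |  | 5047 | 499,665 | 504,712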
Table 2. The RF classification performance of the proposed method based on the LBP, EMAP, and Gabor feature extraction on the 2012 Houston dataset (%). The best values are shown in bold.
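Class | HSI | LiDAR | LBP HSI | LBP LiDAR | LBP RTVSA | EMAP HSI | EMAP LiDAR | EMAP RTVSA | Gabor HSI | Gabor LiDAR | Gabor RTVSA
r14414135930158411303645230
Healthy grass | 96.18 | 41.36 | 96.79 | 40.31 | 93.48 | 96.70 | 73.94 | 98.09 | 94.96 | 17.20 | 95.40
Stressed grass | 98.09 | 22.88 | 95.23 | 32.93 | 87.18 | 98.53 | 57.02 | 99.05 | 96.36 | 7.19 | 93.59
Synthetic grass | 93.14 | 57.57 | 94.70 | 67.86 | 96.88 | 99.69 | 91.89 | 100.00 | 98.44 | 34.17 | 98.44
Trees | 98.25 | 37.94 | 96.15 | 51.31 | 89.95 | 95.72 | 76.57 | 97.64 | 91.17 | 46.07 | 96.50
Soil | 96.33 | 38.32 | 97.81 | 37.97 | 90.55 | 97.99 | 71.74 | 100.00 | 99.91 | 15.14 | 96.85
Water | 90.97 | 22.74 | 87.63 | 13.38 | 93.98 | 88.96 | 75.25 | 86.29 | 74.58 | 0.33 | 81.27
Residential | 85.00 | 31.62 | 84.32 | 36.16 | 92.03 | 95.54 | 79.86 | 97.17 | 88.77 | 46.87 | 92.72
Commercial | 87.50 | 52.71 | 81.12 | 51.40 | 87.67 | 85.31 | 87.33 | 97.55 | 90.21 | 34.44 | 95.19
Road | 77.26 | 27.00 | 73.26 | 32.55 | 75.69 | 88.37 | 53.56 | 95.31 | 77.34 | 24.05 | 90.89
Highway | 86.09 | 21.43 | 79.89 | 24.62 | 98.41 | 88.57 | 54.30 | 97.79 | 96.72 | 20.11 | 98.94
Railway | 81.16 | 26.50 | 80.90 | 45.95 | 99.91 | 96.83 | 68.05 | 98.50 | 94.19 | 44.81 | 96.74
Parking Lot 1 | 79.10 | 26.90 | 82.98 | 22.57 | 92.86 | 88.98 | 66.40 | 92.06 | 95.33 | 23.81 | 96.83
Parking Lot 2 | 25.29 | 6.03 | 25.99 | 5.10 | 93.27 | 80.51 | 44.55 | 77.49 | 92.34 | 0.46 | 92.81
Tennis Court | 92.89 | 23.60 | 96.95 | 44.92 | 100.00 | 97.21 | 78.68 | 99.49 | 93.15 | 0.76 | 80.71
Running Track | 95.72 | 44.15 | 97.20 | 54.04 | 99.01 | 99.84 | 89.95 | 99.67 | 97.69 | 2.80 | 99.67
AA (%) | 85.53 | 32.05 | 84.73 | 37.41 | 92.73 | 93.25 | 71.27 | 95.74 | 92.08 | 21.21 | 93.77
OA (%) | 87.25 | 33.03 | 86.08 | 38.39 | 91.81 | 93.48 | 70.55 | 96.75 | 92.61 | 24.93 | 94.88
Kappa | 0.8619 | 0.2760 | 0.8494 | 0.3336 | 0.9114 | 0.9294 | 0.6816 | 0.9649 | 0.9201 | 0.1835 | 0.9446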
Table 3. The CNN classification performance of the proposed method based on the LBP, EMAP, and Gabor feature extraction on the 2012 Houston dataset (%). The best values are shown in bold.
Features LBPEMAPGabor
ClassHSILiDARHSILiDARRTVSAHSILiDARRTVSAHSILiDARRTVSA
r14414135930158411303645230
Healthy grass92.5915.4695.5725.1180.9798.5271.339898.2680.1995.4
Stressed grass95.7322.1898.6138.3989.7798.1871.0610089.2570.2894.45
Synthetic grass98.086.5499.5316.6799.0799.8492.5299.6910010099.69
Trees93.3710.3196.7747.0789.6194.579.9110098.2592.4996.07
Soil10012.6999.7411.6486.5399.8380.0510010083.998.86
Water82.593.0192.644.3585.9595.9961.5410096.3293.3186.29
Residential97.433.8597.2637.8795.6396.3280.9897.9487.6683.1291.26
Commercial93.879.8793.1970.1391.279188.3894.9392.2388.7398.17
Road86.9111.8195.5750.8785.3396.3577.698.788.0276.3991.06
Highway99.663.199.3843.7677.7710098.8510092.1293.898.94
Railway96.915.9897.1935.9795.1610095.2510099.1298.3397.63
Parking Lot 194.995.999.319.6575.4297.3677.8998.9494.8979.0398.24
Parking Lot 286.371.1691.94.8667.8293.0640.2895.695.684.9590.28
Tennis Court100010025.8983.2510098.7310010098.9899.75
Running Track1000.8298.855.7694.2410091.7810010092.7698.03
AA (%)94.579.5197.0329.2086.5297.4080.4198.9295.4587.7595.61
OA (%)95.0811.3697.2433.5486.9797.3781.7298.8994.7886.2795.97
Kappa0.94680.03410.97010.27870.85900.97160.80220.98800.94360.85170.9564
Table 4. The RF classification performance of the proposed method based on the LBP, EMAP, and Gabor feature extraction on the 2017 Houston dataset (%). The best values are shown in bold.
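Class | HSI | LiDAR | LBP HSI | LBP LiDAR | LBP RTVSA | EMAP HSI | EMAP LiDAR | EMAP RTVSA | Gabor HSI | Gabor LiDAR | Gabor RTVSA
r487961430528773036436430
Healthy grass | 92.45 | 68.40 | 90.91 | 63.16 | 84.48 | 91.93 | 61.02 | 82.43 | 71.69 | 76.70 | 79.49
Stressed grass | 92.65 | 87.88 | 94.25 | 89.61 | 94.76 | 94.19 | 87.84 | 94.36 | 85.74 | 86.43 | 89.26
Artificial turf | 95.50 | 95.13 | 94.35 | 92.76 | 99.85 | 98.08 | 6.76 | 99.93 | 82.50 | 8.08 | 87.12
Evergreen trees | 93.05 | 86.64 | 92.96 | 87.57 | 97.16 | 94.08 | 88.30 | 97.05 | 84.83 | 93.50 | 92.01
Deciduous trees | 51.01 | 60.05 | 47.90 | 63.82 | 89.60 | 60.85 | 32.57 | 88.45 | 26.13 | 64.82 | 55.34
Bare earth | 78.30 | 63.26 | 74.39 | 52.91 | 99.36 | 85.89 | 30.27 | 94.60 | 88.74 | 39.15 | 93.79
Water | 59.35 | 56.70 | 67.05 | 36.28 | 89.36 | 81.10 | 30.48 | 62.77 | 32.38 | 15.67 | 69.14
Residential | 76.00 | 71.10 | 73.15 | 70.40 | 90.46 | 87.43 | 69.34 | 85.94 | 73.88 | 57.52 | 84.42
Commercial | 93.63 | 95.06 | 93.58 | 95.40 | 99.02 | 96.16 | 92.79 | 98.82 | 96.26 | 94.58 | 97.01
Roads | 49.91 | 54.78 | 48.07 | 54.05 | 80.28 | 58.41 | 33.75 | 75.79 | 45.45 | 48.48 | 61.63
Sidewalks | 43.25 | 62.11 | 40.06 | 66.91 | 69.80 | 51.42 | 27.02 | 65.22 | 31.94 | 44.70 | 47.53
Crosswalks | 3.52 | 2.28 | 1.47 | 1.15 | 18.66 | 5.97 | 2.67 | 8.24 | 0.70 | 2.45 | 8.05
Major thoroughfares | 61.00 | 56.33 | 58.91 | 55.25 | 88.78 | 70.76 | 33.15 | 79.99 | 65.91 | 52.80 | 80.04
Highways | 59.28 | 41.17 | 57.17 | 39.77 | 82.16 | 69.86 | 23.64 | 81.04 | 64.55 | 35.73 | 81.72
Railways | 87.65 | 62.00 | 89.07 | 54.80 | 98.95 | 90.28 | 6.42 | 97.61 | 93.23 | 36.95 | 96.43
Paved parking lots | 66.28 | 46.53 | 60.97 | 45.69 | 94.95 | 78.06 | 34.20 | 95.87 | 72.85 | 36.05 | 87.50
Unpaved parking lots | 38.55 | 0.86 | 4.82 | 0.00 | 91.57 | 77.11 | 3.79 | 67.99 | 52.84 | 0.17 | 53.18
Cars | 11.37 | 55.34 | 5.61 | 55.94 | 68.42 | 51.84 | 23.07 | 43.72 | 8.55 | 24.21 | 50.08
Trains | 38.31 | 50.24 | 28.22 | 44.09 | 85.76 | 67.49 | 41.11 | 67.17 | 54.05 | 52.78 | 75.21
Stadium seats | 73.56 | 61.70 | 68.77 | 59.21 | 99.16 | 81.83 | 20.95 | 92.40 | 56.09 | 43.78 | 75.13
AA (%) | 63.23 | 58.88 | 59.58 | 56.44 | 86.13 | 74.64 | 37.46 | 78.97 | 59.42 | 45.73 | 73.20
OA (%) | 77.61 | 77.51 | 76.39 | 77.50 | 91.83 | 83.62 | 66.71 | 89.06 | 76.85 | 72.80 | 84.52
Kappa | 0.7024 | 0.7055 | 0.6847 | 0.7047 | 0.8925 | 0.7837 | 0.5319 | 0.8546 | 0.6827 | 0.6280 | 0.7936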
Table 5. The CNN classification performance of the proposed method based on the LBP, EMAP, and Gabor feature extraction on the 2017 Houston dataset (%). The best values are shown in bold.
Features LBPEMAPGabor
ClassHSILiDARHSILiDARRTVSAHSILiDARRTVSAHSILiDARRTVSA
r487961430528773036436430
Healthy grass94.5369.918.9474.5374.2861.3877.3786.0580.2581.0578.67
Stressed grass93.3484.9680.2389.193.3992.4890.8594.992.5191.193.85
Artificial turf95.0500.5215.4796.8398.2785.0599.6710091.9593.58
Evergreen trees97.9291.1669.6293.5699.1695.6290.8698.3896.0797.0297.08
Deciduous trees83.680.65.9739.5696.0579.8469.7592.2975.0685.3875.15
Bare earth98.9406.2649.0699.9799.3888.999.9898.2890.2899.94
Water76.570023.1550.3827.3215.5656.6438.0532.6483.59
Residential93.476.3347.6284.6299.4297.5390.4899.1692.8493.0894.65
Commercial98.0692.7691.595.6499.4797.5896.8898.7998.0197.9498.71
Roads77.0146.5640.4365.4989.5173.6366.9789.6477.2773.983.18
Sidewalks74.7347.6135.7763.788.2765.7669.2581.9868.1171.2770.68
Crosswalks11.4400.233.0521.850.8718.3537.249.4218.8524.45
Major thoroughfares84.5737.5151.0774.7397.0887.7785.393.9485.3892.1992.46
Highways94.09041.8472.6198.8493.8978.5297.4492.7893.7996.19
Railways99.34022.9464.8499.9999.2491.3299.7997.9790.7198.08
Paved parking lots91.9716.0622.7565.1399.5195.6582.4399.1295.0687.9695.64
Unpaved parking lots0.1700050.693.786.5397.4231.19.7956.01
Cars95.1307.5256.8698.490.689.5698.3990.3689.9893.7
Trains94.413.2676.4680.9599.9896.4295.4299.3694.2994.9898.58
Stadium seats94.5305.6777.2499.8986.3776.7399.1699.0398.0698.84
AA (%)82.4428.8331.2759.4687.6577.1773.3090.9780.5979.1086.15
OA (%)91.8367.7965.2483.1796.3990.4188.0095.6490.9691.2293.10
Kappa0.89350.54540.52850.77660.95300.87520.84240.94340.88180.88520.9100
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Quan, Y.; Tong, Y.; Feng, W.; Dauphin, G.; Huang, W.; Zhu, W.; Xing, M. Relative Total Variation Structure Analysis-Based Fusion Method for Hyperspectral and LiDAR Data Classification. Remote Sens. 2021, 13, 1143. https://doi.org/10.3390/rs13061143
