Article

Joint Classification of Hyperspectral and LiDAR Data Based on Position-Channel Cooperative Attention Network

School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
* Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(14), 3247; https://doi.org/10.3390/rs14143247
Submission received: 18 May 2022 / Revised: 23 June 2022 / Accepted: 2 July 2022 / Published: 6 July 2022

Abstract

Remote sensing image classification is a prominent topic in earth observation research, but single-source classification faces a performance bottleneck. As the types of remote sensing data gradually diversify, the joint classification of multi-source remote sensing data becomes possible. However, existing classification methods have limitations in representing the heterogeneous features of multimodal remote sensing data, which restricts collaborative classification performance. To resolve this issue, a position-channel collaborative attention network is proposed for the joint classification of hyperspectral image (HSI) and LiDAR data. Firstly, a multiscale network and a single-branch backbone network are designed to extract the spatial, spectral, and elevation features of land cover objects. Then, the proposed position-channel collaborative attention module adaptively enhances the features extracted from the multiscale network to different degrees through the self-attention module, and exploits the features extracted from the multiscale network and the single-branch network through the cross-attention module, so as to capture the comprehensive features of HSI and LiDAR data, narrow the semantic differences of heterogeneous features, and realize complementary advantages. This deep interaction further improves the performance of collaborative classification. Finally, a series of comparative experiments was carried out on the 2012 Houston and Trento datasets, and the effectiveness of the model was demonstrated by qualitative and quantitative comparison.

Graphical Abstract

1. Introduction

With the continuous development of remote sensing technology, remote sensing images contain rich, high-resolution ground information, and remote sensing image classification is an important component of remote sensing interpretation, widely used in natural disaster prevention, urban and rural planning, and other fields. Remote sensing image classification extracts the high-level semantic information of images and maps the features of remote sensing images to category labels, so as to realize the classification and recognition of images [1]. Among the many kinds of remote sensing data, hyperspectral images (HSIs) are obtained by imaging spectrometers whose wavelengths cover the visible and near-infrared channels; they record the spectra of each pixel on the earth's surface over hundreds or even thousands of bands, offering narrower spectral bands and more channels than conventional imagery [2]. HSIs exhibit spatial and spectral smoothness: they can describe objects in detail and accurately, and adjacent bands are highly correlated, which greatly improves the ability to recognize targets [3,4,5].
However, the strong correlation of HSI bands leads to information redundancy. Moreover, when sensors collect HSI scene data, they are easily disturbed by environmental factors such as clouds and shadows, which tends to cause information confusion. Therefore, HSI alone can hardly yield promising classification results. Compared with HSI, light detection and ranging (LiDAR) uses pulsed lasers to measure range and is an active remote sensing technique [6,7]. Moreover, LiDAR is not easily affected by weather conditions; it provides the height and shape information of the scene with high accuracy and flexibility [8,9]. HSIs provide rich spectral information, while LiDAR data offer accurate spatial and elevation information. Deeply fusing the two data sources realizes the complementary advantages of multi-source remote sensing data, thus breaking through the performance bottleneck of single-source remote sensing data (such as "different objects with the same spectrum" or "the same object with different spectra") and ultimately improving the classification accuracy of ground objects [10,11]. Many studies have demonstrated the effectiveness of the joint interpretation of HSI and LiDAR, indicating that LiDAR can compensate for the shortcomings of HSI sensors to some extent. For HSI classification, a series of methods has been proposed, such as support vector machines (SVMs) [12], which are widely used when training data are limited, the extreme learning machine [13], random forest classification [14], classification based on sparse representation [15,16,17], evidence theory [18,19,20], etc. Morphological extended attribute profiles (EAPs) can be used for joint feature extraction from HSI and LiDAR images to achieve classification [21]. In [22], multi-kernel learning is combined with multi-source heterogeneous features, and the kernels constructed from different features are combined by weighted summation. Ge et al. combined classification of HSI and LiDAR images using extinction profiles (EPs), local binary patterns (LBPs), and kernel collaborative representation [23]. In [24], the extended multi-attribute profile (EMAP) is used for feature extraction, multi-scale total variation (MSTV) is used for feature estimation in a low-dimensional space, and a random forest classifier is finally used for classification. ShearSAF [25] transforms the dimension-reduced HSI and LiDAR images into the Shearlet domain for feature extraction and finally classifies them with a random forest.
Most traditional classification models simply stack the features of multimodal data, and the direct superposition of these high-dimensional features inevitably leads to the Hughes phenomenon. Additionally, traditional classification methods rely heavily on hand-designed features, which limits the expressive ability of the model [26]. In recent years, convolutional neural networks (CNNs) have proven effective in many computer vision tasks, such as classification [27,28,29], detection [30], and segmentation [31]. CNN-based approaches have become the mainstream in remote sensing image classification [32,33], showing excellent feature extraction ability and gradually replacing methods based on handcrafted features.
The IEEE GRSS data fusion contests have provided data support for multi-source remote sensing image fusion and accelerated the development of data fusion technology. Li et al. [34] constructed three CNNs to learn spectral, spatial, and elevation characteristics, respectively, and then fused them using a composite kernel method. To address the problem caused by feature imbalance, the feature fusion step can be skipped, and a decision fusion method for HSI and LiDAR data classification was proposed in [35]. In [36,37], both feature fusion and decision fusion strategies are used to improve classification performance. Although these methods have proven effective, the complementarity and heterogeneity between HSI and LiDAR data lead to great differences in object representation, and it is difficult to achieve satisfactory classification performance with a small number of training samples. To address this problem, unsupervised CNN models based on encoder–decoder architectures were proposed [38,39], and the data reconstruction strategy has been shown to better activate neurons across modalities. In [40], a two-branch convolutional neural network combines the spatial and spectral features extracted separately from HSI with the features extracted from LiDAR. EndNet [41] is a deep encoder–decoder network architecture that reconstructs the multimodal input by enforcing fusion features. MDL-Middle [42] is an intermediate-fusion CNN model. Zhang et al. [43] put forward an interleaving perception convolutional neural network (IP-CNN), designed a bidirectional autoencoder to reconstruct multimodal data, and integrated multi-source heterogeneous information in an interleaving manner. In addition, attention mechanisms have been successfully applied to remote sensing image classification.
The attention mechanism originates from visual signal processing and refers to the strategy of allocating more computing resources to the most informative parts of a processed signal. When observing a scene or image, attention makes the brain focus on the target area, highlighting salient features and suppressing irrelevant ones, and it is extensively used in classification tasks. In 2017, the Google team put forward the self-attention structure in the article "Attention is all you need" [44], which attracted enormous interest and made the attention mechanism an important research topic. Hu et al. [45] proposed Squeeze-and-Excitation (SE), which makes the network attend to the relationships between channels and automatically learn the importance of different channel features, improving image classification accuracy. The convolutional block attention module (CBAM) proposed by Woo et al. [46] and the bottleneck attention module (BAM) proposed by Park et al. [47] consider spatial attention and channel attention simultaneously; when used for image classification, they lower the error rate and make the classification more stable.
An attention module can be introduced into a model to focus on significant features in both position and channel. Fu et al. [48] proposed the dual attention network (DANet), which introduced a position attention module and a channel attention module to establish semantic dependencies, and this idea has been applied to multimodal fusion networks for HSI and LiDAR classification. FusAtNet was proposed in [49]; similar to the method in this paper, the model highlights the spectral features of HSI through a self-attention mechanism and fuses the spatial features of LiDAR through a cross-attention mechanism. S²ENet [50] proposed the S²EM module for spatial–spectral enhancement through cross-modal information interaction. However, these methods still have some problems: features are only fused in shallow layers, which does not fully exploit the complementarity of HSI and LiDAR features. In this paper, by combining attention mechanisms, a joint classification method for HSI and LiDAR data based on a position-channel cooperative attention network is proposed. Different levels of features are extracted by the multiscale network, and deep fusion is carried out through the position-channel cooperative attention (PC²A) module, so as to give full play to the complementary advantages of multimodal data.
In this article, we propose a joint classification method for HSI and LiDAR data based on a position-channel cooperative attention network. In feature extraction, we use a multiscale network together with a self-position enhancement attention network to extract deep features of HSI at different levels. Then, a forward-inverted CNN structure is designed to extract rich spatial features from LiDAR images, and these are fused with the extracted HSI features to obtain spatial–spectral information across different levels. Finally, the probability distribution of each pixel is calculated by three classification modules, and the classification results are fused. The effectiveness of the proposed model is demonstrated on real HSI and LiDAR data sets.
The main contributions are summarized as follows:
  • A multiscale network is designed to obtain position-channel features at different levels, and a single-branch network is designed to extract rich position information from LiDAR;
  • An effective PC²A module is designed, which consists of three submodules: self-position enhancement attention (SPA), fused-position enhancement attention (FPA), and fused-channel enhancement attention (FCA). The enhanced features obtained through the PC²A module make full use of the complementarity of the position-channel features of HSI and LiDAR images, and the fused features help to obtain more effective classification results;
  • In the fusion phase, shallow features are fused by the position-channel cooperative attention network, and deep features are then fused by a concatenation mechanism. To enhance the discriminative ability of the learned features, three output layers are adopted, and their results are combined via a weighted summation whose weights are automatically updated through network learning.
The rest of this article is organized as follows. The details of the proposed model are described in Section 2. In Section 3, the data set and experimental results are introduced. Finally, this article is summarized in Section 4.

2. Methodology

2.1. Overview

The framework of the proposed position-channel collaborative attention network (PC²ANet) is shown in Figure 1, which includes the feature extraction model, the PC²A module, and the decision fusion module. The feature extraction model consists of a multiscale network and a single-branch network. The multiscale network uses 2D CNNs and 3D CNNs with different kernel sizes to extract the position and channel information of HSI, and a 2D CNN is used in the single-branch network to extract the position information of LiDAR. The PC²A module consists of SPA, FPA, and FCA. The SPA module enhances the spatial features of HSI; in addition, the FPA and FCA modules enhance the spatial features of HSI and the channel features of LiDAR, respectively, through cross-attention. The data fusion methods include feature fusion and decision fusion. Feature fusion concatenates the extracted HSI features and LiDAR features, and decision fusion combines the outputs of three classification modules.

2.2. Feature Extraction Model

The overall parameter configuration of the designed feature extraction model is described in detail in Figure 2. Classification requires a large number of samples for training, but collecting ground reference data is expensive and difficult, and the manual labeling of HSI is costly, so few labeled samples are available. To alleviate this problem, the existing training samples are rotated to expand the sample set.
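As an illustration of this augmentation step, the following sketch rotates each labeled patch by 90, 180, and 270 degrees to quadruple the sample set; the function and tensor names are hypothetical and not taken from the paper.

```python
import torch

def augment_by_rotation(patches: torch.Tensor, labels: torch.Tensor):
    """Rotate training patches to expand the sample set.

    patches: (N, C, H, W) tensor of training patches; labels: (N,) tensor.
    """
    rotated = [patches]
    for k in (1, 2, 3):  # 90, 180, and 270 degree rotations in the spatial plane
        rotated.append(torch.rot90(patches, k, dims=(2, 3)))
    aug_patches = torch.cat(rotated, dim=0)   # four times the original sample count
    aug_labels = labels.repeat(4)             # class labels are rotation-invariant
    return aug_patches, aug_labels
```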
After rotation, the HSI patches are fed into the multiscale network, which includes 3D CNNs, 2D CNNs, batch normalization layers, and ReLU activations. A 2D CNN can fully extract spatial information but ignores the rich spectral features of HSIs. A 3D CNN can extract spatial and spectral information jointly, but excessive use makes the network too bloated, which may reduce classification accuracy. To alleviate these problems, a combination of 3D CNNs and 2D CNNs is used to construct the multiscale network for HSI feature extraction in this paper. Each branch has a different hierarchical structure, and the first convolutional layer of each branch has a different kernel size. Three different features are extracted by the three branches and, after passing through their respective SPA modules, they are fused, so that a more discriminative feature representation is obtained and the classification performance is effectively improved.
The LiDAR image is fed into a single-branch network, which includes 2D CNNs, batch normalization layers, and ReLU activations. LiDAR images contain abundant spatial information, which compensates for the low spatial resolution of HSI. A three-layer 2D CNN is used to extract the spatial features of LiDAR. Through feature extraction, the number of channels of the LiDAR feature maps is progressively increased so that they can be further integrated with the features extracted from HSI.
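The following PyTorch sketch illustrates one possible instantiation of the two extractors. The channel widths and the 3D kernel depth are illustrative assumptions; the actual configuration is the one given in Figure 2. The multiscale network would instantiate three HSIBranch objects with different first-layer kernel sizes (e.g., k3d = 3, 5, 7) and concatenate their outputs after the SPA modules.

```python
import torch
import torch.nn as nn

class HSIBranch(nn.Module):
    """One branch of the multiscale HSI network: a 3D conv over (bands, H, W)
    followed by a 2D conv, each with BN + ReLU (widths are illustrative)."""
    def __init__(self, bands: int, k3d: int, out_ch: int = 64):
        super().__init__()
        self.conv3d = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=(7, k3d, k3d), padding=(3, k3d // 2, k3d // 2)),
            nn.BatchNorm3d(8), nn.ReLU(inplace=True))
        self.conv2d = nn.Sequential(
            nn.Conv2d(8 * bands, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):                      # x: (B, 1, bands, H, W)
        f = self.conv3d(x)                     # (B, 8, bands, H, W)
        f = f.flatten(1, 2)                    # fold the spectral dimension into channels
        return self.conv2d(f)                  # (B, out_ch, H, W)

class LiDARBranch(nn.Module):
    """Single-branch LiDAR extractor: three 2D conv layers that deepen the channels."""
    def __init__(self, in_ch: int = 1, out_ch: int = 64):
        super().__init__()
        widths = [in_ch, 16, 32, out_ch]
        layers = []
        for c_in, c_out in zip(widths[:-1], widths[1:]):
            layers += [nn.Conv2d(c_in, c_out, 3, padding=1),
                       nn.BatchNorm2d(c_out), nn.ReLU(inplace=True)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):                      # x: (B, 1, H, W)
        return self.net(x)
```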

2.3. Position-Channel Collaborative Attention Module

(1) Self-Position Attention Module: The self-position attention module builds rich semantic associations on local features to realize the spatial enhancement of HSI. The features A, B, and C are extracted from the three branches of the multiscale network, respectively. They capture the spatial and spectral features of HSI to different degrees, but the spatial features are not yet sufficiently significant. The SPA module is used to strengthen the spatial details of the three branches: A, B, and C are passed to the SPA module, respectively, which makes the spatial features more discriminative. Take feature A as an example. As shown in Figure 3a, given a local feature $A \in \mathbb{R}^{C \times H \times W}$, where $H \times W$ is the number of pixels and $C$ is the number of channels, we first transpose it to $A^T \in \mathbb{R}^{(H \times W) \times C}$, and then perform a matrix multiplication between $A^T$ and $A$ to calculate the position attention map $B \in \mathbb{R}^{C \times C}$. After that, we aggregate the global vectors by the squeeze operation to obtain the position impact over the whole HSI feature. The whole process can be formulated as:
$$spa(A) = A \, Squ\Big(\big[\, g(A_i, A_0^{T}),\; \ldots,\; g(A_i, A_{N-1}^{T}) \,\big]\Big)$$
where $A \in \mathbb{R}^{C \times H \times W}$ is the extracted HSI feature, and $A_i \in \mathbb{R}^{1 \times (H \times W)}$ and $A_j^{T} \in \mathbb{R}^{(H \times W) \times 1}$ denote the vectors at position $i$ of $A$ and position $j$ of $A^T$, respectively. $Squ(\cdot)$ represents the squeeze operation. We define the function $g(\cdot)$ as:
$$g(A, A^{T}) = R_1(A) \cdot R_1(A^{T})$$
where $R_1(\cdot)$ denotes the dimensional transformation that reshapes the three-dimensional space $\mathbb{R}^{C \times H \times W}$ into the two-dimensional space $\mathbb{R}^{C \times (H \times W)}$. Features B and C repeat the above operations to obtain three enhanced HSI features, which are then fused by feature concatenation. The combined output of the three SPA modules is calculated as:
$$output(T_{branch}) = Cat\big(spa(A),\, spa(B),\, spa(C)\big)$$
where $Cat(\cdot)$ denotes concatenation, which splices tensors together.
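A minimal PyTorch sketch of the SPA module is given below, following the written shapes (a $C \times C$ response map computed from $R_1(A)$ and its transpose, followed by a squeeze). The scaling by the number of pixels, the sigmoid normalization of the squeezed vector, and the channel-wise rescaling of A are assumptions made for this illustration, since the text does not fully specify $Squ(\cdot)$ or how the squeezed response is combined with A.

```python
import torch
import torch.nn as nn

class SPA(nn.Module):
    """Self-position attention sketch for one branch feature A of shape (B, C, H, W)."""
    def forward(self, A: torch.Tensor) -> torch.Tensor:
        B, C, H, W = A.shape
        A2 = A.reshape(B, C, H * W)                          # R_1(A): (B, C, H*W)
        resp = torch.bmm(A2, A2.transpose(1, 2)) / (H * W)   # g(A, A^T): (B, C, C) responses
        squeezed = torch.sigmoid(resp.mean(dim=-1, keepdim=True))  # Squ(.): (B, C, 1)
        return (A2 * squeezed).reshape(B, C, H, W)           # enhanced feature spa(A)

def multiscale_output(A: torch.Tensor, B_feat: torch.Tensor, C_feat: torch.Tensor):
    """Concatenate the three SPA-enhanced branch features along the channel axis."""
    spa = SPA()
    return torch.cat([spa(A), spa(B_feat), spa(C_feat)], dim=1)
```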
(2) Fused-Position Attention Module: The FPA module realizes spatial enhancement of HSI through the LiDAR image. It captures the position response between the HSI and LiDAR image features and enhances the original HSI features with this response, so that the obtained features not only have rich channel information but also gain rich position information through the enhancement. As shown in Figure 3b, given the features $F_1 \in \mathbb{R}^{C \times H_1 \times W_1}$ and $F_2 \in \mathbb{R}^{C \times H_2 \times W_2}$, which represent the extracted HSI and LiDAR image features, respectively, we first reshape $F_1$ into $F_1^{T} \in \mathbb{R}^{(H_1 \times W_1) \times C}$, and then perform a dimensional transformation and matrix multiplication between $F_1^{T}$ and $F_2$ to obtain the position attention map $P \in \mathbb{R}^{(H_1 \times W_1) \times (H_2 \times W_2)}$. Through the squeeze operation, $F_1$ obtains spatial enhancement from all positions of $F_2$. The whole process can be formulated as:
$$fpa(F_1, F_2) = F_1 \, Squ\Big(\big[\, g(f_1^{i}, f_2^{0}),\; \ldots,\; g(f_1^{i}, f_2^{N_2 - 1}) \,\big]\Big)$$
where $F_1 \in \mathbb{R}^{C \times H_1 \times W_1}$ is the extracted HSI feature and $N_2 = H_2 \times W_2$. $f_1^{i}$ and $f_2^{j}$ denote the vectors at position $i$ of $F_1^{T}$ and position $j$ of $F_2$, respectively.
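A sketch of the FPA module in the same style follows. The position response map between $F_1^T$ and $F_2$ is squeezed into a per-position weight for $F_1$; the scaling, the sigmoid, and the element-wise multiplication are again illustrative assumptions rather than the paper's exact operators.

```python
import torch
import torch.nn as nn

class FPA(nn.Module):
    """Fused-position attention sketch: spatially enhance the HSI feature F1 using F2."""
    def forward(self, F1: torch.Tensor, F2: torch.Tensor) -> torch.Tensor:
        # F1: (B, C, H1, W1) HSI feature, F2: (B, C, H2, W2) LiDAR feature
        B, C, H1, W1 = F1.shape
        f1 = F1.reshape(B, C, H1 * W1).transpose(1, 2)      # F1^T: (B, N1, C)
        f2 = F2.reshape(B, C, -1)                           # (B, C, N2)
        P = torch.bmm(f1, f2) / C                           # position attention map: (B, N1, N2)
        squeezed = torch.sigmoid(P.mean(dim=-1))            # Squ(.): (B, N1) per-position weights
        return F1 * squeezed.reshape(B, 1, H1, W1)          # spatially enhanced fpa(F1, F2)
```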
(3) Fused-Channel Attention Module: Correspondingly, the FCA module realizes channel enhancement of the LiDAR image through HSI. It captures the channel response between the HSI and LiDAR image features and enhances the original LiDAR features with this response. As shown in Figure 3c, the channel enhancement is calculated as:
$$fca(F_1, F_2) = F_2 \, Squ\Big(\big[\, g(f_2^{i}, f_1^{0}),\; \ldots,\; g(f_2^{i}, f_1^{C_1 - 1}) \,\big]\Big)$$
where $f_2^{i}$ and $f_1^{j}$ denote the vectors at position $i$ of $F_2^{T}$ and position $j$ of $F_1$, respectively.
The two outputs of the multi-branch network model are the common inputs of the FPA module and the FCA module. The FPA module is used to enhance the spatial features of the HSI features, the FCA module is used to enhance the channel representation of the LiDAR features, and the two enhanced outputs are then fused by feature addition. The calculation is as follows:
$$output(S_{branch}) = fpa(F_1, F_2) \oplus fca(F_1, F_2)$$
where $\oplus$ denotes matrix addition, and $F_1 \in \mathbb{R}^{C \times H_1 \times W_1}$ and $F_2 \in \mathbb{R}^{C \times H_2 \times W_2}$ represent the extracted HSI and LiDAR image features, respectively.
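The corresponding sketch of the FCA module and of the addition in $output(S_{branch})$ is shown below. It assumes that the HSI and LiDAR feature maps share the same spatial size (so that the matrix product and the final addition are well defined) and again uses an illustrative sigmoid-normalized squeeze.

```python
import torch
import torch.nn as nn

class FCA(nn.Module):
    """Fused-channel attention sketch: channel-enhance the LiDAR feature F2 using F1."""
    def forward(self, F1: torch.Tensor, F2: torch.Tensor) -> torch.Tensor:
        # F1: (B, C, H, W) HSI feature, F2: (B, C, H, W) LiDAR feature (same spatial size)
        B, C, H, W = F2.shape
        f2 = F2.reshape(B, C, H * W)                        # (B, C, N)
        f1 = F1.reshape(B, C, -1)                           # (B, C, N)
        resp = torch.bmm(f2, f1.transpose(1, 2)) / (H * W)  # channel responses: (B, C, C)
        squeezed = torch.sigmoid(resp.mean(dim=-1))         # Squ(.): (B, C) per-channel weights
        return F2 * squeezed.reshape(B, C, 1, 1)            # channel-enhanced fca(F1, F2)

def sbranch_output(fpa_out: torch.Tensor, fca_out: torch.Tensor) -> torch.Tensor:
    """Fuse the FPA- and FCA-enhanced features by element-wise (matrix) addition."""
    return fpa_out + fca_out
```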

2.4. Data Fusion Network Model

The data fusion method includes two parts: feature fusion and decision fusion. Feature fusion is the concatenation of the HSI features extracted by the multiscale network and the LiDAR features extracted by the single-branch network. Decision fusion combines different classification results by weighted summation, namely the classification results obtained after the SPA module, after the FPA and FCA modules, and after feature fusion.
The calculation process of feature fusion can be formulated as:
$$Out = Cat\big(\sigma(output(M_{branch})),\, output(S_{branch})\big)$$
The calculation process of $\sigma(\cdot)$ is as follows:
$$\sigma(\cdot) = ReLU\big(BN(f_0(\cdot))\big)$$
where $BN(\cdot)$ denotes batch normalization, $ReLU(\cdot)$ denotes the activation function, and $f_0(\cdot)$ denotes a 2D convolution that reduces the dimensionality of the features extracted from HSI for feature fusion.
The calculation process of the classification module can be formulated as:
$$C(\cdot) = MLP\big(Flatten(AvgPool(\cdot))\big)$$
Here, $output(M_{branch})$ represents the output of the HSI feature extraction network, $output(S_{branch})$ represents the output of the LiDAR feature extraction network, and $Out$ represents the result of feature fusion. They are sent to the classification module separately to obtain three different probability matrices, and the final classification result is obtained by weighted summation, where the weights are automatically updated through network training. The specific calculation process can be formulated as:
$$Result = C(Out) + \lambda_1 \cdot C\big(\sigma(output(M_{branch}))\big) + \lambda_2 \cdot C\big(output(S_{branch})\big)$$
where $\lambda_1$ and $\lambda_2$ represent the learned weights.
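A sketch of the fusion and decision stage is given below. The 1 × 1 kernel of $f_0$, the hidden width of the MLP, and the initial values of $\lambda_1$ and $\lambda_2$ are assumptions made for illustration; as in the text, $\lambda_1$ and $\lambda_2$ are learnable scalars updated by backpropagation.

```python
import torch
import torch.nn as nn

class Classifier(nn.Module):
    """C(.) = MLP(Flatten(AvgPool(.))), with an illustrative hidden width of 128."""
    def __init__(self, in_ch: int, num_classes: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.mlp = nn.Sequential(nn.Linear(in_ch, 128), nn.ReLU(inplace=True),
                                 nn.Linear(128, num_classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.flatten(self.pool(x), 1))

class DecisionFusion(nn.Module):
    def __init__(self, hsi_ch: int, lidar_ch: int, num_classes: int):
        super().__init__()
        # sigma(.) = ReLU(BN(f_0(.))): a 2D convolution that reduces the HSI feature dimension
        self.sigma = nn.Sequential(nn.Conv2d(hsi_ch, lidar_ch, kernel_size=1),
                                   nn.BatchNorm2d(lidar_ch), nn.ReLU(inplace=True))
        self.c_fused = Classifier(2 * lidar_ch, num_classes)
        self.c_hsi = Classifier(lidar_ch, num_classes)
        self.c_lidar = Classifier(lidar_ch, num_classes)
        self.lambda1 = nn.Parameter(torch.tensor(1.0))   # learned weight (initial value assumed)
        self.lambda2 = nn.Parameter(torch.tensor(1.0))

    def forward(self, m_branch: torch.Tensor, s_branch: torch.Tensor) -> torch.Tensor:
        hsi_red = self.sigma(m_branch)                   # sigma(output(M_branch))
        out = torch.cat([hsi_red, s_branch], dim=1)      # Out: feature-level fusion
        return (self.c_fused(out)                        # Result: weighted decision fusion
                + self.lambda1 * self.c_hsi(hsi_red)
                + self.lambda2 * self.c_lidar(s_branch))
```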

3. Results and Discussion

3.1. Data Sets

(1) 2012 Houston Data: This data set was acquired in June 2012 over the University of Houston campus and the adjacent urban area. It consists of a hyperspectral image with 144 spectral bands and DSM data, both of which contain 349 × 1905 pixels with a spatial resolution of 2.5 m. The whole scene contains 15 different classes. Table 1 lists the detailed sample quantity and color of each class. Figure 4 gives the false-color image of the hyperspectral data, the gray image of the LiDAR data, and the ground-truth map. Standard training and test samples are used throughout the experiments to make the results credible. The data and reference classes can be obtained online from the IEEE GRSS website (http://dase.grss-ieee.org/ (accessed on 5 July 2022)).
(2) Trento Dataset: This data set was acquired over a rural area in the south of Trento. It consists of a hyperspectral image with 63 spectral bands captured by the AISA Eagle sensor and LiDAR data captured by the Optech ALTM 3100EA sensor, both of which contain 600 × 166 pixels with a spatial resolution of 1 m. The whole scene contains 6 different classes. Table 2 lists the detailed sample quantity and color of each class. Figure 5 gives the false-color image of the hyperspectral data, the gray image of the LiDAR data, and the ground-truth map. The experiments also use standard training and test samples.

3.2. Experimental Setup

Two different data sets were used to evaluate the effectiveness of the model through the overall accuracy (OA), average accuracy (AA), and Kappa coefficient, and the model was compared with several other models. The overall accuracy (OA) is the ratio of correctly classified pixels to the total number of pixels in the test set. The average accuracy (AA) is obtained by summing the accuracies of all classes and dividing by the number of classes. The Kappa coefficient is also used to evaluate classification accuracy by checking the consistency between the classification result map and the ground-truth map.
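These three metrics can be computed from the confusion matrix of the test set, as in the sketch below (the function name and the NumPy formulation are illustrative).

```python
import numpy as np

def classification_metrics(conf: np.ndarray):
    """conf[i, j]: number of test pixels whose true class is i and predicted class is j."""
    total = conf.sum()
    oa = np.trace(conf) / total                                    # overall accuracy
    per_class = np.diag(conf) / conf.sum(axis=1)                   # accuracy of each class
    aa = per_class.mean()                                          # average accuracy
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total ** 2  # expected chance agreement
    kappa = (oa - pe) / (1 - pe)                                   # Kappa coefficient
    return oa, aa, kappa
```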
The experiments are implemented under the PyTorch framework, using the cross-entropy loss function and the Adam optimizer with a learning rate of 0.001. The batch size and the number of training epochs are set to 64 and 200, respectively.
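A minimal training-loop sketch with these settings is shown below; the model interface (taking an HSI patch and a LiDAR patch) and the dataset object are placeholders standing in for the network and data described above.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, Dataset

def train(model: nn.Module, dataset: Dataset, epochs: int = 200) -> nn.Module:
    loader = DataLoader(dataset, batch_size=64, shuffle=True)   # batch size 64
    criterion = nn.CrossEntropyLoss()                           # cross-entropy loss
    optimizer = optim.Adam(model.parameters(), lr=0.001)        # Adam, learning rate 0.001
    model.train()
    for _ in range(epochs):                                     # 200 training epochs
        for hsi_patch, lidar_patch, label in loader:
            optimizer.zero_grad()
            logits = model(hsi_patch, lidar_patch)              # joint HSI + LiDAR forward pass
            loss = criterion(logits, label)
            loss.backward()
            optimizer.step()
    return model
```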
To verify the proposed PC²ANet, the classification results of different patch sizes are compared: with all other parameters fixed, patch sizes are selected from the candidate set {5 × 5, 7 × 7, 9 × 9, 11 × 11} to evaluate their effect.
Figure 6 shows the change of OA under different patch sizes. The experimental results show that features extracted with different patch sizes yield different classification performance. On the 2012 Houston dataset, the OA keeps increasing as the patch grows from 5 × 5 to 9 × 9 and peaks at 9 × 9, but drops considerably at 11 × 11. On the Trento dataset, OA reaches its peak when the patch is 5 × 5. Different data sets have different features and information, so it is necessary to select an appropriate patch size according to the feature information to obtain better results.

3.3. Experimental Results

(1) Effectiveness of the PC²A module: A comparative experiment on single-source classification with and without the PC²A module is conducted for both LiDAR and HSI inputs. HSI denotes feature extraction and classification of HSI through the multiscale network, and LiDAR denotes feature extraction and classification of LiDAR images through the single-branch network. LiDAR-PC²A means that the position-channel collaborative attention module is added to the LiDAR branch, and HSI-PC²A means that it is added to the HSI branch. To ensure the fairness and validity of the comparison, the standard data sets are adopted uniformly, and all training and test samples are the same.
Table 3 lists the overall accuracy (OA), average accuracy (AA), and Kappa coefficient of the five models on the 2012 Houston and Trento data sets, with the best results shown in bold. Clearly, the classification accuracy of single-source models with the PC²A module is higher than that without it. On the Houston dataset, the OA of the LiDAR-PC²A model is 60.87% and that of the LiDAR model is 58.25%, an improvement of 2.62% after adding the PC²A module. On the Trento dataset, the OA of the HSI-PC²A model is 97.31% and that of the HSI model is 96.72%, an improvement of 0.59%. Figure 7 shows the classification maps of the five models listed in Table 3 on the Houston data set. It can be seen from Figure 7a–e that the class boundaries become clearer and the results become more distinct. Figure 8 shows partial classification results on the Trento data set, which are primarily used to compare the classification details of the HSI and HSI-PC²A models. In this local map, the Wood class occupies a large area, and when the HSI model is used for classification, some Wood pixels are mistakenly classified as Vineyard or Apple trees. With the addition of the PC²A module, the model pays more attention to the detailed information of the categories and identifies them better. As can be seen from Figure 8, the HSI-PC²A model classifies the Wood category clearly better than the HSI model.
The single-source classification result of LiDAR images is the worst among the methods listed in Table 3. However, LiDAR data can provide potential details for HSI, including the height and shape information of ground objects, which is a necessary complement to the deficiencies of HSI. The OA after feature-level fusion of the LiDAR image and HSI is also much higher than that of single-source HSI classification, which verifies the indispensability of LiDAR data and fully demonstrates the effectiveness of feature fusion.
(2) Effectiveness of the proposed PC²ANet: To validate the effectiveness of the proposed method, PC²ANet is compared with advanced deep learning models, namely Two-Branch CNN [40], EndNet [41], MDL-Middle [42], FusAtNet [49], IP-CNN [43], CRNN [4], S²ENet [50], and HRWN [37]. For a fair comparison, all training and testing samples are the same. Among them, the Two-Branch CNN realizes feature fusion by combining the spatial and spectral features extracted from HSI with the LiDAR image features extracted by a cascade network. EndNet is a deep encoder–decoder network architecture that fuses multimodal information by strengthening fusion features. MDL-Middle is a baseline CNN model based on intermediate fusion. FusAtNet generates an attention map through a self-attention mechanism to highlight its own spectral features and highlights spatial features through a cross-attention mechanism for classification. IP-CNN designs a bidirectional autoencoder to reconstruct HSI and LiDAR data, and the reconstruction process does not depend on the labeling information. The convolutional recurrent neural network (CRNN) was proposed to learn more discriminative features for HSI classification. S²ENet enhances the spectral and spatial representations of the images through a spatial–spectral enhancement module for cross-modal information interaction to achieve feature fusion. HRWN jointly optimizes a dual-tunnel CNN and pixel-level affinity through a random walk layer, which strengthens spatial consistency in the deeper layers of the network.
Table 4 and Table 5 list the OA, AA, and Kappa coefficients on the 2012 Houston and Trento data sets, respectively, together with the accuracy of each class; the best results are shown in bold. Clearly, the experimental results of PC²ANet are better than those of the other methods. Taking the 2012 Houston data set as an example, the OA of the proposed PC²ANet is 95.02%, the AA is 94.97%, and the Kappa is 94.59, which is the best among the listed methods. Specifically, on the Houston dataset it shows approximately 7.04%, 6.50%, 5.47%, 5.04%, 2.96%, 6.47%, 0.83%, and 1.41% improvements in OA over Two-Branch, EndNet, MDL-Middle, FusAtNet, IP-CNN, CRNN, S²ENet, and HRWN, respectively. For the Trento data set, the classification accuracy of the PC²ANet model for the two categories Ground and Roads is significantly higher than that of the other models. Although the classification accuracy of the different models on this data set has reached a relatively high level, their performance on the Ground and Roads categories is less than ideal. As can be seen from Table 2, the number of samples in the Ground and Roads categories is relatively small, and due to this class imbalance, classes with more samples tend to be classified more accurately than those with fewer samples. However, the presented PC²ANet model concentrates on the detailed information of the categories through the attention mechanism and accurately learns the differences among the categories for precise classification, which indicates the superiority of this model.
In terms of single-class classification, the proposed PC²ANet achieves the best results for the healthy grass, tree, residential, commercial, road, and highway categories. In particular, for healthy grass, commercial, and highway, where classification is generally difficult, our model shows a markedly better classification effect than the other models.
To qualitatively evaluate the classification performance, Figure 9 and Figure 10 show the classification maps of the different methods; the visual results are consistent with the numerical results in Table 4 and Table 5. The classification map of EndNet shown in Figure 9b contains more detailed information, but its OA, AA, and Kappa are far lower than those of the proposed PC²ANet. Because the input of EndNet is pixel by pixel rather than pixel blocks, its classification maps show more detail; however, the correlation of adjacent pixels is not exploited and only the current pixel is considered, so the details displayed in the classification results are often wrong, which limits the classification accuracy of this model. This further demonstrates the significance of considering neighborhood information.
(3) Training and testing time cost analysis: Table 6 and Table 7 show the training and testing time costs of the different models. All experiments are implemented under the same software configuration. The training process takes a long time, while the testing process is short. Table 6 presents the training and testing times of the unfused and fused models. Obviously, the time cost of LiDAR-PC²A is greater than that of LiDAR, and that of HSI-PC²A is greater than that of HSI, which shows that the attention mechanism increases the computational cost of the network while improving its performance. In Table 7, the training and testing times of EndNet on both datasets are short, because EndNet does not consider neighborhood information during training and testing; using a single pixel as the input saves time, but ignoring the neighborhood information leads to a drop in accuracy. The training and testing times of the FusAtNet, S²ENet, and PC²ANet networks are relatively high. What these networks have in common is the added attention modules, which increase the computational cost of the model but at the same time improve the classification performance. As can be seen from Table 7, the training and testing times of the FusAtNet model are much longer than those of the other models, because it uses multiple cross-attentions and its network is deeper than the others, which greatly increases the computational cost.

3.4. Ablation Studies

The proposed PC²ANet includes two components that improve classification performance, namely the PC²A module and the decision fusion module. To verify the effectiveness of these two modules, a series of ablation experiments was conducted on the 2012 Houston and Trento data sets. Specifically, while keeping the other parts of the model unchanged, experiments were conducted in two settings: with/without the PC²A module and with/without decision fusion, and the experimental results were compared.
(1) With/Without the PC²A Module: The PC²A module consists of SPA, FCA, and FPA. By weighting the position and channel features to different degrees, the extracted features contain more effective information, which benefits classification and thus improves the expressive ability of the model. To analyze the effectiveness of the PC²A module, Table 8 shows the results with and without it. It is obvious that the OA, AA, and Kappa of the model with PC²A are higher than those without PC²A.
The per-class OA of some categories is shown in Figure 11. On the 2012 Houston data set, the classification results of most categories are higher with the PC²A module than without it, especially for categories with plenty of samples, which shows that the model has a strong learning ability: given enough samples, it can effectively learn the detailed information of different categories and thus classify them accurately. As can be seen from Figure 11, the accuracy of most categories is improved after adding the PC²A module, which proves the superiority of this module.
(2) With/Without Decision Fusion: The decision fusion (DS) module fuses the outputs of the HSI feature extraction network, the LiDAR feature extraction network, and the result after feature fusion. The three classification results obtained by the classification modules are fused, and through automatic learning the weight coefficients are continuously updated to obtain the optimal weighted result. To analyze the effectiveness of the decision fusion module, Table 9 shows the results with and without it. It is obvious that the OA, AA, and Kappa of the model with DS are higher than those without DS.
Figure 12 shows the OA of some classes. On the 2012 Houston data set, nine classes obtain a higher OA with DS than without it. On the Trento data set, the classification result of every class is higher with DS than without it. These experiments effectively prove that the decision fusion module can obviously improve the performance of the model.

4. Conclusions

This paper proposed a position-channel cooperative attention network for HSI and LiDAR data fusion. In the feature extraction stage, HSI features are initially extracted by a multiscale network and their spatial information is then enhanced by the SPA module; LiDAR features are initially extracted by a single-branch network; and the extracted HSI and LiDAR features are then enhanced by the FPA and FCA modules, respectively, so that the complementary spatial and channel features receive more attention while useless information is suppressed. Feature fusion concatenates the extracted HSI and LiDAR features. After feature fusion, the HSI and LiDAR features enhanced by the PC²A module are also sent to the classification modules, the three output results are combined by weighted summation, and the weights are automatically learned through the network. To validate the effectiveness of the proposed model, we conducted experiments on two data sets. The qualitative and quantitative comparisons on the 2012 Houston and Trento data show that the proposed model is effective. In addition, we evaluated the influence of the neighborhood size, the PC²A module, and the decision fusion module on classification performance through ablation experiments, and the results show that the proposed modules and parameter settings improve performance. In the future, we will explore more powerful feature extraction methods and more effective feature fusion strategies.

Author Contributions

Conceptualization, L.Z.; methodology, L.Z.; validation, L.Z.; writing—original draft preparation, L.Z.; writing—review and editing, L.Z., J.G. and W.J.; supervision, J.G. and W.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by National Key R&D Program of China, grant number 2021YFB3900502; in part by the National Natural Science Foundation of China, grant number 61901376.

Data Availability Statement

The Houston Dataset is available at: http://dase.grss-ieee.org/, accessed on 5 July 2022. The Trento Dataset can be obtained from [42].

Acknowledgments

The authors would like to thank the IEEE GRSS Image Analysis and Data Fusion Technical Committee for providing the 2012 Houston data.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hong, D.; Yokoya, N.; Chanussot, J.; Zhu, X.X. CoSpace: Common subspace learning from hyperspectral-multispectral correspondences. IEEE Trans. Geosci. Remote Sens. 2019, 57, 4349–4359. [Google Scholar] [CrossRef] [Green Version]
  2. Lu, Z.; Xu, B.; Sun, L.; Zhan, T.; Tang, S. 3-D Channel and spatial attention based multiscale spatial–spectral residual network for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 4311–4324. [Google Scholar] [CrossRef]
  3. Yue, J.; Fang, L.; Rahmani, H.; Ghamisi, P. Self-supervised learning with adaptive distillation for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–13. [Google Scholar] [CrossRef]
  4. Wu, H.; Prasad, S. Convolutional recurrent neural networks for hyperspectral data classification. Remote Sens. 2017, 9, 298. [Google Scholar] [CrossRef] [Green Version]
  5. Ghamisi, P.; Benediktsson, J.A.; Sveinsson, J.R. Automatic Spectral–Spatial Classification Framework Based on Attribute Profiles and Supervised Feature Extraction. IEEE Trans. Geosci. Remote Sens. 2014, 52, 5771–5782. [Google Scholar] [CrossRef]
  6. Kuras, A.; Brell, M.; Rizzi, J.; Burud, I. Hyperspectral and Lidar Data Applied to the Urban Land Cover Machine Learning and Neural-Network-Based Classification: A Review. Remote Sens. 2021, 13, 3393. [Google Scholar] [CrossRef]
  7. Mäyrä, J.; Keski-Saari, S.; Kivinen, S.; Tanhuanpää, T.; Hurskainen, P.; Kullberg, P.; Poikolainen, L.; Viinikka, A.; Tuominen, S.; Kumpula, T.; et al. Tree species classification from airborne hyperspectral and LiDAR data using 3D convolutional neural networks. Remote Sens. Environ. 2021, 256, 112322. [Google Scholar] [CrossRef]
  8. Wang, X.; Feng, Y.; Song, R.; Mu, Z.; Song, C. Multi-attentive hierarchical dense fusion net for fusion classification of hyperspectral and LiDAR data. Inf. Fusion 2022, 82, 1–18. [Google Scholar] [CrossRef]
  9. Ghamisi, P.; Rasti, B.; Yokoya, N.; Wang, Q.; Hofle, B.; Bruzzone, L.; Bovolo, F.; Chi, M.; Anders, K.; Gloaguen, R.; et al. Multisource and multitemporal data fusion in remote sensing: A comprehensive review of the state of the art. IEEE Geosci. Remote Sens. Mag. 2019, 7, 6–39. [Google Scholar] [CrossRef] [Green Version]
  10. Debes, C.; Merentitis, A.; Heremans, R.; Hahn, J.; Frangiadakis, N.; van Kasteren, T.; Liao, W.; Bellens, R.; Pižurica, A.; Gautama, S.; et al. Hyperspectral and LiDAR data fusion: Outcome of the 2013 GRSS data fusion contest. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2405–2418. [Google Scholar] [CrossRef]
  11. Geng, J.; Deng, X.; Ma, X.; Jiang, W. Transfer Learning for SAR Image Classification Via Deep Joint Distribution Adaptation Networks. IEEE Trans. Geosci. Remote. Sens. 2020, 58, 5377–5392. [Google Scholar] [CrossRef]
  12. Wang, Y.; Duan, H. Classification of Hyperspectral Images by SVM Using a Composite Kernel by Employing Spectral, Spatial and Hierarchical Structure Information. Remote Sens. 2018, 10, 26. [Google Scholar] [CrossRef] [Green Version]
  13. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501. [Google Scholar] [CrossRef]
  14. Zhang, S.; Li, S.; Fu, W.; Fang, L. Multiscale superpixel-based sparse representation for hyperspectral image classification. Remote Sens. 2017, 9, 139. [Google Scholar] [CrossRef] [Green Version]
  15. Hänsch, R.; Hellwich, O. Feature-independent classification of hyperspectral images by projection-based random forests. In Proceedings of the 2015 7th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), Tokyo, Japan, 2–5 June 2015; pp. 1–4. [Google Scholar]
  16. Jiang, W. A correlation coefficient for belief functions. Int. J. Approx. Reason. 2018, 103, 94–106. [Google Scholar] [CrossRef] [Green Version]
  17. Liu, Y.; Liu, L.; Gao, Y.; Yang, L. An Improved Random Forest Algorithm Based on Attribute Compatibility. In Proceedings of the 2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chengdu, China, 15–17 March 2019. [Google Scholar]
  18. Jiang, W.; Cao, Y.; Deng, X. A Novel Z-network Model Based on Bayesian Network and Z-number. IEEE Trans. Fuzzy Syst. 2020, 28, 1585–1599. [Google Scholar] [CrossRef]
  19. He, Z.; Jiang, W. An evidential Markov decision making model. Inf. Sci. 2018, 467, 357–372. [Google Scholar] [CrossRef] [Green Version]
  20. Jiang, W.; Xie, C.; Zhuang, M.; Tang, Y. Failure Mode and Effects Analysis based on a novel fuzzy evidential method. Appl. Soft Comput. 2017, 57, 672–683. [Google Scholar] [CrossRef]
  21. Pedergnana, M.; Marpu, P.R.; Dalla Mura, M.; Benediktsson, J.A.; Bruzzone, L. Classification of remote sensing optical and LiDAR data using extended attribute profiles. IEEE J. Sel. Top. Signal Process. 2012, 6, 856–865. [Google Scholar] [CrossRef]
  22. Gu, Y.; Wang, Q.; Jia, X.; Benediktsson, J.A. A novel MKL model of integrating LiDAR data and MSI for urban area classification. IEEE Trans. Geosci. Remote Sens. 2015, 53, 5312–5326. [Google Scholar]
  23. Ge, C.; Du, Q.; Li, W.; Li, Y.; Sun, W. Hyperspectral and LiDAR data classification using kernel collaborative representation based residual fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 1963–1973. [Google Scholar] [CrossRef]
  24. Tong, Y.; Quan, Y.; Feng, W.; Dauphin, G.; Wang, Y.; Wu, P.; Xing, M. Multi-Scale Feature Extraction and Total Variation Based Fusion Method For HSI and Lidar Data Classification. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 5433–5436. [Google Scholar]
  25. Jia, S.; Zhan, Z.; Xu, M. Shearlet-Based Structure-Aware Filtering for Hyperspectral and LiDAR Data Classification. J. Remote Sens. 2021, 2021, 9825415. [Google Scholar] [CrossRef]
  26. Huang, K.; Geng, J.; Jiang, W.; Deng, X.; Xu, Z. Pseudo-Loss Confidence Metric for Semi-Supervised Few-Shot Learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 8671–8680. [Google Scholar]
  27. Xu, K.; Huang, H.; Deng, P.; Li, Y. Deep feature aggregation framework driven by graph convolutional network for scene classification in remote sensing. IEEE Trans. Neural Netw. Learn. Syst. 2021. [Google Scholar] [CrossRef] [PubMed]
  28. He, Z.; Jiang, W. An evidential dynamical model to predict the interference effect of categorization on decision making. Knowl.-Based Syst. 2018, 150, 139–149. [Google Scholar] [CrossRef]
  29. Jiang, W.; Huang, K.; Geng, J.; Deng, X. Multi-Scale Metric Learning for Few-Shot Learning. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 1091–1102. [Google Scholar] [CrossRef]
  30. He, Q.; Sun, X.; Yan, Z.; Fu, K. DABNet: Deformable contextual and boundary-weighted network for cloud detection in remote sensing images. IEEE Trans. Geosci. Remote. Sens. 2021, 60, 1–16. [Google Scholar] [CrossRef]
  31. Sun, Y.; Fu, Z.; Sun, C.; Hu, Y.; Zhang, S. Deep Multimodal Fusion Network for Semantic Segmentation Using Remote Sensing Image and LiDAR Data. IEEE Trans. Geosci. Remote Sens. 2021, 60. [Google Scholar] [CrossRef]
  32. Bazi, Y.; Bashmal, L.; Rahhal, M.M.A.; Dayil, R.A.; Ajlan, N.A. Vision transformers for remote sensing image classification. Remote Sens. 2021, 13, 516. [Google Scholar] [CrossRef]
  33. Miao, W.; Geng, J.; Jiang, W. Semi-Supervised Remote Sensing Image Scene Classification Using Representation Consistency Siamese Network. IEEE Trans. Geosci. Remote Sens. 2022, 60. [Google Scholar] [CrossRef]
  34. Li, H.; Ghamisi, P.; Soergel, U.; Zhu, X.X. Hyperspectral and LiDAR fusion using deep three-stream convolutional neural networks. Remote Sens. 2018, 10, 1649. [Google Scholar] [CrossRef] [Green Version]
  35. Zhao, C.; Gao, X.; Wang, Y.; Li, J. Efficient multiple-feature learning-based hyperspectral image classification with limited training samples. IEEE Trans. Geosci. Remote Sens. 2016, 54, 4052–4062. [Google Scholar] [CrossRef]
  36. Hang, R.; Li, Z.; Ghamisi, P.; Hong, D.; Xia, G.; Liu, Q. Classification of hyperspectral and LiDAR data using coupled CNNs. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4939–4950. [Google Scholar] [CrossRef] [Green Version]
  37. Zhao, X.; Tao, R.; Li, W.; Li, H.C.; Du, Q.; Liao, W.; Philips, W. Joint classification of hyperspectral and LiDAR data using hierarchical random walk and deep CNN architecture. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7355–7370. [Google Scholar] [CrossRef]
  38. Zhang, M.; Li, W.; Du, Q.; Gao, L.; Zhang, B. Feature extraction for classification of hyperspectral and LiDAR data using patch-to-patch CNN. IEEE Trans. Cybern. 2018, 50, 100–111. [Google Scholar] [CrossRef] [PubMed]
  39. Hong, D.; Yokoya, N.; Xia, G.S.; Chanussot, J.; Zhu, X.X. X-ModalNet: A semi-supervised deep cross-modal network for classification of remote sensing data. ISPRS J. Photogramm. Remote Sens. 2020, 167, 12–23. [Google Scholar] [CrossRef] [PubMed]
  40. Xu, X.; Li, W.; Ran, Q.; Du, Q.; Gao, L.; Zhang, B. Multisource remote sensing data classification based on convolutional neural network. IEEE Trans. Geosci. Remote Sens. 2017, 56, 937–949. [Google Scholar] [CrossRef]
  41. Hong, D.; Gao, L.; Hang, R.; Zhang, B.; Chanussot, J. Deep Encoder–Decoder Networks for Classification of Hyperspectral and LiDAR Data. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
  42. Hong, D.; Gao, L.; Yokoya, N.; Yao, J.; Chanussot, J.; Du, Q.; Zhang, B. More Diverse Means Better: Multimodal Deep Learning Meets Remote-Sensing Imagery Classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 4340–4354. [Google Scholar] [CrossRef]
  43. Zhang, M.; Li, W.; Tao, R.; Li, H.; Du, Q. Information fusion for classification of hyperspectral and LiDAR data using IP-CNN. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–12. [Google Scholar] [CrossRef]
  44. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  45. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  46. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  47. Park, J.; Woo, S.; Lee, J.Y.; Kweon, I.S. A simple and light-weight attention module for convolutional neural networks. Int. J. Comput. Vis. 2020, 128, 783–798. [Google Scholar] [CrossRef]
  48. Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3146–3154. [Google Scholar]
  49. Mohla, S.; Pande, S.; Banerjee, B.; Chaudhuri, S. Fusatnet: Dual attention based spectrospatial multimodal fusion network for hyperspectral and lidar classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 92–93. [Google Scholar]
  50. Fang, S.; Li, K.; Li, Z. S2ENet: Spatial–Spectral Cross-Modal Enhancement Network for Classification of Hyperspectral and LiDAR Data. IEEE Geosci. Remote. Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
Figure 1. Proposed classification framework of PC²ANet.
Figure 2. Overall parameter configuration of the designed feature extraction model.
Figure 3. Position-channel collaborative attention module. (a) SPA. (b) FPA. (c) FCA.
Figure 4. Visualization of the 2012 Houston Data. (a) False-color image of hyperspectral data using bands 64, 43, and 22 as R, G, B, respectively. (b) Gray image of LiDAR data. (c) Ground-truth map.
Figure 5. Visualization of the Trento Data. (a) False-color image of hyperspectral data using bands 40, 20, and 10 as R, G, B, respectively. (b) Gray image of LiDAR data. (c) Ground-truth map.
Figure 6. Classification performance of the proposed PC²ANet with different patch scales.
Figure 7. Visualization of the influence of the PC²A module on the 2012 Houston data set. (a) LiDAR. (b) LiDAR-PC²A. (c) HSI. (d) HSI-PC²A. (e) PC²ANet.
Figure 8. Partial detail classification diagram with/without the PC²A module on the Trento dataset.
Figure 9. Classification maps of the 2012 Houston data using different models. (a) Two-branch. (b) EndNet. (c) MDL-Middle. (d) FusAtNet. (e) S²ENet. (f) PC²ANet.
Figure 10. Classification maps of the Trento data using different models. (a) Two-branch. (b) EndNet. (c) MDL-Middle. (d) FusAtNet. (e) S²ENet. (f) PC²ANet.
Figure 11. Influence of with/without the PC²A module on classification accuracies of each class. (a) 2012 Houston data set. (b) Trento data set.
Figure 12. Influence of with/without the decision fusion module on classification accuracies of each class. (a) 2012 Houston data set. (b) Trento data set.
Table 1. The 2012 Houston data: numbers of training and testing samples and RGB color of each class.
Class | Land-Cover Type | Train | Test | Color (RGB)
C1 | Healthy grass | 198 | 1053 | (0, 205, 0)
C2 | Stressed grass | 190 | 1064 | (127, 255, 0)
C3 | Synthetic grass | 192 | 505 | (46, 139, 87)
C4 | Tree | 188 | 1056 | (0, 139, 0)
C5 | Soil | 186 | 1056 | (160, 82, 45)
C6 | Water | 182 | 143 | (0, 255, 255)
C7 | Residential | 196 | 1072 | (255, 255, 255)
C8 | Commercial | 191 | 1053 | (216, 191, 216)
C9 | Road | 193 | 1059 | (255, 100, 100)
C10 | Highway | 191 | 1036 | (139, 0, 0)
C11 | Railway | 181 | 1054 | (0, 0, 255)
C12 | Parking lot 1 | 192 | 1041 | (255, 255, 0)
C13 | Parking lot 2 | 184 | 285 | (238, 154, 0)
C14 | Tennis court | 181 | 247 | (85, 26, 139)
C15 | Running track | 187 | 473 | (255, 127, 80)
- | Total | 2832 | 12,197 | -
Table 2. Trento data: numbers of training and testing samples and RGB color of each class.
Class | Land-Cover Type | Train | Test | Color (RGB)
C1 | Apple trees | 129 | 3905 | (0, 255, 0)
C2 | Buildings | 125 | 2778 | (0, 0, 255)
C3 | Ground | 105 | 374 | (255, 255, 0)
C4 | Wood | 154 | 8969 | (255, 0, 255)
C5 | Vineyard | 184 | 10,317 | (0, 255, 255)
C6 | Roads | 122 | 3252 | (255, 0, 0)
- | Total | 819 | 29,595 | -
Table 3. Influence of PC²A on the Houston and Trento datasets.
Dataset | Metric | LiDAR | LiDAR-PC²A | HSI | HSI-PC²A | PC²ANet
2012 Houston | OA (%) | 58.25 | 60.87 | 91.94 | 92.43 | 95.02
2012 Houston | AA (%) | 61.53 | 62.40 | 92.50 | 93.08 | 94.97
2012 Houston | Kappa | 54.94 | 57.69 | 91.25 | 91.78 | 94.59
Trento | OA (%) | 87.00 | 87.19 | 96.72 | 97.31 | 99.15
Trento | AA (%) | 78.06 | 81.32 | 95.50 | 95.92 | 98.81
Trento | Kappa | 82.85 | 83.16 | 95.63 | 96.41 | 98.87
Table 4. Classification accuracies (%) and Kappa coefficients of different models on the 2012 Houston dataset.
Class | Two-Branch | EndNet | MDL-Middle | FusAtNet | IP-CNN | CRNN | S²ENet | HRWN | PC²ANet
C1 | 83.10 | 81.58 | 83.10 | 83.10 | 85.77 | 83.00 | 82.91 | 85.61 | 86.89
C2 | 84.10 | 83.65 | 85.06 | 96.05 | 87.34 | 79.41 | 100 | 85.17 | 99.72
C3 | 100 | 100 | 99.60 | 100 | 100 | 99.80 | 100 | 99.57 | 99.80
C4 | 93.09 | 93.09 | 91.57 | 93.09 | 94.26 | 90.15 | 96.88 | 92.20 | 97.92
C5 | 100 | 99.91 | 98.86 | 99.43 | 98.42 | 99.71 | 99.91 | 100 | 98.39
C6 | 99.30 | 95.10 | 100 | 100 | 99.91 | 83.21 | 100 | 98.15 | 95.80
C7 | 92.82 | 82.65 | 96.64 | 93.53 | 94.59 | 88.06 | 95.15 | 95.98 | 97.48
C8 | 82.34 | 81.29 | 88.13 | 92.12 | 91.81 | 88.61 | 93.92 | 97.59 | 95.06
C9 | 84.70 | 88.29 | 85.93 | 83.63 | 89.35 | 66.01 | 91.31 | 88.66 | 91.78
C10 | 65.44 | 89.00 | 74.42 | 64.09 | 72.43 | 52.22 | 92.95 | 86.23 | 98.26
C11 | 88.24 | 83.78 | 84.54 | 90.13 | 96.57 | 81.97 | 94.69 | 97.98 | 95.35
C12 | 89.53 | 90.39 | 95.39 | 91.93 | 95.60 | 69.83 | 89.43 | 97.40 | 87.22
C13 | 92.28 | 82.46 | 87.37 | 88.42 | 94.37 | 79.64 | 83.16 | 91.47 | 81.40
C14 | 96.76 | 100 | 95.14 | 100 | 99.86 | 100 | 100 | 100 | 99.60
C15 | 99.79 | 98.10 | 100 | 99.15 | 99.99 | 100 | 100 | 100 | 99.79
OA | 87.98 | 88.52 | 89.55 | 89.98 | 92.06 | 88.55 | 94.19 | 93.61 | 95.02
AA | 90.11 | 89.95 | 91.05 | 94.65 | 93.35 | 90.30 | 94.69 | 94.40 | 94.97
Kappa | 86.98 | 87.59 | 88.71 | 89.13 | 91.42 | 87.56 | 93.69 | 93.09 | 94.59
Table 5. Classification accuracies (%) and Kappa coefficients of different models on the Trento dataset.
Class | Two-Branch | EndNet | MDL-Middle | FusAtNet | IP-CNN | CRNN | S²ENet | HRWN | PC²ANet
C1 | 99.78 | 88.19 | 99.50 | 99.54 | 99.00 | 98.39 | 99.65 | 99.75 | 99.75
C2 | 97.93 | 98.49 | 97.55 | 98.49 | 99.40 | 90.46 | 97.31 | 94.32 | 98.28
C3 | 99.93 | 95.19 | 99.10 | 99.73 | 99.10 | 99.79 | 99.67 | 98.75 | 100
C4 | 99.46 | 99.30 | 99.90 | 100 | 99.92 | 96.96 | 99.97 | 100 | 99.90
C5 | 98.96 | 91.96 | 99.71 | 99.90 | 99.66 | 100 | 99.72 | 100 | 99.65
C6 | 91.68 | 90.14 | 92.25 | 93.32 | 90.21 | 81.63 | 93.24 | 94.90 | 95.27
OA | 98.36 | 94.17 | 98.73 | 99.06 | 98.58 | 97.30 | 98.87 | 98.86 | 99.15
AA | 97.96 | 93.88 | 98.00 | 98.50 | 97.88 | 94.54 | 98.26 | 97.95 | 98.81
Kappa | 97.83 | 92.22 | 98.32 | 98.75 | 98.17 | 96.39 | 98.50 | 98.48 | 98.87
Table 6. Training and testing time (seconds) compared with the models for single-source classification.
Dataset | Phase | LiDAR | LiDAR-PC²A | HSI | HSI-PC²A | PC²ANet
2012 Houston | Training | 258.170 | 340.615 | 298.096 | 319.132 | 425.508
2012 Houston | Testing | 51.961 | 55.274 | 42.889 | 45.692 | 74.110
Trento | Training | 185.421 | 201.816 | 273.243 | 294.228 | 357.661
Trento | Testing | 6.380 | 5.076 | 8.275 | 11.610 | 14.179
Table 7. Training and testing time (seconds) compared with advanced deep learning models.
Dataset | Phase | Two-Branch | EndNet | MDL-Middle | FusAtNet | S²ENet | PC²ANet
2012 Houston | Training | 264.457 | 271.866 | 309.839 | 2519.060 | 338.179 | 425.508
2012 Houston | Testing | 46.069 | 55.760 | 61.088 | 931.675 | 52.818 | 74.110
Trento | Training | 139.393 | 121.245 | 145.861 | 882.266 | 190.702 | 357.661
Trento | Testing | 3.010 | 3.276 | 3.690 | 66.078 | 5.261 | 14.179
Table 8. Influence of with/without the PC²A module on classification accuracies.
Dataset | PC²A Module | OA | AA | Kappa
Houston | × | 94.09 | 94.71 | 93.58
Houston | ✓ | 95.02 | 94.97 | 94.59
Trento | × | 98.85 | 98.35 | 98.47
Trento | ✓ | 99.15 | 98.81 | 98.87
Table 9. Influence of with/without decision fusion on classification accuracies.
Dataset | Decision Fusion | OA | AA | Kappa
Houston | × | 94.33 | 94.88 | 93.84
Houston | ✓ | 95.02 | 94.97 | 94.59
Trento | × | 98.87 | 98.42 | 98.50
Trento | ✓ | 99.15 | 98.81 | 98.87
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
