Cross Attention Based Dual-Modality Collaboration for Hyperspectral Image and LiDAR Data Classification
Abstract
1. Introduction
- 1. Based on cross-attention, we propose the cross-attention bridge (CAB), which leverages the flexibility of cross-attention to combine HSI and LiDAR features extracted by branches with different architectures, allowing the feature fusion to adapt dynamically to both modalities.
- 2. Because features from these heterogeneous branches can be combined, we exploit the complementary strengths of the two modalities with the proposed multistage fusion module, which merges features from different semantic levels; the cross-attention module is designed to align and fuse features across modalities.
- 3. Three publicly available datasets are used to evaluate the proposed method, which is compared against many state-of-the-art (SOTA) HSI and LiDAR classification methods. The experimental results demonstrate that CAB-HL delivers outstanding performance, achieving an overall accuracy of 99.35% on the Houston2013 dataset and surpassing the other advanced algorithms by at least 2.5%.
2. Related Work
2.1. Multistage Feature Extraction
2.2. Attention Mechanism
2.3. Hyperspectral and LiDAR Fusion Classification
3. Methodology
3.1. Overall Architecture
- HSI Feature Extraction: A multipath 3D convolutional block is used to extract spectral–spatial features from the HSI data.
- LiDAR Feature Extraction: A multipath depthwise 2D convolutional block processes the LiDAR data to extract spatial features at multiple scales.
- Cross-Attention Fusion: The extracted features from HSI and LiDAR are fused using a cross-attention mechanism in two stages.
- Classification: The fused features are passed through fully connected layers and a softmax activation function to generate class predictions. A high-level sketch of this pipeline is given below.
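The snippet below is a minimal PyTorch-style sketch of the data flow described above. The class name `CABHL`, the submodule interfaces, and the dimensions (`fused_dim`, `num_classes`) are illustrative assumptions rather than the authors' exact implementation; concrete versions of the two branches and the fusion step are sketched in Sections 3.2 and 3.3.

```python
import torch
import torch.nn as nn

class CABHL(nn.Module):
    """Illustrative two-branch pipeline: HSI + LiDAR -> cross-attention fusion -> classifier.
    Submodules and dimensions are assumptions for illustration only."""

    def __init__(self, hsi_branch, lidar_branch, fusion, fused_dim=128, num_classes=15):
        super().__init__()
        self.hsi_branch = hsi_branch      # multipath 3D convolutional block (Section 3.2)
        self.lidar_branch = lidar_branch  # multipath depthwise 2D convolutional block (Section 3.2)
        self.fusion = fusion              # two-stage cross-attention fusion (Section 3.3)
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, num_classes),
        )

    def forward(self, x_hsi, x_lidar):
        f_hsi = self.hsi_branch(x_hsi)         # spectral-spatial features from the HSI patch
        f_lidar = self.lidar_branch(x_lidar)   # multi-scale spatial features from the LiDAR patch
        f_fused = self.fusion(f_hsi, f_lidar)  # cross-attention fusion of both modalities
        logits = self.classifier(f_fused)
        # Softmax produces class probabilities; the predicted label is the argmax.
        return torch.softmax(logits, dim=-1)
```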
3.2. Multipath Convolutional Blocks (CAB-HL)
For the HSI branch, the multipath 3D convolutional block consists of:

- Multi-Scale Convolutions: 3D convolutions with three different kernel sizes are applied to capture features at varying scales.
- Channel Refinement: Pointwise convolutions refine the extracted features by reducing the number of channels.
- Feature Fusion and Skip Connections: The refined features are concatenated and passed through another 3D convolutional layer, and a skip connection adds the input back to the processed features.

For the LiDAR branch, the multipath depthwise 2D convolutional block follows the same pattern:

- Multi-Scale Convolutions: 2D convolutions with three different kernel sizes extract features at different spatial scales.
- Depthwise Separable Convolutions: Depthwise convolutions followed by pointwise convolutions refine the spatial features.
- Feature Fusion and Skip Connections: The refined features are concatenated and processed further, with the input added back to preserve the original information. A sketch of both blocks is given after this list.
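The following is a minimal sketch of the two multipath blocks. The kernel sizes (3, 5, 7), channel counts, and class names (`MultiPathConv3DBlock`, `MultiPathDWConv2DBlock`) are assumptions for illustration, since the exact settings are not reproduced here; the structure (parallel multi-scale convolutions, pointwise refinement, concatenation, and a residual skip connection) mirrors the bullets above.

```python
import torch
import torch.nn as nn

class MultiPathConv3DBlock(nn.Module):
    """HSI-branch sketch: parallel 3D convolutions at several (assumed) kernel sizes,
    pointwise channel refinement, concatenation, and a residual skip connection."""

    def __init__(self, channels=16, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.paths = nn.ModuleList([
            nn.Sequential(
                nn.Conv3d(channels, channels, kernel_size=k, padding=k // 2),
                nn.ReLU(inplace=True),
                nn.Conv3d(channels, channels // 2, kernel_size=1),  # pointwise refinement
            )
            for k in kernel_sizes
        ])
        self.fuse = nn.Conv3d(len(kernel_sizes) * (channels // 2), channels,
                              kernel_size=3, padding=1)

    def forward(self, x):
        feats = torch.cat([path(x) for path in self.paths], dim=1)
        return self.fuse(feats) + x  # skip connection preserves the input


class MultiPathDWConv2DBlock(nn.Module):
    """LiDAR-branch sketch: the same multipath idea built from depthwise-separable 2D convolutions."""

    def __init__(self, channels=16, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.paths = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=k, padding=k // 2,
                          groups=channels),                          # depthwise convolution
                nn.Conv2d(channels, channels // 2, kernel_size=1),   # pointwise convolution
                nn.ReLU(inplace=True),
            )
            for k in kernel_sizes
        ])
        self.fuse = nn.Conv2d(len(kernel_sizes) * (channels // 2), channels,
                              kernel_size=3, padding=1)

    def forward(self, x):
        feats = torch.cat([path(x) for path in self.paths], dim=1)
        return self.fuse(feats) + x
```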
3.3. Cross-Attention Mechanism
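A minimal sketch of a cross-attention fusion step is shown below, assuming token-shaped features of size (batch, tokens, dim): queries come from one modality and keys/values from the other, followed by a residual connection and normalization. The use of `nn.MultiheadAttention`, the token shapes, and the particular two-stage ordering are illustrative assumptions rather than the exact CAB-HL design.

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Generic cross-attention sketch: queries from one modality, keys/values from the other."""

    def __init__(self, dim=64, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_feats, context_feats):
        # query_feats:   (batch, N_q, dim), e.g. HSI tokens
        # context_feats: (batch, N_c, dim), e.g. LiDAR tokens
        attended, _ = self.attn(query_feats, context_feats, context_feats)
        return self.norm(query_feats + attended)  # residual connection + normalization


class TwoStageFusion(nn.Module):
    """Two-stage fusion sketch: HSI attends to LiDAR, then LiDAR attends to the updated HSI;
    the results are pooled over tokens and concatenated for the classifier."""

    def __init__(self, dim=64):
        super().__init__()
        self.hsi_to_lidar = CrossAttention(dim)
        self.lidar_to_hsi = CrossAttention(dim)

    def forward(self, hsi_tokens, lidar_tokens):
        hsi_updated = self.hsi_to_lidar(hsi_tokens, lidar_tokens)
        lidar_updated = self.lidar_to_hsi(lidar_tokens, hsi_updated)
        return torch.cat([hsi_updated.mean(dim=1), lidar_updated.mean(dim=1)], dim=-1)
```

With dim=64, the concatenated output has size 128, which matches the `fused_dim` assumed in the pipeline sketch of Section 3.1.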
3.4. Classification and Optimization
3.5. Algorithm
Algorithm 1: Multimodal classification pipeline with cross-attention
Require: HSI data, LiDAR data
Ensure: Classified results Y
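To make the pipeline of Algorithm 1 concrete, the loop below is a hypothetical training and inference sketch built around the `CABHL` skeleton from Section 3.1. The optimizer, learning rate, epoch count, and the (HSI patch, LiDAR patch, label) batch format are assumptions; the actual settings are reported in Section 4.

```python
import torch
import torch.nn as nn

def train_cabhl(model, train_loader, num_epochs=100, lr=1e-3, device="cuda"):
    """Hypothetical training loop for Algorithm 1.
    `train_loader` is assumed to yield (hsi_patch, lidar_patch, label) batches."""
    model = model.to(device)
    criterion = nn.NLLLoss()  # model outputs softmax probabilities, so train on log-probabilities
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    for _ in range(num_epochs):
        model.train()
        for hsi_patch, lidar_patch, label in train_loader:
            hsi_patch = hsi_patch.to(device)
            lidar_patch = lidar_patch.to(device)
            label = label.to(device)

            optimizer.zero_grad()
            probs = model(hsi_patch, lidar_patch)             # class probabilities
            loss = criterion(torch.log(probs + 1e-8), label)  # negative log-likelihood
            loss.backward()
            optimizer.step()
    return model


@torch.no_grad()
def predict(model, hsi_patch, lidar_patch):
    """Inference: the classified result Y is the argmax over class probabilities."""
    model.eval()
    return model(hsi_patch, lidar_patch).argmax(dim=-1)
```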
4. Experimental Results
4.1. Datasets
4.2. Parameter Tuning
4.3. Parameter Setting
4.4. Effect of Patch Size and PCA Components on OA
4.5. Learning-Rate Comparison
4.6. Comparison and Analysis of Classification Performance
4.6.1. Classification Performance on the Augsburg Dataset
- CAB-HL achieved the highest accuracy, with an OA of 94.5%, AA of 77.12%, and a Kappa coefficient of 92.1%.
- DSHF follows as the second-best model with an OA of 91.67%, benefiting from its spectral–spatial fusion capability but struggling with shadowed urban areas.
- MS2CANet (91.65%) and SAL2RN (89.85%) perform relatively well but fail to fully exploit the spatial dependencies in the dataset.
- MDL-RS, EndNet, and DHViT exhibit lower classification accuracy, particularly in urban areas where mixed pixels reduce their effectiveness.
4.6.2. Classification Performance on the Houston2013 Dataset
- CAB-HL markedly surpasses all alternative approaches, attaining an OA of 99.35%, an AA of 99.5%, and a Kappa coefficient of 99.3.
- DSHF achieves an overall accuracy of 92.88%, ranking second, although it shows misclassification in metropolitan areas with significant spectral mixing.
- MS2CANet (91.99%) and SAL2RN (91.08%) produce strong results but do not maintain consistent classification across the various land cover types.
- MDL-RS and EndNet show modest performance, and their classification maps exhibit increased noise and misclassification in vegetated areas.
- CCR-Net and DHViT perform worst, especially in differentiating residential, commercial, and road classes because of spectral overlap.
4.6.3. Classification Performance on the Trento Dataset
- CAB-HL attains an OA of 99.7%, an AA of 99.16%, and a Kappa coefficient of 99.59, illustrating its exceptional ability to identify agricultural and urban areas.
- DSHF (99.13%) and MS2CANet (98.92%) demonstrate competitive performance; however, they capture fine-grained spatial information less effectively, resulting in misclassification within mixed land cover areas.
- SAL2RN (98.06%) and CCR-Net (96.57%) have difficulty distinguishing between the vineyard and ground classes, which affects their overall classification accuracy.
- DHViT and ExViT demonstrate the poorest performance, mostly because of their limited capacity to capture long-range dependencies in agricultural regions.
4.7. Multiscale Input Patch Size Comparison
4.8. Learning-Rate Comparison
5. Ablation Study: Evaluating the Impact of Key Components in the CAB-HL Model
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Xu, H.; Zheng, T.; Liu, Y.; Zhang, Z.; Xue, C.; Li, J. A joint convolutional cross ViT network for hyperspectral and light detection and ranging fusion classification. Remote Sens. 2024, 16, 489. [Google Scholar] [CrossRef]
- Sun, W.; Yang, G.; Chen, C.; Chang, M.; Huang, K.; Meng, X.; Liu, L. Development status and literature analysis of China’s earth observation remote sensing satellites. J. Remote Sens. 2020, 24, 479–510. [Google Scholar] [CrossRef]
- Hong, D.; Gao, L.; Yao, J.; Zhang, B.; Plaza, A.; Chanussot, J. Graph convolutional networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 5966–5978. [Google Scholar] [CrossRef]
- Wang, L.; Wang, X. Dual-coupled cnn-gcn-based classification for hyperspectral and lidar data. Sensors 2022, 22, 5735. [Google Scholar] [CrossRef]
- Yang, J.X.; Wang, J.; Sui, C.H.; Long, Z.; Zhou, J. HSLiNets: Hyperspectral Image and LiDAR Data Fusion Using Efficient Dual Linear Feature Learning Networks. arXiv 2024, arXiv:2412.00302. [Google Scholar]
- Yu, C.; Han, R.; Song, M.; Liu, C.; Chang, C.I. A simplified 2D-3D CNN architecture for hyperspectral image classification based on spatial–spectral fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 2485–2501. [Google Scholar] [CrossRef]
- Hao, J.; Dong, F.; Wang, S.; Li, Y.; Cui, J.; Men, J.; Liu, S. Combined hyperspectral imaging technology with 2D convolutional neural network for near geographical origins identification of wolfberry. J. Food Meas. Charact. 2022, 16, 4923–4933. [Google Scholar] [CrossRef]
- Liu, D.; Han, G.; Liu, P.; Yang, H.; Sun, X.; Li, Q.; Wu, J. A novel 2D-3D CNN with spectral-spatial multi-scale feature fusion for hyperspectral image classification. Remote Sens. 2021, 13, 4621. [Google Scholar] [CrossRef]
- Zhao, J.; Wang, G.; Zhou, B.; Ying, J.; Liu, J. Exploring an application-oriented land-based hyperspectral target detection framework based on 3D–2D CNN and transfer learning. EURASIP J. Adv. Signal Process. 2024, 2024, 37. [Google Scholar] [CrossRef]
- Wang, Q.; Zhou, B.; Zhang, J.; Xie, J.; Wang, Y. Joint Classification of Hyperspectral Images and LiDAR Data Based on Dual-Branch Transformer. Sensors 2024, 24, 867. [Google Scholar] [CrossRef]
- Hong, D.; Gao, L.; Hang, R.; Zhang, B.; Chanussot, J. Deep encoder–decoder networks for classification of hyperspectral and LiDAR data. IEEE Geosci. Remote Sens. Lett. 2020, 19, 5500205. [Google Scholar] [CrossRef]
- Ghamisi, P.; Höfle, B.; Zhu, X.X. Hyperspectral and LiDAR data fusion using extinction profiles and deep convolutional neural network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 10, 3011–3024. [Google Scholar] [CrossRef]
- Hong, D.; Gao, L.; Yokoya, N.; Yao, J.; Chanussot, J.; Du, Q.; Zhang, B. More diverse means better: Multimodal deep learning meets remote-sensing imagery classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 4340–4354. [Google Scholar] [CrossRef]
- Ali, A.; Mu, C.; Zhang, Z.; Zhu, J.; Liu, Y. A two-branch multiscale spectral-spatial feature extraction network for hyperspectral image classification. J. Inf. Intell. 2024, 2, 224–235. [Google Scholar] [CrossRef]
- Choi, E.; Lee, C. Optimizing feature extraction for multiclass problems. IEEE Trans. Geosci. Remote Sens. 2001, 39, 521–528. [Google Scholar] [CrossRef]
- Zheng, Q.; Sun, J. Effective point cloud analysis using multi-scale features. Sensors 2021, 21, 5574. [Google Scholar] [CrossRef] [PubMed]
- Gao, H.; Wu, H.; Chen, Z.; Zhang, Y.; Zhang, Y.; Li, C. Multiscale spectral-spatial cross-extraction network for hyperspectral image classification. IET Image Process. 2022, 16, 755–771. [Google Scholar] [CrossRef]
- Pan, H.; Gao, F.; Dong, J.; Du, Q. Multiscale adaptive fusion network for hyperspectral image denoising. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 3045–3059. [Google Scholar] [CrossRef]
- Wang, K.; Bai, F.; Li, J.; Liu, Y.; Li, Y. MashFormer: A novel multiscale aware hybrid detector for remote sensing object detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 2753–2763. [Google Scholar] [CrossRef]
- Xue, Z.; Yu, X.; Tan, X.; Liu, B.; Yu, A.; Wei, X. Multiscale deep learning network with self-calibrated convolution for hyperspectral and LiDAR data collaborative classification. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5514116. [Google Scholar] [CrossRef]
- Ding, K.; Lu, T.; Fu, W.; Li, S.; Ma, F. Global–local transformer network for HSI and LiDAR data joint classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5541213. [Google Scholar] [CrossRef]
- Song, D.; Gao, J.; Wang, B.; Wang, M. A multi-scale pseudo-siamese network with an attention mechanism for classification of hyperspectral and lidar data. Remote Sens. 2023, 15, 1283. [Google Scholar] [CrossRef]
- Meng, Q.; Zhao, M.; Zhang, L.; Shi, W.; Su, C.; Bruzzone, L. Multilayer feature fusion network with spatial attention and gated mechanism for remote sensing scene classification. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6510105. [Google Scholar] [CrossRef]
- Geng, Z.; Guo, M.H.; Chen, H.; Li, X.; Wei, K.; Lin, Z. Is attention better than matrix decomposition? arXiv 2021, arXiv:2109.04553. [Google Scholar] [CrossRef]
- Mnih, V.; Heess, N.; Graves, A.; Kavukcuoglu, K. Recurrent models of visual attention. Adv. Neural Inf. Process. Syst. 2014, 27. [Google Scholar] [CrossRef]
- Vaswani, A. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017. [Google Scholar] [CrossRef]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
- Li, H.C.; Hu, W.S.; Li, W.; Li, J.; Du, Q.; Plaza, A. A 3 clnn: Spatial, spectral and multiscale attention convlstm neural network for multisource remote sensing data classification. IEEE Trans. Neural Netw. Learn. Syst. 2020, 33, 747–761. [Google Scholar] [CrossRef]
- He, K.; Sun, W.; Yang, G.; Meng, X.; Ren, K.; Peng, J.; Du, Q. A dual global–local attention network for hyperspectral band selection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5527613. [Google Scholar] [CrossRef]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Zhu, M.; Jiao, L.; Liu, F.; Yang, S.; Wang, J. Residual spectral–spatial attention network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 449–462. [Google Scholar] [CrossRef]
- Wang, X.; Wang, X.; Zhao, K.; Zhao, X.; Song, C. Fsl-unet: Full-scale linked unet with spatial–spectral joint perceptual attention for hyperspectral and multispectral image fusion. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5539114. [Google Scholar] [CrossRef]
- Xu, X.; Li, W.; Ran, Q.; Du, Q.; Gao, L.; Zhang, B. Multisource remote sensing data classification based on convolutional neural network. IEEE Trans. Geosci. Remote Sens. 2017, 56, 937–949. [Google Scholar] [CrossRef]
- Mohla, S.; Pande, S.; Banerjee, B.; Chaudhuri, S. Fusatnet: Dual attention based spectrospatial multimodal fusion network for hyperspectral and lidar classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 92–93. [Google Scholar]
- Hang, R.; Li, Z.; Ghamisi, P.; Hong, D.; Xia, G.; Liu, Q. Classification of hyperspectral and LiDAR data using coupled CNNs. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4939–4950. [Google Scholar] [CrossRef]
- Debes, C.; Merentitis, A.; Heremans, R.; Hahn, J.; Frangiadakis, N.; van Kasteren, T.; Liao, W.; Bellens, R.; Pižurica, A.; Gautama, S.; et al. Hyperspectral and LiDAR data fusion: Outcome of the 2013 GRSS data fusion contest. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2405–2418. [Google Scholar] [CrossRef]
- Hong, D.; Hu, J.; Yao, J.; Chanussot, J.; Zhu, X.X. Multimodal remote sensing benchmark datasets for land cover classification with a shared and specific feature learning model. ISPRS J. Photogramm. Remote Sens. 2021, 178, 68–80. [Google Scholar] [CrossRef] [PubMed]
- Lu, T.; Ding, K.; Fu, W.; Li, S.; Guo, A. Coupled adversarial learning for fusion classification of hyperspectral and LiDAR data. Inf. Fusion 2023, 93, 118–131. [Google Scholar] [CrossRef]
- Wu, X.; Hong, D.; Chanussot, J. Convolutional neural networks for multimodal remote sensing data classification. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5517010. [Google Scholar] [CrossRef]
- Yao, J.; Zhang, B.; Li, C.; Hong, D.; Chanussot, J. Extended vision transformer (ExViT) for land use and land cover classification: A multimodal deep learning framework. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5514415. [Google Scholar] [CrossRef]
- Xue, Z.; Tan, X.; Yu, X.; Liu, B.; Yu, A.; Zhang, P. Deep hierarchical vision transformer for hyperspectral and LiDAR data classification. IEEE Trans. Image Process. 2022, 31, 3095–3110. [Google Scholar] [CrossRef]
- Li, J.; Liu, Y.; Song, R.; Li, Y.; Han, K.; Du, Q. Sal2rn: A spatial–spectral salient reinforcement network for hyperspectral and lidar data fusion classification. IEEE Trans. Geosci. Remote Sens. 2022, 61, 5500114. [Google Scholar] [CrossRef]
- Wang, X.; Zhu, J.; Feng, Y.; Wang, L. MS2CANet: Multiscale spatial–spectral cross-modal attention network for hyperspectral image and LiDAR classification. IEEE Geosci. Remote Sens. Lett. 2024, 21, 5501505. [Google Scholar] [CrossRef]
- Feng, Y.; Song, L.; Wang, L.; Wang, X. DSHFNet: Dynamic scale hierarchical fusion network based on multiattention for hyperspectral image and LiDAR data classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5522514. [Google Scholar] [CrossRef]
- Wang, X.; Song, L.; Feng, Y.; Zhu, J. S3F2Net: Spatial-Spectral-Structural Feature Fusion Network for Hyperspectral Image and LiDAR Data Classification. IEEE Trans. Circuits Syst. Video Technol. 2025, 35, 4801–4815. [Google Scholar] [CrossRef]
- Wang, J.; Li, J.; Shi, Y.; Lai, J.; Tan, X. AM3Net: Adaptive mutual-learning-based multimodal data fusion network. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 5411–5426. [Google Scholar] [CrossRef]
Dataset | Houston2013 [37] | | Trento [11] | | Augsburg [38] | |
---|---|---|---|---|---|---
Location | Houston, Texas, USA | | Trento, Italy | | Augsburg, Germany | |
Sensor Type | HSI | LiDAR | HSI | LiDAR | HSI | LiDAR
Image Size | 349 × 1905 | 349 × 1905 | 600 × 166 | 600 × 166 | 332 × 485 | 332 × 485
Spatial Resolution | 2.5 m | 2.5 m | 1 m | 1 m | 30 m | 30 m
Number of Bands | 144 | 1 | 63 | 1 | 180 | 1
Wavelength Range | 0.38–1.05 µm | / | 0.42–0.99 µm | / | 0.4–2.5 µm | /
Sensor Name | CASI-1500 | / | AISA Eagle | Optech ALTM 3100EA | HySpex | DLR-3K
No. | Class (Train/Test) | MDL-RS [13] | EndNet [11] | CALC [39] | CCR-Net [40] | ExViT [41] | DHViT [42] | SAL2RN [43] | MS2CANet [44] | DSHF [45] | Proposed |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | Forest (146/13361) | 84.97 | 89.70 | 94.16 | 93.47 | 91.83 | 90.45 | 96.58 | 96.40 | 97.60 | 98.74 |
2 | Residential-Area (264/30065) | 91.56 | 86.75 | 95.32 | 96.86 | 95.38 | 90.87 | 97.69 | 97.90 | 92.94 | 98.24 |
3 | Industrial-Area (21/3830) | 8.17 | 24.26 | 86.18 | 82.56 | 43.32 | 61.20 | 53.44 | 48.79 | 87.13 | 78.09 |
4 | Low-Plants (248/36609) | 78.07 | 76.77 | 95.57 | 84.45 | 91.13 | 82.82 | 92.84 | 96.47 | 96.38 | 97.59 |
5 | Allotment (52/523) | 26.00 | 34.42 | 0.00 | 44.36 | 41.11 | 21.80 | 38.62 | 44.55 | 64.05 | 96.94 |
6 | Commercial-Area (7/1638) | 2.14 | 9.46 | 6.05 | 0.00 | 26.01 | 23.50 | 15.14 | 13.43 | 2.50 | 11.9 |
7 | Water (23/1507) | 36.30 | 46.78 | 55.96 | 40.48 | 42.07 | 7.43 | 12.47 | 49.90 | 48.77 | 58.33 |
| | OA% (761/77533) | 79.11 | 78.25 | 91.46 | 87.82 | 87.82 | 83.06 | 89.85 | 91.65 | 91.67 | 94.5
| | AA% | 46.74 | 52.29 | 61.89 | 63.17 | 61.41 | 54.01 | 58.11 | 63.92 | 69.91 | 77.12
| | Kappa×100 | 67.52 | 68.11 | 88.04 | 82.48 | 82.44 | 75.91 | 85.26 | 88.22 | 88.12 | 92.1
No. | Class (Train/Test) | MDL-RS [13] | EndNet [11] | CALC [39] | CCR-Net [40] | ExViT [41] | DHViT [42] | SAL2RN [43] | MS2CANet [44] | DSHF [45] | Proposed |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | Health grass (198/1053) | 83.10 | 96.24 | 82.24 | 83.00 | 91.55 | 81.01 | 82.71 | 82.62 | 97.44 | 99.62 |
2 | Stressed grass (190/1064) | 81.58 | 93.46 | 83.93 | 84.87 | 85.15 | 85.15 | 85.15 | 82.04 | 85.15 | 99.44 |
3 | Synthetic grass (192/505) | 100.00 | 96.44 | 93.47 | 100 | 98.61 | 93.47 | 100.00 | 92.67 | 97.62 | 100 |
4 | Trees (188/1056) | 99.72 | 98.51 | 98.86 | 92.14 | 98.61 | 80.40 | 92.80 | 98.67 | 100 | 99.91 |
5 | Soil (186/1056) | 99.81 | 96.25 | 99.72 | 99.81 | 100.00 | 99.53 | 100.00 | 100.00 | 100 | 100 |
6 | Water (182/143) | 95.10 | 96.53 | 98.60 | 95.80 | 98.60 | 95.80 | 93.00 | 100.00 | 100 | 100 |
7 | Residential (196/1072) | 90.02 | 98.03 | 90.21 | 95.34 | 88.90 | 92.16 | 94.86 | 94.30 | 88.34 | 98.6 |
8 | Commercial (191/1053) | 87.94 | 95.53 | 81.58 | 81.39 | 93.35 | 92.21 | 90.97 | 89.74 | 82.15 | 96.87 |
9 | Road (193/1059) | 81.59 | 79.80 | 84.99 | 84.14 | 89.52 | 81.78 | 94.14 | 93.20 | 89.61 | 98.21 |
10 | Highway (191/1036) | 86.68 | 77.53 | 68.15 | 63.22 | 65.54 | 68.73 | 71.42 | 82.04 | 99.32 | 100 |
11 | Railway (181/1054) | 89.37 | 86.40 | 95.16 | 90.32 | 95.83 | 79.60 | 91.93 | 98.29 | 80.17 | 100 |
12 | Parking lot 1 (192/104) | 85.69 | 87.02 | 93.56 | 93.08 | 90.39 | 94.81 | 97.79 | 94.90 | 98.75 | 99.9 |
13 | Parking lot 2 (184/285) | 83.16 | 79.64 | 87.37 | 88.42 | 90.53 | 88.42 | 84.21 | 92.28 | 92.98 | 100 |
14 | Tennis court (181/247) | 100.00 | 98.71 | 99.60 | 96.36 | 99.60 | 100.00 | 99.59 | 87.85 | 100 | 100 |
15 | Running track (187/473) | 98.73 | 99.38 | 99.58 | 99.37 | 89.85 | 71.25 | 100.00 | 99.78 | 100 | 100 |
| | OA% (2832/12197) | 89.60 | 90.77 | 88.91 | 88.15 | 90.62 | 85.82 | 91.08 | 91.99 | 92.88 | 99.35
| | AA% | 90.83 | 91.96 | 90.47 | 89.89 | 91.76 | 86.95 | 91.90 | 92.56 | 94.10 | 99.5
| | Kappa×100 | 88.75 | 89.98 | 87.98 | 87.19 | 89.83 | 84.61 | 90.32 | 91.37 | 92.27 | 99.3
No. | Class (Train/Test) | MDL-RS [13] | EndNet [11] | CALC [39] | CCR-Net [40] | ExViT [41] | DHViT [42] | SAL2RN [43] | MS2CANet [44] | DSHF [45] | Proposed |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | Apple trees (129/3905) | 88.58 | 88.19 | 97.26 | 100.00 | 99.56 | 98.36 | 99.74 | 99.84 | 99.49 | 99.95 |
2 | Building (125/2778) | 95.86 | 98.49 | 100.00 | 98.88 | 98.13 | 99.06 | 96.76 | 98.52 | 98.74 | 99.21 |
3 | Ground (105/374) | 93.58 | 95.19 | 89.57 | 79.68 | 76.47 | 67.65 | 83.68 | 86.36 | 99.73 | 97.59 |
4 | Woods (154/8969) | 99.22 | 99.30 | 100.00 | 100.00 | 100.00 | 100.00 | 99.21 | 99.98 | 100 | 99.99 |
5 | Vineyard (184/10317) | 83.82 | 91.96 | 99.75 | 94.79 | 99.93 | 98.89 | 99.97 | 100.00 | 100 | 100 |
6 | Roads (122/3052) | 76.51 | 90.14 | 87.45 | 88.07 | 93.84 | 87.98 | 88.99 | 92.92 | 93.45 | 98.2 |
| | OA% (819/29395) | 90.65 | 94.17 | 98.16 | 96.57 | 98.80 | 98.00 | 98.06 | 98.92 | 99.13 | 99.7
| | AA% | 89.60 | 93.88 | 95.68 | 93.57 | 94.66 | 92.16 | 94.72 | 96.27 | 98.57 | 99.16
| | Kappa×100 | 86.28 | 92.22 | 97.48 | 95.43 | 98.39 | 97.31 | 97.40 | 98.56 | 98.83 | 99.59
HSI | LiDAR | CAM | Metrics | Houston2013 | Augsburg | Trento
---|---|---|---|---|---|---
√ | × | × | OA (%) | 98.98 | 93.79 | 98.88
| | | | AA (%) | 99.22 | 75.44 | 98.19
| | | | Kappa (%) | 98.9 | 91.06 | 98.5
√ | √ | × | OA (%) | 99.57 | 94.50 | 99.63
| | | | AA (%) | 99.67 | 77.12 | 99.02
| | | | Kappa (%) | 99.54 | 92.10 | 99.5
√ | × | √ | OA (%) | 99.76 | 93.93 | 99.63
| | | | AA (%) | 99.81 | 79.27 | 99.43
| | | | Kappa (%) | 99.74 | 91.32 | 99.51
√ | √ | √ | OA (%) | 99.89 | 93.51 | 99.83
| | | | AA (%) | 99.91 | 75.76 | 99.70
| | | | Kappa (%) | 99.88 | 90.69 | 99.77
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).