CDAU-Net: A Novel CoordConv-Integrated Deep Dual Cross Attention Mechanism for Enhanced Road Extraction in Remote Sensing Imagery
Abstract
1. Introduction
- (1) Replacing the first convolutional layer of the U-Net encoder and the last convolutional layer of the decoder with CoordConv: this injects the spatial coordinates of each pixel into the network, which helps it capture spatial structures and relationships in the image and thereby improves road extraction accuracy [19];
- (2) Integrating Deep Dual Cross-Attention (DDCA) with the skip connections: we combine a DDCA mechanism with depthwise separable convolutions to reduce computational overhead while preserving channel-wise and spatial features. The mechanism also replaces certain skip connections in the architecture, helping the model learn interactions among features across scales and fuse them with learned weights.
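
Contribution (1) can be illustrated with a minimal sketch of the coordinate channels that CoordConv [19] concatenates to its input before an ordinary convolution. The sketch assumes a channels-first image stored as nested Python lists and coordinates scaled to [-1, 1]; the names `coord_channels` and `add_coords` are hypothetical, and the convolution itself is omitted.

```python
from typing import List

Grid = List[List[float]]  # one H x W channel


def coord_channels(height: int, width: int) -> List[Grid]:
    """Return two H x W channels holding row and column coordinates in [-1, 1]."""

    def scale(i: int, n: int) -> float:
        # Map index 0..n-1 linearly onto [-1, 1]; a single-pixel axis maps to 0.
        return 0.0 if n == 1 else 2.0 * i / (n - 1) - 1.0

    rows = [[scale(y, height) for _ in range(width)] for y in range(height)]
    cols = [[scale(x, width) for x in range(width)] for _ in range(height)]
    return [rows, cols]


def add_coords(image: List[Grid]) -> List[Grid]:
    """Append the two coordinate channels to a C x H x W image (assumed layout)."""
    height, width = len(image[0]), len(image[0][0])
    return image + coord_channels(height, width)


# A 3-channel image of 3 x 4 pixels gains two extra channels.
rgb = [[[0.5] * 4 for _ in range(3)] for _ in range(3)]
augmented = add_coords(rgb)
print(len(augmented))      # number of channels after augmentation
print(augmented[3][0])     # row-coordinate channel, first row
print(augmented[4][0])     # column-coordinate channel, first row
```

A convolution applied to `augmented` can condition its response on where in the image a pixel lies, which is the property the contribution relies on.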
2. Related Work
3. Methodology
3.1. U-Net Architecture
3.2. CoordConv Implementation
3.3. Depthwise Separable Convolution
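
As a rough illustration of why depthwise separable convolution lowers cost, the sketch below counts the weights of a standard k×k convolution and of its depthwise separable counterpart (a per-channel k×k depthwise step followed by a 1×1 pointwise step). Biases are ignored, and the channel sizes 64 and 128 are illustrative assumptions rather than values taken from CDAU-Net.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k step plus a 1 x 1 pointwise step (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer: 64 input channels, 128 output channels, 3 x 3 kernel.
std = standard_conv_params(64, 128, 3)
sep = depthwise_separable_params(64, 128, 3)
print(std, sep, sep / std)
```

The ratio `sep / std` equals 1/c_out + 1/k², so for 3×3 kernels the separable form needs a little over one ninth of the weights when the output channel count is large.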
3.4. Deep Dual Cross Attention (DDCA)
3.4.1. The Multiscale Feature Embedding (MFE) Module
3.4.2. DCCA
3.4.3. DSCA
3.5. CDAU-Net
4. Experiments
4.1. Datasets
4.1.1. Massachusetts Road Dataset
4.1.2. DeepGlobe Road Dataset
4.2. Experiment Settings
4.2.1. Training Environment Description
4.2.2. Hyperparameter Settings
4.2.3. Hyperparameter Settings
4.3. Comparative Experiment Results and Analysis
4.3.1. Massachusetts Road Dataset Experimental Results
- 1. Spectral Ambiguity Among Different Objects:
- 2. Spectral Discrepancies for the Same Object:
- 3. Shadow Occlusion:
4.3.2. DeepGlobe Road Dataset Experimental Results
4.4. Ablation Study Results and Analysis
- (a) Base U-Net: Our baseline is a standard U-Net without any of the proposed enhancements. It provides the reference performance against which the other variants are compared;
- (b) U-Net + CoordConv: This variant uses CoordConv in the first layer of the encoder and the last layer of the decoder of the base U-Net. CoordConv is expected to strengthen the model's representation and localization capabilities; this experiment measures the effect of that change;
- (c) U-Net + Cross Attention without Depthwise Separable Convolution: This variant adds the cross-attention mechanism to the base U-Net but does not use depthwise separable convolution. We expect it to improve on the baseline, and it also isolates the contribution of depthwise separable convolution when compared with variant (d);
- (d) U-Net + Cross Attention with Depthwise Separable Convolution: This variant builds on (c) by incorporating depthwise separable convolution. We expect a further improvement in performance;
- (e) CDAU-Net (Cross Attention Module with CoordConv and Depthwise Separable Convolution): Our complete proposed model, which combines the cross-attention module with CoordConv and depthwise separable convolution. We expect this model to outperform all other ablation variants.
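
The five ablation variants differ only in which components are switched on, so they can be written down as a small configuration table. The following is a sketch under that reading of the list above; the class `AblationVariant`, its field names, and the helper `enabled_components` are hypothetical and are not taken from the authors' code.

```python
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass(frozen=True)
class AblationVariant:
    name: str
    coordconv: bool            # CoordConv in first encoder / last decoder layer
    cross_attention: bool      # cross-attention module on the skip connections
    depthwise_separable: bool  # depthwise separable convolution inside the module


ABLATION_VARIANTS: Dict[str, AblationVariant] = {
    "a": AblationVariant("Base U-Net", False, False, False),
    "b": AblationVariant("U-Net + CoordConv", True, False, False),
    "c": AblationVariant("U-Net + Cross Attention (no DW)", False, True, False),
    "d": AblationVariant("U-Net + Cross Attention (with DW)", False, True, True),
    "e": AblationVariant("CDAU-Net", True, True, True),
}


def enabled_components(variant: AblationVariant) -> List[str]:
    """List the names of the boolean switches that are turned on."""
    return [f.name for f in fields(variant) if getattr(variant, f.name) is True]


for key, variant in ABLATION_VARIANTS.items():
    print(key, variant.name, enabled_components(variant))
```

Comparing (c) with (d) isolates depthwise separable convolution, and comparing (d) with (e) isolates CoordConv in the presence of cross attention.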
4.5. Computational Efficiency
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Huang, H.; Savkin, A.V.; Huang, C. Decentralized Autonomous Navigation of a UAV Network for Road Traffic Monitoring. IEEE Trans. Aerosp. Electron. Syst. 2021, 57, 2558–2564.
- Baltodano, S.; Sibi, S.; Martelaro, N.; Gowda, N.; Ju, W. The RRADS Platform: A Real Road Autonomous Driving Simulator. In Proceedings of the 7th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, Nottingham, UK, 1–3 September 2015; pp. 281–288.
- Sadeghi-Niaraki, A.; Varshosaz, M.; Kim, K.; Jung, J.J. Real World Representation of a Road Network for Route Planning in GIS. Expert Syst. Appl. 2011, 38, 11999–12008.
- Salama, A.S.; Saleh, B.K.; Eassa, M.M. Intelligent Cross Road Traffic Management System (ICRTMS). In Proceedings of the 2010 2nd International Conference on Computer Technology and Development, Cairo, Egypt, 2–4 November 2010; pp. 27–31.
- Singh, N.; Katiyar, S.K. Application of Geographical Information System (GIS) in Reducing Accident Blackspots and in Planning of a Safer Urban Road Network: A Review. Ecol. Inform. 2021, 66, 101436.
- Rogan, J.; Chen, D. Remote Sensing Technology for Mapping and Monitoring Land-Cover and Land-Use Change. Prog. Plan. 2004, 61, 301–325.
- Zhang, B.; Wu, Y.; Zhao, B.; Chanussot, J.; Hong, D.; Yao, J.; Gao, L. Progress and Challenges in Intelligent Remote Sensing Satellite Systems. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 1814–1822.
- Lu, J.; Liu, H.; Yao, Y.; Tao, S.; Tang, Z.; Lu, J. Hsi Road: A Hyper Spectral Image Dataset for Road Segmentation. In Proceedings of the 2020 IEEE International Conference on Multimedia and Expo (ICME), London, UK, 6–10 July 2020; pp. 1–6.
- Sebari, I.; He, D.-C. Automatic Fuzzy Object-Based Analysis of VHSR Images for Urban Objects Extraction. ISPRS J. Photogramm. Remote Sens. 2013, 79, 171–184.
- Saeedimoghaddam, M.; Stepinski, T.F. Automatic Extraction of Road Intersection Points from USGS Historical Map Series Using Deep Convolutional Neural Networks. Int. J. Geogr. Inf. Sci. 2020, 34, 947–968.
- Hou, Y.; Liu, Z.; Zhang, T.; Li, Y. C-UNet: Complement UNet for Remote Sensing Road Extraction. Sensors 2021, 21, 2153.
- Zhou, L.; Zhang, C.; Wu, M. D-LinkNet: LinkNet with Pretrained Encoder and Dilated Convolution for High Resolution Satellite Imagery Road Extraction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–23 June 2018; pp. 182–186.
- Dai, L.; Zhang, G.; Zhang, R. RADANet: Road Augmented Deformable Attention Network for Road Extraction From Complex High-Resolution Remote-Sensing Images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5602213.
- Abdollahi, A.; Pradhan, B.; Shukla, N.; Chakraborty, S.; Alamri, A. Deep Learning Approaches Applied to Remote Sensing Datasets for Road Extraction: A State-of-the-Art Review. Remote Sens. 2020, 12, 1444.
- Lan, M.; Zhang, Y.; Zhang, L.; Du, B. Global Context Based Automatic Road Segmentation via Dilated Convolutional Neural Network. Inf. Sci. 2020, 535, 156–171.
- Wei, Y.; Zhang, K.; Ji, S. Simultaneous Road Surface and Centerline Extraction from Large-Scale Remote Sensing Images Using CNN-Based Segmentation and Tracing. IEEE Trans. Geosci. Remote Sens. 2020, 58, 8919–8931.
- Sun, Z.; Zhou, W.; Ding, C.; Xia, M. Multi-Resolution Transformer Network for Building and Road Segmentation of Remote Sensing Image. ISPRS Int. J. Geo-Inf. 2022, 11, 165.
- Han, K.; Xiao, A.; Wu, E.; Guo, J.; Xu, C.; Wang, Y. Transformer in Transformer. Adv. Neural Inf. Process. Syst. 2021, 34, 15908–15919.
- Liu, R.; Lehman, J.; Molino, P.; Petroski Such, F.; Frank, E.; Sergeev, A.; Yosinski, J. An Intriguing Failing of Convolutional Neural Networks and the Coordconv Solution. arXiv 2018, arXiv:1807.03247.
- Xu, Y.; Xie, Z.; Feng, Y.; Chen, Z. Road Extraction from High-Resolution Remote Sensing Imagery Using Deep Learning. Remote Sens. 2018, 10, 1461.
- Jonsson, P.; Casselgren, J.; Thörnberg, B. Road Surface Status Classification Using Spectral Analysis of NIR Camera Images. IEEE Sens. J. 2014, 15, 1641–1656.
- Taylor, M.A. Remoteness and Accessibility in the Vulnerability Analysis of Regional Road Networks. Transp. Res. Part A Policy Pract. 2012, 46, 761–771.
- Trinder, J.C.; Wang, Y. Knowledge-Based Road Interpretation in Aerial Images. Int. Arch. Photogramm. Remote Sens. 1998, 32, 635–640.
- Xu, G.; Liu, M.; Jiang, Z.; Shen, W.; Huang, C. Online Fault Diagnosis Method Based on Transfer Convolutional Neural Networks. IEEE Trans. Instrum. Meas. 2019, 69, 509–520.
- Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
- Ren, Y.; Yu, Y.; Guan, H. DA-CapsUNet: A Dual-Attention Capsule U-Net for Road Extraction from Remote Sensing Imagery. Remote Sens. 2020, 12, 2866.
- Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Zhang, Z.; Liu, Q.; Wang, Y. Road Extraction by Deep Residual U-Net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753.
- Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected Crfs. arXiv 2014, arXiv:1412.7062.
- Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected Crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
- Chen, L.-C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017, arXiv:1706.05587.
- Zhu, Q.; Zhang, Y.; Wang, L.; Zhong, Y.; Guan, Q.; Lu, X.; Zhang, L.; Li, D. A Global Context-Aware and Batch-Independent Network for Road Extraction from VHR Satellite Imagery. ISPRS J. Photogramm. Remote Sens. 2021, 175, 353–365.
- Lu, X.; Zhong, Y.; Zheng, Z.; Chen, D.; Su, Y.; Ma, A.; Zhang, L. Cascaded Multi-Task Road Extraction Network for Road Surface, Centerline, and Edge Extraction. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5621414.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
- Shaw, P.; Uszkoreit, J.; Vaswani, A. Self-Attention with Relative Position Representations. arXiv 2018, arXiv:1803.02155.
- Yang, Z.; Zhou, D.; Yang, Y.; Zhang, J.; Chen, Z. TransRoadNet: A Novel Road Extraction Method for Remote Sensing Images via Combining High-Level Semantic Feature and Context. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6509505.
- Zhang, Z.; Miao, C.; Liu, C.; Tian, Q. DCS-TransUperNet: Road Segmentation Network Based on CSwin Transformer with Dual Resolution. Appl. Sci. 2022, 12, 3511.
- Xu, Z.; Liu, Y.; Gan, L.; Sun, Y.; Wu, X.; Liu, M.; Wang, L. Rngdet: Road Network Graph Detection by Transformer in Aerial Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–12.
- Niu, B.; Wen, W.; Ren, W.; Zhang, X.; Yang, L.; Wang, S.; Zhang, K.; Cao, X.; Shen, H. Single Image Super-Resolution via a Holistic Attention Network. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 191–207.
- Pan, X.; Yang, F.; Gao, L.; Chen, Z.; Zhang, B.; Fan, H.; Ren, J. Building Extraction from High-Resolution Aerial Imagery Using a Generative Adversarial Network with Spatial and Channel Attention Mechanisms. Remote Sens. 2019, 11, 917.
- Wu, Y.; Wu, Y.; Wang, B.; Yang, H. A Remote Sensing Method for Crop Mapping Based on Multiscale Neighborhood Feature Extraction. Remote Sens. 2022, 15, 47.
- Zhang, R.; Zhu, F.; Liu, J.; Liu, G. Depth-Wise Separable Convolutions and Multi-Level Pooling for an Efficient Spatial CNN-Based Steganalysis. IEEE Trans. Inf. Forensics Secur. 2019, 15, 1138–1150.
- Mnih, V. Machine Learning for Aerial Image Labeling; University of Toronto: Toronto, ON, Canada, 2013.
- Demir, I.; Koperski, K.; Lindenbaum, D.; Pang, G.; Huang, J.; Basu, S.; Hughes, F.; Tuia, D.; Raskar, R.D. A Challenge to Parse the Earth through Satellite Images. arXiv 2018, arXiv:1805.06561.
- De Boer, P.-T.; Kroese, D.P.; Mannor, S.; Rubinstein, R.Y. A Tutorial on the Cross-Entropy Method. Ann. Oper. Res. 2005, 134, 19–67.
- Sun, K.; Zhao, Y.; Jiang, B.; Cheng, T.; Xiao, B.; Liu, D.; Mu, Y.; Wang, X.; Liu, W.; Wang, J. High-Resolution Representations for Labeling Pixels and Regions. Available online: https://arxiv.org/abs/1904.04514v1 (accessed on 9 October 2023).
- Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation; Springer: Berlin/Heidelberg, Germany, 2018; pp. 801–818.
- Wang, Y.; Peng, Y.; Li, W.; Alexandropoulos, G.C.; Yu, J.; Ge, D.; Xiang, W. DDU-Net: Dual-Decoder-U-Net for Road Extraction Using High-Resolution Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4412612.
Actual Category | Predicted: Road | Predicted: Non-Road |
---|---|---|
Road | True positive (TP) | False negative (FN) |
Non-Road | False positive (FP) | True negative (TN) |
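
The confusion-matrix counts map onto the reported metrics through the standard definitions of precision, recall, and intersection over union for the road class. The sketch below assumes those standard per-class definitions; the function names and the example counts are hypothetical, and the extract does not state whether the tabulated IoU values were averaged over classes.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted road pixels that are truly road."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of true road pixels that were predicted as road."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def iou(tp: int, fp: int, fn: int) -> float:
    """Intersection over union of predicted and true road pixels."""
    union = tp + fp + fn
    return tp / union if union else 0.0


# Made-up pixel counts, for illustration only.
tp, fp, fn = 80, 20, 10
print(precision(tp, fp), recall(tp, fn), iou(tp, fp, fn))
```

True negatives do not enter any of the three formulas, which is why these metrics remain informative when road pixels are a small fraction of the image.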
Scheme | Network | Precision (%) | Recall (%) | IoU (%) |
---|---|---|---|---|
(a) | U-Net | 78.82 | 83.76 | 77.67 |
(b) | ResUNet | 79.29 | 83.25 | 77.91 |
(c) | HRNetv2 | 79.96 | 83.93 | 77.54 |
(d) | DeepLabv3+ | 80.56 | 83.90 | 77.45 |
(e) | DDUNet | 81.13 | 84.65 | 78.24 |
(f) | CDAU-Net (Ours) | 83.84 | 84.42 | 78.56 |
Scheme | Network | Precision (%) | Recall (%) | IoU (%) |
---|---|---|---|---|
(a) | U-Net | 83.43 | 84.45 | 75.36 |
(b) | ResUNet | 84.09 | 84.08 | 76.81 |
(c) | HRNetv2 | 85.09 | 84.08 | 77.41 |
(d) | DeepLabv3+ | 85.48 | 85.65 | 77.15 |
(e) | DDUNet | 85.92 | 86.74 | 77.85 |
(f) | CDAU-Net (Ours) | 86.69 | 86.59 | 78.52 |
Scheme | BL | Cc | DCA (CA) | DCA (DW) | Massachusetts P (%) | Massachusetts R (%) | Massachusetts IoU (%) | DeepGlobe P (%) | DeepGlobe R (%) | DeepGlobe IoU (%) |
---|---|---|---|---|---|---|---|---|---|---|
(a) | √ | | | | 78.82 | 83.76 | 77.67 | 83.43 | 84.45 | 75.36 |
(b) | √ | √ | | | 79.48 | 83.03 | 78.06 | 85.36 | 86.09 | 77.24 |
(c) | √ | | √ | | 81.55 | 83.43 | 77.41 | 84.61 | 84.36 | 75.57 |
(d) | √ | | √ | √ | 82.54 | 83.61 | 77.97 | 85.66 | 86.25 | 77.95 |
(e) | √ | √ | √ | √ | 83.84 | 84.42 | 78.56 | 86.69 | 86.59 | 78.52 |
Network | Parameters (M) | FLOPs (GFLOPs) |
---|---|---|
U-Net | 13.40 | 31.12 |
ResUNet | 13.04 | 80.99 |
HRNetv2 | 13.55 | 45.20 |
DeepLabv3+ | 27.62 | 90.71 |
DDUNet | 25.89 | 62.33 |
CDAU-Net | 16.88 | 41.15 |
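
To make the efficiency comparison concrete, the figures in the table above can be turned into relative differences between any two networks. The snippet below is a small sketch that only re-uses the tabulated values; the dictionary `COSTS` and the helper `relative_change` are hypothetical names.

```python
from typing import Dict, Tuple

# (parameters in millions, FLOPs in GFLOPs), copied from the table above.
COSTS: Dict[str, Tuple[float, float]] = {
    "U-Net": (13.40, 31.12),
    "ResUNet": (13.04, 80.99),
    "HRNetv2": (13.55, 45.20),
    "DeepLabv3+": (27.62, 90.71),
    "DDUNet": (25.89, 62.33),
    "CDAU-Net": (16.88, 41.15),
}


def relative_change(model: str, baseline: str) -> Tuple[float, float]:
    """Relative change in (parameters, FLOPs) of `model` with respect to `baseline`."""
    m_params, m_flops = COSTS[model]
    b_params, b_flops = COSTS[baseline]
    return (m_params - b_params) / b_params, (m_flops - b_flops) / b_flops


for baseline in ("U-Net", "DDUNet"):
    d_params, d_flops = relative_change("CDAU-Net", baseline)
    print(f"CDAU-Net vs {baseline}: params {d_params:+.1%}, FLOPs {d_flops:+.1%}")
```

Negative values mean CDAU-Net is cheaper than the baseline on that measure, positive values mean it is more expensive.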
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yin, A.; Ren, C.; Yue, W.; Shao, H.; Xue, X. CDAU-Net: A Novel CoordConv-Integrated Deep Dual Cross Attention Mechanism for Enhanced Road Extraction in Remote Sensing Imagery. Remote Sens. 2023, 15, 4914. https://doi.org/10.3390/rs15204914