A Real-Time Semantic Segmentation Method Based on STDC-CT for Recognizing UAV Emergency Landing Zones
Abstract
1. Introduction
- We create a new dataset for UAV aerial semantic segmentation and achieve recognition of emergency landing zones, protected targets, and buildings during high-altitude UAV flight missions.
- A lightweight semantic segmentation network named STDC-CT is proposed for recognizing UAV emergency landing zones. The proposed model consists of the STDC backbone, the SOAE module, the PAPPM, the DCFFM, and a Detail Guidance Module; a high-level sketch of this composition is given after this list. The network balances segmentation speed and accuracy and improves segmentation accuracy on small objects.
- Extensive experiments were carried out to evaluate the effectiveness of our method. The results indicate that our model achieves state-of-the-art performance on the UAV-City, Cityscapes, and UAVid datasets. In addition, we deploy the trained model on a UAV equipped with a Jetson TX2 embedded device and show that it performs well in real-world UAV applications.
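To make the listed composition concrete, the following is a minimal, hypothetical PyTorch-style sketch of how such components could be wired into a two-branch (detail/context) network. The module interfaces and the exact wiring are assumptions for illustration only, not the authors' implementation.

```python
import torch.nn as nn


class STDCCT(nn.Module):
    """Hypothetical composition of the components listed above (not the paper's code)."""

    def __init__(self, backbone, soae, pappm, dcffm, detail_head, seg_head):
        super().__init__()
        self.backbone = backbone        # STDC backbone: multi-stage feature extractor
        self.soae = soae                # small-object attention on shallow features
        self.pappm = pappm              # context aggregation on the deepest stage
        self.dcffm = dcffm              # fuses detail and context features
        self.detail_head = detail_head  # Detail Guidance head (training-time supervision)
        self.seg_head = seg_head        # final per-pixel classifier

    def forward(self, x):
        # Assume the backbone returns shallow (detail) and deep (semantic) features.
        feat8, _, feat32 = self.backbone(x)
        detail = self.soae(feat8)                # emphasize small objects in shallow features
        context = self.pappm(feat32)             # multi-scale contextual information
        fused = self.dcffm(detail, context)      # detail/context feature fusion
        logits = self.seg_head(fused)
        detail_logits = self.detail_head(feat8)  # supervised by LoG-based detail labels
        return logits, detail_logits
```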
2. Related Work
2.1. Conventional Methods for Recognizing UAV Emergency Landing Zones
- Manual recognition: the UAV pilot observes the surrounding environment and selects a suitable landing area, such as flat, open grassland, bare ground, or a rooftop.
- Photo analysis: this method involves analyzing the photos of the landing zones captured by a UAV camera to judge the flatness, obstacles, terrain, and other factors, thus aiding in landing zone selection.
- Altitude measurement: altitude measuring equipment mounted on the UAV can be used to measure the height of the landing zone and judge whether it meets safety requirements.
- Radar scanning: using radar equipment, the UAV scans the landing zones and obtains information such as the terrain height and obstacles to gauge the suitability of the landing area.
- GPS positioning: the GPS device mounted on the UAV obtains the location information of the landing zones, and evaluates factors such as terrain, height, and slope to comprehensively select a landing zone.
- Object recognition: a pre-trained, deep learning-based object detection model deployed on the UAV processes images or videos captured during flight in real time, thereby recognizing the positions of candidate landing zones.
2.2. Conventional Semantic Segmentation Methods
- Threshold-based segmentation [28]: The earliest segmentation algorithms were threshold-based, categorizing the pixels of an image into foreground and background according to a fixed threshold and then processing the foreground region further (a minimal Otsu-thresholding example is given after this list). However, this kind of method is not suitable for images with complex backgrounds.
- Region-based segmentation [29]: Region-based segmentation methods locate clusters of similar pixels in an image, which are considered to be of the same class of pixels. This method can handle images with complex backgrounds, but the results often contain discontinuous regions.
- Edge-based segmentation [30]: Edge-based segmentation methods categorize pixels into different classes by detecting edges in the image. This method can generate continuous segmentation results, but it is sensitive to noise.
- Graph-based segmentation [31,32]: Graph-based segmentation methods consider an image as a graph, where pixels are nodes and the edges represent the similarity between pixels. Segmentation is completed by optimizing the objective function. This method can produce highly accurate segmentation results, but it requires longer computation time and resources.
- Cluster-based segmentation [33]: Cluster-based segmentation methods categorize pixels in an image into different groups based on the similarity between pixels obtained through clustering. This method can handle large-scale images, but it is sensitive to noise.
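As a concrete illustration of the threshold-based approach [28], here is a minimal sketch of Otsu thresholding with OpenCV; the grayscale conversion, the morphological cleanup step, and the file name are assumptions added for the example, not steps taken from the cited work.

```python
import cv2

# Load an aerial image and convert it to grayscale.
image = cv2.imread("aerial_frame.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Otsu's method picks the threshold that minimizes intra-class variance,
# splitting pixels into foreground and background.
threshold, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Optional cleanup: remove small speckles from the binary mask.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

print(f"Otsu threshold: {threshold:.1f}")
```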
2.3. Methods for Semantic Segmentation Based on Deep Learning
2.3.1. Generic Semantic Segmentation
2.3.2. Lightweight Semantic Segmentation Network
2.3.3. Real-Time Semantic Segmentation
3. UAV-City Dataset
3.1. Image Acquisition Strategy
- During the operation of UAVs, strict compliance with safety regulations for drone flights is ensured.
- The maximum flight altitude is set at 140 m, and lateral flight is kept stable at an altitude of around 120 m.
- During image acquisition, the onboard camera, pointed vertically downward, captures continuous top-down images of the ground at an interval of 0.1 s.
- Multiple flights are conducted to capture images from different flight paths, introducing variance into the dataset to mitigate the risk of overfitting during model training.
- Data collection is conducted under favorable weather conditions with sufficient daylight.
3.2. Image Processing and Annotation
3.2.1. Image Filtering
3.2.2. Image Annotation
- Horizontal roof: the rooftop area of the buildings is flat.
- Horizontal ground: flat ground areas other than roadways used for vehicular traffic.
- Horizontal lawn: flat lawns.
- River: identifiable water bodies, including rivers and lakes.
- Plant: low vegetation, such as grass and shrubs.
- Tree: tall trees with canopies and trunks.
- Car: all vehicles on roads and parking lots, including cars, buses, trucks, tractors, etc.
- Human: all visible pedestrians on the ground.
- Building: residential buildings, garages, security booths, office buildings, and other structures under construction.
- Road: roads and bridges where vehicles are legally allowed to travel.
- Obstacle: steel frames, transmission line poles, and roads under construction.
3.3. Statistical Analysis
4. Proposed Method
4.1. Small Object Attention Extractor for STDC
4.2. Laplacian of Gaussian for Detail Guidance Module
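The Laplacian of Gaussian [20,21] responds strongly to edges and fine structures, which is the kind of signal the Detail Guidance Module supervises. Below is a minimal sketch of generating such a detail map with SciPy; the σ value and the binarization threshold are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np
from scipy import ndimage


def log_detail_map(gray: np.ndarray, sigma: float = 1.0, thresh: float = 0.1) -> np.ndarray:
    """Return a binary detail (edge) map from a grayscale image using a LoG filter.

    sigma and thresh are illustrative defaults, not the paper's settings.
    """
    gray = gray.astype(np.float32) / 255.0
    # gaussian_laplace applies Gaussian smoothing followed by the Laplacian operator.
    response = ndimage.gaussian_laplace(gray, sigma=sigma)
    # Strong absolute responses correspond to edges and fine structures.
    return (np.abs(response) > thresh).astype(np.uint8)
```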
4.3. PAPPM for Capturing Contextual Information
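PAPPM is adopted from PIDNet [23] and aggregates context from several pooling scales in parallel. As a simplified illustration of the idea only, here is a generic pyramid-pooling context module in the same spirit; the pool sizes, channel widths, and fusion step are assumptions and do not reproduce PAPPM exactly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimplePyramidPooling(nn.Module):
    """Generic PPM-style context module; a stand-in illustration, not the exact PAPPM."""

    def __init__(self, in_ch: int, out_ch: int, pool_sizes=(1, 2, 4, 8)):
        super().__init__()
        # One pooling-plus-projection branch per scale, applied in parallel.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(size),
                nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for size in pool_sizes
        )
        self.fuse = nn.Conv2d(in_ch + len(pool_sizes) * out_ch, out_ch, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[2:]
        # Pool at each scale, project, and upsample back to the input resolution.
        feats = [F.interpolate(b(x), size=(h, w), mode="bilinear", align_corners=False)
                 for b in self.branches]
        return self.fuse(torch.cat([x] + feats, dim=1))
```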
4.4. The Detail and Context Feature Fusion Module
5. Experimental Results
5.1. Datasets
5.2. Implementation Details
5.3. Ablation Study
5.4. Comparison with Mainstream Methods
5.5. Embedded Experiments
5.6. Analysis of UAV Emergency Landing Zone Recognition Results
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Kim, S.Y.; Muminov, A. Forest Fire Smoke Detection Based on Deep Learning Approaches and Unmanned Aerial Vehicle Images. Sensors 2023, 23, 5702.
- Li, S.; Yang, X.; Lin, X.; Zhang, Y.; Wu, J. Real-Time Vehicle Detection from UAV Aerial Images Based on Improved YOLOv5. Sensors 2023, 23, 5634.
- Lin, T.H.; Su, C.W. Oriented Vehicle Detection in Aerial Images Based on YOLOv4. Sensors 2022, 22, 8394.
- Zhu, C.; Zhu, J.; Bu, T.; Gao, X. Monitoring and Identification of Road Construction Safety Factors via UAV. Sensors 2022, 22, 8797.
- Natesan, S.; Armenakis, C.; Benari, G.; Lee, R. Use of UAV-borne spectrometer for land cover classification. Drones 2018, 2, 16.
- Matikainen, L.; Karila, K. Segment-based land cover mapping of a suburban area—Comparison of high-resolution remotely sensed datasets using classification trees and test field points. Remote Sens. 2011, 3, 1777–1804.
- Belcore, E.; Piras, M.; Pezzoli, A. Land Cover Classification from Very High-Resolution UAS Data for Flood Risk Mapping. Sensors 2022, 22, 5622.
- Trujillo, M.A.; Martínez-de Dios, J.R.; Martín, C.; Viguria, A.; Ollero, A. Novel Aerial Manipulator for Accurate and Robust Industrial NDT Contact Inspection: A New Tool for the Oil and Gas Inspection Industry. Sensors 2019, 19, 1305.
- Karam, S.N.; Bilal, K.; Shuja, J.; Rehman, F.; Yasmin, T.; Jamil, A. Inspection of unmanned aerial vehicles in oil and gas industry: Critical analysis of platforms, sensors, networking architecture, and path planning. J. Electron. Imaging 2022, 32, 011006.
- Zhang, C.; Tang, Z.; Zhang, M.; Wang, B.; Hou, L. Developing a more reliable aerial photography-based method for acquiring freeway traffic data. Remote Sens. 2022, 14, 2202.
- Lu, M.; Xu, Y.; Li, H. Vehicle Re-Identification Based on UAV Viewpoint: Dataset and Method. Remote Sens. 2022, 14, 4603.
- Feng, D.; Haase-Schütz, C.; Rosenbaum, L.; Hertlein, H.; Glaeser, C.; Timm, F.; Wiesbeck, W.; Dietmayer, K. Deep multi-modal object detection and semantic segmentation for autonomous driving: Datasets, methods, and challenges. IEEE Trans. Intell. Transp. Syst. 2020, 22, 1341–1360.
- Siam, M.; Gamal, M.; Abdel-Razek, M.; Yogamani, S.; Jagersand, M.; Zhang, H. A comparative study of real-time semantic segmentation for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 587–597.
- Asgari Taghanaki, S.; Abhishek, K.; Cohen, J.P.; Cohen-Adad, J.; Hamarneh, G. Deep semantic segmentation of natural and medical images: A review. Artif. Intell. Rev. 2021, 54, 137–178.
- Liu, S.; Cheng, J.; Liang, L.; Bai, H.; Dang, W. Light-weight semantic segmentation network for UAV remote sensing images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 8287–8296.
- Yu, C.; Wang, J.; Peng, C.; Gao, C.; Yu, G.; Sang, N. BiSeNet: Bilateral segmentation network for real-time semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 325–341.
- Fan, M.; Lai, S.; Huang, J.; Wei, X.; Chai, Z.; Luo, J.; Wei, X. Rethinking BiSeNet for real-time semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 9716–9725.
- Lyu, Y.; Vosselman, G.; Xia, G.S.; Yilmaz, A.; Yang, M.Y. UAVid: A semantic segmentation dataset for UAV imagery. ISPRS J. Photogramm. Remote Sens. 2020, 165, 108–119.
- Zhang, F.; Jiao, L.; Li, L.; Liu, F.; Liu, X. Multiresolution attention extractor for small object detection. arXiv 2020, arXiv:2006.05941.
- Patil, K.A.; Prashanth, K.M.; Ramalingaiah, A. Texture Feature Extraction of Lumbar Spine Trabecular Bone Radiograph Image using Laplacian of Gaussian Filter with KNN Classification to Diagnose Osteoporosis. J. Phys. Conf. Ser. 2021, 2070, 012137.
- Gunn, S.R. On the discrete representation of the Laplacian of Gaussian. Pattern Recognit. 1999, 32, 1463–1472.
- Stoppa, F.; Vreeswijk, P.; Bloemen, S.; Bhattacharyya, S.; Caron, S.; Jóhannesson, G.; de Austri, R.R.; Oetelaar, C.v.d.; Zaharijas, G.; Groot, P.; et al. AutoSourceID-Light: Fast optical source localization via U-Net and Laplacian of Gaussian. arXiv 2022, arXiv:2202.00489.
- Xu, J.; Xiong, Z.; Bhattacharyya, S.P. PIDNet: A real-time semantic segmentation network inspired from PID controller. arXiv 2022, arXiv:2206.02066.
- Kaljahi, M.A.; Shivakumara, P.; Idris, M.Y.I.; Anisi, M.H.; Lu, T.; Blumenstein, M.; Noor, N.M. An automatic zone detection system for safe landing of UAVs. Expert Syst. Appl. 2019, 122, 319–333.
- Shah Alam, M.; Oluoch, J. A survey of safe landing zone detection techniques for autonomous unmanned aerial vehicles (UAVs). Expert Syst. Appl. 2021, 179, 115091.
- Gautam, A.; Sujit, P.; Saripalli, S. A survey of autonomous landing techniques for UAVs. In Proceedings of the 2014 International Conference on Unmanned Aircraft Systems (ICUAS), Orlando, FL, USA, 27–30 May 2014; pp. 1210–1218.
- Xin, L.; Tang, Z.; Gai, W.; Liu, H. Vision-Based Autonomous Landing for the UAV: A Review. Aerospace 2022, 9, 634.
- Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66.
- Tremeau, A.; Borel, N. A region growing and merging algorithm to color segmentation. Pattern Recognit. 1997, 30, 1191–1203.
- Khan, J.F.; Bhuiyan, S.M.; Adhami, R.R. Image segmentation and shape analysis for road-sign detection. IEEE Trans. Intell. Transp. Syst. 2010, 12, 83–96.
- Rother, C.; Kolmogorov, V.; Blake, A. “GrabCut”: Interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. (TOG) 2004, 23, 309–314.
- Boykov, Y.Y.; Jolly, M.P. Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. In Proceedings of the Eighth IEEE International Conference on Computer Vision, ICCV 2001, Vancouver, BC, Canada, 7–14 July 2001; Volume 1, pp. 105–112.
- Achanta, R.; Shaji, A.; Smith, K.; Lucchi, A.; Fua, P.; Süsstrunk, S. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2274–2282.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
- Lin, G.; Milan, A.; Shen, C.; Reid, I. RefineNet: Multi-path refinement networks for high-resolution semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1925–1934.
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3146–3154.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520.
- Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131.
- Paszke, A.; Chaurasia, A.; Kim, S.; Culurciello, E. ENet: A deep neural network architecture for real-time semantic segmentation. arXiv 2016, arXiv:1606.02147.
- Pohlen, T.; Hermans, A.; Mathias, M.; Leibe, B. Full-resolution residual networks for semantic segmentation in street scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4151–4160.
- Zhao, H.; Qi, X.; Shen, X.; Shi, J.; Jia, J. ICNet for real-time semantic segmentation on high-resolution images. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 405–420.
- Lo, S.Y.; Hang, H.M.; Chan, S.W.; Lin, J.J. Efficient dense modules of asymmetric convolution for real-time semantic segmentation. In Proceedings of the ACM Multimedia Asia, Beijing, China, 15–18 December 2019; pp. 1–6.
- Wang, Y.; Zhou, Q.; Xiong, J.; Wu, X.; Jin, X. ESNet: An efficient symmetric network for real-time semantic segmentation. In Proceedings of the Pattern Recognition and Computer Vision: Second Chinese Conference, PRCV 2019, Xi’an, China, 8–11 November 2019; Proceedings, Part II; Springer: Berlin/Heidelberg, Germany, 2019; pp. 41–52.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
- Li, P.; Dong, X.; Yu, X.; Yang, Y. When humans meet machines: Towards efficient segmentation networks. In Proceedings of the 31st British Machine Vision Virtual Conference, Virtual Event, 7–10 September 2020.
- Orsic, M.; Kreso, I.; Bevandic, P.; Segvic, S. In defense of pre-trained ImageNet architectures for real-time semantic segmentation of road-driving images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 12607–12616.
- Nirkin, Y.; Wolf, L.; Hassner, T. HyperSeg: Patch-wise hypernetwork for real-time semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 4061–4070.
- Yu, C.; Gao, C.; Wang, J.; Yu, G.; Shen, C.; Sang, N. BiSeNet V2: Bilateral network with guided aggregation for real-time semantic segmentation. Int. J. Comput. Vis. 2021, 129, 3051–3068.
- Peng, J.; Liu, Y.; Tang, S.; Hao, Y.; Chu, L.; Chen, G.; Wu, Z.; Chen, Z.; Yu, Z.; Du, Y.; et al. PP-LiteSeg: A superior real-time semantic segmentation model. arXiv 2022, arXiv:2204.02681.
- Kumaar, S.; Lyu, Y.; Nex, F.; Yang, M.Y. CABiNet: Efficient context aggregation network for low-latency semantic segmentation. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 13517–13524.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Wang, L.; Li, R.; Zhang, C.; Fang, S.; Duan, C.; Meng, X.; Atkinson, P.M. UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery. ISPRS J. Photogramm. Remote Sens. 2022, 190, 196–214.
- Wang, L.; Li, R.; Wang, D.; Duan, C.; Wang, T.; Meng, X. Transformer meets convolution: A bilateral awareness network for semantic segmentation of very fine resolution urban scene images. Remote Sens. 2021, 13, 3065.
- Jetson TX2 Module. Available online: https://developer.nvidia.com/embedded/jetson-tx2 (accessed on 2 March 2023).
Model | Backbone | Key Innovations | mIoU (%) | FPS | Parameters |
---|---|---|---|---|---|
ENet [43] | - | Initial Block, Bottleneck Module | 58.3 | 76.9 | 0.37 M |
FRRN [44] | ResNet | FRRU, Full-resolution Residual Networks (FRRNs) | 71.8 | 2.1 | - |
ICNet [45] | PSPNet-50 | Cascade Feature Fusion Unit (CFF), The Loss Function | 69.5 | 30.3 | - |
EDANet [46] | - | EDA Module | 67.3 | 108.7 | 0.68 M |
BiSeNet [16] | ResNet-18 | Attention Refinement Module (ARM), Feature Fusion Module (FFM) | 74.7 | 65.5 | 49.0 M |
ESNet [47] | - | Factorized Convolution Unit (FCU), Parallel Factorized Convolution Unit (PFCU) | 70.7 | 63 | 1.66 M
STDC-Seg75 [17] | STDC1 | Short-Term Dense Concatenate Module (STDC), Detail Guidance Module | 75.3 | 126.7 | - |
Template | mIoU (%) on Cityscapes | mIoU (%) on UAV-City
---|---|---
SOAE- | 74.8 | 64.7
SOAE- | 75.4 | 65.4
SOAE- | 75.7 | 65.8
Method | SOAE | LoG | PAPPM | DCFFM | mIoU (%) on UAV-City | mIoU (%) on Cityscapes
---|---|---|---|---|---|---
STDC-Seg75 [17] | - | - | - | - | 65.1 | 75.3
STDC-CT75 | ✓ | - | - | - | 65.8 | 75.7
STDC-CT75 | ✓ | ✓ | - | - | 66.2 | 75.9
STDC-CT75 | ✓ | ✓ | ✓ | - | 66.6 | 76.2
STDC-CT75 | ✓ | ✓ | ✓ | ✓ | 67.3 | 76.5
Model | Resolution | Backbone | mIoU (%) | FPS
---|---|---|---|---
U-Net [35] | | VGG16 | 63.4 | 28.9
PSPNet [37] | | ResNet50 | 52.1 | 34.5
DeepLabv3+ [49] | | MobileNetV2 | 57.4 | 35.5
STDC-Seg [17] | | STDC1 | 65.1 | 212.3
STDC-CT | | STDC1 | 67.3 | 196.8
Model | Hor. Roof | Hor. Ground | Hor. Lawn | River | Plant | Tree | Car | Human | Building | Road | Obstacle | Background | mIoU (%)
---|---|---|---|---|---|---|---|---|---|---|---|---|---
U-Net | 63.4 | 51.5 | 57.7 | 81.4 | 57.9 | 85.6 | 56.1 | 13.5 | 78.9 | 80.6 | 81.5 | 53.5 | 63.4
PSPNet | 52.1 | 47.8 | 55.2 | 75.3 | 43.5 | 80.6 | 32.2 | 2.1 | 71.7 | 73.2 | 68.8 | 51.5 | 54.5
DeepLabv3+ | 54.1 | 60.2 | 65.2 | 77.8 | 36.2 | 84.0 | 42.4 | 5.8 | 76.2 | 75.2 | 65.8 | 50.0 | 57.7
STDC-Seg75 | 64.3 | 62.8 | 58.5 | 80.6 | 60.5 | 83.4 | 58.1 | 16.1 | 81.6 | 79.5 | 83.9 | 52.3 | 65.1
STDC-CT75 | 65.6 | 62.3 | 66.2 | 85.7 | 59.2 | 86.7 | 61.3 | 18.6 | 83.1 | 82.5 | 82.1 | 54.3 | 67.3
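The per-class IoU and mIoU figures above follow the standard definition IoU_c = TP_c / (TP_c + FP_c + FN_c), with mIoU the mean over classes. Below is a minimal sketch of computing them from integer label maps via a confusion matrix; it is a generic metric implementation, not the paper's evaluation script.

```python
import numpy as np


def compute_miou(pred: np.ndarray, gt: np.ndarray, num_classes: int):
    """Per-class IoU and mean IoU from integer label maps of the same shape."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    # Build the confusion matrix: rows = ground truth, columns = prediction.
    np.add.at(conf, (gt.ravel(), pred.ravel()), 1)
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    iou = tp / np.maximum(tp + fp + fn, 1)  # guard against division by zero
    return iou, iou.mean()
```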
Model | Backbone | GPU | mIoU (%) | FPS
---|---|---|---|---|
ENet [43] | - | Nvidia Titan X | 58.3 | 76.9 |
ICNet [45] | PSPNet-50 | Nvidia Titan X | 69.5 | 30.3 |
HMSeg [50] | - | GTX 1080Ti | 74.3 | 83.2 |
BiSeNetV1 [16] | ResNet-18 | GTX 1080Ti | 74.7 | 65.5 |
SwiftNet [51] | ResNet-18 | GTX 1080Ti | 75.4 | 39.9 |
HyperSeg-M [52] | EfficientNet | GTX 1080Ti | 75.8 | 36.9 |
BiSeNetV2-L [53] | - | GTX 1080Ti | 75.3 | 47.3 |
PP-Lite-T2 [54] | STDC1 | GTX 1080Ti | 74.9 | 143.6 |
STDC-Seg75 [17] | STDC1 | GTX 1080Ti | 75.3 | 126.7 |
CABiNet [55] | MobileNetV3 | RTX 2080Ti | 75.9 | 76.5 |
STDC-CT75 | STDC1 | RTX 2080Ti | 76.5 | 122.6 |
Model | Clutter | Building | Road | Tree | Low Veg. | Moving Car | Static Car | Human | mIoU (%)
---|---|---|---|---|---|---|---|---|---
FCN-8s [34] | 63.9 | 84.7 | 76.5 | 73.3 | 61.9 | 65.9 | 45.5 | 22.3 | 62.4 |
SegNet [56] | 65.6 | 85.9 | 79.2 | 78.8 | 63.7 | 68.9 | 52.1 | 19.3 | 64.2 |
BiSeNet [16] | 64.7 | 85.7 | 61.1 | 78.3 | 77.3 | 48.6 | 63.4 | 17.5 | 61.5 |
U-Net [35] | 61.8 | 82.9 | 75.2 | 77.3 | 62.0 | 59.6 | 30.0 | 18.6 | 58.4 |
BiSeNetV2 [53] | 61.2 | 81.6 | 77.1 | 76.0 | 61.3 | 66.4 | 38.5 | 15.4 | 59.7 |
DeepLabv3+ [49] | 68.9 | 87.6 | 82.2 | 79.8 | 65.9 | 69.9 | 55.4 | 26.1 | 67.0 |
UNetFormer [57] | 68.4 | 87.4 | 81.5 | 80.2 | 63.5 | 73.6 | 56.4 | 31.0 | 67.8 |
BANet [58] | 66.6 | 85.4 | 80.7 | 78.9 | 62.1 | 69.3 | 52.8 | 21.0 | 64.6 |
STDC-Seg75 [17] | 68.7 | 86.8 | 79.4 | 78.6 | 65.4 | 68.1 | 55.7 | 24.5 | 65.9 |
STDC-CT75 | 69.2 | 88.5 | 80.1 | 80.4 | 66.3 | 73.8 | 60.3 | 28.4 | 68.4 |
Items | Specification |
---|---|
CPU | Dual-core NVIDIA Denver 2 64-bit CPU + quad-core Arm® Cortex®-A57 MPCore
GPU | 256-core NVIDIA Pascal™ architecture GPU |
Power | 7.5 W/15 W |
Memory | 8 GB 128-bit LPDDR4, 1866 MHz (59.7 GB/s)
Storage | 32 GB eMMC 5.1 |
Operating system (OS) | Linux for Tegra R28.1
AI performance | 1.33 TFLOPS
Model | mIoU (%) | Inference Time (ms) |
---|---|---|
U-Net | 63.4 | 392.63 |
PSPNet | 52.1 | 332.47 |
DeepLabv3+ | 57.4 | 253.78 |
STDC-Seg | 65.1 | 52.71 |
STDC-CT | 67.3 | 58.32 |
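The inference times above are wall-clock latencies on the Jetson TX2. As an illustration of how such numbers are typically measured with PyTorch (warm-up iterations, GPU synchronization, averaging over repeated runs), here is a small sketch; the warm-up count, repetition count, and input size are assumptions, not the paper's exact protocol.

```python
import time
import torch


@torch.no_grad()
def measure_latency(model, input_size=(1, 3, 720, 960), warmup=20, runs=100):
    """Average forward-pass latency in milliseconds on the current CUDA device."""
    device = torch.device("cuda")
    model = model.eval().to(device)
    x = torch.randn(*input_size, device=device)

    for _ in range(warmup):      # warm-up: stabilizes clocks and caches
        model(x)
    torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    torch.cuda.synchronize()     # wait for all queued GPU work to finish
    return (time.perf_counter() - start) / runs * 1000.0
```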