Search Results (38)

Search Parameters:
Keywords = CamVid

34 pages, 5777 KiB  
Article
ACNet: An Attention–Convolution Collaborative Semantic Segmentation Network on Sensor-Derived Datasets for Autonomous Driving
by Qiliang Zhang, Kaiwen Hua, Zi Zhang, Yiwei Zhao and Pengpeng Chen
Sensors 2025, 25(15), 4776; https://doi.org/10.3390/s25154776 - 3 Aug 2025
Viewed by 167
Abstract
In intelligent vehicular networks, the accuracy of semantic segmentation in road scenes is crucial for vehicle-mounted artificial intelligence to achieve environmental perception, decision support, and safety control. Although deep learning methods have made significant progress, two main challenges remain: first, the difficulty in balancing global and local features leads to blurred object boundaries and misclassification; second, conventional convolutions have limited ability to perceive irregular objects, causing information loss and affecting segmentation accuracy. To address these issues, this paper proposes a global–local collaborative attention module and a spider web convolution module. The former enhances feature representation through bidirectional feature interaction and dynamic weight allocation, reducing false positives and missed detections. The latter introduces an asymmetric sampling topology and six-directional receptive field paths to effectively improve the recognition of irregular objects. Experiments on the Cityscapes, CamVid, and BDD100K datasets, collected using vehicle-mounted cameras, demonstrate that the proposed method performs excellently across multiple evaluation metrics, including mIoU, mRecall, mPrecision, and mAccuracy. Comparative experiments with classical segmentation networks, attention mechanisms, and convolution modules validate the effectiveness of the proposed approach. The proposed method demonstrates outstanding performance in sensor-based semantic segmentation tasks and is well-suited for environmental perception systems in autonomous driving. Full article
(This article belongs to the Special Issue AI-Driving for Autonomous Vehicles)
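
The evaluation metrics named in this abstract (mIoU, mRecall, mPrecision, mAccuracy) are standard per-class measures derived from a confusion matrix. For reference only, here is a minimal NumPy sketch of how mIoU is commonly computed; it is a generic illustration, not the authors' evaluation code, and the function names and toy labels are invented for the example.

    import numpy as np

    def confusion_matrix(pred, target, num_classes):
        """Accumulate a num_classes x num_classes confusion matrix from integer
        label maps of identical shape (rows = ground truth, cols = prediction)."""
        mask = (target >= 0) & (target < num_classes)
        idx = num_classes * target[mask].astype(int) + pred[mask].astype(int)
        return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

    def mean_iou(conf):
        """mIoU = mean over classes of TP / (TP + FP + FN)."""
        tp = np.diag(conf).astype(float)
        fp = conf.sum(axis=0) - tp   # predicted as class c but labelled otherwise
        fn = conf.sum(axis=1) - tp   # labelled class c but predicted otherwise
        denom = tp + fp + fn
        iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
        return np.nanmean(iou)       # ignore classes absent from both maps

    # toy example: 2 classes on 2x2 label maps
    pred = np.array([[0, 1], [1, 1]])
    target = np.array([[0, 1], [0, 1]])
    print(mean_iou(confusion_matrix(pred, target, num_classes=2)))  # ~0.583

The per-class recall (tp / (tp + fn)) and precision (tp / (tp + fp)) reported as mRecall and mPrecision fall out of the same matrix.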

25 pages, 2518 KiB  
Article
An Efficient Semantic Segmentation Framework with Attention-Driven Context Enhancement and Dynamic Fusion for Autonomous Driving
by Jia Tian, Peizeng Xin, Xinlu Bai, Zhiguo Xiao and Nianfeng Li
Appl. Sci. 2025, 15(15), 8373; https://doi.org/10.3390/app15158373 - 28 Jul 2025
Viewed by 349
Abstract
In recent years, a growing number of real-time semantic segmentation networks have been developed to improve segmentation accuracy. However, these advancements often come at the cost of increased computational complexity, which limits their inference efficiency, particularly in scenarios such as autonomous driving, where strict real-time performance is essential. Achieving an effective balance between speed and accuracy has thus become a central challenge in this field. To address this issue, we present a lightweight semantic segmentation model tailored for the perception requirements of autonomous vehicles. The architecture follows an encoder–decoder paradigm, which not only preserves the capability for deep feature extraction but also facilitates multi-scale information integration. The encoder leverages a high-efficiency backbone, while the decoder introduces a dynamic fusion mechanism designed to enhance information interaction between different feature branches. Recognizing the limitations of convolutional networks in modeling long-range dependencies and capturing global semantic context, the model incorporates an attention-based feature extraction component. This is further augmented by positional encoding, enabling better awareness of spatial structures and local details. The dynamic fusion mechanism employs an adaptive weighting strategy, adjusting the contribution of each feature channel to reduce redundancy and improve representation quality. To validate the effectiveness of the proposed network, experiments were conducted on a single RTX 3090 GPU. The Dynamic Real-time Integrated Vision Encoder–Segmenter Network (DriveSegNet) achieved a mean Intersection over Union (mIoU) of 76.9% and an inference speed of 70.5 FPS on the Cityscapes test dataset, 74.6% mIoU and 139.8 FPS on the CamVid test dataset, and 35.8% mIoU with 108.4 FPS on the ADE20K dataset. The experimental results demonstrate that the proposed method achieves an excellent balance between inference speed, segmentation accuracy, and model size. Full article
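
The "dynamic fusion mechanism" with an "adaptive weighting strategy" described above is, in generic form, a learned gate that re-weights two feature branches before they are merged. A minimal PyTorch sketch of such a gate follows; it is an illustrative pattern under that assumption, not DriveSegNet's actual module, and the class and variable names are hypothetical.

    import torch
    import torch.nn as nn

    class GatedFusion(nn.Module):
        """Fuse two same-shaped feature maps with learned per-channel weights.

        A global descriptor of the concatenated features drives a sigmoid gate g,
        and the output is g * a + (1 - g) * b, so the network can re-balance the
        contribution of each branch per channel instead of simply adding them."""
        def __init__(self, channels, reduction=4):
            super().__init__()
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),                        # B x 2C x 1 x 1
                nn.Conv2d(2 * channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid(),
            )

        def forward(self, a, b):
            g = self.gate(torch.cat([a, b], dim=1))             # B x C x 1 x 1
            return g * a + (1 - g) * b

    x1 = torch.randn(2, 64, 32, 32)
    x2 = torch.randn(2, 64, 32, 32)
    print(GatedFusion(64)(x1, x2).shape)   # torch.Size([2, 64, 32, 32])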

39 pages, 7470 KiB  
Article
Estimation of Fractal Dimension and Semantic Segmentation of Motion-Blurred Images by Knowledge Distillation in Autonomous Vehicle
by Seong In Jeong, Min Su Jeong and Kang Ryoung Park
Fractal Fract. 2025, 9(7), 460; https://doi.org/10.3390/fractalfract9070460 - 15 Jul 2025
Viewed by 400
Abstract
Research on semantic segmentation for remote sensing road scenes has advanced significantly, driven by autonomous driving technology. However, motion blur from camera or subject movements hampers segmentation performance. To address this issue, we propose a knowledge distillation-based semantic segmentation network (KDS-Net) that is robust to motion blur, eliminating the need for image restoration networks. KDS-Net leverages innovative knowledge distillation techniques and an edge-enhanced segmentation loss to refine edge regions and improve segmentation precision across various receptive fields. To enhance the interpretability of segmentation quality under motion blur, we incorporate fractal dimension estimation to quantify the geometric complexity of class-specific regions, allowing for a structural assessment of predictions generated by the proposed knowledge distillation framework for autonomous driving. Experiments on well-known motion-blurred remote sensing road scene datasets (CamVid and KITTI) demonstrate mean IoU scores of 72.42% and 59.29%, respectively, surpassing state-of-the-art methods. Additionally, the lightweight KDS-Net (21.44 M parameters) enables real-time edge computing, mitigating data privacy concerns and communication overheads in Internet of Vehicles scenarios. Full article
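
Knowledge distillation itself reduces to training the student's per-pixel class distribution toward a teacher's softened distribution. Below is a minimal PyTorch sketch of a temperature-scaled distillation loss for segmentation logits; it illustrates only this generic idea, not KDS-Net's specific distillation or edge-enhanced losses, and the tensor shapes and names are illustrative.

    import torch
    import torch.nn.functional as F

    def pixelwise_kd_loss(student_logits, teacher_logits, T=4.0):
        """Temperature-scaled distillation for segmentation: push the student's
        per-pixel class distribution toward the teacher's softened distribution.
        Both inputs are B x C x H x W class logits."""
        p_t = F.softmax(teacher_logits / T, dim=1)
        log_p_s = F.log_softmax(student_logits / T, dim=1)
        kl = (p_t * (p_t.clamp_min(1e-8).log() - log_p_s)).sum(dim=1)  # per-pixel KL
        return kl.mean() * (T * T)  # average over batch and pixels, usual T^2 scaling

    student = torch.randn(2, 12, 64, 64, requires_grad=True)  # 12 classes (illustrative)
    teacher = torch.randn(2, 12, 64, 64)
    loss = pixelwise_kd_loss(student, teacher)
    loss.backward()
    print(loss.item())

In practice this soft-target term is added to the ordinary cross-entropy loss against the ground-truth labels.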

14 pages, 2210 KiB  
Article
AMFFNet: Adaptive Multi-Scale Feature Fusion Network for Urban Image Semantic Segmentation
by Shuting Huang and Haiyan Huang
Electronics 2025, 14(12), 2344; https://doi.org/10.3390/electronics14122344 - 8 Jun 2025
Cited by 2 | Viewed by 521
Abstract
Urban image semantic segmentation faces challenges including the coexistence of multi-scale objects, blurred semantic relationships between complex structures, and dynamic occlusion interference. Existing methods often struggle to balance global contextual understanding of large scenes and fine-grained details of small objects due to insufficient granularity in multi-scale feature extraction and rigid fusion strategies. To address these issues, this paper proposes an Adaptive Multi-scale Feature Fusion Network (AMFFNet). The network primarily consists of four modules: a Multi-scale Feature Extraction Module (MFEM), an Adaptive Fusion Module (AFM), an Efficient Channel Attention (ECA) module, and an auxiliary supervision head. Firstly, the MFEM utilizes multiple depthwise strip convolutions to capture features at various scales, effectively leveraging contextual information. Then, the AFM employs a dynamic weight assignment strategy to harmonize multi-level features, enhancing the network’s ability to model complex urban scene structures. Additionally, the ECA attention mechanism introduces cross-channel interactions and nonlinear transformations to mitigate the issue of small-object segmentation omissions. Finally, the auxiliary supervision head enables shallow features to directly affect the final segmentation results. Experimental evaluations on the CamVid and Cityscapes datasets demonstrate that the proposed network achieves superior mean Intersection over Union (mIoU) scores of 77.8% and 81.9%, respectively, outperforming existing methods. The results confirm that AMFFNet has a stronger ability to understand complex urban scenes. Full article
(This article belongs to the Topic Intelligent Image Processing Technology)
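
ECA (Efficient Channel Attention) is a published, lightweight attention block: global average pooling followed by a cheap 1-D convolution across channels that produces per-channel weights. A minimal PyTorch sketch of the generic module is given below; the fixed kernel size is a simplification, and this is not AMFFNet's exact configuration.

    import torch
    import torch.nn as nn

    class ECA(nn.Module):
        """Efficient Channel Attention: global average pooling followed by a
        1-D convolution across channels, yielding per-channel gating weights."""
        def __init__(self, k_size=3):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)
            self.sigmoid = nn.Sigmoid()

        def forward(self, x):                      # x: B x C x H x W
            y = self.pool(x)                       # B x C x 1 x 1
            y = y.squeeze(-1).transpose(1, 2)      # B x 1 x C (channels as a sequence)
            y = self.conv(y)                       # local cross-channel interaction
            y = self.sigmoid(y).transpose(1, 2).unsqueeze(-1)  # B x C x 1 x 1
            return x * y

    x = torch.randn(2, 64, 32, 32)
    print(ECA()(x).shape)   # torch.Size([2, 64, 32, 32])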

23 pages, 1714 KiB  
Article
Deep LBLS: Accelerated Sky Region Segmentation Using Hybrid Deep CNNs and Lattice Boltzmann Level-Set Model
by Fatema A. Albalooshi, M. R. Qader, Yasser Ismail, Wael Elmedany, Hesham Al-Ammal, Muttukrishnan Rajarajan and Vijayan K. Asari
Eng 2025, 6(3), 57; https://doi.org/10.3390/eng6030057 - 19 Mar 2025
Viewed by 593
Abstract
Accurate segmentation of the sky region is crucial for various applications, including object detection, tracking, and recognition, as well as augmented reality (AR) and virtual reality (VR) applications. However, sky region segmentation poses significant challenges due to complex backgrounds, varying lighting conditions, and the absence of clear edges and textures. In this paper, we present a new hybrid fast segmentation technique for the sky region that learns from object components to achieve rapid and effective segmentation while preserving precise details of the sky region. We employ Convolutional Neural Networks (CNNs) to guide the active contour and extract regions of interest. Our algorithm is implemented by leveraging three types of CNNs, namely DeepLabV3+, Fully Convolutional Network (FCN), and SegNet. Additionally, we utilize a local image fitting level-set function to characterize the region-based active contour model. Finally, the Lattice Boltzmann approach is employed to achieve rapid convergence of the level-set function. This forms a deep Lattice Boltzmann Level-Set (deep LBLS) segmentation approach that exploits deep CNN, the level-set method (LS), and the lattice Boltzmann method (LBM) for sky region separation. The performance of the proposed method is evaluated on the CamVid dataset, which contains images with a wide range of object variations due to factors such as illumination changes, shadow presence, occlusion, scale differences, and cluttered backgrounds. Experiments conducted on this dataset yield promising results in terms of computation time and the robustness of segmentation when compared to state-of-the-art methods. Our deep LBLS approach demonstrates better performance, with an improvement in mean recall value reaching up to 14.45%. Full article

14 pages, 48905 KiB  
Article
RSM-Optimizer: Branch Optimization for Dual- or Multi-Branch Semantic Segmentation Networks
by Xiaohong Zhang, Wenwen Zong and Yaning Jiang
Electronics 2025, 14(6), 1109; https://doi.org/10.3390/electronics14061109 - 11 Mar 2025
Viewed by 705
Abstract
Semantic segmentation is a crucial task in the field of computer vision, with important applications in areas such as autonomous driving, medical image analysis, and remote sensing image analysis. Dual-branch and multi-branch semantic segmentation networks that leverage deep learning technologies can enhance both segmentation accuracy and speed. These networks typically contain a detail branch and a semantic branch. However, the feature maps in the detail branch are limited to a single type of receptive field, which limits models’ abilities to perceive objects at different scales. During the feature map fusion process, low-resolution feature maps from the semantic branch are upsampled by a large factor to match the feature maps in the detail branch. Unfortunately, these upsampling operations inevitably introduce noise. To address these issues, we propose several improvements to optimize the detail and semantic branches. We first design a receptive field-driven feature enhancement module to enrich the receptive fields of feature maps in the detail branch. Then, we propose a stepwise upsampling and fusion module to reduce the noise introduced during the upsampling process of feature fusion. Finally, we introduce a pyramid mixed pooling module (PMPM) to improve models’ abilities to perceive objects of different shapes. Considering the diversity of objects in terms of scale, shape, and category in urban street scene data, we carried out experiments on the Cityscapes and CamVid datasets. The experimental results on both datasets validate the effectiveness and efficiency of the proposed improvements. Full article
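
Read generically, stepwise upsampling replaces one large-factor interpolation with repeated 2x interpolations, each followed by a light refinement convolution. The sketch below illustrates only that pattern; it is not the paper's stepwise upsampling and fusion module, and the layer choices are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class StepwiseUpsample(nn.Module):
        """Upsample by a large total factor in repeated x2 steps, applying a small
        convolution after each step to smooth interpolation artifacts, instead of
        one direct large-factor interpolation."""
        def __init__(self, channels, steps):
            super().__init__()
            self.refine = nn.ModuleList([
                nn.Sequential(
                    nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                    nn.BatchNorm2d(channels),
                    nn.ReLU(inplace=True),
                )
                for _ in range(steps)
            ])

        def forward(self, x):
            for block in self.refine:
                x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
                x = block(x)
            return x

    low = torch.randn(2, 128, 16, 16)   # low-resolution semantic-branch features
    print(StepwiseUpsample(128, steps=3)(low).shape)  # torch.Size([2, 128, 128, 128])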

15 pages, 924 KiB  
Article
Novel Approach in Vegetation Detection Using Multi-Scale Convolutional Neural Network
by Fatema A. Albalooshi
Appl. Sci. 2024, 14(22), 10287; https://doi.org/10.3390/app142210287 - 8 Nov 2024
Cited by 2 | Viewed by 1214
Abstract
Vegetation segmentation plays a crucial role in accurately monitoring and analyzing vegetation cover, growth patterns, and changes over time, which in turn contributes to environmental studies, land management, and assessing the impact of climate change. This study explores the potential of a multi-scale convolutional neural network (MSCNN) design for object classification, specifically focusing on vegetation detection. The MSCNN is designed to integrate multi-scale feature extraction and attention mechanisms, enabling the model to capture both fine and coarse vegetation patterns effectively. Moreover, the MSCNN architecture integrates multiple convolutional layers with varying kernel sizes (3 × 3, 5 × 5, and 7 × 7), enabling the model to extract features at different scales, which is vital for identifying diverse vegetation patterns across various landscapes. Vegetation detection is demonstrated using three diverse datasets: the CamVid dataset, the FloodNet dataset, and the multispectral RIT-18 dataset. These datasets present a range of challenges, including variations in illumination, the presence of shadows, occlusion, scale differences, and cluttered backgrounds, which are common in real-world scenarios. The MSCNN architecture allows for the integration of information from multiple scales, facilitating the detection of diverse vegetation types under varying conditions. The performance of the proposed MSCNN method is rigorously evaluated and compared against state-of-the-art techniques in the field. Comprehensive experiments showcase the effectiveness of the approach, highlighting its robustness in accurately segmenting and classifying vegetation even in complex environments. The results indicate that the MSCNN design significantly outperforms traditional methods, achieving a remarkable global accuracy and boundary F1 score (BF score) of up to 98%. This superior performance underscores the MSCNN’s capability to enhance vegetation detection in imagery, making it a promising tool for applications in environmental monitoring and land use management. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
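
The multi-scale extraction described here amounts to running parallel convolution branches with 3x3, 5x5, and 7x7 kernels and concatenating their outputs. A minimal PyTorch sketch of that generic pattern follows; channel counts are illustrative and this is not the authors' MSCNN.

    import torch
    import torch.nn as nn

    class MultiScaleConv(nn.Module):
        """Parallel 3x3, 5x5 and 7x7 convolutions over the same input; the branch
        outputs are concatenated so later layers see features captured at several
        receptive-field sizes."""
        def __init__(self, in_ch, branch_ch):
            super().__init__()
            self.branches = nn.ModuleList([
                nn.Sequential(
                    nn.Conv2d(in_ch, branch_ch, k, padding=k // 2, bias=False),
                    nn.BatchNorm2d(branch_ch),
                    nn.ReLU(inplace=True),
                )
                for k in (3, 5, 7)
            ])

        def forward(self, x):
            return torch.cat([b(x) for b in self.branches], dim=1)

    x = torch.randn(1, 3, 128, 128)
    print(MultiScaleConv(3, 16)(x).shape)   # torch.Size([1, 48, 128, 128])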

19 pages, 5481 KiB  
Article
Real-Time Semantic Segmentation Algorithm for Street Scenes Based on Attention Mechanism and Feature Fusion
by Bao Wu, Xingzhong Xiong and Yong Wang
Electronics 2024, 13(18), 3699; https://doi.org/10.3390/electronics13183699 - 18 Sep 2024
Cited by 3 | Viewed by 1999
Abstract
In computer vision, the task of semantic segmentation is crucial for applications such as autonomous driving and intelligent surveillance. However, achieving a balance between real-time performance and segmentation accuracy remains a significant challenge. Although Fast-SCNN is favored for its efficiency and low computational complexity, it still faces difficulties when handling complex street scene images. To address this issue, this paper presents an improved Fast-SCNN, aiming to enhance the accuracy and efficiency of semantic segmentation by incorporating a novel attention mechanism and an enhanced feature extraction module. Firstly, the integrated SimAM (Simple, Parameter-Free Attention Module) increases the network’s sensitivity to critical regions of the image and effectively adjusts the feature space weights across channels. Additionally, the refined pyramid pooling module in the global feature extraction module captures a broader range of contextual information through refined pooling levels. During the feature fusion stage, the introduction of an enhanced DAB (Depthwise Asymmetric Bottleneck) block and SE (Squeeze-and-Excitation) attention optimizes the network’s ability to process multi-scale information. Furthermore, the classifier module is extended by incorporating deeper convolutions and more complex convolutional structures, leading to a further improvement in model performance. These enhancements significantly improve the model’s ability to capture details and overall segmentation performance. Experimental results demonstrate that the proposed method excels in processing complex street scene images, achieving a mean Intersection over Union (mIoU) of 71.7% and 69.4% on the Cityscapes and CamVid datasets, respectively, while maintaining inference speeds of 81.4 fps and 113.6 fps. These results indicate that the proposed model effectively improves segmentation quality in complex street scenes while ensuring real-time processing capabilities. Full article
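
SimAM is a published parameter-free attention module that weights each activation by an energy term based on its deviation from the channel's spatial mean. The sketch below follows the commonly used formulation as a generic reference; it is not the modified Fast-SCNN described above, and e_lambda is the usual stabilizing constant.

    import torch

    def simam(x, e_lambda=1e-4):
        """SimAM: weight each activation by an energy term derived from how much
        it deviates from its channel's spatial mean (parameter-free attention)."""
        b, c, h, w = x.shape
        n = h * w - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)      # squared deviation
        v = d.sum(dim=(2, 3), keepdim=True) / n                # channel-wise variance
        e_inv = d / (4 * (v + e_lambda)) + 0.5                 # inverse energy
        return x * torch.sigmoid(e_inv)

    x = torch.randn(2, 32, 64, 64)
    print(simam(x).shape)   # torch.Size([2, 32, 64, 64])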

17 pages, 5621 KiB  
Article
Asymmetric-Convolution-Guided Multipath Fusion for Real-Time Semantic Segmentation Networks
by Jie Liu, Bing Zhao and Ming Tian
Mathematics 2024, 12(17), 2759; https://doi.org/10.3390/math12172759 - 5 Sep 2024
Viewed by 1132
Abstract
To address the inaccurate segmentation of long objects and the loss of information on small objects in real-time semantic segmentation algorithms, this paper proposes a lightweight multi-branch real-time semantic segmentation network based on BiseNetV2. The new auxiliary branch makes full use of spatial details and context information to cover long objects in the field of view. Meanwhile, to preserve the inference speed of the model, asymmetric convolution is used in each stage of the auxiliary branch to design a structure with low computational complexity. In the multi-branch fusion stage, an alignment-and-fusion module is designed to provide guidance information for deep and shallow feature mapping, compensating for feature misalignment when information at different scales is fused and thereby reducing the loss of small-target information. To further improve the model’s awareness of key information, a global context module is designed to capture the most important features in the input data. The proposed network was evaluated on the Cityscapes and CamVid street-scene datasets using an NVIDIA GeForce RTX 3080 Laptop GPU, reaching mean intersection over union (mIoU) of 77.1% and 77.4% at 127 frames/s and 112 frames/s, respectively. The experimental results show that the proposed algorithm achieves real-time segmentation while significantly improving accuracy, demonstrating good semantic segmentation performance. Full article
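
Asymmetric (factorized) convolution replaces a k x k kernel with a k x 1 convolution followed by a 1 x k convolution to cut computation. A minimal PyTorch sketch of that generic building block is shown below; it is not the paper's branch design, and the normalization and activation choices are assumptions.

    import torch
    import torch.nn as nn

    class AsymmetricConv(nn.Module):
        """Approximate a k x k convolution with a k x 1 followed by a 1 x k
        convolution, reducing the per-position cost from k*k to roughly 2*k."""
        def __init__(self, in_ch, out_ch, k=3):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, (k, 1), padding=(k // 2, 0), bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, k // 2), bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )

        def forward(self, x):
            return self.conv(x)

    x = torch.randn(1, 64, 32, 32)
    print(AsymmetricConv(64, 64, k=5)(x).shape)   # torch.Size([1, 64, 32, 32])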

16 pages, 1960 KiB  
Article
LMANet: A Lightweight Asymmetric Semantic Segmentation Network Based on Multi-Scale Feature Extraction
by Hui Chen, Zhexuan Xiao, Bin Ge and Xuedi Li
Electronics 2024, 13(17), 3361; https://doi.org/10.3390/electronics13173361 - 23 Aug 2024
Viewed by 1437
Abstract
With the swift progress of deep learning and its wide application in semantic segmentation, segmentation results have improved significantly. However, achieving a reasonable compromise between accuracy, model size, and inference speed remains crucial. In this paper, we propose a lightweight multi-scale asymmetric network (LMANet) designed on the basis of an encoder–decoder structure. First, an optimized bottleneck module is used to extract features from different levels, and different receptive fields are applied to obtain effective information on different scales. Then, a channel-attention module and a feature-extraction module are introduced to constitute the residual structure, and different feature maps are connected by a feature-fusion module to effectively improve segmentation accuracy. Finally, a lightweight multi-scale decoder is designed to recover the image, and a spatial attention module is added to recover spatial details effectively. The proposed method was verified on the Cityscapes and CamVid datasets, achieving mean intersection over union (mIoU) of 73.9% and 71.3% at inference speeds of 111 FPS and 118 FPS, respectively, with only 0.85 M parameters. Full article

14 pages, 26445 KiB  
Article
Containment Control-Guided Boundary Information for Semantic Segmentation
by Wenbo Liu, Junfeng Zhang, Chunyu Zhao, Yi Huang, Tao Deng and Fei Yan
Appl. Sci. 2024, 14(16), 7291; https://doi.org/10.3390/app14167291 - 19 Aug 2024
Viewed by 1311
Abstract
Real-time semantic segmentation is a challenging task in computer vision, especially in complex scenes. In this study, a novel three-branch semantic segmentation model is designed, aiming to effectively use boundary information to improve the accuracy of semantic segmentation. The proposed model introduces the concept of containment control in a pioneering way, which treats image interior elements as well as image boundary elements as followers and leaders in containment control, respectively. Based on this, we utilize two learnable feature fusion matrices in the high-level semantic information stage of the model to quantify the fusion process of internal and boundary features. Further, we design a dedicated loss function to update the parameters of the feature fusion matrices based on the criterion of containment control, which enables fine-grained communication between target features. In addition, our model incorporates a Feature Enhancement Unit (FEU) to tackle the challenge of maximizing the utility of multi-scale features essential for semantic segmentation tasks through the meticulous reconstruction of these features. The proposed model proves effective on the publicly available Cityscapes and CamVid datasets, achieving a trade-off between effectiveness and speed. Full article
(This article belongs to the Special Issue Digital Image Processing: Novel Technologies and Applications)

22 pages, 5551 KiB  
Article
BMSeNet: Multiscale Context Pyramid Pooling and Spatial Detail Enhancement Network for Real-Time Semantic Segmentation
by Shan Zhao, Xin Zhao, Zhanqiang Huo and Fukai Zhang
Sensors 2024, 24(16), 5145; https://doi.org/10.3390/s24165145 - 9 Aug 2024
Cited by 3 | Viewed by 1694
Abstract
Most real-time semantic segmentation networks use shallow architectures to achieve fast inference speeds. This approach, however, limits a network’s receptive field. Concurrently, feature information extraction is restricted to a single scale, which reduces the network’s ability to generalize and maintain robustness. Furthermore, loss of image spatial details negatively impacts segmentation accuracy. To address these limitations, this paper proposes a Multiscale Context Pyramid Pooling and Spatial Detail Enhancement Network (BMSeNet). First, to address the limitation of singular semantic feature scales, a Multiscale Context Pyramid Pooling Module (MSCPPM) is introduced. By leveraging various pooling operations, this module efficiently enlarges the receptive field and better aggregates multiscale contextual information. Moreover, a Spatial Detail Enhancement Module (SDEM) is designed, to effectively compensate for lost spatial detail information and significantly enhance the perception of spatial details. Finally, a Bilateral Attention Fusion Module (BAFM) is proposed. This module leverages pixel positional correlations to guide the network in assigning appropriate weights to the features extracted from the two branches, effectively merging the feature information of both branches. Extensive experiments were conducted on the Cityscapes and CamVid datasets. Experimental results show that the proposed BMSeNet achieves a good balance between inference speed and segmentation accuracy, outperforming some state-of-the-art real-time semantic segmentation methods. Full article
(This article belongs to the Section Sensing and Imaging)
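
Pyramid pooling over several grid sizes is the standard way to aggregate the kind of multiscale context the MSCPPM targets. Below is a minimal PSPNet-style sketch given only as a generic reference; the paper's module differs in its pooling operations, and the bin sizes and channel splits here are illustrative.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PyramidPooling(nn.Module):
        """Pool the feature map to several grid sizes, project each with a 1x1
        convolution, upsample back, and concatenate with the input so the output
        mixes local features with increasingly global context."""
        def __init__(self, in_ch, bins=(1, 2, 3, 6)):
            super().__init__()
            branch_ch = in_ch // len(bins)
            self.stages = nn.ModuleList([
                nn.Sequential(
                    nn.AdaptiveAvgPool2d(b),
                    nn.Conv2d(in_ch, branch_ch, 1, bias=False),
                    nn.BatchNorm2d(branch_ch),
                    nn.ReLU(inplace=True),
                )
                for b in bins
            ])

        def forward(self, x):
            h, w = x.shape[2:]
            outs = [x] + [
                F.interpolate(stage(x), size=(h, w), mode="bilinear", align_corners=False)
                for stage in self.stages
            ]
            return torch.cat(outs, dim=1)   # in_ch + in_ch channels

    x = torch.randn(2, 128, 32, 32)
    print(PyramidPooling(128)(x).shape)     # torch.Size([2, 256, 32, 32])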

18 pages, 4504 KiB  
Article
LACTNet: A Lightweight Real-Time Semantic Segmentation Network Based on an Aggregated Convolutional Neural Network and Transformer
by Xiangyue Zhang, Hexiao Li, Jingyu Ru, Peng Ji and Chengdong Wu
Electronics 2024, 13(12), 2406; https://doi.org/10.3390/electronics13122406 - 19 Jun 2024
Cited by 1 | Viewed by 2210
Abstract
Transformers have demonstrated a significant advantage over CNNs in modeling long-range dependencies, leading to increasing attention being paid towards their application in semantic segmentation tasks. In the present work, a novel semantic segmentation model, LACTNet, is introduced, which synergistically combines Transformer and CNN architectures for the real-time processing of local and global contextual features. LACTNet is designed with a lightweight Transformer, which integrates a specially designed gated convolutional feedforward network, to establish feature dependencies across distant regions. A Lightweight Average Feature Bottleneck (LAFB) module is designed to effectively capture spatial detail information within the features, thereby enhancing segmentation accuracy. To address the issue of spatial feature loss in the decoder, a long skip-connection approach is employed through the designed Feature Fusion Enhancement Module (FFEM), which enhances the integrity of spatial features and the feature interaction capability in the decoder. LACTNet is evaluated on two datasets, achieving a segmentation accuracy of 74.8% mIoU and a frame rate of 90 FPS on the Cityscapes dataset, and a segmentation accuracy of 71.8% mIoU with a frame rate of 126 FPS on the CamVid dataset. Full article

19 pages, 8535 KiB  
Article
A Fast Attention-Guided Hierarchical Decoding Network for Real-Time Semantic Segmentation
by Xuegang Hu and Jing Feng
Sensors 2024, 24(1), 95; https://doi.org/10.3390/s24010095 - 24 Dec 2023
Cited by 3 | Viewed by 1448
Abstract
Semantic segmentation provides accurate scene understanding and decision support for many applications. However, many models strive for high accuracy by adopting complex structures, which decreases inference speed and makes it challenging to meet real-time requirements. Therefore, a fast attention-guided hierarchical decoding network for real-time semantic segmentation (FAHDNet), which is an asymmetric U-shaped structure, is proposed to address this issue. In the encoder, we design a multi-scale bottleneck residual unit (MBRU), which combines the attention mechanism and decomposition convolution in a parallel structure for aggregating multi-scale information, making the network perform better at processing information at different scales. In addition, we propose a spatial information compensation (SIC) module that effectively uses the original input to make up for the spatial texture information lost during downsampling. In the decoder, the global attention (GA) module is used to process the feature map of the encoder, enhance the feature interaction in the channel and spatial dimensions, and enhance the ability to mine feature information. At the same time, the lightweight hierarchical decoder integrates multi-scale features to better adapt to targets of different scales and accurately segment objects of different sizes. Experiments show that FAHDNet performs outstandingly on two public datasets, Cityscapes and CamVid. Specifically, the network achieves 70.6% mean intersection over union (mIoU) at 135 frames per second (FPS) on Cityscapes and 67.2% mIoU at 335 FPS on CamVid. Compared to existing networks, our model maintains accuracy while achieving faster inference speeds, thus enhancing its practical usability. Full article
(This article belongs to the Section Sensor Networks)

15 pages, 1091 KiB  
Article
Swin-APT: An Enhancing Swin-Transformer Adaptor for Intelligent Transportation
by Yunzhuo Liu, Chunjiang Wu, Yuting Zeng, Keyu Chen and Shijie Zhou
Appl. Sci. 2023, 13(24), 13226; https://doi.org/10.3390/app132413226 - 13 Dec 2023
Cited by 5 | Viewed by 2578
Abstract
Artificial Intelligence has been widely applied in intelligent transportation systems. In this work, Swin-APT, a deep learning-based approach for semantic segmentation and object detection in intelligent transportation systems, is presented. Swin-APT includes a lightweight network and a multiscale adapter network designed for image semantic segmentation and object detection tasks. An inter-frame consistency module is proposed to extract more accurate road information from images. Experimental results on four datasets: BDD100K, CamVid, SYNTHIA, and CeyMo, demonstrate that Swin-APT outperforms the baseline by 13.1%. Furthermore, experiments on the road marking detection benchmark show an improvement of 1.85% in mAcc. Full article
(This article belongs to the Special Issue Advances in Image Enhancement and Restoration Technology)