Search Results (38)

Search Parameters:
Keywords = CamVid

34 pages, 5777 KiB  
Article
ACNet: An Attention–Convolution Collaborative Semantic Segmentation Network on Sensor-Derived Datasets for Autonomous Driving
by Qiliang Zhang, Kaiwen Hua, Zi Zhang, Yiwei Zhao and Pengpeng Chen
Sensors 2025, 25(15), 4776; https://doi.org/10.3390/s25154776 - 3 Aug 2025
Viewed by 167
Abstract
In intelligent vehicular networks, the accuracy of semantic segmentation in road scenes is crucial for vehicle-mounted artificial intelligence to achieve environmental perception, decision support, and safety control. Although deep learning methods have made significant progress, two main challenges remain: first, the difficulty in balancing global and local features leads to blurred object boundaries and misclassification; second, conventional convolutions have limited ability to perceive irregular objects, causing information loss and affecting segmentation accuracy. To address these issues, this paper proposes a global–local collaborative attention module and a spider web convolution module. The former enhances feature representation through bidirectional feature interaction and dynamic weight allocation, reducing false positives and missed detections. The latter introduces an asymmetric sampling topology and six-directional receptive field paths to effectively improve the recognition of irregular objects. Experiments on the Cityscapes, CamVid, and BDD100K datasets, collected using vehicle-mounted cameras, demonstrate that the proposed method performs excellently across multiple evaluation metrics, including mIoU, mRecall, mPrecision, and mAccuracy. Comparative experiments with classical segmentation networks, attention mechanisms, and convolution modules validate the effectiveness of the proposed approach. The proposed method demonstrates outstanding performance in sensor-based semantic segmentation tasks and is well-suited for environmental perception systems in autonomous driving. Full article
(This article belongs to the Special Issue AI-Driving for Autonomous Vehicles)
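
The evaluation metrics named in this abstract (mIoU, mRecall, mPrecision, mAccuracy) are standard per-class measures derived from a confusion matrix. For reference only, here is a minimal NumPy sketch of how mIoU is commonly computed; it is a generic illustration, not the authors' evaluation code, and the function names and toy labels are invented for the example.

    import numpy as np

    def confusion_matrix(pred, target, num_classes):
        """Accumulate a num_classes x num_classes confusion matrix from integer
        label maps of identical shape (rows = ground truth, cols = prediction)."""
        mask = (target >= 0) & (target < num_classes)
        idx = num_classes * target[mask].astype(int) + pred[mask].astype(int)
        return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

    def mean_iou(conf):
        """mIoU = mean over classes of TP / (TP + FP + FN)."""
        tp = np.diag(conf).astype(float)
        fp = conf.sum(axis=0) - tp   # predicted as class c but labelled otherwise
        fn = conf.sum(axis=1) - tp   # labelled class c but predicted otherwise
        denom = tp + fp + fn
        iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
        return np.nanmean(iou)       # ignore classes absent from both maps

    # toy example: 2 classes on 2x2 label maps
    pred = np.array([[0, 1], [1, 1]])
    target = np.array([[0, 1], [0, 1]])
    print(mean_iou(confusion_matrix(pred, target, num_classes=2)))  # ~0.583

The per-class recall (tp / (tp + fn)) and precision (tp / (tp + fp)) reported as mRecall and mPrecision fall out of the same matrix.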

25 pages, 2518 KiB  
Article
An Efficient Semantic Segmentation Framework with Attention-Driven Context Enhancement and Dynamic Fusion for Autonomous Driving
by Jia Tian, Peizeng Xin, Xinlu Bai, Zhiguo Xiao and Nianfeng Li
Appl. Sci. 2025, 15(15), 8373; https://doi.org/10.3390/app15158373 - 28 Jul 2025
Viewed by 349
Abstract
In recent years, a growing number of real-time semantic segmentation networks have been developed to improve segmentation accuracy. However, these advancements often come at the cost of increased computational complexity, which limits their inference efficiency, particularly in scenarios such as autonomous driving, where strict real-time performance is essential. Achieving an effective balance between speed and accuracy has thus become a central challenge in this field. To address this issue, we present a lightweight semantic segmentation model tailored for the perception requirements of autonomous vehicles. The architecture follows an encoder–decoder paradigm, which not only preserves the capability for deep feature extraction but also facilitates multi-scale information integration. The encoder leverages a high-efficiency backbone, while the decoder introduces a dynamic fusion mechanism designed to enhance information interaction between different feature branches. Recognizing the limitations of convolutional networks in modeling long-range dependencies and capturing global semantic context, the model incorporates an attention-based feature extraction component. This is further augmented by positional encoding, enabling better awareness of spatial structures and local details. The dynamic fusion mechanism employs an adaptive weighting strategy, adjusting the contribution of each feature channel to reduce redundancy and improve representation quality. To validate the effectiveness of the proposed network, experiments were conducted on a single RTX 3090 GPU. The Dynamic Real-time Integrated Vision Encoder–Segmenter Network (DriveSegNet) achieved a mean Intersection over Union (mIoU) of 76.9% and an inference speed of 70.5 FPS on the Cityscapes test dataset, 74.6% mIoU and 139.8 FPS on the CamVid test dataset, and 35.8% mIoU with 108.4 FPS on the ADE20K dataset. The experimental results demonstrate that the proposed method achieves an excellent balance between inference speed, segmentation accuracy, and model size. Full article
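
The "dynamic fusion mechanism" with an "adaptive weighting strategy" described above is, in generic form, a learned gate that re-weights two feature branches before they are merged. A minimal PyTorch sketch of such a gate follows; it is an illustrative pattern under that assumption, not DriveSegNet's actual module, and the class and variable names are hypothetical.

    import torch
    import torch.nn as nn

    class GatedFusion(nn.Module):
        """Fuse two same-shaped feature maps with learned per-channel weights.

        A global descriptor of the concatenated features drives a sigmoid gate g,
        and the output is g * a + (1 - g) * b, so the network can re-balance the
        contribution of each branch per channel instead of simply adding them."""
        def __init__(self, channels, reduction=4):
            super().__init__()
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),                        # B x 2C x 1 x 1
                nn.Conv2d(2 * channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid(),
            )

        def forward(self, a, b):
            g = self.gate(torch.cat([a, b], dim=1))             # B x C x 1 x 1
            return g * a + (1 - g) * b

    x1 = torch.randn(2, 64, 32, 32)
    x2 = torch.randn(2, 64, 32, 32)
    print(GatedFusion(64)(x1, x2).shape)   # torch.Size([2, 64, 32, 32])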

39 pages, 7470 KiB  
Article
Estimation of Fractal Dimension and Semantic Segmentation of Motion-Blurred Images by Knowledge Distillation in Autonomous Vehicle
by Seong In Jeong, Min Su Jeong and Kang Ryoung Park
Fractal Fract. 2025, 9(7), 460; https://doi.org/10.3390/fractalfract9070460 - 15 Jul 2025
Viewed by 400
Abstract
Research on semantic segmentation for remote sensing road scenes has advanced significantly, driven by autonomous driving technology. However, motion blur from camera or subject movements hampers segmentation performance. To address this issue, we propose a knowledge distillation-based semantic segmentation network (KDS-Net) that is robust to motion blur, eliminating the need for image restoration networks. KDS-Net leverages innovative knowledge distillation techniques and an edge-enhanced segmentation loss to refine edge regions and improve segmentation precision across various receptive fields. To enhance the interpretability of segmentation quality under motion blur, we incorporate fractal dimension estimation to quantify the geometric complexity of class-specific regions, allowing for a structural assessment of predictions generated by the proposed knowledge distillation framework for autonomous driving. Experiments on well-known motion-blurred remote sensing road scene datasets (CamVid and KITTI) demonstrate mean IoU scores of 72.42% and 59.29%, respectively, surpassing state-of-the-art methods. Additionally, the lightweight KDS-Net (21.44 M parameters) enables real-time edge computing, mitigating data privacy concerns and communication overheads in Internet of Vehicles scenarios. Full article
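
Knowledge distillation itself reduces to training the student's per-pixel class distribution toward a teacher's softened distribution. Below is a minimal PyTorch sketch of a temperature-scaled distillation loss for segmentation logits; it illustrates only this generic idea, not KDS-Net's specific distillation or edge-enhanced losses, and the tensor shapes and names are illustrative.

    import torch
    import torch.nn.functional as F

    def pixelwise_kd_loss(student_logits, teacher_logits, T=4.0):
        """Temperature-scaled distillation for segmentation: push the student's
        per-pixel class distribution toward the teacher's softened distribution.
        Both inputs are B x C x H x W class logits."""
        p_t = F.softmax(teacher_logits / T, dim=1)
        log_p_s = F.log_softmax(student_logits / T, dim=1)
        kl = (p_t * (p_t.clamp_min(1e-8).log() - log_p_s)).sum(dim=1)  # per-pixel KL
        return kl.mean() * (T * T)  # average over batch and pixels, usual T^2 scaling

    student = torch.randn(2, 12, 64, 64, requires_grad=True)  # 12 classes (illustrative)
    teacher = torch.randn(2, 12, 64, 64)
    loss = pixelwise_kd_loss(student, teacher)
    loss.backward()
    print(loss.item())

In practice this soft-target term is added to the ordinary cross-entropy loss against the ground-truth labels.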

14 pages, 2210 KiB  
Article
AMFFNet: Adaptive Multi-Scale Feature Fusion Network for Urban Image Semantic Segmentation
by Shuting Huang and Haiyan Huang
Electronics 2025, 14(12), 2344; https://doi.org/10.3390/electronics14122344 - 8 Jun 2025
Cited by 2 | Viewed by 521
Abstract
Urban image semantic segmentation faces challenges including the coexistence of multi-scale objects, blurred semantic relationships between complex structures, and dynamic occlusion interference. Existing methods often struggle to balance global contextual understanding of large scenes and fine-grained details of small objects due to insufficient granularity in multi-scale feature extraction and rigid fusion strategies. To address these issues, this paper proposes an Adaptive Multi-scale Feature Fusion Network (AMFFNet). The network primarily consists of four modules: a Multi-scale Feature Extraction Module (MFEM), an Adaptive Fusion Module (AFM), an Efficient Channel Attention (ECA) module, and an auxiliary supervision head. Firstly, the MFEM utilizes multiple depthwise strip convolutions to capture features at various scales, effectively leveraging contextual information. Then, the AFM employs a dynamic weight assignment strategy to harmonize multi-level features, enhancing the network’s ability to model complex urban scene structures. Additionally, the ECA attention mechanism introduces cross-channel interactions and nonlinear transformations to mitigate the issue of small-object segmentation omissions. Finally, the auxiliary supervision head enables shallow features to directly affect the final segmentation results. Experimental evaluations on the CamVid and Cityscapes datasets demonstrate that the proposed network achieves superior mean Intersection over Union (mIoU) scores of 77.8% and 81.9%, respectively, outperforming existing methods. The results confirm that AMFFNet has a stronger ability to understand complex urban scenes. Full article
(This article belongs to the Topic Intelligent Image Processing Technology)
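
ECA (Efficient Channel Attention) is a published, lightweight attention block: global average pooling followed by a cheap 1-D convolution across channels that produces per-channel weights. A minimal PyTorch sketch of the generic module is given below; the fixed kernel size is a simplification, and this is not AMFFNet's exact configuration.

    import torch
    import torch.nn as nn

    class ECA(nn.Module):
        """Efficient Channel Attention: global average pooling followed by a
        1-D convolution across channels, yielding per-channel gating weights."""
        def __init__(self, k_size=3):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)
            self.sigmoid = nn.Sigmoid()

        def forward(self, x):                      # x: B x C x H x W
            y = self.pool(x)                       # B x C x 1 x 1
            y = y.squeeze(-1).transpose(1, 2)      # B x 1 x C (channels as a sequence)
            y = self.conv(y)                       # local cross-channel interaction
            y = self.sigmoid(y).transpose(1, 2).unsqueeze(-1)  # B x C x 1 x 1
            return x * y

    x = torch.randn(2, 64, 32, 32)
    print(ECA()(x).shape)   # torch.Size([2, 64, 32, 32])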

23 pages, 1714 KiB  
Article
Deep LBLS: Accelerated Sky Region Segmentation Using Hybrid Deep CNNs and Lattice Boltzmann Level-Set Model
by Fatema A. Albalooshi, M. R. Qader, Yasser Ismail, Wael Elmedany, Hesham Al-Ammal, Muttukrishnan Rajarajan and Vijayan K. Asari
Eng 2025, 6(3), 57; https://doi.org/10.3390/eng6030057 - 19 Mar 2025
Viewed by 593
Abstract
Accurate segmentation of the sky region is crucial for various applications, including object detection, tracking, and recognition, as well as augmented reality (AR) and virtual reality (VR) applications. However, sky region segmentation poses significant challenges due to complex backgrounds, varying lighting conditions, and the absence of clear edges and textures. In this paper, we present a new hybrid fast segmentation technique for the sky region that learns from object components to achieve rapid and effective segmentation while preserving precise details of the sky region. We employ Convolutional Neural Networks (CNNs) to guide the active contour and extract regions of interest. Our algorithm is implemented by leveraging three types of CNNs, namely DeepLabV3+, Fully Convolutional Network (FCN), and SegNet. Additionally, we utilize a local image fitting level-set function to characterize the region-based active contour model. Finally, the Lattice Boltzmann approach is employed to achieve rapid convergence of the level-set function. This forms a deep Lattice Boltzmann Level-Set (deep LBLS) segmentation approach that exploits deep CNN, the level-set method (LS), and the lattice Boltzmann method (LBM) for sky region separation. The performance of the proposed method is evaluated on the CamVid dataset, which contains images with a wide range of object variations due to factors such as illumination changes, shadow presence, occlusion, scale differences, and cluttered backgrounds. Experiments conducted on this dataset yield promising results in terms of computation time and the robustness of segmentation when compared to state-of-the-art methods. Our deep LBLS approach demonstrates better performance, with an improvement in mean recall value reaching up to 14.45%. Full article

14 pages, 48905 KiB  
Article
RSM-Optimizer: Branch Optimization for Dual- or Multi-Branch Semantic Segmentation Networks
by Xiaohong Zhang, Wenwen Zong and Yaning Jiang
Electronics 2025, 14(6), 1109; https://doi.org/10.3390/electronics14061109 - 11 Mar 2025
Viewed by 705
Abstract
Semantic segmentation is a crucial task in the field of computer vision, with important applications in areas such as autonomous driving, medical image analysis, and remote sensing image analysis. Dual-branch and multi-branch semantic segmentation networks that leverage deep learning technologies can enhance both segmentation accuracy and speed. These networks typically contain a detail branch and a semantic branch. However, the feature maps in the detail branch are limited to a single type of receptive field, which limits models’ abilities to perceive objects at different scales. During the feature map fusion process, low-resolution feature maps from the semantic branch are upsampled by a large factor to match the feature maps in the detail branch. Unfortunately, these upsampling operations inevitably introduce noise. To address these issues, we propose several improvements to optimize the detail and semantic branches. We first design a receptive field-driven feature enhancement module to enrich the receptive fields of feature maps in the detail branch. Then, we propose a stepwise upsampling and fusion module to reduce the noise introduced during the upsampling process of feature fusion. Finally, we introduce a pyramid mixed pooling module (PMPM) to improve models’ abilities to perceive objects of different shapes. Considering the diversity of objects in terms of scale, shape, and category in urban street scene data, we carried out experiments on the Cityscapes and CamVid datasets. The experimental results on both datasets validate the effectiveness and efficiency of the proposed improvements. Full article
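
Read generically, stepwise upsampling replaces one large-factor interpolation with repeated 2x interpolations, each followed by a light refinement convolution. The sketch below illustrates only that pattern; it is not the paper's stepwise upsampling and fusion module, and the layer choices are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class StepwiseUpsample(nn.Module):
        """Upsample by a large total factor in repeated x2 steps, applying a small
        convolution after each step to smooth interpolation artifacts, instead of
        one direct large-factor interpolation."""
        def __init__(self, channels, steps):
            super().__init__()
            self.refine = nn.ModuleList([
                nn.Sequential(
                    nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                    nn.BatchNorm2d(channels),
                    nn.ReLU(inplace=True),
                )
                for _ in range(steps)
            ])

        def forward(self, x):
            for block in self.refine:
                x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
                x = block(x)
            return x

    low = torch.randn(2, 128, 16, 16)   # low-resolution semantic-branch features
    print(StepwiseUpsample(128, steps=3)(low).shape)  # torch.Size([2, 128, 128, 128])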

15 pages, 924 KiB  
Article
Novel Approach in Vegetation Detection Using Multi-Scale Convolutional Neural Network
by Fatema A. Albalooshi
Appl. Sci. 2024, 14(22), 10287; https://doi.org/10.3390/app142210287 - 8 Nov 2024
Cited by 2 | Viewed by 1214
Abstract
Vegetation segmentation plays a crucial role in accurately monitoring and analyzing vegetation cover, growth patterns, and changes over time, which in turn contributes to environmental studies, land management, and assessing the impact of climate change. This study explores the potential of a multi-scale convolutional neural network (MSCNN) design for object classification, specifically focusing on vegetation detection. The MSCNN is designed to integrate multi-scale feature extraction and attention mechanisms, enabling the model to capture both fine and coarse vegetation patterns effectively. Moreover, the MSCNN architecture integrates multiple convolutional layers with varying kernel sizes (3 × 3, 5 × 5, and 7 × 7), enabling the model to extract features at different scales, which is vital for identifying diverse vegetation patterns across various landscapes. Vegetation detection is demonstrated using three diverse datasets: the CamVid dataset, the FloodNet dataset, and the multispectral RIT-18 dataset. These datasets present a range of challenges, including variations in illumination, the presence of shadows, occlusion, scale differences, and cluttered backgrounds, which are common in real-world scenarios. The MSCNN architecture allows for the integration of information from multiple scales, facilitating the detection of diverse vegetation types under varying conditions. The performance of the proposed MSCNN method is rigorously evaluated and compared against state-of-the-art techniques in the field. Comprehensive experiments showcase the effectiveness of the approach, highlighting its robustness in accurately segmenting and classifying vegetation even in complex environments. The results indicate that the MSCNN design significantly outperforms traditional methods, achieving a remarkable global accuracy and boundary F1 score (BF score) of up to 98%. This superior performance underscores the MSCNN’s capability to enhance vegetation detection in imagery, making it a promising tool for applications in environmental monitoring and land use management. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
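
The multi-scale extraction described here amounts to running parallel convolution branches with 3x3, 5x5, and 7x7 kernels and concatenating their outputs. A minimal PyTorch sketch of that generic pattern follows; channel counts are illustrative and this is not the authors' MSCNN.

    import torch
    import torch.nn as nn

    class MultiScaleConv(nn.Module):
        """Parallel 3x3, 5x5 and 7x7 convolutions over the same input; the branch
        outputs are concatenated so later layers see features captured at several
        receptive-field sizes."""
        def __init__(self, in_ch, branch_ch):
            super().__init__()
            self.branches = nn.ModuleList([
                nn.Sequential(
                    nn.Conv2d(in_ch, branch_ch, k, padding=k // 2, bias=False),
                    nn.BatchNorm2d(branch_ch),
                    nn.ReLU(inplace=True),
                )
                for k in (3, 5, 7)
            ])

        def forward(self, x):
            return torch.cat([b(x) for b in self.branches], dim=1)

    x = torch.randn(1, 3, 128, 128)
    print(MultiScaleConv(3, 16)(x).shape)   # torch.Size([1, 48, 128, 128])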

19 pages, 5481 KiB  
Article
Real-Time Semantic Segmentation Algorithm for Street Scenes Based on Attention Mechanism and Feature Fusion
by Bao Wu, Xingzhong Xiong and Yong Wang
Electronics 2024, 13(18), 3699; https://doi.org/10.3390/electronics13183699 - 18 Sep 2024
Cited by 3 | Viewed by 1999
Abstract
In computer vision, the task of semantic segmentation is crucial for applications such as autonomous driving and intelligent surveillance. However, achieving a balance between real-time performance and segmentation accuracy remains a significant challenge. Although Fast-SCNN is favored for its efficiency and low computational complexity, it still faces difficulties when handling complex street scene images. To address this issue, this paper presents an improved Fast-SCNN, aiming to enhance the accuracy and efficiency of semantic segmentation by incorporating a novel attention mechanism and an enhanced feature extraction module. Firstly, the integrated SimAM (Simple, Parameter-Free Attention Module) increases the network’s sensitivity to critical regions of the image and effectively adjusts the feature space weights across channels. Additionally, the refined pyramid pooling module in the global feature extraction module captures a broader range of contextual information through refined pooling levels. During the feature fusion stage, the introduction of an enhanced DAB (Depthwise Asymmetric Bottleneck) block and SE (Squeeze-and-Excitation) attention optimizes the network’s ability to process multi-scale information. Furthermore, the classifier module is extended by incorporating deeper convolutions and more complex convolutional structures, leading to a further improvement in model performance. These enhancements significantly improve the model’s ability to capture details and overall segmentation performance. Experimental results demonstrate that the proposed method excels in processing complex street scene images, achieving a mean Intersection over Union (mIoU) of 71.7% and 69.4% on the Cityscapes and CamVid datasets, respectively, while maintaining inference speeds of 81.4 fps and 113.6 fps. These results indicate that the proposed model effectively improves segmentation quality in complex street scenes while ensuring real-time processing capabilities. Full article
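
SimAM is a published parameter-free attention module that weights each activation by an energy term based on its deviation from the channel's spatial mean. The sketch below follows the commonly used formulation as a generic reference; it is not the modified Fast-SCNN described above, and e_lambda is the usual stabilizing constant.

    import torch

    def simam(x, e_lambda=1e-4):
        """SimAM: weight each activation by an energy term derived from how much
        it deviates from its channel's spatial mean (parameter-free attention)."""
        b, c, h, w = x.shape
        n = h * w - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)      # squared deviation
        v = d.sum(dim=(2, 3), keepdim=True) / n                # channel-wise variance
        e_inv = d / (4 * (v + e_lambda)) + 0.5                 # inverse energy
        return x * torch.sigmoid(e_inv)

    x = torch.randn(2, 32, 64, 64)
    print(simam(x).shape)   # torch.Size([2, 32, 64, 64])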

17 pages, 5621 KiB  
Article
Asymmetric-Convolution-Guided Multipath Fusion for Real-Time Semantic Segmentation Networks
by Jie Liu, Bing Zhao and Ming Tian
Mathematics 2024, 12(17), 2759; https://doi.org/10.3390/math12172759 - 5 Sep 2024
Viewed by 1132
Abstract
To address the inaccurate segmentation of long objects and the loss of information on small objects in real-time semantic segmentation algorithms, this paper proposes a lightweight multi-branch real-time semantic segmentation network based on BiseNetV2. The new auxiliary branch makes full use of spatial details and context information to cover long objects in the field of view. Meanwhile, to preserve the inference speed of the model, asymmetric convolution is used in each stage of the auxiliary branch to design a structure with low computational complexity. In the multi-branch fusion stage, an alignment-and-fusion module is designed to provide guidance information for deep and shallow feature mapping, compensating for feature misalignment when information at different scales is fused and thereby reducing the loss of small-target information. To further improve the model’s awareness of key information, a global context module is designed to capture the most important features in the input data. The proposed network was evaluated on the Cityscapes and CamVid street-scene datasets using an NVIDIA GeForce RTX 3080 Laptop GPU, reaching mean intersection over union (mIoU) of 77.1% and 77.4% at 127 frames/s and 112 frames/s, respectively. The experimental results show that the proposed algorithm achieves real-time segmentation while significantly improving accuracy, demonstrating good semantic segmentation performance. Full article
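
Asymmetric (factorized) convolution replaces a k x k kernel with a k x 1 convolution followed by a 1 x k convolution to cut computation. A minimal PyTorch sketch of that generic building block is shown below; it is not the paper's branch design, and the normalization and activation choices are assumptions.

    import torch
    import torch.nn as nn

    class AsymmetricConv(nn.Module):
        """Approximate a k x k convolution with a k x 1 followed by a 1 x k
        convolution, reducing the per-position cost from k*k to roughly 2*k."""
        def __init__(self, in_ch, out_ch, k=3):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, (k, 1), padding=(k // 2, 0), bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, k // 2), bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )

        def forward(self, x):
            return self.conv(x)

    x = torch.randn(1, 64, 32, 32)
    print(AsymmetricConv(64, 64, k=5)(x).shape)   # torch.Size([1, 64, 32, 32])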

16 pages, 1960 KiB  
Article
LMANet: A Lightweight Asymmetric Semantic Segmentation Network Based on Multi-Scale Feature Extraction
by Hui Chen, Zhexuan Xiao, Bin Ge and Xuedi Li
Electronics 2024, 13(17), 3361; https://doi.org/10.3390/electronics13173361 - 23 Aug 2024
Viewed by 1437
Abstract
With the swift progress of deep learning and its wide application in semantic segmentation, segmentation results have improved significantly. However, achieving a reasonable compromise between accuracy, model size, and inference speed remains crucial. In this paper, we propose a lightweight multi-scale asymmetric network (LMANet) designed on the basis of an encoder–decoder structure. First, an optimized bottleneck module is used to extract features from different levels, and different receptive fields are applied to obtain effective information on different scales. Then, a channel-attention module and a feature-extraction module are introduced to constitute the residual structure, and different feature maps are connected by a feature-fusion module to effectively improve segmentation accuracy. Finally, a lightweight multi-scale decoder is designed to recover the image, and a spatial attention module is added to recover spatial details effectively. The proposed method was verified on the Cityscapes and CamVid datasets, achieving mean intersection over union (mIoU) of 73.9% and 71.3% at inference speeds of 111 FPS and 118 FPS, respectively, with only 0.85 M parameters. Full article

14 pages, 26445 KiB  
Article
Containment Control-Guided Boundary Information for Semantic Segmentation
by Wenbo Liu, Junfeng Zhang, Chunyu Zhao, Yi Huang, Tao Deng and Fei Yan
Appl. Sci. 2024, 14(16), 7291; https://doi.org/10.3390/app14167291 - 19 Aug 2024
Viewed by 1311
Abstract
Real-time semantic segmentation is a challenging task in computer vision, especially in complex scenes. In this study, a novel three-branch semantic segmentation model is designed, aiming to effectively use boundary information to improve the accuracy of semantic segmentation. The proposed model introduces the concept of containment control in a pioneering way, which treats image interior elements as well as image boundary elements as followers and leaders in containment control, respectively. Based on this, we utilize two learnable feature fusion matrices in the high-level semantic information stage of the model to quantify the fusion process of internal and boundary features. Further, we design a dedicated loss function to update the parameters of the feature fusion matrices based on the criterion of containment control, which enables fine-grained communication between target features. In addition, our model incorporates a Feature Enhancement Unit (FEU) to tackle the challenge of maximizing the utility of multi-scale features essential for semantic segmentation tasks through the meticulous reconstruction of these features. The proposed model proves effective on the publicly available Cityscapes and CamVid datasets, achieving a trade-off between effectiveness and speed. Full article
(This article belongs to the Special Issue Digital Image Processing: Novel Technologies and Applications)

22 pages, 5551 KiB  
Article
BMSeNet: Multiscale Context Pyramid Pooling and Spatial Detail Enhancement Network for Real-Time Semantic Segmentation
by Shan Zhao, Xin Zhao, Zhanqiang Huo and Fukai Zhang
Sensors 2024, 24(16), 5145; https://doi.org/10.3390/s24165145 - 9 Aug 2024
Cited by 3 | Viewed by 1694
Abstract
Most real-time semantic segmentation networks use shallow architectures to achieve fast inference speeds. This approach, however, limits a network’s receptive field. Concurrently, feature information extraction is restricted to a single scale, which reduces the network’s ability to generalize and maintain robustness. Furthermore, loss of image spatial details negatively impacts segmentation accuracy. To address these limitations, this paper proposes a Multiscale Context Pyramid Pooling and Spatial Detail Enhancement Network (BMSeNet). First, to address the limitation of singular semantic feature scales, a Multiscale Context Pyramid Pooling Module (MSCPPM) is introduced. By leveraging various pooling operations, this module efficiently enlarges the receptive field and better aggregates multiscale contextual information. Moreover, a Spatial Detail Enhancement Module (SDEM) is designed, to effectively compensate for lost spatial detail information and significantly enhance the perception of spatial details. Finally, a Bilateral Attention Fusion Module (BAFM) is proposed. This module leverages pixel positional correlations to guide the network in assigning appropriate weights to the features extracted from the two branches, effectively merging the feature information of both branches. Extensive experiments were conducted on the Cityscapes and CamVid datasets. Experimental results show that the proposed BMSeNet achieves a good balance between inference speed and segmentation accuracy, outperforming some state-of-the-art real-time semantic segmentation methods. Full article
(This article belongs to the Section Sensing and Imaging)
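
Pyramid pooling over several grid sizes is the standard way to aggregate the kind of multiscale context the MSCPPM targets. Below is a minimal PSPNet-style sketch given only as a generic reference; the paper's module differs in its pooling operations, and the bin sizes and channel splits here are illustrative.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PyramidPooling(nn.Module):
        """Pool the feature map to several grid sizes, project each with a 1x1
        convolution, upsample back, and concatenate with the input so the output
        mixes local features with increasingly global context."""
        def __init__(self, in_ch, bins=(1, 2, 3, 6)):
            super().__init__()
            branch_ch = in_ch // len(bins)
            self.stages = nn.ModuleList([
                nn.Sequential(
                    nn.AdaptiveAvgPool2d(b),
                    nn.Conv2d(in_ch, branch_ch, 1, bias=False),
                    nn.BatchNorm2d(branch_ch),
                    nn.ReLU(inplace=True),
                )
                for b in bins
            ])

        def forward(self, x):
            h, w = x.shape[2:]
            outs = [x] + [
                F.interpolate(stage(x), size=(h, w), mode="bilinear", align_corners=False)
                for stage in self.stages
            ]
            return torch.cat(outs, dim=1)   # in_ch + in_ch channels

    x = torch.randn(2, 128, 32, 32)
    print(PyramidPooling(128)(x).shape)     # torch.Size([2, 256, 32, 32])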

18 pages, 4504 KiB  
Article
LACTNet: A Lightweight Real-Time Semantic Segmentation Network Based on an Aggregated Convolutional Neural Network and Transformer
by Xiangyue Zhang, Hexiao Li, Jingyu Ru, Peng Ji and Chengdong Wu
Electronics 2024, 13(12), 2406; https://doi.org/10.3390/electronics13122406 - 19 Jun 2024
Cited by 1 | Viewed by 2210
Abstract
Transformers have demonstrated a significant advantage over CNNs in modeling long-range dependencies, leading to increasing attention being paid towards their application in semantic segmentation tasks. In the present work, a novel semantic segmentation model, LACTNet, is introduced, which synergistically combines Transformer and CNN architectures for the real-time processing of local and global contextual features. LACTNet is designed with a lightweight Transformer, which integrates a specially designed gated convolutional feedforward network, to establish feature dependencies across distant regions. A Lightweight Average Feature Bottleneck (LAFB) module is designed to effectively capture spatial detail information within the features, thereby enhancing segmentation accuracy. To address the issue of spatial feature loss in the decoder, a long skip-connection approach is employed through the designed Feature Fusion Enhancement Module (FFEM), which enhances the integrity of spatial features and the feature interaction capability in the decoder. LACTNet is evaluated on two datasets, achieving a segmentation accuracy of 74.8% mIoU and a frame rate of 90 FPS on the Cityscapes dataset, and a segmentation accuracy of 71.8% mIoU with a frame rate of 126 FPS on the CamVid dataset. Full article

19 pages, 8535 KiB  
Article
A Fast Attention-Guided Hierarchical Decoding Network for Real-Time Semantic Segmentation
by Xuegang Hu and Jing Feng
Sensors 2024, 24(1), 95; https://doi.org/10.3390/s24010095 - 24 Dec 2023
Cited by 3 | Viewed by 1448
Abstract
Semantic segmentation provides accurate scene understanding and decision support for many applications. However, many models strive for high accuracy by adopting complex structures, which decreases inference speed and makes it challenging to meet real-time requirements. Therefore, a fast attention-guided hierarchical decoding network for real-time semantic segmentation (FAHDNet), which is an asymmetric U-shaped structure, is proposed to address this issue. In the encoder, we design a multi-scale bottleneck residual unit (MBRU), which combines the attention mechanism and decomposition convolution in a parallel structure for aggregating multi-scale information, making the network perform better at processing information at different scales. In addition, we propose a spatial information compensation (SIC) module that effectively uses the original input to make up for the spatial texture information lost during downsampling. In the decoder, the global attention (GA) module is used to process the feature map of the encoder, enhance the feature interaction in the channel and spatial dimensions, and enhance the ability to mine feature information. At the same time, the lightweight hierarchical decoder integrates multi-scale features to better adapt to targets of different scales and accurately segment objects of different sizes. Experiments show that FAHDNet performs outstandingly on two public datasets, Cityscapes and CamVid. Specifically, the network achieves 70.6% mean intersection over union (mIoU) at 135 frames per second (FPS) on Cityscapes and 67.2% mIoU at 335 FPS on CamVid. Compared to existing networks, our model maintains accuracy while achieving faster inference speeds, thus enhancing its practical usability. Full article
(This article belongs to the Section Sensor Networks)

15 pages, 1091 KiB  
Article
Swin-APT: An Enhancing Swin-Transformer Adaptor for Intelligent Transportation
by Yunzhuo Liu, Chunjiang Wu, Yuting Zeng, Keyu Chen and Shijie Zhou
Appl. Sci. 2023, 13(24), 13226; https://doi.org/10.3390/app132413226 - 13 Dec 2023
Cited by 5 | Viewed by 2578
Abstract
Artificial Intelligence has been widely applied in intelligent transportation systems. In this work, Swin-APT, a deep learning-based approach for semantic segmentation and object detection in intelligent transportation systems, is presented. Swin-APT includes a lightweight network and a multiscale adapter network designed for image semantic segmentation and object detection tasks. An inter-frame consistency module is proposed to extract more accurate road information from images. Experimental results on four datasets: BDD100K, CamVid, SYNTHIA, and CeyMo, demonstrate that Swin-APT outperforms the baseline by 13.1%. Furthermore, experiments on the road marking detection benchmark show an improvement of 1.85% in mAcc. Full article
(This article belongs to the Special Issue Advances in Image Enhancement and Restoration Technology)