A GAN-Based Framework with Dynamic Adaptive Attention for Multi-Class Image Segmentation in Autonomous Driving
Abstract
1. Introduction
2. Related Work
3. Materials and Methods
3.1. Overview of GAN Architecture
3.1.1. Discriminator Loss
3.1.2. Generator Loss
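For reference alongside Sections 3.1.1 and 3.1.2, the conventional (non-saturating) GAN losses can be sketched as below. This is a minimal NumPy illustration of the standard formulation only; the paper's exact loss terms (e.g., any segmentation-specific or weighted components) may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_loss(d_real_logits, d_fake_logits, eps=1e-8):
    """Standard GAN discriminator loss: push scores on real samples
    toward 1 and scores on generated samples toward 0."""
    p_real = sigmoid(d_real_logits)
    p_fake = sigmoid(d_fake_logits)
    return -np.mean(np.log(p_real + eps) + np.log(1.0 - p_fake + eps))

def generator_loss(d_fake_logits, eps=1e-8):
    """Non-saturating generator loss: push the discriminator's scores
    on generated samples toward 1."""
    p_fake = sigmoid(d_fake_logits)
    return -np.mean(np.log(p_fake + eps))
```

A near-perfect discriminator (large positive logits on real, large negative on fake) drives the discriminator loss toward zero, while the generator loss grows as its outputs are confidently rejected.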
3.2. Adaptive Ensemble Attention for Autonomous Driving
3.3. Combining Adaptive Ensemble Attention with GAN
- Q, K, and V are the query, key, and value matrices, respectively, derived from the input X.
- d_k is the dimensionality of the query and key vectors.
- The Softmax operation ensures that the attention weights are normalized.
- GlobalAvgPool(X) is a global average pooling operation over the spatial dimensions H × W.
- W1 and W2 are learned weight matrices, σ is the sigmoid activation function.
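The bullet points above describe scaled dot-product attention and an SE-style channel gate built from global average pooling, W1, W2, and σ. A minimal single-head, unbatched NumPy sketch follows; the ReLU between W1 and W2 is an assumption (the text specifies only the sigmoid), so treat this as an illustration rather than the paper's exact module.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax; rows of attention weights sum to 1."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_q, n_k) similarity scores
    weights = softmax(scores, axis=-1)   # normalized attention weights
    return weights @ V                   # (n_q, d_v)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(X, W1, W2):
    """SE-style channel gate: s = sigmoid(W2 · relu(W1 · GlobalAvgPool(X))).
    X has shape (C, H, W); each channel is rescaled by a weight in (0, 1)."""
    z = X.mean(axis=(1, 2))              # GlobalAvgPool over H x W -> (C,)
    h = np.maximum(W1 @ z, 0.0)          # squeeze (ReLU assumed here)
    s = sigmoid(W2 @ h)                  # per-channel gate values
    return X * s[:, None, None]          # broadcast rescaling
```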
3.4. Dataset
3.5. Implementation Process
Training Setup Details
- Learning Rate: Initially set to 0.001 with a step decay schedule to reduce it by a factor of 0.1 every 100 epochs.
- Optimizer: Adam optimizer was used for both the generator and discriminator due to its efficiency in converging GANs.
- Weight Initialization: Xavier (Glorot) initialization was used for convolutional layers to ensure stable gradients.
- Regularization: Dropout (rate = 0.5) was applied in intermediate layers to prevent overfitting. L2 regularization (λ = 0.0001) was also used.
- Data Augmentation: Applied random horizontal flipping, brightness variation, random cropping, and slight rotations to improve robustness and generalization.
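The learning-rate schedule and L2 term listed above translate directly into code. A small sketch is given below; the helper names (step_decay_lr, l2_penalty) are illustrative, not from the paper.

```python
import numpy as np

def step_decay_lr(epoch, base_lr=0.001, drop=0.1, step=100):
    """Step decay from the setup above: start at 0.001 and multiply
    the learning rate by 0.1 every 100 epochs."""
    return base_lr * (drop ** (epoch // step))

def l2_penalty(weights, lam=1e-4):
    """L2 regularization term added to the loss (lambda = 0.0001 above)."""
    return lam * sum(float((w ** 2).sum()) for w in weights)
```

For example, the learning rate stays at 0.001 through epoch 99, drops to 0.0001 at epoch 100, and to 0.00001 at epoch 200.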
3.6. Evaluation Metrics
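The results tables report accuracy, precision, recall, F1-score, AUC, and mIoU. As a hedged sketch of the headline metric, mIoU can be computed from a per-class confusion matrix as follows (the exact class set and averaging conventions used in the paper are not restated here):

```python
import numpy as np

def mean_iou(conf):
    """Mean IoU from a confusion matrix whose rows are ground-truth
    classes and columns are predictions: IoU_c = TP / (TP + FP + FN)."""
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)                 # correctly classified pixels per class
    fp = conf.sum(axis=0) - tp         # predicted as class c but wrong
    fn = conf.sum(axis=1) - tp         # class c pixels predicted as other
    iou = tp / (tp + fp + fn)
    return iou.mean()
```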
4. Results
4.1. Experimental Setup
4.2. Visual Results of Segmentation Outputs—BDD100K Dataset
4.3. Visual Results of Segmentation Outputs—CITYSCAPE Dataset
4.4. Visual Results of Segmentation Outputs—KITTI Dataset
4.5. Ablation Study on Backbone Networks
4.6. Ablation Study of Framework Components
4.7. Runtime Performance and Model Complexity
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Yu, H.; Huo, S.; Zhu, M.; Gong, Y.; Xiang, Y. Machine learning-based vehicle intention trajectory recognition and prediction for autonomous driving. arXiv 2024, arXiv:2402.16036.
2. Guan, L.; Yuan, X. Dynamic weighting and boundary-aware active domain adaptation for semantic segmentation in autonomous driving environment. IEEE Trans. Intell. Transp. Syst. 2024, 25, 18461–18471.
3. Sun, C.; Zhao, H.; Mu, L.; Xu, F.; Lu, L. Image Semantic Segmentation for Autonomous Driving Based on Improved U-Net. Comput. Model. Eng. Sci. 2023, 136, 787–801.
4. Khairnar, S.; Thepade, S.D.; Kolekar, S.; Gite, S.; Pradhan, B.; Alamri, A.; Patil, B.; Dahake, S.; Gaikwad, R.; Chaudhari, A. Enhancing semantic segmentation for autonomous vehicle scene understanding in Indian context using modified CANet model. MethodsX 2025, 14, 103131.
5. Liu, Y.; Wang, Y.; Li, Q. Lane detection based on real-time semantic segmentation for end-to-end autonomous driving under low-light conditions. Digit. Signal Process. 2024, 155, 104752.
6. Hao, W.; Wang, J.; Lu, H. A Real-Time Semantic Segmentation Method Based on Transformer for Autonomous Driving. Comput. Mater. Contin. 2024, 81, 4419–4433.
7. Unar, S.; Su, Y.; Zhao, X.; Liu, P.; Wang, Y.; Fu, X. Towards applying image retrieval approach for finding semantic locations in autonomous vehicles. Multimed. Tools Appl. 2024, 83, 20537–20558.
8. Zhang, C.; Lu, W.; Wu, J.; Ni, C.; Wang, H. SegNet network architecture for deep learning image segmentation and its integrated applications and prospects. Acad. J. Sci. Technol. 2024, 9, 224–229.
9. Mei, J.; Zhou, T.; Huang, K.; Zhang, Y.; Zhou, Y.; Wu, Y.; Fu, H. A survey on deep learning for polyp segmentation: Techniques, challenges and future trends. Vis. Intell. 2025, 3, 1.
10. George, G.V.; Hussain, M.S.; Hussain, R.; Jenicka, S. Efficient Road Segmentation Techniques with Attention-Enhanced Conditional GANs. SN Comput. Sci. 2024, 5, 176.
11. Liang, X.; Li, C.; Tian, L. Generative adversarial network for semi-supervised image captioning. Comput. Vis. Image Underst. 2024, 249, 104199.
12. Ma, X.; Hu, K.; Sun, X.; Chen, S. Adaptive Attention Module for Image Recognition Systems in Autonomous Driving. Int. J. Intell. Syst. 2024, 2024, 3934270.
13. Wang, N.; Guo, R.; Shi, C.; Wang, Z.; Zhang, H.; Lu, H.; Zheng, Z.; Chen, X. SegNet4D: Efficient Instance-Aware 4D Semantic Segmentation for LiDAR Point Cloud. IEEE Trans. Autom. Sci. Eng. 2025, 22, 15339–15350.
14. Hofmarcher, M.; Unterthiner, T.; Arjona-Medina, J.; Klambauer, G.; Hochreiter, S.; Nessler, B. Visual scene understanding for autonomous driving using semantic segmentation. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning; Springer: Cham, Switzerland, 2019; pp. 285–296.
15. Chang, C.C.; Lin, W.C.; Wang, P.S.; Yu, S.F.; Lu, Y.C.; Lin, K.C.; Wu, K.C. Q-YOLOP: Quantization-aware you only look once for panoptic driving perception. In Proceedings of the 2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), Brisbane, Australia, 10–14 July 2023; IEEE: Piscataway, NJ, USA; pp. 52–56.
16. Luo, T.; Chen, Y.; Luan, T.; Cai, B.; Chen, L.; Wang, H. IDS-model: An efficient multitask model of road scene instance and drivable area segmentation for autonomous driving. IEEE Trans. Transp. Electrif. 2023, 10, 1454–1464.
17. Siam, M.; Gamal, M.; Abdel-Razek, M.; Yogamani, S.; Jagersand, M.; Zhang, H. A comparative study of real-time semantic segmentation for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 587–597.
18. Feng, D.; Haase-Schütz, C.; Rosenbaum, L.; Hertlein, H.; Glaser, C.; Timm, F.; Wiesbeck, W.; Dietmayer, K. Deep Multi-Modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges. IEEE Trans. Intell. Transp. Syst. 2021, 22, 1341–1360.
19. Chen, W.; Miao, Z.; Qu, Y.; Shi, G. HRDLNet: A semantic segmentation network with high resolution representation for urban street view images. Complex Intell. Syst. 2024, 10, 7825–7844.
20. Zhou, W.; Berrio, J.S.; Worrall, S.; Nebot, E. Automated Evaluation of Semantic Segmentation Robustness for Autonomous Driving. IEEE Trans. Intell. Transp. Syst. 2020, 21, 1951–1963.
21. Qian, Y.; Dolan, J.M.; Yang, M. DLT-Net: Joint detection of drivable areas, lane lines, and traffic objects. IEEE Trans. Intell. Transp. Syst. 2019, 21, 4670–4679.
22. Aksoy, E.E.; Baci, S.; Cavdar, S. SalsaNet: Fast road and vehicle segmentation in lidar point clouds for autonomous driving. In Proceedings of the 2020 IEEE Intelligent Vehicles Symposium (IV), Las Vegas, NV, USA, 19 October–13 November 2020; IEEE: Piscataway, NJ, USA; pp. 926–932.
23. Wu, D.; Liao, M.W.; Zhang, W.T.; Wang, X.G.; Bai, X.; Cheng, W.Q.; Liu, W.Y. YOLOP: You only look once for panoptic driving perception. Mach. Intell. Res. 2022, 19, 550–562.
24. Fujiyoshi, H.; Hirakawa, T.; Yamashita, T. Deep learning-based image recognition for autonomous driving. IATSS Res. 2019, 43, 244–252.
25. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090.
26. Yang, J.; Guo, S.; Bocus, M.J.; Chen, Q.; Fan, R. Semantic segmentation for autonomous driving. In Autonomous Driving Perception: Fundamentals and Applications; Springer: Singapore, 2023; pp. 101–137.
27. Yin, W.; Liu, Y.; Shen, C.; Sun, B.; van den Hengel, A. Scaling up multi-domain semantic segmentation with sentence embeddings. Int. J. Comput. Vis. 2024, 132, 4036–4051.
28. Zhu, Y.; Sapra, K.; Reda, F.A.; Shih, K.J.; Newsam, S.; Tao, A.; Catanzaro, B. Improving semantic segmentation via video propagation and label relaxation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 8856–8865.
29. Al-Ajlan, M.; Ykhlef, M. A Review of Generative Adversarial Networks for Intrusion Detection Systems: Advances, Challenges, and Future Directions. Comput. Mater. Contin. 2024, 81, 2053–2076.
30. Celik, F.; Celik, K.; Celik, A. Enhancing brain tumor classification through ensemble attention mechanism. Sci. Rep. 2024, 14, 22260.
31. Zhao, Q.; Liu, J.; Li, Y.; Zhang, H. Semantic segmentation with attention mechanism for remote sensing images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5403913.
32. Yu, F.; Chen, H.; Wang, X.; Xian, W.; Chen, Y.; Liu, F.; Madhavan, V.; Darrell, T. BDD100K: A diverse driving dataset for heterogeneous multitask learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2636–2645.
33. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
34. Krstinić, D.; Braović, M.; Šerić, L.; Božić-Štulić, D. Multi-label classifier performance evaluation with confusion matrix. Comput. Sci. Inf. Technol. 2020, 1, 1–4.
35. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention; Springer: Cham, Switzerland, 2015; pp. 234–241.
36. Cheng, B.; Misra, I.; Schwing, A.G.; Kirillov, A.; Girdhar, R. Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1290–1299.
37. Yuan, Y.; Chen, X.; Wang, J. Object-contextual representations for semantic segmentation. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2020; pp. 173–190.
38. Lo, S.Y.; Hang, H.M.; Chan, S.W.; Lin, J.J. Efficient dense modules of asymmetric convolution for real-time semantic segmentation. In Proceedings of the 1st ACM International Conference on Multimedia in Asia, Beijing, China, 15–18 December 2019; pp. 1–6.
39. Fan, M.; Lai, S.; Huang, J.; Wei, X.; Chai, Z.; Luo, J.; Wei, X. Rethinking BiSeNet for real-time semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 9716–9725.
40. Xu, J.; Xiong, Z.; Bhattacharyya, S.P. PIDNet: A real-time semantic segmentation network inspired by PID controllers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 19529–19539.
41. Hong, Y.; Pan, H.; Sun, W.; Jia, Y. Deep dual-resolution networks for real-time and accurate semantic segmentation of road scenes. arXiv 2021, arXiv:2101.06085.
42. Yan, X.; Gao, J.; Zheng, C.; Zheng, C.; Zhang, R.; Cui, S.; Li, Z. 2DPASS: 2D priors assisted semantic segmentation on lidar point clouds. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2022; pp. 677–695.
43. Lai, X.; Chen, Y.; Lu, F.; Liu, J.; Jia, J. Spherical transformer for lidar-based 3D recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 17545–17555.
44. Wiseman, Y. Real-time monitoring of traffic congestions. In Proceedings of the 2017 IEEE International Conference on Electro Information Technology (EIT), Lincoln, NE, USA, 14–17 May 2017; pp. 501–505.
Backbone Architecture | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC (%) | mIoU (%) |
---|---|---|---|---|---|---|
MobileNetV3 Only | 97.65 | 97.60 | 97.62 | 97.61 | 95.1 | 85.21 |
EfficientNetB7 Only | 98.12 | 98.05 | 98.09 | 98.07 | 96.8 | 87.38 |
Proposed Hybrid Model | 98.94 | 98.91 | 98.93 | 98.91 | 98.4 | 89.46 |
Model Variant | Accuracy (%) | F1-Score (%) | AUC (%) | mIoU (%) |
---|---|---|---|---|
Full AEA-GAN (Proposed) | 98.94 | 98.91 | 98.4 | 89.46 |
Without GAN (only AEA-based UNet) | 97.83 | 97.79 | 96.7 | 85.72 |
Without AEA (no attention mechanisms) | 97.65 | 97.58 | 96.4 | 84.93 |
Without self-attention | 97.61 | 97.55 | 96.2 | 84.66 |
Without spatial attention | 97.62 | 97.53 | 96.1 | 84.52 |
Without channel attention | 97.60 | 97.51 | 96.0 | 84.49 |
With standard attention (non-adaptive) | 97.84 | 97.77 | 96.6 | 85.39 |
Method | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC (%) | mIoU (%) |
---|---|---|---|---|---|---|
Spatial attention | 97.9 | 97.95 | 97.95 | 97.9 | 96 | 85.41 |
Self-attention | 97.9 | 97.95 | 97.95 | 97.9 | 96 | 86.62 |
Proposed | 98.94 | 98.91 | 98.93 | 98.91 | 98.4 | 89.46 |
Method | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC (%) | mIoU (%) |
---|---|---|---|---|---|---|
Majority voting | 97.9 | 97.95 | 97.95 | 97.9 | 96 | 87.42 |
Weighted averaging | 97.9 | 97.95 | 97.95 | 97.9 | 96 | 88.46 |
Proposed hybrid | 98.94 | 98.91 | 98.93 | 98.91 | 98.4 | 89.46 |
Backbone Configuration | mIoU (%) | Params (M) | FLOPs (G) | Inference Time (ms) | FPS |
---|---|---|---|---|---|
MobileNetV3 Only | 85.21 | 17.8 | 36.4 | 6.3 | 38 |
EfficientNetB7 Only | 87.38 | 60.2 | 143.5 | 5.0 | 20 |
Proposed Hybrid (AEA-GAN) | 89.46 | 39.5 | 92.1 | 3.0 | 26 |
Dataset | Study | Model | mIoU (%)
---|---|---|---
BDD100K | Wang et al. (2024) [13] | SegNet4D | 55.2
BDD100K | Chang et al. (2023) [15] | Q-YOLOP | 61.2
BDD100K | Chen et al. (2024) [19] | HRDLNet | 70.4
BDD100K | Qian et al. (2019) [21] | DLT-Net | 71.3
BDD100K | Wu et al. (2022) [23] | YOLOP | 72.6
BDD100K | Xie et al. (2021) [25] | SegFormer | 75.08
BDD100K | Luo et al. (2023) [16] | IDS model | 83.63
BDD100K | This work (baseline) | Standard U-Net | 84.16 ± 0.18
BDD100K | This work (proposed) | AEA-GAN | 89.46 ± 0.1
Cityscapes | Cheng et al. (2021) [36] | MaskFormer | 84.3
Cityscapes | Xie et al. (2021) [25] | SegFormer | 84.0
Cityscapes | Yuan et al. (2020) [37] | HRNet + OCR | 81.1
Cityscapes | Lo et al. (2019) [38] | EDANet | 67.3
Cityscapes | Fan et al. (2021) [39] | STDC-Seg75 | 75.3
Cityscapes | Xu et al. (2023) [40] | PIDNet-S | 78.6
Cityscapes | Hong et al. (2021) [41] | DDRNet-23 | 80.6
Cityscapes | This work (proposed) | AEA-GAN | 89.02 ± 0.2
KITTI | Zhu et al. (2019) [28] | VPLR | 72.8
KITTI | Yan et al. (2022) [42] | 2DPASS | 72.9
KITTI | Lai et al. (2023) [43] | Spherical Transformer | 74.8
KITTI | This work (proposed) | AEA-GAN | 88.13 ± 0.3
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Jama, B.S.A.; Hacibeyoglu, M. A GAN-Based Framework with Dynamic Adaptive Attention for Multi-Class Image Segmentation in Autonomous Driving. Appl. Sci. 2025, 15, 8162. https://doi.org/10.3390/app15158162