ISANet: A Real-Time Semantic Segmentation Network Based on Information Supplementary Aggregation Network
Abstract
1. Introduction
- A Spatial-supplementary Lightweight Bottleneck Unit (SLBU) is developed for feature extraction. Despite its small parameter count, the SLBU acquires rich semantic features, and its compensation branch supplements lost spatial details, improving the integrity of spatial features.
- An Object Boundary Feature Attention Module (OBFAM) is proposed for the effective fusion of feature information and compensation information. By integrating spatial-attention and channel-attention branches, the OBFAM strengthens the correlation between features and thereby boosts segmentation performance.
- A lightweight real-time semantic segmentation network, ISANet, is proposed. It requires only a small number of parameters yet achieves both high segmentation accuracy and fast inference.
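The lightweight claim rests on factorized convolutions. The exact factorization inside the SLBU is not reproduced here, but units of this kind (e.g., MobileNets-style blocks) typically replace a standard convolution with a depthwise-separable one; a minimal sketch of the parameter saving under that assumption:

```python
def conv_params(k, c_in, c_out):
    # weights of a standard k x k convolution (bias omitted)
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # k x k depthwise convolution followed by a 1 x 1 pointwise convolution
    return k * k * c_in + c_in * c_out

standard = conv_params(3, 64, 64)                   # 36,864 weights
separable = depthwise_separable_params(3, 64, 64)   # 576 + 4,096 = 4,672 weights
print(f"standard: {standard}, separable: {separable}, ratio: {standard / separable:.1f}x")
```

For a 3 × 3, 64→64 layer this is roughly an 8× reduction, which is why such units can stay near 1 M parameters overall.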
2. Related Work
2.1. Semantic Information
2.2. Spatial Information
2.3. Attention Mechanism
2.4. Real-Time Semantic Segmentation Structure
3. Proposed Method
3.1. Spatial-Supplementary Lightweight Bottleneck Unit (SLBU)
Algorithm 1: SLBU — Input: input image; Output: feature map.
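The pseudocode steps of Algorithm 1 are not reproduced above. As an illustration only, a toy 1-D analogue of a bottleneck unit with a dilated main branch (wide receptive field for semantics) and a dilation-1 compensation branch (local spatial detail), fused with the identity — the kernels, branch shapes, and fusion rule below are assumptions, not the paper's exact design:

```python
def dilated_conv1d(x, w, dilation=1):
    # 'same'-padded 1-D convolution with a dilated kernel w
    k = len(w)
    pad = (k // 2) * dilation
    xp = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(w[j] * xp[i + j * dilation] for j in range(k)) for i in range(len(x))]

def slbu_like(x, w_main, w_comp):
    main = dilated_conv1d(x, w_main, dilation=2)  # semantic branch, enlarged receptive field
    comp = dilated_conv1d(x, w_comp, dilation=1)  # compensation branch, local spatial detail
    # residual fusion: main + compensation + identity
    return [m + c + xi for m, c, xi in zip(main, comp, x)]
```

With zero-weight branches the unit reduces to the identity mapping, which is the residual-learning property the SLBU inherits from bottleneck designs.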
3.2. Object Boundary Feature Attention Module (OBFAM)
Algorithm 2: OBFAM — Input: input image; Output: feature map.
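Algorithm 2's steps are likewise not reproduced above. The OBFAM combines a channel-attention branch with a spatial-attention branch when fusing two feature maps; the toy sketch below only illustrates that general pattern (global-average channel gates, channel-mean spatial gates, additive fusion), not the module's actual layers:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def channel_attention(feat):
    # feat: C x H x W nested lists; one gate per channel from its global average
    return [sigmoid(sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))) for ch in feat]

def spatial_attention(feat):
    # one gate per pixel from the channel-wise mean
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    return [[sigmoid(sum(feat[c][i][j] for c in range(C)) / C) for j in range(W)]
            for i in range(H)]

def obfam_like(semantic, detail):
    # gate semantic features channel-wise and detail features spatially, then add
    cg = channel_attention(semantic)
    sg = spatial_attention(detail)
    C, H, W = len(semantic), len(semantic[0]), len(semantic[0][0])
    return [[[cg[c] * semantic[c][i][j] + sg[i][j] * detail[c][i][j]
              for j in range(W)] for i in range(H)] for c in range(C)]
```

The sigmoid keeps every gate in (0, 1), so attention re-weights features rather than discarding them, which is what "strengthening the correlation between features" amounts to in such modules.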
3.3. Information Supplementary Aggregation Network (ISANet)
4. Experiments
4.1. Experimental Settings
4.1.1. Cityscapes
4.1.2. CamVid
4.1.3. Settings
4.2. Ablation Studies
4.2.1. Ablation Study for MFCM
4.2.2. Ablation Study of SDAM and MSIRB
4.2.3. Ablation Study of OBFAM
4.2.4. Ablation Study of SLBU
4.3. Comparison with Other Lightweight SOTA Methods
4.3.1. Results on Cityscapes
4.3.2. Results on CamVid
5. Conclusions and Discussion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
2. Zhao, H.; Qi, X.; Shen, X.; Shi, J.; Jia, J. ICNet for real-time semantic segmentation on high-resolution images. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 405–420.
3. Yu, C.; Wang, J.; Peng, C.; Gao, C.; Yu, G.; Sang, N. BiSeNet: Bilateral segmentation network for real-time semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 325–341.
4. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
5. Yu, C.; Gao, C.; Wang, J.; Yu, G.; Shen, C.; Sang, N. BiSeNet v2: Bilateral network with guided aggregation for real-time semantic segmentation. Int. J. Comput. Vis. 2021, 129, 3051–3068.
6. Lopez-Montiel, M.; Lopez, D.A.; Montiel, O. JetSeg: Efficient real-time semantic segmentation model for low-power GPU-embedded systems. arXiv 2023, arXiv:2305.11419.
7. Atif, N.; Mazhar, S.; Sarma, D.; Bhuyan, M.K.; Ahamed, S.R. Efficient context integration through factorized pyramidal learning for ultra-lightweight semantic segmentation. arXiv 2023, arXiv:2302.11785.
8. Zhang, Z.; Xu, Z.; Gu, X.; Xiong, J. Cross-CBAM: A lightweight network for scene segmentation. arXiv 2023, arXiv:2306.02306.
9. Xu, J.; Xiong, Z.; Bhattacharyya, S.P. PIDNet: A real-time semantic segmentation network inspired by PID controllers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 19529–19539.
10. Zhao, J.; Zou, F.; Li, R.; Li, Y.; Li, K. Efficient resolution-preserving network for real-time semantic segmentation. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; IEEE: New York, NY, USA, 2021; pp. 1–8.
11. Bhattacharyya, D.; Thirupathi Rao, N.; Joshua, E.S.N.; Hu, Y.C. A bi-directional deep learning architecture for lung nodule semantic segmentation. Vis. Comput. 2023, 39, 5245–5261.
12. Yan, Q.; Li, S.; Liu, C.; Liu, M.; Chen, Q. RoboSeg: Real-time semantic segmentation on computationally constrained robots. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 1567–1577.
13. Wu, T.; Tang, S.; Zhang, R.; Cao, J.; Zhang, Y. CGNet: A light-weight context guided network for semantic segmentation. IEEE Trans. Image Process. 2020, 30, 1169–1179.
14. Li, G.; Jiang, S.; Yun, I.; Kim, J.; Kim, J. Depth-wise asymmetric bottleneck with point-wise aggregation decoder for real-time semantic segmentation in urban scenes. IEEE Access 2020, 8, 27495–27506.
15. Yang, Q.; Chen, T.; Fan, J.; Lu, Y.; Zuo, C.; Chi, Q. EADNet: Efficient asymmetric dilated network for semantic segmentation. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; IEEE: New York, NY, USA, 2021; pp. 2315–2319.
16. Singha, T.; Pham, D.S.; Krishna, A.; Gedeon, T. A lightweight multi-scale feature fusion network for real-time semantic segmentation. In Neural Information Processing, Proceedings of the International Conference on Neural Information Processing, Bali, Indonesia, 8–12 December 2021; Springer International Publishing: Cham, Switzerland, 2021; pp. 193–205.
17. Hu, X.; Jing, L.; Sehar, U. Joint pyramid attention network for real-time semantic segmentation of urban scenes. Appl. Intell. 2022, 52, 580–594.
18. Hao, S.; Zhou, Y.; Guo, Y.; Hong, R.; Cheng, J.; Wang, M. Real-time semantic segmentation via spatial-detail guided context propagation. IEEE Trans. Neural Netw. Learn. Syst. 2022, 36, 4042–4053.
19. Liu, J.; Xu, X.; Shi, Y.; Deng, C.; Shi, M. RELAXNet: Residual efficient learning and attention expected fusion network for real-time semantic segmentation. Neurocomputing 2022, 474, 115–127.
20. Chen, Y.; Xia, R.; Yang, K.; Zou, K. MFFN: Image super-resolution via multi-level features fusion network. Vis. Comput. 2024, 40, 489–504.
21. Wang, Y.; Zhou, Q.; Liu, J.; Xiong, J.; Gao, G.; Wu, X.; Latecki, L.J. LEDNet: A lightweight encoder-decoder network for real-time semantic segmentation. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; IEEE: New York, NY, USA, 2019; pp. 1860–1864.
22. Zhang, K.; Liao, Q.; Zhang, J.; Liu, S.; Ma, H.; Xue, J.H. EFRNet: A lightweight network with efficient feature fusion and refinement for real-time semantic segmentation. In Proceedings of the 2021 IEEE International Conference on Multimedia and Expo (ICME), Shenzhen, China, 5–9 July 2021; IEEE: New York, NY, USA, 2021.
23. Elhassan, M.A.; Huang, C.; Yang, C.; Munea, T.L. DSANet: Dilated spatial attention for real-time semantic segmentation in urban street scenes. Expert Syst. Appl. 2021, 183, 115090.
24. Weng, X.; Yan, Y.; Chen, S.; Xue, J.H.; Wang, H. Stage-aware feature alignment network for real-time semantic segmentation of street scenes. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 4444–4459.
25. Tang, X.; Tu, W.; Li, K.; Cheng, J. DFFNet: An IoT-perceptive dual feature fusion network for general real-time semantic segmentation. Inf. Sci. 2021, 565, 326–343.
26. Ma, S.; Zhao, Z.; Hou, Z.; Yu, W.; Yang, X.; Zhao, X. SEDNet: Real-time semantic segmentation algorithm based on STDC. Int. J. Intell. Syst. 2025, 2025, 8243407.
27. Zhao, S.; Wang, Y.; Huo, Z.; Zhang, F. Lightweight and real-time semantic segmentation network via multi-scale dilated convolutions. Vis. Comput. 2025, 41, 11833–11855.
28. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090.
29. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
30. Beltagy, I.; Peters, M.E.; Cohan, A. Longformer: The long-document transformer. arXiv 2020, arXiv:2004.05150.
31. Kitaev, N.; Kaiser, Ł.; Levskaya, A. Reformer: The efficient transformer. arXiv 2020, arXiv:2001.04451.
32. Peng, C.; Tian, T.; Chen, C.; Guo, X.; Ma, J. Bilateral attention decoder: A lightweight decoder for real-time semantic segmentation. Neural Netw. 2021, 137, 188–199.
33. Li, G.; Li, L.; Zhang, J. BiAttnNet: Bilateral attention for improving real-time semantic segmentation. IEEE Signal Process. Lett. 2021, 29, 46–50.
34. Hao, X.; Hao, X.; Zhang, Y.; Li, Y.; Wu, C. Real-time semantic segmentation with weighted factorized-depthwise convolution. Image Vis. Comput. 2021, 114, 104269.
35. Sagar, A. AaSeg: Attention aware network for real time semantic segmentation. arXiv 2021, arXiv:2108.04349.
36. Xu, G.; Li, J.; Gao, G.; Lu, H.; Yang, J.; Yue, D. Lightweight real-time semantic segmentation network with efficient transformer and CNN. IEEE Trans. Intell. Transp. Syst. 2023, 24, 15897–15906.
37. Wan, Q.; Huang, Z.; Lu, J.; Yu, G.; Zhang, L. SeaFormer: Squeeze-enhanced axial transformer for mobile semantic segmentation. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
38. Chen, Y.; Xia, R.; Yang, K.; Zou, K. DARGS: Image inpainting algorithm via deep attention residuals group and semantics. J. King Saud Univ. Comput. Inf. Sci. 2023, 35, 101567.
39. Choromanski, K.; Likhosherstov, V.; Dohan, D.; Song, X.; Gane, A.; Sarlos, T.; Hawkins, P.; Davis, J.; Mohiuddin, A.; Kaiser, L.; et al. Rethinking attention with performers. arXiv 2020, arXiv:2009.14794.
40. Kehinde, T.O.; Adedokun, O.J.; Joseph, A.; Kabirat, K.M.; Akano, H.A.; Olanrewaju, O.A. Helformer: An attention-based deep learning model for cryptocurrency price forecasting. J. Big Data 2025, 12, 81.
41. Pan, H.; Hong, Y.; Sun, W.; Jia, Y. Deep dual-resolution networks for real-time and accurate semantic segmentation of traffic scenes. IEEE Trans. Intell. Transp. Syst. 2023, 24, 3448–3460.
42. Xue, S.; Gao, L.; Wan, L.; Feng, W. Multi-scale context-aware network for continuous sign language recognition. Virtual Real. Intell. Hardw. 2024, 6, 323–337.
43. Gao, G.; Xu, G.; Li, J.; Yu, Y.; Lu, H.; Yang, J. FBSNet: A fast bilateral symmetrical network for real-time semantic segmentation. IEEE Trans. Multimed. 2023, 25, 3273–3283.
44. Zhang, X.; Du, B.; Wu, Z.; Wan, T. LAANet: Lightweight attention-guided asymmetric network for real-time semantic segmentation. Neural Comput. Appl. 2022, 34, 3573–3587.
45. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
46. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807.
47. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
48. Yu, F.; Koltun, V.; Funkhouser, T. Dilated residual networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 472–480.
49. Mehta, S.; Rastegari, M.; Caspi, A.; Shapiro, L.; Hajishirzi, H. ESPNet: Efficient spatial pyramid of dilated convolutions for semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 561–580.
50. Roy, A.G.; Navab, N.; Wachinger, C. Concurrent spatial and channel ‘squeeze & excitation’ in fully convolutional networks. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Granada, Spain, 16–20 September 2018; Springer International Publishing: Cham, Switzerland, 2018; pp. 421–429.
51. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
52. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
53. Zhou, L.; Gong, C.; Liu, Z.; Fu, K. SAL: Selection and attention losses for weakly supervised semantic segmentation. IEEE Trans. Multimed. 2020, 23, 1035–1048.
54. Paszke, A.; Chaurasia, A.; Kim, S.; Culurciello, E. ENet: A deep neural network architecture for real-time semantic segmentation. arXiv 2016, arXiv:1606.02147.
55. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; IEEE: New York, NY, USA, 2016; pp. 3213–3223.
56. Brostow, G.J.; Shotton, J.; Fauqueur, J.; Cipolla, R. Segmentation and recognition using structure from motion point clouds. In Proceedings of the European Conference on Computer Vision, Marseille, France, 12–18 October 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 44–57.
57. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
58. Liu, J.; Zhou, Q.; Qiang, Y.; Kang, B.; Wu, X.; Zheng, B. FDDWNet: A lightweight convolutional neural network for real-time semantic segmentation. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; IEEE: New York, NY, USA, 2020; pp. 2373–2377.
59. Gao, G.; Xu, G.; Yu, Y.; Xie, J.; Yang, J.; Yue, D. MSCFNet: A lightweight network with multi-scale context fusion for real-time semantic segmentation. IEEE Trans. Intell. Transp. Syst. 2021, 23, 25489–25499.
60. Dong, G.; Yan, Y.; Shen, C.; Wang, H. Real-time high-performance semantic image segmentation of urban street scenes. IEEE Trans. Intell. Transp. Syst. 2021, 22, 3258–3274.
61. Yang, Z.; Yu, H.; Fu, Q.; Sun, W.; Jia, W.; Sun, M.; Mao, Z.H. NDNet: Narrow while deep network for real-time semantic segmentation. IEEE Trans. Intell. Transp. Syst. 2020, 22, 5508–5519.
62. Fan, J.; Wang, F.; Chu, H.; Hu, X.; Cheng, Y.; Gao, B. MLFNet: Multi-level fusion network for real-time semantic segmentation of autonomous driving. IEEE Trans. Intell. Veh. 2023, 8, 756–767.
63. Hu, X.; Ke, Y. EMFANet: A lightweight network with efficient multi-scale feature aggregation for real-time semantic segmentation. J. Real-Time Image Process. 2024, 21, 40.
64. He, J.Y.; Liang, S.H.; Wu, X.; Zhao, B.; Zhang, L. MGSeg: Multiple granularity-based real-time semantic segmentation network. IEEE Trans. Image Process. 2021, 30, 7200–7214.
65. Jin, Z.; Dou, F.; Feng, Z.; Zhang, C. BSNet: A bilateral real-time semantic segmentation network based on multi-scale receptive fields. J. Vis. Commun. Image Represent. 2024, 102, 104188.
66. Wu, L.; Qiu, S.; Chen, Z. Real-time semantic segmentation network based on parallel atrous convolution for short-term dense concatenate and attention feature fusion. J. Real-Time Image Process. 2024, 21, 74.
67. Cao, K.; Huang, X.; Shao, J. Aggregation architecture and all-to-one network for real-time semantic segmentation. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; IEEE: New York, NY, USA, 2021; pp. 2330–2334.
Layer | Operator | Mode | Output Size |
---|---|---|---|
1 | 3 × 3 Conv | Stride 2 | 32 × 256 × 512 |
2 | 3 × 3 Conv | Stride 1 | 32 × 256 × 512 |
3 | 3 × 3 Conv | Stride 1 | 32 × 256 × 512 |
- | Concatenate | - | 35 × 256 × 512 |
4 | Downsample Block | - | 64 × 128 × 256 |
5–6 | 2 × SLBU | Dilated 2 | 64 × 128 × 256 |
7–8 | 2 × SLBU | Dilated 2 | 64 × 128 × 256 |
- | Concatenate | - | 67 × 128 × 256 |
9 | Downsample Block | - | 128 × 64 × 128 |
10–11 | 2 × SLBU | Dilated 3 | 128 × 64 × 128 |
12–13 | 2 × SLBU | Dilated 7 | 128 × 64 × 128 |
14–15 | 2 × SLBU | Dilated 13 | 128 × 64 × 128 |
- | Concatenate | - | 131 × 64 × 128 |
16 | 1 × 1 Conv | Stride 1 | 128 × 64 × 128 |
17–19 | 3 × SLBU | Dilated 2 | 128 × 64 × 128 |
20 | 1 × 1 Conv | Stride 1 | 131 × 64 × 128 |
21 | OBFAM | - | 131 × 64 × 128 |
22 | 1 × 1 Conv | Stride 1 | 131 × 64 × 128 |
23 | UpSample Block | - | 131 × 128 × 256 |
24 | OBFAM | - | 131 × 128 × 256 |
25 | 1 × 1 Conv | Stride 1 | 131 × 128 × 256 |
26 | UpSample Block | - | 131 × 256 × 512 |
27 | 3 × 3 Conv | Stride 2 | C × 256 × 512 |
28 | UpSample Block | - | C × 512 × 1024 |
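The output sizes in the table follow simple halving and doubling of the spatial resolution, and the channel jumps to 35, 67, and 131 are consistent with concatenating a 3-channel (downsampled) copy of the input image at each stage — the "+3" reading is an inference from the table, not stated here. A quick bookkeeping check:

```python
def halve(hw):
    # spatial effect of a stride-2 conv or a downsample block
    h, w = hw
    return h // 2, w // 2

size = (512, 1024)    # Cityscapes input resolution fed to the network
size = halve(size)    # layer 1, 3 x 3 conv stride 2 -> 256 x 512
stage1_ch = 32 + 3    # 32 feature channels + assumed 3-channel input copy = 35
size = halve(size)    # downsample block -> 128 x 256
stage2_ch = 64 + 3    # -> 67 channels
size = halve(size)    # downsample block -> 64 x 128
stage3_ch = 128 + 3   # -> 131 channels
print(size, stage1_ch, stage2_ch, stage3_ch)
```

The decoder then mirrors this with two UpSample blocks back to 256 × 512 and a final upsample to the full 512 × 1024 prediction with C class channels.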
M | N | Dilation Rates | mIoU (%) | Speed (fps) | Params (M) |
---|---|---|---|---|---|
4 | 4 | {2,2,2,2},{3,3,7,7} | 72.5 | 60 | 1.14 |
4 | 6 | {2,2,2,2},{3,3,7,7,13,13} | 76.7 | 58 | 1.37 |
4 | 6 | {2,2,2,2},{4,4,8,8,16,16} | 75.8 | 52 | 1.37 |
4 | 6 | {2,2,2,2},{7,7,13,13,23,23} | 76.1 | 55 | 1.37 |
4 | 8 | {2,2,2,2},{3,3,7,7,13,13,23,23} | 74.4 | 49 | 1.53 |
4 | 8 | {2,2,2,2},{4,4,8,8,16,16,32,32} | 74.1 | 46 | 1.53 |
6 | 4 | {2,2,2,2,2,2},{3,3,7,7} | 73.7 | 53 | 1.14 |
6 | 6 | {2,2,2,2,2,2},{3,3,7,7,13,13} | 74.7 | 47 | 1.42 |
6 | 6 | {2,2,2,2,2,2},{4,4,8,8,16,16} | 73.4 | 43 | 1.42 |
6 | 8 | {2,2,2,2,2,2},{3,3,7,7,13,13,23,23} | 75.4 | 40 | 1.51 |
6 | 8 | {2,2,2,2,2,2},{4,4,8,8,16,16,32,32} | 75.1 | 41 | 1.51 |
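One way to read the dilation-rate ablation is through the theoretical receptive field of a stack of stride-1 3 × 3 dilated convolutions, which grows by 2·d per layer. This is the standard formula, not a computation taken from the paper:

```python
def receptive_field(dilations, k=3):
    # receptive field of stacked stride-1 convolutions with kernel size k
    rf = 1
    for d in dilations:
        rf += (k - 1) * d
    return rf

print(receptive_field([2, 2, 2, 2]))          # 17: the M = 4 early stage
print(receptive_field([3, 3, 7, 7, 13, 13]))  # 93: the best-performing N = 6 stage
```

The best configuration (M = 4, N = 6 with rates {3,3,7,7,13,13}) thus covers most of the 64 × 128 deep feature map, while the larger rate sets add little receptive field relative to the map size, which may explain their lower mIoU.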
Module | mIoU (%) | Speed (fps) | Params (M) |
---|---|---|---|
Base | 70.6 | 72 | 1.21 |
Base + SFAB | 73.3 | 67 | 1.26 |
Base + MSIRB | 73.7 | 66 | 1.26 |
Base + SFAB + MSIRB | 76.7 | 58 | 1.37 |
Attention Mechanism | mIoU (%) | Speed (fps) | Params (M) |
---|---|---|---|
SE [51] | 74.9 | 61 | 1.35 |
CBAM [52] | 75.7 | 21 | 1.35 |
OBFAM | 76.7 | 58 | 1.37 |
Attention Mechanism | mIoU (%) | Speed (fps) | Params (M) |
---|---|---|---|
SE [51] | 72.5 | 92 | 1.35 |
CBAM [52] | 73.1 | 33 | 1.35 |
OBFAM | 73.8 | 90 | 1.37 |
Bottleneck Unit | mIoU (%) | Speed (fps) | Params (M) |
---|---|---|---|
Bottleneck | 71.0 | 80 | 0.19 |
SS-nbt | 73.5 | 54 | 1.21 |
EABR | 74.1 | 60 | 1.04 |
FDSS-bt | 74.7 | 56 | 1.04 |
SLBU-L | 75.1 | 63 | 1.37 |
SLBU | 76.7 | 58 | 1.37 |
Method | Roa | Sid | Bui | Wal | Fen | Pol | Tli | Tsi | Veg | Ter | Sky | Per | Rid | Car | Tru | Bus | Tra | Mot | Bic | Avg |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
EFRNet [22] | 97.9 | 82.1 | 90.7 | 45.2 | 50.4 | 59.0 | 62.6 | 68.4 | 91.9 | 69.4 | 94.2 | 78.5 | 59.8 | 93.4 | 52.3 | 60.8 | 53.7 | 49.9 | 64.2 | 69.7 |
DABNet [14] | 97.9 | 82.0 | 90.6 | 45.5 | 50.1 | 59.3 | 63.5 | 67.7 | 91.8 | 70.1 | 92.8 | 78.1 | 57.8 | 93.7 | 52.8 | 63.7 | 56.0 | 51.3 | 66.8 | 70.1 |
LEDNet [21] | 98.1 | 79.5 | 91.6 | 47.7 | 49.9 | 62.8 | 61.3 | 72.8 | 92.6 | 61.2 | 94.9 | 76.2 | 53.7 | 90.9 | 64.4 | 64.0 | 52.7 | 44.4 | 71.6 | 70.6 |
FBSNet [43] | 98.0 | 83.2 | 91.5 | 50.9 | 53.5 | 62.5 | 67.6 | 71.5 | 92.7 | 70.5 | 94.4 | 82.5 | 63.8 | 93.9 | 50.5 | 56.0 | 37.6 | 56.2 | 70.1 | 70.9 |
FDDWNet [58] | 98.0 | 82.4 | 91.1 | 52.5 | 51.2 | 59.9 | 64.4 | 68.9 | 92.5 | 70.3 | 94.4 | 80.8 | 59.8 | 94.0 | 56.5 | 68.9 | 48.6 | 55.7 | 67.7 | 71.5 |
MSCFNet [59] | 97.7 | 82.8 | 91.0 | 49.0 | 52.5 | 61.2 | 67.1 | 71.4 | 92.3 | 70.2 | 94.3 | 82.7 | 62.7 | 94.1 | 50.9 | 66.1 | 51.9 | 57.6 | 70.2 | 71.9 |
LETNet [36] | 98.2 | 83.6 | 91.6 | 50.9 | 53.7 | 61.0 | 66.7 | 70.5 | 92.5 | 70.5 | 94.9 | 82.3 | 61.7 | 94.4 | 55.0 | 72.4 | 57.0 | 57.6 | 69.3 | 72.8 |
LBN-AA [60] | 98.2 | 84.0 | 91.6 | 50.7 | 49.5 | 60.9 | 69.0 | 73.6 | 92.6 | 70.3 | 94.4 | 83.0 | 65.7 | 94.9 | 62.0 | 70.9 | 53.3 | 62.5 | 71.8 | 73.6 |
MSDSeg [25] | 98.4 | 84.4 | 91.7 | 50.6 | 52.4 | 58.5 | 65.3 | 69.8 | 92.5 | 70.7 | 95.0 | 94.3 | 59.7 | 94.6 | 60.3 | 77.2 | 68.3 | 54.9 | 70.3 | 74.0 |
SEDNet [26] | 98.5 | 85.3 | 92.1 | 54.6 | 53.3 | 59.4 | 69.0 | 73.7 | 92.6 | 70.6 | 94.9 | 83.1 | 67.4 | 95.3 | 70.3 | 81.3 | 76.3 | 62.5 | 71.8 | 76.4 |
ISANet | 98.7 | 85.4 | 92.4 | 54.5 | 53.5 | 62.9 | 70.3 | 74.4 | 93.4 | 70.4 | 95.5 | 83.6 | 68.5 | 96.4 | 70.1 | 77.4 | 75.6 | 62.1 | 72.5 | 76.7 |
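The per-class scores above are class-wise IoU values on the 19 Cityscapes classes, and the Avg column is their mean (mIoU). For reference, mIoU can be computed from a confusion matrix as follows — a generic sketch, not the authors' evaluation code:

```python
def miou(conf):
    # conf[i][j]: number of pixels of true class i predicted as class j
    n = len(conf)
    ious = []
    for i in range(n):
        tp = conf[i][i]
        fp = sum(conf[r][i] for r in range(n)) - tp  # predicted i, true something else
        fn = sum(conf[i]) - tp                       # true i, predicted something else
        denom = tp + fp + fn
        ious.append(tp / denom if denom else 0.0)
    return sum(ious) / n, ious

# toy 2-class example: IoUs are 3/4 and 4/5, so mIoU = 0.775
mean, per_class = miou([[3, 1], [0, 4]])
print(mean, per_class)
```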
Methods | Backbone | Input Size | FPS | Params (M) | FLOPs (G) | mIoU (%) |
---|---|---|---|---|---|---|
CGNet [13] | No | 360 × 640 | 74 | 0.50 | 6.0 | 64.8 |
NDNet [61] | No | 1024 × 2048 | 40 | 0.50 | 14.0 | 65.3 |
EFRNet [22] | No | 512 × 1024 | 44 | 2.10 | 21.0 | 69.7 |
DABNet [14] | No | 512 × 1024 | 94 | 0.76 | 10.5 | 70.1 |
SGCPNet [18] | No | 1024 × 2048 | 103 | 0.61 | 4.5 | 70.9 |
FBSNet [43] | No | 512 × 1024 | 90 | 0.62 | 9.7 | 70.9 |
FDDWNet [58] | No | 512 × 1024 | 60 | 0.80 | 85.0 | 71.5 |
MSCFNet [59] | No | 512 × 1024 | 50 | 1.15 | 17.1 | 71.9 |
MLFNet [62] | ResNet34 | 512 × 1024 | 72 | 3.99 | 15.5 | 72.1 |
EMFANet [63] | No | 512 × 1024 | 41 | 1.03 | 40.7 | 72.1 |
BiSeNet V2 [5] | No | 512 × 1024 | 156 | 3.40 | 21.2 | 72.6 |
MGSeg [64] | ShuffleNet | 1024 × 2048 | 101 | 4.50 | 16.2 | 72.7 |
LETNet [36] | No | 512 × 1024 | 150 | 0.95 | 13.6 | 72.8 |
BSNet [65] | No | 512 × 1024 | 63 | 9.57 | 39.2 | 72.8 |
LBN-AA [60] | No | 448 × 896 | 51 | 6.20 | 49.5 | 73.6 |
PACAMNet [66] | No | 512 × 1024 | 62 | 11.10 | 19.8 | 73.9 |
MSDSeg [25] | No | 512 × 1024 | 51 | 2.3 | 6.5 | 74.0 |
ATONet [67] | No | 1024 × 2048 | 42 | 13.30 | 134.5 | 74.4 |
Seaformer [37] | No | 1024 × 2048 | 2 | 4.00 | 8.0 | 75.9 |
SegFormer [28] | Mit-B0 | 1024 × 2048 | 1 | 84.70 | 125.5 | 76.2 |
SEDNet [26] | STDC1 | 768 × 1536 | 105 | 6.63 | 65.9 | 76.4 |
ISANet | No | 512 × 1024 | 8 | 1.37 | 12.6 | 76.7 |
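The FLOPs column reflects the multiply–accumulate cost of the networks' convolutions. For a single k × k convolution producing C_out channels at output size H × W from C_in input channels, the usual estimate is k²·C_in·C_out·H·W MACs (a generic formula; the exact counting convention used for the table is not specified here):

```python
def conv_macs(k, c_in, c_out, h, w):
    # multiply-accumulates of a k x k convolution; h, w are the OUTPUT spatial size
    return k * k * c_in * c_out * h * w

# e.g. the network's first layer on a 512 x 1024 input (stride 2 -> 256 x 512 output)
print(conv_macs(3, 3, 32, 256, 512))  # 113,246,208 MACs, roughly 0.11 GMACs
```

Summing such terms over all layers yields totals on the order of the 12.6 G reported for ISANet at 512 × 1024 input.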
Methods | Backbone | Input Size | FPS | Params (M) | FLOPs (G) | mIoU (%) |
---|---|---|---|---|---|---|
NDNet [61] | No | 360 × 480 | 52 | 0.50 | 1.2 | 57.2 |
EFRNet [22] | No | 360 × 480 | 133 | 2.10 | 6.9 | 65.0 |
CGNet [13] | No | 360 × 480 | 59 | 0.50 | 4.5 | 65.6 |
DABNet [14] | No | 360 × 480 | 81 | 0.76 | 3.5 | 66.4 |
FDDWNet [58] | No | 360 × 480 | 72 | 0.80 | 28.1 | 66.9 |
EMFANet [63] | No | 360 × 480 | 32 | 1.03 | 13.3 | 67.5 |
LBN-AA [60] | No | 720 × 960 | 39 | 6.20 | 85.2 | 68.0 |
BSNet [65] | No | 720 × 960 | 131 | 9.57 | 29.7 | 68.4 |
BiSeNet V2 [5] | No | 720 × 960 | 116 | 3.40 | 27.9 | 68.7 |
FBSNet [43] | No | 360 × 480 | 120 | 0.62 | 3.2 | 68.9 |
SGCPNet [18] | No | 720 × 960 | 278 | 0.61 | 1.5 | 69.0 |
MSCFNet [59] | No | 360 × 480 | 66 | 1.15 | 5.6 | 69.3 |
MLFNet [62] | ResNet34 | 720 × 960 | 57 | 3.99 | 20.4 | 69.4 |
PACAMNet [66] | No | 720 × 960 | 81 | 11.1 | 15.0 | 69.8 |
ATONet [67] | No | 720 × 960 | 93 | 13.30 | 44.3 | 70.1 |
LETNet [36] | No | 360 × 480 | 200 | 0.95 | 4.5 | 70.5 |
MGSeg [64] | ShuffleNet | 360 × 480 | 127 | 4.50 | 1.3 | 71.1 |
Seaformer [37] | No | 720 × 960 | 3 | 4.00 | 2.6 | 71.6 |
SegFormer [28] | Mit-B0 | 720 × 960 | 2 | 84.70 | 41.4 | 72.7 |
SEDNet [26] | STDC1 | 720 × 960 | 165.6 | 6.63 | 21.9 | 72.8 |
MSDSeg [25] | No | 720 × 960 | 44 | 2.3 | 2.2 | 73.4 |
ISANet | No | 360 × 480 | 90 | 1.37 | 4.2 | 73.8 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, F.; Li, H.; He, D.; Zhang, X. ISANet: A Real-Time Semantic Segmentation Network Based on Information Supplementary Aggregation Network. Electronics 2025, 14, 3998. https://doi.org/10.3390/electronics14203998