Efficient Semantic Segmentation Using Multi-Path Decoder
Abstract
1. Introduction
2. Related Works
3. Method
Algorithm 1 Main Decoder in the Multi-Path Decoder Network.
Input: The image to be segmented, …;
Output: The segmentation result, S, the main loss, …, and the loss for each decoder, ….
1: …, where …;
2: for each … do
3:     …;
4:     for each … do
5:         …;
6:     end for
7:     …, where …
8:     …
9:     … NLLLoss …
10: end for
11: …, where …
12: … NLLLoss …
13: …
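
The loss structure of Algorithm 1 (one NLL loss per decoder path plus one main NLL loss on the combined output) can be illustrated with a minimal pure-Python sketch. The symbols of the algorithm did not survive extraction, so everything below is an assumption made for illustration: the function names, the data layout (`path_scores[p][i][c]`), and in particular the fusion rule, which here simply sums the raw class scores of all paths.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over one pixel's class scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def nll_loss(log_probs, targets):
    """Mean negative log-likelihood over pixels."""
    return -sum(lp[t] for lp, t in zip(log_probs, targets)) / len(targets)


def multi_path_losses(path_scores, targets):
    """path_scores[p][i][c]: score of class c at pixel i from decoder path p."""
    # One auxiliary loss per decoder path.
    path_losses = [
        nll_loss([log_softmax(px) for px in scores], targets)
        for scores in path_scores
    ]
    # ASSUMPTION: paths are fused by summing their raw scores.
    n_classes = len(path_scores[0][0])
    fused = [
        [sum(scores[i][c] for scores in path_scores) for c in range(n_classes)]
        for i in range(len(targets))
    ]
    main_loss = nll_loss([log_softmax(px) for px in fused], targets)
    # Segmentation result: arg-max class of the fused scores at each pixel.
    prediction = [max(range(n_classes), key=px.__getitem__) for px in fused]
    return prediction, main_loss, path_losses


# Toy input: 2 paths, 2 pixels, 2 classes.
toy_scores = [
    [[2.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 3.0]],
]
toy_targets = [0, 1]
prediction, main_loss, path_losses = multi_path_losses(toy_scores, toy_targets)
```

How the per-path losses and the main loss are weighted in the total training objective is not recoverable from the text, so the sketch returns them separately.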
Algorithm 2 Edge Decoder in the Multi-Path Decoder Network.
Input: Feature map output by the encoder, …, where …;
Output: The edge loss, ….
1: …;
2: for each … do
3:     …;
4: end for
5: …, where …
6: …
7: … NLLLoss …
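
Algorithm 2 ends with an NLL loss on an edge prediction. The sketch below shows one way such a loss could be computed. It rests on two assumptions that are not in the surviving text: that binary edge ground truth is derived from the segmentation label map by marking pixels whose 4-neighbours carry a different class, and that the edge decoder outputs a per-pixel edge probability. All names are hypothetical.

```python
import math


def edge_labels(label_map):
    """Mark a pixel as edge (1) if any 4-neighbour has a different class."""
    h, w = len(label_map), len(label_map[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < h and 0 <= nx < w
                if inside and label_map[ny][nx] != label_map[y][x]:
                    edges[y][x] = 1
                    break
    return edges


def edge_nll_loss(edge_prob, edges, eps=1e-7):
    """Mean NLL of binary edge labels given predicted edge probabilities."""
    total, count = 0.0, 0
    for prob_row, label_row in zip(edge_prob, edges):
        for p, t in zip(prob_row, label_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
            total -= math.log(p if t == 1 else 1.0 - p)
            count += 1
    return total / count


# Toy 3 x 3 label map with a vertical class boundary.
toy_labels = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
toy_edges = edge_labels(toy_labels)
uninformed = [[0.5] * 3 for _ in range(3)]
uninformed_loss = edge_nll_loss(uninformed, toy_edges)
```

An uninformed prediction of 0.5 everywhere yields a loss of ln 2, which gives a simple reference point when monitoring an edge loss of this kind.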
4. Results
4.1. Cityscapes
4.2. ADE20K
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
1. Teichmann, M.; Weber, M.; Zoellner, M.; Cipolla, R.; Urtasun, R. MultiNet: Real-time joint semantic reasoning for autonomous driving. arXiv 2016, arXiv:1612.07695.
2. Hong, Z.W.; Yu-Ming, C.; Su, S.Y.; Shann, T.Y.; Chang, Y.H.; Yang, H.K.; Ho, H.L.; Tu, C.C.; Chang, Y.C.; Hsiao, T.C. Virtual-to-real: Learning to control in visual semantic segmentation. arXiv 2018, arXiv:1802.00285.
3. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
4. Romera, E.; Alvarez, J.M.; Bergasa, L.M.; Arroyo, R. ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation. IEEE Trans. Intell. Transp. Syst. 2018, 19, 263–272.
5. Zhao, H.; Qi, X.; Shen, X.; Shi, J.; Jia, J. ICNet for real-time semantic segmentation on high-resolution images. In Computer Vision—ECCV; Springer: Cham, Switzerland, 2018; pp. 418–434.
6. Mehta, S.; Rastegari, M.; Caspi, A.; Shapiro, L.G.; Hajishirzi, H. ESPNet: Efficient spatial pyramid of dilated convolutions for semantic segmentation. In Computer Vision—ECCV; Springer: Cham, Switzerland, 2018; pp. 561–580.
7. Siam, M.; Gamal, M.; Abdel-Razek, M.; Yogamani, S.; Jagersand, M.; Zhang, H. A comparative study of real-time semantic segmentation for autonomous driving. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 700–70010.
8. Orsic, M.; Kreso, I.; Bevandic, P.; Segvic, S. In defense of pre-trained ImageNet architectures for real-time semantic segmentation of road-driving images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019.
9. Kreso, I.; Krapac, J.; Segvic, S. Efficient ladder-style DenseNets for semantic segmentation of large images. arXiv 2019, arXiv:1905.05661.
10. Takikawa, T.; Acuna, D.; Jampani, V.; Fidler, S. Gated-scnn: Gated shape CNNs for semantic segmentation. arXiv 2019, arXiv:1907.05740.
11. Ding, H.; Jiang, X.; Liu, A.Q.; Thalmann, N.M.; Wang, G. Boundary-aware feature propagation for scene segmentation. arXiv 2019, arXiv:1909.00179.
12. Yu, Z.; Feng, C.; Liu, M.Y.; Ramalingam, S. CASENet: Deep category-aware semantic edge detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 June 2017.
13. Xie, S.; Tu, Z. Holistically-nested edge detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1395–1403.
14. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
15. Lazebnik, S.; Schmid, C.; Ponce, J. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17–22 June 2006; Volume 2, pp. 2169–2178.
16. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. arXiv 2014, arXiv:1412.7062.
17. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention; Springer: Cham, Switzerland, 2015; pp. 234–241.
18. Ghiasi, G.; Fowlkes, C.C. Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation. In Computer Vision—ECCV 2016; Springer: Cham, Switzerland, 2016; pp. 519–534.
19. Islam, M.A.; Rochan, M.; Bruce, N.D.B.; Wang, Y. Gated Feedback Refinement Network for Dense Image Labeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4877–4885.
20. Peng, C.; Zhang, X.; Yu, G.; Luo, G.; Sun, J. Large Kernel Matters—Improve Semantic Segmentation by Global Convolutional Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1743–1751.
21. Chen, L.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 833–851.
22. Xiao, T.; Liu, Y.; Zhou, B.; Jiang, Y.; Sun, J. Unified Perceptual Parsing for Scene Understanding. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 432–448.
23. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456.
24. Nair, V.; Hinton, G.E. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the International Conference on Machine Learning, Haifa, Israel, 21–24 June 2010; pp. 807–814.
25. Vallurupalli, N.; Annamaneni, S.; Varma, G.; Jawahar, C.V.; Mathew, M.; Nagori, S. Efficient semantic segmentation using gradual grouping. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 598–606.
26. Chaurasia, A.; Culurciello, E. LinkNet: Exploiting encoder representations for efficient semantic segmentation. In Proceedings of the 2017 IEEE Visual Communications and Image Processing (VCIP), St. Petersburg, FL, USA, 10–13 December 2017; pp. 1–4.
27. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239.
28. Chen, L.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
29. Yu, C.; Wang, J.; Peng, C.; Gao, C.; Yu, G.; Sang, N. Learning a Discriminative Feature Network for Semantic Segmentation. arXiv 2018, arXiv:1804.09337.
30. Liang, X.; Zhou, H.; Xing, E. Dynamic-Structured Semantic Propagation Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018.
31. Zhao, H.; Zhang, Y.; Liu, S.; Shi, J.; Loy, C.C.; Lin, D.; Jia, J. PSANet: Point-wise Spatial Attention Network for Scene Parsing. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
32. Yuhui, Y.; Jingdong, W. Ocnet: Object context network for scene parsing. arXiv 2018, arXiv:1809.00916.

Encoder | Number of Paths | Number of Channels in Each Path | mIoU(%) |
---|---|---|---|
ResNet-50 | 1 | 256 | 76.25 |
ResNet-50 | 2 | 128 | 76.43 |
ResNet-50 | 4 | 64 | 76.84 |
ResNet-50 | 8 | 32 | 76.56 |
ResNet-50 | 4 | 128 | 76.99 |
ResNet-50 | 8 | 64 | 76.75 |
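
The six configurations above fall into two groups when the number of paths is multiplied by the channels per path. The following sketch makes that grouping explicit and picks the best configuration in each group. The "total width" quantity and the function names are introduced here for illustration only and are not columns or terms from the table.

```python
# (number of paths, channels per path, mIoU %) copied from the ablation rows.
CONFIGS = [
    (1, 256, 76.25),
    (2, 128, 76.43),
    (4, 64, 76.84),
    (8, 32, 76.56),
    (4, 128, 76.99),
    (8, 64, 76.75),
]


def total_width(paths: int, channels: int) -> int:
    """Total decoder channels across paths (derived, not reported in the table)."""
    return paths * channels


def best_per_width(configs):
    """Return the highest-mIoU configuration for each total width."""
    best = {}
    for paths, channels, miou in configs:
        width = total_width(paths, channels)
        if width not in best or miou > best[width][2]:
            best[width] = (paths, channels, miou)
    return best


BEST = best_per_width(CONFIGS)
for width, (paths, channels, miou) in sorted(BEST.items()):
    print(f"{width} total channels: {paths} paths x {channels} channels, {miou:.2f}% mIoU")
```

In both groups the four-path variant scores highest, and doubling the total width from 256 to 512 channels raises the best mIoU by only 0.15 points (76.84 to 76.99).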

Encoder | Resolution | RB | Edge Decoder | Multi-Path Decoder | MS | mIoU(%) | FLOPs |
---|---|---|---|---|---|---|---|
ResNet-50 | 768 × 1536 | | | | | 74.45 | 174.59G |
ResNet-50 | 768 × 1536 | ✓ | | | | 75.11 | 174.59G |
ResNet-50 | 768 × 1536 | ✓ | ✓ | | | 76.25 | 116.1G |
ResNet-50 | 768 × 1536 | ✓ | ✓ | ✓ | | 76.84 | 118.84G |
ResNet-50 | 1024 × 2048 | ✓ | ✓ | ✓ | | 78.06 | 211.27G |
ResNet-101 | 1024 × 2048 | ✓ | ✓ | ✓ | | 78.68 | 366.33G |
ResNet-101 | 1024 × 2048 | ✓ | ✓ | ✓ | ✓ | 79.99 | - |
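
Two relationships in the FLOPs column can be checked with simple arithmetic: the saving of the 768 × 1536 ResNet-50 configuration at 76.84% mIoU relative to the 74.45% baseline at the same resolution, and how the cost of that configuration grows when the input is enlarged to 1024 × 2048. The sketch below uses only values copied from the table. The variable names are illustrative.

```python
def pixel_count(height: int, width: int) -> int:
    """Number of pixels in an input of the given size."""
    return height * width


# FLOPs in G, copied from the ResNet-50 rows of the table.
BASELINE_FLOPS = 174.59   # 768 x 1536, 74.45% mIoU
MPD_FLOPS_768 = 118.84    # 768 x 1536, 76.84% mIoU
MPD_FLOPS_1024 = 211.27   # 1024 x 2048, 78.06% mIoU

flops_saving = 1.0 - MPD_FLOPS_768 / BASELINE_FLOPS
miou_gain = 76.84 - 74.45
flops_ratio = MPD_FLOPS_1024 / MPD_FLOPS_768
pixel_ratio = pixel_count(1024, 2048) / pixel_count(768, 1536)

print(f"FLOPs saving vs. baseline: {flops_saving:.1%}")
print(f"mIoU gain vs. baseline: {miou_gain:.2f} points")
print(f"FLOPs ratio {flops_ratio:.3f} vs. pixel ratio {pixel_ratio:.3f}")
```

The FLOPs ratio between the two resolutions matches the pixel-count ratio (about 1.78), which is what one would expect from a fully convolutional network whose cost is linear in the number of input pixels.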

Method | Road | Swalk | Build. | Wall | Fence | Pole | Tlight | Sign | Veg. | Terrain | Sky | Person | Rider | Car | Truck | Bus | Train | Mbike | Bike | mIoU |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
BL | 98.3 | 84.8 | 92.2 | 50.6 | 52.6 | 63.0 | 68.6 | 74.1 | 92.2 | 71.8 | 94.3 | 84.4 | 66.2 | 95.1 | 60.4 | 72.4 | 59.5 | 61.8 | 72.4 | 74.5 |
+RB | 98.3 | 84.2 | 92.0 | 53.4 | 53.5 | 62.6 | 68.1 | 73.0 | 92.4 | 71.8 | 94.6 | 83.1 | 65.6 | 94.7 | 65.0 | 77.5 | 69.0 | 59.8 | 71.2 | 75.2 |
+ED | 98.4 | 84.2 | 92.1 | 57.0 | 55.6 | 60.0 | 60.5 | 72.2 | 92.8 | 69.6 | 95.2 | 83.4 | 65.4 | 95.1 | 70.7 | 84.7 | 82.0 | 58.3 | 71.5 | 76.3 |
+MPD | 98.5 | 85.4 | 92.3 | 51.9 | 56.1 | 61.2 | 67.6 | 74.4 | 92.9 | 71.2 | 95.2 | 83.9 | 66.7 | 95.4 | 72.7 | 83.0 | 76.0 | 62.8 | 72.6 | 76.8 |
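
The mIoU column is the unweighted mean of the 19 per-class IoU values in each row. As a sanity check, the sketch below recomputes it for the BL and +MPD rows and finds the class with the largest gain between them. The per-class numbers are copied from the table. The variable and function names are illustrative.

```python
CLASSES = [
    "Road", "Swalk", "Build.", "Wall", "Fence", "Pole", "Tlight", "Sign",
    "Veg.", "Terrain", "Sky", "Person", "Rider", "Car", "Truck", "Bus",
    "Train", "Mbike", "Bike",
]

# Per-class IoU (%) for the baseline (BL) and the +MPD row.
BL = [98.3, 84.8, 92.2, 50.6, 52.6, 63.0, 68.6, 74.1, 92.2, 71.8,
      94.3, 84.4, 66.2, 95.1, 60.4, 72.4, 59.5, 61.8, 72.4]
MPD = [98.5, 85.4, 92.3, 51.9, 56.1, 61.2, 67.6, 74.4, 92.9, 71.2,
       95.2, 83.9, 66.7, 95.4, 72.7, 83.0, 76.0, 62.8, 72.6]


def mean_iou(per_class):
    """Unweighted mean over classes."""
    return sum(per_class) / len(per_class)


bl_miou = mean_iou(BL)
mpd_miou = mean_iou(MPD)

# Class with the largest IoU gain from BL to +MPD.
gains = [after - before for before, after in zip(BL, MPD)]
top_class = CLASSES[max(range(len(gains)), key=gains.__getitem__)]

print(f"BL mIoU: {bl_miou:.1f}, +MPD mIoU: {mpd_miou:.1f}, largest gain: {top_class}")
```

The recomputed means agree with the reported 74.5 and 76.8 to one decimal place. Most of the improvement comes from the large-vehicle classes (Truck, Bus, Train), while several small-object classes such as Pole and Tlight drop slightly.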

Method | Resolution | Val mIoU | Test mIoU | FLOPs | FPS |
---|---|---|---|---|---|
DG2s [25] | | - | 70.6 | 19G | - |
ERFNet [4] | | - | 69.7 | 27.7G | 18.4 |
ESPNet [6] | | - | 60.3 | - | 108.7 |
SwiftNet [8] | | 75.4 | 75.5 | 114G | 39.3 |
LinkNet [26] | | 76.4 | - | 402G | - |
PSPNet [27] | | 78.4 | - | 1444G+ | - |
DeepLabv3 [28] | | 77.82 | - | 1444G+ | - |
DeepLabv3+ [21] | | 78.79 | - | 1416G | - |
DFN [29] | | - | 79.3 | 890G+ | - |
MPDNet | | 76.84 | 76.7 | 118.84G | 37.6 |

Encoder | RB | Edge Decoder | Multi-Path Decoder | MS | mIoU(%) | PA(%) |
---|---|---|---|---|---|---|
ResNet-50 | | | | | 38.88 | 78.34 |
ResNet-50 | ✓ | | | | 39.56 | 79.65 |
ResNet-50 | ✓ | ✓ | | | 40.98 | 79.93 |
ResNet-50 | ✓ | ✓ | ✓ | | 42.06 | 80.42 |
ResNet-101 | ✓ | ✓ | ✓ | | 43.14 | 80.91 |
ResNet-101 | ✓ | ✓ | ✓ | ✓ | 44.01 | 81.44 |

Method | Single Scale Test mIoU(%)/PA(%) | Multi Scale Test mIoU(%)/PA(%) | SS FLOPs@1Mpx |
---|---|---|---|
PSPNet [27] | 41.96/80.64 | 43.29/81.39 | 722G+ |
UPerNet [22] | 42.00/80.79 | 42.66/81.01 | 370.74G |
DSSPN [30] | - | 43.68/81.13 | - |
PSANet [31] | 42.75/80.71 | 43.77/81.51 | 722G+ |
OCNet [32] | - | 45.45/- | 722G+ |
MPDNet | 43.14/80.91 | 44.01/81.44 | 246.9G |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Bai, X.; Zhou, J. Efficient Semantic Segmentation Using Multi-Path Decoder. Appl. Sci. 2020, 10, 6386. https://doi.org/10.3390/app10186386