Advances in Computer Vision and Semantic Segmentation

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (10 January 2024) | Viewed by 4861

Special Issue Editors


Guest Editor
Department of Computer Science, College of Science, Swansea University, Singleton Park, Swansea SA2 8PP, UK
Interests: visual analytics; machine learning; digital geometry processing; pattern recognition and vision; multi-dimensional data analysis; information retrieval and indexing


Guest Editor
Department of Computer Science, College of Science, Swansea University, Singleton Park, Swansea SA2 8PP, UK
Interests: computer vision; image processing; machine learning; medical image analysis

Guest Editor
Department of Computer Science, College of Science, Swansea University, Singleton Park, Swansea SA2 8PP, UK
Interests: saliency; object detection; visual attention

Guest Editor
School of Computer Science, University of Birmingham, Edgbaston Birmingham B15 2TT, UK
Interests: computer vision; machine learning; medical imaging

Special Issue Information

Dear Colleagues,

Semantic segmentation is a core problem for many applications, such as image manipulation, facial segmentation, healthcare, security and surveillance, medical imaging and diagnosis, aerial and satellite image surveying and processing, city 3D modelling, and scene understanding. It is also an important building block in more complex systems, including autonomous cars, drones, and human-centric robots.

Recent advances in deep learning techniques (e.g., CNNs, FCNs, U-Net, graph LSTMs, spatial pyramid pooling, attention modelling, and Transformers) have driven major improvements in semantic segmentation, not only in speed and accuracy but also in inspiring related areas such as instance and panoptic segmentation.

This Special Issue welcomes research in semantic segmentation (and its broader areas, including instance and panoptic segmentation) and advanced computer vision applications relating to semantic segmentation. It covers possible research and application areas including multimodal segmentation (e.g., referring image segmentation), salient object detection and segmentation, 3D (point cloud, mesh) semantic segmentation, video semantic segmentation, and many others. Papers focusing on new data (e.g., hyperspectral data, MRI, CT, point clouds, meshes) and new deep architectures, techniques, and learning strategies (e.g., weakly supervised/unsupervised semantic segmentation, zero/few-shot learning, domain adaptation, real-time processing, contextual information, transfer learning, reinforcement learning, and the critical issue of acquiring training data) are all welcome.

Dr. Gary KL Tam
Dr. Frederick W. B. Li
Prof. Dr. Xianghua Xie
Dr. Avishek Siris
Dr. Jianbo Jiao
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, manuscripts can be submitted through the online submission form until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • semantic segmentation
  • instance segmentation
  • panoptic segmentation
  • multimodal segmentation
  • referring image segmentation
  • salient object detection and segmentation
  • 3D semantic segmentation
  • video semantic segmentation
  • weakly supervised semantic segmentation
  • unsupervised semantic segmentation
  • advanced machine learning segmentation techniques
  • medical semantic segmentation

Published Papers (6 papers)


Research

17 pages, 2886 KiB  
Article
A Multi-Path Semantic Segmentation Network Based on Convolutional Attention Guidance
by Chenyang Feng, Shu Hu and Yi Zhang
Appl. Sci. 2024, 14(5), 2024; https://doi.org/10.3390/app14052024 - 29 Feb 2024
Viewed by 422
Abstract
Due to the efficiency of self-attention mechanisms in encoding spatial information, Transformer-based models have recently taken a dominant position among semantic segmentation methods. However, Transformer-based models have the disadvantages of requiring a large amount of computation and lacking attention to detail, so we look back to the CNN model. In this paper, we propose a multi-path semantic segmentation network with convolutional attention guidance (dubbed MCAG). It has a multi-path architecture, and feature guidance from the main path is used in other paths, which forces the model to focus on the object’s boundaries and details. It also explores multi-scale convolutional features through spatial attention. Finally, it captures both local and global contexts in spatial and channel dimensions in an adaptive manner. Extensive experiments were conducted on popular benchmarks, and it was found that MCAG surpasses other SOTA methods by achieving 47.7%, 82.51% and 43.6% mIoU on ADE20K, Cityscapes and COCO-Stuff, respectively. Specifically, the experimental results prove that the proposed model has high segmentation precision for small objects, which demonstrates the effectiveness of convolutional attention mechanisms and multi-path strategies. The results show that the CNN model can achieve good segmentation effects with a lower amount of calculation.
(This article belongs to the Special Issue Advances in Computer Vision and Semantic Segmentation)
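The mIoU figures quoted in the abstract above follow the standard definition: per-class intersection over union, averaged over the classes present. A minimal sketch of that metric (function name and layout are ours, not from the paper):

```python
from collections import defaultdict

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union across classes.

    pred, gt: flat sequences of per-pixel class ids.
    Classes absent from both prediction and ground truth are skipped.
    """
    inter = defaultdict(int)
    union = defaultdict(int)
    for p, g in zip(pred, gt):
        if p == g:
            # Correct pixel: counts toward both intersection and union.
            inter[g] += 1
            union[g] += 1
        else:
            # Wrong pixel: enlarges the union of both classes involved.
            union[p] += 1
            union[g] += 1
    ious = [inter[c] / union[c] for c in range(num_classes) if union[c] > 0]
    return sum(ious) / len(ious)
```

Benchmark suites such as the Cityscapes evaluation scripts accumulate the same counts as a confusion matrix over the whole test set rather than per image.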

17 pages, 10426 KiB  
Article
Self-Improved Learning for Salient Object Detection
by Songyuan Li, Hao Zeng, Huanyu Wang and Xi Li
Appl. Sci. 2023, 13(23), 12966; https://doi.org/10.3390/app132312966 - 04 Dec 2023
Viewed by 637
Abstract
Salient Object Detection (SOD) aims at identifying the most visually distinctive objects in a scene. However, learning a mapping directly from a raw image to its corresponding saliency map is still challenging. First, the binary annotations of SOD impede the model from learning the mapping smoothly. Second, the annotator’s preference introduces noisy labeling in the SOD datasets. Motivated by these, we propose a novel learning framework which consists of the Self-Improvement Training (SIT) strategy and the Augmentation-based Consistent Learning (ACL) scheme. SIT aims at reducing the learning difficulty, which provides smooth labels and improves the SOD model in a momentum-updating manner. Meanwhile, ACL focuses on improving the robustness of models by regularizing the consistency between raw images and their corresponding augmented images. Extensive experiments on five challenging benchmark datasets demonstrate that the proposed framework can play a plug-and-play role in various existing state-of-the-art SOD methods and improve their performances on multiple benchmarks without any architecture modification.
(This article belongs to the Special Issue Advances in Computer Vision and Semantic Segmentation)
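The "momentum-updating manner" mentioned in the abstract is, in essence, an exponential-moving-average (EMA) teacher: a slowly changing copy of the model whose outputs act as smoothed soft labels. A minimal sketch of such a weight update, as our interpretation rather than the authors' exact code:

```python
def ema_update(teacher_weights, student_weights, momentum=0.99):
    """Move each teacher weight a small step toward the student weight.

    With momentum close to 1, the teacher changes slowly, so its
    predictions drift smoothly and provide softer targets than the
    binary annotations.
    """
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_weights, student_weights)]
```

In a full training loop this update would be applied to every parameter tensor after each optimizer step.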

15 pages, 2191 KiB  
Article
Towards Feasible Capsule Network for Vision Tasks
by Dang Thanh Vu, Le Bao Thai An, Jin Young Kim and Gwang Hyun Yu
Appl. Sci. 2023, 13(18), 10339; https://doi.org/10.3390/app131810339 - 15 Sep 2023
Viewed by 813
Abstract
Capsule networks exhibit the potential to enhance computer vision tasks through their utilization of equivariance for capturing spatial relationships. However, the broader adoption of these networks has been impeded by the computational complexity of their routing mechanism and shallow backbone model. To address these challenges, this paper introduces an innovative hybrid architecture that seamlessly integrates a pretrained backbone model with a task-specific capsule head (CapsHead). Our methodology is extensively evaluated across a range of classification and segmentation tasks, encompassing diverse datasets. The empirical findings robustly underscore the efficacy and practical feasibility of our proposed approach in real-world vision applications. Notably, our approach yields substantial 3.45% and 6.24% enhancement in linear evaluation on the CIFAR10 dataset and segmentation on the VOC2012 dataset, respectively, compared to baselines that do not incorporate the capsule head. This research offers a noteworthy contribution by not only advancing the application of capsule networks, but also mitigating their computational complexities. The results substantiate the feasibility of our hybrid architecture, thereby paving the way for a wider integration of capsule networks into various computer vision tasks.
(This article belongs to the Special Issue Advances in Computer Vision and Semantic Segmentation)

12 pages, 788 KiB  
Article
Assessing Efficiency in Artificial Neural Networks
by Nicholas J. Schaub and Nathan Hotaling
Appl. Sci. 2023, 13(18), 10286; https://doi.org/10.3390/app131810286 - 14 Sep 2023
Viewed by 770
Abstract
The purpose of this work was to develop an assessment technique and subsequent metrics that help in developing an understanding of the balance between network size and task performance in simple model networks. Here, exhaustive tests on simple model neural networks and datasets are used to validate both the assessment approach and the metrics derived from it. The concept of neural layer state space is introduced as a simple mechanism for understanding layer utilization, where a state is the on/off activation state of all neurons in a layer for an input. Neural efficiency is computed from state space to measure neural layer utilization, and a second metric called the artificial intelligence quotient (aIQ) was created to balance neural network performance and neural efficiency. To study aIQ and neural efficiency, two simple neural networks were trained on MNIST: a fully connected network (LeNet-300-100) and a convolutional neural network (LeNet-5). The LeNet-5 network with the highest aIQ was 2.32% less accurate but contained 30,912 times fewer parameters than the network with the highest accuracy. Both batch normalization and dropout layers were found to increase neural efficiency. Finally, networks with a high aIQ are shown to be resistant to memorization and overtraining as well as capable of learning proper digit classification with an accuracy of 92.51%, even when 75% of the class labels are randomized. These results demonstrate the utility of aIQ and neural efficiency as metrics for determining the performance and size of a small network using exemplar data.
(This article belongs to the Special Issue Advances in Computer Vision and Semantic Segmentation)
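The abstract above defines a layer's "state" as the on/off pattern of all its neurons for one input, and neural efficiency as a measure of how much of that state space is used. One natural reading is the entropy of the observed state distribution normalized by the layer's capacity in bits (one bit per neuron); the sketch below follows that reading and is our interpretation, not the authors' exact code:

```python
import math
from collections import Counter

def neural_efficiency(states):
    """Entropy of observed layer states divided by layer capacity.

    states: list of tuples of 0/1 activations, one tuple per input.
    Returns a value in [0, 1]: 1 means the layer spreads uniformly
    over its full state space; near 0 means a few states dominate.
    """
    n_neurons = len(states[0])          # capacity: n_neurons bits
    counts = Counter(states)
    total = len(states)
    entropy = -sum((c / total) * math.log2(c / total)
                   for c in counts.values())
    return entropy / n_neurons
```

For example, a one-neuron layer that fires for half of the inputs uses its full 1-bit capacity and scores 1.0, while a layer whose state never changes scores 0.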

17 pages, 5880 KiB  
Article
Tamed Warping Network for High-Resolution Semantic Video Segmentation
by Songyuan Li, Junyi Feng and Xi Li
Appl. Sci. 2023, 13(18), 10102; https://doi.org/10.3390/app131810102 - 07 Sep 2023
Viewed by 768
Abstract
Recent approaches for fast semantic video segmentation have reduced redundancy by warping feature maps across adjacent frames, greatly speeding up the inference phase. However, the accuracy drops seriously owing to the errors incurred by warping. In this paper, we propose a novel framework and design a simple and effective correction stage after warping. Specifically, we build a non-key-frame CNN, fusing warped context features with current spatial details. Based on the feature fusion, our context feature rectification (CFR) module learns the model’s difference from a per-frame model to correct the warped features. Furthermore, our residual-guided attention (RGA) module utilizes the residual maps in the compressed domain to help CFR focus on error-prone regions. Results on Cityscapes show that the accuracy significantly increases from 67.3% to 71.6%, and the speed edges down from 65.5 FPS to 61.8 FPS at a resolution of 1024×2048. For non-rigid categories, e.g., “human” and “object”, the improvements are even higher than 18 percentage points.
(This article belongs to the Special Issue Advances in Computer Vision and Semantic Segmentation)

17 pages, 3666 KiB  
Article
Encoder–Decoder Structure Fusing Depth Information for Outdoor Semantic Segmentation
by Songnan Chen, Mengxia Tang, Ruifang Dong and Jiangming Kan
Appl. Sci. 2023, 13(17), 9924; https://doi.org/10.3390/app13179924 - 01 Sep 2023
Cited by 1 | Viewed by 878
Abstract
The semantic segmentation of outdoor images is the cornerstone of scene understanding and plays a crucial role in the autonomous navigation of robots. Although RGB-D images can provide additional depth information for improving the performance of semantic segmentation tasks, current state-of-the-art methods directly use ground truth depth maps for depth information fusion, which relies on highly developed and expensive depth sensors. Aiming to solve such a problem, we proposed a self-calibrated RGB-D image semantic segmentation neural network model based on an improved residual network without relying on depth sensors, which utilizes multi-modal information from depth maps predicted with depth estimation models and RGB image fusion for image semantic segmentation to enhance the understanding of a scene. First, we designed a novel convolution neural network (CNN) with an encoding and decoding structure as our semantic segmentation model. The encoder was constructed using IResNet to extract the semantic features of the RGB image and the predicted depth map and then effectively fuse them with the self-calibration fusion structure. The decoder restored the resolution of the output features with a series of successive upsampling structures. Second, we presented a feature pyramid attention mechanism to extract the fused information at multiple scales and obtain features with rich semantic information. The experimental results using the publicly available Cityscapes dataset and collected forest scene images show that our model trained with the estimated depth information can achieve comparable performance to the ground truth depth map in improving the accuracy of the semantic segmentation task and even outperforming some competitive methods.
(This article belongs to the Special Issue Advances in Computer Vision and Semantic Segmentation)
