Special Issue "Deep Image Semantic Segmentation and Recognition"

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 31 December 2021.

Special Issue Editors

Prof. Dr. Aleš Jaklič
Guest Editor
Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
Interests: computer vision
Prof. Dr. Peter Peer
Guest Editor
Faculty of Computer and Information Science, University of Ljubljana, 1000 Ljubljana, Slovenia
Interests: biometrics; computer vision
Prof. Dr. Radim Burget
Guest Editor
Department of Telecommunications, Brno University of Technology, 616 00 Brno, Czech Republic
Interests: big data; deep learning; computer vision
Prof. Dr. Fran Bellas
Guest Editor
CITIC research center, University of A Coruña, A Coruña, Spain
Interests: robotics; cognitive robotics; evolutionary robotics; educational robotics; computer vision

Special Issue Information

Dear Colleagues,

Recent advances in hardware and deep neural network architectures, together with the availability of large image databases, have spurred many new research directions in computer vision: detection, segmentation, semantics extraction, and recognition. The motivation for these research efforts stems from practical applications ranging from autonomous driving to robotics in agriculture, and from medical image analysis and biometrics to geosensing, along with many more application areas that will benefit from significant improvements in the performance of segmentation and recognition algorithms based on deep neural networks.

The aim of this Special Issue is to gather state-of-the-art research that provides practitioners with a broad overview of suitable deep neural network architectures and application areas, together with objective performance metrics. We welcome well-structured manuscripts that clearly illustrate both background and novelty. We also encourage authors to make their source code, databases, models, and architectures publicly available, and to submit multimedia with each manuscript, as this significantly increases the visibility and citation of publications.

Prof. Dr. Aleš Jaklič,
Prof. Dr. Peter Peer,
Prof. Dr. Radim Burget,
Prof. Dr. Fran Bellas
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • deep learning
  • detection
  • segmentation
  • recognition
  • reconstruction
  • grouping
  • semantics
  • verification
  • identification

Published Papers (7 papers)


Research

Article
An Advanced Spectral–Spatial Classification Framework for Hyperspectral Imagery Based on DeepLab v3+
Appl. Sci. 2021, 11(12), 5703; https://doi.org/10.3390/app11125703 - 19 Jun 2021
Cited by 3 | Viewed by 496
Abstract
The DeepLab v3+ neural network shows excellent performance in semantic segmentation. In this paper, we propose a segmentation framework based on the DeepLab v3+ neural network and apply it to the problem of hyperspectral imagery classification (HSIC). The dimensionality of the hyperspectral image is reduced using principal component analysis (PCA). DeepLab v3+ is used to extract spatial features, which are then fused with spectral features. A support vector machine (SVM) classifier is used for fitting and classification. Experimental results show that the proposed framework outperforms most traditional machine learning and deep learning algorithms on hyperspectral imagery classification tasks.
(This article belongs to the Special Issue Deep Image Semantic Segmentation and Recognition)
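The pipeline described in this abstract (PCA for spectral reduction, a network for spatial features, feature fusion, then an SVM) can be illustrated structurally in a few lines. Note that DeepLab v3+ is replaced here by a trivial neighborhood-mean stand-in and all data and parameters are synthetic, so this is only a sketch of the framework's shape, not the authors' implementation:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic hyperspectral cube: H x W pixels, 100 spectral bands, 3 classes.
H, W, B = 16, 16, 100
labels = rng.integers(0, 3, size=(H, W))
cube = rng.normal(size=(H, W, B)) + labels[..., None]  # class-dependent offset

# 1) Spectral dimensionality reduction with PCA (as in the framework).
pixels = cube.reshape(-1, B)
spectral = PCA(n_components=8).fit_transform(pixels)

# 2) "Spatial" features: a naive 3x3 neighborhood mean over the reduced bands,
#    standing in for DeepLab v3+ feature maps.
spec_img = spectral.reshape(H, W, -1)
padded = np.pad(spec_img, ((1, 1), (1, 1), (0, 0)), mode="edge")
spatial = np.stack([
    padded[i:i + H, j:j + W] for i in range(3) for j in range(3)
]).mean(axis=0).reshape(-1, spec_img.shape[-1])

# 3) Fuse spectral and spatial features, then classify with an SVM.
fused = np.concatenate([spectral, spatial], axis=1)
clf = SVC(kernel="rbf").fit(fused, labels.ravel())
acc = clf.score(fused, labels.ravel())
print(f"training accuracy: {acc:.2f}")
```

The fusion here is simple concatenation; the paper's actual fusion scheme and network features will differ.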

Article
Supervised Learning Based Peripheral Vision System for Immersive Visual Experiences for Extended Display
Appl. Sci. 2021, 11(11), 4726; https://doi.org/10.3390/app11114726 - 21 May 2021
Viewed by 465
Abstract
Video display content can be extended onto the walls of the living room around the TV using projection. Generating appropriate projection content automatically is a hard problem, which we solve with a deep neural network. We propose a peripheral vision system that provides immersive visual experiences to the user by extending the video content using deep learning and projecting that content around the TV screen. A user could manually create appropriate content for an existing TV screen, but doing so is prohibitively expensive. Our PCE (pixel context encoder) network takes the center of a video frame as input and the surrounding area as output, learning to extend the content via supervised learning. The proposed system is expected to pave a new road for the home appliance industry, transforming the living room into a new immersive experience platform.
(This article belongs to the Special Issue Deep Image Semantic Segmentation and Recognition)

Article
How to Correctly Detect Face-Masks for COVID-19 from Visual Information?
Appl. Sci. 2021, 11(5), 2070; https://doi.org/10.3390/app11052070 - 26 Feb 2021
Cited by 4 | Viewed by 1874
Abstract
The new coronavirus disease (COVID-19) has seriously affected the world. By the end of November 2020, the global number of coronavirus cases had already exceeded 60 million, and the number of deaths had reached 1,410,378 according to the World Health Organization (WHO). To limit the spread of the disease, mandatory face-mask rules are now becoming common in public settings around the world. Additionally, many public service providers require customers to wear face-masks in accordance with predefined rules (e.g., covering both mouth and nose) when using public services. These developments inspired research into automatic (computer-vision-based) techniques for face-mask detection that can help monitor public behavior and contribute towards constraining the COVID-19 pandemic. Although existing research in this area has resulted in efficient techniques for face-mask detection, these usually operate under the assumption that modern face detectors provide perfect detection performance (even for masked faces) and that the main goal is to detect the presence of face-masks only. In this study, we revisit these common assumptions and explore the following research questions: (i) How well do existing face detectors perform on masked-face images? (ii) Is it possible to detect proper (regulation-compliant) placement of facial masks? (iii) How useful are existing face-mask detection techniques for monitoring applications during the COVID-19 pandemic? To answer these and related questions, we conduct a comprehensive experimental evaluation of several recent face detectors on masked-face images. Furthermore, we investigate the usefulness of multiple off-the-shelf deep-learning models for recognizing correct face-mask placement. Finally, we design a complete pipeline for recognizing whether face-masks are worn correctly and compare its performance with standard face-mask detection models from the literature.
To facilitate the study, we compile a large dataset of facial images from the publicly available MAFA and Wider Face datasets and annotate it with compliant and non-compliant labels. The annotated dataset, called the Face-Mask-Label Dataset (FMLD), is made publicly available to the research community.
(This article belongs to the Special Issue Deep Image Semantic Segmentation and Recognition)

Article
Semantic 3D Mapping from Deep Image Segmentation
Appl. Sci. 2021, 11(4), 1953; https://doi.org/10.3390/app11041953 - 23 Feb 2021
Viewed by 553
Abstract
The perception and identification of visual stimuli from the environment is a fundamental capacity of autonomous mobile robots. Current deep learning techniques make it possible to identify and segment objects of interest in an image. This paper presents a novel algorithm to segment an object's space from a deep segmentation of an image taken by a 3D camera. The proposed approach solves the boundary-pixel problem that appears when a direct mapping from segmented pixels to their correspondences in the point cloud is used. We validate our approach by comparing it against baseline approaches on real images taken by a 3D camera, showing that our method outperforms them in terms of accuracy and reliability. As an application of the proposed algorithm, we present a semantic mapping approach for a mobile robot's indoor environment.
(This article belongs to the Special Issue Deep Image Semantic Segmentation and Recognition)
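The direct pixel-to-point mapping that this paper improves on (and where the boundary-pixel problem arises) is standard pinhole back-projection of a depth image into the camera frame. A minimal sketch, with illustrative intrinsics:

```python
import numpy as np

def backproject(depth: np.ndarray, fx: float, fy: float,
                cx: float, cy: float) -> np.ndarray:
    """Map each pixel of a depth image to a 3D point in the camera frame.

    A direct per-pixel mapping like this assigns boundary pixels of a 2D
    segmentation mask to whichever surface their depth sample hit, which is
    the source of the boundary-pixel problem discussed above.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]          # pixel coordinates (row v, column u)
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

depth = np.full((4, 4), 2.0)           # a flat wall 2 m from the camera
pts = backproject(depth, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
print(pts.shape)
```

The pixel at the principal point (cx, cy) back-projects to a point directly on the optical axis at its measured depth.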

Article
Dual-Window Superpixel Data Augmentation for Hyperspectral Image Classification
Appl. Sci. 2020, 10(24), 8833; https://doi.org/10.3390/app10248833 - 10 Dec 2020
Cited by 3 | Viewed by 630
Abstract
Deep learning (DL) has been shown to obtain superior results for classification tasks in the field of remote sensing hyperspectral imaging. Superpixel-based techniques can be applied to DL, significantly decreasing training and prediction times, but the results are usually far from satisfactory due to overfitting. Data augmentation techniques alleviate the problem by synthetically generating new samples from an existing dataset in order to improve the generalization capabilities of the classification model. In this paper, we propose a novel data augmentation framework in the context of superpixel-based DL called dual-window superpixel (DWS). With DWS, data augmentation is performed over patches centered on the superpixels obtained by applying simple linear iterative clustering (SLIC) superpixel segmentation. DWS divides the input patches extracted from the superpixels into two regions and applies transformations to them independently. As a result, four data augmentation techniques are proposed that can be applied to a superpixel-based CNN classification scheme. An extensive comparison in terms of classification accuracy with other data augmentation techniques from the literature is shown on two datasets. One consists of small hyperspectral scenes commonly found in the literature; the other consists of large multispectral vegetation scenes of river basins. The experimental results show that the proposed approach increases the overall classification accuracy for the selected datasets. In particular, two of the introduced techniques, dual-flip and dual-rotate, obtained the best results.
(This article belongs to the Special Issue Deep Image Semantic Segmentation and Recognition)
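The dual-window idea (split each superpixel-centered patch into an inner window and an outer ring, then transform the two regions independently) can be sketched as follows. The patch size, window size, and the particular pair of flips used here are illustrative, not the paper's exact transformations:

```python
import numpy as np

def dual_flip(patch: np.ndarray, inner: int) -> np.ndarray:
    """Flip the outer ring and the inner window of a patch independently.

    `patch` is (H, W, bands); `inner` is the side length of the centered
    inner window. The outer ring is flipped vertically, the inner window
    horizontally, so the two regions receive independent transformations.
    """
    h, w, _ = patch.shape
    top = (h - inner) // 2
    left = (w - inner) // 2
    out = patch[::-1, :, :].copy()     # vertical flip of the whole patch;
                                       # the centered inner region stays centered
    inner_win = patch[top:top + inner, left:left + inner, :]
    out[top:top + inner, left:left + inner, :] = inner_win[:, ::-1, :]
    return out

# Example: a 9x9 patch with 4 bands, centered on a superpixel.
rng = np.random.default_rng(1)
patch = rng.normal(size=(9, 9, 4))
aug = dual_flip(patch, inner=3)
print(aug.shape)
```

Because the inner window is centered, flipping the whole patch first and then overwriting the inner region leaves the outer ring with a pure vertical flip of the original ring.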

Article
Multi-Frame Labeled Faces Database: Towards Face Super-Resolution from Realistic Video Sequences
Appl. Sci. 2020, 10(20), 7213; https://doi.org/10.3390/app10207213 - 16 Oct 2020
Cited by 1 | Viewed by 736
Abstract
Forensically trained facial reviewers are still considered one of the most accurate approaches for person identification from video records. The human brain can utilize information not just from a single image but from a sequence of images (i.e., video), and even with low-quality records or a long distance from the camera, it can accurately identify a given person. Unfortunately, in many cases a single still image is needed; an example is a police search that is about to be announced in newspapers. This paper introduces a face database obtained from real environments, comprising 17,426 sequences of images. The dataset includes persons of various races and ages, as well as different environments, lighting conditions, and camera device types. This paper also introduces a new multi-frame face super-resolution method and compares it with state-of-the-art single-frame and multi-frame super-resolution methods. We show that the proposed method increases the quality of face images, even for low-resolution, low-quality input images, and provides better results than single-frame approaches that are still considered the best in this area. The quality of the face images was evaluated using several objective mathematical methods, and also subjectively by several volunteers. The source code and dataset have been released, and the experiment is fully reproducible.
(This article belongs to the Special Issue Deep Image Semantic Segmentation and Recognition)

Article
LSUN-Stanford Car Dataset: Enhancing Large-Scale Car Image Datasets Using Deep Learning for Usage in GAN Training
Appl. Sci. 2020, 10(14), 4913; https://doi.org/10.3390/app10144913 - 17 Jul 2020
Cited by 7 | Viewed by 1719
Abstract
Currently, there is no publicly available dataset adequate for training Generative Adversarial Networks (GANs) on car images: all available car datasets differ in noise, pose, and zoom levels. The objective of this work was therefore to create an improved car image dataset better suited for GAN training. To improve the performance of the GAN, we coupled the LSUN and Stanford car datasets. The merged dataset was then pruned to adjust zoom levels and reduce image noise. This process resulted in fewer, but higher-quality, images for training. The pruned dataset was evaluated by training StyleGAN with its original settings. Pruning the combined LSUN and Stanford datasets resulted in 2,067,710 images of cars with less noise and more consistent zoom levels. Training StyleGAN on the LSUN-Stanford car dataset proved superior to training on the LSUN dataset alone by 3.7% using the Fréchet Inception Distance (FID) as a metric. The results indicate that the proposed LSUN-Stanford car dataset is more consistent and better suited for training GAN neural networks than other currently available large car datasets.
(This article belongs to the Special Issue Deep Image Semantic Segmentation and Recognition)
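The FID metric used in this evaluation compares the Gaussian statistics (mean and covariance) of Inception features extracted from real and generated images. A minimal sketch of the metric itself, with synthetic feature vectors standing in for Inception activations:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet Inception Distance between two feature sets (n_samples, dim).

    FID = ||mu_a - mu_b||^2 + Tr(Sa + Sb - 2 (Sa Sb)^{1/2})
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):       # discard tiny numerical imaginary parts
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 16))
same = rng.normal(size=(500, 16))              # same distribution: small FID
shifted = rng.normal(loc=1.0, size=(500, 16))  # shifted distribution: larger FID
print(fid(real, same) < fid(real, shifted))
```

Lower FID means the generated feature distribution is closer to the real one, which is how the 3.7% improvement above is measured.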
