Special Issue "Deep Learning for Computer Vision: Algorithms, Theory and Application"

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 30 September 2022 | Viewed by 17,500

Special Issue Editors

Prof. Dr. Jungong Han
Guest Editor
Data Science Group, University of Warwick, Coventry CV4 7AL, UK
Interests: video analytics; computer vision; machine learning; artificial intelligence
Prof. Dr. Guiguang Ding
Guest Editor
School of Software, Tsinghua University, Beijing 100084, China
Interests: multimedia analysis; computer vision; machine learning

Special Issue Information

Dear Colleagues,

In recent years, deep learning methods have made important breakthroughs in several fields, with computer vision being one of the most prominent cases. Compared with traditional machine learning methods that rely on handcrafted features, deep learning-based methods acquire knowledge directly from data and are hence capable of extracting much more abstract and semantic features with better representation capability. With cascaded layers that can contain hundreds of millions of parameters, they can model highly nonlinear functions. Despite these advances, issues such as handling noisy or distorted data, training deep networks with limited labeled data, and training models while protecting data confidentiality and integrity remain challenges for deep computer vision practitioners. As such, it is now necessary to explore advanced algorithms, theories, and optimization approaches as applied to deep computer vision.

This Special Issue aims to bring together researchers and scientists from various disciplines to present recent advances in dealing with the challenging problems of computer vision within the framework of deep learning. We invite authors to submit previously unpublished manuscripts on topics related to the theme of the Special Issue. The topics of interest include, but are not limited to:

  • Deep learning algorithms and models (supervised/weakly supervised/unsupervised)
  • Feature learning and feature representation based on deep learning
  • Deep learning-based image recognition
  • Deep learning-based video understanding
  • Deep learning-based remote sensing image analysis
  • Deep learning-based saliency/co-saliency detection
  • Deep learning-based visual object tracking
  • Deep learning-based image super-resolution
  • Deep learning-based image quality assessment
  • Deep network compression and acceleration

Prof. Dr. Jungong Han
Prof. Dr. Guiguang Ding
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website, then proceeding to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • artificial intelligence
  • neural networks
  • visual analysis
  • vision application

Published Papers (22 papers)


Research


Article
Automatic Modulation Classification with Neural Networks via Knowledge Distillation
Electronics 2022, 11(19), 3018; https://doi.org/10.3390/electronics11193018 - 22 Sep 2022
Viewed by 251
Abstract
Deep learning is widely used for automatic modulation recognition, and the demand for high classification accuracy has driven the use of ever-deeper networks. However, these are computationally very expensive to train and run, so their utility on mobile devices with limited memory or weak computational power is questionable. As a result, a trade-off between network depth and classification accuracy must be considered. To address this issue, we used a knowledge distillation method in this study to improve the classification accuracy of a small network model. First, we trained Inception–ResNet as a teacher network, which has a size of 311.77 MB and a final peak classification accuracy of 93.09%. We used the method to train convolutional neural network 3 (CNN3) and increased its peak classification accuracy from 79.81% to 89.36%, with a network size of 0.37 MB. It was used similarly to train mini Inception–ResNet and increase its peak accuracy from 84.18% to 93.59%, with a network size of 39.69 MB. Comparing all classification accuracy peaks, we found that knowledge distillation improved small networks and that the student network had the potential to outperform the teacher network. Using knowledge distillation, a small network model can achieve the classification accuracy of a large network model. In practice, choosing the appropriate student network based on the constraints of the usage conditions while using knowledge distillation (KD) is a way to meet practical needs. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision: Algorithms, Theory and Application)
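For readers who want the mechanics, a minimal PyTorch sketch of the soft-target distillation loss follows; the temperature T and mixing weight alpha are assumed hyperparameters, not values taken from the paper.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Weighted sum of a soft-target KL term and the usual cross-entropy.

    T and alpha are assumed hyperparameters, not values from the paper.
    """
    # Soften both distributions with temperature T; scale by T^2 so the
    # gradient magnitude stays comparable to the hard-label term.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```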
Article
A New Compact Method Based on a Convolutional Neural Network for Classification and Validation of Tomato Plant Disease
Electronics 2022, 11(19), 2994; https://doi.org/10.3390/electronics11192994 - 21 Sep 2022
Viewed by 177
Abstract
With recent advancements in classification methods across various domains, deep learning has shown remarkable results over traditional neural networks. In this work, a compact convolutional neural network (CNN) model with reduced computational complexity was developed for plant leaf classification; this three-layer model performs on par with the pretrained ResNet-101 model. The classification of disease in tomato plant leaf images of the healthy and disease classes from the PlantVillage (PV) database is discussed, and the models are further validated with images taken at "Krishi Vigyan Kendra Narayangaon (KVKN)," Pune, India. The disease categories were chosen based on their prevalence in Indian states. The proposed approach improves on other state-of-the-art methods, achieving classification accuracies of 99.13%, 99.51%, and 99.40% with the N1, N2, and N3 models, respectively, on the PV dataset. Experimental results demonstrate the validity of the proposed approach under complex background conditions. For the images captured at KVKN for predicting tomato plant leaf disease, the validation accuracy was 100% for the N1 model, 98.44% for the N2 model, and 96% for the N3 model. The training time for the developed N2 model was reduced by 89% compared to the ResNet-101 model. The developed models are smaller, more efficient, and less time-complex. This work is a step towards managing infected plants, which will help farmers and contribute to sustainable agriculture. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision: Algorithms, Theory and Application)
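As an illustration only, a three-convolutional-layer classifier of the kind described might be assembled as below; the channel widths and pooling choices are assumptions, not the paper's N1/N2/N3 specifications.

```python
import torch.nn as nn

class CompactCNN(nn.Module):
    """Hypothetical three-conv-layer leaf classifier; layer sizes assumed."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # global pooling keeps the head tiny
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```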

Article
Global Correlation Enhanced Hand Action Recognition Based on NST-GCN
Electronics 2022, 11(16), 2518; https://doi.org/10.3390/electronics11162518 - 11 Aug 2022
Viewed by 282
Abstract
Hand action recognition is an important part of intelligent monitoring, human–computer interaction, robotics, and other fields. Compared with other approaches, hand action recognition based on skeleton information can ignore errors caused by complex backgrounds and changes in movement speed, at a relatively small computational cost. The spatial-temporal graph convolution network (ST-GCN) model has excellent performance in skeleton-based action recognition. To address the problem that the root joint and more distant joints are not closely connected, which degrades hand action recognition, this paper first replaces the standard convolution in the temporal dimension with dilated convolution to process the time-series features of hand action videos, enlarging the receptive field in the temporal dimension and strengthening the connections between features. Then, by adding non-physical connections, links between the fingertip joints and the finger-root joints are established, and a new partition strategy is adopted to strengthen the hand correlation of each joint's information, improving the network's ability to extract spatial-temporal hand features. The improved model is tested on public datasets and in real scenarios. The experimental results show that, compared with the original model, the 14-category top-1 and 28-category top-1 metrics improve by 4.82% and 6.96%, respectively. In real scenes, recognition is better for categories with large changes in hand movement and poorer for categories with similar movement trends, so there is still room for improvement. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision: Algorithms, Theory and Application)
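A minimal sketch of the temporal-dilation idea on an ST-GCN-style tensor of shape (batch, channels, frames, joints); the kernel size, dilation factor, and 21-joint hand layout are assumptions.

```python
import torch
import torch.nn as nn

# Skeleton tensor layout: (batch, channels, frames T, joints V).
# A dilated temporal conv widens the receptive field along T without
# adding parameters relative to the standard temporal conv.
standard_tcn = nn.Conv2d(64, 64, kernel_size=(9, 1), padding=(4, 0))
dilated_tcn = nn.Conv2d(64, 64, kernel_size=(9, 1), padding=(8, 0),
                        dilation=(2, 1))   # effective temporal extent: 17

x = torch.randn(8, 64, 100, 21)            # 21 hand joints, hypothetical
assert standard_tcn(x).shape == x.shape    # padding preserves T and V
assert dilated_tcn(x).shape == x.shape
```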

Article
Face-Based CNN on Triangular Mesh with Arbitrary Connectivity
Electronics 2022, 11(15), 2466; https://doi.org/10.3390/electronics11152466 - 08 Aug 2022
Viewed by 314
Abstract
Applying convolutional neural networks (CNNs) to triangular meshes has always been a challenging task. Because of the complex structure of meshes, most existing methods apply CNNs to them indirectly and require complex preprocessing or transformation of the meshes. In this paper, we propose a novel face-based CNN that can be applied directly to triangular meshes with arbitrary connectivity by defining face convolution and pooling. The proposed approach takes each face of a mesh as the basic element, analogous to the pixels of 2D images in conventional CNNs. First, the intrinsic features of the faces are used as the input features of the network. Second, a sort convolution operation with adjustable kernel sizes is constructed to extract face features. Third, we design an approximately uniform pooling operation via learnable face collapse, which can be applied to meshes with arbitrary connectivity, and we directly use its inverse operation as unpooling. Extensive experiments show that the proposed approach is comparable to, or can even outperform, state-of-the-art methods in mesh classification and mesh segmentation. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision: Algorithms, Theory and Application)

Article
LLGF-Net: Learning Local and Global Feature Fusion for 3D Point Cloud Semantic Segmentation
Electronics 2022, 11(14), 2191; https://doi.org/10.3390/electronics11142191 - 13 Jul 2022
Viewed by 384
Abstract
Three-dimensional (3D) point cloud semantic segmentation is fundamental to complex scene perception. Although various efficient 3D semantic segmentation networks have been proposed, their overall performance still lags behind that of 2D image segmentation. Recently, transformer-based methods have opened a new stage in computer vision, which has also accelerated progress in 3D point cloud segmentation. In this paper, we propose a novel semantic segmentation network named LLGF-Net that aggregates features from both the local and global levels of point clouds, effectively improving the ability to extract feature information. Specifically, we adopt the multi-head attention mechanism from the original Transformer model to obtain local features of point clouds and then use the position-distance information of the points in 3D space to obtain global features. Finally, the local and global features are fused and embedded into an encoder–decoder network. Our extensive experimental results on 3D point cloud datasets demonstrate the effectiveness and superiority of our method. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision: Algorithms, Theory and Application)
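A sketch of how stock multi-head attention can produce local point features when applied per neighborhood; the k-NN grouping, dimensions, and pooling are assumptions, and nn.MultiheadAttention stands in for the paper's exact block.

```python
import torch
import torch.nn as nn

# Each point attends only to its k nearest neighbors; flattening to
# (B*N, k, C) makes the attention strictly local to each neighborhood.
B, N, k, C = 2, 1024, 16, 64
neighbors = torch.randn(B * N, k, C)       # grouped k-NN features (assumed)

attn = nn.MultiheadAttention(embed_dim=C, num_heads=4, batch_first=True)
local_feat, _ = attn(neighbors, neighbors, neighbors)  # self-attention
local_feat = local_feat.max(dim=1).values  # pool over the neighborhood
print(local_feat.shape)                    # (B*N, C): one feature per point
```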

Article
Depth Estimation of Monocular PCB Image Based on Self-Supervised Convolution Network
Electronics 2022, 11(12), 1812; https://doi.org/10.3390/electronics11121812 - 07 Jun 2022
Viewed by 433
Abstract
To improve the accuracy of deep neural networks in predicting the depth information of a single image, we propose an unsupervised convolutional neural network for single-image depth estimation. First, the network is improved by introducing a dense residual module into the encoder–decoder structure. Second, an optimized hybrid attention module is introduced into the network. Finally, stereo image pairs are used as the training data to realize end-to-end single-image depth estimation. Experimental results on the KITTI and Cityscapes datasets show that, compared with some classical algorithms, the proposed method achieves better accuracy and lower error. In addition, we train our models on PCB datasets from industrial environments; experiments in several scenarios verify the generalization ability and excellent performance of the model. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision: Algorithms, Theory and Application)

Article
Accelerated Diagnosis of Novel Coronavirus (COVID-19)—Computer Vision with Convolutional Neural Networks (CNNs)
Electronics 2022, 11(7), 1148; https://doi.org/10.3390/electronics11071148 - 06 Apr 2022
Cited by 1 | Viewed by 627
Abstract
Early detection and diagnosis of COVID-19, as well as the exact separation of non-COVID-19 cases in a non-invasive manner in the earliest stages of the disease, are critical concerns in the current COVID-19 pandemic. Convolutional Neural Network (CNN)-based models offer a remarkable capacity for providing an accurate and efficient system for the detection and diagnosis of COVID-19. Due to the limited availability of RT-PCR (reverse transcription–polymerase chain reaction) tests in developing countries, imaging-based techniques could offer an alternative and affordable way to detect COVID-19 symptoms. This paper reviews current CNN-based approaches and investigates a custom-designed CNN method to detect COVID-19 symptoms from CT (computed tomography) chest scan images. This study demonstrates an integrated method to accelerate the process of classifying CT scan images. To improve the computational time, a hardware-based acceleration method was investigated and implemented on a reconfigurable platform (FPGA). Experimental results highlight the differences between various approximations of the design, providing a range of design options corresponding to both software and hardware. The FPGA-based implementation used a reduced pre-processed feature vector for the classification task, which is a unique advantage of this particular application. To demonstrate the applicability of the proposed method, results from the CPU-based classification and the FPGA were measured separately and compared retrospectively. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision: Algorithms, Theory and Application)

Article
CondNAS: Neural Architecture Search for Conditional CNNs
Electronics 2022, 11(7), 1101; https://doi.org/10.3390/electronics11071101 - 31 Mar 2022
Viewed by 518
Abstract
As deep learning has become prevalent and been adopted in various application domains, the need for efficient convolutional neural network (CNN) inference on diverse target platforms has increased. To address this need, a neural architecture search (NAS) technique called once-for-all (OFA), which aims to efficiently find the optimal CNN architecture for a given target platform using a genetic algorithm (GA), was recently proposed. Meanwhile, conditional CNN architectures, which allow early exits with auxiliary classifiers in the middle of a network to achieve efficient inference with no or negligible accuracy loss, have also been proposed. In this paper, we propose CondNAS, a NAS technique that efficiently finds a near-optimal conditional CNN architecture for the target platform using a GA. By attaching auxiliary classifiers through adaptive pooling, OFA's SuperNet is successfully extended to incorporate the various conditional CNN sub-networks. In addition, we devise machine learning-based prediction models for the accuracy and latency of an arbitrary conditional CNN, which are used in the GA of CondNAS to efficiently explore the large search space. The experimental results show that the conditional CNNs from CondNAS are 2.52× and 1.75× faster than the CNNs from OFA on the Galaxy Note10+ GPU and CPU, respectively. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision: Algorithms, Theory and Application)

Article
Region Resolution Learning and Region Segmentation Learning with Overall and Body Part Perception for Pedestrian Detection
Electronics 2022, 11(6), 966; https://doi.org/10.3390/electronics11060966 - 21 Mar 2022
Viewed by 555
Abstract
Pedestrian detection is a great challenge, especially in complex and diverse occlusion environments. When a pedestrian is occluded, the visible part becomes incomplete, and the body bounding box contains part of the pedestrian together with other objects and background. Based on this, we attempt different methods to help the detector learn more pedestrian features under different occlusion situations. First, we propose region resolution learning, which learns the pedestrian regions on the input image. Second, we propose fine-grained segmentation learning to learn the outline and shape of different parts of pedestrians. We propose an anchor-free approach that combines the pedestrian detector CSP with region Resolution learning and Segmentation learning (CSPRS), helping the detector learn extra features. CSPRS provides another way to perceive pixels, outlines, and shapes in pedestrian areas, and both the region resolution branch and the segmentation branch help the detector locate pedestrians. By simply adding these two branches, CSPRS achieves good results. The experimental results show that both methods of learning pedestrian features improve performance. We evaluate CSPRS on the CityPersons benchmark, where it achieves 42.53% on the heavy-occlusion subset. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision: Algorithms, Theory and Application)

Article
Lane following Learning Based on Semantic Segmentation with Chroma Key and Image Superposition
Electronics 2021, 10(24), 3113; https://doi.org/10.3390/electronics10243113 - 14 Dec 2021
Viewed by 827
Abstract
There are various techniques to approach learning in autonomous driving; however, all of them suffer from some problems. In the case of imitation learning based on artificial neural networks, the system must learn to correctly identify the elements of the environment, and in some cases tagging the images with the proper semantics takes a lot of effort. This matters given the need for very varied training scenarios in order to obtain an acceptable generalization capacity. In the present work, we propose a technique for automated semantic labeling based on several learning phases that use image superposition, combining chroma-key scenarios with real indoor scenarios. This allows the generation of augmented datasets that facilitate the learning process. Further improvements obtained by applying noise techniques are also studied. For validation, a small-scale car model is used that learns to drive automatically on a reduced circuit. A comparison with models that do not rely on semantic segmentation is also performed. The main contribution of our proposal is the possibility of generating datasets for real indoor scenarios with automatic semantic segmentation, without the need for endless human labeling tasks. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision: Algorithms, Theory and Application)
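A minimal OpenCV sketch of the chroma-key superposition step; the HSV bounds for the green screen are assumptions. The byproduct mask is what makes the semantic labeling automatic.

```python
import cv2

def composite(chroma_frame, background):
    """Replace green-screen pixels with a real indoor background, returning
    the composite plus a free segmentation mask (HSV bounds are assumed)."""
    hsv = cv2.cvtColor(chroma_frame, cv2.COLOR_BGR2HSV)
    green = cv2.inRange(hsv, (40, 60, 60), (80, 255, 255))  # chroma pixels
    mask = cv2.bitwise_not(green)                            # foreground
    fg = cv2.bitwise_and(chroma_frame, chroma_frame, mask=mask)
    bg = cv2.bitwise_and(background, background, mask=green)
    return cv2.add(fg, bg), mask  # mask doubles as an automatic label
```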

Article
A Semantic Segmentation Method for Early Forest Fire Smoke Based on Concentration Weighting
Electronics 2021, 10(21), 2675; https://doi.org/10.3390/electronics10212675 - 31 Oct 2021
Viewed by 624
Abstract
Forest fire smoke detection based on deep learning has been widely studied. Labeling smoke images is a necessity when building datasets for target detection and semantic segmentation, but the uncertainty in labeling forest fire smoke pixels caused by the non-uniform diffusion of smoke particles affects the recognition accuracy of deep learning models. To overcome this labeling ambiguity, this paper proposes a concentration-weighting idea. First, a pixel-concentration relationship between the gray value and the concentration of forest fire smoke pixels in the image is established. Second, the loss function of the semantic segmentation method is rebuilt around concentration weighting so that the network attends to smoke pixels differently, better segmenting smoke by weighting the loss calculation of smoke pixels. Finally, the optimum weighting factors are selected through experiments on the established forest fire smoke dataset. The mIoU of the weighted method is 1.52% higher than that of the unweighted method. The weighted method can not only be applied to the semantic segmentation and target detection of forest fire smoke but is also relevant to the recognition of other dispersive targets. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision: Algorithms, Theory and Application)
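A minimal sketch of weighting the per-pixel segmentation loss by smoke concentration; the linear mapping from gray value to weight and its bounds are assumptions, since the paper derives its own pixel-concentration relationship.

```python
import torch
import torch.nn.functional as F

def concentration_weighted_loss(logits, target, gray, w_min=0.5, w_max=1.5):
    """Cross-entropy where each smoke pixel is weighted by concentration.

    logits: (B, C, H, W); target: (B, H, W) class indices (0 = background);
    gray: (B, H, W) gray values in [0, 1] used as a concentration proxy.
    w_min/w_max are assumed bounds on the weighting factor.
    """
    per_pixel = F.cross_entropy(logits, target, reduction="none")  # (B, H, W)
    weight = w_min + (w_max - w_min) * gray   # denser smoke -> larger weight
    weight = torch.where(target > 0, weight, torch.ones_like(weight))
    return (weight * per_pixel).mean()
```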

Article
Multi-Stage Attention-Enhanced Sparse Graph Convolutional Network for Skeleton-Based Action Recognition
Electronics 2021, 10(18), 2198; https://doi.org/10.3390/electronics10182198 - 08 Sep 2021
Viewed by 633
Abstract
Graph convolutional networks (GCNs), which model human actions as a series of spatial-temporal graphs, have recently achieved superior performance in skeleton-based action recognition. However, existing methods mostly use the physical connections of joints to construct the spatial graph, resulting in limited topological information about the human skeleton, and action features in the time domain have not been fully explored. To better extract spatial-temporal features, we propose a multi-stage attention-enhanced sparse graph convolutional network (MS-ASGCN) for skeleton-based action recognition. To capture more abundant joint dependencies, we propose a new strategy for constructing skeleton graphs that simulates bidirectional information flows between neighboring joints and pays greater attention to the information transmission between sparse joints. In addition, a part attention mechanism is proposed to learn the weight of each body part and enhance part-level feature learning. We introduce multiple streams at different stages and merge them in specific layers of the network to further improve performance. Our model is verified on two large-scale datasets, NTU-RGB+D and Skeleton-Kinetics. Experiments demonstrate that the proposed MS-ASGCN outperforms previous state-of-the-art methods on both datasets. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision: Algorithms, Theory and Application)

Article
Increasing Information Entropy of Both Weights and Activations for the Binary Neural Networks
Electronics 2021, 10(16), 1943; https://doi.org/10.3390/electronics10161943 - 12 Aug 2021
Cited by 1 | Viewed by 562
Abstract
In terms of memory footprint and computing speed, binary neural networks (BNNs) have great advantages in power-aware deployment applications, such as AIoT edge terminals and wearable and portable devices. However, the binarization process inevitably brings considerable information loss and, in turn, accuracy deterioration. To tackle these problems, we analyze the networks from an information-theoretic perspective and improve their information capacity. Based on these analyses, our work has two primary contributions. The first is a newly proposed median loss (ML) regularization technique, which makes the binary weight distribution more even and consequently greatly increases the information capacity of BNNs. The second is the batch median of activations (BMA) method, which raises the entropy of activations by subtracting a median value while lowering the quantization error by computing separate scaling factors for the positive and negative activations. Experimental results show that the proposed methods applied to ResNet-18 and ResNet-34 outperform the Bi-Real baseline by 1.3% and 0.9% Top-1 accuracy, respectively, on ImageNet 2012. The storage and computation overheads of ML and BMA are minor and negligible. Comprehensive experiments also show that our methods can be embedded into current popular BNNs with accuracy improvements and negligible overhead. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision: Algorithms, Theory and Application)
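A simplified sketch of the BMA idea as described: center activations on their median, then binarize with separate scaling factors for each sign. The paper's exact scaling procedure may differ.

```python
import torch

def bma_binarize(x):
    """Center activations on their median, then binarize with separate
    scaling factors for the positive and negative halves (simplified)."""
    centered = x - x.median()             # ~half the values on each sign,
                                          # which maximizes binary entropy
    pos = centered > 0
    alpha_p = centered[pos].abs().mean()  # scale for +1 activations
    alpha_n = centered[~pos].abs().mean() # scale for -1 activations
    return torch.where(pos, alpha_p, -alpha_n)
```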

Article
Improving Heterogeneous Network Knowledge Transfer Based on the Principle of Generative Adversarial
Electronics 2021, 10(13), 1525; https://doi.org/10.3390/electronics10131525 - 24 Jun 2021
Cited by 9 | Viewed by 599
Abstract
Deep learning requires large datasets to train deep neural network models for specific tasks, so training a new model is a very costly undertaking. Research on transfer networks that reduce training costs will be the next turning point in deep learning research. Using source task models to help reduce the training cost of target task models, especially across heterogeneous systems, is the problem we study. To quickly obtain an excellent target task model driven by the source task model, we propose a novel transfer learning approach. The model linearly transforms the feature mapping of the target domain and increases the weight value for feature matching to realize knowledge transfer between heterogeneous networks, and it adds a domain discriminator based on the generative adversarial principle to speed up feature mapping and learning. Most importantly, this paper proposes a new objective-function optimization scheme to complete model training, successfully combining the generative adversarial network with the weight feature matching method to ensure that the target model learns from the source domain the features most beneficial to its task. Compared with previous transfer algorithms, our training results are excellent under the same benchmark for image recognition tasks. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision: Algorithms, Theory and Application)

Article
VRBagged-Net: Ensemble Based Deep Learning Model for Disaster Event Classification
Electronics 2021, 10(12), 1411; https://doi.org/10.3390/electronics10121411 - 11 Jun 2021
Viewed by 863
Abstract
A flood is an overflow of water that swamps dry land, and its gravest effects are the loss of human life and economic losses. An early warning of these events can be very effective in minimizing the losses. Social media websites such as Twitter and Facebook are quite effective in the efficient dissemination of information pertinent to any emergency: users share both text and rich content such as images and videos. The Multimedia Evaluation Benchmark (MediaEval) offers challenges in the form of shared tasks to develop and evaluate new algorithms, approaches, and technologies for the exploration and exploitation of multimedia in decision making for real-time problems. Since 2015, MediaEval has run a shared task on predicting several aspects of flooding, and many improvements have been observed through these tasks. In this paper, the classification framework VRBagged-Net is proposed and implemented for flood classification. The framework combines the deep learning models Visual Geometry Group (VGG) and Residual Network (ResNet) with bootstrap aggregating (bagging). Various disaster-based datasets from the MediaEval Benchmark Workshop were selected to validate the framework: Disaster Image Retrieval from Social Media (DIRSM), Flood Classification for Social Multimedia (FCSM), and Image-Based News Topic Disambiguation (INTD). VRBagged-Net performed encouragingly well on all these datasets with slightly different but related tasks. On DIRSM, it produces a mean average precision at different levels of 98.12 and an average precision at 480 of 93.64. On FCSM, it produces an F1 score of 90.58. On INTD, it exceeds the previous best result with an F1 score of 93.76. With a slight modification, VRBagged-Net also ranked first in the flood-related Multimedia Task at the MediaEval Workshop 2020. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision: Algorithms, Theory and Application)
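A generic sketch of the bagging side of such a framework: train each member on a bootstrap resample and average the predicted class probabilities. The model interface (predict_proba) is an assumed placeholder, not taken from the paper.

```python
import numpy as np

def bootstrap_indices(n, rng):
    """Sample n indices with replacement: the 'bagging' resample each
    ensemble member (e.g., a VGG or ResNet) would be trained on."""
    return rng.integers(0, n, size=n)

def bagged_predict(models, x):
    """Average class probabilities across independently trained members
    and take the argmax; models expose an assumed predict_proba(x)."""
    probs = np.stack([m.predict_proba(x) for m in models])  # (M, N, classes)
    return probs.mean(axis=0).argmax(axis=1)
```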

Article
Crowd Counting Using End-to-End Semantic Image Segmentation
Electronics 2021, 10(11), 1293; https://doi.org/10.3390/electronics10111293 - 28 May 2021
Cited by 4 | Viewed by 1328
Abstract
Crowd counting is an active research area within scene analysis. Over the last 20 years, researchers have proposed various algorithms for crowd counting in real-time scenarios, owing to its many applications in disaster management systems, public events, safety monitoring, and so on. In this paper, we propose an end-to-end semantic segmentation framework for counting in densely crowded images. The framework is based on semantic scene segmentation using an optimized convolutional neural network; it highlights the foreground and suppresses the background. The framework encodes high-density maps through a guided attention mechanism, and the crowd count is obtained by integrating the density maps. The proposed algorithm classifies the count in each image into groups to adapt to variations in crowd density, and it handles the scale variations of crowded images through multi-scale features extracted from the images. We conducted experiments on four standard crowd-counting datasets, reporting better results than previous methods. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision: Algorithms, Theory and Application)
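For context, density-map-based counting recovers the head count by integrating (summing) the predicted map, whose ground truth is built so each person contributes unit mass; a one-line sketch:

```python
def count_from_density(density_map):
    """Head count = integral of the density map; (B, 1, H, W) -> (B,)."""
    return density_map.sum(dim=(1, 2, 3))
```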

Article
Traffic Police Gesture Recognition Based on Gesture Skeleton Extractor and Multichannel Dilated Graph Convolution Network
Electronics 2021, 10(5), 551; https://doi.org/10.3390/electronics10050551 - 26 Feb 2021
Cited by 4 | Viewed by 825
Abstract
Traffic police gesture recognition is important in automatic driving. Most existing traffic police gesture recognition methods extract pixel-level features from RGB images; these are uninterpretable because of the lack of gesture skeleton features and may yield inaccurate recognition due to background noise. Existing deep learning methods are also not suited to handling gesture skeleton features because they ignore the intrinsic connection between skeleton joint coordinates and gestures. To alleviate these issues, a traffic police gesture recognition method based on a gesture skeleton extractor (GSE) and a multichannel dilated graph convolution network (MD-GCN) is proposed. To extract discriminative and interpretable skeleton coordinate information, the GSE extracts skeleton coordinates and removes redundant skeleton joints and bones. In the gesture discrimination stage, GSE-based features are fed into the proposed MD-GCN, which constructs a multichannel dilated graph convolution to enlarge the receptive field and extracts body-topological and spatiotemporal action features from the skeleton coordinates. Comparison experiments with state-of-the-art methods were conducted on a public dataset. The results show that the proposed method achieves an accuracy of 98.95%, the best result and at least 6% higher than that of the other methods. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision: Algorithms, Theory and Application)

Article
Overwater Image Dehazing via Cycle-Consistent Generative Adversarial Network
Electronics 2020, 9(11), 1877; https://doi.org/10.3390/electronics9111877 - 08 Nov 2020
Cited by 1 | Viewed by 1318
Abstract
In contrast to images of land scenes, images taken over water are more prone to degradation from haze. However, existing image dehazing methods are mainly developed for land-scene images and perform poorly when applied to overwater images. To address this problem, we collect the first overwater image dehazing dataset and propose a Generative Adversarial Network (GAN)-based method called OverWater Image Dehazing GAN (OWI-DehazeGAN). Because collecting paired hazy and clean images is difficult, the dataset contains unpaired hazy and clean images taken over water. The proposed OWI-DehazeGAN is composed of an encoder–decoder framework, supervised by a forward-backward translation consistency loss for self-supervision and a perceptual loss for content preservation. In addition to qualitative evaluation, we design an image quality assessment neural network to rank the dehazed images. Experimental results on both real and synthetic test data demonstrate that the proposed method performs favorably against several state-of-the-art land dehazing methods. Compared with the state-of-the-art, our method gains significant improvements of 1.94% in SSIM, 7.13% in PSNR, and 4.00% in CIEDE2000 on the synthetic test dataset. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision: Algorithms, Theory and Application)
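The forward-backward translation consistency term is, in spirit, the CycleGAN cycle-consistency loss; a sketch with assumed generator names and an assumed weighting lam:

```python
import torch.nn.functional as F

def cycle_loss(G_dehaze, G_rehaze, hazy, clean, lam=10.0):
    """Unpaired training: each image should survive a round trip through
    both generators. lam is an assumed weight, as in CycleGAN."""
    fwd = F.l1_loss(G_rehaze(G_dehaze(hazy)), hazy)    # hazy -> clean -> hazy
    bwd = F.l1_loss(G_dehaze(G_rehaze(clean)), clean)  # clean -> hazy -> clean
    return lam * (fwd + bwd)
```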

Article
Robust Image Classification with Cognitive-Driven Color Priors
Electronics 2020, 9(11), 1837; https://doi.org/10.3390/electronics9111837 - 03 Nov 2020
Cited by 1 | Viewed by 605
Abstract
Existing image classification methods based on convolutional neural networks usually use a large number of samples to learn classification features hierarchically, which causes over-fitting and layer-by-layer error propagation; they are thus vulnerable to adversarial samples generated by adding imperceptible disturbances to input samples. To address this issue, we propose a cognitive-driven color prior model, inspired by the characteristics of human memory, that memorizes the color attributes of target samples. At the inference stage, color priors are indexed from the memory and fused with the features of the convolutional neural network to achieve robust image classification. The proposed color prior model is cognitive-driven and has no training parameters, so it generalizes strongly and can effectively defend against adversarial samples. In addition, our method directly combines the prior model's features with the classification probability of the convolutional neural network, without changing the network structure or parameters of the existing algorithm. It can be combined with other adversarial defense methods, such as preprocessing modules like PixelDefense or adversarial training, to further improve the robustness of image classification. Experiments on several benchmark datasets show that the proposed method improves the anti-interference ability of image classification algorithms. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision: Algorithms, Theory and Application)
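One simple way to realize the fusion of a training-free prior with the network's class probabilities is a convex combination; the mixing rule and weight below are assumptions, not the paper's exact formulation.

```python
def fuse_with_prior(cnn_probs, prior_probs, beta=0.3):
    """Blend CNN softmax output with color-prior class probabilities.

    cnn_probs, prior_probs: (B, num_classes) tensors of probabilities;
    beta is an assumed mixing weight. Renormalize to keep a distribution.
    """
    fused = (1.0 - beta) * cnn_probs + beta * prior_probs
    return fused / fused.sum(dim=1, keepdim=True)
```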

Article
Efficient Facial Landmark Localization Based on Binarized Neural Networks
Electronics 2020, 9(8), 1236; https://doi.org/10.3390/electronics9081236 - 31 Jul 2020
Viewed by 1362
Abstract
Facial landmark localization is a significant yet challenging computer vision task whose accuracy has been remarkably improved by the successful application of deep Convolutional Neural Networks (CNNs). However, CNNs require huge storage and computation overhead, impeding their deployment on computationally limited platforms. In this paper, to the best of our knowledge, efficient facial landmark localization is implemented via binarized CNNs for the first time. We introduce a new network architecture to compute the binarized models, referred to as Amplitude Convolutional Networks (ACNs), based on the proposed asynchronous back-propagation algorithm. We can efficiently recover the full-precision filters using only a single factor in an end-to-end manner, and the efficiency of CNNs for facial landmark localization is further improved by the extremely compressed 1-bit ACNs. Our ACNs reduce the storage space of convolutional filters by a factor of 32 compared with the full-precision models on the LFW+Webface, CelebA, BioID, and 300W datasets, while achieving performance comparable to full-precision facial landmark localization algorithms. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision: Algorithms, Theory and Application)
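Recovering full-precision filters from 1-bit weights with a single factor typically follows the XNOR-Net-style rule alpha * sign(w); a sketch (whether ACNs use exactly this per-filter mean is an assumption):

```python
def binarize_filter(w):
    """Approximate w by alpha * sign(w) with alpha = mean(|w|) per filter.

    w: (out_channels, in_channels, kh, kw). Storing 1 bit per weight plus
    one factor per filter is what shrinks storage by roughly 32x.
    """
    alpha = w.abs().mean(dim=(1, 2, 3), keepdim=True)  # (out, 1, 1, 1)
    return alpha * w.sign()
```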

Article
Pruning Convolutional Neural Networks with an Attention Mechanism for Remote Sensing Image Classification
Electronics 2020, 9(8), 1209; https://doi.org/10.3390/electronics9081209 - 27 Jul 2020
Cited by 15 | Viewed by 2059
Abstract
Despite the great success of Convolutional Neural Networks (CNNs) in various visual recognition tasks, the high computational and storage costs of such deep networks impede their deployment in real-time remote sensing tasks. To this end, considerable attention has been given to filter pruning techniques, which slim deep networks with acceptable performance drops so that they can run on remote sensing devices. In this paper, we propose a new scheme, termed Pruning Filter with Attention Mechanism (PFAM), to compress and accelerate traditional CNNs. In particular, a novel correlation-based filter pruning criterion, which explores the long-range dependencies among filters via an attention module, is employed to select the to-be-pruned filters. Distinct from previous methods, the less correlated filters are pruned first after the pruning stage in the current training epoch, and they are reconstructed and updated during the next training epoch. This allows the input data to be processed with maximum information preserved while executing the original training strategy, so that the compressed network can be obtained without a pretrained model. The proposed method is evaluated on three public remote sensing image datasets, and the experimental results demonstrate its superiority over state-of-the-art baselines. Specifically, PFAM achieves a 0.67% accuracy improvement with a 40% model-size reduction on the Aerial Image Dataset (AID). Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision: Algorithms, Theory and Application)
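A simplified sketch of a correlation-based pruning criterion: score each filter by its mean absolute correlation with the others and prune the least correlated. Plain Pearson correlation stands in here for the paper's attention-derived dependencies.

```python
import torch

def least_correlated_filters(weight, prune_ratio=0.4):
    """Return indices of the filters least correlated with the rest.

    weight: (out_channels, in_channels, kh, kw) conv weight. Each filter is
    flattened to a vector; torch.corrcoef gives pairwise Pearson r between
    filters. The self-correlation (always 1) is excluded from the score.
    """
    flat = weight.flatten(1)                      # (out_channels, -1)
    corr = torch.corrcoef(flat).abs()             # pairwise |r|
    score = (corr.sum(dim=1) - 1.0) / (corr.shape[0] - 1)
    n_prune = int(prune_ratio * weight.shape[0])
    return score.argsort()[:n_prune]              # lowest scores first
```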

Review


Review
Face Image Analysis Using Machine Learning: A Survey on Recent Trends and Applications
Electronics 2022, 11(8), 1210; https://doi.org/10.3390/electronics11081210 - 11 Apr 2022
Cited by 1 | Viewed by 504
Abstract
Human face image analysis using machine learning is an important element in computer vision. The human face image conveys information such as age, gender, identity, emotion, race, and attractiveness to both human and computer systems. Over the last ten years, face analysis methods using machine learning have received immense attention due to their diverse applications in various tasks. Although several methods have been reported in the last ten years, face image analysis still represents a complicated challenge, particularly for images obtained from ’in the wild’ conditions. This survey paper presents a comprehensive review focusing on methods in both controlled and uncontrolled conditions. Our work illustrates both merits and demerits of each method previously proposed, starting from seminal works on face image analysis and ending with the latest ideas exploiting deep learning frameworks. We show a comparison of the performance of the previous methods on standard datasets and also present some promising future directions on the topic. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision: Algorithms, Theory and Application)

Planned Papers

The list below represents only planned manuscripts. Some of these manuscripts have not yet been received by the Editorial Office. Papers submitted to MDPI journals are subject to peer review.

Title: Face Image Analysis Using Machine Learning: A Survey on Recent Trends and Applications
Authors: Khalil Khan
Affiliation: Institute of Applied Sciences and Technology
Abstract: Human face image analysis using machine learning is an important cue in computer vision. The human face image conveys information such as age, gender, identity, emotion, race, and attractiveness to both human and computer systems. Over the last ten years, face analysis methods using machine learning have received immense attention due to their diverse applications in various tasks. Although several methods have been reported in the last ten years, face image analysis still represents a complicated challenge, particularly for images obtained in 'in the wild' conditions. This survey paper presents a comprehensive review focusing on methods in both controlled and uncontrolled conditions. Our work illustrates both merits and demerits of each method previously proposed, starting from seminal works on face image analysis and ending with the latest ideas exploiting deep learning frameworks. We show a comparison of the performance of the previous methods on standard datasets and also present some promising future directions on the topic.
