Special Issue "Image and Video Processing and Recognition Based on Artificial Intelligence"

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: 31 December 2020.

Special Issue Editors

Prof. Dr. Kang Ryoung Park
Guest Editor
Division of Electronics and Electrical Engineering, Dongguk University, 30, Pildong-ro 1-gil, Jung-gu, Seoul 100-715, Republic of Korea
Interests: deep learning; biometrics; image processing
Prof. Dr. Sangyoun Lee
Guest Editor
School of Electrical and Electronic Engineering, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 120-749, Republic of Korea
Interests: human detection & recognition; gesture recognition; face recognition; HEVC
Prof. Dr. Euntai Kim
Guest Editor
School of Electrical and Electronic Engineering, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 120-749, Republic of Korea
Interests: pedestrian and vehicle detection & recognition; vision for advanced driver assistance systems (ADAS); robot vision

Special Issue Information

Dear Colleagues,

Recent developments have led to the widespread application of artificial intelligence (AI) techniques to image and video processing and recognition. Although the state-of-the-art technology has matured, its performance is still affected by various environmental conditions and heterogeneous databases. The purpose of this Special Issue is to invite high-quality, state-of-the-art academic papers on challenging issues in the field of AI-based image and video processing and recognition. We solicit original papers reporting unpublished, completed research that is not currently under review by any other conference, magazine, or journal. Topics of interest include, but are not limited to, the following:

  • AI-based image processing, understanding, recognition, compression, and reconstruction;
  • AI-based video processing, understanding, recognition, compression, and reconstruction;
  • Computer vision based on AI;
  • AI-based biometrics;
  • AI-based object detection and tracking;
  • Approaches that combine AI techniques and conventional methods for image and video processing and recognition;
  • Explainable AI (XAI) for image and video processing and recognition;
  • Generative adversarial network (GAN)-based image and video processing and recognition;
  • Approaches that combine AI techniques and blockchain methods for image and video processing and recognition.

Prof. Dr. Kang Ryoung Park
Prof. Dr. Sangyoun Lee
Prof. Dr. Euntai Kim
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Image processing, understanding, recognition, compression, and reconstruction based on AI
  • Video processing, understanding, recognition, compression, and reconstruction based on AI
  • Computer vision based on AI
  • Biometrics based on AI
  • Fusion of AI and conventional methods
  • XAI and GAN
  • Fusion of AI and blockchain methods

Published Papers (11 papers)


Research

Open Access Article
Target Recognition in Infrared Circumferential Scanning System via Deep Convolutional Neural Networks
Sensors 2020, 20(7), 1922; https://doi.org/10.3390/s20071922 - 30 Mar 2020
Abstract
With an infrared circumferential scanning system (IRCSS), we can realize long-time surveillance over a large field of view. Automatically recognizing targets in the field of view is a crucial component of improving environmental awareness, especially in defense systems. Target recognition consists of two subtasks: detection and identification, corresponding to the position and category of the target, respectively. In this study, we propose a deep convolutional neural network (DCNN)-based method to realize end-to-end target recognition in the IRCSS. Existing DCNN-based methods require a large annotated dataset for training, while public infrared datasets are mostly intended for target tracking. We therefore build an infrared target recognition dataset to both overcome the shortage of data and enhance the adaptability of the algorithm to various scenes. We then use data augmentation and exploit the optimal cross-domain transfer learning strategy for network training. In this process, we design the smoother L1 loss function for bounding-box regression to obtain better localization performance. In the experiments, the proposed method achieved 82.7 mAP, accomplishing accurate end-to-end infrared target recognition.
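
The "smoother L1" above is the paper's refinement of the standard smooth-L1 (Huber-style) loss commonly used for bounding-box regression. As a reference point only, here is a minimal NumPy sketch of the standard smooth-L1 loss; the exact smoother variant is defined in the paper itself.

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Standard smooth-L1 (Huber-style) bounding-box regression loss:
    quadratic for small residuals, linear for large ones."""
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta).mean()

# Example: regression residuals for four box coordinates
pred = np.array([0.2, 1.5, -0.3, 0.8])
target = np.array([0.0, 1.0, 0.0, 1.0])
print(smooth_l1(pred, target))
```
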
Open Access Article
Ultrasound Image-Based Diagnosis of Malignant Thyroid Nodule Using Artificial Intelligence
Sensors 2020, 20(7), 1822; https://doi.org/10.3390/s20071822 - 25 Mar 2020
Abstract
Computer-aided diagnosis systems have been developed to assist doctors in diagnosing thyroid nodules and to reduce the errors made by traditional diagnosis methods, which are based mainly on the experience of doctors. The performance of such systems therefore plays an important role in enhancing the quality of the diagnosis task. Although state-of-the-art studies have addressed this problem using handcrafted features, deep features, or a combination of the two, their performance is still limited. To overcome these limitations, we propose an ultrasound image-based method for diagnosing malignant thyroid nodules using artificial intelligence, based on analysis in both the spatial and frequency domains. Additionally, we propose the use of a weighted binary cross-entropy loss function for training the deep convolutional neural networks, to reduce the effect of unbalanced training samples of the target classes in the training data. Through experiments with a popular open dataset, the thyroid digital image database (TDID), we confirm the superiority of our method over state-of-the-art methods.
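
A weighted binary cross-entropy addresses class imbalance by scaling each class's contribution to the loss, so the scarcer class (e.g., malignant nodules) is not drowned out by the majority class. A minimal sketch, assuming the weights come from inverse class frequencies; the exact weighting scheme is defined in the paper.

```python
import numpy as np

def weighted_bce(p, y, w_pos, w_neg, eps=1e-7):
    """Binary cross-entropy with per-class weights.
    p: predicted probability of the positive class, y: labels in {0, 1}."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(w_pos * y * np.log(p) + w_neg * (1 - y) * np.log(1 - p))

# Illustrative weights derived from inverse class frequency
y = np.array([1, 0, 0, 0, 1])
p = np.array([0.7, 0.2, 0.4, 0.1, 0.9])
w_pos = len(y) / (2 * y.sum())
w_neg = len(y) / (2 * (len(y) - y.sum()))
print(weighted_bce(p, y, w_pos, w_neg))
```
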

Open Access Article
Presentation Attack Face Image Generation Based on a Deep Generative Adversarial Network
Sensors 2020, 20(7), 1810; https://doi.org/10.3390/s20071810 - 25 Mar 2020
Abstract
Although face-based biometric recognition systems have been widely used in many applications, this type of recognition method is still vulnerable to presentation attacks, which use fake samples to deceive the recognition system. To overcome this problem, presentation attack detection (PAD) methods for face recognition systems (face-PAD), which aim to classify real and presentation attack face images before performing a recognition task, have been developed. However, the performance of PAD systems is limited and biased due to the lack of presentation attack images for training. In this paper, we propose a method for artificially generating presentation attack face images by learning the characteristics of real and presentation attack images from a few captured images. As a result, our method helps save the time spent collecting presentation attack samples for training PAD systems and potentially enhances their performance. Our study is the first attempt to generate PA face images for PAD systems based on CycleGAN, a deep-learning-based framework for image generation. In addition, we propose a new measurement method to evaluate the quality of generated PA images based on a face-PAD system. Through experiments with two public datasets (CASIA and Replay-mobile), we show that the generated face images capture the characteristics of presentation attack images, making them usable as captured presentation attack samples for PAD system training.
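
The core of CycleGAN is a cycle-consistency term: an image translated into the attack domain and back should reconstruct the original. A minimal PyTorch sketch of that term under assumed generator names (`G_ra`: real to attack, `G_ar`: attack to real); the full CycleGAN objective also includes adversarial and identity losses.

```python
import torch
import torch.nn.functional as F

def cycle_loss(G_ra, G_ar, real, attack, lam=10.0):
    """L1 cycle-consistency: real -> attack -> real and
    attack -> real -> attack should both reconstruct their input."""
    return lam * (F.l1_loss(G_ar(G_ra(real)), real)
                  + F.l1_loss(G_ra(G_ar(attack)), attack))

# Toy check with identity "generators" (loss is zero by construction)
x = torch.rand(2, 3, 64, 64)
print(cycle_loss(lambda t: t, lambda t: t, x, x).item())
```
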

Open Access Article
Deep Active Learning for Surface Defect Detection
Sensors 2020, 20(6), 1650; https://doi.org/10.3390/s20061650 - 16 Mar 2020
Abstract
Most current object detection approaches deliver competitive results under the assumption that a large amount of labeled data is available and can be fed into a deep network at once. However, because labeling is expensive, it is difficult to deploy object detection systems in more complex and challenging real-world environments, especially for defect detection in real industries. To reduce the labeling effort, this study proposes an active learning framework for defect detection. First, an uncertainty sampling strategy is proposed to produce the candidate list for annotation: uncertain images provide more informative knowledge for the learning process. Then, an Average Margin method is designed to set the sampling scale for each defect category. In addition, an iterative pattern of training and selection is adopted to train an effective detection model. Extensive experiments demonstrate that the proposed method achieves the required performance with fewer labeled data.
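
Uncertainty sampling ranks unlabeled images by how ambiguous the detector's class posteriors are and sends the most ambiguous ones for annotation first. A minimal sketch using the top-1 versus top-2 probability margin as the score; the paper's exact scoring and the Average Margin scaling are defined there.

```python
import numpy as np

def margin_scores(probs):
    """Top-1 minus top-2 probability per detection; smaller = more uncertain."""
    top2 = np.sort(probs, axis=-1)[..., -2:]
    return top2[..., 1] - top2[..., 0]

def select_for_annotation(image_probs, budget):
    """Score each image by its most uncertain detection and return
    the indices of the `budget` most informative images."""
    scores = [margin_scores(p).min() for p in image_probs]
    return np.argsort(scores)[:budget]

# Toy pool: one array of per-detection class probabilities per image
pool = [np.array([[0.90, 0.05, 0.05]]),   # confident
        np.array([[0.40, 0.35, 0.25]]),   # ambiguous
        np.array([[0.50, 0.45, 0.05]])]   # borderline
print(select_for_annotation(pool, budget=2))  # ambiguous images come first
```
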

Open Access Article
Multi-Person Pose Estimation using an Orientation and Occlusion Aware Deep Learning Network
Sensors 2020, 20(6), 1593; https://doi.org/10.3390/s20061593 - 12 Mar 2020
Abstract
Image-based human behavior and activity understanding has been a hot topic in computer vision and multimedia. As an important part of it, skeleton estimation, also called pose estimation, has attracted considerable interest. Most deep learning approaches to pose estimation focus mainly on the joint feature. However, the joint feature is not sufficient, especially when the image contains multiple people and poses are occluded or not fully visible. This paper proposes a novel multi-task framework for multi-person pose estimation. The proposed framework is developed on the basis of Mask Region-based Convolutional Neural Networks (R-CNN) and extended to integrate the joint feature, body boundary, body orientation, and occlusion condition. To further improve performance, this paper proposes organizing the different information in serial multi-task models instead of the widely used parallel multi-task network. The proposed models are trained on the public Common Objects in Context (COCO) dataset, which is further augmented with ground truths for body orientation and mutual-occlusion masks. Experiments demonstrate the performance of the proposed method for multi-person pose estimation and body orientation estimation: it achieves 84.6% Percentage of Correct Keypoints (PCK) and an 83.7% Correct Detection Rate (CDR). Comparisons further illustrate that the proposed model reduces over-detection compared with other methods.
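
The serial arrangement chains the task heads so that each task consumes the previous task's output as an extra cue, instead of running all heads in parallel off the shared backbone. A schematic PyTorch sketch of the idea; the head ordering and sizes here are assumptions, and the actual heads sit on a Mask R-CNN backbone.

```python
import torch
import torch.nn as nn

class SerialHeads(nn.Module):
    """Toy serial multi-task heads: occlusion -> orientation -> keypoints.
    Each head sees the shared feature plus the previous head's output."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.occlusion = nn.Linear(feat_dim, 1)
        self.orientation = nn.Linear(feat_dim + 1, 8)     # assumed 8 bins
        self.keypoints = nn.Linear(feat_dim + 8, 17 * 2)  # 17 COCO joints

    def forward(self, feat):
        occ = torch.sigmoid(self.occlusion(feat))
        ori = self.orientation(torch.cat([feat, occ], dim=-1))
        kps = self.keypoints(torch.cat([feat, ori], dim=-1))
        return occ, ori, kps

occ, ori, kps = SerialHeads()(torch.rand(4, 256))
print(occ.shape, ori.shape, kps.shape)
```
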

Open Access Article
Color-Guided Depth Map Super-Resolution Using a Dual-Branch Multi-Scale Residual Network with Channel Interaction
Sensors 2020, 20(6), 1560; https://doi.org/10.3390/s20061560 - 11 Mar 2020
Abstract
We designed an end-to-end dual-branch residual network architecture that inputs a low-resolution (LR) depth map and a corresponding high-resolution (HR) color image separately into its two branches and outputs an HR depth map through multi-scale, channel-wise feature extraction, interaction, and upsampling. Each branch contains several residual levels at different scales, and each level comprises multiple residual groups composed of several residual blocks. A short skip connection in every residual block and a long skip connection in each residual group or level allow low-frequency information to be bypassed while the main network focuses on learning high-frequency information. High-frequency information learned by each residual block in the color-image branch is fed into the corresponding residual block in the depth-map branch. This channel-wise feature supplement and fusion helps the depth-map branch alleviate blur in details such as edges, but it can also introduce depth artifacts into the feature maps. To avoid these artifacts, the channel interaction fuses the feature maps using weights derived from a channel attention mechanism. The parallel multi-scale architecture with channel interaction for feature guidance is the main contribution of our work, and experiments show that our method achieves better accuracy than other methods.
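
One way to realize the channel interaction described above is a squeeze-and-excitation-style gate: global context from the color-branch features produces per-channel weights that suppress channels likely to inject depth artifacts before fusion. A minimal sketch under that assumption; the paper defines the exact interaction.

```python
import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    """Fuse color-branch features into the depth branch using
    channel-attention weights (squeeze-and-excitation style)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # squeeze: global context
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                  # per-channel weights
        )

    def forward(self, depth_feat, color_feat):
        return depth_feat + self.gate(color_feat) * color_feat

d, c = torch.rand(1, 64, 32, 32), torch.rand(1, 64, 32, 32)
print(ChannelAttentionFusion(64)(d, c).shape)
```
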

Open Access Article
Simplified Fréchet Distance for Generative Adversarial Nets
Sensors 2020, 20(6), 1548; https://doi.org/10.3390/s20061548 - 11 Mar 2020
Abstract
We introduce a distance metric between two distributions and propose a Generative Adversarial Network (GAN) model: the Simplified Fréchet distance (SFD) and the Simplified Fréchet GAN (SFGAN). Although the data generated by GANs are similar to real data, GANs often undergo unstable training due to their adversarial structure. A possible solution to this problem is to consider the Fréchet distance (FD). However, FD is infeasible to realize in networks because of its covariance term; SFD removes this complexity, making it practical to implement. The structure of SFGAN is based on the Boundary Equilibrium GAN (BEGAN), with SFD used in the loss functions. Experiments are conducted with several datasets, including CelebA and CIFAR-10, and the losses and generated samples of SFGAN and BEGAN are compared across several distance metrics. Evidence of mode collapse and/or mode drop does not occur until 3000k steps for SFGAN, while it occurs between 457k and 968k steps for BEGAN. The experimental results show that SFD makes GANs more stable than the other distance metrics used in GANs and that SFD compensates for the weaknesses of models built on the BEGAN network structure. Based on these results, we conclude that SFD is more suitable for GANs than the other metrics.
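
The Fréchet distance between multivariate Gaussians, d^2 = ||mu_1 - mu_2||^2 + Tr(Sigma_1 + Sigma_2 - 2(Sigma_1 Sigma_2)^{1/2}), is costly because of the matrix square root of the covariance product. One plausible simplification, offered here as an assumption since the exact form is defined in the paper, treats each feature dimension as an independent univariate Gaussian, for which the Fréchet distance reduces to (mu_1 - mu_2)^2 + (sigma_1 - sigma_2)^2 per dimension.

```python
import numpy as np

def simplified_frechet(x, y):
    """Dimension-wise Fréchet distance between univariate Gaussians fitted
    to each feature dimension: sum_i (mu1-mu2)^2 + (sigma1-sigma2)^2."""
    mu1, mu2 = x.mean(axis=0), y.mean(axis=0)
    s1, s2 = x.std(axis=0), y.std(axis=0)
    return np.sum((mu1 - mu2) ** 2 + (s1 - s2) ** 2)

real = np.random.randn(1000, 64)
fake = 0.5 + 1.2 * np.random.randn(1000, 64)  # shifted, rescaled samples
print(simplified_frechet(real, fake))
```
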

Open Access Article
Semi-Supervised Nests of Melanocytes Segmentation Method Using Convolutional Autoencoders
Sensors 2020, 20(6), 1546; https://doi.org/10.3390/s20061546 - 11 Mar 2020
Abstract
In this research, we present a semi-supervised segmentation solution using convolutional autoencoders for segmentation tasks with a small number of ground-truth images. We evaluate the proposed deep network architecture on the detection of nests of nevus cells in histopathological images of skin specimens, an important step in dermatopathology. Diagnostic criteria based on the degree of uniformity and symmetry of border irregularities are particularly vital in dermatopathology for distinguishing between benign and malignant skin lesions. To the best of our knowledge, ours is the first described method for segmenting nest regions. The novelty of our approach lies not only in the area of research but also in addressing a problem with a small ground-truth dataset. We propose an effective computer-vision-based deep learning tool that performs nest segmentation using an autoencoder architecture with two learning steps. Experimental results verified the effectiveness of the proposed approach and its ability to segment nest areas with a Dice similarity coefficient of 0.81, sensitivity of 0.76, and specificity of 0.94, which is a state-of-the-art result.
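
A plausible reading of the two learning steps, stated here as an assumption since the paper specifies the exact schedule, is: (1) unsupervised reconstruction training of the autoencoder on unlabeled patches, then (2) supervised fine-tuning of a segmentation head on the small labeled set, reusing the pretrained encoder. A schematic PyTorch sketch with illustrative layer sizes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative encoder/decoder; real architectures are deeper.
encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
decoder = nn.Sequential(nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
                        nn.Conv2d(32, 3, 3, padding=1))
seg_head = nn.Sequential(nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
                         nn.Conv2d(32, 1, 3, padding=1))  # nest-mask logits

# Step 1: reconstruction loss on unlabeled patches (no masks needed)
x_unlabeled = torch.rand(8, 3, 64, 64)
recon_loss = F.mse_loss(decoder(encoder(x_unlabeled)), x_unlabeled)

# Step 2: segmentation loss on the small labeled set, reusing the encoder
x_labeled = torch.rand(2, 3, 64, 64)
mask = torch.randint(0, 2, (2, 1, 64, 64)).float()
seg_loss = F.binary_cross_entropy_with_logits(seg_head(encoder(x_labeled)), mask)
print(recon_loss.item(), seg_loss.item())
```
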

Open Access Article
An Efficient Building Extraction Method from High Spatial Resolution Remote Sensing Images Based on Improved Mask R-CNN
Sensors 2020, 20(5), 1465; https://doi.org/10.3390/s20051465 - 06 Mar 2020
Abstract
In this paper, we consider building extraction from high spatial resolution remote sensing images. At present, most building extraction methods are based on artificial features; however, the diversity and complexity of buildings mean that building extraction still faces great challenges, so methods based on deep learning have recently been proposed. We propose a building extraction framework based on a convolutional neural network and an edge detection algorithm, called Mask R-CNN Fusion Sobel. Because of the outstanding achievements of Mask R-CNN in image segmentation, this paper improves it and applies it to building extraction from remote sensing images. Our method consists of three parts. First, the convolutional neural network is used for rough localization and pixel-level classification, solving the problem of false and missed extractions by automatically discovering semantic features. Second, the Sobel edge detection algorithm is used to segment building edges accurately, addressing the weak edge extraction and object integrity of deep convolutional neural networks in semantic segmentation. Third, buildings are extracted by the fusion algorithm. We used the proposed framework to extract buildings in high-resolution remote sensing images from the Chinese satellite GF-2; the experiments show that the proposed method achieved an average IoU (intersection over union) of 88.7% and an average Kappa of 87.8%. Our method can therefore be applied to the recognition and segmentation of complex buildings and is superior to the classical method in accuracy.
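
The Sobel operator approximates the image gradient with two 3x3 kernels; pixels with a large gradient magnitude are treated as edges. A minimal, dependency-free sketch of the edge map that the framework fuses with the Mask R-CNN output.

```python
import numpy as np

def sobel_edges(img):
    """Gradient-magnitude edge map from the two 3x3 Sobel kernels."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    gx = np.zeros(img.shape, dtype=float)
    gy = np.zeros(img.shape, dtype=float)
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            patch = img[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = (kx * patch).sum()
            gy[i, j] = (ky * patch).sum()
    return np.hypot(gx, gy)

img = np.zeros((16, 16)); img[:, 8:] = 1.0  # vertical step edge
print(sobel_edges(img).max())               # strong response at the edge
```
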

Open Access Article
A Multi-Task Framework for Facial Attributes Classification through End-to-End Face Parsing and Deep Convolutional Neural Networks
Sensors 2020, 20(2), 328; https://doi.org/10.3390/s20020328 - 07 Jan 2020
Abstract
Human face image analysis is an active research area within computer vision. In this paper, we propose a framework for face image analysis that addresses three challenging problems, race, age, and gender recognition, through face parsing. We manually labeled face images to train an end-to-end face parsing model using deep convolutional neural networks. The deep learning-based segmentation model parses a face image into seven dense classes. We use a probabilistic classification method to create probability maps for each face class, and the probability maps are used as feature descriptors. We then trained another convolutional neural network model for each demographic task (race, age, and gender) by extracting features from the probability maps of the corresponding classes. We performed extensive experiments on state-of-the-art datasets and obtained much better results than previously reported.
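
The probability maps described above are the per-pixel softmax posteriors of the parsing network, one spatial map per facial class. A minimal sketch of extracting them from parsing logits; the seven class names below are illustrative assumptions, not the paper's label set.

```python
import numpy as np

CLASSES = ["hair", "skin", "eyebrows", "eyes", "nose", "mouth", "background"]

def probability_maps(logits):
    """Softmax over the class axis of parsing logits (C, H, W),
    yielding one probability map per facial class."""
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

logits = np.random.randn(len(CLASSES), 128, 128)
maps = probability_maps(logits)
print(maps.shape, float(maps.sum(axis=0).min()))  # each pixel sums to 1
```
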

Open Access Article
EEG-Based Multi-Modal Emotion Recognition using Bag of Deep Features: An Optimal Feature Selection Approach
Sensors 2019, 19(23), 5218; https://doi.org/10.3390/s19235218 - 28 Nov 2019
Cited by 1
Abstract
Much attention has been paid to recognizing human emotions from electroencephalogram (EEG) signals using machine learning. Recognizing emotions is a challenging task due to the non-linear properties of the EEG signal. This paper presents an advanced signal processing method using a deep neural network (DNN) for emotion recognition based on EEG signals. The spectral and temporal components of the raw EEG signal are first retained in a 2D spectrogram before feature extraction. A pre-trained AlexNet model is used to extract raw features from the 2D spectrogram of each channel. To reduce the feature dimensionality, a spatial- and temporal-based bag of deep features (BoDF) model is proposed. A vocabulary consisting of 10 cluster centers per class is calculated using the k-means clustering algorithm. The emotion of each subject is then represented as a histogram over the vocabulary set collected from the raw features of a single channel. Features extracted with the proposed BoDF model have considerably smaller dimensions. The proposed model achieves better classification accuracy than recently reported work when validated on the SJTU SEED and DEAP data sets. For classification, we use a support vector machine (SVM) and k-nearest neighbor (k-NN) on the extracted features for the different emotional states of the two data sets. The BoDF model achieves 93.8% accuracy on the SEED data set and 77.4% accuracy on the DEAP data set, which is more accurate than other state-of-the-art methods of human emotion recognition.
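
The bag-of-deep-features step quantizes the deep feature vectors against a k-means vocabulary (10 cluster centers per class, per the abstract) and represents each trial as a histogram of visual-word assignments. A minimal scikit-learn sketch; the feature dimensions are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(features, n_words=10):
    """Cluster deep feature vectors into a k-means vocabulary."""
    return KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(features)

def bodf_histogram(vocab, features):
    """Represent a trial as a normalized histogram of visual-word
    assignments over its feature vectors."""
    words = vocab.predict(features)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / hist.sum()

train_feats = np.random.randn(500, 256)  # pooled deep features (illustrative)
vocab = build_vocabulary(train_feats)
trial_feats = np.random.randn(40, 256)
print(bodf_histogram(vocab, trial_feats))  # compact fixed-length descriptor
```
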
