Image and Video Processing and Recognition Based on Artificial Intelligence

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Internet of Things".

Deadline for manuscript submissions: closed (31 December 2020) | Viewed by 105349

Printed Edition Available!
A printed edition of this Special Issue is available.

Special Issue Information

Dear Colleagues,

Recent developments have led to the vivid application of artificial intelligence (AI) techniques to image and video processing and recognition. While the state-of-the-art technology has matured, its performance is still affected by various environmental conditions and heterogeneous databases. The purpose of this Special Issue is to invite high-quality, state-of-the-art academic papers on challenging issues in the field of AI-based image and video processing and recognition. We solicit original papers of unpublished and completed research that are not currently under review by any other conference, magazine, or journal. Topics of interest include but are not limited to the following:

  • AI-based image processing, understanding, recognition, compression, and reconstruction;
  • AI-based video processing, understanding, recognition, compression, and reconstruction;
  • Computer vision based on AI;
  • AI-based biometrics;
  • AI-based object detection and tracking;
  • Approaches that combine AI techniques and conventional methods for image and video processing and recognition;
  • Explainable AI (XAI) for image and video processing and recognition;
  • Generative adversarial network (GAN)-based image and video processing and recognition;
  • Approaches that combine AI techniques and blockchain methods for image and video processing and recognition.

Prof. Dr. Kang Ryoung Park
Prof. Dr. Sangyoun Lee
Prof. Dr. Euntai Kim
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Image processing, understanding, recognition, compression, and reconstruction based on AI
  • Video processing, understanding, recognition, compression, and reconstruction based on AI
  • Computer vision based on AI
  • Biometrics based on AI
  • Fusion of AI and conventional methods
  • XAI and GAN
  • Fusion of AI and blockchain methods

Published Papers (25 papers)

Research

18 pages, 3790 KiB  
Article
A Robust Handwritten Numeral Recognition Using Hybrid Orthogonal Polynomials and Moments
by Sadiq H. Abdulhussain, Basheera M. Mahmmod, Marwah Abdulrazzaq Naser, Muntadher Qasim Alsabah, Roslizah Ali and S. A. R. Al-Haddad
Sensors 2021, 21(6), 1999; https://doi.org/10.3390/s21061999 - 12 Mar 2021
Cited by 26 | Viewed by 2072
Abstract
Numeral recognition is considered an essential preliminary step for optical character recognition, document understanding, and other tasks. Although several handwritten numeral recognition algorithms have been proposed so far, achieving adequate recognition accuracy and execution time remains challenging. In particular, recognition accuracy depends on the feature extraction mechanism. As such, a fast and robust numeral recognition method is essential, one that meets the desired accuracy by extracting features efficiently while maintaining fast implementation time. Furthermore, most existing studies to date have evaluated their methods only in clean environments, limiting understanding of their potential in more realistic noisy environments. Therefore, finding a feasible handwritten numeral recognition method that remains accurate in the more practical noisy environment is crucial. To this end, this paper proposes a new scheme for handwritten numeral recognition using hybrid orthogonal polynomials. Gradient and smoothed features are extracted using the hybrid orthogonal polynomial. To reduce the complexity of feature extraction, the embedded image kernel technique is adopted. In addition, a support vector machine is used to classify the extracted features for the different numerals. The proposed scheme is evaluated on three different numeral recognition datasets: Roman, Arabic, and Devanagari. We compare the accuracy of the proposed numeral recognition method with that achieved by state-of-the-art recognition methods, including the most recent convolutional neural network approach. The results show that the proposed method achieves almost the highest recognition accuracy among the existing methods in all the scenarios considered. Importantly, the results demonstrate that the proposed method is robust against noise distortion and outperforms the convolutional neural network considerably, which signifies the feasibility and effectiveness of the proposed approach under both clean and more realistic noisy environments.
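
As a rough sketch of moment-based feature extraction feeding a classifier (not the paper's method): an orthonormal DCT basis stands in for the hybrid orthogonal polynomials, the low-order moment block is used as the feature vector, and a support vector machine is trained on it. All sizes and data are illustrative.

```python
import numpy as np
from scipy.fftpack import dct
from sklearn.svm import SVC

def moment_features(img, order=8):
    # Orthogonal moments M = P @ img @ P.T; an orthonormal DCT basis
    # stands in here for the paper's hybrid orthogonal polynomials.
    P = dct(np.eye(img.shape[0]), axis=0, norm='ortho')  # basis matrix
    M = P @ img @ P.T
    return M[:order, :order].ravel()  # low-order moments as features

# Toy usage: random 28x28 "numerals" with random labels
X = np.stack([moment_features(np.random.rand(28, 28)) for _ in range(100)])
y = np.random.randint(0, 10, size=100)
clf = SVC(kernel='rbf').fit(X, y)
```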

18 pages, 1015 KiB  
Article
Fast Approximation for Sparse Coding with Applications to Object Recognition
by Zhenzhen Sun and Yuanlong Yu
Sensors 2021, 21(4), 1442; https://doi.org/10.3390/s21041442 - 19 Feb 2021
Cited by 5 | Viewed by 1768
Abstract
Sparse Coding (SC) has been widely studied and has shown its superiority in the fields of signal processing, statistics, and machine learning. However, due to the high computational cost of the optimization algorithms required to compute the sparse feature, the applicability of SC to real-time object recognition tasks is limited. Many deep neural networks have been constructed to quickly estimate the sparse feature with the help of a large number of training samples, which is not suitable for small-scale datasets. Therefore, this work presents a simple and efficient fast approximation method for SC, in which a special single-hidden-layer neural network (SLNN) is constructed to perform the approximation task, and the optimal sparse features of training samples, exactly computed by a sparse coding algorithm, are used as ground truth to train the SLNN. After training, the proposed SLNN can quickly estimate sparse features for testing samples. Ten benchmark datasets taken from UCI databases and two face image datasets are used for the experiments, and the low root mean square error (RMSE) between the approximated sparse features and the optimal ones verifies the approximation performance of the proposed method. Furthermore, the recognition results demonstrate that the proposed method can effectively reduce the computational time of the testing process while maintaining recognition performance, and that it outperforms several state-of-the-art fast approximation sparse coding methods, as well as exact sparse coding algorithms.
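
A minimal sketch of the approach with scikit-learn stand-ins; dictionary size, sparsity level, and hidden-layer width are illustrative, not the paper's settings. Exact sparse codes are computed once for the training set, then a single-hidden-layer regressor learns to reproduce them.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode
from sklearn.neural_network import MLPRegressor

X = np.random.rand(500, 64)  # toy signals, e.g., flattened 8x8 patches
dico = MiniBatchDictionaryLearning(n_components=100, alpha=1.0).fit(X)
codes = sparse_encode(X, dico.components_, alpha=1.0)  # exact (slow) codes

# Single-hidden-layer network trained with the exact codes as ground truth
slnn = MLPRegressor(hidden_layer_sizes=(256,), max_iter=500).fit(X, codes)
fast_codes = slnn.predict(X)  # cheap approximation at test time
```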

22 pages, 3703 KiB  
Article
Multi-Block Color-Binarized Statistical Images for Single-Sample Face Recognition
by Insaf Adjabi, Abdeldjalil Ouahabi, Amir Benzaoui and Sébastien Jacques
Sensors 2021, 21(3), 728; https://doi.org/10.3390/s21030728 - 21 Jan 2021
Cited by 53 | Viewed by 3644
Abstract
Single-Sample Face Recognition (SSFR) is a computer vision challenge. In this scenario, there is only one example from each individual on which to train the system, making it difficult to identify persons in unconstrained environments, mainly when dealing with changes in facial expression, posture, lighting, and occlusion. This paper discusses the relevance of an original method for SSFR, called Multi-Block Color-Binarized Statistical Image Features (MB-C-BSIF), which exploits several kinds of features, namely, local, regional, global, and textured-color characteristics. First, the MB-C-BSIF method decomposes a facial image into three channels (e.g., red, green, and blue), then it divides each channel into equal non-overlapping blocks to select the local facial characteristics that are consequently employed in the classification phase. Finally, the identity is determined by calculating the similarities among the characteristic vectors adopting a distance measurement of the K-nearest neighbors (K-NN) classifier. Extensive experiments on several subsets of the unconstrained Alex and Robert (AR) and Labeled Faces in the Wild (LFW) databases show that the MB-C-BSIF achieves superior and competitive results in unconstrained situations when compared to current state-of-the-art methods, especially when dealing with changes in facial expression, lighting, and occlusion. The average classification accuracies are 96.17% and 99% for the AR database with two specific protocols (i.e., Protocols I and II, respectively), and 38.01% for the challenging LFW database. These performances are clearly superior to those obtained by state-of-the-art methods. Furthermore, the proposed method uses algorithms based only on simple and elementary image processing operations that do not imply higher computational costs as in holistic, sparse or deep learning methods, making it ideal for real-time identification.
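
The block-based pipeline can be sketched as follows; plain per-block intensity histograms stand in for the BSIF codes, and the block count and L1 distance are illustrative choices.

```python
import numpy as np

def mb_color_features(img, blocks=4, bins=64):
    """Split each color channel into equal non-overlapping blocks and
    concatenate per-block histograms (plain intensity histograms stand
    in for the binarized statistical image features)."""
    h, w, _ = img.shape
    bh, bw = h // blocks, w // blocks
    feats = []
    for c in range(3):
        for i in range(blocks):
            for j in range(blocks):
                block = img[i*bh:(i+1)*bh, j*bw:(j+1)*bw, c]
                hist, _ = np.histogram(block, bins=bins, range=(0, 256))
                feats.append(hist / max(hist.sum(), 1))
    return np.concatenate(feats)

def nn_identify(probe_feat, gallery_feats, gallery_ids):
    # 1-NN with city-block (L1) distance over the concatenated histograms
    d = np.abs(gallery_feats - probe_feat).sum(axis=1)
    return gallery_ids[np.argmin(d)]
```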

28 pages, 6537 KiB  
Article
Finger-Vein Recognition Using Heterogeneous Databases by Domain Adaption Based on a Cycle-Consistent Adversarial Network
by Kyoung Jun Noh, Jiho Choi, Jin Seong Hong and Kang Ryoung Park
Sensors 2021, 21(2), 524; https://doi.org/10.3390/s21020524 - 13 Jan 2021
Cited by 12 | Viewed by 2380
Abstract
The conventional finger-vein recognition system is trained using one type of database and entails the serious problem of performance degradation when tested with different types of databases. This degradation is caused by changes in image characteristics due to variable factors such as the position of the camera, the finger, and the lighting. Therefore, each database has varying characteristics despite the same finger-vein modality. However, previous research on improving the recognition accuracy of unobserved or heterogeneous databases is lacking. To overcome this problem, we propose a method to improve finger-vein recognition accuracy using domain adaptation between heterogeneous databases based on cycle-consistent adversarial networks (CycleGAN), which enhances the recognition accuracy of unobserved data. The experiments were performed with two open databases: the Shandong University homologous multi-modal traits finger-vein database (SDUMLA-HMT-DB) and the Hong Kong Polytechnic University finger-image database (HKPolyU-DB). They showed that the equal error rate (EER) of finger-vein recognition was 0.85% when training with SDUMLA-HMT-DB and testing with HKPolyU-DB, an improvement of 33.1% over the second-best method. The EER was 3.4% when training with HKPolyU-DB and testing with SDUMLA-HMT-DB, an improvement of 4.8% over the second-best method.

21 pages, 6683 KiB  
Article
Monocular Depth Estimation with Joint Attention Feature Distillation and Wavelet-Based Loss Function
by Peng Liu, Zonghua Zhang, Zhaozong Meng and Nan Gao
Sensors 2021, 21(1), 54; https://doi.org/10.3390/s21010054 - 24 Dec 2020
Cited by 13 | Viewed by 3965
Abstract
Depth estimation is a crucial component in many 3D vision applications. Monocular depth estimation is gaining increasing interest due to flexible use and extremely low system requirements, but inherently ill-posed and ambiguous characteristics still cause unsatisfactory estimation results. This paper proposes a new deep convolutional neural network for monocular depth estimation. The network applies joint attention feature distillation and wavelet-based loss function to recover the depth information of a scene. Two improvements were achieved, compared with previous methods. First, we combined feature distillation and joint attention mechanisms to boost feature modulation discrimination. The network extracts hierarchical features using a progressive feature distillation and refinement strategy and aggregates features using a joint attention operation. Second, we adopted a wavelet-based loss function for network training, which improves loss function effectiveness by obtaining more structural details. The experimental results on challenging indoor and outdoor benchmark datasets verified the proposed method’s superiority compared with current state-of-the-art methods.
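
As an illustration of a wavelet-based loss (the paper's exact formulation may weight subbands differently), this sketch decomposes predicted and ground-truth depth maps with PyWavelets and sums L1 differences over all subbands, so high-frequency structure is penalised directly.

```python
import numpy as np
import pywt

def wavelet_loss(pred, target, wavelet='haar', level=2):
    """Sum of L1 differences over all wavelet subbands, so errors in
    high-frequency structure (edges, fine detail) are penalised directly."""
    dec_p = pywt.wavedec2(pred, wavelet, level=level)
    dec_t = pywt.wavedec2(target, wavelet, level=level)
    loss = np.abs(dec_p[0] - dec_t[0]).mean()  # coarse approximation band
    for (ph, pv, pd), (th, tv, td) in zip(dec_p[1:], dec_t[1:]):
        loss += (np.abs(ph - th).mean()
                 + np.abs(pv - tv).mean()
                 + np.abs(pd - td).mean())
    return loss

loss = wavelet_loss(np.random.rand(64, 64), np.random.rand(64, 64))
```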

17 pages, 4871 KiB  
Article
Classification and Prediction of Typhoon Levels by Satellite Cloud Pictures through GC–LSTM Deep Learning Model
by Jianyin Zhou, Jie Xiang and Sixun Huang
Sensors 2020, 20(18), 5132; https://doi.org/10.3390/s20185132 - 09 Sep 2020
Cited by 17 | Viewed by 3993
Abstract
Typhoons are among the most serious natural disasters, and the key to disaster prevention and mitigation is typhoon level classification. How to better use satellite cloud picture data to achieve accurate classification of typhoon levels has become one of the hot issues in current studies. A new deep learning framework, the Graph Convolutional–Long Short-Term Memory Network (GC–LSTM), is proposed, based on satellite cloud pictures of the Himawari-8 satellite from 2010–2019. The Graph Convolutional Network (GCN) is used to effectively process the irregular spatial structure of satellite cloud pictures, and the Long Short-Term Memory (LSTM) network is utilized to learn the characteristics of satellite cloud pictures over time. Moreover, to verify the effectiveness and accuracy of the model, the prediction effect and model stability are compared with other models. The results show that the algorithm performance of this model is better than that of other prediction models: the prediction accuracy of typhoon level classification reaches 92.35%, and the prediction accuracy for typhoons and super typhoons reaches 95.12%. The model can accurately identify the typhoon eye and spiral cloud belt, and the prediction results always stay within the minimum range of deviation from the actual results, which shows that the GC–LSTM model has stronger stability. The model can accurately identify the levels of different typhoons from satellite cloud pictures. In summary, the results can provide a theoretical basis for related research on typhoon level classification.

26 pages, 9085 KiB  
Article
NCC Based Correspondence Problem for First- and Second-Order Graph Matching
by Beibei Cui and Jean-Charles Créput
Sensors 2020, 20(18), 5117; https://doi.org/10.3390/s20185117 - 08 Sep 2020
Cited by 2 | Viewed by 2205
Abstract
Automatically finding correspondences between object features in images is of central interest for several applications, such as object detection and tracking, identification, registration, and many derived tasks. In this paper, we address feature correspondence within the general framework of graph matching optimization and propose two optimized algorithms for first-order and second-order graph matching. On the one hand, a first-order normalized cross-correlation (NCC)-based graph matching algorithm using entropy and response through Marr wavelets within the scale-interaction method is proposed. First, a new automatic feature detection process using Marr wavelets within the scale-interaction method is introduced. Second, feature extraction is executed under a mesh division strategy and an entropy algorithm, accompanied by an assessment of the distribution criterion. Image matching is achieved by nearest neighbor search with an NCC similarity measure to perform coarse matching on the feature point set. For matching-point filtering, the Random Sample Consensus algorithm (RANSAC) removes outlier correspondences. On the other hand, a second-order NCC-based graph matching algorithm is presented. This algorithm casts graph matching as an integer quadratic programming (IQP) problem and is implemented in Matlab. It allows many algorithms to be developed and compared on a common evaluation platform, sharing input data, a customizable affinity matrix, and a matching list of candidate solution pairs. Experimental results demonstrate the improvements of these algorithms in matching recall and accuracy compared with other algorithms.
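
The NCC similarity at the core of both algorithms can be written compactly; this minimal version scores two equally sized patches and is invariant to affine intensity changes.

```python
import numpy as np

def ncc(patch_a, patch_b):
    """Normalized cross-correlation between two equally sized patches;
    returns a similarity in [-1, 1]."""
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0
```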

19 pages, 2088 KiB  
Article
Crowd Counting with Semantic Scene Segmentation in Helicopter Footage
by Gergely Csönde, Yoshihide Sekimoto and Takehiro Kashiyama
Sensors 2020, 20(17), 4855; https://doi.org/10.3390/s20174855 - 27 Aug 2020
Cited by 4 | Viewed by 2427
Abstract
Continually improving crowd counting neural networks have been developed in recent years. The accuracy of these networks has reached such high levels that further improvement is becoming very difficult. However, this high accuracy lacks deeper semantic information, such as social roles (e.g., student, company worker, or police officer) or location-based roles (e.g., pedestrian, tenant, or construction worker). Some of these can be learned from the same set of features as the human nature of an entity, whereas others require wider contextual information from the human's surroundings. The primary end-goal of developing recognition software is to involve it in autonomous decision-making systems. Therefore, it must be foolproof; that is, it must have a good semantic understanding of the input. In this study, we focus on counting pedestrians in helicopter footage and introduce a dataset created from helicopter videos for this purpose. We use semantic segmentation to extract the required additional contextual information from the surroundings of an entity. We demonstrate that it is possible to increase pedestrian counting accuracy in this manner. Furthermore, we show that crowd counting and semantic segmentation can be achieved simultaneously, with comparable or even improved accuracy, by using the same crowd counting neural network for both tasks through hard parameter sharing. The presented method is generic and can be applied to arbitrary crowd density estimation methods. A link to the dataset is available at the end of the paper.
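
A minimal PyTorch sketch of hard parameter sharing for the two tasks; the backbone, layer sizes, and class count are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SharedCountSegNet(nn.Module):
    """Hard parameter sharing: one shared backbone, two task heads
    (a density-map head for counting, a per-pixel head for segmentation)."""
    def __init__(self, n_classes=5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.density_head = nn.Conv2d(64, 1, 1)      # crowd density map
        self.seg_head = nn.Conv2d(64, n_classes, 1)  # semantic classes

    def forward(self, x):
        f = self.backbone(x)  # features shared by both tasks
        return self.density_head(f), self.seg_head(f)

density, seg = SharedCountSegNet()(torch.randn(1, 3, 128, 128))
```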

21 pages, 8359 KiB  
Article
An Input-Perceptual Reconstruction Adversarial Network for Paired Image-to-Image Conversion
by Aamir Khan, Weidong Jin, Muqeet Ahmad, Rizwan Ali Naqvi and Desheng Wang
Sensors 2020, 20(15), 4161; https://doi.org/10.3390/s20154161 - 27 Jul 2020
Cited by 2 | Viewed by 2256
Abstract
Image-to-image conversion based on deep learning techniques is a topic of interest in the fields of robotics and computer vision. A series of typical tasks, such as applying semantic labels to building photos, edges to photos, and raining to de-raining, can be seen as paired image-to-image conversion problems. In such problems, the image generation network learns from the information in the form of input images. The input images and the corresponding targeted images must share the same basic structure to perfectly generate target-oriented output images. However, the shared basic structure between paired images is not as ideal as assumed, which can significantly affect the output of the generating model. Therefore, we propose a novel Input-Perceptual and Reconstruction Adversarial Network (IP-RAN) as an all-purpose framework for imperfect paired image-to-image conversion problems. We demonstrate, through the experimental results, that our IP-RAN method significantly outperforms the current state-of-the-art techniques.

13 pages, 3078 KiB  
Article
Global-and-Local Context Network for Semantic Segmentation of Street View Images
by Chih-Yang Lin, Yi-Cheng Chiu, Hui-Fuang Ng, Timothy K. Shih and Kuan-Hung Lin
Sensors 2020, 20(10), 2907; https://doi.org/10.3390/s20102907 - 21 May 2020
Cited by 15 | Viewed by 4961
Abstract
Semantic segmentation of street view images is an important step in scene understanding for autonomous vehicle systems. Recent works have made significant progress in pixel-level labeling using the Fully Convolutional Network (FCN) framework and local multi-scale context information. Rich global context information is also essential in the segmentation process. However, a systematic way to utilize both global and local contextual information in a single network has not been fully investigated. In this paper, we propose a global-and-local network architecture (GLNet) which incorporates global spatial information and dense local multi-scale context information to model the relationship between objects in a scene, thus reducing segmentation errors. A channel attention module is designed to further refine the segmentation results using low-level features from the feature map. Experimental results demonstrate that our proposed GLNet achieves 80.8% test accuracy on the Cityscapes test dataset, comparing favorably with existing state-of-the-art methods.
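
A channel attention module of this kind can be sketched in a squeeze-and-excitation style (GLNet's exact module may differ): global pooling summarises each channel and a small bottleneck MLP produces per-channel re-weighting factors.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Global pooling summarises each channel; a bottleneck MLP predicts
    per-channel weights used to re-scale the feature map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)  # squeeze + excite
        return x * w

out = ChannelAttention(64)(torch.randn(2, 64, 32, 32))
```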

14 pages, 22449 KiB  
Article
Deep Binary Classification via Multi-Resolution Network and Stochastic Orthogonality for Subcompact Vehicle Recognition
by Joongchol Shin, Bonseok Koo, Yeongbin Kim and Joonki Paik
Sensors 2020, 20(9), 2715; https://doi.org/10.3390/s20092715 - 09 May 2020
Cited by 2 | Viewed by 2772
Abstract
To encourage people to save energy, subcompact cars are given several benefits, such as discounts on parking or toll road charges. However, manual classification of subcompact cars is highly labor intensive. To solve this problem, automatic vehicle classification systems are good candidates. Since a general pattern-based classification technique cannot successfully recognize the ambiguous features of a vehicle, we present a new multi-resolution convolutional neural network (CNN) and a stochastic orthogonal learning method to train the network. We first extract the region of the bonnet in the vehicle image. Next, both the extracted region and the input image are fed to the low- and high-resolution layers in the CNN model. The proposed network is then optimized based on stochastic orthogonality. We also built a novel subcompact vehicle dataset that will be open for public use. Experimental results show that the proposed model outperforms state-of-the-art approaches in terms of accuracy, which means that the proposed method can efficiently classify the ambiguous features between subcompact and non-subcompact vehicles.
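
One plausible form of an orthogonality constraint during training is a soft penalty pushing a weight matrix towards orthonormal rows; the paper's stochastic variant is not reproduced here.

```python
import torch

def orthogonality_penalty(W):
    """Soft orthogonality regulariser ||W W^T - I||_F^2 on a weight matrix."""
    I = torch.eye(W.shape[0], device=W.device)
    return ((W @ W.t() - I) ** 2).sum()

# Added to the task loss with a small coefficient, e.g.:
# loss = task_loss + 1e-4 * orthogonality_penalty(model.fc.weight)
```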

15 pages, 15725 KiB  
Article
Shedding Light on People Action Recognition in Social Robotics by Means of Common Spatial Patterns
by Itsaso Rodríguez-Moreno, José María Martínez-Otzeta, Izaro Goienetxea, Igor Rodriguez-Rodriguez and Basilio Sierra
Sensors 2020, 20(8), 2436; https://doi.org/10.3390/s20082436 - 24 Apr 2020
Cited by 13 | Viewed by 2922
Abstract
Action recognition in robotics is a research field that has gained momentum in recent years. In this work, a video activity recognition method is presented, which has the ultimate goal of endowing a robot with action recognition capabilities for a more natural social interaction. The application of Common Spatial Patterns (CSP), a signal processing approach widely used in electroencephalography (EEG), is presented in a novel manner to be used in activity recognition in videos taken by a humanoid robot. A sequence of skeleton data is considered as a multidimensional signal and filtered according to the CSP algorithm. Then, characteristics extracted from these filtered data are used as features for a classifier. A database with 46 individuals performing six different actions has been created to test the proposed method. The CSP-based method along with a Linear Discriminant Analysis (LDA) classifier has been compared to a Long Short-Term Memory (LSTM) neural network, showing that the former obtains similar or better results than the latter, while being simpler.
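
The CSP step can be sketched with a standard generalised eigendecomposition formulation (the paper's preprocessing of skeleton sequences is omitted); trials are (channels, samples) arrays.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_filters=4):
    """Common Spatial Patterns: projections that maximise variance for one
    class while minimising it for the other, from the generalised
    eigenproblem C_a w = lambda (C_a + C_b) w."""
    ca = np.mean([np.cov(t) for t in trials_a], axis=0)
    cb = np.mean([np.cov(t) for t in trials_b], axis=0)
    vals, vecs = eigh(ca, ca + cb)  # eigenvalues in ascending order
    k = n_filters // 2
    picks = np.r_[np.arange(k), np.arange(len(vals) - k, len(vals))]
    return vecs[:, picks].T  # spatial filters, applied as W @ trial

# Toy usage: 20 trials per class, 8 channels, 200 samples each
W = csp_filters([np.random.randn(8, 200) for _ in range(20)],
                [np.random.randn(8, 200) for _ in range(20)])
```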

16 pages, 5066 KiB  
Article
A Double-Branch Surface Detection System for Armatures in Vibration Motors with Miniature Volume Based on ResNet-101 and FPN
by Tao Feng, Jiange Liu, Xia Fang, Jie Wang and Libin Zhou
Sensors 2020, 20(8), 2360; https://doi.org/10.3390/s20082360 - 21 Apr 2020
Cited by 13 | Viewed by 2885
Abstract
In this paper, a complete system based on computer vision and deep learning is proposed for surface inspection of the armatures in a vibration motor with miniature volume. A device for imaging and positioning was designed in order to obtain the images of the surface of the armatures. The images obtained by the device were divided into a training set and a test set. With continuous experimental exploration and improvement, the most efficient deep-network model was designed. The results show that the model leads to high accuracy on both the training set and the test set. In addition, we proposed a training method to make the network designed by us perform better. To guarantee the quality of the motor, a double-branch discrimination mechanism was also proposed. In order to verify the reliability of the system, experimental verification was conducted on the production line, and a satisfactory discrimination performance was reached. The results indicate that the proposed detection system for the armatures based on computer vision and deep learning is stable and reliable for armature production lines.

17 pages, 4754 KiB  
Article
Multi-Modality Medical Image Fusion Using Convolutional Neural Network and Contrast Pyramid
by Kunpeng Wang, Mingyao Zheng, Hongyan Wei, Guanqiu Qi and Yuanyuan Li
Sensors 2020, 20(8), 2169; https://doi.org/10.3390/s20082169 - 11 Apr 2020
Cited by 84 | Viewed by 5925
Abstract
Medical image fusion techniques can fuse medical images from different morphologies to make the medical diagnosis more reliable and accurate, which play an increasingly important role in many clinical applications. To obtain a fused image with high visual quality and clear structure details, this paper proposes a convolutional neural network (CNN) based medical image fusion algorithm. The proposed algorithm uses the trained Siamese convolutional network to fuse the pixel activity information of source images to realize the generation of weight map. Meanwhile, a contrast pyramid is implemented to decompose the source image. According to different spatial frequency bands and a weighted fusion operator, source images are integrated. The results of comparative experiments show that the proposed fusion algorithm can effectively preserve the detailed structure information of source images and achieve good human visual effects.

18 pages, 4202 KiB  
Article
Target Recognition in Infrared Circumferential Scanning System via Deep Convolutional Neural Networks
by Gao Chen and Weihua Wang
Sensors 2020, 20(7), 1922; https://doi.org/10.3390/s20071922 - 30 Mar 2020
Cited by 5 | Viewed by 2897
Abstract
With an infrared circumferential scanning system (IRCSS), we can realize long-term surveillance over a large field of view. Automatically recognizing targets in the field of view is a crucial component of improving environmental awareness under the trend of informatization, especially in defense systems. Target recognition consists of two subtasks: detection and identification, corresponding to the position and category of the target, respectively. In this study, we propose a deep convolutional neural network (DCNN)-based method to realize end-to-end target recognition in the IRCSS. Existing DCNN-based methods require a large annotated dataset for training, while public infrared datasets are mostly intended for target tracking. Therefore, we build an infrared target recognition dataset to both overcome the shortage of data and enhance the adaptability of the algorithm to various scenes. We then use data augmentation and exploit the optimal cross-domain transfer learning strategy for network training. In this process, we design the smoother L1 as the loss function in bounding box regression for better localization performance. In the experiments, the proposed method achieved 82.7 mAP, accomplishing end-to-end infrared target recognition with high accuracy.
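
For reference, the standard smooth-L1 (Huber) loss used in bounding-box regression is shown below; the paper's "smoother L1" is a modified variant of this idea whose exact form is not reproduced here.

```python
import torch

def smooth_l1(pred, target, beta=1.0):
    """Piecewise loss: quadratic for small residuals, linear for large ones."""
    diff = (pred - target).abs()
    return torch.where(diff < beta,
                       0.5 * diff ** 2 / beta,
                       diff - 0.5 * beta).mean()

loss = smooth_l1(torch.randn(8, 4), torch.randn(8, 4))  # box offsets
```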

23 pages, 4257 KiB  
Article
Ultrasound Image-Based Diagnosis of Malignant Thyroid Nodule Using Artificial Intelligence
by Dat Tien Nguyen, Jin Kyu Kang, Tuyen Danh Pham, Ganbayar Batchuluun and Kang Ryoung Park
Sensors 2020, 20(7), 1822; https://doi.org/10.3390/s20071822 - 25 Mar 2020
Cited by 76 | Viewed by 9900
Abstract
Computer-aided diagnosis systems have been developed to assist doctors in diagnosing thyroid nodules and to reduce the errors made by traditional diagnosis methods, which are mainly based on the experience of doctors. Therefore, the performance of such systems plays an important role in enhancing the quality of the diagnosis task. Although there have been state-of-the-art studies on this problem based on handcrafted features, deep features, or a combination of the two, their performance is still limited. To overcome these problems, we propose an ultrasound image-based method for the diagnosis of malignant thyroid nodules using artificial intelligence, based on analysis in both the spatial and frequency domains. Additionally, we propose the use of a weighted binary cross-entropy loss function for training the deep convolutional neural networks, to reduce the effect of unbalanced training samples of the target classes in the training data. Through our experiments with a popular open dataset, the thyroid digital image database (TDID), we confirm the superiority of our method compared to the state-of-the-art methods.
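
The class-weighting idea can be sketched with PyTorch's built-in weighted binary cross-entropy; the weight value here is illustrative, whereas the paper derives it from the class distribution of the training data.

```python
import torch
import torch.nn.functional as F

def weighted_bce(logits, targets, pos_weight=3.0):
    # Up-weighting the rarer (positive) class counteracts class imbalance
    return F.binary_cross_entropy_with_logits(
        logits, targets, pos_weight=torch.tensor(pos_weight))

loss = weighted_bce(torch.randn(16), torch.randint(0, 2, (16,)).float())
```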

24 pages, 5227 KiB  
Article
Presentation Attack Face Image Generation Based on a Deep Generative Adversarial Network
by Dat Tien Nguyen, Tuyen Danh Pham, Ganbayar Batchuluun, Kyoung Jun Noh and Kang Ryoung Park
Sensors 2020, 20(7), 1810; https://doi.org/10.3390/s20071810 - 25 Mar 2020
Cited by 5 | Viewed by 3553
Abstract
Although face-based biometric recognition systems have been widely used in many applications, this type of recognition method is still vulnerable to presentation attacks, which use fake samples to deceive the recognition system. To overcome this problem, presentation attack detection (PAD) methods for face recognition systems (face-PAD), which aim to classify real and presentation attack face images before performing a recognition task, have been developed. However, the performance of PAD systems is limited and biased due to the lack of presentation attack images for training PAD systems. In this paper, we propose a method for artificially generating presentation attack face images by learning the characteristics of real and presentation attack images from a few captured images. As a result, our proposed method helps save time in collecting presentation attack samples for training PAD systems and can possibly enhance the performance of PAD systems. Our study is the first attempt to generate PA face images for PAD systems based on the CycleGAN network, a deep-learning-based framework for image generation. In addition, we propose a new measurement method to evaluate the quality of generated PA images based on a face-PAD system. Through experiments with two public datasets (CASIA and Replay-mobile), we show that the generated face images can capture the characteristics of presentation attack images, making them usable as captured presentation attack samples for PAD system training.

12 pages, 1023 KiB  
Article
Deep Active Learning for Surface Defect Detection
by Xiaoming Lv, Fajie Duan, Jia-Jia Jiang, Xiao Fu and Lin Gan
Sensors 2020, 20(6), 1650; https://doi.org/10.3390/s20061650 - 16 Mar 2020
Cited by 23 | Viewed by 7542
Abstract
Most of the current object detection approaches deliver competitive results with the assumption that a large number of labeled data are generally available and can be fed into a deep network at once. However, due to expensive labeling efforts, it is difficult to deploy object detection systems in more complex and challenging real-world environments, especially for defect detection in real industries. In order to reduce the labeling effort, this study proposes an active learning framework for defect detection. First, an Uncertainty Sampling method is proposed to produce the candidate list for annotation. Uncertain images can provide more informative knowledge for the learning process. Then, an Average Margin method is designed to set the sampling scale for each defect category. In addition, an iterative pattern of training and selection is adopted to train an effective detection model. Extensive experiments demonstrate that the proposed method can achieve the required performance with fewer labeled data.
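
Uncertainty sampling can be illustrated with an entropy score over predicted class probabilities (a common formulation; the paper's exact scoring and the Average Margin scaling are not reproduced here).

```python
import numpy as np

def uncertainty_sample(probs, k=100):
    """Return indices of the k most uncertain samples, scored by the
    entropy of each predicted class-probability vector."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return np.argsort(entropy)[-k:]

probs = np.random.dirichlet(np.ones(5), size=1000)  # toy predictions
to_annotate = uncertainty_sample(probs, k=50)
```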

21 pages, 14785 KiB  
Article
Multi-Person Pose Estimation using an Orientation and Occlusion Aware Deep Learning Network
by Yanlei Gu, Huiyang Zhang and Shunsuke Kamijo
Sensors 2020, 20(6), 1593; https://doi.org/10.3390/s20061593 - 12 Mar 2020
Cited by 9 | Viewed by 5651
Abstract
Image-based human behavior and activity understanding has been a hot topic in the field of computer vision and multimedia. As an important part of this, skeleton estimation, also called pose estimation, has attracted a lot of interest. For pose estimation, most deep learning approaches mainly focus on the joint feature. However, the joint feature is not sufficient, especially when the image includes multiple people and poses are occluded or not fully visible. This paper proposes a novel multi-task framework for multi-person pose estimation. The proposed framework is developed based on Mask Region-based Convolutional Neural Networks (R-CNN) and extended to integrate the joint feature, body boundary, body orientation, and occlusion condition together. In order to further improve the performance of multi-person pose estimation, this paper proposes to organize the different information in serial multi-task models instead of the widely used parallel multi-task network. The proposed models are trained on the public dataset Common Objects in Context (COCO), which is further augmented with ground truths for body orientation and mutual-occlusion masks. Experiments demonstrate the performance of the proposed method for multi-person pose estimation and body orientation estimation. The proposed method achieves 84.6% Percentage of Correct Keypoints (PCK) and an 83.7% Correct Detection Rate (CDR). Comparisons further illustrate that the proposed model can reduce over-detection compared with other methods.

16 pages, 2784 KiB  
Article
Color-Guided Depth Map Super-Resolution Using a Dual-Branch Multi-Scale Residual Network with Channel Interaction
by Ruijin Chen and Wei Gao
Sensors 2020, 20(6), 1560; https://doi.org/10.3390/s20061560 - 11 Mar 2020
Cited by 6 | Viewed by 3105
Abstract
We designed an end-to-end dual-branch residual network architecture that inputs a low-resolution (LR) depth map and a corresponding high-resolution (HR) color image separately into the two branches, and outputs an HR depth map through a multi-scale, channel-wise feature extraction, interaction, and upsampling. Each branch of this network contains several residual levels at different scales, and each level comprises multiple residual groups composed of several residual blocks. A short-skip connection in every residual block and a long-skip connection in each residual group or level allow for low-frequency information to be bypassed while the main network focuses on learning high-frequency information. High-frequency information learned by each residual block in the color image branch is input into the corresponding residual block in the depth map branch, and this kind of channel-wise feature supplement and fusion can not only help the depth map branch to alleviate blur in details like edges, but also introduce some depth artifacts to feature maps. To avoid the above introduced artifacts, the channel interaction fuses the feature maps using weights referring to the channel attention mechanism. The parallel multi-scale network architecture with channel interaction for feature guidance is the main contribution of our work and experiments show that our proposed method had a better performance in terms of accuracy compared with other methods.

27 pages, 9720 KiB  
Article
Simplified Fréchet Distance for Generative Adversarial Nets
by Chung-Il Kim, Meejoung Kim, Seungwon Jung and Eenjun Hwang
Sensors 2020, 20(6), 1548; https://doi.org/10.3390/s20061548 - 11 Mar 2020
Cited by 15 | Viewed by 4153
Abstract
We introduce a distance metric between two distributions and propose a Generative Adversarial Network (GAN) model: the Simplified Fréchet distance (SFD) and the Simplified Fréchet GAN (SFGAN). Although the data generated through GANs are similar to real data, GANs often undergo unstable training due to their adversarial structure. A possible solution to this problem is to consider the Fréchet distance (FD). However, FD is infeasible to realize in networks due to its covariance term. SFD overcomes this complexity, enabling its realization in networks. The structure of SFGAN is based on the Boundary Equilibrium GAN (BEGAN) while using SFD in the loss functions. Experiments are conducted with several datasets, including CelebA and CIFAR-10. The losses and generated samples of SFGAN and BEGAN are compared with several distance metrics. Evidence of mode collapse and/or mode drop does not occur until 3000k steps for SFGAN, while it occurs between 457k and 968k steps for BEGAN. Experimental results show that SFD makes GANs more stable than other distance metrics used in GANs, and that SFD compensates for the weakness of models based on the BEGAN network structure. Based on the experimental results, we conclude that SFD is more suitable for GANs than other metrics.
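
For context, the Fréchet distance between Gaussians fitted to real and generated features is

```latex
d_F^2\bigl(\mathcal{N}(\mu_1,\Sigma_1),\,\mathcal{N}(\mu_2,\Sigma_2)\bigr)
  = \lVert \mu_1 - \mu_2 \rVert_2^2
  + \operatorname{Tr}\bigl(\Sigma_1 + \Sigma_2 - 2(\Sigma_1\Sigma_2)^{1/2}\bigr)
```

The matrix square root in the covariance term is what makes FD impractical to compute inside a network; SFD simplifies that term (see the paper for its exact form).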

19 pages, 5725 KiB  
Article
Semi-Supervised Nests of Melanocytes Segmentation Method Using Convolutional Autoencoders
by Dariusz Kucharski, Pawel Kleczek, Joanna Jaworek-Korjakowska, Grzegorz Dyduch and Marek Gorgon
Sensors 2020, 20(6), 1546; https://doi.org/10.3390/s20061546 - 11 Mar 2020
Cited by 24 | Viewed by 4171
Abstract
In this research, we present a semi-supervised segmentation solution using convolutional autoencoders to solve the problem of segmentation tasks having a small number of ground-truth images. We evaluate the proposed deep network architecture for the detection of nests of nevus cells in histopathological images of skin specimens, which is an important step in dermatopathology. The diagnostic criteria based on the degree of uniformity and symmetry of border irregularities are particularly vital in dermatopathology for distinguishing between benign and malignant skin lesions. To the best of our knowledge, this is the first described method to segment the nest regions. The novelty of our approach lies not only in the area of research, but also in addressing the problem of a small ground-truth dataset. We propose an effective computer-vision-based deep learning tool that can perform the nest segmentation based on an autoencoder architecture with two learning steps. Experimental results verified the effectiveness of the proposed approach and its ability to segment nest areas with a Dice similarity coefficient of 0.81, sensitivity of 0.76, and specificity of 0.94, which is a state-of-the-art result.
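
The reported Dice similarity coefficient compares a predicted nest mask against the ground truth:

```python
import numpy as np

def dice_coefficient(pred_mask, true_mask):
    """Dice similarity 2|A n B| / (|A| + |B|) between two binary masks."""
    pred, true = pred_mask.astype(bool), true_mask.astype(bool)
    total = pred.sum() + true.sum()
    return 2.0 * np.logical_and(pred, true).sum() / total if total else 1.0
```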

13 pages, 6423 KiB  
Article
An Efficient Building Extraction Method from High Spatial Resolution Remote Sensing Images Based on Improved Mask R-CNN
by Lili Zhang, Jisen Wu, Yu Fan, Hongmin Gao and Yehong Shao
Sensors 2020, 20(5), 1465; https://doi.org/10.3390/s20051465 - 06 Mar 2020
Cited by 64 | Viewed by 5473
Abstract
In this paper, we consider building extraction from high spatial resolution remote sensing images. At present, most building extraction methods are based on artificial features. However, the diversity and complexity of buildings mean that building extraction methods still face great challenges, so methods based on deep learning have recently been proposed. In this paper, a building extraction framework based on a convolutional neural network and an edge detection algorithm is proposed. The method is called Mask R-CNN Fusion Sobel. Because of the outstanding achievements of Mask R-CNN in the field of image segmentation, this paper improves it and then applies it to remote sensing image building extraction. Our method consists of three parts. First, the convolutional neural network is used for rough localization and pixel-level classification, solving the problem of false and missed extractions by automatically discovering semantic features. Second, the Sobel edge detection algorithm is used to segment building edges accurately, addressing the problems of edge extraction and object integrity that deep convolutional neural networks face in semantic segmentation. Third, buildings are extracted by the fusion algorithm. We utilize the proposed framework to extract buildings in high-resolution remote sensing images from the Chinese satellite GF-2, and the experiments show that the average IOU (intersection over union) of the proposed method was 88.7% and the average Kappa was 87.8%. Therefore, our method can be applied to the recognition and segmentation of complex buildings and is superior to the classical method in accuracy.
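
The Sobel stage can be sketched as follows; the threshold is illustrative and the fusion with the Mask R-CNN output is omitted.

```python
import numpy as np
from scipy import ndimage

def sobel_edges(gray, threshold=0.2):
    """Sobel gradient magnitude thresholded to a binary edge map, which
    can then be fused with the network's building mask."""
    gx = ndimage.sobel(gray, axis=1, mode='reflect')
    gy = ndimage.sobel(gray, axis=0, mode='reflect')
    mag = np.hypot(gx, gy)
    return mag > threshold * mag.max()
```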

24 pages, 1454 KiB  
Article
A Multi-Task Framework for Facial Attributes Classification through End-to-End Face Parsing and Deep Convolutional Neural Networks
by Khalil Khan, Muhammad Attique, Rehan Ullah Khan, Ikram Syed and Tae-Sun Chung
Sensors 2020, 20(2), 328; https://doi.org/10.3390/s20020328 - 07 Jan 2020
Cited by 21 | Viewed by 5639
Abstract
Human face image analysis is an active research area within computer vision. In this paper we propose a framework for face image analysis, addressing three challenging problems of race, age, and gender recognition through face parsing. We manually labeled face images for training an end-to-end face parsing model through Deep Convolutional Neural Networks. The deep learning-based segmentation model parses a face image into seven dense classes. We use a probabilistic classification method and create probability maps for each face class. The probability maps are used as feature descriptors. We train another Convolutional Neural Network model by extracting features from the probability maps of the corresponding class for each demographic task (race, age, and gender). We perform extensive experiments on state-of-the-art datasets and obtain much better results than previously reported.

16 pages, 1810 KiB  
Article
EEG-Based Multi-Modal Emotion Recognition using Bag of Deep Features: An Optimal Feature Selection Approach
by Muhammad Adeel Asghar, Muhammad Jamil Khan, Fawad, Yasar Amin, Muhammad Rizwan, MuhibUr Rahman, Salman Badnava and Seyed Sajad Mirjavadi
Sensors 2019, 19(23), 5218; https://doi.org/10.3390/s19235218 - 28 Nov 2019
Cited by 69 | Viewed by 6100
Abstract
Much attention has been paid to the recognition of human emotions with the help of electroencephalogram (EEG) signals based on machine learning technology. Recognizing emotions is a challenging task due to the non-linear properties of the EEG signal. This paper presents an advanced signal processing method using a deep neural network (DNN) for emotion recognition based on EEG signals. The spectral and temporal components of the raw EEG signal are first retained in a 2D spectrogram before feature extraction. The pre-trained AlexNet model is used to extract raw features from the 2D spectrogram for each channel. To reduce the feature dimensionality, a spatial- and temporal-based bag of deep features (BoDF) model is proposed. A series of vocabularies consisting of 10 cluster centers for each class is calculated using the k-means clustering algorithm. Lastly, the emotion of each subject is represented using the histogram of the vocabulary set collected from the raw features of a single channel. Features extracted from the proposed BoDF model have considerably smaller dimensions. The proposed model achieves better classification accuracy compared to recently reported work when validated on the SJTU SEED and DEAP datasets. For optimal classification performance, we use a support vector machine (SVM) and k-nearest neighbor (k-NN) to classify the extracted features for the different emotional states of the two datasets. The BoDF model achieves 93.8% accuracy on the SEED dataset and 77.4% accuracy on the DEAP dataset, which is more accurate than other state-of-the-art methods of human emotion recognition.
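
A minimal sketch of the BoDF step; feature dimensions and sample counts are illustrative. k-means builds a 10-word vocabulary from deep features and each sample is represented by its normalised word histogram.

```python
import numpy as np
from sklearn.cluster import KMeans

def bodf_histogram(deep_features, vocabulary):
    """Assign each feature vector to its nearest cluster centre and
    return the normalised word-occurrence histogram."""
    words = vocabulary.predict(deep_features)
    hist = np.bincount(words, minlength=vocabulary.n_clusters)
    return hist / hist.sum()

train_feats = np.random.rand(1000, 4096)  # e.g., AlexNet fc-layer features
vocab = KMeans(n_clusters=10, n_init=10).fit(train_feats)
descriptor = bodf_histogram(np.random.rand(50, 4096), vocab)
```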
