Deep Learning-Based Image Sensors

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: closed (30 June 2019) | Viewed by 174773

Special Issue Information

Dear Colleagues,

Recent developments have led to the widespread use of deep learning-based image sensors, such as visible light, near-infrared (NIR), and thermal camera sensors, in a variety of applications, including video surveillance, biometrics, image compression, computer vision, and image restoration. While existing technology has matured, its performance is still affected by various environmental conditions, and recent approaches have attempted to fuse deep learning techniques with conventional methods to guarantee higher accuracy. The goal of this Special Issue is to invite high-quality, state-of-the-art research papers that deal with challenging issues in deep learning-based image sensors. We solicit original papers reporting completed, unpublished research that is not currently under review by any other conference, magazine, or journal. Topics of interest include, but are not limited to, the following:

  • Deep learning-based image processing, understanding, recognition, compression, and reconstruction by visible light, NIR, thermal camera, and multimodal camera sensors
  • Deep learning-based video processing, understanding, recognition, compression, and reconstruction by various camera sensors
  • Deep learning-based computer vision
  • Deep learning-based biometrics and spoof detection
  • Deep learning-based object detection and tracking
  • Approaches that combine deep learning techniques and conventional methods on images by various camera sensors

Prof. Dr. Kang Ryoung Park
Prof. Dr. Sangyoun Lee
Prof. Dr. Euntai Kim
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Image processing, understanding, recognition, compression, and reconstruction based on deep learning
  • Video processing, understanding, recognition, compression, and reconstruction based on deep learning
  • Computer vision based on deep learning
  • Biometrics based on deep learning
  • Fusion of deep learning and conventional methods

Published Papers (29 papers)

Research

16 pages, 1929 KiB  
Article
Multi-Source Deep Transfer Neural Network Algorithm
by Jingmei Li, Weifei Wu, Di Xue and Peng Gao
Sensors 2019, 19(18), 3992; https://doi.org/10.3390/s19183992 - 16 Sep 2019
Cited by 30 | Viewed by 3107
Abstract
Transfer learning can enhance the classification performance of a target domain with insufficient training data by utilizing knowledge relating to the target domain from a source domain. Nowadays, it is common to see two or more source domains available for knowledge transfer, which can improve the performance of learning tasks in the target domain. However, the classification performance of the target domain decreases due to mismatching of probability distributions between domains. Recent studies have shown that deep learning can build deep structures by extracting more effective features to resist the mismatching. In this paper, we propose a new multi-source deep transfer neural network algorithm, MultiDTNN, based on convolutional neural networks and multi-source transfer learning. In MultiDTNN, joint probability distribution adaptation (JPDA) is used to reduce the mismatching between source and target domains and to enhance the feature transferability of the source domain in deep neural networks. Then, a convolutional neural network is trained on the datasets of each source and target domain to obtain a set of classifiers. Finally, the designed selection strategy selects the classifier with the smallest classification error on the target domain from this set to assemble the MultiDTNN framework. The effectiveness of the proposed MultiDTNN is verified by comparing it with other state-of-the-art deep transfer learning methods on three datasets. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
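
As a concrete illustration of the selection step described above, the following minimal Python sketch (not the authors' code) keeps, from one classifier trained per source domain, the one with the smallest classification error on held-out, labeled target-domain data; the classifier objects and data arrays are hypothetical placeholders.

```python
import numpy as np

def select_best_classifier(classifiers, x_target, y_target):
    """Return the classifier with the lowest error rate on target-domain data.

    classifiers : list of objects exposing .predict(X) -> predicted labels
    x_target    : (N, D) array of labeled target-domain samples
    y_target    : (N,) array of ground-truth labels
    """
    errors = [np.mean(clf.predict(x_target) != y_target) for clf in classifiers]
    best = int(np.argmin(errors))          # smallest classification error wins
    return classifiers[best], errors[best]
```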

19 pages, 8083 KiB  
Article
Precise Pollen Grain Detection in Bright Field Microscopy Using Deep Learning Techniques
by Ramón Gallardo-Caballero, Carlos J. García-Orellana, Antonio García-Manso, Horacio M. González-Velasco, Rafael Tormo-Molina and Miguel Macías-Macías
Sensors 2019, 19(16), 3583; https://doi.org/10.3390/s19163583 - 17 Aug 2019
Cited by 34 | Viewed by 8077
Abstract
The determination of daily concentrations of atmospheric pollen is important in the medical and biological fields. Obtaining pollen concentrations is a complex and time-consuming task for specialized personnel. Automatically locating pollen grains is difficult due to the high complexity of the images to be processed, which contain polymorphic and clumped pollen grains, dust, or debris. The purpose of this study is to analyze the feasibility of implementing a reliable pollen grain detection system based on a convolutional neural network architecture, which will later be used as a critical part of an automated pollen concentration estimation system. We used a training set of 251 videos to train our system. As the videos record the process of focusing the samples, this system makes use of the 3D information presented by several focal planes. Besides, a separate set of 135 videos (containing 1234 pollen grains of 11 pollen types) was used to evaluate detection performance. The results are promising in terms of detection (98.54% recall and 99.75% precision) and location accuracy (0.89 average IoU). These results suggest that this technique can provide a reliable basis for the development of an automated pollen counting system. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
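
The location accuracy above is reported as intersection over union (IoU); a small, self-contained sketch of that metric for axis-aligned boxes is shown below, with illustrative coordinates.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Example: predicted pollen-grain box vs. ground-truth box
print(iou((10, 10, 50, 50), (12, 14, 48, 52)))   # ~0.78
```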

18 pages, 4375 KiB  
Article
Lightweight Driver Monitoring System Based on Multi-Task Mobilenets
by Whui Kim, Woo-Sung Jung and Hyun Kyun Choi
Sensors 2019, 19(14), 3200; https://doi.org/10.3390/s19143200 - 20 Jul 2019
Cited by 33 | Viewed by 5006
Abstract
Research on driver status recognition has been actively conducted to reduce fatal crashes caused by the driver’s distraction and drowsiness. As in many other research areas, deep-learning-based algorithms are showing excellent performance for driver status recognition. However, despite decades of research in the driver status recognition area, the visual image-based driver monitoring system has not been widely used in the automobile industry. This is because the system requires high-performance processors and has a hierarchical structure in which each procedure is affected by inaccuracies in the previous procedure. To avoid using a hierarchical structure, we propose a method using Mobilenets without the functions of face detection and tracking, and show that this method can recognize facial behaviors that indicate the driver’s distraction. However, the number of frames per second processed by Mobilenets on a Raspberry Pi, a single-board computer, is not sufficient for recognizing the driver status. To alleviate this problem, we propose a lightweight driver monitoring system using a resource-sharing device in a vehicle (e.g., a driver’s mobile phone). The proposed system is based on Multi-Task Mobilenets (MT-Mobilenets), which consists of a Mobilenets base and a multi-task classifier. The three Softmax regressions of the multi-task classifier help one Mobilenets base recognize facial behaviors related to the driver status, such as distraction, fatigue, and drowsiness. The proposed system based on MT-Mobilenets improved the accuracy of driver status recognition on the Raspberry Pi by using one additional device. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
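
A minimal PyTorch sketch of the multi-task idea above is given below: one shared Mobilenets-style base feeding three classification heads (distraction, fatigue, drowsiness). It uses torchvision's MobileNetV2 as a stand-in backbone, and the head sizes are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class MultiTaskMobileNet(nn.Module):
    def __init__(self, n_distraction=2, n_fatigue=2, n_drowsiness=2):
        super().__init__()
        self.base = mobilenet_v2(weights=None).features     # shared backbone
        self.pool = nn.AdaptiveAvgPool2d(1)
        # one head per task; Softmax is applied inside the cross-entropy loss
        self.head_distraction = nn.Linear(1280, n_distraction)
        self.head_fatigue = nn.Linear(1280, n_fatigue)
        self.head_drowsiness = nn.Linear(1280, n_drowsiness)

    def forward(self, x):
        feat = self.pool(self.base(x)).flatten(1)           # (N, 1280)
        return (self.head_distraction(feat),
                self.head_fatigue(feat),
                self.head_drowsiness(feat))

logits = MultiTaskMobileNet()(torch.randn(1, 3, 224, 224))  # three logit sets
```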

15 pages, 4152 KiB  
Article
Violence Detection Using Spatiotemporal Features with 3D Convolutional Neural Network
by Fath U Min Ullah, Amin Ullah, Khan Muhammad, Ijaz Ul Haq and Sung Wook Baik
Sensors 2019, 19(11), 2472; https://doi.org/10.3390/s19112472 - 30 May 2019
Cited by 151 | Viewed by 9919
Abstract
The worldwide utilization of surveillance cameras in smart cities has enabled researchers to analyze a gigantic volume of data to ensure automatic monitoring. An enhanced security system in smart cities, schools, hospitals, and other surveillance domains is mandatory for the detection of violent or abnormal activities to avoid any casualties which could cause social, economic, and ecological damages. Automatic detection of violence for quick actions is very significant and can efficiently assist the concerned departments. In this paper, we propose a triple-staged end-to-end deep learning violence detection framework. First, persons are detected in the surveillance video stream using a light-weight convolutional neural network (CNN) model to reduce and overcome the voluminous processing of useless frames. Second, a sequence of 16 frames with detected persons is passed to 3D CNN, where the spatiotemporal features of these sequences are extracted and fed to the Softmax classifier. Furthermore, we optimized the 3D CNN model using an open visual inference and neural networks optimization toolkit developed by Intel, which converts the trained model into intermediate representation and adjusts it for optimal execution at the end platform for the final prediction of violent activity. After detection of a violent activity, an alert is transmitted to the nearest police station or security department to take prompt preventive actions. We found that our proposed method outperforms the existing state-of-the-art methods for different benchmark datasets. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
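
The following PyTorch sketch shows the general shape of a 3D CNN that consumes 16-frame clips and produces violence/non-violence scores; the layer sizes are illustrative assumptions, not the authors' exact network.

```python
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    def __init__(self, num_classes=2):                   # violent / non-violent
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),                      # pool space only
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                              # pool time and space
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(64, num_classes)      # Softmax in the loss

    def forward(self, clip):                              # clip: (N, 3, 16, H, W)
        return self.classifier(self.features(clip).flatten(1))

scores = Tiny3DCNN()(torch.randn(1, 3, 16, 112, 112))     # one 16-frame clip
```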

13 pages, 7076 KiB  
Article
SCENet: Secondary Domain Intercorrelation Enhanced Network for Alleviating Compressed Poisson Noises
by Seok Bong Yoo and Mikyong Han
Sensors 2019, 19(8), 1939; https://doi.org/10.3390/s19081939 - 25 Apr 2019
Cited by 1 | Viewed by 3532
Abstract
In real image coding systems, block-based coding is often applied on images contaminated by camera sensor noises such as Poisson noises, which cause complicated types of noises called compressed Poisson noises. Although many restoration methods have recently been proposed for compressed images, they do not provide satisfactory performance on the challenging compressed Poisson noises. This is mainly due to (i) inaccurate modeling regarding the image degradation, (ii) the signal-dependent noise property, and (iii) the lack of analysis on intercorrelation distortion. In this paper, we focused on the challenging issues in practical image coding systems and propose a compressed Poisson noise reduction scheme based on a secondary domain intercorrelation enhanced network. Specifically, we introduced a compressed Poisson noise corruption model and combined the secondary domain intercorrelation prior with a deep neural network especially designed for signal-dependent compression noise reduction. Experimental results showed that the proposed network is superior to the existing state-of-the-art restoration alternatives on classical images, the LIVE1 dataset, and the SIDD dataset. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

26 pages, 1810 KiB  
Article
Spatio–Temporal Image Representation of 3D Skeletal Movements for View-Invariant Action Recognition with Deep Convolutional Neural Networks
by Huy Hieu Pham, Houssam Salmane, Louahdi Khoudour, Alain Crouzil, Pablo Zegers and Sergio A. Velastin
Sensors 2019, 19(8), 1932; https://doi.org/10.3390/s19081932 - 24 Apr 2019
Cited by 27 | Viewed by 6113
Abstract
Designing motion representations for 3D human action recognition from skeleton sequences is an important yet challenging task. An effective representation should be robust to noise, invariant to viewpoint changes and result in a good performance with low-computational demand. Two main challenges in this task include how to efficiently represent spatio–temporal patterns of skeletal movements and how to learn their discriminative features for classification tasks. This paper presents a novel skeleton-based representation and a deep learning framework for 3D action recognition using RGB-D sensors. We propose to build an action map called SPMF (Skeleton Posture-Motion Feature), which is a compact image representation built from skeleton poses and their motions. An Adaptive Histogram Equalization (AHE) algorithm is then applied on the SPMF to enhance their local patterns and form an enhanced action map, namely Enhanced-SPMF. For learning and classification tasks, we exploit Deep Convolutional Neural Networks based on the DenseNet architecture to learn directly an end-to-end mapping between input skeleton sequences and their action labels via the Enhanced-SPMFs. The proposed method is evaluated on four challenging benchmark datasets, including both individual actions, interactions, multiview and large-scale datasets. The experimental results demonstrate that the proposed method outperforms previous state-of-the-art approaches on all benchmark tasks, whilst requiring low computational time for training and inference. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
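
The enhancement step above applies adaptive histogram equalization to the SPMF action map; the short sketch below uses OpenCV's CLAHE (a contrast-limited variant of AHE) as a readily available stand-in, with a random array in place of a real SPMF image.

```python
import cv2
import numpy as np

spmf = (np.random.rand(128, 128) * 255).astype(np.uint8)   # stand-in action map
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced_spmf = clahe.apply(spmf)        # "Enhanced-SPMF" fed to the DenseNet
```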

20 pages, 4629 KiB  
Article
Depth Estimation and Semantic Segmentation from a Single RGB Image Using a Hybrid Convolutional Neural Network
by Xiao Lin, Dalila Sánchez-Escobedo, Josep R. Casas and Montse Pardàs
Sensors 2019, 19(8), 1795; https://doi.org/10.3390/s19081795 - 15 Apr 2019
Cited by 22 | Viewed by 5862
Abstract
Semantic segmentation and depth estimation are two important tasks in computer vision, and many methods have been developed to tackle them. Commonly these two tasks are addressed independently, but recently the idea of merging these two problems into a sole framework has been studied under the assumption that integrating two highly correlated tasks may benefit each other to improve the estimation accuracy. In this paper, depth estimation and semantic segmentation are jointly addressed using a single RGB input image under a unified convolutional neural network. We analyze two different architectures to evaluate which features are more relevant when shared by the two tasks and which features should be kept separated to achieve a mutual improvement. Likewise, our approaches are evaluated under two different scenarios designed to review our results versus single-task and multi-task methods. Qualitative and quantitative experiments demonstrate that the performance of our methodology outperforms the state of the art on single-task approaches, while obtaining competitive results compared with other multi-task methods. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

34 pages, 10955 KiB  
Article
VisNet: Deep Convolutional Neural Networks for Forecasting Atmospheric Visibility
by Akmaljon Palvanov and Young Im Cho
Sensors 2019, 19(6), 1343; https://doi.org/10.3390/s19061343 - 18 Mar 2019
Cited by 55 | Viewed by 7723
Abstract
Visibility is a complex phenomenon influenced by emissions and air pollutants and by factors including sunlight, humidity, temperature, and time, which decrease the clarity of what is visible through the atmosphere. This paper provides a detailed overview of the state-of-the-art contributions in relation to visibility estimation under various foggy weather conditions. We propose VisNet, a new approach based on deep integrated convolutional neural networks for the estimation of visibility distances from camera imagery. The implemented network uses three streams of deep integrated convolutional neural networks, which are connected in parallel. In addition, we have collected the largest dataset for this study, consisting of three million outdoor images with exact visibility values. To evaluate the model’s performance fairly and objectively, the model is trained on three image datasets with different visibility ranges, each with a different number of classes. Moreover, the proposed model, VisNet, is evaluated under dissimilar fog density scenarios using a diverse set of images. Prior to feeding the network, each input image is filtered in the frequency domain to remove low-level features, and a spectral filter is applied to each input for the extraction of low-contrast regions. Compared to previous methods, our approach achieves the highest classification performance on the three different datasets. Furthermore, VisNet considerably outperforms not only the classical methods, but also state-of-the-art models of visibility estimation. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
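
As a rough illustration of the frequency-domain preprocessing mentioned above, the numpy sketch below suppresses low-frequency content of a grayscale frame with a circular high-pass mask; the cutoff and the filter shape are assumptions, not VisNet's actual spectral filter.

```python
import numpy as np

def highpass_filter(gray, cutoff=10):
    """Suppress low-frequency content of a 2-D image via the FFT."""
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    rows, cols = gray.shape
    cy, cx = rows // 2, cols // 2
    y, x = np.ogrid[:rows, :cols]
    keep = (y - cy) ** 2 + (x - cx) ** 2 > cutoff ** 2     # keep high frequencies
    return np.abs(np.fft.ifft2(np.fft.ifftshift(spectrum * keep)))

frame = np.random.rand(256, 256)          # stand-in for a camera frame
print(highpass_filter(frame).shape)       # (256, 256)
```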

19 pages, 4429 KiB  
Article
Multi-Oriented and Scale-Invariant License Plate Detection Based on Convolutional Neural Networks
by Jing Han, Jian Yao, Jiao Zhao, Jingmin Tu and Yahui Liu
Sensors 2019, 19(5), 1175; https://doi.org/10.3390/s19051175 - 07 Mar 2019
Cited by 27 | Viewed by 4847
Abstract
License plate detection (LPD) is the first and key step in license plate recognition. State-of-the-art object-detection algorithms based on deep learning provide a promising form of LPD. However, there still exist two main challenges. First, existing methods often enclose objects with horizontal rectangles. However, horizontal rectangles are not always suitable since license plates in images are multi-oriented, reflected by rotation and perspective distortion. Second, the scale of license plates often varies, leading to the difficulty of multi-scale detection. To address the aforementioned problems, we propose a novel method of multi-oriented and scale-invariant license plate detection (MOSI-LPD) based on convolutional neural networks. Our MOSI-LPD tightly encloses the multi-oriented license plates with bounding parallelograms, regardless of the license plate scales. To obtain bounding parallelograms, we first parameterize the edge points of license plates by relative positions. Next, we design mapping functions between oriented regions and horizontal proposals. Then, we enforce the symmetry constraints in the loss function and train the model with a multi-task loss. Finally, we map region proposals to three edge points of a nearby license plate, and infer the fourth point to form bounding parallelograms. To achieve scale invariance, we first design anchor boxes based on inherent shapes of license plates. Next, we search different layers to generate region proposals with multiple scales. Finally, we up-sample the last layer and combine proposal features extracted from different layers to recognize true license plates. Experimental results have demonstrated that the proposed method outperforms existing approaches in terms of detecting license plates with different orientations and multiple scales. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

30 pages, 14270 KiB  
Article
Deep Residual CNN-Based Ocular Recognition Based on Rough Pupil Detection in the Images by NIR Camera Sensor
by Young Won Lee, Ki Wan Kim, Toan Minh Hoang, Muhammad Arsalan and Kang Ryoung Park
Sensors 2019, 19(4), 842; https://doi.org/10.3390/s19040842 - 18 Feb 2019
Cited by 28 | Viewed by 4574
Abstract
Accurate segmentation of the iris area in input images has a significant effect on the accuracy of iris recognition and is a very important preprocessing step in the overall iris recognition process. In previous studies on iris recognition, however, the accuracy of iris segmentation was reduced when the images of captured irises were of low quality due to problems such as optical and motion blurring, thick eyelashes, and light reflected from eyeglasses. Deep learning-based iris segmentation has been proposed to improve accuracy, but its disadvantage is that it requires a long processing time. To resolve this problem, this study proposes a new method that quickly finds a rough iris box area without accurately segmenting the iris region in the input images and performs ocular recognition based on this. To address this problem of reduced accuracy, the recognition is performed using the ocular area, which is a little larger than the iris area, and a deep residual network (ResNet) is used to resolve the problem of reduced recognition rates due to misalignment between the enrolled and recognition iris images. Experiments were performed using three databases: Institute of Automation Chinese Academy of Sciences (CASIA)-Iris-Distance, CASIA-Iris-Lamp, and CASIA-Iris-Thousand. They confirmed that the method proposed in this study had a higher recognition accuracy than existing methods. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

20 pages, 13742 KiB  
Article
Sensor Fusion-Based Cooperative Trail Following for Autonomous Multi-Robot System
by Mingyang Geng, Shuqi Liu and Zhaoxia Wu
Sensors 2019, 19(4), 823; https://doi.org/10.3390/s19040823 - 17 Feb 2019
Cited by 4 | Viewed by 3042
Abstract
Autonomously following a man-made trail in the wild is a challenging problem for robotic systems. Recently, deep learning-based approaches have cast the trail following problem as an image classification task and have achieved great success in the vision-based trail-following problem. However, the existing research only focuses on the trail-following task with a single-robot system. In contrast, many robotic tasks in reality, such as search and rescue, are conducted by a group of robots. While these robots are grouped to move in the wild, they can cooperate to lead to a more robust performance and perform the trail-following task in a better manner. Concretely, each robot can periodically exchange the vision data with other robots and make decisions based both on its local view and the information from others. This paper proposes a sensor fusion-based cooperative trail-following method, which enables a group of robots to implement the trail-following task by fusing the sensor data of each robot. Our method allows each robot to face the same direction from different altitudes to fuse the vision data feature on the collective level and then take action respectively. Besides, considering the quality of service requirement of the robotic software, our method limits the condition to implementing the sensor data fusion process by using the “threshold” mechanism. Qualitative and quantitative experiments on the real-world dataset have shown that our method can significantly promote the recognition accuracy and lead to a more robust performance compared with the single-robot system. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

28 pages, 10538 KiB  
Article
Deep Learning-Based Multinational Banknote Type and Fitness Classification with the Combined Images by Visible-Light Reflection and Infrared-Light Transmission Image Sensors
by Tuyen Danh Pham, Dat Tien Nguyen, Chanhum Park and Kang Ryoung Park
Sensors 2019, 19(4), 792; https://doi.org/10.3390/s19040792 - 15 Feb 2019
Cited by 9 | Viewed by 4415
Abstract
Automatic sorting of banknotes in payment facilities, such as automated payment machines or vending machines, consists of many tasks such as recognition of banknote type, classification of fitness for recirculation, and counterfeit detection. Previous studies addressing these problems have mostly reported separately on each of these classification tasks and for a specific type of currency only. In other words, there has been little research conducted considering a combination of these multiple tasks, such as classification of banknote denomination and fitness of banknotes, as well as considering a multinational currency condition of the method. To overcome this issue, we propose a multinational banknote type and fitness classification method that both recognizes the denomination and input direction of banknotes and determines whether the banknote is suitable for reuse or should be replaced by a new one. We also propose a method for estimating the fitness value of banknotes and the consistency of the estimation results among input trials of a banknote. Our method is based on a combination of infrared-light transmission and visible-light reflection images of the input banknote and uses deep-learning techniques with a convolutional neural network. The experimental results on a dataset composed of Indian rupee (INR), Korean won (KRW), and United States dollar (USD) banknote images with mixture of two and three fitness levels showed that the proposed method gives good performance in the combination condition of currency types and classification tasks. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

21 pages, 7220 KiB  
Article
Vehicle Detection in Urban Traffic Surveillance Images Based on Convolutional Neural Networks with Feature Concatenation
by Fukai Zhang, Ce Li and Feng Yang
Sensors 2019, 19(3), 594; https://doi.org/10.3390/s19030594 - 30 Jan 2019
Cited by 45 | Viewed by 9692
Abstract
Vehicle detection with category inference on video sequence data is an important but challenging task for urban traffic surveillance. The difficulty of this task lies in the fact that it requires accurate localization of relatively small vehicles in complex scenes and expects real-time detection. In this paper, we present a vehicle detection framework that improves the performance of the conventional Single Shot MultiBox Detector (SSD) and effectively detects different types of vehicles in real time. Our approach, denoted DP-SSD, uses different feature extractors for the localization and classification tasks in a single network and enhances these two feature extractors through deconvolution (D) and pooling (P) between layers in the feature pyramid. In addition, we extend the scope of the default box by adjusting its scale so that smaller default boxes can be exploited to guide DP-SSD training. Experimental results on the UA-DETRAC and KITTI datasets demonstrate that DP-SSD can achieve efficient vehicle detection for real-world traffic surveillance data in real time. For the UA-DETRAC test set trained with the UA-DETRAC trainval set, DP-SSD with an input size of 300 × 300 achieves 75.43% mAP (mean average precision) at a speed of 50.47 FPS (frames per second), and the framework with a 512 × 512 input reaches 77.94% mAP at 25.12 FPS using an NVIDIA GeForce GTX 1080Ti GPU. DP-SSD shows accuracy better than that of the compared state-of-the-art models, except for YOLOv3. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

20 pages, 79222 KiB  
Article
Real-Time Semantic Segmentation for Fisheye Urban Driving Images Based on ERFNet
by Álvaro Sáez, Luis M. Bergasa, Elena López-Guillén, Eduardo Romera, Miguel Tradacete, Carlos Gómez-Huélamo and Javier del Egido
Sensors 2019, 19(3), 503; https://doi.org/10.3390/s19030503 - 25 Jan 2019
Cited by 36 | Viewed by 8504
Abstract
The interest in fisheye cameras has recently risen in the autonomous vehicles field, as they are able to reduce the complexity of perception systems while improving the management of dangerous driving situations. However, the strong distortion inherent to these cameras makes the usage of conventional computer vision algorithms difficult and has prevented the development of these devices. This paper presents a methodology that provides real-time semantic segmentation on fisheye cameras leveraging only synthetic images. Furthermore, we propose some Convolutional Neural Network (CNN) architectures based on the Efficient Residual Factorized Network (ERFNet) that demonstrate notable skill in handling distortion, and a new training strategy that improves the segmentation on the image borders. Our proposals are compared to similar state-of-the-art works, showing outstanding performance, and are tested in an unknown real-world scenario using a fisheye camera integrated in an open-source autonomous electric car, showing a high domain adaptation capability. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

27 pages, 2654 KiB  
Article
Visible-Light Camera Sensor-Based Presentation Attack Detection for Face Recognition by Combining Spatial and Temporal Information
by Dat Tien Nguyen, Tuyen Danh Pham, Min Beom Lee and Kang Ryoung Park
Sensors 2019, 19(2), 410; https://doi.org/10.3390/s19020410 - 20 Jan 2019
Cited by 12 | Viewed by 5107
Abstract
Face-based biometric recognition systems that can recognize human faces are widely employed in places such as airports, immigration offices, and companies, and applications such as mobile phones. However, the security of this recognition method can be compromised by attackers (unauthorized persons), who might bypass the recognition system using artificial facial images. In addition, most previous studies on face presentation attack detection have only utilized spatial information. To address this problem, we propose a visible-light camera sensor-based presentation attack detection that is based on both spatial and temporal information, using the deep features extracted by a stacked convolutional neural network (CNN)-recurrent neural network (RNN) along with handcrafted features. Through experiments using two public datasets, we demonstrate that the temporal information is sufficient for detecting attacks using face images. In addition, it is established that the handcrafted image features efficiently enhance the detection performance of deep features, and the proposed method outperforms previous methods. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
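
A minimal PyTorch sketch of a stacked CNN–RNN classifier over a short face-video clip is shown below to make the spatial-plus-temporal idea concrete; the ResNet-18 backbone, LSTM size, and clip length are assumptions, and the handcrafted-feature branch of the paper is omitted.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class CnnRnnPAD(nn.Module):
    def __init__(self, hidden=256, num_classes=2):    # real vs. attack
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()                   # 512-D per-frame features
        self.cnn = backbone
        self.rnn = nn.LSTM(512, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, frames):                        # frames: (N, T, 3, H, W)
        n, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(n, t, -1)
        _, (h, _) = self.rnn(feats)                   # last hidden state
        return self.fc(h[-1])

logits = CnnRnnPAD()(torch.randn(2, 8, 3, 224, 224))  # two 8-frame clips
```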

25 pages, 23278 KiB  
Article
Deep RetinaNet-Based Detection and Classification of Road Markings by Visible Light Camera Sensors
by Toan Minh Hoang, Phong Ha Nguyen, Noi Quang Truong, Young Won Lee and Kang Ryoung Park
Sensors 2019, 19(2), 281; https://doi.org/10.3390/s19020281 - 11 Jan 2019
Cited by 28 | Viewed by 8498
Abstract
Detection and classification of road markings are a prerequisite for operating autonomous vehicles. Although most studies have focused on the detection of road lane markings, the detection and classification of other road markings, such as arrows and bike markings, have not received much attention. Therefore, we propose a detection and classification method for various types of arrow markings and bike markings on the road in various complex environments using a one-stage deep convolutional neural network (CNN), called RetinaNet. We tested the proposed method in complex road scenarios with three open datasets captured by visible light camera sensors, namely the Malaga urban dataset, the Cambridge dataset, and the Daimler dataset on both a desktop computer and an NVIDIA Jetson TX2 embedded system. Experimental results obtained using the three open databases showed that the proposed RetinaNet-based method outperformed other methods for detection and classification of road markings in terms of both accuracy and processing time. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

29 pages, 10608 KiB  
Article
Faster R-CNN and Geometric Transformation-Based Detection of Driver’s Eyes Using Multiple Near-Infrared Camera Sensors
by Sung Ho Park, Hyo Sik Yoon and Kang Ryoung Park
Sensors 2019, 19(1), 197; https://doi.org/10.3390/s19010197 - 07 Jan 2019
Cited by 8 | Viewed by 6043
Abstract
Studies are being actively conducted on camera-based driver gaze tracking in a vehicle environment for vehicle interfaces and analyzing forward attention for judging driver inattention. In existing studies on the single-camera-based method, there are frequent situations in which the eye information necessary for gaze tracking cannot be observed well in the camera input image owing to the turning of the driver’s head during driving. To solve this problem, existing studies have used multiple-camera-based methods to obtain images to track the driver’s gaze. However, this method has the drawback of an excessive computation process and processing time, as it involves detecting the eyes and extracting the features of all images obtained from multiple cameras. This makes it difficult to implement it in an actual vehicle environment. To solve these limitations of existing studies, this study proposes a method that uses a shallow convolutional neural network (CNN) for the images of the driver’s face acquired from two cameras to adaptively select camera images more suitable for detecting eye position; faster R-CNN is applied to the selected driver images, and after the driver’s eyes are detected, the eye positions of the camera image of the other side are mapped through a geometric transformation matrix. Experiments were conducted using the self-built Dongguk Dual Camera-based Driver Database (DDCD-DB1) including the images of 26 participants acquired from inside a vehicle and the Columbia Gaze Data Set (CAVE-DB) open database. The results confirmed that the performance of the proposed method is superior to those of the existing methods. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
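
The mapping of detected eye positions from one camera image to the other can be pictured as applying a pre-computed 3 × 3 geometric transformation (homography) matrix to pixel coordinates; the numpy sketch below does exactly that, with placeholder matrix values standing in for a real calibration result.

```python
import numpy as np

def map_points(points, H):
    """Apply a 3x3 projective transform to an (N, 2) array of pixel points."""
    pts = np.hstack([points, np.ones((len(points), 1))])   # homogeneous coords
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]                  # back to pixels

H = np.array([[1.02, 0.01, -35.0],        # placeholder calibration result
              [0.00, 1.01,   4.0],
              [0.00, 0.00,   1.0]])
eyes_cam1 = np.array([[412.0, 305.0], [486.0, 309.0]])     # detected eye centers
print(map_points(eyes_cam1, H))                            # eyes in camera 2
```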

18 pages, 5729 KiB  
Article
Development of Limited-Angle Iterative Reconstruction Algorithms with Context Encoder-Based Sinogram Completion for Micro-CT Applications
by Shih-Chun Jin, Chia-Jui Hsieh, Jyh-Cheng Chen, Shih-Huan Tu, Ya-Chen Chen, Tzu-Chien Hsiao, Angela Liu, Wen-Hsiang Chou, Woei-Chyn Chu and Chih-Wei Kuo
Sensors 2018, 18(12), 4458; https://doi.org/10.3390/s18124458 - 16 Dec 2018
Cited by 10 | Viewed by 4989
Abstract
Limited-angle iterative reconstruction (LAIR) reduces the radiation dose required for computed tomography (CT) imaging by decreasing the range of the projection angle. We developed an image-quality-based stopping-criteria method with a flexible and innovative instrument design that, when combined with LAIR, provides the image quality of a conventional CT system. This study describes the construction of different scan acquisition protocols for micro-CT system applications. Fully-sampled Feldkamp (FDK)-reconstructed images were used as references for comparison to assess the image quality produced by these tested protocols. The insufficient portions of a sinogram were inpainted by applying a context encoder (CE), a type of generative adversarial network, to the LAIR process. The context image was passed through an encoder to identify features that were connected to the decoder using a channel-wise fully-connected layer. Our results evidence the excellent performance of this novel approach. Even when the radiation dose is reduced by 1/4, the iterative-based LAIR improved the full-width at half-maximum, contrast-to-noise, and signal-to-noise ratios by 20% to 40% compared to a fully-sampled FDK-based reconstruction. Our data support that this CE-based sinogram completion method enhances the efficacy and efficiency of LAIR and makes limited-angle reconstruction feasible. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

15 pages, 5048 KiB  
Article
An Improved YOLOv2 for Vehicle Detection
by Jun Sang, Zhongyuan Wu, Pei Guo, Haibo Hu, Hong Xiang, Qian Zhang and Bin Cai
Sensors 2018, 18(12), 4272; https://doi.org/10.3390/s18124272 - 04 Dec 2018
Cited by 142 | Viewed by 12943
Abstract
Vehicle detection is one of the important applications of object detection in intelligent transportation systems. It aims to extract specific vehicle-type information from pictures or videos containing vehicles. To solve the problems of existing vehicle detection, such as the lack of vehicle-type recognition, low detection accuracy, and slow speed, a new vehicle detection model YOLOv2_Vehicle based on YOLOv2 is proposed in this paper. The k-means++ clustering algorithm was used to cluster the vehicle bounding boxes on the training dataset, and six anchor boxes with different sizes were selected. Considering that the different scales of the vehicles may influence the vehicle detection model, normalization was applied to improve the loss calculation method for length and width of bounding boxes. To improve the feature extraction ability of the network, the multi-layer feature fusion strategy was adopted, and the repeated convolution layers in high layers were removed. The experimental results on the Beijing Institute of Technology (BIT)-Vehicle validation dataset demonstrated that the mean Average Precision (mAP) could reach 94.78%. The proposed model also showed excellent generalization ability on the CompCars test dataset, where the “vehicle face” is quite different from the training dataset. With the comparison experiments, it was proven that the proposed method is effective for vehicle detection. In addition, with network visualization, the proposed model showed excellent feature extraction ability. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
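
The anchor-selection step above can be sketched with scikit-learn's k-means++ initialization, clustering training-box (width, height) pairs into six anchors; note this uses plain Euclidean distance, whereas YOLO-style clustering typically uses a 1 − IoU distance, and the box sizes below are random stand-ins.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
box_wh = rng.uniform(10, 300, size=(500, 2))      # stand-in (width, height) pairs

km = KMeans(n_clusters=6, init="k-means++", n_init=10, random_state=0)
km.fit(box_wh)
anchors = km.cluster_centers_[np.argsort(km.cluster_centers_.prod(axis=1))]
print(np.round(anchors, 1))                        # six anchor (w, h) sizes
```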

24 pages, 6029 KiB  
Article
Face Recognition Using the SR-CNN Model
by Yu-Xin Yang, Chang Wen, Kai Xie, Fang-Qing Wen, Guan-Qun Sheng and Xin-Gong Tang
Sensors 2018, 18(12), 4237; https://doi.org/10.3390/s18124237 - 03 Dec 2018
Cited by 29 | Viewed by 6427
Abstract
Face recognition in complex environments is vulnerable to illumination change, object rotation, occlusion, and so on, which leads to imprecise target positions; to solve this problem, a face recognition algorithm with multi-feature fusion is proposed. This study presents a new robust face-matching method named SR-CNN, combining the rotation-invariant texture feature (RITF) vector, the scale-invariant feature transform (SIFT) vector, and a convolutional neural network (CNN). Furthermore, a graphics processing unit (GPU) is used to parallelize the model for optimal computational performance. The Labeled Faces in the Wild (LFW) database and a self-collected face database were selected for the experiments. The results show that the true positive rate is improved by 10.97–13.24% and the acceleration ratio (the ratio between central processing unit (CPU) operation time and GPU time) is 5–6 times for the LFW face database. For the self-collected database, the true positive rate increased by 12.65–15.31%, and the acceleration ratio improved by a factor of 6–7. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

15 pages, 5391 KiB  
Article
Robust Drivable Road Region Detection for Fixed-Route Autonomous Vehicles Using Map-Fusion Images
by Yichao Cai, Dachuan Li, Xiao Zhou and Xingang Mou
Sensors 2018, 18(12), 4158; https://doi.org/10.3390/s18124158 - 27 Nov 2018
Cited by 18 | Viewed by 4699
Abstract
Environment perception is one of the major issues in autonomous driving systems. In particular, effective and robust drivable road region detection still remains a challenge to be addressed for autonomous vehicles in multi-lane roads, intersections and unstructured road environments. In this paper, a computer vision and neural networks-based drivable road region detection approach is proposed for fixed-route autonomous vehicles (e.g., shuttles, buses and other vehicles operating on fixed routes), using a vehicle-mounted camera, route map and real-time vehicle location. The key idea of the proposed approach is to fuse an image with its corresponding local route map to obtain the map-fusion image (MFI) where the information of the image and route map act as complementary to each other. The information of the image can be utilized in road regions with rich features, while local route map acts as critical heuristics that enable robust drivable road region detection in areas without clear lane marking or borders. A neural network model constructed upon the Convolutional Neural Networks (CNNs), namely FCN-VGG16, is utilized to extract the drivable road region from the fused MFI. The proposed approach is validated using real-world driving scenario videos captured by an industrial camera mounted on a testing vehicle. Experiments demonstrate that the proposed approach outperforms the conventional approach which uses non-fused images in terms of detection accuracy and robustness, and it achieves desirable robustness against undesirable illumination conditions and pavement appearance, as well as projection and map-fusion errors. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

14 pages, 3796 KiB  
Article
A Vehicle Recognition Algorithm Based on Deep Transfer Learning with a Multiple Feature Subspace Distribution
by Hai Wang, Yijie Yu, Yingfeng Cai, Long Chen and Xiaobo Chen
Sensors 2018, 18(12), 4109; https://doi.org/10.3390/s18124109 - 23 Nov 2018
Cited by 22 | Viewed by 3924
Abstract
Vehicle detection is a key component of environmental sensing systems for Intelligent Vehicles (IVs). Traditional shallow-model and offline learning-based vehicle detection methods are not able to satisfy the real-world challenges of environmental complexity and scene dynamics. Focusing on these problems, this work proposes a vehicle detection algorithm based on a multiple feature subspace distribution deep model with online transfer learning. Based on the multiple feature subspace distribution hypothesis, a deep model is established in which multiple Restricted Boltzmann Machines (RBMs) construct the lower layers and a Deep Belief Network (DBN) composes the superstructure. For this deep model, an unsupervised feature extraction method based on sparse constraints is applied. Then, a transfer learning method with online sample generation is proposed based on the deep model. Finally, the entire classifier is retrained online with supervised learning. The experiments are conducted on the KITTI road image datasets. The performance of the proposed method is compared with many state-of-the-art methods, and the results demonstrate that the proposed deep transfer learning-based algorithm outperforms them. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

18 pages, 23196 KiB  
Article
Deep CNNs with Robust LBP Guiding Pooling for Face Recognition
by Zhongjian Ma, Yuanyuan Ding, Baoqing Li and Xiaobing Yuan
Sensors 2018, 18(11), 3876; https://doi.org/10.3390/s18113876 - 10 Nov 2018
Cited by 13 | Viewed by 3867
Abstract
The pooling layer in Convolutional Neural Networks (CNNs) is designed to reduce dimensions and computational complexity. Unfortunately, CNNs are easily disturbed by noise in images when extracting features from input images. The traditional pooling layer directly samples the input feature maps without considering whether they are affected by noise, which brings about accumulated noise in the subsequent feature maps as well as undesirable network outputs. To address this issue, a robust Local Binary Pattern (LBP) Guiding Pooling (G-RLBP) mechanism is proposed in this paper to downsample the input feature maps and lower the noise impact simultaneously. The proposed G-RLBP method calculates the weighted average of all pixels in the sliding window of the pooling layer as the final result, based on their corresponding probabilities of being affected by noise, thus lowering the noise impact from input images at the first several layers of the CNN. The experimental results show that the carefully designed G-RLBP layer can successfully lower the noise impact and improve the recognition rates of CNN models over the traditional pooling layer. The performance gain of the G-RLBP is quite remarkable when the images are severely affected by noise. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
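
The core operation of G-RLBP, a weighted average over each pooling window, can be sketched in PyTorch as below; how the per-pixel reliability weights are derived from the robust LBP analysis is the paper's contribution and is replaced here by random placeholders.

```python
import torch
import torch.nn.functional as F

def weighted_avg_pool2d(x, w, kernel_size=2, stride=2):
    """x, w: (N, C, H, W); w approximates the probability a pixel is noise-free."""
    n, c = x.shape[:2]
    num = F.unfold(x * w, kernel_size, stride=stride).view(n, c, kernel_size ** 2, -1)
    den = F.unfold(w, kernel_size, stride=stride).view(n, c, kernel_size ** 2, -1)
    pooled = num.sum(dim=2) / den.sum(dim=2).clamp(min=1e-6)   # per-window average
    out_h = (x.shape[2] - kernel_size) // stride + 1
    out_w = (x.shape[3] - kernel_size) // stride + 1
    return pooled.view(n, c, out_h, out_w)

feat = torch.randn(1, 8, 32, 32)
reliability = torch.rand(1, 8, 32, 32)               # placeholder noise-free probabilities
print(weighted_avg_pool2d(feat, reliability).shape)  # torch.Size([1, 8, 16, 16])
```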

20 pages, 3461 KiB  
Article
A Benchmark Dataset and Deep Learning-Based Image Reconstruction for Electrical Capacitance Tomography
by Jin Zheng, Jinku Li, Yi Li and Lihui Peng
Sensors 2018, 18(11), 3701; https://doi.org/10.3390/s18113701 - 31 Oct 2018
Cited by 36 | Viewed by 4296
Abstract
Electrical Capacitance Tomography (ECT) image reconstruction has developed for decades and made great achievements, but there is still a need to find a new theoretical framework to make it better and faster. In recent years, machine learning theory has been introduced in the ECT area to solve the image reconstruction problem. However, there is still no public benchmark dataset in the ECT field for the training and testing of machine learning-based image reconstruction algorithms. On the other hand, a public benchmark dataset can provide a standard framework to evaluate and compare the results of different image reconstruction methods. In this paper, a benchmark dataset for ECT image reconstruction is presented. Like the great contribution of ImageNet that transformed machine learning research, this benchmark dataset is hoped to be helpful for society to investigate new image reconstruction algorithms since the relationship between permittivity distribution and capacitance can be better mapped. In addition, different machine learning-based image reconstruction algorithms can be trained and tested by the unified dataset, and the results can be evaluated and compared under the same standard, thus, making the ECT image reconstruction study more open and causing a breakthrough. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

16 pages, 5003 KiB  
Article
A Low-Light Sensor Image Enhancement Algorithm Based on HSI Color Model
by Shiping Ma, Hongqiang Ma, Yuelei Xu, Shuai Li, Chao Lv and Mingming Zhu
Sensors 2018, 18(10), 3583; https://doi.org/10.3390/s18103583 - 22 Oct 2018
Cited by 27 | Viewed by 4835
Abstract
Images captured by sensors in unpleasant environments, such as low-illumination conditions, are usually degraded, exhibiting low visibility, low brightness, and low contrast. To improve this kind of image, a low-light sensor image enhancement algorithm based on the HSI color model is proposed in this paper. First, we propose a dataset generation method based on the Retinex model to overcome the shortage of sample data. Then, the original low-light image is transformed from RGB to HSI color space. The segmentation exponential method is used to process the saturation (S) component, and a specially designed deep convolutional neural network is applied to enhance the intensity (I) component. Finally, the result is transformed back into the original RGB space to obtain the final improved image. Experimental results show that the proposed algorithm not only enhances the image brightness and contrast significantly, but also avoids color distortion and over-enhancement in comparison with other state-of-the-art methods. Thus, it effectively improves the quality of sensor images. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
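For a concrete picture of the pipeline summarized above, the following sketch walks through the same stages: RGB-to-HSI conversion, an exponential-style stretch of the saturation channel, enhancement of the intensity channel (a simple gamma curve stands in here for the paper's CNN), and conversion back to RGB. The stretch and gamma parameters are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the HSI-based enhancement pipeline; the saturation stretch and the
# gamma curve are illustrative stand-ins for the paper's components.
import numpy as np

def rgb_to_hsi(rgb):
    """rgb in [0, 1], shape (H, W, 3) -> hue in radians, saturation and intensity in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    i = (r + g + b) / 3.0
    s = 1.0 - np.min(rgb, axis=-1) / np.maximum(i, 1e-8)
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + 1e-8
    theta = np.arccos(np.clip(num / den, -1.0, 1.0))
    h = np.where(b > g, 2.0 * np.pi - theta, theta)
    return h, s, i

def hsi_to_rgb(h, s, i):
    """Sector-wise inverse of rgb_to_hsi; all inputs shaped (H, W)."""
    rgb = np.zeros(h.shape + (3,))
    # (flat, boosted, remainder) channel indices for the three 120-degree hue sectors
    for k, (flat, boosted, rem) in enumerate([(2, 0, 1), (0, 1, 2), (1, 2, 0)]):
        lo = k * 2.0 * np.pi / 3.0
        m = (h >= lo) & (h < lo + 2.0 * np.pi / 3.0)
        hh = h[m] - lo
        rgb[m, flat] = i[m] * (1.0 - s[m])
        rgb[m, boosted] = i[m] * (1.0 + s[m] * np.cos(hh) / np.cos(np.pi / 3.0 - hh))
        rgb[m, rem] = 3.0 * i[m] - (rgb[m, flat] + rgb[m, boosted])
    return np.clip(rgb, 0.0, 1.0)

def enhance(rgb, gamma=0.6, s_alpha=1.2):
    h, s, i = rgb_to_hsi(rgb)
    s_enh = 1.0 - (1.0 - s) ** s_alpha   # illustrative exponential-style saturation stretch
    i_enh = i ** gamma                   # stand-in for the paper's intensity-enhancement CNN
    return hsi_to_rgb(h, np.clip(s_enh, 0.0, 1.0), np.clip(i_enh, 0.0, 1.0))

dark = np.random.rand(64, 64, 3) * 0.2    # dummy low-light image
print(dark.mean(), enhance(dark).mean())  # mean intensity rises after enhancement
```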

34 pages, 7611 KiB  
Article
CNN-Based Multimodal Human Recognition in Surveillance Environments
by Ja Hyung Koo, Se Woon Cho, Na Rae Baek, Min Cheol Kim and Kang Ryoung Park
Sensors 2018, 18(9), 3040; https://doi.org/10.3390/s18093040 - 11 Sep 2018
Cited by 22 | Viewed by 5682
Abstract
In the field of human recognition, most current research focuses on re-identification across body images taken by several cameras in outdoor environments, whereas indoor human recognition has received very little attention. Previous research on indoor recognition has mainly focused on face recognition, because the camera is usually closer to a person indoors than outdoors. However, indoor surveillance cameras are typically installed near the ceiling and capture images from above in a downward direction, so people rarely look directly at the cameras. Frontal face images are therefore often unavailable, and facial recognition accuracy drops greatly in such cases. To overcome this problem, both the face and the body can be used for human recognition. However, with indoor cameras, the camera viewing angle often covers only part of the target body, which again reduces recognition accuracy. To address these problems, this paper proposes a multimodal human recognition method that uses both the face and the body and is based on deep convolutional neural networks (CNNs). Specifically, to cope with partially captured bodies, the recognition results of the face and body, obtained by separate CNNs (VGG Face-16 and ResNet-50, respectively), are combined through score-level fusion using the weighted-sum rule to improve recognition performance. Experiments conducted on the custom-made Dongguk face and body database (DFB-DB1) and the open ChokePoint database show that the proposed method achieves high recognition accuracy (equal error rates of 1.52% and 0.58%, respectively) compared to single-modality face or body recognition and to methods used in previous studies.
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
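The score-level fusion step named in the abstract is straightforward to illustrate. The snippet below is a minimal sketch rather than the authors' implementation: the weight, the normalization of the per-modality scores, and the decision threshold are assumptions.

```python
# Weighted-sum score-level fusion of face and body matching scores (illustrative values).
import numpy as np

def weighted_sum_fusion(face_score, body_score, w_face=0.6):
    """Combine per-modality matching distances, assumed min-max normalized to [0, 1]."""
    return w_face * face_score + (1.0 - w_face) * body_score

# Example distances from the face branch (VGG Face-16) and body branch (ResNet-50).
face_scores = np.array([0.12, 0.80, 0.35])
body_scores = np.array([0.20, 0.65, 0.50])
fused = weighted_sum_fusion(face_scores, body_scores)
accepted = fused < 0.4   # genuine match if the fused distance falls below a chosen threshold
print(fused, accepted)
```

In practice, the weight and threshold would be tuned on training data, for example to minimize the equal error rate used as the evaluation metric.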

31 pages, 8314 KiB  
Article
Face Detection in Nighttime Images Using Visible-Light Camera Sensors with Two-Step Faster Region-Based Convolutional Neural Network
by Se Woon Cho, Na Rae Baek, Min Cheol Kim, Ja Hyung Koo, Jong Hyun Kim and Kang Ryoung Park
Sensors 2018, 18(9), 2995; https://doi.org/10.3390/s18092995 - 07 Sep 2018
Cited by 43 | Viewed by 6466
Abstract
Conventional nighttime face detection studies mostly use near-infrared (NIR) light cameras or thermal cameras, which are robust to environmental illumination variation and low illumination. However, with an NIR camera it is difficult to adjust the intensity and angle of the additional NIR illuminator according to its distance from an object, and a thermal camera is expensive to deploy as a surveillance camera. For these reasons, we propose a nighttime face detection method based on deep learning using a single visible-light camera. In a long-distance night image, it is difficult to detect faces directly from the entire image because of noise and image blur. Therefore, we propose a Two-Step Faster region-based convolutional neural network (R-CNN) applied to images preprocessed by histogram equalization (HE). In the two-step scheme, our method sequentially runs body and face detectors, locating the face inside the detected body area. This two-step approach reduces the processing time of Faster R-CNN while maintaining its face detection accuracy. Using a self-constructed database called the Dongguk Nighttime Face Detection database (DNFD-DB1) and an open database from Fudan University, we show that the proposed method outperforms existing face detectors. In addition, the proposed Two-Step Faster R-CNN outperformed a single Faster R-CNN, and the HE preprocessing yielded higher nighttime face detection accuracy than no preprocessing.
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
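The two-step scheme is essentially a cascade: equalize the nighttime frame, detect body regions over the whole image, then run face detection only inside each detected body box and map the result back to frame coordinates. In the structural sketch below, `detect_bodies` and `detect_faces` are hypothetical stubs standing in for the two trained Faster R-CNN models; only the histogram equalization and the coordinate bookkeeping are implemented.

```python
# Structural sketch of a two-step (body-then-face) detection cascade with HE preprocessing.
# The two detector functions are placeholders, not the paper's trained Faster R-CNN models.
import numpy as np

def histogram_equalization(gray):
    """Classic histogram equalization on an 8-bit grayscale image, shape (H, W), dtype uint8."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1.0) * 255.0
    return cdf[gray].astype(np.uint8)

def detect_bodies(image):
    # placeholder for the first Faster R-CNN (body detector); returns [(x, y, w, h), ...]
    return [(40, 20, 60, 160)]

def detect_faces(patch):
    # placeholder for the second Faster R-CNN (face detector), run only inside a body box
    return [(10, 5, 24, 24)]

def two_step_face_detection(frame_gray):
    enhanced = histogram_equalization(frame_gray)
    faces = []
    for bx, by, bw, bh in detect_bodies(enhanced):
        body_patch = enhanced[by:by + bh, bx:bx + bw]
        for fx, fy, fw, fh in detect_faces(body_patch):
            faces.append((bx + fx, by + fy, fw, fh))   # map back to full-frame coordinates
    return faces

frame = (np.random.rand(240, 320) * 80).astype(np.uint8)   # dummy dark frame
print(two_step_face_detection(frame))
```

Restricting the face detector to body regions is what keeps the second, finer-grained pass cheap, which matches the processing-time argument made in the abstract.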

32 pages, 3014 KiB  
Article
Deep Learning-Based Enhanced Presentation Attack Detection for Iris Recognition by Combining Features from Local and Global Regions Based on NIR Camera Sensor
by Dat Tien Nguyen, Tuyen Danh Pham, Young Won Lee and Kang Ryoung Park
Sensors 2018, 18(8), 2601; https://doi.org/10.3390/s18082601 - 08 Aug 2018
Cited by 37 | Viewed by 5330
Abstract
Iris recognition systems have been used in high-security applications because of their high recognition rate and the distinctiveness of iris patterns. However, as recent studies report, an iris recognition system can be fooled by artificial iris patterns, which lowers its security level. The accuracy of previous presentation attack detection research is limited because it used only features extracted from the global iris region image. To overcome this problem, we propose a new presentation attack detection method for iris recognition that combines features extracted from both local and global iris regions, using convolutional neural networks and support vector machines based on a near-infrared (NIR) light camera sensor. The detection results from each kind of image feature are fused using two fusion methods, at the feature level and at the score level, to enhance the detection ability of each feature type. Through extensive experiments on two popular public datasets (LivDet-Iris-2017 Warsaw and Notre Dame Contact Lens Detection 2015) and their combination, we validate the efficiency of the proposed method, which yields smaller detection errors than those reported in previous studies.
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
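The two fusion strategies mentioned in the abstract, feature level and score level, can be sketched as follows. Random vectors stand in for the CNN features of the global and local iris regions, and the feature dimensions, SVM settings, and score weight are illustrative assumptions rather than the paper's choices.

```python
# Illustrative sketch of feature-level vs. score-level fusion for presentation attack
# detection; random data stands in for CNN features of global and local iris regions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
global_feat = rng.normal(size=(n, 128))   # assumed CNN features of the whole iris region
local_feat = rng.normal(size=(n, 128))    # assumed CNN features of a local iris patch
labels = rng.integers(0, 2, size=n)       # 0 = live iris, 1 = presentation attack

# Feature-level fusion: concatenate both feature vectors and train a single SVM.
svm_feature_level = SVC(kernel="rbf").fit(np.hstack([global_feat, local_feat]), labels)

# Score-level fusion: train one SVM per region and combine their decision scores.
svm_global = SVC(kernel="rbf").fit(global_feat, labels)
svm_local = SVC(kernel="rbf").fit(local_feat, labels)
w = 0.5   # assumed weight; a real system would tune it on validation data
fused = w * svm_global.decision_function(global_feat) \
        + (1.0 - w) * svm_local.decision_function(local_feat)
pred_score_level = (fused > 0).astype(int)
```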

Review

18 pages, 1300 KiB  
Review
A Survey of the Techniques for The Identification and Classification of Human Actions from Visual Data
by Shahela Saif, Samabia Tehseen and Sumaira Kausar
Sensors 2018, 18(11), 3979; https://doi.org/10.3390/s18113979 - 15 Nov 2018
Cited by 13 | Viewed by 3672
Abstract
Recognition of human actions from videos has been an active area of research because it has applications in various domains. The results of work in this field are used in video surveillance, automatic video labeling, and human-computer interaction, among others. Advancements in this field are tied to advances in the interrelated fields of object recognition, spatio-temporal video analysis, and semantic segmentation. Activity recognition is a challenging task because it faces many problems, such as occlusion, viewpoint variation, background differences, clutter, and illumination variations. Scientific achievements in the field have been numerous and rapid, as the applications are far-reaching. In this survey, we cover the growth of the field from the earliest solutions, where handcrafted features were used, to later deep learning approaches that use millions of images and videos to learn features automatically. Through this discussion, we intend to highlight the major breakthroughs and the directions future research might take while benefiting from state-of-the-art methods.
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
