
Special Issue "Deep Learning-Based Image Sensors"

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: 30 June 2019

Special Issue Editors

Guest Editor
Prof. Kang Ryoung Park

Division of Electronics and Electrical Engineering, Dongguk University, 30, Pildong-ro 1-gil, Jung-gu, Seoul 100-715, Republic of Korea
Interests: deep learning, biometrics, image processing
Guest Editor
Prof. Sangyoun Lee

School of Electrical and Electronic Engineering, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 120-749, Korea
Interests: human detection & recognition, gesture recognition, face recognition, HEVC
Guest Editor
Prof. Euntai Kim

School of Electrical and Electronic Engineering, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 120-749, Korea
Interests: pedestrian and vehicle detection & recognition, vision for advanced driver assistance systems (ADAS), robot vision

Special Issue Information

Dear Colleagues,

Recent developments have led to the widespread use of deep learning-based image sensors, such as visible light, near-infrared (NIR), and thermal camera sensors, in a variety of applications including video surveillance, biometrics, image compression, computer vision, and image restoration. While existing technology has matured, its performance is still affected by various environmental conditions, and recent approaches have attempted to fuse deep learning techniques with conventional methods to guarantee higher accuracy. The goal of this Special Issue is to invite high-quality, state-of-the-art research papers that deal with challenging issues in deep learning-based image sensors. We solicit original papers reporting unpublished, completed research that is not currently under review by any other conference, magazine, or journal. Topics of interest include, but are not limited to, the following:

  • Deep learning-based image processing, understanding, recognition, compression, and reconstruction by visible light, NIR, thermal camera, and multimodal camera sensors
  • Deep learning-based video processing, understanding, recognition, compression, and reconstruction by various camera sensors
  • Deep learning-based computer vision
  • Deep learning-based biometrics and spoof detection
  • Deep learning-based object detection and tracking
  • Approaches that combine deep learning techniques and conventional methods on images by various camera sensors

Prof. Dr. Kang Ryoung Park
Prof. Dr. Sangyoun Lee
Prof. Dr. Euntai Kim
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Image processing, understanding, recognition, compression, and reconstruction based on deep learning
  • Video processing, understanding, recognition, compression, and reconstruction based on deep learning
  • Computer vision based on deep learning
  • Biometrics based on deep learning
  • Fusion of deep learning and conventional methods

Published Papers (24 papers)


Research


Open Access Article: Spatio–Temporal Image Representation of 3D Skeletal Movements for View-Invariant Action Recognition with Deep Convolutional Neural Networks
Sensors 2019, 19(8), 1932; https://doi.org/10.3390/s19081932
Received: 6 March 2019 / Revised: 10 April 2019 / Accepted: 17 April 2019 / Published: 24 April 2019
Abstract
Designing motion representations for 3D human action recognition from skeleton sequences is an important yet challenging task. An effective representation should be robust to noise, invariant to viewpoint changes and result in a good performance with low-computational demand. Two main challenges in this task include how to efficiently represent spatio–temporal patterns of skeletal movements and how to learn their discriminative features for classification tasks. This paper presents a novel skeleton-based representation and a deep learning framework for 3D action recognition using RGB-D sensors. We propose to build an action map called SPMF (Skeleton Posture-Motion Feature), which is a compact image representation built from skeleton poses and their motions. An Adaptive Histogram Equalization (AHE) algorithm is then applied on the SPMF to enhance their local patterns and form an enhanced action map, namely Enhanced-SPMF. For learning and classification tasks, we exploit Deep Convolutional Neural Networks based on the DenseNet architecture to learn directly an end-to-end mapping between input skeleton sequences and their action labels via the Enhanced-SPMFs. The proposed method is evaluated on four challenging benchmark datasets, including both individual actions, interactions, multiview and large-scale datasets. The experimental results demonstrate that the proposed method outperforms previous state-of-the-art approaches on all benchmark tasks, whilst requiring low computational time for training and inference. Full article
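The pose-to-image encoding this paper builds on can be illustrated with a minimal sketch. This is a hypothetical simplification, not the authors' SPMF implementation (which also encodes joint motions and applies adaptive histogram equalization): joint coordinates are normalized into pixel values so that an entire skeleton sequence becomes one fixed-layout RGB image a CNN can classify.

```python
import numpy as np

def sequence_to_action_map(skeleton):
    """Encode a skeleton sequence (T frames x J joints x 3 coords) as an
    image with joints as rows, frames as columns, and RGB = (x, y, z).
    Simplified sketch of an SPMF-style action map, not the paper's code."""
    seq = np.asarray(skeleton, dtype=np.float64)
    # Normalize each coordinate channel to [0, 255] over the whole sequence
    lo = seq.min(axis=(0, 1), keepdims=True)
    hi = seq.max(axis=(0, 1), keepdims=True)
    img = (seq - lo) / np.maximum(hi - lo, 1e-8) * 255.0
    # (T, J, 3) -> (J, T, 3): joints as rows, time as columns
    return img.transpose(1, 0, 2).astype(np.uint8)

# Toy sequence: 8 frames, 5 joints, 3D coordinates
rng = np.random.default_rng(0)
action_map = sequence_to_action_map(rng.normal(size=(8, 5, 3)))
print(action_map.shape)  # (5, 8, 3)
```

The resulting fixed-size image can then be fed to any image classifier, such as the DenseNet used in the paper.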
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

Open Access Article: Depth Estimation and Semantic Segmentation from a Single RGB Image Using a Hybrid Convolutional Neural Network
Sensors 2019, 19(8), 1795; https://doi.org/10.3390/s19081795
Received: 6 March 2019 / Revised: 9 April 2019 / Accepted: 12 April 2019 / Published: 15 April 2019
Abstract
Semantic segmentation and depth estimation are two important tasks in computer vision, and many methods have been developed to tackle them. Commonly these two tasks are addressed independently, but recently the idea of merging these two problems into a sole framework has been studied under the assumption that integrating two highly correlated tasks may benefit each other to improve the estimation accuracy. In this paper, depth estimation and semantic segmentation are jointly addressed using a single RGB input image under a unified convolutional neural network. We analyze two different architectures to evaluate which features are more relevant when shared by the two tasks and which features should be kept separated to achieve a mutual improvement. Likewise, our approaches are evaluated under two different scenarios designed to review our results versus single-task and multi-task methods. Qualitative and quantitative experiments demonstrate that the performance of our methodology outperforms the state of the art on single-task approaches, while obtaining competitive results compared with other multi-task methods. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

Open Access Article: VisNet: Deep Convolutional Neural Networks for Forecasting Atmospheric Visibility
Sensors 2019, 19(6), 1343; https://doi.org/10.3390/s19061343
Received: 11 February 2019 / Revised: 6 March 2019 / Accepted: 11 March 2019 / Published: 18 March 2019
Abstract
Visibility is a complex phenomenon influenced by emissions and air pollutants, as well as by factors including sunlight, humidity, temperature, and time, which decrease the clarity of what is visible through the atmosphere. This paper provides a detailed overview of the state-of-the-art contributions in relation to visibility estimation under various foggy weather conditions. We propose VisNet, which is a new approach based on deep integrated convolutional neural networks for the estimation of visibility distances from camera imagery. The implemented network uses three streams of deep integrated convolutional neural networks, which are connected in parallel. In addition, we have collected the largest dataset with three million outdoor images and exact visibility values for this study. To evaluate the model’s performance fairly and objectively, the model is trained on three image datasets with different visibility ranges, each with a different number of classes. Moreover, our proposed model, VisNet, evaluated under dissimilar fog density scenarios, uses a diverse set of images. Prior to feeding the network, each input image is filtered in the frequency domain to remove low-level features, and a spectral filter is applied to each input for the extraction of low-contrast regions. Compared to the previous methods, our approach achieves the highest classification performance on three different datasets. Furthermore, our VisNet considerably outperforms not only the classical methods, but also state-of-the-art models of visibility estimation. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

Open Access Article: Multi-Oriented and Scale-Invariant License Plate Detection Based on Convolutional Neural Networks
Sensors 2019, 19(5), 1175; https://doi.org/10.3390/s19051175
Received: 8 January 2019 / Revised: 3 March 2019 / Accepted: 4 March 2019 / Published: 7 March 2019
Abstract
License plate detection (LPD) is the first and key step in license plate recognition. State-of-the-art object-detection algorithms based on deep learning provide a promising form of LPD. However, there still exist two main challenges. First, existing methods often enclose objects with horizontal rectangles. However, horizontal rectangles are not always suitable since license plates in images are multi-oriented, reflected by rotation and perspective distortion. Second, the scale of license plates often varies, leading to the difficulty of multi-scale detection. To address the aforementioned problems, we propose a novel method of multi-oriented and scale-invariant license plate detection (MOSI-LPD) based on convolutional neural networks. Our MOSI-LPD tightly encloses the multi-oriented license plates with bounding parallelograms, regardless of the license plate scales. To obtain bounding parallelograms, we first parameterize the edge points of license plates by relative positions. Next, we design mapping functions between oriented regions and horizontal proposals. Then, we enforce the symmetry constraints in the loss function and train the model with a multi-task loss. Finally, we map region proposals to three edge points of a nearby license plate, and infer the fourth point to form bounding parallelograms. To achieve scale invariance, we first design anchor boxes based on inherent shapes of license plates. Next, we search different layers to generate region proposals with multiple scales. Finally, we up-sample the last layer and combine proposal features extracted from different layers to recognize true license plates. Experimental results have demonstrated that the proposed method outperforms existing approaches in terms of detecting license plates with different orientations and multiple scales. Full article
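The fourth-corner inference mentioned in the abstract follows from basic parallelogram geometry: opposite corners share the same midpoint, so given three consecutive corners p1, p2, p3, the missing corner is p4 = p1 + p3 - p2. This is the general identity, not necessarily the exact parameterization used in MOSI-LPD:

```python
def fourth_parallelogram_point(p1, p2, p3):
    """Given three consecutive corners of a parallelogram, return the
    fourth. Diagonals bisect each other, so p4 = p1 + p3 - p2."""
    return (p1[0] + p3[0] - p2[0], p1[1] + p3[1] - p2[1])

# Three corners of a sheared unit square; the fourth is inferred
print(fourth_parallelogram_point((0, 0), (2, 1), (3, 3)))  # (1, 2)
```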
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

Open Access Article: Deep Residual CNN-Based Ocular Recognition Based on Rough Pupil Detection in the Images by NIR Camera Sensor
Sensors 2019, 19(4), 842; https://doi.org/10.3390/s19040842
Received: 21 December 2018 / Revised: 30 January 2019 / Accepted: 15 February 2019 / Published: 18 February 2019
Abstract
Accurate segmentation of the iris area in input images has a significant effect on the accuracy of iris recognition and is a very important preprocessing step in the overall iris recognition process. In previous studies on iris recognition, however, the accuracy of iris segmentation was reduced when the images of captured irises were of low quality due to problems such as optical and motion blurring, thick eyelashes, and light reflected from eyeglasses. Deep learning-based iris segmentation has been proposed to improve accuracy, but its disadvantage is that it requires a long processing time. To resolve this problem, this study proposes a new method that quickly finds a rough iris box area without accurately segmenting the iris region in the input images and performs ocular recognition based on this. To address this problem of reduced accuracy, the recognition is performed using the ocular area, which is a little larger than the iris area, and a deep residual network (ResNet) is used to resolve the problem of reduced recognition rates due to misalignment between the enrolled and recognition iris images. Experiments were performed using three databases: Institute of Automation Chinese Academy of Sciences (CASIA)-Iris-Distance, CASIA-Iris-Lamp, and CASIA-Iris-Thousand. They confirmed that the method proposed in this study had a higher recognition accuracy than existing methods. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

Open Access Article: Sensor Fusion-Based Cooperative Trail Following for Autonomous Multi-Robot System
Sensors 2019, 19(4), 823; https://doi.org/10.3390/s19040823
Received: 10 January 2019 / Revised: 2 February 2019 / Accepted: 10 February 2019 / Published: 17 February 2019
Cited by 1
Abstract
Autonomously following a man-made trail in the wild is a challenging problem for robotic systems. Recently, deep learning-based approaches have cast the trail-following problem as an image classification task and have achieved great success on the vision-based trail-following problem. However, the existing research only focuses on the trail-following task with a single-robot system. In contrast, many robotic tasks in reality, such as search and rescue, are conducted by a group of robots. While these robots move in the wild as a group, they can cooperate to achieve a more robust performance and perform the trail-following task in a better manner. Concretely, each robot can periodically exchange vision data with other robots and make decisions based both on its local view and on the information from others. This paper proposes a sensor fusion-based cooperative trail-following method, which enables a group of robots to implement the trail-following task by fusing the sensor data of each robot. Our method allows the robots to face the same direction from different altitudes, fuse the vision-data features at the collective level, and then take action individually. In addition, considering the quality-of-service requirements of the robotic software, our method uses a “threshold” mechanism to decide when the sensor-data fusion process is triggered. Qualitative and quantitative experiments on a real-world dataset show that our method significantly improves recognition accuracy and leads to a more robust performance compared with a single-robot system. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

Open Access Article: Deep Learning-Based Multinational Banknote Type and Fitness Classification with the Combined Images by Visible-Light Reflection and Infrared-Light Transmission Image Sensors
Sensors 2019, 19(4), 792; https://doi.org/10.3390/s19040792
Received: 31 December 2018 / Revised: 7 February 2019 / Accepted: 12 February 2019 / Published: 15 February 2019
Abstract
Automatic sorting of banknotes in payment facilities, such as automated payment machines or vending machines, consists of many tasks such as recognition of banknote type, classification of fitness for recirculation, and counterfeit detection. Previous studies addressing these problems have mostly reported separately on each of these classification tasks and for a specific type of currency only. In other words, there has been little research conducted considering a combination of these multiple tasks, such as classification of banknote denomination and fitness of banknotes, as well as considering a multinational currency condition of the method. To overcome this issue, we propose a multinational banknote type and fitness classification method that both recognizes the denomination and input direction of banknotes and determines whether the banknote is suitable for reuse or should be replaced by a new one. We also propose a method for estimating the fitness value of banknotes and the consistency of the estimation results among input trials of a banknote. Our method is based on a combination of infrared-light transmission and visible-light reflection images of the input banknote and uses deep-learning techniques with a convolutional neural network. The experimental results on a dataset composed of Indian rupee (INR), Korean won (KRW), and United States dollar (USD) banknote images with mixture of two and three fitness levels showed that the proposed method gives good performance in the combination condition of currency types and classification tasks. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

Open Access Article: Vehicle Detection in Urban Traffic Surveillance Images Based on Convolutional Neural Networks with Feature Concatenation
Sensors 2019, 19(3), 594; https://doi.org/10.3390/s19030594
Received: 4 January 2019 / Revised: 24 January 2019 / Accepted: 29 January 2019 / Published: 30 January 2019
Abstract
Vehicle detection with category inference on video sequence data is an important but challenging task for urban traffic surveillance. The difficulty of this task lies in the fact that it requires accurate localization of relatively small vehicles in complex scenes and expects real-time detection. In this paper, we present a vehicle detection framework that improves on the conventional Single Shot MultiBox Detector (SSD) and effectively detects different types of vehicles in real time. Our approach, denoted DP-SSD, uses different feature extractors for the localization and classification tasks in a single network, and enhances these two feature extractors through deconvolution (D) and pooling (P) between layers in the feature pyramid. In addition, we extend the scope of the default box by adjusting its scale so that smaller default boxes can be exploited to guide DP-SSD training. Experimental results on the UA-DETRAC and KITTI datasets demonstrate that DP-SSD can achieve efficient vehicle detection for real-world traffic surveillance data in real time. For the UA-DETRAC test set trained with the UA-DETRAC trainval set, DP-SSD with an input size of 300 × 300 achieves 75.43% mAP (mean average precision) at 50.47 FPS (frames per second), and the framework with a 512 × 512 input reaches 77.94% mAP at 25.12 FPS using an NVIDIA GeForce GTX 1080Ti GPU. DP-SSD achieves better accuracy than the compared state-of-the-art models, except for YOLOv3. Full article
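Detection metrics such as the mAP figures quoted above are built on the intersection-over-union (IoU) between predicted and ground-truth boxes. A minimal sketch of that standard building block (the textbook definition, independent of DP-SSD itself):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two 2x2 boxes overlapping in a 1x1 region: IoU = 1 / (4 + 4 - 1)
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 0.14285714285714285
```

A prediction typically counts as a true positive when its IoU with a ground-truth box exceeds a threshold (commonly 0.5 or 0.7 on UA-DETRAC/KITTI), and mAP averages the resulting precision over recall levels and classes.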
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

Open Access Article: Real-Time Semantic Segmentation for Fisheye Urban Driving Images Based on ERFNet
Sensors 2019, 19(3), 503; https://doi.org/10.3390/s19030503
Received: 12 December 2018 / Revised: 11 January 2019 / Accepted: 21 January 2019 / Published: 25 January 2019
Abstract
The interest in fisheye cameras has recently risen in the autonomous vehicles field, as they are able to reduce the complexity of perception systems while improving the management of dangerous driving situations. However, the strong distortion inherent to these cameras makes the usage of conventional computer vision algorithms difficult and has hindered the adoption of these devices. This paper presents a methodology that provides real-time semantic segmentation on fisheye cameras leveraging only synthetic images. Furthermore, we propose some Convolutional Neural Network (CNN) architectures based on the Efficient Residual Factorized Network (ERFNet) that demonstrate notable skill in handling distortion, as well as a new training strategy that improves the segmentation on the image borders. Our proposals are compared to similar state-of-the-art works, showing outstanding performance, and are tested in an unknown real-world scenario using a fisheye camera integrated in an open-source autonomous electric car, showing a high domain adaptation capability. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

Open Access Article: Visible-Light Camera Sensor-Based Presentation Attack Detection for Face Recognition by Combining Spatial and Temporal Information
Sensors 2019, 19(2), 410; https://doi.org/10.3390/s19020410
Received: 17 December 2018 / Revised: 9 January 2019 / Accepted: 17 January 2019 / Published: 20 January 2019
Abstract
Face-based biometric recognition systems that can recognize human faces are widely employed in places such as airports, immigration offices, and companies, and applications such as mobile phones. However, the security of this recognition method can be compromised by attackers (unauthorized persons), who might bypass the recognition system using artificial facial images. In addition, most previous studies on face presentation attack detection have only utilized spatial information. To address this problem, we propose a visible-light camera sensor-based presentation attack detection that is based on both spatial and temporal information, using the deep features extracted by a stacked convolutional neural network (CNN)-recurrent neural network (RNN) along with handcrafted features. Through experiments using two public datasets, we demonstrate that the temporal information is sufficient for detecting attacks using face images. In addition, it is established that the handcrafted image features efficiently enhance the detection performance of deep features, and the proposed method outperforms previous methods. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

Open Access Article: Deep RetinaNet-Based Detection and Classification of Road Markings by Visible Light Camera Sensors
Sensors 2019, 19(2), 281; https://doi.org/10.3390/s19020281
Received: 26 December 2018 / Revised: 7 January 2019 / Accepted: 8 January 2019 / Published: 11 January 2019
Abstract
Detection and classification of road markings are a prerequisite for operating autonomous vehicles. Although most studies have focused on the detection of road lane markings, the detection and classification of other road markings, such as arrows and bike markings, have not received much attention. Therefore, we propose a detection and classification method for various types of arrow markings and bike markings on the road in various complex environments using a one-stage deep convolutional neural network (CNN), called RetinaNet. We tested the proposed method in complex road scenarios with three open datasets captured by visible light camera sensors, namely the Malaga urban dataset, the Cambridge dataset, and the Daimler dataset on both a desktop computer and an NVIDIA Jetson TX2 embedded system. Experimental results obtained using the three open databases showed that the proposed RetinaNet-based method outperformed other methods for detection and classification of road markings in terms of both accuracy and processing time. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

Open Access Feature Paper Article: Faster R-CNN and Geometric Transformation-Based Detection of Driver’s Eyes Using Multiple Near-Infrared Camera Sensors
Sensors 2019, 19(1), 197; https://doi.org/10.3390/s19010197
Received: 3 December 2018 / Revised: 31 December 2018 / Accepted: 3 January 2019 / Published: 7 January 2019
Abstract
Studies are being actively conducted on camera-based driver gaze tracking in a vehicle environment for vehicle interfaces and analyzing forward attention for judging driver inattention. In existing studies on the single-camera-based method, there are frequent situations in which the eye information necessary for gaze tracking cannot be observed well in the camera input image owing to the turning of the driver’s head during driving. To solve this problem, existing studies have used multiple-camera-based methods to obtain images to track the driver’s gaze. However, this method has the drawback of an excessive computation process and processing time, as it involves detecting the eyes and extracting the features of all images obtained from multiple cameras. This makes it difficult to implement it in an actual vehicle environment. To solve these limitations of existing studies, this study proposes a method that uses a shallow convolutional neural network (CNN) for the images of the driver’s face acquired from two cameras to adaptively select camera images more suitable for detecting eye position; faster R-CNN is applied to the selected driver images, and after the driver’s eyes are detected, the eye positions of the camera image of the other side are mapped through a geometric transformation matrix. Experiments were conducted using the self-built Dongguk Dual Camera-based Driver Database (DDCD-DB1) including the images of 26 participants acquired from inside a vehicle and the Columbia Gaze Data Set (CAVE-DB) open database. The results confirmed that the performance of the proposed method is superior to those of the existing methods. Full article
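Mapping a detected eye position from one camera's image into the other's, as described above, amounts to applying a 3×3 transformation in homogeneous coordinates. The sketch below uses a plane homography as an illustrative instance (the paper's actual transformation matrix would come from its calibration procedure; the matrix here is a toy translation):

```python
import numpy as np

def map_point(H, pt):
    """Map a 2D point through a 3x3 homography H in homogeneous coords."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w

# Toy homography: pure translation by (10, 5), no perspective terms
H = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0, 5.0],
              [0.0, 0.0, 1.0]])
x, y = map_point(H, (3.0, 4.0))
print(f"{x:.1f}, {y:.1f}")  # 13.0, 9.0
```

In practice the matrix is estimated once from corresponding points in the two camera views, after which mapping each detected eye position is a single matrix-vector product, avoiding a second detection pass.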
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

Open Access Article: Development of Limited-Angle Iterative Reconstruction Algorithms with Context Encoder-Based Sinogram Completion for Micro-CT Applications
Sensors 2018, 18(12), 4458; https://doi.org/10.3390/s18124458
Received: 29 October 2018 / Revised: 11 December 2018 / Accepted: 13 December 2018 / Published: 16 December 2018
Abstract
Limited-angle iterative reconstruction (LAIR) reduces the radiation dose required for computed tomography (CT) imaging by decreasing the range of the projection angle. We developed an image-quality-based stopping-criteria method with a flexible and innovative instrument design that, when combined with LAIR, provides the image quality of a conventional CT system. This study describes the construction of different scan acquisition protocols for micro-CT system applications. Fully-sampled Feldkamp (FDK)-reconstructed images were used as references to assess the image quality produced by the tested protocols. The insufficient portions of a sinogram were inpainted by applying a context encoder (CE), a type of generative adversarial network, to the LAIR process. The context image was passed through an encoder to identify features that were connected to the decoder using a channel-wise fully-connected layer. Our results evidence the excellent performance of this novel approach: even when the radiation dose is reduced by 1/4, the iterative LAIR improves the full-width half-maximum, contrast-to-noise, and signal-to-noise ratios by 20% to 40% compared to a fully-sampled FDK-based reconstruction. Our data support that this CE-based sinogram completion method enhances the efficacy and efficiency of LAIR and demonstrates the feasibility of limited-angle reconstruction. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
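The context encoder here is a learned inpainting network. As a deliberately naive, non-learned baseline for intuition only, missing projection angles of a sinogram can be filled by linear interpolation between the nearest measured angles; the CE replaces exactly this completion step with a learned one:

```python
# Naive sinogram completion baseline (not the paper's context encoder):
# fill each missing projection row by linear interpolation between the
# nearest measured angles, per detector bin.

def inpaint_missing_angles(sino, missing):
    """sino: list of rows (one per projection angle);
    missing: angle indices whose rows must be filled."""
    filled = [row[:] for row in sino]
    for m in missing:
        lo = max(i for i in range(len(sino)) if i not in missing and i < m)
        hi = min(i for i in range(len(sino)) if i not in missing and i > m)
        t = (m - lo) / (hi - lo)
        filled[m] = [(1 - t) * a + t * b for a, b in zip(sino[lo], sino[hi])]
    return filled

sino = [[1.0, 2.0], [0.0, 0.0], [3.0, 6.0]]   # angle index 1 is missing
print(inpaint_missing_angles(sino, [1]))       # -> [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
```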

Open Access Article An Improved YOLOv2 for Vehicle Detection
Sensors 2018, 18(12), 4272; https://doi.org/10.3390/s18124272
Received: 26 October 2018 / Revised: 23 November 2018 / Accepted: 30 November 2018 / Published: 4 December 2018
PDF Full-text (5048 KB) | HTML Full-text | XML Full-text
Abstract
Vehicle detection is one of the important applications of object detection in intelligent transportation systems. It aims to extract specific vehicle-type information from pictures or videos containing vehicles. To solve the problems of existing vehicle detection, such as the lack of vehicle-type recognition, low detection accuracy, and slow speed, a new vehicle detection model, YOLOv2_Vehicle, based on YOLOv2 is proposed in this paper. The k-means++ clustering algorithm was used to cluster the vehicle bounding boxes on the training dataset, and six anchor boxes with different sizes were selected. Considering that the different scales of the vehicles may influence the vehicle detection model, normalization was applied to improve the loss calculation method for the length and width of bounding boxes. To improve the feature extraction ability of the network, a multi-layer feature fusion strategy was adopted, and the repeated convolution layers in the high layers were removed. Experimental results on the Beijing Institute of Technology (BIT)-Vehicle validation dataset demonstrated that the mean Average Precision (mAP) reaches 94.78%. The proposed model also showed excellent generalization ability on the CompCars test dataset, where the “vehicle face” is quite different from the training dataset. Comparison experiments proved that the proposed method is effective for vehicle detection, and network visualization showed that the proposed model has excellent feature extraction ability. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
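The anchor-selection step can be sketched as k-means clustering of bounding-box widths and heights with an IoU-based assignment, as is common for YOLO-style detectors. The toy boxes and initial centroids below are illustrative, not from the BIT-Vehicle dataset; the paper uses k-means++ seeding and six clusters:

```python
# Sketch of anchor-box clustering: k-means over (width, height) pairs,
# assigning each box to the centroid with the highest IoU (boxes aligned
# at a common origin, so IoU depends only on w and h).

def iou_wh(a, b):
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union

def kmeans_anchors(boxes, centroids, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for wh in boxes:
            j = max(range(len(centroids)), key=lambda k: iou_wh(wh, centroids[k]))
            clusters[j].append(wh)
        centroids = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centroids[j]                 # keep old centroid if cluster empty
            for j, c in enumerate(clusters)
        ]
    return centroids

boxes = [(10, 10), (12, 14), (100, 90), (90, 110)]   # two size groups
print(kmeans_anchors(boxes, [(10, 10), (100, 100)]))  # -> [(11.0, 12.0), (95.0, 100.0)]
```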

Open Access Article Face Recognition Using the SR-CNN Model
Sensors 2018, 18(12), 4237; https://doi.org/10.3390/s18124237
Received: 27 October 2018 / Revised: 27 November 2018 / Accepted: 28 November 2018 / Published: 3 December 2018
Cited by 1 | PDF Full-text (6029 KB) | HTML Full-text | XML Full-text
Abstract
To solve the problem that face recognition in complex environments is vulnerable to illumination change, object rotation, occlusion, and so on, which leads to imprecise target positions, a face recognition algorithm with multi-feature fusion is proposed. This study presents a new robust face-matching method, named SR-CNN, combining the rotation-invariant texture feature (RITF) vector, the scale-invariant feature transform (SIFT) vector, and a convolutional neural network (CNN). Furthermore, a graphics processing unit (GPU) is used to parallelize the model for optimal computational performance. The Labeled Faces in the Wild (LFW) database and a self-collected face database were selected for the experiments. The results show that the true positive rate is improved by 10.97–13.24% and the acceleration ratio (the ratio between central processing unit (CPU) operation time and GPU time) is 5–6 times for the LFW face database. For the self-collected database, the true positive rate increased by 12.65–15.31%, and the acceleration ratio improved by a factor of 6–7. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

Open Access Article Robust Drivable Road Region Detection for Fixed-Route Autonomous Vehicles Using Map-Fusion Images
Sensors 2018, 18(12), 4158; https://doi.org/10.3390/s18124158
Received: 30 August 2018 / Revised: 17 November 2018 / Accepted: 20 November 2018 / Published: 27 November 2018
Cited by 1 | PDF Full-text (5391 KB) | HTML Full-text | XML Full-text
Abstract
Environment perception is one of the major issues in autonomous driving systems. In particular, effective and robust drivable road region detection still remains a challenge to be addressed for autonomous vehicles in multi-lane roads, intersections and unstructured road environments. In this paper, a computer vision and neural networks-based drivable road region detection approach is proposed for fixed-route autonomous vehicles (e.g., shuttles, buses and other vehicles operating on fixed routes), using a vehicle-mounted camera, route map and real-time vehicle location. The key idea of the proposed approach is to fuse an image with its corresponding local route map to obtain a map-fusion image (MFI), in which the information of the image and the route map complement each other. The information of the image can be utilized in road regions with rich features, while the local route map provides critical heuristics that enable robust drivable road region detection in areas without clear lane markings or borders. A neural network model built upon Convolutional Neural Networks (CNNs), namely FCN-VGG16, is utilized to extract the drivable road region from the fused MFI. The proposed approach is validated using real-world driving scenario videos captured by an industrial camera mounted on a testing vehicle. Experiments demonstrate that the proposed approach outperforms the conventional approach which uses non-fused images in terms of detection accuracy and robustness, and it achieves desirable robustness against undesirable illumination conditions and pavement appearance, as well as projection and map-fusion errors. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
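The map-fusion idea can be sketched as stacking each camera frame with its projected route-map mask into one multi-channel input. The tiny grayscale frame and binary mask below are hypothetical; the paper fuses the RGB image with the local route map before feeding FCN-VGG16:

```python
# Sketch of a map-fusion image (MFI): pair every pixel's intensity with the
# corresponding route-map value, producing a 2-channel input for the network.

def make_mfi(frame, route_mask):
    """Stack per-pixel (intensity, route) pairs into a 2-channel map-fusion image."""
    assert len(frame) == len(route_mask)
    return [
        [(p, m) for p, m in zip(frow, mrow)]
        for frow, mrow in zip(frame, route_mask)
    ]

frame = [[0.2, 0.8], [0.5, 0.1]]   # toy grayscale camera frame
mask = [[1, 1], [0, 1]]            # 1 = pixel lies on the projected route
print(make_mfi(frame, mask))       # -> [[(0.2, 1), (0.8, 1)], [(0.5, 0), (0.1, 1)]]
```

Where lane markings are absent, the second channel still carries the route prior, which is exactly the complementarity the abstract describes.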

Open Access Article A Vehicle Recognition Algorithm Based on Deep Transfer Learning with a Multiple Feature Subspace Distribution
Sensors 2018, 18(12), 4109; https://doi.org/10.3390/s18124109
Received: 19 October 2018 / Revised: 19 November 2018 / Accepted: 21 November 2018 / Published: 23 November 2018
Cited by 1 | PDF Full-text (3796 KB) | HTML Full-text | XML Full-text
Abstract
Vehicle detection is a key component of environmental sensing systems for Intelligent Vehicles (IVs). Traditional shallow-model and offline-learning-based vehicle detection methods cannot meet the real-world challenges of environmental complexity and scene dynamics. Focusing on these problems, this work proposes a vehicle detection algorithm based on a multiple-feature-subspace-distribution deep model with online transfer learning. Based on the multiple feature subspace distribution hypothesis, a deep model is established in which multiple Restricted Boltzmann Machines (RBMs) construct the lower layers and a Deep Belief Network (DBN) composes the superstructure. For this deep model, an unsupervised feature extraction method based on sparse constraints is applied. Then, a transfer learning method with online sample generation is proposed based on the deep model. Finally, the entire classifier is retrained online with supervised learning. Experiments were conducted on the KITTI road image datasets, and the results demonstrate that the proposed deep transfer learning-based algorithm outperforms many existing state-of-the-art methods. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

Open Access Article Deep CNNs with Robust LBP Guiding Pooling for Face Recognition
Sensors 2018, 18(11), 3876; https://doi.org/10.3390/s18113876
Received: 10 October 2018 / Revised: 7 November 2018 / Accepted: 8 November 2018 / Published: 10 November 2018
Cited by 1 | PDF Full-text (23196 KB) | HTML Full-text | XML Full-text
Abstract
The pooling layer in Convolutional Neural Networks (CNNs) is designed to reduce dimensions and computational complexity. Unfortunately, a CNN is easily disturbed by noise in images when extracting features from input images. The traditional pooling layer directly samples the input feature maps without considering whether they are affected by noise, which brings accumulated noise into the subsequent feature maps as well as undesirable network outputs. To address this issue, a robust Local Binary Pattern (LBP) Guiding Pooling (G-RLBP) mechanism is proposed in this paper to downsample the input feature maps and lower the noise impact simultaneously. The proposed G-RLBP method calculates the weighted average of all pixels in the sliding window of the pooling layer as the final result, based on their corresponding probabilities of being affected by noise, thus lowering the noise impact from input images at the first several layers of the CNNs. The experimental results show that the carefully designed G-RLBP layer can successfully lower the noise impact and improve the recognition rates of the CNN models over the traditional pooling layer. The performance gain of the G-RLBP is quite remarkable when the images are severely affected by noise. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
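The noise-aware pooling idea can be sketched as a weighted average over each pooling window, where the weights down-weight pixels judged likely to be noisy. The "clean" probabilities below are hypothetical inputs; in G-RLBP they are derived from robust LBP statistics:

```python
# Sketch of noise-aware pooling in the spirit of G-RLBP: each pixel in a
# 2x2 window contributes to a weighted average, with low weights for pixels
# believed to be corrupted by noise.

def weighted_pool2x2(fmap, clean_prob):
    out = []
    for i in range(0, len(fmap), 2):
        row = []
        for j in range(0, len(fmap[0]), 2):
            vals = [(fmap[a][b], clean_prob[a][b])
                    for a in (i, i + 1) for b in (j, j + 1)]
            total_w = sum(w for _, w in vals)
            row.append(sum(v * w for v, w in vals) / total_w)
        out.append(row)
    return out

fmap = [[1.0, 9.0], [1.0, 1.0]]      # 9.0 is a noisy spike
prob = [[1.0, 0.0], [1.0, 1.0]]      # the spike is judged noise (weight 0)
print(weighted_pool2x2(fmap, prob))  # -> [[1.0]]
```

Max pooling would have propagated the spike (9.0); average pooling would have diluted but kept it (3.0); the weighted average suppresses it entirely.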

Open Access Article A Benchmark Dataset and Deep Learning-Based Image Reconstruction for Electrical Capacitance Tomography
Sensors 2018, 18(11), 3701; https://doi.org/10.3390/s18113701
Received: 27 September 2018 / Revised: 26 October 2018 / Accepted: 29 October 2018 / Published: 31 October 2018
PDF Full-text (3461 KB) | HTML Full-text | XML Full-text
Abstract
Electrical Capacitance Tomography (ECT) image reconstruction has been developed for decades and has made great achievements, but a new theoretical framework is still needed to make it better and faster. In recent years, machine learning theory has been introduced in the ECT area to solve the image reconstruction problem. However, there is still no public benchmark dataset in the ECT field for training and testing machine learning-based image reconstruction algorithms. A public benchmark dataset would also provide a standard framework to evaluate and compare the results of different image reconstruction methods. In this paper, a benchmark dataset for ECT image reconstruction is presented. Just as ImageNet transformed machine learning research, we hope this benchmark dataset will help the community investigate new image reconstruction algorithms, since the relationship between permittivity distribution and capacitance can be better mapped. In addition, different machine learning-based image reconstruction algorithms can be trained and tested on the unified dataset, and the results can be evaluated and compared under the same standard, making ECT image reconstruction research more open and fostering breakthroughs. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

Open Access Article A Low-Light Sensor Image Enhancement Algorithm Based on HSI Color Model
Sensors 2018, 18(10), 3583; https://doi.org/10.3390/s18103583
Received: 18 September 2018 / Revised: 15 October 2018 / Accepted: 20 October 2018 / Published: 22 October 2018
Cited by 1 | PDF Full-text (5003 KB) | HTML Full-text | XML Full-text
Abstract
Images captured by sensors in unfavorable environments, such as low-illumination conditions, are usually degraded, exhibiting low visibility, low brightness, and low contrast. To improve such images, a low-light sensor image enhancement algorithm based on the HSI color model is proposed in this paper. First, we propose a dataset generation method based on the Retinex model to overcome the shortage of sample data. Then, the original low-light image is transformed from RGB to HSI color space. The segmentation exponential method is used to process the saturation component (S), and a specially designed deep convolutional neural network is applied to enhance the intensity component (I). Finally, the result is converted back to the original RGB space to obtain the enhanced image. Experimental results show that the proposed algorithm not only enhances image brightness and contrast significantly, but also avoids color distortion and over-enhancement in comparison with other state-of-the-art methods, effectively improving the quality of sensor images. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
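A minimal sketch of the HSI quantities involved: intensity I = (R+G+B)/3 and saturation S = 1 - 3*min(R,G,B)/(R+G+B), with a hypothetical exponential (gamma-style) stretch standing in for the paper's segmentation exponential method; the CNN enhancement of I is omitted:

```python
# Sketch of the HSI pieces used in this pipeline: the standard S and I
# components of the HSI model, plus a simple exponential stretch on S.
# The gamma value is a hypothetical choice, not the paper's tuned parameters.

def rgb_to_si(r, g, b):
    total = r + g + b
    i = total / 3.0
    s = 0.0 if total == 0 else 1.0 - 3.0 * min(r, g, b) / total
    return s, i

def stretch_saturation(s, gamma=0.5):
    """Exponential stretch: gamma < 1 boosts low saturation values."""
    return s ** gamma

s, i = rgb_to_si(0.6, 0.3, 0.3)
print(round(s, 3), round(i, 3))   # -> 0.25 0.4
print(stretch_saturation(0.25))   # -> 0.5
```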

Open Access Article CNN-Based Multimodal Human Recognition in Surveillance Environments
Sensors 2018, 18(9), 3040; https://doi.org/10.3390/s18093040
Received: 7 August 2018 / Revised: 7 September 2018 / Accepted: 8 September 2018 / Published: 11 September 2018
PDF Full-text (7611 KB) | HTML Full-text | XML Full-text
Abstract
Most current research on human recognition focuses on re-identification across body images taken by several cameras in an outdoor environment, while there is almost no research on indoor human recognition. Previous research on indoor recognition has mainly focused on face recognition, because the camera is usually closer to a person indoors than outdoors. However, owing to the nature of indoor surveillance cameras, which are installed near the ceiling and capture images from above in a downward direction, people do not look directly at the cameras in most cases. Thus, it is often difficult to capture frontal face images, and when this is the case, facial recognition accuracy is greatly reduced. To overcome this problem, the face and body can be used together for human recognition. However, when images are captured by indoor cameras, in many cases only part of the target body falls within the camera viewing angle, which reduces the accuracy of human recognition. To address all of these problems, this paper proposes a multimodal human recognition method that uses both the face and body and is based on deep convolutional neural networks (CNNs). Specifically, to handle partially captured bodies, the results of recognizing the face and body through separate CNNs, VGG Face-16 and ResNet-50, are combined by score-level fusion with the weighted-sum rule to improve recognition performance. The results of experiments conducted using the custom-made Dongguk face and body database (DFB-DB1) and the open ChokePoint database demonstrate that the proposed method achieves high recognition accuracy (equal error rates of 1.52% and 0.58%, respectively) in comparison to face- or body-only recognition and other methods from previous studies. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
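Score-level fusion by the weighted-sum rule reduces to a convex combination of the two matchers' scores. The weight below is a hypothetical value, not the one tuned in the paper:

```python
# Score-level fusion by the weighted-sum rule: combine the face matcher's
# score and the body matcher's score with a single fusion weight.

def weighted_sum_fusion(face_score, body_score, w_face=0.6):
    """Convex combination of two matching scores; w_face is a tunable weight."""
    return w_face * face_score + (1.0 - w_face) * body_score

print(weighted_sum_fusion(0.9, 0.4))  # -> ~0.7
```

In practice the fused score is then thresholded to accept or reject an identity; tuning the weight on validation data is what drives the equal error rate down.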

Open Access Article Face Detection in Nighttime Images Using Visible-Light Camera Sensors with Two-Step Faster Region-Based Convolutional Neural Network
Sensors 2018, 18(9), 2995; https://doi.org/10.3390/s18092995
Received: 31 July 2018 / Revised: 4 September 2018 / Accepted: 4 September 2018 / Published: 7 September 2018
Cited by 2 | PDF Full-text (8314 KB) | HTML Full-text | XML Full-text
Abstract
Conventional nighttime face detection studies mostly use near-infrared (NIR) light cameras or thermal cameras, which are robust to environmental illumination variation and low illumination. However, for the NIR camera, it is difficult to adjust the intensity and angle of the additional NIR illuminator according to its distance from an object, and the thermal camera is expensive to use as a surveillance camera. For these reasons, we propose a nighttime face detection method based on deep learning using a single visible-light camera. In a long-distance night image, it is difficult to detect faces directly from the entire image due to noise and image blur. Therefore, we propose a Two-Step Faster region-based convolutional neural network (R-CNN) operating on images preprocessed by histogram equalization (HE). As a two-step scheme, our method sequentially runs detectors for the body and face areas, locating the face inside a limited body area. This two-step method reduces the processing time of Faster R-CNN while maintaining its face detection accuracy. Using a self-constructed database called the Dongguk Nighttime Face Detection database (DNFD-DB1) and an open database from Fudan University, we show that the proposed method performs better than other existing face detectors. In addition, the proposed Two-Step Faster R-CNN outperformed a single Faster R-CNN, and our method with HE achieved higher accuracies in nighttime face detection than without the preprocessing. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
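The HE preprocessing step can be sketched on a tiny 8-level image: gray levels are remapped so that their cumulative distribution becomes roughly uniform, stretching the contrast of dark nighttime frames:

```python
# Sketch of histogram equalization (HE) on a tiny image with 8 gray levels.
# Real pipelines use 256 levels; the remapping formula is the standard
# CDF-based one.

def hist_equalize(img, levels=8):
    flat = [p for row in img for p in row]
    n = len(flat)
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, running = [], 0
    for c in hist:
        running += c
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:                       # flat image: nothing to equalize
        return [row[:] for row in img]
    def remap(p):
        return round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1))
    return [[remap(p) for p in row] for row in img]

img = [[0, 0, 1], [1, 1, 2]]   # dark, low-contrast nighttime patch
print(hist_equalize(img))      # -> [[0, 0, 5], [5, 5, 7]]
```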

Open Access Article Deep Learning-Based Enhanced Presentation Attack Detection for Iris Recognition by Combining Features from Local and Global Regions Based on NIR Camera Sensor
Sensors 2018, 18(8), 2601; https://doi.org/10.3390/s18082601
Received: 20 July 2018 / Revised: 2 August 2018 / Accepted: 5 August 2018 / Published: 8 August 2018
Cited by 1 | PDF Full-text (3014 KB) | HTML Full-text | XML Full-text
Abstract
Iris recognition systems have been used in high-security applications because of their high recognition rates and the distinctiveness of iris patterns. However, as reported by recent studies, an iris recognition system can be fooled by artificial iris patterns, reducing its security level. The accuracy of previous presentation attack detection research is limited because it used only features extracted from the global iris region image. To overcome this problem, we propose a new presentation attack detection method for iris recognition that combines features extracted from both local and global iris regions, using convolutional neural networks and support vector machines based on a near-infrared (NIR) light camera sensor. The detection results from the two kinds of image features are fused at the feature level and at the score level to enhance the detection ability of each. Through extensive experiments using two popular public datasets (LivDet-Iris-2017 Warsaw and Notre Dame Contact Lens Detection 2015) and their fusion, we validate the efficiency of our proposed method, achieving smaller detection errors than previous studies. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)
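The two fusion strategies mentioned in the abstract can be sketched minimally: feature-level fusion concatenates the local and global CNN feature vectors before classification, while score-level fusion combines the two classifiers' outputs afterwards. The vectors, scorer, and weight below are hypothetical placeholders for the CNN features and SVMs:

```python
# Minimal sketches of the two fusion levels. The SVM classifiers are not
# reproduced here; score_level_fusion simply combines two scalar scores.

def feature_level_fusion(local_feat, global_feat):
    """Concatenate the two feature vectors before a single classifier."""
    return local_feat + global_feat

def score_level_fusion(local_score, global_score, w=0.5):
    """Combine the two classifiers' scores after classification."""
    return w * local_score + (1 - w) * global_score

print(feature_level_fusion([0.1, 0.2], [0.3]))   # -> [0.1, 0.2, 0.3]
print(score_level_fusion(0.8, 0.6))              # -> ~0.7
```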

Review


Open Access Review A Survey of the Techniques for the Identification and Classification of Human Actions from Visual Data
Sensors 2018, 18(11), 3979; https://doi.org/10.3390/s18113979
Received: 29 September 2018 / Revised: 24 October 2018 / Accepted: 9 November 2018 / Published: 15 November 2018
Cited by 1 | PDF Full-text (1300 KB) | HTML Full-text | XML Full-text
Abstract
Recognition of human actions from videos has been an active area of research because it has applications in various domains. The results of work in this field are used in video surveillance, automatic video labeling, and human-computer interaction, among others. Any advancement in this field is tied to advances in the interrelated fields of object recognition, spatio-temporal video analysis, and semantic segmentation. Activity recognition is a challenging task, since it faces many problems such as occlusion, viewpoint variation, background clutter, and illumination variations. Scientific achievements in the field have been numerous and rapid, as the applications are far-reaching. In this survey, we cover the growth of the field from the earliest solutions, where handcrafted features were used, to later deep learning approaches that use millions of images and videos to learn features automatically. Through this discussion, we intend to highlight the major breakthroughs and the directions future research might take while benefiting from state-of-the-art methods. Full article
(This article belongs to the Special Issue Deep Learning-Based Image Sensors)

Sensors EISSN 1424-8220 Published by MDPI AG, Basel, Switzerland