Special Issue "Advanced Intelligent Imaging Technology"

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (31 May 2019).

Special Issue Editor

Prof. Joonki Paik
Guest Editor
Department of Image, Graduate School of Advanced Imaging Science, Chung-Ang University, Seoul 156-756, Korea
Interests: image enhancement and restoration; computational imaging; intelligent surveillance systems

Special Issue Information

Dear Colleagues,

A general pipeline of visual information processing includes: i) image sensing and acquisition, ii) pre-processing, iii) feature detection or metric estimation, and iv) high-level decision making. State-of-the-art artificial intelligence technology has brought a quantum leap in performance to each step of this pipeline.

Artificial intelligence-based image signal processing (ISP) technology can drastically enhance acquired digital images through demosaicing, denoising, deblurring, super resolution, and wide-dynamic-range imaging using deep neural networks. Feature detection and image analysis are the most popular application areas of artificial intelligence. An intelligent imaging system can solve various problems that are unsolvable without such intelligence or learning.

An objective of this Special Issue is to highlight innovative developments in intelligent imaging technology related to state-of-the-art image acquisition, pre-processing, feature detection, and image analysis using machine learning and artificial intelligence. In addition, applications that combine two or more intelligent imaging methods constitute another important research area. Topics include, but are not limited to:

  • Computational photography for intelligent imaging
  • Visual inspection using machine learning and artificial intelligence
  • Depth estimation and three-dimensional analysis
  • Image processing and computer vision algorithms for advanced driver assistance systems (ADAS)
  • Wide-area intelligent surveillance systems using multiple-camera networks
  • Advanced image signal processor (ISP) based on artificial intelligence
  • Deep neural networks for inverse imaging problems
  • Multiple camera collaboration based on reinforcement learning
  • Fusion of hybrid sensors for intelligent imaging systems

Prof. Joonki Paik
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep neural network (DNN)
  • artificial neural network (ANN)
  • intelligent surveillance systems
  • computational photography
  • computational imaging
  • image signal processor (ISP)
  • camera network
  • visual inspection...

Published Papers (73 papers)


Research


Article
Estimation of Parameters of Parathyroid Glands Using Particle Swarm Optimization and Multivariate Generalized Gaussian Function Mixture
Appl. Sci. 2019, 9(21), 4511; https://doi.org/10.3390/app9214511 - 24 Oct 2019
Cited by 1 | Viewed by 619
Abstract
The paper introduces a fitting method for Single-Photon Emission Computed Tomography (SPECT) images of parathyroid glands using a generalized Gaussian function for quantitative assessment of preoperative parathyroid SPECT/CT scintigraphy results in a large patient cohort. Parathyroid glands are very small relative to SPECT resolution, and overlapping of their 3D distributions was observed. A multivariate generalized Gaussian function mixture allows such data to be modeled, but the results depend on the optimization algorithm. Particle Swarm Optimization (PSO) with global best, ring, and random neighborhood topologies was compared. The obtained results show the benefits of the random neighborhood topology: it gives a smaller 3D position error, improving position estimation by about 3% of voxel size, and, most importantly, it reduces the processing time to a few minutes, compared with a few hours for the random walk algorithm. Moreover, the frequency of obtaining low MSE values was more than two times higher for this topology. The presented method based on the random neighborhood topology quantifies activity in a specific voxel in a short time and could be applied in clinical practice.
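A minimal sketch of PSO with a ring neighborhood topology, one of the topologies compared above (the objective function, bounds, and hyperparameters below are illustrative assumptions, not the authors' settings):

```python
import numpy as np

def pso_ring(f, dim, n_particles=30, iters=200, lo=-5.0, hi=5.0,
             w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f with PSO; each particle follows the best of its ring neighbors."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n_particles, dim))
    v = np.zeros_like(x)
    pbest, pval = x.copy(), np.array([f(p) for p in x])
    idx = np.arange(n_particles)
    for _ in range(iters):
        # ring topology: each particle's neighbors are indices i-1 and i+1
        left, right = np.roll(pval, 1), np.roll(pval, -1)
        nbr = np.where(left < right, np.roll(idx, 1), np.roll(idx, -1))
        lbest = np.where(pval < pval[nbr], idx, nbr)  # best of self + neighbors
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (pbest[lbest] - x)
        x = np.clip(x + v, lo, hi)
        fx = np.array([f(p) for p in x])
        improved = fx < pval
        pbest[improved], pval[improved] = x[improved], fx[improved]
    return pbest[pval.argmin()], pval.min()

# example: pso_ring(lambda p: np.sum(p ** 2), dim=3)
```

The random neighborhood variant the paper favors differs only in how `nbr` is drawn: neighbor indices are re-sampled at random rather than fixed on a ring.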

Article
Generating a Cylindrical Panorama from a Forward-Looking Borehole Video for Borehole Condition Analysis
Appl. Sci. 2019, 9(16), 3437; https://doi.org/10.3390/app9163437 - 20 Aug 2019
Cited by 2 | Viewed by 1082
Abstract
Geological exploration plays a fundamental and crucial role in geological engineering. The most frequently used method is to obtain borehole videos with an axial view borehole camera system (AVBCS) in a pre-drilled borehole. This approach to surveying the internal structure of a borehole is based on video playback and video screenshot analysis. One drawback of AVBCS is that it provides only a qualitative description of borehole information with a forward-looking borehole video; quantitative analysis of the borehole data, such as the width and dip angle of fractures, is unavailable. In this paper, we propose a new approach to create a whole borehole-wall cylindrical panorama from the borehole video acquired by AVBCS, which enables further analysis of borehole information. Firstly, based on the Otsu and region labeling algorithms, a borehole center location algorithm is proposed to extract the borehole center of each video image automatically. Afterwards, based on coordinate mapping (CM), a virtual coordinate graph (VCG) is designed for unwrapping the front-view borehole-wall image sequence, generating the corresponding unfolded image sequence while reducing the computational cost. Subsequently, based on the sum of absolute differences (SAD), a projection transformation SAD (PTSAD), which considers the gray-level similarity of candidate images, is proposed to match the unfolded image sequence. Finally, an image filtering module is introduced to filter out invalid frames, and the remaining frames are stitched into a complete cylindrical panorama. Experiments on two real-world borehole videos demonstrate that the proposed method can generate panoramic borehole-wall unfolded images with satisfactory visual quality for follow-up geological condition analysis. From the resulting image, borehole information, including rock mechanical properties, fracture distribution and width, fault distribution, and seam thickness, can be further obtained and analyzed.
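A minimal sketch of the core unwrapping step: mapping an annulus of wall pixels around the detected borehole center of a grayscale frame to a rectangular strip (radii, output width, and nearest-neighbor sampling are illustrative assumptions; the paper's VCG additionally precomputes this mapping for speed):

```python
import numpy as np

def unwrap_annulus(frame, cx, cy, r_in, r_out, out_w=720):
    """Map the annulus [r_in, r_out) around (cx, cy) to a rectangle.

    Output rows correspond to radius and columns to angle, so each
    forward-looking frame contributes one unfolded ring of borehole wall.
    """
    thetas = np.linspace(0.0, 2.0 * np.pi, out_w, endpoint=False)
    radii = np.arange(r_in, r_out)
    # polar -> cartesian source coordinates (nearest-neighbor sampling)
    xs = (cx + np.outer(radii, np.cos(thetas))).round().astype(int)
    ys = (cy + np.outer(radii, np.sin(thetas))).round().astype(int)
    xs = np.clip(xs, 0, frame.shape[1] - 1)
    ys = np.clip(ys, 0, frame.shape[0] - 1)
    return frame[ys, xs]  # shape (r_out - r_in, out_w)
```

Consecutive unwrapped strips would then be aligned (the paper's PTSAD matching) and stacked into the cylindrical panorama.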

Article
Novel Hand Gesture Alert System
Appl. Sci. 2019, 9(16), 3419; https://doi.org/10.3390/app9163419 - 19 Aug 2019
Cited by 4 | Viewed by 1267
Abstract
Sexual assault can cause great societal damage, with negative socio-economic, mental, sexual, physical, and reproductive consequences. According to Eurostat, the number of such crimes increased in the European Union between 2008 and 2016. However, despite the spread of security tools such as cameras, it is usually difficult to tell whether an individual is being assaulted based on his or her posture. Hand gestures are seen by many as a natural means of nonverbal communication when interacting with a computer, and a considerable amount of research has been performed on them. In addition, the identifiable hand placement characteristics provided by modern, inexpensive commercial depth cameras can be used in a variety of gesture recognition-based systems, particularly for human-machine interaction. This paper introduces a novel gesture alert system that uses a combination of Convolutional Neural Networks (CNNs). The overall system can be subdivided into three main parts: firstly, human detection in the image using a pretrained "You Only Look Once" (YOLO) method, which extracts the bounding boxes containing the person's hands; secondly, the gesture detection/classification stage, which processes the bounding box images; and thirdly, a module called "counterGesture", which triggers the alert.

Article
Fast Continuous Structural Similarity Patch Based Arbitrary Style Transfer
Appl. Sci. 2019, 9(16), 3304; https://doi.org/10.3390/app9163304 - 12 Aug 2019
Viewed by 876
Abstract
Style transfer uses a pair of content and style images to synthesize a stylized image that has both the structure of the content image and the style of the style image. Existing optimization-based methods are limited in their performance. Some works using a feed-forward network allow arbitrary style transfer but cannot faithfully reflect the style. In this paper, we present a fast, continuous, structural-similarity patch based arbitrary style transfer. Firstly, we introduce the structural similarity index (SSIM) to compute the similarity between all content and style patches. Then a local style patch choosing procedure is applied to maximize the utilization of all style patches while keeping the swapped style patches continuous with respect to the spatial location of the style. Finally, we apply an efficiently trained feed-forward inverse network to obtain the final stylized image. We use more than 80,000 natural images and 120,000 style images to train that feed-forward inverse network. The results show that our method can transfer arbitrary styles with consistency, and a comparison of results demonstrates the effectiveness and high quality of our stylized images.
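A minimal sketch of scoring content/style patch pairs with SSIM, the matching criterion named above (assumes grayscale float patches of at least 7×7 pixels; the use of scikit-image and the exhaustive argmax search are illustrative simplifications):

```python
import numpy as np
from skimage.metrics import structural_similarity

def best_style_patch(content_patch, style_patches):
    """Return the index of the style patch most similar to the content
    patch under SSIM; data_range must be given for float images."""
    scores = [structural_similarity(content_patch, sp, data_range=1.0)
              for sp in style_patches]
    return int(np.argmax(scores))
```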

Article
TF-YOLO: An Improved Incremental Network for Real-Time Object Detection
Appl. Sci. 2019, 9(16), 3225; https://doi.org/10.3390/app9163225 - 07 Aug 2019
Cited by 21 | Viewed by 2016
Abstract
In recent years, significant advances have been made in visual detection, and an abundance of outstanding models have been proposed. However, state-of-the-art object detection networks are inefficient at detecting small targets, and they commonly fail to run on portable devices or embedded systems due to their high complexity. In this paper, a real-time object detection model, termed Tiny Fast You Only Look Once (TF-YOLO), is developed for implementation in an embedded system. Firstly, the k-means++ algorithm is applied to cluster the dataset, which yields better prior boxes for the targets. Secondly, inspired by the multi-scale prediction idea of the Feature Pyramid Network (FPN) algorithm, the YOLOv3 framework is improved and optimized to detect the extracted features at three scales. In this way, the modified network is sensitive to small targets. Experimental results demonstrate that the proposed TF-YOLO method is a smaller, faster, and more efficient network model that improves end-to-end training and real-time object detection on a variety of devices.
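A minimal sketch of the anchor-box step: clustering ground-truth box sizes with k-means++ to obtain prior boxes (shown with scikit-learn under Euclidean distance for brevity; YOLO implementations often cluster under an IoU-based distance instead):

```python
import numpy as np
from sklearn.cluster import KMeans

def prior_boxes(box_whs, n_anchors=6, seed=0):
    """Cluster (width, height) pairs of ground-truth boxes; the cluster
    centers serve as the detector's prior (anchor) boxes."""
    km = KMeans(n_clusters=n_anchors, init="k-means++", n_init=10,
                random_state=seed).fit(np.asarray(box_whs, dtype=float))
    centers = km.cluster_centers_
    # sort anchors by area, small to large, as multi-scale heads expect
    return centers[np.argsort(centers.prod(axis=1))]
```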

Article
Triple-Attention Mixed-Link Network for Single-Image Super-Resolution
Appl. Sci. 2019, 9(15), 2992; https://doi.org/10.3390/app9152992 - 25 Jul 2019
Cited by 1 | Viewed by 1209
Abstract
Single-image super-resolution is of great importance as a low-level computer-vision task. Recent approaches with deep convolutional neural networks have achieved impressive performance. However, existing architectures are limited by less sophisticated structures and weaker representational power. In this work, to significantly enhance feature representation, we propose the triple-attention mixed-link network (TAN), which consists of (1) three different aspects (i.e., kernel, spatial, and channel) of attention mechanisms and (2) a fusion of powerful residual and dense connections (i.e., mixed links). Specifically, the multi-kernel network learns multi-hierarchical representations under different receptive fields. The features are recalibrated by the effective kernel and channel attention, which filters the information and enables the network to learn more powerful representations. The features finally pass through the spatial attention in the reconstruction network, which generates a fusion of local and global information, lets the network restore more details, and improves reconstruction quality. The proposed network structure reduces the parameter growth rate by 50% compared with previous approaches. The three attention mechanisms provide gains of 0.49 dB, 0.58 dB, and 0.32 dB when evaluated on Set5, Set14, and BSD100, respectively. Thanks to the diverse feature recalibrations and the advanced information flow topology, our proposed model is strong enough to compete with state-of-the-art methods on the benchmark evaluations.
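A minimal PyTorch sketch of a squeeze-and-excitation style channel attention block, one plausible form of the channel attention described above (layer sizes and the reduction ratio are illustrative; the paper's exact module may differ):

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Recalibrate feature maps with learned per-channel weights in [0, 1]."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global context
        self.fc = nn.Sequential(                     # excite: channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # channel-wise rescaling of the feature maps
```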

Article
Variational Autoencoder-Based Multiple Image Captioning Using a Caption Attention Map
Appl. Sci. 2019, 9(13), 2699; https://doi.org/10.3390/app9132699 - 02 Jul 2019
Cited by 4 | Viewed by 1695
Abstract
Image captioning is a promising research topic that is applicable to services that search for desired content in large amounts of video data and to situation explanation services for visually impaired people. Previous research on image captioning has focused on generating one caption per image. However, to increase usability in applications, it is necessary to generate several different captions containing various representations of an image. We propose a method to generate multiple captions using a variational autoencoder, one of the generative models. Because image features play an important role when generating captions, a method to extract a Caption Attention Map (CAM) of the image is proposed, and CAMs are projected to a latent distribution. In addition, methods for evaluating multiple image captioning, which has not yet been actively researched, are proposed. The proposed model outperforms the base model in terms of diversity at comparable accuracy. Moreover, it is verified that the model using CAM generates detailed captions describing various content in the image.

Article
Neural Sign Language Translation Based on Human Keypoint Estimation
Appl. Sci. 2019, 9(13), 2683; https://doi.org/10.3390/app9132683 - 01 Jul 2019
Cited by 31 | Viewed by 2088
Abstract
We propose a sign language translation system based on human keypoint estimation. It is well known that many problems in the field of computer vision require a massive dataset to train deep neural network models. The situation is even worse for sign language translation, where it is far more difficult to collect high-quality training data. In this paper, we introduce the KETI (Korea Electronics Technology Institute) sign language dataset, which consists of 14,672 videos of high resolution and quality. Considering that each country has a different and unique sign language, the KETI sign language dataset can be the starting point for further research on Korean sign language translation. Using the KETI sign language dataset, we develop a neural network model for translating sign videos into natural language sentences by utilizing the human keypoints extracted from the face, hands, and body. The obtained human keypoint vector is normalized by the mean and standard deviation of the keypoints and used as input to our translation model, which is based on the sequence-to-sequence architecture. As a result, we show that our approach is robust even when the size of the training data is not sufficient. Our translation model achieved 93.28% (55.28%) translation accuracy on the validation set (test set) for 105 sentences that can be used in emergency situations. We compared several variants of our neural sign translation model based on different attention mechanisms in terms of classical metrics for measuring translation performance.
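A minimal sketch of the keypoint normalization step described above, assuming keypoints arrive as one flat (x1, y1, x2, y2, ...) row per frame; the per-frame standardization shown is one straightforward reading of "normalized by the mean and standard deviation":

```python
import numpy as np

def normalize_keypoints(kps, eps=1e-8):
    """Standardize an (n_frames, n_coords) array of 2D keypoints so each
    frame's coordinates have zero mean and unit variance."""
    kps = np.asarray(kps, dtype=float)
    mean = kps.mean(axis=1, keepdims=True)
    std = kps.std(axis=1, keepdims=True)
    return (kps - mean) / (std + eps)  # eps guards frames with no spread
```

This removes the signer's position and scale from the input, which is what lets a sequence-to-sequence model train on relatively little data.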

Article
A Model-Based Approach of Foreground Region of Interest Detection for Video Codecs
Appl. Sci. 2019, 9(13), 2670; https://doi.org/10.3390/app9132670 - 30 Jun 2019
Viewed by 869
Abstract
Detecting the Region of Interest (ROI) in video clips is a significant and useful technique both in video codecs and in surveillance/monitoring systems. In this paper, a new model-based detection method suited to video compression codecs is designed around two models: an "inter" model and an "intra" model. The "inter" model exploits block-level motion information produced by global motion compensation approaches, while the "intra" model extracts object details through object filtering and image segmentation procedures. Finally, the detection results are formed through a new clustering-with-fine-tuning approach applied to the "intra" model and assisted by the "inter" model. Experimental results show that the proposed method fits well with real-time video codecs and achieves good performance in both detection precision and computing time. In addition, the proposed method is versatile across a wide range of surveillance videos with different characteristics.

Article
A Method for Identification of Multisynaptic Boutons in Electron Microscopy Image Stack of Mouse Cortex
Appl. Sci. 2019, 9(13), 2591; https://doi.org/10.3390/app9132591 - 26 Jun 2019
Cited by 1 | Viewed by 995
Abstract
Recent electron microscopy (EM) imaging techniques make the automatic acquisition of a large number of serial sections from brain samples possible. Meanwhile, it has been proven that the multisynaptic bouton (MSB), a structure that consists of one presynaptic bouton and multiple postsynaptic spines, is closely related to sensory deprivation, brain trauma, and learning. Nevertheless, it is still a challenging task to analyze this essential structure in EM images due to factors such as imaging artifacts and the presence of complicated subcellular structures. In this paper, we present an effective way to identify MSBs in EM images. Using normalized images as training data, two convolutional neural networks (CNNs) are trained to obtain the segmentation of synapses and the probability map of the neuronal membrane, respectively. Then, a series of follow-up operations are employed to obtain a rectified segmentation of synapses and a segmentation of neurons. By incorporating this information, MSBs can be reliably identified. The dataset in this study is an image stack of mouse cortex that contains 178 serial images with a size of 6004 pixels × 5174 pixels and a voxel resolution of 2 nm × 2 nm × 50 nm. The precision and recall of MSB detection are 68.57% and 94.12%, respectively. Experimental results demonstrate that our method is conducive to biologists' research on MSB properties.

Article
A Two-Stage Gradient Ascent-Based Superpixel Framework for Adaptive Segmentation
Appl. Sci. 2019, 9(12), 2421; https://doi.org/10.3390/app9122421 - 13 Jun 2019
Cited by 3 | Viewed by 861
Abstract
Superpixel segmentation usually over-segments an image into fragments to extract regional features, thereby supporting higher-level computer vision tasks. In this work, a novel coarse-to-fine gradient ascent framework is proposed for superpixel-based adaptive segmentation of color images. In the first stage, a speeded-up Simple Linear Iterative Clustering (sSLIC) method is adopted to generate uniform superpixels efficiently; it assumes that homogeneous regions stay highly consistent during clustering, so much redundant updating computation can be avoided. Then a simple criterion is introduced to evaluate the uniformity of each superpixel region; whenever a superpixel region is under-segmented, an adaptive marker-controlled watershed algorithm performs a finer subdivision. Experimental results show that the framework achieves better performance on detail-rich regions than previous superpixel approaches, with satisfactory efficiency.

Article
Application of Deep Convolutional Neural Networks and Smartphone Sensors for Indoor Localization
Appl. Sci. 2019, 9(11), 2337; https://doi.org/10.3390/app9112337 - 06 Jun 2019
Cited by 17 | Viewed by 1250
Abstract
Indoor localization systems are susceptible to high errors and do not meet current standards of indoor localization. Moreover, the performance of such approaches is limited by device dependence. The use of Wi-Fi makes the localization process vulnerable to dynamic factors and energy-hungry. A multi-sensor-fusion-based indoor localization approach is proposed to overcome these issues. The proposed approach predicts a pedestrian's current location from smartphone sensor data alone. It aims to mitigate the impact of device dependency on localization accuracy and to lower the localization error of magnetic-field-based localization systems. We trained a deep-learning-based convolutional neural network to recognize the indoor scene, which helps to lower the localization error: the recognized scene is used to identify a specific floor and narrow the search space. A database of magnetic field patterns helps to lower the device dependence. A modified K-nearest-neighbor (mKNN) method is presented to calculate the pedestrian's current location. Data from pedestrian dead reckoning further refines this location, and an extended Kalman filter is implemented to this end. The performance of the proposed approach is tested in experiments on Galaxy S8 and LG G6 smartphones. The experimental results demonstrate that the proposed approach can achieve an accuracy of 1.04 m at the 50th percentile, regardless of the smartphone used for localization. The proposed mKNN outperforms the K-nearest-neighbor approach, with lower mean, variance, and maximum errors than KNN. Moreover, the proposed approach does not use Wi-Fi for localization and is more energy efficient than Wi-Fi-based approaches. Experiments reveal that localization without scene recognition leads to higher errors.
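A minimal sketch of magnetic-fingerprint matching in the spirit of the KNN baseline the paper modifies (this is the plain KNN baseline, not the authors' mKNN; the database layout of one fingerprint row per surveyed position is an assumption):

```python
import numpy as np

def knn_locate(query, fingerprints, positions, k=5):
    """Estimate a position as the mean of the k database positions whose
    magnetic-field fingerprints are closest to the query pattern."""
    d = np.linalg.norm(fingerprints - query, axis=1)  # Euclidean distances
    nearest = np.argsort(d)[:k]
    return positions[nearest].mean(axis=0)
```

In the full system this estimate would be refined by pedestrian dead reckoning through the extended Kalman filter mentioned above.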

Article
Intelligent Thermal Imaging-Based Diagnostics of Turbojet Engines
Appl. Sci. 2019, 9(11), 2253; https://doi.org/10.3390/app9112253 - 31 May 2019
Cited by 21 | Viewed by 1419
Abstract
There are only a few applications of infrared thermal imaging in aviation. In the area of turbojet engines, infrared imaging has been used to detect temperature field anomalies in order to identify structural defects in the materials of engine casings or other engine parts. In aviation applications, the evaluation of infrared images is usually performed manually by an expert. This paper deals with the design of an automatic intelligent system that evaluates the technical state of a turbojet engine and diagnoses it during operation based on infrared thermal (IRT) images. A hybrid system interconnecting a self-organizing feature map and an expert system is designed for this purpose. A Kohonen neural network (the self-organizing feature map) is successfully applied to segment IRT images of a turbojet engine with high precision, and the expert system is then used to create diagnostic information from the segmented images. This paper represents a proof of concept of this hybrid system using data from a small iSTC-21v turbojet engine operating in laboratory conditions.

Article
FMnet: Iris Segmentation and Recognition by Using Fully and Multi-Scale CNN for Biometric Security
Appl. Sci. 2019, 9(10), 2042; https://doi.org/10.3390/app9102042 - 17 May 2019
Cited by 9 | Viewed by 1464
Abstract
In deep learning, recent works show that neural networks have high potential in the field of biometric security. The advantage of using this type of architecture, in addition to its robustness, is that the network learns characteristic vectors by creating intelligent filters automatically, thanks to the convolutional layers. In this paper, we propose an algorithm, "FMnet", for iris recognition using a Fully Convolutional Network (FCN) and a Multi-scale Convolutional Neural Network (MCNN). By taking into consideration the ability of convolutional neural networks to learn and work at different resolutions, our proposed iris recognition method overcomes the issues of classical methods, which rely only on handcrafted feature extraction, by performing feature extraction and classification together. Our proposed algorithm shows better classification results than other state-of-the-art iris recognition approaches.

Article
A Low-Cost Approach to Crack Python CAPTCHAs Using AI-Based Chosen-Plaintext Attack
Appl. Sci. 2019, 9(10), 2010; https://doi.org/10.3390/app9102010 - 16 May 2019
Cited by 8 | Viewed by 1310
Abstract
CAPTCHA authentication has been challenged by recent technological advances in AI. However, many of the AI advances challenging CAPTCHA are either restricted by a limited amount of labeled CAPTCHA data or constructed in an expensive or complicated way. In contrast, this paper illustrates a low-cost approach that takes advantage of the nature of open-source libraries for an AI-based chosen-plaintext attack. The chosen-plaintext attack described here relies on a deep learning model created and trained on a simple personal computer in a low-cost way. It shows an efficient cracking rate over two open-source Python CAPTCHA libraries, Claptcha and Captcha. This chosen-plaintext attack method raises a potential security alert in the era of AI, particularly for small-business owners who use open-source CAPTCHA libraries. The main contributions of this project are: (1) it is the first low-cost method based on a chosen-plaintext attack exploiting the nature of open-source Python CAPTCHA libraries; (2) it combines TensorFlow object detection with our proposed peak segmentation algorithm and a convolutional neural network in a novel way to improve recognition accuracy.
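A minimal sketch of the chosen-plaintext idea: because the CAPTCHA generator is open source, an attacker can produce unlimited labeled training pairs locally (shown with the `captcha` package's ImageCaptcha API; the alphabet, sizes, and the downstream training loop are assumptions and are omitted):

```python
import random
import string
from captcha.image import ImageCaptcha

def make_training_pairs(n, length=4, width=160, height=60, seed=0):
    """Generate (PIL image, label) pairs from the open-source generator,
    so the attacker chooses the plaintexts and knows every label."""
    random.seed(seed)
    gen = ImageCaptcha(width=width, height=height)
    alphabet = string.ascii_uppercase + string.digits
    pairs = []
    for _ in range(n):
        label = "".join(random.choices(alphabet, k=length))
        pairs.append((gen.generate_image(label), label))
    return pairs
```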

Article
Infrared Stripe Correction Algorithm Based on Wavelet Analysis and Gradient Equalization
Appl. Sci. 2019, 9(10), 1993; https://doi.org/10.3390/app9101993 - 15 May 2019
Cited by 5 | Viewed by 986
Abstract
In uncooled infrared imaging systems, owing to the non-uniformity of the amplifiers in the readout circuit, infrared images exhibit obvious stripe noise, which greatly affects their quality. In this study, the generation mechanism of stripe noise is analyzed, and a new stripe correction algorithm based on wavelet analysis and gradient equalization is proposed, exploiting the single-direction distribution of the fixed-pattern noise of an infrared focal plane array. The raw infrared image is decomposed by a wavelet transform, and the cumulative histogram of the vertical component is convolved with a one-dimensional Gaussian operator to achieve gradient equalization in the horizontal direction. In addition, the stripe noise is further separated from edge texture by a guided filter. The algorithm is verified on simulated noisy images and real infrared images; comparison experiments and qualitative and quantitative analyses against current advanced algorithms show that its correction results are not only visually mild but also achieve the best structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) scores. The algorithm effectively removes stripe noise without losing details, and its correction performance exceeds that of the most advanced methods.
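A minimal sketch of the wavelet step, assuming the stripe energy concentrates in the vertical-detail sub-band of a single-level 2D DWT (the crude scaling below merely stands in for the paper's gradient equalization, and the sub-band carrying stripes depends on stripe orientation and library convention):

```python
import numpy as np
import pywt

def suppress_stripes(img, wavelet="db4", gain=0.2):
    """Attenuate the detail sub-band assumed to capture stripe noise,
    then reconstruct; a real implementation would equalize gradients
    of that band instead of simply scaling it."""
    cA, (cH, cV, cD) = pywt.dwt2(np.asarray(img, dtype=float), wavelet)
    cV = cV * gain  # illustrative attenuation of the vertical component
    return pywt.idwt2((cA, (cH, cV, cD)), wavelet)
```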

Article
3D Wireframe Modeling and Viewpoint Estimation for Multi-Class Objects Combining Deep Neural Network and Deformable Model Matching
Appl. Sci. 2019, 9(10), 1975; https://doi.org/10.3390/app9101975 - 14 May 2019
Cited by 1 | Viewed by 1206
Abstract
The accuracy of 3D viewpoint and shape estimation from 2D images has been greatly improved by machine learning, especially deep learning technology such as the convolutional neural network (CNN). However, current methods are usually valid only for one specific category and exhibit poor performance when generalized to other categories, which means that multiple detectors or networks are needed for multi-class object images. In this paper, we propose a method with strong generalization ability that incorporates a single CNN with deformable model matching for 3D viewpoint and shape estimation of multi-class object images. The CNN is utilized to detect keypoints of the potential object in the image, while a deformable model matching stage conducts 3D wireframe modeling and viewpoint estimation simultaneously, supported by the detected keypoints. Moreover, parameter estimation by deformable model matching is robustly fault-tolerant to keypoint detection results that contain mistaken keypoints. The proposed method is evaluated on the Pascal3D+ dataset. Experiments show that it performs well in both parameter estimation accuracy and generalization to multi-class objects. This research is a useful exploration of extending the generalization of deep learning to specific tasks.

Article
Deep Forest-Based Monocular Visual Sign Language Recognition
Appl. Sci. 2019, 9(9), 1945; https://doi.org/10.3390/app9091945 - 12 May 2019
Cited by 4 | Viewed by 1246
Abstract
Sign language recognition (SLR) is a bridge linking the hearing impaired and the general public. Some SLR methods using wearable data gloves are not portable enough to provide a daily sign language translation service, while visual SLR is more flexible to work with in most scenes. This paper introduces a monocular vision-based approach to SLR. Human skeleton action recognition is proposed to express semantic information, including the representation of signs' gestures, using the regularization of body joint features and a deep-forest-based semantic classifier with a voting strategy. We test our approach on the public American Sign Language Lexicon Video Dataset (ASLLVD) and a private testing set. It achieves a promising performance and shows a high generalization capability on the testing set.

Article
Two-Level Attentions and Grouping Attention Convolutional Network for Fine-Grained Image Classification
Appl. Sci. 2019, 9(9), 1939; https://doi.org/10.3390/app9091939 - 11 May 2019
Cited by 9 | Viewed by 1295
Abstract
The focus of fine-grained image classification tasks is to ignore interference information and grasp local features. This challenge is what the visual attention mechanism excels at. Firstly, we construct a two-level attention convolutional network that characterizes object-level attention and pixel-level attention. Then, we combine the two kinds of attention through a second-order response transform algorithm. Furthermore, we propose a clustering-based grouping attention model, which implements part-level attention. The grouping attention method stretches all the semantic features in a deeper convolution layer of the network into vectors; these vectors are clustered by vector dot products, and each cluster represents a specific semantic. The grouping attention algorithm implements the functions of group convolution and feature clustering, which can greatly reduce the network parameters and improve the recognition rate and interpretability of the network. Finally, low-level visual features and high-level semantic information are merged by a multi-level feature fusion method to accurately classify fine-grained images. We achieve good results without using pre-trained networks or fine-tuning techniques.

Article
Complex Human–Object Interactions Analyzer Using a DCNN and SVM Hybrid Approach
Appl. Sci. 2019, 9(9), 1869; https://doi.org/10.3390/app9091869 - 07 May 2019
Cited by 5 | Viewed by 1258
Abstract
Nowadays, with the emergence of sophisticated electronic devices, human daily activities are becoming more and more complex. At the same time, research has begun on the use of reliable, cost-effective sensors, patient monitoring systems, and other systems that make daily life more comfortable for the elderly. Moreover, in the field of computer vision, human action recognition (HAR) has drawn much attention as a research subject because of its potential for numerous cost-effective applications. Although much research has investigated HAR, most has dealt with simple basic actions in simplified environments; not much work has been done in more complex, real-world environments. Therefore, a system is needed that can recognize complex daily activities in a variety of realistic environments. In this paper, we propose a system for recognizing such activities, in which humans interact with various objects, taking into consideration object-oriented activity information and using deep convolutional neural networks and a multi-class support vector machine (multi-class SVM). The experiments are performed on the publicly available Cornell Activity Dataset CAD-120, a dataset of human-object interactions featuring ten high-level daily activities. The results show that the proposed system achieves an accuracy of 93.33%, which is higher than other state-of-the-art methods, and has great potential for applications that recognize complex daily activities.
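A minimal sketch of the hybrid idea in the title: deep CNN features fed into a multi-class SVM (shown with scikit-learn on precomputed feature vectors; the feature extractor, kernel, and C value are assumptions):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_activity_classifier(features, labels):
    """Fit a multi-class SVM on CNN feature vectors (one row per clip).
    SVC handles multi-class problems via one-vs-one voting by default."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
    clf.fit(np.asarray(features), np.asarray(labels))
    return clf
```

Separating the learned representation (CNN) from the classifier (SVM) is what allows the SVM's margin-based decision to be trained on relatively few labeled clips.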

Article
Medical Image Segmentation with Adjustable Computational Complexity Using Data Density Functionals
Appl. Sci. 2019, 9(8), 1718; https://doi.org/10.3390/app9081718 - 25 Apr 2019
Cited by 3 | Viewed by 1201
Abstract
Techniques for automatic medical image segmentation are among the most important methods for clinical investigation, anatomic research, and modern medicine. The various image structures produced by imaging apparatus serve a diversity of medical applications. However, this structural diversity is also a burden for contemporary techniques. Segmenting images of tremendously small size (<25 pixels by 25 pixels) or tremendously large size (>1024 pixels by 1024 pixels) is a challenge from the perspectives of both technical feasibility and theoretical development. Noise and pixel pollution caused by the imaging apparatus further aggravate the difficulty of image segmentation. To simultaneously overcome these predicaments, we propose a new method of medical image segmentation with adjustable computational complexity by introducing data density functionals. Under this theoretical framework, several kernels can be assigned to conquer specific predicaments: a square-root potential kernel is used to smoothen the featured components of the employed images, while a Yukawa potential kernel is applied to enhance local featured properties. Moreover, the characteristics of global density functional estimation also allow image compression without losing the main image feature structures. Segmentation experiments showed successful results at various compression ratios. The computational complexity was significantly reduced, and the accuracy, estimated by the Jaccard index, was excellent. Moreover, noise and regions of light pollution were mostly filtered out in the image compression procedure.

Article
Computer-Aided Detection of Hyperacute Stroke Based on Relative Radiomic Patterns in Computed Tomography
Appl. Sci. 2019, 9(8), 1668; https://doi.org/10.3390/app9081668 - 23 Apr 2019
Cited by 7 | Viewed by 1029
Abstract
Ischemic stroke is one of the leading causes of disability and death. To achieve timely assessment, a computer-aided diagnosis (CAD) system is proposed to perform early recognition of hyperacute ischemic stroke on non-contrast computed tomography (NCCT). In total, 26 patients with hyperacute ischemic stroke (onset <6 h prior) and 56 normal controls composed the image database. For each NCCT slice, textural features were extracted from ranklet-transformed images, which have enhanced local contrast. Textural differences between the two sides of an image were calculated and combined in a machine learning classifier to detect stroke areas. The proposed CAD system using ranklet features achieved significantly higher accuracy (81% vs. 71%), specificity (90% vs. 79%), and area under the curve (Az) (0.81 vs. 0.73) than conventional textural features. The diagnostic suggestions provided by the CAD system are fast and promising and could be useful in the hyperacute ischemic stroke assessment pipeline.

Article
Pedestrian Flow Tracking and Statistics of Monocular Camera Based on Convolutional Neural Network and Kalman Filter
Appl. Sci. 2019, 9(8), 1624; https://doi.org/10.3390/app9081624 - 18 Apr 2019
Cited by 7 | Viewed by 1154
Abstract
Pedestrian flow statistics and analysis in public places are an important means of ensuring urban safety. However, video-based pedestrian flow statistics algorithms have mainly relied on binocular vision or a vertically downward-facing camera, which seriously limits the applicable scenes and counting areas and cannot make use of the large number of monocular cameras in a city. To solve this problem, we propose a pedestrian flow statistics algorithm based on a monocular camera. Firstly, a convolutional neural network is used to detect pedestrian targets. Then, motion models for the targets are established with a Kalman filter, and based on these motion models a data association algorithm completes target tracking. Finally, the pedestrian flow is counted by a pedestrian counting method based on virtual blocks. The algorithm is tested on real scenes and public datasets. The experimental results show that the algorithm has high accuracy and strong real-time performance, which verifies its reliability.
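A minimal sketch of the Kalman-filter motion model used between detections, assuming a constant-velocity state [x, y, vx, vy] per pedestrian (the noise covariances below are illustrative):

```python
import numpy as np

class ConstantVelocityKF:
    """Predict/update a pedestrian state [x, y, vx, vy] from (x, y) detections."""
    def __init__(self, x0, y0, dt=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])
        self.P = np.eye(4) * 10.0                 # state uncertainty
        self.F = np.eye(4)                        # constant-velocity dynamics
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.eye(2, 4)                     # observe position only
        self.Q = np.eye(4) * 0.01                 # process noise
        self.R = np.eye(2) * 1.0                  # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                         # predicted position

    def update(self, z):
        y = np.asarray(z) - self.H @ self.x       # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

Data association then matches each detection to the track whose predicted position it best explains.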

Article
A Trimmed Clustering-Based l1-Principal Component Analysis Model for Image Classification and Clustering Problems with Outliers
Appl. Sci. 2019, 9(8), 1562; https://doi.org/10.3390/app9081562 - 15 Apr 2019
Cited by 1 | Viewed by 1258
Abstract
Different versions of principal component analysis (PCA) have been widely used to extract important information for image recognition and image clustering problems. However, owing to the presence of outliers, this remains challenging. This paper proposes a new PCA methodology based on a novel discovery that the widely used l1-PCA is equivalent to a two-group k-means clustering model. The projection vector of l1-PCA is the vector difference between the two cluster centers estimated by the clustering model. In theory, this vector difference provides inter-cluster information, which is beneficial for distinguishing data objects from different classes. However, the performance of l1-PCA is not comparable with state-of-the-art methods, because l1-PCA is sensitive to outliers: the equivalent clustering model is not robust to them. To overcome this limitation, we introduce a trimming function into the clustering model and propose a trimmed-clustering based l1-PCA (TC-PCA). With this trimming-set formulation, TC-PCA is not sensitive to outliers. Moreover, we mathematically prove the convergence of the proposed algorithm. Experimental results on image classification and clustering indicate that our proposed method outperforms current state-of-the-art methods.
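A minimal sketch of the equivalence the paper builds on: estimating an l1-PCA-style projection direction as the difference of two k-means cluster centers (using scikit-learn; the trimming step that makes TC-PCA robust to outliers is omitted):

```python
import numpy as np
from sklearn.cluster import KMeans

def two_means_projection(X, seed=0):
    """Cluster the data into two groups; the normalized difference of the
    cluster centers serves as the projection direction."""
    km = KMeans(n_clusters=2, n_init=10, random_state=seed).fit(X)
    v = km.cluster_centers_[0] - km.cluster_centers_[1]
    return v / np.linalg.norm(v)
```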

Article
Attention-Aware Adversarial Network for Person Re-Identification
Appl. Sci. 2019, 9(8), 1550; https://doi.org/10.3390/app9081550 - 14 Apr 2019
Cited by 1 | Viewed by 1179
Abstract
Person re-identification (re-ID) is a fundamental problem in the field of computer vision. The performance of deep-learning-based person re-ID models suffers from a lack of training data. In this work, we introduce a novel image-specific data augmentation method at the feature map level to enforce feature diversity in the network. Furthermore, an attention assignment mechanism is proposed to ensure that the person re-ID classifier focuses on nearly all important regions of the input person image. To achieve this, a three-stage framework is proposed. First, a baseline classification network is trained for person re-ID. Second, an attention assignment network is built on the baseline network, in which the attention module learns to suppress the response of the currently detected regions and re-assign attention to other important locations. By this means, multiple regions important for classification are highlighted by the attention map. Finally, the attention map is integrated into the attention-aware adversarial network (AAA-Net), which generates high-performance classification results with an adversarial training strategy. We evaluate the proposed method on two large-scale benchmark datasets, Market1501 and DukeMTMC-reID. Experimental results show that our algorithm performs favorably against state-of-the-art methods.

Article
Adaptive Context-Aware and Structural Correlation Filter for Visual Tracking
Appl. Sci. 2019, 9(7), 1338; https://doi.org/10.3390/app9071338 - 29 Mar 2019
Cited by 1 | Viewed by 979
Abstract
Accurate visual tracking is a challenging issue in computer vision. Correlation filter (CF) based methods are favored in visual tracking for their efficiency and high performance. Nonetheless, traditional CF-based trackers use insufficient context information and easily drift in scenes with fast motion or background clutter. Moreover, CF-based trackers are sensitive to partial occlusion, which may reduce their overall performance and even lead to failure on challenging tracking sequences. In this paper, we present an adaptive context-aware (CA) and structural correlation filter for tracking. Firstly, we propose a novel context selection strategy to obtain negative samples. Secondly, to gain robustness against partial occlusion, we construct a structural correlation filter by learning both holistic and local models. Finally, we introduce an adaptive updating scheme that uses a fluctuation parameter. Extensive comprehensive experiments on the object tracking benchmark (OTB)-100 dataset demonstrate that our proposed tracker performs favorably against several state-of-the-art trackers.
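A minimal sketch of the correlation-filter core that CF trackers share: solving for a filter in the Fourier domain and locating the target at the correlation peak (a single-channel MOSSE-style filter with illustrative regularization, omitting windowing and preprocessing; this is the generic CF baseline, not the authors' context-aware structural variant):

```python
import numpy as np

def train_filter(patch, target_response, lam=1e-3):
    """Closed-form correlation filter in the Fourier domain:
    H* = (G . conj(F)) / (F . conj(F) + lambda)."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(target_response)  # e.g., a Gaussian peaked on the target
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def locate(H, new_patch):
    """Correlate the learned filter with a new patch; the response peak
    gives the target's new position."""
    response = np.real(np.fft.ifft2(H * np.fft.fft2(new_patch)))
    return np.unravel_index(np.argmax(response), response.shape)
```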

Article
Fusing Appearance and Prior Cues for Road Detection
Appl. Sci. 2019, 9(5), 996; https://doi.org/10.3390/app9050996 - 10 Mar 2019
Cited by 2 | Viewed by 1072
Abstract
Road detection is a crucial research topic in computer vision, especially in the framework of autonomous driving and driver assistance. Moreover, it is an invaluable step for other tasks such as collision warning, vehicle detection, and pedestrian detection. Nevertheless, road detection remains challenging due to continuously changing backgrounds, varying illumination (shadows and highlights), variability of road appearance (size, shape, and color), and differently shaped objects (lane markings, vehicles, and pedestrians). In this paper, we propose an algorithm that fuses appearance and prior cues for road detection. Firstly, input images are preprocessed by simple linear iterative clustering (SLIC), morphological processing, and an illuminant-invariant transformation to obtain superpixels and remove lane markings, shadows, and highlights. Then, we design a novel seed superpixel selection method and model appearance cues using a Gaussian mixture model fitted on the selected seed superpixels. Next, we construct a road geometric prior model offline, which provides statistical descriptions and relevant information to infer the location of the road surface. Finally, a Bayesian framework is used to fuse the appearance and prior cues. Experiments carried out on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) road benchmark show that the proposed algorithm delivers compelling performance and achieves state-of-the-art results among model-based methods.
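A minimal sketch of the appearance-cue step: fitting a Gaussian mixture to seed-superpixel colors and scoring every superpixel under it (using scikit-learn's GaussianMixture; the color features and component count are assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def road_appearance_scores(seed_colors, all_colors, n_components=3, seed=0):
    """Fit a GMM on mean colors of seed (road) superpixels and return the
    log-likelihood of every superpixel under the road appearance model."""
    gmm = GaussianMixture(n_components=n_components,
                          random_state=seed).fit(np.asarray(seed_colors))
    return gmm.score_samples(np.asarray(all_colors))
```

In the full algorithm these likelihoods would be combined with the offline geometric prior inside the Bayesian fusion framework.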

Article
Color Inverse Halftoning Method with the Correlation of Multi-Color Components Based on Extreme Learning Machine
Appl. Sci. 2019, 9(5), 841; https://doi.org/10.3390/app9050841 - 27 Feb 2019
Cited by 15 | Viewed by 1115
Abstract
Look-up table (LUT)-based methods are a popular and effective approach to inverse halftoning. However, there is still much room to improve the reconstructed image quality for color halftone images, because most existing color inverse halftoning methods simply extend LUT methods to each color component separately. To this end, this paper presents a novel color inverse halftoning method that exploits the correlation of multiple color components. By considering all existing contone values that share the same halftone pattern across the three color component tables, we first propose the concept of a common pattern. Then, an extreme learning machine (ELM) is employed to estimate the contone values of nonexistent patterns from the common patterns in the color LUT, which not only improves the fitting precision for nonexistent patterns but also offers fast transformation speed. Experimental results show that the proposed method achieves better image quality than previously published methods.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
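The ELM estimator used here trains in closed form: the hidden layer is random and fixed, and only the output weights are solved by least squares, which explains the fast transformation speed. A minimal sketch with illustrative sizes and activation:

    import numpy as np

    def elm_train(X, T, n_hidden=200, seed=0):
        """Single-hidden-layer ELM: random input weights, analytic output weights."""
        rng = np.random.default_rng(seed)
        W = rng.normal(size=(X.shape[1], n_hidden))   # random, never trained
        b = rng.normal(size=n_hidden)
        H = np.tanh(X @ W + b)                        # hidden-layer activations
        beta = np.linalg.pinv(H) @ T                  # least-squares output weights
        return W, b, beta

    def elm_predict(X, W, b, beta):
        return np.tanh(X @ W + b) @ beta

    X = np.random.rand(100, 8)                        # e.g. halftone-pattern features
    T = np.random.rand(100, 3)                        # e.g. RGB contone targets
    W, b, beta = elm_train(X, T)
    print(elm_predict(X, W, b, beta).shape)           # (100, 3)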

Article
Fully Symmetric Convolutional Network for Effective Image Denoising
Appl. Sci. 2019, 9(4), 778; https://doi.org/10.3390/app9040778 - 22 Feb 2019
Cited by 5 | Viewed by 1114
Abstract
Neural-network-based image denoising is one of the most promising approaches to problems in image processing. In this work, a deep fully symmetric convolutional–deconvolutional neural network (FSCN) is proposed for image denoising. The proposed model comprises a novel architecture with a chain of successive symmetric convolutional–deconvolutional layers. This framework learns convolutional–deconvolutional mappings from corrupted images to clean ones in an end-to-end fashion without using image priors. The convolutional layers act as a feature extractor that encodes the primary components of the image content while eliminating corruptions, and the deconvolutional layers then decode these image abstractions to recover image content details. An adaptive moment optimizer is used to minimize the reconstruction loss, as it is well suited to large data volumes and noisy images. Extensive denoising experiments were conducted to evaluate the FSCN model against existing state-of-the-art denoising algorithms. The results show that the proposed model achieves superior denoising, both qualitatively and quantitatively. This work also presents an efficient GPU implementation of the FSCN model, which makes it easy and attractive for practical denoising applications.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
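A toy version of such a symmetric convolutional–deconvolutional chain, written in PyTorch with illustrative depths and widths (the paper's exact architecture and hyperparameters will differ), trained with the adaptive moment (Adam) optimizer on a reconstruction loss:

    import torch
    import torch.nn as nn

    class ConvDeconvDenoiser(nn.Module):
        """Small symmetric conv-deconv chain in the spirit of the FSCN."""
        def __init__(self, ch=64):
            super().__init__()
            self.encode = nn.Sequential(                     # feature extraction
                nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
            self.decode = nn.Sequential(                     # detail recovery
                nn.ConvTranspose2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(ch, 1, 3, padding=1))
        def forward(self, x):
            return self.decode(self.encode(x))

    model = ConvDeconvDenoiser()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)      # adaptive moment optimizer
    clean = torch.rand(4, 1, 40, 40)
    noisy = clean + 0.1 * torch.randn_like(clean)
    loss = nn.functional.mse_loss(model(noisy), clean)       # reconstruction loss
    opt.zero_grad(); loss.backward(); opt.step()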

Article
Comparative Evaluation of Hand-Crafted Image Descriptors vs. Off-the-Shelf CNN-Based Features for Colour Texture Classification under Ideal and Realistic Conditions
Appl. Sci. 2019, 9(4), 738; https://doi.org/10.3390/app9040738 - 20 Feb 2019
Cited by 25 | Viewed by 1979
Abstract
Convolutional Neural Networks (CNN) have brought spectacular improvements to several fields of machine vision, including object, scene, and face recognition. Nonetheless, the impact of this new paradigm on the classification of fine-grained images, such as colour textures, is still debated. In this work, we evaluate the effectiveness of traditional, hand-crafted descriptors against off-the-shelf CNN-based features for the classification of different types of colour textures under a range of imaging conditions. The study covers 68 image descriptors (35 hand-crafted and 33 CNN-based) and 46 compilations of 23 colour texture datasets divided into 10 experimental conditions. On average, the results indicate a marked superiority of deep networks, particularly with non-stationary textures and in the presence of multiple changes in the acquisition conditions. By contrast, hand-crafted descriptors were better at discriminating stationary textures under steady imaging conditions and proved more robust than CNN-based features to image rotation.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
Effective Crack Damage Detection Using Multilayer Sparse Feature Representation and Incremental Extreme Learning Machine
Appl. Sci. 2019, 9(3), 614; https://doi.org/10.3390/app9030614 - 12 Feb 2019
Cited by 4 | Viewed by 2213
Abstract
Detecting cracks in reinforced concrete is still a challenging problem owing to complex disturbances from background noise. In this work, we advocate a new concrete crack damage detection model based upon multilayer sparse feature representation and an incremental extreme learning machine (ELM), which has both favorable feature learning and classification capabilities. Specifically, a large number of crack and non-crack patches are obtained from the collected concrete images by cropping with a sliding window and applying image rotation. From these image patches, defect region features can be quickly computed by multilayer sparse ELM autoencoder networks. Then, an online incremental ELM classification network is used to recognize the crack defect features. Unlike commonly used deep-learning-based methods, the presented ELM-based crack detection model can be trained efficiently without tediously fine-tuning the parameters of the entire network. Moreover, according to ELM theory, the proposed crack detector works universally for defect feature extraction and detection. In the experiments, compared with other recently developed crack detectors, the proposed concrete crack detection model offers outstanding training efficiency and favorable crack detection accuracy.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
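The patch-harvesting step is easy to reproduce. The sketch below crops a sliding window and adds 90-degree rotations; the window size, stride, and rotation set are illustrative rather than the paper's settings:

    import numpy as np

    def crack_patches(image, size=32, stride=16):
        """Sliding-window cropping with rotation augmentation."""
        patches = []
        h, w = image.shape[:2]
        for y in range(0, h - size + 1, stride):
            for x in range(0, w - size + 1, stride):
                patch = image[y:y + size, x:x + size]
                for k in range(4):                    # 0, 90, 180, 270 degrees
                    patches.append(np.rot90(patch, k))
        return np.stack(patches)

    print(crack_patches(np.random.rand(64, 64)).shape)   # (36, 32, 32)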

Article
A 3D Object Detection Based on Multi-Modality Sensors of USV
Appl. Sci. 2019, 9(3), 535; https://doi.org/10.3390/app9030535 - 05 Feb 2019
Cited by 3 | Viewed by 1493
Abstract
Unmanned Surface Vehicles (USVs) are commonly equipped with multi-modality sensors. Fully utilizing these sensors can improve object detection for USVs and further contribute to better autonomous navigation. The purpose of this paper is to solve the problem of 3D object detection for USVs in complicated marine environments. We propose a 3D object detection deep neural network based on the multi-modality data of USVs. This model includes a modified proposal generation network and a deep fusion detection network. The proposal generation network improves feature extraction, while the deep fusion detection network enhances fusion performance and achieves more accurate object detection results. The model was tested on both the KITTI 3D object detection dataset (a project of the Karlsruhe Institute of Technology and the Toyota Technological Institute at Chicago) and a self-collected offshore dataset. The model shows excellent performance under small-memory conditions. The results further prove that the deep-learning-based method gives good accuracy under the complicated surface conditions of marine environments.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
A Fast Sparse Coding Method for Image Classification
Appl. Sci. 2019, 9(3), 505; https://doi.org/10.3390/app9030505 - 01 Feb 2019
Cited by 2 | Viewed by 1068
Abstract
Image classification is an important problem in computer vision. The sparse coding spatial pyramid matching (ScSPM) framework is widely used in this field. However, sparse coding cannot effectively handle very large training sets because of its high computational complexity, and ignoring the mutual dependence among local features results in highly variable sparse codes, even for similar features. To overcome the shortcomings of previous sparse coding algorithms, we present an image classification method that replaces the sparse dictionary with a stable dictionary learned via low-complexity clustering, more specifically, a k-medoids clustering method optimized by k-means++. The proposed method reduces learning complexity and improves feature stability. In the experiments, we compared the effectiveness of our method with the existing ScSPM method and its improved versions. We evaluated our approach on two diverse datasets: Caltech-101 and UIUC-Sports. The results show that our method increases the accuracy of spatial pyramid matching, which suggests that it is capable of improving the performance of sparse coding features.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
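A minimal sketch of the described dictionary-learning step, k-medoids clustering with k-means++-style seeding, is given below; the iteration count and the squared Euclidean metric are assumptions, and a real implementation would use a more efficient medoid update:

    import numpy as np

    def kmedoids_pp(X, k, n_iter=20, seed=0):
        """k-medoids with k-means++ seeding; atoms are actual data points."""
        rng = np.random.default_rng(seed)
        n = len(X)
        medoids = [rng.integers(n)]
        for _ in range(k - 1):                        # k-means++ D^2 seeding
            d2 = np.min(((X[:, None] - X[medoids]) ** 2).sum(-1), axis=1)
            medoids.append(rng.choice(n, p=d2 / d2.sum()))
        medoids = np.array(medoids)
        for _ in range(n_iter):                       # alternate assign / update
            labels = ((X[:, None] - X[medoids]) ** 2).sum(-1).argmin(axis=1)
            for j in range(k):
                members = np.where(labels == j)[0]
                if len(members):
                    intra = ((X[members][:, None] - X[members]) ** 2).sum(-1).sum(1)
                    medoids[j] = members[intra.argmin()]
        return X[medoids], labels

    atoms, labels = kmedoids_pp(np.random.rand(300, 16), k=8)
    print(atoms.shape)                                # (8, 16) stable dictionary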

Article
Harbor Extraction Based on Edge-Preserve and Edge Categories in High Spatial Resolution Remote-Sensing Images
Appl. Sci. 2019, 9(3), 420; https://doi.org/10.3390/app9030420 - 26 Jan 2019
Cited by 1 | Viewed by 929
Abstract
Efficient harbor extraction is essential due to the strategic importance of harbors in economic and military construction. However, there are few studies on harbor extraction. In this article, a new harbor extraction algorithm based on edge preservation and edge categories (EC) is proposed for high-spatial-resolution remote-sensing images. In the preprocessing stage, we propose a local edge preservation algorithm (LEPA) to remove redundant details and reduce useless edges. After acquiring the local edge-preserved images, in order to reduce redundant matched keypoints and improve the accuracy of the target candidate extraction method, we propose a scale-invariant feature transform (SIFT) keypoint extraction method based on edge categories (EC-SIFT); this method greatly reduces the redundancy of SIFT keypoints and lowers the computational complexity of the target extraction system. Finally, the harbor extraction algorithm uses a Support Vector Machine (SVM) classifier to identify the harbor target. The experimental results show that the proposed algorithm effectively removes redundant details and improves the accuracy and efficiency of harbor target extraction.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
Arabic Cursive Text Recognition from Natural Scene Images
Appl. Sci. 2019, 9(2), 236; https://doi.org/10.3390/app9020236 - 10 Jan 2019
Cited by 10 | Viewed by 2402
Abstract
This paper presents a comprehensive survey of Arabic cursive scene text recognition. Publications in recent years reflect a shift of interest among document image analysis researchers from the recognition of optical characters to the recognition of characters appearing in natural images. Scene text recognition is a challenging problem due to variations in font style, size, alignment, orientation, reflection, illumination, blurriness, and background complexity. Among cursive scripts, Arabic scene text recognition is considered an even more challenging problem due to joined writing, variant forms of the same character, a large number of ligatures, multiple baselines, and so on. Surveys of Latin and Chinese script-based scene text recognition systems exist, but the Arabic-like scene text recognition problem has yet to be addressed in detail. In this manuscript, we highlight some of the latest techniques presented for text classification. The presented techniques, which follow deep learning architectures, are equally suitable for the development of Arabic cursive scene text recognition systems. Issues pertaining to text localization and feature extraction are also discussed. Moreover, this article emphasizes the importance of a benchmark cursive scene text dataset. Based on the discussion, future directions are outlined, some of which may provide insight into cursive scene text for researchers.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
New Evolutionary-Based Techniques for Image Registration
Appl. Sci. 2019, 9(1), 176; https://doi.org/10.3390/app9010176 - 05 Jan 2019
Cited by 4 | Viewed by 1327
Abstract
The work reported in this paper aims at developing evolutionary algorithms to register images for signature recognition purposes. We propose and develop several registration methods in order to obtain accurate and fast algorithms. First, we introduce two variants of the firefly method that prove to have excellent accuracy and fair run times. To speed up the computation, we then propose two variants of the Accelerated Particle Swarm Optimization (APSO) method. The resulting algorithms are significantly faster than the firefly-based ones, though their recognition rates are slightly lower. To find a trade-off between recognition rate and computational complexity, we developed a hybrid method that combines the ability of auto-adaptive Evolution Strategies (ES) to discover a global optimum with the fast convergence of APSO. The accuracy and efficiency of the resulting algorithms are experimentally demonstrated through an extensive series of tests on various pairs of signature images. A comparative analysis of the quality of the proposed methods, together with conclusions and suggestions for further development, is provided in the final part of the paper.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
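To give a flavor of the APSO component, the sketch below applies a global-best accelerated particle swarm with a shrinking random walk to a toy two-dimensional translation-registration problem; the objective, coefficients, and search range are illustrative and not the authors' signature-registration setup:

    import numpy as np

    def apso_register(ref, moving, n_particles=30, n_iter=50, seed=0):
        """Accelerated PSO searching a 2-D translation aligning `moving` to `ref`."""
        rng = np.random.default_rng(seed)
        def cost(t):
            dy, dx = int(round(t[0])), int(round(t[1]))
            return np.mean((ref - np.roll(moving, (dy, dx), axis=(0, 1))) ** 2)
        x = rng.uniform(-10, 10, size=(n_particles, 2))
        g = min(x, key=cost)                                  # global best
        for it in range(n_iter):
            alpha = 0.5 * 0.9 ** it                           # shrinking random walk
            x = 0.7 * x + 0.3 * g + alpha * rng.normal(size=x.shape)
            best = min(x, key=cost)
            if cost(best) < cost(g):
                g = best
        return np.round(g).astype(int)

    yy, xx = np.mgrid[0:64, 0:64]
    img = np.exp(-((yy - 20) ** 2 + (xx - 40) ** 2) / 200.0)  # smooth test image
    shifted = np.roll(img, (3, -5), axis=(0, 1))
    print(apso_register(img, shifted))                        # should land near [-3, 5]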

Article
Using the Guided Fireworks Algorithm for Local Backlight Dimming
Appl. Sci. 2019, 9(1), 129; https://doi.org/10.3390/app9010129 - 01 Jan 2019
Cited by 1 | Viewed by 1306
Abstract
Local backlight dimming is a promising display technology with good performance in improving visual quality and reducing the power consumption of device displays. To set the optimal backlight luminance, it is important to design high-performance local dimming algorithms. In this paper, we focus on improving the quality of the displayed image and cast local backlight dimming as an optimization problem. To better evaluate image quality, we use the structural similarity (SSIM) index as the image quality measure and build a model of the local dimming problem. To solve this optimization problem, we design a local dimming algorithm based on the Fireworks Algorithm (FWA), a recent evolutionary computation (EC) algorithm. To further improve solution quality, we introduce a guiding strategy into the FWA and propose an improved algorithm named the Guided Fireworks Algorithm (GFWA). Experimental results show that the GFWA achieves higher performance in local backlight dimming than the Look-Up Table (LUT) algorithm, the Improved Shuffled Frog Leaping Algorithm (ISFLA), and the FWA.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
Improvement in Classification Performance Based on Target Vector Modification for All-Transfer Deep Learning
Appl. Sci. 2019, 9(1), 128; https://doi.org/10.3390/app9010128 - 01 Jan 2019
Cited by 9 | Viewed by 1943
Abstract
This paper proposes a target vector modification method for the all-transfer deep learning (ATDL) method. Deep neural networks (DNNs) have been used widely in many applications; however, DNNs are known to be problematic when large amounts of training data are not available. Transfer learning can provide a solution to this problem. Previous methods regularize all layers, including the output layer, by estimating relation vectors, which are used instead of the one-hot target vectors of the target domain. These vectors are estimated by averaging the target domain data of each target domain label in the output space. This approach improves classification performance, but it does not consider the relations among the relation vectors. From this point of view, we propose a relation vector modification based on constrained pairwise repulsive forces. High pairwise repulsive forces provide large distances between the relation vectors. In addition, the risk of divergence is mitigated by a constraint based on the distributions of the output vectors of the target domain data. We apply our method to two simulation experiments and to disease classification using two-dimensional electrophoresis images. The experimental results show that reusing all layers through our estimation method is effective, especially when the number of target domain samples is very small.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
Accelerating Image Classification using Feature Map Similarity in Convolutional Neural Networks
Appl. Sci. 2019, 9(1), 108; https://doi.org/10.3390/app9010108 - 29 Dec 2018
Cited by 13 | Viewed by 2761
Abstract
Convolutional neural networks (CNNs) have greatly improved image classification performance. However, the extensive computation involved makes classification time-consuming and therefore unsuitable for low-performance devices. To speed up image classification, we propose a cached CNN, which can classify input images based on their similarity to previously input images. Because the feature maps extracted from CNN kernels represent the intensity of features, images with similar intensity can be classified into the same class. In this study, we cache the class labels and feature vectors extracted from the feature maps of images already classified by the CNN. When a new image is input, its class label is output based on its similarity to the cached feature vectors. This process can be performed at each layer; hence, if the classification succeeds, there is no need to perform the remaining convolutional layer operations, which reduces the overall classification time. We performed experiments to measure and evaluate the cache hit rate, precision, and classification time.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
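The caching idea reduces to a nearest-neighbor lookup over stored feature vectors. A minimal sketch, assuming cosine similarity and a fixed acceptance threshold (the paper's similarity measure and threshold policy may differ):

    import numpy as np

    class FeatureCache:
        """Answers a query from the cache when the best cosine similarity is high enough."""
        def __init__(self, threshold=0.9):
            self.vecs, self.labels, self.threshold = [], [], threshold
        def insert(self, vec, label):
            self.vecs.append(vec / np.linalg.norm(vec))
            self.labels.append(label)
        def lookup(self, vec):
            if not self.vecs:
                return None                        # miss: run the remaining layers
            sims = np.stack(self.vecs) @ (vec / np.linalg.norm(vec))
            best = int(np.argmax(sims))
            return self.labels[best] if sims[best] >= self.threshold else None

    cache = FeatureCache()
    cache.insert(np.array([1.0, 0.0, 0.2]), "cat")
    print(cache.lookup(np.array([0.9, 0.05, 0.25])))   # cache hit -> "cat"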

Article
The Optical Barcode Detection and Recognition Method Based on Visible Light Communication Using Machine Learning
Appl. Sci. 2018, 8(12), 2425; https://doi.org/10.3390/app8122425 - 29 Nov 2018
Cited by 6 | Viewed by 1510
Abstract
Visible light communication (VLC) has developed rapidly in recent years. VLC has the advantages of high confidentiality, low cost, and more, and it could be an effective way to connect online to offline (O2O). In this paper, an RGB-LED-ID detection and recognition method based on VLC using machine learning is proposed. Different from traditional encoding-and-decoding VLC, we develop a new VLC system with a form of modulation and recognition. We create different features for different LEDs to make each an Optical Barcode (OBC), based on a Complementary Metal-Oxide-Semiconductor (CMOS) sensor and a pulse-width modulation (PWM) method. The features are extracted using image processing, and then a support vector machine (SVM) and artificial neural networks (ANN) are introduced into the scheme as classifiers. The experimental results show that the proposed method can provide a huge number of unique LED-IDs with a high LED-ID recognition rate, and its performance in dark and distant conditions is significantly better than that of traditional Quick Response (QR) codes. This is the first time VLC has been used in the field of the Internet of Things (IoT), and it is an innovative application of RGB-LEDs to create features. Furthermore, with the development of camera technology, the number of unique LED-IDs and the maximum identifiable distance will increase. Therefore, this scheme can serve as an effective complement to QR codes in the future.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
A New Rotor Position Measurement Method for Permanent Magnet Spherical Motors
Appl. Sci. 2018, 8(12), 2415; https://doi.org/10.3390/app8122415 - 28 Nov 2018
Cited by 7 | Viewed by 1307
Abstract
This paper proposes a new high-precision rotor position measurement (RPM) method for permanent magnet spherical motors (PMSMs). In the proposed method, an LED light spot generation module (LSGM) is installed at the top of the rotor shaft. In the LSGM, three LEDs are arranged in a straight line with different distances between them, forming three optical feature points (OFPs). The images of the three OFPs acquired by a high-speed camera are used to calculate the rotor position of the PMSM in the world coordinate frame. An experimental platform was built to verify the effectiveness of the proposed RPM method.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
An Image Segmentation Method Based on Improved Regularized Level Set Model
Appl. Sci. 2018, 8(12), 2393; https://doi.org/10.3390/app8122393 - 26 Nov 2018
Cited by 7 | Viewed by 1178
Abstract
When the level set algorithm is used to segment an image, the level set function must be re-initialized periodically to ensure that it remains a signed distance function (SDF). To avoid this defect, an improved regularized level set image segmentation approach is presented. First, a new potential function is defined and introduced to reconstruct a new distance regularization term that removes the need to periodically re-initialize the level set function. Second, by combining the distance regularization term with the internal and external energy terms, a new energy functional is developed. Then, the evolution of the new energy functional is derived using the calculus of variations and the steepest descent approach, and a partial differential equation is designed. Finally, an improved regularized level set-based image segmentation (IRLS-IS) method is proposed. Numerical experimental results demonstrate that the IRLS-IS method is not only effective and robust in segmenting noisy and intensity-inhomogeneous images but can also analyze complex medical images well.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
Impulse Noise Denoising Using Total Variation with Overlapping Group Sparsity and Lp-Pseudo-Norm Shrinkage
Appl. Sci. 2018, 8(11), 2317; https://doi.org/10.3390/app8112317 - 20 Nov 2018
Cited by 7 | Viewed by 1546
Abstract
Models based on total variation (TV) regularization have proven effective in removing random noise, but denoised images often suffer from a serious staircase effect. In this study, two-dimensional total variation with overlapping group sparsity (OGS-TV) is applied to images with impulse noise, to suppress the staircase effect of the TV model and enhance the dissimilarity between smooth and edge regions. In the traditional TV model, the L1-norm is typically used to describe the statistical characteristics of impulse noise. In this paper, an Lp-pseudo-norm regularization term is employed instead of the L1-norm. The new model introduces another degree of freedom, which better describes the sparsity of the image and improves the denoising result. Under the accelerated alternating direction method of multipliers (ADMM) framework, Fourier transform techniques move the matrix operations from the spatial domain to the frequency domain, which improves the efficiency of the algorithm. Our model addresses sparsity in the difference domain of the image: the neighborhood difference of each point is fully utilized to augment the difference between smooth and edge regions. Experimental results show that the peak signal-to-noise ratio, the structural similarity, the visual effect, and the computational efficiency of this new model are improved compared with state-of-the-art denoising methods.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
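Within ADMM, the Lp-pseudo-norm subproblem is usually handled by a generalized shrinkage (thresholding) operator. The sketch below implements Chartrand-style p-shrinkage as one common choice; the paper's exact operator may differ:

    import numpy as np

    def lp_shrink(y, lam, p=0.5):
        """Generalized soft-thresholding for the Lp pseudo-norm (0 < p <= 1)."""
        mag = np.abs(y)
        with np.errstate(divide="ignore"):
            thresh = np.where(mag > 0, lam ** (2 - p) * mag ** (p - 1), np.inf)
        return np.sign(y) * np.maximum(mag - thresh, 0.0)

    y = np.array([-2.0, -0.3, 0.0, 0.3, 2.0])
    print(lp_shrink(y, lam=0.5, p=0.5))   # [-1.75  0.  0.  0.  1.75]
    # p = 1 recovers ordinary soft-thresholding: sign(y) * max(|y| - lam, 0)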

Article
Use of Gradient-Based Shadow Detection for Estimating Environmental Illumination Distribution
Appl. Sci. 2018, 8(11), 2255; https://doi.org/10.3390/app8112255 - 15 Nov 2018
Cited by 5 | Viewed by 988
Abstract
Environmental illumination information is necessary to achieve a consistent integration of virtual objects into a given image. In this paper, we present a gradient-based shadow detection method for estimating the environmental illumination distribution of a given scene, in which a three-dimensional (3-D) augmented reality (AR) marker, a cubic reference object of known size, is employed. The geometric elements (corners and sides) of the AR marker constitute candidate shadow boundaries; they are obtained on a flat surface according to the relationship between the camera and the candidate light sources. We can then extract the shadow regions by collecting the local features that support the candidate shadow boundaries in the image. To further verify the shadows passed by the local-feature-based matching, we examine whether significant brightness changes occur in the intersection regions between shadows. The proposed method reduces the unwanted effects caused by threshold values in edge-based shadow detection, as well as those caused by sampling position in point-based illumination estimation.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
Minimum Barrier Distance-Based Object Descriptor for Visual Tracking
Appl. Sci. 2018, 8(11), 2233; https://doi.org/10.3390/app8112233 - 13 Nov 2018
Cited by 1 | Viewed by 1189
Abstract
In most visual tracking tasks, the target is tracked by a bounding box given in the first frame. Complex and redundant background information inevitably exists inside the bounding box and affects tracking performance. To alleviate the influence of background, we propose a robust object descriptor for visual tracking in this paper. First, we decompose the bounding box into non-overlapping patches and extract color and gradient histogram features for each patch. Second, we adopt the minimum barrier distance (MBD) to calculate patch weights. Specifically, we treat the boundary patches as background seeds and calculate the MBD from each patch to the seed set as that patch's weight, since the weight calculated by MBD represents the difference between a patch and the background more effectively. Finally, we impose the weights on the extracted features to obtain the descriptor of each patch and then incorporate our MBD-based descriptor into the structured support vector machine algorithm for tracking. Experiments on two benchmark datasets demonstrate the effectiveness of the proposed approach.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
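A raster-scan approximation of the minimum barrier distance, in the spirit of the FastMBD algorithm, can be sketched as follows; the seed mask would mark the boundary patches, and the pass count and 4-neighborhood are assumptions:

    import numpy as np

    def mbd_transform(img, seed_mask, n_passes=3):
        """Approximate MBD: min over paths of (max - min intensity along the path)."""
        D = np.where(seed_mask, 0.0, np.inf)
        U, L = img.copy(), img.copy()         # running max / min on the best path
        h, w = img.shape
        for k in range(n_passes):             # alternate forward / backward scans
            ys = range(h) if k % 2 == 0 else range(h - 1, -1, -1)
            xs = range(w) if k % 2 == 0 else range(w - 1, -1, -1)
            d = -1 if k % 2 == 0 else 1
            for y in ys:
                for x in xs:
                    for ny, nx in ((y + d, x), (y, x + d)):
                        if 0 <= ny < h and 0 <= nx < w and D[ny, nx] < np.inf:
                            u = max(U[ny, nx], img[y, x])
                            l = min(L[ny, nx], img[y, x])
                            if u - l < D[y, x]:
                                D[y, x], U[y, x], L[y, x] = u - l, u, l
        return D

    img = np.random.rand(20, 20)
    seeds = np.zeros_like(img, dtype=bool)
    seeds[0, :] = seeds[-1, :] = seeds[:, 0] = seeds[:, -1] = True   # boundary seeds
    print(mbd_transform(img, seeds).round(2))    # larger = more unlike the background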

Article
An Improved Neural Network Cascade for Face Detection in Large Scene Surveillance
Appl. Sci. 2018, 8(11), 2222; https://doi.org/10.3390/app8112222 - 11 Nov 2018
Cited by 4 | Viewed by 1712
Abstract
Face detection with security cameras monitoring large and crowded areas is very important for public safety. However, it is much more difficult than traditional face detection tasks. One reason is that in large areas such as squares, stations, and stadiums, faces captured by cameras are usually at low resolution and thus miss many facial details. In this paper, we improve popular cascade algorithms by proposing a novel multi-resolution framework that utilizes parallel convolutional neural network cascades for detecting faces in large scenes. This framework uses face and head-with-shoulder information together to handle large-area surveillance images. Compared with popular cascade algorithms, our method outperforms them by a large margin.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
Deep Learning Case Study for Automatic Bird Identification
Appl. Sci. 2018, 8(11), 2089; https://doi.org/10.3390/app8112089 - 29 Oct 2018
Cited by 7 | Viewed by 1901
Abstract
An automatic bird identification system is required for offshore wind farms in Finland. Radar is the obvious choice for detecting flying birds, but external information is required for actual identification. We applied visual camera images as that external data. The proposed system for automatic bird identification consists of a radar, a motorized video head, and a single-lens reflex camera with a telephoto lens. A convolutional neural network trained with a deep learning algorithm is applied to the image classification. We also propose a data augmentation method in which images are rotated and converted in accordance with desired color temperatures. The final identification is based on a fusion of parameters provided by the radar and the predictions of the image classifier. On a dataset of 9312 manually taken original images, expanded by augmentation to 2.44 × 10^6 images, the sensitivity of the proposed system as an image classifier is 0.9463. The area under the receiver operating characteristic curve for two key bird species is 0.9993 for the White-tailed Eagle and 0.9496 for the Lesser Black-backed Gull. We propose a novel system for automatic bird identification as a real-world application and demonstrate that our data augmentation method is well suited to the image classification problem, significantly increasing the performance of the classifier.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
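The described augmentation, rotation plus a color-temperature shift, can be emulated with per-channel gains; the gain triples below are illustrative rather than calibrated color-temperature conversions:

    import numpy as np
    from scipy.ndimage import rotate

    def augment(image, angle, channel_gain):
        """Rotate an RGB image and re-balance its channels."""
        rotated = rotate(image, angle, reshape=False, mode="nearest")
        return np.clip(rotated * np.asarray(channel_gain), 0.0, 1.0)

    img = np.random.rand(48, 48, 3)
    warm = augment(img, angle=15, channel_gain=(1.10, 1.00, 0.85))   # warmer tone
    cool = augment(img, angle=-15, channel_gain=(0.85, 1.00, 1.10))  # cooler tone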

Article
An Image-Based Fall Detection System for the Elderly
Appl. Sci. 2018, 8(10), 1995; https://doi.org/10.3390/app8101995 - 20 Oct 2018
Cited by 10 | Viewed by 2541
Abstract
Due to advances in medical technology, the elderly population has continued to grow. Elderly healthcare issues have been widely discussed, especially fall accidents, because a fall can lead to a fracture and have serious consequences. Therefore, effective detection of fall accidents is important for both elderly people and their caregivers. In this work, we designed an Image-based FAll Detection System (IFADS) for nursing homes, where public areas are usually equipped with surveillance cameras. Unlike existing fall detection algorithms, we mainly focus on falls that occur while sitting down and standing up from a chair, because these two activities together account for a higher proportion of falls than forward walking. IFADS first applies an object detection algorithm to identify people in a video frame. Then, a posture recognition method is used to keep tracking the status of the people by checking the relative positions of the chair and the people. An alarm is triggered when a fall is detected. To evaluate the effectiveness of IFADS, we not only simulated different fall scenarios but also adopted YouTube and Giphy videos that captured real falls. Our experimental results show that IFADS achieved an average accuracy of 95.96%. IFADS can therefore be used by nursing homes to improve the quality of residential care facilities.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
A New Cost Function Combining Deep Neural Networks (DNNs) and l2,1-Norm with Extraction of Robust Facial and Superpixels Features in Age Estimation
Appl. Sci. 2018, 8(10), 1943; https://doi.org/10.3390/app8101943 - 16 Oct 2018
Viewed by 1317
Abstract
Automatic age estimation from unconstrained facial images is a challenging task, and it has recently gained much attention due to its wide range of applications. In this paper, we propose a new model based on convolutional neural networks (CNNs) and the l2,1-norm to select age-related features for the age estimation task. A new cost function is proposed, and to learn and train the new model, we provide the analysis and proof of convergence for this cost function when solving the joint minimization problem of deep neural networks (DNNs) and the l2,1-norm. High-level features are extracted from the facial images using transfer learning, since there are currently not enough large age databases to train a deep learning network from scratch. The extracted features are then fed to the proposed model to select the most efficient age-related features. In addition, a new DNN-based system that jointly fine-tunes two different DNNs with two different feature sets is developed. Experimental results show the effectiveness of the proposed methods, which achieve state-of-the-art performance on a public database.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
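The l2,1-norm behind the feature selection sums the l2 norms of the rows of a weight matrix, so penalizing it drives whole rows, and hence whole features, to zero. A small sketch of the norm and the feature ranking it induces:

    import numpy as np

    def l21_norm(W):
        """||W||_{2,1} = sum over rows of the row-wise l2 norm."""
        return np.linalg.norm(W, axis=1).sum()

    def select_features(W, n_keep):
        """Keep the features whose weight rows have the largest l2 norms."""
        return np.argsort(-np.linalg.norm(W, axis=1))[:n_keep]

    W = np.array([[0.9, -0.8], [0.01, 0.02], [0.5, 0.4]])
    print(l21_norm(W), select_features(W, n_keep=2))   # rows 0 and 2 survive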

Article
Temporal Action Detection in Untrimmed Videos from Fine to Coarse Granularity
Appl. Sci. 2018, 8(10), 1924; https://doi.org/10.3390/app8101924 - 15 Oct 2018
Cited by 4 | Viewed by 1107
Abstract
Temporal action detection in long, untrimmed videos is an important yet challenging task that requires not only recognizing the categories of actions in videos but also localizing the start and end time of each action. In recent years, artificial neural networks such as the Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) have significantly improved performance in various computer vision tasks, including action detection. In this paper, we make the most of classifiers of different granularity and propose to detect actions from fine to coarse granularity, which is also in line with how people detect actions. Our action detection method is built on the 'proposal then classification' framework. We employ several neural network architectures as deep information extractors and as segment-level (fine-grained) and window-level (coarse-grained) classifiers. Each of the proposal and classification steps is executed from the segment level to the window level. The experimental results show that our method not only achieves detection performance comparable to that of state-of-the-art methods but also delivers relatively balanced performance across different action categories.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
Large-Scale Fine-Grained Bird Recognition Based on a Triplet Network and Bilinear Model
Appl. Sci. 2018, 8(10), 1906; https://doi.org/10.3390/app8101906 - 13 Oct 2018
Cited by 1 | Viewed by 1451
Abstract
The main purpose of fine-grained classification is to distinguish among many subcategories of a single basic category, such as birds or flowers. We propose a model based on a triplet network and bilinear methods for fine-grained bird identification. Our proposed model can be trained in an end-to-end manner, which effectively increases the inter-class distance of the features extracted by the network and improves the accuracy of bird recognition. When experimentally tested on 1096 birds in a custom-built dataset and on Caltech-UCSD (a public bird dataset), the model achieved accuracies of 88.91% and 85.58%, respectively. The experimental results confirm the high generalization ability of our model in fine-grained image classification. Moreover, our model requires no additional manual annotation information, such as object-labeling frames and part-labeling points, which guarantees good versatility and robustness in fine-grained bird recognition.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
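The bilinear model pools the outer product of two feature maps over all spatial locations. A minimal sketch with the customary signed square root and l2 normalization (dimensions are illustrative; the triplet loss and CNN backbones are omitted):

    import numpy as np

    def bilinear_pool(fa, fb):
        """Bilinear pooling of two feature maps shaped (C1, H, W) and (C2, H, W)."""
        x = np.einsum("ihw,jhw->ij", fa, fb).ravel()   # sum of outer products
        x = np.sign(x) * np.sqrt(np.abs(x))            # signed square root
        return x / (np.linalg.norm(x) + 1e-12)         # l2 normalization

    fa = np.random.rand(8, 7, 7)
    fb = np.random.rand(8, 7, 7)
    print(bilinear_pool(fa, fb).shape)                 # (64,)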

Article
A Novel Lightweight Approach for Video Retrieval on Mobile Augmented Reality Environment
Appl. Sci. 2018, 8(10), 1860; https://doi.org/10.3390/app8101860 - 10 Oct 2018
Cited by 2 | Viewed by 1181
Abstract
Mobile augmented reality merges virtual objects with the real world on mobile devices, while video retrieval brings out similar-looking videos from a large-scale video dataset. Since mobile augmented reality applications demand real-time interaction and operation, processing and interaction must happen in real time. Furthermore, augmented-reality-based virtual objects can be poorly textured. To resolve the above-mentioned issues, in this research we propose a novel, fast, and robust approach for retrieving videos in a mobile augmented reality environment using image and video queries. First, Top-K key frames are extracted from the videos, which significantly increases efficiency. Second, we introduce a novel frame-based feature extraction method, the Pyramid Ternary Histogram of Oriented Gradient (PTHOG), to extract shape features from virtual objects in an effective and efficient manner. Third, we utilize Double-Bit Quantization (DBQ)-based hashing to accomplish the nearest-neighbor search efficiently, which produces a candidate list of videos. Lastly, a similarity measure is applied to re-rank the videos obtained from the candidate list. An extensive experimental analysis is performed to verify our claims.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
A Method for Singular Points Detection Based on Faster-RCNN
Appl. Sci. 2018, 8(10), 1853; https://doi.org/10.3390/app8101853 - 09 Oct 2018
Cited by 4 | Viewed by 1302
Abstract
Most methods for singular point detection depend on the orientation fields of fingerprints, and they cannot achieve reliable and accurate detection on poor-quality fingerprints. In this study, a new method for fingerprint singular point detection based on Faster-RCNN (Faster Region-based Convolutional Network method) is proposed. It is a two-step process, and an orientation constraint is added to Faster-RCNN to obtain the orientation information of singular points. In addition, we designed a convolutional neural network (ConvNet) for singular point detection according to the characteristics of fingerprint images and existing works. Specifically, the proposed method extracts singular points directly from raw fingerprint images without traditional preprocessing. Experimental results demonstrate the effectiveness of the proposed method. In comparison with other detection algorithms, our method achieves a 96.03% detection rate for core points and a 98.33% detection rate for delta points on the FVC2002 DB1 dataset, and 90.75% for core points and 94.87% for delta points on the NIST SD4 dataset, outperforming the other algorithms.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
Necessary Morphological Patches Extraction for Automatic Micro-Expression Recognition
Appl. Sci. 2018, 8(10), 1811; https://doi.org/10.3390/app8101811 - 03 Oct 2018
Cited by 3 | Viewed by 1260
Abstract
Micro-expressions are usually subtle and brief facial expressions that humans use to hide their true emotional states. In recent years, micro-expression recognition has attracted wide attention in the fields of psychology, mass media, and computer vision. The shortest micro-expression lasts only 1/25 s. Furthermore, different from macro-expressions, micro-expressions have considerably lower intensity and inadequate contraction of the facial muscles. Because of these characteristics, automatic micro-expression detection and recognition are great challenges in the field of computer vision. In this paper, we propose a novel automatic facial expression recognition framework based on necessary morphological patches (NMPs) to better detect and identify micro-expressions. A micro-expression is a subconscious facial muscle response that is not controlled by rational thought; it therefore involves only a few facial muscles and has local properties. NMPs are the facial regions that must be involved when a micro-expression occurs. NMPs are screened by weighting the facial active patches instead of using the entire facial area holistically. First, we manually define the active facial patches according to the facial landmark coordinates and the facial action coding system (FACS). Second, we use the LBP-TOP descriptor to extract features from these patches and the entropy-weight method to select the NMPs. Finally, we obtain the weighted LBP-TOP features of these NMPs. We test on two recent publicly available datasets, CASME II and SMIC, which provide sufficient samples. Compared with many recent state-of-the-art approaches, our method achieves more promising recognition results.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
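The entropy-weight step can be sketched as follows: criteria under which the patches' scores have lower entropy are considered more discriminative and receive larger weights; the score matrix here is random and purely illustrative:

    import numpy as np

    def entropy_weights(scores, eps=1e-12):
        """Rows are patches, columns are criteria; returns weighted patch scores."""
        P = scores / (scores.sum(axis=0, keepdims=True) + eps)
        k = 1.0 / np.log(len(scores))
        e = -k * (P * np.log(P + eps)).sum(axis=0)      # entropy per criterion
        w = (1.0 - e) / (1.0 - e).sum()                 # weight per criterion
        return scores @ w

    scores = np.random.rand(10, 3)                      # 10 patches, 3 criteria
    print(entropy_weights(scores).round(3))             # importance of each patch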