Special Issue "Advanced Intelligent Imaging Technology"

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (31 May 2019).

Special Issue Editor

Prof. Joonki Paik
Guest Editor
Department of Image, Graduate School of Advanced Imaging Science, Chung-Ang University, Seoul 156-756, Korea
Interests: image enhancement and restoration; computational imaging; intelligent surveillance systems

Special Issue Information

Dear Colleagues,

A general pipeline of visual information processing includes: i) image sensing and acquisition, ii) pre-processing, iii) feature detection or metric estimation, and iv) high-level decision making. State-of-the-art artificial intelligence technology has brought a quantum leap in performance to each step of this pipeline.

Artificial intelligence-based image signal processing (ISP) technology can drastically enhance acquired digital images through demosaicing, denoising, deblurring, super resolution, and wide-dynamic-range imaging using deep neural networks. Feature detection and image analysis are the most popular application areas of artificial intelligence. An intelligent imaging system can solve various problems that are unsolvable without such intelligence or learning.

An objective of this Special Issue is to highlight innovative developments in intelligent imaging technology related to state-of-the-art image acquisition, pre-processing, feature detection, and image analysis using machine learning and artificial intelligence. In addition, applications that combine two or more intelligent imaging methods constitute another important research area. Topics include, but are not limited to:

  • Computational photography for intelligent imaging
  • Visual inspection using machine learning and artificial intelligence
  • Depth estimation and three-dimensional analysis
  • Image processing and computer vision algorithms for advanced driver assistance systems (ADAS)
  • Wide-area intelligent surveillance systems using multiple-camera networks
  • Advanced image signal processor (ISP) based on artificial intelligence
  • Deep neural networks for inverse imaging problems
  • Multiple camera collaboration based on reinforcement learning
  • Fusion of hybrid sensors for intelligent imaging systems

Prof. Joonki Paik
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2000 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep neural network (DNN)
  • artificial neural network (ANN)
  • intelligent surveillance systems
  • computational photography
  • computational imaging
  • image signal processor (ISP)
  • camera network
  • visual inspection...

Published Papers (73 papers)


Research


Article
Estimation of Parameters of Parathyroid Glands Using Particle Swarm Optimization and Multivariate Generalized Gaussian Function Mixture
Appl. Sci. 2019, 9(21), 4511; https://doi.org/10.3390/app9214511 - 24 Oct 2019
Cited by 1 | Viewed by 619
Abstract
The paper introduces a fitting method for Single-Photon Emission Computed Tomography (SPECT) images of parathyroid glands using a generalized Gaussian function for quantitative assessment of preoperative parathyroid SPECT/CT scintigraphy results in a large patient cohort. Parathyroid glands are very small relative to SPECT resolution, and overlapping of their 3D distributions was observed. A multivariate generalized Gaussian function mixture allows such data to be modeled, but the results depend on the optimization algorithm. Particle Swarm Optimization (PSO) with global best, ring, and random neighborhood topologies was compared. The obtained results show the benefits of the random neighborhood topology: it gives a smaller 3D position error, improving position estimation by about 3% of voxel size, and, most importantly, it reduces the processing time to a few minutes, compared with a few hours for the random walk algorithm. Moreover, the frequency of obtaining low MSE values was more than two times higher for this topology. The presented method based on the random neighborhood topology quantifies activity in a specific voxel in a short time and could be applied in clinical practice.
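A minimal sketch of PSO with a ring neighborhood topology, one of the topologies compared above (the objective function, bounds, and hyperparameters below are illustrative assumptions, not the authors' settings):

```python
import numpy as np

def pso_ring(f, dim, n_particles=30, iters=200, lo=-5.0, hi=5.0,
             w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f with PSO; each particle follows the best of its ring neighbors."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n_particles, dim))
    v = np.zeros_like(x)
    pbest, pval = x.copy(), np.array([f(p) for p in x])
    idx = np.arange(n_particles)
    for _ in range(iters):
        # ring topology: each particle's neighbors are indices i-1 and i+1
        left, right = np.roll(pval, 1), np.roll(pval, -1)
        nbr = np.where(left < right, np.roll(idx, 1), np.roll(idx, -1))
        lbest = np.where(pval < pval[nbr], idx, nbr)  # best of self + neighbors
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (pbest[lbest] - x)
        x = np.clip(x + v, lo, hi)
        fx = np.array([f(p) for p in x])
        improved = fx < pval
        pbest[improved], pval[improved] = x[improved], fx[improved]
    return pbest[pval.argmin()], pval.min()

# example: pso_ring(lambda p: np.sum(p ** 2), dim=3)
```

The random neighborhood variant the paper favors differs only in how `nbr` is drawn: neighbor indices are re-sampled at random rather than fixed on a ring.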

Article
Generating a Cylindrical Panorama from a Forward-Looking Borehole Video for Borehole Condition Analysis
Appl. Sci. 2019, 9(16), 3437; https://doi.org/10.3390/app9163437 - 20 Aug 2019
Cited by 2 | Viewed by 1082
Abstract
Geological exploration plays a fundamental and crucial role in geological engineering. The most frequently used method is to obtain borehole videos with an axial view borehole camera system (AVBCS) in a pre-drilled borehole. This approach to surveying the internal structure of a borehole is based on video playback and video screenshot analysis. One drawback of AVBCS is that it provides only a qualitative description of borehole information with a forward-looking borehole video; quantitative analysis of the borehole data, such as the width and dip angle of fractures, is unavailable. In this paper, we propose a new approach to create a whole borehole-wall cylindrical panorama from the borehole video acquired by AVBCS, which enables further analysis of borehole information. Firstly, based on the Otsu and region labeling algorithms, a borehole center location algorithm is proposed to extract the borehole center of each video image automatically. Afterwards, based on coordinate mapping (CM), a virtual coordinate graph (VCG) is designed for unwrapping the front-view borehole-wall image sequence, generating the corresponding unfolded image sequence while reducing the computational cost. Subsequently, based on the sum of absolute differences (SAD), a projection transformation SAD (PTSAD), which considers the gray-level similarity of candidate images, is proposed to match the unfolded image sequence. Finally, an image filtering module is introduced to filter out invalid frames, and the remaining frames are stitched into a complete cylindrical panorama. Experiments on two real-world borehole videos demonstrate that the proposed method can generate panoramic borehole-wall unfolded images with satisfactory visual quality for follow-up geological condition analysis. From the resulting image, borehole information, including rock mechanical properties, fracture distribution and width, fault distribution, and seam thickness, can be further obtained and analyzed.
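A minimal sketch of the core unwrapping step: mapping an annulus of wall pixels around the detected borehole center of a grayscale frame to a rectangular strip (radii, output width, and nearest-neighbor sampling are illustrative assumptions; the paper's VCG additionally precomputes this mapping for speed):

```python
import numpy as np

def unwrap_annulus(frame, cx, cy, r_in, r_out, out_w=720):
    """Map the annulus [r_in, r_out) around (cx, cy) to a rectangle.

    Output rows correspond to radius and columns to angle, so each
    forward-looking frame contributes one unfolded ring of borehole wall.
    """
    thetas = np.linspace(0.0, 2.0 * np.pi, out_w, endpoint=False)
    radii = np.arange(r_in, r_out)
    # polar -> cartesian source coordinates (nearest-neighbor sampling)
    xs = (cx + np.outer(radii, np.cos(thetas))).round().astype(int)
    ys = (cy + np.outer(radii, np.sin(thetas))).round().astype(int)
    xs = np.clip(xs, 0, frame.shape[1] - 1)
    ys = np.clip(ys, 0, frame.shape[0] - 1)
    return frame[ys, xs]  # shape (r_out - r_in, out_w)
```

Consecutive unwrapped strips would then be aligned (the paper's PTSAD matching) and stacked into the cylindrical panorama.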

Article
Novel Hand Gesture Alert System
Appl. Sci. 2019, 9(16), 3419; https://doi.org/10.3390/app9163419 - 19 Aug 2019
Cited by 4 | Viewed by 1267
Abstract
Sexual assault can cause great societal damage, with negative socio-economic, mental, sexual, physical, and reproductive consequences. According to Eurostat, the number of such crimes increased in the European Union between 2008 and 2016. However, despite the spread of security tools such as cameras, it is usually difficult to tell whether an individual is being assaulted based on his or her posture. Hand gestures are seen by many as a natural means of nonverbal communication when interacting with a computer, and a considerable amount of research has been performed on them. In addition, the identifiable hand placement characteristics provided by modern, inexpensive commercial depth cameras can be used in a variety of gesture recognition-based systems, particularly for human-machine interaction. This paper introduces a novel gesture alert system that uses a combination of Convolutional Neural Networks (CNNs). The overall system can be subdivided into three main parts: firstly, human detection in the image using a pretrained "You Only Look Once" (YOLO) method, which extracts the bounding boxes containing the person's hands; secondly, the gesture detection/classification stage, which processes the bounding box images; and thirdly, a module called "counterGesture", which triggers the alert.

Article
Fast Continuous Structural Similarity Patch Based Arbitrary Style Transfer
Appl. Sci. 2019, 9(16), 3304; https://doi.org/10.3390/app9163304 - 12 Aug 2019
Viewed by 876
Abstract
Style transfer uses a pair of content and style images to synthesize a stylized image that has both the structure of the content image and the style of the style image. Existing optimization-based methods are limited in their performance. Some works using a feed-forward network allow arbitrary style transfer but cannot faithfully reflect the style. In this paper, we present a fast, continuous, structural-similarity patch based arbitrary style transfer. Firstly, we introduce the structural similarity index (SSIM) to compute the similarity between all content and style patches. Then a local style patch choosing procedure is applied to maximize the utilization of all style patches while keeping the swapped style patches continuous with respect to the spatial location of the style. Finally, we apply an efficiently trained feed-forward inverse network to obtain the final stylized image. We use more than 80,000 natural images and 120,000 style images to train that feed-forward inverse network. The results show that our method can transfer arbitrary styles with consistency, and a comparison of results demonstrates the effectiveness and high quality of our stylized images.
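A minimal sketch of scoring content/style patch pairs with SSIM, the matching criterion named above (assumes grayscale float patches of at least 7×7 pixels; the use of scikit-image and the exhaustive argmax search are illustrative simplifications):

```python
import numpy as np
from skimage.metrics import structural_similarity

def best_style_patch(content_patch, style_patches):
    """Return the index of the style patch most similar to the content
    patch under SSIM; data_range must be given for float images."""
    scores = [structural_similarity(content_patch, sp, data_range=1.0)
              for sp in style_patches]
    return int(np.argmax(scores))
```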

Article
TF-YOLO: An Improved Incremental Network for Real-Time Object Detection
Appl. Sci. 2019, 9(16), 3225; https://doi.org/10.3390/app9163225 - 07 Aug 2019
Cited by 21 | Viewed by 2016
Abstract
In recent years, significant advances have been made in visual detection, and an abundance of outstanding models have been proposed. However, state-of-the-art object detection networks are inefficient at detecting small targets, and they commonly fail to run on portable devices or embedded systems due to their high complexity. In this paper, a real-time object detection model, termed Tiny Fast You Only Look Once (TF-YOLO), is developed for implementation in an embedded system. Firstly, the k-means++ algorithm is applied to cluster the dataset, which yields better prior boxes for the targets. Secondly, inspired by the multi-scale prediction idea of the Feature Pyramid Network (FPN) algorithm, the YOLOv3 framework is improved and optimized to detect the extracted features at three scales. In this way, the modified network is sensitive to small targets. Experimental results demonstrate that the proposed TF-YOLO method is a smaller, faster, and more efficient network model that improves end-to-end training and real-time object detection on a variety of devices.
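A minimal sketch of the anchor-box step: clustering ground-truth box sizes with k-means++ to obtain prior boxes (shown with scikit-learn under Euclidean distance for brevity; YOLO implementations often cluster under an IoU-based distance instead):

```python
import numpy as np
from sklearn.cluster import KMeans

def prior_boxes(box_whs, n_anchors=6, seed=0):
    """Cluster (width, height) pairs of ground-truth boxes; the cluster
    centers serve as the detector's prior (anchor) boxes."""
    km = KMeans(n_clusters=n_anchors, init="k-means++", n_init=10,
                random_state=seed).fit(np.asarray(box_whs, dtype=float))
    centers = km.cluster_centers_
    # sort anchors by area, small to large, as multi-scale heads expect
    return centers[np.argsort(centers.prod(axis=1))]
```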

Article
Triple-Attention Mixed-Link Network for Single-Image Super-Resolution
Appl. Sci. 2019, 9(15), 2992; https://doi.org/10.3390/app9152992 - 25 Jul 2019
Cited by 1 | Viewed by 1209
Abstract
Single-image super-resolution is of great importance as a low-level computer-vision task. Recent approaches with deep convolutional neural networks have achieved impressive performance. However, existing architectures are limited by less sophisticated structures and weaker representational power. In this work, to significantly enhance feature representation, we propose the triple-attention mixed-link network (TAN), which consists of (1) three different aspects (i.e., kernel, spatial, and channel) of attention mechanisms and (2) a fusion of powerful residual and dense connections (i.e., mixed links). Specifically, the multi-kernel network learns multi-hierarchical representations under different receptive fields. The features are recalibrated by the effective kernel and channel attention, which filters the information and enables the network to learn more powerful representations. The features finally pass through the spatial attention in the reconstruction network, which generates a fusion of local and global information, lets the network restore more details, and improves reconstruction quality. The proposed network structure reduces the parameter growth rate by 50% compared with previous approaches. The three attention mechanisms provide gains of 0.49 dB, 0.58 dB, and 0.32 dB when evaluated on Set5, Set14, and BSD100, respectively. Thanks to the diverse feature recalibrations and the advanced information flow topology, our proposed model is strong enough to compete with state-of-the-art methods on the benchmark evaluations.
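A minimal PyTorch sketch of a squeeze-and-excitation style channel attention block, one plausible form of the channel attention described above (layer sizes and the reduction ratio are illustrative; the paper's exact module may differ):

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Recalibrate feature maps with learned per-channel weights in [0, 1]."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global context
        self.fc = nn.Sequential(                     # excite: channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # channel-wise rescaling of the feature maps
```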

Article
Variational Autoencoder-Based Multiple Image Captioning Using a Caption Attention Map
Appl. Sci. 2019, 9(13), 2699; https://doi.org/10.3390/app9132699 - 02 Jul 2019
Cited by 4 | Viewed by 1695
Abstract
Image captioning is a promising research topic that is applicable to services that search for desired content in large amounts of video data and to situation explanation services for visually impaired people. Previous research on image captioning has focused on generating one caption per image. However, to increase usability in applications, it is necessary to generate several different captions containing various representations of an image. We propose a method to generate multiple captions using a variational autoencoder, one of the generative models. Because image features play an important role when generating captions, a method to extract a Caption Attention Map (CAM) of the image is proposed, and CAMs are projected to a latent distribution. In addition, methods for evaluating multiple image captioning, which has not yet been actively researched, are proposed. The proposed model outperforms the base model in terms of diversity at comparable accuracy. Moreover, it is verified that the model using CAM generates detailed captions describing various content in the image.

Article
Neural Sign Language Translation Based on Human Keypoint Estimation
Appl. Sci. 2019, 9(13), 2683; https://doi.org/10.3390/app9132683 - 01 Jul 2019
Cited by 31 | Viewed by 2088
Abstract
We propose a sign language translation system based on human keypoint estimation. It is well known that many problems in the field of computer vision require a massive dataset to train deep neural network models. The situation is even worse for sign language translation, where it is far more difficult to collect high-quality training data. In this paper, we introduce the KETI (Korea Electronics Technology Institute) sign language dataset, which consists of 14,672 videos of high resolution and quality. Considering that each country has a different and unique sign language, the KETI sign language dataset can be the starting point for further research on Korean sign language translation. Using the KETI sign language dataset, we develop a neural network model for translating sign videos into natural language sentences by utilizing the human keypoints extracted from the face, hands, and body. The obtained human keypoint vector is normalized by the mean and standard deviation of the keypoints and used as input to our translation model, which is based on the sequence-to-sequence architecture. As a result, we show that our approach is robust even when the size of the training data is not sufficient. Our translation model achieved 93.28% (55.28%) translation accuracy on the validation set (test set) for 105 sentences that can be used in emergency situations. We compared several variants of our neural sign translation model based on different attention mechanisms in terms of classical metrics for measuring translation performance.
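A minimal sketch of the keypoint normalization step described above, assuming keypoints arrive as one flat (x1, y1, x2, y2, ...) row per frame; the per-frame standardization shown is one straightforward reading of "normalized by the mean and standard deviation":

```python
import numpy as np

def normalize_keypoints(kps, eps=1e-8):
    """Standardize an (n_frames, n_coords) array of 2D keypoints so each
    frame's coordinates have zero mean and unit variance."""
    kps = np.asarray(kps, dtype=float)
    mean = kps.mean(axis=1, keepdims=True)
    std = kps.std(axis=1, keepdims=True)
    return (kps - mean) / (std + eps)  # eps guards frames with no spread
```

This removes the signer's position and scale from the input, which is what lets a sequence-to-sequence model train on relatively little data.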

Article
A Model-Based Approach of Foreground Region of Interest Detection for Video Codecs
Appl. Sci. 2019, 9(13), 2670; https://doi.org/10.3390/app9132670 - 30 Jun 2019
Viewed by 869
Abstract
Detecting the Region of Interest (ROI) in video clips is a significant and useful technique both in video codecs and in surveillance/monitoring systems. In this paper, a new model-based detection method suited to video compression codecs is designed around two models: an "inter" model and an "intra" model. The "inter" model exploits block-level motion information produced by global motion compensation approaches, while the "intra" model extracts object details through object filtering and image segmentation procedures. Finally, the detection results are formed through a new clustering-with-fine-tuning approach applied to the "intra" model and assisted by the "inter" model. Experimental results show that the proposed method fits well with real-time video codecs and achieves good performance in both detection precision and computing time. In addition, the proposed method is versatile across a wide range of surveillance videos with different characteristics.

Article
A Method for Identification of Multisynaptic Boutons in Electron Microscopy Image Stack of Mouse Cortex
Appl. Sci. 2019, 9(13), 2591; https://doi.org/10.3390/app9132591 - 26 Jun 2019
Cited by 1 | Viewed by 995
Abstract
Recent electron microscopy (EM) imaging techniques make the automatic acquisition of a large number of serial sections from brain samples possible. Meanwhile, it has been proven that the multisynaptic bouton (MSB), a structure that consists of one presynaptic bouton and multiple postsynaptic spines, is closely related to sensory deprivation, brain trauma, and learning. Nevertheless, it is still a challenging task to analyze this essential structure in EM images due to factors such as imaging artifacts and the presence of complicated subcellular structures. In this paper, we present an effective way to identify MSBs in EM images. Using normalized images as training data, two convolutional neural networks (CNNs) are trained to obtain the segmentation of synapses and the probability map of the neuronal membrane, respectively. Then, a series of follow-up operations are employed to obtain a rectified segmentation of synapses and a segmentation of neurons. By incorporating this information, MSBs can be reliably identified. The dataset in this study is an image stack of mouse cortex that contains 178 serial images with a size of 6004 pixels × 5174 pixels and a voxel resolution of 2 nm × 2 nm × 50 nm. The precision and recall of MSB detection are 68.57% and 94.12%, respectively. Experimental results demonstrate that our method is conducive to biologists' research on MSB properties.

Article
A Two-Stage Gradient Ascent-Based Superpixel Framework for Adaptive Segmentation
Appl. Sci. 2019, 9(12), 2421; https://doi.org/10.3390/app9122421 - 13 Jun 2019
Cited by 3 | Viewed by 861
Abstract
Superpixel segmentation usually over-segments an image into fragments to extract regional features, thereby supporting higher-level computer vision tasks. In this work, a novel coarse-to-fine gradient ascent framework is proposed for superpixel-based adaptive segmentation of color images. In the first stage, a speeded-up Simple Linear Iterative Clustering (sSLIC) method is adopted to generate uniform superpixels efficiently; it assumes that homogeneous regions stay highly consistent during clustering, so much redundant updating computation can be avoided. Then a simple criterion is introduced to evaluate the uniformity of each superpixel region; whenever a superpixel region is under-segmented, an adaptive marker-controlled watershed algorithm performs a finer subdivision. Experimental results show that the framework achieves better performance on detail-rich regions than previous superpixel approaches, with satisfactory efficiency.

Article
Application of Deep Convolutional Neural Networks and Smartphone Sensors for Indoor Localization
Appl. Sci. 2019, 9(11), 2337; https://doi.org/10.3390/app9112337 - 06 Jun 2019
Cited by 17 | Viewed by 1250
Abstract
Indoor localization systems are susceptible to high errors and do not meet current standards of indoor localization. Moreover, the performance of such approaches is limited by device dependence. The use of Wi-Fi makes the localization process vulnerable to dynamic factors and energy-hungry. A multi-sensor-fusion-based indoor localization approach is proposed to overcome these issues. The proposed approach predicts a pedestrian's current location from smartphone sensor data alone. It aims to mitigate the impact of device dependency on localization accuracy and to lower the localization error of magnetic-field-based localization systems. We trained a deep-learning-based convolutional neural network to recognize the indoor scene, which helps to lower the localization error: the recognized scene is used to identify a specific floor and narrow the search space. A database of magnetic field patterns helps to lower the device dependence. A modified K-nearest-neighbor (mKNN) method is presented to calculate the pedestrian's current location. Data from pedestrian dead reckoning further refines this location, and an extended Kalman filter is implemented to this end. The performance of the proposed approach is tested in experiments on Galaxy S8 and LG G6 smartphones. The experimental results demonstrate that the proposed approach can achieve an accuracy of 1.04 m at the 50th percentile, regardless of the smartphone used for localization. The proposed mKNN outperforms the K-nearest-neighbor approach, with lower mean, variance, and maximum errors than KNN. Moreover, the proposed approach does not use Wi-Fi for localization and is more energy efficient than Wi-Fi-based approaches. Experiments reveal that localization without scene recognition leads to higher errors.
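A minimal sketch of magnetic-fingerprint matching in the spirit of the KNN baseline the paper modifies (this is the plain KNN baseline, not the authors' mKNN; the database layout of one fingerprint row per surveyed position is an assumption):

```python
import numpy as np

def knn_locate(query, fingerprints, positions, k=5):
    """Estimate a position as the mean of the k database positions whose
    magnetic-field fingerprints are closest to the query pattern."""
    d = np.linalg.norm(fingerprints - query, axis=1)  # Euclidean distances
    nearest = np.argsort(d)[:k]
    return positions[nearest].mean(axis=0)
```

In the full system this estimate would be refined by pedestrian dead reckoning through the extended Kalman filter mentioned above.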

Article
Intelligent Thermal Imaging-Based Diagnostics of Turbojet Engines
Appl. Sci. 2019, 9(11), 2253; https://doi.org/10.3390/app9112253 - 31 May 2019
Cited by 21 | Viewed by 1419
Abstract
There are only a few applications of infrared thermal imaging in aviation. In the area of turbojet engines, infrared imaging has been used to detect temperature field anomalies in order to identify structural defects in the materials of engine casings or other engine parts. In aviation applications, the evaluation of infrared images is usually performed manually by an expert. This paper deals with the design of an automatic intelligent system that evaluates the technical state of a turbojet engine and diagnoses it during operation based on infrared thermal (IRT) images. A hybrid system interconnecting a self-organizing feature map and an expert system is designed for this purpose. A Kohonen neural network (the self-organizing feature map) is successfully applied to segment IRT images of a turbojet engine with high precision, and the expert system is then used to create diagnostic information from the segmented images. This paper represents a proof of concept of this hybrid system using data from a small iSTC-21v turbojet engine operating in laboratory conditions.

Article
FMnet: Iris Segmentation and Recognition by Using Fully and Multi-Scale CNN for Biometric Security
Appl. Sci. 2019, 9(10), 2042; https://doi.org/10.3390/app9102042 - 17 May 2019
Cited by 9 | Viewed by 1464
Abstract
In deep learning, recent works show that neural networks have high potential in the field of biometric security. The advantage of using this type of architecture, in addition to its robustness, is that the network learns characteristic vectors by creating intelligent filters automatically, thanks to the convolutional layers. In this paper, we propose an algorithm, "FMnet", for iris recognition using a Fully Convolutional Network (FCN) and a Multi-scale Convolutional Neural Network (MCNN). By taking into consideration the ability of convolutional neural networks to learn and work at different resolutions, our proposed iris recognition method overcomes the issues of classical methods, which rely only on handcrafted feature extraction, by performing feature extraction and classification together. Our proposed algorithm shows better classification results than other state-of-the-art iris recognition approaches.

Article
A Low-Cost Approach to Crack Python CAPTCHAs Using AI-Based Chosen-Plaintext Attack
Appl. Sci. 2019, 9(10), 2010; https://doi.org/10.3390/app9102010 - 16 May 2019
Cited by 8 | Viewed by 1310
Abstract
CAPTCHA authentication has been challenged by recent technological advances in AI. However, many of the AI advances challenging CAPTCHA are either restricted by a limited amount of labeled CAPTCHA data or constructed in an expensive or complicated way. In contrast, this paper illustrates a low-cost approach that takes advantage of the nature of open-source libraries for an AI-based chosen-plaintext attack. The chosen-plaintext attack described here relies on a deep learning model created and trained on a simple personal computer in a low-cost way. It shows an efficient cracking rate over two open-source Python CAPTCHA libraries, Claptcha and Captcha. This chosen-plaintext attack method raises a potential security alert in the era of AI, particularly for small-business owners who use open-source CAPTCHA libraries. The main contributions of this project are: (1) it is the first low-cost method based on a chosen-plaintext attack exploiting the nature of open-source Python CAPTCHA libraries; (2) it combines TensorFlow object detection with our proposed peak segmentation algorithm and a convolutional neural network in a novel way to improve recognition accuracy.
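A minimal sketch of the chosen-plaintext idea: because the CAPTCHA generator is open source, an attacker can produce unlimited labeled training pairs locally (shown with the `captcha` package's ImageCaptcha API; the alphabet, sizes, and the downstream training loop are assumptions and are omitted):

```python
import random
import string
from captcha.image import ImageCaptcha

def make_training_pairs(n, length=4, width=160, height=60, seed=0):
    """Generate (PIL image, label) pairs from the open-source generator,
    so the attacker chooses the plaintexts and knows every label."""
    random.seed(seed)
    gen = ImageCaptcha(width=width, height=height)
    alphabet = string.ascii_uppercase + string.digits
    pairs = []
    for _ in range(n):
        label = "".join(random.choices(alphabet, k=length))
        pairs.append((gen.generate_image(label), label))
    return pairs
```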

Article
Infrared Stripe Correction Algorithm Based on Wavelet Analysis and Gradient Equalization
Appl. Sci. 2019, 9(10), 1993; https://doi.org/10.3390/app9101993 - 15 May 2019
Cited by 5 | Viewed by 986
Abstract
In uncooled infrared imaging systems, owing to the non-uniformity of the amplifiers in the readout circuit, infrared images exhibit obvious stripe noise, which greatly affects their quality. In this study, the generation mechanism of stripe noise is analyzed, and a new stripe correction algorithm based on wavelet analysis and gradient equalization is proposed, exploiting the single-direction distribution of the fixed-pattern noise of an infrared focal plane array. The raw infrared image is decomposed by a wavelet transform, and the cumulative histogram of the vertical component is convolved with a one-dimensional Gaussian operator to achieve gradient equalization in the horizontal direction. In addition, the stripe noise is further separated from edge texture by a guided filter. The algorithm is verified on simulated noisy images and real infrared images; comparison experiments and qualitative and quantitative analyses against current advanced algorithms show that its correction results are not only visually mild but also achieve the best structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) scores. The algorithm effectively removes stripe noise without losing details, and its correction performance exceeds that of the most advanced methods.
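A minimal sketch of the wavelet step, assuming the stripe energy concentrates in the vertical-detail sub-band of a single-level 2D DWT (the crude scaling below merely stands in for the paper's gradient equalization, and the sub-band carrying stripes depends on stripe orientation and library convention):

```python
import numpy as np
import pywt

def suppress_stripes(img, wavelet="db4", gain=0.2):
    """Attenuate the detail sub-band assumed to capture stripe noise,
    then reconstruct; a real implementation would equalize gradients
    of that band instead of simply scaling it."""
    cA, (cH, cV, cD) = pywt.dwt2(np.asarray(img, dtype=float), wavelet)
    cV = cV * gain  # illustrative attenuation of the vertical component
    return pywt.idwt2((cA, (cH, cV, cD)), wavelet)
```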

Article
3D Wireframe Modeling and Viewpoint Estimation for Multi-Class Objects Combining Deep Neural Network and Deformable Model Matching
Appl. Sci. 2019, 9(10), 1975; https://doi.org/10.3390/app9101975 - 14 May 2019
Cited by 1 | Viewed by 1206
Abstract
The accuracy of 3D viewpoint and shape estimation from 2D images has been greatly improved by machine learning, especially deep learning technology such as the convolutional neural network (CNN). However, current methods are usually valid only for one specific category and exhibit poor performance when generalized to other categories, which means that multiple detectors or networks are needed for multi-class object images. In this paper, we propose a method with strong generalization ability that incorporates a single CNN with deformable model matching for 3D viewpoint and shape estimation of multi-class object images. The CNN is utilized to detect keypoints of the potential object in the image, while a deformable model matching stage conducts 3D wireframe modeling and viewpoint estimation simultaneously, supported by the detected keypoints. Moreover, parameter estimation by deformable model matching is robustly fault-tolerant to keypoint detection results that contain mistaken keypoints. The proposed method is evaluated on the Pascal3D+ dataset. Experiments show that it performs well in both parameter estimation accuracy and generalization to multi-class objects. This research is a useful exploration of extending the generalization of deep learning to specific tasks.

Article
Deep Forest-Based Monocular Visual Sign Language Recognition
Appl. Sci. 2019, 9(9), 1945; https://doi.org/10.3390/app9091945 - 12 May 2019
Cited by 4 | Viewed by 1246
Abstract
Sign language recognition (SLR) is a bridge linking the hearing impaired and the general public. Some SLR methods using wearable data gloves are not portable enough to provide a daily sign language translation service, while visual SLR is more flexible to work with in most scenes. This paper introduces a monocular vision-based approach to SLR. Human skeleton action recognition is proposed to express semantic information, including the representation of signs' gestures, using the regularization of body joint features and a deep-forest-based semantic classifier with a voting strategy. We test our approach on the public American Sign Language Lexicon Video Dataset (ASLLVD) and a private testing set. It achieves a promising performance and shows a high generalization capability on the testing set.

Article
Two-Level Attentions and Grouping Attention Convolutional Network for Fine-Grained Image Classification
Appl. Sci. 2019, 9(9), 1939; https://doi.org/10.3390/app9091939 - 11 May 2019
Cited by 9 | Viewed by 1295
Abstract
The focus of fine-grained image classification tasks is to ignore interference information and grasp local features. This challenge is what the visual attention mechanism excels at. Firstly, we construct a two-level attention convolutional network that characterizes object-level attention and pixel-level attention. Then, we combine the two kinds of attention through a second-order response transform algorithm. Furthermore, we propose a clustering-based grouping attention model, which implements part-level attention. The grouping attention method stretches all the semantic features in a deeper convolution layer of the network into vectors; these vectors are clustered by vector dot products, and each cluster represents a specific semantic. The grouping attention algorithm implements the functions of group convolution and feature clustering, which can greatly reduce the network parameters and improve the recognition rate and interpretability of the network. Finally, low-level visual features and high-level semantic information are merged by a multi-level feature fusion method to accurately classify fine-grained images. We achieve good results without using pre-trained networks or fine-tuning techniques.

Article
Complex Human–Object Interactions Analyzer Using a DCNN and SVM Hybrid Approach
Appl. Sci. 2019, 9(9), 1869; https://doi.org/10.3390/app9091869 - 07 May 2019
Cited by 5 | Viewed by 1258
Abstract
Nowadays, with the emergence of sophisticated electronic devices, human daily activities are becoming more and more complex. At the same time, research has begun on the use of reliable, cost-effective sensors, patient monitoring systems, and other systems that make daily life more comfortable for the elderly. Moreover, in the field of computer vision, human action recognition (HAR) has drawn much attention as a research subject because of its potential for numerous cost-effective applications. Although much research has investigated HAR, most has dealt with simple basic actions in simplified environments; not much work has been done in more complex, real-world environments. Therefore, a system is needed that can recognize complex daily activities in a variety of realistic environments. In this paper, we propose a system for recognizing such activities, in which humans interact with various objects, taking into consideration object-oriented activity information and using deep convolutional neural networks and a multi-class support vector machine (multi-class SVM). The experiments are performed on the publicly available Cornell Activity Dataset CAD-120, a dataset of human-object interactions featuring ten high-level daily activities. The results show that the proposed system achieves an accuracy of 93.33%, which is higher than other state-of-the-art methods, and has great potential for applications that recognize complex daily activities.
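A minimal sketch of the hybrid idea in the title: deep CNN features fed into a multi-class SVM (shown with scikit-learn on precomputed feature vectors; the feature extractor, kernel, and C value are assumptions):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_activity_classifier(features, labels):
    """Fit a multi-class SVM on CNN feature vectors (one row per clip).
    SVC handles multi-class problems via one-vs-one voting by default."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
    clf.fit(np.asarray(features), np.asarray(labels))
    return clf
```

Separating the learned representation (CNN) from the classifier (SVM) is what allows the SVM's margin-based decision to be trained on relatively few labeled clips.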

Article
Medical Image Segmentation with Adjustable Computational Complexity Using Data Density Functionals
Appl. Sci. 2019, 9(8), 1718; https://doi.org/10.3390/app9081718 - 25 Apr 2019
Cited by 3 | Viewed by 1201
Abstract
Techniques for automatic medical image segmentation are among the most important methods for clinical investigation, anatomic research, and modern medicine. The various image structures produced by imaging apparatus serve a diversity of medical applications. However, this structural diversity is also a burden for contemporary techniques. Segmenting images of tremendously small size (<25 pixels by 25 pixels) or tremendously large size (>1024 pixels by 1024 pixels) is a challenge from the perspectives of both technical feasibility and theoretical development. Noise and pixel pollution caused by the imaging apparatus further aggravate the difficulty of image segmentation. To simultaneously overcome these predicaments, we propose a new method of medical image segmentation with adjustable computational complexity by introducing data density functionals. Under this theoretical framework, several kernels can be assigned to conquer specific predicaments: a square-root potential kernel is used to smoothen the featured components of the employed images, while a Yukawa potential kernel is applied to enhance local featured properties. Moreover, the characteristics of global density functional estimation also allow image compression without losing the main image feature structures. Segmentation experiments showed successful results at various compression ratios. The computational complexity was significantly reduced, and the accuracy, estimated by the Jaccard index, was excellent. Moreover, noise and regions of light pollution were mostly filtered out in the image compression procedure.

Article
Computer-Aided Detection of Hyperacute Stroke Based on Relative Radiomic Patterns in Computed Tomography
Appl. Sci. 2019, 9(8), 1668; https://doi.org/10.3390/app9081668 - 23 Apr 2019
Cited by 7 | Viewed by 1029
Abstract
Ischemic stroke is one of the leading causes of disability and death. To achieve timely assessment, a computer-aided diagnosis (CAD) system is proposed to perform early recognition of hyperacute ischemic stroke on non-contrast computed tomography (NCCT). In total, 26 patients with hyperacute ischemic stroke (onset <6 h prior) and 56 normal controls composed the image database. For each NCCT slice, textural features were extracted from ranklet-transformed images, which have enhanced local contrast. Textural differences between the two sides of an image were calculated and combined in a machine learning classifier to detect stroke areas. The proposed CAD system using ranklet features achieved significantly higher accuracy (81% vs. 71%), specificity (90% vs. 79%), and area under the curve (Az) (0.81 vs. 0.73) than conventional textural features. The diagnostic suggestions provided by the CAD system are fast and promising and could be useful in the hyperacute ischemic stroke assessment pipeline.

Article
Pedestrian Flow Tracking and Statistics of Monocular Camera Based on Convolutional Neural Network and Kalman Filter
Appl. Sci. 2019, 9(8), 1624; https://doi.org/10.3390/app9081624 - 18 Apr 2019
Cited by 7 | Viewed by 1154
Abstract
Pedestrian flow statistics and analysis in public places are an important means of ensuring urban safety. However, video-based pedestrian flow statistics algorithms have mainly relied on binocular vision or a vertically downward-facing camera, which seriously limits the applicable scenes and counting areas and cannot make use of the large number of monocular cameras in a city. To solve this problem, we propose a pedestrian flow statistics algorithm based on a monocular camera. Firstly, a convolutional neural network is used to detect pedestrian targets. Then, motion models for the targets are established with a Kalman filter, and based on these motion models a data association algorithm completes target tracking. Finally, the pedestrian flow is counted by a pedestrian counting method based on virtual blocks. The algorithm is tested on real scenes and public datasets. The experimental results show that the algorithm has high accuracy and strong real-time performance, which verifies its reliability.
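A minimal sketch of the Kalman-filter motion model used between detections, assuming a constant-velocity state [x, y, vx, vy] per pedestrian (the noise covariances below are illustrative):

```python
import numpy as np

class ConstantVelocityKF:
    """Predict/update a pedestrian state [x, y, vx, vy] from (x, y) detections."""
    def __init__(self, x0, y0, dt=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])
        self.P = np.eye(4) * 10.0                 # state uncertainty
        self.F = np.eye(4)                        # constant-velocity dynamics
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.eye(2, 4)                     # observe position only
        self.Q = np.eye(4) * 0.01                 # process noise
        self.R = np.eye(2) * 1.0                  # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                         # predicted position

    def update(self, z):
        y = np.asarray(z) - self.H @ self.x       # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

Data association then matches each detection to the track whose predicted position it best explains.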

Article
A Trimmed Clustering-Based l1-Principal Component Analysis Model for Image Classification and Clustering Problems with Outliers
Appl. Sci. 2019, 9(8), 1562; https://doi.org/10.3390/app9081562 - 15 Apr 2019
Cited by 1 | Viewed by 1258
Abstract
Different versions of principal component analysis (PCA) have been widely used to extract important information for image recognition and image clustering problems. However, owing to the presence of outliers, this remains challenging. This paper proposes a new PCA methodology based on a novel discovery that the widely used l1-PCA is equivalent to a two-group k-means clustering model. The projection vector of l1-PCA is the vector difference between the two cluster centers estimated by the clustering model. In theory, this vector difference provides inter-cluster information, which is beneficial for distinguishing data objects from different classes. However, the performance of l1-PCA is not comparable with state-of-the-art methods, because l1-PCA is sensitive to outliers: the equivalent clustering model is not robust to them. To overcome this limitation, we introduce a trimming function into the clustering model and propose a trimmed-clustering based l1-PCA (TC-PCA). With this trimming-set formulation, TC-PCA is not sensitive to outliers. Moreover, we mathematically prove the convergence of the proposed algorithm. Experimental results on image classification and clustering indicate that our proposed method outperforms current state-of-the-art methods.
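A minimal sketch of the equivalence the paper builds on: estimating an l1-PCA-style projection direction as the difference of two k-means cluster centers (using scikit-learn; the trimming step that makes TC-PCA robust to outliers is omitted):

```python
import numpy as np
from sklearn.cluster import KMeans

def two_means_projection(X, seed=0):
    """Cluster the data into two groups; the normalized difference of the
    cluster centers serves as the projection direction."""
    km = KMeans(n_clusters=2, n_init=10, random_state=seed).fit(X)
    v = km.cluster_centers_[0] - km.cluster_centers_[1]
    return v / np.linalg.norm(v)
```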

Article
Attention-Aware Adversarial Network for Person Re-Identification
Appl. Sci. 2019, 9(8), 1550; https://doi.org/10.3390/app9081550 - 14 Apr 2019
Cited by 1 | Viewed by 1179
Abstract
Person re-identification (re-ID) is a fundamental problem in the field of computer vision. The performance of deep-learning-based person re-ID models suffers from a lack of training data. In this work, we introduce a novel image-specific data augmentation method at the feature map level to enforce feature diversity in the network. Furthermore, an attention assignment mechanism is proposed to ensure that the person re-ID classifier focuses on nearly all important regions of the input person image. To achieve this, a three-stage framework is proposed. First, a baseline classification network is trained for person re-ID. Second, an attention assignment network is built on the baseline network, in which the attention module learns to suppress the response of the currently detected regions and re-assign attention to other important locations. By this means, multiple regions important for classification are highlighted by the attention map. Finally, the attention map is integrated into the attention-aware adversarial network (AAA-Net), which generates high-performance classification results with an adversarial training strategy. We evaluate the proposed method on two large-scale benchmark datasets, Market1501 and DukeMTMC-reID. Experimental results show that our algorithm performs favorably against state-of-the-art methods.

Article
Adaptive Context-Aware and Structural Correlation Filter for Visual Tracking
Appl. Sci. 2019, 9(7), 1338; https://doi.org/10.3390/app9071338 - 29 Mar 2019
Cited by 1 | Viewed by 979
Abstract
Accurate visual tracking is a challenging issue in computer vision. Correlation filter (CF) based methods are favored in visual tracking for their efficiency and high performance. Nonetheless, traditional CF-based trackers use insufficient context information and easily drift in scenes with fast motion or background clutter. Moreover, CF-based trackers are sensitive to partial occlusion, which may reduce their overall performance and even lead to failure on challenging tracking sequences. In this paper, we present an adaptive context-aware (CA) and structural correlation filter for tracking. Firstly, we propose a novel context selection strategy to obtain negative samples. Secondly, to gain robustness against partial occlusion, we construct a structural correlation filter by learning both holistic and local models. Finally, we introduce an adaptive updating scheme that uses a fluctuation parameter. Extensive comprehensive experiments on the object tracking benchmark (OTB)-100 dataset demonstrate that our proposed tracker performs favorably against several state-of-the-art trackers.
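A minimal sketch of the correlation-filter core that CF trackers share: solving for a filter in the Fourier domain and locating the target at the correlation peak (a single-channel MOSSE-style filter with illustrative regularization, omitting windowing and preprocessing; this is the generic CF baseline, not the authors' context-aware structural variant):

```python
import numpy as np

def train_filter(patch, target_response, lam=1e-3):
    """Closed-form correlation filter in the Fourier domain:
    H* = (G . conj(F)) / (F . conj(F) + lambda)."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(target_response)  # e.g., a Gaussian peaked on the target
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def locate(H, new_patch):
    """Correlate the learned filter with a new patch; the response peak
    gives the target's new position."""
    response = np.real(np.fft.ifft2(H * np.fft.fft2(new_patch)))
    return np.unravel_index(np.argmax(response), response.shape)
```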

Article
Fusing Appearance and Prior Cues for Road Detection
Appl. Sci. 2019, 9(5), 996; https://doi.org/10.3390/app9050996 - 10 Mar 2019
Cited by 2 | Viewed by 1072
Abstract
Road detection is a crucial research topic in computer vision, especially in the framework of autonomous driving and driver assistance. Moreover, it is an invaluable step for other tasks such as collision warning, vehicle detection, and pedestrian detection. Nevertheless, road detection remains challenging due to continuously changing backgrounds, varying illumination (shadows and highlights), variability of road appearance (size, shape, and color), and differently shaped objects (lane markings, vehicles, and pedestrians). In this paper, we propose an algorithm that fuses appearance and prior cues for road detection. Firstly, input images are preprocessed by simple linear iterative clustering (SLIC), morphological processing, and an illuminant-invariant transformation to obtain superpixels and remove lane markings, shadows, and highlights. Then, we design a novel seed superpixel selection method and model appearance cues using a Gaussian mixture model fitted on the selected seed superpixels. Next, we construct a road geometric prior model offline, which provides statistical descriptions and relevant information to infer the location of the road surface. Finally, a Bayesian framework is used to fuse the appearance and prior cues. Experiments carried out on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) road benchmark show that the proposed algorithm delivers compelling performance and achieves state-of-the-art results among model-based methods.
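A minimal sketch of the appearance-cue step: fitting a Gaussian mixture to seed-superpixel colors and scoring every superpixel under it (using scikit-learn's GaussianMixture; the color features and component count are assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def road_appearance_scores(seed_colors, all_colors, n_components=3, seed=0):
    """Fit a GMM on mean colors of seed (road) superpixels and return the
    log-likelihood of every superpixel under the road appearance model."""
    gmm = GaussianMixture(n_components=n_components,
                          random_state=seed).fit(np.asarray(seed_colors))
    return gmm.score_samples(np.asarray(all_colors))
```

In the full algorithm these likelihoods would be combined with the offline geometric prior inside the Bayesian fusion framework.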

Article
Color Inverse Halftoning Method with the Correlation of Multi-Color Components Based on Extreme Learning Machine
Appl. Sci. 2019, 9(5), 841; https://doi.org/10.3390/app9050841 - 27 Feb 2019
Cited by 15 | Viewed by 1115
Abstract
Look-up table (LUT)-based methods are a popular and effective approach to inverse halftoning. However, there is still much room to improve the reconstructed image quality for color halftone images, because most existing color inverse halftoning methods simply extend LUT methods to each color component separately. To this end, this paper presents a novel color inverse halftoning method that exploits the correlation of multiple color components. By considering all existing contone values that share the same halftone pattern across the three color component tables, we first propose the concept of a common pattern. Then, an extreme learning machine (ELM) is employed to estimate the contone values of nonexistent patterns from the common patterns in the color LUT, which not only improves the fitting precision for nonexistent patterns but also offers fast transformation speed. Experimental results show that the proposed method achieves better image quality than previously published methods.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
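The ELM estimator used here trains in closed form: the hidden layer is random and fixed, and only the output weights are solved by least squares, which explains the fast transformation speed. A minimal sketch with illustrative sizes and activation:

    import numpy as np

    def elm_train(X, T, n_hidden=200, seed=0):
        """Single-hidden-layer ELM: random input weights, analytic output weights."""
        rng = np.random.default_rng(seed)
        W = rng.normal(size=(X.shape[1], n_hidden))   # random, never trained
        b = rng.normal(size=n_hidden)
        H = np.tanh(X @ W + b)                        # hidden-layer activations
        beta = np.linalg.pinv(H) @ T                  # least-squares output weights
        return W, b, beta

    def elm_predict(X, W, b, beta):
        return np.tanh(X @ W + b) @ beta

    X = np.random.rand(100, 8)                        # e.g. halftone-pattern features
    T = np.random.rand(100, 3)                        # e.g. RGB contone targets
    W, b, beta = elm_train(X, T)
    print(elm_predict(X, W, b, beta).shape)           # (100, 3)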

Article
Fully Symmetric Convolutional Network for Effective Image Denoising
Appl. Sci. 2019, 9(4), 778; https://doi.org/10.3390/app9040778 - 22 Feb 2019
Cited by 5 | Viewed by 1114
Abstract
Neural-network-based image denoising is one of the most promising approaches to problems in image processing. In this work, a deep fully symmetric convolutional–deconvolutional neural network (FSCN) is proposed for image denoising. The proposed model comprises a novel architecture with a chain of successive symmetric convolutional–deconvolutional layers. This framework learns convolutional–deconvolutional mappings from corrupted images to clean ones in an end-to-end fashion without using image priors. The convolutional layers act as a feature extractor that encodes the primary components of the image content while eliminating corruptions, and the deconvolutional layers then decode these image abstractions to recover image content details. An adaptive moment optimizer is used to minimize the reconstruction loss, as it is well suited to large data volumes and noisy images. Extensive denoising experiments were conducted to evaluate the FSCN model against existing state-of-the-art denoising algorithms. The results show that the proposed model achieves superior denoising, both qualitatively and quantitatively. This work also presents an efficient GPU implementation of the FSCN model, which makes it easy and attractive for practical denoising applications.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
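A toy version of such a symmetric convolutional–deconvolutional chain, written in PyTorch with illustrative depths and widths (the paper's exact architecture and hyperparameters will differ), trained with the adaptive moment (Adam) optimizer on a reconstruction loss:

    import torch
    import torch.nn as nn

    class ConvDeconvDenoiser(nn.Module):
        """Small symmetric conv-deconv chain in the spirit of the FSCN."""
        def __init__(self, ch=64):
            super().__init__()
            self.encode = nn.Sequential(                     # feature extraction
                nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
            self.decode = nn.Sequential(                     # detail recovery
                nn.ConvTranspose2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(ch, 1, 3, padding=1))
        def forward(self, x):
            return self.decode(self.encode(x))

    model = ConvDeconvDenoiser()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)      # adaptive moment optimizer
    clean = torch.rand(4, 1, 40, 40)
    noisy = clean + 0.1 * torch.randn_like(clean)
    loss = nn.functional.mse_loss(model(noisy), clean)       # reconstruction loss
    opt.zero_grad(); loss.backward(); opt.step()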

Article
Comparative Evaluation of Hand-Crafted Image Descriptors vs. Off-the-Shelf CNN-Based Features for Colour Texture Classification under Ideal and Realistic Conditions
Appl. Sci. 2019, 9(4), 738; https://doi.org/10.3390/app9040738 - 20 Feb 2019
Cited by 25 | Viewed by 1979
Abstract
Convolutional Neural Networks (CNN) have brought spectacular improvements to several fields of machine vision, including object, scene, and face recognition. Nonetheless, the impact of this new paradigm on the classification of fine-grained images, such as colour textures, is still debated. In this work, we evaluate the effectiveness of traditional, hand-crafted descriptors against off-the-shelf CNN-based features for the classification of different types of colour textures under a range of imaging conditions. The study covers 68 image descriptors (35 hand-crafted and 33 CNN-based) and 46 compilations of 23 colour texture datasets divided into 10 experimental conditions. On average, the results indicate a marked superiority of deep networks, particularly with non-stationary textures and in the presence of multiple changes in the acquisition conditions. By contrast, hand-crafted descriptors were better at discriminating stationary textures under steady imaging conditions and proved more robust than CNN-based features to image rotation.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
Effective Crack Damage Detection Using Multilayer Sparse Feature Representation and Incremental Extreme Learning Machine
Appl. Sci. 2019, 9(3), 614; https://doi.org/10.3390/app9030614 - 12 Feb 2019
Cited by 4 | Viewed by 2213
Abstract
Detecting cracks in reinforced concrete is still a challenging problem owing to complex disturbances from background noise. In this work, we advocate a new concrete crack damage detection model based upon multilayer sparse feature representation and an incremental extreme learning machine (ELM), which has both favorable feature learning and classification capabilities. Specifically, a large number of crack and non-crack patches are obtained from the collected concrete images by cropping with a sliding window and applying image rotation. From these image patches, defect region features can be quickly computed by multilayer sparse ELM autoencoder networks. Then, an online incremental ELM classification network is used to recognize the crack defect features. Unlike commonly used deep-learning-based methods, the presented ELM-based crack detection model can be trained efficiently without tediously fine-tuning the parameters of the entire network. Moreover, according to ELM theory, the proposed crack detector works universally for defect feature extraction and detection. In the experiments, compared with other recently developed crack detectors, the proposed concrete crack detection model offers outstanding training efficiency and favorable crack detection accuracy.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
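The patch-harvesting step is easy to reproduce. The sketch below crops a sliding window and adds 90-degree rotations; the window size, stride, and rotation set are illustrative rather than the paper's settings:

    import numpy as np

    def crack_patches(image, size=32, stride=16):
        """Sliding-window cropping with rotation augmentation."""
        patches = []
        h, w = image.shape[:2]
        for y in range(0, h - size + 1, stride):
            for x in range(0, w - size + 1, stride):
                patch = image[y:y + size, x:x + size]
                for k in range(4):                    # 0, 90, 180, 270 degrees
                    patches.append(np.rot90(patch, k))
        return np.stack(patches)

    print(crack_patches(np.random.rand(64, 64)).shape)   # (36, 32, 32)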

Article
A 3D Object Detection Based on Multi-Modality Sensors of USV
Appl. Sci. 2019, 9(3), 535; https://doi.org/10.3390/app9030535 - 05 Feb 2019
Cited by 3 | Viewed by 1493
Abstract
Unmanned Surface Vehicles (USVs) are commonly equipped with multi-modality sensors. Fully utilizing these sensors can improve object detection for USVs and further contribute to better autonomous navigation. The purpose of this paper is to solve the problem of 3D object detection for USVs in complicated marine environments. We propose a 3D object detection deep neural network based on the multi-modality data of USVs. This model includes a modified proposal generation network and a deep fusion detection network. The proposal generation network improves feature extraction, while the deep fusion detection network enhances fusion performance and achieves more accurate object detection results. The model was tested on both the KITTI 3D object detection dataset (a project of the Karlsruhe Institute of Technology and the Toyota Technological Institute at Chicago) and a self-collected offshore dataset. The model shows excellent performance under small-memory conditions. The results further prove that the deep-learning-based method gives good accuracy under the complicated surface conditions of marine environments.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
A Fast Sparse Coding Method for Image Classification
Appl. Sci. 2019, 9(3), 505; https://doi.org/10.3390/app9030505 - 01 Feb 2019
Cited by 2 | Viewed by 1068
Abstract
Image classification is an important problem in computer vision. The sparse coding spatial pyramid matching (ScSPM) framework is widely used in this field. However, sparse coding cannot effectively handle very large training sets because of its high computational complexity, and ignoring the mutual dependence among local features results in highly variable sparse codes, even for similar features. To overcome the shortcomings of previous sparse coding algorithms, we present an image classification method that replaces the sparse dictionary with a stable dictionary learned via low-complexity clustering, more specifically, a k-medoids clustering method optimized by k-means++. The proposed method reduces learning complexity and improves feature stability. In the experiments, we compared the effectiveness of our method with the existing ScSPM method and its improved versions. We evaluated our approach on two diverse datasets: Caltech-101 and UIUC-Sports. The results show that our method increases the accuracy of spatial pyramid matching, which suggests that it is capable of improving the performance of sparse coding features.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
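A minimal sketch of the described dictionary-learning step, k-medoids clustering with k-means++-style seeding, is given below; the iteration count and the squared Euclidean metric are assumptions, and a real implementation would use a more efficient medoid update:

    import numpy as np

    def kmedoids_pp(X, k, n_iter=20, seed=0):
        """k-medoids with k-means++ seeding; atoms are actual data points."""
        rng = np.random.default_rng(seed)
        n = len(X)
        medoids = [rng.integers(n)]
        for _ in range(k - 1):                        # k-means++ D^2 seeding
            d2 = np.min(((X[:, None] - X[medoids]) ** 2).sum(-1), axis=1)
            medoids.append(rng.choice(n, p=d2 / d2.sum()))
        medoids = np.array(medoids)
        for _ in range(n_iter):                       # alternate assign / update
            labels = ((X[:, None] - X[medoids]) ** 2).sum(-1).argmin(axis=1)
            for j in range(k):
                members = np.where(labels == j)[0]
                if len(members):
                    intra = ((X[members][:, None] - X[members]) ** 2).sum(-1).sum(1)
                    medoids[j] = members[intra.argmin()]
        return X[medoids], labels

    atoms, labels = kmedoids_pp(np.random.rand(300, 16), k=8)
    print(atoms.shape)                                # (8, 16) stable dictionary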

Article
Harbor Extraction Based on Edge-Preserve and Edge Categories in High Spatial Resolution Remote-Sensing Images
Appl. Sci. 2019, 9(3), 420; https://doi.org/10.3390/app9030420 - 26 Jan 2019
Cited by 1 | Viewed by 929
Abstract
Efficient harbor extraction is essential due to the strategic importance of harbors in economic and military construction. However, there are few studies on harbor extraction. In this article, a new harbor extraction algorithm based on edge preservation and edge categories (EC) is proposed for high-spatial-resolution remote-sensing images. In the preprocessing stage, we propose a local edge preservation algorithm (LEPA) to remove redundant details and reduce useless edges. After acquiring the local edge-preserved images, in order to reduce redundant matched keypoints and improve the accuracy of the target candidate extraction method, we propose a scale-invariant feature transform (SIFT) keypoint extraction method based on edge categories (EC-SIFT); this method greatly reduces the redundancy of SIFT keypoints and lowers the computational complexity of the target extraction system. Finally, the harbor extraction algorithm uses a Support Vector Machine (SVM) classifier to identify the harbor target. The experimental results show that the proposed algorithm effectively removes redundant details and improves the accuracy and efficiency of harbor target extraction.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
Arabic Cursive Text Recognition from Natural Scene Images
Appl. Sci. 2019, 9(2), 236; https://doi.org/10.3390/app9020236 - 10 Jan 2019
Cited by 10 | Viewed by 2402
Abstract
This paper presents a comprehensive survey of Arabic cursive scene text recognition. Publications in recent years reflect a shift of interest among document image analysis researchers from the recognition of optical characters to the recognition of characters appearing in natural images. Scene text recognition is a challenging problem due to variations in font style, size, alignment, orientation, reflection, illumination, blurriness, and background complexity. Among cursive scripts, Arabic scene text recognition is considered an even more challenging problem due to joined writing, variant forms of the same character, a large number of ligatures, multiple baselines, and so on. Surveys of Latin and Chinese script-based scene text recognition systems exist, but the Arabic-like scene text recognition problem has yet to be addressed in detail. In this manuscript, we highlight some of the latest techniques presented for text classification. The presented techniques, which follow deep learning architectures, are equally suitable for the development of Arabic cursive scene text recognition systems. Issues pertaining to text localization and feature extraction are also discussed. Moreover, this article emphasizes the importance of a benchmark cursive scene text dataset. Based on the discussion, future directions are outlined, some of which may provide insight into cursive scene text for researchers.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
New Evolutionary-Based Techniques for Image Registration
Appl. Sci. 2019, 9(1), 176; https://doi.org/10.3390/app9010176 - 05 Jan 2019
Cited by 4 | Viewed by 1327
Abstract
The work reported in this paper aims at developing evolutionary algorithms to register images for signature recognition purposes. We propose and develop several registration methods in order to obtain accurate and fast algorithms. First, we introduce two variants of the firefly method that prove to have excellent accuracy and fair run times. To speed up the computation, we then propose two variants of the Accelerated Particle Swarm Optimization (APSO) method. The resulting algorithms are significantly faster than the firefly-based ones, though their recognition rates are slightly lower. To find a trade-off between recognition rate and computational complexity, we developed a hybrid method that combines the ability of auto-adaptive Evolution Strategies (ES) to discover a global optimum with the fast convergence of APSO. The accuracy and efficiency of the resulting algorithms are experimentally demonstrated through an extensive series of tests on various pairs of signature images. A comparative analysis of the quality of the proposed methods, together with conclusions and suggestions for further development, is provided in the final part of the paper.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
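To give a flavor of the APSO component, the sketch below applies a global-best accelerated particle swarm with a shrinking random walk to a toy two-dimensional translation-registration problem; the objective, coefficients, and search range are illustrative and not the authors' signature-registration setup:

    import numpy as np

    def apso_register(ref, moving, n_particles=30, n_iter=50, seed=0):
        """Accelerated PSO searching a 2-D translation aligning `moving` to `ref`."""
        rng = np.random.default_rng(seed)
        def cost(t):
            dy, dx = int(round(t[0])), int(round(t[1]))
            return np.mean((ref - np.roll(moving, (dy, dx), axis=(0, 1))) ** 2)
        x = rng.uniform(-10, 10, size=(n_particles, 2))
        g = min(x, key=cost)                                  # global best
        for it in range(n_iter):
            alpha = 0.5 * 0.9 ** it                           # shrinking random walk
            x = 0.7 * x + 0.3 * g + alpha * rng.normal(size=x.shape)
            best = min(x, key=cost)
            if cost(best) < cost(g):
                g = best
        return np.round(g).astype(int)

    yy, xx = np.mgrid[0:64, 0:64]
    img = np.exp(-((yy - 20) ** 2 + (xx - 40) ** 2) / 200.0)  # smooth test image
    shifted = np.roll(img, (3, -5), axis=(0, 1))
    print(apso_register(img, shifted))                        # should land near [-3, 5]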

Article
Using the Guided Fireworks Algorithm for Local Backlight Dimming
Appl. Sci. 2019, 9(1), 129; https://doi.org/10.3390/app9010129 - 01 Jan 2019
Cited by 1 | Viewed by 1306
Abstract
Local backlight dimming is a promising display technology with good performance in improving visual quality and reducing the power consumption of device displays. To set the optimal backlight luminance, it is important to design high-performance local dimming algorithms. In this paper, we focus on improving the quality of the displayed image and cast local backlight dimming as an optimization problem. To better evaluate image quality, we use the structural similarity (SSIM) index as the image quality measure and build a model of the local dimming problem. To solve this optimization problem, we design a local dimming algorithm based on the Fireworks Algorithm (FWA), a recent evolutionary computation (EC) algorithm. To further improve solution quality, we introduce a guiding strategy into the FWA and propose an improved algorithm named the Guided Fireworks Algorithm (GFWA). Experimental results show that the GFWA achieves higher performance in local backlight dimming than the Look-Up Table (LUT) algorithm, the Improved Shuffled Frog Leaping Algorithm (ISFLA), and the FWA.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
Improvement in Classification Performance Based on Target Vector Modification for All-Transfer Deep Learning
Appl. Sci. 2019, 9(1), 128; https://doi.org/10.3390/app9010128 - 01 Jan 2019
Cited by 9 | Viewed by 1943
Abstract
This paper proposes a target vector modification method for the all-transfer deep learning (ATDL) method. Deep neural networks (DNNs) have been used widely in many applications; however, DNNs are known to be problematic when large amounts of training data are not available. Transfer learning can provide a solution to this problem. Previous methods regularize all layers, including the output layer, by estimating relation vectors, which are used instead of the one-hot target vectors of the target domain. These vectors are estimated by averaging the target domain data of each target domain label in the output space. This approach improves classification performance, but it does not consider the relations among the relation vectors. From this point of view, we propose a relation vector modification based on constrained pairwise repulsive forces. High pairwise repulsive forces provide large distances between the relation vectors. In addition, the risk of divergence is mitigated by a constraint based on the distributions of the output vectors of the target domain data. We apply our method to two simulation experiments and to disease classification using two-dimensional electrophoresis images. The experimental results show that reusing all layers through our estimation method is effective, especially when the number of target domain samples is very small.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
Accelerating Image Classification using Feature Map Similarity in Convolutional Neural Networks
Appl. Sci. 2019, 9(1), 108; https://doi.org/10.3390/app9010108 - 29 Dec 2018
Cited by 13 | Viewed by 2761
Abstract
Convolutional neural networks (CNNs) have greatly improved image classification performance. However, the extensive computation involved makes classification time-consuming and therefore unsuitable for low-performance devices. To speed up image classification, we propose a cached CNN, which can classify input images based on their similarity to previously input images. Because the feature maps extracted from CNN kernels represent the intensity of features, images with similar intensity can be classified into the same class. In this study, we cache the class labels and feature vectors extracted from the feature maps of images already classified by the CNN. When a new image is input, its class label is output based on its similarity to the cached feature vectors. This process can be performed at each layer; hence, if the classification succeeds, there is no need to perform the remaining convolutional layer operations, which reduces the overall classification time. We performed experiments to measure and evaluate the cache hit rate, precision, and classification time.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
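The caching idea reduces to a nearest-neighbor lookup over stored feature vectors. A minimal sketch, assuming cosine similarity and a fixed acceptance threshold (the paper's similarity measure and threshold policy may differ):

    import numpy as np

    class FeatureCache:
        """Answers a query from the cache when the best cosine similarity is high enough."""
        def __init__(self, threshold=0.9):
            self.vecs, self.labels, self.threshold = [], [], threshold
        def insert(self, vec, label):
            self.vecs.append(vec / np.linalg.norm(vec))
            self.labels.append(label)
        def lookup(self, vec):
            if not self.vecs:
                return None                        # miss: run the remaining layers
            sims = np.stack(self.vecs) @ (vec / np.linalg.norm(vec))
            best = int(np.argmax(sims))
            return self.labels[best] if sims[best] >= self.threshold else None

    cache = FeatureCache()
    cache.insert(np.array([1.0, 0.0, 0.2]), "cat")
    print(cache.lookup(np.array([0.9, 0.05, 0.25])))   # cache hit -> "cat"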

Article
The Optical Barcode Detection and Recognition Method Based on Visible Light Communication Using Machine Learning
Appl. Sci. 2018, 8(12), 2425; https://doi.org/10.3390/app8122425 - 29 Nov 2018
Cited by 6 | Viewed by 1510
Abstract
Visible light communication (VLC) has developed rapidly in recent years. VLC has the advantages of high confidentiality, low cost, and more, and it could be an effective way to connect online to offline (O2O). In this paper, an RGB-LED-ID detection and recognition method based on VLC using machine learning is proposed. Different from traditional encoding-and-decoding VLC, we develop a new VLC system with a form of modulation and recognition. We create different features for different LEDs to make each an Optical Barcode (OBC), based on a Complementary Metal-Oxide-Semiconductor (CMOS) sensor and a pulse-width modulation (PWM) method. The features are extracted using image processing, and then a support vector machine (SVM) and artificial neural networks (ANN) are introduced into the scheme as classifiers. The experimental results show that the proposed method can provide a huge number of unique LED-IDs with a high LED-ID recognition rate, and its performance in dark and distant conditions is significantly better than that of traditional Quick Response (QR) codes. This is the first time VLC has been used in the field of the Internet of Things (IoT), and it is an innovative application of RGB-LEDs to create features. Furthermore, with the development of camera technology, the number of unique LED-IDs and the maximum identifiable distance will increase. Therefore, this scheme can serve as an effective complement to QR codes in the future.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
A New Rotor Position Measurement Method for Permanent Magnet Spherical Motors
Appl. Sci. 2018, 8(12), 2415; https://doi.org/10.3390/app8122415 - 28 Nov 2018
Cited by 7 | Viewed by 1307
Abstract
This paper proposes a new high-precision rotor position measurement (RPM) method for permanent magnet spherical motors (PMSMs). In the proposed method, an LED light spot generation module (LSGM) is installed at the top of the rotor shaft. In the LSGM, three LEDs are arranged in a straight line with different distances between them, forming three optical feature points (OFPs). The images of the three OFPs acquired by a high-speed camera are used to calculate the rotor position of the PMSM in the world coordinate frame. An experimental platform was built to verify the effectiveness of the proposed RPM method.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
An Image Segmentation Method Based on Improved Regularized Level Set Model
Appl. Sci. 2018, 8(12), 2393; https://doi.org/10.3390/app8122393 - 26 Nov 2018
Cited by 7 | Viewed by 1178
Abstract
When the level set algorithm is used to segment an image, the level set function must be re-initialized periodically to ensure that it remains a signed distance function (SDF). To avoid this defect, an improved regularized level set image segmentation approach is presented. First, a new potential function is defined and introduced to reconstruct a new distance regularization term that removes the need to periodically re-initialize the level set function. Second, by combining the distance regularization term with the internal and external energy terms, a new energy functional is developed. Then, the evolution of the new energy functional is derived using the calculus of variations and the steepest descent approach, and a partial differential equation is designed. Finally, an improved regularized level set-based image segmentation (IRLS-IS) method is proposed. Numerical experimental results demonstrate that the IRLS-IS method is not only effective and robust in segmenting noisy and intensity-inhomogeneous images but can also analyze complex medical images well.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
Impulse Noise Denoising Using Total Variation with Overlapping Group Sparsity and Lp-Pseudo-Norm Shrinkage
Appl. Sci. 2018, 8(11), 2317; https://doi.org/10.3390/app8112317 - 20 Nov 2018
Cited by 7 | Viewed by 1546
Abstract
Models based on total variation (TV) regularization have proven effective in removing random noise, but denoised images often suffer from a serious staircase effect. In this study, two-dimensional total variation with overlapping group sparsity (OGS-TV) is applied to images with impulse noise, to suppress the staircase effect of the TV model and enhance the dissimilarity between smooth and edge regions. In the traditional TV model, the L1-norm is typically used to describe the statistical characteristics of impulse noise. In this paper, an Lp-pseudo-norm regularization term is employed instead of the L1-norm. The new model introduces another degree of freedom, which better describes the sparsity of the image and improves the denoising result. Under the accelerated alternating direction method of multipliers (ADMM) framework, Fourier transform techniques move the matrix operations from the spatial domain to the frequency domain, which improves the efficiency of the algorithm. Our model addresses sparsity in the difference domain of the image: the neighborhood difference of each point is fully utilized to augment the difference between smooth and edge regions. Experimental results show that the peak signal-to-noise ratio, the structural similarity, the visual effect, and the computational efficiency of this new model are improved compared with state-of-the-art denoising methods.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
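Within ADMM, the Lp-pseudo-norm subproblem is usually handled by a generalized shrinkage (thresholding) operator. The sketch below implements Chartrand-style p-shrinkage as one common choice; the paper's exact operator may differ:

    import numpy as np

    def lp_shrink(y, lam, p=0.5):
        """Generalized soft-thresholding for the Lp pseudo-norm (0 < p <= 1)."""
        mag = np.abs(y)
        with np.errstate(divide="ignore"):
            thresh = np.where(mag > 0, lam ** (2 - p) * mag ** (p - 1), np.inf)
        return np.sign(y) * np.maximum(mag - thresh, 0.0)

    y = np.array([-2.0, -0.3, 0.0, 0.3, 2.0])
    print(lp_shrink(y, lam=0.5, p=0.5))   # [-1.75  0.  0.  0.  1.75]
    # p = 1 recovers ordinary soft-thresholding: sign(y) * max(|y| - lam, 0)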

Article
Use of Gradient-Based Shadow Detection for Estimating Environmental Illumination Distribution
Appl. Sci. 2018, 8(11), 2255; https://doi.org/10.3390/app8112255 - 15 Nov 2018
Cited by 5 | Viewed by 988
Abstract
Environmental illumination information is necessary to achieve a consistent integration of virtual objects into a given image. In this paper, we present a gradient-based shadow detection method for estimating the environmental illumination distribution of a given scene, in which a three-dimensional (3-D) augmented reality (AR) marker, a cubic reference object of known size, is employed. The geometric elements (corners and sides) of the AR marker constitute candidate shadow boundaries; they are obtained on a flat surface according to the relationship between the camera and the candidate light sources. We can then extract the shadow regions by collecting the local features that support the candidate shadow boundaries in the image. To further verify the shadows passed by the local-feature-based matching, we examine whether significant brightness changes occur in the intersection regions between shadows. The proposed method reduces the unwanted effects caused by threshold values in edge-based shadow detection, as well as those caused by sampling position in point-based illumination estimation.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
Minimum Barrier Distance-Based Object Descriptor for Visual Tracking
Appl. Sci. 2018, 8(11), 2233; https://doi.org/10.3390/app8112233 - 13 Nov 2018
Cited by 1 | Viewed by 1189
Abstract
In most visual tracking tasks, the target is tracked by a bounding box given in the first frame. Complex and redundant background information inevitably exists inside the bounding box and affects tracking performance. To alleviate the influence of background, we propose a robust object descriptor for visual tracking in this paper. First, we decompose the bounding box into non-overlapping patches and extract color and gradient histogram features for each patch. Second, we adopt the minimum barrier distance (MBD) to calculate patch weights. Specifically, we treat the boundary patches as background seeds and calculate the MBD from each patch to the seed set as that patch's weight, since the weight calculated by MBD represents the difference between a patch and the background more effectively. Finally, we impose the weights on the extracted features to obtain the descriptor of each patch and then incorporate our MBD-based descriptor into the structured support vector machine algorithm for tracking. Experiments on two benchmark datasets demonstrate the effectiveness of the proposed approach.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
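A raster-scan approximation of the minimum barrier distance, in the spirit of the FastMBD algorithm, can be sketched as follows; the seed mask would mark the boundary patches, and the pass count and 4-neighborhood are assumptions:

    import numpy as np

    def mbd_transform(img, seed_mask, n_passes=3):
        """Approximate MBD: min over paths of (max - min intensity along the path)."""
        D = np.where(seed_mask, 0.0, np.inf)
        U, L = img.copy(), img.copy()         # running max / min on the best path
        h, w = img.shape
        for k in range(n_passes):             # alternate forward / backward scans
            ys = range(h) if k % 2 == 0 else range(h - 1, -1, -1)
            xs = range(w) if k % 2 == 0 else range(w - 1, -1, -1)
            d = -1 if k % 2 == 0 else 1
            for y in ys:
                for x in xs:
                    for ny, nx in ((y + d, x), (y, x + d)):
                        if 0 <= ny < h and 0 <= nx < w and D[ny, nx] < np.inf:
                            u = max(U[ny, nx], img[y, x])
                            l = min(L[ny, nx], img[y, x])
                            if u - l < D[y, x]:
                                D[y, x], U[y, x], L[y, x] = u - l, u, l
        return D

    img = np.random.rand(20, 20)
    seeds = np.zeros_like(img, dtype=bool)
    seeds[0, :] = seeds[-1, :] = seeds[:, 0] = seeds[:, -1] = True   # boundary seeds
    print(mbd_transform(img, seeds).round(2))    # larger = more unlike the background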

Article
An Improved Neural Network Cascade for Face Detection in Large Scene Surveillance
Appl. Sci. 2018, 8(11), 2222; https://doi.org/10.3390/app8112222 - 11 Nov 2018
Cited by 4 | Viewed by 1712
Abstract
Face detection with security cameras monitoring large and crowded areas is very important for public safety. However, it is much more difficult than traditional face detection tasks. One reason is that in large areas such as squares, stations, and stadiums, faces captured by cameras are usually at low resolution and thus miss many facial details. In this paper, we improve popular cascade algorithms by proposing a novel multi-resolution framework that utilizes parallel convolutional neural network cascades for detecting faces in large scenes. This framework uses face and head-with-shoulder information together to handle large-area surveillance images. Compared with popular cascade algorithms, our method outperforms them by a large margin.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
Deep Learning Case Study for Automatic Bird Identification
Appl. Sci. 2018, 8(11), 2089; https://doi.org/10.3390/app8112089 - 29 Oct 2018
Cited by 7 | Viewed by 1901
Abstract
An automatic bird identification system is required for offshore wind farms in Finland. Radar is the obvious choice for detecting flying birds, but external information is required for actual identification. We applied visual camera images as that external data. The proposed system for automatic bird identification consists of a radar, a motorized video head, and a single-lens reflex camera with a telephoto lens. A convolutional neural network trained with a deep learning algorithm is applied to the image classification. We also propose a data augmentation method in which images are rotated and converted in accordance with desired color temperatures. The final identification is based on a fusion of parameters provided by the radar and the predictions of the image classifier. On a dataset of 9312 manually taken original images, expanded by augmentation to 2.44 × 10^6 images, the sensitivity of the proposed system as an image classifier is 0.9463. The area under the receiver operating characteristic curve for two key bird species is 0.9993 for the White-tailed Eagle and 0.9496 for the Lesser Black-backed Gull. We propose a novel system for automatic bird identification as a real-world application and demonstrate that our data augmentation method is well suited to the image classification problem, significantly increasing the performance of the classifier.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
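The described augmentation, rotation plus a color-temperature shift, can be emulated with per-channel gains; the gain triples below are illustrative rather than calibrated color-temperature conversions:

    import numpy as np
    from scipy.ndimage import rotate

    def augment(image, angle, channel_gain):
        """Rotate an RGB image and re-balance its channels."""
        rotated = rotate(image, angle, reshape=False, mode="nearest")
        return np.clip(rotated * np.asarray(channel_gain), 0.0, 1.0)

    img = np.random.rand(48, 48, 3)
    warm = augment(img, angle=15, channel_gain=(1.10, 1.00, 0.85))   # warmer tone
    cool = augment(img, angle=-15, channel_gain=(0.85, 1.00, 1.10))  # cooler tone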

Article
An Image-Based Fall Detection System for the Elderly
Appl. Sci. 2018, 8(10), 1995; https://doi.org/10.3390/app8101995 - 20 Oct 2018
Cited by 10 | Viewed by 2541
Abstract
Due to advances in medical technology, the elderly population has continued to grow. Elderly healthcare issues have been widely discussed, especially fall accidents, because a fall can lead to a fracture and have serious consequences. Therefore, effective detection of fall accidents is important for both elderly people and their caregivers. In this work, we designed an Image-based FAll Detection System (IFADS) for nursing homes, where public areas are usually equipped with surveillance cameras. Unlike existing fall detection algorithms, we mainly focus on falls that occur while sitting down and standing up from a chair, because these two activities together account for a higher proportion of falls than forward walking. IFADS first applies an object detection algorithm to identify people in a video frame. Then, a posture recognition method is used to keep tracking the status of the people by checking the relative positions of the chair and the people. An alarm is triggered when a fall is detected. To evaluate the effectiveness of IFADS, we not only simulated different fall scenarios but also adopted YouTube and Giphy videos that captured real falls. Our experimental results show that IFADS achieved an average accuracy of 95.96%. IFADS can therefore be used by nursing homes to improve the quality of residential care facilities.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
A New Cost Function Combining Deep Neural Networks (DNNs) and l2,1-Norm with Extraction of Robust Facial and Superpixels Features in Age Estimation
Appl. Sci. 2018, 8(10), 1943; https://doi.org/10.3390/app8101943 - 16 Oct 2018
Viewed by 1317
Abstract
Automatic age estimation from unconstrained facial images is a challenging task, and it has recently gained much attention due to its wide range of applications. In this paper, we propose a new model based on convolutional neural networks (CNNs) and the l2,1-norm to select age-related features for the age estimation task. A new cost function is proposed, and to learn and train the new model, we provide the analysis and proof of convergence for this cost function when solving the joint minimization problem of deep neural networks (DNNs) and the l2,1-norm. High-level features are extracted from the facial images using transfer learning, since there are currently not enough large age databases to train a deep learning network from scratch. The extracted features are then fed to the proposed model to select the most efficient age-related features. In addition, a new DNN-based system that jointly fine-tunes two different DNNs with two different feature sets is developed. Experimental results show the effectiveness of the proposed methods, which achieve state-of-the-art performance on a public database.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
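The l2,1-norm behind the feature selection sums the l2 norms of the rows of a weight matrix, so penalizing it drives whole rows, and hence whole features, to zero. A small sketch of the norm and the feature ranking it induces:

    import numpy as np

    def l21_norm(W):
        """||W||_{2,1} = sum over rows of the row-wise l2 norm."""
        return np.linalg.norm(W, axis=1).sum()

    def select_features(W, n_keep):
        """Keep the features whose weight rows have the largest l2 norms."""
        return np.argsort(-np.linalg.norm(W, axis=1))[:n_keep]

    W = np.array([[0.9, -0.8], [0.01, 0.02], [0.5, 0.4]])
    print(l21_norm(W), select_features(W, n_keep=2))   # rows 0 and 2 survive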

Article
Temporal Action Detection in Untrimmed Videos from Fine to Coarse Granularity
Appl. Sci. 2018, 8(10), 1924; https://doi.org/10.3390/app8101924 - 15 Oct 2018
Cited by 4 | Viewed by 1107
Abstract
Temporal action detection in long, untrimmed videos is an important yet challenging task that requires not only recognizing the categories of actions in videos but also localizing the start and end time of each action. In recent years, artificial neural networks such as the Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) have significantly improved performance in various computer vision tasks, including action detection. In this paper, we make the most of classifiers of different granularity and propose to detect actions from fine to coarse granularity, which is also in line with how people detect actions. Our action detection method is built on the 'proposal then classification' framework. We employ several neural network architectures as deep information extractors and as segment-level (fine-grained) and window-level (coarse-grained) classifiers. Each of the proposal and classification steps is executed from the segment level to the window level. The experimental results show that our method not only achieves detection performance comparable to that of state-of-the-art methods but also delivers relatively balanced performance across different action categories.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
Large-Scale Fine-Grained Bird Recognition Based on a Triplet Network and Bilinear Model
Appl. Sci. 2018, 8(10), 1906; https://doi.org/10.3390/app8101906 - 13 Oct 2018
Cited by 1 | Viewed by 1451
Abstract
The main purpose of fine-grained classification is to distinguish among many subcategories of a single basic category, such as birds or flowers. We propose a model based on a triplet network and bilinear methods for fine-grained bird identification. Our proposed model can be trained in an end-to-end manner, which effectively increases the inter-class distance of the features extracted by the network and improves the accuracy of bird recognition. When experimentally tested on 1096 birds in a custom-built dataset and on Caltech-UCSD (a public bird dataset), the model achieved accuracies of 88.91% and 85.58%, respectively. The experimental results confirm the high generalization ability of our model in fine-grained image classification. Moreover, our model requires no additional manual annotation information, such as object-labeling frames and part-labeling points, which guarantees good versatility and robustness in fine-grained bird recognition.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
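The bilinear model pools the outer product of two feature maps over all spatial locations. A minimal sketch with the customary signed square root and l2 normalization (dimensions are illustrative; the triplet loss and CNN backbones are omitted):

    import numpy as np

    def bilinear_pool(fa, fb):
        """Bilinear pooling of two feature maps shaped (C1, H, W) and (C2, H, W)."""
        x = np.einsum("ihw,jhw->ij", fa, fb).ravel()   # sum of outer products
        x = np.sign(x) * np.sqrt(np.abs(x))            # signed square root
        return x / (np.linalg.norm(x) + 1e-12)         # l2 normalization

    fa = np.random.rand(8, 7, 7)
    fb = np.random.rand(8, 7, 7)
    print(bilinear_pool(fa, fb).shape)                 # (64,)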

Article
A Novel Lightweight Approach for Video Retrieval on Mobile Augmented Reality Environment
Appl. Sci. 2018, 8(10), 1860; https://doi.org/10.3390/app8101860 - 10 Oct 2018
Cited by 2 | Viewed by 1181
Abstract
Mobile augmented reality merges virtual objects with the real world on mobile devices, while video retrieval brings out similar-looking videos from a large-scale video dataset. Since mobile augmented reality applications demand real-time interaction and operation, processing and interaction must happen in real time. Furthermore, augmented-reality-based virtual objects can be poorly textured. To resolve the above-mentioned issues, in this research we propose a novel, fast, and robust approach for retrieving videos in a mobile augmented reality environment using image and video queries. First, Top-K key frames are extracted from the videos, which significantly increases efficiency. Second, we introduce a novel frame-based feature extraction method, the Pyramid Ternary Histogram of Oriented Gradient (PTHOG), to extract shape features from virtual objects in an effective and efficient manner. Third, we utilize Double-Bit Quantization (DBQ)-based hashing to accomplish the nearest-neighbor search efficiently, which produces a candidate list of videos. Lastly, a similarity measure is applied to re-rank the videos obtained from the candidate list. An extensive experimental analysis is performed to verify our claims.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
A Method for Singular Points Detection Based on Faster-RCNN
Appl. Sci. 2018, 8(10), 1853; https://doi.org/10.3390/app8101853 - 09 Oct 2018
Cited by 4 | Viewed by 1302
Abstract
Most methods for singular point detection depend on the orientation fields of fingerprints, and they cannot achieve reliable and accurate detection on poor-quality fingerprints. In this study, a new method for fingerprint singular point detection based on Faster-RCNN (Faster Region-based Convolutional Network method) is proposed. It is a two-step process, and an orientation constraint is added to Faster-RCNN to obtain the orientation information of singular points. In addition, we designed a convolutional neural network (ConvNet) for singular point detection according to the characteristics of fingerprint images and existing works. Specifically, the proposed method extracts singular points directly from raw fingerprint images without traditional preprocessing. Experimental results demonstrate the effectiveness of the proposed method. In comparison with other detection algorithms, our method achieves a 96.03% detection rate for core points and a 98.33% detection rate for delta points on the FVC2002 DB1 dataset, and 90.75% for core points and 94.87% for delta points on the NIST SD4 dataset, outperforming the other algorithms.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)

Article
Necessary Morphological Patches Extraction for Automatic Micro-Expression Recognition
Appl. Sci. 2018, 8(10), 1811; https://doi.org/10.3390/app8101811 - 03 Oct 2018
Cited by 3 | Viewed by 1260
Abstract
Micro-expressions are usually subtle and brief facial expressions that humans use to hide their true emotional states. In recent years, micro-expression recognition has attracted wide attention in the fields of psychology, mass media, and computer vision. The shortest micro-expression lasts only 1/25 s. Furthermore, different from macro-expressions, micro-expressions have considerably lower intensity and inadequate contraction of the facial muscles. Because of these characteristics, automatic micro-expression detection and recognition are great challenges in the field of computer vision. In this paper, we propose a novel automatic facial expression recognition framework based on necessary morphological patches (NMPs) to better detect and identify micro-expressions. A micro-expression is a subconscious facial muscle response that is not controlled by rational thought; it therefore involves only a few facial muscles and has local properties. NMPs are the facial regions that must be involved when a micro-expression occurs. NMPs are screened by weighting the facial active patches instead of using the entire facial area holistically. First, we manually define the active facial patches according to the facial landmark coordinates and the facial action coding system (FACS). Second, we use the LBP-TOP descriptor to extract features from these patches and the entropy-weight method to select the NMPs. Finally, we obtain the weighted LBP-TOP features of these NMPs. We test on two recent publicly available datasets, CASME II and SMIC, which provide sufficient samples. Compared with many recent state-of-the-art approaches, our method achieves more promising recognition results.
(This article belongs to the Special Issue Advanced Intelligent Imaging Technology)
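The entropy-weight step can be sketched as follows: criteria under which the patches' scores have lower entropy are considered more discriminative and receive larger weights; the score matrix here is random and purely illustrative:

    import numpy as np

    def entropy_weights(scores, eps=1e-12):
        """Rows are patches, columns are criteria; returns weighted patch scores."""
        P = scores / (scores.sum(axis=0, keepdims=True) + eps)
        k = 1.0 / np.log(len(scores))
        e = -k * (P * np.log(P + eps)).sum(axis=0)      # entropy per criterion
        w = (1.0 - e) / (1.0 - e).sum()                 # weight per criterion
        return scores @ w

    scores = np.random.rand(10, 3)                      # 10 patches, 3 criteria
    print(entropy_weights(scores).round(3))             # importance of each patch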